Commit Graph

8 Commits

Author SHA1 Message Date
Roman Rizzi 7e3a543f6f
FEATURE: Double gist length to 40 words (#888) 2024-11-01 13:09:03 -03:00
Roman Rizzi e8f0633141
DEV: Extend truncation to all summarizable content (#884) 2024-10-31 12:17:42 -03:00
Roman Rizzi e8eed710e0
FIX: Truncate OP for gists to help the model focus on the latest posts (#883) 2024-10-31 10:54:56 -03:00
Roman Rizzi dd404c924a
DEV: Use different feature_names for summarization strategies (#875) 2024-10-29 08:45:14 -03:00
Rafael dos Santos Silva 33da27e231
FIX: Change hot gist prompt to avoid title repeating #859 (#859)
Co-authored-by: Roman Rizzi <rizziromanalejandro@gmail.com>
2024-10-25 12:12:33 -03:00
Roman Rizzi ec97996905
FIX/REFACTOR: FoldContent revamp (#866)
* FIX/REFACTOR: FoldContent revamp

We hit a snag with our hot topic gist strategy: the regex we used to split the content didn't work, so we couldn't send the original post separately. Sending it separately was important for letting the model focus on what's new in the topic.

The algorithm doesn't give us full control over how prompts are written, and figuring out how to format the content isn't straightforward. As a result, we have to rely on more complicated workarounds, like regex.

To tackle this, I'm suggesting we simplify the approach: summarize as much content as fits upfront, then gradually fold in the remaining content until there's nothing left to summarize.

Also, the "extend" step mostly matters for models with small context windows, which shouldn't be a problem 99% of the time given the content volume we're dealing with.

* Fix fold docs

* Use #shift instead of #pop to get the first elem, not the last
2024-10-25 11:51:17 -03:00
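The fold approach described in the FoldContent revamp above can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the discourse-ai implementation: `fold`, `take_batch`, the character-based `budget`, and the `summarize`/`extend_summary` lambdas (stand-ins for LLM calls) are all hypothetical. It also shows why the `#shift` fix matters: taking chunks from the front of the array keeps posts in their original order, whereas `#pop` would take the last chunk first.

```ruby
# Hypothetical sketch of "summarize as much as fits, then extend until done".
# Assumptions: chunks are plain strings, the budget counts characters (not
# tokens), and the lambdas passed in stand in for LLM calls.

# Pull chunks from the front of `remaining` until the next one would exceed the budget.
def take_batch(remaining, budget)
  batch = [remaining.shift] # always take at least one chunk so the loop makes progress
  while remaining.any? && batch.sum(&:length) + remaining.first.length <= budget
    batch << remaining.shift # #shift takes the first element, preserving post order
  end
  batch
end

def fold(chunks, budget:, summarize:, extend_summary:)
  remaining = chunks.dup # don't mutate the caller's array
  return "" if remaining.empty?

  # First pass: summarize as much content as fits.
  summary = summarize.call(take_batch(remaining, budget))

  # Then extend the running summary with leftover content until nothing remains.
  until remaining.empty?
    summary = extend_summary.call(summary, take_batch(remaining, budget))
  end
  summary
end

# Usage with fake "LLM" lambdas that just record what they were given.
summarize = ->(batch) { "S(#{batch.join('+')})" }
extend_summary = ->(summary, batch) { "E(#{summary}|#{batch.join('+')})" }
puts fold(%w[aa bb cc dd], budget: 4, summarize: summarize, extend_summary: extend_summary)
# prints E(S(aa+bb)|cc+dd)
```

For simplicity the sketch ignores the size of the running summary when filling later batches; a real implementation would likely subtract it from the budget on each extend pass.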
Roman Rizzi 3533814870
UX: Avoid introductory phrases and summarize topics without replies (#848) 2024-10-21 17:53:48 -03:00
Roman Rizzi 27b5542357
FEATURE: Generate topic gists for the hot topics list. (#837)
* Display gists in the hot topics list

* Adjust hot topics gist strategy and add a job to generate gists

* Replace setting with a configurable batch size

* Avoid loading summaries for other topic lists

* Tweak gist prompt to focus on latest posts in the context of the OP

* Remove serializer hack and rely on core change from discourse/discourse#29291

* Update lib/summarization/strategies/hot_topic_gists.rb

Co-authored-by: Rafael dos Santos Silva <xfalcox@gmail.com>

---------

Co-authored-by: Rafael dos Santos Silva <xfalcox@gmail.com>
2024-10-18 18:01:39 -03:00