discourse-ai/lib
Roman Rizzi ec97996905
FIX/REFACTOR: FoldContent revamp (#866)
* FIX/REFACTOR: FoldContent revamp

We hit a snag with our hot topic gist strategy: the regex we used to split the content didn't work, so we couldn't send the original post separately. Sending it separately is important because it lets the model focus on what's new in the topic.

The algorithm doesn’t give us full control over how prompts are written, and figuring out how to format the content isn't straightforward. As a result, we end up relying on more complicated workarounds, like regex.

To tackle this, I'm suggesting we simplify the approach a bit. Let's focus on summarizing as much as we can upfront, then gradually add new content until there's nothing left to summarize.

Also, the "extend" part is mostly for models with small context windows, which shouldn't pose a problem 99% of the time with the content volume we're dealing with.

* Fix fold docs

* Use #shift instead of #pop to get the first element, not the last
2024-10-25 11:51:17 -03:00
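
The simplified approach described in the commit message above (summarize as much as fits upfront, then keep adding new content until nothing is left) can be sketched roughly as follows. This is an illustrative sketch only, not the plugin's actual FoldContent code: the `fold` method, the `budget`, `summarize`, and `size_of` parameters, and the character-count size measure are all hypothetical stand-ins, and the sketch ignores the size of the running summary when filling a chunk.

```ruby
# Hypothetical sketch of a "fold" style summarization loop.
# Not the actual FoldContent implementation; all names here are assumptions.
#
# items     - content pieces (e.g. post texts), oldest first
# budget    - maximum combined size of the items sent in one call
# summarize - callable taking (previous_summary_or_nil, chunk) and returning a summary
# size_of   - callable measuring one item; character count stands in for a tokenizer
def fold(items, budget:, summarize:, size_of: ->(text) { text.length })
  queue = items.dup
  summary = nil

  until queue.empty?
    chunk = []
    used = 0

    # Fill the chunk from the FRONT of the queue so the oldest content goes first.
    # An oversized single item is still taken on its own so the loop always advances.
    while queue.any? && (chunk.empty? || used + size_of.call(queue.first) <= budget)
      item = queue.shift # Array#shift returns the first element; Array#pop returns the last
      used += size_of.call(item)
      chunk << item
    end

    # Extend the running summary with the new chunk.
    summary = summarize.call(summary, chunk)
  end

  summary
end

# Usage with a fake summarizer that just concatenates its input.
fake_summarizer = ->(previous, chunk) { [previous, chunk.join(" | ")].compact.join(" + ") }
puts fold(["first post", "second post", "third"], budget: 21, summarize: fake_summarizer)
```

With a large enough budget the loop runs once and everything is summarized in a single call; the extension step only comes into play when the content exceeds the budget, which matches the note that it mostly matters for models with small context windows.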
ai_bot FIX: testing tool was not showing rag results (#867) 2024-10-25 16:01:25 +11:00
ai_helper FIX: Basic cleanup of AI Caption to remove line breaks and pipes (#857) 2024-10-23 18:38:29 -03:00
automation FEATURE: better logging for automation reports (#853) 2024-10-23 16:49:56 +11:00
completions FIX: Llm selector / forced tools / search tool (#862) 2024-10-25 06:24:53 +11:00
configuration FEATURE: improve visibility of AI usage in LLM page (#845) 2024-10-22 11:16:02 +11:00
database DEV: port directory structure to Zeitwerk (#319) 2023-11-29 15:17:46 +11:00
discord/bot FEATURE: Discord Bot integration (#831) 2024-10-16 12:41:18 -03:00
embeddings FIX: testing tool was not showing rag results (#867) 2024-10-25 16:01:25 +11:00
inference FIX: api key header error (#839) 2024-10-16 15:57:36 -03:00
nsfw DEV: Fix various typos (#434) 2024-01-19 12:51:26 +01:00
sentiment UX: Use stacked line chart for post sentiment (#737) 2024-08-02 14:23:29 -07:00
summarization FIX/REFACTOR: FoldContent revamp (#866) 2024-10-25 11:51:17 -03:00
tasks/modules FEATURE: Index embeddings using bit vectors (#824) 2024-10-14 13:26:03 -03:00
tokenizer FIX/REFACTOR: FoldContent revamp (#866) 2024-10-25 11:51:17 -03:00
toxicity FIX: Truncate content for sentiment/toxicity classification (#431) 2024-01-17 15:17:58 -03:00
utils FEATURE: Add basic connection check to DNS SRV resources (#563) 2024-04-12 10:39:19 -03:00
automation.rb FEATURE: allow llm triage to automatically hide posts (#820) 2024-10-04 16:11:30 +10:00
chat_message_classificator.rb DEV: port directory structure to Zeitwerk (#319) 2023-11-29 15:17:46 +11:00
classificator.rb DEV: port directory structure to Zeitwerk (#319) 2023-11-29 15:17:46 +11:00
engine.rb DEV: port directory structure to Zeitwerk (#319) 2023-11-29 15:17:46 +11:00
guardian_extensions.rb FEATURE: Make hot topic gists opt-in. (#846) 2024-10-21 15:15:25 -03:00
multisite_hash.rb FIX: properly cache user locale (#593) 2024-04-26 09:28:35 -03:00
post_classificator.rb DEV: port directory structure to Zeitwerk (#319) 2023-11-29 15:17:46 +11:00
summarization.rb FEATURE: Generate topic gists for the hot topics list. (#837) 2024-10-18 18:01:39 -03:00
topic_extensions.rb FEATURE: Generate topic gists for the hot topics list. (#837) 2024-10-18 18:01:39 -03:00