9 Commits

Author SHA1 Message Date
Roman Rizzi
4784e7fe43
FIX: Set default for existing records. (#1073)
We'll later copy the correct value from content_range. 1 should be the min highest post number a topic has.
2025-01-16 10:38:53 -03:00
Roman Rizzi
46fcdb6ba5
FIX: Make summaries backfill job more resilient. (#1071)
To quickly select backfill candidates without comparing SHAs, we compare the last summarized post to the topic's highest_post_number. However, hiding or deleting a post and adding a small action will update this column, causing the job to stall and re-generate the same summary repeatedly until someone posts a regular reply. On top of this, this is not always true for topics with `best_replies`, as this last reply isn't necessarily included.

Since this is not evident at first glance and each summarization strategy picks its targets differently, I'm opting to simplify the backfill logic and how we track potential candidates.

The first step is dropping `content_range`, which serves no purpose and it's there because summary caching was supposed to work differently at the beginning. So instead, I'm replacing it with a column called `highest_target_number`, which tracks `highest_post_number` for topics and could track other things like channel's `message_count` in the future.

Now that we have this column when selecting every potential backfill candidate, we'll check if the summary is truly outdated by comparing the SHAs, and if it's not, we just update the column and move on
2025-01-16 09:42:53 -03:00
Rafael dos Santos Silva
4980a4b2f7
FIX: Multiple concurrent summaries could result in pg index errors (#973) 2024-11-28 11:53:04 -03:00
Roman Rizzi
95762723de
PERF: Preload only gists when including summaries in topic list (#948)
* PERF: Preload only gists when including summaries in topic list

* Add unique index on summaries and dedup existing records

* Make hot topics batch size setting hidden
2024-11-25 12:24:02 -03:00
Roman Rizzi
9505a8976c
FEATURE: Automatically backfill regular summaries. (#892)
This change introduces a job to summarize topics and cache the results automatically. We provide a setting to control how many topics we'll backfill per hour and what the topic's minimum word count is to qualify.

We'll prioritize topics without summary over outdated ones.
2024-11-04 17:48:11 -03:00
David Taylor
945f04b089
DEV: Update plugin annotations (#871) 2024-10-28 14:07:09 +00:00
Roman Rizzi
c7acb4a6a0
REFACTOR: Support of different summarization targets/prompts. (#835)
* DEV: Add summary types

* Refactor for different summary types

* Use enum for summary types

* Update lib/summarization/strategies/topic_summary.rb

Co-authored-by: Penar Musaraj <pmusaraj@gmail.com>

* Update lib/summarization/strategies/topic_gist.rb

Co-authored-by: Penar Musaraj <pmusaraj@gmail.com>

* Update lib/summarization/strategies/chat_messages.rb

Co-authored-by: Penar Musaraj <pmusaraj@gmail.com>

* Fix chat_messages single prompt

* Small tweak to the chat summarization prompt

---------

Co-authored-by: Penar Musaraj <pmusaraj@gmail.com>
2024-10-15 13:53:26 -03:00
Sam
38153608f8
FIX: repair id sequence identity on summary table (#701)
1. Repairs the identity on the summary table, we migrated data without resetting it.
2. Adds an index into ai_summary table to match expected retrieval pattern
2024-07-04 12:23:46 +10:00
Keegan George
1b0ba9197c
DEV: Add summarization logic from core (#658) 2024-07-02 08:51:59 -07:00