This change introduces a job to summarize topics and cache the results automatically. We provide a setting to control how many topics we'll backfill per hour and what the topic's minimum word count is to qualify.
We'll prioritize topics without summary over outdated ones.
* FIX: we were never reindexing old content
Embedding backfill contains logic for searching for old content
change and then backfilling.
Unfortunately it was excluding all topics that had embedding
unconditionally, leading to no backfill ever happening.
This change adds a test and ensures we backfill.
* over select results, this ensures we will be more likely to find
ai results when filtered
1. on failure we were queuing a job to generate embeddings, it had the wrong params. This is both fixed and covered in a test.
2. backfill embedding in the order of bumped_at, so newest content is embedded first, cover with a test
3. add a safeguard for hidden site setting that only allows batches of 50k in an embedding job run
Previously old embeddings were updated in a random order, this changes it so we update in a consistent order