Commit Graph

43 Commits

Author SHA1 Message Date
Rafael dos Santos Silva 791fad1e6a
FEATURE: Index embeddings using bit vectors (#824)
On very large sites, the rare cache misses for Related Topics can take around 200ms, which affects our p99 metric on the topic page. In order to mitigate this impact, we now have several tools at our disposal.

First, one is to migrate the index embedding type from halfvec to bit and change the related topic query to leverage the new bit index by changing the search algorithm from inner product to Hamming distance. This will reduce our index sizes by 90%, severely reducing the impact of embeddings on our storage. By making the related query a bit smarter, we can have zero impact on recall by using the index to over-capture N*2 results, then re-ordering those N*2 using the full halfvec vectors and taking the top N. The expected impact is to go from 200ms to <20ms for cache misses and from a 2.5GB index to a 250MB index on a large site.

Another tool is migrating our index type from IVFFLAT to HNSW, which can increase the cache misses performance even further, eventually putting us in the under 5ms territory. 

Co-authored-by: Roman Rizzi <roman@discourse.org>
2024-10-14 13:26:03 -03:00
Mark VanLandingham 52d90cf1bc
DEV: Add apply_modifier for SemanticTopicQuery topics list (#830) 2024-10-10 12:13:16 -05:00
Sam 03eccbe392
FEATURE: Make tool support polymorphic (#798)
Polymorphic RAG means that we will be able to access RAG fragments both from AiPersona and AiCustomTool

In turn this gives us support for richer RAG implementations.
2024-09-16 08:17:17 +10:00
Sam 584753cf60
FIX: we were never reindexing old content (#786)
* FIX: we were never reindexing old content

Embedding backfill contains logic for searching for old content
change and then backfilling.

Unfortunately it was excluding all topics that had embedding
unconditionally, leading to no backfill ever happening.


This change adds a test and ensures we backfill.

* over select results, this ensures we will be more likely to find
ai results when filtered
2024-08-30 14:37:55 +10:00
Sam eee8e72756
FEATURE: API scope for semantic search (#785)
The new API scope allows restricting access to semantic search
only.
2024-08-30 09:35:20 +10:00
Sam 0687ec75c3
FEATURE: allow embedding based search without hyde (#777)
This allows callers of embedding based search to bypass hyde.

Hyde will expand the search term using an LLM, but if an LLM is
performing the search we can skip this expansion.

It also introduced some tests for the controller which we did not have
2024-08-28 14:17:34 +10:00
Rafael dos Santos Silva 1686a8a683
DEV: Move to single table per embeddings type (#561)
Also move us to halfvecs for speed and disk usage gains
2024-08-08 11:55:20 -03:00
Loïc Guitaut dd4e305ff7
DEV: Update rubocop-discourse to version 3.8.0 (#641) 2024-05-28 11:15:42 +02:00
Sam 8eee6893d6
FEATURE: GPT4o support and better auditing (#618)
- Introduce new support for GPT4o (automation / bot / summary / helper)
- Properly account for token counts on OpenAI models
- Track feature that was used when generating AI completions
- Remove custom llm support for summarization as we need better interfaces to control registration and de-registration
2024-05-14 13:28:46 +10:00
Rafael dos Santos Silva ab4544d897
DEV: Mark hypothetical_post_from method public (#607) 2024-05-07 15:17:26 -03:00
Bianca Nenciu 3e54697c5a
FIX: Load categories for related topics (#570)
This is necessary when "lazy load categories" feature is enabled to
make sure the categories are rendered for all related topics.
2024-04-15 09:31:07 +10:00
Rafael dos Santos Silva eb93b21769
FEATURE: Add BGE-M3 embeddings support (#569)
BAAI/bge-m3 is an interesting model, that is multilingual and with a
context size of 8192. Even with a 16x larger context, it's only 4x slower
to compute it's embeddings on the worst case scenario.

Also includes a minor refactor of the rake task, including setting model
and concurrency levels when running the backfill task.
2024-04-10 17:24:01 -03:00
Roman Rizzi 1f1c94e5c6
FEATURE: AI Bot RAG support. (#537)
This PR lets you associate uploads to an AI persona, which we'll split and generate embeddings from. When building the system prompt to get a bot reply, we'll do a similarity search followed by a re-ranking (if available). This will let us find the most relevant fragments from the body of knowledge you associated with the persona, resulting in better, more informed responses.

For now, we'll only allow plain-text files, but this will change in the future.

Commits:

* FEATURE: RAG embeddings for the AI Bot

This first commit introduces a UI where admins can upload text files, which we'll store, split into fragments,
and generate embeddings of. In a next commit, we'll use those to give the bot additional information during
conversations.

* Basic asymmetric similarity search to provide guidance in system prompt

* Fix tests and lint

* Apply reranker to fragments

* Uploads filter, css adjustments and file validations

* Add placeholder for rag fragments

* Update annotations
2024-04-01 13:43:34 -03:00
Rafael dos Santos Silva b327313115
DEV: Fix module namespace breaking reloads (#530) 2024-03-14 15:19:28 -03:00
Keegan George b515b4f66d
FEATURE: AI Quick Semantic Search (#501)
This PR adds AI semantic search to the search pop available on every page.

It depends on several new and optional settings, like per post embeddings and a reranker model, so this is an experimental endeavour.


---------

Co-authored-by: Rafael Silva <xfalcox@gmail.com>
2024-03-08 13:02:50 -03:00
Rafael dos Santos Silva a1f1067f69
FIX: Lower truncation size for Gemini Embeddings (#493) 2024-02-28 08:52:53 +11:00
Rafael dos Santos Silva 59fbbb156b
DEV: Make indexing less frequent when related topics is disabled (#468) 2024-02-09 16:08:54 -03:00
Sam ba3c3951cf
FIX: typo causing text_embedding_3_large to fail (#460) 2024-02-05 11:16:36 +11:00
Roman Rizzi fba9c1bf2c
UX: Re-introduce embedding settings validations (#457)
* Revert "Revert "UX: Validate embeddings settings (#455)" (#456)"

This reverts commit 392e2e8aef.

* Resstore previous default
2024-02-01 16:54:09 -03:00
Roman Rizzi 392e2e8aef
Revert "UX: Validate embeddings settings (#455)" (#456)
This reverts commit 85fca89e01.
2024-02-01 14:06:51 -03:00
Roman Rizzi 85fca89e01
UX: Validate embeddings settings (#455) 2024-02-01 13:05:38 -03:00
Sam dcafc8032f
FIX: improve embedding generation (#452)
1. on failure we were queuing a job to generate embeddings, it had the wrong params. This is both fixed and covered in a test.
2. backfill embedding in the order of bumped_at, so newest content is embedded first, cover with a test
3. add a safeguard for hidden site setting that only allows batches of 50k in an embedding job run

Previously old embeddings were updated in a random order, this changes it so we update in a consistent order
2024-01-31 10:38:47 -03:00
Rafael dos Santos Silva b41c5cc31c
FIX: Add table name to remove ambiguous column reference in SQL (#449) 2024-01-30 15:50:26 -03:00
Sam b2b01185f2
FEATURE: add support for new OpenAI embedding models (#445)
* FEATURE: add support for new OpenAI embedding models

This adds support for just released text_embedding_3_small and large

Note, we have not yet implemented truncation support which is a
new API feature. (triggered using dimensions)

* Tiny side fix, recalc bots when ai is enabled or disabled

* FIX: downsample to 2000 items per vector which is a pgvector limitation
2024-01-29 13:24:30 -03:00
Rafael dos Santos Silva fa6bc7f409
FIX: Automatic embeddings index could fail if it existed in the backup schema (#441) 2024-01-24 15:57:26 -03:00
Rafael dos Santos Silva 04bc402aae
FEATURE: Setting to control per post embeddings (#439)
* FEATURE: Setting to control per post embeddings
2024-01-23 22:09:27 -03:00
Roman Rizzi 04eae76f68
REFACTOR: Represent generic prompts with an Object. (#416)
* REFACTOR: Represent generic prompts with an Object.

* Adds a bit more validation for clarity

* Rewrite bot title prompt and fix quirk handling

---------

Co-authored-by: Sam Saffron <sam.saffron@gmail.com>
2024-01-12 14:36:44 -03:00
Rafael dos Santos Silva 705ef986b4
FIX: Set ivfflat.probes using topic count, not post count (#421)
Fixes a regression from 140359c which caused we to set this globally based on post count, rendering the cost of an index scan on the topics table too high and making the planner, correctly, not use the index anymore.

Hopefully https://github.com/pgvector/pgvector/issues/235 lands soon.
2024-01-12 11:20:23 -03:00
Martin Brennan 37b957dbbb
DEV: Fix SemanticRelated module load error (#419)
Followup 2636efcd1b,
whenever ruby code was changed locally this would break
module loading, giving an "uninitialized constant
DiscourseAi::Embeddings::EntryPoint::SemanticRelated" error.
2024-01-11 13:52:50 +10:00
Rafael dos Santos Silva 8fcba12fae
FEATURE: Support for SRV records for Discourse services (#414)
This allows admins to configure services with multiple backends using DNS SRV records. This PR also adds support for shared secret auth via headers for TEI and vLLM endpoints, so they are inline with the other ones.
2024-01-10 19:23:07 -03:00
Rafael dos Santos Silva 23b2809638
FEATURE: Generate proper embeddings for posts/topics with embedded content (#401) 2024-01-05 10:27:45 -03:00
Rafael dos Santos Silva 6fc1c9f7a6
FEATURE: Try to automatically handle larger embedding indexes (#403)
* FEATURE: Try to automatically handle larger embedding indexes

* linteeeeeeeer
2024-01-05 09:56:28 -03:00
Sam 03fc94684b
FIX: AI helper not working correctly with mixtral (#399)
* FIX: AI helper not working correctly with mixtral

This PR introduces a new function on the generic llm called #generate

This will replace the implementation of completion!

#generate introduces a new way to pass temperature, max_tokens and stop_sequences

Then LLM implementers need to implement #normalize_model_params to
ensure the generic names match the LLM specific endpoint

This also adds temperature and stop_sequences to completion_prompts
this allows for much more robust completion prompts

* port everything over to #generate

* Fix translation

- On anthropic this no longer throws random "This is your translation:"
- On mixtral this actually works

* fix markdown table generation as well
2024-01-04 09:53:47 -03:00
Rafael dos Santos Silva cec9bb8910
FIX: Skip embeddings for blank content (#392) 2023-12-29 14:59:08 -03:00
Rafael dos Santos Silva c778592da4
FIX: Corner cases on post embedding and crawler related (#391)
* FIX: Simplify markup for crawler related

* FIX: Handle title-less topics when truncating for embeddings

* fix
2023-12-29 14:05:02 -03:00
Rafael dos Santos Silva 140359c2ef
FEATURE: Per post embeddings (#387) 2023-12-29 12:28:45 -03:00
Rafael dos Santos Silva 2636efcd1b
FEATURE: Render Related Topics for Crawlers (#386) 2023-12-28 15:32:03 -03:00
Rafael dos Santos Silva 1287ef4428
FEATURE: Support for Gemini Embeddings (#382) 2023-12-28 10:28:01 -03:00
Rafael dos Santos Silva 4d7ccdda2f
FEATURE: DNS SRV support for TEI (#363) 2023-12-18 13:21:21 -03:00
Rafael dos Santos Silva 381b0d74ca
FIX: Handle truncation in HyDE search (#342) 2023-12-07 10:36:56 -03:00
Rafael dos Santos Silva c8352f21ce
FIX: Fallback to whole LLM response when XML fail (#340) 2023-12-06 18:58:26 -03:00
Sam a0b9fb9721
FIX: explicitly load embedding strategies (#325)
If not, sometimes during tests these constants may not be loaded
leading to flaky tests
2023-11-29 16:36:56 +11:00
Sam 6ddc17fd61
DEV: port directory structure to Zeitwerk (#319)
Previous to this change we relied on explicit loading for a files in Discourse AI.

This had a few downsides:

- Busywork whenever you add a file (an extra require relative)
- We were not keeping to conventions internally ... some places were OpenAI others are OpenAi
- Autoloader did not work which lead to lots of full application broken reloads when developing.

This moves all of DiscourseAI into a Zeitwerk compatible structure.

It also leaves some minimal amount of manual loading (automation - which is loading into an existing namespace that may or may not be there)

To avoid needing /lib/discourse_ai/... we mount a namespace thus we are able to keep /lib pointed at ::DiscourseAi

Various files were renamed to get around zeitwerk rules and minimize usage of custom inflections

Though we can get custom inflections to work it is not worth it, will require a Discourse core patch which means we create a hard dependency.
2023-11-29 15:17:46 +11:00