discourse-ai

Commit Graph

Author	SHA1	Message	Date
Rafael dos Santos Silva	791fad1e6a	FEATURE: Index embeddings using bit vectors (#824) On very large sites, the rare cache misses for Related Topics can take around 200ms, which affects our p99 metric on the topic page. To mitigate this, we now have several tools at our disposal. The first is to migrate the index embedding type from halfvec to bit and change the related topics query to leverage the new bit index, switching the search algorithm from inner product to Hamming distance. This will reduce our index sizes by 90%, greatly reducing the impact of embeddings on our storage. By making the related query a bit smarter, we can have zero impact on recall: we use the index to over-capture N*2 results, then re-order those N*2 results using the full halfvec vectors and take the top N. The expected impact is to go from 200ms to <20ms for cache misses, and from a 2.5GB index to a 250MB index on a large site. Another tool is migrating our index type from IVFFLAT to HNSW, which can improve cache-miss performance even further, eventually putting us in under-5ms territory. Co-authored-by: Roman Rizzi <roman@discourse.org>	2024-10-14 13:26:03 -03:00
Sam	03eccbe392	FEATURE: Make tool support polymorphic (#798) Polymorphic RAG means that we will be able to access RAG fragments from both AiPersona and AiCustomTool. In turn, this gives us support for richer RAG implementations.	2024-09-16 08:17:17 +10:00
Sam	584753cf60	FIX: we were never reindexing old content (#786) * FIX: we were never reindexing old content Embedding backfill contains logic for searching for old content changes and then backfilling. Unfortunately, it unconditionally excluded all topics that already had embeddings, so no backfill ever happened. This change adds a test and ensures we backfill. * Over-select results; this makes it more likely we will find AI results when filtered	2024-08-30 14:37:55 +10:00
Rafael dos Santos Silva	1686a8a683	DEV: Move to single table per embeddings type (#561) Also move us to halfvecs for speed and disk usage gains	2024-08-08 11:55:20 -03:00
Rafael dos Santos Silva	eb93b21769	FEATURE: Add BGE-M3 embeddings support (#569) BAAI/bge-m3 is an interesting model that is multilingual and has a context size of 8192. Even with a 16x larger context, it is only 4x slower to compute its embeddings in the worst-case scenario. Also includes a minor refactor of the rake task, including setting model and concurrency levels when running the backfill task.	2024-04-10 17:24:01 -03:00
Roman Rizzi	1f1c94e5c6	FEATURE: AI Bot RAG support. (#537) This PR lets you associate uploads with an AI persona, which we'll split and generate embeddings from. When building the system prompt to get a bot reply, we'll do a similarity search followed by a re-ranking (if available). This will let us find the most relevant fragments from the body of knowledge you associated with the persona, resulting in better, more informed responses. For now, we'll only allow plain-text files, but this will change in the future. Commits: * FEATURE: RAG embeddings for the AI Bot This first commit introduces a UI where admins can upload text files, which we'll store, split into fragments, and generate embeddings of. In the next commit, we'll use those to give the bot additional information during conversations. * Basic asymmetric similarity search to provide guidance in system prompt * Fix tests and lint * Apply reranker to fragments * Uploads filter, CSS adjustments and file validations * Add placeholder for RAG fragments * Update annotations	2024-04-01 13:43:34 -03:00
Keegan George	b515b4f66d	FEATURE: AI Quick Semantic Search (#501) This PR adds AI semantic search to the search popup available on every page. It depends on several new and optional settings, like per-post embeddings and a reranker model, so this is an experimental endeavour. Co-authored-by: Rafael Silva <xfalcox@gmail.com>	2024-03-08 13:02:50 -03:00
Rafael dos Santos Silva	a1f1067f69	FIX: Lower truncation size for Gemini Embeddings (#493)	2024-02-28 08:52:53 +11:00
Rafael dos Santos Silva	59fbbb156b	DEV: Make indexing less frequent when related topics is disabled (#468)	2024-02-09 16:08:54 -03:00
Sam	ba3c3951cf	FIX: typo causing text_embedding_3_large to fail (#460)	2024-02-05 11:16:36 +11:00
Roman Rizzi	fba9c1bf2c	UX: Re-introduce embedding settings validations (#457) * Revert "Revert "UX: Validate embeddings settings (#455)" (#456)" This reverts commit `392e2e8aef`. * Restore previous default	2024-02-01 16:54:09 -03:00
Roman Rizzi	392e2e8aef	Revert "UX: Validate embeddings settings (#455)" (#456) This reverts commit `85fca89e01`.	2024-02-01 14:06:51 -03:00
Roman Rizzi	85fca89e01	UX: Validate embeddings settings (#455)	2024-02-01 13:05:38 -03:00
Sam	b2b01185f2	FEATURE: add support for new OpenAI embedding models (#445) * FEATURE: add support for new OpenAI embedding models This adds support for the just-released text_embedding_3_small and text_embedding_3_large. Note: we have not yet implemented truncation support, which is a new API feature (triggered using the dimensions parameter). * Tiny side fix: recalculate bots when AI is enabled or disabled * FIX: downsample to 2000 dimensions per vector, which is a pgvector limitation	2024-01-29 13:24:30 -03:00
Rafael dos Santos Silva	fa6bc7f409	FIX: Automatic embeddings index could fail if it existed in the backup schema (#441)	2024-01-24 15:57:26 -03:00
Rafael dos Santos Silva	705ef986b4	FIX: Set ivfflat.probes using topic count, not post count (#421) Fixes a regression from `140359c` that caused us to set this globally based on post count. This made the cost of an index scan on the topics table too high, so the planner (correctly) stopped using the index. Hopefully https://github.com/pgvector/pgvector/issues/235 lands soon.	2024-01-12 11:20:23 -03:00
Rafael dos Santos Silva	8fcba12fae	FEATURE: Support for SRV records for Discourse services (#414) This allows admins to configure services with multiple backends using DNS SRV records. This PR also adds support for shared-secret auth via headers for TEI and vLLM endpoints, so they are in line with the other ones.	2024-01-10 19:23:07 -03:00
Rafael dos Santos Silva	6fc1c9f7a6	FEATURE: Try to automatically handle larger embedding indexes (#403) * FEATURE: Try to automatically handle larger embedding indexes * linteeeeeeeer	2024-01-05 09:56:28 -03:00
Rafael dos Santos Silva	cec9bb8910	FIX: Skip embeddings for blank content (#392)	2023-12-29 14:59:08 -03:00
Rafael dos Santos Silva	140359c2ef	FEATURE: Per post embeddings (#387)	2023-12-29 12:28:45 -03:00
Rafael dos Santos Silva	1287ef4428	FEATURE: Support for Gemini Embeddings (#382)	2023-12-28 10:28:01 -03:00
Rafael dos Santos Silva	4d7ccdda2f	FEATURE: DNS SRV support for TEI (#363)	2023-12-18 13:21:21 -03:00
Rafael dos Santos Silva	381b0d74ca	FIX: Handle truncation in HyDE search (#342)	2023-12-07 10:36:56 -03:00
Sam	a0b9fb9721	FIX: explicitly load embedding strategies (#325) Otherwise, these constants are sometimes not loaded during tests, leading to flaky tests	2023-11-29 16:36:56 +11:00
Sam	6ddc17fd61	DEV: port directory structure to Zeitwerk (#319) Prior to this change, we relied on explicit loading for files in Discourse AI. This had a few downsides: - Busywork whenever you add a file (an extra require_relative) - We were not keeping to conventions internally ... some places used OpenAI, others OpenAi - The autoloader did not work, which led to lots of broken full-application reloads when developing. This moves all of DiscourseAI into a Zeitwerk-compatible structure. It also leaves a minimal amount of manual loading (automation, which loads into an existing namespace that may or may not be there). To avoid needing /lib/discourse_ai/... we mount a namespace, so we are able to keep /lib pointed at ::DiscourseAi. Various files were renamed to get around Zeitwerk rules and minimize usage of custom inflections. Though we could get custom inflections to work, it is not worth it: it would require a Discourse core patch, which means we would create a hard dependency.	2023-11-29 15:17:46 +11:00

25 Commits
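Commit 791fad1e6a describes a two-stage related-topics query. First, a bit index is scanned by Hamming distance to over-capture N*2 candidates. Then those candidates are re-ranked by inner product on the full vectors, and the top N are kept. The Python sketch below illustrates that idea in memory. It is not the plugin's Ruby/SQL implementation. It assumes sign-based binary quantization, where a component maps to 1 when it is positive. The helper names (`binary_quantize`, `hamming`, `related`) and the `overcapture` parameter are hypothetical.

```python
from typing import List, Sequence


def binary_quantize(vec: Sequence[float]) -> int:
    """Pack a float vector into an int bitmask: bit i is set when vec[i] > 0.

    Assumption: sign-based quantization, used here only for illustration.
    """
    bits = 0
    for i, x in enumerate(vec):
        if x > 0:
            bits |= 1 << i
    return bits


def hamming(a: int, b: int) -> int:
    """Count differing bits: XOR the masks, then count the set bits."""
    return bin(a ^ b).count("1")


def inner_product(a: Sequence[float], b: Sequence[float]) -> float:
    return sum(x * y for x, y in zip(a, b))


def related(query: Sequence[float], corpus: List[Sequence[float]],
            n: int, overcapture: int = 2) -> List[int]:
    """Return indices of the top-n corpus vectors (hypothetical helper)."""
    q_bits = binary_quantize(query)
    corpus_bits = [binary_quantize(v) for v in corpus]
    k = min(len(corpus), n * overcapture)
    # Stage 1: coarse candidates by Hamming distance (stands in for the bit index).
    candidates = sorted(range(len(corpus)),
                        key=lambda i: hamming(q_bits, corpus_bits[i]))[:k]
    # Stage 2: re-rank only those k candidates with full-precision inner product.
    candidates.sort(key=lambda i: inner_product(query, corpus[i]), reverse=True)
    return candidates[:n]


query = [1.0, 0.2, -0.3, 0.2]
corpus = [
    [0.9, 0.1, -0.2, 0.3],
    [0.8, 0.2, -0.1, 0.1],
    [-0.9, -0.1, 0.2, -0.3],
    [0.5, -0.5, 0.5, -0.5],
]
print(related(query, corpus, n=1))  # [0]
```

Corpus vectors 0 and 1 share the query's bit pattern, so the Hamming pass cannot tell them apart. The full-precision re-rank breaks the tie. In the design the commit describes, stage 1 is answered by the bit index, so only the N*2 candidates' halfvec vectors need to be read for re-ranking.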
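Commit 705ef986b4 sizes `ivfflat.probes` from the topic count rather than the post count. The commit does not state a formula. This sketch therefore assumes the starting-point guidance from the pgvector README:

- lists of about rows / 1000 for up to 1M rows, and sqrt(rows) above that;
- probes of about sqrt(lists).

The function name `ivfflat_params` and the row counts in the usage lines are hypothetical and only illustrate the difference between the two counts.

```python
import math
from typing import Tuple


def ivfflat_params(row_count: int) -> Tuple[int, int]:
    """Return (lists, probes) for an IVFFlat index over row_count rows.

    Assumption: pgvector README rule of thumb, not the plugin's own code.
    """
    if row_count <= 1_000_000:
        lists = max(1, row_count // 1000)  # rows / 1000 for up to 1M rows
    else:
        lists = math.isqrt(row_count)      # sqrt(rows) above 1M rows
    probes = max(1, math.isqrt(lists))     # sqrt(lists), set at query time
    return lists, probes


# Illustrative counts: size probes from the table the index covers (topics).
topics, posts = 500_000, 4_000_000
print(ivfflat_params(topics))  # (500, 22)
print(ivfflat_params(posts))   # (2000, 44)
```

Under these assumptions, deriving probes from the post count doubles it (44 instead of 22) for a topics index. A higher probe count makes an index scan look more expensive to the planner, which matches the effect the commit describes.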