discourse-ai

Commit Graph

Author	SHA1	Message	Date
Rafael dos Santos Silva	eb93b21769	FEATURE: Add BGE-M3 embeddings support (#569) BAAI/bge-m3 is an interesting model that is multilingual and has a context size of 8192. Even with a 16x larger context, it's only 4x slower to compute its embeddings in the worst-case scenario. Also includes a minor refactor of the rake task, including setting model and concurrency levels when running the backfill task.	2024-04-10 17:24:01 -03:00
Rafael dos Santos Silva	3b8f900486	FIX: Handle unicode on tokenizer (#515) * FIX: Handle unicode on tokenizer Our fast track code broke when strings had characters that are longer in tokens than in UTF-8. Admins can set `DISCOURSE_AI_STRICT_TOKEN_COUNTING: true` in app.yml to ensure token counting is strict, even if slower. Co-authored-by: wozulong <sidle.pax_0e@icloud.com>	2024-03-14 17:33:30 -03:00
Rafael dos Santos Silva	84cc369552	FEATURE: Bge-large-en embeddings via Cloudflare Workers AI API (#241) * FEATURE: Bge-large-en embeddings via Cloudflare Workers AI API * forgot a file * lint	2023-10-04 13:47:51 -03:00
Rafael dos Santos Silva	3e7c99de89	FEATURE: Support for locally inferred embeddings in 100 languages (#115) * FEATURE: Support for locally inferred embeddings in 100 languages * add table	2023-07-27 15:50:03 -03:00
Rafael dos Santos Silva	b25daed60b	FEATURE: Llama2 for summarization (#116)	2023-07-27 13:55:32 -03:00
Rafael dos Santos Silva	b82074850e	DEV: Add tests to allmpnet tokenizer (#107) * DEV: Add tests to allmpnet tokenizer * lint	2023-07-14 11:37:21 -03:00

6 Commits