discourse-ai/lib/tokenizer
Rafael dos Santos Silva eb93b21769
FEATURE: Add BGE-M3 embeddings support (#569)
BAAI/bge-m3 is an interesting model: it is multilingual and has a
context size of 8192. Even with a 16x larger context, it is only 4x slower
to compute its embeddings in the worst-case scenario.

Also includes a minor refactor of the rake task, which now allows setting
the model and concurrency level when running the backfill task.
2024-04-10 17:24:01 -03:00
all_mpnet_base_v2_tokenizer.rb DEV: port directory structure to Zeitwerk (#319) 2023-11-29 15:17:46 +11:00
anthropic_tokenizer.rb DEV: port directory structure to Zeitwerk (#319) 2023-11-29 15:17:46 +11:00
basic_tokenizer.rb FIX: Handle unicode on tokenizer (#515) 2024-03-14 17:33:30 -03:00
bert_tokenizer.rb DEV: port directory structure to Zeitwerk (#319) 2023-11-29 15:17:46 +11:00
bge_large_en_tokenizer.rb DEV: port directory structure to Zeitwerk (#319) 2023-11-29 15:17:46 +11:00
bge_m3_tokenizer.rb FEATURE: Add BGE-M3 embeddings support (#569) 2024-04-10 17:24:01 -03:00
llama2_tokenizer.rb DEV: port directory structure to Zeitwerk (#319) 2023-11-29 15:17:46 +11:00
mixtral_tokenizer.rb Mixtral (#376) 2023-12-26 14:49:55 -03:00
multilingual_e5_large_tokenizer.rb DEV: port directory structure to Zeitwerk (#319) 2023-11-29 15:17:46 +11:00
open_ai_tokenizer.rb FIX: Handle unicode on tokenizer (#515) 2024-03-14 17:33:30 -03:00