discourse-ai

Rafael dos Santos Silva 3b8f900486 FIX: Handle unicode on tokenizer (#515) Our fast-track code broke when strings had characters that are longer in tokens than in UTF-8. Admins can set `DISCOURSE_AI_STRICT_TOKEN_COUNTING: true` in app.yml to ensure token counting is strict, even if slower. Co-authored-by: wozulong <sidle.pax_0e@icloud.com>		2024-03-14 17:33:30 -03:00
all_mpnet_base_v2_tokenizer.rb	DEV: port directory structure to Zeitwerk (#319)	2023-11-29 15:17:46 +11:00
anthropic_tokenizer.rb	DEV: port directory structure to Zeitwerk (#319)	2023-11-29 15:17:46 +11:00
basic_tokenizer.rb	FIX: Handle unicode on tokenizer (#515)	2024-03-14 17:33:30 -03:00
bert_tokenizer.rb	DEV: port directory structure to Zeitwerk (#319)	2023-11-29 15:17:46 +11:00
bge_large_en_tokenizer.rb	DEV: port directory structure to Zeitwerk (#319)	2023-11-29 15:17:46 +11:00
llama2_tokenizer.rb	DEV: port directory structure to Zeitwerk (#319)	2023-11-29 15:17:46 +11:00
mixtral_tokenizer.rb	Mixtral (#376)	2023-12-26 14:49:55 -03:00
multilingual_e5_large_tokenizer.rb	DEV: port directory structure to Zeitwerk (#319)	2023-11-29 15:17:46 +11:00
open_ai_tokenizer.rb	FIX: Handle unicode on tokenizer (#515)	2024-03-14 17:33:30 -03:00
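
The latest commit message describes a fast path that breaks on some unicode strings and a strict mode that is slower but always correct. The following is a minimal sketch of that trade-off, not the plugin's actual code. `ToyTokenizer`, `fits_fast?`, `fits_strict?` and `fits?` are hypothetical names, and the token costs are invented for illustration: ASCII characters cost one token, and any other character is assumed to cost one token more than its UTF-8 byte length.

```ruby
# Hypothetical toy tokenizer, NOT the plugin's implementation.
# Assumed costs: 1 token per ASCII character, and (UTF-8 byte length + 1)
# tokens for any other character, so some characters are "longer in tokens
# than in UTF-8".
module ToyTokenizer
  def self.token_count(text)
    text.each_char.sum { |ch| ch.ascii_only? ? 1 : ch.bytesize + 1 }
  end
end

# Fast path: skip tokenization and assume the UTF-8 byte size is an upper
# bound on the token count. Cheap, but wrong under the assumed costs above.
def fits_fast?(text, max_tokens)
  text.bytesize <= max_tokens
end

# Strict path: always run the tokenizer and compare the real count.
def fits_strict?(text, max_tokens)
  ToyTokenizer.token_count(text) <= max_tokens
end

# Hypothetical switch standing in for a strict-counting setting.
def fits?(text, max_tokens, strict: false)
  strict ? fits_strict?(text, max_tokens) : fits_fast?(text, max_tokens)
end

puts fits?("hello", 5)                # ASCII only: both paths agree
puts fits?("héllo", 6)                # fast path accepts the string
puts fits?("héllo", 6, strict: true)  # strict path rejects it
```

In this sketch the fast path accepts `"héllo"` against a budget of 6 because the string is 6 bytes long, while the toy tokenizer counts 7 tokens. Enabling the strict path trades that shortcut for a full tokenization on every call.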