3b8f900486
* FIX: Handle unicode on tokenizer. Our fast track code broke when strings had characters whose token count is longer than their UTF-8 length. Admins can set `DISCOURSE_AI_STRICT_TOKEN_COUNTING: true` in app.yml to ensure token counting is strict, even if slower. Co-authored-by: wozulong <sidle.pax_0e@icloud.com>
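For reference, a minimal sketch of how this setting could be enabled in a Discourse docker install, assuming the usual `env:` section of `containers/app.yml` (the exact placement may differ for your installation):

```yaml
# containers/app.yml (Discourse docker install) -- illustrative sketch
env:
  # ... existing environment variables ...

  # Force strict (full) token counting instead of the fast path.
  # Slower, but exact for strings containing characters that expand
  # to more tokens than UTF-8 bytes.
  DISCOURSE_AI_STRICT_TOKEN_COUNTING: true
```

After editing `app.yml`, the container needs to be rebuilt for the new environment variable to take effect.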
all_mpnet_base_v2_tokenizer.rb
anthropic_tokenizer.rb
basic_tokenizer.rb
bert_tokenizer.rb
bge_large_en_tokenizer.rb
llama2_tokenizer.rb
mixtral_tokenizer.rb
multilingual_e5_large_tokenizer.rb
open_ai_tokenizer.rb