discourse-ai/tokenizers
Rafael dos Santos Silva d692ecc7de
FIX: Disable truncation and padding in all-mpnet-base-v2 tokenizer (#105)
The tokenizer was truncating and padding every input to 128 tokens, while we
try to append new post content until we hit 384 tokens. Because every
truncated count stayed below that limit, the loop never stopped and accepted
all posts in a topic, wasting CPU and memory.
2023-07-13 21:09:46 -03:00
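
The change itself edits the JSON export directly; as a rough illustration, the same result could be produced with the Hugging Face `tokenizers` Python library (a sketch of equivalent tooling, not the actual method used in #105):

```python
from tokenizers import Tokenizer

# Load the exported tokenizer config from this directory.
tok = Tokenizer.from_file("all-mpnet-base-v2.json")

# Clear the truncation and padding settings baked into the export, so an
# encode() call reports the real token count instead of capping at 128.
tok.no_truncation()
tok.no_padding()

# Write the cleaned config back out.
tok.save("all-mpnet-base-v2.json")
```

With truncation disabled, `len(tok.encode(text).tokens)` grows with the input, so the 384-token accumulation check can actually trigger.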
| File | Last commit | Date |
| --- | --- | --- |
| Apache License | Refinements to embeddings and tokenizers (#61) | 2023-05-15 15:10:42 -03:00 |
| MIT License | Refinements to embeddings and tokenizers (#61) | 2023-05-15 15:10:42 -03:00 |
| README.md | FEATURE: Embeddings to main db (#99) | 2023-07-13 12:41:36 -03:00 |
| all-mpnet-base-v2.json | FIX: Disable truncation and padding in all-mpnet-base-v2 tokenizer (#105) | 2023-07-13 21:09:46 -03:00 |
| bert-base-uncased.json | FEATURE: Add a basic tokenizer API (#37) | 2023-04-19 11:55:59 -03:00 |
| claude-v1-tokenization.json | Refinements to embeddings and tokenizers (#61) | 2023-05-15 15:10:42 -03:00 |

README.md

- bert-base-uncased.json: licensed under the Apache License
- claude-v1-tokenization.json: licensed under the MIT License
- all-mpnet-base-v2.json: licensed under the Apache License