Rafael dos Santos Silva d692ecc7de
FIX: Disable truncation and padding in all-mpnet-base-v2 tokenizer (#105)
The tokenizer was truncating and padding every input to 128 tokens, while we
try to append new post content until we hit 384 tokens. Because the token
count never exceeded 128, the limit was never reached, so all posts in a
topic were accepted, wasting CPU and memory.
2023-07-13 21:09:46 -03:00
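
The failure mode described in the commit message can be reproduced with a small, self-contained sketch. Everything here is hypothetical and for illustration only: `tokenize` is a toy whitespace tokenizer standing in for all-mpnet-base-v2, and `collect_posts` is an assumed shape for the "append until 384 tokens" loop, not the project's actual code.

```python
from typing import List, Optional

PAD = "<pad>"  # hypothetical padding token for the toy tokenizer


def tokenize(text: str, fixed_len: Optional[int] = None) -> List[str]:
    """Toy whitespace tokenizer (stand-in, not the real model tokenizer).

    When fixed_len is set, the output is truncated and padded so that it
    always has exactly fixed_len tokens, mimicking the old configuration.
    """
    tokens = text.split()
    if fixed_len is not None:
        tokens = tokens[:fixed_len]
        tokens += [PAD] * (fixed_len - len(tokens))
    return tokens


def collect_posts(posts: List[str], max_tokens: int = 384,
                  fixed_len: Optional[int] = None) -> List[str]:
    """Append posts until the combined text would exceed max_tokens.

    Assumed loop shape: stop at the first post that pushes the token
    count past the limit.
    """
    text = ""
    accepted: List[str] = []
    for post in posts:
        candidate = (text + " " + post).strip()
        if len(tokenize(candidate, fixed_len)) > max_tokens:
            break
        text = candidate
        accepted.append(post)
    return accepted


# Ten posts of 100 words each.
posts = ["word " * 100] * 10

# Truncation and padding to 128: the count is always 128, so nothing is rejected.
accepted_with_padding = collect_posts(posts, fixed_len=128)

# Truncation and padding disabled: the real count grows and the limit applies.
accepted_without_padding = collect_posts(posts)
```

With the fixed 128-token output the loop accepts all ten posts; with truncation and padding disabled it stops after three, because the fourth would bring the total to 400 tokens. In the Hugging Face `tokenizers` Python library the corresponding switches are `Tokenizer.no_truncation()` and `Tokenizer.no_padding()`; whether this commit used an equivalent call or changed the tokenizer JSON directly is not stated here.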

bert-base-uncased.json

Licensed under Apache License

claude-v1-tokenization.json

Licensed under MIT License

all-mpnet-base-v2.json

Licensed under Apache License