d692ecc7de
The tokenizer was truncating and padding to 128 tokens, while we kept appending new post content until we hit 384 tokens. Since the measured token count was capped at 128, the 384-token limit was never reached, so the tokenizer accepted every post in a topic, wasting CPU and memory.
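A minimal Python sketch of the bug, assuming a Hugging Face tokenizers-style API; the `build_context` helper, the `MAX_TOKENS` constant, and the loop structure are illustrative, not the project's actual code:

```python
from tokenizers import Tokenizer

MAX_TOKENS = 384  # intended per-topic token budget (hypothetical constant name)

tokenizer = Tokenizer.from_file("bert-base-uncased.json")

# Buggy configuration: every encoding is truncated and padded to 128 tokens,
# so the measured length is capped at 128 and can never exceed MAX_TOKENS.
tokenizer.enable_truncation(max_length=128)
tokenizer.enable_padding(length=128)

def build_context(posts):
    context = ""
    for post in posts:
        candidate = context + "\n" + post
        # With truncation enabled, this count is always <= 128, so the
        # check never trips and every post in the topic is appended.
        if len(tokenizer.encode(candidate).ids) > MAX_TOKENS:
            break
        context = candidate
    return context

# The fix: count tokens without truncation or padding, so the
# 384-token budget is actually enforced.
tokenizer.no_truncation()
tokenizer.no_padding()
```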
File | License
---|---
README.md |
all-mpnet-base-v2.json | Apache License
bert-base-uncased.json | Apache License
claude-v1-tokenization.json | MIT License