The tokenizer was truncating and padding every input to 128 tokens, while we try to append new post content until we hit 384 tokens. Because the reported token count could never exceed 128, the 384-token limit was never reached, so every post in a topic was accepted, wasting CPU and memory.
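A minimal sketch of the failure mode and the fix, using the Hugging Face `tokenizers` library in Python (the project itself is Ruby, so this is illustrative only; the `build_embedding_input` helper and the exact calls are assumptions, not the actual discourse-ai code): with truncation and padding pinned to 128 tokens, the length check against the 384-token budget can never trigger, so every post gets appended.

```python
# Illustrative sketch only: the real project code is Ruby, and
# build_embedding_input is a hypothetical helper, not the actual method.
from tokenizers import Tokenizer

tokenizer = Tokenizer.from_file("bert-base-uncased.json")

# Buggy setup: every encode() is truncated/padded to exactly 128 tokens,
# so the budget check below never exceeds 128 and always passes.
#   tokenizer.enable_truncation(max_length=128)
#   tokenizer.enable_padding(length=128)

# Fix: disable truncation and padding so token counts reflect the real text.
tokenizer.no_truncation()
tokenizer.no_padding()

def build_embedding_input(posts, budget=384):
    """Append posts until the tokenized text would exceed the token budget."""
    text = ""
    for post in posts:
        candidate = f"{text}\n{post}" if text else post
        if len(tokenizer.encode(candidate).ids) > budget:
            break
        text = candidate
    return text
```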
bert-base-uncased.json: Licensed under the Apache License
claude-v1-tokenization.json: Licensed under the MIT License
all-mpnet-base-v2.json: Licensed under the Apache License