The tokenizer was truncating and padding every input to 128 tokens, while we try to append new post content until we hit 384 tokens. Because the reported token count could never exceed 128, the 384-token limit was never reached, so every post in a topic was accepted, wasting CPU and memory.
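A minimal sketch of the failure mode and the fix, using the Hugging Face `tokenizers` library in Python (the project itself is Ruby, so this is illustrative only; the `build_embedding_input` helper and the exact calls are assumptions, not the actual discourse-ai code): with truncation and padding pinned to 128 tokens, the length check against the 384-token budget can never trigger, so every post gets appended.

```python
# Illustrative sketch only: the real project code is Ruby, and
# build_embedding_input is a hypothetical helper, not the actual method.
from tokenizers import Tokenizer

tokenizer = Tokenizer.from_file("bert-base-uncased.json")

# Buggy setup: every encode() is truncated/padded to exactly 128 tokens,
# so the budget check below never exceeds 128 and always passes.
#   tokenizer.enable_truncation(max_length=128)
#   tokenizer.enable_padding(length=128)

# Fix: disable truncation and padding so token counts reflect the real text.
tokenizer.no_truncation()
tokenizer.no_padding()

def build_embedding_input(posts, budget=384):
    """Append posts until the tokenized text would exceed the token budget."""
    text = ""
    for post in posts:
        candidate = f"{text}\n{post}" if text else post
        if len(tokenizer.encode(candidate).ids) > budget:
            break
        text = candidate
    return text
```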
bert-base-uncased.json: Licensed under the Apache License
claude-v1-tokenization.json: Licensed under the MIT License
all-mpnet-base-v2.json: Licensed under the Apache License