mirror of
https://github.com/discourse/discourse-ai.git
synced 2025-02-06 19:48:15 +00:00
BAAI/bge-m3 is an interesting multilingual model with a context size of 8192 tokens. Even with a 16x larger context, computing its embeddings is only 4x slower in the worst case. This change also includes a minor refactor of the rake task, which now allows setting the model and concurrency level when running the backfill task.
12 lines
258 B
Ruby
# frozen_string_literal: true

module DiscourseAi
  module Tokenizer
    class BgeM3Tokenizer < BasicTokenizer
      def self.tokenizer
        @@tokenizer ||= Tokenizers.from_file("./plugins/discourse-ai/tokenizers/bge-m3.json")
      end
    end
  end
end
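The `self.tokenizer` method above relies on `||=` so that the tokenizer file is read only on first use and the resulting object is reused afterwards. The following is a minimal, self-contained sketch of that memoization pattern. `FakeLoader`, `SketchTokenizer`, and the `tokenizers/example.json` path are hypothetical names introduced purely for illustration; `FakeLoader.from_file` stands in for `Tokenizers.from_file` and only counts how often it is called.

```ruby
# frozen_string_literal: true

# Hypothetical stand-in for Tokenizers.from_file (not the real gem).
# It records how many times a "load" was requested.
module FakeLoader
  @loads = 0

  class << self
    attr_reader :loads

    def from_file(path)
      @loads += 1
      "tokenizer loaded from #{path}"
    end
  end
end

# Hypothetical class mirroring the shape of BgeM3Tokenizer.
class SketchTokenizer
  def self.tokenizer
    # ||= assigns only when @@tokenizer is unset (or nil/false),
    # so the loader runs on the first call and is skipped afterwards.
    @@tokenizer ||= FakeLoader.from_file("tokenizers/example.json")
  end
end

first = SketchTokenizer.tokenizer
second = SketchTokenizer.tokenizer

puts first.equal?(second) # both calls return the same object
puts FakeLoader.loads     # number of times the loader actually ran
```

Under these assumptions, repeated calls to `SketchTokenizer.tokenizer` hand back the same object, which is presumably why the plugin can call `tokenizer` freely without re-reading the JSON file each time.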