mirror of
https://github.com/discourse/discourse-ai.git
synced 2025-03-06 17:30:20 +00:00
* FEATURE: Add metadata support for RAG

You may include non-indexed metadata in the RAG document by using [[metadata ....]]. This information is attached to all the text below it and provided to the retriever. This allows RAG to operate across a rich set of contexts without getting lost.

Also:
- re-implemented the chunking algorithm so it streams
- moved indexing to a background low-priority queue

* Baran gem no longer required.
* tokenizers is on 4.4 ... upgrade it ...
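To make the `[[metadata ....]]` behavior described in the commit message concrete, here is a minimal sketch of how a marker could be attached to all the text that follows it. This is not the plugin's chunker: the method name `split_by_metadata`, the regular expression, and the returned hash shape are all assumptions made for illustration.

```ruby
# Hypothetical sketch, not the discourse-ai implementation.
# Matches a marker such as "[[metadata product: A]]" and captures its body.
METADATA_MARKER = /\[\[metadata (.*?)\]\]/m

# Splits a document into sections, pairing each run of text with the
# metadata marker that precedes it. Text before the first marker gets
# nil metadata.
def split_by_metadata(document)
  # String#split keeps the captured group, so the result alternates:
  # [text, metadata, text, metadata, text, ...]
  head, *rest = document.split(METADATA_MARKER)

  sections = []
  sections << { metadata: nil, text: head.strip } unless head.nil? || head.strip.empty?

  rest.each_slice(2) do |metadata, text|
    text = text.to_s.strip
    next if text.empty? # a marker with no text below it yields no section
    sections << { metadata: metadata.strip, text: text }
  end

  sections
end
```

A retriever built on top of this could then pass each section's `:metadata` value along with the matching chunk, while only `:text` is indexed.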
20 lines
697 B
Ruby
# frozen_string_literal: true

module ::Jobs
  class GenerateRagEmbeddings < ::Jobs::Base
    sidekiq_options queue: "low"

    def execute(args)
      return if (fragments = RagDocumentFragment.where(id: args[:fragment_ids].to_a)).empty?

      truncation = DiscourseAi::Embeddings::Strategies::Truncation.new
      vector_rep =
        DiscourseAi::Embeddings::VectorRepresentations::Base.current_representation(truncation)

      # generate_representation_from compares the digest value to make sure
      # the embedding is only generated once per fragment unless something changes.
      fragments.map { |fragment| vector_rep.generate_representation_from(fragment) }
    end
  end
end
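The comment in the job says the embedding is regenerated only when a fragment's digest changes. A minimal, self-contained sketch of that idea follows. The `Fragment` struct, the `DigestCachingEmbedder` class, the in-memory digest store, and the use of SHA-256 are all assumptions for illustration; the plugin's own `generate_representation_from` may store and compare digests differently.

```ruby
require "digest"

# Hypothetical stand-in for a RAG document fragment.
Fragment = Struct.new(:id, :text)

# Hypothetical embedder that skips work when the text digest is unchanged.
class DigestCachingEmbedder
  attr_reader :calls

  def initialize
    @digests = {} # fragment id => digest of the text that was last embedded
    @calls = 0    # number of times the (expensive) embedding step ran
  end

  # Returns the new embedding, or nil when the fragment is unchanged.
  def generate_representation_from(fragment)
    digest = Digest::SHA256.hexdigest(fragment.text)
    return if @digests[fragment.id] == digest

    @calls += 1
    @digests[fragment.id] = digest
    fake_embedding(fragment.text)
  end

  private

  # Placeholder for a real model call; returns a one-element vector.
  def fake_embedding(text)
    [text.length.to_f]
  end
end
```

Because the check is keyed on the digest, re-running the job with the same `fragment_ids` is cheap: unchanged fragments return early, and only edited fragments reach the embedding step.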