discourse-ai/app/jobs/regular/generate_rag_embeddings.rb
Sam 830cc26075
FEATURE: Add metadata support for RAG (#553)
* FEATURE: Add metadata support for RAG

You may include non-indexed metadata in a RAG document by using

[[metadata ....]]

This information is attached to all the text below and provided to
the retriever.
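As a rough illustration of the behaviour described above, the sketch below splits a document on `[[metadata ...]]` markers and attaches each marker's contents to the text that follows it. The constant `METADATA_MARKER` and the helper `split_with_metadata` are hypothetical names made up for this example. This is not the plugin's actual parser, and the real marker syntax may differ.

```ruby
# Hypothetical sketch, not the plugin's parser.
# Assumes markers look like [[metadata some free text]].
METADATA_MARKER = /\[\[metadata (.*?)\]\]/m

# Returns an array of { metadata:, text: } hashes. Text that appears before
# the first marker gets nil metadata.
def split_with_metadata(document)
  sections = []
  current_metadata = nil

  # String#split with a capture group keeps the captured text in the result,
  # so even indexes are body text and odd indexes are metadata.
  document.split(METADATA_MARKER).each_with_index do |part, index|
    if index.odd?
      current_metadata = part.strip
    else
      body = part.strip
      sections << { metadata: current_metadata, text: body } unless body.empty?
    end
  end

  sections
end
```

Each resulting section carries the metadata of the nearest marker above it, which is the piece a retriever would receive alongside the matching text.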

This lets RAG operate across a wide range of contexts
without getting lost.

Also:

- re-implemented chunking algorithm so it streams
- moved indexing to background low priority queue
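
To make "so it streams" concrete, here is a minimal sketch of a chunker that reads its input in small pieces and yields each chunk as soon as it is complete, instead of loading the whole document first. It is an assumption-laden stand-in: the method name `each_chunk`, the `max_words` and `read_size` parameters, and the word-based chunk size are all invented for this example, whereas the real implementation presumably measures chunks in tokenizer tokens.

```ruby
require "stringio"

# Hypothetical sketch of a streaming chunker (word-based, not token-based).
# Yields strings of at most max_words words while reading io piece by piece.
def each_chunk(io, max_words: 3, read_size: 16)
  return enum_for(:each_chunk, io, max_words: max_words, read_size: read_size) unless block_given?

  words = []
  carry = +""

  while (piece = io.read(read_size))
    carry << piece
    # The last token may be cut mid-word, so hold it back until more input arrives.
    tokens = carry.split(/\s+/, -1)
    carry = tokens.pop || +""

    tokens.each do |token|
      next if token.empty?
      words << token
      if words.length == max_words
        yield words.join(" ")
        words = []
      end
    end
  end

  # Flush whatever is left once the input is exhausted.
  words << carry unless carry.empty?
  yield words.join(" ") unless words.empty?
end
```

Because the method returns an enumerator when called without a block, a caller can also pull chunks lazily with `each_chunk(io).each_slice(...)` or similar.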

* Baran gem no longer required.

* tokenizers is on 4.4 ... upgrade it ...
2024-04-04 11:02:16 -03:00


# frozen_string_literal: true

module ::Jobs
  class GenerateRagEmbeddings < ::Jobs::Base
    sidekiq_options queue: "low"

    def execute(args)
      return if (fragments = RagDocumentFragment.where(id: args[:fragment_ids].to_a)).empty?

      truncation = DiscourseAi::Embeddings::Strategies::Truncation.new
      vector_rep =
        DiscourseAi::Embeddings::VectorRepresentations::Base.current_representation(truncation)

      # generate_representation_from compares the digest value to make sure
      # the embedding is only generated once per fragment unless something changes.
      fragments.map { |fragment| vector_rep.generate_representation_from(fragment) }
    end
  end
end
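
The comment in `execute` says the embedding is only regenerated when the digest changes. The sketch below shows that general idea in isolation, using an in-memory hash in place of the database. `DigestCache`, its `fetch` method, and the choice of SHA-1 are assumptions made for illustration and say nothing about how `generate_representation_from` is actually implemented.

```ruby
require "digest"

# Hypothetical in-memory stand-in for a digest-guarded embeddings store.
class DigestCache
  def initialize
    @store = {}
  end

  # Returns the cached embedding for id when the text digest is unchanged.
  # Otherwise calls the block to compute a new embedding and stores it.
  def fetch(id, text)
    digest = Digest::SHA1.hexdigest(text)
    cached = @store[id]
    return cached[:embedding] if cached && cached[:digest] == digest

    embedding = yield(text)
    @store[id] = { digest: digest, embedding: embedding }
    embedding
  end
end
```

With a guard like this, re-running the job for the same fragment is cheap: the expensive embedding call only happens when the fragment text has changed.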