discourse-ai/app/jobs/regular/generate_rag_embeddings.rb
Roman Rizzi 534b0df391
REFACTOR: Separation of concerns for embedding generation. (#1027)
In a previous refactor, we moved the responsibility of querying and storing embeddings into the `Schema` class. Now, it's time for embedding generation.

The motivation behind these changes is to isolate vector characteristics in simple objects to later replace them with a DB-backed version, similar to what we did with LLM configs.
2024-12-16 09:55:39 -03:00

26 lines
923 B
Ruby

# frozen_string_literal: true

module ::Jobs
  class GenerateRagEmbeddings < ::Jobs::Base
    # We could also restrict concurrency, but this job takes very long
    # when it does not run concurrently.
    sidekiq_options queue: "ultra_low"

    def execute(args)
      return if (fragments = RagDocumentFragment.where(id: args[:fragment_ids].to_a)).empty?

      vector = DiscourseAi::Embeddings::Vector.instance

      # generate_representation_from compares the digest value to make sure
      # the embedding is only generated once per fragment unless something changes.
      fragments.each { |fragment| vector.generate_representation_from(fragment) }

      # All fragments in a batch are assumed to share the same target and upload,
      # so the last one is enough to look up and publish the indexing status.
      last_fragment = fragments.last
      target = last_fragment.target
      upload = last_fragment.upload

      indexing_status = RagDocumentFragment.indexing_status(target, [upload])[upload.id]
      RagDocumentFragment.publish_status(upload, indexing_status)
    end
  end
end
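
The comment in `execute` says that `generate_representation_from` compares a digest so that each fragment is embedded only once unless something changes. The standalone sketch below illustrates that general idea only; it is not the plugin's implementation. `Fragment`, `DigestGuardedGenerator`, `fake_embed`, and the choice of SHA1 are all hypothetical stand-ins introduced for illustration, and the real `Vector` class presumably calls an embedding service and persists results through the `Schema` class mentioned in the commit message.

```ruby
require "digest"

# Hypothetical stand-in for a fragment record; not the plugin's model.
Fragment = Struct.new(:id, :text)

# Hypothetical in-memory generator that skips work when the text is unchanged.
class DigestGuardedGenerator
  attr_reader :calls

  def initialize
    @store = {} # fragment id => { digest:, embedding: }
    @calls = 0  # number of times the (fake) embedding model was invoked
  end

  def generate_representation_from(fragment)
    digest = Digest::SHA1.hexdigest(fragment.text)
    cached = @store[fragment.id]

    # Same digest as last time: reuse the stored embedding.
    return cached[:embedding] if cached && cached[:digest] == digest

    @calls += 1
    embedding = fake_embed(fragment.text)
    @store[fragment.id] = { digest: digest, embedding: embedding }
    embedding
  end

  private

  # Placeholder for a real model call; returns a small array of floats.
  def fake_embed(text)
    text.bytes.each_slice(4).map { |slice| slice.sum / 255.0 }
  end
end

generator = DigestGuardedGenerator.new
fragment = Fragment.new(1, "hello world")

generator.generate_representation_from(fragment) # first call: generated
generator.generate_representation_from(fragment) # unchanged text: skipped
fragment.text = "hello again"
generator.generate_representation_from(fragment) # changed text: regenerated
```

Under these assumptions, re-enqueuing the job for fragments that were already processed costs one digest comparison per fragment rather than a new embedding request, which is what makes it safe to retry.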