2024-10-15 13:53:26 -03:00
# frozen_string_literal: true

module DiscourseAi
  module Summarization
    module Strategies
      class ChatMessages < Base
        def type
          AiSummary.summary_types[:complete]
        end

        # FIX: Make summaries backfill job more resilient. (#1071)
        #
        # To quickly select backfill candidates without comparing SHAs, we compare the
        # last summarized post to the topic's highest_post_number. However, hiding or
        # deleting a post and adding a small action will update this column, causing the
        # job to stall and re-generate the same summary repeatedly until someone posts a
        # regular reply. On top of this, the comparison does not always hold for topics
        # with `best_replies`, as the last reply isn't necessarily included.
        #
        # Since this is not evident at first glance and each summarization strategy picks
        # its targets differently, I'm opting to simplify the backfill logic and how we
        # track potential candidates.
        #
        # The first step is dropping `content_range`, which serves no purpose and exists
        # only because summary caching was originally supposed to work differently. I'm
        # replacing it with a column called `highest_target_number`, which tracks
        # `highest_post_number` for topics and could track other things, such as a
        # channel's `message_count`, in the future.
        #
        # Now that we have this column, when selecting every potential backfill candidate
        # we'll check whether the summary is truly outdated by comparing the SHAs; if it's
        # not, we just update the column and move on.
        #
        # 2025-01-16 09:42:53 -03:00
        def highest_target_number
          nil # We don't persist so we can return nil.
        end

        def initialize(target, since)
          super(target)
          @since = since
        end

        def targets_data
          # `since` is a number of hours; only messages created after that cutoff are included.
          target
            .chat_messages
            .where("chat_messages.created_at > ?", since.hours.ago)
            .includes(:user)
            .order(created_at: :asc)
            .pluck(:id, :username_lower, :message)
            .map { { id: _1, poster: _2, text: _3 } }
        end

        def summary_extension_prompt(summary, contents)
          # One line per message: "(<id> <poster> said: <text>)".
          input =
            contents
              .map { |item| "(#{item[:id]} #{item[:poster]} said: #{item[:text]})" }
              .join("\n")

          prompt = DiscourseAi::Completions::Prompt.new(<<~TEXT.strip)
            You are a summarization bot tasked with expanding on an existing summary by incorporating new chat messages.
            Your goal is to seamlessly integrate the additional information into the existing summary, preserving the clarity and insights of the original while reflecting any new developments, themes, or conclusions.
            Analyze the new messages to identify key themes, participants' intentions, and any significant decisions or resolutions.
            Update the summary to include these aspects in a way that remains concise, comprehensive, and accessible to someone with no prior context of the conversation.

            ### Guidelines:

            - Merge the new information naturally with the existing summary without redundancy.
            - Only include the updated summary, WITHOUT additional commentary.
            - Don't mention the channel title. Avoid extraneous details or subjective opinions.
            - Maintain the original language of the text being summarized.
            - The same user may write multiple messages in a row; don't treat them as different people.
            - Aim for summaries to be extended by a reasonable amount, but strive to maintain a total length of 400 words or less, unless absolutely necessary for comprehensiveness.
          TEXT

          prompt.push(type: :user, content: <<~TEXT.strip)
            ### Context:

            This is the existing summary:

            #{summary}

            These are the new chat messages:

            #{input}

            Integrate the new messages into the existing summary.
          TEXT

          prompt
        end

        def first_summary_prompt(contents)
          content_title = target.name
          # One line per message, matching the format used in summary_extension_prompt.
          input =
            contents.map { |item| "(#{item[:id]} #{item[:poster]} said: #{item[:text]})" }.join("\n")

          prompt = DiscourseAi::Completions::Prompt.new(<<~TEXT.strip)
            You are a summarization bot designed to generate clear and insightful paragraphs that convey the main topics
            and developments from a series of chat messages within a user-selected time window.

            Analyze the messages to extract key themes, participants' intentions, and any significant conclusions or decisions.
            Your summary should be concise yet comprehensive, providing an overview that is accessible to someone with no prior context of the conversation.

            - Only include the summary, WITHOUT additional commentary.
            - Don't mention the channel title. Avoid including extraneous details or subjective opinions.
            - Maintain the original language of the text being summarized.
            - The same user may write multiple messages in a row; don't treat them as different people.
            - Aim for summaries to be 400 words or less.
          TEXT

          prompt.push(type: :user, content: <<~TEXT.strip)
            #{content_title.present? ? "The name of the channel is: " + content_title + ".\n" : ""}

            Here are the messages, inside <input></input> XML tags:

            <input>
            #{input}
            </input>

            Generate a summary of the given chat messages.
          TEXT

          prompt
        end

        private

        attr_reader :since
      end
    end
  end
end
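
The row mapping in `targets_data` and the per-message line format built by the prompt methods can be tried outside Rails. The sketch below is a standalone illustration and not part of the plugin: `rows` imitates what `pluck(:id, :username_lower, :message)` would return, and `to_items` and `format_items` are hypothetical helper names introduced only for this example.

```ruby
# Standalone sketch (assumption: plain arrays stand in for plucked rows).
# `to_items` and `format_items` are hypothetical names, not plugin methods.

def to_items(rows)
  # Numbered block parameters need Ruby 2.7+. Because _3 is referenced, the
  # block takes three parameters and each row array is spread across them.
  rows.map { { id: _1, poster: _2, text: _3 } }
end

def format_items(items)
  # One "(<id> <poster> said: <text>)" line per message.
  items.map { |item| "(#{item[:id]} #{item[:poster]} said: #{item[:text]})" }.join("\n")
end

rows = [[1, "alice", "Deploy is done"], [2, "bob", "Thanks!"]]
puts format_items(to_items(rows))
# (1 alice said: Deploy is done)
# (2 bob said: Thanks!)
```

Keeping each message on its own line gives the model a clear boundary between messages, which matters because the same poster can appear several times in a row.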
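
The backfill check described in the commit message for #1071 can be sketched in plain Ruby. This is only an illustration under stated assumptions, not the job's actual code: `StoredSummary`, its `sha` field, `content_sha`, and `backfill_action` are hypothetical names, and SHA-256 over the joined message text is an assumed digest, since the source only says that SHAs are compared.

```ruby
require "digest"

# Hypothetical stand-in for a persisted summary record.
StoredSummary = Struct.new(:sha, :highest_target_number, keyword_init: true)

# Hypothetical helper: digest of the content a summary was generated from.
def content_sha(items)
  Digest::SHA256.hexdigest(items.map { |item| item[:text] }.join)
end

# Decides what a backfill pass would do for one candidate:
#   :fresh      - the tracked number matches, so it is not a candidate
#   :regenerate - the content digest differs, so the summary is truly outdated
#   :bumped     - only the number moved; record it and move on
def backfill_action(summary, items, current_target_number)
  return :fresh if summary.highest_target_number == current_target_number
  return :regenerate if summary.sha != content_sha(items)

  summary.highest_target_number = current_target_number
  :bumped
end

items = [{ id: 1, poster: "alice", text: "hello" }]
summary = StoredSummary.new(sha: content_sha(items), highest_target_number: 1)

# The target number changed (for example after a small action) but the content did not.
puts backfill_action(summary, items, 2)
# bumped
```

The cheap number comparison filters candidates first, and the digest comparison runs only for those, so an unchanged summary is bumped once instead of being regenerated on every run.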