discourse-ai/lib/personas/concept_deduplicator.rb
Rafael dos Santos Silva 478f31de47
FEATURE: add inferred concepts system (#1330)
* FEATURE: add inferred concepts system

This commit adds a new inferred concepts system that:
- Creates a model for storing concept labels that can be applied to topics
- Provides AI personas for finding new concepts and matching existing ones
- Adds jobs for generating concepts from popular topics
- Includes a scheduled job that automatically processes engaging topics

* FEATURE: Extend inferred concepts to include posts

* Adds support for concepts to be inferred from and applied to posts
* Replaces daily task with one that handles both topics and posts
* Adds database migration for posts_inferred_concepts join table
* Updates PersonaContext to include inferred concepts



Co-authored-by: Roman Rizzi <rizziromanalejandro@gmail.com>
Co-authored-by: Keegan George <kgeorge13@gmail.com>
2025-06-02 14:29:20 -03:00

# frozen_string_literal: true

module DiscourseAi
  module Personas
    # Persona that asks the model to merge similar or related
    # machine-generated tags into a shorter list.
    class ConceptDeduplicator < Persona
      # This persona is not enabled by default.
      def self.default_enabled
        false
      end

      def system_prompt
        <<~PROMPT.strip
          You will be given a list of machine-generated tags.
          Your task is to streamline this list by merging entries that are similar or related.

          Please follow these steps to create a streamlined list of tags:
          1. Review the entire list of tags carefully.
          2. Identify and remove any exact duplicates.
          3. Look for tags that are too specific or niche, and consider removing them or replacing them with more general terms.
          4. If there are multiple tags that convey similar concepts, choose the best one and remove the others, or add a new one that covers the missing aspect.
          5. Ensure that the remaining tags are relevant and useful for describing the content.

          When deciding which tags are "best", consider the following criteria:
          - Relevance: How well does the tag describe the core content or theme?
          - Generality: Is the tag specific enough to be useful, but not so specific that it's unlikely to be searched for?
          - Clarity: Is the tag easy to understand and free from ambiguity?
          - Popularity: Would this tag likely be used by people searching for this type of content?

          Example Input:
          AI Bias, AI Bots, AI Ethics, AI Helper, AI Integration, AI Moderation, AI Search, AI-Driven Moderation, AI-Generated Post Illustrations, AJAX Events, AJAX Requests, AMA Events, API, API Access, API Authentication, API Automation, API Call, API Changes, API Compliance, API Configuration, API Costs, API Documentation, API Endpoint, API Endpoints, API Functions, API Integration, API Key, API Keys, API Limitation, API Limitations, API Permissions, API Rate Limiting, API Request, API Request Optimization, API Requests, API Security, API Suspension, API Token, API Tokens, API Translation, API Versioning, API configuration, API endpoint, API key, APIs, APK, APT Package Manager, ARIA, ARIA Tags, ARM Architecture, ARM-based, AWS, AWS Lightsail, AWS RDS, AWS S3, AWS Translate, AWS costs, AWS t2.micro, Abbreviation Expansion, Abbreviations

          Example Output:
          AI, AJAX, API, APK, APT Package Manager, ARIA, ARM Architecture, AWS, Abbreviations

          Please provide your streamlined list of tags within the <streamlined_tags> key.
          Remember, the goal is to create a more focused and effective set of tags while maintaining the essence of the original list.

          Your output should be in the following format:
          <o>
          {
            "streamlined_tags": ["tag1", "tag3"]
          }
          </o>
        PROMPT
      end

      # Structured output: a single "streamlined_tags" key holding an array of strings.
      def response_format
        [{ "key" => "streamlined_tags", "type" => "array", "array_type" => "string" }]
      end
    end
  end
end
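
The prompt above asks for a JSON object with a `streamlined_tags` array, optionally wrapped in `<o>` tags. As a rough sketch of how a caller might read such a reply defensively, the standalone snippet below parses it with Ruby's standard `json` library. The helper name `parse_streamlined_tags` is hypothetical and not part of discourse-ai; the plugin's own handling of `response_format` may work quite differently.

```ruby
require "json"

# Hypothetical helper (an assumption, not part of discourse-ai): extracts the
# tag list from a raw model reply shaped like the format the prompt requests.
def parse_streamlined_tags(raw_reply)
  # Drop the optional <o>...</o> wrapper if the model included it.
  body = raw_reply.sub(/\A\s*<o>\s*/, "").sub(%r{\s*</o>\s*\z}, "")

  tags = JSON.parse(body)["streamlined_tags"]
  return [] unless tags.is_a?(Array)

  # Keep only non-blank strings and drop case-insensitive duplicates,
  # keeping the first spelling seen.
  tags
    .select { |tag| tag.is_a?(String) }
    .map(&:strip)
    .reject(&:empty?)
    .uniq { |tag| tag.downcase }
rescue JSON::ParserError
  # Malformed JSON: fall back to an empty list rather than raising.
  []
end

reply = <<~REPLY
  <o>
  {
    "streamlined_tags": ["API", "api", " AWS ", "", 3]
  }
  </o>
REPLY

p parse_streamlined_tags(reply)
```

Returning an empty array on malformed input is only one possible choice; a caller that needs to distinguish "no tags" from "unparseable reply" would want to raise or return `nil` instead.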