mirror of
https://github.com/discourse/discourse-ai.git
synced 2025-06-30 19:42:17 +00:00
Related: https://github.com/discourse/discourse-translator/pull/310

This commit includes all the jobs and event hooks to localize posts, topics, and categories.

A few notes:
- `feature_name: "translation"` because the site setting is `ai-translation` and module is `Translation`
- we will switch to proper ai-feature in the near future, and can consider using the persona_user as `localization.localizer_user_id`
- keeping things flat within the module for now as we will be moving to ai-feature soon and have to rearrange
- Settings renamed/introduced are:
  - ai_translation_backfill_rate (0)
  - ai_translation_backfill_limit_to_public_content (true)
  - ai_translation_backfill_max_age_days (5)
  - ai_translation_verbose_logs (false)
87 lines
2.6 KiB
Ruby
# frozen_string_literal: true

module DiscourseAi
  module Translation
    class LanguageDetector
      # Not referenced elsewhere in this file.
      DETECTION_CHAR_LIMIT = 1000
      PROMPT_TEXT = <<~TEXT
        You will be given a piece of text, and your task is to detect the locale (language) of the text and return it in a specific JSON format.

        To complete this task, follow these steps:

        1. Carefully read and analyze the provided text.
        2. Determine the language of the text based on its characteristics, such as vocabulary, grammar, and sentence structure.
        3. Do not use links or programming code in the text to detect the locale.
        4. Identify the appropriate language code for the detected language.

        Here is a list of common language codes for reference:
        - English: en
        - Spanish: es
        - French: fr
        - German: de
        - Italian: it
        - Brazilian Portuguese: pt-BR
        - Russian: ru
        - Simplified Chinese: zh-CN
        - Japanese: ja
        - Korean: ko

        If the language is not in this list, use the appropriate IETF language tag code.

        5. Format your response as a JSON object with a single key "locale" and the value as the language code.

        Your output should be in the following format:
        <output>
        {"locale": "xx"}
        </output>

        Where "xx" is replaced by the appropriate language code.

        Important: Base your analysis solely on the provided text. Do not use any external information or make assumptions about the text's origin or context beyond what is explicitly provided.
      TEXT

      def initialize(text)
        @text = text
      end

      # Asks the configured translation model for the locale of @text.
      # Returns the :locale property of the structured output, or nil when
      # no structured output comes back.
      def detect
        prompt =
          DiscourseAi::Completions::Prompt.new(
            PROMPT_TEXT,
            messages: [{ type: :user, content: @text, id: "user" }],
          )

        structured_output =
          DiscourseAi::Completions::Llm.proxy(SiteSetting.ai_translation_model).generate(
            prompt,
            user: Discourse.system_user,
            feature_name: "translation",
            response_format: response_format,
          )

        structured_output&.read_buffered_property(:locale)
      end

      # JSON schema for the reply: an object with a single required
      # string property, "locale".
      def response_format
        {
          type: "json_schema",
          json_schema: {
            name: "reply",
            schema: {
              type: "object",
              properties: {
                locale: {
                  type: "string",
                },
              },
              required: ["locale"],
              additionalProperties: false,
            },
            strict: true,
          },
        }
      end
    end
  end
end
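The prompt and `response_format` above both ask the model for a reply shaped like `{"locale": "xx"}`. The sketch below shows one way a caller could sanity-check such a reply before trusting it. `LocaleReply`, `TAG_PATTERN`, and `extract` are hypothetical names introduced for illustration and are not part of discourse-ai. The tag check is a deliberately loose approximation of an IETF language tag, not a full validator.

```ruby
require "json"

# Hypothetical helper, not part of discourse-ai: checks a raw model reply
# against the {"locale": "xx"} shape requested by the prompt and schema.
module LocaleReply
  # Loose IETF-style check (an assumption, not a full BCP 47 parser):
  # a 2-3 letter language subtag, then optional subtags such as a region,
  # e.g. "en", "pt-BR", "zh-CN".
  TAG_PATTERN = /\A[A-Za-z]{2,3}(-[A-Za-z0-9]{2,8})*\z/

  # Returns the locale string, or nil when the reply is not valid JSON,
  # is not an object, or carries a value that does not look like a tag.
  def self.extract(raw)
    text = raw.to_s
    # Tolerate replies wrapped in the <output> tags shown in the prompt.
    json = text[%r{<output>(.*?)</output>}m, 1] || text
    parsed = JSON.parse(json)
    return nil unless parsed.is_a?(Hash)

    locale = parsed["locale"]
    locale if locale.is_a?(String) && locale.match?(TAG_PATTERN)
  rescue JSON::ParserError
    nil
  end
end
```

For example, `LocaleReply.extract('{"locale": "pt-BR"}')` returns `"pt-BR"`, while a reply that is not JSON, or whose value is not a string, yields `nil`.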