A more deterministic way of making sure the LLM detects the correct language (instead of relying on prompt to LLM to ignore it) is to take the cooked and remove unwanted elements.
In this commit
- we remove quotes, image captions, etc. and only take the remaining text, falling back to the unadulterated cooked
- and update prompts related to detection and translation
- /152465/12
In discourse/discourse-translator#249 we introduced splitting content (post.raw) prior to sending to translation as we were using a sync api.
Now that we're streaming thanks to #1424, we'll chunk based on the LlmModel.max_output_tokens.
We're seeing an aggressive number of translations being enqueued for a single post and locale. Historically, we trigger translation on `cooked` not `raw`, but that has changed a while back.
```
# from AiApiAuditLog, the same post is getting translated to the same locale within a few secs of each other
zh_CN - 2025-06-17 13:02:31 UTC
zh_CN - 2025-06-17 13:02:34 UTC
zh_CN - 2025-06-17 13:02:35 UTC
zh_CN - 2025-06-17 13:02:36 UTC
zh_CN - 2025-06-17 13:02:38 UTC
zh_CN - 2025-06-17 13:02:39 UTC
zh_CN - 2025-06-17 13:02:40 UTC
zh_CN - 2025-06-17 13:02:40 UTC
zh_CN - 2025-06-17 13:02:43 UTC
zh_CN - 2025-06-17 13:02:44 UTC
```
This PR prevents this from happening.
The AiApiAuditLog per translation event doesn't trace back easily to a post or topic.
This commit adds support to that, and also switches the translators to named arguments rather than positional arguments.
Previously I had omitted to add `locale` to the category, as categories tended to be just a single word, and I did not find it would be worth to carry locale information.
Due to certain LLMs that do poorer at translation, category descriptions got pretty messy. We added locale support here - https://github.com/discourse/discourse/pull/32962.
This PR adds the automatic locale detection, and skips translating to the category's locale.
Related: https://github.com/discourse/discourse-translator/pull/310
This commit includes all the jobs and event hooks to localize posts, topics, and categories.
A few notes:
- `feature_name: "translation"` because the site setting is `ai-translation` and module is `Translation`
- we will switch to proper ai-feature in the near future, and can consider using the persona_user as `localization.localizer_user_id`
- keeping things flat within the module for now as we will be moving to ai-feature soon and have to rearrange
- Settings renamed/introduced are:
- ai_translation_backfill_rate (0)
- ai_translation_backfill_limit_to_public_content (true)
- ai_translation_backfill_max_age_days (5)
- ai_translation_verbose_logs (false)