15 Commits

Author SHA1 Message Date
Keegan George
5ae80645bb
Merge branch 'main' into default-llm-model
2025-07-16 12:07:10 -07:00
Natalie Tay
ad6a8cb812
DEV: Switch translations out of structured output as it returns only a single value (#1503)
Since translations only require a single key back, there is little point in using structured output. This PR also includes some prompt updates dealing with quotes, details, and code.

Related: #1502

This does mean reverting discourse/discourse-translator#257, but we can see how it goes.
2025-07-16 13:38:00 +08:00
Keegan George
f593ab6e1e
DEV: remove custom: prefix and update migrations for translations
2025-07-15 18:07:05 -07:00
Natalie Tay
d54cd1f602
DEV: Normalize locales that are similar (e.g. en and en_GB) so they do not get translated (#1495)
This commit:
- normalizes locales such as en_GB and other variants to en, so the feature will not translate en_GB posts to en (or, similarly, pt_BR posts to pt_PT)
- consolidates the check for whether the feature is enabled in `DiscourseAi::Translation.enabled?`
- does the same for backfill in `DiscourseAi::Translation.backfill_enabled?`
  - turns off backfill when `ai_translation_backfill_max_age_days` is 0, so the setting does what its name says. Set it to a high number to backfill everything.
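A minimal Python sketch of the normalization and backfill rules above, assuming a locale's base language is everything before the first `_` or `-`. The function names are hypothetical; the plugin itself is Ruby and exposes `DiscourseAi::Translation.enabled?` and `DiscourseAi::Translation.backfill_enabled?`, whose internals this does not reproduce.

```python
# Hypothetical sketch, not the plugin's code.

def normalize_locale(locale: str) -> str:
    """Collapse regional variants to the base language, e.g. "en_GB" -> "en"."""
    return locale.replace("-", "_").split("_")[0].lower()


def needs_translation(content_locale: str, target_locale: str) -> bool:
    """Skip translation when both locales share the same base language."""
    return normalize_locale(content_locale) != normalize_locale(target_locale)


def backfill_enabled(feature_enabled: bool, backfill_max_age_days: int) -> bool:
    """Backfill runs only if the feature is on and the max age is above 0."""
    return feature_enabled and backfill_max_age_days > 0


print(needs_translation("en_GB", "en"))     # False
print(needs_translation("pt_BR", "pt_PT"))  # False
print(needs_translation("en", "ja"))        # True
print(backfill_enabled(True, 0))            # False
```

Note that this naive rule would also merge pairs such as zh_CN and zh_TW, which a real implementation may want to keep apart.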
2025-07-09 22:21:51 +08:00
Natalie Tay
56f025cf44
FIX: Localize description excerpts as they have limits (#1490)
2025-07-08 10:36:41 +08:00
Natalie Tay
6f8960e549
FIX: Pass topic to context (#1488)
2025-07-07 14:59:48 +08:00
Natalie Tay
2b9a4f9232
FIX: Ignore captions and quotes when detecting locale and update prompts (#1483)
A more deterministic way to make sure the LLM detects the correct language (instead of relying on the prompt to tell the LLM to ignore captions and quotes) is to take the cooked HTML and remove the unwanted elements.

In this commit we:
- remove quotes, image captions, etc., and take only the remaining text, falling back to the unaltered cooked
- update prompts related to detection and translation
- /152465/12
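A minimal sketch of stripping quotes and captions from cooked HTML before detection, using Python's standard `html.parser`. The set of skipped tags and the rule of falling back to the cooked only when nothing is left are assumptions, not the plugin's actual selectors.

```python
from html.parser import HTMLParser

# Assumed set of elements whose text should not count toward locale detection.
SKIPPED_TAGS = {"blockquote", "aside", "figcaption"}


class _TextExtractor(HTMLParser):
    """Collects text, ignoring anything nested inside SKIPPED_TAGS."""

    def __init__(self):
        super().__init__()
        self.depth = 0  # how many skipped elements we are currently inside
        self.parts = []

    def handle_starttag(self, tag, attrs):
        if tag in SKIPPED_TAGS:
            self.depth += 1

    def handle_endtag(self, tag):
        if tag in SKIPPED_TAGS and self.depth > 0:
            self.depth -= 1

    def handle_data(self, data):
        if self.depth == 0:
            self.parts.append(data)


def text_for_detection(cooked: str) -> str:
    parser = _TextExtractor()
    parser.feed(cooked)
    parser.close()
    # Collapse runs of whitespace left behind by removed elements.
    text = " ".join(" ".join(parser.parts).split())
    # Assumed fallback: use the unaltered cooked if stripping left nothing.
    return text or cooked
```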
2025-07-03 22:57:48 +08:00
Sam
471f96f972
FEATURE: allow seeing configured LLM on feature page (#1460)
This is an interim fix so we can at least tell which LLM each feature is using.

It also adds some test coverage to the feature page.
2025-06-24 17:42:47 +10:00
Natalie Tay
683bb5725b
DEV: Split content based on LlmModel's max_output_tokens (#1456)
In discourse/discourse-translator#249 we introduced splitting content (post.raw) before sending it for translation, because we were using a synchronous API.

Now that we're streaming, thanks to #1424, we chunk based on `LlmModel.max_output_tokens`.
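A minimal sketch of token-budget chunking, assuming a rough heuristic of 4 characters per token and paragraph boundaries at blank lines; both are assumptions, and `split_for_translation` is a hypothetical name. Paragraphs are packed greedily into chunks, and a paragraph that alone exceeds the budget is hard-split.

```python
# Rough heuristic (assumption): about 4 characters per token.
CHARS_PER_TOKEN = 4


def estimate_tokens(text: str) -> int:
    return -(-len(text) // CHARS_PER_TOKEN)  # ceiling division


def split_for_translation(raw: str, max_output_tokens: int) -> list[str]:
    """Greedily pack paragraphs into chunks whose estimated size fits the budget."""
    max_chars = max_output_tokens * CHARS_PER_TOKEN
    chunks: list[str] = []
    current = ""
    for paragraph in raw.split("\n\n"):
        # A single oversized paragraph is hard-split by characters.
        pieces = [paragraph[i:i + max_chars]
                  for i in range(0, len(paragraph), max_chars)] or [""]
        for piece in pieces:
            candidate = f"{current}\n\n{piece}" if current else piece
            if estimate_tokens(candidate) <= max_output_tokens:
                current = candidate
            else:
                chunks.append(current)  # current is non-empty here
                current = piece
    if current:
        chunks.append(current)
    return chunks


print([len(c) for c in split_for_translation("x" * 50, max_output_tokens=5)])  # [20, 20, 10]
```

Since a translation can come out longer than its source, a real implementation would likely leave some headroom below `max_output_tokens`.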
2025-06-23 21:11:20 +08:00
Natalie Tay
e2d7ca0bb9
DEV: Indicate backfill rate for translations is hourly (#1451)
* DEV: Indicate backfill rate for translations is hourly

* add ai_translation_max_post_length

* default value update
2025-06-21 15:45:09 +08:00
Natalie Tay
d7a2af5505
DEV: Prevent multiple translation per post (#1443)
We're seeing an excessive number of translations being enqueued for a single post and locale. Historically, we triggered translation on `cooked` rather than `raw`, but that changed a while back.

```
# from AiApiAuditLog, the same post is getting translated to the same locale within a few secs of each other
zh_CN - 2025-06-17 13:02:31 UTC
zh_CN - 2025-06-17 13:02:34 UTC
zh_CN - 2025-06-17 13:02:35 UTC
zh_CN - 2025-06-17 13:02:36 UTC
zh_CN - 2025-06-17 13:02:38 UTC
zh_CN - 2025-06-17 13:02:39 UTC
zh_CN - 2025-06-17 13:02:40 UTC
zh_CN - 2025-06-17 13:02:40 UTC
zh_CN - 2025-06-17 13:02:43 UTC
zh_CN - 2025-06-17 13:02:44 UTC
```

This PR prevents this from happening.
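The PR does not describe its mechanism here, so this is only a hypothetical sketch of one way to suppress duplicate enqueues: remember when a job was last enqueued for each (post, locale) pair and refuse another within a time window. `EnqueueGuard` and the 60-second window are illustrative assumptions.

```python
import time


class EnqueueGuard:
    """Hypothetical guard: allow one translation job per (post_id, locale) per window."""

    def __init__(self, window_seconds: float = 60.0, clock=time.monotonic):
        self.window_seconds = window_seconds
        self.clock = clock
        self._last_enqueued: dict[tuple[int, str], float] = {}

    def try_acquire(self, post_id: int, locale: str) -> bool:
        """Return True if a job may be enqueued now, recording the attempt."""
        key = (post_id, locale)
        now = self.clock()
        last = self._last_enqueued.get(key)
        if last is not None and now - last < self.window_seconds:
            return False  # a job for this post and locale was enqueued recently
        self._last_enqueued[key] = now
        return True


now = [0.0]
guard = EnqueueGuard(window_seconds=60, clock=lambda: now[0])
print(guard.try_acquire(1, "zh_CN"))  # True
now[0] = 3.0
print(guard.try_acquire(1, "zh_CN"))  # False: duplicate within the window
```

An in-process dictionary only illustrates the idea; with several job workers, the same check would need a shared store, for example a Redis key with an expiry.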
2025-06-18 13:24:02 +08:00
Natalie Tay
b5e8277083
DEV: Move AI translation feature into an AI Feature (#1424)
This PR moves translations into an AI Feature

See https://github.com/discourse/discourse-ai/pull/1424 for screenshots
2025-06-13 10:17:27 +08:00
Natalie Tay
6827147362
DEV: Add topic and post id when using completions for traceability to AiApiAuditLog (#1414)
The AiApiAuditLog entry for each translation event doesn't trace back easily to a post or topic.

This commit adds that traceability, and also switches the translators to named arguments instead of positional arguments.
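A hypothetical Python sketch of the two changes: keyword-only parameters (Python's closest analogue to Ruby's named arguments) and passing `post_id`/`topic_id` through to the audit record. `AuditLogEntry` is a stand-in for an AiApiAuditLog row, not its real schema, and the LLM call is stubbed out.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class AuditLogEntry:
    """Hypothetical stand-in for an AiApiAuditLog row."""
    feature_name: str
    post_id: Optional[int]
    topic_id: Optional[int]


AUDIT_LOG: list[AuditLogEntry] = []


def translate(*, text: str, target_locale: str,
              post_id: Optional[int] = None, topic_id: Optional[int] = None) -> str:
    """Keyword-only arguments: callers must name every parameter."""
    # Record which post/topic this call belongs to, for traceability.
    AUDIT_LOG.append(AuditLogEntry("translation", post_id, topic_id))
    # Placeholder for the real LLM call.
    return f"[{target_locale}] {text}"


print(translate(text="Hello", target_locale="ja", post_id=42, topic_id=7))  # [ja] Hello
```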
2025-06-06 23:24:24 +08:00
Natalie Tay
8a3a247b11
DEV: Also detect locale of categories and do not translate if already in the locale (#1413)
Previously I had not added `locale` to categories, as category names tended to be a single word and I did not think carrying locale information was worthwhile.

Because some LLMs are weaker at translation, category descriptions got pretty messy. We added locale support here: https://github.com/discourse/discourse/pull/32962.

This PR adds automatic locale detection for categories and skips translating a category into its own locale.
2025-06-06 22:41:48 +08:00
Natalie Tay
373e2305d6
FEATURE: Automatic translation and localization of posts, topics, categories (#1376)
Related: https://github.com/discourse/discourse-translator/pull/310

This commit includes all the jobs and event hooks to localize posts, topics, and categories.

A few notes:
- `feature_name: "translation"` because the site setting is `ai-translation` and the module is `Translation`
- we will switch to a proper AI feature in the near future, and can then consider using the persona_user as `localization.localizer_user_id`
- things are kept flat within the module for now, as we will be moving to an AI feature soon and will have to rearrange them
- Settings renamed/introduced are:
  - ai_translation_backfill_rate (0)
  - ai_translation_backfill_limit_to_public_content (true)
  - ai_translation_backfill_max_age_days (5)
  - ai_translation_verbose_logs (false)
2025-05-29 17:28:06 +08:00