730 Commits

Author SHA1 Message Date
Keegan George
7dee59c97f
DEV: Remove no longer needed seeded model check 2025-07-21 08:08:20 -07:00
Keegan George
eb93e1736b
Final spec fix 🤞 2025-07-17 14:46:15 -07:00
Keegan George
cc34882074
FIX: rest of specs 2025-07-17 14:34:29 -07:00
Keegan George
4b03cdcc96
FIX: LLM controller spec 2025-07-17 13:31:26 -07:00
Keegan George
08d2f3ddf9
FIX: spam 2025-07-17 11:36:19 -07:00
Keegan George
7af3ce820d
DEV: Remove references to translation model 2025-07-17 09:27:08 -07:00
Keegan George
5841543bd9
FIX: validator 2025-07-16 14:51:32 -07:00
Keegan George
5ae80645bb
Merge branch 'main' into default-llm-model 2025-07-16 12:07:10 -07:00
Keegan George
b675c4c39b
DEV: Remove custom prefix in specs 2025-07-16 11:39:55 -07:00
Keegan George
0fadf1da1a
DEV: update enumerator 2025-07-16 10:56:18 -07:00
Roman Rizzi
06743d1939
FIX: Fix embeddings to use the old OpenAI tokenizer (#1506) 2025-07-15 14:44:11 -03:00
Keegan George
88811f97bf
fixes 2025-07-15 10:23:58 -07:00
Keegan George
2377b286dd
DEV: rely on default llm model in spec 2025-07-15 09:56:24 -07:00
Roman Rizzi
6a4dbd8126
FEATURE: Use Personas in Automation's llm_report script (#1499) 2025-07-14 12:47:21 -03:00
Keegan George
8da4318021
DEV: assign_fake_provider 2025-07-11 12:32:36 -07:00
Keegan George
b6a88ff03e
spec 2025-07-11 12:02:50 -07:00
Roman Rizzi
89bcf9b1f0
FIX: Process successfully generated embeddings even if some failed (#1500) 2025-07-10 17:51:01 -03:00
Martin Brennan
6b5ea38644
DEV: Improve AI bot conversation submit upload handling (#1497)
Try to fix a flaky spec in /ai_bot/homepage_spec.rb by using ember
data rather than inspecting the DOM directly to see if there
are any in-progress uploads.

Also add missing translation for in progress uploads warning.
2025-07-10 17:06:39 +10:00
Natalie Tay
d54cd1f602
DEV: Normalize locales that are similar (e.g. en and en_GB) so they do not get translated (#1495)
This commit
- normalizes locales like en_GB and variants to en. With this, the feature will not translate en_GB posts to en (or similarly pt_BR to pt_PT)
- consolidates whether the feature is enabled in `DiscourseAi::Translation.enabled?`
- similarly for backfill in  `DiscourseAi::Translation.backfill_enabled?`
  - turns off backfill if `ai_translation_backfill_max_age_days` is 0 to keep true to what it says. Set it to a high number to backfill everything
2025-07-09 22:21:51 +08:00
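The locale normalization described in d54cd1f602 above can be illustrated with a minimal sketch. The module and method names below are hypothetical and are not taken from the plugin; the rule shown (compare only the base language) is an assumption based on the commit message.

```ruby
# Hypothetical helper, not the plugin's actual implementation.
module LocaleSketch
  # Reduce a locale such as "en_GB" or "pt-BR" to its base language ("en", "pt").
  def self.base_language(locale)
    locale.to_s.downcase.split(/[_-]/).first
  end

  # Two locales count as similar when they share a base language.
  def self.similar?(first, second)
    base = base_language(first)
    !base.nil? && base == base_language(second)
  end

  # Translate only when source and target differ in base language.
  def self.needs_translation?(source, target)
    !similar?(source, target)
  end
end

puts LocaleSketch.needs_translation?("en_GB", "en")
puts LocaleSketch.needs_translation?("ja", "en")
```

Note that this simple rule would also treat pairs such as zh_CN and zh_TW as similar, which the real code may well handle differently.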
Natalie Tay
56f025cf44
FIX: Localize description excerpts as they have limits (#1490) 2025-07-08 10:36:41 +08:00
Rafael dos Santos Silva
6247906c13
FEATURE: Seamless embedding model upgrades (#1486) 2025-07-04 16:44:03 -03:00
Sam
ab5edae121
FIX: make AI helper more robust (#1484)
* FIX: make AI helper more robust

- If JSON is broken for structured output then lean on a more forgiving parser
- Gemini 2.5 Flash does not support temperature, so add support for opting out
- Evals for assistant were broken, fix interface
- Add some missing LLMs
- Translator was not mapped correctly to the feature - fix that
- Don't mix XML in prompt for translator

* lint

* correct logic

* simplify code

* implement best-effort JSON parsing directly in the structured output object
2025-07-04 14:47:11 +10:00
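As a rough illustration of the "more forgiving parser" idea in ab5edae121 above: try strict parsing first, then retry after light repairs. This is a sketch under assumptions; the module name and the specific repairs (stripping Markdown fences, dropping trailing commas) are hypothetical and are not taken from the commit.

```ruby
require "json"

# Hypothetical sketch, not the plugin's actual parser.
module JsonSketch
  # Returns the parsed value, or nil when the text cannot be repaired.
  def self.best_effort_parse(text)
    JSON.parse(text)
  rescue JSON::ParserError
    repaired = text.strip
    # Drop Markdown code fences that models sometimes wrap around JSON.
    repaired = repaired.sub(/\A`{3}(?:json)?\s*/, "").sub(/\s*`{3}\z/, "")
    # Remove trailing commas before a closing brace or bracket.
    # Caveat: this naive regex would also alter such sequences inside strings.
    repaired = repaired.gsub(/,\s*([}\]])/, '\1')
    begin
      JSON.parse(repaired)
    rescue JSON::ParserError
      nil
    end
  end
end

p JsonSketch.best_effort_parse('{"title": "hello",}')
```

Returning nil on failure keeps the caller in control of the fallback, which is one possible design choice among several.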
Natalie Tay
2b9a4f9232
FIX: Ignore captions and quotes when detecting locale and update prompts (#1483)
A more deterministic way of making sure the LLM detects the correct language (instead of relying on prompt to LLM to ignore it) is to take the cooked and remove unwanted elements.

In this commit 
- we remove quotes, image captions, etc. and only take the remaining text, falling back to the unadulterated cooked
- and update prompts related to detection and translation
- /152465/12
2025-07-03 22:57:48 +08:00
Rafael dos Santos Silva
d792919ddf
DEV: Move tokenizers to a gem (#1481)
Also renames the Mixtral tokenizer to Mistral.

See gem at github.com/discourse/discourse_ai-tokenizers


Co-authored-by: Roman Rizzi <roman@discourse.org>
2025-07-02 14:43:03 -03:00
Roman Rizzi
75fb37144f
FEATURE: Use personas for generating hypothetical posts (#1482)
* FEATURE: Use personas for generating hypothetical posts

* Update prompt
2025-07-02 10:56:38 -03:00
Sam
40fa527633
FIX: cross talk when in ai helper (#1478)
Previous to this change we reused channels for proofreading progress and
ai helper progress

The new changeset ensures each POST to stream progress gets a dedicated
message bus channel

This fixes a class of issues where the wrong information could be displayed
to end users on subsequent proofreading or helper calls

* fix tests

* fix implementation (got to subscribe at 0)
2025-07-01 18:02:16 +10:00
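The fix in 40fa527633 above gives each streaming request its own channel. A minimal sketch of generating such a channel name follows; the module name, the prefix, and the use of a random suffix are assumptions for illustration, not the plugin's code.

```ruby
require "securerandom"

# Hypothetical sketch: one unique channel name per streaming request.
module ChannelSketch
  # Append a random 16-character hex suffix so two requests never share a channel.
  def self.progress_channel(prefix)
    "#{prefix}/#{SecureRandom.hex(8)}"
  end
end

puts ChannelSketch.progress_channel("/helper/stream")
```

Because the channel is new for every request, a client would presumably subscribe from message id 0 so that nothing published before the subscription is missed, which seems to be what "got to subscribe at 0" refers to.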
Yuriy Kurant
8527279594
dev: removes messages section from sidebar (#1477) 2025-07-01 17:23:11 +10:00
Roman Rizzi
5ca7d5f256
FIX: Strip uploads from msg when searching for rag fragments (#1475) 2025-06-30 15:03:17 -03:00
Natalie Tay
a94daa14e2
FIX: Return no topics when embeddings is disabled (#1473)
When an invalid model is set for embeddings, topics do not load even if embeddings is disabled.

Error:
## RuntimeError in TopicsController#show
Invalid embeddings selected model

This commit checks for valid settings before attempting to load related topics.
2025-06-30 17:45:04 +08:00
Kris
262bd8b145
UX: add filter to features page, update styles (#1471)
* UX: add filter to features page, update styles

* merge fix

* update toggle spec

* test fix
2025-06-30 09:26:53 +10:00
Roman Rizzi
8d943fa29d
FEATURE: Display spam module on features list. (#1469) 2025-06-27 14:18:01 -03:00
Roman Rizzi
b35f9bcc7c
FEATURE: Use Personas when scanning posts for spam (#1465) 2025-06-27 10:35:47 -03:00
Sam
cc4e9e030f
FIX: normalize keys in structured output (#1468)
* FIX: normalize keys in structured output

Previously we did not validate the hash passed in to structured
outputs, which could be either string-based or symbol-based.

Specifically, this broke structured outputs for Gemini in some
cases.

* comment out flake
2025-06-27 15:42:48 +10:00
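Key normalization of the kind described in cc4e9e030f above can be sketched in plain Ruby. The helper name is hypothetical, and choosing symbols (rather than strings) as the normalized form is an assumption.

```ruby
# Hypothetical sketch: normalize string and symbol keys to symbols, recursively.
module KeySketch
  def self.deep_symbolize(value)
    case value
    when Hash
      value.each_with_object({}) do |(key, inner), out|
        # Keys that cannot become symbols (for example integers) are kept as-is.
        normalized = key.respond_to?(:to_sym) ? key.to_sym : key
        out[normalized] = deep_symbolize(inner)
      end
    when Array
      value.map { |item| deep_symbolize(item) }
    else
      value
    end
  end
end

p KeySketch.deep_symbolize({ "type" => "object", properties: { "name" => { "type" => "string" } } })
```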
Sam
73768ce920
FEATURE: Display bot in feature list (#1466)
- allows features to have multiple llms and multiple personas
- sorts module list
- adds Bot as a first class module
- fixes issue where search module was always configured
- some tests
2025-06-27 12:35:41 +10:00
Rafael dos Santos Silva
a40e2d3156
FEATURE: Update OpenAI tokenizer to GPT-4o and later (#1467) 2025-06-26 15:26:09 -03:00
Sam
3e74f09d06
FEATURE: improve custom tool infra (#1463)
- Add support for `chain.streamCustomRaw(test)`, which can be used to stream text from a JS tool directly to the composer
- Add support for llm params in `llm.generate` which unlocks stuff like structured outputs
- Add discourse.createStagedUser, discourse.createTopic  and discourse.createPost - for content creation
2025-06-25 16:25:44 +10:00
Sam
471f96f972
FEATURE: allow seeing configured LLM on feature page (#1460)
This is an interim fix so we can at least tell which LLM is
being used for which feature.

It also adds some test coverage to the feature page.
2025-06-24 17:42:47 +10:00
Sam
9f2a4094f5
FEATURE: persona/tool import and export (#1450)
Introduces import/export feature for tools and personas.

Uploads are omitted for now, and will be added in a future PR 

*   **Backend:**
    *   Adds `import` and `export` actions to `Admin::AiPersonasController` and `Admin::AiToolsController`.
    *   Introduces `DiscourseAi::PersonaExporter` and `DiscourseAi::PersonaImporter` services to manage JSON serialization and deserialization.
    *   The export format for a persona embeds its associated custom tools. To ensure portability, `AiTool` references are serialized using their `tool_name` rather than their internal database `id`.
    *   The import logic detects conflicts by name. A `force=true` parameter can be passed to overwrite existing records.

*   **Frontend:**
    *   `AiPersonaListEditor` and `AiToolListEditor` components now include an "Import" button that handles file selection and POSTs the JSON data to the respective `import` endpoint.
    *   `AiPersonaEditorForm` and `AiToolEditorForm` components feature an "Export" button that triggers a download of the serialized record.
    *   Handles import conflicts (HTTP `409` for tools, `422` for personas) by showing a `dialog.confirm` prompt to allow the user to force an overwrite.

*   **Testing:**
    *   Adds comprehensive request specs for the new controller actions (`#import`, `#export`).
    *   Includes unit specs for the `PersonaExporter` and `PersonaImporter` services.
* Persona import and export implemented
2025-06-24 12:41:10 +10:00
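The conflict handling described in 9f2a4094f5 above (detect conflicts by name, overwrite only when forced) can be sketched as follows. Everything here is hypothetical: a plain Hash stands in for the database, and the error class and method names are not the plugin's.

```ruby
# Hypothetical sketch of name-based conflict detection with a force flag.
class ImportConflict < StandardError; end

module ImportSketch
  # store: Hash mapping name => record, standing in for the database.
  def self.import(store, record, force: false)
    name = record.fetch(:name)
    if store.key?(name) && !force
      raise ImportConflict, "a record named #{name} already exists"
    end
    store[name] = record
  end
end

store = {}
ImportSketch.import(store, { name: "helper", version: 1 })
ImportSketch.import(store, { name: "helper", version: 2 }, force: true)
p store
```

In the real feature the conflict surfaces as an HTTP error that the frontend turns into a confirmation prompt; the exception here only plays that role for the sketch.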
Natalie Tay
683bb5725b
DEV: Split content based on llmmodel's max_output_tokens (#1456)
In discourse/discourse-translator#249 we introduced splitting content (post.raw) prior to sending to translation as we were using a sync api.

Now that we're streaming thanks to #1424, we'll chunk based on the LlmModel.max_output_tokens.
2025-06-23 21:11:20 +08:00
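Chunking content against an output token budget, as 683bb5725b above describes, might look roughly like this. The sketch is hypothetical: it estimates tokens by counting whitespace-separated words, whereas the plugin would use a real tokenizer, and the module and method names are invented.

```ruby
# Hypothetical sketch: greedy paragraph packing under a token budget.
module ChunkSketch
  # Crude token estimate: one token per whitespace-separated word (assumption).
  def self.estimate_tokens(text)
    text.split.size
  end

  # Pack paragraphs into chunks that stay within max_tokens.
  # A single paragraph larger than the budget becomes its own oversized chunk.
  def self.split(text, max_tokens)
    chunks = []
    current = []
    current_tokens = 0

    text.split(/\n{2,}/).each do |paragraph|
      tokens = estimate_tokens(paragraph)
      if current.any? && current_tokens + tokens > max_tokens
        chunks << current.join("\n\n")
        current = []
        current_tokens = 0
      end
      current << paragraph
      current_tokens += tokens
    end

    chunks << current.join("\n\n") if current.any?
    chunks
  end
end

p ChunkSketch.split("one two\n\nthree four\n\nfive", 4)
```

Splitting on paragraph boundaries keeps each chunk readable on its own, which matters when every chunk is translated separately.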
Natalie Tay
740be26625
DEV: Also make sure locale detection skips PMs that are not group PMs when public content only (#1457)
In the earlier PR https://github.com/discourse/discourse-ai/pull/1432, when `SiteSetting.ai_translation_backfill_limit_to_public_content = false`, we **translate** PMs but **skip translating** PMs that do not involve groups.

This commit covers the missing case on **locale detection**.
2025-06-23 19:07:40 +08:00
Natalie Tay
e2d7ca0bb9
DEV: Indicate backfill rate for translations is hourly (#1451)
* DEV: Indicate backfill rate for translations is hourly

* add ai_translation_max_post_length

* default value update
2025-06-21 15:45:09 +08:00
Keegan George
a4194d3fb2
FIX: AI preferences tab button not appearing unless Helper enabled (#1452)
This update fixes an issue where the AI user preferences tab was not appearing unless `SiteSetting.ai_helper_enabled` was `true`. We previously checked for its presence because user preferences only had a single setting, which was related to Helper. Since then, we've also added the search discoveries setting there, so the tab should no longer depend on Helper. This update also modernizes the preferences template by converting it from `.hbs` to `.gjs`.
2025-06-20 10:12:08 -07:00
Keegan George
baaa3d199a
FIX: streaming related specs (#1448)
## 🔍 Overview
This update fixes an issue where message bus streaming related specs
were not working correctly. To do so we pass the `last_id` when
subscribing to `MessageBus` which allows us to unskip those broken
tests.

---------

Co-authored-by: Joffrey JAFFEUX <j.jaffeux@gmail.com>
2025-06-19 07:41:18 -07:00
Joffrey JAFFEUX
6a33e5154d
DEV: makes ai menu helper a standalone menu (#1434)
The current menu was rendering inside the post text toolbar (on desktop). This is not ideal because the post text toolbar only renders while text is selected, and when you click a button on the toolbar, web browsers by design clear your text selection, which makes all of this super tricky.

This commit makes desktop and mobile behave in the same way by rendering their own menu and capturing the quote state when we render the post text selection toolbar. This allows us to reason about the AI helper in a much simpler way.

This commit also removes what appears to be an unused file and corrects what were seemingly copy/paste mistakes.

⚠️ Technical note: this commit corrects the message bus subscription, which, among other things, allows us to write specs that are not flaky. However, due to the current implementation we have a channel per post, which means we need to serialize the last message bus id per post.

We have two possible solutions here:
- subscribe at the topic level
- refactor the code to use `MessageBus.last_ids` to grab multiple posts at once, instead of having to call `MessageBus.last_id` and do one Redis call per post

---------

Co-authored-by: Keegan George <kgeorge13@gmail.com>
2025-06-19 11:56:00 +02:00
Sam
37dbd48513
FIX: implement max_output tokens (anthropic/openai/bedrock/gemini/open router) (#1447)
* FIX: implement max_output tokens (anthropic/openai/bedrock/gemini/open router)

Previously this setting existed but was not implemented.
Also updates a bunch of models in our presets to point to the latest versions

* implementing in base is safer, simpler and easier to manage

* anthropic 3.5 is getting older, let's use 4.0 here and fix spec
2025-06-19 16:00:11 +10:00
Natalie Tay
3e87e92631
DEV: Remove 'experimental' from translation features (#1439)
* DEV: Remove 'experimental' from translation features

* include compat

* include compat
2025-06-19 12:23:56 +08:00
Mark VanLandingham
cd14b0c0be
FIX: Bring back empty state message when appropriate (#1446)
The Today section was always added, but a side effect was that we hid the empty state component. This commit brings back the empty state.
2025-06-18 17:34:08 -05:00
Natalie Tay
d7a2af5505
DEV: Prevent multiple translation per post (#1443)
We're seeing an excessive number of translations being enqueued for a single post and locale. Historically, we triggered translation on `cooked`, not `raw`, but that changed a while back.

```
# from AiApiAuditLog, the same post is getting translated to the same locale within a few secs of each other
zh_CN - 2025-06-17 13:02:31 UTC
zh_CN - 2025-06-17 13:02:34 UTC
zh_CN - 2025-06-17 13:02:35 UTC
zh_CN - 2025-06-17 13:02:36 UTC
zh_CN - 2025-06-17 13:02:38 UTC
zh_CN - 2025-06-17 13:02:39 UTC
zh_CN - 2025-06-17 13:02:40 UTC
zh_CN - 2025-06-17 13:02:40 UTC
zh_CN - 2025-06-17 13:02:43 UTC
zh_CN - 2025-06-17 13:02:44 UTC
```

This PR prevents this from happening.
2025-06-18 13:24:02 +08:00
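The commit d7a2af5505 above does not say how duplicates are prevented. One possible approach, shown purely as an assumption, is to remember when each post and locale pair was last enqueued and reject repeats inside a short window. The class below is hypothetical and keeps its state in memory; a real deployment would need shared storage.

```ruby
# Hypothetical sketch: reject repeat translation requests within a time window.
class TranslationGuardSketch
  def initialize(window_seconds)
    @window_seconds = window_seconds
    @last_enqueued = {}
  end

  # Returns true when the job may be enqueued, false for a duplicate
  # seen within the window. The clock is injectable to keep this testable.
  def allow?(post_id, locale, now = Time.now)
    key = [post_id, locale]
    last = @last_enqueued[key]
    return false if last && now - last < @window_seconds

    @last_enqueued[key] = now
    true
  end
end

guard = TranslationGuardSketch.new(60)
puts guard.allow?(1, "zh_CN")
puts guard.allow?(1, "zh_CN")
```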
Rafael dos Santos Silva
9dccc1eb93
FEATURE: Add Qwen3 tokenizer and update Gemma to version 3 (#1440) 2025-06-17 10:25:03 -03:00
Natalie Tay
df925f8304
DEV: Move examples out of prompt (#1438)
* DEV: Move examples out of prompt
2025-06-17 16:12:52 +08:00