This spec fails inconsistently with:
-fragment-n14
+You are a helpful Discourse assistant.
+You _understand_ and **generate** Discourse Markdown.
+You live in a Discourse Forum Message.
+
+You live in the forum with the URL: http://test.localhost
+The title of your site: test site title
+The description is: test site description
+The participants in this conversation are: joe, jane
+The date now is: 2024-11-25 20:23:02 UTC, much has changed since you were trained.
+
+You were trained on OLD data, lean on search to get up to date information about this forum
+When searching try to SIMPLIFY search terms
+Discourse search joins all terms with AND. Reduce and simplify terms to find more results.<guidance>
+The following texts will give you additional guidance for your response.
+We included them because we believe they are relevant to this conversation topic.
+
+Texts:
+
+fragment-n10
+fragment-n9
+fragment-n8
+fragment-n7
+fragment-n6
+fragment-n5
+fragment-n4
+fragment-n3
+fragment-n2
+fragment-n1
+</guidance>
This is a significant PR that introduces AI Artifacts functionality to the discourse-ai plugin along with several other improvements. Here are the key changes:
1. AI Artifacts System:
- Adds a new `AiArtifact` model and database migration
- Allows creation of web artifacts with HTML, CSS, and JavaScript content
- Introduces security settings (`strict`, `lax`, `disabled`) for controlling artifact execution
- Implements artifact rendering in iframes with sandbox protection
- New `CreateArtifact` tool for AI to generate interactive content
2. Tool System Improvements:
- Adds support for partial tool calls, allowing incremental updates during generation
- Better handling of tool call states and progress tracking
- Improved XML tool processing with CDATA support
- Fixes for tool parameter handling and duplicate invocations
3. LLM Provider Updates:
- Updates for Anthropic Claude models with correct token limits
- Adds support for native/XML tool modes in Gemini integration
- Adds new model configurations including Llama 3.1 models
- Improvements to streaming response handling
4. UI Enhancements:
- New artifact viewer component with expand/collapse functionality
- Security controls for artifact execution (click-to-run in strict mode)
- Improved dialog and response handling
- Better error management for tool execution
5. Security Improvements:
- Sandbox controls for artifact execution
- Public/private artifact sharing controls
- Security settings to control artifact behavior
- CSP and frame-options handling for artifacts
6. Technical Improvements:
- Better post streaming implementation
- Improved error handling in completions
- Better memory management for partial tool calls
- Enhanced testing coverage
7. Configuration:
- New site settings for artifact security
- Extended LLM model configurations
- Additional tool configuration options
This PR significantly enhances the plugin's capabilities for generating and displaying interactive content while maintaining security and providing flexible configuration options for administrators.
Implement streaming tool call implementation for Anthropic and Open AI.
When calling:
llm.generate(..., partial_tool_calls: true) do ...
Partials may contain ToolCall instances with partial: true, These tool calls are partially populated with json partially parsed.
So for example when performing a search you may get:
ToolCall(..., {search: "hello" })
ToolCall(..., {search: "hello world" })
The library used to parse json is:
https://github.com/dgraham/json-stream
We use a fork cause we need access to the internal buffer.
This prepares internals to perform partial tool calls, but does not implement it yet.
This re-implements tool support in DiscourseAi::Completions::Llm #generate
Previously tool support was always returned via XML and it would be the responsibility of the caller to parse XML
New implementation has the endpoints return ToolCall objects.
Additionally this simplifies the Llm endpoint interface and gives it more clarity. Llms must implement
decode, decode_chunk (for streaming)
It is the implementers responsibility to figure out how to decode chunks, base no longer implements. To make this easy we ship a flexible json decoder which is easy to wire up.
Also (new)
Better debugging for PMs, we now have a next / previous button to see all the Llm messages associated with a PM
Token accounting is fixed for vllm (we were not correctly counting tokens)
Polymorphic RAG means that we will be able to access RAG fragments both from AiPersona and AiCustomTool
In turn this gives us support for richer RAG implementations.
* DEV: Remove old code now that features rely on LlmModels.
* Hide old settings and migrate persona llm overrides
* Remove shadowing special URL + seeding code. Use srv:// prefix instead.
This is a rather huge refactor with 1 new feature (tool details can
be suppressed)
Previously we use the name "Command" to describe "Tools", this unifies
all the internal language and simplifies the code.
We also amended the persona UI to use less DToggles which aligns
with our design guidelines.
Co-authored-by: Martin Brennan <martin@discourse.org>
* Well, it was quite a journey but now tools have "context" which
can be critical for the stuff they generate
This entire change was so Dall E and Artist generate images in the correct context
* FIX: improve error handling around image generation
- also corrects image markdown and clarifies code
* fix spec
This commit introduces a new feature for AI Personas called the "Question Consolidator LLM". The purpose of the Question Consolidator is to consolidate a user's latest question into a self-contained, context-rich question before querying the vector database for relevant fragments. This helps improve the quality and relevance of the retrieved fragments.
Previous to this change we used the last 10 interactions, this is not ideal cause the RAG would "lock on" to an answer.
EG:
- User: how many cars are there in europe
- Model: detailed answer about cars in europe including the term car and vehicle many times
- User: Nice, what about trains are there in the US
In the above example "trains" and "US" becomes very low signal given there are pages and pages talking about cars and europe. This mean retrieval is sub optimal.
Instead, we pass the history to the "question consolidator", it would simply consolidate the question to "How many trains are there in the United States", which would make it fare easier for the vector db to find relevant content.
The llm used for question consolidator can often be less powerful than the model you are talking to, we recommend using lighter weight and fast models cause the task is very simple. This is configurable from the persona ui.
This PR also removes support for {uploads} placeholder, this is too complicated to get right and we want freedom to shift RAG implementation.
Key changes:
1. Added a new `question_consolidator_llm` column to the `ai_personas` table to store the LLM model used for question consolidation.
2. Implemented the `QuestionConsolidator` module which handles the logic for consolidating the user's latest question. It extracts the relevant user and model messages from the conversation history, truncates them if needed to fit within the token limit, and generates a consolidated question prompt.
3. Updated the `Persona` class to use the Question Consolidator LLM (if configured) when crafting the RAG fragments prompt. It passes the conversation context to the consolidator to generate a self-contained question.
4. Added UI elements in the AI Persona editor to allow selecting the Question Consolidator LLM. Also made some UI tweaks to conditionally show/hide certain options based on persona configuration.
5. Wrote unit tests for the QuestionConsolidator module and updated existing persona tests to cover the new functionality.
This feature enables AI Personas to better understand the context and intent behind a user's question by consolidating the conversation history into a single, focused question. This can lead to more relevant and accurate responses from the AI assistant.
* FEATURE: allow tuning of RAG generation
- change chunking to be token based vs char based (which is more accurate)
- allow control over overlap / tokens per chunk and conversation snippets inserted
- UI to control new settings
* improve ui a bit
* fix various reindex issues
* reduce concurrency
* try ultra low queue ... concurrency 1 is too slow.
This PR lets you associate uploads to an AI persona, which we'll split and generate embeddings from. When building the system prompt to get a bot reply, we'll do a similarity search followed by a re-ranking (if available). This will let us find the most relevant fragments from the body of knowledge you associated with the persona, resulting in better, more informed responses.
For now, we'll only allow plain-text files, but this will change in the future.
Commits:
* FEATURE: RAG embeddings for the AI Bot
This first commit introduces a UI where admins can upload text files, which we'll store, split into fragments,
and generate embeddings of. In a next commit, we'll use those to give the bot additional information during
conversations.
* Basic asymmetric similarity search to provide guidance in system prompt
* Fix tests and lint
* Apply reranker to fragments
* Uploads filter, css adjustments and file validations
* Add placeholder for rag fragments
* Update annotations
Adds support for "name" on functions which can be used for tool calls
For function calls we need to keep track of id/name and previously
we only supported either
Also attempts to improve sql helper
Introduces a new AI Bot persona called 'GitHub Helper' which is specialized in assisting with GitHub-related tasks and questions. It includes the following key changes:
- Implements the GitHub Helper persona class with its system prompt and available tools
- Adds three new AI Bot tools for GitHub interactions:
- github_file_content: Retrieves content of files from a GitHub repository
- github_pull_request_diff: Retrieves the diff for a GitHub pull request
- github_search_code: Searches for code in a GitHub repository
- Updates the AI Bot dialects to support the new GitHub tools
- Implements multiple function calls for standard tool dialect
* FIX: support multiple tool calls
Prior to this change we had a hard limit of 1 tool call per llm
round trip. This meant you could not google multiple things at
once or perform searches across two tools.
Also:
- Hint when Google stops working
- Log topic_id / post_id when performing completions
* Also track id for title
This persona searches Discourse Meta for help with Discourse and
points users at relevant posts.
It is somewhat similar to using "Forum Helper" on meta, with the
notable difference that we can not lean on semantic search so using
some prompt engineering we try to keep it simple.
* REFACTOR: Represent generic prompts with an Object.
* Adds a bit more validation for clarity
* Rewrite bot title prompt and fix quirk handling
---------
Co-authored-by: Sam Saffron <sam.saffron@gmail.com>
DALL E command accepts an Array as a tool argument, this was not
parsed correctly by the invoker leading to errors generating
images with DALL E
Side quest ... don't use update! it calls validations and will now
fail due to email validation
* DEV: AI bot migration to the Llm pattern.
We added tool and conversation context support to the Llm service in discourse-ai#366, meaning we met all the conditions to migrate this module.
This PR migrates to the new pattern, meaning adding a new bot now requires minimal effort as long as the service supports it. On top of this, we introduce the concept of a "Playground" to separate the PM-specific bits from the completion, allowing us to use the bot in other contexts like chat in the future. Commands are called tools, and we simplified all the placeholder logic to perform updates in a single place, making the flow more one-wayish.
* Followup fixes based on testing
* Cleanup unused inference code
* FIX: text-based tools could be in the middle of a sentence
* GPT-4-turbo support
* Use new LLM API
Previous to this changeset we used a custom system for tools/command
support for Anthropic.
We defined commands by using !command as a signal to execute it
Following Anthropic Claude 2.1, there is an official supported syntax (beta)
for tools execution.
eg:
```
+ <function_calls>
+ <invoke>
+ <tool_name>image</tool_name>
+ <parameters>
+ <prompts>
+ [
+ "an oil painting",
+ "a cute fluffy orange",
+ "3 apple's",
+ "a cat"
+ ]
+ </prompts>
+ </parameters>
+ </invoke>
+ </function_calls>
```
This implements the spec per Anthropic, it should be stable enough
to also work on other LLMs.
Keep in mind that OpenAI is not impacted here at all, as it has its
own custom system for function calls.
Additionally:
- Fixes the title system prompt so it works with latest Anthropic
- Uses new spec for "system" messages by Anthropic
- Tweak forum helper persona to guide Anthropic a tiny be better
Overall results are pretty awesome and Anthropic Claude performs
really well now on Discourse
Introduces a UI to manage customizable personas (admin only feature)
Part of the change was some extensive internal refactoring:
- AIBot now has a persona set in the constructor, once set it never changes
- Command now takes in bot as a constructor param, so it has the correct persona and is not generating AIBot objects on the fly
- Added a .prettierignore file, due to the way ALE is configured in nvim it is a pre-req for prettier to work
- Adds a bunch of validations on the AIPersona model, system personas (artist/creative etc...) are all seeded. We now ensure
- name uniqueness, and only allow certain properties to be touched for system personas.
- (JS note) the client side design takes advantage of nested routes, the parent route for personas gets all the personas via this.store.findAll("ai-persona") then child routes simply reach into this model to find a particular persona.
- (JS note) data is sideloaded into the ai-persona model the meta property supplied from the controller, resultSetMeta
- This removes ai_bot_enabled_personas and ai_bot_enabled_chat_commands, both should be controlled from the UI on a per persona basis
- Fixes a long standing bug in token accounting ... we were doing to_json.length instead of to_json.to_s.length
- Amended it so {commands} are always inserted at the end unconditionally, no need to add it to the template of the system message as it just confuses things
- Adds a concept of required_commands to stock personas, these are commands that must be configured for this stock persona to show up.
- Refactored tests so we stop requiring inference_stubs, it was very confusing to need it, added to plugin.rb for now which at least is clearer
- Migrates the persona selector to gjs
---------
Co-authored-by: Joffrey JAFFEUX <j.jaffeux@gmail.com>
Co-authored-by: Martin Brennan <martin@discourse.org>
- New AiPersona model which can store custom personas
- Persona are restricted via group security
- They can contain custom system messages
- They can support a list of commands optionally
To avoid expensive DB calls in the serializer a Multisite friendly Hash was introduced (which can be expired on transaction commit)
This adds a new creative persona that has access to the underlying
model and no external integrations.
It allows people to use Claude/GPT models in a Discourse agnostic
way.
* FIX: Made bot more robust
This is a collection of small fixes
- Display "Searching for: ..." while searching instead of showing found 0 results.
- Only allow 5 commands in lang chain - 6 feels like too much
- On the 5th command stop informing the engine about functions, so it is forced to complete
- Add another 30 tokens of buffer and explain why
- Typo in command prompt
Co-authored-by: Alan Guo Xiang Tan <gxtan1990@gmail.com>
The researcher persona has access to Google and can perform
various internet research tasks. At the moment it can not read
web pages, but that is under consideration
Also adds ai_bot_enabled_personas so admins can tweak which stock
personas are enabled.
The new persona has a full listing of all site settings and is
able to get context for each setting.
This means you can ask it to search through settings for something
relevant.
Security wise there is no access to actual configuration of settings
just to the names / description and implementation.
Previously this was part of the forum helper persona however it
just clashes too much with other behaviors, isolating it makes
it far more powerful.
* sneaking this one in, user_emails is a non obvious table in our
structure.
usually one would assume users has emails so the clarifies a bit
better. plus it is a very common table to hit.
This splits out a bunch of code that used to live inside bots
into a dedicated concept called a Persona.
This allows us to start playing with multiple personas for the bot
Ships with:
artist - for making images
sql helper - for helping with data explorer
general - for everything and anything
Also includes a few fixes that make the generic LLM function implementation more robust