- Added Cohere Command models (Command, Command Light, Command R, Command R Plus) to the available model list
- Added a new site setting `ai_cohere_api_key` for configuring the Cohere API key
- Implemented a new `DiscourseAi::Completions::Endpoints::Cohere` class to handle interactions with the Cohere API, including:
- Translating request parameters to the Cohere API format
- Parsing Cohere API responses
- Supporting streaming and non-streaming completions
- Supporting "tools" which allow the model to call back to discourse to lookup additional information
- Implemented a new `DiscourseAi::Completions::Dialects::Command` class to translate between the generic Discourse AI prompt format and the Cohere Command format
- Added specs covering the new Cohere endpoint and dialect classes
- Updated `DiscourseAi::AiBot::Bot.guess_model` to map the new Cohere model to the appropriate bot user
In summary, this PR adds support for using the Cohere Command family of models with the Discourse AI plugin. It handles configuring API keys, making requests to the Cohere API, and translating between Discourse's generic prompt format and Cohere's specific format. Thorough test coverage was added for the new functionality.
BAAI/bge-m3 is an interesting model, that is multilingual and with a
context size of 8192. Even with a 16x larger context, it's only 4x slower
to compute it's embeddings on the worst case scenario.
Also includes a minor refactor of the rake task, including setting model
and concurrency levels when running the backfill task.
it is close in performance to GPT 4 at a fraction of the cost,
nice to add it to the mix.
Also improves a test case to simulate streaming, I am hunting for
the "calls" word that is jumping into function calls and can't quite
find it.
This PR lets you associate uploads to an AI persona, which we'll split and generate embeddings from. When building the system prompt to get a bot reply, we'll do a similarity search followed by a re-ranking (if available). This will let us find the most relevant fragments from the body of knowledge you associated with the persona, resulting in better, more informed responses.
For now, we'll only allow plain-text files, but this will change in the future.
Commits:
* FEATURE: RAG embeddings for the AI Bot
This first commit introduces a UI where admins can upload text files, which we'll store, split into fragments,
and generate embeddings of. In a next commit, we'll use those to give the bot additional information during
conversations.
* Basic asymmetric similarity search to provide guidance in system prompt
* Fix tests and lint
* Apply reranker to fragments
* Uploads filter, css adjustments and file validations
* Add placeholder for rag fragments
* Update annotations
This pull request makes several improvements and additions to the GitHub-related tools and personas in the `discourse-ai` repository:
1. It adds the `WebBrowser` tool to the `Researcher` persona, allowing the AI to visit web pages, retrieve HTML content, extract the main content, and convert it to plain text.
2. It updates the `GithubFileContent`, `GithubPullRequestDiff`, and `GithubSearchCode` tools to handle HTTP responses more robustly (introducing size limits).
3. It refactors the `send_http_request` method in the `Tool` class to follow redirects when specified, and to read the response body in chunks to avoid memory issues with large responses. (only for WebBrowser)
4. It updates the system prompt for the `Researcher` persona to provide more detailed guidance on when to use Google search vs web browsing, and how to optimize tool usage and reduce redundant requests.
5. It adds a new `web_browser_spec.rb` file with tests for the `WebBrowser` tool, covering various scenarios like handling different HTML structures and following redirects.
This commit adds the ability to enable vision for AI personas, allowing them to understand images that are posted in the conversation.
For personas with vision enabled, any images the user has posted will be resized to be within the configured max_pixels limit, base64 encoded and included in the prompt sent to the AI provider.
The persona editor allows enabling/disabling vision and has a dropdown to select the max supported image size (low, medium, high). Vision is disabled by default.
This initial vision support has been tested and implemented with Anthropic's claude-3 models which accept images in a special format as part of the prompt.
Other integrations will need to be updated to support images.
Several specs were added to test the new functionality at the persona, prompt building and API layers.
- Gemini is omitted, pending API support for Gemini 1.5. Current Gemini bot is not performing well, adding images is unlikely to make it perform any better.
- Open AI is omitted, vision support on GPT-4 it limited in that the API has no tool support when images are enabled so we would need to full back to a different prompting technique, something that would add lots of complexity
---------
Co-authored-by: Martin Brennan <martin@discourse.org>
* FEATURE: allow suppression of notifications from report generation
Previously we needed to do this by hand, unfortunately this uses up
too many tokens and is very hard to discover.
New option means that we can trivially disable notifications without
needing any prompt engineering.
* URI.parse is safer, use it
* FIX: Handle unicode on tokenizer
Our fast track code broke when strings had characters who are longer in tokens than
in UTF-8.
Admins can set `DISCOURSE_AI_STRICT_TOKEN_COUNTING: true` in app.yml to ensure token counting is strict, even if slower.
Co-authored-by: wozulong <sidle.pax_0e@icloud.com>
This allows users to share a static page of an AI conversation with
the rest of the world.
By default this feature is disabled, it is enabled by turning on
ai_bot_allow_public_sharing via site settings
Precautions are taken when sharing
1. We make a carbonite copy
2. We minimize work generating page
3. We limit to 100 interactions
4. Many security checks - including disallowing if there is a mix
of users in the PM.
* Bonus commit, large PRs like this PR did not work with github tool
large objects would destroy context
Co-authored-by: Martin Brennan <martin@discourse.org>
This PR adds AI semantic search to the search pop available on every page.
It depends on several new and optional settings, like per post embeddings and a reranker model, so this is an experimental endeavour.
---------
Co-authored-by: Rafael Silva <xfalcox@gmail.com>
Introduces a new AI Bot persona called 'GitHub Helper' which is specialized in assisting with GitHub-related tasks and questions. It includes the following key changes:
- Implements the GitHub Helper persona class with its system prompt and available tools
- Adds three new AI Bot tools for GitHub interactions:
- github_file_content: Retrieves content of files from a GitHub repository
- github_pull_request_diff: Retrieves the diff for a GitHub pull request
- github_search_code: Searches for code in a GitHub repository
- Updates the AI Bot dialects to support the new GitHub tools
- Implements multiple function calls for standard tool dialect
This provides new support for messages API from Claude.
It is required for latest model access.
Also corrects implementation of function calls.
* Fix message interleving
* fix broken spec
* add new models to automation
This PR adds a new feature where you can generate captions for images in the composer using AI.
---------
Co-authored-by: Rafael Silva <xfalcox@gmail.com>
This persona searches Discourse Meta for help with Discourse and
points users at relevant posts.
It is somewhat similar to using "Forum Helper" on meta, with the
notable difference that we can not lean on semantic search so using
some prompt engineering we try to keep it simple.
Affects the following settings:
ai_toxicity_groups_bypass
ai_helper_allowed_groups
ai_helper_custom_prompts_allowed_groups
post_ai_helper_allowed_groups
This turns off client: true for these group-based settings,
because there is no guarantee that the current user gets all
their group memberships serialized to the client. Better to check
server-side first.
1. Personas are now optionally mentionable, meaning that you can mention them either from public topics or PMs
- Mentioning from PMs helps "switch" persona mid conversation, meaning if you want to look up sites setting you can invoke the site setting bot, or if you want to generate an image you can invoke dall e
- Mentioning outside of PMs allows you to inject a bot reply in a topic trivially
- We also add the support for max_context_posts this allow you to limit the amount of context you feed in, which can help control costs
2. Add support for a "random picker" tool that can be used to pick random numbers
3. Clean up routing ai_personas -> ai-personas
4. Add Max Context Posts so users can control how much history a persona can consume (this is important for mentionable personas)
Co-authored-by: Martin Brennan <martin@discourse.org>
* FEATURE: allow personas to supply top_p and temperature params
Code assistance generally are more focused at a lower temperature
This amends it so SQL Helper runs at 0.2 temperature vs the more
common default across LLMs of 1.0.
Reduced temperature leads to more focused, concise and predictable
answers for the SQL Helper
* fix tests
* This is not perfect, but far better than what we do today
Instead of fishing for
1. Draft sequence
2. Draft body
We skip (2), this means the composer "only" needs 1 http request to
open, we also want to eliminate (1) but it is a bit of a trickier
core change, may figure out how to pull it off (defer it to first draft save)
Value of bot drafts < value of opening bot conversations really fast
- Allow users to supply top_p and temperature values, which means people can fine tune randomness
- Fix bad localization string
- Fix bad remapping of max tokens in gemini
- Add support for top_p as a general param to llms
- Amend system prompt so persona stops treating a user as an adversary
* UX: Validations to Llm-backed features (except AI Bot)
This change is part of an ongoing effort to prevent enabling a broken feature due to lack of configuration. We also want to explicit which provider we are going to use. For example, Claude models are available through AWS Bedrock and Anthropic, but the configuration differs.
Validations are:
* You must choose a model before enabling the feature.
* You must turn off the feature before setting the model to blank.
* You must configure each model settings before being able to select it.
* Add provider name to summarization options
* vLLM can technically support same models as HF
* Check we can talk to the selected model
* Check for Bedrock instead of anthropic as a site could have both creds setup
* FEATURE: add support for new OpenAI embedding models
This adds support for just released text_embedding_3_small and large
Note, we have not yet implemented truncation support which is a
new API feature. (triggered using dimensions)
* Tiny side fix, recalc bots when ai is enabled or disabled
* FIX: downsample to 2000 items per vector which is a pgvector limitation
Account properly for function calls, don't stream through <details> blocks
- Rush cooked content back to client
- Wait longer (up to 60 seconds) before giving up on streaming
- Clean up message bus channels so we don't have leftover data
- Make ai streamer much more reusable and much easier to read
- If buffer grows quickly, rush update so you are not artificially waiting
- Refine prompt interface
- Fix lost system message when prompt gets long
This allows admins to configure services with multiple backends using DNS SRV records. This PR also adds support for shared secret auth via headers for TEI and vLLM endpoints, so they are inline with the other ones.
Previous to this change it was very hard to tell if completion was
stuck or not.
This introduces a "dot" that follows the completion and starts
flashing after 5 seconds.
It also corrects the syntax around tool support, which was wrong.
Gemini doesn't want us to include messages about previous tool invocations, so I had to shuffle around some code to send the response it generated from those invocations instead. For this, I created the "multi_turn" context, which bundles all the context involved in the interaction.
* DEV: AI bot migration to the Llm pattern.
We added tool and conversation context support to the Llm service in discourse-ai#366, meaning we met all the conditions to migrate this module.
This PR migrates to the new pattern, meaning adding a new bot now requires minimal effort as long as the service supports it. On top of this, we introduce the concept of a "Playground" to separate the PM-specific bits from the completion, allowing us to use the bot in other contexts like chat in the future. Commands are called tools, and we simplified all the placeholder logic to perform updates in a single place, making the flow more one-wayish.
* Followup fixes based on testing
* Cleanup unused inference code
* FIX: text-based tools could be in the middle of a sentence
* GPT-4-turbo support
* Use new LLM API
* FEATURE: allow easy sharing of bot conversations
* Lean on new core API i
* Added system spec for copy functionality
* Update assets/javascripts/initializers/ai-bot-replies.js
Co-authored-by: Alan Guo Xiang Tan <gxtan1990@gmail.com>
* discourse later insted of setTimeout
* Update spec/system/ai_bot/share_spec.rb
Co-authored-by: Alan Guo Xiang Tan <gxtan1990@gmail.com>
* feedback from review
just check the whole payload
* remove uneeded code
* fix spec
---------
Co-authored-by: Alan Guo Xiang Tan <gxtan1990@gmail.com>
Introduce a Discourse Automation based periodical report. Depends on Discourse Automation.
Report works best with very large context language models such as GPT-4-Turbo and Claude 2.
- Introduces final_insts to generic llm format, for claude to work best it is better to guide the last assistant message (we should add this to other spots as well)
- Adds GPT-4 turbo support to generic llm interface
We were limiting to 20 results unconditionally cause we had to make
sure search always fit in an 8k context window.
Models such as GPT 3.5 Turbo (16k) and GPT 4 Turbo / Claude 2.1 (over 150k)
allow us to return a lot more results.
This means we have a much richer understanding cause context is far
larger.
This also allows a persona to tweak this number, in some cases admin
may want to be conservative and save on tokens by limiting results
This also tweaks the `limit` param which GPT-4 liked to set to tell
model only to use it when it needs to (and describes default behavior)
Keep in mind:
- GPT-4 is only going to be fully released next year - so this hardcodes preview model for now
- Fixes streaming bugs which became a big problem with GPT-4 turbo
- Adds Azure endpoing for turbo as well
Co-authored-by: Martin Brennan <martin@discourse.org>
Personas now support providing options for commands.
This PR introduces a single option "base_query" for the SearchCommand. When supplied all searches the persona will perform will also include the pre-supplied filter.
This can allow personas to search a subset of the forum (such as documentation)
This system is extensible we can add options to any command trivially.
* FEATURE: User sentiment on profile summary page
This introduces a new user stat in a user profile summary page.
It will show either neutral/positive/negative according to the dominant
sentiment in the user last interactions.
The user-stat widget is only rendered for staff.
Co-authored-by: Keegan George <kgeorge13@gmail.com>
Previous to this change we relied on explicit loading for a files in Discourse AI.
This had a few downsides:
- Busywork whenever you add a file (an extra require relative)
- We were not keeping to conventions internally ... some places were OpenAI others are OpenAi
- Autoloader did not work which lead to lots of full application broken reloads when developing.
This moves all of DiscourseAI into a Zeitwerk compatible structure.
It also leaves some minimal amount of manual loading (automation - which is loading into an existing namespace that may or may not be there)
To avoid needing /lib/discourse_ai/... we mount a namespace thus we are able to keep /lib pointed at ::DiscourseAi
Various files were renamed to get around zeitwerk rules and minimize usage of custom inflections
Though we can get custom inflections to work it is not worth it, will require a Discourse core patch which means we create a hard dependency.
* FEATURE: Azure OpenAI support for DALL*E 3
Previous to this there was no way to add an inference endpoint for
DALL*E on Azure cause it requires custom URLs
Also:
- On save, when editing a persona it would revert priority and enabled
- More forgiving parsing in command framework for array function calls
- By default generate HD images - they tend to be a bit better
- Improve DALL*E prompt which was getting very annoying and always echoing what it is about to do
- Add a bit of a sleep between retries on image generation
- Fix error handling in image_command
* FIX: no selected persona should pick first prioritized one
Previously we were looking at `.personaId` but there is only an
id attribute so it failed
* FEATURE: new DALL-E-3 persona
This persona generates images using DALL-E-3 API and is enabled
by default
Keep in mind that we are still waiting on seeds/gen_id so we can
not retain style consistently between turns.
This will change as soon as a new Open AI API provides the missing
parameters
Co-authored-by: Martin Brennan <martin@discourse.org>
Introduces a UI to manage customizable personas (admin only feature)
Part of the change was some extensive internal refactoring:
- AIBot now has a persona set in the constructor, once set it never changes
- Command now takes in bot as a constructor param, so it has the correct persona and is not generating AIBot objects on the fly
- Added a .prettierignore file, due to the way ALE is configured in nvim it is a pre-req for prettier to work
- Adds a bunch of validations on the AIPersona model, system personas (artist/creative etc...) are all seeded. We now ensure
- name uniqueness, and only allow certain properties to be touched for system personas.
- (JS note) the client side design takes advantage of nested routes, the parent route for personas gets all the personas via this.store.findAll("ai-persona") then child routes simply reach into this model to find a particular persona.
- (JS note) data is sideloaded into the ai-persona model the meta property supplied from the controller, resultSetMeta
- This removes ai_bot_enabled_personas and ai_bot_enabled_chat_commands, both should be controlled from the UI on a per persona basis
- Fixes a long standing bug in token accounting ... we were doing to_json.length instead of to_json.to_s.length
- Amended it so {commands} are always inserted at the end unconditionally, no need to add it to the template of the system message as it just confuses things
- Adds a concept of required_commands to stock personas, these are commands that must be configured for this stock persona to show up.
- Refactored tests so we stop requiring inference_stubs, it was very confusing to need it, added to plugin.rb for now which at least is clearer
- Migrates the persona selector to gjs
---------
Co-authored-by: Joffrey JAFFEUX <j.jaffeux@gmail.com>
Co-authored-by: Martin Brennan <martin@discourse.org>
This PR aims to clarify sentiment reports by replacing averages with a count of posts that have one of their values above a threshold (60), meaning we have some level of confidence they are, in fact, positive or negative.
Same thing happen with post emotions, with the difference that a post can have multiple values above it (30). Additionally, we dropped the "Neutral" axis.
We also reworded the tooltip next to each report title, and added an early return to signal we have no data available instead of displaying an empty chart.