Account properly for function calls, don't stream through <details> blocks
- Rush cooked content back to client
- Wait longer (up to 60 seconds) before giving up on streaming
- Clean up message bus channels so we don't have leftover data
- Make ai streamer much more reusable and much easier to read
- If buffer grows quickly, rush update so you are not artificially waiting
- Refine prompt interface
- Fix lost system message when prompt gets long
This allows admins to configure services with multiple backends using DNS SRV records. This PR also adds support for shared secret auth via headers for TEI and vLLM endpoints, so they are inline with the other ones.
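A rough sketch of how picking one backend out of a set of SRV records could work. `SrvRecord` and `pick_backend` are hypothetical names used for illustration, not the plugin's code. A real lookup would get its records from Ruby's `Resolv::DNS#getresources` with `Resolv::DNS::Resource::IN::SRV`; here they are hardcoded so the example runs offline.

```ruby
# Hypothetical stand-in for a resolved SRV record.
SrvRecord = Struct.new(:priority, :weight, :port, :target)

# Lowest priority wins; ties are broken by a weighted random choice.
def pick_backend(records, random: Random.new)
  return nil if records.empty?

  lowest = records.map(&:priority).min
  candidates = records.select { |r| r.priority == lowest }

  total = candidates.sum(&:weight)
  return candidates.first if total.zero?

  roll = random.rand(total) # integer in 0...total
  candidates.each do |record|
    return record if roll < record.weight
    roll -= record.weight
  end
  candidates.last
end

records = [
  SrvRecord.new(10, 5, 8080, "tei-1.internal"),
  SrvRecord.new(10, 5, 8080, "tei-2.internal"),
  SrvRecord.new(20, 1, 8080, "tei-backup.internal"),
]
backend = pick_backend(records)
puts "http://#{backend.target}:#{backend.port}"
```

The backup host is only picked when no record with a lower priority value exists.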
Previous to this change it was very hard to tell if completion was
stuck or not.
This introduces a "dot" that follows the completion and starts
flashing after 5 seconds.
It also corrects the syntax around tool support, which was wrong.
Gemini doesn't want us to include messages about previous tool invocations, so I had to shuffle around some code to send the response it generated from those invocations instead. For this, I created the "multi_turn" context, which bundles all the context involved in the interaction.
* DEV: AI bot migration to the Llm pattern.
We added tool and conversation context support to the Llm service in discourse-ai#366, meaning we met all the conditions to migrate this module.
This PR migrates to the new pattern, meaning adding a new bot now requires minimal effort as long as the service supports it. On top of this, we introduce the concept of a "Playground" to separate the PM-specific bits from the completion, allowing us to use the bot in other contexts like chat in the future. Commands are called tools, and we simplified all the placeholder logic to perform updates in a single place, making the flow more one-wayish.
* Followup fixes based on testing
* Cleanup unused inference code
* FIX: text-based tools could be in the middle of a sentence
* GPT-4-turbo support
* Use new LLM API
* FEATURE: allow easy sharing of bot conversations
* Lean on new core API i
* Added system spec for copy functionality
* Update assets/javascripts/initializers/ai-bot-replies.js
Co-authored-by: Alan Guo Xiang Tan <gxtan1990@gmail.com>
* discourse later instead of setTimeout
* Update spec/system/ai_bot/share_spec.rb
Co-authored-by: Alan Guo Xiang Tan <gxtan1990@gmail.com>
* feedback from review
just check the whole payload
* remove unneeded code
* fix spec
---------
Co-authored-by: Alan Guo Xiang Tan <gxtan1990@gmail.com>
Introduce a Discourse Automation based periodical report. Depends on Discourse Automation.
Report works best with very large context language models such as GPT-4-Turbo and Claude 2.
- Introduces final_insts to generic llm format, for claude to work best it is better to guide the last assistant message (we should add this to other spots as well)
- Adds GPT-4 turbo support to generic llm interface
We were limiting to 20 results unconditionally because we had to make
sure search always fit in an 8k context window.
Models such as GPT 3.5 Turbo (16k) and GPT 4 Turbo / Claude 2.1 (over 150k)
allow us to return a lot more results.
This means we have a much richer understanding because context is far
larger.
This also allows a persona to tweak this number; in some cases an admin
may want to be conservative and save on tokens by limiting results.
This also tweaks the `limit` param which GPT-4 liked to set to tell
model only to use it when it needs to (and describes default behavior)
Keep in mind:
- GPT-4 is only going to be fully released next year - so this hardcodes preview model for now
- Fixes streaming bugs which became a big problem with GPT-4 turbo
- Adds Azure endpoint for turbo as well
Co-authored-by: Martin Brennan <martin@discourse.org>
Personas now support providing options for commands.
This PR introduces a single option "base_query" for the SearchCommand. When supplied all searches the persona will perform will also include the pre-supplied filter.
This can allow personas to search a subset of the forum (such as documentation)
This system is extensible; we can add options to any command trivially.
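A minimal sketch of how a pre-supplied filter could be combined with the query the model asks for. `build_search_query` is a hypothetical helper for illustration, not the SearchCommand implementation, and the `category:documentation` filter is just an example value.

```ruby
# Hypothetical helper: prepend the persona's base_query (if any) to the model's query.
def build_search_query(user_query, base_query: nil)
  [base_query, user_query]
    .map { |part| part.to_s.strip }
    .reject(&:empty?)
    .join(" ")
end

puts build_search_query("install plugin", base_query: "category:documentation")
puts build_search_query("install plugin")
```

With no `base_query` configured, the search runs exactly as the model requested it.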
* FEATURE: User sentiment on profile summary page
This introduces a new user stat in a user profile summary page.
It will show either neutral/positive/negative according to the dominant
sentiment in the user's last interactions.
The user-stat widget is only rendered for staff.
Co-authored-by: Keegan George <kgeorge13@gmail.com>
Previous to this change we relied on explicit loading for files in Discourse AI.
This had a few downsides:
- Busywork whenever you add a file (an extra `require_relative`)
- We were not keeping to conventions internally ... some places were OpenAI others are OpenAi
- Autoloader did not work which led to lots of full application broken reloads when developing.
This moves all of DiscourseAI into a Zeitwerk compatible structure.
It also leaves some minimal amount of manual loading (automation - which is loading into an existing namespace that may or may not be there)
To avoid needing /lib/discourse_ai/... we mount a namespace thus we are able to keep /lib pointed at ::DiscourseAi
Various files were renamed to get around zeitwerk rules and minimize usage of custom inflections
Though we can get custom inflections to work, it is not worth it: it would require a Discourse core patch, which means we create a hard dependency.
* FEATURE: Azure OpenAI support for DALL*E 3
Previous to this there was no way to add an inference endpoint for
DALL*E on Azure because it requires custom URLs
Also:
- On save, when editing a persona it would revert priority and enabled
- More forgiving parsing in command framework for array function calls
- By default generate HD images - they tend to be a bit better
- Improve DALL*E prompt which was getting very annoying and always echoing what it is about to do
- Add a bit of a sleep between retries on image generation
- Fix error handling in image_command
* FIX: no selected persona should pick first prioritized one
Previously we were looking at `.personaId` but there is only an
id attribute so it failed
* FEATURE: new DALL-E-3 persona
This persona generates images using DALL-E-3 API and is enabled
by default
Keep in mind that we are still waiting on seeds/gen_id so we can
not retain style consistently between turns.
This will change as soon as a new Open AI API provides the missing
parameters
Co-authored-by: Martin Brennan <martin@discourse.org>
Introduces a UI to manage customizable personas (admin only feature)
Part of the change was some extensive internal refactoring:
- AIBot now has a persona set in the constructor, once set it never changes
- Command now takes in bot as a constructor param, so it has the correct persona and is not generating AIBot objects on the fly
- Added a .prettierignore file, due to the way ALE is configured in nvim it is a pre-req for prettier to work
- Adds a bunch of validations on the AIPersona model, system personas (artist/creative etc...) are all seeded. We now ensure name uniqueness, and only allow certain properties to be touched for system personas.
- (JS note) the client side design takes advantage of nested routes, the parent route for personas gets all the personas via this.store.findAll("ai-persona") then child routes simply reach into this model to find a particular persona.
- (JS note) data is sideloaded into the ai-persona model via the meta property supplied from the controller, resultSetMeta
- This removes ai_bot_enabled_personas and ai_bot_enabled_chat_commands, both should be controlled from the UI on a per persona basis
- Fixes a long standing bug in token accounting ... we were doing to_json.length instead of to_json.to_s.length
- Amended it so {commands} are always inserted at the end unconditionally, no need to add it to the template of the system message as it just confuses things
- Adds a concept of required_commands to stock personas, these are commands that must be configured for this stock persona to show up.
- Refactored tests so we stop requiring inference_stubs, it was very confusing to need it, added to plugin.rb for now which at least is clearer
- Migrates the persona selector to gjs
---------
Co-authored-by: Joffrey JAFFEUX <j.jaffeux@gmail.com>
Co-authored-by: Martin Brennan <martin@discourse.org>
This PR aims to clarify sentiment reports by replacing averages with a count of posts that have one of their values above a threshold (60), meaning we have some level of confidence they are, in fact, positive or negative.
The same thing happens with post emotions, with the difference that a post can have multiple values above the threshold (30). Additionally, we dropped the "Neutral" axis.
We also reworded the tooltip next to each report title, and added an early return to signal we have no data available instead of displaying an empty chart.
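To illustrate the counting approach, here is a small sketch. The input shape (one hash of 0-100 scores per post) and the `count_sentiments` name are assumptions for the example; only the threshold of 60 comes from the description above.

```ruby
SENTIMENT_THRESHOLD = 60 # value quoted in the description

# Count posts whose positive or negative score clears the threshold,
# instead of averaging the scores.
def count_sentiments(posts, threshold: SENTIMENT_THRESHOLD)
  counts = { positive: 0, negative: 0 }
  posts.each do |scores|
    counts[:positive] += 1 if scores.fetch(:positive, 0) > threshold
    counts[:negative] += 1 if scores.fetch(:negative, 0) > threshold
  end
  counts
end

posts = [
  { positive: 80, negative: 5 },
  { positive: 55, negative: 40 }, # neither score is confident enough to count
  { positive: 10, negative: 75 },
]
p count_sentiments(posts)
```

A post with lukewarm scores no longer drags an average around; it simply is not counted.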
This PR adds new reports for displaying information about post sentiments grouped by date and emotions grouped by TL.
Depends on discourse/discourse#24274
To ease the administrative burden of enabling the embeddings model, this change introduces automatic backfill when the setting is enabled. It also moves the topic visit embedding creation to a lower priority queue in sidekiq and adds an option to skip embedding computation and persistence when we match on the digest.
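The digest check could look roughly like this. `EmbeddingStore` is a hypothetical in-memory stand-in (the plugin persists to the database), and SHA1 is an assumed choice of digest.

```ruby
require "digest"

# Hypothetical in-memory store keyed by topic id.
class EmbeddingStore
  def initialize(&embedder)
    @embedder = embedder
    @rows = {}
  end

  # Returns :skipped when the stored digest matches the text, :computed otherwise.
  def upsert(topic_id, text)
    digest = Digest::SHA1.hexdigest(text)
    row = @rows[topic_id]
    return :skipped if row && row[:digest] == digest

    @rows[topic_id] = { digest: digest, vector: @embedder.call(text) }
    :computed
  end
end

# Toy embedder so the example runs without a model.
store = EmbeddingStore.new { |text| [text.length.to_f] }
puts store.upsert(1, "Hello world")  # first visit, embedding is computed
puts store.upsert(1, "Hello world")  # unchanged text, nothing to do
puts store.upsert(1, "Hello world!") # text changed, embedding is recomputed
```

The expensive embedding call only happens when the text actually changed.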
Adds an AI Helper function when selecting text while viewing a topic.
---------
Co-authored-by: Keegan George <kgeorge13@gmail.com>
Co-authored-by: Roman Rizzi <roman@discourse.org>
Per: https://platform.openai.com/docs/api-reference/authentication
There is an organization option which is useful for large orgs
> For users who belong to multiple organizations, you can pass a header to specify which organization is used for an API request. Usage from these API requests will count against the specified organization's subscription quota.
The new automation rule can be used to perform llm based classification and categorization of topics.
You specify a system prompt (which has %%POST%% as an input), if it returns a particular piece of text then we will apply rules such as tagging, hiding, replying or categorizing.
This can be used as a spam filter, a "oops you are in the wrong place" filter and so on.
Co-authored-by: Joffrey JAFFEUX <j.jaffeux@gmail.com>
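A sketch of the triage rule described above, under assumptions: `triage` is a hypothetical helper, and `llm` is any callable that takes a prompt and returns text, standing in for a real completion call.

```ruby
# Substitute the post into the system prompt, ask the model, and report
# whether the reply contains the text the rule is looking for.
def triage(post_raw, system_prompt:, search_for:, llm:)
  # Block form so backslashes in the post are not treated as replacement escapes.
  prompt = system_prompt.gsub("%%POST%%") { post_raw }
  response = llm.call(prompt)
  response.to_s.downcase.include?(search_for.downcase)
end

# Fake model so the example runs offline.
fake_llm = ->(prompt) { prompt.include?("cheap watches") ? "SPAM" : "OK" }
system_prompt = "Answer SPAM or OK for this post:\n%%POST%%"

if triage("Buy cheap watches now", system_prompt: system_prompt, search_for: "spam", llm: fake_llm)
  puts "would tag, hide, reply or recategorize here"
end
```

When `triage` returns true the automation would apply whichever actions the admin configured.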
If a module LLM model is set to claude-2 and the ai_bedrock variables are all present we will use AWS Bedrock instead of Anthropic's own APIs.
This is quite hacky, but will allow us to test the waters with AWS Bedrock early access with every module.
This situation of "same module, completely different API" is quite far from what we had in the OpenAI/Azure separation, so it's more food for thought for when we start working on the LLM abstraction layer soon this year.
This adds a new creative persona that has access to the underlying
model and no external integrations.
It allows people to use Claude/GPT models in a Discourse agnostic
way.
We pass the text to the current LLM and ask it to generate a Stable Diffusion prompt.
We'll use that to generate 4 samples, temporarily creating uploads and returning their short URLs.
* FIX: Made bot more robust
This is a collection of small fixes
- Display "Searching for: ..." while searching instead of showing found 0 results.
- Only allow 5 commands in lang chain - 6 feels like too much
- On the 5th command stop informing the engine about functions, so it is forced to complete
- Add another 30 tokens of buffer and explain why
- Typo in command prompt
Co-authored-by: Alan Guo Xiang Tan <gxtan1990@gmail.com>
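One way to read the command limit from the list above, as a sketch. `run_chain` and the `complete` callable are hypothetical; `complete` receives the messages and the functions on offer and returns either `{ command: "name" }` or `{ text: "..." }`.

```ruby
MAX_COMMANDS = 5

def run_chain(complete, functions)
  messages = []
  commands_run = 0
  loop do
    # Once the limit is hit, stop offering functions so the model has to answer in text.
    offered = commands_run < MAX_COMMANDS ? functions : []
    reply = complete.call(messages, offered)
    return reply[:text] if reply[:text] || offered.empty?

    commands_run += 1
    messages << "ran #{reply[:command]}"
  end
end

# Fake model that keeps asking for a search for as long as it is allowed to.
greedy = ->(_messages, offered) { offered.empty? ? { text: "done" } : { command: "search" } }
puts run_chain(greedy, ["search"])
```

Even a model that always wants another command is forced to finish after five of them.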
* FEATURE: HyDE-powered semantic search.
It relies on the new outlet added on discourse/discourse#23390 to display semantic search results in an unobtrusive way.
We'll use a HyDE-backed approach for semantic search, which consists of generating a hypothetical document from the given keywords, which gets transformed into a vector and used in an asymmetric similarity topic search.
This PR also reorganizes the internals to have less moving parts, maintaining one hierarchy of DAOish classes for vector-related operations like transformations and querying.
Completions and vectors created by HyDE will remain cached on Redis for now, but we could later use Postgres instead.
* Missing translation and rate limiting
---------
Co-authored-by: Roman Rizzi <rizziromanalejandro@gmail.com>
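The HyDE flow can be sketched like this. Everything here is a toy: `hyde_search`, the canned `llm` lambda and the bag-of-words `embed` lambda are assumptions so the example runs offline, and cosine similarity is an assumed metric. The plugin uses real completions and real embeddings.

```ruby
def cosine(a, b)
  dot = a.zip(b).sum { |x, y| x * y }
  norm = Math.sqrt(a.sum { |x| x * x }) * Math.sqrt(b.sum { |x| x * x })
  norm.zero? ? 0.0 : dot / norm
end

# 1. ask the LLM for a hypothetical document, 2. embed it, 3. rank topics by similarity.
def hyde_search(keywords, topics:, llm:, embed:, limit: 3)
  hypothetical = llm.call("Write a forum post about: #{keywords}")
  query_vector = embed.call(hypothetical)
  topics
    .map { |topic| [topic[:title], cosine(query_vector, topic[:vector])] }
    .sort_by { |_, score| -score }
    .first(limit)
end

VOCAB = %w[backup restore email login].freeze
embed = ->(text) { VOCAB.map { |word| text.downcase.scan(word).size.to_f } }
llm = ->(_prompt) { "To restore a backup, open the backups page and restore the backup file." }

titles = ["How to restore a backup", "Email login problems"]
topics = titles.map { |title| { title: title, vector: embed.call(title) } }

hyde_search("backup restore", topics: topics, llm: llm, embed: embed).each do |title, score|
  puts "#{title}: #{score.round(3)}"
end
```

The search is asymmetric in the sense that a long generated document, not the short keyword query, is what gets compared against the stored topic vectors.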
The researcher persona has access to Google and can perform
various internet research tasks. At the moment it can not read
web pages, but that is under consideration
Previous to this change we relied on client side settings to
determine if an end user has access to the ai bot.
This meant that if a user was not aware they are a member of a
group (as it is with restricted visibility ones) they would not
see the bot button.
All checking has now moved to the server side, and tests were
added to cover.
This refactor changes it so we only include minimal data in the
system prompt which leaves us lots of tokens for specific searches
The new search command allows us to pull in settings on demand
Descriptions are included in short search results, and names only
in longer results
Also:
* In dev it is important to tell when calls are made to open ai
this adds a console log to increase awareness around token usage
* PERF: stop counting tokens so often
This changes it so we only count tokens once per response
Previously each time we heard back from open ai we would count
tokens, leading to unneeded delays
* bug fix, commands may reach in for tokenizer
* add logging to console for anthropic calls as well
* Update lib/shared/inference/openai_completions.rb
Co-authored-by: Martin Brennan <mjrbrennan@gmail.com>
Also adds ai_bot_enabled_personas so admins can tweak which stock
personas are enabled.
The new persona has a full listing of all site settings and is
able to get context for each setting.
This means you can ask it to search through settings for something
relevant.
Security wise there is no access to actual configuration of settings
just to the names / description and implementation.
Previously this was part of the forum helper persona however it
just clashes too much with other behaviors, isolating it makes
it far more powerful.
* sneaking this one in, user_emails is a non obvious table in our
structure.
usually one would assume users has emails so this clarifies a bit
better. plus it is a very common table to hit.
This splits out a bunch of code that used to live inside bots
into a dedicated concept called a Persona.
This allows us to start playing with multiple personas for the bot
Ships with:
artist - for making images
sql helper - for helping with data explorer
general - for everything and anything
Also includes a few fixes that make the generic LLM function implementation more robust
This command can be used to extract information about a discourse
site setting directly from source.
To operate it needs the rg binary in the container.
Besides updating the connector using the new tracking preference service interface, this PR fixes a bug where due to `ai_embeddings_semantic_related_topics_enabled` not having `client: true` the initializer never ran, and we didn't show the related topics list when scrolling to the bottom of a long topic.
Azure requires a single HTTP endpoint per type of completion.
The settings: `ai_openai_gpt35_16k_url` and `ai_openai_gpt4_32k_url` can be
used now to configure the extra endpoints
This amends token limit which was off a bit due to function calls and fixes
a minor JS issue where we were not testing for a property
* FEATURE: optional warning attached to all AI bot conversations
This commit introduces `ai_bot_enable_chat_warning` which can be used
to warn people prior to starting a chat with the bot.
In particular this is useful if moderators are regularly reading chat
transcripts as it sets expectations early.
By default this is disabled.
Also:
- Stops making ajax call prior to opening composer
- Hides PM title when starting a bot PM
Co-authored-by: Rafael dos Santos Silva <xfalcox@gmail.com>
- Attempt to hint reading is done by sending complete:true
- Do not include post_number in result unless it was sent in
- Rush visual feedback when a command is run (ensure we always revise)
- Include hyperlink in read command description
- Stop round tripping to GPT after image generation (speeds up images by a lot)
- Add a test for image command
This command is useful for reading a topic's content. It allows us to perform
critical analysis or suggest answers.
Given 8k token limit in GPT-4 I hardcoded reading to 1500 tokens, but we can
follow up and allow larger windows on models that support more tokens.
On local testing even in this limited form this can be very useful.
* FEATURE: Add support for StableBeluga and Upstage Llama2 instruct
This means we support all models in the top3 of the Open LLM Leaderboard
Since some of those models have RoPE, we now have a setting so you can
customize the token limit depending which model you use.
Claude 1 costs the same and is less good than Claude 2. Make use of Claude
2 in all spots ...
This also fixes streaming so it uses the far more efficient streaming protocol.
* FEATURE: Embeddings to main db
This commit moves our embeddings store from an external configurable PostgreSQL
instance back into the main database. This is done to simplify the setup.
There is a migration that will try to import the external embeddings into
the main DB if it is configured and there are rows.
It removes support for embeddings models that aren't all_mpnet_base_v2 or OpenAI
text_embedding_ada_002. However it will now be easier to add new models.
It also now takes into account:
- topic title
- topic category
- topic tags
- replies (as much as the model allows)
We introduce an interface so we can eventually support multiple strategies
for handling long topics.
This PR severely damages the semantic search performance, but this is a
temporary until we can adapt HyDE to make semantic search use the same
embeddings we have for semantic related with good performance.
Here we also have some ground work to add post level embeddings, but this
will be added in a future PR.
Please note that this PR will also block Discourse from booting / updating if
this plugin is installed and the pgvector extension isn't available on the
PostgreSQL instance Discourse uses.
* FEATURE: add ai_bot_enabled_chat commands and tune search
This allows admins to disable/enable GPT command integrations.
Also hones search results which were looping because the result did not denote
the failure properly (it lost context)
* include more context for google command
include more context for time command
* type
The new site settings:
ai_openai_gpt35_url : distribution for GPT 16k
ai_openai_gpt4_url: distribution for GPT 4
ai_openai_embeddings_url: distribution for ada2
If untouched we will simply use OpenAI endpoints.
Azure requires 1 URL per model, OpenAI allows a single URL to serve multiple models. Hence the new settings.
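A sketch of the fallback behaviour, assuming a blank setting means "use OpenAI". The setting names come from the list above; `endpoint_for`, the `overrides` hash and the Azure URL are illustrative, not the plugin's code.

```ruby
OPENAI_DEFAULTS = {
  ai_openai_gpt35_url: "https://api.openai.com/v1/chat/completions",
  ai_openai_gpt4_url: "https://api.openai.com/v1/chat/completions",
  ai_openai_embeddings_url: "https://api.openai.com/v1/embeddings",
}.freeze

# Return the admin-configured URL when present, otherwise the OpenAI endpoint.
def endpoint_for(setting_name, overrides = {})
  configured = overrides[setting_name].to_s.strip
  configured.empty? ? OPENAI_DEFAULTS.fetch(setting_name) : configured
end

# Illustrative Azure deployment URL; the resource and deployment names are made up.
azure = {
  ai_openai_gpt4_url:
    "https://example-resource.openai.azure.com/openai/deployments/gpt4/chat/completions?api-version=2023-05-15",
}

puts endpoint_for(:ai_openai_gpt4_url, azure)
puts endpoint_for(:ai_openai_gpt35_url, azure)
```

On OpenAI both chat models share one URL, while on Azure each deployment needs its own, which is why there is one setting per model.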
```
prompt << build_message(bot_user.username, reply)
```
Would store a "cooked" prompt which is invalid, instead just store the raw
values which are later passed to build_message
Additionally:
1. Disable summary command which needs honing
2. Stop storing decorations (searched for X) in prompt which leads to straying
3. Ship username directly to model, avoiding "user: content" in prompts. This
was causing GPT to stray
* DEV: Remove the summarization feature
Instead, we'll register summarization implementations for OpenAI, Anthropic, and Discourse AI using the API defined in discourse/discourse#21813.
Core and chat will implement features on top of these implementations instead of this plugin extending them.
* Register instances that contain the model, requiring less site settings
* FEATURE: introduce a more efficient formatter
Previous formatting style was space inefficient given JSON consumes lots
of tokens, the new format is now used consistently across commands
Also fixes
- search limited to 10
- search breaking on limit: non existent directive
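To show why a row-based format is cheaper than JSON, here is a sketch. The pipe-delimited layout and `format_results` are assumptions for illustration and may not match the exact format the commands use; character count is used as a rough proxy for tokens.

```ruby
require "json"

# Emit the column names once, then one delimited line per row.
def format_results(rows, columns)
  lines = [columns.join("|")]
  rows.each do |row|
    # Replace the delimiter and newlines inside values so each row stays on one line.
    lines << columns.map { |col| row[col].to_s.tr("|\n", "  ") }.join("|")
  end
  lines.join("\n")
end

rows = [
  { title: "Welcome", url: "/t/1", likes: 3 },
  { title: "FAQ", url: "/t/2", likes: 10 },
]
compact = format_results(rows, %i[title url likes])
puts compact
puts "compact: #{compact.length} chars, JSON: #{JSON.generate(rows).length} chars"
```

JSON repeats every key on every row; the compact form states the keys once, so the saving grows with the number of results.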
* Slight improvement to summarizer
Stop blowing up context with custom prompts
* ensure we include the guiding message
* correct spec
* langchain style summarizer ...
much more accurate (albeit more expensive)
* lint
This change-set connects GPT based chat with the forum it runs on. Allowing it to perform search, lookup tags and categories and summarize topics.
The integration is currently restricted to public portions of the forum.
Changes made:
- Do not run ai reply job for small actions
- Improved composable system prompt
- Trivial summarizer for topics
- Image generator
- Google command for searching via Google
- Corrected trimming of posts raw (was replacing with numbers)
- Bypass of problem specs
The feature works best with GPT-4
---------
Co-authored-by: Roman Rizzi <rizziromanalejandro@gmail.com>
* FEATURE: Less friction for starting a conversation with an AI bot.
This PR adds a new header icon as a shortcut to start a conversation with one of our AI Bots. After clicking and selecting one from the dropdown menu, we'll open the composer with some fields already filled (recipients and title).
If you leave the title as is, we'll queue a job after five minutes to update it using a bot suggestion.
* Update assets/javascripts/initializers/ai-bot-replies.js
Co-authored-by: Rafael dos Santos Silva <xfalcox@gmail.com>
* Update assets/javascripts/initializers/ai-bot-replies.js
Co-authored-by: Rafael dos Santos Silva <xfalcox@gmail.com>
---------
Co-authored-by: Rafael dos Santos Silva <xfalcox@gmail.com>
We'll create one bot user for each available model. When listed in the `ai_bot_enabled_chat_bots` setting, they will reply.
This PR lets us use Claude-v1 in stream mode.
This module lets you chat with our GPT bot inside a PM. The bot only replies to members of the groups listed on the ai_bot_allowed_groups setting and only if you invite it to participate in the PM.
* FEATURE: Topic summarization
Summarize topics using the TopicView's "summary" filter. The UI is similar to what we do for chat, but we don't allow the user to select a timeframe.
Co-authored-by: Rafael dos Santos Silva <xfalcox@gmail.com>
Depends on discourse/discourse#20915
Hooks to the full-page-search component using an experimental API and performs an asymmetric similarity search using our embeddings database.
Also:
- Normalizes behavior between logged in and anon,
we only show related topics in the related topic section
- Renames "suggested" to "related" given this only exists in related section
- Adds a spec section to ensure anon does not regress
- Adds `ai_embeddings_semantic_related_topics` to limit related topics
Renamed settings:
ai_embeddings_semantic_suggested_model -> ai_embeddings_semantic_related_model
ai_embeddings_semantic_suggested_topics_enabled -> ai_embeddings_semantic_related_topics_enabled
Plugin is still in an experimental phase and not much is overridden hence
avoiding adding site setting migrations.
Co-authored-by: Krzysztof Kotlarek <kotlarek.krzysztof@gmail.com>
Allows related topics to show up for logged in users
- Introduces a new "Related Topics" block above suggested when related topics exist
- Renames `ai_embeddings_semantic_suggested_topics_anons_enabled` -> `ai_embeddings_semantic_suggested_topics_enabled` (given it is only deployed on 1 site not bothering with a migration)
- Adds an integration test to ensure data arrives correctly on the client
A prompt with multiple messages leads to better results, as the AI can learn from given examples. Alongside this change, we provide a better default proofreading prompt.
* FEATURE: Composer AI helper
This change introduces a new composer button for the group members listed in the `ai_helper_allowed_groups` site setting.
Users can use chatGPT to review, improve, or translate their posts to English.
* Add a safeguard for PMs and don't rely on parentView
This change adds two new reviewable types: ReviewableAIPost and ReviewableAIChatMessage. They have the same actions as their existing counterparts: ReviewableFlaggedPost and ReviewableChatMessage.
We'll display the model used and their accuracy when showing these flags in the review queue and adjust the latter after staff performs an action, tracking a global accuracy per existing model in a separate table.
* FEATURE: Dedicated reviewables for AI flags
* Store and adjust model accuracy
* Display accuracy in reviewable templates