* FEATURE: allow personas to supply top_p and temperature params
Code assistance generally are more focused at a lower temperature
This amends it so SQL Helper runs at 0.2 temperature vs the more
common default across LLMs of 1.0.
Reduced temperature leads to more focused, concise and predictable
answers for the SQL Helper
* fix tests
* This is not perfect, but far better than what we do today
Instead of fishing for
1. Draft sequence
2. Draft body
We skip (2), this means the composer "only" needs 1 http request to
open, we also want to eliminate (1) but it is a bit of a trickier
core change, may figure out how to pull it off (defer it to first draft save)
Value of bot drafts < value of opening bot conversations really fast
1. on failure we were queuing a job to generate embeddings, it had the wrong params. This is both fixed and covered in a test.
2. backfill embedding in the order of bumped_at, so newest content is embedded first, cover with a test
3. add a safeguard for hidden site setting that only allows batches of 50k in an embedding job run
Previously old embeddings were updated in a random order, this changes it so we update in a consistent order
* UX: Validations to Llm-backed features (except AI Bot)
This change is part of an ongoing effort to prevent enabling a broken feature due to lack of configuration. We also want to explicit which provider we are going to use. For example, Claude models are available through AWS Bedrock and Anthropic, but the configuration differs.
Validations are:
* You must choose a model before enabling the feature.
* You must turn off the feature before setting the model to blank.
* You must configure each model settings before being able to select it.
* Add provider name to summarization options
* vLLM can technically support same models as HF
* Check we can talk to the selected model
* Check for Bedrock instead of anthropic as a site could have both creds setup
Account properly for function calls, don't stream through <details> blocks
- Rush cooked content back to client
- Wait longer (up to 60 seconds) before giving up on streaming
- Clean up message bus channels so we don't have leftover data
- Make ai streamer much more reusable and much easier to read
- If buffer grows quickly, rush update so you are not artificially waiting
- Refine prompt interface
- Fix lost system message when prompt gets long
* REFACTOR: Represent generic prompts with an Object.
* Adds a bit more validation for clarity
* Rewrite bot title prompt and fix quirk handling
---------
Co-authored-by: Sam Saffron <sam.saffron@gmail.com>
Previous to this change it was very hard to tell if completion was
stuck or not.
This introduces a "dot" that follows the completion and starts
flashing after 5 seconds.
* FIX: don't include <details> in context
We need to be careful adding <details> into context of conversations
it can cause LLMs to hallucinate results
* Fix Gemini multi-turn ctx flattening
---------
Co-authored-by: Roman Rizzi <rizziromanalejandro@gmail.com>
DALL E command accepts an Array as a tool argument, this was not
parsed correctly by the invoker leading to errors generating
images with DALL E
Side quest ... don't use update! it calls validations and will now
fail due to email validation
It also corrects the syntax around tool support, which was wrong.
Gemini doesn't want us to include messages about previous tool invocations, so I had to shuffle around some code to send the response it generated from those invocations instead. For this, I created the "multi_turn" context, which bundles all the context involved in the interaction.
* DEV: AI bot migration to the Llm pattern.
We added tool and conversation context support to the Llm service in discourse-ai#366, meaning we met all the conditions to migrate this module.
This PR migrates to the new pattern, meaning adding a new bot now requires minimal effort as long as the service supports it. On top of this, we introduce the concept of a "Playground" to separate the PM-specific bits from the completion, allowing us to use the bot in other contexts like chat in the future. Commands are called tools, and we simplified all the placeholder logic to perform updates in a single place, making the flow more one-wayish.
* Followup fixes based on testing
* Cleanup unused inference code
* FIX: text-based tools could be in the middle of a sentence
* GPT-4-turbo support
* Use new LLM API
Introduce a Discourse Automation based periodical report. Depends on Discourse Automation.
Report works best with very large context language models such as GPT-4-Turbo and Claude 2.
- Introduces final_insts to generic llm format, for claude to work best it is better to guide the last assistant message (we should add this to other spots as well)
- Adds GPT-4 turbo support to generic llm interface
In https://github.com/discourse/discourse/pull/24740, `min_trust_to_create_topic` site setting was replaced by `create_topic_allowed_groups`. This PR replaces the former, deprecated one, with the latter.
This is somewhat experimental, but the context of likes/view/username
can help the llm find out what content is more important or even
common users that produce great content
This inflates the amount of tokens somewhat, but given it is all numbers
and search columns titles are only included once this is not severe
We were limiting to 20 results unconditionally cause we had to make
sure search always fit in an 8k context window.
Models such as GPT 3.5 Turbo (16k) and GPT 4 Turbo / Claude 2.1 (over 150k)
allow us to return a lot more results.
This means we have a much richer understanding cause context is far
larger.
This also allows a persona to tweak this number, in some cases admin
may want to be conservative and save on tokens by limiting results
This also tweaks the `limit` param which GPT-4 liked to set to tell
model only to use it when it needs to (and describes default behavior)
Personas now support providing options for commands.
This PR introduces a single option "base_query" for the SearchCommand. When supplied all searches the persona will perform will also include the pre-supplied filter.
This can allow personas to search a subset of the forum (such as documentation)
This system is extensible we can add options to any command trivially.
c.f. de983796e1b66aa2ab039a4fb6e32cec8a65a098
There will soon be additional login_required checks
for Guardian, and the intent of many checks by automated
systems is better fulfilled by using BasicUser, which
simulates a logged in TL0 forum user, rather than an
anon user.
Previous to this change we relied on explicit loading for a files in Discourse AI.
This had a few downsides:
- Busywork whenever you add a file (an extra require relative)
- We were not keeping to conventions internally ... some places were OpenAI others are OpenAi
- Autoloader did not work which lead to lots of full application broken reloads when developing.
This moves all of DiscourseAI into a Zeitwerk compatible structure.
It also leaves some minimal amount of manual loading (automation - which is loading into an existing namespace that may or may not be there)
To avoid needing /lib/discourse_ai/... we mount a namespace thus we are able to keep /lib pointed at ::DiscourseAi
Various files were renamed to get around zeitwerk rules and minimize usage of custom inflections
Though we can get custom inflections to work it is not worth it, will require a Discourse core patch which means we create a hard dependency.
We must ensure we can isolate titles, and the models sometimes ignore the example we give them.
Additionally, anons can generate HyDE posts, so we need to check if user is nil when attempting to log requests.
* FEATURE: Azure OpenAI support for DALL*E 3
Previous to this there was no way to add an inference endpoint for
DALL*E on Azure cause it requires custom URLs
Also:
- On save, when editing a persona it would revert priority and enabled
- More forgiving parsing in command framework for array function calls
- By default generate HD images - they tend to be a bit better
- Improve DALL*E prompt which was getting very annoying and always echoing what it is about to do
- Add a bit of a sleep between retries on image generation
- Fix error handling in image_command
* FIX: no selected persona should pick first prioritized one
Previously we were looking at `.personaId` but there is only an
id attribute so it failed
* FEATURE: new DALL-E-3 persona
This persona generates images using DALL-E-3 API and is enabled
by default
Keep in mind that we are still waiting on seeds/gen_id so we can
not retain style consistently between turns.
This will change as soon as a new Open AI API provides the missing
parameters
Co-authored-by: Martin Brennan <martin@discourse.org>
Previous to this changeset we used a custom system for tools/command
support for Anthropic.
We defined commands by using !command as a signal to execute it
Following Anthropic Claude 2.1, there is an official supported syntax (beta)
for tools execution.
eg:
```
+ <function_calls>
+ <invoke>
+ <tool_name>image</tool_name>
+ <parameters>
+ <prompts>
+ [
+ "an oil painting",
+ "a cute fluffy orange",
+ "3 apple's",
+ "a cat"
+ ]
+ </prompts>
+ </parameters>
+ </invoke>
+ </function_calls>
```
This implements the spec per Anthropic, it should be stable enough
to also work on other LLMs.
Keep in mind that OpenAI is not impacted here at all, as it has its
own custom system for function calls.
Additionally:
- Fixes the title system prompt so it works with latest Anthropic
- Uses new spec for "system" messages by Anthropic
- Tweak forum helper persona to guide Anthropic a tiny be better
Overall results are pretty awesome and Anthropic Claude performs
really well now on Discourse
* DEV: One LLM abstraction to rule them all
* REFACTOR: HyDE search uses new LLM abstraction
* REFACTOR: Summarization uses the LLM abstraction
* Updated documentation and made small fixes. Remove Bedrock claude-2 restriction
People tend to keep to 1 persona when working with the bot,
this adds local browser memory for the last persona you interacted
with so you do not need to select it over and over again.
This is per browser, not per user memory.
Also... clean up tests so they do not need to require stubs which
were breaking the build
---------
Co-authored-by: Martin Brennan <martin@discourse.org>
Introduces a UI to manage customizable personas (admin only feature)
Part of the change was some extensive internal refactoring:
- AIBot now has a persona set in the constructor, once set it never changes
- Command now takes in bot as a constructor param, so it has the correct persona and is not generating AIBot objects on the fly
- Added a .prettierignore file, due to the way ALE is configured in nvim it is a pre-req for prettier to work
- Adds a bunch of validations on the AIPersona model, system personas (artist/creative etc...) are all seeded. We now ensure
- name uniqueness, and only allow certain properties to be touched for system personas.
- (JS note) the client side design takes advantage of nested routes, the parent route for personas gets all the personas via this.store.findAll("ai-persona") then child routes simply reach into this model to find a particular persona.
- (JS note) data is sideloaded into the ai-persona model the meta property supplied from the controller, resultSetMeta
- This removes ai_bot_enabled_personas and ai_bot_enabled_chat_commands, both should be controlled from the UI on a per persona basis
- Fixes a long standing bug in token accounting ... we were doing to_json.length instead of to_json.to_s.length
- Amended it so {commands} are always inserted at the end unconditionally, no need to add it to the template of the system message as it just confuses things
- Adds a concept of required_commands to stock personas, these are commands that must be configured for this stock persona to show up.
- Refactored tests so we stop requiring inference_stubs, it was very confusing to need it, added to plugin.rb for now which at least is clearer
- Migrates the persona selector to gjs
---------
Co-authored-by: Joffrey JAFFEUX <j.jaffeux@gmail.com>
Co-authored-by: Martin Brennan <martin@discourse.org>
- New AiPersona model which can store custom personas
- Persona are restricted via group security
- They can contain custom system messages
- They can support a list of commands optionally
To avoid expensive DB calls in the serializer a Multisite friendly Hash was introduced (which can be expired on transaction commit)
This PR aims to clarify sentiment reports by replacing averages with a count of posts that have one of their values above a threshold (60), meaning we have some level of confidence they are, in fact, positive or negative.
Same thing happen with post emotions, with the difference that a post can have multiple values above it (30). Additionally, we dropped the "Neutral" axis.
We also reworded the tooltip next to each report title, and added an early return to signal we have no data available instead of displaying an empty chart.