When you trim a prompt we never want to have a state where there
is a "tool" reply without a corresponding tool call, it makes no
sense
Also
- GPT-4-Turbo is 128k, fix that
- Claude was not preserving username in prompt
- We were throwing away unicode usernames instead of adding to
message
Account properly for function calls, don't stream through <details> blocks
- Rush cooked content back to client
- Wait longer (up to 60 seconds) before giving up on streaming
- Clean up message bus channels so we don't have leftover data
- Make ai streamer much more reusable and much easier to read
- If buffer grows quickly, rush update so you are not artificially waiting
- Refine prompt interface
- Fix lost system message when prompt gets long
* REFACTOR: Represent generic prompts with an Object.
* Adds a bit more validation for clarity
* Rewrite bot title prompt and fix quirk handling
---------
Co-authored-by: Sam Saffron <sam.saffron@gmail.com>
This PR introduces 3 things:
1. Fake bot that can be used on local so you can test LLMs, to enable on dev use:
SiteSetting.ai_bot_enabled_chat_bots = "fake"
2. More elegant smooth streaming of progress on LLM completion
This leans on JavaScript to buffer and trickle llm results through. It also amends it so the progress dot is much
more consistently rendered
3. It fixes the Claude dialect
Claude needs newlines **exactly** at the right spot, amended so it is happy
---------
Co-authored-by: Martin Brennan <martin@discourse.org>
Previous to this change it was very hard to tell if completion was
stuck or not.
This introduces a "dot" that follows the completion and starts
flashing after 5 seconds.
* FIX: improve bot behavior
- Provide more information to Gemini context post function execution
- Use system prompts for Claude (fixes Dall E)
- Ensure Assistant is properly separated
- Teach Claude to return arrays in JSON vs XML
Also refactors tests so we do not copy tool preamble everywhere
* System msg is claude-2 only. fix typo
---------
Co-authored-by: Roman Rizzi <rizziromanalejandro@gmail.com>
We thought Azure's latest API version didn't have tool support yet, but I didn't understand it was complaining about a required field in the tool call message.
* FIX: don't include <details> in context
We need to be careful adding <details> into context of conversations
it can cause LLMs to hallucinate results
* Fix Gemini multi-turn ctx flattening
---------
Co-authored-by: Roman Rizzi <rizziromanalejandro@gmail.com>
DALL E command accepts an Array as a tool argument, this was not
parsed correctly by the invoker leading to errors generating
images with DALL E
Side quest ... don't use update! it calls validations and will now
fail due to email validation
It also corrects the syntax around tool support, which was wrong.
Gemini doesn't want us to include messages about previous tool invocations, so I had to shuffle around some code to send the response it generated from those invocations instead. For this, I created the "multi_turn" context, which bundles all the context involved in the interaction.
* DEV: AI bot migration to the Llm pattern.
We added tool and conversation context support to the Llm service in discourse-ai#366, meaning we met all the conditions to migrate this module.
This PR migrates to the new pattern, meaning adding a new bot now requires minimal effort as long as the service supports it. On top of this, we introduce the concept of a "Playground" to separate the PM-specific bits from the completion, allowing us to use the bot in other contexts like chat in the future. Commands are called tools, and we simplified all the placeholder logic to perform updates in a single place, making the flow more one-wayish.
* Followup fixes based on testing
* Cleanup unused inference code
* FIX: text-based tools could be in the middle of a sentence
* GPT-4-turbo support
* Use new LLM API
* FIX: AI helper not working correctly with mixtral
This PR introduces a new function on the generic llm called #generate
This will replace the implementation of completion!
#generate introduces a new way to pass temperature, max_tokens and stop_sequences
Then LLM implementers need to implement #normalize_model_params to
ensure the generic names match the LLM specific endpoint
This also adds temperature and stop_sequences to completion_prompts
this allows for much more robust completion prompts
* port everything over to #generate
* Fix translation
- On anthropic this no longer throws random "This is your translation:"
- On mixtral this actually works
* fix markdown table generation as well
Previously endpoint/base would `+=` decoded_chunk to leftover
This could lead to cases where the leftover buffer had duplicate
previously processed data
Fix ensures we properly skip previously decoded data.
Introduce a Discourse Automation based periodical report. Depends on Discourse Automation.
Report works best with very large context language models such as GPT-4-Turbo and Claude 2.
- Introduces final_insts to generic llm format, for claude to work best it is better to guide the last assistant message (we should add this to other spots as well)
- Adds GPT-4 turbo support to generic llm interface
This PR adds tool support to available LLMs. We'll buffer tool invocations and return them instead of making users of this service parse the response.
It also adds support for conversation context in the generic prompt. It includes bot messages, user messages, and tool invocations, which we'll trim to make sure it doesn't exceed the prompt limit, then translate them to the correct dialect.
Finally, It adds some buffering when reading chunks to handle cases when streaming is extremely slow.:M
In https://github.com/discourse/discourse/pull/24740, `min_trust_to_create_topic` site setting was replaced by `create_topic_allowed_groups`. This PR replaces the former, deprecated one, with the latter.
This is somewhat experimental, but the context of likes/view/username
can help the llm find out what content is more important or even
common users that produce great content
This inflates the amount of tokens somewhat, but given it is all numbers
and search columns titles are only included once this is not severe
We were limiting to 20 results unconditionally cause we had to make
sure search always fit in an 8k context window.
Models such as GPT 3.5 Turbo (16k) and GPT 4 Turbo / Claude 2.1 (over 150k)
allow us to return a lot more results.
This means we have a much richer understanding cause context is far
larger.
This also allows a persona to tweak this number, in some cases admin
may want to be conservative and save on tokens by limiting results
This also tweaks the `limit` param which GPT-4 liked to set to tell
model only to use it when it needs to (and describes default behavior)
Personas now support providing options for commands.
This PR introduces a single option "base_query" for the SearchCommand. When supplied all searches the persona will perform will also include the pre-supplied filter.
This can allow personas to search a subset of the forum (such as documentation)
This system is extensible we can add options to any command trivially.
c.f. de983796e1b66aa2ab039a4fb6e32cec8a65a098
There will soon be additional login_required checks
for Guardian, and the intent of many checks by automated
systems is better fulfilled by using BasicUser, which
simulates a logged in TL0 forum user, rather than an
anon user.
Previous to this change we relied on explicit loading for a files in Discourse AI.
This had a few downsides:
- Busywork whenever you add a file (an extra require relative)
- We were not keeping to conventions internally ... some places were OpenAI others are OpenAi
- Autoloader did not work which lead to lots of full application broken reloads when developing.
This moves all of DiscourseAI into a Zeitwerk compatible structure.
It also leaves some minimal amount of manual loading (automation - which is loading into an existing namespace that may or may not be there)
To avoid needing /lib/discourse_ai/... we mount a namespace thus we are able to keep /lib pointed at ::DiscourseAi
Various files were renamed to get around zeitwerk rules and minimize usage of custom inflections
Though we can get custom inflections to work it is not worth it, will require a Discourse core patch which means we create a hard dependency.
We must ensure we can isolate titles, and the models sometimes ignore the example we give them.
Additionally, anons can generate HyDE posts, so we need to check if user is nil when attempting to log requests.
* FEATURE: Azure OpenAI support for DALL*E 3
Previous to this there was no way to add an inference endpoint for
DALL*E on Azure cause it requires custom URLs
Also:
- On save, when editing a persona it would revert priority and enabled
- More forgiving parsing in command framework for array function calls
- By default generate HD images - they tend to be a bit better
- Improve DALL*E prompt which was getting very annoying and always echoing what it is about to do
- Add a bit of a sleep between retries on image generation
- Fix error handling in image_command