* FEATURE: add support for new OpenAI embedding models
This adds support for just released text_embedding_3_small and large
Note, we have not yet implemented truncation support which is a
new API feature. (triggered using dimensions)
* Tiny side fix, recalc bots when ai is enabled or disabled
* FIX: downsample to 2000 items per vector which is a pgvector limitation
This PR introduces 3 things:
1. Fake bot that can be used on local so you can test LLMs, to enable on dev use:
SiteSetting.ai_bot_enabled_chat_bots = "fake"
2. More elegant smooth streaming of progress on LLM completion
This leans on JavaScript to buffer and trickle llm results through. It also amends it so the progress dot is much
more consistently rendered
3. It fixes the Claude dialect
Claude needs newlines **exactly** at the right spot, amended so it is happy
---------
Co-authored-by: Martin Brennan <martin@discourse.org>
It also corrects the syntax around tool support, which was wrong.
Gemini doesn't want us to include messages about previous tool invocations, so I had to shuffle around some code to send the response it generated from those invocations instead. For this, I created the "multi_turn" context, which bundles all the context involved in the interaction.
* DEV: AI bot migration to the Llm pattern.
We added tool and conversation context support to the Llm service in discourse-ai#366, meaning we met all the conditions to migrate this module.
This PR migrates to the new pattern, meaning adding a new bot now requires minimal effort as long as the service supports it. On top of this, we introduce the concept of a "Playground" to separate the PM-specific bits from the completion, allowing us to use the bot in other contexts like chat in the future. Commands are called tools, and we simplified all the placeholder logic to perform updates in a single place, making the flow more one-wayish.
* Followup fixes based on testing
* Cleanup unused inference code
* FIX: text-based tools could be in the middle of a sentence
* GPT-4-turbo support
* Use new LLM API
Keep in mind:
- GPT-4 is only going to be fully released next year - so this hardcodes preview model for now
- Fixes streaming bugs which became a big problem with GPT-4 turbo
- Adds Azure endpoing for turbo as well
Co-authored-by: Martin Brennan <martin@discourse.org>
Previous to this change we relied on explicit loading for a files in Discourse AI.
This had a few downsides:
- Busywork whenever you add a file (an extra require relative)
- We were not keeping to conventions internally ... some places were OpenAI others are OpenAi
- Autoloader did not work which lead to lots of full application broken reloads when developing.
This moves all of DiscourseAI into a Zeitwerk compatible structure.
It also leaves some minimal amount of manual loading (automation - which is loading into an existing namespace that may or may not be there)
To avoid needing /lib/discourse_ai/... we mount a namespace thus we are able to keep /lib pointed at ::DiscourseAi
Various files were renamed to get around zeitwerk rules and minimize usage of custom inflections
Though we can get custom inflections to work it is not worth it, will require a Discourse core patch which means we create a hard dependency.