Note, we perform permission checks on tag list against anon
to ensure we do not disclose information about private tags
to the llm which could get extracted.
In specific scenarios (no special filters or limits) we will also
always include 5 semantic results (at least) with every query.
This effectively means that all very wide queries will always return
20 results, regardless of how complex they are.
Also:
FIX: embedding backfill rake task not working
We renamed internals, this corrects the implementation
* FEATURE: HyDE-powered semantic search.
It relies on the new outlet added on discourse/discourse#23390 to display semantic search results in an unobtrusive way.
We'll use a HyDE-backed approach for semantic search, which consists on generating an hypothetical document from a given keywords, which gets transformed into a vector and used in a asymmetric similarity topic search.
This PR also reorganizes the internals to have less moving parts, maintaining one hierarchy of DAOish classes for vector-related operations like transformations and querying.
Completions and vectors created by HyDE will remain cached on Redis for now, but we could later use Postgres instead.
* Missing translation and rate limiting
---------
Co-authored-by: Roman Rizzi <rizziromanalejandro@gmail.com>
The researcher persona has access to Google and can perform
various internet research tasks. At the moment it can not read
web pages, but that is under consideration
This refactor changes it so we only include minimal data in the
system prompt which leaves us lots of tokens for specific searches
The new search command allows us to pull in settings on demand
Descriptions are include in short search results, and names only
in longer results
Also:
* In dev it is important to tell when calls are made to open ai
this adds a console log to increase awareness around token usage
* PERF: stop counting tokens so often
This changes it so we only count tokens once per response
Previously each time we heard back from open ai we would count
tokens, leading to uneeded delays
* bug fix, commands may reach in for tokenizer
* add logging to console for anthropic calls as well
* Update lib/shared/inference/openai_completions.rb
Co-authored-by: Martin Brennan <mjrbrennan@gmail.com>
Also adds ai_bot_enabled_personas so admins can tweak which stock
personas are enabled.
The new persona has a full listing of all site settings and is
able to get context for each setting.
This means you can ask it to search through settings for something
relevant.
Security wise there is no access to actual configuration of settings
just to the names / description and implementation.
Previously this was part of the forum helper persona however it
just clashes too much with other behaviors, isolating it makes
it far more powerful.
* sneaking this one in, user_emails is a non obvious table in our
structure.
usually one would assume users has emails so the clarifies a bit
better. plus it is a very common table to hit.
This splits out a bunch of code that used to live inside bots
into a dedicated concept called a Persona.
This allows us to start playing with multiple personas for the bot
Ships with:
artist - for making images
sql helper - for helping with data explorer
general - for everything and anything
Also includes a few fixes that make the generic LLM function implementation more robust
This command can be used to extract information about a discourse
site setting directly from source.
To operate it needs the rg binary in the container.
This fixes 2 big issues:
1. No matter how hard you try, grounding anthropic title prompt
is just too hard. This works around by only looking at the last
sentence it returns and treating as title
2. Non English locales would be stuck with "generic" title, this
ensures every bot message gets a title, using a custom field to
track
Also, slightly tunes some anthropic prompts.
Open AI support function calling, this has a very specific shape
that other LLMs have not quite adopted.
This simulates a command framework using system prompts on LLMs
that are not open AI.
Features include:
- Smart system prompt to steer the LLM
- Parameter validation (we ensure all the params are specified correctly)
This is being tested on Anthropic at the moment and intial results
are promising.
Previously we were not counting functions correctly and not
accounting for minimum token count per message
This corrects both issues and improves documentation internally
Azure requires a single HTTP endpoint per type of completion.
The settings: `ai_openai_gpt35_16k_url` and `ai_openai_gpt4_32k_url` can be
used now to configure the extra endpoints
This amends token limit which was off a bit due to function calls and fixes
a minor JS issue where we were not testing for a property
* FEATURE: optional warning attached to all AI bot conversations
This commit introduces `ai_bot_enable_chat_warning` which can be used
to warn people prior to starting a chat with the bot.
In particular this is useful if moderators are regularly reading chat
transcripts as it sets expectations early.
By default this is disabled.
Also:
- Stops making ajax call prior to opening composer
- Hides PM title when starting a bot PM
Co-authored-by: Rafael dos Santos Silva <xfalcox@gmail.com>
previously you would have to wait quite a while to see the prompt this implements
a very basic implementation of progress so you can see the API is working.
Also:
- Fix google progress.
- Handle the incredibly rare, zero results from google.
- Simplify command so it is less error prone
- replace invoke and attache results with a invoke
- ensure invoke can only ever be run once
- pass in all the information a command needs in constructor
- use new pattern throughout
- test invocation in isolation
- Attempt to hint reading is done by sending complete:true
- Do not include post_number in result unless it was sent in
- Rush visual feedback when a command is run (ensure we always revise)
- Include hyperlink in read command description
- Stop round tripping to GPT after image generation (speeds up images by a lot)
- Add a test for image command
This command is useful for reading a topics content. It allows us to perform
critical analysis or suggest answers.
Given 8k token limit in GPT-4 I hardcoded reading to 1500 tokens, but we can
follow up and allow larger windows on models that support more tokens.
On local testing even in this limited form this can be very useful.
* FIX: Google command was including full payload
Additionally there was no truncating happening meaning you could blow token
budget easily on a single search.
This made Google search mostly useless and it would mean that after using
Google we would revert to a clean slate which is very confusing.
* no need for nil there
The command framework had some confusing dispatching where it would dispatch
JSON blobs, this meant there was lots of parsing required in every command
The refactor handles transforming the args prior to dispatch which makes
consuming far simpler
This is also general prep to supporting some basic command framework in other
llms.
* FEATURE: Add support for StableBeluga and Upstage Llama2 instruct
This means we support all models in the top3 of the Open LLM Leaderboard
Since some of those models have RoPE, we now have a setting so you can
customize the token limit depending which model you use.
TopicQuery already provides a lot of safeguards and options for filtering topic, and enforcing permissions. It makes sense to rely on it as other plugins like discourse-assign do.
As a bonus, we now have access to the current_user while serializing these topics, so users will see things like unread posts count just like we do for the lists.
Claude 1 costs the same and is less good than Claude 2. Make use of Claude
2 in all spots ...
This also fixes streaming so it uses the far more efficient streaming protocol.
Single and multi-chunk summaries end using different prompts for the last summary. This change detects when the summarized content fits in a single chunk and uses a slightly different prompt, which leads to more consistent summary formats.
This PR also moves the chunk-splitting step to the `FoldContent` strategy as preparation for implementing streamed summaries.
* FEATURE: Embeddings to main db
This commit moves our embeddings store from an external configurable PostgreSQL
instance back into the main database. This is done to simplify the setup.
There is a migration that will try to import the external embeddings into
the main DB if it is configured and there are rows.
It removes support from embeddings models that aren't all_mpnet_base_v2 or OpenAI
text_embedding_ada_002. However it will now be easier to add new models.
It also now takes into account:
- topic title
- topic category
- topic tags
- replies (as much as the model allows)
We introduce an interface so we can eventually support multiple strategies
for handling long topics.
This PR severely damages the semantic search performance, but this is a
temporary until we can get adapt HyDE to make semantic search use the same
embeddings we have for semantic related with good performance.
Here we also have some ground work to add post level embeddings, but this
will be added in a future PR.
Please note that this PR will also block Discourse from booting / updating if
this plugin is installed and the pgvector extension isn't available on the
PostgreSQL instance Discourse uses.
* DEV: Better strategies for summarization
The strategy responsibility needs to be "Given a collection of texts, I know how to summarize them most efficiently, using the minimum amount of requests and maximizing token usage".
There are different token limits for each model, so it all boils down to two different strategies:
Fold all these texts into a single one, doing the summarization in chunks, and then build a summary from those.
Build it by combining texts in a single prompt, and truncate it according to your token limits.
While the latter is less than ideal, we need it for "bart-large-cnn-samsum" and "flan-t5-base-samsum", both with low limits. The rest will rely on folding.
* Expose summarized chunks to users
Reduce maximum replies to 2500 tokens and make them even for both GPT-3.5
and 4
Account for 400+ tokens in function definitions (this was unaccounted for)
* FEATURE: add ai_bot_enabled_chat commands and tune search
This allows admins to disable/enable GPT command integrations.
Also hones search results which were looping cause the result did not denote
the failure properly (it lost context)
* include more context for google command
include more context for time command
* type
The new site settings:
ai_openai_gpt35_url : distribution for GPT 16k
ai_openai_gpt4_url: distribution for GPT 4
ai_openai_embeddings_url: distribution for ada2
If untouched we will simply use OpenAI endpoints.
Azure requires 1 URL per model, OpenAI allows a single URL to serve multiple models. Hence the new settings.
```
prompt << build_message(bot_user.username, reply)
```
Would store a "cooked" prompt which is invalid, instead just store the raw
values which are later passed to build_message
Additionally:
1. Disable summary command which needs honing
2. Stop storing decorations (searched for X) in prompt which leads to straying
3. Ship username directly to model, avoiding "user: content" in prompts. This
was causing GPT to stray
Given latest GPT 3.5 16k which is both better steered and supports functions
we can now support rich bot integration.
Clunky system message based steering is removed and instead we use the
function framework provided by Open AI
* DEV: Remove the summarization feature
Instead, we'll register summarization implementations for OpenAI, Anthropic, and Discourse AI using the API defined in discourse/discourse#21813.
Core and chat will implement features on top of these implementations instead of this plugin extending them.
* Register instances that contain the model, requiring less site settings
Previous to this change we were chaining stuff too late and would execute
commands serially leading to very unexpected results
This corrects this and allows us to run stuff like:
> Search google 3/4 times on various permutations of
QUERY and answer this question.
We limit at 5 commands to ensure there are not pathological user cases
where you lean on the LLM to flood us with results.
For the time being smart commands only work consistently on GPT 4.
Avoid using any smart commands on the earlier models.
Additionally adds better error handling to Claude which sometimes streams
partial json and slightly tunes the search command.
blog.start_gpt_chat -> was on my blog
This also slightly tunes the search prompt to support filtering by oldest
and try a tiny bit harder to guide GPT 3.5 which is a bit of a losing battle
Co-authored-by: Krzysztof Kotlarek <kotlarek.krzysztof@gmail.com>
The rails_failover middleware will intercept all `PG::ConnectionBad` errors and put the cluster into readonly mode. It does not have any handling for multiple databases. Therefore, an issue with the embeddings database was taking the whole cluster into readonly.
This commit fixes the issue by rescuing `PG::Error` from all AI database accesses, and re-raises errors with a different class. It also adds a spec to ensure that an embeddings database outage does not affect the functionality of the topics/show route.
Co-authored-by: David Taylor <david@taylorhq.com>
* FIX: guide GPT 3.5 better
This limits search results to 10 cause we were blowing the whole token
budget on search results, additionally it includes a quick exchange at
the start of a session to try and guide GPT 3.5 to follow instructions
Sadly GPT 3.5 drifts off very quickly but this does improve stuff a bit.
It also attempts to correct some issues with anthropic, though it still is
surprisingly hard to ground
* add status:public, this is a bit of a hack but ensures that we can search
for any filter provided
* fix specs
- We only support searching public topics - make it clear
- Stop using bug/feature, cause is poisons system - these may not exist
- Add after: and before: which are very handy for bounding search results
* FEATURE: introduce a more efficient formatter
Previous formatting style was space inefficient given JSON consumes lots
of tokens, the new format is now used consistently across commands
Also fixes
- search limited to 10
- search breaking on limit: non existent directive
* Slight improvement to summarizer
Stop blowing up context with custom prompts
* ensure we include the guiding message
* correct spec
* langchain style summarizer ...
much more accurate (albeit more expensive)
* lint
This change-set connects GPT based chat with the forum it runs on. Allowing it to perform search, lookup tags and categories and summarize topics.
The integration is currently restricted to public portions of the forum.
Changes made:
- Do not run ai reply job for small actions
- Improved composable system prompt
- Trivial summarizer for topics
- Image generator
- Google command for searching via Google
- Corrected trimming of posts raw (was replacing with numbers)
- Bypass of problem specs
The feature works best with GPT-4
---------
Co-authored-by: Roman Rizzi <rizziromanalejandro@gmail.com>
* FEATURE: Less friction for starting a conversation with an AI bot.
This PR adds a new header icon as a shortcut to start a conversation with one of our AI Bots. After clicking and selecting one from the dropdown menu, we'll open the composer with some fields already filled (recipients and title).
If you leave the title as is, we'll queue a job after five minutes to update it using a bot suggestion.
* Update assets/javascripts/initializers/ai-bot-replies.js
Co-authored-by: Rafael dos Santos Silva <xfalcox@gmail.com>
* Update assets/javascripts/initializers/ai-bot-replies.js
Co-authored-by: Rafael dos Santos Silva <xfalcox@gmail.com>
---------
Co-authored-by: Rafael dos Santos Silva <xfalcox@gmail.com>
We'll create one bot user for each available model. When listed in the `ai_bot_enabled_chat_bots` setting, they will reply.
This PR lets us use Claude-v1 in stream mode.
* Minor... use username suggester in case username already exists
* FIX: ensure we truncate long prompts
Previously we
1. Used raw length instead of token counts for counting length
2. We totally dropped a prompt if it was too long
New implementation will truncate "raw" if it gets too long maintaining
meaning.
This module lets you chat with our GPT bot inside a PM. The bot only replies to members of the groups listed on the ai_bot_allowed_groups setting and only if you invite it to participate in the PM.
Also adds some tests around completions and supports additional params
such as top_p, temperature and max_tokens
This also migrates off Faraday to using Net::HTTP directly
* FEATURE: Topic summarization
Summarize topics using the TopicView's "summary" filter. The UI is similar to what we do for chat, but we don't allow the user to select a timeframe.
Co-authored-by: Rafael dos Santos Silva <xfalcox@gmail.com>
Depends on discourse/discourse#20915
Hooks to the full-page-search component using an experimental API and performs an assymetric similarity search using our embeddings database.
Also:
- Normalizes behavior between logged in and anon,
we only show related topics in the related topic section
- Renames "suggested" to "related" given this only exists in related section
- Adds a spec section to ensure anon does not regress
- Adds `ai_embeddings_semantic_related_topics` to limit related topics
Renamed settings:
ai_embeddings_semantic_suggested_model -> ai_embeddings_semantic_related_model
ai_embeddings_semantic_suggested_topics_enabled -> ai_embeddings_semantic_related_topics_enabled
Plugins is still in an experimental phase and not much is overidden hence
avoiding adding site setting migrations.
Co-authored-by: Krzysztof Kotlarek <kotlarek.krzysztof@gmail.com>
Allows related topics to show up for logged on users
- Introduces a new "Related Topics" block above suggested when related topics exist
- Renames `ai_embeddings_semantic_suggested_topics_anons_enabled` -> `ai_embeddings_semantic_suggested_topics_enabled` (given it is only deployed on 1 site not bothering with a migration)
- Adds an integration test to ensure data arrives correctly on the client
* FIX: Only show public visible topics as suggested for anons
* DEV: Add tests for embeddings
* Update spec/lib/modules/embeddings/semantic_suggested_spec.rb
Co-authored-by: Bianca Nenciu <nbianca@users.noreply.github.com>
* Update spec/lib/modules/embeddings/semantic_suggested_spec.rb
Co-authored-by: Bianca Nenciu <nbianca@users.noreply.github.com>
* move to top
---------
Co-authored-by: Bianca Nenciu <nbianca@users.noreply.github.com>
A prompt with multiple messages leads to better results, as the AI can learn for given examples. Alongside this change, we provide a better default proofreading prompt.
* FEATURE: Composer AI helper
This change introduces a new composer button for the group members listed in the `ai_helper_allowed_groups` site setting.
Users can use chatGPT to review, improve, or translate their posts to English.
* Add a safeguard for PMs and don't rely on parentView
This change adds two new reviewable types: ReviewableAIPost and ReviewableAIChatMessage. They have the same actions as their existing counterparts: ReviewableFlaggedPost and ReviewableChatMessage.
We'll display the model used and their accuracy when showing these flags in the review queue and adjust the latter after staff performs an action, tracking a global accuracy per existing model in a separate table.
* FEATURE: Dedicated reviewables for AI flags
* Store and adjust model accuracy
* Display accuracy in reviewable templates