Azure requires a single HTTP endpoint per type of completion.
The settings: `ai_openai_gpt35_16k_url` and `ai_openai_gpt4_32k_url` can be
used now to configure the extra endpoints
This amends token limit which was off a bit due to function calls and fixes
a minor JS issue where we were not testing for a property
* FEATURE: optional warning attached to all AI bot conversations
This commit introduces `ai_bot_enable_chat_warning` which can be used
to warn people prior to starting a chat with the bot.
In particular this is useful if moderators are regularly reading chat
transcripts as it sets expectations early.
By default this is disabled.
Also:
- Stops making ajax call prior to opening composer
- Hides PM title when starting a bot PM
Co-authored-by: Rafael dos Santos Silva <xfalcox@gmail.com>
previously you would have to wait quite a while to see the prompt this implements
a very basic implementation of progress so you can see the API is working.
Also:
- Fix google progress.
- Handle the incredibly rare, zero results from google.
- Simplify command so it is less error prone
- replace invoke and attache results with a invoke
- ensure invoke can only ever be run once
- pass in all the information a command needs in constructor
- use new pattern throughout
- test invocation in isolation
- Attempt to hint reading is done by sending complete:true
- Do not include post_number in result unless it was sent in
- Rush visual feedback when a command is run (ensure we always revise)
- Include hyperlink in read command description
- Stop round tripping to GPT after image generation (speeds up images by a lot)
- Add a test for image command
This command is useful for reading a topics content. It allows us to perform
critical analysis or suggest answers.
Given 8k token limit in GPT-4 I hardcoded reading to 1500 tokens, but we can
follow up and allow larger windows on models that support more tokens.
On local testing even in this limited form this can be very useful.
* FIX: Google command was including full payload
Additionally there was no truncating happening meaning you could blow token
budget easily on a single search.
This made Google search mostly useless and it would mean that after using
Google we would revert to a clean slate which is very confusing.
* no need for nil there
The command framework had some confusing dispatching where it would dispatch
JSON blobs, this meant there was lots of parsing required in every command
The refactor handles transforming the args prior to dispatch which makes
consuming far simpler
This is also general prep to supporting some basic command framework in other
llms.
* FEATURE: Add support for StableBeluga and Upstage Llama2 instruct
This means we support all models in the top3 of the Open LLM Leaderboard
Since some of those models have RoPE, we now have a setting so you can
customize the token limit depending which model you use.
TopicQuery already provides a lot of safeguards and options for filtering topic, and enforcing permissions. It makes sense to rely on it as other plugins like discourse-assign do.
As a bonus, we now have access to the current_user while serializing these topics, so users will see things like unread posts count just like we do for the lists.
* DEV: Pin plugin for v3.1
Changes to the topic recommendations list added by discourse/discourse#22896 were reverted from stable.
* Update .discourse-compatibility
Co-authored-by: David Taylor <david@taylorhq.com>
---------
Co-authored-by: David Taylor <david@taylorhq.com>
Claude 1 costs the same and is less good than Claude 2. Make use of Claude
2 in all spots ...
This also fixes streaming so it uses the far more efficient streaming protocol.
A missing parameter on the `parseInt` function was causing unexpected UI behavior for the AI helper since it turned an allowed group ID into NaN. We should always use base10 when parsing these IDs.
* FIX: Show related topics when scrolling long topics
* Update assets/javascripts/initializers/related-topics.js
Co-authored-by: Roman Rizzi <roman@discourse.org>
---------
Co-authored-by: Roman Rizzi <roman@discourse.org>
The tokenizer was truncating and padding to 128 tokens, and we try append
new post content until we hit 384 tokens. This was causing the tokenizer
to accept all posts in a topic, wasting CPU and memory.
Single and multi-chunk summaries end using different prompts for the last summary. This change detects when the summarized content fits in a single chunk and uses a slightly different prompt, which leads to more consistent summary formats.
This PR also moves the chunk-splitting step to the `FoldContent` strategy as preparation for implementing streamed summaries.
* FEATURE: Embeddings to main db
This commit moves our embeddings store from an external configurable PostgreSQL
instance back into the main database. This is done to simplify the setup.
There is a migration that will try to import the external embeddings into
the main DB if it is configured and there are rows.
It removes support from embeddings models that aren't all_mpnet_base_v2 or OpenAI
text_embedding_ada_002. However it will now be easier to add new models.
It also now takes into account:
- topic title
- topic category
- topic tags
- replies (as much as the model allows)
We introduce an interface so we can eventually support multiple strategies
for handling long topics.
This PR severely damages the semantic search performance, but this is a
temporary until we can get adapt HyDE to make semantic search use the same
embeddings we have for semantic related with good performance.
Here we also have some ground work to add post level embeddings, but this
will be added in a future PR.
Please note that this PR will also block Discourse from booting / updating if
this plugin is installed and the pgvector extension isn't available on the
PostgreSQL instance Discourse uses.
* DEV: Better strategies for summarization
The strategy responsibility needs to be "Given a collection of texts, I know how to summarize them most efficiently, using the minimum amount of requests and maximizing token usage".
There are different token limits for each model, so it all boils down to two different strategies:
Fold all these texts into a single one, doing the summarization in chunks, and then build a summary from those.
Build it by combining texts in a single prompt, and truncate it according to your token limits.
While the latter is less than ideal, we need it for "bart-large-cnn-samsum" and "flan-t5-base-samsum", both with low limits. The rest will rely on folding.
* Expose summarized chunks to users
Reduce maximum replies to 2500 tokens and make them even for both GPT-3.5
and 4
Account for 400+ tokens in function definitions (this was unaccounted for)
* FEATURE: add ai_bot_enabled_chat commands and tune search
This allows admins to disable/enable GPT command integrations.
Also hones search results which were looping cause the result did not denote
the failure properly (it lost context)
* include more context for google command
include more context for time command
* type