If a module LLM model is set to claude-2 and the ai_bedrock variables are all present we will use AWS Bedrock instead of Antrhopic own APIs.
This is quite hacky, but will allow us to test the waters with AWS Bedrock early access with every module.
This situation of "same module, completely different API" is quite a bit far from what we had in the OpenAI/Azure separation, so it's more food for thought for when we start working on the LLM abstraction layer soon this year.
This adds a new creative persona that has access to the underlying
model and no external integrations.
It allows people to use Claude/GPT models in a Discourse agnostic
way.
* FIX: properly truncate !command prompts
### What is going on here?
Previous to this change where a command was issued by the LLM it
could hallucinate a continuation eg:
```
This is what tags are
!tags
some nonsense here
```
This change introduces safeguards so `some nonsense here` does not
creep in to the prompt history, poisoning the llm results
This in effect grounds the llm a lot better and results in the llm
forgetting less about results.
The change only impacts Claude at the moment, but will also improve
stuff for llama 2 in future.
Also, this makes it significantly easier to test the bot framework
without an llm cause we avoid a whole bunch of complex stubbing
* blank is not a valid bot response, do not inject into prompt
We pass the text to the current LLM and ask them to generate a StableDifussion prompt.
We'll use that to generate 4 samples, temporarily creating uploads and returning their short URLs.
* FIX: Made bot more robust
This is a collection of small fixes
- Display "Searching for: ..." while searching instead of showing found 0 results.
- Only allow 5 commands in lang chain - 6 feels like too much
- On the 5th command stop informing the engine about functions, so it is forced to complete
- Add another 30 tokens of buffer and explain why
- Typo in command prompt
Co-authored-by: Alan Guo Xiang Tan <gxtan1990@gmail.com>
I thought this wasn't neccessary and we could safely rely on the appEvent during the initial search.
It only fires if #searchEnabled is true, meaning the search term is valid.
Note, we perform permission checks on tag list against anon
to ensure we do not disclose information about private tags
to the llm which could get extracted.
In specific scenarios (no special filters or limits) we will also
always include 5 semantic results (at least) with every query.
This effectively means that all very wide queries will always return
20 results, regardless of how complex they are.
Also:
FIX: embedding backfill rake task not working
We renamed internals, this corrects the implementation
* FEATURE: HyDE-powered semantic search.
It relies on the new outlet added on discourse/discourse#23390 to display semantic search results in an unobtrusive way.
We'll use a HyDE-backed approach for semantic search, which consists on generating an hypothetical document from a given keywords, which gets transformed into a vector and used in a asymmetric similarity topic search.
This PR also reorganizes the internals to have less moving parts, maintaining one hierarchy of DAOish classes for vector-related operations like transformations and querying.
Completions and vectors created by HyDE will remain cached on Redis for now, but we could later use Postgres instead.
* Missing translation and rate limiting
---------
Co-authored-by: Roman Rizzi <rizziromanalejandro@gmail.com>
The researcher persona has access to Google and can perform
various internet research tasks. At the moment it can not read
web pages, but that is under consideration
Previous to this change we relied on client side settings to
determine if an end user has access to the ai bot.
This meant that if a user was not aware they are a member of a
group (as it is with restricted visibility ones) they would not
see the bot button.
All checking has now moved to the server side, and tests were
added to cover.
This refactor changes it so we only include minimal data in the
system prompt which leaves us lots of tokens for specific searches
The new search command allows us to pull in settings on demand
Descriptions are include in short search results, and names only
in longer results
Also:
* In dev it is important to tell when calls are made to open ai
this adds a console log to increase awareness around token usage
* PERF: stop counting tokens so often
This changes it so we only count tokens once per response
Previously each time we heard back from open ai we would count
tokens, leading to uneeded delays
* bug fix, commands may reach in for tokenizer
* add logging to console for anthropic calls as well
* Update lib/shared/inference/openai_completions.rb
Co-authored-by: Martin Brennan <mjrbrennan@gmail.com>