Embedding search is rate limited due to potentially expensive
hyde operation (which require LLM access).
Embedding generally is very cheap compared to it. (usually 100x cheaper)
This raises the limit to 100 per minute for embedding searches,
while keeping the old 4 per minute for HyDE powered search.
Previously we waited 1 minute before automatically titling PMs
The new change introduces adding a title immediately after the the
llm replies
Prompt was also modified to include the LLM reply in title suggestion.
This helps situation like:
user: tell me a joke
llm: a very funy joke about horses
Then the title would be "A Funny Horse Joke"
Specs already covered some auto title logic, amended to also
catch the new message bus message we have been sending.
* FIX: we were never reindexing old content
Embedding backfill contains logic for searching for old content
change and then backfilling.
Unfortunately it was excluding all topics that had embedding
unconditionally, leading to no backfill ever happening.
This change adds a test and ensures we backfill.
* over select results, this ensures we will be more likely to find
ai results when filtered
This improves the site setting search so it performs a somewhat
fuzzy match.
Previously it did not handle seperators such as "space" and a
term such as "min_post_length" would not find "min_first_post_length"
A more liberal search algorithm makes it easier to the AI to
navigate settings.
* Minor fix, {{and parameter.enum parameter.enum.length}} is non
obviously broken.
If parameter.enum is a tracked array it will return the object
cause embers and helper implementation.
This corrects an issue where enum keeps on selecting itself by
mistake.
This allows callers of embedding based search to bypass hyde.
Hyde will expand the search term using an LLM, but if an LLM is
performing the search we can skip this expansion.
It also introduced some tests for the controller which we did not have
Previously there was too much work proofreading text, new implementation
provides a single shortcut and easy way of proofreading text.
Co-authored-by: Martin Brennan <martin@discourse.org>
* FEATURE: LLM Triage support for systemless models.
This change adds support for OSS models without support for system messages. LlmTriage's system message field is no longer mandatory. We now send the post contents in a separate user message.
* Models using Ollama can also disable system prompts
New `ai_pm_summarization_allowed_groups` can be used to allow
visibility of the summarization feature on PMs.
This can be useful on forums where a lot of communication happens
inside PMs.
When navigating between topic we were not correctly resetting
internal state for summarization. This leads to a situation where
incorrect summaries can be displayed to users and wrong summaries
can be displayed.
Additionally our controller for grabbing summaries was always
streaming results via message bus, which could be delayed when
sidekiq is overloaded. We now will return the cached summary
right away if it is available direct from REST endpoint.
Creating a new model, either manually or from presets, doesn't initialize the `provider_params` object, meaning their custom params won't persist.
Additionally, this change adds some validations for Bedrock params, which are mandatory, and a clear message when a completion fails because we cannot build the URL.
- Validate fields to reduce the chance of breaking features by a misconfigured model.
- Fixed a bug where the URL might get deleted during an update.
- Display a warning when a model is currently in use.
* FIX: Add tool support to open ai compatible dialect and vllm
Automatic tools are in progress in vllm see: https://github.com/vllm-project/vllm/pull/5649
Even when they are supported, initial support will be uneven, only some models have native tool support
notably mistral which has some special tokens for tool support.
After the above PR lands in vllm we will still need to swap to
XML based tools on models without native tool support.
* fix specs
* DEV: Remove old code now that features rely on LlmModels.
* Hide old settings and migrate persona llm overrides
* Remove shadowing special URL + seeding code. Use srv:// prefix instead.
Using RAG fragments can lead to considerably big system messages, which becomes problematic when models have a smaller context window.
Before this change, we only look at the rest of the conversation to make sure we don't surpass the limit, which could lead to two unwanted scenarios when having large system messages:
All other messages are excluded due to size.
The system message already exceeds the limit.
As a result, I'm putting a hard-limit of 60% of available tokens. We don't want to aggresively truncate because if rag fragments are included, the system message contains a lot of context to improve the model response, but we also want to make room for the recent messages in the conversation.
* Seeding the SRV-backed model should happen inside an initializer.
* Keep the model up to date when the hidden setting changes.
* Use the correct Mixtral model name and fix previous data migration.
* URL validation should trigger only when we attempt to update it.
Using assistant role for system produces an error because
they expect alternating roles like user/assistant/user and so on.
Prompts cannot start with the assistant role.
This allows summary to use the new LLM models and migrates of API key based model selection
Claude 3.5 etc... all work now.
---------
Co-authored-by: Roman Rizzi <rizziromanalejandro@gmail.com>