This re-implements tool support in DiscourseAi::Completions::Llm #generate
Previously tool support was always returned via XML and it would be the responsibility of the caller to parse XML
New implementation has the endpoints return ToolCall objects.
Additionally this simplifies the Llm endpoint interface and gives it more clarity. Llms must implement
decode, decode_chunk (for streaming)
It is the implementers responsibility to figure out how to decode chunks, base no longer implements. To make this easy we ship a flexible json decoder which is easy to wire up.
Also (new)
Better debugging for PMs, we now have a next / previous button to see all the Llm messages associated with a PM
Token accounting is fixed for vllm (we were not correctly counting tokens)
This PR fixes an issue where clicking to regenerate a summary was still showing the cached summary. To resolve this we call resetSummary() to reset all the summarization related properties before creating a new request.
This change introduces a job to summarize topics and cache the results automatically. We provide a setting to control how many topics we'll backfill per hour and what the topic's minimum word count is to qualify.
We'll prioritize topics without summary over outdated ones.
Fixes encoding of params on LLM function calls.
Previously we would improperly return results if a function parameter returned an HTML tag.
Additionally adds some missing HTTP verbs to tool calls.
The custom field "discourse_ai_bypass_ai_reply" was added so
we can signal the post created hook to bypass replying even
if it thinks it should.
Otherwise there are cases where we double answer user questions
leading to much confusion.
This also slightly refactors code making the controller smaller
The new `/admin/plugins/discourse-ai/ai-personas/stream-reply.json` was added.
This endpoint streams data direct from a persona and can be used
to access a persona from remote systems leaving a paper trail in
PMs about the conversation that happened
This endpoint is only accessible to admins.
---------
Co-authored-by: Gabriel Grubba <70247653+Grubba27@users.noreply.github.com>
Co-authored-by: Keegan George <kgeorge13@gmail.com>
The primary key is usually a bigint column, but the foreign key columns
are usually of integer type. This can lead to issues when joining these
columns due to mismatched types and different value ranges.
This was using a temporary plugin / test API to make tests pass, but it
is safe to alter "ai_document_fragment_embeddings" and
"rag_document_fragments" tables because they usually have less than 1M
rows and migration is going to be fast.
Depending on the size of the community, "classification_results" table
may have more than 1M rows and the migration will lock the table for a
longer time. However, classification runs in background jobs and they
will be automatically retried if they fail due to the lock, which makes
it acceptable.
* FEATURE: Fast-track gist regeneration when a hot topic gets a new post
* DEV: Introduce an upsert-like summarize
* FIX: Only enqueue fast-track gist for hot hot hot topics
---------
Co-authored-by: Rafael Silva <xfalcox@gmail.com>
* FIX/REFACTOR: FoldContent revamp
We hit a snag with our hot topic gist strategy: the regex we used to split the content didn't work, so we cannot send the original post separately. This was important for letting the model focus on what's new in the topic.
The algorithm doesn’t give us full control over how prompts are written, and figuring out how to format the content isn't straightforward. This means we're having to use more complicated workarounds, like regex.
To tackle this, I'm suggesting we simplify the approach a bit. Let's focus on summarizing as much as we can upfront, then gradually add new content until there's nothing left to summarize.
Also, the "extend" part is mostly for models with small context windows, which shouldn't pose a problem 99% of the time with the content volume we're dealing with.
* Fix fold docs
* Use #shift instead of #pop to get the first elem, not the last
This changeset contains 4 fixes:
1. We were allowing running tests on unsaved tools,
this is problematic cause uploads are not yet associated or indexed
leading to confusing results. We now only show the test button when
tool is saved.
2. We were not properly scoping rag document fragements, this
meant that personas and ai tools could get results from other
unrelated tools, just to be filtered out later
3. index.search showed options as "optional" but implementation
required the second option
4. When testing tools searching through document fragments was
not working at all cause we did not properly load the tool
* FIX: Llm selector / forced tools / search tool
This fixes a few issues:
1. When search was not finding any semantic results we would break the tool
2. Gemin / Anthropic models did not implement forced tools previously despite it being an API option
3. Mechanics around displaying llm selector were not right. If you disabled LLM selector server side persona PM did not work correctly.
4. Disabling native tools for anthropic model moved out of a site setting. This deliberately does not migrate cause this feature is really rare to need now, people who had it set probably did not need it.
5. Updates anthropic model names to latest release
* linting
* fix a couple of tests I missed
* clean up conditional
A new feature_context json column was added to ai_api_audit_logs
This allows us to store rich json like context on any LLM request
made.
This new field now stores automation id and name.
Additionally allows llm_triage to specify maximum number of tokens
This means that you can limit the cost of llm triage by scanning only
first N tokens of a post.
This changeset:
1. Corrects some issues with "force_default_llm" not applying
2. Expands the LLM list page to show LLM usage
3. Clarifies better what "enabling a bot" on an llm means (you get it in the selector)
* Display gists in the hot topics list
* Adjust hot topics gist strategy and add a job to generate gists
* Replace setting with a configurable batch size
* Avoid loading summaries for other topic lists
* Tweak gist prompt to focus on latest posts in the context of the OP
* Remove serializer hack and rely on core change from discourse/discourse#29291
* Update lib/summarization/strategies/hot_topic_gists.rb
Co-authored-by: Rafael dos Santos Silva <xfalcox@gmail.com>
---------
Co-authored-by: Rafael dos Santos Silva <xfalcox@gmail.com>
Splits persona permissions so you can allow a persona on:
- chat dms
- personal messages
- topic mentions
- chat channels
(any combination is allowed)
Previously we did not have this flexibility.
Additionally, adds the ability to "tether" a language model to a persona so it will always be used by the persona. This allows people to use a cheaper language model for one group of people and more expensive one for other people
This introduces another configuration that allows operators to
limit the amount of interactions with forced tool usage.
Forced tools are very handy in initial llm interactions, but as
conversation progresses they can hinder by slowing down stuff
and adding confusion.
The primary key is usually a bigint column, but the foreign key columns
usually are of integer type. This can lead to issues when joining these
columns due to mismatched types and different value ranges.
In a recent core change, all bigint sequences will start at a very high
value in the test environment to surface this type of errors. The same
change also added a temporary API that changes the column type to bigint
in order to allow for the tests to run.
The plugin API is only temporary and it is important for these plugins
to migrate their columns to bigint to avoid issues in the future.
This adds chain halting (ability to terminate llm chain in a tool)
and the ability to create uploads in a tool
Together this lets us integrate custom image generators into a
custom tool.
* FEATURE: allows forced LLM tool use
Sometimes we need to force LLMs to use tools, for example in RAG
like use cases we may want to force an unconditional search.
The new framework allows you backend to force tool usage.
Front end commit to follow
* UI for forcing tools now works, but it does not react right
* fix bugs
* fix tests, this is now ready for review
Previous to this change we could flag, but there was no way
to hide content and treat the flag as spam.
We had the option to hide topics, but this is not desirable for
a spam reply.
New option allows triage to hide a post if it is a reply, if the
post happens to be the first post on the topic, the topic will
be hidden.
This PR updates the rate limits for AI helper so that image caption follows a specific rate limit of 20 requests per minute. This should help when uploading multiple files that need to be captioned. This PR also updates the UI so that it shows toast message with the extracted error message instead of having a blocking `popupAjaxError` error dialog.
---------
Co-authored-by: Rafael dos Santos Silva <xfalcox@gmail.com>
Co-authored-by: Penar Musaraj <pmusaraj@gmail.com>
This allows our users to add the Ollama provider and use it to serve our AI bot (completion/dialect).
In this PR, we introduce:
DiscourseAi::Completions::Dialects::Ollama which would help us translate by utilizing Completions::Endpoint::Ollama
Correct extract_completion_from and partials_from in Endpoints::Ollama
Also
Add tests for Endpoints::Ollama
Introduce ollama_model fabricator
This allows custom tools access to uploads and sophisticated searches using embedding.
It introduces:
- A shared front end for listing and uploading files (shared with personas)
- Backend implementation of index.search function within a custom tool.
Custom tools now may search through uploaded files
function invoke(params) {
return index.search(params.query)
}
This means that RAG implementers now may preload tools with knowledge and have high fidelity over
the search.
The search function support
specifying max results
specifying a subset of files to search (from uploads)
Also
- Improved documentation for tools (when creating a tool a preamble explains all the functionality)
- uploads were a bit finicky, fixed an edge case where the UI would not show them as updated
Restructures LLM config page so it is far clearer.
Also corrects bugs around adding LLMs and having LLMs not editable post addition
---------
Co-authored-by: Sam Saffron <sam.saffron@gmail.com>
The `DiffModal` is triggered after selecting an option in the composer helper menu. After selecting an option, we should close the composer helper menu and only show the diff modal. On mobile, there was an edge-case where `this.args.close()` for was causing the closing of both the `DiffModal` and the `AiComposerHelperMenu`. This PR resolves that by ensuring the menu is closed _first_ asynchronously, followed by opening the relevant modal.
Polymorphic RAG means that we will be able to access RAG fragments both from AiPersona and AiCustomTool
In turn this gives us support for richer RAG implementations.
Previously we had moved the AI helper from the options menu to a selection menu that appears when selecting text in the composer. This had the benefit of making the AI helper a more discoverable feature. Now that some time has passed and the AI helper is more recognized, we will be moving it back to the composer toolbar.
This is better because:
- It consistent with other behavior and ways of accessing tools in the composer
- It has an improved mobile experience
- It reduces unnecessary code and keeps things easier to migrate when we have composer V2.
- It allows for easily triggering AI helper for all content by clicking the button instead of having to select everything.
Embedding search is rate limited due to potentially expensive
hyde operation (which require LLM access).
Embedding generally is very cheap compared to it. (usually 100x cheaper)
This raises the limit to 100 per minute for embedding searches,
while keeping the old 4 per minute for HyDE powered search.