Implement streaming tool call implementation for Anthropic and Open AI.
When calling:
llm.generate(..., partial_tool_calls: true) do ...
Partials may contain ToolCall instances with partial: true, These tool calls are partially populated with json partially parsed.
So for example when performing a search you may get:
ToolCall(..., {search: "hello" })
ToolCall(..., {search: "hello world" })
The library used to parse json is:
https://github.com/dgraham/json-stream
We use a fork cause we need access to the internal buffer.
This prepares internals to perform partial tool calls, but does not implement it yet.
This PR fixes an issue where the AI search results were not being reset when you append your search to an existing query param (typically when you've come from quick search). This is because `handleSearch()` doesn't get called in this situation. So here we explicitly check for query params, trigger a reset and search for those occasions.
This re-implements tool support in DiscourseAi::Completions::Llm #generate
Previously tool support was always returned via XML and it would be the responsibility of the caller to parse XML
New implementation has the endpoints return ToolCall objects.
Additionally this simplifies the Llm endpoint interface and gives it more clarity. Llms must implement
decode, decode_chunk (for streaming)
It is the implementers responsibility to figure out how to decode chunks, base no longer implements. To make this easy we ship a flexible json decoder which is easy to wire up.
Also (new)
Better debugging for PMs, we now have a next / previous button to see all the Llm messages associated with a PM
Token accounting is fixed for vllm (we were not correctly counting tokens)
This PR fixes an issue where clicking to regenerate a summary was still showing the cached summary. To resolve this we call resetSummary() to reset all the summarization related properties before creating a new request.
This PR further decouples the streaming animation by completely handling the streaming animation directly in the `AiSummaryBox` component. Previously, handling the streaming animation by calling methods in the `ai-streamer` API was leading to timing issues making things out-of-sync. This results in some issues such as the last update of streamed text not being shown. Handling streaming directly in the component should simplify things drastically and prevent any issues.
This change introduces a job to summarize topics and cache the results automatically. We provide a setting to control how many topics we'll backfill per hour and what the topic's minimum word count is to qualify.
We'll prioritize topics without summary over outdated ones.
Fixes encoding of params on LLM function calls.
Previously we would improperly return results if a function parameter returned an HTML tag.
Additionally adds some missing HTTP verbs to tool calls.
The custom field "discourse_ai_bypass_ai_reply" was added so
we can signal the post created hook to bypass replying even
if it thinks it should.
Otherwise there are cases where we double answer user questions
leading to much confusion.
This also slightly refactors code making the controller smaller
The new `/admin/plugins/discourse-ai/ai-personas/stream-reply.json` was added.
This endpoint streams data direct from a persona and can be used
to access a persona from remote systems leaving a paper trail in
PMs about the conversation that happened
This endpoint is only accessible to admins.
---------
Co-authored-by: Gabriel Grubba <70247653+Grubba27@users.noreply.github.com>
Co-authored-by: Keegan George <kgeorge13@gmail.com>
The primary key is usually a bigint column, but the foreign key columns
are usually of integer type. This can lead to issues when joining these
columns due to mismatched types and different value ranges.
This was using a temporary plugin / test API to make tests pass, but it
is safe to alter "ai_document_fragment_embeddings" and
"rag_document_fragments" tables because they usually have less than 1M
rows and migration is going to be fast.
Depending on the size of the community, "classification_results" table
may have more than 1M rows and the migration will lock the table for a
longer time. However, classification runs in background jobs and they
will be automatically retried if they fail due to the lock, which makes
it acceptable.
* FEATURE: Fast-track gist regeneration when a hot topic gets a new post
* DEV: Introduce an upsert-like summarize
* FIX: Only enqueue fast-track gist for hot hot hot topics
---------
Co-authored-by: Rafael Silva <xfalcox@gmail.com>
* FIX/REFACTOR: FoldContent revamp
We hit a snag with our hot topic gist strategy: the regex we used to split the content didn't work, so we cannot send the original post separately. This was important for letting the model focus on what's new in the topic.
The algorithm doesn’t give us full control over how prompts are written, and figuring out how to format the content isn't straightforward. This means we're having to use more complicated workarounds, like regex.
To tackle this, I'm suggesting we simplify the approach a bit. Let's focus on summarizing as much as we can upfront, then gradually add new content until there's nothing left to summarize.
Also, the "extend" part is mostly for models with small context windows, which shouldn't pose a problem 99% of the time with the content volume we're dealing with.
* Fix fold docs
* Use #shift instead of #pop to get the first elem, not the last
This changeset contains 4 fixes:
1. We were allowing running tests on unsaved tools,
this is problematic cause uploads are not yet associated or indexed
leading to confusing results. We now only show the test button when
tool is saved.
2. We were not properly scoping rag document fragements, this
meant that personas and ai tools could get results from other
unrelated tools, just to be filtered out later
3. index.search showed options as "optional" but implementation
required the second option
4. When testing tools searching through document fragments was
not working at all cause we did not properly load the tool
* FIX: Llm selector / forced tools / search tool
This fixes a few issues:
1. When search was not finding any semantic results we would break the tool
2. Gemin / Anthropic models did not implement forced tools previously despite it being an API option
3. Mechanics around displaying llm selector were not right. If you disabled LLM selector server side persona PM did not work correctly.
4. Disabling native tools for anthropic model moved out of a site setting. This deliberately does not migrate cause this feature is really rare to need now, people who had it set probably did not need it.
5. Updates anthropic model names to latest release
* linting
* fix a couple of tests I missed
* clean up conditional