Commit Graph

475 Commits

Author SHA1 Message Date
Sam 9551b1a4d1
FIX: do not strip empty string during stream processing (#911)
Fixes issue in Open AI provider eating newlines and spaces
2024-11-13 07:12:00 +11:00
Rafael dos Santos Silva aef9a03d4c
FEATURE: Truncate AI Captions to a reasonable max size (#907) 2024-11-12 15:52:46 -03:00
Sam e817b7dc11
FEATURE: improve tool support (#904)
This re-implements tool support in DiscourseAi::Completions::Llm #generate

Previously tool support was always returned via XML and it would be the responsibility of the caller to parse XML

New implementation has the endpoints return ToolCall objects.

Additionally this simplifies the Llm endpoint interface and gives it more clarity. Llms must implement

decode, decode_chunk (for streaming)

It is the implementers responsibility to figure out how to decode chunks, base no longer implements. To make this easy we ship a flexible json decoder which is easy to wire up.

Also (new)

    Better debugging for PMs, we now have a next / previous button to see all the Llm messages associated with a PM
    Token accounting is fixed for vllm (we were not correctly counting tokens)
2024-11-12 08:14:30 +11:00
Keegan George 644141ff08
FIX: Regenerate summary button still shows cached summary (#903)
This PR fixes an issue where clicking to regenerate a summary was still showing the cached summary. To resolve this we call resetSummary() to reset all the summarization related properties before creating a new request.
2024-11-07 16:01:18 -08:00
Roman Rizzi 9505a8976c
FEATURE: Automatically backfill regular summaries. (#892)
This change introduces a job to summarize topics and cache the results automatically. We provide a setting to control how many topics we'll backfill per hour and what the topic's minimum word count is to qualify.

We'll prioritize topics without summary over outdated ones.
2024-11-04 17:48:11 -03:00
Sam 98022d7d96
FEATURE: support custom instructions for persona streaming (#890)
This allows us to inject information into the system prompt
which can help shape replies without repeating over and over
in messages.
2024-11-05 07:43:26 +11:00
Rafael dos Santos Silva 772ee934ab
Migrate sentiment to a TEI backend (#886) 2024-11-04 09:14:34 -03:00
Sam bffe9dfa07
FIX: we must properly encode objects prior to escaping (#891)
in cases of arrays escapeHTML will not work)

*
2024-11-04 16:16:25 +11:00
Sam c352054d4e
FIX: encode parameters returned from LLMs correctly (#889)
Fixes encoding of params on LLM function calls.

Previously we would improperly return results if a function parameter returned an HTML tag.

Additionally adds some missing HTTP verbs to tool calls.
2024-11-04 10:07:17 +11:00
Roman Rizzi 7e3a543f6f
FEATURE: Double gist length to 40 words (#888) 2024-11-01 13:09:03 -03:00
Roman Rizzi e8f0633141
DEV: Extend truncation to all summarizable content (#884) 2024-10-31 12:17:42 -03:00
Roman Rizzi e8eed710e0
FIX: Truncate OP for gists to help the model focus on the latest posts (#883) 2024-10-31 10:54:56 -03:00
Sam 34a59b623e
FIX: ensure replies are never double streamed (#879)
The custom field "discourse_ai_bypass_ai_reply" was added so
we can signal the post created hook to bypass replying even
if it thinks it should.

Otherwise there are cases where we double answer user questions
leading to much confusion.

This also slightly refactors code making the controller smaller
2024-10-30 20:24:39 +11:00
Sam be0b78cacd
FEATURE: new endpoint for directly accessing a persona (#876)
The new `/admin/plugins/discourse-ai/ai-personas/stream-reply.json` was added.

This endpoint streams data direct from a persona and can be used
to access a persona from remote systems leaving a paper trail in
PMs about the conversation that happened

This endpoint is only accessible to admins.

---------

Co-authored-by: Gabriel Grubba <70247653+Grubba27@users.noreply.github.com>
Co-authored-by: Keegan George <kgeorge13@gmail.com>
2024-10-30 10:28:20 +11:00
Roman Rizzi dd404c924a
DEV: Use different feature_names for summarization strategies (#875) 2024-10-29 08:45:14 -03:00
Rafael dos Santos Silva 8ded4b2e58
FIX: Use present? instead of invalid exists? (#869) 2024-10-25 13:04:42 -03:00
Roman Rizzi a2b1ea3c63
FEATURE: Fast-track gist regeneration when a hot topic gets a new post (#860)
* FEATURE: Fast-track gist regeneration when a hot topic gets a new post

* DEV: Introduce an upsert-like summarize

* FIX: Only enqueue fast-track gist for hot hot hot topics

---------

Co-authored-by: Rafael Silva <xfalcox@gmail.com>
2024-10-25 12:38:49 -03:00
Rafael dos Santos Silva 33da27e231
FIX: Change hot gist prompt to avoid title repeating #859 (#859)
Co-authored-by: Roman Rizzi <rizziromanalejandro@gmail.com>
2024-10-25 12:12:33 -03:00
Roman Rizzi ec97996905
FIX/REFACTOR: FoldContent revamp (#866)
* FIX/REFACTOR: FoldContent revamp

We hit a snag with our hot topic gist strategy: the regex we used to split the content didn't work, so we cannot send the original post separately. This was important for letting the model focus on what's new in the topic.

The algorithm doesn’t give us full control over how prompts are written, and figuring out how to format the content isn't straightforward. This means we're having to use more complicated workarounds, like regex.

To tackle this, I'm suggesting we simplify the approach a bit. Let's focus on summarizing as much as we can upfront, then gradually add new content until there's nothing left to summarize.

Also, the "extend" part is mostly for models with small context windows, which shouldn't pose a problem 99% of the time with the content volume we're dealing with.

* Fix fold docs

* Use #shift instead of #pop to get the first elem, not the last
2024-10-25 11:51:17 -03:00
Sam 12869f2146
FIX: testing tool was not showing rag results (#867)
This changeset contains 4 fixes:

1. We were allowing running tests on unsaved tools,
this is problematic cause uploads are not yet associated or indexed
leading to confusing results. We now only show the test button when
tool is saved.


2. We were not properly scoping rag document fragements, this
meant that personas and ai tools could get results from other
unrelated tools, just to be filtered out later


3. index.search showed options as "optional" but implementation
required the second option

4. When testing tools searching through document fragments was
not working at all cause we did not properly load the tool
2024-10-25 16:01:25 +11:00
Sam 4923837165
FIX: Llm selector / forced tools / search tool (#862)
* FIX: Llm selector / forced tools / search tool


This fixes a few issues:

1. When search was not finding any semantic results we would break the tool
2. Gemin / Anthropic models did not implement forced tools previously despite it being an API option
3. Mechanics around displaying llm selector were not right. If you disabled LLM selector server side persona PM did not work correctly.
4. Disabling native tools for anthropic model moved out of a site setting. This deliberately does not migrate cause this feature is really rare to need now, people who had it set probably did not need it.
5. Updates anthropic model names to latest release

* linting

* fix a couple of tests I missed

* clean up conditional
2024-10-25 06:24:53 +11:00
Rafael dos Santos Silva 3022d34613
FEATURE: Support srv records for OpenAI compatible LLMs (#865) 2024-10-24 15:47:12 -03:00
Rafael dos Santos Silva 96f5f8cbd0
FIX: Basic cleanup of AI Caption to remove line breaks and pipes (#857) 2024-10-23 18:38:29 -03:00
Sam f1283e156d
FEATURE: allow scoping of google tool queries (#852)
This allows to simply scope search results to specific domains and prepend arbitrary snippets to searches made
2024-10-23 16:55:10 +11:00
Sam 059d3b6fd2
FEATURE: better logging for automation reports (#853)
A new feature_context json column was added to ai_api_audit_logs

This allows us to store rich json like context on any LLM request
made.

This new field now stores automation id and name.

Additionally allows llm_triage to specify maximum number of tokens

This means that you can limit the cost of llm triage by scanning only
first N tokens of a post.
2024-10-23 16:49:56 +11:00
Sam a1f859a415
FEATURE: improve visibility of AI usage in LLM page (#845)
This changeset: 

1. Corrects some issues with "force_default_llm" not applying
2. Expands the LLM list page to show LLM usage
3. Clarifies better what "enabling a bot" on an llm means (you get it in the selector)
2024-10-22 11:16:02 +11:00
Roman Rizzi 3533814870
UX: Avoid introductory phrases and summarize topics without replies (#848) 2024-10-21 17:53:48 -03:00
Roman Rizzi 6d504ab80d
FEATURE: Make hot topic gists opt-in. (#846)
This change restricts gists to members of specific groups. It also fixes a bug where other lists could display the gist if available.
2024-10-21 15:15:25 -03:00
Roman Rizzi 27b5542357
FEATURE: Generate topic gists for the hot topics list. (#837)
* Display gists in the hot topics list

* Adjust hot topics gist strategy and add a job to generate gists

* Replace setting with a configurable batch size

* Avoid loading summaries for other topic lists

* Tweak gist prompt to focus on latest posts in the context of the OP

* Remove serializer hack and rely on core change from discourse/discourse#29291

* Update lib/summarization/strategies/hot_topic_gists.rb

Co-authored-by: Rafael dos Santos Silva <xfalcox@gmail.com>

---------

Co-authored-by: Rafael dos Santos Silva <xfalcox@gmail.com>
2024-10-18 18:01:39 -03:00
Rafael dos Santos Silva decf1bb49d
FIX: api key header error (#839)
* FIX: api key header error

* FIX: remove unnecessary headers

* FIX: a error

* FEATURE: Add both headers

---------

Co-authored-by: 耗子 <i@haozi.net>
2024-10-16 15:57:36 -03:00
Kris 25cc03809a
DEV: ensure far-copy icon is included in subset (#841) 2024-10-16 12:32:13 -04:00
Rafael dos Santos Silva 792703c942
FEATURE: Discord Bot integration (#831)
This adds support for the a Discord bot that can search in a Discourse instance when invoked via slash commands in Discord Guild channel.
2024-10-16 12:41:18 -03:00
Martin Brennan d7745d1ac3
FIX: Remove missed AiPersona.allowed_chat (#838)
Followup bdf3b6268b

I think this is the fix but not sure how to test it,
this is breaking the build
2024-10-16 10:33:24 +11:00
Sam bdf3b6268b
FEATURE: smarter persona tethering (#832)
Splits persona permissions so you can allow a persona on:

- chat dms
- personal messages
- topic mentions
- chat channels

(any combination is allowed)

Previously we did not have this flexibility.

Additionally, adds the ability to "tether" a language model to a persona so it will always be used by the persona. This allows people to use a cheaper language model for one group of people and more expensive one for other people
2024-10-16 07:20:31 +11:00
Roman Rizzi c7acb4a6a0
REFACTOR: Support of different summarization targets/prompts. (#835)
* DEV: Add summary types

* Refactor for different summary types

* Use enum for summary types

* Update lib/summarization/strategies/topic_summary.rb

Co-authored-by: Penar Musaraj <pmusaraj@gmail.com>

* Update lib/summarization/strategies/topic_gist.rb

Co-authored-by: Penar Musaraj <pmusaraj@gmail.com>

* Update lib/summarization/strategies/chat_messages.rb

Co-authored-by: Penar Musaraj <pmusaraj@gmail.com>

* Fix chat_messages single prompt

* Small tweak to the chat summarization prompt

---------

Co-authored-by: Penar Musaraj <pmusaraj@gmail.com>
2024-10-15 13:53:26 -03:00
Rafael dos Santos Silva 791fad1e6a
FEATURE: Index embeddings using bit vectors (#824)
On very large sites, the rare cache misses for Related Topics can take around 200ms, which affects our p99 metric on the topic page. In order to mitigate this impact, we now have several tools at our disposal.

First, one is to migrate the index embedding type from halfvec to bit and change the related topic query to leverage the new bit index by changing the search algorithm from inner product to Hamming distance. This will reduce our index sizes by 90%, severely reducing the impact of embeddings on our storage. By making the related query a bit smarter, we can have zero impact on recall by using the index to over-capture N*2 results, then re-ordering those N*2 using the full halfvec vectors and taking the top N. The expected impact is to go from 200ms to <20ms for cache misses and from a 2.5GB index to a 250MB index on a large site.

Another tool is migrating our index type from IVFFLAT to HNSW, which can increase the cache misses performance even further, eventually putting us in the under 5ms territory. 

Co-authored-by: Roman Rizzi <roman@discourse.org>
2024-10-14 13:26:03 -03:00
Hoa Nguyen 94010a5f78
FEATURE: Tools for models from Ollama provider (#819)
Adds support for Ollama function calling
2024-10-11 07:25:53 +11:00
Sam 6c4c96e83c
FEATURE: allow persona to only force tool calls on limited replies (#827)
This introduces another configuration that allows operators to
limit the amount of interactions with forced tool usage.

Forced tools are very handy in initial llm interactions, but as
conversation progresses they can hinder by slowing down stuff
and adding confusion.
2024-10-11 07:23:42 +11:00
Mark VanLandingham 52d90cf1bc
DEV: Add apply_modifier for SemanticTopicQuery topics list (#830) 2024-10-10 12:13:16 -05:00
Sam e1a0eb6131
FEATURE: support chain halting and upload creation support (#821)
This adds chain halting (ability to terminate llm chain in a tool)
and the ability to create uploads in a tool

Together this lets us integrate custom image generators into a
custom tool.
2024-10-09 08:17:45 +11:00
Sam 545500b329
FEATURE: allows forced LLM tool use (#818)
* FEATURE: allows forced LLM tool use

Sometimes we need to force LLMs to use tools, for example in RAG
like use cases we may want to force an unconditional search.

The new framework allows you backend to force tool usage.

Front end commit to follow

* UI for forcing tools now works, but it does not react right

* fix bugs

* fix tests, this is now ready for review
2024-10-05 09:46:57 +10:00
Sam c294b6d394
FEATURE: allow llm triage to automatically hide posts (#820)
Previous to this change we could flag, but there was no way
to hide content and treat the flag as spam.

We had the option to hide topics, but this is not desirable for
a spam reply.

New option allows triage to hide a post if it is a reply, if the
post happens to be the first post on the topic, the topic will
be hidden.
2024-10-04 16:11:30 +10:00
Hoa Nguyen 2063b3854f
FEATURE: Add Ollama provider (#812)
This allows our users to add the Ollama provider and use it to serve our AI bot (completion/dialect).

In this PR, we introduce:

    DiscourseAi::Completions::Dialects::Ollama which would help us translate by utilizing Completions::Endpoint::Ollama
    Correct extract_completion_from and partials_from in Endpoints::Ollama

Also

    Add tests for Endpoints::Ollama
    Introduce ollama_model fabricator
2024-10-01 10:45:03 +10:00
Sam 5cbc9190eb
FEATURE: RAG search within tools (#802)
This allows custom tools access to uploads and sophisticated searches using embedding.

It introduces:

 - A shared front end for listing and uploading files (shared with personas)
 -  Backend implementation of index.search function within a custom tool.

Custom tools now may search through uploaded files

function invoke(params) {
   return index.search(params.query)
}

This means that RAG implementers now may preload tools with knowledge and have high fidelity over
the search.

The search function support

    specifying max results
    specifying a subset of files to search (from uploads)

Also

 - Improved documentation for tools (when creating a tool a preamble explains all the functionality)
  - uploads were a bit finicky, fixed an edge case where the UI would not show them as updated
2024-09-30 17:27:50 +10:00
Sam 4b21eb7974
FEATURE: basic support for GPT-o models (#804)
Caveats

- No streaming, by design
- No tool support (including no XML tools)
- No vision

Open AI will revamt the model and more of these features may
become available.

This solution is a bit hacky for now
2024-09-17 09:41:00 +10:00
Sam 03eccbe392
FEATURE: Make tool support polymorphic (#798)
Polymorphic RAG means that we will be able to access RAG fragments both from AiPersona and AiCustomTool

In turn this gives us support for richer RAG implementations.
2024-09-16 08:17:17 +10:00
Sam 5b9add0ac8
FEATURE: add a SambaNova LLM provider (#797)
Note, at the moment the context window is quite small, it is
mainly useful as a helper backend or hyde generator
2024-09-12 11:28:08 +10:00
Sam 36ce88f356
FIX: support case insensitive setting lookup (#795) 2024-09-10 15:21:03 +10:00
Sam a48acc894a
FEATURE: more accurate and faster titles (#791)
Previously we waited 1 minute before automatically titling PMs

The new change introduces adding a title immediately after the the
llm replies

Prompt was also modified to include the LLM reply in title suggestion.

This helps situation like:

user: tell me a joke
llm: a very funy joke about horses

Then the title would be "A Funny Horse Joke"

Specs already covered some auto title logic, amended to also
catch the new message bus message we have been sending.
2024-09-03 15:52:20 +10:00
Roman Rizzi db5cbfb148
FIX: Bail earlier when a chat thread has no messages (#789) 2024-08-30 17:17:14 -03:00