Commit Graph

67 Commits

Author SHA1 Message Date
Roman Rizzi c6aeabbfc0
FIX: Malformed message in systemless + inline img scenario (#771) 2024-08-23 16:41:57 -03:00
Roman Rizzi 64641b6175
FEATURE: LLM Triage support for systemless models. (#757)
* FEATURE: LLM Triage support for systemless models.

This change adds support for OSS models that don't support system messages. LlmTriage's system message field is no longer mandatory; we now send the post contents in a separate user message.

* Models using Ollama can also disable system prompts
2024-08-21 11:41:55 -03:00
Sam cf1e15ef12
FIX: gemini 0801 tool calls (#748)
The Gemini experimental model requires tool_config.

Previously, defaults would apply.

This corrects prompts containing multiple tools on Gemini.
2024-08-12 16:10:16 +10:00
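
As context for the fix above: Gemini's v1beta generateContent API accepts an explicit tool_config alongside the tools list. A minimal sketch of such a request body, with field names per Google's public docs and a made-up tool (this is illustrative, not the plugin's code):

    # Hypothetical Gemini request body with an explicit tool_config;
    # without one, the experimental model applies defaults that mishandle
    # prompts containing multiple tools.
    tag_tool = { name: "tag_topic", description: "Tag a forum topic" } # hypothetical
    body = {
      contents: [{ role: "user", parts: [{ text: "Tag this topic" }] }],
      tools: [{ function_declarations: [tag_tool] }],
      tool_config: { function_calling_config: { mode: "AUTO" } }
    }
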
Sam 948cf893a9
FIX: Add tool support to open ai compatible dialect and vllm (#734)
* FIX: Add tool support to open ai compatible dialect and vllm

Automatic tool support is in progress in vLLM; see: https://github.com/vllm-project/vllm/pull/5649

Even when tools are supported, initial support will be uneven: only some models have native tool support, notably Mistral, which has special tokens for tool calls.

After the above PR lands in vLLM, we will still need to swap to XML-based tools on models without native tool support.

* fix specs
2024-08-02 09:52:33 -03:00
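
For reference, the OpenAI-style tools parameter that an OpenAI-compatible server such as vLLM accepts looks roughly like this; the tool itself is made up, and this is a sketch rather than the plugin's code:

    # A single function tool in the OpenAI chat-completions format.
    tools = [
      {
        type: "function",
        function: {
          name: "search_topics", # hypothetical tool
          description: "Search forum topics by keyword",
          parameters: {
            type: "object",
            properties: { query: { type: "string" } },
            required: ["query"]
          }
        }
      }
    ]
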
Roman Rizzi bed044448c
DEV: Remove old code now that features rely on LlmModels. (#729)
* DEV: Remove old code now that features rely on LlmModels.

* Hide old settings and migrate persona llm overrides

* Remove shadowing special URL + seeding code. Use srv:// prefix instead.
2024-07-30 13:44:57 -03:00
Roman Rizzi 5c196bca89
FEATURE: Track if a model can do vision in the llm_models table (#725)
* FEATURE: Track if a model can do vision in the llm_models table

* Data migration
2024-07-24 16:29:47 -03:00
Roman Rizzi 0a8195242b
FIX: Limit system message size to 60% of available tokens. (#714)
Using RAG fragments can lead to considerably large system messages, which becomes problematic when models have a smaller context window.

Before this change, we only looked at the rest of the conversation to make sure we didn't surpass the limit, which could lead to two unwanted scenarios when having large system messages:

- All other messages are excluded due to size.
- The system message already exceeds the limit.

As a result, I'm putting a hard limit of 60% of available tokens. We don't want to truncate aggressively, because if RAG fragments are included the system message contains a lot of context to improve the model response, but we also want to make room for the recent messages in the conversation.
2024-07-12 15:09:01 -03:00
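
A minimal sketch of the rule described above, assuming a tokenizer object with tokenize/detokenize methods; the names are illustrative, not the plugin's actual API:

    # Cap the system message at 60% of the model's available tokens,
    # leaving the rest of the window for the conversation itself.
    SYSTEM_BUDGET_RATIO = 0.6

    def truncate_system_message(content, available_tokens, tokenizer)
      budget = (available_tokens * SYSTEM_BUDGET_RATIO).to_i
      tokens = tokenizer.tokenize(content)
      return content if tokens.size <= budget
      tokenizer.detokenize(tokens.first(budget))
    end
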
Roman Rizzi 442681a3d3
FIX: Mixtral models have system role support. (#703)
Using the assistant role for the system message produces an error because these models expect alternating roles (user/assistant/user, and so on), and prompts cannot start with the assistant role.
2024-07-04 13:23:03 -03:00
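
To make the constraint concrete, a hypothetical message sequence: a leading system message is fine, but roles must then alternate, and a prompt may not open with the assistant role.

    # Accepted: system prompt followed by strictly alternating turns.
    messages = [
      { role: "system",    content: "You are a moderation assistant." },
      { role: "user",      content: "Classify this post." },
      { role: "assistant", content: "Category: spam." },
      { role: "user",      content: "Explain why." }
    ]

    # Rejected: the prompt starts with the assistant role.
    bad = [{ role: "assistant", content: "..." }]
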
Sam 1d5fa0ce6c
FIX: when creating an llm we were not creating user (#685)
This meant that if you toggled the AI user early, it surprisingly did not work.

Also remove safety settings from Gemini; it is overly cautious.
2024-06-24 09:59:42 +10:00
Sam 460f5c4553
FIX: display search correctly, bug when stripping XML (#668)
- Display filtered search correctly, so it is not confusing
- When XML stripping, if a chunk was `<` it would crash
- SQL Helper improved to be better aware of Data Explorer
2024-06-14 15:28:40 +10:00
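
The `<` crash comes from a chunk ending mid-tag. A sketch of the kind of guard this implies, with illustrative names: emit text up to any unfinished tag and hold the remainder for the next chunk.

    # Returns [stripped_text, pending]; feed `pending` into the next call.
    def strip_xml_chunk(chunk, pending = +"")
      buffer = pending + chunk
      if (idx = buffer.rindex("<")) && !buffer.index(">", idx)
        # The buffer ends inside an unfinished tag (possibly just "<").
        [buffer[0...idx].gsub(/<[^>]*>/, ""), buffer[idx..]]
      else
        [buffer.gsub(/<[^>]*>/, ""), +""]
      end
    end
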
Sam 8b81ff45b8
FIX: switch off native tools on Anthropic Claude Opus (#659)
Native tools do not work well on Opus.

Chain of Thought prompting means it consumes enormous amounts of
tokens and has poor latency.

This commit introduces an XML stripper to remove various chain-of-thought XML islands from Anthropic prompts when tools are involved.

This means Opus native tools are now functional (albeit slowly).

From local testing, XML just works better now.

Also fixes enum support in Anthropic native tools
2024-06-07 10:52:01 -03:00
Sam 3993c685e1
FEATURE: anthropic function calling (#654)
Adds support for native tool calling (both streaming and non-streaming) for Anthropic.

This improves general tool support on the Anthropic models.
2024-06-06 08:34:23 +10:00
Sam 564d2de534
FEATURE: Add native Cohere tool support (#655)
Add native Cohere tool support

- Introduce CohereTools class for tool translation and result processing
- Update Command dialect to integrate with CohereTools
- Modify Cohere endpoint to support passing tools and processing tool calls
- Add spec for testing tool triggering with Cohere endpoint
2024-06-04 08:59:15 +10:00
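
For reference, Cohere's Chat API documents tool definitions with parameter_definitions rather than a JSON Schema. A sketch in that shape; the tool below is hypothetical:

    tool = {
      name: "search_topics", # hypothetical tool
      description: "Search forum topics by keyword",
      parameter_definitions: {
        query: { description: "Search terms", type: "str", required: true }
      }
    }
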
Sam baf88e7cfc
FEATURE: improve logging by including llm name (#640)
Log the language model name when logging api requests
2024-05-27 16:46:01 +10:00
Sam d5c23f01ff
FIX: correct gemini streaming implementation (#632)
This also implements image support and gemini-flash support
2024-05-22 16:35:29 +10:00
Roman Rizzi 1d786fbaaf
FEATURE: Set endpoint credentials directly from LlmModel. (#625)
* FEATURE: Set endpoint credentials directly from LlmModel.

Drop Llama2Tokenizer since we no longer use it.

* Allow http for custom LLMs

---------

Co-authored-by: Rafael Silva <xfalcox@gmail.com>
2024-05-16 09:50:22 -03:00
Sam 8eee6893d6
FEATURE: GPT4o support and better auditing (#618)
- Introduce new support for GPT4o (automation / bot / summary / helper)
- Properly account for token counts on OpenAI models
- Track feature that was used when generating AI completions
- Remove custom llm support for summarization as we need better interfaces to control registration and de-registration
2024-05-14 13:28:46 +10:00
Roman Rizzi 62fc7d6ed0
FEATURE: Configurable LLMs. (#606)
This PR introduces the concept of "LlmModel" as a new way to quickly add new LLM models without making any code changes. We are releasing this first version and will add incremental improvements, so expect changes.

The AI Bot can't fully take advantage of this feature as users are hard-coded. We'll fix this in a separate PR.
2024-05-13 12:46:42 -03:00
Sam 514823daca
FIX: streaming broken in bedrock when chunks are not aligned (#609)
Also

- Stop caching llm list - this caused the llm list in persona to be incorrect
- Add more UI to debug screen so you can properly see raw response
2024-05-09 12:11:50 +10:00
Roman Rizzi 4f1a3effe0
REFACTOR: Migrate Vllm/TGI-served models to the OpenAI format. (#588)
Both endpoints provide OpenAI-compatible servers. The only difference is that Vllm doesn't support passing tools as a separate parameter. Even if the tool param is supported, it ultimately relies on the model's ability to handle native functions, which is not the case with the models we have today.

As a part of this change, we are dropping support for StableBeluga/Llama2 models. They don't have a chat_template, meaning the new API can't translate them.

These changes let us remove some of our existing dialects and are a first step in our plan to support any LLM by defining them as data-driven concepts.

I rewrote the "translate" method to use a template method and extracted the tool support strategies into their own classes to simplify the code.

Finally, these changes bring support for Ollama when running in dev mode. It only works with Mistral for now, but that will change soon.
2024-05-07 10:02:16 -03:00
Sam e4b326c711
FEATURE: support Chat with AI Persona via a DM (#488)
Add support for chat with AI personas

- Allow enabling chat for AI personas that have an associated user
- Add new setting `allow_chat` to AI persona to enable/disable chat
- When a message is created in a DM channel with an allowed AI persona user, schedule a reply job
- AI replies to chat messages using the persona's `max_context_posts` setting to determine context
- Store tool calls and custom prompts used to generate a chat reply on the `ChatMessageCustomPrompt` table
- Add tests for AI chat replies with tools and context

At the moment, unlike posts, we do not carry tool calls in the context.

There is no @mention support yet for AI personas in channels; this is future work.
2024-05-06 09:49:02 +10:00
Sam 6623928b95
FIX: call after tool calls failing on OpenAI / Gemini (#599)
A recent change meant that the llm instance got cached internally; repeat calls to inference would cache data in the Endpoint object, leading to model failures.

Both Gemini and Open AI expect a clean endpoint object because they set data on it.

This amends internals to make sure llm.generate will always operate on clean objects.
2024-05-01 17:50:58 +10:00
Sam a223d18f1a
FIX: more robust function call support (#581)
For quite a few weeks now, sometimes, when running function calls on Anthropic, we would get a stray "calls" line.

This has been enormously frustrating!

I have been unable to find the source of the bug, so instead I decoupled the implementation and created a very clear "function call normalizer".

This new class is extensively tested and guards against the type of
edge cases we saw pre-normalizer.

This also simplifies the implementation of "endpoint" which no longer
needs to handle all this complex logic.
2024-04-19 06:54:54 +10:00
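
The commit above doesn't show the normalizer itself, so as a toy illustration only: the simplest version of this category of guard filters the stray token out of the streamed payload before it is parsed.

    # Toy sketch: drop bare "calls" lines that occasionally leak into
    # streamed function-call text (the real normalizer is more thorough).
    def normalize_function_payload(raw)
      raw.lines.reject { |line| line.strip == "calls" }.join
    end
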
Sam 23d12c8927
FEATURE: GPT-4 turbo vision support (#575)
The recent release of GPT-4 Turbo adds vision support; this adds the pipeline for sending images to Open AI.
2024-04-11 16:22:59 +10:00
Sam a77658e2b1
FIX: tools broke on Claude with no params (#574)
Some tools may have no params, allow that
2024-04-11 15:17:56 +10:00
Sam 3b0cfdbe5c
FIX: throwing away first file in diff (#573)
We were chucking out the first file in a PR diff due to a
logic bug
2024-04-11 13:26:58 +10:00
Sam 0cbbf130b9
FIX: never mention the word JSON in tool preamble (#572)
Just having the word JSON can confuse models when we expect them
to deal solely in XML

Instead provide an example of how string arrays should be returned

Technically the tool framework supports int arrays and more, but
our current implementation only does string arrays.

Also tune the prompt construction not to give any tips about arrays
if none exist
2024-04-11 11:24:22 +10:00
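
Along those lines, a hedged example of the kind of string-array demonstration the preamble can include; the tag names here are illustrative, not necessarily the plugin's exact markup:

    example = <<~XML
      <invoke>
        <tool_name>tag_topic</tool_name>
        <parameters>
          <tags>
            <item>support</item>
            <item>feature-request</item>
          </tags>
        </parameters>
      </invoke>
    XML
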
Sam 7f16d3ad43
FEATURE: Cohere Command R support (#558)
- Added Cohere Command models (Command, Command Light, Command R, Command R Plus) to the available model list
- Added a new site setting `ai_cohere_api_key` for configuring the Cohere API key
- Implemented a new `DiscourseAi::Completions::Endpoints::Cohere` class to handle interactions with the Cohere API, including:
   - Translating request parameters to the Cohere API format
   - Parsing Cohere API responses 
   - Supporting streaming and non-streaming completions
   - Supporting "tools" which allow the model to call back to discourse to lookup additional information
- Implemented a new `DiscourseAi::Completions::Dialects::Command` class to translate between the generic Discourse AI prompt format and the Cohere Command format
- Added specs covering the new Cohere endpoint and dialect classes
- Updated `DiscourseAi::AiBot::Bot.guess_model` to map the new Cohere model to the appropriate bot user

In summary, this PR adds support for using the Cohere Command family of models with the Discourse AI plugin. It handles configuring API keys, making requests to the Cohere API, and translating between Discourse's generic prompt format and Cohere's specific format. Thorough test coverage was added for the new functionality.
2024-04-11 07:24:17 +10:00
Sam 6f5f34184b
FEATURE: add Claude 3 Haiku bot support (#552)
It is close in performance to GPT-4 at a fraction of the cost; nice to add it to the mix.

Also improves a test case to simulate streaming, I am hunting for
the "calls" word that is jumping into function calls and can't quite
find it.
2024-04-03 16:06:27 +11:00
Sam 61e4c56e1a
FEATURE: Add vision support to AI personas (Claude 3) (#546)
This commit adds the ability to enable vision for AI personas, allowing them to understand images that are posted in the conversation.

For personas with vision enabled, any images the user has posted will be resized to be within the configured max_pixels limit, base64 encoded and included in the prompt sent to the AI provider.

The persona editor allows enabling/disabling vision and has a dropdown to select the max supported image size (low, medium, high). Vision is disabled by default.

This initial vision support has been tested and implemented with Anthropic's claude-3 models which accept images in a special format as part of the prompt.

Other integrations will need to be updated to support images.

Several specs were added to test the new functionality at the persona, prompt building and API layers.

 - Gemini is omitted, pending API support for Gemini 1.5. The current Gemini bot is not performing well; adding images is unlikely to make it perform any better.

 - Open AI is omitted; vision support on GPT-4 is limited in that the API has no tool support when images are enabled, so we would need to fall back to a different prompting technique, something that would add lots of complexity.


---------

Co-authored-by: Martin Brennan <martin@discourse.org>
2024-03-27 14:30:11 +11:00
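
For the Anthropic claude-3 integration, an image travels as a base64 content block in the Messages API. A minimal sketch per Anthropic's published format; the file name and question are made up:

    require "base64"

    # Hypothetical resized image, already within the max_pixels limit.
    encoded = Base64.strict_encode64(File.binread("screenshot.png"))

    message = {
      role: "user",
      content: [
        { type: "image",
          source: { type: "base64", media_type: "image/png", data: encoded } },
        { type: "text", text: "What is shown in this screenshot?" }
      ]
    }
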
Sam f62703760f
FEATURE: add Claude 3 sonnet/haiku support for Amazon Bedrock (#534)
This PR consolidates the implementation of the new Anthropic Messages interface for Bedrock Claude endpoints and adds support for the new Claude 3 models (haiku, opus, sonnet).

Key changes:
- Renamed `AnthropicMessages` and `Anthropic` endpoint classes into a single `Anthropic` class (ditto for ClaudeMessages -> Claude)
- Updated `AwsBedrock` endpoints to use the new `/messages` API format for all Claude models
- Added `claude-3-haiku`, `claude-3-opus` and `claude-3-sonnet` model support in both Anthropic and AWS Bedrock endpoints
- Updated specs for the new consolidated endpoints and Claude 3 model support

This refactor removes support for the old non-messages API, which has been deprecated by Anthropic.
2024-03-19 06:48:46 +11:00
Sam 79638c2f50
FIX: Tune function calling (#519)
Adds support for "name" on functions which can be used for tool calls

For function calls we need to keep track of id/name and previously
we only supported either

Also attempts to improve sql helper
2024-03-09 08:46:40 +11:00
Sam 2ad743d246
FEATURE: Add GitHub Helper AI Bot persona and tools (#513)
Introduces a new AI Bot persona called 'GitHub Helper' which is specialized in assisting with GitHub-related tasks and questions. It includes the following key changes:

- Implements the GitHub Helper persona class with its system prompt and available tools
   
- Adds three new AI Bot tools for GitHub interactions:
  - github_file_content: Retrieves content of files from a GitHub repository
  - github_pull_request_diff: Retrieves the diff for a GitHub pull request
  - github_search_code: Searches for code in a GitHub repository
    
- Updates the AI Bot dialects to support the new GitHub tools

- Implements multiple function calls for standard tool dialect
2024-03-08 06:37:23 +11:00
Loïc Guitaut 6ae4218a96 DEV: Fix new Rubocop offenses 2024-03-06 15:23:29 +01:00
Sam 8b382d6098
FEATURE: support for claude opus and sonnet (#508)
This provides new support for the Messages API from Claude.

It is required for access to the latest models.

Also corrects implementation of function calls.

* Fix message interleaving

* fix broken spec

* add new models to automation
2024-03-06 06:04:37 +11:00
Sam c02794cf2e
FIX: support multiple tool calls (#502)
* FIX: support multiple tool calls

Prior to this change we had a hard limit of 1 tool call per llm
round trip. This meant you could not google multiple things at
once or perform searches across two tools.

Also:

- Hint when Google stops working
- Log topic_id / post_id when performing completions

* Also track id for title
2024-03-02 07:53:21 +11:00
Sam 9fb1430e40
FIX: support spaces within arguments for Open AI (#499)
Prior to this fix, if a tool call ever streamed a SPACE alone, we would eat and ignore it, breaking params.

Also fixes some tests to ensure they are actually called :)
2024-02-29 12:47:34 +11:00
Sam aabff87501
FIX: image generation in gemini was broken (#490)
We need to inject blank model answers after tool calls if absent; otherwise the model will reject the prompt.
2024-02-27 18:24:30 +11:00
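
A sketch of the repair described above, assuming a Gemini-style contents array in which a function response must follow a model turn; role and helper names are illustrative:

    # Insert an empty model turn before any function response that does
    # not already follow one, so Gemini accepts the conversation.
    def inject_blank_model_answers(contents)
      contents.each_with_object([]) do |entry, fixed|
        if entry[:role] == "function" && fixed.last&.dig(:role) != "model"
          fixed << { role: "model", parts: [{ text: "" }] }
        end
        fixed << entry
      end
    end
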
Roman Rizzi 94ba0dadc2
SECURITY: Place a SSRF protection when calling services from the plugin. (#485)
The Faraday adapter and `FinalDestination::HTTP` will protect us from admin-initiated SSRF attacks when interacting with the external services powering this plugin's features.
2024-02-21 17:14:50 -03:00
Roman Rizzi 0634b85a81
UX: Validations to LLM-backed features (except AI Bot) (#436)
* UX: Validations to Llm-backed features (except AI Bot)

This change is part of an ongoing effort to prevent enabling a broken feature due to lack of configuration. We also want to be explicit about which provider we are going to use. For example, Claude models are available through AWS Bedrock and Anthropic, but the configuration differs.

Validations are:

* You must choose a model before enabling the feature.
* You must turn off the feature before setting the model to blank.
* You must configure each model settings before being able to select it.

* Add provider name to summarization options

* vLLM can technically support same models as HF

* Check we can talk to the selected model

* Check for Bedrock instead of Anthropic, as a site could have both creds set up
2024-01-29 16:04:25 -03:00
Jarek Radosz 5802cd1a0c
DEV: Fix various typos (#434) 2024-01-19 12:51:26 +01:00
Roman Rizzi 5bdf3dc1f4
DEV: Stop using shared_examples for endpoint specs (#430) 2024-01-17 15:08:49 -03:00
Sam 05d8b021f1
FIX: scrub invalid prompts when truncating (#426)
When you trim a prompt, we never want a state where there is a "tool" reply without a corresponding tool call; it makes no sense.

Also

- GPT-4-Turbo is 128k, fix that
- Claude was not preserving username in prompt
- We were throwing away unicode usernames instead of adding them to the message
2024-01-16 13:48:00 +11:00
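
A sketch of the invariant, with illustrative message shapes: after trimming, keep a tool reply only if its originating tool call survived the cut.

    # Drop tool replies whose tool_call was trimmed out of the prompt.
    def scrub_orphaned_tool_replies(messages)
      seen_call_ids = []
      messages.select do |msg|
        seen_call_ids << msg[:id] if msg[:type] == :tool_call
        msg[:type] != :tool || seen_call_ids.include?(msg[:id])
      end
    end
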
Roman Rizzi ff4da6ace8
FIX: Clean unicode usernames when adding messages through prompt's constructor (#425) 2024-01-15 12:01:40 -03:00
Sam 825f01cfb2
FEATURE: even smoother streaming (#420)
Account properly for function calls; don't stream through <details> blocks
- Rush cooked content back to client
- Wait longer (up to 60 seconds) before giving up on streaming
- Clean up message bus channels so we don't have leftover data
- Make ai streamer much more reusable and much easier to read
- If buffer grows quickly, rush update so you are not artificially waiting
- Refine prompt interface
- Fix lost system message when prompt gets long
2024-01-15 18:51:14 +11:00
Roman Rizzi 04eae76f68
REFACTOR: Represent generic prompts with an Object. (#416)
* REFACTOR: Represent generic prompts with an Object.

* Adds a bit more validation for clarity

* Rewrite bot title prompt and fix quirk handling

---------

Co-authored-by: Sam Saffron <sam.saffron@gmail.com>
2024-01-12 14:36:44 -03:00
Sam 8df966e9c5
FEATURE: smooth streaming of AI responses on the client (#413)
This PR introduces 3 things:

1. A fake bot that can be used locally so you can test LLMs; to enable on dev, use:

    SiteSetting.ai_bot_enabled_chat_bots = "fake"

2. More elegant smooth streaming of progress on LLM completion

This leans on JavaScript to buffer and trickle llm results through. It also amends things so the progress dot is rendered much more consistently.

3. It fixes the Claude dialect 

Claude needs newlines **exactly** at the right spot; amended so it is happy.

---------

Co-authored-by: Martin Brennan <martin@discourse.org>
2024-01-11 15:56:40 +11:00
Roman Rizzi abde82c1f3
FIX: Use claude-2.1 to enable system prompts (#411) 2024-01-09 14:10:20 -03:00
Sam b0a0cbe3ca
FIX: improve bot behavior (#408)
* FIX: improve bot behavior

- Provide more information to Gemini context post function execution
- Use system prompts for Claude (fixes Dall E)
- Ensure Assistant is properly separated
- Teach Claude to return arrays in JSON vs XML

Also refactors tests so we do not copy tool preamble everywhere

* System msg is claude-2 only. Fix typo

---------

Co-authored-by: Roman Rizzi <rizziromanalejandro@gmail.com>
2024-01-08 10:28:03 -03:00
Roman Rizzi 6124f910c1
FIX: Bring back Azure support. (#407)
We thought Azure's latest API version didn't have tool support yet, but I didn't realize it was complaining about a required field in the tool call message.
2024-01-05 17:08:10 -03:00