discourse-ai

mirror of https://github.com/discourse/discourse-ai.git synced 2025-02-18 17:34:52 +00:00

Author	SHA1	Message	Date
Sam	545500b329	FEATURE: allows forced LLM tool use (#818 ) * FEATURE: allows forced LLM tool use Sometimes we need to force LLMs to use tools, for example in RAG-like use cases we may want to force an unconditional search. The new framework allows the backend to force tool usage. Front end commit to follow * UI for forcing tools now works, but it does not react right * fix bugs * fix tests, this is now ready for review	2024-10-05 09:46:57 +10:00
Hoa Nguyen	2063b3854f	FEATURE: Add Ollama provider (#812 ) This allows our users to add the Ollama provider and use it to serve our AI bot (completion/dialect). In this PR, we introduce: DiscourseAi::Completions::Dialects::Ollama which would help us translate by utilizing Completions::Endpoint::Ollama Correct extract_completion_from and partials_from in Endpoints::Ollama Also Add tests for Endpoints::Ollama Introduce ollama_model fabricator	2024-10-01 10:45:03 +10:00
Roman Rizzi	bed044448c	DEV: Remove old code now that features rely on LlmModels. (#729 ) * DEV: Remove old code now that features rely on LlmModels. * Hide old settings and migrate persona llm overrides * Remove shadowing special URL + seeding code. Use srv:// prefix instead.	2024-07-30 13:44:57 -03:00
Roman Rizzi	5c196bca89	FEATURE: Track if a model can do vision in the llm_models table (#725 ) * FEATURE: Track if a model can do vision in the llm_models table * Data migration	2024-07-24 16:29:47 -03:00
Roman Rizzi	0a8195242b	FIX: Limit system message size to 60% of available tokens. (#714 ) Using RAG fragments can lead to considerably big system messages, which becomes problematic when models have a smaller context window. Before this change, we only look at the rest of the conversation to make sure we don't surpass the limit, which could lead to two unwanted scenarios when having large system messages: All other messages are excluded due to size. The system message already exceeds the limit. As a result, I'm putting a hard limit of 60% of available tokens. We don't want to aggressively truncate because if RAG fragments are included, the system message contains a lot of context to improve the model response, but we also want to make room for the recent messages in the conversation.	2024-07-12 15:09:01 -03:00
Roman Rizzi	442681a3d3	FIX: Mixtral models have system role support. (#703 ) Using assistant role for system produces an error because they expect alternating roles like user/assistant/user and so on. Prompts cannot start with the assistant role.	2024-07-04 13:23:03 -03:00
Roman Rizzi	1d786fbaaf	FEATURE: Set endpoint credentials directly from LlmModel. (#625 ) * FEATURE: Set endpoint credentials directly from LlmModel. Drop Llama2Tokenizer since we no longer use it. * Allow http for custom LLMs --------- Co-authored-by: Rafael Silva <xfalcox@gmail.com>	2024-05-16 09:50:22 -03:00
Roman Rizzi	8b00c47087	FIX: dialects var was not defined in prod (#617 )	2024-05-13 17:28:27 -03:00
Roman Rizzi	e22194f321	HACK: Llama3 support for summarization/AI helper. (#616 ) There are still some limitations to which models we can support with the `LlmModel` class. This will enable support for Llama3 while we sort those out.	2024-05-13 15:54:42 -03:00
Roman Rizzi	62fc7d6ed0	FEATURE: Configurable LLMs. (#606 ) This PR introduces the concept of "LlmModel" as a new way to quickly add new LLM models without making any code changes. We are releasing this first version and will add incremental improvements, so expect changes. The AI Bot can't fully take advantage of this feature as users are hard-coded. We'll fix this in a separate PR.	2024-05-13 12:46:42 -03:00
Roman Rizzi	4f1a3effe0	REFACTOR: Migrate Vllm/TGI-served models to the OpenAI format. (#588 ) Both endpoints provide OpenAI-compatible servers. The only difference is that Vllm doesn't support passing tools as a separate parameter. Even if the tool param is supported, it ultimately relies on the model's ability to handle native functions, which is not the case with the models we have today. As a part of this change, we are dropping support for StableBeluga/Llama2 models. They don't have a chat_template, meaning the new API can't translate them. These changes let us remove some of our existing dialects and are a first step in our plan to support any LLM by defining them as data-driven concepts. I rewrote the "translate" method to use a template method and extracted the tool support strategies into their own classes to simplify the code. Finally, these changes bring support for Ollama when running in dev mode. It only works with Mistral for now, but it will change soon.	2024-05-07 10:02:16 -03:00
Sam	a77658e2b1	FIX: tools broke on Claude with no params (#574 ) Some tools may have no params, allow that	2024-04-11 15:17:56 +10:00
Sam	0cbbf130b9	FIX: never mention the word JSON in tool preamble (#572 ) Just having the word JSON can confuse models when we expect them to deal solely in XML Instead provide an example of how string arrays should be returned Technically the tool framework supports int arrays and more, but our current implementation only does string arrays. Also tune the prompt construction not to give any tips about arrays if none exist	2024-04-11 11:24:22 +10:00
Sam	7f16d3ad43	FEATURE: Cohere Command R support (#558 ) - Added Cohere Command models (Command, Command Light, Command R, Command R Plus) to the available model list - Added a new site setting `ai_cohere_api_key` for configuring the Cohere API key - Implemented a new `DiscourseAi::Completions::Endpoints::Cohere` class to handle interactions with the Cohere API, including: - Translating request parameters to the Cohere API format - Parsing Cohere API responses - Supporting streaming and non-streaming completions - Supporting "tools" which allow the model to call back to discourse to lookup additional information - Implemented a new `DiscourseAi::Completions::Dialects::Command` class to translate between the generic Discourse AI prompt format and the Cohere Command format - Added specs covering the new Cohere endpoint and dialect classes - Updated `DiscourseAi::AiBot::Bot.guess_model` to map the new Cohere model to the appropriate bot user In summary, this PR adds support for using the Cohere Command family of models with the Discourse AI plugin. It handles configuring API keys, making requests to the Cohere API, and translating between Discourse's generic prompt format and Cohere's specific format. Thorough test coverage was added for the new functionality.	2024-04-11 07:24:17 +10:00
Sam	f62703760f	FEATURE: add Claude 3 sonnet/haiku support for Amazon Bedrock (#534 ) This PR implements the new Anthropic Messages interface for Bedrock Claude endpoints and adds support for the new Claude 3 models (haiku, opus, sonnet). Key changes: - Consolidated the `AnthropicMessages` and `Anthropic` endpoint classes into a single `Anthropic` class (ditto for ClaudeMessages -> Claude) - Updated `AwsBedrock` endpoints to use the new `/messages` API format for all Claude models - Added `claude-3-haiku`, `claude-3-opus` and `claude-3-sonnet` model support in both Anthropic and AWS Bedrock endpoints - Updated specs for the new consolidated endpoints and Claude 3 model support This refactor removes support for the old non-Messages API, which has been deprecated by Anthropic	2024-03-19 06:48:46 +11:00
Sam	79638c2f50	FIX: Tune function calling (#519 ) Adds support for "name" on functions which can be used for tool calls For function calls we need to keep track of id/name and previously we only supported either Also attempts to improve sql helper	2024-03-09 08:46:40 +11:00
Sam	2ad743d246	FEATURE: Add GitHub Helper AI Bot persona and tools (#513 ) Introduces a new AI Bot persona called 'GitHub Helper' which is specialized in assisting with GitHub-related tasks and questions. It includes the following key changes: - Implements the GitHub Helper persona class with its system prompt and available tools - Adds three new AI Bot tools for GitHub interactions: - github_file_content: Retrieves content of files from a GitHub repository - github_pull_request_diff: Retrieves the diff for a GitHub pull request - github_search_code: Searches for code in a GitHub repository - Updates the AI Bot dialects to support the new GitHub tools - Implements multiple function calls for standard tool dialect	2024-03-08 06:37:23 +11:00
Sam	8b382d6098	FEATURE: support for claude opus and sonnet (#508 ) This provides new support for messages API from Claude. It is required for latest model access. Also corrects implementation of function calls. * Fix message interleaving * fix broken spec * add new models to automation	2024-03-06 06:04:37 +11:00
Sam	c02794cf2e	FIX: support multiple tool calls (#502 ) * FIX: support multiple tool calls Prior to this change we had a hard limit of 1 tool call per llm round trip. This meant you could not google multiple things at once or perform searches across two tools. Also: - Hint when Google stops working - Log topic_id / post_id when performing completions * Also track id for title	2024-03-02 07:53:21 +11:00
Sam	05d8b021f1	FIX: scrub invalid prompts when truncating (#426 ) When you trim a prompt we never want to have a state where there is a "tool" reply without a corresponding tool call, it makes no sense Also - GPT-4-Turbo is 128k, fix that - Claude was not preserving username in prompt - We were throwing away unicode usernames instead of adding to message	2024-01-16 13:48:00 +11:00
Sam	825f01cfb2	FEATURE: even smoother streaming (#420 ) Account properly for function calls, don't stream through <details> blocks - Rush cooked content back to client - Wait longer (up to 60 seconds) before giving up on streaming - Clean up message bus channels so we don't have leftover data - Make ai streamer much more reusable and much easier to read - If buffer grows quickly, rush update so you are not artificially waiting - Refine prompt interface - Fix lost system message when prompt gets long	2024-01-15 18:51:14 +11:00
Roman Rizzi	04eae76f68	REFACTOR: Represent generic prompts with an Object. (#416 ) * REFACTOR: Represent generic prompts with an Object. * Adds a bit more validation for clarity * Rewrite bot title prompt and fix quirk handling --------- Co-authored-by: Sam Saffron <sam.saffron@gmail.com>	2024-01-12 14:36:44 -03:00
Sam	8df966e9c5	FEATURE: smooth streaming of AI responses on the client (#413 ) This PR introduces 3 things: 1. Fake bot that can be used on local so you can test LLMs, to enable on dev use: SiteSetting.ai_bot_enabled_chat_bots = "fake" 2. More elegant smooth streaming of progress on LLM completion This leans on JavaScript to buffer and trickle llm results through. It also amends it so the progress dot is much more consistently rendered 3. It fixes the Claude dialect Claude needs newlines exactly at the right spot, amended so it is happy --------- Co-authored-by: Martin Brennan <martin@discourse.org>	2024-01-11 15:56:40 +11:00
Sam	b0a0cbe3ca	FIX: improve bot behavior (#408 ) * FIX: improve bot behavior - Provide more information to Gemini context post function execution - Use system prompts for Claude (fixes Dall E) - Ensure Assistant is properly separated - Teach Claude to return arrays in JSON vs XML Also refactors tests so we do not copy tool preamble everywhere * System msg is claude-2 only. fix typo --------- Co-authored-by: Roman Rizzi <rizziromanalejandro@gmail.com>	2024-01-08 10:28:03 -03:00
Roman Rizzi	971e03bdf2	FEATURE: AI Bot Gemini support. (#402 ) It also corrects the syntax around tool support, which was wrong. Gemini doesn't want us to include messages about previous tool invocations, so I had to shuffle around some code to send the response it generated from those invocations instead. For this, I created the "multi_turn" context, which bundles all the context involved in the interaction.	2024-01-04 18:15:34 -03:00
Roman Rizzi	f9d7d7f5f0	DEV: AI bot migration to the Llm pattern. (#343 ) * DEV: AI bot migration to the Llm pattern. We added tool and conversation context support to the Llm service in discourse-ai#366, meaning we met all the conditions to migrate this module. This PR migrates to the new pattern, meaning adding a new bot now requires minimal effort as long as the service supports it. On top of this, we introduce the concept of a "Playground" to separate the PM-specific bits from the completion, allowing us to use the bot in other contexts like chat in the future. Commands are called tools, and we simplified all the placeholder logic to perform updates in a single place, making the flow more one-wayish. * Followup fixes based on testing * Cleanup unused inference code * FIX: text-based tools could be in the middle of a sentence * GPT-4-turbo support * Use new LLM API	2024-01-04 10:44:07 -03:00
Rafael dos Santos Silva	5db7bf6e68	Mixtral (#376 ) Add both Mistral and Mixtral support. Also includes vLLM-openAI inference support. Co-authored-by: Roman Rizzi <rizziromanalejandro@gmail.com>	2023-12-26 14:49:55 -03:00
Sam	d0f54443ae	FEATURE: LLM based periodical summary report (#357 ) Introduce a Discourse Automation based periodical report. Depends on Discourse Automation. Report works best with very large context language models such as GPT-4-Turbo and Claude 2. - Introduces final_insts to generic llm format, for claude to work best it is better to guide the last assistant message (we should add this to other spots as well) - Adds GPT-4 turbo support to generic llm interface	2023-12-19 12:04:15 +11:00
Roman Rizzi	e0bf6adb5b	DEV: Tool support for the LLM service. (#366 ) This PR adds tool support to available LLMs. We'll buffer tool invocations and return them instead of making users of this service parse the response. It also adds support for conversation context in the generic prompt. It includes bot messages, user messages, and tool invocations, which we'll trim to make sure it doesn't exceed the prompt limit, then translate them to the correct dialect. Finally, it adds some buffering when reading chunks to handle cases when streaming is extremely slow.	2023-12-18 18:06:01 -03:00

29 Commits