discourse-ai

mirror of https://github.com/discourse/discourse-ai.git synced 2025-02-19 18:04:51 +00:00

Author	SHA1	Message	Date
Sam	3a8d95f6b2	FEATURE: mentionable personas and random picker tool, context limits (#466 ) 1. Personas are now optionally mentionable, meaning that you can mention them either from public topics or PMs - Mentioning from PMs helps "switch" persona mid conversation, meaning if you want to look up site settings you can invoke the site setting bot, or if you want to generate an image you can invoke dall e - Mentioning outside of PMs allows you to inject a bot reply in a topic trivially - We also add support for max_context_posts; this allows you to limit the amount of context you feed in, which can help control costs 2. Add support for a "random picker" tool that can be used to pick random numbers 3. Clean up routing ai_personas -> ai-personas 4. Add Max Context Posts so users can control how much history a persona can consume (this is important for mentionable personas) Co-authored-by: Martin Brennan <martin@discourse.org>	2024-02-15 16:37:59 +11:00
Sam	a3c827efcc	FEATURE: allow personas to supply top_p and temperature params (#459 ) * FEATURE: allow personas to supply top_p and temperature params Code assistance generally are more focused at a lower temperature This amends it so SQL Helper runs at 0.2 temperature vs the more common default across LLMs of 1.0. Reduced temperature leads to more focused, concise and predictable answers for the SQL Helper * fix tests * This is not perfect, but far better than what we do today Instead of fishing for 1. Draft sequence 2. Draft body We skip (2), this means the composer "only" needs 1 http request to open, we also want to eliminate (1) but it is a bit of a trickier core change, may figure out how to pull it off (defer it to first draft save) Value of bot drafts < value of opening bot conversations really fast	2024-02-03 07:09:34 +11:00
Sam	cec4251b00	DEV: improve error bedrock error messages (#454 ) When bedrock rate limits it returns a 200 BUT also returns a JSON document with the error. Previously we had no special case here so we complained about nil New code properly logs the problem	2024-02-01 08:01:07 -03:00
Sam	abcf5ea94a	FEATURE: fine tune llm report to follow instructions more closely (#451 ) - Allow users to supply top_p and temperature values, which means people can fine tune randomness - Fix bad localization string - Fix bad remapping of max tokens in gemini - Add support for top_p as a general param to llms - Amend system prompt so persona stops treating a user as an adversary	2024-01-31 09:58:25 +11:00
Roman Rizzi	0634b85a81	UX: Validations to LLM-backed features (except AI Bot) (#436 ) * UX: Validations to Llm-backed features (except AI Bot) This change is part of an ongoing effort to prevent enabling a broken feature due to lack of configuration. We also want to be explicit about which provider we are going to use. For example, Claude models are available through AWS Bedrock and Anthropic, but the configuration differs. Validations are: * You must choose a model before enabling the feature. * You must turn off the feature before setting the model to blank. * You must configure each model's settings before being able to select it. * Add provider name to summarization options * vLLM can technically support same models as HF * Check we can talk to the selected model * Check for Bedrock instead of anthropic as a site could have both creds setup	2024-01-29 16:04:25 -03:00
Sam	092da860e2	FEATURE: support gpt-4-0125 which was just released (#443 ) The new model has better performance and is always preferable to the old one which has unicode issues during function calls.	2024-01-26 09:08:02 +11:00
Jarek Radosz	5802cd1a0c	DEV: Fix various typos (#434 )	2024-01-19 12:51:26 +01:00
Roman Rizzi	5bdf3dc1f4	DEV: Stop using shared_examples for endpoint specs (#430 )	2024-01-17 15:08:49 -03:00
Sam	370074ef21	FIX: always ensure `#generate` gets a valid input (#427 ) We were not validating input for generate leading to 2 tests not failing correctly despite functionality being broken. This ensures that input is validated, and in turn fixes the broken specs	2024-01-16 15:21:58 +11:00
Sam	05d8b021f1	FIX: scrub invalid prompts when truncating (#426 ) When you trim a prompt we never want to have a state where there is a "tool" reply without a corresponding tool call, it makes no sense Also - GPT-4-Turbo is 128k, fix that - Claude was not preserving username in prompt - We were throwing away unicode usernames instead of adding to message	2024-01-16 13:48:00 +11:00
Roman Rizzi	ff4da6ace8	FIX: Clean unicode usernames when adding messages through prompt's constructor (#425 )	2024-01-15 12:01:40 -03:00
Sam	825f01cfb2	FEATURE: even smoother streaming (#420 ) Account properly for function calls, don't stream through <details> blocks - Rush cooked content back to client - Wait longer (up to 60 seconds) before giving up on streaming - Clean up message bus channels so we don't have leftover data - Make ai streamer much more reusable and much easier to read - If buffer grows quickly, rush update so you are not artificially waiting - Refine prompt interface - Fix lost system message when prompt gets long	2024-01-15 18:51:14 +11:00
Jarek Radosz	6b8a57d957	DEV: Update linting (#423 ) Co-authored-by: Keegan George <kgeorge13@gmail.com>	2024-01-13 00:28:06 +01:00
Roman Rizzi	04eae76f68	REFACTOR: Represent generic prompts with an Object. (#416 ) * REFACTOR: Represent generic prompts with an Object. * Adds a bit more validation for clarity * Rewrite bot title prompt and fix quirk handling --------- Co-authored-by: Sam Saffron <sam.saffron@gmail.com>	2024-01-12 14:36:44 -03:00
Sam	8df966e9c5	FEATURE: smooth streaming of AI responses on the client (#413 ) This PR introduces 3 things: 1. Fake bot that can be used on local so you can test LLMs, to enable on dev use: SiteSetting.ai_bot_enabled_chat_bots = "fake" 2. More elegant smooth streaming of progress on LLM completion This leans on JavaScript to buffer and trickle llm results through. It also amends it so the progress dot is much more consistently rendered 3. It fixes the Claude dialect Claude needs newlines exactly at the right spot, amended so it is happy --------- Co-authored-by: Martin Brennan <martin@discourse.org>	2024-01-11 15:56:40 +11:00
Rafael dos Santos Silva	8fcba12fae	FEATURE: Support for SRV records for Discourse services (#414 ) This allows admins to configure services with multiple backends using DNS SRV records. This PR also adds support for shared secret auth via headers for TEI and vLLM endpoints, so they are inline with the other ones.	2024-01-10 19:23:07 -03:00
Roman Rizzi	abde82c1f3	FIX: Use claude-2.1 to enable system prompts (#411 )	2024-01-09 14:10:20 -03:00
Sam	b0a0cbe3ca	FIX: improve bot behavior (#408 ) * FIX: improve bot behavior - Provide more information to Gemini context post function execution - Use system prompts for Claude (fixes Dall E) - Ensure Assistant is properly separated - Teach Claude to return arrays in JSON vs XML Also refactors tests so we do not copy tool preamble everywhere * System msg is claude-2 only. fix typo --------- Co-authored-by: Roman Rizzi <rizziromanalejandro@gmail.com>	2024-01-08 10:28:03 -03:00
Roman Rizzi	6124f910c1	FIX: Bring back Azure support. (#407 ) We thought Azure's latest API version didn't have tool support yet, but I didn't understand it was complaining about a required field in the tool call message.	2024-01-05 17:08:10 -03:00
Sam	17cc09ec9c	FIX: don't include <details> in context (#406 ) * FIX: don't include <details> in context We need to be careful adding <details> into context of conversations it can cause LLMs to hallucinate results * Fix Gemini multi-turn ctx flattening --------- Co-authored-by: Roman Rizzi <rizziromanalejandro@gmail.com>	2024-01-05 15:21:14 -03:00
Roman Rizzi	971e03bdf2	FEATURE: AI Bot Gemini support. (#402 ) It also corrects the syntax around tool support, which was wrong. Gemini doesn't want us to include messages about previous tool invocations, so I had to shuffle around some code to send the response it generated from those invocations instead. For this, I created the "multi_turn" context, which bundles all the context involved in the interaction.	2024-01-04 18:15:34 -03:00
Roman Rizzi	f9d7d7f5f0	DEV: AI bot migration to the Llm pattern. (#343 ) * DEV: AI bot migration to the Llm pattern. We added tool and conversation context support to the Llm service in discourse-ai#366, meaning we met all the conditions to migrate this module. This PR migrates to the new pattern, meaning adding a new bot now requires minimal effort as long as the service supports it. On top of this, we introduce the concept of a "Playground" to separate the PM-specific bits from the completion, allowing us to use the bot in other contexts like chat in the future. Commands are called tools, and we simplified all the placeholder logic to perform updates in a single place, making the flow more one-wayish. * Followup fixes based on testing * Cleanup unused inference code * FIX: text-based tools could be in the middle of a sentence * GPT-4-turbo support * Use new LLM API	2024-01-04 10:44:07 -03:00
Sam	03fc94684b	FIX: AI helper not working correctly with mixtral (#399 ) * FIX: AI helper not working correctly with mixtral This PR introduces a new function on the generic llm called #generate This will replace the implementation of completion! #generate introduces a new way to pass temperature, max_tokens and stop_sequences Then LLM implementers need to implement #normalize_model_params to ensure the generic names match the LLM specific endpoint This also adds temperature and stop_sequences to completion_prompts this allows for much more robust completion prompts * port everything over to #generate * Fix translation - On anthropic this no longer throws random "This is your translation:" - On mixtral this actually works * fix markdown table generation as well	2024-01-04 09:53:47 -03:00
Roman Rizzi	4182af230a	FIX: Correctly translate and read tools for Claude and Chat GPT. (#393 ) I tested against the live models for the AI bot migration. It ensures Open AI's tool syntax is correct and we can correctly read the replies.	2024-01-02 11:21:13 -03:00
Rafael dos Santos Silva	20cb15ab5f	FEATURE: Mixtral for summarization (#381 )	2023-12-26 17:50:02 -03:00
Rafael dos Santos Silva	3c27cbfb9a	FIX: Use vLLM if TGI is not configured for OSS LLM inference (#380 )	2023-12-26 17:18:08 -03:00
Rafael dos Santos Silva	5db7bf6e68	Mixtral (#376 ) Add both Mistral and Mixtral support. Also includes vLLM-openAI inference support. Co-authored-by: Roman Rizzi <rizziromanalejandro@gmail.com>	2023-12-26 14:49:55 -03:00
Sam	af2e692761	FIX: under certain conditions we would get duplicate data from llm (#373 ) Previously endpoint/base would `+=` decoded_chunk to leftover This could lead to cases where the leftover buffer had duplicate previously processed data Fix ensures we properly skip previously decoded data.	2023-12-20 14:28:05 -03:00
Sam	529703b5ec	FEATURE: support sending AI report to an email address (#368 ) Support emailing the AI report to any arbitrary email	2023-12-19 17:51:49 +11:00
Sam	d0f54443ae	FEATURE: LLM based periodical summary report (#357 ) Introduce a Discourse Automation based periodical report. Depends on Discourse Automation. Report works best with very large context language models such as GPT-4-Turbo and Claude 2. - Introduces final_insts to generic llm format, for claude to work best it is better to guide the last assistant message (we should add this to other spots as well) - Adds GPT-4 turbo support to generic llm interface	2023-12-19 12:04:15 +11:00
Roman Rizzi	e0bf6adb5b	DEV: Tool support for the LLM service. (#366 ) This PR adds tool support to available LLMs. We'll buffer tool invocations and return them instead of making users of this service parse the response. It also adds support for conversation context in the generic prompt. It includes bot messages, user messages, and tool invocations, which we'll trim to make sure it doesn't exceed the prompt limit, then translate them to the correct dialect. Finally, it adds some buffering when reading chunks to handle cases when streaming is extremely slow.	2023-12-18 18:06:01 -03:00
Roman Rizzi	203906be65	FIX: Bedrock was complaining input was too long (#365 )	2023-12-18 16:06:06 -03:00
Rafael dos Santos Silva	83744bf192	FEATURE: Support for Gemini in AiHelper / Search / Summarization (#358 )	2023-12-15 14:32:01 -03:00
Roman Rizzi	031c2a6b46	Revert "FIX: Recover from Bedrock returning invalid base64 payloads during streaming (#352 )" (#353 ) This reverts commit ef7d4cc5090e54491ab00d5bdd0ef3ad85c499de.	2023-12-12 17:22:44 -03:00
Roman Rizzi	ef7d4cc509	FIX: Recover from Bedrock returning invalid base64 payloads during streaming (#352 )	2023-12-12 17:06:53 -03:00
Roman Rizzi	2798e4c86d	FIX: Custom instructions were missing when generating custom prompt input (#348 )	2023-12-11 19:26:56 -03:00
Rafael dos Santos Silva	252efdf142	FIX: Don't echo prompt back on HF/TGI (#338 ) * FIX: Don't echo prompt back on HF/TGI * teeeeests	2023-12-06 16:06:26 -03:00
Rafael dos Santos Silva	d8267d8da0	FIX: Many fixes for huggingface and llama2 inference (#335 )	2023-12-06 11:22:42 -03:00
Sam	6ddc17fd61	DEV: port directory structure to Zeitwerk (#319 ) Previous to this change we relied on explicit loading for files in Discourse AI. This had a few downsides: - Busywork whenever you add a file (an extra require_relative) - We were not keeping to conventions internally ... some places were OpenAI others are OpenAi - Autoloader did not work which led to lots of full application broken reloads when developing. This moves all of DiscourseAI into a Zeitwerk compatible structure. It also leaves some minimal amount of manual loading (automation - which is loading into an existing namespace that may or may not be there) To avoid needing /lib/discourse_ai/... we mount a namespace thus we are able to keep /lib pointed at ::DiscourseAi Various files were renamed to get around zeitwerk rules and minimize usage of custom inflections Though we can get custom inflections to work it is not worth it, will require a Discourse core patch which means we create a hard dependency.	2023-11-29 15:17:46 +11:00
Roman Rizzi	f26adf2cf6	FIX: Use XML tags in generate_titles prompt. (#322 ) We must ensure we can isolate titles, and the models sometimes ignore the example we give them. Additionally, anons can generate HyDE posts, so we need to check if user is nil when attempting to log requests.	2023-11-28 12:52:22 -03:00
Roman Rizzi	2e7c5f047d	DEV: Don't attempt to update log if completion request fails. (#321 ) We already log the request failure when we raise the exception.	2023-11-28 11:15:12 -03:00
Roman Rizzi	419c43592a	FIX: Make summaries more cohesive by tweaking prompt. (#310 ) Other changes: - Don't use Bedrock for non claude models if credentials are set. - Remove extra sentence from HyDE prompt.	2023-11-23 16:33:37 -03:00
Roman Rizzi	02efca162e	FIX: Bedrock uses slightly different model names * Revert "FIX: We don't need to prepend anthropic. to bedrock models (#308)" This reverts commit 8a01751991178f7636030eb99e7f75c035707ffd. * FIX: Bedrock uses slightly different model names	2023-11-23 15:49:24 -03:00
Roman Rizzi	8a01751991	FIX: We don't need to prepend anthropic. to bedrock models (#308 )	2023-11-23 14:39:21 -03:00
Roman Rizzi	3064d4c288	REFACTOR: Summarization and HyDE now use an LLM abstraction. (#297 ) * DEV: One LLM abstraction to rule them all * REFACTOR: HyDE search uses new LLM abstraction * REFACTOR: Summarization uses the LLM abstraction * Updated documentation and made small fixes. Remove Bedrock claude-2 restriction	2023-11-23 12:58:54 -03:00

45 Commits