discourse-ai

mirror of https://github.com/discourse/discourse-ai.git synced 2025-02-18 17:34:52 +00:00

Author	SHA1	Message	Date
Sam	e817b7dc11	FEATURE: improve tool support (#904) This re-implements tool support in DiscourseAi::Completions::Llm#generate. Previously, tool support was always returned via XML, and it was the responsibility of the caller to parse the XML. The new implementation has the endpoints return ToolCall objects. Additionally, this simplifies the Llm endpoint interface and gives it more clarity. Llms must implement decode and decode_chunk (for streaming). It is the implementer's responsibility to figure out how to decode chunks; the base class no longer implements this. To make this easy, we ship a flexible JSON decoder which is easy to wire up. Also (new): better debugging for PMs. We now have next/previous buttons to see all the Llm messages associated with a PM. Token accounting is fixed for vLLM (we were not counting tokens correctly).	2024-11-12 08:14:30 +11:00
Roman Rizzi	bed044448c	DEV: Remove old code now that features rely on LlmModels. (#729) * DEV: Remove old code now that features rely on LlmModels. * Hide old settings and migrate persona llm overrides * Remove shadowing special URL + seeding code. Use srv:// prefix instead.	2024-07-30 13:44:57 -03:00
Roman Rizzi	1d786fbaaf	FEATURE: Set endpoint credentials directly from LlmModel. (#625) * FEATURE: Set endpoint credentials directly from LlmModel. Drop Llama2Tokenizer since we no longer use it. * Allow http for custom LLMs --------- Co-authored-by: Rafael Silva <xfalcox@gmail.com>	2024-05-16 09:50:22 -03:00
Roman Rizzi	e22194f321	HACK: Llama3 support for summarization/AI helper. (#616) There are still some limitations on which models we can support with the `LlmModel` class. This will enable support for Llama3 while we sort those out.	2024-05-13 15:54:42 -03:00
Roman Rizzi	4f1a3effe0	REFACTOR: Migrate Vllm/TGI-served models to the OpenAI format. (#588) Both endpoints provide OpenAI-compatible servers. The only difference is that vLLM doesn't support passing tools as a separate parameter. Even if the tool param is supported, it ultimately relies on the model's ability to handle native functions, which is not the case with the models we have today. As a part of this change, we are dropping support for StableBeluga/Llama2 models. They don't have a chat_template, meaning the new API can't translate them. These changes let us remove some of our existing dialects and are a first step in our plan to support any LLM by defining them as data-driven concepts. I rewrote the "translate" method to use a template method and extracted the tool support strategies into their own classes to simplify the code. Finally, these changes bring support for Ollama when running in dev mode. It only works with Mistral for now, but this will change soon.	2024-05-07 10:02:16 -03:00
Roman Rizzi	0634b85a81	UX: Validations to LLM-backed features (except AI Bot) (#436) * UX: Validations to Llm-backed features (except AI Bot) This change is part of an ongoing effort to prevent enabling a broken feature due to lack of configuration. We also want to make explicit which provider we are going to use. For example, Claude models are available through AWS Bedrock and Anthropic, but the configuration differs. Validations are: * You must choose a model before enabling the feature. * You must turn off the feature before setting the model to blank. * You must configure each model's settings before being able to select it. * Add provider name to summarization options * vLLM can technically support the same models as HF * Check we can talk to the selected model * Check for Bedrock instead of Anthropic, as a site could have both creds set up	2024-01-29 16:04:25 -03:00
Sam	03fc94684b	FIX: AI helper not working correctly with mixtral (#399) * FIX: AI helper not working correctly with mixtral This PR introduces a new function on the generic llm called #generate. This will replace the completion! implementation. #generate introduces a new way to pass temperature, max_tokens and stop_sequences. LLM implementers then need to implement #normalize_model_params to ensure the generic names match the LLM-specific endpoint. This also adds temperature and stop_sequences to completion_prompts, which allows for much more robust completion prompts. * port everything over to #generate * Fix translation - On Anthropic this no longer throws a random "This is your translation:" - On Mixtral this actually works * fix markdown table generation as well	2024-01-04 09:53:47 -03:00
Rafael dos Santos Silva	3c27cbfb9a	FIX: Use vLLM if TGI is not configured for OSS LLM inference (#380)	2023-12-26 17:18:08 -03:00
Rafael dos Santos Silva	5db7bf6e68	Mixtral (#376) Add both Mistral and Mixtral support. Also includes vLLM-openAI inference support. Co-authored-by: Roman Rizzi <rizziromanalejandro@gmail.com>	2023-12-26 14:49:55 -03:00
Roman Rizzi	e0bf6adb5b	DEV: Tool support for the LLM service. (#366) This PR adds tool support to available LLMs. We'll buffer tool invocations and return them instead of making users of this service parse the response. It also adds support for conversation context in the generic prompt. The context includes bot messages, user messages, and tool invocations, which we'll trim to make sure they don't exceed the prompt limit, then translate to the correct dialect. Finally, it adds some buffering when reading chunks to handle cases when streaming is extremely slow.	2023-12-18 18:06:01 -03:00
Rafael dos Santos Silva	252efdf142	FIX: Don't echo prompt back on HF/TGI (#338) * FIX: Don't echo prompt back on HF/TGI * teeeeests	2023-12-06 16:06:26 -03:00
Rafael dos Santos Silva	d8267d8da0	FIX: Many fixes for huggingface and llama2 inference (#335)	2023-12-06 11:22:42 -03:00
Sam	6ddc17fd61	DEV: port directory structure to Zeitwerk (#319) Prior to this change, we relied on explicit loading for files in Discourse AI. This had a few downsides: - Busywork whenever you add a file (an extra require_relative) - We were not keeping to conventions internally ... some places were OpenAI, others were OpenAi - The autoloader did not work, which led to lots of broken full-application reloads when developing. This moves all of DiscourseAI into a Zeitwerk-compatible structure. It also leaves a minimal amount of manual loading (automation, which is loaded into an existing namespace that may or may not be there). To avoid needing /lib/discourse_ai/... we mount a namespace, so we are able to keep /lib pointed at ::DiscourseAi. Various files were renamed to get around Zeitwerk rules and minimize usage of custom inflections. Though we could get custom inflections to work, it is not worth it: it would require a Discourse core patch, which would create a hard dependency.	2023-11-29 15:17:46 +11:00
Roman Rizzi	3064d4c288	REFACTOR: Summarization and HyDE now use an LLM abstraction. (#297) * DEV: One LLM abstraction to rule them all * REFACTOR: HyDE search uses new LLM abstraction * REFACTOR: Summarization uses the LLM abstraction * Updated documentation and made small fixes. Remove Bedrock claude-2 restriction	2023-11-23 12:58:54 -03:00

14 Commits