This is a significant PR that introduces AI Artifacts functionality to the discourse-ai plugin along with several other improvements. Here are the key changes:
1. AI Artifacts System:
- Adds a new `AiArtifact` model and database migration
- Allows creation of web artifacts with HTML, CSS, and JavaScript content
- Introduces security settings (`strict`, `lax`, `disabled`) for controlling artifact execution
- Implements artifact rendering in iframes with sandbox protection
- New `CreateArtifact` tool for AI to generate interactive content
2. Tool System Improvements:
- Adds support for partial tool calls, allowing incremental updates during generation
- Better handling of tool call states and progress tracking
- Improved XML tool processing with CDATA support
- Fixes for tool parameter handling and duplicate invocations
3. LLM Provider Updates:
- Updates for Anthropic Claude models with correct token limits
- Adds support for native/XML tool modes in Gemini integration
- Adds new model configurations including Llama 3.1 models
- Improvements to streaming response handling
4. UI Enhancements:
- New artifact viewer component with expand/collapse functionality
- Security controls for artifact execution (click-to-run in strict mode)
- Improved dialog and response handling
- Better error management for tool execution
5. Security Improvements:
- Sandbox controls for artifact execution
- Public/private artifact sharing controls
- Security settings to control artifact behavior
- CSP and frame-options handling for artifacts
6. Technical Improvements:
- Better post streaming implementation
- Improved error handling in completions
- Better memory management for partial tool calls
- Enhanced testing coverage
7. Configuration:
- New site settings for artifact security
- Extended LLM model configurations
- Additional tool configuration options
This PR significantly enhances the plugin's capabilities for generating and displaying interactive content while maintaining security and providing flexible configuration options for administrators.
This re-implements tool support in DiscourseAi::Completions::Llm #generate
Previously tool support was always returned via XML and it would be the responsibility of the caller to parse XML
New implementation has the endpoints return ToolCall objects.
Additionally this simplifies the Llm endpoint interface and gives it more clarity. Llms must implement
decode, decode_chunk (for streaming)
It is the implementers responsibility to figure out how to decode chunks, base no longer implements. To make this easy we ship a flexible json decoder which is easy to wire up.
Also (new)
Better debugging for PMs, we now have a next / previous button to see all the Llm messages associated with a PM
Token accounting is fixed for vllm (we were not correctly counting tokens)
Fixes encoding of params on LLM function calls.
Previously we would improperly return results if a function parameter returned an HTML tag.
Additionally adds some missing HTTP verbs to tool calls.
* FIX: Llm selector / forced tools / search tool
This fixes a few issues:
1. When search was not finding any semantic results we would break the tool
2. Gemin / Anthropic models did not implement forced tools previously despite it being an API option
3. Mechanics around displaying llm selector were not right. If you disabled LLM selector server side persona PM did not work correctly.
4. Disabling native tools for anthropic model moved out of a site setting. This deliberately does not migrate cause this feature is really rare to need now, people who had it set probably did not need it.
5. Updates anthropic model names to latest release
* linting
* fix a couple of tests I missed
* clean up conditional
* DEV: Remove old code now that features rely on LlmModels.
* Hide old settings and migrate persona llm overrides
* Remove shadowing special URL + seeding code. Use srv:// prefix instead.
* FEATURE: Set endpoint credentials directly from LlmModel.
Drop Llama2Tokenizer since we no longer use it.
* Allow http for custom LLMs
---------
Co-authored-by: Rafael Silva <xfalcox@gmail.com>
There are still some limitations to which models we can support with the `LlmModel` class. This will enable support for Llama3 while we sort those out.
For quite a few weeks now, some times, when running function calls
on Anthropic we would get a "stray" - "calls" line.
This has been enormously frustrating!
I have been unable to find the source of the bug so instead decoupled
the implementation and create a very clear "function call normalizer"
This new class is extensively tested and guards against the type of
edge cases we saw pre-normalizer.
This also simplifies the implementation of "endpoint" which no longer
needs to handle all this complex logic.
- Updated AI Bot to only support Gemini 1.5 (used to support 1.0) - 1.0 was removed cause it is not appropriate for Bot usage
- Summaries and automation can now lean on Gemini 1.5 pro
- Amazon added support for Claude 3 Opus, added internal support for it on bedrock
Introduces a new AI Bot persona called 'GitHub Helper' which is specialized in assisting with GitHub-related tasks and questions. It includes the following key changes:
- Implements the GitHub Helper persona class with its system prompt and available tools
- Adds three new AI Bot tools for GitHub interactions:
- github_file_content: Retrieves content of files from a GitHub repository
- github_pull_request_diff: Retrieves the diff for a GitHub pull request
- github_search_code: Searches for code in a GitHub repository
- Updates the AI Bot dialects to support the new GitHub tools
- Implements multiple function calls for standard tool dialect
- Allow users to supply top_p and temperature values, which means people can fine tune randomness
- Fix bad localization string
- Fix bad remapping of max tokens in gemini
- Add support for top_p as a general param to llms
- Amend system prompt so persona stops treating a user as an adversary
* UX: Validations to Llm-backed features (except AI Bot)
This change is part of an ongoing effort to prevent enabling a broken feature due to lack of configuration. We also want to explicit which provider we are going to use. For example, Claude models are available through AWS Bedrock and Anthropic, but the configuration differs.
Validations are:
* You must choose a model before enabling the feature.
* You must turn off the feature before setting the model to blank.
* You must configure each model settings before being able to select it.
* Add provider name to summarization options
* vLLM can technically support same models as HF
* Check we can talk to the selected model
* Check for Bedrock instead of anthropic as a site could have both creds setup
It also corrects the syntax around tool support, which was wrong.
Gemini doesn't want us to include messages about previous tool invocations, so I had to shuffle around some code to send the response it generated from those invocations instead. For this, I created the "multi_turn" context, which bundles all the context involved in the interaction.
* FIX: AI helper not working correctly with mixtral
This PR introduces a new function on the generic llm called #generate
This will replace the implementation of completion!
#generate introduces a new way to pass temperature, max_tokens and stop_sequences
Then LLM implementers need to implement #normalize_model_params to
ensure the generic names match the LLM specific endpoint
This also adds temperature and stop_sequences to completion_prompts
this allows for much more robust completion prompts
* port everything over to #generate
* Fix translation
- On anthropic this no longer throws random "This is your translation:"
- On mixtral this actually works
* fix markdown table generation as well
This PR adds tool support to available LLMs. We'll buffer tool invocations and return them instead of making users of this service parse the response.
It also adds support for conversation context in the generic prompt. It includes bot messages, user messages, and tool invocations, which we'll trim to make sure it doesn't exceed the prompt limit, then translate them to the correct dialect.
Finally, It adds some buffering when reading chunks to handle cases when streaming is extremely slow.:M