discourse-ai

mirror of https://github.com/discourse/discourse-ai.git synced 2025-03-03 15:59:59 +00:00

Author	SHA1	Message	Date
Natalie Tay	2486e0e2dd	DEV: Extract configs to a yml file and allow local config (#1142)	2025-02-24 16:22:19 +11:00
Keegan George	1f9f330ce2	DEV: Add summarization type to eval (#1138) Adds `type: summarization` for topic summarization eval: https://github.com/discourse/discourse-ai-evals/pull/4	2025-02-20 09:07:23 -08:00
Keegan George	af47873f28	FIX: hardcoded require for evals (#1137) The require for `discourse/config/environment` should point to the local user's core Discourse environment instead of the hardcoded path for Sam's env.	2025-02-19 11:56:52 -08:00
Sam	0c9466059c	DEV: improve artifact editing and eval system (#1130) - Add non-contiguous search/replace support using `...` syntax - Add judge support for evaluating LLM outputs with ratings - Improve error handling and reporting in eval runner - Add full section replacement support without search blocks - Add fabricators and specs for artifact diffing - Track failed searches to improve debugging - Add JS syntax validation for artifact versions in eval system - Update prompt documentation with clear guidelines * improve eval output * move error handling * llm as a judge * fix spec * small note on evals	2025-02-19 15:44:33 +11:00
Sam	ce79a18790	FEATURE: Native PDF support (#1127) * FEATURE: Native PDF support This amends it so we use PDF Reader gem to extract text from PDFs * This means that our simple pdf eval passes at last * fix spec * skip test in CI * test file support * Update lib/utils/image_to_text.rb * address pr comments Co-authored-by: Alan Guo Xiang Tan <gxtan1990@gmail.com>	2025-02-18 09:22:57 +11:00
Sam	9a6aec2cf6	DEV: eval support for tool calls (#1128) Also fixes anthropic with no params, streaming calls	2025-02-18 07:58:54 +11:00
Sam	5e80f93e4c	FEATURE: PDF support for rag pipeline (#1118) This PR introduces several enhancements and refactorings to the AI Persona and RAG (Retrieval-Augmented Generation) functionalities within the discourse-ai plugin. Here's a breakdown of the changes: 1. LLM Model Association for RAG and Personas: - New Database Columns: Adds `rag_llm_model_id` to both `ai_personas` and `ai_tools` tables. This allows specifying a dedicated LLM for RAG indexing, separate from the persona's primary LLM. Adds `default_llm_id` and `question_consolidator_llm_id` to `ai_personas`. - Migration: Includes a migration (`20250210032345_migrate_persona_to_llm_model_id.rb`) to populate the new `default_llm_id` and `question_consolidator_llm_id` columns in `ai_personas` based on the existing `default_llm` and `question_consolidator_llm` string columns, and a post migration to remove the latter. - Model Changes: The `AiPersona` and `AiTool` models now `belong_to` an `LlmModel` via `rag_llm_model_id`. The `LlmModel.proxy` method now accepts an `LlmModel` instance instead of just an identifier. `AiPersona` now has `default_llm_id` and `question_consolidator_llm_id` attributes. - UI Updates: The AI Persona and AI Tool editors in the admin panel now allow selecting an LLM for RAG indexing (if PDF/image support is enabled). The RAG options component displays an LLM selector. - Serialization: The serializers (`AiCustomToolSerializer`, `AiCustomToolListSerializer`, `LocalizedAiPersonaSerializer`) have been updated to include the new `rag_llm_model_id`, `default_llm_id` and `question_consolidator_llm_id` attributes. 2. PDF and Image Support for RAG: - Site Setting: Introduces a new hidden site setting, `ai_rag_pdf_images_enabled`, to control whether PDF and image files can be indexed for RAG. This defaults to `false`. - File Upload Validation: The `RagDocumentFragmentsController` now checks the `ai_rag_pdf_images_enabled` setting and allows PDF, PNG, JPG, and JPEG files if enabled. Error handling is included for cases where PDF/image indexing is attempted with the setting disabled. - PDF Processing: Adds a new utility class, `DiscourseAi::Utils::PdfToImages`, which uses ImageMagick (`magick`) to convert PDF pages into individual PNG images. A maximum PDF size and conversion timeout are enforced. - Image Processing: A new utility class, `DiscourseAi::Utils::ImageToText`, is included to handle OCR for the images and PDFs. - RAG Digestion Job: The `DigestRagUpload` job now handles PDF and image uploads. It uses `PdfToImages` and `ImageToText` to extract text and create document fragments. - UI Updates: The RAG uploader component now accepts PDF and image file types if `ai_rag_pdf_images_enabled` is true. The UI text is adjusted to indicate supported file types. 3. Refactoring and Improvements: - LLM Enumeration: The `DiscourseAi::Configuration::LlmEnumerator` now provides a `values_for_serialization` method, which returns a simplified array of LLM data (id, name, vision_enabled) suitable for use in serializers. This avoids exposing unnecessary details to the frontend. - AI Helper: The `AiHelper::Assistant` now takes optional `helper_llm` and `image_caption_llm` parameters in its constructor, allowing for greater flexibility. - Bot and Persona Updates: Several updates were made across the codebase, replacing the string-based association to an LLM with the new model-based one. - Audit Logs: The `DiscourseAi::Completions::Endpoints::Base` now formats raw request payloads as pretty JSON for easier auditing. - Eval Script: An evaluation script is included. 4. Testing: - The PR introduces a new eval system for LLMs, which allows us to test how functionality works across various LLM providers. It lives in `/evals`.	2025-02-14 12:15:07 +11:00

7 Commits
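
The commit message for 5e80f93e4c describes `values_for_serialization` as returning a simplified array of LLM data (id, name, vision_enabled) so that unnecessary details are not exposed to the frontend. The sketch below illustrates that idea in plain Ruby only. It is not the plugin's implementation: the `LlmRecord` struct, its `display_name` and `api_key` fields, and the sample data are hypothetical stand-ins, and the real method lives on `DiscourseAi::Configuration::LlmEnumerator` and may differ.

```ruby
# Hypothetical stand-in for an LLM record. The plugin's real model is not
# shown in the commit log; these fields are assumptions for illustration.
LlmRecord = Struct.new(:id, :display_name, :vision_enabled, :api_key, keyword_init: true)

# Reduce each record to the three fields named in the commit message
# (id, name, vision_enabled) and drop everything else, such as credentials.
def values_for_serialization(records)
  records.map do |llm|
    { id: llm.id, name: llm.display_name, vision_enabled: llm.vision_enabled }
  end
end

# Made-up sample data.
SAMPLE_LLMS = [
  LlmRecord.new(id: 1, display_name: "Vision model", vision_enabled: true, api_key: "secret-1"),
  LlmRecord.new(id: 2, display_name: "Text model", vision_enabled: false, api_key: "secret-2"),
]

values_for_serialization(SAMPLE_LLMS).each { |row| puts row.inspect }
```

Building an explicit hash per record, rather than serializing the whole record, makes the set of exposed fields an allowlist: a column added to the record later does not reach the frontend unless it is added here as well.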