# Discourse AI Plugin

## Plugin Summary
For more information, please see: https://meta.discourse.org/t/discourse-ai/259214?u=falco
## Evals

The `evals` directory contains AI evals for the Discourse AI plugin.
To run them, use:

```bash
cd evals
./run --help
```
```
Usage: evals/run [options]
    -e, --eval NAME      Name of the evaluation to run
        --list-models    List models
    -m, --model NAME     Model to evaluate (will eval all models if not specified)
    -l, --list           List evals
```
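For example, a single eval can be run against a single model by combining the `-e` and `-m` flags. The eval and model names below are placeholders; use `--list` and `--list-models` to see what is available in your checkout:

```bash
# Discover the evals and models known to the runner
./run --list
./run --list-models

# Run one eval against one model (names here are illustrative, not real entries)
./run -e example_eval -m example_model
```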
To run evals you will need to configure API keys in your environment:

```
OPENAI_API_KEY=your_openai_api_key
ANTHROPIC_API_KEY=your_anthropic_api_key
GEMINI_API_KEY=your_gemini_api_key
```
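A minimal way to set these, assuming a POSIX-compatible shell, is to export them before invoking the runner (the values are placeholders; substitute your own keys):

```bash
# Export the API keys for the current shell session (placeholder values)
export OPENAI_API_KEY=your_openai_api_key
export ANTHROPIC_API_KEY=your_anthropic_api_key
export GEMINI_API_KEY=your_gemini_api_key

cd evals
./run --list
```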