# Discourse AI Plugin

## Plugin Summary
For more information, please see: https://meta.discourse.org/t/discourse-ai/259214?u=falco
## Evals

The `evals` directory contains AI evals for the Discourse AI plugin.
To run them, use:

```bash
cd evals
./run --help
```
```
Usage: evals/run [options]
    -e, --eval NAME      Name of the evaluation to run
        --list-models    List models
    -m, --model NAME     Model to evaluate (will eval all models if not specified)
    -l, --list           List evals
```
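For example, a single eval can be run against a single model by combining the `-e` and `-m` flags. The eval and model names below are placeholders; use `--list` and `--list-models` to see what is available in your checkout:

```bash
# Discover the evals and models known to the runner
./run --list
./run --list-models

# Run one eval against one model (names here are illustrative, not real entries)
./run -e example_eval -m example_model
```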
To run evals you will need to configure API keys in your environment:

```
OPENAI_API_KEY=your_openai_api_key
ANTHROPIC_API_KEY=your_anthropic_api_key
GEMINI_API_KEY=your_gemini_api_key
```
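A minimal way to set these, assuming a POSIX-compatible shell, is to export them before invoking the runner (the values are placeholders; substitute your own keys):

```bash
# Export the API keys for the current shell session (placeholder values)
export OPENAI_API_KEY=your_openai_api_key
export ANTHROPIC_API_KEY=your_anthropic_api_key
export GEMINI_API_KEY=your_gemini_api_key

cd evals
./run --list
```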