Adds a comprehensive quota management system for LLM models that allows:
- Setting per-group (applied per user in the group) token and usage limits with configurable durations
- Tracking and enforcing token/usage limits across user groups
- Quota reset periods (hourly, daily, weekly, or custom)
- Admin UI for managing quotas with real-time updates
This system provides granular control over LLM API usage by allowing admins
to define limits on both total tokens and number of requests per group.
Supports multiple concurrent quotas per model and automatically handles
quota resets.
Co-authored-by: Keegan George <kgeorge13@gmail.com>
This changeset:
1. Corrects some issues with "force_default_llm" not applying
2. Expands the LLM list page to show LLM usage
3. Clarifies better what "enabling a bot" on an llm means (you get it in the selector)
Restructures LLM config page so it is far clearer.
Also corrects bugs around adding LLMs and having LLMs not editable post addition
---------
Co-authored-by: Sam Saffron <sam.saffron@gmail.com>
- Validate fields to reduce the chance of breaking features by a misconfigured model.
- Fixed a bug where the URL might get deleted during an update.
- Display a warning when a model is currently in use.
* DEV: Remove old code now that features rely on LlmModels.
* Hide old settings and migrate persona llm overrides
* Remove shadowing special URL + seeding code. Use srv:// prefix instead.
Previously, we stored request parameters like the OpenAI organization and Bedrock's access key and region as site settings. This change stores them in the `llm_models` table instead, letting us drop more settings while also becoming more flexible.
* DRAFT: Create AI Bot users dynamically and support custom LlmModels
* Get user associated to llm_model
* Track enabled bots with attribute
* Don't store bot username. Minor touches to migrate default values in settings
* Handle scenario where vLLM uses a SRV record
* Made 3.5-turbo-16k the default version so we can remove hack
* FEATURE: Set endpoint credentials directly from LlmModel.
Drop Llama2Tokenizer since we no longer use it.
* Allow http for custom LLMs
---------
Co-authored-by: Rafael Silva <xfalcox@gmail.com>
This PR introduces the concept of "LlmModel" as a new way to quickly add new LLM models without making any code changes. We are releasing this first version and will add incremental improvements, so expect changes.
The AI Bot can't fully take advantage of this feature as users are hard-coded. We'll fix this in a separate PR.s