Previously, endpoint/base would `+=` the decoded_chunk to leftover.
This could lead to cases where the leftover buffer contained duplicate,
previously processed data.
The fix ensures we properly skip previously decoded data, as sketched below.
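A minimal sketch of the corrected pattern, assuming newline-delimited framing; all identifiers here (`each_message`, `split_complete_messages`, etc.) are illustrative, not the plugin's actual internals:

```ruby
# Illustrative only; names do not match the plugin's internals.
# Splits a buffer into complete newline-delimited messages plus any
# incomplete tail (hypothetical framing).
def split_complete_messages(buffer)
  *complete, partial = buffer.split("\n", -1)
  [complete, partial]
end

def each_message(chunks)
  leftover = +""

  chunks.each do |decoded_chunk|
    decoded_chunk = leftover + decoded_chunk

    complete, partial = split_complete_messages(decoded_chunk)
    complete.each { |message| yield message }

    # Bug: `leftover += decoded_chunk` re-appended data already yielded
    # above. Fix: keep only the unprocessed tail.
    leftover = partial
  end
end

# Usage: one message split across two slow chunks.
each_message(["hello\nwor", "ld\n"]) { |m| puts m } # => hello, world
```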
Introduce a Discourse Automation based periodic report. Depends on the Discourse Automation plugin.
The report works best with very large context language models such as GPT-4 Turbo and Claude 2.
- Introduces final_insts to the generic LLM format; Claude works best when we guide the last assistant message (we should add this to other spots as well). A sketch of the prompt shape follows this list.
- Adds GPT-4 Turbo support to the generic LLM interface
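As a rough illustration, a prompt carrying final_insts might look like the following; every key other than final_insts, and the sample strings, are assumptions for illustration:

```ruby
# Hypothetical prompt shape; only final_insts is the new field described
# above, the other keys are assumptions.
report_data = "Top topics this week: ..."

prompt = {
  insts: "You generate periodic activity reports for a Discourse forum.",
  input: "<input>#{report_data}</input>",
  # Appended as guidance for the last assistant message, which steers
  # Claude-style models more reliably than system instructions alone.
  final_insts: "Reply with the finished report only, wrapped in a <report> tag.",
}
```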
This PR adds tool support to available LLMs. We'll buffer tool invocations and return them instead of making users of this service parse the response.
It also adds support for conversation context in the generic prompt: bot messages, user messages, and tool invocations, which we trim so the prompt limit isn't exceeded, then translate to the correct dialect.
Finally, it adds some buffering when reading chunks to handle cases where streaming is extremely slow. A sketch of that buffering follows.
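A minimal sketch of the chunk-buffering idea, assuming SSE-style `data: ...\n\n` framing; the actual transport and parsing in the plugin may differ:

```ruby
# Illustrative buffering for slow streams; SSE-style framing is an
# assumption about the transport.
def each_event(partials)
  buffer = +""

  partials.each do |partial|
    buffer << partial

    # A slow stream can deliver half an event; process only complete
    # events and leave the rest buffered for the next read.
    while (idx = buffer.index("\n\n"))
      event = buffer.slice!(0..idx + 1)
      payload = event.sub(/\Adata: /, "").strip
      yield payload unless payload.empty?
    end
  end
end

# Usage: one event arriving split across three tiny chunks.
each_event(["data: {\"de", "lta\":\"hi\"}", "\n\n"]) { |p| puts p }
# => {"delta":"hi"}
```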
Prior to this change, we relied on explicitly loading all files in Discourse AI.
This had a few downsides:
- Busywork whenever you add a file (an extra require_relative)
- We were not keeping to conventions internally: some places used OpenAI, others OpenAi
- The autoloader did not work, which led to lots of broken full-application reloads during development
This moves all of DiscourseAI into a Zeitwerk-compatible structure.
It also leaves a minimal amount of manual loading (automation, which loads into an existing namespace that may or may not be present).
To avoid needing /lib/discourse_ai/..., we mount a namespace, so we can keep /lib pointed at ::DiscourseAi (sketched below).
Various files were renamed to satisfy Zeitwerk's rules and minimize the use of custom inflections.
Though we could get custom inflections to work, it is not worth it: it would require a Discourse core patch, which means creating a hard dependency.
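A hedged sketch of the namespace mounting, assuming Zeitwerk's `push_dir` with a `namespace:` argument; the exact wiring against Discourse core's loader may differ:

```ruby
# plugin.rb (sketch; the real file contains more setup)
module ::DiscourseAi
  PLUGIN_NAME = "discourse-ai"
end

# With a namespaced root dir, lib/completions/dialects/claude.rb maps to
# ::DiscourseAi::Completions::Dialects::Claude with no lib/discourse_ai/
# directory level needed.
Rails.autoloaders.main.push_dir(
  File.join(__dir__, "lib"),
  namespace: ::DiscourseAi,
)
```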
We must ensure we can isolate titles, because the models sometimes ignore the example we give them.
Additionally, anonymous users can generate HyDE posts, so we need to check whether user is nil when attempting to log requests (guard sketched below).
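A minimal sketch of that guard, assuming an ActiveRecord-style audit model; the model and column names here are assumptions rather than the plugin's exact schema:

```ruby
# Sketch only; the model and column names are assumptions.
def log_request(user, provider_id, raw_request)
  AiApiAuditLog.create!(
    provider_id: provider_id,
    user_id: user&.id, # anons have no User record, so user may be nil
    raw_request_payload: raw_request,
  )
end
```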
* Revert "FIX: We don't need to prepend anthropic. to bedrock models (#308)"
This reverts commit 8a01751991.
* FIX: Bedrock uses slightly different model names (a mapping sketch follows this list)
* DEV: One LLM abstraction to rule them all
* REFACTOR: HyDE search uses new LLM abstraction
* REFACTOR: Summarization uses the LLM abstraction
* Updated documentation and made small fixes. Removed the Bedrock claude-2 restriction.
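As an illustration of the Bedrock naming difference, a translation along these lines is involved; the helper below is hypothetical, though `anthropic.claude-v2` is a real Bedrock model ID:

```ruby
# Hypothetical helper; the plugin's actual translation logic may differ.
def bedrock_model_id(model)
  case model
  when "claude-2" then "anthropic.claude-v2"
  when "claude-instant-1" then "anthropic.claude-instant-v1"
  else model
  end
end
```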