Previously endpoint/base would `+=` decoded_chunk to leftover
This could lead to cases where the leftover buffer had duplicate
previously processed data
Fix ensures we properly skip previously decoded data.
This PR adds tool support to available LLMs. We'll buffer tool invocations and return them instead of making users of this service parse the response.
It also adds support for conversation context in the generic prompt. It includes bot messages, user messages, and tool invocations, which we'll trim to make sure it doesn't exceed the prompt limit, then translate them to the correct dialect.
Finally, It adds some buffering when reading chunks to handle cases when streaming is extremely slow.:M
Previous to this change we relied on explicit loading for a files in Discourse AI.
This had a few downsides:
- Busywork whenever you add a file (an extra require relative)
- We were not keeping to conventions internally ... some places were OpenAI others are OpenAi
- Autoloader did not work which lead to lots of full application broken reloads when developing.
This moves all of DiscourseAI into a Zeitwerk compatible structure.
It also leaves some minimal amount of manual loading (automation - which is loading into an existing namespace that may or may not be there)
To avoid needing /lib/discourse_ai/... we mount a namespace thus we are able to keep /lib pointed at ::DiscourseAi
Various files were renamed to get around zeitwerk rules and minimize usage of custom inflections
Though we can get custom inflections to work it is not worth it, will require a Discourse core patch which means we create a hard dependency.
We must ensure we can isolate titles, and the models sometimes ignore the example we give them.
Additionally, anons can generate HyDE posts, so we need to check if user is nil when attempting to log requests.
* DEV: One LLM abstraction to rule them all
* REFACTOR: HyDE search uses new LLM abstraction
* REFACTOR: Summarization uses the LLM abstraction
* Updated documentation and made small fixes. Remove Bedrock claude-2 restriction