4 Commits

Author SHA1 Message Date
Sam
ce79a18790
FEATURE: Native PDF support (#1127)
* FEATURE: Native PDF support

This amends it so we use PDF Reader gem to extract text from PDFs

* This means that our simple pdf eval passes at last

* fix spec

* skip test in CI

* test file support

* Update lib/utils/image_to_text.rb

Co-authored-by: Alan Guo Xiang Tan <gxtan1990@gmail.com>

* address pr comments

---------

Co-authored-by: Alan Guo Xiang Tan <gxtan1990@gmail.com>
2025-02-18 09:22:57 +11:00
Sam
a5e4ab2825
FIX: blank metadata leading to errors (#578)
blank metadata block in RAG was leading to an error, this handles the edge case
2024-04-17 13:46:40 +10:00
Sam
f6ac5cd0a8
FEATURE: allow tuning of RAG generation (#565)
* FEATURE: allow tuning of RAG generation

- change chunking to be token based vs char based (which is more accurate)
- allow control over overlap / tokens per chunk and conversation snippets inserted
- UI to control new settings

* improve ui a bit

* fix various reindex issues

* reduce concurrency

* try ultra low queue ... concurrency 1 is too slow.
2024-04-12 10:32:46 -03:00
Sam
830cc26075
FEATURE: Add metadata support for RAG (#553)
* FEATURE: Add metadata support for RAG

You may include non indexed metadata in the RAG document by using

[[metadata ....]]

This information is attached to all the text below and provided to
the retriever.

This allows for RAG to operate within a rich amount of contexts
without getting lost

Also:

- re-implemented chunking algorithm so it streams
- moved indexing to background low priority queue

* Baran gem no longer required.

* tokenizers is on 4.4 ... upgrade it ...
2024-04-04 11:02:16 -03:00