Commit Graph

19 Commits

Author SHA1 Message Date
Rafael dos Santos Silva eb93b21769
FEATURE: Add BGE-M3 embeddings support (#569)
BAAI/bge-m3 is an interesting model, that is multilingual and with a
context size of 8192. Even with a 16x larger context, it's only 4x slower
to compute it's embeddings on the worst case scenario.

Also includes a minor refactor of the rake task, including setting model
and concurrency levels when running the backfill task.
2024-04-10 17:24:01 -03:00
Roman Rizzi c4ead89c6b
FIX: Filter soft-deleted topics when backfilling sentiment (#527) 2024-03-12 21:01:24 -03:00
Rafael dos Santos Silva 140359c2ef
FEATURE: Per post embeddings (#387) 2023-12-29 12:28:45 -03:00
Rafael dos Santos Silva 76f7940b55
Revert "FEATURE: User sentiment on profile summary page (#329)" (#383)
This reverts commit 71c5077228.
2023-12-28 11:01:57 +11:00
Rafael dos Santos Silva 71c5077228
FEATURE: User sentiment on profile summary page (#329)
* FEATURE: User sentiment on profile summary page

This introduces a new user stat in a user profile summary page.

It will show either neutral/positive/negative according to the dominant
sentiment in the user last interactions.

The user-stat widget is only rendered for staff.


Co-authored-by: Keegan George <kgeorge13@gmail.com>
2023-12-04 18:17:43 -03:00
Rafael dos Santos Silva 11e531b099
FEATURE: Backfill task for sentiment module (#316)
* FEATURE: Backfill task for sentiment module

* fix join clause
2023-11-28 12:28:36 -03:00
Rafael dos Santos Silva 818b20fb6f
FEATURE: Make embeddings turn-key (#261)
To ease the administrative burden of enabling the embeddings model, this change introduces automatic backfill when the setting is enabled. It also moves the topic visit embedding creation to a lower priority queue in sidekiq and adds an option to skip embedding computation and persistence when we match on the digest.
2023-10-26 12:07:37 -03:00
Rafael dos Santos Silva 453928e7bb
FIX: Improvment to embeddings index task (#238) 2023-10-02 16:37:13 -03:00
Sam 615eb8b440
FEATURE: add semantic search with hyde bot (#210)
In specific scenarios (no special filters or limits) we will also
always include 5 semantic results (at least) with every query.

This effectively means that all very wide queries will always return
20 results, regardless of how complex they are.

Also: 

FIX: embedding backfill rake task not working
We renamed internals, this corrects the implementation
2023-09-07 13:25:26 +10:00
Rafael dos Santos Silva 2c0f535bab
FEATURE: HyDE-powered semantic search. (#136)
* FEATURE: HyDE-powered semantic search.

It relies on the new outlet added on discourse/discourse#23390 to display semantic search results in an unobtrusive way.

We'll use a HyDE-backed approach for semantic search, which consists on generating an hypothetical document from a given keywords, which gets transformed into a vector and used in a asymmetric similarity topic search.

This PR also reorganizes the internals to have less moving parts, maintaining one hierarchy of DAOish classes for vector-related operations like transformations and querying.

Completions and vectors created by HyDE will remain cached on Redis for now, but we could later use Postgres instead.

* Missing translation and rate limiting

---------

Co-authored-by: Roman Rizzi <rizziromanalejandro@gmail.com>
2023-09-05 11:08:23 -03:00
Rafael dos Santos Silva 703762a7a9
PERF: .find_each instead of .find to save us from memory allocation peaks
also Fix embeddings rake task for new db structure
2023-07-13 18:59:25 -03:00
Rafael dos Santos Silva 739b314312
Fixes for embeddings and truncate (#67) 2023-05-18 09:21:28 +10:00
Rafael dos Santos Silva f1133f66a6
Updates to embedding rake tasks (#54)
- Creates embeddings in topic ID order, so it's easier to stop and
restart from where we stopped

- Update index parameters with current best practices
2023-05-09 13:45:16 -03:00
Roman Rizzi 4e05763a99
FEATURE: Semantic assymetric full-page search (#34)
Depends on discourse/discourse#20915

Hooks to the full-page-search component using an experimental API and performs an assymetric similarity search using our embeddings database.
2023-03-31 15:29:56 -03:00
Rafael dos Santos Silva 6bdbc0e32d
FIX: Proper flow when a topic doesn't have embeddings (#20) 2023-03-20 16:44:55 -03:00
Rafael dos Santos Silva 80d662e9e8
FEATURE: Semantic Suggested Topics (#10) 2023-03-15 17:21:45 -03:00
Roman Rizzi aa2fca6086
DEV: DiscourseAI -> DiscourseAi rename to have consistent folders and files (#9) 2023-03-14 16:03:50 -03:00
Rafael dos Santos Silva 510c6487e3
DEV: Preparation work for multiple inference providers (#5) 2023-03-07 16:14:39 -03:00
Rafael dos Santos Silva 6cf411ec90
add toxicity and sentiment modules 2023-02-22 20:46:53 -03:00