24 Commits

Author SHA1 Message Date
Rafael dos Santos Silva
0738f67fa4
FIX: Fix embeddings truncation strategy (#139) 2023-08-16 15:09:41 -03:00
Rafael dos Santos Silva
8318c4374c
FIX: Remove muted from Similar list (#127)
* FIX: Remove muted from Similar list
2023-08-08 15:44:10 -03:00
Roman Rizzi
58b96eda6c
REFACTOR: Build related topics using TopicQuery. (#124)
TopicQuery already provides a lot of safeguards and options for filtering topics and enforcing permissions. It makes sense to rely on it, as other plugins like discourse-assign do.

As a bonus, we now have access to the current_user while serializing these topics, so users will see things like unread posts count just like we do for the lists.
2023-08-02 16:58:09 -03:00
Roman Rizzi
c8de9495c8
UX: Update related-topics to follow <MoreTopics/> conventions (#118) 2023-07-31 18:33:37 -03:00
Rafael dos Santos Silva
3e7c99de89
FEATURE: Support for locally inferred embeddings in 100 languages (#115)
* FEATURE: Support for locally inferred embeddings in 100 languages

* add table
2023-07-27 15:50:03 -03:00
Rafael dos Santos Silva
e3b4a73267
FEATURE: Cache Related Topics for longer (#110) 2023-07-18 11:27:06 -03:00
Rafael dos Santos Silva
703762a7a9
PERF: .find_each instead of .find to save us from memory allocation peaks
also Fix embeddings rake task for new db structure
2023-07-13 18:59:25 -03:00
Rafael dos Santos Silva
5e3f4e1b78
FEATURE: Embeddings to main db (#99)
* FEATURE: Embeddings to main db

This commit moves our embeddings store from an external configurable PostgreSQL
instance back into the main database. This is done to simplify the setup.

There is a migration that will try to import the external embeddings into
the main DB if it is configured and there are rows.

It removes support for embedding models other than all_mpnet_base_v2 and OpenAI
text_embedding_ada_002. However, it will now be easier to add new models.

It also now takes into account:
  - topic title
  - topic category
  - topic tags
  - replies (as much as the model allows)
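
A minimal sketch of how such an input could be assembled, assuming a fixed token budget. The method names, the whitespace-based `count_tokens`, and the `max_tokens` parameter are hypothetical and are not taken from the plugin; a real model would use its own tokenizer.

```ruby
# Hypothetical token counter: a whitespace split standing in for a real model tokenizer.
def count_tokens(text)
  text.split.size
end

# Builds the text to embed: title, category and tags first, then replies in order
# until the token budget is used up. The last reply that does not fit is cut short.
# Note: the title/category/tags header itself is never truncated in this sketch.
def build_embedding_input(title:, category:, tags:, replies:, max_tokens:)
  parts = [title, category, tags.join(", ")]
  remaining = max_tokens - parts.sum { |part| count_tokens(part) }

  replies.each do |reply|
    break if remaining <= 0
    words = reply.split.first(remaining)
    parts << words.join(" ")
    remaining -= words.size
  end

  parts.join("\n")
end
```

Putting the topic metadata first means it survives even when the replies have to be cut.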

We introduce an interface so we can eventually support multiple strategies
for handling long topics.

This PR severely damages semantic search performance, but this is
temporary until we can adapt HyDE so that semantic search uses the same
embeddings we have for semantic related topics, with good performance.

Here we also have some groundwork for adding post-level embeddings, but this
will be added in a future PR.

Please note that this PR will also block Discourse from booting / updating if 
this plugin is installed and the pgvector extension isn't available on the 
PostgreSQL instance Discourse uses.
2023-07-13 12:41:36 -03:00
Rafael dos Santos Silva
cfc6e388df
FIX: Ensure embeddings database outages are handled gracefully (#80)
The rails_failover middleware will intercept all `PG::ConnectionBad` errors and put the cluster into readonly mode. It does not have any handling for multiple databases. Therefore, an issue with the embeddings database was taking the whole cluster into readonly.

This commit fixes the issue by rescuing `PG::Error` from all AI database accesses and re-raising the errors with a different class. It also adds a spec to ensure that an embeddings database outage does not affect the functionality of the topics/show route.
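
A rough sketch of the rescue-and-re-raise pattern described above. The `PG` module here is a stand-in defined only so the example runs without the pg gem, and `EmbeddingsDatabaseError`, `with_embeddings_db`, and `related_topic_ids` are hypothetical names, not the plugin's actual code.

```ruby
# Stand-ins for the pg gem's error classes so the sketch runs without a database.
# In the real gem, PG::ConnectionBad is likewise a subclass of PG::Error.
module PG
  class Error < StandardError; end
  class ConnectionBad < Error; end
end

# Hypothetical wrapper error. It deliberately does NOT inherit from PG::Error,
# so middleware that matches on PG errors will not react to it.
class EmbeddingsDatabaseError < StandardError; end

# Runs a block that talks to the embeddings database and translates PG errors.
def with_embeddings_db
  yield
rescue PG::Error => e
  raise EmbeddingsDatabaseError, "embeddings database unavailable: #{e.message}"
end

# A caller (for example, the topic page) can then degrade gracefully.
def related_topic_ids
  with_embeddings_db { raise PG::ConnectionBad, "connection refused" }
rescue EmbeddingsDatabaseError
  [] # show no related topics instead of failing the request
end
```

Because the re-raised class sits outside the `PG::Error` hierarchy, an outage of the secondary database no longer looks like an outage of the primary one.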

Co-authored-by: David Taylor <david@taylorhq.com>
2023-05-23 22:57:52 +01:00
Sam
b82fc1e692
FIX: ensure we only attempt embedding once every 15 minutes (#76)
This also heavily reduces log noise and ensures our exception handling is
more surgical.
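
A minimal sketch of a once-per-15-minutes guard, assuming an in-memory store. `AttemptThrottle` and its injectable clock are hypothetical; the commit does not say how the plugin tracks attempts, and a multi-process deployment would need a shared store instead of a Hash.

```ruby
# Hypothetical in-memory throttle: allows one attempt per key per time window.
class AttemptThrottle
  def initialize(window_seconds: 15 * 60, clock: -> { Time.now.to_f })
    @window_seconds = window_seconds
    @clock = clock          # injectable so the behavior can be tested without sleeping
    @last_attempt = {}      # key => timestamp of the last allowed attempt
  end

  # Returns true and records the attempt if the key was not tried within the window.
  def allow?(key)
    now = @clock.call
    last = @last_attempt[key]
    return false if last && (now - last) < @window_seconds

    @last_attempt[key] = now
    true
  end
end
```

A caller would wrap the embedding call in `if throttle.allow?(topic_id)`, so a repeatedly failing topic is retried at most once per window.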
2023-05-23 10:43:24 +10:00
Rafael dos Santos Silva
e9ae28f773
FIX: Non instructor OSS embeddings was broken (#65) 2023-05-17 12:10:10 -03:00
Rafael dos Santos Silva
2ed1f874c2
Use correct API signature for instructor embeddings (#62) 2023-05-15 17:18:11 -03:00
Rafael dos Santos Silva
3c9513e754
Refinements to embeddings and tokenizers (#61)
* Refinements to embeddings and tokenizers

* lint

* Truncate with tokenizers for summary

* fix
2023-05-15 15:10:42 -03:00
Rafael dos Santos Silva
e5537d4c77
FEATURE: Allow excluding closed topics from semantic related (#55) 2023-05-09 15:30:50 -03:00
David Taylor
a0542d1859
DEV: Resolve add_to_serializer deprecations (#46)
26b7f8a63b
2023-04-24 16:07:17 +01:00
Roman Rizzi
7a54455cf6
FIX: Use correct variable and method for embeddings (#35) 2023-03-31 16:15:10 -03:00
Roman Rizzi
4e05763a99
FEATURE: Semantic asymmetric full-page search (#34)
Depends on discourse/discourse#20915

Hooks into the full-page-search component using an experimental API and performs an asymmetric similarity search using our embeddings database.
2023-03-31 15:29:56 -03:00
Sam
6543c50758
FIX: stop returning self as a candidate for related topics (#31) 2023-03-31 11:04:17 +10:00
Sam
0d80d9ec49
FEATURE: allow limiting results in related topics section (#30)
Also:

- Normalizes behavior between logged in and anon,
 we only show related topics in the related topic section

- Renames "suggested" to "related" given this only exists in related section
- Adds a spec section to ensure anon does not regress
- Adds `ai_embeddings_semantic_related_topics` to limit related topics

Renamed settings:

ai_embeddings_semantic_suggested_model -> ai_embeddings_semantic_related_model
ai_embeddings_semantic_suggested_topics_enabled -> ai_embeddings_semantic_related_topics_enabled

The plugin is still in an experimental phase and not much is overridden, hence
we avoid adding site setting migrations.


Co-authored-by: Krzysztof Kotlarek <kotlarek.krzysztof@gmail.com>
2023-03-31 11:04:34 +11:00
Sam
1d097b9d82
FEATURE: attempt to include related topics above suggested (#28)
Allows related topics to show up for logged-in users

- Introduces a new "Related Topics" block above suggested when related topics exist
- Renames `ai_embeddings_semantic_suggested_topics_anons_enabled` -> `ai_embeddings_semantic_suggested_topics_enabled` (given it is only deployed on 1 site not bothering with a migration)
- Adds an integration test to ensure data arrives correctly on the client
2023-03-31 09:07:22 +11:00
Rafael dos Santos Silva
45950f1bb4
FIX: Only show public visible topics as suggested for anons (#27)
* FIX: Only show public visible topics as suggested for anons

* DEV: Add tests for embeddings

* Update spec/lib/modules/embeddings/semantic_suggested_spec.rb

Co-authored-by: Bianca Nenciu <nbianca@users.noreply.github.com>

* Update spec/lib/modules/embeddings/semantic_suggested_spec.rb

Co-authored-by: Bianca Nenciu <nbianca@users.noreply.github.com>

* move to top

---------

Co-authored-by: Bianca Nenciu <nbianca@users.noreply.github.com>
2023-03-23 17:28:01 -03:00
Rafael dos Santos Silva
bd342f538d
FEATURE: Try to generate embeddings for a topic when those aren't found (#23) 2023-03-21 18:20:46 -03:00
Rafael dos Santos Silva
6bdbc0e32d
FIX: Proper flow when a topic doesn't have embeddings (#20) 2023-03-20 16:44:55 -03:00
Rafael dos Santos Silva
80d662e9e8
FEATURE: Semantic Suggested Topics (#10) 2023-03-15 17:21:45 -03:00