Commit Graph

22 Commits

Author SHA1 Message Date
Sam cd247d5322
FEATURE: Roll out new search optimisations (#20364)
- Reduce duplication of terms in post index from unlimited to 6. This will
result in reduced index size and reduced weighting for posts containing
a huge amount of duplicate terms. (Eg: a post containing "sam sam sam sam
sam sam sam sam", will index as "sam sam sam sam sam sam", only including
the word up to 6 times.) This corrects a flaw where title weighting could
be ignored.

- Prioritize exact matches of words in titles. Our search always performs
a prefix match. However we want to give special weight to exact title matches
meaning that a search for "sum" will find topics such as "the sum of us" vs
"summer in spring".

- Pick up fixes to our search algorithm which are missing from old indexes.
Specifically pick up the fix that indexes URLs properly. (`https://happy.com`
was stemmed to `happi` in keywords and then was not searchable)

see also:

https://meta.discourse.org/t/refinements-to-search-being-tested-on-meta/254158

Indexing will take a while and work in batches, in the background.
2023-02-20 11:53:35 +11:00
David Taylor cb932d6ee1
DEV: Apply syntax_tree formatting to `spec/*` 2023-01-09 11:49:28 +00:00
Phil Pirozhkov 493d437e79
Add RSpec 4 compatibility (#17652)
* Remove outdated option

04078317ba

* Use the non-globally exposed RSpec syntax

https://github.com/rspec/rspec-core/pull/2803

* Use the non-globally exposed RSpec syntax, cont

https://github.com/rspec/rspec-core/pull/2803

* Comply to strict predicate matchers

See:
 - https://github.com/rspec/rspec-expectations/pull/1195
 - https://github.com/rspec/rspec-expectations/pull/1196
 - https://github.com/rspec/rspec-expectations/pull/1277
2022-07-28 10:27:38 +08:00
David Taylor c9dab6fd08
DEV: Automatically require 'rails_helper' in all specs (#16077)
It's very easy to forget to add `require 'rails_helper'` at the top of every core/plugin spec file, and omissions can cause some very confusing/sporadic errors.

By setting this flag in `.rspec`, we can remove the need for `require 'rails_helper'` entirely.
2022-03-01 17:50:50 +00:00
Régis Hanol aa1138ff71
FIX: reindex_search job should work on model with no search data (#11819)
Lots of changes but it's mostly a refactoring.

The interesting part that was fix are the 'load_problem_<model>_ids' methods.
They will now return records with no search data associated so they can be properly indexed for the search.
This "bad" state usually happens after a migration.
2021-01-25 11:23:36 +01:00
Guo Xiang Tan f70b330e7a DEV: Fix the build.
Follow-up to 650da7b626
2020-11-09 14:25:14 +08:00
Guo Xiang Tan 0cbf86f1e7
DEV: Refactor reindex_search_spec.
Follow-up 20dc845418
2020-07-24 09:29:54 +08:00
Mark VanLandingham 20dc845418
FIX: tests for reindex_search_spec pass regardless of seed (#10297) 2020-07-23 13:51:45 -05:00
Guo Xiang Tan 3766122a82
DEV: Allow developmental post search index versions. 2020-07-23 15:19:46 +08:00
Guo Xiang Tan 609ba50fe8
DEV: Add more granularity to `SearchIndexer` versions.
Sometimes, we just want to reindex a specific model and not all the
things.
2020-07-23 14:24:06 +08:00
Sam Saffron 74c4f926fc FIX: drop deleted posts from search index
This does two things

1. Our "index grace period" has been wound down to 1 day, there is no point
keeping a bloated index for a week, usually when people delete stuff they
mean for it to be removed

2. We were never dropping deleted posts from the index, only posts from
deleted topics

These changes speed up search a tiny bit and reduce background work.
2019-06-04 17:19:59 +10:00
Guo Xiang Tan 152238b4cf DEV: Prefer `public_send` over `send`. 2019-05-07 09:33:21 +08:00
Sam Saffron 4ea21fa2d0 DEV: use #frozen_string_literal: true on all spec
This change both speeds up specs (less strings to allocate) and helps catch
cases where methods in Discourse are mutating inputs.

Overall we will be migrating everything to use #frozen_string_literal: true
it will take a while, but this is the first and safest move in this direction
2019-04-30 10:27:42 +10:00
Sam Saffron 45285f1477 DEV: remove update_attributes which is deprecated in Rails 6
See: https://github.com/rails/rails/pull/31998

update_attributes is a relic of the past, it should no longer be used.
2019-04-29 17:32:25 +10:00
Guo Xiang Tan 108c231d1c FIX: Clean up `topic_search_data` of trashed topics.
This keeps the index and table smaller.
2019-04-08 16:53:39 +08:00
Guo Xiang Tan c4997ce85f FIX: Don't try and reindex posts that have been trashed. 2019-04-08 16:38:43 +08:00
Guo Xiang Tan d151425353
PERF: Delete search data of posts from trashed topics periodically. (#7302)
This keeps both the index and table smaller.
2019-04-03 10:10:41 +08:00
Guo Xiang Tan d8704c11ca PERF: Better use of index when queueing a topci for search reindex.
Also move `Search::INDEX_VERSION` to `SearchIndexer` which is where the
version is actually being used.
2019-04-02 09:53:37 +08:00
Guo Xiang Tan aa2311a7b0 FIX: Don't reindex posts belonging to a deleted topic for search.
Posts belonging to a deleted topic can't be index for search so we need
to avoid loading those post ids.
2019-04-02 07:36:53 +08:00
Guo Xiang Tan 3fc5dbb045 FIX: Don't attempt to reindex posts that have an empty raw.
If the post ids keep loading, we might end up in a situations where
we're always loading the same post ids over and over again without
indexing anything new.

Follow up to daeda80ada.
2019-04-02 07:13:33 +08:00
Guo Xiang Tan daeda80ada
FIX: Don't index posts with empty `Post#raw` for search. (#7263)
* DEV: Remove unnecessary join in `Jobs::ReindexSearch`.

* FIX: Don't index posts with empty `Post#raw` for search.
2019-04-01 10:06:27 +08:00
Erick Guan 6e59149a77 FIX: rebuild index when engine replaced (#5021) 2017-08-16 07:38:34 -04:00