Commit Graph

45 Commits

Author SHA1 Message Date
Sam cd247d5322
FEATURE: Roll out new search optimisations (#20364)
- Reduce duplication of terms in post index from unlimited to 6. This will
result in reduced index size and reduced weighting for posts containing
a huge amount of duplicate terms. (Eg: a post containing "sam sam sam sam
sam sam sam sam", will index as "sam sam sam sam sam sam", only including
the word up to 6 times.) This corrects a flaw where title weighting could
be ignored.

- Prioritize exact matches of words in titles. Our search always performs
a prefix match. However we want to give special weight to exact title matches
meaning that a search for "sum" will find topics such as "the sum of us" vs
"summer in spring".

- Pick up fixes to our search algorithm which are missing from old indexes.
Specifically pick up the fix that indexes URLs properly. (`https://happy.com`
was stemmed to `happi` in keywords and then was not searchable)

see also:

https://meta.discourse.org/t/refinements-to-search-being-tested-on-meta/254158

Indexing will take a while and work in batches, in the background.
2023-02-20 11:53:35 +11:00
David Taylor 6417173082
DEV: Apply syntax_tree formatting to `lib/*` 2023-01-09 12:10:19 +00:00
Penar Musaraj 8222810099
FIX: Limits for PM and group header search (#16887)
When searching for PMs or PMs in a group inbox, results in the header search were not being limited to 5 with a "More" link to the full page search. This PR fixes that.

It also simplifies the logic and updates the search API docs to include recently added `in:messages` and `group_messages:groupname` options.
2022-05-24 11:31:24 -04:00
Alan Guo Xiang Tan 4e5f5b67b0
DEV: Remove monkey patch that is no longer required (#16648) 2022-05-06 10:33:42 +08:00
Bianca Nenciu 34b4b53bac
FEATURE: Use Postgres unaccent to ignore accents (#16100)
The search_ignore_accents site setting can be used to make the search
indexer remove the accents before indexing the content. The unaccent
function from PostgreSQL is better than Ruby's unicode_normalize(:nfkd).
2022-03-07 23:03:10 +02:00
Alan Guo Xiang Tan 930f51e175 FEATURE: Split up text segmentation for Chinese and Japanese.
* Chinese segmenetation will continue to rely on cppjieba
* Japanese segmentation will use our port of TinySegmenter
* Korean currently does not rely on segmentation which was dropped in c677877e4f
* SiteSetting.search_tokenize_chinese_japanese_korean has been split
into SiteSetting.search_tokenize_chinese and
SiteSetting.search_tokenize_japanese respectively
2022-02-07 09:21:14 +08:00
Sam 5b342ae505
FIX: remove superfluous spaces from CJK blurbs (#12629)
Previously we used the raw data indexed to generate blurbs even for cases
when Chinese/Korean/Japanese text was used.

This caused superfluous spaces to show up in excerpts.
2021-04-12 12:46:42 +10:00
Guo Xiang Tan 93f8396b4b
FIX: Limit PG headline based search blurb generation to 200 characters.
* Recovers omission characters '...' in blurb as well.
2020-08-12 15:34:27 +08:00
Guo Xiang Tan 2193d02433
PERF: Use PG headlines for blurb generation and highlighting for search. 2020-08-06 14:56:29 +08:00
Guo Xiang Tan 255b0e9f14
PERF: Replace video and audio links in search blurb while indexing.
In the near future, we will be swtiching to PG headlines to generate the
search blurb. As such, we need to replace audio and video links in the
raw data used for headline generation. This also means that we avoid
replacing links each time we need to generate the blurb.
2020-08-06 12:25:03 +08:00
Guo Xiang Tan 06ef87da51
DEV: Make rubocop happy. 2020-08-06 10:11:07 +08:00
Guo Xiang Tan ee5d8fba0c
PERF: Optimize `ActionView::Helpers::TextHelper#excerpt`. 2020-08-06 09:59:20 +08:00
Guo Xiang Tan 4c03a944f6
PERF: Move URI regexp in `GroupSearchResults.blurb_for` into constant
No need to generate the huge regexp over and over again.
2020-08-04 14:38:43 +08:00
Guo Xiang Tan 181c4eb760 PERF: Avoid parsing `Post#cooked` with Nokogiri for every search. 2020-07-24 10:43:09 +08:00
Guo Xiang Tan ce39733b1a
FIX: Incorrect search blurb when advanced search filters are used take2
Also remove include_blurbs attribute which isn't used.
2020-07-14 11:50:40 +08:00
David Taylor cb1f891392
Revert "FIX: Incorrect search blurb when advanced search filters are used."
This change was causing advanced search filters to disappear from the search input

This reverts commit 2e1eafae06.
2020-07-09 16:19:18 +01:00
Guo Xiang Tan 2e1eafae06
FIX: Incorrect search blurb when advanced search filters are used. 2020-07-08 11:59:49 +08:00
Penar Musaraj 0dfc594784 FIX: skip invalid URLs when checking for audio/video in search blurbs
Fixes 500 errors on search queries introduced in 580a4a8
2019-11-06 10:32:15 -05:00
Penar Musaraj 15b25547bb DEV: Cleanup misspelled TextHelper param 2019-10-31 09:32:42 -04:00
Penar Musaraj f8b72d9835 DEV: Refactor excluding audio/video URLs from search result blurbs
Followup to 580a4a82
2019-10-31 09:13:24 -04:00
Penar Musaraj 580a4a827b Exclude audio/video URLs from search result blurbs
Displays translatable "[audio]" or "[video]" placeholders instead of ugly (and often long) URLs.
2019-10-30 13:07:16 -04:00
Sam Saffron 4dcc5f16f1 FEATURE: when under extreme load disable search
The global setting disable_search_queue_threshold
(DISCOURSE_DISABLE_SEARCH_QUEUE_THRESHOLD) which default to 1 second was
added.

This protection ensures that when the application is unable to keep up with
requests it will simply turn off search till it is not backed up.

To disable this protection set this to 0.
2019-07-02 11:22:01 +10:00
Sam Saffron 30990006a9 DEV: enable frozen string literal on all files
This reduces chances of errors where consumers of strings mutate inputs
and reduces memory usage of the app.

Test suite passes now, but there may be some stuff left, so we will run
a few sites on a branch prior to merging
2019-05-13 09:31:32 +08:00
Sam Saffron e2bcf55077 DEV: move send => public_send in lib folder
This handles most of the cases in `lib` where we were using send instead
of public_send
2019-05-07 12:25:44 +10:00
Guo Xiang Tan 451f7842ff DEV: More `send` -> `public_send`. 2019-05-07 10:05:58 +08:00
Guo Xiang Tan dae0bb4c67 FIX: Post blurb incorrect when search contains a phrase match.
If the blurb generated is not around the search term, we will not be
able to highlight it on the client side.
2019-03-26 17:01:52 +08:00
Joffrey JAFFEUX dc4001370c
FEATURE: displays groups in menu search (#7090) 2019-03-04 10:30:09 +01:00
Régis Hanol 4481836de2 FEATURE: new 'search_ignore_accents' site setting 2018-09-17 10:42:30 +02:00
Neil Lalonde 2c56f8df7c FEATURE: show tags in search results 2017-08-25 11:52:59 -04:00
Neil Lalonde 7c1d7fb423 Merge branch 'master' into fix_limited_search_results 2017-07-31 15:55:31 -04:00
Guo Xiang Tan 5012d46cbd Add rubocop to our build. (#5004) 2017-07-28 10:20:09 +09:00
Jakub Macina 7b40de5ac4 Add attribute to grouped search results for more available posts. 2017-07-20 18:07:13 +02:00
Robin Ward 21e02d6969 Include the `search_log_id` in search results 2017-07-17 12:10:32 -04:00
Sam 0a78ae739d Remove SearchObserver, aim is to remove all observers
rails-observers gem is mostly unmaintained and is a pain to carry forward
new implementation contains significantly less magic as a bonus
2016-12-22 13:13:14 +11:00
Sam 50f7616d04 FIX: include pinned status in search results 2016-03-18 16:26:20 +11:00
Sam 2876725e1b REFACTOR: remove hacky search from discovery 2015-07-27 16:47:06 +10:00
Sam 41ceff8430 UX: move search to its own route
previously search was bundled with discovery, something that makes stuff confusing internally
2015-07-27 16:47:06 +10:00
Robin Ward 6422d5efbd Use the same component for similar topics as search results. 2015-06-24 15:08:22 -04:00
Sam 0ade9bafff FIX: highlight in yellow, not blue
FEATURE: highlight in title
2014-09-04 15:01:13 +10:00
Sam 9c29c1c072 FEATURE: highlight search results 2014-09-03 17:09:01 +10:00
Sam 4f09d552ed FEATURE: increase search expansion to 50 results
refactor search code to deal with proper objects
use proper serializers, test the controllers
2014-09-03 12:13:25 +10:00
Sam abb2de22ab BUGFIX: search could break when expanding 2014-02-17 14:34:14 +11:00
Godfrey Chan 6bbea9de0b The Rails JSON encoder API requires `as_json` to take an optional arg 2013-11-29 21:43:44 -08:00
Robin Ward 20e88f18ee FIX: Removes some duplicates in search results when the search context is a user. 2013-05-27 15:18:55 -04:00
Robin Ward 9d0e830786 Search code now uses ActiveRecord instead of SQL. 2013-05-23 16:26:51 -04:00