discourse

Commit Graph

Author	SHA1	Message	Date
Sam	b2e3084205	FEATURE: allow searching for oldest topics (#21715 ) In some cases reverse chronological can be very important. - Oldest post by sam - Oldest topic by sam Prior to these new filters we had no way of searching for them. Now the 2 new orders `order:oldest` and `order:oldest_topic` can be used to find oldest topics and posts * Update spec/lib/search_spec.rb Co-authored-by: Alan Guo Xiang Tan <gxtan1990@gmail.com> * Update spec/lib/search_spec.rb Co-authored-by: Alan Guo Xiang Tan <gxtan1990@gmail.com> --------- Co-authored-by: Alan Guo Xiang Tan <gxtan1990@gmail.com>	2023-05-24 18:26:36 +10:00
Sam	bd32912c5e	FIX: do not allow title stuffing to dominate search (#21464 ) We were giving topics with repeated words extra weight in search index. This meant that it was trivial to stuff words into title to dominate in search given we search for exact title matches first. The following tweak means that: `invite invited invites` and `invite some stuff` Both rank the same for title searching. Titles are short and punchy, duplicating words should not give special weight. Requires a full reindex to take effect.	2023-05-10 11:47:58 +10:00
Bianca Nenciu	d6534bdb11	DEV: Fix test (#21283 ) Apostrophe-like characters (for example, ’ and ') are transformed to the ASCII apostrophe (') regardless of search_ignore_accents.	2023-05-04 17:04:26 +03:00
Sam	c63551d227	FEATURE: search_rank_sort_priorities modifier (#21329 ) This new modifier can be used by plugins to modify search ordering. Specifically plugins such as discourse_solved can amend search ordering so solved topics bump to the top. Also correct edge case where low and high sort priority categories did not order correctly when it came to closed/archived	2023-05-02 16:36:36 +10:00
Sam	cd247d5322	FEATURE: Roll out new search optimisations (#20364 ) - Reduce duplication of terms in post index from unlimited to 6. This will result in reduced index size and reduced weighting for posts containing a huge amount of duplicate terms. (Eg: a post containing "sam sam sam sam sam sam sam sam", will index as "sam sam sam sam sam sam", only including the word up to 6 times.) This corrects a flaw where title weighting could be ignored. - Prioritize exact matches of words in titles. Our search always performs a prefix match. However we want to give special weight to exact title matches meaning that a search for "sum" will find topics such as "the sum of us" vs "summer in spring". - Pick up fixes to our search algorithm which are missing from old indexes. Specifically pick up the fix that indexes URLs properly. (`https://happy.com` was stemmed to `happi` in keywords and then was not searchable) see also: https://meta.discourse.org/t/refinements-to-search-being-tested-on-meta/254158 Indexing will take a while and work in batches, in the background.	2023-02-20 11:53:35 +11:00
Sam	5d28cb709a	FIX: de-prioritize archived topics (#20161 ) Previously due to an error archived topics were more prominent in search than closed topics. This amends our internal logic to ensure archived topics are bumped down the list.	2023-02-03 13:23:27 +11:00
Sam	651476e89e	FIX: domain searches not working properly for URLs (#20136 ) If a post contains a domain with a word that stems to a non-prefix, single-word searches will not match it. For example: in happy.com, `happy` stems to `happi`. Thus searches for happy will not find URLs with it included. This bloats the index a tiny bit, but impact is limited. Will require a full reindex of search to take effect. When we are done refining search we can consider a full version bump.	2023-02-03 09:55:28 +11:00
Sam	1dba1aca27	FIX: add support for PG 14 and up (#20137 ) Previously to_tsquery would split terms and join them with &. In PG 14 terms are split and joined with <->, which means "followed directly by". In PG 13: discourse_test=# SELECT to_tsquery('english', '''hello world'''); to_tsquery --------------------- 'hello' & 'world' (1 row) In PG 14: discourse_test=# SELECT to_tsquery('english', '''hello world'''); to_tsquery --------------------- 'hello' <-> 'world' (1 row) The change is very unobtrusive: we simply amend our to_tsquery to behave like it used to and make no use of the `<->` operator. More detail at: https://akorotkov.github.io/blog/2021/05/22/pg-14-query-parsing/ Note that plainto_tsquery, used elsewhere in Discourse, keeps the exact same behavior. This also corrects a faulty test that was passing by a fluke on older versions of PG	2023-02-03 08:11:25 +11:00
Sam	c5345d0e54	FEATURE: prioritize_exact_search_title_match hidden setting (#20089 ) The new `prioritize_exact_search_title_match` can be used to force the search algorithm to prioritize exact term matches in title when ranking results. This is scoped narrowly to titles for cases such as a topic titled: "organisation chart" and a search of "org chart". If we scoped this wider, all discussion about "org chart" would float to the top and leave a very common title de-prioritized. This is a hidden site setting and it has some performance impact due to double ranking. That said, performance impact is somewhat mitigated because ranking on title alone is a very cheap operation.	2023-01-31 16:34:01 +11:00
Sam	07679888c8	FEATURE: allow restricting duplication in search index (#20062 ) * FEATURE: allow restricting duplication in search index This introduces the site setting `max_duplicate_search_index_terms`. Using this number we limit the amount of duplication in our search index. This allows us to more correctly weight title searches, so bloated posts don't unfairly bump to the top of search results. This feature is completely disabled by default and behind a site setting We will experiment with it first. Note entire search index must be rebuilt for it to take effect. --------- Co-authored-by: Alan Guo Xiang Tan <gxtan1990@gmail.com>	2023-01-31 12:41:31 +11:00
Alan Guo Xiang Tan	6934edd97c	DEV: Add hidden site setting to configure search ranking weights (#20086 ) This site setting is mostly experimental at this point.	2023-01-31 08:57:13 +08:00
Sam	5d669d8aa2	Revert "FEATURE: hidden site setting to disable search prefix matching (#20058 )" (#20073 ) This reverts commit `64f7b97d08`. Too many side effects for this setting, we have decided to remove it	2023-01-31 07:39:23 +08:00
Sam	64f7b97d08	FEATURE: hidden site setting to disable search prefix matching (#20058 ) Many users seem surprised by prefix matching in search leading to unexpected results. Over the years we have always returned results starting with a search term rather than requiring exact matches, meaning a search for `abra` would find `abracadabra`. This introduces the Site Setting `enable_search_prefix_matching` which defaults to true. (behavior unchanged) We plan to experiment on select sites with exact matches to see if the results are less surprising	2023-01-30 12:44:40 +08:00
Sérgio Saquetim	0feb9ad341	DEV: Added callback to change the query used to filter groups in search (#19884 ) Added plugin registry that will allow adding callbacks that can change the query that is used to filter groups while running a search.	2023-01-16 15:48:00 -03:00
David Taylor	cb932d6ee1	DEV: Apply syntax_tree formatting to `spec/*`	2023-01-09 11:49:28 +00:00
Bianca Nenciu	b80765f1f4	DEV: Remove enable_whispers site setting (#19196 ) * DEV: Remove enable_whispers site setting Whispers are enabled as long as there is at least one group allowed to whisper, see whispers_allowed_groups site setting. * DEV: Always enable whispers for admins if at least one group is allowed.	2022-12-16 18:42:51 +02:00
Bianca Nenciu	17b7ab0d7b	FIX: Make sure generated tsqueries are valid (#19368 ) The tsquery used for searching is generated using both functions from Ruby and Postgresql (for example, unaccent function). Depending on the term used, it generated an invalid tsquery. For example "can’t" generated "''can''t''" instead of "''can''''t''".	2022-12-12 17:57:20 +02:00
Du Jiajun	41e6b516e5	FIX: Support unicode in search filter @username (#18804 )	2022-11-16 10:42:37 +01:00
Martin Brennan	f5194aadd3	DEV: Remove usages of enable_personal_messages (#18437 ) cf. `e62e93f83a` This PR also makes it so `bot` (negative ID) and `system` users are always allowed to send PMs, since the old conditional was just based on `enable_personal_messages`	2022-10-05 10:50:20 +10:00
Jarek Radosz	8fa9f0cf92	DEV: Fix a flaky spec (#18146 ) In some cases the topic of the fabricated post can be titled "This is a test topic 777" which matches the search query "#777"	2022-08-31 20:52:57 +02:00
Loïc Guitaut	3eaac56797	DEV: Use proper wording for contexts in specs	2022-08-04 11:05:02 +02:00
Phil Pirozhkov	493d437e79	Add RSpec 4 compatibility (#17652 ) * Remove outdated option `04078317ba` * Use the non-globally exposed RSpec syntax https://github.com/rspec/rspec-core/pull/2803 * Use the non-globally exposed RSpec syntax, cont https://github.com/rspec/rspec-core/pull/2803 * Comply to strict predicate matchers See: - https://github.com/rspec/rspec-expectations/pull/1195 - https://github.com/rspec/rspec-expectations/pull/1196 - https://github.com/rspec/rspec-expectations/pull/1277	2022-07-28 10:27:38 +08:00
Loïc Guitaut	296aad430a	DEV: Use `describe` for methods in specs	2022-07-27 16:35:27 +02:00
Krzysztof Kotlarek	09932738e5	FEATURE: whispers available for groups (#17170 ) Before, whispers were only available for staff members. Config has been changed to allow to configure privileged groups with access to whispers. Post migration was added to move from the old setting into the new one. I considered having a boolean column `whisperer` on user model similar to `admin/moderator` for performance reason. Finally, I decided to keep looking for groups as queries are only done for current user and didn't notice any N+1 queries.	2022-06-30 10:18:12 +10:00
Penar Musaraj	8222810099	FIX: Limits for PM and group header search (#16887 ) When searching for PMs or PMs in a group inbox, results in the header search were not being limited to 5 with a "More" link to the full page search. This PR fixes that. It also simplifies the logic and updates the search API docs to include recently added `in:messages` and `group_messages:groupname` options.	2022-05-24 11:31:24 -04:00
Martin Brennan	fcc2e7ebbf	FEATURE: Promote polymorphic bookmarks to default and migrate (#16729 ) This commit migrates all bookmarks to be polymorphic (using the bookmarkable_id and bookmarkable_type) columns. It also deletes all the old code guarded behind the use_polymorphic_bookmarks setting and changes that setting to true for all sites and by default for the sake of plugins. No data is deleted in the migrations, the old post_id and for_topic columns for bookmarks will be dropped later on.	2022-05-23 10:07:15 +10:00
Martin Brennan	955d47bbd0	FIX: Use polymorphic bookmarks for in:bookmarks search (#16684 ) This commit makes sure the in:bookmarks post advanced search filter works with polymorphic bookmarks.	2022-05-10 09:08:01 +10:00
Penar Musaraj	b266a36967	FEATURE: Add `group_messages:` keyword to advanced search (#16584 )	2022-04-28 10:47:40 -04:00
Penar Musaraj	eebce8f80a	FEATURE: Add in:messages search modifier (#16567 ) This adds `in:messages` as a synonym for `in:personal` and sets it up as our default nomenclature (`in:personal` will still work).	2022-04-26 16:47:01 -04:00
Jarek Radosz	be3dceccfa	DEV: Merge two spec files (#16244 ) Also reenabled two specs on macOS as they're green now.	2022-03-22 09:23:06 +08:00
Bianca Nenciu	34b4b53bac	FEATURE: Use Postgres unaccent to ignore accents (#16100 ) The search_ignore_accents site setting can be used to make the search indexer remove the accents before indexing the content. The unaccent function from PostgreSQL is better than Ruby's unicode_normalize(:nfkd).	2022-03-07 23:03:10 +02:00
David Taylor	c9dab6fd08	DEV: Automatically require 'rails_helper' in all specs (#16077 ) It's very easy to forget to add `require 'rails_helper'` at the top of every core/plugin spec file, and omissions can cause some very confusing/sporadic errors. By setting this flag in `.rspec`, we can remove the need for `require 'rails_helper'` entirely.	2022-03-01 17:50:50 +00:00
Alan Guo Xiang Tan	930f51e175	FEATURE: Split up text segmentation for Chinese and Japanese. * Chinese segmentation will continue to rely on cppjieba * Japanese segmentation will use our port of TinySegmenter * Korean currently does not rely on segmentation which was dropped in `c677877e4f` * SiteSetting.search_tokenize_chinese_japanese_korean has been split into SiteSetting.search_tokenize_chinese and SiteSetting.search_tokenize_japanese respectively	2022-02-07 09:21:14 +08:00
Jean	34ff7bfeeb	FEATURE: Hide suspended users from site-wide search to regular users (#14245 )	2021-09-06 09:59:35 -04:00
Bianca Nenciu	fbf7627c8e	FIX: Make search work with sub-sub-categories (#13901 ) Searching in a category looked only one level down, ignoring the site setting max_category_nesting. The user interface did not support the third level of categories and did not display them in the "Categorized" input of the advanced search options.	2021-08-02 14:04:13 +03:00
Krzysztof Kotlarek	a6363170e9	FIX: flaky search-spec More precise expectations for search spec	2021-06-29 10:06:44 +08:00
Sam	435c4817cb	FEATURE: enable tagging by default (#13175 ) Over the years we have found that a few communities never discovered tags. Instead of having them default off we now have them default on, ensuring that everyone finds out about them. Co-authored-by: Dan Ungureanu <dan@ungureanu.me>	2021-06-07 18:07:46 +03:00
Krzysztof Kotlarek	e29605b79f	FEATURE: the ability to search users by custom fields (#12762 ) When the admin creates a new custom field they can specify if that field should be searchable or not. That setting is taken into consideration for quick search results.	2021-04-27 15:52:45 +10:00
Gerhard Schlager	3b2f6e129a	FEATURE: Add English (UK) as locale (#11768 ) * "English" gets renamed into "English (US)" * "English (UK)" replaces "English" @discourse-translator-bot keep_translations_and_approvals	2021-01-20 21:32:22 +01:00
Krzysztof Kotlarek	cb58cbbc2c	FEATURE: allow to extend topic_eager_loads in Search (#10625 ) This additional interface is required by encrypt plugin	2020-09-14 11:58:28 +10:00
Dan Ungureanu	38c9c87128	FIX: Add to tags result set only visible tags (#10580 )	2020-09-02 13:24:40 +03:00
Guo Xiang Tan	181c4eb760	PERF: Avoid parsing `Post#cooked` with Nokogiri for every search.	2020-07-24 10:43:09 +08:00
Sam Saffron	862773ec83	FIX: do not remove stop words when using English locale PG already handles English stop words, the list in cppjieba is bigger than the list PG uses, which in turn causes confusion because words such as "volume" are stripped using the cppjieba stop word list. We will follow up with another commit here to apply the Chinese word stopwords, but for now to eliminate the confusion we are skipping applying the stopword list when the dictionary in PG is in English.	2020-05-18 10:54:56 +10:00
Penar Musaraj	0dfc594784	FIX: skip invalid URLs when checking for audio/video in search blurbs Fixes 500 errors on search queries introduced in `580a4a8`	2019-11-06 10:32:15 -05:00
Penar Musaraj	f8b72d9835	DEV: Refactor excluding audio/video URLs from search result blurbs Followup to `580a4a82`	2019-10-31 09:13:24 -04:00
Penar Musaraj	580a4a827b	Exclude audio/video URLs from search result blurbs Displays translatable "[audio]" or "[video]" placeholders instead of ugly (and often long) URLs.	2019-10-30 13:07:16 -04:00
Dan Ungureanu	6bd082feab	FIX: Update mapping between locales and Postgres dictionaries. (#7606 )	2019-05-27 16:52:09 +03:00

47 Commits
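
The duplicate-term cap described in `cd247d5322` (and introduced behind `max_duplicate_search_index_terms` in `07679888c8`) can be illustrated with a minimal sketch. This is not Discourse's indexer, which is written in Ruby and operates on processed search data; the function name `cap_duplicate_terms`, the whitespace tokenization, and the case-insensitive comparison are all assumptions made for illustration only.

```python
from collections import Counter


def cap_duplicate_terms(text: str, max_duplicates: int = 6) -> str:
    """Keep at most `max_duplicates` occurrences of each term, preserving order.

    Hypothetical helper: terms are split on whitespace and compared
    case-insensitively, which is an assumption, not Discourse behavior.
    """
    seen = Counter()
    kept = []
    for term in text.split():
        key = term.lower()
        if seen[key] < max_duplicates:
            seen[key] += 1
            kept.append(term)
    return " ".join(kept)


# The example from the commit message: eight repetitions collapse to six.
print(cap_duplicate_terms("sam sam sam sam sam sam sam sam"))
# prints: sam sam sam sam sam sam
```

Capping repeats this way bounds how much a single repeated word can contribute to a post's weight, which is the effect the commit message describes for bloated posts outranking title matches.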