Commit Graph

64 Commits

Author SHA1 Message Date
Sam dc8249c08a
FEATURE: align with /filter and allow multiple category search (#27440)
This introduces the syntax of

`category:a,b,c` which will search across multiple categories.

Previously there was no way to allow search across a wide selection of
categories.
2024-06-12 16:06:04 +10:00
Régis Hanol 19b7b22627 DEV: fix the fix for flakey test 😓
We should not be clearing **all** the advanced search filters and orders, because some are required by the application.
2024-04-29 21:43:38 +02:00
Jan Cernik 9fb888923d
FIX: Do not show hidden posts in search results (#26800) 2024-04-29 12:32:02 -03:00
Régis Hanol f7a1272fa4 DEV: cleanup custom filters to prevent leaks
Ensures we clean up any custom filters added in the specs to prevent any leaks when running the specs.

Follow up to https://github.com/discourse/discourse/pull/26770#discussion_r1582464760
2024-04-29 16:11:12 +02:00
Alan Guo Xiang Tan e61608d080
FIX: Remap postgres text search proximity operator (#25497)
Why this change?

Since 1dba1aca27, we have been remapping
the `<->` proximity operator in a tsquery to `&`. However, there is
another variant of it which follows the `<N>` pattern. For example, the
following text "end-to-end" will eventually result in the following
tsquery `end-to-end:* <-> end:* <2> end:*` being generated by Postgres.
Before this fix, the tsquery is remapped to `end-to-end:* & end:* <2>
end:*` by us. This is requires the search data which we store to contain
`end` at exactly 2 position apart. Due to the way we limit the
number of duplicates in our search data, the search term may end up not
matching anything. In bd32912c5e, we made
it such that we do not allow any duplicates when indexing a topic's
title. Therefore, search for `end-to-end` against a topic title with
`end-to-end` will never match because our index will only contain one
`end` term.

What does this change do?

We will remap the `<N>` variant of the proximity operator.
2024-02-01 07:20:46 +08:00
Ted Johansson 7e5d2a95ee
DEV: Convert min_trust_level_to_tag_topics to groups (#25273)
We're changing the implementation of trust levels to use groups. Part of this is to have site settings that reference trust levels use groups instead. It converts the min_trust_level_to_tag_topics site setting to tag_topic_allowed_groups.
2024-01-26 13:25:03 +08:00
Ted Johansson 57ea56ee05
DEV: Remove full group refreshes from tests (#25414)
We have all these calls to Group.refresh_automatic_groups! littered throughout the tests. Including tests that are seemingly unrelated to groups. This is because automatic group memberships aren't fabricated when making a vanilla user. There are two places where you'd want to use this:

You have fabricated a user that needs a certain trust level (which is now based on group membership.)
You need the system user to have a certain trust level.
In the first case, we can pass refresh_auto_groups: true to the fabricator instead. This is a more lightweight operation that only considers a single user, instead of all users in all groups.

The second case is no longer a thing after #25400.
2024-01-25 14:28:26 +08:00
Martin Brennan 0e50f88212
DEV: Move min_trust_to_post_embedded_media to group setting (#25238)
c.f. https://meta.discourse.org/t/we-are-changing-giving-access-to-features/283408
2024-01-25 09:50:59 +10:00
Penar Musaraj f2cf5434f3
Revert "DEV: Convert min_trust_level_to_tag_topics to groups (#25258)" (#25262)
This reverts commit c7e3d27624 due to
test failures. This is temporary.
2024-01-15 11:33:47 -05:00
Ted Johansson c7e3d27624
DEV: Convert min_trust_level_to_tag_topics to groups (#25258)
We're changing the implementation of trust levels to use groups. Part of this is to have site settings that reference trust levels use groups instead. It converts the min_trust_level_to_tag_topics site setting to tag_topic_allowed_groups.
2024-01-15 20:59:08 +08:00
Jarek Radosz 694b5f108b
DEV: Fix various rubocop lints (#24749)
These (21 + 3 from previous PRs) are soon to be enabled in rubocop-discourse:

Capybara/VisibilityMatcher
Lint/DeprecatedOpenSSLConstant
Lint/DisjunctiveAssignmentInConstructor
Lint/EmptyConditionalBody
Lint/EmptyEnsure
Lint/LiteralInInterpolation
Lint/NonLocalExitFromIterator
Lint/ParenthesesAsGroupedExpression
Lint/RedundantCopDisableDirective
Lint/RedundantRequireStatement
Lint/RedundantSafeNavigation
Lint/RedundantStringCoercion
Lint/RedundantWithIndex
Lint/RedundantWithObject
Lint/SafeNavigationChain
Lint/SafeNavigationConsistency
Lint/SelfAssignment
Lint/UnreachableCode
Lint/UselessMethodDefinition
Lint/Void

Previous PRs:
Lint/ShadowedArgument
Lint/DuplicateMethods
Lint/BooleanSymbol
RSpec/SpecFilePathSuffix
2023-12-06 23:25:00 +01:00
Martin Brennan 30d5e752d7
DEV: Revert guardian changes (#24742)
I took the wrong approach here, need to rethink.

* Revert "FIX: Use Guardian.basic_user instead of new (anon) (#24705)"

This reverts commit 9057272ee2.

* Revert "DEV: Remove unnecessary method_missing from GuardianUser (#24735)"

This reverts commit a5d4bf6dd2.

* Revert "DEV: Improve Guardian devex (#24706)"

This reverts commit 77b6a038ba.

* Revert "FIX: Introduce Guardian::BasicUser for oneboxing checks (#24681)"

This reverts commit de983796e1.
2023-12-06 16:37:32 +10:00
Martin Brennan 9057272ee2
FIX: Use Guardian.basic_user instead of new (anon) (#24705)
c.f. de983796e1

There will soon be additional login_required checks
for Guardian, and the intent of many checks by automated
systems is better fulfilled by using BasicUser, which
simulates a logged in TL0 forum user, rather than an
anon user.

In some cases the use of anon still makes sense (e.g.
anonymous_cache), and in that case the more explicit
`Guardian.anon_user` is used
2023-12-06 11:56:21 +10:00
Martin Brennan 146da75fd7
FEATURE: Add setting & preference for search sort default order (#24428)
This commit adds a new `search_default_sort_order` site setting,
set to "relevance" by default, that controls the default sort order
for the full page /search route.

If the user changes the order in the dropdown on that page, we remember
their preference automatically, and it takes precedence over the site
setting as a default from then on. This way people who prefer e.g.
Latest Post as their default can make it so.
2023-11-20 10:43:58 +10:00
Daniel Waterworth 6e161d3e75
DEV: Allow fab! without block (#24314)
The most common thing that we do with fab! is:

    fab!(:thing) { Fabricate(:thing) }

This commit adds a shorthand for this which is just simply:

    fab!(:thing)

i.e. If you omit the block, then, by default, you'll get a `Fabricate`d object using the fabricator of the same name.
2023-11-09 16:47:59 -06:00
Sam f25849501d
FEATURE: allow consumers to parse a search string (#23528)
This extends search so it can have consumers that:

1. Can split off "term" from various advanced filters and orders
2. Can build a relation of either order or filter

It also moves a lot of stuff around in the search class for clarity.

Two new APIs are exposed:

`.apply_filter` to apply all the special filters to a posts/topics relation
`.apply_order` to force a particular order (eg: order:latest)

This can then be used by semantic search in Discourse AI
2023-09-12 16:21:01 +10:00
Canapin b3c722f2f7
FIX: `created:@` search keyword for uppercase usernames (#22878)
The filter wasn't working if the username had uppercase letters.
2023-08-02 15:28:17 -04:00
Sam b2e3084205
FEATURE: allow searching for oldest topics (#21715)
In some cases reverse chronological can be very important.

- Oldest post by sam
- Oldest topic by sam

Prior to these new filters we had no way of searching for them.

Now the 2 new orders `order:oldest` and `order:oldest_topic` can be used
to find oldest topics and posts

* Update spec/lib/search_spec.rb

Co-authored-by: Alan Guo Xiang Tan <gxtan1990@gmail.com>

* Update spec/lib/search_spec.rb

Co-authored-by: Alan Guo Xiang Tan <gxtan1990@gmail.com>

---------

Co-authored-by: Alan Guo Xiang Tan <gxtan1990@gmail.com>
2023-05-24 18:26:36 +10:00
Sam bd32912c5e
FIX: do not allow title stuffing to dominate search (#21464)
We were giving topics with repeated words extra weight in search index.
This meant that it was trivial to stuff words into title to dominate in search
given we search for exact title matches first.

The following tweak means that:

`invite invited invites`
and
`invite some stuff`

Both rank the same for title searching.

Titles are short and punchy, duplicating words should not give special
weight.

Requires a full reindex to take effect.
2023-05-10 11:47:58 +10:00
Bianca Nenciu d6534bdb11
DEV: Fix test (#21283)
Apostrophe-like characters (for example, ’ and ') are transformed to the
ASCII apostrophe (') regardless of search_ignore_accents.
2023-05-04 17:04:26 +03:00
Sam c63551d227
FEATURE: search_rank_sort_priorities modifier (#21329)
This new modifier can be used by plugins to modify search ordering.

Specifically plugins such as discourse_solved can amend search ordering
so solved topics bump to the top.

Also correct edge case where low and high sort priority categories did not
order correctly when it came to closed/archived
2023-05-02 16:36:36 +10:00
Sam cd247d5322
FEATURE: Roll out new search optimisations (#20364)
- Reduce duplication of terms in post index from unlimited to 6. This will
result in reduced index size and reduced weighting for posts containing
a huge amount of duplicate terms. (Eg: a post containing "sam sam sam sam
sam sam sam sam", will index as "sam sam sam sam sam sam", only including
the word up to 6 times.) This corrects a flaw where title weighting could
be ignored.

- Prioritize exact matches of words in titles. Our search always performs
a prefix match. However we want to give special weight to exact title matches
meaning that a search for "sum" will find topics such as "the sum of us" vs
"summer in spring".

- Pick up fixes to our search algorithm which are missing from old indexes.
Specifically pick up the fix that indexes URLs properly. (`https://happy.com`
was stemmed to `happi` in keywords and then was not searchable)

see also:

https://meta.discourse.org/t/refinements-to-search-being-tested-on-meta/254158

Indexing will take a while and work in batches, in the background.
2023-02-20 11:53:35 +11:00
Sam 5d28cb709a
FIX: de-prioritize archived topics (#20161)
Previously due to an error archived topics were more prominent in search
than closed topics.

This amends our internal logic to ensure archived topics are bumped down
the list.
2023-02-03 13:23:27 +11:00
Sam 651476e89e
FIX: domain searches not working properly for URLs (#20136)
If a post contains domain with a word that stems to a non prefix single
words will not match it.

For example: in happy.com, `happy` stems to `happi`. Thus searches for happy
will not find URLs with it included.

This bloats the index a tiny bit, but impact is limited.

Will require a full reindex of search to take effect. 

When we are done refining search we can consider a full version bump.
2023-02-03 09:55:28 +11:00
Sam 1dba1aca27
FIX: add support for PG 14 and up (#20137)
Previously to_tsquery would split terms and join with &

In PG 14 terms are split and use <-> which means followed directly by.

In PG 13:

discourse_test=# SELECT to_tsquery('english', '''hello world''');
     to_tsquery
---------------------
 'hello' & 'world'
(1 row)

In PG 14:

discourse_test=# SELECT to_tsquery('english', '''hello world''');
     to_tsquery
---------------------
 'hello' <-> 'world'
(1 row)


Change is very unobtrosive, we simply amend our to_tsquery to behave like
it used to behave and make no use of the `<->` operator


More detail at: https://akorotkov.github.io/blog/2021/05/22/pg-14-query-parsing/

Note that plainto_tsquery used elsewhere in Discourse keeps the exact
same function.

This also corrects a faulty test that was passing by a fluke on older
version of PG
2023-02-03 08:11:25 +11:00
Sam c5345d0e54
FEATURE: prioritize_exact_search_title_match hidden setting (#20089)
The new `prioritize_exact_search_match` can be used to force the search
algorithm to prioritize exact term matches in title when ranking results.

This is scoped narrowly to titles for cases such as a topic titled:

"organisation chart" and a search of "org chart".

If we scoped this wider, all discussion about "org chart" would float to
the top and leave a very common title de-prioritized.

This is a hidden site setting and it has some performance impact due
to double ranking.

That said, performance impact is somewhat mitigated cause ranking on
title alone is a very cheap operation.
2023-01-31 16:34:01 +11:00
Sam 07679888c8
FEATURE: allow restricting duplication in search index (#20062)
* FEATURE: allow restricting duplication in search index

This introduces the site setting `max_duplicate_search_index_terms`.
Using this number we limit the amount of duplication in our search index.

This allows us to more correctly weight title searches, so bloated posts
don't unfairly bump to the top of search results.

This feature is completely disabled by default and behind a site setting

We will experiment with it first. Note entire search index must be rebuilt
for it to take effect.


---------

Co-authored-by: Alan Guo Xiang Tan <gxtan1990@gmail.com>
2023-01-31 12:41:31 +11:00
Alan Guo Xiang Tan 6934edd97c
DEV: Add hidden site setting to configure search ranking weights (#20086)
This site setting is mostly experimental at this point.
2023-01-31 08:57:13 +08:00
Sam 5d669d8aa2
Revert "FEATURE: hidden site setting to disable search prefix matching (#20058)" (#20073)
This reverts commit 64f7b97d08.

Too many side effects for this setting, we have decided to remove it
2023-01-31 07:39:23 +08:00
Sam 64f7b97d08
FEATURE: hidden site setting to disable search prefix matching (#20058)
Many users seems surprised by prefix matching in search leading to
unexpected results.

Over the years we always would return results starting with a search term
and not expect exact matches.

Meaning a search for `abra` would find `abracadabra`

This introduces the Site Setting `enable_search_prefix_matching` which
defaults to true. (behavior unchanged)

We plan to experiment on select sites with exact matches to see if the
results are less surprising
2023-01-30 12:44:40 +08:00
Sérgio Saquetim 0feb9ad341
DEV: Added callback to change the query used to filter groups in search (#19884)
Added plugin registry that will allow adding callbacks that can change the query that is used
to filter groups while running a search.
2023-01-16 15:48:00 -03:00
David Taylor cb932d6ee1
DEV: Apply syntax_tree formatting to `spec/*` 2023-01-09 11:49:28 +00:00
Bianca Nenciu b80765f1f4
DEV: Remove enable_whispers site setting (#19196)
* DEV: Remove enable_whispers site setting

Whispers are enabled as long as there is at least one group allowed to
whisper, see whispers_allowed_groups site setting.

* DEV: Always enable whispers for admins if at least one group is allowed.
2022-12-16 18:42:51 +02:00
Bianca Nenciu 17b7ab0d7b
FIX: Make sure generated tsqueries are valid (#19368)
The tsquery used for searching is generated using both functions from
Ruby and Postgresql (for example, unaccent function). Depending on the
term used, it generated an invalid tsquery. For example "can’t"
generated "''can''t''" instead of "''can''''t''".
2022-12-12 17:57:20 +02:00
Du Jiajun 41e6b516e5
FIX: Support unicode in search filter @username (#18804) 2022-11-16 10:42:37 +01:00
Martin Brennan f5194aadd3
DEV: Remove usages of enable_personal_messages (#18437)
cf. e62e93f83a

This PR also makes it so `bot` (negative ID) and `system` users are always allowed
to send PMs, since the old conditional was just based on `enable_personal_messages`
2022-10-05 10:50:20 +10:00
Jarek Radosz 8fa9f0cf92
DEV: Fix a flaky spec (#18146)
In some cases the topic of the fabricated post can be titled "This is a test topic 777" which matches the search query "#777"
2022-08-31 20:52:57 +02:00
Loïc Guitaut 3eaac56797 DEV: Use proper wording for contexts in specs 2022-08-04 11:05:02 +02:00
Phil Pirozhkov 493d437e79
Add RSpec 4 compatibility (#17652)
* Remove outdated option

04078317ba

* Use the non-globally exposed RSpec syntax

https://github.com/rspec/rspec-core/pull/2803

* Use the non-globally exposed RSpec syntax, cont

https://github.com/rspec/rspec-core/pull/2803

* Comply to strict predicate matchers

See:
 - https://github.com/rspec/rspec-expectations/pull/1195
 - https://github.com/rspec/rspec-expectations/pull/1196
 - https://github.com/rspec/rspec-expectations/pull/1277
2022-07-28 10:27:38 +08:00
Loïc Guitaut 296aad430a DEV: Use `describe` for methods in specs 2022-07-27 16:35:27 +02:00
Krzysztof Kotlarek 09932738e5
FEATURE: whispers available for groups (#17170)
Before, whispers were only available for staff members.

Config has been changed to allow to configure privileged groups with access to whispers. Post migration was added to move from the old setting into the new one.

I considered having a boolean column `whisperer` on user model similar to `admin/moderator` for performance reason. Finally, I decided to keep looking for groups as queries are only done for current user and didn't notice any N+1 queries.
2022-06-30 10:18:12 +10:00
Penar Musaraj 8222810099
FIX: Limits for PM and group header search (#16887)
When searching for PMs or PMs in a group inbox, results in the header search were not being limited to 5 with a "More" link to the full page search. This PR fixes that.

It also simplifies the logic and updates the search API docs to include recently added `in:messages` and `group_messages:groupname` options.
2022-05-24 11:31:24 -04:00
Martin Brennan fcc2e7ebbf
FEATURE: Promote polymorphic bookmarks to default and migrate (#16729)
This commit migrates all bookmarks to be polymorphic (using the
bookmarkable_id and bookmarkable_type) columns. It also deletes
all the old code guarded behind the use_polymorphic_bookmarks setting
and changes that setting to true for all sites and by default for
the sake of plugins.

No data is deleted in the migrations, the old post_id and for_topic
columns for bookmarks will be dropped later on.
2022-05-23 10:07:15 +10:00
Martin Brennan 955d47bbd0
FIX: Use polymorphic bookmarks for in:bookmarks search (#16684)
This commit makes sure the in:bookmarks post advanced
search filter works with polymorphic bookmarks.
2022-05-10 09:08:01 +10:00
Penar Musaraj b266a36967
FEATURE: Add `group_messages:` keyword to advanced search (#16584) 2022-04-28 10:47:40 -04:00
Penar Musaraj eebce8f80a
FEATURE: Add in:messages search modifier (#16567)
This adds `in:messages` as a synonym for `in:personal` and sets it up as our default nomenclature (`in:personal` will still work).
2022-04-26 16:47:01 -04:00
Jarek Radosz be3dceccfa
DEV: Merge two spec files (#16244)
Also reenabled two specs on macOS as they're green now.
2022-03-22 09:23:06 +08:00
Bianca Nenciu 34b4b53bac
FEATURE: Use Postgres unaccent to ignore accents (#16100)
The search_ignore_accents site setting can be used to make the search
indexer remove the accents before indexing the content. The unaccent
function from PostgreSQL is better than Ruby's unicode_normalize(:nfkd).
2022-03-07 23:03:10 +02:00
David Taylor c9dab6fd08
DEV: Automatically require 'rails_helper' in all specs (#16077)
It's very easy to forget to add `require 'rails_helper'` at the top of every core/plugin spec file, and omissions can cause some very confusing/sporadic errors.

By setting this flag in `.rspec`, we can remove the need for `require 'rails_helper'` entirely.
2022-03-01 17:50:50 +00:00
Alan Guo Xiang Tan 930f51e175 FEATURE: Split up text segmentation for Chinese and Japanese.
* Chinese segmenetation will continue to rely on cppjieba
* Japanese segmentation will use our port of TinySegmenter
* Korean currently does not rely on segmentation which was dropped in c677877e4f
* SiteSetting.search_tokenize_chinese_japanese_korean has been split
into SiteSetting.search_tokenize_chinese and
SiteSetting.search_tokenize_japanese respectively
2022-02-07 09:21:14 +08:00