Commit Graph

286 Commits

Author SHA1 Message Date
Guo Xiang Tan d10d296e92 FIX: Search topic title headline being truncated.
Need to apply the `HighlightAll` option in order to avoid topic titles
from truncated in headlines when displaying search results.
2020-12-22 09:09:47 +08:00
Sam 293b243aeb
FEATURE: special shortcut for searching for own posts (#11541)
You can now use `@me` to search for posts created by yourself, this is particularly handy if you have a long username.

`@me rainbow` will find all posts you created with the word rainbow.

Also cleans up test suite so it has no warnings.
2020-12-22 10:46:42 +11:00
Arpit Jalan 29c7655221
FEATURE: allow plugins to preload custom data on search (#11518)
This commit allows discourse-assign plugin to show assigned users on
search result topic list.
2020-12-17 21:59:10 +05:30
Dan Ungureanu 9b7525bb03
FIX: Handle uncaught exception (#11263)
After the search term is parsed for advanced search filters, the term may
become empty. Later, the same term will be passed to Discourse.route_for
which will raise an ArgumentError.

> URI(nil)
ArgumentError: bad argument (expected URI object or URI string)
2020-11-20 11:28:14 +02:00
Roman Rizzi d815b95935
FEATURE: Search filter for searching all PMs on a site for admin. (#11280)
Admins can search all PMS on a site by using the `in:all-pms` advanced filter.
2020-11-19 13:56:19 -03:00
Guo Xiang Tan 68fc2a18b1 FIX: Properly handle quotes and backslash in `Search.set_tsquery_weight_filter` 2020-10-23 08:43:34 +08:00
Guo Xiang Tan 2607bb602e Fix broken spec.
Follow-up to 3c678df942
2020-10-08 10:52:46 +08:00
Sam 3c678df942
PERF: avoid lookbehinds when indexing search (#10862)
* PERF: avoid lookbehinds when indexing search

Previously we used a `EmailCook.url_regexp` this regex used lookbehinds

Unfortunately certain strings could lead to pathological behavior causing
CPU to skyrocket and regex replace to take a very very long time.

EmailCook still needs a fix, but it is less urgent cause it already splits
to single lines. That said we will correct that as well in a seperate PR.

New implementation is far more naive and relies on the extra spaces search
indexer inserts.
2020-10-08 11:40:13 +11:00
Arpit Jalan f7940b1d20
FEATURE: advanced search option for max posts count (#10761)
This commit adds an option to search for max posts count and updates
the UI for posts count search to show a min/max range in single line.
2020-09-28 21:34:16 +05:30
Arpit Jalan 4498c59085 FEATURE: add alias for min_post_count search filter 2020-09-28 16:07:44 +05:30
Arpit Jalan cdf45f4fe6 Update regex for views search filter. 2020-09-24 17:05:55 +05:30
Arpit Jalan 0c5cd0d1ef FEATURE: advanced search filters for view count 2020-09-24 15:22:18 +05:30
Bianca Nenciu 4abbe3d361
FEATURE: Make search filters case insensitive (#10715) 2020-09-23 11:59:42 +03:00
Krzysztof Kotlarek cb58cbbc2c
FEATURE: allow to extend topic_eager_loads in Search (#10625)
This additional interface is required by encrypt plugin
2020-09-14 11:58:28 +10:00
Guo Xiang Tan e6ca1b4326
FIX: Admin search for PMs should only search own PMs.
In c6ceda8c, a bug was introduced where an admin searching for his own
private messages will actually end up searching through all private
messages on the site.

Follow-up to c6ceda8c4e
2020-09-10 11:37:18 +08:00
Dan Ungureanu 38c9c87128
FIX: Add to tags result set only visible tags (#10580) 2020-09-02 13:24:40 +03:00
Guo Xiang Tan 40c6d90df3 PERF: Create a partial regular post_search_data index on large sites.
With the addition of `PostSearchData#private_message`, a partial
index consisting of only search data from regular posts can be created.
The partial index helps to speed up searches on large sites since PG
will not have to do an index scan on the entire search data index which
has shown to be a bottle neck.
2020-08-27 13:42:00 +08:00
siriwatknp 1a2800ad07 fix: 🐛 category & tag search regex to support thai character 2020-08-25 16:12:26 +08:00
Guo Xiang Tan 05174df5c0
FIX: Restrict `personal_messages:` advanced search filter to admin.
The filter noops if an incorrect username is passed. This filter is not
exposed as part of the UI but is only used when an admin transitions
from a search within a user's personal messages to the full page search.

Follow-up to 4b30799054.
2020-08-24 13:53:48 +08:00
Guo Xiang Tan c6ceda8c4e
PERF: Avoid extra subquery when searching within PMs for normal user.
Note the following query being generated where the filter for a user's
private messages is executed twice.

```sql
SELECT "posts"."id", "posts"."user_id", "posts"."topic_id", "posts"."post_number", "posts"."raw", "posts"."cooked", "posts"."created_at", "posts"."updated_at", "posts"."reply_to_post_number", "posts"."reply_count", "posts"."quote_count", "posts"."deleted_at", "posts"."off_topic_count", "posts"."like_count", "posts"."incoming_link_count", "posts"."bookmark_count", "posts"."score", "posts"."reads", "posts"."post_type", "posts"."sort_order", "posts"."last_editor_id", "posts"."hidden", "posts"."hidden_reason_id", "posts"."notify_moderators_count", "posts"."spam_count", "posts"."illegal_count", "posts"."inappropriate_count", "posts"."last_version_at", "posts"."user_deleted", "posts"."reply_to_user_id", "posts"."percent_rank", "posts"."notify_user_count", "posts"."like_score", "posts"."deleted_by_id", "posts"."edit_reason", "posts"."word_count", "posts"."version", "posts"."cook_method", "posts"."wiki", "posts"."baked_at", "posts"."baked_version", "posts"."hidden_at", "posts"."self_edits", "posts"."reply_quoted", "posts"."via_email", "posts"."raw_email", "posts"."public_version", "posts"."action_code", "posts"."locked_by_id", "posts"."image_upload_id", (TS_RANK_CD(
  post_search_data.search_data,
  TO_TSQUERY('english', '''test'':*ABCD'),
  0|32
)
 * (
  CASE categories.search_priority
  WHEN 2
  THEN 0.6
  WHEN 3
  THEN 0.8
  WHEN 4
  THEN 1.2
  WHEN 5
  THEN 1.4
  ELSE
    CASE WHEN topics.closed
    THEN 0.9
    ELSE 1
    END
  END
)
) rank, topics.bumped_at topic_bumped_at FROM "posts" INNER JOIN "post_search_data" ON "post_search_data"."post_id" = "posts"."id" INNER JOIN "topics" ON "topics"."id" = "posts"."topic_id" AND ("topics"."deleted_at" IS NULL) LEFT JOIN categories ON categories.id = topics.category_id WHERE ("posts"."deleted_at" IS NULL) AND "posts"."post_type" IN (1, 2, 3) AND (topics.visible) AND (topics.archetype = 'private_message' AND post_search_data.private_message) AND (posts.topic_id IN (SELECT topic_id
FROM topic_allowed_users
WHERE user_id = 99999
UNION ALL
SELECT tg.topic_id
FROM topic_allowed_groups tg
JOIN group_users gu ON gu.user_id = 99999 AND gu.group_id = tg.group_id
)) AND (post_search_data.search_data @@ TO_TSQUERY('english', '''test'':*ABCD')) AND (posts.topic_id IN (SELECT topic_id
FROM topic_allowed_users
WHERE user_id = 99999
UNION ALL
SELECT tg.topic_id
FROM topic_allowed_groups tg
JOIN group_users gu ON gu.user_id = 99999 AND gu.group_id = tg.group_id
)) AND ((categories.id IS NULL) OR (NOT categories.read_restricted) OR (categories.id IN (999999))) ORDER BY rank DESC, topic_bumped_at DESC
```
2020-08-24 13:49:43 +08:00
Guo Xiang Tan 2f043dc89a
Fix lint. 2020-08-24 12:38:46 +08:00
Guo Xiang Tan 4b30799054
FIX: Correct `personal_messages:<username>` advanced search filter.
Renamed from `private_messages` to `personal_messages` without
deprecation because the `private_messages` advanced search filter never
worked in the first place when it was implemented.
2020-08-24 11:54:30 +08:00
Guo Xiang Tan 106a2f58a2
DEV: Drop support for deprecated `in:private` search filter. 2020-08-21 17:18:39 +08:00
Guo Xiang Tan 0684118008
DEV: Remove array_agg from search orders that does not need it. 2020-08-21 14:39:07 +08:00
Guo Xiang Tan 92b7fe4c62
PERF: Add partial index for non-pm search. 2020-08-18 15:55:08 +08:00
Guo Xiang Tan 248bebb8cd
PERF: Remove extra subquery in search.
I also noticed that removing the subquery helps the planner to plan
better.
2020-08-17 13:52:12 +08:00
Guo Xiang Tan 93f8396b4b
FIX: Limit PG headline based search blurb generation to 200 characters.
* Recovers omission characters '...' in blurb as well.
2020-08-12 15:34:27 +08:00
Guo Xiang Tan 053cbe3112
PERF: Limit characters used to generate headline for search blurb.
We determined using the following benchmark script that limiting to 2500 chars would mean a maximum of
25ms spent generating headlines.

```
require 'benchmark/ips'

string = <<~STRING
Far far away, behind the word mountains...
STRING

def sql_excerpt(string, l = 1000000)
  DB.query_single(<<~SQL)
  SELECT TS_HEADLINE('english', left('#{string}', #{l}), PLAINTO_TSQUERY('mountains'))
  SQL
end

def ruby_excerpt(string)
  output = DB.query_single("SELECT '#{string}'")[0]
  Search::GroupedSearchResults::TextHelper.excerpt(output, 'mountains', radius: 100)
end

puts "Ruby Excerpt: #{ruby_excerpt(string)}"
puts "SQL Excerpt: #{sql_excerpt(string)}"
puts

Benchmark.ips do |x|
  x.time = 10

  [1000, 2500, 5000, 10000, 20000, 50000].each do |l|
    short_string = string[0..l]

    x.report("ts_headline excerpt #{l}") do
      sql_excerpt(short_string, l)
    end

    x.report("actionview excerpt #{l}") do
      ruby_excerpt(short_string)
    end
  end

  x.compare!
end
```

```
actionview excerpt 1000:    20570.7 i/s
actionview excerpt 2500:    17863.1 i/s - 1.15x  (± 0.00) slower
actionview excerpt 5000:    14228.9 i/s - 1.45x  (± 0.00) slower
actionview excerpt 10000:    10906.2 i/s - 1.89x  (± 0.00) slower
actionview excerpt 20000:     6255.0 i/s - 3.29x  (± 0.00) slower
ts_headline excerpt 1000:     4337.5 i/s - 4.74x  (± 0.00) slower
actionview excerpt 50000:     3222.7 i/s - 6.38x  (± 0.00) slower
ts_headline excerpt 2500:     2240.4 i/s - 9.18x  (± 0.00) slower
ts_headline excerpt 5000:     1258.7 i/s - 16.34x  (± 0.00) slower
ts_headline excerpt 10000:      667.2 i/s - 30.83x  (± 0.00) slower
ts_headline excerpt 20000:      348.7 i/s - 58.98x  (± 0.00) slower
ts_headline excerpt 50000:      131.9 i/s - 155.91x  (± 0.00) slower
```
2020-08-07 14:36:52 +08:00
Guo Xiang Tan e60c74d3c1
FEATURE: Use PG `ts_headline` for highlighting topic title in search. 2020-08-07 12:43:09 +08:00
Krzysztof Kotlarek 12a00d6dc5
FEATURE: add advanced order to search (#10385)
Similar to `advanced_filter` I introduced `advanced_order`.

I needed a new option because default orders are evaluated after advanced_filter so I couldn't use it.

Also, that part is a little bit more generic
```
elsif word =~ /order:\w+/
  @order = word.gsub('order:', '').to_sym
nil
```

After those changes, I can use them in plugins in this way:
```
Search.advanced_order(:votes) do |posts|
  posts.reorder("COALESCE((SELECT dvvc.counter FROM discourse_voting_vote_counters dvvc WHERE dvvc.topic_id = subquery.topic_id), 0) DESC")
end
```
2020-08-07 12:47:00 +10:00
Guo Xiang Tan ab2b6f8dea
FIX: Specify config when generating tsquery using `ts_headline`. 2020-08-07 10:21:14 +08:00
Guo Xiang Tan 2193d02433
PERF: Use PG headlines for blurb generation and highlighting for search. 2020-08-06 14:56:29 +08:00
Guo Xiang Tan 3b08b15855
PERF: Remove one extra call to Redis when searching. 2020-08-04 14:02:02 +08:00
Guo Xiang Tan 597d542c33
FIX: Improve `Topic.similar_to` with better `Topic#title` matches.
This changes PG text search to only match the given title against
lexemes that are formed from the title. Likewise, the given raw will
only be matched against lexemes that are formed from the post's raw.
2020-07-28 12:00:27 +08:00
Guo Xiang Tan 181c4eb760 PERF: Avoid parsing `Post#cooked` with Nokogiri for every search. 2020-07-24 10:43:09 +08:00
Guo Xiang Tan af87911178
FIX: `in:title` search should only search through topic first posts. 2020-07-16 12:21:19 +08:00
Guo Xiang Tan 5bf0a0893b
FIX: Search by relevance may return incorrect post number.
Follow up to d8c796bc4.

Note that his change increases query time by around 40% in the following
benchmark against `dev.discourse.org` but this is a tradeoff that has to be taken so that relevance
search is accurate.

```
require 'benchmark/ips'

Benchmark.ips do |x|
  x.config(time: 10, warmup: 2)

  x.report("current aggregate search query") do
    DB.exec <<~SQL
    SELECT "posts"."id", "posts"."user_id", "posts"."topic_id", "posts"."post_number", "posts"."raw", "posts"."cooked", "posts"."created_at", "posts"."updated_at", "posts"."reply_to_post_number", "posts"."reply_count", "posts"."quote_count", "posts"."deleted_at", "posts"."off_topic_count", "posts"."like_count", "posts"."incoming_link_count", "posts"."bookmark_count", "posts"."score", "posts"."reads", "posts"."post_type", "posts"."sort_order", "posts"."last_editor_id", "posts"."hidden", "posts"."hidden_reason_id", "posts"."notify_moderators_count", "posts"."spam_count", "posts"."illegal_count", "posts"."inappropriate_count", "posts"."last_version_at", "posts"."user_deleted", "posts"."reply_to_user_id", "posts"."percent_rank", "posts"."notify_user_count", "posts"."like_score", "posts"."deleted_by_id", "posts"."edit_reason", "posts"."word_count", "posts"."version", "posts"."cook_method", "posts"."wiki", "posts"."baked_at", "posts"."baked_version", "posts"."hidden_at", "posts"."self_edits", "posts"."reply_quoted", "posts"."via_email", "posts"."raw_email", "posts"."public_version", "posts"."action_code", "posts"."locked_by_id", "posts"."image_upload_id" FROM "posts" JOIN (SELECT *, row_number() over() row_number FROM (SELECT topics.id, min(posts.post_number) post_number FROM "posts" INNER JOIN "post_search_data" ON "post_search_data"."post_id" = "posts"."id" INNER JOIN "topics" ON "topics"."id" = "posts"."topic_id" AND ("topics"."deleted_at" IS NULL) LEFT JOIN categories ON categories.id = topics.category_id WHERE ("posts"."deleted_at" IS NULL) AND "posts"."post_type" IN (1, 2, 3, 4) AND (topics.visible) AND (topics.archetype <> 'private_message') AND (post_search_data.search_data @@ TO_TSQUERY('english', '''postgres'':*ABCD')) AND (categories.id NOT IN (
      SELECT categories.id WHERE categories.search_priority = 1
    )
    ) AND ((categories.id IS NULL) OR (NOT categories.read_restricted)) GROUP BY topics.id ORDER BY MAX((
      TS_RANK_CD(
        post_search_data.search_data,
        TO_TSQUERY('english', '''postgres'':*ABCD'),
        1|32
      ) *
      (
        CASE categories.search_priority
        WHEN 2
        THEN 0.6
        WHEN 3
        THEN 0.8
        WHEN 4
        THEN 1.2
        WHEN 5
        THEN 1.4
        ELSE
          CASE WHEN topics.closed
          THEN 0.9
          ELSE 1
          END
        END
      )
    )
    ) DESC, topics.bumped_at DESC LIMIT 51 OFFSET 0) xxx) x ON x.id = posts.topic_id AND x.post_number = posts.post_number WHERE ("posts"."deleted_at" IS NULL) ORDER BY row_number;
    SQL
  end

  x.report("current aggregate search query with proper ranking") do
    DB.exec <<~SQL
    SELECT "posts"."id", "posts"."user_id", "posts"."topic_id", "posts"."post_number", "posts"."raw", "posts"."cooked", "posts"."created_at", "posts"."updated_at", "posts"."reply_to_post_number", "posts"."reply_count", "posts"."quote_count", "posts"."deleted_at", "posts"."off_topic_count", "posts"."like_count", "posts"."incoming_link_count", "posts"."bookmark_count", "posts"."score", "posts"."reads", "posts"."post_type", "posts"."sort_order", "posts"."last_editor_id", "posts"."hidden", "posts"."hidden_reason_id", "posts"."notify_moderators_count", "posts"."spam_count", "posts"."illegal_count", "posts"."inappropriate_count", "posts"."last_version_at", "posts"."user_deleted", "posts"."reply_to_user_id", "posts"."percent_rank", "posts"."notify_user_count", "posts"."like_score", "posts"."deleted_by_id", "posts"."edit_reason", "posts"."word_count", "posts"."version", "posts"."cook_method", "posts"."wiki", "posts"."baked_at", "posts"."baked_version", "posts"."hidden_at", "posts"."self_edits", "posts"."reply_quoted", "posts"."via_email", "posts"."raw_email", "posts"."public_version", "posts"."action_code", "posts"."locked_by_id", "posts"."image_upload_id" FROM "posts" JOIN (SELECT *, row_number() over() row_number FROM (SELECT subquery.topic_id id, (ARRAY_AGG(subquery.post_number ORDER BY rank DESC, bumped_at DESC))[1] post_number, MAX(subquery.rank) rank, MAX(subquery.bumped_at) bumped_at FROM (SELECT "posts"."id", "posts"."user_id", "posts"."topic_id", "posts"."post_number", "posts"."raw", "posts"."cooked", "posts"."created_at", "posts"."updated_at", "posts"."reply_to_post_number", "posts"."reply_count", "posts"."quote_count", "posts"."deleted_at", "posts"."off_topic_count", "posts"."like_count", "posts"."incoming_link_count", "posts"."bookmark_count", "posts"."score", "posts"."reads", "posts"."post_type", "posts"."sort_order", "posts"."last_editor_id", "posts"."hidden", "posts"."hidden_reason_id", "posts"."notify_moderators_count", "posts"."spam_count", "posts"."illegal_count", "posts"."inappropriate_count", "posts"."last_version_at", "posts"."user_deleted", "posts"."reply_to_user_id", "posts"."percent_rank", "posts"."notify_user_count", "posts"."like_score", "posts"."deleted_by_id", "posts"."edit_reason", "posts"."word_count", "posts"."version", "posts"."cook_method", "posts"."wiki", "posts"."baked_at", "posts"."baked_version", "posts"."hidden_at", "posts"."self_edits", "posts"."reply_quoted", "posts"."via_email", "posts"."raw_email", "posts"."public_version", "posts"."action_code", "posts"."locked_by_id", "posts"."image_upload_id", (
      TS_RANK_CD(
        post_search_data.search_data,
        TO_TSQUERY('english', '''postgres'':*ABCD'),
        1|32
      ) *
      (
        CASE categories.search_priority
        WHEN 2
        THEN 0.6
        WHEN 3
        THEN 0.8
        WHEN 4
        THEN 1.2
        WHEN 5
        THEN 1.4
        ELSE
          CASE WHEN topics.closed
          THEN 0.9
          ELSE 1
          END
        END
      )
    )
     rank, topics.bumped_at bumped_at FROM "posts" INNER JOIN "post_search_data" ON "post_search_data"."post_id" = "posts"."id" INNER JOIN "topics" ON "topics"."id" = "posts"."topic_id" AND ("topics"."deleted_at" IS NULL) LEFT JOIN categories ON categories.id = topics.category_id WHERE ("posts"."deleted_at" IS NULL) AND "posts"."post_type" IN (1, 2, 3, 4) AND (topics.visible) AND (topics.archetype <> 'private_message') AND (post_search_data.search_data @@ TO_TSQUERY('english', '''postgres'':*ABCD')) AND (categories.id NOT IN (
      SELECT categories.id WHERE categories.search_priority = 1
    )
    ) AND ((categories.id IS NULL) OR (NOT categories.read_restricted))) subquery GROUP BY subquery.topic_id ORDER BY rank DESC, bumped_at DESC LIMIT 51 OFFSET 0) xxx) x ON x.id = posts.topic_id AND x.post_number = posts.post_number WHERE ("posts"."deleted_at" IS NULL) ORDER BY row_number;
    SQL
  end

  x.compare!
end
```

```
Warming up --------------------------------------
current aggregate search query
                         1.000  i/100ms
current aggregate search query with proper ranking
                         1.000  i/100ms
Calculating -------------------------------------
current aggregate search query
                         18.040  (± 0.0%) i/s -    181.000  in  10.035241s
current aggregate search query with proper ranking
                         12.992  (± 0.0%) i/s -    130.000  in  10.007214s

Comparison:
current aggregate search query:       18.0 i/s
current aggregate search query with proper ranking:       13.0 i/s - 1.39x  (± 0.00) slower
```
2020-07-15 11:45:56 +08:00
Guo Xiang Tan 2196d0b9ae
FIX: Strip query from URLs when indexing for search.
Indexing query strings in URLS produces inconsistent results in PG and
pollutes the search data for really little gain.

The following seems to work as expected...

```
discourse_development=# SELECT TO_TSVECTOR('https://www.discourse.org?test=2&test2=3');
                     to_tsvector
------------------------------------------------------
 '2':3 '3':5 'test':2 'test2':4 'www.discourse.org':1
```

However, once a path is present

```
discourse_development=# SELECT TO_TSVECTOR('https://www.discourse.org/latest?test=2&test2=3');
                                         to_tsvector
----------------------------------------------------------------------------------------------
 '/latest?test=2&test2=3':3 'www.discourse.org':2 'www.discourse.org/latest?test=2&test2=3':1
```

The lexeme contains both the path and the query string.
2020-07-14 15:32:40 +08:00
Guo Xiang Tan 5c31216aea
FIX: Search for whole URLs wasn't working. 2020-07-14 15:31:48 +08:00
Guo Xiang Tan d8c796bc44
FIX: Ensure that aggregating search shows the post with the higest rank.
Previously, we would only take either the `MIN` or `MAX` for
`post_number` during aggregation meaning that the ranking is not
considered.

```
require 'benchmark/ips'

Benchmark.ips do |x|
  x.config(time: 10, warmup: 2)

  x.report("current aggregate search query") do
    DB.exec <<~SQL
    SELECT "posts"."id", "posts"."user_id", "posts"."topic_id", "posts"."post_number", "posts"."raw", "posts"."cooked", "posts"."created_at", "posts"."updated_at", "posts"."reply_to_post_number", "posts"."reply_count", "posts"."quote_count", "posts"."deleted_at", "posts"."off_topic_count", "posts"."like_count", "posts"."incoming_link_count", "posts"."bookmark_count", "posts"."score", "posts"."reads", "posts"."post_type", "posts"."sort_order", "posts"."last_editor_id", "posts"."hidden", "posts"."hidden_reason_id", "posts"."notify_moderators_count", "posts"."spam_count", "posts"."illegal_count", "posts"."inappropriate_count", "posts"."last_version_at", "posts"."user_deleted", "posts"."reply_to_user_id", "posts"."percent_rank", "posts"."notify_user_count", "posts"."like_score", "posts"."deleted_by_id", "posts"."edit_reason", "posts"."word_count", "posts"."version", "posts"."cook_method", "posts"."wiki", "posts"."baked_at", "posts"."baked_version", "posts"."hidden_at", "posts"."self_edits", "posts"."reply_quoted", "posts"."via_email", "posts"."raw_email", "posts"."public_version", "posts"."action_code", "posts"."locked_by_id", "posts"."image_upload_id" FROM "posts" JOIN (SELECT *, row_number() over() row_number FROM (SELECT topics.id, min(posts.post_number) post_number FROM "posts" INNER JOIN "post_search_data" ON "post_search_data"."post_id" = "posts"."id" INNER JOIN "topics" ON "topics"."id" = "posts"."topic_id" AND ("topics"."deleted_at" IS NULL) LEFT JOIN categories ON categories.id = topics.category_id WHERE ("posts"."deleted_at" IS NULL) AND "posts"."post_type" IN (1, 2, 3, 4) AND (topics.visible) AND (topics.archetype <> 'private_message') AND (post_search_data.search_data @@ TO_TSQUERY('english', '''postgres'':*ABCD')) AND (categories.id NOT IN (
      SELECT categories.id WHERE categories.search_priority = 1
    )
    ) AND ((categories.id IS NULL) OR (NOT categories.read_restricted)) GROUP BY topics.id ORDER BY MAX((
      TS_RANK_CD(
        post_search_data.search_data,
        TO_TSQUERY('english', '''postgres'':*ABCD'),
        1|32
      ) *
      (
        CASE categories.search_priority
        WHEN 2
        THEN 0.6
        WHEN 3
        THEN 0.8
        WHEN 4
        THEN 1.2
        WHEN 5
        THEN 1.4
        ELSE
          CASE WHEN topics.closed
          THEN 0.9
          ELSE 1
          END
        END
      )
    )
    ) DESC, topics.bumped_at DESC LIMIT 51 OFFSET 0) xxx) x ON x.id = posts.topic_id AND x.post_number = posts.post_number WHERE ("posts"."deleted_at" IS NULL) ORDER BY row_number;
    SQL
  end

  x.report("current aggregate search query with proper ranking") do
    DB.exec <<~SQL
    SELECT "posts"."id", "posts"."user_id", "posts"."topic_id", "posts"."post_number", "posts"."raw", "posts"."cooked", "posts"."created_at", "posts"."updated_at", "posts"."reply_to_post_number", "posts"."reply_count", "posts"."quote_count", "posts"."deleted_at", "posts"."off_topic_count", "posts"."like_count", "posts"."incoming_link_count", "posts"."bookmark_count", "posts"."score", "posts"."reads", "posts"."post_type", "posts"."sort_order", "posts"."last_editor_id", "posts"."hidden", "posts"."hidden_reason_id", "posts"."notify_moderators_count", "posts"."spam_count", "posts"."illegal_count", "posts"."inappropriate_count", "posts"."last_version_at", "posts"."user_deleted", "posts"."reply_to_user_id", "posts"."percent_rank", "posts"."notify_user_count", "posts"."like_score", "posts"."deleted_by_id", "posts"."edit_reason", "posts"."word_count", "posts"."version", "posts"."cook_method", "posts"."wiki", "posts"."baked_at", "posts"."baked_version", "posts"."hidden_at", "posts"."self_edits", "posts"."reply_quoted", "posts"."via_email", "posts"."raw_email", "posts"."public_version", "posts"."action_code", "posts"."locked_by_id", "posts"."image_upload_id" FROM "posts" JOIN (SELECT *, row_number() over() row_number FROM (SELECT subquery.topic_id id, (ARRAY_AGG(subquery.post_number))[1] post_number, MAX(subquery.rank) rank, MAX(subquery.bumped_at) bumped_at FROM (SELECT "posts"."id", "posts"."user_id", "posts"."topic_id", "posts"."post_number", "posts"."raw", "posts"."cooked", "posts"."created_at", "posts"."updated_at", "posts"."reply_to_post_number", "posts"."reply_count", "posts"."quote_count", "posts"."deleted_at", "posts"."off_topic_count", "posts"."like_count", "posts"."incoming_link_count", "posts"."bookmark_count", "posts"."score", "posts"."reads", "posts"."post_type", "posts"."sort_order", "posts"."last_editor_id", "posts"."hidden", "posts"."hidden_reason_id", "posts"."notify_moderators_count", "posts"."spam_count", "posts"."illegal_count", "posts"."inappropriate_count", "posts"."last_version_at", "posts"."user_deleted", "posts"."reply_to_user_id", "posts"."percent_rank", "posts"."notify_user_count", "posts"."like_score", "posts"."deleted_by_id", "posts"."edit_reason", "posts"."word_count", "posts"."version", "posts"."cook_method", "posts"."wiki", "posts"."baked_at", "posts"."baked_version", "posts"."hidden_at", "posts"."self_edits", "posts"."reply_quoted", "posts"."via_email", "posts"."raw_email", "posts"."public_version", "posts"."action_code", "posts"."locked_by_id", "posts"."image_upload_id", (
      TS_RANK_CD(
        post_search_data.search_data,
        TO_TSQUERY('english', '''postgres'':*ABCD'),
        1|32
      ) *
      (
        CASE categories.search_priority
        WHEN 2
        THEN 0.6
        WHEN 3
        THEN 0.8
        WHEN 4
        THEN 1.2
        WHEN 5
        THEN 1.4
        ELSE
          CASE WHEN topics.closed
          THEN 0.9
          ELSE 1
          END
        END
      )
    )
     rank, topics.bumped_at bumped_at FROM "posts" INNER JOIN "post_search_data" ON "post_search_data"."post_id" = "posts"."id" INNER JOIN "topics" ON "topics"."id" = "posts"."topic_id" AND ("topics"."deleted_at" IS NULL) LEFT JOIN categories ON categories.id = topics.category_id WHERE ("posts"."deleted_at" IS NULL) AND "posts"."post_type" IN (1, 2, 3, 4) AND (topics.visible) AND (topics.archetype <> 'private_message') AND (post_search_data.search_data @@ TO_TSQUERY('english', '''postgres'':*ABCD')) AND (categories.id NOT IN (
      SELECT categories.id WHERE categories.search_priority = 1
    )
    ) AND ((categories.id IS NULL) OR (NOT categories.read_restricted))) subquery GROUP BY subquery.topic_id ORDER BY rank DESC, bumped_at DESC LIMIT 51 OFFSET 0) xxx) x ON x.id = posts.topic_id AND x.post_number = posts.post_number WHERE ("posts"."deleted_at" IS NULL) ORDER BY row_number;
    SQL
  end

  x.compare!
end
```

```
Warming up --------------------------------------
current aggregate search query
                         1.000  i/100ms
current aggregate search query with proper ranking
                         1.000  i/100ms
Calculating -------------------------------------
current aggregate search query
                         17.726  (± 0.0%) i/s -    178.000  in  10.045107s
current aggregate search query with proper ranking
                         17.802  (± 0.0%) i/s -    178.000  in  10.002230s

Comparison:
current aggregate search query with proper ranking:       17.8 i/s
current aggregate search query:       17.7 i/s - 1.00x  (± 0.00) slower
```
2020-07-14 13:39:13 +08:00
Guo Xiang Tan ce39733b1a
FIX: Incorrect search blurb when advanced search filters are used take2
Also remove include_blurbs attribute which isn't used.
2020-07-14 11:50:40 +08:00
David Taylor cb1f891392
Revert "FIX: Incorrect search blurb when advanced search filters are used."
This change was causing advanced search filters to disappear from the search input

This reverts commit 2e1eafae06.
2020-07-09 16:19:18 +01:00
Guo Xiang Tan 2e1eafae06
FIX: Incorrect search blurb when advanced search filters are used. 2020-07-08 11:59:49 +08:00
Guo Xiang Tan 6bab2acc9f
Fix typo.
Follow up to af52df2d
2020-07-02 14:23:10 +08:00
Guo Xiang Tan af52df2d96
DEV: Add hidden site setting for PG search ranking normalization. 2020-07-02 14:11:18 +08:00
Régis Hanol 860deeb072 FIX: identify slug-less topic urls everywhere
In 91c89df6, I fixed the onebox to support local topics with a slug-less URL.
This commit fixes all the other spots (search, topic links and user badges) where we look up for a local topic.

Follow-up-to: 91c89df6
2020-06-29 12:31:20 +02:00
Sam Saffron 3cb41d5429
PERF: stop adding more topics to search when not needed
The logic of adding additional search results does not seem to be
needed anymore.

It appears to be a relic of an old implementation.

This saves an entire search query for every search made.
2020-06-25 12:31:12 +10:00
Vinoth Kannan ce1491e830
UX: remove `in:unpinned` filter from advanced search page. (#9911) 2020-05-29 00:47:28 +05:30
Sam Saffron 862773ec83
FIX: do not remove stop words when using English locale
PG already handles English stop words, the list in cppjieba is
bigger than the list PG uses, which in turn causes confusion cause
words such as "volume" are stripped using cppijieba stop word list

We will follow up with another commit here to apply the Chinese
word stopwords, but for now to eliminate the confusion we are
skipping applying the stopword list when the dictionary in PG is
in English.
2020-05-18 10:54:56 +10:00
David Taylor 03818e642a
FEATURE: Include optimized thumbnails for topics (#9215)
This introduces new APIs for obtaining optimized thumbnails for topics. There are a few building blocks required for this:

- Introduces new `image_upload_id` columns on the `posts` and `topics` table. This replaces the old `image_url` column, which means that thumbnails are now restricted to uploads. Hotlinked thumbnails are no longer possible. In normal use (with pull_hotlinked_images enabled), this has no noticeable impact

- A migration attempts to match existing urls to upload records. If a match cannot be found then the posts will be queued for rebake

- Optimized thumbnails are generated during post_process_cooked. If thumbnails are missing when serializing a topic list, then a sidekiq job is queued

- Topic lists and topics now include a `thumbnails` key, which includes all the available images:
   ```
   "thumbnails": [
   {
     "max_width": null,
     "max_height": null,
     "url": "//example.com/original-image.png",
     "width": 1380,
     "height": 1840
   },
   {
     "max_width": 1024,
     "max_height": 1024,
     "url": "//example.com/optimized-image.png",
     "width": 768,
     "height": 1024
   }
   ]
  ```

- Themes can request additional thumbnail sizes by using a modifier in their `about.json` file:
   ```
    "modifiers": {
      "topic_thumbnail_sizes": [
        [200, 200],
        [800, 800]
      ],
      ...
  ```
  Remember that these are generated asynchronously, so your theme should include logic to fallback to other available thumbnails if your requested size has not yet been generated

- Two new raw plugin outlets are introduced, to improve the customisability of the topic list. `topic-list-before-columns` and `topic-list-before-link`
2020-05-05 09:07:50 +01:00