This commit optimises the database query generated by
`TopicsFilter#filter_categories` when the `-category:*` filter is used.
Previously, the method will add the `topics.category_id NOT IN
(<category ids to be excluded>)` filter to the resulting query. However,
we noticed that the performance of the query degrades as the number of
rows in the `topics` table grow and when the number of category ids to be
excluded is large.
Sample of query we ran on a large database in production to demonstrate
the improvement:
Before:
```
SELECT topics.id FROM topics WHERE topics.category_id NOT IN (83, 136, 149, 143, 153, 165, 161, 123, 155, 163, 144, 134, 69, 135, 158, 141, 151, 160, 131, 133, 89, 104, 150, 147, 132, 145, 108, 146, 122, 100, 128, 154, 95, 102, 140, 139, 88, 91, 87) ORDER BY topics.id DESC LIMIT 5;
QUERY PLAN
------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
Limit (cost=27795.34..27795.34 rows=1 width=4) (actual time=29.317..30.165 rows=5 loops=1)
-> Sort (cost=27795.34..27795.34 rows=1 width=4) (actual time=29.316..30.163 rows=5 loops=1)
Sort Key: id DESC
Sort Method: top-N heapsort Memory: 25kB
-> Gather (cost=1000.10..27795.33 rows=1 width=4) (actual time=0.187..26.132 rows=73478 loops=1)
Workers Planned: 2
Workers Launched: 2
-> Parallel Seq Scan on topics (cost=0.10..26795.23 rows=1 width=4) (actual time=0.013..22.252 rows=24493 loops=3)
Filter: (category_id <> ALL ('{83,136,149,143,153,165,161,123,155,163,144,134,69,135,158,141,151,160,131,133,89,104,150,147,132,145,108,146,122,100,128,154,95,102,140,139,88,91,87}'::integer[]))
Rows Removed by Filter: 77276
Planning Time: 0.140 ms
Execution Time: 30.181 ms
```
After:
```
SELECT topics.id FROM topics WHERE NOT EXISTS (
SELECT 1
FROM unnest(array[83, 136, 149, 143, 153, 165, 161, 123, 155, 163, 144, 134, 69, 135, 158, 141, 151, 160, 131, 133, 89, 104, 150, 147, 132, 145, 108, 146, 122, 100, 128, 154, 95, 102, 140, 139, 88, 91, 87]) AS excluded_categories(category_id)
WHERE topics.category_id IS NULL OR excluded_categories.category_id = topics.category_id
) ORDER BY topics.id DESC LIMIT 5 ;
QUERY PLAN
-----------------------------------------------------------------------------------------------------------------------------------------------------------
Limit (cost=0.42..13.52 rows=5 width=4) (actual time=0.028..0.110 rows=5 loops=1)
-> Nested Loop Anti Join (cost=0.42..179929.62 rows=68715 width=4) (actual time=0.027..0.109 rows=5 loops=1)
Join Filter: ((topics.category_id IS NULL) OR (excluded_categories.category_id = topics.category_id))
Rows Removed by Join Filter: 239
-> Index Scan Backward using forum_threads_pkey on topics (cost=0.42..108925.71 rows=305301 width=8) (actual time=0.012..0.062 rows=44 loops=1)
-> Function Scan on unnest excluded_categories (cost=0.00..0.39 rows=39 width=4) (actual time=0.000..0.001 rows=6 loops=44)
Planning Time: 0.126 ms
Execution Time: 0.124 ms
(8 rows)
```
A category's slug can be encoded when
`SiteSetting.slug_generation_method` has been set to "encoded". As a
result, we have to support non ASCII characters as well.
This commit adds support for excluding categories when using the
`category:` filter with the `-` prefix. For example,
`-category:category-slug` will exclude all topics that belong to the
category with slug "category-slug" and all of its sub-categories.
To only exclude a particular category and not all of its sub-categories,
the `-` prefix can be used with the `=` prefix. For example,
`-=category:category-slug` will only exclude topics that belong to the
category with slug "category-slug". Topics in the sub-categories of
"category-slug" will still be included.
Specifying more than two tag names when using the `tag:` filter was not
working because of a bug in the code where only the first two value in
the `tag:` filter was being selected.
This allows multiple ordering to be specified by using a comma seperated string.
For example, `order:created,views` would order the topics by
`Topic#created_at` and then `Topic#views.
This commit adds support for the following ordering filters:
1. `order:activity` which orders the topics by `Topic#bumped_at` in descending order
2. `order:activity-asc` which orders the topics by `Topic#bumped_at` in ascending order
3. `order:latest-post` which orders the topics by `Topic#last_posted_at` in descending order
4. `order:latest-post-asc` which orders the topics by `Topic#last_posted_at` in ascending order
5. `order:created` which orders the topics by `Topic#created_at` in descending order
6. `order:created-asc` which orders the topics by `Topic#created_at` in ascending order
7. `order:views` which orders the topics by `Topic#views` in descending order
8. `order:views-asc` which orders the topics by `Topic#views` in ascending order
9. `order:likes` which orders the topics by `Topic#likes` in descending order
10. `order:likes-asc` which orders the topics by `Topic#likes` in ascending order
11. `order:likes-op` which orders the topics by `Post#like_count` of the first post in the topic in descending order
12. `order:likes-op-asc` which orders the topics by `Post#like_count` of the first post in the topic in ascending order
13. `order:posters` which orders the topics by `Topic#participant_count` in descending order
14. `order:posters-asc` which orders the topics by `Topic#participant_count` in ascending order
15. `order:category` which orders the topics by `Category#name` of the topic's category in descending order
16. `order:category-asc` which orders the topics by `Category#name` of the topic's category in ascending order
Multiple order filters can be composed together and the order of ordering is applied based on the position of the filter
in the query string. For example, `order:views order:created` will order the topics by `Topic#views` in descending order
and then order the topics by `Topics#created_at` in descending order.
This commit adds support for the following date filters:
1. `activity-before:<YYYY-MM-DD>` which filters for topics that have been bumped at or before given date
2. `activity-after:<YYYY-MM-DD>` which filters for topics that have been bumped at or after given date
3. `created-before:<YYYY-MM-DD>` which filters for topics that have been created at or before given date
4. `created-after:<YYYY-MM-DD>` which filters for topics that have been created at or after given date
5. `latest-post-before:<YYYY-MM-DD>` which filters for topics with the
latest post posted at or before given date
6. `latest-post-after:<YYYY-MM-DD>` which filters for topics with the
latest post posted at or after given date
If the filter has an invalid value, i.e string that cannot be converted
into a proper date in the `YYYY-MM-DD` format, the filter will be ignored.
If either of each filter is specify multiple times, only the last
occurrence of each filter will be taken into consideration.
* DEV: Support `likes-(min:max):<count>` on `/filter` route
This commit adds support for the following filters:
1. `likes-min`
2. `likes-max`
3. `views-min`
4. `views-max`
5. `likes-op-min`
6. `likes-op-max`
If the filter has an invalid value, i.e string that cannot be converted
into an integer, the filter will be ignored.
If either of each filter is specify multiple times, only the last
occurrence of each filter will be taken into consideration.
This commit adds support for the `posters-min:<count>` and
`posters-max:<count>` filters for the topics filtering query language.
`posters-min:1` will filter for topics with at least a one poster while
`posters-max:3` will filter for topics with a maximum of 3 posters.
If the filter has an invalid value, i.e string that cannot be converted
into an integer, the filter will be ignored.
If either of each filter is specify multiple times, only the last
occurence of each filter will be taken into consideration.
This commit adds support for the `posts-min:<count>` and
`posts-max:<count>` filters for the topics filtering query language.
`posts-min:1` will filter for topics with at least a one post while
`posts-max:3` will filter foor topics with a maximum of 3 posts.
If the filter has an invalid value, i.e string that cannot be converted
into an integer, the filter will be ignored.
If either of each filter is specify multiple times, only the last
occurence of each filter will be taken into consideration.
Why this change?
Previously `TopicsFilter` was designed in such a way that we act on a
filter sequentially based on the order it was matched. However, this
made it hard to support filters composition where a similar filter may
be present further in the query string. Because of this limitation, I
previously introduced a private API `TopicsFilter.register_scope` which
allows us to act on a filter only after the entire query string has been
scanned. However, I felt that it made the code complicated and hard to
reason about.
In thie commit, I've changed it such that we scan through the entire
query string and group the values of each filter together. This allows
us to act on the values of a given filter in one go which I find easier
to reason about. This also opens up the possibility for us to ignore
certain filters when it has been specified multiple times.
This commit adds support for the `created-by:<username>` query filter
which will return topics created by the specified user. Multiple
usernames can be specified by comma seperating the usernames like so:
`created-by:username1,username2`. This will filter for topics created by
either of the specified users. Multiple `created-by:<username>` can also
be composed together. `created-by:username1 created-by:username2` is
equivalent to `created-by:username1,username2`.
This commit adds support for the `in:<topic notification level>` query
filter. As an example, `in:tracking` will filter for topics that the
user is watching. Filtering for multiple topic notification levels can
be done by comma separating the topic notification level keys. For
example, `in:muted,tracking` or `in:muted,tracking,watching`.
Alternatively, the user can also compose multiple filters with `in:muted
in:tracking` which translates to the same behaviour as
`in:muted,tracking`.
This commit adds support for the `in:pinned` filter to the topics filtering
query language. When the filter is present, it will filter for topics
where `Topic#pinned_until` is greater than `Topic#pinned_at`.
Before this commit, composing multiple category filters with a query such as category:category1 and category:category2 would not return any results. This is because we were filtering for topics that belonged to both category1 and category2, which is impossible since a topic can only belong to a single category.
With this commit, specifying a query like category:category1 category:category2 will now translate to filtering for topics that belong to either the category1 or category2 category.
This commit adds support for filtering for topics in specific
subcategories via the categories filter query language.
For example: `category:documentation:admins` will filter for topics and
subcategory topics in
the category with slug "admins" whose parent category has the slug
"documentation".
The `=` prefix can also be used such that
`=category:documentation:admins` will exclude subcategory topics of the
category with slug "admins" whose parent category has the slug
"documentation".
On the `/filter` route, the categories filtering query language is now
supported in the input per the example provided below:
```
category:bug => topics in the bug category AND all subcategories
=category:bug => topics in the bug category excluding subcategories
category:bug,feature => allow for categories either in bug or feature
=category:bug,feature => allow for exact categories match excluding sub cats
categories: => alias for category
```
Currently composing multiple category filters is not supported as we
have yet to determine what behaviour it should result in. For example,
`category:bug category:feature` would now return topics that are in both
the `bug` and `feature` category but it is not possible for a topic to
belong to two categories.
The following are the changes being introduced in this commit:
1. Instead of mapping the query language to various query params on the
client side, we've decided that the benefits of having a more robust
query language far outweighs the benefits of having a more human readable query params in the URL.
As such, the `/filter` route will just accept a single `q` query param
and the query string will be parsed on the server side.
1. On the `/filter` route, the tags filtering query language is now
supported in the input per the example provided below:
```
tags:bug+feature tagged both bug and feature
tags:bug,feature tagged either bug or feature
-tags:bug+feature excluding topics tagged bug and feature
-tags:bug,feature excluding topics tagged bug or feature
```
The `tags` filter can also be specified multiple
times in the query string like so `tags:bug tags:feature` which will
filter topics that contain both the `bug` tag and `feature` tag. More
complex query like `tags:bug+feature -tags:experimental` will also work.
This change sets the ground work for allowing us to filter topics list
by tags in the following ways:
1. Filter for topics that matches all tags in a given set of tags
2. Filter for topics that matches any tags in a given set of tags
3. Exclude topics that matches all tags in a given set of tags
4. Exclude topics that matches any tags in a given set of tags