Commit Graph

86 Commits

Author SHA1 Message Date
Alan Guo Xiang Tan cd64e88711
PERF: Improve database query perf when loading topics for a category. (#14416)
* PERF: Improve database query perf when loading topics for a category.

Instead of left joining the `topics` table against `categories` by filtering with `categories.id`,
we can improve the query plan by filtering against `topics.category_id`
first before joining which helps to reduce the number of rows in the
topics table that has to be joined against the other tables and also
make better use of our existing index.

The following is a before and after of the query plan for a category
with many subcategories.

Before:

```
                                                                                                       QUERY PLAN

-------------------------------------------------------------------------------------------------------------------------------------
-----------------------------------------------------------------------------------
 Limit  (cost=1.28..747.09 rows=30 width=12) (actual time=85.502..2453.727 rows=30 loops=1)
   ->  Nested Loop Left Join  (cost=1.28..566518.36 rows=22788 width=12) (actual time=85.501..2453.722 rows=30 loops=1)
         Join Filter: (category_users.category_id = topics.category_id)
         Filter: ((topics.category_id = 11) OR (COALESCE(category_users.notification_level, 1) <> 0) OR (tu.notification_level > 1))
         ->  Nested Loop Left Join  (cost=1.00..566001.58 rows=22866 width=20) (actual time=85.494..2453.702 rows=30 loops=1)
               Filter: ((COALESCE(tu.notification_level, 1) > 0) AND ((topics.category_id <> 11) OR (topics.pinned_at IS NULL) OR ((t
opics.pinned_at <= tu.cleared_pinned_at) AND (tu.cleared_pinned_at IS NOT NULL))))
               Rows Removed by Filter: 1
               ->  Nested Loop  (cost=0.57..528561.75 rows=68606 width=24) (actual time=85.472..2453.562 rows=31 loops=1)
                     Join Filter: ((topics.category_id = categories.id) AND ((categories.topic_id <> topics.id) OR (categories.id = 1
1)))
                     Rows Removed by Join Filter: 13938306
                     ->  Index Scan using index_topics_on_bumped_at on topics  (cost=0.42..100480.05 rows=715549 width=24) (actual ti
me=0.010..633.015 rows=464623 loops=1)
                           Filter: ((deleted_at IS NULL) AND ((archetype)::text <> 'private_message'::text))
                           Rows Removed by Filter: 105321
                     ->  Materialize  (cost=0.14..36.04 rows=30 width=8) (actual time=0.000..0.002 rows=30 loops=464623)
                           ->  Index Scan using categories_pkey on categories  (cost=0.14..35.89 rows=30 width=8) (actual time=0.006.
.0.040 rows=30 loops=1)
                                 Index Cond: (id = ANY ('{11,53,57,55,54,56,112,94,107,115,116,117,97,95,102,103,101,105,99,114,106,1
13,104,98,100,96,108,109,110,111}'::integer[]))
               ->  Index Scan using index_topic_users_on_topic_id_and_user_id on topic_users tu  (cost=0.43..0.53 rows=1 width=16) (a
ctual time=0.004..0.004 rows=0 loops=31)
                     Index Cond: ((topic_id = topics.id) AND (user_id = 1103877))
         ->  Materialize  (cost=0.28..2.30 rows=1 width=8) (actual time=0.000..0.000 rows=0 loops=30)
               ->  Index Scan using index_category_users_on_user_id_and_last_seen_at on category_users  (cost=0.28..2.29 rows=1 width
=8) (actual time=0.004..0.004 rows=0 loops=1)
                     Index Cond: (user_id = 1103877)
 Planning Time: 1.359 ms
 Execution Time: 2453.765 ms
(23 rows)
```

After:

```
                                                                                                                            QUERY PLAN
------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
 Limit  (cost=1.28..438.55 rows=30 width=12) (actual time=38.297..657.215 rows=30 loops=1)
   ->  Nested Loop Left Join  (cost=1.28..195944.68 rows=13443 width=12) (actual time=38.296..657.211 rows=30 loops=1)
         Filter: ((categories.topic_id <> topics.id) OR (topics.category_id = 11))
         Rows Removed by Filter: 29
         ->  Nested Loop Left Join  (cost=1.13..193462.59 rows=13443 width=16) (actual time=38.289..657.092 rows=59 loops=1)
               Join Filter: (category_users.category_id = topics.category_id)
               Filter: ((topics.category_id = 11) OR (COALESCE(category_users.notification_level, 1) <> 0) OR (tu.notification_level > 1))
               ->  Nested Loop Left Join  (cost=0.85..193156.79 rows=13489 width=20) (actual time=38.282..657.059 rows=59 loops=1)
                     Filter: ((COALESCE(tu.notification_level, 1) > 0) AND ((topics.category_id <> 11) OR (topics.pinned_at IS NULL) OR ((topics.pinned_at <= tu.cleared_pinned_at) AND (tu.cleared_pinned_at IS NOT NULL))))
                     Rows Removed by Filter: 1
                     ->  Index Scan using index_topics_on_bumped_at on topics  (cost=0.42..134521.06 rows=40470 width=24) (actual time=38.267..656.850 rows=60 loops=1)
                           Filter: ((deleted_at IS NULL) AND ((archetype)::text <> 'private_message'::text) AND (category_id = ANY ('{11,53,57,55,54,56,112,94,107,115,116,117,97,95,102,103,101,105,99,114,106,113,104,98,100,96,108,109,110,111}'::integer[])))
                           Rows Removed by Filter: 569895
                     ->  Index Scan using index_topic_users_on_topic_id_and_user_id on topic_users tu  (cost=0.43..1.43 rows=1 width=16) (actual time=0.003..0.003 rows=0 loops=60)
                           Index Cond: ((topic_id = topics.id) AND (user_id = 1103877))
               ->  Materialize  (cost=0.28..2.30 rows=1 width=8) (actual time=0.000..0.000 rows=0 loops=59)
                     ->  Index Scan using index_category_users_on_user_id_and_last_seen_at on category_users  (cost=0.28..2.29 rows=1 width=8) (actual time=0.004..0.004 rows=0 loops=1)
                           Index Cond: (user_id = 1103877)
         ->  Index Scan using categories_pkey on categories  (cost=0.14..0.17 rows=1 width=8) (actual time=0.001..0.001 rows=1 loops=59)
               Index Cond: (id = topics.category_id)
 Planning Time: 1.633 ms
 Execution Time: 657.255 ms
(22 rows)
```

* PERF: Optimize index on topics bumped_at.

Replace `index_topics_on_bumped_at` index with a partial index on `Topic#bumped_at` filtered by archetype since there is already another index that covers private topics.
2021-09-28 10:05:00 +08:00
Martin Brennan 22208836c5
DEV: Ignore bookmarks.topic_id column and remove references to it in code (#14289)
We don't need no stinkin' denormalization! This commit ignores
the topic_id column on bookmarks, to be deleted at a later date.
We don't really need this column and it's better to rely on the
post.topic_id as the canonical topic_id for bookmarks, then we
don't need to remember to update both columns if the bookmarked
post moves to another topic.
2021-09-15 10:16:54 +10:00
Jarek Radosz 07c6b720bc
DEV: Remove `PostProcessed` trigger option (#13916)
It was deprecated 5 years ago in e55e2aff94

I've seen it still being used in the wild, even though it doesn't do anything anymore as I understand it.
2021-08-04 22:24:47 +02:00
Martin Brennan ef90575b91
DEV: Drop uploads verified column (#13677)
I marked this ignored almost a year ago in
80268357e7
2021-07-09 16:16:13 +10:00
David Taylor f999ef2d52
DEV: Drop user_options.disable_jump_reply column (#13646)
24ef4f7b removed the use of this column in 2019
2021-07-06 10:47:17 +01:00
Alan Guo Xiang Tan 37b8ce79c9
FEATURE: Add last visit indication to topic view page. (#13471)
This PR also removes grey old unread bubble from the topic badges by
dropping `TopicUser#highest_seen_post_number`.
2021-07-05 14:17:31 +08:00
Martin Brennan d098f51ad3
DEV: Drop duration column from topic timers (#13543)
The duration column has been ignored since the commit
4af77f1e38
for topic_timers, we use duration_minutes instead.

Also removing the duration key from Topic.set_or_create_timer. The only
plugin to use this was discourse-solved, which doesn't use it any
longer
since
c722b94a97
2021-06-29 09:27:12 +10:00
Sam 14a0247301
PERF: optimise backfilling of topic_id (#13545)
Relying on large offsets can have uneven performance on huge table, new
implementation recovers more cleanly and avoids double updates
2021-06-28 16:16:22 +10:00
Martin Brennan 69518bee15
FIX: Backfill topic_id for EmailLog (#13469)
In the previous commit 5222247
we added a topic_id column to EmailLog. This simply backfills it in
batches. The next PR will get rid of the topic method defined on EmailLog in favour
of belongs_to.
2021-06-28 08:15:52 +10:00
David Taylor 5968dc07a5 DEV: Promote historic post_deploy migrations
This commit promotes all post_deploy migrations which existed in Discourse v2.6.7 (timestamp <= 20201110110952)
2021-06-23 17:43:38 +01:00
David Taylor e76c583b91
DEV: Promote old post-deploy migrations to pre-deploy migrations (#13477)
Having a large number of post-deploy migrations running out-of-numerical-sequence with pre-deploy migrations can be problematic. For example, if we have the sequence

- db/migrate/2017... - add column
- db/post_migrate/2018... - drop the column
- db/migrate/2021... - add the same column again

It will work fine in numerical order. But if you run the pre-deploy migrations **followed by** the post-deploy migrations, you will not get the same result.

Our post-deploy system is designed to allow for seamless upgrades of Discourse. However, it is reasonable for us to only support this totally seamless experience for a limited period of time. This commit moves all post_deploy migrations which are more than 1 year old (i.e. more than 2 major Discourse versions ago) into the regular pre-deploy migrations directory. This limits the impact of any edge cases caused by out-of-numerical-sequence migrations.
2021-06-22 16:02:24 +01:00
Mark VanLandingham 95b51669ad
DEV: Revert 3 commits for plugin API to add directory columns (#13423) 2021-06-17 12:37:37 -05:00
Mark VanLandingham 0c42a29dc4
DEV: Plugin API to allow creation of directory columns with item query (#13402)
The first thing we needed here was an enum rather than a boolean to determine how a directory_column was created. Now we have `automatic`, `user_field` and `plugin` directory columns.

This plugin API is assuming that the plugin has added a migration to a column to the `directory_items` table.

This was created to be initially used by discourse-solved. PR with API usage - https://github.com/discourse/discourse-solved/pull/137/
2021-06-17 09:06:18 -05:00
Andrei Prigorshnev 932a2fe419
FIX: PG::StringDataRightTruncation when linking posts (#13134)
Users who use encoded slugs on their sites sometimes run into 500 error when pasting a link to another topic in a post. The problem happens when generating a backward "reflection" link that would appear in a linked topic. Link URL restricted on the database level to 500 chars in length. At first glance, it should work since we have a restriction on topic title length.

But it doesn't work when a site uses encoded slugs, like here (take a look at the URL). The link to a topic, in this case, can be much longer than 500 characters.

By the way, an error happens only when generating a "reflection" link and doesn't happen with a direct link, we truncate that link. It works because, in this case, the original long link is still present in the post body and can be used for navigation. But we can't do the same for backward "reflection" links (without rewriting their implementation), the whole link must be saved to the database.

The simplest and cleanest solution will be just to remove the restriction on the database level. Abuse is impossible here since we are already protected by the restriction on topic title length. There aren’t performance benefits in using length-constrained columns in Postgres, in fact, length-constrained columns need a few extra CPU cycles to check the length when storing data.
2021-06-02 15:27:04 +04:00
Andrei Prigorshnev 00300b118d
FIX: errors that're triggering by too long excerpts (#13056)
The excerpt field in the database is constrained to 1000 chars in length. To support this constraint we added a restriction that the topic_excerpt_maxlength setting must be between 0 and 999.

Unfortunately, sometimes it doesn’t work because:

- topic_excerpt_maxlength restricts the length of a visible to user excerpt. But we HTML-escape text before saving. If an excerpt contains & it’ll be &amp;. One character for the user but 5 characters to save to the database. So if topic_excerpt_maxlength is set to 999 it’s not so hard to have an excerpt of for example 1003 characters in length and run into this issue.
- It’s possible to define a custom excerpt for a topic. Such excerpts bypass check for a length. So if the user defines a too long custom excerpt he will run into this issue.

Removing the constraint on the database level solves the problem. But we still need the constraint for topic_excerpt_maxlength on the setting page, because too long excerpts would make UI wonky.
2021-05-31 14:59:40 +04:00
Martin Brennan 501de809da
FIX: Do not mark badge image uploads as secure (#13193)
* FIX: Do not mark badge image uploads as secure

We do not need badge_image upload types to be marked as secure.
Post migration is the same as
https://github.com/discourse/discourse/pull/12081.

See
https://meta.discourse.org/t/secure-media-uploads/140017/122?u=martin
2021-05-28 12:35:52 +10:00
Martin Brennan 2d686191b5
FIX: Bookmark topics were not being updated when the post moved (#12542)
Because bookmarks have both topic and post ID, when the post was moved into another topic the bookmark was still attached to the post but did not show in the UI. This PR makes it so the all topic IDs for bookmarks attached to a post are updated when a post is moved.

Also included is a migration to fix affected records (e.g. on Meta there are 20 affected records).

See: https://meta.discourse.org/t/improved-bookmarks-with-reminders/144542/203
2021-03-29 11:25:48 +10:00
Krzysztof Kotlarek c03c85e661
FIX: delete orphan post revisions (#12502)
I was adding specs to ensure that post actions and uploads are removed for permanently deleted posts.

I noticed that post revisions were not permanently destroyed. I added a migration to fix old data.
2021-03-25 12:34:53 +11:00
David Taylor 8fd46c04ea
Drop flash video onebox (#12261)
Flash was discontinued by Adobe at the end of 2020. There is no need to continue OneBox support for it
2021-03-02 17:11:14 +00:00
David Taylor b22ea7911c
DEV: Drop old SSO site setting rows from the database (#12148)
These were copied to their new names in 821bb1e8cb
2021-02-19 19:05:49 +00:00
Sam 58de9e85be
FIX: ensure corrected migration runs (#12137)
Some instances may have ran earier version of the migration. Ensure newer
one runs instead.
2021-02-19 11:48:32 +11:00
Sam 4ef642b300
FIX: optimise MoveNewSinceToTable (#12136)
* FIX: optimise MoveNewSinceToTable

Avoids shuffling all ids around to the app (only use min / max)
Ensure the query for boundaries is ordered by user_id
2021-02-19 11:35:52 +11:00
Dan Ungureanu bddf94c0ab
FIX: Delete topic timers far in the future (#12125)
The migration used to fail because the same duration in minutes was out
of the integer range. The '20 years' limit was introduced in e0f0fe5.
2021-02-18 14:18:43 +02:00
Krzysztof Kotlarek 7829558c6d
FIX: dismiss new when topic_user exists without last read (#12103)
The bug was mentioned on meta: https://meta.discourse.org/t/pressing-dismiss-new-doesnt-clear-new-topics/179858

Problem is that sometimes the user has TopicUser records with `last_read_post_number` set as NULL. In that case, the topic is still "new" to them and should be dismissed when they click dismiss button.

In addition, I added that condition to post_migration and bumped the number to fix existing records. Migration is written to be idempotent so it will make no harm to already deployed instances.
2021-02-18 10:39:05 +11:00
Martin Brennan 9f0f801ae3
FIX: Do not mark group_flair images as secure on upload (#12081)
See https://meta.discourse.org/t/secure-media-uploads-breaks-group-flair-image/173671/4

Group flair image uploads definitely do not need to be secure.
2021-02-16 12:34:03 +10:00
Krzysztof Kotlarek ad3ec5809f
FIX: Dismiss new with better migration (#12062)
Original PR was reverted because of broken migration https://github.com/discourse/discourse/pull/12058

I fixed it by adding this line
```
          AND topics.id IN(SELECT id FROM topics ORDER BY created_at DESC LIMIT :max_new_topics)
```

This time it is left joining a limited amount of topics. I tested it on few databases and it worked quite smooth
2021-02-15 08:50:33 +11:00
Martin Brennan 18da1d5b07
FIX: Topic timer duration_minutes was not backfilled correctly (#12004)
Because of a where clause of duration_minutes != duration, where
duration_minutes was NULL, the previous migration to fill the new
duration_minutes column failed. This corrects the failed migration
by just running the update where duration_minutes is NULL and duration
IS NOT NULL.

Previous commit is 4af77f1
2021-02-08 09:33:08 +10:00
Mark VanLandingham 809274fe0d
DEV: Replace 'processed' column on notifications with new table (#11864) 2021-01-27 10:29:24 -06:00
Régis Hanol cd3d24ed8c
FIX: move post_search_data migration into onceoff job (#11851)
And reduce the size of the batches to 100k.

That should hopefully make the migrations run smoother...
2021-01-26 16:29:00 +01:00
Régis Hanol c56ba6c9bd
PERF: batch expensive post-migration (#11845)
Run the 'MigrateSearchDataAfterDefaultLocaleRename' post migration in batches of 500k records.

This will hopefully prevent any potential deadlocks on large tables.
2021-01-25 22:58:58 +01:00
Gerhard Schlager e65c5b0aad
PERF: Migrate search data after locale rename (#11831)
The default locale `en_US` has been renamed into `en`. This tries to migrate existing search data to avoid resource intensive reindexing.
2021-01-25 14:30:17 +01:00
David Taylor e8452a55a6
DEV: Drop github_user_infos table (#11181)
Follow-up to cf21de0e7a
2020-11-10 11:33:27 +00:00
Jarek Radosz 2f4a1ff61b
DEV: Update rubocop-discourse from 2.3.2 to 2.4.0 (#11079)
Also fixes whitespace related issues raised by rubocop.
2020-10-30 15:04:29 +01:00
Guo Xiang Tan 9b75d95fc6 PERF: Keep track of first unread PM and first unread group PM for user.
This optimization helps to filter away topics so that the joins on
related tables when querying for unread messages is not expensive.
2020-09-09 14:05:41 +08:00
Guo Xiang Tan 87de8948c0
DEV: Drop search index on non-pm posts.
The problem with this index is that on sites with a high non-pm to pm
posts ratio, the index is esstentially duplicating the existing index on
`PostSearchData#search_data`. If the site is huge, the index ends up
taking up more diskspace.
2020-08-21 07:21:34 +08:00
Sam Saffron 54cf3c6766
PERF: Drop index idx_regular_post_search_data concurrently
This can slightly help with the drop command.

That said if a giant vacuum is running we may still time out.
2020-08-20 13:39:46 +10:00
Sam Saffron 628319aad3
PERF: drop idx_regular_post_search_data during migration
Rebuilding this index while amending the boolean is very expensive.

Avoid this work
2020-08-20 12:48:48 +10:00
Sam Saffron d2c504ea86
PERF: Improve performance of post_search_data migration
Very large batches can take an enormous amount of time due to churn

Limiting to 200k changes at a time gives us a far larger chance of finishing
the job without timing out or deadlocking.
2020-08-20 08:45:04 +10:00
Sam Saffron fcfaa8b063
PERF: Ensure transaction is of minimal size
A giant transaction in a post migration can be very risky.

This splits the large amount of work this migration needs to do into 2 parts:

1. A re-runnable cleanup job prior to transaction
2. A minimally sized transaction to add the database constraint

This avoids large amounts of churn on the table
2020-08-19 17:15:14 +10:00
Guo Xiang Tan 7e414da0d9
DEV: Fix lint. 2020-08-18 16:59:57 +08:00
Guo Xiang Tan 2161abfabd
DEV: Move data migration of `PostSearchData#private_message` into post_migration.
Follow-up to 92b7fe4c62
2020-08-18 16:46:14 +08:00
Krzysztof Kotlarek 14003abc37
FIX: Improve allowed_path column migration (#10321)
Because previous migration was already deployed and some databases were already migrated, I needed to add some conditions to the migration.

Previous migration - https://github.com/discourse/discourse/blob/master/db/post_migrate/20200629232159_rename_path_whitelist_to_allowed_paths.rb

What will happen in a scenario when previous migration was not run.
1. column allowed_paths will be created
2. allowed_path will be populated with data from path_whitelist
3. path_whitelist column will be dropped

What will happen in a scenario when previous migration was already run.
1. column allowed_paths will not be created because already exists - `unless column_exists?(:embeddable_hosts, :allowed_paths)`
2. Data will not be copied because path_whitelist is missing - `if column_exists?(:embeddable_hosts, :path_whitelist) && column_exists?(:embeddable_hosts, :allowed_paths)`
3. path_whitelist column deletion will be skipped - `if column_exists?(:embeddable_hosts, :path_whitelist)`
2020-07-28 13:31:51 +10:00
Krzysztof Kotlarek e0d9232259
FIX: use allowlist and blocklist terminology (#10209)
This is a PR of the renaming whitelist to allowlist and blacklist to the blocklist.
2020-07-27 10:23:54 +10:00
Martin Brennan 41b43a2a25
FEATURE: Add "delete on owner reply" bookmark functionality (#10231)
This adds an option to "delete on owner reply" to bookmarks. If you select this option in the modal, then reply to the topic the bookmark is in, the bookmark will be deleted on reply.

This PR also changes the checkboxes for these additional bookmark options to an Integer column in the DB with a combobox to select the option you want.

The use cases are:

* Sometimes I will bookmark the topics to read it later. In this case we definitely don’t need to keep the bookmark after I replied to it.
* Sometimes I will read the topic in mobile and I will prefer to reply in PC later. Or I may have to do some research before reply. So I will bookmark it for reply later.
2020-07-21 10:00:39 +10:00
David Taylor c74f2b949f
DEV: Correct historical posts table schema discrepancies (#10014)
This is a no-op for modern installations. It only affects older sites.

Follow-up to f7d676dce1

This needs to be done in a transaction, because we need to drop and recreate the badge_posts view.
2020-06-11 11:45:46 +01:00
Dan Ungureanu 5bfe1ee4f1
FEATURE: Improve UX support for multiple email addresses (#9691) 2020-06-10 19:11:49 +03:00
Arpit Jalan 274db95fb2
FIX: MigrateInviteRedeemedDataToInvitedUsers should be normal migration (#10013) 2020-06-10 16:09:36 +05:30
Arpit Jalan 3094459cd9
FEATURE: multiple use invite links (#9813) 2020-06-09 20:49:32 +05:30
David Taylor 75b1298e99
DEV: Drop unused image_url column from posts and topics (#9953)
This has been superseded by image_upload_id. The image_url value in API responses is now generated dynamically from the upload record.
2020-06-02 16:21:38 +10:00
Michael Brown d9a02d1336
Revert "Revert "Merge branch 'master' of https://github.com/discourse/discourse""
This reverts commit 20780a1eee.

* SECURITY: re-adds accidentally reverted commit:
  03d26cd6: ensure embed_url contains valid http(s) uri
* when the merge commit e62a85cf was reverted, git chose the 2660c2e2 parent to land on
  instead of the 03d26cd6 parent (which contains security fixes)
2020-05-23 00:56:13 -04:00