This is necessary to allow for large file uploads via
the direct S3 upload mechanism, as we convert the external
file to an Upload record via ExternalUploadManager once
it is complete.
This will allow for files larger than 2,147,483,647 bytes (2.14GB)
to be referenced in the uploads table.
This is a table locking migration, but since it is not as highly
trafficked as posts, topics, or users, the disruption should be minimal.
This pull request introduces the endpoints required, and the JavaScript functionality in the `ComposerUppyUpload` mixin, for direct S3 multipart uploads. There are four new endpoints in the uploads controller:
* `create-multipart.json` - Creates the multipart upload in S3 along with an `ExternalUploadStub` record, storing information about the file in the same way as `generate-presigned-put.json` does for regular direct S3 uploads
* `batch-presign-multipart-parts.json` - Takes a list of part numbers and the unique identifier for an `ExternalUploadStub` record, and generates the presigned URLs for those parts if the multipart upload still exists and if the user has permission to access that upload
* `complete-multipart.json` - Completes the multipart upload in S3. Needs the full list of part numbers and their associated ETags which are returned when the part is uploaded to the presigned URL above. Only works if the user has permission to access the associated `ExternalUploadStub` record and the multipart upload still exists.
After we confirm the upload is complete in S3, we go through the regular `UploadCreator` flow, the same as `complete-external-upload.json`, and promote the temporary upload S3 into a full `Upload` record, moving it to its final destination.
* `abort-multipart.json` - Aborts the multipart upload on S3 and destroys the `ExternalUploadStub` record if the user has permission to access that upload.
Also added are a few new columns to `ExternalUploadStub`:
* multipart - Whether or not this is a multipart upload
* external_upload_identifier - The "upload ID" for an S3 multipart upload
* filesize - The size of the file when the `create-multipart.json` or `generate-presigned-put.json` is called. This is used for validation.
When the user completes a direct S3 upload, either regular or multipart, we take the `filesize` that was captured when the `ExternalUploadStub` was first created and compare it with the final `Content-Length` size of the file where it is stored in S3. Then, if the two do not match, we throw an error, delete the file on S3, and ban the user from uploading files for N (default 5) minutes. This would only happen if the user uploads a different file than what they first specified, or in the case of multipart uploads uploaded larger chunks than needed. This is done to prevent abuse of S3 storage by bad actors.
Also included in this PR is an update to vendor/uppy.js. This has been built locally from the latest uppy source at d613b849a6. This must be done so that I can get my multipart upload changes into Discourse. When the Uppy team cuts a proper release, we can bump the package.json versions instead.
This adds a few different things to allow for direct S3 uploads using uppy. **These changes are still not the default.** There are hidden `enable_experimental_image_uploader` and `enable_direct_s3_uploads` settings that must be turned on for any of this code to be used, and even if they are turned on only the User Card Background for the user profile actually uses uppy-image-uploader.
A new `ExternalUploadStub` model and database table is introduced in this pull request. This is used to keep track of uploads that are uploaded to a temporary location in S3 with the direct to S3 code, and they are eventually deleted a) when the direct upload is completed and b) after a certain time period of not being used.
### Starting a direct S3 upload
When an S3 direct upload is initiated with uppy, we first request a presigned PUT URL from the new `generate-presigned-put` endpoint in `UploadsController`. This generates an S3 key in the `temp` folder inside the correct bucket path, along with any metadata from the clientside (e.g. the SHA1 checksum described below). This will also create an `ExternalUploadStub` and store the details of the temp object key and the file being uploaded.
Once the clientside has this URL, uppy will upload the file direct to S3 using the presigned URL. Once the upload is complete we go to the next stage.
### Completing a direct S3 upload
Once the upload to S3 is done we call the new `complete-external-upload` route with the unique identifier of the `ExternalUploadStub` created earlier. Only the user who made the stub can complete the external upload. One of two paths is followed via the `ExternalUploadManager`.
1. If the object in S3 is too large (currently 100mb defined by `ExternalUploadManager::DOWNLOAD_LIMIT`) we do not download and generate the SHA1 for that file. Instead we create the `Upload` record via `UploadCreator` and simply copy it to its final destination on S3 then delete the initial temp file. Several modifications to `UploadCreator` have been made to accommodate this.
2. If the object in S3 is small enough, we download it. When the temporary S3 file is downloaded, we compare the SHA1 checksum generated by the browser with the actual SHA1 checksum of the file generated by ruby. The browser SHA1 checksum is stored on the object in S3 with metadata, and is generated via the `UppyChecksum` plugin. Keep in mind that some browsers will not generate this due to compatibility or other issues.
We then follow the normal `UploadCreator` path with one exception. To cut down on having to re-upload the file again, if there are no changes (such as resizing etc) to the file in `UploadCreator` we follow the same copy + delete temp path that we do for files that are too large.
3. Finally we return the serialized upload record back to the client
There are several errors that could happen that are handled by `UploadsController` as well.
Also in this PR is some refactoring of `displayErrorForUpload` to handle both uppy and jquery file uploader errors.
This commit adds the number of drafts a user has next to the "Draft"
label in the user preferences menu and activity tab. The count is
updated via MessageBus when a draft is created or destroyed.
When configured, all topics in the category inherits the slow mode
duration from the category's default.
Note that currently there is no way to remove the slow mode from the
topics once it has been set.
Both of the commits in this PR are meant to fix the problem of invalid
option being shown in the flair chooser. An invalid option can be shown
if at some point it was a valid one - a group with a flair that was
later changed by an admin and flair was removed. The other option an
invalid option can be selected is if the user had a primary group when
the migration ran and copied the same value to the flair_group_id
column.
* FIX: Set flair_group_id only if group has flair
Follow up to 4ba93aac66.
* FIX: Do not show invalid option in flair chooser
If selected flair group became unavailable because the flair was removed
then the option would still be selected and visible as an ID only.
This is a follow up to commit 87c1e98571
which introduced different fields for primary and flair groups. Before
that, primary group was used as a flair group too.
User flair was given by user's primary group. This PR separates the
two, adds a new field to the user model for flair group ID and users
can select their flair from user preferences now.
Since disable_ddl_transaction! is disabled for this migration, it needs to be idempotent. Any error during the migration (e.g. a timeout) will cause ActiveRecord to fail the migration, and try again on the next run. If the index had already been created during the first run, then an 'already exists' error will be raised, with no way to recover.
Unfortunately an [ActiveRecord bug](https://github.com/rails/rails/pull/41490) prevents us from using `if_not_exists: true` alongside `algorithm: :concurrently`, so we have to drop to raw SQL.
The duration column has been ignored since the commit
4af77f1e38
for topic_timers, we use duration_minutes instead.
Also removing the duration key from Topic.set_or_create_timer. The only
plugin to use this was discourse-solved, which doesn't use it any
longer
since
c722b94a97
In the previous commit 5222247
we added a topic_id column to EmailLog. This simply backfills it in
batches. The next PR will get rid of the topic method defined on EmailLog in favour
of belongs_to.
* FEATURE: Staff can receive pending user reminders more frequently.
We now express the "pending_users_reminder_delay" in minutes instead of hours so staff can have finer control over the delay.
We need to keep in mind that the reminders could still take up to 20 minutes, even when using a lower value. We send them from a scheduled job.
* Migrate to a new site setting for the reminders delay
ATM it only implements server side of it, as my need is for automation purposes. However it should probably be added in the UI too as it's unexpected to have pinned_until and no bannered_until.
Having a large number of post-deploy migrations running out-of-numerical-sequence with pre-deploy migrations can be problematic. For example, if we have the sequence
- db/migrate/2017... - add column
- db/post_migrate/2018... - drop the column
- db/migrate/2021... - add the same column again
It will work fine in numerical order. But if you run the pre-deploy migrations **followed by** the post-deploy migrations, you will not get the same result.
Our post-deploy system is designed to allow for seamless upgrades of Discourse. However, it is reasonable for us to only support this totally seamless experience for a limited period of time. This commit moves all post_deploy migrations which are more than 1 year old (i.e. more than 2 major Discourse versions ago) into the regular pre-deploy migrations directory. This limits the impact of any edge cases caused by out-of-numerical-sequence migrations.
This adds the following columns to EmailLog:
* cc_addresses
* cc_user_ids
* topic_id
* raw
This is to bring the EmailLog table closer in parity to
IncomingEmail so it can be better utilized for Group SMTP
and IMAP mailing.
The raw column contains the full content of the outbound email,
but _only_ if the new hidden site setting
enable_raw_outbound_email_logging is enabled. Most sites do not
need it, and it's mostly required for IMAP and SMTP sending.
In the next pull request, there will be a migration to backfill
topic_id on the EmailLog table, at which point we can remove the
topic fallback method on EmailLog.
The first thing we needed here was an enum rather than a boolean to determine how a directory_column was created. Now we have `automatic`, `user_field` and `plugin` directory columns.
This plugin API is assuming that the plugin has added a migration to a column to the `directory_items` table.
This was created to be initially used by discourse-solved. PR with API usage - https://github.com/discourse/discourse-solved/pull/137/
Adds a new `smtp_group_id` column to `EmailLog` which is filled in if the mail `from_address` matches a group's `email_username`. This is for easier debugging, so we know which emails have been sent via group SMTP.
Over the years we have found that a few communities never discovered tags.
Instead of having them default off we now have them default on, ensuring
that everyone finds out about them.
Co-authored-by: Dan Ungureanu <dan@ungureanu.me>
Users who use encoded slugs on their sites sometimes run into 500 error when pasting a link to another topic in a post. The problem happens when generating a backward "reflection" link that would appear in a linked topic. Link URL restricted on the database level to 500 chars in length. At first glance, it should work since we have a restriction on topic title length.
But it doesn't work when a site uses encoded slugs, like here (take a look at the URL). The link to a topic, in this case, can be much longer than 500 characters.
By the way, an error happens only when generating a "reflection" link and doesn't happen with a direct link, we truncate that link. It works because, in this case, the original long link is still present in the post body and can be used for navigation. But we can't do the same for backward "reflection" links (without rewriting their implementation), the whole link must be saved to the database.
The simplest and cleanest solution will be just to remove the restriction on the database level. Abuse is impossible here since we are already protected by the restriction on topic title length. There aren’t performance benefits in using length-constrained columns in Postgres, in fact, length-constrained columns need a few extra CPU cycles to check the length when storing data.
It was not clear that replace watched words can be used to replace text
with URLs. This introduces a new watched word type that makes it easier
to understand.
Refactors `TrustLevel` and moves translations from server to client
Additional changes:
* "staff" and "admin" wasn't translatable in site settings
* it replaces a concatenated string with a translation
* uses translation for trust levels in users_by_trust_level report
* adds a DB migration to rename keys of translation overrides affected by this commit
The excerpt field in the database is constrained to 1000 chars in length. To support this constraint we added a restriction that the topic_excerpt_maxlength setting must be between 0 and 999.
Unfortunately, sometimes it doesn’t work because:
- topic_excerpt_maxlength restricts the length of a visible to user excerpt. But we HTML-escape text before saving. If an excerpt contains & it’ll be &. One character for the user but 5 characters to save to the database. So if topic_excerpt_maxlength is set to 999 it’s not so hard to have an excerpt of for example 1003 characters in length and run into this issue.
- It’s possible to define a custom excerpt for a topic. Such excerpts bypass check for a length. So if the user defines a too long custom excerpt he will run into this issue.
Removing the constraint on the database level solves the problem. But we still need the constraint for topic_excerpt_maxlength on the setting page, because too long excerpts would make UI wonky.
This overhauls the user interface for the group email settings management, aiming to make it a lot easier to test the settings entered and confirm they are correct before proceeding. We do this by forcing the user to test the settings before they can be saved to the database. It also includes some quality of life improvements around setting up IMAP and SMTP for our first supported provider, GMail. This PR does not remove the old group email config, that will come in a subsequent PR. This is related to https://meta.discourse.org/t/imap-support-for-group-inboxes/160588 so read that if you would like more backstory.
### UI
Both site settings of `enable_imap` and `enable_smtp` must be true to test this. You must enable SMTP first to enable IMAP.
You can prefill the SMTP settings with GMail configuration. To proceed with saving these settings you must test them, which is handled by the EmailSettingsValidator.
If there is an issue with the configuration or credentials a meaningful error message should be shown.
IMAP settings must also be validated when IMAP is enabled, before saving.
When saving IMAP, we fetch the mailboxes for that account and populate them. This mailbox must be selected and saved for IMAP to work (the feature acts as though it is disabled until the mailbox is selected and saved):
### Database & Backend
This adds several columns to the Groups table. The purpose of this change is to make it much more explicit that SMTP/IMAP is enabled for a group, rather than relying on settings not being null. Also included is an UPDATE query to backfill these columns. These columns are automatically filled when updating the group.
For GMail, we now filter the mailboxes returned. This is so users cannot use a mailbox like Sent or Trash for syncing, which would generally be disastrous.
There is a new group endpoint for testing email settings. This may be useful in the future for other places in our UI, at which point it can be extracted to a more generic endpoint or module to be included.
Previously we would retry push notifications indefinitely for all errors
except for ExpiredSubscription
Under certain conditions other persistent errors may arise such as a persistent
rate limit.
If we track more than 3 errors in a period of time longer than a day we will
delete the subscription
Also performs a bit of internal cleanup to ensure protected methods really
are private.
Admins can visit an approved queued topic from the review queue by clicking their title. We no longer store the created post and topic ids in the reviewable's payload object. Instead, we set the `topic_id` and `target_id` attributes.
Over the years we accrued many spelling mistakes in the code base.
This PR attempts to fix spelling mistakes and typos in all areas of the code that are extremely safe to change
- comments
- test descriptions
- other low risk areas
* FEATURE: add support for like webhooks
Add support for like webhooks. Webhook events only send on user membership
in the defined webhook group filters.
This also fixes group webhook events, as before this was never used, and
the logic was not correct.
When the admin creates a new custom field they can specify if that field should be searchable or not.
That setting is taken into consideration for quick search results.
When invited by email, users will receive an invite URL which contains
a token. If that token is present when the invite is redeemed, their
account will be automatically activated.
This PR adds a new category setting which is a column in the `categories` table, `allow_unlimited_owner_edits_on_first_post`.
What this does is:
* Inside the `can_edit_post?` method of `PostGuardian`, if the current user editing a post is the owner of the post, it is the first post, and the topic's category has `allow_unlimited_owner_edits_on_first_post`, then we bypass the check for `LimitedEdit#edit_time_limit_expired?` on that post.
* Also, similar to wiki topics, in `PostActionNotifier#after_create_post_revision` we send a notification to all users watching a topic when the OP is edited in a topic with the category setting `allow_unlimited_owner_edits_on_first_post` enabled.
This is useful for forums where there is a Marketplace or similar category, where topics are created and then updated indefinitely by the OP rather than the OP making new topics or additional replies. In a way this acts similar to a wiki that only one person can edit.
When posts are moved from one topic to another, the `topic_user.bookmarked` column for all users in the new and the old topic needs to be resynced, for example because a user bookmarks post 12 in topic 1, then it is moved to topic 2, the topic_user record for topic 1 should no longer be bookmarked. A background job has been added to sync the column for a specified topic, or for no topic at all, which does it for all topics like the migration.
Also includes a migration that we have run in the past to fix bad data.
----
This has been addressed in other places in the past:
https://github.com/discourse/discourse/pull/10211https://github.com/discourse/discourse/pull/10188