Commit Graph

75 Commits

Author SHA1 Message Date
Martin Brennan 0568d36133
FIX: Use dualstack S3 endpoint for direct uploads ()
When we added direct S3 uploads to Discourse, which use
presigned URLs, we never took into account the dualstack
endpoints for IPv6 on S3.

This commit fixes the issue by using the dualstack endpoints
for presigned URLs and requests, which are used in the
get-presigned-put and batch-presign-urls endpoints used when
directly uploading to S3.

It also makes regular S3 requests for `put` and so on use
dualstack URLs. It doesn't seem like there is a downside to
doing this, but a bunch of specs needed to be updated to reflect this.
2024-11-07 11:06:39 +10:00
Alan Guo Xiang Tan 57f4176b57
DEV: Bump rubocop_discourse () 2024-11-06 06:27:49 +08:00
Alan Guo Xiang Tan 8cf4ed5f88
DEV: Introduce hidden `s3_inventory_bucket` site setting ()
This commit introduces a hidden `s3_inventory_bucket` site setting which
replaces the `enable_s3_inventory` and `s3_configure_inventory_policy`
site setting.

The reason `enable_s3_inventory` and `s3_configure_inventory_policy`
site settings are removed is because this feature has technically been
broken since it was introduced. When the `enable_s3_inventory` feature
is turned on, the app will because configure a daily inventory policy for the
`s3_upload_bucket` bucket and store the inventories under a prefix in
the bucket. The problem here is that once the inventories are created,
there is nothing cleaning up all these inventories so whoever that has
enabled this feature would have been paying the cost of storing a whole
bunch of inventory files which are never used. Given that we have not
received any complains about inventory files inflating S3 storage costs,
we think that it is very likely that this feature is no longer being
used and we are looking to drop support for this feature in the not too
distance future.

For now, we will still support a hidden `s3_inventory_bucket` site
setting which site administrators can configure via the
`DISCOURSE_S3_INVENTORY_BUCKET` env.
2024-06-10 13:16:00 +08:00
Alan Guo Xiang Tan df16ab0758
FIX: `S3Inventory` to ignore files older than last backup restore date ()
This commit updates `S3Inventory#files` to ignore S3 inventory files
which have a `last_modified` timestamp which are not at least 2 days
older than `BackupMetadata.last_restore_date` timestamp.

This check was previously only in `Jobs::EnsureS3UploadsExistence` but
`S3Inventory` can also be used via Rake tasks so this protection needs
to be in `S3Inventory` and not in the scheduled job.
2024-05-24 10:54:06 +08:00
Martin Brennan 731dffdf92
DEV: Align S3 transfer acceleration global settings ()
Followup to fe05fdae24

For consistency with other S3 settings, make the global setting
the same name as the site setting and use SiteSetting.Upload
too so it reads from the correct place.
2023-11-10 09:50:23 +10:00
Martin Brennan fe05fdae24
DEV: Introduce S3 transfer acceleration for uploads behind hidden setting ()
This commit adds an `enable_s3_transfer_acceleration` site setting,
which is hidden to begin with. We are adding this because in certain
regions, using https://aws.amazon.com/s3/transfer-acceleration/ can
drastically speed up uploads, sometimes as much as 70% in certain
regions depending on the target bucket region. This is important for
us because we have direct S3 multipart uploads enabled everywhere
on our hosting.

To start, we only want this on the uploads bucket, not the backup one.
Also, this will accelerate both uploads **and** downloads, depending
on whether a presigned URL is used for downloading. This is the case
when secure uploads is enabled, not anywhere else at this time. To
enable the S3 acceleration on downloads more generally would be a
more in-depth change, since we currently store S3 Upload record URLs
like this:

```
 url: "//test.s3.dualstack.us-east-2.amazonaws.com/original/2X/6/123456.png"
```

For acceleration, `s3.dualstack` would need to be changed to `s3-accelerate.dualstack`
here.

Note that for this to have any effect, Transfer Acceleration must be enabled
on the S3 bucket used for uploads per https://docs.aws.amazon.com/AmazonS3/latest/userguide/transfer-acceleration-examples.html.
2023-11-07 11:50:40 +10:00
Martin Brennan cf42466dea
DEV: Add S3 upload system specs using minio ()
This commit adds some system specs to test uploads with
direct to S3 single and multipart uploads via uppy. This
is done with minio as a local S3 replacement. We are doing
this to catch regressions when uppy dependencies need to
be upgraded or we change uppy upload code, since before
this there was no way to know outside manual testing whether
these changes would cause regressions.

Minio's server lifecycle and the installed binaries are managed
by the https://github.com/discourse/minio_runner gem, though the
binaries are already installed on the discourse_test image we run
GitHub CI from.

These tests will only run in CI unless you specifically use the
CI=1 or RUN_S3_SYSTEM_SPECS=1 env vars.

For a history of experimentation here see https://github.com/discourse/discourse/pull/22381

Related PRs:

* https://github.com/discourse/minio_runner/pull/1
* https://github.com/discourse/minio_runner/pull/2
* https://github.com/discourse/minio_runner/pull/3
2023-08-23 11:18:33 +10:00
Matt Palmer a98d2a8086
FEATURE: allow S3 ACLs to be disabled ()
AWS recommends running buckets without ACLs, and to use resource policies to manage access control instead.
This is not a bad idea, because S3 ACLs are whack, and while resource policies are also whack, they're a more constrained form of whack.
Further, some compliance regimes get antsy if you don't go with the vendor's recommended settings, and arguing that you need to enable ACLs on a bucket just to store images in there is more hassle than it's worth.
The new site setting (s3_use_acls) cannot be disabled when secure
uploads is enabled -- the latter relies on private ACLs for security
at this point in time. We may want to reexamine this in future.
2023-06-06 15:47:40 +10:00
Martin Brennan 21a95b000e
DEV: Remove defunct TODOs ()
* Firefox now finally returns PerformanceMeasure from performance.measure
* Some TODOs were really more NOTE or FIXME material or no longer relevant
* retain_hours is not needed in ExternalUploadsManager,  it doesn't seem like anywhere in the UI sends this as a param for uploads
* https://github.com/discourse/discourse/pull/18413 was merged so we can remove JS test workaround for settings
2023-01-12 09:41:39 +10:00
David Taylor 6417173082
DEV: Apply syntax_tree formatting to `lib/*` 2023-01-09 12:10:19 +00:00
David Taylor f30f9ec5d9
PERF: Update `s3:expire_missing_assets` to delete in batches ()
Some sites may have thousands of stale assets - deleting them one-by-one is very slow.

Followup to e8570b5cc9
2022-11-07 12:53:14 +00:00
Vinoth Kannan 169f2ad443
FIX: don't raise an error if file not found in S3. ()
While deleting the object in S3, don't raise an error if the file is not available in S3.

Co-authored-by: Régis Hanol <regis@hanol.fr>
2022-08-09 15:16:35 +05:30
Gerhard Schlager 9d870f151c
FIX: Uploading large files (> 5GB) failed when `enable_direct_s3_uploads` is enabled ()
Larger files require a multipart copy.
2022-06-28 21:30:00 +02:00
Martin Brennan 641c4e0b7a
FEATURE: Make S3 presigned GET URL expiry configurable ()
Previously we hardcoded the DOWNLOAD_URL_EXPIRES_AFTER_SECONDS const
inside S3Helper to be 5 minutes (300 seconds). For various reasons,
some hosted sites may need this to be longer for other integrations.

The maximum expiry time for presigned URLs is 1 week (which is
604800 seconds), so that has been added as a validation on the
setting as well. The setting is hidden because 99% of the time
it should not be changed.
2022-05-26 09:53:01 +10:00
Martin Brennan e4350bb966
FEATURE: Direct S3 multipart uploads for backups ()
This PR introduces a new `enable_experimental_backup_uploads` site setting (default false and hidden), which when enabled alongside `enable_direct_s3_uploads` will allow for direct S3 multipart uploads of backup .tar.gz files.

To make multipart external uploads work with both the S3BackupStore and the S3Store, I've had to move several methods out of S3Store and into S3Helper, including:

* presigned_url
* create_multipart
* abort_multipart
* complete_multipart
* presign_multipart_part
* list_multipart_parts

Then, S3Store and S3BackupStore either delegate directly to S3Helper or have their own special methods to call S3Helper for these methods. FileStore.temporary_upload_path has also removed its dependence on upload_path, and can now be used interchangeably between the stores. A similar change was made in the frontend as well, moving the multipart related JS code out of ComposerUppyUpload and into a mixin of its own, so it can also be used by UppyUploadMixin.

Some changes to ExternalUploadManager had to be made here as well. The backup direct uploads do not need an Upload record made for them in the database, so they can be moved to their final S3 resting place when completing the multipart upload.

This changeset is not perfect; it introduces some special cases in UploadController to handle backups that was previously in BackupController, because UploadController is where the multipart routes are located. A subsequent pull request will pull these routes into a module or some other sharing pattern, along with hooks, so the backup controller and the upload controller (and any future controllers that may need them) can include these routes in a nicer way.
2021-11-11 08:25:31 +10:00
Martin Brennan 9a72a0945f
FIX: Ensure CORS rules exist for S3 using rake task ()
This commit introduces a new s3:ensure_cors_rules rake task
that is run as a prerequisite to s3:upload_assets. This rake
task calls out to the S3CorsRulesets class to ensure that
the 3 relevant sets of CORS rules are applied, depending on
site settings:

* assets
* direct S3 backups
* direct S3 uploads

This works for both Global S3 settings and Database S3 settings
(the latter set directly via SiteSetting).

As it is, only one rule can be applied, which is generally
the assets rule as it is called first. This commit changes
the ensure_cors! method to be able to apply new rules as
well as the existing ones.

This commit also slightly changes the existing rules to cover
direct S3 uploads via uppy, especially multipart, which requires
some more headers.
2021-11-08 09:16:38 +10:00
Martin Brennan a059c7251f
DEV: Add tests to S3Helper.ensure_cors and move rules to class ()
In preparation for adding automatic CORS rules creation
for direct S3 uploads, I am adding tests here and moving the
CORS rule definitions into a dedicated class so they are all
in the one place.

There is a problem with ensure_cors! as well -- if there is
already a CORS rule defined (presumably the asset one) then
we do nothing and do not apply the new rule. This means that
the S3BackupStore.ensure_cors method does nothing right now
if the assets rule is already defined, and it will mean the
same for any direct S3 upload rules I add for uppy. We need
to be able to add more rules, not just one.

This is not a problem on our hosting because we define the
rules at an infra level.
2021-11-01 08:23:13 +10:00
Martin Brennan 0d809197aa
FIX: Make sure S3 object headers are preserved on copy ()
When copying an existing upload stub temporary object
on S3 to its final destination we were not copying across
its additional headers such as content-disposition and
cache-control, which led to issues like attachments not
downloading with their original filename when clicking
the download links in posts.

This is because the metadata_directive = REPLACE option
was not being passed to object.copy_from(), so only the
source object's headers were being used. Added an option
for apply_metadata_to_destination to apply this option
conditionally, because we may not always want to replace
this metadata, but we definitely do when copying a temporary
upload.
2021-09-10 12:59:51 +10:00
Martin Brennan 841e054907
FIX: Do not prefix temp/ S3 keys with s3_bucket_folder_path in S3Helper ()
This is unnecessary, as when the temporary key is created
in S3Store we already include the s3_bucket_folder_path, and
the key will always start with temp/ to assist with lifecycle
rules for multipart uploads.

This was affecting Discourse.store.object_from_path,
Discourse.store.signed_url_for_path, and possibly others.

See also: e0102a5
2021-08-26 08:50:49 +10:00
Martin Brennan b500949ef6
FEATURE: Initial implementation of direct S3 uploads with uppy and stubs ()
This adds a few different things to allow for direct S3 uploads using uppy. **These changes are still not the default.** There are hidden `enable_experimental_image_uploader` and `enable_direct_s3_uploads`  settings that must be turned on for any of this code to be used, and even if they are turned on only the User Card Background for the user profile actually uses uppy-image-uploader.

A new `ExternalUploadStub` model and database table is introduced in this pull request. This is used to keep track of uploads that are uploaded to a temporary location in S3 with the direct to S3 code, and they are eventually deleted a) when the direct upload is completed and b) after a certain time period of not being used. 

### Starting a direct S3 upload

When an S3 direct upload is initiated with uppy, we first request a presigned PUT URL from the new `generate-presigned-put` endpoint in `UploadsController`. This generates an S3 key in the `temp` folder inside the correct bucket path, along with any metadata from the clientside (e.g. the SHA1 checksum described below). This will also create an `ExternalUploadStub` and store the details of the temp object key and the file being uploaded.

Once the clientside has this URL, uppy will upload the file direct to S3 using the presigned URL. Once the upload is complete we go to the next stage.

### Completing a direct S3 upload

Once the upload to S3 is done we call the new `complete-external-upload` route with the unique identifier of the `ExternalUploadStub` created earlier. Only the user who made the stub can complete the external upload. One of two paths is followed via the `ExternalUploadManager`.

1. If the object in S3 is too large (currently 100mb defined by `ExternalUploadManager::DOWNLOAD_LIMIT`) we do not download and generate the SHA1 for that file. Instead we create the `Upload` record via `UploadCreator` and simply copy it to its final destination on S3 then delete the initial temp file. Several modifications to `UploadCreator` have been made to accommodate this.

2. If the object in S3 is small enough, we download it. When the temporary S3 file is downloaded, we compare the SHA1 checksum generated by the browser with the actual SHA1 checksum of the file generated by ruby. The browser SHA1 checksum is stored on the object in S3 with metadata, and is generated via the `UppyChecksum` plugin. Keep in mind that some browsers will not generate this due to compatibility or other issues.

    We then follow the normal `UploadCreator` path with one exception. To cut down on having to re-upload the file again, if there are no changes (such as resizing etc) to the file in `UploadCreator` we follow the same copy + delete temp path that we do for files that are too large.

3. Finally we return the serialized upload record back to the client

There are several errors that could happen that are handled by `UploadsController` as well.

Also in this PR is some refactoring of `displayErrorForUpload` to handle both uppy and jquery file uploader errors.
2021-07-28 08:42:25 +10:00
Martin Brennan c3394ed9bb
DEV: Update aws-sdk-s3 gem for S3 multipart uploads ()
We are a few versions behind on this gem. We need to update it
for S3 multipart uploads. In the current version we are using, we
cannot do this:

```ruby
Discourse.store.s3_helper.object(key).presigned_url(:upload_part, part_number: 1, upload_id: multipart_upload_id)
```

The S3 client raises an error, saying the operation is undefined. Once
I updated the gem this operation works as expected and returns a
presigned URL for the upload_part operation.

Also remove use of Aws::S3::FileUploader::FIFTEEN_MEGABYTES.
This was part of a private API and should not have been used.
2021-06-25 14:22:31 +10:00
Michael Brown c25dc43f54 FIX: AWS S3 errors don't necessarily include a message
* If the error doesn't have a message, the class name will help
* example:
  before: "Failed to download #{filename} because "
  after: "Failed to download #{filename} because Aws::S3::Errors::NotFound"
2020-08-12 17:00:09 -04:00
Andrew Schleifer aee3c2c34d FIX: attempt to output a useful error message
currently AWS problems will just dump a stack trace
2020-08-05 17:49:42 +08:00
Martin Brennan 8ef782bdbd
FIX: Increase time of DOWNLOAD_URL_EXPIRES_AFTER_SECONDS to 5 minutes ()
* Change S3Helper::DOWNLOAD_URL_EXPIRES_AFTER_SECONDS to 5 minutes, which controls presigned URL expiry and secure-media route cache time.
* This is done because of the composer preview refreshing while typing causes a lot of requests sent to our server because of the short URL expiry. If this ends up being not enough we can always increase the time or explore other avenues (e.g. GitHub has a 7 day validity for secure URLs)
2020-07-03 13:42:36 +10:00
Andrew Schleifer 74d28a43d1
new S3 backup layout ()
* DEV: new S3 backup layout

Currently, with $S3_BACKUP_BUCKET of "bucket/backups", multisite backups
end up in "bucket/backups/backups/dbname/" and single-site will be in
"bucket/backups/".

Both _should_ be in "bucket/backups/dbname/"

- remove MULTISITE_PREFIX,
- always include dbname,
- method to move to the new prefix
- job to call the method

* SPEC: add tests for `VacateLegacyPrefixBackups` onceoff job.

Co-authored-by: Vinoth Kannan <vinothkannan@vinkas.com>
2020-05-29 00:28:23 +05:30
Rafael dos Santos Silva b48299f81c
FEATURE: Add setting to disable automatic CORS rule install in S3 buckets () 2020-05-25 17:09:34 -03:00
Rafael dos Santos Silva 08e4af6636 FEATURE: Add setting to controle the Expect header on S3 calls
Some providers don't implement the Expect: 100-continue support,
which results in a mismatch in the object signature.

With this settings, users can disable the header and use such providers.
2020-04-30 12:12:00 -03:00
Sam Saffron d0d5a138c3
DEV: stop freezing frozen strings
We have the `# frozen_string_literal: true` comment on all our
files. This means all string literals are frozen. There is no need
to call #freeze on any literals.

For files with `# frozen_string_literal: true`

```
puts %w{a b}[0].frozen?
=> true

puts "hi".frozen?
=> true

puts "a #{1} b".frozen?
=> true

puts ("a " + "b").frozen?
=> false

puts (-("a " + "b")).frozen?
=> true
```

For more details see: https://samsaffron.com/archive/2018/02/16/reducing-string-duplication-in-ruby
2020-04-30 16:48:53 +10:00
David Taylor ba616ffb50
DEV: Use a tmp directory for storing uploads in tests ()
This avoids development-mode upload files from polluting the test environment
2020-04-28 14:03:04 +01:00
Robin Ward 80a572d3b7 FIX: Multisite spec was failing in parallel environment
We were not adding the test number to the path in all places.
2020-04-22 14:05:39 -04:00
Penar Musaraj 067696df8f DEV: Apply Rubocop redundant return style 2019-11-14 15:10:51 -05:00
Vinoth Kannan 3e456d5c0b FIX: don't include multisite upload path to source URL if already exist. 2019-08-02 07:57:27 +05:30
Penar Musaraj f00275ded3 FEATURE: Support private attachments when using S3 storage ()
* Support private uploads in S3
* Use localStore for local avatars
* Add job to update private upload ACL on S3
* Test multisite paths
* update ACL for private uploads in migrate_to_s3 task
2019-06-06 13:27:24 +10:00
Vinoth Kannan be0555cc17 FIX: Add bucket folder path only if not exists 2019-05-15 15:37:40 +05:30
Sam Saffron 30990006a9 DEV: enable frozen string literal on all files
This reduces chances of errors where consumers of strings mutate inputs
and reduces memory usage of the app.

Test suite passes now, but there may be some stuff left, so we will run
a few sites on a branch prior to merging
2019-05-13 09:31:32 +08:00
Rishabh ad6ad3f679 DEV: Remove SiteSetting.s3_force_path_style ()
- s3_force_path_style was added as a Minio specific url scheme but it has never been well supported in our code base.
- Our new migrate_to_s3 rake task does not work reliably with path style urls too
- Minio has also added support for virtual style requests i.e the same scheme as AWS S3/DO Spaces so we can rely on that instead of using path style requests.
- Add migration to drop s3_force_path_style from the site_settings table
2019-03-20 14:58:20 +01:00
Vinoth Kannan cc496de10e FIX: Remove double quotes from etag value in API response
https://github.com/aws/aws-sdk-ruby/issues/1134
2019-02-08 14:31:19 +05:30
Gerhard Schlager ba724d7f25 FIX: S3 endpoint broke bucket creation in non-default region 2019-02-05 18:17:02 +01:00
Vinoth Kannan b4f713ca52
FEATURE: Use amazon s3 inventory to manage upload stats () 2019-02-01 10:10:48 +05:30
Rishabh f181e9cc08
FIX: Add compatibility for bucket folder paths in migrate_to_s3 task ()
* FIX: Add compatibility for bucket folder paths in migrate_to_s3 task
* Refactor bucket_name split logic into S3Helper
2019-01-08 20:04:48 +05:30
Vinoth Kannan 82d7f9ce5e fix the build
Checking size for a file object directly will cause issue if it is a closed stream
2019-01-04 13:25:11 +05:30
Vinoth Kannan 940a61037c DEV: Add option to pass s3 client in param 2019-01-04 12:16:09 +05:30
Vinoth Kannan 75dbb98cca FEATURE: Add S3 etag value to uploads table () 2019-01-04 14:16:22 +08:00
Régis Hanol 5381096bfd PERF: new 'migrate_to_s3' rake task 2018-12-26 17:34:49 +01:00
Rishabh cae5ba7356 FIX: Ensure that multisite s3 uploads are tombstoned correctly ()
* FIX: Ensure that multisite uploads are tombstoned into the correct paths

* Move multisite specs to spec/multisite/s3_store_spec.rb
2018-12-19 13:32:32 +08:00
Vinoth Kannan fd272eee44 FEATURE: Make uploads:missing task compatible with s3 uploads 2018-11-27 00:54:51 +05:30
Guo Xiang Tan 84d4c81a26 FEATURE: Support backup uploads/downloads directly to/from S3.
This reverts commit 3c59106bac.
2018-10-15 09:43:31 +08:00
Guo Xiang Tan 3c59106bac Revert "FEATURE: Support backup uploads/downloads directly to/from S3."
This reverts commit c29a4dddc1.

We're doing a beta bump soon so un-revert this after that is done.
2018-10-11 11:08:23 +08:00
Gerhard Schlager c29a4dddc1 FEATURE: Support backup uploads/downloads directly to/from S3. 2018-10-11 10:38:43 +08:00
Rishabh 4f46aa1ba3 FEATURE: Add SiteSetting for s3_configure_tombstone_policy
Add SiteSetting for s3_configure_tombstone_policy, skip policy generation if turned off (default on)
2018-09-17 10:57:50 +10:00