Commit Graph

190 Commits

Author SHA1 Message Date
Gerhard Schlager 96e87af605
FIX: Rake tasks related to uploads were broken (#17085) 2022-06-14 11:05:03 +02:00
Bianca Nenciu 9db8f00b3d
FEATURE: Create upload_references table (#16146)
This table holds associations between uploads and other models. This can be used to prevent removing uploads that are still in use.

* DEV: Create upload_references
* DEV: Use UploadReference instead of PostUpload
* DEV: Use UploadReference for SiteSetting
* DEV: Use UploadReference for Badge
* DEV: Use UploadReference for Category
* DEV: Use UploadReference for CustomEmoji
* DEV: Use UploadReference for Group
* DEV: Use UploadReference for ThemeField
* DEV: Use UploadReference for ThemeSetting
* DEV: Use UploadReference for User
* DEV: Use UploadReference for UserAvatar
* DEV: Use UploadReference for UserExport
* DEV: Use UploadReference for UserProfile
* DEV: Add method to extract uploads from raw text
* DEV: Use UploadReference for Draft
* DEV: Use UploadReference for ReviewableQueuedPost
* DEV: Use UploadReference for UserProfile's bio_raw
* DEV: Do not copy user uploads to upload references
* DEV: Copy post uploads again after deploy
* DEV: Use created_at and updated_at from uploads table
* FIX: Check if upload site setting is empty
* DEV: Copy user uploads to upload references
* DEV: Make upload extraction less strict
2022-06-09 09:24:30 +10:00
Martin Brennan faf5b4d3e9
PERF: Speed up secure media and ACL sync rake tasks (#16849)
Incorporates learnings from /t/64227:

* Changes the code to set access control posts in the rake
  task to be an efficient UPDATE SQL query.
  The original version was timing out with 312017 post uploads,
  the new query took ~3s to run.
* Changes the code to mark uploads as secure/not secure in
  the rake task to be an efficient UPDATE SQL query rather than
  using UploadSecurity. This took a very long time previously,
  and now takes only a few seconds.
* Spread out ACL syncing for uploads into jobs with batches of
  100 uploads at a time, so they can be parallelized instead
  of having to wait ~1.25 seconds for each ACL to be changed
  in S3 serially.

One issue that still remains is post rebaking. Doing this serially
is painfully slow. We have a way to do this in sidekiq via PeriodicalUpdates
but this is limited by max_old_rebakes_per_15_minutes. It would
be better to fan this rebaking out into jobs like we did for the
ACL sync, but that should be done in another PR.
2022-05-23 13:14:11 +10:00
Loïc Guitaut 008b700a3f DEV: Upgrade to Rails 7
This patch upgrades Rails to version 7.0.2.4.
2022-04-28 11:51:03 +02:00
Jarek Radosz 2fc70c5572
DEV: Correctly tag heredocs (#16061)
This allows text editors to use correct syntax coloring for the heredoc sections.

Heredoc tag names we use:

languages: SQL, JS, RUBY, LUA, HTML, CSS, SCSS, SH, HBS, XML, YAML/YML, MF, ICS
other: MD, TEXT/TXT, RAW, EMAIL
2022-02-28 20:50:55 +01:00
Gerhard Schlager 27f1630b01
DEV: Try to download missing uploads from origin URL (#15629) 2022-01-19 11:05:58 +01:00
Peter Zhu c5fd8c42db
DEV: Fix methods removed in Ruby 3.2 (#15459)
* File.exists? is deprecated and removed in Ruby 3.2 in favor of
File.exist?
* Dir.exists? is deprecated and removed in Ruby 3.2 in favor of
Dir.exist?
2022-01-05 18:45:08 +01:00
Arpit Jalan dec7e19da3
FIX: fix error message for fix_missing_s3 rake task (#13661) 2021-07-07 19:59:03 +05:30
Arpit Jalan 236d6d91b2
FIX: `fix_missing_s3` task fails on failed upload (take 2) (#13660)
ref: 935aadbfdd
2021-07-07 18:53:43 +05:30
Arpit Jalan 935aadbfdd
FIX: do not stop `fix_missing_s3` task if saving an upload failed (#13658)
This commit logs an error and moves to next upload when saving a single
upload record fails when running `uploads:fix_missing_s3` task.
2021-07-07 16:57:24 +05:30
Gerhard Schlager 820068ddaf
FIX: `fix_missing_s3` rake task could fail due to missing upload (#13479) 2021-06-22 17:00:55 +02:00
Josh Soref 59097b207f
DEV: Correct typos and spelling mistakes (#12812)
Over the years we accrued many spelling mistakes in the code base. 

This PR attempts to fix spelling mistakes and typos in all areas of the code that are extremely safe to change 

- comments
- test descriptions
- other low risk areas
2021-05-21 11:43:47 +10:00
Arpit Jalan 626b8465ba
FIX: do not validate uploads when running `uploads:fix_missing_s3` task (#13096) 2021-05-20 15:29:09 +10:00
Arpit Jalan 130160537c
FEATURE: add support for "skip_validations" option in UploadCreator (#13094)
FIX: do not validate uploads when running `uploads:fix_missing_s3` task
2021-05-19 20:54:52 +05:30
Gerhard Schlager acc73f2989
FIX: `uploads:fix_missing_s3` rake task used wrong SHA1 (#12495) 2021-03-25 11:35:29 +01:00
Martin Brennan f49e3e5731
DEV: Add security_last_changed_at and security_last_changed_reason to uploads (#11860)
This PR adds security_last_changed_at and security_last_changed_reason to uploads. This has been done to make it easier to track down why an upload's secure column has changed and when. This necessitated a refactor of the UploadSecurity class to provide reasons why the upload security would have changed.

As well as this, a source is now provided from the location which called for the upload's security status to be updated as they are several (e.g. post creator, topic security updater, rake tasks, manual change).
2021-01-29 09:03:44 +10:00
Jarek Radosz 650e3d18c4
DEV: Add a note to S3 migration task (#11738)
A followup to #11703
2021-01-18 17:12:47 +01:00
Michael K Johnson 2a23e54632
FIX: remove migrate_from_s3 task that silently corrupts data (#11703)
Transient errors in migration are ignored, silently corrupting
data, and the migration is incomplete and misses many sources of
uploads, which will lead to an incorrect expectation of independence
from the remote object storage after announcing that the migration
was successful, regardles of whether transient errors permanently
corrupted the data.

Remove this migration until such time as it is re-written to
follow the same pattern has the migration to s3, moving the
core logic out of the task.
2021-01-17 22:33:29 +01:00
Neil Lalonde f1743ff69c
FIX: error "unknown attribute verified" in uploads rake tasks
The new column is verification_status.
verified: nil is now verification_status: unchecked
2020-09-21 10:47:52 -04:00
Martin Brennan 80268357e7
DEV: Change upload verified column to be integer (#10643)
Per review https://review.discourse.org/t/dev-add-verified-to-uploads-and-fill-in-s3-inventory-10406/14180

Change the verified column for Upload to a verified_status integer column, to avoid having NULL as a weird implicit status.
2020-09-17 13:35:29 +10:00
Sam Saffron 80e17f5f9e
DEV: Remove staff bypass on fix missing
After thinking about it, I worry that this will potentially leave a site
setting set when people hit ctrl-c ... feels a tiny bit risky, so leaving
it out.
2020-08-28 12:35:35 +10:00
Sam Saffron ed9323043f
DEV: Allow all uploads when fixing missing s3
If we do not do this we can not properly re-download files
2020-08-28 12:28:59 +10:00
Sam Saffron f7314f8ab4
DEV: Improve s3 upload image analysis
- Introduces uploads:delete_missing_s3 which can be used to "give up" and
delete broken records from the database

- Fixes a bug in fix_missing_s3 - crashing on deleted posts

- Adds more info to analyze_missing_s3
2020-08-27 11:50:07 +10:00
Sam Saffron b60c39fdf0
DEV: search more carefully for missing uploads
rake uploads:analyze_missing_s3 was not looking at all places, amend it so
it looks in all the places where uploads could exist.
2020-08-26 17:48:42 +10:00
Sam Saffron 2a7490149c
DEV: don't fail if in uploads:fix_missing_s3 when fix fails
Previously a single error on a file like invalid extension could fail the
entire rake task
2020-08-18 17:55:49 +10:00
Sam Saffron 24fe08230f
FEATURE: ensure posts are rebaked when missing is fixed
This ensures any corrupt optimized images are removed and re-created
2020-08-18 15:37:24 +10:00
Sam Saffron 5be26c7d24
DEV: add error handling in case download fails 2020-08-13 13:48:23 +10:00
Sam Saffron 5011435ec7
DEV: do not correct sha when correctly uploads 2020-08-13 11:52:57 +10:00
Sam Saffron f48fa30ecd
DEV: fix_missing_s3 attempts to re-download if missing
unverified uploads get re-downloaded and corrected if they exist in the task
2020-08-13 11:22:14 +10:00
David Taylor 451b9b245f
DEV: Rename new upload rake tasks
These tasks are s3-specific, so update the names to make that very clear
2020-08-12 23:26:13 +01:00
Sam Saffron ec173a72d9
DEV: add more counts to analyze missing 2020-08-12 17:28:58 +10:00
Sam Saffron 97fe68e980
DEV: look for avatars when analyzing missing uploads 2020-08-12 14:04:21 +10:00
Sam Saffron 990cd8c2e4
FEATURE: introduce tasks for dealing with legacy broken uploads
`rake uploads:analyze_missing` can be used get rich information regarding
uploads missing from s3 (where verified is false)

`rake uploads:fix_missing` is a work in progress task for automatically
correcting certain historic issues. At the moment it simply rebakes all
posts with missing uploads, but it will improve over time
2020-08-12 13:33:04 +10:00
Michael K Johnson 81e6bc7a0f
FEATURE: Add uploads:batch_migrate_from_s3 task to limit total posts migrated at once (#9933)
Allow limiting the number of migrations to do at once, both to do migrations that
have impact limited to multiple off-peak usage hours to reduce user impact from
a migration, and to allow tests that do only a very small number for test
purposes. ("Give me a ping, Vasili. One ping only, please.")
2020-06-04 09:48:11 +10:00
Martin Brennan 6f978bc95c
FIX: First pass to improve efficiency of secure uploads rake task (#9284)
Get rid of harmful each loop over uploads to update. Instead we put all the unique access control posts for the uploads into a map for fast access (vs using the slow .find through array) and look up the post when it is needed when looping through the uploads in batches.

On a Discourse instance with ~93k uploads, a simplified version of the old method takes > 1 minute, and a simplified version of the new method takes ~18s and uses a lot less memory.
2020-03-26 15:59:57 +10:00
Martin Brennan efd5fb665b
DEV: Fix flaky time sensitive uploads.rake specs (#9283)
Also fix issues in spec where certain uploads were not considered secure
2020-03-26 13:31:39 +10:00
Martin Brennan 0388653a4d
DEV: Upload and secure media retroactive rake task improvements (#9027)
* Add uploads:sync_s3_acls rake task to ensure the ACLs in S3 are the correct (public-read or private) setting based on upload security

* Improved uploads:disable_secure_media to be more efficient and provide better messages to the user.

* Rename uploads:ensure_correct_acl task to uploads:secure_upload_analyse_and_update as it does more than check the ACL

* Many improvements to uploads:secure_upload_analyse_and_update

* Make sure that upload.access_control_post is unscoped so deleted posts are still fetched, because they still affect the security of the upload.

* Add escape hatch for capture_stdout in the form of RAILS_ENABLE_TEST_STDOUT. If provided the capture_stdout code will be ignored, so you can see the output if you need.
2020-03-03 10:03:58 +11:00
Martin Brennan cfd56e9159 Include access control post when loading uploads in rake task
* to avoid N+1 query
2020-02-18 10:35:15 +10:00
Martin Brennan 9dcc454a07
FIX: Improvements and fixes for update_upload_acl rake task (#8980)
The rake task was broken, because the addition of the
UploadSecurity check returned true/false instead of the
upload ID to determine which uploads to set secure.
Also it was rebaking the posts in the wrong place and
pretty inefficiently at that. Also it was rebaking before
the upload was being changed to secure in the DB.
This also updates the task to set the access_control_post_id
for all uploads. the first post the upload is linked to is used
for the access control. if the upload doesn't get changed to
secure this doesn't affect anything.
Added a spec for the rake task to cover common cases.
2020-02-17 14:21:43 +10:00
Gerhard Schlager 4e8be6f18b FIX: uploads:s3_migration_status rake task was broken 2020-01-28 22:10:25 +01:00
Martin Brennan 7c32411881
FEATURE: Secure media allowing duplicated uploads with category-level privacy and post-based access rules (#8664)
### General Changes and Duplication

* We now consider a post `with_secure_media?` if it is in a read-restricted category.
* When uploading we now set an upload's secure status straight away.
* When uploading if `SiteSetting.secure_media` is enabled, we do not check to see if the upload already exists using the `sha1` digest of the upload. The `sha1` column of the upload is filled with a `SecureRandom.hex(20)` value which is the same length as `Upload::SHA1_LENGTH`. The `original_sha1` column is filled with the _real_ sha1 digest of the file. 
* Whether an upload `should_be_secure?` is now determined by whether the `access_control_post` is `with_secure_media?` (if there is no access control post then we leave the secure status as is).
* When serializing the upload, we now cook the URL if the upload is secure. This is so it shows up correctly in the composer preview, because we set secure status on upload.

### Viewing Secure Media

* The secure-media-upload URL will take the post that the upload is attached to into account via `Guardian.can_see?` for access permissions
* If there is no `access_control_post` then we just deliver the media. This should be a rare occurrance and shouldn't cause issues as the `access_control_post` is set when `link_post_uploads` is called via `CookedPostProcessor`

### Removed

We no longer do any of these because we do not reuse uploads by sha1 if secure media is enabled.

* We no longer have a way to prevent cross-posting of a secure upload from a private context to a public context.
* We no longer have to set `secure: false` for uploads when uploading for a theme component.
2020-01-16 13:50:27 +10:00
Gerhard Schlager e474cda321 REFACTOR: Restoring of backups and migration of uploads to S3 2020-01-14 11:41:35 +01:00
Martin Brennan abca91cc4d
FEATURE: Add rake task to disable secure media (#8669)
* Add a rake task to disable secure media. This sets all uploads to `secure: false`, changes the upload ACL to public, and rebakes all the posts using the uploads to make sure they point to the correct URLs. This is in a transaction for each upload with the upload being updated the last step, so if the task fails it can be resumed.
* Also allow viewing media via the secure url if secure media is disabled, redirecting to the normal CDN url, because otherwise media links will be broken while we go and rebake all the posts + update ACLs
2020-01-07 12:27:24 +10:00
David Taylor 46d8fd3831 FIX: Allow for nil upload record when migrating to S3 2019-12-04 15:13:39 +00:00
Penar Musaraj 102909edb3 FEATURE: Add support for secure media (#7888)
This PR introduces a new secure media setting. When enabled, it prevent unathorized access to media uploads (files of type image, video and audio). When the `login_required` setting is enabled, then all media uploads will be protected from unauthorized (anonymous) access. When `login_required`is disabled, only media in private messages will be protected from unauthorized access. 

A few notes: 

- the `prevent_anons_from_downloading_files` setting no longer applies to audio and video uploads
- the `secure_media` setting can only be enabled if S3 uploads are already enabled and configured
- upload records have a new column, `secure`, which is a boolean `true/false` of the upload's secure status
- when creating a public post with an upload that has already been uploaded and is marked as secure, the post creator will raise an error
- when enabling or disabling the setting on a site with existing uploads, the rake task `uploads:ensure_correct_acl` should be used to update all uploads' secure status and their ACL on S3
2019-11-18 11:25:42 +10:00
Krzysztof Kotlarek 427d54b2b0 DEV: Upgrading Discourse to Zeitwerk (#8098)
Zeitwerk simplifies working with dependencies in dev and makes it easier reloading class chains. 

We no longer need to use Rails "require_dependency" anywhere and instead can just use standard 
Ruby patterns to require files.

This is a far reaching change and we expect some followups here.
2019-10-02 14:01:53 +10:00
Michael Brown 503a11cc88 FIX: inline_uploads and subfolder (#8076)
* FIX: inline_uploads and subfolder

* if subfolder, also look for images with a path containing
  cdn_url + relative_url_root

* FIX: migrate_to_s3 task and subfolder
2019-09-11 11:50:48 +10:00
Vinoth Kannan e44d56e4d2 DEV: raise error only when 'STOP_ON_ERROR' env variable is available. 2019-08-01 23:54:06 +05:30
Guo Xiang Tan d21594f4f7 Revert changes added by mistake in 2b19e2acc8. 2019-06-25 15:25:12 +08:00
Guo Xiang Tan 2b19e2acc8 Fix typo in a0aeabbb94. 2019-06-25 15:18:57 +08:00