This commit renames all secure_media related settings to secure_uploads_* along with the associated functionality.
This is being done because "media" does not really cover it, we aren't just doing this for images and videos etc. but for all uploads in the site.
Additionally, in future we want to secure more types of uploads, and enable a kind of "mixed mode" where some uploads are secure and some are not, so keeping media in the name is just confusing.
This also keeps compatibility with the `secure-media-uploads` path, and changes new
secure URLs to be `secure-uploads`.
Deprecated settings:
* secure_media -> secure_uploads
* secure_media_allow_embed_images_in_emails -> secure_uploads_allow_embed_images_in_emails
* secure_media_max_email_embed_image_size_kb -> secure_uploads_max_email_embed_image_size_kb
This is a much better description of its function. It performs idempotent normalization of a URL. If consumers truly need to `encode` a URL (including double-encoding of existing encoded entities), they can use the existing `.encode` method.
normalized_encode in addressable has a number of issues, including https://github.com/sporkmonger/addressable/issues/472
To temporaily work around those issues for the majority of cases, we try parsing with `::URI`. If that fails (e.g. due to non-ascii characters) then we will fall back to addressable.
Hopefully we can simplify this back to `Addressable::URI.normalized_encode` in the future.
This commit also adds support for unicode domain names and emoji domain names with escape_uri.
This removes an unneeded hack checking for pre-signed urls, which are now handled by the general case due to starting off valid and only being minimally normalized. Previous test case continues to pass.
UrlHelper.s3_presigned_url? which was somewhat wide was removed.
This reverts a part of changes introduced by https://github.com/discourse/discourse/pull/13947
In that PR I:
1. Disallowed topic feature links for TL-0 users
2. Additionally, disallowed just putting any URL in topic titles for TL-0 users
Actually, we don't need the second part. It introduced unnecessary complexity for no good reason. In fact, it tries to do the job that anti-spam plugins (like Akismet plugin) should be doing.
This PR reverts this second change.
This disallows putting URLs in topic titles for TL0 users, which means that:
If a TL-0 user puts a link into the title, a topic featured link won't be generated (as if it was disabled in the site settings)
Server methods for creating and updating topics will be refusing featured links when they are called by TL-0 users
TL-0 users won't be able to put any link into the topic title. For example, the title "Hey, take a look at https://my-site.com" will be rejected.
Also, it improves a bit server behavior when creating or updating feature links on topics in the categories with disabled featured links. Before the server just silently ignored a featured link field that was passed to him, now it will be returning 422 response.
Over the years we accrued many spelling mistakes in the code base.
This PR attempts to fix spelling mistakes and typos in all areas of the code that are extremely safe to change
- comments
- test descriptions
- other low risk areas
- Only initialize the S3Helper when needed
- Skip initializing the S3Helper for S3Store#cdn_url
- Allow cook_url to be passed a `local` hint to skip unnecessary checks
These are a few small tweaks that slightly improve performance.
- we omitted 1 query from the post guardian which could cause an N+1
- cook_url has been sped up a bit
- url helper avoids re-creating sets for no reason
* DEV: More robust processing of URLs
The previous `UrlHelper.encode_component(CGI.unescapeHTML(UrlHelper.unencode(uri))` method would naively process URLs, which could result in a badly formed response.
`Addressable::URI.normalized_encode(uri)` appears to deal with these edge-cases in a more robust way.
* DEV: onebox should use UrlHelper
* DEV: fix spec
* DEV: Escape output when rendering local links
DEV: Replace instances of Discourse.base_uri with Discourse.base_path
This is clearer because the base_uri is actually just a path prefix. This continues the work started in 555f467.
Upload.secure_media_url? raised an exceptions when the URL was invalid,
which was a issue in some situations where secure media URLs must be
removed.
For example, sending digests used PrettyText.strip_secure_media,
which used Upload.secure_media_url? to replace secure media with
placeholders. If the URL was invalid, then an exception would be raised
and left unhandled.
Now instead in UrlHelper.rails_route_from_url we return nil if there is something wrong with the URL.
Co-authored-by: Bianca Nenciu <nenciu.bianca@gmail.com>
When FinalDestination is given a URL it encodes it before doing anything else. however S3 presigned URLs should not be messed with in any way otherwise we can end up with 400 errors when downloading the URL e.g.
<Error><Code>InvalidToken</Code><Message>The provided token is malformed or otherwise invalid.</Message>
The signature of presigned URLs is very important and is automatically generated and should be preserved.
When pull_hotlinked_images tried to run on posts with secure media (which had already been downloaded from external sources) we were getting a 404 when trying to download the image because the secure endpoint doesn't allow anon downloads.
Also, we were getting into an infinite loop of pull_hotlinked_images because the job didn't consider the secure media URLs as "downloaded" already so it kept trying to download them over and over.
In this PR I have also refactored secure-media-upload URL checks and mutations into single source of truth in Upload, adding a SECURE_MEDIA_ROUTE constant to check URLs against too.
The following methods have long been deprecated in ruby due to flaws in their implementation per http://blade.nagaokaut.ac.jp/cgi-bin/vframe.rb/ruby/ruby-core/29293?29179-31097:
URI.escape
URI.unescape
URI.encode
URI.unencode
escape/encode are just aliases for one another. This PR uses the Addressable gem to replace these methods with its own encode, unencode, and encode_component methods where appropriate.
I have put all references to Addressable::URI here into the UrlHelper to keep them corralled in one place to make changes to this implementation easier.
Addressable is now also an explicit gem dependency.
There was an issue on dev where when uploading secure media, the href of the media was correctly being replaced in the CookedPostProcessor, but the srcset urls were not being replaced correctly. This is because UrlHelper.cook_url was returning the asset host URL for the media for secure media instead of returning early with the proxied secure proxy url.
This PR introduces a new secure media setting. When enabled, it prevent unathorized access to media uploads (files of type image, video and audio). When the `login_required` setting is enabled, then all media uploads will be protected from unauthorized (anonymous) access. When `login_required`is disabled, only media in private messages will be protected from unauthorized access.
A few notes:
- the `prevent_anons_from_downloading_files` setting no longer applies to audio and video uploads
- the `secure_media` setting can only be enabled if S3 uploads are already enabled and configured
- upload records have a new column, `secure`, which is a boolean `true/false` of the upload's secure status
- when creating a public post with an upload that has already been uploaded and is marked as secure, the post creator will raise an error
- when enabling or disabling the setting on a site with existing uploads, the rake task `uploads:ensure_correct_acl` should be used to update all uploads' secure status and their ACL on S3
This reduces chances of errors where consumers of strings mutate inputs
and reduces memory usage of the app.
Test suite passes now, but there may be some stuff left, so we will run
a few sites on a branch prior to merging
When the S3 store was enabled, we were only applying the S3 CDN.
So all images stored locally, like the emojis, were never put on the local CDN.
Fixed a bunch of CookedPostProcessor test by adding a call to 'optimize_urls'
in order to get final URLs.
I also removed the unnecessary PrettyText.add_s3_cdn method since this is already
handled in the CookedPostProcessor.
Some URLs in browsers are non compliant and contain twos `#` this commit adds
special handling for this edge case by auto encoding any fragments containing `#`
Previously we changed all CDN links to schemaless.
This is desirable for non HTTPS sites, to ease migration to HTTPS.
It is not desirable for secure sites.
Once site is secure or CDN is secure a rebake should be required
to move it back to non-secure.
- new hidden site setting 'migrate_to_new_scheme' (defaults to false)
- new rake tasks to toggle migration to new scheme
- FIX: migrate_to_new_scheme also works with CDN
- PERF: improve perf of the DbHelper.remap method
- REFACTOR: UrlHelper is now a class