Commit Graph

100 Commits

Author SHA1 Message Date
jbrw ea5d303a07
DEV: Increase timeout when pulling hotlinked image (#18036)
We observed that some sites seemingly put us in a tarpit when we attempt to pull hotlinked images. Increasing the timeout will help in these situations.
2022-08-23 09:38:10 -04:00
David Taylor 5238f6788c
FEATURE: Allow hotlinked media to be blocked (#16940)
This commit introduces a new site setting: `block_hotlinked_media`. When enabled, all attempts to hotlink media (images, videos, and audio) will fail, and be replaced with a linked placeholder. Exceptions to the rule can be added via `block_hotlinked_media_exceptions`.

`download_remote_image_to_local` can be used alongside this feature. In that case, hotlinked images will be blocked immediately when the post is created, but will then be replaced with the downloaded version a few seconds later.

This implementation is purely server-side, and does not impact the composer preview.

Technically, there are two stages to this feature:

1. `PrettyText.sanitize_hotlinked_media` is called during `PrettyText.cook`, and whenever new images are introduced by Onebox. It will iterate over all src/srcset attributes in the post HTML and check if they're allowed. If not, the attributes will be removed and replaced with a `data-blocked-hotlinked-src(set)` attribute

2. In the `CookedPostProcessor`, we iterate over all `data-blocked-hotlinked-src(set)` attributes and check whether we have a downloaded version of the media. If yes, we update the src to use the downloaded version. If not, the entire media element is replaced with a placeholder. The placeholder is labelled 'external media', and is a link to the offsite media.
2022-06-07 15:23:04 +01:00
David Taylor bf6f8299a7 FEATURE: Pull hotlinked images immediately after posting
Previously, with the default `editing_grace_period`, hotlinked images were pulled 5 minutes after a post is created. This delay was added to reduce the chance of automated edits clashing with user edits.

This commit refactors things so that we can pull hotlinked images immediately. URLs are immediately updated in the post's `cooked` HTML. The post's raw markdown is updated later, after the `editing_grace_period`.

This involves a number of behind-the-scenes changes including:

- Schedule Jobs::PullHotlinkedImages immediately after Jobs::ProcessPost. Move scheduling to after the `update_column` call to avoid race conditions

- Move raw changes into a separate job, which is delayed until after the ninja-edit window

- Move disable_if_low_on_disk_space logic into the `pull_hotlinked_images` job

- Move raw-parsing/replacing logic into `InlineUpload` so it can be easily be shared between `UpdateHotlinkedRaw` and `PullUserProfileHotlinkedImages`
2022-05-23 14:28:02 +01:00
David Taylor 0baabafa9d DEV: Map already-downloaded hotlinked images in post_process_cooked
Previously this mapping of **cooked** images was only being run for oneboxes. Now it runs for all images, so we can transform hotlinked images without needing to immediately update `raw`
2022-05-23 14:28:02 +01:00
David Taylor c1db968740
DEV: Move hotlinked image information into a dedicated table (#16585)
This will make future changes to the 'pull hotlinked images' system easier. This commit should not introduce any functional change.

For now, the old post_custom_field data is kept in the database. This will be dropped in a future commit.
2022-05-03 13:53:32 +01:00
David Taylor 39ac476db6 FIX: Do not attempt to pull_hotlinked_image for raw_html
raw_html posts (i.e. those which are pulled as part of our comments integration) don't go through our markdown pipeline, so `upload://` URLs are not supported. Running pull_hotlinked_images will break any images in the post.

In future we may add support for pulling hotlinked images in these posts. But for now, disabling it will stop it breaking images.
2022-04-05 16:39:38 +08:00
Bianca Nenciu 6a295ea9e9
DEV: Log more when verbose_upload_logging is enabled (#16177)
A message was logged when download started, but it was not known if a error
during the download.
2022-03-15 23:55:05 +02:00
Bianca Nenciu 1c3c0f04d9
FEATURE: Pull hotlinked images in user bios (#14726) 2021-10-29 17:58:05 +03:00
Dan Ungureanu da2889a7a8
DEV: Add more verbose logging for image uploads (#13270)
Image optimization fails randomly (very rare) without a trace and it is
near impossible to find culprit image, reproduce the issue and attempt
to fix.
2021-06-04 15:13:58 +03:00
Rafael dos Santos Silva 83f332b5a5
FEATURE: Add a site setting to allow emojis to come from an external URL (#12180) 2021-03-02 16:04:16 -03:00
Alan Guo Xiang Tan 0cc178d58b
FIX: Avoid pulling hotlinked images of post that have been deleted. (#11913) 2021-02-03 16:45:07 +11:00
David Taylor cb12a721c4
REFACTOR: Refactor pull_hotlinked_images job
This commit should cause no functional change
- Split into functions to avoid deep nesting
- Register custom field type, and remove manual json parse/serialize
- Recover from deleted upload records

Also adds a test to ensure pull_hotlinked_images redownloads secure images only once
2020-08-05 12:14:59 +01:00
Krzysztof Kotlarek e0d9232259
FIX: use allowlist and blocklist terminology (#10209)
This is a PR of the renaming whitelist to allowlist and blacklist to the blocklist.
2020-07-27 10:23:54 +10:00
David Taylor 17c4f76eac
FIX: Do not attempt to pull_hotlinked on emoji images when CDN enabled (#10091) 2020-06-19 20:21:05 +01:00
David Taylor a99bb0ded4
Revert "FIX: Do not attempt to pull_hotlinked on emoji images when CDN enabled"
This changed cause plugin spec failures and needs further investigation

This reverts commit 78626d2832.
2020-06-19 14:39:16 +01:00
David Taylor 9f2e7e4651
FIX: Handle invalid URLs gracefully when pulling hotlinked images 2020-06-19 12:52:51 +01:00
David Taylor 78626d2832
FIX: Do not attempt to pull_hotlinked on emoji images when CDN enabled 2020-06-19 12:45:06 +01:00
David Taylor ecfce93f28
FIX: Support IRIs (unicode URIs) when pulling hotlinked images (#9928) 2020-05-29 17:47:05 +01:00
David Taylor 28f46c171c
FIX: Pull hotlinked images even when edited by system users (#9890)
Previously the pull hotlinked images job was skipped after system edits. This ensured that we never had an infinite loop of system-edit/pull-hotlinked/system-edit/pull-hotlinked etc.

A side effect was that edits made by system for any other reason (e.g. API, removing full quotes) would prevent pulling hotlinked images. This commit removes the system edit check, and replaces it with another method to avoid an infinite job scheduling loop.
2020-05-29 13:07:47 +01:00
Krzysztof Kotlarek 9bff0882c3
FEATURE: Nokogumbo (#9577)
* FEATURE: Nokogumbo

Use Nokogumbo HTML parser.
2020-05-05 13:46:57 +10:00
Vinoth Kannan 5774107a2d
FIX: downloaded image URLs incorrectly replaced in post raw. (#9014)
Previously, while replacing the downloaded image URL `http://wiki.mozilla.org/images/2/2e/Longcat1.png` similar non-image URL `http://wiki.mozilla.org/images/2` was replaced wrongly.
2020-02-27 10:22:55 +05:30
Martin Brennan ab3bda6cd0
FIX: Mitigate issue where legacy pre-secure hotlinked media would not be redownloaded (#8802)
Basically, say you had already downloaded a certain image from a certain URL
using pull_hotlinked_images and the onebox. The upload would be stored
by its sha as an upload record. Whenever you linked to the same URL again
in a post (e.g. in our case an og:image on review.discourse) we would
would reuse the original upload record because of the sha1.

However when you turned on secure media this could cause problems as
the first post that uses that upload after secure media is enabled
will set the access control post for the upload to the new post.
Then if the post is deleted every single onebox/link to that same image
URL will fail forever with 403 as the secure-media-uploads URL fails
if the access control post has been deleted.

To fix this when cooking posts and pulling hotlinked images, we only
allow using an original upload by URL if its access control post
matches the current post, and if the original_sha1 is filled in,
meaning it was uploaded AFTER secure media was enabled. otherwise
we just redownload the media again to be safe, as the URL will always
be new then.
2020-01-29 10:11:38 +10:00
Martin Brennan 45b37a8bd1
FIX: Resolve pull hotlinked image and broken link issues for secure media URLs (#8777)
When pull_hotlinked_images tried to run on posts with secure media (which had already been downloaded from external sources) we were getting a 404 when trying to download the image because the secure endpoint doesn't allow anon downloads.

Also, we were getting into an infinite loop of pull_hotlinked_images because the job didn't consider the secure media URLs as "downloaded" already so it kept trying to download them over and over.

In this PR I have also refactored secure-media-upload URL checks and mutations into single source of truth in Upload, adding a SECURE_MEDIA_ROUTE constant to check URLs against too.
2020-01-24 11:59:30 +10:00
Krzysztof Kotlarek 427d54b2b0 DEV: Upgrading Discourse to Zeitwerk (#8098)
Zeitwerk simplifies working with dependencies in dev and makes it easier reloading class chains. 

We no longer need to use Rails "require_dependency" anywhere and instead can just use standard 
Ruby patterns to require files.

This is a far reaching change and we expect some followups here.
2019-10-02 14:01:53 +10:00
David Taylor 67a98946b8 FIX: Do not log 'pull_hotlinked_images' edits in the staff action log 2019-09-12 15:55:45 +01:00
Guo Xiang Tan 8deaef3872 FIX: Don't replace img tags within anchor tags with markdown format.
Follow up to 9a25b0d614.
2019-06-21 12:32:02 +08:00
Guo Xiang Tan c9db897777 FIX: Remove onebox src from `Jobs::PullHotlinkedImages`.
The test that was added is incorrect because the post was not cooked.
2019-06-14 09:21:25 +08:00
Guo Xiang Tan 9daed05ad0 Fix the build. 2019-06-13 13:53:43 +08:00
Guo Xiang Tan f0846ea7cf DEV: Remove unused line. 2019-06-12 17:38:30 +08:00
Guo Xiang Tan fb0a655e8a FEATURE: Update pull hotlinked images to use `Upload#short_url`. 2019-06-11 15:17:29 +08:00
David Taylor 54afa314fb FIX: Do not download emojis in pull_hotlinked_images 2019-06-07 13:00:52 +01:00
Guo Xiang Tan ca6c919299 DEV: Remove unused variable.
Follow up to df1e6eed5a07f49eeda4c0bbd8c63d539aefdb3b..
2019-05-23 16:42:42 +08:00
Guo Xiang Tan df1e6eed5a FIX: Pull hotlinked images for lightbox links as well. 2019-05-23 15:44:37 +08:00
Sam Saffron 30990006a9 DEV: enable frozen string literal on all files
This reduces chances of errors where consumers of strings mutate inputs
and reduces memory usage of the app.

Test suite passes now, but there may be some stuff left, so we will run
a few sites on a branch prior to merging
2019-05-13 09:31:32 +08:00
David Taylor 95d5819218 FIX: Re-download hotlinked optimized images (#7249)
* FIX: Download local images, even if download remote is disabled
2019-03-27 21:31:12 +01:00
Guo Xiang Tan b6a139b581 Fix broken spec. 2018-09-06 12:41:43 +08:00
Guo Xiang Tan 16c0ebe8a8 Fix the build. 2018-08-17 16:53:07 +08:00
Régis Hanol de92913bf4 FIX: store the topic links using the cooked upload url 2018-08-14 12:23:32 +02:00
Guo Xiang Tan ad5082d969 Make rubocop happy again. 2018-06-07 13:28:18 +08:00
Guo Xiang Tan 142571bba0 Remove use of `rescue nil`.
* `rescue nil` is a really bad pattern to use in our code base.
  We should rescue errors that we expect the code to throw and
  not rescue everything because we're unsure of what errors the
  code would throw. This would reduce the amount of pain we face
  when debugging why something isn't working as expexted. I've
  been bitten countless of times by errors being swallowed as a
  result during debugging sessions.
2018-04-02 13:52:51 +08:00
Guo Xiang Tan 90f91bf017 Fix regression due to ee69d58a59. 2018-03-29 10:01:29 +08:00
Guo Xiang Tan ee69d58a59 FIX: Tests could get stucked in infinite loop if it fails to resolve IP of a hostname. 2018-03-28 14:49:05 +08:00
Guo Xiang Tan 347e4eadbc Don't retry trying to download a file in test. 2018-03-28 12:54:11 +08:00
Régis Hanol 678e28794a FIX: properly handle too large & broken images in posts 2017-11-16 15:45:07 +01:00
Sam 9c22c68d39 FIX: only save custom fields if they actually change 2017-11-16 15:14:10 +11:00
Vinoth Kannan 7b494a65c9 NEW: large image placeholder added in cooked html (#5291) 2017-11-15 11:30:47 +01:00
Régis Hanol c838f43a75 let's not generate an error when logging errors... 2017-10-18 23:14:13 +02:00
Régis Hanol f7282e4ecd use force_https site setting when adding scheme for downloading schemaless images locally 2017-10-12 00:06:24 +02:00
Régis Hanol 4e78abb537 let's try 3 times to download images locally 2017-10-11 23:11:44 +02:00
Sam 70bb2aa426 FEATURE: allow specifying s3 config via globals
This refactors handling of s3 so it can be specified via GlobalSetting

This means that in a multisite environment you can configure s3 uploads
without actual sites knowing credentials in s3

It is a critical setting for situations where assets are mirrored to s3.
2017-10-06 16:20:01 +11:00