discourse

Commit Graph

Author	SHA1	Message	Date
David Taylor	5238f6788c	FEATURE: Allow hotlinked media to be blocked (#16940) This commit introduces a new site setting: `block_hotlinked_media`. When enabled, all attempts to hotlink media (images, videos, and audio) will fail, and be replaced with a linked placeholder. Exceptions to the rule can be added via `block_hotlinked_media_exceptions`. `download_remote_image_to_local` can be used alongside this feature. In that case, hotlinked images will be blocked immediately when the post is created, but will then be replaced with the downloaded version a few seconds later. This implementation is purely server-side, and does not impact the composer preview. Technically, there are two stages to this feature: 1. `PrettyText.sanitize_hotlinked_media` is called during `PrettyText.cook`, and whenever new images are introduced by Onebox. It will iterate over all src/srcset attributes in the post HTML and check if they're allowed. If not, the attributes will be removed and replaced with a `data-blocked-hotlinked-src(set)` attribute. 2. In the `CookedPostProcessor`, we iterate over all `data-blocked-hotlinked-src(set)` attributes and check whether we have a downloaded version of the media. If yes, we update the src to use the downloaded version. If not, the entire media element is replaced with a placeholder. The placeholder is labelled 'external media', and is a link to the offsite media.	2022-06-07 15:23:04 +01:00
David Taylor	bf6f8299a7	FEATURE: Pull hotlinked images immediately after posting Previously, with the default `editing_grace_period`, hotlinked images were pulled 5 minutes after a post is created. This delay was added to reduce the chance of automated edits clashing with user edits. This commit refactors things so that we can pull hotlinked images immediately. URLs are immediately updated in the post's `cooked` HTML. The post's raw markdown is updated later, after the `editing_grace_period`. This involves a number of behind-the-scenes changes including: - Schedule Jobs::PullHotlinkedImages immediately after Jobs::ProcessPost. Move scheduling to after the `update_column` call to avoid race conditions - Move raw changes into a separate job, which is delayed until after the ninja-edit window - Move disable_if_low_on_disk_space logic into the `pull_hotlinked_images` job - Move raw-parsing/replacing logic into `InlineUpload` so it can easily be shared between `UpdateHotlinkedRaw` and `PullUserProfileHotlinkedImages`	2022-05-23 14:28:02 +01:00
David Taylor	0baabafa9d	DEV: Map already-downloaded hotlinked images in post_process_cooked Previously this mapping of cooked images was only being run for oneboxes. Now it runs for all images, so we can transform hotlinked images without needing to immediately update `raw`	2022-05-23 14:28:02 +01:00
David Taylor	c1db968740	DEV: Move hotlinked image information into a dedicated table (#16585) This will make future changes to the 'pull hotlinked images' system easier. This commit should not introduce any functional change. For now, the old post_custom_field data is kept in the database. This will be dropped in a future commit.	2022-05-03 13:53:32 +01:00
David Taylor	39ac476db6	FIX: Do not attempt to pull_hotlinked_image for raw_html raw_html posts (i.e. those which are pulled as part of our comments integration) don't go through our markdown pipeline, so `upload://` URLs are not supported. Running pull_hotlinked_images will break any images in the post. In future we may add support for pulling hotlinked images in these posts. But for now, disabling it will stop it breaking images.	2022-04-05 16:39:38 +08:00
Bianca Nenciu	6a295ea9e9	DEV: Log more when verbose_upload_logging is enabled (#16177) A message was logged when the download started, but it was not known whether an error occurred during the download.	2022-03-15 23:55:05 +02:00
Bianca Nenciu	1c3c0f04d9	FEATURE: Pull hotlinked images in user bios (#14726)	2021-10-29 17:58:05 +03:00
Dan Ungureanu	da2889a7a8	DEV: Add more verbose logging for image uploads (#13270) Image optimization fails randomly (very rarely) without a trace, and it is nearly impossible to find the culprit image, reproduce the issue and attempt to fix it.	2021-06-04 15:13:58 +03:00
Rafael dos Santos Silva	83f332b5a5	FEATURE: Add a site setting to allow emojis to come from an external URL (#12180)	2021-03-02 16:04:16 -03:00
Alan Guo Xiang Tan	0cc178d58b	FIX: Avoid pulling hotlinked images of posts that have been deleted. (#11913)	2021-02-03 16:45:07 +11:00
David Taylor	cb12a721c4	REFACTOR: Refactor pull_hotlinked_images job This commit should cause no functional change - Split into functions to avoid deep nesting - Register custom field type, and remove manual json parse/serialize - Recover from deleted upload records Also adds a test to ensure pull_hotlinked_images redownloads secure images only once	2020-08-05 12:14:59 +01:00
Krzysztof Kotlarek	e0d9232259	FIX: use allowlist and blocklist terminology (#10209) This PR renames whitelist to allowlist and blacklist to blocklist.	2020-07-27 10:23:54 +10:00
David Taylor	17c4f76eac	FIX: Do not attempt to pull_hotlinked on emoji images when CDN enabled (#10091)	2020-06-19 20:21:05 +01:00
David Taylor	a99bb0ded4	Revert "FIX: Do not attempt to pull_hotlinked on emoji images when CDN enabled" This change caused plugin spec failures and needs further investigation. This reverts commit `78626d2832`.	2020-06-19 14:39:16 +01:00
David Taylor	9f2e7e4651	FIX: Handle invalid URLs gracefully when pulling hotlinked images	2020-06-19 12:52:51 +01:00
David Taylor	78626d2832	FIX: Do not attempt to pull_hotlinked on emoji images when CDN enabled	2020-06-19 12:45:06 +01:00
David Taylor	ecfce93f28	FIX: Support IRIs (unicode URIs) when pulling hotlinked images (#9928)	2020-05-29 17:47:05 +01:00
David Taylor	28f46c171c	FIX: Pull hotlinked images even when edited by system users (#9890) Previously the pull hotlinked images job was skipped after system edits. This ensured that we never had an infinite loop of system-edit/pull-hotlinked/system-edit/pull-hotlinked etc. A side effect was that edits made by system for any other reason (e.g. API, removing full quotes) would prevent pulling hotlinked images. This commit removes the system edit check, and replaces it with another method to avoid an infinite job scheduling loop.	2020-05-29 13:07:47 +01:00
Krzysztof Kotlarek	9bff0882c3	FEATURE: Nokogumbo (#9577) * FEATURE: Nokogumbo Use Nokogumbo HTML parser.	2020-05-05 13:46:57 +10:00
Vinoth Kannan	5774107a2d	FIX: downloaded image URLs incorrectly replaced in post raw. (#9014) Previously, while replacing the downloaded image URL `http://wiki.mozilla.org/images/2/2e/Longcat1.png`, the similar non-image URL `http://wiki.mozilla.org/images/2` was wrongly replaced.	2020-02-27 10:22:55 +05:30
Martin Brennan	ab3bda6cd0	FIX: Mitigate issue where legacy pre-secure hotlinked media would not be redownloaded (#8802) Basically, say you had already downloaded a certain image from a certain URL using pull_hotlinked_images and the onebox. The upload would be stored by its sha as an upload record. Whenever you linked to the same URL again in a post (e.g. in our case an og:image on review.discourse) we would reuse the original upload record because of the sha1. However when you turned on secure media this could cause problems as the first post that uses that upload after secure media is enabled will set the access control post for the upload to the new post. Then if the post is deleted every single onebox/link to that same image URL will fail forever with 403 as the secure-media-uploads URL fails if the access control post has been deleted. To fix this when cooking posts and pulling hotlinked images, we only allow using an original upload by URL if its access control post matches the current post, and if the original_sha1 is filled in, meaning it was uploaded AFTER secure media was enabled. Otherwise we just redownload the media again to be safe, as the URL will always be new then.	2020-01-29 10:11:38 +10:00
Martin Brennan	45b37a8bd1	FIX: Resolve pull hotlinked image and broken link issues for secure media URLs (#8777) When pull_hotlinked_images tried to run on posts with secure media (which had already been downloaded from external sources) we were getting a 404 when trying to download the image because the secure endpoint doesn't allow anon downloads. Also, we were getting into an infinite loop of pull_hotlinked_images because the job didn't consider the secure media URLs as "downloaded" already so it kept trying to download them over and over. In this PR I have also refactored secure-media-upload URL checks and mutations into single source of truth in Upload, adding a SECURE_MEDIA_ROUTE constant to check URLs against too.	2020-01-24 11:59:30 +10:00
Krzysztof Kotlarek	427d54b2b0	DEV: Upgrading Discourse to Zeitwerk (#8098) Zeitwerk simplifies working with dependencies in dev and makes it easier to reload class chains. We no longer need to use Rails "require_dependency" anywhere and instead can just use standard Ruby patterns to require files. This is a far-reaching change and we expect some followups here.	2019-10-02 14:01:53 +10:00
David Taylor	67a98946b8	FIX: Do not log 'pull_hotlinked_images' edits in the staff action log	2019-09-12 15:55:45 +01:00
Guo Xiang Tan	8deaef3872	FIX: Don't replace img tags within anchor tags with markdown format. Follow up to `9a25b0d614`.	2019-06-21 12:32:02 +08:00
Guo Xiang Tan	c9db897777	FIX: Remove onebox src from `Jobs::PullHotlinkedImages`. The test that was added is incorrect because the post was not cooked.	2019-06-14 09:21:25 +08:00
Guo Xiang Tan	9daed05ad0	Fix the build.	2019-06-13 13:53:43 +08:00
Guo Xiang Tan	f0846ea7cf	DEV: Remove unused line.	2019-06-12 17:38:30 +08:00
Guo Xiang Tan	fb0a655e8a	FEATURE: Update pull hotlinked images to use `Upload#short_url`.	2019-06-11 15:17:29 +08:00
David Taylor	54afa314fb	FIX: Do not download emojis in pull_hotlinked_images	2019-06-07 13:00:52 +01:00
Guo Xiang Tan	ca6c919299	DEV: Remove unused variable. Follow up to df1e6eed5a07f49eeda4c0bbd8c63d539aefdb3b.	2019-05-23 16:42:42 +08:00
Guo Xiang Tan	df1e6eed5a	FIX: Pull hotlinked images for lightbox links as well.	2019-05-23 15:44:37 +08:00
Sam Saffron	30990006a9	DEV: enable frozen string literal on all files This reduces chances of errors where consumers of strings mutate inputs and reduces memory usage of the app. Test suite passes now, but there may be some stuff left, so we will run a few sites on a branch prior to merging	2019-05-13 09:31:32 +08:00
David Taylor	95d5819218	FIX: Re-download hotlinked optimized images (#7249) * FIX: Download local images, even if download remote is disabled	2019-03-27 21:31:12 +01:00
Guo Xiang Tan	b6a139b581	Fix broken spec.	2018-09-06 12:41:43 +08:00
Guo Xiang Tan	16c0ebe8a8	Fix the build.	2018-08-17 16:53:07 +08:00
Régis Hanol	de92913bf4	FIX: store the topic links using the cooked upload url	2018-08-14 12:23:32 +02:00
Guo Xiang Tan	ad5082d969	Make rubocop happy again.	2018-06-07 13:28:18 +08:00
Guo Xiang Tan	142571bba0	Remove use of `rescue nil`. * `rescue nil` is a really bad pattern to use in our code base. We should rescue errors that we expect the code to throw and not rescue everything because we're unsure of what errors the code would throw. This would reduce the amount of pain we face when debugging why something isn't working as expected. I've been bitten countless times by errors being swallowed as a result during debugging sessions.	2018-04-02 13:52:51 +08:00
Guo Xiang Tan	90f91bf017	Fix regression due to `ee69d58a59`.	2018-03-29 10:01:29 +08:00
Guo Xiang Tan	ee69d58a59	FIX: Tests could get stuck in an infinite loop if they fail to resolve the IP of a hostname.	2018-03-28 14:49:05 +08:00
Guo Xiang Tan	347e4eadbc	Don't retry trying to download a file in test.	2018-03-28 12:54:11 +08:00
Régis Hanol	678e28794a	FIX: properly handle too large & broken images in posts	2017-11-16 15:45:07 +01:00
Sam	9c22c68d39	FIX: only save custom fields if they actually change	2017-11-16 15:14:10 +11:00
Vinoth Kannan	7b494a65c9	NEW: large image placeholder added in cooked html (#5291)	2017-11-15 11:30:47 +01:00
Régis Hanol	c838f43a75	let's not generate an error when logging errors...	2017-10-18 23:14:13 +02:00
Régis Hanol	f7282e4ecd	use force_https site setting when adding scheme for downloading schemaless images locally	2017-10-12 00:06:24 +02:00
Régis Hanol	4e78abb537	let's try 3 times to download images locally	2017-10-11 23:11:44 +02:00
Sam	70bb2aa426	FEATURE: allow specifying s3 config via globals This refactors handling of s3 so it can be specified via GlobalSetting. This means that in a multisite environment you can configure s3 uploads without actual sites knowing credentials in s3. It is a critical setting for situations where assets are mirrored to s3.	2017-10-06 16:20:01 +11:00
Sam	8ecf313a81	FIX: correctly raise errors when downloads fail This corrects an issue where we are hitting Gravatar for 404 over and over. Also ensures file download properly reports errors.	2017-09-28 16:35:43 +10:00
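
The two stages described in commit 5238f6788c can be sketched roughly as follows. This is a minimal illustration and not Discourse's implementation: `SITE_HOST`, `EXCEPTIONS`, `hotlink_allowed?` and `post_process` are hypothetical names, the `downloaded` hash stands in for whatever lookup the real code uses, and regular expressions replace a proper HTML parser, so only `<img>` tags with a double-quoted `src` as their first attribute are handled.

```ruby
require "uri"

# Hypothetical stand-ins for the site's own host and for the
# `block_hotlinked_media_exceptions` setting (assumptions, not Discourse API).
SITE_HOST = "forum.example.com"
EXCEPTIONS = ["https://cdn.example.org/"].freeze

# A URL is allowed if it is relative, points at our own host,
# or starts with one of the configured exception prefixes.
def hotlink_allowed?(url)
  uri = URI.parse(url)
  return true if uri.host.nil? || uri.host == SITE_HOST

  EXCEPTIONS.any? { |prefix| url.start_with?(prefix) }
rescue URI::InvalidURIError
  false
end

# Stage 1: rename disallowed src attributes so the browser never loads them.
def sanitize_hotlinked_media(html)
  html.gsub(/(?<=\s)src="([^"]*)"/) do
    url = Regexp.last_match(1)
    hotlink_allowed?(url) ? %(src="#{url}") : %(data-blocked-hotlinked-src="#{url}")
  end
end

# Stage 2: restore src when a downloaded copy is known,
# otherwise replace the whole element with a placeholder link.
def post_process(html, downloaded)
  html.gsub(/<img\s+data-blocked-hotlinked-src="([^"]*)"([^>]*)>/) do
    url = Regexp.last_match(1)
    rest = Regexp.last_match(2)
    local = downloaded[url]
    local ? %(<img src="#{local}"#{rest}>) : %(<a href="#{url}">external media</a>)
  end
end

html = '<p><img src="https://evil.example.net/a.png"> <img src="/uploads/b.png"></p>'
blocked = sanitize_hotlinked_media(html)
puts blocked
puts post_process(blocked, {})
puts post_process(blocked, { "https://evil.example.net/a.png" => "/uploads/a.png" })
```

Keeping the original URL in a `data-` attribute is what lets the second stage either swap in a local copy or link out to the offsite media.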


99 Commits