* DEV: Allow wildcards in Oneboxer optional domain Site Settings
Allows a wildcard to be used as a subdomain on Oneboxer-related SiteSettings, e.g.:
- `force_get_hosts`
- `cache_onebox_response_body_domains`
- `force_custom_user_agent_hosts`
* DEV: fix typos
* FIX: Try doing a GET after receiving a 500 error from a HEAD
By default we try to do a `HEAD` requests. If this results in a 500 error response, we should try to do a `GET`
* DEV: `force_get_hosts` should be a hidden setting
* DEV: Oneboxer Strategies
Have an alternative oneboxing ‘strategy’ (i.e., set of options) to use when an attempt to generate a Onebox fails. Keep track of any non-default strategies that were used on a particular host, and use that strategy for that host in the future.
Initially, the alternate strategy (`force_get_and_ua`) forces the FinalDestination step of Oneboxing to do a `GET` rather than `HEAD`, and forces a custom user agent.
* DEV: change stubbed return code
The stubbed status code needs to be a value not recognized by FinalDestination
CookedPostProcessor used Loofah to parse the cooked content of a post
and Nokogiri to parse cooked Oneboxes. Even though Loofah is built on
top of Nokogiri, replacing an element from the cooked post (a Nokogiri
node) with a parsed onebox (a Loofah node) produced a strange result
which included XML namespaces. Removing the mix and using Loofah
to parse Oneboxes fixed the problem.
* FEATURE: Cache successful HTTP GET requests during Oneboxing
Some oneboxes may fail if when making excessive and/or odd requests against the target domains. This change provides a simple mechanism to cache the results of succesful GET requests as part of the oneboxing process, with the goal of reducing repeated requests and ultimately improving the rate of successful oneboxing.
To enable:
Set `SiteSetting.cache_onebox_response_body` to `true`
Add the domains you’re interesting in caching to `SiteSetting. cache_onebox_response_body_domains` e.g. `example.com|example.org|example.net`
Optionally set `SiteSetting.cache_onebox_user_agent` to a user agent string of your choice to use when making requests against domains in the above list.
* FIX: Swap order of duration and value in redis call
The correct order for `setex` arguments is `key`, `duration`, and `value`.
Duration and value had been flipped, however the code would not have thrown an error because we were caching the value of `1.day.to_i` for a period of 1 seconds… The intention appears to be to set a value of 1 (purely as a flag) for a period of 1 day.
It has been observed that doing a HEAD against an Amazon store URL may result in a 405 error being returned.
Skipping the HEAD request may result in an improved oneboxing experience when requesting these URLs.
`Onebox.preview` can return 0-to-n errors, where the errors are missing OpenGraph attributes (e.g. title, description, image, etc.). If any of these attributes are missing, we construct an error message and attach it to the Oneboxer preview HTML. The error message is something like:
“Sorry, we were unable to generate a preview for this web page, because the following oEmbed / OpenGraph tags could not be found: description, image”
However, if the only missing tag is `image` we don’t need to display the error, as we have enough other data (title, description, etc.) to construct a useful/complete Onebox.
Uses discourse/onebox@ff9ec90
Adds a hidden site setting called disable_onebox_media_download_controls which will add controlslist="nodownload" to video and audio oneboxes, and also to the local video and audio oneboxes within Discourse.
It used to insert block Oneboxes inside paragraphs which resulted in
invalid HTML. This needed an additional parsing for removal of empty
paragraphs and the resulting HTML could still be invalid.
This commit ensure that block Oneboxes are inserted correctly, by
splitting the paragraph containing the link and putting the block
between the two. Paragraphs left with nothing but whitespaces will
be removed.
Follow up to 7f3a30d79f.
* DEV: More robust processing of URLs
The previous `UrlHelper.encode_component(CGI.unescapeHTML(UrlHelper.unencode(uri))` method would naively process URLs, which could result in a badly formed response.
`Addressable::URI.normalized_encode(uri)` appears to deal with these edge-cases in a more robust way.
* DEV: onebox should use UrlHelper
* DEV: fix spec
* DEV: Escape output when rendering local links
* FEATURE: onebox for local categories
This commit adjusts the category onebox to look more like the category boxes do on the category page.
Co-authored-by: Jordan Vidrine <jordan@jordanvidrine.com>
* FEATURE: display error if Oneboxing fails due to HTTP error
- display warning if onebox URL is unresolvable
- display warning if attributes are missing
* FEATURE: Use new Instagram oEmbed endpoint if access token is configured
Instagram requires an Access Token to access their oEmbed endpoint. The requirements (from https://developers.facebook.com/docs/instagram/oembed/) are as follows:
- a Facebook Developer account, which you can create at developers.facebook.com
- a registered Facebook app
- the oEmbed Product added to the app
- an Access Token
- The Facebook app must be in Live Mode
The generated Access Token, once added to SiteSetting.facebook_app_access_token, will be passed to onebox. Onebox can then use this token to access the oEmbed endpoint to generate a onebox for Instagram.
* DEV: update user agent string
* DEV: don’t do HEAD requests against news.yahoo.com
* DEV: Bump onebox version from 2.1.5 to 2.1.6
* DEV: Avoid re-reading templates
* DEV: Tweaks to onebox mustache templates
* DEV: simplified error message for missing onebox data
* Apply suggestions from code review
Co-authored-by: Gerhard Schlager <mail@gerhard-schlager.at>
This commit adds a new site setting "allowed_onebox_iframes". By default, all onebox iframes are allowed. When the list of domains is restricted, Onebox will automatically skip engines which require those domains, and use a fallback engine.
When linking to a topic in the same Discourse, we try to onebox the link to show the title
and other various information depending on whether it's a "standard" or "inline" onebox.
However, we were not properly detecting links to topics that had no slugs (eg. https://meta.discourse.org/t/1234).
Previously if somehow a user created a blank markdown document using tag
tricks (eg `<p></p><p></p><p></p><p></p><p></p><p></p>`) and so on, we would
completely strip the document down to blank on post process due to onebox
hack.
Needs a followup cause I am still unclear about the reason for empty p stripping
and it can cause some unclear cases when we re-cook posts.
For consistency this PR introduces using custom markdown and short upload:// URLs for video and audio uploads, rather than just treating them as links and relying on the oneboxer. The markdown syntax for videos is ![file text|video](upload://123456.mp4) and for audio it is ![file text|audio](upload://123456.mp3).
This is achieved in discourse-markdown-it by modifying the rules for images in mardown-it via md.renderer.rules.image. We return HTML instead of the token when we encounter audio or video after | and the preview renders that HTML. Also when uploading an audio or video file we insert the relevant markdown into the composer.
We already cache failed onebox URL requests client-side, we now want to cache this on the server-side for extra protection. failed onebox previews will be cached for 1 hour, and any more requests for that URL will fail with a 404 status. Forcing a rebake via the Rebake HTML action will delete the failed URL cache (like how the oneboxer preview cache is deleted).
Discourse.cache is a more consistent method to use and offers clean fallback
if you are skipping redis
This is part of a larger change that both optimizes Discoruse.cache and omits
use of setex on $redis in favor of consistently using discourse cache
Bench does reveal that use of Rails.cache and Discourse.cache is 1.25x slower
than redis.setex / get so a re-implementation will follow prior to porting
This PR introduces a new secure media setting. When enabled, it prevent unathorized access to media uploads (files of type image, video and audio). When the `login_required` setting is enabled, then all media uploads will be protected from unauthorized (anonymous) access. When `login_required`is disabled, only media in private messages will be protected from unauthorized access.
A few notes:
- the `prevent_anons_from_downloading_files` setting no longer applies to audio and video uploads
- the `secure_media` setting can only be enabled if S3 uploads are already enabled and configured
- upload records have a new column, `secure`, which is a boolean `true/false` of the upload's secure status
- when creating a public post with an upload that has already been uploaded and is marked as secure, the post creator will raise an error
- when enabling or disabling the setting on a site with existing uploads, the rake task `uploads:ensure_correct_acl` should be used to update all uploads' secure status and their ACL on S3
Zeitwerk simplifies working with dependencies in dev and makes it easier reloading class chains.
We no longer need to use Rails "require_dependency" anywhere and instead can just use standard
Ruby patterns to require files.
This is a far reaching change and we expect some followups here.
This reduces chances of errors where consumers of strings mutate inputs
and reduces memory usage of the app.
Test suite passes now, but there may be some stuff left, so we will run
a few sites on a branch prior to merging
Previously, the site setting was only effective on the client side of
things. Once the site setting was been reached, all oneboxes are not
rendered. This commit changes it such that the site setting is respected
both on the client and server side. The first N oneboxes are rendered and
once the limit has been reached, subsequent oneboxes will not be
rendered.