Commit Graph

134 Commits

Author SHA1 Message Date
Roman Rizzi a709b7e861
FIX: Allow sanitized-HTML in GH issues and categories oneboxes. ()
Follow-up to d78357917c

Related meta topic: https://meta.discourse.org/t/html-is-not-render-on-category-onebox-description/289424:
2024-01-22 15:25:29 -03:00
Martin Brennan 30d5e752d7
DEV: Revert guardian changes ()
I took the wrong approach here, need to rethink.

* Revert "FIX: Use Guardian.basic_user instead of new (anon) ()"

This reverts commit 9057272ee2.

* Revert "DEV: Remove unnecessary method_missing from GuardianUser ()"

This reverts commit a5d4bf6dd2.

* Revert "DEV: Improve Guardian devex ()"

This reverts commit 77b6a038ba.

* Revert "FIX: Introduce Guardian::BasicUser for oneboxing checks ()"

This reverts commit de983796e1.
2023-12-06 16:37:32 +10:00
Martin Brennan 9057272ee2
FIX: Use Guardian.basic_user instead of new (anon) ()
c.f. de983796e1

There will soon be additional login_required checks
for Guardian, and the intent of many checks by automated
systems is better fulfilled by using BasicUser, which
simulates a logged in TL0 forum user, rather than an
anon user.

In some cases the use of anon still makes sense (e.g.
anonymous_cache), and in that case the more explicit
`Guardian.anon_user` is used
2023-12-06 11:56:21 +10:00
Martin Brennan de983796e1
FIX: Introduce Guardian::BasicUser for oneboxing checks ()
Through internal discussion, it has become clear that
we need a conceptual Guardian user that bridges the
gap between anon users and a logged in forum user with
an absolute baseline level of access to public topics,
which can be used in cases where:

1. Automated systems are running which shouldn't see any
   private data
1. A baseline level of user access is needed

In this case we are fixing the latter; when oneboxing a local
topic, and we are linking to a topic in another category from
the current one, we need to operate off a baseline level of
access, since not all users have access to the same categories,
and we don't want e.g. editing a post with an internal link to
expose sensitive internal information.
2023-12-05 09:25:23 +10:00
David Taylor e9387e238c
FIX: Do not follow redirects for twitter oneboxes ()
Twitter is now redirecting anonymous users (with a browser-like user agent, which FinalDestination uses) to the login page. Skipping redirect-following for twitter.com will allow us to continue oneboxing tweets via the OpenGraph data and the API (when credentials are present).

https://meta.discourse.org/t/269371/17
2023-06-30 11:30:03 +01:00
Sam 9e241e82e9
DEV: use HTML5 version of loofah ()
https://meta.discourse.org/t/markdown-preview-and-result-differ/263878

The result of this markdown had different results in the composer preview and the post. This is solved by updating Loofah to the latest version and using html5 fragments like our user had reported. While the change was only needed in cooked_post_processor.rb for this fix, other areas also had to be updated due to various side effects.
2023-06-20 09:49:22 +08:00
Penar Musaraj a67c96438c
UX: Fix user onebox layout () 2023-04-28 09:50:49 -04:00
Gerhard Schlager 7fd63b34b1
DEV: Make it obvious that `joined` translation is used by onebox ()
This also moves the date as interpolation key into the string which makes translation easier.
2023-02-03 10:02:14 +08:00
Bianca Nenciu c186a46910
SECURITY: Prevent XSS in local oneboxes ()
Co-authored-by: OsamaSayegh <asooomaasoooma90@gmail.com>
2023-01-25 19:17:21 +02:00
Daniel Waterworth 666536cbd1
DEV: Prefer \A and \z over ^ and $ in regexes () 2023-01-20 12:52:49 -06:00
David Taylor 6417173082
DEV: Apply syntax_tree formatting to `lib/*` 2023-01-09 12:10:19 +00:00
Martin Brennan 9d50790530
FIX: Allow svg in oneboxer in certain cases ()
When doing local oneboxes we sometimes want to allow
SVGs in the final preview HTML. The main case currently
is for the new cooked hashtags, which include an SVG
icon.

SVGs will be included in local oneboxes via `ExcerptParser` _only_
if they have the d-icon class, and if the caller for `post.excerpt`
specifies the `keep_svg: true` option.
2022-11-30 12:42:15 +10:00
Jarek Radosz 49fa2e93c2
DEV: Clean up twitter onebox code () 2022-08-21 19:26:24 +02:00
David Taylor 3c81683955 DEV: Rename `UriHelper.escape_uri` to `.normalized_encode`
This is a much better description of its function. It performs idempotent normalization of a URL. If consumers truly need to `encode` a URL (including double-encoding of existing encoded entities), they can use the existing `.encode` method.
2022-08-09 11:55:25 +01:00
Martin Brennan 61b9e3ee30
FIX: InlineOneboxer watched word censor error ()
In 7328a2bfb0 we changed the
InlineOneboxer#onebox_for method to run the title of the
onebox through WatchedWord#censor_text. However, it is
allowable for the title to be nil, which was causing this
error in production:

> NoMethodError : undefined method gsub for nil:NilClass

We just need to check whether the title is nil before trying
to censor it.
2022-05-26 14:01:44 +10:00
Bianca Nenciu 6c8f491dc3
DEV: Allow plugins to register Onebox handlers ()
This targets only the local Oneboxes and allows plugins to customize
regular or inline Oneboxes for routes inside the site.
2022-05-23 20:02:02 +03:00
Osama Sayegh d15867463f
FEATURE: Site setting for blocking onebox of URLs that redirect ()
Meta topic: https://meta.discourse.org/t/prevent-to-linkify-when-there-is-a-redirect/226964/2?u=osama.

This commit adds a new site setting `block_onebox_on_redirect` (default off) for blocking oneboxes (full and inline) of URLs that redirect. Note that an initial http → https redirect is still allowed if the redirect location is identical to the source (minus the scheme of course). For example, if a user includes a link to `http://example.com/page` and the link resolves to `https://example.com/page`, then the link will onebox (assuming it can be oneboxed) even if the setting is enabled. The reason for this is a user may type out a URL (i.e. the URL is short and memorizable) with http and since a lot of sites support TLS with http traffic automatically redirected to https, so we should still allow the URL to onebox.
2022-05-23 13:52:06 +03:00
Loïc Guitaut 46176b7dd7 DEV: Don’t patch Sanitize::Config
Currently we’re reopening the `Sanitize::Config` class (which is part of
the `sanitize` gem) to put our custom config for Onebox in it. This is
unnecessary as we can simply create a dedicated module to hold our
custom configuration.
2022-04-06 17:10:51 +02:00
Osama Sayegh b0656f3ed0
FIX: Apply onebox blocked domain checks on every redirect ()
The `blocked onebox domains` setting lets site owners change what sites
are allowed to be oneboxed. When a link is entered into a post,
Discourse checks the domain of the link against that setting and blocks
the onebox if the domain is blocked. But if there's a chain of
redirects, then only the final destination website is checked against
the site setting.

This commit amends that behavior so that every website in the redirect
chain is checked against the site setting, and if anything is blocked
the original link doesn't onebox at all in the post. The
`Discourse-No-Onebox` header is also checked in every response and the
onebox is blocked if the header is set to "1".

Additionally, Discourse will now include the `Discourse-No-Onebox`
header with every response if the site requires login to access content.
This is done to signal to a Discourse instance that it shouldn't attempt
to onebox other Discourse instances if they're login-only. Non-Discourse
websites can also use include that header if they don't wish to have
Discourse onebox their content.

Internal ticket: t59305.
2022-03-11 09:18:12 +03:00
Natalie Tay aac9f43038
Only block domains at the final destination ()
In an earlier PR, we decided that we only want to block a domain if 
the blocked domain in the SiteSetting is the final destination (/t/59305). That 
PR used `FinalDestination#get`. `resolve` however is used several places
 but blocks domains along the redirect chain when certain options are provided.

This commit changes the default options for `resolve` to not do that. Existing
users of `FinalDestination#resolve` are
- `Oneboxer#external_onebox`
- our onebox helper `fetch_html_doc`, which is used in amazon, standard embed 
and youtube
  - these folks already go through `Oneboxer#external_onebox` which already
  blocks correctly
2022-01-31 15:35:12 +08:00
David Taylor 03998e0a29
FIX: Use CDN URL for internal onebox avatars ()
This commit will also trigger a background rebake for all existing posts with internal oneboxes
2021-11-25 12:07:34 +00:00
jbrw 2f28ba318c
FEATURE: Onebox can match engines based on the content_type ()
* FEATURE: Onebox can match engines based on the content_type

`FinalDestination` now returns the `content_type` of a resolved URL.

`Oneboxer` passes this value to `Onebox` itself. Onebox engines can now specify a `matches_content_type` regex of content_types that the engine can handle, regardless of the URL.

`ImageOnebox` will match URLs with a content type of `image/png`, `jpg`, `gif`, `bmp`, `tif`, etc.

This will allow images that exist at a URL without a file type extension to be correctly rendered, assuming a valid `content_type` is returned.
2021-07-30 13:36:30 -04:00
Arpit Jalan 05bdbd9f97
SECURITY: Onebox canonical links bypassing FinalDestination checks () 2021-07-01 20:09:29 +05:30
Joffrey JAFFEUX e50b7e9111
SECURITY: ensures timeouts are correctly used on connect () 2021-06-21 17:34:01 +02:00
Bianca Nenciu d184fe59ca
FEATURE: Censor Oneboxes ()
Previously onebox content was not passed by the censor regex, meaning you could sneak in censored words via onebox.
2021-06-03 11:39:12 +10:00
jbrw 461a2c334b
FIX: return an empty result if response from Amazon is missing expected attributes ()
* FIX: return an empty result if response from Amazon is missing attributes

Check we have the basic attributes requires to construct a Onebox for Amazon.

This is an attempt to handle scenarios where we receive a valid 200-status response from an Amazon request that does not include the data we’re expecting.

* Update lib/onebox/engine/amazon_onebox.rb

Co-authored-by: Régis Hanol <regis@hanol.fr>

Co-authored-by: Régis Hanol <regis@hanol.fr>
2021-06-01 16:23:18 -04:00
jbrw a24b6daa87
FIX: An unresolved blank uri should attempt an alternate Oneboxing strategy, if available () 2021-05-14 15:23:20 -04:00
jbrw 19182b1386
DEV: Oneboxer wildcard subdomains ()
* DEV: Allow wildcards in Oneboxer optional domain Site Settings

Allows a wildcard to be used as a subdomain on Oneboxer-related SiteSettings, e.g.:

- `force_get_hosts`
- `cache_onebox_response_body_domains`
- `force_custom_user_agent_hosts`

* DEV: fix typos

* FIX: Try doing a GET after receiving a 500 error from a HEAD

By default we try to do a `HEAD` requests. If this results in a 500 error response, we should try to do a `GET`

* DEV: `force_get_hosts` should be a hidden setting

* DEV: Oneboxer Strategies

Have an alternative oneboxing ‘strategy’ (i.e., set of options) to use when an attempt to generate a Onebox fails. Keep track of any non-default strategies that were used on a particular host, and use that strategy for that host in the future.

Initially, the alternate strategy (`force_get_and_ua`) forces the FinalDestination step of Oneboxing to do a `GET` rather than `HEAD`, and forces a custom user agent.

* DEV: change stubbed return code

The stubbed status code needs to be a value not recognized by FinalDestination
2021-05-13 15:48:35 -04:00
Bianca Nenciu 21d1ee1065
FIX: Use Nokogiri and Loofah consistently ()
CookedPostProcessor used Loofah to parse the cooked content of a post
and Nokogiri to parse cooked Oneboxes. Even though Loofah is built on
top of Nokogiri, replacing an element from the cooked post (a Nokogiri
node) with a parsed onebox (a Loofah node) produced a strange result
which included XML namespaces. Removing the mix and using Loofah
to parse Oneboxes fixed the problem.
2021-04-14 18:09:55 +03:00
jbrw 50252d803e
DEV: stub youtube embed requests ()
* DEV: stub youtube embed requests

* DEV: Ignore redirects on youtube.com when oneboxing
2021-04-07 13:32:27 -04:00
jbrw 68d0916eb5
FEATURE: Oneboxer cache response body ()
* FEATURE: Cache successful HTTP GET requests during Oneboxing

Some oneboxes may fail if when making excessive and/or odd requests against the target domains. This change provides a simple mechanism to cache the results of succesful GET requests as part of the oneboxing process, with the goal of reducing repeated requests and ultimately improving the rate of successful oneboxing.

To enable:

Set `SiteSetting.cache_onebox_response_body` to `true`

Add the domains you’re interesting in caching to `SiteSetting. cache_onebox_response_body_domains` e.g. `example.com|example.org|example.net`

Optionally set `SiteSetting.cache_onebox_user_agent` to a user agent string of your choice to use when making requests against domains in the above list.

* FIX: Swap order of duration and value in redis call

The correct order for `setex` arguments is `key`, `duration`, and `value`.

Duration and value had been flipped, however the code would not have thrown an error because we were caching the value of `1.day.to_i` for a period of 1 seconds… The intention appears to be to set a value of 1 (purely as a flag) for a period of 1 day.
2021-03-31 13:19:34 -04:00
jbrw aed97c7bab
FIX: Add amazon sites to force_get_hosts ()
It has been observed that doing a HEAD against an Amazon store URL may result in a 405 error being returned.

Skipping the HEAD request may result in an improved oneboxing experience when requesting these URLs.
2021-03-10 14:42:17 -05:00
jbrw fff8a24f2b
FIX: Don’t display error if only error is a missing image ()
`Onebox.preview` can return 0-to-n errors, where the errors are missing OpenGraph attributes (e.g. title, description, image, etc.). If any of these attributes are missing, we construct an error message and attach it to the Oneboxer preview HTML. The error message is something like:

 “Sorry, we were unable to generate a preview for this web page, because the following oEmbed / OpenGraph tags could not be found: description, image”

However, if the only missing tag is `image` we don’t need to display the error, as we have enough other data (title, description, etc.) to construct a useful/complete Onebox.
2021-02-25 14:30:40 -05:00
Martin Brennan 13c2a4886f
FEATURE: Add disable_onebox_media_download_controls hidden site setting ()
Uses discourse/onebox@ff9ec90

Adds a hidden site setting called disable_onebox_media_download_controls which will add controlslist="nodownload" to video and audio oneboxes, and also to the local video and audio oneboxes within Discourse.
2021-02-25 12:39:15 +10:00
Dan Ungureanu 2d51833ca9
FIX: Make Oneboxer#apply insert block Oneboxes correctly ()
It used to insert block Oneboxes inside paragraphs which resulted in
invalid HTML. This needed an additional parsing for removal of empty
paragraphs and the resulting HTML could still be invalid.

This commit ensure that block Oneboxes are inserted correctly, by
splitting the paragraph containing the link and putting the block
between the two. Paragraphs left with nothing but whitespaces will
be removed.

Follow up to 7f3a30d79f.
2020-12-14 17:49:37 +02:00
jbrw da9b837da0
DEV: More robust processing of URLs ()
* DEV: More robust processing of URLs

The previous `UrlHelper.encode_component(CGI.unescapeHTML(UrlHelper.unencode(uri))` method would naively process URLs, which could result in a badly formed response.

`Addressable::URI.normalized_encode(uri)` appears to deal with these edge-cases in a more robust way.

* DEV: onebox should use UrlHelper

* DEV: fix spec

* DEV: Escape output when rendering local links
2020-12-03 17:16:01 -05:00
jbrw 51f9a56137
FEATURE: Onebox local categories ()
* FEATURE: onebox for local categories

This commit adjusts the category onebox to look more like the category boxes do on the category page.

Co-authored-by: Jordan Vidrine <jordan@jordanvidrine.com>
2020-11-25 10:53:05 +11:00
jbrw 331236d6d7
Onebox improved error handling and support for Instagram Access Tokens ()
* FEATURE: display error if Oneboxing fails due to HTTP error

- display warning if onebox URL is unresolvable
- display warning if attributes are missing

* FEATURE: Use new Instagram oEmbed endpoint if access token is configured

Instagram requires an Access Token to access their oEmbed endpoint. The requirements (from https://developers.facebook.com/docs/instagram/oembed/) are as follows:

- a Facebook Developer account, which you can create at developers.facebook.com
- a registered Facebook app
- the oEmbed Product added to the app
- an Access Token
- The Facebook app must be in Live Mode

The generated Access Token, once added to SiteSetting.facebook_app_access_token, will be passed to onebox. Onebox can then use this token to access the oEmbed endpoint to generate a onebox for Instagram.

* DEV: update user agent string

* DEV: don’t do HEAD requests against news.yahoo.com

* DEV: Bump onebox version from 2.1.5 to 2.1.6

* DEV: Avoid re-reading templates

* DEV: Tweaks to onebox mustache templates

* DEV: simplified error message for missing onebox data

* Apply suggestions from code review
Co-authored-by: Gerhard Schlager <mail@gerhard-schlager.at>
2020-11-18 12:55:16 -05:00
David Taylor a3577435f7
FEATURE: Additional control of iframes in oneboxes ()
This commit adds a new site setting "allowed_onebox_iframes". By default, all onebox iframes are allowed. When the list of domains is restricted, Onebox will automatically skip engines which require those domains, and use a fallback engine.
2020-08-27 20:12:13 +01:00
Krzysztof Kotlarek e0d9232259
FIX: use allowlist and blocklist terminology ()
This is a PR of the renaming whitelist to allowlist and blacklist to the blocklist.
2020-07-27 10:23:54 +10:00
Dan Ungureanu fe284ffd06
Revert "DEV: Remove useless code ()"
Some oneboxes still generate empty P tags (video oneboxes).

This reverts commit c299d02287.
2020-06-29 13:56:28 +03:00
Dan Ungureanu c299d02287
DEV: Remove useless code ()
protection is not needed and can easily be bypassed with empty divs anyway.
2020-06-29 17:49:30 +10:00
Guo Xiang Tan b28d97b64a
FIX: Bump onebox for twitch video and clips embedding fix. 2020-06-24 11:00:30 +08:00
Régis Hanol 91c89df68a FIX: onebox local topic when using slug-less URL
When linking to a topic in the same Discourse, we try to onebox the link to show the title
and other various information depending on whether it's a "standard" or "inline" onebox.

However, we were not properly detecting links to topics that had no slugs (eg. https://meta.discourse.org/t/1234).
2020-06-23 17:18:38 +02:00
Krzysztof Kotlarek 9bff0882c3
FEATURE: Nokogumbo ()
* FEATURE: Nokogumbo

Use Nokogumbo HTML parser.
2020-05-05 13:46:57 +10:00
Joffrey JAFFEUX fd4ce6ab8f
DEV: hbs extensions are misleading in this case ()
This would also prevent any linting tool to attempt to lint this incorrectly.
2020-03-11 14:42:14 +01:00
Dan Ungureanu ec40242b5c
FIX: Make inline oneboxes work with secured topics in secured contexts () 2020-02-12 12:11:28 +02:00
Penar Musaraj 4b6a47be48 DEV: do not persist force_custom_user_agent_hosts setting
Followup to f029e2
2020-02-06 11:56:54 -05:00
Penar Musaraj f029e2eaf6 FEATURE: Add site setting for specific hosts using custom user agent when oneboxing
Followup to #00c406
2020-02-06 10:32:42 -05:00
Sam Saffron 7f3a30d79f FIX: blank cooked markdown could raise an exception in logs
Previously if somehow a user created a blank markdown document using tag
tricks (eg `<p></p><p></p><p></p><p></p><p></p><p></p>`) and so on, we would
completely strip the document down to blank on post process due to onebox
hack.

Needs a followup cause I am still unclear about the reason for empty p stripping
and it can cause some unclear cases when we re-cook posts.
2020-01-29 11:37:25 +11:00