If the crawled page returned an error, `FinalDestination#safe_get`
yielded `nil` for `uri` and `chunk` arguments. Another problem is that
`get` did not handle the case when `safe_get` failed and did not return
the `location` and `set_cookie` headers.
The `blocked onebox domains` setting lets site owners change what sites
are allowed to be oneboxed. When a link is entered into a post,
Discourse checks the domain of the link against that setting and blocks
the onebox if the domain is blocked. But if there's a chain of
redirects, then only the final destination website is checked against
the site setting.
This commit amends that behavior so that every website in the redirect
chain is checked against the site setting, and if anything is blocked
the original link doesn't onebox at all in the post. The
`Discourse-No-Onebox` header is also checked in every response and the
onebox is blocked if the header is set to "1".
Additionally, Discourse will now include the `Discourse-No-Onebox`
header with every response if the site requires login to access content.
This is done to signal to a Discourse instance that it shouldn't attempt
to onebox other Discourse instances if they're login-only. Non-Discourse
websites can also use include that header if they don't wish to have
Discourse onebox their content.
Internal ticket: t59305.
In an earlier PR, we decided that we only want to block a domain if
the blocked domain in the SiteSetting is the final destination (/t/59305). That
PR used `FinalDestination#get`. `resolve` however is used several places
but blocks domains along the redirect chain when certain options are provided.
This commit changes the default options for `resolve` to not do that. Existing
users of `FinalDestination#resolve` are
- `Oneboxer#external_onebox`
- our onebox helper `fetch_html_doc`, which is used in amazon, standard embed
and youtube
- these folks already go through `Oneboxer#external_onebox` which already
blocks correctly
Our previous implementation used a simple `blocked_domain_array.include?(hostname)`
so some values were not matching. Additionally, in some configurations like ours, we'd used
"cat.*.dog.com" with the assumption we'd support globbing.
This change implicitly allows globbing by blocking "http://a.b.com" if "b.com" is a blocked
domain but does not actively do anything for "*".
An upcoming change might include frontend validation for values that can be inserted.
* FIX: Inline onebox should use encoding from Content-Type header when present
* Use Regexp.last_match(1)
Signed-off-by: OsamaSayegh <asooomaasoooma90@gmail.com>
Zeitwerk simplifies working with dependencies in dev and makes it easier reloading class chains.
We no longer need to use Rails "require_dependency" anywhere and instead can just use standard
Ruby patterns to require files.
This is a far reaching change and we expect some followups here.
This reduces chances of errors where consumers of strings mutate inputs
and reduces memory usage of the app.
Test suite passes now, but there may be some stuff left, so we will run
a few sites on a branch prior to merging