discourse

Commit Graph

Author	SHA1	Message	Date
Daniel Waterworth	666536cbd1	DEV: Prefer \A and \z over ^ and $ in regexes (#19936)	2023-01-20 12:52:49 -06:00
David Taylor	6417173082	DEV: Apply syntax_tree formatting to `lib/*`	2023-01-09 12:10:19 +00:00
Ted Johansson	06db264f24	FIX: Gracefully handle DNS issues from SSRF lookup when inline oneboxing (#19631) There is an issue where chat message processing breaks due to unhandled `SocketError` exceptions originating in the SSRF check, specifically in `FinalDestination::Resolver`. This change gives `FinalDestination::SSRFDetector` a new error class to wrap the `SocketError` in, and has the `RetrieveTitle` class handle that error gracefully.	2022-12-28 10:30:20 +08:00
Sam	60685e6984	DEV: improve comment (#18041) Improves comment about caught exception to match implementation.	2022-08-23 15:14:24 +10:00
Sam	df04462475	FIX: ignore malformed HTML for title extraction (#18040) Certain HTML can be rejected by nokogumbo, specifically in cases where there is an enormous number of attributes. This ensures that malformed HTML is simply skipped instead of leaking out an exception and terminating downstream processes.	2022-08-23 15:03:57 +10:00
Sérgio Saquetim	300f835703	DEV: Suppress logs when RetrieveTitle.crawl fails with Net::ReadTimeout errors (#16971) This PR changes the rescue block to rescue only Net::TimeoutError exceptions and removes the log line to avoid cluttering the logs with errors that are ignored. Other errors can bubble up because they're errors we probably want to know about.	2022-06-09 16:30:22 -03:00
Osama Sayegh	d15867463f	FEATURE: Site setting for blocking onebox of URLs that redirect (#16881) Meta topic: https://meta.discourse.org/t/prevent-to-linkify-when-there-is-a-redirect/226964/2?u=osama. This commit adds a new site setting `block_onebox_on_redirect` (default off) for blocking oneboxes (full and inline) of URLs that redirect. Note that an initial http → https redirect is still allowed if the redirect location is identical to the source (minus the scheme, of course). For example, if a user includes a link to `http://example.com/page` and the link resolves to `https://example.com/page`, then the link will onebox (assuming it can be oneboxed) even if the setting is enabled. The reason is that a user may type out a URL with http (e.g. when the URL is short and memorable), and since many sites support TLS and automatically redirect http traffic to https, we should still allow the URL to onebox.	2022-05-23 13:52:06 +03:00
Dan Ungureanu	8e9cbe9db4	FIX: Do not raise if title cannot be crawled (#16247) If the crawled page returned an error, `FinalDestination#safe_get` yielded `nil` for `uri` and `chunk` arguments. Another problem is that `get` did not handle the case when `safe_get` failed and did not return the `location` and `set_cookie` headers.	2022-03-22 20:13:27 +02:00
Osama Sayegh	ddde94f925	FIX: return nil when RetrieveTitle.crawl fails (#16167)	2022-03-11 23:53:10 +03:00
Osama Sayegh	b0656f3ed0	FIX: Apply onebox blocked domain checks on every redirect (#16150) The `blocked onebox domains` setting lets site owners change what sites are allowed to be oneboxed. When a link is entered into a post, Discourse checks the domain of the link against that setting and blocks the onebox if the domain is blocked. But if there's a chain of redirects, then only the final destination website is checked against the site setting. This commit amends that behavior so that every website in the redirect chain is checked against the site setting, and if anything is blocked the original link doesn't onebox at all in the post. The `Discourse-No-Onebox` header is also checked in every response and the onebox is blocked if the header is set to "1". Additionally, Discourse will now include the `Discourse-No-Onebox` header with every response if the site requires login to access content. This is done to signal to a Discourse instance that it shouldn't attempt to onebox other Discourse instances if they're login-only. Non-Discourse websites can also include that header if they don't wish to have Discourse onebox their content. Internal ticket: t59305.	2022-03-11 09:18:12 +03:00
Krzysztof Kotlarek	803fd7289d	FIX: inline onebox for github (#15859) Increase size of downloaded HTML for Github when getting title for inline Onebox.	2022-02-09 22:53:27 +01:00
Natalie Tay	aac9f43038	Only block domains at the final destination (#15689) In an earlier PR, we decided that we only want to block a domain if the blocked domain in the SiteSetting is the final destination (/t/59305). That PR used `FinalDestination#get`. `resolve`, however, is used in several places but blocks domains along the redirect chain when certain options are provided. This commit changes the default options for `resolve` to not do that. Existing users of `FinalDestination#resolve` are: `Oneboxer#external_onebox`; and our onebox helper `fetch_html_doc`, which is used in amazon, standard embed and youtube (these already go through `Oneboxer#external_onebox`, which already blocks correctly).	2022-01-31 15:35:12 +08:00
Natalie Tay	f5ea00c73f	FIX: Respect blocked domains list when redirecting (#15656) Our previous implementation used a simple `blocked_domain_array.include?(hostname)` so some values were not matching. Additionally, in some configurations like ours, we'd used "cat..dog.com" with the assumption we'd support globbing. This change implicitly allows globbing by blocking "http://a.b.com" if "b.com" is a blocked domain but does not actively do anything for "". An upcoming change might include frontend validation for values that can be inserted.	2022-01-20 14:12:34 +08:00
Arpit Jalan	763f48abc7	FIX: increase chunk size to fetch title tag correctly (#14144)	2021-09-03 13:15:58 +05:30
Arpit Jalan	0adeddde61	FIX: follow redirects for inline/mini onebox (#13512)	2021-06-24 19:53:39 +05:30
Osama Sayegh	558e9dd310	FIX: Inline Onebox should use encoding from Content-Type header when present (#11625) * FIX: Inline onebox should use encoding from Content-Type header when present * Use Regexp.last_match(1) Signed-off-by: OsamaSayegh <asooomaasoooma90@gmail.com>	2021-01-04 22:32:08 +03:00
Krzysztof Kotlarek	9bff0882c3	FEATURE: Nokogumbo (#9577) * FEATURE: Nokogumbo Use Nokogumbo HTML parser.	2020-05-05 13:46:57 +10:00
Penar Musaraj	067696df8f	DEV: Apply Rubocop redundant return style	2019-11-14 15:10:51 -05:00
Krzysztof Kotlarek	427d54b2b0	DEV: Upgrading Discourse to Zeitwerk (#8098) Zeitwerk simplifies working with dependencies in dev and makes it easier to reload class chains. We no longer need to use Rails "require_dependency" anywhere and instead can just use standard Ruby patterns to require files. This is a far-reaching change and we expect some followups here.	2019-10-02 14:01:53 +10:00
Sam Saffron	30990006a9	DEV: enable frozen string literal on all files. This reduces chances of errors where consumers of strings mutate inputs and reduces memory usage of the app. Test suite passes now, but there may be some stuff left, so we will run a few sites on a branch prior to merging.	2019-05-13 09:31:32 +08:00
Guo Xiang Tan	ad5082d969	Make rubocop happy again.	2018-06-07 13:28:18 +08:00
Sam	f946db4afe	FIX: inline oneboxer min title length of 2. Also: cache mini onebox misses as well to cut down traffic.	2018-01-30 08:40:04 +11:00
Sam	fa5880e04f	PERF: ability to crawl for titles without extra HEAD req. Also, introduces a much more aggressive timeout for title crawling and introduces gzip to body that is crawled.	2018-01-29 15:40:12 +11:00
Robin Ward	07e84a3afa	FIX: Hack our title retriever so that it parses YouTube URLs	2017-09-28 09:30:22 -04:00
Sam	f6bc572fb8	FEATURE: option to enable inline oneboxes for all domains. Also, change to prefer title over open graph which is often way too sparse.	2017-08-02 14:27:31 -04:00
Robin Ward	2f8f2aa1dd	FEATURE: Whitelists for inline oneboxing	2017-07-21 15:41:47 -04:00
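
Commit f5ea00c73f describes blocking "http://a.b.com" when "b.com" is on the blocked list, which a plain `include?(hostname)` check would miss. The following is a minimal Ruby sketch of that kind of suffix matching, not the Discourse implementation; the method name `hostname_blocked?` and its parameters are hypothetical.

```ruby
require "uri"

# Hypothetical helper (not taken from the Discourse source).
# Returns true when the URL's host equals a blocked domain or is a
# subdomain of one. An exact `include?` check would miss subdomains.
def hostname_blocked?(url, blocked_domains)
  host = URI.parse(url).host
  return false if host.nil?

  host = host.downcase
  blocked_domains.any? do |domain|
    blocked = domain.downcase
    # Match "b.com" itself, or any host ending in ".b.com".
    # The leading dot keeps "ab.com" from matching "b.com".
    host == blocked || host.end_with?(".#{blocked}")
  end
rescue URI::InvalidURIError
  # Assumption for this sketch: unparsable URLs are treated as not matched.
  false
end

hostname_blocked?("http://a.b.com", ["b.com"]) # => true
hostname_blocked?("http://ab.com", ["b.com"])  # => false
```

Comparing against the suffix with a leading dot is what distinguishes a real subdomain from a hostname that merely ends with the same characters.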

26 Commits
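
Commit 666536cbd1 prefers `\A` and `\z` over `^` and `$`. In Ruby, `^` and `$` match at line boundaries rather than at the start and end of the whole string, so a multi-line input can slip past a pattern meant to validate the entire value. A minimal sketch of the difference; the method names are hypothetical and chosen only for illustration.

```ruby
# Hypothetical validators, for illustration only.

# ^ and $ anchor to the start and end of any LINE in the input.
def line_anchored_slug?(input)
  /^[a-z]+$/.match?(input)
end

# \A and \z anchor to the start and end of the whole STRING.
def string_anchored_slug?(input)
  /\A[a-z]+\z/.match?(input)
end

input = "safe\n<script>"

line_anchored_slug?(input)     # => true, because the first line matches
string_anchored_slug?(input)   # => false, the whole string must match
string_anchored_slug?("safe")  # => true
```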