mirror of
https://github.com/discourse/discourse.git
synced 2025-02-08 04:18:23 +00:00
In 95a82d608d6377faf68a0e2c5d9640b043557852, we lowered the default for `Onebox.options.max_download_kb` from 10mb to 2mb for security hardening purposes. However, this resulted in multiple bug reports where seemingly nomral URLs stopped being oneboxed. It turns out that lowering `Onebox.options.max_download_kb` resulted in `Onebox::Helpers::DownloadTooLarge` being raised more often for more URLs in `Onebox::Helpers.fetch_response` which `Onebox::Helpers.fetch_html_doc` relies on. When `Onebox::Helpers::DownloadTooLarge` is raised in `Onebox::Helpers.fetch_response`, we throw away whatever response body which we have already downloaded at that point. This is not ideal because Nokogiri can parse incomplete HTML documents and there is a really high chance that the incomplete HTML document still contains the information which we need for oneboxing. Therefore, this commit updates `Onebox::Helpers.fetch_html_doc` to not throw away the response body when the size of the response body exceeds `Onebox.options.max_download_size`. Instead, we just take whatever response which we have and get Nokogiri to parse it.