Commit Graph

18 Commits

Author SHA1 Message Date
Arpit Jalan b059a0f789 extract url escaping to a dedicated class method and improved tests 2017-07-29 22:16:51 +05:30
Arpit Jalan 1fe553873c FIX: preserve fragment identifier when escaping url 2017-07-29 17:22:45 +05:30
Guo Xiang Tan 5012d46cbd Add rubocop to our build. (#5004) 2017-07-28 10:20:09 +09:00
Guo Xiang Tan b534778f46 FIX: Escape URL before attempting to resolve it. 2017-07-18 10:04:24 +09:00
Guo Xiang Tan 089a1bd3be Specify the error that we want to ignore instead of rescuing all errors. 2017-07-18 09:55:52 +09:00
Robin Ward db485ae0da FIX: Support for skipping redirects on certain domains (like steam) 2017-06-26 15:38:43 -04:00
Robin Ward 7366f334b0 FIX: Try a GET for error code 409 too -- (Medium posts) 2017-06-15 15:09:59 -04:00
Robin Ward 009f0921dc FEATURE: Whitelist hosts for internal crawling 2017-06-13 12:59:54 -04:00
Robin Ward a3729b51eb FIX: Always allow the host the forum is hosted on 2017-06-12 13:22:51 -04:00
Robin Ward 53b95f009f FIX: If HEAD is not supported, try GET. Also set cookies 2017-06-06 13:53:49 -04:00
Robin Ward 0a08c18a14 FIX: Don't rate limit gravatar downloads 2017-05-24 13:54:26 -04:00
Robin Ward 3b0cbf7013 FIX: Always allow downloads from CDN 2017-05-23 16:32:54 -04:00
Robin Ward b81e7be9a1 FEATURE: Rate limit how often we'll crawl a destination IP 2017-05-23 15:03:04 -04:00
Robin Ward 36e477750c FIX: Use same code path for downloading images 2017-05-23 14:51:30 -04:00
Robin Ward e5e7a15a85 SECURITY: Never crawl by IP 2017-05-23 13:07:18 -04:00
Robin Ward 93a5fc62bf FEATURE: A site setting to prevent crawling on private IP blocks 2017-05-23 11:56:06 -04:00
Robin Ward b8d78b33c6 FIX: Other content types like images are fine 2017-05-22 16:51:37 -04:00
Robin Ward b23fc2bf84 Helper to find the final destination for a URL 2017-05-22 15:52:41 -04:00