Guo Xiang Tan | b534778f46 | FIX: Escape URL before attempting to resolve it. | 2017-07-18 10:04:24 +09:00
Guo Xiang Tan | 089a1bd3be | Specify the error that we want to ignore instead of rescuing all errors. | 2017-07-18 09:55:52 +09:00
Robin Ward | db485ae0da | FIX: Support for skipping redirects on certain domains (like steam) | 2017-06-26 15:38:43 -04:00
Robin Ward | 7366f334b0 | FIX: Try a GET for error code 409 too -- (Medium posts) | 2017-06-15 15:09:59 -04:00
Robin Ward | 009f0921dc | FEATURE: Whitelist hosts for internal crawling | 2017-06-13 12:59:54 -04:00
Robin Ward | a3729b51eb | FIX: Always allow the host the forum is hosted on | 2017-06-12 13:22:51 -04:00
Robin Ward | 53b95f009f | FIX: If HEAD is not supported, try GET. Also set cookies | 2017-06-06 13:53:49 -04:00
Robin Ward | 0a08c18a14 | FIX: Don't rate limit gravatar downloads | 2017-05-24 13:54:26 -04:00
Robin Ward | 3b0cbf7013 | FIX: Always allow downloads from CDN | 2017-05-23 16:32:54 -04:00
Robin Ward | b81e7be9a1 | FEATURE: Rate limit how often we'll crawl a destination IP | 2017-05-23 15:03:04 -04:00
Robin Ward | 36e477750c | FIX: Use same code path for downloading images | 2017-05-23 14:51:30 -04:00
Robin Ward | e5e7a15a85 | SECURITY: Never crawl by IP | 2017-05-23 13:07:18 -04:00
Robin Ward | 93a5fc62bf | FEATURE: A site setting to prevent crawling on private IP blocks | 2017-05-23 11:56:06 -04:00
Robin Ward | b8d78b33c6 | FIX: Other content types like images are fine | 2017-05-22 16:51:37 -04:00
Robin Ward | b23fc2bf84 | Helper to find the final destination for a URL | 2017-05-22 15:52:41 -04:00