Sam
f66efc601d
FIX: cubot android devices were detected as crawlers
2018-06-21 10:56:46 +10:00
Neil Lalonde
ced7e9a691
FEATURE: control which web crawlers can access using a whitelist or blacklist
2018-03-22 15:41:02 -04:00
Sam
d7657d8e47
correct specs, ensure crawler layout only applies to html
2018-01-16 16:28:11 +11:00
Sam
7b562d2f46
FEATURE: much improved and simplified crawler detection
- Phase one: does the user agent match 'trident|webkit|gecko|chrome|safari|msie|opera'?
  If yes, it is possibly a browser.
- Phase two: does it match 'rss|bot|spider|crawler|facebook|archive|wayback|ping|monitor'?
  If yes, it is probably a crawler.
Based off: https://gist.github.com/SamSaffron/6cfad7ea3e6df321ffb7a84f93720a53
2018-01-16 15:41:45 +11:00
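The two-phase check described in commit 7b562d2f46 might look roughly like the Ruby sketch below. The constant and method names are hypothetical, and so are two behaviors the commit message does not state: matching is case-insensitive, and a user agent that fails phase one (no browser token at all) is treated as a crawler. This is an illustration of the idea, not the code in `crawler_detection.rb`.

```ruby
# Hypothetical names; the two pattern strings are taken from the commit message.
POSSIBLE_BROWSER = /trident|webkit|gecko|chrome|safari|msie|opera/i
LIKELY_CRAWLER = /rss|bot|spider|crawler|facebook|archive|wayback|ping|monitor/i

def crawler?(user_agent)
  ua = user_agent.to_s

  # Phase one: without any browser engine token, assume a crawler
  # (this fallback is an assumption, not stated in the commit).
  return true unless ua.match?(POSSIBLE_BROWSER)

  # Phase two: it looks like a browser, but a crawler keyword overrides that.
  ua.match?(LIKELY_CRAWLER)
end

puts crawler?("curl/7.54.0")
puts crawler?("Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/66.0 Safari/537.36")
```

Keyword matching this broad can misfire: under these assumptions, an Android user agent containing a device name such as `CUBOT` matches the `bot` keyword, which is the kind of false positive that commit f66efc601d reports fixing.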
Sam
f6fdc1ebe8
FEATURE: flexible crawler detection
You can use the crawler user agents site setting to amend which user agents
are considered crawlers, based on a string match in the user agent.
Also slightly improves the performance of crawler detection.
2017-09-29 12:31:50 +10:00
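As a rough illustration of the setting-driven matching that commit f6fdc1ebe8 describes, the Ruby sketch below builds one regular expression from a list of substrings and caches it. Everything here is an assumption made for the example: the pipe-delimited format, the `CRAWLER_USER_AGENTS` constant standing in for the site setting, the method names, and the caching approach. The commit does not say how the setting is stored or how the performance gain was achieved.

```ruby
# Hypothetical stand-in for the "crawler user agents" site setting,
# assumed here to be a pipe-delimited list of substrings.
CRAWLER_USER_AGENTS = "Googlebot|Twitterbot|YandexBot|HTTrack"

# Compiled regexes keyed by setting value, so repeated checks skip recompilation.
REGEX_CACHE = {}

# Build a single alternation regex; each entry is escaped so it is matched
# as a literal string rather than as a pattern.
def build_crawler_regex(setting)
  tokens = setting.to_s.split("|").map(&:strip).reject(&:empty?)
  return nil if tokens.empty?

  Regexp.new(tokens.map { |token| Regexp.escape(token) }.join("|"))
end

def crawler_by_setting?(user_agent, setting = CRAWLER_USER_AGENTS)
  regex = REGEX_CACHE.fetch(setting) do
    REGEX_CACHE[setting] = build_crawler_regex(setting)
  end

  # An empty setting matches nothing.
  regex ? user_agent.to_s.match?(regex) : false
end

puts crawler_by_setting?("Mozilla/5.0 (compatible; YandexBot/3.0; +http://yandex.com/bots)")
puts crawler_by_setting?("ExampleAgent/1.0", "Googlebot|ExampleAgent")
```

Escaping each entry keeps the setting a plain string match, as the commit message describes, so an administrator can add an entry without knowing regex syntax.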
mcmcclur
a307ad6517
Update crawler_detection.rb
Add HTTrack to the list of detected crawlers so that Discourse will serve vanilla HTML per https://meta.discourse.org/t/a-basic-discourse-archival-tool/62614/25
2017-05-16 11:17:05 -04:00
Robin Ward
2a4006fe0c
Add `YandexBot` to our list of crawlers
2016-07-26 13:21:37 -04:00
Jeff Atwood
bbb1348118
add Swiftbot to crawler regex
2015-05-02 03:18:58 -07:00
Erick Guan
026cdd8fc3
FEATURE: add 360Spider UA to allow 360 crawl Discourse sites
2015-03-16 22:58:33 +08:00
Jeff Atwood
ceef06e771
add support for "Save Page Now" archive.org/web
2015-01-06 01:05:45 -08:00
riking
37dbc4b5e6
Add archive.org to crawler list to serve no-js to
2014-11-02 16:51:23 -08:00
Vikhyat Korrapati
e3702ecb30
Improved crawler detection: add Twitterbot, Facebook, curl, Bing, Baidu.
2014-03-16 19:30:20 +05:30
Robin Ward
c4b5455c21
REFACTOR: Rename `GooglebotDetection` to `CrawlerDetection` because we
will likely whitelist more crawlers in the future.
2014-02-20 16:07:02 -05:00