Commit Graph

19 Commits

Author SHA1 Message Date
David Taylor 6417173082
DEV: Apply syntax_tree formatting to `lib/*` 2023-01-09 12:10:19 +00:00
Dan Ungureanu 4e46732346
FEATURE: Implement browser update in crawler view (#12448)
The browser-update script does not work correctly in some very old browsers
because the contents of <noscript> are not accessible from JavaScript.
For these browsers, the server can display the crawler page and add the
browser update notice.

Simply loading the browser-update script in the crawler view is not a
solution because that means all crawlers will also see it.
2021-03-22 19:41:42 +02:00
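A minimal sketch of the idea in that commit, assuming old browsers are identified server-side by user agent; the regex and method name below are illustrative assumptions, not the actual Discourse implementation:

```ruby
# frozen_string_literal: true

# Hypothetical sketch: serve the crawler (no-JS) layout plus a browser
# update notice to very old browsers, since a client-side <noscript>
# hint cannot reach them reliably. The regex is an assumption, not a
# real Discourse site setting.
OLD_BROWSERS = /MSIE [1-9]\.|Trident\/[1-6]\./

def show_browser_update?(user_agent)
  !!(user_agent =~ OLD_BROWSERS)
end

show_browser_update?("Mozilla/4.0 (compatible; MSIE 7.0; Windows NT 5.1)") # => true
show_browser_update?("Mozilla/5.0 (Windows NT 10.0) AppleWebKit/537.36 Chrome/96 Safari/537.36") # => false
```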
Krzysztof Kotlarek e0d9232259
FIX: use allowlist and blocklist terminology (#10209)
This PR renames whitelist to allowlist and blacklist to blocklist.
2020-07-27 10:23:54 +10:00
Dan Ungureanu 3ed6a0e904
FIX: Detect Wayback Machine using user agent (#9777) 2020-05-14 21:10:07 +10:00
Maja Komel 42809f4d69 FIX: use crawler layout when saving url in Wayback Machine (#7667) 2019-06-03 12:13:32 +10:00
Sam Saffron 30990006a9 DEV: enable frozen string literal on all files
This reduces the chance of errors where consumers of strings mutate inputs,
and it reduces the memory usage of the app.

The test suite passes now, but there may be some issues left, so we will run
a few sites on a branch prior to merging.
2019-05-13 09:31:32 +08:00
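For reference, the mechanism behind that change is Ruby's frozen-string-literal magic comment; a minimal illustration (not taken from the Discourse codebase):

```ruby
# frozen_string_literal: true

# With the magic comment above, every string literal in this file is
# frozen: accidental in-place mutation raises instead of silently
# changing shared state, and the VM can deduplicate identical literals,
# which is where the memory saving comes from.
greeting = "hello"

begin
  greeting << " world"        # in-place mutation of a frozen literal
rescue FrozenError => e
  puts "caught: #{e.message}" # => caught: can't modify frozen String: "hello"
end

puts greeting.frozen? # => true
```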
Sam f66efc601d FIX: Cubot Android devices were detected as crawlers 2018-06-21 10:56:46 +10:00
Neil Lalonde ced7e9a691 FEATURE: control which web crawlers can access the site using a whitelist or blacklist 2018-03-22 15:41:02 -04:00
Sam d7657d8e47 correct specs, ensure crawler layout only applies to html 2018-01-16 16:28:11 +11:00
Sam 7b562d2f46 FEATURE: much improved and simplified crawler detection
- Phase one: does it match 'trident|webkit|gecko|chrome|safari|msie|opera'?
    If yes, it is possibly a browser.

- Phase two: does it match 'rss|bot|spider|crawler|facebook|archive|wayback|ping|monitor'?
    If yes, it is probably a crawler (see the sketch after this entry).

Based off: https://gist.github.com/SamSaffron/6cfad7ea3e6df321ffb7a84f93720a53
2018-01-16 15:41:45 +11:00
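A minimal sketch of the two-phase check described in that commit; the module and method names are assumptions for illustration, not the actual Discourse `CrawlerDetection` code:

```ruby
# frozen_string_literal: true

# Illustrative only: phase one asks whether the UA even looks like a
# browser engine; phase two asks whether it carries crawler hints.
module CrawlerSketch
  BROWSER_ENGINES = /trident|webkit|gecko|chrome|safari|msie|opera/i
  CRAWLER_HINTS   = /rss|bot|spider|crawler|facebook|archive|wayback|ping|monitor/i

  def self.crawler?(user_agent)
    return true if user_agent.nil?                    # no UA at all: treat as a crawler
    return true unless user_agent =~ BROWSER_ENGINES  # does not even claim to be a browser
    !!(user_agent =~ CRAWLER_HINTS)                   # browser-like, but carries crawler hints
  end
end

CrawlerSketch.crawler?("Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)") # => true
CrawlerSketch.crawler?("Mozilla/5.0 (Windows NT 10.0) AppleWebKit/537.36 Chrome/96 Safari/537.36") # => false
```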
Sam f6fdc1ebe8 FEATURE: flexible crawler detection
You can use the `crawler user agents` site setting to amend which user agents
are considered crawlers, based on a string match against the user agent.

This also slightly improves the performance of crawler detection.
2017-09-29 12:31:50 +10:00
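A rough sketch of how such a setting could drive detection, assuming it is stored as a pipe-delimited string; the setting format and function name are assumptions, not the actual Discourse implementation:

```ruby
# frozen_string_literal: true

# Illustrative only: assumes the "crawler user agents" site setting is a
# pipe-delimited string such as "rss|bot|spider|crawler|wayback".
def crawler_from_setting?(user_agent, setting)
  return false if user_agent.nil? || setting.to_s.empty?
  # Compile the setting into a single case-insensitive regex so detection
  # is one match rather than many substring scans.
  pattern = Regexp.new(Regexp.union(setting.split("|").map(&:strip)).source, Regexp::IGNORECASE)
  !!(user_agent =~ pattern)
end

crawler_from_setting?("WaybackArchiver/1.0", "rss|bot|spider|crawler|wayback") # => true
crawler_from_setting?("Mozilla/5.0 (X11; Linux x86_64) Gecko/20100101 Firefox/115.0", "rss|bot|spider|crawler|wayback") # => false
```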
mcmcclur a307ad6517 Update crawler_detection.rb
Add HTTrack to the list of detected crawlers so that Discourse will serve vanilla HTML per https://meta.discourse.org/t/a-basic-discourse-archival-tool/62614/25
2017-05-16 11:17:05 -04:00
Robin Ward 2a4006fe0c Add `YandexBot` to our list of crawlers 2016-07-26 13:21:37 -04:00
Jeff Atwood bbb1348118 add Swiftbot to crawler regex 2015-05-02 03:18:58 -07:00
Erick Guan 026cdd8fc3 FEATURE: add 360Spider UA to allow 360 crawl Discourse sites 2015-03-16 22:58:33 +08:00
Jeff Atwood ceef06e771 add support for "Save Page Now" archive.org/web 2015-01-06 01:05:45 -08:00
riking 37dbc4b5e6 Add archive.org to the crawler list so it is served the no-js version 2014-11-02 16:51:23 -08:00
Vikhyat Korrapati e3702ecb30 Improved crawler detection: add Twitterbot, Facebook, curl, Bing, Baidu. 2014-03-16 19:30:20 +05:30
Robin Ward c4b5455c21 REFACTOR: Rename `GooglebotDetection` to `CrawlerDetection` because we
will likely whitelist more crawlers in the future.
2014-02-20 16:07:02 -05:00