discourse/app/controllers
Osama Sayegh 7bd3986b21
FEATURE: Replace `Crawl-delay` directive with proper rate limiting (#15131)
We have a couple of site setting, `slow_down_crawler_user_agents` and `slow_down_crawler_rate`, that are meant to allow site owners to signal to specific crawlers that they're crawling the site too aggressively and that they should slow down.

When a crawler is added to the `slow_down_crawler_user_agents` setting, Discourse currently adds a `Crawl-delay` directive for that crawler in `/robots.txt`. Unfortunately, many crawlers don't support the `Crawl-delay` directive in `/robots.txt` which leaves the site owners no options if a crawler is crawling the site too aggressively.

This PR replaces the `Crawl-delay` directive with proper rate limiting for crawlers added to the `slow_down_crawler_user_agents` list. On every request made by a non-logged in user, Discourse will check the User Agent string and if it contains one of the values of the `slow_down_crawler_user_agents` list, Discourse will only allow 1 request every N seconds for that User Agent (N is the value of the `slow_down_crawler_rate` setting) and the rest of requests made within the same interval will get a 429 response. 

The `slow_down_crawler_user_agents` setting becomes quite dangerous with this PR since it could rate limit lots if not all of anonymous traffic if the setting is not used appropriately. So to protect against this scenario, we've added a couple of new validations to the setting when it's changed:

1) each value added to setting must 3 characters or longer
2) each value cannot be a substring of tokens found in popular browser User Agent. The current list of prohibited values is: apple, windows, linux, ubuntu, gecko, firefox, chrome, safari, applewebkit, webkit, mozilla, macintosh, khtml, intel, osx, os x, iphone, ipad and mac.
2021-11-30 12:55:25 +03:00
..
admin DEV: Hash tokens stored from email_tokens (#14493) 2021-11-25 09:34:39 +02:00
users DEV: Hash tokens stored from email_tokens (#14493) 2021-11-25 09:34:39 +02:00
about_controller.rb Revert "Revert "Merge branch 'master' of https://github.com/discourse/discourse"" 2020-05-23 00:56:13 -04:00
application_controller.rb FEATURE: Replace `Crawl-delay` directive with proper rate limiting (#15131) 2021-11-30 12:55:25 +03:00
badges_controller.rb UX: Add image uploader widget for uploading badge images (#12377) 2021-03-17 08:55:23 +03:00
bookmarks_controller.rb FEATURE: Topic-level bookmarks (#14353) 2021-09-21 08:45:47 +10:00
bootstrap_controller.rb FIX: allows authentication data to be present in bootstrap (#13885) 2021-07-29 15:01:11 +02:00
categories_controller.rb FIX: Display top posts from private categories if the user has access. (#14878) 2021-11-11 13:35:03 -03:00
clicks_controller.rb DEV: enable frozen string literal on all files 2019-05-13 09:31:32 +08:00
composer_messages_controller.rb DEV: Upgrading Discourse to Zeitwerk (#8098) 2019-10-02 14:01:53 +10:00
csp_reports_controller.rb DEV: Only include "report-sample" CSP directive when reporting is enabled (#9337) 2020-04-02 11:16:38 -04:00
directory_columns_controller.rb DEV: Plugin API to add directory columns (#13440) 2021-06-22 13:00:04 -05:00
directory_items_controller.rb FIX: Include user_field_ids in pagination URL for directory items (#13569) 2021-06-29 14:43:38 -05:00
do_not_disturb_controller.rb DEV: Replace 'processed' column on notifications with new table (#11864) 2021-01-27 10:29:24 -06:00
drafts_controller.rb FEATURE: Cook drafts excerpt in user activity (#14315) 2021-09-14 15:18:01 +03:00
edit_directory_columns_controller.rb FIX: Always serialize the correct attributes for DirectoryItems (#13510) 2021-06-23 14:55:17 -05:00
email_controller.rb FIX: set mailing_list_mode to false when unsubscribing from all (#10354) 2020-08-03 16:59:54 +10:00
embed_controller.rb UX: display correct replies count in embedded comments view. (#14175) 2021-08-30 10:37:53 +05:30
exceptions_controller.rb FEATURE: Add site setting to show more detailed 404 errors. (#8014) 2019-10-08 14:15:08 +03:00
export_csv_controller.rb DEV: Switch to new ExportUserArchive job 2020-08-28 11:46:53 -07:00
extra_locales_controller.rb Replace `base_uri` with `base_path` (#10879) 2020-10-09 12:51:24 +01:00
finish_installation_controller.rb DEV: Hash tokens stored from email_tokens (#14493) 2021-11-25 09:34:39 +02:00
forums_controller.rb FEATURE: Allow a cluster_name to be configured and used for /srv/status (#12365) 2021-03-15 15:41:59 +11:00
groups_controller.rb DEV: Let's always give a drop_from param to deprecate (#14901) 2021-11-12 08:52:59 -06:00
hashtags_controller.rb DEV: Merge category and tag hashtags code paths (#10216) 2020-07-13 19:13:17 +03:00
highlight_js_controller.rb DEV: apply allow origin response header for CDN requests. (#11893) 2021-01-29 07:44:49 +05:30
inline_onebox_controller.rb FIX: Make inline oneboxes work with secured topics in secured contexts (#8895) 2020-02-12 12:11:28 +02:00
invites_controller.rb DEV: Hash tokens stored from email_tokens (#14493) 2021-11-25 09:34:39 +02:00
list_controller.rb PERF: Revert all inboxes from messages route. (#14445) 2021-09-28 11:58:04 +08:00
metadata_controller.rb DEV: Upgrade Rails to 6.1.3.1 (#12688) 2021-04-21 12:36:32 +03:00
notifications_controller.rb FIX: Typo in `NotificationsController#index` not caught by tests. 2020-07-22 09:22:26 +08:00
offline_controller.rb DEV: enable frozen string literal on all files 2019-05-13 09:31:32 +08:00
onebox_controller.rb DEV: Add more debugging context to onebox generation 2020-10-22 12:50:22 +08:00
permalinks_controller.rb FIX: Check for permalinks before showing the 404 page 2020-03-23 16:31:07 -07:00
post_action_users_controller.rb FEATURE: Allow category group moderators to delete topics (#11069) 2020-11-05 12:18:26 -05:00
post_actions_controller.rb FEATURE: Admins can flag posts so they can review them later. (#12311) 2021-03-11 08:21:24 -03:00
post_readers_controller.rb DEV: '= true' is not necessary 2019-12-03 11:32:45 -03:00
posts_controller.rb FEATURE: Display pending posts on user’s page 2021-11-29 10:26:33 +01:00
presence_controller.rb UX: Make PresenceChannel changes more responsive (#14733) 2021-10-26 21:15:20 +01:00
published_pages_controller.rb FIX: Do not enable published page if secure media enabled (#11131) 2020-11-06 10:33:19 +10:00
push_notification_controller.rb DEV: enable frozen string literal on all files 2019-05-13 09:31:32 +08:00
qunit_controller.rb DEV: Fix theme qunit error messages (#14420) 2021-09-22 20:00:19 +02:00
reviewable_claimed_topics_controller.rb FEATURE: Allow group moderators to close/archive topics 2020-07-14 12:36:19 -04:00
reviewables_controller.rb FEATURE: Show stale reviewable to other clients (#13114) 2021-05-26 09:47:35 +10:00
robots_txt_controller.rb FEATURE: Replace `Crawl-delay` directive with proper rate limiting (#15131) 2021-11-30 12:55:25 +03:00
safe_mode_controller.rb FEATURE: Always disable customizations on the `/safe-mode` route (#9052) 2020-02-28 10:53:11 +00:00
search_controller.rb FIX: global setting needs to be coerced to float (#11162) 2020-11-09 16:46:52 +11:00
session_controller.rb DEV: Hash tokens stored from email_tokens (#14493) 2021-11-25 09:34:39 +02:00
similar_topics_controller.rb PERF: Avoid parsing `Post#cooked` with Nokogiri for every search. 2020-07-24 10:43:09 +08:00
site_controller.rb DEV: Include `login_required` attribute in basic info endpoint (#14064) 2021-08-17 14:05:51 -04:00
static_controller.rb DEV: Upgrade Rails to 6.1.3.1 (#12688) 2021-04-21 12:36:32 +03:00
steps_controller.rb DEV: Upgrading Discourse to Zeitwerk (#8098) 2019-10-02 14:01:53 +10:00
stylesheets_controller.rb DEV: Fix stylesheet manager flaky spec (#13846) 2021-07-26 14:22:54 +10:00
svg_sprite_controller.rb FIX: Use absolute URL when redirecting SVG sprite path. 2021-06-30 11:25:05 +08:00
tag_groups_controller.rb FIX: Allow finding non-lowercase tag groups (#12787) 2021-04-21 19:15:53 +02:00
tags_controller.rb FIX: Better and more secure validation of periods for TopicQuery 2021-07-23 14:24:44 -04:00
theme_javascripts_controller.rb FEATURE: Allow theme tests to be run in production (take 2) (#12845) 2021-04-28 23:12:08 +03:00
topics_controller.rb FEATURE: Add setting to disable notifications for topic tags edits (#14794) 2021-11-02 13:53:21 -04:00
uploads_controller.rb DEV: Extract shared external upload routes into controller helper (#14984) 2021-11-18 09:17:23 +10:00
user_actions_controller.rb FIX: restrict other user's notification routes (#14442) 2021-09-29 16:24:28 +04:00
user_api_keys_controller.rb FEATURE: Rename 'Discourse SSO' to DiscourseConnect (#11978) 2021-02-08 10:04:33 +00:00
user_avatars_controller.rb DEV: apply allow origin response header for CDN requests. (#11893) 2021-01-29 07:44:49 +05:30
user_badges_controller.rb FIX: simplify and improve choosing favorite badges (#13743) 2021-07-16 11:13:00 +08:00
users_controller.rb FEATURE: show recent searches in quick search panel (#15024) 2021-11-25 15:44:15 -05:00
users_email_controller.rb DEV: Hash tokens stored from email_tokens (#14493) 2021-11-25 09:34:39 +02:00
webhooks_controller.rb FEATURE: IMAP delete email sync for group inboxes (#10392) 2020-08-12 10:16:26 +10:00
wizard_controller.rb DEV: Upgrading Discourse to Zeitwerk (#8098) 2019-10-02 14:01:53 +10:00