We have a couple of site setting, `slow_down_crawler_user_agents` and `slow_down_crawler_rate`, that are meant to allow site owners to signal to specific crawlers that they're crawling the site too aggressively and that they should slow down.
When a crawler is added to the `slow_down_crawler_user_agents` setting, Discourse currently adds a `Crawl-delay` directive for that crawler in `/robots.txt`. Unfortunately, many crawlers don't support the `Crawl-delay` directive in `/robots.txt` which leaves the site owners no options if a crawler is crawling the site too aggressively.
This PR replaces the `Crawl-delay` directive with proper rate limiting for crawlers added to the `slow_down_crawler_user_agents` list. On every request made by a non-logged in user, Discourse will check the User Agent string and if it contains one of the values of the `slow_down_crawler_user_agents` list, Discourse will only allow 1 request every N seconds for that User Agent (N is the value of the `slow_down_crawler_rate` setting) and the rest of requests made within the same interval will get a 429 response.
The `slow_down_crawler_user_agents` setting becomes quite dangerous with this PR since it could rate limit lots if not all of anonymous traffic if the setting is not used appropriately. So to protect against this scenario, we've added a couple of new validations to the setting when it's changed:
1) each value added to setting must 3 characters or longer
2) each value cannot be a substring of tokens found in popular browser User Agent. The current list of prohibited values is: apple, windows, linux, ubuntu, gecko, firefox, chrome, safari, applewebkit, webkit, mozilla, macintosh, khtml, intel, osx, os x, iphone, ipad and mac.
The 'Discourse SSO' protocol is being rebranded to DiscourseConnect. This should help to reduce confusion when 'SSO' is used in the generic sense.
This commit aims to:
- Rename `sso_` site settings. DiscourseConnect specific ones are prefixed `discourse_connect_`. Generic settings are prefixed `auth_`
- Add (server-side-only) backwards compatibility for the old setting names, with deprecation notices
- Copy `site_settings` database records to the new names
- Rename relevant translation keys
- Update relevant translations
This commit does **not** aim to:
- Rename any Ruby classes or methods. This might be done in a future commit
- Change any URLs. This would break existing integrations
- Make any changes to the protocol. This would break existing integrations
- Change any functionality. Further normalization across DiscourseConnect and other auth methods will be done separately
The risks are:
- There is no backwards compatibility for site settings on the client-side. Accessing auth-related site settings in Javascript is fairly rare, and an error on the client side would not be security-critical.
- If a plugin is monkey-patching parts of the auth process, changes to locale keys could cause broken error messages. This should also be unlikely. The old site setting names remain functional, so security-related overrides will remain working.
A follow-up commit will be made with a post-deploy migration to delete the old `site_settings` rows.
There are issues around displaying images on published pages when secure media is enabled. This PR temporarily makes it appear as if published pages are enabled if secure media is also enabled.
* FIX: second factor cannot be enabled if SSO is enabled
If `enable_sso` setting is enabled then admin should not be able to
enable `enforce_second_factor` setting as that will lock users out.
Co-authored-by: Robin Ward <robin.ward@gmail.com>
The secure media functionality relied on `SiteSetting.enable_s3_uploads?` which, as we found in dev, did not take into account global S3 settings via `GlobalSetting.use_s3?`. We now use `SiteSetting.Upload.enable_s3_uploads` instead to be more consistent.
Also, we now validate `enable_s3_uploads` changes, because if `GlobalSetting.use_s3?` is true users should NOT be enabling S3 uploads manually.
It was possible to add a category to more than one default group, e.g. "default categories muted" and "default categories watching first post".
The bug was caused by category validations inadvertently comparing strings and numbers.
This is a low severity security fix because it requires a logged in
admin user to update a site setting via the API directly to an invalid
value.
The fix adds validation for the affected site settings, as well as a
secondary fix to prevent injection in the event of bad data somehow
already exists.
This removes all uses of both `send` and `public_send` from consumers of
SiteSetting and instead introduces a `get` helper for dynamic lookup
This leads to much cleaner and safer code long term as we are always explicit
to test that a site setting is really there before sending an arbitrary
string to the class
It also removes a couple of risky stubs from the auth provider test
This change both speeds up specs (less strings to allocate) and helps catch
cases where methods in Discourse are mutating inputs.
Overall we will be migrating everything to use #frozen_string_literal: true
it will take a while, but this is the first and safest move in this direction
This validation makes sure that the s3_upload_bucket and the
s3_backup_bucket have different values. The backup bucket is
allowed to be a subfolder of the upload bucket. The other way
around is forbidden because the backup system searches by
prefix and would return all files stored within the backup
bucket and its subfolders.