Commit Graph

76 Commits

Author SHA1 Message Date
Osama Sayegh b86127ad12
FEATURE: Apply rate limits per user instead of IP for trusted users (#14706)
Currently, Discourse rate limits all incoming requests by the IP address they
originate from regardless of the user making the request. This can be
frustrating if there are multiple users using Discourse simultaneously while
sharing the same IP address (e.g. employees in an office).

This commit implements a new feature to make Discourse apply rate limits by
user id rather than IP address for users at or higher than the configured trust
level (1 is the default).

For example, let's say a Discourse instance is configured to allow 200 requests
per minute per IP address, and we have 10 users at trust level 4 using
Discourse simultaneously from the same IP address. Before this feature, the 10
users could only make a total of 200 requests per minute before they got rate
limited. But with the new feature, each user is allowed to make 200 requests
per minute because the rate limits are applied on user id rather than the IP
address.

The minimum trust level for applying user-id-based rate limits can be
configured by the `skip_per_ip_rate_limit_trust_level` global setting. The
default is 1, but it can be changed by either adding the
`DISCOURSE_SKIP_PER_IP_RATE_LIMIT_TRUST_LEVEL` environment variable with the
desired value to your `app.yml`, or changing the setting's value in the
`discourse.conf` file.

Requests made with API keys are still rate limited by IP address and the
relevant global settings that control API keys rate limits.

Before this commit, Discourse's auth cookie (`_t`) was simply a 32 characters
string that Discourse used to lookup the current user from the database and the
cookie contained no additional information about the user. However, we had to
change the cookie content in this commit so we could identify the user from the
cookie without making a database query before the rate limits logic and avoid
introducing a bottleneck on busy sites.

Besides the 32 characters auth token, the cookie now includes the user id,
trust level and the cookie's generation date, and we encrypt/sign the cookie to
prevent tampering.

Internal ticket number: t54739.
2021-11-17 23:27:30 +03:00
Rafael dos Santos Silva 6645243a26
SECURITY: Disallow caching of MIME/Content-Type errors (#14907)
This will sign intermediary proxies and/or misconfigured CDNs to not
cache those error responses.
2021-11-12 15:52:25 -03:00
David Taylor 7a52ce0d6d
FIX: Strip `discourse-logged-in` header during `force_anonymous!` (#14533)
When the anonymous cache forces users into anonymous mode, it strips the cookies from their request. However, the discourse-logged-in header from the JS client remained.

When the discourse-logged-in header is present without any valid auth_token, the current_user_provider [marks the request as ['logged out'](dbbfad7ed0/lib/auth/default_current_user_provider.rb (L125-L125)), and a [discourse-logged-out header is returned to the client](dbbfad7ed0/lib/middleware/request_tracker.rb (L209-L211)). This causes the JS app to [popup a "you were logged out" modal](dbbfad7ed0/app/assets/javascripts/discourse/app/components/d-document.js (L29-L29)), which is very disruptive.

This commit strips the discourse-logged-in header from the request at the same time as the auth cookie.
2021-10-07 12:31:42 +01:00
Rafael dos Santos Silva b136375582
FEATURE: Rate limit exceptions via ENV (#14033)
Allow admins to configure exceptions to our Rails rate limiter.

Configuration happens in the environment variables, and work with both
IPs and CIDR blocks.

Example:

```
env:
  DISCOURSE_MAX_REQS_PER_IP_EXCEPTIONS: >-
    14.15.16.32/27
    216.148.1.2
```
2021-08-13 12:00:23 -03:00
Josh Soref 59097b207f
DEV: Correct typos and spelling mistakes (#12812)
Over the years we accrued many spelling mistakes in the code base. 

This PR attempts to fix spelling mistakes and typos in all areas of the code that are extremely safe to change 

- comments
- test descriptions
- other low risk areas
2021-05-21 11:43:47 +10:00
Andrei Prigorshnev 075cd07a07
No need to disable rate limiter after running tests (#13093)
We disable rate limiter before running every test here 90ab3b1c75/spec/rails_helper.rb (L109-L109)
2021-05-19 16:04:35 +04:00
Bianca Nenciu 765ba1ab2d
FEATURE: Ignore anonymous page views on private sites (#12800)
For sites with login_required set to true, counting anonymous pageviews is
confusing. Requests to /login and other pages would make it look like
anonymous users have access to site's content.
2021-04-26 14:19:47 +03:00
Jarek Radosz 6ff888bd2c
DEV: Retry-after header values should be strings (#12475)
Fixes `Rack::Lint::LintError: a header value must be a String, but the value of 'Retry-After' is a Integer`. (see: 14a236b4f0/lib/rack/lint.rb (L676))

I found it when I got flooded by those warning a while back in a test-related accident 😉 (ember CLI tests were hitting a local rails server at a fast rate)
2021-03-23 20:32:36 +01:00
Martin Brennan 6eb0d0c38d
SECURITY: Fix is_private_ip for RateLimiter to cover all cases (#12464)
The regular expression to detect private IP addresses did not always detect them successfully.
Changed to use ruby's in-built IPAddr.new(ip_address).private? method instead
which does the same thing but covers all cases.
2021-03-22 13:56:32 +10:00
Sam 9fb9a2c098
DEV: freeze time when running rate limiter tests (#12315)
This avoids issues around clock skew making retry-after return 9 instead of
10
2021-03-11 10:47:23 +11:00
Dan Ungureanu 1f2f84a6df
FIX: Add Retry-Header to rate limited responses (#11736)
It returned a 429 error code with a 'Retry-After' header if a
RateLimiter::LimitExceeded was raised and unhandled, but the header was
missing if the request was limited in the 'RequestTracker' middleware.
2021-01-19 11:35:46 +02:00
Alan Guo Xiang Tan 38b6b098bc
FIX: Bypass `AnonymousCache` for `/srv/status` route. (#11491)
`/srv/status` routes should not be cached at all. Also, we want to
decouple the route from Redis which `AnonymouseCache` relies on. The
`/srv/status` should continue to return a success response even if Redis
is down.
2020-12-16 16:47:46 +11:00
Sam a6d9adf346
DEV: ensure queue_time and background_requests are floats (#10901)
GlobalSetting can end up with a String and we expect a Float
2020-10-13 18:08:38 +11:00
Sam 32393f72b1
PERF: backoff background requests when overloaded (#10888)
When the server gets overloaded and lots of requests start queuing server
will attempt to shed load by returning 429 errors on background requests.

The client can flag a request as background by setting the header:
`Discourse-Background` to `true`

Out-of-the-box we shed load when the queue time goes above 0.5 seconds.

The only request we shed at the moment is the request to load up a new post
when someone posts to a topic.

We can extend this as we go with a more general pattern on the client.

Previous to this change, rate limiting would "break" the post stream which
would make suggested topics vanish and users would have to scroll the page
to see more posts in the topic.

Server needs this protection for cases where tons of clients are navigated
to a topic and a new post is made. This can lead to a self inflicted denial
of service if enough clients are viewing the topic.

Due to the internal security design of Discourse it is hard for a large
number of clients to share a channel where we would pass the full post body
via the message bus.

It also renames (and deprecates) triggerNewPostInStream to triggerNewPostsInStream

This allows us to load a batch of new posts cleanly, so the controller can
keep track of a backlog

Co-authored-by: Joffrey JAFFEUX <j.jaffeux@gmail.com>
2020-10-13 16:56:03 +11:00
Guo Xiang Tan 105d560177
SECURITY: 413 for GET, HEAD or DELETE requests with payload. 2020-08-03 14:21:33 +08:00
Guo Xiang Tan 32af607b70
DEV: Refactor anonymouse cache spec.
Mainly to properly categorize `Middleware::AnonymousCache` vs `Middleware::AnonymousCache::Helper` specs.
2020-08-03 14:17:11 +08:00
Krzysztof Kotlarek e0d9232259
FIX: use allowlist and blocklist terminology (#10209)
This is a PR of the renaming whitelist to allowlist and blacklist to the blocklist.
2020-07-27 10:23:54 +10:00
David Taylor c09b5807f3
FIX: Include resolved locale in anonymous cache key (#10289)
This only applies when set_locale_from_accept_language_header is enabled
2020-07-22 18:00:07 +01:00
Guo Xiang Tan d01c336899
DEV: Clean up some Redis leaks in test env. 2020-05-18 17:27:37 +08:00
Kane York 58ae0d4bd9
DEV: Add test case for /srv/status probers (#9259) 2020-03-24 16:28:07 +11:00
David Taylor 19814c5e81
FIX: Allow CSP to work correctly for non-default hostnames/schemes (#9180)
- Define the CSP based on the requested domain / scheme (respecting force_https)
- Update EnforceHostname middleware to allow secondary domains, add specs
- Add URL scheme to anon cache key so that CSP headers are cached correctly
2020-03-19 19:54:42 +00:00
Sam Saffron e440ec2519 FIX: crawler requests not tracked for non UTF-8 user agents
Non UTF-8 user_agent requests were bypassing logging due to PG always
wanting UTF-8 strings.

This adds some conversion to ensure we are always dealing with UTF-8
2019-12-09 17:43:51 +11:00
Joffrey JAFFEUX 0d3d2c43a0
DEV: s/\$redis/Discourse\.redis (#8431)
This commit also adds a rubocop rule to prevent global variables.
2019-12-03 10:05:53 +01:00
Sam Saffron 423ad5f0a4 FIX: do not log if an invalid mime type is passed to app
Previously our custom exception handler was unable to handle situations
where an invalid mime type was sent, resulting in a warning log

This ensures we pretend a request is HTML for the purpose of rendering
the error page if an invalid mime type from a scanner is shipped to the app
2019-11-21 15:51:34 +11:00
Sam Saffron 7d389df5e7 DEV: correct spec to allow for new default
b4bfc27b changes the default so the spec should be changed as well.
2019-11-18 16:05:58 +11:00
Penar Musaraj 74869b8a7f FIX: Do not consider mobile app traffic as crawler visits
Followup to a4eb523a
2019-11-04 09:16:50 -05:00
Krzysztof Kotlarek 427d54b2b0 DEV: Upgrading Discourse to Zeitwerk (#8098)
Zeitwerk simplifies working with dependencies in dev and makes it easier reloading class chains. 

We no longer need to use Rails "require_dependency" anywhere and instead can just use standard 
Ruby patterns to require files.

This is a far reaching change and we expect some followups here.
2019-10-02 14:01:53 +10:00
Sam Saffron ed00f35306 FEATURE: improve performance of anonymous cache
This commit introduces 2 features:

1. DISCOURSE_COMPRESS_ANON_CACHE (true|false, default false): this allows
you to optionally compress the anon cache body entries in Redis, can be
useful for high load sites with Redis that lives on a separate server to
to webs

2. DISCOURSE_ANON_CACHE_STORE_THRESHOLD (default 2), only pop entries into
redis if we observe them more than N times. This avoids situations where
a crawler can walk a big pile of topics and store them all in Redis never
to be used. Our default anon cache time for topics is only 60 seconds. Anon
cache is in place to avoid the "slashdot" effect where a single topic is
hit by 100s of people in one minute.
2019-09-04 17:18:32 +10:00
Sam Saffron b9954b53bb FIX: report cached controller and action to loggers
Previously we would treat all cached hits in anon cache as "other"

This hinders analysis of cache performance and makes logging inaccurate
2019-09-03 10:55:16 +10:00
Sam Saffron 08743e8ac0 FEATURE: anon cache reports data to loggers
This allows custom plugins such as prometheus exporter to log how many
requests are stored in the anon cache vs used by the anon cache.

This metric allows us to fine tune cache behaviors
2019-09-02 18:45:35 +10:00
Sam Saffron 62141b6316 FEATURE: enable_performance_http_headers for performance diagnostics
This adds support for DISCOURSE_ENABLE_PERFORMANCE_HTTP_HEADERS
when set to `true` this will turn on performance related headers

```text
X-Redis-Calls: 10     # number of redis calls
X-Redis-Time: 1.02    # redis time in seconds
X-Sql-Commands: 102   # number of SQL commands
X-Sql-Time: 1.02      # duration in SQL in seconds
X-Queue-Time: 1.01    # time the request sat in queue (depends on NGINX)
```

To get queue time NGINX must provide: HTTP_X_REQUEST_START

We do not recommend you enable this without thinking, it exposes information
about what your page is doing, usually you would only enable this if you
intend to strip off the headers further down the stream in a proxy
2019-06-05 16:08:11 +10:00
Penar Musaraj a4eb523af6 Track Discourse user agent pageviews as crawler
Since 5bfe051e, Discourse user agents are marked as non-crawlers (to avoid accidental blacklisting). This makes sure pageviews for these agents are tracked as crawler hits.
2019-05-08 10:38:55 -04:00
Sam Saffron 4ea21fa2d0 DEV: use #frozen_string_literal: true on all spec
This change both speeds up specs (less strings to allocate) and helps catch
cases where methods in Discourse are mutating inputs.

Overall we will be migrating everything to use #frozen_string_literal: true
it will take a while, but this is the first and safest move in this direction
2019-04-30 10:27:42 +10:00
Neil Lalonde 526ffc4966 FIX: error in response body to blocked crawlers, showing 500 Internal Server Error with status of 403 2018-09-14 15:40:20 -04:00
Neil Lalonde b87a089822 FIX: don't block api requests when whitelisted_crawler_user_agents is set 2018-09-14 15:40:20 -04:00
Osama Sayegh 0b7ed8ffaf FEATURE: backend support for user-selectable components
* FEATURE: backend support for user-selectable components

* fix problems with previewing default theme

* rename preview_key => preview_theme_id

* omit default theme from child themes dropdown and try a different fix

* cache & freeze stylesheets arrays
2018-08-08 14:46:34 +10:00
Sam 379384ae1e FIX: never block /srv/status which is used for health checks
This route is also very cheap so blocking it is not required

It is still rate limited and so on elsewhere
2018-07-18 12:37:01 +10:00
OsamaSayegh decf1f27cf FEATURE: Groundwork for user-selectable theme components
* Phase 0 for user-selectable theme components

- Drops `key` column from the `themes` table
- Drops `theme_key` column from the `user_options` table
- Adds `theme_ids` (array of ints default []) column to the `user_options` table and migrates data from `theme_key` to the new column.
- Removes the `default_theme_key` site setting and adds `default_theme_id` instead.
- Replaces `theme_key` cookie with a new one called `theme_ids`
- no longer need Theme.settings_for_client
2018-07-12 14:18:21 +10:00
Sam e72fd7ae4e FIX: move crawler blocking into anon cache
This refinement of previous fix moves the crawler blocking into
anonymous cache

This ensures we never poison the cache incorrectly when blocking crawlers
2018-07-04 11:14:43 +10:00
Neil Lalonde e8a6323bea remove crawler blocking until multisite support 2018-07-03 17:54:45 -04:00
Sam 035312d501 FIX: specify path for dosp cookie 2018-04-24 11:07:58 -04:00
Sam ded84a4b58 PERF: improve performance once logged in rate limiter hits
If "logged in" is being forced anonymous on certain routes, trigger
the protection for any requests that spend 50ms queueing

This means that ...

1. You need to trip it by having 3 requests take longer than 1 second in 10 second interval
2. Once tripped, if your route is still spending 50m queueuing it will continue to be protected

This means that site will continue to function with almost no delays while it is scaling up to handle the new load
2018-04-23 11:55:25 +10:00
Sam 59cd7894d9 FEATURE: if site is under extreme load show anon view
If a particular path is being hit extremely hard by logged on users,
revert to anonymous cached view.

This will only come into effect if 3 requests queue for longer than 2 seconds
on a *single* path.

This can happen if a URL is shared with the entire forum base and everyone
is logged on
2018-04-18 16:58:57 +10:00
Neil Lalonde b87fa6d749 FIX: blacklisted crawlers could get through by omitting the accept header 2018-04-17 12:39:30 -04:00
Sam 9980f18d86 FEATURE: track request queueing as early as possible 2018-04-17 18:06:17 +10:00
Neil Lalonde 7311023a52
Merge pull request #5700 from discourse/crawl-block
FEATURE: control web crawlers access with white/blacklist
2018-03-27 15:06:03 -04:00
Neil Lalonde 4d12ff2e8a when writing cache, remove elements from the user agents list. also return a message and content type when blocking a crawler. 2018-03-27 13:44:14 -04:00
Sam 31dea5d5fc correct flaky spec 2018-03-27 17:57:19 +11:00
Neil Lalonde a84bb81ab5 only applies to get html requests 2018-03-22 17:57:44 -04:00
Neil Lalonde ced7e9a691 FEATURE: control which web crawlers can access using a whitelist or blacklist 2018-03-22 15:41:02 -04:00