Commit Graph

195 Commits

Author SHA1 Message Date
Osama Sayegh 361992bb74
FIX: Apply crawler rate limits to cached requests (#27174)
This commit moves the logic for crawler rate limits out of the application controller and into the request tracker middleware. The reason for this move is to apply rate limits to all crawler requests instead of just the requests that make it to the application controller. Some requests are served early from the middleware stack without reaching the Rails app for performance reasons (e.g. `AnonymousCache`) which results in crawlers getting 200 responses even though they've reached their limits and should be getting 429 responses.

Internal topic: t/128810.
2024-05-27 16:26:35 +03:00
David Taylor ece0150cb7
FIX: Ensure RequestTracker handles bubbled exceptions correctly (#26940)
This can happen for various reasons including rate limiting and middleware bugs. This should resolve the warning we're seeing in the logs

```
RequestTracker.get_data failed : NoMethodError : undefined method `[]' for nil:NilClass
```
2024-05-08 16:08:39 +01:00
David Taylor 620f76cec1
DEV: Log original exception/backtrace for RequestTracker errors (#26802) 2024-04-29 09:05:32 +01:00
David Taylor 2f2da72747
FEATURE: Add experimental tracking of 'real browser' pageviews (#26647)
Our 'page_view_crawler' / 'page_view_anon' metrics are based purely on the User Agent sent by clients. This means that 'badly behaved' bots which are imitating real user agents are counted towards 'anon' page views.

This commit introduces a new method of tracking visitors. When an initial HTML request is made, we assume it is a 'non-browser' request (i.e. a bot). Then, once the JS application has booted, we notify the server to count it as a 'browser' request. This reliance on a JavaScript-capable browser matches up more closely to dedicated analytics systems like Google Analytics.

Existing data collection and graphs are unchanged. Data collected via the new technique is available in a new 'experimental' report.
2024-04-25 11:00:01 +01:00
David Taylor bca855f239
FIX: Improve handling of 'PublicExceptions' when bootstrap_error_pages enabled (#26700)
- Run the CSP-nonce-related middlewares on the generated response

- Fix the readonly mode checking to avoid empty strings being passed (the `check_readonly_mode` before_action will not execute in the case of these re-dispatched exceptions)

- Move the BlockRequestsMiddleware cookie-setting to the middleware, so that it is included even for unusual HTML responses like these exceptions
2024-04-24 09:40:13 +01:00
Michael Brown 680f1ff19c FIX: Add content-type header to rate limiter error
It's best to always set a content-type header and one was missing here.
2024-03-26 12:39:42 -04:00
David Taylor 1672a24490
DEV: Memoize CSP nonce placeholder on response (#25724)
That way, the same value is used even if the helper is called in the context of different controllers

Followup to c8a1b49ddd
2024-02-16 12:15:55 +00:00
David Taylor b1f74ab59e
FEATURE: Add experimental option for strict-dynamic CSP (#25664)
The strict-dynamic CSP directive is supported in all our target browsers, and makes for a much simpler configuration. Instead of allowlisting paths, we use a per-request nonce to authorize `<script>` tags, and then those scripts are allowed to load additional scripts (or add additional inline scripts) without restriction.

This becomes especially useful when admins want to add external scripts like Google Tag Manager, or advertising scripts, which then go on to load a ton of other scripts.

All script tags introduced via themes will automatically have the nonce attribute applied, so it should be zero-effort for theme developers. Plugins *may* need some changes if they are inserting their own script tags.

This commit introduces a strict-dynamic-based CSP behind an experimental `content_security_policy_strict_dynamic` site setting.
2024-02-16 11:16:54 +00:00
David Taylor a562214f56
FIX: Update global rate limiter keys/messages to clarify user vs ip (#25264) 2024-01-15 19:54:50 +00:00
David Taylor 59c2407e18
FEATURE: add username header to global-rate-limited responses (#25265)
This will make it easier to analyze rate limiting in reverse-proxy logs. To make this possible without a database lookup, we add the username to the encrypted `_t` cookie data.
2024-01-15 19:50:37 +00:00
Jarek Radosz 694b5f108b
DEV: Fix various rubocop lints (#24749)
These (21 + 3 from previous PRs) are soon to be enabled in rubocop-discourse:

Capybara/VisibilityMatcher
Lint/DeprecatedOpenSSLConstant
Lint/DisjunctiveAssignmentInConstructor
Lint/EmptyConditionalBody
Lint/EmptyEnsure
Lint/LiteralInInterpolation
Lint/NonLocalExitFromIterator
Lint/ParenthesesAsGroupedExpression
Lint/RedundantCopDisableDirective
Lint/RedundantRequireStatement
Lint/RedundantSafeNavigation
Lint/RedundantStringCoercion
Lint/RedundantWithIndex
Lint/RedundantWithObject
Lint/SafeNavigationChain
Lint/SafeNavigationConsistency
Lint/SelfAssignment
Lint/UnreachableCode
Lint/UselessMethodDefinition
Lint/Void

Previous PRs:
Lint/ShadowedArgument
Lint/DuplicateMethods
Lint/BooleanSymbol
RSpec/SpecFilePathSuffix
2023-12-06 23:25:00 +01:00
Jarek Radosz 6a66dc1cfb
DEV: Fix Lint/BooleanSymbol (#24747) 2023-12-06 13:19:09 +01:00
Martin Brennan 30d5e752d7
DEV: Revert guardian changes (#24742)
I took the wrong approach here, need to rethink.

* Revert "FIX: Use Guardian.basic_user instead of new (anon) (#24705)"

This reverts commit 9057272ee2.

* Revert "DEV: Remove unnecessary method_missing from GuardianUser (#24735)"

This reverts commit a5d4bf6dd2.

* Revert "DEV: Improve Guardian devex (#24706)"

This reverts commit 77b6a038ba.

* Revert "FIX: Introduce Guardian::BasicUser for oneboxing checks (#24681)"

This reverts commit de983796e1.
2023-12-06 16:37:32 +10:00
Martin Brennan 9057272ee2
FIX: Use Guardian.basic_user instead of new (anon) (#24705)
c.f. de983796e1

There will soon be additional login_required checks
for Guardian, and the intent of many checks by automated
systems is better fulfilled by using BasicUser, which
simulates a logged in TL0 forum user, rather than an
anon user.

In some cases the use of anon still makes sense (e.g.
anonymous_cache), and in that case the more explicit
`Guardian.anon_user` is used
2023-12-06 11:56:21 +10:00
David Taylor c88303bb27
DEV: Relax auth provider registration restrictions for plugins (#24095)
In the past we would build the stack of Omniauth providers at boot, which meant that plugins had to register any authenticators in the root of their plugin.rb (i.e. not in an `after_initialize` block). This could be frustrating because many features are not available that early in boot (e.g. Zeitwerk autoloading).

Now that we build the omniauth strategy stack 'just in time', it is safe for plugins to register their auth methods in an `after_initialize` block. This commit relaxes the old restrictions so that plugin authors have the option to move things around.
2023-10-26 10:54:30 +01:00
David Taylor 5c38e55dc9
DEV: Only run omniauth strategies for enabled authenticators (#24094)
Previously, we would build the stack of omniauth authenticators once on boot. That meant that all strategies had to be included, even if they were disabled. We then used the `before_request_phase` to ensure disabled strategies could not be used. This works well, but it means that omniauth is often doing unnecessary work running logic in disabled strategies.

This commit refactors things so that we build the stack of strategies on each request. That means we only need to include the enabled strategies in the stack - disabled strategies are totally ignored. Building the stack on-demand like this does add some overhead to auth requests, but on the majority of sites that will be significantly outweighed by the fact we're now skipping logic for disabled authenticators.

As well as the slight performance improvement, this new approach means that:

- Broken (i.e. exception-raising) strategies cannot cause issues on a site if they're disabled

- `other_phase` of disabled strategies will never appear in the backtrace of other authentication errors
2023-10-25 13:52:33 +01:00
Alan Guo Xiang Tan 773b22e8d0
DEV: Seperate concerns of tracking GC stat from `MethodProfiler` (#22921)
Why this change?

This is a follow up to e8f7b62752.
Tracking of GC stats didn't really belong in the `MethodProfiler` class
so we want to extract that concern into its own class.

As part of this PR, the `track_gc_stat_per_request` site setting has
also been renamed to `instrument_gc_stat_per_request`.
2023-08-02 10:46:37 +08:00
OsamaSayegh 0976c8fad6
SECURITY: Don't reuse CSP nonce between anonymous requests 2023-07-28 12:53:44 +01:00
Alan Guo Xiang Tan 68bb53a196
DEV: Fix failing spec after Rails upgrade to 7.0.5.1 (#22317)
Follow up to 4d3999de10
2023-06-28 08:17:11 +08:00
Alan Guo Xiang Tan 0d9efa938b
DEV: Avoid logging routing errors (#20622)
The logs are usually caused by the client and is of no use to us.
2023-03-10 17:17:59 +08:00
David Taylor 798b4bb604
FIX: Ensure anon-cached values are never returned for API requests (#20021)
Under some situations, we would inadvertently return a public (unauthenticated) result to an authenticated API request. This commit adds the `Api-Key` header to our anonymous cache bypass logic.
2023-01-26 13:26:29 +00:00
Daniel Waterworth 666536cbd1
DEV: Prefer \A and \z over ^ and $ in regexes (#19936) 2023-01-20 12:52:49 -06:00
Loïc Guitaut 4093fc6074 Revert "DEV: Migrate existing cookies to Rails 7 format"
This reverts commit 66e8fe9cc6 as it
unexpectedly caused some users to be logged out. We are investigating
the problem.
2023-01-12 12:07:49 +01:00
Loïc Guitaut 66e8fe9cc6 DEV: Migrate existing cookies to Rails 7 format
This patch introduces a cookies rotator as indicated in the Rails
upgrade guide. This allows to migrate from the old SHA1 digest to the
new SHA256 digest.
2023-01-12 11:09:07 +01:00
David Taylor 6417173082
DEV: Apply syntax_tree formatting to `lib/*` 2023-01-09 12:10:19 +00:00
David Taylor 66e8a35b4d
DEV: Include message-bus request type in HTTP request data (#19762) 2023-01-06 11:26:18 +00:00
Penar Musaraj 8546c2084a
DEV: Load SVG sprites during system spec runs (#19497)
Co-authored-by: David Taylor <david@taylorhq.com>
2022-12-22 08:13:43 -05:00
Bianca Nenciu 3048d3d07d
FEATURE: Track API and user API requests (#19186)
Adds stats for API and user API requests similar to regular page views.
This comes with a new report to visualize API requests per day like the
consolidated page views one.
2022-11-29 13:07:42 +02:00
Vinoth Kannan 076abe46fa
FEATURE: new site setting to set locale from cookie for anonymous users. (#18377)
This new hidden default-disabled site setting `set_locale_from_cookie` will set locale from anonymous user's cookie value.
2022-09-27 14:26:06 +05:30
Loïc Guitaut 008b700a3f DEV: Upgrade to Rails 7
This patch upgrades Rails to version 7.0.2.4.
2022-04-28 11:51:03 +02:00
David Taylor 8f786268be
SECURITY: Ensure user-agent-based responses are cached separately (#16475) 2022-04-14 14:25:52 +01:00
David Taylor cd6b7459a7
DEV: Improve background-request information in request_tracker (#16037)
This will allow consumers (e.g. the discourse-prometheus plugin) to separate topic-timings and message-bus requests. It also fixes the is_background boolean for subfolder sites.
2022-02-23 12:45:42 +00:00
David Taylor 11c93342dc
DEV: Consolidate Redis evalsha logic into DiscourseRedis::EvalHelper (#15957) 2022-02-15 16:06:12 +00:00
David Taylor 64be371749
DEV: Improve handling of invalid requests (#15841)
Our discourse_public_exceptions middleware is designed to catch bubbled exceptions from lower in the stack, and then use `ApplicationController.rescue_with_handler` to render an appropriate error response.

When the request itself is invalid, we had an escape-hatch to skip re-dispatching the request to ApplicationController. However, it was possible to work around this by 'layering' the errors. For example, if you made a request which resulted in a 404, but **also** had some other invalidity, the escape hatch would not be triggered.

This commit ensures that these kind of 'layered' errors are properly handled, without logging warnings. It also adds detection for invalid JSON bodies and badly-formed multipart requests.

The user-facing behavior is unchanged. This commit simply prevents warnings being logged for invalid requests.
2022-02-07 13:16:57 +00:00
Osama Sayegh b86127ad12
FEATURE: Apply rate limits per user instead of IP for trusted users (#14706)
Currently, Discourse rate limits all incoming requests by the IP address they
originate from regardless of the user making the request. This can be
frustrating if there are multiple users using Discourse simultaneously while
sharing the same IP address (e.g. employees in an office).

This commit implements a new feature to make Discourse apply rate limits by
user id rather than IP address for users at or higher than the configured trust
level (1 is the default).

For example, let's say a Discourse instance is configured to allow 200 requests
per minute per IP address, and we have 10 users at trust level 4 using
Discourse simultaneously from the same IP address. Before this feature, the 10
users could only make a total of 200 requests per minute before they got rate
limited. But with the new feature, each user is allowed to make 200 requests
per minute because the rate limits are applied on user id rather than the IP
address.

The minimum trust level for applying user-id-based rate limits can be
configured by the `skip_per_ip_rate_limit_trust_level` global setting. The
default is 1, but it can be changed by either adding the
`DISCOURSE_SKIP_PER_IP_RATE_LIMIT_TRUST_LEVEL` environment variable with the
desired value to your `app.yml`, or changing the setting's value in the
`discourse.conf` file.

Requests made with API keys are still rate limited by IP address and the
relevant global settings that control API keys rate limits.

Before this commit, Discourse's auth cookie (`_t`) was simply a 32 characters
string that Discourse used to lookup the current user from the database and the
cookie contained no additional information about the user. However, we had to
change the cookie content in this commit so we could identify the user from the
cookie without making a database query before the rate limits logic and avoid
introducing a bottleneck on busy sites.

Besides the 32 characters auth token, the cookie now includes the user id,
trust level and the cookie's generation date, and we encrypt/sign the cookie to
prevent tampering.

Internal ticket number: t54739.
2021-11-17 23:27:30 +03:00
Rafael dos Santos Silva 6645243a26
SECURITY: Disallow caching of MIME/Content-Type errors (#14907)
This will sign intermediary proxies and/or misconfigured CDNs to not
cache those error responses.
2021-11-12 15:52:25 -03:00
Dan Ungureanu 69f0f48dc0
DEV: Fix rubocop issues (#14715) 2021-10-27 11:39:28 +03:00
David Taylor 7a52ce0d6d
FIX: Strip `discourse-logged-in` header during `force_anonymous!` (#14533)
When the anonymous cache forces users into anonymous mode, it strips the cookies from their request. However, the discourse-logged-in header from the JS client remained.

When the discourse-logged-in header is present without any valid auth_token, the current_user_provider [marks the request as ['logged out'](dbbfad7ed0/lib/auth/default_current_user_provider.rb (L125-L125)), and a [discourse-logged-out header is returned to the client](dbbfad7ed0/lib/middleware/request_tracker.rb (L209-L211)). This causes the JS app to [popup a "you were logged out" modal](dbbfad7ed0/app/assets/javascripts/discourse/app/components/d-document.js (L29-L29)), which is very disruptive.

This commit strips the discourse-logged-in header from the request at the same time as the auth cookie.
2021-10-07 12:31:42 +01:00
Rafael dos Santos Silva b136375582
FEATURE: Rate limit exceptions via ENV (#14033)
Allow admins to configure exceptions to our Rails rate limiter.

Configuration happens in the environment variables, and work with both
IPs and CIDR blocks.

Example:

```
env:
  DISCOURSE_MAX_REQS_PER_IP_EXCEPTIONS: >-
    14.15.16.32/27
    216.148.1.2
```
2021-08-13 12:00:23 -03:00
Alan Guo Xiang Tan 8e3691d537 PERF: Eager load Theme associations in Stylesheet Manager.
Before this change, calling `StyleSheet::Manager.stylesheet_details`
for the first time resulted in multiple queries to the database. This is
because the code was modelled in a way where each `Theme` was loaded
from the database one at a time.

This PR restructures the code such that it allows us to load all the
theme records in a single query. It also allows us to eager load the
required associations upfront. In order to achieve this, I removed the
support of loading multiple themes per request. It was initially added
to support user selectable theme components but the feature was never
completed and abandoned because it wasn't a feature that we thought was
worth building.
2021-06-21 11:06:58 +08:00
Josh Soref 59097b207f
DEV: Correct typos and spelling mistakes (#12812)
Over the years we accrued many spelling mistakes in the code base. 

This PR attempts to fix spelling mistakes and typos in all areas of the code that are extremely safe to change 

- comments
- test descriptions
- other low risk areas
2021-05-21 11:43:47 +10:00
Bianca Nenciu 765ba1ab2d
FEATURE: Ignore anonymous page views on private sites (#12800)
For sites with login_required set to true, counting anonymous pageviews is
confusing. Requests to /login and other pages would make it look like
anonymous users have access to site's content.
2021-04-26 14:19:47 +03:00
Dan Ungureanu dce48d8aa7
FIX: Redirect to provided origin after auth (#12558)
It used to redirect to the destination_url cookie which sometimes is set
incorrectly.
2021-03-31 10:23:12 +01:00
Jarek Radosz 6ff888bd2c
DEV: Retry-after header values should be strings (#12475)
Fixes `Rack::Lint::LintError: a header value must be a String, but the value of 'Retry-After' is a Integer`. (see: 14a236b4f0/lib/rack/lint.rb (L676))

I found it when I got flooded by those warning a while back in a test-related accident 😉 (ember CLI tests were hitting a local rails server at a fast rate)
2021-03-23 20:32:36 +01:00
Martin Brennan 6eb0d0c38d
SECURITY: Fix is_private_ip for RateLimiter to cover all cases (#12464)
The regular expression to detect private IP addresses did not always detect them successfully.
Changed to use ruby's in-built IPAddr.new(ip_address).private? method instead
which does the same thing but covers all cases.
2021-03-22 13:56:32 +10:00
Vinoth Kannan a5923ad603
DEV: apply allow origin response header for CDN requests. (#11893)
Currently, it creates a CORS error while accessing those static files.
2021-01-29 07:44:49 +05:30
Dan Ungureanu 1f2f84a6df
FIX: Add Retry-Header to rate limited responses (#11736)
It returned a 429 error code with a 'Retry-After' header if a
RateLimiter::LimitExceeded was raised and unhandled, but the header was
missing if the request was limited in the 'RequestTracker' middleware.
2021-01-19 11:35:46 +02:00
Alan Guo Xiang Tan 38b6b098bc
FIX: Bypass `AnonymousCache` for `/srv/status` route. (#11491)
`/srv/status` routes should not be cached at all. Also, we want to
decouple the route from Redis which `AnonymouseCache` relies on. The
`/srv/status` should continue to return a success response even if Redis
is down.
2020-12-16 16:47:46 +11:00
Tobias Eigen 0a0fd6eace
DEV: fixed capitalization in rate limit message (#11193) 2020-11-11 12:35:03 +11:00
Sam 2686d14b9a
PERF: introduce aggressive rate limiting for anonymous (#11129)
Previous to this change our anonymous rate limits acted as a throttle.
New implementation means we now also consider rate limited requests towards
the limit.

This means that if an anonymous user is hammering the server it will not be
able to get any requests through until it subsides with traffic.
2020-11-05 16:36:17 +11:00