Commit Graph

92 Commits

Author SHA1 Message Date
David Taylor 98a7fa3d1a
PERF: Bump message_bus to 4.2 ()
This includes upstream performance improvements. For details, see 1baa1ea4a5
2022-02-22 16:16:02 +00:00
Osama Sayegh b86127ad12
FEATURE: Apply rate limits per user instead of IP for trusted users ()
Currently, Discourse rate limits all incoming requests by the IP address they
originate from regardless of the user making the request. This can be
frustrating if there are multiple users using Discourse simultaneously while
sharing the same IP address (e.g. employees in an office).

This commit implements a new feature to make Discourse apply rate limits by
user id rather than IP address for users at or higher than the configured trust
level (1 is the default).

For example, let's say a Discourse instance is configured to allow 200 requests
per minute per IP address, and we have 10 users at trust level 4 using
Discourse simultaneously from the same IP address. Before this feature, the 10
users could only make a total of 200 requests per minute before they got rate
limited. But with the new feature, each user is allowed to make 200 requests
per minute because the rate limits are applied on user id rather than the IP
address.

The minimum trust level for applying user-id-based rate limits can be
configured by the `skip_per_ip_rate_limit_trust_level` global setting. The
default is 1, but it can be changed by either adding the
`DISCOURSE_SKIP_PER_IP_RATE_LIMIT_TRUST_LEVEL` environment variable with the
desired value to your `app.yml`, or changing the setting's value in the
`discourse.conf` file.

Requests made with API keys are still rate limited by IP address and the
relevant global settings that control API keys rate limits.

Before this commit, Discourse's auth cookie (`_t`) was simply a 32 characters
string that Discourse used to lookup the current user from the database and the
cookie contained no additional information about the user. However, we had to
change the cookie content in this commit so we could identify the user from the
cookie without making a database query before the rate limits logic and avoid
introducing a bottleneck on busy sites.

Besides the 32 characters auth token, the cookie now includes the user id,
trust level and the cookie's generation date, and we encrypt/sign the cookie to
prevent tampering.

Internal ticket number: t54739.
2021-11-17 23:27:30 +03:00
Daniel Waterworth fa3c4ad28b
DEV: Deprecate message bus site settings ()
It makes much more sense for these to be GlobalSettings, since, in
multisite clusters, only the default site's settings would be respected.

Co-authored-by: David Taylor <david@taylorhq.com>
2021-11-11 11:12:25 -06:00
Daniel Waterworth ceb234c2e9
FEATURE: Make the multisite config path configurable () 2021-09-10 14:19:52 -05:00
David Taylor 4134173bbf
FEATURE: Add global admin api key rate limiter () 2021-06-03 10:52:43 +01:00
Josh Soref 59097b207f
DEV: Correct typos and spelling mistakes ()
Over the years we accrued many spelling mistakes in the code base. 

This PR attempts to fix spelling mistakes and typos in all areas of the code that are extremely safe to change 

- comments
- test descriptions
- other low risk areas
2021-05-21 11:43:47 +10:00
David Taylor 7970d1d99f
FEATURE: Allow a cluster_name to be configured and used for /srv/status ()
The cluster name can be configured by setting the `DISCOURSE_CLUSTER_NAME` environment variable. If set, you can then call /srv/status with a `?cluster=` parameter. If the cluster does not match, an error will be returned. This is useful if you need a load balancer to be able to verify the identity, as well as the presence, of an application container.
2021-03-15 15:41:59 +11:00
godmar 9aeece465f
FEATURE: support DISCOURSE_SMTP_FORCE_TLS option ()
Background: RFC 8314 3.3 asks that:

clients and servers SHOULD implement both STARTTLS on
port 587 and Implicit TLS on port 465

Discourse currently cannot be configured this way.
With this patch, it's possible to set
DISCOURSE_SMTP_FORCE_TLS=true to use implicit TLS on port 465
2021-01-18 11:56:18 -05:00
Krzysztof Kotlarek f84ff26aa9
FIX: use Redis replica host and port ()
Introduce Redis `replica` config and deprecate `slave`
2020-12-23 13:14:19 +11:00
Krzysztof Kotlarek 07bf7a91f4
Revert "FIX: use Redis replica host and port ()" ()
This reverts commit b0e1210b0c.
2020-12-22 16:16:50 +11:00
Krzysztof Kotlarek b0e1210b0c
FIX: use Redis replica host and port ()
* FIX: use Redis replica host and port

Introduce Redis `replica` config and deprecate `slave`

* FIX: move deprecations to separate file
2020-12-22 15:52:00 +11:00
Osama Sayegh a04c300495
DEV: Add optional ENV variables for MiniProfiler snapshots transporter () 2020-10-21 19:37:28 +03:00
Sam 32393f72b1
PERF: backoff background requests when overloaded ()
When the server gets overloaded and lots of requests start queuing server
will attempt to shed load by returning 429 errors on background requests.

The client can flag a request as background by setting the header:
`Discourse-Background` to `true`

Out-of-the-box we shed load when the queue time goes above 0.5 seconds.

The only request we shed at the moment is the request to load up a new post
when someone posts to a topic.

We can extend this as we go with a more general pattern on the client.

Previous to this change, rate limiting would "break" the post stream which
would make suggested topics vanish and users would have to scroll the page
to see more posts in the topic.

Server needs this protection for cases where tons of clients are navigated
to a topic and a new post is made. This can lead to a self inflicted denial
of service if enough clients are viewing the topic.

Due to the internal security design of Discourse it is hard for a large
number of clients to share a channel where we would pass the full post body
via the message bus.

It also renames (and deprecates) triggerNewPostInStream to triggerNewPostsInStream

This allows us to load a batch of new posts cleanly, so the controller can
keep track of a backlog

Co-authored-by: Joffrey JAFFEUX <j.jaffeux@gmail.com>
2020-10-13 16:56:03 +11:00
Sam 120fa8ad2f
PERF: Introduce absolute limit of digests per 30 minutes ()
To avoid blocking the sidekiq queue a limit of 10,000 digests per 30 minutes
is introduced.

This acts as a safety measure that makes sure we don't keep pouring oil on
a fire.

On multisites it is recommended to set the number way lower so sites do not
dominate the backlog. A reasonable default for multisites may be 100-500.

This can be controlled with the environment var

DISCOURSE_MAX_DIGESTS_ENQUEUED_PER_30_MINS_PER_SITE
2020-10-07 17:30:15 +11:00
Osama Sayegh a92d88747e
DEV: Add ENV variable for enabling MiniProfiler snapshots ()
* DEV: Add ENV variable for enabling MiniProfiler snapshots

* MiniProfiler is not loaded in test env
2020-09-17 18:18:35 +03:00
Krzysztof Kotlarek e0d9232259
FIX: use allowlist and blocklist terminology ()
This is a PR of the renaming whitelist to allowlist and blacklist to the blocklist.
2020-07-27 10:23:54 +10:00
Martin Brennan a9905ef7e5
FIX: Add enable_email_sync_demon global variable and disable EmailSync demon by default ()
Demon::EmailSync is used in conjunction with the SiteSetting.enable_imap to sync N IMAP mailboxes with specific groups. It is a process started in unicorn.conf, and it spawns N threads (one for each multisite connection) and for each database spans another N threads (one for each configured group).

We want this off by default so the process is not started when it does not need to be (e.g. development, test, certain hosting tiers)
2020-07-24 17:09:29 +10:00
Guo Xiang Tan d8cd912769
DEV: Switch to db config to disable advisory locks. 2020-06-15 14:33:41 +08:00
Sam Saffron 57a3d4e0d2
FEATURE: whitelist theme repo mode (experimental)
In some restricted setups all JS payloads need tight control.

This setting bans admins from making changes to JS on the site and
requires all themes be whitelisted to be used.

There are edge cases we still need to work through in this mode
hence this is still not supported in production and experimental.

Use an example like this to enable:

`DISCOURSE_WHITELISTED_THEME_REPOS="https://repo.com/repo.git,https://repo.com/repo2.git"`

By default this feature is not enabled and no changes are made.

One exception is that default theme id was missing a security check
this was added for correctness.
2020-06-03 13:19:57 +10:00
Rafael dos Santos Silva b48299f81c
FEATURE: Add setting to disable automatic CORS rule install in S3 buckets () 2020-05-25 17:09:34 -03:00
Daniel Waterworth 497dc6eaa7 Add a global setting for CDN origin
This is so that, on a multisite cluster, when we handle a CDN request,
the hostname that is requested corresponds to one of the sites -
specifically the default site.
2020-05-12 16:43:40 +01:00
Rafael dos Santos Silva 08e4af6636 FEATURE: Add setting to controle the Expect header on S3 calls
Some providers don't implement the Expect: 100-continue support,
which results in a mismatch in the object signature.

With this settings, users can disable the header and use such providers.
2020-04-30 12:12:00 -03:00
Rafael dos Santos Silva 54f67661ac FEATURE: Option to connect to Redis using SSL 2020-03-05 20:49:05 -03:00
Sam Saffron a8ffb6949c FEATURE: support MaxMind DB downloads using a license key
MaxMind now requires an account with a license key to download files.

Discourse admins can register for such an account at:

https://www.maxmind.com/en/geolite2/signup

License key generation is available in the profile section.

Once registered you can set the license key using `DISCOURSE_MAXMIND_LICENSE_KEY`

This amends it so we unconditionally skip MaxMind DB downloads if no license key exists.
2020-01-03 16:32:48 +11:00
Rafael dos Santos Silva d170812e99
FIX: Use cached MaxMind DB for longer
Don't try to update the IP database as it's gone.

This allows users to rebuild Discourse while we work on a proper
fix / alternative database.
2019-12-30 16:41:23 -03:00
David Taylor e1fcbf4aef DEV: Remove new_version_emails global setting reference
All site settings are now shadowed by global settings, so there is no need to lookup the global setting explicitly
2019-11-20 15:54:09 +00:00
Sam Saffron b4bfc27b19 FEATURE: introduce default application level rate limiting by IP
We have tested rate limiting with admin accounts with block rate limiting for
close to 12 months now on meta.discourse.org.

This has resulted in no degradation of services even to admin accounts that
request a lot of info from the site.

The default of 200 requests a minute and 50 per 10 seconds is very generous.
It simply protects against very aggressive clients.

This setting can be disabled or tweaked using:

DISCOURSE_MAX_REQS_PER_IP_MODE and family.

The only big downside here is in cases when a very large number of users tend
to all come from a single IP.

This can be the case on sites accessing Discourse from an internal network
all sharing the same IP via NAT. Or a misconfigured Discourse that is unable
to resolve IP addresses of users due to proxy mis-configuration.
2019-11-18 15:54:50 +11:00
Sam Saffron ed00f35306 FEATURE: improve performance of anonymous cache
This commit introduces 2 features:

1. DISCOURSE_COMPRESS_ANON_CACHE (true|false, default false): this allows
you to optionally compress the anon cache body entries in Redis, can be
useful for high load sites with Redis that lives on a separate server to
to webs

2. DISCOURSE_ANON_CACHE_STORE_THRESHOLD (default 2), only pop entries into
redis if we observe them more than N times. This avoids situations where
a crawler can walk a big pile of topics and store them all in Redis never
to be used. Our default anon cache time for topics is only 60 seconds. Anon
cache is in place to avoid the "slashdot" effect where a single topic is
hit by 100s of people in one minute.
2019-09-04 17:18:32 +10:00
Régis Hanol 8ef831a1f3 fix the build 2019-08-29 14:17:41 +02:00
Sam Saffron 098f9e8b5b PERF: Run multiple threads for regular job schedules
Under extreme load on large databases certain regular jobs can take quite
a while to run. We need to ensure we never starve a sidekiq from running
mini scheduler, cause without it we are unable to queue stuff such as
heartbeat jobs.
2019-08-29 15:34:36 +10:00
Sam Saffron 8db38de9d7 SECURITY: add rate limiting to anon JS error reporting
This adds a 1 minute rate limit to all JS error reporting per IP. Previously
we would only use the global rate limit.

This also introduces DISCOURSE_ENABLE_JS_ERROR_REPORTING, if it is set to
false then no JS error reporting will be allowed on the site.
2019-08-20 11:29:11 +10:00
Sam Saffron 1f47ed1ea3 PERF: message_bus will be deferred by server when flooded
The message_bus performs a fair amount of work prior to hijacking requests
this change ensures that if there is a situation where the server is flooded
message_bus will inform client to back off for 30 seconds + random(120 secs)

This back-off is ultra cheap and happens very early in the middleware.

It corrects a situation where a flood to message bus could cause the app
to become unresponsive

MessageBus update is here to ensure message_bus gem properly respects
Retry-After header and status 429.

Under normal state this code should never trigger, to disable raise the
value of DISCOURSE_REJECT_MESSAGE_BUS_QUEUE_SECONDS, default is to tell
message bus to go away if we are queueing for 100ms or longer
2019-08-09 17:48:01 +10:00
Sam Saffron 4dcc5f16f1 FEATURE: when under extreme load disable search
The global setting disable_search_queue_threshold
(DISCOURSE_DISABLE_SEARCH_QUEUE_THRESHOLD) which default to 1 second was
added.

This protection ensures that when the application is unable to keep up with
requests it will simply turn off search till it is not backed up.

To disable this protection set this to 0.
2019-07-02 11:22:01 +10:00
Sam Saffron 62141b6316 FEATURE: enable_performance_http_headers for performance diagnostics
This adds support for DISCOURSE_ENABLE_PERFORMANCE_HTTP_HEADERS
when set to `true` this will turn on performance related headers

```text
X-Redis-Calls: 10     # number of redis calls
X-Redis-Time: 1.02    # redis time in seconds
X-Sql-Commands: 102   # number of SQL commands
X-Sql-Time: 1.02      # duration in SQL in seconds
X-Queue-Time: 1.01    # time the request sat in queue (depends on NGINX)
```

To get queue time NGINX must provide: HTTP_X_REQUEST_START

We do not recommend you enable this without thinking, it exposes information
about what your page is doing, usually you would only enable this if you
intend to strip off the headers further down the stream in a proxy
2019-06-05 16:08:11 +10:00
Rafael dos Santos Silva 315a38e0e3 FEATURE: Allow running message_bus in a different redis instance ()
Adds `DISCOURSE_MESSAGE_BUS_REDIS_ENABLED` env var, that when set
to true, will allow Discourse to connect to a different redis
instance for MessageBus needs.

When enabled you can configure the same env vars user for redis,
but prefixed by `MESSAGE_BUS`, eg:

`DISCOURSE_MESSAGE_BUS_REDIS_HOST`
2019-05-28 15:52:43 +10:00
Sam Saffron 6580025af9 FEATURE: add backup directory for mmdb files
This new `DISCOURSE_MAXMIND_BACKUP_PATH` can be used a secondary location
for maxmind db. That way a build machine, for example can cache it on the
host and reuse between builds.

Also per 5bfeef77 added proper error raising for download fails from
dedicated rake task

This also moves "refresh_maxmind_db_during_precompile_days" to a global
setting, it did not make sense in a site setting
2019-05-27 16:51:24 +10:00
Sam Saffron b418857830 DEV: document max_logster_logs in discourse_defaults.conf
This cleans up logster configuration a bit cause we no longer have to
check if we respond_to anything and keeps the logster limit properly
documented

Followup on da578e92
2019-03-22 14:11:40 +11:00
Rishabh ad6ad3f679 DEV: Remove SiteSetting.s3_force_path_style ()
- s3_force_path_style was added as a Minio specific url scheme but it has never been well supported in our code base.
- Our new migrate_to_s3 rake task does not work reliably with path style urls too
- Minio has also added support for virtual style requests i.e the same scheme as AWS S3/DO Spaces so we can rely on that instead of using path style requests.
- Add migration to drop s3_force_path_style from the site_settings table
2019-03-20 14:58:20 +01:00
Sam 8b7a2d1cb7 FEATURE: add setting to bypass sending redis CLIENT commands
Some cloud providers (Google Memorystore) do not support any CLIENT commands

By setting :id to nil in the redis config hash we can avoid these commands.

This adds a special global setting GCE users can enable:
`DISCOURSE_REDIS_SKIP_CLIENT_COMMANDS = true`
2019-01-04 15:08:33 +11:00
Sam 8f35fd4595 FEATURE: remove global settings for redis sentinels
This global setting is never used, configuring Discourse with sentinel is
unsupported.
2019-01-04 15:08:33 +11:00
Sam 70269c7c97 FEATURE: tighter limits on per cluster post rebakes
We have the periodical job that regularly will rebake old posts. This is
used to trickle in update to cooked markdown. The problem is that each rebake
can issue multiple background jobs (post process and pull hotlinked images)

Previously we had no per-cluster limit so cluster running 100s of sites could
flood the sidekiq queue with rebake related jobs.

New system introduces a hard limit of 300 rebakes per 15 minutes across a
cluster to ensure the sidekiq job is not dominated by this.

We also reduced `rebake_old_posts_count` to 80, which is a safer default.
2019-01-04 09:24:46 +11:00
Rishabh a6c589d882 FEATURE: Add custom S3 Endpoint and DigitalOcean Spaces/Minio support for Backups ()
- Add custom S3 Endpoints and DigitalOcean Spaces support
- Add Minio support using 'force_path_style' option and fix uploads to custom endpoint
2018-07-16 14:44:55 +10:00
Sam 87fabdc2f3 FIX: correct pool reaper
This removes a freedom patch and replaces with a custom reaper thread
it also captures an issue where reaper would fail when connections where
empty
2018-06-14 18:22:02 +10:00
Sam ded84a4b58 PERF: improve performance once logged in rate limiter hits
If "logged in" is being forced anonymous on certain routes, trigger
the protection for any requests that spend 50ms queueing

This means that ...

1. You need to trip it by having 3 requests take longer than 1 second in 10 second interval
2. Once tripped, if your route is still spending 50m queueuing it will continue to be protected

This means that site will continue to function with almost no delays while it is scaling up to handle the new load
2018-04-23 11:55:25 +10:00
Sam 59cd7894d9 FEATURE: if site is under extreme load show anon view
If a particular path is being hit extremely hard by logged on users,
revert to anonymous cached view.

This will only come into effect if 3 requests queue for longer than 2 seconds
on a *single* path.

This can happen if a URL is shared with the entire forum base and everyone
is logged on
2018-04-18 16:58:57 +10:00
Blake Erickson 720dd2432e remove change from descourse_defaults.conf 2018-04-10 14:27:03 -06:00
Blake Erickson 0337a8f6d5 ensure correct '/'s for relative_url_root in route file 2018-04-10 14:24:29 -06:00
Guo Xiang Tan a89f3160a5 Add new config to ensure backup/restore connects to PG directly.
* In `pg_dump` 10.3+ and 9.5.12+, in
  it does a `SELECT pg_catalog.set_config('search_path', '', false)`
  which changes the state of the current connection. This is known
  to be problematic with Pgbouncer which reuses connections. As such,
  we'll always try to connect directly to PG directly during
  the backup/restore process.
2018-03-09 10:28:03 +08:00
Sam f0d5f83424 FEATURE: limit assets less that non asset paths
By default assets can be requested up to 200 times per 10 seconds
from the app, this includes CSS and avatars
2018-03-06 15:20:39 +11:00
Sam f26ff290c3 FEATURE: Shorten setting name to max_reqs
So it is consistent with other settings
2018-01-22 13:18:30 +11:00