Commit Graph

80 Commits

Author SHA1 Message Date
Sam 32393f72b1
PERF: backoff background requests when overloaded (#10888)
When the server gets overloaded and lots of requests start queuing server
will attempt to shed load by returning 429 errors on background requests.

The client can flag a request as background by setting the header:
`Discourse-Background` to `true`

Out-of-the-box we shed load when the queue time goes above 0.5 seconds.

The only request we shed at the moment is the request to load up a new post
when someone posts to a topic.

We can extend this as we go with a more general pattern on the client.

Previous to this change, rate limiting would "break" the post stream which
would make suggested topics vanish and users would have to scroll the page
to see more posts in the topic.

Server needs this protection for cases where tons of clients are navigated
to a topic and a new post is made. This can lead to a self inflicted denial
of service if enough clients are viewing the topic.

Due to the internal security design of Discourse it is hard for a large
number of clients to share a channel where we would pass the full post body
via the message bus.

It also renames (and deprecates) triggerNewPostInStream to triggerNewPostsInStream

This allows us to load a batch of new posts cleanly, so the controller can
keep track of a backlog

Co-authored-by: Joffrey JAFFEUX <j.jaffeux@gmail.com>
2020-10-13 16:56:03 +11:00
Sam 120fa8ad2f
PERF: Introduce absolute limit of digests per 30 minutes (#10845)
To avoid blocking the sidekiq queue a limit of 10,000 digests per 30 minutes
is introduced.

This acts as a safety measure that makes sure we don't keep pouring oil on
a fire.

On multisites it is recommended to set the number way lower so sites do not
dominate the backlog. A reasonable default for multisites may be 100-500.

This can be controlled with the environment var

DISCOURSE_MAX_DIGESTS_ENQUEUED_PER_30_MINS_PER_SITE
2020-10-07 17:30:15 +11:00
Osama Sayegh a92d88747e
DEV: Add ENV variable for enabling MiniProfiler snapshots (#10690)
* DEV: Add ENV variable for enabling MiniProfiler snapshots

* MiniProfiler is not loaded in test env
2020-09-17 18:18:35 +03:00
Krzysztof Kotlarek e0d9232259
FIX: use allowlist and blocklist terminology (#10209)
This is a PR of the renaming whitelist to allowlist and blacklist to the blocklist.
2020-07-27 10:23:54 +10:00
Martin Brennan a9905ef7e5
FIX: Add enable_email_sync_demon global variable and disable EmailSync demon by default (#10304)
Demon::EmailSync is used in conjunction with the SiteSetting.enable_imap to sync N IMAP mailboxes with specific groups. It is a process started in unicorn.conf, and it spawns N threads (one for each multisite connection) and for each database spans another N threads (one for each configured group).

We want this off by default so the process is not started when it does not need to be (e.g. development, test, certain hosting tiers)
2020-07-24 17:09:29 +10:00
Guo Xiang Tan d8cd912769
DEV: Switch to db config to disable advisory locks. 2020-06-15 14:33:41 +08:00
Sam Saffron 57a3d4e0d2
FEATURE: whitelist theme repo mode (experimental)
In some restricted setups all JS payloads need tight control.

This setting bans admins from making changes to JS on the site and
requires all themes be whitelisted to be used.

There are edge cases we still need to work through in this mode
hence this is still not supported in production and experimental.

Use an example like this to enable:

`DISCOURSE_WHITELISTED_THEME_REPOS="https://repo.com/repo.git,https://repo.com/repo2.git"`

By default this feature is not enabled and no changes are made.

One exception is that default theme id was missing a security check
this was added for correctness.
2020-06-03 13:19:57 +10:00
Rafael dos Santos Silva b48299f81c
FEATURE: Add setting to disable automatic CORS rule install in S3 buckets (#9872) 2020-05-25 17:09:34 -03:00
Daniel Waterworth 497dc6eaa7 Add a global setting for CDN origin
This is so that, on a multisite cluster, when we handle a CDN request,
the hostname that is requested corresponds to one of the sites -
specifically the default site.
2020-05-12 16:43:40 +01:00
Rafael dos Santos Silva 08e4af6636 FEATURE: Add setting to controle the Expect header on S3 calls
Some providers don't implement the Expect: 100-continue support,
which results in a mismatch in the object signature.

With this settings, users can disable the header and use such providers.
2020-04-30 12:12:00 -03:00
Rafael dos Santos Silva 54f67661ac FEATURE: Option to connect to Redis using SSL 2020-03-05 20:49:05 -03:00
Sam Saffron a8ffb6949c FEATURE: support MaxMind DB downloads using a license key
MaxMind now requires an account with a license key to download files.

Discourse admins can register for such an account at:

https://www.maxmind.com/en/geolite2/signup

License key generation is available in the profile section.

Once registered you can set the license key using `DISCOURSE_MAXMIND_LICENSE_KEY`

This amends it so we unconditionally skip MaxMind DB downloads if no license key exists.
2020-01-03 16:32:48 +11:00
Rafael dos Santos Silva d170812e99
FIX: Use cached MaxMind DB for longer
Don't try to update the IP database as it's gone.

This allows users to rebuild Discourse while we work on a proper
fix / alternative database.
2019-12-30 16:41:23 -03:00
David Taylor e1fcbf4aef DEV: Remove new_version_emails global setting reference
All site settings are now shadowed by global settings, so there is no need to lookup the global setting explicitly
2019-11-20 15:54:09 +00:00
Sam Saffron b4bfc27b19 FEATURE: introduce default application level rate limiting by IP
We have tested rate limiting with admin accounts with block rate limiting for
close to 12 months now on meta.discourse.org.

This has resulted in no degradation of services even to admin accounts that
request a lot of info from the site.

The default of 200 requests a minute and 50 per 10 seconds is very generous.
It simply protects against very aggressive clients.

This setting can be disabled or tweaked using:

DISCOURSE_MAX_REQS_PER_IP_MODE and family.

The only big downside here is in cases when a very large number of users tend
to all come from a single IP.

This can be the case on sites accessing Discourse from an internal network
all sharing the same IP via NAT. Or a misconfigured Discourse that is unable
to resolve IP addresses of users due to proxy mis-configuration.
2019-11-18 15:54:50 +11:00
Sam Saffron ed00f35306 FEATURE: improve performance of anonymous cache
This commit introduces 2 features:

1. DISCOURSE_COMPRESS_ANON_CACHE (true|false, default false): this allows
you to optionally compress the anon cache body entries in Redis, can be
useful for high load sites with Redis that lives on a separate server to
to webs

2. DISCOURSE_ANON_CACHE_STORE_THRESHOLD (default 2), only pop entries into
redis if we observe them more than N times. This avoids situations where
a crawler can walk a big pile of topics and store them all in Redis never
to be used. Our default anon cache time for topics is only 60 seconds. Anon
cache is in place to avoid the "slashdot" effect where a single topic is
hit by 100s of people in one minute.
2019-09-04 17:18:32 +10:00
Régis Hanol 8ef831a1f3 fix the build 2019-08-29 14:17:41 +02:00
Sam Saffron 098f9e8b5b PERF: Run multiple threads for regular job schedules
Under extreme load on large databases certain regular jobs can take quite
a while to run. We need to ensure we never starve a sidekiq from running
mini scheduler, cause without it we are unable to queue stuff such as
heartbeat jobs.
2019-08-29 15:34:36 +10:00
Sam Saffron 8db38de9d7 SECURITY: add rate limiting to anon JS error reporting
This adds a 1 minute rate limit to all JS error reporting per IP. Previously
we would only use the global rate limit.

This also introduces DISCOURSE_ENABLE_JS_ERROR_REPORTING, if it is set to
false then no JS error reporting will be allowed on the site.
2019-08-20 11:29:11 +10:00
Sam Saffron 1f47ed1ea3 PERF: message_bus will be deferred by server when flooded
The message_bus performs a fair amount of work prior to hijacking requests
this change ensures that if there is a situation where the server is flooded
message_bus will inform client to back off for 30 seconds + random(120 secs)

This back-off is ultra cheap and happens very early in the middleware.

It corrects a situation where a flood to message bus could cause the app
to become unresponsive

MessageBus update is here to ensure message_bus gem properly respects
Retry-After header and status 429.

Under normal state this code should never trigger, to disable raise the
value of DISCOURSE_REJECT_MESSAGE_BUS_QUEUE_SECONDS, default is to tell
message bus to go away if we are queueing for 100ms or longer
2019-08-09 17:48:01 +10:00
Sam Saffron 4dcc5f16f1 FEATURE: when under extreme load disable search
The global setting disable_search_queue_threshold
(DISCOURSE_DISABLE_SEARCH_QUEUE_THRESHOLD) which default to 1 second was
added.

This protection ensures that when the application is unable to keep up with
requests it will simply turn off search till it is not backed up.

To disable this protection set this to 0.
2019-07-02 11:22:01 +10:00
Sam Saffron 62141b6316 FEATURE: enable_performance_http_headers for performance diagnostics
This adds support for DISCOURSE_ENABLE_PERFORMANCE_HTTP_HEADERS
when set to `true` this will turn on performance related headers

```text
X-Redis-Calls: 10     # number of redis calls
X-Redis-Time: 1.02    # redis time in seconds
X-Sql-Commands: 102   # number of SQL commands
X-Sql-Time: 1.02      # duration in SQL in seconds
X-Queue-Time: 1.01    # time the request sat in queue (depends on NGINX)
```

To get queue time NGINX must provide: HTTP_X_REQUEST_START

We do not recommend you enable this without thinking, it exposes information
about what your page is doing, usually you would only enable this if you
intend to strip off the headers further down the stream in a proxy
2019-06-05 16:08:11 +10:00
Rafael dos Santos Silva 315a38e0e3 FEATURE: Allow running message_bus in a different redis instance (#7616)
Adds `DISCOURSE_MESSAGE_BUS_REDIS_ENABLED` env var, that when set
to true, will allow Discourse to connect to a different redis
instance for MessageBus needs.

When enabled you can configure the same env vars user for redis,
but prefixed by `MESSAGE_BUS`, eg:

`DISCOURSE_MESSAGE_BUS_REDIS_HOST`
2019-05-28 15:52:43 +10:00
Sam Saffron 6580025af9 FEATURE: add backup directory for mmdb files
This new `DISCOURSE_MAXMIND_BACKUP_PATH` can be used a secondary location
for maxmind db. That way a build machine, for example can cache it on the
host and reuse between builds.

Also per 5bfeef77 added proper error raising for download fails from
dedicated rake task

This also moves "refresh_maxmind_db_during_precompile_days" to a global
setting, it did not make sense in a site setting
2019-05-27 16:51:24 +10:00
Sam Saffron b418857830 DEV: document max_logster_logs in discourse_defaults.conf
This cleans up logster configuration a bit cause we no longer have to
check if we respond_to anything and keeps the logster limit properly
documented

Followup on da578e92
2019-03-22 14:11:40 +11:00
Rishabh ad6ad3f679 DEV: Remove SiteSetting.s3_force_path_style (#7210)
- s3_force_path_style was added as a Minio specific url scheme but it has never been well supported in our code base.
- Our new migrate_to_s3 rake task does not work reliably with path style urls too
- Minio has also added support for virtual style requests i.e the same scheme as AWS S3/DO Spaces so we can rely on that instead of using path style requests.
- Add migration to drop s3_force_path_style from the site_settings table
2019-03-20 14:58:20 +01:00
Sam 8b7a2d1cb7 FEATURE: add setting to bypass sending redis CLIENT commands
Some cloud providers (Google Memorystore) do not support any CLIENT commands

By setting :id to nil in the redis config hash we can avoid these commands.

This adds a special global setting GCE users can enable:
`DISCOURSE_REDIS_SKIP_CLIENT_COMMANDS = true`
2019-01-04 15:08:33 +11:00
Sam 8f35fd4595 FEATURE: remove global settings for redis sentinels
This global setting is never used, configuring Discourse with sentinel is
unsupported.
2019-01-04 15:08:33 +11:00
Sam 70269c7c97 FEATURE: tighter limits on per cluster post rebakes
We have the periodical job that regularly will rebake old posts. This is
used to trickle in update to cooked markdown. The problem is that each rebake
can issue multiple background jobs (post process and pull hotlinked images)

Previously we had no per-cluster limit so cluster running 100s of sites could
flood the sidekiq queue with rebake related jobs.

New system introduces a hard limit of 300 rebakes per 15 minutes across a
cluster to ensure the sidekiq job is not dominated by this.

We also reduced `rebake_old_posts_count` to 80, which is a safer default.
2019-01-04 09:24:46 +11:00
Rishabh a6c589d882 FEATURE: Add custom S3 Endpoint and DigitalOcean Spaces/Minio support for Backups (#6045)
- Add custom S3 Endpoints and DigitalOcean Spaces support
- Add Minio support using 'force_path_style' option and fix uploads to custom endpoint
2018-07-16 14:44:55 +10:00
Sam 87fabdc2f3 FIX: correct pool reaper
This removes a freedom patch and replaces with a custom reaper thread
it also captures an issue where reaper would fail when connections where
empty
2018-06-14 18:22:02 +10:00
Sam ded84a4b58 PERF: improve performance once logged in rate limiter hits
If "logged in" is being forced anonymous on certain routes, trigger
the protection for any requests that spend 50ms queueing

This means that ...

1. You need to trip it by having 3 requests take longer than 1 second in 10 second interval
2. Once tripped, if your route is still spending 50m queueuing it will continue to be protected

This means that site will continue to function with almost no delays while it is scaling up to handle the new load
2018-04-23 11:55:25 +10:00
Sam 59cd7894d9 FEATURE: if site is under extreme load show anon view
If a particular path is being hit extremely hard by logged on users,
revert to anonymous cached view.

This will only come into effect if 3 requests queue for longer than 2 seconds
on a *single* path.

This can happen if a URL is shared with the entire forum base and everyone
is logged on
2018-04-18 16:58:57 +10:00
Blake Erickson 720dd2432e remove change from descourse_defaults.conf 2018-04-10 14:27:03 -06:00
Blake Erickson 0337a8f6d5 ensure correct '/'s for relative_url_root in route file 2018-04-10 14:24:29 -06:00
Guo Xiang Tan a89f3160a5 Add new config to ensure backup/restore connects to PG directly.
* In `pg_dump` 10.3+ and 9.5.12+, in
  it does a `SELECT pg_catalog.set_config('search_path', '', false)`
  which changes the state of the current connection. This is known
  to be problematic with Pgbouncer which reuses connections. As such,
  we'll always try to connect directly to PG directly during
  the backup/restore process.
2018-03-09 10:28:03 +08:00
Sam f0d5f83424 FEATURE: limit assets less that non asset paths
By default assets can be requested up to 200 times per 10 seconds
from the app, this includes CSS and avatars
2018-03-06 15:20:39 +11:00
Sam f26ff290c3 FEATURE: Shorten setting name to max_reqs
So it is consistent with other settings
2018-01-22 13:18:30 +11:00
Sam cecd7d0d07 FEATURE: global rate limiter can bypass local IPs 2018-01-08 08:39:17 +11:00
Sam 4986ebcf24 FEATURE: optional default off global per ip rate limiter 2017-12-11 17:52:57 +11:00
Sam 68d3c2c74f FEATURE: add global rate limiter for admin api 60 per minute
Also move configuration of admin and user api rate limiting into global
settings. This is not intended to be configurable per site
2017-12-11 11:07:22 +11:00
Guo Xiang Tan 6c04eb911d Fix typo. 2017-10-17 12:34:49 +08:00
Guo Xiang Tan b54eb8f53c FIX: Set PG `connect_timeout` to 5 seconds.
* 30 seconds is alittle too long.
2017-10-17 12:32:41 +08:00
Sam 70bb2aa426 FEATURE: allow specifying s3 config via globals
This refactors handling of s3 so it can be specified via GlobalSetting

This means that in a multisite environment you can configure s3 uploads
without actual sites knowing credentials in s3

It is a critical setting for situations where assets are mirrored to s3.
2017-10-06 16:20:01 +11:00
Sam c106ca6778 FEATURE: fallback asset path for multi host setups 2017-03-20 15:59:17 -04:00
Sam ff49f72ad9 FEATURE: per client user tokens
Revamped system for managing authentication tokens.

- Every user has 1 token per client (web browser)
- Tokens are rotated every 10 minutes

New system migrates the old tokens to "legacy" tokens,
so users still remain logged on.

Also introduces weekly job to expire old auth tokens.
2017-02-07 09:22:16 -05:00
Guo Xiang Tan 89b1998174 Add default port for redis_slave. 2016-03-11 15:07:07 +08:00
Guo Xiang Tan c07c474575 FEATURE: Master-Slave Redis configuration with fallback and switch over. 2016-03-11 12:18:58 +08:00
Guo Xiang Tan 46589a1a0c FEATURE: AR adapter to failover to a replica DB server. 2016-02-05 08:51:10 +08:00
Sam Saffron 209b022385 PERF: cut down on memory usage allowed to redis
This limits the amount of backlog message bus channels can have.
2016-02-04 13:58:38 +11:00