20 Commits

Author SHA1 Message Date
Alan Guo Xiang Tan 9812407f76
FIX: Redo Sidekiq monitoring to restart stuck sidekiq processes (#30198)
This commit reimplements how we monitor Sidekiq processes that are
forked from the Unicorn master process. Prior to this change, we relied on
`Jobs::Heartbeat` to enqueue a `Jobs::RunHeartbeat` job every 3 minutes.
The `Jobs::RunHeartbeat` job then set a Redis key with a timestamp. In
the Unicorn master process, we then fetched the timestamp set by the job
from Redis every 30 minutes. If the timestamp had not been
updated for more than 30 minutes, we restarted the Sidekiq process. The
fundamental flaw with this approach is that it fails to consider
deployments with multiple hosts and multiple Sidekiq processes. A
Sidekiq process on a host may be in a bad state but the heartbeat check
will not restart the process because the `Jobs::RunHeartbeat` job is
still being executed by the working Sidekiq processes on other hosts.

In order to properly ensure that stuck Sidekiq processes are restarted,
we now rely on the [Sidekiq::ProcessSet](https://github.com/sidekiq/sidekiq/wiki/API#processes)
API that is supported by Sidekiq. The API provides us with "near real-time (updated every 5 sec)
info about the current set of Sidekiq processes running". The API
provides useful information like the hostname, pid and also when Sidekiq
last did its own heartbeat check. With that information, we can easily
determine if a Sidekiq process needs to be restarted from the Unicorn
master process.
2024-12-18 12:48:50 +08:00
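
The per-host check described in 9812407f76 can be sketched roughly as follows. This is an illustrative sketch, not the code from the commit: the helper name `stuck_sidekiq_pids`, the 60-second `max_beat_age` threshold, and the choice to treat a pid missing from the set as stuck are all assumptions. It relies only on the `"hostname"`, `"pid"` and `"beat"` fields of each process entry, so plain hashes stand in for `Sidekiq::ProcessSet.new` here.

```ruby
# Hypothetical helper: returns the pids from `expected_pids` that look stuck
# on the given host. `processes` is any enumerable whose items respond to #[]
# with "hostname", "pid" and "beat" (last heartbeat as epoch seconds), for
# example Sidekiq::ProcessSet.new in a real deployment.
def stuck_sidekiq_pids(processes, hostname:, expected_pids:, now: Time.now.to_f, max_beat_age: 60)
  beats = {}
  processes.each do |process|
    # Ignore Sidekiq processes that belong to other hosts.
    next unless process["hostname"] == hostname
    beats[process["pid"]] = process["beat"]
  end

  expected_pids.select do |pid|
    beat = beats[pid]
    # Assumption: a pid that is absent from the set is treated like a stale one.
    beat.nil? || (now - beat) > max_beat_age
  end
end

# Plain hashes stand in for Sidekiq process entries.
sample = [
  { "hostname" => "web-1", "pid" => 101, "beat" => 1_000.0 },
  { "hostname" => "web-1", "pid" => 102, "beat" => 900.0 },
  { "hostname" => "web-2", "pid" => 101, "beat" => 100.0 },
]

stuck = stuck_sidekiq_pids(sample, hostname: "web-1", expected_pids: [101, 102, 103], now: 1_010.0)
```

Because the lookup is keyed on hostname and pid, a healthy process on another host can no longer mask a stuck one, which was the flaw in the shared heartbeat key.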
Jarek Radosz 694b5f108b
DEV: Fix various rubocop lints (#24749)
These (21 + 3 from previous PRs) are soon to be enabled in rubocop-discourse:

Capybara/VisibilityMatcher
Lint/DeprecatedOpenSSLConstant
Lint/DisjunctiveAssignmentInConstructor
Lint/EmptyConditionalBody
Lint/EmptyEnsure
Lint/LiteralInInterpolation
Lint/NonLocalExitFromIterator
Lint/ParenthesesAsGroupedExpression
Lint/RedundantCopDisableDirective
Lint/RedundantRequireStatement
Lint/RedundantSafeNavigation
Lint/RedundantStringCoercion
Lint/RedundantWithIndex
Lint/RedundantWithObject
Lint/SafeNavigationChain
Lint/SafeNavigationConsistency
Lint/SelfAssignment
Lint/UnreachableCode
Lint/UselessMethodDefinition
Lint/Void

Previous PRs:
Lint/ShadowedArgument
Lint/DuplicateMethods
Lint/BooleanSymbol
RSpec/SpecFilePathSuffix
2023-12-06 23:25:00 +01:00
David Taylor 6417173082
DEV: Apply syntax_tree formatting to `lib/*` 2023-01-09 12:10:19 +00:00
Joffrey JAFFEUX 0d3d2c43a0
DEV: s/\$redis/Discourse\.redis (#8431)
This commit also adds a rubocop rule to prevent global variables.
2019-12-03 10:05:53 +01:00
Sam Saffron 30990006a9 DEV: enable frozen string literal on all files
This reduces chances of errors where consumers of strings mutate inputs
and reduces memory usage of the app.

Test suite passes now, but there may be some stuff left, so we will run
a few sites on a branch prior to merging
2019-05-13 09:31:32 +08:00
Guo Xiang Tan cff108762a Fix deadlock in 615a22a579. 2019-02-20 10:25:43 +08:00
Guo Xiang Tan 615a22a579 FIX: Race condition in SidekiqPauser.
This was showing up in our tests.
2019-02-20 09:52:26 +08:00
Guo Xiang Tan adbc87857e DEV: Fix randomly failing test.
Even if a thread is alive in the loop check, it may be dead by the
time `Thread#wakeup` is called on it.
2019-02-19 13:34:52 +08:00
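
The race noted in adbc87857e comes from `Thread#wakeup` raising `ThreadError` when the target thread is already dead. A minimal sketch of guarding against it, assuming a hypothetical helper named `safe_wakeup` (not taken from the commit):

```ruby
# Hypothetical helper: wakes the thread if possible and reports whether the
# wakeup was delivered. The alive? check alone is not enough, because the
# thread can die between the check and the wakeup call, so ThreadError is
# rescued as well.
def safe_wakeup(thread)
  return false unless thread.alive?
  thread.wakeup
  true
rescue ThreadError
  false
end

# A thread that has already finished.
dead = Thread.new {}
dead.join

# Calling wakeup directly on a dead thread raises ThreadError.
raised = begin
  dead.wakeup
  false
rescue ThreadError
  true
end

dead_result = safe_wakeup(dead)

# A thread that is asleep can be woken normally.
sleeper = Thread.new { sleep }
Thread.pass until sleeper.status == "sleep"
sleeper_result = safe_wakeup(sleeper)
sleeper.join(1)
```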
Guo Xiang Tan bf21ebaecc DEV: Allow custom value when pausing sidekiq to aid in debugging.
Sometimes, it is useful to know what caused Sidekiq to be paused.
2019-02-19 10:55:53 +08:00
Sam 74d2d4f658 FEATURE: add APIS for unpausing all sites
This adjusts 53d592ad by @tgxworld

- Adds Sidekiq.unpause_all! to unpause all sites
- Adds Sidekiq.paused_dbs to list dbs that are currently paused
- Handles some edge cases where unpause thread could extend expiry on
sites that were unpaused from a different process
- Ensures tests always terminate the background thread used for pause
keepalive
2019-02-14 13:34:20 +11:00
Guo Xiang Tan 53d592ad3b FIX: Add multisite support to Sidekiq::Pausable. (#6960)
Having a global Sidekiq pause switch is problematic because a site in
the cluster can pause Sidekiq for the entire cluster.
2019-02-14 12:22:40 +11:00
Sam 44cf3cf975 FIX: queue heartbeats in readonly modes
If Sidekiq is paused or Discourse is in readonly mode, continue to queue
heartbeats

If we do not do that then a master process can end up reaping sidekiq
workers and causing various badness

This also impacts restore which can do weird stuff TM in cases like this
2018-08-29 12:36:59 +10:00
Guo Xiang Tan 4163f9e61e DEV: Better clean up for PostgreSQL failover test. 2018-07-10 09:53:25 +08:00
Sam 361fbfa518 FEATURE: raise an event when a sidekiq job runs 2017-10-23 17:30:17 +11:00
Régis Hanol fbacaab2fc FIX: disable scheduled jobs when in readonly mode 2016-01-11 18:31:28 +01:00
Sam 6d9a88c33b FIX: hanging specs 2014-08-19 20:56:25 +10:00
Sam 35ea1274e2 FIX: simplify, use our redis instead 2014-08-19 15:50:17 +10:00
Sam 997ab7a770 FIX: signaling seems flaky, simply kill thread 2014-08-19 14:59:40 +10:00
Sam cb686792df FIX: add safety so sidekiq can no longer be paused indefinitely
If the process pausing Sidekiq dies, Sidekiq will come out of pause mode
2014-08-19 14:04:58 +10:00
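
The safety described in cb686792df amounts to a pause that expires unless the pausing process keeps refreshing it. The sketch below illustrates that idea only; the class name `ExpiringPause`, the 60-second TTL, and the in-memory state (standing in for a key with an expiry in a shared store) are assumptions, not the module's actual code.

```ruby
# Hypothetical in-memory stand-in for a pause flag stored with a TTL.
class ExpiringPause
  def initialize(ttl: 60, clock: -> { Time.now.to_f })
    @ttl = ttl
    @clock = clock
    @expires_at = nil
  end

  # Called by the pausing process, and again periodically to keep the pause alive.
  def pause!
    @expires_at = @clock.call + @ttl
  end

  def unpause!
    @expires_at = nil
  end

  # The pause lapses on its own once nobody has refreshed it within the TTL.
  def paused?
    !@expires_at.nil? && @clock.call < @expires_at
  end
end

# A fake clock makes the expiry easy to follow.
now = 0.0
pause = ExpiringPause.new(ttl: 60, clock: -> { now })

pause.pause!
now = 30.0
paused_at_30 = pause.paused?

# The pausing process "dies" and never refreshes the pause again.
now = 61.0
paused_at_61 = pause.paused?
```

Tying the pause to an expiry means a crashed pauser cannot leave Sidekiq paused indefinitely; the cost is that a live pauser has to refresh the flag more often than the TTL.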
Régis Hanol 90c00fcaba pausable sidekiq module 2014-02-13 13:31:13 -08:00