851 Commits

Author SHA1 Message Date
Alan Guo Xiang Tan 9812407f76
FIX: Redo Sidekiq monitoring to restart stuck sidekiq processes (#30198)
This commit reimplements how we monitor Sidekiq processes that are
forked from the Unicorn master process. Prior to this change, we relied on
`Jobs::Heartbeat` to enqueue a `Jobs::RunHeartbeat` job every 3 minutes.
The `Jobs::RunHeartbeat` job then set a Redis key with a timestamp. In
the Unicorn master process, we then fetched the timestamp set
by the job from Redis every 30 minutes. If the timestamp had not been
updated for more than 30 minutes, we restarted the Sidekiq process. The
fundamental flaw with this approach is that it fails to consider
deployments with multiple hosts and multiple Sidekiq processes. A
Sidekiq process on a host may be in a bad state but the heartbeat check
will not restart the process because the `Jobs::RunHeartbeat` job is
still being executed by the working Sidekiq processes on other hosts.

In order to properly ensure that stuck Sidekiq processes are restarted,
we now rely on the [Sidekiq::ProcessSet](https://github.com/sidekiq/sidekiq/wiki/API#processes)
API that is supported by Sidekiq. The API provides us with "near real-time (updated every 5 sec)
info about the current set of Sidekiq processes running". The API
provides useful information like the hostname, pid and also when Sidekiq
last did its own heartbeat check. With that information, we can easily
determine if a Sidekiq process needs to be restarted from the Unicorn
master process.
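
The check described above could be sketched roughly as follows. This is a minimal illustration, not the code from this commit: plain hashes stand in for the entries that `Sidekiq::ProcessSet` yields, and the `STUCK_AFTER_SECONDS` threshold and the `stuck_sidekiq_pids` helper are hypothetical names introduced here.

```ruby
# Hashes stand in for the per-process entries of Sidekiq::ProcessSet, which
# expose similar "hostname", "pid" and "beat" (last heartbeat) fields.
STUCK_AFTER_SECONDS = 120 # hypothetical threshold, not taken from the commit

# Returns the pids running on `hostname` whose last heartbeat is too old.
def stuck_sidekiq_pids(processes, hostname:, now: Time.now.to_f)
  processes
    .select { |process| process["hostname"] == hostname }
    .select { |process| now - process["beat"] > STUCK_AFTER_SECONDS }
    .map { |process| process["pid"] }
end

now = Time.now.to_f
processes = [
  { "hostname" => "web-1", "pid" => 101, "beat" => now - 5 },   # healthy
  { "hostname" => "web-1", "pid" => 102, "beat" => now - 600 }, # stuck, this host
  { "hostname" => "web-2", "pid" => 201, "beat" => now - 600 }, # stuck, other host
]

puts stuck_sidekiq_pids(processes, hostname: "web-1", now: now).inspect
```

Filtering by hostname first means each Unicorn master only considers the processes it can actually restart, which is the multi-host gap the old Redis-timestamp approach left open.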
2024-12-18 12:48:50 +08:00
Kelv 04ba5baec0
DEV: ensure rebaking works even when some users have inconsistent data (#30261)
* DEV: add db consistency check for UserEmail

* DEV: add db consistency check for UserAvatar

* DEV: ignore inconsistent data related to user avatars when deciding whether to rebake old posts


Co-authored-by: Alan Guo Xiang Tan <gxtan1990@gmail.com>
2024-12-16 19:48:25 +08:00
Alan Guo Xiang Tan f35128c6ed
DEV: Fix broken sidekiq logging due to eeb01ea0de (#30199) 2024-12-10 17:01:25 +08:00
Alan Guo Xiang Tan eeb01ea0de
DEV: Remove unnecessary thread in `Jobs::Base::JobInstrumenter` take 2 (#30195)
This reverts commit 766ff723f8.

Ensure that we create the sidekiq log file first before opening it for
logging. This avoids any issue of the log file not being present when we
initialize an instance of the `Logger`.
2024-12-10 12:44:56 +08:00
Juan David Martínez Cubillos 08440b0035
DEV: Add tl3_custom_promotions plugin modifier to tl3_promotions.rb (#29834)
* DEV: Add tl3_custom_promotions plugin modifier to tl3_promotions.rb

* added tests

* added tests for demotions

* changed argument order in test
2024-11-22 15:28:43 -05:00
Bianca Nenciu 250a145361
DEV: Fix undefined variable (#29876)
Follow up to commit 429cf656e7.
2024-11-21 20:23:20 +02:00
Bianca Nenciu 429cf656e7
FIX: Use FinalDestination::HTTP to push notifications (#29858)
Sometimes `Jobs::PushNotification` gets stuck, probably because of the
network call. This commit replaces `Excon` with `FinalDestination::HTTP`
which is safer.
2024-11-21 14:11:51 +11:00
Angus McLeod ec7de0fd68
Require permitted scopes when registering a client (#29718) 2024-11-19 15:28:04 -05:00
Jarek Radosz 7ab4df9a04
DEV: Fix linting in notify_category_change_spec (#29175) 2024-10-11 19:55:33 +02:00
Yuvaraj J 65a1e149ad
FIX: Notify mailing list subscribers on category change (#28811)
cf. https://meta.discourse.org/t/email-notifications-dont-get-sent-on-category-change-for-mailing-list-mode-users/308096
2024-10-11 14:47:39 +02:00
Alan Guo Xiang Tan ed6c9d1545
DEV: Call Discourse.redis.flushdb after the end of each test (#29117)
There have been too many flaky tests as a result of leaking state in
Redis so it is easier to resolve them by ensuring we flush Redis'
database.

Locally on my machine, calling `Discourse.redis.flushdb` takes around
0.1ms which means this change will have very little impact on test
runtimes.
2024-10-09 07:19:31 +08:00
Ted Johansson e60876ce49
FIX: Appropriately handle uninstalled problem checks (#28771)
When running checks, we look to the existing problem check trackers and try to grab their ProblemCheck classes.

In some cases this is no longer in the problem check repository, e.g. when the check was part of a plugin that has been uninstalled.

In the case where the check was scheduled, this would lead to an error in one of the jobs
2024-09-18 10:11:52 +08:00
Ted Johansson 776b4ec8e2
DEV: Remove old problem check system - Part 1 (#28772)
We're now using the new, database-backed problem check system. This PR removes parts of the old, Redis-backed system that is now defunct.
2024-09-06 17:00:25 +08:00
Osama Sayegh 280adda09c
FEATURE: Support designating multiple groups as mods on category (#28655)
Currently, categories support designating only 1 group as a moderation group on the category. This commit removes the one group limitation and makes it possible to designate multiple groups as mods on a category.

Internal topic: t/124648.
2024-09-04 04:38:46 +03:00
Bianca Nenciu 1f206349fd
DEV: Split slow test in multiple smaller tests (#28646)
* DEV: Split slow test in multiple smaller tests

This might be faster because the smaller chunks of the test may run in
parallel.

* DEV: Fabricate reviewables only once
2024-08-30 14:47:29 +10:00
Martin Brennan daa06a1c00
DEV: Improve external upload debugging (#28627)
* Keep created external upload stubs for 2 days instead of
  1 hour if enable_upload_debug_mode is true. This aids
  server-side debugging.
* If using an API call, return the detailed error message
  if enable_upload_debug_mode is true. In this case the user
  is not using the UI, so a more detailed message is appropriate.
* Add a prefix to log messages in ExternalUploadHelpers, to
  make it easier to find these in logster.
2024-08-30 10:25:04 +10:00
David Battersby 0954ae70a6
FEATURE: add delay to native push notifications (#28314)
This change ensures native push notifications respect the site setting for push_notification_time_window_mins. Previously only web push notifications would account for the delay, now we can bring more consistency between Discourse in browser vs Hub, by applying the same delay strategy to both forms of push notifications.
2024-08-13 12:12:05 +04:00
David Battersby 6ec8728ebf
DEV: refactor live notifications setting in user preferences (#28145)
This change is mainly a refactor of the desktop notifications service to improve readability and have standardised values for tracking state for current user in regards to the Notification API and Push API.

Also improves readability when handling push notification jobs, especially in scenarios where the push_notification_time_window_mins site setting is set to 0, which will allow sending push notifications instantly.
2024-08-02 17:25:15 +04:00
Guhyoun Nam a01be4150a
DEV: Specs for redeliver_web_hook_events job (#27779)
This PR adds a spec checking that the redeliver_web_hook_events job does not delete webhook events that are still being processed.
2024-07-09 10:35:10 -05:00
Guhyoun Nam 784c04ea81
FEATURE: Add Mechanism to redeliver all failed webhook events (#27609)
Background:
In order to redrive failed webhook events, an operator has to go through and click on each one. This PR adds a mechanism to retry all failed events, to help resolve issues quickly once the underlying failure has been fixed.

What is the change?:
Previously, we had to redeliver each webhook event individually. This merge adds a 'Redeliver Failed' button next to the webhook event filter to redeliver all failed events. If there are no failed webhook events to redeliver, 'Redeliver Failed' is disabled. Clicking it opens a dialog asking the operator to confirm. Failed webhook events are then added to the queue and the webhook event list shows the redelivery progress. Every minute, a job runs and redelivers up to 20 events. Every hour, a job cleans up the redelivering events which have been stored for more than 8 hours.
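
The batching and cleanup cadence described above could look roughly like this. It is a sketch under stated assumptions: `RedeliveryRecord`, `redeliver_batch` and `cleanup_stale` are hypothetical names, and an in-memory array stands in for whatever storage the real jobs use. Only the batch size of 20 and the 8 hour limit come from the commit message.

```ruby
BATCH_SIZE = 20            # events handled per run, per the commit message
MAX_AGE_SECONDS = 8 * 3600 # queued records older than this are cleaned up

# Hypothetical record: which event to redeliver and when it was queued.
RedeliveryRecord = Struct.new(:event_id, :queued_at)

# Removes up to BATCH_SIZE records from the front of the queue and passes
# each event id to the given block, returning the block's results.
def redeliver_batch(queue)
  queue.shift(BATCH_SIZE).map { |record| yield(record.event_id) }
end

# Drops records that have been waiting longer than MAX_AGE_SECONDS.
def cleanup_stale(queue, now: Time.now)
  queue.reject! { |record| now - record.queued_at > MAX_AGE_SECONDS }
  queue
end

now = Time.now
queue = (1..25).map { |id| RedeliveryRecord.new(id, now) }
delivered = redeliver_batch(queue) { |event_id| "redelivered #{event_id}" }
puts delivered.size # events handled by this run
puts queue.size     # events left for the next run
```

Capping each run at a fixed batch keeps a large backlog from monopolising the job queue, at the cost of spreading redelivery over several minutes.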
2024-07-08 15:43:16 -05:00
Keegan George ea58140032
DEV: Remove summarization code (#27373) 2024-07-02 08:51:47 -07:00
Alan Guo Xiang Tan adc824a9bc
FIX: `Jobs::EnsureS3UploadsExistence` broken for multisite (#27401)
This is a follow-up to 8cf4ed5f88.
2024-06-10 16:26:39 +08:00
Alan Guo Xiang Tan 8cf4ed5f88
DEV: Introduce hidden `s3_inventory_bucket` site setting (#27304)
This commit introduces a hidden `s3_inventory_bucket` site setting which
replaces the `enable_s3_inventory` and `s3_configure_inventory_policy`
site settings.

The reason the `enable_s3_inventory` and `s3_configure_inventory_policy`
site settings are removed is that this feature has technically been
broken since it was introduced. When the `enable_s3_inventory` feature
is turned on, the app will configure a daily inventory policy for the
`s3_upload_bucket` bucket and store the inventories under a prefix in
the bucket. The problem here is that once the inventories are created,
there is nothing cleaning up all these inventories, so whoever has
enabled this feature would have been paying the cost of storing a whole
bunch of inventory files which are never used. Given that we have not
received any complaints about inventory files inflating S3 storage costs,
we think that it is very likely that this feature is no longer being
used and we are looking to drop support for this feature in the not too
distant future.

For now, we will still support a hidden `s3_inventory_bucket` site
setting which site administrators can configure via the
`DISCOURSE_S3_INVENTORY_BUCKET` env.
2024-06-10 13:16:00 +08:00
Ted Johansson 69205cb1e5
DEV: Catch missing translations during test runs (#26258)
This configuration makes a missing translation raise an error during test execution. It is better to discover it there than after deploy.
2024-05-24 22:15:53 +08:00
Alan Guo Xiang Tan df16ab0758
FIX: `S3Inventory` to ignore files older than last backup restore date (#27166)
This commit updates `S3Inventory#files` to ignore S3 inventory files
whose `last_modified` timestamp is not at least 2 days
older than the `BackupMetadata.last_restore_date` timestamp.

This check was previously only in `Jobs::EnsureS3UploadsExistence` but
`S3Inventory` can also be used via Rake tasks so this protection needs
to be in `S3Inventory` and not in the scheduled job.
2024-05-24 10:54:06 +08:00
Ted Johansson 3137e60653
DEV: Database backed admin notices (#26192)
This PR introduces a basic AdminNotice model to store these notices. Admin notices are categorized by their source/type (currently only notices from problem check.) They also have a priority.
2024-05-23 09:29:08 +08:00
Régis Hanol 958437e7dd
FIX: send activity summaries based on "last seen" (#27035)
instead of "last emailed" so that people getting email notifications (from a watched topic for example) also get the activity summaries.

Context - https://meta.discourse.org/t/activity-summary-not-sent-if-other-emails-are-sent/293040

Internal Ref - t/125582

Improvement over 95885645d9
2024-05-22 10:23:03 +02:00
Isaac Janzen ede0fa5802
DEV: Update bulk-invite logs and PM template (#27057)
# Preview

<img width="754" alt="Screenshot 2024-05-17 at 8 50 03 AM" src="https://github.com/discourse/discourse/assets/50783505/6710234f-0195-42be-b70e-9d57ba48bb4a">


# New Logs

```
[2024-05-17 08:49:54 -0600] Invalid User Field 'backend name' for 'foobarbing@gmail.com'
[2024-05-17 08:49:54 -0600] Invalid Email 'test
[2024-05-17 08:49:54 -0600] Invalid Email 'this@$@**.com
```
2024-05-17 12:21:21 -06:00
Régis Hanol e04ac5e2d8
FIX: display validation errors when converting topics (#27064)
When converting a PM to a public topic (and vice versa), if there was a validation error (like a topic already used, or a tag required or not allowed) the error message wasn't bubbled up nor shown to the user.

This fix ensures we properly stop the conversion whenever a validation error happens and bubble up the errors back to the user so they can be informed.

Internal ref - t/128795
2024-05-17 16:36:25 +02:00
Mark VanLandingham 9264479c27
DEV: Add modifier for webhook event header generation (#27054) 2024-05-17 09:33:39 -05:00
Natalie Tay 777b8f6d51
Revert "FIX: send activity summaries based on "last seen"" (#27029)
This reverts commit 95885645d9.
2024-05-15 14:09:29 +08:00
Régis Hanol 95885645d9
FIX: send activity summaries based on "last seen"
instead of "last emailed" so that people getting email notifications (from a watched topic for example) also get the activity summaries.

Context - https://meta.discourse.org/t/activity-summary-not-sent-if-other-emails-are-sent/293040

Internal Ref - t/125582
2024-05-06 15:22:52 +02:00
Natalie Tay 00a9369ca2
FIX: Move user reindexing into a job (#26753)
In a large forum with millions of users and millions of user_fields
updating the list of dropdown user field options will result in a
502 now due to the large number of fields.

This commit moves the indexing into a job.
2024-04-25 20:58:34 +08:00
Vinoth Kannan 859b55366f
DEV: don't send moderator welcome message to first admin. (#26719)
We already skip the admin welcome message for the first admin user. We should also skip the moderator message.
2024-04-24 00:20:14 +05:30
Vinoth Kannan 9d88f80f26
UX: make first admin a moderator to review user approvals. (#26588)
Previously, when a new site was created and the first admin had logged in, no one would receive notifications to review the user approval queue, since only moderators receive the PMs about it. This PR also changes the "pending_users_reminder_delay_minutes" site setting to 5 minutes while the site is in bootstrap mode.
2024-04-10 20:59:03 +05:30
jbrw 74d55f14fe
DEV: Add skip_email_bulk_invites hidden site setting (#26430)
This adds a hidden site setting of `skip_email_bulk_invites`

If set to `true`, the `BulkInvite` job will pass the value to `Invite`, meaning the generated invite won't trigger an email notification to the newly invited user.

(This is useful if you want to manage the sending of the invite emails outside of Discourse.)
2024-03-29 13:22:00 -04:00
Ted Johansson 0c875cb4d5
DEV: Make problem check registration more explicit (#26413)
Previously the problem check registry simply looked at the subclasses of ProblemCheck. This was causing some confusion in environments where eager loading is not enabled, as the registry would appear empty as a result of the classes never being referenced (and thus never loaded.)

This PR changes the approach to a more explicit one. I followed other implementations (bookmarkable and hashtag autocomplete.) As a bonus, this now has a neat plugin entry point as well.
2024-03-28 14:00:47 +08:00
Jarek Radosz 4c860995e0
DEV: Remove unnecessary rails_helper requiring (#26364) 2024-03-26 11:32:01 +01:00
Ted Johansson b36256f222
DEV: Fix broken RunProblemCheck spec (#26074)
The build is broken due to some changes not being staged when I pushed the previous PR. The assertions that check that a job has been scheduled need to be updated to reflect the new name.
2024-03-07 13:31:59 +08:00
Ted Johansson 6e95c152ed
DEV: Rename problem check jobs to avoid namespace clashes (#26073)
Doing the following renames:

Jobs::ProblemChecks → Jobs::RunProblemChecks
Jobs::ProblemCheck → Jobs::RunProblemCheck

This is to disambiguate the ProblemCheck class name, ease fuzzy finding, and avoid needing to use :: in a bunch of places.
2024-03-07 12:26:58 +08:00
Martin Brennan 6bcbe56116
DEV: Use freeze_time_safe in more places (#25949)
Followup to 120a2f70a9,
uses new method to avoid time-based spec flakiness
2024-03-01 10:07:35 +10:00
Ted Johansson 1bcb521fbf
DEV: Add DB backed problem checks to support perform_every config (#25834)
As part of problem checks refactoring, we're moving some data to be DB backed. In this PR it's the tracking of problem check execution. When was it last run, when was the last problem, when should it run next, how many consecutive checks had problems, etc.

This allows us to implement the perform_every feature in scheduled problem checks for checks that don't need to be run every 10 minutes.
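
A DB-backed tracker of this kind might be modelled as below. This is only a sketch: a `Struct` stands in for the database row, and the `ProblemCheckTracker` name, its field names and the `record_run` helper are assumptions made for illustration rather than the schema this PR adds.

```ruby
# Hypothetical tracker row; a Struct stands in for the database record.
ProblemCheckTracker = Struct.new(
  :last_run_at, :last_problem_at, :next_run_at, :consecutive_problems
) do
  # A check that has never run, or whose next run is due, is ready.
  def ready_to_run?(now)
    next_run_at.nil? || next_run_at <= now
  end

  # Records the outcome of a run and schedules the next one
  # `perform_every` seconds later.
  def record_run(now:, perform_every:, problem:)
    self.last_run_at = now
    self.next_run_at = now + perform_every
    if problem
      self.last_problem_at = now
      self.consecutive_problems += 1
    else
      self.consecutive_problems = 0
    end
    self
  end
end

tracker = ProblemCheckTracker.new(nil, nil, nil, 0)
start = Time.now
tracker.record_run(now: start, perform_every: 3600, problem: true)
puts tracker.ready_to_run?(start + 600)  # before the hour is up
puts tracker.ready_to_run?(start + 3600) # once the hour has passed
```

Storing `next_run_at` on the tracker lets a frequent scheduler tick skip checks that are not yet due, which is what makes a per-check `perform_every` possible.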
2024-02-27 11:17:39 +08:00
Ted Johansson ed2496c59d
FEATURE: Add scheduled Twitter login problem check - Part 1 (#25830)
This PR adds a new scheduled problem check that simply tries to connect to the Twitter OAuth endpoint to check that it's working. It uses the default retry strategy of 2 retries 30 seconds apart.
2024-02-26 12:08:12 +08:00
Vinoth Kannan b3238bfc34
FEATURE: call hub API to update Discourse discover enrollment. (#25634)
Now forums can enroll their sites to be showcased in the Discourse [Discover](https://discourse.org/discover) directory. Once they enable the site setting `include_in_discourse_discover` to enroll their forum the `CallDiscourseHub` job will ping the `api.discourse.org/api/discover/enroll` endpoint. Then the Discourse Hub will fetch the basic details from the forum and add it to the review queue. If the site is approved then the forum details will be displayed in the `/discover` page.
2024-02-23 11:42:28 +05:30
Sam 207cb2052f
FIX: muted tags breaking hot page when filtered to tags (#25824)
Also, remove experimental setting and simply use top_menu for feature detection

This means that when people eventually enable the hot top menu, there will
be topics in it


Co-authored-by: Alan Guo Xiang Tan <gxtan1990@gmail.com>
2024-02-23 17:11:39 +11:00
Ted Johansson a72dc2f420
DEV: Introduce a problem checks API (#25783)
Previously, problem checks were all added as either class methods or blocks in AdminDashboardData. Another set of class methods were used to add and run problem checks.

As of this PR, problem checks are promoted to first-class citizens. Each problem check receives its own class. This class of course contains the implementation for running the check, but also configuration items like retry strategies (for scheduled checks.)

In addition, the parent class ProblemCheck also serves as a registry for checks. For example we can get a list of all existing check classes through ProblemCheck.checks, or just the ones running on a schedule through ProblemCheck.scheduled.

After this refactor, the task of adding a new check is significantly simplified. You add a class that inherits ProblemCheck, you implement it, add a test, and you're good to go.
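
The class-per-check pattern with a registry on the parent class could be sketched like this. It is a simplified stand-in, not the actual Discourse class: the `inherited` hook, the `scheduled!` switch and the two example checks are assumptions made for illustration, while `ProblemCheck.checks` and `ProblemCheck.scheduled` mirror the names mentioned above.

```ruby
# Simplified stand-in for the ProblemCheck base class described above.
class ProblemCheck
  @checks = []

  class << self
    attr_reader :checks

    # Record every subclass so the parent doubles as a registry.
    def inherited(subclass)
      super
      ProblemCheck.checks << subclass
    end

    # Hypothetical class-level switch marking a check as scheduled.
    def scheduled!
      @scheduled = true
    end

    def scheduled?
      @scheduled == true
    end

    # Only the checks that run on a schedule.
    def scheduled
      checks.select(&:scheduled?)
    end
  end

  # Subclasses return a problem message, or nil when all is well.
  def call
    raise NotImplementedError
  end
end

# Hypothetical example checks.
class DiskSpaceCheck < ProblemCheck
  def call
    nil
  end
end

class TwitterLoginCheck < ProblemCheck
  scheduled!

  def call
    "Twitter login appears to be failing"
  end
end

puts ProblemCheck.checks.inspect
puts ProblemCheck.scheduled.inspect
```

A registry built from subclasses only sees classes that have been loaded, so in environments without eager loading it can appear empty until the check classes are referenced.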
2024-02-23 11:20:32 +08:00
Martin Brennan ed47b55026
DEV: Increase default SMTP read timeout to 30s (#25763)
A while ago we increased group SMTP read and open timeouts
to address issues we were seeing with Gmail sometimes giving
really long timeouts for these values. The commit was:

3e639e4aa7

Now, we want to increase all SMTP read timeouts to 30s,
since the 5s is too low sometimes, and the ruby Net::SMTP
stdlib also defaults to 30s.

Also, we want to slightly tweak the group smtp email job
not to fail if the IncomingEmail log fails to create, or if
a ReadTimeout is encountered, to avoid retrying the job in sidekiq
again and sending the same email out.
2024-02-21 07:13:18 +10:00
Ted Johansson e071b74a79
DEV: Drop deprecated Badge#image column (#25536)
We just completed the 3.2 release, which marks a good time to drop some previously deprecated columns.

Since the column has been marked in ignored_columns, it has been inaccessible to application code since then. There's a tiny risk that this might break a Data Explorer query, but given the nature of the column, the years of disuse, and the fact that such a breakage wouldn't be critical, we accept it.
2024-02-02 14:09:55 +08:00
Blake Erickson 7200a41207
FIX: export csv file failed message (#25443)
When exporting a CSV file, if the size of the file exceeded
max_export_file_size_kb, we would still send the PM saying that the export
succeeded, with a broken link to a missing export file. This change
ensures that a failed message is sent instead.
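
The corrected behaviour amounts to a size guard before choosing which PM to send. The sketch below assumes a hypothetical `export_result_message` helper and message keys; only the `max_export_file_size_kb` setting name comes from the commit message.

```ruby
# Picks which PM to send after an export, given the file size in bytes and
# the configured limit in kilobytes. The returned keys are illustrative.
def export_result_message(file_size_bytes, max_export_file_size_kb)
  if file_size_bytes > max_export_file_size_kb * 1024
    :csv_export_failed
  else
    :csv_export_succeeded
  end
end

puts export_result_message(5 * 1024, 10)  # within the limit
puts export_result_message(50 * 1024, 10) # over the limit
```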
2024-01-26 11:16:02 -07:00
Ted Johansson 7e5d2a95ee
DEV: Convert min_trust_level_to_tag_topics to groups (#25273)
We're changing the implementation of trust levels to use groups. Part of this is to have site settings that reference trust levels use groups instead. This PR converts the min_trust_level_to_tag_topics site setting to tag_topic_allowed_groups.
2024-01-26 13:25:03 +08:00