- Stream the queries that load the imported_ids
- Use an array instead of a hash for keeping the mapping between imported_ids and new ids
- Ensure we always treat the imported_ids as integers instead of strings
Fixed bugs, added specs, extracted the upload downsizing code to a class, added support for non-S3 setups, changed it so that images aren't downloaded twice.
This code has been tested on production and successfully resized ~180k uploads.
Includes:
* DEV: Extract upload downsizing logic
* DEV: Add support for non-S3 uploads
* DEV: Process only images uploaded by users
* FIX: Incorrect usage of `count` and `exist?` typo
* DEV: Spec S3 image downsizing
* DEV: Avoid downloading images twice
* DEV: Update filesizes earlier in the process
* DEV: Return false on invalid upload
* FIX: Download images that currently above the limit (If the image size limit is decreased, then there was no way to resize those images that now fall outside the allowed size range)
* Update script/downsize_uploads.rb (Co-authored-by: Régis Hanol <regis@hanol.fr>)
Refactors script to follow conventions of other importers and adds some features including like import, processing of post raw text, and, if needed, SSO import.
FEATURE: new rake task to update first_post_created_at column
The not-equal operator (`<>`) in PostgreSQL does not compare values
with NULL. We should instead use `IS DISTINCT FROM` when comparing
values with NULL.
Moves the most important checks into a linter. It gets executed by Lefthook as well as the docker rake task and Github actions. Doing those checks in rspec takes too long and it produces errors when the discourse:test Docker image contains old, invalid locale files.
`Facter.reset` (65d167eac9/lib/facter.rb (L126-L137)) clears `Facter::Options[:external_dir]` which seems to be the 4.x equivalent of `Facter::Util::Config.external_facts_dirs`.
This commit also makes sure that version 4.0 or higher is installed.
Correctly handles more upload formats in posts, updates post custom fields, fixes more edge cases, adds debugging capabilities. (VERBOSE=1 and INTERACTIVE=1 flags)
Includes these commits and some more:
* DEV: Show the fixed image dimensions
* FIX: Support more upload url formats
* DEV: Remove the old upload after updating posts
* FIX: Use the `process_post_#{id}` mutex
* FIX: Avoid rebaking twice
* DEV: Print out the link to the post
* DEV: Process posts chronologically
* DEV: Do a dry-run before saving, pause on any issue
* FIX: Also process deleted posts
* DEV: Make matchers case-insensitive
* DEV: Pause on "detached" uploads, add more debug info
* DEV: Print out time when finished
* DEV: Add support for WORKER_ID/WORKER_COUNT
* DEV: Fix the onebox in cooked text heuristic
* DEV: Don't report already processed posts
* DEV: Beep when done!
* DEV: Ignore issues with deleted posts
* DEV: Ignore issues with deleted topics
* DEV: Multiline SQL
* DEV: Use the bulk attribute assignment
* DEV: Add ENV["INTERACTIVE"] mode
* DEV: Handle post custom fields
* DEV: Bail on non-S3 sites
* DEV: Allow sizes smaller than 1 mpix
* PERF: Dematerialize topic_reply_count
It's only ever used for trust level promotions that run daily, or compared to 0. We don't need to track it on every post creation.
* UX: Add symbol in TL3 report if topic reply count is capped
* DEV: Drop user_stats.topic_reply_count column
* DEV: api documentation updates
- Created a script to convert json responses to rswag
- Documented several api endpoints
- Switched rswag to use header based auth
* Update script, fix some schema missmatches
We have the `# frozen_string_literal: true` comment on all our
files. This means all string literals are frozen. There is no need
to call #freeze on any literals.
For files with `# frozen_string_literal: true`
```
puts %w{a b}[0].frozen?
=> true
puts "hi".frozen?
=> true
puts "a #{1} b".frozen?
=> true
puts ("a " + "b").frozen?
=> false
puts (-("a " + "b")).frozen?
=> true
```
For more details see: https://samsaffron.com/archive/2018/02/16/reducing-string-duplication-in-ruby
* DEV: Update the working tree just once.
`git pull` was effectively doing `git fetch` and `git merge FETCH_HEAD`, and only then we were checking out the desired branch/commit. This change will skip the the merge step.
* DEV: Don't run lefthook in docker_test
I'm not clear why changing only the `wait_for_url` address was necessary and not also the `get` a few lines above, but this change seems to work for me on both literatecomputing.com Groups and a public group.
Checking if all records have been imported uses a temp table in PostgreSQL. This fails when pgbouncer is used unless the temp table is created inside a transaction.
* Detects mostly all attachments and it's a lot faster
* Parses user properties in Ruby instead of the DB, because that's less errorprone
* Imports user avatars
* Imports topic views by users
* Better handling of quotes and YouTube links
* Adds ability to map forums to categories and tags as well as ignore forums.
* Fixes regular expression for detecting attachments in posts.
* Handles "remote attachments" 😮 by inserting a link.
* Imports view counts for topics.
* Handles incorrect references of parent posts.
* Better handling of quotes.
* Finds a lot more attachments by trying to replace various Unicode characters in filenames.
* Customizable email subject prefixes to remove "Re" and "Fwd" as well as localized prefixes.
* Configuration option for prefixes like [FOO] or (BAR) which can be replaced with tags during import.
* Bugfix: Import script might have skipped some users due to missing ORDER BY.
Posts without a user probably shouldn't happen unless there was some direct database tampering, but data like that has been seen in the wild.
The importer will assign those posts to the "system" user.
Previously we had many places in the app that called `hostname` to get
hostname of a server. This commit replaces the pattern in 2 ways
1. We cache the result in `Discourse.os_hostname` so it is only ever called once
2. We prefer to use Socket.gethostname which avoids making a shell command
This improves performance as we are not spawning hostname processes throughout
the app lifetime
This is not used in core or official plugins, and has been printing a deprecation notice since v2.3.0beta4. All OpenID 2.0 code and dependencies have been dropped. The user_open_ids table remains for now, in case anyone has missed the deprecation notice, and needs to migrate their data.
Context at https://meta.discourse.org/t/-/113249
With this change the script:
* Actually removes original large-sized images
* Doesn't save processed files if their size has increased
* Prevents inconsistent state
We pick the first topic with 30 responses as our bench topic.
Previously we simply picked the last topic, but hand no guarantee on ordering.
This also attempts to correct previous runs of the bench.
The following methods have long been deprecated in ruby due to flaws in their implementation per http://blade.nagaokaut.ac.jp/cgi-bin/vframe.rb/ruby/ruby-core/29293?29179-31097:
URI.escape
URI.unescape
URI.encode
URI.unencode
escape/encode are just aliases for one another. This PR uses the Addressable gem to replace these methods with its own encode, unencode, and encode_component methods where appropriate.
I have put all references to Addressable::URI here into the UrlHelper to keep them corralled in one place to make changes to this implementation easier.
Addressable is now also an explicit gem dependency.
We like to stay as close as possible to latest with rubocop cause the cops
get better.
This update required some code changes, specifically the default is to avoid
explicit returns where implicit is done
Also this renames a few rules
This is a bottom up rewrite of Discourse cache to support faster performance
and a limited surface area.
ActiveSupport::Cache::Store accepts many options we do not use, this partial
implementation only picks the bits out that we do use and want to support.
Additionally params are named which avoids typos such as "expires_at" vs "expires_in"
This also moves a few spots in Discourse to use Discourse.cache over setex
Performance of setex and Discourse.cache.write is similar.
Discourse.cache is a more consistent method to use and offers clean fallback
if you are skipping redis
This is part of a larger change that both optimizes Discoruse.cache and omits
use of setex on $redis in favor of consistently using discourse cache
Bench does reveal that use of Rails.cache and Discourse.cache is 1.25x slower
than redis.setex / get so a re-implementation will follow prior to porting
`FileUtils.cd` and `Dir.chdir` cause the working directory to change for the entire process. We run sidekiq jobs, hijacked requests and deferred jobs in threads, which can make working directory changes have unintended side-effects.
- Add a rubocop rule to warn about usage of Dir.chdir and FileUtils.cd
- Added rubocop:disable for scripts used outside the app
- Refactored code using cd to use alternative methods
- Temporarily skipped the rubocop check for lib/backup_restore. This will require more complex refactoring, so I will create a separate PR for review
Doing .pluck(:column).first is a very common pattern in Discourse and in
most cases, a limit cause isn't being added. Instead of adding a limit
clause to all these callsites, this commit adds two new methods to
ActiveRecord::Relation:
pluck_first, equivalent to limit(1).pluck(*columns).first
and pluck_first! which, like other finder methods, raises an exception
when no record is found
The script will now correct all width/height and thumbnail_width/thumbnail_height properties of all the uploaded images.
The script now uses width * height to filter out all unaffected images.
Also handled the case where a downsized image was already an uploaded record.
These scripts are somewhat rough but I needed them to help debug a memory
leak we have noticed in rails 6.
The biggest object script finds all the biggest objects we have in memory
after boot.
The test memory leak runs a very simple iteration through all multisites
and observed memory.
I introduced DemonBase because I had got some conflict between `demon/base.rb` and `jobs/base.rb`, however, to not rename base class, it is possible to use regex on absolute path in Zeitwerk custom inflector.
Zeitwerk simplifies working with dependencies in dev and makes it easier reloading class chains.
We no longer need to use Rails "require_dependency" anywhere and instead can just use standard
Ruby patterns to require files.
This is a far reaching change and we expect some followups here.
Trying to automate the login into a Google account is quite hard. This makes the crawler use the content of a cookies.txt file instead. It also removes a couple of deprecation warnings and adds some color to the output.
This is a temporary workaround for the issue in https://github.com/rails/rails/pull/36949
Discussing a proper fix in Rails with the Rails team.
Prior to this fix we were spinning up a thread every time we closed a connection
to the db.
* REFACTOR: Rename SiteSetting.disable_edit_notifications to disable_system_edit_notifications
- The older name could cause some confusion because the setting does not disable all edit notifications, only system ones.
* FIX: Add frozen_string_literal: true in the migration
* DEV: Deprecate 'disable_edit_notifications'