FIX: First pass to improve efficiency of secure uploads rake task ()

Get rid of harmful each loop over uploads to update. Instead we put all the unique access control posts for the uploads into a map for fast access (vs using the slow .find through array) and look up the post when it is needed when looping through the uploads in batches.

On a Discourse instance with ~93k uploads, a simplified version of the old method takes > 1 minute, and a simplified version of the new method takes ~18s and uses a lot less memory.
This commit is contained in:
Martin Brennan 2020-03-26 15:59:57 +10:00 committed by GitHub
parent efd5fb665b
commit 6f978bc95c
No known key found for this signature in database
GPG Key ID: 4AEE18F83AFDEB23

@ -710,13 +710,6 @@ task "uploads:secure_upload_analyse_and_update" => :environment do
# secure based on site settings.
uploads_to_update = Upload.includes(:posts, :optimized_images).joins(:post_uploads)
# we do this to avoid a heavier post query, and to make sure we only
# get unique posts AND include deleted posts (unscoped)
unique_access_control_posts = Post.unscoped.select(:id, :topic_id).includes(topic: :category).where(id: uploads_to_update.pluck(:access_control_post_id).uniq)
uploads_to_update.each do |upload|
upload.access_control_post = unique_access_control_posts.find { |post| post.id == upload.access_control_post_id }
end
puts "There are #{uploads_to_update.count} upload(s) that could be marked secure.", ""
# Simply mark all these uploads as secure if login_required because no anons will be able to access them
@ -840,8 +833,20 @@ def determine_upload_security_and_posts_to_rebake(uploads_to_update, mark_secure
uploads_to_adjust_acl_for = []
posts_to_rebake = {}
# we do this to avoid a heavier post query, and to make sure we only
# get unique posts AND include deleted posts (unscoped)
unique_access_control_posts = {}
Post.unscoped.select(:id, :topic_id)
.includes(topic: :category)
.where(id: uploads_to_update.pluck(:access_control_post_id).uniq).find_each do |post|
unique_access_control_posts[post.id] = post
end
i = 0
uploads_to_update.find_each(batch_size: 50) do |upload_to_update|
uploads_to_update.find_each do |upload_to_update|
# fetch the post out of the already populated map to avoid n1s
upload_to_update.access_control_post = unique_access_control_posts[upload_to_update.access_control_post_id]
# we just need to determine the post security here so the ACL is set to the correct thing,
# because the update_upload_ACL method uses upload.secure?