After careful analysis of large data-sets it became apparent that avg_time
had no impact whatsoever on "best of" topic scoring. Calculating avg_time
was a very costly operation especially on large databases.
We have some longer term plans of introducing other weighting that is read
time based into our scoring for "best of" and "top" topics, but in the
interim to stop a large amount of work that is not achieving any value we
are removing the jobs.
Column removal will follow once we decide on a new replacement metric.
This commit fixes the follow quality issue with `PostSearchData#raw_data`:
1. URLs are being tokenized and links with similar href and characters
are being duplicated in the raw data.
`Post#cooked`:
```
<p><a href=\"https://meta.discourse.org/some.png\" class=\"onebox\" target=\"_blank\" rel=\"nofollow noopener\">https://meta.discourse.org/some.png</a></p>
```
`PostSearchData#raw_data` Before:
```
This is a test topic 0 Uncategorized https://meta.discourse.org/some.png discourse org/some png https://meta.discourse.org/some.png discourse org/some png
```
`PostSearchData#raw_data` After:
```
This is a test topic 0 Uncategorized https://meta.discourse.org/some.png meta discourse org
```
2. Ligthbox being included in search pollutes the
`PostSearchData#raw_data` unncessarily.
From 28 March 2018 to 28 March 2019, searches for the term `image` on
`meta.discourse.org` had a click through rate of 2.1%. Non-lightboxed images are not included in indexing for search yet we were indexing content within a lightbox. Also, search for terms like `image` was affected we were using `Pasted image` as the filename for
uploads that were pasted.
`Post#cooked`
```
<p>Let me see how I can fix this image<br>\n<div class=\"lightbox-wrapper\"><a class=\"lightbox\" href=\"https://meta.discourse.org/some.png\" title=\"some.png\" rel=\"nofollow noopener\"><img src=\"https://meta.discourse.org/some.png\" width=\"275\" height=\"299\"><div class=\"meta\">\n<svg class=\"fa d-icon d-icon-far-image svg-icon\" aria-hidden=\"true\"><use xlink:href=\"#far-image\"></use></svg><span class=\"filename\">some.png</span><span class=\"informations\">1750×2000</span><svg class=\"fa d-icon d-icon-discourse-expand svg-icon\" aria-hidden=\"true\"><use xlink:href=\"#discourse-expand\"></use></svg>\n</div></a></div></p>
```
`PostSearchData#raw_data` Before:
```
This is a test topic 0 Uncategorized Let me see how I can fix this image some.png png https://meta.discourse.org/some.png discourse org/some png some.png png 1750×2000
```
`PostSearchData#raw_data` After:
```
This is a test topic 0 Uncategorized Let me see how I can fix this image
```
In terms of indexing performance, we now have to parse the given HTML
through nokogiri twice. However performance is not a huge worry here since a string length of 194170 takes only 30ms
to scrub plus the indexing takes place in a background job.
Includes support for flags, reviewable users and queued posts, with REST API
backwards compatibility.
Co-Authored-By: romanrizzi <romanalejandro@gmail.com>
Co-Authored-By: jjaffeux <j.jaffeux@gmail.com>
- Notices are visible only by poster and trust level 2+ users.
- Notices are not generated for non-human or staged users.
- Notices are deleted when post is deleted.
This commit introduces an ultra low priority queue for post rebakes. This
way rebakes can never interfere with regular sidekiq processing for cases
where we perform a large scale rebake.
Additionally it allows Post.rebake_old to be run with rate_limiter: false
to avoid triggering the limiter when rebaking. This is handy for cases
where you want to just force the full rebake and not wait for it to trickle
This allows us to run regular rebakes without starving the normal queue.
It additionally adds the ability to specify queue with `Jobs.enqueue` so
we can specifically queue a job with lower priority using the `queue` arg.
We have the periodical job that regularly will rebake old posts. This is
used to trickle in update to cooked markdown. The problem is that each rebake
can issue multiple background jobs (post process and pull hotlinked images)
Previously we had no per-cluster limit so cluster running 100s of sites could
flood the sidekiq queue with rebake related jobs.
New system introduces a hard limit of 300 rebakes per 15 minutes across a
cluster to ensure the sidekiq job is not dominated by this.
We also reduced `rebake_old_posts_count` to 80, which is a safer default.
Checking `plugin.enabled?` while initializing plugins causes issues in two ways:
- An application restart is required for changes to take effect. A load-balanced multi-server environment could behave very weirdly if containers restart at different times.
- In a multisite environment, it takes the `enabled?` setting from the default site. Changes on that site affect all other sites in the cluster.
Instead, `plugin.enabled?` should be checked at runtime, in the context of a request. This commit removes `plugin.enabled?` from many `instance.rb` methods.
I have added a working `plugin.enabled?` implementation for methods that actually affect security/functionality:
- `post_custom_fields_whitelist`
- `whitelist_staff_user_custom_field`
- `add_permitted_post_create_param`
* Add possibility to add hidden posts with PostCreator
* FEATURE: Create hidden posts for received spam emails
Spamchecker usually have 3 results: HAM, SPAM and PROBABLY_SPAM
SPAM gets usually directly rejected and needs no further handling.
HAM is good message and usually gets passed unmodified.
PROBABLY_SPAM gets an additional header to allow further processing.
This change addes processing capabilities for such headers and marks
new posts created as hidden when received via email.
Introduce new patterns for direct sql that are safe and fast.
MiniSql is not prone to memory bloat that can happen with direct PG usage.
It also has an extremely fast materializer and very a convenient API
- DB.exec(sql, *params) => runs sql returns row count
- DB.query(sql, *params) => runs sql returns usable objects (not a hash)
- DB.query_hash(sql, *params) => runs sql returns an array of hashes
- DB.query_single(sql, *params) => runs sql and returns a flat one dimensional array
- DB.build(sql) => returns a sql builder
See more at: https://github.com/discourse/mini_sql
This updates tests to use latest rails 5 practice
and updates ALL dependencies that could be updated
Performance testing shows that performance has not regressed
if anything it is marginally faster now.
* `rescue nil` is a really bad pattern to use in our code base.
We should rescue errors that we expect the code to throw and
not rescue everything because we're unsure of what errors the
code would throw. This would reduce the amount of pain we face
when debugging why something isn't working as expexted. I've
been bitten countless of times by errors being swallowed as a
result during debugging sessions.
Locking a post prevents it from being edited. This is useful if the user
has posted something which has been edited out, and the staff members don't
want them to be able to edit it back in again.
- Regular users are not notified of whispers
- Regular users no longer have "stuck" topics in unread
- Additional tracking for staff highest post number
- Remove a bunch of unused columns in topics table
- All unsubscribes go to the exact same page
- You may unsubscribe from watching a category on that page
- You no longer need to be logged in to unsubscribe from a topic
- Simplified footer on emails
* Rearrange frontend to account for mailing list mode
* Allow update of user preference for mailing list frequency
* Add mailing list frequency estimate
* Simplify frequency estimate; disable activity summary for mailing list mode
* Remove combined updates
* Add specs for enqueue mailing list mode job
* Write mailing list method for mailer
* Fix linting error
* Account for stale topics
* Add translations for default mailing list setting
* One query for mailing list topics
* Fix failing spec
* WIP
* Flesh out html template
* First pass at text-based mailing list summary
* Add user avatar
* Properly format posts for mailing list
* Move make_all_links_absolute into Email::Styles
* Apply first_seen_at to user
* Send mailing list email summary hourly based on first_seen_at
* Branch and test cleanup
* Use existing mailing list mode estimate
* Fix failing specs
- how long were people typing?
- how long was composer open?
- how many drafts were created?
- correct, draft saved to go away after you continue typing
store in Post.find(xyz).post_stat
- add User.staff scope
- inject MessageBus into Ember views (so it can be used by the poll plugin)
- REFACTOR: use more accurate is_first_post? method instead of post_number == 1
- FEATURE: add support for JSON-typed custom fields
- FEATURE: allow plugins to add validation
- FEATURE: add post_custom_fields to PostSerializer
- FEATURE: allow plugins to whitelist post_custom_fields
- FIX: don't bump when post did not save successfully
- FEATURE: polls are supported in any post
- FEATURE: allow for multiple polls in the same post
- FEATURE: multiple choice polls
- FEATURE: rating polls
- FEATURE: new dialect allowing users to preview polls in the composer
FIX: history revision can now properly be hidden
FIX: PostRevision serializer is now entirely dynamic to properly handle
hidden revisions
FIX: default history modal to "side by side" view on mobile
FIX: properly hiden which revision has been hidden
UX: inline category/user/wiki/post_type changes with the revision
details
FEATURE: new '/posts/:post_id/revisions/latest' endpoint to retrieve
latest revision
UX: do not show the hide/show revision button on mobile (no room for
them)
UX: remove CSS transitions on the buttons in the history modal
FIX: PostRevisor now handles all the changes that might create new
revisions
FIX: PostRevision.ensure_consistency! was wrong due to off by 1
mistake...
refactored topic's callbacks for better readability
extracted 'PostRevisionGuardian'
Changed internals so trust levels are referred to with
TrustLevel[1], TrustLevel[2] etc.
This gives us much better flexibility naming trust levels, these names
are meant to be controlled by various communities.