Fixed bugs, added specs, extracted the upload downsizing code to a class, added support for non-S3 setups, changed it so that images aren't downloaded twice.
This code has been tested on production and successfully resized ~180k uploads.
Includes:
* DEV: Extract upload downsizing logic
* DEV: Add support for non-S3 uploads
* DEV: Process only images uploaded by users
* FIX: Incorrect usage of `count` and `exist?` typo
* DEV: Spec S3 image downsizing
* DEV: Avoid downloading images twice
* DEV: Update filesizes earlier in the process
* DEV: Return false on invalid upload
* FIX: Download images that currently above the limit (If the image size limit is decreased, then there was no way to resize those images that now fall outside the allowed size range)
* Update script/downsize_uploads.rb (Co-authored-by: Régis Hanol <regis@hanol.fr>)
FIX: prevent re-flagging when we have reviewed flags before
Fixes an edge case where a review can be reflagged when:
User flags as inappropriate.
Moderator rejects the flag.
Another user re-flags the post as spam.
Before, anyone was able to re-flag as inappropriate despite it being flagged
previously. With this, users are unable to re-flag for the same reason
regardless of reviewable status.
FEATURE: new rake task to update first_post_created_at column
The not-equal operator (`<>`) in PostgreSQL does not compare values
with NULL. We should instead use `IS DISTINCT FROM` when comparing
values with NULL.
Since this is rare, we don't want to check for
`Discourse.pg_readonly_mode?` on every request since we have to reach
for Redis. Instead, just rescue the error here.
Allow limiting the number of migrations to do at once, both to do migrations that
have impact limited to multiple off-peak usage hours to reduce user impact from
a migration, and to allow tests that do only a very small number for test
purposes. ("Give me a ping, Vasili. One ping only, please.")
Looks like some html elements like `aside` and `section` will throw an error
when checking if they are inline or not. The commit simply handles
```
Job exception: undefined method `inline?' for nil:NilClass
```
and adds a test for it.
Moves the most important checks into a linter. It gets executed by Lefthook as well as the docker rake task and Github actions. Doing those checks in rspec takes too long and it produces errors when the discourse:test Docker image contains old, invalid locale files.
* DEV: Move `Discourse.getURL` and related functions to a module
* DEV: Remove `Discourse.getURL` and `Discourse.getURLWithCDN`
* FIX: `get-url` is required for server side code
* DEV: Deprecate `BaseUri` too.
In some restricted setups all JS payloads need tight control.
This setting bans admins from making changes to JS on the site and
requires all themes be whitelisted to be used.
There are edge cases we still need to work through in this mode
hence this is still not supported in production and experimental.
Use an example like this to enable:
`DISCOURSE_WHITELISTED_THEME_REPOS="https://repo.com/repo.git,https://repo.com/repo2.git"`
By default this feature is not enabled and no changes are made.
One exception is that default theme id was missing a security check
this was added for correctness.
* DEV: To be pedantic, there is more than EMBER in there now
* DEV: Use less globals. Have `Discourse` start in an initializer
* DEV: Remove another global
Previously the pull hotlinked images job was skipped after system edits. This ensured that we never had an infinite loop of system-edit/pull-hotlinked/system-edit/pull-hotlinked etc.
A side effect was that edits made by system for any other reason (e.g. API, removing full quotes) would prevent pulling hotlinked images. This commit removes the system edit check, and replaces it with another method to avoid an infinite job scheduling loop.
* DEV: new S3 backup layout
Currently, with $S3_BACKUP_BUCKET of "bucket/backups", multisite backups
end up in "bucket/backups/backups/dbname/" and single-site will be in
"bucket/backups/".
Both _should_ be in "bucket/backups/dbname/"
- remove MULTISITE_PREFIX,
- always include dbname,
- method to move to the new prefix
- job to call the method
* SPEC: add tests for `VacateLegacyPrefixBackups` onceoff job.
Co-authored-by: Vinoth Kannan <vinothkannan@vinkas.com>
Fixes a regression in
e8fb9d4066
which caused a bug where you couldn't send a message to a group that
contained an Uppercase letter. Added a test case for this.
Bug report: https://meta.discourse.org/t/-/152999
Previously we had a partial fix in place where non human users
were not allowed draft sequences, this left edges around where non
human users asked for drafts yet had none.
For example system could already have a few drafts in place.
This also removes and extensibility point we added that is not in use
This reverts commit 20780a1eee.
* SECURITY: re-adds accidentally reverted commit:
03d26cd6: ensure embed_url contains valid http(s) uri
* when the merge commit e62a85cf was reverted, git chose the 2660c2e2 parent to land on
instead of the 03d26cd6 parent (which contains security fixes)
* We now have a site setting "topic_excerpt_maxlength" that is used when the OP is created or revised to generate a topic excerpt.
* However, posts created before this setting was introduced cannot benefit from this change unless they are revised, and if the topic excerpt length setting is changed that situation is also not covererd.
* This PR makes a change to rebake! to update the topic excerpt IF the post is the OP.
In some cases, between Discourse forums the hostname of a URL could match if they are hosting S3 files on the same bucket but the S3 bucket path might not. So e.g. https://testbucket.somesite.com/testpath/some/file/url.png vs https://testbucket.somesite.com/prodpath/some/file/url.png. So has_been_uploaded? was returning true for the second URL, even though it may have been uploaded on a different Discourse forum.
This is a very rare case but must be accounted for, because this impacts UrlHelper.is_local which mistakenly thinks the file has already been downloaded and thus allows the URL to be cooked, where we want to return the full URL to be downloaded using PullHotlinkedImages.
* the post_actions table has no FK to users, so if a user has been
deleted we may end up with dangling post_action records, which then
interferes with the bookmarks migration because bookmarks DO have
an FK to users
PG already handles English stop words, the list in cppjieba is
bigger than the list PG uses, which in turn causes confusion cause
words such as "volume" are stripped using cppijieba stop word list
We will follow up with another commit here to apply the Chinese
word stopwords, but for now to eliminate the confusion we are
skipping applying the stopword list when the dictionary in PG is
in English.
* DEV: Add framework for filtered plugin registers
Plugins often need to add values to a list, and we need to filter those lists at runtime to ignore values from disabled plugins. This commit provides a re-usable way to do that, which should make it easier to add new registers in future, and also reduce repeated code.
Follow-up commits will migrate existing registers to use this new system
* DEV: Migrate user and group custom field APIs to plugin registry
This gives us a consistent system for checking plugin enabled state, so we are repeating less logic. API changes are backwards compatible
Prior to this change we would never clear memory from contexts and
rely on V8 reacting to pressure
This could lead to bloating of PrettyText and Transpiler contexts
This optimisations ensures that we will clear memory 2 seconds after
the last eval on the context
Previously we would raise a warning in the logs if downloading
a file (from s3) takes longer than 60 seconds.
At scale this happens reasonably frequently.
1. Raised the duration to 3 minutes
2. Pulled the resizing mutex out of the downloading mutex
so we have less and clearer error logs
* DEV: Standardize table sorting verbiage
This commit creates a common component that tables can use to make their
headers sortable. This commit also standardizes on using `desc` as the
default and passing in the `asc=true` flag to adjust the sorting
direction.
* Add deprecation warnings
Adds deprecation warnings if using previous params and maintains
backwards compatibility. Set the default sort value for group members to
be asc.
* switch group requests to use common table-header-toggle
* update fixture
* PERF: Dematerialize topic_reply_count
It's only ever used for trust level promotions that run daily, or compared to 0. We don't need to track it on every post creation.
* UX: Add symbol in TL3 report if topic reply count is capped
* DEV: Drop user_stats.topic_reply_count column
Co-authored-by: Mark VanLandingham <markvanlan@gmail.com>
Co-authored-by: Robin Ward <robin.ward@gmail.com>
Co-authored-by: Mark VanLandingham <markvanlan@gmail.com>
Use a helper method to simplify creating a new register. Previously this would require creating lots of different methods manually, and adding every register to the clear/reset functions
This commit prevents a 500 error from occurring if someone is trying to
setup their discourse instance as a sso provider and they don't pass in
a `return_sso_url` in their payload.
This refactors default_current_user_provider in a few ways:
- Introduce a generic `api_parameter_allowed?` method which checks for whitelisted routes/formats
- Only read the api_key parameter on allowed routes. It is now completely ignored on other routes (previously it would raise a 403)
- Start reading user_api_key parameter on allowed routes
- Refactor tests as end-end integration tests
A plugin API for PARAMETER_API_PATTERNS will be added soon
For pages that do not specify canonical URL we will default to `https://SITENAME/PATH`.
This ensures that if a URL is crawled on the CDN the search ranking will transfer to the main site.
Additionally we whitelist the `?page` param
Adds a new rake task to auto generate a constants.js file with the
constants present. This makes migrating to Ember CLI easier, but also
slightly speeds up asset compilation by having to do less work.
If the constants change you need to run:
`rake javascripts:update_constants`
There were two constants here, `INLINE_ONEBOX_LOADING_CSS_CLASS` and
`INLINE_ONEBOX_CSS_CLASS` that were both longer than the strings they
were DRYing up: `inline-onebox-loading` and `inline-onebox`
I normally appreciate constants, but in this case it meant that we had
a lot of JS imports resulting in many more lines of code (and CPU cycles
spent figuring them out.)
It also meant we had an `.erb` file and had to invoke Ruby to create the
JS file, which meant the app was harder to port to Ember CLI.
I removed the constants. It's less DRY but faster and simpler, and
arguably the loss of DRYness is not significant as you can still search
for the `inline-onebox-loading` and `inline-onebox` strings easily if
you are refactoring.
We now show an options gear icon next to the bookmark name.
When expanded we show the "delete bookmark when reminder sent" option. The value of this checkbox is saved in local storage for the user.
If this is ticked, when a reminder is sent for the bookmark the bookmark itself is deleted. This is so people can use the reminder functionality by itself.
Also remove the blue alert reminder section from the "Edit Bookmark" modal as it just added clutter, because the user can already see they had a reminder set:
Adds a default false boolean column `delete_when_reminder_sent` to bookmarks.
Locale files get precompiled after deployment and they contained translations from the `default_locale`. That's especially bad in multisites, because the initial `default_locale` is `en_US`. Sites where the `default_locale` isn't `en_US` could see missing translations. The same thing could happen when users are allowed to chose a different locale.
This change simplifies the logic by not using the `default_locale` in the locale chain. It always falls back to `en` in case of missing translations.
We were sharing `Discourse` both as an application object and a
namespace which complicated things for Ember CLI. This patch
moves raw templates into `__DISCOURSE_RAW_TEMPLATES` and adds
a couple helper methods to create/remove them.
This introduces new APIs for obtaining optimized thumbnails for topics. There are a few building blocks required for this:
- Introduces new `image_upload_id` columns on the `posts` and `topics` table. This replaces the old `image_url` column, which means that thumbnails are now restricted to uploads. Hotlinked thumbnails are no longer possible. In normal use (with pull_hotlinked_images enabled), this has no noticeable impact
- A migration attempts to match existing urls to upload records. If a match cannot be found then the posts will be queued for rebake
- Optimized thumbnails are generated during post_process_cooked. If thumbnails are missing when serializing a topic list, then a sidekiq job is queued
- Topic lists and topics now include a `thumbnails` key, which includes all the available images:
```
"thumbnails": [
{
"max_width": null,
"max_height": null,
"url": "//example.com/original-image.png",
"width": 1380,
"height": 1840
},
{
"max_width": 1024,
"max_height": 1024,
"url": "//example.com/optimized-image.png",
"width": 768,
"height": 1024
}
]
```
- Themes can request additional thumbnail sizes by using a modifier in their `about.json` file:
```
"modifiers": {
"topic_thumbnail_sizes": [
[200, 200],
[800, 800]
],
...
```
Remember that these are generated asynchronously, so your theme should include logic to fallback to other available thumbnails if your requested size has not yet been generated
- Two new raw plugin outlets are introduced, to improve the customisability of the topic list. `topic-list-before-columns` and `topic-list-before-link`
The PR https://github.com/rails/rails/pull/36612 changes the raised exception
if the error message includes the target database name.
Since the error message contains the hostname, this could be triggered when
the hostname contains the database name.
* When hovering over the bookmark icon for a post, show the name of the bookmark at the end of the tooltip _if_ it has been set.
* Order bookmarks by `updated_at DESC` in the user list and show that instead of created at.
Recently, we added feature that we are sending `/muted` to users who muted specific topic just before `/latest` so the client knows to ignore those messages - https://github.com/discourse/discourse/pull/9482
Same `/muted` message should be included when the post is edited
* Remove Handlebars.SafeString usage
* DEV: Support for `import Handlebars from 'handlebars'`;
* FIX: Sprockets was broken when `node_modules` was present
By default the old version of sprockets looks for application.js
anywhere, including in a node_modules folder if this exists
(which it will when we move to Ember CLI.)
Some providers don't implement the Expect: 100-continue support,
which results in a mismatch in the object signature.
With this settings, users can disable the header and use such providers.
TLDR; this commit vastly improves how whitespaces are handled when converting from HTML to Markdown.
It also adds support for converting HTML <tables> to markdown tables.
The previous 'remove_whitespaces!' method was traversing the whole HTML tree and used a heuristic to remove
leading and trailing whitespaces whenever it was appropriate (ie. mostly before and after HTML block elements)
It was a good idea, but it was very limited and leaded to bad conversion when the html had leading whitespaces on several lines for example.
One such example can be found [here](https://meta.discourse.org/t/86782).
For various reasons, most of the whitespaces in a HTML file is ignored when the page is being displayed in a browser.
The rules that the browsers follow are the [CSS' White Space Processing Rules](https://www.w3.org/TR/css-text-3/#white-space-rules).
They can be quite complicated when you take into account RTL languages and other various tidbits but they boils down to the following:
- Collapse whitespaces down to one space (0x20) inside an inline context (ie. nodes/tags that are being displaying on the same line)
- Remove any leading/trailing whitespaces inside an inline context
One quick & dirty way of getting this 90% solved would be to do 'HTML.gsub!(/[[:space:]]+/, " ")'.
We would also need to hoist <pre> elements in order to not mess with their whitespaces.
Unfortunately, this solution let some whitespaces creep around HTML tags which leads to more '.strip!' calls than I can bear.
I decided to "emulate" the browser's handling of whitespaces and came up with a solution in 4 parts
1. remove_not_allowed!
The HtmlToMarkdown library is recursively "visiting" all the nodes in the HTML in order to convert them to Markdown.
All the nodes that aren't handled by the library (eg. <script>, <style> or any non-textual HTML tags) are "swallowed".
In order to reduce the number of nodes visited, the method 'remove_not_allowed!' will automatically delete all the nodes
that have no "visitor" (eg. a 'visit_<tag>' method) defined.
2. remove_hidden!
Similar purpose as the previous method (eg. reducing number of nodes visited), there's no point trying to convert something that is hidden.
The 'remove_hidden!' method removes any nodes that was hidden using the "hidden" HTML attribute, some CSS or with a width or height equal to 0.
3. hoist_line_breaks!
The 'hoist_line_breaks!' method is there to handle <br> tags. I know those tiny <br> don't do much but they can be quite annoying.
The <br> tags are inline elements but they visually work like a block element (ie. they create a new line).
If you have the following HTML "<i>Foo<br>Bar</i>", it ends up visually similar to "<i>Foo</i><br><i>Bar</i>".
The latter being much more easy to process than the former, so that's what this method is doing.
The "hoist_line_breaks" will hoist <br> tags out of inline tags until their parent is a block element.
4. remove_whitespaces!
The "remove_whitespaces!" is where all the whitespace removal is happening. It's broken down into 4 methods as well
- remove_whitespaces!
- is_inline?
- collapse_spaces!
- remove_trailing_space!
The 'remove_whitespace!' method is recursively walking the HTML tree (skipping <pre> tags).
If a node has any children, they will be chunked into groups of inline elements vs block elements.
For each chunks of inline elements, it will call the "collapse_space!" and "remove_trailing_space!" methods.
For each chunks of block elements, it will call "remote_whitespace!" to keep walking the HTML tree recursively.
The "is_inline?" method determines whether a node is part of a inline context.
A node is inline iif it's a text node or it's an inline tag, but not <br>, and all its children are also inline.
The "collapse_spaces!" method will collapse any kind of (white) space into a single space (" ") character, even accros tags.
For example, if we have " Foo \n<i> Bar </i>\t42", it will return "Foo <i>Bar </i>42".
Finally, the "remove_trailing_space!" method is there to remove any trailing space that might creep in at the end of the inline chunk.
This solution is not 100% bullet-proof.
It does not support RTL languages at all and has some caveats that I felt were not worth the work to get properly fixed.
FIX: better detection of hidden elements when converting HTML to Markdown
FIX: take into account the 'allowed_href_schemes' site setting when converting HTML <a> to Markdown
FIX: added support for 'mailto:' scheme when converting <a> from HTML to Markdown
FIX: added support for <img> dimensions when converting from HTML to Markdown
FIX: added support for <dl>, <dd> and <dt> when converting from HTML to Markdown
FIX: added support for multilines emphases, strongs and strikes when converting from HTML to Markdown
FIX: added support for <acronym> when converting from HTML to Markdown
DEV: remove unused 'sanitize' gem
Wow, did you just read all that?! Congratz, here's a cookie: 🍪.
We have the `# frozen_string_literal: true` comment on all our
files. This means all string literals are frozen. There is no need
to call #freeze on any literals.
For files with `# frozen_string_literal: true`
```
puts %w{a b}[0].frozen?
=> true
puts "hi".frozen?
=> true
puts "a #{1} b".frozen?
=> true
puts ("a " + "b").frozen?
=> false
puts (-("a " + "b")).frozen?
=> true
```
For more details see: https://samsaffron.com/archive/2018/02/16/reducing-string-duplication-in-ruby
This is to help with the migration to Ember CLI. In the current running
version of Discourse everything should be the same as before, just with
a few extra files that are not used. However, using Ember CLI this can
be installed as an Ember addon.
Co-Authored-By: Jarek Radosz <jradosz@gmail.com>
rebuilding user_actions is not something that should be done.
Plugins such as solved and assigned extend it, there are tons of
little rules that were not captured in `user_actions:rebuild`
Make sure the topic_user.bookmarked column is set correctly when user bookmarks/unbookmarks any post in a topic. For example, you bookmarked a post in the topic that was not the OP, the bookmark icon in the topic list would not be shown. Same if deleting a bookmark for the last bookmarked post in a topic, the bookmark icon in the topic list would not be removed.
Previously this was only setting it to true if bookmarking the OP/topic, which was not correct -- we want to show the icon on the topic list if any post is bookmarked.
Also set to false if unbookmarking the last bookmarked post in the topic.
Also in this PR is a migration to correct any out of sync topic_user.bookmarked columns, based on the new logic.
* When copying the markdown for an image between posts, we were not adding the srcset and data-small-image attributes which are done by calling optimize_image! in cooked post processor
* Refactored the code which was confusing in its current state (the consider_for_reuse method was super confusing) and fixed the issue
* FEATURE: don't display new/unread notification for muted topics
Currently, even if user mute topic, when a new reply to that topic arrives, the user will get "See 1 new or updated topic" message. After clicking on that link, nothing is visible (because the topic is muted)
To solve that problem, we will send background message to all users who recently muted that topic that update is coming and they can ignore the next message about that topic.
DO does not implement tagging support for S3 objects. Removing our default
empty tag fixes compatibility.
The expire_missing_assets rake task can't be used with that service still,
but this patch allows normal operation.
The main thrust of this PR is to take all the conditional checks based on the `enable_bookmarks_with_reminders` away and only keep the code from the `true` path, making bookmarks with reminders the core bookmarks feature. There is also a migration to create `Bookmark` records out of `PostAction` bookmarks for a site.
### Summary
* Remove logic based on whether enable_bookmarks_with_reminders is true. This site setting is now obsolete, the old bookmark functionality is being removed. Retain the setting and set the value to `true` in a migration.
* Use the code from the rake task to create a database migration that creates bookmarks from post actions.
* Change the bookmark report to read from the new table.
* Get rid of old endpoints for bookmarks
* Link to the new bookmarks list from the user summary page
Otherwise we are trying to update the reminder type with a string which often evaluates to 0 (At Desktop) which causes reminders to come through early.
* DEV: Use `render_json_error` (Adds specs for Admin::GroupsController)
* DEV: Use a specific error on blank category slug (Fixes a `render_json_error` warning)
* DEV: Use a specific error on reviewable claim conflict (Fixes a `render_json_error` warning)
* DEV: Use specific errors in Admin::UsersController (Fixes `render_json_error` warnings)
* FIX: PublishedPages error responses
* FIX: TopicsController error responses (There was an issue of two separate `Topic` instances for the same record. This makes sure there's only one up-to-date instance.)
There is now an explicit "Delete Bookmark" button in the edit modal. A confirmation is shown before deleting.
Along with this, when the bookmarked post icon is clicked the modal is now shown instead of just deleting the bookmark. Also, the "Delete Bookmark" button from the user bookmark list now confirms the action.
Add a `d d` shortcut in the modal to delete the bookmark.