Looks like some html elements like `aside` and `section` will throw an error
when checking if they are inline or not. The commit simply handles
```
Job exception: undefined method `inline?' for nil:NilClass
```
and adds a test for it.
In some restricted setups all JS payloads need tight control.
This setting bans admins from making changes to JS on the site and
requires all themes be whitelisted to be used.
There are edge cases we still need to work through in this mode
hence this is still not supported in production and experimental.
Use an example like this to enable:
`DISCOURSE_WHITELISTED_THEME_REPOS="https://repo.com/repo.git,https://repo.com/repo2.git"`
By default this feature is not enabled and no changes are made.
One exception is that default theme id was missing a security check
this was added for correctness.
Previously the pull hotlinked images job was skipped after system edits. This ensured that we never had an infinite loop of system-edit/pull-hotlinked/system-edit/pull-hotlinked etc.
A side effect was that edits made by system for any other reason (e.g. API, removing full quotes) would prevent pulling hotlinked images. This commit removes the system edit check, and replaces it with another method to avoid an infinite job scheduling loop.
This reverts commit 20780a1eee.
* SECURITY: re-adds accidentally reverted commit:
03d26cd6: ensure embed_url contains valid http(s) uri
* when the merge commit e62a85cf was reverted, git chose the 2660c2e2 parent to land on
instead of the 03d26cd6 parent (which contains security fixes)
If a user is created with an id of 999, a `upload.user_id ==
user_avatar.user_id` will return true. This fix increases the id of the
upload to something that we will not hit in the foreseeable future.
Adds a new topic_excerpt_maxlength site setting.
* When topic excerpt is requested for a post, use the new topic_excerpt_maxlength site setting to limit the size of the excerpt
* Remove code for getting/setting Post.excerpt_size as it is not used anywhere
In some cases, between Discourse forums the hostname of a URL could match if they are hosting S3 files on the same bucket but the S3 bucket path might not. So e.g. https://testbucket.somesite.com/testpath/some/file/url.png vs https://testbucket.somesite.com/prodpath/some/file/url.png. So has_been_uploaded? was returning true for the second URL, even though it may have been uploaded on a different Discourse forum.
This is a very rare case but must be accounted for, because this impacts UrlHelper.is_local which mistakenly thinks the file has already been downloaded and thus allows the URL to be cooked, where we want to return the full URL to be downloaded using PullHotlinkedImages.
* DEV: Add framework for filtered plugin registers
Plugins often need to add values to a list, and we need to filter those lists at runtime to ignore values from disabled plugins. This commit provides a re-usable way to do that, which should make it easier to add new registers in future, and also reduce repeated code.
Follow-up commits will migrate existing registers to use this new system
* DEV: Migrate user and group custom field APIs to plugin registry
This gives us a consistent system for checking plugin enabled state, so we are repeating less logic. API changes are backwards compatible
* DEV: Standardize table sorting verbiage
This commit creates a common component that tables can use to make their
headers sortable. This commit also standardizes on using `desc` as the
default and passing in the `asc=true` flag to adjust the sorting
direction.
* Add deprecation warnings
Adds deprecation warnings if using previous params and maintains
backwards compatibility. Set the default sort value for group members to
be asc.
* switch group requests to use common table-header-toggle
* update fixture
* PERF: Dematerialize topic_reply_count
It's only ever used for trust level promotions that run daily, or compared to 0. We don't need to track it on every post creation.
* UX: Add symbol in TL3 report if topic reply count is capped
* DEV: Drop user_stats.topic_reply_count column
Use a helper method to simplify creating a new register. Previously this would require creating lots of different methods manually, and adding every register to the clear/reset functions
This refactors default_current_user_provider in a few ways:
- Introduce a generic `api_parameter_allowed?` method which checks for whitelisted routes/formats
- Only read the api_key parameter on allowed routes. It is now completely ignored on other routes (previously it would raise a 403)
- Start reading user_api_key parameter on allowed routes
- Refactor tests as end-end integration tests
A plugin API for PARAMETER_API_PATTERNS will be added soon
There were two constants here, `INLINE_ONEBOX_LOADING_CSS_CLASS` and
`INLINE_ONEBOX_CSS_CLASS` that were both longer than the strings they
were DRYing up: `inline-onebox-loading` and `inline-onebox`
I normally appreciate constants, but in this case it meant that we had
a lot of JS imports resulting in many more lines of code (and CPU cycles
spent figuring them out.)
It also meant we had an `.erb` file and had to invoke Ruby to create the
JS file, which meant the app was harder to port to Ember CLI.
I removed the constants. It's less DRY but faster and simpler, and
arguably the loss of DRYness is not significant as you can still search
for the `inline-onebox-loading` and `inline-onebox` strings easily if
you are refactoring.
Locale files get precompiled after deployment and they contained translations from the `default_locale`. That's especially bad in multisites, because the initial `default_locale` is `en_US`. Sites where the `default_locale` isn't `en_US` could see missing translations. The same thing could happen when users are allowed to chose a different locale.
This change simplifies the logic by not using the `default_locale` in the locale chain. It always falls back to `en` in case of missing translations.
The failover spec is very fragile and tests specific implementation
vs actual behavior
We rely on a different script during the build process to test
failover operates correctly
This introduces new APIs for obtaining optimized thumbnails for topics. There are a few building blocks required for this:
- Introduces new `image_upload_id` columns on the `posts` and `topics` table. This replaces the old `image_url` column, which means that thumbnails are now restricted to uploads. Hotlinked thumbnails are no longer possible. In normal use (with pull_hotlinked_images enabled), this has no noticeable impact
- A migration attempts to match existing urls to upload records. If a match cannot be found then the posts will be queued for rebake
- Optimized thumbnails are generated during post_process_cooked. If thumbnails are missing when serializing a topic list, then a sidekiq job is queued
- Topic lists and topics now include a `thumbnails` key, which includes all the available images:
```
"thumbnails": [
{
"max_width": null,
"max_height": null,
"url": "//example.com/original-image.png",
"width": 1380,
"height": 1840
},
{
"max_width": 1024,
"max_height": 1024,
"url": "//example.com/optimized-image.png",
"width": 768,
"height": 1024
}
]
```
- Themes can request additional thumbnail sizes by using a modifier in their `about.json` file:
```
"modifiers": {
"topic_thumbnail_sizes": [
[200, 200],
[800, 800]
],
...
```
Remember that these are generated asynchronously, so your theme should include logic to fallback to other available thumbnails if your requested size has not yet been generated
- Two new raw plugin outlets are introduced, to improve the customisability of the topic list. `topic-list-before-columns` and `topic-list-before-link`
Recently, we added feature that we are sending `/muted` to users who muted specific topic just before `/latest` so the client knows to ignore those messages - https://github.com/discourse/discourse/pull/9482
Same `/muted` message should be included when the post is edited
TLDR; this commit vastly improves how whitespaces are handled when converting from HTML to Markdown.
It also adds support for converting HTML <tables> to markdown tables.
The previous 'remove_whitespaces!' method was traversing the whole HTML tree and used a heuristic to remove
leading and trailing whitespaces whenever it was appropriate (ie. mostly before and after HTML block elements)
It was a good idea, but it was very limited and leaded to bad conversion when the html had leading whitespaces on several lines for example.
One such example can be found [here](https://meta.discourse.org/t/86782).
For various reasons, most of the whitespaces in a HTML file is ignored when the page is being displayed in a browser.
The rules that the browsers follow are the [CSS' White Space Processing Rules](https://www.w3.org/TR/css-text-3/#white-space-rules).
They can be quite complicated when you take into account RTL languages and other various tidbits but they boils down to the following:
- Collapse whitespaces down to one space (0x20) inside an inline context (ie. nodes/tags that are being displaying on the same line)
- Remove any leading/trailing whitespaces inside an inline context
One quick & dirty way of getting this 90% solved would be to do 'HTML.gsub!(/[[:space:]]+/, " ")'.
We would also need to hoist <pre> elements in order to not mess with their whitespaces.
Unfortunately, this solution let some whitespaces creep around HTML tags which leads to more '.strip!' calls than I can bear.
I decided to "emulate" the browser's handling of whitespaces and came up with a solution in 4 parts
1. remove_not_allowed!
The HtmlToMarkdown library is recursively "visiting" all the nodes in the HTML in order to convert them to Markdown.
All the nodes that aren't handled by the library (eg. <script>, <style> or any non-textual HTML tags) are "swallowed".
In order to reduce the number of nodes visited, the method 'remove_not_allowed!' will automatically delete all the nodes
that have no "visitor" (eg. a 'visit_<tag>' method) defined.
2. remove_hidden!
Similar purpose as the previous method (eg. reducing number of nodes visited), there's no point trying to convert something that is hidden.
The 'remove_hidden!' method removes any nodes that was hidden using the "hidden" HTML attribute, some CSS or with a width or height equal to 0.
3. hoist_line_breaks!
The 'hoist_line_breaks!' method is there to handle <br> tags. I know those tiny <br> don't do much but they can be quite annoying.
The <br> tags are inline elements but they visually work like a block element (ie. they create a new line).
If you have the following HTML "<i>Foo<br>Bar</i>", it ends up visually similar to "<i>Foo</i><br><i>Bar</i>".
The latter being much more easy to process than the former, so that's what this method is doing.
The "hoist_line_breaks" will hoist <br> tags out of inline tags until their parent is a block element.
4. remove_whitespaces!
The "remove_whitespaces!" is where all the whitespace removal is happening. It's broken down into 4 methods as well
- remove_whitespaces!
- is_inline?
- collapse_spaces!
- remove_trailing_space!
The 'remove_whitespace!' method is recursively walking the HTML tree (skipping <pre> tags).
If a node has any children, they will be chunked into groups of inline elements vs block elements.
For each chunks of inline elements, it will call the "collapse_space!" and "remove_trailing_space!" methods.
For each chunks of block elements, it will call "remote_whitespace!" to keep walking the HTML tree recursively.
The "is_inline?" method determines whether a node is part of a inline context.
A node is inline iif it's a text node or it's an inline tag, but not <br>, and all its children are also inline.
The "collapse_spaces!" method will collapse any kind of (white) space into a single space (" ") character, even accros tags.
For example, if we have " Foo \n<i> Bar </i>\t42", it will return "Foo <i>Bar </i>42".
Finally, the "remove_trailing_space!" method is there to remove any trailing space that might creep in at the end of the inline chunk.
This solution is not 100% bullet-proof.
It does not support RTL languages at all and has some caveats that I felt were not worth the work to get properly fixed.
FIX: better detection of hidden elements when converting HTML to Markdown
FIX: take into account the 'allowed_href_schemes' site setting when converting HTML <a> to Markdown
FIX: added support for 'mailto:' scheme when converting <a> from HTML to Markdown
FIX: added support for <img> dimensions when converting from HTML to Markdown
FIX: added support for <dl>, <dd> and <dt> when converting from HTML to Markdown
FIX: added support for multilines emphases, strongs and strikes when converting from HTML to Markdown
FIX: added support for <acronym> when converting from HTML to Markdown
DEV: remove unused 'sanitize' gem
Wow, did you just read all that?! Congratz, here's a cookie: 🍪.
We have the `# frozen_string_literal: true` comment on all our
files. This means all string literals are frozen. There is no need
to call #freeze on any literals.
For files with `# frozen_string_literal: true`
```
puts %w{a b}[0].frozen?
=> true
puts "hi".frozen?
=> true
puts "a #{1} b".frozen?
=> true
puts ("a " + "b").frozen?
=> false
puts (-("a " + "b")).frozen?
=> true
```
For more details see: https://samsaffron.com/archive/2018/02/16/reducing-string-duplication-in-ruby