This reverts commit 20780a1eee.
* SECURITY: re-adds accidentally reverted commit:
03d26cd6: ensure embed_url contains valid http(s) uri
* when the merge commit e62a85cf was reverted, git chose the 2660c2e2 parent to land on
instead of the 03d26cd6 parent (which contains security fixes)
Upgrades Rails to latest, this version has better compatibility
with Ruby 2.7
During the upgrade we needed a new cleaner mechanism for configuring
message bus.
All tests are green.
If anything weird pops up please revert.
This reverts commit e23f1a9071.
Reverting as this currently breaks our plugin linting job in GithHub Action and Jenkins. Will re-revert after all the plugins get the latest rubocop config and/or a (potential) rubocop issue is fixed.
rspec-rails 4.0 was released so we no longer need to depend on a
beta version. Also updates minor on a bunch of rspec gems.
Thanks to @ryanwi for raising this.
TLDR; this commit vastly improves how whitespaces are handled when converting from HTML to Markdown.
It also adds support for converting HTML <tables> to markdown tables.
The previous 'remove_whitespaces!' method was traversing the whole HTML tree and used a heuristic to remove
leading and trailing whitespaces whenever it was appropriate (ie. mostly before and after HTML block elements)
It was a good idea, but it was very limited and leaded to bad conversion when the html had leading whitespaces on several lines for example.
One such example can be found [here](https://meta.discourse.org/t/86782).
For various reasons, most of the whitespaces in a HTML file is ignored when the page is being displayed in a browser.
The rules that the browsers follow are the [CSS' White Space Processing Rules](https://www.w3.org/TR/css-text-3/#white-space-rules).
They can be quite complicated when you take into account RTL languages and other various tidbits but they boils down to the following:
- Collapse whitespaces down to one space (0x20) inside an inline context (ie. nodes/tags that are being displaying on the same line)
- Remove any leading/trailing whitespaces inside an inline context
One quick & dirty way of getting this 90% solved would be to do 'HTML.gsub!(/[[:space:]]+/, " ")'.
We would also need to hoist <pre> elements in order to not mess with their whitespaces.
Unfortunately, this solution let some whitespaces creep around HTML tags which leads to more '.strip!' calls than I can bear.
I decided to "emulate" the browser's handling of whitespaces and came up with a solution in 4 parts
1. remove_not_allowed!
The HtmlToMarkdown library is recursively "visiting" all the nodes in the HTML in order to convert them to Markdown.
All the nodes that aren't handled by the library (eg. <script>, <style> or any non-textual HTML tags) are "swallowed".
In order to reduce the number of nodes visited, the method 'remove_not_allowed!' will automatically delete all the nodes
that have no "visitor" (eg. a 'visit_<tag>' method) defined.
2. remove_hidden!
Similar purpose as the previous method (eg. reducing number of nodes visited), there's no point trying to convert something that is hidden.
The 'remove_hidden!' method removes any nodes that was hidden using the "hidden" HTML attribute, some CSS or with a width or height equal to 0.
3. hoist_line_breaks!
The 'hoist_line_breaks!' method is there to handle <br> tags. I know those tiny <br> don't do much but they can be quite annoying.
The <br> tags are inline elements but they visually work like a block element (ie. they create a new line).
If you have the following HTML "<i>Foo<br>Bar</i>", it ends up visually similar to "<i>Foo</i><br><i>Bar</i>".
The latter being much more easy to process than the former, so that's what this method is doing.
The "hoist_line_breaks" will hoist <br> tags out of inline tags until their parent is a block element.
4. remove_whitespaces!
The "remove_whitespaces!" is where all the whitespace removal is happening. It's broken down into 4 methods as well
- remove_whitespaces!
- is_inline?
- collapse_spaces!
- remove_trailing_space!
The 'remove_whitespace!' method is recursively walking the HTML tree (skipping <pre> tags).
If a node has any children, they will be chunked into groups of inline elements vs block elements.
For each chunks of inline elements, it will call the "collapse_space!" and "remove_trailing_space!" methods.
For each chunks of block elements, it will call "remote_whitespace!" to keep walking the HTML tree recursively.
The "is_inline?" method determines whether a node is part of a inline context.
A node is inline iif it's a text node or it's an inline tag, but not <br>, and all its children are also inline.
The "collapse_spaces!" method will collapse any kind of (white) space into a single space (" ") character, even accros tags.
For example, if we have " Foo \n<i> Bar </i>\t42", it will return "Foo <i>Bar </i>42".
Finally, the "remove_trailing_space!" method is there to remove any trailing space that might creep in at the end of the inline chunk.
This solution is not 100% bullet-proof.
It does not support RTL languages at all and has some caveats that I felt were not worth the work to get properly fixed.
FIX: better detection of hidden elements when converting HTML to Markdown
FIX: take into account the 'allowed_href_schemes' site setting when converting HTML <a> to Markdown
FIX: added support for 'mailto:' scheme when converting <a> from HTML to Markdown
FIX: added support for <img> dimensions when converting from HTML to Markdown
FIX: added support for <dl>, <dd> and <dt> when converting from HTML to Markdown
FIX: added support for multilines emphases, strongs and strikes when converting from HTML to Markdown
FIX: added support for <acronym> when converting from HTML to Markdown
DEV: remove unused 'sanitize' gem
Wow, did you just read all that?! Congratz, here's a cookie: 🍪.
pry-nav is not yet supported on latest pry, this holds off on
upgrading pry, which in turn holds off on upgrading deps
Stripping pry-nav for now till it works with latest pry
Latest version of Rails contains compatibility fixes for Ruby 2.7 and some
minor security fixes we would like to have
It also broke some of the multisite tests.
Rails tries to use the same connection for reading from a replica as writing
to the leader during tests, because, with everything happening in a
transaction, changes to the DB wouldn't otherwise be reflected in the
replica connection.
The difference now is that Rails tries to do this for connections opened
after the test has started which affected rails multisite connections.
The upshot of this is that, as things stand, you are likely to
experience problems if you try to connect to a different multisite DB in
a test when the `current_db` is not 'default'.
This adds rubocop-rspec, and enables some cops that were either already passing or are passing now, after fixing them in this commit.
Some new cops are disabled for now, with annotation: "TODO" or "To be decided". Those either need to be discussed first, or require manual changes, or the number of found and fixed offenses is too large to bundle them up in a single PR.
Includes:
* DEV: Update rubocop's `TargetRubyVersion` to 2.6
* DEV: Enable RSpec/VoidExpect
* DEV: Enable RSpec/SharedContext
* DEV: Enable RSpec/EmptyExampleGroup (Removed an obsolete empty spec file)
* DEV: Enable RSpec/ItBehavesLike
* DEV: Remove RSpec/ScatteredLet (It's too strict, as it doesn't recognize fab! as a let-like)
* DEV: Remove RSpec/MultipleExpectations
json is shipped out of sync with Ruby. Even though we use OJ for many things
we still use the json gem sometimes, this ensures we use the latest
b8b29e79ad/config/initializers/100-oj.rb (L9-L9)
* Remove some `.es6` from comments where it does not matter
* Use a post processor for transpilation
This will allow us to eventually use the directory structure to
transpile rather than the extension.
* FIX: Some errors and clean up in confirm-new-email
It would throw an error if the webauthn element wasn't present.
Also I changed things so that no-module is not explicitly
referenced.
* Remove `no-module`
Instead we allow a magic comment: `// discourse-skip-module` to prevent
the asset pipeline from creating a module.
* DEV: Enable babel transpilation based on directory
If it's in `app/assets/javascripts/dicourse` it will be transpiled
even without the `.es6` extension.
* REFACTOR: Remove Tilt/ES6ModuleTranspiler
byebug, ruby-prof, better_errors and rbtrace are very MRI specific, flag
them as such
This helps move forward on potential jruby and truffleruby experiments
This is not used in core or official plugins, and has been printing a deprecation notice since v2.3.0beta4. All OpenID 2.0 code and dependencies have been dropped. The user_open_ids table remains for now, in case anyone has missed the deprecation notice, and needs to migrate their data.
Context at https://meta.discourse.org/t/-/113249
* DEV: Use Ember 3.12.2
* Add Ember version to ThemeField's DEPENDENT_CONSTANTS
* DEV: Use `id` instead of `elementId` (See: https://github.com/emberjs/ember.js/issues/18147)
* FIX: Don't leak event listeners (bug introduced in 999e2ff)
We can not upgrade rack cause it breaks Sidekiq web.
I can not find a trivial fix short of disabling sessions in Sidekiq which
is a security concern.
We need to figure out how to reuse sessions with our Rails application in
Sidekiq.
This gets extra complex cause we use a special cookie store for sessions.
9e399b42b9/lib/discourse_cookie_store.rb (L3-L21)
The ROTP gem is only used in a very small amount of places in the app, we don't need to globally require it.
Also set the Addressable gem to not have a specific version range, as it has not been a problem yet.
Some slight refactoring of UserSecondFactor here too to use SecondFactorManager to avoid code repetition
The following methods have long been deprecated in ruby due to flaws in their implementation per http://blade.nagaokaut.ac.jp/cgi-bin/vframe.rb/ruby/ruby-core/29293?29179-31097:
URI.escape
URI.unescape
URI.encode
URI.unencode
escape/encode are just aliases for one another. This PR uses the Addressable gem to replace these methods with its own encode, unencode, and encode_component methods where appropriate.
I have put all references to Addressable::URI here into the UrlHelper to keep them corralled in one place to make changes to this implementation easier.
Addressable is now also an explicit gem dependency.
Previously it was unclear why certain gems are being held back cause Gemfile
had no comment explaining it.
I tried to add some explanation from memory and remove some exceptions that
seemed to be superfluous.
This upgrades shoulda to latest, it appears to work once a couple of assertions
are removed
Also update http accept language used to auto detect language from http header
this is tested
Zeitwerk small update seems fine
This reverts commit ab74a50d85.
We really want to upgrade redis, but discovered some edge cases
around failover we need to test.
Holding off on the upgrade till a bit more testing happens