* FIX: Add post id to the anchor to prevent two identical anchors
We generate anchors for headings in posts. This works fine if there is
only one post in a topic with anchors. The problem comes when two or
more posts have the same heading. PrettyText generates anchors based on
the heading text in the raw content of each post, so it is entirely
possible to generate the same anchor for two posts in the same topic,
especially in topics with template replies:
Post 1:
# heading
content
Post 2:
# heading
content
When both posts are on the page at the same time, the anchor will only
work for the first post, according to the [HTML specification](https://html.spec.whatwg.org/multipage/browsing-the-web.html#scroll-to-the-fragment-identifier).
> If there is an a element in the document tree whose root is document
> that has a name attribute whose value is equal to fragment, then
> return the *first* such element in tree order.
This bug is particularly serious in forums with non-Latin languages,
such as Chinese. We do not generate slugs for Chinese, which results in
the heading anchors being completely dependent on their order.
```ruby
[2] pry(main)> PrettyText.cook("# 中文")
=> "<h1><a name=\"h-1\" class=\"anchor\" href=\"#h-1\"></a>中文</h1>"
```
Therefore, the anchors in the two posts end up exactly the same (they
depend only on heading order), which makes almost all of the anchors in
the second post invalid.
This commit solves this problem by adding the `post_id` to the anchor.
The new anchor generation method will add `p-{post_id}` as a prefix when
post_id is available:
```ruby
[3] pry(main)> PrettyText.cook("# 中文", post_id: 1234)
=> "<h1><a name=\"p-1234-h-1\" class=\"anchor\" href=\"#p-1234-h-1\"></a>中文</h1>"
```
This way we can ensure that each anchor name appears only once in a
topic. Using the post id also avoids potential anchor-name collisions
when splitting or merging topics.
The AdminPlugin JS model uses a similar pattern to chat models,
where it is a plain JS class manually converting provided
snake_case attributes from the serializer to JS camelCase.
However, this doesn't work when plugins use `add_to_serializer`,
since core does not know about these new attributes.
Instead, we can use a JS function to convert snake_case to camelCase
and use that when initializing AdminPlugin. This commit also moves
similar functions to a new case-converter.js file in
discourse-common/lib.
When a post is cooked the links are extracted and `TopicLink` instances
are created for each of them. These links are used in various places,
including the topic view, user summary page, etc.
In the previous commit 48e5d1a, hotlinked images from Oneboxes were
excluded from link extraction, but hotlinked images turned into
lightboxes were still extracted.
Inline the helper functions, avoid creating and then immediately destructuring arrays, use complete strings instead of string interpolation, and use a Map instead of a plain object.
This fixes `PrettyText.make_all_links_absolute` to better handle subfolder installs.
In a subfolder install, when given the cooked version of a post, links to mentions already include the `Discourse.base_path` prefix. Adding the `Discourse.base_url` on top was doubling the `Discourse.base_path`.
The issue was hidden by the specs, which were stubbing `Discourse.base_url` instead of relying on `Discourse.base_path`.
This fixes both the "algorithm" used in `PrettyText.make_all_links_absolute` to better handle this case and corrects the specs to properly cover subfolder cases.
There are lots of changes in the specs due to a refactoring to use squiggly heredoc strings for easier reading and less escaping.
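A minimal sketch of the corrected idea, assuming Nokogiri-style traversal and ActiveSupport helpers (illustrative only, not the exact Discourse implementation): prepend only the scheme and host, so hrefs that already carry `Discourse.base_path` are not doubled.
```ruby
# Illustrative sketch; the real method lives in lib/pretty_text.rb and may differ.
def make_all_links_absolute(doc)
  site_uri = URI(Discourse.base_url)

  doc.css("a[href]").each do |link|
    href = link["href"]
    next if href.blank? || href.start_with?("mailto:", "#")

    uri = URI(href) rescue nil
    next if uri.nil? || uri.host.present? # already absolute

    # The cooked href already contains base_path in subfolder setups,
    # so only scheme://host is prepended here.
    link["href"] = "#{site_uri.scheme}://#{site_uri.host}#{href}"
  end
end
```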
This method name is a bit confusing; with_secure_uploads implies
it may yield a block or return the uploads of the post,
and has_secure_uploads implies that it checks whether the post
is linked to any secure uploads.
should_secure_uploads? communicates the true intent of this method --
which is to say whether uploads attached to this post should be
secure or not.
Before this commit, we had a yarn package set up in the root directory and also in `app/assets/javascripts`. That meant two `yarn install` calls and two `node_modules` directories. This commit merges them both into the root location, and updates references to node_modules.
A previous attempt can be found at https://github.com/discourse/discourse/pull/21172. This commit re-uses that script to merge the `yarn.lock` files.
Co-authored-by: Jarek Radosz <jradosz@gmail.com>
JS assets added by plugins via `register_asset` will be outside the `assets/javascripts` directory, and are therefore exempt from being transpiled. That means that there isn't really any need to run them through DiscourseJsProcessor. Instead, we can just concatenate them together, and avoid the need for all the sprockets-wrangling.
This commit also takes the opportunity to clean up a number of plugin-asset-related codepaths which are no longer required (e.g. globs, handlebars)
With Embroider, we can rely on async `import()` to do the splitting
for us.
This commit extracts from `pretty-text` all the parts that are
meant to be loaded async into a new `discourse-markdown-it` package
that is also a V2 addon (meaning that all files are presumed unused
until they are imported, aka "static").
Mostly I tried to keep the very Discourse-specific stuff (accessing
site settings and loading plugin features) inside Discourse proper,
while the new package aims to have some semblance of a general
purpose library, a MarkdownIt++ if you will. It is far from perfect
because of how all the "options" stuff works, but I think it's a good
start for more refactorings (clearing up the interfaces) to happen
later.
With this, pretty-text and app/lib/text are mostly a kitchen sink
of loosely related text processing utilities.
After the refactor, a lot more of the code related to setting up the
engine is loaded lazily, which should be a pretty nice win. I
also noticed that we are currently pulling in the `xss` library at
initial load to power the "sanitize" stuff, but I suspect with a
similar refactoring effort those usages can be removed too. (See
also #23790).
This PR does not attempt to fix the sanitize issue, but I think it
sets things up on the right trajectory for that to happen later.
Co-authored-by: David Taylor <david@taylorhq.com>
Preloading just metadata is not always respected by browsers, and
sometimes the whole video will be downloaded. This switches to using a
placeholder image for the video and only loads the video when the play
button is clicked.
This commit removes any logic in the app and in specs around
enable_experimental_hashtag_autocomplete and deletes some
old category hashtag code that is no longer necessary.
It also adds a `slug_ref` category instance method, which
generates a reference like `parent:child` for a category,
with an optional depth; hashtags use this. Also refactors
PostRevisor, which was using CategoryHashtagDataSource directly,
which is a no-no.
Deletes the old hashtag markdown rule as well.
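A hypothetical usage sketch of `slug_ref`, based only on the description above (the depth keyword and return values are assumptions):
```ruby
# Assumes a category "announcements" that is a child of "meta"
category = Category.find_by(slug: "announcements")
category.slug_ref(depth: 2) # => "meta:announcements" (includes the parent)
category.slug_ref(depth: 1) # => "announcements"
```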
These avatar-related helper functions are used in pretty-text, which currently means we load the entire `discourse/lib/utilities` module into the mini-racer when running pretty-text on the server side. This stops us adding any logic or imports to discourse/lib/utilities which may depend on other `discourse/` namespace features.
This commit moves the avatar-related utils into a dedicated module in the `discourse-common` namespace, adds backwards-compatibility shims, and updates the pretty-text config accordingly.
https://meta.discourse.org/t/markdown-preview-and-result-differ/263878
The result of this markdown differed between the composer preview and the post. This is solved by updating Loofah to the latest version and using HTML5 fragments, as the user had reported. While the change was only needed in cooked_post_processor.rb for this fix, other areas also had to be updated due to various side effects.
Added a new modifier hook to allow plugins to modify the @mentions
extracted from a cooked text.
Use case: some plugins may change how mentions are cooked to prevent
them from being confused with user or group mentions and from displaying
the user card.
This modifier hook allows a plugin to filter the detected mentions or add
new ways of adding mentions to cooked text.
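A hypothetical plugin.rb sketch of how such a hook might be used; the modifier name and block arguments here are assumptions, not the actual API:
```ruby
# plugin.rb (illustrative only)
register_modifier(:post_extracted_mentions) do |mentions, cooked_doc|
  # Drop mentions this plugin cooks differently so they are not treated
  # as user/group mentions and do not trigger the user card.
  mentions.reject { |name| name.start_with?("ai-") }
end
```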
Tests were sometimes failing with errors similar to the following:
```
1) UsernameChanger#override when unicode_usernames is off overrides the username if a new name has different case
Failure/Error:
protect { v8.eval(<<~JS) }
__paths = #{paths_json};
__utils.avatarImg({size: #{size.inspect}, avatarTemplate: #{avatar_template.inspect}}, __getURL);
JS
MiniRacer::RuntimeError:
ReferenceError: __optInput is not defined
# JavaScript at exports.helperContext (<anonymous>:21:17)
# JavaScript at getRawAvatarSize (<anonymous>:108:49)
# JavaScript at avatarUrl (<anonymous>:102:21)
# JavaScript at Object.avatarImg (<anonymous>:129:15)
# JavaScript at <anonymous>:2:9
# ./lib/pretty_text.rb:259:in `block in avatar_img'
# ./lib/pretty_text.rb:661:in `block in protect'
# ./lib/pretty_text.rb:661:in `synchronize'
# ./lib/pretty_text.rb:661:in `protect'
# ./lib/pretty_text.rb:259:in `avatar_img'
# ./app/jobs/regular/update_username.rb:14:in `execute'
```
This should not be needed, as it should already have been initialised, but it should stop the flakiness for now while being a safe change.
* FEATURE: reduce avatar sizes to 6 from 20
This PR introduces 3 changes:
1. SiteSetting.avatar_sizes now does what it says on the tin.
Previously it would introduce a large number of extra sizes to allow for
various DPIs. Instead, we now trust the admin with the size list.
2. When `avatar_sizes` changes, we ensure consistency and remove resized
avatars that are no longer allowed per the site setting. This happens in the
12-hourly job and is limited out of the box to 20k cleanups per cycle, given
this may reach out to AWS 20k times to remove things.
3. Our default avatar sizes are now "24|48|72|96|144|288". These sizes were
very specifically picked to limit the amount of blurriness introduced by WebKit.
Our avatars are already blurry due to the 1px border, so this corrects the old blur.
This change heavily reduces storage required by forums which simplifies
site moves and more.
Co-authored-by: David Taylor <david@taylorhq.com>
This commit makes some fundamental changes to how hashtag cooking and
icon generation works in the new experimental hashtag autocomplete mode.
Previously we cooked the appropriate SVG icon with the cooked hashtag,
though this has proved inflexible especially for theming purposes.
Instead, we now cook a data-ID attribute with the hashtag and add a new
span as an icon placeholder. This placeholder is replaced on the client
side with an icon (or a square span in the case of categories) via the
decorateCooked API for posts and chat messages.
This client side logic uses the generated hashtag, category, and channel
CSS classes added in a previous commit.
This is missing changes to the sidebar to use the new generated CSS
classes and also colors and the split square for categories in the
hashtag autocomplete menu -- I will tackle this in a separate PR so it
is clearer.
Watched words were converted to regular expressions containing \W, which
handles only ASCII characters. Using [^[:word:]] instead ensures that
UTF-8 characters are also handled correctly.
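For example, in plain Ruby (the watched word here is just an illustration):
```ruby
# Ruby's \W treats any non-ASCII character as a non-word character, so a
# UTF-8 letter counts as a boundary and causes false positives:
"приветслово".match?(/\Wслово/)          # => true  (false positive)
# [[:word:]] includes UTF-8 letters, so the boundary behaves correctly:
"приветслово".match?(/[^[:word:]]слово/) # => false
" слово".match?(/[^[:word:]]слово/)      # => true
```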
This feature allows sites to define which emoji are not allowed. Emoji in this list are excluded from the set shown in the core emoji picker used in the composer for posts (when emoji are enabled), and they cannot be chosen for chat messages or chat reactions.
This feature prevents denied emoji from appearing in the following scenarios:
- topic title and page title
- private messages (topic title and body)
- inserting emojis into a chat
- reacting to chat messages
- using the emoji picker (composer, user status etc)
- using search within emoji picker
It also takes into account the various ways that emojis can be accessed, such as:
- emoji autocomplete suggestions
- emoji favourites (auto populates when adding to emoji deny list for example)
- emoji inline translations
- emoji skintones (ie. for certain hand gestures)
This was added in d3f02a1270
for hashtags, but its usage was later removed in
b2acc416e7. It was removed because
serializing the user does not include things like their
secure_categories.
It is not used by any other plugins or themes, and it can error when
operating on a null user. It is better to just pass in the user_id and
use it to look up a user directly in PrettyText::Helpers.
There was an issue where, if hashtag-cooked HTML was sent
to the ExcerptParser without the keep_svg option, we would
end up with stray </use> and </svg> tags in the parts of the
excerpt where the hashtag was, in this case when a post
push notification was sent.
Fixed this, and also added a way to only display a plaintext
version of the hashtag for cases like this via PrettyText#excerpt.
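A hypothetical call sketch; the exact option name for the plaintext hashtag behaviour is an assumption based on the description above:
```ruby
# Illustrative only; the option name is an assumption
cooked_html = post.cooked
PrettyText.excerpt(cooked_html, 100, plain_hashtags: true)
```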
* FIX: Use Category.secured(guardian) for hashtag datasource
Follow up to comments in #19219, changing the category
hashtag datasource to use Category.secured(guardian) instead
of Site.new(guardian).categories here since the latter does
more work for not much benefit, and the query time is the
same. It also eliminates some Hash -> Model back-and-forth
busywork, and adds some more specs.
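A rough sketch of the shape of the change, e.g. in a Rails console (illustrative; the real datasource query differs):
```ruby
guardian = Guardian.new(User.find_by(username: "some_admin"))
term = "supp"

# Before: Site.new(guardian).categories plus Hash -> Model conversions.
# After (illustrative): query secured categories directly.
Category.secured(guardian).where("slug ILIKE ?", "%#{term}%").limit(5)
```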
* FIX: Server-side hashtag lookup cooking user loading
When we were using the PrettyText.options.currentUser
and parsing back and forth with JSON for the hashtag
lookups server-side, we had a bug where the user's
secure categories were not loaded since we never actually
loaded a User model from the database, only parsed it
from JSON.
This commit fixes the issue by instead using the
PrettyText.options.userId and looking up the user directly
from the database when calling hashtag_lookup via the
PrettyText::Helpers code when cooking server-side. Added
the missing spec to check for this as well.
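A minimal sketch of the shape of the fix, assuming the helper receives the cooking user's id (illustrative only, not the actual code):
```ruby
# lib/pretty_text/helpers.rb (illustrative)
def hashtag_lookup(slug, cooking_user_id, types_in_priority_order)
  # Load a real ActiveRecord user so guardian checks see secure categories;
  # fall back to the system user when cooking without a user.
  cooking_user = User.find_by(id: cooking_user_id) || Discourse.system_user

  HashtagAutocompleteService
    .new(Guardian.new(cooking_user))
    .lookup([slug], types_in_priority_order)
end
```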
Note that we don't have a database table or a model for post mentions yet; I decided to implement this without adding one to avoid heavy data migrations. Still, we may want to add such a model later, which would be convenient; we already have such a model for mentions in chat.
Note that the status appears on mentions in all posts in a topic, except when you have just posted a new post and it appears at the bottom of the topic. On such posts, the status won't be shown immediately for now (you'll need to reload the page to see it). I'll take care of that in one of the following PRs.
This commit fleshes out and adds functionality for the new `#hashtag` search and
lookup system, still hidden behind the `enable_experimental_hashtag_autocomplete`
feature flag.
**Serverside**
We have two plugin API registration methods that are used to define data sources
(`register_hashtag_data_source`) and hashtag result type priorities depending on
the context (`register_hashtag_type_in_context`). Reading the comments in plugin.rb
should make it clear what these are doing. Reading the `HashtagAutocompleteService`
in full will likely help a lot as well.
Each data source is responsible for providing its own **lookup** and **search**
method that returns hashtag results based on the arguments provided. For example,
the category hashtag data source has to take into account parent categories and
how they relate, and each data source has to define their own icon to use for the
hashtag, and so on.
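A hypothetical plugin.rb sketch based on the descriptions above; the exact argument lists are documented in plugin.rb and may differ from what is shown here:
```ruby
# MyPlugin::ProjectHashtagDataSource is a hypothetical class implementing the
# lookup/search/icon contract described above.
register_hashtag_data_source(MyPlugin::ProjectHashtagDataSource)

# Give "project" results a higher priority than other types when composing a
# topic (context name and priority value are assumptions).
register_hashtag_type_in_context("project", "topic-composer", 150)
```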
The `Site` serializer has two new attributes that source data from `HashtagAutocompleteService`.
There is `hashtag_icons` that is just a simple array of all the different icons that
can be used for allowlisting in our markdown pipeline, and there is `hashtag_context_configurations`
that is used to store the type priority orders for each registered context.
When sending emails, we cannot render the SVG icons for hashtags, so
we need to change the HTML hashtags to the normal `#hashtag` text.
**Markdown**
The `hashtag-autocomplete.js` file is where I have added the new `hashtag-autocomplete`
markdown rule, and like all of our rules it is used to cook the raw text on both the client side
and the server side (using MiniRacer). Only on the server side do we actually reach out to
the database with the `hashtagLookup` function; on the client side we just render a plainer
version of the hashtag HTML. Only in the composer preview do we do further lookups based
on this.
This rule is the first one (that I can find) that uses the `currentUser` based on a passed
in `user_id` for guardian checks in markdown rendering code. This is the `last_editor_id`
for both the post and chat message. In some cases we need to cook without a user present,
so the `Discourse.system_user` is used in this case.
**Chat Channels**
This also contains the changes required for chat so that chat channels can be used
as a data source for hashtag searches and lookups. This data source will only be
used when `enable_experimental_hashtag_autocomplete` is `true`, so we don't have
to worry about channel results suddenly turning up.
------
**Known Rough Edges**
- Onebox excerpts will not render the icon svg/use tags, I plan to address that in a follow up PR
- Selecting a hashtag + pressing the Quote button will result in weird behaviour, I plan to address that in a follow up PR
- Mixed hashtag contexts for hashtags without a type suffix will not work correctly, e.g. #ux which is both a category and a channel slug will resolve to a category when used inside a post or within a [chat] transcript in that post. Users can get around this manually by adding the correct suffix, for example ::channel. We may get to this at some point in future
- Icons will not show for the hashtags in emails since SVG support is so terrible in email (this is not likely to be resolved, but still noting for posterity)
- Additional refinements and review fixes will follow.
This commit renames all secure_media related settings to secure_uploads_* along with the associated functionality.
This is being done because "media" does not really cover it; we aren't just doing this for images and videos, but for all uploads on the site.
Additionally, in the future we want to secure more types of uploads, and enable a kind of "mixed mode" where some uploads are secure and some are not, so keeping "media" in the name is just confusing.
This also keeps compatibility with the `secure-media-uploads` path, and changes new
secure URLs to be `secure-uploads`.
Deprecated settings:
* secure_media -> secure_uploads
* secure_media_allow_embed_images_in_emails -> secure_uploads_allow_embed_images_in_emails
* secure_media_max_email_embed_image_size_kb -> secure_uploads_max_email_embed_image_size_kb
We were already compiling the markdown bundle via ember-cli, but that version was only being used in the test environment. This commit improves the implementation, and updates the filename so it's also used in production.
This commit also
- Removes the vendored copy of `markdown-it.js` and fetches from node_modules instead
- Updates `pretty_text.rb` to remove the custom sprockets-manifest-parsing
- Removes `pretty-text-bundle.js`, which was only being used by `pretty_text.rb`
This is a much better description of its function. It performs idempotent normalization of a URL. If consumers truly need to `encode` a URL (including double-encoding of existing encoded entities), they can use the existing `.encode` method.
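An illustrative contrast between the two behaviours; the method and receiver names below are assumptions, since the renamed method is not spelled out here:
```ruby
# Illustrative only; names are assumptions
UrlHelper.normalized_encode("https://example.com/a b")   # => "https://example.com/a%20b"
UrlHelper.normalized_encode("https://example.com/a%20b") # => "https://example.com/a%20b"  (idempotent)
UrlHelper.encode("https://example.com/a%20b")            # => "https://example.com/a%2520b" (double-encoded)
```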
* FEATURE: Add case-sensitivity flag to watched_words
Currently, all watched words are matched case-insensitively. This flag
allows a watched word to be flagged for case-sensitive matching.
To allow for backwards compatibility, the flag is set to false by
default.
* FEATURE: Support case-sensitive creation of Watched Words via API
Extend admin creation and upload of Watched Words to support the
case-sensitive flag. This lays the groundwork for supporting
case-sensitive matching of Watched Words.
Support for an extra column has also been introduced for the Watched
Words upload CSV file. The new column structure is as follows:
word,replacement,case_sensitive
* FEATURE: Enable case-sensitive matching of Watched Words
WordWatcher's word_matcher_regexp now returns a list of regular
expressions instead of one case-insensitive regular expression.
With the ability to flag a Watched Word as case-sensitive, an action
can have words of both sensitivities. This makes the global
Regexp::IGNORECASE flag that was added to all words problematic.
To get around platform limitations around the use of subexpression level
switches/flags, a list of regular expressions is returned instead, one for each
case sensitivity.
Word matching has also been updated to use this list of regular expressions
instead of one.
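A minimal sketch of the idea (assumed shape, not the actual WordWatcher code): build one compiled regular expression per case-sensitivity group instead of a single regexp with a global IGNORECASE flag.
```ruby
def word_matcher_regexp_list(words)
  words
    .group_by { |w| w[:case_sensitive] }
    .map do |case_sensitive, group|
      source = group.map { |w| Regexp.escape(w[:word]) }.join("|")
      Regexp.new("(?:#{source})", case_sensitive ? nil : Regexp::IGNORECASE)
    end
end

word_matcher_regexp_list([
  { word: "Apple", case_sensitive: true },
  { word: "banana", case_sensitive: false },
])
# => [/(?:Apple)/, /(?:banana)/i]
```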
* FEATURE: Use case-sensitive regular expressions for Watched Words
Update Watched Words regular expressions matching and processing to handle
the extra metadata which comes along with the introduction of
case-sensitive Watched Words.
This allows case-sensitive Watched Words to be matched as such.
* DEV: Simplify type casting of case-sensitive flag from uploads
Use builtin semantics instead of a custom method for converting
string case flags in uploaded Watched Words to boolean.
* UX: Add case-sensitivity details to Admin Watched Words UI
Update Watched Word form to include a toggle for case-sensitivity.
This also adds support for case-sensitive testing and matching of Watched Words
in the admin UI.
* DEV: Code improvements from review feedback
- Extract watched word regex creation out to a utility function
- Make JS array presence check more explicit and readable
* DEV: Extract Watched Word regex creation to utility function
Clean-up work from review feedback. Reduce code duplication.
* DEV: Rename word_matcher_regexp to word_matcher_regexp_list
Since a list is returned now instead of a single regular expression,
change `word_matcher_regexp` to `word_matcher_regexp_list` to better communicate
this change.
* DEV: Incorporate WordWatcher updates from upstream
Resolve conflicts and ensure apply_to_text does not remove non-word characters in matches
that aren't at the beginning of the line.
This commit introduces a new site setting: `block_hotlinked_media`. When enabled, all attempts to hotlink media (images, videos, and audio) will fail, and be replaced with a linked placeholder. Exceptions to the rule can be added via `block_hotlinked_media_exceptions`.
`download_remote_image_to_local` can be used alongside this feature. In that case, hotlinked images will be blocked immediately when the post is created, but will then be replaced with the downloaded version a few seconds later.
This implementation is purely server-side, and does not impact the composer preview.
Technically, there are two stages to this feature:
1. `PrettyText.sanitize_hotlinked_media` is called during `PrettyText.cook`, and whenever new images are introduced by Onebox. It will iterate over all src/srcset attributes in the post HTML and check if they're allowed. If not, the attributes will be removed and replaced with a `data-blocked-hotlinked-src(set)` attribute
2. In the `CookedPostProcessor`, we iterate over all `data-blocked-hotlinked-src(set)` attributes and check whether we have a downloaded version of the media. If yes, we update the src to use the downloaded version. If not, the entire media element is replaced with a placeholder. The placeholder is labelled 'external media', and is a link to the offsite media.
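For reference, enabling the feature from a console might look like the following; the value format for the exceptions list is an assumption (list settings are typically pipe-delimited):
```ruby
# Illustrative only
SiteSetting.block_hotlinked_media = true
SiteSetting.block_hotlinked_media_exceptions = "https://media.example.com|https://cdn.example.org"
```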
This option will make it so the [quote] bbcode will always
include the HTML link to the quoted post, even if a topic_id
is not provided in the PrettyText#cook options. This is so
[quote] bbcode can be used in other places, like chat messages,
that always need the link and do not have an "off-topic" ID
to use.
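A hypothetical call sketch; the option name is an assumption based on the description above:
```ruby
# Illustrative only; the option name is an assumption
PrettyText.cook(<<~MD, force_quote_link: true)
  [quote="bob, post:1, topic:99"]
  Original message
  [/quote]
MD
```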
This will allow pretty text deprecations / errors / warnings to appear in the Rails logs, rather than disappearing silently.
(implementation adapted from `discourse_js_processor.rb`)
Sometimes plugins need to have additional data or options available
when rendering custom markdown features/rules that are not available
on the default opts.discourse object. These additional options should
be namespaced to the plugin adding them.
```ruby
Site.markdown_additional_options["chat"] = { limited_pretty_text_markdown_rules: [] }
```
These are passed down to markdown rules on opts.discourse.additionalOptions.
The main motivation for adding this is the chat plugin, which currently stores
chat_pretty_text_features and chat_pretty_text_markdown_rules on
the Site object via additions to the serializer, and the Site object is
not accessible to import via markdown rules (either through
Site.current() or through container.lookup). So, to have this working
for both front + backend code, we need to attach these additional options
from the Site object onto the markdown options object.
This commit extends the options which can be passed to
`PrettyText.markdown` so that the Markdown-it rules and Discourse
Markdown plugins used when rendering text can be customized.
Currently, this extension is mainly used by plugins.
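A hypothetical usage sketch; the option names below are assumptions about what this customization might look like, not confirmed keyword arguments:
```ruby
# Illustrative only; option names are assumptions
PrettyText.markdown(
  "**hello** @sam",
  features_override: %w[bold-italics mentions],
  markdown_it_rules: %w[emphasis],
)
```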