Commit Graph

10 Commits

Author SHA1 Message Date
Régis Hanol 4cb3412a56
PERF: improve `findAllMatches` speed (#22083)
When we introduced unicode support in the regular expressions used in watched words (9a27803) we didn't realize the cost adding the `u` flag would be.

Turns out, it's pretty bad when you have lots of regular expressions to test. A customer had slightly less than 200 watched words, and it would freeze the browser for about 2s on the first check of those regular expressions (roughly 10ms per regular expression).

This commit introduces a new field (`word`) to the serialized watched words which is then converted to a very fast and cheap regular expression on the client-side. We use that regexp to quicly check whether a matcher is even worth trying so that we don't incure the cost of compiling the expensive unicode regexp.

This commit also busts the `WordWatcher` cache since we added a new field to be serialized.

One nice side effect of using `matchAll` instead of a `while / exec` loop is that the likeliness of having a bad regexp matching infinitely is vastly reduced 🙌
2023-06-13 18:34:28 +02:00
Martin Brennan 0b3cf83e3c
FIX: Do not cook icon with hashtags (#21676)
This commit makes some fundamental changes to how hashtag cooking and
icon generation works in the new experimental hashtag autocomplete mode.
Previously we cooked the appropriate SVG icon with the cooked hashtag,
though this has proved inflexible especially for theming purposes.

Instead, we now cook a data-ID attribute with the hashtag and add a new
span as an icon placeholder. This is replaced on the client side with an
icon (or a square span in the case of categories) on the client side via
the decorateCooked API for posts and chat messages.

This client side logic uses the generated hashtag, category, and channel
CSS classes added in a previous commit.

This is missing changes to the sidebar to use the new generated CSS
classes and also colors and the split square for categories in the
hashtag autocomplete menu -- I will tackle this in a separate PR so it
is clearer.
2023-05-23 09:33:55 +02:00
Martin Brennan 30e7b716b0
FIX: Do not replace hashtag-cooked text with WatchedWords (#19279)
Adds the .hashtag-cooked as an exception for watched
words to not auto-link the text of the hashtag.
2022-12-01 16:31:06 +10:00
Selase Krakani 862007fb18
FEATURE: Add support for case-sensitive Watched Words (#17445)
* FEATURE: Add case-sensitivity flag to watched_words

Currently, all watched words are matched case-insensitively. This flag
allows a watched word to be flagged for case-sensitive matching.
To allow allow for backwards compatibility the flag is set to false by
default.

* FEATURE: Support case-sensitive creation of Watched Words via API

Extend admin creation and upload of Watched Words to support case
sensitive flag. This lays the ground work for supporting
case-insensitive matching of Watched Words.

Support for an extra column has also been introduced for the Watched
Words upload CSV file. The new column structure is as follows:

 word,replacement,case_sentive

* FEATURE: Enable case-sensitive matching of Watched Words

WordWatcher's word_matcher_regexp now returns a list of regular
expressions instead of one case-insensitive regular expression.

With the ability to flag a Watched Word as case-sensitive, an action
can have words of both sensitivities.This makes the use of the global
Regexp::IGNORECASE flag added to all words problematic.

To get around platform limitations around the use of subexpression level
switches/flags, a list of regular expressions is returned instead, one for each
case sensitivity.

Word matching has also been updated to use this list of regular expressions
instead of one.

* FEATURE: Use case-sensitive regular expressions for Watched Words

Update Watched Words regular expressions matching and processing to handle
the extra metadata which comes along with the introduction of
case-sensitive Watched Words.

This allows case-sensitive Watched Words to matched as such.

* DEV: Simplify type casting of case-sensitive flag from uploads

Use builtin semantics instead of a custom method for converting
string case flags in uploaded Watched Words to boolean.

* UX: Add case-sensitivity details to Admin Watched Words UI

Update Watched Word form to include a toggle for case-sensitivity.
This also adds support for, case-sensitive testing and matching of  Watched Word
in the admin UI.

* DEV: Code improvements from review feedback

 - Extract watched word regex creation out to a utility function
 - Make JS array presence check more explicit and readable

* DEV: Extract Watched Word regex creation to utility function

Clean-up work from review feedback. Reduce code duplication.

* DEV: Rename word_matcher_regexp to word_matcher_regexp_list

Since a list is returned now instead of a single regular expression,
change `word_matcher_regexp` to `word_matcher_regexp_list` to better communicate
this change.

* DEV:  Incorporate WordWatcher updates from upstream

Resolve conflicts and ensure apply_to_text does not remove non-word characters in matches
that aren't at the beginning of the line.
2022-08-02 10:06:03 +02:00
Bianca Nenciu 32a174d883
FIX: Use Map instead of Object for caching (#14887)
Objects have default properties, such as "constructor" that can cause
issues when using similar texts as keys.
2021-11-12 15:18:07 +02:00
Bianca Nenciu 19ef6995a8
FIX: Do not replace words in hashtags and mentions (#14760)
Watched words were replaced inside mentions and hashtags when watched
word regular expressions were enabled.
2021-10-29 17:53:09 +03:00
Bianca Nenciu 148ee1d162
FIX: Do not perform link lookup for replaced links (#14742)
A link that was added because a watched word was replaced could create
a notice if the same link was present before.
2021-10-28 13:27:31 +03:00
Bianca Nenciu 0532a5a43e
FIX: Do not replace in mentions and hashtags (#14260)
Watched words of type 'replace' or 'link' replaced the text inside
mentions or hashtags too, which broke these. These types of watched
words must skip any match that has an @ or # before it.
2021-09-09 12:03:59 +03:00
Bianca Nenciu 74f7295631
FIX: Add word boundaries to replace and tag watched words (#13405)
The generated regular expressions did not contain \b which matched
every text that contained the word, even if it was only a substring of
a word.

For example, if "art" was a watched word a post containing word
"artist" matched.
2021-06-18 18:54:06 +03:00
Bianca Nenciu d9484db718
FIX: Split link watched words from replace (#13196)
It was not clear that replace watched words can be used to replace text
with URLs. This introduces a new watched word type that makes it easier
to understand.
2021-06-02 15:36:49 +10:00