discourse

Commit Graph

Author	SHA1	Message	Date
Régis Hanol	4cb3412a56	PERF: improve `findAllMatches` speed (#22083 ) When we introduced unicode support in the regular expressions used in watched words (`9a27803`) we didn't realize how costly adding the `u` flag would be. Turns out, it's pretty bad when you have lots of regular expressions to test. A customer had slightly less than 200 watched words, and it would freeze the browser for about 2s on the first check of those regular expressions (roughly 10ms per regular expression). This commit introduces a new field (`word`) to the serialized watched words which is then converted to a very fast and cheap regular expression on the client-side. We use that regexp to quickly check whether a matcher is even worth trying so that we don't incur the cost of compiling the expensive unicode regexp. This commit also busts the `WordWatcher` cache since we added a new field to be serialized. One nice side effect of using `matchAll` instead of a `while / exec` loop is that the likelihood of having a bad regexp matching infinitely is vastly reduced 🙌	2023-06-13 18:34:28 +02:00
Bianca Nenciu	9a2780397f	FIX: Handle all UTF-8 characters (#21344 ) Watched words were converted to regular expressions containing \W, which handled only ASCII characters. Using [^[:word:]] instead ensures that UTF-8 characters are also handled correctly.	2023-05-15 12:45:04 +03:00
Natalie Tay	44b7706a2b	UX: Skip applying link-type watched words to user custom fields (#20465 ) We currently apply type: :link watched words to custom user fields. This makes the user card pretty ugly because we don't allow html / links there. Additionally, the admin UI also does not say that we apply this to custom user fields, but only words in posts. So this PR is to remove the replacement of link-type watch words for custom user fields.	2023-03-01 10:43:34 +08:00
David Taylor	5a003715d3	DEV: Apply syntax_tree formatting to `app/*`	2023-01-09 14:14:59 +00:00
Bianca Nenciu	d5dc4ca0e9	FIX: Make word watcher work with nil strings (#17830 ) Censoring or replacing nil strings raised an error.	2022-08-08 16:34:51 -03:00
David Taylor	381365facc	FIX: Update word_watcher cache key following schema change (#17755 ) `862007fb18` introduced a change to the format that watched words are cached in Redis. Newly-deployed versions of the app were attempting to load the old-format data from Redis, leading to a server error. This commit introduces a CACHE_VERSION constant which we can easily bump when making changes to the cache schema.	2022-08-02 12:11:08 +01:00
Selase Krakani	862007fb18	FEATURE: Add support for case-sensitive Watched Words (#17445 ) * FEATURE: Add case-sensitivity flag to watched_words Currently, all watched words are matched case-insensitively. This flag allows a watched word to be flagged for case-sensitive matching. To allow for backwards compatibility the flag is set to false by default. * FEATURE: Support case-sensitive creation of Watched Words via API Extend admin creation and upload of Watched Words to support case sensitive flag. This lays the ground work for supporting case-insensitive matching of Watched Words. Support for an extra column has also been introduced for the Watched Words upload CSV file. The new column structure is as follows: word,replacement,case_sentive * FEATURE: Enable case-sensitive matching of Watched Words WordWatcher's word_matcher_regexp now returns a list of regular expressions instead of one case-insensitive regular expression. With the ability to flag a Watched Word as case-sensitive, an action can have words of both sensitivities. This makes the use of the global Regexp::IGNORECASE flag added to all words problematic. To get around platform limitations around the use of subexpression level switches/flags, a list of regular expressions is returned instead, one for each case sensitivity. Word matching has also been updated to use this list of regular expressions instead of one. * FEATURE: Use case-sensitive regular expressions for Watched Words Update Watched Words regular expressions matching and processing to handle the extra metadata which comes along with the introduction of case-sensitive Watched Words. This allows case-sensitive Watched Words to be matched as such. * DEV: Simplify type casting of case-sensitive flag from uploads Use builtin semantics instead of a custom method for converting string case flags in uploaded Watched Words to boolean.
* UX: Add case-sensitivity details to Admin Watched Words UI Update Watched Word form to include a toggle for case-sensitivity. This also adds support for case-sensitive testing and matching of Watched Words in the admin UI. * DEV: Code improvements from review feedback - Extract watched word regex creation out to a utility function - Make JS array presence check more explicit and readable * DEV: Extract Watched Word regex creation to utility function Clean-up work from review feedback. Reduce code duplication. * DEV: Rename word_matcher_regexp to word_matcher_regexp_list Since a list is returned now instead of a single regular expression, change `word_matcher_regexp` to `word_matcher_regexp_list` to better communicate this change. * DEV: Incorporate WordWatcher updates from upstream Resolve conflicts and ensure apply_to_text does not remove non-word characters in matches that aren't at the beginning of the line.	2022-08-02 10:06:03 +02:00
Bianca Nenciu	5f13ca5e54	FIX: Don't cook user fields to apply watched words (#17590 ) The previous method reused the PrettyText logic which applied the watched word logic, but had the unwanted effect of cooking the text too. This meant that regular text values were converted to HTML. Follow up to commit `5a4c35f627`.	2022-07-26 18:15:42 +03:00
Jarek Radosz	20e34b5da6	DEV: Stabilize watched words order (#17215 ) Fixes a flaky spec: ``` 1) WordWatcher.word_matcher_regexp format of the result regexp is correct when watched_words_regular_expressions = true Failure/Error: expect(regexp.inspect).to eq("/(#{word1})\|(#{word2})/i") expected: "/(word35)\|(word36)/i" got: "/(word36)\|(word35)/i" (compared using ==) # ./spec/services/word_watcher_spec.rb:19:in `block (4 levels) in <main>' ```	2022-06-23 15:38:12 +02:00
Bianca Nenciu	7328a2bfb0	FIX: Apply censored words to inline onebox (#16873 ) Censored watched words were not censored inside the title of an inline onebox. Malicious users could exploit this behaviour to insert bad words. The same issue has been fixed for regular Oneboxes in commit `d184fe59ca`.	2022-05-25 14:51:47 +03:00
Alan Guo Xiang Tan	6edf101d5f	DEV: Minor improvements to WordWatcher (#16735 ) Follow-up to `fd1dc91eed`	2022-05-24 10:23:54 +08:00
Alan Guo Xiang Tan	fd1dc91eed	DEV: Don't cache watched words in test env (#16731 ) The cache was causing state to leak between tests since the `WatchedWord` record in the DB would have been rolled back but `WordWatcher` still had the word in the cache.	2022-05-12 14:45:05 +08:00
Bianca Nenciu	186379adac	FIX: Cache all watched words (#14992 ) It used to cache up to 1000 words, but the maximum number of watched words is 2000.	2021-11-17 18:59:44 +02:00
Bianca Nenciu	74f7295631	FIX: Add word boundaries to replace and tag watched words (#13405 ) The generated regular expressions did not contain \b, so they matched every text that contained the word, even if it was only a substring of a word. For example, if "art" was a watched word, a post containing the word "artist" matched.	2021-06-18 18:54:06 +03:00
Bianca Nenciu	d184fe59ca	FEATURE: Censor Oneboxes (#12902 ) Previously onebox content was not passed by the censor regex, meaning you could sneak in censored words via onebox.	2021-06-03 11:39:12 +10:00
Bianca Nenciu	d9484db718	FIX: Split link watched words from replace (#13196 ) It was not clear that replace watched words can be used to replace text with URLs. This introduces a new watched word type that makes it easier to understand.	2021-06-02 15:36:49 +10:00
Bianca Nenciu	571ee4537a	FEATURE: Silence watched word (#13160 ) This is a new type of watched word to replace auto_silence_first_post_regex site setting.	2021-05-27 19:19:58 +03:00
Bianca Nenciu	c1dfd76658	FIX: Make replace watched words work with wildcard (#13084 ) Watched words are always regular expressions, regardless of whether watched_words_regular_expressions is enabled. Internally, wildcard characters are replaced with a regular expression that matches any non-whitespace character.	2021-05-18 12:09:47 +03:00
Bianca Nenciu	3a1b05f219	FIX: Make autotag watched words case insensitive (#13043 ) * FIX: Hide tag watched words if tagging is disabled These 'autotag' words were shown even if tagging was disabled. * FIX: Make autotag watched words case insensitive This commit also fixes the bug when no tag was applied if no other tag was already present.	2021-05-14 16:52:10 +03:00
Sam	e4f1760bab	FEATURE: watch title for automatic tagging (#12782 ) Previously watched words ignored topic titles when applying auto tagging rules. Also copy has been improved to reflect how the system behaves. The text hints that we are only watching the first post now	2021-04-21 18:16:25 +03:00
Bianca Nenciu	b49b455e47	FEATURE: Autotag watched words (#12244 ) New topics will be matched against a set of watched words and be tagged accordingly.	2021-03-03 10:53:38 +02:00
Bianca Nenciu	533800a87b	Add watched words of type "replace" (#12020 ) This commit includes other various improvements to watched words. auto_silence_first_post_regex site setting was removed because it overlapped with 'require approval' watched words.	2021-02-25 14:00:58 +02:00
David Taylor	39e0442de9	FIX: Various watched words improvements - Client-side censoring fixed for non-chrome browsers. (Regular expression rewritten to avoid lookback) - Regex generation is now done on the server, to reduce repeated logic, and make it easier to extend in plugins - Censor tests are moved to ruby, to ensure everything works end-to-end - If "watched words regular expressions" is enabled, warn the admin when the generated regex is invalid	2019-08-02 15:29:12 +01:00
Osama Sayegh	f14c6d81f4	FEATURE: Watched words improvements (#7899 ) This commit contains 3 features: - FEATURE: Allow downloading watched words This introduces a button that allows admins to download watched words per action in a `.txt` file. - FEATURE: Allow clearing watched words in bulk This adds a "Clear All" button that clears all deleted words per action (e.g. block, flag etc.) - FEATURE: List all blocked words contained in the post when it's blocked When a post is rejected because it contains one or more blocked words, the error message now lists all the blocked words contained in the post. ------- This also changes the format of the file for importing watched words from `.csv` to `.txt` so it becomes consistent with the extension of the file when watched words are exported.	2019-07-22 14:59:56 +03:00
Jeff Wong	6de254f642	FIX: iterate when clearing watched words cache	2019-06-24 17:17:56 -07:00
Sam Saffron	30990006a9	DEV: enable frozen string literal on all files This reduces chances of errors where consumers of strings mutate inputs and reduces memory usage of the app. Test suite passes now, but there may be some stuff left, so we will run a few sites on a branch prior to merging	2019-05-13 09:31:32 +08:00
Arpit Jalan	99c6db21e6	FEATURE: allow blocking emojis (#7011 ) https://meta.discourse.org/t/blocking-emojis-wont-work/105853	2019-02-15 20:55:48 +05:30
Robin Ward	3785429948	FIX: Missing word boundaries when non-regexp	2017-11-17 14:37:31 -05:00
Robin Ward	d755c9c90f	FIX: Allow regular expressions to specify boundaries	2017-11-17 14:13:44 -05:00
Robin Ward	41c3941c4c	FEATURE: Support regular expressions for watched words	2017-09-27 15:48:57 -04:00
Guo Xiang Tan	5012d46cbd	Add rubocop to our build. (#5004 )	2017-07-28 10:20:09 +09:00
Neil Lalonde	24cb950432	FEATURE: Watched Words: when posts contain words, do one of flag, require approval, censor, or block	2017-07-26 11:01:09 -04:00

32 Commits
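
Several of the commits above describe how watched words become regular expressions: wildcards are expanded (`c1dfd76658`), word boundaries are added (`74f7295631`), Unicode-aware word classes replace `\W` (`9a2780397f`), and one regular expression is built per case sensitivity (`862007fb18`). The following is a minimal Ruby sketch of how those ideas might fit together. It is not the actual `WordWatcher` code: the `WatchedWord` struct and the `word_to_pattern`, `regexp_list`, and `matches?` helpers are hypothetical names introduced only for illustration.

```ruby
# Illustrative sketch only. All names below are hypothetical and do not
# mirror Discourse's real WordWatcher implementation.
WatchedWord = Struct.new(:word, :case_sensitive)

# Turn one watched word into a regexp source string.
# - The word is escaped so that it is matched literally.
# - A `*` wildcard is expanded to "any run of non-whitespace characters".
# - Lookarounds on [[:word:]] act as word boundaries, so "art" does not
#   match inside "artist".
def word_to_pattern(word)
  escaped = Regexp.escape(word).gsub("\\*") { '\S*' }
  "(?<![[:word:]])#{escaped}(?![[:word:]])"
end

# Build one regexp per case sensitivity instead of a single regexp with a
# global IGNORECASE flag.
def regexp_list(words)
  words.group_by(&:case_sensitive).map do |case_sensitive, group|
    source = group.map { |w| word_to_pattern(w.word) }.join("|")
    Regexp.new(source, case_sensitive ? 0 : Regexp::IGNORECASE)
  end
end

# A text matches if any regexp in the list matches it.
def matches?(text, words)
  regexp_list(words).any? { |re| re.match?(text) }
end

words = [
  WatchedWord.new("art", false),   # case-insensitive
  WatchedWord.new("Acme", true),   # case-sensitive
  WatchedWord.new("spam*", false), # wildcard
]

puts matches?("modern ART here", words)
puts matches?("I am an artist", words)
```

With these assumptions, "modern ART here" matches and "I am an artist" does not, while "Acme" only matches with that exact capitalization. Grouping by case sensitivity keeps each regular expression's flags uniform, which is the problem the commit message attributes to a single global `Regexp::IGNORECASE` flag.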