Commit Graph

26 Commits

Author SHA1 Message Date
Bianca Nenciu fd07c943ad
DEV: Refactor watched words (#24163)
- Ignore only invalid words, not all words if one of them is invalid

- The naming scheme for methods was inconsistent

- Optimize regular expressions
2023-11-01 16:41:10 +02:00
Régis Hanol 4cb3412a56
PERF: improve `findAllMatches` speed (#22083)
When we introduced unicode support in the regular expressions used in watched words (9a27803) we didn't realize the cost adding the `u` flag would be.

Turns out, it's pretty bad when you have lots of regular expressions to test. A customer had slightly less than 200 watched words, and it would freeze the browser for about 2s on the first check of those regular expressions (roughly 10ms per regular expression).

This commit introduces a new field (`word`) to the serialized watched words which is then converted to a very fast and cheap regular expression on the client-side. We use that regexp to quicly check whether a matcher is even worth trying so that we don't incure the cost of compiling the expensive unicode regexp.

This commit also busts the `WordWatcher` cache since we added a new field to be serialized.

One nice side effect of using `matchAll` instead of a `while / exec` loop is that the likeliness of having a bad regexp matching infinitely is vastly reduced 🙌
2023-06-13 18:34:28 +02:00
Bianca Nenciu 9a2780397f
FIX: Handle all UTF-8 characters (#21344)
Watched words were converted to regular expressions containing \W, which
handled only ASCII characters. Using [^[:word]] instead ensures that
UTF-8 characters are also handled correctly.
2023-05-15 12:45:04 +03:00
Natalie Tay 44b7706a2b
UX: Skip applying link-type watched words to user custom fields (#20465)
We currently apply type: :link watched words to custom user fields. This makes the user card pretty ugly because we don't allow html / links there. Additionally, the admin UI also does not say that we apply this to custom user fields, but only words in posts.

So this PR is to remove the replacement of link-type watch words for custom user fields.
2023-03-01 10:43:34 +08:00
David Taylor cb932d6ee1
DEV: Apply syntax_tree formatting to `spec/*` 2023-01-09 11:49:28 +00:00
Loïc Guitaut 3eaac56797 DEV: Use proper wording for contexts in specs 2022-08-04 11:05:02 +02:00
Selase Krakani 862007fb18
FEATURE: Add support for case-sensitive Watched Words (#17445)
* FEATURE: Add case-sensitivity flag to watched_words

Currently, all watched words are matched case-insensitively. This flag
allows a watched word to be flagged for case-sensitive matching.
To allow allow for backwards compatibility the flag is set to false by
default.

* FEATURE: Support case-sensitive creation of Watched Words via API

Extend admin creation and upload of Watched Words to support case
sensitive flag. This lays the ground work for supporting
case-insensitive matching of Watched Words.

Support for an extra column has also been introduced for the Watched
Words upload CSV file. The new column structure is as follows:

 word,replacement,case_sentive

* FEATURE: Enable case-sensitive matching of Watched Words

WordWatcher's word_matcher_regexp now returns a list of regular
expressions instead of one case-insensitive regular expression.

With the ability to flag a Watched Word as case-sensitive, an action
can have words of both sensitivities.This makes the use of the global
Regexp::IGNORECASE flag added to all words problematic.

To get around platform limitations around the use of subexpression level
switches/flags, a list of regular expressions is returned instead, one for each
case sensitivity.

Word matching has also been updated to use this list of regular expressions
instead of one.

* FEATURE: Use case-sensitive regular expressions for Watched Words

Update Watched Words regular expressions matching and processing to handle
the extra metadata which comes along with the introduction of
case-sensitive Watched Words.

This allows case-sensitive Watched Words to matched as such.

* DEV: Simplify type casting of case-sensitive flag from uploads

Use builtin semantics instead of a custom method for converting
string case flags in uploaded Watched Words to boolean.

* UX: Add case-sensitivity details to Admin Watched Words UI

Update Watched Word form to include a toggle for case-sensitivity.
This also adds support for, case-sensitive testing and matching of  Watched Word
in the admin UI.

* DEV: Code improvements from review feedback

 - Extract watched word regex creation out to a utility function
 - Make JS array presence check more explicit and readable

* DEV: Extract Watched Word regex creation to utility function

Clean-up work from review feedback. Reduce code duplication.

* DEV: Rename word_matcher_regexp to word_matcher_regexp_list

Since a list is returned now instead of a single regular expression,
change `word_matcher_regexp` to `word_matcher_regexp_list` to better communicate
this change.

* DEV:  Incorporate WordWatcher updates from upstream

Resolve conflicts and ensure apply_to_text does not remove non-word characters in matches
that aren't at the beginning of the line.
2022-08-02 10:06:03 +02:00
Phil Pirozhkov 493d437e79
Add RSpec 4 compatibility (#17652)
* Remove outdated option

04078317ba

* Use the non-globally exposed RSpec syntax

https://github.com/rspec/rspec-core/pull/2803

* Use the non-globally exposed RSpec syntax, cont

https://github.com/rspec/rspec-core/pull/2803

* Comply to strict predicate matchers

See:
 - https://github.com/rspec/rspec-expectations/pull/1195
 - https://github.com/rspec/rspec-expectations/pull/1196
 - https://github.com/rspec/rspec-expectations/pull/1277
2022-07-28 10:27:38 +08:00
Bianca Nenciu 5f13ca5e54
FIX: Don't cook user fields to apply watched words (#17590)
The previous method for reused the PrettyText logic which applied the
watched word logic, but had the unwanted effect of cooking the text too.
This meant that regular text values were converted to HTML.

Follow up to commit 5a4c35f627.
2022-07-26 18:15:42 +03:00
David Taylor c9dab6fd08
DEV: Automatically require 'rails_helper' in all specs (#16077)
It's very easy to forget to add `require 'rails_helper'` at the top of every core/plugin spec file, and omissions can cause some very confusing/sporadic errors.

By setting this flag in `.rspec`, we can remove the need for `require 'rails_helper'` entirely.
2022-03-01 17:50:50 +00:00
Josh Soref 59097b207f
DEV: Correct typos and spelling mistakes (#12812)
Over the years we accrued many spelling mistakes in the code base. 

This PR attempts to fix spelling mistakes and typos in all areas of the code that are extremely safe to change 

- comments
- test descriptions
- other low risk areas
2021-05-21 11:43:47 +10:00
Michael Brown d9a02d1336
Revert "Revert "Merge branch 'master' of https://github.com/discourse/discourse""
This reverts commit 20780a1eee.

* SECURITY: re-adds accidentally reverted commit:
  03d26cd6: ensure embed_url contains valid http(s) uri
* when the merge commit e62a85cf was reverted, git chose the 2660c2e2 parent to land on
  instead of the 03d26cd6 parent (which contains security fixes)
2020-05-23 00:56:13 -04:00
Jeff Atwood 20780a1eee Revert "Merge branch 'master' of https://github.com/discourse/discourse"
This reverts commit e62a85cf6f, reversing
changes made to 2660c2e21d.
2020-05-22 20:25:56 -07:00
Guo Xiang Tan 96c02caba7
DEV: Change use of Redis `flushall` to `flushdb`.
FLUSHALL removes all keys from all databases. Instead we only want to
remove keys from the current Redis database.
2020-05-19 10:20:00 +08:00
Joffrey JAFFEUX 0d3d2c43a0
DEV: s/\$redis/Discourse\.redis (#8431)
This commit also adds a rubocop rule to prevent global variables.
2019-12-03 10:05:53 +01:00
David Taylor 39e0442de9 FIX: Various watched words improvements
- Client-side censoring fixed for non-chrome browsers. (Regular expression rewritten to avoid lookback)
- Regex generation is now done on the server, to reduce repeated logic, and make it easier to extend in plugins
- Censor tests are moved to ruby, to ensure everything works end-to-end
- If "watched words regular expressions" is enabled, warn the admin when the generated regex is invalid
2019-08-02 15:29:12 +01:00
Osama Sayegh f14c6d81f4
FEATURE: Watched words improvements (#7899)
This commit contains 3 features:

- FEATURE: Allow downloading watched words
This introduces a button that allows admins to download watched words per action in a `.txt` file.

- FEATURE: Allow clearing watched words in bulk
This adds a "Clear All" button that clears all deleted words per action (e.g. block, flag etc.)

- FEATURE: List all blocked words contained in the post when it's blocked
When a post is rejected because it contains one or more blocked words, the error message now lists all the blocked words contained in the post.

-------

This also changes the format of the file for importing watched words from `.csv` to `.txt` so it becomes inconsistent with the extension of the file when watched words are exported.
2019-07-22 14:59:56 +03:00
Daniel Waterworth e219588142 DEV: Prefabrication (test optimization) (#7414)
* Introduced fab!, a helper that creates database state for a group

It's almost identical to let_it_be, except:

 1. It creates a new object for each test by default,
 2. You can disable it using PREFABRICATION=0
2019-05-07 13:12:20 +10:00
Sam Saffron 4ea21fa2d0 DEV: use #frozen_string_literal: true on all spec
This change both speeds up specs (less strings to allocate) and helps catch
cases where methods in Discourse are mutating inputs.

Overall we will be migrating everything to use #frozen_string_literal: true
it will take a while, but this is the first and safest move in this direction
2019-04-30 10:27:42 +10:00
Arpit Jalan a960cbd97f fix the build ❤️ 2019-02-18 10:00:17 +05:30
Arpit Jalan 7cb194f2db Add more specs for word watcher service. 2019-02-18 09:55:16 +05:30
Arpit Jalan 99c6db21e6
FEATURE: allow blocking emojis (#7011)
https://meta.discourse.org/t/blocking-emojis-wont-work/105853
2019-02-15 20:55:48 +05:30
Neil Lalonde 8f21c96ea5 FIX: don't downcase watched words on input since it can break the watched_words_regular_expressions setting 2018-01-09 16:51:59 -05:00
Robin Ward d755c9c90f FIX: Allow regular expressions to specify boundaries 2017-11-17 14:13:44 -05:00
Robin Ward 41c3941c4c FEATURE: Support regular expressions for watched words 2017-09-27 15:48:57 -04:00
Neil Lalonde 24cb950432 FEATURE: Watched Words: when posts contain words, do one of flag, require approval, censor, or block 2017-07-26 11:01:09 -04:00