Added warning messages about the dangers of pathological regexes to:

* pattern-replace charfilter
* pattern-capture and pattern-replace token filters
* pattern tokenizer
* pattern analyzer

Relates to #20038
parent add2fbd7b2
commit 2f6d0119f1

@@ -5,6 +5,21 @@ The `pattern` analyzer uses a regular expression to split the text into terms.
The regular expression should match the *token separators* not the tokens
themselves. The regular expression defaults to `\W+` (or all non-word characters).

[WARNING]
.Beware of Pathological Regular Expressions
========================================

The pattern analyzer uses
http://docs.oracle.com/javase/8/docs/api/java/util/regex/Pattern.html[Java Regular Expressions].

A badly written regular expression could run very slowly or even throw a
StackOverflowError and cause the node it is running on to exit suddenly.

Read more about http://www.regular-expressions.info/catastrophic.html[pathological regular expressions and how to avoid them].

========================================


[float]
=== Definition

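To make the warning concrete, the following is a minimal Java sketch (not part of this commit) of the kind of nested-quantifier pattern that the linked article describes. The class name `BacktrackingDemo`, the helper names, and both patterns are assumptions chosen for illustration. `(x+x+)+y` and `xx+y` accept the same strings, but on an input that cannot match, the first form may force a backtracking engine to try a very large number of ways to split the run of `x` characters. Actual timings depend on the JVM version, so none are claimed here.

```java
import java.util.regex.Pattern;

final class BacktrackingDemo {
    // Nested quantifiers: a run of x's can be divided between the inner
    // and outer repetitions in many different ways.
    static final Pattern RISKY = Pattern.compile("(x+x+)+y");

    // Accepts the same strings (two or more x's followed by y) with a
    // single quantifier, so there is only one way to consume the run.
    static final Pattern SAFER = Pattern.compile("xx+y");

    // Builds a string of n 'x' characters (works on Java 8 as well).
    static String xs(int n) {
        StringBuilder sb = new StringBuilder(n);
        for (int i = 0; i < n; i++) {
            sb.append('x');
        }
        return sb.toString();
    }

    // Returns how long a full-match attempt took, in microseconds.
    static long elapsedMicros(Pattern p, String input) {
        long start = System.nanoTime();
        p.matcher(input).matches();
        return (System.nanoTime() - start) / 1_000;
    }

    public static void main(String[] args) {
        // No trailing 'y', so every attempt must fail; the risky pattern
        // may slow down sharply as n grows.
        for (int n = 14; n <= 22; n += 2) {
            String input = xs(n);
            System.out.printf("n=%d risky=%dus safer=%dus%n",
                    n, elapsedMicros(RISKY, input), elapsedMicros(SAFER, input));
        }
    }
}
```

Running a candidate pattern against deliberately non-matching input like this, before putting it into an analyzer definition, is one inexpensive way to spot trouble early.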
@@ -5,6 +5,20 @@ The `pattern_replace` character filter uses a regular expression to match
characters which should be replaced with the specified replacement string.
The replacement string can refer to capture groups in the regular expression.

[WARNING]
.Beware of Pathological Regular Expressions
========================================

The pattern replace character filter uses
http://docs.oracle.com/javase/8/docs/api/java/util/regex/Pattern.html[Java Regular Expressions].

A badly written regular expression could run very slowly or even throw a
StackOverflowError and cause the node it is running on to exit suddenly.

Read more about http://www.regular-expressions.info/catastrophic.html[pathological regular expressions and how to avoid them].

========================================

[float]
=== Configuration

@@ -7,6 +7,20 @@ Patterns are not anchored to the beginning and end of the string, so
each pattern can match multiple times, and matches are allowed to
overlap.

[WARNING]
.Beware of Pathological Regular Expressions
========================================

The pattern capture token filter uses
http://docs.oracle.com/javase/8/docs/api/java/util/regex/Pattern.html[Java Regular Expressions].

A badly written regular expression could run very slowly or even throw a
StackOverflowError and cause the node it is running on to exit suddenly.

Read more about http://www.regular-expressions.info/catastrophic.html[pathological regular expressions and how to avoid them].

========================================

For instance a pattern like :

[source,js]

@@ -7,3 +7,17 @@ defined using the `pattern` parameter, and the replacement string can be
provided using the `replacement` parameter (supporting referencing the
original text, as explained
http://docs.oracle.com/javase/6/docs/api/java/util/regex/Matcher.html#appendReplacement(java.lang.StringBuffer,%20java.lang.String)[here]).

[WARNING]
.Beware of Pathological Regular Expressions
========================================

The pattern replace token filter uses
http://docs.oracle.com/javase/8/docs/api/java/util/regex/Pattern.html[Java Regular Expressions].

A badly written regular expression could run very slowly or even throw a
StackOverflowError and cause the node it is running on to exit suddenly.

Read more about http://www.regular-expressions.info/catastrophic.html[pathological regular expressions and how to avoid them].

========================================

@@ -8,6 +8,20 @@ terms.
The default pattern is `\W+`, which splits text whenever it encounters
non-word characters.

[WARNING]
.Beware of Pathological Regular Expressions
========================================

The pattern tokenizer uses
http://docs.oracle.com/javase/8/docs/api/java/util/regex/Pattern.html[Java Regular Expressions].

A badly written regular expression could run very slowly or even throw a
StackOverflowError and cause the node it is running on to exit suddenly.

Read more about http://www.regular-expressions.info/catastrophic.html[pathological regular expressions and how to avoid them].

========================================

[float]
=== Example output

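Because the warnings above apply to any user-supplied pattern, it can help to try a pattern offline under a time budget before using it in a tokenizer or filter. The sketch below is one common way to do that in plain Java and is not something this commit or Elasticsearch provides: `TimeBudgetedRegex`, `DeadlineCharSequence`, and `matchesWithin` are hypothetical names. It relies on the matcher reading its input through `CharSequence.charAt`, so a wrapper can abort the attempt once a deadline passes. It does not protect against a `StackOverflowError`, only against long running times.

```java
import java.util.regex.Pattern;

final class TimeBudgetedRegex {

    // Hypothetical helper: a CharSequence view that throws once the
    // deadline has passed, which interrupts a runaway match attempt.
    static final class DeadlineCharSequence implements CharSequence {
        private final CharSequence inner;
        private final long deadlineNanos;

        DeadlineCharSequence(CharSequence inner, long deadlineNanos) {
            this.inner = inner;
            this.deadlineNanos = deadlineNanos;
        }

        @Override
        public char charAt(int index) {
            // Subtraction keeps the comparison safe if nanoTime wraps around.
            if (System.nanoTime() - deadlineNanos > 0) {
                throw new IllegalStateException("regex exceeded its time budget");
            }
            return inner.charAt(index);
        }

        @Override
        public int length() {
            return inner.length();
        }

        @Override
        public CharSequence subSequence(int start, int end) {
            return new DeadlineCharSequence(inner.subSequence(start, end), deadlineNanos);
        }

        @Override
        public String toString() {
            return inner.toString();
        }
    }

    // Attempts a full match, throwing IllegalStateException if the
    // budget (in milliseconds) runs out first.
    static boolean matchesWithin(Pattern p, String input, long budgetMillis) {
        long deadline = System.nanoTime() + budgetMillis * 1_000_000L;
        return p.matcher(new DeadlineCharSequence(input, deadline)).matches();
    }

    public static void main(String[] args) {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < 40; i++) {
            sb.append('x');
        }
        try {
            boolean matched = matchesWithin(Pattern.compile("(x+x+)+y"), sb.toString(), 100);
            System.out.println("finished within budget, matched=" + matched);
        } catch (IllegalStateException e) {
            System.out.println("aborted: " + e.getMessage());
        }
    }
}
```

Whether the sample in `main` finishes or is aborted depends on the JVM in use. In both cases the call returns control to the caller instead of running unbounded.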