Added warning messages about the dangers of pathological regexes to:

* pattern-replace charfilter
* pattern-capture and pattern-replace token filters
* pattern tokenizer
* pattern analyzer

Relates to #20038
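
The warnings added in this commit describe patterns that run very slowly on certain inputs. One general Java technique for trying out a candidate pattern on sample text, before it goes into an analysis configuration, is to cap how many characters the regex engine may read. The following is a minimal sketch under that assumption. `RegexBudgetDemo`, `BudgetedCharSequence`, `findWithinBudget`, and the sample pattern `(x+x+)+y` are illustrative names and values, not part of Elasticsearch or of this commit. The wrapper only bounds character reads; it does not protect against a `StackOverflowError` caused by deep recursion.

```java
import java.util.regex.Pattern;

final class RegexBudgetDemo {

    // Hypothetical helper: a CharSequence that allows the regex engine a
    // limited number of charAt() reads and then fails fast.
    static final class BudgetedCharSequence implements CharSequence {
        private final CharSequence inner;
        // Single-element array so subsequences share the same counter.
        private final long[] remaining;

        BudgetedCharSequence(CharSequence inner, long budget) {
            this(inner, new long[] { budget });
        }

        private BudgetedCharSequence(CharSequence inner, long[] remaining) {
            this.inner = inner;
            this.remaining = remaining;
        }

        @Override
        public char charAt(int index) {
            if (remaining[0] <= 0) {
                throw new IllegalStateException("regex character-read budget exhausted");
            }
            remaining[0]--;
            return inner.charAt(index);
        }

        @Override
        public int length() {
            return inner.length();
        }

        @Override
        public CharSequence subSequence(int start, int end) {
            return new BudgetedCharSequence(inner.subSequence(start, end), remaining);
        }

        @Override
        public String toString() {
            return inner.toString();
        }
    }

    // Returns whether the pattern matches anywhere in the input. Throws
    // IllegalStateException if the engine needs more than `budget` reads.
    static boolean findWithinBudget(Pattern pattern, CharSequence input, long budget) {
        return pattern.matcher(new BudgetedCharSequence(input, budget)).find();
    }

    public static void main(String[] args) {
        // The default separator pattern mentioned in the docs: cheap to evaluate.
        System.out.println(findWithinBudget(Pattern.compile("\\W+"), "hello world", 10_000));

        // Nested quantifiers on input that cannot match (no trailing 'y').
        StringBuilder xs = new StringBuilder();
        for (int i = 0; i < 40; i++) {
            xs.append('x');
        }
        try {
            boolean found = findWithinBudget(Pattern.compile("(x+x+)+y"), xs, 1_000_000);
            System.out.println("finished within budget, found=" + found);
        } catch (IllegalStateException e) {
            System.out.println("aborted: " + e.getMessage());
        }
    }
}
```

Whether the second call aborts or finishes may depend on the JVM version, because regex engines differ in how they handle nested quantifiers. Either outcome is handled, and the budget value is an arbitrary choice for this sketch.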
Clinton Gormley 2016-09-09 09:52:38 +02:00
parent add2fbd7b2
commit 2f6d0119f1
5 changed files with 71 additions and 0 deletions

@@ -5,6 +5,21 @@ The `pattern` analyzer uses a regular expression to split the text into terms.
The regular expression should match the *token separators* not the tokens
themselves. The regular expression defaults to `\W+` (or all non-word characters).
[WARNING]
.Beware of Pathological Regular Expressions
========================================
The pattern analyzer uses
http://docs.oracle.com/javase/8/docs/api/java/util/regex/Pattern.html[Java Regular Expressions].
A badly written regular expression could run very slowly or even throw a
StackOverflowError and cause the node it is running on to exit suddenly.
Read more about http://www.regular-expressions.info/catastrophic.html[pathological regular expressions and how to avoid them].
========================================
[float]
=== Definition

@@ -5,6 +5,20 @@ The `pattern_replace` character filter uses a regular expression to match
characters which should be replaced with the specified replacement string.
The replacement string can refer to capture groups in the regular expression.
[WARNING]
.Beware of Pathological Regular Expressions
========================================
The pattern replace character filter uses
http://docs.oracle.com/javase/8/docs/api/java/util/regex/Pattern.html[Java Regular Expressions].
A badly written regular expression could run very slowly or even throw a
StackOverflowError and cause the node it is running on to exit suddenly.
Read more about http://www.regular-expressions.info/catastrophic.html[pathological regular expressions and how to avoid them].
========================================
[float]
=== Configuration

@@ -7,6 +7,20 @@ Patterns are not anchored to the beginning and end of the string, so
each pattern can match multiple times, and matches are allowed to
overlap.
[WARNING]
.Beware of Pathological Regular Expressions
========================================
The pattern capture token filter uses
http://docs.oracle.com/javase/8/docs/api/java/util/regex/Pattern.html[Java Regular Expressions].
A badly written regular expression could run very slowly or even throw a
StackOverflowError and cause the node it is running on to exit suddenly.
Read more about http://www.regular-expressions.info/catastrophic.html[pathological regular expressions and how to avoid them].
========================================
For instance a pattern like :
[source,js]

@@ -7,3 +7,17 @@ defined using the `pattern` parameter, and the replacement string can be
provided using the `replacement` parameter (supporting referencing the
original text, as explained
http://docs.oracle.com/javase/6/docs/api/java/util/regex/Matcher.html#appendReplacement(java.lang.StringBuffer,%20java.lang.String)[here]).
[WARNING]
.Beware of Pathological Regular Expressions
========================================
The pattern replace token filter uses
http://docs.oracle.com/javase/8/docs/api/java/util/regex/Pattern.html[Java Regular Expressions].
A badly written regular expression could run very slowly or even throw a
StackOverflowError and cause the node it is running on to exit suddenly.
Read more about http://www.regular-expressions.info/catastrophic.html[pathological regular expressions and how to avoid them].
========================================

@@ -8,6 +8,20 @@ terms.
The default pattern is `\W+`, which splits text whenever it encounters
non-word characters.
[WARNING]
.Beware of Pathological Regular Expressions
========================================
The pattern tokenizer uses
http://docs.oracle.com/javase/8/docs/api/java/util/regex/Pattern.html[Java Regular Expressions].
A badly written regular expression could run very slowly or even throw a
StackOverflowError and cause the node it is running on to exit suddenly.
Read more about http://www.regular-expressions.info/catastrophic.html[pathological regular expressions and how to avoid them].
========================================
[float]
=== Example output