From 2f6d0119f1ade206e16e322d9b0616940a05c1cf Mon Sep 17 00:00:00 2001
From: Clinton Gormley
Date: Fri, 9 Sep 2016 09:52:38 +0200
Subject: [PATCH] Added warning messages about the dangers of pathological regexes to:

* pattern-replace charfilter
* pattern-capture and pattern-replace token filters
* pattern tokenizer
* pattern analyzer

Relates to #20038
---
 .../analysis/analyzers/pattern-analyzer.asciidoc | 15 +++++++++++++++
 .../pattern-replace-charfilter.asciidoc          | 14 ++++++++++++++
 .../pattern-capture-tokenfilter.asciidoc         | 14 ++++++++++++++
 .../pattern_replace-tokenfilter.asciidoc         | 14 ++++++++++++++
 .../tokenizers/pattern-tokenizer.asciidoc        | 14 ++++++++++++++
 5 files changed, 71 insertions(+)

diff --git a/docs/reference/analysis/analyzers/pattern-analyzer.asciidoc b/docs/reference/analysis/analyzers/pattern-analyzer.asciidoc
index 36901de0d15..7504be927d9 100644
--- a/docs/reference/analysis/analyzers/pattern-analyzer.asciidoc
+++ b/docs/reference/analysis/analyzers/pattern-analyzer.asciidoc
@@ -5,6 +5,21 @@ The `pattern` analyzer uses a regular expression to split the text into terms.
 The regular expression should match the *token separators* not the tokens
 themselves. The regular expression defaults to `\W+` (or all non-word characters).
 
+[WARNING]
+.Beware of Pathological Regular Expressions
+========================================
+
+The pattern analyzer uses
+http://docs.oracle.com/javase/8/docs/api/java/util/regex/Pattern.html[Java Regular Expressions].
+
+A badly written regular expression could run very slowly or even throw a
+StackOverflowError and cause the node it is running on to exit suddenly.
+
+Read more about http://www.regular-expressions.info/catastrophic.html[pathological regular expressions and how to avoid them].
+
+========================================
+
+
 [float]
 === Definition
 
diff --git a/docs/reference/analysis/charfilters/pattern-replace-charfilter.asciidoc b/docs/reference/analysis/charfilters/pattern-replace-charfilter.asciidoc
index 3f4bf9aa05a..32ee14d8f55 100644
--- a/docs/reference/analysis/charfilters/pattern-replace-charfilter.asciidoc
+++ b/docs/reference/analysis/charfilters/pattern-replace-charfilter.asciidoc
@@ -5,6 +5,20 @@ The `pattern_replace` character filter uses a regular expression to match
 characters which should be replaced with the specified replacement string.
 The replacement string can refer to capture groups in the regular expression.
 
+[WARNING]
+.Beware of Pathological Regular Expressions
+========================================
+
+The pattern replace character filter uses
+http://docs.oracle.com/javase/8/docs/api/java/util/regex/Pattern.html[Java Regular Expressions].
+
+A badly written regular expression could run very slowly or even throw a
+StackOverflowError and cause the node it is running on to exit suddenly.
+
+Read more about http://www.regular-expressions.info/catastrophic.html[pathological regular expressions and how to avoid them].
+
+========================================
+
 [float]
 === Configuration
 
diff --git a/docs/reference/analysis/tokenfilters/pattern-capture-tokenfilter.asciidoc b/docs/reference/analysis/tokenfilters/pattern-capture-tokenfilter.asciidoc
index 7c919b56b98..ccde46a3fd2 100644
--- a/docs/reference/analysis/tokenfilters/pattern-capture-tokenfilter.asciidoc
+++ b/docs/reference/analysis/tokenfilters/pattern-capture-tokenfilter.asciidoc
@@ -7,6 +7,20 @@ Patterns are not anchored to the beginning and end of the string, so
 each pattern can match multiple times, and matches are allowed to
 overlap.
 
+[WARNING]
+.Beware of Pathological Regular Expressions
+========================================
+
+The pattern capture token filter uses
+http://docs.oracle.com/javase/8/docs/api/java/util/regex/Pattern.html[Java Regular Expressions].
+
+A badly written regular expression could run very slowly or even throw a
+StackOverflowError and cause the node it is running on to exit suddenly.
+
+Read more about http://www.regular-expressions.info/catastrophic.html[pathological regular expressions and how to avoid them].
+
+========================================
+
 For instance a pattern like :
 
 [source,js]
diff --git a/docs/reference/analysis/tokenfilters/pattern_replace-tokenfilter.asciidoc b/docs/reference/analysis/tokenfilters/pattern_replace-tokenfilter.asciidoc
index 54e08426e8b..bc8cdc385bf 100644
--- a/docs/reference/analysis/tokenfilters/pattern_replace-tokenfilter.asciidoc
+++ b/docs/reference/analysis/tokenfilters/pattern_replace-tokenfilter.asciidoc
@@ -7,3 +7,17 @@ defined using the `pattern` parameter, and the replacement string can be
 provided using the `replacement` parameter (supporting referencing the
 original text, as explained
 http://docs.oracle.com/javase/6/docs/api/java/util/regex/Matcher.html#appendReplacement(java.lang.StringBuffer,%20java.lang.String)[here]).
+
+[WARNING]
+.Beware of Pathological Regular Expressions
+========================================
+
+The pattern replace token filter uses
+http://docs.oracle.com/javase/8/docs/api/java/util/regex/Pattern.html[Java Regular Expressions].
+
+A badly written regular expression could run very slowly or even throw a
+StackOverflowError and cause the node it is running on to exit suddenly.
+
+Read more about http://www.regular-expressions.info/catastrophic.html[pathological regular expressions and how to avoid them].
+
+========================================
diff --git a/docs/reference/analysis/tokenizers/pattern-tokenizer.asciidoc b/docs/reference/analysis/tokenizers/pattern-tokenizer.asciidoc
index c96fd08c952..5e1b33512d6 100644
--- a/docs/reference/analysis/tokenizers/pattern-tokenizer.asciidoc
+++ b/docs/reference/analysis/tokenizers/pattern-tokenizer.asciidoc
@@ -8,6 +8,20 @@ terms.
 The default pattern is `\W+`, which splits text whenever it encounters
 non-word characters.
 
+[WARNING]
+.Beware of Pathological Regular Expressions
+========================================
+
+The pattern tokenizer uses
+http://docs.oracle.com/javase/8/docs/api/java/util/regex/Pattern.html[Java Regular Expressions].
+
+A badly written regular expression could run very slowly or even throw a
+StackOverflowError and cause the node it is running on to exit suddenly.
+
+Read more about http://www.regular-expressions.info/catastrophic.html[pathological regular expressions and how to avoid them].
+
+========================================
+
 [float]
 === Example output
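
Note on the warning text: the risk described above comes from `java.util.regex` backtracking, which can revisit the same input characters many times. The sketch below is one way to try a candidate pattern outside the cluster before putting it into an analyzer, assuming a read budget is an acceptable proxy for "runs very slowly". It is not part of this patch or of Elasticsearch: `BudgetedRegex`, `CountingCharSequence`, `BudgetExceededException`, and `matchesWithinBudget` are hypothetical names introduced only for illustration. It relies only on `Pattern.compile`, `Pattern.matcher(CharSequence)`, and `Matcher.matches()`.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class BudgetedRegex {

    /** Hypothetical exception: thrown when a match reads more characters than allowed. */
    static final class BudgetExceededException extends RuntimeException {
        BudgetExceededException(long budget) {
            super("regex exceeded budget of " + budget + " character reads");
        }
    }

    /** Hypothetical wrapper: counts every charAt call made on the input. */
    static final class CountingCharSequence implements CharSequence {
        private final CharSequence inner;
        private final long budget;
        private long reads;

        CountingCharSequence(CharSequence inner, long budget) {
            this.inner = inner;
            this.budget = budget;
        }

        @Override
        public char charAt(int index) {
            // Heavy backtracking shows up as repeated reads of the same positions.
            if (++reads > budget) {
                throw new BudgetExceededException(budget);
            }
            return inner.charAt(index);
        }

        @Override
        public int length() {
            return inner.length();
        }

        @Override
        public CharSequence subSequence(int start, int end) {
            return new CountingCharSequence(inner.subSequence(start, end), budget);
        }

        @Override
        public String toString() {
            return inner.toString();
        }
    }

    /** Returns whether the whole input matches; throws once the read budget is spent. */
    static boolean matchesWithinBudget(String regex, String input, long budget) {
        Matcher m = Pattern.compile(regex).matcher(new CountingCharSequence(input, budget));
        return m.matches();
    }

    public static void main(String[] args) {
        String input = "aaaaaaaaaaaaaaaaaaaaaaaa!";
        // A linear pattern finishes well inside the budget.
        System.out.println(matchesWithinBudget("a+!", input, 10_000));
        try {
            // Nested quantifiers are the classic backtracking-prone shape; whether
            // this one exhausts the budget depends on the JDK version in use.
            System.out.println(matchesWithinBudget("(a+)+$", input, 10_000));
        } catch (BudgetExceededException e) {
            System.out.println("aborted: " + e.getMessage());
        }
    }
}
```

The budget bounds work done by the matcher but does not guard against the `StackOverflowError` case mentioned in the warning, which depends on recursion depth rather than on the number of reads.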