[[regexp-syntax]]
== Regular expression syntax

A https://en.wikipedia.org/wiki/Regular_expression[regular expression] is a way to
match patterns in data using placeholder characters, called operators.

{es} supports regular expressions in the following queries:

* <<query-dsl-regexp-query, `regexp`>>
* <<query-dsl-query-string-query, `query_string`>>

{es} uses https://lucene.apache.org/core/[Apache Lucene]'s regular expression
engine to parse these queries.

[discrete]
[[regexp-reserved-characters]]
=== Reserved characters

Lucene's regular expression engine supports all Unicode characters. However, the
following characters are reserved as operators:

....
. ? + * | { } [ ] ( ) " \
....

Depending on the <<regexp-optional-operators, optional operators>> enabled, the
following characters may also be reserved:

....
# @ & < > ~
....

To use one of these characters literally, escape it with a preceding
backslash or surround it with double quotes. For example:

....
\@                  # renders as a literal '@'
\\                  # renders as a literal '\'
"john@smith.com"    # renders as 'john@smith.com'
....
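
The backslash rule can be applied mechanically before a user-supplied string is
placed in a pattern. The following Python sketch is illustrative only:
`escape_lucene_regexp` is a hypothetical helper, not part of {es} or Lucene, and
it conservatively escapes the optional-operator characters as well.

```python
# Reserved characters listed above, including those that are only reserved
# when the corresponding optional operator is enabled.
RESERVED = set('.?+*|{}[]()"\\#@&<>~')


def escape_lucene_regexp(text: str) -> str:
    """Prefix every reserved character in `text` with a backslash."""
    return "".join("\\" + ch if ch in RESERVED else ch for ch in text)


print(escape_lucene_regexp("john@smith.com"))  # john\@smith\.com
```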

[discrete]
[[regexp-standard-operators]]
=== Standard operators

Lucene's regular expression engine does not use the
https://en.wikipedia.org/wiki/Perl_Compatible_Regular_Expressions[Perl
Compatible Regular Expressions (PCRE)] library, but it does support the
following standard operators.

`.`::
+
--
Matches any character. For example:

....
ab.     # matches 'aba', 'abb', 'abz', etc.
....
--

`?`::
+
--
Repeat the preceding character zero or one times. Often used to make the
preceding character optional. For example:

....
abc?     # matches 'ab' and 'abc'
....
--

`+`::
+
--
Repeat the preceding character one or more times. For example:

....
ab+     # matches 'ab', 'abb', 'abbb', etc.
....
--

`*`::
+
--
Repeat the preceding character zero or more times. For example:

....
ab*     # matches 'a', 'ab', 'abb', 'abbb', etc.
....
--

`{}`::
+
--
Minimum and maximum number of times the preceding character can repeat. For
example:

....
a{2}    # matches 'aa'
a{2,4}  # matches 'aa', 'aaa', and 'aaaa'
a{2,}   # matches 'a' repeated two or more times
....
--

`|`::
+
--
OR operator. The match will succeed if the longest pattern on either the left
side OR the right side matches. For example:

....
abc|xyz  # matches 'abc' and 'xyz'
....
--

`( … )`::
+
--
Forms a group. You can use a group to treat part of the expression as a single
character. For example:

....
abc(def)?  # matches 'abc' and 'abcdef' but not 'abcd'
....
--

`[ … ]`::
+
--
Match one of the characters in the brackets. For example:

....
[abc]   # matches 'a', 'b', 'c'
....

Inside the brackets, `-` indicates a range unless `-` is the first character or
escaped. For example:

....
[a-c]   # matches 'a', 'b', or 'c'
[-abc]  # '-' is first character. Matches '-', 'a', 'b', or 'c'
[abc\-] # Escapes '-'. Matches 'a', 'b', 'c', or '-'
....

A `^` before a character in the brackets negates the character or range. For
example:

....
[^abc]      # matches any character except 'a', 'b', or 'c'
[^a-c]      # matches any character except 'a', 'b', or 'c'
[^-abc]     # matches any character except '-', 'a', 'b', or 'c'
[^abc\-]    # matches any character except 'a', 'b', 'c', or '-'
....
--
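
For these simple cases, the standard operators behave the same way in most
regular expression engines, so the examples above can be sanity-checked
locally. The sketch below uses Python's `re` module as a stand-in. It is not
Lucene's engine, and `re.fullmatch` is used on the assumption that a pattern
must cover the whole term. The `CASES` table and `check_cases` helper are
illustrative names only.

```python
import re

# (pattern, terms that should match, terms that should not match)
CASES = [
    ("ab.", ["aba", "abz"], ["ab"]),
    ("abc?", ["ab", "abc"], ["abcc"]),
    ("ab+", ["ab", "abbb"], ["a"]),
    ("ab*", ["a", "abb"], ["b"]),
    ("a{2,4}", ["aa", "aaaa"], ["a", "aaaaa"]),
    ("abc|xyz", ["abc", "xyz"], ["abcxyz"]),
    ("abc(def)?", ["abc", "abcdef"], ["abcd"]),
    ("[a-c]", ["a", "b", "c"], ["d", "-"]),
    ("[-abc]", ["-", "a"], ["d"]),
    ("[^a-c]", ["d", "-"], ["a"]),
]


def check_cases(cases):
    """Return True if every case behaves as listed under whole-term matching."""
    for pattern, should_match, should_not_match in cases:
        compiled = re.compile(pattern)
        if not all(compiled.fullmatch(term) for term in should_match):
            return False
        if any(compiled.fullmatch(term) for term in should_not_match):
            return False
    return True


print(check_cases(CASES))  # True
```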

[discrete]
[[regexp-optional-operators]]
=== Optional operators

You can use the `flags` parameter to enable more optional operators for
Lucene's regular expression engine.

To enable multiple operators, use a `|` separator. For example, a `flags` value
of `COMPLEMENT|INTERVAL` enables the `COMPLEMENT` and `INTERVAL` operators.
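
As a rough illustration, a `regexp` query body that sets `flags` might be
assembled as follows. This is a sketch: the field name `user.id` and the
pattern are placeholders chosen for the example, and the body is only printed
here, not sent to a cluster.

```python
import json

# Request body for a `regexp` query that enables two optional operators.
# "user.id" is a placeholder field name, not one taken from this page.
query_body = {
    "query": {
        "regexp": {
            "user.id": {
                "value": "foo<1-100>",
                "flags": "COMPLEMENT|INTERVAL",
            }
        }
    }
}

print(json.dumps(query_body, indent=2))
```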

[discrete]
==== Valid values

`ALL` (Default)::
Enables all optional operators.

`COMPLEMENT`::
+
--
Enables the `~` operator. You can use `~` to negate the shortest following
pattern. For example:

....
a~bc   # matches 'adc' and 'aec' but not 'abc'
....
--

`INTERVAL`::
+
--
Enables the `<>` operators. You can use `<>` to match a numeric range. For
example:

....
foo<1-100>      # matches 'foo1', 'foo2' ... 'foo99', 'foo100'
foo<01-100>     # matches 'foo01', 'foo02' ... 'foo99', 'foo100'
....
--

`INTERSECTION`::
+
--
Enables the `&` operator, which acts as an AND operator. The match will succeed
if the patterns on both the left side AND the right side match. For example:

....
aaa.+&.+bbb  # matches 'aaabbb'
....
--

`ANYSTRING`::
+
--
Enables the `@` operator. You can use `@` to match any entire
string.

You can combine the `@` operator with `&` and `~` operators to create an
"everything except" logic. For example:

....
@&~(abc.+)  # matches everything except terms beginning with 'abc'
....
--
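
Python's `re` module has no `&`, `~`, or `@` operators, but their effect can be
approximated with ordinary boolean logic over whole-term matches. The sketch
below is an emulation under that assumption, not Lucene's implementation, and
both helper names are hypothetical. Under this emulation the bare term `abc` is
not excluded by `@&~(abc.+)`, because `abc.+` requires at least one character
after `abc`.

```python
import re


def matches_intersection(left: str, right: str, term: str) -> bool:
    """Emulate `left&right`: the whole term must match both patterns."""
    return bool(re.fullmatch(left, term)) and bool(re.fullmatch(right, term))


def matches_everything_except(excluded: str, term: str) -> bool:
    """Emulate `@&~(excluded)`: any term the excluded pattern does not match."""
    return re.fullmatch(excluded, term) is None


print(matches_intersection("aaa.+", ".+bbb", "aaabbb"))  # True
print(matches_intersection("aaa.+", ".+bbb", "aaaccc"))  # False
print(matches_everything_except("abc.+", "abcdef"))      # False
print(matches_everything_except("abc.+", "xyz"))         # True
```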

[discrete]
[[regexp-unsupported-operators]]
=== Unsupported operators

Lucene's regular expression engine does not support anchor operators, such as
`^` (beginning of line) or `$` (end of line). To match a term, the regular
expression must match the entire string.
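
Because the whole term must match, a pattern intended to find a substring needs
explicit `.*` padding on both sides. The following Python sketch mimics this
with `re.fullmatch`. It illustrates the whole-term rule only and does not use
Lucene, and the helper names are made up for the example.

```python
import re


def matches_whole_term(pattern: str, term: str) -> bool:
    """Whole-term matching: the pattern must cover the entire term."""
    return re.fullmatch(pattern, term) is not None


def contains_pattern(pattern: str, term: str) -> bool:
    """Substring-style matching, expressed by padding the pattern with `.*`."""
    return matches_whole_term(".*(" + pattern + ").*", term)


print(matches_whole_term("ab+", "abb"))      # True
print(matches_whole_term("ab+", "xxabbyy"))  # False
print(contains_pattern("ab+", "xxabbyy"))    # True
```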