[[query-dsl-regexp-query]]
=== Regexp Query

The `regexp` query allows you to use regular expression term queries.
See <<regexp-syntax>> for details of the supported regular expression language.
"Term queries" here means that Elasticsearch applies the regexp to the
terms produced by the tokenizer for that field, and not to the original
text of the field.

*Note*: The performance of a `regexp` query depends heavily on the
regular expression chosen. Patterns that match everything, such as `.*`, are
very slow, as are lookaround regular expressions. If possible, start the
regular expression with a long literal prefix. Wildcard matchers such as
`.*?+` will mostly lower performance.

[source,js]
--------------------------------------------------
{
    "regexp":{
        "name.first": "s.*y"
    }
}
--------------------------------------------------
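
Because the pattern is applied to individual terms rather than to the original text of the field, a pattern such as `s.*y` can only ever match one indexed term at a time. The following is a rough JavaScript sketch of that behaviour, not Elasticsearch code. It assumes a hypothetical `analyze` function that lowercases the text and splits it on non-alphanumeric characters (the real terms depend on how the field is analyzed), and it uses JavaScript's `RegExp` in place of Lucene's regular expression engine, with explicit anchors because a Lucene pattern has to match the whole term.

```javascript
// Hypothetical stand-in for an analyzer: lowercase the text and split it
// into terms on anything that is not a letter or digit.
function analyze(text) {
  return text.toLowerCase().split(/[^a-z0-9]+/).filter((term) => term.length > 0);
}

// Apply the pattern to each term separately. The ^(?:...)$ wrapper imitates
// Lucene, where a pattern must match the whole term.
function regexpMatchesField(pattern, fieldText) {
  const anchored = new RegExp("^(?:" + pattern + ")$");
  return analyze(fieldText).some((term) => anchored.test(term));
}

// true: the term "sally" matches
console.log(regexpMatchesField("s.*y", "Sally Smith"));
// false: no single term contains a space, so the pattern cannot span two terms
console.log(regexpMatchesField("sally s.*", "Sally Smith"));
```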

Boosting is also supported:

[source,js]
--------------------------------------------------
{
    "regexp":{
        "name.first":{
            "value":"s.*y",
            "boost":1.2
        }
    }
}
--------------------------------------------------

You can also use special flags:

[source,js]
--------------------------------------------------
{
    "regexp":{
        "name.first": {
            "value": "s.*y",
            "flags" : "INTERSECTION|COMPLEMENT|EMPTY"
        }
    }
}
--------------------------------------------------

Possible flags are `ALL`, `ANYSTRING`, `AUTOMATON`, `COMPLEMENT`,
`EMPTY`, `INTERSECTION`, `INTERVAL`, or `NONE`. Please check the
http://lucene.apache.org/core/4_9_0/core/org/apache/lucene/util/automaton/RegExp.html[Lucene
documentation] for their meaning.

Regular expressions are dangerous because it's easy to accidentally
create an innocuous-looking one that requires an exponential number of
internal determinized automaton states (and corresponding RAM and CPU)
for Lucene to execute. Lucene guards against this with the
`max_determinized_states` setting (defaults to 10000). You can raise
this limit to allow more complex regular expressions to execute.

[source,js]
--------------------------------------------------
{
    "regexp":{
        "name.first": {
            "value": "s.*y",
            "flags" : "INTERSECTION|COMPLEMENT|EMPTY",
            "max_determinized_states": 20000
        }
    }
}
--------------------------------------------------
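
To see how the number of determinized states can grow exponentially, consider a pattern of the form `(a|b)*a` followed by n further `(a|b)` symbols, which matches strings whose symbol n+1 places from the end is `a`. The sketch below is a toy subset construction in JavaScript, not Lucene's implementation: `nfaStep` and `countDfaStates` are illustrative names, and the numbers Lucene counts for a real pattern may differ. Under those assumptions, it shows that this family of patterns needs 2^(n+1) deterministic states, which passes the default limit of 10000 at n = 13.

```javascript
// Toy NFA for "(a|b)*a(a|b){n}". State 0 loops on every symbol and, on "a",
// also guesses that the tail has started. States 1..n then consume one
// symbol each, and state n+1 is the accepting state with no way out.
function nfaStep(state, symbol, n) {
  const next = [];
  if (state === 0) {
    next.push(0);
    if (symbol === "a") next.push(1);
  } else if (state <= n) {
    next.push(state + 1);
  }
  return next;
}

// Subset construction: every reachable set of NFA states becomes one
// deterministic state. Returns how many such sets are reachable.
function countDfaStates(n) {
  const seen = new Set(["0"]);
  const pending = [[0]];
  while (pending.length > 0) {
    const subset = pending.pop();
    for (const symbol of ["a", "b"]) {
      const target = new Set();
      for (const state of subset) {
        for (const nextState of nfaStep(state, symbol, n)) target.add(nextState);
      }
      const sorted = [...target].sort((x, y) => x - y);
      const key = sorted.join(",");
      if (!seen.has(key)) {
        seen.add(key);
        pending.push(sorted);
      }
    }
  }
  return seen.size;
}

console.log(countDfaStates(3));   // 16
console.log(countDfaStates(13));  // 16384, above the default limit of 10000
```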

include::regexp-syntax.asciidoc[]