
[[query-dsl-regexp-query]]
=== Regexp Query

The `regexp` query allows you to use regular expression term queries.
See <<regexp-syntax>> for details of the supported regular expression language.
Here, "term queries" means that Elasticsearch applies the regexp to the
terms produced by the tokenizer for that field, and not to the original
text of the field.
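
As a rough illustration, assume a hypothetical `title` field that is analyzed
with the standard analyzer. A document whose `title` is `Quick Brown Fox` is
then indexed as the separate terms `quick`, `brown`, and `fox`. A pattern such
as `quick.*fox` spans several terms and so matches none of them, whereas the
sketch below should match, because the pattern covers the single term `quick`
in full.

[source,js]
--------------------------------------------------
{
    "regexp":{
        "title": "qu.*k"
    }
}
--------------------------------------------------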

*Note*: The performance of a `regexp` query depends heavily on the
regular expression chosen. Patterns that match everything, such as `.*`,
are very slow, as are lookaround regular expressions. If possible, start
your regular expression with a long literal prefix. Wildcard matchers like
`.*?+` will mostly lower performance.

[source,js]
--------------------------------------------------
{
    "regexp":{
        "name.first": "s.*y"
    }
}
--------------------------------------------------
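
To make the prefix advice from the performance note concrete, here is a
sketch with a hypothetical pattern on the same `name.first` field. The pattern
begins with the literal prefix `ste`, so Lucene should only need to examine
terms that start with `ste`, rather than every term in the field as a pattern
with a leading `.*` would require.

[source,js]
--------------------------------------------------
{
    "regexp":{
        "name.first": "ste.*n"
    }
}
--------------------------------------------------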

Boosting is also supported:

[source,js]
--------------------------------------------------
{
    "regexp":{
        "name.first":{
            "value":"s.*y",
            "boost":1.2
        }
    }
}
--------------------------------------------------

You can also use special flags:

[source,js]
--------------------------------------------------
{
    "regexp":{
        "name.first": {
            "value": "s.*y",
            "flags" : "INTERSECTION|COMPLEMENT|EMPTY"
        }
    }
}
--------------------------------------------------

Possible flags are `ALL` (default), `ANYSTRING`, `COMPLEMENT`,
`EMPTY`, `INTERSECTION`, `INTERVAL`, or `NONE`. See the
http://lucene.apache.org/core/4_9_0/core/org/apache/lucene/util/automaton/RegExp.html[Lucene
documentation] for their meaning.
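
As a sketch of how a flag changes the accepted syntax, the query below enables
only `INTERVAL`, which according to the Lucene `RegExp` documentation turns on
numeric interval syntax of the form `<n-m>`. The `product.code` field and the
pattern are hypothetical; under that assumption the query is intended to match
terms such as `item7` or `item42`, with the other optional operators left
disabled.

[source,js]
--------------------------------------------------
{
    "regexp":{
        "product.code": {
            "value": "item<1-100>",
            "flags" : "INTERVAL"
        }
    }
}
--------------------------------------------------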

Regular expressions are dangerous because it is easy to accidentally
create an innocuous-looking one that requires an exponential number of
internal determinized automaton states (and corresponding RAM and CPU)
for Lucene to execute. Lucene guards against this with the
`max_determinized_states` setting (defaults to 10000). You can raise
this limit to allow more complex regular expressions to execute.

[source,js]
--------------------------------------------------
{
    "regexp":{
        "name.first": {
            "value": "s.*y",
            "flags" : "INTERSECTION|COMPLEMENT|EMPTY",
            "max_determinized_states": 20000
        }
    }
}
--------------------------------------------------


include::regexp-syntax.asciidoc[]