[[query-string-syntax]]
==== Query string syntax
The query string ``mini-language'' is used by the
<<query-dsl-query-string-query>> and by the
`q` query string parameter in the <<search-search,`search` API>>.

The query string is parsed into a series of _terms_ and _operators_. A
term can be a single word, such as `quick` or `brown`, or a phrase surrounded by
double quotes, such as `"quick brown"`, which searches for all the words in the
phrase, in the same order.

Operators allow you to customize the search. The available options are
explained below.

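As a rough illustration of the difference between terms and phrases, the sketch below splits a query string on whitespace while keeping double-quoted phrases together. The `split_terms` function is hypothetical. It is far simpler than the parser Elasticsearch actually uses, which also understands operators, field names, escapes and ranges.

```python
import re

# Hypothetical helper: a "quoted phrase" or any run of non-whitespace characters.
# Operators, field prefixes and escaping are deliberately ignored in this sketch.
TOKEN_RE = re.compile(r'"[^"]*"|\S+')


def split_terms(query: str) -> list[tuple[str, str]]:
    """Return (kind, text) pairs, where kind is 'phrase' or 'term'."""
    tokens = []
    for match in TOKEN_RE.finditer(query):
        text = match.group(0)
        if len(text) >= 2 and text.startswith('"') and text.endswith('"'):
            # Strip the surrounding quotes; the words keep their order.
            tokens.append(("phrase", text[1:-1]))
        else:
            tokens.append(("term", text))
    return tokens


if __name__ == "__main__":
    print(split_terms('quick "quick brown" fox'))
    # [('term', 'quick'), ('phrase', 'quick brown'), ('term', 'fox')]
```
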
===== Field names
As mentioned in <<query-dsl-query-string-query>>, the `default_field` is searched for the
search terms, but it is possible to specify other fields in the query syntax:

* where the `status` field contains `active`

    status:active

* where the `title` field contains `quick` or `brown`

    title:(quick OR brown)

* where the `author` field contains the exact phrase `"John Smith"`

    author:"John Smith"

* where any of the fields `book.title`, `book.content` or `book.date` contains
`quick` or `brown` (note how we need to escape the `*` with a backslash):

    book.\*:(quick OR brown)

* where the field `title` has any non-null value:

    _exists_:title

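Field-qualified clauses like these must be URL-encoded when they are sent through the `q` parameter of the `search` API. Below is a minimal sketch that uses only Python's standard library. The host `localhost:9200`, the index name `my-index` and the `search_url` helper are placeholders and are not defined on this page.

```python
from urllib.parse import urlencode

# Placeholder endpoint: substitute your own cluster address and index name.
BASE_URL = "http://localhost:9200/my-index/_search"


def search_url(query_string: str) -> str:
    """Build a search URL that carries the query string in the `q` parameter."""
    # urlencode percent-encodes characters such as ':', '(', '*' and '\'
    # and turns spaces into '+'.
    return f"{BASE_URL}?{urlencode({'q': query_string})}"


if __name__ == "__main__":
    for q in ["status:active", "title:(quick OR brown)",
              r"book.\*:(quick OR brown)", "_exists_:title"]:
        print(search_url(q))
```

Spaces become `+` in the encoded form, so a literal `+` operator (see the boolean operators below) is sent as `%2B`. `urlencode` handles that as well.
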
===== Wildcards
Wildcard searches can be run on individual terms, using `?` to replace
a single character, and `*` to replace zero or more characters:

    qu?ck bro*

Be aware that wildcard queries can use an enormous amount of memory and
perform very badly: just think how many terms need to be queried to
match the query string `"a* b* c*"`.

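The following sketch makes the semantics of `?` and `*` concrete. It translates a wildcard term into an anchored regular expression and checks that expression against every term in a small, made-up term list. The linear scan is deliberate because it shows why broad patterns are expensive. Lucene itself compiles wildcards into automata rather than Python regexes. `wildcard_to_regex`, `matching_terms` and the sample terms are all hypothetical.

```python
import re


def wildcard_to_regex(pattern: str) -> re.Pattern:
    """Translate a wildcard term into a compiled regular expression.

    `?` becomes "any single character", `*` becomes "zero or more
    characters", and everything else is matched literally.
    """
    parts = []
    for ch in pattern:
        if ch == "?":
            parts.append(".")
        elif ch == "*":
            parts.append(".*")
        else:
            parts.append(re.escape(ch))
    return re.compile("".join(parts), re.DOTALL)


def matching_terms(pattern: str, terms: list[str]) -> list[str]:
    """Check the pattern against every term; the whole term must match."""
    regex = wildcard_to_regex(pattern)
    return [term for term in terms if regex.fullmatch(term)]


TERMS = ["brow", "brown", "brownie", "quack", "quick", "quickly"]

if __name__ == "__main__":
    print(matching_terms("qu?ck", TERMS))  # ['quack', 'quick']
    print(matching_terms("bro*", TERMS))   # ['brow', 'brown', 'brownie']
```
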
[WARNING]
=======
Pure wildcards `\*` are rewritten to <<query-dsl-exists-query,`exists`>> queries for efficiency.
As a consequence, the wildcard `"field:*"` would match documents with an empty value
like the following:

```json
{
  "field": ""
}
```

It would **not** match, however, if the field is missing or set to an explicit `null`
value like the following:

```json
{
  "field": null
}
```
=======

[WARNING]
=======
Allowing a wildcard at the beginning of a word (e.g. `"*ing"`) is particularly
heavy, because all terms in the index need to be examined, just in case
they match. Leading wildcards can be disabled by setting
`allow_leading_wildcard` to `false`.
=======

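A simplified model of the term dictionary as a sorted list shows why a leading wildcard is so much heavier. Lucene's real terms dictionary is far more sophisticated, but it also keeps terms in sorted order. For a trailing wildcard such as `bro*`, binary search finds one contiguous slice of candidate terms. For `*ing` there is no known starting point, so every term must be examined. The helper names and terms below are hypothetical.

```python
from bisect import bisect_left

# A toy, already-sorted term dictionary (hypothetical terms).
TERMS = sorted(["bring", "brown", "brownie", "fox", "jumping", "quick", "running", "sing"])


def prefix_matches(terms: list[str], prefix: str) -> tuple[list[str], int]:
    """Matches for `prefix*` and the number of terms scanned after the binary search.

    Only the contiguous run of terms sharing the prefix is read, plus the one
    term that ends the run (if any).
    """
    start = bisect_left(terms, prefix)
    end = start
    while end < len(terms) and terms[end].startswith(prefix):
        end += 1
    scanned = end - start + (1 if end < len(terms) else 0)
    return terms[start:end], scanned


def suffix_matches(terms: list[str], suffix: str) -> tuple[list[str], int]:
    """Matches for `*suffix`: with no known starting point, every term is scanned."""
    return [t for t in terms if t.endswith(suffix)], len(terms)


if __name__ == "__main__":
    print(prefix_matches(TERMS, "bro"))  # (['brown', 'brownie'], 3)
    print(suffix_matches(TERMS, "ing"))  # (['bring', 'jumping', 'running', 'sing'], 8)
```

With eight terms the gap is small. The scan for a leading wildcard, however, grows with the size of the whole dictionary, while the prefix scan grows only with the number of terms that share the prefix.
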
Only the parts of the analysis chain that operate at the character level are
applied to wildcard terms. So, for instance, if the analyzer performs both lowercasing and
stemming, only the lowercasing will be applied: it would be wrong to perform
stemming on a word that is missing some of its letters.

By setting `analyze_wildcard` to `true`, queries that end with a `*` will be
analyzed and a boolean query will be built out of the different tokens, by
ensuring exact matches on the first N-1 tokens and a prefix match on the last
token.

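The `analyze_wildcard` behaviour described above can be modelled roughly as follows. Two parts are assumptions made for illustration: the toy analyzer, which lowercases and then splits on non-alphanumeric characters, and the `(kind, token)` output format. Elasticsearch runs the field's real analyzer and produces a Lucene boolean query instead.

```python
import re


def toy_analyzer(text: str) -> list[str]:
    """Hypothetical analyzer: lowercase, then split on non-alphanumeric runs."""
    return [tok for tok in re.split(r"[^0-9a-z]+", text.lower()) if tok]


def analyze_wildcard(query: str) -> list[tuple[str, str]]:
    """Model of analyze_wildcard for a query that ends with '*'.

    Returns ("term", tok) for exact matches on the first N-1 tokens
    and ("prefix", tok) for the last token.
    """
    if not query.endswith("*"):
        raise ValueError("this sketch only handles queries ending with '*'")
    tokens = toy_analyzer(query[:-1])
    if not tokens:
        return []
    clauses = [("term", tok) for tok in tokens[:-1]]
    clauses.append(("prefix", tokens[-1]))
    return clauses


if __name__ == "__main__":
    print(analyze_wildcard("Quick-Brown Fo*"))
    # [('term', 'quick'), ('term', 'brown'), ('prefix', 'fo')]
```
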
===== Regular expressions
Regular expression patterns can be embedded in the query string by
wrapping them in forward-slashes (`"/"`):

    name:/joh?n(ath[oa]n)/

The supported regular expression syntax is explained in <<regexp-syntax>>.

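Lucene regular expressions are anchored, which means the pattern must match the whole term rather than just part of it. Python's `re.fullmatch` behaves the same way. The example pattern above uses only syntax the two dialects share, so `re.fullmatch` can show which terms it accepts. Treat this as an approximation, because the supported operators differ in general, as described in <<regexp-syntax>>. The candidate terms are made up.

```python
import re

# The pattern from the example above, without the field name and slashes.
PATTERN = re.compile(r"joh?n(ath[oa]n)")

CANDIDATES = ["jonathan", "johnathan", "johnathon", "john", "jonathans"]

# fullmatch mimics the anchoring: the whole term has to match the pattern.
matches = [term for term in CANDIDATES if PATTERN.fullmatch(term)]
print(matches)  # ['jonathan', 'johnathan', 'johnathon']
```

`john` itself does not match, because the `(ath[oa]n)` group is not optional.
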
[WARNING]
=======
The `allow_leading_wildcard` parameter does not have any control over
regular expressions. A query string such as the following would force
Elasticsearch to visit every term in the index:

    /.*n/

Use with caution!
=======

===== Fuzziness
We can search for terms that are
similar to, but not exactly like, our search terms, using the ``fuzzy''
operator:

    quikc~ brwn~ foks~

This uses the
http://en.wikipedia.org/wiki/Damerau-Levenshtein_distance[Damerau-Levenshtein distance]
to find all terms with a maximum of
two changes, where a change is the insertion, deletion
or substitution of a single character, or transposition of two adjacent
characters.

The default _edit distance_ is `2`, but an edit distance of `1` should be
sufficient to catch 80% of all human misspellings. It can be specified as:

    quikc~1

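The edit distance described above can be computed with the dynamic-programming recurrence below. The code implements the restricted variant of Damerau-Levenshtein, also called optimal string alignment, in which a transposition of two adjacent characters counts as one edit. It illustrates the metric only. Lucene evaluates fuzzy terms with Levenshtein automata instead of computing distances term by term, and `osa_distance` is a hypothetical name.

```python
def osa_distance(a: str, b: str) -> int:
    """Restricted Damerau-Levenshtein (optimal string alignment) distance."""
    # d[i][j] = distance between the first i chars of a and the first j chars of b.
    d = [[0] * (len(b) + 1) for _ in range(len(a) + 1)]
    for i in range(len(a) + 1):
        d[i][0] = i  # i deletions
    for j in range(len(b) + 1):
        d[0][j] = j  # j insertions
    for i in range(1, len(a) + 1):
        for j in range(1, len(b) + 1):
            cost = 0 if a[i - 1] == b[j - 1] else 1
            d[i][j] = min(
                d[i - 1][j] + 1,         # deletion
                d[i][j - 1] + 1,         # insertion
                d[i - 1][j - 1] + cost,  # substitution (or match)
            )
            # Transposition of two adjacent characters counts as a single edit.
            if i > 1 and j > 1 and a[i - 1] == b[j - 2] and a[i - 2] == b[j - 1]:
                d[i][j] = min(d[i][j], d[i - 2][j - 2] + 1)
    return d[len(a)][len(b)]


if __name__ == "__main__":
    for typo, word in [("quikc", "quick"), ("brwn", "brown"), ("foks", "fox")]:
        print(typo, word, osa_distance(typo, word))
    # quikc quick 1
    # brwn brown 1
    # foks fox 2
```

With `~1`, `quikc` and `brwn` would still reach `quick` and `brown`, but `foks` would no longer reach `fox`.
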
===== Proximity searches
While a phrase query (e.g. `"john smith"`) expects all of the terms in exactly
the same order, a proximity query allows the specified words to be further
apart or in a different order. In the same way that fuzzy queries can
specify a maximum edit distance for characters in a word, a proximity search
allows us to specify a maximum edit distance of words in a phrase:

    "fox quick"~5

The closer the text in a field is to the original order specified in the
query string, the more relevant that document is considered to be. When
compared to the above example query, the phrase `"quick fox"` would be
considered more relevant than `"quick brown fox"`.

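Here is one way to picture the slop. Shift each word's position in the document back by that word's offset in the query phrase, then measure how far apart the shifted positions are. An exact phrase gives 0. The sketch below brute-forces this measure over every combination of occurrences. It is a simplified model that ignores repeated query terms and other refinements of Lucene's sloppy phrase matching, and `slop_needed` is a hypothetical name.

```python
from itertools import product
from typing import Optional


def slop_needed(phrase: list[str], doc_tokens: list[str]) -> Optional[int]:
    """Smallest slop at which `phrase` matches `doc_tokens` (None if a word is missing)."""
    positions = [[pos for pos, tok in enumerate(doc_tokens) if tok == word]
                 for word in phrase]
    if any(not occurrences for occurrences in positions):
        return None
    best: Optional[int] = None
    # Try every combination of occurrences (fine for short examples only).
    for combo in product(*positions):
        # Shift each position back by the word's offset in the phrase.
        shifted = [pos - offset for offset, pos in enumerate(combo)]
        spread = max(shifted) - min(shifted)
        if best is None or spread < best:
            best = spread
    return best


if __name__ == "__main__":
    query = ["fox", "quick"]  # "fox quick"~5
    for text in ["fox quick", "quick fox", "quick brown fox"]:
        print(text, slop_needed(query, text.split()))
    # fox quick 0
    # quick fox 2
    # quick brown fox 3
```

`"quick fox"` (2) and `"quick brown fox"` (3) both fall within the `~5` of the example. The smaller value is consistent with `"quick fox"` being considered the more relevant match.
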
===== Ranges
Ranges can be specified for date, numeric or string fields. Inclusive ranges
are specified with square brackets `[min TO max]` and exclusive ranges with
curly brackets `{min TO max}`.

* All days in 2012:

    date:[2012-01-01 TO 2012-12-31]

* Numbers 1..5:

    count:[1 TO 5]

* Tags between `alpha` and `omega`, excluding `alpha` and `omega`:

    tag:{alpha TO omega}

* Numbers from 10 upwards:

    count:[10 TO *]

* Dates before 2012:

    date:{* TO 2012-01-01}

Curly and square brackets can be combined:

* Numbers from 1 up to but not including 5:

    count:[1 TO 5}

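The bracket rules above fit in a small parser for numeric ranges: `[` and `]` make a bound inclusive, `{` and `}` make it exclusive, and `*` leaves that side unbounded. `parse_range` and `in_range` are hypothetical helpers for illustration only. They handle numbers but not dates or strings, and they do not reproduce Elasticsearch's own parsing.

```python
import re
from typing import Optional

# Opening bracket, lower bound, TO, upper bound, closing bracket.
RANGE_RE = re.compile(r"([\[{])\s*(\S+)\s+TO\s+(\S+)\s*([\]}])")


def parse_range(text: str) -> tuple[Optional[float], bool, Optional[float], bool]:
    """Return (lower, lower_inclusive, upper, upper_inclusive); None means unbounded."""
    match = RANGE_RE.fullmatch(text.strip())
    if match is None:
        raise ValueError(f"not a numeric range: {text!r}")
    open_bracket, low, high, close_bracket = match.groups()
    lower = None if low == "*" else float(low)
    upper = None if high == "*" else float(high)
    return lower, open_bracket == "[", upper, close_bracket == "]"


def in_range(value: float, text: str) -> bool:
    """True if `value` falls inside a range such as "[1 TO 5}"."""
    lower, lower_inclusive, upper, upper_inclusive = parse_range(text)
    if lower is not None and (value < lower or (value == lower and not lower_inclusive)):
        return False
    if upper is not None and (value > upper or (value == upper and not upper_inclusive)):
        return False
    return True


if __name__ == "__main__":
    print(in_range(5, "[1 TO 5}"))     # False: } excludes the upper bound
    print(in_range(1, "[1 TO 5}"))     # True: [ includes the lower bound
    print(in_range(1000, "[10 TO *]")) # True: * leaves the upper side open
```
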
Ranges with one side unbounded can use the following syntax:

    age:>10
    age:>=10
    age:<10
    age:<=10

[NOTE]
====================================================================
To combine an upper and lower bound with the simplified syntax, you
would need to join two clauses with an `AND` operator, or mark both
clauses as required with `+`:

    age:(>=10 AND <20)
    age:(+>=10 +<20)

====================================================================

The parsing of ranges in query strings can be complex and error-prone. It is
much more reliable to use an explicit <<query-dsl-range-query,`range` query>>.

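As a point of comparison, here is `age:(>=10 AND <20)` written as an explicit `range` query with the `gte` and `lt` parameters. The snippet only builds and prints the request body with Python's `json` module and does not send it to a cluster. `range_query` is a hypothetical helper, and `age` is the example field from this page.

```python
import json


def range_query(field: str, gte=None, gt=None, lte=None, lt=None) -> dict:
    """Build a `range` query body, including only the bounds that were given."""
    bounds = {name: value
              for name, value in (("gte", gte), ("gt", gt), ("lte", lte), ("lt", lt))
              if value is not None}
    return {"query": {"range": {field: bounds}}}


if __name__ == "__main__":
    # Equivalent of the query string age:(>=10 AND <20)
    print(json.dumps(range_query("age", gte=10, lt=20)))
    # {"query": {"range": {"age": {"gte": 10, "lt": 20}}}}
```
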
===== Boosting
Use the _boost_ operator `^` to make one term more relevant than another.
For instance, if we want to find all documents about foxes, but we are
especially interested in quick foxes:

    quick^2 fox

The default `boost` value is 1, but it can be any positive floating point number.
Boosts between 0 and 1 reduce relevance.

Boosts can also be applied to phrases or to groups:

    "john smith"^2 (foo bar)^4

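The query DSL can express roughly the same intent through the `boost` parameter that `match` queries accept. The sketch builds a body for `quick^2 fox` as a `bool` query with two `should` clauses. The field name `content` is a placeholder, since a bare query string term would search the `default_field` instead. `match_clause` is a hypothetical helper.

```python
import json


def match_clause(field: str, text: str, boost: float = 1.0) -> dict:
    """A `match` clause; the boost is only spelled out when it differs from 1."""
    if boost == 1.0:
        return {"match": {field: text}}
    return {"match": {field: {"query": text, "boost": boost}}}


body = {
    "query": {
        "bool": {
            "should": [
                match_clause("content", "quick", boost=2),  # quick^2
                match_clause("content", "fox"),             # fox
            ]
        }
    }
}
print(json.dumps(body, indent=2))
```
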
===== Boolean operators
By default, all terms are optional, as long as one term matches. A search
for `foo bar baz` will find any document that contains one or more of
`foo` or `bar` or `baz`. The `default_operator` parameter of the
<<query-dsl-query-string-query>> can force all terms to be required, but there are
also _boolean operators_ which can be used in the query string itself
to provide more control.

The preferred operators are `+` (this term *must* be present) and `-`
(this term *must not* be present). All other terms are optional.
For example, this query:

    quick brown +fox -news

states that:

* `fox` must be present
* `news` must not be present
* `quick` and `brown` are optional: their presence increases the relevance

The familiar boolean operators `AND`, `OR` and `NOT` (also written `&&`, `||`
and `!`) are also supported, but beware that they do not honor the usual
precedence rules, so parentheses should be used whenever multiple operators are
used together. For instance, the previous query could be rewritten as:

    ((quick AND fox) OR (brown AND fox) OR fox) AND NOT news

This form now replicates the logic from the original query correctly, but
the relevance scoring bears little resemblance to the original.

In contrast, the same query rewritten using the <<query-dsl-match-query,`match` query>>
would look like this:

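    {
      "bool": {
        "must":     { "match": { "content": "fox" } },
        "should":   { "match": { "content": "quick brown" } },
        "must_not": { "match": { "content": "news" } }
      }
    }

Here `content` is a placeholder for the field that the query string would have
searched. Unlike the query string, the `match` query needs an explicit field name.
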
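The matching rules for plain, `+` and `-` terms can be modelled on sets of words. In this sketch, a document matches if every `+` term is present and no `-` term is present. When there are no `+` terms, at least one optional term must also match. The number of optional terms present stands in for their contribution to relevance. This is a simplification that ignores analysis and real scoring, and `evaluate` is a hypothetical name.

```python
def evaluate(query: str, doc_words: set[str]) -> tuple[bool, int]:
    """Return (matches, optional_hits) for a query of plain, +required and -prohibited terms."""
    required, prohibited, optional = [], [], []
    for token in query.split():
        if token.startswith("+"):
            required.append(token[1:])
        elif token.startswith("-"):
            prohibited.append(token[1:])
        else:
            optional.append(token)
    if any(term not in doc_words for term in required):
        return False, 0  # a +term is missing
    if any(term in doc_words for term in prohibited):
        return False, 0  # a -term is present
    optional_hits = sum(1 for term in optional if term in doc_words)
    # Without any +terms, at least one optional term has to match.
    if not required and optional_hits == 0:
        return False, 0
    return True, optional_hits


if __name__ == "__main__":
    query = "quick brown +fox -news"
    for words in [{"quick", "brown", "fox"}, {"fox"}, {"quick", "fox", "news"}, {"quick", "brown"}]:
        print(sorted(words), evaluate(query, words))
```

For `quick brown +fox -news`, a document containing only `fox` still matches, with no optional hits. A document containing `news`, or one without `fox`, does not match at all.
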
===== Grouping
Multiple terms or clauses can be grouped together with parentheses, to form
sub-queries:

    (quick OR brown) AND fox

Groups can be used to target a particular field, or to boost the result
of a sub-query:

    status:(active OR pending) title:(full text search)^2

===== Reserved characters
If you need to use any of the characters that function as operators as literal
characters in your query, escape them with a leading backslash. For instance,
to search for `(1+1)=2`, you would need to write your query as `\(1\+1\)\=2`.

The reserved characters are: `+ - = && || > < ! ( ) { } [ ] ^ " ~ * ? : \ /`

Failing to escape these special characters correctly could lead to a syntax
error which prevents your query from running.

NOTE: `<` and `>` can't be escaped at all. The only way to prevent them from
attempting to create a range query is to remove them from the query string
entirely.

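A helper that applies these rules might look like the following sketch. It prefixes each reserved character with a backslash and drops `<` and `>`, because those two cannot be escaped. Escaping `&` and `|` one character at a time also neutralises `&&` and `||`. The function name is hypothetical, and the function is not part of any official client.

```python
# Reserved characters from the list above; '&' and '|' cover '&&' and '||'.
RESERVED = set('+-=&|!(){}[]^"~*?:\\/')
# '<' and '>' cannot be escaped, so they are removed instead.
UNESCAPABLE = set("<>")


def escape_query_string(text: str) -> str:
    """Escape reserved characters so they are searched for literally."""
    out = []
    for ch in text:
        if ch in UNESCAPABLE:
            continue
        if ch in RESERVED:
            out.append("\\")
        out.append(ch)
    return "".join(out)


if __name__ == "__main__":
    print(escape_query_string("(1+1)=2"))  # \(1\+1\)\=2
```
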
===== Empty query
If the query string is empty or contains only whitespace, the query will
yield an empty result set.