556 lines
17 KiB
Plaintext
556 lines
17 KiB
Plaintext
[[query-dsl-query-string-query]]
|
|
=== Query string query
|
|
++++
|
|
<titleabbrev>Query string</titleabbrev>
|
|
++++
|
|
|
|
Returns documents based on a provided query string, using a parser with a strict
|
|
syntax.
|
|
|
|
This query uses a <<query-string-syntax,syntax>> to parse and split the provided
|
|
query string based on operators, such as `AND` or `NOT`. The query
|
|
then <<analysis,analyzes>> each split text independently before returning
|
|
matching documents.
|
|
|
|
You can use the `query_string` query to create a complex search that includes
|
|
wildcard characters, searches across multiple fields, and more. While versatile,
|
|
the query is strict and returns an error if the query string includes any
|
|
invalid syntax.
|
|
|
|
[WARNING]
|
|
====
|
|
Because it returns an error for any invalid syntax, we don't recommend using
|
|
the `query_string` query for search boxes.
|
|
|
|
If you don't need to support a query syntax, consider using the
|
|
<<query-dsl-match-query, `match`>> query. If you need the features of a query
|
|
syntax, use the <<query-dsl-simple-query-string-query,`simple_query_string`>>
|
|
query, which is less strict.
|
|
====
|
|
|
|
|
|
[[query-string-query-ex-request]]
|
|
==== Example request
|
|
|
|
When running the following search, the `query_string` query splits `(new york
|
|
city) OR (big apple)` into two parts: `new york city` and `big apple`. The
|
|
`content` field's analyzer then independently converts each part into tokens
|
|
before returning matching documents. Because the query syntax does not use
|
|
whitespace as an operator, `new york city` is passed as-is to the analyzer.
|
|
|
|
[source,console]
|
|
--------------------------------------------------
|
|
GET /_search
|
|
{
|
|
"query": {
|
|
"query_string" : {
|
|
"query" : "(new york city) OR (big apple)",
|
|
"default_field" : "content"
|
|
}
|
|
}
|
|
}
|
|
--------------------------------------------------
|
|
|
|
[[query-string-top-level-params]]
|
|
==== Top-level parameters for `query_string`
|
|
`query`::
|
|
(Required, string) Query string you wish to parse and use for search. See
|
|
<<query-string-syntax>>.
|
|
|
|
`default_field`::
|
|
+
|
|
--
|
|
(Optional, string) Default field you wish to search if no field is provided in
|
|
the query string.
|
|
|
|
Defaults to the `index.query.default_field` index setting, which has a default
|
|
value of `*`. The `*` value extracts all fields that are eligible for term
|
|
queries and filters the metadata fields. All extracted fields are then
|
|
combined to build a query if no `prefix` is specified.
|
|
|
|
Searching across all eligible fields does not include <<nested,nested
|
|
documents>>. Use a <<query-dsl-nested-query,`nested` query>> to search those
|
|
documents.
|
|
|
|
[[WARNING]]
|
|
====
|
|
For mappings with a large number of fields, searching across all eligible fields
|
|
could be expensive.
|
|
|
|
There is a limit on the number of fields that can be queried at once.
|
|
It is defined by the `indices.query.bool.max_clause_count`
|
|
<<search-settings,search setting>>, which defaults to 1024.
|
|
====
|
|
--
|
|
|
|
`allow_leading_wildcard`::
|
|
(Optional, boolean) If `true`, the wildcard characters `*` and `?` are allowed
|
|
as the first character of the query string. Defaults to `true`.
|
|
|
|
`analyze_wildcard`::
|
|
(Optional, boolean) If `true`, the query attempts to analyze wildcard terms in
|
|
the query string. Defaults to `false`.
|
|
|
|
`analyzer`::
|
|
(Optional, string) <<analysis,Analyzer>> used to convert text in the
|
|
query string into tokens. Defaults to the
|
|
<<specify-index-time-analyzer,index-time analyzer>> mapped for the
|
|
`default_field`. If no analyzer is mapped, the index's default analyzer is used.
|
|
|
|
`auto_generate_synonyms_phrase_query`::
|
|
(Optional, boolean) If `true`, <<query-dsl-match-query-phrase,match phrase>>
|
|
queries are automatically created for multi-term synonyms. Defaults to `true`.
|
|
See <<query-string-synonyms>> for an example.
|
|
|
|
`boost`::
|
|
+
|
|
--
|
|
(Optional, float) Floating point number used to decrease or increase the
|
|
<<relevance-scores,relevance scores>> of the query. Defaults to `1.0`.
|
|
|
|
Boost values are relative to the default value of `1.0`. A boost value between
|
|
`0` and `1.0` decreases the relevance score. A value greater than `1.0`
|
|
increases the relevance score.
|
|
--
|
|
|
|
`default_operator`::
|
|
+
|
|
--
|
|
(Optional, string) Default boolean logic used to interpret text in the query
|
|
string if no operators are specified. Valid values are:
|
|
|
|
`OR` (Default)::
|
|
For example, a query string of `capital of Hungary` is interpreted as `capital
|
|
OR of OR Hungary`.
|
|
|
|
`AND`::
|
|
For example, a query string of `capital of Hungary` is interpreted as `capital
|
|
AND of AND Hungary`.
|
|
--
|
|
|
|
`enable_position_increments`::
|
|
(Optional, boolean) If `true`, enable position increments in queries constructed
|
|
from a `query_string` search. Defaults to `true`.
|
|
|
|
`fields`::
|
|
+
|
|
--
|
|
(Optional, array of strings) Array of fields you wish to search.
|
|
|
|
You can use this parameter query to search across multiple fields. See
|
|
<<query-string-multi-field>>.
|
|
--
|
|
|
|
`fuzziness`::
|
|
(Optional, string) Maximum edit distance allowed for matching. See <<fuzziness>>
|
|
for valid values and more information.
|
|
|
|
`fuzzy_max_expansions`::
|
|
(Optional, integer) Maximum number of terms to which the query expands for fuzzy
|
|
matching. Defaults to `50`.
|
|
|
|
`fuzzy_prefix_length`::
|
|
(Optional, integer) Number of beginning characters left unchanged for fuzzy
|
|
matching. Defaults to `0`.
|
|
|
|
`fuzzy_transpositions`::
|
|
(Optional, boolean) If `true`, edits for fuzzy matching include
|
|
transpositions of two adjacent characters (ab → ba). Defaults to `true`.
|
|
|
|
`lenient`::
|
|
(Optional, boolean) If `true`, format-based errors, such as providing a text
|
|
value for a <<number,numeric>> field, are ignored. Defaults to `false`.
|
|
|
|
`max_determinized_states`::
|
|
+
|
|
--
|
|
(Optional, integer) Maximum number of
|
|
https://en.wikipedia.org/wiki/Deterministic_finite_automaton[automaton states]
|
|
required for the query. Default is `10000`.
|
|
|
|
{es} uses https://lucene.apache.org/core/[Apache Lucene] internally to parse
|
|
regular expressions. Lucene converts each regular expression to a finite
|
|
automaton containing a number of determinized states.
|
|
|
|
You can use this parameter to prevent that conversion from unintentionally
|
|
consuming too many resources. You may need to increase this limit to run complex
|
|
regular expressions.
|
|
--
|
|
|
|
`minimum_should_match`::
|
|
(Optional, string) Minimum number of clauses that must match for a document to
|
|
be returned. See the <<query-dsl-minimum-should-match, `minimum_should_match`
|
|
parameter>> for valid values and more information. See
|
|
<<query-string-min-should-match>> for an example.
|
|
|
|
`quote_analyzer`::
|
|
+
|
|
--
|
|
(Optional, string) <<analysis,Analyzer>> used to convert quoted text in the
|
|
query string into tokens. Defaults to the
|
|
<<search-quote-analyzer,`search_quote_analyzer`>> mapped for the
|
|
`default_field`.
|
|
|
|
For quoted text, this parameter overrides the analyzer specified in the
|
|
`analyzer` parameter.
|
|
--
|
|
|
|
`phrase_slop`::
|
|
(Optional, integer) Maximum number of positions allowed between matching tokens
|
|
for phrases. Defaults to `0`. If `0`, exact phrase matches are required.
|
|
Transposed terms have a slop of `2`.
|
|
|
|
`quote_field_suffix`::
|
|
+
|
|
--
|
|
(Optional, string) Suffix appended to quoted text in the query string.
|
|
|
|
You can use this suffix to use a different analysis method for exact matches.
|
|
See <<mixing-exact-search-with-stemming>>.
|
|
--
|
|
|
|
`rewrite`::
|
|
(Optional, string) Method used to rewrite the query. For valid values and more
|
|
information, see the <<query-dsl-multi-term-rewrite, `rewrite` parameter>>.
|
|
|
|
`time_zone`::
|
|
+
|
|
--
|
|
(Optional, string)
|
|
https://en.wikipedia.org/wiki/List_of_UTC_time_offsets[Coordinated Universal
|
|
Time (UTC) offset] or
|
|
https://en.wikipedia.org/wiki/List_of_tz_database_time_zones[IANA time zone]
|
|
used to convert `date` values in the query string to UTC.
|
|
|
|
Valid values are ISO 8601 UTC offsets, such as `+01:00` or -`08:00`, and IANA
|
|
time zone IDs, such as `America/Los_Angeles`.
|
|
|
|
[NOTE]
|
|
====
|
|
The `time_zone` parameter does **not** affect the <<date-math,date math>> value
|
|
of `now`. `now` is always the current system time in UTC. However, the
|
|
`time_zone` parameter does convert dates calculated using `now` and
|
|
<<date-math,date math rounding>>. For example, the `time_zone` parameter will
|
|
convert a value of `now/d`.
|
|
====
|
|
--
|
|
|
|
[[query-string-query-notes]]
|
|
==== Notes
|
|
|
|
include::query-string-syntax.asciidoc[]
|
|
|
|
[[query-string-nested]]
|
|
====== Avoid using the `query_string` query for nested documents
|
|
|
|
`query_string` searches do not return <<nested,nested>> documents. To search
|
|
nested documents, use the <<query-dsl-nested-query, `nested` query>>.
|
|
|
|
[[query-string-multi-field]]
|
|
====== Search multiple fields
|
|
|
|
You can use the `fields` parameter to perform a `query_string` search across
|
|
multiple fields.
|
|
|
|
The idea of running the `query_string` query against multiple fields is to
|
|
expand each query term to an OR clause like this:
|
|
|
|
```
|
|
field1:query_term OR field2:query_term | ...
|
|
```
|
|
|
|
For example, the following query
|
|
|
|
[source,console]
|
|
--------------------------------------------------
|
|
GET /_search
|
|
{
|
|
"query": {
|
|
"query_string" : {
|
|
"fields" : ["content", "name"],
|
|
"query" : "this AND that"
|
|
}
|
|
}
|
|
}
|
|
--------------------------------------------------
|
|
|
|
matches the same words as
|
|
|
|
|
|
[source,console]
|
|
--------------------------------------------------
|
|
GET /_search
|
|
{
|
|
"query": {
|
|
"query_string": {
|
|
"query": "(content:this OR name:this) AND (content:that OR name:that)"
|
|
}
|
|
}
|
|
}
|
|
--------------------------------------------------
|
|
|
|
Since several queries are generated from the individual search terms,
|
|
combining them is automatically done using a `dis_max` query with a `tie_breaker`.
|
|
For example (the `name` is boosted by 5 using `^5` notation):
|
|
|
|
[source,console]
|
|
--------------------------------------------------
|
|
GET /_search
|
|
{
|
|
"query": {
|
|
"query_string" : {
|
|
"fields" : ["content", "name^5"],
|
|
"query" : "this AND that OR thus",
|
|
"tie_breaker" : 0
|
|
}
|
|
}
|
|
}
|
|
--------------------------------------------------
|
|
|
|
Simple wildcard can also be used to search "within" specific inner
|
|
elements of the document. For example, if we have a `city` object with
|
|
several fields (or inner object with fields) in it, we can automatically
|
|
search on all "city" fields:
|
|
|
|
[source,console]
|
|
--------------------------------------------------
|
|
GET /_search
|
|
{
|
|
"query": {
|
|
"query_string" : {
|
|
"fields" : ["city.*"],
|
|
"query" : "this AND that OR thus"
|
|
}
|
|
}
|
|
}
|
|
--------------------------------------------------
|
|
|
|
Another option is to provide the wildcard fields search in the query
|
|
string itself (properly escaping the `*` sign), for example:
|
|
`city.\*:something`:
|
|
|
|
[source,console]
|
|
--------------------------------------------------
|
|
GET /_search
|
|
{
|
|
"query": {
|
|
"query_string" : {
|
|
"query" : "city.\\*:(this AND that OR thus)"
|
|
}
|
|
}
|
|
}
|
|
--------------------------------------------------
|
|
|
|
NOTE: Since `\` (backslash) is a special character in json strings, it needs to
|
|
be escaped, hence the two backslashes in the above `query_string`.
|
|
|
|
The fields parameter can also include pattern based field names,
|
|
allowing to automatically expand to the relevant fields (dynamically
|
|
introduced fields included). For example:
|
|
|
|
[source,console]
|
|
--------------------------------------------------
|
|
GET /_search
|
|
{
|
|
"query": {
|
|
"query_string" : {
|
|
"fields" : ["content", "name.*^5"],
|
|
"query" : "this AND that OR thus"
|
|
}
|
|
}
|
|
}
|
|
--------------------------------------------------
|
|
|
|
[[query-string-multi-field-parms]]
|
|
====== Additional parameters for multiple field searches
|
|
|
|
When running the `query_string` query against multiple fields, the
|
|
following additional parameters are supported.
|
|
|
|
`type`::
|
|
+
|
|
--
|
|
(Optional, string) Determines how the query matches and scores documents. Valid
|
|
values are:
|
|
|
|
`best_fields` (Default)::
|
|
Finds documents which match any field and uses the highest
|
|
<<relevance-scores,`_score`>> from any matching field. See
|
|
<<type-best-fields>>.
|
|
|
|
`bool_prefix`::
|
|
Creates a `match_bool_prefix` query on each field and combines the `_score` from
|
|
each field. See <<type-bool-prefix>>.
|
|
|
|
`cross_fields`::
|
|
Treats fields with the same `analyzer` as though they were one big field. Looks
|
|
for each word in **any** field. See <<type-cross-fields>>.
|
|
|
|
`most_fields`::
|
|
Finds documents which match any field and combines the `_score` from each field.
|
|
See <<type-most-fields>>.
|
|
|
|
`phrase`::
|
|
Runs a `match_phrase` query on each field and uses the `_score` from the best
|
|
field. See <<type-phrase>>.
|
|
|
|
`phrase_prefix`::
|
|
Runs a `match_phrase_prefix` query on each field and uses the `_score` from the
|
|
best field. See <<type-phrase>>.
|
|
|
|
NOTE:
|
|
Additional top-level `multi_match` parameters may be available based on the
|
|
<<multi-match-types,`type`>> value.
|
|
--
|
|
|
|
[[query-string-synonyms]]
|
|
===== Synonyms and the `query_string` query
|
|
|
|
The `query_string` query supports multi-terms synonym expansion with the <<analysis-synonym-graph-tokenfilter,
|
|
synonym_graph>> token filter. When this filter is used, the parser creates a phrase query for each multi-terms synonyms.
|
|
For example, the following synonym: `ny, new york` would produce:
|
|
|
|
`(ny OR ("new york"))`
|
|
|
|
It is also possible to match multi terms synonyms with conjunctions instead:
|
|
|
|
[source,console]
|
|
--------------------------------------------------
|
|
GET /_search
|
|
{
|
|
"query": {
|
|
"query_string" : {
|
|
"default_field": "title",
|
|
"query" : "ny city",
|
|
"auto_generate_synonyms_phrase_query" : false
|
|
}
|
|
}
|
|
}
|
|
--------------------------------------------------
|
|
|
|
The example above creates a boolean query:
|
|
|
|
`(ny OR (new AND york)) city`
|
|
|
|
that matches documents with the term `ny` or the conjunction `new AND york`.
|
|
By default the parameter `auto_generate_synonyms_phrase_query` is set to `true`.
|
|
|
|
[[query-string-min-should-match]]
|
|
===== How `minimum_should_match` works
|
|
|
|
The `query_string` splits the query around each operator to create a boolean
|
|
query for the entire input. You can use `minimum_should_match` to control how
|
|
many "should" clauses in the resulting query should match.
|
|
|
|
[source,console]
|
|
--------------------------------------------------
|
|
GET /_search
|
|
{
|
|
"query": {
|
|
"query_string": {
|
|
"fields": [
|
|
"title"
|
|
],
|
|
"query": "this that thus",
|
|
"minimum_should_match": 2
|
|
}
|
|
}
|
|
}
|
|
--------------------------------------------------
|
|
|
|
The example above creates a boolean query:
|
|
|
|
`(title:this title:that title:thus)~2`
|
|
|
|
that matches documents with at least two of the terms `this`, `that` or `thus`
|
|
in the single field `title`.
|
|
|
|
[[query-string-min-should-match-multi]]
|
|
===== How `minimum_should_match` works for multiple fields
|
|
|
|
[source,console]
|
|
--------------------------------------------------
|
|
GET /_search
|
|
{
|
|
"query": {
|
|
"query_string": {
|
|
"fields": [
|
|
"title",
|
|
"content"
|
|
],
|
|
"query": "this that thus",
|
|
"minimum_should_match": 2
|
|
}
|
|
}
|
|
}
|
|
--------------------------------------------------
|
|
|
|
The example above creates a boolean query:
|
|
|
|
`((content:this content:that content:thus) | (title:this title:that title:thus))`
|
|
|
|
that matches documents with the disjunction max over the fields `title` and
|
|
`content`. Here the `minimum_should_match` parameter can't be applied.
|
|
|
|
[source,console]
|
|
--------------------------------------------------
|
|
GET /_search
|
|
{
|
|
"query": {
|
|
"query_string": {
|
|
"fields": [
|
|
"title",
|
|
"content"
|
|
],
|
|
"query": "this OR that OR thus",
|
|
"minimum_should_match": 2
|
|
}
|
|
}
|
|
}
|
|
--------------------------------------------------
|
|
|
|
Adding explicit operators forces each term to be considered as a separate clause.
|
|
|
|
The example above creates a boolean query:
|
|
|
|
`((content:this | title:this) (content:that | title:that) (content:thus | title:thus))~2`
|
|
|
|
that matches documents with at least two of the three "should" clauses, each of
|
|
them made of the disjunction max over the fields for each term.
|
|
|
|
[[query-string-min-should-match-cross]]
|
|
===== How `minimum_should_match` works for cross-field searches
|
|
|
|
A `cross_fields` value in the `type` field indicates fields with the same
|
|
analyzer are grouped together when the input is analyzed.
|
|
|
|
[source,console]
|
|
--------------------------------------------------
|
|
GET /_search
|
|
{
|
|
"query": {
|
|
"query_string": {
|
|
"fields": [
|
|
"title",
|
|
"content"
|
|
],
|
|
"query": "this OR that OR thus",
|
|
"type": "cross_fields",
|
|
"minimum_should_match": 2
|
|
}
|
|
}
|
|
}
|
|
--------------------------------------------------
|
|
|
|
The example above creates a boolean query:
|
|
|
|
`(blended(terms:[field2:this, field1:this]) blended(terms:[field2:that, field1:that]) blended(terms:[field2:thus, field1:thus]))~2`
|
|
|
|
that matches documents with at least two of the three per-term blended queries.
|
|
|
|
==== Notes
|
|
===== Allow expensive queries
|
|
Query string query can be internally be transformed to a <<query-dsl-prefix-query, `prefix query`>> which means
|
|
that if the prefix queries are disabled as explained <<prefix-query-allow-expensive-queries, here>> the query will not be
|
|
executed and an exception will be thrown.
|