[DOCS] Reformat query string query (#45296)

James Rodewig 2019-08-12 11:17:19 -04:00
parent bff4ef2d95
commit e3f618e1d3
3 changed files with 263 additions and 161 deletions


@ -185,15 +185,3 @@ The example above creates a boolean query:
that matches documents with the term `ny` or the conjunction `new AND york`.
By default the parameter `auto_generate_synonyms_phrase_query` is set to `true`.
.Comparison to query_string / field
**************************************************
The match family of queries does not go through a "query parsing"
process. It does not support field name prefixes, wildcard characters,
or other "advanced" features. For this reason, chances of it failing are
very small / non existent, and it provides an excellent behavior when it
comes to just analyze and run that text as a query behavior (which is
usually what a text search box does).
**************************************************


@ -4,8 +4,39 @@
<titleabbrev>Query string</titleabbrev>
++++
A query that uses a query parser in order to parse its content. Here is Returns documents based on a provided query string, using a parser with a strict
an example: syntax.
This query uses a <<query-string-syntax,syntax>> to parse and split the provided
query string based on operators, such as `AND` or `NOT`. The query
then <<analysis,analyzes>> each split text independently before returning
matching documents.
You can use the `query_string` query to create a complex search that includes
wildcard characters, searches across multiple fields, and more. While versatile,
the query is strict and returns an error if the query string includes any
invalid syntax.
[WARNING]
====
Because it returns an error for any invalid syntax, we don't recommend using
the `query_string` query for search boxes.
If you don't need to support a query syntax, consider using the
<<query-dsl-match-query, `match`>> query. If you need the features of a query
syntax, use the <<query-dsl-simple-query-string-query,`simple_query_string`>>
query, which is less strict.
====
[[query-string-query-ex-request]]
==== Example request
When running the following search, the `query_string` query splits `(new york
city) OR (big apple)` into two parts: `new york city` and `big apple`. The
`content` field's analyzer then independently converts each part into tokens
before returning matching documents. Because the query syntax does not use
whitespace as an operator, `new york city` is passed as-is to the analyzer.
[source,js]
--------------------------------------------------
@ -13,154 +44,211 @@ GET /_search
{ {
"query": { "query": {
"query_string" : { "query_string" : {
"default_field" : "content", "query" : "(new york city) OR (big apple)",
"query" : "this AND that OR thus" "default_field" : "content"
} }
} }
} }
-------------------------------------------------- --------------------------------------------------
// CONSOLE // CONSOLE
The `query_string` query parses the input and splits text around operators. [[query-string-top-level-params]]
Each textual part is analyzed independently of each other. For instance the following query: ==== Top-level parameters for `query_string`
`query`::
(Required, string) Query string you wish to parse and use for search. See
<<query-string-syntax>>.
[source,js] `default_field`::
-------------------------------------------------- +
GET /_search --
{ (Optional, string) Default field you wish to search if no field is provided in
"query": { the query string.
"query_string" : {
"default_field" : "content",
"query" : "(new york city) OR (big apple)" <1>
}
}
}
--------------------------------------------------
// CONSOLE
<1> will be split into `new york city` and `big apple` and each part is then Defaults to the `index.query.default_field` index setting, which has a default
analyzed independently by the analyzer configured for the field. value of `*`. The `*` value extracts all fields that are eligible to term
queries and filters the metadata fields. All extracted fields are then combined
to build a query if no `prefix` is specified.
WARNING: Whitespaces are not considered operators, this means that `new york city` WARNING: There is a limit on the number of fields that can be queried at once.
will be passed "as is" to the analyzer configured for the field. If the field is a `keyword` It is defined by the `indices.query.bool.max_clause_count`
field the analyzer will create a single term `new york city` and the query builder will <<search-settings,search setting>>, which defaults to 1024.
use this term in the query. If you want to query each term separately you need to add explicit --
operators around the terms (e.g. `new AND york AND city`).
When multiple fields are provided it is also possible to modify how the different `allow_leading_wildcard`::
field queries are combined inside each textual part using the `type` parameter. (Optional, boolean) If `true`, the wildcard characters `*` and `?` are allowed
The possible modes are described <<multi-match-types, here>> and the default is `best_fields`. as the first character of the query string. Defaults to `true`.
The `query_string` top level parameters include: `analyze_wildcard`::
(Optional, boolean) If `true`, the query attempts to analyze wildcard terms in
the query string. Defaults to `false`.
[cols="<,<",options="header",] `analyzer`::
|======================================================================= (Optional, string) <<analysis,Analyzer>> used to convert text in the
|Parameter |Description query string into tokens. Defaults to the
|`query` |The actual query to be parsed. See <<query-string-syntax>>. <<specify-index-time-analyzer,index-time analyzer>> mapped for the
`default_field`. If no analyzer is mapped, the index's default analyzer is used.
|`default_field` |The default field for query terms if no prefix field is `auto_generate_synonyms_phrase_query`::
specified. Defaults to the `index.query.default_field` index settings, which in (Optional, boolean) If `true`, <<query-dsl-match-query-phrase,match phrase>>
turn defaults to `*`. `*` extracts all fields in the mapping that are eligible queries are automatically created for multi-term synonyms. Defaults to `true`.
to term queries and filters the metadata fields. All extracted fields are then See <<query-string-synonyms>> for an example.
combined to build a query when no prefix field is provided.
WARNING: There is a limit on the number of fields that can be queried `boost`::
at once. It is defined by the `indices.query.bool.max_clause_count` <<search-settings>> +
which defaults to 1024. --
(Optional, float) Floating point number used to decrease or increase the
<<relevance-scores,relevance scores>> of the query. Defaults to `1.0`.
|`default_operator` |The default operator used if no explicit operator Boost values are relative to the default value of `1.0`. A boost value between
is specified. For example, with a default operator of `OR`, the query `0` and `1.0` decreases the relevance score. A value greater than `1.0`
`capital of Hungary` is translated to `capital OR of OR Hungary`, and increases the relevance score.
with default operator of `AND`, the same query is translated to --
`capital AND of AND Hungary`. The default value is `OR`.
|`analyzer` |The analyzer name used to analyze the query string. `default_operator`::
+
--
(Optional, string) Default boolean logic used to interpret text in the query
string if no operators are specified. Valid values are:
|`quote_analyzer` |The name of the analyzer that is used to analyze `OR` (Default)::
quoted phrases in the query string. For those parts, it overrides other For example, a query string of `capital of Hungary` is interpreted as `capital
analyzers that are set using the `analyzer` parameter or the OR of OR Hungary`.
<<search-quote-analyzer,`search_quote_analyzer`>> setting.
|`allow_leading_wildcard` |When set, `*` or `?` are allowed as the first `AND`::
character. Defaults to `true`. For example, a query string of `capital of Hungary` is interpreted as `capital
AND of AND Hungary`.
--
|`enable_position_increments` |Set to `true` to enable position `enable_position_increments`::
increments in result queries. Defaults to `true`. (Optional, boolean) If `true`, enable position increments in queries constructed
from a `query_string` search. Defaults to `true`.
|`fuzzy_max_expansions` |Controls the number of terms fuzzy queries will `fields`::
expand to. Defaults to `50` +
--
(Optional, array of strings) Array of fields you wish to search.
|`fuzziness` |Set the fuzziness for fuzzy queries. Defaults You can use this parameter to search across multiple fields. See
to `AUTO`. See <<fuzziness>> for allowed settings. <<query-string-multi-field>>.
--
|`fuzzy_prefix_length` |Set the prefix length for fuzzy queries. Default `fuzziness`::
is `0`. (Optional, string) Maximum edit distance allowed for matching. See <<fuzziness>>
for valid values and more information.
|`fuzzy_transpositions` |Set to `false` to disable fuzzy transpositions (`ab` -> `ba`). `fuzzy_max_expansions`::
Default is `true`. (Optional, integer) Maximum number of terms to which the query expands for fuzzy
matching. Defaults to `50`.
|`phrase_slop` |Sets the default slop for phrases. If zero, then exact `fuzzy_prefix_length`::
phrase matches are required. Default value is `0`. (Optional, integer) Number of beginning characters left unchanged for fuzzy
matching. Defaults to `0`.
|`boost` |Sets the boost value of the query. Defaults to `1.0`. `fuzzy_transpositions`::
(Optional, boolean) If `true`, edits for fuzzy matching include
transpositions of two adjacent characters (ab → ba). Defaults to `true`.
|`analyze_wildcard` |By default, wildcards terms in a query string are `lenient`::
not analyzed. By setting this value to `true`, a best effort will be (Optional, boolean) If `true`, format-based errors, such as providing a text
made to analyze those as well. value for a <<number,numeric>> field, are ignored. Defaults to `false`.
|`max_determinized_states` |Limit on how many automaton states regexp `max_determinized_states`::
queries are allowed to create. This protects against too-difficult +
(e.g. exponentially hard) regexps. Defaults to 10000. --
(Optional, integer) Maximum number of
https://en.wikipedia.org/wiki/Deterministic_finite_automaton[automaton states]
required for the query. Default is `10000`.
|`minimum_should_match` |A value controlling how many "should" clauses {es} uses https://lucene.apache.org/core/[Apache Lucene] internally to parse
in the resulting boolean query should match. It can be an absolute value regular expressions. Lucene converts each regular expression to a finite
(`2`), a percentage (`30%`) or a automaton containing a number of determinized states.
<<query-dsl-minimum-should-match,combination of
both>>.
|`lenient` |If set to `true` will cause format based failures (like You can use this parameter to prevent that conversion from unintentionally
providing text to a numeric field) to be ignored. consuming too many resources. You may need to increase this limit to run complex
regular expressions.
--
|`time_zone` | Time Zone to be applied to any range query related to dates. `minimum_should_match`::
(Optional, string) Minimum number of clauses that must match for a document to
be returned. See the <<query-dsl-minimum-should-match, `minimum_should_match`
parameter>> for valid values and more information. See
<<query-string-min-should-match>> for an example.
|`quote_field_suffix` | A suffix to append to fields for quoted parts of `quote_analyzer`::
the query string. This allows to use a field that has a different analysis chain +
for exact matching. Look <<mixing-exact-search-with-stemming,here>> for a --
comprehensive example. (Optional, string) <<analysis,Analyzer>> used to convert quoted text in the
query string into tokens. Defaults to the
<<search-quote-analyzer,`search_quote_analyzer`>> mapped for the
`default_field`.
|`auto_generate_synonyms_phrase_query` |Whether phrase queries should be automatically generated for multi terms synonyms. For quoted text, this parameter overrides the analyzer specified in the
Defaults to `true`. `analyzer` parameter.
--
|======================================================================= `phrase_slop`::
(Optional, integer) Maximum number of positions allowed between matching tokens
for phrases. Defaults to `0`. If `0`, exact phrase matches are required.
Transposed terms have a slop of `2`.
When a multi term query is being generated, one can control how it gets `quote_field_suffix`::
rewritten using the +
<<query-dsl-multi-term-rewrite,rewrite>> --
parameter. (Optional, string) Suffix appended to quoted text in the query string.
[float] You can use this suffix to use a different analysis method for exact matches.
==== Default Field See <<mixing-exact-search-with-stemming>>.
--
When not explicitly specifying the field to search on in the query `rewrite`::
string syntax, the `index.query.default_field` will be used to derive (Optional, string) Method used to rewrite the query. For valid values and more
which field to search on. If the `index.query.default_field` is not specified, information, see the <<query-dsl-multi-term-rewrite, `rewrite` parameter>>.
the `query_string` will automatically attempt to determine the existing fields in the index's
mapping that are queryable, and perform the search on those fields.
This will not include nested documents, use a nested query to search those documents.
NOTE: For mappings with a large number of fields, searching across all queryable `time_zone`::
fields in the mapping could be expensive. +
--
(Optional, string)
https://en.wikipedia.org/wiki/List_of_UTC_time_offsets[Coordinated Universal
Time (UTC) offset] or
https://en.wikipedia.org/wiki/List_of_tz_database_time_zones[IANA time zone]
used to convert `date` values in the query string to UTC.
[float] Valid values are ISO 8601 UTC offsets, such as `+01:00` or `-08:00`, and IANA
==== Multi Field time zone IDs, such as `America/Los_Angeles`.
The `query_string` query can also run against multiple fields. Fields can be [NOTE]
provided via the `fields` parameter (example below). ====
The `time_zone` parameter does **not** affect the <<date-math,date math>> value
of `now`. `now` is always the current system time in UTC. However, the
`time_zone` parameter does convert dates calculated using `now` and
<<date-math,date math rounding>>. For example, the `time_zone` parameter will
convert a value of `now/d`.
====
--
[[query-string-query-notes]]
==== Notes
include::query-string-syntax.asciidoc[]
[[query-string-nested]]
====== Avoid using the `query_string` query for nested documents
`query_string` searches do not return <<nested,nested>> documents. To search
nested documents, use the <<query-dsl-nested-query, `nested` query>>.
[[query-string-multi-field]]
====== Search multiple fields
You can use the `fields` parameter to perform a `query_string` search across
multiple fields.
The idea of running the `query_string` query against multiple fields is to
expand each query term to an OR clause like this:
field1:query_term OR field2:query_term | ... ```
field1:query_term OR field2:query_term | ...
```
For example, the following query
@ -252,21 +340,6 @@ GET /_search
NOTE: Since `\` (backslash) is a special character in json strings, it needs to
be escaped, hence the two backslashes in the above `query_string`.
When running the `query_string` query against multiple fields, the
following additional parameters are allowed:
[cols="<,<",options="header",]
|=======================================================================
|Parameter |Description
|`type` |How the fields should be combined to build the text query.
See <<multi-match-types, types>> for a complete example.
Defaults to `best_fields`
|`tie_breaker` |The disjunction max tie breaker for multi fields.
Defaults to `0`
|=======================================================================
The fields parameter can also include pattern based field names,
allowing to automatically expand to the relevant fields (dynamically
introduced fields included). For example:
@ -285,8 +358,50 @@ GET /_search
--------------------------------------------------
// CONSOLE
[float] [[query-string-multi-field-parms]]
==== Synonyms ====== Additional parameters for multiple field searches
When running the `query_string` query against multiple fields, the
following additional parameters are supported.
`type`::
+
--
(Optional, string) Determines how the query matches and scores documents. Valid
values are:
`best_fields` (Default)::
Finds documents which match any field and uses the highest
<<relevance-scores,`_score`>> from any matching field. See
<<type-best-fields>>.
`bool_prefix`::
Creates a `match_bool_prefix` query on each field and combines the `_score` from
each field. See <<type-bool-prefix>>.
`cross_fields`::
Treats fields with the same `analyzer` as though they were one big field. Looks
for each word in **any** field. See <<type-cross-fields>>.
`most_fields`::
Finds documents which match any field and combines the `_score` from each field.
See <<type-most-fields>>.
`phrase`::
Runs a `match_phrase` query on each field and uses the `_score` from the best
field. See <<type-phrase>>.
`phrase_prefix`::
Runs a `match_phrase_prefix` query on each field and uses the `_score` from the
best field. See <<type-phrase>>.
NOTE:
Additional top-level `multi_match` parameters may be available based on the
<<multi-match-types,`type`>> value.
--
[[query-string-synonyms]]
===== Synonyms and the `query_string` query
The `query_string` query supports multi-terms synonym expansion with the <<analysis-synonym-graph-tokenfilter,
synonym_graph>> token filter. When this filter is used, the parser creates a phrase query for each multi-terms synonyms.
@ -318,8 +433,8 @@ The example above creates a boolean query:
that matches documents with the term `ny` or the conjunction `new AND york`.
By default the parameter `auto_generate_synonyms_phrase_query` is set to `true`.
[float] [[query-string-min-should-match]]
==== Minimum should match ===== How `minimum_should_match` works
The `query_string` splits the query around each operator to create a boolean
query for the entire input. You can use `minimum_should_match` to control how
@ -349,8 +464,8 @@ The example above creates a boolean query:
that matches documents with at least two of the terms `this`, `that` or `thus`
in the single field `title`.
[float] [[query-string-min-should-match-multi]]
===== Multi Field ===== How `minimum_should_match` works for multiple fields
[source,js]
--------------------------------------------------
@ -404,8 +519,11 @@ The example above creates a boolean query:
that matches documents with at least two of the three "should" clauses, each of
them made of the disjunction max over the fields for each term.
[float] [[query-string-min-should-match-cross]]
===== Cross Field ===== How `minimum_should_match` works for cross-field searches
A `cross_fields` value in the `type` field indicates fields with the same
analyzer are grouped together when the input is analyzed.
[source,js]
--------------------------------------------------
@ -426,13 +544,8 @@ GET /_search
--------------------------------------------------
// CONSOLE
The `cross_fields` value in the `type` field indicates that fields that have the
same analyzer should be grouped together when the input is analyzed.
The example above creates a boolean query:
`(blended(terms:[field2:this, field1:this]) blended(terms:[field2:that, field1:that]) blended(terms:[field2:thus, field1:thus]))~2`
that matches documents with at least two of the three per-term blended queries.
include::query-string-syntax.asciidoc[]

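The `minimum_should_match` examples in this file all send a request body of the same shape. A minimal sketch of assembling one in Python follows. The helper name `build_query_string_body` and the sample query and field names are assumptions for illustration; they are not part of Elasticsearch or its clients, and the helper does not validate the query string syntax.

```python
import json


def build_query_string_body(query, fields=None, default_field=None, **params):
    """Assemble a search request body for a `query_string` query.

    Hypothetical helper: it only builds a dict. Extra keyword arguments
    (for example `type`, `minimum_should_match`, `default_operator`) are
    copied into the `query_string` clause unchanged.
    """
    clause = {"query": query}
    if fields is not None:
        clause["fields"] = list(fields)
    if default_field is not None:
        clause["default_field"] = default_field
    clause.update(params)
    return {"query": {"query_string": clause}}


# Illustrative values only: a cross-field search that requires two
# of the three terms to match.
body = build_query_string_body(
    "this OR that OR thus",
    fields=["title", "content"],
    type="cross_fields",
    minimum_should_match=2,
)
print(json.dumps(body, indent=2))
```

The printed JSON can be sent as the body of a `GET /_search` request.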

@ -1,6 +1,6 @@
[[query-string-syntax]]
==== Query string syntax ===== Query string syntax
The query string ``mini-language'' is used by the
<<query-dsl-query-string-query>> and by the
@ -14,10 +14,9 @@ phrase, in the same order.
Operators allow you to customize the search -- the available options are
explained below.
===== Field names ====== Field names
As mentioned in <<query-dsl-query-string-query>>, the `default_field` is searched for the You can specify fields to search in the query syntax:
search terms, but it is possible to specify other fields in the query syntax:
* where the `status` field contains `active`
@ -40,7 +39,7 @@ search terms, but it is possible to specify other fields in the query syntax:
_exists_:title _exists_:title
===== Wildcards ====== Wildcards
Wildcard searches can be run on individual terms, using `?` to replace
a single character, and `*` to replace zero or more characters:
@ -88,7 +87,7 @@ analyzed and a boolean query will be built out of the different tokens, by
ensuring exact matches on the first N-1 tokens, and prefix match on the last
token.
===== Regular expressions ====== Regular expressions
Regular expression patterns can be embedded in the query string by
wrapping them in forward-slashes (`"/"`):
@ -108,7 +107,7 @@ Elasticsearch to visit every term in the index:
Use with caution!
=======
===== Fuzziness ====== Fuzziness
We can search for terms that are
similar to, but not exactly like our search terms, using the ``fuzzy''
@ -128,7 +127,7 @@ sufficient to catch 80% of all human misspellings. It can be specified as:
quikc~1 quikc~1
===== Proximity searches ====== Proximity searches
While a phrase query (eg `"john smith"`) expects all of the terms in exactly
the same order, a proximity query allows the specified words to be further
@ -143,7 +142,7 @@ query string, the more relevant that document is considered to be. When
compared to the above example query, the phrase `"quick fox"` would be
considered more relevant than `"quick brown fox"`.
===== Ranges ====== Ranges
Ranges can be specified for date, numeric or string fields. Inclusive ranges
are specified with square brackets `[min TO max]` and exclusive ranges with
@ -197,7 +196,7 @@ The parsing of ranges in query strings can be complex and error prone. It is
much more reliable to use an explicit <<query-dsl-range-query,`range` query>>.
===== Boosting ====== Boosting
Use the _boost_ operator `^` to make one term more relevant than another.
For instance, if we want to find all documents about foxes, but we are
@ -212,7 +211,7 @@ Boosts can also be applied to phrases or to groups:
"john smith"^2 (foo bar)^4 "john smith"^2 (foo bar)^4
===== Boolean operators ====== Boolean operators
By default, all terms are optional, as long as one term matches. A search
for `foo bar baz` will find any document that contains one or more of
@ -255,7 +254,7 @@ would look like this:
} }
===== Grouping ====== Grouping
Multiple terms or clauses can be grouped together with parentheses, to form
sub-queries:
@ -267,7 +266,7 @@ of a sub-query:
status:(active OR pending) title:(full text search)^2 status:(active OR pending) title:(full text search)^2
===== Reserved characters ====== Reserved characters
If you need to use any of the characters which function as operators in your
query itself (and not as operators), then you should escape them with
@ -283,7 +282,9 @@ NOTE: `<` and `>` can't be escaped at all. The only way to prevent them from
attempting to create a range query is to remove them from the query string
entirely.
===== Empty Query ====== Whitespaces and empty queries
Whitespace is not considered an operator.
If the query string is empty or only contains whitespaces the query will
yield an empty result set.
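The reserved-character rules in this file (escape operator characters with a backslash, and remove `<` and `>` entirely because they can't be escaped) could be applied to user input along these lines. This is a sketch, not an official helper: the function name `escape_query_string` is hypothetical, and the character set is assumed from the Elasticsearch query string syntax reference, because the list itself is elided in this diff.

```python
import json

# Assumed set of reserved characters; `&` and `|` are escaped one
# character at a time, which also covers `&&` and `||`.
RESERVED_CHARS = set('+-=&|!(){}[]^"~*?:\\/')


def escape_query_string(text):
    """Escape reserved characters so they are treated as literal text."""
    escaped = []
    for ch in text:
        if ch in "<>":
            # `<` and `>` can't be escaped at all, so drop them.
            continue
        if ch in RESERVED_CHARS:
            escaped.append("\\")
        escaped.append(ch)
    return "".join(escaped)


user_input = "(1+1)=2 <b>"
escaped_input = escape_query_string(user_input)
print(escaped_input)

# Serializing to JSON doubles each backslash, which matches the note
# about two backslashes appearing in the request body.
print(json.dumps({"query": escaped_input}))
```

Escaping everything turns off all query syntax for that input, so this only suits text that should be matched literally.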