[DOCS] Reformat query string query (#45296)

2019-08-12 11:17:19 -04:00 · 2019-08-12 11:17:19 -04:00 · e3f618e1d3
parent bff4ef2d95
commit e3f618e1d3
3 changed files with 263 additions and 161 deletions
--- a/docs/reference/query-dsl/match-query.asciidoc
+++ b/docs/reference/query-dsl/match-query.asciidoc
@@ -185,15 +185,3 @@ The example above creates a boolean query:
 that matches documents with the term `ny` or the conjunction `new AND york`.
 By default the parameter `auto_generate_synonyms_phrase_query` is set to `true`.

-
-.Comparison to query_string / field
-**************************************************
-
-The match family of queries does not go through a "query parsing"
-process. It does not support field name prefixes, wildcard characters,
-or other "advanced" features. For this reason, chances of it failing are
-very small / non existent, and it provides an excellent behavior when it
-comes to just analyze and run that text as a query behavior (which is
-usually what a text search box does).
-
-**************************************************
--- a/docs/reference/query-dsl/query-string-query.asciidoc
+++ b/docs/reference/query-dsl/query-string-query.asciidoc
@@ -4,8 +4,39 @@
 <titleabbrev>Query string</titleabbrev>
 ++++

-A query that uses a query parser in order to parse its content. Here is
-an example:
+Returns documents based on a provided query string, using a parser with a strict
+syntax.
+
+This query uses a <<query-string-syntax,syntax>> to parse and split the provided
+query string based on operators, such as `AND` or `NOT`. The query
+then <<analysis,analyzes>> each split text independently before returning
+matching documents.
+
+You can use the `query_string` query to create a complex search that includes
+wildcard characters, searches across multiple fields, and more. While versatile,
+the query is strict and returns an error if the query string includes any
+invalid syntax.
+
+[WARNING]
+====
+Because it returns an error for any invalid syntax, we don't recommend using
+the `query_string` query for search boxes.
+
+If you don't need to support a query syntax, consider using the
+<<query-dsl-match-query, `match`>> query. If you need the features of a query
+syntax, use the <<query-dsl-simple-query-string-query,`simple_query_string`>>
+query, which is less strict.
+====
+ 
+
+[[query-string-query-ex-request]]
+==== Example request
+
+When running the following search, the `query_string` query splits `(new york
+city) OR (big apple)` into two parts: `new york city` and `big apple`. The
+`content` field's analyzer then independently converts each part into tokens
+before returning matching documents. Because the query syntax does not use
+whitespace as an operator, `new york city` is passed as-is to the analyzer.

 [source,js]
 --------------------------------------------------
@@ -13,154 +44,211 @@ GET /_search
 {
    "query": {
        "query_string" : {
-            "default_field" : "content",
-            "query" : "this AND that OR thus"
+            "query" : "(new york city) OR (big apple)",
+            "default_field" : "content"
        }
    }
 }
 --------------------------------------------------
 // CONSOLE

-The `query_string` query parses the input and splits text around operators.
-Each textual part is analyzed independently of each other. For instance the following query:
+[[query-string-top-level-params]]
+==== Top-level parameters for `query_string`
+`query`::
+(Required, string) Query string you wish to parse and use for search. See
+<<query-string-syntax>>.

-[source,js]
--------------------------------------------------
-GET /_search
-{
-    "query": {
-        "query_string" : {
-            "default_field" : "content",
-            "query" : "(new york city) OR (big apple)" <1>
-        }
-    }
-}
--------------------------------------------------
-// CONSOLE
+`default_field`::
+
+--
+(Optional, string) Default field you wish to search if no field is provided in
+the query string.

-<1> will be split into `new york city` and `big apple` and each part is then
-analyzed independently by the analyzer configured for the field.
+Defaults to the `index.query.default_field` index setting, which has a default
+value of `*`. The `*` value extracts all fields that are eligible for term
+queries and filters out the metadata fields. All extracted fields are then
+combined to build a query if no field prefix is specified.

-WARNING: Whitespaces are not considered operators, this means that `new york city`
-will be passed "as is" to the analyzer configured for the field. If the field is a `keyword`
-field the analyzer will create a single term `new york city` and the query builder will
-use this term in the query. If you want to query each term separately you need to add explicit
-operators around the terms (e.g. `new AND york AND city`).
+WARNING: There is a limit on the number of fields that can be queried at once.
+It is defined by the `indices.query.bool.max_clause_count`
+<<search-settings,search setting>>, which defaults to 1024.
+--

-When multiple fields are provided it is also possible to modify how the different
-field queries are combined inside each textual part using the `type` parameter.
-The possible modes are described <<multi-match-types, here>> and the default is `best_fields`.
+`allow_leading_wildcard`::
+(Optional, boolean) If `true`, the wildcard characters `*` and `?` are allowed
+as the first character of the query string. Defaults to `true`.

-The `query_string` top level parameters include:
+`analyze_wildcard`::
+(Optional, boolean) If `true`, the query attempts to analyze wildcard terms in
+the query string. Defaults to `false`.

-[cols="<,<",options="header",]
-|=======================================================================
-|Parameter |Description
-|`query` |The actual query to be parsed. See <<query-string-syntax>>.
+`analyzer`::
+(Optional, string) <<analysis,Analyzer>> used to convert text in the
+query string into tokens. Defaults to the
+<<specify-index-time-analyzer,index-time analyzer>> mapped for the
+`default_field`. If no analyzer is mapped, the index's default analyzer is used.

-|`default_field` |The default field for query terms if no prefix field is
-specified. Defaults to the `index.query.default_field` index settings, which in
-turn defaults to `*`. `*` extracts all fields in the mapping that are eligible
-to term queries and filters the metadata fields. All extracted fields are then
-combined to build a query when no prefix field is provided.
+`auto_generate_synonyms_phrase_query`::
+(Optional, boolean) If `true`, <<query-dsl-match-query-phrase,match phrase>>
+queries are automatically created for multi-term synonyms. Defaults to `true`.
+See <<query-string-synonyms>> for an example.

-WARNING: There is a limit on the number of fields that can be queried
-at once. It is defined by the `indices.query.bool.max_clause_count` <<search-settings>>
-which defaults to 1024.
+`boost`::
+
+--
+(Optional, float) Floating point number used to decrease or increase the
+<<relevance-scores,relevance scores>> of the query. Defaults to `1.0`.

-|`default_operator` |The default operator used if no explicit operator
-is specified. For example, with a default operator of `OR`, the query
-`capital of Hungary` is translated to `capital OR of OR Hungary`, and
-with default operator of `AND`, the same query is translated to
-`capital AND of AND Hungary`. The default value is `OR`.
+Boost values are relative to the default value of `1.0`. A boost value between
+`0` and `1.0` decreases the relevance score. A value greater than `1.0`
+increases the relevance score.
+--

-|`analyzer` |The analyzer name used to analyze the query string.
+`default_operator`::
+
+--
+(Optional, string) Default boolean logic used to interpret text in the query
+string if no operators are specified. Valid values are:

-|`quote_analyzer` |The name of the analyzer that is used to analyze
-quoted phrases in the query string. For those parts, it overrides other
-analyzers that are set using the `analyzer` parameter or the
-<<search-quote-analyzer,`search_quote_analyzer`>> setting.
+ `OR` (Default)::
+For example, a query string of `capital of Hungary` is interpreted as `capital
+OR of OR Hungary`.

-|`allow_leading_wildcard` |When set, `*` or `?` are allowed as the first
-character. Defaults to `true`.
+ `AND`::
+For example, a query string of `capital of Hungary` is interpreted as `capital
+AND of AND Hungary`.
+--

-|`enable_position_increments` |Set to `true` to enable position
-increments in result queries. Defaults to `true`.
+`enable_position_increments`::
+(Optional, boolean) If `true`, enables position increments in queries constructed
+from a `query_string` search. Defaults to `true`.

-|`fuzzy_max_expansions` |Controls the number of terms fuzzy queries will
-expand to. Defaults to `50`
+`fields`::
+
+--
+(Optional, array of strings) Array of fields you wish to search.

-|`fuzziness` |Set the fuzziness for fuzzy queries. Defaults
-to `AUTO`. See <<fuzziness>> for allowed settings.
+You can use this parameter to search across multiple fields. See
+<<query-string-multi-field>>.
+--

-|`fuzzy_prefix_length` |Set the prefix length for fuzzy queries. Default
-is `0`.
+`fuzziness`::
+(Optional, string) Maximum edit distance allowed for matching. See <<fuzziness>>
+for valid values and more information.

-|`fuzzy_transpositions` |Set to `false` to disable fuzzy transpositions (`ab` -> `ba`).
-Default is `true`.
+`fuzzy_max_expansions`::
+(Optional, integer) Maximum number of terms to which the query expands for fuzzy
+matching. Defaults to `50`.

-|`phrase_slop` |Sets the default slop for phrases. If zero, then exact
-phrase matches are required. Default value is `0`.
+`fuzzy_prefix_length`::
+(Optional, integer) Number of beginning characters left unchanged for fuzzy
+matching. Defaults to `0`.

-|`boost` |Sets the boost value of the query. Defaults to `1.0`.
+`fuzzy_transpositions`::
+(Optional, boolean) If `true`, edits for fuzzy matching include
+transpositions of two adjacent characters (ab → ba). Defaults to `true`.

-|`analyze_wildcard` |By default, wildcards terms in a query string are
-not analyzed. By setting this value to `true`, a best effort will be
-made to analyze those as well.
+`lenient`::
+(Optional, boolean) If `true`, format-based errors, such as providing a text
+value for a <<number,numeric>> field, are ignored. Defaults to `false`.

-|`max_determinized_states` |Limit on how many automaton states regexp
-queries are allowed to create. This protects against too-difficult
-(e.g. exponentially hard) regexps. Defaults to 10000.
+`max_determinized_states`::
+
+--
+(Optional, integer) Maximum number of
+https://en.wikipedia.org/wiki/Deterministic_finite_automaton[automaton states]
+required for the query. Defaults to `10000`.

-|`minimum_should_match` |A value controlling how many "should" clauses
-in the resulting boolean query should match. It can be an absolute value
-(`2`), a percentage (`30%`) or a
-<<query-dsl-minimum-should-match,combination of
-both>>.
+{es} uses https://lucene.apache.org/core/[Apache Lucene] internally to parse
+regular expressions. Lucene converts each regular expression to a finite
+automaton containing a number of determinized states.

-|`lenient` |If set to `true` will cause format based failures (like
-providing text to a numeric field) to be ignored.
+You can use this parameter to prevent that conversion from unintentionally
+consuming too many resources. You may need to increase this limit to run complex
+regular expressions.
+--

-|`time_zone` | Time Zone to be applied to any range query related to dates.
+`minimum_should_match`::
+(Optional, string) Minimum number of clauses that must match for a document to
+be returned. See the <<query-dsl-minimum-should-match, `minimum_should_match`
+parameter>> for valid values and more information. See
+<<query-string-min-should-match>> for an example.

-|`quote_field_suffix` | A suffix to append to fields for quoted parts of
-the query string. This allows to use a field that has a different analysis chain
-for exact matching. Look <<mixing-exact-search-with-stemming,here>> for a
-comprehensive example.
+`quote_analyzer`::
+
+--
+(Optional, string) <<analysis,Analyzer>> used to convert quoted text in the
+query string into tokens. Defaults to the
+<<search-quote-analyzer,`search_quote_analyzer`>> mapped for the
+`default_field`.

-|`auto_generate_synonyms_phrase_query` |Whether phrase queries should be automatically generated for multi terms synonyms.
-Defaults to `true`.
+For quoted text, this parameter overrides the analyzer specified in the
+`analyzer` parameter.
+--

-|=======================================================================
+`phrase_slop`::
+(Optional, integer) Maximum number of positions allowed between matching tokens
+for phrases. Defaults to `0`. If `0`, exact phrase matches are required.
+Transposed terms have a slop of `2`.

-When a multi term query is being generated, one can control how it gets
-rewritten using the
-<<query-dsl-multi-term-rewrite,rewrite>>
-parameter.
+`quote_field_suffix`::
+
+--
+(Optional, string) Suffix appended to field names for quoted text in the query string.

-[float]
-==== Default Field
+You can use this suffix to use a different analysis method for exact matches.
+See <<mixing-exact-search-with-stemming>>.
+--

-When not explicitly specifying the field to search on in the query
-string syntax, the `index.query.default_field` will be used to derive
-which field to search on. If the `index.query.default_field` is not specified,
-the `query_string` will automatically attempt to determine the existing fields in the index's
-mapping that are queryable, and perform the search on those fields. 
-This will not include nested documents, use a nested query to search those documents.
+`rewrite`::
+(Optional, string) Method used to rewrite the query. For valid values and more
+information, see the <<query-dsl-multi-term-rewrite, `rewrite` parameter>>.

-NOTE: For mappings with a large number of fields, searching across all queryable
-fields in the mapping could be expensive.
+`time_zone`::
+
+--
+(Optional, string)
+https://en.wikipedia.org/wiki/List_of_UTC_time_offsets[Coordinated Universal
+Time (UTC) offset] or
+https://en.wikipedia.org/wiki/List_of_tz_database_time_zones[IANA time zone]
+used to convert `date` values in the query string to UTC.

-[float]
-==== Multi Field
+Valid values are ISO 8601 UTC offsets, such as `+01:00` or `-08:00`, and IANA
+time zone IDs, such as `America/Los_Angeles`.

-The `query_string` query can also run against multiple fields. Fields can be
-provided via the `fields` parameter (example below).
+[NOTE]
+====
+The `time_zone` parameter does **not** affect the <<date-math,date math>> value
+of `now`. `now` is always the current system time in UTC. However, the
+`time_zone` parameter does convert dates calculated using `now` and
+<<date-math,date math rounding>>. For example, the `time_zone` parameter will
+convert a value of `now/d`.
+====
+--
+
+[[query-string-query-notes]]
+==== Notes
+
+include::query-string-syntax.asciidoc[]
+
+[[query-string-nested]]
+====== Avoid using the `query_string` query for nested documents
+
+`query_string` searches do not return <<nested,nested>> documents. To search
+nested documents, use the <<query-dsl-nested-query, `nested` query>>.
+
+[[query-string-multi-field]]
+====== Search multiple fields
+
+You can use the `fields` parameter to perform a `query_string` search across
+multiple fields.

 The idea of running the `query_string` query against multiple fields is to
 expand each query term to an OR clause like this:

-    field1:query_term OR field2:query_term | ...
+```
+field1:query_term OR field2:query_term | ...
+```

 For example, the following query

@@ -252,21 +340,6 @@ GET /_search
 NOTE: Since `\` (backslash) is a special character in json strings, it needs to
 be escaped, hence the two backslashes in the above `query_string`.

-When running the `query_string` query against multiple fields, the
-following additional parameters are allowed:
-
-[cols="<,<",options="header",]
-|=======================================================================
-|Parameter |Description
-
-|`type` |How the fields should be combined to build the text query.
-See <<multi-match-types, types>> for a complete example.
-Defaults to `best_fields`
-
-|`tie_breaker` |The disjunction max tie breaker for multi fields.
-Defaults to `0`
-|=======================================================================
-
 The fields parameter can also include pattern based field names,
 allowing to automatically expand to the relevant fields (dynamically
 introduced fields included). For example:
@@ -285,8 +358,50 @@ GET /_search
 --------------------------------------------------
 // CONSOLE

-[float]
-==== Synonyms
+[[query-string-multi-field-parms]]
+====== Additional parameters for multiple field searches
+
+When running the `query_string` query against multiple fields, the
+following additional parameters are supported.
+
+`type`::
+
+--
+(Optional, string) Determines how the query matches and scores documents. Valid
+values are:
+
+`best_fields` (Default)::
+Finds documents which match any field and uses the highest
+<<relevance-scores,`_score`>> from any matching field. See
+<<type-best-fields>>.
+
+`bool_prefix`::
+Creates a `match_bool_prefix` query on each field and combines the `_score` from
+each field. See <<type-bool-prefix>>.
+
+`cross_fields`::
+Treats fields with the same `analyzer` as though they were one big field. Looks
+for each word in **any** field. See <<type-cross-fields>>.
+
+`most_fields`::
+Finds documents which match any field and combines the `_score` from each field.
+See <<type-most-fields>>.
+
+`phrase`::
+Runs a `match_phrase` query on each field and uses the `_score` from the best
+field. See <<type-phrase>>.
+
+`phrase_prefix`::
+Runs a `match_phrase_prefix` query on each field and uses the `_score` from the
+best field. See <<type-phrase>>.
+
+NOTE:
+Additional top-level `multi_match` parameters may be available based on the
+<<multi-match-types,`type`>> value.
+--
+
+[[query-string-synonyms]]
+===== Synonyms and the `query_string` query

 The `query_string` query supports multi-terms synonym expansion with the <<analysis-synonym-graph-tokenfilter,
 synonym_graph>> token filter. When this filter is used, the parser creates a phrase query for each multi-terms synonyms.
@@ -318,8 +433,8 @@ The example above creates a boolean query:
 that matches documents with the term `ny` or the conjunction `new AND york`.
 By default the parameter `auto_generate_synonyms_phrase_query` is set to `true`.

-[float]
-==== Minimum should match
+[[query-string-min-should-match]]
+===== How `minimum_should_match` works

 The `query_string` splits the query around each operator to create a boolean
 query for the entire input. You can use `minimum_should_match` to control how
@@ -349,8 +464,8 @@ The example above creates a boolean query:
 that matches documents with at least two of the terms `this`, `that` or `thus`
 in the single field `title`.

-[float]
-===== Multi Field
+[[query-string-min-should-match-multi]]
+===== How `minimum_should_match` works for multiple fields

 [source,js]
 --------------------------------------------------
@@ -404,8 +519,11 @@ The example above creates a boolean query:
 that matches documents with at least two of the three "should" clauses, each of
 them made of the disjunction max over the fields for each term.

-[float]
-===== Cross Field
+[[query-string-min-should-match-cross]]
+===== How `minimum_should_match` works for cross-field searches
+
+A `type` value of `cross_fields` indicates that fields with the same
+analyzer are grouped together when the input is analyzed.

 [source,js]
 --------------------------------------------------
@@ -426,13 +544,8 @@ GET /_search
 --------------------------------------------------
 // CONSOLE

-The `cross_fields` value in the `type` field indicates that fields that have the
-same analyzer should be grouped together when the input is analyzed.
-
 The example above creates a boolean query:

 `(blended(terms:[field2:this, field1:this]) blended(terms:[field2:that, field1:that]) blended(terms:[field2:thus, field1:thus]))~2`

 that matches documents with at least two of the three per-term blended queries.
-
-include::query-string-syntax.asciidoc[]
--- a/docs/reference/query-dsl/query-string-syntax.asciidoc
+++ b/docs/reference/query-dsl/query-string-syntax.asciidoc
@@ -1,6 +1,6 @@
 [[query-string-syntax]]

-==== Query string syntax
+===== Query string syntax

 The query string ``mini-language'' is used by the
 <<query-dsl-query-string-query>> and by the
@@ -14,10 +14,9 @@ phrase, in the same order.
 Operators allow you to customize the search -- the available options are
 explained below.

-===== Field names
+====== Field names

-As mentioned in <<query-dsl-query-string-query>>, the `default_field` is searched for the
-search terms, but it is possible to specify other fields in the query syntax:
+You can specify fields to search in the query syntax:

 * where the `status` field contains `active`

@@ -40,7 +39,7 @@ search terms, but it is possible to specify other fields in the query syntax:

    _exists_:title

-===== Wildcards
+====== Wildcards

 Wildcard searches can be run on individual terms, using `?` to replace
 a single character, and `*` to replace zero or more characters:
@@ -88,7 +87,7 @@ analyzed and a boolean query will be built out of the different tokens, by
 ensuring exact matches on the first N-1 tokens, and prefix match on the last
 token.

-===== Regular expressions
+====== Regular expressions

 Regular expression patterns can be embedded in the query string by
 wrapping them in forward-slashes (`"/"`):
@@ -108,7 +107,7 @@ Elasticsearch to visit every term in the index:
 Use with caution!
 =======

-===== Fuzziness
+====== Fuzziness

 We can search for terms that are
 similar to, but not exactly like our search terms, using the ``fuzzy''
@@ -128,7 +127,7 @@ sufficient to catch 80% of all human misspellings. It can be specified as:

    quikc~1

-===== Proximity searches
+====== Proximity searches

 While a phrase query (eg `"john smith"`) expects all of the terms in exactly
 the same order, a proximity query allows the specified words to be further
@@ -143,7 +142,7 @@ query string, the more relevant that document is considered to be. When
 compared to the above example query, the phrase `"quick fox"` would be
 considered more relevant than `"quick brown fox"`.

-===== Ranges
+====== Ranges

 Ranges can be specified for date, numeric or string fields. Inclusive ranges
 are specified with square brackets `[min TO max]` and exclusive ranges with
@@ -197,7 +196,7 @@ The parsing of ranges in query strings can be complex and error prone. It is
 much more reliable to use an explicit <<query-dsl-range-query,`range` query>>.


-===== Boosting
+====== Boosting

 Use the _boost_ operator `^` to make one term more relevant than another.
 For instance, if we want to find all documents about foxes, but we are
@@ -212,7 +211,7 @@ Boosts can also be applied to phrases or to groups:

    "john smith"^2   (foo bar)^4

-===== Boolean operators
+====== Boolean operators

 By default, all terms are optional, as long as one term matches.  A search
 for `foo bar baz` will find any document that contains one or more of
@@ -255,7 +254,7 @@ would look like this:
    }


-===== Grouping
+====== Grouping

 Multiple terms or clauses can be grouped together with parentheses, to form
 sub-queries:
@@ -267,7 +266,7 @@ of a sub-query:

    status:(active OR pending) title:(full text search)^2

-===== Reserved characters
+====== Reserved characters

 If you need to use any of the characters which function as operators in your
 query itself (and not as operators), then you should escape them with
@@ -283,7 +282,9 @@ NOTE: `<` and `>` can't be escaped at all. The only way to prevent them from
 attempting to create a range query is to remove them from the query string
 entirely.

-===== Empty Query
+====== Whitespaces and empty queries
+
+Whitespace is not considered an operator.

 If the query string is empty or only contains whitespaces the query will
 yield an empty result set.
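
Reviewer sketch (not part of the commit): the final hunk states that whitespace is not an operator and that an empty or whitespace-only query string yields an empty result set. The Python below is a rough illustration of how a client might act on both points before sending a request. The helper names `join_terms` and `build_query_string_body` are hypothetical and do not come from the documentation or from any Elasticsearch client library; only the `query_string`, `query`, and `default_field` keys are taken from the example request in this diff.

```python
import json


def join_terms(terms, operator="AND"):
    """Join terms with an explicit operator.

    Whitespace alone is not an operator in the query string syntax, so
    "new york city" reaches the analyzer as one piece of text. Joining
    with AND or OR makes each term a separate clause.
    """
    if operator not in ("AND", "OR"):
        raise ValueError("operator must be 'AND' or 'OR'")
    return f" {operator} ".join(terms)


def build_query_string_body(query, default_field):
    """Build a search request body for a query_string query.

    Returns None for an empty or whitespace-only query string. Such a
    query would yield an empty result set, so this sketch assumes the
    caller prefers to skip the request entirely.
    """
    if not query.strip():
        return None
    return {
        "query": {
            "query_string": {
                "query": query,
                "default_field": default_field,
            }
        }
    }


# Hypothetical usage: query each term separately in the `content` field.
body = build_query_string_body(join_terms(["new", "york", "city"]), "content")
print(json.dumps(body, indent=2))
```

The resulting body can be sent to `GET /_search` in the same way as the example request in the diff. Escaping of reserved characters is deliberately left out of this sketch.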