[DOCS] Reformat query string query (#45296)

2019-08-12 11:17:19 -04:00
parent bff4ef2d95
commit e3f618e1d3
3 changed files with 263 additions and 161 deletions
--- a/docs/reference/query-dsl/match-query.asciidoc
+++ b/docs/reference/query-dsl/match-query.asciidoc
@@ -185,15 +185,3 @@ The example above creates a boolean query:
 that matches documents with the term `ny` or the conjunction `new AND york`.
 By default the parameter `auto_generate_synonyms_phrase_query` is set to `true`.
-.Comparison to query_string / field
-**************************************************
-The match family of queries does not go through a "query parsing"
-process. It does not support field name prefixes, wildcard characters,
-or other "advanced" features. For this reason, chances of it failing are
-very small / non existent, and it provides an excellent behavior when it
-comes to just analyze and run that text as a query behavior (which is
-usually what a text search box does).
-**************************************************
--- a/docs/reference/query-dsl/query-string-query.asciidoc
+++ b/docs/reference/query-dsl/query-string-query.asciidoc
@@ -4,8 +4,39 @@
 <titleabbrev>Query string</titleabbrev>
 ++++
-A query that uses a query parser in order to parse its content. Here is
-an example:
+Returns documents based on a provided query string, using a parser with a strict
+syntax.
+This query uses a <<query-string-syntax,syntax>> to parse and split the provided
+query string based on operators, such as `AND` or `NOT`. The query
+then <<analysis,analyzes>> each split text independently before returning
+matching documents.
+You can use the `query_string` query to create a complex search that includes
+wildcard characters, searches across multiple fields, and more. While versatile,
+the query is strict and returns an error if the query string includes any
+invalid syntax.
+[WARNING]
+====
+Because it returns an error for any invalid syntax, we don't recommend using
+the `query_string` query for search boxes.
+If you don't need to support a query syntax, consider using the
+<<query-dsl-match-query, `match`>> query. If you need the features of a query
+syntax, use the <<query-dsl-simple-query-string-query,`simple_query_string`>>
+query, which is less strict.
+====
+[[query-string-query-ex-request]]
+==== Example request
+When running the following search, the `query_string` query splits `(new york
+city) OR (big apple)` into two parts: `new york city` and `big apple`. The
+`content` field's analyzer then independently converts each part into tokens
+before returning matching documents. Because the query syntax does not use
+whitespace as an operator, `new york city` is passed as-is to the analyzer.
 [source,js]
 --------------------------------------------------
@@ -13,154 +44,211 @@ GET /_search
 {
    "query": {
        "query_string" : {
-            "default_field" : "content",
-            "query" : "this AND that OR thus"
+            "query" : "(new york city) OR (big apple)",
+            "default_field" : "content"
        }
    }
 }
 --------------------------------------------------
 // CONSOLE
-The `query_string` query parses the input and splits text around operators.
-Each textual part is analyzed independently of each other. For instance the following query:
+[[query-string-top-level-params]]
+==== Top-level parameters for `query_string`
+`query`::
+(Required, string) Query string you wish to parse and use for search. See
+<<query-string-syntax>>.
-[source,js]
---------------------------------------------------
-GET /_search
-{
-    "query": {
-        "query_string" : {
-            "default_field" : "content",
-            "query" : "(new york city) OR (big apple)" <1>
-        }
-    }
-}
---------------------------------------------------
-// CONSOLE
-<1> will be split into `new york city` and `big apple` and each part is then
-analyzed independently by the analyzer configured for the field.
-WARNING: Whitespaces are not considered operators, this means that `new york city`
-will be passed "as is" to the analyzer configured for the field. If the field is a `keyword`
-field the analyzer will create a single term `new york city` and the query builder will
-use this term in the query. If you want to query each term separately you need to add explicit
-operators around the terms (e.g. `new AND york AND city`).
+`default_field`::
++
+--
+(Optional, string) Default field you wish to search if no field is provided in
+the query string.
+Defaults to the `index.query.default_field` index setting, which has a default
+value of `*`. The `*` value extracts all fields that are eligible to term
+queries and filters the metadata fields. All extracted fields are then combined
+to build a query if no `prefix` is specified.
+WARNING: There is a limit on the number of fields that can be queried at once.
+It is defined by the `indices.query.bool.max_clause_count`
+<<search-settings,search setting>>, which defaults to 1024.
+--
-When multiple fields are provided it is also possible to modify how the different
-field queries are combined inside each textual part using the `type` parameter.
-The possible modes are described <<multi-match-types, here>> and the default is `best_fields`.
-The `query_string` top level parameters include:
+`allow_leading_wildcard`::
+(Optional, boolean) If `true`, the wildcard characters `*` and `?` are allowed
+as the first character of the query string. Defaults to `true`.
+`analyze_wildcard`::
+(Optional, boolean) If `true`, the query attempts to analyze wildcard terms in
+the query string. Defaults to `false`.
-[cols="<,<",options="header",]
-|=======================================================================
-|Parameter |Description
-|`query` |The actual query to be parsed. See <<query-string-syntax>>.
+`analyzer`::
+(Optional, string) <<analysis,Analyzer>> used to convert text in the
+query string into tokens. Defaults to the
+<<specify-index-time-analyzer,index-time analyzer>> mapped for the
+`default_field`. If no analyzer is mapped, the index's default analyzer is used.
-|`default_field` |The default field for query terms if no prefix field is
-specified. Defaults to the `index.query.default_field` index settings, which in
-turn defaults to `*`. `*` extracts all fields in the mapping that are eligible
-to term queries and filters the metadata fields. All extracted fields are then
-combined to build a query when no prefix field is provided.
+`auto_generate_synonyms_phrase_query`::
+(Optional, boolean) If `true`, <<query-dsl-match-query-phrase,match phrase>>
+queries are automatically created for multi-term synonyms. Defaults to `true`.
+See <<query-string-synonyms>> for an example.
-WARNING: There is a limit on the number of fields that can be queried
-at once. It is defined by the `indices.query.bool.max_clause_count` <<search-settings>>
-which defaults to 1024.
-|`default_operator` |The default operator used if no explicit operator
-is specified. For example, with a default operator of `OR`, the query
-`capital of Hungary` is translated to `capital OR of OR Hungary`, and
-with default operator of `AND`, the same query is translated to
-`capital AND of AND Hungary`. The default value is `OR`.
+`boost`::
++
+--
+(Optional, float) Floating point number used to decrease or increase the
+<<relevance-scores,relevance scores>> of the query. Defaults to `1.0`.
+Boost values are relative to the default value of `1.0`. A boost value between
+`0` and `1.0` decreases the relevance score. A value greater than `1.0`
+increases the relevance score.
+--
-|`analyzer` |The analyzer name used to analyze the query string.
-|`quote_analyzer` |The name of the analyzer that is used to analyze
-quoted phrases in the query string. For those parts, it overrides other
-analyzers that are set using the `analyzer` parameter or the
-<<search-quote-analyzer,`search_quote_analyzer`>> setting.
-|`allow_leading_wildcard` |When set, `*` or `?` are allowed as the first
-character. Defaults to `true`.
+`default_operator`::
++
+--
+(Optional, string) Default boolean logic used to interpret text in the query
+string if no operators are specified. Valid values are:
+ `OR` (Default)::
+For example, a query string of `capital of Hungary` is interpreted as `capital
+OR of OR Hungary`.
+ `AND`::
+For example, a query string of `capital of Hungary` is interpreted as `capital
+AND of AND Hungary`.
+--
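The `OR` and `AND` examples above can be reproduced with a small Python model. `expand_default_operator` is a hypothetical helper, not an Elasticsearch or client API, and it assumes the query string has no explicit operators, quotes, groups, or field prefixes. It only joins whitespace-separated terms with the default operator; Elasticsearch itself builds a boolean query rather than a rewritten string.

```python
def expand_default_operator(query: str, default_operator: str = "OR") -> str:
    """Join whitespace-separated terms with the default operator.

    Hypothetical model: assumes the query contains no explicit operators,
    quotes, groups, or field prefixes.
    """
    operator = default_operator.upper()
    if operator not in ("OR", "AND"):
        raise ValueError(f"unsupported default_operator: {default_operator!r}")
    # Whitespace only separates terms; it is not itself an operator.
    return f" {operator} ".join(query.split())


print(expand_default_operator("capital of Hungary"))         # capital OR of OR Hungary
print(expand_default_operator("capital of Hungary", "AND"))  # capital AND of AND Hungary
```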
-|`enable_position_increments` |Set to `true` to enable position
-increments in result queries. Defaults to `true`.
+`enable_position_increments`::
+(Optional, boolean) If `true`, enables position increments in queries constructed
+from a `query_string` search. Defaults to `true`.
-|`fuzzy_max_expansions` |Controls the number of terms fuzzy queries will
-expand to. Defaults to `50`
-|`fuzziness` |Set the fuzziness for fuzzy queries. Defaults
-to `AUTO`. See <<fuzziness>> for allowed settings.
+`fields`::
++
+--
+(Optional, array of strings) Array of fields you wish to search.
+You can use this parameter to search across multiple fields. See
+<<query-string-multi-field>>.
+--
-|`fuzzy_prefix_length` |Set the prefix length for fuzzy queries. Default
-is `0`.
+`fuzziness`::
+(Optional, string) Maximum edit distance allowed for matching. See <<fuzziness>>
+for valid values and more information.
-|`fuzzy_transpositions` |Set to `false` to disable fuzzy transpositions (`ab` -> `ba`).
-Default is `true`.
+`fuzzy_max_expansions`::
+(Optional, integer) Maximum number of terms to which the query expands for fuzzy
+matching. Defaults to `50`.
-|`phrase_slop` |Sets the default slop for phrases. If zero, then exact
-phrase matches are required. Default value is `0`.
+`fuzzy_prefix_length`::
+(Optional, integer) Number of beginning characters left unchanged for fuzzy
+matching. Defaults to `0`.
-|`boost` |Sets the boost value of the query. Defaults to `1.0`.
+`fuzzy_transpositions`::
+(Optional, boolean) If `true`, edits for fuzzy matching include
+transpositions of two adjacent characters (ab → ba). Defaults to `true`.
-|`analyze_wildcard` |By default, wildcards terms in a query string are
-not analyzed. By setting this value to `true`, a best effort will be
-made to analyze those as well.
+`lenient`::
+(Optional, boolean) If `true`, format-based errors, such as providing a text
+value for a <<number,numeric>> field, are ignored. Defaults to `false`.
-|`max_determinized_states` |Limit on how many automaton states regexp
-queries are allowed to create. This protects against too-difficult
-(e.g. exponentially hard) regexps. Defaults to 10000.
-|`minimum_should_match` |A value controlling how many "should" clauses
-in the resulting boolean query should match. It can be an absolute value
-(`2`), a percentage (`30%`) or a
-<<query-dsl-minimum-should-match,combination of
-both>>.
-|`lenient` |If set to `true` will cause format based failures (like
-providing text to a numeric field) to be ignored.
+`max_determinized_states`::
++
+--
+(Optional, integer) Maximum number of
+https://en.wikipedia.org/wiki/Deterministic_finite_automaton[automaton states]
+required for the query. Default is `10000`.
+{es} uses https://lucene.apache.org/core/[Apache Lucene] internally to parse
+regular expressions. Lucene converts each regular expression to a finite
+automaton containing a number of determinized states.
+You can use this parameter to prevent that conversion from unintentionally
+consuming too many resources. You may need to increase this limit to run complex
+regular expressions.
+--
-|`time_zone` | Time Zone to be applied to any range query related to dates.
+`minimum_should_match`::
+(Optional, string) Minimum number of clauses that must match for a document to
+be returned. See the <<query-dsl-minimum-should-match, `minimum_should_match`
+parameter>> for valid values and more information. See
+<<query-string-min-should-match>> for an example.
-|`quote_field_suffix` | A suffix to append to fields for quoted parts of
-the query string. This allows to use a field that has a different analysis chain
-for exact matching. Look <<mixing-exact-search-with-stemming,here>> for a
-comprehensive example.
-|`auto_generate_synonyms_phrase_query` |Whether phrase queries should be automatically generated for multi terms synonyms.
-Defaults to `true`.
+`quote_analyzer`::
++
+--
+(Optional, string) <<analysis,Analyzer>> used to convert quoted text in the
+query string into tokens. Defaults to the
+<<search-quote-analyzer,`search_quote_analyzer`>> mapped for the
+`default_field`.
+For quoted text, this parameter overrides the analyzer specified in the
+`analyzer` parameter.
+--
-|=======================================================================
+`phrase_slop`::
+(Optional, integer) Maximum number of positions allowed between matching tokens
+for phrases. Defaults to `0`. If `0`, exact phrase matches are required.
+Transposed terms have a slop of `2`.
-When a multi term query is being generated, one can control how it gets
-rewritten using the
-<<query-dsl-multi-term-rewrite,rewrite>>
-parameter.
-[float]
-==== Default Field
+`quote_field_suffix`::
++
+--
+(Optional, string) Suffix appended to quoted text in the query string.
+You can use this suffix to use a different analysis method for exact matches.
+See <<mixing-exact-search-with-stemming>>.
+--
-When not explicitly specifying the field to search on in the query
-string syntax, the `index.query.default_field` will be used to derive
-which field to search on. If the `index.query.default_field` is not specified,
-the `query_string` will automatically attempt to determine the existing fields in the index's
-mapping that are queryable, and perform the search on those fields.
-This will not include nested documents, use a nested query to search those documents.
+`rewrite`::
+(Optional, string) Method used to rewrite the query. For valid values and more
+information, see the <<query-dsl-multi-term-rewrite, `rewrite` parameter>>.
-NOTE: For mappings with a large number of fields, searching across all queryable
-fields in the mapping could be expensive.
-[float]
-==== Multi Field
-The `query_string` query can also run against multiple fields. Fields can be
-provided via the `fields` parameter (example below).
+`time_zone`::
++
+--
+(Optional, string)
+https://en.wikipedia.org/wiki/List_of_UTC_time_offsets[Coordinated Universal
+Time (UTC) offset] or
+https://en.wikipedia.org/wiki/List_of_tz_database_time_zones[IANA time zone]
+used to convert `date` values in the query string to UTC.
+Valid values are ISO 8601 UTC offsets, such as `+01:00` or `-08:00`, and IANA
+time zone IDs, such as `America/Los_Angeles`.
+[NOTE]
+====
+The `time_zone` parameter does **not** affect the <<date-math,date math>> value
+of `now`. `now` is always the current system time in UTC. However, the
+`time_zone` parameter does convert dates calculated using `now` and
+<<date-math,date math rounding>>. For example, the `time_zone` parameter will
+convert a value of `now/d`.
+====
+--
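As a rough sketch of the offset conversion described for `time_zone`, the following Python code reads a date-time string at a fixed UTC offset and converts it to UTC. `to_utc` is a hypothetical helper that assumes naive ISO 8601 input; it does not handle IANA time zone IDs, date math such as `now/d`, or the date formats mapped for a field.

```python
from datetime import datetime, timedelta, timezone


def to_utc(local: str, utc_offset: timedelta) -> datetime:
    """Read a naive ISO 8601 date-time at a fixed UTC offset and convert it to UTC."""
    naive = datetime.fromisoformat(local)
    return naive.replace(tzinfo=timezone(utc_offset)).astimezone(timezone.utc)


# Midnight at +01:00 is 23:00 UTC on the previous day.
print(to_utc("2020-01-01T00:00:00", timedelta(hours=1)).isoformat())
# 2019-12-31T23:00:00+00:00
```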
+[[query-string-query-notes]]
+==== Notes
+include::query-string-syntax.asciidoc[]
+[[query-string-nested]]
+====== Avoid using the `query_string` query for nested documents
+`query_string` searches do not return <<nested,nested>> documents. To search
+nested documents, use the <<query-dsl-nested-query, `nested` query>>.
+[[query-string-multi-field]]
+====== Search multiple fields
+You can use the `fields` parameter to perform a `query_string` search across
+multiple fields.
 The idea of running the `query_string` query against multiple fields is to
 expand each query term to an OR clause like this:
-    field1:query_term OR field2:query_term | ...
+```
+field1:query_term OR field2:query_term | ...
+```
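The following Python sketch shows the per-term expansion above for a single term. `expand_term` is a hypothetical helper that only builds the textual OR clause; it does not model analysis, per-field boosts, or how the `type` and `tie_breaker` parameters combine field scores.

```python
from typing import Sequence


def expand_term(term: str, fields: Sequence[str]) -> str:
    """Rewrite one query term as an OR clause across the given fields."""
    return " OR ".join(f"{field}:{term}" for field in fields)


print(expand_term("query_term", ["field1", "field2"]))
# field1:query_term OR field2:query_term
```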
 For example, the following query
@@ -252,21 +340,6 @@ GET /_search
 NOTE: Since `\` (backslash) is a special character in json strings, it needs to
 be escaped, hence the two backslashes in the above `query_string`.
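To see why the backslash is doubled, the following Python sketch serializes a request body with the standard `json` module. The field pattern `city.\*` and the term `something` are hypothetical; the point is that one backslash in the query string becomes two in the JSON text and decodes back to one.

```python
import json

# The query string as a user would write it: one backslash escapes `*`.
query_string = r"city.\*:something"

body = json.dumps({"query": {"query_string": {"query": query_string}}})
print(body)
# {"query": {"query_string": {"query": "city.\\*:something"}}}
```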
-When running the `query_string` query against multiple fields, the
-following additional parameters are allowed:
-[cols="<,<",options="header",]
-|=======================================================================
-|Parameter |Description
-|`type` |How the fields should be combined to build the text query.
-See <<multi-match-types, types>> for a complete example.
-Defaults to `best_fields`
-|`tie_breaker` |The disjunction max tie breaker for multi fields.
-Defaults to `0`
-|=======================================================================
 The fields parameter can also include pattern based field names,
 allowing to automatically expand to the relevant fields (dynamically
 introduced fields included). For example:
@@ -285,8 +358,50 @@ GET /_search
 --------------------------------------------------
 // CONSOLE
-[float]
-==== Synonyms
+[[query-string-multi-field-parms]]
+====== Additional parameters for multiple field searches
+When running the `query_string` query against multiple fields, the
+following additional parameters are supported.
+`type`::
++
+--
+(Optional, string) Determines how the query matches and scores documents. Valid
+values are:
+`best_fields` (Default)::
+Finds documents which match any field and uses the highest
+<<relevance-scores,`_score`>> from any matching field. See
+<<type-best-fields>>.
+`bool_prefix`::
+Creates a `match_bool_prefix` query on each field and combines the `_score` from
+each field. See <<type-bool-prefix>>.
+`cross_fields`::
+Treats fields with the same `analyzer` as though they were one big field. Looks
+for each word in **any** field. See <<type-cross-fields>>.
+`most_fields`::
+Finds documents which match any field and combines the `_score` from each field.
+See <<type-most-fields>>.
+`phrase`::
+Runs a `match_phrase` query on each field and uses the `_score` from the best
+field. See <<type-phrase>>.
+`phrase_prefix`::
+Runs a `match_phrase_prefix` query on each field and uses the `_score` from the
+best field. See <<type-phrase>>.
+NOTE:
+Additional top-level `multi_match` parameters may be available based on the
+<<multi-match-types,`type`>> value.
+--
+[[query-string-synonyms]]
+===== Synonyms and the `query_string` query
 The `query_string` query supports multi-terms synonym expansion with the <<analysis-synonym-graph-tokenfilter,
 synonym_graph>> token filter. When this filter is used, the parser creates a phrase query for each multi-terms synonyms.
@@ -318,8 +433,8 @@ The example above creates a boolean query:
 that matches documents with the term `ny` or the conjunction `new AND york`.
 By default the parameter `auto_generate_synonyms_phrase_query` is set to `true`.
-[float]
-==== Minimum should match
+[[query-string-min-should-match]]
+===== How `minimum_should_match` works
 The `query_string` splits the query around each operator to create a boolean
 query for the entire input. You can use `minimum_should_match` to control how
@@ -349,8 +464,8 @@ The example above creates a boolean query:
 that matches documents with at least two of the terms `this`, `that` or `thus`
 in the single field `title`.
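A simplified Python model of the matching rule above: a document qualifies when at least `minimum_should_match` of the `should` clauses match. `matches_minimum_should_match` is a hypothetical helper that treats each clause as an exact lookup of an already-analyzed term and ignores scoring and percentage values.

```python
from typing import Iterable, Set


def matches_minimum_should_match(doc_terms: Set[str], should_terms: Iterable[str], minimum: int) -> bool:
    """Return True if at least `minimum` should-clause terms occur in the document."""
    matched = sum(1 for term in should_terms if term in doc_terms)
    return matched >= minimum


should = ["this", "that", "thus"]
print(matches_minimum_should_match({"this", "that"}, should, 2))  # True
print(matches_minimum_should_match({"thus"}, should, 2))          # False
```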
-[float]
-===== Multi Field
+[[query-string-min-should-match-multi]]
+===== How `minimum_should_match` works for multiple fields
 [source,js]
 --------------------------------------------------
@@ -404,8 +519,11 @@ The example above creates a boolean query:
 that matches documents with at least two of the three "should" clauses, each of
 them made of the disjunction max over the fields for each term.
-[float]
-===== Cross Field
+[[query-string-min-should-match-cross]]
+===== How `minimum_should_match` works for cross-field searches
+A `cross_fields` value in the `type` field indicates fields with the same
+analyzer are grouped together when the input is analyzed.
 [source,js]
 --------------------------------------------------
@@ -426,13 +544,8 @@ GET /_search
 --------------------------------------------------
 // CONSOLE
-The `cross_fields` value in the `type` field indicates that fields that have the
-same analyzer should be grouped together when the input is analyzed.
 The example above creates a boolean query:
 `(blended(terms:[field2:this, field1:this]) blended(terms:[field2:that, field1:that]) blended(terms:[field2:thus, field1:thus]))~2`
 that matches documents with at least two of the three per-term blended queries.
-include::query-string-syntax.asciidoc[]
--- a/docs/reference/query-dsl/query-string-syntax.asciidoc
+++ b/docs/reference/query-dsl/query-string-syntax.asciidoc
@@ -1,6 +1,6 @@
 [[query-string-syntax]]
-==== Query string syntax
+===== Query string syntax
 The query string ``mini-language'' is used by the
 <<query-dsl-query-string-query>> and by the
@@ -14,10 +14,9 @@ phrase, in the same order.
 Operators allow you to customize the search -- the available options are
 explained below.
-===== Field names
+====== Field names
-As mentioned in <<query-dsl-query-string-query>>, the `default_field` is searched for the
-search terms, but it is possible to specify other fields in the query syntax:
+You can specify fields to search in the query syntax:
 * where the `status` field contains `active`
@@ -40,7 +39,7 @@ search terms, but it is possible to specify other fields in the query syntax:
    _exists_:title
-===== Wildcards
+====== Wildcards
 Wildcard searches can be run on individual terms, using `?` to replace
 a single character, and `*` to replace zero or more characters:
@@ -88,7 +87,7 @@ analyzed and a boolean query will be built out of the different tokens, by
 ensuring exact matches on the first N-1 tokens, and prefix match on the last
 token.
-===== Regular expressions
+====== Regular expressions
 Regular expression patterns can be embedded in the query string by
 wrapping them in forward-slashes (`"/"`):
@@ -108,7 +107,7 @@ Elasticsearch to visit every term in the index:
 Use with caution!
 =======
-===== Fuzziness
+====== Fuzziness
 We can search for terms that are
 similar to, but not exactly like our search terms, using the ``fuzzy''
@@ -128,7 +127,7 @@ sufficient to catch 80% of all human misspellings. It can be specified as:
    quikc~1
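Fuzzy matching is bounded by an edit distance (see the `fuzziness` and `fuzzy_transpositions` parameters). The following Python sketch computes an optimal string alignment distance, a restricted form of Damerau-Levenshtein distance in which swapping two adjacent characters counts as one edit. It illustrates that assumption only; it is not Lucene's automaton-based implementation, and `osa_distance` is a hypothetical name.

```python
def osa_distance(a: str, b: str, transpositions: bool = True) -> int:
    """Edit distance with insertions, deletions, substitutions and, optionally,
    adjacent transpositions (optimal string alignment variant)."""
    rows, cols = len(a) + 1, len(b) + 1
    d = [[0] * cols for _ in range(rows)]
    for i in range(rows):
        d[i][0] = i  # delete all of a[:i]
    for j in range(cols):
        d[0][j] = j  # insert all of b[:j]
    for i in range(1, rows):
        for j in range(1, cols):
            cost = 0 if a[i - 1] == b[j - 1] else 1
            d[i][j] = min(d[i - 1][j] + 1,         # deletion
                          d[i][j - 1] + 1,         # insertion
                          d[i - 1][j - 1] + cost)  # substitution or match
            if (transpositions and i > 1 and j > 1
                    and a[i - 1] == b[j - 2] and a[i - 2] == b[j - 1]):
                d[i][j] = min(d[i][j], d[i - 2][j - 2] + 1)  # adjacent swap
    return d[-1][-1]


print(osa_distance("quikc", "quick"))                        # 1
print(osa_distance("quikc", "quick", transpositions=False))  # 2
```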
-===== Proximity searches
+====== Proximity searches
 While a phrase query (eg `"john smith"`) expects all of the terms in exactly
 the same order, a proximity query allows the specified words to be further
@@ -143,7 +142,7 @@ query string, the more relevant that document is considered to be. When
 compared to the above example query, the phrase `"quick fox"` would be
 considered more relevant than `"quick brown fox"`.
-===== Ranges
+====== Ranges
 Ranges can be specified for date, numeric or string fields. Inclusive ranges
 are specified with square brackets `[min TO max]` and exclusive ranges with
@@ -197,7 +196,7 @@ The parsing of ranges in query strings can be complex and error prone. It is
 much more reliable to use an explicit <<query-dsl-range-query,`range` query>>.
-===== Boosting
+====== Boosting
 Use the _boost_ operator `^` to make one term more relevant than another.
 For instance, if we want to find all documents about foxes, but we are
@@ -212,7 +211,7 @@ Boosts can also be applied to phrases or to groups:
    "john smith"^2   (foo bar)^4
-===== Boolean operators
+====== Boolean operators
 By default, all terms are optional, as long as one term matches.  A search
 for `foo bar baz` will find any document that contains one or more of
@@ -255,7 +254,7 @@ would look like this:
    }
-===== Grouping
+====== Grouping
 Multiple terms or clauses can be grouped together with parentheses, to form
 sub-queries:
@@ -267,7 +266,7 @@ of a sub-query:
    status:(active OR pending) title:(full text search)^2
-===== Reserved characters
+====== Reserved characters
 If you need to use any of the characters which function as operators in your
 query itself (and not as operators), then you should escape them with
@@ -283,7 +282,9 @@ NOTE: `<` and `>` can't be escaped at all. The only way to prevent them from
 attempting to create a range query is to remove them from the query string
 entirely.
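The escaping rules in this section can be sketched in Python. The `RESERVED` set below is an assumption based on the reserved-character list for this syntax; it escapes `&` and `|` one character at a time rather than only the `&&` and `||` operators, and it removes `<` and `>` because, as noted above, they cannot be escaped. `escape_query_string` is a hypothetical helper, not a client API.

```python
# Assumed reserved characters; `&` and `|` are escaped individually.
RESERVED = set('+-=&|!(){}[]^"~*?:\\/')


def escape_query_string(text: str) -> str:
    """Backslash-escape reserved characters and drop `<` and `>`."""
    escaped = []
    for ch in text:
        if ch in "<>":
            continue  # these cannot be escaped, so remove them entirely
        if ch in RESERVED:
            escaped.append("\\")
        escaped.append(ch)
    return "".join(escaped)


print(escape_query_string("(1+1)=2"))  # \(1\+1\)\=2
```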
-===== Empty Query
+====== Whitespaces and empty queries
+Whitespace is not considered an operator.
 If the query string is empty or only contains whitespaces the query will
 yield an empty result set.