diff --git a/docs/reference/query-dsl/match-query.asciidoc b/docs/reference/query-dsl/match-query.asciidoc index 27dde4c2a91..a894ef0dae2 100644 --- a/docs/reference/query-dsl/match-query.asciidoc +++ b/docs/reference/query-dsl/match-query.asciidoc @@ -185,15 +185,3 @@ The example above creates a boolean query: that matches documents with the term `ny` or the conjunction `new AND york`. By default the parameter `auto_generate_synonyms_phrase_query` is set to `true`. - -.Comparison to query_string / field -************************************************** - -The match family of queries does not go through a "query parsing" -process. It does not support field name prefixes, wildcard characters, -or other "advanced" features. For this reason, chances of it failing are -very small / non existent, and it provides an excellent behavior when it -comes to just analyze and run that text as a query behavior (which is -usually what a text search box does). - -************************************************** diff --git a/docs/reference/query-dsl/query-string-query.asciidoc b/docs/reference/query-dsl/query-string-query.asciidoc index 967dd906eec..cced4f30eeb 100644 --- a/docs/reference/query-dsl/query-string-query.asciidoc +++ b/docs/reference/query-dsl/query-string-query.asciidoc @@ -4,8 +4,39 @@ Query string ++++ -A query that uses a query parser in order to parse its content. Here is -an example: +Returns documents based on a provided query string, using a parser with a strict +syntax. + +This query uses a <> to parse and split the provided +query string based on operators, such as `AND` or `NOT`. The query +then <> each split text independently before returning +matching documents. + +You can use the `query_string` query to create a complex search that includes +wildcard characters, searches across multiple fields, and more. While versatile, +the query is strict and returns an error if the query string includes any +invalid syntax. 
+ +[WARNING] +==== +Because it returns an error for any invalid syntax, we don't recommend using +the `query_string` query for search boxes. + +If you don't need to support a query syntax, consider using the +<> query. If you need the features of a query +syntax, use the <> +query, which is less strict. +==== + + +[[query-string-query-ex-request]] +==== Example request + +When running the following search, the `query_string` query splits `(new york +city) OR (big apple)` into two parts: `new york city` and `big apple`. The +`content` field's analyzer then independently converts each part into tokens +before returning matching documents. Because the query syntax does not use +whitespace as an operator, `new york city` is passed as-is to the analyzer. [source,js] -------------------------------------------------- @@ -13,154 +44,211 @@ GET /_search { "query": { "query_string" : { - "default_field" : "content", - "query" : "this AND that OR thus" + "query" : "(new york city) OR (big apple)", + "default_field" : "content" } } } -------------------------------------------------- // CONSOLE -The `query_string` query parses the input and splits text around operators. -Each textual part is analyzed independently of each other. For instance the following query: +[[query-string-top-level-params]] +==== Top-level parameters for `query_string` +`query`:: +(Required, string) Query string you wish to parse and use for search. See +<>. -[source,js] --------------------------------------------------- -GET /_search -{ - "query": { - "query_string" : { - "default_field" : "content", - "query" : "(new york city) OR (big apple)" <1> - } - } -} --------------------------------------------------- -// CONSOLE +`default_field`:: ++ +-- +(Optional, string) Default field you wish to search if no field is provided in +the query string. -<1> will be split into `new york city` and `big apple` and each part is then -analyzed independently by the analyzer configured for the field. 
+Defaults to the `index.query.default_field` index setting, which has a default +value of `*`. The `*` value extracts all fields that are eligible for term +queries and filters out the metadata fields. All extracted fields are then combined +to build a query if no `prefix` is specified. -WARNING: Whitespaces are not considered operators, this means that `new york city` -will be passed "as is" to the analyzer configured for the field. If the field is a `keyword` -field the analyzer will create a single term `new york city` and the query builder will -use this term in the query. If you want to query each term separately you need to add explicit -operators around the terms (e.g. `new AND york AND city`). +WARNING: There is a limit on the number of fields that can be queried at once. +It is defined by the `indices.query.bool.max_clause_count` +<>, which defaults to 1024. +-- -When multiple fields are provided it is also possible to modify how the different -field queries are combined inside each textual part using the `type` parameter. -The possible modes are described <> and the default is `best_fields`. +`allow_leading_wildcard`:: +(Optional, boolean) If `true`, the wildcard characters `*` and `?` are allowed +as the first character of the query string. Defaults to `true`. -The `query_string` top level parameters include: +`analyze_wildcard`:: +(Optional, boolean) If `true`, the query attempts to analyze wildcard terms in +the query string. Defaults to `false`. -[cols="<,<",options="header",] -|======================================================================= -|Parameter |Description -|`query` |The actual query to be parsed. See <>. +`analyzer`:: +(Optional, string) <> used to convert text in the +query string into tokens. Defaults to the +<> mapped for the +`default_field`. If no analyzer is mapped, the index's default analyzer is used. -|`default_field` |The default field for query terms if no prefix field is -specified. 
Defaults to the `index.query.default_field` index settings, which in -turn defaults to `*`. `*` extracts all fields in the mapping that are eligible -to term queries and filters the metadata fields. All extracted fields are then -combined to build a query when no prefix field is provided. +`auto_generate_synonyms_phrase_query`:: +(Optional, boolean) If `true`, <> +queries are automatically created for multi-term synonyms. Defaults to `true`. +See <> for an example. -WARNING: There is a limit on the number of fields that can be queried -at once. It is defined by the `indices.query.bool.max_clause_count` <> -which defaults to 1024. +`boost`:: ++ +-- +(Optional, float) Floating point number used to decrease or increase the +<> of the query. Defaults to `1.0`. -|`default_operator` |The default operator used if no explicit operator -is specified. For example, with a default operator of `OR`, the query -`capital of Hungary` is translated to `capital OR of OR Hungary`, and -with default operator of `AND`, the same query is translated to -`capital AND of AND Hungary`. The default value is `OR`. +Boost values are relative to the default value of `1.0`. A boost value between +`0` and `1.0` decreases the relevance score. A value greater than `1.0` +increases the relevance score. +-- -|`analyzer` |The analyzer name used to analyze the query string. +`default_operator`:: ++ +-- +(Optional, string) Default boolean logic used to interpret text in the query +string if no operators are specified. Valid values are: -|`quote_analyzer` |The name of the analyzer that is used to analyze -quoted phrases in the query string. For those parts, it overrides other -analyzers that are set using the `analyzer` parameter or the -<> setting. + `OR` (Default):: +For example, a query string of `capital of Hungary` is interpreted as `capital +OR of OR Hungary`. -|`allow_leading_wildcard` |When set, `*` or `?` are allowed as the first -character. Defaults to `true`. 
+ `AND`:: +For example, a query string of `capital of Hungary` is interpreted as `capital +AND of AND Hungary`. +-- -|`enable_position_increments` |Set to `true` to enable position -increments in result queries. Defaults to `true`. +`enable_position_increments`:: +(Optional, boolean) If `true`, enable position increments in queries constructed +from a `query_string` search. Defaults to `true`. -|`fuzzy_max_expansions` |Controls the number of terms fuzzy queries will -expand to. Defaults to `50` +`fields`:: ++ +-- +(Optional, array of strings) Array of fields you wish to search. -|`fuzziness` |Set the fuzziness for fuzzy queries. Defaults -to `AUTO`. See <> for allowed settings. +You can use this parameter to search across multiple fields. See +<>. +-- -|`fuzzy_prefix_length` |Set the prefix length for fuzzy queries. Default -is `0`. +`fuzziness`:: +(Optional, string) Maximum edit distance allowed for matching. See <> +for valid values and more information. -|`fuzzy_transpositions` |Set to `false` to disable fuzzy transpositions (`ab` -> `ba`). -Default is `true`. +`fuzzy_max_expansions`:: +(Optional, integer) Maximum number of terms to which the query expands for fuzzy +matching. Defaults to `50`. -|`phrase_slop` |Sets the default slop for phrases. If zero, then exact -phrase matches are required. Default value is `0`. +`fuzzy_prefix_length`:: +(Optional, integer) Number of beginning characters left unchanged for fuzzy +matching. Defaults to `0`. -|`boost` |Sets the boost value of the query. Defaults to `1.0`. +`fuzzy_transpositions`:: +(Optional, boolean) If `true`, edits for fuzzy matching include +transpositions of two adjacent characters (ab → ba). Defaults to `true`. -|`analyze_wildcard` |By default, wildcards terms in a query string are -not analyzed. By setting this value to `true`, a best effort will be -made to analyze those as well. 
+`lenient`:: +(Optional, boolean) If `true`, format-based errors, such as providing a text +value for a <> field, are ignored. Defaults to `false`. -|`max_determinized_states` |Limit on how many automaton states regexp -queries are allowed to create. This protects against too-difficult -(e.g. exponentially hard) regexps. Defaults to 10000. +`max_determinized_states`:: ++ +-- +(Optional, integer) Maximum number of +https://en.wikipedia.org/wiki/Deterministic_finite_automaton[automaton states] +required for the query. Default is `10000`. -|`minimum_should_match` |A value controlling how many "should" clauses -in the resulting boolean query should match. It can be an absolute value -(`2`), a percentage (`30%`) or a -<>. +{es} uses https://lucene.apache.org/core/[Apache Lucene] internally to parse +regular expressions. Lucene converts each regular expression to a finite +automaton containing a number of determinized states. -|`lenient` |If set to `true` will cause format based failures (like -providing text to a numeric field) to be ignored. +You can use this parameter to prevent that conversion from unintentionally +consuming too many resources. You may need to increase this limit to run complex +regular expressions. +-- -|`time_zone` | Time Zone to be applied to any range query related to dates. +`minimum_should_match`:: +(Optional, string) Minimum number of clauses that must match for a document to +be returned. See the <> for valid values and more information. See +<> for an example. -|`quote_field_suffix` | A suffix to append to fields for quoted parts of -the query string. This allows to use a field that has a different analysis chain -for exact matching. Look <> for a -comprehensive example. +`quote_analyzer`:: ++ +-- +(Optional, string) <> used to convert quoted text in the +query string into tokens. Defaults to the +<> mapped for the +`default_field`. 
-|`auto_generate_synonyms_phrase_query` |Whether phrase queries should be automatically generated for multi terms synonyms. -Defaults to `true`. +For quoted text, this parameter overrides the analyzer specified in the +`analyzer` parameter. +-- -|======================================================================= +`phrase_slop`:: +(Optional, integer) Maximum number of positions allowed between matching tokens +for phrases. Defaults to `0`. If `0`, exact phrase matches are required. +Transposed terms have a slop of `2`. -When a multi term query is being generated, one can control how it gets -rewritten using the -<> -parameter. +`quote_field_suffix`:: ++ +-- +(Optional, string) Suffix appended to quoted text in the query string. -[float] -==== Default Field +You can use this suffix to use a different analysis method for exact matches. +See <>. +-- -When not explicitly specifying the field to search on in the query -string syntax, the `index.query.default_field` will be used to derive -which field to search on. If the `index.query.default_field` is not specified, -the `query_string` will automatically attempt to determine the existing fields in the index's -mapping that are queryable, and perform the search on those fields. -This will not include nested documents, use a nested query to search those documents. +`rewrite`:: +(Optional, string) Method used to rewrite the query. For valid values and more +information, see the <>. -NOTE: For mappings with a large number of fields, searching across all queryable -fields in the mapping could be expensive. +`time_zone`:: ++ +-- +(Optional, string) +https://en.wikipedia.org/wiki/List_of_UTC_time_offsets[Coordinated Universal +Time (UTC) offset] or +https://en.wikipedia.org/wiki/List_of_tz_database_time_zones[IANA time zone] +used to convert `date` values in the query string to UTC. 
-[float] -==== Multi Field +Valid values are ISO 8601 UTC offsets, such as `+01:00` or `-08:00`, and IANA +time zone IDs, such as `America/Los_Angeles`. -The `query_string` query can also run against multiple fields. Fields can be -provided via the `fields` parameter (example below). +[NOTE] +==== +The `time_zone` parameter does **not** affect the <> value +of `now`. `now` is always the current system time in UTC. However, the +`time_zone` parameter does convert dates calculated using `now` and +<>. For example, the `time_zone` parameter will +convert a value of `now/d`. +==== +-- + +[[query-string-query-notes]] +==== Notes + +include::query-string-syntax.asciidoc[] + +[[query-string-nested]] +====== Avoid using the `query_string` query for nested documents + +`query_string` searches do not return <> documents. To search +nested documents, use the <>. + +[[query-string-multi-field]] +====== Search multiple fields + +You can use the `fields` parameter to perform a `query_string` search across +multiple fields. The idea of running the `query_string` query against multiple fields is to expand each query term to an OR clause like this: - field1:query_term OR field2:query_term | ... +``` +field1:query_term OR field2:query_term | ... +``` For example, the following query @@ -252,21 +340,6 @@ GET /_search NOTE: Since `\` (backslash) is a special character in json strings, it needs to be escaped, hence the two backslashes in the above `query_string`. -When running the `query_string` query against multiple fields, the -following additional parameters are allowed: - -[cols="<,<",options="header",] -|======================================================================= -|Parameter |Description - -|`type` |How the fields should be combined to build the text query. -See <> for a complete example. -Defaults to `best_fields` - -|`tie_breaker` |The disjunction max tie breaker for multi fields. 
-Defaults to `0` -|======================================================================= - The fields parameter can also include pattern based field names, allowing to automatically expand to the relevant fields (dynamically introduced fields included). For example: @@ -285,8 +358,50 @@ GET /_search -------------------------------------------------- // CONSOLE -[float] -==== Synonyms +[[query-string-multi-field-parms]] +====== Additional parameters for multiple field searches + +When running the `query_string` query against multiple fields, the +following additional parameters are supported. + +`type`:: ++ +-- +(Optional, string) Determines how the query matches and scores documents. Valid +values are: + +`best_fields` (Default):: +Finds documents which match any field and uses the highest +<> from any matching field. See +<>. + +`bool_prefix`:: +Creates a `match_bool_prefix` query on each field and combines the `_score` from +each field. See <>. + +`cross_fields`:: +Treats fields with the same `analyzer` as though they were one big field. Looks +for each word in **any** field. See <>. + +`most_fields`:: +Finds documents which match any field and combines the `_score` from each field. +See <>. + +`phrase`:: +Runs a `match_phrase` query on each field and uses the `_score` from the best +field. See <>. + +`phrase_prefix`:: +Runs a `match_phrase_prefix` query on each field and uses the `_score` from the +best field. See <>. + +NOTE: +Additional top-level `multi_match` parameters may be available based on the +<> value. +-- + +[[query-string-synonyms]] +===== Synonyms and the `query_string` query The `query_string` query supports multi-terms synonym expansion with the <> token filter. When this filter is used, the parser creates a phrase query for each multi-terms synonyms. @@ -318,8 +433,8 @@ The example above creates a boolean query: that matches documents with the term `ny` or the conjunction `new AND york`. 
By default the parameter `auto_generate_synonyms_phrase_query` is set to `true`. -[float] -==== Minimum should match +[[query-string-min-should-match]] +===== How `minimum_should_match` works The `query_string` splits the query around each operator to create a boolean query for the entire input. You can use `minimum_should_match` to control how @@ -349,8 +464,8 @@ The example above creates a boolean query: that matches documents with at least two of the terms `this`, `that` or `thus` in the single field `title`. -[float] -===== Multi Field +[[query-string-min-should-match-multi]] +===== How `minimum_should_match` works for multiple fields [source,js] -------------------------------------------------- @@ -404,8 +519,11 @@ The example above creates a boolean query: that matches documents with at least two of the three "should" clauses, each of them made of the disjunction max over the fields for each term. -[float] -===== Cross Field +[[query-string-min-should-match-cross]] +===== How `minimum_should_match` works for cross-field searches + +A `cross_fields` value in the `type` field indicates fields with the same +analyzer are grouped together when the input is analyzed. [source,js] -------------------------------------------------- @@ -426,13 +544,8 @@ GET /_search -------------------------------------------------- // CONSOLE -The `cross_fields` value in the `type` field indicates that fields that have the -same analyzer should be grouped together when the input is analyzed. - The example above creates a boolean query: `(blended(terms:[field2:this, field1:this]) blended(terms:[field2:that, field1:that]) blended(terms:[field2:thus, field1:thus]))~2` that matches documents with at least two of the three per-term blended queries. 
- -include::query-string-syntax.asciidoc[] diff --git a/docs/reference/query-dsl/query-string-syntax.asciidoc b/docs/reference/query-dsl/query-string-syntax.asciidoc index 765b54b5883..03a2e8b8212 100644 --- a/docs/reference/query-dsl/query-string-syntax.asciidoc +++ b/docs/reference/query-dsl/query-string-syntax.asciidoc @@ -1,6 +1,6 @@ [[query-string-syntax]] -==== Query string syntax +===== Query string syntax The query string ``mini-language'' is used by the <> and by the @@ -14,10 +14,9 @@ phrase, in the same order. Operators allow you to customize the search -- the available options are explained below. -===== Field names +====== Field names -As mentioned in <>, the `default_field` is searched for the -search terms, but it is possible to specify other fields in the query syntax: +You can specify fields to search in the query syntax: * where the `status` field contains `active` @@ -40,7 +39,7 @@ search terms, but it is possible to specify other fields in the query syntax: _exists_:title -===== Wildcards +====== Wildcards Wildcard searches can be run on individual terms, using `?` to replace a single character, and `*` to replace zero or more characters: @@ -88,7 +87,7 @@ analyzed and a boolean query will be built out of the different tokens, by ensuring exact matches on the first N-1 tokens, and prefix match on the last token. -===== Regular expressions +====== Regular expressions Regular expression patterns can be embedded in the query string by wrapping them in forward-slashes (`"/"`): @@ -108,7 +107,7 @@ Elasticsearch to visit every term in the index: Use with caution! ======= -===== Fuzziness +====== Fuzziness We can search for terms that are similar to, but not exactly like our search terms, using the ``fuzzy'' @@ -128,7 +127,7 @@ sufficient to catch 80% of all human misspellings. 
It can be specified as: quikc~1 -===== Proximity searches +====== Proximity searches While a phrase query (eg `"john smith"`) expects all of the terms in exactly the same order, a proximity query allows the specified words to be further @@ -143,7 +142,7 @@ query string, the more relevant that document is considered to be. When compared to the above example query, the phrase `"quick fox"` would be considered more relevant than `"quick brown fox"`. -===== Ranges +====== Ranges Ranges can be specified for date, numeric or string fields. Inclusive ranges are specified with square brackets `[min TO max]` and exclusive ranges with @@ -197,7 +196,7 @@ The parsing of ranges in query strings can be complex and error prone. It is much more reliable to use an explicit <>. -===== Boosting +====== Boosting Use the _boost_ operator `^` to make one term more relevant than another. For instance, if we want to find all documents about foxes, but we are @@ -212,7 +211,7 @@ Boosts can also be applied to phrases or to groups: "john smith"^2 (foo bar)^4 -===== Boolean operators +====== Boolean operators By default, all terms are optional, as long as one term matches. A search for `foo bar baz` will find any document that contains one or more of @@ -255,7 +254,7 @@ would look like this: } -===== Grouping +====== Grouping Multiple terms or clauses can be grouped together with parentheses, to form sub-queries: @@ -267,7 +266,7 @@ of a sub-query: status:(active OR pending) title:(full text search)^2 -===== Reserved characters +====== Reserved characters If you need to use any of the characters which function as operators in your query itself (and not as operators), then you should escape them with @@ -283,7 +282,9 @@ NOTE: `<` and `>` can't be escaped at all. The only way to prevent them from attempting to create a range query is to remove them from the query string entirely. -===== Empty Query +====== Whitespaces and empty queries + +Whitespace is not considered an operator. 
If the query string is empty or only contains whitespaces the query will yield an empty result set.
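
As a minimal sketch of the whitespace rule: because whitespace is not an operator, `new york city` reaches the field's analyzer as one piece of text. To query each term separately, the query string needs explicit operators, as in the request below. The `content` field is borrowed from the example request and is only an assumption about the index mapping.

[source,js]
--------------------------------------------------
GET /_search
{
  "query": {
    "query_string" : {
      "query" : "new AND york AND city",
      "default_field" : "content"
    }
  }
}
--------------------------------------------------
// CONSOLE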