[[query-dsl-query-string-query]] === Query string query ++++ Query string ++++ A query that uses a query parser in order to parse its content. Here is an example: [source,js] -------------------------------------------------- GET /_search { "query": { "query_string" : { "default_field" : "content", "query" : "this AND that OR thus" } } } -------------------------------------------------- // CONSOLE The `query_string` query parses the input and splits text around operators. Each textual part is analyzed independently of each other. For instance the following query: [source,js] -------------------------------------------------- GET /_search { "query": { "query_string" : { "default_field" : "content", "query" : "(new york city) OR (big apple)" <1> } } } -------------------------------------------------- // CONSOLE <1> will be split into `new york city` and `big apple` and each part is then analyzed independently by the analyzer configured for the field. WARNING: Whitespaces are not considered operators, this means that `new york city` will be passed "as is" to the analyzer configured for the field. If the field is a `keyword` field the analyzer will create a single term `new york city` and the query builder will use this term in the query. If you want to query each term separately you need to add explicit operators around the terms (e.g. `new AND york AND city`). When multiple fields are provided it is also possible to modify how the different field queries are combined inside each textual part using the `type` parameter. The possible modes are described <> and the default is `best_fields`. The `query_string` top level parameters include: [cols="<,<",options="header",] |======================================================================= |Parameter |Description |`query` |The actual query to be parsed. See <>. |`default_field` |The default field for query terms if no prefix field is specified. Defaults to the `index.query.default_field` index settings, which in turn defaults to `*`. `*` extracts all fields in the mapping that are eligible to term queries and filters the metadata fields. All extracted fields are then combined to build a query when no prefix field is provided. WARNING: There is a limit on the number of fields that can be queried at once. It is defined by the `indices.query.bool.max_clause_count` <> which defaults to 1024. |`default_operator` |The default operator used if no explicit operator is specified. For example, with a default operator of `OR`, the query `capital of Hungary` is translated to `capital OR of OR Hungary`, and with default operator of `AND`, the same query is translated to `capital AND of AND Hungary`. The default value is `OR`. |`analyzer` |The analyzer name used to analyze the query string. |`quote_analyzer` |The name of the analyzer that is used to analyze quoted phrases in the query string. For those parts, it overrides other analyzers that are set using the `analyzer` parameter or the <> setting. |`allow_leading_wildcard` |When set, `*` or `?` are allowed as the first character. Defaults to `true`. |`enable_position_increments` |Set to `true` to enable position increments in result queries. Defaults to `true`. |`fuzzy_max_expansions` |Controls the number of terms fuzzy queries will expand to. Defaults to `50` |`fuzziness` |Set the fuzziness for fuzzy queries. Defaults to `AUTO`. See <> for allowed settings. |`fuzzy_prefix_length` |Set the prefix length for fuzzy queries. Default is `0`. |`fuzzy_transpositions` |Set to `false` to disable fuzzy transpositions (`ab` -> `ba`). Default is `true`. |`phrase_slop` |Sets the default slop for phrases. If zero, then exact phrase matches are required. Default value is `0`. |`boost` |Sets the boost value of the query. Defaults to `1.0`. |`analyze_wildcard` |By default, wildcards terms in a query string are not analyzed. By setting this value to `true`, a best effort will be made to analyze those as well. |`max_determinized_states` |Limit on how many automaton states regexp queries are allowed to create. This protects against too-difficult (e.g. exponentially hard) regexps. Defaults to 10000. |`minimum_should_match` |A value controlling how many "should" clauses in the resulting boolean query should match. It can be an absolute value (`2`), a percentage (`30%`) or a <>. |`lenient` |If set to `true` will cause format based failures (like providing text to a numeric field) to be ignored. |`time_zone` | Time Zone to be applied to any range query related to dates. |`quote_field_suffix` | A suffix to append to fields for quoted parts of the query string. This allows to use a field that has a different analysis chain for exact matching. Look <> for a comprehensive example. |`auto_generate_synonyms_phrase_query` |Whether phrase queries should be automatically generated for multi terms synonyms. Defaults to `true`. |======================================================================= When a multi term query is being generated, one can control how it gets rewritten using the <> parameter. [float] ==== Default Field When not explicitly specifying the field to search on in the query string syntax, the `index.query.default_field` will be used to derive which field to search on. If the `index.query.default_field` is not specified, the `query_string` will automatically attempt to determine the existing fields in the index's mapping that are queryable, and perform the search on those fields. This will not include nested documents, use a nested query to search those documents. NOTE: For mappings with a large number of fields, searching across all queryable fields in the mapping could be expensive. [float] ==== Multi Field The `query_string` query can also run against multiple fields. Fields can be provided via the `fields` parameter (example below). The idea of running the `query_string` query against multiple fields is to expand each query term to an OR clause like this: field1:query_term OR field2:query_term | ... For example, the following query [source,js] -------------------------------------------------- GET /_search { "query": { "query_string" : { "fields" : ["content", "name"], "query" : "this AND that" } } } -------------------------------------------------- // CONSOLE matches the same words as [source,js] -------------------------------------------------- GET /_search { "query": { "query_string": { "query": "(content:this OR name:this) AND (content:that OR name:that)" } } } -------------------------------------------------- // CONSOLE Since several queries are generated from the individual search terms, combining them is automatically done using a `dis_max` query with a `tie_breaker`. For example (the `name` is boosted by 5 using `^5` notation): [source,js] -------------------------------------------------- GET /_search { "query": { "query_string" : { "fields" : ["content", "name^5"], "query" : "this AND that OR thus", "tie_breaker" : 0 } } } -------------------------------------------------- // CONSOLE Simple wildcard can also be used to search "within" specific inner elements of the document. For example, if we have a `city` object with several fields (or inner object with fields) in it, we can automatically search on all "city" fields: [source,js] -------------------------------------------------- GET /_search { "query": { "query_string" : { "fields" : ["city.*"], "query" : "this AND that OR thus" } } } -------------------------------------------------- // CONSOLE Another option is to provide the wildcard fields search in the query string itself (properly escaping the `*` sign), for example: `city.\*:something`: [source,js] -------------------------------------------------- GET /_search { "query": { "query_string" : { "query" : "city.\\*:(this AND that OR thus)" } } } -------------------------------------------------- // CONSOLE NOTE: Since `\` (backslash) is a special character in json strings, it needs to be escaped, hence the two backslashes in the above `query_string`. When running the `query_string` query against multiple fields, the following additional parameters are allowed: [cols="<,<",options="header",] |======================================================================= |Parameter |Description |`type` |How the fields should be combined to build the text query. See <> for a complete example. Defaults to `best_fields` |`tie_breaker` |The disjunction max tie breaker for multi fields. Defaults to `0` |======================================================================= The fields parameter can also include pattern based field names, allowing to automatically expand to the relevant fields (dynamically introduced fields included). For example: [source,js] -------------------------------------------------- GET /_search { "query": { "query_string" : { "fields" : ["content", "name.*^5"], "query" : "this AND that OR thus" } } } -------------------------------------------------- // CONSOLE [float] ==== Synonyms The `query_string` query supports multi-terms synonym expansion with the <> token filter. When this filter is used, the parser creates a phrase query for each multi-terms synonyms. For example, the following synonym: `ny, new york` would produce: `(ny OR ("new york"))` It is also possible to match multi terms synonyms with conjunctions instead: [source,js] -------------------------------------------------- GET /_search { "query": { "query_string" : { "default_field": "title", "query" : "ny city", "auto_generate_synonyms_phrase_query" : false } } } -------------------------------------------------- // CONSOLE The example above creates a boolean query: `(ny OR (new AND york)) city` that matches documents with the term `ny` or the conjunction `new AND york`. By default the parameter `auto_generate_synonyms_phrase_query` is set to `true`. [float] ==== Minimum should match The `query_string` splits the query around each operator to create a boolean query for the entire input. You can use `minimum_should_match` to control how many "should" clauses in the resulting query should match. [source,js] -------------------------------------------------- GET /_search { "query": { "query_string": { "fields": [ "title" ], "query": "this that thus", "minimum_should_match": 2 } } } -------------------------------------------------- // CONSOLE The example above creates a boolean query: `(title:this title:that title:thus)~2` that matches documents with at least two of the terms `this`, `that` or `thus` in the single field `title`. [float] ===== Multi Field [source,js] -------------------------------------------------- GET /_search { "query": { "query_string": { "fields": [ "title", "content" ], "query": "this that thus", "minimum_should_match": 2 } } } -------------------------------------------------- // CONSOLE The example above creates a boolean query: `((content:this content:that content:thus) | (title:this title:that title:thus))` that matches documents with the disjunction max over the fields `title` and `content`. Here the `minimum_should_match` parameter can't be applied. [source,js] -------------------------------------------------- GET /_search { "query": { "query_string": { "fields": [ "title", "content" ], "query": "this OR that OR thus", "minimum_should_match": 2 } } } -------------------------------------------------- // CONSOLE Adding explicit operators forces each term to be considered as a separate clause. The example above creates a boolean query: `((content:this | title:this) (content:that | title:that) (content:thus | title:thus))~2` that matches documents with at least two of the three "should" clauses, each of them made of the disjunction max over the fields for each term. [float] ===== Cross Field [source,js] -------------------------------------------------- GET /_search { "query": { "query_string": { "fields": [ "title", "content" ], "query": "this OR that OR thus", "type": "cross_fields", "minimum_should_match": 2 } } } -------------------------------------------------- // CONSOLE The `cross_fields` value in the `type` field indicates that fields that have the same analyzer should be grouped together when the input is analyzed. The example above creates a boolean query: `(blended(terms:[field2:this, field1:this]) blended(terms:[field2:that, field1:that]) blended(terms:[field2:thus, field1:thus]))~2` that matches documents with at least two of the three per-term blended queries. include::query-string-syntax.asciidoc[]