[[query-dsl-match-query]]
=== Match query
++++
<titleabbrev>Match</titleabbrev>
++++

`match` queries accept text, numbers, or dates, analyze them, and construct a
query. For example:

[source,js]
--------------------------------------------------
GET /_search
{
    "query": {
        "match" : {
            "message" : "this is a test"
        }
    }
}
--------------------------------------------------
// CONSOLE

Here, `message` is the name of a field; you can substitute the name of any
field instead.

[[query-dsl-match-query-boolean]]
==== match

The `match` query is of type `boolean`. This means that the provided text is
analyzed and the analysis process constructs a boolean query from it. The
`operator` parameter can be set to `or` or `and` to control the boolean clauses
(defaults to `or`). The minimum number of optional `should` clauses to match
can be set using the
<<query-dsl-minimum-should-match,`minimum_should_match`>>
parameter.

The `analyzer` parameter controls which analyzer performs the analysis process
on the text. It defaults to the analyzer in the field's explicit mapping
definition, or to the default search analyzer.

The `lenient` parameter can be set to `true` to ignore exceptions caused by
data-type mismatches, such as trying to query a numeric field with a text
query string. Defaults to `false`.
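
As an illustration of these parameters, the following sketch asks for at least
two of the analyzed terms to match and ignores data-type mismatches. The query
text and the values `2` and `true` are assumptions chosen for the example, not
recommended settings, and how many terms the text produces depends on the
analyzer in use:

[source,js]
--------------------------------------------------
GET /_search
{
    "query": {
        "match" : {
            "message" : {
                "query" : "quick brown fox",
                "minimum_should_match" : 2,
                "lenient" : true
            }
        }
    }
}
--------------------------------------------------
// CONSOLE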

[[query-dsl-match-query-fuzziness]]
===== Fuzziness

`fuzziness` allows _fuzzy matching_ based on the type of field being queried.
See <<fuzziness>> for allowed settings.

The `prefix_length` and `max_expansions` parameters can be set in this case to
control the fuzzy process. If the fuzzy option is set, the query uses
`top_terms_blended_freqs_${max_expansions}` as its
<<query-dsl-multi-term-rewrite,rewrite method>>. The `fuzzy_rewrite` parameter
controls how the query is rewritten.

Fuzzy transpositions (`ab` -> `ba`) are allowed by default but can be disabled
by setting `fuzzy_transpositions` to `false`.

NOTE: Fuzzy matching is not applied to terms with synonyms or in cases where the
analysis process produces multiple tokens at the same position. Under the hood
these terms are expanded to a special synonym query that blends term frequencies,
which does not support fuzzy expansion.
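
A request that combines the fuzzy options described above might look like the
following sketch. The misspelled query text and the specific values for
`prefix_length` and `max_expansions` are assumptions made for illustration;
see <<fuzziness>> for the settings that `fuzziness` accepts:

[source,js]
--------------------------------------------------
GET /_search
{
    "query": {
        "match" : {
            "message" : {
                "query" : "this is a testt",
                "fuzziness" : "AUTO",
                "prefix_length" : 1,
                "max_expansions" : 10,
                "fuzzy_transpositions" : false
            }
        }
    }
}
--------------------------------------------------
// CONSOLE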

The following example sets the `operator` parameter to `and`, so every analyzed
term must match:

[source,js]
--------------------------------------------------
GET /_search
{
    "query": {
        "match" : {
            "message" : {
                "query" : "this is a test",
                "operator" : "and"
            }
        }
    }
}
--------------------------------------------------
// CONSOLE

[[query-dsl-match-query-zero]]
===== Zero terms query

If the analyzer used removes all tokens in a query, as a `stop` filter can, the
default behavior is to match no documents at all. To change that, use the
`zero_terms_query` option, which accepts `none` (default) and `all`. The `all`
value corresponds to a `match_all` query.

[source,js]
--------------------------------------------------
GET /_search
{
    "query": {
        "match" : {
            "message" : {
                "query" : "to be or not to be",
                "operator" : "and",
                "zero_terms_query": "all"
            }
        }
    }
}
--------------------------------------------------
// CONSOLE

[[query-dsl-match-query-cutoff]]
===== Cutoff frequency

deprecated[7.3.0,"This option can be omitted as the <<query-dsl-match-query>> can skip blocks of documents efficiently, without any configuration, provided that the total number of hits is not tracked."]

The `match` query supports a `cutoff_frequency` parameter that specifies an
absolute or relative document frequency. High-frequency terms (above the
cutoff) are moved into an optional subquery and are only scored if one of the
low-frequency terms (below the cutoff) matches, in the case of an `or`
operator, or if all of the low-frequency terms match, in the case of an `and`
operator.

This allows stopwords to be handled dynamically at runtime, is domain
independent, and doesn't require a stopword file. It avoids scoring or
iterating over high-frequency terms and only takes them into account if a
more significant, lower-frequency term matches a document. However, if all
of the query terms are above the given `cutoff_frequency`, the query is
automatically transformed into a pure conjunction (`and`) query to
ensure fast execution.

The `cutoff_frequency` is relative to the total number of documents if it is
in the range `[0..1)`, or absolute if it is greater than or equal to `1.0`.

Here is an example showing a query composed exclusively of stopwords:

[source,js]
--------------------------------------------------
GET /_search
{
    "query": {
        "match" : {
            "message" : {
                "query" : "to be or not to be",
                "cutoff_frequency" : 0.001
            }
        }
    }
}
--------------------------------------------------
// CONSOLE
// TEST[warning:Deprecated field [cutoff_frequency] used, replaced by [you can omit this option, the [match] query can skip block of documents efficiently if the total number of hits is not tracked]]

IMPORTANT: The `cutoff_frequency` option operates at the shard level. This means
that when trying it out on test indices with a low number of documents, you
should follow the advice in {defguide}/relevance-is-broken.html[Relevance is broken].
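
Because `cutoff_frequency` is deprecated, a request that omits it and does not
track the total number of hits might look like the following sketch. It assumes
that the `track_total_hits` search option is available in your version and that
an exact hit count is not needed:

[source,js]
--------------------------------------------------
GET /_search
{
    "track_total_hits" : false,
    "query": {
        "match" : {
            "message" : {
                "query" : "to be or not to be"
            }
        }
    }
}
--------------------------------------------------
// CONSOLE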

[[query-dsl-match-query-synonyms]]
===== Synonyms

The `match` query supports multi-term synonym expansion with the
<<analysis-synonym-graph-tokenfilter,synonym_graph>> token filter. When this
filter is used, the parser creates a phrase query for each multi-term synonym.
For example, the synonym `"ny, new york"` would produce:

`(ny OR ("new york"))`

It is also possible to match multi-term synonyms with conjunctions instead:

[source,js]
--------------------------------------------------
GET /_search
{
    "query": {
        "match" : {
            "message": {
                "query" : "ny city",
                "auto_generate_synonyms_phrase_query" : false
            }
        }
    }
}
--------------------------------------------------
// CONSOLE

The example above creates a boolean query:

`(ny OR (new AND york)) city`

that matches documents with the term `ny` or the conjunction `new AND york`.
By default, the `auto_generate_synonyms_phrase_query` parameter is set to `true`.

.Comparison to query_string / field
**************************************************
The match family of queries does not go through a "query parsing"
process. It does not support field name prefixes, wildcard characters,
or other "advanced" features. For this reason, the chances of it failing are
very small or nonexistent, and it behaves well when the goal is simply to
analyze the text and run it as a query (which is usually what a text search
box does).
**************************************************