[[query-dsl-match-query]]
=== Match query
++++
<titleabbrev>Match</titleabbrev>
++++

Returns documents that match a provided text, number, date or boolean value. The
provided text is analyzed before matching.

The `match` query is the standard query for performing a full-text search,
including options for fuzzy matching.

[[match-query-ex-request]]
==== Example request

[source,console]
--------------------------------------------------
GET /_search
{
  "query": {
    "match": {
      "message": {
        "query": "this is a test"
      }
    }
  }
}
--------------------------------------------------

[[match-top-level-params]]
==== Top-level parameters for `match`

`<field>`::
(Required, object) Field you wish to search.

[[match-field-params]]
==== Parameters for `<field>`
`query`::
+
--
(Required) Text, number, boolean value or date you wish to find in the provided
`<field>`.

The `match` query <<analysis,analyzes>> any provided text before performing a
search. This means the `match` query can search <<text,`text`>> fields for
analyzed tokens rather than an exact term.
--

`analyzer`::
(Optional, string) <<analysis,Analyzer>> used to convert the text in the `query`
value into tokens. Defaults to the <<specify-index-time-analyzer,index-time
analyzer>> mapped for the `<field>`. If no analyzer is mapped, the index's
default analyzer is used.

`auto_generate_synonyms_phrase_query`::
+
--
(Optional, boolean) If `true`, <<query-dsl-match-query-phrase,match phrase>>
queries are automatically created for multi-term synonyms. Defaults to `true`.

See <<query-dsl-match-query-synonyms,Use synonyms with match query>> for an
example.
--

`fuzziness`::
(Optional, string) Maximum edit distance allowed for matching. See <<fuzziness>>
for valid values and more information. See <<query-dsl-match-query-fuzziness>>
for an example.

`max_expansions`::
(Optional, integer) Maximum number of terms to which the query will
expand. Defaults to `50`.

`prefix_length`::
(Optional, integer) Number of beginning characters left unchanged for fuzzy
matching. Defaults to `0`.

`fuzzy_transpositions`::
(Optional, boolean) If `true`, edits for fuzzy matching include
transpositions of two adjacent characters (ab → ba). Defaults to `true`.

`fuzzy_rewrite`::
+
--
(Optional, string) Method used to rewrite the query. See the
<<query-dsl-multi-term-rewrite, `rewrite` parameter>> for valid values and more
information.

If the `fuzziness` parameter is not `0`, the `match` query uses a `fuzzy_rewrite`
method of `top_terms_blended_freqs_${max_expansions}` by default.
--

`lenient`::
(Optional, boolean) If `true`, format-based errors, such as providing a text
`query` value for a <<number,numeric>> field, are ignored. Defaults to `false`.

`operator`::
+
--
(Optional, string) Boolean logic used to interpret text in the `query` value.
Valid values are:

`OR` (Default)::
For example, a `query` value of `capital of Hungary` is interpreted as `capital
OR of OR Hungary`.

`AND`::
For example, a `query` value of `capital of Hungary` is interpreted as `capital
AND of AND Hungary`.
--

`minimum_should_match`::
+
--
(Optional, string) Minimum number of clauses that must match for a document to
be returned. See the <<query-dsl-minimum-should-match, `minimum_should_match`
parameter>> for valid values and more information.
--

`zero_terms_query`::
+
--
(Optional, string) Indicates whether no documents are returned if the `analyzer`
removes all tokens, such as when using a `stop` filter. Valid values are:

`none` (Default)::
No documents are returned if the `analyzer` removes all tokens.

`all`::
Returns all documents, similar to a <<query-dsl-match-all-query,`match_all`>>
query.

See <<query-dsl-match-query-zero>> for an example.
--

[[match-query-notes]]
==== Notes

[[query-dsl-match-query-short-ex]]
===== Short request example

You can simplify the match query syntax by combining the `<field>` and `query`
parameters. For example:

[source,console]
----
GET /_search
{
  "query": {
    "match": {
      "message": "this is a test"
    }
  }
}
----

[[query-dsl-match-query-boolean]]
===== How the match query works

The `match` query is of type `boolean`. This means that the provided text is
analyzed and a boolean query is constructed from the resulting tokens. The
`operator` parameter can be set to `or` or `and` to control the boolean
clauses (defaults to `or`). The minimum number of optional `should` clauses to
match can be set using the
<<query-dsl-minimum-should-match,`minimum_should_match`>>
parameter.

Here is an example with the `operator` parameter:

[source,console]
--------------------------------------------------
GET /_search
{
  "query": {
    "match": {
      "message": {
        "query": "this is a test",
        "operator": "and"
      }
    }
  }
}
--------------------------------------------------
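
The `minimum_should_match` parameter mentioned above can be used in the same
way. The following request is an illustrative sketch rather than a tested
example: it reuses the `message` field from the requests above and picks `2` as
an arbitrary value. Assuming the field's analyzer keeps all four words as
tokens, at least two of them would need to match for a document to be returned.

[source,console]
----
GET /_search
{
  "query": {
    "match": {
      "message": {
        "query": "this is a test",
        "operator": "or",
        "minimum_should_match": "2"
      }
    }
  }
}
----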

The `analyzer` parameter can be set to control which analyzer performs the
analysis process on the text. It defaults to the analyzer defined in the
field's explicit mapping, or to the default search analyzer.
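
As a minimal sketch of overriding the analyzer at query time, the request below
assumes the built-in `stop` analyzer is an appropriate choice for the `message`
field. Both the field and the analyzer choice are illustrative. With that
analyzer, common English stop words in the query text would be dropped before
the boolean query is built.

[source,console]
----
GET /_search
{
  "query": {
    "match": {
      "message": {
        "query": "this is a test",
        "analyzer": "stop"
      }
    }
  }
}
----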

The `lenient` parameter can be set to `true` to ignore exceptions caused by
data-type mismatches, such as trying to query a numeric field with a text
query string. Defaults to `false`.
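
The request below is a minimal sketch of the `lenient` parameter. It assumes a
hypothetical numeric field named `bytes`, which does not appear elsewhere on
this page. Without `lenient`, the text value would be expected to cause a
format-based error. With `lenient` set to `true`, the mismatch is ignored.

[source,console]
----
GET /_search
{
  "query": {
    "match": {
      "bytes": {
        "query": "not-a-number",
        "lenient": true
      }
    }
  }
}
----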

[[query-dsl-match-query-fuzziness]]
===== Fuzziness in the match query

`fuzziness` allows _fuzzy matching_ based on the type of field being queried.
See <<fuzziness>> for allowed settings.

The `prefix_length` and `max_expansions` parameters can be set in this case to
control the fuzzy process. If the fuzzy option is set, the query uses
`top_terms_blended_freqs_${max_expansions}` as its
<<query-dsl-multi-term-rewrite,rewrite method>>. The `fuzzy_rewrite` parameter
controls how the query is rewritten.

Fuzzy transpositions (`ab` -> `ba`) are allowed by default but can be disabled
by setting `fuzzy_transpositions` to `false`.

NOTE: Fuzzy matching is not applied to terms with synonyms or in cases where the
analysis process produces multiple tokens at the same position. Under the hood
these terms are expanded to a special synonym query that blends term frequencies,
which does not support fuzzy expansion.

[source,console]
--------------------------------------------------
GET /_search
{
  "query": {
    "match": {
      "message": {
        "query": "this is a testt",
        "fuzziness": "AUTO"
      }
    }
  }
}
--------------------------------------------------
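
The other fuzzy parameters described in this section can be combined in a single
request. The sketch below is illustrative only: the values `2`, `10`, and
`false` are arbitrary choices, not recommendations.

[source,console]
----
GET /_search
{
  "query": {
    "match": {
      "message": {
        "query": "this is a testt",
        "fuzziness": "AUTO",
        "prefix_length": 2,
        "max_expansions": 10,
        "fuzzy_transpositions": false
      }
    }
  }
}
----

Here, the first two characters of each term would be left unchanged, each fuzzy
term would expand to at most ten terms, and transpositions would not count as a
single edit.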

[[query-dsl-match-query-zero]]
===== Zero terms query
If the analyzer used removes all tokens in a query, as a `stop` filter can,
the default behavior is to match no documents at all. To change that, use the
`zero_terms_query` option, which accepts `none` (default) and `all`. The `all`
value corresponds to a `match_all` query.

[source,console]
--------------------------------------------------
GET /_search
{
  "query": {
    "match": {
      "message": {
        "query": "to be or not to be",
        "operator": "and",
        "zero_terms_query": "all"
      }
    }
  }
}
--------------------------------------------------
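
The `zero_terms_query` option only takes effect if analysis actually removes
every token. As a hedged sketch, the variant below sets the built-in `stop`
analyzer explicitly in the query, on the assumption that every word in
`to be or not to be` is in its default English stop word list. With a different
analyzer or stop word list the outcome may differ.

[source,console]
----
GET /_search
{
  "query": {
    "match": {
      "message": {
        "query": "to be or not to be",
        "analyzer": "stop",
        "zero_terms_query": "all"
      }
    }
  }
}
----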

[[query-dsl-match-query-cutoff]]
===== Cutoff frequency

deprecated[7.3.0,"This option can be omitted as the <<query-dsl-match-query>> can skip blocks of documents efficiently, without any configuration, provided that the total number of hits is not tracked."]

The match query supports a `cutoff_frequency` that specifies an absolute or
relative document frequency. High-frequency terms (above the cutoff) are moved
into an optional subquery and are only scored if one of the low-frequency
terms (below the cutoff) matches in the case of an `or` operator, or if all of
the low-frequency terms match in the case of an `and` operator.

This query allows handling `stopwords` dynamically at runtime, is domain
independent and doesn't require a stopword file. It avoids scoring and
iterating over high-frequency terms, and only takes those terms into account
if a more significant, lower-frequency term matches a document. However, if
all of the query terms are above the given `cutoff_frequency`, the query is
automatically transformed into a pure conjunction (`and`) query to
ensure fast execution.

The `cutoff_frequency` can either be relative to the total number of
documents if in the range from 0 (inclusive) to 1 (exclusive) or absolute if
greater than or equal to `1.0`.

Here is an example showing a query composed of stopwords exclusively:

[source,console]
--------------------------------------------------
GET /_search
{
  "query": {
    "match": {
      "message": {
        "query": "to be or not to be",
        "cutoff_frequency": 0.001
      }
    }
  }
}
--------------------------------------------------
// TEST[warning:Deprecated field [cutoff_frequency] used, replaced by [you can omit this option, the [match] query can skip block of documents efficiently if the total number of hits is not tracked]]

IMPORTANT: The `cutoff_frequency` option operates on a per-shard level. This means
that when trying it out on test indexes with low document numbers you
should follow the advice in {defguide}/relevance-is-broken.html[Relevance is broken].
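
Following the deprecation notice in this section, a request can omit
`cutoff_frequency` and instead avoid tracking the total number of hits. The
sketch below assumes the search API's `track_total_hits` option is the intended
way to do that. Treat it as an illustration rather than a prescribed migration
path.

[source,console]
----
GET /_search
{
  "track_total_hits": false,
  "query": {
    "match": {
      "message": {
        "query": "to be or not to be"
      }
    }
  }
}
----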

[[query-dsl-match-query-synonyms]]
===== Synonyms

The `match` query supports multi-term synonym expansion with the <<analysis-synonym-graph-tokenfilter,
synonym_graph>> token filter. When this filter is used, the parser creates a phrase query for each multi-term synonym.
For example, the following synonym: `"ny, new york"` would produce:

`(ny OR ("new york"))`

It is also possible to match multi-term synonyms with conjunctions instead:

[source,console]
--------------------------------------------------
GET /_search
{
  "query": {
    "match" : {
      "message": {
        "query" : "ny city",
        "auto_generate_synonyms_phrase_query" : false
      }
    }
  }
}
--------------------------------------------------

The example above creates a boolean query:

`(ny OR (new AND york)) city`

that matches documents with the term `ny` or the conjunction `new AND york`.
By default the parameter `auto_generate_synonyms_phrase_query` is set to `true`.
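
For context, the synonym behavior described in this section depends on the
queried field using a `synonym_graph` filter at search time. The following index
definition is a minimal sketch under assumed names: `my-index-000001`,
`my_synonyms`, and `my_search_analyzer` are hypothetical, and only the `message`
field and the `ny, new york` synonym rule come from the examples in this section.

[source,console]
----
PUT /my-index-000001
{
  "settings": {
    "analysis": {
      "filter": {
        "my_synonyms": {
          "type": "synonym_graph",
          "synonyms": [ "ny, new york" ]
        }
      },
      "analyzer": {
        "my_search_analyzer": {
          "type": "custom",
          "tokenizer": "standard",
          "filter": [ "lowercase", "my_synonyms" ]
        }
      }
    }
  },
  "mappings": {
    "properties": {
      "message": {
        "type": "text",
        "search_analyzer": "my_search_analyzer"
      }
    }
  }
}
----

In this sketch the synonym filter is applied only through `search_analyzer`, so
the expansion happens when the `match` query text is analyzed rather than when
documents are indexed.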