[[query-dsl-match-query]]
=== Match Query

A family of `match` queries that accepts text, numerics, or dates, analyzes
the input, and constructs a query from it. For example:

[source,js]
--------------------------------------------------
{
    "match" : {
        "message" : "this is a test"
    }
}
--------------------------------------------------

Note that `message` is the name of a field; you can substitute the name of
any field (including `_all`) instead.

[float]
==== Types of Match Queries

[float]
===== boolean

The default `match` query is of type `boolean`. This means that the text
provided is analyzed and the analysis process constructs a boolean query
from the provided text. The `operator` flag can be set to `or` or `and`
to control the boolean clauses (defaults to `or`). The minimum number of
optional `should` clauses to match can be set using the
<<query-dsl-minimum-should-match,`minimum_should_match`>>
parameter.
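
As a minimal sketch of the `minimum_should_match` parameter, the following
query keeps the default `or` operator but asks that at least two of the
generated `should` clauses match. The value `2` is an illustrative
assumption; the linked section describes the accepted formats:

[source,js]
--------------------------------------------------
{
    "match" : {
        "message" : {
            "query" : "this is a test",
            "minimum_should_match" : 2
        }
    }
}
--------------------------------------------------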

The `analyzer` can be set to control which analyzer will perform the
analysis process on the text. It defaults to the field's explicit mapping
definition, or the default search analyzer.

`fuzziness` allows _fuzzy matching_ based on the type of field being queried.
See <<fuzziness>> for allowed settings.

The `prefix_length` and `max_expansions` parameters can be set in this case
to control the fuzzy process. If the fuzzy option is set, the query will use
`constant_score_rewrite` as its
<<query-dsl-multi-term-rewrite,rewrite method>>. The `fuzzy_rewrite`
parameter controls how the query is rewritten.
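
As a sketch of how these fuzzy options might be combined, the following
query allows one edit per term, requires the first character to match
exactly, and caps the number of term expansions. The misspelled query text
and all three values are illustrative assumptions, not recommended settings:

[source,js]
--------------------------------------------------
{
    "match" : {
        "message" : {
            "query" : "this is a tezt",
            "fuzziness" : 1,
            "prefix_length" : 1,
            "max_expansions" : 50
        }
    }
}
--------------------------------------------------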

Here is an example that provides additional parameters (note the slight
change in structure; `message` is still the field name):

[source,js]
--------------------------------------------------
{
    "match" : {
        "message" : {
            "query" : "this is a test",
            "operator" : "and"
        }
    }
}
--------------------------------------------------

.zero_terms_query
If the analyzer used removes all tokens in a query, as a `stop` filter
does, the default behavior is to match no documents at all. To change
that, use the `zero_terms_query` option, which accepts `none` (default)
and `all`, which corresponds to a `match_all` query.

[source,js]
--------------------------------------------------
{
    "match" : {
        "message" : {
            "query" : "to be or not to be",
            "operator" : "and",
            "zero_terms_query" : "all"
        }
    }
}
--------------------------------------------------

.cutoff_frequency
The match query supports a `cutoff_frequency` that allows
specifying an absolute or relative document frequency. High-frequency
terms (above the cutoff) are moved into an optional subquery and are only
scored if one of the low-frequency terms (below the cutoff) matches in the
case of an `or` operator, or if all of the low-frequency terms match in the
case of an `and` operator.

This query allows handling `stopwords` dynamically at runtime, is domain
independent, and doesn't require a stopword file. It prevents scoring and
iterating over high-frequency terms and only takes those terms into account
if more significant, lower-frequency terms match a document. However, if all
of the query terms are above the given `cutoff_frequency`, the query is
automatically transformed into a pure conjunction (`and`) query to
ensure fast execution.

The `cutoff_frequency` can either be relative to the number of documents
in the index if in the range `[0..1)`, or absolute if greater than or equal
to `1.0`.

Note: If the `cutoff_frequency` is used and the operator is `and`,
_stacked tokens_ (tokens that are on the same position, such as those a
`synonym` filter emits) are not handled gracefully as they are in a pure
`and` query. For instance, if the query `fast fox` is analyzed into 3 terms
`[fast, quick, fox]`, where `quick` is a synonym for `fast` at the same
token position, the query might require both `fast` and `quick` to match
when the operator is `and`.

Here is an example showing a query composed exclusively of stopwords:

[source,js]
--------------------------------------------------
{
    "match" : {
        "message" : {
            "query" : "to be or not to be",
            "cutoff_frequency" : 0.001
        }
    }
}
--------------------------------------------------
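
For comparison, here is a sketch of the absolute form. Because the value is
greater than or equal to `1.0`, it is read as a document count rather than a
fraction of the index, so terms whose document frequency exceeds it are
treated as high-frequency. The value `100` is an illustrative assumption:

[source,js]
--------------------------------------------------
{
    "match" : {
        "message" : {
            "query" : "to be or not to be",
            "cutoff_frequency" : 100
        }
    }
}
--------------------------------------------------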

[float]
===== phrase

The `match_phrase` query analyzes the text and creates a `phrase` query
out of the analyzed text. For example:

[source,js]
--------------------------------------------------
{
    "match_phrase" : {
        "message" : "this is a test"
    }
}
--------------------------------------------------

Since `match_phrase` is only a `type` of a `match` query, it can also be
used in the following manner:

[source,js]
--------------------------------------------------
{
    "match" : {
        "message" : {
            "query" : "this is a test",
            "type" : "phrase"
        }
    }
}
--------------------------------------------------

A phrase query matches terms up to a configurable `slop`
(which defaults to 0) in any order. Transposed terms have a slop of 2.
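
As a sketch of how `slop` might be used, the following query swaps the first
two words of the phrase. Because transposed terms need a slop of 2, setting
`slop` to `2` should let it match a document containing `this is a test`,
which the default of 0 would not. The reordered query text is an
illustrative assumption:

[source,js]
--------------------------------------------------
{
    "match_phrase" : {
        "message" : {
            "query" : "is this a test",
            "slop" : 2
        }
    }
}
--------------------------------------------------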

The `analyzer` can be set to control which analyzer will perform the
analysis process on the text. It defaults to the field's explicit mapping
definition, or the default search analyzer. For example:

[source,js]
--------------------------------------------------
{
    "match_phrase" : {
        "message" : {
            "query" : "this is a test",
            "analyzer" : "my_analyzer"
        }
    }
}
--------------------------------------------------

[float]
===== match_phrase_prefix

The `match_phrase_prefix` is the same as `match_phrase`, except that it
allows for prefix matches on the last term in the text. For example:

[source,js]
--------------------------------------------------
{
    "match_phrase_prefix" : {
        "message" : "this is a test"
    }
}
--------------------------------------------------

Or:

[source,js]
--------------------------------------------------
{
    "match" : {
        "message" : {
            "query" : "this is a test",
            "type" : "phrase_prefix"
        }
    }
}
--------------------------------------------------

It accepts the same parameters as the phrase type. In addition, it also
accepts a `max_expansions` parameter that controls how many
prefixes the last term will be expanded to. It is highly recommended to set
it to an acceptable value to control the execution time of the query.
For example:

[source,js]
--------------------------------------------------
{
    "match_phrase_prefix" : {
        "message" : {
            "query" : "this is a test",
            "max_expansions" : 10
        }
    }
}
--------------------------------------------------

[float]
==== Comparison to query_string / field

The match family of queries does not go through a "query parsing"
process. It does not support field name prefixes, wildcard characters,
or other "advanced" features. For this reason, the chances of it failing are
very small to nonexistent, and it behaves well when the goal is
simply to analyze the text and run it as a query (which is
usually what a text search box does). Also, the `phrase_prefix` type can
provide a great "as you type" behavior to automatically load search
results.

[float]
==== Other options

* `lenient` - If set to `true`, format-based failures (like
providing text to a numeric field) are ignored. Defaults to `false`.
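
As a sketch of the `lenient` option, the following query sends text to a
numeric field and sets `lenient` so that the format failure is ignored
rather than causing an error. The field name `age` and the assumption that
it is mapped as a numeric field are hypothetical:

[source,js]
--------------------------------------------------
{
    "match" : {
        "age" : {
            "query" : "not a number",
            "lenient" : true
        }
    }
}
--------------------------------------------------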