---
layout: default
title: Match
parent: Full-text queries
grand_parent: Query DSL
nav_order: 10
---
# Match query

Use the `match` query for full-text search on a specific document field. If you run a `match` query on a `text` field, the `match` query analyzes the provided search string and returns documents that match any of the string's terms. If you run a `match` query on an exact-value field, it returns documents that match the exact value. The preferred way to search exact-value fields is to use a filter because, unlike a query, a filter is cached.

The following example shows a basic `match` query for the word `wind` in the `title` field:
```json
GET _search
{
  "query": {
    "match": {
      "title": "wind"
    }
  }
}
```
{% include copy-curl.html %}
To pass additional parameters, you can use the expanded syntax:
```json
GET _search
{
  "query": {
    "match": {
      "title": {
        "query": "wind",
        "analyzer": "stop"
      }
    }
  }
}
```
{% include copy-curl.html %}
## Examples

The following examples use an index named `testindex` that contains the following documents:
```json
PUT testindex/_doc/1
{
  "title": "Let the wind rise"
}
```
{% include copy-curl.html %}
```json
PUT testindex/_doc/2
{
  "title": "Gone with the wind"
}
```
{% include copy-curl.html %}
```json
PUT testindex/_doc/3
{
  "title": "Rise is gone"
}
```
{% include copy-curl.html %}
### Operator

If a `match` query is run on a `text` field, the text is analyzed with the analyzer specified in the `analyzer` parameter. Then the resulting tokens are combined into a Boolean query using the operator specified in the `operator` parameter. The default operator is `OR`, so the query `wind rise` is changed into `wind OR rise`. With the example documents, this query returns documents 1–3 because each document contains at least one of the query terms. To specify the `and` operator, use the following query:
```json
GET testindex/_search
{
  "query": {
    "match": {
      "title": {
        "query": "wind rise",
        "operator": "and"
      }
    }
  }
}
```
{% include copy-curl.html %}
The query is constructed as `wind AND rise` and returns document 1 as the matching document:
Response
{: .text-delta}

```json
{
  "took": 17,
  "timed_out": false,
  "_shards": {
    "total": 1,
    "successful": 1,
    "skipped": 0,
    "failed": 0
  },
  "hits": {
    "total": {
      "value": 1,
      "relation": "eq"
    },
    "max_score": 1.2667098,
    "hits": [
      {
        "_index": "testindex",
        "_id": "1",
        "_score": 1.2667098,
        "_source": {
          "title": "Let the wind rise"
        }
      }
    ]
  }
}
```
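The following Python sketch mimics how the `operator` parameter combines query tokens. It is a toy model, not OpenSearch's implementation: the hypothetical `analyze` function only approximates the `standard` analyzer, and matching is a plain set comparison that ignores scoring.

```python
import re

# Example documents from this page (the `title` field of testindex).
DOCS = {
    1: "Let the wind rise",
    2: "Gone with the wind",
    3: "Rise is gone",
}


def analyze(text: str) -> list[str]:
    """Rough stand-in for the standard analyzer: lowercase, split on non-alphanumerics."""
    return [token for token in re.split(r"[^0-9a-z]+", text.lower()) if token]


def match(query: str, operator: str = "or") -> list[int]:
    """Return the IDs of documents that match the analyzed query tokens."""
    query_tokens = set(analyze(query))
    if not query_tokens:
        return []  # no tokens left: match nothing (the zero_terms_query default)
    hits = []
    for doc_id, title in DOCS.items():
        doc_tokens = set(analyze(title))
        if operator.lower() == "and":
            matched = query_tokens <= doc_tokens       # every token must be present
        else:
            matched = bool(query_tokens & doc_tokens)  # any one token is enough
        if matched:
            hits.append(doc_id)
    return hits


print(match("wind rise"))         # [1, 2, 3]
print(match("wind rise", "and"))  # [1]
```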
### Minimum should match

You can control the minimum number of terms that a document must match to be returned in the results by specifying the `minimum_should_match` parameter:
```json
GET testindex/_search
{
  "query": {
    "match": {
      "title": {
        "query": "wind rise",
        "operator": "or",
        "minimum_should_match": 2
      }
    }
  }
}
```
{% include copy-curl.html %}
Now documents are required to match both terms, so only document 1 is returned (this is equivalent to the `and` operator):
Response
{: .text-delta}

```json
{
  "took": 23,
  "timed_out": false,
  "_shards": {
    "total": 1,
    "successful": 1,
    "skipped": 0,
    "failed": 0
  },
  "hits": {
    "total": {
      "value": 1,
      "relation": "eq"
    },
    "max_score": 1.2667098,
    "hits": [
      {
        "_index": "testindex",
        "_id": "1",
        "_score": 1.2667098,
        "_source": {
          "title": "Let the wind rise"
        }
      }
    ]
  }
}
```
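The `minimum_should_match` value can be an integer or a percentage, either of which can be negative. The following Python sketch shows one way to turn those forms into a required term count. It assumes that percentages are rounded down and that a negative value counts the terms allowed to be missing; the helper names are hypothetical, and combination values are not modeled.

```python
import math


def required_matches(spec, optional_clauses: int) -> int:
    """Resolve an integer or percentage minimum_should_match value to a term count.

    Combination values (for example, "3<90%") are not handled by this sketch.
    """
    spec = str(spec).strip()
    if spec.endswith("%"):
        percent = float(spec[:-1])
        # Percentages are applied to the number of terms and rounded down.
        count = math.floor(optional_clauses * abs(percent) / 100)
        result = optional_clauses - count if percent < 0 else count
    else:
        value = int(spec)
        # A negative integer is the number of terms that may be missing.
        result = optional_clauses + value if value < 0 else value
    return max(result, 0)


def matches(query_tokens: list[str], doc_tokens: list[str], spec) -> bool:
    """With the `or` operator, at least one term, and at least `spec` terms, must match."""
    needed = max(required_matches(spec, len(query_tokens)), 1)
    return len(set(query_tokens) & set(doc_tokens)) >= needed


query = ["wind", "rise"]
print(matches(query, ["let", "the", "wind", "rise"], 2))   # True  (document 1)
print(matches(query, ["gone", "with", "the", "wind"], 2))  # False (document 2)
print(required_matches("-25%", 4))                         # 3
```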
### Analyzer

Because the preceding examples don't explicitly specify an analyzer, the default `standard` analyzer is used. The `standard` analyzer does not perform stemming, so if you run the query `the wind rises` with the `and` operator, you receive no results because the token `rises` does not match the token `rise`. To change the search analyzer, specify it in the `analyzer` parameter. For example, the following query uses the `english` analyzer:
```json
GET testindex/_search
{
  "query": {
    "match": {
      "title": {
        "query": "the wind rises",
        "operator": "and",
        "analyzer": "english"
      }
    }
  }
}
```
{% include copy-curl.html %}
The `english` analyzer removes the stopword `the` and performs stemming, producing the tokens `wind` and `rise`. Because the query uses the `and` operator, only document 1, which contains both tokens, is returned in the results:
Response
{: .text-delta}

```json
{
  "took": 19,
  "timed_out": false,
  "_shards": {
    "total": 1,
    "successful": 1,
    "skipped": 0,
    "failed": 0
  },
  "hits": {
    "total": {
      "value": 1,
      "relation": "eq"
    },
    "max_score": 1.2667098,
    "hits": [
      {
        "_index": "testindex",
        "_id": "1",
        "_score": 1.2667098,
        "_source": {
          "title": "Let the wind rise"
        }
      }
    ]
  }
}
```
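As a rough illustration of why the search analyzer matters, the following Python sketch compares two hypothetical analyzer stand-ins. `standard_like` only lowercases and splits on whitespace; `english_like` also drops a tiny assumed stopword set and strips a trailing `s`. That plural stripping is a crude substitute for stemming, not what the `english` analyzer actually does. Document 1 is assumed to be indexed with the standard-like tokens.

```python
STOPWORDS = {"the", "and", "or"}  # tiny assumed subset of English stopwords


def standard_like(text: str) -> list[str]:
    """Lowercase and split on whitespace; no stopword removal, no stemming."""
    return text.lower().split()


def english_like(text: str) -> list[str]:
    """Drop stopwords, then strip a trailing "s" as a crude stand-in for stemming."""
    tokens = [t for t in text.lower().split() if t not in STOPWORDS]
    return [t[:-1] if t.endswith("s") and len(t) > 3 else t for t in tokens]


DOC_1 = standard_like("Let the wind rise")  # tokens stored in the index


def and_match(query_tokens: list[str], doc_tokens: list[str]) -> bool:
    """The `and` operator: every query token must appear in the document."""
    return bool(query_tokens) and set(query_tokens) <= set(doc_tokens)


print(and_match(standard_like("the wind rises"), DOC_1))  # False: "rises" != "rise"
print(and_match(english_like("the wind rises"), DOC_1))   # True: tokens are wind, rise
```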
### Empty query

In some cases, an analyzer might remove all tokens from a query. For example, the `english` analyzer removes stopwords, so all tokens are removed from the query `and OR or`. To check the analyzer behavior, you can use the Analyze API:
```json
GET testindex/_analyze
{
  "analyzer" : "english",
  "text" : "and OR or"
}
```
{% include copy-curl.html %}
As expected, the analyzer produces no tokens:

```json
{
  "tokens": []
}
```
You can specify the behavior for an empty query in the `zero_terms_query` parameter. Setting `zero_terms_query` to `all` returns all documents in the index, and setting it to `none` returns no documents:
```json
GET testindex/_search
{
  "query": {
    "match": {
      "title": {
        "query": "and OR or",
        "analyzer" : "english",
        "zero_terms_query": "all"
      }
    }
  }
}
```
{% include copy-curl.html %}
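The following Python sketch models the `zero_terms_query` fallback on the example documents. The stopword set is a tiny assumed subset of the `english` analyzer's list, `search` is a hypothetical helper, and queries that keep at least one token use a simple `OR` match.

```python
STOPWORDS = {"and", "or"}  # assumed subset of the english analyzer's stopwords
DOCS = {1: "Let the wind rise", 2: "Gone with the wind", 3: "Rise is gone"}


def analyze(text: str) -> list[str]:
    """Lowercase, split on whitespace, and drop stopwords."""
    return [t for t in text.lower().split() if t not in STOPWORDS]


def search(query: str, zero_terms_query: str = "none") -> list[int]:
    tokens = set(analyze(query))
    if not tokens:
        # The analyzer removed every token, so zero_terms_query decides the result.
        return sorted(DOCS) if zero_terms_query == "all" else []
    # Otherwise, a default OR match on the remaining tokens.
    return sorted(d for d, title in DOCS.items() if tokens & set(title.lower().split()))


print(search("and OR or"))                          # []
print(search("and OR or", zero_terms_query="all"))  # [1, 2, 3]
```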
### Fuzziness

To account for typos, you can specify `fuzziness` for your query as either of the following:

- An integer that specifies the maximum allowed Levenshtein distance.
- `AUTO`:
  - Strings of 0–2 characters must match exactly.
  - Strings of 3–5 characters allow 1 edit.
  - Strings longer than 5 characters allow 2 edits.

Setting `fuzziness` to `AUTO` works best in most cases:
```json
GET testindex/_search
{
  "query": {
    "match": {
      "title": {
        "query": "wnid",
        "fuzziness": "AUTO"
      }
    }
  }
}
```
{% include copy-curl.html %}
The token `wnid` matches `wind`, and the query returns documents 1 and 2:
Response
{: .text-delta}

```json
{
  "took": 31,
  "timed_out": false,
  "_shards": {
    "total": 1,
    "successful": 1,
    "skipped": 0,
    "failed": 0
  },
  "hits": {
    "total": {
      "value": 2,
      "relation": "eq"
    },
    "max_score": 0.47501624,
    "hits": [
      {
        "_index": "testindex",
        "_id": "1",
        "_score": 0.47501624,
        "_source": {
          "title": "Let the wind rise"
        }
      },
      {
        "_index": "testindex",
        "_id": "2",
        "_score": 0.47501624,
        "_source": {
          "title": "Gone with the wind"
        }
      }
    ]
  }
}
```
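The `AUTO` length thresholds can be written as a small Python function. The name `auto_max_edits` and its `low` and `high` parameters are hypothetical; their defaults simply encode the three length ranges listed for `AUTO` in this section.

```python
def auto_max_edits(term: str, low: int = 3, high: int = 6) -> int:
    """Maximum number of edits that AUTO allows for a term of this length."""
    if len(term) < low:
        return 0  # 0-2 characters: exact match only
    if len(term) < high:
        return 1  # 3-5 characters: one edit
    return 2      # longer than 5 characters: two edits


for word in ["ba", "wnid", "rising"]:
    print(word, auto_max_edits(word))
# ba 0
# wnid 1
# rising 2
```

Under these rules, the four-character query term `wnid` may differ from an indexed term by at most one edit.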
#### Prefix length

Misspellings rarely occur at the beginning of words. Thus, you can specify the number of leading characters that must match exactly for a document to be returned in the results. For example, you can change the preceding query to include a `prefix_length`:
```json
GET testindex/_search
{
  "query": {
    "match": {
      "title": {
        "query": "wnid",
        "fuzziness": "AUTO",
        "prefix_length": 2
      }
    }
  }
}
```
{% include copy-curl.html %}
The preceding query returns no results because the first two characters of `wnid` (`wn`) differ from those of `wind` (`wi`). If you change the `prefix_length` to `1`, documents 1 and 2 are returned because the first letter of the token `wnid` is not misspelled.
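The following Python sketch models the `prefix_length` rule: the leading characters must match exactly, and only then does the edit-distance check apply. OpenSearch implements fuzzy matching differently; `osa_distance` and `fuzzy_match` are hypothetical helpers, and the distance counts an adjacent swap as one edit to match the default transposition behavior.

```python
def osa_distance(a: str, b: str) -> int:
    """Edit distance with insert, delete, substitute, and adjacent swap."""
    d = [[0] * (len(b) + 1) for _ in range(len(a) + 1)]
    for i in range(len(a) + 1):
        d[i][0] = i
    for j in range(len(b) + 1):
        d[0][j] = j
    for i in range(1, len(a) + 1):
        for j in range(1, len(b) + 1):
            cost = 0 if a[i - 1] == b[j - 1] else 1
            d[i][j] = min(d[i - 1][j] + 1, d[i][j - 1] + 1, d[i - 1][j - 1] + cost)
            if i > 1 and j > 1 and a[i - 1] == b[j - 2] and a[i - 2] == b[j - 1]:
                d[i][j] = min(d[i][j], d[i - 2][j - 2] + 1)  # adjacent swap
    return d[len(a)][len(b)]


def fuzzy_match(query_term: str, index_term: str, max_edits: int, prefix_length: int = 0) -> bool:
    """Check the exact prefix first, then the edit distance of the whole term."""
    if query_term[:prefix_length] != index_term[:prefix_length]:
        return False
    return osa_distance(query_term, index_term) <= max_edits


# AUTO allows 1 edit for the 4-character term "wnid".
print(fuzzy_match("wnid", "wind", 1, prefix_length=2))  # False: "wn" != "wi"
print(fuzzy_match("wnid", "wind", 1, prefix_length=1))  # True: "w" matches, 1 swap
```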
#### Transpositions

In the preceding example, the word `wnid` contained a transposition (`in` was changed to `ni`). By default, transpositions are allowed in fuzzy matching, but you can disallow them by setting `fuzzy_transpositions` to `false`:
```json
GET testindex/_search
{
  "query": {
    "match": {
      "title": {
        "query": "wnid",
        "fuzziness": "AUTO",
        "fuzzy_transpositions": false
      }
    }
  }
}
```
{% include copy-curl.html %}
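Now the query returns no results: without transpositions, `wnid` is two edits away from `wind`, but `AUTO` allows only one edit for a four-character term.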
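The effect of `fuzzy_transpositions` can be reproduced with two edit-distance variants. In the following Python sketch, the hypothetical `edit_distance` function computes plain Levenshtein distance when `transpositions` is `False` and optimal string alignment distance (Levenshtein plus swaps of adjacent characters) when it is `True`. This is an assumed distance model for illustration, not OpenSearch's internal implementation.

```python
def edit_distance(a: str, b: str, transpositions: bool = True) -> int:
    """Levenshtein distance, optionally counting an adjacent swap as one edit."""
    d = [[0] * (len(b) + 1) for _ in range(len(a) + 1)]
    for i in range(len(a) + 1):
        d[i][0] = i  # delete all of a[:i]
    for j in range(len(b) + 1):
        d[0][j] = j  # insert all of b[:j]
    for i in range(1, len(a) + 1):
        for j in range(1, len(b) + 1):
            cost = 0 if a[i - 1] == b[j - 1] else 1
            d[i][j] = min(d[i - 1][j] + 1,         # delete
                          d[i][j - 1] + 1,         # insert
                          d[i - 1][j - 1] + cost)  # substitute or keep
            swapped = i > 1 and j > 1 and a[i - 1] == b[j - 2] and a[i - 2] == b[j - 1]
            if transpositions and swapped:
                d[i][j] = min(d[i][j], d[i - 2][j - 2] + 1)  # swap adjacent characters
    return d[len(a)][len(b)]


print(edit_distance("wnid", "wind"))                          # 1
print(edit_distance("wnid", "wind", transpositions=False))    # 2
print(edit_distance("rewind", "wind", transpositions=False))  # 2
```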
### Synonyms

If you use a `synonym_graph` filter and `auto_generate_synonyms_phrase_query` is set to `true` (default), OpenSearch parses the query into terms and then combines the terms to generate a phrase query for multi-term synonyms. For example, if you specify `ba,batting average` as synonyms and search for `ba`, OpenSearch searches for `ba OR "batting average"`.

To match the words of multi-term synonyms as a conjunction (`AND`) instead of a phrase, set `auto_generate_synonyms_phrase_query` to `false`:
```json
GET /testindex/_search
{
  "query": {
    "match": {
      "text": {
        "query": "good ba",
        "auto_generate_synonyms_phrase_query": false
      }
    }
  }
}
```
{% include copy-curl.html %}
In the resulting query, the term `ba` is expanded into `ba OR (batting AND average)`.
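To make the two query shapes concrete, the following Python sketch renders the expansion of a single term as a string. The `expand` function and the `SYNONYMS` map are hypothetical; the sketch only formats the queries described in this section and does not build real OpenSearch queries.

```python
SYNONYMS = {"ba": ["batting average"]}  # mirrors the `ba,batting average` rule


def expand(term: str, synonyms: dict[str, list[str]], phrase: bool = True) -> str:
    """Render a term and its synonyms, joining the alternatives with OR."""
    parts = [term]
    for synonym in synonyms.get(term, []):
        words = synonym.split()
        if len(words) == 1:
            parts.append(synonym)
        elif phrase:
            # auto_generate_synonyms_phrase_query = true: a phrase query
            parts.append(f'"{synonym}"')
        else:
            # auto_generate_synonyms_phrase_query = false: a conjunction of terms
            parts.append("(" + " AND ".join(words) + ")")
    return " OR ".join(parts)


print(expand("ba", SYNONYMS))                # ba OR "batting average"
print(expand("ba", SYNONYMS, phrase=False))  # ba OR (batting AND average)
```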
## Parameters

The query accepts the name of the field (`<field>`) as a top-level parameter:
```json
GET _search
{
  "query": {
    "match": {
      "<field>": {
        "query": "text to search for",
        ...
      }
    }
  }
}
```
{% include copy-curl.html %}
The `<field>` accepts the following parameters. All parameters except `query` are optional.
Parameter | Data type | Description
--- | --- | ---
`query` | String | The query string to use for search. Required.
`auto_generate_synonyms_phrase_query` | Boolean | Specifies whether to create a match phrase query automatically for multi-term synonyms. For example, if you specify `ba,batting average` as synonyms and search for `ba`, OpenSearch searches for `ba OR "batting average"` (if this option is `true`) or `ba OR (batting AND average)` (if this option is `false`). Default is `true`.
`analyzer` | String | The analyzer used to tokenize the query string text. Default is the index-time analyzer specified for the `<field>`. If no analyzer is specified for the `<field>`, the analyzer is the default analyzer for the index.
`boost` | Floating-point | Boosts the clause by the given multiplier. Useful for weighing clauses in compound queries. Values in the [0, 1) range decrease relevance, and values greater than 1 increase relevance. Default is `1`.
`enable_position_increments` | Boolean | When `true`, resulting queries are aware of position increments. This setting is useful when the removal of stopwords leaves an unwanted "gap" between terms. Default is `true`.
`fuzziness` | String | The number of character edits (insert, delete, substitute) that it takes to change one word to another when determining whether a term matched a value. For example, the distance between `wined` and `wind` is 1. Valid values are non-negative integers or `AUTO`. `AUTO` chooses a value based on the length of each term and is a good choice for most use cases.
`fuzzy_rewrite` | String | Determines how OpenSearch rewrites the query. Valid values are `constant_score`, `scoring_boolean`, `constant_score_boolean`, `top_terms_N`, `top_terms_boost_N`, and `top_terms_blended_freqs_N`. If the `fuzziness` parameter is not `0`, the query uses a `fuzzy_rewrite` method of `top_terms_blended_freqs_${max_expansions}` by default. Default is `constant_score`.
`fuzzy_transpositions` | Boolean | Setting `fuzzy_transpositions` to `true` (default) adds swaps of adjacent characters to the insert, delete, and substitute operations of the `fuzziness` option. For example, the distance between `wind` and `wnid` is 1 if `fuzzy_transpositions` is `true` (swap "n" and "i") and 2 if it is `false` (delete "n", insert "n"). If `fuzzy_transpositions` is `false`, `rewind` and `wnid` have the same distance (2) from `wind`, despite the more human-centric opinion that `wnid` is an obvious typo. The default is a good choice for most use cases.
`lenient` | Boolean | Setting `lenient` to `true` ignores data type mismatches between the query and the document field. For example, a query string of `"8.2"` could match a field of type `float`. Default is `false`.
`max_expansions` | Positive integer | The maximum number of terms to which the query can expand. Fuzzy queries "expand to" a number of matching terms that are within the distance specified in `fuzziness`. Then OpenSearch tries to match those terms. Default is `50`.
`minimum_should_match` | Positive or negative integer, positive or negative percentage, combination | If the query string contains multiple search terms and you use the `or` operator, the number of terms that need to match for the document to be considered a match. For example, if `minimum_should_match` is `2`, `wind often rising` does not match `The Wind Rises`. If `minimum_should_match` is `1`, it matches. For details, see Minimum should match.
`operator` | String | If the query string contains multiple search terms, whether all terms need to match (`AND`) or only one term needs to match (`OR`) for a document to be considered a match. Valid values are `OR` (the string `to be` is interpreted as `to OR be`) and `AND` (the string `to be` is interpreted as `to AND be`). Default is `OR`.
`prefix_length` | Non-negative integer | The number of leading characters that are not considered in fuzziness. Default is `0`.
`zero_terms_query` | String | In some cases, the analyzer removes all terms from a query string. For example, the `stop` analyzer removes all terms from the string `an but this`. In those cases, `zero_terms_query` specifies whether to match no documents (`none`) or all documents (`all`). Valid values are `none` and `all`. Default is `none`.
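If you build `match` query bodies in application code, a small helper can catch invalid values before a request is sent. The following Python sketch is hypothetical and not part of any OpenSearch client: `build_match_query` checks only a few of the valid values listed in the preceding table (it is deliberately strict about `fuzziness` and accepts only a non-negative integer or `AUTO`) and returns a body that you could send with `GET <index>/_search`.

```python
import json

VALID_OPERATORS = {"or", "and"}
VALID_ZERO_TERMS_QUERY = {"none", "all"}


def build_match_query(field: str, query: str, **params) -> dict:
    """Return a match query body for `field`, checking a few documented values."""
    operator = params.get("operator")
    if operator is not None and operator.lower() not in VALID_OPERATORS:
        raise ValueError(f"operator must be OR or AND, got {operator!r}")
    zero_terms = params.get("zero_terms_query")
    if zero_terms is not None and zero_terms.lower() not in VALID_ZERO_TERMS_QUERY:
        raise ValueError(f"zero_terms_query must be none or all, got {zero_terms!r}")
    fuzziness = params.get("fuzziness")
    if fuzziness is not None:
        text = str(fuzziness)
        # Accept only the forms listed in the table: a non-negative integer or AUTO.
        if isinstance(fuzziness, bool) or not (text.isdigit() or text.upper() == "AUTO"):
            raise ValueError(f"fuzziness must be a non-negative integer or AUTO, got {fuzziness!r}")
    return {"query": {"match": {field: {"query": query, **params}}}}


body = build_match_query("title", "wnid", fuzziness="AUTO", prefix_length=1)
print(json.dumps(body, indent=2))
```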