Add full-text query documentation (#5428)

* Refactor full-text query documentation

Signed-off-by: Fanit Kolchina <kolchfa@amazon.com>

* Add examples and parameter descriptions

Signed-off-by: Fanit Kolchina <kolchfa@amazon.com>

* Add multi-match query

Signed-off-by: Fanit Kolchina <kolchfa@amazon.com>

* Add query string field format

Signed-off-by: Fanit Kolchina <kolchfa@amazon.com>

* Query string examples

Signed-off-by: Fanit Kolchina <kolchfa@amazon.com>

* Add regular expressions and fuzziness

Signed-off-by: Fanit Kolchina <kolchfa@amazon.com>

* Add wildcard and regex warning

Signed-off-by: Fanit Kolchina <kolchfa@amazon.com>

* Added more query string format

Signed-off-by: Fanit Kolchina <kolchfa@amazon.com>

* Added multi-field sections

Signed-off-by: Fanit Kolchina <kolchfa@amazon.com>

* Rewrite minimum should match section

Signed-off-by: Fanit Kolchina <kolchfa@amazon.com>

* Added allow expensive queries section

Signed-off-by: Fanit Kolchina <kolchfa@amazon.com>

* Add simple query string query

Signed-off-by: Fanit Kolchina <kolchfa@amazon.com>

* Small rewrites

Signed-off-by: Fanit Kolchina <kolchfa@amazon.com>

* Add intervals query

Signed-off-by: Fanit Kolchina <kolchfa@amazon.com>

* Include discover in query string syntax

Signed-off-by: Fanit Kolchina <kolchfa@amazon.com>

* Link and index page fix

Signed-off-by: Fanit Kolchina <kolchfa@amazon.com>

* Apply suggestions from code review

Co-authored-by: Melissa Vagi <vagimeli@amazon.com>
Signed-off-by: kolchfa-aws <105444904+kolchfa-aws@users.noreply.github.com>

* Implemented editorial comments

Signed-off-by: Fanit Kolchina <kolchfa@amazon.com>

---------

Signed-off-by: Fanit Kolchina <kolchfa@amazon.com>
Signed-off-by: kolchfa-aws <105444904+kolchfa-aws@users.noreply.github.com>
Co-authored-by: Melissa Vagi <vagimeli@amazon.com>
kolchfa-aws 2023-11-01 09:29:13 -04:00 committed by GitHub
parent b5ed6c7b1b
commit 88d06e13bd
25 changed files with 3529 additions and 528 deletions

@ -50,10 +50,10 @@ The flat object field type supports the following queries:
- [Terms set]({{site.url}}{{site.baseurl}}/query-dsl/term/terms-set/)
- [Prefix]({{site.url}}{{site.baseurl}}/query-dsl/term/prefix/)
- [Range]({{site.url}}{{site.baseurl}}/query-dsl/term/range/)
- [Match]({{site.url}}{{site.baseurl}}/query-dsl/full-text/#match)
- [Multi-match]({{site.url}}{{site.baseurl}}/query-dsl/full-text/#multi-match)
- [Match]({{site.url}}{{site.baseurl}}/query-dsl/full-text/match/)
- [Multi-match]({{site.url}}{{site.baseurl}}/query-dsl/full-text/multi-match/)
- [Query string]({{site.url}}{{site.baseurl}}/query-dsl/full-text/query-string/)
- [Simple query string]({{site.url}}{{site.baseurl}}/query-dsl/full-text/#simple-query-string)
- [Simple query string]({{site.url}}{{site.baseurl}}/query-dsl/full-text/simple-query-string/)
- [Exists]({{site.url}}{{site.baseurl}}/query-dsl/term/exists/)
## Limitations

@ -10,7 +10,7 @@ redirect_from:
- /query-dsl/query-dsl/compound/bool/
---
# Boolean queries
# Boolean query
A Boolean (`bool`) query can combine several query clauses into one advanced query. The clauses are combined with Boolean logic to find matching documents returned in the results.
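As a minimal sketch of how the clauses combine, the following request assumes a hypothetical `testindex` index with `title`, `description`, and `year` fields (these names are illustrative and do not come from this page). It requires "wind" in the title, excludes documents whose description mentions "storm", prefers titles that also contain "rises", and restricts results to the year 2000 or later:

```json
GET testindex/_search
{
  "query": {
    "bool": {
      "must": [
        { "match": { "title": "wind" } }
      ],
      "must_not": [
        { "match": { "description": "storm" } }
      ],
      "should": [
        { "match": { "title": "rises" } }
      ],
      "filter": [
        { "range": { "year": { "gte": 2000 } } }
      ]
    }
  }
}
```
{% include copy-curl.html %}

In this sketch, the `must` and `should` clauses contribute to the relevance score, while the `filter` and `must_not` clauses only include or exclude documents.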

@ -8,7 +8,7 @@ redirect_from:
- /query-dsl/query-dsl/compound/boosting/
---
# Boosting queries
# Boosting query
If you're searching for the word "pitcher", your results may relate to either baseball players or containers for liquids. For a search in the context of baseball, you might want to completely exclude results that contain the words "glass" or "water" by using the `must_not` clause. However, if you want to keep those results but downgrade them in relevance, you can do so with `boosting` queries.
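The "pitcher" scenario above might be sketched as follows. The `text` field name and the `negative_boost` value of `0.1` are assumptions chosen for illustration. Documents that match the `positive` query are returned, and those that also match the `negative` query have their relevance score multiplied by `negative_boost`:

```json
GET _search
{
  "query": {
    "boosting": {
      "positive": {
        "match": { "text": "pitcher" }
      },
      "negative": {
        "match": { "text": "glass water" }
      },
      "negative_boost": 0.1
    }
  }
}
```
{% include copy-curl.html %}

A `negative_boost` between 0 and 1 lowers the score of the matching documents without removing them from the results.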

@ -8,9 +8,9 @@ redirect_from:
- /query-dsl/query-dsl/compound/constant-score/
---
# Constant score queries
# Constant score query
If you need to return documents that contain a certain word regardless of how many times the word appears, you can use a `constant_ score` query. A `constant_score` query wraps a filter query and assigns all documents in the results a relevance score equal to the value of the `boost` parameter. Thus, all returned documents have an equal relevance score, and term frequency/inverse document frequency (TF/IDF) is not considered. Filter queries do not calculate relevance scores. Further, OpenSearch caches frequently used filter queries to improve performance.
If you need to return documents that contain a certain word regardless of how many times the word appears, you can use a `constant_score` query. A `constant_score` query wraps a filter query and assigns all documents in the results a relevance score equal to the value of the `boost` parameter. Thus, all returned documents have an equal relevance score, and term frequency/inverse document frequency (TF/IDF) is not considered. Filter queries do not calculate relevance scores. Further, OpenSearch caches frequently used filter queries to improve performance.
## Example

@ -8,7 +8,7 @@ redirect_from:
- /query-dsl/query-dsl/compound/disjunction-max/
---
# Disjunction max queries
# Disjunction max query
A disjunction max (`dis_max`) query returns any document that matches one or more query clauses. For documents that match multiple query clauses, the relevance score is set to the highest relevance score from all matching query clauses.
@ -25,6 +25,7 @@ PUT testindex1/_doc/1
"description": "Top 10 sonnets of England's national poet and the Bard of Avon"
}
```
{% include copy-curl.html %}
```json
PUT testindex1/_doc/2
@ -33,8 +34,9 @@ PUT testindex1/_doc/2
"body": "The poems written by various 16-th century poets"
}
```
{% include copy-curl.html %}
Use a `dis_max` query to search for documents that contain the words "Shakespeare works":
Use a `dis_max` query to search for documents that contain the words "Shakespeare poems":
```json
GET testindex1/_search

@ -9,7 +9,7 @@ redirect_from:
- /query-dsl/query-dsl/compound/function-score/
---
# Function score queries
# Function score query
Use a `function_score` query if you need to alter the relevance scores of documents returned in the results. A `function_score` query defines a query and one or more functions that can be applied to all results or subsets of the results to recalculate their relevance scores.
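As a rough illustration, the following request assumes a hypothetical index with `title` and `genre` fields. It runs a `match` query and applies a `weight` function only to the subset of results that pass the function's `filter`:

```json
GET _search
{
  "query": {
    "function_score": {
      "query": {
        "match": { "title": "wind" }
      },
      "functions": [
        {
          "filter": { "term": { "genre": "drama" } },
          "weight": 2
        }
      ],
      "score_mode": "sum",
      "boost_mode": "multiply"
    }
  }
}
```
{% include copy-curl.html %}

Here, `score_mode` controls how the results of multiple functions are combined, and `boost_mode` controls how the combined function result is applied to the original query score. The values shown are one possible choice, not a recommendation.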

@ -2,9 +2,10 @@
layout: default
title: Compound queries
has_children: true
has_toc: false
nav_order: 40
redirect_from:
- /opensearch/query-dsl/compound/index/
- /query-dsl/compound/index/
- /query-dsl/query-dsl/compound/
---
@ -12,10 +13,13 @@ redirect_from:
Compound queries serve as wrappers for multiple leaf or compound clauses either to combine their results or to modify their behavior.
OpenSearch supports the following compound query types:
The following table lists all compound query types.
- **Boolean**: Combines multiple query clauses with Boolean logic. To learn more, see [Boolean queries]({{site.url}}{{site.baseurl}}/opensearch/query-dsl/compound/bool/).
- **Constant score**: Wraps a query or a filter and assigns a constant score to all matching documents. This score is equal to the `boost` value.
- **Disjunction max**: Returns documents that match one or more query clauses. If a document matches multiple query clauses, it is assigned a higher relevance score. The relevance score is calculated using the highest score from any matching clause and, optionally, the scores from the other matching clauses multiplied by the tiebreaker value.
- **Function score**: Recalculates the relevance score of documents that are returned by a query using a function that you define.
- **Boosting**: Changes the relevance score of documents without removing them from the search results. Returns documents that match a `positive` query, but downgrades the relevance of documents in the results that match a `negative` query.
Query type | Description
:--- | :---
[`bool`]({{site.url}}{{site.baseurl}}/query-dsl/compound/bool/) (Boolean) | Combines multiple query clauses with Boolean logic.
[`boosting`]({{site.url}}{{site.baseurl}}/query-dsl/compound/boosting/) | Changes the relevance score of documents without removing them from the search results. Returns documents that match a `positive` query, but downgrades the relevance of documents in the results that match a `negative` query.
[`constant_score`]({{site.url}}{{site.baseurl}}/query-dsl/compound/constant-score/) | Wraps a query or a filter and assigns a constant score to all matching documents. This score is equal to the `boost` value.
[`dis_max`]({{site.url}}{{site.baseurl}}/query-dsl/compound/disjunction-max/) (disjunction max) | Returns documents that match one or more query clauses. If a document matches multiple query clauses, it is assigned a higher relevance score. The relevance score is calculated using the highest score from any matching clause and, optionally, the scores from the other matching clauses multiplied by the tiebreaker value.
[`function_score`]({{site.url}}{{site.baseurl}}/query-dsl/compound/function-score/) | Recalculates the relevance score of documents that are returned by a query using a function that you define.
[`hybrid`]({{site.url}}{{site.baseurl}}/query-dsl/compound/hybrid/) | Combines relevance scores from multiple queries into one score for a given document.
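
To make the `dis_max` scoring description in the table concrete, the following sketch assumes a hypothetical `testindex1` index with `title` and `body` fields and an illustrative `tie_breaker` of `0.3`:

```json
GET testindex1/_search
{
  "query": {
    "dis_max": {
      "queries": [
        { "match": { "title": "Shakespeare poems" } },
        { "match": { "body": "Shakespeare poems" } }
      ],
      "tie_breaker": 0.3
    }
  }
}
```
{% include copy-curl.html %}

Following the description above, if the two clauses scored a document at 1.0 and 0.5 (made-up numbers), the document's score would be 1.0 + 0.3 × 0.5 = 1.15. With the default `tie_breaker` of `0.0`, the score would be 1.0.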

@ -2,6 +2,7 @@
layout: default
title: Full-text queries
has_children: true
has_toc: false
nav_order: 30
redirect_from:
- /opensearch/query-dsl/full-text/
@ -20,459 +21,15 @@ To learn more about search query classes, see [Lucene query JavaDocs](https://lu
The full-text query types shown in this section use the standard analyzer, which analyzes text automatically when the query is submitted.
<!-- to do: rewrite query type definitions per issue: https://github.com/opensearch-project/documentation-website/issues/1116
-->
---
#### Table of contents
1. TOC
{:toc}
---
Common terms queries and the optional query field `cutoff_frequency` are now deprecated.
{: .note }
## Query types
OpenSearch Query DSL provides multiple query types that you can use in your searches.
### Match
Use the `match` query for full-text search of a specific document field. The `match` query analyzes the provided search string and returns documents that match any of the string's terms.
You can use Boolean query operators to combine searches.
<!-- we don't need to include Lucene query definitions >
Creates a [boolean query](https://lucene.apache.org/core/8_9_0/core/org/apache/lucene/search/BooleanQuery.html) that returns results if the search term is present in the field.
-->
The following example shows a basic `match` search for the `title` field set to the value `wind`:
```json
GET _search
{
"query": {
"match": {
"title": "wind"
}
}
}
```
For an example that uses [curl](https://curl.haxx.se/), try:
```bash
curl --insecure -XGET -u 'admin:admin' https://<host>:<port>/<index>/_search \
-H "content-type: application/json" \
-d '{
"query": {
"match": {
"title": "wind"
}
}
}'
```
The query accepts the following options. For descriptions of each, see [Advanced filter options](#advanced-filter-options).
```json
GET _search
{
"query": {
"match": {
"title": {
"query": "wind",
"fuzziness": "AUTO",
"fuzzy_transpositions": true,
"operator": "or",
"minimum_should_match": 1,
"analyzer": "standard",
"zero_terms_query": "none",
"lenient": false,
"prefix_length": 0,
"max_expansions": 50,
"boost": 1
}
}
}
}
```
### Multi-match
You can use the `multi_match` query type to search multiple fields. Multi-match operation functions similarly to the [match](#match) operation.
The `^` lets you "boost" certain fields. Boosts are multipliers that weigh matches in one field more heavily than matches in other fields. In the following example, a match for "wind" in the title field influences `_score` four times as much as a match in the plot field. The result is that films like *The Wind Rises* and *Gone with the Wind* are near the top of the search results, and films like *Twister* and *Sharknado*, which presumably have "wind" in their plot summaries, are near the bottom.
```json
GET _search
{
"query": {
"multi_match": {
"query": "wind",
"fields": ["title^4", "plot"]
}
}
}
```
The query accepts the following options. For descriptions of each, see [Advanced filter options](#advanced-filter-options).
```json
GET _search
{
"query": {
"multi_match": {
"query": "wind",
"fields": ["title^4", "description"],
"type": "most_fields",
"operator": "and",
"minimum_should_match": 3,
"tie_breaker": 0.0,
"analyzer": "standard",
"boost": 1,
"fuzziness": "AUTO",
"fuzzy_transpositions": true,
"lenient": false,
"prefix_length": 0,
"max_expansions": 50,
"auto_generate_synonyms_phrase_query": true,
"zero_terms_query": "none"
}
}
}
```
### Match Boolean prefix
The `match_bool_prefix` query analyzes the provided search string and creates a `bool` query from the string's terms. It uses every term except the last term as a whole word for matching. The last term is used as a prefix. The `match_bool_prefix` query returns documents that contain either the whole-word terms or terms that start with the prefix term, in any order.
```json
GET _search
{
"query": {
"match_bool_prefix": {
"title": "rises wi"
}
}
}
```
The query accepts the following options. For descriptions of each, see [Advanced filter options](#advanced-filter-options).
```json
GET _search
{
"query": {
"match_bool_prefix": {
"title": {
"query": "rises wi",
"fuzziness": "AUTO",
"fuzzy_transpositions": true,
"max_expansions": 50,
"prefix_length": 0,
"operator": "or",
"minimum_should_match": 2,
"analyzer": "standard"
}
}
}
}
```
For more reference information about prefix queries, see the [Lucene documentation](https://lucene.apache.org/core/8_9_0/core/org/apache/lucene/search/PrefixQuery.html).
### Match phrase
Use the `match_phrase` query to match documents that contain an exact phrase in a specified order. You can add flexibility to phrase matching by providing the `slop` parameter.
Creates a [phrase query](https://lucene.apache.org/core/8_9_0/core/org/apache/lucene/search/PhraseQuery.html) that matches a sequence of terms.
```json
GET _search
{
"query": {
"match_phrase": {
"title": "the wind rises"
}
}
}
```
The query accepts the following options. For descriptions of each, see [Advanced filter options](#advanced-filter-options).
```json
GET _search
{
"query": {
"match_phrase": {
"title": {
"query": "wind rises the",
"slop": 3,
"analyzer": "standard",
"zero_terms_query": "none"
}
}
}
}
```
### Match phrase prefix
Use the `match_phrase_prefix` query to specify a phrase to match in order. The documents that contain the phrase you specify will be returned. The last partial term in the phrase is interpreted as a prefix, so any documents that contain phrases that begin with the phrase and prefix of the last term will be returned.
Similar to [match phrase](#match-phrase), but creates a [prefix query](https://lucene.apache.org/core/8_9_0/core/org/apache/lucene/search/PrefixQuery.html) out of the last term in the query string.
```json
GET _search
{
"query": {
"match_phrase_prefix": {
"title": "the wind ri"
}
}
}
```
The query accepts the following options. For descriptions of each, see [Advanced filter options](#advanced-filter-options).
```json
GET _search
{
"query": {
"match_phrase_prefix": {
"title": {
"query": "the wind ri",
"analyzer": "standard",
"max_expansions": 50,
"slop": 3
}
}
}
}
```
<!-- Common terms query has been deprecated. Saving docs in-case we get a request to add it back later. See code deprecation https://github.com/opensearch-project/OpenSearch/blob/main/server/src/main/java/org/opensearch/index/query/CommonTermsQueryBuilder.java#L72-L73>
## Common terms
The common terms query separates the query string into high- and low-frequency terms based on number of occurrences on the shard. Low-frequency terms are weighed more heavily in the results, and high-frequency terms are considered only for documents that already matched one or more low-frequency terms. In that sense, you can think of this query as having a built-in, ever-changing list of stop words.
```json
GET _search
{
"query": {
"common": {
"title": {
"query": "the wind rises"
}
}
}
}
```
The query accepts the following options. For descriptions of each, see [Advanced filter options](#advanced-filter-options).
```json
GET _search
{
"query": {
"common": {
"title": {
"query": "the wind rises",
"cutoff_frequency": 0.002,
"low_freq_operator": "or",
"boost": 1,
"analyzer": "standard",
"minimum_should_match": {
"low_freq" : 2,
"high_freq" : 3
}
}
}
}
}
```
-->
### Query string
The query string query splits text based on operators and analyzes each individually.
If you search using the HTTP request parameters (i.e. `_search?q=wind`), OpenSearch creates a query string query.
{: .note }
```json
GET _search
{
"query": {
"query_string": {
"query": "the wind AND (rises OR rising)"
}
}
}
```
The query accepts the following options. For descriptions of each, see [Advanced filter options](#advanced-filter-options).
```json
GET _search
{
"query": {
"query_string": {
"query": "the wind AND (rises OR rising)",
"default_field": "title",
"type": "best_fields",
"fuzziness": "AUTO",
"fuzzy_transpositions": true,
"fuzzy_max_expansions": 50,
"fuzzy_prefix_length": 0,
"minimum_should_match": 1,
"default_operator": "or",
"analyzer": "standard",
"lenient": false,
"boost": 1,
"allow_leading_wildcard": true,
"enable_position_increments": true,
"phrase_slop": 3,
"max_determinized_states": 10000,
"time_zone": "-08:00",
"quote_field_suffix": "",
"quote_analyzer": "standard",
"analyze_wildcard": false,
"auto_generate_synonyms_phrase_query": true
}
}
}
```
### Simple query string
Use the `simple_query_string` type to specify directly in the query string multiple arguments delineated by regular expressions. Searches with this type will discard any invalid portions of the string.
```json
GET _search
{
"query": {
"simple_query_string": {
"query": "\"rises wind the\"~4 | *ising~2",
"fields": ["title"]
}
}
}
```
Special character | Behavior
:--- | :---
`+` | Acts as the `and` operator.
`|` | Acts as the `or` operator.
`*` | Acts as a wildcard.
`""` | Wraps several terms into a phrase.
`()` | Wraps a clause for precedence.
`~n` | When used after a term (for example, `wnid~3`), sets `fuzziness`. When used after a phrase, sets `slop`. [Advanced filter options](#advanced-filter-options).
`-` | Negates the term.
The query accepts the following options. For descriptions of each, see [Advanced filter options](#advanced-filter-options).
```json
GET _search
{
"query": {
"simple_query_string": {
"query": "\"rises wind the\"~4 | *ising~2",
"fields": ["title"],
"flags": "ALL",
"fuzzy_transpositions": true,
"fuzzy_max_expansions": 50,
"fuzzy_prefix_length": 0,
"minimum_should_match": 1,
"default_operator": "or",
"analyzer": "standard",
"lenient": false,
"quote_field_suffix": "",
"analyze_wildcard": false,
"auto_generate_synonyms_phrase_query": true
}
}
}
```
### Match all
The `match_all` query type will return all documents. This type can be useful in testing large document sets if you need to return the entire set.
```json
GET _search
{
"query": {
"match_all": {}
}
}
```
<!-- need to research why a customer would need to match zero documents in a search >
## Match none
Matches no documents. Rarely useful.
```json
GET _search
{
"query": {
"match_none": {}
}
}
```
-->
## Advanced filter options
You can filter your query results by using some of the optional query fields, such as wildcards, fuzzy query fields, or synonyms. You can also use analyzers as optional query fields.
### Wildcard options
Option | Valid values | Description
:--- | :--- | :---
`allow_leading_wildcard` | Boolean | Whether `*` and `?` are allowed as the first character of a search term. The default is `true`.
`analyze_wildcard` | Boolean | Whether OpenSearch should attempt to analyze wildcard terms. Some analyzers do a poor job at this task, so the default is false.
### Fuzzy query options
Option | Valid values | Description
:--- | :--- | :---
`fuzziness` | `AUTO`, `0`, or a positive integer | The number of character edits (insert, delete, substitute) that it takes to change one word to another when determining whether a term matched a value. For example, the distance between `wined` and `wind` is 1. The default, `AUTO`, chooses a value based on the length of each term and is a good choice for most use cases.
`fuzzy_transpositions` | Boolean | Setting `fuzzy_transpositions` to true (default) adds swaps of adjacent characters to the insert, delete, and substitute operations of the `fuzziness` option. For example, the distance between `wind` and `wnid` is 1 if `fuzzy_transpositions` is true (swap "n" and "i") and 2 if it is false (delete "n", insert "n"). If `fuzzy_transpositions` is false, `rewind` and `wnid` have the same distance (2) from `wind`, despite the more human-centric opinion that `wnid` is an obvious typo. The default is a good choice for most use cases.
`fuzzy_max_expansions` | Positive integer | Fuzzy queries "expand to" a number of matching terms that are within the distance specified in `fuzziness`. Then OpenSearch tries to match those terms against its indexes.
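
The fuzzy options above can be illustrated with the misspelled term `wnid` from the table. This is a sketch that assumes a `title` field containing the word "wind". Because `fuzzy_transpositions` is `true`, swapping the adjacent characters "n" and "i" counts as a single edit, so a `fuzziness` of `1` is enough for `wnid` to match `wind`:

```json
GET _search
{
  "query": {
    "match": {
      "title": {
        "query": "wnid",
        "fuzziness": 1,
        "fuzzy_transpositions": true
      }
    }
  }
}
```

If `fuzzy_transpositions` were set to `false`, the distance between the two terms would be 2, as described in the table, and a `fuzziness` of `1` would no longer be sufficient.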
### Synonyms in a multiple terms search
You can also use synonyms with the `terms` query type to search for multiple terms. Use the `auto_generate_synonyms_phrase_query` Boolean field. By default it is set to `true`. It automatically generates phrase queries for multiple term synonyms. For example, if you have the synonym `"ba, batting average"` and search for "ba," OpenSearch searches for `ba OR "batting average"` when the option is `true` or `ba OR (batting AND average)` when the option is `false`.
To learn more about the multiple terms query type, see [Terms]({{site.url}}{{site.baseurl}}/query-dsl/term/terms/). For more reference information about phrase queries, see the [Lucene documentation](https://lucene.apache.org/core/8_9_0/core/org/apache/lucene/search/PhraseQuery.html).
### Other advanced options
You can also use the following optional query fields to filter your query results.
Option | Valid values | Description
:--- | :--- | :---
`boost` | Floating-point | Boosts the clause by the given multiplier. Useful for weighing clauses in compound queries. The default is 1.0.
`enable_position_increments` | Boolean | When true, result queries are aware of position increments. This setting is useful when the removal of stop words leaves an unwanted "gap" between terms. The default is true.
`fields` | String array | The list of fields to search (e.g. `"fields": ["title^4", "description"]`). If unspecified, defaults to the `index.query.default_field` setting, which defaults to `["*"]`.
`flags` | String | A `|`-delimited string of [flags](#simple-query-string) to enable (e.g., `AND|OR|NOT`). The default is `ALL`. You can explicitly set the value for `default_field`. For example, to return all titles, set it to `"default_field": "title"`.
`lenient` | Boolean | Setting `lenient` to true lets you ignore data type mismatches between the query and the document field. For example, a query string of "8.2" could match a field of type `float`. The default is false.
`low_freq_operator` | `and, or` | The operator for low-frequency terms. The default is `or`. See also `operator` in this table.
`max_determinized_states` | Positive integer | The maximum number of "[states](https://lucene.apache.org/core/8_9_0/core/org/apache/lucene/util/automaton/Operations.html#DEFAULT_MAX_DETERMINIZED_STATES)" (a measure of complexity) that Lucene can create for query strings that contain regular expressions (e.g. `"query": "/wind.+?/"`). Larger numbers allow for queries that use more memory. The default is 10,000.
`max_expansions` | Positive integer | `max_expansions` specifies the maximum number of terms to which the query can expand. The default is 50.
`minimum_should_match` | Positive or negative integer, positive or negative percentage, combination | If the query string contains multiple search terms and you used the `or` operator, the number of terms that need to match for the document to be considered a match. For example, if `minimum_should_match` is 2, "wind often rising" does not match "The Wind Rises." If `minimum_should_match` is 1, it matches.
`operator` | `or, and` | If the query string contains multiple search terms, whether all terms need to match (`and`) or only one term needs to match (`or`) for a document to be considered a match.
`phrase_slop` | `0` (default) or a positive integer | See `slop`.
`prefix_length` | `0` (default) or a positive integer | The number of leading characters that are not considered in fuzziness.
`quote_field_suffix` | String | This option lets you search different fields depending on whether terms are wrapped in quotes. For example, if `quote_field_suffix` is `".exact"` and you search for `"lightly"` (in quotes) in the `title` field, OpenSearch searches the `title.exact` field. This second field might use a different type (e.g. `keyword` rather than `text`) or a different analyzer. The default is null.
`rewrite` | `constant_score, scoring_boolean, constant_score_boolean, top_terms_N, top_terms_boost_N, top_terms_blended_freqs_N` | Determines how OpenSearch rewrites and scores multi-term queries. The default is `constant_score`.
`slop` | `0` (default) or a positive integer | Controls the degree to which words in a query can be misordered and still be considered a match. From the [Lucene documentation](https://lucene.apache.org/core/8_9_0/core/org/apache/lucene/search/PhraseQuery.html#getSlop--): "The number of other words permitted between words in query phrase. For example, to switch the order of two words requires two moves (the first move places the words atop one another), so to permit re-orderings of phrases, the slop must be at least two. A value of zero requires an exact match."
`tie_breaker` | `0.0` (default) to `1.0` | Changes the way OpenSearch scores searches. For example, a `type` of `best_fields` typically uses the highest score from any one field. If you specify a `tie_breaker` value between 0.0 and 1.0, the score changes to highest score + `tie_breaker` * score for all other matching fields. If you specify a value of 1.0, OpenSearch adds together the scores for all matching fields (effectively defeating the purpose of `best_fields`).
`time_zone` | UTC offset hours | Specifies the number of hours to offset the desired time zone from `UTC`. You need to indicate the time zone offset number if the query string contains a date range. For example, set `time_zone": "-08:00"` for a query with a date range such as `"query": "wind rises release_date[2012-01-01 TO 2014-01-01]"`). The default time zone format used to specify number of offset hours is `UTC`.
`type` | `best_fields, most_fields, cross_fields, phrase, phrase_prefix` | Determines how OpenSearch executes the query and scores the results. The default is `best_fields`.
`zero_terms_query` | `none, all` | If the analyzer removes all terms from a query string, whether to match no documents (default) or all documents. For example, the `stop` analyzer removes all terms from the string "an but this."
<!-- cutoff_frequency is now deprecated. See https://github.com/opensearch-project/OpenSearch/blob/main/server/src/main/java/org/opensearch/index/query/MatchQueryBuilder.java#L61-L72 >
`cutoff_frequency` | Between `0.0` and `1.0` or a positive integer | This value lets you define high and low frequency terms based on number of occurrences in the index. Numbers between 0 and 1 are treated as a percentage. For example, 0.10 is 10%. This value means that if a word occurs within the search field in more than 10% of the documents on the shard, OpenSearch considers the word "high frequency" and deemphasizes it when calculating search score.<br /><br />Because this setting is *per shard*, testing its impact on search results can be challenging unless a cluster has many documents. -->
The following table lists all full-text query types.
Query type | Description
:--- | :---
[`intervals`]({{site.url}}{{site.baseurl}}/query-dsl/full-text/intervals/) | Allows fine-grained control of the matching terms' proximity and order.
[`match`]({{site.url}}{{site.baseurl}}/query-dsl/full-text/match/) | The default full-text query, which can be used for fuzzy matching and phrase or proximity searches.
[`match_bool_prefix`]({{site.url}}{{site.baseurl}}/query-dsl/full-text/match-bool-prefix/) | Creates a [Boolean query]({{site.url}}{{site.baseurl}}/query-dsl/compound/bool/) that matches all terms in any position, treating the last term as a prefix.
[`match_phrase`]({{site.url}}{{site.baseurl}}/query-dsl/full-text/match-phrase/) | Similar to the `match` query but matches a whole phrase up to a configurable slop.
[`match_phrase_prefix`]({{site.url}}{{site.baseurl}}/query-dsl/full-text/match-phrase-prefix/) | Similar to the `match_phrase` query but matches terms as a whole phrase, treating the last term as a prefix.
[`multi_match`]({{site.url}}{{site.baseurl}}/query-dsl/full-text/multi-match/) | Similar to the `match` query but is used on multiple fields.
[`query_string`]({{site.url}}{{site.baseurl}}/query-dsl/full-text/query-string/) | Uses a strict syntax to specify Boolean conditions and multi-field search within a single query string.
[`simple_query_string`]({{site.url}}{{site.baseurl}}/query-dsl/full-text/simple-query-string/) | A simpler, less strict version of the `query_string` query.

@ -0,0 +1,399 @@
---
layout: default
title: Intervals
nav_order: 80
parent: Full-text queries
grand_parent: Query DSL
---
# Intervals query
The intervals query matches documents based on the proximity and order of matching terms. It applies a set of _matching rules_ to terms contained in the specified field. The query generates sequences of minimal intervals that span terms in the text. You can combine the intervals and filter them by parent sources.
Consider an index containing the following documents:
```json
PUT /testindex/_doc/1
{
"title": "key-value pairs are efficiently stored in a hash table"
}
```
{% include copy-curl.html %}
```json
PUT /testindex/_doc/2
{
"title": "store key-value pairs in a hash map"
}
```
{% include copy-curl.html %}
For example, the following query searches for documents containing the phrase `key-value pairs` (with no gap separating the terms) followed by either `hash table` or `hash map`:
```json
GET /testindex/_search
{
"query": {
"intervals": {
"title": {
"all_of": {
"ordered": true,
"intervals": [
{
"match": {
"query": "key-value pairs",
"max_gaps": 0,
"ordered": true
}
},
{
"any_of": {
"intervals": [
{
"match": {
"query": "hash table"
}
},
{
"match": {
"query": "hash map"
}
}
]
}
}
]
}
}
}
}
}
```
{% include copy-curl.html %}
The query returns both documents:
<details closed markdown="block">
<summary>
Response
</summary>
{: .text-delta}
```json
{
"took": 1011,
"timed_out": false,
"_shards": {
"total": 1,
"successful": 1,
"skipped": 0,
"failed": 0
},
"hits": {
"total": {
"value": 2,
"relation": "eq"
},
"max_score": 0.25,
"hits": [
{
"_index": "testindex",
"_id": "2",
"_score": 0.25,
"_source": {
"title": "store key-value pairs in a hash map"
}
},
{
"_index": "testindex",
"_id": "1",
"_score": 0.14285713,
"_source": {
"title": "key-value pairs are efficiently stored in a hash table"
}
}
]
}
}
```
</details>
## Parameters
The query accepts the name of the field (`<field>`) as a top-level parameter:
```json
GET _search
{
"query": {
"intervals": {
"<field>": {
...
}
}
}
}
```
{% include copy-curl.html %}
The `<field>` accepts the following rule objects that are used to match documents based on terms, order, and proximity.
Rule | Description
:--- | :---
[`match`](#the-match-rule) | Matches analyzed text.
[`prefix`](#the-prefix-rule) | Matches terms that start with a specified set of characters.
[`wildcard`](#the-wildcard-rule) | Matches terms using a wildcard pattern.
[`fuzzy`](#the-fuzzy-rule) | Matches terms that are similar to the provided term within a specified edit distance.
[`all_of`](#the-all_of-rule) | Combines multiple rules using a conjunction (`AND`).
[`any_of`](#the-any_of-rule) | Combines multiple rules using a disjunction (`OR`).
## The `match` rule
The `match` rule matches analyzed text. The following table lists all parameters the `match` rule supports.
Parameter | Required/Optional | Data type | Description
:--- | :--- | :--- | :---
`query` | Required | String | Text for which to search.
`analyzer` | Optional | String | The [analyzer]({{site.url}}{{site.baseurl}}/analyzers/index/) used to analyze the `query` text. Default is the analyzer specified for the `<field>`.
[`filter`](#the-filter-rule) | Optional | Interval filter rule object | A rule used to filter returned intervals.
`max_gaps` | Optional | Integer | The maximum allowed number of positions between the matching terms. Terms further apart than `max_gaps` are not considered matches. If `max_gaps` is not specified or is set to `-1`, terms are considered matches regardless of their position. If `max_gaps` is set to `0`, matching terms must appear next to each other. Default is `-1`.
`ordered` | Optional | Boolean | Specifies whether matching terms must appear in their specified order. Default is `false`.
`use_field` | Optional | String | Specifies to search this field instead of the top-level `<field>`. Terms are analyzed using the search analyzer specified for this field. By specifying `use_field`, you can search across multiple fields as if they were all the same field. For example, if you index the same text into stemmed and unstemmed fields, you can search for stemmed tokens that are near unstemmed ones.
## The `prefix` rule
The `prefix` rule matches terms that start with a specified set of characters (prefix). The prefix can expand to match at most 128 terms. If the prefix matches more than 128 terms, an error is returned. The following table lists all parameters the `prefix` rule supports.
Parameter | Required/Optional | Data type | Description
:--- | :--- | :--- | :---
`prefix` | Required | String | The prefix used to match terms.
`analyzer` | Optional | String | The [analyzer]({{site.url}}{{site.baseurl}}/analyzers/index/) used to normalize the `prefix`. Default is the analyzer specified for the `<field>`.
`use_field` | Optional | String | Specifies to search this field instead of the top-level `<field>`. The `prefix` is normalized using the search analyzer specified for this field, unless you specify an `analyzer`.
## The `wildcard` rule
The `wildcard` rule matches terms using a wildcard pattern. The wildcard pattern can expand to match at most 128 terms. If the pattern matches more than 128 terms, an error is returned. The following table lists all parameters the `wildcard` rule supports.
Parameter | Required/Optional | Data type | Description
:--- | :--- | :--- | :---
`pattern` | Required | String | The wildcard pattern used to match terms. Specify `?` to match any single character or `*` to match zero or more characters.
`analyzer` | Optional | String | The [analyzer]({{site.url}}{{site.baseurl}}/analyzers/index/) used to normalize the `pattern`. Default is the analyzer specified for the `<field>`.
`use_field` | Optional | String | Specifies to search this field instead of the top-level `<field>`. The `pattern` is normalized using the search analyzer specified for this field, unless you specify an `analyzer`.
Specifying patterns that start with `*` or `?` can hinder search performance because it increases the number of iterations required to match terms.
{: .important}
## The `fuzzy` rule
The `fuzzy` rule matches terms that are similar to the provided term within a specified edit distance. The fuzzy pattern can expand to match at most 128 terms. If the pattern matches more than 128 terms, an error is returned. The following table lists all parameters the `fuzzy` rule supports.
Parameter | Required/Optional | Data type | Description
:--- | :--- | :--- | :---
`term` | Required | String | The term to match.
`analyzer` | Optional | String | The [analyzer]({{site.url}}{{site.baseurl}}/analyzers/index/) used to normalize the `term`. Default is the analyzer specified for the `<field>`.
`fuzziness` | Optional | String | The number of character edits (insert, delete, substitute) that it takes to change one word to another when determining whether a term matched a value. For example, the distance between `wined` and `wind` is 1. Valid values are non-negative integers or `AUTO`. The default, `AUTO`, chooses a value based on the length of each term and is a good choice for most use cases.
`transpositions` | Optional | Boolean | Setting `transpositions` to `true` (default) adds swaps of adjacent characters to the insert, delete, and substitute operations of the `fuzziness` option. For example, the distance between `wind` and `wnid` is 1 if `transpositions` is true (swap "n" and "i") and 2 if it is false (delete "n", insert "n"). If `transpositions` is `false`, `rewind` and `wnid` have the same distance (2) from `wind`, despite the more human-centric opinion that `wnid` is an obvious typo. The default is a good choice for most use cases.
`prefix_length` | Optional | Integer | The number of beginning characters left unchanged for fuzzy matching. Default is `0`.
`use_field` | Optional | String | Specifies to search this field instead of the top-level `<field>`. The `term` is normalized using the search analyzer specified for this field, unless you specify an `analyzer`.
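The distances quoted in the `fuzziness` and `transpositions` descriptions can be reproduced with a short Python sketch. This is only an illustration of the edit-distance arithmetic: the `edit_distance` function is a hypothetical helper written for this example, and OpenSearch itself does not compute distances this way at query time.

```python
def edit_distance(a: str, b: str, transpositions: bool = True) -> int:
    """Count the edits needed to turn a into b.

    Insert, delete, and substitute each cost 1. When transpositions is True,
    swapping two adjacent characters also costs 1.
    """
    rows, cols = len(a) + 1, len(b) + 1
    d = [[0] * cols for _ in range(rows)]
    for i in range(rows):
        d[i][0] = i  # delete all i characters of a
    for j in range(cols):
        d[0][j] = j  # insert all j characters of b
    for i in range(1, rows):
        for j in range(1, cols):
            cost = 0 if a[i - 1] == b[j - 1] else 1
            d[i][j] = min(
                d[i - 1][j] + 1,         # delete
                d[i][j - 1] + 1,         # insert
                d[i - 1][j - 1] + cost,  # substitute (or keep)
            )
            if (
                transpositions
                and i > 1
                and j > 1
                and a[i - 1] == b[j - 2]
                and a[i - 2] == b[j - 1]
            ):
                d[i][j] = min(d[i][j], d[i - 2][j - 2] + 1)  # swap neighbors
    return d[-1][-1]


print(edit_distance("wined", "wind"))
print(edit_distance("wind", "wnid", transpositions=True))
print(edit_distance("wind", "wnid", transpositions=False))
print(edit_distance("wind", "rewind", transpositions=False))
```

With transpositions counted, `wnid` is closer to `wind` than `rewind` is, which is why the default setting tends to rank obvious typos higher.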
## The `all_of` rule
The `all_of` rule combines multiple rules using a conjunction (`AND`). The following table lists all parameters the `all_of` rule supports.
Parameter | Required/Optional | Data type | Description
:--- | :--- | :--- | :---
`intervals` | Required | Array of rule objects | An array of rules to combine. A document must match all rules in order to be returned in the results.
[`filter`](#the-filter-rule) | Optional | Interval filter rule object | A rule used to filter returned intervals.
`max_gaps` | Optional | Integer | The maximum allowed number of positions between the matching terms. Terms further apart than `max_gaps` are not considered matches. If `max_gaps` is not specified or is set to `-1`, terms are considered matches regardless of their position. If `max_gaps` is set to `0`, matching terms must appear next to each other. Default is `-1`.
`ordered` | Optional | Boolean | If `true`, intervals generated by the rules should appear in the specified order. Default is `false`.
## The `any_of` rule
The `any_of` rule combines multiple rules using a disjunction (`OR`). The following table lists all parameters the `any_of` rule supports.
Parameter | Required/Optional | Data type | Description
:--- | :--- | :--- | :---
`intervals` | Required | Array of rule objects | An array of rules to combine. A document must match at least one rule in order to be returned in the results.
[`filter`](#the-filter-rule) | Optional | Interval filter rule object | A rule used to filter returned intervals.
## The `filter` rule
The `filter` rule is used to restrict the results. The following table lists all parameters the `filter` rule supports.
Parameter | Required/Optional | Data type | Description
:--- | :--- | :--- | :---
`after` | Optional | Query object | A query used to return intervals that follow an interval specified in the filter rule.
`before` | Optional | Query object | A query used to return intervals that are before an interval specified in the filter rule.
`contained_by` | Optional | Query object | A query used to return intervals contained by an interval specified in the filter rule.
`containing` | Optional | Query object | A query used to return intervals that contain an interval specified in the filter rule.
`not_contained_by` | Optional | Query object | A query used to return intervals that are not contained by an interval specified in the filter rule.
`not_containing` | Optional | Query object | A query used to return intervals that do not contain an interval specified in the filter rule.
`not_overlapping` | Optional | Query object | A query used to return intervals that do not overlap with an interval specified in the filter rule.
`overlapping` | Optional | Query object | A query used to return intervals that overlap with an interval specified in the filter rule.
`script` | Optional | Script object | A script used to match documents. This script must return `true` or `false`.
#### Example: Filters
The following query searches for documents containing the words `pairs` and `hash` that are within five positions of each other and don't contain the word `efficiently` between them:
```json
POST /testindex/_search
{
"query": {
"intervals" : {
"title" : {
"match" : {
"query" : "pairs hash",
"max_gaps" : 5,
"filter" : {
"not_containing" : {
"match" : {
"query" : "efficiently"
}
}
}
}
}
}
}
}
```
{% include copy-curl.html %}
The response contains only document 2:
<details closed markdown="block">
<summary>
Response
</summary>
{: .text-delta}
```json
{
"took": 2,
"timed_out": false,
"_shards": {
"total": 1,
"successful": 1,
"skipped": 0,
"failed": 0
},
"hits": {
"total": {
"value": 1,
"relation": "eq"
},
"max_score": 0.25,
"hits": [
{
"_index": "testindex",
"_id": "2",
"_score": 0.25,
"_source": {
"title": "store key-value pairs in a hash map"
}
}
]
}
}
```
</details>
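To see why only document 2 passes the filter, the following Python sketch models the `max_gaps` check and the `not_containing` filter on token positions. It is a simplified model under stated assumptions: the `tokenize` and `matches` helpers are hypothetical, the tokenizer only approximates the default analyzer, and each search term is assumed to occur once per title.

```python
import re


def tokenize(text):
    # Assumption: lowercasing and splitting on non-word characters
    # approximates the default analyzer for these titles.
    return re.findall(r"\w+", text.lower())


def matches(title, terms=("pairs", "hash"), max_gaps=5, excluded="efficiently"):
    tokens = tokenize(title)
    if any(term not in tokens for term in terms):
        return False
    # Assumption: each term occurs once, so a single interval spans them.
    positions = [tokens.index(term) for term in terms]
    start, end = min(positions), max(positions)
    # Positions inside the interval that are not occupied by a search term.
    gaps = (end - start + 1) - len(terms)
    if gaps > max_gaps:
        return False
    # not_containing: reject the interval if the excluded term falls inside it.
    return excluded not in tokens[start:end + 1]


doc1 = "key-value pairs are efficiently stored in a hash table"
doc2 = "store key-value pairs in a hash map"
print(matches(doc1), matches(doc2))
```

In this model, both titles satisfy `max_gaps`, but the interval in document 1 contains `efficiently`, so the filter rejects it.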
#### Example: Script filters
Alternatively, you can write your own script filter to include with the `intervals` query using the following variables:
- `interval.start`: The position (term number) where the interval starts.
- `interval.end`: The position (term number) where the interval ends.
- `interval.gaps`: The number of words between the terms.
For example, the following query searches for the words `map` and `hash` that are next to each other within the specified interval. Terms are numbered starting with 0, so in the text `store key-value pairs in a hash map`, `store` is at position `0`, `key` is at position `1`, and so on. The specified interval should start after `a` (position `5`) and end before the end of the string:
```json
POST /testindex/_search
{
"query": {
"intervals" : {
"title" : {
"match" : {
"query" : "map hash",
"filter" : {
"script" : {
"source" : "interval.start > 5 && interval.end < 8 && interval.gaps == 0"
}
}
}
}
}
}
}
```
{% include copy-curl.html %}
The response contains document 2:
<details closed markdown="block">
<summary>
Response
</summary>
{: .text-delta}
```json
{
"took": 1,
"timed_out": false,
"_shards": {
"total": 1,
"successful": 1,
"skipped": 0,
"failed": 0
},
"hits": {
"total": {
"value": 1,
"relation": "eq"
},
"max_score": 0.5,
"hits": [
{
"_index": "testindex",
"_id": "2",
"_score": 0.5,
"_source": {
"title": "store key-value pairs in a hash map"
}
}
]
}
}
```
</details>
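The positions used in the script condition can be checked with a small Python sketch. The tokenization here is an assumption that approximates the default analyzer, and the variables `start`, `end`, and `gaps` are local stand-ins for `interval.start`, `interval.end`, and `interval.gaps`, not the values OpenSearch computes internally.

```python
import re

title = "store key-value pairs in a hash map"

# Assumption: lowercasing and splitting on non-word characters
# approximates the default analyzer for this title.
tokens = re.findall(r"\w+", title.lower())
positions = {token: position for position, token in enumerate(tokens)}

terms = ["map", "hash"]
start = min(positions[t] for t in terms)  # stand-in for interval.start
end = max(positions[t] for t in terms)    # stand-in for interval.end
gaps = (end - start + 1) - len(terms)     # stand-in for interval.gaps

# The same condition as the script source in the query.
passes = start > 5 and end < 8 and gaps == 0
print(positions, start, end, gaps, passes)
```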
## Interval minimization
To ensure that queries run in linear time, the `intervals` query minimizes the intervals. For example, consider a document containing the text `a b c d c`. You can use the following query to search for `d` that is contained by `a` and `c`:
```json
POST /testindex/_search
{
"query": {
"intervals" : {
"my_text" : {
"match" : {
"query" : "d",
"filter" : {
"contained_by" : {
"match" : {
"query" : "a c"
}
}
}
}
}
}
}
}
```
{% include copy-curl.html %}
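The query returns no results because the minimal interval that matches `a c` spans only the first three terms (`a b c`), and `d` does not appear within it. The longer interval `a b c d c` is not considered because it is not minimal.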
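The effect of minimization on this example can be sketched in Python. The helpers `minimal_intervals` and `contained_by` are hypothetical names used only for illustration, and the brute-force enumeration is an assumption made for clarity rather than the linear-time algorithm the query actually uses.

```python
from itertools import product


def minimal_intervals(tokens, terms):
    """Return the minimal (start, end) spans that cover every term in any order."""
    positions = [[i for i, tok in enumerate(tokens) if tok == term] for term in terms]
    spans = {(min(combo), max(combo)) for combo in product(*positions)}
    # A span is minimal if no other candidate span lies inside it.
    return sorted(
        s
        for s in spans
        if not any(o != s and s[0] <= o[0] and o[1] <= s[1] for o in spans)
    )


def contained_by(inner, outer):
    """Keep the inner spans that lie completely inside some outer span."""
    return [i for i in inner if any(o[0] <= i[0] and i[1] <= o[1] for o in outer)]


tokens = "a b c d c".split()
outer = minimal_intervals(tokens, ["a", "c"])  # candidates (0, 2) and (0, 4)
inner = minimal_intervals(tokens, ["d"])
print(outer, inner, contained_by(inner, outer))
```

Only the span from position 0 to position 2 survives minimization, so the `d` at position 3 has no enclosing interval.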

---
layout: default
title: Match Boolean prefix
parent: Full-text queries
grand_parent: Query DSL
nav_order: 20
---
# Match Boolean prefix query
The `match_bool_prefix` query analyzes the provided search string and creates a [Boolean query]({{site.url}}{{site.baseurl}}/query-dsl/compound/bool/) from the string's terms. It uses every term except the last term as a whole word for matching. The last term is used as a prefix. The `match_bool_prefix` query returns documents that contain either the whole-word terms or terms that start with the prefix term, in any order.
The following example shows a basic `match_bool_prefix` query:
```json
GET _search
{
"query": {
"match_bool_prefix": {
"title": "the wind"
}
}
}
```
{% include copy-curl.html %}
To pass additional parameters, you can use the expanded syntax:
```json
GET _search
{
"query": {
"match_bool_prefix": {
"title": {
"query": "the wind",
"analyzer": "stop"
}
}
}
}
```
{% include copy-curl.html %}
## Example
For example, consider an index with the following documents:
```json
PUT testindex/_doc/1
{
"title": "The wind rises"
}
```
{% include copy-curl.html %}
```json
PUT testindex/_doc/2
{
"title": "Gone with the wind"
}
```
{% include copy-curl.html %}
The following `match_bool_prefix` query searches for the whole word `rises` and the words that start with `wi`, in any order:
```json
GET testindex/_search
{
"query": {
"match_bool_prefix": {
"title": "rises wi"
}
}
}
```
{% include copy-curl.html %}
The preceding query is equivalent to the following Boolean query:
```json
GET testindex/_search
{
"query": {
"bool" : {
"should": [
{ "term": { "title": "rises" }},
{ "prefix": { "title": "wi"}}
]
}
}
}
```
{% include copy-curl.html %}
The response contains both documents:
<details closed markdown="block">
<summary>
Response
</summary>
{: .text-delta}
```json
{
"took": 15,
"timed_out": false,
"_shards": {
"total": 1,
"successful": 1,
"skipped": 0,
"failed": 0
},
"hits": {
"total": {
"value": 2,
"relation": "eq"
},
"max_score": 1.73617,
"hits": [
{
"_index": "testindex",
"_id": "1",
"_score": 1.73617,
"_source": {
"title": "The wind rises"
}
},
{
"_index": "testindex",
"_id": "2",
"_score": 1,
"_source": {
"title": "Gone with the wind"
}
}
]
}
}
```
</details>
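The rewrite from a search string into a Boolean query can be sketched in Python as follows. The `bool_prefix_query` function is a hypothetical helper, and lowercasing plus whitespace splitting is an assumption that stands in for the field's analyzer.

```python
def bool_prefix_query(field, text):
    """Build the Boolean query that a match_bool_prefix search string corresponds to."""
    # Assumption: lowercasing and whitespace splitting stand in for analysis.
    terms = text.lower().split()
    if not terms:
        return {"bool": {"should": []}}
    *whole_words, last = terms
    # Every term except the last is matched as a whole word.
    should = [{"term": {field: term}} for term in whole_words]
    # The last term is matched as a prefix.
    should.append({"prefix": {field: last}})
    return {"bool": {"should": should}}


print(bool_prefix_query("title", "rises wi"))
```

Because all clauses are `should` clauses, a document that matches any one of them is returned, which is why both example documents appear in the response.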
## The `match_bool_prefix` and `match_phrase_prefix` queries
The `match_bool_prefix` query matches terms in any position, while the `match_phrase_prefix` query matches terms as a whole phrase. To illustrate the difference, once again consider the `match_bool_prefix` query from the preceding section:
```json
GET testindex/_search
{
"query": {
"match_bool_prefix": {
"title": "rises wi"
}
}
}
```
{% include copy-curl.html %}
Both `The wind rises` and `Gone with the wind` match the search terms, so the query returns both documents.
Now run a `match_phrase_prefix` query on the same index:
```json
GET testindex/_search
{
"query": {
"match_phrase_prefix": {
"title": "rises wi"
}
}
}
```
{% include copy-curl.html %}
The response returns no documents because no document contains the word `rises` immediately followed by a word that starts with `wi`.
## Analyzer
By default, when you run a query on a `text` field, the search text is analyzed using the index analyzer associated with the field. You can specify a different search analyzer in the `analyzer` parameter:
```json
GET testindex/_search
{
"query": {
"match_bool_prefix": {
"title": {
"query": "rise the wi",
"analyzer": "stop"
}
}
}
}
```
{% include copy-curl.html %}
## Parameters
The query accepts the name of the field (`<field>`) as a top-level parameter:
```json
GET _search
{
"query": {
"match_bool_prefix": {
"<field>": {
"query": "text to search for",
...
}
}
}
}
```
{% include copy-curl.html %}
The `<field>` accepts the following parameters. All parameters except `query` are optional.
Parameter | Data type | Description
:--- | :--- | :---
`query` | String | The text, number, Boolean value, or date to use for search. Required.
`analyzer` | String | The [analyzer]({{site.url}}{{site.baseurl}}/analyzers/index/) used to tokenize the query string text. Default is the index-time analyzer specified for the `<field>`. If no analyzer is specified for the `<field>`, the `analyzer` is the default analyzer for the index.
`fuzziness` | `AUTO`, `0`, or a positive integer | The number of character edits (insert, delete, substitute) that it takes to change one word to another when determining whether a term matched a value. For example, the distance between `wined` and `wind` is 1. The default, `AUTO`, chooses a value based on the length of each term and is a good choice for most use cases.
`fuzzy_rewrite` | String | Determines how OpenSearch rewrites the query. Valid values are `constant_score`, `scoring_boolean`, `constant_score_boolean`, `top_terms_N`, `top_terms_boost_N`, and `top_terms_blended_freqs_N`. If the `fuzziness` parameter is not `0`, the query uses a `fuzzy_rewrite` method of `top_terms_blended_freqs_${max_expansions}` by default. Default is `constant_score`.
`fuzzy_transpositions` | Boolean | Setting `fuzzy_transpositions` to `true` (default) adds swaps of adjacent characters to the insert, delete, and substitute operations of the `fuzziness` option. For example, the distance between `wind` and `wnid` is 1 if `fuzzy_transpositions` is true (swap "n" and "i") and 2 if it is false (delete "n", insert "n"). If `fuzzy_transpositions` is false, `rewind` and `wnid` have the same distance (2) from `wind`, despite the more human-centric opinion that `wnid` is an obvious typo. The default is a good choice for most use cases.
`max_expansions` | Positive integer | The maximum number of terms to which the query can expand. Fuzzy queries “expand to” a number of matching terms that are within the distance specified in `fuzziness`. Then OpenSearch tries to match those terms. Default is `50`.
`minimum_should_match` | Positive or negative integer, positive or negative percentage, combination | If the query string contains multiple search terms and you use the `or` operator, the number of terms that need to match for the document to be considered a match. For example, if `minimum_should_match` is `2`, `wind often rising` does not match `The Wind Rises`. If `minimum_should_match` is `1`, it matches. For details, see [Minimum should match]({{site.url}}{{site.baseurl}}/query-dsl/minimum-should-match/).
`operator` | String | If the query string contains multiple search terms, whether all terms need to match (`and`) or only one term needs to match (`or`) for a document to be considered a match. Valid values are `or` and `and`. Default is `or`.
`prefix_length` | Non-negative integer | The number of leading characters that are not considered in fuzziness. Default is `0`.
The `fuzziness`, `fuzzy_transpositions`, `fuzzy_rewrite`, `max_expansions`, and `prefix_length` parameters can be applied to the term subqueries constructed for all terms except the final term. They do not have any effect on the prefix query constructed for the final term.
{: .note}

---
layout: default
title: Match phrase prefix
parent: Full-text queries
grand_parent: Query DSL
nav_order: 40
---
# Match phrase prefix query
Use the `match_phrase_prefix` query to specify a phrase to match in order. The documents that contain the phrase you specify will be returned. The last partial term in the phrase is interpreted as a prefix, so any documents that contain phrases that begin with the phrase and prefix of the last term will be returned.
The `match_phrase_prefix` query is similar to the [`match_phrase`]({{site.url}}{{site.baseurl}}/query-dsl/full-text/match-phrase/) query but creates a [prefix query](https://lucene.apache.org/core/8_9_0/core/org/apache/lucene/search/PrefixQuery.html) out of the last term in the query string.
For differences between the `match_phrase_prefix` and the `match_bool_prefix` queries, see [The `match_bool_prefix` and `match_phrase_prefix` queries]({{site.url}}{{site.baseurl}}/query-dsl/full-text/match-bool-prefix/#the-match_bool_prefix-and-match_phrase_prefix-queries).
The following example shows a basic `match_phrase_prefix` query:
```json
GET _search
{
"query": {
"match_phrase_prefix": {
"title": "the wind"
}
}
}
```
{% include copy-curl.html %}
To pass additional parameters, you can use the expanded syntax:
```json
GET _search
{
"query": {
"match_phrase_prefix": {
"title": {
"query": "the wind",
"analyzer": "stop"
}
}
}
}
```
{% include copy-curl.html %}
## Example
For example, consider an index with the following documents:
```json
PUT testindex/_doc/1
{
"title": "The wind rises"
}
```
{% include copy-curl.html %}
```json
PUT testindex/_doc/2
{
"title": "Gone with the wind"
}
```
{% include copy-curl.html %}
The following `match_phrase_prefix` query searches for the whole word `wind`, followed by a word that starts with `ri`:
```json
GET testindex/_search
{
"query": {
"match_phrase_prefix": {
"title": "wind ri"
}
}
}
```
{% include copy-curl.html %}
The response contains the matching document:
<details closed markdown="block">
<summary>
Response
</summary>
{: .text-delta}
```json
{
"took": 6,
"timed_out": false,
"_shards": {
"total": 1,
"successful": 1,
"skipped": 0,
"failed": 0
},
"hits": {
"total": {
"value": 1,
"relation": "eq"
},
"max_score": 0.92980814,
"hits": [
{
"_index": "testindex",
"_id": "1",
"_score": 0.92980814,
"_source": {
"title": "The wind rises"
}
}
]
}
}
```
</details>
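The matching behavior in this example can be modeled with a short Python sketch. The `phrase_prefix_match` function is a hypothetical helper, and lowercasing plus whitespace splitting is an assumption that stands in for the field's analyzer. The sketch also ignores `slop` and `max_expansions`.

```python
def phrase_prefix_match(query, title):
    """Return True if the title contains the query words in order, with the
    last query word treated as a prefix."""
    # Assumption: lowercasing and whitespace splitting stand in for analysis.
    q = query.lower().split()
    d = title.lower().split()
    if not q:
        return False
    for start in range(len(d) - len(q) + 1):
        window = d[start:start + len(q)]
        # All words but the last must match exactly; the last is a prefix.
        if window[:-1] == q[:-1] and window[-1].startswith(q[-1]):
            return True
    return False


print(phrase_prefix_match("wind ri", "The wind rises"))
print(phrase_prefix_match("wind ri", "Gone with the wind"))
print(phrase_prefix_match("rises wi", "The wind rises"))
```

The last call illustrates the difference from `match_bool_prefix`: the words `rises` and `wind` are both present, but not in the requested order.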
## Parameters
The query accepts the name of the field (`<field>`) as a top-level parameter:
```json
GET _search
{
"query": {
"match_phrase_prefix": {
"<field>": {
"query": "text to search for",
...
}
}
}
}
```
{% include copy-curl.html %}
The `<field>` accepts the following parameters. All parameters except `query` are optional.
Parameter | Data type | Description
:--- | :--- | :---
`query` | String | The query string to use for search. Required.
`analyzer` | String | The [analyzer]({{site.url}}{{site.baseurl}}/analyzers/index/) used to tokenize the query.
`max_expansions` | Positive integer | The maximum number of terms to which the last term in the query (the prefix) can expand. Default is `50`.
`slop` | `0` (default) or a positive integer | Controls the degree to which words in a query can be misordered and still be considered a match. From the [Lucene documentation](https://lucene.apache.org/core/8_9_0/core/org/apache/lucene/search/PhraseQuery.html#getSlop--): "The number of other words permitted between words in query phrase. For example, to switch the order of two words requires two moves (the first move places the words atop one another), so to permit reorderings of phrases, the slop must be at least two. A value of zero requires an exact match."

---
layout: default
title: Match phrase
parent: Full-text queries
grand_parent: Query DSL
nav_order: 30
---
# Match phrase query
Use the `match_phrase` query to match documents that contain an exact phrase in a specified order. You can add flexibility to phrase matching by providing the `slop` parameter.
The `match_phrase` query creates a [phrase query](https://lucene.apache.org/core/8_9_0/core/org/apache/lucene/search/PhraseQuery.html) that matches a sequence of terms.
The following example shows a basic `match_phrase` query:
```json
GET _search
{
"query": {
"match_phrase": {
"title": "the wind"
}
}
}
```
{% include copy-curl.html %}
To pass additional parameters, you can use the expanded syntax:
```json
GET _search
{
"query": {
"match_phrase": {
"title": {
"query": "the wind",
"analyzer": "stop"
}
}
}
}
```
{% include copy-curl.html %}
## Example
For example, consider an index with the following documents:
```json
PUT testindex/_doc/1
{
"title": "The wind rises"
}
```
{% include copy-curl.html %}
```json
PUT testindex/_doc/2
{
"title": "Gone with the wind"
}
```
{% include copy-curl.html %}
The following `match_phrase` query searches for the phrase `wind rises`, where the word `wind` is followed by the word `rises`:
```json
GET testindex/_search
{
"query": {
"match_phrase": {
"title": "wind rises"
}
}
}
```
{% include copy-curl.html %}
The response contains the matching document:
<details closed markdown="block">
<summary>
Response
</summary>
{: .text-delta}
```json
{
"took": 30,
"timed_out": false,
"_shards": {
"total": 1,
"successful": 1,
"skipped": 0,
"failed": 0
},
"hits": {
"total": {
"value": 1,
"relation": "eq"
},
"max_score": 0.92980814,
"hits": [
{
"_index": "testindex",
"_id": "1",
"_score": 0.92980814,
"_source": {
"title": "The wind rises"
}
}
]
}
}
```
</details>
## Analyzer
By default, when you run a query on a `text` field, the search text is analyzed using the index analyzer associated with the field. You can specify a different search analyzer in the `analyzer` parameter. For example, the following query uses the `english` analyzer:
```json
GET testindex/_search
{
"query": {
"match_phrase": {
"title": {
"query": "the winds",
"analyzer": "english"
}
}
}
}
```
{% include copy-curl.html %}
The `english` analyzer removes the stopword `the` and performs stemming, producing the token `wind`. Both documents match this token and are returned in the results:
<details closed markdown="block">
<summary>
Response
</summary>
{: .text-delta}
```json
{
"took": 2,
"timed_out": false,
"_shards": {
"total": 1,
"successful": 1,
"skipped": 0,
"failed": 0
},
"hits": {
"total": {
"value": 2,
"relation": "eq"
},
"max_score": 0.19363807,
"hits": [
{
"_index": "testindex",
"_id": "1",
"_score": 0.19363807,
"_source": {
"title": "The wind rises"
}
},
{
"_index": "testindex",
"_id": "2",
"_score": 0.17225474,
"_source": {
"title": "Gone with the wind"
}
}
]
}
}
```
</details>
## Slop
If you provide a `slop` parameter, the query tolerates reorderings of the search terms. Slop specifies the number of other words permitted between words in a query phrase. For example, in the following query, the search text is reordered compared to the document text:
```json
GET _search
{
"query": {
"match_phrase": {
"title": {
"query": "wind rises the",
"slop": 3
}
}
}
}
```
{% include copy-curl.html %}
The query still returns the matching document:
<details closed markdown="block">
<summary>
Response
</summary>
{: .text-delta}
```json
{
"took": 2,
"timed_out": false,
"_shards": {
"total": 1,
"successful": 1,
"skipped": 0,
"failed": 0
},
"hits": {
"total": {
"value": 1,
"relation": "eq"
},
"max_score": 0.44026947,
"hits": [
{
"_index": "testindex",
"_id": "1",
"_score": 0.44026947,
"_source": {
"title": "The wind rises"
}
}
]
}
}
```
</details>
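The slop needed for a reordered phrase can be estimated with the following Python sketch. It is a simplified model, not the actual phrase-matching implementation: the `required_slop` function is a hypothetical helper, and it assumes that every query term occurs exactly once in the document.

```python
def required_slop(query_terms, doc_terms):
    """Estimate the smallest slop that lets the query phrase match the document."""
    # Assumption: every query term occurs exactly once in the document.
    # For each term, measure how far its document position is shifted
    # from its position in the query.
    shifts = [doc_terms.index(term) - q_pos for q_pos, term in enumerate(query_terms)]
    # The phrase matches when the spread of these shifts fits within the slop.
    return max(shifts) - min(shifts)


doc = ["the", "wind", "rises"]
print(required_slop(["wind", "rises", "the"], doc))
print(required_slop(["rises", "wind"], doc))
print(required_slop(["wind", "rises"], doc))
```

In this model, the reordered query `wind rises the` needs a slop of 3, swapping two adjacent words needs a slop of 2, and an exact phrase needs a slop of 0, which is consistent with the Lucene description quoted in the `slop` parameter.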
## Empty query
For information about handling a query from which the analyzer removes all tokens, see the corresponding [match query section]({{site.url}}{{site.baseurl}}/query-dsl/full-text/match/#empty-query).
## Parameters
The query accepts the name of the field (`<field>`) as a top-level parameter:
```json
GET _search
{
"query": {
"match_phrase": {
"<field>": {
"query": "text to search for",
...
}
}
}
}
```
{% include copy-curl.html %}
The `<field>` accepts the following parameters. All parameters except `query` are optional.
Parameter | Data type | Description
:--- | :--- | :---
`query` | String | The query string to use for search. Required.
`analyzer` | String | The [analyzer]({{site.url}}{{site.baseurl}}/analyzers/index/) used to tokenize the query string text. Default is the index-time analyzer specified for the `<field>`. If no analyzer is specified for the `<field>`, the `analyzer` is the default analyzer for the index.
`slop` | `0` (default) or a positive integer | Controls the degree to which words in a query can be misordered and still be considered a match. From the [Lucene documentation](https://lucene.apache.org/core/8_9_0/core/org/apache/lucene/search/PhraseQuery.html#getSlop--): "The number of other words permitted between words in query phrase. For example, to switch the order of two words requires two moves (the first move places the words atop one another), so to permit reorderings of phrases, the slop must be at least two. A value of zero requires an exact match."
`zero_terms_query` | String | In some cases, the analyzer removes all terms from a query string. For example, the `stop` analyzer removes all terms from the string `an but this`. In those cases, `zero_terms_query` specifies whether to match no documents (`none`) or all documents (`all`). Valid values are `none` and `all`. Default is `none`.

---
layout: default
title: Match
parent: Full-text queries
grand_parent: Query DSL
nav_order: 10
---
# Match query
Use the `match` query for full-text search on a specific document field. If you run a `match` query on a [`text`]({{site.url}}{{site.baseurl}}/field-types/supported-field-types/text/) field, the `match` query [analyzes]({{site.url}}{{site.baseurl}}/analyzers/index/) the provided search string and returns documents that match any of the string's terms. If you run a `match` query on an exact-value field, it returns documents that match the exact value. The preferred way to search exact-value fields is to use a filter because, unlike a query, a filter is cached.
The following example shows a basic `match` query for the word `wind` in the `title`:
```json
GET _search
{
"query": {
"match": {
"title": "wind"
}
}
}
```
{% include copy-curl.html %}
To pass additional parameters, you can use the expanded syntax:
```json
GET _search
{
"query": {
"match": {
"title": {
"query": "wind",
"analyzer": "stop"
}
}
}
}
```
{% include copy-curl.html %}
## Examples
In the following examples, you'll use the index that contains the following documents:
```json
PUT testindex/_doc/1
{
"title": "Let the wind rise"
}
```
{% include copy-curl.html %}
```json
PUT testindex/_doc/2
{
"title": "Gone with the wind"
}
```
{% include copy-curl.html %}
```json
PUT testindex/_doc/3
{
"title": "Rise is gone"
}
```
{% include copy-curl.html %}
## Operator
If a `match` query is run on a `text` field, the text is analyzed with the analyzer specified in the `analyzer` parameter. Then the resulting tokens are combined into a Boolean query using the operator specified in the `operator` parameter. The default operator is `OR`, so the query `wind rise` is changed into `wind OR rise`. In this example, this query returns documents 1--3 because each document has a term that matches the query. To specify the `and` operator, use the following query:
```json
GET testindex/_search
{
"query": {
"match": {
"title": {
"query": "wind rise",
"operator": "and"
}
}
}
}
```
{% include copy-curl.html %}
The query is constructed as `wind AND rise` and returns document 1 as the matching document:
<details closed markdown="block">
<summary>
Response
</summary>
{: .text-delta}
```json
{
"took": 17,
"timed_out": false,
"_shards": {
"total": 1,
"successful": 1,
"skipped": 0,
"failed": 0
},
"hits": {
"total": {
"value": 1,
"relation": "eq"
},
"max_score": 1.2667098,
"hits": [
{
"_index": "testindex",
"_id": "1",
"_score": 1.2667098,
"_source": {
"title": "Let the wind rise"
}
}
]
}
}
```
</details>
### Minimum should match
You can control the minimum number of terms that a document must match to be returned in the results by specifying the [`minimum_should_match`]({{site.url}}{{site.baseurl}}/query-dsl/minimum-should-match/) parameter:
```json
GET testindex/_search
{
"query": {
"match": {
"title": {
"query": "wind rise",
"operator": "or",
"minimum_should_match": 2
}
}
}
}
```
{% include copy-curl.html %}
Now documents are required to match both terms, so only document 1 is returned (this is equivalent to the `and` operator):
<details closed markdown="block">
<summary>
Response
</summary>
{: .text-delta}
```json
{
"took": 23,
"timed_out": false,
"_shards": {
"total": 1,
"successful": 1,
"skipped": 0,
"failed": 0
},
"hits": {
"total": {
"value": 1,
"relation": "eq"
},
"max_score": 1.2667098,
"hits": [
{
"_index": "testindex",
"_id": "1",
"_score": 1.2667098,
"_source": {
"title": "Let the wind rise"
}
}
]
}
}
```
</details>
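The interaction between `operator` and `minimum_should_match` can be modeled with a short Python sketch over the three example documents. The `match` function is a hypothetical helper that handles only positive integer values of `minimum_should_match`, and lowercasing plus whitespace splitting is an assumption that stands in for the `standard` analyzer.

```python
docs = {
    1: "Let the wind rise",
    2: "Gone with the wind",
    3: "Rise is gone",
}


def match(query, operator="or", minimum_should_match=1):
    """Return the IDs of documents that contain enough of the query terms."""
    # Assumption: lowercasing and whitespace splitting stand in for analysis.
    terms = query.lower().split()
    # With the and operator, every term is required.
    required = len(terms) if operator == "and" else minimum_should_match
    hits = []
    for doc_id, title in docs.items():
        tokens = set(title.lower().split())
        matched = sum(term in tokens for term in terms)
        if matched >= required:
            hits.append(doc_id)
    return hits


print(match("wind rise"))
print(match("wind rise", operator="and"))
print(match("wind rise", operator="or", minimum_should_match=2))
```

For a two-term query, requiring two matching terms under `or` selects the same documents as the `and` operator.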
## Analyzer
Because in this example you didn't explicitly specify the analyzer, the default `standard` analyzer is used. The default analyzer does not perform stemming, so if you run the query `the wind rises` with the `and` operator, you receive no results because the token `rises` does not match the token `rise`. To change the search analyzer, specify it in the `analyzer` parameter. For example, the following query uses the `english` analyzer:
```json
GET testindex/_search
{
"query": {
"match": {
"title": {
"query": "the wind rises",
"operator": "and",
"analyzer": "english"
}
}
}
}
```
{% include copy-curl.html %}
The `english` analyzer removes the stopword `the` and performs stemming, producing the tokens `wind` and `rise`. Both tokens match document 1, which is returned in the results:
<details closed markdown="block">
<summary>
Response
</summary>
{: .text-delta}
```json
{
"took": 19,
"timed_out": false,
"_shards": {
"total": 1,
"successful": 1,
"skipped": 0,
"failed": 0
},
"hits": {
"total": {
"value": 1,
"relation": "eq"
},
"max_score": 1.2667098,
"hits": [
{
"_index": "testindex",
"_id": "1",
"_score": 1.2667098,
"_source": {
"title": "Let the wind rise"
}
}
]
}
}
```
</details>
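To check which tokens an analyzer produces for a given query text, you can run the Analyze API on that text. The following request is a minimal sketch that assumes the `testindex` index from the preceding examples:

```json
GET testindex/_analyze
{
  "analyzer": "english",
  "text": "the wind rises"
}
```
{% include copy-curl.html %}

Given the analyzer behavior described in this section, the response should list the tokens `wind` and `rise`.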
## Empty query
In some cases, an analyzer might remove all tokens from a query. For example, the `english` analyzer removes stop words, so in a query `and OR or`, all tokens are removed. To check the analyzer behavior, you can use the [Analyze API]({{site.url}}{{site.baseurl}}/api-reference/analyze-apis/#apply-a-built-in-analyzer):
```json
GET testindex/_analyze
{
"analyzer" : "english",
"text" : "and OR or"
}
```
{% include copy-curl.html %}
As expected, the query produces no tokens:
```json
{
"tokens": []
}
```
You can specify the behavior for an empty query in the `zero_terms_query` parameter. Setting `zero_terms_query` to `all` returns all documents in the index and setting it to `none` returns no documents:
```json
GET testindex/_search
{
"query": {
"match": {
"title": {
"query": "and OR or",
"analyzer" : "english",
"zero_terms_query": "all"
}
}
}
}
```
{% include copy-curl.html %}
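For comparison, the following sketch sends the same request with `zero_terms_query` set to `none`, in which case the query matches no documents:

```json
GET testindex/_search
{
  "query": {
    "match": {
      "title": {
        "query": "and OR or",
        "analyzer": "english",
        "zero_terms_query": "none"
      }
    }
  }
}
```
{% include copy-curl.html %}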
## Fuzziness
To account for typos, you can specify `fuzziness` for your query as either of the following:
- An integer that specifies the maximum allowed [Levenshtein distance](https://en.wikipedia.org/wiki/Levenshtein_distance) for this edit.
- `AUTO`:
  - Strings of 0–2 characters must match exactly.
  - Strings of 3–5 characters allow 1 edit.
  - Strings longer than 5 characters allow 2 edits.
Setting `fuzziness` to the default `AUTO` value works best in most cases:
```json
GET testindex/_search
{
"query": {
"match": {
"title": {
"query": "wnid",
"fuzziness": "AUTO"
}
}
}
}
```
{% include copy-curl.html %}
The token `wnid` matches `wind` and the query returns documents 1 and 2:
<details closed markdown="block">
<summary>
Response
</summary>
{: .text-delta}
```json
{
"took": 31,
"timed_out": false,
"_shards": {
"total": 1,
"successful": 1,
"skipped": 0,
"failed": 0
},
"hits": {
"total": {
"value": 2,
"relation": "eq"
},
"max_score": 0.47501624,
"hits": [
{
"_index": "testindex",
"_id": "1",
"_score": 0.47501624,
"_source": {
"title": "Let the wind rise"
}
},
{
"_index": "testindex",
"_id": "2",
"_score": 0.47501624,
"_source": {
"title": "Gone with the wind"
}
}
]
}
}
```
</details>
### Prefix length
Misspellings rarely occur at the beginning of words. Thus, you can specify the number of leading characters that must match exactly for a document to be returned in the results. For example, you can change the preceding query to include a `prefix_length`:
```json
GET testindex/_search
{
"query": {
"match": {
"title": {
"query": "wnid",
"fuzziness": "AUTO",
"prefix_length": 2
}
}
}
}
```
{% include copy-curl.html %}
The preceding query returns no results. If you change the `prefix_length` to 1, documents 1 and 2 are returned because the first letter of the token `wnid` is not misspelled.
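The following request is a sketch of the same query with `prefix_length` set to `1`. Only the first character (`w`) must match exactly, so the fuzzy match on `wind` still applies:

```json
GET testindex/_search
{
  "query": {
    "match": {
      "title": {
        "query": "wnid",
        "fuzziness": "AUTO",
        "prefix_length": 1
      }
    }
  }
}
```
{% include copy-curl.html %}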
### Transpositions
In the preceding example, the word `wnid` contained a transposition (`in` was changed to `ni`). By default, transpositions are allowed in fuzzy matching, but you can disallow them by setting `fuzzy_transpositions` to `false`:
```json
GET testindex/_search
{
"query": {
"match": {
"title": {
"query": "wnid",
"fuzziness": "AUTO",
"fuzzy_transpositions": false
}
}
}
}
```
{% include copy-curl.html %}
Now the query returns no results.
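With transpositions disabled, changing `wnid` to `wind` takes two single-character edits instead of one, which exceeds the one edit that `AUTO` allows for a four-character term. As a sketch (this variation is an illustration, not part of the original example set), raising `fuzziness` to `2` should bring `wind` back within the allowed distance:

```json
GET testindex/_search
{
  "query": {
    "match": {
      "title": {
        "query": "wnid",
        "fuzziness": 2,
        "fuzzy_transpositions": false
      }
    }
  }
}
```
{% include copy-curl.html %}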
## Synonyms
If you use a `synonym_graph` filter and `auto_generate_synonyms_phrase_query` is set to `true` (default), OpenSearch parses the query into terms and then combines the terms to generate a [phrase query](https://lucene.apache.org/core/8_9_0/core/org/apache/lucene/search/PhraseQuery.html) for multi-term synonyms. For example, if you specify `ba,batting average` as synonyms and search for `ba`, OpenSearch searches for `ba OR "batting average"`.
To match multi-term synonyms with conjunctions, set `auto_generate_synonyms_phrase_query` to `false`:
```json
GET /testindex/_search
{
"query": {
"match": {
"text": {
"query": "good ba",
"auto_generate_synonyms_phrase_query": false
}
}
}
}
```
{% include copy-curl.html %}
The query produced is `ba OR (batting AND average)`.
## Parameters
The query accepts the name of the field (`<field>`) as a top-level parameter:
```json
GET _search
{
"query": {
"match": {
"<field>": {
"query": "text to search for",
...
}
}
}
}
```
{% include copy-curl.html %}
The `<field>` accepts the following parameters. All parameters except `query` are optional.
Parameter | Data type | Description
:--- | :--- | :---
`query` | String | The query string to use for search. Required.
`auto_generate_synonyms_phrase_query` | Boolean | Specifies whether to create a [match phrase query]({{site.url}}{{site.baseurl}}/query-dsl/full-text/match-phrase/) automatically for multi-term synonyms. For example, if you specify `ba,batting average` as synonyms and search for `ba`, OpenSearch searches for `ba OR "batting average"` (if this option is `true`) or `ba OR (batting AND average)` (if this option is `false`). Default is `true`.
`analyzer` | String | The [analyzer]({{site.url}}{{site.baseurl}}/analyzers/index/) used to tokenize the query string text. Default is the index-time analyzer specified for the `default_field`. If no analyzer is specified for the `default_field`, the `analyzer` is the default analyzer for the index.
`boost` | Floating-point | Boosts the clause by the given multiplier. Useful for weighing clauses in compound queries. Values in the [0, 1) range decrease relevance, and values greater than 1 increase relevance. Default is `1`.
`enable_position_increments` | Boolean | When `true`, resulting queries are aware of position increments. This setting is useful when the removal of stop words leaves an unwanted "gap" between terms. Default is `true`.
`fuzziness` | String | The number of character edits (insert, delete, substitute) that it takes to change one word to another when determining whether a term matched a value. For example, the distance between `wined` and `wind` is 1. Valid values are non-negative integers or `AUTO`. The default, `AUTO`, chooses a value based on the length of each term and is a good choice for most use cases.
`fuzzy_rewrite` | String | Determines how OpenSearch rewrites the query. Valid values are `constant_score`, `scoring_boolean`, `constant_score_boolean`, `top_terms_N`, `top_terms_boost_N`, and `top_terms_blended_freqs_N`. If the `fuzziness` parameter is not `0`, the query uses a `fuzzy_rewrite` method of `top_terms_blended_freqs_${max_expansions}` by default. Default is `constant_score`.
`fuzzy_transpositions` | Boolean | Setting `fuzzy_transpositions` to `true` (default) adds swaps of adjacent characters to the insert, delete, and substitute operations of the `fuzziness` option. For example, the distance between `wind` and `wnid` is 1 if `fuzzy_transpositions` is true (swap "n" and "i") and 2 if it is false (delete "n", insert "n"). If `fuzzy_transpositions` is false, `rewind` and `wnid` have the same distance (2) from `wind`, despite the more human-centric opinion that `wnid` is an obvious typo. The default is a good choice for most use cases.
`lenient` | Boolean | Setting `lenient` to `true` ignores data type mismatches between the query and the document field. For example, a query string of `"8.2"` could match a field of type `float`. Default is `false`.
`max_expansions` | Positive integer | The maximum number of terms to which the query can expand. Fuzzy queries “expand to” a number of matching terms that are within the distance specified in `fuzziness`. Then OpenSearch tries to match those terms. Default is `50`.
`minimum_should_match` | Positive or negative integer, positive or negative percentage, combination | If the query string contains multiple search terms and you use the `or` operator, the number of terms that need to match for the document to be considered a match. For example, if `minimum_should_match` is 2, `wind often rising` does not match `The Wind Rises.` If `minimum_should_match` is `1`, it matches. For details, see [Minimum should match]({{site.url}}{{site.baseurl}}/query-dsl/minimum-should-match/).
`operator` | String | If the query string contains multiple search terms, whether all terms need to match (`AND`) or only one term needs to match (`OR`) for a document to be considered a match. Valid values are:<br>- `OR`: The string `to be` is interpreted as `to OR be`<br>- `AND`: The string `to be` is interpreted as `to AND be`<br> Default is `OR`.
`prefix_length` | Non-negative integer | The number of leading characters that are not considered in fuzziness. Default is `0`.
`zero_terms_query` | String | In some cases, the analyzer removes all terms from a query string. For example, the `stop` analyzer removes all terms from the string `an but this`. In those cases, `zero_terms_query` specifies whether to match no documents (`none`) or all documents (`all`). Valid values are `none` and `all`. Default is `none`.

---
layout: default
title: Multi-match
parent: Full-text queries
grand_parent: Query DSL
nav_order: 50
---
# Multi-match queries
A multi-match operation functions similarly to the [match]({{site.url}}{{site.baseurl}}/query-dsl/full-text/match/) operation. You can use a `multi_match` query to search multiple fields.
The `^` character boosts certain fields. Boosts are multipliers that weigh matches in one field more heavily than matches in other fields. In the following example, a match for "wind" in the title field influences `_score` four times as much as a match in the plot field:
```json
GET _search
{
"query": {
"multi_match": {
"query": "wind",
"fields": ["title^4", "plot"]
}
}
}
```
{% include copy-curl.html %}
The result is that films like *The Wind Rises* and *Gone with the Wind* are near the top of the search results, and films like *Twister*, which presumably have "wind" in their plot summaries, are near the bottom.
You can use wildcards in the field name. For example, the following query will search the `speaker` field and all fields that start with `play_`, for example, `play_name` or `play_title`:
```json
GET _search
{
"query": {
"multi_match": {
"query": "hamlet",
"fields": ["speaker", "play_*"]
}
}
}
```
{% include copy-curl.html %}
If you don't provide the `fields` parameter, the `multi_match` query searches the fields specified in the `index.query.default_field` setting, which defaults to `*`. The default behavior is to extract all fields in the mapping that are eligible for [term-level queries]({{site.url}}{{site.baseurl}}/query-dsl/term/index/), filter the metadata fields, and combine all extracted fields to build a query.
The maximum number of clauses in a query is defined in the `indices.query.bool.max_clause_count` setting, which defaults to 1,024.
{: .note}
## Multi-match query types
OpenSearch supports the following multi-match query types, which differ in the way the query is executed internally:
- [`best_fields`](#best-fields) (default): Returns documents that match any field. Uses the `_score` of the best-matching field.
- [`most_fields`](#most-fields): Returns documents that match any field. Uses a combined score of each matching field.
- [`cross_fields`](#cross-fields): Treats all fields as if they were one field. Processes fields with the same `analyzer` and matches words in any field.
- [`phrase`](#phrase): Runs a `match_phrase` query on each field. Uses the `_score` of the best-matching field.
- [`phrase_prefix`](#phrase-prefix): Runs a `match_phrase_prefix` query on each field. Uses the `_score` of the best-matching field.
- [`bool_prefix`](#boolean-prefix): Runs a `match_bool_prefix` query on each field. Uses a combined score of each matched field.
## Best fields
If you're searching for two words that specify a concept, you want the results where the two words are next to each other to score higher.
For example, consider an index that contains the following scientific articles:
```json
PUT /articles/_doc/1
{
"title": "Aurora borealis",
"description": "Northern lights, or aurora borealis, explained"
}
```
{% include copy-curl.html %}
```json
PUT /articles/_doc/2
{
"title": "Sun deprivation in the Northern countries",
"description": "Using fluorescent lights for therapy"
}
```
{% include copy-curl.html %}
You can search for articles containing `northern lights` in the title or description:
```json
GET articles/_search
{
"query": {
"multi_match" : {
"query": "northern lights",
"type": "best_fields",
"fields": [ "title", "description" ],
"tie_breaker": 0.3
}
}
}
```
{% include copy-curl.html %}
The preceding query is executed as the following [`dis_max`]({{site.url}}{{site.baseurl}}/query-dsl/compound/disjunction-max/) query with a `match` query for each field:
```json
GET /articles/_search
{
"query": {
"dis_max": {
"queries": [
{ "match": { "title": "northern lights" }},
{ "match": { "description": "northern lights" }}
],
"tie_breaker": 0.3
}
}
}
```
The results contain both documents, but document 1 is scored higher because both words are in the `description` field:
```json
{
"took": 30,
"timed_out": false,
"_shards": {
"total": 1,
"successful": 1,
"skipped": 0,
"failed": 0
},
"hits": {
"total": {
"value": 2,
"relation": "eq"
},
"max_score": 0.84407747,
"hits": [
{
"_index": "articles",
"_id": "1",
"_score": 0.84407747,
"_source": {
"title": "Aurora borealis",
"description": "Northern lights, or aurora borealis, explained"
}
},
{
"_index": "articles",
"_id": "2",
"_score": 0.6322521,
"_source": {
"title": "Sun deprivation in the Northern countries",
"description": "Using fluorescent lights for therapy"
}
}
]
}
}
```
The `best_fields` query uses the score of the best-matching field. If you specify a `tie_breaker`, the score is calculated using the following algorithm:
Take the score of the best-matching field and add (`tie_breaker` * `_score`) for all other matching fields.
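As an illustration of this formula, consider its two extremes. With `tie_breaker` set to `0`, only the best-matching field contributes to the score. With `tie_breaker` set to `1.0`, the scores of all other matching fields are added in full. The following sketch reuses the `articles` query with the upper extreme; the value is chosen for illustration only:

```json
GET articles/_search
{
  "query": {
    "multi_match": {
      "query": "northern lights",
      "type": "best_fields",
      "fields": [ "title", "description" ],
      "tie_breaker": 1.0
    }
  }
}
```
{% include copy-curl.html %}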
## Most fields
Use the `most_fields` query for multiple fields that contain the same text that is analyzed in different ways. For example, the original field may contain text analyzed with the `standard` analyzer and another field may contain the same text analyzed with the `english` analyzer, which performs stemming:
```json
PUT /articles
{
"mappings": {
"properties": {
"title": {
"type": "text",
"fields": {
"english": {
"type": "text",
"analyzer": "english"
}
}
}
}
}
}
```
{% include copy-curl.html %}
Consider the following two documents that are indexed in the `articles` index:
```json
PUT /articles/_doc/1
{
"title": "Buttered toasts"
}
```
{% include copy-curl.html %}
```json
PUT /articles/_doc/2
{
"title": "Buttering a toast"
}
```
{% include copy-curl.html %}
The `standard` analyzer analyzes the title `Buttered toasts` into [`buttered`, `toasts`] and the title `Buttering a toast` into [`buttering`, `a`, `toast`]. On the other hand, the `english` analyzer produces the same token list [`butter`, `toast`] for both titles because of stemming.
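To verify the tokens that a subfield produces, you can pass the `field` parameter to the Analyze API so that the analyzer is taken from the mapping. The following request is a sketch that assumes the `articles` mapping shown in this section:

```json
GET articles/_analyze
{
  "field": "title.english",
  "text": "Buttering a toast"
}
```
{% include copy-curl.html %}

Changing `field` to `title` shows the output of the `standard` analyzer instead.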
You can use the `most_fields` query in order to return as many documents as possible:
```json
GET /articles/_search
{
"query": {
"multi_match": {
"query": "buttered toasts",
"fields": [
"title",
"title.english"
],
"type": "most_fields"
}
}
}
```
{% include copy-curl.html %}
The preceding query is executed as the following Boolean query:
```json
GET articles/_search
{
"query": {
"bool": {
"should": [
{ "match": { "title": "buttered toasts" }},
{ "match": { "title.english": "buttered toasts" }}
]
}
}
}
```
To calculate the relevance score, a document's scores for all `match` clauses are added together and then the result is divided by the number of `match` clauses.
Including the `title.english` field retrieves the second document that matches the stemmed tokens:
```json
{
"took": 9,
"timed_out": false,
"_shards": {
"total": 1,
"successful": 1,
"skipped": 0,
"failed": 0
},
"hits": {
"total": {
"value": 2,
"relation": "eq"
},
"max_score": 1.4418206,
"hits": [
{
"_index": "articles",
"_id": "1",
"_score": 1.4418206,
"_source": {
"title": "Buttered toasts"
}
},
{
"_index": "articles",
"_id": "2",
"_score": 0.09304003,
"_source": {
"title": "Buttering a toast"
}
}
]
}
}
```
Because both `title` and `title.english` fields match for the first document, it has a higher relevance score.
## Operator and minimum should match
The `best_fields` and `most_fields` queries generate one `match` query per field. Thus, the `minimum_should_match` and `operator` parameters are applied to each field separately, which is normally not the desired behavior.
For example, consider a `customers` index with the following documents:
```json
PUT customers/_doc/1
{
"first_name": "John",
"last_name": "Doe"
}
```
{% include copy-curl.html %}
```json
PUT customers/_doc/2
{
"first_name": "Jane",
"last_name": "Doe"
}
```
{% include copy-curl.html %}
If you're searching for `John Doe` in the `customers` index, you might construct the following query:
```json
GET customers/_search
{
"query": {
"multi_match" : {
"query": "John Doe",
"type": "best_fields",
"fields": [ "first_name", "last_name" ],
"operator": "and"
}
}
}
```
{% include copy-curl.html %}
The intent of the `and` operator in this query is to find a document that matches `John` and `Doe`. However, the query does not return any results. You can learn how the query is executed by running the Validate API:
```json
GET customers/_validate/query?explain
{
"query": {
"multi_match" : {
"query": "John Doe",
"type": "best_fields",
"fields": [ "first_name", "last_name" ],
"operator": "and"
}
}
}
```
{% include copy-curl.html %}
From the response, you can see that the query is trying to match both `John` and `Doe` to either the `first_name` or `last_name` field:
```json
{
"_shards": {
"total": 1,
"successful": 1,
"failed": 0
},
"valid": true,
"explanations": [
{
"index": "customers",
"valid": true,
"explanation": "((+first_name:john +first_name:doe) | (+last_name:john +last_name:doe))"
}
]
}
```
Because neither field contains both words, no results are returned.
A better alternative for searching across fields is to use the [`cross_fields`](#cross-fields) query. Unlike the field-centric `best_fields` and `most_fields` queries, the `cross_fields` query is term-centric.
## Cross fields
Use the `cross_fields` query to search for data across multiple fields. For example, if an index contains customer data, the first name and last name of the customer reside in different fields. Yet, when you search for `John Doe`, you want to receive documents in which `John` is in the `first_name` field and `Doe` is in the `last_name` field.
The `most_fields` query does not work in this case because of the following problems:
- The [`operator` and `minimum_should_match`](#operator-and-minimum-should-match) parameters are applied on a field basis instead of on a term basis.
- Term frequencies in the `first_name` and `last_name` fields can lead to unexpected results. For example, if someone's first name happens to be `Doe`, a document with this name will be presumed a better match because this first name will not appear in any other documents.
The `cross_fields` query analyzes the query string into individual terms and then searches for each of the terms in any of the fields, as if they were one field.
The following is the `cross_fields` query for `John Doe`:
```json
GET /customers/_search
{
"query": {
"multi_match" : {
"query": "John Doe",
"type": "cross_fields",
"fields": [ "first_name", "last_name" ],
"operator": "and"
}
}
}
```
{% include copy-curl.html %}
The response contains the only document in which both `John` and `Doe` are present:
```json
{
"took": 19,
"timed_out": false,
"_shards": {
"total": 1,
"successful": 1,
"skipped": 0,
"failed": 0
},
"hits": {
"total": {
"value": 1,
"relation": "eq"
},
"max_score": 0.8754687,
"hits": [
{
"_index": "customers",
"_id": "1",
"_score": 0.8754687,
"_source": {
"first_name": "John",
"last_name": "Doe"
}
}
]
}
}
```
You can use the Validate API operation to gain insight into how the preceding query is executed:
```json
GET /customers/_validate/query?explain
{
"query": {
"multi_match" : {
"query": "John Doe",
"type": "cross_fields",
"fields": [ "first_name", "last_name" ],
"operator": "and"
}
}
}
```
{% include copy-curl.html %}
From the response, you can see that the query is searching for all terms in at least one field:
```json
{
"_shards": {
"total": 1,
"successful": 1,
"failed": 0
},
"valid": true,
"explanations": [
{
"index": "customers",
"valid": true,
"explanation": "+blended(terms:[last_name:john, first_name:john]) +blended(terms:[last_name:doe, first_name:doe])"
}
]
}
```
Thus, blending the term frequencies for all fields solves the problem of differing term frequencies by correcting for the differences.
The `cross_fields` query is usually only useful on short string fields with a `boost` of 1. In other cases, the score does not produce a meaningful blend of term statistics because of the way boosts, term frequencies, and length normalization contribute to the score.
{: .note}
The `fuzziness` parameter is not supported for `cross_fields` queries.
{: .note}
### Analysis
The `cross_fields` query only works as a term-centric query on fields with the same analyzer. Fields with the same analyzer are grouped together and these groups are combined with a Boolean query.
For example, consider an index where the `first_name` and `last_name` fields are analyzed with the default `standard` analyzer and their `.edge` subfields are analyzed with an edge n-gram analyzer:
<details closed markdown="block">
<summary>
Request
</summary>
{: .text-delta}
```json
PUT customers
{
"settings": {
"analysis": {
"analyzer": {
"my_analyzer": {
"tokenizer": "my_tokenizer"
}
},
"tokenizer": {
"my_tokenizer": {
"type": "edge_ngram",
"min_gram": 2,
"max_gram": 10
}
}
}
},
"mappings": {
"properties": {
"first_name": {
"type": "text",
"fields": {
"edge": {
"type": "text",
"analyzer": "my_analyzer"
}
}
},
"last_name": {
"type": "text",
"fields": {
"edge": {
"type": "text",
"analyzer": "my_analyzer"
}
}
}
}
}
}
```
{% include copy-curl.html %}
</details>
You index one document in the `customers` index:
```json
PUT /customers/_doc/1
{
"first_name": "John",
"last_name": "Doe"
}
```
{% include copy-curl.html %}
You can use a `cross_fields` query to search across the fields for `John`:
```json
GET /customers/_search
{
"query": {
"multi_match" : {
"query": "John",
"type": "cross_fields",
"fields": [
"first_name", "first_name.edge",
"last_name", "last_name.edge"
]
}
}
}
```
{% include copy-curl.html %}
To see how the query is executed, you can run the Validate API:
```json
GET /customers/_validate/query?explain
{
"query": {
"multi_match" : {
"query": "John",
"type": "cross_fields",
"fields": [
"first_name", "first_name.edge",
"last_name", "last_name.edge"
]
}
}
}
```
{% include copy-curl.html %}
The response shows that the `last_name` and `first_name` fields are grouped together and treated as a single field. Similarly, the `last_name.edge` and `first_name.edge` fields are grouped together and treated as a single field:
```json
{
"_shards": {
"total": 1,
"successful": 1,
"failed": 0
},
"valid": true,
"explanations": [
{
"index": "customers",
"valid": true,
"explanation": "(blended(terms:[last_name:john, first_name:john]) | (blended(terms:[last_name.edge:Jo, first_name.edge:Jo]) blended(terms:[last_name.edge:Joh, first_name.edge:Joh]) blended(terms:[last_name.edge:John, first_name.edge:John])))"
}
]
}
```
Using the `operator` or `minimum_should_match` parameters with multiple field groups like the preceding ones can lead to the problem described in the [previous section](#operator-and-minimum-should-match). To avoid it, you can rewrite the previous query as two `cross_fields` subqueries combined with a Boolean query and apply the `minimum_should_match` to one of the subqueries:
```json
GET /customers/_search
{
"query": {
"bool": {
"should": [
{
"multi_match": {
"query": "John Doe",
"type": "cross_fields",
"fields": [
"first_name",
"last_name"
],
"minimum_should_match": "1"
}
},
{
"multi_match": {
"query": "John Doe",
"type": "cross_fields",
"fields": [
"first_name.edge",
"last_name.edge"
]
}
}
]
}
}
}
```
{% include copy-curl.html %}
To create one group for all fields, specify an analyzer in your query:
```json
GET customers/_search
{
"query": {
"multi_match" : {
"query": "John Doe",
"type": "cross_fields",
"analyzer": "standard",
"fields": [ "first_name", "last_name", "*.edge" ]
}
}
}
```
{% include copy-curl.html %}
Running the Validate API on the previous query shows how the query is executed:
```json
{
"_shards": {
"total": 1,
"successful": 1,
"failed": 0
},
"valid": true,
"explanations": [
{
"index": "customers",
"valid": true,
"explanation": "blended(terms:[last_name.edge:john, last_name:john, first_name:john, first_name.edge:john]) blended(terms:[last_name.edge:doe, last_name:doe, first_name:doe, first_name.edge:doe])"
}
]
}
```
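The preceding response can be obtained with a Validate API request along the following lines. This sketch wraps the previous `cross_fields` query in the same `_validate/query?explain` call used earlier in this section:

```json
GET customers/_validate/query?explain
{
  "query": {
    "multi_match": {
      "query": "John Doe",
      "type": "cross_fields",
      "analyzer": "standard",
      "fields": [ "first_name", "last_name", "*.edge" ]
    }
  }
}
```
{% include copy-curl.html %}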
## Phrase
The `phrase` query behaves similarly to the [`best_fields`](#best-fields) query but uses a `match_phrase` query instead of a `match` query.
The following is an example `phrase` query for the index described in the [`best_fields`](#best-fields) section:
```json
GET articles/_search
{
"query": {
"multi_match" : {
"query": "northern lights",
"type": "phrase",
"fields": [ "title", "description" ]
}
}
}
```
{% include copy-curl.html %}
The preceding query is executed as the following [`dis_max`]({{site.url}}{{site.baseurl}}/query-dsl/compound/disjunction-max/) query with a `match_phrase` query for each field:
```json
GET articles/_search
{
"query": {
"dis_max": {
"queries": [
{ "match_phrase": { "title": "northern lights" }},
{ "match_phrase": { "description": "northern lights" }}
]
}
}
}
```
Because by default a `phrase` query matches text only when the terms appear in the same order, only document 1 is returned in the results:
<details closed markdown="block">
<summary>
Response
</summary>
{: .text-delta}
```json
{
"took": 3,
"timed_out": false,
"_shards": {
"total": 1,
"successful": 1,
"skipped": 0,
"failed": 0
},
"hits": {
"total": {
"value": 1,
"relation": "eq"
},
"max_score": 0.84407747,
"hits": [
{
"_index": "articles",
"_id": "1",
"_score": 0.84407747,
"_source": {
"title": "Aurora borealis",
"description": "Northern lights, or aurora borealis, explained"
}
}
]
}
}
```
</details>
You can use the `slop` parameter to allow other words between words in the query phrase. For example, the following query accepts text as a match if up to two words are between `fluorescent` and `therapy`:
```json
GET articles/_search
{
"query": {
"multi_match" : {
"query": "fluorescent therapy",
"type": "phrase",
"fields": [ "title", "description" ],
"slop": 2
}
}
}
```
{% include copy-curl.html %}
The response contains document 2:
<details closed markdown="block">
<summary>
Response
</summary>
{: .text-delta}
```json
{
"took": 3,
"timed_out": false,
"_shards": {
"total": 1,
"successful": 1,
"skipped": 0,
"failed": 0
},
"hits": {
"total": {
"value": 1,
"relation": "eq"
},
"max_score": 0.7003825,
"hits": [
{
"_index": "articles",
"_id": "2",
"_score": 0.7003825,
"_source": {
"title": "Sun deprivation in the Northern countries",
"description": "Using fluorescent lights for therapy"
}
}
]
}
}
```
</details>
For `slop` values less than 2, no documents are returned.
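For example, the following sketch lowers `slop` to `1`. Two words (`lights` and `for`) separate `fluorescent` and `therapy` in document 2, so this request returns no documents:

```json
GET articles/_search
{
  "query": {
    "multi_match": {
      "query": "fluorescent therapy",
      "type": "phrase",
      "fields": [ "title", "description" ],
      "slop": 1
    }
  }
}
```
{% include copy-curl.html %}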
The `fuzziness` parameter is not supported for `phrase` queries.
{: .note}
## Phrase prefix
The `phrase_prefix` query behaves similarly to the [`phrase`](#phrase) query but uses a `match_phrase_prefix` query instead of a `match_phrase` query.
The following is an example `phrase_prefix` query for the index described in the [`best_fields`](#best-fields) section:
```json
GET articles/_search
{
"query": {
"multi_match" : {
"query": "northern light",
"type": "phrase_prefix",
"fields": [ "title", "description" ]
}
}
}
```
{% include copy-curl.html %}
The preceding query is executed as the following [`dis_max`]({{site.url}}{{site.baseurl}}/query-dsl/compound/disjunction-max/) query with a `match_phrase_prefix` query for each field:
```json
GET articles/_search
{
"query": {
"dis_max": {
"queries": [
{ "match_phrase_prefix": { "title": "northern light" }},
{ "match_phrase_prefix": { "description": "northern light" }}
]
}
}
}
```
You can use the `slop` parameter to allow other words between words in the query phrase.
The `fuzziness` parameter is not supported for `phrase_prefix` queries.
{: .note}
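As a sketch of `slop` used with a `phrase_prefix` query (the query text is chosen for illustration), the following request treats `ther` as a prefix and allows up to two intervening words, so it should match `fluorescent lights for therapy` in document 2:

```json
GET articles/_search
{
  "query": {
    "multi_match": {
      "query": "fluorescent ther",
      "type": "phrase_prefix",
      "fields": [ "title", "description" ],
      "slop": 2
    }
  }
}
```
{% include copy-curl.html %}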
## Boolean prefix
The `bool_prefix` query scores documents similarly to the [`most_fields`](#most-fields) query but uses a [`match_bool_prefix`]({{site.url}}{{site.baseurl}}/query-dsl/full-text/match-bool-prefix/) query instead of a `match` query.
The following is an example `bool_prefix` query for the index described in the [`best_fields`](#best-fields) section:
```json
GET articles/_search
{
"query": {
"multi_match" : {
"query": "li northern",
"type": "bool_prefix",
"fields": [ "title", "description" ]
}
}
}
```
{% include copy-curl.html %}
The preceding query is executed as the following [`dis_max`]({{site.url}}{{site.baseurl}}/query-dsl/compound/disjunction-max/) query with a `match_bool_prefix` query for each field:
```json
GET articles/_search
{
"query": {
"dis_max": {
"queries": [
{ "match_bool_prefix": { "title": "li northern" }},
{ "match_bool_prefix": { "description": "li northern" }}
]
}
}
}
```
The `fuzziness`, `prefix_length`, `max_expansions`, `fuzzy_rewrite`, and `fuzzy_transpositions` parameters are supported for the terms that are used to construct term queries, but they do not have an effect on the prefix query constructed from the final term.
{: .note}
## Parameters
The query accepts the following parameters. All parameters except `query` are optional.
Parameter | Data type | Description
:--- | :--- | :---
`query` | String | The query string to use for search. Required.
`auto_generate_synonyms_phrase_query` | Boolean | Specifies whether to create a [match phrase query]({{site.url}}{{site.baseurl}}/query-dsl/full-text/match-phrase/) automatically for multi-term synonyms. For example, if you specify `ba,batting average` as synonyms and search for `ba`, OpenSearch searches for `ba OR "batting average"` (if this option is `true`) or `ba OR (batting AND average)` (if this option is `false`). Default is `true`.
`analyzer` | String | The [analyzer]({{site.url}}{{site.baseurl}}/analyzers/index/) used to tokenize the query string text. Default is the index-time analyzer specified for the `default_field`. If no analyzer is specified for the `default_field`, the `analyzer` is the default analyzer for the index.
`boost` | Floating-point | Boosts the clause by the given multiplier. Useful for weighing clauses in compound queries. Values in the [0, 1) range decrease relevance, and values greater than 1 increase relevance. Default is `1`.
`fields` | Array of strings | The list of fields in which to search. If you don't provide the `fields` parameter, the `multi_match` query searches the fields specified in the `index.query.default_field` setting, which defaults to `*`.
`fuzziness` | String | The number of character edits (insert, delete, substitute) that it takes to change one word to another when determining whether a term matched a value. For example, the distance between `wined` and `wind` is 1. Valid values are non-negative integers or `AUTO`. The default, `AUTO`, chooses a value based on the length of each term and is a good choice for most use cases. Not supported for `phrase`, `phrase_prefix`, and `cross_fields` queries.
`fuzzy_rewrite` | String | Determines how OpenSearch rewrites the query. Valid values are `constant_score`, `scoring_boolean`, `constant_score_boolean`, `top_terms_N`, `top_terms_boost_N`, and `top_terms_blended_freqs_N`. If the `fuzziness` parameter is not `0`, the query uses a `fuzzy_rewrite` method of `top_terms_blended_freqs_${max_expansions}` by default. Default is `constant_score`.
`fuzzy_transpositions` | Boolean | Setting `fuzzy_transpositions` to `true` (default) adds swaps of adjacent characters to the insert, delete, and substitute operations of the `fuzziness` option. For example, the distance between `wind` and `wnid` is 1 if `fuzzy_transpositions` is true (swap "n" and "i") and 2 if it is false (delete "n", insert "n"). If `fuzzy_transpositions` is false, `rewind` and `wnid` have the same distance (2) from `wind`, despite the more human-centric opinion that `wnid` is an obvious typo. The default is a good choice for most use cases.
`lenient` | Boolean | Setting `lenient` to `true` ignores data type mismatches between the query and the document field. For example, a query string of `"8.2"` could match a field of type `float`. Default is `false`.
`max_expansions` | Positive integer | The maximum number of terms to which the query can expand. Fuzzy queries “expand to” a number of matching terms that are within the distance specified in `fuzziness`. Then OpenSearch tries to match those terms. Default is `50`.
`minimum_should_match` | Positive or negative integer, positive or negative percentage, combination | If the query string contains multiple search terms and you use the `or` operator, the number of terms that need to match for the document to be considered a match. For example, if `minimum_should_match` is 2, `wind often rising` does not match `The Wind Rises.` If `minimum_should_match` is `1`, it matches. For details, see [Minimum should match]({{site.url}}{{site.baseurl}}/query-dsl/minimum-should-match/).
`operator` | String | If the query string contains multiple search terms, whether all terms need to match (`AND`) or only one term needs to match (`OR`) for a document to be considered a match. Valid values are:<br>- `OR`: The string `to be` is interpreted as `to OR be`<br>- `AND`: The string `to be` is interpreted as `to AND be`<br> Default is `OR`.
`prefix_length` | Non-negative integer | The number of leading characters that are not considered in fuzziness. Default is `0`.
`slop` | `0` (default) or a positive integer | Controls the degree to which words in a query can be misordered and still be considered a match. From the [Lucene documentation](https://lucene.apache.org/core/8_9_0/core/org/apache/lucene/search/PhraseQuery.html#getSlop--): "The number of other words permitted between words in query phrase. For example, to switch the order of two words requires two moves (the first move places the words atop one another), so to permit reorderings of phrases, the slop must be at least two. A value of zero requires an exact match." Supported for `phrase` and `phrase_prefix` query types.
`tie_breaker` | Floating-point | A factor between 0 and 1.0 that is used to give more weight to documents that match multiple query clauses. For more information, see [The `tie_breaker` parameter](#the-tie_breaker-parameter).
`type` | String | The multi-match query type. Valid values are `best_fields`, `most_fields`, `cross_fields`, `phrase`, `phrase_prefix`, `bool_prefix`. Default is `best_fields`.
`zero_terms_query` | String | In some cases, the analyzer removes all terms from a query string. For example, the `stop` analyzer removes all terms from the string `an but this`. In those cases, `zero_terms_query` specifies whether to match no documents (`none`) or all documents (`all`). Valid values are `none` and `all`. Default is `none`.
The `fuzziness` parameter is not supported for `phrase`, `phrase_prefix`, and `cross_fields` queries.
{: .note}
The `slop` parameter is only supported for `phrase` and `phrase_prefix` queries.
{: .note}
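As a minimal sketch of the `slop` parameter, the following request runs a `phrase`-type `multi_match` query that tolerates reordered words. The index name `testindex`, the `title` and `description` fields, and the slop value of `4` are assumptions made for this illustration:

```json
GET testindex/_search
{
  "query": {
    "multi_match": {
      "query": "wind gone",
      "fields": ["title", "description"],
      "type": "phrase",
      "slop": 4
    }
  }
}
```
{% include copy-curl.html %}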
### The `tie_breaker` parameter
Each term-level blended query calculates the document score as the best score returned by any field in a group. The scores from all blended queries are added together to produce the final score. You can change the way the score is calculated by using the `tie_breaker` parameter. The `tie_breaker` parameter accepts the following values:
- 0.0 (default for `best_fields`, `cross_fields`, `phrase`, and `phrase_prefix` queries): Take the single best score returned by any field in a group.
- 1.0 (default for `most_fields` and `bool_prefix` queries): Add the scores for all fields in a group.
- A floating-point value in the (0, 1) range: Take the single best score of the best-matching field and add (`tie_breaker` * `_score`) for all other matching fields.
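
As a minimal sketch of an intermediate setting, the following request takes the score of the best-matching field and adds 0.3 times the score of each other matching field. The index name `testindex`, the field names, and the value `0.3` are assumptions chosen only for illustration:

```json
GET testindex/_search
{
  "query": {
    "multi_match": {
      "query": "wind",
      "fields": ["title", "description"],
      "type": "best_fields",
      "tie_breaker": 0.3
    }
  }
}
```
{% include copy-curl.html %}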


@ -1,33 +1,618 @@
---
layout: default
title: Query string queries
title: Query string
parent: Full-text queries
grand_parent: Query DSL
nav_order: 25
nav_order: 60
redirect_from:
- /opensearch/query-dsl/full-text/query-string/
- /query-dsl/query-dsl/full-text/query-string/
---
# Query string queries
# Query string query
A `query_string` query parses the query string based on the `query_string` syntax. It lets you create powerful yet concise queries that can incorporate wildcards and search multiple fields.
A `query_string` query parses the query string based on the [query string syntax](#query-string-syntax). It provides for creating powerful yet concise queries that can incorporate wildcards and search multiple fields.
## Example
Searches with `query_string` queries do not return nested documents. To search nested fields, use the [`nested` query]({{site.url}}{{site.baseurl}}/field-types/supported-field-types/nested/).
{: .note}
The following query searches for the speaker `KING` in the play name that ends with `well`:
The query string query has a strict syntax and returns an error if the syntax is invalid. Therefore, it does not work well for search box applications. For a less strict alternative, consider using the [`simple_query_string` query]({{site.url}}{{site.baseurl}}/query-dsl/full-text/simple-query-string/). If you don't need query syntax support, use the [`match` query]({{site.url}}{{site.baseurl}}/query-dsl/full-text/match/).
{: .important}
## Query string syntax
Query string syntax is based on [Apache Lucene query syntax](https://lucene.apache.org/core/2_9_4/queryparsersyntax.html).
You can use query string syntax in the following cases:
1. In a `query_string` query, for example:
```json
GET _search
{
"query": {
"query_string": {
"query": "the wind AND (rises OR rising)"
}
}
}
```
{% include copy-curl.html %}
1. In the Discover app of OpenSearch Dashboards, if you turn off DQL, as shown in the following image.
![Using query string syntax in OpenSearch Dashboards Discover]({{site.url}}{{site.baseurl}}/images/discover-lucene-syntax.png)
For more information, see [Discover]({{site.url}}{{site.baseurl}}/dashboards/discover/index-discover/).
1. If you search using the HTTP request query parameters, for example:
```json
GET _search?q=wind
```
A query string consists of _terms_ and _operators_. A term is a single word (for example, in the query `wind rises`, the terms are `wind` and `rises`). If several terms are surrounded by quotation marks, they are treated as one phrase in which words are matched in the order they appear (for example, `"wind rises"`). Operators (such as `OR`, `AND`, and `NOT`) specify the Boolean logic used to interpret text in the query string.
The examples in this section use an index containing the following mapping and documents:
```json
GET shakespeare/_search
PUT /testindex
{
"query": {
"query_string": {
"query": "speaker:KING AND play_name: *well"
"mappings": {
"properties": {
"title": {
"type": "text",
"fields": {
"english": {
"type": "text",
"analyzer": "english"
}
}
}
}
}
}
```
{% include copy-curl.html %}
```json
PUT /testindex/_doc/1
{
"title": "The wind rises"
}
```
{% include copy-curl.html %}
```json
PUT /testindex/_doc/2
{
"title": "Gone with the wind",
"description": "A 1939 American epic historical film"
}
```
{% include copy-curl.html %}
```json
PUT /testindex/_doc/3
{
"title": "Windy city"
}
```
{% include copy-curl.html %}
```json
PUT /testindex/_doc/4
{
"article title": "Wind turbines"
}
```
{% include copy-curl.html %}
## Reserved characters
The following is a list of reserved characters for the query string query:
`+`, `-`, `=`, `&&`, `||`, `>`, `<`, `!`, `(`, `)`, `{`, `}`, `[`, `]`, `^`, `"`, `~`, `*`, `?`, `:`, `\`, `/`
Escape reserved characters with a backslash (`\`). When sending a JSON request, use a double backslash (`\\`) to escape reserved characters (because the backslash character is itself reserved, you must escape the backslash with another backslash).
{: .tip}
For example, to search for the expression `2*3`, specify the query string `2\\*3`:
```json
GET /testindex/_search
{
"query": {
"query_string": {
"query": "title: 2\\*3"
}
}
}
```
{% include copy-curl.html %}
The `>` and `<` signs cannot be escaped. They are interpreted as a range query.
{: .important}
## White space characters and empty queries
White space characters are not considered operators. If a query string is empty or only contains white space characters, the query does not return results.
## Field names
Specify the field name before the colon. The following table contains example queries with field names.
Query in the `query_string` query | Query in Discover | Criterion for a document to match | Matching documents from the `testindex` index
:--- | :--- | :--- | :---
`title: wind` | `title: wind` | The `title` field contains the word `wind`. | 1, 2
`title: (wind OR windy)` | `title: (wind OR windy)` | The `title` field contains the word `wind` or the word `windy`. | 1, 2, 3
`title: \"wind rises\"` | `title: "wind rises"` | The `title` field contains the phrase `wind rises`. Escape quotation marks with a backslash. | 1
`article\\ title: wind` | `article\ title: wind` | The `article title` field contains the word `wind`. Escape the space character with a backslash. | 4
`title.\\*: rise` | `title.\*: rise` | Every field that begins with `title.` (in this example, `title.english`) contains the word `rise`. Escape the wildcard character with a backslash. | 1
`_exists_: description` | `_exists_: description` | The field `description` exists. | 2
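
As a sketch of how the escaping in the preceding table looks in a complete request, the following query searches the `article title` field of the example `testindex` index. Because the request body is JSON, the space in the field name is escaped with a double backslash:

```json
GET /testindex/_search
{
  "query": {
    "query_string": {
      "query": "article\\ title: wind"
    }
  }
}
```
{% include copy-curl.html %}

Based on the preceding table, this query is expected to match document 4.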
## Wildcard expressions
You can specify wildcard expressions using special characters: `?` replaces a single character and `*` replaces zero or more characters.
#### Example
The following query searches for the title containing the word `gone` and a description that contains a word starting with `hist`:
```json
GET /testindex/_search
{
"query": {
"query_string": {
"query": "title: gone AND description: hist*"
}
}
}
```
{% include copy-curl.html %}
Wildcard queries can use a significant amount of memory, which can degrade performance. Wildcards at the beginning of a word (for example, `*cal`) are the most expensive because matching documents on such wildcards requires examining all terms in the index. To disable leading wildcards, set `allow_leading_wildcard` to `false`.
{: .warning}
For efficiency, pure wildcards such as `*` are rewritten as `exists` queries. Therefore, the `description: *` wildcard will match documents containing an empty value in the `description` field but will not match documents in which the `description` field is either missing or has a `null` value.
If you set `analyze_wildcard` to `true`, OpenSearch will analyze queries that end with a `*` (such as `hist*`). Consequently, OpenSearch will build a Boolean query comprising the resulting tokens by taking exact matches on the first n-1 tokens and a prefix match on the last token.
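
The following sketch shows where the `allow_leading_wildcard` and `analyze_wildcard` parameters go in a request. The query string `title: wind*` is an assumption chosen for illustration. It uses a trailing wildcard, so it remains valid when leading wildcards are disabled:

```json
GET /testindex/_search
{
  "query": {
    "query_string": {
      "query": "title: wind*",
      "allow_leading_wildcard": false,
      "analyze_wildcard": true
    }
  }
}
```
{% include copy-curl.html %}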
## Regular expressions
To specify regular expression patterns in a query string, surround them with forward slashes (`/`), for example `title: /w[a-z]nd/`.
The `allow_leading_wildcard` parameter does not apply to regular expressions. For example, a query string such as `/.*d/` will examine all terms in the index.
{: .important}
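
As a minimal sketch, the following request places the regular expression from this section in a `query_string` query against the example `testindex` index. Forward slashes do not need to be escaped in the JSON request body:

```json
GET /testindex/_search
{
  "query": {
    "query_string": {
      "query": "title: /w[a-z]nd/"
    }
  }
}
```
{% include copy-curl.html %}

The pattern is matched against whole terms, so it should match a term such as `wind` but not `windy`.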
## Fuzziness
You can run fuzzy queries using the `~` operator, for example `title: rise~`.
The query searches for documents containing terms that are similar to the search term within the maximum allowed edit distance. The edit distance is defined as the [Damerau-Levenshtein distance](https://en.wikipedia.org/wiki/Damerau%E2%80%93Levenshtein_distance), which measures the number of one-character changes (insertions, deletions, substitutions, or transpositions) needed to change one term to another term.
The default edit distance of 2 should catch 80% of misspellings. To change the default edit distance, specify the new edit distance after the `~` operator. For example, to set the edit distance to `1`, use the query `title: rise~1`.
Do not mix fuzzy and wildcard operators. If you specify both fuzzy and wildcard operators, one of the operators will not be applied. For example, if you search for `wnid*~1`, the wildcard operator `*` will be applied but the fuzzy operator `~1` will not be applied.
{: .important}
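
As a minimal sketch of a fuzzy search, the following request looks for terms within an edit distance of 1 from the misspelled word `wnid` in the example `testindex` index. The search term is an assumption chosen for illustration. Assuming default settings, in which a transposition counts as a single edit, `wnid` is within a distance of 1 from `wind`:

```json
GET /testindex/_search
{
  "query": {
    "query_string": {
      "query": "title: wnid~1"
    }
  }
}
```
{% include copy-curl.html %}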
## Proximity queries
A proximity query does not require the search phrase to be in the specified order. It allows the words in the phrase to be in a different order or separated by other words. A proximity query specifies a maximum edit distance of words in a phrase. For example, the following query allows an edit distance of 4 when matching the words in the specified phrase:
```json
GET /testindex/_search
{
"query": {
"query_string": {
"query": "title: \"wind gone\"~4"
}
}
}
```
{% include copy-curl.html %}
When OpenSearch matches documents, the closer the words in the document are to the word order specified in the query (that is, the smaller the edit distance), the higher the document's relevance score.
## Ranges
To specify a range for a numeric, string, or date field, use square brackets (`[min TO max]`) for an inclusive range and curly braces (`{min TO max}`) for an exclusive range. You can also mix square brackets and curly braces to include or exclude the lower and upper bound (for example, `{min TO max]`).
The dates for a date range must be provided in the format that you used when mapping the field containing the date. For more information about supported date formats, see [Formats]({{site.url}}{{site.baseurl}}/opensearch/supported-field-types/date/#formats).
The following table provides range syntax examples.
Data type | Query | Query string
:--- | :--- | :---
Numeric | Documents whose account numbers are from 1 to 15, inclusive. | `account_number: [1 TO 15]` or <br> `account_number: (>=1 AND <=15)` or <br> `account_number: (+>=1 +<=15)`
| Documents whose account numbers are 15 and greater. | `account_number: [15 TO *]` or <br> `account_number: >=15` (note no space after the `>=` sign)
String | Documents where last name is from Bates, inclusive, to Duke, exclusive. | `lastname: [Bates TO Duke}` or <br> `lastname: (>=Bates AND <Duke)`
| Documents where last name precedes Bates alphabetically. | `lastname: {* TO Bates}` or <br> `lastname: <Bates` (note no space after the `<` sign)
Date | Documents where the release date is between 03/21/2023 and 09/25/2023, inclusive. | `release_date: [03/21/2023 TO 09/25/2023]`
As an alternative to specifying a range in a query string, you can use a [range query]({{site.url}}{{site.baseurl}}/query-dsl/term/range/), which provides a more reliable syntax.
{: .tip}
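
As a sketch of a range in a complete request, the following query uses the first numeric example from the preceding table. The `accounts` index name is hypothetical. The example assumes an index that maps `account_number` as a numeric field:

```json
GET /accounts/_search
{
  "query": {
    "query_string": {
      "query": "account_number: [1 TO 15]"
    }
  }
}
```
{% include copy-curl.html %}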
## Boosting
Use the caret (`^`) boost operator to boost the relevance score of documents by a multiplier. Values in the [0, 1) range decrease relevance, and values greater than 1 increase relevance. Default is `1`.
The following table provides boost examples.
Type | Description | Query string
:--- | :--- | :---
Word boost | Find all addresses containing the word `street` and boost the ones containing the word `Madison`. | `address: Madison^2 street`
Phrase boost | Find documents with the title containing the phrase `wind rises`, boosted by 2. | `title: \"wind rises\"^2`
| Find documents with the title containing the word `wind` or the word `rises`, and boost the score of the whole group by 2. | `title: (wind rises)^2`
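
As a minimal sketch of a word boost in a complete request, the following query searches the example `testindex` index for titles containing `wind` or `windy` and boosts matches on `wind` by a factor of 2. The choice of terms and the boost value are assumptions made for illustration:

```json
GET /testindex/_search
{
  "query": {
    "query_string": {
      "query": "title: (wind^2 OR windy)"
    }
  }
}
```
{% include copy-curl.html %}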
## Boolean operators
When you provide search terms in the query, by default, the query returns documents containing at least one of the provided terms. You can use the `default_operator` parameter to specify an operator for all terms. Thus, if you set the `default_operator` to `AND`, all terms will be required, whereas if you set it to `OR`, all terms will be optional.
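
As a minimal sketch, the following request sets `default_operator` to `AND` so that the query string `title: (gone wind)` is interpreted as `title: (gone AND wind)`. The query string is an assumption chosen to fit the example `testindex` index:

```json
GET /testindex/_search
{
  "query": {
    "query_string": {
      "query": "title: (gone wind)",
      "default_operator": "AND"
    }
  }
}
```
{% include copy-curl.html %}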
### `+` and `-` operators
If you want more granular control over the required and optional terms, you can use the `+` and `-` operators. The `+` operator makes the term following it required, while the `-` operator excludes the term following it.
For example, the query string `title: (gone +wind -turbines)` specifies that the term `gone` is optional, the term `wind` must be present, and the term `turbines` must not be present in the title of the matching documents:
```json
GET /testindex/_search
{
"query": {
"query_string": {
"query": "title: (gone +wind -turbines)"
}
}
}
```
{% include copy-curl.html %}
The query returns two matching documents:
```json
{
"_index": "testindex",
"_id": "2",
"_score": 1.3159468,
"_source": {
"title": "Gone with the wind",
"description": "A 1939 American epic historical film"
}
},
{
"_index": "testindex",
"_id": "1",
"_score": 0.3438858,
"_source": {
"title": "The wind rises"
}
}
```
The preceding query is equivalent to the following Boolean query:
```json
GET testindex/_search
{
"query": {
"bool": {
"must": {
"match": {
"title": "wind"
}
},
"should": {
"match": {
"title": "gone"
}
},
"must_not": {
"match": {
"title": "turbines"
}
}
}
}
}
```
### Conventional Boolean operators
Alternatively, you can use the following Boolean operators: `AND`, `&&`, `OR`, `||`, `NOT`, `!`. However, these operators do not follow the precedence rules, so you must use parentheses to specify precedence when using multiple Boolean operators. For example, the query string `title: (gone +wind -turbines)` can be rewritten as follows using Boolean operators:
`title: ((gone AND wind) OR wind) AND NOT turbines`
Run the following query that contains the rewritten query string:
```json
GET testindex/_search
{
"query": {
"query_string": {
"query": "title: ((gone AND wind) OR wind) AND NOT turbines"
}
}
}
```
{% include copy-curl.html %}
The query returns the same results as the query that uses the `+` and `-` operators. However, note that the relevance scores of the matching documents are not the same as in the previous results:
```json
{
"_index": "testindex",
"_id": "2",
"_score": 1.6166971,
"_source": {
"title": "Gone with the wind",
"description": "A 1939 American epic historical film"
}
},
{
"_index": "testindex",
"_id": "1",
"_score": 0.3438858,
"_source": {
"title": "The wind rises"
}
}
```
## Grouping
Group multiple clauses or terms into subqueries using parentheses. For example, the following query searches for documents whose title contains the word `gone` or the word `rises` and that also contain the word `wind`:
```json
GET testindex/_search
{
"query": {
"query_string": {
"query": "title: (gone OR rises) AND wind"
}
}
}
```
The results contain the two matching documents:
```json
{
"_index": "testindex",
"_id": "1",
"_score": 1.5046883,
"_source": {
"title": "The wind rises"
}
},
{
"_index": "testindex",
"_id": "2",
"_score": 1.3159468,
"_source": {
"title": "Gone with the wind",
"description": "A 1939 American epic historical film"
}
}
```
You can also use grouping to boost subquery results or to target the specified field, for example `title:(gone AND wind) description:(historical film)^2`.
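
As a sketch, the grouped query string from the preceding sentence can be sent in a complete request against the example `testindex` index as follows:

```json
GET /testindex/_search
{
  "query": {
    "query_string": {
      "query": "title:(gone AND wind) description:(historical film)^2"
    }
  }
}
```
{% include copy-curl.html %}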
## Searching multiple fields
To search multiple fields, use the `fields` parameter. When you provide the `fields` parameter, the query is rewritten as `field_1: query OR field_2: query ...`.
For example, the following query searches for the terms `wind` and `film` in the `title` and `description` fields:
```json
GET testindex/_search
{
"query": {
"query_string": {
"fields": [ "title", "description" ],
"query": "wind AND film"
}
}
}
```
{% include copy-curl.html %}
The preceding query is equivalent to the following query that does not provide the `fields` parameter:
```json
GET testindex/_search
{
"query": {
"query_string": {
"query": "(title:wind OR description:wind) AND (title:film OR description:film)"
}
}
}
```
### Searching multiple subfields of a field
To search all inner fields of a field, you can use a wildcard. For example, to search all subfields within the `address` field, use the following query:
```json
GET /testindex/_search
{
"query": {
"query_string" : {
"fields" : ["address.*"],
"query" : "New AND (York OR Jersey)"
}
}
}
```
{% include copy-curl.html %}
The preceding query is equivalent to the following query that does not provide the `fields` parameter (note that the `*` is escaped with `\\`):
```json
GET /testindex/_search
{
"query": {
"query_string" : {
"query" : "address.\\*: New AND (York OR Jersey)"
}
}
}
```
### Boosting
The subqueries that are generated from each search term are combined using a [`dis_max` query]({{site.url}}{{site.baseurl}}/query-dsl/compound/disjunction-max/) with a `tie_breaker`. To boost individual fields, use the `^` operator. For example, the following query boosts the `title` field by a factor of 2:
```json
GET testindex/_search
{
"query": {
"query_string": {
"fields": [ "title^2", "description" ],
"query": "wind AND film"
}
}
}
```
{% include copy-curl.html %}
To boost all subfields of a field, specify the boost operator after the wildcard:
```json
GET /testindex/_search
{
"query": {
"query_string" : {
"fields" : ["work_address", "address.*^2"],
"query" : "New AND (York OR Jersey)"
}
}
}
```
### Parameters for multiple field searches
When searching multiple fields, you can pass the additional optional `type` parameter to the `query_string` query.
Parameter | Data type | Description
:--- | :--- | :---
`type` | String | Determines how OpenSearch executes the query and scores the results. Valid values are `best_fields`, `bool_prefix`, `most_fields`, `cross_fields`, `phrase`, and `phrase_prefix`. Default is `best_fields`. For descriptions of valid values, see [Multi-match query types]({{site.url}}{{site.baseurl}}/query-dsl/full-text/multi-match/#multi-match-query-types).
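
As a minimal sketch, the following request passes the `type` parameter together with `fields`. The choice of `most_fields` and the query text are assumptions made for illustration. Any of the valid values listed in the preceding table can be substituted:

```json
GET /testindex/_search
{
  "query": {
    "query_string": {
      "fields": ["title", "description"],
      "query": "wind film",
      "type": "most_fields"
    }
  }
}
```
{% include copy-curl.html %}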
## Synonyms in the `query_string` query
The `query_string` query supports multi-term synonym expansion with the `synonym_graph` token filter. If you use the `synonym_graph` token filter, OpenSearch creates a [match phrase query]({{site.url}}{{site.baseurl}}/query-dsl/full-text/match-phrase/) for each synonym.
The `auto_generate_synonyms_phrase_query` parameter specifies whether to create a match phrase query automatically for multi-term synonyms. By default, `auto_generate_synonyms_phrase_query` is `true`, so if you specify `ml, machine learning` as synonyms and search for `ml`, OpenSearch searches for `ml OR "machine learning"`.
Alternatively, you can match multi-term synonyms using conjunctions. If you set `auto_generate_synonyms_phrase_query` to `false`, OpenSearch searches for `ml OR (machine AND learning)`.
For example, the following query searches for the text `ml models` and specifies not to auto-generate a match phrase query for each synonym:
```json
GET /testindex/_search
{
"query": {
"query_string": {
"default_field": "title",
"query": "ml models",
"auto_generate_synonyms_phrase_query": false
}
}
}
```
{% include copy-curl.html %}
For this query, OpenSearch creates the following Boolean query: `(ml OR (machine AND learning)) models`.
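
The synonym examples in this section assume an index whose search analyzer applies a `synonym_graph` token filter. The following is one possible configuration, offered as a sketch. The index name `synonym-testindex`, the filter name `ml_synonyms`, and the analyzer name `synonym_search` are hypothetical:

```json
PUT /synonym-testindex
{
  "settings": {
    "analysis": {
      "filter": {
        "ml_synonyms": {
          "type": "synonym_graph",
          "synonyms": ["ml, machine learning"]
        }
      },
      "analyzer": {
        "synonym_search": {
          "tokenizer": "standard",
          "filter": ["lowercase", "ml_synonyms"]
        }
      }
    }
  },
  "mappings": {
    "properties": {
      "title": {
        "type": "text",
        "analyzer": "standard",
        "search_analyzer": "synonym_search"
      }
    }
  }
}
```
{% include copy-curl.html %}

In this sketch, synonyms are applied only at search time through the `search_analyzer`, so existing documents do not need to be reindexed when the synonym list changes.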
## Minimum should match
The `query_string` query splits the query around each operator and creates a Boolean query for the entire input. The [`minimum_should_match` parameter]({{site.url}}{{site.baseurl}}/query-dsl/minimum-should-match/) specifies the minimum number of terms a document must match to be returned in search results. For example, the following query specifies that the `description` field must match at least two terms for each search result:
```json
GET /testindex/_search
{
"query": {
"query_string": {
"fields": [
"description"
],
"query": "historical epic film",
"minimum_should_match": 2
}
}
}
```
{% include copy-curl.html %}
For this query, OpenSearch creates the following Boolean query: `(description:historical description:epic description:film)~2`.
### Minimum should match with multiple fields
If you specify multiple fields in a `query_string` query, OpenSearch creates a [`dis_max` query]({{site.url}}{{site.baseurl}}/query-dsl/compound/disjunction-max/) for the specified fields. If you don't explicitly specify an operator for the query terms, the whole query text is treated as one clause. OpenSearch builds a query for each field using this single clause. The final Boolean query contains a single clause that corresponds to the `dis_max` query for all fields. Therefore, the `minimum_should_match` parameter is not applied.
For example, in the following query, `historical epic heroic` is treated as a single clause:
```json
GET /testindex/_search
{
"query": {
"query_string": {
"fields": [
"title",
"description"
],
"query": "historical epic heroic",
"minimum_should_match": 2
}
}
}
```
{% include copy-curl.html %}
For this query, OpenSearch creates the following Boolean query: `((title:historical title:epic title:heroic) | (description:historical description:epic description:heroic))`.
If you add explicit operators (`AND` or `OR`) to the query terms, each term is considered a separate clause, to which the `minimum_should_match` parameter can be applied. For example, in the following query, `historical`, `epic`, and `heroic` are considered separate clauses:
```json
GET /testindex/_search
{
"query": {
"query_string": {
"fields": [
"title",
"description"
],
"query": "historical OR epic OR heroic",
"minimum_should_match": 2
}
}
}
```
{% include copy-curl.html %}
For this query, OpenSearch creates the following Boolean query: `((title:historical | description:historical) (description:epic | title:epic) (description:heroic | title:heroic))~2`. The query matches at least two of the three clauses. Each clause represents a `dis_max` query on both the `title` and `description` fields for each term.
Alternatively, to ensure that `minimum_should_match` can be applied, you can set the `type` parameter to `cross_fields`. This indicates that the fields with the same analyzer should be grouped together when the input text is analyzed:
```json
GET /testindex/_search
{
"query": {
"query_string": {
"fields": [
"title",
"description"
],
"query": "historical epic heroic",
"type": "cross_fields",
"minimum_should_match": 2
}
}
}
```
{% include copy-curl.html %}
For this query, OpenSearch creates the following Boolean query: `((title:historical | description:historical) (description:epic | title:epic) (description:heroic | title:heroic))~2`.
However, if you use different analyzers, you must use explicit operators in the query to ensure that the `minimum_should_match` parameter is applied to each term.
## Parameters
@ -35,24 +620,27 @@ The following table lists the parameters that `query_string` query supports. All
Parameter | Data type | Description
:--- | :--- | :---
`query` | String | The query string to use for search. Required.
`fields` | String array | The list of fields to search (for example, `"fields": ["title^4", "description"]`). Supports wildcards.
`default_field` | String | The field in which to search if the field is not specified in the query string. Supports wildcards. Defaults to the value specified in the `index.query.default_field` index setting. By default, the `index.query.default_field` is `*`, which means extract all fields eligible for term query and filter the metadata fields. The extracted fields are combined into a query if the `prefix` is not specified. Eligible fields do not include nested documents. Searching all eligible fields could be a resource-intensive operation. The `indices.query.bool.max_clause_count` search setting defines the maximum value for the product of the number of fields and the number of terms that can be queried at one time. The default value for `indices.query.bool.max_clause_count` is 4,096.
`query` | String | The text that may contain expressions in the [query string syntax](#query-string-syntax) to use for search. Required.
`allow_leading_wildcard` | Boolean | Specifies whether `*` and `?` are allowed as first characters of a search term. Default is `true`.
`analyze_wildcard` | Boolean | Specifies whether OpenSearch should attempt to analyze wildcard terms. Default is `false`.
`analyzer` | String | The analyzer used to tokenize the query string text. Default is the index-time analyzer specified for the `default_field`. If no analyzer is specified for the `default_field`, the `analyzer` is the default analyzer for the index.
`quote_analyzer` | String | The analyzer used to tokenize quoted text in the query string. Overrides the `analyzer` parameter for quoted text. Default is the `search_quote_analyzer` specified for the `default_field`.
`quote_field_suffix` | String | This option lets you search for exact matches (surrounded with quotation marks) using a different analysis method than non-exact matches use. For example, if `quote_field_suffix` is `.exact` and you search for `\"lightly\"` in the `title` field, OpenSearch searches for the word `lightly` in the `title.exact` field. This second field might use a different type (for example, `keyword` rather than `text`) or a different analyzer.
`phrase_slop` | Integer | The maximum number of words that are allowed between the matched words. If `phrase_slop` is 2, a maximum of two words is allowed between matched words in a phrase. Transposed words have a slop of 2. Default is 0 (an exact phrase match where matched words must be next to each other).
`minimum_should_match` | Positive or negative integer, positive or negative percentage, combination | If the query string contains multiple search terms and you used the `or` operator, the number of terms that need to match for the document to be considered a match. For example, if `minimum_should_match` is 2, "wind often rising" does not match "The Wind Rises." If `minimum_should_match` is 1, it matches.
`rewrite` | String | Determines how OpenSearch rewrites and scores multi-term queries. Valid values are `constant_score`, `scoring_boolean`, `constant_score_boolean`, `top_terms_N`, `top_terms_boost_N`, and `top_terms_blended_freqs_N`. Default is `constant_score`.
`auto_generate_synonyms_phrase_query` | Boolean | Specifies whether to create [match queries]({{site.url}}{{site.baseurl}}/opensearch/query-dsl/full-text/index#match) automatically for multi-term synonyms. Default is `true`.
`boost` | Floating-point | Boosts the clause by the given multiplier. Values less than 1.0 decrease relevance, and values greater than 1.0 increase relevance. Default is 1.0.
`default_operator`| String | The default Boolean operator used if no operators are specified. Valid values are:<br>- `OR`: The string `to be` is interpreted as `to OR be`<br>- `AND`: The string `to be` is interpreted as `to AND be`<br> Default is `OR`.
`enable_position_increments` | Boolean | When true, resulting queries are aware of position increments. This setting is useful when the removal of stop words leaves an unwanted "gap" between terms. Default is `true`.
`analyzer` | String | The [analyzer]({{site.url}}{{site.baseurl}}/analyzers/index/) used to tokenize the query string text. Default is the index-time analyzer specified for the `default_field`. If no analyzer is specified for the `default_field`, the `analyzer` is the default analyzer for the index.
`auto_generate_synonyms_phrase_query` | Boolean | Specifies whether to create a [match phrase query]({{site.url}}{{site.baseurl}}/query-dsl/full-text/match-phrase/) automatically for multi-term synonyms. For example, if you specify `ba, batting average` as synonyms and search for `ba`, OpenSearch searches for `ba OR "batting average"` (if this option is `true`) or `ba OR (batting AND average)` (if this option is `false`). Default is `true`.
`boost` | Floating-point | Boosts the clause by the given multiplier. Useful for weighing clauses in compound queries. Values in the [0, 1) range decrease relevance, and values greater than 1 increase relevance. Default is `1`.
`default_field` | String | The field in which to search if the field is not specified in the query string. Supports wildcards. Defaults to the value specified in the `index.query.default_field` index setting. By default, `index.query.default_field` is `*`, which means that OpenSearch extracts all fields eligible for term queries and filters out the metadata fields. The extracted fields are combined into a query if the `prefix` is not specified. Eligible fields do not include nested documents. Searching all eligible fields could be a resource-intensive operation. The `indices.query.bool.max_clause_count` search setting defines the maximum value for the product of the number of fields and the number of terms that can be queried at one time. The default value for `indices.query.bool.max_clause_count` is 1,024.
`default_operator`| String | If the query string contains multiple search terms, whether all terms need to match (`AND`) or only one term needs to match (`OR`) for a document to be considered a match. Valid values are:<br>- `OR`: The string `to be` is interpreted as `to OR be`<br>- `AND`: The string `to be` is interpreted as `to AND be`<br> Default is `OR`.
`enable_position_increments` | Boolean | When `true`, resulting queries are aware of position increments. This setting is useful when the removal of stop words leaves an unwanted "gap" between terms. Default is `true`.
`fields` | String array | The list of fields to search (for example, `"fields": ["title^4", "description"]`). Supports wildcards. If unspecified, defaults to the `index.query.default_field` setting, which defaults to `["*"]`.
`fuzziness` | String | The number of character edits (insert, delete, substitute) that it takes to change one word to another when determining whether a term matched a value. For example, the distance between `wined` and `wind` is 1. Valid values are non-negative integers or `AUTO`. The default, `AUTO`, chooses a value based on the length of each term and is a good choice for most use cases.
`fuzzy_max_expansions` | Positive integer | The maximum number of terms to which the query can expand. Fuzzy queries “expand to” a number of matching terms that are within the distance specified in `fuzziness`. Then OpenSearch tries to match those terms. Default is `50`.
`fuzzy_transpositions` | Boolean | Setting `fuzzy_transpositions` to `true` (default) adds swaps of adjacent characters to the insert, delete, and substitute operations of the `fuzziness` option. For example, the distance between `wind` and `wnid` is 1 if `fuzzy_transpositions` is true (swap "n" and "i") and 2 if it is false (delete "n", insert "n"). If `fuzzy_transpositions` is false, `rewind` and `wnid` have the same distance (2) from `wind`, despite the more human-centric opinion that `wnid` is an obvious typo. The default is a good choice for most use cases.
`fuzzy_max_expansions` | Positive integer | The maximum number of terms the fuzzy query will match. Default is 50.
`lenient` | Boolean | Setting `lenient` to true lets you ignore data type mismatches between the query and the document field. For example, a query string of "8.2" could match a field of type `float`. Default is `false`.
`lenient` | Boolean | Setting `lenient` to `true` ignores data type mismatches between the query and the document field. For example, a query string of `"8.2"` could match a field of type `float`. Default is `false`.
`max_determinized_states` | Positive integer | The maximum number of "[states](https://lucene.apache.org/core/8_9_0/core/org/apache/lucene/util/automaton/Operations.html#DEFAULT_MAX_DETERMINIZED_STATES)" (a measure of complexity) that Lucene can create for query strings that contain regular expressions (for example, `"query": "/wind.+?/"`). Larger numbers allow for queries that use more memory. Default is 10,000.
`time_zone` | String | Specifies the number of hours to offset the desired time zone from `UTC`. You need to indicate the time zone offset number if the query string contains a date range. For example, set `"time_zone": "-08:00"` for a query with a date range such as `"query": "wind rises release_date:[2012-01-01 TO 2014-01-01]"`. The default time zone is `UTC`.
`minimum_should_match` | Positive or negative integer, positive or negative percentage, combination | If the query string contains multiple search terms and you use the `OR` operator, the number of terms that need to match for the document to be considered a match. For example, if `minimum_should_match` is `2`, `wind often rising` does not match `The Wind Rises`. If `minimum_should_match` is `1`, it matches. For details, see [Minimum should match]({{site.url}}{{site.baseurl}}/query-dsl/minimum-should-match/).
`phrase_slop` | Integer | The maximum number of words that are allowed between the matched words. If `phrase_slop` is 2, a maximum of two words is allowed between matched words in a phrase. Transposed words have a slop of 2. Default is `0` (an exact phrase match where matched words must be next to each other).
`quote_analyzer` | String | The analyzer used to tokenize quoted text in the query string. Overrides the `analyzer` parameter for quoted text. Default is the `search_quote_analyzer` specified for the `default_field`.
`quote_field_suffix` | String | This option supports searching for exact matches (surrounded with quotation marks) using a different analysis method than non-exact matches use. For example, if `quote_field_suffix` is `.exact` and you search for `\"lightly\"` in the `title` field, OpenSearch searches for the word `lightly` in the `title.exact` field. This second field might use a different type (for example, `keyword` rather than `text`) or a different analyzer.
`rewrite` | String | Determines how OpenSearch rewrites and scores multi-term queries. Valid values are `constant_score`, `scoring_boolean`, `constant_score_boolean`, `top_terms_N`, `top_terms_boost_N`, and `top_terms_blended_freqs_N`. Default is `constant_score`.
`time_zone` | String | Specifies the number of hours to offset the desired time zone from `UTC`. You need to indicate the time zone offset number if the query string contains a date range. For example, set `"time_zone": "-08:00"` for a query with a date range such as `"query": "wind rises release_date:[2012-01-01 TO 2014-01-01]"`. The default time zone is `UTC`.
Query string queries may be internally converted into [prefix queries]({{site.url}}{{site.baseurl}}/query-dsl/term/prefix/). If [`search.allow_expensive_queries`]({{site.url}}{{site.baseurl}}/query-dsl/index/#expensive-queries) is set to `false`, prefix queries are not executed. If `index_prefixes` is enabled, the `search.allow_expensive_queries` setting is ignored and an optimized query is built and executed.
{: .important}
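The `search.allow_expensive_queries` setting referenced in the preceding note is a cluster-level setting. As a minimal sketch, assuming that you want to disable expensive queries for the whole cluster and that you apply the change as a persistent setting, the request might look like this:

```json
PUT _cluster/settings
{
  "persistent": {
    "search.allow_expensive_queries": false
  }
}
```

With this setting in place, a query string that is internally converted into a prefix query is not executed unless `index_prefixes` is enabled for the field, as described in the note.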


@ -0,0 +1,369 @@
---
layout: default
title: Simple query string
parent: Full-text queries
grand_parent: Query DSL
nav_order: 70
---
# Simple query string query
Use the `simple_query_string` type to specify multiple arguments delineated by regular expressions directly in the query string. Simple query string has a less strict syntax than query string because it discards any invalid portions of the string and does not return errors for invalid syntax.
This query uses a [simple syntax](#simple-query-string-syntax) to parse the query string based on special operators and split the string into terms. After parsing, the query analyzes each term independently and then returns matching documents.
The following query performs fuzzy search on the `title` field:
```json
GET _search
{
  "query": {
    "simple_query_string": {
      "query": "\"rises wind the\"~4 | *ising~2",
      "fields": ["title"]
    }
  }
}
```
{% include copy-curl.html %}
## Simple query string syntax
A query string consists of _terms_ and _operators_. A term is a single word (for example, in the query `wind rises`, the terms are `wind` and `rises`). If several terms are surrounded by quotation marks, they are treated as one phrase where words are matched in the order they appear (for example, `"wind rises"`). Operators such as `+`, `|`, and `-` specify the Boolean logic used to interpret text in the query string.
## Operators
Simple query string syntax supports the following operators.
Operator | Description
:--- | :---
`+` | Acts as the `AND` operator.
`|` | Acts as the `OR` operator.
`*` | When used at the end of a term, signifies a prefix query.
`"` | Wraps several terms into a phrase (for example, `"wind rises"`).
`(`, `)` | Wrap a clause for precedence (for example, `wind + (rises | rising)`).
`~n` | When used after a term (for example, `wnid~3`), sets `fuzziness`. When used after a phrase, sets `slop`.
`-` | Negates the term.
All of the preceding operators are reserved characters. To refer to them as raw characters and not operators, escape any of them with a backslash. When sending a JSON request, use `\\` to escape reserved characters (because the backslash character is itself reserved, you must escape the backslash with another backslash).
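As a minimal sketch of escaping, the following request searches a `title` field (an assumed field name, reused from the earlier example) for the terms `wind` and `rises`. Without the escape, `-rises` would negate the term. With `\\-`, the hyphen is passed to the parser as a literal character rather than as the `NOT` operator. Whether the literal character survives analysis depends on the field's analyzer:

```json
GET _search
{
  "query": {
    "simple_query_string": {
      "query": "wind \\-rises",
      "fields": ["title"]
    }
  }
}
```

After JSON decoding, the query string that reaches the parser is `wind \-rises`, which is why two backslashes are needed in the request body.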
## Default operator
The default operator is `OR` (unless you set the `default_operator` to `AND`). The default operator dictates the overall query behavior. For example, consider an index containing the following documents:
```json
PUT /customers/_doc/1
{
  "first_name":"Amber",
  "last_name":"Duke",
  "address":"880 Holmes Lane"
}
```
{% include copy-curl.html %}
```json
PUT /customers/_doc/2
{
  "first_name":"Hattie",
  "last_name":"Bond",
  "address":"671 Bristol Street"
}
```
{% include copy-curl.html %}
```json
PUT /customers/_doc/3
{
  "first_name":"Nanette",
  "last_name":"Bates",
  "address":"789 Madison St"
}
```
{% include copy-curl.html %}
```json
PUT /customers/_doc/4
{
  "first_name":"Dale",
  "last_name":"Amber",
  "address":"467 Hutchinson Court"
}
```
{% include copy-curl.html %}
The following query attempts to find documents for which the address contains the words `street` or `st` and does not contain the word `madison`:
```json
GET /customers/_search
{
  "query": {
    "simple_query_string": {
      "fields": [ "address" ],
      "query": "street st -madison"
    }
  }
}
```
{% include copy-curl.html %}
However, the results include not only the expected document, but all four documents:
<details closed markdown="block">
<summary>
Response
</summary>
{: .text-delta}
```json
{
  "took": 3,
  "timed_out": false,
  "_shards": {
    "total": 1,
    "successful": 1,
    "skipped": 0,
    "failed": 0
  },
  "hits": {
    "total": {
      "value": 4,
      "relation": "eq"
    },
    "max_score": 2.2039728,
    "hits": [
      {
        "_index": "customers",
        "_id": "2",
        "_score": 2.2039728,
        "_source": {
          "first_name": "Hattie",
          "last_name": "Bond",
          "address": "671 Bristol Street"
        }
      },
      {
        "_index": "customers",
        "_id": "3",
        "_score": 1.2039728,
        "_source": {
          "first_name": "Nanette",
          "last_name": "Bates",
          "address": "789 Madison St"
        }
      },
      {
        "_index": "customers",
        "_id": "1",
        "_score": 1,
        "_source": {
          "first_name": "Amber",
          "last_name": "Duke",
          "address": "880 Holmes Lane"
        }
      },
      {
        "_index": "customers",
        "_id": "4",
        "_score": 1,
        "_source": {
          "first_name": "Dale",
          "last_name": "Amber",
          "address": "467 Hutchinson Court"
        }
      }
    ]
  }
}
```
</details>
Because the default operator is `OR`, this query includes documents that contain the words `street` or `st` (documents 2 and 3) and documents that do not contain the word `madison` (documents 1 and 4).
To express the query intent correctly, precede `-madison` with `+`:
```json
GET /customers/_search
{
  "query": {
    "simple_query_string": {
      "fields": [ "address" ],
      "query": "street st +-madison"
    }
  }
}
```
{% include copy-curl.html %}
Alternatively, specify `AND` as the default operator and use disjunction for the words `street` and `st`:
```json
GET /customers/_search
{
  "query": {
    "simple_query_string": {
      "fields": [ "address" ],
      "query": "st|street -madison",
      "default_operator": "AND"
    }
  }
}
```
{% include copy-curl.html %}
The preceding query returns document 2:
<details closed markdown="block">
<summary>
Response
</summary>
{: .text-delta}
```json
{
  "took": 2,
  "timed_out": false,
  "_shards": {
    "total": 1,
    "successful": 1,
    "skipped": 0,
    "failed": 0
  },
  "hits": {
    "total": {
      "value": 1,
      "relation": "eq"
    },
    "max_score": 2.2039728,
    "hits": [
      {
        "_index": "customers",
        "_id": "2",
        "_score": 2.2039728,
        "_source": {
          "first_name": "Hattie",
          "last_name": "Bond",
          "address": "671 Bristol Street"
        }
      }
    ]
  }
}
```
</details>
## Limit operators
To limit the supported operators for the simple query string parser, include the operators that you want to support, separated by `|`, in the `flags` parameter. For example, the following query enables only `OR`, `AND`, and `FUZZY` operators:
```json
GET /customers/_search
{
  "query": {
    "simple_query_string": {
      "fields": [ "address" ],
      "query": "bristol | madison +stre~2",
      "flags": "OR|AND|FUZZY"
    }
  }
}
```
{% include copy-curl.html %}
The following table lists all available operator flags.
Flag | Description
:--- | :---
`ALL` (default) | Enables all operators.
`AND` | Enables the `+` (`AND`) operator.
`ESCAPE` | Enables `\` as an escape character.
`FUZZY` | Enables the `~n` operator after a word, where `n` is an integer denoting the allowed edit distance for matching.
`NEAR` | Enables the `~n` operator after a phrase, where `n` is the maximum number of positions allowed between matching tokens. Same as `SLOP`.
`NONE` | Disables all operators.
`NOT` | Enables the `-` (`NOT`) operator.
`OR` | Enables the `|` (`OR`) operator.
`PHRASE` | Enables the `"` (quotation mark) operator for phrase search.
`PRECEDENCE` | Enables the `(` and `)` (parentheses) operators for operator precedence.
`PREFIX` | Enables the `*` (prefix) operator.
`SLOP` | Enables the `~n` operator after a phrase, where `n` is the maximum number of positions allowed between matching tokens. Same as `NEAR`.
`WHITESPACE` | Enables white space characters as characters on which the text is split.
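As a further illustration, the following sketch enables only the `PHRASE` and `SLOP` flags so that quotation marks and the phrase-level `~n` operator are the only operators interpreted. It reuses the `customers` index from the earlier examples; the query text is an assumption chosen for illustration:

```json
GET /customers/_search
{
  "query": {
    "simple_query_string": {
      "fields": [ "address" ],
      "query": "\"bristol street\"~1",
      "flags": "PHRASE|SLOP"
    }
  }
}
```

Because the `OR`, `AND`, and `NOT` flags are omitted, the `|`, `+`, and `-` characters are not treated as operators in this request.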
## Wildcard expressions
You can specify wildcard expressions using the `*` special character, which replaces zero or more characters. For example, the following query searches in all fields that end with `name`:
```json
GET /customers/_search
{
  "query": {
    "simple_query_string" : {
      "query": "Amber Bond",
      "fields": [ "*name" ]
    }
  }
}
```
{% include copy-curl.html %}
## Boosting
Use the caret (`^`) boost operator to boost the relevance score of a field by a multiplier. Values in the [0, 1) range decrease relevance, and values greater than 1 increase relevance. Default is `1`.
For example, the following query searches the `first_name` and `last_name` fields and boosts matches from the `first_name` field by a factor of 2:
```json
GET /customers/_search
{
  "query": {
    "simple_query_string" : {
      "query": "Amber",
      "fields": [ "first_name^2", "last_name" ]
    }
  }
}
```
{% include copy-curl.html %}
## Multi-position tokens
For multi-position tokens, simple query string creates a [match phrase query]({{site.url}}{{site.baseurl}}/query-dsl/full-text/match-phrase/). Thus, if you specify `ml, machine learning` as synonyms and search for `ml`, OpenSearch searches for `ml OR "machine learning"`.
Alternatively, you can match multi-position tokens using conjunctions. If you set `auto_generate_synonyms_phrase_query` to `false`, OpenSearch searches for `ml OR (machine AND learning)`.
For example, the following query searches for the text `ml models` and specifies not to auto-generate a match phrase query for each synonym:
```json
GET /testindex/_search
{
  "query": {
    "simple_query_string": {
      "fields": ["title"],
      "query": "ml models",
      "auto_generate_synonyms_phrase_query": false
    }
  }
}
```
{% include copy-curl.html %}
For this query, OpenSearch creates the following Boolean query: `(ml OR (machine AND learning)) models`.
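The preceding example assumes that `testindex` analyzes searches on the `title` field with a synonym filter. A minimal sketch of one such setup follows; the filter name `ml_synonyms` and the analyzer name `synonym_search` are hypothetical and chosen only for illustration:

```json
PUT /testindex
{
  "settings": {
    "analysis": {
      "filter": {
        "ml_synonyms": {
          "type": "synonym_graph",
          "synonyms": [ "ml, machine learning" ]
        }
      },
      "analyzer": {
        "synonym_search": {
          "tokenizer": "standard",
          "filter": [ "lowercase", "ml_synonyms" ]
        }
      }
    }
  },
  "mappings": {
    "properties": {
      "title": {
        "type": "text",
        "analyzer": "standard",
        "search_analyzer": "synonym_search"
      }
    }
  }
}
```

Applying the synonym filter only at search time (through `search_analyzer`) is one possible design choice; it lets you change the synonym list without reindexing documents.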
## Parameters
The following table lists the top-level parameters that the `simple_query_string` query supports. All parameters except `query` are optional.
Parameter | Data type | Description
:--- | :--- | :---
`query`| String | The text that may contain expressions in the [simple query string syntax](#simple-query-string-syntax) to use for search. Required.
`analyze_wildcard` | Boolean | Specifies whether OpenSearch should attempt to analyze wildcard terms. Default is `false`.
`analyzer` | String | The analyzer used to tokenize the query string text. Default is the index-time analyzer specified for the `default_field`. If no analyzer is specified for the `default_field`, the `analyzer` is the default analyzer for the index.
`auto_generate_synonyms_phrase_query` | Boolean | Specifies whether to create [match_phrase queries]({{site.url}}{{site.baseurl}}/query-dsl/full-text/match-phrase/) automatically for multi-term synonyms. Default is `true`.
`default_operator`| String | If the query string contains multiple search terms, whether all terms need to match (`AND`) or only one term needs to match (`OR`) for a document to be considered a match. Valid values are:<br>- `OR`: The string `to be` is interpreted as `to OR be`<br>- `AND`: The string `to be` is interpreted as `to AND be`<br> Default is `OR`.
`fields` | String array | The list of fields to search (for example, `"fields": ["title^4", "description"]`). Supports wildcards. If unspecified, defaults to the `index.query.default_field` setting, which defaults to `["*"]`. The maximum number of fields that can be searched at the same time is defined by `indices.query.bool.max_clause_count`, which is 1,024 by default.
`flags` | String | A `|`-delimited string of [flags]({{site.url}}{{site.baseurl}}/query-dsl/full-text/simple-query-string/#limit-operators) to enable (for example, `AND|OR|NOT`). Default is `ALL`.
`fuzzy_max_expansions` | Positive integer | The maximum number of terms to which the query can expand. Fuzzy queries “expand to” a number of matching terms that are within the distance specified in `fuzziness`. Then OpenSearch tries to match those terms. Default is `50`.
`fuzzy_transpositions` | Boolean | Setting `fuzzy_transpositions` to `true` (default) adds swaps of adjacent characters to the insert, delete, and substitute operations of the `fuzziness` option. For example, the distance between `wind` and `wnid` is 1 if `fuzzy_transpositions` is true (swap "n" and "i") and 2 if it is false (delete "n", insert "n"). If `fuzzy_transpositions` is false, `rewind` and `wnid` have the same distance (2) from `wind`, despite the more human-centric opinion that `wnid` is an obvious typo. The default is a good choice for most use cases.
`fuzzy_prefix_length`| Integer | The number of beginning characters left unchanged for fuzzy matching. Default is 0.
`lenient` | Boolean | Setting `lenient` to `true` ignores data type mismatches between the query and the document field. For example, a query string of `"8.2"` could match a field of type `float`. Default is `false`.
`minimum_should_match` | Positive or negative integer, positive or negative percentage, combination | If the query string contains multiple search terms and you use the `OR` operator, the number of terms that need to match for the document to be considered a match. For example, if `minimum_should_match` is `2`, `wind often rising` does not match `The Wind Rises`. If `minimum_should_match` is `1`, it matches. For details, see [Minimum should match]({{site.url}}{{site.baseurl}}/query-dsl/minimum-should-match/).
`quote_field_suffix` | String | This option supports searching for exact matches (surrounded with quotation marks) using a different analysis method than non-exact matches use. For example, if `quote_field_suffix` is `.exact` and you search for `\"lightly\"` in the `title` field, OpenSearch searches for the word `lightly` in the `title.exact` field. This second field might use a different type (for example, `keyword` rather than `text`) or a different analyzer.
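To illustrate `quote_field_suffix`, the following sketch assumes a hypothetical index `testindex` in which the `title` field has a `title.exact` subfield mapped with a different analyzer or type. Neither the index nor the subfield is defined elsewhere on this page:

```json
GET /testindex/_search
{
  "query": {
    "simple_query_string": {
      "fields": [ "title" ],
      "query": "\"lightly\" toasted",
      "quote_field_suffix": ".exact"
    }
  }
}
```

In this request, the quoted text `"lightly"` is searched in `title.exact`, while the unquoted term `toasted` is searched in `title`.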


@ -1,6 +1,6 @@
---
layout: default
title: Geo-bounding box queries
title: Geo-bounding box
parent: Geographic and xy queries
grand_parent: Query DSL
nav_order: 10
@ -9,9 +9,9 @@ redirect_from:
- /query-dsl/query-dsl/geo-and-xy/geo-bounding-box/
---
# Geo-bounding box queries
# Geo-bounding box query
To search for documents that contain [geopoint]({{site.url}}{{site.baseurl}}/opensearch/supported-field-types/geo-point) fields, use a geo-bounding box query. The geo-bounding box query returns documents whose geopoints are within the bounding box specified in the query. A document with multiple geopoints matches the query if at least one geopoint is within the bounding box.
To search for documents that contain [geopoint]({{site.url}}{{site.baseurl}}/opensearch/supported-field-types/geo-point/) fields, use a geo-bounding box query. The geo-bounding box query returns documents whose geopoints are within the bounding box specified in the query. A document with multiple geopoints matches the query if at least one geopoint is within the bounding box.
## Example
@ -170,10 +170,10 @@ Geo-bounding box queries accept the following fields.
Field | Data type | Description
:--- | :--- | :---
_name | String | The name of the filter. Optional.
validation_method | String | The validation method. Valid values are `IGNORE_MALFORMED` (accept geopoints with invalid coordinates), `COERCE` (try to coerce coordinates to valid values), and `STRICT` (return an error when coordinates are invalid). Default is `STRICT`.
type | String | Specifies how to execute the filter. Valid values are `indexed` (index the filter) and `memory` (execute the filter in memory). Default is `memory`.
ignore_unmapped | Boolean | Specifies whether to ignore an unmapped field. If set to `true`, the query does not return any documents that have an unmapped field. If set to `false`, an exception is thrown when the field is unmapped. Default is `false`.
`_name` | String | The name of the filter. Optional.
`validation_method` | String | The validation method. Valid values are `IGNORE_MALFORMED` (accept geopoints with invalid coordinates), `COERCE` (try to coerce coordinates to valid values), and `STRICT` (return an error when coordinates are invalid). Default is `STRICT`.
`type` | String | Specifies how to execute the filter. Valid values are `indexed` (index the filter) and `memory` (execute the filter in memory). Default is `memory`.
`ignore_unmapped` | Boolean | Specifies whether to ignore an unmapped field. If set to `true`, the query does not return any documents that have an unmapped field. If set to `false`, an exception is thrown when the field is unmapped. Default is `false`.
## Accepted formats


@ -1,6 +1,6 @@
---
layout: default
title: xy queries
title: xy
parent: Geographic and xy queries
grand_parent: Query DSL
nav_order: 50
@ -10,7 +10,7 @@ redirect_from:
- /query-dsl/query-dsl/geo-and-xy/xy/
---
# xy queries
# xy query
To search for documents that contain [xy point]({{site.url}}{{site.baseurl}}/opensearch/supported-field-types/xy-point) and [xy shape]({{site.url}}{{site.baseurl}}/opensearch/supported-field-types/xy-shape) fields, use an xy query.
@ -180,9 +180,9 @@ When constructing an xy query, you can also reference the name of a shape pre-in
Parameter | Description
:--- | :---
index | The name of the index that contains the pre-indexed shape.
id | The document ID of the document that contains the pre-indexed shape.
path | The field name of the field that contains the pre-indexed shape as a path.
`index` | The name of the index that contains the pre-indexed shape.
`id` | The document ID of the document that contains the pre-indexed shape.
`path` | The field name of the field that contains the pre-indexed shape as a path.
The following example illustrates referencing the name of a shape pre-indexed in another index. In this example, the index `pre-indexed-shapes` contains the shape that defines the boundaries, and the index `testindex` contains the shapes whose locations are checked against those boundaries.


@ -49,7 +49,7 @@ Broadly, you can classify queries into two categories---*leaf queries* and *comp
- [Specialized queries]({{site.url}}{{site.baseurl}}/query-dsl/specialized/index/): Specialized queries include all other query types (`distance_feature`, `more_like_this`, `percolate`, `rank_feature`, `script`, `script_score`, and `wrapper`).
- **Compound queries**: Compound queries serve as wrappers for multiple leaf or compound clauses, either to combine their results or to modify their behavior. They include the Boolean, disjunction max, constant score, function score, and boosting query types. To learn more, see [Compound queries]({{site.url}}{{site.baseurl}}/opensearch/query-dsl/compound/index/).
- **Compound queries**: Compound queries serve as wrappers for multiple leaf or compound clauses, either to combine their results or to modify their behavior. They include the Boolean, disjunction max, constant score, function score, and boosting query types. To learn more, see [Compound queries]({{site.url}}{{site.baseurl}}/query-dsl/compound/index/).
## A note on Unicode special characters in text fields

31
_query-dsl/match-all.md Normal file

@ -0,0 +1,31 @@
---
layout: default
title: Match all queries
nav_order: 65
---
# Match all queries
The `match_all` query returns all documents. This query can be useful in testing large document sets if you need to return the entire set.
```json
GET _search
{
  "query": {
    "match_all": {}
  }
}
```
{% include copy-curl.html %}
The `match_all` query has a `match_none` counterpart, which is rarely useful:
```json
GET _search
{
  "query": {
    "match_none": {}
  }
}
```
{% include copy-curl.html %}


@ -2,6 +2,7 @@
layout: default
title: Term-level queries
has_children: true
has_toc: false
nav_order: 20
---


@ -44,7 +44,7 @@ GET shakespeare/_search
}
```
To make the word order and relative positions flexible, specify a `slop` value. To learn about the `slop` option, see [Other advanced options]({{site.url}}{{site.baseurl}}/opensearch/query-dsl/full-text/index#other-advanced-options).
To make the word order and relative positions flexible, specify a `slop` value. To learn about the `slop` option, see [Slop]({{site.url}}{{site.baseurl}}/query-dsl/full-text/match-phrase#slop).
Prefix matching doesn't require any special mappings. It works with your data as is.
However, it's a fairly resource-intensive operation. A prefix of `a` could match hundreds of thousands of terms and not be useful to your user.
@ -65,7 +65,7 @@ GET shakespeare/_search
}
```
To learn about the `max_expansions` option, see [Other advanced options]({{site.url}}{{site.baseurl}}/opensearch/query-dsl/full-text/index#other-advanced-options).
The `max_expansions` option specifies the maximum number of terms to which the query can expand. Queries “expand” search terms to a number of matching terms that are within the distance specified in `fuzziness`.
The ease of implementing query-time autocomplete comes at the cost of performance.
When implementing this feature on a large scale, we recommend an index-time solution. With an index-time solution, you might experience slower indexing, but it's a price you pay only once and not for every query. The edge n-gram, search-as-you-type, and completion suggester methods are index-time solutions.


@ -38,7 +38,7 @@ You can specify the following options in any order:
- `zero_terms_query`
- `boost`
Refer to the `match` query [documentation]({{site.url}}{{site.baseurl}}/opensearch/query-dsl/full-text/index#match) for parameter descriptions and supported values.
Refer to the `match` query [documentation]({{site.url}}{{site.baseurl}}/query-dsl/full-text/match/) for parameter descriptions and supported values.
### Example 1: Search the `message` field for the text "this is a test":
@ -119,13 +119,13 @@ To search for text in multiple fields, use `MULTI_MATCH` function. This function
### Syntax
The `MULTI_MATCH` function lets you *boost* certain fields using **^** character. Boosts are multipliers that weigh matches in one field more heavily than matches in other fields. The syntax allows to specify the fields in double quotes, single quotes, surrounded by backticks, or unquoted. Use star ``"*"`` to search all fields. Star symbol should be quoted.
The `MULTI_MATCH` function lets you *boost* certain fields by using the **^** character. Boosts are multipliers that weigh matches in one field more heavily than matches in other fields. The syntax supports specifying the fields with double quotes, single quotes, backticks, or without any quotes. Use a quoted asterisk (``"*"``) to search all fields.
```sql
multi_match([field_expression+], query_expression[, option=<option_value>]*)
```
The weight is optional and is specified after the field name. It could be delimited by the `caret` character -- `^` or by whitespace. Please, refer to examples below:
The weight is optional and is specified after the field name. It can be delimited by the caret character (`^`) or by white space. Refer to the following examples:
```sql
multi_match(["Tags" ^ 2, 'Title' 3.4, `Body`, Comments ^ 0.3], ...)
@ -150,7 +150,7 @@ You can specify the following options for `MULTI_MATCH` in any order:
- `zero_terms_query`
- `boost`
Please, refer to `multi_match` query [documentation](#multi-match) for parameter description and supported values.
Refer to the `multi_match` query [documentation]({{site.url}}{{site.baseurl}}/query-dsl/full-text/multi-match/) for parameter descriptions and supported values.
### For example, REST API search for `Dale` in either the `firstname` or `lastname` fields:
@ -187,13 +187,13 @@ This function maps to the to the `query_string` query used in search engine, to
### Syntax
The `QUERY_STRING` function has syntax similar to `MATCH_QUERY` and lets you *boost* certain fields using **^** character. Boosts are multipliers that weigh matches in one field more heavily than matches in other fields. The syntax allows to specify the fields in double quotes, single quotes, surrounded by backticks, or unquoted. Use star ``"*"`` to search all fields. Star symbol should be quoted.
The `QUERY_STRING` function has syntax similar to `MATCH_QUERY` and lets you *boost* certain fields by using the **^** character. Boosts are multipliers that weigh matches in one field more heavily than matches in other fields. The syntax supports specifying the fields with double quotes, single quotes, backticks, or without any quotes. Use a quoted asterisk (``"*"``) to search all fields.
```sql
query_string([field_expression+], query_expression[, option=<option_value>]*)
```
The weight is optional and is specified after the field name. It could be delimited by the `caret` character -- `^` or by whitespace. Please, refer to examples below:
The weight is optional and is specified after the field name. It can be delimited by the caret character (`^`) or by white space. Refer to the following examples:
```sql
query_string(["Tags" ^ 2, 'Title' 3.4, `Body`, Comments ^ 0.3], ...)
@ -226,7 +226,7 @@ You can specify the following options for `QUERY_STRING` in any order:
- `tie_breaker`
- `time_zone`
Refer to the `query_string` query [documentation]({{site.url}}{{site.baseurl}}/opensearch/query-dsl/full-text/index#query-string) for parameter descriptions and supported values.
Refer to the `query_string` query [documentation]({{site.url}}{{site.baseurl}}/query-dsl/full-text/query-string/) for parameter descriptions and supported values.
### Example of using `query_string` in SQL and PPL queries:
@ -283,7 +283,7 @@ The `MATCHPHRASE`/`MATCH_PHRASE` functions let you specify the following options
- `zero_terms_query`
- `boost`
Refer to the `match_phrase` query [documentation]({{site.url}}{{site.baseurl}}/opensearch/query-dsl/full-text/index#match-phrase) for parameter descriptions and supported values.
Refer to the `match_phrase` query [documentation]({{site.url}}{{site.baseurl}}/query-dsl/full-text/match-phrase/) for parameter descriptions and supported values.
### Example of using `match_phrase` in SQL and PPL queries:
@ -323,13 +323,13 @@ The **^** lets you *boost* certain fields. Boosts are multipliers that weigh mat
### Syntax
The syntax allows to specify the fields in double quotes, single quotes, surrounded by backticks, or unquoted. Use star ``"*"`` to search all fields. Star symbol should be quoted.
The syntax supports specifying the fields with double quotes, single quotes, backticks, or without any quotes. Use a star (``"*"``) to search all fields. The star symbol must be quoted.
```sql
simple_query_string([field_expression+], query_expression[, option=<option_value>]*)
```
The weight is optional and is specified after the field name. It could be delimited by the `caret` character -- `^` or by whitespace. Please, refer to examples below:
The weight is optional and is specified after the field name. It could be delimited by the `caret` character -- `^` or by white space. Refer to the following examples:
```sql
simple_query_string(["Tags" ^ 2, 'Title' 3.4, `Body`, Comments ^ 0.3], ...)
@ -351,7 +351,7 @@ You can specify the following options for `SIMPLE_QUERY_STRING` in any order:
- `minimum_should_match`
- `quote_field_suffix`
Refer to the `simple_query_string` query [documentation]({{site.url}}{{site.baseurl}}/opensearch/query-dsl/full-text/index#simple-query-string) for parameter descriptions and supported values.
Refer to the `simple_query_string` query [documentation]({{site.url}}{{site.baseurl}}/query-dsl/full-text/simple-query-string/) for parameter descriptions and supported values.
### *Example* of using `simple_query_string` in SQL and PPL queries:
@ -402,7 +402,7 @@ The `MATCH_PHRASE_PREFIX` function lets you specify the following options in any
- `zero_terms_query`
- `boost`
Refer to the `match_phrase_prefix` query [documentation]({{site.url}}{{site.baseurl}}/opensearch/query-dsl/full-text/index#match-phrase-prefix) for parameter descriptions and supported values.
Refer to the `match_phrase_prefix` query [documentation]({{site.url}}{{site.baseurl}}/query-dsl/full-text/match-phrase-prefix/) for parameter descriptions and supported values.
### *Example* of using `match_phrase_prefix` in SQL and PPL queries:
@ -458,7 +458,7 @@ The `MATCH_BOOL_PREFIX` function lets you specify the following options in any o
- `analyzer`
- `operator`
Refer to the `match_bool_prefix` query [documentation]({{site.url}}{{site.baseurl}}/opensearch/query-dsl/full-text/index#match-boolean-prefix) for parameter descriptions and supported values.
Refer to the `match_bool_prefix` query [documentation]({{site.url}}{{site.baseurl}}/query-dsl/full-text/match-bool-prefix/) for parameter descriptions and supported values.
### Example of using `match_bool_prefix` in SQL and PPL queries:


@ -220,7 +220,7 @@ Get all SM policies:
```json
GET _plugins/_sm/policies
```
You can use a [query string]({{site.url}}{{site.baseurl}}/opensearch/query-dsl/full-text/index#query-string) and specify pagination, the field to be sorted by, and sort order:
You can use a [query string]({{site.url}}{{site.baseurl}}/query-dsl/full-text/query-string/) and specify pagination, the field to be sorted by, and sort order:
```json
GET _plugins/_sm/policies?from=0&size=20&sortField=sm_policy.name&sortOrder=desc&queryString=*

Binary file not shown.