From 128f57abba566c310027ba6c010deeb65d6e7229 Mon Sep 17 00:00:00 2001
From: Alice Williams <88908598+alicejw-aws@users.noreply.github.com>
Date: Wed, 19 Oct 2022 08:17:21 -0700
Subject: [PATCH] Rewrite full-text query definitions (#1548)

* start of rewrites for query type definitions
Signed-off-by: alicejw

* for issue https://github.com/opensearch-project/documentation-website/issues/1116
Signed-off-by: alicejw

* for defining the terms multiple query type in this issue https://github.com/opensearch-project/documentation-website/issues/1114
Signed-off-by: alicejw

* remove extra instance of multi-term for clarity
Signed-off-by: alicejw

* clarity for synonym usage with multiple terms searches
Signed-off-by: alicejw

* for proper 3rd party doc reference
Signed-off-by: alicejw

* format error fix
Signed-off-by: alicejw

* fix link format
Signed-off-by: alicejw

* introduce that we use Apache Lucene search library and give link
Signed-off-by: alicejw

* additional changes
Signed-off-by: alicejw

* for 1st pass doc review updates
Signed-off-by: alicejw

* Update _opensearch/query-dsl/full-text.md
Co-authored-by: kolchfa-aws <105444904+kolchfa-aws@users.noreply.github.com>

* Update _opensearch/query-dsl/full-text.md
Co-authored-by: kolchfa-aws <105444904+kolchfa-aws@users.noreply.github.com>

* Update _opensearch/query-dsl/full-text.md
Co-authored-by: kolchfa-aws <105444904+kolchfa-aws@users.noreply.github.com>

* Update _opensearch/query-dsl/full-text.md
Co-authored-by: kolchfa-aws <105444904+kolchfa-aws@users.noreply.github.com>

* Update _opensearch/query-dsl/full-text.md
Co-authored-by: kolchfa-aws <105444904+kolchfa-aws@users.noreply.github.com>

* for 2nd doc reviewer updates
Signed-off-by: alicejw

* for clarity between using analyzers during index time and the auto query time analysis with the standard analyzer
Signed-off-by: alicejw

* update link text to new section title
Signed-off-by: alicejw

* update link text for lang analyzer section
Signed-off-by: alicejw

* update 10 anchor links to a section that now has a new title and anchor
Signed-off-by: alicejw

* Update _opensearch/query-dsl/full-text.md
Co-authored-by: Nate Bower

* Update _opensearch/query-dsl/full-text.md
Co-authored-by: Nate Bower

* updates per editorial review feedback provided
Signed-off-by: alicejw

* one additional edit
Signed-off-by: alicejw

* fix format errors from MDlinter
Signed-off-by: alicejw

Signed-off-by: alicejw
Co-authored-by: kolchfa-aws <105444904+kolchfa-aws@users.noreply.github.com>
Co-authored-by: Nate Bower
---
 _opensearch/query-dsl/full-text.md      | 113 ++++++++++++++----------
 _opensearch/query-dsl/index.md          |   3 +
 _opensearch/query-dsl/text-analyzers.md |   6 +-
 3 files changed, 73 insertions(+), 49 deletions(-)

diff --git a/_opensearch/query-dsl/full-text.md b/_opensearch/query-dsl/full-text.md
index 2e7e7b5b..831ce222 100644
--- a/_opensearch/query-dsl/full-text.md
+++ b/_opensearch/query-dsl/full-text.md
@@ -5,13 +5,24 @@ parent: Query DSL nav_order: 40 --- -# Full-text queries +# Full-text query types and options -This page lists all full-text query types and common options. Given the sheer number of options and subtle behaviors, the best method of ensuring useful search results is to test different queries against representative indices and verify the output. +This page lists all full-text query types and common options. There are many optional fields that you can use to create subtle search behaviors, so we recommend that you test out some basic query types against representative indexes and verify the output before you perform more advanced or complex searches with multiple options. +OpenSearch uses the Apache Lucene search library, which provides highly efficient data structures and algorithms for ingesting, indexing, searching, and aggregating data. + +To learn more about search query classes, see [Lucene query JavaDocs](https://lucene.apache.org/core/8_9_0/core/org/apache/lucene/search/Query.html). 
+ +The full-text query types shown in this section use the standard analyzer, which analyzes text automatically when the query is submitted. + +You can also analyze fields when you index them. To learn more about how to convert unstructured text into structured text that is optimized for search, see [Optimizing text for searches with text analyzers]({{site.url}}{{site.baseurl}}/opensearch/query-dsl/text-analyzers). +{: .note } + + --- -## Table of contents +#### Table of contents 1. TOC {:toc} @@ -21,11 +32,21 @@ This page lists all full-text query types and common options. Given the sheer nu Common terms queries and the optional query field `cutoff_frequency` are now deprecated. {: .note } -## Match +## Query types +OpenSearch Query DSL provides multiple query types that you can use in your searches. + +### Match + +Use the `match` query for full-text search of a specific document field. The `match` query analyzes the provided search string and returns documents that match any of the string's terms. + +You can use Boolean query operators to combine searches. + + -The most basic form of the query provides only a field (`title`) and a term (`wind`): +The following example shows a basic `match` search for the `title` field set to the value `wind`: ```json GET _search @@ -52,7 +73,7 @@ curl --insecure -XGET -u 'admin:admin' https://://_search \ }' ``` -The query accepts the following options. For descriptions of each, see [Optional Query Fields](#optional-query-fields). +The query accepts the following options. For descriptions of each, see [Advanced filter options](#advanced-filter-options). ```json GET _search @@ -75,11 +96,11 @@ GET _search } } } +``` +### Multi-match -## Multi match - -Similar to [match](#match), but searches multiple fields. +You can use the `multi_match` query type to search multiple fields. Multi-match operation functions similarly to the [match](#match) operation. The `^` lets you "boost" certain fields. 
Boosts are multipliers that weigh matches in one field more heavily than matches in other fields. In the following example, a match for "wind" in the title field influences `_score` four times as much as a match in the plot field. The result is that films like *The Wind Rises* and *Gone with the Wind* are near the top of the search results, and films like *Twister* and *Sharknado*, which presumably have "wind" in their plot summaries, are near the bottom. @@ -95,7 +116,7 @@ GET _search } ``` -The query accepts the following options. For descriptions of each, see [Optional Query Fields](#optional-query-fields). +The query accepts the following options. For descriptions of each, see [Advanced filter options](#advanced-filter-options). ```json GET _search @@ -122,9 +143,9 @@ GET _search } ``` -## Match boolean prefix +### Match Boolean prefix -Similar to [match](#match), but creates a [prefix query](https://lucene.apache.org/core/8_9_0/core/org/apache/lucene/search/PrefixQuery.html) out of the last term in the query string. +The `match_bool_prefix` query analyzes the provided search string and creates a `bool` query from the string's terms. It uses every term except the last term as a whole word for matching. The last term is used as a prefix. The `match_bool_prefix` query returns documents that contain either the whole-word terms or terms that start with the prefix term, in any order. ```json GET _search @@ -137,7 +158,7 @@ GET _search } ``` -The query accepts the following options. For descriptions of each, see [Optional Query Fields](#optional-query-fields). +The query accepts the following options. For descriptions of each, see [Advanced filter options](#advanced-filter-options). ```json GET _search @@ -159,7 +180,11 @@ GET _search } ``` -## Match phrase +For more reference information about prefix queries, see the [Lucene documentation](https://lucene.apache.org/core/8_9_0/core/org/apache/lucene/search/PrefixQuery.html). 
+ +### Match phrase + +Use the `match_phrase` query to match documents that contain an exact phrase in a specified order. You can add flexibility to phrase matching by providing the `slop` parameter. Creates a [phrase query](https://lucene.apache.org/core/8_9_0/core/org/apache/lucene/search/PhraseQuery.html) that matches a sequence of terms. @@ -174,7 +199,7 @@ GET _search } ``` -The query accepts the following options. For descriptions of each, see [Optional Query Fields](#optional-query-fields). +The query accepts the following options. For descriptions of each, see [Advanced filter options](#advanced-filter-options). ```json GET _search @@ -192,7 +217,9 @@ GET _search } ``` -## Match phrase prefix +### Match phrase prefix + +Use the `match_phrase_prefix` query to specify a phrase to match in order. The documents that contain the phrase you specify will be returned. The last partial term in the phrase is interpreted as a prefix, so any documents that contain phrases that begin with the phrase and prefix of the last term will be returned. Similar to [match phrase](#match-phrase), but creates a [prefix query](https://lucene.apache.org/core/8_9_0/core/org/apache/lucene/search/PrefixQuery.html) out of the last term in the query string. @@ -207,7 +234,7 @@ GET _search } ``` -The query accepts the following options. For descriptions of each, see [Optional Query Fields](#optional-query-fields). +The query accepts the following options. For descriptions of each, see [Advanced filter options](#advanced-filter-options). ```json GET _search @@ -242,7 +269,7 @@ GET _search } ``` -The query accepts the following options. For descriptions of each, see [Optional Query Fields](#optional-query-fields). +The query accepts the following options. For descriptions of each, see [Advanced filter options](#advanced-filter-options). 
```json GET _search @@ -265,7 +292,7 @@ GET _search } ``` --> -## Query string +### Query string The query string query splits text based on operators and analyzes each individually. @@ -283,7 +310,7 @@ GET _search } ``` -The query accepts the following options. For descriptions of each, see [Optional Query Fields](#optional-query-fields). +The query accepts the following options. For descriptions of each, see [Advanced filter options](#advanced-filter-options). ```json GET _search @@ -316,9 +343,9 @@ GET _search } ``` -## Simple query string +### Simple query string -The simple query string query is like the query string query, but it lets advanced users specify many arguments directly in the query string. The query discards any invalid portions of the query string. +Use the `simple_query_string` type to specify multiple arguments directly in the query string, delineated by special characters. Searches with this type discard any invalid portions of the string. ```json GET _search @@ -339,10 +366,10 @@ Special character | Behavior `*` | Acts as a wildcard. `""` | Wraps several terms into a phrase. `()` | Wraps a clause for precedence. -`~n` | When used after a term (e.g. `wnid~3`), sets `fuzziness`. When used after a phrase, sets `slop`. See [Optional Query Fields](#optional-query-fields). +`~n` | When used after a term (for example, `wnid~3`), sets `fuzziness`. When used after a phrase, sets `slop`. See [Advanced filter options](#advanced-filter-options). `-` | Negates the term. -The query accepts the following options. For descriptions of each, see [Optional Query Fields](#optional-query-fields). +The query accepts the following options. For descriptions of each, see [Advanced filter options](#advanced-filter-options). ```json GET _search @@ -367,9 +394,9 @@ GET _search } ``` -## Match all +### Match all -Matches all documents. Can be useful for testing. +The `match_all` query type will return all documents. 
This type can be useful in testing large document sets if you need to return the entire set. ```json GET _search @@ -380,6 +407,7 @@ GET _search } ``` + +## Advanced filter options -## Convert text with analyzers +You can filter your query results by using some of the optional query fields, such as wildcards, fuzzy query fields, and synonyms. You can also use analyzers as optional query fields. To learn more, see [How to use text analyzers]({{site.url}}{{site.baseurl}}/opensearch/query-dsl/text-analyzers/#how-to-use-text-analyzers). -To convert unstructured text into the format that you need, you can use the text analyzers. To learn more about how to convert unstructured text into structured text that is optimized for search, see [Text analyzers]({{site.url}}{{site.baseurl}}/opensearch/query-dsl/text-analyzers). - - - -## Optional query fields - -You can filter your query results by using some of the optional query fields, such as wildcards, analyzers, fuzzy query fields, and synonyms. - -### Use wildcards +### Wildcard options Option | Valid values | Description :--- | :--- | :--- `allow_leading_wildcard` | Boolean | Whether `*` and `?` are allowed as the first character of a search term. The default is `true`. `analyze_wildcard` | Boolean | Whether OpenSearch should attempt to analyze wildcard terms. Some analyzers do a poor job at this task, so the default is false. -### Use built-in analyzers - -Option | Valid values | Description -:--- | :--- | :--- -`analyzer` | `standard, simple, whitespace, stop, keyword, pattern, language, fingerprint` | The analyzer you want to use for the query. Different analyzers have different character filters, tokenizers, and token filters. The `stop` analyzer, for example, removes stop words (e.g., "an," "but," "this") from the query string. For a full list of acceptable language values, see [Convert text with analyzers](#convert-text-with-analyzers) on this page. 
-`quote_analyzer` | String | This option lets you choose to use the standard analyzer without any options, such as `language` or other analyzers. Usage is `"quote_analyzer": "standard"`. - -### Run fuzzy queries +### Fuzzy query options Option | Valid values | Description :--- | :--- | :--- `fuzziness` | `AUTO`, `0`, or a positive integer | The number of character edits (insert, delete, substitute) that it takes to change one word to another when determining whether a term matched a value. For example, the distance between `wined` and `wind` is 1. The default, `AUTO`, chooses a value based on the length of each term and is a good choice for most use cases. -`fuzzy_transpositions` | Boolean | Setting `fuzzy_transpositions` to true (default) adds swaps of adjacent characters to the insert, delete, and substitute operations of the `fuzziness` option. For example, the distance between `wind` and `wnid` is 1 if `fuzzy_transpositions` is true (swap "n" and "i") and 2 if it is false (delete "n", insert "n").

If `fuzzy_transpositions` is false, `rewind` and `wnid` have the same distance (2) from `wind`, despite the more human-centric opinion that `wnid` is an obvious typo. The default is a good choice for most use cases. +`fuzzy_transpositions` | Boolean | Setting `fuzzy_transpositions` to true (default) adds swaps of adjacent characters to the insert, delete, and substitute operations of the `fuzziness` option. For example, the distance between `wind` and `wnid` is 1 if `fuzzy_transpositions` is true (swap "n" and "i") and 2 if it is false (delete "n", insert "n"). If `fuzzy_transpositions` is false, `rewind` and `wnid` have the same distance (2) from `wind`, despite the more human-centric opinion that `wnid` is an obvious typo. The default is a good choice for most use cases. `fuzzy_max_expansions` | Positive integer | Fuzzy queries "expand to" a number of matching terms that are within the distance specified in `fuzziness`. Then OpenSearch tries to match those terms against its indexes. -### Use synonyms with a query +### Synonyms in a multiple terms search -You can also run multi-term queries that allow for generating synonyms. Use the `auto_generate_synonyms_phrase_query` Boolean field. By default it is set to `true`. It automatically generates [phrase queries](https://lucene.apache.org/core/8_9_0/core/org/apache/lucene/search/PhraseQuery.html) for multi-term synonyms. For example, if you have the synonym `"ba, batting average"` and search for "ba," OpenSearch searches for `ba OR "batting average"` (if this option is true) or `ba OR (batting AND average)` (if this option is false). +You can also use synonyms with the `terms` query type to search for multiple terms. Use the `auto_generate_synonyms_phrase_query` Boolean field. By default it is set to `true`. It automatically generates phrase queries for multiple term synonyms. 
For example, if you have the synonym `"ba, batting average"` and search for "ba," OpenSearch searches for `ba OR "batting average"` when the option is `true` or `ba OR (batting AND average)` when the option is `false`. -### Other optional query fields +To learn more about the multiple terms query type, see [Terms]({{site.url}}{{site.baseurl}}/opensearch/query-dsl/term/#terms). For more reference information about phrase queries, see the [Lucene documentation](https://lucene.apache.org/core/8_9_0/core/org/apache/lucene/search/PhraseQuery.html). + +### Other advanced options You can also use the following optional query fields to filter your query results. diff --git a/_opensearch/query-dsl/index.md b/_opensearch/query-dsl/index.md index 7eac3680..429ce5f8 100644 --- a/_opensearch/query-dsl/index.md +++ b/_opensearch/query-dsl/index.md @@ -57,8 +57,11 @@ GET _search?q=speaker:queen With query DSL, however, you can include an HTTP request body to look for results more tailored to your needs. The following example shows how to search for `speaker` and `text_entry` fields that have a value of `QUEEN`. + **Sample request** ```json +GET _search { "query": { "multi_match": { diff --git a/_opensearch/query-dsl/text-analyzers.md b/_opensearch/query-dsl/text-analyzers.md index 98873218..9e7a0e69 100644 --- a/_opensearch/query-dsl/text-analyzers.md +++ b/_opensearch/query-dsl/text-analyzers.md @@ -49,8 +49,12 @@ Each analyzer consists of one tokenizer and zero or more token filters. Differen } ``` +## Analyzer options - +Option | Valid values | Description +:--- | :--- | :--- +`analyzer` | `standard, simple, whitespace, stop, keyword, pattern, language, fingerprint` | The analyzer you want to use for the query. Different analyzers have different character filters, tokenizers, and token filters. The `stop` analyzer, for example, removes stop words (for example, "an," "but," "this") from the query string. 
For a full list of acceptable language values, see [Language analyzer](#language-analyzer) on this page. +`quote_analyzer` | String | The analyzer used to tokenize quoted text in the query string. For quoted text, this option overrides the analyzer specified in `analyzer`. Usage is `"quote_analyzer": "standard"`.
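
As a minimal sketch of how the `analyzer` option from the table above might be passed at query time, the following request applies the `stop` analyzer to a `match` query. The `title` field and the search text are assumptions borrowed from the examples in `full-text.md`; they are not part of this change.

```json
GET _search
{
  "query": {
    "match": {
      "title": {
        "query": "the wind rises",
        "analyzer": "stop"
      }
    }
  }
}
```

With the `stop` analyzer, common stop words such as "the" should be removed from the query string before matching, so the search would effectively cover only `wind` and `rises`.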