Split query DSL, analyzer, and aggregation sections and add more to analyzer section (#4693)

* Add analyzer documentation

Signed-off-by: Fanit Kolchina <kolchfa@amazon.com>

* Add index and search analyzer pages

Signed-off-by: Fanit Kolchina <kolchfa@amazon.com>

* Doc review comments

Signed-off-by: Fanit Kolchina <kolchfa@amazon.com>

* Apply suggestions from code review

Co-authored-by: Melissa Vagi <vagimeli@amazon.com>
Signed-off-by: kolchfa-aws <105444904+kolchfa-aws@users.noreply.github.com>

* More doc review comments

Signed-off-by: Fanit Kolchina <kolchfa@amazon.com>

* Apply suggestions from code review

Co-authored-by: Nathan Bower <nbower@amazon.com>
Signed-off-by: kolchfa-aws <105444904+kolchfa-aws@users.noreply.github.com>

* Implemented editorial comments

Signed-off-by: Fanit Kolchina <kolchfa@amazon.com>

* Update index-analyzers.md

Signed-off-by: kolchfa-aws <105444904+kolchfa-aws@users.noreply.github.com>

---------

Signed-off-by: Fanit Kolchina <kolchfa@amazon.com>
Signed-off-by: kolchfa-aws <105444904+kolchfa-aws@users.noreply.github.com>
Co-authored-by: Melissa Vagi <vagimeli@amazon.com>
Co-authored-by: Nathan Bower <nbower@amazon.com>
kolchfa-aws 2023-08-08 09:41:55 -04:00 committed by GitHub
parent aed1d68ae0
commit a87fdc0f63
76 changed files with 624 additions and 276 deletions

View File

@ -4,6 +4,8 @@ title: Adjacency matrix
parent: Bucket aggregations
grand_parent: Aggregations
nav_order: 10
redirect_from:
- /query-dsl/aggregations/bucket/adjacency-matrix/
---
# Adjacency matrix aggregations

View File

@ -4,6 +4,8 @@ title: Date histogram
parent: Bucket aggregations
grand_parent: Aggregations
nav_order: 20
redirect_from:
- /query-dsl/aggregations/bucket/date-histogram/
---
# Date histogram aggregations

View File

@ -4,6 +4,8 @@ title: Date range
parent: Bucket aggregations
grand_parent: Aggregations
nav_order: 30
redirect_from:
- /query-dsl/aggregations/bucket/date-range/
---
# Date range aggregations

View File

@ -4,6 +4,8 @@ title: Diversified sampler
parent: Bucket aggregations
grand_parent: Aggregations
nav_order: 40
redirect_from:
- /query-dsl/aggregations/bucket/diversified-sampler/
---
# Diversified sampler aggregations

View File

@ -4,6 +4,8 @@ title: Filter
parent: Bucket aggregations
grand_parent: Aggregations
nav_order: 50
redirect_from:
- /query-dsl/aggregations/bucket/filter/
---
# Filter aggregations

View File

@ -4,6 +4,8 @@ title: Filters
parent: Bucket aggregations
grand_parent: Aggregations
nav_order: 60
redirect_from:
- /query-dsl/aggregations/bucket/filters/
---
# Filters aggregations

View File

@ -4,6 +4,8 @@ title: Geodistance
parent: Bucket aggregations
grand_parent: Aggregations
nav_order: 70
redirect_from:
- /query-dsl/aggregations/bucket/geo-distance/
---
# Geodistance aggregations

View File

@ -4,6 +4,8 @@ title: Geohash grid
parent: Bucket aggregations
grand_parent: Aggregations
nav_order: 80
redirect_from:
- /query-dsl/aggregations/bucket/geohash-grid/
---
# Geohash grid aggregations

View File

@ -7,6 +7,7 @@ nav_order: 85
redirect_from:
- /aggregations/geohexgrid/
- /query-dsl/aggregations/geohexgrid/
- /query-dsl/aggregations/bucket/geohex-grid/
---
# Geohex grid aggregations

View File

@ -4,6 +4,8 @@ title: Geotile grid
parent: Bucket aggregations
grand_parent: Aggregations
nav_order: 87
redirect_from:
- /query-dsl/aggregations/bucket/geotile-grid/
---
# Geotile grid aggregations

View File

@ -4,6 +4,8 @@ title: Global
parent: Bucket aggregations
grand_parent: Aggregations
nav_order: 90
redirect_from:
- /query-dsl/aggregations/bucket/global/
---
# Global aggregations

View File

@ -4,6 +4,8 @@ title: Histogram
parent: Bucket aggregations
grand_parent: Aggregations
nav_order: 100
redirect_from:
- /query-dsl/aggregations/bucket/histogram/
---
# Histogram aggregations

View File

@ -0,0 +1,45 @@
---
layout: default
title: Bucket aggregations
has_children: true
has_toc: false
nav_order: 3
redirect_from:
- /opensearch/bucket-agg/
- /query-dsl/aggregations/bucket-agg/
- /query-dsl/aggregations/bucket/
- /aggregations/bucket-agg/
---
# Bucket aggregations
Bucket aggregations categorize sets of documents as buckets. The type of bucket aggregation determines the bucket for a given document.
You can use bucket aggregations to implement faceted navigation (usually placed as a sidebar on a search result landing page) to help your users filter the results.
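For example, the following is a minimal sketch of a `terms` bucket aggregation that groups documents by category; the index name `testindex` and the field `category.keyword` are placeholders used only for illustration:
```json
GET testindex/_search
{
  "size": 0,
  "aggs": {
    "categories": {
      "terms": {
        "field": "category.keyword"
      }
    }
  }
}
```
Each unique `category.keyword` value becomes a bucket, and the response reports the document count per bucket.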
## Supported bucket aggregations
OpenSearch supports the following bucket aggregations:
- [Adjacency matrix]({{site.url}}{{site.baseurl}}/aggregations/bucket/adjacency-matrix/)
- [Date histogram]({{site.url}}{{site.baseurl}}/aggregations/bucket/date-histogram/)
- [Date range]({{site.url}}{{site.baseurl}}/aggregations/bucket/date-range/)
- [Diversified sampler]({{site.url}}{{site.baseurl}}/aggregations/bucket/diversified-sampler/)
- [Filter]({{site.url}}{{site.baseurl}}/aggregations/bucket/filter/)
- [Filters]({{site.url}}{{site.baseurl}}/aggregations/bucket/filters/)
- [Geodistance]({{site.url}}{{site.baseurl}}/aggregations/bucket/geo-distance/)
- [Geohash grid]({{site.url}}{{site.baseurl}}/aggregations/bucket/geohash-grid/)
- [Geohex grid]({{site.url}}{{site.baseurl}}/aggregations/bucket/geohex-grid/)
- [Geotile grid]({{site.url}}{{site.baseurl}}/aggregations/bucket/geotile-grid/)
- [Global]({{site.url}}{{site.baseurl}}/aggregations/bucket/global/)
- [Histogram]({{site.url}}{{site.baseurl}}/aggregations/bucket/histogram/)
- [IP range]({{site.url}}{{site.baseurl}}/aggregations/bucket/ip-range/)
- [Missing]({{site.url}}{{site.baseurl}}/aggregations/bucket/missing/)
- [Multi-terms]({{site.url}}{{site.baseurl}}/aggregations/bucket/multi-terms/)
- [Nested]({{site.url}}{{site.baseurl}}/aggregations/bucket/nested/)
- [Range]({{site.url}}{{site.baseurl}}/aggregations/bucket/range/)
- [Reverse nested]({{site.url}}{{site.baseurl}}/aggregations/bucket/reverse-nested/)
- [Sampler]({{site.url}}{{site.baseurl}}/aggregations/bucket/sampler/)
- [Significant terms]({{site.url}}{{site.baseurl}}/aggregations/bucket/significant-terms/)
- [Significant text]({{site.url}}{{site.baseurl}}/aggregations/bucket/significant-text/)
- [Terms]({{site.url}}{{site.baseurl}}/aggregations/bucket/terms/)

View File

@ -4,6 +4,8 @@ title: IP range
parent: Bucket aggregations
grand_parent: Aggregations
nav_order: 110
redirect_from:
- /query-dsl/aggregations/bucket/ip-range/
---
# IP range aggregations

View File

@ -4,6 +4,8 @@ title: Missing
parent: Bucket aggregations
grand_parent: Aggregations
nav_order: 120
redirect_from:
- /query-dsl/aggregations/bucket/missing/
---
# Missing aggregations

View File

@ -4,6 +4,8 @@ title: Multi-terms
parent: Bucket aggregations
grand_parent: Aggregations
nav_order: 130
redirect_from:
- /query-dsl/aggregations/multi-terms/
---
# Multi-terms aggregations

View File

@ -4,6 +4,8 @@ title: Nested
parent: Bucket aggregations
grand_parent: Aggregations
nav_order: 140
redirect_from:
- /query-dsl/aggregations/bucket/nested/
---
# Nested aggregations

View File

@ -4,6 +4,8 @@ title: Range
parent: Bucket aggregations
grand_parent: Aggregations
nav_order: 150
redirect_from:
- /query-dsl/aggregations/bucket/range/
---
# Range aggregations

View File

@ -4,6 +4,8 @@ title: Reverse nested
parent: Bucket aggregations
grand_parent: Aggregations
nav_order: 160
redirect_from:
- /query-dsl/aggregations/bucket/reverse-nested/
---
# Reverse nested aggregations

View File

@ -3,9 +3,10 @@ layout: default
title: Aggregations
has_children: true
nav_order: 5
permalink: /aggregations/
nav_exclude: true
redirect_from:
- /opensearch/aggregations/
- /query-dsl/aggregations/
---
# Aggregations

View File

@ -4,6 +4,8 @@ title: Average
parent: Metric aggregations
grand_parent: Aggregations
nav_order: 10
redirect_from:
- /query-dsl/aggregations/metric/average/
---
# Average aggregations

View File

@ -4,6 +4,8 @@ title: Cardinality
parent: Metric aggregations
grand_parent: Aggregations
nav_order: 20
redirect_from:
- /query-dsl/aggregations/metric/cardinality/
---
# Cardinality aggregations

View File

@ -4,6 +4,8 @@ title: Extended stats
parent: Metric aggregations
grand_parent: Aggregations
nav_order: 30
redirect_from:
- /query-dsl/aggregations/metric/extended-stats/
---
# Extended stats aggregations

View File

@ -4,6 +4,8 @@ title: Geobounds
parent: Metric aggregations
grand_parent: Aggregations
nav_order: 40
redirect_from:
- /query-dsl/aggregations/metric/geobounds/
---
## Geobounds aggregations

View File

@ -0,0 +1,47 @@
---
layout: default
title: Metric aggregations
has_children: true
has_toc: false
nav_order: 2
redirect_from:
- /opensearch/metric-agg/
- /query-dsl/aggregations/metric-agg/
- /aggregations/metric-agg/
- /query-dsl/aggregations/metric/
---
# Metric aggregations
Metric aggregations let you perform simple calculations such as finding the minimum, maximum, and average values of a field.
## Types of metric aggregations
There are two types of metric aggregations: single-value metric aggregations and multi-value metric aggregations.
### Single-value metric aggregations
Single-value metric aggregations return a single metric, for example, `sum`, `min`, `max`, `avg`, `cardinality`, or `value_count`.
### Multi-value metric aggregations
Multi-value metric aggregations return more than one metric. These include `stats`, `extended_stats`, `matrix_stats`, `percentile`, `percentile_ranks`, `geo_bound`, `top_hits`, and `scripted_metric`.
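As an illustrative sketch, the following request computes a single-value `avg` metric; the index name `testindex` and the numeric field `price` are assumed for this example and are not part of any specific dataset:
```json
GET testindex/_search
{
  "size": 0,
  "aggs": {
    "avg_price": {
      "avg": {
        "field": "price"
      }
    }
  }
}
```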
## Supported metric aggregations
OpenSearch supports the following metric aggregations:
- [Average]({{site.url}}{{site.baseurl}}/aggregations/metric/average/)
- [Cardinality]({{site.url}}{{site.baseurl}}/aggregations/metric/cardinality/)
- [Extended stats]({{site.url}}{{site.baseurl}}/aggregations/metric/extended-stats/)
- [Geobounds]({{site.url}}{{site.baseurl}}/aggregations/metric/geobounds/)
- [Matrix stats]({{site.url}}{{site.baseurl}}/aggregations/metric/matrix-stats/)
- [Maximum]({{site.url}}{{site.baseurl}}/aggregations/metric/maximum/)
- [Minimum]({{site.url}}{{site.baseurl}}/aggregations/metric/minimum/)
- [Percentile ranks]({{site.url}}{{site.baseurl}}/aggregations/metric/percentile-ranks/)
- [Percentile]({{site.url}}{{site.baseurl}}/aggregations/metric/percentile/)
- [Scripted metric]({{site.url}}{{site.baseurl}}/aggregations/metric/scripted-metric/)
- [Stats]({{site.url}}{{site.baseurl}}/aggregations/metric/stats/)
- [Sum]({{site.url}}{{site.baseurl}}/aggregations/metric/sum/)
- [Top hits]({{site.url}}{{site.baseurl}}/aggregations/metric/top-hits/)
- [Value count]({{site.url}}{{site.baseurl}}/aggregations/metric/value-count/)

View File

@ -4,6 +4,8 @@ title: Matrix stats
parent: Metric aggregations
grand_parent: Aggregations
nav_order: 50
redirect_from:
- /query-dsl/aggregations/metric/matrix-stats/
---
# Matrix stats aggregations

View File

@ -4,6 +4,8 @@ title: Maximum
parent: Metric aggregations
grand_parent: Aggregations
nav_order: 60
redirect_from:
- /query-dsl/aggregations/metric/maximum/
---
# Maximum aggregations

View File

@ -4,6 +4,8 @@ title: Minimum
parent: Metric aggregations
grand_parent: Aggregations
nav_order: 70
redirect_from:
- /query-dsl/aggregations/metric/minimum/
---
# Minimum aggregations

View File

@ -4,6 +4,8 @@ title: Percentile ranks
parent: Metric aggregations
grand_parent: Aggregations
nav_order: 80
redirect_from:
- /query-dsl/aggregations/metric/percentile-ranks/
---
# Percentile rank aggregations

View File

@ -4,6 +4,8 @@ title: Percentile
parent: Metric aggregations
grand_parent: Aggregations
nav_order: 90
redirect_from:
- /query-dsl/aggregations/metric/percentile/
---
# Percentile aggregations

View File

@ -4,6 +4,8 @@ title: Scripted metric
parent: Metric aggregations
grand_parent: Aggregations
nav_order: 100
redirect_from:
- /query-dsl/aggregations/metric/scripted-metric/
---
# Scripted metric aggregations

View File

@ -1,9 +1,11 @@
---
layout: default
title: Stats aggregations
title: Stats
parent: Metric aggregations
grand_parent: Aggregations
nav_order: 110
redirect_from:
- /query-dsl/aggregations/metric/stats/
---
# Stats aggregations

View File

@ -4,6 +4,8 @@ title: Sum
parent: Metric aggregations
grand_parent: Aggregations
nav_order: 120
redirect_from:
- /query-dsl/aggregations/metric/sum/
---
# Sum aggregations

View File

@ -4,6 +4,8 @@ title: Top hits
parent: Metric aggregations
grand_parent: Aggregations
nav_order: 130
redirect_from:
- /query-dsl/aggregations/metric/top-hits/
---
# Top hits aggregations

View File

@ -4,6 +4,8 @@ title: Value count
parent: Metric aggregations
grand_parent: Aggregations
nav_order: 140
redirect_from:
- /query-dsl/aggregations/metric/value-count/
---
# Value count aggregations

View File

@ -1,12 +1,11 @@
---
layout: default
title: Pipeline aggregations
parent: Aggregations
nav_order: 5
permalink: /aggregations/pipeline-agg/
has_children: false
redirect_from:
- /opensearch/pipeline-agg/
- /query-dsl/aggregations/pipeline-agg/
---
# Pipeline aggregations

View File

@ -0,0 +1,65 @@
---
layout: default
title: Index analyzers
nav_order: 20
---
# Index analyzers
Index analyzers are specified at indexing time and are used to analyze [text]({{site.url}}{{site.baseurl}}/field-types/supported-field-types/text/) fields when indexing a document.
## Determining which index analyzer to use
To determine which analyzer to use for a field when a document is indexed, OpenSearch examines the following parameters in order:
1. The `analyzer` mapping parameter of the field
1. The `analysis.analyzer.default` index setting
1. The `standard` analyzer (default)
When specifying an index analyzer, keep in mind that in most cases, specifying an analyzer for each `text` field in an index works best. Analyzing both the text field (at indexing time) and the query string (at query time) with the same analyzer ensures that the search uses the same terms as those that are stored in the index.
{: .important }
For information about verifying which analyzer is associated with which field, see [Verifying analyzer settings]({{site.url}}{{site.baseurl}}/analyzers/index/#verifying-analyzer-settings).
## Specifying an index analyzer for a field
When creating index mappings, you can supply the `analyzer` parameter for each [text]({{site.url}}{{site.baseurl}}/field-types/supported-field-types/text/) field. For example, the following request specifies the `simple` analyzer for the `text_entry` field:
```json
PUT testindex
{
"mappings": {
"properties": {
"text_entry": {
"type": "text",
"analyzer": "simple"
}
}
}
}
```
{% include copy-curl.html %}
## Specifying a default index analyzer for an index
If you want to use the same analyzer for all text fields in an index, you can specify it in the `analysis.analyzer.default` setting as follows:
```json
PUT testindex
{
"settings": {
"analysis": {
"analyzer": {
"default": {
"type": "simple"
}
}
}
}
}
```
{% include copy-curl.html %}
If you don't specify a default analyzer, the `standard` analyzer is used.
{: .note}

_analyzers/index.md Normal file
View File

@ -0,0 +1,163 @@
---
layout: default
title: Text analysis
has_children: true
nav_order: 5
nav_exclude: true
has_toc: false
redirect_from:
- /opensearch/query-dsl/text-analyzers/
- /query-dsl/analyzers/text-analyzers/
- /analyzers/text-analyzers/
---
# Text analysis
When you are searching documents using a full-text search, you want to receive all relevant results and not only exact matches. If you're looking for "walk", you're interested in results that contain any form of the word, like "Walk", "walked", or "walking." To facilitate full-text search, OpenSearch uses text analysis.
Text analysis consists of the following steps:
1. _Tokenize_ text into terms. For example, after tokenization, the phrase `Actions speak louder than words` is split into the tokens `Actions`, `speak`, `louder`, `than`, and `words`.
1. _Normalize_ the terms by converting them into a standard format, for example, by converting them to lowercase or performing stemming (reducing a word to its root). After normalization, `Actions` becomes `action`, `louder` becomes `loud`, and `words` becomes `word`.
## Analyzers
In OpenSearch, text analysis is performed by an _analyzer_. Each analyzer contains the following sequentially applied components:
1. **Character filters**: First, a character filter receives the original text as a stream of characters and adds, removes, or modifies characters in the text. For example, a character filter can strip HTML characters from a string so that the text `<p><b>Actions</b> speak louder than <em>words</em></p>` becomes `\nActions speak louder than words\n`. The output of a character filter is a stream of characters.
1. **Tokenizer**: Next, a tokenizer receives the stream of characters that has been processed by the character filter and splits the text into individual _tokens_ (usually, words). For example, a tokenizer can split text on white space so that the preceding text becomes [`Actions`, `speak`, `louder`, `than`, `words`]. Tokenizers also maintain metadata about tokens, such as their starting and ending positions in the text. The output of a tokenizer is a stream of tokens.
1. **Token filters**: Last, a token filter receives the stream of tokens from the tokenizer and adds, removes, or modifies tokens. For example, a token filter may lowercase the tokens so that `Actions` becomes `action`, remove stopwords like `than`, or add synonyms like `talk` for the word `speak`.
An analyzer must contain exactly one tokenizer and may contain zero or more character filters and zero or more token filters.
{: .note}
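To see all three stages working together, you can pass a character filter, a tokenizer, and a token filter directly to the Analyze API. The following sketch uses only built-in components (`html_strip`, `whitespace`, and `lowercase`) and the example text from the preceding list:
```json
GET /_analyze
{
  "char_filter": ["html_strip"],
  "tokenizer": "whitespace",
  "filter": ["lowercase"],
  "text": "<p><b>Actions</b> speak louder than <em>words</em></p>"
}
```
The HTML tags are stripped, the remaining text is split on white space, and the tokens are lowercased, producing `actions`, `speak`, `louder`, `than`, and `words`.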
## Built-in analyzers
The following table lists the built-in analyzers that OpenSearch provides. The last column of the table contains the result of applying the analyzer to the string `It’s fun to contribute a brand-new PR or 2 to OpenSearch!`.
Analyzer | Analysis performed | Analyzer output
:--- | :--- | :---
**Standard** (default) | - Parses strings into tokens at word boundaries <br> - Removes most punctuation <br> - Converts tokens to lowercase | [`it’s`, `fun`, `to`, `contribute`, `a`, `brand`, `new`, `pr`, `or`, `2`, `to`, `opensearch`]
**Simple** | - Parses strings into tokens on any non-letter character <br> - Removes non-letter characters <br> - Converts tokens to lowercase | [`it`, `s`, `fun`, `to`, `contribute`, `a`, `brand`, `new`, `pr`, `or`, `to`, `opensearch`]
**Whitespace** | - Parses strings into tokens on white space | [`It’s`, `fun`, `to`, `contribute`, `a`, `brand-new`, `PR`, `or`, `2`, `to`, `OpenSearch!`]
**Stop** | - Parses strings into tokens on any non-letter character <br> - Removes non-letter characters <br> - Removes stop words <br> - Converts tokens to lowercase | [`s`, `fun`, `contribute`, `brand`, `new`, `pr`, `opensearch`]
**Keyword** (noop) | - Outputs the entire string unchanged | [`It’s fun to contribute a brand-new PR or 2 to OpenSearch!`]
**Pattern** | - Parses strings into tokens using regular expressions <br> - Supports converting strings to lowercase <br> - Supports removing stop words | [`it`, `s`, `fun`, `to`, `contribute`, `a`, `brand`, `new`, `pr`, `or`, `2`, `to`, `opensearch`]
[**Language**]({{site.url}}{{site.baseurl}}/analyzers/language-analyzers/) | Performs analysis specific to a certain language (for example, `english`). | [`fun`, `contribut`, `brand`, `new`, `pr`, `2`, `opensearch`]
**Fingerprint** | - Parses strings on any non-letter character <br> - Normalizes characters by converting them to ASCII <br> - Converts tokens to lowercase <br> - Sorts, deduplicates, and concatenates tokens into a single token <br> - Supports removing stop words | [`2 a brand contribute fun it's new opensearch or pr to`] <br> Note that the apostrophe was converted to its ASCII counterpart.
## Custom analyzers
If needed, you can combine tokenizers, token filters, and character filters to create a custom analyzer.
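For example, the following is a minimal sketch of a custom analyzer defined in the index settings; the index name `testindex` and the analyzer name `my_custom_analyzer` are placeholders, and all of the components are built in:
```json
PUT testindex
{
  "settings": {
    "analysis": {
      "analyzer": {
        "my_custom_analyzer": {
          "type": "custom",
          "char_filter": ["html_strip"],
          "tokenizer": "standard",
          "filter": ["lowercase", "stop"]
        }
      }
    }
  }
}
```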
## Text analysis at indexing time and query time
OpenSearch performs text analysis on text fields when you index a document and when you send a search request. Depending on the time of text analysis, the analyzers used for it are classified as follows:
- An _index analyzer_ performs analysis at indexing time: When you are indexing a [text]({{site.url}}{{site.baseurl}}/field-types/supported-field-types/text/) field, OpenSearch analyzes it before indexing it. For more information about ways to specify index analyzers, see [Index analyzers]({{site.url}}{{site.baseurl}}/analyzers/index-analyzers/).
- A _search analyzer_ performs analysis at query time: OpenSearch analyzes the query string when you run a full-text query on a text field. For more information about ways to specify search analyzers, see [Search analyzers]({{site.url}}{{site.baseurl}}/analyzers/search-analyzers/).
In most cases, you should use the same analyzer at both indexing and search time because the text field and the query string will be analyzed in the same way and the resulting tokens will match as expected.
{: .tip}
### Example
When you index a document that has a text field with the text `Actions speak louder than words`, OpenSearch analyzes the text and produces the following list of tokens:
Text field tokens = [`action`, `speak`, `loud`, `than`, `word`]
When you search for documents that match the query `speaking loudly`, OpenSearch analyzes the query string and produces the following list of tokens:
Query string tokens = [`speak`, `loud`]
Then OpenSearch compares each token in the query string against the list of text field tokens and finds that both lists contain the tokens `speak` and `loud`, so OpenSearch returns this document as part of the search results that match the query.
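For illustration, a full-text query such as the following triggers that query-time analysis; the index name `testindex` and the field name `text_entry` are placeholders:
```json
GET testindex/_search
{
  "query": {
    "match": {
      "text_entry": "speaking loudly"
    }
  }
}
```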
## Testing an analyzer
To test a built-in analyzer and view the list of tokens it generates when a document is indexed, you can use the [Analyze API]({{site.url}}{{site.baseurl}}/api-reference/analyze-apis/#apply-a-built-in-analyzer).
Specify the analyzer and the text to be analyzed in the request:
```json
GET /_analyze
{
"analyzer" : "standard",
"text" : "Lets contribute to OpenSearch!"
}
```
{% include copy-curl.html %}
The following image shows the query string.
![Query string with indices]({{site.url}}{{site.baseurl}}/images/string-indices.png)
The response contains each token and its start and end offsets that correspond to the starting index in the original string (inclusive) and the ending index (exclusive):
```json
{
"tokens": [
{
"token": "lets",
"start_offset": 0,
"end_offset": 5,
"type": "<ALPHANUM>",
"position": 0
},
{
"token": "contribute",
"start_offset": 6,
"end_offset": 16,
"type": "<ALPHANUM>",
"position": 1
},
{
"token": "to",
"start_offset": 17,
"end_offset": 19,
"type": "<ALPHANUM>",
"position": 2
},
{
"token": "opensearch",
"start_offset": 20,
"end_offset": 30,
"type": "<ALPHANUM>",
"position": 3
}
]
}
```
## Verifying analyzer settings
To verify which analyzer is associated with which field, you can use the get mapping API operation:
```json
GET /testindex/_mapping
```
{% include copy-curl.html %}
The response provides information about the analyzers for each field:
```json
{
"testindex": {
"mappings": {
"properties": {
"text_entry": {
"type": "text",
"analyzer": "simple",
"search_analyzer": "whitespace"
}
}
}
}
}
```
## Next steps
- Learn more about specifying [index analyzers]({{site.url}}{{site.baseurl}}/analyzers/index-analyzers/) and [search analyzers]({{site.url}}{{site.baseurl}}/analyzers/search-analyzers/).

View File

@ -1,28 +1,26 @@
---
layout: default
title: Language analyzers
nav_order: 45
parent: Text analyzers
nav_order: 10
---
# Language analyzer
OpenSearch supports the following language values with the `analyzer` option:
arabic, armenian, basque, bengali, brazilian, bulgarian, catalan, czech, danish, dutch, english, estonian, finnish, french, galician, german, greek, hindi, hungarian, indonesian, irish, italian, latvian, lithuanian, norwegian, persian, portuguese, romanian, russian, sorani, spanish, swedish, turkish, and thai.
`arabic`, `armenian`, `basque`, `bengali`, `brazilian`, `bulgarian`, `catalan`, `czech`, `danish`, `dutch`, `english`, `estonian`, `finnish`, `french`, `galician`, `german`, `greek`, `hindi`, `hungarian`, `indonesian`, `irish`, `italian`, `latvian`, `lithuanian`, `norwegian`, `persian`, `portuguese`, `romanian`, `russian`, `sorani`, `spanish`, `swedish`, `turkish`, and `thai`.
To use the analyzer when you map an index, specify the value within your query. For example, to map your index with the French language analyzer, specify the `french` value for the analyzer field:
```json
"analyzer": "french"
```
```
#### Example request
The following query maps an index with the language analyzer set to `french`:
The following query specifies the `french` language analyzer for the index `my-index`:
```json
PUT my-index-000001
PUT my-index
{
"mappings": {
"properties": {
@ -30,7 +28,7 @@ PUT my-index-000001
"type": "text",
"fields": {
"french": {
"type": "text",
"type": "text",
"analyzer": "french"
}
}

View File

@ -0,0 +1,93 @@
---
layout: default
title: Search analyzers
nav_order: 30
---
# Search analyzers
Search analyzers are specified at query time and are used to analyze the query string when you run a full-text query on a [text]({{site.url}}{{site.baseurl}}/field-types/supported-field-types/text/) field.
## Determining which search analyzer to use
To determine which analyzer to use for a query string at query time, OpenSearch examines the following parameters in order:
1. The `analyzer` parameter of the query
1. The `search_analyzer` mapping parameter of the field
1. The `analysis.analyzer.default_search` index setting
1. The `analyzer` mapping parameter of the field
1. The `standard` analyzer (default)
In most cases, specifying a search analyzer that is different from the index analyzer is not necessary and could negatively impact search result relevance or lead to unexpected search results.
{: .warning}
For information about verifying which analyzer is associated with which field, see [Verifying analyzer settings]({{site.url}}{{site.baseurl}}/analyzers/index/#verifying-analyzer-settings).
## Specifying a search analyzer for a query string
Specify the name of the analyzer you want to use at query time in the `analyzer` field:
```json
GET shakespeare/_search
{
"query": {
"match": {
"text_entry": {
"query": "speak the truth",
"analyzer": "english"
}
}
}
}
```
{% include copy-curl.html %}
Valid values for [built-in analyzers]({{site.url}}{{site.baseurl}}/analyzers/index/#built-in-analyzers) are `standard`, `simple`, `whitespace`, `stop`, `keyword`, `pattern`, `fingerprint`, or any supported [language analyzer]({{site.url}}{{site.baseurl}}/analyzers/language-analyzers/).
## Specifying a search analyzer for a field
When creating index mappings, you can provide the `search_analyzer` parameter for each [text]({{site.url}}{{site.baseurl}}/field-types/supported-field-types/text/) field. When providing the `search_analyzer`, you must also provide the `analyzer` parameter, which specifies the [index analyzer]({{site.url}}{{site.baseurl}}/analyzers/index-analyzers/) to be used at indexing time.
For example, the following request specifies the `simple` analyzer as the index analyzer and the `whitespace` analyzer as the search analyzer for the `text_entry` field:
```json
PUT testindex
{
"mappings": {
"properties": {
"text_entry": {
"type": "text",
"analyzer": "simple",
"search_analyzer": "whitespace"
}
}
}
}
```
{% include copy-curl.html %}
## Specifying the default search analyzer for an index
If you want to analyze all query strings at search time with the same analyzer, you can specify the search analyzer in the `analysis.analyzer.default_search` setting. When specifying the `analysis.analyzer.default_search` setting, you must also specify the `analysis.analyzer.default` setting, which specifies the [index analyzer]({{site.url}}{{site.baseurl}}/analyzers/index-analyzers/) to be used at indexing time.
For example, the following request specifies the `simple` analyzer as the index analyzer and the `whitespace` analyzer as the search analyzer for the `testindex` index:
```json
PUT testindex
{
"settings": {
"analysis": {
"analyzer": {
"default": {
"type": "simple"
},
"default_search": {
"type": "whitespace"
}
}
}
}
}
```
{% include copy-curl.html %}

View File

@ -1,16 +1,20 @@
---
layout: default
title: Perform text analysis
parent: Analyze API
nav_order: 2
title: Analyze API
has_children: true
nav_order: 7
redirect_from:
- /opensearch/rest-api/analyze-apis/
- /api-reference/analyze-apis/
---
# Perform text analysis
# Analyze API
The perform text analysis API analyzes a text string and returns the resulting tokens.
The Analyze API allows you to perform [text analysis]({{site.url}}{{site.baseurl}}/api-reference/analyze-apis/), which is the process of converting unstructured text into individual tokens (usually words) that are optimized for search.
If you use the Security plugin, you must have the `manage index` privilege. If you simply want to analyze text, you must have the `manager cluster` privilege.
The Analyze API analyzes a text string and returns the resulting tokens.
If you use the Security plugin, you must have the `manage index` privilege. If you only want to analyze text, you must have the `manage cluster` privilege.
{: .note}
## Path and HTTP methods
@ -22,7 +26,7 @@ POST /_analyze
POST /{index}/_analyze
```
Although you can issue an analyzer request via both `GET` and `POST` requests, the two have important distinctions. A `GET` request causes data to be cached in the index so that the next time the data is requested, it is retrieved faster. A `POST` request sends a string that does not already exist to the analyzer to be compared to data that is already in the index. `POST` requests are not cached.
Although you can issue an analyze request using both `GET` and `POST` requests, the two have important distinctions. A `GET` request causes data to be cached in the index so that the next time the data is requested, it is retrieved faster. A `POST` request sends a string that does not already exist to the analyzer to be compared with data that is already in the index. `POST` requests are not cached.
{: .note}
## Path parameter

View File

@ -1,12 +0,0 @@
---
layout: default
title: Analyze API
has_children: true
nav_order: 7
redirect_from:
- /opensearch/rest-api/analyze-apis/
---
# Analyze API
The analyze API allows you to perform text analysis, which is the process of converting unstructured text into individual tokens (usually words) that are optimized for search.

View File

@ -20,7 +20,7 @@ If needed, you can combine tokenizers, token filters, and character filters to c
#### Tokenizers
Tokenizers break unstuctured text into tokens and maintain metadata about tokens, such as their start and ending positions in the text.
Tokenizers break unstructured text into tokens and maintain metadata about tokens, such as their starting and ending positions in the text.
#### Character filters

View File

@ -18,7 +18,7 @@ This reference includes the REST APIs supported by OpenSearch. If a REST API is
## Related articles
- [Analyze API]({{site.url}}{{site.baseurl}}/api-reference/analyze-apis/index/)
- [Analyze API]({{site.url}}{{site.baseurl}}/api-reference/analyze-apis/)
- [Access control API]({{site.url}}{{site.baseurl}}/security/access-control/api/)
- [Alerting API]({{site.url}}{{site.baseurl}}/observing-your-data/alerting/api/)
- [Anomaly detection API]({{site.url}}{{site.baseurl}}/observing-your-data/ad/api/)
@ -43,7 +43,8 @@ This reference includes the REST APIs supported by OpenSearch. If a REST API is
- [Point in Time API]({{site.url}}{{site.baseurl}}/search-plugins/point-in-time-api/)
- [Popular APIs]({{site.url}}{{site.baseurl}}/api-reference/popular-api/)
- [Ranking evaluation]({{site.url}}{{site.baseurl}}/api-reference/rank-eval/)
- [Reload search analyzer]({{site.url}}{{site.baseurl}}/api-reference/reload-search-analyzer/)
- [Refresh search analyzer]({{site.url}}{{site.baseurl}}/im-plugin/refresh-analyzer/)
- [Reload search analyzer]({{site.url}}{{site.baseurl}}/im-plugin/reload-search-analyzer/)
- [Remove cluster information]({{site.url}}{{site.baseurl}}/api-reference/remote-info/)
- [Root cause analysis API]({{site.url}}{{site.baseurl}}/monitoring-your-cluster/pa/rca/api/)
- [Snapshot management API]({{site.url}}{{site.baseurl}}/tuning-your-cluster/availability-and-recovery/snapshots/sm-api/)

View File

@ -70,9 +70,15 @@ collections:
reporting:
permalink: /:collection/:path/
output: true
analyzers:
permalink: /:collection/:path/
output: true
query-dsl:
permalink: /:collection/:path/
output: true
aggregations:
permalink: /:collection/:path/
output: true
field-types:
permalink: /:collection/:path/
output: true
@ -133,8 +139,14 @@ just_the_docs:
field-types:
name: Mappings and field types
nav_fold: true
analyzers:
name: Text analysis
nav_fold: true
query-dsl:
name: Query DSL, Aggregations, and Analyzers
name: Query DSL
nav_fold: true
aggregations:
name: Aggregations
nav_fold: true
search-plugins:
name: Search

View File

@ -52,7 +52,7 @@ The flat object field type supports the following queries:
- [Range]({{site.url}}{{site.baseurl}}/query-dsl/term#range)
- [Match]({{site.url}}{{site.baseurl}}/query-dsl/full-text/#match)
- [Multi-match]({{site.url}}{{site.baseurl}}/query-dsl/full-text/#multi-match)
- [Query string]({{site.url}}{{site.baseurl}}/query-dsl/full-text/#query-string)
- [Query string]({{site.url}}{{site.baseurl}}/query-dsl/full-text/query-string/)
- [Simple query string]({{site.url}}{{site.baseurl}}/query-dsl/full-text/#simple-query-string)
- [Exists]({{site.url}}{{site.baseurl}}/query-dsl/term#exists)

View File

@ -2,10 +2,9 @@
layout: default
title: Refresh search analyzer
nav_order: 50
parent: Text analyzers
has_toc: false
redirect_from:
- /im-plugin/refresh-analyzer/
- /query-dsl/analyzers/refresh-analyzer/
- /im-plugin/refresh-analyzer/index/
---

View File

@ -62,7 +62,7 @@ You've created a new visualization that can be added to a new or existing dashbo
### Limitations of event analytics visualizations
Event analytics visualizations currently do not support [Dashboards Query Language (DQL)]({{site.url}}{{site.baseurl}}/dashboards/discover/dql/) or [query domain-specific language (DSL)]({{site.url}}{{site.baseurl}}/query-dsl/), and they do not use index patterns. Note the following limitations:
Event analytics visualizations currently do not support [Dashboards Query Language (DQL)]({{site.url}}{{site.baseurl}}/dashboards/discover/dql/) or [query domain-specific language (DSL)]({{site.url}}{{site.baseurl}}/query-dsl/index/), and they do not use index patterns. Note the following limitations:
- Event analytics visualizations only use filters created using the dropdown interface. If you have DQL query or DSL filters in a dashboard, the visualizations do not use them.
- The **Dashboard** filter dropdown interface only shows fields from the default index pattern or index patterns used by other visualizations in the same dashboard.

View File

@ -1,18 +0,0 @@
---
layout: default
title: Bucket aggregations
parent: Aggregations
has_children: true
has_toc: true
nav_order: 3
redirect_from:
- /opensearch/bucket-agg/
- /query-dsl/aggregations/bucket-agg/
- /aggregations/bucket-agg/
---
# Bucket aggregations
Bucket aggregations categorize sets of documents as buckets. The type of bucket aggregation determines whether a given document falls into a bucket or not.
You can use bucket aggregations to implement faceted navigation (usually placed as a sidebar on a search result landing page) to help your users narrow down the results.

View File

@ -1,28 +0,0 @@
---
layout: default
title: Metric aggregations
parent: Aggregations
has_children: true
has_toc: true
nav_order: 2
redirect_from:
- /opensearch/metric-agg/
- /query-dsl/aggregations/metric-agg/
- /aggregations/metric-agg/
---
# Metric aggregations
Metric aggregations let you perform simple calculations such as finding the minimum, maximum, and average values of a field.
## Types of metric aggregations
Metric aggregations are of two types: single-value metric aggregations and multi-value metric aggregations.
### Single-value metric aggregations
Single-value metric aggregations return a single metric. For example, `sum`, `min`, `max`, `avg`, `cardinality`, and `value_count`.
### Multi-value metric aggregations
Multi-value metric aggregations return more than one metric. For example, `stats`, `extended_stats`, `matrix_stats`, `percentile`, `percentile_ranks`, `geo_bound`, `top_hits`, and `scripted_metric`.

View File

@ -1,75 +0,0 @@
---
layout: default
title: Text analyzers
nav_order: 190
has_children: true
permalink: /analyzers/text-analyzers/
redirect_from:
- /opensearch/query-dsl/text-analyzers/
- /query-dsl/analyzers/text-analyzers/
---
# Optimizing text for searches with text analyzers
OpenSearch applies text analysis during indexing or searching for `text` fields. There is a standard analyzer that OpenSearch uses by default for text analysis. To optimize unstructured text for search, you can convert it into structured text with our text analyzers.
## Text analyzers
OpenSearch provides several text analyzers to convert your structured text into the format that works best for your searches.
OpenSearch supports the following text analyzers:
- **Standard analyzer** Parses strings into terms at word boundaries according to the Unicode text segmentation algorithm. It removes most, but not all, punctuation and converts strings to lowercase. You can remove stop words if you enable that option, but it does not remove stop words by default.
- **Simple analyzer** Converts strings to lowercase and removes non-letter characters when it splits a string into tokens on any non-letter character.
- **Whitespace analyzer** Parses strings into terms between each whitespace.
- **Stop analyzer** Converts strings to lowercase and removes non-letter characters by splitting strings into tokens at each non-letter character. It also removes stop words (for example, "but" or "this") from strings.
- **Keyword analyzer** Receives a string as input and outputs the entire string as one term.
- **Pattern analyzer** Splits strings into terms using regular expressions and supports converting strings to lowercase. It also supports removing stop words.
- **Language analyzer** Provides analyzers specific to multiple languages.
- **Fingerprint analyzer** Creates a fingerprint to use as a duplicate detector.
The full specialized text analyzers reference is in progress and will be published soon.
{: .note }
## How to use text analyzers
If you want to use a text analyzer, specify the name of the analyzer for the `analyzer` field: standard, simple, whitespace, stop, keyword, pattern, fingerprint, or language.
Each analyzer consists of one tokenizer and zero or more token filters. Different analyzers have different character filters, tokenizers, and token filters. To pre-process the string before the tokenizer is applied, you can use one or more character filters.
#### Example: Specify the standard analyzer in a simple query
```json
GET _search
{
"query": {
"match": {
"title": "A brief history of Time",
"analyzer": "standard"
}
}
}
```
## Analyzer options
Option | Valid values | Description
:--- | :--- | :---
`analyzer` | `standard, simple, whitespace, stop, keyword, pattern, language, fingerprint` | The analyzer you want to use for the query. Different analyzers have different character filters, tokenizers, and token filters. The `stop` analyzer, for example, removes stop words (for example, "an," "but," "this") from the query string. For a full list of acceptable language values, see [Language analyzer]({{site.url}}{{site.baseurl}}/query-dsl/analyzers/language-analyzers/) on this page.
`quote_analyzer` | String | This option lets you choose to use the standard analyzer without any options, such as `language` or other analyzers. Usage is `"quote_analyzer": "standard"`.
<!-- This is a list of the 7 individual new pages we need to write
If you want to select one of the text analyzers, see [Text analyzers reference]({{site.url}}{{site.baseurl}}/opensearch/query-dsl/specialized-analyzers).
## Specialized text analyzers
1. Standard analyzer
1. Simple
1. Whitespace
1. Stop
1. Keyword
1. Pattern
1. Language
1. Fingerprint
-->

View File

@ -4,10 +4,10 @@ title: Boolean queries
parent: Compound queries
grand_parent: Query DSL
nav_order: 10
permalink: /query-dsl/compound/bool/
redirect_from:
- /opensearch/query-dsl/compound/bool/
- /opensearch/query-dsl/bool/
- /query-dsl/query-dsl/compound/bool/
---
# Boolean queries

View File

@ -4,6 +4,8 @@ title: Boosting queries
parent: Compound queries
grand_parent: Query DSL
nav_order: 30
redirect_from:
- /query-dsl/query-dsl/compound/boosting/
---
# Boosting queries

View File

@ -4,6 +4,8 @@ title: Constant score queries
parent: Compound queries
grand_parent: Query DSL
nav_order: 40
redirect_from:
- /query-dsl/query-dsl/compound/constant-score/
---
# Constant score queries

View File

@ -4,6 +4,8 @@ title: Disjunction max queries
parent: Compound queries
grand_parent: Query DSL
nav_order: 50
redirect_from:
- /query-dsl/query-dsl/compound/disjunction-max/
---
# Disjunction max queries

View File

@ -5,6 +5,8 @@ parent: Compound queries
grand_parent: Query DSL
nav_order: 60
has_math: true
redirect_from:
- /query-dsl/query-dsl/compound/function-score/
---
# Function score queries

View File

@ -1,12 +1,11 @@
---
layout: default
title: Compound queries
parent: Query DSL
has_children: true
nav_order: 40
permalink: /query-dsl/compound/
redirect_from:
- /opensearch/query-dsl/compound/index/
- /query-dsl/query-dsl/compound/
---
# Compound queries

View File

@ -1,13 +1,13 @@
---
layout: default
title: Full-text queries
parent: Query DSL
has_children: true
nav_order: 30
permalink: /query-dsl/full-text/
redirect_from:
- /opensearch/query-dsl/full-text/
- /opensearch/query-dsl/full-text/index/
- /query-dsl/query-dsl/full-text/
- /query-dsl/full-text/
---
# Full-text queries
@ -20,9 +20,6 @@ To learn more about search query classes, see [Lucene query JavaDocs](https://lu
The full-text query types shown in this section use the standard analyzer, which analyzes text automatically when the query is submitted.
You can also analyze fields when you index them. To learn more about how to convert unstructured text into structured text that is optimized for search, see [Optimizing text for searches with text analyzers]({{site.url}}{{site.baseurl}}/opensearch/query-dsl/text-analyzers).
{: .note }
<!-- to do: rewrite query type definitions per issue: https://github.com/opensearch-project/documentation-website/issues/1116
-->
---
@ -428,7 +425,7 @@ GET _search
-->
## Advanced filter options
You can filter your query results by using some of the optional query fields, such as wildcards, fuzzy query fields, and synonyms. You can also use analyzers as optional query fields. To learn more, see [How to use text analyzers]({{site.url}}{{site.baseurl}}/opensearch/query-dsl/text-analyzers/#how-to-use-text-analyzers).
You can filter your query results by using some of the optional query fields, such as wildcards, fuzzy query fields, or synonyms. You can also use analyzers as optional query fields.
### Wildcard options

View File

@ -4,9 +4,10 @@ title: Query string queries
parent: Full-text queries
grand_parent: Query DSL
nav_order: 25
permalink: /query-dsl/full-text/query-string/
redirect_from:
- /opensearch/query-dsl/full-text/query-string/
- /query-dsl/query-dsl/full-text/query-string/
---
# Query string queries

View File

@ -4,9 +4,9 @@ title: Geo-bounding box queries
parent: Geographic and xy queries
grand_parent: Query DSL
nav_order: 10
permalink: /query-dsl/geo-and-xy/geo-bounding-box/
redirect_from:
- /opensearch/query-dsl/geo-and-xy/geo-bounding-box/
- /query-dsl/query-dsl/geo-and-xy/geo-bounding-box/
---
# Geo-bounding box queries

View File

@ -1,12 +1,12 @@
---
layout: default
title: Geographic and xy queries
parent: Query DSL
has_children: true
nav_order: 50
permalink: /query-dsl/geo-and-xy/
redirect_from:
- /opensearch/query-dsl/geo-and-xy/index/
- /query-dsl/query-dsl/geo-and-xy/
- /query-dsl/query-dsl/geo-and-xy/index/
---
# Geographic and xy queries

View File

@ -4,9 +4,10 @@ title: xy queries
parent: Geographic and xy queries
grand_parent: Query DSL
nav_order: 50
permalink: /query-dsl/geo-and-xy/xy/
redirect_from:
- /opensearch/query-dsl/geo-and-xy/xy/
- /query-dsl/query-dsl/geo-and-xy/xy/
---
# xy queries

View File

@ -1,16 +1,85 @@
---
layout: default
title: Query DSL, aggregations, and analyzers
nav_order: 1
has_children: false
has_toc: false
title: Query DSL
nav_order: 2
has_children: true
nav_exclude: true
redirect_from:
- /opensearch/query-dsl/
- /opensearch/query-dsl/index/
- /docs/opensearch/query-dsl/
- /query-dsl/query-dsl/
- /query-dsl/
---
# Query DSL, aggregations, and analyzers
{%- comment -%}The `/docs/opensearch/query-dsl/` redirect is specifically to support the UI links in OpenSearch Dashboards 1.0.0.{%- endcomment -%}
[Analyzers]({{site.url}}{{site.baseurl}}/analyzers/text-analyzers/) process text to make it searchable. OpenSearch provides various analyzers that let you customize the way text is split into terms and converted into a structured format. To search documents written in a different language, you can use one of the built-in [language analyzers]({{site.url}}{{site.baseurl}}/query-dsl/analyzers/language-analyzers/) for your language of choice.
# Query DSL
The most essential search function is using a query to return relevant documents. OpenSearch provides a search language called _query domain-specific language_ (DSL) that lets you build complex and targeted queries. Explore the [query DSL documentation]({{site.url}}{{site.baseurl}}/query-dsl/) to learn more about the different types of queries OpenSearch supports.
OpenSearch provides a search language called *query domain-specific language (DSL)* that you can use to search your data. Query DSL is a flexible language with a JSON interface.
[Aggregations]({{site.url}}{{site.baseurl}}/aggregations/) let you categorize your data and analyze it to extract statistics. Use cases for aggregations include analyzing data in real time and using OpenSearch Dashboards to create visualizations.
With query DSL, you need to specify a query in the `query` parameter of the search. One of the simplest searches in OpenSearch uses the `match_all` query, which matches all documents in an index:
```json
GET testindex/_search
{
"query": {
"match_all": {
}
}
}
```
A query can consist of many query clauses. You can combine query clauses to produce complex queries.
Broadly, you can classify queries into two categories---*leaf queries* and *compound queries*:
- **Leaf queries**: Leaf queries search for a specified value in a certain field or fields. You can use leaf queries on their own. They include the following query types:
- **Full-text queries**: Use full-text queries to search text documents. For an analyzed text field search, full-text queries split the query string into terms using the same analyzer that was used when the field was indexed. For an exact value search, full-text queries look for the specified value without applying text analysis. To learn more, see [Full-text queries]({{site.url}}{{site.baseurl}}/opensearch/query-dsl/full-text/index/).
- **Term-level queries**: Use term-level queries to search documents for an exact term, such as an ID or value range. Term-level queries do not analyze search terms or sort results by relevance score. To learn more, see [Term-level queries]({{site.url}}{{site.baseurl}}/opensearch/query-dsl/term/).
- **Geographic and xy queries**: Use geographic queries to search documents that include geographic data. Use xy queries to search documents that include points and shapes in a two-dimensional coordinate system. To learn more, see [Geographic and xy queries]({{site.url}}{{site.baseurl}}/opensearch/query-dsl/geo-and-xy/index).
- **Joining queries**: Use joining queries to search nested fields or return parent and child documents that match a specific query. Types of joining queries include `nested`, `has_child`, `has_parent`, and `parent_id` queries.
- **Span queries**: Use span queries to perform precise positional searches. Span queries are low-level, specific queries that provide control over the order and proximity of specified query terms. They are primarily used to search legal documents. To learn more, see [Span queries]({{site.url}}{{site.baseurl}}/opensearch/query-dsl/span-query/).
- **Specialized queries**: Specialized queries include all other query types (`distance_feature`, `more_like_this`, `percolate`, `rank_feature`, `script`, `script_score`, `wrapper`, and `pinned_query`).
- **Compound queries**: Compound queries serve as wrappers for multiple leaf or compound clauses, either to combine their results or to modify their behavior. They include the Boolean, disjunction max, constant score, function score, and boosting query types. To learn more, see [Compound queries]({{site.url}}{{site.baseurl}}/opensearch/query-dsl/compound/index/).
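As an illustrative sketch, the following Boolean query wraps a full-text `match` clause and a term-level `filter` clause; the index name `testindex` and the fields `title` and `status` are hypothetical:
```json
GET testindex/_search
{
  "query": {
    "bool": {
      "must": {
        "match": {
          "title": "wind rises"
        }
      },
      "filter": {
        "term": {
          "status": "published"
        }
      }
    }
  }
}
```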
## A note on Unicode special characters in text fields
Because of word boundaries associated with Unicode special characters, the Unicode standard analyzer cannot index a [text field type]({{site.url}}{{site.baseurl}}/opensearch/supported-field-types/text/) value as a whole value when it includes one of these special characters. As a result, a text field value that includes a special character is parsed by the standard analyzer as multiple values separated by the special character, effectively tokenizing the different elements on either side of it. This can lead to unintentional filtering of documents and potentially compromise control over their access.
The following examples illustrate values containing special characters that will be parsed improperly by the standard analyzer. In this example, the existence of the hyphen/minus sign in the value prevents the analyzer from distinguishing between the two different users for `user.id` and interprets them as being one and the same:
```json
{
"bool": {
"must": {
"match": {
"user.id": "User-1"
}
}
}
}
```
```json
{
"bool": {
"must": {
"match": {
"user.id": "User-2"
}
}
}
}
```
To avoid this circumstance when using either query DSL or the REST API, you can use a custom analyzer or map the field as `keyword`, which performs an exact-match search. See [Keyword field type]({{site.url}}{{site.baseurl}}/opensearch/supported-field-types/keyword/) for the latter option.
For a list of characters that should be avoided when using `text` field types, see [Word Boundaries](https://unicode.org/reports/tr29/#Word_Boundaries).
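The following is a minimal sketch of the `keyword` mapping approach; the index name `testindex` is a placeholder, and the `user.id` field comes from the preceding examples:
```json
PUT testindex
{
  "mappings": {
    "properties": {
      "user.id": {
        "type": "keyword"
      }
    }
  }
}
```
With this mapping, a term-level query for `User-1` matches only documents whose `user.id` is exactly `User-1` because the value is not analyzed.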

View File

@ -1,8 +1,9 @@
---
layout: default
title: Minimum should match
parent: Query DSL
nav_order: 70
redirect_from:
- /query-dsl/query-dsl/minimum-should-match/
---
# Minimum should match

View File

@ -1,83 +0,0 @@
---
layout: default
title: Query DSL
nav_order: 2
has_children: true
permalink: /query-dsl/
redirect_from:
- /opensearch/query-dsl/
- /opensearch/query-dsl/index/
- /docs/opensearch/query-dsl/
---
{%- comment -%}The `/docs/opensearch/query-dsl/` redirect is specifically to support the UI links in OpenSearch Dashboards 1.0.0.{%- endcomment -%}
# Query DSL
OpenSearch provides a search language called *query domain-specific language (DSL)* that you can use to search your data. Query DSL is a flexible language with a JSON interface.
With query DSL, you need to specify a query in the `query` parameter of the search. One of the simplest searches in OpenSearch uses the `match_all` query, which matches all documents in an index:
```json
GET testindex/_search
{
"query": {
"match_all": {
}
}
}
```
A query can consist of many query clauses. You can combine query clauses to produce complex queries.
Broadly, you can classify queries into two categories---*leaf queries* and *compound queries*:
- **Leaf queries**: Leaf queries search for a specified value in a certain field or fields. You can use leaf queries on their own. They include the following query types:
- **Full-text queries**: Use full-text queries to search text documents. For an analyzed text field search, full-text queries split the query string into terms with the same analyzer that was used when the field was indexed. For an exact value search, full-text queries look for the specified value without applying text analysis. To learn more, see [Full-text queries]({{site.url}}{{site.baseurl}}/opensearch/query-dsl/full-text/index).
- **Term-level queries**: Use term-level queries to search documents for an exact specified term, such as an ID or value range. Term-level queries do not analyze search terms or sort results by relevance score. To learn more, see [Term-level queries]({{site.url}}{{site.baseurl}}/opensearch/query-dsl/term/).
- **Geographic and xy queries**: Use geographic queries to search documents that include geographic data. Use xy queries to search documents that include points and shapes in a two-dimensional coordinate system. To learn more, see [Geographic and xy queries]({{site.url}}{{site.baseurl}}/opensearch/query-dsl/geo-and-xy/index).
- **Joining queries**: Use joining queries to search nested fields or return parent and child documents that match a specific query. Types of joining queries include `nested`, `has_child`, `has_parent`, and `parent_id` queries.
- **Span queries**: Use span queries to perform precise positional searches. Span queries are low-level, specific queries that provide control over the order and proximity of specified query terms. They are primarily used to search legal documents. To learn more, see [Span queries]({{site.url}}{{site.baseurl}}/opensearch/query-dsl/span-query/).
- **Specialized queries**: Specialized queries include all other query types (`distance_feature`, `more_like_this`, `percolate`, `rank_feature`, `script`, `script_score`, `wrapper`, and `pinned_query`).
- **Compound queries**: Compound queries serve as wrappers for multiple leaf or compound clauses either to combine their results or to modify their behavior. They include the Boolean, disjunction max, constant score, function score, and boosting query types. To learn more, see [Compound queries]({{site.url}}{{site.baseurl}}/opensearch/query-dsl/compound/index).
## A note on Unicode special characters in text fields
Because of word boundaries associated with Unicode special characters, the Unicode standard analyzer cannot index a [text field type]({{site.url}}{{site.baseurl}}/opensearch/supported-field-types/text/) value as a whole value when it includes one of these special characters. As a result, a text field value that includes a special character is parsed by the standard analyzer as multiple values separated by the special character, effectively tokenizing the different elements on either side of it. This can lead to unintentional filtering of documents and potentially compromise control over their access.
The following examples illustrate values containing special characters that will be parsed improperly by the standard analyzer. In this example, the existence of the hyphen/minus sign in the value prevents the analyzer from distinguishing between the two different users for `user.id` and interprets them as one and the same:
```json
{
"bool": {
"must": {
"match": {
"user.id": "User-1"
}
}
}
}
```
```json
{
"bool": {
"must": {
"match": {
"user.id": "User-2"
}
}
}
}
```
To avoid this circumstance when using either query DSL or the REST API, you can use a custom analyzer or map the field as `keyword`, which performs an exact-match search. See [Keyword field type]({{site.url}}{{site.baseurl}}/opensearch/supported-field-types/keyword/) for the latter option.
For a list of characters that should be avoided for `text` field types, see [Word Boundaries](https://unicode.org/reports/tr29/#Word_Boundaries).

View File

@ -1,9 +1,9 @@
---
layout: default
title: Query and filter context
parent: Query DSL
permalink: /query-dsl/query-filter-context/
nav_order: 5
redirect_from:
- /query-dsl/query-dsl/query-filter-context/
---
# Query and filter context

View File

@ -1,11 +1,10 @@
---
layout: default
title: Span queries
parent: Query DSL
nav_order: 60
permalink: /query-dsl/span-query/
redirect_from:
- /opensearch/query-dsl/span-query/
- /query-dsl/query-dsl/span-query/
---
# Span queries

View File

@ -1,9 +1,9 @@
---
layout: default
title: Term-level and full-text queries compared
parent: Query DSL
permalink: /query-dsl/term-vs-full-text/
nav_order: 10
redirect_from:
- /query-dsl/query-dsl/term-vs-full-text
---
# Term-level and full-text queries compared

View File

@ -1,11 +1,10 @@
---
layout: default
title: Term-level queries
parent: Query DSL
nav_order: 20
permalink: /query-dsl/term/
redirect_from:
- /opensearch/query-dsl/term/
- /query-dsl/query-dsl/term/
---
# Term-level queries

View File

@ -11,7 +11,7 @@ nav_exclude: true
OpenSearch provides several features for customizing your search use cases and improving search relevance. In OpenSearch, you can:
- Use [SQL and Piped Processing Language (PPL)]({{site.url}}{{site.baseurl}}/search-plugins/sql/index/) as alternatives to [query domain-specific language (DSL)]({{site.url}}{{site.baseurl}}/query-dsl/) to search data.
- Use [SQL and Piped Processing Language (PPL)]({{site.url}}{{site.baseurl}}/search-plugins/sql/index/) as alternatives to [query domain-specific language (DSL)]({{site.url}}{{site.baseurl}}/query-dsl/index/) for searching data.
- Run resource-intensive queries asynchronously with [asynchronous search]({{site.url}}{{site.baseurl}}/search-plugins/async/).

BIN images/string-indices.png Normal file (new binary image, 15 KiB; not shown)