Split query DSL, analyzer, and aggregation sections and add more to analyzer section (#4693)

* Add analyzer documentation

Signed-off-by: Fanit Kolchina <kolchfa@amazon.com>

* Add index and search analyzer pages

Signed-off-by: Fanit Kolchina <kolchfa@amazon.com>

* Doc review comments

Signed-off-by: Fanit Kolchina <kolchfa@amazon.com>

* Apply suggestions from code review

Co-authored-by: Melissa Vagi <vagimeli@amazon.com>
Signed-off-by: kolchfa-aws <105444904+kolchfa-aws@users.noreply.github.com>

* More doc review comments

Signed-off-by: Fanit Kolchina <kolchfa@amazon.com>

* Apply suggestions from code review

Co-authored-by: Nathan Bower <nbower@amazon.com>
Signed-off-by: kolchfa-aws <105444904+kolchfa-aws@users.noreply.github.com>

* Implemented editorial comments

Signed-off-by: Fanit Kolchina <kolchfa@amazon.com>

* Update index-analyzers.md

Signed-off-by: kolchfa-aws <105444904+kolchfa-aws@users.noreply.github.com>

---------

Signed-off-by: Fanit Kolchina <kolchfa@amazon.com>
Signed-off-by: kolchfa-aws <105444904+kolchfa-aws@users.noreply.github.com>
Co-authored-by: Melissa Vagi <vagimeli@amazon.com>
Co-authored-by: Nathan Bower <nbower@amazon.com>
This commit is contained in:
kolchfa-aws 2023-08-08 09:41:55 -04:00 committed by GitHub
parent aed1d68ae0
commit a87fdc0f63
76 changed files with 624 additions and 276 deletions

View File

@ -4,6 +4,8 @@ title: Adjacency matrix
parent: Bucket aggregations parent: Bucket aggregations
grand_parent: Aggregations grand_parent: Aggregations
nav_order: 10 nav_order: 10
redirect_from:
- /query-dsl/aggregations/bucket/adjacency-matrix/
--- ---
# Adjacency matrix aggregations # Adjacency matrix aggregations

View File

@ -4,6 +4,8 @@ title: Date histogram
parent: Bucket aggregations parent: Bucket aggregations
grand_parent: Aggregations grand_parent: Aggregations
nav_order: 20 nav_order: 20
redirect_from:
- /query-dsl/aggregations/bucket/date-histogram/
--- ---
# Date histogram aggregations # Date histogram aggregations

View File

@ -4,6 +4,8 @@ title: Date range
parent: Bucket aggregations parent: Bucket aggregations
grand_parent: Aggregations grand_parent: Aggregations
nav_order: 30 nav_order: 30
redirect_from:
- /query-dsl/aggregations/bucket/date-range/
--- ---
# Date range aggregations # Date range aggregations

View File

@ -4,6 +4,8 @@ title: Diversified sampler
parent: Bucket aggregations parent: Bucket aggregations
grand_parent: Aggregations grand_parent: Aggregations
nav_order: 40 nav_order: 40
redirect_from:
- /query-dsl/aggregations/bucket/diversified-sampler/
--- ---
# Diversified sampler aggregations # Diversified sampler aggregations

View File

@ -4,6 +4,8 @@ title: Filter
parent: Bucket aggregations parent: Bucket aggregations
grand_parent: Aggregations grand_parent: Aggregations
nav_order: 50 nav_order: 50
redirect_from:
- /query-dsl/aggregations/bucket/filter/
--- ---
# Filter aggregations # Filter aggregations

View File

@ -4,6 +4,8 @@ title: Filters
parent: Bucket aggregations parent: Bucket aggregations
grand_parent: Aggregations grand_parent: Aggregations
nav_order: 60 nav_order: 60
redirect_from:
- /query-dsl/aggregations/bucket/filters/
--- ---
# Filters aggregations # Filters aggregations

View File

@ -4,6 +4,8 @@ title: Geodistance
parent: Bucket aggregations parent: Bucket aggregations
grand_parent: Aggregations grand_parent: Aggregations
nav_order: 70 nav_order: 70
redirect_from:
- /query-dsl/aggregations/bucket/geo-distance/
--- ---
# Geodistance aggregations # Geodistance aggregations

View File

@ -4,6 +4,8 @@ title: Geohash grid
parent: Bucket aggregations parent: Bucket aggregations
grand_parent: Aggregations grand_parent: Aggregations
nav_order: 80 nav_order: 80
redirect_from:
- /query-dsl/aggregations/bucket/geohash-grid/
--- ---
# Geohash grid aggregations # Geohash grid aggregations

View File

@ -7,6 +7,7 @@ nav_order: 85
redirect_from: redirect_from:
- /aggregations/geohexgrid/ - /aggregations/geohexgrid/
- /query-dsl/aggregations/geohexgrid/ - /query-dsl/aggregations/geohexgrid/
- /query-dsl/aggregations/bucket/geohex-grid/
--- ---
# Geohex grid aggregations # Geohex grid aggregations

View File

@ -4,6 +4,8 @@ title: Geotile grid
parent: Bucket aggregations parent: Bucket aggregations
grand_parent: Aggregations grand_parent: Aggregations
nav_order: 87 nav_order: 87
redirect_from:
- /query-dsl/aggregations/bucket/geotile-grid/
--- ---
# Geotile grid aggregations # Geotile grid aggregations

View File

@ -4,6 +4,8 @@ title: Global
parent: Bucket aggregations parent: Bucket aggregations
grand_parent: Aggregations grand_parent: Aggregations
nav_order: 90 nav_order: 90
redirect_from:
- /query-dsl/aggregations/bucket/global/
--- ---
# Global aggregations # Global aggregations

View File

@ -4,6 +4,8 @@ title: Histogram
parent: Bucket aggregations parent: Bucket aggregations
grand_parent: Aggregations grand_parent: Aggregations
nav_order: 100 nav_order: 100
redirect_from:
- /query-dsl/aggregations/bucket/histogram/
--- ---
# Histogram aggregations # Histogram aggregations

View File

@ -0,0 +1,45 @@
---
layout: default
title: Bucket aggregations
has_children: true
has_toc: false
nav_order: 3
redirect_from:
- /opensearch/bucket-agg/
- /query-dsl/aggregations/bucket-agg/
- /query-dsl/aggregations/bucket/
- /aggregations/bucket-agg/
---
# Bucket aggregations
Bucket aggregations categorize sets of documents as buckets. The type of bucket aggregation determines the bucket for a given document.
You can use bucket aggregations to implement faceted navigation (usually placed as a sidebar on a search result landing page) to help your users filter the results.
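As a rough sketch of faceted navigation, a `terms` aggregation can return one bucket per distinct field value, along with the number of documents in each bucket. The `products` index and the `category` keyword field in the following request are hypothetical and are used only for illustration:

```json
GET /products/_search
{
  "size": 0,
  "aggs": {
    "categories": {
      "terms": {
        "field": "category"
      }
    }
  }
}
```
{% include copy-curl.html %}

Setting `size` to `0` omits the matching documents from the response so that only the buckets are returned.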
## Supported bucket aggregations
OpenSearch supports the following bucket aggregations:
- [Adjacency matrix]({{site.url}}{{site.baseurl}}/aggregations/bucket/adjacency-matrix/)
- [Date histogram]({{site.url}}{{site.baseurl}}/aggregations/bucket/date-histogram/)
- [Date range]({{site.url}}{{site.baseurl}}/aggregations/bucket/date-range/)
- [Diversified sampler]({{site.url}}{{site.baseurl}}/aggregations/bucket/diversified-sampler/)
- [Filter]({{site.url}}{{site.baseurl}}/aggregations/bucket/filter/)
- [Filters]({{site.url}}{{site.baseurl}}/aggregations/bucket/filters/)
- [Geodistance]({{site.url}}{{site.baseurl}}/aggregations/bucket/geo-distance/)
- [Geohash grid]({{site.url}}{{site.baseurl}}/aggregations/bucket/geohash-grid/)
- [Geohex grid]({{site.url}}{{site.baseurl}}/aggregations/bucket/geohex-grid/)
- [Geotile grid]({{site.url}}{{site.baseurl}}/aggregations/bucket/geotile-grid/)
- [Global]({{site.url}}{{site.baseurl}}/aggregations/bucket/global/)
- [Histogram]({{site.url}}{{site.baseurl}}/aggregations/bucket/histogram/)
- [IP range]({{site.url}}{{site.baseurl}}/aggregations/bucket/ip-range/)
- [Missing]({{site.url}}{{site.baseurl}}/aggregations/bucket/missing/)
- [Multi-terms]({{site.url}}{{site.baseurl}}/aggregations/bucket/multi-terms/)
- [Nested]({{site.url}}{{site.baseurl}}/aggregations/bucket/nested/)
- [Range]({{site.url}}{{site.baseurl}}/aggregations/bucket/range/)
- [Reverse nested]({{site.url}}{{site.baseurl}}/aggregations/bucket/reverse-nested/)
- [Sampler]({{site.url}}{{site.baseurl}}/aggregations/bucket/sampler/)
- [Significant terms]({{site.url}}{{site.baseurl}}/aggregations/bucket/significant-terms/)
- [Significant text]({{site.url}}{{site.baseurl}}/aggregations/bucket/significant-text/)
- [Terms]({{site.url}}{{site.baseurl}}/aggregations/bucket/terms/)

View File

@ -4,6 +4,8 @@ title: IP range
parent: Bucket aggregations parent: Bucket aggregations
grand_parent: Aggregations grand_parent: Aggregations
nav_order: 110 nav_order: 110
redirect_from:
- /query-dsl/aggregations/bucket/ip-range/
--- ---
# IP range aggregations # IP range aggregations

View File

@ -4,6 +4,8 @@ title: Missing
parent: Bucket aggregations parent: Bucket aggregations
grand_parent: Aggregations grand_parent: Aggregations
nav_order: 120 nav_order: 120
redirect_from:
- /query-dsl/aggregations/bucket/missing/
--- ---
# Missing aggregations # Missing aggregations

View File

@ -4,6 +4,8 @@ title: Multi-terms
parent: Bucket aggregations parent: Bucket aggregations
grand_parent: Aggregations grand_parent: Aggregations
nav_order: 130 nav_order: 130
redirect_from:
- /query-dsl/aggregations/multi-terms/
--- ---
# Multi-terms aggregations # Multi-terms aggregations

View File

@ -4,6 +4,8 @@ title: Nested
parent: Bucket aggregations parent: Bucket aggregations
grand_parent: Aggregations grand_parent: Aggregations
nav_order: 140 nav_order: 140
redirect_from:
- /query-dsl/aggregations/bucket/nested/
--- ---
# Nested aggregations # Nested aggregations

View File

@ -4,6 +4,8 @@ title: Range
parent: Bucket aggregations parent: Bucket aggregations
grand_parent: Aggregations grand_parent: Aggregations
nav_order: 150 nav_order: 150
redirect_from:
- /query-dsl/aggregations/bucket/range/
--- ---
# Range aggregations # Range aggregations

View File

@ -4,6 +4,8 @@ title: Reverse nested
parent: Bucket aggregations parent: Bucket aggregations
grand_parent: Aggregations grand_parent: Aggregations
nav_order: 160 nav_order: 160
redirect_from:
- /query-dsl/aggregations/bucket/reverse-nested/
--- ---
# Reverse nested aggregations # Reverse nested aggregations

View File

@ -3,9 +3,10 @@ layout: default
title: Aggregations title: Aggregations
has_children: true has_children: true
nav_order: 5 nav_order: 5
permalink: /aggregations/ nav_exclude: true
redirect_from: redirect_from:
- /opensearch/aggregations/ - /opensearch/aggregations/
- /query-dsl/aggregations/
--- ---
# Aggregations # Aggregations

View File

@ -4,6 +4,8 @@ title: Average
parent: Metric aggregations parent: Metric aggregations
grand_parent: Aggregations grand_parent: Aggregations
nav_order: 10 nav_order: 10
redirect_from:
- /query-dsl/aggregations/metric/average/
--- ---
# Average aggregations # Average aggregations

View File

@ -4,6 +4,8 @@ title: Cardinality
parent: Metric aggregations parent: Metric aggregations
grand_parent: Aggregations grand_parent: Aggregations
nav_order: 20 nav_order: 20
redirect_from:
- /query-dsl/aggregations/metric/cardinality/
--- ---
# Cardinality aggregations # Cardinality aggregations

View File

@ -4,6 +4,8 @@ title: Extended stats
parent: Metric aggregations parent: Metric aggregations
grand_parent: Aggregations grand_parent: Aggregations
nav_order: 30 nav_order: 30
redirect_from:
- /query-dsl/aggregations/metric/extended-stats/
--- ---
# Extended stats aggregations # Extended stats aggregations

View File

@ -4,6 +4,8 @@ title: Geobounds
parent: Metric aggregations parent: Metric aggregations
grand_parent: Aggregations grand_parent: Aggregations
nav_order: 40 nav_order: 40
redirect_from:
- /query-dsl/aggregations/metric/geobounds/
--- ---
## Geobounds aggregations ## Geobounds aggregations

View File

@ -0,0 +1,47 @@
---
layout: default
title: Metric aggregations
has_children: true
has_toc: false
nav_order: 2
redirect_from:
- /opensearch/metric-agg/
- /query-dsl/aggregations/metric-agg/
- /aggregations/metric-agg/
- /query-dsl/aggregations/metric/
---
# Metric aggregations
Metric aggregations let you perform simple calculations such as finding the minimum, maximum, and average values of a field.
## Types of metric aggregations
There are two types of metric aggregations: single-value metric aggregations and multi-value metric aggregations.
### Single-value metric aggregations
Single-value metric aggregations return a single metric, for example, `sum`, `min`, `max`, `avg`, `cardinality`, or `value_count`.
### Multi-value metric aggregations
Multi-value metric aggregations return more than one metric. These include `stats`, `extended_stats`, `matrix_stats`, `percentiles`, `percentile_ranks`, `geo_bounds`, `top_hits`, and `scripted_metric`.
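To make the distinction concrete, the following sketch requests one aggregation of each type in a single search. It assumes a hypothetical `products` index with a numeric `price` field:

```json
GET /products/_search
{
  "size": 0,
  "aggs": {
    "avg_price": {
      "avg": {
        "field": "price"
      }
    },
    "price_stats": {
      "stats": {
        "field": "price"
      }
    }
  }
}
```
{% include copy-curl.html %}

In the response, `avg_price` should contain a single `value`, while `price_stats` should contain several metrics (`count`, `min`, `max`, `avg`, and `sum`).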
## Supported metric aggregations
OpenSearch supports the following metric aggregations:
- [Average]({{site.url}}{{site.baseurl}}/aggregations/metric/average/)
- [Cardinality]({{site.url}}{{site.baseurl}}/aggregations/metric/cardinality/)
- [Extended stats]({{site.url}}{{site.baseurl}}/aggregations/metric/extended-stats/)
- [Geobounds]({{site.url}}{{site.baseurl}}/aggregations/metric/geobounds/)
- [Matrix stats]({{site.url}}{{site.baseurl}}/aggregations/metric/matrix-stats/)
- [Maximum]({{site.url}}{{site.baseurl}}/aggregations/metric/maximum/)
- [Minimum]({{site.url}}{{site.baseurl}}/aggregations/metric/minimum/)
- [Percentile ranks]({{site.url}}{{site.baseurl}}/aggregations/metric/percentile-ranks/)
- [Percentile]({{site.url}}{{site.baseurl}}/aggregations/metric/percentile/)
- [Scripted metric]({{site.url}}{{site.baseurl}}/aggregations/metric/scripted-metric/)
- [Stats]({{site.url}}{{site.baseurl}}/aggregations/metric/stats/)
- [Sum]({{site.url}}{{site.baseurl}}/aggregations/metric/sum/)
- [Top hits]({{site.url}}{{site.baseurl}}/aggregations/metric/top-hits/)
- [Value count]({{site.url}}{{site.baseurl}}/aggregations/metric/value-count/)

View File

@ -4,6 +4,8 @@ title: Matrix stats
parent: Metric aggregations parent: Metric aggregations
grand_parent: Aggregations grand_parent: Aggregations
nav_order: 50 nav_order: 50
redirect_from:
- /query-dsl/aggregations/metric/matrix-stats/
--- ---
# Matrix stats aggregations # Matrix stats aggregations

View File

@ -4,6 +4,8 @@ title: Maximum
parent: Metric aggregations parent: Metric aggregations
grand_parent: Aggregations grand_parent: Aggregations
nav_order: 60 nav_order: 60
redirect_from:
- /query-dsl/aggregations/metric/maximum/
--- ---
# Maximum aggregations # Maximum aggregations

View File

@ -4,6 +4,8 @@ title: Minimum
parent: Metric aggregations parent: Metric aggregations
grand_parent: Aggregations grand_parent: Aggregations
nav_order: 70 nav_order: 70
redirect_from:
- /query-dsl/aggregations/metric/minimum/
--- ---
# Minimum aggregations # Minimum aggregations

View File

@ -4,6 +4,8 @@ title: Percentile ranks
parent: Metric aggregations parent: Metric aggregations
grand_parent: Aggregations grand_parent: Aggregations
nav_order: 80 nav_order: 80
redirect_from:
- /query-dsl/aggregations/metric/percentile-ranks/
--- ---
# Percentile rank aggregations # Percentile rank aggregations

View File

@ -4,6 +4,8 @@ title: Percentile
parent: Metric aggregations parent: Metric aggregations
grand_parent: Aggregations grand_parent: Aggregations
nav_order: 90 nav_order: 90
redirect_from:
- /query-dsl/aggregations/metric/percentile/
--- ---
# Percentile aggregations # Percentile aggregations

View File

@ -4,6 +4,8 @@ title: Scripted metric
parent: Metric aggregations parent: Metric aggregations
grand_parent: Aggregations grand_parent: Aggregations
nav_order: 100 nav_order: 100
redirect_from:
- /query-dsl/aggregations/metric/scripted-metric/
--- ---
# Scripted metric aggregations # Scripted metric aggregations

View File

@ -1,9 +1,11 @@
--- ---
layout: default layout: default
title: Stats aggregations title: Stats
parent: Metric aggregations parent: Metric aggregations
grand_parent: Aggregations grand_parent: Aggregations
nav_order: 110 nav_order: 110
redirect_from:
- /query-dsl/aggregations/metric/stats/
--- ---
# Stats aggregations # Stats aggregations

View File

@ -4,6 +4,8 @@ title: Sum
parent: Metric aggregations parent: Metric aggregations
grand_parent: Aggregations grand_parent: Aggregations
nav_order: 120 nav_order: 120
redirect_from:
- /query-dsl/aggregations/metric/sum/
--- ---
# Sum aggregations # Sum aggregations

View File

@ -4,6 +4,8 @@ title: Top hits
parent: Metric aggregations parent: Metric aggregations
grand_parent: Aggregations grand_parent: Aggregations
nav_order: 130 nav_order: 130
redirect_from:
- /query-dsl/aggregations/metric/top-hits/
--- ---
# Top hits aggregations # Top hits aggregations

View File

@ -4,6 +4,8 @@ title: Value count
parent: Metric aggregations parent: Metric aggregations
grand_parent: Aggregations grand_parent: Aggregations
nav_order: 140 nav_order: 140
redirect_from:
- /query-dsl/aggregations/metric/value-count/
--- ---
# Value count aggregations # Value count aggregations

View File

@ -1,12 +1,11 @@
--- ---
layout: default layout: default
title: Pipeline aggregations title: Pipeline aggregations
parent: Aggregations
nav_order: 5 nav_order: 5
permalink: /aggregations/pipeline-agg/
has_children: false has_children: false
redirect_from: redirect_from:
- /opensearch/pipeline-agg/ - /opensearch/pipeline-agg/
- /query-dsl/aggregations/pipeline-agg/
--- ---
# Pipeline aggregations # Pipeline aggregations

View File

@ -0,0 +1,65 @@
---
layout: default
title: Index analyzers
nav_order: 20
---
# Index analyzers
Index analyzers are specified at indexing time and are used to analyze [text]({{site.url}}{{site.baseurl}}/field-types/supported-field-types/text/) fields when indexing a document.
## Determining which index analyzer to use
To determine which analyzer to use for a field when a document is indexed, OpenSearch examines the following parameters in order:
1. The `analyzer` mapping parameter of the field
1. The `analysis.analyzer.default` index setting
1. The `standard` analyzer (default)
When specifying an index analyzer, keep in mind that in most cases, specifying an analyzer for each `text` field in an index works best. Analyzing both the text field (at indexing time) and the query string (at query time) with the same analyzer ensures that the search uses the same terms as those that are stored in the index.
{: .important }
For information about verifying which analyzer is associated with which field, see [Verifying analyzer settings]({{site.url}}{{site.baseurl}}/analyzers/index/#verifying-analyzer-settings).
## Specifying an index analyzer for a field
When creating index mappings, you can supply the `analyzer` parameter for each [text]({{site.url}}{{site.baseurl}}/field-types/supported-field-types/text/) field. For example, the following request specifies the `simple` analyzer for the `text_entry` field:
```json
PUT testindex
{
  "mappings": {
    "properties": {
      "text_entry": {
        "type": "text",
        "analyzer": "simple"
      }
    }
  }
}
```
{% include copy-curl.html %}
## Specifying a default index analyzer for an index
If you want to use the same analyzer for all text fields in an index, you can specify it in the `analysis.analyzer.default` setting as follows:
```json
PUT testindex
{
  "settings": {
    "analysis": {
      "analyzer": {
        "default": {
          "type": "simple"
        }
      }
    }
  }
}
```
{% include copy-curl.html %}
If you don't specify a default analyzer, the `standard` analyzer is used.
{: .note}
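To illustrate the lookup order described on this page, the following sketch combines a field-level analyzer with an index-level default. It assumes a hypothetical index named `testindex-precedence` with two made-up fields: `title`, which sets its own `analyzer`, and `body`, which does not:

```json
PUT testindex-precedence
{
  "settings": {
    "analysis": {
      "analyzer": {
        "default": {
          "type": "simple"
        }
      }
    }
  },
  "mappings": {
    "properties": {
      "title": {
        "type": "text",
        "analyzer": "whitespace"
      },
      "body": {
        "type": "text"
      }
    }
  }
}
```
{% include copy-curl.html %}

Under these assumptions, `title` should be analyzed with the `whitespace` analyzer at indexing time, while `body` should fall back to the index default, `simple`.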

_analyzers/index.md Normal file
View File

@ -0,0 +1,163 @@
---
layout: default
title: Text analysis
has_children: true
nav_order: 5
nav_exclude: true
has_toc: false
redirect_from:
- /opensearch/query-dsl/text-analyzers/
- /query-dsl/analyzers/text-analyzers/
- /analyzers/text-analyzers/
---
# Text analysis
When you search documents using full-text search, you want to receive all relevant results, not only exact matches. If you're looking for "walk", you're interested in results that contain any form of the word, such as "Walk", "walked", or "walking". To facilitate full-text search, OpenSearch uses text analysis.
Text analysis consists of the following steps:
1. _Tokenize_ text into terms: For example, after tokenization, the phrase `Actions speak louder than words` is split into tokens `Actions`, `speak`, `louder`, `than`, and `words`.
1. _Normalize_ the terms by converting them into a standard format, such as by lowercasing them or performing stemming (reducing a word to its root): For example, after normalization, `Actions` becomes `action`, `louder` becomes `loud`, and `words` becomes `word`.
## Analyzers
In OpenSearch, text analysis is performed by an _analyzer_. Each analyzer contains the following sequentially applied components:
1. **Character filters**: First, a character filter receives the original text as a stream of characters and adds, removes, or modifies characters in the text. For example, a character filter can strip HTML characters from a string so that the text `<p><b>Actions</b> speak louder than <em>words</em></p>` becomes `\nActions speak louder than words\n`. The output of a character filter is a stream of characters.
1. **Tokenizer**: Next, a tokenizer receives the stream of characters that has been processed by the character filter and splits the text into individual _tokens_ (usually, words). For example, a tokenizer can split text on white space so that the preceding text becomes [`Actions`, `speak`, `louder`, `than`, `words`]. Tokenizers also maintain metadata about tokens, such as their starting and ending positions in the text. The output of a tokenizer is a stream of tokens.
1. **Token filters**: Last, a token filter receives the stream of tokens from the tokenizer and adds, removes, or modifies tokens. For example, a token filter may lowercase the tokens so that `Actions` becomes `actions`, remove stop words like `than`, or add synonyms like `talk` for the word `speak`.
An analyzer must contain exactly one tokenizer and may contain zero or more character filters and zero or more token filters.
{: .note}
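One way to observe these three components applied in sequence is to pass them directly to the Analyze API instead of naming an analyzer. The following request is a minimal sketch that assumes the built-in `html_strip` character filter, `whitespace` tokenizer, and `lowercase` token filter:

```json
GET /_analyze
{
  "char_filter": ["html_strip"],
  "tokenizer": "whitespace",
  "filter": ["lowercase"],
  "text": "<p><b>Actions</b> speak louder than <em>words</em></p>"
}
```
{% include copy-curl.html %}

With this combination, the returned tokens should be `actions`, `speak`, `louder`, `than`, and `words`.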
## Built-in analyzers
The following table lists the built-in analyzers that OpenSearch provides. The last column of the table contains the result of applying the analyzer to the string `It’s fun to contribute a brand-new PR or 2 to OpenSearch!`.
Analyzer | Analysis performed | Analyzer output
:--- | :--- | :---
**Standard** (default) | - Parses strings into tokens at word boundaries <br> - Removes most punctuation <br> - Converts tokens to lowercase | [`it’s`, `fun`, `to`, `contribute`, `a`, `brand`, `new`, `pr`, `or`, `2`, `to`, `opensearch`]
**Simple** | - Parses strings into tokens on any non-letter character <br> - Removes non-letter characters <br> - Converts tokens to lowercase | [`it`, `s`, `fun`, `to`, `contribute`, `a`, `brand`, `new`, `pr`, `or`, `to`, `opensearch`]
**Whitespace** | - Parses strings into tokens on white space | [`It’s`, `fun`, `to`, `contribute`, `a`, `brand-new`, `PR`, `or`, `2`, `to`, `OpenSearch!`]
**Stop** | - Parses strings into tokens on any non-letter character <br> - Removes non-letter characters <br> - Removes stop words <br> - Converts tokens to lowercase | [`s`, `fun`, `contribute`, `brand`, `new`, `pr`, `opensearch`]
**Keyword** (noop) | - Outputs the entire string unchanged | [`It’s fun to contribute a brand-new PR or 2 to OpenSearch!`]
**Pattern** | - Parses strings into tokens using regular expressions <br> - Supports converting strings to lowercase <br> - Supports removing stop words | [`it`, `s`, `fun`, `to`, `contribute`, `a`, `brand`, `new`, `pr`, `or`, `2`, `to`, `opensearch`]
[**Language**]({{site.url}}/{{site.baseurl}}/analyzers/language-analyzers/) | Performs analysis specific to a certain language (for example, `english`). | [`fun`, `contribut`, `brand`, `new`, `pr`, `2`, `opensearch`]
**Fingerprint** | - Parses strings on any non-letter character <br> - Normalizes characters by converting them to ASCII <br> - Converts tokens to lowercase <br> - Sorts, deduplicates, and concatenates tokens into a single token <br> - Supports removing stop words | [`2 a brand contribute fun it's new opensearch or pr to`] <br> Note that the apostrophe was converted to its ASCII counterpart.
## Custom analyzers
If needed, you can combine tokenizers, token filters, and character filters to create a custom analyzer.
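As a minimal sketch, a custom analyzer can be defined in the index settings with the `custom` type. The index name `my-custom-index` and the analyzer name `my_custom_analyzer` are hypothetical, and the particular combination of built-in components is only one possible choice:

```json
PUT my-custom-index
{
  "settings": {
    "analysis": {
      "analyzer": {
        "my_custom_analyzer": {
          "type": "custom",
          "char_filter": ["html_strip"],
          "tokenizer": "standard",
          "filter": ["lowercase", "stop"]
        }
      }
    }
  }
}
```
{% include copy-curl.html %}

The analyzer can then be referenced by name in the `analyzer` mapping parameter of a text field in the same index.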
## Text analysis at indexing time and query time
OpenSearch performs text analysis on text fields when you index a document and when you send a search request. Analyzers are classified according to when they are applied:
- An _index analyzer_ performs analysis at indexing time: When you are indexing a [text]({{site.url}}{{site.baseurl}}/field-types/supported-field-types/text/) field, OpenSearch analyzes it before indexing it. For more information about ways to specify index analyzers, see [Index analyzers]({{site.url}}{{site.baseurl}}/analyzers/index-analyzers/).
- A _search analyzer_ performs analysis at query time: OpenSearch analyzes the query string when you run a full-text query on a text field. For more information about ways to specify search analyzers, see [Search analyzers]({{site.url}}{{site.baseurl}}/analyzers/search-analyzers/).
In most cases, you should use the same analyzer at both indexing and search time so that the text field and the query string are analyzed in the same way and the resulting tokens match as expected.
{: .tip}
### Example
When you index a document that has a text field with the text `Actions speak louder than words`, OpenSearch analyzes the text and produces the following list of tokens:
Text field tokens = [`action`, `speak`, `loud`, `than`, `word`]
When you search for documents that match the query `speaking loudly`, OpenSearch analyzes the query string and produces the following list of tokens:
Query string tokens = [`speak`, `loud`]
Then OpenSearch compares each token in the query string against the list of text field tokens and finds that both lists contain the tokens `speak` and `loud`, so OpenSearch returns this document as part of the search results that match the query.
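The token lists in this example are illustrative. To inspect the tokens that a particular analyzer actually produces for both strings, you can send them to the Analyze API in one request. The following sketch assumes the built-in `english` analyzer. The exact tokens depend on the analyzer's stemmer and may differ from the preceding lists:

```json
GET /_analyze
{
  "analyzer": "english",
  "text": ["Actions speak louder than words", "speaking loudly"]
}
```
{% include copy-curl.html %}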
## Testing an analyzer
To test a built-in analyzer and view the list of tokens it generates when a document is indexed, you can use the [Analyze API]({{site.url}}{{site.baseurl}}/api-reference/analyze-apis/#apply-a-built-in-analyzer).
Specify the analyzer and the text to be analyzed in the request:
```json
GET /_analyze
{
  "analyzer" : "standard",
  "text" : "Let’s contribute to OpenSearch!"
}
```
{% include copy-curl.html %}
The following image shows the query string.
![Query string with indices]({{site.url}}{{site.baseurl}}/images/string-indices.png)
The response contains each token along with its start and end offsets, which correspond to the token's starting index in the original string (inclusive) and ending index (exclusive):
```json
{
  "tokens": [
    {
      "token": "let’s",
      "start_offset": 0,
      "end_offset": 5,
      "type": "<ALPHANUM>",
      "position": 0
    },
    {
      "token": "contribute",
      "start_offset": 6,
      "end_offset": 16,
      "type": "<ALPHANUM>",
      "position": 1
    },
    {
      "token": "to",
      "start_offset": 17,
      "end_offset": 19,
      "type": "<ALPHANUM>",
      "position": 2
    },
    {
      "token": "opensearch",
      "start_offset": 20,
      "end_offset": 30,
      "type": "<ALPHANUM>",
      "position": 3
    }
  ]
}
```
## Verifying analyzer settings
To verify which analyzer is associated with which field, you can use the get mapping API operation:
```json
GET /testindex/_mapping
```
{% include copy-curl.html %}
The response provides information about the analyzers for each field:
```json
{
  "testindex": {
    "mappings": {
      "properties": {
        "text_entry": {
          "type": "text",
          "analyzer": "simple",
          "search_analyzer": "whitespace"
        }
      }
    }
  }
}
```
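As a complementary check, you can ask the Analyze API to analyze text using the analyzer configured for a specific field by providing the `field` parameter on an index-level request. The following sketch assumes that `testindex` and its `text_entry` field exist with the mapping shown in the preceding response:

```json
GET /testindex/_analyze
{
  "field": "text_entry",
  "text": "Let’s contribute to OpenSearch!"
}
```
{% include copy-curl.html %}

Because `text_entry` is mapped with the `simple` analyzer in this example, the request should split the text on non-letter characters and lowercase the resulting tokens.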
## Next steps
- Learn more about specifying [index analyzers]({{site.url}}{{site.baseurl}}/analyzers/index-analyzers/) and [search analyzers]({{site.url}}{{site.baseurl}}/analyzers/search-analyzers/).

View File

@ -1,14 +1,13 @@
--- ---
layout: default layout: default
title: Language analyzers title: Language analyzers
nav_order: 45 nav_order: 10
parent: Text analyzers
--- ---
# Language analyzer # Language analyzer
OpenSearch supports the following language values with the `analyzer` option: OpenSearch supports the following language values with the `analyzer` option:
arabic, armenian, basque, bengali, brazilian, bulgarian, catalan, czech, danish, dutch, english, estonian, finnish, french, galician, german, greek, hindi, hungarian, indonesian, irish, italian, latvian, lithuanian, norwegian, persian, portuguese, romanian, russian, sorani, spanish, swedish, turkish, and thai. `arabic`, `armenian`, `basque`, `bengali`, `brazilian`, `bulgarian`, `catalan`, `czech`, `danish`, `dutch`, `english`, `estonian`, `finnish`, `french`, `galician`, `german`, `greek`, `hindi`, `hungarian`, `indonesian`, `irish`, `italian`, `latvian`, `lithuanian`, `norwegian`, `persian`, `portuguese`, `romanian`, `russian`, `sorani`, `spanish`, `swedish`, `turkish`, and `thai`.
To use the analyzer when you map an index, specify the value within your query. For example, to map your index with the French language analyzer, specify the `french` value for the analyzer field: To use the analyzer when you map an index, specify the value within your query. For example, to map your index with the French language analyzer, specify the `french` value for the analyzer field:
@ -18,11 +17,10 @@ To use the analyzer when you map an index, specify the value within your query.
#### Example request #### Example request
The following query maps an index with the language analyzer set to `french`: The following query specifies the `french` language analyzer for the index `my-index`:
```json ```json
PUT my-index-000001 PUT my-index
{ {
"mappings": { "mappings": {
"properties": { "properties": {

View File

@ -0,0 +1,93 @@
---
layout: default
title: Search analyzers
nav_order: 30
---
# Search analyzers
Search analyzers are specified at query time and are used to analyze the query string when you run a full-text query on a [text]({{site.url}}{{site.baseurl}}/field-types/supported-field-types/text/) field.
## Determining which search analyzer to use
To determine which analyzer to use for a query string at query time, OpenSearch examines the following parameters in order:
1. The `analyzer` parameter of the query
1. The `search_analyzer` mapping parameter of the field
1. The `analysis.analyzer.default_search` index setting
1. The `analyzer` mapping parameter of the field
1. The `standard` analyzer (default)
In most cases, specifying a search analyzer that is different from the index analyzer is not necessary and could negatively impact search result relevance or lead to unexpected search results.
{: .warning}
For information about verifying which analyzer is associated with which field, see [Verifying analyzer settings]({{site.url}}{{site.baseurl}}/analyzers/index/#verifying-analyzer-settings).
## Specifying a search analyzer for a query string
Specify the name of the analyzer you want to use at query time in the `analyzer` field:
```json
GET shakespeare/_search
{
  "query": {
    "match": {
      "text_entry": {
        "query": "speak the truth",
        "analyzer": "english"
      }
    }
  }
}
```
{% include copy-curl.html %}
Valid values for [built-in analyzers]({{site.url}}/{{site.baseurl}}/analyzers/index/#built-in-analyzers/) are `standard`, `simple`, `whitespace`, `stop`, `keyword`, `pattern`, `fingerprint`, or any supported [language analyzer]({{site.url}}/{{site.baseurl}}/analyzers/index/language-analyzers/).
## Specifying a search analyzer for a field
When creating index mappings, you can provide the `search_analyzer` parameter for each [text]({{site.url}}{{site.baseurl}}/field-types/supported-field-types/text/) field. When providing the `search_analyzer`, you must also provide the `analyzer` parameter, which specifies the [index analyzer]({{site.url}}/{{site.baseurl}}/analyzers/index-analyzers/) to be used at indexing time.
For example, the following request specifies the `simple` analyzer as the index analyzer and the `whitespace` analyzer as the search analyzer for the `text_entry` field:
```json
PUT testindex
{
  "mappings": {
    "properties": {
      "text_entry": {
        "type": "text",
        "analyzer": "simple",
        "search_analyzer": "whitespace"
      }
    }
  }
}
```
{% include copy-curl.html %}
## Specifying the default search analyzer for an index
If you want to analyze all query strings at search time with the same analyzer, you can specify the search analyzer in the `analysis.analyzer.default_search` setting. When providing the `analysis.analyzer.default_search`, you must also provide the `analysis.analyzer.default` parameter, which specifies the [index analyzer]({{site.url}}/{{site.baseurl}}/analyzers/index-analyzers/) to be used at indexing time.
For example, the following request specifies the `simple` analyzer as the index analyzer and the `whitespace` analyzer as the search analyzer for the `testindex` index:
```json
PUT testindex
{
  "settings": {
    "analysis": {
      "analyzer": {
        "default": {
          "type": "simple"
        },
        "default_search": {
          "type": "whitespace"
        }
      }
    }
  }
}
```
{% include copy-curl.html %}
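
One way to check that both defaults were applied is to retrieve the index settings. This is a sketch that assumes the `testindex` index was created with the preceding request:

```json
GET testindex/_settings
```
{% include copy-curl.html %}

The `analysis.analyzer` section of the response should contain the `default` and `default_search` entries.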

View File

@ -1,16 +1,20 @@
--- ---
layout: default layout: default
title: Perform text analysis title: Analyze API
parent: Analyze API has_children: true
nav_order: 7
nav_order: 2 redirect_from:
- /opensearch/rest-api/analyze-apis/
- /api-reference/analyze-apis/
--- ---
# Perform text analysis # Analyze API
The perform text analysis API analyzes a text string and returns the resulting tokens. The Analyze API allows you to perform [text analysis]({{site.url}}{{site.baseurl}}/api-reference/analyze-apis/), which is the process of converting unstructured text into individual tokens (usually words) that are optimized for search.
If you use the Security plugin, you must have the `manage index` privilege. If you simply want to analyze text, you must have the `manager cluster` privilege. The Analyze API analyzes a text string and returns the resulting tokens.
If you use the Security plugin, you must have the `manage index` privilege. If you only want to analyze text, you must have the `manage cluster` privilege.
{: .note} {: .note}
## Path and HTTP methods ## Path and HTTP methods
@ -22,7 +26,7 @@ POST /_analyze
POST /{index}/_analyze POST /{index}/_analyze
``` ```
Although you can issue an analyzer request via both `GET` and `POST` requests, the two have important distinctions. A `GET` request causes data to be cached in the index so that the next time the data is requested, it is retrieved faster. A `POST` request sends a string that does not already exist to the analyzer to be compared to data that is already in the index. `POST` requests are not cached. Although you can issue an analyze request using both `GET` and `POST` requests, the two have important distinctions. A `GET` request causes data to be cached in the index so that the next time the data is requested, it is retrieved faster. A `POST` request sends a string that does not already exist to the analyzer to be compared with data that is already in the index. `POST` requests are not cached.
{: .note} {: .note}
## Path parameter ## Path parameter

View File

@ -1,12 +0,0 @@
---
layout: default
title: Analyze API
has_children: true
nav_order: 7
redirect_from:
- /opensearch/rest-api/analyze-apis/
---
# Analyze API
The analyze API allows you to perform text analysis, which is the process of converting unstructured text into individual tokens (usually words) that are optimized for search.

View File

@ -20,7 +20,7 @@ If needed, you can combine tokenizers, token filters, and character filters to c
#### Tokenizers #### Tokenizers
Tokenizers break unstuctured text into tokens and maintain metadata about tokens, such as their start and ending positions in the text. Tokenizers break unstructured text into tokens and maintain metadata about tokens, such as their starting and ending positions in the text.
#### Character filters #### Character filters

View File

@ -18,7 +18,7 @@ This reference includes the REST APIs supported by OpenSearch. If a REST API is
## Related articles ## Related articles
- [Analyze API]({{site.url}}{{site.baseurl}}/api-reference/analyze-apis/index/) - [Analyze API]({{site.url}}{{site.baseurl}}/api-reference/analyze-apis/)
- [Access control API]({{site.url}}{{site.baseurl}}/security/access-control/api/) - [Access control API]({{site.url}}{{site.baseurl}}/security/access-control/api/)
- [Alerting API]({{site.url}}{{site.baseurl}}/observing-your-data/alerting/api/) - [Alerting API]({{site.url}}{{site.baseurl}}/observing-your-data/alerting/api/)
- [Anomaly detection API]({{site.url}}{{site.baseurl}}/observing-your-data/ad/api/) - [Anomaly detection API]({{site.url}}{{site.baseurl}}/observing-your-data/ad/api/)
@ -43,7 +43,8 @@ This reference includes the REST APIs supported by OpenSearch. If a REST API is
- [Point in Time API]({{site.url}}{{site.baseurl}}/search-plugins/point-in-time-api/) - [Point in Time API]({{site.url}}{{site.baseurl}}/search-plugins/point-in-time-api/)
- [Popular APIs]({{site.url}}{{site.baseurl}}/api-reference/popular-api/) - [Popular APIs]({{site.url}}{{site.baseurl}}/api-reference/popular-api/)
- [Ranking evaluation]({{site.url}}{{site.baseurl}}/api-reference/rank-eval/) - [Ranking evaluation]({{site.url}}{{site.baseurl}}/api-reference/rank-eval/)
- [Reload search analyzer]({{site.url}}{{site.baseurl}}/api-reference/reload-search-analyzer/) - [Refresh search analyzer]({{site.url}}{{site.baseurl}}/im-plugin/refresh-analyzer/)
- [Reload search analyzer]({{site.url}}{{site.baseurl}}/im-plugin/reload-search-analyzer/)
- [Remove cluster information]({{site.url}}{{site.baseurl}}/api-reference/remote-info/) - [Remove cluster information]({{site.url}}{{site.baseurl}}/api-reference/remote-info/)
- [Root cause analysis API]({{site.url}}{{site.baseurl}}/monitoring-your-cluster/pa/rca/api/) - [Root cause analysis API]({{site.url}}{{site.baseurl}}/monitoring-your-cluster/pa/rca/api/)
- [Snapshot management API]({{site.url}}{{site.baseurl}}/tuning-your-cluster/availability-and-recovery/snapshots/sm-api/) - [Snapshot management API]({{site.url}}{{site.baseurl}}/tuning-your-cluster/availability-and-recovery/snapshots/sm-api/)

View File

@ -70,9 +70,15 @@ collections:
reporting: reporting:
permalink: /:collection/:path/ permalink: /:collection/:path/
output: true output: true
analyzers:
permalink: /:collection/:path/
output: true
query-dsl: query-dsl:
permalink: /:collection/:path/ permalink: /:collection/:path/
output: true output: true
aggregations:
permalink: /:collection/:path/
output: true
field-types: field-types:
permalink: /:collection/:path/ permalink: /:collection/:path/
output: true output: true
@ -133,8 +139,14 @@ just_the_docs:
field-types: field-types:
name: Mappings and field types name: Mappings and field types
nav_fold: true nav_fold: true
analyzers:
name: Text analysis
nav_fold: true
query-dsl: query-dsl:
name: Query DSL, Aggregations, and Analyzers name: Query DSL
nav_fold: true
aggregations:
name: Aggregations
nav_fold: true nav_fold: true
search-plugins: search-plugins:
name: Search name: Search

View File

@ -52,7 +52,7 @@ The flat object field type supports the following queries:
- [Range]({{site.url}}{{site.baseurl}}/query-dsl/term#range) - [Range]({{site.url}}{{site.baseurl}}/query-dsl/term#range)
- [Match]({{site.url}}{{site.baseurl}}/query-dsl/full-text/#match) - [Match]({{site.url}}{{site.baseurl}}/query-dsl/full-text/#match)
- [Multi-match]({{site.url}}{{site.baseurl}}/query-dsl/full-text/#multi-match) - [Multi-match]({{site.url}}{{site.baseurl}}/query-dsl/full-text/#multi-match)
- [Query string]({{site.url}}{{site.baseurl}}/query-dsl/full-text/#query-string) - [Query string]({{site.url}}{{site.baseurl}}/query-dsl/full-text/query-string/)
- [Simple query string]({{site.url}}{{site.baseurl}}/query-dsl/full-text/#simple-query-string) - [Simple query string]({{site.url}}{{site.baseurl}}/query-dsl/full-text/#simple-query-string)
- [Exists]({{site.url}}{{site.baseurl}}/query-dsl/term#exists) - [Exists]({{site.url}}{{site.baseurl}}/query-dsl/term#exists)

View File

@ -2,10 +2,9 @@
layout: default layout: default
title: Refresh search analyzer title: Refresh search analyzer
nav_order: 50 nav_order: 50
parent: Text analyzers
has_toc: false has_toc: false
redirect_from: redirect_from:
- /im-plugin/refresh-analyzer/ - /query-dsl/analyzers/refresh-analyzer/
- /im-plugin/refresh-analyzer/index/ - /im-plugin/refresh-analyzer/index/
--- ---

View File

@ -62,7 +62,7 @@ You've created a new visualization that can be added to a new or existing dashbo
### Limitations of event analytics visualizations ### Limitations of event analytics visualizations
Event analytics visualizations currently do not support [Dashboards Query Language (DQL)]({{site.url}}{{site.baseurl}}/dashboards/discover/dql/) or [query domain-specific language (DSL)]({{site.url}}{{site.baseurl}}/query-dsl/), and they do not use index patterns. Note the following limitations: Event analytics visualizations currently do not support [Dashboards Query Language (DQL)]({{site.url}}{{site.baseurl}}/dashboards/discover/dql/) or [query domain-specific language (DSL)]({{site.url}}{{site.baseurl}}/query-dsl/index/), and they do not use index patterns. Note the following limitations:
- Event analytics visualizations only use filters created using the dropdown interface. If you have DQL query or DSL filters in a dashboard, the visualizations do not use them. - Event analytics visualizations only use filters created using the dropdown interface. If you have DQL query or DSL filters in a dashboard, the visualizations do not use them.
- The **Dashboard** filter dropdown interface only shows fields from the default index pattern or index patterns used by other visualizations in the same dashboard. - The **Dashboard** filter dropdown interface only shows fields from the default index pattern or index patterns used by other visualizations in the same dashboard.

View File

@ -1,18 +0,0 @@
---
layout: default
title: Bucket aggregations
parent: Aggregations
has_children: true
has_toc: true
nav_order: 3
redirect_from:
- /opensearch/bucket-agg/
- /query-dsl/aggregations/bucket-agg/
- /aggregations/bucket-agg/
---
# Bucket aggregations
Bucket aggregations categorize sets of documents as buckets. The type of bucket aggregation determines whether a given document falls into a bucket or not.
You can use bucket aggregations to implement faceted navigation (usually placed as a sidebar on a search result landing page) to help your users narrow down the results.

View File

@ -1,28 +0,0 @@
---
layout: default
title: Metric aggregations
parent: Aggregations
has_children: true
has_toc: true
nav_order: 2
redirect_from:
- /opensearch/metric-agg/
- /query-dsl/aggregations/metric-agg/
- /aggregations/metric-agg/
---
# Metric aggregations
Metric aggregations let you perform simple calculations such as finding the minimum, maximum, and average values of a field.
## Types of metric aggregations
Metric aggregations are of two types: single-value metric aggregations and multi-value metric aggregations.
### Single-value metric aggregations
Single-value metric aggregations return a single metric. For example, `sum`, `min`, `max`, `avg`, `cardinality`, and `value_count`.
### Multi-value metric aggregations
Multi-value metric aggregations return more than one metric. For example, `stats`, `extended_stats`, `matrix_stats`, `percentile`, `percentile_ranks`, `geo_bound`, `top_hits`, and `scripted_metric`.

View File

@ -1,75 +0,0 @@
---
layout: default
title: Text analyzers
nav_order: 190
has_children: true
permalink: /analyzers/text-analyzers/
redirect_from:
- /opensearch/query-dsl/text-analyzers/
- /query-dsl/analyzers/text-analyzers/
---
# Optimizing text for searches with text analyzers
OpenSearch applies text analysis during indexing or searching for `text` fields. There is a standard analyzer that OpenSearch uses by default for text analysis. To optimize unstructured text for search, you can convert it into structured text with our text analyzers.
## Text analyzers
OpenSearch provides several text analyzers to convert your structured text into the format that works best for your searches.
OpenSearch supports the following text analyzers:
- **Standard analyzer** Parses strings into terms at word boundaries according to the Unicode text segmentation algorithm. It removes most, but not all, punctuation and converts strings to lowercase. You can remove stop words if you enable that option, but it does not remove stop words by default.
- **Simple analyzer** Converts strings to lowercase and removes non-letter characters when it splits a string into tokens on any non-letter character.
- **Whitespace analyzer** Parses strings into terms between each whitespace.
- **Stop analyzer** Converts strings to lowercase and removes non-letter characters by splitting strings into tokens at each non-letter character. It also removes stop words (for example, "but" or "this") from strings.
- **Keyword analyzer** Receives a string as input and outputs the entire string as one term.
- **Pattern analyzer** Splits strings into terms using regular expressions and supports converting strings to lowercase. It also supports removing stop words.
- **Language analyzer** Provides analyzers specific to multiple languages.
- **Fingerprint analyzer** Creates a fingerprint to use as a duplicate detector.
The full specialized text analyzers reference is in progress and will be published soon.
{: .note }
## How to use text analyzers
If you want to use a text analyzer, specify the name of the analyzer for the `analyzer` field: standard, simple, whitespace, stop, keyword, pattern, fingerprint, or language.
Each analyzer consists of one tokenizer and zero or more token filters. Different analyzers have different character filters, tokenizers, and token filters. To pre-process the string before the tokenizer is applied, you can use one or more character filters.
#### Example: Specify the standard analyzer in a simple query
```json
GET _search
{
"query": {
"match": {
"title": "A brief history of Time",
"analyzer": "standard"
}
}
}
```
## Analyzer options
Option | Valid values | Description
:--- | :--- | :---
`analyzer` | `standard, simple, whitespace, stop, keyword, pattern, language, fingerprint` | The analyzer you want to use for the query. Different analyzers have different character filters, tokenizers, and token filters. The `stop` analyzer, for example, removes stop words (for example, "an," "but," "this") from the query string. For a full list of acceptable language values, see [Language analyzer]({{site.url}}{{site.baseurl}}/query-dsl/analyzers/language-analyzers/) on this page.
`quote_analyzer` | String | This option lets you choose to use the standard analyzer without any options, such as `language` or other analyzers. Usage is `"quote_analyzer": "standard"`.
<!-- This is a list of the 7 individual new pages we need to write
If you want to select one of the text analyzers, see [Text analyzers reference]({{site.url}}{{site.baseurl}}/opensearch/query-dsl/specialized-analyzers).
## Specialized text analyzers
1. Standard analyzer
1. Simple
1. Whitespace
1. Stop
1. Keyword
1. Pattern
1. Language
1. Fingerprint
-->

View File

@ -4,10 +4,10 @@ title: Boolean queries
parent: Compound queries parent: Compound queries
grand_parent: Query DSL grand_parent: Query DSL
nav_order: 10 nav_order: 10
permalink: /query-dsl/compound/bool/
redirect_from: redirect_from:
- /opensearch/query-dsl/compound/bool/ - /opensearch/query-dsl/compound/bool/
- /opensearch/query-dsl/bool/ - /opensearch/query-dsl/bool/
- /query-dsl/query-dsl/compound/bool/
--- ---
# Boolean queries # Boolean queries

View File

@ -4,6 +4,8 @@ title: Boosting queries
parent: Compound queries parent: Compound queries
grand_parent: Query DSL grand_parent: Query DSL
nav_order: 30 nav_order: 30
redirect_from:
- /query-dsl/query-dsl/compound/boosting/
--- ---
# Boosting queries # Boosting queries

View File

@ -4,6 +4,8 @@ title: Constant score queries
parent: Compound queries parent: Compound queries
grand_parent: Query DSL grand_parent: Query DSL
nav_order: 40 nav_order: 40
redirect_from:
- /query-dsl/query-dsl/compound/constant-score/
--- ---
# Constant score queries # Constant score queries

View File

@ -4,6 +4,8 @@ title: Disjunction max queries
parent: Compound queries parent: Compound queries
grand_parent: Query DSL grand_parent: Query DSL
nav_order: 50 nav_order: 50
redirect_from:
- /query-dsl/query-dsl/compound/disjunction-max/
--- ---
# Disjunction max queries # Disjunction max queries

View File

@ -5,6 +5,8 @@ parent: Compound queries
grand_parent: Query DSL grand_parent: Query DSL
nav_order: 60 nav_order: 60
has_math: true has_math: true
redirect_from:
- /query-dsl/query-dsl/compound/function-score/
--- ---
# Function score queries # Function score queries

View File

@ -1,12 +1,11 @@
--- ---
layout: default layout: default
title: Compound queries title: Compound queries
parent: Query DSL
has_children: true has_children: true
nav_order: 40 nav_order: 40
permalink: /query-dsl/compound/
redirect_from: redirect_from:
- /opensearch/query-dsl/compound/index/ - /opensearch/query-dsl/compound/index/
- /query-dsl/query-dsl/compound/
--- ---
# Compound queries # Compound queries

View File

@ -1,13 +1,13 @@
--- ---
layout: default layout: default
title: Full-text queries title: Full-text queries
parent: Query DSL
has_children: true has_children: true
nav_order: 30 nav_order: 30
permalink: /query-dsl/full-text/
redirect_from: redirect_from:
- /opensearch/query-dsl/full-text/ - /opensearch/query-dsl/full-text/
- /opensearch/query-dsl/full-text/index/ - /opensearch/query-dsl/full-text/index/
- /query-dsl/query-dsl/full-text/
- /query-dsl/full-text/
--- ---
# Full-text queries # Full-text queries
@ -20,9 +20,6 @@ To learn more about search query classes, see [Lucene query JavaDocs](https://lu
The full-text query types shown in this section use the standard analyzer, which analyzes text automatically when the query is submitted. The full-text query types shown in this section use the standard analyzer, which analyzes text automatically when the query is submitted.
You can also analyze fields when you index them. To learn more about how to convert unstructured text into structured text that is optimized for search, see [Optimizing text for searches with text analyzers]({{site.url}}{{site.baseurl}}/opensearch/query-dsl/text-analyzers).
{: .note }
<!-- to do: rewrite query type definitions per issue: https://github.com/opensearch-project/documentation-website/issues/1116 <!-- to do: rewrite query type definitions per issue: https://github.com/opensearch-project/documentation-website/issues/1116
--> -->
--- ---
@ -428,7 +425,7 @@ GET _search
--> -->
## Advanced filter options ## Advanced filter options
You can filter your query results by using some of the optional query fields, such as wildcards, fuzzy query fields, and synonyms. You can also use analyzers as optional query fields. To learn more, see [How to use text analyzers]({{site.url}}{{site.baseurl}}/opensearch/query-dsl/text-analyzers/#how-to-use-text-analyzers). You can filter your query results by using some of the optional query fields, such as wildcards, fuzzy query fields, or synonyms. You can also use analyzers as optional query fields.
### Wildcard options ### Wildcard options

View File

@ -4,9 +4,10 @@ title: Query string queries
parent: Full-text queries parent: Full-text queries
grand_parent: Query DSL grand_parent: Query DSL
nav_order: 25 nav_order: 25
permalink: /query-dsl/full-text/query-string/
redirect_from: redirect_from:
- /opensearch/query-dsl/full-text/query-string/ - /opensearch/query-dsl/full-text/query-string/
- /query-dsl/query-dsl/full-text/query-string/
--- ---
# Query string queries # Query string queries

View File

@ -4,9 +4,9 @@ title: Geo-bounding box queries
parent: Geographic and xy queries parent: Geographic and xy queries
grand_parent: Query DSL grand_parent: Query DSL
nav_order: 10 nav_order: 10
permalink: /query-dsl/geo-and-xy/geo-bounding-box/
redirect_from: redirect_from:
- /opensearch/query-dsl/geo-and-xy/geo-bounding-box/ - /opensearch/query-dsl/geo-and-xy/geo-bounding-box/
- /query-dsl/query-dsl/geo-and-xy/geo-bounding-box/
--- ---
# Geo-bounding box queries # Geo-bounding box queries

View File

@ -1,12 +1,12 @@
--- ---
layout: default layout: default
title: Geographic and xy queries title: Geographic and xy queries
parent: Query DSL
has_children: true has_children: true
nav_order: 50 nav_order: 50
permalink: /query-dsl/geo-and-xy/
redirect_from: redirect_from:
- /opensearch/query-dsl/geo-and-xy/index/ - /opensearch/query-dsl/geo-and-xy/index/
- /query-dsl/query-dsl/geo-and-xy/
- /query-dsl/query-dsl/geo-and-xy/index/
--- ---
# Geographic and xy queries # Geographic and xy queries

View File

@ -4,9 +4,10 @@ title: xy queries
parent: Geographic and xy queries parent: Geographic and xy queries
grand_parent: Query DSL grand_parent: Query DSL
nav_order: 50 nav_order: 50
permalink: /query-dsl/geo-and-xy/xy/
redirect_from: redirect_from:
- /opensearch/query-dsl/geo-and-xy/xy/ - /opensearch/query-dsl/geo-and-xy/xy/
- /query-dsl/query-dsl/geo-and-xy/xy/
--- ---
# xy queries # xy queries

View File

@ -1,16 +1,85 @@
--- ---
layout: default layout: default
title: Query DSL, aggregations, and analyzers title: Query DSL
nav_order: 1 nav_order: 2
has_children: false has_children: true
has_toc: false
nav_exclude: true nav_exclude: true
redirect_from:
- /opensearch/query-dsl/
- /opensearch/query-dsl/index/
- /docs/opensearch/query-dsl/
- /query-dsl/query-dsl/
- /query-dsl/
--- ---
# Query DSL, aggregations, and analyzers {%- comment -%}The `/docs/opensearch/query-dsl/` redirect is specifically to support the UI links in OpenSearch Dashboards 1.0.0.{%- endcomment -%}
[Analyzers]({{site.url}}{{site.baseurl}}/analyzers/text-analyzers/) process text to make it searchable. OpenSearch provides various analyzers that let you customize the way text is split into terms and converted into a structured format. To search documents written in a different language, you can use one of the built-in [language analyzers]({{site.url}}{{site.baseurl}}/query-dsl/analyzers/language-analyzers/) for your language of choice. # Query DSL
The most essential search function is using a query to return relevant documents. OpenSearch provides a search language called _query domain-specific language_ (DSL) that lets you build complex and targeted queries. Explore the [query DSL documentation]({{site.url}}{{site.baseurl}}/query-dsl/) to learn more about the different types of queries OpenSearch supports. OpenSearch provides a search language called *query domain-specific language (DSL)* that you can use to search your data. Query DSL is a flexible language with a JSON interface.
[Aggregations]({{site.url}}{{site.baseurl}}/aggregations/) let you categorize your data and analyze it to extract statistics. Use cases for aggregations include analyzing data in real time and using OpenSearch Dashboards to create visualizations. With query DSL, you need to specify a query in the `query` parameter of the search. One of the simplest searches in OpenSearch uses the `match_all` query, which matches all documents in an index:
```json
GET testindex/_search
{
  "query": {
    "match_all": {}
  }
}
```
A query can consist of many query clauses. You can combine query clauses to produce complex queries.
Broadly, you can classify queries into two categories---*leaf queries* and *compound queries*:
- **Leaf queries**: Leaf queries search for a specified value in a certain field or fields. You can use leaf queries on their own. They include the following query types:
  - **Full-text queries**: Use full-text queries to search text documents. For an analyzed text field search, full-text queries split the query string into terms using the same analyzer that was used when the field was indexed. For an exact value search, full-text queries look for the specified value without applying text analysis. To learn more, see [Full-text queries]({{site.url}}{{site.baseurl}}/opensearch/query-dsl/full-text/index/).
  - **Term-level queries**: Use term-level queries to search documents for an exact term, such as an ID or value range. Term-level queries do not analyze search terms or sort results by relevance score. To learn more, see [Term-level queries]({{site.url}}{{site.baseurl}}/opensearch/query-dsl/term/).
  - **Geographic and xy queries**: Use geographic queries to search documents that include geographic data. Use xy queries to search documents that include points and shapes in a two-dimensional coordinate system. To learn more, see [Geographic and xy queries]({{site.url}}{{site.baseurl}}/opensearch/query-dsl/geo-and-xy/index).
  - **Joining queries**: Use joining queries to search nested fields or return parent and child documents that match a specific query. Types of joining queries include `nested`, `has_child`, `has_parent`, and `parent_id` queries.
  - **Span queries**: Use span queries to perform precise positional searches. Span queries are low-level, specific queries that provide control over the order and proximity of specified query terms. They are primarily used to search legal documents. To learn more, see [Span queries]({{site.url}}{{site.baseurl}}/opensearch/query-dsl/span-query/).
  - **Specialized queries**: Specialized queries include all other query types (`distance_feature`, `more_like_this`, `percolate`, `rank_feature`, `script`, `script_score`, `wrapper`, and `pinned_query`).
- **Compound queries**: Compound queries serve as wrappers for multiple leaf or compound clauses, either to combine their results or to modify their behavior. They include the Boolean, disjunction max, constant score, function score, and boosting query types. To learn more, see [Compound queries]({{site.url}}{{site.baseurl}}/opensearch/query-dsl/compound/index/).
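
As a rough illustration of the two categories, the following sketch wraps two leaf queries (a full-text `match` query and a term-level `term` query) in a compound Boolean query. The `title` and `status` field names and their values are hypothetical and used only for illustration:

```json
GET testindex/_search
{
  "query": {
    "bool": {
      "must": [
        { "match": { "title": "wind" } }
      ],
      "filter": [
        { "term": { "status": "published" } }
      ]
    }
  }
}
```

Here the `match` clause contributes to the relevance score, while the `term` clause in `filter` only narrows down the set of matching documents.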
## A note on Unicode special characters in text fields
Because of word boundaries associated with Unicode special characters, the Unicode standard analyzer cannot index a [text field type]({{site.url}}{{site.baseurl}}/opensearch/supported-field-types/text/) value as a whole value when it includes one of these special characters. As a result, a text field value that includes a special character is parsed by the standard analyzer as multiple values separated by the special character, effectively tokenizing the different elements on either side of it. This can lead to unintentional filtering of documents and potentially compromise control over their access.
The following examples illustrate values containing special characters that will be parsed improperly by the standard analyzer. In these examples, the hyphen/minus sign in each value prevents the analyzer from distinguishing between the two different users for `user.id`, so the analyzer interprets them as being one and the same:
```json
{
  "bool": {
    "must": {
      "match": {
        "user.id": "User-1"
      }
    }
  }
}
```
```json
{
  "bool": {
    "must": {
      "match": {
        "user.id": "User-2"
      }
    }
  }
}
```
To avoid this circumstance when using either query DSL or the REST API, you can use a custom analyzer or map the field as `keyword`, which performs an exact-match search. See [Keyword field type]({{site.url}}{{site.baseurl}}/opensearch/supported-field-types/keyword/) for the latter option.
For a list of characters that should be avoided when using `text` field types, see [Word Boundaries](https://unicode.org/reports/tr29/#Word_Boundaries).
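
To observe this behavior directly, you could pass one of the values to the Analyze API with the `standard` analyzer. This is a minimal sketch rather than a required step:

```json
GET /_analyze
{
  "analyzer": "standard",
  "text": "User-1"
}
```

The response is expected to list `user` and `1` as separate tokens, which is why `User-1` and `User-2` share a matching term. If you take the `keyword` approach, the mapping might look like the following, where the `testindex` index name is an assumption made for illustration:

```json
PUT testindex
{
  "mappings": {
    "properties": {
      "user": {
        "properties": {
          "id": {
            "type": "keyword"
          }
        }
      }
    }
  }
}
```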

View File

@ -1,8 +1,9 @@
--- ---
layout: default layout: default
title: Minimum should match title: Minimum should match
parent: Query DSL
nav_order: 70 nav_order: 70
redirect_from:
- /query-dsl/query-dsl/minimum-should-match/
--- ---
# Minimum should match # Minimum should match

View File

@ -1,83 +0,0 @@
---
layout: default
title: Query DSL
nav_order: 2
has_children: true
permalink: /query-dsl/
redirect_from:
- /opensearch/query-dsl/
- /opensearch/query-dsl/index/
- /docs/opensearch/query-dsl/
---
{%- comment -%}The `/docs/opensearch/query-dsl/` redirect is specifically to support the UI links in OpenSearch Dashboards 1.0.0.{%- endcomment -%}
# Query DSL
OpenSearch provides a search language called *query domain-specific language (DSL)* that you can use to search your data. Query DSL is a flexible language with a JSON interface.
With query DSL, you need to specify a query in the `query` parameter of the search. One of the simplest searches in OpenSearch uses the `match_all` query, which matches all documents in an index:
```json
GET testindex/_search
{
"query": {
"match_all": {
}
}
}
```
A query can consist of many query clauses. You can combine query clauses to produce complex queries.
Broadly, you can classify queries into two categories---*leaf queries* and *compound queries*:
- **Leaf queries**: Leaf queries search for a specified value in a certain field or fields. You can use leaf queries on their own. They include the following query types:
- **Full-text queries**: Use full-text queries to search text documents. For an analyzed text field search, full-text queries split the query string into terms with the same analyzer that was used when the field was indexed. For an exact value search, full-text queries look for the specified value without applying text analysis. To learn more, see [Full-text queries]({{site.url}}{{site.baseurl}}/opensearch/query-dsl/full-text/index).
- **Term-level queries**: Use term-level queries to search documents for an exact specified term, such as an ID or value range. Term-level queries do not analyze search terms or sort results by relevance score. To learn more, see [Term-level queries]({{site.url}}{{site.baseurl}}/opensearch/query-dsl/term/).
- **Geographic and xy queries**: Use geographic queries to search documents that include geographic data. Use xy queries to search documents that include points and shapes in a two-dimensional coordinate system. To learn more, see [Geographic and xy queries]({{site.url}}{{site.baseurl}}/opensearch/query-dsl/geo-and-xy/index).
- **Joining queries**: Use joining queries to search nested fields or return parent and child documents that match a specific query. Types of joining queries include `nested`, `has_child`, `has_parent`, and `parent_id` queries.
- **Span queries**: Use span queries to perform precise positional searches. Span queries are low-level, specific queries that provide control over the order and proximity of specified query terms. They are primarily used to search legal documents. To learn more, see [Span queries]({{site.url}}{{site.baseurl}}/opensearch/query-dsl/span-query/).
- **Specialized queries**: Specialized queries include all other query types (`distance_feature`, `more_like_this`, `percolate`, `rank_feature`, `script`, `script_score`, `wrapper`, and `pinned_query`).
- **Compound queries**: Compound queries serve as wrappers for multiple leaf or compound clauses either to combine their results or to modify their behavior. They include the Boolean, disjunction max, constant score, function score, and boosting query types. To learn more, see [Compound queries]({{site.url}}{{site.baseurl}}/opensearch/query-dsl/compound/index).
## A note on Unicode special characters in text fields
Because of word boundaries associated with Unicode special characters, the Unicode standard analyzer cannot index a [text field type]({{site.url}}{{site.baseurl}}/opensearch/supported-field-types/text/) value as a whole value when it includes one of these special characters. As a result, a text field value that includes a special character is parsed by the standard analyzer as multiple values separated by the special character, effectively tokenizing the different elements on either side of it. This can lead to unintentional filtering of documents and potentially compromise control over their access.
The following examples illustrate values containing special characters that will be parsed improperly by the standard analyzer. In this example, the existence of the hyphen/minus sign in the value prevents the analyzer from distinguishing between the two different users for `user.id` and interprets them as one and the same:
```json
{
"bool": {
"must": {
"match": {
"user.id": "User-1"
}
}
}
}
```
```json
{
"bool": {
"must": {
"match": {
"user.id": "User-2"
}
}
}
}
```
To avoid this circumstance when using either query DSL or the REST API, you can use a custom analyzer or map the field as `keyword`, which performs an exact-match search. See [Keyword field type]({{site.url}}{{site.baseurl}}/opensearch/supported-field-types/keyword/) for the latter option.
For a list of characters that should be avoided for `text` field types, see [Word Boundaries](https://unicode.org/reports/tr29/#Word_Boundaries).

View File

@ -1,9 +1,9 @@
--- ---
layout: default layout: default
title: Query and filter context title: Query and filter context
parent: Query DSL
permalink: /query-dsl/query-filter-context/
nav_order: 5 nav_order: 5
redirect_from:
- /query-dsl/query-dsl/query-filter-context/
--- ---
# Query and filter context # Query and filter context


@ -1,11 +1,10 @@
--- ---
layout: default layout: default
title: Span queries title: Span queries
parent: Query DSL
nav_order: 60 nav_order: 60
permalink: /query-dsl/span-query/
redirect_from: redirect_from:
- /opensearch/query-dsl/span-query/ - /opensearch/query-dsl/span-query/
- /query-dsl/query-dsl/span-query/
--- ---
# Span queries # Span queries


@ -1,9 +1,9 @@
--- ---
layout: default layout: default
title: Term-level and full-text queries compared title: Term-level and full-text queries compared
parent: Query DSL
permalink: /query-dsl/term-vs-full-text/
nav_order: 10 nav_order: 10
redirect_from:
- /query-dsl/query-dsl/term-vs-full-text
--- ---
# Term-level and full-text queries compared # Term-level and full-text queries compared


@ -1,11 +1,10 @@
--- ---
layout: default layout: default
title: Term-level queries title: Term-level queries
parent: Query DSL
nav_order: 20 nav_order: 20
permalink: /query-dsl/term/
redirect_from: redirect_from:
- /opensearch/query-dsl/term/ - /opensearch/query-dsl/term/
- /query-dsl/query-dsl/term/
--- ---
# Term-level queries # Term-level queries


@ -11,7 +11,7 @@ nav_exclude: true
OpenSearch provides several features for customizing your search use cases and improving search relevance. In OpenSearch, you can: OpenSearch provides several features for customizing your search use cases and improving search relevance. In OpenSearch, you can:
- Use [SQL and Piped Processing Language (PPL)]({{site.url}}{{site.baseurl}}/search-plugins/sql/index/) as alternatives to [query domain-specific language (DSL)]({{site.url}}{{site.baseurl}}/query-dsl/) to search data. - Use [SQL and Piped Processing Language (PPL)]({{site.url}}{{site.baseurl}}/search-plugins/sql/index/) as alternatives to [query domain-specific language (DSL)]({{site.url}}{{site.baseurl}}/query-dsl/index/) for searching data.
- Run resource-intensive queries asynchronously with [asynchronous search]({{site.url}}{{site.baseurl}}/search-plugins/async/). - Run resource-intensive queries asynchronously with [asynchronous search]({{site.url}}{{site.baseurl}}/search-plugins/async/).

BIN
images/string-indices.png Normal file

Binary file not shown. Size: 15 KiB