Split query DSL, analyzer, and aggregation sections and add more to analyzer section (#4693)
* Add analyzer documentation
  Signed-off-by: Fanit Kolchina <kolchfa@amazon.com>
* Add index and search analyzer pages
  Signed-off-by: Fanit Kolchina <kolchfa@amazon.com>
* Doc review comments
  Signed-off-by: Fanit Kolchina <kolchfa@amazon.com>
* Apply suggestions from code review
  Co-authored-by: Melissa Vagi <vagimeli@amazon.com>
  Signed-off-by: kolchfa-aws <105444904+kolchfa-aws@users.noreply.github.com>
* More doc review comments
  Signed-off-by: Fanit Kolchina <kolchfa@amazon.com>
* Apply suggestions from code review
  Co-authored-by: Nathan Bower <nbower@amazon.com>
  Signed-off-by: kolchfa-aws <105444904+kolchfa-aws@users.noreply.github.com>
* Implemented editorial comments
  Signed-off-by: Fanit Kolchina <kolchfa@amazon.com>
* Update index-analyzers.md
  Signed-off-by: kolchfa-aws <105444904+kolchfa-aws@users.noreply.github.com>
---------
Signed-off-by: Fanit Kolchina <kolchfa@amazon.com>
Signed-off-by: kolchfa-aws <105444904+kolchfa-aws@users.noreply.github.com>
Co-authored-by: Melissa Vagi <vagimeli@amazon.com>
Co-authored-by: Nathan Bower <nbower@amazon.com>
parent aed1d68ae0
commit a87fdc0f63
|
@ -4,6 +4,8 @@ title: Adjacency matrix
|
|||
parent: Bucket aggregations
|
||||
grand_parent: Aggregations
|
||||
nav_order: 10
|
||||
redirect_from:
|
||||
- /query-dsl/aggregations/bucket/adjacency-matrix/
|
||||
---
|
||||
|
||||
# Adjacency matrix aggregations
|
|
@ -4,6 +4,8 @@ title: Date histogram
|
|||
parent: Bucket aggregations
|
||||
grand_parent: Aggregations
|
||||
nav_order: 20
|
||||
redirect_from:
|
||||
- /query-dsl/aggregations/bucket/date-histogram/
|
||||
---
|
||||
|
||||
# Date histogram aggregations
|
|
@ -4,6 +4,8 @@ title: Date range
|
|||
parent: Bucket aggregations
|
||||
grand_parent: Aggregations
|
||||
nav_order: 30
|
||||
redirect_from:
|
||||
- /query-dsl/aggregations/bucket/date-range/
|
||||
---
|
||||
|
||||
# Date range aggregations
|
|
@ -4,6 +4,8 @@ title: Diversified sampler
|
|||
parent: Bucket aggregations
|
||||
grand_parent: Aggregations
|
||||
nav_order: 40
|
||||
redirect_from:
|
||||
- /query-dsl/aggregations/bucket/diversified-sampler/
|
||||
---
|
||||
|
||||
# Diversified sampler aggregations
|
|
@ -4,6 +4,8 @@ title: Filter
|
|||
parent: Bucket aggregations
|
||||
grand_parent: Aggregations
|
||||
nav_order: 50
|
||||
redirect_from:
|
||||
- /query-dsl/aggregations/bucket/filter/
|
||||
---
|
||||
|
||||
# Filter aggregations
|
|
@ -4,6 +4,8 @@ title: Filters
|
|||
parent: Bucket aggregations
|
||||
grand_parent: Aggregations
|
||||
nav_order: 60
|
||||
redirect_from:
|
||||
- /query-dsl/aggregations/bucket/filters/
|
||||
---
|
||||
|
||||
# Filters aggregations
|
|
@ -4,6 +4,8 @@ title: Geodistance
|
|||
parent: Bucket aggregations
|
||||
grand_parent: Aggregations
|
||||
nav_order: 70
|
||||
redirect_from:
|
||||
- /query-dsl/aggregations/bucket/geo-distance/
|
||||
---
|
||||
|
||||
# Geodistance aggregations
|
|
@ -4,6 +4,8 @@ title: Geohash grid
|
|||
parent: Bucket aggregations
|
||||
grand_parent: Aggregations
|
||||
nav_order: 80
|
||||
redirect_from:
|
||||
- /query-dsl/aggregations/bucket/geohash-grid/
|
||||
---
|
||||
|
||||
# Geohash grid aggregations
|
|
@ -7,6 +7,7 @@ nav_order: 85
|
|||
redirect_from:
|
||||
- /aggregations/geohexgrid/
|
||||
- /query-dsl/aggregations/geohexgrid/
|
||||
- /query-dsl/aggregations/bucket/geohex-grid/
|
||||
---
|
||||
|
||||
# Geohex grid aggregations
|
|
@ -4,6 +4,8 @@ title: Geotile grid
|
|||
parent: Bucket aggregations
|
||||
grand_parent: Aggregations
|
||||
nav_order: 87
|
||||
redirect_from:
|
||||
- /query-dsl/aggregations/bucket/geotile-grid/
|
||||
---
|
||||
|
||||
# Geotile grid aggregations
|
|
@ -4,6 +4,8 @@ title: Global
|
|||
parent: Bucket aggregations
|
||||
grand_parent: Aggregations
|
||||
nav_order: 90
|
||||
redirect_from:
|
||||
- /query-dsl/aggregations/bucket/global/
|
||||
---
|
||||
|
||||
# Global aggregations
|
|
@ -4,6 +4,8 @@ title: Histogram
|
|||
parent: Bucket aggregations
|
||||
grand_parent: Aggregations
|
||||
nav_order: 100
|
||||
redirect_from:
|
||||
- /query-dsl/aggregations/bucket/histogram/
|
||||
---
|
||||
|
||||
# Histogram aggregations
|
|
@ -0,0 +1,45 @@
|
|||
---
|
||||
layout: default
|
||||
title: Bucket aggregations
|
||||
has_children: true
|
||||
has_toc: false
|
||||
nav_order: 3
|
||||
redirect_from:
|
||||
- /opensearch/bucket-agg/
|
||||
- /query-dsl/aggregations/bucket-agg/
|
||||
- /query-dsl/aggregations/bucket/
|
||||
- /aggregations/bucket-agg/
|
||||
---
|
||||
|
||||
# Bucket aggregations
|
||||
|
||||
Bucket aggregations categorize sets of documents as buckets. The type of bucket aggregation determines the bucket for a given document.
|
||||
|
||||
You can use bucket aggregations to implement faceted navigation (usually placed as a sidebar on a search result landing page) to help your users filter the results.
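
As a minimal sketch of faceted navigation, the following request groups documents by category using a `terms` bucket aggregation. The index name `testindex` and the field `category.keyword` are hypothetical and only serve as an illustration; `size: 0` suppresses the search hits so that only the buckets are returned:

```json
GET testindex/_search
{
  "size": 0,
  "aggs": {
    "categories": {
      "terms": {
        "field": "category.keyword"
      }
    }
  }
}
```
{% include copy-curl.html %}

Each bucket in the response contains a key (the category value) and a document count, which you could display as a sidebar filter.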
|
||||
|
||||
## Supported bucket aggregations
|
||||
|
||||
OpenSearch supports the following bucket aggregations:
|
||||
|
||||
- [Adjacency matrix]({{site.url}}{{site.baseurl}}/aggregations/bucket/adjacency-matrix/)
|
||||
- [Date histogram]({{site.url}}{{site.baseurl}}/aggregations/bucket/date-histogram/)
|
||||
- [Date range]({{site.url}}{{site.baseurl}}/aggregations/bucket/date-range/)
|
||||
- [Diversified sampler]({{site.url}}{{site.baseurl}}/aggregations/bucket/diversified-sampler/)
|
||||
- [Filter]({{site.url}}{{site.baseurl}}/aggregations/bucket/filter/)
|
||||
- [Filters]({{site.url}}{{site.baseurl}}/aggregations/bucket/filters/)
|
||||
- [Geodistance]({{site.url}}{{site.baseurl}}/aggregations/bucket/geo-distance/)
|
||||
- [Geohash grid]({{site.url}}{{site.baseurl}}/aggregations/bucket/geohash-grid/)
|
||||
- [Geohex grid]({{site.url}}{{site.baseurl}}/aggregations/bucket/geohex-grid/)
|
||||
- [Geotile grid]({{site.url}}{{site.baseurl}}/aggregations/bucket/geotile-grid/)
|
||||
- [Global]({{site.url}}{{site.baseurl}}/aggregations/bucket/global/)
|
||||
- [Histogram]({{site.url}}{{site.baseurl}}/aggregations/bucket/histogram/)
|
||||
- [IP range]({{site.url}}{{site.baseurl}}/aggregations/bucket/ip-range/)
|
||||
- [Missing]({{site.url}}{{site.baseurl}}/aggregations/bucket/missing/)
|
||||
- [Multi-terms]({{site.url}}{{site.baseurl}}/aggregations/bucket/multi-terms/)
|
||||
- [Nested]({{site.url}}{{site.baseurl}}/aggregations/bucket/nested/)
|
||||
- [Range]({{site.url}}{{site.baseurl}}/aggregations/bucket/range/)
|
||||
- [Reverse nested]({{site.url}}{{site.baseurl}}/aggregations/bucket/reverse-nested/)
|
||||
- [Sampler]({{site.url}}{{site.baseurl}}/aggregations/bucket/sampler/)
|
||||
- [Significant terms]({{site.url}}{{site.baseurl}}/aggregations/bucket/significant-terms/)
|
||||
- [Significant text]({{site.url}}{{site.baseurl}}/aggregations/bucket/significant-text/)
|
||||
- [Terms]({{site.url}}{{site.baseurl}}/aggregations/bucket/terms/)
|
|
@ -4,6 +4,8 @@ title: IP range
|
|||
parent: Bucket aggregations
|
||||
grand_parent: Aggregations
|
||||
nav_order: 110
|
||||
redirect_from:
|
||||
- /query-dsl/aggregations/bucket/ip-range/
|
||||
---
|
||||
|
||||
# IP range aggregations
|
|
@ -4,6 +4,8 @@ title: Missing
|
|||
parent: Bucket aggregations
|
||||
grand_parent: Aggregations
|
||||
nav_order: 120
|
||||
redirect_from:
|
||||
- /query-dsl/aggregations/bucket/missing/
|
||||
---
|
||||
|
||||
# Missing aggregations
|
|
@ -4,6 +4,8 @@ title: Multi-terms
|
|||
parent: Bucket aggregations
|
||||
grand_parent: Aggregations
|
||||
nav_order: 130
|
||||
redirect_from:
|
||||
- /query-dsl/aggregations/multi-terms/
|
||||
---
|
||||
|
||||
# Multi-terms aggregations
|
|
@ -4,6 +4,8 @@ title: Nested
|
|||
parent: Bucket aggregations
|
||||
grand_parent: Aggregations
|
||||
nav_order: 140
|
||||
redirect_from:
|
||||
- /query-dsl/aggregations/bucket/nested/
|
||||
---
|
||||
|
||||
# Nested aggregations
|
|
@ -4,6 +4,8 @@ title: Range
|
|||
parent: Bucket aggregations
|
||||
grand_parent: Aggregations
|
||||
nav_order: 150
|
||||
redirect_from:
|
||||
- /query-dsl/aggregations/bucket/range/
|
||||
---
|
||||
|
||||
# Range aggregations
|
|
@ -4,6 +4,8 @@ title: Reverse nested
|
|||
parent: Bucket aggregations
|
||||
grand_parent: Aggregations
|
||||
nav_order: 160
|
||||
redirect_from:
|
||||
- /query-dsl/aggregations/bucket/reverse-nested/
|
||||
---
|
||||
|
||||
# Reverse nested aggregations
|
|
@ -3,9 +3,10 @@ layout: default
|
|||
title: Aggregations
|
||||
has_children: true
|
||||
nav_order: 5
|
||||
permalink: /aggregations/
|
||||
nav_exclude: true
|
||||
redirect_from:
|
||||
- /opensearch/aggregations/
|
||||
- /query-dsl/aggregations/
|
||||
---
|
||||
|
||||
# Aggregations
|
|
@ -4,6 +4,8 @@ title: Average
|
|||
parent: Metric aggregations
|
||||
grand_parent: Aggregations
|
||||
nav_order: 10
|
||||
redirect_from:
|
||||
- /query-dsl/aggregations/metric/average/
|
||||
---
|
||||
|
||||
# Average aggregations
|
|
@ -4,6 +4,8 @@ title: Cardinality
|
|||
parent: Metric aggregations
|
||||
grand_parent: Aggregations
|
||||
nav_order: 20
|
||||
redirect_from:
|
||||
- /query-dsl/aggregations/metric/cardinality/
|
||||
---
|
||||
|
||||
# Cardinality aggregations
|
|
@ -4,6 +4,8 @@ title: Extended stats
|
|||
parent: Metric aggregations
|
||||
grand_parent: Aggregations
|
||||
nav_order: 30
|
||||
redirect_from:
|
||||
- /query-dsl/aggregations/metric/extended-stats/
|
||||
---
|
||||
|
||||
# Extended stats aggregations
|
|
@ -4,6 +4,8 @@ title: Geobounds
|
|||
parent: Metric aggregations
|
||||
grand_parent: Aggregations
|
||||
nav_order: 40
|
||||
redirect_from:
|
||||
- /query-dsl/aggregations/metric/geobounds/
|
||||
---
|
||||
|
||||
## Geobounds aggregations
|
|
@ -0,0 +1,47 @@
|
|||
---
|
||||
layout: default
|
||||
title: Metric aggregations
|
||||
has_children: true
|
||||
has_toc: false
|
||||
nav_order: 2
|
||||
redirect_from:
|
||||
- /opensearch/metric-agg/
|
||||
- /query-dsl/aggregations/metric-agg/
|
||||
- /aggregations/metric-agg/
|
||||
- /query-dsl/aggregations/metric/
|
||||
---
|
||||
|
||||
# Metric aggregations
|
||||
|
||||
Metric aggregations let you perform simple calculations such as finding the minimum, maximum, and average values of a field.
|
||||
|
||||
## Types of metric aggregations
|
||||
|
||||
There are two types of metric aggregations: single-value metric aggregations and multi-value metric aggregations.
|
||||
|
||||
### Single-value metric aggregations
|
||||
|
||||
Single-value metric aggregations return a single metric, for example, `sum`, `min`, `max`, `avg`, `cardinality`, or `value_count`.
|
||||
|
||||
### Multi-value metric aggregations
|
||||
|
||||
Multi-value metric aggregations return more than one metric. These include `stats`, `extended_stats`, `matrix_stats`, `percentiles`, `percentile_ranks`, `geo_bounds`, `top_hits`, and `scripted_metric`.
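
To illustrate the difference between the two types, the following sketch runs a single-value `avg` aggregation and a multi-value `stats` aggregation in the same request. The index name `testindex` and the numeric field `price` are assumptions made for this example:

```json
GET testindex/_search
{
  "size": 0,
  "aggs": {
    "avg_price": {
      "avg": {
        "field": "price"
      }
    },
    "price_stats": {
      "stats": {
        "field": "price"
      }
    }
  }
}
```
{% include copy-curl.html %}

In the response, `avg_price` contains one `value`, while `price_stats` contains several metrics (`count`, `min`, `max`, `avg`, and `sum`).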
|
||||
|
||||
## Supported metric aggregations
|
||||
|
||||
OpenSearch supports the following metric aggregations:
|
||||
|
||||
- [Average]({{site.url}}{{site.baseurl}}/aggregations/metric/average/)
|
||||
- [Cardinality]({{site.url}}{{site.baseurl}}/aggregations/metric/cardinality/)
|
||||
- [Extended stats]({{site.url}}{{site.baseurl}}/aggregations/metric/extended-stats/)
|
||||
- [Geobounds]({{site.url}}{{site.baseurl}}/aggregations/metric/geobounds/)
|
||||
- [Matrix stats]({{site.url}}{{site.baseurl}}/aggregations/metric/matrix-stats/)
|
||||
- [Maximum]({{site.url}}{{site.baseurl}}/aggregations/metric/maximum/)
|
||||
- [Minimum]({{site.url}}{{site.baseurl}}/aggregations/metric/minimum/)
|
||||
- [Percentile ranks]({{site.url}}{{site.baseurl}}/aggregations/metric/percentile-ranks/)
|
||||
- [Percentile]({{site.url}}{{site.baseurl}}/aggregations/metric/percentile/)
|
||||
- [Scripted metric]({{site.url}}{{site.baseurl}}/aggregations/metric/scripted-metric/)
|
||||
- [Stats]({{site.url}}{{site.baseurl}}/aggregations/metric/stats/)
|
||||
- [Sum]({{site.url}}{{site.baseurl}}/aggregations/metric/sum/)
|
||||
- [Top hits]({{site.url}}{{site.baseurl}}/aggregations/metric/top-hits/)
|
||||
- [Value count]({{site.url}}{{site.baseurl}}/aggregations/metric/value-count/)
|
|
@ -4,6 +4,8 @@ title: Matrix stats
|
|||
parent: Metric aggregations
|
||||
grand_parent: Aggregations
|
||||
nav_order: 50
|
||||
redirect_from:
|
||||
- /query-dsl/aggregations/metric/matrix-stats/
|
||||
---
|
||||
|
||||
# Matrix stats aggregations
|
|
@ -4,6 +4,8 @@ title: Maximum
|
|||
parent: Metric aggregations
|
||||
grand_parent: Aggregations
|
||||
nav_order: 60
|
||||
redirect_from:
|
||||
- /query-dsl/aggregations/metric/maximum/
|
||||
---
|
||||
|
||||
# Maximum aggregations
|
|
@ -4,6 +4,8 @@ title: Minimum
|
|||
parent: Metric aggregations
|
||||
grand_parent: Aggregations
|
||||
nav_order: 70
|
||||
redirect_from:
|
||||
- /query-dsl/aggregations/metric/minimum/
|
||||
---
|
||||
|
||||
# Minimum aggregations
|
|
@ -4,6 +4,8 @@ title: Percentile ranks
|
|||
parent: Metric aggregations
|
||||
grand_parent: Aggregations
|
||||
nav_order: 80
|
||||
redirect_from:
|
||||
- /query-dsl/aggregations/metric/percentile-ranks/
|
||||
---
|
||||
|
||||
# Percentile rank aggregations
|
|
@ -4,6 +4,8 @@ title: Percentile
|
|||
parent: Metric aggregations
|
||||
grand_parent: Aggregations
|
||||
nav_order: 90
|
||||
redirect_from:
|
||||
- /query-dsl/aggregations/metric/percentile/
|
||||
---
|
||||
|
||||
# Percentile aggregations
|
|
@ -4,6 +4,8 @@ title: Scripted metric
|
|||
parent: Metric aggregations
|
||||
grand_parent: Aggregations
|
||||
nav_order: 100
|
||||
redirect_from:
|
||||
- /query-dsl/aggregations/metric/scripted-metric/
|
||||
---
|
||||
|
||||
# Scripted metric aggregations
|
|
@ -1,9 +1,11 @@
|
|||
---
|
||||
layout: default
|
||||
title: Stats aggregations
|
||||
title: Stats
|
||||
parent: Metric aggregations
|
||||
grand_parent: Aggregations
|
||||
nav_order: 110
|
||||
redirect_from:
|
||||
- /query-dsl/aggregations/metric/stats/
|
||||
---
|
||||
|
||||
# Stats aggregations
|
|
@ -4,6 +4,8 @@ title: Sum
|
|||
parent: Metric aggregations
|
||||
grand_parent: Aggregations
|
||||
nav_order: 120
|
||||
redirect_from:
|
||||
- /query-dsl/aggregations/metric/sum/
|
||||
---
|
||||
|
||||
# Sum aggregations
|
|
@ -4,6 +4,8 @@ title: Top hits
|
|||
parent: Metric aggregations
|
||||
grand_parent: Aggregations
|
||||
nav_order: 130
|
||||
redirect_from:
|
||||
- /query-dsl/aggregations/metric/top-hits/
|
||||
---
|
||||
|
||||
# Top hits aggregations
|
|
@ -4,6 +4,8 @@ title: Value count
|
|||
parent: Metric aggregations
|
||||
grand_parent: Aggregations
|
||||
nav_order: 140
|
||||
redirect_from:
|
||||
- /query-dsl/aggregations/metric/value-count/
|
||||
---
|
||||
|
||||
# Value count aggregations
|
|
@ -1,12 +1,11 @@
|
|||
---
|
||||
layout: default
|
||||
title: Pipeline aggregations
|
||||
parent: Aggregations
|
||||
nav_order: 5
|
||||
permalink: /aggregations/pipeline-agg/
|
||||
has_children: false
|
||||
redirect_from:
|
||||
- /opensearch/pipeline-agg/
|
||||
- /query-dsl/aggregations/pipeline-agg/
|
||||
---
|
||||
|
||||
# Pipeline aggregations
|
|
@ -0,0 +1,65 @@
|
|||
---
|
||||
layout: default
|
||||
title: Index analyzers
|
||||
nav_order: 20
|
||||
---
|
||||
|
||||
# Index analyzers
|
||||
|
||||
Index analyzers are specified at indexing time and are used to analyze [text]({{site.url}}{{site.baseurl}}/field-types/supported-field-types/text/) fields when indexing a document.
|
||||
|
||||
## Determining which index analyzer to use
|
||||
|
||||
To determine which analyzer to use for a field when a document is indexed, OpenSearch examines the following parameters in order:
|
||||
|
||||
1. The `analyzer` mapping parameter of the field
|
||||
1. The `analysis.analyzer.default` index setting
|
||||
1. The `standard` analyzer (default)
|
||||
|
||||
When specifying an index analyzer, keep in mind that in most cases, specifying an analyzer for each `text` field in an index works best. Analyzing both the text field (at indexing time) and the query string (at query time) with the same analyzer ensures that the search uses the same terms as those that are stored in the index.
|
||||
{: .important }
|
||||
|
||||
For information about verifying which analyzer is associated with which field, see [Verifying analyzer settings]({{site.url}}{{site.baseurl}}/analyzers/index/#verifying-analyzer-settings).
|
||||
|
||||
## Specifying an index analyzer for a field
|
||||
|
||||
When creating index mappings, you can supply the `analyzer` parameter for each [text]({{site.url}}{{site.baseurl}}/field-types/supported-field-types/text/) field. For example, the following request specifies the `simple` analyzer for the `text_entry` field:
|
||||
|
||||
```json
|
||||
PUT testindex
|
||||
{
|
||||
"mappings": {
|
||||
"properties": {
|
||||
"text_entry": {
|
||||
"type": "text",
|
||||
"analyzer": "simple"
|
||||
}
|
||||
}
|
||||
}
|
||||
}
|
||||
```
|
||||
{% include copy-curl.html %}
|
||||
|
||||
## Specifying a default index analyzer for an index
|
||||
|
||||
If you want to use the same analyzer for all text fields in an index, you can specify it in the `analysis.analyzer.default` setting as follows:
|
||||
|
||||
```json
|
||||
PUT testindex
|
||||
{
|
||||
"settings": {
|
||||
"analysis": {
|
||||
"analyzer": {
|
||||
"default": {
|
||||
"type": "simple"
|
||||
}
|
||||
}
|
||||
}
|
||||
}
|
||||
}
|
||||
```
|
||||
{% include copy-curl.html %}
|
||||
|
||||
If you don't specify a default analyzer, the `standard` analyzer is used.
|
||||
{: .note}
|
||||
|
|
@ -0,0 +1,163 @@
|
|||
---
|
||||
layout: default
|
||||
title: Text analysis
|
||||
has_children: true
|
||||
nav_order: 5
|
||||
nav_exclude: true
|
||||
has_toc: false
|
||||
redirect_from:
|
||||
- /opensearch/query-dsl/text-analyzers/
|
||||
- /query-dsl/analyzers/text-analyzers/
|
||||
- /analyzers/text-analyzers/
|
||||
---
|
||||
|
||||
# Text analysis
|
||||
|
||||
When you are searching documents using a full-text search, you want to receive all relevant results and not only exact matches. If you're looking for "walk", you're interested in results that contain any form of the word, like "Walk", "walked", or "walking." To facilitate full-text search, OpenSearch uses text analysis.
|
||||
|
||||
Text analysis consists of the following steps:
|
||||
|
||||
1. _Tokenize_ text into terms: For example, after tokenization, the phrase `Actions speak louder than words` is split into tokens `Actions`, `speak`, `louder`, `than`, and `words`.
|
||||
1. _Normalize_ the terms by converting them into a standard format, such as by lowercasing them or performing stemming (reducing a word to its root): For example, after normalization, `Actions` becomes `action`, `louder` becomes `loud`, and `words` becomes `word`.
|
||||
|
||||
## Analyzers
|
||||
|
||||
In OpenSearch, text analysis is performed by an _analyzer_. Each analyzer contains the following sequentially applied components:
|
||||
|
||||
1. **Character filters**: First, a character filter receives the original text as a stream of characters and adds, removes, or modifies characters in the text. For example, a character filter can strip HTML tags from a string so that the text `<p><b>Actions</b> speak louder than <em>words</em></p>` becomes `\nActions speak louder than words\n`. The output of a character filter is a stream of characters.
|
||||
|
||||
1. **Tokenizer**: Next, a tokenizer receives the stream of characters that has been processed by the character filter and splits the text into individual _tokens_ (usually, words). For example, a tokenizer can split text on white space so that the preceding text becomes [`Actions`, `speak`, `louder`, `than`, `words`]. Tokenizers also maintain metadata about tokens, such as their starting and ending positions in the text. The output of a tokenizer is a stream of tokens.
|
||||
|
||||
1. **Token filters**: Last, a token filter receives the stream of tokens from the tokenizer and adds, removes, or modifies tokens. For example, a token filter may lowercase the tokens so that `Actions` becomes `actions`, remove stopwords like `than`, or add synonyms like `talk` for the word `speak`.
|
||||
|
||||
An analyzer must contain exactly one tokenizer and may contain zero or more character filters and zero or more token filters.
|
||||
{: .note}
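
The three components described above can be tried together in one Analyze API request. The following is a minimal sketch that combines the built-in `html_strip` character filter, the `whitespace` tokenizer, and the `lowercase` token filter; this particular combination is chosen only for illustration:

```json
GET /_analyze
{
  "char_filter": ["html_strip"],
  "tokenizer": "whitespace",
  "filter": ["lowercase"],
  "text": "<p><b>Actions</b> speak louder than <em>words</em></p>"
}
```
{% include copy-curl.html %}

The response should list the tokens produced after the HTML tags are stripped, the text is split on white space, and each token is lowercased.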
|
||||
|
||||
## Built-in analyzers
|
||||
|
||||
The following table lists the built-in analyzers that OpenSearch provides. The last column of the table contains the result of applying the analyzer to the string `It’s fun to contribute a brand-new PR or 2 to OpenSearch!`.
|
||||
|
||||
Analyzer | Analysis performed | Analyzer output
|
||||
:--- | :--- | :---
|
||||
**Standard** (default) | - Parses strings into tokens at word boundaries <br> - Removes most punctuation <br> - Converts tokens to lowercase | [`it’s`, `fun`, `to`, `contribute`, `a`, `brand`, `new`, `pr`, `or`, `2`, `to`, `opensearch`]
|
||||
**Simple** | - Parses strings into tokens on any non-letter character <br> - Removes non-letter characters <br> - Converts tokens to lowercase | [`it`, `s`, `fun`, `to`, `contribute`, `a`, `brand`, `new`, `pr`, `or`, `to`, `opensearch`]
|
||||
**Whitespace** | - Parses strings into tokens on white space | [`It’s`, `fun`, `to`, `contribute`, `a`, `brand-new`, `PR`, `or`, `2`, `to`, `OpenSearch!`]
|
||||
**Stop** | - Parses strings into tokens on any non-letter character <br> - Removes non-letter characters <br> - Removes stop words <br> - Converts tokens to lowercase | [`s`, `fun`, `contribute`, `brand`, `new`, `pr`, `opensearch`]
|
||||
**Keyword** (noop) | - Outputs the entire string unchanged | [`It’s fun to contribute a brand-new PR or 2 to OpenSearch!`]
|
||||
**Pattern** | - Parses strings into tokens using regular expressions <br> - Supports converting strings to lowercase <br> - Supports removing stop words | [`it`, `s`, `fun`, `to`, `contribute`, `a`, `brand`, `new`, `pr`, `or`, `2`, `to`, `opensearch`]
|
||||
[**Language**]({{site.url}}{{site.baseurl}}/analyzers/language-analyzers/) | - Performs analysis specific to a certain language (for example, `english`) | [`fun`, `contribut`, `brand`, `new`, `pr`, `2`, `opensearch`]
|
||||
**Fingerprint** | - Parses strings on any non-letter character <br> - Normalizes characters by converting them to ASCII <br> - Converts tokens to lowercase <br> - Sorts, deduplicates, and concatenates tokens into a single token <br> - Supports removing stop words | [`2 a brand contribute fun it's new opensearch or pr to`] <br> Note that the apostrophe was converted to its ASCII counterpart.
|
||||
|
||||
## Custom analyzers
|
||||
|
||||
If needed, you can combine tokenizers, token filters, and character filters to create a custom analyzer.
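
As a rough sketch of what such a combination might look like, the following request defines a custom analyzer in the index settings. The index name `custom-analyzer-index` and the analyzer name `my_custom_analyzer` are hypothetical; the `html_strip` character filter, `standard` tokenizer, and `lowercase` and `stop` token filters are built-in components:

```json
PUT custom-analyzer-index
{
  "settings": {
    "analysis": {
      "analyzer": {
        "my_custom_analyzer": {
          "type": "custom",
          "char_filter": ["html_strip"],
          "tokenizer": "standard",
          "filter": ["lowercase", "stop"]
        }
      }
    }
  }
}
```
{% include copy-curl.html %}

Once defined, the custom analyzer can be referenced by name in the `analyzer` mapping parameter of a `text` field in the same index.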
|
||||
|
||||
## Text analysis at indexing time and query time
|
||||
|
||||
OpenSearch performs text analysis on text fields when you index a document and when you send a search request. Depending on the time of text analysis, the analyzers used for it are classified as follows:
|
||||
|
||||
- An _index analyzer_ performs analysis at indexing time: When you are indexing a [text]({{site.url}}{{site.baseurl}}/field-types/supported-field-types/text/) field, OpenSearch analyzes it before indexing it. For more information about ways to specify index analyzers, see [Index analyzers]({{site.url}}{{site.baseurl}}/analyzers/index-analyzers/).
|
||||
|
||||
- A _search analyzer_ performs analysis at query time: OpenSearch analyzes the query string when you run a full-text query on a text field. For more information about ways to specify search analyzers, see [Search analyzers]({{site.url}}{{site.baseurl}}/analyzers/search-analyzers/).
|
||||
|
||||
In most cases, you should use the same analyzer at both indexing and search time because the text field and the query string will be analyzed in the same way and the resulting tokens will match as expected.
|
||||
{: .tip}
|
||||
|
||||
### Example
|
||||
|
||||
When you index a document that has a text field with the text `Actions speak louder than words`, OpenSearch analyzes the text and produces the following list of tokens:
|
||||
|
||||
Text field tokens = [`action`, `speak`, `loud`, `than`, `word`]
|
||||
|
||||
When you search for documents that match the query `speaking loudly`, OpenSearch analyzes the query string and produces the following list of tokens:
|
||||
|
||||
Query string tokens = [`speak`, `loud`]
|
||||
|
||||
Then OpenSearch compares each token in the query string against the list of text field tokens and finds that both lists contain the tokens `speak` and `loud`, so OpenSearch returns this document as part of the search results that match the query.
|
||||
|
||||
## Testing an analyzer
|
||||
|
||||
To test a built-in analyzer and view the list of tokens it generates when a document is indexed, you can use the [Analyze API]({{site.url}}{{site.baseurl}}/api-reference/analyze-apis/#apply-a-built-in-analyzer).
|
||||
|
||||
Specify the analyzer and the text to be analyzed in the request:
|
||||
|
||||
```json
|
||||
GET /_analyze
|
||||
{
|
||||
"analyzer" : "standard",
|
||||
"text" : "Let’s contribute to OpenSearch!"
|
||||
}
|
||||
```
|
||||
{% include copy-curl.html %}
|
||||
|
||||
The following image shows the text string with the index of each character:
|
||||
|
||||
![Query string with indices]({{site.url}}{{site.baseurl}}/images/string-indices.png)
|
||||
|
||||
The response contains each token along with its start and end offsets, which correspond to the token's starting index in the original string (inclusive) and its ending index (exclusive):
|
||||
|
||||
```json
|
||||
{
|
||||
"tokens": [
|
||||
{
|
||||
"token": "let’s",
|
||||
"start_offset": 0,
|
||||
"end_offset": 5,
|
||||
"type": "<ALPHANUM>",
|
||||
"position": 0
|
||||
},
|
||||
{
|
||||
"token": "contribute",
|
||||
"start_offset": 6,
|
||||
"end_offset": 16,
|
||||
"type": "<ALPHANUM>",
|
||||
"position": 1
|
||||
},
|
||||
{
|
||||
"token": "to",
|
||||
"start_offset": 17,
|
||||
"end_offset": 19,
|
||||
"type": "<ALPHANUM>",
|
||||
"position": 2
|
||||
},
|
||||
{
|
||||
"token": "opensearch",
|
||||
"start_offset": 20,
|
||||
"end_offset": 30,
|
||||
"type": "<ALPHANUM>",
|
||||
"position": 3
|
||||
}
|
||||
]
|
||||
}
|
||||
```
|
||||
|
||||
## Verifying analyzer settings
|
||||
|
||||
To verify which analyzer is associated with which field, you can use the get mapping API operation:
|
||||
|
||||
```json
|
||||
GET /testindex/_mapping
|
||||
```
|
||||
{% include copy-curl.html %}
|
||||
|
||||
The response provides information about the analyzers for each field:
|
||||
|
||||
```json
|
||||
{
|
||||
"testindex": {
|
||||
"mappings": {
|
||||
"properties": {
|
||||
"text_entry": {
|
||||
"type": "text",
|
||||
"analyzer": "simple",
|
||||
"search_analyzer": "whitespace"
|
||||
}
|
||||
}
|
||||
}
|
||||
}
|
||||
}
|
||||
```
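
To check how a mapped analyzer actually tokenizes text, you can also pass a field name to the Analyze API so that the analyzer is derived from the field's mapping. The following sketch assumes the `testindex` index and `text_entry` field from the preceding response; the sample text is arbitrary:

```json
GET /testindex/_analyze
{
  "field": "text_entry",
  "text": "Let’s contribute to OpenSearch!"
}
```
{% include copy-curl.html %}

The request analyzes the text with the index analyzer configured for the field (here, `simple`) rather than with an analyzer named in the request.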
|
||||
|
||||
## Next steps
|
||||
|
||||
- Learn more about specifying [index analyzers]({{site.url}}{{site.baseurl}}/analyzers/index-analyzers/) and [search analyzers]({{site.url}}{{site.baseurl}}/analyzers/search-analyzers/).
|
|
@ -1,28 +1,26 @@
|
|||
---
|
||||
layout: default
|
||||
title: Language analyzers
|
||||
nav_order: 45
|
||||
parent: Text analyzers
|
||||
nav_order: 10
|
||||
---
|
||||
|
||||
# Language analyzer
|
||||
|
||||
OpenSearch supports the following language values with the `analyzer` option:
|
||||
arabic, armenian, basque, bengali, brazilian, bulgarian, catalan, czech, danish, dutch, english, estonian, finnish, french, galician, german, greek, hindi, hungarian, indonesian, irish, italian, latvian, lithuanian, norwegian, persian, portuguese, romanian, russian, sorani, spanish, swedish, turkish, and thai.
|
||||
`arabic`, `armenian`, `basque`, `bengali`, `brazilian`, `bulgarian`, `catalan`, `czech`, `danish`, `dutch`, `english`, `estonian`, `finnish`, `french`, `galician`, `german`, `greek`, `hindi`, `hungarian`, `indonesian`, `irish`, `italian`, `latvian`, `lithuanian`, `norwegian`, `persian`, `portuguese`, `romanian`, `russian`, `sorani`, `spanish`, `swedish`, `turkish`, and `thai`.
|
||||
|
||||
To use the analyzer when you map an index, specify the value within your query. For example, to map your index with the French language analyzer, specify the `french` value for the analyzer field:
|
||||
|
||||
```json
|
||||
"analyzer": "french"
|
||||
```
|
||||
```
|
||||
|
||||
#### Example request
|
||||
|
||||
The following query maps an index with the language analyzer set to `french`:
|
||||
The following query specifies the `french` language analyzer for the index `my-index`:
|
||||
|
||||
```json
|
||||
PUT my-index-000001
|
||||
|
||||
PUT my-index
|
||||
{
|
||||
"mappings": {
|
||||
"properties": {
|
||||
|
@ -30,7 +28,7 @@ PUT my-index-000001
|
|||
"type": "text",
|
||||
"fields": {
|
||||
"french": {
|
||||
"type": "text",
|
||||
"type": "text",
|
||||
"analyzer": "french"
|
||||
}
|
||||
}
|
|
@ -0,0 +1,93 @@
|
|||
---
|
||||
layout: default
|
||||
title: Search analyzers
|
||||
nav_order: 30
|
||||
---
|
||||
|
||||
# Search analyzers
|
||||
|
||||
Search analyzers are specified at query time and are used to analyze the query string when you run a full-text query on a [text]({{site.url}}{{site.baseurl}}/field-types/supported-field-types/text/) field.
|
||||
|
||||
## Determining which search analyzer to use
|
||||
|
||||
To determine which analyzer to use for a query string at query time, OpenSearch examines the following parameters in order:
|
||||
|
||||
1. The `analyzer` parameter of the query
|
||||
1. The `search_analyzer` mapping parameter of the field
|
||||
1. The `analysis.analyzer.default_search` index setting
|
||||
1. The `analyzer` mapping parameter of the field
|
||||
1. The `standard` analyzer (default)
|
||||
|
||||
In most cases, specifying a search analyzer that is different from the index analyzer is not necessary and could negatively impact search result relevance or lead to unexpected search results.
|
||||
{: .warning}
|
||||
|
||||
For information about verifying which analyzer is associated with which field, see [Verifying analyzer settings]({{site.url}}{{site.baseurl}}/analyzers/index/#verifying-analyzer-settings).
|
||||
|
||||
## Specifying a search analyzer for a query string
|
||||
|
||||
Specify the name of the analyzer you want to use at query time in the `analyzer` field:
|
||||
|
||||
```json
|
||||
GET shakespeare/_search
|
||||
{
|
||||
"query": {
|
||||
"match": {
|
||||
"text_entry": {
|
||||
"query": "speak the truth",
|
||||
"analyzer": "english"
|
||||
}
|
||||
}
|
||||
}
|
||||
}
|
||||
```
|
||||
{% include copy-curl.html %}
|
||||
|
||||
Valid values for [built-in analyzers]({{site.url}}{{site.baseurl}}/analyzers/index/#built-in-analyzers) are `standard`, `simple`, `whitespace`, `stop`, `keyword`, `pattern`, `fingerprint`, or any supported [language analyzer]({{site.url}}{{site.baseurl}}/analyzers/language-analyzers/).
|
||||
|
||||
## Specifying a search analyzer for a field
|
||||
|
||||
When creating index mappings, you can provide the `search_analyzer` parameter for each [text]({{site.url}}{{site.baseurl}}/field-types/supported-field-types/text/) field. When providing the `search_analyzer`, you must also provide the `analyzer` parameter, which specifies the [index analyzer]({{site.url}}{{site.baseurl}}/analyzers/index-analyzers/) to be used at indexing time.
|
||||
|
||||
For example, the following request specifies the `simple` analyzer as the index analyzer and the `whitespace` analyzer as the search analyzer for the `text_entry` field:
|
||||
|
||||
```json
|
||||
PUT testindex
|
||||
{
|
||||
"mappings": {
|
||||
"properties": {
|
||||
"text_entry": {
|
||||
"type": "text",
|
||||
"analyzer": "simple",
|
||||
"search_analyzer": "whitespace"
|
||||
}
|
||||
}
|
||||
}
|
||||
}
|
||||
```
|
||||
{% include copy-curl.html %}
|
||||
|
||||
## Specifying the default search analyzer for an index
|
||||
|
||||
If you want to analyze all query strings at search time with the same analyzer, you can specify the search analyzer in the `analysis.analyzer.default_search` setting. When providing the `analysis.analyzer.default_search` setting, you must also provide the `analysis.analyzer.default` setting, which specifies the [index analyzer]({{site.url}}{{site.baseurl}}/analyzers/index-analyzers/) to be used at indexing time.
|
||||
|
||||
For example, the following request specifies the `simple` analyzer as the index analyzer and the `whitespace` analyzer as the search analyzer for the `testindex` index:
|
||||
|
||||
```json
PUT testindex
{
  "settings": {
    "analysis": {
      "analyzer": {
        "default": {
          "type": "simple"
        },
        "default_search": {
          "type": "whitespace"
        }
      }
    }
  }
}
```
|
||||
{% include copy-curl.html %}
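
With these settings in place, a full-text query that does not name an analyzer is expected to fall back to `default_search`, provided that neither the query nor the field mapping specifies a search analyzer. A minimal sketch, assuming a hypothetical `text_entry` text field in `testindex`:

```json
GET /testindex/_search
{
  "query": {
    "match": {
      "text_entry": "speak the truth"
    }
  }
}
```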
|
|
@ -1,16 +1,20 @@
|
|||
---
|
||||
layout: default
|
||||
title: Perform text analysis
|
||||
parent: Analyze API
|
||||
|
||||
nav_order: 2
|
||||
title: Analyze API
|
||||
has_children: true
|
||||
nav_order: 7
|
||||
redirect_from:
|
||||
- /opensearch/rest-api/analyze-apis/
|
||||
- /api-reference/analyze-apis/
|
||||
---
|
||||
|
||||
# Perform text analysis
|
||||
# Analyze API
|
||||
|
||||
The perform text analysis API analyzes a text string and returns the resulting tokens.
|
||||
The Analyze API allows you to perform [text analysis]({{site.url}}{{site.baseurl}}/api-reference/analyze-apis/), which is the process of converting unstructured text into individual tokens (usually words) that are optimized for search.
|
||||
|
||||
If you use the Security plugin, you must have the `manage index` privilege. If you simply want to analyze text, you must have the `manager cluster` privilege.
|
||||
The Analyze API analyzes a text string and returns the resulting tokens.
|
||||
|
||||
If you use the Security plugin, you must have the `manage index` privilege. If you only want to analyze text, you must have the `manage cluster` privilege.
|
||||
{: .note}
|
||||
|
||||
## Path and HTTP methods
|
||||
|
@ -22,7 +26,7 @@ POST /_analyze
|
|||
POST /{index}/_analyze
|
||||
```
|
||||
|
||||
Although you can issue an analyzer request via both `GET` and `POST` requests, the two have important distinctions. A `GET` request causes data to be cached in the index so that the next time the data is requested, it is retrieved faster. A `POST` request sends a string that does not already exist to the analyzer to be compared to data that is already in the index. `POST` requests are not cached.
|
||||
Although you can issue an analyze request using both `GET` and `POST` requests, the two have important distinctions. A `GET` request causes data to be cached in the index so that the next time the data is requested, it is retrieved faster. A `POST` request sends a string that does not already exist to the analyzer to be compared with data that is already in the index. `POST` requests are not cached.
|
||||
{: .note}
|
||||
|
||||
## Path parameter
|
|
@ -1,12 +0,0 @@
|
|||
---
|
||||
layout: default
|
||||
title: Analyze API
|
||||
has_children: true
|
||||
nav_order: 7
|
||||
redirect_from:
|
||||
- /opensearch/rest-api/analyze-apis/
|
||||
---
|
||||
|
||||
# Analyze API
|
||||
|
||||
The analyze API allows you to perform text analysis, which is the process of converting unstructured text into individual tokens (usually words) that are optimized for search.
|
|
@ -20,7 +20,7 @@ If needed, you can combine tokenizers, token filters, and character filters to c
|
|||
|
||||
#### Tokenizers
|
||||
|
||||
Tokenizers break unstuctured text into tokens and maintain metadata about tokens, such as their start and ending positions in the text.
|
||||
Tokenizers break unstructured text into tokens and maintain metadata about tokens, such as their starting and ending positions in the text.
|
||||
|
||||
#### Character filters
|
||||
|
||||
|
|
|
@ -18,7 +18,7 @@ This reference includes the REST APIs supported by OpenSearch. If a REST API is
|
|||
|
||||
## Related articles
|
||||
|
||||
- [Analyze API]({{site.url}}{{site.baseurl}}/api-reference/analyze-apis/index/)
|
||||
- [Analyze API]({{site.url}}{{site.baseurl}}/api-reference/analyze-apis/)
|
||||
- [Access control API]({{site.url}}{{site.baseurl}}/security/access-control/api/)
|
||||
- [Alerting API]({{site.url}}{{site.baseurl}}/observing-your-data/alerting/api/)
|
||||
- [Anomaly detection API]({{site.url}}{{site.baseurl}}/observing-your-data/ad/api/)
|
||||
|
@ -43,7 +43,8 @@ This reference includes the REST APIs supported by OpenSearch. If a REST API is
|
|||
- [Point in Time API]({{site.url}}{{site.baseurl}}/search-plugins/point-in-time-api/)
|
||||
- [Popular APIs]({{site.url}}{{site.baseurl}}/api-reference/popular-api/)
|
||||
- [Ranking evaluation]({{site.url}}{{site.baseurl}}/api-reference/rank-eval/)
|
||||
- [Reload search analyzer]({{site.url}}{{site.baseurl}}/api-reference/reload-search-analyzer/)
|
||||
- [Refresh search analyzer]({{site.url}}{{site.baseurl}}/im-plugin/refresh-analyzer/)
|
||||
- [Reload search analyzer]({{site.url}}{{site.baseurl}}/im-plugin/reload-search-analyzer/)
|
||||
- [Remote cluster information]({{site.url}}{{site.baseurl}}/api-reference/remote-info/)
|
||||
- [Root cause analysis API]({{site.url}}{{site.baseurl}}/monitoring-your-cluster/pa/rca/api/)
|
||||
- [Snapshot management API]({{site.url}}{{site.baseurl}}/tuning-your-cluster/availability-and-recovery/snapshots/sm-api/)
|
||||
|
|
14
_config.yml
|
@ -70,9 +70,15 @@ collections:
|
|||
reporting:
|
||||
permalink: /:collection/:path/
|
||||
output: true
|
||||
analyzers:
|
||||
permalink: /:collection/:path/
|
||||
output: true
|
||||
query-dsl:
|
||||
permalink: /:collection/:path/
|
||||
output: true
|
||||
aggregations:
|
||||
permalink: /:collection/:path/
|
||||
output: true
|
||||
field-types:
|
||||
permalink: /:collection/:path/
|
||||
output: true
|
||||
|
@ -133,8 +139,14 @@ just_the_docs:
|
|||
field-types:
|
||||
name: Mappings and field types
|
||||
nav_fold: true
|
||||
analyzers:
|
||||
name: Text analysis
|
||||
nav_fold: true
|
||||
query-dsl:
|
||||
name: Query DSL, Aggregations, and Analyzers
|
||||
name: Query DSL
|
||||
nav_fold: true
|
||||
aggregations:
|
||||
name: Aggregations
|
||||
nav_fold: true
|
||||
search-plugins:
|
||||
name: Search
|
||||
|
|
|
@ -52,7 +52,7 @@ The flat object field type supports the following queries:
|
|||
- [Range]({{site.url}}{{site.baseurl}}/query-dsl/term#range)
|
||||
- [Match]({{site.url}}{{site.baseurl}}/query-dsl/full-text/#match)
|
||||
- [Multi-match]({{site.url}}{{site.baseurl}}/query-dsl/full-text/#multi-match)
|
||||
- [Query string]({{site.url}}{{site.baseurl}}/query-dsl/full-text/#query-string)
|
||||
- [Query string]({{site.url}}{{site.baseurl}}/query-dsl/full-text/query-string/)
|
||||
- [Simple query string]({{site.url}}{{site.baseurl}}/query-dsl/full-text/#simple-query-string)
|
||||
- [Exists]({{site.url}}{{site.baseurl}}/query-dsl/term#exists)
|
||||
|
||||
|
|
|
@ -2,10 +2,9 @@
|
|||
layout: default
|
||||
title: Refresh search analyzer
|
||||
nav_order: 50
|
||||
parent: Text analyzers
|
||||
has_toc: false
|
||||
redirect_from:
|
||||
- /im-plugin/refresh-analyzer/
|
||||
- /query-dsl/analyzers/refresh-analyzer/
|
||||
- /im-plugin/refresh-analyzer/index/
|
||||
---
|
||||
|
|
@ -62,7 +62,7 @@ You've created a new visualization that can be added to a new or existing dashbo
|
|||
|
||||
### Limitations of event analytics visualizations
|
||||
|
||||
Event analytics visualizations currently do not support [Dashboards Query Language (DQL)]({{site.url}}{{site.baseurl}}/dashboards/discover/dql/) or [query domain-specific language (DSL)]({{site.url}}{{site.baseurl}}/query-dsl/), and they do not use index patterns. Note the following limitations:
|
||||
Event analytics visualizations currently do not support [Dashboards Query Language (DQL)]({{site.url}}{{site.baseurl}}/dashboards/discover/dql/) or [query domain-specific language (DSL)]({{site.url}}{{site.baseurl}}/query-dsl/index/), and they do not use index patterns. Note the following limitations:
|
||||
|
||||
- Event analytics visualizations only use filters created using the dropdown interface. If you have DQL query or DSL filters in a dashboard, the visualizations do not use them.
|
||||
- The **Dashboard** filter dropdown interface only shows fields from the default index pattern or index patterns used by other visualizations in the same dashboard.
|
||||
|
|
|
@ -1,18 +0,0 @@
|
|||
---
|
||||
layout: default
|
||||
title: Bucket aggregations
|
||||
parent: Aggregations
|
||||
has_children: true
|
||||
has_toc: true
|
||||
nav_order: 3
|
||||
redirect_from:
|
||||
- /opensearch/bucket-agg/
|
||||
- /query-dsl/aggregations/bucket-agg/
|
||||
- /aggregations/bucket-agg/
|
||||
---
|
||||
|
||||
# Bucket aggregations
|
||||
|
||||
Bucket aggregations categorize sets of documents as buckets. The type of bucket aggregation determines whether a given document falls into a bucket or not.
|
||||
|
||||
You can use bucket aggregations to implement faceted navigation (usually placed as a sidebar on a search result landing page) to help your users narrow down the results.
|
|
@ -1,28 +0,0 @@
|
|||
---
|
||||
layout: default
|
||||
title: Metric aggregations
|
||||
parent: Aggregations
|
||||
has_children: true
|
||||
has_toc: true
|
||||
nav_order: 2
|
||||
redirect_from:
|
||||
- /opensearch/metric-agg/
|
||||
- /query-dsl/aggregations/metric-agg/
|
||||
- /aggregations/metric-agg/
|
||||
---
|
||||
|
||||
# Metric aggregations
|
||||
|
||||
Metric aggregations let you perform simple calculations such as finding the minimum, maximum, and average values of a field.
|
||||
|
||||
## Types of metric aggregations
|
||||
|
||||
Metric aggregations are of two types: single-value metric aggregations and multi-value metric aggregations.
|
||||
|
||||
### Single-value metric aggregations
|
||||
|
||||
Single-value metric aggregations return a single metric. For example, `sum`, `min`, `max`, `avg`, `cardinality`, and `value_count`.
|
||||
|
||||
### Multi-value metric aggregations
|
||||
|
||||
Multi-value metric aggregations return more than one metric. For example, `stats`, `extended_stats`, `matrix_stats`, `percentile`, `percentile_ranks`, `geo_bound`, `top_hits`, and `scripted_metric`.
|
|
@ -1,75 +0,0 @@
|
|||
---
|
||||
layout: default
|
||||
title: Text analyzers
|
||||
nav_order: 190
|
||||
has_children: true
|
||||
permalink: /analyzers/text-analyzers/
|
||||
redirect_from:
|
||||
- /opensearch/query-dsl/text-analyzers/
|
||||
- /query-dsl/analyzers/text-analyzers/
|
||||
---
|
||||
|
||||
|
||||
# Optimizing text for searches with text analyzers
|
||||
|
||||
OpenSearch applies text analysis during indexing or searching for `text` fields. There is a standard analyzer that OpenSearch uses by default for text analysis. To optimize unstructured text for search, you can convert it into structured text with our text analyzers.
|
||||
|
||||
## Text analyzers
|
||||
|
||||
OpenSearch provides several text analyzers to convert your structured text into the format that works best for your searches.
|
||||
|
||||
OpenSearch supports the following text analyzers:
|
||||
|
||||
- **Standard analyzer** – Parses strings into terms at word boundaries according to the Unicode text segmentation algorithm. It removes most, but not all, punctuation and converts strings to lowercase. You can remove stop words if you enable that option, but it does not remove stop words by default.
|
||||
- **Simple analyzer** – Converts strings to lowercase and removes non-letter characters when it splits a string into tokens on any non-letter character.
|
||||
- **Whitespace analyzer** – Parses strings into terms between each whitespace.
|
||||
- **Stop analyzer** – Converts strings to lowercase and removes non-letter characters by splitting strings into tokens at each non-letter character. It also removes stop words (for example, "but" or "this") from strings.
|
||||
- **Keyword analyzer** – Receives a string as input and outputs the entire string as one term.
|
||||
- **Pattern analyzer** – Splits strings into terms using regular expressions and supports converting strings to lowercase. It also supports removing stop words.
|
||||
- **Language analyzer** – Provides analyzers specific to multiple languages.
|
||||
- **Fingerprint analyzer** – Creates a fingerprint to use as a duplicate detector.
|
||||
|
||||
The full specialized text analyzers reference is in progress and will be published soon.
|
||||
{: .note }
|
||||
|
||||
## How to use text analyzers
|
||||
|
||||
If you want to use a text analyzer, specify the name of the analyzer for the `analyzer` field: standard, simple, whitespace, stop, keyword, pattern, fingerprint, or language.
|
||||
|
||||
Each analyzer consists of one tokenizer and zero or more token filters. Different analyzers have different character filters, tokenizers, and token filters. To pre-process the string before the tokenizer is applied, you can use one or more character filters.
|
||||
|
||||
#### Example: Specify the standard analyzer in a simple query
|
||||
|
||||
```json
|
||||
GET _search
|
||||
{
|
||||
"query": {
|
||||
"match": {
|
||||
"title": "A brief history of Time",
|
||||
"analyzer": "standard"
|
||||
}
|
||||
}
|
||||
}
|
||||
```
|
||||
|
||||
## Analyzer options
|
||||
|
||||
Option | Valid values | Description
|
||||
:--- | :--- | :---
|
||||
`analyzer` | `standard, simple, whitespace, stop, keyword, pattern, language, fingerprint` | The analyzer you want to use for the query. Different analyzers have different character filters, tokenizers, and token filters. The `stop` analyzer, for example, removes stop words (for example, "an," "but," "this") from the query string. For a full list of acceptable language values, see [Language analyzer]({{site.url}}{{site.baseurl}}/query-dsl/analyzers/language-analyzers/) on this page.
|
||||
`quote_analyzer` | String | This option lets you choose to use the standard analyzer without any options, such as `language` or other analyzers. Usage is `"quote_analyzer": "standard"`.
|
||||
|
||||
<!-- This is a list of the 7 individual new pages we need to write
|
||||
If you want to select one of the text analyzers, see [Text analyzers reference]({{site.url}}{{site.baseurl}}/opensearch/query-dsl/specialized-analyzers).
|
||||
|
||||
## Specialized text analyzers
|
||||
|
||||
1. Standard analyzer
|
||||
1. Simple
|
||||
1. Whitespace
|
||||
1. Stop
|
||||
1. Keyword
|
||||
1. Pattern
|
||||
1. Language
|
||||
1. Fingerprint
|
||||
-->
|
|
@ -4,10 +4,10 @@ title: Boolean queries
|
|||
parent: Compound queries
|
||||
grand_parent: Query DSL
|
||||
nav_order: 10
|
||||
permalink: /query-dsl/compound/bool/
|
||||
redirect_from:
|
||||
- /opensearch/query-dsl/compound/bool/
|
||||
- /opensearch/query-dsl/bool/
|
||||
- /query-dsl/query-dsl/compound/bool/
|
||||
---
|
||||
|
||||
# Boolean queries
|
|
@ -4,6 +4,8 @@ title: Boosting queries
|
|||
parent: Compound queries
|
||||
grand_parent: Query DSL
|
||||
nav_order: 30
|
||||
redirect_from:
|
||||
- /query-dsl/query-dsl/compound/boosting/
|
||||
---
|
||||
|
||||
# Boosting queries
|
|
@ -4,6 +4,8 @@ title: Constant score queries
|
|||
parent: Compound queries
|
||||
grand_parent: Query DSL
|
||||
nav_order: 40
|
||||
redirect_from:
|
||||
- /query-dsl/query-dsl/compound/constant-score/
|
||||
---
|
||||
|
||||
# Constant score queries
|
|
@ -4,6 +4,8 @@ title: Disjunction max queries
|
|||
parent: Compound queries
|
||||
grand_parent: Query DSL
|
||||
nav_order: 50
|
||||
redirect_from:
|
||||
- /query-dsl/query-dsl/compound/disjunction-max/
|
||||
---
|
||||
|
||||
# Disjunction max queries
|
|
@ -5,6 +5,8 @@ parent: Compound queries
|
|||
grand_parent: Query DSL
|
||||
nav_order: 60
|
||||
has_math: true
|
||||
redirect_from:
|
||||
- /query-dsl/query-dsl/compound/function-score/
|
||||
---
|
||||
|
||||
# Function score queries
|
|
@ -1,12 +1,11 @@
|
|||
---
|
||||
layout: default
|
||||
title: Compound queries
|
||||
parent: Query DSL
|
||||
has_children: true
|
||||
nav_order: 40
|
||||
permalink: /query-dsl/compound/
|
||||
redirect_from:
|
||||
- /opensearch/query-dsl/compound/index/
|
||||
- /query-dsl/query-dsl/compound/
|
||||
---
|
||||
|
||||
# Compound queries
|
|
@ -1,13 +1,13 @@
|
|||
---
|
||||
layout: default
|
||||
title: Full-text queries
|
||||
parent: Query DSL
|
||||
has_children: true
|
||||
nav_order: 30
|
||||
permalink: /query-dsl/full-text/
|
||||
redirect_from:
|
||||
- /opensearch/query-dsl/full-text/
|
||||
- /opensearch/query-dsl/full-text/index/
|
||||
- /query-dsl/query-dsl/full-text/
|
||||
- /query-dsl/full-text/
|
||||
---
|
||||
|
||||
# Full-text queries
|
||||
|
@ -20,9 +20,6 @@ To learn more about search query classes, see [Lucene query JavaDocs](https://lu
|
|||
|
||||
The full-text query types shown in this section use the standard analyzer, which analyzes text automatically when the query is submitted.
|
||||
|
||||
You can also analyze fields when you index them. To learn more about how to convert unstructured text into structured text that is optimized for search, see [Optimizing text for searches with text analyzers]({{site.url}}{{site.baseurl}}/opensearch/query-dsl/text-analyzers).
|
||||
{: .note }
|
||||
|
||||
<!-- to do: rewrite query type definitions per issue: https://github.com/opensearch-project/documentation-website/issues/1116
|
||||
-->
|
||||
---
|
||||
|
@ -428,7 +425,7 @@ GET _search
|
|||
-->
|
||||
## Advanced filter options
|
||||
|
||||
You can filter your query results by using some of the optional query fields, such as wildcards, fuzzy query fields, and synonyms. You can also use analyzers as optional query fields. To learn more, see [How to use text analyzers]({{site.url}}{{site.baseurl}}/opensearch/query-dsl/text-analyzers/#how-to-use-text-analyzers).
|
||||
You can filter your query results by using some of the optional query fields, such as wildcards, fuzzy query fields, or synonyms. You can also use analyzers as optional query fields.
|
||||
|
||||
### Wildcard options
|
||||
|
|
@ -4,9 +4,10 @@ title: Query string queries
|
|||
parent: Full-text queries
|
||||
grand_parent: Query DSL
|
||||
nav_order: 25
|
||||
permalink: /query-dsl/full-text/query-string/
|
||||
|
||||
redirect_from:
|
||||
- /opensearch/query-dsl/full-text/query-string/
|
||||
- /query-dsl/query-dsl/full-text/query-string/
|
||||
---
|
||||
|
||||
# Query string queries
|
|
@ -4,9 +4,9 @@ title: Geo-bounding box queries
|
|||
parent: Geographic and xy queries
|
||||
grand_parent: Query DSL
|
||||
nav_order: 10
|
||||
permalink: /query-dsl/geo-and-xy/geo-bounding-box/
|
||||
redirect_from:
|
||||
- /opensearch/query-dsl/geo-and-xy/geo-bounding-box/
|
||||
- /query-dsl/query-dsl/geo-and-xy/geo-bounding-box/
|
||||
---
|
||||
|
||||
# Geo-bounding box queries
|
|
@ -1,12 +1,12 @@
|
|||
---
|
||||
layout: default
|
||||
title: Geographic and xy queries
|
||||
parent: Query DSL
|
||||
has_children: true
|
||||
nav_order: 50
|
||||
permalink: /query-dsl/geo-and-xy/
|
||||
redirect_from:
|
||||
- /opensearch/query-dsl/geo-and-xy/index/
|
||||
- /query-dsl/query-dsl/geo-and-xy/
|
||||
- /query-dsl/query-dsl/geo-and-xy/index/
|
||||
---
|
||||
|
||||
# Geographic and xy queries
|
|
@ -4,9 +4,10 @@ title: xy queries
|
|||
parent: Geographic and xy queries
|
||||
grand_parent: Query DSL
|
||||
nav_order: 50
|
||||
permalink: /query-dsl/geo-and-xy/xy/
|
||||
|
||||
redirect_from:
|
||||
- /opensearch/query-dsl/geo-and-xy/xy/
|
||||
- /query-dsl/query-dsl/geo-and-xy/xy/
|
||||
---
|
||||
|
||||
# xy queries
|
|
@ -1,16 +1,85 @@
|
|||
---
|
||||
layout: default
|
||||
title: Query DSL, aggregations, and analyzers
|
||||
nav_order: 1
|
||||
has_children: false
|
||||
has_toc: false
|
||||
title: Query DSL
|
||||
nav_order: 2
|
||||
has_children: true
|
||||
nav_exclude: true
|
||||
redirect_from:
|
||||
- /opensearch/query-dsl/
|
||||
- /opensearch/query-dsl/index/
|
||||
- /docs/opensearch/query-dsl/
|
||||
- /query-dsl/query-dsl/
|
||||
- /query-dsl/
|
||||
---
|
||||
|
||||
# Query DSL, aggregations, and analyzers
|
||||
{%- comment -%}The `/docs/opensearch/query-dsl/` redirect is specifically to support the UI links in OpenSearch Dashboards 1.0.0.{%- endcomment -%}
|
||||
|
||||
[Analyzers]({{site.url}}{{site.baseurl}}/analyzers/text-analyzers/) process text to make it searchable. OpenSearch provides various analyzers that let you customize the way text is split into terms and converted into a structured format. To search documents written in a different language, you can use one of the built-in [language analyzers]({{site.url}}{{site.baseurl}}/query-dsl/analyzers/language-analyzers/) for your language of choice.
|
||||
# Query DSL
|
||||
|
||||
The most essential search function is using a query to return relevant documents. OpenSearch provides a search language called _query domain-specific language_ (DSL) that lets you build complex and targeted queries. Explore the [query DSL documentation]({{site.url}}{{site.baseurl}}/query-dsl/) to learn more about the different types of queries OpenSearch supports.
|
||||
OpenSearch provides a search language called *query domain-specific language (DSL)* that you can use to search your data. Query DSL is a flexible language with a JSON interface.
|
||||
|
||||
[Aggregations]({{site.url}}{{site.baseurl}}/aggregations/) let you categorize your data and analyze it to extract statistics. Use cases for aggregations include analyzing data in real time and using OpenSearch Dashboards to create visualizations.
|
||||
With query DSL, you need to specify a query in the `query` parameter of the search. One of the simplest searches in OpenSearch uses the `match_all` query, which matches all documents in an index:
|
||||
|
||||
```json
GET testindex/_search
{
  "query": {
    "match_all": {
    }
  }
}
```
|
||||
|
||||
A query can consist of many query clauses. You can combine query clauses to produce complex queries.
|
||||
|
||||
Broadly, you can classify queries into two categories---*leaf queries* and *compound queries*:
|
||||
|
||||
- **Leaf queries**: Leaf queries search for a specified value in a certain field or fields. You can use leaf queries on their own. They include the following query types:
|
||||
|
||||
- **Full-text queries**: Use full-text queries to search text documents. For an analyzed text field search, full-text queries split the query string into terms using the same analyzer that was used when the field was indexed. For an exact value search, full-text queries look for the specified value without applying text analysis. To learn more, see [Full-text queries]({{site.url}}{{site.baseurl}}/opensearch/query-dsl/full-text/index/).
|
||||
|
||||
- **Term-level queries**: Use term-level queries to search documents for an exact term, such as an ID or value range. Term-level queries do not analyze search terms or sort results by relevance score. To learn more, see [Term-level queries]({{site.url}}{{site.baseurl}}/opensearch/query-dsl/term/).
|
||||
|
||||
- **Geographic and xy queries**: Use geographic queries to search documents that include geographic data. Use xy queries to search documents that include points and shapes in a two-dimensional coordinate system. To learn more, see [Geographic and xy queries]({{site.url}}{{site.baseurl}}/opensearch/query-dsl/geo-and-xy/index).
|
||||
|
||||
- **Joining queries**: Use joining queries to search nested fields or return parent and child documents that match a specific query. Types of joining queries include `nested`, `has_child`, `has_parent`, and `parent_id` queries.
|
||||
|
||||
- **Span queries**: Use span queries to perform precise positional searches. Span queries are low-level, specific queries that provide control over the order and proximity of specified query terms. They are primarily used to search legal documents. To learn more, see [Span queries]({{site.url}}{{site.baseurl}}/opensearch/query-dsl/span-query/).
|
||||
|
||||
- **Specialized queries**: Specialized queries include all other query types (`distance_feature`, `more_like_this`, `percolate`, `rank_feature`, `script`, `script_score`, `wrapper`, and `pinned_query`).
|
||||
|
||||
- **Compound queries**: Compound queries serve as wrappers for multiple leaf or compound clauses, either to combine their results or to modify their behavior. They include the Boolean, disjunction max, constant score, function score, and boosting query types. To learn more, see [Compound queries]({{site.url}}{{site.baseurl}}/opensearch/query-dsl/compound/index/).
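
As a sketch of how the two categories fit together, the following compound `bool` query wraps a full-text leaf query and a term-level leaf query. The index name `testindex` is reused from the earlier example, and the `title` and `status` fields are hypothetical:

```json
GET testindex/_search
{
  "query": {
    "bool": {
      "must": [
        { "match": { "title": "wind" } }
      ],
      "filter": [
        { "term": { "status": "published" } }
      ]
    }
  }
}
```

Here the `match` clause contributes to the relevance score, while the `term` clause in `filter` only includes or excludes documents.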
|
||||
|
||||
## A note on Unicode special characters in text fields
|
||||
|
||||
Because of word boundaries associated with Unicode special characters, the Unicode standard analyzer cannot index a [text field type]({{site.url}}{{site.baseurl}}/opensearch/supported-field-types/text/) value as a whole value when it includes one of these special characters. As a result, a text field value that includes a special character is parsed by the standard analyzer as multiple values separated by the special character, effectively tokenizing the different elements on either side of it. This can lead to unintentional filtering of documents and potentially compromise control over their access.
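
One way to observe this behavior is to pass a value through the Analyze API. The sketch below assumes the built-in `standard` analyzer and uses the illustrative value `User-1`:

```json
GET /_analyze
{
  "analyzer": "standard",
  "text": "User-1"
}
```

The response is expected to list `user` and `1` as separate tokens rather than a single `user-1` term.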
|
||||
|
||||
The following examples illustrate values containing special characters that will be parsed improperly by the standard analyzer. In these examples, the hyphen/minus sign in each value prevents the analyzer from distinguishing between the two different users for `user.id`, so it interprets them as one and the same:
|
||||
|
||||
```json
{
  "bool": {
    "must": {
      "match": {
        "user.id": "User-1"
      }
    }
  }
}
```
|
||||
|
||||
```json
{
  "bool": {
    "must": {
      "match": {
        "user.id": "User-2"
      }
    }
  }
}
```
|
||||
|
||||
To avoid this circumstance when using either query DSL or the REST API, you can use a custom analyzer or map the field as `keyword`, which performs an exact-match search. See [Keyword field type]({{site.url}}{{site.baseurl}}/opensearch/supported-field-types/keyword/) for the latter option.
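
For the `keyword` option, a mapping along the following lines is one possibility. This is a sketch in which the index name `users-index` is hypothetical and `user.id` is written as an object field `user` with an `id` subfield:

```json
PUT users-index
{
  "mappings": {
    "properties": {
      "user": {
        "properties": {
          "id": {
            "type": "keyword"
          }
        }
      }
    }
  }
}
```

With such a mapping, `User-1` and `User-2` are indexed as whole, distinct values, so a query on `user.id` can tell them apart.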
|
||||
|
||||
For a list of characters that should be avoided when using `text` field types, see [Word Boundaries](https://unicode.org/reports/tr29/#Word_Boundaries).
|
|
@ -1,8 +1,9 @@
|
|||
---
|
||||
layout: default
|
||||
title: Minimum should match
|
||||
parent: Query DSL
|
||||
nav_order: 70
|
||||
redirect_from:
|
||||
- /query-dsl/query-dsl/minimum-should-match/
|
||||
---
|
||||
|
||||
# Minimum should match
|
|
@ -1,83 +0,0 @@
|
|||
---
|
||||
layout: default
|
||||
title: Query DSL
|
||||
nav_order: 2
|
||||
has_children: true
|
||||
permalink: /query-dsl/
|
||||
redirect_from:
|
||||
- /opensearch/query-dsl/
|
||||
- /opensearch/query-dsl/index/
|
||||
- /docs/opensearch/query-dsl/
|
||||
---
|
||||
|
||||
{%- comment -%}The `/docs/opensearch/query-dsl/` redirect is specifically to support the UI links in OpenSearch Dashboards 1.0.0.{%- endcomment -%}
|
||||
|
||||
# Query DSL
|
||||
|
||||
OpenSearch provides a search language called *query domain-specific language (DSL)* that you can use to search your data. Query DSL is a flexible language with a JSON interface.
|
||||
|
||||
With query DSL, you need to specify a query in the `query` parameter of the search. One of the simplest searches in OpenSearch uses the `match_all` query, which matches all documents in an index:
|
||||
|
||||
```json
|
||||
GET testindex/_search
|
||||
{
|
||||
"query": {
|
||||
"match_all": {
|
||||
}
|
||||
}
|
||||
}
|
||||
```
|
||||
|
||||
A query can consist of many query clauses. You can combine query clauses to produce complex queries.
|
||||
|
||||
Broadly, you can classify queries into two categories---*leaf queries* and *compound queries*:
|
||||
|
||||
- **Leaf queries**: Leaf queries search for a specified value in a certain field or fields. You can use leaf queries on their own. They include the following query types:
|
||||
|
||||
- **Full-text queries**: Use full-text queries to search text documents. For an analyzed text field search, full-text queries split the query string into terms with the same analyzer that was used when the field was indexed. For an exact value search, full-text queries look for the specified value without applying text analysis. To learn more, see [Full-text queries]({{site.url}}{{site.baseurl}}/opensearch/query-dsl/full-text/index).
|
||||
|
||||
- **Term-level queries**: Use term-level queries to search documents for an exact specified term, such as an ID or value range. Term-level queries do not analyze search terms or sort results by relevance score. To learn more, see [Term-level queries]({{site.url}}{{site.baseurl}}/opensearch/query-dsl/term/).
|
||||
|
||||
- **Geographic and xy queries**: Use geographic queries to search documents that include geographic data. Use xy queries to search documents that include points and shapes in a two-dimensional coordinate system. To learn more, see [Geographic and xy queries]({{site.url}}{{site.baseurl}}/opensearch/query-dsl/geo-and-xy/index).
|
||||
|
||||
- **Joining queries**: Use joining queries to search nested fields or return parent and child documents that match a specific query. Types of joining queries include `nested`, `has_child`, `has_parent`, and `parent_id` queries.
|
||||
|
||||
- **Span queries**: Use span queries to perform precise positional searches. Span queries are low-level, specific queries that provide control over the order and proximity of specified query terms. They are primarily used to search legal documents. To learn more, see [Span queries]({{site.url}}{{site.baseurl}}/opensearch/query-dsl/span-query/).
|
||||
|
||||
- **Specialized queries**: Specialized queries include all other query types (`distance_feature`, `more_like_this`, `percolate`, `rank_feature`, `script`, `script_score`, `wrapper`, and `pinned_query`).
|
||||
|
||||
- **Compound queries**: Compound queries serve as wrappers for multiple leaf or compound clauses either to combine their results or to modify their behavior. They include the Boolean, disjunction max, constant score, function score, and boosting query types. To learn more, see [Compound queries]({{site.url}}{{site.baseurl}}/opensearch/query-dsl/compound/index).
|
||||
|
||||
## A note on Unicode special characters in text fields
|
||||
|
||||
Because of word boundaries associated with Unicode special characters, the Unicode standard analyzer cannot index a [text field type]({{site.url}}{{site.baseurl}}/opensearch/supported-field-types/text/) value as a whole value when it includes one of these special characters. As a result, a text field value that includes a special character is parsed by the standard analyzer as multiple values separated by the special character, effectively tokenizing the different elements on either side of it. This can lead to unintentional filtering of documents and potentially compromise control over their access.
|
||||
|
||||
The following examples illustrate values containing special characters that will be parsed improperly by the standard analyzer. In this example, the existence of the hyphen/minus sign in the value prevents the analyzer from distinguishing between the two different users for `user.id` and interprets them as one and the same:
|
||||
|
||||
```json
|
||||
{
|
||||
"bool": {
|
||||
"must": {
|
||||
"match": {
|
||||
"user.id": "User-1"
|
||||
}
|
||||
}
|
||||
}
|
||||
}
|
||||
```
|
||||
|
||||
```json
|
||||
{
|
||||
"bool": {
|
||||
"must": {
|
||||
"match": {
|
||||
"user.id": "User-2"
|
||||
}
|
||||
}
|
||||
}
|
||||
}
|
||||
```
|
||||
|
||||
To avoid this circumstance when using either query DSL or the REST API, you can use a custom analyzer or map the field as `keyword`, which performs an exact-match search. See [Keyword field type]({{site.url}}{{site.baseurl}}/opensearch/supported-field-types/keyword/) for the latter option.
|
||||
|
||||
For a list of characters that should be avoided for `text` field types, see [Word Boundaries](https://unicode.org/reports/tr29/#Word_Boundaries).
|
|
@ -1,9 +1,9 @@
|
|||
---
|
||||
layout: default
|
||||
title: Query and filter context
|
||||
parent: Query DSL
|
||||
permalink: /query-dsl/query-filter-context/
|
||||
nav_order: 5
|
||||
redirect_from:
|
||||
- /query-dsl/query-dsl/query-filter-context/
|
||||
---
|
||||
|
||||
# Query and filter context
|
|
@ -1,11 +1,10 @@
|
|||
---
|
||||
layout: default
|
||||
title: Span queries
|
||||
parent: Query DSL
|
||||
nav_order: 60
|
||||
permalink: /query-dsl/span-query/
|
||||
redirect_from:
|
||||
- /opensearch/query-dsl/span-query/
|
||||
- /query-dsl/query-dsl/span-query/
|
||||
---
|
||||
|
||||
# Span queries
|
|
@ -1,9 +1,9 @@
|
|||
---
|
||||
layout: default
|
||||
title: Term-level and full-text queries compared
|
||||
parent: Query DSL
|
||||
permalink: /query-dsl/term-vs-full-text/
|
||||
nav_order: 10
|
||||
redirect_from:
|
||||
- /query-dsl/query-dsl/term-vs-full-text
|
||||
---
|
||||
|
||||
# Term-level and full-text queries compared
|
|
@ -1,11 +1,10 @@
|
|||
---
|
||||
layout: default
|
||||
title: Term-level queries
|
||||
parent: Query DSL
|
||||
nav_order: 20
|
||||
permalink: /query-dsl/term/
|
||||
redirect_from:
|
||||
- /opensearch/query-dsl/term/
|
||||
- /query-dsl/query-dsl/term/
|
||||
---
|
||||
|
||||
# Term-level queries
|
|
@ -11,7 +11,7 @@ nav_exclude: true
|
|||
|
||||
OpenSearch provides several features for customizing your search use cases and improving search relevance. In OpenSearch, you can:
|
||||
|
||||
- Use [SQL and Piped Processing Language (PPL)]({{site.url}}{{site.baseurl}}/search-plugins/sql/index/) as alternatives to [query domain-specific language (DSL)]({{site.url}}{{site.baseurl}}/query-dsl/) to search data.
|
||||
- Use [SQL and Piped Processing Language (PPL)]({{site.url}}{{site.baseurl}}/search-plugins/sql/index/) as alternatives to [query domain-specific language (DSL)]({{site.url}}{{site.baseurl}}/query-dsl/index/) for searching data.
|
||||
|
||||
- Run resource-intensive queries asynchronously with [asynchronous search]({{site.url}}{{site.baseurl}}/search-plugins/async/).
|
||||
|
||||
|
|
Binary file not shown.
After Width: | Height: | Size: 15 KiB |