add editorial changes for 1376-text analyzers (#1577)
* test new DCO bypass
* for dco auto sign test
Signed-off-by: alicejw <alicejw@amazon.com>
* test dco check
Signed-off-by: alicejw <alicejw@amazon.com>
* for new analyzers page
Signed-off-by: alicejw <alicejw@amazon.com>
* test dco check after pull from main
Signed-off-by: alicejw <alicejw@amazon.com>
* for new text analyzers page
Signed-off-by: alicejw <alicejw@amazon.com>
* remove lang analyzers section from fulltext page, add link to new page text analyzers
Signed-off-by: alicejw <alicejw@amazon.com>
* rename page to text-analyzers
Signed-off-by: alicejw <alicejw@amazon.com>
* rmv test text for DCO check
Signed-off-by: alicejw <alicejw@amazon.com>
* for querydsl analyzers
Signed-off-by: alicejw <alicejw@amazon.com>
* for note about other 7 analyzer sections to be published soon
Signed-off-by: alicejw <alicejw@amazon.com>
* for definitions of 7 specialized analyzers and note that full reference is in-progress to be published soon
Signed-off-by: alicejw <alicejw@amazon.com>
* add note to learn more and point to concepts page
Signed-off-by: alicejw <alicejw@amazon.com>
* for peer edit comments
Signed-off-by: alicejw <alicejw@amazon.com>
* add new line
Signed-off-by: alicejw <alicejw@amazon.com>
* remove specialized modifier for the text analyzers
Signed-off-by: alicejw <alicejw@amazon.com>
* doc review comments
Signed-off-by: alicejw <alicejw@amazon.com>
* change title
Signed-off-by: alicejw <alicejw@amazon.com>
* better page title
Signed-off-by: alicejw <alicejw@amazon.com>
* for editorial review updates
Signed-off-by: alicejw <alicejw@amazon.com>
Signed-off-by: alicejw <alicejw@amazon.com>
parent 7445690c93
commit db749a60e8
@@ -16,23 +16,24 @@ OpenSearch provides several text analyzers to convert your structured text into

OpenSearch supports the following text analyzers:

1. **Standard analyzer** – parses strings into terms at word boundaries per the Unicode text segmentation algorithm. It removes most, but not all punctuation. It converts strings to lowercase. You can remove stop words if you turn on that option, but it does not remove stop words by default.
1. **Simple analyzer** – converts strings to lowercase, and removes non-letter characters when it splits a string into tokens on any non-letter character.
1. **Whitespace analyzer** – parses strings into terms between each whitespace.
1. **Stop analyzer** – Converts strings to lowercase and removes non-letter characters by splitting strings into tokens at each non-letter character. It also removes stop words (e.g. "but," or "this") from strings.
1. **Keyword analyzer** – receives a string as input and outputs the entire string as one term.
1. **Pattern analyzer** – splits strings into terms using regular expressions and supports converting strings to lowercase. It also supports removing stop words.
1. **Language analyzer** – provides analyzers specific to multiple languages.
1. **Fingerprint analyzer** – creates a fingerprint to use as a duplicate detector.
1. **Standard analyzer** – Parses strings into terms at word boundaries per the Unicode text segmentation algorithm. It removes most, but not all, punctuation. It converts strings to lowercase. You can remove stop words if you turn on that option, but it does not remove stop words by default.
1. **Simple analyzer** – Converts strings to lowercase and removes non-letter characters when it splits a string into tokens on any non-letter character.
1. **Whitespace analyzer** – Parses strings into terms between each whitespace.
1. **Stop analyzer** – Converts strings to lowercase and removes non-letter characters by splitting strings into tokens at each non-letter character. It also removes stop words (e.g., "but" or "this") from strings.
1. **Keyword analyzer** – Receives a string as input and outputs the entire string as one term.
1. **Pattern analyzer** – Splits strings into terms using regular expressions and supports converting strings to lowercase. It also supports removing stop words.
1. **Language analyzer** – Provides analyzers specific to multiple languages.
1. **Fingerprint analyzer** – Creates a fingerprint to use as a duplicate detector.

The full specialized text analyzers reference is in progress and will be published soon.

The full specialized text analyzers reference is in-progress and will be published soon.
{: .note }

## How to use text analyzers

If you want to use a text analyzer, specify the name of the analyzer for the `analyzer` field: standard, simple, whitespace, stop, keyword, pattern, fingerprint, and language.

If you want to use a text analyzer, specify the name of the analyzer for the `analyzer` field: standard, simple, whitespace, stop, keyword, pattern, fingerprint, or language.
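
As a rough illustration of naming an analyzer in the `analyzer` field, the sketch below builds a JSON request body of the kind accepted by the `_analyze` API. The helper `build_analyze_request` and the name check are hypothetical and are not part of this commit or of any OpenSearch client. The `language` entry is left out of the check on the assumption that it stands for a family of per-language analyzers and is not a literal analyzer name.

```python
import json

# Analyzer names taken from the paragraph above. "language" is omitted on the
# assumption that it denotes a family of analyzers, not one literal name.
BUILT_IN_ANALYZERS = {
    "standard", "simple", "whitespace", "stop",
    "keyword", "pattern", "fingerprint",
}


def build_analyze_request(analyzer: str, text: str) -> str:
    """Return a JSON body that names the analyzer in the `analyzer` field.

    Hypothetical helper for illustration only.
    """
    if analyzer not in BUILT_IN_ANALYZERS:
        raise ValueError(f"unknown analyzer: {analyzer}")
    return json.dumps({"analyzer": analyzer, "text": text})


# Build a body that asks for the standard analyzer.
body = build_analyze_request("standard", "OpenSearch text analyzers")
print(body)
```

The resulting string could be sent as the body of an `_analyze` request. How the request is sent (client library, endpoint, authentication) is outside the scope of this sketch.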

Each analyzer consists of one tokenizer and zero or more token filters. Different analyzers have different character filters, tokenizers and token filters. To pre-process the string before the tokenizer is applied, you can use one or more character filters.

Each analyzer consists of one tokenizer and zero or more token filters. Different analyzers have different character filters, tokenizers, and token filters. To pre-process the string before the tokenizer is applied, you can use one or more character filters.
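
The composition described above (optional character filters, then exactly one tokenizer, then zero or more token filters) can be sketched in plain Python. This is a toy model under stated assumptions, not how OpenSearch implements analysis: every function name here is hypothetical, and the stop word set is just the two examples mentioned in the list of analyzers.

```python
import re
from typing import Callable, List

CharFilter = Callable[[str], str]
Tokenizer = Callable[[str], List[str]]
TokenFilter = Callable[[List[str]], List[str]]


def analyze(text: str,
            char_filters: List[CharFilter],
            tokenizer: Tokenizer,
            token_filters: List[TokenFilter]) -> List[str]:
    """Apply character filters, then one tokenizer, then token filters."""
    for char_filter in char_filters:
        text = char_filter(text)          # pre-process the raw string
    tokens = tokenizer(text)              # split the string into tokens
    for token_filter in token_filters:
        tokens = token_filter(tokens)     # transform or drop tokens
    return tokens


def strip_html(text: str) -> str:
    """Toy character filter: replace anything that looks like a tag."""
    return re.sub(r"<[^>]+>", " ", text)


def letter_tokenizer(text: str) -> List[str]:
    """Toy tokenizer: split on any non-letter character (ASCII only)."""
    return re.findall(r"[A-Za-z]+", text)


def lowercase(tokens: List[str]) -> List[str]:
    """Toy token filter: convert every token to lowercase."""
    return [token.lower() for token in tokens]


STOP_WORDS = {"but", "this"}  # illustrative only


def remove_stop_words(tokens: List[str]) -> List[str]:
    """Toy token filter: drop tokens found in STOP_WORDS."""
    return [token for token in tokens if token not in STOP_WORDS]


tokens = analyze(
    "<b>This</b> is fast, but SIMPLE.",
    [strip_html],
    letter_tokenizer,
    [lowercase, remove_stop_words],
)
print(tokens)  # ['is', 'fast', 'simple']
```

Swapping the tokenizer or the list of token filters changes the output without touching `analyze`, which loosely mirrors how the analyzers in the list differ from one another.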

#### Example: Specify the standard analyzer in a simple query