kolchfa-aws 5abc22147c
Refactor the Query DSL section (#2904)
2023-02-15 17:12:50 -05:00


layout: default
title: Text analyzers
parent: Query DSL
nav_order: 75

Optimizing text for searches with text analyzers

OpenSearch applies text analysis to text fields at index time and at search time. By default, it uses the standard analyzer. To optimize unstructured text for search, you can use a text analyzer to convert it into structured terms.

Text analyzers

OpenSearch provides several text analyzers that convert unstructured text into the format that works best for your searches.

OpenSearch supports the following text analyzers:

  • Standard analyzer: Parses strings into terms at word boundaries according to the Unicode text segmentation algorithm. It removes most, but not all, punctuation and converts strings to lowercase. It does not remove stop words by default, but you can enable stop word removal as an option.
  • Simple analyzer: Splits strings into tokens at each non-letter character, discarding the non-letter characters, and converts the tokens to lowercase.
  • Whitespace analyzer: Splits strings into terms at whitespace characters.
  • Stop analyzer: Splits strings into tokens at each non-letter character, discarding the non-letter characters, and converts the tokens to lowercase. It also removes stop words (for example, "but" or "this").
  • Keyword analyzer: Receives a string as input and outputs the entire string as one term.
  • Pattern analyzer: Splits strings into terms using a regular expression. It supports converting strings to lowercase and removing stop words.
  • Language analyzer: Provides analyzers specific to multiple languages.
  • Fingerprint analyzer: Creates a fingerprint to use as a duplicate detector.
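To make the differences in the list above concrete, the following Python sketch approximates four of the analyzers. It is an illustration only, not OpenSearch code: the function names are hypothetical, only ASCII letters are treated as letters, and `STOP_WORDS` is a small made-up subset rather than the stop word list OpenSearch uses.

```python
import re

# Illustrative subset only; NOT the stop word list that OpenSearch uses.
STOP_WORDS = {"a", "an", "but", "of", "the", "this"}


def whitespace_analyzer(text):
    """Split at runs of whitespace; case and punctuation are kept."""
    return text.split()


def simple_analyzer(text):
    """Split at every non-letter character, then lowercase each token.

    Assumption: only ASCII letters count as letters in this sketch.
    """
    return [token.lower() for token in re.findall(r"[A-Za-z]+", text)]


def stop_analyzer(text):
    """Behave like the simple analyzer, then drop stop words."""
    return [token for token in simple_analyzer(text) if token not in STOP_WORDS]


def keyword_analyzer(text):
    """Return the whole input as a single term."""
    return [text]


sample = "A brief history of Time, 2nd ed."
for analyzer in (whitespace_analyzer, simple_analyzer, stop_analyzer, keyword_analyzer):
    print(analyzer.__name__, analyzer(sample))
```

Note how the whitespace sketch keeps `Time,` with its comma, while the simple sketch reduces `2nd` to `nd` because digits are not letters.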

The full specialized text analyzers reference is in progress and will be published soon. {: .note }

How to use text analyzers

To use a text analyzer, specify its name in the analyzer field: standard, simple, whitespace, stop, keyword, pattern, fingerprint, or the name of a language (for example, french).

Each analyzer consists of one tokenizer and zero or more token filters. Different analyzers have different character filters, tokenizers, and token filters. To pre-process the string before the tokenizer is applied, you can use one or more character filters.
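The three stages described above can be sketched as a small pipeline. This is a conceptual model under stated assumptions, not the OpenSearch implementation: the `Analyzer` class, `strip_html`, and `lowercase` are hypothetical names introduced for illustration.

```python
import re
from dataclasses import dataclass, field
from typing import Callable, List

CharFilter = Callable[[str], str]
Tokenizer = Callable[[str], List[str]]
TokenFilter = Callable[[List[str]], List[str]]


@dataclass
class Analyzer:
    """Hypothetical model: character filters, one tokenizer, token filters."""
    tokenizer: Tokenizer
    char_filters: List[CharFilter] = field(default_factory=list)
    token_filters: List[TokenFilter] = field(default_factory=list)

    def analyze(self, text: str) -> List[str]:
        # 1. Character filters pre-process the raw string.
        for char_filter in self.char_filters:
            text = char_filter(text)
        # 2. Exactly one tokenizer splits the string into tokens.
        tokens = self.tokenizer(text)
        # 3. Token filters transform the token stream in order.
        for token_filter in self.token_filters:
            tokens = token_filter(tokens)
        return tokens


def strip_html(text: str) -> str:
    """Example character filter: remove anything that looks like a tag."""
    return re.sub(r"<[^>]+>", "", text)


def lowercase(tokens: List[str]) -> List[str]:
    """Example token filter: lowercase every token."""
    return [token.lower() for token in tokens]


analyzer = Analyzer(
    tokenizer=str.split,          # whitespace tokenizer
    char_filters=[strip_html],
    token_filters=[lowercase],
)
result = analyzer.analyze("<b>Brief</b> History of Time")
print(result)
```

The ordering is the point of the sketch: character filters see the whole string, while token filters only ever see tokens that the tokenizer has already produced.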

Example: Specify the standard analyzer in a simple query

GET _search
{
  "query": {
    "match": {
      "title": {
        "query": "A brief history of Time",
        "analyzer": "standard"
      }
    }
  }
}
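If you build request bodies in code, note that a match query takes the analyzer inside the field's object, next to the query text. A minimal sketch follows; `build_match_query` is a hypothetical helper, not part of any OpenSearch client, and sending the body to a cluster is left out.

```python
import json


def build_match_query(field_name, text, analyzer=None):
    """Return a _search request body for a match query on one field.

    Hypothetical helper for illustration. The analyzer, if given, is
    placed inside the field object alongside the query text.
    """
    clause = {"query": text}
    if analyzer is not None:
        clause["analyzer"] = analyzer
    return {"query": {"match": {field_name: clause}}}


body = build_match_query("title", "A brief history of Time", analyzer="standard")
print(json.dumps(body, indent=2))
```

Omitting the `analyzer` argument leaves the choice of analyzer to the field mapping and index defaults.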

Analyzer options

| Option | Valid values | Description |
| :--- | :--- | :--- |
| analyzer | standard, simple, whitespace, stop, keyword, pattern, language, fingerprint | The analyzer you want to use for the query. Different analyzers have different character filters, tokenizers, and token filters. The stop analyzer, for example, removes stop words (for example, "an," "but," "this") from the query string. For a full list of acceptable language values, see Language analyzer on this page. |
| quote_analyzer | String | The analyzer to use for quoted text in the query string, for example, in a query_string query. For example, "quote_analyzer": "standard". |

Language analyzer

OpenSearch supports the following language values with the analyzer option: arabic, armenian, basque, bengali, brazilian, bulgarian, catalan, czech, danish, dutch, english, estonian, finnish, french, galician, german, greek, hindi, hungarian, indonesian, irish, italian, latvian, lithuanian, norwegian, persian, portuguese, romanian, russian, sorani, spanish, swedish, turkish, and thai.

To use a language analyzer when you map an index, specify the language value in the field mapping. For example, to map a field with the French language analyzer, specify the french value for the analyzer field:

 "analyzer": "french"

Sample request

The following request creates an index with a mapping that sets the language analyzer to french:

PUT my-index-000001
{
  "mappings": {
    "properties": {
      "text": { 
        "type": "text",
        "fields": {
          "french": { 
            "type":     "text",
            "analyzer": "french"
          }
        }
      }
    }
  }
}
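The same mapping can be generated for any supported language. The sketch below validates the language value against the list on this page before building the request body; `language_subfield_mapping` and `LANGUAGE_ANALYZERS` are hypothetical names, and the helper only builds a dictionary without contacting a cluster.

```python
# Language values copied from the list on this page.
LANGUAGE_ANALYZERS = frozenset({
    "arabic", "armenian", "basque", "bengali", "brazilian", "bulgarian",
    "catalan", "czech", "danish", "dutch", "english", "estonian", "finnish",
    "french", "galician", "german", "greek", "hindi", "hungarian",
    "indonesian", "irish", "italian", "latvian", "lithuanian", "norwegian",
    "persian", "portuguese", "romanian", "russian", "sorani", "spanish",
    "swedish", "turkish", "thai",
})


def language_subfield_mapping(field_name, language):
    """Build an index-creation body with a language-analyzed subfield.

    Hypothetical helper: the subfield is named after the language,
    mirroring the sample request above.
    """
    if language not in LANGUAGE_ANALYZERS:
        raise ValueError(f"unsupported language analyzer: {language!r}")
    return {
        "mappings": {
            "properties": {
                field_name: {
                    "type": "text",
                    "fields": {
                        language: {"type": "text", "analyzer": language},
                    },
                }
            }
        }
    }


mapping = language_subfield_mapping("text", "french")
print(mapping)
```

Validating the value locally gives an early, readable error for a typo such as `frnch` instead of a rejected request.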