OpenSearch/docs/reference/analysis/analyzers/standard-analyzer.asciidoc

[[analysis-standard-analyzer]]
=== Standard Analyzer

The `standard` analyzer is the default analyzer which is used if none is
specified. It provides grammar based tokenization (based on the Unicode Text
Segmentation algorithm, as specified in
http://unicode.org/reports/tr29/[Unicode Standard Annex #29]) and works well
for most languages.

[float]
=== Definition

It consists of:

Tokenizer::
* <<analysis-standard-tokenizer,Standard Tokenizer>>

Token Filters::
* <<analysis-standard-tokenfilter,Standard Token Filter>>
* <<analysis-lowercase-tokenfilter,Lower Case Token Filter>>
* <<analysis-stop-tokenfilter,Stop Token Filter>> (disabled by default)

[float]
=== Example output

[source,js]
---------------------------
POST _analyze
{
  "analyzer": "standard",
  "text": "The 2 QUICK Brown-Foxes jumped over the lazy dog's bone."
}
---------------------------
// CONSOLE

The above sentence would produce the following terms:

[source,text]
---------------------------
[ the, 2, quick, brown, foxes, jumped, over, the, lazy, dog's, bone ]
---------------------------

[float]
=== Configuration

The `standard` analyzer accepts the following parameters:

[horizontal]
`max_token_length`::

    The maximum token length. If a token is seen that exceeds this length then
    it is split at `max_token_length` intervals. Defaults to `255`.

`stopwords`::

    A pre-defined stop words list like `_english_` or an array  containing a
    list of stop words.  Defaults to `_none_`.

`stopwords_path`::

    The path to a file containing stop words.

See the <<analysis-stop-tokenfilter,Stop Token Filter>> for more information
about stop word configuration.


[float]
=== Example configuration

In this example, we configure the `standard` analyzer to have a
`max_token_length` of 5 (for demonstration purposes), and to use the
pre-defined list of English stop words:

[source,js]
----------------------------
PUT my_index
{
  "settings": {
    "analysis": {
      "analyzer": {
        "my_english_analyzer": {
          "type": "standard",
          "max_token_length": 5,
          "stopwords": "_english_"
        }
      }
    }
  }
}

GET _cluster/health?wait_for_status=yellow

POST my_index/_analyze
{
  "analyzer": "my_english_analyzer",
  "text": "The 2 QUICK Brown-Foxes jumped over the lazy dog's bone."
}
----------------------------
// CONSOLE

The above example produces the following terms:

[source,text]
---------------------------
[ 2, quick, brown, foxes, jumpe, d, over, lazy, dog's, bone ]
---------------------------