OpenSearch/docs/reference/analysis/testing.asciidoc

== Testing analyzers

The <<indices-analyze,`analyze` API>> is an invaluable tool for viewing the
terms produced by an analyzer. A built-in analyzer (or combination of built-in
tokenizer, token filters, and character filters) can be specified inline in
the request:

[source,js]
-------------------------------------
POST _analyze
{
  "analyzer": "whitespace",
  "text":     "The quick brown fox."
}

POST _analyze
{
  "tokenizer": "standard",
  "filter":  [ "lowercase", "asciifolding" ],
  "text":      "Is this déja vu?"
}
-------------------------------------
// CONSOLE


.Positions and character offsets
*********************************************************

As can be seen from the output of the `analyze` API, analyzers not only
convert words into terms, they also record the order or relative _positions_
of each term (used for phrase queries or word proximity queries), and the
start and end _character offsets_ of each term in the original text (used for
highlighting search snippets).

*********************************************************


Alternatively, a <<analysis-custom-analyzer,`custom` analyzer>> can be
referred to when running the `analyze` API on a specific index:

[source,js]
-------------------------------------
PUT my_index?include_type_name=true
{
  "settings": {
    "analysis": {
      "analyzer": {
        "std_folded": { <1>
          "type": "custom",
          "tokenizer": "standard",
          "filter": [
            "lowercase",
            "asciifolding"
          ]
        }
      }
    }
  },
  "mappings": {
    "_doc": {
      "properties": {
        "my_text": {
          "type": "text",
          "analyzer": "std_folded" <2>
        }
      }
    }
  }
}

GET my_index/_analyze <3>
{
  "analyzer": "std_folded", <4>
  "text":     "Is this déjà vu?"
}

GET my_index/_analyze <3>
{
  "field": "my_text", <5>
  "text":  "Is this déjà vu?"
}
-------------------------------------
// CONSOLE

<1> Define a `custom` analyzer called `std_folded`.
<2> The field `my_text` uses the `std_folded` analyzer.
<3> To refer to this analyzer, the `analyze` API must specify the index name.
<4> Refer to the analyzer by name.
<5> Refer to the analyzer used by field `my_text`.