OpenSearch/docs/reference/analysis/testing.asciidoc

== Testing analyzers

The <<indices-analyze,`analyze` API>> is an invaluable tool for viewing the
terms produced by an analyzer. A built-in analyzer (or combination of built-in
tokenizer, token filters, and character filters) can be specified inline in
the request:

[source,js]
-------------------------------------
POST _analyze
{
  "analyzer": "whitespace",
  "text":     "The quick brown fox."
}

POST _analyze
{
  "tokenizer": "standard",
  "filter":  [ "lowercase", "asciifolding" ],
  "text":      "Is this déja vu?"
}
-------------------------------------
// CONSOLE


.Positions and character offsets
*********************************************************

As can be seen from the output of the `analyze` API, analyzers not only
convert words into terms, they also record the order or relative _positions_
of each term (used for phrase queries or word proximity queries), and the
start and end _character offsets_ of each term in the original text (used for
highlighting search snippets).

*********************************************************


Alternatively, a <<analysis-custom-analyzer,`custom` analyzer>> can be
referred to when running the `analyze` API on a specific index:

[source,js]
-------------------------------------
PUT my_index
{
  "settings": {
    "analysis": {
      "analyzer": {
        "std_folded": { <1>
          "type": "custom",
          "tokenizer": "standard",
          "filter": [
            "lowercase",
            "asciifolding"
          ]
        }
      }
    }
  },
  "mappings": {
    "my_type": {
      "properties": {
        "my_text": {
          "type": "text",
          "analyzer": "std_folded" <2>
        }
      }
    }
  }
}

GET my_index/_analyze <3>
{
  "analyzer": "std_folded", <4>
  "text":     "Is this déjà vu?"
}

GET my_index/_analyze <3>
{
  "field": "my_text", <5>
  "text":  "Is this déjà vu?"
}
-------------------------------------
// CONSOLE

<1> Define a `custom` analyzer called `std_folded`.
<2> The field `my_text` uses the `std_folded` analyzer.
<3> To refer to this analyzer, the `analyze` API must specify the index name.
<4> Refer to the analyzer by name.
<5> Refer to the analyzer used by field `my_text`.
First pass at improving analyzer docs (#18269) * Docs: First pass at improving analyzer docs I've rewritten the intro to analyzers plus the docs for all analyzers to provide working examples. I've also removed: * analyzer aliases (see #18244) * analyzer versions (see #18267) * snowball analyzer (see #8690) Next steps will be tokenizers, token filters, char filters * Fixed two typos 2016-05-11 08:17:56 -04:00			`== Testing analyzers`

			The <<indices-analyze,`analyze` API>> is an invaluable tool for viewing the
			`terms produced by an analyzer. A built-in analyzer (or combination of built-in`
			`tokenizer, token filters, and character filters) can be specified inline in`
			`the request:`

			`[source,js]`
			`-------------------------------------`
			`POST _analyze`
			`{`
			`"analyzer": "whitespace",`
			`"text": "The quick brown fox."`
			`}`

			`POST _analyze`
			`{`
			`"tokenizer": "standard",`
			`"filter": [ "lowercase", "asciifolding" ],`
			`"text": "Is this déja vu?"`
			`}`
			`-------------------------------------`
			`// CONSOLE`



			`.Positions and character offsets`
			`*********************************************************`

			As can be seen from the output of the `analyze` API, analyzers not only
			`convert words into terms, they also record the order or relative _positions_`
			`of each term (used for phrase queries or word proximity queries), and the`
			`start and end _character offsets_ of each term in the original text (used for`
			`highlighting search snippets).`

			`*********************************************************`


			Alternatively, a <<analysis-custom-analyzer,`custom` analyzer>> can be
			referred to when running the `analyze` API on a specific index:

			`[source,js]`
			`-------------------------------------`
			`PUT my_index`
			`{`
			`"settings": {`
			`"analysis": {`
			`"analyzer": {`
			`"std_folded": { <1>`
			`"type": "custom",`
			`"tokenizer": "standard",`
			`"filter": [`
			`"lowercase",`
			`"asciifolding"`
			`]`
			`}`
			`}`
			`}`
			`},`
			`"mappings": {`
			`"my_type": {`
			`"properties": {`
			`"my_text": {`
			`"type": "text",`
			`"analyzer": "std_folded" <2>`
			`}`
			`}`
			`}`
			`}`
			`}`

			`GET my_index/_analyze <3>`
			`{`
			`"analyzer": "std_folded", <4>`
			`"text": "Is this déjà vu?"`
			`}`

			`GET my_index/_analyze <3>`
			`{`
			`"field": "my_text", <5>`
			`"text": "Is this déjà vu?"`
			`}`
			`-------------------------------------`
			`// CONSOLE`

			<1> Define a `custom` analyzer called `std_folded`.
			<2> The field `my_text` uses the `std_folded` analyzer.
			<3> To refer to this analyzer, the `analyze` API must specify the index name.
			`<4> Refer to the analyzer by name.`
			<5> Refer to the analyzer used by field `my_text`.