OpenSearch/docs/reference/indices/analyze.asciidoc

[[indices-analyze]]
== Analyze

Performs the analysis process on a text and returns the token breakdown
of the text.

It can be used without specifying an index, against one of the many built-in
analyzers:

[source,js]
--------------------------------------------------
curl -XGET 'localhost:9200/_analyze' -d '
{
  "analyzer" : "standard",
  "text" : "this is a test"
}'
--------------------------------------------------

If the `text` parameter is provided as an array of strings, it is analyzed as a multi-valued field.

[source,js]
--------------------------------------------------
curl -XGET 'localhost:9200/_analyze' -d '
{
  "analyzer" : "standard",
  "text" : ["this is a test", "the second text"]
}'
--------------------------------------------------
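
As a minimal sketch of how a client might assemble the two request bodies above,
the following Python snippet builds the JSON for a single string and for an array
of strings. The helper name `build_analyze_body` is hypothetical and not part of
any client library; it only prepares the body and sends nothing.

[source,python]
--------------------------------------------------
import json


def build_analyze_body(text, analyzer="standard"):
    """Return the JSON body for an _analyze request (hypothetical helper)."""
    # A single string is analyzed as one value; a list of strings is
    # analyzed as a multi-valued field.
    if not isinstance(text, (str, list)):
        raise TypeError("text must be a string or a list of strings")
    return json.dumps({"analyzer": analyzer, "text": text})


single = build_analyze_body("this is a test")
multi = build_analyze_body(["this is a test", "the second text"])
--------------------------------------------------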

It can also be used by building a custom transient analyzer out of a tokenizer,
token filters and char filters. Token filters can use the shorter `filter`
parameter name:

[source,js]
--------------------------------------------------
curl -XGET 'localhost:9200/_analyze' -d '
{
  "tokenizer" : "keyword",
  "filter" : ["lowercase"],
  "text" : "this is a test"
}'

curl -XGET 'localhost:9200/_analyze' -d '
{
  "tokenizer" : "keyword",
  "token_filter" : ["lowercase"],
  "char_filter" : ["html_strip"],
  "text" : "this is a <b>test</b>"
}'
--------------------------------------------------

deprecated[5.0.0, Use `filter`/`token_filter`/`char_filter` instead of `filters`/`token_filters`/`char_filters`]
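
Client code that still uses the deprecated plural names could rename them before
sending a request. The snippet below is only a sketch of that idea: the mapping
is taken from the deprecation note above, while the function name
`normalize_params` is hypothetical.

[source,python]
--------------------------------------------------
# Deprecated plural parameter names and their replacements,
# as listed in the deprecation note.
RENAMED = {
    "filters": "filter",
    "token_filters": "token_filter",
    "char_filters": "char_filter",
}


def normalize_params(params):
    """Return a copy of params with deprecated names replaced (hypothetical helper)."""
    return {RENAMED.get(name, name): value for name, value in params.items()}


body = normalize_params({
    "tokenizer": "keyword",
    "filters": ["lowercase"],
    "char_filters": ["html_strip"],
    "text": "this is a <b>test</b>",
})
--------------------------------------------------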

It can also run against a specific index:

[source,js]
--------------------------------------------------
curl -XGET 'localhost:9200/test/_analyze' -d '
{
  "text" : "this is a test"
}'
--------------------------------------------------

The above will run an analysis on the "this is a test" text, using the
default index analyzer associated with the `test` index. An `analyzer`
can also be provided to use a different analyzer:

[source,js]
--------------------------------------------------
curl -XGET 'localhost:9200/test/_analyze' -d '
{
  "analyzer" : "whitespace",
  "text" : "this is a test"
}'
--------------------------------------------------

The analyzer can also be derived from a field mapping, for example:

[source,js]
--------------------------------------------------
curl -XGET 'localhost:9200/test/_analyze' -d '
{
  "field" : "obj1.field1",
  "text" : "this is a test"
}'
--------------------------------------------------

This causes the analysis to happen based on the analyzer configured in the
mapping for `obj1.field1` (and if none is configured, the default index analyzer).
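
The index-level variants above differ only in the request path and in which of
`analyzer` or `field` appears in the body. A small sketch of that choice follows;
the helper `analyze_request` is hypothetical, and the check that `field` needs an
index is an assumption based on mappings belonging to an index.

[source,python]
--------------------------------------------------
import json


def analyze_request(text, index=None, analyzer=None, field=None):
    """Return (path, body) for an _analyze call (hypothetical helper)."""
    # Assumption: a field mapping only exists within an index.
    if field is not None and index is None:
        raise ValueError("field requires an index")
    # Without an index the request goes to the top-level endpoint.
    path = "/{}/_analyze".format(index) if index else "/_analyze"
    body = {"text": text}
    if analyzer is not None:
        body["analyzer"] = analyzer
    if field is not None:
        body["field"] = field
    return path, json.dumps(body)


path, body = analyze_request("this is a test", index="test", field="obj1.field1")
--------------------------------------------------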

All parameters can also be supplied as request parameters. For example:

[source,js]
--------------------------------------------------
curl -XGET 'localhost:9200/_analyze?tokenizer=keyword&filter=lowercase&text=this+is+a+test'
--------------------------------------------------
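
When building such a URL programmatically, the parameter values need to be
URL-encoded. As a sketch, Python's standard `urllib.parse.urlencode` produces the
same query string as the curl example; the `http://` scheme is an assumption,
since the example above only gives `localhost:9200`.

[source,python]
--------------------------------------------------
from urllib.parse import urlencode

# Same parameters as the curl example; urlencode turns spaces into "+".
params = {
    "tokenizer": "keyword",
    "filter": "lowercase",
    "text": "this is a test",
}
url = "http://localhost:9200/_analyze?" + urlencode(params)
print(url)
--------------------------------------------------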

For backwards compatibility, the text can also be passed as the body of the request,
provided it doesn't start with `{`:

[source,js]
--------------------------------------------------
curl -XGET 'localhost:9200/_analyze?tokenizer=keyword&token_filter=lowercase&char_filter=html_strip' -d 'this is a <b>test</b>'
--------------------------------------------------

=== Explain Analyze

To get more detailed output, set `explain` to `true` (defaults to `false`). The response then includes all token attributes for each token.
You can restrict which token attributes are output by setting the `attributes` option.

experimental[The format of the additional detail information is experimental and can change at any time]

[source,js]
--------------------------------------------------
GET _analyze
{
  "tokenizer" : "standard",
  "token_filter" : ["snowball"],
  "text" : "detailed output",
  "explain" : true,
  "attributes" : ["keyword"] <1>
}
--------------------------------------------------
// AUTOSENSE
<1> Set "keyword" to output the "keyword" attribute only.

coming[2.0.0, body based parameters were added in 2.0.0]

The request returns the following result:

[source,js]
--------------------------------------------------
{
  "detail" : {
    "custom_analyzer" : true,
    "charfilters" : [ ],
    "tokenizer" : {
      "name" : "standard",
      "tokens" : [ {
        "token" : "detailed",
        "start_offset" : 0,
        "end_offset" : 8,
        "type" : "<ALPHANUM>",
        "position" : 0
      }, {
        "token" : "output",
        "start_offset" : 9,
        "end_offset" : 15,
        "type" : "<ALPHANUM>",
        "position" : 1
      } ]
    },
    "tokenfilters" : [ {
      "name" : "snowball",
      "tokens" : [ {
        "token" : "detail",
        "start_offset" : 0,
        "end_offset" : 8,
        "type" : "<ALPHANUM>",
        "position" : 0,
        "keyword" : false <1>
      }, {
        "token" : "output",
        "start_offset" : 9,
        "end_offset" : 15,
        "type" : "<ALPHANUM>",
        "position" : 1,
        "keyword" : false <1>
      } ]
    } ]
  }
}
--------------------------------------------------
<1> Only the "keyword" attribute is output, because "attributes" was specified in the request.
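
To compare what each stage of the analysis chain produced, the `detail` object
can be reduced to the token strings per stage. The following is a sketch under
the assumption that the response has the shape shown above (which is marked
experimental); the function name `tokens_by_stage` is hypothetical and the sample
response is trimmed to the fields the function reads.

[source,python]
--------------------------------------------------
def tokens_by_stage(response):
    """Map each stage name to its list of token strings (hypothetical helper)."""
    detail = response["detail"]
    stages = {}
    tokenizer = detail["tokenizer"]
    stages[tokenizer["name"]] = [t["token"] for t in tokenizer["tokens"]]
    # Token filters are listed in the order they were applied.
    for token_filter in detail.get("tokenfilters", []):
        stages[token_filter["name"]] = [t["token"] for t in token_filter["tokens"]]
    return stages


# Trimmed copy of the response shown above.
sample = {
    "detail": {
        "custom_analyzer": True,
        "charfilters": [],
        "tokenizer": {
            "name": "standard",
            "tokens": [{"token": "detailed"}, {"token": "output"}],
        },
        "tokenfilters": [{
            "name": "snowball",
            "tokens": [{"token": "detail"}, {"token": "output"}],
        }],
    }
}
stages = tokens_by_stage(sample)
--------------------------------------------------

Here `stages` maps `standard` to `["detailed", "output"]` and `snowball` to
`["detail", "output"]`, which shows that the stemmer changed only the first token.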