[[indices-analyze]]
== Analyze

Performs the analysis process on a text and returns the token breakdown
of the text.

The API can be used without specifying an index, against one of the many
built-in analyzers:

[source,js]
--------------------------------------------------
curl -XGET 'localhost:9200/_analyze' -d '
{
  "analyzer" : "standard",
  "text" : "this is a test"
}'
--------------------------------------------------

If the `text` parameter is provided as an array of strings, it is analyzed as a multi-valued field.

[source,js]
--------------------------------------------------
curl -XGET 'localhost:9200/_analyze' -d '
{
  "analyzer" : "standard",
  "text" : ["this is a test", "the second text"]
}'
--------------------------------------------------

Alternatively, you can build a custom transient analyzer out of tokenizers,
token filters, and char filters. Token filters can be specified with either
`token_filter` or the shorter `filter` parameter name:

[source,js]
--------------------------------------------------
curl -XGET 'localhost:9200/_analyze' -d '
{
  "tokenizer" : "keyword",
  "filter" : ["lowercase"],
  "text" : "this is a test"
}'

curl -XGET 'localhost:9200/_analyze' -d '
{
  "tokenizer" : "keyword",
  "token_filter" : ["lowercase"],
  "char_filter" : ["html_strip"],
  "text" : "this is a <b>test</b>"
}'
--------------------------------------------------

deprecated[5.0.0, Use `filter`/`token_filter`/`char_filter` instead of `filters`/`token_filters`/`char_filters`]
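
Client code that still uses the deprecated names can be migrated mechanically. The following Python sketch is illustrative only. `migrate_deprecated_params` is a hypothetical helper, not part of any client library, and it renames only the three parameters listed above.

[source,python]
--------------------------------------------------
# Hypothetical helper (not part of any client library): renames the
# deprecated plural parameter names to the names used since 5.0.0.
DEPRECATED_NAMES = {
    "filters": "filter",
    "token_filters": "token_filter",
    "char_filters": "char_filter",
}


def migrate_deprecated_params(body: dict) -> dict:
    """Return a copy of an _analyze request body with deprecated keys renamed."""
    migrated = {}
    for key, value in body.items():
        new_key = DEPRECATED_NAMES.get(key, key)
        if new_key in migrated:
            # The old and the new name were both given; refuse to guess.
            raise ValueError(f"parameter {new_key!r} given more than once")
        migrated[new_key] = value
    return migrated


old_body = {
    "tokenizer": "keyword",
    "token_filters": ["lowercase"],
    "char_filters": ["html_strip"],
    "text": "this is a <b>test</b>",
}
print(migrate_deprecated_params(old_body))
--------------------------------------------------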

Custom tokenizers, token filters, and character filters can be specified in the request body as follows:

[source,js]
--------------------------------------------------
curl -XGET 'localhost:9200/_analyze' -d '
{
  "tokenizer" : "whitespace",
  "filter" : ["lowercase", {"type": "stop", "stopwords": ["a", "is", "this"]}],
  "text" : "this is a test"
}'
--------------------------------------------------

It can also run against a specific index:

[source,js]
--------------------------------------------------
curl -XGET 'localhost:9200/test/_analyze' -d '
{
  "text" : "this is a test"
}'
--------------------------------------------------

The above runs the analysis on the text "this is a test" using the
default analyzer of the `test` index. An `analyzer` can also be provided
to use a different analyzer:

[source,js]
--------------------------------------------------
curl -XGET 'localhost:9200/test/_analyze' -d '
{
  "analyzer" : "whitespace",
  "text" : "this is a test"
}'
--------------------------------------------------

Also, the analyzer can be derived based on a field mapping, for example:

[source,js]
--------------------------------------------------
curl -XGET 'localhost:9200/test/_analyze' -d '
{
  "field" : "obj1.field1",
  "text" : "this is a test"
}'
--------------------------------------------------

This runs the analysis with the analyzer configured in the mapping for
`obj1.field1`, or with the default index analyzer if none is configured.
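
When you call the API from code instead of curl, the examples above differ only in the path (`/_analyze` or `/{index}/_analyze`) and the JSON body. The following minimal Python sketch uses only the standard library and builds the request without sending it. It makes a few assumptions: `build_analyze_request` is a hypothetical helper name, the host `http://localhost:9200` is assumed, and the JSON `Content-Type` header is an addition not shown in the curl examples.

[source,python]
--------------------------------------------------
import json
import urllib.request
from typing import Optional


def build_analyze_request(base_url: str, body: dict,
                          index: Optional[str] = None) -> urllib.request.Request:
    """Build (but do not send) a GET request with a JSON body for _analyze.

    Hypothetical helper. Without an index, the cluster-level /_analyze
    endpoint is used. With an index, /{index}/_analyze is used so that
    the index's analyzers and field mappings can be resolved.
    """
    path = f"/{index}/_analyze" if index else "/_analyze"
    return urllib.request.Request(
        base_url.rstrip("/") + path,
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},  # assumption, see above
        method="GET",
    )


req = build_analyze_request(
    "http://localhost:9200",
    {"field": "obj1.field1", "text": "this is a test"},
    index="test",
)
print(req.get_method(), req.full_url)
# To actually send it against a running node: urllib.request.urlopen(req)
--------------------------------------------------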

All parameters can also be supplied as request (query string) parameters. For example:

[source,js]
--------------------------------------------------
curl -XGET 'localhost:9200/_analyze?tokenizer=keyword&filter=lowercase&text=this+is+a+test'
--------------------------------------------------
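
Parameters passed in the URL must be URL-encoded, which is why the spaces in `this is a test` appear as `+` above. The following minimal Python sketch shows the encoding with the standard library's `urllib.parse.urlencode`. The host is assumed.

[source,python]
--------------------------------------------------
from urllib.parse import urlencode

# Parameters from the request-parameter example above. urlencode()
# percent-encodes each value and, by default, turns spaces into "+".
params = {"tokenizer": "keyword", "filter": "lowercase", "text": "this is a test"}
url = "http://localhost:9200/_analyze?" + urlencode(params)
print(url)

# Markup in the text is percent-encoded as well.
print(urlencode({"text": "this is a <b>test</b>"}))
--------------------------------------------------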

For backwards compatibility, the text can also be sent as the raw request body,
provided the body doesn't start with `{`:

[source,js]
--------------------------------------------------
curl -XGET 'localhost:9200/_analyze?tokenizer=keyword&token_filter=lowercase&char_filter=html_strip' -d 'this is a <b>test</b>'
--------------------------------------------------

=== Explain Analyze

For more detailed output, set `explain` to `true` (defaults to `false`). The response then includes all token attributes for each token.
To output only specific token attributes, list them in the `attributes` option.

experimental[The format of the additional detail information is experimental and can change at any time]

[source,js]
--------------------------------------------------
GET _analyze
{
  "tokenizer" : "standard",
  "filter" : ["snowball"],
  "text" : "detailed output",
  "explain" : true,
  "attributes" : ["keyword"] <1>
}
--------------------------------------------------
// CONSOLE
<1> Output only the `keyword` attribute

The request returns the following result:

[source,js]
--------------------------------------------------
{
  "detail" : {
    "custom_analyzer" : true,
    "charfilters" : [ ],
    "tokenizer" : {
      "name" : "standard",
      "tokens" : [ {
        "token" : "detailed",
        "start_offset" : 0,
        "end_offset" : 8,
        "type" : "<ALPHANUM>",
        "position" : 0
      }, {
        "token" : "output",
        "start_offset" : 9,
        "end_offset" : 15,
        "type" : "<ALPHANUM>",
        "position" : 1
      } ]
    },
    "tokenfilters" : [ {
      "name" : "snowball",
      "tokens" : [ {
        "token" : "detail",
        "start_offset" : 0,
        "end_offset" : 8,
        "type" : "<ALPHANUM>",
        "position" : 0,
        "keyword" : false <1>
      }, {
        "token" : "output",
        "start_offset" : 9,
        "end_offset" : 15,
        "type" : "<ALPHANUM>",
        "position" : 1,
        "keyword" : false <1>
      } ]
    } ]
  }
}
--------------------------------------------------
// TESTRESPONSE
<1> Only the `keyword` attribute is output, because `attributes` is specified in the request.
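
Each stage in the `detail` object lists the tokens it emitted. It can therefore be convenient to reduce the response to the token strings produced by each stage. The following Python sketch makes several assumptions. It expects exactly the custom-analyzer layout shown above: a `tokenizer` object and a `tokenfilters` list, each with `name` and `tokens`. It ignores `charfilters` and any other layout the experimental format may use. `tokens_per_stage` is a hypothetical helper.

[source,python]
--------------------------------------------------
def tokens_per_stage(response: dict) -> list:
    """Return (stage name, [token strings]) pairs from an explain=true response.

    Hypothetical helper. It reads only the "tokenizer" and "tokenfilters"
    entries of the custom-analyzer layout shown above; "charfilters" is skipped.
    """
    detail = response["detail"]
    stages = []
    tokenizer = detail.get("tokenizer")
    if tokenizer is not None:
        stages.append((tokenizer["name"], [t["token"] for t in tokenizer["tokens"]]))
    for token_filter in detail.get("tokenfilters", []):
        stages.append((token_filter["name"], [t["token"] for t in token_filter["tokens"]]))
    return stages


# The response documented above, trimmed to the fields this helper reads.
sample = {
    "detail": {
        "custom_analyzer": True,
        "charfilters": [],
        "tokenizer": {
            "name": "standard",
            "tokens": [{"token": "detailed"}, {"token": "output"}],
        },
        "tokenfilters": [{
            "name": "snowball",
            "tokens": [{"token": "detail", "keyword": False},
                       {"token": "output", "keyword": False}],
        }],
    }
}
for name, tokens in tokens_per_stage(sample):
    print(name, tokens)
--------------------------------------------------

Applied to the example above, the tokenizer stage yields `detailed` and `output`, and the `snowball` stage stems `detailed` to `detail`.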