OpenSearch/docs/reference/indices/analyze.asciidoc

[[indices-analyze]]
== Analyze

Performs the analysis process on a text and return the tokens breakdown
of the text.

Can be used without specifying an index against one of the many built in
analyzers:

[source,js]
--------------------------------------------------
GET _analyze
{
  "analyzer" : "standard",
  "text" : "this is a test"
}
--------------------------------------------------
// CONSOLE

If text parameter is provided as array of strings, it is analyzed as a multi-valued field.

[source,js]
--------------------------------------------------
GET _analyze
{
  "analyzer" : "standard",
  "text" : ["this is a test", "the second text"]
}
--------------------------------------------------
// CONSOLE

Or by building a custom transient analyzer out of tokenizers,
token filters and char filters. Token filters can use the shorter 'filter'
parameter name:

[source,js]
--------------------------------------------------
GET _analyze
{
  "tokenizer" : "keyword",
  "filter" : ["lowercase"],
  "text" : "this is a test"
}
--------------------------------------------------
// CONSOLE

[source,js]
--------------------------------------------------
GET _analyze
{
  "tokenizer" : "keyword",
  "filter" : ["lowercase"],
  "char_filter" : ["html_strip"],
  "text" : "this is a <b>test</b>"
}
--------------------------------------------------
// CONSOLE

deprecated[5.0.0, Use `filter`/`char_filter` instead of `filters`/`char_filters` and `token_filters` has been removed]

Custom tokenizers, token filters, and character filters can be specified in the request body as follows:

[source,js]
--------------------------------------------------
GET _analyze
{
  "tokenizer" : "whitespace",
  "filter" : ["lowercase", {"type": "stop", "stopwords": ["a", "is", "this"]}],
  "text" : "this is a test"
}
--------------------------------------------------
// CONSOLE

It can also run against a specific index:

[source,js]
--------------------------------------------------
GET analyze_sample/_analyze
{
  "text" : "this is a test"
}
--------------------------------------------------
// CONSOLE
// TEST[setup:analyze_sample]

The above will run an analysis on the "this is a test" text, using the
default index analyzer associated with the `analyze_sample` index. An `analyzer`
can also be provided to use a different analyzer:

[source,js]
--------------------------------------------------
GET analyze_sample/_analyze
{
  "analyzer" : "whitespace",
  "text" : "this is a test"
}
--------------------------------------------------
// CONSOLE
// TEST[setup:analyze_sample]

Also, the analyzer can be derived based on a field mapping, for example:

[source,js]
--------------------------------------------------
GET analyze_sample/_analyze
{
  "field" : "obj1.field1",
  "text" : "this is a test"
}
--------------------------------------------------
// CONSOLE
// TEST[setup:analyze_sample]

Will cause the analysis to happen based on the analyzer configured in the
mapping for `obj1.field1` (and if not, the default index analyzer).

A `normalizer` can be provided for keyword field with normalizer associated with the `analyze_sample` index.

[source,js]
--------------------------------------------------
GET analyze_sample/_analyze
{
  "normalizer" : "my_normalizer",
  "text" : "BaR"
}
--------------------------------------------------
// CONSOLE
// TEST[setup:analyze_sample]

Or by building a custom transient normalizer out of token filters and char filters.

[source,js]
--------------------------------------------------
GET _analyze
{
  "filter" : ["lowercase"],
  "text" : "BaR"
}
--------------------------------------------------
// CONSOLE

=== Explain Analyze

If you want to get more advanced details, set `explain` to `true` (defaults to `false`). It will output all token attributes for each token.
You can filter token attributes you want to output by setting `attributes` option.

experimental[The format of the additional detail information is experimental and can change at any time]

[source,js]
--------------------------------------------------
GET _analyze
{
  "tokenizer" : "standard",
  "filter" : ["snowball"],
  "text" : "detailed output",
  "explain" : true,
  "attributes" : ["keyword"] <1>
}
--------------------------------------------------
// CONSOLE
<1> Set "keyword" to output "keyword" attribute only

The request returns the following result:

[source,js]
--------------------------------------------------
{
  "detail" : {
    "custom_analyzer" : true,
    "charfilters" : [ ],
    "tokenizer" : {
      "name" : "standard",
      "tokens" : [ {
        "token" : "detailed",
        "start_offset" : 0,
        "end_offset" : 8,
        "type" : "<ALPHANUM>",
        "position" : 0
      }, {
        "token" : "output",
        "start_offset" : 9,
        "end_offset" : 15,
        "type" : "<ALPHANUM>",
        "position" : 1
      } ]
    },
    "tokenfilters" : [ {
      "name" : "snowball",
      "tokens" : [ {
        "token" : "detail",
        "start_offset" : 0,
        "end_offset" : 8,
        "type" : "<ALPHANUM>",
        "position" : 0,
        "keyword" : false <1>
      }, {
        "token" : "output",
        "start_offset" : 9,
        "end_offset" : 15,
        "type" : "<ALPHANUM>",
        "position" : 1,
        "keyword" : false <1>
      } ]
    } ]
  }
}
--------------------------------------------------
// TESTRESPONSE
<1> Output only "keyword" attribute, since specify "attributes" in the request.
Migrated documentation into the main repo 2013-08-28 19:24:34 -04:00			`[[indices-analyze]]`
			`== Analyze`

			`Performs the analysis process on a text and return the tokens breakdown`
			`of the text.`

			`Can be used without specifying an index against one of the many built in`
			`analyzers:`

			`[source,js]`
			`--------------------------------------------------`
CONSOLify indices/analyze.asciidoc and search/field-stats.asciidoc Relates #23001 2017-02-07 14:15:09 -05:00			`GET _analyze`
Rest: Add json in request body to scroll, clear scroll, and analyze API Change analyze.asciidoc and scroll.asciidoc Add json support to Analyze and Scroll, and clear scrollAPI Add rest-api-spec/test Closes #5866 2015-04-02 21:51:15 -04:00			`{`
			`"analyzer" : "standard",`
			`"text" : "this is a test"`
CONSOLify indices/analyze.asciidoc and search/field-stats.asciidoc Relates #23001 2017-02-07 14:15:09 -05:00			`}`
Migrated documentation into the main repo 2013-08-28 19:24:34 -04:00			`--------------------------------------------------`
CONSOLify indices/analyze.asciidoc and search/field-stats.asciidoc Relates #23001 2017-02-07 14:15:09 -05:00			`// CONSOLE`
Migrated documentation into the main repo 2013-08-28 19:24:34 -04:00
Analysis: Add multi-valued text support Add support array text as a multi-valued for AnalyzeRequestBuilder Add support array text as a multi-valued for Analyze REST API Add docs Closes #3023 2015-04-26 21:55:21 -04:00			`If text parameter is provided as array of strings, it is analyzed as a multi-valued field.`

			`[source,js]`
			`--------------------------------------------------`
CONSOLify indices/analyze.asciidoc and search/field-stats.asciidoc Relates #23001 2017-02-07 14:15:09 -05:00			`GET _analyze`
Analysis: Add multi-valued text support Add support array text as a multi-valued for AnalyzeRequestBuilder Add support array text as a multi-valued for Analyze REST API Add docs Closes #3023 2015-04-26 21:55:21 -04:00			`{`
			`"analyzer" : "standard",`
			`"text" : ["this is a test", "the second text"]`
CONSOLify indices/analyze.asciidoc and search/field-stats.asciidoc Relates #23001 2017-02-07 14:15:09 -05:00			`}`
Analysis: Add multi-valued text support Add support array text as a multi-valued for AnalyzeRequestBuilder Add support array text as a multi-valued for Analyze REST API Add docs Closes #3023 2015-04-26 21:55:21 -04:00			`--------------------------------------------------`
CONSOLify indices/analyze.asciidoc and search/field-stats.asciidoc Relates #23001 2017-02-07 14:15:09 -05:00			`// CONSOLE`
Analysis: Add multi-valued text support Add support array text as a multi-valued for AnalyzeRequestBuilder Add support array text as a multi-valued for Analyze REST API Add docs Closes #3023 2015-04-26 21:55:21 -04:00
Added support for char filters in the analyze API Closes #5148 2014-02-17 23:25:12 -05:00			`Or by building a custom transient analyzer out of tokenizers,`
Analyze API : Rename filters/token_filters/char_filter to filter/token_filter/char_filter Closes #15189 2016-04-07 12:58:10 -04:00			`token filters and char filters. Token filters can use the shorter 'filter'`
Added support for char filters in the analyze API Closes #5148 2014-02-17 23:25:12 -05:00			`parameter name:`
Migrated documentation into the main repo 2013-08-28 19:24:34 -04:00
			`[source,js]`
			`--------------------------------------------------`
CONSOLify indices/analyze.asciidoc and search/field-stats.asciidoc Relates #23001 2017-02-07 14:15:09 -05:00			`GET _analyze`
Rest: Add json in request body to scroll, clear scroll, and analyze API Change analyze.asciidoc and scroll.asciidoc Add json support to Analyze and Scroll, and clear scrollAPI Add rest-api-spec/test Closes #5866 2015-04-02 21:51:15 -04:00			`{`
			`"tokenizer" : "keyword",`
Analyze API : Rename filters/token_filters/char_filter to filter/token_filter/char_filter Closes #15189 2016-04-07 12:58:10 -04:00			`"filter" : ["lowercase"],`
Rest: Add json in request body to scroll, clear scroll, and analyze API Change analyze.asciidoc and scroll.asciidoc Add json support to Analyze and Scroll, and clear scrollAPI Add rest-api-spec/test Closes #5866 2015-04-02 21:51:15 -04:00			`"text" : "this is a test"`
CONSOLify indices/analyze.asciidoc and search/field-stats.asciidoc Relates #23001 2017-02-07 14:15:09 -05:00			`}`
			`--------------------------------------------------`
			`// CONSOLE`
Added support for char filters in the analyze API Closes #5148 2014-02-17 23:25:12 -05:00
CONSOLify indices/analyze.asciidoc and search/field-stats.asciidoc Relates #23001 2017-02-07 14:15:09 -05:00			`[source,js]`
			`--------------------------------------------------`
			`GET _analyze`
Rest: Add json in request body to scroll, clear scroll, and analyze API Change analyze.asciidoc and scroll.asciidoc Add json support to Analyze and Scroll, and clear scrollAPI Add rest-api-spec/test Closes #5866 2015-04-02 21:51:15 -04:00			`{`
			`"tokenizer" : "keyword",`
Remove `token_filter` in _analyze API Remove the param and change docs Closes #20283 2016-09-01 11:04:13 -04:00			`"filter" : ["lowercase"],`
Analyze API : Rename filters/token_filters/char_filter to filter/token_filter/char_filter Closes #15189 2016-04-07 12:58:10 -04:00			`"char_filter" : ["html_strip"],`
Rest: Add json in request body to scroll, clear scroll, and analyze API Change analyze.asciidoc and scroll.asciidoc Add json support to Analyze and Scroll, and clear scrollAPI Add rest-api-spec/test Closes #5866 2015-04-02 21:51:15 -04:00			`"text" : "this is a <b>test</b>"`
CONSOLify indices/analyze.asciidoc and search/field-stats.asciidoc Relates #23001 2017-02-07 14:15:09 -05:00			`}`
Migrated documentation into the main repo 2013-08-28 19:24:34 -04:00			`--------------------------------------------------`
CONSOLify indices/analyze.asciidoc and search/field-stats.asciidoc Relates #23001 2017-02-07 14:15:09 -05:00			`// CONSOLE`
Migrated documentation into the main repo 2013-08-28 19:24:34 -04:00
Remove `token_filter` in _analyze API Remove the param and change docs Closes #20283 2016-09-01 11:04:13 -04:00			deprecated[5.0.0, Use `filter`/`char_filter` instead of `filters`/`char_filters` and `token_filters` has been removed]
Analyze API : Rename filters/token_filters/char_filter to filter/token_filter/char_filter Closes #15189 2016-04-07 12:58:10 -04:00
Analyze: Specify anonymous char_filters/tokenizer/token_filters in the analyze API Add parser for anonymous char_filters/tokenizer/token_filters Using Settings in AnalyzeRequest for anonymous definition Add breaking changes document Closed #8878 2015-09-28 06:06:47 -04:00			`Custom tokenizers, token filters, and character filters can be specified in the request body as follows:`

			`[source,js]`
			`--------------------------------------------------`
CONSOLify indices/analyze.asciidoc and search/field-stats.asciidoc Relates #23001 2017-02-07 14:15:09 -05:00			`GET _analyze`
Analyze: Specify anonymous char_filters/tokenizer/token_filters in the analyze API Add parser for anonymous char_filters/tokenizer/token_filters Using Settings in AnalyzeRequest for anonymous definition Add breaking changes document Closed #8878 2015-09-28 06:06:47 -04:00			`{`
			`"tokenizer" : "whitespace",`
			`"filter" : ["lowercase", {"type": "stop", "stopwords": ["a", "is", "this"]}],`
			`"text" : "this is a test"`
CONSOLify indices/analyze.asciidoc and search/field-stats.asciidoc Relates #23001 2017-02-07 14:15:09 -05:00			`}`
Analyze: Specify anonymous char_filters/tokenizer/token_filters in the analyze API Add parser for anonymous char_filters/tokenizer/token_filters Using Settings in AnalyzeRequest for anonymous definition Add breaking changes document Closed #8878 2015-09-28 06:06:47 -04:00			`--------------------------------------------------`
CONSOLify indices/analyze.asciidoc and search/field-stats.asciidoc Relates #23001 2017-02-07 14:15:09 -05:00			`// CONSOLE`
Analyze: Specify anonymous char_filters/tokenizer/token_filters in the analyze API Add parser for anonymous char_filters/tokenizer/token_filters Using Settings in AnalyzeRequest for anonymous definition Add breaking changes document Closed #8878 2015-09-28 06:06:47 -04:00
Migrated documentation into the main repo 2013-08-28 19:24:34 -04:00			`It can also run against a specific index:`

			`[source,js]`
			`--------------------------------------------------`
[Analysis] Support normalizer in request param (#24767) * [Analysis] Support normalizer in request param Support normalizer param Support custom normalizer with char_filter/filter param Closes #23347 2017-07-04 06:16:56 -04:00			`GET analyze_sample/_analyze`
Rest: Add json in request body to scroll, clear scroll, and analyze API Change analyze.asciidoc and scroll.asciidoc Add json support to Analyze and Scroll, and clear scrollAPI Add rest-api-spec/test Closes #5866 2015-04-02 21:51:15 -04:00			`{`
			`"text" : "this is a test"`
CONSOLify indices/analyze.asciidoc and search/field-stats.asciidoc Relates #23001 2017-02-07 14:15:09 -05:00			`}`
Migrated documentation into the main repo 2013-08-28 19:24:34 -04:00			`--------------------------------------------------`
CONSOLify indices/analyze.asciidoc and search/field-stats.asciidoc Relates #23001 2017-02-07 14:15:09 -05:00			`// CONSOLE`
[Analysis] Support normalizer in request param (#24767) * [Analysis] Support normalizer in request param Support normalizer param Support custom normalizer with char_filter/filter param Closes #23347 2017-07-04 06:16:56 -04:00			`// TEST[setup:analyze_sample]`
Migrated documentation into the main repo 2013-08-28 19:24:34 -04:00
			`The above will run an analysis on the "this is a test" text, using the`
[Analysis] Support normalizer in request param (#24767) * [Analysis] Support normalizer in request param Support normalizer param Support custom normalizer with char_filter/filter param Closes #23347 2017-07-04 06:16:56 -04:00			default index analyzer associated with the `analyze_sample` index. An `analyzer`
Migrated documentation into the main repo 2013-08-28 19:24:34 -04:00			`can also be provided to use a different analyzer:`

			`[source,js]`
			`--------------------------------------------------`
[Analysis] Support normalizer in request param (#24767) * [Analysis] Support normalizer in request param Support normalizer param Support custom normalizer with char_filter/filter param Closes #23347 2017-07-04 06:16:56 -04:00			`GET analyze_sample/_analyze`
Rest: Add json in request body to scroll, clear scroll, and analyze API Change analyze.asciidoc and scroll.asciidoc Add json support to Analyze and Scroll, and clear scrollAPI Add rest-api-spec/test Closes #5866 2015-04-02 21:51:15 -04:00			`{`
			`"analyzer" : "whitespace",`
Fix typo on analyze.asciidoc (#19354) 2016-07-11 09:49:39 -04:00			`"text" : "this is a test"`
CONSOLify indices/analyze.asciidoc and search/field-stats.asciidoc Relates #23001 2017-02-07 14:15:09 -05:00			`}`
Migrated documentation into the main repo 2013-08-28 19:24:34 -04:00			`--------------------------------------------------`
CONSOLify indices/analyze.asciidoc and search/field-stats.asciidoc Relates #23001 2017-02-07 14:15:09 -05:00			`// CONSOLE`
[Analysis] Support normalizer in request param (#24767) * [Analysis] Support normalizer in request param Support normalizer param Support custom normalizer with char_filter/filter param Closes #23347 2017-07-04 06:16:56 -04:00			`// TEST[setup:analyze_sample]`
Migrated documentation into the main repo 2013-08-28 19:24:34 -04:00
			`Also, the analyzer can be derived based on a field mapping, for example:`

			`[source,js]`
			`--------------------------------------------------`
[Analysis] Support normalizer in request param (#24767) * [Analysis] Support normalizer in request param Support normalizer param Support custom normalizer with char_filter/filter param Closes #23347 2017-07-04 06:16:56 -04:00			`GET analyze_sample/_analyze`
Rest: Add json in request body to scroll, clear scroll, and analyze API Change analyze.asciidoc and scroll.asciidoc Add json support to Analyze and Scroll, and clear scrollAPI Add rest-api-spec/test Closes #5866 2015-04-02 21:51:15 -04:00			`{`
			`"field" : "obj1.field1",`
			`"text" : "this is a test"`
CONSOLify indices/analyze.asciidoc and search/field-stats.asciidoc Relates #23001 2017-02-07 14:15:09 -05:00			`}`
Migrated documentation into the main repo 2013-08-28 19:24:34 -04:00			`--------------------------------------------------`
CONSOLify indices/analyze.asciidoc and search/field-stats.asciidoc Relates #23001 2017-02-07 14:15:09 -05:00			`// CONSOLE`
[Analysis] Support normalizer in request param (#24767) * [Analysis] Support normalizer in request param Support normalizer param Support custom normalizer with char_filter/filter param Closes #23347 2017-07-04 06:16:56 -04:00			`// TEST[setup:analyze_sample]`
Migrated documentation into the main repo 2013-08-28 19:24:34 -04:00
[DOCS] Multiple doc fixes Closes #5047 2014-03-07 08:21:45 -05:00			`Will cause the analysis to happen based on the analyzer configured in the`
Migrated documentation into the main repo 2013-08-28 19:24:34 -04:00			mapping for `obj1.field1` (and if not, the default index analyzer).

[Analysis] Support normalizer in request param (#24767) * [Analysis] Support normalizer in request param Support normalizer param Support custom normalizer with char_filter/filter param Closes #23347 2017-07-04 06:16:56 -04:00			A `normalizer` can be provided for keyword field with normalizer associated with the `analyze_sample` index.

			`[source,js]`
			`--------------------------------------------------`
			`GET analyze_sample/_analyze`
			`{`
			`"normalizer" : "my_normalizer",`
			`"text" : "BaR"`
			`}`
			`--------------------------------------------------`
			`// CONSOLE`
			`// TEST[setup:analyze_sample]`

			`Or by building a custom transient normalizer out of token filters and char filters.`

			`[source,js]`
			`--------------------------------------------------`
			`GET _analyze`
			`{`
			`"filter" : ["lowercase"],`
			`"text" : "BaR"`
			`}`
			`--------------------------------------------------`
			`// CONSOLE`

Analysis: Add detail response support add explain option fix char_filter bug Closes #11076 #15257 2015-06-15 03:32:44 -04:00			`=== Explain Analyze`

			If you want to get more advanced details, set `explain` to `true` (defaults to `false`). It will output all token attributes for each token.
			You can filter token attributes you want to output by setting `attributes` option.

			`experimental[The format of the additional detail information is experimental and can change at any time]`

			`[source,js]`
			`--------------------------------------------------`
Generate and run tests from the docs Adds infrastructure so `gradle :docs:check` will extract tests from snippets in the documentation and execute the tests. This is included in `gradle check` so it should happen on CI and during a normal build. By default each `// AUTOSENSE` snippet creates a unique REST test. These tests are executed in a random order and the cluster is wiped between each one. If multiple snippets chain together into a test you can annotate all snippets after the first with `// TEST[continued]` to have the generated tests for both snippets joined. Snippets marked as `// TESTRESPONSE` are checked against the response of the last action. See docs/README.asciidoc for lots more. Closes #12583. That issue is about catching bugs in the docs during build. This catches some bugs in the docs during build which is a good start. 2016-04-29 10:42:03 -04:00			`GET _analyze`
Analysis: Add detail response support add explain option fix char_filter bug Closes #11076 #15257 2015-06-15 03:32:44 -04:00			`{`
			`"tokenizer" : "standard",`
Fail yaml tests and docs snippets that get unexpected warnings Adds `warnings` syntax to the yaml test that allows you to expect a `Warning` header that looks like: ``` - do: warnings: - '[index] is deprecated' - quotes are not required because yaml - but this argument is always a list, never a single string - no matter how many warnings you expect get: index: test type: test id: 1 ``` These are accessible from the docs with: ``` // TEST[warning:some warning] ``` This should help to force you to update the docs if you deprecate something. You must add the warnings marker to the docs or the build will fail. While you are there you should update the docs to add deprecation warnings visible in the rendered results. 2016-08-02 17:35:31 -04:00			`"filter" : ["snowball"],`
Analysis: Add detail response support add explain option fix char_filter bug Closes #11076 #15257 2015-06-15 03:32:44 -04:00			`"text" : "detailed output",`
			`"explain" : true,`
			`"attributes" : ["keyword"] <1>`
			`}`
			`--------------------------------------------------`
Renamed all AUTOSENSE snippets to CONSOLE (#18210) 2016-05-09 09:42:23 -04:00			`// CONSOLE`
Analysis: Add detail response support add explain option fix char_filter bug Closes #11076 #15257 2015-06-15 03:32:44 -04:00			`<1> Set "keyword" to output "keyword" attribute only`

			`The request returns the following result:`

			`[source,js]`
			`--------------------------------------------------`
			`{`
			`"detail" : {`
			`"custom_analyzer" : true,`
			`"charfilters" : [ ],`
			`"tokenizer" : {`
			`"name" : "standard",`
			`"tokens" : [ {`
			`"token" : "detailed",`
			`"start_offset" : 0,`
			`"end_offset" : 8,`
			`"type" : "<ALPHANUM>",`
			`"position" : 0`
			`}, {`
			`"token" : "output",`
			`"start_offset" : 9,`
			`"end_offset" : 15,`
			`"type" : "<ALPHANUM>",`
			`"position" : 1`
			`} ]`
			`},`
			`"tokenfilters" : [ {`
			`"name" : "snowball",`
			`"tokens" : [ {`
			`"token" : "detail",`
			`"start_offset" : 0,`
			`"end_offset" : 8,`
			`"type" : "<ALPHANUM>",`
			`"position" : 0,`
			`"keyword" : false <1>`
			`}, {`
			`"token" : "output",`
			`"start_offset" : 9,`
			`"end_offset" : 15,`
			`"type" : "<ALPHANUM>",`
			`"position" : 1,`
			`"keyword" : false <1>`
			`} ]`
			`} ]`
			`}`
			`}`
			`--------------------------------------------------`
Add CONSOLE to a few snippets in reference docs This allows them to be run in Console and adds them to the list of docs that are automatically tested as part of the build. Relates to #18160 2016-09-01 13:05:22 -04:00			`// TESTRESPONSE`
Analysis: Add detail response support add explain option fix char_filter bug Closes #11076 #15257 2015-06-15 03:32:44 -04:00			`<1> Output only "keyword" attribute, since specify "attributes" in the request.`