[[indices-analyze]]
== Analyze

Performs the analysis process on a text and returns the token breakdown
of the text.

Can be used without specifying an index against one of the many built-in
analyzers:

[source,js]
--------------------------------------------------
GET _analyze
{
  "analyzer" : "standard",
  "text" : "this is a test"
}
--------------------------------------------------
// CONSOLE

If the `text` parameter is provided as an array of strings, it is analyzed as a multi-valued field.

[source,js]
--------------------------------------------------
GET _analyze
{
  "analyzer" : "standard",
  "text" : ["this is a test", "the second text"]
}
--------------------------------------------------
// CONSOLE

Or you can build a custom transient analyzer out of tokenizers,
token filters and char filters. Token filters can use the shorter `filter`
parameter name:

[source,js]
--------------------------------------------------
GET _analyze
{
  "tokenizer" : "keyword",
  "filter" : ["lowercase"],
  "text" : "this is a test"
}
--------------------------------------------------
// CONSOLE

[source,js]
--------------------------------------------------
GET _analyze
{
  "tokenizer" : "keyword",
  "filter" : ["lowercase"],
  "char_filter" : ["html_strip"],
  "text" : "this is a test"
}
--------------------------------------------------
// CONSOLE

deprecated[5.0.0, Use `filter`/`char_filter` instead of `filters`/`char_filters` and `token_filters` has been removed]

Custom tokenizers, token filters, and character filters can be specified in the request body as follows:

[source,js]
--------------------------------------------------
GET _analyze
{
  "tokenizer" : "whitespace",
  "filter" : ["lowercase", {"type": "stop", "stopwords": ["a", "is", "this"]}],
  "text" : "this is a test"
}
--------------------------------------------------
// CONSOLE

The API can also be run against a specific index:

[source,js]
--------------------------------------------------
GET analyze_sample/_analyze
{
  "text" : "this is a test"
}
--------------------------------------------------
// CONSOLE
// TEST[setup:analyze_sample]

The above will run an analysis on the "this is a test" text, using the
default index analyzer associated with the `analyze_sample` index. An `analyzer`
can also be provided to use a different analyzer:

[source,js]
--------------------------------------------------
GET analyze_sample/_analyze
{
  "analyzer" : "whitespace",
  "text" : "this is a test"
}
--------------------------------------------------
// CONSOLE
// TEST[setup:analyze_sample]

Also, the analyzer can be derived based on a field mapping, for example:

[source,js]
--------------------------------------------------
GET analyze_sample/_analyze
{
  "field" : "obj1.field1",
  "text" : "this is a test"
}
--------------------------------------------------
// CONSOLE
// TEST[setup:analyze_sample]

This will cause the analysis to happen based on the analyzer configured in the
mapping for `obj1.field1` (and if not, the default index analyzer).

A `normalizer` can be provided for a keyword field with a normalizer associated
with the `analyze_sample` index.

[source,js]
--------------------------------------------------
GET analyze_sample/_analyze
{
  "normalizer" : "my_normalizer",
  "text" : "BaR"
}
--------------------------------------------------
// CONSOLE
// TEST[setup:analyze_sample]
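For context, a named normalizer such as `my_normalizer` has to be declared in
the index's analysis settings before it can be referenced. The sketch below is
one plausible definition, not the one used by the `analyze_sample` setup; the
index name and the filter chain are illustrative assumptions:

[source,js]
--------------------------------------------------
PUT normalizer_sample
{
  "settings" : {
    "analysis" : {
      "normalizer" : {
        "my_normalizer" : {
          "type" : "custom",
          "char_filter" : [],
          "filter" : ["lowercase", "asciifolding"] <1>
        }
      }
    }
  }
}
--------------------------------------------------
// CONSOLE
<1> The filter chain here is illustrative; any normalizer-compatible filters can be used.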
Alternatively, a custom transient normalizer can be built out of token filters
and char filters:

[source,js]
--------------------------------------------------
GET _analyze
{
  "filter" : ["lowercase"],
  "text" : "BaR"
}
--------------------------------------------------
// CONSOLE

=== Explain Analyze

If you want to get more advanced details, set `explain` to `true` (defaults to `false`).
It will output all token attributes for each token.
You can filter the token attributes you want to output by setting the `attributes` option.

NOTE: The format of the additional detail information is labelled as experimental in Lucene and it may change in the future.

[source,js]
--------------------------------------------------
GET _analyze
{
  "tokenizer" : "standard",
  "filter" : ["snowball"],
  "text" : "detailed output",
  "explain" : true,
  "attributes" : ["keyword"] <1>
}
--------------------------------------------------
// CONSOLE
<1> Set "keyword" to output the "keyword" attribute only

The request returns the following result:

[source,js]
--------------------------------------------------
{
  "detail" : {
    "custom_analyzer" : true,
    "charfilters" : [ ],
    "tokenizer" : {
      "name" : "standard",
      "tokens" : [
        {
          "token" : "detailed",
          "start_offset" : 0,
          "end_offset" : 8,
          "type" : "<ALPHANUM>",
          "position" : 0
        },
        {
          "token" : "output",
          "start_offset" : 9,
          "end_offset" : 15,
          "type" : "<ALPHANUM>",
          "position" : 1
        }
      ]
    },
    "tokenfilters" : [
      {
        "name" : "snowball",
        "tokens" : [
          {
            "token" : "detail",
            "start_offset" : 0,
            "end_offset" : 8,
            "type" : "<ALPHANUM>",
            "position" : 0,
            "keyword" : false <1>
          },
          {
            "token" : "output",
            "start_offset" : 9,
            "end_offset" : 15,
            "type" : "<ALPHANUM>",
            "position" : 1,
            "keyword" : false <1>
          }
        ]
      }
    ]
  }
}
--------------------------------------------------
// TESTRESPONSE
<1> Output only the "keyword" attribute, since "attributes" was specified in the request

[[tokens-limit-settings]]
[float]
== Settings to prevent token explosion

Generating an excessive amount of tokens may cause a node to run out of memory.
The following setting allows you to limit the number of tokens that can be produced:

`index.analyze.max_token_count`::
    The maximum number of tokens that can be produced using the `_analyze` API.
    The default value is `10000`. If more tokens than this limit are generated,
    an error will be thrown. The `_analyze` endpoint without a specified index
    will always use `10000` as the limit. This setting allows you to control
    the limit for a specific index:

[source,js]
--------------------------------------------------
PUT analyze_sample?include_type_name=true
{
  "settings" : {
    "index.analyze.max_token_count" : 20000
  }
}
--------------------------------------------------
// CONSOLE

[source,js]
--------------------------------------------------
GET analyze_sample/_analyze
{
  "text" : "this is a test"
}
--------------------------------------------------
// CONSOLE
// TEST[setup:analyze_sample]
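The `PUT` above sets the limit at index creation time. Assuming
`index.analyze.max_token_count` is dynamically updatable (an assumption here,
not stated above), the limit could also be changed on an existing index with
the update index settings API, for example:

[source,js]
--------------------------------------------------
PUT analyze_sample/_settings
{
  "index.analyze.max_token_count" : 30000 <1>
}
--------------------------------------------------
// CONSOLE
<1> Example value; choose a limit that accommodates the longest text you expect to analyze.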