The Analyze API allows you to perform [text analysis]({{site.url}}{{site.baseurl}}/api-reference/analyze-apis/), which is the process of converting unstructured text into individual tokens (usually words) that are optimized for search.
If you use the Security plugin, you must have the `manage index` privilege. If you only want to analyze text, you must have the `manage cluster` privilege.
Although you can issue an analyze request using either a `GET` or a `POST` request, the two have important distinctions. A `GET` request causes data to be cached in the index so that the next time the data is requested, it is retrieved faster. A `POST` request sends a string that does not already exist to the analyzer to be compared with data that is already in the index. `POST` requests are not cached.
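For example, the same request body can be sent with either method. The following is a sketch using `POST`; the `standard` analyzer and the sample text are illustrative choices:

````json
POST /_analyze
{
  "analyzer" : "standard",
  "text" : "OpenSearch text analysis"
}
````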
Field | Data type | Description
:--- | :--- | :---
analyzer | String | The name of the analyzer to apply to the `text` field. The analyzer can be built in or configured in the index.<br/><br/>If `analyzer` is not specified, the analyze API uses the analyzer defined in the mapping of the `field` field.<br/><br/>If the `field` field is not specified, the analyze API uses the default analyzer for the index.<br/><br/>If no index is specified or the index does not have a default analyzer, the analyze API uses the standard analyzer.
attributes | Array of Strings | Array of token attributes for filtering the output of the `explain` field.
char_filter | Array of Strings | Array of character filters for preprocessing characters before the tokenizer is applied.
explain | Boolean | If `true`, causes the response to include token attributes and additional details. Defaults to `false`.
field | String | Field for deriving the analyzer.<br/><br/>If you specify `field`, you must also specify the `index` path parameter.<br/><br/>If you specify the `analyzer` field, it overrides the value of `field`.<br/><br/>If you do not specify `field`, the analyze API uses the default analyzer for the index.<br/><br/>If you do not specify the `index` path parameter, or the index does not have a default analyzer, the analyze API uses the standard analyzer.
filter | Array of Strings | Array of token filters to apply after the tokenizer.
normalizer | String | Normalizer for converting text into a single token.
tokenizer | String | Tokenizer for converting the `text` field into tokens.
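For example, the following request analyzes text using the built-in `standard` analyzer. The analyzer and the sample string `OpenSearch text analysis` are assumed here; together they produce the response shown next:

````json
GET /_analyze
{
  "analyzer" : "standard",
  "text" : "OpenSearch text analysis"
}
````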
The previous request returns the following fields:
````json
{
  "tokens" : [
    {
      "token" : "opensearch",
      "start_offset" : 0,
      "end_offset" : 10,
      "type" : "<ALPHANUM>",
      "position" : 0
    },
    {
      "token" : "text",
      "start_offset" : 11,
      "end_offset" : 15,
      "type" : "<ALPHANUM>",
      "position" : 1
    },
    {
      "token" : "analysis",
      "start_offset" : 16,
      "end_offset" : 24,
      "type" : "<ALPHANUM>",
      "position" : 2
    }
  ]
}
````
#### Apply a custom analyzer
You can create your own analyzer and specify it in an analyze request.
In this scenario, a custom analyzer `lowercase_ascii_folding` has been created and associated with the `books2` index. The analyzer converts text to lowercase and converts non-ASCII characters to ASCII.
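The exact definition of `lowercase_ascii_folding` is not shown here; a minimal sketch of such an analyzer, assuming the built-in `standard` tokenizer and the `lowercase` and `asciifolding` token filters, might look like the following:

````json
PUT /books2
{
  "settings": {
    "analysis": {
      "analyzer": {
        "lowercase_ascii_folding": {
          "type": "custom",
          "tokenizer": "standard",
          "filter": [ "lowercase", "asciifolding" ]
        }
      }
    }
  }
}
````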
The following request applies the custom analyzer to the provided text. The sample string `OpenSearch analyze test` is assumed here; it matches the token offsets in the response that follows:
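````json
GET /books2/_analyze
{
  "analyzer" : "lowercase_ascii_folding",
  "text" : "OpenSearch analyze test"
}
````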
The previous request returns the following fields:
````json
{
  "tokens" : [
    {
      "token" : "opensearch",
      "start_offset" : 0,
      "end_offset" : 10,
      "type" : "<ALPHANUM>",
      "position" : 0
    },
    {
      "token" : "analyze",
      "start_offset" : 11,
      "end_offset" : 18,
      "type" : "<ALPHANUM>",
      "position" : 1
    },
    {
      "token" : "test",
      "start_offset" : 19,
      "end_offset" : 23,
      "type" : "<ALPHANUM>",
      "position" : 2
    }
  ]
}
````
#### Specify a normalizer
Instead of using a keyword field, you can use the normalizer associated with the index. Unlike an analyzer, a normalizer produces a single token rather than a stream of tokens.
In this example, the `books2` index includes a normalizer called `to_lower_fold_ascii` that converts text to lowercase and translates non-ASCII text to ASCII.
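The exact definition of `to_lower_fold_ascii` is not shown here; a minimal sketch, assuming a custom normalizer built from the `lowercase` and `asciifolding` token filters, might look like the following:

````json
PUT /books2
{
  "settings": {
    "analysis": {
      "normalizer": {
        "to_lower_fold_ascii": {
          "type": "custom",
          "filter": [ "lowercase", "asciifolding" ]
        }
      }
    }
  }
}
````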
The following request applies `to_lower_fold_ascii` to sample text (the string used here is an illustrative choice):
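````json
GET /books2/_analyze
{
  "normalizer" : "to_lower_fold_ascii",
  "text" : "C'est le garçon qui m'a suivi."
}
````

Because a normalizer produces a single token, the response contains one token that spans the entire lowercased, ASCII-folded string.

You can also limit the number of tokens that analysis generates by updating the dynamic `index.analyze.max_token_count` index setting. A minimal sketch (the limit value shown is illustrative):

````json
PUT /books2/_settings
{
  "index.analyze.max_token_count" : 20
}
````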
The preceding request is an index API rather than an analyze API. See [Dynamic index-level index settings]({{site.url}}{{site.baseurl}}/install-and-configure/configuring-opensearch/index-settings/#dynamic-index-level-index-settings) for additional details.
#### Token object

Each token in the `tokens` array is an object with the following fields.

Field | Data type | Description
:--- | :--- | :---
token | String | The token's text.
start_offset | Integer | The token's starting position within the original text string. Offsets are zero-based.
end_offset | Integer | The token's ending position within the original text string.
type | String | Classification of the token: `<ALPHANUM>`, `<NUM>`, and so on. The tokenizer usually sets the type, but some filters define their own types. For example, the synonym filter defines the `<SYNONYM>` type.
position | Integer | The token's position within the `tokens` array.
If you request token details by setting `explain` to `true`, the response also includes the following fields.

Field | Data type | Description
:--- | :--- | :---
custom_analyzer | Boolean | Whether the analyzer applied to the text is custom or built in.
charfilters | Array | List of character filters applied to the text.
tokenizer | Object | Name of the tokenizer applied to the text and a list of tokens<sup>*</sup> with content before the token filters were applied.
tokenfilters | Array | List of token filters applied to the text. Each token filter includes the filter's name and a list of tokens<sup>*</sup> with content after the filters were applied. Token filters are listed in the order they are specified in the request.
<sup>*</sup>See [token object](#token-object) for token field descriptions.
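To produce these fields, set `explain` to `true` in the request. The following is a sketch; the analyzer, sample text, and `attributes` filter are illustrative:

````json
GET /_analyze
{
  "analyzer" : "standard",
  "text" : "OpenSearch analyze test",
  "explain" : true,
  "attributes" : [ "keyword" ]
}
````

The `attributes` array here restricts the `explain` output to the `keyword` attribute; omit it to return all token attributes.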