---
layout: default
title: Normalizers
nav_order: 100
---

# Normalizers

A _normalizer_ functions similarly to an analyzer but outputs only a single token. It does not contain a tokenizer and can include only specific types of character and token filters. These filters can perform only character-level operations, such as character or pattern replacement, and cannot operate on the token as a whole. This means that token-level operations, such as synonym replacement or stemming, are not supported.

A normalizer is useful in keyword search (that is, in term-based queries) because it applies the same character and token filters to the field value at index time and to the query input at search time. For instance, it makes it possible to match an incoming query `Naïve` with the indexed term `naive`.

Consider the following example.

Create a new index with a custom normalizer:
```json
PUT /sample-index
{
  "settings": {
    "analysis": {
      "normalizer": {
        "normalized_keyword": {
          "type": "custom",
          "char_filter": [],
          "filter": [ "asciifolding", "lowercase" ]
        }
      }
    }
  },
  "mappings": {
    "properties": {
      "approach": {
        "type": "keyword",
        "normalizer": "normalized_keyword"
      }
    }
  }
}
```
{% include copy-curl.html %}

Index a document:
```json
POST /sample-index/_doc/
{
  "approach": "naive"
}
```
{% include copy-curl.html %}

The following query matches the document, as expected:
```json
GET /sample-index/_search
{
  "query": {
    "term": {
      "approach": "naive"
    }
  }
}
```
{% include copy-curl.html %}

However, the following query also matches the document, even though its value differs from the indexed value in case and accent:
```json
GET /sample-index/_search
{
  "query": {
    "term": {
      "approach": "Naïve"
    }
  }
}
```
{% include copy-curl.html %}

To understand why, consider the effect of the normalizer:
```json
GET /sample-index/_analyze
{
  "normalizer" : "normalized_keyword",
  "text" : "Naïve"
}
```
{% include copy-curl.html %}
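
The response to this request should look similar to the following. This is a sketch of the expected output, assuming the index and normalizer defined earlier in this example; the exact fields may vary by version. The normalizer emits a single token in which `ï` is folded to `i` and `N` is lowercased, which is why both queries resolve to the same term:

```json
{
  "tokens": [
    {
      "token": "naive",
      "start_offset": 0,
      "end_offset": 5,
      "type": "word",
      "position": 0
    }
  ]
}
```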

Internally, a normalizer accepts only filters that are instances of either `NormalizingTokenFilterFactory` or `NormalizingCharFilterFactory`. The following is a list of compatible filters found in modules and plugins that are part of the core OpenSearch repository.

### The `common-analysis` module

This module does not require installation; it is available by default.

Character filters: `pattern_replace`, `mapping`

Token filters: `arabic_normalization`, `asciifolding`, `bengali_normalization`, `cjk_width`, `decimal_digit`, `elision`, `german_normalization`, `hindi_normalization`, `indic_normalization`, `lowercase`, `persian_normalization`, `scandinavian_folding`, `scandinavian_normalization`, `serbian_normalization`, `sorani_normalization`, `trim`, `uppercase`
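
As an illustration of how the character and token filters in this module can be combined, the following minimal sketch defines a normalizer that uses a `pattern_replace` character filter together with the `trim` and `uppercase` token filters. The index name `sku-index`, the character filter name `strip_hyphens`, the normalizer name `sku_normalizer`, and the field name `sku` are hypothetical and chosen only for this example:

```json
PUT /sku-index
{
  "settings": {
    "analysis": {
      "char_filter": {
        "strip_hyphens": {
          "type": "pattern_replace",
          "pattern": "-",
          "replacement": ""
        }
      },
      "normalizer": {
        "sku_normalizer": {
          "type": "custom",
          "char_filter": [ "strip_hyphens" ],
          "filter": [ "trim", "uppercase" ]
        }
      }
    }
  },
  "mappings": {
    "properties": {
      "sku": {
        "type": "keyword",
        "normalizer": "sku_normalizer"
      }
    }
  }
}
```
{% include copy-curl.html %}

Under these assumptions, an input such as ` ab-123 ` should be normalized to the single term `AB123`: the character filter removes the hyphen, `trim` removes the surrounding white space, and `uppercase` changes the case.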

### The `analysis-icu` plugin

Character filters: `icu_normalizer`

Token filters: `icu_normalizer`, `icu_folding`, `icu_transform`

### The `analysis-kuromoji` plugin

Character filters: `normalize_kanji`, `normalize_kana`

### The `analysis-nori` plugin

Character filters: `normalize_kanji`, `normalize_kana`

These lists of filters include only analysis components found in the [additional plugins]({{site.url}}{{site.baseurl}}/install-and-configure/plugins/#additional-plugins) that are part of the core OpenSearch repository.
{: .note}
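
Conversely, a filter that is not an instance of one of these factory classes is expected to be rejected when the index is created. The following request is a minimal sketch of that restriction: it adds the `stemmer` token filter, which operates on the token as a whole, to a normalizer. The index name `rejected-index` and the normalizer name `stemming_normalizer` are hypothetical:

```json
PUT /rejected-index
{
  "settings": {
    "analysis": {
      "normalizer": {
        "stemming_normalizer": {
          "type": "custom",
          "filter": [ "lowercase", "stemmer" ]
        }
      }
    }
  }
}
```
{% include copy-curl.html %}

OpenSearch should return an error instead of creating the index. The exact error message may vary by version.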