OpenSearch/docs/reference/analysis/normalizers.asciidoc

[[analysis-normalizers]]
== Normalizers

Normalizers are similar to analyzers except that they may only emit a single
token. As a consequence, they do not have a tokenizer and only accept a subset
of the available char filters and token filters. Only the filters that work on
a per-character basis are allowed. For instance a lowercasing filter would be
allowed, but not a stemming filter, which needs to look at the keyword as a
whole. The current list of filters that can be used in a normalizer is
following: `arabic_normalization`, `asciifolding`, `bengali_normalization`,
`cjk_width`, `decimal_digit`, `elision`, `german_normalization`,
`hindi_normalization`, `indic_normalization`, `lowercase`,
`persian_normalization`, `scandinavian_folding`, `serbian_normalization`,
`sorani_normalization`, `uppercase`.

Elasticsearch ships with a `lowercase` built-in normalizer. For other forms of
normalization a custom configuration is required.

[float]
=== Custom normalizers

Custom normalizers take a list of
<<analysis-charfilters, character filters>> and a list of
<<analysis-tokenfilters,token filters>>.

[source,console]
--------------------------------
PUT index
{
  "settings": {
    "analysis": {
      "char_filter": {
        "quote": {
          "type": "mapping",
          "mappings": [
            "« => \"",
            "» => \""
          ]
        }
      },
      "normalizer": {
        "my_normalizer": {
          "type": "custom",
          "char_filter": ["quote"],
          "filter": ["lowercase", "asciifolding"]
        }
      }
    }
  },
  "mappings": {
    "properties": {
      "foo": {
        "type": "keyword",
        "normalizer": "my_normalizer"
      }
    }
  }
}
--------------------------------