
[[analysis-normalizers]]
== Normalizers

Normalizers are similar to analyzers, except that they may only emit a single
token. As a consequence, they do not have a tokenizer and only accept a subset
of the available char filters and token filters. Only filters that work on a
per-character basis are allowed. For instance, a lowercasing filter is
allowed, but a stemming filter is not, because it needs to look at the keyword
as a whole. The following filters can currently be used in a normalizer:
`arabic_normalization`, `asciifolding`, `bengali_normalization`,
`cjk_width`, `decimal_digit`, `elision`, `german_normalization`,
`hindi_normalization`, `indic_normalization`, `lowercase`,
`persian_normalization`, `scandinavian_folding`, `serbian_normalization`,
`sorani_normalization`, `uppercase`.

Elasticsearch ships with a built-in `lowercase` normalizer. Any other form of
normalization requires a custom configuration.
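The following is a minimal sketch of the built-in normalizer in use. No analysis settings are needed. The index name `my_lowercase_index` and the `keyword` field name `foo` are illustrative only.

[source,console]
--------------------------------
PUT my_lowercase_index
{
  "mappings": {
    "properties": {
      "foo": {
        "type": "keyword",
        "normalizer": "lowercase"
      }
    }
  }
}
--------------------------------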

[discrete]
=== Custom normalizers

Custom normalizers take a list of
<<analysis-charfilters, character filters>> and a list of
<<analysis-tokenfilters,token filters>>.

[source,console]
--------------------------------
PUT index
{
  "settings": {
    "analysis": {
      "char_filter": {
        "quote": {
          "type": "mapping",
          "mappings": [
            "« => \"",
            "» => \""
          ]
        }
      },
      "normalizer": {
        "my_normalizer": {
          "type": "custom",
          "char_filter": ["quote"],
          "filter": ["lowercase", "asciifolding"]
        }
      }
    }
  },
  "mappings": {
    "properties": {
      "foo": {
        "type": "keyword",
        "normalizer": "my_normalizer"
      }
    }
  }
}
--------------------------------
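To check what a normalizer emits, you can pass a `normalizer` parameter to the `_analyze` API instead of an analyzer. Assuming the `index` created above exists, the request below should return a single token, `"bar"`. The `quote` char filter first rewrites the guillemets as double quotes. Then `lowercase` lowercases the text and `asciifolding` strips the accent. The sample text is illustrative.

[source,console]
--------------------------------
GET index/_analyze
{
  "normalizer": "my_normalizer",
  "text": "«BÀR»"
}
--------------------------------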