[[analysis-normalizers]]
== Normalizers

Normalizers are similar to analyzers except that they may only emit a single
token. As a consequence, they do not have a tokenizer and only accept a subset
of the available char filters and token filters. Only the filters that work on
a per-character basis are allowed. For instance, a lowercasing filter would be
allowed, but not a stemming filter, which needs to look at the keyword as a
whole. The current list of filters that can be used in a normalizer is the
following: `arabic_normalization`, `asciifolding`, `bengali_normalization`,
`cjk_width`, `decimal_digit`, `elision`, `german_normalization`,
`hindi_normalization`, `indic_normalization`, `lowercase`,
`persian_normalization`, `scandinavian_folding`, `serbian_normalization`,
`sorani_normalization`, `uppercase`.
[float]
=== Custom normalizers

Elasticsearch does not ship with built-in normalizers so far, so the only way
to get one is by building a custom one. Custom normalizers take a list of
<<analysis-charfilters,character filters>> and a list of
<<analysis-tokenfilters,token filters>>.
[source,js]
--------------------------------
PUT index
{
  "settings": {
    "analysis": {
      "char_filter": {
        "quote": {
          "type": "mapping",
          "mappings": [
            "« => \"",
            "» => \""
          ]
        }
      },
      "normalizer": {
        "my_normalizer": {
          "type": "custom",
          "char_filter": ["quote"],
          "filter": ["lowercase", "asciifolding"]
        }
      }
    }
  },
  "mappings": {
    "type": {
      "properties": {
        "foo": {
          "type": "keyword",
          "normalizer": "my_normalizer"
        }
      }
    }
  }
}
--------------------------------
// CONSOLE
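
To check what a normalizer emits, more recent Elasticsearch versions accept a
`normalizer` parameter in the `_analyze` API. A minimal sketch, assuming the
index above has been created and that your version supports this parameter:

[source,js]
--------------------------------
GET index/_analyze
{
  "normalizer": "my_normalizer",
  "text": "«BÀR»"
}
--------------------------------
// CONSOLE

With the `quote` char filter followed by the `lowercase` and `asciifolding`
filters, this should return the whole input as a single token, `"bar"`,
illustrating that a normalizer never splits its input.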