[[analysis-normalizers]]
== Normalizers

experimental[]

Normalizers are similar to analyzers except that they may only emit a single
token. As a consequence, they do not have a tokenizer and only accept a subset
of the available char filters and token filters. Only the filters that work on
a per-character basis are allowed. For instance, a lowercasing filter would be
allowed, but not a stemming filter, which needs to look at the keyword as a
whole.
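
Since a normalizer may only emit a single token, its effect can be previewed
with the `_analyze` API by pairing the `keyword` tokenizer, which passes the
whole input through as a single token, with the desired per-character filters.
A minimal sketch, with purely illustrative sample text:

[source,js]
--------------------------------
GET _analyze
{
  "tokenizer": "keyword",
  "filter": ["lowercase", "asciifolding"],
  "text": "Déjà Vu"
}
--------------------------------
// CONSOLE

This should return the single token `deja vu`.
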
[float]
=== Custom normalizers

Elasticsearch does not ship with built-in normalizers so far, so the only way
to get one is by building a custom one. Custom normalizers take a list of
<<analysis-charfilters,character filters>> and a list of
<<analysis-tokenfilters,token filters>>.

[source,js]
--------------------------------
PUT index
{
  "settings": {
    "analysis": {
      "char_filter": {
        "quote": {
          "type": "mapping",
          "mappings": [
            "« => \"",
            "» => \""
          ]
        }
      },
      "normalizer": {
        "my_normalizer": {
          "type": "custom",
          "char_filter": ["quote"],
          "filter": ["lowercase", "asciifolding"]
        }
      }
    }
  },
  "mappings": {
    "type": {
      "properties": {
        "foo": {
          "type": "keyword",
          "normalizer": "my_normalizer"
        }
      }
    }
  }
}
--------------------------------
// CONSOLE
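
If your version of the `_analyze` API supports the `normalizer` parameter, the
behavior of `my_normalizer` can be checked directly against the index created
above:

[source,js]
--------------------------------
GET index/_analyze
{
  "normalizer": "my_normalizer",
  "text": "« Déjà Vu »"
}
--------------------------------
// CONSOLE

This should return a single token, `" deja vu "`, with the mapped quotes and
the surrounding whitespace preserved, since no tokenization takes place.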