[[analysis-normalizers]]
== Normalizers

beta[]

Normalizers are similar to analyzers except that they may only emit a single
token. As a consequence, they do not have a tokenizer and only accept a subset
of the available char filters and token filters. Only the filters that work on
a per-character basis are allowed. For instance, a lowercasing filter would be
allowed, but not a stemming filter, which needs to look at the keyword as a
whole. A normalizer is set through the `normalizer` property of a `keyword`
field and pre-processes the field value prior to indexing, without altering
the `_source`.

[float]
=== Custom normalizers

Elasticsearch does not yet ship with built-in normalizers, so the only way
to get one is to build a custom one. Custom normalizers take a list of
<<analysis-charfilters, character filters>> and a list of
<<analysis-tokenfilters,token filters>>.

[source,js]
--------------------------------
PUT index
{
  "settings": {
    "analysis": {
      "char_filter": {
        "quote": {
          "type": "mapping",
          "mappings": [
            "« => \"",
            "» => \""
          ]
        }
      },
      "normalizer": {
        "my_normalizer": {
          "type": "custom",
          "char_filter": ["quote"],
          "filter": ["lowercase", "asciifolding"]
        }
      }
    }
  },
  "mappings": {
    "type": {
      "properties": {
        "foo": {
          "type": "keyword",
          "normalizer": "my_normalizer"
        }
      }
    }
  }
}
--------------------------------
// CONSOLE
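
As a rough sketch of the effect, the requests below index two documents into
the index created above and then search the `foo` field. The document IDs and
field values are illustrative assumptions, and the `index/type/1` paths assume
a version that still uses the mapping type from the example. With
`my_normalizer` in place, both values should be indexed as `bar`. Assuming the
normalizer is also applied to the query text of a `match` query, the search
would be expected to return both documents, while each `_source` keeps the
value as it was sent.

[source,js]
--------------------------------
PUT index/type/1
{
  "foo": "BÀR"
}

PUT index/type/2
{
  "foo": "bar"
}

POST index/_refresh

GET index/_search
{
  "query": {
    "match": {
      "foo": "BAR"
    }
  }
}
--------------------------------
// CONSOLE
// TEST[continued]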