
[[analysis-normalizers]]
== Normalizers

Normalizers are similar to analyzers except that they may only emit a single
token. As a consequence, they do not have a tokenizer and only accept a subset
of the available char filters and token filters. Only the filters that work on
a per-character basis are allowed. For instance, a lowercasing filter would be
allowed, but not a stemming filter, which needs to look at the keyword as a
whole. The following filters can currently be used in a normalizer:
`arabic_normalization`, `asciifolding`, `bengali_normalization`,
`cjk_width`, `decimal_digit`, `elision`, `german_normalization`,
`hindi_normalization`, `indic_normalization`, `lowercase`,
`persian_normalization`, `scandinavian_folding`, `serbian_normalization`,
`sorani_normalization`, `uppercase`.

[float]
=== Custom normalizers

Elasticsearch does not currently ship with built-in normalizers, so the only
way to get one is to build a custom one. Custom normalizers take a list of
<<analysis-charfilters, character filters>> and a list of
<<analysis-tokenfilters,token filters>>.

[source,js]
--------------------------------
PUT index?include_type_name=true
{
  "settings": {
    "analysis": {
      "char_filter": {
        "quote": {
          "type": "mapping",
          "mappings": [
            "« => \"",
            "» => \""
          ]
        }
      },
      "normalizer": {
        "my_normalizer": {
          "type": "custom",
          "char_filter": ["quote"],
          "filter": ["lowercase", "asciifolding"]
        }
      }
    }
  },
  "mappings": {
    "_doc": {
      "properties": {
        "foo": {
          "type": "keyword",
          "normalizer": "my_normalizer"
        }
      }
    }
  }
}
--------------------------------
// CONSOLE
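
To check what the normalizer does to a value, one option is the `_analyze` API,
which accepts a `normalizer` parameter. The request below is a minimal sketch
that assumes the index from the previous example exists under the name `index`;
the sample text is made up for illustration.

[source,js]
--------------------------------
GET index/_analyze
{
  "normalizer": "my_normalizer",
  "text": "«Déjà Vu»"
}
--------------------------------
// CONSOLE
// TEST[continued]

Because a normalizer emits a single token, the response should contain exactly
one token: the input with the guillemets mapped to double quotes by the `quote`
char filter, then lowercased and folded to ASCII by the two token filters.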