[[analysis-normalizers]]
== Normalizers

Normalizers are similar to analyzers except that they may only emit a single
token. As a consequence, they do not have a tokenizer and only accept a subset
of the available char filters and token filters. Only the filters that work on
a per-character basis are allowed. For instance, a lowercasing filter would be
allowed, but not a stemming filter, which needs to look at the keyword as a
whole. The current list of filters that can be used in a normalizer is as
follows: `arabic_normalization`, `asciifolding`, `bengali_normalization`,
`cjk_width`, `decimal_digit`, `elision`, `german_normalization`,
`hindi_normalization`, `indic_normalization`, `lowercase`,
`persian_normalization`, `scandinavian_folding`, `serbian_normalization`,
`sorani_normalization`, `uppercase`.

Elasticsearch ships with a `lowercase` built-in normalizer. For other forms of
normalization, a custom configuration is required.

[discrete]
=== Custom normalizers

Custom normalizers take a list of
<<analysis-charfilters,character filters>> and a list of
<<analysis-tokenfilters,token filters>>.

[source,console]
--------------------------------
PUT index
{
  "settings": {
    "analysis": {
      "char_filter": {
        "quote": {
          "type": "mapping",
          "mappings": [
            "« => \"",
            "» => \""
          ]
        }
      },
      "normalizer": {
        "my_normalizer": {
          "type": "custom",
          "char_filter": ["quote"],
          "filter": ["lowercase", "asciifolding"]
        }
      }
    }
  },
  "mappings": {
    "properties": {
      "foo": {
        "type": "keyword",
        "normalizer": "my_normalizer"
      }
    }
  }
}
--------------------------------
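
One way to check what `my_normalizer` produces is to pass it to the analyze
API through the `normalizer` parameter. The request below is a minimal sketch:
it assumes the index from the previous example already exists, and the sample
text is an illustrative value chosen to exercise the `quote` char filter and
both token filters. The response should contain a single token, since a
normalizer never splits its input.

[source,console]
--------------------------------
GET index/_analyze
{
  "normalizer": "my_normalizer",
  "text": "«Étoile»"
}
--------------------------------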