60 lines
1.7 KiB
Plaintext
60 lines
1.7 KiB
Plaintext
[[analysis-normalizers]]
|
|
== Normalizers
|
|
|
|
Normalizers are similar to analyzers except that they may only emit a single
|
|
token. As a consequence, they do not have a tokenizer and only accept a subset
|
|
of the available char filters and token filters. Only the filters that work on
|
|
a per-character basis are allowed. For instance a lowercasing filter would be
|
|
allowed, but not a stemming filter, which needs to look at the keyword as a
|
|
whole. The current list of filters that can be used in a normalizer is
|
|
following: `arabic_normalization`, `asciifolding`, `bengali_normalization`,
|
|
`cjk_width`, `decimal_digit`, `elision`, `german_normalization`,
|
|
`hindi_normalization`, `indic_normalization`, `lowercase`,
|
|
`persian_normalization`, `scandinavian_folding`, `serbian_normalization`,
|
|
`sorani_normalization`, `uppercase`.
|
|
|
|
Elasticsearch ships with a `lowercase` built-in normalizer. For other forms of
|
|
normalization a custom configuration is required.
|
|
|
|
[discrete]
|
|
=== Custom normalizers
|
|
|
|
Custom normalizers take a list of
|
|
<<analysis-charfilters, character filters>> and a list of
|
|
<<analysis-tokenfilters,token filters>>.
|
|
|
|
[source,console]
|
|
--------------------------------
|
|
PUT index
|
|
{
|
|
"settings": {
|
|
"analysis": {
|
|
"char_filter": {
|
|
"quote": {
|
|
"type": "mapping",
|
|
"mappings": [
|
|
"« => \"",
|
|
"» => \""
|
|
]
|
|
}
|
|
},
|
|
"normalizer": {
|
|
"my_normalizer": {
|
|
"type": "custom",
|
|
"char_filter": ["quote"],
|
|
"filter": ["lowercase", "asciifolding"]
|
|
}
|
|
}
|
|
}
|
|
},
|
|
"mappings": {
|
|
"properties": {
|
|
"foo": {
|
|
"type": "keyword",
|
|
"normalizer": "my_normalizer"
|
|
}
|
|
}
|
|
}
|
|
}
|
|
--------------------------------
|