Update documentation for ICU Transform

Fixes #40
Gasol Wu 2015-05-31 23:02:42 +08:00
parent 97e6016137
commit 2aea018feb
1 changed files with 46 additions and 0 deletions

@@ -224,6 +224,52 @@ Here is a sample settings:
}
```
ICU Transform
-------------
Transforms are used to process Unicode text in many different ways. Examples include case mapping, normalization,
transliteration and bidirectional text handling.
You can define the transliterator identifier with the `id` property, and set the direction to `forward` or `reverse` with
the `dir` property. The default values are `Null` for `id` and `forward` for `dir`.
For example:
```js
{
    "index" : {
        "analysis" : {
            "analyzer" : {
                "latin" : {
                    "tokenizer" : "keyword",
                    "filter" : ["myLatinTransform"]
                }
            },
            "filter" : {
                "myLatinTransform" : {
                    "type" : "icu_transform",
                    "id" : "Any-Latin; NFD; [:Nonspacing Mark:] Remove; NFC"
                }
            }
        }
    }
}
```
This transform transliterates characters to Latin, separates accents from their base characters, removes the accents,
and then recomposes the remaining text into an unaccented form.
The results are:
`你好` to `ni hao`
`здравствуйте` to `zdravstvujte`
`こんにちは` to `kon'nichiha`
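The `dir` property can be sketched in the same way. The settings below are a minimal, untested illustration: the analyzer name `kanaToHiragana` and the filter name `myReverseTransform` are hypothetical, and the sketch assumes that applying ICU's `Hiragana-Katakana` identifier in the `reverse` direction converts katakana to hiragana.
```js
{
    "index" : {
        "analysis" : {
            "analyzer" : {
                "kanaToHiragana" : {
                    "tokenizer" : "keyword",
                    "filter" : ["myReverseTransform"]
                }
            },
            "filter" : {
                "myReverseTransform" : {
                    "type" : "icu_transform",
                    "id" : "Hiragana-Katakana",
                    "dir" : "reverse"
                }
            }
        }
    }
}
```
Omitting `dir` from this filter would fall back to the default `forward` direction.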
Currently the filter supports only the identifier and direction; custom rulesets are not yet supported.
For more documentation, please see the [user guide of ICU Transform](http://userguide.icu-project.org/transforms/general).
License
-------