From 2aea018feb360b504333185393417456c976951c Mon Sep 17 00:00:00 2001
From: Gasol Wu
Date: Sun, 31 May 2015 23:02:42 +0800
Subject: [PATCH] Update documentation for ICU Transform

Fixes #40
---
 README.md | 46 ++++++++++++++++++++++++++++++++++++++++++++++
 1 file changed, 46 insertions(+)

diff --git a/README.md b/README.md
index 95d955980d4..46960a69b86 100644
--- a/README.md
+++ b/README.md
@@ -224,6 +224,52 @@ Here is a sample settings:
 }
 ```
 
+ICU Transform
+-------------
+Transforms process Unicode text in many different ways, including case mapping, normalization,
+transliteration and bidirectional text handling.
+
+You can define transliterator identifiers with the `id` property and set the direction to `forward` or `reverse` with
+the `dir` property. The defaults are `Null` for `id` and `forward` for `dir`.
+
+For example:
+
+```js
+{
+    "index" : {
+        "analysis" : {
+            "analyzer" : {
+                "latin" : {
+                    "tokenizer" : "keyword",
+                    "filter" : ["myLatinTransform"]
+                }
+            },
+            "filter" : {
+                "myLatinTransform" : {
+                    "type" : "icu_transform",
+                    "id" : "Any-Latin; NFD; [:Nonspacing Mark:] Remove; NFC"
+                }
+            }
+        }
+    }
+}
+```
+
+This transform transliterates characters to Latin, separates accents from their base characters, removes the accents,
+and then puts the remaining text into an unaccented form.
+
+The results are:
+
+`你好` to `ni hao`
+
+`здравствуйте` to `zdravstvujte`
+
+`こんにちは` to `kon'nichiha`
+
+Currently the filter supports only the identifier and direction; custom rulesets are not yet supported.
+
+For more documentation, please see the [user guide of ICU Transform](http://userguide.icu-project.org/transforms/general).
+
 License
 -------
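The sample in this patch never sets `dir`, so here is a minimal sketch of how the two documented properties fit together. It assumes nothing beyond what the README text states: `id` carries the transliterator identifier and `dir` is either `forward` or `reverse`. The helper `icuTransformSettings` is hypothetical and not part of the plugin. It only builds the settings object and does not talk to a cluster.

```js
// Hypothetical helper (not part of the plugin): builds index settings
// for a single icu_transform filter wired into a keyword-tokenized analyzer.
function icuTransformSettings(analyzerName, filterName, id, dir = "forward") {
    // The README names only these two directions.
    if (dir !== "forward" && dir !== "reverse") {
        throw new Error("dir must be 'forward' or 'reverse', got: " + dir);
    }
    return {
        index: {
            analysis: {
                analyzer: {
                    [analyzerName]: { tokenizer: "keyword", filter: [filterName] }
                },
                filter: {
                    [filterName]: { type: "icu_transform", id: id, dir: dir }
                }
            }
        }
    };
}

// Same transform as the README sample, with the direction spelled out.
const settings = icuTransformSettings(
    "latin",
    "myLatinTransform",
    "Any-Latin; NFD; [:Nonspacing Mark:] Remove; NFC"
);
console.log(JSON.stringify(settings, null, 2));
```

Leaving `dir` out of the call falls back to `forward`, which matches the documented default. Passing `"reverse"` as the fourth argument produces the same structure with the direction flipped.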