Fix doc
parent 4c935133e3
commit f88f5e2fb7
16
README.md
@@ -28,7 +28,7 @@ In order to install the plugin, simply run: `bin/plugin -install elasticsearch/e
 ICU Normalization
 -----------------
 
-Normalizes characters as explained "here":http://userguide.icu-project.org/transforms/normalization. It registers itself by default under `icu_normalizer` or `icuNormalizer` using the default settings. Allows for the name parameter to be provided which can include the following values: `nfc`, `nfkc`, and `nfkc_cf`. Here is a sample settings:
+Normalizes characters as explained [here](http://userguide.icu-project.org/transforms/normalization). It registers itself by default under `icu_normalizer` or `icuNormalizer` using the default settings. Allows for the name parameter to be provided which can include the following values: `nfc`, `nfkc`, and `nfkc_cf`. Here is a sample settings:
 
 {
 "index" : {
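Reviewer note: the hunk above names three normalization forms (`nfc`, `nfkc`, `nfkc_cf`) without showing what they do to text. The following is a minimal Python sketch using only the standard library, not the plugin's own code. The function name `icu_like_normalize` is hypothetical, and `nfkc_cf` is only approximated here as NFKC plus `str.casefold()`; ICU's real NFKC_Casefold also removes default-ignorable code points.

```python
import unicodedata


def icu_like_normalize(text: str, name: str) -> str:
    """Approximate the normalization modes named in the README hunk.

    Assumption: this is a stdlib stand-in for illustration, not the
    behaviour of the icu_normalizer filter itself.
    """
    if name == "nfc":
        # Canonical composition: "e" + combining acute becomes one code point.
        return unicodedata.normalize("NFC", text)
    if name == "nfkc":
        # Compatibility composition: ligatures and width variants are unfolded.
        return unicodedata.normalize("NFKC", text)
    if name == "nfkc_cf":
        # Approximation only: NFKC, then case folding, then NFKC again.
        folded = unicodedata.normalize("NFKC", text).casefold()
        return unicodedata.normalize("NFKC", folded)
    raise ValueError(f"unsupported normalization name: {name}")


if __name__ == "__main__":
    for mode in ("nfc", "nfkc", "nfkc_cf"):
        print(mode, repr(icu_like_normalize("\ufb01 Stra\u00dfe", mode)))
```

The sketch shows why the choice matters for search: `nfc` leaves the "fi" ligature intact, `nfkc` expands it, and the case-folding variant additionally erases case distinctions.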
@@ -64,7 +64,9 @@ Folding of unicode characters based on `UTR#30`. It registers itself under `icu_
 ICU Filtering
 -------------
 
-The folding can be filtered by a set of unicode characters with the parameter `unicodeSetFilter`. This is useful for a non-internationalized search engine where retaining a set of national characters which are primary letters in a specific language is wanted. See syntax for the UnicodeSet "here":http://icu-project.org/apiref/icu4j/com/ibm/icu/text/UnicodeSet.html.
+The folding can be filtered by a set of unicode characters with the parameter `unicodeSetFilter`. This is useful for a
+non-internationalized search engine where retaining a set of national characters which are primary letters in a specific
+language is wanted. See syntax for the UnicodeSet [here](http://icu-project.org/apiref/icu4j/com/ibm/icu/text/UnicodeSet.html).
 
 The Following example exempts Swedish characters from the folding. Note that the filtered characters are NOT lowercased which is why we add that filter below.
 
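Reviewer note: the README text in this hunk says exempted characters skip folding and are therefore not lowercased. The Python sketch below illustrates that interaction under loud assumptions: `fold_except` and `SWEDISH` are hypothetical names, and "folding" is reduced to accent stripping plus lowercasing, whereas the plugin's folding follows `UTR#30` and covers far more.

```python
import unicodedata

# Assumption: the exempt set mirrors the Swedish letters mentioned in the README.
SWEDISH = "\u00e5\u00e4\u00f6\u00c5\u00c4\u00d6"  # åäöÅÄÖ


def fold_except(text: str, keep: str) -> str:
    """Fold every character except those listed in `keep`.

    Folding here means: decompose, drop combining marks, lowercase.
    Characters in `keep` pass through untouched, including their case.
    """
    out = []
    # Compose first so each exempt letter is a single code point.
    for ch in unicodedata.normalize("NFC", text):
        if ch in keep:
            out.append(ch)
            continue
        decomposed = unicodedata.normalize("NFD", ch)
        base = "".join(c for c in decomposed if not unicodedata.combining(c))
        out.append(base.lower())
    return "".join(out)


if __name__ == "__main__":
    folded = fold_except("\u00c5NGSTR\u00d6M Caf\u00e9", SWEDISH)
    print(folded)          # exempt letters keep their accents and their case
    print(folded.lower())  # a separate lowercase step finishes the job
```

The second print shows why the README adds a lowercase filter after the filtered folding: without it, the uppercase exempt letters would survive into the index.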
@@ -90,7 +92,11 @@ The Following example exempts Swedish characters from the folding. Note that the
 ICU Collation
 -------------
 
-Uses collation token filter. Allows to either specify the rules for collation (defined "here":http://www.icu-project.org/userguide/Collate_Customization.html) using the `rules` parameter (can point to a location or expressed in the settings, location can be relative to config location), or using the `language` parameter (further specialized by country and variant). By default registers under `icu_collation` or `icuCollation` and uses the default locale.
+Uses collation token filter. Allows to either specify the rules for collation
+(defined [here](http://www.icu-project.org/userguide/Collate_Customization.html)) using the `rules` parameter
+(can point to a location or expressed in the settings, location can be relative to config location), or using the
+`language` parameter (further specialized by country and variant). By default registers under `icu_collation` or
+`icuCollation` and uses the default locale.
 
 Here is a sample settings:
 
@@ -132,7 +138,7 @@ Optional options:
 * `strength` - The strength property determines the minimum level of difference considered significant during comparison.
 The default strength for the Collator is `tertiary`, unless specified otherwise by the locale used to create the Collator.
 Possible values: `primary`, `secondary`, `tertiary`, `quaternary` or `identical`.
-See ICU Collation:http://icu-project.org/apiref/icu4j/com/ibm/icu/text/Collator.html documentation for a more detailed
+See [ICU Collation](http://icu-project.org/apiref/icu4j/com/ibm/icu/text/Collator.html) documentation for a more detailed
 explanation for the specific values.
 * `decomposition` - Possible values: `no` or `canonical`. Defaults to `no`. Setting this decomposition property with
 `canonical` allows the Collator to handle un-normalized text properly, producing the same results as if the text were
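Reviewer note: the `strength` and `decomposition` options in this hunk are easier to review with a concrete picture of what each level ignores. The Python sketch below is a toy model, not ICU: `toy_sort_key` is a hypothetical helper that treats `primary` as "ignore accents and case", `secondary` as "ignore case only", and `tertiary` as "distinguish both". It always decomposes canonically first, and it does not model `quaternary` or `identical`.

```python
import unicodedata


def _strip_marks(text: str) -> str:
    """Drop combining marks from already-decomposed text."""
    return "".join(c for c in text if not unicodedata.combining(c))


def toy_sort_key(text: str, strength: str) -> str:
    """Return a comparison key for a simplified notion of collation strength.

    Assumption: real ICU collation keys are locale-aware binary keys; this
    toy version only demonstrates which differences each level ignores.
    """
    # Canonical decomposition so composed and decomposed input compare alike.
    decomposed = unicodedata.normalize("NFD", text)
    if strength == "primary":
        return _strip_marks(decomposed).casefold()
    if strength == "secondary":
        return decomposed.casefold()
    if strength == "tertiary":
        return decomposed
    raise ValueError(f"strength not modelled in this sketch: {strength}")


if __name__ == "__main__":
    words = ["resume", "R\u00e9sum\u00e9", "r\u00e9sum\u00e9"]
    for level in ("primary", "secondary", "tertiary"):
        distinct = {toy_sort_key(w, level) for w in words}
        print(level, len(distinct))
```

Under this model the three spellings collapse to one key at `primary`, two at `secondary`, and three at `tertiary`, which matches the README's description of strength as the minimum level of difference considered significant.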
@@ -157,7 +163,7 @@ Expert options:
 ICU Tokenizer
 -------------
 
-Breaks text into words according to UAX #29: Unicode Text Segmentation ((http://www.unicode.org/reports/tr29/)).
+Breaks text into words according to [UAX #29: Unicode Text Segmentation](http://www.unicode.org/reports/tr29/).
 
 {
 "index" : {