Use JS markdown formatter
(cherry picked from commit 3941016)
parent dafa7e764d
commit f068ef88a4
128 README.md
@@ -24,36 +24,40 @@ ICU Normalization
 
 Normalizes characters as explained [here](http://userguide.icu-project.org/transforms/normalization). It registers itself by default under `icu_normalizer` or `icuNormalizer` using the default settings. Allows for the name parameter to be provided which can include the following values: `nfc`, `nfkc`, and `nfkc_cf`. Here is a sample settings:
 
-    {
-        "index" : {
-            "analysis" : {
-                "analyzer" : {
-                    "collation" : {
-                        "tokenizer" : "keyword",
-                        "filter" : ["icu_normalizer"]
-                    }
-                }
-            }
-        }
-    }
+```js
+{
+    "index" : {
+        "analysis" : {
+            "analyzer" : {
+                "collation" : {
+                    "tokenizer" : "keyword",
+                    "filter" : ["icu_normalizer"]
+                }
+            }
+        }
+    }
+}
+```
 
 ICU Folding
 -----------
 
 Folding of unicode characters based on `UTR#30`. It registers itself under `icu_folding` and `icuFolding` names. Sample setting:
 
-    {
-        "index" : {
-            "analysis" : {
-                "analyzer" : {
-                    "collation" : {
-                        "tokenizer" : "keyword",
-                        "filter" : ["icu_folding"]
-                    }
-                }
-            }
-        }
-    }
+```js
+{
+    "index" : {
+        "analysis" : {
+            "analyzer" : {
+                "collation" : {
+                    "tokenizer" : "keyword",
+                    "filter" : ["icu_folding"]
+                }
+            }
+        }
+    }
+}
+```
 
 ICU Filtering
 -------------
@@ -64,24 +68,26 @@ language is wanted. See syntax for the UnicodeSet [here](http://icu-project.org/
 
 The Following example exempts Swedish characters from the folding. Note that the filtered characters are NOT lowercased which is why we add that filter below.
 
-    {
-        "index" : {
-            "analysis" : {
-                "analyzer" : {
-                    "folding" : {
-                        "tokenizer" : "standard",
-                        "filter" : ["my_icu_folding", "lowercase"]
-                    }
-                }
-                "filter" : {
-                    "my_icu_folding" : {
-                        "type" : "icu_folding"
-                        "unicodeSetFilter" : "[^åäöÅÄÖ]"
-                    }
-                }
-            }
-        }
-    }
+```js
+{
+    "index" : {
+        "analysis" : {
+            "analyzer" : {
+                "folding" : {
+                    "tokenizer" : "standard",
+                    "filter" : ["my_icu_folding", "lowercase"]
+                }
+            }
+            "filter" : {
+                "my_icu_folding" : {
+                    "type" : "icu_folding"
+                    "unicodeSetFilter" : "[^åäöÅÄÖ]"
+                }
+            }
+        }
+    }
+}
+```
 
 ICU Collation
 -------------
@@ -94,39 +100,43 @@ Uses collation token filter. Allows to either specify the rules for collation
 
 Here is a sample settings:
 
-    {
-        "index" : {
-            "analysis" : {
-                "analyzer" : {
-                    "collation" : {
-                        "tokenizer" : "keyword",
-                        "filter" : ["icu_collation"]
-                    }
-                }
-            }
-        }
-    }
+```js
+{
+    "index" : {
+        "analysis" : {
+            "analyzer" : {
+                "collation" : {
+                    "tokenizer" : "keyword",
+                    "filter" : ["icu_collation"]
+                }
+            }
+        }
+    }
+}
+```
 
 And here is a sample of custom collation:
 
-    {
-        "index" : {
-            "analysis" : {
-                "analyzer" : {
-                    "collation" : {
-                        "tokenizer" : "keyword",
-                        "filter" : ["myCollator"]
-                    }
-                },
-                "filter" : {
-                    "myCollator" : {
-                        "type" : "icu_collation",
-                        "language" : "en"
-                    }
-                }
-            }
-        }
-    }
+```js
+{
+    "index" : {
+        "analysis" : {
+            "analyzer" : {
+                "collation" : {
+                    "tokenizer" : "keyword",
+                    "filter" : ["myCollator"]
+                }
+            },
+            "filter" : {
+                "myCollator" : {
+                    "type" : "icu_collation",
+                    "language" : "en"
+                }
+            }
+        }
+    }
+}
+```
 
 Optional options:
 * `strength` - The strength property determines the minimum level of difference considered significant during comparison.
@@ -159,17 +169,19 @@ ICU Tokenizer
 
 Breaks text into words according to [UAX #29: Unicode Text Segmentation](http://www.unicode.org/reports/tr29/).
 
-    {
-        "index" : {
-            "analysis" : {
-                "analyzer" : {
-                    "collation" : {
-                        "tokenizer" : "icu_tokenizer",
-                    }
-                }
-            }
-        }
-    }
+```js
+{
+    "index" : {
+        "analysis" : {
+            "analyzer" : {
+                "collation" : {
+                    "tokenizer" : "icu_tokenizer",
+                }
+            }
+        }
+    }
+}
+```
 
 
 License