ICU Analysis for ElasticSearch
==================================

The ICU Analysis plugin integrates the Lucene ICU module into ElasticSearch, adding ICU-related analysis components.

In order to install the plugin, simply run: `bin/plugin -install elasticsearch/elasticsearch-analysis-icu/1.2.0`.

    ----------------------------------------
    | ICU Analysis Plugin | ElasticSearch  |
    ----------------------------------------
    | master              | 0.19 -> master |
    ----------------------------------------
    | 1.3.0               | 0.19 -> master |
    ----------------------------------------
    | 1.2.0               | 0.19 -> master |
    ----------------------------------------
    | 1.1.0               | 0.18           |
    ----------------------------------------
    | 1.0.0               | 0.18           |
    ----------------------------------------


ICU Normalization
-----------------

Normalizes characters as explained [here](http://userguide.icu-project.org/transforms/normalization). It registers itself by default under `icu_normalizer` or `icuNormalizer` using the default settings. The `name` parameter selects the normalization form and accepts the following values: `nfc`, `nfkc`, and `nfkc_cf`. Here is a sample setting:

    {
        "index" : {
            "analysis" : {
                "analyzer" : {
                    "collation" : {
                        "tokenizer" : "keyword",
                        "filter" : ["icu_normalizer"]
                    }
                }
            }
        }
    }
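
To pick a specific normalization form, the `name` parameter can be set on a custom filter. The following is a minimal sketch, assuming a custom normalizer is declared the same way as the custom collation filter shown later in this README (a `filter` entry with `"type" : "icu_normalizer"`). The analyzer name `normalized` and the filter name `nfkcNormalizer` are hypothetical and chosen only for illustration:

```json
{
    "index" : {
        "analysis" : {
            "analyzer" : {
                "normalized" : {
                    "tokenizer" : "keyword",
                    "filter" : ["nfkcNormalizer"]
                }
            },
            "filter" : {
                "nfkcNormalizer" : {
                    "type" : "icu_normalizer",
                    "name" : "nfkc"
                }
            }
        }
    }
}
```

Declaring a named filter keeps the default `icu_normalizer` registration untouched, so other analyzers can still use the default settings.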

ICU Folding
-----------

Folds Unicode characters based on UTR #30. It registers itself under the names `icu_folding` and `icuFolding`. Sample setting:

    {
        "index" : {
            "analysis" : {
                "analyzer" : {
                    "collation" : {
                        "tokenizer" : "keyword",
                        "filter" : ["icu_folding"]
                    }
                }
            }
        }
    }

ICU Collation
-------------

Uses the collation token filter. You can either specify custom collation rules (defined [here](http://www.icu-project.org/userguide/Collate_Customization.html)) using the `rules` parameter (the rules can be given inline in the settings or as a file location, which may be relative to the config location), or select a locale using the `language` parameter (further specialized by country and variant). By default it registers under `icu_collation` or `icuCollation` and uses the default locale.

Here is a sample setting:

    {
        "index" : {
            "analysis" : {
                "analyzer" : {
                    "collation" : {
                        "tokenizer" : "keyword",
                        "filter" : ["icu_collation"]
                    }
                }
            }
        }
    }

And here is a sample of custom collation:

    {
        "index" : {
            "analysis" : {
                "analyzer" : {
                    "collation" : {
                        "tokenizer" : "keyword",
                        "filter" : ["myCollator"]
                    }
                },
                "filter" : {
                    "myCollator" : {
                        "type" : "icu_collation",
                        "language" : "en"
                    }
                }
            }
        }
    }
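
The `language` parameter can be narrowed further by country. The following is a sketch only: it assumes the specializing parameter is spelled `country`, which this README does not state explicitly, and the filter name `germanCollator` is hypothetical:

```json
{
    "index" : {
        "analysis" : {
            "analyzer" : {
                "collation" : {
                    "tokenizer" : "keyword",
                    "filter" : ["germanCollator"]
                }
            },
            "filter" : {
                "germanCollator" : {
                    "type" : "icu_collation",
                    "language" : "de",
                    "country" : "DE"
                }
            }
        }
    }
}
```

A variant would presumably be added in the same way, as one more key on the filter definition.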


ICU Tokenizer
-------------

Breaks text into words according to [UAX #29: Unicode Text Segmentation](http://www.unicode.org/reports/tr29/).

    {
        "index" : {
            "analysis" : {
                "analyzer" : {
                    "collation" : {
                        "tokenizer" : "icu_tokenizer"
                    }
                }
            }
        }
    }
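
The tokenizer and the token filters described in this README can be combined in a single analyzer. The following is a minimal sketch, assuming the default registrations `icu_tokenizer` and `icu_folding`; the analyzer name `icuText` is hypothetical:

```json
{
    "index" : {
        "analysis" : {
            "analyzer" : {
                "icuText" : {
                    "tokenizer" : "icu_tokenizer",
                    "filter" : ["icu_folding"]
                }
            }
        }
    }
}
```

With this setup, text is first split into words by the ICU tokenizer, and each resulting token is then passed through the folding filter.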