🔎 Open source distributed and RESTful search engine.

ICU Analysis for ElasticSearch

The ICU Analysis plugin integrates the Lucene ICU module into elasticsearch, adding ICU-related analysis components.

To install the plugin, run @bin/plugin -install elasticsearch/elasticsearch-analysis-icu/1.5.0@.

----------------------------------------
| ICU Analysis Plugin | ElasticSearch  |
----------------------------------------
| master              | 0.19 -> master |
----------------------------------------
| 1.5.0               | 0.19 -> master |
----------------------------------------
| 1.4.0               | 0.19 -> master |
----------------------------------------
| 1.3.0               | 0.19 -> master |
----------------------------------------
| 1.2.0               | 0.19 -> master |
----------------------------------------
| 1.1.0               | 0.18           |
----------------------------------------
| 1.0.0               | 0.18           |
----------------------------------------

ICU Normalization

Normalizes characters as explained "here":http://userguide.icu-project.org/transforms/normalization. It registers itself by default under @icu_normalizer@ or @icuNormalizer@ using the default settings. The @name@ parameter selects the normalization form and accepts the following values: @nfc@, @nfkc@, and @nfkc_cf@. Here are sample settings:

{
    "index" : {
        "analysis" : {
            "analyzer" : {
                "collation" : {
                    "tokenizer" : "keyword",
                    "filter" : ["icu_normalizer"]
                }
            }
        }
    }
}
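
To illustrate the @name@ parameter described above, the following sketch declares a custom filter of type @icu_normalizer@ that uses the @nfkc@ form. The filter name @my_normalizer@ and the analyzer name @normalized@ are hypothetical, and the sketch assumes the filter is declared by @type@ in the same way as the custom collation filter shown later in this document:

```json
{
    "index" : {
        "analysis" : {
            "analyzer" : {
                "normalized" : {
                    "tokenizer" : "keyword",
                    "filter" : ["my_normalizer"]
                }
            },
            "filter" : {
                "my_normalizer" : {
                    "type" : "icu_normalizer",
                    "name" : "nfkc"
                }
            }
        }
    }
}
```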

ICU Folding

Folds Unicode characters based on @UTR#30@. It registers itself under the names @icu_folding@ and @icuFolding@. Sample settings:

{
    "index" : {
        "analysis" : {
            "analyzer" : {
                "collation" : {
                    "tokenizer" : "keyword",
                    "filter" : ["icu_folding"]
                }
            }
        }
    }
}

ICU Collation

Uses the collation token filter. Collation can be configured either by specifying rules (defined "here":http://www.icu-project.org/userguide/Collate_Customization.html) with the @rules@ parameter, or by specifying a locale with the @language@ parameter (further specialized by country and variant). The @rules@ value can be written inline in the settings or point to a file location, which may be relative to the config location. By default the filter registers under @icu_collation@ or @icuCollation@ and uses the default locale.

Here are sample settings:

{
    "index" : {
        "analysis" : {
            "analyzer" : {
                "collation" : {
                    "tokenizer" : "keyword",
                    "filter" : ["icu_collation"]
                }
            }
        }
    }
}

And here is a sample of custom collation:

{
    "index" : {
        "analysis" : {
            "analyzer" : {
                "collation" : {
                    "tokenizer" : "keyword",
                    "filter" : ["myCollator"]
                }
            },
            "filter" : {
                "myCollator" : {
                    "type" : "icu_collation",
                    "language" : "en"
                }
            }
        }
    }
}
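
The text above notes that @language@ can be further specialized by country. A minimal sketch of that, assuming the setting is named @country@ (the parameter name is inferred from the description, not confirmed here) and using the hypothetical filter name @germanCollator@:

```json
{
    "index" : {
        "analysis" : {
            "analyzer" : {
                "collation" : {
                    "tokenizer" : "keyword",
                    "filter" : ["germanCollator"]
                }
            },
            "filter" : {
                "germanCollator" : {
                    "type" : "icu_collation",
                    "language" : "de",
                    "country" : "DE"
                }
            }
        }
    }
}
```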

ICU Tokenizer

Breaks text into words according to "UAX #29: Unicode Text Segmentation":http://www.unicode.org/reports/tr29/. Sample settings:

{
    "index" : {
        "analysis" : {
            "analyzer" : {
                "collation" : {
                    "tokenizer" : "icu_tokenizer"
                }
            }
        }
    }
}
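
The tokenizer can also be combined with the token filters described earlier. A minimal sketch, assuming the default registrations @icu_tokenizer@ and @icu_folding@ named in this document; the analyzer name @icu_text@ is hypothetical:

```json
{
    "index" : {
        "analysis" : {
            "analyzer" : {
                "icu_text" : {
                    "tokenizer" : "icu_tokenizer",
                    "filter" : ["icu_folding"]
                }
            }
        }
    }
}
```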

License

This software is licensed under the Apache 2 license, quoted below.

Copyright 2009-2012 Shay Banon and ElasticSearch <http://www.elasticsearch.org>

Licensed under the Apache License, Version 2.0 (the "License"); you may not
use this file except in compliance with the License. You may obtain a copy of
the License at

    http://www.apache.org/licenses/LICENSE-2.0

Unless required by applicable law or agreed to in writing, software
distributed under the License is distributed on an "AS IS" BASIS, WITHOUT
WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the
License for the specific language governing permissions and limitations under
the License.