migrate branch for analysis-kuromoji

Simon Willnauer 2015-06-05 13:12:09 +02:00
commit 7294d27e5c
20 changed files with 1721 additions and 0 deletions

View File

@ -0,0 +1,552 @@
Japanese (kuromoji) Analysis for Elasticsearch
==================================
The Japanese (kuromoji) Analysis plugin integrates the Lucene kuromoji analysis module into Elasticsearch.
In order to install the plugin, run:
```sh
bin/plugin install elasticsearch/elasticsearch-analysis-kuromoji/2.5.0
```
You need to install a version matching your Elasticsearch version:
| elasticsearch | Kuromoji Analysis Plugin | Docs |
|---------------|-----------------------------|------------|
| master | Build from source | See below |
| es-1.x | Build from source | [2.6.0-SNAPSHOT](https://github.com/elasticsearch/elasticsearch-analysis-kuromoji/tree/es-1.x/#version-260-snapshot-for-elasticsearch-1x) |
| es-1.5 | 2.5.0 | [2.5.0](https://github.com/elastic/elasticsearch-analysis-kuromoji/tree/v2.5.0/#version-250-for-elasticsearch-15) |
| es-1.4 | 2.4.3 | [2.4.3](https://github.com/elasticsearch/elasticsearch-analysis-kuromoji/tree/v2.4.3/#version-243-for-elasticsearch-14) |
| < 1.4.5 | 2.4.2 | [2.4.2](https://github.com/elasticsearch/elasticsearch-analysis-kuromoji/tree/v2.4.2/#version-242-for-elasticsearch-14) |
| < 1.4.3 | 2.4.1 | [2.4.1](https://github.com/elasticsearch/elasticsearch-analysis-kuromoji/tree/v2.4.1/#version-241-for-elasticsearch-14) |
| es-1.3 | 2.3.0 | [2.3.0](https://github.com/elasticsearch/elasticsearch-analysis-kuromoji/tree/v2.3.0/#japanese-kuromoji-analysis-for-elasticsearch) |
| es-1.2 | 2.2.0 | [2.2.0](https://github.com/elasticsearch/elasticsearch-analysis-kuromoji/tree/v2.2.0/#japanese-kuromoji-analysis-for-elasticsearch) |
| es-1.1 | 2.1.0 | [2.1.0](https://github.com/elasticsearch/elasticsearch-analysis-kuromoji/tree/v2.1.0/#japanese-kuromoji-analysis-for-elasticsearch) |
| es-1.0 | 2.0.0 | [2.0.0](https://github.com/elasticsearch/elasticsearch-analysis-kuromoji/tree/v2.0.0/#japanese-kuromoji-analysis-for-elasticsearch) |
| es-0.90 | 1.8.0 | [1.8.0](https://github.com/elasticsearch/elasticsearch-analysis-kuromoji/tree/v1.8.0/#japanese-kuromoji-analysis-for-elasticsearch) |
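For example, to install the plugin on an Elasticsearch 1.4 node, pick the matching release from the table above (note that the exact `plugin` command syntax differs slightly between Elasticsearch releases, e.g. `install` vs. `--install`):
```sh
# Illustrative: install the 2.4.3 release, which matches Elasticsearch 1.4 in the table above.
bin/plugin install elasticsearch/elasticsearch-analysis-kuromoji/2.4.3
```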
To build a `SNAPSHOT` version, you need to build it with Maven:
```bash
mvn clean install
plugin --install analysis-kuromoji \
--url file:target/releases/elasticsearch-analysis-kuromoji-X.X.X-SNAPSHOT.zip
```
Includes Analyzer, Tokenizer, TokenFilter, CharFilter
-----------------------------------------------
The plugin provides the following analyzer, tokenizer, token filters, and char filter:
| name | type |
|-------------------------|-------------|
| kuromoji_iteration_mark | charfilter |
| kuromoji | analyzer |
| kuromoji_tokenizer | tokenizer |
| kuromoji_baseform | tokenfilter |
| kuromoji_part_of_speech | tokenfilter |
| kuromoji_readingform | tokenfilter |
| kuromoji_stemmer | tokenfilter |
| ja_stop | tokenfilter |
Usage
-----
## Analyzer : kuromoji
An analyzer of type `kuromoji`.
This analyzer combines the following tokenizer and token filters; a quick `_analyze` example follows the list.
* `kuromoji_tokenizer` : Kuromoji Tokenizer
* `kuromoji_baseform` : Kuromoji BaseFormFilter (TokenFilter)
* `kuromoji_part_of_speech` : Kuromoji Part of Speech Stop Filter (TokenFilter)
* `cjk_width` : CJK Width Filter (TokenFilter)
* `stop` : Stop Filter (TokenFilter)
* `kuromoji_stemmer` : Kuromoji Katakana Stemmer Filter(TokenFilter)
* `lowercase` : LowerCase Filter (TokenFilter)
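As a quick, illustrative check, the pre-built `kuromoji` analyzer can be exercised directly through the `_analyze` API. The sentence below is the one used in this plugin's integration test, which expects tokens such as `新宿`, `ビール`, and the base form `飲む`; the exact output depends on the dictionary and plugin version.
```sh
# Analyze a sample sentence with the pre-built kuromoji analyzer (no index-specific settings required).
curl -XPOST 'http://localhost:9200/_analyze?analyzer=kuromoji&pretty' -d 'JR新宿駅の近くにビールを飲みに行こうか'
```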
## CharFilter : kuromoji_iteration_mark
A charfilter of type `kuromoji_iteration_mark`.
This char filter normalizes Japanese horizontal iteration marks (odoriji) to their expanded form.
The following are the settings that can be set for a `kuromoji_iteration_mark` charfilter type (a configuration sketch follows the table):
| **Setting** | **Description** | **Default value** |
|:----------------|:-------------------------------------------------------------|:------------------|
| normalize_kanji | indicates whether kanji iteration marks should be normalized | `true` |
| normalize_kana  | indicates whether kana iteration marks should be normalized  | `true`            |
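A minimal configuration sketch (the index name `kuromoji_sample`, the char filter name `iteration_mark_normalizer`, and the analyzer name `my_analyzer` are illustrative) that wires the char filter into a custom analyzer, in the same style as the other examples in this document:
```sh
curl -XPUT 'http://localhost:9200/kuromoji_sample/' -d'
{
  "settings": {
    "index": {
      "analysis": {
        "char_filter": {
          "iteration_mark_normalizer": {
            "type": "kuromoji_iteration_mark",
            "normalize_kanji": true,
            "normalize_kana": true
          }
        },
        "analyzer": {
          "my_analyzer": {
            "type": "custom",
            "char_filter": ["iteration_mark_normalizer"],
            "tokenizer": "kuromoji_tokenizer"
          }
        }
      }
    }
  }
}
'
```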
## Tokenizer : kuromoji_tokenizer
A tokenizer of type `kuromoji_tokenizer`.
The following are settings that can be set for a `kuromoji_tokenizer` tokenizer type:
| **Setting** | **Description** | **Default value** |
|:--------------------|:--------------------------------------------------------------------------------------------------------------------------|:------------------|
| mode                 | Tokenization mode: determines how the tokenizer handles compound and unknown words. One of `normal`, `search`, or `extended`. | `search`          |
| discard_punctuation | `true` if punctuation tokens should be dropped from the output. | `true` |
| user_dictionary      | Path to the user dictionary file                                                                                             |                   |
### Tokenization mode
Three tokenization modes are available:
* `normal` : Ordinary segmentation: no decomposition for compounds
* `search` : Segmentation geared towards search: this includes a decompounding process for long nouns, also including the full compound token as a synonym.
* `extended` : Extended mode outputs unigrams for unknown words.
#### Differences between tokenization mode outputs
Input text is `関西国際空港` and `アブラカダブラ`.
| **mode** | `関西国際空港` | `アブラカダブラ` |
|:-----------|:-------------|:-------|
| `normal` | `関西国際空港` | `アブラカダブラ` |
| `search` | `関西` `関西国際空港` `国際` `空港` | `アブラカダブラ` |
| `extended` | `関西` `国際` `空港` | `ア` `ブ` `ラ` `カ` `ダ` `ブ` `ラ` |
### User Dictionary
The Kuromoji tokenizer uses the MeCab-IPADIC dictionary by default.
Additional entries can be defined by the user in a User Dictionary.
User Dictionary entries are defined using the following CSV format:
```
<text>,<token 1> ... <token n>,<reading 1> ... <reading n>,<part-of-speech tag>
```
Dictionary Example
```
東京スカイツリー,東京 スカイツリー,トウキョウ スカイツリー,カスタム名詞
```
To use a User Dictionary, set its file path in the `user_dictionary` setting.
The User Dictionary file must be placed in the `ES_HOME/config` directory.
### example
_Example Settings:_
```sh
curl -XPUT 'http://localhost:9200/kuromoji_sample/' -d'
{
  "settings": {
    "index":{
      "analysis":{
        "tokenizer" : {
          "kuromoji_user_dict" : {
            "type" : "kuromoji_tokenizer",
            "mode" : "extended",
            "discard_punctuation" : "false",
            "user_dictionary" : "userdict_ja.txt"
          }
        },
        "analyzer" : {
          "my_analyzer" : {
            "type" : "custom",
            "tokenizer" : "kuromoji_user_dict"
          }
        }
      }
    }
  }
}
'
```
_Example Request using `_analyze` API :_
```sh
curl -XPOST 'http://localhost:9200/kuromoji_sample/_analyze?analyzer=my_analyzer&pretty' -d '東京スカイツリー'
```
_Response :_
```json
{
  "tokens" : [ {
    "token" : "東京",
    "start_offset" : 0,
    "end_offset" : 2,
    "type" : "word",
    "position" : 1
  }, {
    "token" : "スカイツリー",
    "start_offset" : 2,
    "end_offset" : 8,
    "type" : "word",
    "position" : 2
  } ]
}
```
## TokenFilter : kuromoji_baseform
A token filter of type `kuromoji_baseform` that replaces the term text with the token's base form (its `BaseFormAttribute`).
This acts as a lemmatizer for verbs and adjectives.
### example
_Example Settings:_
```sh
curl -XPUT 'http://localhost:9200/kuromoji_sample/' -d'
{
  "settings": {
    "index":{
      "analysis":{
        "analyzer" : {
          "my_analyzer" : {
            "tokenizer" : "kuromoji_tokenizer",
            "filter" : ["kuromoji_baseform"]
          }
        }
      }
    }
  }
}
'
```
_Example Request using `_analyze` API :_
```sh
curl -XPOST 'http://localhost:9200/kuromoji_sample/_analyze?analyzer=my_analyzer&pretty' -d '飲み'
```
_Response :_
```json
{
  "tokens" : [ {
    "token" : "飲む",
    "start_offset" : 0,
    "end_offset" : 2,
    "type" : "word",
    "position" : 1
  } ]
}
```
## TokenFilter : kuromoji_part_of_speech
A token filter of type `kuromoji_part_of_speech` that removes tokens that match a set of part-of-speech tags.
The following are settings that can be set for a `kuromoji_part_of_speech` token filter type:
| **Setting** | **Description** |
|:------------|:-----------------------------------------------------|
| stoptags | A list of part-of-speech tags that should be removed |
Note that the default `stoptags` are read from the `stoptags.txt` file bundled in `lucene-analyzers-kuromoji.jar`.
### example
_Example Settings:_
```sh
curl -XPUT 'http://localhost:9200/kuromoji_sample/' -d'
{
  "settings": {
    "index":{
      "analysis":{
        "analyzer" : {
          "my_analyzer" : {
            "tokenizer" : "kuromoji_tokenizer",
            "filter" : ["my_posfilter"]
          }
        },
        "filter" : {
          "my_posfilter" : {
            "type" : "kuromoji_part_of_speech",
            "stoptags" : [
              "助詞-格助詞-一般",
              "助詞-終助詞"
            ]
          }
        }
      }
    }
  }
}
'
```
_Example Request using `_analyze` API :_
```sh
curl -XPOST 'http://localhost:9200/kuromoji_sample/_analyze?analyzer=my_analyzer&pretty' -d '寿司がおいしいね'
```
_Response :_
```json
{
  "tokens" : [ {
    "token" : "寿司",
    "start_offset" : 0,
    "end_offset" : 2,
    "type" : "word",
    "position" : 1
  }, {
    "token" : "おいしい",
    "start_offset" : 3,
    "end_offset" : 7,
    "type" : "word",
    "position" : 3
  } ]
}
```
## TokenFilter : kuromoji_readingform
A token filter of type `kuromoji_readingform` that replaces the term attribute with the reading of a token in either katakana or romaji form.
The default reading form is katakana.
The following are settings that can be set for a `kuromoji_readingform` token filter type:
| **Setting** | **Description** | **Default value** |
|:------------|:----------------------------------------------------------|:------------------|
| use_romaji  | `true` to output the romaji reading form instead of katakana | `false`           |
Note that the pre-built `kuromoji_readingform` filter registered by this plugin sets `use_romaji` to `true` by default.
### example
_Example Settings:_
```sh
curl -XPUT 'http://localhost:9200/kuromoji_sample/' -d'
{
  "settings": {
    "index":{
      "analysis":{
        "analyzer" : {
          "romaji_analyzer" : {
            "tokenizer" : "kuromoji_tokenizer",
            "filter" : ["romaji_readingform"]
          },
          "katakana_analyzer" : {
            "tokenizer" : "kuromoji_tokenizer",
            "filter" : ["katakana_readingform"]
          }
        },
        "filter" : {
          "romaji_readingform" : {
            "type" : "kuromoji_readingform",
            "use_romaji" : true
          },
          "katakana_readingform" : {
            "type" : "kuromoji_readingform",
            "use_romaji" : false
          }
        }
      }
    }
  }
}
'
```
_Example Request using `_analyze` API :_
```sh
curl -XPOST 'http://localhost:9200/kuromoji_sample/_analyze?analyzer=katakana_analyzer&pretty' -d '寿司'
```
_Response :_
```json
{
  "tokens" : [ {
    "token" : "スシ",
    "start_offset" : 0,
    "end_offset" : 2,
    "type" : "word",
    "position" : 1
  } ]
}
```
_Example Request using `_analyze` API :_
```sh
curl -XPOST 'http://localhost:9200/kuromoji_sample/_analyze?analyzer=romaji_analyzer&pretty' -d '寿司'
```
_Response :_
```json
{
  "tokens" : [ {
    "token" : "sushi",
    "start_offset" : 0,
    "end_offset" : 2,
    "type" : "word",
    "position" : 1
  } ]
}
```
## TokenFilter : kuromoji_stemmer
A token filter of type `kuromoji_stemmer` that normalizes common katakana spelling variations ending in a long sound character by removing this character (U+30FC).
Only katakana words longer than a minimum length are stemmed (default is four).
Note that only full-width katakana characters are supported.
The following are settings that can be set for a `kuromoji_stemmer` token filter type:
| **Setting** | **Description** | **Default value** |
|:----------------|:---------------------------|:------------------|
| minimum_length | The minimum length to stem | `4` |
### example
_Example Settings:_
```sh
curl -XPUT 'http://localhost:9200/kuromoji_sample/' -d'
{
  "settings": {
    "index":{
      "analysis":{
        "analyzer" : {
          "my_analyzer" : {
            "tokenizer" : "kuromoji_tokenizer",
            "filter" : ["my_katakana_stemmer"]
          }
        },
        "filter" : {
          "my_katakana_stemmer" : {
            "type" : "kuromoji_stemmer",
            "minimum_length" : 4
          }
        }
      }
    }
  }
}
'
```
_Example Request using `_analyze` API :_
```sh
curl -XPOST 'http://localhost:9200/kuromoji_sample/_analyze?analyzer=my_analyzer&pretty' -d 'コピー'
```
_Response :_
```json
{
  "tokens" : [ {
    "token" : "コピー",
    "start_offset" : 0,
    "end_offset" : 3,
    "type" : "word",
    "position" : 1
  } ]
}
```
_Example Request using `_analyze` API :_
```sh
curl -XPOST 'http://localhost:9200/kuromoji_sample/_analyze?analyzer=my_analyzer&pretty' -d 'サーバー'
```
_Response :_
```json
{
  "tokens" : [ {
    "token" : "サーバ",
    "start_offset" : 0,
    "end_offset" : 4,
    "type" : "word",
    "position" : 1
  } ]
}
```
## TokenFilter : ja_stop
A token filter of type `ja_stop` that provides the predefined `_japanese_` stop word list.
*Note: only the `_japanese_` predefined list is provided. If you want to use other predefined stop word lists, use the `stop` token filter.*
### example
_Example Settings:_
```sh
curl -XPUT 'http://localhost:9200/kuromoji_sample/' -d'
{
  "settings": {
    "index":{
      "analysis":{
        "analyzer" : {
          "analyzer_with_ja_stop" : {
            "tokenizer" : "kuromoji_tokenizer",
            "filter" : ["ja_stop"]
          }
        },
        "filter" : {
          "ja_stop" : {
            "type" : "ja_stop",
            "stopwords" : ["_japanese_", "ストップ"]
          }
        }
      }
    }
  }
}'
```
_Example Request using `_analyze` API :_
```sh
curl -XPOST 'http://localhost:9200/kuromoji_sample/_analyze?analyzer=analyzer_with_ja_stop&pretty' -d 'ストップは消える'
```
_Response :_
```json
{
  "tokens" : [ {
    "token" : "消える",
    "start_offset" : 5,
    "end_offset" : 8,
    "type" : "word",
    "position" : 3
  } ]
}
```
License
-------
This software is licensed under the Apache 2 license, quoted below.
Copyright 2009-2014 Elasticsearch <http://www.elasticsearch.org>
Licensed under the Apache License, Version 2.0 (the "License"); you may not
use this file except in compliance with the License. You may obtain a copy of
the License at
http://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing, software
distributed under the License is distributed on an "AS IS" BASIS, WITHOUT
WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the
License for the specific language governing permissions and limitations under
the License.

View File

@ -0,0 +1,40 @@
<?xml version="1.0" encoding="UTF-8"?>
<project xmlns="http://maven.apache.org/POM/4.0.0"
xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
xsi:schemaLocation="http://maven.apache.org/POM/4.0.0 http://maven.apache.org/xsd/maven-4.0.0.xsd">
<modelVersion>4.0.0</modelVersion>
<groupId>org.elasticsearch.plugin</groupId>
<artifactId>elasticsearch-analysis-kuromoji</artifactId>
<packaging>jar</packaging>
<name>Elasticsearch Japanese (kuromoji) Analysis plugin</name>
<description>The Japanese (kuromoji) Analysis plugin integrates Lucene kuromoji analysis module into elasticsearch.</description>
<parent>
<groupId>org.elasticsearch</groupId>
<artifactId>elasticsearch-plugin</artifactId>
<version>2.0.0-SNAPSHOT</version>
</parent>
<properties>
<!-- You can add any specific project property here -->
</properties>
<dependencies>
<dependency>
<groupId>org.apache.lucene</groupId>
<artifactId>lucene-analyzers-kuromoji</artifactId>
</dependency>
</dependencies>
<build>
<plugins>
<plugin>
<groupId>org.apache.maven.plugins</groupId>
<artifactId>maven-assembly-plugin</artifactId>
</plugin>
</plugins>
</build>
</project>

View File

@ -0,0 +1,26 @@
<?xml version="1.0"?>
<assembly>
<id>plugin</id>
<formats>
<format>zip</format>
</formats>
<includeBaseDirectory>false</includeBaseDirectory>
<dependencySets>
<dependencySet>
<outputDirectory>/</outputDirectory>
<useProjectArtifact>true</useProjectArtifact>
<useTransitiveFiltering>true</useTransitiveFiltering>
<excludes>
<exclude>org.elasticsearch:elasticsearch</exclude>
</excludes>
</dependencySet>
<dependencySet>
<outputDirectory>/</outputDirectory>
<useProjectArtifact>true</useProjectArtifact>
<useTransitiveFiltering>true</useTransitiveFiltering>
<includes>
<include>org.apache.lucene:lucene-analyzers-kuromoji</include>
</includes>
</dependencySet>
</dependencySets>
</assembly>

View File

@ -0,0 +1,76 @@
/*
* Licensed to Elasticsearch under one or more contributor
* license agreements. See the NOTICE file distributed with
* this work for additional information regarding copyright
* ownership. Elasticsearch licenses this file to you under
* the Apache License, Version 2.0 (the "License"); you may
* not use this file except in compliance with the License.
* You may obtain a copy of the License at
*
* http://www.apache.org/licenses/LICENSE-2.0
*
* Unless required by applicable law or agreed to in writing,
* software distributed under the License is distributed on an
* "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
* KIND, either express or implied. See the License for the
* specific language governing permissions and limitations
* under the License.
*/
package org.elasticsearch.index.analysis;
import org.apache.lucene.analysis.TokenStream;
import org.apache.lucene.analysis.core.StopFilter;
import org.apache.lucene.analysis.ja.JapaneseAnalyzer;
import org.apache.lucene.analysis.util.CharArraySet;
import org.apache.lucene.search.suggest.analyzing.SuggestStopFilter;
import org.elasticsearch.common.collect.MapBuilder;
import org.elasticsearch.common.inject.Inject;
import org.elasticsearch.common.inject.assistedinject.Assisted;
import org.elasticsearch.common.settings.Settings;
import org.elasticsearch.env.Environment;
import org.elasticsearch.index.Index;
import org.elasticsearch.index.settings.IndexSettings;
import java.util.Map;
import java.util.Set;
public class JapaneseStopTokenFilterFactory extends AbstractTokenFilterFactory{
private final CharArraySet stopWords;
private final boolean ignoreCase;
private final boolean removeTrailing;
@Inject
public JapaneseStopTokenFilterFactory(Index index, @IndexSettings Settings indexSettings, Environment env, @Assisted String name, @Assisted Settings settings) {
super(index, indexSettings, name, settings);
this.ignoreCase = settings.getAsBoolean("ignore_case", false);
this.removeTrailing = settings.getAsBoolean("remove_trailing", true);
Map<String, Set<?>> namedStopWords = MapBuilder.<String, Set<?>>newMapBuilder()
.put("_japanese_", JapaneseAnalyzer.getDefaultStopSet())
.immutableMap();
this.stopWords = Analysis.parseWords(env, settings, "stopwords", JapaneseAnalyzer.getDefaultStopSet(), namedStopWords, ignoreCase);
}
@Override
public TokenStream create(TokenStream tokenStream) {
if (removeTrailing) {
return new StopFilter(tokenStream, stopWords);
} else {
return new SuggestStopFilter(tokenStream, stopWords);
}
}
public Set<?> stopWords() {
return stopWords;
}
public boolean ignoreCase() {
return ignoreCase;
}
}

View File

@ -0,0 +1,56 @@
/*
* Licensed to Elasticsearch under one or more contributor
* license agreements. See the NOTICE file distributed with
* this work for additional information regarding copyright
* ownership. Elasticsearch licenses this file to you under
* the Apache License, Version 2.0 (the "License"); you may
* not use this file except in compliance with the License.
* You may obtain a copy of the License at
*
* http://www.apache.org/licenses/LICENSE-2.0
*
* Unless required by applicable law or agreed to in writing,
* software distributed under the License is distributed on an
* "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
* KIND, either express or implied. See the License for the
* specific language governing permissions and limitations
* under the License.
*/
package org.elasticsearch.index.analysis;
import org.apache.lucene.analysis.ja.JapaneseAnalyzer;
import org.apache.lucene.analysis.ja.JapaneseTokenizer;
import org.apache.lucene.analysis.ja.dict.UserDictionary;
import org.apache.lucene.analysis.util.CharArraySet;
import org.elasticsearch.common.inject.Inject;
import org.elasticsearch.common.inject.assistedinject.Assisted;
import org.elasticsearch.common.settings.Settings;
import org.elasticsearch.env.Environment;
import org.elasticsearch.index.Index;
import org.elasticsearch.index.settings.IndexSettings;
import java.util.Set;
/**
*/
public class KuromojiAnalyzerProvider extends AbstractIndexAnalyzerProvider<JapaneseAnalyzer> {
private final JapaneseAnalyzer analyzer;
@Inject
public KuromojiAnalyzerProvider(Index index, @IndexSettings Settings indexSettings, Environment env, @Assisted String name, @Assisted Settings settings) {
super(index, indexSettings, name, settings);
final Set<?> stopWords = Analysis.parseStopWords(env, settings, JapaneseAnalyzer.getDefaultStopSet());
final JapaneseTokenizer.Mode mode = KuromojiTokenizerFactory.getMode(settings);
final UserDictionary userDictionary = KuromojiTokenizerFactory.getUserDictionary(env, settings);
analyzer = new JapaneseAnalyzer(userDictionary, mode, CharArraySet.copy(stopWords), JapaneseAnalyzer.getDefaultStopTags());
}
@Override
public JapaneseAnalyzer get() {
return this.analyzer;
}
}

View File

@ -0,0 +1,41 @@
/*
* Licensed to Elasticsearch under one or more contributor
* license agreements. See the NOTICE file distributed with
* this work for additional information regarding copyright
* ownership. Elasticsearch licenses this file to you under
* the Apache License, Version 2.0 (the "License"); you may
* not use this file except in compliance with the License.
* You may obtain a copy of the License at
*
* http://www.apache.org/licenses/LICENSE-2.0
*
* Unless required by applicable law or agreed to in writing,
* software distributed under the License is distributed on an
* "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
* KIND, either express or implied. See the License for the
* specific language governing permissions and limitations
* under the License.
*/
package org.elasticsearch.index.analysis;
import org.apache.lucene.analysis.TokenStream;
import org.apache.lucene.analysis.ja.JapaneseBaseFormFilter;
import org.elasticsearch.common.inject.Inject;
import org.elasticsearch.common.inject.assistedinject.Assisted;
import org.elasticsearch.common.settings.Settings;
import org.elasticsearch.index.Index;
import org.elasticsearch.index.settings.IndexSettings;
public class KuromojiBaseFormFilterFactory extends AbstractTokenFilterFactory {
@Inject
public KuromojiBaseFormFilterFactory(Index index, @IndexSettings Settings indexSettings, @Assisted String name, @Assisted Settings settings) {
super(index, indexSettings, name, settings);
}
@Override
public TokenStream create(TokenStream tokenStream) {
return new JapaneseBaseFormFilter(tokenStream);
}
}

View File

@ -0,0 +1,48 @@
/*
* Licensed to Elasticsearch under one or more contributor
* license agreements. See the NOTICE file distributed with
* this work for additional information regarding copyright
* ownership. Elasticsearch licenses this file to you under
* the Apache License, Version 2.0 (the "License"); you may
* not use this file except in compliance with the License.
* You may obtain a copy of the License at
*
* http://www.apache.org/licenses/LICENSE-2.0
*
* Unless required by applicable law or agreed to in writing,
* software distributed under the License is distributed on an
* "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
* KIND, either express or implied. See the License for the
* specific language governing permissions and limitations
* under the License.
*/
package org.elasticsearch.index.analysis;
import org.apache.lucene.analysis.ja.JapaneseIterationMarkCharFilter;
import org.elasticsearch.common.inject.Inject;
import org.elasticsearch.common.inject.assistedinject.Assisted;
import org.elasticsearch.common.settings.Settings;
import org.elasticsearch.index.Index;
import org.elasticsearch.index.settings.IndexSettings;
import java.io.Reader;
public class KuromojiIterationMarkCharFilterFactory extends AbstractCharFilterFactory {
private final boolean normalizeKanji;
private final boolean normalizeKana;
@Inject
public KuromojiIterationMarkCharFilterFactory(Index index, @IndexSettings Settings indexSettings,
@Assisted String name, @Assisted Settings settings) {
super(index, indexSettings, name);
normalizeKanji = settings.getAsBoolean("normalize_kanji", JapaneseIterationMarkCharFilter.NORMALIZE_KANJI_DEFAULT);
normalizeKana = settings.getAsBoolean("normalize_kana", JapaneseIterationMarkCharFilter.NORMALIZE_KANA_DEFAULT);
}
@Override
public Reader create(Reader reader) {
return new JapaneseIterationMarkCharFilter(reader, normalizeKanji, normalizeKana);
}
}

View File

@ -0,0 +1,44 @@
/*
* Licensed to Elasticsearch under one or more contributor
* license agreements. See the NOTICE file distributed with
* this work for additional information regarding copyright
* ownership. Elasticsearch licenses this file to you under
* the Apache License, Version 2.0 (the "License"); you may
* not use this file except in compliance with the License.
* You may obtain a copy of the License at
*
* http://www.apache.org/licenses/LICENSE-2.0
*
* Unless required by applicable law or agreed to in writing,
* software distributed under the License is distributed on an
* "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
* KIND, either express or implied. See the License for the
* specific language governing permissions and limitations
* under the License.
*/
package org.elasticsearch.index.analysis;
import org.apache.lucene.analysis.TokenStream;
import org.apache.lucene.analysis.ja.JapaneseKatakanaStemFilter;
import org.elasticsearch.common.inject.Inject;
import org.elasticsearch.common.inject.assistedinject.Assisted;
import org.elasticsearch.common.settings.Settings;
import org.elasticsearch.index.Index;
import org.elasticsearch.index.settings.IndexSettings;
public class KuromojiKatakanaStemmerFactory extends AbstractTokenFilterFactory {
private final int minimumLength;
@Inject
public KuromojiKatakanaStemmerFactory(Index index, @IndexSettings Settings indexSettings, @Assisted String name, @Assisted Settings settings) {
super(index, indexSettings, name, settings);
minimumLength = settings.getAsInt("minimum_length", JapaneseKatakanaStemFilter.DEFAULT_MINIMUM_LENGTH);
}
@Override
public TokenStream create(TokenStream tokenStream) {
return new JapaneseKatakanaStemFilter(tokenStream, minimumLength);
}
}

View File

@ -0,0 +1,53 @@
/*
* Licensed to Elasticsearch under one or more contributor
* license agreements. See the NOTICE file distributed with
* this work for additional information regarding copyright
* ownership. Elasticsearch licenses this file to you under
* the Apache License, Version 2.0 (the "License"); you may
* not use this file except in compliance with the License.
* You may obtain a copy of the License at
*
* http://www.apache.org/licenses/LICENSE-2.0
*
* Unless required by applicable law or agreed to in writing,
* software distributed under the License is distributed on an
* "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
* KIND, either express or implied. See the License for the
* specific language governing permissions and limitations
* under the License.
*/
package org.elasticsearch.index.analysis;
import org.apache.lucene.analysis.TokenStream;
import org.apache.lucene.analysis.ja.JapanesePartOfSpeechStopFilter;
import org.elasticsearch.common.inject.Inject;
import org.elasticsearch.common.inject.assistedinject.Assisted;
import org.elasticsearch.common.settings.Settings;
import org.elasticsearch.env.Environment;
import org.elasticsearch.index.Index;
import org.elasticsearch.index.settings.IndexSettings;
import java.util.HashSet;
import java.util.List;
import java.util.Set;
public class KuromojiPartOfSpeechFilterFactory extends AbstractTokenFilterFactory {
private final Set<String> stopTags = new HashSet<String>();
@Inject
public KuromojiPartOfSpeechFilterFactory(Index index, @IndexSettings Settings indexSettings, Environment env, @Assisted String name, @Assisted Settings settings) {
super(index, indexSettings, name, settings);
List<String> wordList = Analysis.getWordList(env, settings, "stoptags");
if (wordList != null) {
stopTags.addAll(wordList);
}
}
@Override
public TokenStream create(TokenStream tokenStream) {
return new JapanesePartOfSpeechStopFilter(tokenStream, stopTags);
}
}

View File

@ -0,0 +1,44 @@
/*
* Licensed to Elasticsearch under one or more contributor
* license agreements. See the NOTICE file distributed with
* this work for additional information regarding copyright
* ownership. Elasticsearch licenses this file to you under
* the Apache License, Version 2.0 (the "License"); you may
* not use this file except in compliance with the License.
* You may obtain a copy of the License at
*
* http://www.apache.org/licenses/LICENSE-2.0
*
* Unless required by applicable law or agreed to in writing,
* software distributed under the License is distributed on an
* "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
* KIND, either express or implied. See the License for the
* specific language governing permissions and limitations
* under the License.
*/
package org.elasticsearch.index.analysis;
import org.apache.lucene.analysis.TokenStream;
import org.apache.lucene.analysis.ja.JapaneseReadingFormFilter;
import org.elasticsearch.common.inject.Inject;
import org.elasticsearch.common.inject.assistedinject.Assisted;
import org.elasticsearch.common.settings.Settings;
import org.elasticsearch.index.Index;
import org.elasticsearch.index.settings.IndexSettings;
public class KuromojiReadingFormFilterFactory extends AbstractTokenFilterFactory {
private final boolean useRomaji;
@Inject
public KuromojiReadingFormFilterFactory(Index index, @IndexSettings Settings indexSettings, @Assisted String name, @Assisted Settings settings) {
super(index, indexSettings, name, settings);
useRomaji = settings.getAsBoolean("use_romaji", false);
}
@Override
public TokenStream create(TokenStream tokenStream) {
return new JapaneseReadingFormFilter(tokenStream, useRomaji);
}
}

View File

@ -0,0 +1,93 @@
/*
* Licensed to Elasticsearch under one or more contributor
* license agreements. See the NOTICE file distributed with
* this work for additional information regarding copyright
* ownership. Elasticsearch licenses this file to you under
* the Apache License, Version 2.0 (the "License"); you may
* not use this file except in compliance with the License.
* You may obtain a copy of the License at
*
* http://www.apache.org/licenses/LICENSE-2.0
*
* Unless required by applicable law or agreed to in writing,
* software distributed under the License is distributed on an
* "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
* KIND, either express or implied. See the License for the
* specific language governing permissions and limitations
* under the License.
*/
package org.elasticsearch.index.analysis;
import org.apache.lucene.analysis.Tokenizer;
import org.apache.lucene.analysis.ja.JapaneseTokenizer;
import org.apache.lucene.analysis.ja.JapaneseTokenizer.Mode;
import org.apache.lucene.analysis.ja.dict.UserDictionary;
import org.elasticsearch.ElasticsearchException;
import org.elasticsearch.common.inject.Inject;
import org.elasticsearch.common.inject.assistedinject.Assisted;
import org.elasticsearch.common.settings.Settings;
import org.elasticsearch.env.Environment;
import org.elasticsearch.index.Index;
import org.elasticsearch.index.settings.IndexSettings;
import java.io.IOException;
import java.io.Reader;
/**
*/
public class KuromojiTokenizerFactory extends AbstractTokenizerFactory {
private static final String USER_DICT_OPTION = "user_dictionary";
private final UserDictionary userDictionary;
private final Mode mode;
private boolean discardPunctuation;
@Inject
public KuromojiTokenizerFactory(Index index, @IndexSettings Settings indexSettings, Environment env, @Assisted String name, @Assisted Settings settings) {
super(index, indexSettings, name, settings);
mode = getMode(settings);
userDictionary = getUserDictionary(env, settings);
discardPunctuation = settings.getAsBoolean("discard_punctuation", true);
}
public static UserDictionary getUserDictionary(Environment env, Settings settings) {
try {
final Reader reader = Analysis.getReaderFromFile(env, settings, USER_DICT_OPTION);
if (reader == null) {
return null;
} else {
try {
return UserDictionary.open(reader);
} finally {
reader.close();
}
}
} catch (IOException e) {
throw new ElasticsearchException("failed to load kuromoji user dictionary", e);
}
}
public static JapaneseTokenizer.Mode getMode(Settings settings) {
JapaneseTokenizer.Mode mode = JapaneseTokenizer.DEFAULT_MODE;
String modeSetting = settings.get("mode", null);
if (modeSetting != null) {
if ("search".equalsIgnoreCase(modeSetting)) {
mode = JapaneseTokenizer.Mode.SEARCH;
} else if ("normal".equalsIgnoreCase(modeSetting)) {
mode = JapaneseTokenizer.Mode.NORMAL;
} else if ("extended".equalsIgnoreCase(modeSetting)) {
mode = JapaneseTokenizer.Mode.EXTENDED;
}
}
return mode;
}
@Override
public Tokenizer create() {
return new JapaneseTokenizer(userDictionary, discardPunctuation, mode);
}
}

View File

@ -0,0 +1,131 @@
/*
* Licensed to Elasticsearch under one or more contributor
* license agreements. See the NOTICE file distributed with
* this work for additional information regarding copyright
* ownership. Elasticsearch licenses this file to you under
* the Apache License, Version 2.0 (the "License"); you may
* not use this file except in compliance with the License.
* You may obtain a copy of the License at
*
* http://www.apache.org/licenses/LICENSE-2.0
*
* Unless required by applicable law or agreed to in writing,
* software distributed under the License is distributed on an
* "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
* KIND, either express or implied. See the License for the
* specific language governing permissions and limitations
* under the License.
*/
package org.elasticsearch.indices.analysis;
import org.apache.lucene.analysis.TokenStream;
import org.apache.lucene.analysis.Tokenizer;
import org.apache.lucene.analysis.ja.*;
import org.apache.lucene.analysis.ja.JapaneseTokenizer.Mode;
import org.elasticsearch.common.component.AbstractComponent;
import org.elasticsearch.common.inject.Inject;
import org.elasticsearch.common.settings.Settings;
import org.elasticsearch.index.analysis.*;
import java.io.Reader;
/**
* Registers indices level analysis components so, if not explicitly configured,
* will be shared among all indices.
*/
public class KuromojiIndicesAnalysis extends AbstractComponent {
@Inject
public KuromojiIndicesAnalysis(Settings settings,
IndicesAnalysisService indicesAnalysisService) {
super(settings);
indicesAnalysisService.analyzerProviderFactories().put("kuromoji",
new PreBuiltAnalyzerProviderFactory("kuromoji", AnalyzerScope.INDICES,
new JapaneseAnalyzer()));
indicesAnalysisService.charFilterFactories().put("kuromoji_iteration_mark",
new PreBuiltCharFilterFactoryFactory(new CharFilterFactory() {
@Override
public String name() {
return "kuromoji_iteration_mark";
}
@Override
public Reader create(Reader reader) {
return new JapaneseIterationMarkCharFilter(reader,
JapaneseIterationMarkCharFilter.NORMALIZE_KANJI_DEFAULT,
JapaneseIterationMarkCharFilter.NORMALIZE_KANA_DEFAULT);
}
}));
indicesAnalysisService.tokenizerFactories().put("kuromoji_tokenizer",
new PreBuiltTokenizerFactoryFactory(new TokenizerFactory() {
@Override
public String name() {
return "kuromoji_tokenizer";
}
@Override
public Tokenizer create() {
return new JapaneseTokenizer(null, true, Mode.SEARCH);
}
}));
indicesAnalysisService.tokenFilterFactories().put("kuromoji_baseform",
new PreBuiltTokenFilterFactoryFactory(new TokenFilterFactory() {
@Override
public String name() {
return "kuromoji_baseform";
}
@Override
public TokenStream create(TokenStream tokenStream) {
return new JapaneseBaseFormFilter(tokenStream);
}
}));
indicesAnalysisService.tokenFilterFactories().put(
"kuromoji_part_of_speech",
new PreBuiltTokenFilterFactoryFactory(new TokenFilterFactory() {
@Override
public String name() {
return "kuromoji_part_of_speech";
}
@Override
public TokenStream create(TokenStream tokenStream) {
return new JapanesePartOfSpeechStopFilter(tokenStream, JapaneseAnalyzer
.getDefaultStopTags());
}
}));
indicesAnalysisService.tokenFilterFactories().put(
"kuromoji_readingform",
new PreBuiltTokenFilterFactoryFactory(new TokenFilterFactory() {
@Override
public String name() {
return "kuromoji_readingform";
}
@Override
public TokenStream create(TokenStream tokenStream) {
return new JapaneseReadingFormFilter(tokenStream, true);
}
}));
indicesAnalysisService.tokenFilterFactories().put("kuromoji_stemmer",
new PreBuiltTokenFilterFactoryFactory(new TokenFilterFactory() {
@Override
public String name() {
return "kuromoji_stemmer";
}
@Override
public TokenStream create(TokenStream tokenStream) {
return new JapaneseKatakanaStemFilter(tokenStream);
}
}));
}
}

View File

@ -0,0 +1,32 @@
/*
* Licensed to Elasticsearch under one or more contributor
* license agreements. See the NOTICE file distributed with
* this work for additional information regarding copyright
* ownership. Elasticsearch licenses this file to you under
* the Apache License, Version 2.0 (the "License"); you may
* not use this file except in compliance with the License.
* You may obtain a copy of the License at
*
* http://www.apache.org/licenses/LICENSE-2.0
*
* Unless required by applicable law or agreed to in writing,
* software distributed under the License is distributed on an
* "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
* KIND, either express or implied. See the License for the
* specific language governing permissions and limitations
* under the License.
*/
package org.elasticsearch.indices.analysis;
import org.elasticsearch.common.inject.AbstractModule;
/**
*/
public class KuromojiIndicesAnalysisModule extends AbstractModule {
@Override
protected void configure() {
bind(KuromojiIndicesAnalysis.class).asEagerSingleton();
}
}

View File

@ -0,0 +1,62 @@
/*
* Licensed to Elasticsearch under one or more contributor
* license agreements. See the NOTICE file distributed with
* this work for additional information regarding copyright
* ownership. Elasticsearch licenses this file to you under
* the Apache License, Version 2.0 (the "License"); you may
* not use this file except in compliance with the License.
* You may obtain a copy of the License at
*
* http://www.apache.org/licenses/LICENSE-2.0
*
* Unless required by applicable law or agreed to in writing,
* software distributed under the License is distributed on an
* "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
* KIND, either express or implied. See the License for the
* specific language governing permissions and limitations
* under the License.
*/
package org.elasticsearch.plugin.analysis.kuromoji;
import org.elasticsearch.common.inject.Module;
import org.elasticsearch.index.analysis.*;
import org.elasticsearch.indices.analysis.KuromojiIndicesAnalysisModule;
import org.elasticsearch.plugins.AbstractPlugin;
import java.util.ArrayList;
import java.util.Collection;
/**
*
*/
public class AnalysisKuromojiPlugin extends AbstractPlugin {
@Override
public String name() {
return "analysis-kuromoji";
}
@Override
public String description() {
return "Kuromoji analysis support";
}
@Override
public Collection<Class<? extends Module>> modules() {
Collection<Class<? extends Module>> classes = new ArrayList<>();
classes.add(KuromojiIndicesAnalysisModule.class);
return classes;
}
public void onModule(AnalysisModule module) {
module.addCharFilter("kuromoji_iteration_mark", KuromojiIterationMarkCharFilterFactory.class);
module.addAnalyzer("kuromoji", KuromojiAnalyzerProvider.class);
module.addTokenizer("kuromoji_tokenizer", KuromojiTokenizerFactory.class);
module.addTokenFilter("kuromoji_baseform", KuromojiBaseFormFilterFactory.class);
module.addTokenFilter("kuromoji_part_of_speech", KuromojiPartOfSpeechFilterFactory.class);
module.addTokenFilter("kuromoji_readingform", KuromojiReadingFormFilterFactory.class);
module.addTokenFilter("kuromoji_stemmer", KuromojiKatakanaStemmerFactory.class);
module.addTokenFilter("ja_stop", JapaneseStopTokenFilterFactory.class);
}
}

View File

@ -0,0 +1,3 @@
plugin=org.elasticsearch.plugin.analysis.kuromoji.AnalysisKuromojiPlugin
version=${project.version}
lucene=${lucene.version}

View File

@ -0,0 +1,267 @@
/*
* Licensed to Elasticsearch under one or more contributor
* license agreements. See the NOTICE file distributed with
* this work for additional information regarding copyright
* ownership. Elasticsearch licenses this file to you under
* the Apache License, Version 2.0 (the "License"); you may
* not use this file except in compliance with the License.
* You may obtain a copy of the License at
*
* http://www.apache.org/licenses/LICENSE-2.0
*
* Unless required by applicable law or agreed to in writing,
* software distributed under the License is distributed on an
* "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
* KIND, either express or implied. See the License for the
* specific language governing permissions and limitations
* under the License.
*/
package org.elasticsearch.index.analysis;
import org.apache.lucene.analysis.TokenStream;
import org.apache.lucene.analysis.Tokenizer;
import org.apache.lucene.analysis.ja.JapaneseAnalyzer;
import org.apache.lucene.analysis.ja.JapaneseTokenizer;
import org.apache.lucene.analysis.tokenattributes.CharTermAttribute;
import org.elasticsearch.Version;
import org.elasticsearch.cluster.metadata.IndexMetaData;
import org.elasticsearch.common.inject.Injector;
import org.elasticsearch.common.inject.ModulesBuilder;
import org.elasticsearch.common.settings.Settings;
import org.elasticsearch.common.settings.SettingsModule;
import org.elasticsearch.env.Environment;
import org.elasticsearch.env.EnvironmentModule;
import org.elasticsearch.index.Index;
import org.elasticsearch.index.IndexNameModule;
import org.elasticsearch.index.settings.IndexSettingsModule;
import org.elasticsearch.indices.analysis.IndicesAnalysisModule;
import org.elasticsearch.indices.analysis.IndicesAnalysisService;
import org.elasticsearch.plugin.analysis.kuromoji.AnalysisKuromojiPlugin;
import org.elasticsearch.test.ElasticsearchTestCase;
import org.junit.Test;
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import static org.hamcrest.Matchers.*;
/**
*/
public class KuromojiAnalysisTests extends ElasticsearchTestCase {
@Test
public void testDefaultsKuromojiAnalysis() throws IOException {
AnalysisService analysisService = createAnalysisService();
TokenizerFactory tokenizerFactory = analysisService.tokenizer("kuromoji_tokenizer");
assertThat(tokenizerFactory, instanceOf(KuromojiTokenizerFactory.class));
TokenFilterFactory filterFactory = analysisService.tokenFilter("kuromoji_part_of_speech");
assertThat(filterFactory, instanceOf(KuromojiPartOfSpeechFilterFactory.class));
filterFactory = analysisService.tokenFilter("kuromoji_readingform");
assertThat(filterFactory, instanceOf(KuromojiReadingFormFilterFactory.class));
filterFactory = analysisService.tokenFilter("kuromoji_baseform");
assertThat(filterFactory, instanceOf(KuromojiBaseFormFilterFactory.class));
filterFactory = analysisService.tokenFilter("kuromoji_stemmer");
assertThat(filterFactory, instanceOf(KuromojiKatakanaStemmerFactory.class));
filterFactory = analysisService.tokenFilter("ja_stop");
assertThat(filterFactory, instanceOf(JapaneseStopTokenFilterFactory.class));
NamedAnalyzer analyzer = analysisService.analyzer("kuromoji");
assertThat(analyzer.analyzer(), instanceOf(JapaneseAnalyzer.class));
analyzer = analysisService.analyzer("my_analyzer");
assertThat(analyzer.analyzer(), instanceOf(CustomAnalyzer.class));
assertThat(analyzer.analyzer().tokenStream(null, new StringReader("")), instanceOf(JapaneseTokenizer.class));
CharFilterFactory charFilterFactory = analysisService.charFilter("kuromoji_iteration_mark");
assertThat(charFilterFactory, instanceOf(KuromojiIterationMarkCharFilterFactory.class));
}
@Test
public void testBaseFormFilterFactory() throws IOException {
AnalysisService analysisService = createAnalysisService();
TokenFilterFactory tokenFilter = analysisService.tokenFilter("kuromoji_pos");
assertThat(tokenFilter, instanceOf(KuromojiPartOfSpeechFilterFactory.class));
String source = "私は制限スピードを超える。";
String[] expected = new String[]{"私", "は", "制限", "スピード", "を"};
Tokenizer tokenizer = new JapaneseTokenizer(null, true, JapaneseTokenizer.Mode.SEARCH);
tokenizer.setReader(new StringReader(source));
assertSimpleTSOutput(tokenFilter.create(tokenizer), expected);
}
@Test
public void testReadingFormFilterFactory() throws IOException {
AnalysisService analysisService = createAnalysisService();
TokenFilterFactory tokenFilter = analysisService.tokenFilter("kuromoji_rf");
assertThat(tokenFilter, instanceOf(KuromojiReadingFormFilterFactory.class));
String source = "今夜はロバート先生と話した";
String[] expected_tokens_romaji = new String[]{"kon'ya", "ha", "robato", "sensei", "to", "hanashi", "ta"};
Tokenizer tokenizer = new JapaneseTokenizer(null, true, JapaneseTokenizer.Mode.SEARCH);
tokenizer.setReader(new StringReader(source));
assertSimpleTSOutput(tokenFilter.create(tokenizer), expected_tokens_romaji);
tokenizer = new JapaneseTokenizer(null, true, JapaneseTokenizer.Mode.SEARCH);
tokenizer.setReader(new StringReader(source));
String[] expected_tokens_katakana = new String[]{"コンヤ", "ハ", "ロバート", "センセイ", "ト", "ハナシ", "タ"};
tokenFilter = analysisService.tokenFilter("kuromoji_readingform");
assertThat(tokenFilter, instanceOf(KuromojiReadingFormFilterFactory.class));
assertSimpleTSOutput(tokenFilter.create(tokenizer), expected_tokens_katakana);
}
@Test
public void testKatakanaStemFilter() throws IOException {
AnalysisService analysisService = createAnalysisService();
TokenFilterFactory tokenFilter = analysisService.tokenFilter("kuromoji_stemmer");
assertThat(tokenFilter, instanceOf(KuromojiKatakanaStemmerFactory.class));
String source = "明後日パーティーに行く予定がある。図書館で資料をコピーしました。";
Tokenizer tokenizer = new JapaneseTokenizer(null, true, JapaneseTokenizer.Mode.SEARCH);
tokenizer.setReader(new StringReader(source));
// パーティー should be stemmed by default
// (min len) コピー should not be stemmed
String[] expected_tokens_katakana = new String[]{"明後日", "パーティ", "に", "行く", "予定", "が", "ある", "図書館", "で", "資料", "を", "コピー", "し", "まし", "た"};
assertSimpleTSOutput(tokenFilter.create(tokenizer), expected_tokens_katakana);
tokenFilter = analysisService.tokenFilter("kuromoji_ks");
assertThat(tokenFilter, instanceOf(KuromojiKatakanaStemmerFactory.class));
tokenizer = new JapaneseTokenizer(null, true, JapaneseTokenizer.Mode.SEARCH);
tokenizer.setReader(new StringReader(source));
// パーティー should not be stemmed since min len == 6
// コピー should not be stemmed
expected_tokens_katakana = new String[]{"明後日", "パーティー", "に", "行く", "予定", "が", "ある", "図書館", "で", "資料", "を", "コピー", "し", "まし", "た"};
assertSimpleTSOutput(tokenFilter.create(tokenizer), expected_tokens_katakana);
}
@Test
public void testIterationMarkCharFilter() throws IOException {
AnalysisService analysisService = createAnalysisService();
// test only kanji
CharFilterFactory charFilterFactory = analysisService.charFilter("kuromoji_im_only_kanji");
assertNotNull(charFilterFactory);
assertThat(charFilterFactory, instanceOf(KuromojiIterationMarkCharFilterFactory.class));
String source = "ところゞゝゝ、ジヾが、時々、馬鹿々々しい";
String expected = "ところゞゝゝ、ジヾが、時時、馬鹿馬鹿しい";
assertCharFilterEquals(charFilterFactory.create(new StringReader(source)), expected);
// test only kana
charFilterFactory = analysisService.charFilter("kuromoji_im_only_kana");
assertNotNull(charFilterFactory);
assertThat(charFilterFactory, instanceOf(KuromojiIterationMarkCharFilterFactory.class));
expected = "ところどころ、ジジが、時々、馬鹿々々しい";
assertCharFilterEquals(charFilterFactory.create(new StringReader(source)), expected);
// test default
charFilterFactory = analysisService.charFilter("kuromoji_im_default");
assertNotNull(charFilterFactory);
assertThat(charFilterFactory, instanceOf(KuromojiIterationMarkCharFilterFactory.class));
expected = "ところどころ、ジジが、時時、馬鹿馬鹿しい";
assertCharFilterEquals(charFilterFactory.create(new StringReader(source)), expected);
}
@Test
public void testJapaneseStopFilterFactory() throws IOException {
AnalysisService analysisService = createAnalysisService();
TokenFilterFactory tokenFilter = analysisService.tokenFilter("ja_stop");
assertThat(tokenFilter, instanceOf(JapaneseStopTokenFilterFactory.class));
String source = "私は制限スピードを超える。";
String[] expected = new String[]{"私", "制限", "超える"};
Tokenizer tokenizer = new JapaneseTokenizer(null, true, JapaneseTokenizer.Mode.SEARCH);
tokenizer.setReader(new StringReader(source));
assertSimpleTSOutput(tokenFilter.create(tokenizer), expected);
}
public AnalysisService createAnalysisService() {
Settings settings = Settings.settingsBuilder()
.put("path.home", createTempDir())
.loadFromClasspath("org/elasticsearch/index/analysis/kuromoji_analysis.json")
.put(IndexMetaData.SETTING_VERSION_CREATED, Version.CURRENT)
.build();
Index index = new Index("test");
Injector parentInjector = new ModulesBuilder().add(new SettingsModule(settings),
new EnvironmentModule(new Environment(settings)),
new IndicesAnalysisModule())
.createInjector();
AnalysisModule analysisModule = new AnalysisModule(settings, parentInjector.getInstance(IndicesAnalysisService.class));
new AnalysisKuromojiPlugin().onModule(analysisModule);
Injector injector = new ModulesBuilder().add(
new IndexSettingsModule(index, settings),
new IndexNameModule(index),
analysisModule)
.createChildInjector(parentInjector);
return injector.getInstance(AnalysisService.class);
}
public static void assertSimpleTSOutput(TokenStream stream,
String[] expected) throws IOException {
stream.reset();
CharTermAttribute termAttr = stream.getAttribute(CharTermAttribute.class);
assertThat(termAttr, notNullValue());
int i = 0;
while (stream.incrementToken()) {
assertThat(expected.length, greaterThan(i));
assertThat( "expected different term at index " + i, expected[i++], equalTo(termAttr.toString()));
}
assertThat("not all tokens produced", i, equalTo(expected.length));
}
private void assertCharFilterEquals(Reader filtered,
String expected) throws IOException {
String actual = readFully(filtered);
assertThat(actual, equalTo(expected));
}
private String readFully(Reader reader) throws IOException {
StringBuilder buffer = new StringBuilder();
int ch;
while((ch = reader.read()) != -1){
buffer.append((char)ch);
}
return buffer.toString();
}
@Test
public void testKuromojiUserDict() throws IOException {
AnalysisService analysisService = createAnalysisService();
TokenizerFactory tokenizerFactory = analysisService.tokenizer("kuromoji_user_dict");
String source = "私は制限スピードを超える。";
String[] expected = new String[]{"私", "は", "制限スピード", "を", "超える"};
Tokenizer tokenizer = tokenizerFactory.create();
tokenizer.setReader(new StringReader(source));
assertSimpleTSOutput(tokenizer, expected);
}
// fix #59
@Test
public void testKuromojiEmptyUserDict() {
AnalysisService analysisService = createAnalysisService();
TokenizerFactory tokenizerFactory = analysisService.tokenizer("kuromoji_empty_user_dict");
assertThat(tokenizerFactory, instanceOf(KuromojiTokenizerFactory.class));
}
}

View File

@ -0,0 +1,90 @@
/*
* Licensed to Elasticsearch under one or more contributor
* license agreements. See the NOTICE file distributed with
* this work for additional information regarding copyright
* ownership. Elasticsearch licenses this file to you under
* the Apache License, Version 2.0 (the "License"); you may
* not use this file except in compliance with the License.
* You may obtain a copy of the License at
*
* http://www.apache.org/licenses/LICENSE-2.0
*
* Unless required by applicable law or agreed to in writing,
* software distributed under the License is distributed on an
* "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
* KIND, either express or implied. See the License for the
* specific language governing permissions and limitations
* under the License.
*/
package org.elasticsearch.index.analysis;
import org.elasticsearch.action.admin.indices.analyze.AnalyzeResponse;
import org.elasticsearch.action.search.SearchResponse;
import org.elasticsearch.common.settings.Settings;
import org.elasticsearch.common.xcontent.XContentBuilder;
import org.elasticsearch.index.query.QueryBuilders;
import org.elasticsearch.plugins.PluginsService;
import org.elasticsearch.test.ElasticsearchIntegrationTest;
import org.junit.Test;
import java.io.IOException;
import java.util.concurrent.ExecutionException;
import static org.elasticsearch.common.xcontent.XContentFactory.jsonBuilder;
import static org.hamcrest.CoreMatchers.is;
import static org.hamcrest.CoreMatchers.notNullValue;
@ElasticsearchIntegrationTest.ClusterScope(scope = ElasticsearchIntegrationTest.Scope.SUITE)
public class KuromojiIntegrationTests extends ElasticsearchIntegrationTest {
@Override
protected Settings nodeSettings(int nodeOrdinal) {
return Settings.builder()
.put(super.nodeSettings(nodeOrdinal))
.put("plugins." + PluginsService.LOAD_PLUGIN_FROM_CLASSPATH, true)
.build();
}
@Test
public void testKuromojiAnalyzer() throws ExecutionException, InterruptedException {
AnalyzeResponse response = client().admin().indices()
.prepareAnalyze("JR新宿駅の近くにビールを飲みに行こうか").setAnalyzer("kuromoji")
.execute().get();
String[] expectedTokens = {"jr", "新宿", "駅", "近く", "ビール", "飲む", "行く"};
assertThat(response, notNullValue());
assertThat(response.getTokens().size(), is(7));
for (int i = 0; i < expectedTokens.length; i++) {
assertThat(response.getTokens().get(i).getTerm(), is(expectedTokens[i]));
}
}
@Test
public void testKuromojiAnalyzerInMapping() throws ExecutionException, InterruptedException, IOException {
createIndex("test");
ensureGreen("test");
final XContentBuilder mapping = jsonBuilder().startObject()
.startObject("type")
.startObject("properties")
.startObject("foo")
.field("type", "string")
.field("analyzer", "kuromoji")
.endObject()
.endObject()
.endObject()
.endObject();
client().admin().indices().preparePutMapping("test").setType("type").setSource(mapping).get();
index("test", "type", "1", "foo", "JR新宿駅の近くにビールを飲みに行こうか");
refresh();
SearchResponse response = client().prepareSearch("test").setQuery(
QueryBuilders.matchQuery("foo", "jr")
).execute().actionGet();
assertThat(response.getHits().getTotalHits(), is(1L));
}
}

View File

@ -0,0 +1,62 @@
{
"index":{
"analysis":{
"filter":{
"kuromoji_rf":{
"type":"kuromoji_readingform",
"use_romaji" : "true"
},
"kuromoji_pos" : {
"type": "kuromoji_part_of_speech",
"stoptags" : ["# verb-main:", "動詞-自立"]
},
"kuromoji_ks" : {
"type": "kuromoji_stemmer",
"minimum_length" : 6
},
"ja_stop" : {
"type": "ja_stop",
"stopwords": ["_japanese_", "スピード"]
}
},
"char_filter":{
"kuromoji_im_only_kanji":{
"type":"kuromoji_iteration_mark",
"normalize_kanji":true,
"normalize_kana":false
},
"kuromoji_im_only_kana":{
"type":"kuromoji_iteration_mark",
"normalize_kanji":false,
"normalize_kana":true
},
"kuromoji_im_default":{
"type":"kuromoji_iteration_mark"
}
},
"tokenizer" : {
"kuromoji" : {
"type":"kuromoji_tokenizer"
},
"kuromoji_empty_user_dict" : {
"type":"kuromoji_tokenizer",
"user_dictionary":"org/elasticsearch/index/analysis/empty_user_dict.txt"
},
"kuromoji_user_dict" : {
"type":"kuromoji_tokenizer",
"user_dictionary":"org/elasticsearch/index/analysis/user_dict.txt"
}
},
"analyzer" : {
"my_analyzer" : {
"type" : "custom",
"tokenizer" : "kuromoji_tokenizer"
}
}
}
}
}

View File

@ -0,0 +1 @@
制限スピード,制限スピード,セイゲンスピード,テスト名詞