
Japanese (kuromoji) Analysis for Elasticsearch
==================================

The Japanese (kuromoji) Analysis plugin integrates the Lucene kuromoji analysis module into Elasticsearch.

In order to install the plugin, run: 

```sh
bin/plugin -install elasticsearch/elasticsearch-analysis-kuromoji/2.4.1
```

You need to install a version matching your Elasticsearch version:

| Elasticsearch |  Kuromoji Analysis Plugin   |   Docs     |
|---------------|-----------------------------|------------|
| master        |  Build from source          | See below  |
| es-1.x        |  Build from source          | [2.5.0-SNAPSHOT](https://github.com/elasticsearch/elasticsearch-analysis-kuromoji/tree/es-1.x/#version-250-snapshot-for-elasticsearch-1x)  |
| es-1.4        |  2.4.1                      | [2.4.1](https://github.com/elasticsearch/elasticsearch-analysis-kuromoji/tree/v2.4.1/#version-241-for-elasticsearch-14)  |
| es-1.3        |  2.3.0                      | [2.3.0](https://github.com/elasticsearch/elasticsearch-analysis-kuromoji/tree/v2.3.0/#japanese-kuromoji-analysis-for-elasticsearch)  |
| es-1.2        |  2.2.0                      | [2.2.0](https://github.com/elasticsearch/elasticsearch-analysis-kuromoji/tree/v2.2.0/#japanese-kuromoji-analysis-for-elasticsearch)  |
| es-1.1        |  2.1.0                      | [2.1.0](https://github.com/elasticsearch/elasticsearch-analysis-kuromoji/tree/v2.1.0/#japanese-kuromoji-analysis-for-elasticsearch)  |
| es-1.0        |  2.0.0                      | [2.0.0](https://github.com/elasticsearch/elasticsearch-analysis-kuromoji/tree/v2.0.0/#japanese-kuromoji-analysis-for-elasticsearch)  |
| es-0.90       |  1.8.0                      | [1.8.0](https://github.com/elasticsearch/elasticsearch-analysis-kuromoji/tree/v1.8.0/#japanese-kuromoji-analysis-for-elasticsearch)  |

To build a `SNAPSHOT` version, you need to build it with Maven:

```bash
mvn clean install
plugin --install analysis-kuromoji \
       --url file:target/releases/elasticsearch-analysis-kuromoji-X.X.X-SNAPSHOT.zip
```

Includes Analyzer, Tokenizer, TokenFilter, CharFilter
-----------------------------------------------

The plugin includes the following analyzer, tokenizer, token filters, and char filter.

| name                    | type        |
|-------------------------|-------------|
| kuromoji_iteration_mark | charfilter  |
| kuromoji                | analyzer    |
| kuromoji_tokenizer      | tokenizer   |
| kuromoji_baseform       | tokenfilter |
| kuromoji_part_of_speech | tokenfilter |
| kuromoji_readingform    | tokenfilter |
| kuromoji_stemmer        | tokenfilter |


Usage
-----

## Analyzer : kuromoji

An analyzer of type `kuromoji`.
This analyzer combines the following tokenizer and token filters:

* `kuromoji_tokenizer` : Kuromoji Tokenizer
* `kuromoji_baseform` : Kuromoji BaseFormFilter (TokenFilter)
* `kuromoji_part_of_speech` : Kuromoji Part of Speech Stop Filter (TokenFilter)
* `cjk_width` : CJK Width Filter (TokenFilter)
* `stop` : Stop Filter (TokenFilter)
* `kuromoji_stemmer` : Kuromoji Katakana Stemmer Filter (TokenFilter)
* `lowercase` : LowerCase Filter (TokenFilter)

## CharFilter : kuromoji_iteration_mark

A charfilter of type `kuromoji_iteration_mark`.
This charfilter normalizes Japanese horizontal iteration marks (odoriji) to their expanded form.

The following are settings that can be set for a `kuromoji_iteration_mark` charfilter type:

| **Setting**     | **Description**                                              | **Default value** |
|:----------------|:-------------------------------------------------------------|:------------------|
| normalize_kanji | indicates whether kanji iteration marks should be normalized | `true`            |
| normalize_kana  | indicates whether kana iteration marks should be normalized  | `true`            |
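
As a rough sketch of how these settings might be applied, the snippet below defines a custom charfilter that expands kanji iteration marks but leaves kana iteration marks untouched. The analyzer name `my_analyzer`, the charfilter name `my_iteration_mark`, and the file name `iteration_mark_settings.json` are placeholders chosen for illustration, and the example assumes a node listening on `localhost:9200` with no existing `kuromoji_sample` index.

```sh
# Write the index settings to a file (the file name is an arbitrary placeholder).
cat > iteration_mark_settings.json <<'EOF'
{
    "settings": {
        "index":{
            "analysis":{
                "analyzer" : {
                    "my_analyzer" : {
                        "tokenizer" : "kuromoji_tokenizer",
                        "char_filter" : ["my_iteration_mark"]
                    }
                },
                "char_filter" : {
                    "my_iteration_mark" : {
                        "type" : "kuromoji_iteration_mark",
                        "normalize_kanji" : true,
                        "normalize_kana" : false
                    }
                }
            }
        }
    }
}
EOF

# Create the index with these settings; assumes Elasticsearch is reachable on localhost:9200.
curl -XPUT 'http://localhost:9200/kuromoji_sample/' -d @iteration_mark_settings.json \
    || echo "Elasticsearch is not reachable on localhost:9200"

# Analyze a word that contains the kanji iteration mark.
curl -XPOST 'http://localhost:9200/kuromoji_sample/_analyze?analyzer=my_analyzer&pretty' -d '時々' \
    || echo "Elasticsearch is not reachable on localhost:9200"
```

With `normalize_kanji` enabled, the `々` in `時々` should be expanded before tokenization, so the tokenizer is expected to see `時時`.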

## Tokenizer : kuromoji_tokenizer

A tokenizer of type `kuromoji_tokenizer`.

The following are settings that can be set for a `kuromoji_tokenizer` tokenizer type:

| **Setting**         | **Description**                                                                                                           | **Default value** |
|:--------------------|:--------------------------------------------------------------------------------------------------------------------------|:------------------|
| mode                | Tokenization mode (`normal`, `search`, or `extended`); determines how compound and unknown words are handled.             | `search`          |
| discard_punctuation | `true` if punctuation tokens should be dropped from the output.                                                           | `true`            |
| user_dictionary     | Path to the User Dictionary file                                                                                          |                   |

### Tokenization mode

Three modes are available:

* `normal` : Ordinary segmentation: no decomposition for compounds

* `search` : Segmentation geared towards search: this includes a decompounding process for long nouns, also including the full compound token as a synonym.

* `extended` : Extended mode outputs unigrams for unknown words.

#### Differences in tokenization mode output

The input texts are `関西国際空港` and `アブラカダブラ`.

| **mode**   | `関西国際空港` | `アブラカダブラ` |
|:-----------|:-------------|:-------|
| `normal`   | `関西国際空港` | `アブラカダブラ` |
| `search`   | `関西` `関西国際空港` `国際` `空港` | `アブラカダブラ` |
| `extended` | `関西` `国際` `空港` | `ア` `ブ` `ラ` `カ` `ダ` `ブ` `ラ` |
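
One way to reproduce this comparison is to define one tokenizer per mode and run the same input through each. This is only a sketch: the index name `kuromoji_modes`, the tokenizer and analyzer names, and the file name `kuromoji_modes_settings.json` are placeholders, and a node on `localhost:9200` is assumed.

```sh
# Index, tokenizer, analyzer, and file names below are placeholders for illustration.
cat > kuromoji_modes_settings.json <<'EOF'
{
    "settings": {
        "index":{
            "analysis":{
                "tokenizer" : {
                    "tokenizer_normal" : {
                        "type" : "kuromoji_tokenizer",
                        "mode" : "normal"
                    },
                    "tokenizer_search" : {
                        "type" : "kuromoji_tokenizer",
                        "mode" : "search"
                    },
                    "tokenizer_extended" : {
                        "type" : "kuromoji_tokenizer",
                        "mode" : "extended"
                    }
                },
                "analyzer" : {
                    "analyzer_normal" : {
                        "type" : "custom",
                        "tokenizer" : "tokenizer_normal"
                    },
                    "analyzer_search" : {
                        "type" : "custom",
                        "tokenizer" : "tokenizer_search"
                    },
                    "analyzer_extended" : {
                        "type" : "custom",
                        "tokenizer" : "tokenizer_extended"
                    }
                }
            }
        }
    }
}
EOF

# Create the index; assumes Elasticsearch is reachable on localhost:9200.
curl -XPUT 'http://localhost:9200/kuromoji_modes/' -d @kuromoji_modes_settings.json \
    || echo "Elasticsearch is not reachable on localhost:9200"

# Run the same input through each mode to compare the resulting tokens.
for mode in normal search extended; do
    curl -XPOST "http://localhost:9200/kuromoji_modes/_analyze?analyzer=analyzer_${mode}&pretty" -d '関西国際空港' \
        || echo "Elasticsearch is not reachable on localhost:9200"
done
```

Swapping the request body for `アブラカダブラ` exercises the unknown-word case from the second column of the table.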

### User Dictionary

The Kuromoji tokenizer uses the MeCab-IPADIC dictionary by default.
You can add your own entries to it through a User Dictionary.
User Dictionary entries are defined using the following CSV format:

```
<text>,<token 1> ... <token n>,<reading 1> ... <reading n>,<part-of-speech tag>
```

Dictionary Example

```
東京スカイツリー,東京 スカイツリー,トウキョウ スカイツリー,カスタム名詞
```

To use a User Dictionary, set the `user_dictionary` attribute to the file path.
The User Dictionary file is placed in the `ES_HOME/config` directory.
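
A minimal sketch of creating such a file from the shell follows. It assumes an `ES_HOME` environment variable pointing at the Elasticsearch installation (falling back to the current directory if unset) and reuses the entry from the dictionary example above; the field-count check is just a quick sanity test, not something the plugin provides.

```sh
# Assumption: ES_HOME points at the Elasticsearch installation directory.
# If it is unset, the current directory is used so the sketch still runs.
CONFIG_DIR="${ES_HOME:-.}/config"
mkdir -p "$CONFIG_DIR"

# One entry per line: <text>,<tokens>,<readings>,<part-of-speech tag>
cat > "$CONFIG_DIR/userdict_ja.txt" <<'EOF'
東京スカイツリー,東京 スカイツリー,トウキョウ スカイツリー,カスタム名詞
EOF

# Collect the numbers of lines that do not have exactly four comma-separated fields.
BAD_LINES=$(awk -F',' 'NF != 4 { print NR }' "$CONFIG_DIR/userdict_ja.txt")
if [ -z "$BAD_LINES" ]; then
    echo "userdict_ja.txt: all entries have 4 fields"
else
    echo "userdict_ja.txt: malformed entries on line(s): $BAD_LINES"
fi
```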

### example

_Example Settings:_

```sh
curl -XPUT 'http://localhost:9200/kuromoji_sample/' -d'
{
    "settings": {
        "index":{
            "analysis":{
                "tokenizer" : {
                    "kuromoji_user_dict" : {
                       "type" : "kuromoji_tokenizer",
                       "mode" : "extended",
                       "discard_punctuation" : "false",
                       "user_dictionary" : "userdict_ja.txt"
                    }
                },
                "analyzer" : {
                    "my_analyzer" : {
                        "type" : "custom",
                        "tokenizer" : "kuromoji_user_dict"
                    }
                }

            }
        }
    }
}
'
```

_Example Request using `_analyze` API :_

```sh
curl -XPOST 'http://localhost:9200/kuromoji_sample/_analyze?analyzer=my_analyzer&pretty' -d '東京スカイツリー'
```

_Response :_

```json
{
  "tokens" : [ {
    "token" : "東京",
    "start_offset" : 0,
    "end_offset" : 2,
    "type" : "word",
    "position" : 1
  }, {
    "token" : "スカイツリー",
    "start_offset" : 2,
    "end_offset" : 8,
    "type" : "word",
    "position" : 2
  } ]
}
```

## TokenFilter : kuromoji_baseform

A token filter of type `kuromoji_baseform` that replaces term text with BaseFormAttribute.
This acts as a lemmatizer for verbs and adjectives.

### example

_Example Settings:_

```sh
curl -XPUT 'http://localhost:9200/kuromoji_sample/' -d'
{
    "settings": {
        "index":{
            "analysis":{
                "analyzer" : {
                    "my_analyzer" : {
                        "tokenizer" : "kuromoji_tokenizer",
                        "filter" : ["kuromoji_baseform"]
                    }
                }
            }
        }
    }
}
'
```

_Example Request using `_analyze` API :_

```sh
curl -XPOST 'http://localhost:9200/kuromoji_sample/_analyze?analyzer=my_analyzer&pretty' -d '飲み'
```

_Response :_

```json
{
  "tokens" : [ {
    "token" : "飲む",
    "start_offset" : 0,
    "end_offset" : 2,
    "type" : "word",
    "position" : 1
  } ]
}
```

## TokenFilter : kuromoji_part_of_speech

A token filter of type `kuromoji_part_of_speech` that removes tokens that match a set of part-of-speech tags.

The following are settings that can be set for a `kuromoji_part_of_speech` token filter type:

| **Setting** | **Description**                                      |
|:------------|:-----------------------------------------------------|
| stoptags    | A list of part-of-speech tags that should be removed |

Note that the default is the `stoptags.txt` file included in lucene-analyzer-kuromoji.jar.

### example

_Example Settings:_

```sh
curl -XPUT 'http://localhost:9200/kuromoji_sample/' -d'
{
    "settings": {
        "index":{
            "analysis":{
                "analyzer" : {
                    "my_analyzer" : {
                        "tokenizer" : "kuromoji_tokenizer",
                        "filter" : ["my_posfilter"]
                    }
                },
                "filter" : {
                    "my_posfilter" : {
                        "type" : "kuromoji_part_of_speech",
                        "stoptags" : [
                            "助詞-格助詞-一般",
                            "助詞-終助詞"
                        ]
                    }
                }
            }
        }
    }
}
'
```

_Example Request using `_analyze` API :_

```sh
curl -XPOST 'http://localhost:9200/kuromoji_sample/_analyze?analyzer=my_analyzer&pretty' -d '寿司がおいしいね'
```

_Response :_

```json
{
  "tokens" : [ {
    "token" : "寿司",
    "start_offset" : 0,
    "end_offset" : 2,
    "type" : "word",
    "position" : 1
  }, {
    "token" : "おいしい",
    "start_offset" : 3,
    "end_offset" : 7,
    "type" : "word",
    "position" : 3
  } ]
}
```

## TokenFilter : kuromoji_readingform

A token filter of type `kuromoji_readingform` that replaces the term attribute with the reading of a token in either katakana or romaji form.
The default reading form is katakana.

The following are settings that can be set for a `kuromoji_readingform` token filter type:

| **Setting** | **Description**                                           | **Default value** |
|:------------|:----------------------------------------------------------|:------------------|
| use_romaji  | `true` to output the romaji reading form instead of katakana. | `false`       |

Note that the built-in `kuromoji_readingform` filter provided by elasticsearch-analysis-kuromoji sets the `use_romaji` attribute to `true` by default.

### example

_Example Settings:_

```sh
curl -XPUT 'http://localhost:9200/kuromoji_sample/' -d'
{
    "settings": {
        "index":{
            "analysis":{
                "analyzer" : {
                    "romaji_analyzer" : {
                        "tokenizer" : "kuromoji_tokenizer",
                        "filter" : ["romaji_readingform"]
                    },
                    "katakana_analyzer" : {
                        "tokenizer" : "kuromoji_tokenizer",
                        "filter" : ["katakana_readingform"]
                    }
                },
                "filter" : {
                    "romaji_readingform" : {
                        "type" : "kuromoji_readingform",
                        "use_romaji" : true
                    },
                    "katakana_readingform" : {
                        "type" : "kuromoji_readingform",
                        "use_romaji" : false
                    }
                }
            }
        }
    }
}
'
```

_Example Request using `_analyze` API :_

```sh
curl -XPOST 'http://localhost:9200/kuromoji_sample/_analyze?analyzer=katakana_analyzer&pretty' -d '寿司'
```

_Response :_

```json
{
  "tokens" : [ {
    "token" : "スシ",
    "start_offset" : 0,
    "end_offset" : 2,
    "type" : "word",
    "position" : 1
  } ]
}
```

_Example Request using `_analyze` API :_

```sh
curl -XPOST 'http://localhost:9200/kuromoji_sample/_analyze?analyzer=romaji_analyzer&pretty' -d '寿司'
```

_Response :_

```json
{
  "tokens" : [ {
    "token" : "sushi",
    "start_offset" : 0,
    "end_offset" : 2,
    "type" : "word",
    "position" : 1
  } ]
}
```

## TokenFilter : kuromoji_stemmer

A token filter of type `kuromoji_stemmer` that normalizes common katakana spelling variations ending in a long sound character by removing this character (U+30FC).
Only katakana words of at least the minimum length are stemmed (default is four).

Note that only full-width katakana characters are supported.

The following are settings that can be set for a `kuromoji_stemmer` token filter type:

| **Setting**     | **Description**            | **Default value** |
|:----------------|:---------------------------|:------------------|
| minimum_length  | The minimum length to stem | `4`               |

### example

_Example Settings:_

```sh
curl -XPUT 'http://localhost:9200/kuromoji_sample/' -d'
{
    "settings": {
        "index":{
            "analysis":{
                "analyzer" : {
                    "my_analyzer" : {
                        "tokenizer" : "kuromoji_tokenizer",
                        "filter" : ["my_katakana_stemmer"]
                    }
                },
                "filter" : {
                    "my_katakana_stemmer" : {
                        "type" : "kuromoji_stemmer",
                        "minimum_length" : 4
                    }
                }
            }
        }
    }
}
'
```

_Example Request using `_analyze` API :_

```sh
curl -XPOST 'http://localhost:9200/kuromoji_sample/_analyze?analyzer=my_analyzer&pretty' -d 'コピー'
```

_Response :_

```json
{
  "tokens" : [ {
    "token" : "コピー",
    "start_offset" : 0,
    "end_offset" : 3,
    "type" : "word",
    "position" : 1
  } ]
}
```

_Example Request using `_analyze` API :_

```sh
curl -XPOST 'http://localhost:9200/kuromoji_sample/_analyze?analyzer=my_analyzer&pretty' -d 'サーバー'
```

_Response :_

```json
{
  "tokens" : [ {
    "token" : "サーバ",
    "start_offset" : 0,
    "end_offset" : 4,
    "type" : "word",
    "position" : 1
  } ]
}
```


License
-------

    This software is licensed under the Apache 2 license, quoted below.

    Copyright 2009-2014 Elasticsearch <http://www.elasticsearch.org>

    Licensed under the Apache License, Version 2.0 (the "License"); you may not
    use this file except in compliance with the License. You may obtain a copy of
    the License at

        http://www.apache.org/licenses/LICENSE-2.0

    Unless required by applicable law or agreed to in writing, software
    distributed under the License is distributed on an "AS IS" BASIS, WITHOUT
    WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the
    License for the specific language governing permissions and limitations under
    the License.