
[[analysis-kuromoji]]
=== Japanese (kuromoji) Analysis Plugin

The Japanese (kuromoji) Analysis plugin integrates the Lucene kuromoji analysis
module into {es}.

:plugin_name: analysis-kuromoji
include::install_remove.asciidoc[]

[[analysis-kuromoji-analyzer]]
==== `kuromoji` analyzer

The `kuromoji` analyzer consists of the following tokenizer and token filters:

* <<analysis-kuromoji-tokenizer,`kuromoji_tokenizer`>>
* <<analysis-kuromoji-baseform,`kuromoji_baseform`>> token filter
* <<analysis-kuromoji-speech,`kuromoji_part_of_speech`>> token filter
* {ref}/analysis-cjk-width-tokenfilter.html[`cjk_width`] token filter
* <<analysis-kuromoji-stop,`ja_stop`>> token filter
* <<analysis-kuromoji-stemmer,`kuromoji_stemmer`>> token filter
* {ref}/analysis-lowercase-tokenfilter.html[`lowercase`] token filter

It supports the `mode` and `user_dictionary` settings from
<<analysis-kuromoji-tokenizer,`kuromoji_tokenizer`>>.
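
As a minimal sketch of how these settings might be applied, the following
request configures the built-in `kuromoji` analyzer with `search` mode. The
index name `kuromoji_mode_sample` and the analyzer name `my_kuromoji` are
illustrative placeholders, not names used elsewhere in this documentation:

[source,console]
--------------------------------------------------
PUT kuromoji_mode_sample
{
  "settings": {
    "index": {
      "analysis": {
        "analyzer": {
          "my_kuromoji": {
            "type": "kuromoji",
            "mode": "search"
          }
        }
      }
    }
  }
}
--------------------------------------------------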

[discrete]
[[kuromoji-analyzer-normalize-full-width-characters]]
==== Normalize full-width characters

The `kuromoji_tokenizer` tokenizer uses characters from the MeCab-IPADIC
dictionary to split text into tokens. The dictionary includes some full-width
characters, such as `ｏ` and `ｆ`. If a text contains full-width characters,
the tokenizer can produce unexpected tokens.

For example, the `kuromoji_tokenizer` tokenizer converts the text
`Ｃｕｌｔｕｒｅ　ｏｆ　Ｊａｐａｎ` to the tokens `[ culture, o, f, japan ]` by
default. However, a user may expect the tokenizer to instead produce 
`[ culture, of, japan ]`.

To avoid this, add the <<analysis-icu-normalization-charfilter,`icu_normalizer`
character filter>> to a custom analyzer based on the `kuromoji` analyzer. The
`icu_normalizer` character filter converts full-width characters to their normal
equivalents.

First, duplicate the `kuromoji` analyzer to create the basis for a custom
analyzer. Then add the `icu_normalizer` character filter to the custom analyzer.
For example:

[source,console]
----
PUT index-00001
{
  "settings": {
    "index": {
      "analysis": {
        "analyzer": {
          "kuromoji_normalize": {                 <1>
            "char_filter": [
              "icu_normalizer"                    <2>
            ],
            "tokenizer": "kuromoji_tokenizer",
            "filter": [
              "kuromoji_baseform",
              "kuromoji_part_of_speech",
              "cjk_width",
              "ja_stop",
              "kuromoji_stemmer",
              "lowercase"
            ]
          }
        }
      }
    }
  }
}
----
<1> Creates a new custom analyzer, `kuromoji_normalize`, based on the `kuromoji`
analyzer.
<2> Adds the `icu_normalizer` character filter to the analyzer.
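
To check the custom analyzer, you could send the full-width text from the
example above to the `_analyze` API. This sketch assumes that the `index-00001`
index was created with the settings shown above and that the `analysis-icu`
plugin, which provides `icu_normalizer`, is installed. If normalization works
as described, the response should contain the tokens `culture`, `of`, and
`japan`:

[source,console]
--------------------------------------------------
GET index-00001/_analyze
{
  "analyzer": "kuromoji_normalize",
  "text": "Ｃｕｌｔｕｒｅ　ｏｆ　Ｊａｐａｎ"
}
--------------------------------------------------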


[[analysis-kuromoji-charfilter]]
==== `kuromoji_iteration_mark` character filter

The `kuromoji_iteration_mark` character filter normalizes Japanese horizontal iteration marks
(_odoriji_) to their expanded form. It accepts the following settings:

`normalize_kanji`::

    Indicates whether kanji iteration marks should be normalized. Defaults to `true`.

`normalize_kana`::

    Indicates whether kana iteration marks should be normalized. Defaults to `true`.
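
The following sketch shows one way these settings might be combined in a
custom character filter. The index name `kuromoji_iteration_sample`, the filter
name `my_iteration_mark`, and the analyzer name `my_analyzer` are hypothetical
placeholders. In this configuration, kanji iteration marks are normalized and
kana iteration marks are left as they are:

[source,console]
--------------------------------------------------
PUT kuromoji_iteration_sample
{
  "settings": {
    "index": {
      "analysis": {
        "char_filter": {
          "my_iteration_mark": {
            "type": "kuromoji_iteration_mark",
            "normalize_kanji": true,
            "normalize_kana": false
          }
        },
        "analyzer": {
          "my_analyzer": {
            "char_filter": [
              "my_iteration_mark"
            ],
            "tokenizer": "kuromoji_tokenizer"
          }
        }
      }
    }
  }
}
--------------------------------------------------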


[[analysis-kuromoji-tokenizer]]
==== `kuromoji_tokenizer`

The `kuromoji_tokenizer` accepts the following settings:

`mode`::
+
--

The tokenization mode determines how the tokenizer handles compound and
unknown words.  It can be set to:

`normal`::

    Normal segmentation, no decomposition for compounds. Example output:

    関西国際空港
    アブラカダブラ

`search`::

    Segmentation geared towards search. This includes a decompounding process
    for long nouns, also including the full compound token as a synonym.
    Example output:

    関西, 関西国際空港, 国際, 空港
    アブラカダブラ

`extended`::

    Extended mode outputs unigrams for unknown words. Example output:

    関西, 関西国際空港, 国際, 空港
    ア, ブ, ラ, カ, ダ, ブ, ラ
--
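
To compare the modes on your own text, one option is to pass an inline
tokenizer definition to the `_analyze` API. The following sketch assumes the
plugin is installed and reuses the example text from above. Change `mode` to
`normal` or `extended` to see the other segmentations:

[source,console]
--------------------------------------------------
GET _analyze
{
  "tokenizer": {
    "type": "kuromoji_tokenizer",
    "mode": "search"
  },
  "text": "関西国際空港"
}
--------------------------------------------------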

`discard_punctuation`::

    Whether punctuation should be discarded from the output. Defaults to `true`.

`user_dictionary`::
+
--
The Kuromoji tokenizer uses the MeCab-IPADIC dictionary by default. A `user_dictionary`
may be appended to the default dictionary. The dictionary should have the following CSV format:

[source,csv]
-----------------------
<text>,<token 1> ... <token n>,<reading 1> ... <reading n>,<part-of-speech tag>
-----------------------
--

As a demonstration of how the user dictionary can be used, save the following
dictionary to `$ES_HOME/config/userdict_ja.txt`:

[source,csv]
-----------------------
東京スカイツリー,東京 スカイツリー,トウキョウ スカイツリー,カスタム名詞
-----------------------

--

You can also inline the rules directly in the tokenizer definition using
the `user_dictionary_rules` option:

[source,console]
--------------------------------------------------
PUT kuromoji_sample
{
  "settings": {
    "index": {
      "analysis": {
        "tokenizer": {
          "kuromoji_user_dict": {
            "type": "kuromoji_tokenizer",
            "mode": "extended",
            "user_dictionary_rules": ["東京スカイツリー,東京 スカイツリー,トウキョウ スカイツリー,カスタム名詞"]
          }
        },
        "analyzer": {
          "my_analyzer": {
            "type": "custom",
            "tokenizer": "kuromoji_user_dict"
          }
        }
      }
    }
  }
}
--------------------------------------------------
--

`nbest_cost`/`nbest_examples`::
+
--
The expert parameters `nbest_cost` and `nbest_examples` can be used
to include additional tokens that are most likely according to the statistical model.
If both parameters are used, the larger of the two resulting cost values is applied.

`nbest_cost`::

    The `nbest_cost` parameter specifies an additional Viterbi cost.
    The KuromojiTokenizer will include all tokens in Viterbi paths that are
    within the `nbest_cost` value of the best path.

`nbest_examples`::

    The `nbest_examples` parameter can be used to find an `nbest_cost` value based on examples.
    For example, a value of `/箱根山-箱根/成田空港-成田/` indicates that for the texts
    箱根山 (Mt. Hakone) and 成田空港 (Narita Airport), we'd like a cost that gives us
    箱根 (Hakone) and 成田 (Narita).
--
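
As an illustration only, the following sketch defines a tokenizer that derives
its cost from the example value shown above. The index name
`kuromoji_nbest_sample`, the tokenizer name `kuromoji_nbest`, and the analyzer
name `my_nbest_analyzer` are hypothetical placeholders:

[source,console]
--------------------------------------------------
PUT kuromoji_nbest_sample
{
  "settings": {
    "index": {
      "analysis": {
        "tokenizer": {
          "kuromoji_nbest": {
            "type": "kuromoji_tokenizer",
            "nbest_examples": "/箱根山-箱根/成田空港-成田/"
          }
        },
        "analyzer": {
          "my_nbest_analyzer": {
            "type": "custom",
            "tokenizer": "kuromoji_nbest"
          }
        }
      }
    }
  }
}
--------------------------------------------------

To set a fixed cost instead, replace `nbest_examples` with an integer
`nbest_cost` value.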


To use the `userdict_ja.txt` user dictionary file described above, create an analyzer as follows:

[source,console]
--------------------------------------------------
PUT kuromoji_sample
{
  "settings": {
    "index": {
      "analysis": {
        "tokenizer": {
          "kuromoji_user_dict": {
            "type": "kuromoji_tokenizer",
            "mode": "extended",
            "discard_punctuation": "false",
            "user_dictionary": "userdict_ja.txt"
          }
        },
        "analyzer": {
          "my_analyzer": {
            "type": "custom",
            "tokenizer": "kuromoji_user_dict"
          }
        }
      }
    }
  }
}

GET kuromoji_sample/_analyze
{
  "analyzer": "my_analyzer",
  "text": "東京スカイツリー"
}
--------------------------------------------------

The above `analyze` request returns the following:

[source,console-result]
--------------------------------------------------
{
  "tokens" : [ {
    "token" : "東京",
    "start_offset" : 0,
    "end_offset" : 2,
    "type" : "word",
    "position" : 0
  }, {
    "token" : "スカイツリー",
    "start_offset" : 2,
    "end_offset" : 8,
    "type" : "word",
    "position" : 1
  } ]
}
--------------------------------------------------

`discard_compound_token`::
    Whether the original compound tokens should be discarded from the output in `search` mode. Defaults to `false`.
    Example output with `search` or `extended` mode and this option `true`:

    関西, 国際, 空港
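
A tokenizer with this option enabled might be defined as in the following
sketch. The index name `kuromoji_discard_sample`, the tokenizer name
`kuromoji_discard_compound`, and the analyzer name `my_analyzer` are
hypothetical placeholders:

[source,console]
--------------------------------------------------
PUT kuromoji_discard_sample
{
  "settings": {
    "index": {
      "analysis": {
        "tokenizer": {
          "kuromoji_discard_compound": {
            "type": "kuromoji_tokenizer",
            "mode": "search",
            "discard_compound_token": true
          }
        },
        "analyzer": {
          "my_analyzer": {
            "type": "custom",
            "tokenizer": "kuromoji_discard_compound"
          }
        }
      }
    }
  }
}
--------------------------------------------------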

NOTE: If a text contains full-width characters, the `kuromoji_tokenizer`
tokenizer can produce unexpected tokens. To avoid this, add the
<<analysis-icu-normalization-charfilter,`icu_normalizer` character filter>> to
your analyzer. See <<kuromoji-analyzer-normalize-full-width-characters>>.


[[analysis-kuromoji-baseform]]
==== `kuromoji_baseform` token filter

The `kuromoji_baseform` token filter replaces terms with their
BaseFormAttribute. This acts as a lemmatizer for verbs and adjectives. Example:

[source,console]
--------------------------------------------------
PUT kuromoji_sample
{
  "settings": {
    "index": {
      "analysis": {
        "analyzer": {
          "my_analyzer": {
            "tokenizer": "kuromoji_tokenizer",
            "filter": [
              "kuromoji_baseform"
            ]
          }
        }
      }
    }
  }
}

GET kuromoji_sample/_analyze
{
  "analyzer": "my_analyzer",
  "text": "飲み"
}
--------------------------------------------------

Which responds with:

[source,console-result]
--------------------------------------------------
{
  "tokens" : [ {
    "token" : "飲む",
    "start_offset" : 0,
    "end_offset" : 2,
    "type" : "word",
    "position" : 0
  } ]
}
--------------------------------------------------


[[analysis-kuromoji-speech]]
==== `kuromoji_part_of_speech` token filter

The `kuromoji_part_of_speech` token filter removes tokens that match a set of
part-of-speech tags. It accepts the following setting:

`stoptags`::

    An array of part-of-speech tags that should be removed. It defaults to the
    `stoptags.txt` file embedded in the `lucene-analyzer-kuromoji.jar`.

For example:

[source,console]
--------------------------------------------------
PUT kuromoji_sample
{
  "settings": {
    "index": {
      "analysis": {
        "analyzer": {
          "my_analyzer": {
            "tokenizer": "kuromoji_tokenizer",
            "filter": [
              "my_posfilter"
            ]
          }
        },
        "filter": {
          "my_posfilter": {
            "type": "kuromoji_part_of_speech",
            "stoptags": [
              "助詞-格助詞-一般",
              "助詞-終助詞"
            ]
          }
        }
      }
    }
  }
}

GET kuromoji_sample/_analyze
{
  "analyzer": "my_analyzer",
  "text": "寿司がおいしいね"
}
--------------------------------------------------

Which responds with:

[source,console-result]
--------------------------------------------------
{
  "tokens" : [ {
    "token" : "寿司",
    "start_offset" : 0,
    "end_offset" : 2,
    "type" : "word",
    "position" : 0
  }, {
    "token" : "おいしい",
    "start_offset" : 3,
    "end_offset" : 7,
    "type" : "word",
    "position" : 2
  } ]
}
--------------------------------------------------


[[analysis-kuromoji-readingform]]
==== `kuromoji_readingform` token filter

The `kuromoji_readingform` token filter replaces the token with its reading
form in either katakana or romaji. It accepts the following setting:

`use_romaji`::

    Whether romaji reading form should be output instead of katakana.  Defaults to `false`.

When using the pre-defined `kuromoji_readingform` filter, `use_romaji` is set
to `true`. The default when defining a custom `kuromoji_readingform`, however,
is `false`.  The only reason to use the custom form is if you need the
katakana reading form:

[source,console]
--------------------------------------------------
PUT kuromoji_sample
{
  "settings": {
    "index": {
      "analysis": {
        "analyzer": {
          "romaji_analyzer": {
            "tokenizer": "kuromoji_tokenizer",
            "filter": [ "romaji_readingform" ]
          },
          "katakana_analyzer": {
            "tokenizer": "kuromoji_tokenizer",
            "filter": [ "katakana_readingform" ]
          }
        },
        "filter": {
          "romaji_readingform": {
            "type": "kuromoji_readingform",
            "use_romaji": true
          },
          "katakana_readingform": {
            "type": "kuromoji_readingform",
            "use_romaji": false
          }
        }
      }
    }
  }
}

GET kuromoji_sample/_analyze
{
  "analyzer": "katakana_analyzer",
  "text": "寿司" <1>
}

GET kuromoji_sample/_analyze
{
  "analyzer": "romaji_analyzer",
  "text": "寿司" <2>
}
--------------------------------------------------

<1> Returns `スシ`.
<2> Returns `sushi`.

[[analysis-kuromoji-stemmer]]
==== `kuromoji_stemmer` token filter

The `kuromoji_stemmer` token filter normalizes common katakana spelling
variations ending in a long sound character by removing this character
(U+30FC). Only full-width katakana characters are supported.

This token filter accepts the following setting:

`minimum_length`::

    Katakana words shorter than the `minimum_length` are not stemmed (default
    is `4`).


[source,console]
--------------------------------------------------
PUT kuromoji_sample
{
  "settings": {
    "index": {
      "analysis": {
        "analyzer": {
          "my_analyzer": {
            "tokenizer": "kuromoji_tokenizer",
            "filter": [
              "my_katakana_stemmer"
            ]
          }
        },
        "filter": {
          "my_katakana_stemmer": {
            "type": "kuromoji_stemmer",
            "minimum_length": 4
          }
        }
      }
    }
  }
}

GET kuromoji_sample/_analyze
{
  "analyzer": "my_analyzer",
  "text": "コピー" <1>
}

GET kuromoji_sample/_analyze
{
  "analyzer": "my_analyzer",
  "text": "サーバー" <2>
}
--------------------------------------------------

<1> Returns `コピー`.
<2> Returns `サーバ`.


[[analysis-kuromoji-stop]]
==== `ja_stop` token filter

The `ja_stop` token filter filters out Japanese stopwords (`_japanese_`), and
any other custom stopwords specified by the user. This filter only supports
the predefined `_japanese_` stopwords list.  If you want to use a different
predefined list, then use the
{ref}/analysis-stop-tokenfilter.html[`stop` token filter] instead.

[source,console]
--------------------------------------------------
PUT kuromoji_sample
{
  "settings": {
    "index": {
      "analysis": {
        "analyzer": {
          "analyzer_with_ja_stop": {
            "tokenizer": "kuromoji_tokenizer",
            "filter": [
              "ja_stop"
            ]
          }
        },
        "filter": {
          "ja_stop": {
            "type": "ja_stop",
            "stopwords": [
              "_japanese_",
              "ストップ"
            ]
          }
        }
      }
    }
  }
}

GET kuromoji_sample/_analyze
{
  "analyzer": "analyzer_with_ja_stop",
  "text": "ストップは消える"
}
--------------------------------------------------

The above request returns:

[source,console-result]
--------------------------------------------------
{
  "tokens" : [ {
    "token" : "消える",
    "start_offset" : 5,
    "end_offset" : 8,
    "type" : "word",
    "position" : 2
  } ]
}
--------------------------------------------------


[[analysis-kuromoji-number]]
==== `kuromoji_number` token filter

The `kuromoji_number` token filter normalizes Japanese numbers (kansūji)
to regular Arabic decimal numbers in half-width characters. For example:

[source,console]
--------------------------------------------------
PUT kuromoji_sample
{
  "settings": {
    "index": {
      "analysis": {
        "analyzer": {
          "my_analyzer": {
            "tokenizer": "kuromoji_tokenizer",
            "filter": [
              "kuromoji_number"
            ]
          }
        }
      }
    }
  }
}

GET kuromoji_sample/_analyze
{
  "analyzer": "my_analyzer",
  "text": "一〇〇〇"
}
--------------------------------------------------

Which results in:

[source,console-result]
--------------------------------------------------
{
  "tokens" : [ {
    "token" : "1000",
    "start_offset" : 0,
    "end_offset" : 4,
    "type" : "word",
    "position" : 0
  } ]
}
--------------------------------------------------
-												Docs: Prepare plugin and integration docs for 2.0

* Centralised plugin docs in docs/plugins/
* Moved integrations into same docs
* Moved community clients into the clients section of the docs
* Removed docs/community

Closes #11734
Closes #11724
Closes #11636
Closes #11635
Closes #11632
Closes #11630
Closes #12046
Closes #12438
Closes #12579

											
										
										
											2015-08-15 12:00:55 -04:00
+								[[analysis-kuromoji]]
 								=== Japanese (kuromoji) Analysis Plugin
 								The Japanese (kuromoji) Analysis plugin integrates Lucene kuromoji analysis
-												[DOCS] Add full-width char section to kuromoji analyzer docs (#60317) (#60324)


											
										
										
											2020-07-28 14:17:23 -04:00
+								module into {es}.
-												Docs: Prepare plugin and integration docs for 2.0

* Centralised plugin docs in docs/plugins/
* Moved integrations into same docs
* Moved community clients into the clients section of the docs
* Removed docs/community

Closes #11734
Closes #11724
Closes #11636
Closes #11635
Closes #11632
Closes #11630
Closes #12046
Closes #12438
Closes #12579

											
										
										
											2015-08-15 12:00:55 -04:00
-												Added "release-state" support to plugin docs

											
										
										
											2017-04-20 09:01:37 -04:00
+								:plugin_name: analysis-kuromoji
 								include::install_remove.asciidoc[]
-												Docs: Prepare plugin and integration docs for 2.0

* Centralised plugin docs in docs/plugins/
* Moved integrations into same docs
* Moved community clients into the clients section of the docs
* Removed docs/community

Closes #11734
Closes #11724
Closes #11636
Closes #11635
Closes #11632
Closes #11630
Closes #12046
Closes #12438
Closes #12579

											
										
										
											2015-08-15 12:00:55 -04:00
 								[[analysis-kuromoji-analyzer]]
 								==== `kuromoji` analyzer
 								The `kuromoji` analyzer consists of the following tokenizer and token filters:
 								* <<analysis-kuromoji-tokenizer,`kuromoji_tokenizer`>>
 								* <<analysis-kuromoji-baseform,`kuromoji_baseform`>> token filter
 								* <<analysis-kuromoji-speech,`kuromoji_part_of_speech`>> token filter
 								* {ref}/analysis-cjk-width-tokenfilter.html[`cjk_width`] token filter
 								* <<analysis-kuromoji-stop,`ja_stop`>> token filter
 								* <<analysis-kuromoji-stemmer,`kuromoji_stemmer`>> token filter
 								* {ref}/analysis-lowercase-tokenfilter.html[`lowercase`] token filter
 								It supports the `mode` and `user_dictionary` settings from
 								<<analysis-kuromoji-tokenizer,`kuromoji_tokenizer`>>.
-												[DOCS] Add full-width char section to kuromoji analyzer docs (#60317) (#60324)


											
										
										
											2020-07-28 14:17:23 -04:00
+								[discrete]
 								[[kuromoji-analyzer-normalize-full-width-characters]]
 								==== Normalize full-width characters
 								The `kuromoji_tokenizer` tokenizer uses characters from the MeCab-IPADIC
 								dictionary to split text into tokens. The dictionary includes some full-width
 								characters, such as `ｏ` and `ｆ`. If a text contains full-width characters,
 								the tokenizer can produce unexpected tokens.
 								For example, the `kuromoji_tokenizer` tokenizer converts the text
 								`Ｃｕｌｔｕｒｅ　ｏｆ　Ｊａｐａｎ` to the tokens `[ culture, o, f, japan ]` by
 								default. However, a user may expect the tokenizer to instead produce
 								`[ culture, of, japan ]`.
 								To avoid this, add the <<analysis-icu-normalization-charfilter,`icu_normalizer`
 								character filter>> to a custom analyzer based on the `kuromoji` analyzer. The
 								`icu_normalizer` character filter converts full-width characters to their normal
 								equivalents.
 								First, duplicate the `kuromoji` analyzer to create the basis for a custom
 								analyzer. Then add the `icu_normalizer` character filter to the custom analyzer.
 								For example:
 								[source,console]
 								----
 								PUT index-00001
 								{
 								  "settings": {
 								    "index": {
 								      "analysis": {
 								        "analyzer": {
 								          "kuromoji_normalize": {                 <1>
 								            "char_filter": [
 								              "icu_normalizer"                    <2>
 								            ],
 								            "tokenizer": "kuromoji_tokenizer",
 								            "filter": [
 								              "kuromoji_baseform",
 								              "kuromoji_part_of_speech",
 								              "cjk_width",
 								              "ja_stop",
 								              "kuromoji_stemmer",
 								              "lowercase"
 								            ]
 								          }
 								        }
 								      }
 								    }
 								  }
 								}
 								----
 								<1> Creates a new custom analyzer, `kuromoji_normalize`, based on the `kuromoji`
 								analyzer.
 								<2> Adds the `icu_normalizer` character filter to the analyzer.
-												Docs: Prepare plugin and integration docs for 2.0

* Centralised plugin docs in docs/plugins/
* Moved integrations into same docs
* Moved community clients into the clients section of the docs
* Removed docs/community

Closes #11734
Closes #11724
Closes #11636
Closes #11635
Closes #11632
Closes #11630
Closes #12046
Closes #12438
Closes #12579

											
										
										
											2015-08-15 12:00:55 -04:00
+								[[analysis-kuromoji-charfilter]]
 								==== `kuromoji_iteration_mark` character filter
 								The `kuromoji_iteration_mark` normalizes Japanese horizontal iteration marks
 								(_odoriji_) to their expanded form. It accepts the following settings:
 								`normalize_kanji`::
 								    Indicates whether kanji iteration marks should be normalize. Defaults to `true`.
 								`normalize_kana`::
 								    Indicates whether kana iteration marks should be normalized. Defaults to `true`
 								[[analysis-kuromoji-tokenizer]]
 								==== `kuromoji_tokenizer`
 								The `kuromoji_tokenizer` accepts the following settings:
 								`mode`::
 								+
 								--
 								The tokenization mode determines how the tokenizer handles compound and
 								unknown words.  It can be set to:
 								`normal`::
 								    Normal segmentation, no decomposition for compounds. Example output:
 								    関西国際空港
 								    アブラカダブラ
 								`search`::
 								    Segmentation geared towards search. This includes a decompounding process
 								    for long nouns, also including the full compound token as a synonym.
 								    Example output:
 								    関西, 関西国際空港, 国際, 空港
 								    アブラカダブラ
 								`extended`::
 								    Extended mode outputs unigrams for unknown words. Example output:
-												Expose discard_compound_token option to kuromoji_tokenizer (#57421)

This commit exposes the new Lucene option `discard_compound_token` to the Elasticsearch Kuromoji plugin.
											
										
										
											2020-06-05 09:33:31 -04:00
+								    関西, 関西国際空港, 国際, 空港
-												Docs: Prepare plugin and integration docs for 2.0

* Centralised plugin docs in docs/plugins/
* Moved integrations into same docs
* Moved community clients into the clients section of the docs
* Removed docs/community

Closes #11734
Closes #11724
Closes #11636
Closes #11635
Closes #11632
Closes #11630
Closes #12046
Closes #12438
Closes #12579

											
										
										
											2015-08-15 12:00:55 -04:00
+								    ア, ブ, ラ, カ, ダ, ブ, ラ
 								--
 								`discard_punctuation`::
 								    Whether punctuation should be discarded from the output. Defaults to `true`.
 								`user_dictionary`::
 								+
 								--
 								The Kuromoji tokenizer uses the MeCab-IPADIC dictionary by default. A `user_dictionary`
 								may be appended to the default dictionary. The dictionary should have the following CSV format:
 								[source,csv]
 								-----------------------
 								<text>,<token 1> ... <token n>,<reading 1> ... <reading n>,<part-of-speech tag>
 								-----------------------
 								--
 								As a demonstration of how the user dictionary can be used, save the following
 								dictionary to `$ES_HOME/config/userdict_ja.txt`:
 								[source,csv]
 								-----------------------
 								東京スカイツリー,東京 スカイツリー,トウキョウ スカイツリー,カスタム名詞
 								-----------------------
-												Add support for inlined user dictionary in the Kuromoji plugin (#45489)

This change adds a new option called user_dictionary_rules to
Kuromoji's tokenizer. It can be used to set additional tokenization rules
to the Japanese tokenizer directly in the settings (instead of using a file).
This commit also adds a check that no rules are duplicated since this is not allowed
in the UserDictionary.

Closes #25343
											
										
										
											2019-08-20 09:06:01 -04:00
+								--
 								You can also inline the rules directly in the tokenizer definition using
 								the `user_dictionary_rules` option:
-												[DOCS] [2 of 5] Change // CONSOLE comments to [source,console] (#46353) (#46502)


											
										
										
											2019-09-09 13:38:14 -04:00
+								[source,console]
-												Add support for inlined user dictionary in the Kuromoji plugin (#45489)

This change adds a new option called user_dictionary_rules to
Kuromoji's tokenizer. It can be used to set additional tokenization rules
to the Japanese tokenizer directly in the settings (instead of using a file).
This commit also adds a check that no rules are duplicated since this is not allowed
in the UserDictionary.

Closes #25343
											
										
										
											2019-08-20 09:06:01 -04:00
+								--------------------------------------------------
 								PUT nori_sample
 								{
 								  "settings": {
 								    "index": {
 								      "analysis": {
 								        "tokenizer": {
 								          "kuromoji_user_dict": {
 								            "type": "kuromoji_tokenizer",
 								            "mode": "extended",
 								            "user_dictionary_rules": ["東京スカイツリー,東京 スカイツリー,トウキョウ スカイツリー,カスタム名詞"]
 								          }
 								        },
 								        "analyzer": {
 								          "my_analyzer": {
 								            "type": "custom",
 								            "tokenizer": "kuromoji_user_dict"
 								          }
 								        }
 								      }
 								    }
 								  }
 								}
 								--------------------------------------------------
 								--
-												Analysis Kuromoji: Add nbest option and NumberFilter

Add nbest_cost and nbest_examples parameter to KuromojiTokenizerFactory
Add KuromojiNumberFilterFactory

											
										
										
											2016-03-16 04:43:21 -04:00
+								`nbest_cost`/`nbest_examples`::
 								+
 								--
 								Additional expert user parameters `nbest_cost` and `nbest_examples` can be used
 								to include additional tokens that most likely according to the statistical model.
 								If both parameters are used, the largest number of both is applied.
 								`nbest_cost`::
 								    The `nbest_cost` parameter specifies an additional Viterbi cost.
 								    The KuromojiTokenizer will include all tokens in Viterbi paths that are
 								    within the nbest_cost value of the best path.
 								`nbest_examples`::
 								    The `nbest_examples` can be used to find a `nbest_cost` value based on examples.
 								    For example, a value of /箱根山-箱根/成田空港-成田/ indicates that in the texts,
 								    箱根山 (Mt. Hakone) and 成田空港 (Narita Airport) we'd like a cost that gives is us
 								    箱根 (Hakone) and 成田 (Narita).
 								--
-												Docs: Prepare plugin and integration docs for 2.0

* Centralised plugin docs in docs/plugins/
* Moved integrations into same docs
* Moved community clients into the clients section of the docs
* Removed docs/community

Closes #11734
Closes #11724
Closes #11636
Closes #11635
Closes #11632
Closes #11630
Closes #12046
Closes #12438
Closes #12579

											
										
										
											2015-08-15 12:00:55 -04:00
+								Then create an analyzer as follows:
-												[DOCS] [2 of 5] Change // CONSOLE comments to [source,console] (#46353) (#46502)


											
										
										
											2019-09-09 13:38:14 -04:00
+								[source,console]
-												Docs: Prepare plugin and integration docs for 2.0

* Centralised plugin docs in docs/plugins/
* Moved integrations into same docs
* Moved community clients into the clients section of the docs
* Removed docs/community

Closes #11734
Closes #11724
Closes #11636
Closes #11635
Closes #11632
Closes #11630
Closes #12046
Closes #12438
Closes #12579

											
										
										
											2015-08-15 12:00:55 -04:00
+								--------------------------------------------------
-												Remove `include_type_name` in asciidoc where possible (#37568)

The "include_type_name" parameter was temporarily introduced in #37285 to facilitate
moving the default parameter setting to "false" in many places in the documentation
code snippets. Most of the places can simply be reverted without causing errors.
In this change I looked for asciidoc files that contained the
"include_type_name=true" addition when creating new indices but didn't look
likey they made use of the "_doc" type for mappings. This is mostly the case
e.g. in the analysis docs where index creating often only contains settings. I
manually corrected the use of types in some places where the docs still used an
explicit type name and not the dummy "_doc" type.
											
										
										
											2019-01-18 03:34:11 -05:00
+								PUT kuromoji_sample
-												Docs: Prepare plugin and integration docs for 2.0

* Centralised plugin docs in docs/plugins/
* Moved integrations into same docs
* Moved community clients into the clients section of the docs
* Removed docs/community

Closes #11734
Closes #11724
Closes #11636
Closes #11635
Closes #11632
Closes #11630
Closes #12046
Closes #12438
Closes #12579

											
										
										
											2015-08-15 12:00:55 -04:00
+								{
 								  "settings": {
 								    "index": {
 								      "analysis": {
 								        "tokenizer": {
 								          "kuromoji_user_dict": {
 								            "type": "kuromoji_tokenizer",
 								            "mode": "extended",
 								            "discard_punctuation": "false",
 								            "user_dictionary": "userdict_ja.txt"
 								          }
 								        },
 								        "analyzer": {
 								          "my_analyzer": {
 								            "type": "custom",
 								            "tokenizer": "kuromoji_user_dict"
 								          }
 								        }
 								      }
 								    }
 								  }
 								}
-												Removing request parameters in _analyze API

Remove unused imports
Replace POST method by GET method in docs
 Add breaking changes explanation
 Fix small issue in Kuromoji docs

Closes #20246

											
										
										
											2016-09-30 16:42:45 -04:00
+								GET kuromoji_sample/_analyze
-												Removing request parameters in _analyze API

Remove request params in _analyze API without index param
Change rest-api-test using JSON
Change docs using JSON

Closes #20246

											
										
										
											2016-09-22 07:54:30 -04:00
+								{
 								  "analyzer": "my_analyzer",
 								  "text": "東京スカイツリー"
 								}
-												Docs: Prepare plugin and integration docs for 2.0

* Centralised plugin docs in docs/plugins/
* Moved integrations into same docs
* Moved community clients into the clients section of the docs
* Removed docs/community

Closes #11734
Closes #11724
Closes #11636
Closes #11635
Closes #11632
Closes #11630
Closes #12046
Closes #12438
Closes #12579

											
										
										
											2015-08-15 12:00:55 -04:00
+								--------------------------------------------------
 								The above `analyze` request returns the following:
-												[DOCS] Replace "// TESTRESPONSE" magic comments with "[source,console-result] (#46295) (#46418)


											
										
										
											2019-09-06 09:22:08 -04:00
+								[source,console-result]
-												Docs: Prepare plugin and integration docs for 2.0

* Centralised plugin docs in docs/plugins/
* Moved integrations into same docs
* Moved community clients into the clients section of the docs
* Removed docs/community

Closes #11734
Closes #11724
Closes #11636
Closes #11635
Closes #11632
Closes #11630
Closes #12046
Closes #12438
Closes #12579

											
										
										
											2015-08-15 12:00:55 -04:00
+								--------------------------------------------------
 								{
 								  "tokens" : [ {
 								    "token" : "東京",
 								    "start_offset" : 0,
 								    "end_offset" : 2,
 								    "type" : "word",
 								    "position" : 0
 								  }, {
 								    "token" : "スカイツリー",
 								    "start_offset" : 2,
 								    "end_offset" : 8,
 								    "type" : "word",
 								    "position" : 1
 								  } ]
 								}
 								--------------------------------------------------
 								`discard_compound_token`::
 								    Whether original compound tokens should be discarded from the output with `search` mode. Defaults to `false`.
 								    Example output with `search` or `extended` mode and this option `true`:
 								    関西, 国際, 空港
 								NOTE: If a text contains full-width characters, the `kuromoji_tokenizer`
 								tokenizer can produce unexpected tokens. To avoid this, add the
 								<<analysis-icu-normalization-charfilter,`icu_normalizer` character filter>> to
 								your analyzer. See <<kuromoji-analyzer-normalize-full-width-characters>>.
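
As a rough illustration of where these tokenizer settings live, the following
Python sketch builds an index settings body around a custom
`kuromoji_tokenizer`. The helper `build_kuromoji_settings` and the names
`my_tokenizer` and `my_analyzer` are hypothetical placeholders; only the
`mode` and `discard_compound_token` settings come from this page.

```python
import json


def build_kuromoji_settings(mode="search", discard_compound_token=False):
    """Return an index settings body with a custom kuromoji tokenizer.

    The tokenizer and analyzer names are placeholders chosen for this sketch.
    """
    return {
        "settings": {
            "index": {
                "analysis": {
                    "tokenizer": {
                        "my_tokenizer": {
                            "type": "kuromoji_tokenizer",
                            "mode": mode,
                            "discard_compound_token": discard_compound_token,
                        }
                    },
                    "analyzer": {
                        "my_analyzer": {
                            "type": "custom",
                            "tokenizer": "my_tokenizer",
                        }
                    },
                }
            }
        }
    }


if __name__ == "__main__":
    # Print the JSON body that could be sent when creating an index.
    body = build_kuromoji_settings(discard_compound_token=True)
    print(json.dumps(body, ensure_ascii=False, indent=2))
```

The printed JSON has the same shape as the `PUT kuromoji_sample` request
bodies used elsewhere on this page.
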
 								[[analysis-kuromoji-baseform]]
 								==== `kuromoji_baseform` token filter
 								The `kuromoji_baseform` token filter replaces each term with its base form
 								(Lucene's `BaseFormAttribute`). This acts as a lemmatizer for verbs and
 								adjectives. For example:
 								[source,console]
 								--------------------------------------------------
 								PUT kuromoji_sample
 								{
 								  "settings": {
 								    "index": {
 								      "analysis": {
 								        "analyzer": {
 								          "my_analyzer": {
 								            "tokenizer": "kuromoji_tokenizer",
 								            "filter": [
 								              "kuromoji_baseform"
 								            ]
 								          }
 								        }
 								      }
 								    }
 								  }
 								}
 								GET kuromoji_sample/_analyze
 								{
 								  "analyzer": "my_analyzer",
 								  "text": "飲み"
 								}
 								--------------------------------------------------
 								The request responds with:
 								[source,console-result]
 								--------------------------------------------------
 								{
 								  "tokens" : [ {
 								    "token" : "飲む",
 								    "start_offset" : 0,
 								    "end_offset" : 2,
 								    "type" : "word",
 								    "position" : 0
 								  } ]
 								}
 								--------------------------------------------------
 								[[analysis-kuromoji-speech]]
 								==== `kuromoji_part_of_speech` token filter
 								The `kuromoji_part_of_speech` token filter removes tokens that match a set of
 								part-of-speech tags. It accepts the following setting:
 								`stoptags`::
 								    An array of part-of-speech tags that should be removed. It defaults to the
 								    `stoptags.txt` file embedded in the `lucene-analyzer-kuromoji.jar`.
 								For example:
 								[source,console]
 								--------------------------------------------------
 								PUT kuromoji_sample
 								{
 								  "settings": {
 								    "index": {
 								      "analysis": {
 								        "analyzer": {
 								          "my_analyzer": {
 								            "tokenizer": "kuromoji_tokenizer",
 								            "filter": [
 								              "my_posfilter"
 								            ]
 								          }
 								        },
 								        "filter": {
 								          "my_posfilter": {
 								            "type": "kuromoji_part_of_speech",
 								            "stoptags": [
 								              "助詞-格助詞-一般",
 								              "助詞-終助詞"
 								            ]
 								          }
 								        }
 								      }
 								    }
 								  }
 								}
 								GET kuromoji_sample/_analyze
 								{
 								  "analyzer": "my_analyzer",
 								  "text": "寿司がおいしいね"
 								}
 								--------------------------------------------------
 								The request responds with:
 								[source,console-result]
 								--------------------------------------------------
 								{
 								  "tokens" : [ {
 								    "token" : "寿司",
 								    "start_offset" : 0,
 								    "end_offset" : 2,
 								    "type" : "word",
 								    "position" : 0
 								  }, {
 								    "token" : "おいしい",
 								    "start_offset" : 3,
 								    "end_offset" : 7,
 								    "type" : "word",
 								    "position" : 2
 								  } ]
 								}
 								--------------------------------------------------
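
The effect of `stoptags` can be pictured with a small Python sketch. It is not
the plugin's implementation: the part-of-speech tags attached to each token
below are assumed for illustration (in practice `kuromoji_tokenizer` assigns
them from its dictionary), and `filter_by_stoptags` is a hypothetical helper.
It shows why the surviving tokens keep their original positions, leaving gaps
where tokens were removed.

```python
# Tokens as (text, position, part-of-speech tag). The tags are assumed here
# for illustration; the real tags come from the tokenizer's dictionary.
TAGGED_TOKENS = [
    ("寿司", 0, "名詞-一般"),
    ("が", 1, "助詞-格助詞-一般"),
    ("おいしい", 2, "形容詞-自立"),
    ("ね", 3, "助詞-終助詞"),
]

# The same stoptags as in the example filter definition above.
STOPTAGS = {"助詞-格助詞-一般", "助詞-終助詞"}


def filter_by_stoptags(tokens, stoptags):
    """Drop tokens whose tag is in stoptags; keep original positions."""
    return [(text, pos) for text, pos, tag in tokens if tag not in stoptags]


if __name__ == "__main__":
    print(filter_by_stoptags(TAGGED_TOKENS, STOPTAGS))
```

Under these assumed tags, the tokens at positions 1 and 3 are dropped, which
matches the positions `0` and `2` in the response above.
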
 								[[analysis-kuromoji-readingform]]
 								==== `kuromoji_readingform` token filter
 								The `kuromoji_readingform` token filter replaces the token with its reading
 								form in either katakana or romaji. It accepts the following setting:
 								`use_romaji`::
 								    Whether romaji reading form should be output instead of katakana.  Defaults to `false`.
 								When using the pre-defined `kuromoji_readingform` filter, `use_romaji` is set
 								to `true`. The default when defining a custom `kuromoji_readingform`, however,
 								is `false`.  The only reason to use the custom form is if you need the
 								katakana reading form:
 								[source,console]
 								--------------------------------------------------
 								PUT kuromoji_sample
 								{
 								  "settings": {
 								    "index": {
 								      "analysis": {
 								        "analyzer": {
 								          "romaji_analyzer": {
 								            "tokenizer": "kuromoji_tokenizer",
 								            "filter": [ "romaji_readingform" ]
 								          },
 								          "katakana_analyzer": {
 								            "tokenizer": "kuromoji_tokenizer",
 								            "filter": [ "katakana_readingform" ]
 								          }
 								        },
 								        "filter": {
 								          "romaji_readingform": {
 								            "type": "kuromoji_readingform",
 								            "use_romaji": true
 								          },
 								          "katakana_readingform": {
 								            "type": "kuromoji_readingform",
 								            "use_romaji": false
 								          }
 								        }
 								      }
 								    }
 								  }
 								}
 								GET kuromoji_sample/_analyze
 								{
 								  "analyzer": "katakana_analyzer",
 								  "text": "寿司" <1>
 								}
 								GET kuromoji_sample/_analyze
 								{
 								  "analyzer": "romaji_analyzer",
 								  "text": "寿司" <2>
 								}
 								--------------------------------------------------
 								<1> Returns `スシ`.
 								<2> Returns `sushi`.
 								[[analysis-kuromoji-stemmer]]
 								==== `kuromoji_stemmer` token filter
 								The `kuromoji_stemmer` token filter normalizes common katakana spelling
 								variations ending in a long sound character by removing this character
 								(U+30FC). Only full-width katakana characters are supported.
 								This token filter accepts the following setting:
 								`minimum_length`::
 								    Katakana words shorter than `minimum_length` are not stemmed. Defaults
 								    to `4`.
 								[source,console]
 								--------------------------------------------------
 								PUT kuromoji_sample
 								{
 								  "settings": {
 								    "index": {
 								      "analysis": {
 								        "analyzer": {
 								          "my_analyzer": {
 								            "tokenizer": "kuromoji_tokenizer",
 								            "filter": [
 								              "my_katakana_stemmer"
 								            ]
 								          }
 								        },
 								        "filter": {
 								          "my_katakana_stemmer": {
 								            "type": "kuromoji_stemmer",
 								            "minimum_length": 4
 								          }
 								        }
 								      }
 								    }
 								  }
 								}
 								GET kuromoji_sample/_analyze
 								{
 								  "analyzer": "my_analyzer",
 								  "text": "コピー" <1>
 								}
 								GET kuromoji_sample/_analyze
 								{
 								  "analyzer": "my_analyzer",
 								  "text": "サーバー" <2>
 								}
 								--------------------------------------------------
 								<1> Returns `コピー`.
 								<2> Returns `サーバ`.
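
The stemming rule for `kuromoji_stemmer` can be sketched in a few lines of
Python. This is a simplified illustration rather than the filter's actual
code: `stem_katakana` is a hypothetical helper, it treats a token as katakana
when every character falls in the Unicode Katakana block (U+30A0 to U+30FF),
and it assumes a word whose length equals `minimum_length` is still stemmed,
which is consistent with the two results shown for `コピー` and `サーバー`.

```python
PROLONGED_SOUND_MARK = "\u30fc"  # the long sound character "ー"


def is_katakana(text):
    """True if every character is in the Unicode Katakana block."""
    return bool(text) and all("\u30a0" <= ch <= "\u30ff" for ch in text)


def stem_katakana(token, minimum_length=4):
    """Remove one trailing long sound mark from long-enough katakana words."""
    if (
        len(token) >= minimum_length
        and token.endswith(PROLONGED_SOUND_MARK)
        and is_katakana(token)
    ):
        return token[:-1]
    return token


if __name__ == "__main__":
    for word in ("コピー", "サーバー"):
        print(word, "->", stem_katakana(word))
```

With the default `minimum_length` of `4`, the three-character `コピー` is left
unchanged, while the four-character `サーバー` loses its final long sound mark.
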
 								[[analysis-kuromoji-stop]]
 								==== `ja_stop` token filter
 								The `ja_stop` token filter filters out Japanese stopwords (`_japanese_`), and
 								any other custom stopwords specified by the user. This filter only supports
 								the predefined `_japanese_` stopwords list.  If you want to use a different
 								predefined list, then use the
 								{ref}/analysis-stop-tokenfilter.html[`stop` token filter] instead.
 								[source,console]
--------------------------------------------------
PUT kuromoji_sample
{
  "settings": {
    "index": {
      "analysis": {
        "analyzer": {
          "analyzer_with_ja_stop": {
            "tokenizer": "kuromoji_tokenizer",
            "filter": [
              "ja_stop"
            ]
          }
        },
        "filter": {
          "ja_stop": {
            "type": "ja_stop",
            "stopwords": [
              "_japanese_",
              "ストップ"
            ]
          }
        }
      }
    }
  }
}

GET kuromoji_sample/_analyze
{
  "analyzer": "analyzer_with_ja_stop",
  "text": "ストップは消える"
}
--------------------------------------------------

The above request returns:

[source,console-result]
--------------------------------------------------
{
  "tokens" : [ {
    "token" : "消える",
    "start_offset" : 5,
    "end_offset" : 8,
    "type" : "word",
    "position" : 2
  } ]
}
--------------------------------------------------
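
If you need a different predefined stopword list, the built-in `stop` token
filter can take the place of `ja_stop`. The following request is a minimal
sketch of that alternative, using the predefined `_english_` list. The index
name `stop_sample`, the analyzer name `analyzer_with_stop`, and the filter name
`my_stop` are placeholders chosen for illustration:

[source,console]
--------------------------------------------------
PUT stop_sample
{
  "settings": {
    "index": {
      "analysis": {
        "analyzer": {
          "analyzer_with_stop": {
            "tokenizer": "kuromoji_tokenizer",
            "filter": [
              "my_stop"
            ]
          }
        },
        "filter": {
          "my_stop": {
            "type": "stop",
            "stopwords": "_english_"
          }
        }
      }
    }
  }
}
--------------------------------------------------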

[[analysis-kuromoji-number]]
==== `kuromoji_number` token filter

The `kuromoji_number` token filter normalizes Japanese numbers (kansūji)
to regular Arabic decimal numbers in half-width characters. For example:

[source,console]
--------------------------------------------------
PUT kuromoji_sample
{
  "settings": {
    "index": {
      "analysis": {
        "analyzer": {
          "my_analyzer": {
            "tokenizer": "kuromoji_tokenizer",
            "filter": [
              "kuromoji_number"
            ]
          }
        }
      }
    }
  }
}

GET kuromoji_sample/_analyze
{
  "analyzer": "my_analyzer",
  "text": "一〇〇〇"
}
--------------------------------------------------

The above request returns:

[source,console-result]
--------------------------------------------------
{
  "tokens" : [ {
    "token" : "1000",
    "start_offset" : 0,
    "end_offset" : 4,
    "type" : "word",
    "position" : 0
  } ]
}
--------------------------------------------------