parent 3022424f68
commit 995e796eab
@@ -3,7 +3,7 @@
 
 The `cjk_bigram` token filter forms bigrams out of the CJK
 terms that are generated by the <<analysis-standard-tokenizer,`standard` tokenizer>>
-or the `icu_tokenizer` (see <<analysis-icu-plugin>>).
+or the `icu_tokenizer` (see {plugins}/analysis-icu-tokenizer.html[`analysis-icu` plugin]).
 
 By default, when a CJK character has no adjacent characters to form a bigram,
 it is output in unigram form. If you always want to output both unigrams and
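The context lines of the hunk above describe how `cjk_bigram` pairs adjacent CJK characters and falls back to a unigram when a character has no neighbor. A minimal Python sketch of that behavior follows. It assumes the input is a run of adjacent single-character CJK tokens. `cjk_bigrams` is a hypothetical helper, not Lucene's `CJKBigramFilter`, and its `output_unigrams` flag is only assumed to correspond to the filter option of that name. Token order and positions are not claimed to match Elasticsearch.

```python
def cjk_bigrams(chars: list[str], output_unigrams: bool = False) -> list[str]:
    """Combine a run of adjacent single-character CJK tokens into bigrams.

    Hypothetical sketch of the documented behavior only; not Lucene's
    CJKBigramFilter, and the output order is not meant to match it.
    """
    # An isolated character has no neighbor to pair with, so it is
    # emitted on its own as a unigram.
    if len(chars) == 1:
        return list(chars)
    # Pair each character with the one that follows it.
    bigrams = [chars[i] + chars[i + 1] for i in range(len(chars) - 1)]
    if output_unigrams:
        # Assumed flag: emit every unigram in addition to every bigram.
        return list(chars) + bigrams
    return bigrams


print(cjk_bigrams(["東", "京", "都"]))
print(cjk_bigrams(["日"]))
```

Keeping the unigram fallback inside the same function makes the default rule explicit: bigrams whenever a neighbor exists, a single-character token otherwise.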
@@ -7,6 +7,6 @@ The `cjk_width` token filter normalizes CJK width differences:
 * Folds halfwidth Katakana variants into the equivalent Kana
 
 NOTE: This token filter can be viewed as a subset of NFKC/NFKD
-Unicode normalization. See the <<analysis-icu-plugin>>
+Unicode normalization. See the {plugins}/analysis-icu-normalization-charfilter.html[`analysis-icu` plugin]
 for full normalization support.
 
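The NOTE in the hunk above calls `cjk_width` a subset of NFKC/NFKD normalization. One way to illustrate that claim is to apply Python's NFKC only to width-variant characters and leave every other character untouched. This is a sketch under stated assumptions, not the filter's implementation. The code point ranges (fullwidth ASCII U+FF01 to U+FF5E, halfwidth Katakana U+FF65 to U+FF9F) and the helper names `is_width_variant` and `fold_width` are assumptions made for this example.

```python
import unicodedata


def is_width_variant(ch: str) -> bool:
    # Assumed ranges: fullwidth ASCII variants (U+FF01-U+FF5E) and
    # halfwidth Katakana (U+FF65-U+FF9F).
    cp = ord(ch)
    return 0xFF01 <= cp <= 0xFF5E or 0xFF65 <= cp <= 0xFF9F


def fold_width(text: str) -> str:
    """Hypothetical width folding: NFKC applied only to width variants."""
    out: list[str] = []
    run: list[str] = []
    for ch in text:
        if is_width_variant(ch):
            # Buffer consecutive variants so a halfwidth voiced sound mark
            # can compose with the Kana before it.
            run.append(ch)
            continue
        if run:
            out.append(unicodedata.normalize("NFKC", "".join(run)))
            run = []
        # Any other character passes through unchanged, which is what makes
        # this a subset of full NFKC rather than NFKC itself.
        out.append(ch)
    if run:
        out.append(unicodedata.normalize("NFKC", "".join(run)))
    return "".join(out)


print(fold_width("\uff76\uff80\uff76\uff85"))  # halfwidth "katakana"
```

Buffering runs instead of normalizing one character at a time matters for voiced marks: halfwidth KA followed by the halfwidth voiced sound mark folds to a single full-width GA. A circled digit such as U+2460, which full NFKC would turn into "1", is left alone.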