parent 3022424f68
commit 995e796eab
@@ -3,7 +3,7 @@
 
 The `cjk_bigram` token filter forms bigrams out of the CJK
 terms that are generated by the <<analysis-standard-tokenizer,`standard` tokenizer>>
-or the `icu_tokenizer` (see <<analysis-icu-plugin>>).
+or the `icu_tokenizer` (see {plugins}/analysis-icu-tokenizer.html[`analysis-icu` plugin]).
 
 By default, when a CJK character has no adjacent characters to form a bigram,
 it is output in unigram form. If you always want to output both unigrams and
@@ -7,6 +7,6 @@ The `cjk_width` token filter normalizes CJK width differences:
 * Folds halfwidth Katakana variants into the equivalent Kana
 
 NOTE: This token filter can be viewed as a subset of NFKC/NFKD
-Unicode normalization. See the <<analysis-icu-plugin>>
+Unicode normalization. See the {plugins}/analysis-icu-normalization-charfilter.html[`analysis-icu` plugin]
 for full normalization support.
 