From 9fbc9db1c17d9fe5a7281f89a4bb18e18f38fceb Mon Sep 17 00:00:00 2001
From: Steve Rowe
Date: Fri, 26 May 2017 16:57:53 -0400
Subject: [PATCH] SOLR-10758: fix broken internal link to new HMM Chinese
 Tokenizer section

---
 solr/solr-ref-guide/src/language-analysis.adoc | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/solr/solr-ref-guide/src/language-analysis.adoc b/solr/solr-ref-guide/src/language-analysis.adoc
index c82cd61b662..11b0b784e41 100644
--- a/solr/solr-ref-guide/src/language-analysis.adoc
+++ b/solr/solr-ref-guide/src/language-analysis.adoc
@@ -565,7 +565,7 @@ See the example under <
 [[LanguageAnalysis-SimplifiedChinese]]
 === Simplified Chinese
 
-For Simplified Chinese, Solr provides support for Chinese sentence and word segmentation with the <>. This component includes a large dictionary and segments Chinese text into words with the Hidden Markov Model. To use this tokenizer, you must add additional .jars to Solr's classpath (as described in the section <>). See the `solr/contrib/analysis-extras/README.txt` for information on which jars you need to add to your `SOLR_HOME/lib`.
+For Simplified Chinese, Solr provides support for Chinese sentence and word segmentation with the <<LanguageAnalysis-HMMChineseTokenizer,HMM Chinese Tokenizer>>. This component includes a large dictionary and segments Chinese text into words with the Hidden Markov Model. To use this tokenizer, you must add additional .jars to Solr's classpath (as described in the section <>). See the `solr/contrib/analysis-extras/README.txt` for information on which jars you need to add to your `SOLR_HOME/lib`.
 
 The default configuration of the <> is also suitable for Simplified Chinese text. It follows the Word Break rules from the Unicode Text Segmentation algorithm for non-Chinese text, and uses a dictionary to segment Chinese words. To use this tokenizer, you must add additional .jars to Solr's classpath (as described in the section <>). See the `solr/contrib/analysis-extras/README.txt` for information on which jars you need to add to your `SOLR_HOME/lib`.
 
@@ -598,6 +598,7 @@ Also useful for Chinese analysis:
 ----
 
+[[LanguageAnalysis-HMMChineseTokenizer]]
 === HMM Chinese Tokenizer
 
 For Simplified Chinese, Solr provides support for Chinese sentence and word segmentation with the `solr.HMMChineseTokenizerFactory` in the `analysis-extras` contrib module. This component includes a large dictionary and segments Chinese text into words with the Hidden Markov Model. To use this tokenizer, see `solr/contrib/analysis-extras/README.txt` for instructions on which jars you need to add to your `solr_home/lib`.
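
Note (not part of the patch): for reference, a minimal field type using the tokenizer that the fixed link points to could look like the sketch below. The `text_zh` name and the accompanying width/lowercase filters are illustrative assumptions; only `solr.HMMChineseTokenizerFactory` (from the `analysis-extras` contrib module) is taken from the section text above.

[source,xml]
----
<!-- Hypothetical field type sketch: assumes the analysis-extras jars are
     already on Solr's classpath, as described in the patched section. -->
<fieldType name="text_zh" class="solr.TextField" positionIncrementGap="100">
  <analyzer>
    <!-- Segments Chinese text into words using the Hidden Markov Model -->
    <tokenizer class="solr.HMMChineseTokenizerFactory"/>
    <!-- Normalizes full-width/half-width character variants -->
    <filter class="solr.CJKWidthFilterFactory"/>
    <!-- Lowercases any embedded Latin-script tokens -->
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
</fieldType>
----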