LUCENE-1882: improved package level docs for smartcn

git-svn-id: https://svn.apache.org/repos/asf/lucene/java/trunk@810247 13f79535-47bb-0310-9956-ffa450edef68
Chris M. Hostetter 2009-09-01 21:31:18 +00:00
parent e5cb7f668a
commit 29e6be94c3
2 changed files with 27 additions and 6 deletions


@@ -15,14 +15,16 @@
 See the License for the specific language governing permissions and
 limitations under the License.
 -->
-<html><head></head>
+<html><head>
+<META http-equiv="Content-Type" content="text/html; charset=UTF-8">
+</head>
 <body>
 <div>
-SmartChineseAnalyzer Hidden Markov Model package
+SmartChineseAnalyzer Hidden Markov Model package.
 </div>
 <div>
 <font color="#FF0000">
-WARNING: The status of the analyzers/smartcn <b>analysis.cn</b> package is experimental. The APIs
+WARNING: The status of the analyzers/smartcn <b>analysis.cn.smart</b> package is experimental. The APIs
 and file formats introduced here might change in the future and will not be supported anymore
 in such a case.
 </font>


@@ -15,17 +15,36 @@
 See the License for the specific language governing permissions and
 limitations under the License.
 -->
-<html><head></head>
+<html>
+<head>
+<META http-equiv="Content-Type" content="text/html; charset=UTF-8">
+</head>
 <body>
 <div>
-SmartChineseAnalyzer Tokenizers and TokenFilters
+Analyzer for Simplified Chinese, which indexes words.
 </div>
 <div>
 <font color="#FF0000">
-WARNING: The status of the analyzers/smartcn <b>analysis.cn</b> package is experimental. The APIs
+WARNING: The status of the analyzers/smartcn <b>analysis.cn.smart</b> package is experimental. The APIs
 and file formats introduced here might change in the future and will not be supported anymore
 in such a case.
 </font>
 </div>
+<div>
+Three analyzers are provided for Chinese, each of which treats Chinese text in a different way.
+<ul>
+<li>ChineseAnalyzer (in the analyzers/cn package): Index unigrams (individual Chinese characters) as a token.
+<li>CJKAnalyzer (in the analyzers/cjk package): Index bigrams (overlapping groups of two adjacent Chinese characters) as tokens.
+<li>SmartChineseAnalyzer (in this package): Index words (attempt to segment Chinese text into words) as tokens.
+</ul>
+Example phrase "我是中国人"
+<ol>
+<li>ChineseAnalyzer: 我-是-中-国-人</li>
+<li>CJKAnalyzer: 我是-是中-中国-国人</li>
+<li>SmartChineseAnalyzer: 我-是-中国-人</li>
+</ol>
+</div>
 </body>
 </html>
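
The unigram and bigram splits described in the new package docs can be illustrated with a minimal sketch in plain Java. It does not use Lucene: the class name NgramSketch and its two methods are hypothetical and only mimic the token boundaries listed for ChineseAnalyzer and CJKAnalyzer on the example phrase. The word-level split attributed to SmartChineseAnalyzer is not reproduced here, since it depends on the Hidden Markov Model package documented in this commit.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical illustration only; not part of Lucene or of this commit.
class NgramSketch {

    /** Returns one token per Unicode code point (the unigram view). */
    static List<String> unigrams(String text) {
        List<String> tokens = new ArrayList<>();
        text.codePoints()
            .forEach(cp -> tokens.add(new String(Character.toChars(cp))));
        return tokens;
    }

    /** Returns overlapping pairs of adjacent code points (the bigram view). */
    static List<String> bigrams(String text) {
        List<String> chars = unigrams(text);
        List<String> tokens = new ArrayList<>();
        // Slide a window of width two over the characters.
        for (int i = 0; i + 1 < chars.size(); i++) {
            tokens.add(chars.get(i) + chars.get(i + 1));
        }
        return tokens;
    }

    public static void main(String[] args) {
        String phrase = "我是中国人";
        System.out.println(String.join("-", unigrams(phrase))); // 我-是-中-国-人
        System.out.println(String.join("-", bigrams(phrase)));  // 我是-是中-中国-国人
    }
}
```

In this sketch a one-character input yields no bigrams at all; a real analyzer may handle that edge case differently, so treat the code as a picture of the token boundaries rather than of any analyzer's implementation.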