Mirror of https://github.com/apache/lucene.git, synced 2025-02-18 07:55:29 +00:00
LUCENE-1882: improved package level docs for smartcn
git-svn-id: https://svn.apache.org/repos/asf/lucene/java/trunk@810247 13f79535-47bb-0310-9956-ffa450edef68
Parent: e5cb7f668a
Commit: 29e6be94c3
@@ -15,14 +15,16 @@
 See the License for the specific language governing permissions and
 limitations under the License.
 -->
-<html><head></head>
+<html><head>
+<META http-equiv="Content-Type" content="text/html; charset=UTF-8">
+</head>
 <body>
 <div>
-SmartChineseAnalyzer Hidden Markov Model package
+SmartChineseAnalyzer Hidden Markov Model package.
 </div>
 <div>
 <font color="#FF0000">
-WARNING: The status of the analyzers/smartcn <b>analysis.cn</b> package is experimental. The APIs
+WARNING: The status of the analyzers/smartcn <b>analysis.cn.smart</b> package is experimental. The APIs
 and file formats introduced here might change in the future and will not be supported anymore
 in such a case.
 </font>
@@ -15,17 +15,36 @@
 See the License for the specific language governing permissions and
 limitations under the License.
 -->
-<html><head></head>
+<html>
+<head>
+<META http-equiv="Content-Type" content="text/html; charset=UTF-8">
+</head>
 <body>
 <div>
-SmartChineseAnalyzer Tokenizers and TokenFilters
+Analyzer for Simplified Chinese, which indexes words.
 </div>
 <div>
 <font color="#FF0000">
-WARNING: The status of the analyzers/smartcn <b>analysis.cn</b> package is experimental. The APIs
+WARNING: The status of the analyzers/smartcn <b>analysis.cn.smart</b> package is experimental. The APIs
 and file formats introduced here might change in the future and will not be supported anymore
 in such a case.
 </font>
 </div>
+<div>
+Three analyzers are provided for Chinese, each of which treats Chinese text in a different way.
+<ul>
+<li>ChineseAnalyzer (in the analyzers/cn package): Index unigrams (individual Chinese characters) as a token.
+<li>CJKAnalyzer (in the analyzers/cjk package): Index bigrams (overlapping groups of two adjacent Chinese characters) as tokens.
+<li>SmartChineseAnalyzer (in this package): Index words (attempt to segment Chinese text into words) as tokens.
+</ul>
+
+Example phrase: "我是中国人"
+<ol>
+<li>ChineseAnalyzer: 我-是-中-国-人</li>
+<li>CJKAnalyzer: 我是-是中-中国-国人</li>
+<li>SmartChineseAnalyzer: 我-是-中国-人</li>
+</ol>
+</div>
+
 </body>
 </html>
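The new package documentation contrasts three ways of tokenizing the example phrase "我是中国人". The following is a minimal plain-Java sketch, not Lucene code, that reproduces those three outputs so the difference between unigrams, overlapping bigrams, and words is concrete. The class and method names (`ChineseTokenizationSketch`, `unigrams`, `bigrams`, `forwardMaxMatch`) are hypothetical. For the word-level case it swaps in simple forward maximum matching against a hypothetical one-entry dictionary; SmartChineseAnalyzer itself uses a Hidden Markov Model with its own dictionaries, so this only illustrates what "indexing words" means, not how the analyzer actually segments text.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;

// Hypothetical illustration only; this is not the Lucene analyzer API.
final class ChineseTokenizationSketch {

    // One token per code point, like the unigrams described for ChineseAnalyzer.
    static List<String> unigrams(String text) {
        List<String> tokens = new ArrayList<>();
        text.codePoints().forEach(cp -> tokens.add(new String(Character.toChars(cp))));
        return tokens;
    }

    // Overlapping pairs of adjacent characters, like the bigrams described for CJKAnalyzer.
    static List<String> bigrams(String text) {
        List<String> chars = unigrams(text);
        List<String> tokens = new ArrayList<>();
        for (int i = 0; i + 1 < chars.size(); i++) {
            tokens.add(chars.get(i) + chars.get(i + 1));
        }
        return tokens;
    }

    // Greedy forward maximum matching against a dictionary.
    // NOTE: a stand-in for word segmentation; SmartChineseAnalyzer uses an HMM instead.
    static List<String> forwardMaxMatch(String text, Set<String> dictionary, int maxWordLen) {
        List<String> chars = unigrams(text);
        List<String> tokens = new ArrayList<>();
        int i = 0;
        while (i < chars.size()) {
            int len = Math.min(maxWordLen, chars.size() - i);
            // Shrink the window until it is a dictionary word or a single character.
            while (len > 1 && !dictionary.contains(String.join("", chars.subList(i, i + len)))) {
                len--;
            }
            tokens.add(String.join("", chars.subList(i, i + len)));
            i += len;
        }
        return tokens;
    }

    public static void main(String[] args) {
        String phrase = "我是中国人";
        Set<String> dictionary = Set.of("中国"); // hypothetical one-word dictionary
        System.out.println(String.join("-", unigrams(phrase)));
        System.out.println(String.join("-", bigrams(phrase)));
        System.out.println(String.join("-", forwardMaxMatch(phrase, dictionary, 2)));
    }
}
```

Running `main` prints one segmentation per line, matching the three example outputs in the documentation. Greedy matching is only a stand-in: with a dictionary that also contained 中国人 and a `maxWordLen` of 3, it would keep 中国人 as a single token, which differs from the SmartChineseAnalyzer output listed in the diff.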