- Added documentation

git-svn-id: https://svn.apache.org/repos/asf/lucene/java/trunk@150960 13f79535-47bb-0310-9956-ffa450edef68
Otis Gospodnetic 2004-03-02 13:56:03 +00:00
parent 4a46c8b05f
commit a016dcef1b
1 changed file with 18 additions and 1 deletion


@@ -64,6 +64,23 @@ import org.apache.lucene.analysis.*;
* Rule: A Chinese character as a single token
* Copyright: Copyright (c) 2001
* Company:
*
* The difference between the ChineseTokenizer and the
* CJKTokenizer (id=23545) is that they use different
* token parsing logic.
*
* For example, if the Chinese text
* "C1C2C3C4" is to be indexed, the tokens returned from
* ChineseTokenizer are C1, C2, C3, C4, while the tokens
* returned from CJKTokenizer are C1C2, C2C3, C3C4.
*
* Therefore the index created by CJKTokenizer is much
* larger.
*
* The problem is that, when searching for C1, C1C2, C1C3,
* C4C2, C1C2C3 ... ChineseTokenizer works, but
* CJKTokenizer will not.
*
* @author Yiyi Sun
* @version 1.0
*
@@ -149,4 +166,4 @@ public final class ChineseTokenizer extends Tokenizer {
}
}
}
}
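
To make the contrast described in the new Javadoc concrete, here is a minimal sketch that prints the tokens each tokenizer emits for a four-character string. It assumes the classic pre-2.9 TokenStream API (next() returning a Token, and Token.termText() yielding the term) and that both tokenizers are on the classpath; CJKTokenizer lived in the sandbox at the time of this commit, and the class name TokenizerComparison is hypothetical.

import java.io.StringReader;

import org.apache.lucene.analysis.Token;
import org.apache.lucene.analysis.Tokenizer;
import org.apache.lucene.analysis.cn.ChineseTokenizer;
import org.apache.lucene.analysis.cjk.CJKTokenizer;

public class TokenizerComparison {

    // Print every token a tokenizer emits, one term per line,
    // then release the underlying Reader.
    static void dumpTokens(Tokenizer tokenizer) throws Exception {
        for (Token t = tokenizer.next(); t != null; t = tokenizer.next()) {
            System.out.println(t.termText());
        }
        tokenizer.close();
    }

    public static void main(String[] args) throws Exception {
        // Any four CJK characters stand in for "C1C2C3C4".
        String text = "\u4E2D\u6587\u5206\u8BCD";

        // ChineseTokenizer: one token per character -> C1, C2, C3, C4
        dumpTokens(new ChineseTokenizer(new StringReader(text)));

        // CJKTokenizer: overlapping bigrams -> C1C2, C2C3, C3C4
        dumpTokens(new CJKTokenizer(new StringReader(text)));
    }
}

Running both dumps back to back shows why the CJK-built index is larger (n characters yield n-1 bigram tokens, each of two characters) and why a single-character query tokenizes differently under the two schemes.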