LUCENE-9270: Update Javadoc about normalizeEntry in the Kuromoji DictionaryBuilder

2020-03-12 02:50:36 +09:00 · f0a49738ca
parent ed59c3eb33
commit f0a49738ca
2 changed files with 8 additions and 1 deletion
--- a/lucene/CHANGES.txt
+++ b/lucene/CHANGES.txt
@@ -130,6 +130,8 @@ Other

 * LUCENE-9257: Always keep FST off-heap. FSTLoadMode, Reader attributes and openedFromWriter removed. (Bruno Roustant)

+* LUCENE-9270: Update Javadoc about normalizeEntry in the Kuromoji DictionaryBuilder. (Namgyu Kim)
+
 ======================= Lucene 8.5.0 =======================

 API Changes
--- a/lucene/analysis/kuromoji/src/java/org/apache/lucene/analysis/ja/util/DictionaryBuilder.java
+++ b/lucene/analysis/kuromoji/src/java/org/apache/lucene/analysis/ja/util/DictionaryBuilder.java
@@ -26,13 +26,18 @@ import java.util.Locale;
 * Tool to build dictionaries. Usage:
 * <pre>
 *    java -cp [lucene classpath] org.apache.lucene.analysis.ja.util.DictionaryBuilder \
- *          ${inputDir} ${outputDir} ${encoding}
+ *          ${inputDir} ${outputDir} ${encoding} ${normalizeEntry}
 * </pre>
 *
 * <p> The input directory is expected to include unk.def, matrix.def, plus any number of .csv
 * files, roughly following the conventions of IPADIC. JapaneseTokenizer uses dictionaries built
 * with this tool. Note that the input files required by this build generally must be generated from
 * a corpus of real text using tools that are not part of Lucene.  </p>
+ * <p>The normalizeEntry option is a boolean value.<br>
+ * If true, check whether the surface form (the first column in the CSV files) is
+ * <a href="https://unicode.org/reports/tr15/#Norm_Forms">NFC-normalized</a>.
+ * If it isn't, the NFC-normalized form is added to the TokenInfoDictionary in addition to the original form.<br>
+ * This option is false for the pre-built dictionary bundled with Lucene. </p>
 * @lucene.experimental
 */
 public class DictionaryBuilder {
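The rule described by the new Javadoc can be illustrated with a small, self-contained sketch built on `java.text.Normalizer`. The class `NormalizeEntrySketch` and its `expandEntry` method are hypothetical and not part of Lucene. The sketch assumes that only the surface form (column 0) is checked and normalized and that the remaining CSV columns are copied unchanged. It illustrates the documented behavior only and should not be read as the `TokenInfoDictionaryBuilder` implementation.

```java
import java.text.Normalizer;
import java.util.ArrayList;
import java.util.List;

/**
 * Hypothetical sketch of the documented normalizeEntry rule.
 * Not Lucene code: assumes only the surface form (column 0) is normalized.
 */
public class NormalizeEntrySketch {

  /** Returns the rows to store for one CSV entry. */
  public static List<String[]> expandEntry(String[] entry, boolean normalizeEntry) {
    List<String[]> rows = new ArrayList<>();
    rows.add(entry); // the original form is always kept
    String surface = entry[0];
    if (normalizeEntry && !Normalizer.isNormalized(surface, Normalizer.Form.NFC)) {
      // Assumption: copy every column and replace only the surface form.
      String[] normalized = entry.clone();
      normalized[0] = Normalizer.normalize(surface, Normalizer.Form.NFC);
      rows.add(normalized); // NFC form added in addition to the original
    }
    return rows;
  }

  public static void main(String[] args) {
    // KATAKANA LETTER KA (U+30AB) + COMBINING KATAKANA-HIRAGANA VOICED SOUND MARK (U+3099)
    String decomposed = "\u30AB\u3099";
    String[] entry = {decomposed, "名詞", "一般"};

    List<String[]> rows = expandEntry(entry, true);
    System.out.println(rows.size());                      // 2
    System.out.println(rows.get(1)[0].equals("\u30AC"));  // true (KATAKANA LETTER GA)
    System.out.println(expandEntry(entry, false).size()); // 1
  }
}
```

With normalizeEntry set to true, a decomposed surface form yields two rows: the original and its composed NFC form. With false, or when the surface form is already NFC, only the original row is kept.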