LUCENE-8603 Fix the inversion of right ids for additional nouns in the Korean user dictionary

2018-12-16 15:03:12 +07:00 · 53b05c5e7e
parent a62b5941b0
commit 53b05c5e7e
2 changed files with 9 additions and 6 deletions
--- a/lucene/CHANGES.txt
+++ b/lucene/CHANGES.txt
@@ -237,9 +237,8 @@ Build

 Bug fixes:

-* LUCENE-8548: The KoreanTokenizer no longer splits unknown words on combining diacritics and
-  detects script boundaries more accurately with Character#UnicodeScript#of.
-  (Christophe Bismuth, Jim Ferenczi)
+* LUCENE-8603: Fix the inversion of right ids for additional nouns in the Korean user dictionary.
+  (Yoo Jeongin via Jim Ferenczi)

 New Features

@@ -265,6 +264,10 @@ Improvements
 * LUCENE-8575: SegmentInfos#toString now includes attributes and diagnostics.
  (Namgyu Kim via Adrien Grand)

+* LUCENE-8548: The KoreanTokenizer no longer splits unknown words on combining diacritics and
+  detects script boundaries more accurately with Character#UnicodeScript#of.
+  (Christophe Bismuth, Jim Ferenczi)
+
 Optimizations

 * LUCENE-8552: FieldInfos.getMergedFieldInfos no longer does any merging if there is <= 1 segment.
--- a/lucene/analysis/nori/src/java/org/apache/lucene/analysis/ko/dict/UserDictionary.java
+++ b/lucene/analysis/nori/src/java/org/apache/lucene/analysis/ko/dict/UserDictionary.java
@@ -43,11 +43,11 @@ public final class UserDictionary implements Dictionary {
  public static final short LEFT_ID = 1781;

  // NNG right
-  public static final short RIGHT_ID = 3534;
+  public static final short RIGHT_ID = 3533;
  // NNG right with hangul and a coda on the last char
-  public static final short RIGHT_ID_T = 3534;
+  public static final short RIGHT_ID_T = 3535;
  // NNG right with hangul and no coda on the last char
-  public static final short RIGHT_ID_F = 3535;
+  public static final short RIGHT_ID_F = 3534;

  // length, length... indexed by compound ID or null for simple noun
  private final int segmentations[][];
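
Note on the constants above: the comments describe three cases for the right id of an additional noun (a generic NNG right id, one for a Hangul last character with a coda, and one for a Hangul last character without a coda). The following is a minimal, standalone sketch of how such a selection could work, not the actual UserDictionary code. The class name RightIdSketch and its helper methods are hypothetical; the numeric values are copied from the patched constants, and the coda test relies on the standard Unicode layout of precomposed Hangul syllables (U+AC00 to U+D7A3, 28 final-consonant slots per syllable block, slot 0 meaning no coda).

```java
public final class RightIdSketch {
  // Values copied from the patched constants in the diff above.
  static final short RIGHT_ID = 3533;    // NNG right
  static final short RIGHT_ID_T = 3535;  // last char is Hangul with a coda
  static final short RIGHT_ID_F = 3534;  // last char is Hangul without a coda

  // Unicode range of precomposed Hangul syllables.
  static final int HANGUL_BASE = 0xAC00;
  static final int HANGUL_END = 0xD7A3;
  // Each (initial, vowel) pair has 28 variants; variant 0 has no final consonant.
  static final int CODA_SLOTS = 28;

  // Hypothetical helper: true if ch is a precomposed Hangul syllable.
  static boolean isHangulSyllable(char ch) {
    return ch >= HANGUL_BASE && ch <= HANGUL_END;
  }

  // Hypothetical helper: true if the syllable ends with a final consonant.
  // Only meaningful when isHangulSyllable(ch) is true.
  static boolean hasCoda(char ch) {
    return (ch - HANGUL_BASE) % CODA_SLOTS != 0;
  }

  // Picks a right id from the last character of the surface form.
  // This mirrors the three cases named in the comments; it is an assumption
  // about how the constants are meant to be used.
  static short rightIdFor(String surface) {
    if (surface.isEmpty()) {
      return RIGHT_ID;
    }
    char last = surface.charAt(surface.length() - 1);
    if (!isHangulSyllable(last)) {
      return RIGHT_ID;
    }
    return hasCoda(last) ? RIGHT_ID_T : RIGHT_ID_F;
  }

  public static void main(String[] args) {
    // U+CC45 ends with a final consonant.
    System.out.println(rightIdFor("\uCC45"));        // prints 3535
    // U+C0AC U+ACFC: the last syllable has no final consonant.
    System.out.println(rightIdFor("\uC0AC\uACFC"));  // prints 3534
    // Last character is not Hangul.
    System.out.println(rightIdFor("Lucene"));        // prints 3533
  }
}
```

With the pre-patch values, RIGHT_ID and RIGHT_ID_T were both 3534 and RIGHT_ID_F was 3535, so a selection like the one sketched here would have assigned the coda and no-coda ids the wrong way round, which matches the "inversion" named in the commit title.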