SOLR-3115: improve japanese stopwords.txt description

git-svn-id: https://svn.apache.org/repos/asf/lucene/dev/trunk@1242557 13f79535-47bb-0310-9956-ffa450edef68
2025-03-01 05:49:33 +00:00 · 2012-02-09 22:17:44 +00:00 · 9f783ead67
commit 9f783ead67
parent 509f4c557d
2 changed files with 20 additions and 10 deletions
--- a/modules/analysis/kuromoji/src/resources/org/apache/lucene/analysis/kuromoji/stopwords.txt
+++ b/modules/analysis/kuromoji/src/resources/org/apache/lucene/analysis/kuromoji/stopwords.txt
@@ -1,14 +1,19 @@
 #
 # This file defines a stopword set for Japanese.
 #
-# The set is made up hand-picked frequent terms from taken from segmented Japanese
-# Wikipedia.  Punctuation characters and frequent kanji have mostly been left out.
+# This set is made up of hand-picked frequent terms from segmented Japanese Wikipedia.
+# Punctuation characters and frequent kanji have mostly been left out.  See LUCENE-3745
+# for frequency lists, etc. that can be useful for making your own set (if desired)
 #
-# There is an overlap between these stopwords and the terms removed when used in
-# combination with the KuromojiPartOfSpeechStopFilter.  When editing this file, note
+# Note that there is an overlap between these stopwords and the terms stopped when used
+# in combination with the KuromojiPartOfSpeechStopFilter.  When editing this file, note
 # that comments are not allowed on the same line as stopwords.
 #
-# See LUCENE-3745 for frequency lists, etc. that can be useful for making your own set.
+# Also note that stopping is done in a case-insensitive manner.  Change your StopFilter
+# configuration if you need case-sensitive stopping.  Lastly, note that stopping is done
+# using the same character width as the entries in this file.  Since this StopFilter is
+# normally done after a CJKWidthFilter in your chain, you would usually want your romaji
+# entries to be in half-width and your kana entries to be in full-width.
 #
 の
 に
--- a/solr/example/solr/conf/lang/stopwords_ja.txt
+++ b/solr/example/solr/conf/lang/stopwords_ja.txt
@@ -1,14 +1,19 @@
 #
 # This file defines a stopword set for Japanese.
 #
-# The set is made up hand-picked frequent terms from taken from segmented Japanese
-# Wikipedia.  Punctuation characters and frequent kanji have mostly been left out.
+# This set is made up of hand-picked frequent terms from segmented Japanese Wikipedia.
+# Punctuation characters and frequent kanji have mostly been left out.  See LUCENE-3745
+# for frequency lists, etc. that can be useful for making your own set (if desired)
 #
-# There is an overlap between these stopwords and the terms removed when used in
-# combination with the KuromojiPartOfSpeechStopFilter.  When editing this file, note
+# Note that there is an overlap between these stopwords and the terms stopped when used
+# in combination with the KuromojiPartOfSpeechStopFilter.  When editing this file, note
 # that comments are not allowed on the same line as stopwords.
 #
-# See LUCENE-3745 for frequency lists, etc. that can be useful for making your own set.
+# Also note that stopping is done in a case-insensitive manner.  Change your StopFilter
+# configuration if you need case-sensitive stopping.  Lastly, note that stopping is done
+# using the same character width as the entries in this file.  Since this StopFilter is
+# normally done after a CJKWidthFilter in your chain, you would usually want your romaji
+# entries to be in half-width and your kana entries to be in full-width.
 #
 の
 に
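
Note on the new header comment: the case and character-width remarks can be illustrated with a small standalone sketch. This is not Lucene or Solr code. It assumes that Unicode NFKC normalization is an acceptable stand-in for what CJKWidthFilter does to full-width romaji and half-width kana, and that a case-insensitive StopFilter behaves like a lowercase set lookup. The function names and the sample entries "the" and "コト" are hypothetical and chosen only for illustration; "の" and "に" come from the file shown above.

```python
import unicodedata


def load_stopwords(lines):
    """Parse stopword-file lines, skipping blanks and whole-line '#' comments.

    A trailing comment on an entry line is NOT stripped, which mirrors the
    file's warning that comments are not allowed on the same line as stopwords.
    """
    words = set()
    for line in lines:
        entry = line.strip()
        if not entry or entry.startswith("#"):
            continue
        # Lowercase once at load time so lookups can be case-insensitive.
        words.add(entry.lower())
    return words


def fold_width(token):
    """Assumed stand-in for width folding: NFKC maps full-width Latin letters
    to ASCII and half-width katakana to full-width katakana."""
    return unicodedata.normalize("NFKC", token)


def remove_stopwords(tokens, stopwords):
    """Fold width first, then drop tokens found in the stopword set."""
    kept = []
    for token in tokens:
        folded = fold_width(token)
        if folded.lower() not in stopwords:
            kept.append(folded)
    return kept


# Hypothetical file content: romaji in half-width, kana in full-width.
STOPWORD_FILE = ["#", "# sample entries", "#", "の", "に", "the", "コト"]

stop = load_stopwords(STOPWORD_FILE)
tokens = ["東京", "の", "ＴＨＥ", "ｺﾄ", "駅", "に"]
result = remove_stopwords(tokens, stop)
print(result)
```

Because folding runs before the lookup, the full-width "ＴＨＥ" and the half-width "ｺﾄ" only match when the file stores their folded forms. An entry stored in the other width would never match in this sketch, which is the situation the header comment warns about.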