If possible, use the same JRE major version at both index and search time.
When upgrading to a different JRE major version, consider re-indexing.

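One way to act on this advice is to record the JRE major version when the index is built and check it again before searching. The sketch below is a hypothetical helper, not part of Lucene: it assumes a sidecar file named jre-version.properties (a made-up name) stored in the index directory, and it uses the standard java.specification.version system property, which reports "1.4", "1.5", "1.6", and so on for the JREs listed in this note.

```java
import java.io.File;
import java.io.FileInputStream;
import java.io.FileOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.util.Properties;

/**
 * Hypothetical helper, not part of Lucene: stores the JRE major version used at
 * index time in a sidecar file next to the index, and compares it at search time.
 */
public class JreVersionGuard {
    // Both names are assumptions made for this sketch.
    static final String FILE_NAME = "jre-version.properties";
    static final String KEY = "java.specification.version";

    /** Call after building the index: remember which JRE major version was used. */
    public static void record(File indexDir) throws IOException {
        Properties props = new Properties();
        props.setProperty(KEY, System.getProperty(KEY));
        try (OutputStream out = new FileOutputStream(new File(indexDir, FILE_NAME))) {
            props.store(out, "JRE major version used at index time");
        }
    }

    /** Call before searching: false if nothing was recorded or the versions differ. */
    public static boolean matchesCurrentJre(File indexDir) throws IOException {
        File file = new File(indexDir, FILE_NAME);
        if (!file.isFile()) {
            return false; // unknown origin: treat as a mismatch and consider re-indexing
        }
        Properties props = new Properties();
        try (InputStream in = new FileInputStream(file)) {
            props.load(in);
        }
        return System.getProperty(KEY).equals(props.getProperty(KEY));
    }
}
```

A search application could call matchesCurrentJre(indexDir) at startup and log a warning suggesting a re-index when it returns false.
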
Different JRE major versions may implement different versions of Unicode,
which can change the way some parts of Lucene treat your text.

For example: with Java 1.4, LetterTokenizer will split around the character
U+02C6 (MODIFIER LETTER CIRCUMFLEX ACCENT), but with Java 5 it will not.
This is because Java 1.4 implements Unicode 3.0, where U+02C6 is not a letter,
but Java 5 implements Unicode 4.0, where it is.

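To see why a letter-based tokenizer is sensitive to the JRE, here is a simplified sketch of LetterTokenizer-style splitting: a token is a maximal run of code points for which Character.isLetter returns true. This is not Lucene's code (the class and method names are hypothetical, and Lucene 3.0's tokenizers work on char values rather than code points); it only shows that the token boundaries come straight from the running JRE's Unicode tables.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical, simplified LetterTokenizer-style splitter (not Lucene's implementation). */
public class LetterSplitSketch {
    /** Splits text into maximal runs of letters, as decided by the running JRE. */
    public static List<String> split(String text) {
        List<String> tokens = new ArrayList<>();
        StringBuilder current = new StringBuilder();
        int i = 0;
        while (i < text.length()) {
            int cp = text.codePointAt(i);
            if (Character.isLetter(cp)) {
                current.appendCodePoint(cp);      // letter: extend the current token
            } else if (current.length() > 0) {
                tokens.add(current.toString());   // non-letter: close the current token
                current.setLength(0);
            }
            i += Character.charCount(cp);
        }
        if (current.length() > 0) {
            tokens.add(current.toString());
        }
        return tokens;
    }

    public static void main(String[] args) {
        // U+02C6 MODIFIER LETTER CIRCUMFLEX ACCENT between two ASCII letters
        String text = "a\u02C6b";
        System.out.println("java.specification.version = "
                + System.getProperty("java.specification.version"));
        System.out.println("isLetter(U+02C6) = " + Character.isLetter(0x02C6));
        System.out.println("tokens = " + split(text));
    }
}
```

On Java 5 and later this should report U+02C6 as a letter and keep "a\u02C6b" as a single token; per the example above, the same logic on Java 1.4 would have split it into "a" and "b".
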
For reference, JRE major versions with their corresponding Unicode versions:
Java 1.4, Unicode 3.0
Java 5, Unicode 4.0
Java 6, Unicode 4.0
Java 7, Unicode 6.0

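The reference table can be turned into a quick check of whether a planned JRE upgrade also changes the Unicode version. This is a hypothetical helper; it keys the table by the java.specification.version strings ("1.4", "1.5", ...) and maps Java 7 to Unicode 6.0, the version cited by the Java SE 7 java.lang.Character documentation. Versions missing from the table are conservatively treated as a change.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical lookup of the JRE-to-Unicode table, keyed by java.specification.version. */
public class UnicodeVersionTable {
    private static final Map<String, String> UNICODE_BY_JAVA_SPEC = new HashMap<>();
    static {
        UNICODE_BY_JAVA_SPEC.put("1.4", "3.0"); // Java 1.4
        UNICODE_BY_JAVA_SPEC.put("1.5", "4.0"); // Java 5
        UNICODE_BY_JAVA_SPEC.put("1.6", "4.0"); // Java 6
        UNICODE_BY_JAVA_SPEC.put("1.7", "6.0"); // Java 7
    }

    /** Returns the Unicode version for a java.specification.version, or null if not listed. */
    public static String unicodeVersionFor(String javaSpecVersion) {
        return UNICODE_BY_JAVA_SPEC.get(javaSpecVersion);
    }

    /** True if moving between the two JREs may change the Unicode version (unknown counts as a change). */
    public static boolean unicodeChanges(String fromSpec, String toSpec) {
        String from = unicodeVersionFor(fromSpec);
        String to = unicodeVersionFor(toSpec);
        return from == null || to == null || !from.equals(to);
    }
}
```

For example, unicodeChanges("1.5", "1.6") is false because Java 5 and Java 6 share Unicode 4.0, while unicodeChanges("1.4", "1.5") is true.
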
In general, whether you need to re-index depends largely on the data you are
searching and on what changed between the Unicode versions involved. For example,
if you are completely sure that your content is limited to the "Basic Latin" range
of Unicode (U+0000 to U+007F), you can safely ignore this.

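If you are not sure whether your content stays within Basic Latin, you can scan it before upgrading. The sketch below is a hypothetical helper (not a Lucene API) that returns false as soon as it sees any char above U+007F; supplementary characters are stored as surrogate pairs above that range, so they are caught as well.

```java
/** Hypothetical helper: checks whether text uses only the Basic Latin block (U+0000..U+007F). */
public class BasicLatinCheck {
    public static boolean isBasicLatin(CharSequence text) {
        for (int i = 0; i < text.length(); i++) {
            if (text.charAt(i) > 0x7F) {
                return false; // outside Basic Latin: Unicode version changes may matter
            }
        }
        return true;
    }
}
```

Running this over the stored text of your documents gives a cheap signal: if every document passes, the JRE's Unicode version should not affect how your text is tokenized.
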
Special Notes:

LUCENE 2.9 TO 3.0, JAVA 1.4 TO JAVA 5 TRANSITION

* StandardAnalyzer will return the same results under Java 5 as it did under
Java 1.4. This is because it is largely independent of the runtime JRE for
Unicode support (with the exception of lowercasing), and no changes to casing
that affect StandardAnalyzer occurred in Unicode 4.0. If you are using this
Analyzer, you are NOT affected.

* SimpleAnalyzer, StopAnalyzer, LetterTokenizer, LowerCaseFilter, and
LowerCaseTokenizer may return different results, along with many other Analyzers
and TokenStreams in Lucene's contrib area, because they rely on the JRE's
character methods (such as Character.isLetter and Character.toLowerCase). If you
are using one of these components, you may be affected.