# Apache Lucene Migration Guide
## Changed SPI lookups for codecs and analysis (LUCENE-7873) ##
Due to serious problems with context class loaders in several frameworks
(OSGI, Java 9 Jigsaw), the lookup of Codecs, PostingsFormats, DocValuesFormats
and all analysis factories was changed to only inspect the current classloader
that defined the interface class (`lucene-core.jar`). Normal applications
should not encounter any issues with that change, because the application
classloader (unnamed module in Java 9) can load all SPIs from all JARs
on the classpath.
For any code that relies on the old behaviour (e.g., certain web applications
or components in application servers) one can manually instruct the Lucene
SPI implementation to also inspect the context classloader. To do this,
add this code to the early startup phase of your application before any
Apache Lucene component is used:
  ClassLoader cl = Thread.currentThread().getContextClassLoader();
  // Codecs:
  PostingsFormat.reloadPostingsFormats(cl);
  DocValuesFormat.reloadDocValuesFormats(cl);
  Codec.reloadCodecs(cl);
  // Analysis:
  CharFilterFactory.reloadCharFilters(cl);
  TokenFilterFactory.reloadTokenFilters(cl);
  TokenizerFactory.reloadTokenizers(cl);
This code will reload all service providers from the given class loader
(in our case the context class loader). Of course, instead of specifying
the context class loader, it is recommended to use the application's main
class loader or the module class loader.
If you are migrating your project to Java 9 Jigsaw module system, keep in mind
that Lucene currently does not yet support `module-info.java` declarations of
service provider impls (`provides` statement). It is therefore recommended
to keep all of Lucene in one Uber-Module and not try to split Lucene into
several modules. As soon as Lucene moves to Java 9 as its minimum
requirement, we will work on improving this.
For OSGI, the same applies. You have to create a bundle with all of Lucene for
SPI to work correctly.
## CustomAnalyzer resources (LUCENE-7883) ##
Lucene no longer uses the context class loader when resolving resources in
CustomAnalyzer or ClasspathResourceLoader. Resources are only resolved
against Lucene's class loader by default. To resolve them against a custom
class loader, use the CustomAnalyzer builder method that accepts a ResourceLoader.
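For example, a minimal sketch (the application class and the stop words file
are hypothetical names) that resolves analysis resources against the
application's class loader via a ClasspathResourceLoader:
  ClassLoader appLoader = MyApplication.class.getClassLoader();
  CustomAnalyzer analyzer = CustomAnalyzer.builder(new ClasspathResourceLoader(appLoader))
      .withTokenizer("standard")
      .addTokenFilter("lowercase")
      .addTokenFilter("stop", "words", "stopwords.txt")
      .build();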
## Query.hashCode and Query.equals are now abstract methods (LUCENE-7277)
Any custom Query subclass must now implement equals and hashCode, defining the
equivalence relationship according to the subclass's own details. See the code
patterns used in existing core Lucene query classes for details.
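As a sketch, the usual pattern builds on the protected helpers Query#sameClassAs
and Query#classHash; the query class and its term field here are hypothetical:
  @Override
  public boolean equals(Object other) {
    return sameClassAs(other) && term.equals(((MyTermQuery) other).term);
  }
  @Override
  public int hashCode() {
    return 31 * classHash() + term.hashCode();
  }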
## CompressionTools removed (LUCENE-7322)
Per-field compression has been superseded by codec-level compression, which has
the benefit of being able to compress several fields, or even documents at once,
yielding better compression ratios. In case you would still like to compress on
top of the codec, you can do it on the application side by using the utility
classes from the java.util.zip package.
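For example, a sketch that deflates a field value before storing it and inflates
it again after retrieval (field and variable names are hypothetical):
  ByteArrayOutputStream bytes = new ByteArrayOutputStream();
  try (OutputStream out = new DeflaterOutputStream(bytes)) {
    out.write(value.getBytes(StandardCharsets.UTF_8));
  }
  doc.add(new StoredField("content_compressed", bytes.toByteArray()));
  // after retrieval, wrap the stored bytes in an InflaterInputStream to decompress
  BytesRef stored = searcher.doc(docId).getBinaryValue("content_compressed");
  InputStream decompressed = new InflaterInputStream(
      new ByteArrayInputStream(stored.bytes, stored.offset, stored.length));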
## Explanation.toHtml() removed (LUCENE-7360)
Clients wishing to render Explanations as HTML should implement their own
utilities for this.
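A small recursive sketch over Explanation#getValue, Explanation#getDescription
and Explanation#getDetails is usually sufficient (HTML-escaping of the
description is omitted here):
  static String toHtml(Explanation expl) {
    StringBuilder sb = new StringBuilder("<ul><li>");
    sb.append(expl.getValue()).append(" = ").append(expl.getDescription());
    for (Explanation detail : expl.getDetails()) {
      sb.append(toHtml(detail));
    }
    return sb.append("</li></ul>").toString();
  }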
## Similarity.coord and BooleanQuery.disableCoord removed (LUCENE-7369)
Coordination factors were a workaround for the fact that the ClassicSimilarity
does not have strong enough term frequency saturation. This causes disjunctions
to get better scores on documents that have many occurrences of a few query
terms than on documents that match most clauses, which is usually
undesirable. The new BM25Similarity does not suffer from this problem since it
has better saturation for the contribution of the term frequency so the coord
factors have been removed from scores. Things now work as if coords were always
disabled when constructing boolean queries.
## Weight.getValueForNormalization() and Weight.normalize() removed (LUCENE-7368)
Query normalization's goal was to make scores comparable across queries, which
was only implemented by the ClassicSimilarity. Since ClassicSimilarity is not
the default similarity anymore, this functionality has been removed. Boosts are
now propagated through Query#createWeight.
## AnalyzingQueryParser removed (LUCENE-7355)
The functionality of AnalyzingQueryParser has been folded into the classic
QueryParser, which now passes terms through Analyzer#normalize when generating
queries.
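For example (the field name is hypothetical), wildcard, fuzzy, prefix and range
terms are now normalized by the analyzer before the query is built:
  QueryParser parser = new QueryParser("body", new StandardAnalyzer());
  Query query = parser.parse("Wildca*");  // the prefix is lowercased by Analyzer#normalize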
## CommonQueryParserConfiguration.setLowerCaseExpandedTerms removed (LUCENE-7355)
This option has been removed as expanded terms are now normalized through
Analyzer#normalize.
## Cache key and close listener refactoring (LUCENE-7410)
The way to access cache keys and add close listeners has been refactored in
order to be less trappy. You should now use IndexReader.getReaderCacheHelper()
to manage caches that take deleted docs and doc values updates into
account, and LeafReader.getCoreCacheHelper() to manage per-segment caches that
do not take deleted docs and doc values updates into account.
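A sketch of a per-segment cache keyed on the new helper (the cache map and
computeValue are hypothetical application code):
  IndexReader.CacheHelper helper = leafReader.getCoreCacheHelper();
  if (helper != null) {  // readers that do not support caching return null
    Object value = cache.computeIfAbsent(helper.getKey(), key -> {
      helper.addClosedListener(k -> cache.remove(k));  // evict when the segment core is closed
      return computeValue(leafReader);
    });
  }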
## Index-time boosts removal (LUCENE-6819)
Index-time boosts are not supported anymore. As a replacement, index-time
scoring factors should be indexed in a doc value field and combined with the
score at query time using FunctionScoreQuery for instance.
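A sketch, assuming a release that provides the FunctionScoreQuery.boostByValue
helper (older releases can achieve the same with the FunctionScoreQuery
constructor and a custom DoubleValuesSource); field names are hypothetical:
  // index time: store the per-document factor as a doc value instead of a field boost
  doc.add(new NumericDocValuesField("boost", 3L));
  // query time: multiply the query score by that factor
  Query boosted = FunctionScoreQuery.boostByValue(
      new TermQuery(new Term("title", "lucene")),
      DoubleValuesSource.fromLongField("boost"));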
## Grouping collector refactoring (LUCENE-7701)
Groups are now defined by GroupSelector classes, making it easier to define new
types of groups. Rather than having term or function specific collection
classes, FirstPassGroupingCollector, AllGroupsCollector and
AllGroupHeadsCollector are now concrete classes taking a GroupSelector.
SecondPassGroupingCollector is no longer specifically aimed at
collecting TopDocs for each group, but instead takes a GroupReducer that will
perform any type of reduction on the top groups collected on a first-pass. To
reproduce the old behaviour of SecondPassGroupingCollector, you should instead
use TopGroupsCollector.
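A sketch of the two-pass flow with a term-based selector; the field name is
hypothetical and the TopGroupsCollector arguments mirror those of the old
SecondPassGroupingCollector, so check them against your version:
  FirstPassGroupingCollector<BytesRef> firstPass =
      new FirstPassGroupingCollector<>(new TermGroupSelector("category"), Sort.RELEVANCE, 10);
  searcher.search(query, firstPass);
  Collection<SearchGroup<BytesRef>> topGroups = firstPass.getTopGroups(0, false);
  TopGroupsCollector<BytesRef> secondPass = new TopGroupsCollector<>(
      new TermGroupSelector("category"), topGroups, Sort.RELEVANCE, Sort.RELEVANCE,
      5, true, false, false);  // maxDocsPerGroup, getScores, getMaxScores, fillSortFields
  searcher.search(query, secondPass);
  TopGroups<BytesRef> result = secondPass.getTopGroups(0);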
## Removed legacy numerics (LUCENE-7850)
Support for legacy numerics has been removed since legacy numerics had been
deprecated since Lucene 6.0. Points should be used instead, see
org.apache.lucene.index.PointValues for an introduction.
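For example, a hypothetical "timestamp" field that used to be indexed with
legacy numerics can be indexed and range-queried with points instead:
  doc.add(new LongPoint("timestamp", 1499990400000L));  // was: LegacyLongField
  Query range = LongPoint.newRangeQuery("timestamp", 0L, 1499990400000L);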
## TopDocs.totalHits is now a long (LUCENE-7872)
TopDocs.totalHits is now a long so that TopDocs instances can be used to
represent top hits that have more than 2B matches. This is necessary for the
case that multiple TopDocs instances are merged together with TopDocs#merge as
they might have more than 2B matches in total. However TopDocs instances
returned by IndexSearcher will still have a total number of hits which is less
than 2B since Lucene indexes are still bound to at most 2B documents, so it
can safely be cast to an int in that case.
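A sketch of the narrowing for the single-index case:
  TopDocs topDocs = searcher.search(query, 10);
  int hitCount = Math.toIntExact(topDocs.totalHits);  // throws if the value ever exceeds Integer.MAX_VALUE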
## PrefixAwareTokenFilter and PrefixAndSuffixAwareTokenFilter removed (LUCENE-7877)
Instead use ConcatenatingTokenStream, which will allow for the use of custom
attributes.
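A sketch of the replacement; the two tokenizers stand in for whatever previously
produced the prefix and main streams:
  Tokenizer prefix = new WhitespaceTokenizer();
  prefix.setReader(new StringReader("prefix"));
  Tokenizer main = new WhitespaceTokenizer();
  main.setReader(new StringReader("main token stream"));
  TokenStream concatenated = new ConcatenatingTokenStream(prefix, main);  // emits prefix tokens, then main tokens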