mirror of https://github.com/apache/lucene.git
86 lines
3.9 KiB
Plaintext
86 lines
3.9 KiB
Plaintext
# Apache Lucene Migration Guide
|
|
|
|
## The way how number of document calculated is changed (LUCENE-6711)
|
|
The number of documents (numDocs) is used to calculate term specificity (idf) and average document length (avdl).
|
|
Prior to LUCENE-6711, collectionStats.maxDoc() was used for the statistics.
|
|
Now, collectionStats.docCount() is used whenever possible, if not maxDocs() is used.
|
|
|
|
Assume that a collection contains 100 documents, and 50 of them have "keywords" field.
|
|
In this example, maxDocs is 100 while docCount is 50 for the "keywords" field.
|
|
The total number of tokens for "keywords" field is divided by docCount to obtain avdl.
|
|
Therefore, docCount which is the total number of documents that have at least one term for the field, is a more precise metric for optional fields.
|
|
|
|
DefaultSimilarity does not leverage avdl, so this change would have relatively minor change in the result list.
|
|
Because relative idf values of terms will remain same.
|
|
However, when combined with other factors such as term frequency, relative ranking of documents could change.
|
|
Some Similarity implementations (such as the ones instantiated with NormalizationH2 and BM25) take account into avdl and would have notable change in ranked list.
|
|
Especially if you have a collection of documents with varying lengths.
|
|
Because NormalizationH2 tends to punish documents longer than avdl.
|
|
|
|
## FunctionValues.exist() Behavior Changes due to ValueSource bug fixes (LUCENE-5961)
|
|
|
|
Bugs fixed in several ValueSource functions may result in different behavior in
|
|
situations where some documents do not have values for fields wrapped in other
|
|
ValueSources. Users who want to preserve the previous behavior may need to wrap
|
|
their ValueSources in a "DefFunction" along with a ConstValueSource of "0.0".
|
|
|
|
## Removal of Filter and FilteredQuery (LUCENE-6301,LUCENE-6583)
|
|
|
|
Filter and FilteredQuery have been removed. Regular queries can be used instead
|
|
of filters as they have been optimized for the filtering case. And you can
|
|
construct a BooleanQuery with one MUST clause for the query, and one FILTER
|
|
clause for the filter in order to have similar behaviour to FilteredQuery.
|
|
|
|
## PhraseQuery, MultiPhraseQuery, and BooleanQuery made immutable (LUCENE-6531 LUCENE-7064 LUCENE-6570)
|
|
|
|
PhraseQuery, MultiPhraseQuery, and BooleanQuery are now immutable and have a
|
|
builder API to help construct them. For instance a BooleanQuery that used to
|
|
be constructed like this:
|
|
|
|
BooleanQuery bq = new BooleanQuery();
|
|
bq.add(q1, Occur.SHOULD);
|
|
bq.add(q2, Occur.SHOULD);
|
|
bq.add(q3, Occur.MUST);
|
|
bq.setMinimumNumberShouldMatch(1);
|
|
|
|
can now be constructed this way using its builder:
|
|
|
|
BooleanQuery bq = new BooleanQuery.Builder()
|
|
.add(q1, Occur.SHOULD)
|
|
.add(q2, Occur.SHOULD)
|
|
.add(q3, Occur.SHOULD)
|
|
.setMinimumNumberShouldMatch(1)
|
|
.build();
|
|
|
|
## AttributeImpl now requires that reflectWith() is implemented (LUCENE-6651)
|
|
|
|
AttributeImpl removed the default, reflection-based implementation of
|
|
reflectWith(AtrributeReflector). The method was made abstract. If you have
|
|
implemented your own attribute, make sure to add the required method sigature.
|
|
See the Javadocs for an example.
|
|
|
|
## Query.setBoost() and Query.clone() are removed (LUCENE-6590)
|
|
|
|
Query.setBoost has been removed. In order to apply a boost to a Query, you now
|
|
need to wrap it inside a BoostQuery. For instance,
|
|
|
|
Query q = ...;
|
|
float boost = ...;
|
|
q = new BoostQuery(q, boost);
|
|
|
|
would be equivalent to the following code with the old setBoost API:
|
|
|
|
Query q = ...;
|
|
float boost = ...;
|
|
q.setBoost(q.getBoost() * boost);
|
|
|
|
# PointValues replaces NumericField (LUCENE-6917)
|
|
|
|
PointValues provides faster indexing and searching, a smaller
|
|
index size, and less heap used at search time. See org.apache.lucene.index.PointValues
|
|
for an introduction.
|
|
|
|
Legacy numeric encodings from previous versions of Lucene are
|
|
deprecated as LegacyIntField, LegacyFloatField, LegacyLongField, and LegacyDoubleField,
|
|
and can be searched with LegacyNumericRangeQuery.
|