lucene/lucene/MIGRATE.txt

81 lines
3.6 KiB
Plaintext

# Apache Lucene Migration Guide
## The way how number of document calculated is changed (LUCENE-6711)
The number of documents (numDocs) is used to calculate term specificity (idf) and average document length (avdl).
Prior to LUCENE-6711, collectionStats.maxDoc() was used for the statistics.
Now, collectionStats.docCount() is used whenever possible, if not maxDocs() is used.
Assume that a collection contains 100 documents, and 50 of them have "keywords" field.
In this example, maxDocs is 100 while docCount is 50 for the "keywords" field.
The total number of tokens for "keywords" field is divided by docCount to obtain avdl.
Therefore, docCount which is the total number of documents that have at least one term for the field, is a more precise metric for optional fields.
DefaultSimilarity does not leverage avdl, so this change would have relatively minor change in the result list.
Because relative idf values of terms will remain same.
However, when combined with other factors such as term frequency, relative ranking of documents could change.
Some Similarity implementations (such as the ones instantiated with NormalizationH2 and BM25) take account into avdl and would have notable change in ranked list.
Especially if you have a collection of documents with varying lengths.
Because NormalizationH2 tends to punish documents longer than avdl.
## Separation of IndexDocument and StoredDocument (LUCENE-3312)
The API of oal.document was restructured to differentiate between stored
documents and indexed documents. IndexReader.document(int) now returns
StoredDocument instead of Document. In most cases a simple replacement
of the return type is enough to upgrade.
## FunctionValues.exist() Behavior Changes due to ValueSource bug fixes (LUCENE-5961)
Bugs fixed in several ValueSource functions may result in different behavior in
situations where some documents do not have values for fields wrapped in other
ValueSources. Users who want to preserve the previous behavior may need to wrap
their ValueSources in a "DefFunction" along with a ConstValueSource of "0.0".
## Removal of FilteredQuery (LUCENE-6583)
FilteredQuery has been removed. Instead, you can construct a BooleanQuery with
one MUST clause for the query, and one FILTER clause for the filter.
## PhraseQuery and BooleanQuery made immutable (LUCENE-6531 LUCENE-6570)
PhraseQuery and BooleanQuery are now immutable and have a builder API to help
construct them. For instance a BooleanQuery that used to be constructed like
this:
BooleanQuery bq = new BooleanQuery();
bq.add(q1, Occur.SHOULD);
bq.add(q2, Occur.SHOULD);
bq.add(q3, Occur.MUST);
bq.setMinimumNumberShouldMatch(1);
can now be constructed this way using its builder:
BooleanQuery bq = new BooleanQuery.Builder()
.add(q1, Occur.SHOULD)
.add(q2, Occur.SHOULD)
.add(q3, Occur.SHOULD)
.setMinimumNumberShouldMatch(1)
.build();
## AttributeImpl now requires that reflectWith() is implemented (LUCENE-6651)
AttributeImpl removed the default, reflection-based implementation of
reflectWith(AtrributeReflector). The method was made abstract. If you have
implemented your own attribute, make sure to add the required method sigature.
See the Javadocs for an example.
## Query.setBoost() and Query.clone() are removed (LUCENE-6590)
Query.setBoost has been removed. In order to apply a boost to a Query, you now
need to wrap it inside a BoostQuery. For instance,
Query q = ...;
float boost = ...;
q = new BoostQuery(q, boost);
would be equivalent to the following code with the old setBoost API:
Query q = ...;
float boost = ...;
q.setBoost(q.getBoost() * boost);