Revert "LUCENE-8014: Remove deprecated SimScorer methods"

Reverting to fix test failures

This reverts commit 946ec9d5b9.
Alan Woodward 2017-11-10 09:02:03 +00:00
parent a43c318a51
commit 764abcb31a
18 changed files with 324 additions and 18 deletions

@ -20,9 +20,6 @@ API Changes
* LUCENE-8038: Deprecated PayloadScoreQuery constructors have been removed (Alan
Woodward)
* LUCENE-8014: Similarity.computeSlopFactor() and
Similarity.computePayloadFactor() have been removed (Alan Woodward)
Changes in Runtime Behavior
* LUCENE-7837: Indices that were created before the previous major version

@ -1,8 +1,146 @@
# Apache Lucene Migration Guide
## Similarity.SimScorer.computeXXXFactor methods removed (LUCENE-8014) ##
SpanQuery and PhraseQuery now always calculate their slops as (1.0 / (1.0 +
distance)). Payload factor calculation is performed by PayloadDecoder in the
queries module.
## Changed SPI lookups for codecs and analysis (LUCENE-7873) ##
Due to serious problems with context class loaders in several frameworks
(OSGI, Java 9 Jigsaw), the lookup of Codecs, PostingsFormats, DocValuesFormats
and all analysis factories was changed to only inspect the current classloader
that defined the interface class (`lucene-core.jar`). Normal applications
should not encounter any issues with that change, because the application
classloader (unnamed module in Java 9) can load all SPIs from all JARs
on the classpath.
For any code that relies on the old behaviour (e.g., certain web applications
or components in application servers) one can manually instruct the Lucene
SPI implementation to also inspect the context classloader. To do this,
add this code to the early startup phase of your application before any
Apache Lucene component is used:
    ClassLoader cl = Thread.currentThread().getContextClassLoader();
    // Codecs:
    PostingsFormat.reloadPostingsFormats(cl);
    DocValuesFormat.reloadDocValuesFormats(cl);
    Codec.reloadCodecs(cl);
    // Analysis:
    CharFilterFactory.reloadCharFilters(cl);
    TokenFilterFactory.reloadTokenFilters(cl);
    TokenizerFactory.reloadTokenizers(cl);
This code will reload all service providers from the given class loader
(in our case the context class loader). Of course, instead of specifying
the context class loader, it is recommended to use the application's main
class loader or the module class loader.
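For background, the class-loader dependence described above can be reproduced with the JDK's own `java.util.ServiceLoader`. The sketch below is only an illustration of the general SPI mechanism, not Lucene's internal loader (the reload methods shown above remain the way to change Lucene's lookup); `GreetingService` and `SpiLookup` are hypothetical names.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.ServiceLoader;

// Hypothetical service interface standing in for an SPI type such as a codec.
interface GreetingService {
  String greet(String name);
}

final class SpiLookup {
  // Collects every provider of GreetingService visible to the given class loader.
  // ServiceLoader reads META-INF/services/<interface-name> through that loader, so a
  // provider JAR visible only to the context class loader is found only if that
  // loader is passed in explicitly.
  static List<GreetingService> providers(ClassLoader loader) {
    List<GreetingService> found = new ArrayList<>();
    for (GreetingService s : ServiceLoader.load(GreetingService.class, loader)) {
      found.add(s);
    }
    return found;
  }

  public static void main(String[] args) {
    ClassLoader cl = Thread.currentThread().getContextClassLoader();
    // No provider is registered anywhere on the classpath here, so this prints 0.
    System.out.println(providers(cl).size());
  }
}
```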
If you are migrating your project to the Java 9 Jigsaw module system, keep in
mind that Lucene does not yet support `module-info.java` declarations of
service provider implementations (`provides` statements). It is therefore
recommended to keep all of Lucene in one uber-module and not to split Lucene
into several modules. Once Lucene requires Java 9 as a minimum, we will work
on improving this.
For OSGI, the same applies. You have to create a bundle with all of Lucene for
SPI to work correctly.
## CustomAnalyzer resources (LUCENE-7883) ##
Lucene no longer uses the context class loader when resolving resources in
CustomAnalyzer or ClassPathResourceLoader. Resources are only resolved
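against Lucene's class loader by default. To resolve them against a custom
class loader, use the builder method that accepts a ResourceLoader (for
example a ClassPathResourceLoader created with that class loader).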
## Query.hashCode and Query.equals are now abstract methods (LUCENE-7277)
Any custom Query subclass must implement equals() and hashCode() according to
its own fields. See the code patterns used in existing core Lucene query
classes for details.
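A minimal sketch of the usual pattern, using a hypothetical `BaseQuery` stand-in rather than Lucene's `Query` so that it compiles without Lucene: compare exact classes first, then every field that affects matching or scoring, and seed the hash with a per-class value. The helper names `sameClassAs` and `classHash` mirror helpers used by core Lucene queries; check the `Query` javadocs of your version before relying on them.

```java
import java.util.Objects;

// Stand-in for an abstract query base class that forces subclasses to define equality.
abstract class BaseQuery {
  @Override public abstract boolean equals(Object other);
  @Override public abstract int hashCode();

  // Exact-class check, so a subclass instance never equals a parent-class instance.
  protected final boolean sameClassAs(Object other) {
    return other != null && getClass() == other.getClass();
  }

  // Per-class hash seed based on the class name, stable across JVM runs.
  protected final int classHash() {
    return getClass().getName().hashCode();
  }
}

// Hypothetical query matching one term in one field.
final class FieldTermQuery extends BaseQuery {
  private final String field;
  private final String term;

  FieldTermQuery(String field, String term) {
    this.field = Objects.requireNonNull(field);
    this.term = Objects.requireNonNull(term);
  }

  @Override
  public boolean equals(Object other) {
    return sameClassAs(other)
        && field.equals(((FieldTermQuery) other).field)
        && term.equals(((FieldTermQuery) other).term);
  }

  @Override
  public int hashCode() {
    // Combine the class seed with every field that equals() compares.
    return 31 * (31 * classHash() + field.hashCode()) + term.hashCode();
  }
}
```

Two queries that compare equal must also hash equally, otherwise query caches keyed on them can miss or collide.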
## CompressionTools removed (LUCENE-7322)
Per-field compression has been superseded by codec-level compression, which has
the benefit of being able to compress several fields, or even documents at once,
yielding better compression ratios. If you still want to compress on
top of the codec, you can do so on the application side using the utility
classes from the java.util.zip package.
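A minimal sketch of application-side compression with `java.util.zip`, assuming the compressed bytes are then stored in a binary stored field and decompressed after retrieval. The class name `FieldCompression` is hypothetical and the 1 KB buffer size is arbitrary.

```java
import java.io.ByteArrayOutputStream;
import java.util.zip.DataFormatException;
import java.util.zip.Deflater;
import java.util.zip.Inflater;

final class FieldCompression {
  // Deflate-compresses a field value before it is handed to the index.
  static byte[] compress(byte[] input) {
    Deflater deflater = new Deflater(Deflater.BEST_COMPRESSION);
    try {
      deflater.setInput(input);
      deflater.finish();
      ByteArrayOutputStream out = new ByteArrayOutputStream();
      byte[] buf = new byte[1024];
      while (!deflater.finished()) {
        int n = deflater.deflate(buf);
        out.write(buf, 0, n);
      }
      return out.toByteArray();
    } finally {
      deflater.end(); // release native zlib resources
    }
  }

  // Reverses compress(); throws if the bytes are not a complete zlib stream.
  static byte[] decompress(byte[] compressed) throws DataFormatException {
    Inflater inflater = new Inflater();
    try {
      inflater.setInput(compressed);
      ByteArrayOutputStream out = new ByteArrayOutputStream();
      byte[] buf = new byte[1024];
      while (!inflater.finished()) {
        int n = inflater.inflate(buf);
        if (n == 0 && (inflater.needsInput() || inflater.needsDictionary())) {
          throw new DataFormatException("truncated or dictionary-based input");
        }
        out.write(buf, 0, n);
      }
      return out.toByteArray();
    } finally {
      inflater.end();
    }
  }
}
```

Note that per-value compression like this usually compresses worse than the codec's block compression, which sees many fields and documents at once.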
## Explanation.toHtml() removed (LUCENE-7360)
Clients wishing to render Explanations as HTML should implement their own
utilities for this.
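A minimal sketch of such a utility. To stay self-contained it walks a hypothetical `Expl` tree (value, description, details); adapting it to Lucene's `Explanation` should mostly mean calling `getValue()`, `getDescription()` and `getDetails()` instead of reading the fields below.

```java
import java.util.List;

// Hypothetical stand-in for an explanation tree node.
final class Expl {
  final float value;
  final String description;
  final List<Expl> details;

  Expl(float value, String description, List<Expl> details) {
    this.value = value;
    this.description = description;
    this.details = details;
  }
}

final class ExplHtml {
  // Renders the tree as nested <ul>/<li> elements.
  static String toHtml(Expl e) {
    StringBuilder sb = new StringBuilder("<ul>\n");
    append(sb, e);
    return sb.append("</ul>\n").toString();
  }

  private static void append(StringBuilder sb, Expl e) {
    sb.append("<li>").append(e.value).append(" = ").append(escape(e.description));
    if (!e.details.isEmpty()) {
      sb.append("\n<ul>\n");
      for (Expl d : e.details) {
        append(sb, d); // recurse into sub-explanations
      }
      sb.append("</ul>\n");
    }
    sb.append("</li>\n");
  }

  // Minimal escaping so descriptions containing query syntax cannot inject markup.
  private static String escape(String s) {
    return s.replace("&", "&amp;").replace("<", "&lt;").replace(">", "&gt;").replace("\"", "&quot;");
  }
}
```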
## Similarity.coord and BooleanQuery.disableCoord removed (LUCENE-7369)
Coordination factors were a workaround for the fact that the ClassicSimilarity
does not have strong enough term frequency saturation. This causes disjunctions
to get better scores on documents that have many occurrences of a few query
terms than on documents that match most clauses, which is undesirable most of
the time. The new BM25Similarity does not suffer from this problem since it
has better saturation for the contribution of the term frequency, so the coord
factors have been removed from scores. Things now work as if coords were always
disabled when constructing boolean queries.
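To make the saturation argument concrete, the sketch below compares the textbook term-frequency factors: `sqrt(freq)` for the classic TF-IDF model and the BM25 factor `freq * (k1 + 1) / (freq + k1 * (1 - b + b * dl / avgdl))`, simplified by assuming a document of average length (`dl == avgdl`) and the common default `k1 = 1.2`. These are the standard formulas, not a copy of Lucene's scoring code; the class name is hypothetical.

```java
final class TfSaturation {
  // Classic TF-IDF term-frequency factor: grows without bound as freq grows.
  static double classicTf(double freq) {
    return Math.sqrt(freq);
  }

  // BM25 term-frequency factor for a document of average length, where the
  // length-normalization term k1 * (1 - b + b * dl / avgdl) reduces to k1.
  static double bm25Tf(double freq, double k1) {
    return freq * (k1 + 1) / (freq + k1);
  }

  public static void main(String[] args) {
    for (int freq : new int[] {1, 4, 16, 100}) {
      System.out.printf("freq=%d classic=%.2f bm25=%.2f%n",
          freq, classicTf(freq), bm25Tf(freq, 1.2));
    }
  }
}
```

For freq = 100 the classic factor is 10 while the BM25 factor is about 2.17, and the BM25 factor can never exceed k1 + 1 = 2.2, so many occurrences of a single term buy far less than matching additional clauses.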
## Weight.getValueForNormalization() and Weight.normalize() removed (LUCENE-7368)
Query normalization's goal was to make scores comparable across queries, which
was only implemented by the ClassicSimilarity. Since ClassicSimilarity is not
the default similarity anymore, this functionality has been removed. Boosts are
now propagated through Query#createWeight.
## AnalyzingQueryParser removed (LUCENE-7355)
The functionality of AnalyzingQueryParser has been folded into the classic
QueryParser, which now passes terms through Analyzer#normalize when generating
queries.
## CommonQueryParserConfiguration.setLowerCaseExpandedTerms removed (LUCENE-7355)
This option has been removed as expanded terms are now normalized through
Analyzer#normalize.
## Cache key and close listener refactoring (LUCENE-7410)
The way to access cache keys and add close listeners has been refactored in
order to be less trappy. You should now use IndexReader.getReaderCacheHelper()
to manage caches that take deleted docs and doc values updates into
account, and LeafReader.getCoreCacheHelper() to manage per-segment caches that
do not take deleted docs and doc values updates into account.
## Index-time boosts removal (LUCENE-6819)
Index-time boosts are not supported anymore. As a replacement, index-time
scoring factors should be indexed in a doc value field and combined with the
score at query time using FunctionScoreQuery for instance.
## Grouping collector refactoring (LUCENE-7701)
Groups are now defined by GroupSelector classes, making it easier to define new
types of groups. Rather than having term or function specific collection
classes, FirstPassGroupingCollector, AllGroupsCollector and
AllGroupHeadsCollector are now concrete classes taking a GroupSelector.
SecondPassGroupingCollector is no longer specifically aimed at
collecting TopDocs for each group, but instead takes a GroupReducer that will
perform any type of reduction on the top groups collected on a first-pass. To
reproduce the old behaviour of SecondPassGroupingCollector, you should instead
use TopGroupsCollector.
## Removed legacy numerics (LUCENE-7850)
Support for legacy numerics has been removed, as they had been deprecated
since Lucene 6.0. Points should be used instead; see
org.apache.lucene.index.PointValues for an introduction.
## TopDocs.totalHits is now a long (LUCENE-7872)
TopDocs.totalHits is now a long so that TopDocs instances can be used to
represent top hits that have more than 2B matches. This is necessary for the
case where multiple TopDocs instances are merged together with TopDocs#merge, as
they might have more than 2B matches in total. However, TopDocs instances
returned by IndexSearcher will still have a total number of hits which is less
than 2B since Lucene indexes are still bound to at most 2B documents, so it
can safely be cast to an int in that case.
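A small sketch of why a merged total needs a `long`, plus a checked narrowing for the single-searcher case. The class and method names are hypothetical; `Math.toIntExact` is one way to make the "safe to cast" assumption explicit, since it throws `ArithmeticException` instead of silently wrapping.

```java
final class HitCounts {
  // Sums per-shard hit counts in a long: the merged total can exceed Integer.MAX_VALUE.
  static long mergedTotal(long... perShardTotals) {
    long total = 0;
    for (long t : perShardTotals) {
      total += t;
    }
    return total;
  }

  // For a single index the total is bounded by the ~2B-document limit, so an int
  // view is safe; toIntExact fails loudly if that assumption is ever violated.
  static int asInt(long totalHits) {
    return Math.toIntExact(totalHits);
  }

  public static void main(String[] args) {
    long merged = mergedTotal(1_500_000_000L, 1_500_000_000L);
    System.out.println(merged);       // 3000000000
    System.out.println((int) merged); // -1294967296: a plain cast wraps around
  }
}
```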
## PrefixAwareTokenFilter and PrefixAndSuffixAwareTokenFilter removed (LUCENE-7877)
Instead, use ConcatenatingTokenStream, which allows for the use of custom
attributes.

@ -31,11 +31,11 @@ import java.util.concurrent.ExecutorService;
import java.util.concurrent.Future;
import org.apache.lucene.document.Document;
import org.apache.lucene.index.DirectoryReader;
import org.apache.lucene.index.DirectoryReader; // javadocs
import org.apache.lucene.index.FieldInvertState;
import org.apache.lucene.index.IndexReader;
import org.apache.lucene.index.IndexReaderContext;
import org.apache.lucene.index.IndexWriter;
import org.apache.lucene.index.IndexWriter; // javadocs
import org.apache.lucene.index.LeafReaderContext;
import org.apache.lucene.index.MultiFields;
import org.apache.lucene.index.ReaderUtil;
@ -45,8 +45,9 @@ import org.apache.lucene.index.TermContext;
import org.apache.lucene.index.Terms;
import org.apache.lucene.search.similarities.BM25Similarity;
import org.apache.lucene.search.similarities.Similarity;
import org.apache.lucene.store.NIOFSDirectory;
import org.apache.lucene.store.NIOFSDirectory; // javadoc
import org.apache.lucene.util.Bits;
import org.apache.lucene.util.BytesRef;
import org.apache.lucene.util.ThreadInterruptedException;
/** Implements search over a single IndexReader.
@ -93,10 +94,22 @@ public class IndexSearcher {
@Override
public SimScorer simScorer(SimWeight weight, LeafReaderContext context) throws IOException {
return new SimScorer() {
@Override
public float score(int doc, float freq) {
return 0f;
}
@Override
public float computeSlopFactor(int distance) {
return 1f;
}
@Override
public float computePayloadFactor(int doc, int start, int end, BytesRef payload) {
return 1f;
}
};
}

@ -107,7 +107,7 @@ final class SloppyPhraseScorer extends Scorer {
}
if (pp.position > next) { // done minimizing current match-length
if (matchLength <= slop) {
freq += (1.0 / (1.0 + matchLength)); // score match
freq += docScorer.computeSlopFactor(matchLength); // score match
numMatches++;
if (!needsScores) {
return freq;
@ -125,7 +125,7 @@ final class SloppyPhraseScorer extends Scorer {
}
}
if (matchLength <= slop) {
freq += (1.0 / (1.0 + matchLength)); // score match
freq += docScorer.computeSlopFactor(matchLength); // score match
numMatches++;
}
return freq;
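In the two SloppyPhraseScorer hunks above, each sloppy match is weighted by its match length. The hard-coded `1.0 / (1.0 + matchLength)` and the restored `computeSlopFactor` call yield the same value for the default BM25Similarity (`sloppyFreq`) and SimilarityBase implementations, though other Similarities (BooleanSimilarity returns 1) differ. A minimal standalone sketch of that accumulation, with hypothetical names and without the position bookkeeping the real scorer performs:

```java
final class SloppyFreq {
  // Weight of one sloppy match with the given match length (edit distance):
  // an exact match (0) counts fully, longer matches count progressively less.
  static float slopFactor(int distance) {
    return 1.0f / (1.0f + distance);
  }

  // Accumulates a document's sloppy frequency, ignoring matches longer than slop.
  static float sloppyFreq(int[] matchLengths, int slop) {
    float freq = 0f;
    for (int len : matchLengths) {
      if (len <= slop) {
        freq += slopFactor(len);
      }
    }
    return freq;
  }
}
```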

@ -73,6 +73,11 @@ public class BM25Similarity extends Similarity {
return (float) Math.log(1 + (docCount - docFreq + 0.5D)/(docFreq + 0.5D));
}
/** Implemented as <code>1 / (distance + 1)</code>. */
protected float sloppyFreq(int distance) {
return 1.0f / (distance + 1);
}
/** The default implementation returns <code>1</code> */
protected float scorePayload(int doc, int start, int end, BytesRef payload) {
return 1;
@ -262,6 +267,15 @@ public class BM25Similarity extends Similarity {
}
}
@Override
public float computeSlopFactor(int distance) {
return sloppyFreq(distance);
}
@Override
public float computePayloadFactor(int doc, int start, int end, BytesRef payload) {
return scorePayload(doc, start, end, payload);
}
}
/** Collection statistics for the BM25 model. */

@ -23,6 +23,7 @@ import org.apache.lucene.index.LeafReaderContext;
import org.apache.lucene.search.CollectionStatistics;
import org.apache.lucene.search.Explanation;
import org.apache.lucene.search.TermStatistics;
import org.apache.lucene.util.BytesRef;
/**
* Simple similarity that gives terms a score that is equal to their query
@ -79,6 +80,15 @@ public class BooleanSimilarity extends Similarity {
queryBoostExpl);
}
@Override
public float computeSlopFactor(int distance) {
return 1f;
}
@Override
public float computePayloadFactor(int doc, int start, int end, BytesRef payload) {
return 1f;
}
};
}

@ -26,6 +26,7 @@ import org.apache.lucene.index.LeafReaderContext;
import org.apache.lucene.search.CollectionStatistics;
import org.apache.lucene.search.Explanation;
import org.apache.lucene.search.TermStatistics;
import org.apache.lucene.util.BytesRef;
/**
* Implements the CombSUM method for combining evidence from multiple
@ -91,6 +92,15 @@ public class MultiSimilarity extends Similarity {
return Explanation.match(score(doc, freq.getValue()), "sum of:", subs);
}
@Override
public float computeSlopFactor(int distance) {
return subScorers[0].computeSlopFactor(distance);
}
@Override
public float computePayloadFactor(int doc, int start, int end, BytesRef payload) {
return subScorers[0].computePayloadFactor(doc, start, end, payload);
}
}
static class MultiStats extends SimWeight {

@ -29,6 +29,7 @@ import org.apache.lucene.search.PhraseQuery;
import org.apache.lucene.search.TermQuery;
import org.apache.lucene.search.TermStatistics;
import org.apache.lucene.search.spans.SpanQuery;
import org.apache.lucene.util.BytesRef;
import org.apache.lucene.util.SmallFloat;
/**
@ -158,6 +159,14 @@ public abstract class Similarity {
*/
public abstract float score(int doc, float freq) throws IOException;
/** Computes the amount of a sloppy phrase match, based on an edit distance. */
@Deprecated
public abstract float computeSlopFactor(int distance);
/** Calculate a scoring factor based on the data in the payload. */
@Deprecated
public abstract float computePayloadFactor(int doc, int start, int end, BytesRef payload);
/**
* Explain the score for a single document
* @param doc document id within the inverted index segment

@ -27,6 +27,7 @@ import org.apache.lucene.index.NumericDocValues;
import org.apache.lucene.search.CollectionStatistics;
import org.apache.lucene.search.Explanation;
import org.apache.lucene.search.TermStatistics;
import org.apache.lucene.util.BytesRef;
import org.apache.lucene.util.SmallFloat;
/**
@ -254,5 +255,14 @@ public abstract class SimilarityBase extends Similarity {
return SimilarityBase.this.explain(stats, doc, freq, getLengthValue(doc));
}
@Override
public float computeSlopFactor(int distance) {
return 1.0f / (distance + 1);
}
@Override
public float computePayloadFactor(int doc, int start, int end, BytesRef payload) {
return 1f;
}
}
}

@ -589,6 +589,16 @@ public abstract class TFIDFSimilarity extends Similarity {
return raw * normValue; // normalize for field
}
}
@Override
public float computeSlopFactor(int distance) {
return sloppyFreq(distance);
}
@Override
public float computePayloadFactor(int doc, int start, int end, BytesRef payload) {
return scorePayload(doc, start, end, payload);
}
@Override
public Explanation explain(int doc, Explanation freq) throws IOException {

@ -106,7 +106,7 @@ public class SpanScorer extends Scorer {
freq = 1;
return;
}
freq += (1.0 / (1.0 + spans.width()));
freq += docScorer.computeSlopFactor(spans.width());
spans.doCurrentSpans();
prevStartPos = startPos;
prevEndPos = endPos;

@ -22,6 +22,7 @@ import java.io.IOException;
import org.apache.lucene.search.DocIdSetIterator;
import org.apache.lucene.search.Scorer;
import org.apache.lucene.search.TwoPhaseIterator;
import org.apache.lucene.search.similarities.Similarity.SimScorer;
/** Iterates through combinations of start/end positions per-doc.
* Each start/end position represents a range of term positions within the current document.
@ -52,7 +53,8 @@ public abstract class Spans extends DocIdSetIterator {
public abstract int endPosition();
/**
* Return the width of the match, which is typically used to sloppy freq. It is only legal
* Return the width of the match, which is typically used to compute
* the {@link SimScorer#computeSlopFactor(int) slop factor}. It is only legal
* to call this method when the iterator is on a valid doc ID and positioned.
* The return value must be positive, and lower values means that the match is
* better.

@ -31,6 +31,7 @@ import org.apache.lucene.search.CollectionStatistics;
import org.apache.lucene.search.TermStatistics;
import org.apache.lucene.search.similarities.Similarity;
import org.apache.lucene.store.Directory;
import org.apache.lucene.util.BytesRef;
import org.apache.lucene.util.LuceneTestCase;
import org.apache.lucene.util.TestUtil;
@ -115,7 +116,6 @@ public class TestMaxTermFrequency extends LuceneTestCase {
@Override
public SimScorer simScorer(SimWeight weight, LeafReaderContext context) throws IOException {
return new SimScorer() {
@Override
@ -123,6 +123,15 @@ public class TestMaxTermFrequency extends LuceneTestCase {
return 0;
}
@Override
public float computeSlopFactor(int distance) {
return 0;
}
@Override
public float computePayloadFactor(int doc, int start, int end, BytesRef payload) {
return 0;
}
};
}

@ -25,8 +25,8 @@ import java.util.Set;
import org.apache.lucene.analysis.Analyzer;
import org.apache.lucene.analysis.MockAnalyzer;
import org.apache.lucene.document.Document;
import org.apache.lucene.document.Field;
import org.apache.lucene.document.Field.Store;
import org.apache.lucene.document.Field;
import org.apache.lucene.document.StringField;
import org.apache.lucene.document.TextField;
import org.apache.lucene.index.DirectoryReader;
@ -39,6 +39,7 @@ import org.apache.lucene.index.RandomIndexWriter;
import org.apache.lucene.index.Term;
import org.apache.lucene.search.similarities.Similarity;
import org.apache.lucene.store.Directory;
import org.apache.lucene.util.BytesRef;
import org.apache.lucene.util.IOUtils;
import org.apache.lucene.util.LuceneTestCase;
@ -112,6 +113,16 @@ public class TestConjunctions extends LuceneTestCase {
public float score(int doc, float freq) {
return freq;
}
@Override
public float computeSlopFactor(int distance) {
return 1F;
}
@Override
public float computePayloadFactor(int doc, int start, int end, BytesRef payload) {
return 1F;
}
};
}
}

@ -32,6 +32,7 @@ import org.apache.lucene.index.Term;
import org.apache.lucene.search.similarities.PerFieldSimilarityWrapper;
import org.apache.lucene.search.similarities.Similarity;
import org.apache.lucene.store.Directory;
import org.apache.lucene.util.BytesRef;
import org.apache.lucene.util.LuceneTestCase;
/**
@ -172,6 +173,16 @@ public class TestDocValuesScoring extends LuceneTestCase {
public float score(int doc, float freq) throws IOException {
return getValueForDoc(doc) * sub.score(doc, freq);
}
@Override
public float computeSlopFactor(int distance) {
return sub.computeSlopFactor(distance);
}
@Override
public float computePayloadFactor(int doc, int start, int end, BytesRef payload) {
return sub.computePayloadFactor(doc, start, end, payload);
}
@Override
public Explanation explain(int doc, Explanation freq) throws IOException {

@ -33,6 +33,7 @@ import org.apache.lucene.index.Term;
import org.apache.lucene.search.similarities.PerFieldSimilarityWrapper;
import org.apache.lucene.search.similarities.Similarity;
import org.apache.lucene.store.Directory;
import org.apache.lucene.util.BytesRef;
import org.apache.lucene.util.LuceneTestCase;
public class TestSimilarityProvider extends LuceneTestCase {
@ -126,6 +127,15 @@ public class TestSimilarityProvider extends LuceneTestCase {
return 1;
}
@Override
public float computeSlopFactor(int distance) {
return 1;
}
@Override
public float computePayloadFactor(int doc, int start, int end, BytesRef payload) {
return 1;
}
};
}
@ -151,7 +161,16 @@ public class TestSimilarityProvider extends LuceneTestCase {
public float score(int doc, float freq) throws IOException {
return 10;
}
@Override
public float computeSlopFactor(int distance) {
return 1;
}
@Override
public float computePayloadFactor(int doc, int start, int end, BytesRef payload) {
return 1;
}
};
}
}

@ -471,7 +471,7 @@ public class TestDiversifiedTopDocsCollector extends LuceneTestCase {
@Override
public SimScorer simScorer(SimWeight stats, LeafReaderContext context)
throws IOException {
final SimScorer sub = sim.simScorer(stats, context);
final NumericDocValues values = DocValues.getNumeric(context.reader(), scoreValueField);
return new SimScorer() {
@ -487,6 +487,17 @@ public class TestDiversifiedTopDocsCollector extends LuceneTestCase {
}
}
@Override
public float computeSlopFactor(int distance) {
return sub.computeSlopFactor(distance);
}
@Override
public float computePayloadFactor(int doc, int start, int end,
BytesRef payload) {
return sub.computePayloadFactor(doc, start, end, payload);
}
@Override
public Explanation explain(int doc, Explanation freq) throws IOException {
return Explanation.match(score(doc, 0f), "indexDocValue(" + scoreValueField + ")");

@ -23,6 +23,8 @@ import org.apache.lucene.index.LeafReaderContext;
import org.apache.lucene.search.CollectionStatistics;
import org.apache.lucene.search.Explanation;
import org.apache.lucene.search.TermStatistics;
import org.apache.lucene.search.spans.Spans;
import org.apache.lucene.util.BytesRef;
/** wraps a similarity with checks for testing */
public class AssertingSimilarity extends Similarity {
@ -95,6 +97,36 @@ public class AssertingSimilarity extends Similarity {
return score;
}
@Override
public float computeSlopFactor(int distance) {
// distance in bounds
assert distance >= 0;
// result in bounds
float slopFactor = delegateScorer.computeSlopFactor(distance);
assert Float.isFinite(slopFactor);
assert slopFactor > 0;
assert slopFactor <= 1;
return slopFactor;
}
@Override
public float computePayloadFactor(int doc, int start, int end, BytesRef payload) {
// doc in bounds
assert doc >= 0;
assert doc < context.reader().maxDoc();
// payload in bounds
assert payload.isValid();
// position range in bounds
assert start >= 0;
assert start != Spans.NO_MORE_POSITIONS;
assert end > start;
// result in bounds
float payloadFactor = delegateScorer.computePayloadFactor(doc, start, end, payload);
assert Float.isFinite(payloadFactor);
assert payloadFactor >= 0;
return payloadFactor;
}
@Override
public Explanation explain(int doc, Explanation freq) throws IOException {
// doc in bounds