mirror of https://github.com/apache/lucene.git
add comments from Doug describing how BooleanScorers work
git-svn-id: https://svn.apache.org/repos/asf/lucene/java/trunk@727475 13f79535-47bb-0310-9956-ffa450edef68
This commit is contained in:
parent
d3987d9ed4
commit
4561cb369d
|
@ -19,6 +19,39 @@ package org.apache.lucene.search;
|
||||||
|
|
||||||
import java.io.IOException;
|
import java.io.IOException;
|
||||||
|
|
||||||
|
/* Description from Doug Cutting (excerpted from
|
||||||
|
* LUCENE-1483):
|
||||||
|
*
|
||||||
|
* BooleanScorer uses a ~16k array to score windows of
|
||||||
|
* docs. So it scores docs 0-16k first, then docs 16-32k,
|
||||||
|
* etc. For each window it iterates through all query terms
|
||||||
|
* and accumulates a score in table[doc%16k]. It also stores
|
||||||
|
* in the table a bitmask representing which terms
|
||||||
|
* contributed to the score. Non-zero scores are chained in
|
||||||
|
* a linked list. At the end of scoring each window it then
|
||||||
|
* iterates through the linked list and, if the bitmask
|
||||||
|
* matches the boolean constraints, collects a hit. For
|
||||||
|
* boolean queries with lots of frequent terms this can be
|
||||||
|
* much faster, since it does not need to update a priority
|
||||||
|
* queue for each posting, instead performing constant-time
|
||||||
|
* operations per posting. The only downside is that it
|
||||||
|
* results in hits being delivered out-of-order within the
|
||||||
|
* window, which means it cannot be nested within other
|
||||||
|
* scorers. But it works well as a top-level scorer.
|
||||||
|
*
|
||||||
|
* The new BooleanScorer2 implementation instead works by
|
||||||
|
* merging priority queues of postings, albeit with some
|
||||||
|
* clever tricks. For example, a pure conjunction (all terms
|
||||||
|
* required) does not require a priority queue. Instead it
|
||||||
|
* sorts the posting streams at the start, then repeatedly
|
||||||
|
* skips the first to to the last. If the first ever equals
|
||||||
|
* the last, then there's a hit. When some terms are
|
||||||
|
* required and some terms are optional, the conjunction can
|
||||||
|
* be evaluated first, then the optional terms can all skip
|
||||||
|
* to the match and be added to the score. Thus the
|
||||||
|
* conjunction can reduce the number of priority queue
|
||||||
|
* updates for the optional terms. */
|
||||||
|
|
||||||
final class BooleanScorer extends Scorer {
|
final class BooleanScorer extends Scorer {
|
||||||
private SubScorer scorers = null;
|
private SubScorer scorers = null;
|
||||||
private BucketTable bucketTable = new BucketTable();
|
private BucketTable bucketTable = new BucketTable();
|
||||||
|
|
|
@ -22,6 +22,9 @@ import java.util.ArrayList;
|
||||||
import java.util.List;
|
import java.util.List;
|
||||||
import java.util.Iterator;
|
import java.util.Iterator;
|
||||||
|
|
||||||
|
/* See the description in BooleanScorer.java, comparing
|
||||||
|
* BooleanScorer & BooleanScorer2 */
|
||||||
|
|
||||||
/** An alternative to BooleanScorer that also allows a minimum number
|
/** An alternative to BooleanScorer that also allows a minimum number
|
||||||
* of optional scorers that should match.
|
* of optional scorers that should match.
|
||||||
* <br>Implements skipTo(), and has no limitations on the numbers of added scorers.
|
* <br>Implements skipTo(), and has no limitations on the numbers of added scorers.
|
||||||
|
|
Loading…
Reference in New Issue