LUCENE-3732: fix broken links and outdated information in core package.htmls

git-svn-id: https://svn.apache.org/repos/asf/lucene/dev/trunk@1328746 13f79535-47bb-0310-9956-ffa450edef68
Robert Muir 2012-04-21 23:10:20 +00:00
parent 921ba028cc
commit 5b59942099
7 changed files with 170 additions and 194 deletions


@ -149,7 +149,7 @@ and proximity searches (though sentence identification is not provided by Lucene
{@link org.apache.lucene.document.Field}s.
</li>
<li>
The modules/analysis library located at the root of the Lucene distribution has a number of different Analyzer implementations to solve a variety
The analysis library located at the root of the Lucene distribution has a number of different Analyzer implementations to solve a variety
of different problems related to searching. Many of the Analyzers are designed to analyze non-English languages.
</li>
<li>
@ -158,7 +158,7 @@ and proximity searches (though sentence identification is not provided by Lucene
</ol>
<p>
Analysis is one of the main causes of performance degradation during indexing. Simply put, the more you analyze the slower the indexing (in most cases).
Perhaps your application would be just fine using the simple WhitespaceTokenizer combined with a StopFilter. The contrib/benchmark library can be useful
Perhaps your application would be just fine using the simple WhitespaceTokenizer combined with a StopFilter. The benchmark/ library can be useful
for testing out the speed of the analysis process.
</p>
<h2>Invoking the Analyzer</h2>


@ -36,9 +36,7 @@ package also provides utilities for working with {@link org.apache.lucene.docume
<p>First and foremost, a {@link org.apache.lucene.document.Document} is something created by the user application. It is your job
to create Documents based on the content of the files you are working with in your application (Word, txt, PDF, Excel or any other format.)
How this is done is completely up to you. That being said, there are many tools available in other projects that can make
the process of taking a file and converting it into a Lucene {@link org.apache.lucene.document.Document}. To see an example of this,
take a look at the Lucene <a href="../../../../../../gettingstarted.html" target="top">demo</a> and the associated source code
for extracting content from HTML.
the process of taking a file and converting it into a Lucene {@link org.apache.lucene.document.Document} easier.
</p>
<p>The {@link org.apache.lucene.document.DateTools} is a utility class to make dates and times searchable
(remember, Lucene only searches text). {@link org.apache.lucene.document.IntField}, {@link org.apache.lucene.document.LongField},


@ -21,5 +21,6 @@
</head>
<body>
Code to maintain and access indices.
<!-- TODO: add a BASIC overview here, including code examples of using postings apis -->
</body>
</html>


@ -40,164 +40,171 @@ org.apache.lucene.search.IndexSearcher#search(Query,int)} or {@link
org.apache.lucene.search.IndexSearcher#search(Query,Filter,int)}.
<!-- FILL IN MORE HERE -->
<!-- TODO: this page over-links the same things too many times -->
</p>
<a name="query"></a>
<h2>Query Classes</h2>
<h4>
<a href="TermQuery.html">TermQuery</a>
{@link org.apache.lucene.search.TermQuery TermQuery}
</h4>
<p>Of the various implementations of
<a href="Query.html">Query</a>, the
<a href="TermQuery.html">TermQuery</a>
is the easiest to understand and the most often used in applications. A <a
href="TermQuery.html">TermQuery</a> matches all the documents that contain the
{@link org.apache.lucene.search.Query Query}, the
{@link org.apache.lucene.search.TermQuery TermQuery}
is the easiest to understand and the most often used in applications. A
{@link org.apache.lucene.search.TermQuery TermQuery} matches all the documents that contain the
specified
<a href="../index/Term.html">Term</a>,
{@link org.apache.lucene.index.Term Term},
which is a word that occurs in a certain
<a href="../document/Field.html">Field</a>.
Thus, a <a href="TermQuery.html">TermQuery</a> identifies and scores all
<a href="../document/Document.html">Document</a>s that have a <a
href="../document/Field.html">Field</a> with the specified string in it.
Constructing a <a
href="TermQuery.html">TermQuery</a>
{@link org.apache.lucene.document.Field Field}.
Thus, a {@link org.apache.lucene.search.TermQuery TermQuery} identifies and scores all
{@link org.apache.lucene.document.Document Document}s that have a
{@link org.apache.lucene.document.Field Field} with the specified string in it.
Constructing a {@link org.apache.lucene.search.TermQuery TermQuery}
is as simple as:
<pre>
TermQuery tq = new TermQuery(new Term("fieldName", "term"));
</pre>In this example, the <a href="Query.html">Query</a> identifies all <a
href="../document/Document.html">Document</a>s that have the <a
href="../document/Field.html">Field</a> named <tt>"fieldName"</tt>
</pre>In this example, the {@link org.apache.lucene.search.Query Query} identifies all
{@link org.apache.lucene.document.Document Document}s that have the
{@link org.apache.lucene.document.Field Field} named <tt>"fieldName"</tt>
containing the word <tt>"term"</tt>.
</p>
<h4>
<a href="BooleanQuery.html">BooleanQuery</a>
{@link org.apache.lucene.search.BooleanQuery BooleanQuery}
</h4>
<p>Things start to get interesting when one combines multiple
<a href="TermQuery.html">TermQuery</a> instances into a <a
href="BooleanQuery.html">BooleanQuery</a>.
A <a href="BooleanQuery.html">BooleanQuery</a> contains multiple
<a href="BooleanClause.html">BooleanClause</a>s,
where each clause contains a sub-query (<a href="Query.html">Query</a>
instance) and an operator (from <a
href="BooleanClause.Occur.html">BooleanClause.Occur</a>)
{@link org.apache.lucene.search.TermQuery TermQuery} instances into a
{@link org.apache.lucene.search.BooleanQuery BooleanQuery}.
A {@link org.apache.lucene.search.BooleanQuery BooleanQuery} contains multiple
{@link org.apache.lucene.search.BooleanClause BooleanClause}s,
where each clause contains a sub-query ({@link org.apache.lucene.search.Query Query}
instance) and an operator (from
{@link org.apache.lucene.search.BooleanClause.Occur BooleanClause.Occur})
describing how that sub-query is combined with the other clauses:
<ol>
<li><p>SHOULD &mdash; Use this operator when a clause can occur in the result set, but is not required.
<li><p>{@link org.apache.lucene.search.BooleanClause.Occur#SHOULD SHOULD} &mdash; Use this operator when a clause can occur in the result set, but is not required.
If a query is made up of all SHOULD clauses, then every document in the result
set matches at least one of these clauses.</p></li>
<li><p>MUST &mdash; Use this operator when a clause is required to occur in the result set. Every
<li><p>{@link org.apache.lucene.search.BooleanClause.Occur#MUST MUST} &mdash; Use this operator when a clause is required to occur in the result set. Every
document in the result set will match
all such clauses.</p></li>
<li><p>MUST NOT &mdash; Use this operator when a
<li><p>{@link org.apache.lucene.search.BooleanClause.Occur#MUST_NOT MUST NOT} &mdash; Use this operator when a
clause must not occur in the result set. No
document in the result set will match
any such clauses.</p></li>
</ol>
Boolean queries are constructed by adding two or more
<a href="BooleanClause.html">BooleanClause</a>
instances. If too many clauses are added, a <a href="BooleanQuery.TooManyClauses.html">TooManyClauses</a>
{@link org.apache.lucene.search.BooleanClause BooleanClause}
instances. If too many clauses are added, a {@link org.apache.lucene.search.BooleanQuery.TooManyClauses TooManyClauses}
exception will be thrown during searching. This most often occurs
when a <a href="Query.html">Query</a>
is rewritten into a <a href="BooleanQuery.html">BooleanQuery</a> with many
<a href="TermQuery.html">TermQuery</a> clauses,
for example by <a href="WildcardQuery.html">WildcardQuery</a>.
when a {@link org.apache.lucene.search.Query Query}
is rewritten into a {@link org.apache.lucene.search.BooleanQuery BooleanQuery} with many
{@link org.apache.lucene.search.TermQuery TermQuery} clauses,
for example by {@link org.apache.lucene.search.WildcardQuery WildcardQuery}.
The default setting for the maximum number
of clauses is 1024, but this can be changed via the
static method <a href="BooleanQuery.html#setMaxClauseCount(int)">setMaxClauseCount</a>
in <a href="BooleanQuery.html">BooleanQuery</a>.
static method {@link org.apache.lucene.search.BooleanQuery#setMaxClauseCount(int)}.
</p>
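The three operators described above can be pictured as set operations on document ids. The following is a minimal sketch in plain Java, not Lucene's implementation: the class `OccurSketch`, the method `match`, and the sample id sets are hypothetical names chosen for illustration, and scoring is ignored entirely.

```java
import java.util.List;
import java.util.Set;
import java.util.TreeSet;

final class OccurSketch {
    /**
     * Combines per-clause match sets as the text describes (illustrative only):
     * MUST sets are intersected, SHOULD sets are unioned (they only decide
     * membership when there is no MUST clause), MUST_NOT sets are subtracted.
     */
    static Set<Integer> match(List<Set<Integer>> must,
                              List<Set<Integer>> should,
                              List<Set<Integer>> mustNot) {
        Set<Integer> result = new TreeSet<>();
        if (!must.isEmpty()) {
            // Every MUST clause has to match: start from the first set, then intersect.
            result.addAll(must.get(0));
            for (Set<Integer> docs : must) {
                result.retainAll(docs);
            }
        } else {
            // Only SHOULD clauses: a document needs to match at least one of them.
            for (Set<Integer> docs : should) {
                result.addAll(docs);
            }
        }
        // No document in the result may match a MUST_NOT clause.
        for (Set<Integer> docs : mustNot) {
            result.removeAll(docs);
        }
        return result;
    }

    public static void main(String[] args) {
        Set<Integer> a = Set.of(1, 2, 3);
        Set<Integer> b = Set.of(2, 3, 4);
        Set<Integer> c = Set.of(3);
        System.out.println(match(List.of(a, b), List.of(), List.of(c))); // MUST a, MUST b, MUST_NOT c
        System.out.println(match(List.of(), List.of(a, b), List.of()));  // SHOULD a, SHOULD b
    }
}
```

In a real BooleanQuery, SHOULD clauses that accompany MUST clauses still contribute to the score; the sketch only models which documents end up in the result set.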
<h4>Phrases</h4>
<p>Another common search is to find documents containing certain phrases. This
is handled two different ways:
is handled three different ways:
<ol>
<li>
<p><a href="PhraseQuery.html">PhraseQuery</a>
<p>{@link org.apache.lucene.search.PhraseQuery PhraseQuery}
&mdash; Matches a sequence of
<a href="../index/Term.html">Terms</a>.
<a href="PhraseQuery.html">PhraseQuery</a> uses a slop factor to determine
{@link org.apache.lucene.index.Term Term}s.
{@link org.apache.lucene.search.PhraseQuery PhraseQuery} uses a slop factor to determine
how many positions may occur between any two terms in the phrase and still be considered a match.</p>
</li>
<li>
<p><a href="spans/SpanNearQuery.html">SpanNearQuery</a>
<p>{@link org.apache.lucene.search.MultiPhraseQuery MultiPhraseQuery}
&mdash; A more general form of PhraseQuery that accepts multiple Terms
for a position in the phrase. For example, this can be used to perform phrase queries that also
incorporate synonyms.
</li>
<li>
<p>{@link org.apache.lucene.search.spans.SpanNearQuery SpanNearQuery}
&mdash; Matches a sequence of other
<a href="spans/SpanQuery.html">SpanQuery</a>
instances. <a href="spans/SpanNearQuery.html">SpanNearQuery</a> allows for
{@link org.apache.lucene.search.spans.SpanQuery SpanQuery}
instances. {@link org.apache.lucene.search.spans.SpanNearQuery SpanNearQuery} allows for
much more
complicated phrase queries since it is constructed from other <a
href="spans/SpanQuery.html">SpanQuery</a>
instances, instead of only <a href="TermQuery.html">TermQuery</a>
complicated phrase queries since it is constructed from other
{@link org.apache.lucene.search.spans.SpanQuery SpanQuery}
instances, instead of only {@link org.apache.lucene.search.TermQuery TermQuery}
instances.</p>
</li>
</ol>
</p>
<h4>
<a href="TermRangeQuery.html">TermRangeQuery</a>
{@link org.apache.lucene.search.TermRangeQuery TermRangeQuery}
</h4>
<p>The
<a href="TermRangeQuery.html">TermRangeQuery</a>
{@link org.apache.lucene.search.TermRangeQuery TermRangeQuery}
matches all documents that occur in the
exclusive range of a lower
<a href="../index/Term.html">Term</a>
{@link org.apache.lucene.index.Term Term}
and an upper
<a href="../index/Term.html">Term</a>.
according to {@link java.lang.String#compareTo(String)}. It is not intended
for numerical ranges, use <a href="NumericRangeQuery.html">NumericRangeQuery</a> instead.
{@link org.apache.lucene.index.Term Term}
according to {@link org.apache.lucene.index.TermsEnum#getComparator TermsEnum.getComparator()}. It is not intended
for numerical ranges, use {@link org.apache.lucene.search.NumericRangeQuery NumericRangeQuery} instead.
For example, one could find all documents
that have terms beginning with the letters <tt>a</tt> through <tt>c</tt>. This type of <a
href="Query.html">Query</a> is frequently used to
that have terms beginning with the letters <tt>a</tt> through <tt>c</tt>. This type of
{@link org.apache.lucene.search.Query} is frequently used to
find
documents that occur in a specific date range.
</p>
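To see why a term range is not suitable for numbers, it helps to compare terms as strings. The sketch below is plain Java and does not use Lucene: `TermOrderSketch`, `inRange`, and `pad` are hypothetical helpers, the inclusive bounds are an arbitrary choice for the example, and `String.compareTo` stands in for the term comparator (an assumption that holds for the ASCII digits used here).

```java
import java.util.Locale;

final class TermOrderSketch {
    /** True if term lies within [lower, upper] under String.compareTo ordering. */
    static boolean inRange(String term, String lower, String upper) {
        return term.compareTo(lower) >= 0 && term.compareTo(upper) <= 0;
    }

    /** Left-pads a non-negative number with zeros so that string order equals numeric order. */
    static String pad(int value, int width) {
        return String.format(Locale.ROOT, "%0" + width + "d", value);
    }

    public static void main(String[] args) {
        // "100" sorts between "10" and "20" as text, although 100 is not between 10 and 20.
        System.out.println(inRange("100", "10", "20"));
        // With zero padding the textual order agrees with the numeric order.
        System.out.println(inRange(pad(100, 3), pad(10, 3), pad(20, 3)));
        System.out.println(inRange(pad(15, 3), pad(10, 3), pad(20, 3)));
    }
}
```

Zero padding is only one workaround; the numeric fields and NumericRangeQuery mentioned above avoid the problem without manual encoding.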
<h4>
<a href="NumericRangeQuery.html">NumericRangeQuery</a>
{@link org.apache.lucene.search.NumericRangeQuery NumericRangeQuery}
</h4>
<p>The
<a href="NumericRangeQuery.html">NumericRangeQuery</a>
{@link org.apache.lucene.search.NumericRangeQuery NumericRangeQuery}
matches all documents that occur in a numeric range.
For NumericRangeQuery to work, you must index the values
using a one of the numeric fields (<a href="../document/IntField.html">IntField</a>,
<a href="../document/LongField.html">LongField</a>, <a href="../document/FloatField.html">FloatField</a>,
or <a href="../document/DoubleField.html">DoubleField</a>).
using one of the numeric fields ({@link org.apache.lucene.document.IntField IntField},
{@link org.apache.lucene.document.LongField LongField}, {@link org.apache.lucene.document.FloatField FloatField},
or {@link org.apache.lucene.document.DoubleField DoubleField}).
</p>
<h4>
<a href="PrefixQuery.html">PrefixQuery</a>,
<a href="WildcardQuery.html">WildcardQuery</a>
{@link org.apache.lucene.search.PrefixQuery PrefixQuery},
{@link org.apache.lucene.search.WildcardQuery WildcardQuery},
{@link org.apache.lucene.search.RegexpQuery RegexpQuery}
</h4>
<p>While the
<a href="PrefixQuery.html">PrefixQuery</a>
{@link org.apache.lucene.search.PrefixQuery PrefixQuery}
has a different implementation, it is essentially a special case of the
<a href="WildcardQuery.html">WildcardQuery</a>.
The <a href="PrefixQuery.html">PrefixQuery</a> allows an application
to identify all documents with terms that begin with a certain string. The <a
href="WildcardQuery.html">WildcardQuery</a> generalizes this by allowing
{@link org.apache.lucene.search.WildcardQuery WildcardQuery}.
The {@link org.apache.lucene.search.PrefixQuery PrefixQuery} allows an application
to identify all documents with terms that begin with a certain string. The
{@link org.apache.lucene.search.WildcardQuery WildcardQuery} generalizes this by allowing
for the use of <tt>*</tt> (matches 0 or more characters) and <tt>?</tt> (matches exactly one character) wildcards.
Note that the <a href="WildcardQuery.html">WildcardQuery</a> can be quite slow. Also
Note that the {@link org.apache.lucene.search.WildcardQuery WildcardQuery} can be quite slow. Also
note that
<a href="WildcardQuery.html">WildcardQuery</a> should
{@link org.apache.lucene.search.WildcardQuery WildcardQuery} should
not start with <tt>*</tt> and <tt>?</tt>, as these are extremely slow.
To remove this protection and allow a wildcard at the beginning of a term, see method
<a href="../queryParser/QueryParser.html#setAllowLeadingWildcard(boolean)">setAllowLeadingWildcard</a> in
<a href="../queryParser/QueryParser.html">QueryParser</a>.
Some QueryParsers may not allow this by default, but provide a <code>setAllowLeadingWildcard</code> method
to remove protection.
The {@link org.apache.lucene.search.RegexpQuery RegexpQuery} is even more general than WildcardQuery,
allowing an application to identify all documents with terms that match a regular expression pattern.
</p>
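The wildcard semantics described above (<tt>*</tt> for zero or more characters, <tt>?</tt> for exactly one) can be sketched with a small matcher over a single term. This is plain Java for illustration and is not how Lucene evaluates a WildcardQuery; `WildcardSketch` and `matches` are hypothetical names.

```java
final class WildcardSketch {
    /** Returns true if text matches pattern, where '*' matches zero or more chars and '?' exactly one. */
    static boolean matches(String pattern, String text) {
        int p = 0;       // position in pattern
        int t = 0;       // position in text
        int star = -1;   // position of the most recent '*' in pattern, or -1
        int mark = 0;    // text position where that '*' currently stops matching
        while (t < text.length()) {
            if (p < pattern.length() && pattern.charAt(p) == '*') {
                // Remember the star and first try to let it match nothing.
                star = p++;
                mark = t;
            } else if (p < pattern.length()
                    && (pattern.charAt(p) == '?' || pattern.charAt(p) == text.charAt(t))) {
                p++;
                t++;
            } else if (star != -1) {
                // Mismatch: backtrack and let the last star absorb one more character.
                p = star + 1;
                t = ++mark;
            } else {
                return false;
            }
        }
        // Trailing stars may match the empty string.
        while (p < pattern.length() && pattern.charAt(p) == '*') {
            p++;
        }
        return p == pattern.length();
    }

    public static void main(String[] args) {
        System.out.println(matches("te?t", "test"));
        System.out.println(matches("lu*", "lucene"));
        System.out.println(matches("*ne", "lucene"));
    }
}
```

A pattern such as <tt>lu*</tt> fixes a prefix, so only terms starting with that prefix need to be examined; a pattern with a leading wildcard such as <tt>*ne</tt> gives no such starting point, which is why the text warns against it.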
<h4>
<a href="FuzzyQuery.html">FuzzyQuery</a>
{@link org.apache.lucene.search.FuzzyQuery FuzzyQuery}
</h4>
<p>A
<a href="FuzzyQuery.html">FuzzyQuery</a>
{@link org.apache.lucene.search.FuzzyQuery FuzzyQuery}
matches documents that contain terms similar to the specified term. Similarity is
determined using
<a href="http://en.wikipedia.org/wiki/Levenshtein">Levenshtein (edit) distance</a>.
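For readers unfamiliar with the measure, Levenshtein distance is the minimum number of single-character insertions, deletions, and substitutions needed to turn one string into another. The following is a minimal textbook sketch in plain Java; `EditDistanceSketch` and `levenshtein` are hypothetical names, and this is not the code FuzzyQuery uses internally.

```java
final class EditDistanceSketch {
    /** Classic dynamic-programming Levenshtein distance using two rolling rows. */
    static int levenshtein(String a, String b) {
        int[] prev = new int[b.length() + 1];
        int[] curr = new int[b.length() + 1];
        // Distance from the empty prefix of a to each prefix of b is the prefix length.
        for (int j = 0; j <= b.length(); j++) {
            prev[j] = j;
        }
        for (int i = 1; i <= a.length(); i++) {
            curr[0] = i;
            for (int j = 1; j <= b.length(); j++) {
                int cost = a.charAt(i - 1) == b.charAt(j - 1) ? 0 : 1;
                int insert = curr[j - 1] + 1;
                int delete = prev[j] + 1;
                int substitute = prev[j - 1] + cost;
                curr[j] = Math.min(Math.min(insert, delete), substitute);
            }
            // Swap rows so that prev always holds the most recently completed row.
            int[] tmp = prev;
            prev = curr;
            curr = tmp;
        }
        return prev[b.length()];
    }

    public static void main(String[] args) {
        System.out.println(levenshtein("lucene", "lucine"));
        System.out.println(levenshtein("kitten", "sitting"));
    }
}
```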
@ -206,58 +213,9 @@ org.apache.lucene.search.IndexSearcher#search(Query,Filter,int)}.
<a name="changingSimilarity"></a>
<h2>Changing Similarity</h2>
<p>Chances are <a href="DefaultSimilarity.html">DefaultSimilarity</a> is sufficient for all
your searching needs.
However, in some applications it may be necessary to customize your <a
href="Similarity.html">Similarity</a> implementation. For instance, some
applications do not need to
distinguish between shorter and longer documents (see <a
href="http://www.gossamer-threads.com/lists/lucene/java-user/38967#38967">a "fair" similarity</a>).</p>
See the {@link org.apache.lucene.search.similarities} package documentation for information
on the available scoring models and extending or changing Similarity.
<p>To change <a href="Similarity.html">Similarity</a>, one must do so for both indexing and
searching, and the changes must happen before
either of these actions take place. Although in theory there is nothing stopping you from changing mid-stream, it
just isn't well-defined what is going to happen.
</p>
<p>To make this change, implement your own <a href="Similarity.html">Similarity</a> (likely
you'll want to simply subclass
<a href="DefaultSimilarity.html">DefaultSimilarity</a>) and then use the new
class by calling
<a href="../index/IndexWriter.html#setSimilarity(org.apache.lucene.search.Similarity)">IndexWriter.setSimilarity</a>
before indexing and
<a href="Searcher.html#setSimilarity(org.apache.lucene.search.Similarity)">Searcher.setSimilarity</a>
before searching.
</p>
<p>
If you are interested in use cases for changing your similarity, see the Lucene users's mailing list at <a
href="http://www.nabble.com/Overriding-Similarity-tf2128934.html">Overriding Similarity</a>.
In summary, here are a few use cases:
<ol>
<li><p><a href="api/org/apache/lucene/misc/SweetSpotSimilarity.html">SweetSpotSimilarity</a> &mdash; <a
href="api/org/apache/lucene/misc/SweetSpotSimilarity.html">SweetSpotSimilarity</a> gives small increases
as the frequency increases a small amount
and then greater increases when you hit the "sweet spot", i.e. where you think the frequency of terms is
more significant.</p></li>
<li><p>Overriding tf &mdash; In some applications, it doesn't matter what the score of a document is as long as a
matching term occurs. In these
cases people have overridden Similarity to return 1 from the tf() method.</p></li>
<li><p>Changing Length Normalization &mdash; By overriding <a
href="Similarity.html#lengthNorm(java.lang.String,%20int)">lengthNorm</a>,
it is possible to discount how the length of a field contributes
to a score. In <a href="DefaultSimilarity.html">DefaultSimilarity</a>,
lengthNorm = 1 / (numTerms in field)^0.5, but if one changes this to be
1 / (numTerms in field), all fields will be treated
<a href="http://www.gossamer-threads.com/lists/lucene/java-user/38967#38967">"fairly"</a>.</p></li>
</ol>
In general, Chris Hostetter sums it up best in saying (from <a
href="http://www.gossamer-threads.com/lists/lucene/java-user/39125#39125">the Lucene users's mailing list</a>):
<blockquote>[One would override the Similarity in] ... any situation where you know more about your data then just
that
it's "text" is a situation where it *might* make sense to to override your
Similarity method.</blockquote>
</p>
<a name="scoring"></a>
<h2>Changing Scoring &mdash; Expert Level</h2>
@ -270,112 +228,130 @@ org.apache.lucene.search.IndexSearcher#search(Query,Filter,int)}.
<span >three main classes</span>:
<ol>
<li>
<a href="Query.html">Query</a> &mdash; The abstract object representation of the
{@link org.apache.lucene.search.Query Query} &mdash; The abstract object representation of the
user's information need.</li>
<li>
<a href="Weight.html">Weight</a> &mdash; The internal interface representation of
{@link org.apache.lucene.search.Weight Weight} &mdash; The internal interface representation of
the user's Query, so that Query objects may be reused.</li>
<li>
<a href="Scorer.html">Scorer</a> &mdash; An abstract class containing common
{@link org.apache.lucene.search.Scorer Scorer} &mdash; An abstract class containing common
functionality for scoring. Provides both scoring and explanation capabilities.</li>
</ol>
Details on each of these classes, and their children, can be found in the subsections below.
</p>
<h4>The Query Class</h4>
<p>In some sense, the
<a href="Query.html">Query</a>
{@link org.apache.lucene.search.Query Query}
class is where it all begins. Without a Query, there would be
nothing to score. Furthermore, the Query class is the catalyst for the other scoring classes as it
is often responsible
for creating them or coordinating the functionality between them. The
<a href="Query.html">Query</a> class has several methods that are important for
{@link org.apache.lucene.search.Query Query} class has several methods that are important for
derived classes:
<ol>
<li>createWeight(Searcher searcher) &mdash; A
<a href="Weight.html">Weight</a> is the internal representation of the
<li>{@link org.apache.lucene.search.Query#createWeight(IndexSearcher) createWeight(IndexSearcher searcher)} &mdash; A
{@link org.apache.lucene.search.Weight Weight} is the internal representation of the
Query, so each Query implementation must
provide an implementation of Weight. See the subsection on <a
href="#The Weight Interface">The Weight Interface</a> below for details on implementing the Weight
interface.</li>
<li>rewrite(IndexReader reader) &mdash; Rewrites queries into primitive queries. Primitive queries are:
<a href="TermQuery.html">TermQuery</a>,
<a href="BooleanQuery.html">BooleanQuery</a>, <span
>and other queries that implement Query.html#createWeight(Searcher searcher)</span></li>
<li>{@link org.apache.lucene.search.Query#rewrite(IndexReader) rewrite(IndexReader reader)} &mdash; Rewrites queries into primitive queries. Primitive queries are:
{@link org.apache.lucene.search.TermQuery TermQuery},
{@link org.apache.lucene.search.BooleanQuery BooleanQuery}, <span
>and other queries that implement {@link org.apache.lucene.search.Query#createWeight(IndexSearcher) createWeight(IndexSearcher searcher)}</span></li>
</ol>
</p>
<h4>The Weight Interface</h4>
<p>The
<a href="Weight.html">Weight</a>
{@link org.apache.lucene.search.Weight Weight}
interface provides an internal representation of the Query so that it can be reused. Any
<a href="Searcher.html">Searcher</a>
{@link org.apache.lucene.search.IndexSearcher IndexSearcher}
dependent state should be stored in the Weight implementation,
not in the Query class. The interface defines six methods that must be implemented:
not in the Query class. The interface defines five methods that must be implemented:
<ol>
<li>
<a href="Weight.html#getQuery()">Weight#getQuery()</a> &mdash; Pointer to the
{@link org.apache.lucene.search.Weight#getQuery getQuery()} &mdash; Pointer to the
Query that this Weight represents.</li>
<li>
<a href="Weight.html#getValue()">Weight#getValue()</a> &mdash; The weight for
this Query. For example, the TermQuery.TermWeight value is
equal to the idf^2 * boost * queryNorm <!-- DOUBLE CHECK THIS --></li>
{@link org.apache.lucene.search.Weight#getValueForNormalization() getValueForNormalization()} &mdash;
A weight can return a floating point value to indicate its magnitude for query normalization. Typically
a weight such as TermWeight that scores via a {@link org.apache.lucene.search.similarities.Similarity Similarity}
will just defer to the Similarity's implementation:
{@link org.apache.lucene.search.similarities.Similarity.SimWeight#getValueForNormalization SimWeight#getValueForNormalization()}.
For example, with {@link org.apache.lucene.search.similarities.TFIDFSimilarity Lucene's classic vector-space formula}, this
is implemented as the sum of squared weights: <code>(idf * boost)<sup>2</sup></code></li>
<li>
<a href="Weight.html#sumOfSquaredWeights()">
Weight#sumOfSquaredWeights()</a> &mdash; The sum of squared weights. For TermQuery, this is (idf *
boost)^2</li>
<li>
<a href="Weight.html#normalize(float)">
Weight#normalize(float)</a> &mdash; Determine the query normalization factor. The query normalization may
{@link org.apache.lucene.search.Weight#normalize(float,float) normalize(float norm, float topLevelBoost)} &mdash;
Performs query normalization:
<ul>
<li><code>topLevelBoost</code>: A query-boost factor from any wrapping queries that should be multiplied into every
document's score. For example, a TermQuery that is wrapped within a BooleanQuery with a boost of <code>5</code> would
receive this value at this time. This allows the TermQuery (the leaf node in this case) to compute this up-front
a single time (e.g. by multiplying into the IDF), rather than for every document.</li>
<li><code>norm</code>: Passes in a normalization factor which may
allow for comparing scores between queries.</li>
</ul>
Typically a weight such as TermWeight
that scores via a {@link org.apache.lucene.search.similarities.Similarity Similarity} will just defer to the Similarity's implementation:
{@link org.apache.lucene.search.similarities.Similarity.SimWeight#normalize SimWeight#normalize(float,float)}.</li>
<li>
<a href="Weight.html#scorer(org.apache.lucene.index.AtomicReaderContext, boolean, boolean)">
Weight#scorer(AtomicReaderContext, boolean, boolean)</a> &mdash; Construct a new
<a href="Scorer.html">Scorer</a>
for this Weight. See
<a href="#The Scorer Class">The Scorer Class</a>
below for help defining a Scorer. As the name implies, the
Scorer is responsible for doing the actual scoring of documents given the Query.
{@link org.apache.lucene.search.Weight#scorer(org.apache.lucene.index.AtomicReaderContext, boolean, boolean, org.apache.lucene.util.Bits)
scorer(AtomicReaderContext context, boolean scoresDocsInOrder, boolean topScorer, Bits acceptDocs)} &mdash;
Construct a new {@link org.apache.lucene.search.Scorer Scorer} for this Weight. See <a href="#The Scorer Class">The Scorer Class</a>
below for help defining a Scorer. As the name implies, the Scorer is responsible for doing the actual scoring of documents
given the Query.
</li>
<li>
<a href="Weight.html#explain(org.apache.lucene.search.Searcher, org.apache.lucene.index.AtomicReaderContext, int)">
Weight#explain(Searcher, AtomicReaderContext, int)</a> &mdash; Provide a means for explaining why a given document was
scored
the way it was.</li>
{@link org.apache.lucene.search.Weight#explain(org.apache.lucene.index.AtomicReaderContext, int)
explain(AtomicReaderContext context, int doc)} &mdash; Provide a means for explaining why a given document was
scored the way it was.
Typically a weight such as TermWeight
that scores via a {@link org.apache.lucene.search.similarities.Similarity Similarity} will make use of the Similarity's implementations:
{@link org.apache.lucene.search.similarities.Similarity.ExactSimScorer#explain(int, Explanation) ExactSimScorer#explain(int doc, Explanation freq)},
and {@link org.apache.lucene.search.similarities.Similarity.SloppySimScorer#explain(int, Explanation) SloppySimScorer#explain(int doc, Explanation freq)}
</li>
</li>
</ol>
</p>
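The arithmetic behind the two normalization steps can be sketched as follows. This is an illustration in plain Java under stated assumptions: it follows the classic vector-space formula mentioned above, where each term contributes <code>(idf * boost)<sup>2</sup></code>, and it assumes the norm is the reciprocal square root of the summed values. `QueryNormSketch` and its methods are hypothetical names, not Lucene API.

```java
final class QueryNormSketch {
    /** Value a single term weight reports for normalization: (idf * boost)^2. */
    static double valueForNormalization(double idf, double boost) {
        double weight = idf * boost;
        return weight * weight;
    }

    /** Assumed normalization factor: 1 / sqrt(sum of squared weights). */
    static double queryNorm(double sumOfSquaredWeights) {
        return 1.0 / Math.sqrt(sumOfSquaredWeights);
    }

    /** Folds the norm and the boost of any wrapping query into the term weight, once per query. */
    static double normalizedWeight(double idf, double boost, double norm, double topLevelBoost) {
        return idf * boost * norm * topLevelBoost;
    }

    public static void main(String[] args) {
        // Two hypothetical terms: (idf=2, boost=1) and (idf=1, boost=2).
        double sum = valueForNormalization(2.0, 1.0) + valueForNormalization(1.0, 2.0);
        double norm = queryNorm(sum);
        System.out.println(normalizedWeight(2.0, 1.0, norm, 5.0));
    }
}
```

The point of the design is that everything independent of the document is multiplied together up front, so per-document scoring does not repeat that work.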
<h4>The Scorer Class</h4>
<p>The
<a href="Scorer.html">Scorer</a>
{@link org.apache.lucene.search.Scorer Scorer}
abstract class provides common scoring functionality for all Scorer implementations and
is the heart of the Lucene scoring process. The Scorer defines the following abstract (some of them are not
yet abstract, but will be in future versions and should be considered as such now) methods which
must be implemented (some of them inherited from <a href="DocIdSetIterator.html">DocIdSetIterator</a> ):
must be implemented (some of them inherited from {@link org.apache.lucene.search.DocIdSetIterator DocIdSetIterator}):
<ol>
<li>
<a href="DocIdSetIterator.html#nextDoc()">DocIdSetIterator#nextDoc()</a> &mdash; Advances to the next
document that matches this Query, returning true if and only
if there is another document that matches.</li>
{@link org.apache.lucene.search.Scorer#nextDoc nextDoc()} &mdash; Advances to the next
document that matches this Query, returning its document id, or {@link org.apache.lucene.search.DocIdSetIterator#NO_MORE_DOCS NO_MORE_DOCS} if no further document matches.</li>
<li>
<a href="DocIdSetIterator.html#docID()">DocIdSetIterator#docID()</a> &mdash; Returns the id of the
<a href="../document/Document.html">Document</a>
that contains the match. It is not valid until next() has been called at least once.
{@link org.apache.lucene.search.Scorer#docID docID()} &mdash; Returns the id of the
{@link org.apache.lucene.document.Document Document} that contains the match.
</li>
<li>
<a href="Scorer.html#score(org.apache.lucene.search.Collector)">Scorer#score(Collector)</a> &mdash;
Scores and collects all matching documents using the given Collector.
{@link org.apache.lucene.search.Scorer#score score()} &mdash; Return the score of the
current document. This value can be determined in any appropriate way for an application. For instance, the
{@link org.apache.lucene.search.TermScorer TermScorer} simply defers to the configured Similarity:
{@link org.apache.lucene.search.similarities.Similarity.ExactSimScorer#score(int, int) ExactSimScorer.score(int doc, int freq)}.
</li>
<li>
<a href="Scorer.html#score()">Scorer#score()</a> &mdash; Return the score of the
current document. This value can be determined in any
appropriate way for an application. For instance, the
<a href="http://svn.apache.org/viewvc/lucene/dev/trunk/lucene/core/src/java/org/apache/lucene/search/TermScorer.java?view=log">TermScorer</a>
returns the tf * Weight.getValue() * fieldNorm.
{@link org.apache.lucene.search.Scorer#freq freq()} &mdash; Returns the number of matches
for the current document. This value can be determined in any appropriate way for an application. For instance, the
{@link org.apache.lucene.search.TermScorer TermScorer} simply defers to the term frequency from the inverted index:
{@link org.apache.lucene.index.DocsEnum#freq DocsEnum.freq()}.
</li>
<li>
<a href="DocIdSetIterator.html#advance(int)">DocIdSetIterator#advance(int)</a> &mdash; Skip ahead in
{@link org.apache.lucene.search.Scorer#advance advance()} &mdash; Skip ahead in
the document matches to the document whose id is greater than
or equal to the passed in value. In many instances, advance can be
implemented more efficiently than simply looping through all the matching documents until
the target document is identified.</li>
the target document is identified.
</li>
<li>
{@link org.apache.lucene.search.Scorer#getChildren getChildren()} &mdash; Returns any child subscorers
underneath this scorer. This allows for users to navigate the scorer hierarchy and receive more fine-grained
details on the scoring process.
</li>
</ol>
</p>
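The iteration contract of <code>nextDoc()</code>, <code>docID()</code>, and <code>advance()</code> can be sketched over a sorted array of matching document ids. This is a simplified stand-in written in plain Java, not a Scorer subclass: `DocIdIteratorSketch` is a hypothetical class, it returns -1 from <code>docID()</code> before iteration starts as a convention of this sketch, and it uses <code>Integer.MAX_VALUE</code> as the exhausted sentinel. The binary search in <code>advance</code> shows why skipping can be cheaper than calling <code>nextDoc()</code> in a loop.

```java
import java.util.Arrays;

final class DocIdIteratorSketch {
    /** Sentinel returned once the iterator is exhausted. */
    static final int NO_MORE_DOCS = Integer.MAX_VALUE;

    private final int[] docs; // matching document ids, sorted ascending, no duplicates
    private int index = -1;   // position of the current document, -1 before iteration starts

    DocIdIteratorSketch(int[] sortedDocs) {
        this.docs = sortedDocs;
    }

    /** Current document id, -1 if not started, NO_MORE_DOCS if exhausted. */
    int docID() {
        if (index < 0) {
            return -1;
        }
        return index >= docs.length ? NO_MORE_DOCS : docs[index];
    }

    /** Moves to the next matching document and returns its id. */
    int nextDoc() {
        if (index < docs.length) {
            index++;
        }
        return docID();
    }

    /** Moves to the first matching document after the current one whose id is >= target. */
    int advance(int target) {
        int from = Math.min(index + 1, docs.length);
        int pos = Arrays.binarySearch(docs, from, docs.length, target);
        // A negative result encodes the insertion point as -(insertionPoint) - 1.
        index = pos >= 0 ? pos : -pos - 1;
        return docID();
    }

    public static void main(String[] args) {
        DocIdIteratorSketch it = new DocIdIteratorSketch(new int[] {2, 5, 9, 14});
        System.out.println(it.nextDoc());
        System.out.println(it.advance(6));
        System.out.println(it.nextDoc());
    }
}
```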
<h4>Why would I want to add my own Query?</h4>


@ -25,8 +25,8 @@
</DIV>
<div>
<ol>
<li><a href="./PayloadTermQuery.html">PayloadTermQuery</a> -- Boost a term's score based on the value of the payload located at that term.</li>
<li><a href="./PayloadNearQuery.html">PayloadNearQuery</a> -- A <a href="../spans/SpanNearQuery.html">SpanNearQuery</a> that factors in the value of the payloads located
<li>{@link org.apache.lucene.search.payloads.PayloadTermQuery PayloadTermQuery} -- Boost a term's score based on the value of the payload located at that term.</li>
<li>{@link org.apache.lucene.search.payloads.PayloadNearQuery PayloadNearQuery} -- A {@link org.apache.lucene.search.spans.SpanNearQuery SpanNearQuery} that factors in the value of the payloads located
at each of the positions where the spans occur.</li>
</ol>
</div>


@ -105,7 +105,7 @@ implement the {@link org.apache.lucene.search.similarities.SimilarityBase#score(
and {@link org.apache.lucene.search.similarities.SimilarityBase#toString()}
methods.</p>
<p>Another options is to extend one of the <a href="#framework">frameworks</a>
<p>Another option is to extend one of the <a href="#framework">frameworks</a>
based on {@link org.apache.lucene.search.similarities.SimilarityBase}. These
Similarities are implemented modularly, e.g.
{@link org.apache.lucene.search.similarities.DFRSimilarity} delegates


@ -26,29 +26,30 @@ The calculus of spans.
<ul>
<li>A <a href="SpanTermQuery.html">SpanTermQuery</a> matches all spans
containing a particular <a href="../../index/Term.html">Term</a>.</li>
<li>A {@link org.apache.lucene.search.spans.SpanTermQuery SpanTermQuery} matches all spans
containing a particular {@link org.apache.lucene.index.Term Term}.</li>
<li> A <a href="SpanNearQuery.html">SpanNearQuery</a> matches spans
<li> A {@link org.apache.lucene.search.spans.SpanNearQuery SpanNearQuery} matches spans
which occur near one another, and can be used to implement things like
phrase search (when constructed from <a
href="SpanTermQuery.html">SpanTermQueries</a>) and inter-phrase
proximity (when constructed from other <a
href="SpanNearQuery.html">SpanNearQueries</a>).</li>
phrase search (when constructed from {@link org.apache.lucene.search.spans.SpanTermQuery}s)
and inter-phrase proximity (when constructed from other {@link org.apache.lucene.search.spans.SpanNearQuery}s).</li>
<li>A <a href="SpanOrQuery.html">SpanOrQuery</a> merges spans from a
number of other <a href="SpanQuery.html">SpanQueries</a>.</li>
<li>A {@link org.apache.lucene.search.spans.SpanOrQuery SpanOrQuery} merges spans from a
number of other {@link org.apache.lucene.search.spans.SpanQuery}s.</li>
<li>A <a href="SpanNotQuery.html">SpanNotQuery</a> removes spans
matching one <a href="SpanQuery.html">SpanQuery</a> which overlap
<li>A {@link org.apache.lucene.search.spans.SpanNotQuery SpanNotQuery} removes spans
matching one {@link org.apache.lucene.search.spans.SpanQuery SpanQuery} which overlap
another. This can be used, e.g., to implement within-paragraph
search.</li>
<li>A <a href="SpanFirstQuery.html">SpanFirstQuery</a> matches spans
<li>A {@link org.apache.lucene.search.spans.SpanFirstQuery SpanFirstQuery} matches spans
matching <code>q</code> whose end position is less than
<code>n</code>. This can be used to constrain matches to the first
part of the document.</li>
<li>A {@link org.apache.lucene.search.spans.SpanPositionRangeQuery SpanPositionRangeQuery} is
a more general form of SpanFirstQuery that can constrain matches to arbitrary portions of the document.</li>
</ul>
In all cases, output spans are minimally inclusive. In other words, a