mirror of
https://github.com/apache/lucene.git
synced 2025-03-05 15:59:25 +00:00
LUCENE-3732: fix broken links and outdated information in core package.htmls
git-svn-id: https://svn.apache.org/repos/asf/lucene/dev/trunk@1328746 13f79535-47bb-0310-9956-ffa450edef68
This commit is contained in:
parent
921ba028cc
commit
5b59942099
@ -149,7 +149,7 @@ and proximity searches (though sentence identification is not provided by Lucene
|
||||
{@link org.apache.lucene.document.Field}s.
|
||||
</li>
|
||||
<li>
|
||||
The modules/analysis library located at the root of the Lucene distribution has a number of different Analyzer implementations to solve a variety
|
||||
The analysis library located at the root of the Lucene distribution has a number of different Analyzer implementations to solve a variety
|
||||
of different problems related to searching. Many of the Analyzers are designed to analyze non-English languages.
|
||||
</li>
|
||||
<li>
|
||||
@ -158,7 +158,7 @@ and proximity searches (though sentence identification is not provided by Lucene
|
||||
</ol>
|
||||
<p>
|
||||
Analysis is one of the main causes of performance degradation during indexing. Simply put, the more you analyze the slower the indexing (in most cases).
|
||||
Perhaps your application would be just fine using the simple WhitespaceTokenizer combined with a StopFilter. The contrib/benchmark library can be useful
|
||||
Perhaps your application would be just fine using the simple WhitespaceTokenizer combined with a StopFilter. The benchmark/ library can be useful
|
||||
for testing out the speed of the analysis process.
|
||||
</p>
|
||||
<h2>Invoking the Analyzer</h2>
|
||||
|
@ -36,9 +36,7 @@ package also provides utilities for working with {@link org.apache.lucene.docume
|
||||
<p>First and foremost, a {@link org.apache.lucene.document.Document} is something created by the user application. It is your job
|
||||
to create Documents based on the content of the files you are working with in your application (Word, txt, PDF, Excel or any other format.)
|
||||
How this is done is completely up to you. That being said, there are many tools available in other projects that can make
|
||||
the process of taking a file and converting it into a Lucene {@link org.apache.lucene.document.Document}. To see an example of this,
|
||||
take a look at the Lucene <a href="../../../../../../gettingstarted.html" target="top">demo</a> and the associated source code
|
||||
for extracting content from HTML.
|
||||
the process of taking a file and converting it into a Lucene {@link org.apache.lucene.document.Document} easier.
|
||||
</p>
|
||||
<p>The {@link org.apache.lucene.document.DateTools} is a utility class to make dates and times searchable
|
||||
(remember, Lucene only searches text). {@link org.apache.lucene.document.IntField}, {@link org.apache.lucene.document.LongField},
|
||||
|
@ -21,5 +21,6 @@
|
||||
</head>
|
||||
<body>
|
||||
Code to maintain and access indices.
|
||||
<!-- TODO: add a BASIC overview here, including code examples of using postings apis -->
|
||||
</body>
|
||||
</html>
|
||||
|
@ -40,164 +40,171 @@ org.apache.lucene.search.IndexSearcher#search(Query,int)} or {@link
|
||||
org.apache.lucene.search.IndexSearcher#search(Query,Filter,int)}.
|
||||
|
||||
<!-- FILL IN MORE HERE -->
|
||||
<!-- TODO: this page over-links the same things too many times -->
|
||||
</p>
|
||||
<a name="query"></a>
|
||||
<h2>Query Classes</h2>
|
||||
<h4>
|
||||
<a href="TermQuery.html">TermQuery</a>
|
||||
{@link org.apache.lucene.search.TermQuery TermQuery}
|
||||
</h4>
|
||||
|
||||
<p>Of the various implementations of
|
||||
<a href="Query.html">Query</a>, the
|
||||
<a href="TermQuery.html">TermQuery</a>
|
||||
is the easiest to understand and the most often used in applications. A <a
|
||||
href="TermQuery.html">TermQuery</a> matches all the documents that contain the
|
||||
{@link org.apache.lucene.search.Query Query}, the
|
||||
{@link org.apache.lucene.search.TermQuery TermQuery}
|
||||
is the easiest to understand and the most often used in applications. A
|
||||
{@link org.apache.lucene.search.TermQuery TermQuery} matches all the documents that contain the
|
||||
specified
|
||||
<a href="../index/Term.html">Term</a>,
|
||||
{@link org.apache.lucene.index.Term Term},
|
||||
which is a word that occurs in a certain
|
||||
<a href="../document/Field.html">Field</a>.
|
||||
Thus, a <a href="TermQuery.html">TermQuery</a> identifies and scores all
|
||||
<a href="../document/Document.html">Document</a>s that have a <a
|
||||
href="../document/Field.html">Field</a> with the specified string in it.
|
||||
Constructing a <a
|
||||
href="TermQuery.html">TermQuery</a>
|
||||
{@link org.apache.lucene.document.Field Field}.
|
||||
Thus, a {@link org.apache.lucene.search.TermQuery TermQuery} identifies and scores all
|
||||
{@link org.apache.lucene.document.Document Document}s that have a
|
||||
{@link org.apache.lucene.document.Field Field} with the specified string in it.
|
||||
Constructing a {@link org.apache.lucene.search.TermQuery TermQuery}
|
||||
is as simple as:
|
||||
<pre>
|
||||
TermQuery tq = new TermQuery(new Term("fieldName", "term"));
|
||||
</pre>In this example, the <a href="Query.html">Query</a> identifies all <a
|
||||
href="../document/Document.html">Document</a>s that have the <a
|
||||
href="../document/Field.html">Field</a> named <tt>"fieldName"</tt>
|
||||
</pre>In this example, the {@link org.apache.lucene.search.Query Query} identifies all
|
||||
{@link org.apache.lucene.document.Document Document}s that have the
|
||||
{@link org.apache.lucene.document.Field Field} named <tt>"fieldName"</tt>
|
||||
containing the word <tt>"term"</tt>.
|
||||
</p>
|
||||
<h4>
|
||||
<a href="BooleanQuery.html">BooleanQuery</a>
|
||||
{@link org.apache.lucene.search.BooleanQuery BooleanQuery}
|
||||
</h4>
|
||||
|
||||
<p>Things start to get interesting when one combines multiple
|
||||
<a href="TermQuery.html">TermQuery</a> instances into a <a
|
||||
href="BooleanQuery.html">BooleanQuery</a>.
|
||||
A <a href="BooleanQuery.html">BooleanQuery</a> contains multiple
|
||||
<a href="BooleanClause.html">BooleanClause</a>s,
|
||||
where each clause contains a sub-query (<a href="Query.html">Query</a>
|
||||
instance) and an operator (from <a
|
||||
href="BooleanClause.Occur.html">BooleanClause.Occur</a>)
|
||||
{@link org.apache.lucene.search.TermQuery TermQuery} instances into a
|
||||
{@link org.apache.lucene.search.BooleanQuery BooleanQuery}.
|
||||
A {@link org.apache.lucene.search.BooleanQuery BooleanQuery} contains multiple
|
||||
{@link org.apache.lucene.search.BooleanClause BooleanClause}s,
|
||||
where each clause contains a sub-query ({@link org.apache.lucene.search.Query Query}
|
||||
instance) and an operator (from
|
||||
{@link org.apache.lucene.search.BooleanClause.Occur BooleanClause.Occur})
|
||||
describing how that sub-query is combined with the other clauses:
|
||||
<ol>
|
||||
|
||||
<li><p>SHOULD — Use this operator when a clause can occur in the result set, but is not required.
|
||||
<li><p>{@link org.apache.lucene.search.BooleanClause.Occur#SHOULD SHOULD} — Use this operator when a clause can occur in the result set, but is not required.
|
||||
If a query is made up of all SHOULD clauses, then every document in the result
|
||||
set matches at least one of these clauses.</p></li>
|
||||
|
||||
<li><p>MUST — Use this operator when a clause is required to occur in the result set. Every
|
||||
<li><p>{@link org.apache.lucene.search.BooleanClause.Occur#MUST MUST} — Use this operator when a clause is required to occur in the result set. Every
|
||||
document in the result set will match
|
||||
all such clauses.</p></li>
|
||||
|
||||
<li><p>MUST NOT — Use this operator when a
|
||||
<li><p>{@link org.apache.lucene.search.BooleanClause.Occur#MUST_NOT MUST NOT} — Use this operator when a
|
||||
clause must not occur in the result set. No
|
||||
document in the result set will match
|
||||
any such clauses.</p></li>
|
||||
</ol>
|
||||
Boolean queries are constructed by adding two or more
|
||||
<a href="BooleanClause.html">BooleanClause</a>
|
||||
instances. If too many clauses are added, a <a href="BooleanQuery.TooManyClauses.html">TooManyClauses</a>
|
||||
{@link org.apache.lucene.search.BooleanClause BooleanClause}
|
||||
instances. If too many clauses are added, a {@link org.apache.lucene.search.BooleanQuery.TooManyClauses TooManyClauses}
|
||||
exception will be thrown during searching. This most often occurs
|
||||
when a <a href="Query.html">Query</a>
|
||||
is rewritten into a <a href="BooleanQuery.html">BooleanQuery</a> with many
|
||||
<a href="TermQuery.html">TermQuery</a> clauses,
|
||||
for example by <a href="WildcardQuery.html">WildcardQuery</a>.
|
||||
when a {@link org.apache.lucene.search.Query Query}
|
||||
is rewritten into a {@link org.apache.lucene.search.BooleanQuery BooleanQuery} with many
|
||||
{@link org.apache.lucene.search.TermQuery TermQuery} clauses,
|
||||
for example by {@link org.apache.lucene.search.WildcardQuery WildcardQuery}.
|
||||
The default setting for the maximum number
|
||||
of clauses is 1024, but this can be changed via the
|
||||
static method <a href="BooleanQuery.html#setMaxClauseCount(int)">setMaxClauseCount</a>
|
||||
in <a href="BooleanQuery.html">BooleanQuery</a>.
|
||||
static method {@link org.apache.lucene.search.BooleanQuery#setMaxClauseCount(int)}.
|
||||
</p>
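<p>The three operators described above can be illustrated with a small, self-contained sketch. This is a toy model and not Lucene code: the class <code>OccurSketch</code>, its <code>Clause</code> type, and the idea of treating a document as a plain set of terms are all assumptions made for illustration.</p>

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Toy model of SHOULD / MUST / MUST_NOT semantics. Hypothetical names, not the Lucene API.
final class OccurSketch {
    enum Occur { SHOULD, MUST, MUST_NOT }

    // A clause pairs an operator with a single term (standing in for a sub-query).
    static final class Clause {
        final Occur occur;
        final String term;
        Clause(Occur occur, String term) { this.occur = occur; this.term = term; }
    }

    // Returns true if a document, modelled as a set of terms, satisfies all clauses.
    static boolean matches(Set<String> docTerms, List<Clause> clauses) {
        boolean hasMust = false;
        boolean anyShouldMatched = false;
        for (Clause c : clauses) {
            boolean present = docTerms.contains(c.term);
            switch (c.occur) {
                case MUST_NOT:
                    if (present) return false;   // a prohibited term rejects the document
                    break;
                case MUST:
                    hasMust = true;
                    if (!present) return false;  // every required term must be present
                    break;
                case SHOULD:
                    anyShouldMatched |= present; // optional; only decisive without MUST clauses
                    break;
            }
        }
        // With no MUST clauses, at least one SHOULD clause has to match.
        return hasMust || anyShouldMatched;
    }

    public static void main(String[] args) {
        List<Clause> query = Arrays.asList(
            new Clause(Occur.MUST, "lucene"),
            new Clause(Occur.SHOULD, "search"),
            new Clause(Occur.MUST_NOT, "solr"));
        Set<String> doc1 = new HashSet<>(Arrays.asList("lucene", "index"));
        Set<String> doc2 = new HashSet<>(Arrays.asList("lucene", "solr"));
        System.out.println(matches(doc1, query)); // required term present, no prohibited term
        System.out.println(matches(doc2, query)); // rejected by the MUST_NOT clause
    }
}
```

<p>The sketch ignores scoring entirely; in a real query, matching SHOULD clauses would typically also raise a document's score.</p>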
|
||||
|
||||
<h4>Phrases</h4>
|
||||
|
||||
<p>Another common search is to find documents containing certain phrases. This
|
||||
is handled two different ways:
|
||||
is handled three different ways:
|
||||
<ol>
|
||||
<li>
|
||||
<p><a href="PhraseQuery.html">PhraseQuery</a>
|
||||
<p>{@link org.apache.lucene.search.PhraseQuery PhraseQuery}
|
||||
— Matches a sequence of
|
||||
<a href="../index/Term.html">Terms</a>.
|
||||
<a href="PhraseQuery.html">PhraseQuery</a> uses a slop factor to determine
|
||||
{@link org.apache.lucene.index.Term Term}s.
|
||||
{@link org.apache.lucene.search.PhraseQuery PhraseQuery} uses a slop factor to determine
|
||||
how many positions may occur between any two terms in the phrase and still be considered a match.</p>
|
||||
</li>
|
||||
<li>
|
||||
<p><a href="spans/SpanNearQuery.html">SpanNearQuery</a>
|
||||
<p>{@link org.apache.lucene.search.MultiPhraseQuery MultiPhraseQuery}
|
||||
— A more general form of PhraseQuery that accepts multiple Terms
|
||||
for a position in the phrase. For example, this can be used to perform phrase queries that also
|
||||
incorporate synonyms.</p>
|
||||
</li>
|
||||
<li>
|
||||
<p>{@link org.apache.lucene.search.spans.SpanNearQuery SpanNearQuery}
|
||||
— Matches a sequence of other
|
||||
<a href="spans/SpanQuery.html">SpanQuery</a>
|
||||
instances. <a href="spans/SpanNearQuery.html">SpanNearQuery</a> allows for
|
||||
{@link org.apache.lucene.search.spans.SpanQuery SpanQuery}
|
||||
instances. {@link org.apache.lucene.search.spans.SpanNearQuery SpanNearQuery} allows for
|
||||
much more
|
||||
complicated phrase queries since it is constructed from other <a
|
||||
href="spans/SpanQuery.html">SpanQuery</a>
|
||||
instances, instead of only <a href="TermQuery.html">TermQuery</a>
|
||||
complicated phrase queries since it is constructed from other
|
||||
{@link org.apache.lucene.search.spans.SpanQuery SpanQuery}
|
||||
instances, instead of only {@link org.apache.lucene.search.TermQuery TermQuery}
|
||||
instances.</p>
|
||||
</li>
|
||||
</ol>
|
||||
</p>
|
||||
|
||||
<h4>
|
||||
<a href="TermRangeQuery.html">TermRangeQuery</a>
|
||||
{@link org.apache.lucene.search.TermRangeQuery TermRangeQuery}
|
||||
</h4>
|
||||
|
||||
<p>The
|
||||
<a href="TermRangeQuery.html">TermRangeQuery</a>
|
||||
{@link org.apache.lucene.search.TermRangeQuery TermRangeQuery}
|
||||
matches all documents that occur in the
|
||||
exclusive range of a lower
|
||||
<a href="../index/Term.html">Term</a>
|
||||
{@link org.apache.lucene.index.Term Term}
|
||||
and an upper
|
||||
<a href="../index/Term.html">Term</a>.
|
||||
according to {@link java.lang.String#compareTo(String)}. It is not intended
|
||||
for numerical ranges, use <a href="NumericRangeQuery.html">NumericRangeQuery</a> instead.
|
||||
{@link org.apache.lucene.index.Term Term}
|
||||
according to {@link org.apache.lucene.index.TermsEnum#getComparator TermsEnum.getComparator()}. It is not intended
|
||||
for numerical ranges; use {@link org.apache.lucene.search.NumericRangeQuery NumericRangeQuery} instead.
|
||||
|
||||
For example, one could find all documents
|
||||
that have terms beginning with the letters <tt>a</tt> through <tt>c</tt>. This type of <a
|
||||
href="Query.html">Query</a> is frequently used to
|
||||
that have terms beginning with the letters <tt>a</tt> through <tt>c</tt>. This type of
|
||||
{@link org.apache.lucene.search.Query} is frequently used to
|
||||
find
|
||||
documents that occur in a specific date range.
|
||||
</p>
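<p>To see why a term range is a poor fit for numbers, the following sketch filters a list of terms by an exclusive range. It is an illustration only: <code>TermRangeSketch</code> is a hypothetical class, and it assumes plain <code>String.compareTo</code> ordering, which agrees with byte-wise term order for the ASCII digits used here.</p>

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Toy illustration of range matching by term order. Hypothetical class, not the Lucene API.
final class TermRangeSketch {
    // Keeps the terms that sort strictly between lower and upper (exclusive bounds).
    static List<String> inRange(List<String> terms, String lower, String upper) {
        List<String> hits = new ArrayList<>();
        for (String t : terms) {
            if (t.compareTo(lower) > 0 && t.compareTo(upper) < 0) {
                hits.add(t);
            }
        }
        return hits;
    }

    public static void main(String[] args) {
        List<String> terms = Arrays.asList("2", "10", "9", "100");
        // As strings, "10" and "100" sort between "1" and "3", although the
        // numbers 10 and 100 do not lie between 1 and 3.
        System.out.println(inRange(terms, "1", "3")); // prints [2, 10, 100]
    }
}
```

<p>Dates behave well in such a range only when they are indexed in a form whose string order matches chronological order, for example a fixed-width year-month-day string.</p>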
|
||||
|
||||
<h4>
|
||||
<a href="NumericRangeQuery.html">NumericRangeQuery</a>
|
||||
{@link org.apache.lucene.search.NumericRangeQuery NumericRangeQuery}
|
||||
</h4>
|
||||
|
||||
<p>The
|
||||
<a href="NumericRangeQuery.html">NumericRangeQuery</a>
|
||||
{@link org.apache.lucene.search.NumericRangeQuery NumericRangeQuery}
|
||||
matches all documents that occur in a numeric range.
|
||||
For NumericRangeQuery to work, you must index the values
|
||||
using a one of the numeric fields (<a href="../document/IntField.html">IntField</a>,
|
||||
<a href="../document/LongField.html">LongField</a>, <a href="../document/FloatField.html">FloatField</a>,
|
||||
or <a href="../document/DoubleField.html">DoubleField</a>).
|
||||
using one of the numeric fields ({@link org.apache.lucene.document.IntField IntField},
|
||||
{@link org.apache.lucene.document.LongField LongField}, {@link org.apache.lucene.document.FloatField FloatField},
|
||||
or {@link org.apache.lucene.document.DoubleField DoubleField}).
|
||||
</p>
|
||||
|
||||
<h4>
|
||||
<a href="PrefixQuery.html">PrefixQuery</a>,
|
||||
<a href="WildcardQuery.html">WildcardQuery</a>
|
||||
{@link org.apache.lucene.search.PrefixQuery PrefixQuery},
|
||||
{@link org.apache.lucene.search.WildcardQuery WildcardQuery},
|
||||
{@link org.apache.lucene.search.RegexpQuery RegexpQuery}
|
||||
</h4>
|
||||
|
||||
<p>While the
|
||||
<a href="PrefixQuery.html">PrefixQuery</a>
|
||||
{@link org.apache.lucene.search.PrefixQuery PrefixQuery}
|
||||
has a different implementation, it is essentially a special case of the
|
||||
<a href="WildcardQuery.html">WildcardQuery</a>.
|
||||
The <a href="PrefixQuery.html">PrefixQuery</a> allows an application
|
||||
to identify all documents with terms that begin with a certain string. The <a
|
||||
href="WildcardQuery.html">WildcardQuery</a> generalizes this by allowing
|
||||
{@link org.apache.lucene.search.WildcardQuery WildcardQuery}.
|
||||
The {@link org.apache.lucene.search.PrefixQuery PrefixQuery} allows an application
|
||||
to identify all documents with terms that begin with a certain string. The
|
||||
{@link org.apache.lucene.search.WildcardQuery WildcardQuery} generalizes this by allowing
|
||||
for the use of <tt>*</tt> (matches 0 or more characters) and <tt>?</tt> (matches exactly one character) wildcards.
|
||||
Note that the <a href="WildcardQuery.html">WildcardQuery</a> can be quite slow. Also
|
||||
Note that the {@link org.apache.lucene.search.WildcardQuery WildcardQuery} can be quite slow. Also
|
||||
note that
|
||||
<a href="WildcardQuery.html">WildcardQuery</a> should
|
||||
{@link org.apache.lucene.search.WildcardQuery WildcardQuery} should
|
||||
not start with <tt>*</tt> or <tt>?</tt>, as such leading wildcards are extremely slow.
|
||||
To remove this protection and allow a wildcard at the beginning of a term, see method
|
||||
<a href="../queryParser/QueryParser.html#setAllowLeadingWildcard(boolean)">setAllowLeadingWildcard</a> in
|
||||
<a href="../queryParser/QueryParser.html">QueryParser</a>.
|
||||
Some QueryParsers may not allow this by default, but provide a <code>setAllowLeadingWildcard</code> method
|
||||
to remove this protection.
|
||||
The {@link org.apache.lucene.search.RegexpQuery RegexpQuery} is even more general than WildcardQuery,
|
||||
allowing an application to identify all documents with terms that match a regular expression pattern.
|
||||
</p>
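<p>The wildcard rules above (<tt>*</tt> for zero or more characters, <tt>?</tt> for exactly one) can be sketched as a simple matcher over a single term. This is not how Lucene evaluates wildcard queries; <code>WildcardSketch</code> is a hypothetical helper that only demonstrates the matching semantics.</p>

```java
// Toy wildcard matcher for a single term. Hypothetical class, not the Lucene implementation.
final class WildcardSketch {
    // Returns true if term matches pattern, where '*' matches zero or more
    // characters and '?' matches exactly one character.
    static boolean matches(String pattern, String term) {
        int p = 0, t = 0;
        int starP = -1, starT = -1; // position of the last '*' and the term index it was tried at
        while (t < term.length()) {
            if (p < pattern.length() && pattern.charAt(p) == '*') {
                starP = p;          // remember the star and first try matching it against nothing
                starT = t;
                p++;
            } else if (p < pattern.length()
                    && (pattern.charAt(p) == '?' || pattern.charAt(p) == term.charAt(t))) {
                p++;                // single-character match
                t++;
            } else if (starP >= 0) {
                p = starP + 1;      // backtrack: let the last star absorb one more character
                starT++;
                t = starT;
            } else {
                return false;
            }
        }
        // Any trailing stars can match the empty string.
        while (p < pattern.length() && pattern.charAt(p) == '*') {
            p++;
        }
        return p == pattern.length();
    }

    public static void main(String[] args) {
        System.out.println(matches("lu*", "lucene"));  // prefix-style pattern
        System.out.println(matches("te?t", "text"));   // '?' stands for exactly one character
        System.out.println(matches("*ene", "lucene")); // leading wildcard
    }
}
```

<p>A pattern with a fixed prefix such as <tt>lu*</tt> lets a search skip terms that do not share the prefix, whereas a leading wildcard such as <tt>*ene</tt> has to be tested against every term, which is why such patterns are slow.</p>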
|
||||
<h4>
|
||||
<a href="FuzzyQuery.html">FuzzyQuery</a>
|
||||
{@link org.apache.lucene.search.FuzzyQuery FuzzyQuery}
|
||||
</h4>
|
||||
|
||||
<p>A
|
||||
<a href="FuzzyQuery.html">FuzzyQuery</a>
|
||||
{@link org.apache.lucene.search.FuzzyQuery FuzzyQuery}
|
||||
matches documents that contain terms similar to the specified term. Similarity is
|
||||
determined using
|
||||
<a href="http://en.wikipedia.org/wiki/Levenshtein">Levenshtein (edit) distance</a>.
|
||||
@ -206,58 +213,9 @@ org.apache.lucene.search.IndexSearcher#search(Query,Filter,int)}.
|
||||
<a name="changingSimilarity"></a>
|
||||
<h2>Changing Similarity</h2>
|
||||
|
||||
<p>Chances are <a href="DefaultSimilarity.html">DefaultSimilarity</a> is sufficient for all
|
||||
your searching needs.
|
||||
However, in some applications it may be necessary to customize your <a
|
||||
href="Similarity.html">Similarity</a> implementation. For instance, some
|
||||
applications do not need to
|
||||
distinguish between shorter and longer documents (see <a
|
||||
href="http://www.gossamer-threads.com/lists/lucene/java-user/38967#38967">a "fair" similarity</a>).</p>
|
||||
See the {@link org.apache.lucene.search.similarities} package documentation for information
|
||||
on the available scoring models and extending or changing Similarity.
|
||||
|
||||
<p>To change <a href="Similarity.html">Similarity</a>, one must do so for both indexing and
|
||||
searching, and the changes must happen before
|
||||
either of these actions take place. Although in theory there is nothing stopping you from changing mid-stream, it
|
||||
just isn't well-defined what is going to happen.
|
||||
</p>
|
||||
|
||||
<p>To make this change, implement your own <a href="Similarity.html">Similarity</a> (likely
|
||||
you'll want to simply subclass
|
||||
<a href="DefaultSimilarity.html">DefaultSimilarity</a>) and then use the new
|
||||
class by calling
|
||||
<a href="../index/IndexWriter.html#setSimilarity(org.apache.lucene.search.Similarity)">IndexWriter.setSimilarity</a>
|
||||
before indexing and
|
||||
<a href="Searcher.html#setSimilarity(org.apache.lucene.search.Similarity)">Searcher.setSimilarity</a>
|
||||
before searching.
|
||||
</p>
|
||||
|
||||
<p>
|
||||
If you are interested in use cases for changing your similarity, see the Lucene users's mailing list at <a
|
||||
href="http://www.nabble.com/Overriding-Similarity-tf2128934.html">Overriding Similarity</a>.
|
||||
In summary, here are a few use cases:
|
||||
<ol>
|
||||
<li><p><a href="api/org/apache/lucene/misc/SweetSpotSimilarity.html">SweetSpotSimilarity</a> — <a
|
||||
href="api/org/apache/lucene/misc/SweetSpotSimilarity.html">SweetSpotSimilarity</a> gives small increases
|
||||
as the frequency increases a small amount
|
||||
and then greater increases when you hit the "sweet spot", i.e. where you think the frequency of terms is
|
||||
more significant.</p></li>
|
||||
<li><p>Overriding tf — In some applications, it doesn't matter what the score of a document is as long as a
|
||||
matching term occurs. In these
|
||||
cases people have overridden Similarity to return 1 from the tf() method.</p></li>
|
||||
<li><p>Changing Length Normalization — By overriding <a
|
||||
href="Similarity.html#lengthNorm(java.lang.String,%20int)">lengthNorm</a>,
|
||||
it is possible to discount how the length of a field contributes
|
||||
to a score. In <a href="DefaultSimilarity.html">DefaultSimilarity</a>,
|
||||
lengthNorm = 1 / (numTerms in field)^0.5, but if one changes this to be
|
||||
1 / (numTerms in field), all fields will be treated
|
||||
<a href="http://www.gossamer-threads.com/lists/lucene/java-user/38967#38967">"fairly"</a>.</p></li>
|
||||
</ol>
|
||||
In general, Chris Hostetter sums it up best in saying (from <a
|
||||
href="http://www.gossamer-threads.com/lists/lucene/java-user/39125#39125">the Lucene users's mailing list</a>):
|
||||
<blockquote>[One would override the Similarity in] ... any situation where you know more about your data then just
|
||||
that
|
||||
it's "text" is a situation where it *might* make sense to to override your
|
||||
Similarity method.</blockquote>
|
||||
</p>
|
||||
<a name="scoring"></a>
|
||||
<h2>Changing Scoring — Expert Level</h2>
|
||||
|
||||
@ -270,112 +228,130 @@ org.apache.lucene.search.IndexSearcher#search(Query,Filter,int)}.
|
||||
<span >three main classes</span>:
|
||||
<ol>
|
||||
<li>
|
||||
<a href="Query.html">Query</a> — The abstract object representation of the
|
||||
{@link org.apache.lucene.search.Query Query} — The abstract object representation of the
|
||||
user's information need.</li>
|
||||
<li>
|
||||
<a href="Weight.html">Weight</a> — The internal interface representation of
|
||||
{@link org.apache.lucene.search.Weight Weight} — The internal interface representation of
|
||||
the user's Query, so that Query objects may be reused.</li>
|
||||
<li>
|
||||
<a href="Scorer.html">Scorer</a> — An abstract class containing common
|
||||
{@link org.apache.lucene.search.Scorer Scorer} — An abstract class containing common
|
||||
functionality for scoring. Provides both scoring and explanation capabilities.</li>
|
||||
</ol>
|
||||
Details on each of these classes, and their children, can be found in the subsections below.
|
||||
</p>
|
||||
<h4>The Query Class</h4>
|
||||
<p>In some sense, the
|
||||
<a href="Query.html">Query</a>
|
||||
{@link org.apache.lucene.search.Query Query}
|
||||
class is where it all begins. Without a Query, there would be
|
||||
nothing to score. Furthermore, the Query class is the catalyst for the other scoring classes as it
|
||||
is often responsible
|
||||
for creating them or coordinating the functionality between them. The
|
||||
<a href="Query.html">Query</a> class has several methods that are important for
|
||||
{@link org.apache.lucene.search.Query Query} class has several methods that are important for
|
||||
derived classes:
|
||||
<ol>
|
||||
<li>createWeight(Searcher searcher) — A
|
||||
<a href="Weight.html">Weight</a> is the internal representation of the
|
||||
<li>{@link org.apache.lucene.search.Query#createWeight(IndexSearcher) createWeight(IndexSearcher searcher)} — A
|
||||
{@link org.apache.lucene.search.Weight Weight} is the internal representation of the
|
||||
Query, so each Query implementation must
|
||||
provide an implementation of Weight. See the subsection on <a
|
||||
href="#The Weight Interface">The Weight Interface</a> below for details on implementing the Weight
|
||||
interface.</li>
|
||||
<li>rewrite(IndexReader reader) — Rewrites queries into primitive queries. Primitive queries are:
|
||||
<a href="TermQuery.html">TermQuery</a>,
|
||||
<a href="BooleanQuery.html">BooleanQuery</a>, <span
|
||||
>and other queries that implement Query.html#createWeight(Searcher searcher)</span></li>
|
||||
<li>{@link org.apache.lucene.search.Query#rewrite(IndexReader) rewrite(IndexReader reader)} — Rewrites queries into primitive queries. Primitive queries are:
|
||||
{@link org.apache.lucene.search.TermQuery TermQuery},
|
||||
{@link org.apache.lucene.search.BooleanQuery BooleanQuery}, <span
|
||||
>and other queries that implement {@link org.apache.lucene.search.Query#createWeight(IndexSearcher) createWeight(IndexSearcher searcher)}</span></li>
|
||||
</ol>
|
||||
</p>
|
||||
<h4>The Weight Interface</h4>
|
||||
<p>The
|
||||
<a href="Weight.html">Weight</a>
|
||||
{@link org.apache.lucene.search.Weight Weight}
|
||||
interface provides an internal representation of the Query so that it can be reused. Any
|
||||
<a href="Searcher.html">Searcher</a>
|
||||
{@link org.apache.lucene.search.IndexSearcher IndexSearcher}
|
||||
dependent state should be stored in the Weight implementation,
|
||||
not in the Query class. The interface defines six methods that must be implemented:
|
||||
not in the Query class. The interface defines five methods that must be implemented:
|
||||
<ol>
|
||||
<li>
|
||||
<a href="Weight.html#getQuery()">Weight#getQuery()</a> — Pointer to the
|
||||
{@link org.apache.lucene.search.Weight#getQuery getQuery()} — Pointer to the
|
||||
Query that this Weight represents.</li>
|
||||
<li>
|
||||
<a href="Weight.html#getValue()">Weight#getValue()</a> — The weight for
|
||||
this Query. For example, the TermQuery.TermWeight value is
|
||||
equal to the idf^2 * boost * queryNorm <!-- DOUBLE CHECK THIS --></li>
|
||||
{@link org.apache.lucene.search.Weight#getValueForNormalization() getValueForNormalization()} —
|
||||
A weight can return a floating point value to indicate its magnitude for query normalization. Typically
|
||||
a weight such as TermWeight that scores via a {@link org.apache.lucene.search.similarities.Similarity Similarity}
|
||||
will just defer to the Similarity's implementation:
|
||||
{@link org.apache.lucene.search.similarities.Similarity.SimWeight#getValueForNormalization SimWeight#getValueForNormalization()}.
|
||||
For example, with {@link org.apache.lucene.search.similarities.TFIDFSimilarity Lucene's classic vector-space formula}, this
|
||||
is implemented as the sum of squared weights: <code>(idf * boost)<sup>2</sup></code></li>
|
||||
<li>
|
||||
<a href="Weight.html#sumOfSquaredWeights()">
|
||||
Weight#sumOfSquaredWeights()</a> — The sum of squared weights. For TermQuery, this is (idf *
|
||||
boost)^2</li>
|
||||
<li>
|
||||
<a href="Weight.html#normalize(float)">
|
||||
Weight#normalize(float)</a> — Determine the query normalization factor. The query normalization may
|
||||
{@link org.apache.lucene.search.Weight#normalize(float,float) normalize(float norm, float topLevelBoost)} —
|
||||
Performs query normalization:
|
||||
<ul>
|
||||
<li><code>topLevelBoost</code>: A query-boost factor from any wrapping queries that should be multiplied into every
|
||||
document's score. For example, a TermQuery that is wrapped within a BooleanQuery with a boost of <code>5</code> would
|
||||
receive this value at this time. This allows the TermQuery (the leaf node in this case) to compute this up-front
|
||||
a single time (e.g. by multiplying into the IDF), rather than for every document.</li>
|
||||
<li><code>norm</code>: Passes in a normalization factor which may
|
||||
allow for comparing scores between queries.</li>
|
||||
</ul>
|
||||
Typically a weight such as TermWeight
|
||||
that scores via a {@link org.apache.lucene.search.similarities.Similarity Similarity} will just defer to the Similarity's implementation:
|
||||
{@link org.apache.lucene.search.similarities.Similarity.SimWeight#normalize SimWeight#normalize(float,float)}.</li>
|
||||
<li>
|
||||
<a href="Weight.html#scorer(org.apache.lucene.index.AtomicReaderContext, boolean, boolean)">
|
||||
Weight#scorer(AtomicReaderContext, boolean, boolean)</a> — Construct a new
|
||||
<a href="Scorer.html">Scorer</a>
|
||||
for this Weight. See
|
||||
<a href="#The Scorer Class">The Scorer Class</a>
|
||||
below for help defining a Scorer. As the name implies, the
|
||||
Scorer is responsible for doing the actual scoring of documents given the Query.
|
||||
{@link org.apache.lucene.search.Weight#scorer(org.apache.lucene.index.AtomicReaderContext, boolean, boolean, org.apache.lucene.util.Bits)
|
||||
scorer(AtomicReaderContext context, boolean scoresDocsInOrder, boolean topScorer, Bits acceptDocs)} —
|
||||
Construct a new {@link org.apache.lucene.search.Scorer Scorer} for this Weight. See <a href="#The Scorer Class">The Scorer Class</a>
|
||||
below for help defining a Scorer. As the name implies, the Scorer is responsible for doing the actual scoring of documents
|
||||
given the Query.
|
||||
</li>
|
||||
<li>
|
||||
<a href="Weight.html#explain(org.apache.lucene.search.Searcher, org.apache.lucene.index.AtomicReaderContext, int)">
|
||||
Weight#explain(Searcher, AtomicReaderContext, int)</a> — Provide a means for explaining why a given document was
|
||||
scored
|
||||
the way it was.</li>
|
||||
{@link org.apache.lucene.search.Weight#explain(org.apache.lucene.index.AtomicReaderContext, int)
|
||||
explain(AtomicReaderContext context, int doc)} — Provide a means for explaining why a given document was
|
||||
scored the way it was.
|
||||
Typically a weight such as TermWeight
|
||||
that scores via a {@link org.apache.lucene.search.similarities.Similarity Similarity} will make use of the Similarity's implementations:
|
||||
{@link org.apache.lucene.search.similarities.Similarity.ExactSimScorer#explain(int, Explanation) ExactSimScorer#explain(int doc, Explanation freq)},
|
||||
and {@link org.apache.lucene.search.similarities.Similarity.SloppySimScorer#explain(int, Explanation) SloppySimScorer#explain(int doc, Explanation freq)}
|
||||
</li>
|
||||
|
||||
</ol>
|
||||
</p>
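<p>The arithmetic behind query normalization can be sketched as follows. The sketch assumes the classic vector-space formulation, in which each leaf contributes <code>(idf * boost)<sup>2</sup></code> and the normalization factor is one over the square root of the summed contributions. The class <code>QueryNormSketch</code> and its method names are hypothetical, and a real Similarity may normalize differently.</p>

```java
// Toy arithmetic for classic TF-IDF query normalization. Hypothetical names, not the Lucene API.
final class QueryNormSketch {
    // Contribution of one leaf weight to the sum of squared weights: (idf * boost)^2.
    static float valueForNormalization(float idf, float boost) {
        float w = idf * boost;
        return w * w;
    }

    // Assumed classic normalization factor: 1 / sqrt(sum of squared weights).
    static float queryNorm(float sumOfSquaredWeights) {
        return (float) (1.0 / Math.sqrt(sumOfSquaredWeights));
    }

    // Query-side weight of a leaf after normalization. The boost of any wrapping
    // query (topLevelBoost) is folded in once here rather than per document.
    static float normalizedWeight(float idf, float boost, float norm, float topLevelBoost) {
        return idf * boost * norm * topLevelBoost;
    }

    public static void main(String[] args) {
        // Two leaf terms with illustrative idf and boost values.
        float sum = valueForNormalization(3f, 1f) + valueForNormalization(2f, 2f);
        float norm = queryNorm(sum);
        System.out.println("sum of squared weights = " + sum);
        System.out.println("query norm             = " + norm);
        System.out.println("weight of first term   = " + normalizedWeight(3f, 1f, norm, 1f));
    }
}
```

<p>Because the factor depends only on the query, it is the same for every document and does not change the ranking within one query; its purpose is to make scores from different queries more comparable.</p>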
|
||||
<h4>The Scorer Class</h4>
|
||||
<p>The
|
||||
<a href="Scorer.html">Scorer</a>
|
||||
{@link org.apache.lucene.search.Scorer Scorer}
|
||||
abstract class provides common scoring functionality for all Scorer implementations and
|
||||
is the heart of the Lucene scoring process. The Scorer defines the following abstract (some of them are not
|
||||
yet abstract, but will be in future versions and should be considered as such now) methods which
|
||||
must be implemented (some of them inherited from <a href="DocIdSetIterator.html">DocIdSetIterator</a> ):
|
||||
must be implemented (some of them inherited from {@link org.apache.lucene.search.DocIdSetIterator DocIdSetIterator}):
|
||||
<ol>
|
||||
<li>
|
||||
<a href="DocIdSetIterator.html#nextDoc()">DocIdSetIterator#nextDoc()</a> — Advances to the next
|
||||
document that matches this Query, returning true if and only
|
||||
if there is another document that matches.</li>
|
||||
{@link org.apache.lucene.search.Scorer#nextDoc nextDoc()} — Advances to the next
|
||||
document that matches this Query, returning its document id, or <code>NO_MORE_DOCS</code> if no further documents match.</li>
|
||||
<li>
|
||||
<a href="DocIdSetIterator.html#docID()">DocIdSetIterator#docID()</a> — Returns the id of the
|
||||
<a href="../document/Document.html">Document</a>
|
||||
that contains the match. It is not valid until next() has been called at least once.
|
||||
{@link org.apache.lucene.search.Scorer#docID docID()} — Returns the id of the
|
||||
{@link org.apache.lucene.document.Document Document} that contains the match.
|
||||
</li>
|
||||
<li>
|
||||
<a href="Scorer.html#score(org.apache.lucene.search.Collector)">Scorer#score(Collector)</a> —
|
||||
Scores and collects all matching documents using the given Collector.
|
||||
{@link org.apache.lucene.search.Scorer#score score()} — Return the score of the
|
||||
current document. This value can be determined in any appropriate way for an application. For instance, the
|
||||
{@link org.apache.lucene.search.TermScorer TermScorer} simply defers to the configured Similarity:
|
||||
{@link org.apache.lucene.search.similarities.Similarity.ExactSimScorer#score(int, int) ExactSimScorer.score(int doc, int freq)}.
|
||||
</li>
|
||||
<li>
|
||||
<a href="Scorer.html#score()">Scorer#score()</a> — Return the score of the
|
||||
current document. This value can be determined in any
|
||||
appropriate way for an application. For instance, the
|
||||
<a href="http://svn.apache.org/viewvc/lucene/dev/trunk/lucene/core/src/java/org/apache/lucene/search/TermScorer.java?view=log">TermScorer</a>
|
||||
returns the tf * Weight.getValue() * fieldNorm.
|
||||
{@link org.apache.lucene.search.Scorer#freq freq()} — Returns the number of matches
|
||||
for the current document. This value can be determined in any appropriate way for an application. For instance, the
|
||||
{@link org.apache.lucene.search.TermScorer TermScorer} simply defers to the term frequency from the inverted index:
|
||||
{@link org.apache.lucene.index.DocsEnum#freq DocsEnum.freq()}.
|
||||
</li>
|
||||
<li>
|
||||
<a href="DocIdSetIterator.html#advance(int)">DocIdSetIterator#advance(int)</a> — Skip ahead in
|
||||
{@link org.apache.lucene.search.Scorer#advance advance()} — Skip ahead in
|
||||
the document matches to the document whose id is greater than
|
||||
or equal to the passed in value. In many instances, advance can be
|
||||
implemented more efficiently than simply looping through all the matching documents until
|
||||
the target document is identified.</li>
|
||||
the target document is identified.
|
||||
</li>
|
||||
<li>
|
||||
{@link org.apache.lucene.search.Scorer#getChildren getChildren()} — Returns any child subscorers
|
||||
underneath this scorer. This allows for users to navigate the scorer hierarchy and receive more fine-grained
|
||||
details on the scoring process.
|
||||
</li>
|
||||
</ol>
|
||||
</p>
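The iteration contract described in the list above (position on a match, report its id, skip ahead to a target) can be illustrated with a small stand-alone sketch. This is plain Java and does not use or reproduce Lucene's classes: `ToyDocIterator`, its backing array, and the sentinel value are hypothetical names chosen for illustration. It assumes the matching document ids are already available as a sorted array, which is what lets `advance` use a binary search instead of stepping through every match.

```java
import java.util.Arrays;

/** Toy iterator over a sorted array of matching doc ids; not a Lucene class. */
final class ToyDocIterator {
    /** Sentinel returned once the matches are exhausted (a choice made for this sketch). */
    static final int NO_MORE_DOCS = Integer.MAX_VALUE;

    private final int[] docs; // sorted ascending, no duplicates
    private int index = -1;   // -1 means "not positioned yet"

    ToyDocIterator(int[] sortedDocs) {
        this.docs = sortedDocs.clone();
    }

    /** Current doc id: -1 before the first move, NO_MORE_DOCS when exhausted. */
    int docID() {
        if (index < 0) {
            return -1;
        }
        return index < docs.length ? docs[index] : NO_MORE_DOCS;
    }

    /** Step to the next match and return its id. */
    int nextDoc() {
        if (index < docs.length) {
            index++;
        }
        return docID();
    }

    /** Jump to the first match after the current one whose id is >= target. */
    int advance(int target) {
        int from = Math.min(index + 1, docs.length);
        // Binary search over the remaining matches instead of a linear scan.
        int pos = Arrays.binarySearch(docs, from, docs.length, target);
        index = pos >= 0 ? pos : -(pos + 1); // insertion point when target is absent
        return docID();
    }
}
```

With matches `{2, 5, 9, 14}`, calling `nextDoc()` positions the iterator on 2, and a following `advance(6)` lands on 9 without visiting 5. A real scorer would typically skip within postings data rather than an in-memory array, so treat this only as a picture of the contract.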
<h4>Why would I want to add my own Query?</h4>
@ -25,8 +25,8 @@
</DIV>
<div>
<ol>
<li><a href="./PayloadTermQuery.html">PayloadTermQuery</a> -- Boost a term's score based on the value of the payload located at that term.</li>
<li><a href="./PayloadNearQuery.html">PayloadNearQuery</a> -- A <a href="../spans/SpanNearQuery.html">SpanNearQuery</a> that factors in the value of the payloads located
<li>{@link org.apache.lucene.search.payloads.PayloadTermQuery PayloadTermQuery} -- Boost a term's score based on the value of the payload located at that term.</li>
<li>{@link org.apache.lucene.search.payloads.PayloadNearQuery PayloadNearQuery} -- A {@link org.apache.lucene.search.spans.SpanNearQuery SpanNearQuery} that factors in the value of the payloads located
at each of the positions where the spans occur.</li>
</ol>
</div>
@ -105,7 +105,7 @@ implement the {@link org.apache.lucene.search.similarities.SimilarityBase#score(
and {@link org.apache.lucene.search.similarities.SimilarityBase#toString()}
methods.</p>

<p>Another options is to extend one of the <a href="#framework">frameworks</a>
<p>Another option is to extend one of the <a href="#framework">frameworks</a>
based on {@link org.apache.lucene.search.similarities.SimilarityBase}. These
Similarities are implemented modularly, e.g.
{@link org.apache.lucene.search.similarities.DFRSimilarity} delegates
@ -26,29 +26,30 @@ The calculus of spans.

<ul>

<li>A <a href="SpanTermQuery.html">SpanTermQuery</a> matches all spans
containing a particular <a href="../../index/Term.html">Term</a>.</li>
<li>A {@link org.apache.lucene.search.spans.SpanTermQuery SpanTermQuery} matches all spans
containing a particular {@link org.apache.lucene.index.Term Term}.</li>

<li> A <a href="SpanNearQuery.html">SpanNearQuery</a> matches spans
<li> A {@link org.apache.lucene.search.spans.SpanNearQuery SpanNearQuery} matches spans
which occur near one another, and can be used to implement things like
phrase search (when constructed from <a
href="SpanTermQuery.html">SpanTermQueries</a>) and inter-phrase
proximity (when constructed from other <a
href="SpanNearQuery.html">SpanNearQueries</a>).</li>
phrase search (when constructed from {@link org.apache.lucene.search.spans.SpanTermQuery}s)
and inter-phrase proximity (when constructed from other {@link org.apache.lucene.search.spans.SpanNearQuery}s).</li>

<li>A <a href="SpanOrQuery.html">SpanOrQuery</a> merges spans from a
number of other <a href="SpanQuery.html">SpanQueries</a>.</li>
<li>A {@link org.apache.lucene.search.spans.SpanOrQuery SpanOrQuery} merges spans from a
number of other {@link org.apache.lucene.search.spans.SpanQuery}s.</li>

<li>A <a href="SpanNotQuery.html">SpanNotQuery</a> removes spans
matching one <a href="SpanQuery.html">SpanQuery</a> which overlap
<li>A {@link org.apache.lucene.search.spans.SpanNotQuery SpanNotQuery} removes spans
matching one {@link org.apache.lucene.search.spans.SpanQuery SpanQuery} which overlap
another. This can be used, e.g., to implement within-paragraph
search.</li>

<li>A <a href="SpanFirstQuery.html">SpanFirstQuery</a> matches spans
<li>A {@link org.apache.lucene.search.spans.SpanFirstQuery SpanFirstQuery} matches spans
matching <code>q</code> whose end position is less than
<code>n</code>. This can be used to constrain matches to the first
part of the document.</li>

<li>A {@link org.apache.lucene.search.spans.SpanPositionRangeQuery SpanPositionRangeQuery} is
a more general form of SpanFirstQuery that can constrain matches to arbitrary portions of the document.</li>

</ul>
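The "removes spans which overlap another" behaviour described for SpanNotQuery in the list above can be pictured with a toy model. The sketch below is plain Java and is not Lucene's API: `ToySpans`, `Span`, and `not` are hypothetical names, and spans are assumed to be half-open position ranges `[start, end)`. It only shows the filtering idea behind within-paragraph search, where candidate matches that straddle a paragraph-break marker are discarded.

```java
import java.util.ArrayList;
import java.util.List;

/** Toy model of spans as half-open position ranges [start, end); not Lucene's API. */
final class ToySpans {
    /** One span: positions from start (inclusive) to end (exclusive). */
    static final class Span {
        final int start;
        final int end;

        Span(int start, int end) {
            this.start = start;
            this.end = end;
        }

        /** Two half-open ranges overlap when each starts before the other ends. */
        boolean overlaps(Span other) {
            return start < other.end && other.start < end;
        }

        @Override
        public String toString() {
            return "[" + start + "," + end + ")";
        }
    }

    /** SpanNot-style filter: keep the spans of include that overlap no span of exclude. */
    static List<Span> not(List<Span> include, List<Span> exclude) {
        List<Span> kept = new ArrayList<>();
        for (Span candidate : include) {
            boolean clash = false;
            for (Span blocker : exclude) {
                if (candidate.overlaps(blocker)) {
                    clash = true;
                    break;
                }
            }
            if (!clash) {
                kept.add(candidate);
            }
        }
        return kept;
    }
}
```

For example, if candidate matches cover positions `[1,4)`, `[8,12)` and `[15,17)` and a paragraph-break marker sits at `[10,11)`, the filter keeps only `[1,4)` and `[15,17)`, because the middle match crosses the break.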
In all cases, output spans are minimally inclusive. In other words, a