<!DOCTYPE html PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN" "http://www.w3.org/TR/html4/loose.dtd">
<html>
<head>
<META http-equiv="Content-Type" content="text/html; charset=UTF-8">
<meta content="Apache Forrest" name="Generator">
<meta name="Forrest-version" content="0.8">
<meta name="Forrest-skin-name" content="pelt">
<title>
Apache Lucene - Scoring
</title>
<link type="text/css" href="skin/basic.css" rel="stylesheet">
<link media="screen" type="text/css" href="skin/screen.css" rel="stylesheet">
<link media="print" type="text/css" href="skin/print.css" rel="stylesheet">
<link type="text/css" href="skin/profile.css" rel="stylesheet">
<script src="skin/getBlank.js" language="javascript" type="text/javascript"></script><script src="skin/getMenu.js" language="javascript" type="text/javascript"></script><script src="skin/fontsize.js" language="javascript" type="text/javascript"></script>
<link rel="shortcut icon" href="images/favicon.ico">
</head>
<body onload="init()">
<script type="text/javascript">ndeSetTextSize();</script>
<div id="top">
<!--+
|breadtrail
+-->
<div class="breadtrail">
<a href="http://www.apache.org/">Apache</a> > <a href="http://lucene.apache.org/">Lucene</a><script src="skin/breadcrumbs.js" language="JavaScript" type="text/javascript"></script>
</div>
<!--+
|header
+-->
<div class="header">
<!--+
|start group logo
+-->
<div class="grouplogo">
<a href="http://lucene.apache.org/"><img class="logoImage" alt="Lucene" src="http://www.apache.org/images/asf_logo_simple.png" title="Apache Lucene"></a>
</div>
<!--+
|end group logo
+-->
<!--+
|start Project Logo
+-->
<div class="projectlogo">
<a href="http://lucene.apache.org/java/"><img class="logoImage" alt="Lucene" src="http://lucene.apache.org/images/lucene_green_300.gif" title="Apache Lucene is a high-performance, full-featured text search engine library written entirely in
Java. It is a technology suitable for nearly any application that requires full-text search, especially cross-platform."></a>
</div>
<!--+
|end Project Logo
+-->
<!--+
|start Search
+-->
<div class="searchbox">
<form action="http://www.google.com/search" method="get" class="roundtopsmall">
<input value="lucene.apache.org" name="sitesearch" type="hidden"><input onFocus="getBlank (this, 'Search the site with google');" size="25" name="q" id="query" type="text" value="Search the site with google">
<input name="Search" value="Search" type="submit">
</form>
</div>
<!--+
|end search
+-->
<!--+
|start Tabs
+-->
<ul id="tabs">
<li class="current">
<a class="selected" href="http://lucene.apache.org/java/docs/">Main</a>
</li>
<li>
<a class="unselected" href="http://wiki.apache.org/lucene-java">Wiki</a>
</li>
<li class="current">
<a class="selected" href="index.html">Lucene 2.9-dev Documentation</a>
</li>
</ul>
<!--+
|end Tabs
+-->
</div>
</div>
<div id="main">
<div id="publishedStrip">
<!--+
|start Subtabs
+-->
<div id="level2tabs"></div>
<!--+
|end Endtabs
+-->
<script type="text/javascript"><!--
document.write("Last Published: " + document.lastModified);
// --></script>
</div>
<!--+
|breadtrail
+-->
<div class="breadtrail">
</div>
<!--+
|start Menu, mainarea
+-->
<!--+
|start Menu
+-->
<div id="menu">
<div onclick="SwitchMenu('menu_selected_1.1', 'skin/')" id="menu_selected_1.1Title" class="menutitle" style="background-image: url('skin/images/chapter_open.gif');">Documentation</div>
<div id="menu_selected_1.1" class="selectedmenuitemgroup" style="display: block;">
<div class="menuitem">
<a href="index.html">Overview</a>
</div>
<div onclick="SwitchMenu('menu_1.1.2', 'skin/')" id="menu_1.1.2Title" class="menutitle">Javadocs</div>
<div id="menu_1.1.2" class="menuitemgroup">
<div class="menuitem">
<a href="api/all/index.html">All</a>
</div>
<div class="menuitem">
<a href="api/core/index.html">Core</a>
</div>
<div class="menuitem">
<a href="api/demo/index.html">Demo</a>
</div>
<div onclick="SwitchMenu('menu_1.1.2.4', 'skin/')" id="menu_1.1.2.4Title" class="menutitle">Contrib</div>
<div id="menu_1.1.2.4" class="menuitemgroup">
<div class="menuitem">
<a href="api/contrib-analyzers/index.html">Analyzers</a>
</div>
<div class="menuitem">
<a href="api/contrib-ant/index.html">Ant</a>
</div>
<div class="menuitem">
<a href="api/contrib-bdb/index.html">Bdb</a>
</div>
<div class="menuitem">
<a href="api/contrib-bdb-je/index.html">Bdb-je</a>
</div>
<div class="menuitem">
<a href="api/contrib-benchmark/index.html">Benchmark</a>
</div>
<div class="menuitem">
<a href="api/contrib-highlighter/index.html">Highlighter</a>
</div>
<div class="menuitem">
<a href="api/contrib-instantiated/index.html">Instantiated</a>
</div>
<div class="menuitem">
<a href="api/contrib-lucli/index.html">Lucli</a>
</div>
<div class="menuitem">
<a href="api/contrib-memory/index.html">Memory</a>
</div>
<div class="menuitem">
<a href="api/contrib-misc/index.html">Miscellaneous</a>
</div>
<div class="menuitem">
<a href="api/contrib-queries/index.html">Queries</a>
</div>
<div class="menuitem">
<a href="api/contrib-regex/index.html">Regex</a>
</div>
<div class="menuitem">
<a href="api/contrib-snowball/index.html">Snowball</a>
</div>
<div class="menuitem">
<a href="api/contrib-spellchecker/index.html">Spellchecker</a>
</div>
<div class="menuitem">
<a href="api/contrib-surround/index.html">Surround</a>
</div>
<div class="menuitem">
<a href="api/contrib-swing/index.html">Swing</a>
</div>
<div class="menuitem">
<a href="api/contrib-wikipedia/index.html">Wikipedia</a>
</div>
<div class="menuitem">
<a href="api/contrib-wordnet/index.html">Wordnet</a>
</div>
<div class="menuitem">
<a href="api/contrib-xml-query-parser/index.html">XML Query Parser</a>
</div>
</div>
</div>
<div class="menuitem">
<a href="benchmarks.html">Benchmarks</a>
</div>
<div class="menuitem">
<a href="contributions.html">Contributions</a>
</div>
<div class="menuitem">
<a href="http://wiki.apache.org/lucene-java/LuceneFAQ">FAQ</a>
</div>
<div class="menuitem">
<a href="fileformats.html">File Formats</a>
</div>
<div class="menuitem">
<a href="gettingstarted.html">Getting Started</a>
</div>
<div class="menuitem">
<a href="lucene-sandbox/index.html">Lucene Sandbox</a>
</div>
<div class="menuitem">
<a href="queryparsersyntax.html">Query Syntax</a>
</div>
<div class="menupage">
<div class="menupagetitle">Scoring</div>
</div>
<div class="menuitem">
<a href="http://wiki.apache.org/lucene-java">Wiki</a>
</div>
</div>
<div id="credit"></div>
<div id="roundbottom">
<img style="display: none" class="corner" height="15" width="15" alt="" src="skin/images/rc-b-l-15-1body-2menu-3menu.png"></div>
<!--+
|alternative credits
+-->
<div id="credit2"></div>
</div>
<!--+
|end Menu
+-->
<!--+
|start content
+-->
<div id="content">
<div title="Portable Document Format" class="pdflink">
<a class="dida" href="scoring.pdf"><img alt="PDF -icon" src="skin/images/pdfdoc.gif" class="skin"><br>
PDF</a>
</div>
<h1>
Apache Lucene - Scoring
</h1>
<div id="minitoc-area">
<ul class="minitoc">
<li>
<a href="#Introduction">Introduction</a>
</li>
<li>
<a href="#Scoring">Scoring</a>
<ul class="minitoc">
<li>
<a href="#Fields and Documents">Fields and Documents</a>
</li>
<li>
<a href="#Score Boosting">Score Boosting</a>
</li>
<li>
<a href="#Understanding the Scoring Formula">Understanding the Scoring Formula</a>
</li>
<li>
<a href="#The Big Picture">The Big Picture</a>
</li>
<li>
<a href="#Query Classes">Query Classes</a>
</li>
<li>
<a href="#Changing Similarity">Changing Similarity</a>
</li>
</ul>
</li>
<li>
<a href="#Changing your Scoring -- Expert Level">Changing your Scoring -- Expert Level</a>
</li>
<li>
<a href="#Appendix">Appendix</a>
<ul class="minitoc">
<li>
<a href="#Class Diagrams">Class Diagrams</a>
</li>
<li>
<a href="#Sequence Diagrams">Sequence Diagrams</a>
</li>
<li>
<a href="#Algorithm">Algorithm</a>
</li>
</ul>
</li>
</ul>
</div>
<a name="N10013"></a><a name="Introduction"></a>
<h2 class="boxed">Introduction</h2>
<div class="section">
<p>Lucene scoring is the heart of why we all love Lucene. It is blazingly fast and it hides almost all of the complexity from the user.
In a nutshell, it works. At least, that is, until it doesn't work, or doesn't work as one would expect it to
work. Then we are left digging into Lucene internals or asking for help on java-user@lucene.apache.org to figure out why a document with five of our query terms
scores lower than a different document with only one of the query terms.</p>
<p>While this document won't answer your specific scoring issues, it will, hopefully, point you to the places that can
help you figure out the what and why of Lucene scoring.</p>
<p>Lucene scoring uses a combination of the
<a href="http://en.wikipedia.org/wiki/Vector_Space_Model">Vector Space Model (VSM) of Information
Retrieval</a> and the <a href="http://en.wikipedia.org/wiki/Standard_Boolean_model">Boolean model</a>
to determine how relevant a given Document is to a user's query. In general, the idea behind the VSM is that the more
times a query term appears in a document relative to the number of times the term appears in all the documents
in the collection, the more relevant that document is to the query. Lucene uses the Boolean model to first narrow
down the documents that need to be scored, based on the boolean logic in the Query specification. Lucene also adds some
capabilities and refinements onto this model to support boolean and fuzzy searching, but it
essentially remains a VSM-based system at heart.
For some valuable references on VSM and IR in general, refer to the
<a href="http://wiki.apache.org/lucene-java/InformationRetrieval">Lucene Wiki IR references</a>.
</p>
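<p>A minimal standalone sketch of the tf-idf intuition follows. It assumes the tf() and idf() formulas used by DefaultSimilarity in the Lucene 2.x line (tf = sqrt(freq), idf = 1 + ln(numDocs / (docFreq + 1))) and re-implements them without the Lucene jar; the class name TfIdfSketch and the document counts are invented for illustration.</p>

```java
// Standalone sketch (no Lucene dependency): mirrors the tf() and idf()
// formulas of Lucene 2.x DefaultSimilarity. TfIdfSketch is a hypothetical name.
class TfIdfSketch {

    // Term frequency factor: grows with the number of occurrences, but sub-linearly.
    static float tf(int freq) {
        return (float) Math.sqrt(freq);
    }

    // Inverse document frequency: rare terms (small docFreq) get a larger weight.
    static float idf(int docFreq, int numDocs) {
        return (float) (Math.log(numDocs / (double) (docFreq + 1)) + 1.0);
    }

    public static void main(String[] args) {
        int numDocs = 1000;
        // A term occurring 4 times in the document, but present in only 9 documents...
        float rare = tf(4) * idf(9, numDocs);
        // ...versus a term occurring 4 times, but present in 999 documents.
        float common = tf(4) * idf(999, numDocs);
        System.out.println("rare term: " + rare + ", common term: " + common);
    }
}
```

<p>With 1,000 documents, a term occurring four times in a document weighs about 11.21 if only 9 documents contain it, but only 2.0 if 999 documents do: the same in-document frequency counts for much more when the term is rare in the collection.</p>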
<p>The rest of this document will cover <a href="#Scoring">Scoring</a> basics and how to change your
<a href="api/org/apache/lucene/search/Similarity.html">Similarity</a>. Next, it will cover ways you can
customize the Lucene internals in <a href="#Changing your Scoring -- Expert Level">Changing your Scoring
-- Expert Level</a>, which gives details on implementing your own
<a href="api/org/apache/lucene/search/Query.html">Query</a> class and related functionality. Finally, we
will finish up with some reference material in the <a href="#Appendix">Appendix</a>.
</p>
</div>
<a name="N10045"></a><a name="Scoring"></a>
<h2 class="boxed">Scoring</h2>
<div class="section">
<p>Scoring is very much dependent on the way documents are indexed,
so it is important to understand indexing (see the
<a href="gettingstarted.html">Apache Lucene - Getting Started Guide</a>
and the Lucene
<a href="fileformats.html">file formats</a>)
before continuing on with this section. It is also assumed that readers know how to use the
<a href="api/org/apache/lucene/search/Searcher.html#explain(org.apache.lucene.search.Query, int)">Searcher.explain(Query query, int doc)</a> functionality,
which can go a long way in explaining why a particular score is returned.
</p>
<a name="N10059"></a><a name="Fields and Documents"></a>
<h3 class="boxed">Fields and Documents</h3>
<p>In Lucene, the objects we are scoring are
<a href="api/org/apache/lucene/document/Document.html">Documents</a>. A Document is a collection
of
<a href="api/org/apache/lucene/document/Field.html">Fields</a>. Each Field has semantics about how
it is created and stored (e.g. tokenized, untokenized, raw data, compressed, etc.). It is important to
note that Lucene scoring works on Fields and then combines the results to return Documents. This is
important because two Documents with the exact same content, but one having the content in two Fields
and the other in one Field, will return different scores for the same query due to length normalization
(assuming the
<a href="api/org/apache/lucene/search/DefaultSimilarity.html">DefaultSimilarity</a>
on the Fields).
</p>
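<p>A small sketch of why the split matters, assuming DefaultSimilarity's length norm of 1/sqrt(numTerms) from the 2.x line: the same eight terms stored in one field, or split four and four across two fields, produce different per-field norms. The class name and term counts are illustrative only, and boosts are ignored here.</p>

```java
// Standalone sketch (no Lucene dependency). Assumes the Lucene 2.x
// DefaultSimilarity length norm: 1 / sqrt(number of terms in the field).
class LengthNormSketch {

    static float lengthNorm(int numTerms) {
        return (float) (1.0 / Math.sqrt(numTerms));
    }

    public static void main(String[] args) {
        // Document A: all 8 terms in a single field.
        float oneField = lengthNorm(8);
        // Document B: the same 8 terms split 4 + 4 across two fields.
        float twoFields = lengthNorm(4);
        // A query term that matches in one of B's shorter fields is weighted by the larger norm.
        System.out.println("norm of the single 8-term field: " + oneField);
        System.out.println("norm of each 4-term field: " + twoFields);
    }
}
```

<p>Document A's field gets a norm of about 0.354, while each of Document B's fields gets 0.5, so a query term found in Document B is weighted more heavily even though the text is identical.</p>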
<a name="N1006E"></a><a name="Score Boosting"></a>
<h3 class="boxed">Score Boosting</h3>
<p>Lucene allows influencing search results by "boosting" at more than one level:
<ul>
<li>
<b>Document level boosting</b>
- while indexing - by calling
<a href="api/org/apache/lucene/document/Document.html#setBoost(float)">document.setBoost()</a>
before a document is added to the index.
</li>
<li>
<b>Document's Field level boosting</b>
- while indexing - by calling
<a href="api/org/apache/lucene/document/Fieldable.html#setBoost(float)">field.setBoost()</a>
before adding a field to the document (and before adding the document to the index).
</li>
<li>
<b>Query level boosting</b>
- during search, by setting a boost on a query clause, calling
<a href="api/org/apache/lucene/search/Query.html#setBoost(float)">Query.setBoost()</a>.
</li>
</ul>
</p>
<p>Index-time boosts are preprocessed for storage efficiency and written to
the directory (when writing the document) in a single byte (!) as follows:
For each field of a document, all boosts of that field
(i.e. all boosts under the same field name in that doc) are multiplied.
The result is multiplied by the boost of the document,
and also multiplied by a "field length norm" value
that represents the length of that field in that doc
(so shorter fields are automatically boosted up).
The result is encoded as a single byte
(with some precision loss, of course) and stored in the directory.
The length norm of the field is computed by the Similarity in effect at indexing time.
</p>
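<p>The multiplication described above can be sketched as follows. This is a simplified model that assumes DefaultSimilarity's length norm of 1/sqrt(numTerms); the names IndexTimeNormSketch and norm() are hypothetical, and the real indexer performs this computation internally before encoding the result into one byte.</p>

```java
// Standalone sketch of how an index-time norm is composed before it is
// squeezed into one byte. Assumes DefaultSimilarity's lengthNorm
// (1 / sqrt(numTerms)); the class and method names are hypothetical.
class IndexTimeNormSketch {

    static float lengthNorm(int numTerms) {
        return (float) (1.0 / Math.sqrt(numTerms));
    }

    // docBoost:    value passed to Document.setBoost()
    // fieldBoosts: boosts of every field instance sharing this field name in the doc
    // numTerms:    number of terms indexed under that field name in the doc
    static float norm(float docBoost, float[] fieldBoosts, int numTerms) {
        float product = docBoost;
        for (float b : fieldBoosts) {
            product *= b; // all boosts under the same field name are multiplied
        }
        return product * lengthNorm(numTerms);
    }

    public static void main(String[] args) {
        // Document boost 2.0, two "title" field instances boosted 1.5 and 1.0,
        // 4 title terms in total: 2.0 * 1.5 * 1.0 * (1 / sqrt(4)) = 1.5
        System.out.println(norm(2.0f, new float[] {1.5f, 1.0f}, 4));
    }
}
```

<p>The example prints 1.5. That value is what would then be handed to the one-byte encoding, so any precision loss applies to the combined product, not to each boost separately.</p>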
<p>The composition of this one-byte representation of norms
(that is, the index-time multiplication of field boosts & doc boost & field-length-norm)
is nicely described in
<a href="api/org/apache/lucene/document/Fieldable.html#setBoost(float)">Fieldable.setBoost()</a>.
</p>
<p>Encoding and decoding of the resulting float norm in a single byte are done by the
static methods of the class Similarity:
<a href="api/org/apache/lucene/search/Similarity.html#encodeNorm(float)">encodeNorm()</a> and
<a href="api/org/apache/lucene/search/Similarity.html#decodeNorm(byte)">decodeNorm()</a>.
Due to loss of precision, it is not guaranteed that decode(encode(x)) = x;
e.g. decode(encode(0.89)) = 0.875.
At scoring (search) time, this norm is brought into the score of the document
as <b>norm(t, d)</b>, as shown by the formula in
<a href="api/org/apache/lucene/search/Similarity.html">Similarity</a>.
</p>
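<p>The sketch below re-implements, outside of Lucene, the one-byte float format behind encodeNorm()/decodeNorm(). It is modelled on SmallFloat.floatToByte315/byte315ToFloat as found in the Lucene 2.x sources (assumed here), which the Similarity javadoc describes as a three-bit mantissa, a five-bit exponent and a zero-exponent point at 15. The class name NormByteSketch is hypothetical, and this is an illustration, not a drop-in replacement.</p>

```java
// Standalone re-implementation (no Lucene dependency) of the one-byte norm
// format, modelled on SmallFloat.floatToByte315 / byte315ToFloat from the
// Lucene 2.x sources. NormByteSketch is a hypothetical class name.
class NormByteSketch {

    // Offset mapping the float exponent range onto the byte range (zero exponent 15).
    private static final int FZERO = (63 - 15) << 3;

    static byte encode(float f) {
        int bits = Float.floatToRawIntBits(f);
        // Keep the exponent plus the top two stored mantissa bits (three significant
        // bits counting the implicit leading 1); the rest is truncated, which is
        // where the precision loss comes from.
        int smallfloat = bits >> (24 - 3);
        if (smallfloat < FZERO) {
            // Zero and negatives map to 0; tiny positive values map to the smallest non-zero byte.
            return (bits <= 0) ? (byte) 0 : (byte) 1;
        }
        if (smallfloat >= FZERO + 0x100) {
            return (byte) -1; // overflow: 0xFF, the largest representable value
        }
        return (byte) (smallfloat - FZERO);
    }

    static float decode(byte b) {
        if (b == 0) {
            return 0.0f;
        }
        int bits = (b & 0xff) << (24 - 3);
        bits += (63 - 15) << 24;
        return Float.intBitsToFloat(bits);
    }

    public static void main(String[] args) {
        float[] samples = {1.0f, 0.89f, (float) (1.0 / Math.sqrt(2)), 0.5f};
        for (float x : samples) {
            System.out.println(x + " -> byte " + (encode(x) & 0xff) + " -> " + decode(encode(x)));
        }
    }
}
```

<p>Running it shows 0.89 coming back as 0.875, and 1/sqrt(2), the length norm of a two-term field under DefaultSimilarity, coming back as 0.625. Norms that differ by less than one of these steps therefore cannot be told apart at search time.</p>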
<a name="N100B1"></a><a name="Understanding the Scoring Formula"></a>
<h3 class="boxed">Understanding the Scoring Formula</h3>
<p>
The scoring formula is described in the
<a href="api/org/apache/lucene/search/Similarity.html">Similarity</a> class. Please take the time to study this formula, as it contains much of the information about how the
basics of Lucene scoring work, especially for the
<a href="api/org/apache/lucene/search/TermQuery.html">TermQuery</a>.
</p>
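<p>For orientation, the practical scoring function documented in the Similarity javadoc of the Lucene 2.x line can be written as follows. Here boost(t) is the query-time boost of term t (t.getBoost() in the javadoc) and norm(t,d) is the decoded index-time norm described under Score Boosting. Treat this as a restatement; the javadoc for your version remains authoritative for the exact definition of each factor.</p>

```latex
\mathrm{score}(q,d) \;=\; \mathrm{coord}(q,d)\cdot\mathrm{queryNorm}(q)\cdot
  \sum_{t \in q} \Big( \mathrm{tf}(t \text{ in } d)\cdot \mathrm{idf}(t)^{2}\cdot
  \mathrm{boost}(t)\cdot \mathrm{norm}(t,d) \Big)
```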
<a name="N100C2"></a><a name="The Big Picture"></a>
<h3 class="boxed">The Big Picture</h3>
<p>OK, so the tf-idf formula and the
<a href="api/org/apache/lucene/search/Similarity.html">Similarity</a>
are great for understanding the basics of Lucene scoring, but what really drives Lucene scoring is
the use of, and interactions between, the
<a href="api/org/apache/lucene/search/Query.html">Query</a> classes, as created by each application in
response to a user's information need.
</p>
<p>In this regard, Lucene offers a wide variety of <a href="api/org/apache/lucene/search/Query.html">Query</a> implementations, most of which are in the
<a href="api/org/apache/lucene/search/package-summary.html">org.apache.lucene.search</a> package.
These implementations can be combined in a wide variety of ways to provide complex querying
capabilities along with
information about where matches took place in the document collection. The <a href="#Query Classes">Query Classes</a>
section below points to documentation on the more important Query classes. For information on the other ones, see the
<a href="api/org/apache/lucene/search/package-summary.html">package summary</a>. For details on implementing
your own Query class, see <a href="#Changing your Scoring -- Expert Level">Changing your Scoring --
Expert Level</a> below.
</p>
<p>Once a Query has been created and submitted to the
<a href="api/org/apache/lucene/search/IndexSearcher.html">IndexSearcher</a>, the scoring process
begins. (See the <a href="#Algorithm">Algorithm</a> section of the Appendix for more notes on the process.) After some infrastructure setup,
control finally passes to the <a href="api/org/apache/lucene/search/Weight.html">Weight</a> implementation and its
<a href="api/org/apache/lucene/search/Scorer.html">Scorer</a> instance. In the case of any type of
<a href="api/org/apache/lucene/search/BooleanQuery.html">BooleanQuery</a>, scoring is handled by the
<a href="http://svn.apache.org/viewvc/lucene/java/trunk/src/java/org/apache/lucene/search/BooleanQuery.java?view=log">BooleanWeight2</a> (link goes to the ViewVC BooleanQuery Java code, which contains the BooleanWeight2 inner class),
unless the static
<a href="api/org/apache/lucene/search/BooleanQuery.html#setUseScorer14(boolean)">
BooleanQuery#setUseScorer14(boolean)</a> method has been called with true,
in which case the
<a href="http://svn.apache.org/viewvc/lucene/java/trunk/src/java/org/apache/lucene/search/BooleanQuery.java?view=log">BooleanWeight</a>
(link goes to the ViewVC BooleanQuery Java code, which contains the BooleanWeight inner class) from the 1.4 version of Lucene is used instead.
See <a href="http://svn.apache.org/repos/asf/lucene/java/trunk/CHANGES.txt">CHANGES.txt</a> under release 1.9 RC1 for more information on choosing which Scorer to use.
</p>
<p>
Assuming the use of the BooleanWeight2, a
BooleanScorer2 is created by bringing together
all of the
<a href="api/org/apache/lucene/search/Scorer.html">Scorer</a>s from the sub-clauses of the BooleanQuery.
When the BooleanScorer2 is asked to score, it delegates its work to an internal Scorer based on the type
of clauses in the Query. This internal Scorer essentially loops over the sub-scorers and sums the scores
provided by each scorer while factoring in the coord() factor.
<!-- Do we want to fill in the details of the counting sum scorer, disjunction scorer, etc.? -->
</p>
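<p>As a rough illustration of "sum the sub-scores while factoring in coord()", the standalone sketch below combines the per-clause scores of one document for a query made only of optional clauses. It assumes DefaultSimilarity's coord(overlap, maxOverlap) = overlap / maxOverlap and deliberately leaves out required and prohibited clauses, as well as the choice of internal Scorer that BooleanScorer2 actually makes; the class and method names are hypothetical.</p>

```java
// Simplified sketch: combine the scores of the optional clauses that matched
// one document, then scale by coord(). Assumes Lucene 2.x DefaultSimilarity's
// coord(overlap, maxOverlap) = overlap / maxOverlap. Names are hypothetical.
class CoordSumSketch {

    static float coord(int overlap, int maxOverlap) {
        return overlap / (float) maxOverlap;
    }

    // clauseScores[i] is the score of clause i for this document,
    // or Float.NaN if clause i did not match the document.
    static float combine(float[] clauseScores) {
        float sum = 0.0f;
        int matched = 0;
        for (float s : clauseScores) {
            if (!Float.isNaN(s)) {
                sum += s;
                matched++;
            }
        }
        return sum * coord(matched, clauseScores.length);
    }

    public static void main(String[] args) {
        // Three-clause OR query; the document matches clauses 0 and 2 only.
        float[] scores = {0.8f, Float.NaN, 0.4f};
        System.out.println(combine(scores));
    }
}
```

<p>A document matching two of three clauses with scores 0.8 and 0.4 ends up with roughly (0.8 + 0.4) × 2/3 = 0.8, so matching more clauses is rewarded twice: once through the sum and once through coord().</p>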
<a name="N1011A"></a><a name="Query Classes"></a>
<h3 class="boxed">Query Classes</h3>
<p>For information on the Query Classes, refer to the
<a href="api/org/apache/lucene/search/package-summary.html#query">search package javadocs</a>.
</p>
<a name="N10127"></a><a name="Changing Similarity"></a>
<h3 class="boxed">Changing Similarity</h3>
<p>One of the ways of changing the scoring characteristics of Lucene is to change the similarity factors. For information on
how to do this, see the
<a href="api/org/apache/lucene/search/package-summary.html#changingSimilarity">search package javadocs</a>.
</p>
</div>
<a name="N10134"></a><a name="Changing your Scoring -- Expert Level"></a>
<h2 class="boxed">Changing your Scoring -- Expert Level</h2>
<div class="section">
<p>At a much deeper level, you can affect scoring by implementing your own Query classes (and related scoring classes). To learn more
about how to do this, refer to the
<a href="api/org/apache/lucene/search/package-summary.html#scoring">search package javadocs</a>.
</p>
</div>
<a name="N10141"></a><a name="Appendix"></a>
<h2 class="boxed">Appendix</h2>
<div class="section">
<a name="N10146"></a><a name="Class Diagrams"></a>
<h3 class="boxed">Class Diagrams</h3>
<p>
<a href="http://wiki.apache.org/lucene-java/KarlWettin?action=AttachFile&do=view&target=search_uml_1.jpg">
Karl Wettin's UML on the Wiki</a>
</p>
<a name="N10153"></a><a name="Sequence Diagrams"></a>
<h3 class="boxed">Sequence Diagrams</h3>
<p>FILL IN HERE. Volunteers?</p>
<a name="N1015C"></a><a name="Algorithm"></a>
<h3 class="boxed">Algorithm</h3>
<p>This section is mostly notes on stepping through the Scoring process and serves as
fertilizer for the earlier sections.</p>
<p>In the typical search application, a
<a href="api/org/apache/lucene/search/Query.html">Query</a>
is passed to the
<a href="api/org/apache/lucene/search/Searcher.html">Searcher</a>,
beginning the scoring process.
</p>
<p>Once inside the Searcher, a
<a href="api/org/apache/lucene/search/HitCollector.html">HitCollector</a>
is used for the scoring and sorting of the search results.
These important objects are involved in a search:
<ol>
<li>The
<a href="api/org/apache/lucene/search/Weight.html">Weight</a>
object of the Query. The Weight object is an internal representation of the Query that
allows the Query to be reused by the Searcher.
</li>
<li>The Searcher that initiated the call.</li>
<li>A
<a href="api/org/apache/lucene/search/Filter.html">Filter</a>
for limiting the result set. Note that the Filter may be null.
</li>
<li>A
<a href="api/org/apache/lucene/search/Sort.html">Sort</a>
object for specifying how to sort the results if the standard score-based sort method is not
desired.
</li>
</ol>
</p>
<p>Assuming we are not sorting (since sorting doesn't
affect the raw Lucene score),
we call one of the search methods of the Searcher, passing in the
<a href="api/org/apache/lucene/search/Weight.html">Weight</a>
object created by Searcher.createWeight(Query), the
<a href="api/org/apache/lucene/search/Filter.html">Filter</a>
and the number of results we want. This method
returns a
<a href="api/org/apache/lucene/search/TopDocs.html">TopDocs</a>
object, which is an internal collection of search results.
The Searcher creates a
<a href="api/org/apache/lucene/search/TopDocCollector.html">TopDocCollector</a>
and passes it, along with the Weight and Filter, to another expert search method (for more on the
<a href="api/org/apache/lucene/search/HitCollector.html">HitCollector</a>
mechanism, see
<a href="api/org/apache/lucene/search/Searcher.html">Searcher</a>).
The TopDocCollector uses a
<a href="api/org/apache/lucene/util/PriorityQueue.html">PriorityQueue</a>
to collect the top results for the search.
</p>
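<p>To illustrate the idea behind TopDocCollector, the following standalone sketch keeps the N best-scoring hits in a bounded min-heap: the weakest hit sits at the head, so each new hit only has to beat that one to get in. It uses java.util.PriorityQueue rather than Lucene's own org.apache.lucene.util.PriorityQueue, and the names TopNCollectorSketch, Hit, collect and topDocs are hypothetical stand-ins, not Lucene API.</p>

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.PriorityQueue;

// Standalone sketch of top-N hit collection with a bounded min-heap, in the
// spirit of TopDocCollector. Uses java.util.PriorityQueue instead of Lucene's
// org.apache.lucene.util.PriorityQueue; all names are hypothetical.
class TopNCollectorSketch {

    static final class Hit {
        final int doc;
        final float score;

        Hit(int doc, float score) {
            this.doc = doc;
            this.score = score;
        }
    }

    // Weakest hit first: lower score loses; on equal scores the higher doc id loses.
    private static final Comparator<Hit> WEAKEST_FIRST = (a, b) -> {
        int byScore = Float.compare(a.score, b.score);
        return byScore != 0 ? byScore : Integer.compare(b.doc, a.doc);
    };

    private final int numHits;
    private final PriorityQueue<Hit> heap;

    // Assumes numHits >= 1.
    TopNCollectorSketch(int numHits) {
        this.numHits = numHits;
        this.heap = new PriorityQueue<>(numHits, WEAKEST_FIRST);
    }

    // Called once per matching document, like HitCollector.collect(doc, score).
    void collect(int doc, float score) {
        if (heap.size() < numHits) {
            heap.add(new Hit(doc, score));
        } else if (score > heap.peek().score) {
            heap.poll(); // evict the current weakest hit
            heap.add(new Hit(doc, score));
        }
    }

    // Returns the collected hits, best first.
    List<Hit> topDocs() {
        List<Hit> result = new ArrayList<>(heap);
        result.sort(WEAKEST_FIRST.reversed());
        return result;
    }

    public static void main(String[] args) {
        TopNCollectorSketch collector = new TopNCollectorSketch(2);
        float[] scores = {0.3f, 1.2f, 0.7f, 0.9f};
        for (int doc = 0; doc < scores.length; doc++) {
            collector.collect(doc, scores[doc]);
        }
        for (Hit h : collector.topDocs()) {
            System.out.println("doc=" + h.doc + " score=" + h.score);
        }
    }
}
```

<p>With two slots and scores 0.3, 1.2, 0.7 and 0.9 for documents 0 to 3, the sketch prints doc 1 first and doc 3 second. Keeping only a bounded heap means each hit costs O(log N) and memory stays proportional to the number of results requested, not to the number of matching documents.</p>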
<p>If a Filter is being used, some initial setup is done to determine which docs to include. Otherwise,
we ask the Weight for
a
<a href="api/org/apache/lucene/search/Scorer.html">Scorer</a>
for the
<a href="api/org/apache/lucene/index/IndexReader.html">IndexReader</a>
of the current searcher and we proceed by
calling the score method on the
<a href="api/org/apache/lucene/search/Scorer.html">Scorer</a>.
</p>
<p>At last, we are actually going to score some documents. The score method takes in the HitCollector
(most likely the TopDocCollector) and does its business.
Of course, here is where things get involved. The
<a href="api/org/apache/lucene/search/Scorer.html">Scorer</a>
that is returned by the
<a href="api/org/apache/lucene/search/Weight.html">Weight</a>
object depends on what type of Query was submitted. In most real-world applications with multiple
query terms,
the
<a href="api/org/apache/lucene/search/Scorer.html">Scorer</a>
is going to be a
<a href="http://svn.apache.org/viewvc/lucene/java/trunk/src/java/org/apache/lucene/search/BooleanScorer2.java?view=log">BooleanScorer2</a>
(see the section on customizing your scoring for info on changing this).
</p>
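<p>Stripped of the per-Query details, the scoring step amounts to "advance the Scorer, then hand each document and its score to the collector". A standalone sketch of that loop follows. The nested SimpleScorer and SimpleCollector interfaces are hypothetical stand-ins that only mirror the next()/doc()/score() and collect(doc, score) methods of the Lucene 2.x Scorer and HitCollector classes, and ArrayScorer is a toy Scorer over precomputed hits.</p>

```java
// Standalone sketch of the basic scoring loop. SimpleScorer and SimpleCollector
// are hypothetical stand-ins mirroring the shape of Lucene 2.x Scorer
// (next(), doc(), score()) and HitCollector (collect(doc, score)).
class ScoreLoopSketch {

    interface SimpleScorer {
        boolean next();  // advance to the next matching doc; false when exhausted
        int doc();       // current document id
        float score();   // score of the current document
    }

    interface SimpleCollector {
        void collect(int doc, float score);
    }

    // Toy scorer over precomputed (doc, score) pairs, in increasing doc order.
    static final class ArrayScorer implements SimpleScorer {
        private final int[] docs;
        private final float[] scores;
        private int pos = -1;

        ArrayScorer(int[] docs, float[] scores) {
            this.docs = docs;
            this.scores = scores;
        }

        public boolean next() { return ++pos < docs.length; }
        public int doc() { return docs[pos]; }
        public float score() { return scores[pos]; }
    }

    // The loop itself: drive the scorer and feed every hit to the collector.
    static void score(SimpleScorer scorer, SimpleCollector collector) {
        while (scorer.next()) {
            collector.collect(scorer.doc(), scorer.score());
        }
    }

    public static void main(String[] args) {
        SimpleScorer scorer = new ArrayScorer(new int[] {2, 5, 9}, new float[] {0.4f, 1.1f, 0.7f});
        score(scorer, (doc, s) -> System.out.println("doc=" + doc + " score=" + s));
    }
}
```

<p>Because the collector only sees (doc, score) pairs, the same loop works whether the Scorer underneath is a simple term scorer or a BooleanScorer2 combining many sub-scorers.</p>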
<p>Assuming a BooleanScorer2 scorer, we first initialize the Coordinator, which is used to apply the
coord() factor. We then
get an internal Scorer based on the required, optional and prohibited parts of the query.
Using this internal Scorer, the BooleanScorer2 then proceeds
into a while loop based on the Scorer#next() method. The next() method advances to the next document
matching the query. This is an
abstract method in the Scorer class and is thus overridden by all derived
implementations. <!-- DOUBLE CHECK THIS -->If you have a simple OR query,
your internal Scorer is most likely a DisjunctionSumScorer, which essentially combines the scores
from the sub-scorers of the OR'd terms.</p>
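<p>To make "combines the scores from the sub-scorers" concrete, here is a standalone sketch of an OR-style merge over per-term postings that are already sorted by document id. At each step it finds the smallest document any sub-list is positioned on, sums the scores of every sub-list on that document, and records how many matched (the overlap that coord() needs). This is a simplified illustration: a linear scan over the sub-lists keeps it short, whereas a heap ordered by current document scales better with many clauses. All names, including the HitSink callback, are hypothetical.</p>

```java
// Standalone sketch of an OR (disjunction) merge over sorted postings.
// Each sub-list is a pair of parallel arrays: doc ids in increasing order
// and the per-term score of each doc. All names are hypothetical.
class DisjunctionSketch {

    interface HitSink {
        void hit(int doc, float sum, int matchers);
    }

    static void merge(int[][] docs, float[][] scores, HitSink sink) {
        int n = docs.length;
        int[] pos = new int[n]; // current position in each sub-list
        while (true) {
            // Find the smallest doc id that any sub-list is currently positioned on.
            int minDoc = Integer.MAX_VALUE;
            for (int i = 0; i < n; i++) {
                if (pos[i] < docs[i].length) {
                    minDoc = Math.min(minDoc, docs[i][pos[i]]);
                }
            }
            if (minDoc == Integer.MAX_VALUE) {
                return; // every sub-list is exhausted
            }
            // Sum the scores of all sub-lists on minDoc, then advance them past it.
            float sum = 0.0f;
            int matchers = 0;
            for (int i = 0; i < n; i++) {
                if (pos[i] < docs[i].length && docs[i][pos[i]] == minDoc) {
                    sum += scores[i][pos[i]];
                    matchers++;
                    pos[i]++;
                }
            }
            sink.hit(minDoc, sum, matchers);
        }
    }

    public static void main(String[] args) {
        int[][] docs = {{1, 4, 7}, {4, 5}, {2, 4, 7}};       // postings for terms A, B, C
        float[][] scores = {{0.5f, 0.25f, 1.0f}, {0.5f, 0.75f}, {0.25f, 0.5f, 0.5f}};
        merge(docs, scores, (doc, sum, m) ->
                System.out.println("doc=" + doc + " sum=" + sum + " matchers=" + m));
    }
}
```

<p>For these postings, document 4 is matched by all three terms and gets the sum 1.25 with three matchers, while documents 1, 2 and 5 each come from a single term; the matcher count is what a coord() factor would then be applied to.</p>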
</div>
</div>
<!--+
|end content
+-->
<div class="clearboth"> </div>
</div>
<div id="footer">
<!--+
|start bottomstrip
+-->
<div class="lastmodified">
<script type="text/javascript"><!--
document.write("Last Published: " + document.lastModified);
// --></script>
</div>
<div class="copyright">
Copyright ©
2006 <a href="http://www.apache.org/licenses/">The Apache Software Foundation.</a>
</div>
<!--+
|end bottomstrip
+-->
</div>
</body>
</html>