integrate scoring.html into scoring package, fix broken links, and update for 4.0

git-svn-id: https://svn.apache.org/repos/asf/lucene/dev/trunk@1328929 13f79535-47bb-0310-9956-ffa450edef68
2025-03-04 07:19:18 +00:00 · 2012-04-22 18:41:42 +00:00 · 2012-04-22 18:41:42 +00:00 · f13faf3ee9
commit f13faf3ee9
parent 4850ac1dc0
4 changed files with 189 additions and 355 deletions
--- a/lucene/core/src/java/org/apache/lucene/search/package.html
+++ b/lucene/core/src/java/org/apache/lucene/search/package.html
@ -27,18 +27,33 @@ Code to search indices.
    <ol>
        <li><a href="#search">Search Basics</a></li>
        <li><a href="#query">The Query Classes</a></li>
-        <li><a href="#scoring">Changing the Scoring</a></li>
+        <li><a href="#scoring">Scoring: Introduction</a></li>
+        <li><a href="#scoringBasics">Scoring: Basics</a></li>
+        <li><a href="#changingScoring">Changing the Scoring</a></li>
+        <li><a href="#algorithm">Appendix: Search Algorithm</a></li>
    </ol>
 </p>
 <a name="search"></a>
-<h2>Search</h2>
+<h2>Search Basics</h2>
 <p>
-Search over indices.
-
-Applications usually call {@link
+Lucene offers a wide variety of {@link org.apache.lucene.search.Query} implementations, most of which are in
+this package, its subpackages ({@link org.apache.lucene.search.spans spans}, {@link org.apache.lucene.search.payloads payloads}),
+or the <a href="{@docRoot}/../queries/overview-summary.html">queries module</a>. These implementations can be combined in a wide 
+variety of ways to provide complex querying capabilities along with information about where matches took place in the document 
+collection. The <a href="#query">Query Classes</a> section below highlights some of the more important Query classes. For details 
+on implementing your own Query class, see <a href="#customQueries">Custom Queries -- Expert Level</a> below.
+</p>
+<p>
+To perform a search, applications usually call {@link
 org.apache.lucene.search.IndexSearcher#search(Query,int)} or {@link
 org.apache.lucene.search.IndexSearcher#search(Query,Filter,int)}.
-
+</p>
+<p>
+Once a Query has been created and submitted to the {@link org.apache.lucene.search.IndexSearcher IndexSearcher}, the scoring
+process begins. After some infrastructure setup, control finally passes to the {@link org.apache.lucene.search.Weight Weight}
+implementation and its {@link org.apache.lucene.search.Scorer Scorer} instances. See the <a href="#algorithm">Algorithm</a> 
+section for more notes on the process.
+</p>
    <!-- FILL IN MORE HERE -->   
    <!-- TODO: this page over-links the same things too many times -->
 </p>
@ -211,20 +226,118 @@ org.apache.lucene.search.IndexSearcher#search(Query,Filter,int)}.
    This type of query can be useful when accounting for spelling variations in the collection.
 </p>
 <a name="scoring"></a>
+<h2>Scoring &mdash; Introduction</h2>
+<p>Lucene scoring is the heart of why we all love Lucene. It is blazingly fast and it hides 
+   almost all of the complexity from the user. In a nutshell, it works.  At least, that is, 
+   until it doesn't work, or doesn't work as one would expect it to work.  Then we are left 
+   digging into Lucene internals or asking for help on 
+   <a href="mailto:java-user@lucene.apache.org">java-user@lucene.apache.org</a> to figure out 
+   why a document with five of our query terms scores lower than a different document with 
+   only one of the query terms. 
+</p>
+<p>While this document won't answer your specific scoring issues, it will, hopefully, point you 
+  to the places that can help you figure out the <i>what</i> and <i>why</i> of Lucene scoring.
+</p>
+<p>Lucene scoring supports a number of pluggable information retrieval 
+   <a href="http://en.wikipedia.org/wiki/Information_retrieval#Model_types">models</a>, including:
+   <ul>
+     <li><a href="http://en.wikipedia.org/wiki/Vector_Space_Model">Vector Space Model (VSM)</a></li>
+     <li><a href="http://en.wikipedia.org/wiki/Probabilistic_relevance_model">Probablistic Models</a> such as 
+         <a href="http://en.wikipedia.org/wiki/Probabilistic_relevance_model_(BM25)">Okapi BM25</a> and
+         <a href="http://en.wikipedia.org/wiki/Divergence-from-randomness_model">DFR</a></li>
+     <li><a href="http://en.wikipedia.org/wiki/Language_model">Language models</a></li>
+   </ul>
+   These models can be plugged in via the {@link org.apache.lucene.search.similarities Similarity API},
+   and offer extension hooks and parameters for tuning. In general, Lucene first narrows down the documents
+   that need to be scored based on boolean logic in the Query specification, and then ranks this subset of
+   documents via the retrieval model. For some valuable references on VSM and IR in general refer to
+   <a href="http://wiki.apache.org/lucene-java/InformationRetrieval">Lucene Wiki IR references</a>.
+</p>
+<p>The rest of this document will cover <a href="#scoringBasics">Scoring basics</a> and explain how to 
+   change your {@link org.apache.lucene.search.similarities.Similarity Similarity}. Next, it will cover
+   ways you can customize the lucene internals in 
+   <a href="#customQueriesExpert">Custom Queries -- Expert Level</a>, which gives details on 
+   implementing your own {@link org.apache.lucene.search.Query Query} class and related functionality.
+   Finally, we will finish up with some reference material in the <a href="#algorithm">Appendix</a>.
+</p>
+<a name="scoringBasics"></a>
+<h2>Scoring &mdash; Basics</h2>
+<p>Scoring is very much dependent on the way documents are indexed, so it is important to understand 
+   indexing. (see <a href="@{docRoot}/overview-summary.html">Lucene overview</a> before continuing
+   on with this section) It is also assumed that readers know how to use the 
+   {@link org.apache.lucene.search.IndexSearcher#explain(org.apache.lucene.search.Query, int) IndexSearcher.explain(Query, doc)}
+   functionality, which can go a long way in informing why a score is returned.
+</p>
+<h4>Fields and Documents</h4>
+<p>In Lucene, the objects we are scoring are {@link org.apache.lucene.document.Document Document}s.
+   A Document is a collection of {@link org.apache.lucene.document.Field Field}s.  Each Field has
+   {@link org.apache.lucene.document.FieldType semantics} about how it is created and stored 
+   ({@link org.apache.lucene.document.FieldType#tokenized() tokenized}, 
+   {@link org.apache.lucene.document.FieldType#stored() stored}, etc). It is important to note that 
+   Lucene scoring works on Fields and then combines the results to return Documents. This is 
+   important because two Documents with the exact same content, but one having the content in two
+   Fields and the other in one Field may return different scores for the same query due to length
+   normalization.
+</p>
+<h4>Score Boosting</h4>
+<p>Lucene allows influencing search results by "boosting" in more than one level:
+   <ul>                   
+      <li><b>Index-time boost</b> by calling
+       {@link org.apache.lucene.document.Field#setBoost(float) Field.setBoost()} before a document is 
+       added to the index.</li>
+      <li><b>Query-time boost</b> by setting a boost on a query clause, calling
+       {@link org.apache.lucene.search.Query#setBoost(float) Query.setBoost()}.</li>
+   </ul>    
+</p>
+<p>Indexing time boosts are pre-processed for storage efficiency and written to
+   storage for a field as follows:
+   <ul>
+       <li>All boosts of that field (i.e. all boosts under the same field name in that doc) are 
+           multiplied.</li>
+       <li>The boost is then encoded into a normalization value by the Similarity
+           object at index-time: {@link org.apache.lucene.search.similarities.Similarity#computeNorm computeNorm()}.
+           The actual encoding depends upon the Similarity implementation, but note that most
+           use a lossy encoding (such as multiplying the boost with document length or similar, packed
+           into a single byte!).</li>
+       <li>Decoding of any index-time normalization values and integration into the document's score is also performed 
+           at search time by the Similarity.</li>
+    </ul>
+</p>
+<a name="changingScoring"></a>
 <h2>Changing Scoring &mdash; Similarity</h2>
-
+<p>
+Changing {@link org.apache.lucene.search.similarities.Similarity Similarity} is an easy way to 
+influence scoring, this is done at index-time with 
+{@link org.apache.lucene.index.IndexWriterConfig#setSimilarity(org.apache.lucene.search.similarities.Similarity)
+ IndexWriterConfig.setSimilarity(Similarity)} and at query-time with
+{@link org.apache.lucene.search.IndexSearcher#setSimilarity(org.apache.lucene.search.similarities.Similarity)
+ IndexSearcher.setSimilarity(Similarity)}.
+</p>
+<p>
+You can influence scoring by configuring a different built-in Similarity implementation, or by tweaking its
+parameters, subclassing it to override behavior. Some implementations also offer a modular API which you can
+extend by plugging in a different component (e.g. term frequency normalizer).
+</p>
+<p>
+Finally, you can extend the low level {@link org.apache.lucene.search.similarities.Similarity Similarity} directly
+to implement a new retrieval model, or to use external scoring factors particular to your application. For example,
+a custom Similarity can access per-document values via {@link org.apache.lucene.search.FieldCache FieldCache} or
+{@link org.apache.lucene.index.DocValues} and integrate them into the score.
+</p>
+<p>
 See the {@link org.apache.lucene.search.similarities} package documentation for information
-on the available scoring models and extending or changing Similarity.
+on the built-in available scoring models and extending or changing Similarity.
+</p>
+<a name="customQueriesExpert"></a>
+<h2>Custom Queries &mdash; Expert Level</h2>

-<h2>Changing Scoring &mdash; Expert Level</h2>
-
-<p>Changing scoring is an expert level task, so tread carefully and be prepared to share your code if
+<p>Custom queries are an expert level task, so tread carefully and be prepared to share your code if
    you want help.
 </p>

 <p>With the warning out of the way, it is possible to change a lot more than just the Similarity
-    when it comes to scoring in Lucene. Lucene's scoring is a complex mechanism that is grounded by
-    <span >three main classes</span>:
+    when it comes to matching and scoring in Lucene. Lucene's search is a complex mechanism that is grounded by
+    <span>three main classes</span>:
    <ol>
        <li>
            {@link org.apache.lucene.search.Query Query} &mdash; The abstract object representation of the
@ -248,13 +361,13 @@ on the available scoring models and extending or changing Similarity.
        {@link org.apache.lucene.search.Query Query} class has several methods that are important for
        derived classes:
        <ol>
-            <li>{@link org.apache.lucene.search.Query#createWeight(IndexSearcher) createWeight(IndexSearcher searcher} &mdash; A
+            <li>{@link org.apache.lucene.search.Query#createWeight(IndexSearcher) createWeight(IndexSearcher searcher)} &mdash; A
                {@link org.apache.lucene.search.Weight Weight} is the internal representation of the
                Query, so each Query implementation must
                provide an implementation of Weight. See the subsection on <a
                    href="#weightClass">The Weight Interface</a> below for details on implementing the Weight
                interface.</li>
-            <li>{@link org.apache.lucene.search.Query#rewrite(IndexReader) rewrite(IndexReader reader} &mdash; Rewrites queries into primitive queries. Primitive queries are:
+            <li>{@link org.apache.lucene.search.Query#rewrite(IndexReader) rewrite(IndexReader reader)} &mdash; Rewrites queries into primitive queries. Primitive queries are:
                {@link org.apache.lucene.search.TermQuery TermQuery},
                {@link org.apache.lucene.search.BooleanQuery BooleanQuery}, <span
                    >and other queries that implement {@link org.apache.lucene.search.Query#createWeight(IndexSearcher) createWeight(IndexSearcher searcher)}</span></li>
@ -363,5 +476,63 @@ on the available scoring models and extending or changing Similarity.
        back
        out of Lucene (similar to Doug adding SpanQuery functionality).</p>

+<!-- TODO: integrate this better, its better served as an intro than an appendix -->
+<a name="algorithm"></a>
+<h2>Appendix: Search Algorithm</h2>
+<p>This section is mostly notes on stepping through the Scoring process and serves as
+   fertilizer for the earlier sections.</p>
+<p>In the typical search application, a {@link org.apache.lucene.search.Query Query}
+   is passed to the {@link org.apache.lucene.search.IndexSearcher IndexSearcher},
+   beginning the scoring process.</p>
+<p>Once inside the IndexSearcher, a {@link org.apache.lucene.search.Collector Collector}
+   is used for the scoring and sorting of the search results.
+   These important objects are involved in a search:
+   <ol>                
+      <li>The {@link org.apache.lucene.search.Weight Weight} object of the Query. The
+          Weight object is an internal representation of the Query that allows the Query 
+          to be reused by the IndexSearcher.</li>
+      <li>The IndexSearcher that initiated the call.</li>     
+      <li>A {@link org.apache.lucene.search.Filter Filter} for limiting the result set.
+          Note, the Filter may be null.</li>                   
+      <li>A {@link org.apache.lucene.search.Sort Sort} object for specifying how to sort
+          the results if the standard score-based sort method is not desired.</li>                   
+  </ol>       
+</p>
+<p>Assuming we are not sorting (since sorting doesn't affect the raw Lucene score),
+   we call one of the search methods of the IndexSearcher, passing in the
+   {@link org.apache.lucene.search.Weight Weight} object created by
+   {@link org.apache.lucene.search.IndexSearcher#createNormalizedWeight(org.apache.lucene.search.Query)
+    IndexSearcher.createNormalizedWeight(Query)}, 
+   {@link org.apache.lucene.search.Filter Filter} and the number of results we want.
+   This method returns a {@link org.apache.lucene.search.TopDocs TopDocs} object,
+   which is an internal collection of search results. The IndexSearcher creates
+   a {@link org.apache.lucene.search.TopScoreDocCollector TopScoreDocCollector} and
+   passes it along with the Weight, Filter to another expert search method (for
+   more on the {@link org.apache.lucene.search.Collector Collector} mechanism,
+   see {@link org.apache.lucene.search.IndexSearcher IndexSearcher}). The TopScoreDocCollector
+   uses a {@link org.apache.lucene.util.PriorityQueue PriorityQueue} to collect the
+   top results for the search.
+</p> 
+<p>If a Filter is being used, some initial setup is done to determine which docs to include. 
+   Otherwise, we ask the Weight for a {@link org.apache.lucene.search.Scorer Scorer} for each
+   {@link org.apache.lucene.index.IndexReader IndexReader} segment and proceed by calling
+   {@link org.apache.lucene.search.Scorer#score(org.apache.lucene.search.Collector) Scorer.score()}.
+</p>
+<p>At last, we are actually going to score some documents. The score method takes in the Collector
+   (most likely the TopScoreDocCollector or TopFieldCollector) and does its business.Of course, here 
+   is where things get involved. The {@link org.apache.lucene.search.Scorer Scorer} that is returned
+   by the {@link org.apache.lucene.search.Weight Weight} object depends on what type of Query was
+   submitted. In most real world applications with multiple query terms, the 
+   {@link org.apache.lucene.search.Scorer Scorer} is going to be a <code>BooleanScorer2</code> created
+   from {@link org.apache.lucene.search.BooleanQuery.BooleanWeight BooleanWeight} (see the section on
+   <a href="#customQueriesExpert">custom queries</a> for info on changing this).
+</p>
+<p>Assuming a BooleanScorer2, we first initialize the Coordinator, which is used to apply the coord() 
+  factor. We then get a internal Scorer based on the required, optional and prohibited parts of the query.
+  Using this internal Scorer, the BooleanScorer2 then proceeds into a while loop based on the 
+  {@link org.apache.lucene.search.Scorer#nextDoc Scorer.nextDoc()} method. The nextDoc() method advances 
+  to the next document matching the query. This is an abstract method in the Scorer class and is thus 
+  overridden by all derived  implementations. If you have a simple OR query your internal Scorer is most 
+  likely a DisjunctionSumScorer, which essentially combines the scorers from the sub scorers of the OR'd terms.</p>
 </body>
 </html>
--- a/lucene/core/src/java/org/apache/lucene/search/similarities/package.html
+++ b/lucene/core/src/java/org/apache/lucene/search/similarities/package.html
@ -39,7 +39,8 @@ package.
 <h2>Summary of the Ranking Methods</h2>

 <p>{@link org.apache.lucene.search.similarities.DefaultSimilarity} is the original Lucene
-scoring function. It is based on a highly optimized Vector Space Model. For more
+scoring function. It is based on a highly optimized 
+<a href="http://en.wikipedia.org/wiki/Vector_Space_Model">Vector Space Model</a>. For more
 information, see {@link org.apache.lucene.search.similarities.TFIDFSimilarity}.</p>

 <p>{@link org.apache.lucene.search.similarities.BM25Similarity} is an optimized
--- a/lucene/site/html/scoring.html
+++ b/lucene/site/html/scoring.html
@ -1,338 +0,0 @@
-<html>
-<head>
-<META http-equiv="Content-Type" content="text/html; charset=UTF-8">
-<title>Apache Lucene - Scoring</title>
-</head>
-<body>
-<h1>Apache Lucene - Scoring</h1>
-<div id="minitoc-area">
-<ul class="minitoc">
-<li>
-<a href="#Introduction">Introduction</a>
-</li>
-<li>
-<a href="#Scoring">Scoring</a>
-<ul class="minitoc">
-<li>
-<a href="#Fields and Documents">Fields and Documents</a>
-</li>
-<li>
-<a href="#Score Boosting">Score Boosting</a>
-</li>
-<li>
-<a href="#Understanding the Scoring Formula">Understanding the Scoring Formula</a>
-</li>
-<li>
-<a href="#The Big Picture">The Big Picture</a>
-</li>
-<li>
-<a href="#Query Classes">Query Classes</a>
-</li>
-<li>
-<a href="#Changing Similarity">Changing Similarity</a>
-</li>
-</ul>
-</li>
-<li>
-<a href="#Changing your Scoring -- Expert Level">Changing your Scoring -- Expert Level</a>
-</li>
-<li>
-<a href="#Appendix">Appendix</a>
-<ul class="minitoc">
-<li>
-<a href="#Algorithm">Algorithm</a>
-</li>
-</ul>
-</li>
-</ul>
-</div>
-
-        
-<a name="N10013"></a><a name="Introduction"></a>
-<h2 class="boxed">Introduction</h2>
-<div class="section">
-<p>Lucene scoring is the heart of why we all love Lucene.  It is blazingly fast and it hides almost all of the complexity from the user.
-                In a nutshell, it works.  At least, that is, until it doesn't work, or doesn't work as one would expect it to
-            work.  Then we are left digging into Lucene internals or asking for help on java-user@lucene.apache.org to figure out why a document with five of our query terms
-            scores lower than a different document with only one of the query terms. </p>
-<p>While this document won't answer your specific scoring issues, it will, hopefully, point you to the places that can
-            help you figure out the what and why of Lucene scoring.</p>
-<p>Lucene scoring uses a combination of the
-                <a href="http://en.wikipedia.org/wiki/Vector_Space_Model">Vector Space Model (VSM) of Information
-                    Retrieval</a> and the <a href="http://en.wikipedia.org/wiki/Standard_Boolean_model">Boolean model</a>
-                to determine
-                how relevant a given Document is to a User's query.  In general, the idea behind the VSM is the more
-                times a query term appears in a document relative to
-                the number of times the term appears in all the documents in the collection, the more relevant that
-                document is to the query.  It uses the Boolean model to first narrow down the documents that need to
-                be scored based on the use of boolean logic in the Query specification.  Lucene also adds some
-                capabilities and refinements onto this model to support boolean and fuzzy searching, but it
-                essentially remains a VSM based system at the heart.
-                For some valuable references on VSM and IR in general refer to the
-                <a href="http://wiki.apache.org/lucene-java/InformationRetrieval">Lucene Wiki IR references</a>.
-            </p>
-<p>The rest of this document will cover <a href="#Scoring">Scoring</a> basics and how to change your
-                <a href="core/org/apache/lucene/search/Similarity.html">Similarity</a>.  Next it will cover ways you can
-                customize the Lucene internals in <a href="#Changing your Scoring -- Expert Level">Changing your Scoring
-                -- Expert Level</a> which gives details on implementing your own
-                <a href="core/org/apache/lucene/search/Query.html">Query</a> class and related functionality.  Finally, we
-                will finish up with some reference material in the <a href="#Appendix">Appendix</a>.
-            </p>
-</div>
-        
-<a name="N10045"></a><a name="Scoring"></a>
-<h2 class="boxed">Scoring</h2>
-<div class="section">
-<p>Scoring is very much dependent on the way documents are indexed,
-                so it is important to understand indexing (see
-                <a href="gettingstarted.html">Apache Lucene - Getting Started Guide</a>
-                and the Lucene
-                <a href="fileformats.html">file formats</a>
-                before continuing on with this section.)  It is also assumed that readers know how to use the
-                <a href="core/org/apache/lucene/search/Searcher.html#explain(Query query, int doc)">Searcher.explain(Query query, int doc)</a> functionality,
-                which can go a long way in informing why a score is returned.
-            </p>
-<a name="N10059"></a><a name="Fields and Documents"></a>
-<h3 class="boxed">Fields and Documents</h3>
-<p>In Lucene, the objects we are scoring are
-                    <a href="core/org/apache/lucene/document/Document.html">Documents</a>.  A Document is a collection
-                of
-                    <a href="core/org/apache/lucene/document/Field.html">Fields</a>.  Each Field has semantics about how
-                it is created and stored (i.e. tokenized, untokenized, raw data, compressed, etc.)  It is important to
-                    note that Lucene scoring works on Fields and then combines the results to return Documents.  This is
-                    important because two Documents with the exact same content, but one having the content in two Fields
-                    and the other in one Field will return different scores for the same query due to length normalization
-                    (assumming the
-                    <a href="core/org/apache/lucene/search/DefaultSimilarity.html">DefaultSimilarity</a>
-                    on the Fields).
-                </p>
-<a name="N1006E"></a><a name="Score Boosting"></a>
-<h3 class="boxed">Score Boosting</h3>
-<p>Lucene allows influencing search results by "boosting" in more than one level:
-                  <ul>
-                    
-<li>
-<b>Document level boosting</b>
-                    - while indexing - by calling
-                    <a href="core/org/apache/lucene/document/Document.html#setBoost(float)">document.setBoost()</a>
-                    before a document is added to the index.
-                    </li>
-                    
-<li>
-<b>Document's Field level boosting</b>
-                    - while indexing - by calling
-                    <a href="core/org/apache/lucene/document/Fieldable.html#setBoost(float)">field.setBoost()</a>
-                    before adding a field to the document (and before adding the document to the index).
-                    </li>
-                    
-<li>
-<b>Query level boosting</b>
-                     - during search, by setting a boost on a query clause, calling
-                     <a href="core/org/apache/lucene/search/Query.html#setBoost(float)">Query.setBoost()</a>.
-                    </li>
-                  
-</ul>
-                
-</p>
-<p>Indexing time boosts are preprocessed for storage efficiency and written to
-                  the directory (when writing the document) in a single byte (!) as follows:
-                  For each field of a document, all boosts of that field
-                  (i.e. all boosts under the same field name in that doc) are multiplied.
-                  The result is multiplied by the boost of the document,
-                  and also multiplied by a "field length norm" value
-                  that represents the length of that field in that doc
-                  (so shorter fields are automatically boosted up).
-                  The result is decoded as a single byte
-                  (with some precision loss of course) and stored in the directory.
-                  The similarity object in effect at indexing computes the length-norm of the field.
-                </p>
-<p>This composition of 1-byte representation of norms
-                (that is, indexing time multiplication of field boosts &amp; doc boost &amp; field-length-norm)
-                is nicely described in
-                <a href="core/org/apache/lucene/document/Fieldable.html#setBoost(float)">Fieldable.setBoost()</a>.
-                </p>
-<p>Encoding and decoding of the resulted float norm in a single byte are done by the
-                static methods of the class Similarity:
-                <a href="core/org/apache/lucene/search/Similarity.html#encodeNorm(float)">encodeNorm()</a> and
-                <a href="core/org/apache/lucene/search/Similarity.html#decodeNorm(byte)">decodeNorm()</a>.
-                Due to loss of precision, it is not guaranteed that decode(encode(x)) = x,
-                e.g. decode(encode(0.89)) = 0.75.
-                At scoring (search) time, this norm is brought into the score of document
-                as <b>norm(t, d)</b>, as shown by the formula in
-                <a href="core/org/apache/lucene/search/Similarity.html">Similarity</a>.
-                </p>
-<a name="N100B1"></a><a name="Understanding the Scoring Formula"></a>
-<h3 class="boxed">Understanding the Scoring Formula</h3>
-<p>
-                This scoring formula is described in the
-                    <a href="core/org/apache/lucene/search/Similarity.html">Similarity</a> class.  Please take the time to study this formula, as it contains much of the information about how the
-                    basics of Lucene scoring work, especially the
-                    <a href="core/org/apache/lucene/search/TermQuery.html">TermQuery</a>.
-                </p>
-<a name="N100C2"></a><a name="The Big Picture"></a>
-<h3 class="boxed">The Big Picture</h3>
-<p>OK, so the tf-idf formula and the
-                    <a href="core/org/apache/lucene/search/Similarity.html">Similarity</a>
-                    is great for understanding the basics of Lucene scoring, but what really drives Lucene scoring are
-                    the use and interactions between the
-                    <a href="core/org/apache/lucene/search/Query.html">Query</a> classes, as created by each application in
-                    response to a user's information need.
-                </p>
-<p>In this regard, Lucene offers a wide variety of <a href="core/org/apache/lucene/search/Query.html">Query</a> implementations, most of which are in the
-                    <a href="core/org/apache/lucene/search/package-summary.html">org.apache.lucene.search</a> package.
-                    These implementations can be combined in a wide variety of ways to provide complex querying
-                    capabilities along with
-                    information about where matches took place in the document collection. The <a href="#Query Classes">Query</a>
-                    section below
-                    highlights some of the more important Query classes.  For information on the other ones, see the
-                    <a href="core/org/apache/lucene/search/package-summary.html">package summary</a>.  For details on implementing
-                    your own Query class, see <a href="#Changing your Scoring -- Expert Level">Changing your Scoring --
-                    Expert Level</a> below.
-                </p>
-<p>Once a Query has been created and submitted to the
-                    <a href="core/org/apache/lucene/search/IndexSearcher.html">IndexSearcher</a>, the scoring process
-                begins.  (See the <a href="#Appendix">Appendix</a> Algorithm section for more notes on the process.)  After some infrastructure setup,
-                control finally passes to the <a href="core/org/apache/lucene/search/Weight.html">Weight</a> implementation and its
-                    <a href="core/org/apache/lucene/search/Scorer.html">Scorer</a> instance.  In the case of any type of
-                    <a href="core/org/apache/lucene/search/BooleanQuery.html">BooleanQuery</a>, scoring is handled by the
-                    <a href="http://svn.apache.org/viewvc/lucene/dev/trunk/lucene/core/src/java/org/apache/lucene/search/BooleanQuery.java?view=log">BooleanWeight2</a>
-                    (link goes to ViewVC BooleanQuery java code which contains the BooleanWeight2 inner class) or
-                    <a href="http://svn.apache.org/viewvc/lucene/dev/trunk/lucene/core/src/java/org/apache/lucene/search/BooleanQuery.java?view=log">BooleanWeight</a>
-                    (link goes to ViewVC BooleanQuery java code, which contains the BooleanWeight inner class).
-                </p>
-<p>
-                    Assuming the use of the BooleanWeight2, a
-                    BooleanScorer2 is created by bringing together
-                    all of the
-                    <a href="core/org/apache/lucene/search/Scorer.html">Scorer</a>s from the sub-clauses of the BooleanQuery.
-                    When the BooleanScorer2 is asked to score it delegates its work to an internal Scorer based on the type
-                    of clauses in the Query.  This internal Scorer essentially loops over the sub scorers and sums the scores
-                    provided by each scorer while factoring in the coord() score.
-                    <!-- Do we want to fill in the details of the counting sum scorer, disjunction scorer, etc.? -->
-                </p>
-<a name="N10112"></a><a name="Query Classes"></a>
-<h3 class="boxed">Query Classes</h3>
-<p>For information on the Query Classes, refer to the
-                    <a href="core/org/apache/lucene/search/package-summary.html#query">search package javadocs</a>
-                
-</p>
-<a name="N1011F"></a><a name="Changing Similarity"></a>
-<h3 class="boxed">Changing Similarity</h3>
-<p>One of the ways of changing the scoring characteristics of Lucene is to change the similarity factors.  For information on
-                how to do this, see the
-                    <a href="core/org/apache/lucene/search/package-summary.html#changingSimilarity">search package javadocs</a>
-</p>
-</div>
-        
-<a name="N1012C"></a><a name="Changing your Scoring -- Expert Level"></a>
-<h2 class="boxed">Changing your Scoring -- Expert Level</h2>
-<div class="section">
-<p>At a much deeper level, one can affect scoring by implementing their own Query classes (and related scoring classes.)  To learn more
-                about how to do this, refer to the
-                <a href="core/org/apache/lucene/search/package-summary.html#scoring">search package javadocs</a>
-            
-</p>
-</div>
-
-        
-<a name="N10139"></a><a name="Appendix"></a>
-<h2 class="boxed">Appendix</h2>
-<div class="section">
-<a name="N1013E"></a><a name="Algorithm"></a>
-<h3 class="boxed">Algorithm</h3>
-<p>This section is mostly notes on stepping through the Scoring process and serves as
-                    fertilizer for the earlier sections.</p>
-<p>In the typical search application, a
-                    <a href="core/org/apache/lucene/search/Query.html">Query</a>
-                    is passed to the
-                    <a href="core/org/apache/lucene/search/Searcher.html">Searcher</a>
-                    , beginning the scoring process.
-                </p>
-<p>Once inside the Searcher, a
-                    <a href="core/org/apache/lucene/search/Collector.html">Collector</a>
-                    is used for the scoring and sorting of the search results.
-                    These important objects are involved in a search:
-                    <ol>
-                        
-<li>The
-                            <a href="core/org/apache/lucene/search/Weight.html">Weight</a>
-                            object of the Query. The Weight object is an internal representation of the Query that
-                            allows the Query to be reused by the Searcher.
-                        </li>
-                        
-<li>The Searcher that initiated the call.</li>
-                        
-<li>A
-                            <a href="core/org/apache/lucene/search/Filter.html">Filter</a>
-                            for limiting the result set. Note, the Filter may be null.
-                        </li>
-                        
-<li>A
-                            <a href="core/org/apache/lucene/search/Sort.html">Sort</a>
-                            object for specifying how to sort the results if the standard score based sort method is not
-                            desired.
-                        </li>
-                    
-</ol>
-                
-</p>
-<p> Assuming we are not sorting (since sorting doesn't
-                    effect the raw Lucene score),
-                    we call one of the search methods of the Searcher, passing in the
-                    <a href="core/org/apache/lucene/search/Weight.html">Weight</a>
-                    object created by Searcher.createWeight(Query),
-                    <a href="core/org/apache/lucene/search/Filter.html">Filter</a>
-                    and the number of results we want. This method
-                    returns a
-                    <a href="core/org/apache/lucene/search/TopDocs.html">TopDocs</a>
-                    object, which is an internal collection of search results.
-                    The Searcher creates a
-                    <a href="core/org/apache/lucene/search/TopScoreDocCollector.html">TopScoreDocCollector</a>
-                    and passes it along with the Weight, Filter to another expert search method (for more on the
-                    <a href="core/org/apache/lucene/search/Collector.html">Collector</a>
-                    mechanism, see
-                    <a href="core/org/apache/lucene/search/Searcher.html">Searcher</a>
-                    .) The TopDocCollector uses a
-                    <a href="core/org/apache/lucene/util/PriorityQueue.html">PriorityQueue</a>
-                    to collect the top results for the search.
-                </p>
-<p>If a Filter is being used, some initial setup is done to determine which docs to include. Otherwise,
-                    we ask the Weight for
-                    a
-                    <a href="core/org/apache/lucene/search/Scorer.html">Scorer</a>
-                    for the
-                    <a href="core/org/apache/lucene/index/IndexReader.html">IndexReader</a>
-                    of the current searcher and we proceed by
-                    calling the score method on the
-                    <a href="core/org/apache/lucene/search/Scorer.html">Scorer</a>
-                    .
-                </p>
-<p>At last, we are actually going to score some documents. The score method takes in the Collector
-                    (most likely the TopScoreDocCollector or TopFieldCollector) and does its business.
-                    Of course, here is where things get involved. The
-                    <a href="core/org/apache/lucene/search/Scorer.html">Scorer</a>
-                    that is returned by the
-                    <a href="core/org/apache/lucene/search/Weight.html">Weight</a>
-                    object depends on what type of Query was submitted. In most real world applications with multiple
-                    query terms,
-                    the
-                    <a href="core/org/apache/lucene/search/Scorer.html">Scorer</a>
-                    is going to be a
-                    <a href="http://svn.apache.org/viewvc/lucene/dev/trunk/lucene/core/src/java/org/apache/lucene/search/BooleanScorer2.java?view=log">BooleanScorer2</a>
-                    (see the section on customizing your scoring for info on changing this.)
-
-                </p>
-<p>Assuming a BooleanScorer2 scorer, we first initialize the Coordinator, which is used to apply the
-                    coord() factor. We then
-                    get a internal Scorer based on the required, optional and prohibited parts of the query.
-                    Using this internal Scorer, the BooleanScorer2 then proceeds
-                    into a while loop based on the Scorer#next() method. The next() method advances to the next document
-                    matching the query. This is an
-                    abstract method in the Scorer class and is thus overriden by all derived
-                    implementations.  <!-- DOUBLE CHECK THIS -->If you have a simple OR query
-                    your internal Scorer is most likely a DisjunctionSumScorer, which essentially combines the scorers
-                    from the sub scorers of the OR'd terms.</p>
-</div>
-
-</body>
-</html>
--- a/lucene/site/xsl/index.xsl
+++ b/lucene/site/xsl/index.xsl
@ -61,7 +61,7 @@
          <ul>
            <li><a href="changes/Changes.html">Changes</a>: List of changes in this release.</li>
            <li><a href="fileformats.html">File Formats</a>: Guide to the index format used by Lucene.</li>
-            <li><a href="scoring.html">Scoring in Lucene</a>: Introduction to how Lucene scores documents.</li>
+            <li><a href="core/org/apache/lucene/search/package-summary.html#package_description">Search and Scoring in Lucene</a>: Introduction to how Lucene scores documents.</li>
            <li><a href="core/org/apache/lucene/search/similarities/TFIDFSimilarity.html">Classic Scoring Formula</a>: Formula of Lucene's classic <a href="http://en.wikipedia.org/wiki/Vector_Space_Model">Vector Space</a> implementation. (look <a href="core/org/apache/lucene/search/similarities/package-summary.html#package_description">here</a> for other models)</li>
            <li><a href="queryparser/org/apache/lucene/queryparser/classic/package-summary.html#package_description">Classic QueryParser Syntax</a>: Overview of the Classic QueryParser's syntax and features.</li>
            <li><a href="facet/org/apache/lucene/facet/doc-files/userguide.html">Facet User Guide</a>: User's Guide to implementing <a href="http://en.wikipedia.org/wiki/Faceted_search">Faceted search</a>.</li>