Update javadocs for Lucene 8.

This fixes a couple mistakes, puts more emphasis on BM25 compared to Classic and
gives more guidance regarding custom scores without a custom query.
Adrien Grand 2018-09-03 12:21:12 +02:00
parent d93c46ea94
commit a1ec716e10
5 changed files with 102 additions and 89 deletions

@ -110,8 +110,10 @@
* inverted index, is comprised of "postings." The postings, with their term dictionary, can be * inverted index, is comprised of "postings." The postings, with their term dictionary, can be
* thought of as a map that provides efficient lookup given a {@link org.apache.lucene.index.Term} * thought of as a map that provides efficient lookup given a {@link org.apache.lucene.index.Term}
* (roughly, a word or token), to (the ordered list of) {@link org.apache.lucene.document.Document}s * (roughly, a word or token), to (the ordered list of) {@link org.apache.lucene.document.Document}s
* containing that Term. Postings do not provide any way of retrieving terms given a document, * containing that Term. Codecs may additionally record
* short of scanning the entire index.</p> * {@link org.apache.lucene.index.ImpactsEnum#getImpacts impacts} alongside postings in order to be
* able to skip over low-scoring documents at search time. Postings do not provide any way of
* retrieving terms given a document, short of scanning the entire index.</p>
* *
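The lookup direction described in this paragraph can be illustrated with a minimal sketch. It assumes a toy in-memory map, not Lucene's actual codec format, and the class and method names (`PostingsSketch`, `docsFor`, `termsFor`) are hypothetical. It only shows why term-to-documents lookup is cheap while document-to-terms lookup needs a full scan.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Toy model of postings: term -> ordered list of document ids.
// Illustration only; this is not how Lucene stores postings on disk.
class PostingsSketch {
    private final Map<String, List<Integer>> postings = new TreeMap<>();

    // Documents are assumed to be added in increasing docId order,
    // which keeps every postings list sorted.
    void addDocument(int docId, String text) {
        for (String token : text.toLowerCase().split("\\s+")) {
            if (token.isEmpty()) {
                continue;
            }
            List<Integer> docs = postings.computeIfAbsent(token, t -> new ArrayList<>());
            // Record each document once per term, even if the term repeats.
            if (docs.isEmpty() || docs.get(docs.size() - 1) != docId) {
                docs.add(docId);
            }
        }
    }

    // Efficient direction: a single map lookup from term to documents.
    List<Integer> docsFor(String term) {
        return postings.getOrDefault(term, Collections.<Integer>emptyList());
    }

    // Inefficient direction: finding the terms of a document means
    // scanning every postings list in the index.
    List<String> termsFor(int docId) {
        List<String> terms = new ArrayList<>();
        for (Map.Entry<String, List<Integer>> entry : postings.entrySet()) {
            if (entry.getValue().contains(docId)) {
                terms.add(entry.getKey());
            }
        }
        return terms;
    }
}
```

Stored fields exist precisely to cover the second direction without such a scan.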
* <a name="stored-fields"></a> * <a name="stored-fields"></a>
* <p>Stored fields are essentially the opposite of postings, providing efficient retrieval of field * <p>Stored fields are essentially the opposite of postings, providing efficient retrieval of field

@ -28,6 +28,10 @@ import org.apache.lucene.util.automaton.Automaton;
* <p>This query matches the documents looking for terms that fall into the * <p>This query matches the documents looking for terms that fall into the
* supplied range according to {@link BytesRef#compareTo(BytesRef)}. * supplied range according to {@link BytesRef#compareTo(BytesRef)}.
* *
* <p><b>NOTE</b>: {@link TermRangeQuery} performs significantly slower than
* {@link PointRangeQuery point-based ranges} as it needs to visit all terms
* that match the range and merge their matches.
*
* <p>This query uses the {@link * <p>This query uses the {@link
* MultiTermQuery#CONSTANT_SCORE_REWRITE} * MultiTermQuery#CONSTANT_SCORE_REWRITE}
* rewrite method. * rewrite method.
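The ordering that a term range relies on can be sketched in plain Java. This is a minimal illustration that assumes terms are UTF-8 encoded and compared byte by byte as unsigned values, with both bounds treated as inclusive. The class and method names are hypothetical and none of this is Lucene code. It also shows why term ranges order numbers written as text in a surprising way.

```java
import java.nio.charset.StandardCharsets;

// Sketch of byte-wise term ordering; names here are illustrative only.
class TermOrderSketch {
    // Lexicographic comparison where each byte is treated as unsigned (0..255).
    static int compareUnsigned(byte[] a, byte[] b) {
        int n = Math.min(a.length, b.length);
        for (int i = 0; i < n; i++) {
            int diff = (a[i] & 0xFF) - (b[i] & 0xFF);
            if (diff != 0) {
                return diff;
            }
        }
        // A term that is a strict prefix of another sorts first.
        return a.length - b.length;
    }

    // Range check with inclusive bounds (an assumption made for this sketch).
    static boolean inRange(String term, String lower, String upper) {
        byte[] t = term.getBytes(StandardCharsets.UTF_8);
        byte[] lo = lower.getBytes(StandardCharsets.UTF_8);
        byte[] hi = upper.getBytes(StandardCharsets.UTF_8);
        return compareUnsigned(t, lo) >= 0 && compareUnsigned(t, hi) <= 0;
    }
}
```

With this ordering, `inRange("banana", "a", "c")` holds, while the text `"10"` sorts before `"2"`, which is one reason point-based ranges are the better fit for numeric data.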

@ -44,7 +44,7 @@
* <p> * <p>
* Once a Query has been created and submitted to the {@link org.apache.lucene.search.IndexSearcher IndexSearcher}, the scoring * Once a Query has been created and submitted to the {@link org.apache.lucene.search.IndexSearcher IndexSearcher}, the scoring
* process begins. After some infrastructure setup, control finally passes to the {@link org.apache.lucene.search.Weight Weight} * process begins. After some infrastructure setup, control finally passes to the {@link org.apache.lucene.search.Weight Weight}
* implementation and its {@link org.apache.lucene.search.Scorer Scorer} or {@link org.apache.lucene.search.BulkScorer BulkScore} * implementation and its {@link org.apache.lucene.search.Scorer Scorer} or {@link org.apache.lucene.search.BulkScorer BulkScorer}
* instances. See the <a href="#algorithm">Algorithm</a> section for more notes on the process. * instances. See the <a href="#algorithm">Algorithm</a> section for more notes on the process.
* <!-- FILL IN MORE HERE --> * <!-- FILL IN MORE HERE -->
* <!-- TODO: this page over-links the same things too many times --> * <!-- TODO: this page over-links the same things too many times -->
@ -95,9 +95,11 @@
* If a query is made up of all SHOULD clauses, then every document in the result * If a query is made up of all SHOULD clauses, then every document in the result
* set matches at least one of these clauses.</p></li> * set matches at least one of these clauses.</p></li>
* *
* <li><p>{@link org.apache.lucene.search.BooleanClause.Occur#MUST MUST} &mdash; Use this operator when a clause is required to occur in the result set. Every * <li><p>{@link org.apache.lucene.search.BooleanClause.Occur#MUST MUST} &mdash; Use this operator when a clause is required to occur in the result set and should
* document in the result set will match * contribute to the score. Every document in the result set will match all such clauses.</p></li>
* all such clauses.</p></li> *
* <li><p>{@link org.apache.lucene.search.BooleanClause.Occur#FILTER FILTER} &mdash; Use this operator when a clause is required to occur in the result set but
* should not contribute to the score. Every document in the result set will match all such clauses.</p></li>
* *
* <li><p>{@link org.apache.lucene.search.BooleanClause.Occur#MUST_NOT MUST NOT} &mdash; Use this operator when a * <li><p>{@link org.apache.lucene.search.BooleanClause.Occur#MUST_NOT MUST NOT} &mdash; Use this operator when a
* clause must not occur in the result set. No * clause must not occur in the result set. No
@ -113,7 +115,7 @@
* {@link org.apache.lucene.search.TermQuery TermQuery} clauses, * {@link org.apache.lucene.search.TermQuery TermQuery} clauses,
* for example by {@link org.apache.lucene.search.WildcardQuery WildcardQuery}. * for example by {@link org.apache.lucene.search.WildcardQuery WildcardQuery}.
* The default setting for the maximum number * The default setting for the maximum number
* of clauses 1024, but this can be changed via the * of clauses is 1024, but this can be changed via the
* static method {@link org.apache.lucene.search.BooleanQuery#setMaxClauseCount(int)}. * static method {@link org.apache.lucene.search.BooleanQuery#setMaxClauseCount(int)}.
* *
* <h3>Phrases</h3> * <h3>Phrases</h3>
@ -149,23 +151,6 @@
* </ol> * </ol>
* *
* <h3> * <h3>
* {@link org.apache.lucene.search.TermRangeQuery TermRangeQuery}
* </h3>
*
* <p>The
* {@link org.apache.lucene.search.TermRangeQuery TermRangeQuery}
* matches all documents that occur in the
* exclusive range of a lower
* {@link org.apache.lucene.index.Term Term}
* and an upper
* {@link org.apache.lucene.index.Term Term}
* according to {@link org.apache.lucene.util.BytesRef#compareTo BytesRef.compareTo()}. It is not intended
* for numerical ranges; use {@link org.apache.lucene.search.PointRangeQuery PointRangeQuery} instead.
*
* For example, one could find all documents
* that have terms beginning with the letters <tt>a</tt> through <tt>c</tt>.
*
* <h3>
* {@link org.apache.lucene.search.PointRangeQuery PointRangeQuery} * {@link org.apache.lucene.search.PointRangeQuery PointRangeQuery}
* </h3> * </h3>
* *
@ -274,6 +259,7 @@
* *
* <a name="changingScoring"></a> * <a name="changingScoring"></a>
* <h2>Changing Scoring &mdash; Similarity</h2> * <h2>Changing Scoring &mdash; Similarity</h2>
* <h3>Changing the scoring formula</h3>
* <p> * <p>
* Changing {@link org.apache.lucene.search.similarities.Similarity Similarity} is an easy way to * Changing {@link org.apache.lucene.search.similarities.Similarity Similarity} is an easy way to
* influence scoring, this is done at index-time with * influence scoring, this is done at index-time with
@ -289,14 +275,54 @@
* extend by plugging in a different component (e.g. term frequency normalizer). * extend by plugging in a different component (e.g. term frequency normalizer).
* <p> * <p>
* Finally, you can extend the low level {@link org.apache.lucene.search.similarities.Similarity Similarity} directly * Finally, you can extend the low level {@link org.apache.lucene.search.similarities.Similarity Similarity} directly
* to implement a new retrieval model, or to use external scoring factors particular to your application. For example, * to implement a new retrieval model.
* a custom Similarity can access per-document values via {@link org.apache.lucene.index.NumericDocValues} and
* integrate them into the score.
* <p> * <p>
* See the {@link org.apache.lucene.search.similarities} package documentation for information * See the {@link org.apache.lucene.search.similarities} package documentation for information
* on the built-in available scoring models and extending or changing Similarity. * on the built-in available scoring models and extending or changing Similarity.
*
* <h3>Integrating field values into the score</h3>
* <p>While similarities help score a document relative to a query, it is also common for documents to hold
* features that measure the quality of a match. Such features are best integrated into the score by indexing
* a {@link org.apache.lucene.document.FeatureField FeatureField} with the document at index-time, and then
* combining the similarity score and the feature score using a linear combination. For instance the below
* query matches the same documents as {@code originalQuery} and computes scores as
* {@code similarityScore + 0.7 * featureScore}:
* <pre class="prettyprint">
* Query originalQuery = new BooleanQuery.Builder()
* .add(new TermQuery(new Term("body", "apache")), Occur.SHOULD)
* .add(new TermQuery(new Term("body", "lucene")), Occur.SHOULD)
* .build();
* Query featureQuery = FeatureField.newSaturationQuery("features", "pagerank");
* Query query = new BooleanQuery.Builder()
* .add(originalQuery, Occur.MUST)
* .add(new BoostQuery(featureQuery, 0.7f), Occur.SHOULD)
* .build();
* </pre>
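The arithmetic behind `similarityScore + 0.7 * featureScore` can be sketched without any Lucene dependency. The sketch assumes the feature score follows a saturation curve of the form `S / (S + pivot)`, where `S` is the indexed feature value. The pivot value and all names below are hypothetical, chosen only for illustration.

```java
// Plain-Java sketch of combining a similarity score with a saturating feature score.
class FeatureScoreSketch {
    // Saturation curve: grows with the feature value but never reaches 1.
    // At featureValue == pivot the result is exactly 0.5.
    static double saturation(double featureValue, double pivot) {
        return featureValue / (featureValue + pivot);
    }

    // Linear combination of the query's similarity score and the weighted feature score.
    static double combined(double similarityScore, double featureValue,
                           double pivot, double weight) {
        return similarityScore + weight * saturation(featureValue, pivot);
    }
}
```

Because the feature contribution is bounded by the weight (0.7 here), a very large feature value cannot swamp the text relevance part of the score.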
* *
* * <p>A less efficient yet more flexible way of modifying scores is to index scoring features into
* doc-value fields and then combine them with the similarity score using a
* <a href="{@docRoot}/../queries/org/apache/lucene/queries/function/FunctionScoreQuery.html">FunctionScoreQuery</a>
* from the <a href="{@docRoot}/../queries/overview-summary.html">queries module</a>. For instance
* the below example shows how to compute scores as {@code similarityScore * Math.log(popularity)}
* using the <a href="{@docRoot}/../expressions/overview-summary.html">expressions module</a> and
* assuming that values for the {@code popularity} field have been set in a
* {@link org.apache.lucene.document.NumericDocValuesField NumericDocValuesField} at index time:
* <pre class="prettyprint">
* // compile an expression:
* Expression expr = JavascriptCompiler.compile("_score * ln(popularity)");
*
* // SimpleBindings just maps variables to SortField instances
* SimpleBindings bindings = new SimpleBindings();
* bindings.add(new SortField("_score", SortField.Type.SCORE));
* bindings.add(new SortField("popularity", SortField.Type.INT));
*
* // create a query that matches based on 'originalQuery' but
* // scores using expr
* Query query = new FunctionScoreQuery(
* originalQuery,
* expr.getDoubleValuesSource(bindings));
* </pre>
*
* <a name="customQueriesExpert"></a> * <a name="customQueriesExpert"></a>
* <h2>Custom Queries &mdash; Expert Level</h2> * <h2>Custom Queries &mdash; Expert Level</h2>
* *
@ -311,15 +337,14 @@
* {@link org.apache.lucene.search.Query Query} &mdash; The abstract object representation of the * {@link org.apache.lucene.search.Query Query} &mdash; The abstract object representation of the
* user's information need.</li> * user's information need.</li>
* <li> * <li>
* {@link org.apache.lucene.search.Weight Weight} &mdash; The internal interface representation of * {@link org.apache.lucene.search.Weight Weight} &mdash; A specialization of a Query for a given
* the user's Query, so that Query objects may be reused. * index. This typically associates a Query object with index statistics that are later used to
* This is global (across all segments of the index) and * compute document scores.
* generally will require global statistics (such as docFreq
* for a given term across all segments).</li>
* <li> * <li>
* {@link org.apache.lucene.search.Scorer Scorer} &mdash; An abstract class containing common * {@link org.apache.lucene.search.Scorer Scorer} &mdash; The core class of the scoring process:
* functionality for scoring. Provides both scoring and * for a given segment, scorers return {@link org.apache.lucene.search.Scorer#iterator iterators}
* explanation capabilities. This is created per-segment.</li> * over matches and give a way to compute the {@link org.apache.lucene.search.Scorer#score score}
* of these matches.</li>
* <li> * <li>
* {@link org.apache.lucene.search.BulkScorer BulkScorer} &mdash; An abstract class that scores * {@link org.apache.lucene.search.BulkScorer BulkScorer} &mdash; An abstract class that scores
* a range of documents. A default implementation simply iterates through the hits from * a range of documents. A default implementation simply iterates through the hits from
@ -338,7 +363,7 @@
* {@link org.apache.lucene.search.Query Query} class has several methods that are important for * {@link org.apache.lucene.search.Query Query} class has several methods that are important for
* derived classes: * derived classes:
* <ol> * <ol>
* <li>{@link org.apache.lucene.search.Query#createWeight(IndexSearcher,ScoreMode,float) createWeight(IndexSearcher searcher, boolean needsScores, float boost)} &mdash; A * <li>{@link org.apache.lucene.search.Query#createWeight(IndexSearcher,ScoreMode,float) createWeight(IndexSearcher searcher, ScoreMode scoreMode, float boost)} &mdash; A
* {@link org.apache.lucene.search.Weight Weight} is the internal representation of the * {@link org.apache.lucene.search.Weight Weight} is the internal representation of the
* Query, so each Query implementation must * Query, so each Query implementation must
* provide an implementation of Weight. See the subsection on <a * provide an implementation of Weight. See the subsection on <a
@ -347,7 +372,7 @@
* <li>{@link org.apache.lucene.search.Query#rewrite(org.apache.lucene.index.IndexReader) rewrite(IndexReader reader)} &mdash; Rewrites queries into primitive queries. Primitive queries are: * <li>{@link org.apache.lucene.search.Query#rewrite(org.apache.lucene.index.IndexReader) rewrite(IndexReader reader)} &mdash; Rewrites queries into primitive queries. Primitive queries are:
* {@link org.apache.lucene.search.TermQuery TermQuery}, * {@link org.apache.lucene.search.TermQuery TermQuery},
* {@link org.apache.lucene.search.BooleanQuery BooleanQuery}, <span * {@link org.apache.lucene.search.BooleanQuery BooleanQuery}, <span
* >and other queries that implement {@link org.apache.lucene.search.Query#createWeight(IndexSearcher,ScoreMode,float) createWeight(IndexSearcher searcher,boolean needsScores, float boost)}</span></li> * >and other queries that implement {@link org.apache.lucene.search.Query#createWeight(IndexSearcher,ScoreMode,float) createWeight(IndexSearcher searcher,ScoreMode scoreMode, float boost)}</span></li>
* </ol> * </ol>
* <a name="weightClass"></a> * <a name="weightClass"></a>
* <h3>The Weight Interface</h3> * <h3>The Weight Interface</h3>
@ -356,23 +381,15 @@
* interface provides an internal representation of the Query so that it can be reused. Any * interface provides an internal representation of the Query so that it can be reused. Any
* {@link org.apache.lucene.search.IndexSearcher IndexSearcher} * {@link org.apache.lucene.search.IndexSearcher IndexSearcher}
* dependent state should be stored in the Weight implementation, * dependent state should be stored in the Weight implementation,
* not in the Query class. The interface defines five methods that must be implemented: * not in the Query class. The interface defines four main methods:
* <ol> * <ol>
* <li> * <li>
* {@link org.apache.lucene.search.Weight#getQuery getQuery()} &mdash; Pointer to the
* Query that this Weight represents.</li>
* <li>
* {@link org.apache.lucene.search.Weight#scorer scorer()} &mdash; * {@link org.apache.lucene.search.Weight#scorer scorer()} &mdash;
* Construct a new {@link org.apache.lucene.search.Scorer Scorer} for this Weight. See <a href="#scorerClass">The Scorer Class</a> * Construct a new {@link org.apache.lucene.search.Scorer Scorer} for this Weight. See <a href="#scorerClass">The Scorer Class</a>
* below for help defining a Scorer. As the name implies, the Scorer is responsible for doing the actual scoring of documents * below for help defining a Scorer. As the name implies, the Scorer is responsible for doing the actual scoring of documents
* given the Query. * given the Query.
* </li> * </li>
* <li> * <li>
* {@link org.apache.lucene.search.Weight#bulkScorer bulkScorer()} &mdash;
* Construct a new {@link org.apache.lucene.search.BulkScorer BulkScorer} for this Weight. See <a href="#bulkScorerClass">The BulkScorer Class</a>
* below for help defining a BulkScorer. This is an optional method, and most queries do not implement it.
* </li>
* <li>
* {@link org.apache.lucene.search.Weight#explain(org.apache.lucene.index.LeafReaderContext, int) * {@link org.apache.lucene.search.Weight#explain(org.apache.lucene.index.LeafReaderContext, int)
* explain(LeafReaderContext context, int doc)} &mdash; Provide a means for explaining why a given document was * explain(LeafReaderContext context, int doc)} &mdash; Provide a means for explaining why a given document was
* scored the way it was. * scored the way it was.
@ -380,6 +397,16 @@
* that scores via a {@link org.apache.lucene.search.similarities.Similarity Similarity} will make use of the Similarity's implementation: * that scores via a {@link org.apache.lucene.search.similarities.Similarity Similarity} will make use of the Similarity's implementation:
* {@link org.apache.lucene.search.similarities.Similarity.SimScorer#explain(Explanation, long) SimScorer#explain(Explanation freq, long norm)}. * {@link org.apache.lucene.search.similarities.Similarity.SimScorer#explain(Explanation, long) SimScorer#explain(Explanation freq, long norm)}.
* </li> * </li>
* <li>
* {@link org.apache.lucene.search.Weight#extractTerms(java.util.Set) extractTerms(Set&lt;Term&gt; terms)} &mdash; Extract terms that
* this query operates on. This is typically used to support distributed search: knowing the terms that a query operates on helps
* merge index statistics of these terms so that scores are computed over a subset of the data like they would if all documents
* were in the same index.
* </li>
* <li>
* {@link org.apache.lucene.search.Weight#matches matches(LeafReaderContext context, int doc)} &mdash; Give information about positions
* and offsets of matches. This is typically useful to implement highlighting.
* </li>
* </ol> * </ol>
* <a name="scorerClass"></a> * <a name="scorerClass"></a>
* <h3>The Scorer Class</h3> * <h3>The Scorer Class</h3>
@ -458,17 +485,13 @@
* This method returns a {@link org.apache.lucene.search.TopDocs TopDocs} object, * This method returns a {@link org.apache.lucene.search.TopDocs TopDocs} object,
* which is an internal collection of search results. The IndexSearcher creates * which is an internal collection of search results. The IndexSearcher creates
* a {@link org.apache.lucene.search.TopScoreDocCollector TopScoreDocCollector} and * a {@link org.apache.lucene.search.TopScoreDocCollector TopScoreDocCollector} and
* passes it along with the Weight, Filter to another expert search method (for * passes it along with the Weight to another expert search method (for
* more on the {@link org.apache.lucene.search.Collector Collector} mechanism, * more on the {@link org.apache.lucene.search.Collector Collector} mechanism,
* see {@link org.apache.lucene.search.IndexSearcher IndexSearcher}). The TopScoreDocCollector * see {@link org.apache.lucene.search.IndexSearcher IndexSearcher}). The TopScoreDocCollector
* uses a {@link org.apache.lucene.util.PriorityQueue PriorityQueue} to collect the * uses a {@link org.apache.lucene.util.PriorityQueue PriorityQueue} to collect the
* top results for the search. * top results for the search.
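How a priority queue keeps only the top results can be sketched with `java.util.PriorityQueue`. This is an illustration of the general technique, not Lucene's `TopScoreDocCollector`. The `Hit` class, the tie-breaking rule (lower doc id wins on equal scores) and the method names are assumptions made for the sketch.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.PriorityQueue;

// Sketch of top-k collection with a bounded min-heap.
class TopKSketch {
    static final class Hit {
        final int doc;
        final float score;

        Hit(int doc, float score) {
            this.doc = doc;
            this.score = score;
        }
    }

    // Orders hits from weakest to strongest: lower score is weaker,
    // and on equal scores the higher doc id is weaker.
    static final Comparator<Hit> WEAKEST_FIRST = (a, b) -> {
        int byScore = Float.compare(a.score, b.score);
        return byScore != 0 ? byScore : Integer.compare(b.doc, a.doc);
    };

    static List<Hit> topK(int[] docs, float[] scores, int k) {
        List<Hit> result = new ArrayList<>();
        if (k <= 0) {
            return result;
        }
        // The head of the queue is always the weakest hit kept so far.
        PriorityQueue<Hit> queue = new PriorityQueue<>(k, WEAKEST_FIRST);
        for (int i = 0; i < docs.length; i++) {
            Hit hit = new Hit(docs[i], scores[i]);
            if (queue.size() < k) {
                queue.add(hit);
            } else if (WEAKEST_FIRST.compare(hit, queue.peek()) > 0) {
                // The new hit beats the weakest kept hit, so replace it.
                queue.poll();
                queue.add(hit);
            }
        }
        result.addAll(queue);
        result.sort(WEAKEST_FIRST.reversed()); // strongest first
        return result;
    }
}
```

The queue never holds more than `k` entries, so memory stays bounded regardless of how many documents match.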
* <p>If a Filter is being used, some initial setup is done to determine which docs to include.
* Otherwise, we ask the Weight for a {@link org.apache.lucene.search.Scorer Scorer} for each
* {@link org.apache.lucene.index.IndexReader IndexReader} segment and proceed by calling
* {@link org.apache.lucene.search.BulkScorer#score(org.apache.lucene.search.LeafCollector,org.apache.lucene.util.Bits) BulkScorer.score(LeafCollector,Bits)}.
* <p>At last, we are actually going to score some documents. The score method takes in the Collector * <p>At last, we are actually going to score some documents. The score method takes in the Collector
* (most likely the TopScoreDocCollector or TopFieldCollector) and does its business.Of course, here * (most likely the TopScoreDocCollector or TopFieldCollector) and does its business. Of course, here
* is where things get involved. The {@link org.apache.lucene.search.Scorer Scorer} that is returned * is where things get involved. The {@link org.apache.lucene.search.Scorer Scorer} that is returned
* by the {@link org.apache.lucene.search.Weight Weight} object depends on what type of Query was * by the {@link org.apache.lucene.search.Weight Weight} object depends on what type of Query was
* submitted. In most real world applications with multiple query terms, the * submitted. In most real world applications with multiple query terms, the

@ -73,9 +73,9 @@
* your searching needs. * your searching needs.
* However, in some applications it may be necessary to customize your <a * However, in some applications it may be necessary to customize your <a
* href="Similarity.html">Similarity</a> implementation. For instance, some * href="Similarity.html">Similarity</a> implementation. For instance, some
* applications do not need to * applications do not need to distinguish between shorter and longer documents
* distinguish between shorter and longer documents (see <a * and could set BM25's {@link org.apache.lucene.search.similarities.BM25Similarity#BM25Similarity(float,float) b}
* href="http://www.gossamer-threads.com/lists/lucene/java-user/38967#38967">a "fair" similarity</a>). * parameter to {@code 0}.
* *
* <p>To change {@link org.apache.lucene.search.similarities.Similarity}, one must do so for both indexing and * <p>To change {@link org.apache.lucene.search.similarities.Similarity}, one must do so for both indexing and
* searching, and the changes must happen before * searching, and the changes must happen before
@ -83,15 +83,27 @@
* just isn't well-defined what is going to happen. * just isn't well-defined what is going to happen.
* *
* <p>To make this change, implement your own {@link org.apache.lucene.search.similarities.Similarity} (likely * <p>To make this change, implement your own {@link org.apache.lucene.search.similarities.Similarity} (likely
* you'll want to simply subclass an existing method, be it * you'll want to simply subclass {@link org.apache.lucene.search.similarities.SimilarityBase}), and
* {@link org.apache.lucene.search.similarities.ClassicSimilarity} or a descendant of
* {@link org.apache.lucene.search.similarities.SimilarityBase}), and
* then register the new class by calling * then register the new class by calling
* {@link org.apache.lucene.index.IndexWriterConfig#setSimilarity(Similarity)} * {@link org.apache.lucene.index.IndexWriterConfig#setSimilarity(Similarity)}
* before indexing and * before indexing and
* {@link org.apache.lucene.search.IndexSearcher#setSimilarity(Similarity)} * {@link org.apache.lucene.search.IndexSearcher#setSimilarity(Similarity)}
* before searching. * before searching.
* *
* <h3>Tuning {@linkplain org.apache.lucene.search.similarities.BM25Similarity}</h3>
* <p>{@link org.apache.lucene.search.similarities.BM25Similarity} has
* two parameters that may be tuned:
* <ul>
* <li><tt>k1</tt>, which calibrates term frequency saturation and must be
* non-negative. A value of {@code 0} causes term frequency to be ignored
* entirely, so documents are scored only on the <tt>IDF</tt>
* of the matched terms. Higher values of <tt>k1</tt> increase the impact of
* term frequency on the final score. Default value is {@code 1.2}.</li>
* <li><tt>b</tt>, which controls how much document length should normalize
* term frequency values and must be in {@code [0, 1]}. A value of {@code 0}
* disables length normalization completely. Default value is {@code 0.75}.</li>
* </ul>
*
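The effect of the two parameters can be seen in a small sketch of the term-frequency part of the textbook BM25 formula. This is an assumption-laden illustration: it uses the classic form `freq * (k1 + 1) / (freq + k1 * (1 - b + b * dl / avgdl))`, and constant factors in Lucene's own implementation may differ. The class and method names are hypothetical.

```java
// Sketch of the textbook BM25 term-frequency component (IDF omitted).
class Bm25Sketch {
    // freq: term frequency in the document
    // docLength / avgDocLength: field length relative to the average
    // k1: term frequency saturation, b: strength of length normalization
    static double tfNorm(double freq, double docLength, double avgDocLength,
                         double k1, double b) {
        double lengthNorm = 1 - b + b * (docLength / avgDocLength);
        return freq * (k1 + 1) / (freq + k1 * lengthNorm);
    }
}
```

With `k1 = 0` the result is 1 for any positive frequency, so only IDF would distinguish documents. With `b = 0` the document length drops out of the formula, and with the default `b = 0.75` a longer-than-average document gets a lower value for the same frequency.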
* <h3>Extending {@linkplain org.apache.lucene.search.similarities.SimilarityBase}</h3> * <h3>Extending {@linkplain org.apache.lucene.search.similarities.SimilarityBase}</h3>
* <p> * <p>
* The easiest way to quickly implement a new ranking method is to extend * The easiest way to quickly implement a new ranking method is to extend
@ -112,33 +124,5 @@
* subclassing the Similarity, one can simply introduce a new basic model and tell * subclassing the Similarity, one can simply introduce a new basic model and tell
* {@link org.apache.lucene.search.similarities.DFRSimilarity} to use it. * {@link org.apache.lucene.search.similarities.DFRSimilarity} to use it.
* *
* <h3>Changing {@linkplain org.apache.lucene.search.similarities.ClassicSimilarity}</h3>
* <p>
* If you are interested in use cases for changing your similarity, see the Lucene users's mailing list at <a
* href="http://www.gossamer-threads.com/lists/lucene/java-user/39125">Overriding Similarity</a>.
* In summary, here are a few use cases:
* <ol>
* <li><p>The <code>SweetSpotSimilarity</code> in
* <code>org.apache.lucene.misc</code> gives small
* increases as the frequency increases a small amount
* and then greater increases when you hit the "sweet spot", i.e. where
* you think the frequency of terms is more significant.</li>
* <li><p>Overriding tf &mdash; In some applications, it doesn't matter what the score of a document is as long as a
* matching term occurs. In these
* cases people have overridden Similarity to return 1 from the tf() method.</li>
* <li><p>Changing Length Normalization &mdash; By overriding
* {@link org.apache.lucene.search.similarities.Similarity#computeNorm(org.apache.lucene.index.FieldInvertState state)},
* it is possible to discount how the length of a field contributes
* to a score. In {@link org.apache.lucene.search.similarities.ClassicSimilarity},
* lengthNorm = 1 / (numTerms in field)^0.5, but if one changes this to be
* 1 / (numTerms in field), all fields will be treated
* <a href="http://www.gossamer-threads.com/lists/lucene/java-user/38967#38967">"fairly"</a>.</li>
* </ol>
* In general, Chris Hostetter sums it up best in saying (from <a
* href="http://www.gossamer-threads.com/lists/lucene/java-user/39125#39125">the Lucene users's mailing list</a>):
* <blockquote>[One would override the Similarity in] ... any situation where you know more about your data then just
* that
* it's "text" is a situation where it *might* make sense to to override your
* Similarity method.</blockquote>
*/ */
package org.apache.lucene.search.similarities; package org.apache.lucene.search.similarities;

@ -35,7 +35,7 @@ to check if the results are what we expect):</p>
// Store the index in memory: // Store the index in memory:
Directory directory = new RAMDirectory(); Directory directory = new RAMDirectory();
// To store an index on disk, use this instead: // To store an index on disk, use this instead:
//Directory directory = FSDirectory.open("/tmp/testindex"); //Directory directory = FSDirectory.open(Paths.get("/tmp/testindex"));
IndexWriterConfig config = new IndexWriterConfig(analyzer); IndexWriterConfig config = new IndexWriterConfig(analyzer);
IndexWriter iwriter = new IndexWriter(directory, config); IndexWriter iwriter = new IndexWriter(directory, config);
Document doc = new Document(); Document doc = new Document();
@ -50,7 +50,7 @@ to check if the results are what we expect):</p>
// Parse a simple query that searches for "text": // Parse a simple query that searches for "text":
QueryParser parser = new QueryParser("fieldname", analyzer); QueryParser parser = new QueryParser("fieldname", analyzer);
Query query = parser.parse("text"); Query query = parser.parse("text");
ScoreDoc[] hits = isearcher.search(query, null, 1000).scoreDocs; ScoreDoc[] hits = isearcher.search(query, 10).scoreDocs;
assertEquals(1, hits.length); assertEquals(1, hits.length);
// Iterate through the results: // Iterate through the results:
for (int i = 0; i < hits.length; i++) { for (int i = 0; i < hits.length; i++) {