mirror of https://github.com/apache/lucene.git
LUCENE-4592: Improve Javadocs of NumericRangeQuery
git-svn-id: https://svn.apache.org/repos/asf/lucene/dev/trunk@1418652 13f79535-47bb-0310-9956-ffa450edef68
parent 217f8b076b
commit f15306c649
@@ -73,14 +73,9 @@ import org.apache.lucene.index.Term; // for javadocs
  * details.
  *
  * <p>This query defaults to {@linkplain
- * MultiTermQuery#CONSTANT_SCORE_AUTO_REWRITE_DEFAULT} for
- * 32 bit (int/float) ranges with precisionStep ≤8 and 64
- * bit (long/double) ranges with precisionStep ≤6.
- * Otherwise it uses {@linkplain
- * MultiTermQuery#CONSTANT_SCORE_FILTER_REWRITE} as the
- * number of terms is likely to be high. With precision
- * steps of ≤4, this query can be run with one of the
- * BooleanQuery rewrite methods without changing
+ * MultiTermQuery#CONSTANT_SCORE_AUTO_REWRITE_DEFAULT}.
+ * With precision steps of ≤4, this query can be run with
+ * one of the BooleanQuery rewrite methods without changing
  * BooleanQuery's default max clause count.
  *
  * <br><h3>How it works</h3>
@@ -117,17 +112,29 @@ import org.apache.lucene.index.Term; // for javadocs
  *
  * <a name="precisionStepDesc"><h3>Precision Step</h3>
  * <p>You can choose any <code>precisionStep</code> when encoding values.
- * Lower step values mean more precisions and so more terms in index (and index gets larger).
- * On the other hand, the maximum number of terms to match reduces, which optimized query speed.
- * The formula to calculate the maximum term count is:
- * <pre>
- * n = [ (bitsPerValue/precisionStep - 1) * (2^precisionStep - 1 ) * 2 ] + (2^precisionStep - 1 )
- * </pre>
- * <p><em>(this formula is only correct, when <code>bitsPerValue/precisionStep</code> is an integer;
- * in other cases, the value must be rounded up and the last summand must contain the modulo of the division as
- * precision step)</em>.
- * For longs stored using a precision step of 4, <code>n = 15*15*2 + 15 = 465</code>, and for a precision
- * step of 2, <code>n = 31*3*2 + 3 = 189</code>. But the faster search speed is reduced by more seeking
+ * Lower step values mean more precisions and so more terms in index (and index gets larger). The number
+ * of indexed terms per value is (those are generated by {@link NumericTokenStream}):
+ * <p style="font-family:serif">
+ * indexedTermsPerValue = <b>ceil</b><big>(</big>bitsPerValue / precisionStep<big>)</big>
+ * </p>
+ * As the lower precision terms are shared by many values, the additional terms only
+ * slightly grow the term dictionary (approx. 7% for <code>precisionStep=4</code>), but have a larger
+ * impact on the postings (the postings file will have more entries, as every document is linked to
+ * <code>indexedTermsPerValue</code> terms instead of one). The formula to estimate the growth
+ * of the term dictionary in comparison to one term per value:
+ * <p>
+ * <!-- the formula in the alt attribute was transformed from latex to PNG with http://1.618034.com/latex.php (with 110 dpi): -->
+ * <img src="doc-files/nrq-formula-1.png" alt="\mathrm{termDictOverhead} = \sum\limits_{i=0}^{\mathrm{indexedTermsPerValue}-1} \frac{1}{2^{\mathrm{precisionStep}\cdot i}}" />
+ * </p>
+ * <p>On the other hand, if the <code>precisionStep</code> is smaller, the maximum number of terms to match reduces,
+ * which optimizes query speed. The formula to calculate the maximum number of terms that will be visited while
+ * executing the query is:
+ * <p>
+ * <!-- the formula in the alt attribute was transformed from latex to PNG with http://1.618034.com/latex.php (with 110 dpi): -->
+ * <img src="doc-files/nrq-formula-2.png" alt="\mathrm{maxQueryTerms} = \left[ \left( \mathrm{indexedTermsPerValue} - 1 \right) \cdot \left(2^\mathrm{precisionStep} - 1 \right) \cdot 2 \right] + \left( 2^\mathrm{precisionStep} - 1 \right)" />
+ * </p>
+ * <p>For longs stored using a precision step of 4, <code>maxQueryTerms = 15*15*2 + 15 = 465</code>, and for a precision
+ * step of 2, <code>maxQueryTerms = 31*3*2 + 3 = 189</code>. But the faster search speed is reduced by more seeking
  * in the term enum of the index. Because of this, the ideal <code>precisionStep</code> value can only
  * be found out by testing. <b>Important:</b> You can index with a lower precision step value and test search speed
  * using a multiple of the original step value.</p>
@@ -143,7 +150,7 @@ import org.apache.lucene.index.Term; // for javadocs
  * per value in the index and querying is as slow as a conventional {@link TermRangeQuery}. But it can be used
  * to produce fields, that are solely used for sorting (in this case simply use {@link Integer#MAX_VALUE} as
  * <code>precisionStep</code>). Using {@link IntField},
- * {@link LongField}, {@link FloatField} or {@link DoubleField} for sorting
+ * {@link LongField}, {@link FloatField} or {@link DoubleField} for sorting
  * is ideal, because building the field cache is much faster than with text-only numbers.
  * These fields have one term per value and therefore also work with term enumeration for building distinct lists
  * (e.g. facets / preselected values to search for).
Binary file not shown (size after change: 3.1 KiB).
Binary file not shown (size after change: 3.6 KiB).
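The three formulas in the new Javadoc (indexedTermsPerValue, termDictOverhead, maxQueryTerms) can be checked numerically. The following is a minimal sketch in plain Java, not code from this commit: the class name `PrecisionStepMath` and its method names are hypothetical, it does not use any Lucene API, and it assumes `1 <= precisionStep <= bitsPerValue` with `precisionStep < 63` so the shifts stay within a `long`. It applies the formulas as written in the new Javadoc, including for step values that do not divide `bitsPerValue` evenly.

```java
// Hypothetical helper for illustration only; not part of Lucene.
final class PrecisionStepMath {

    // Guard for the assumptions stated above (keeps 1L << precisionStep in range).
    private static void check(int bitsPerValue, int precisionStep) {
        if (precisionStep < 1 || precisionStep > bitsPerValue || precisionStep >= 63) {
            throw new IllegalArgumentException("unsupported precisionStep: " + precisionStep);
        }
    }

    // indexedTermsPerValue = ceil(bitsPerValue / precisionStep)
    static int indexedTermsPerValue(int bitsPerValue, int precisionStep) {
        check(bitsPerValue, precisionStep);
        return (bitsPerValue + precisionStep - 1) / precisionStep;
    }

    // termDictOverhead = sum over i in [0, indexedTermsPerValue) of 1 / 2^(precisionStep * i)
    static double termDictOverhead(int bitsPerValue, int precisionStep) {
        int terms = indexedTermsPerValue(bitsPerValue, precisionStep);
        double sum = 0.0;
        for (int i = 0; i < terms; i++) {
            sum += 1.0 / Math.pow(2.0, (double) precisionStep * i);
        }
        return sum;
    }

    // maxQueryTerms = (indexedTermsPerValue - 1) * (2^precisionStep - 1) * 2 + (2^precisionStep - 1)
    static long maxQueryTerms(int bitsPerValue, int precisionStep) {
        long levels = indexedTermsPerValue(bitsPerValue, precisionStep);
        long perLevel = (1L << precisionStep) - 1;
        return (levels - 1) * perLevel * 2 + perLevel;
    }

    public static void main(String[] args) {
        // 64-bit values (long/double) with the step values discussed in the Javadoc.
        for (int step : new int[] {2, 4, 8}) {
            System.out.println("precisionStep=" + step
                    + " indexedTermsPerValue=" + indexedTermsPerValue(64, step)
                    + " termDictOverhead=" + termDictOverhead(64, step)
                    + " maxQueryTerms=" + maxQueryTerms(64, step));
        }
    }
}
```

For longs this reproduces the two figures quoted in the Javadoc (465 for a step of 4, 189 for a step of 2), and the overhead for a step of 4 comes out at roughly 1.067, which matches the "approx. 7%" growth estimate. Both term counts stay below 1024, which is BooleanQuery's default max clause count and is consistent with the statement that steps of ≤4 work with the BooleanQuery rewrite methods.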