From f15306c6495ae37e88ecd9026fa6d143a94486ef Mon Sep 17 00:00:00 2001
From: Uwe Schindler
Date: Sat, 8 Dec 2012 12:09:44 +0000
Subject: [PATCH] LUCENE-4592: Improve Javadocs of NumericRangeQuery

git-svn-id: https://svn.apache.org/repos/asf/lucene/dev/trunk@1418652 13f79535-47bb-0310-9956-ffa450edef68
---
 .../lucene/search/NumericRangeQuery.java      | 47 ++++++++++--------
 .../lucene/search/doc-files/nrq-formula-1.png | Bin 0 -> 3171 bytes
 .../lucene/search/doc-files/nrq-formula-2.png | Bin 0 -> 3694 bytes
 3 files changed, 27 insertions(+), 20 deletions(-)
 create mode 100644 lucene/core/src/java/org/apache/lucene/search/doc-files/nrq-formula-1.png
 create mode 100644 lucene/core/src/java/org/apache/lucene/search/doc-files/nrq-formula-2.png

diff --git a/lucene/core/src/java/org/apache/lucene/search/NumericRangeQuery.java b/lucene/core/src/java/org/apache/lucene/search/NumericRangeQuery.java
index fc8da481147..2d7cbe402fd 100644
--- a/lucene/core/src/java/org/apache/lucene/search/NumericRangeQuery.java
+++ b/lucene/core/src/java/org/apache/lucene/search/NumericRangeQuery.java
@@ -73,14 +73,9 @@ import org.apache.lucene.index.Term; // for javadocs
  * details.
  *
  *

This query defaults to {@linkplain
- * MultiTermQuery#CONSTANT_SCORE_AUTO_REWRITE_DEFAULT} for
- * 32 bit (int/float) ranges with precisionStep ≤8 and 64
- * bit (long/double) ranges with precisionStep ≤6.
- * Otherwise it uses {@linkplain
- * MultiTermQuery#CONSTANT_SCORE_FILTER_REWRITE} as the
- * number of terms is likely to be high. With precision
- * steps of ≤4, this query can be run with one of the
- * BooleanQuery rewrite methods without changing
+ * MultiTermQuery#CONSTANT_SCORE_AUTO_REWRITE_DEFAULT}.
+ * With precision steps of ≤4, this query can be run with
+ * one of the BooleanQuery rewrite methods without changing
  * BooleanQuery's default max clause count.
  *

How it works
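The first hunk keeps the claim that precision steps of ≤4 let the query run with a BooleanQuery rewrite method without raising the max clause count. As a rough sanity check, the sketch below evaluates the maxQueryTerms bound that the Javadoc gives in the next hunk, using indexedTermsPerValue = ceil(bitsPerValue / precisionStep) (which, per the removed text, slightly overestimates when the division is not exact), and compares it with 1024, assumed here to be BooleanQuery's default max clause count. The class and method names are hypothetical; this is plain Java, not Lucene API.

```java
// Hypothetical helper, not part of Lucene: compares the Javadoc's worst-case
// term count with an assumed default BooleanQuery clause limit of 1024.
final class ClauseLimitCheck {
  // Assumption: BooleanQuery's default max clause count.
  static final int DEFAULT_MAX_CLAUSE_COUNT = 1024;

  // ceil(bitsPerValue / precisionStep) without integer overflow.
  static int indexedTermsPerValue(int bitsPerValue, int precisionStep) {
    return bitsPerValue / precisionStep + (bitsPerValue % precisionStep == 0 ? 0 : 1);
  }

  // maxQueryTerms = [(indexedTermsPerValue - 1) * (2^precisionStep - 1) * 2] + (2^precisionStep - 1)
  static long maxQueryTerms(int bitsPerValue, int precisionStep) {
    if (precisionStep < 1 || precisionStep > 32) {
      throw new IllegalArgumentException("precisionStep must be in 1..32 for this sketch");
    }
    long perLevel = (1L << precisionStep) - 1;
    long levels = indexedTermsPerValue(bitsPerValue, precisionStep);
    return (levels - 1) * perLevel * 2 + perLevel;
  }

  public static void main(String[] args) {
    for (int bits : new int[] {32, 64}) {
      for (int step = 1; step <= 8; step++) {
        long n = maxQueryTerms(bits, step);
        System.out.printf("bits=%d step=%d maxQueryTerms=%d fitsDefault=%b%n",
            bits, step, n, n <= DEFAULT_MAX_CLAUSE_COUNT);
      }
    }
  }
}
```

For steps 1 through 4 the bound stays at or below 465 for both 32 and 64 bit values, well under 1024, which is consistent with the Javadoc; larger steps eventually exceed the limit (for example 1323 for a long with a step of 6).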

@@ -117,17 +112,29 @@ import org.apache.lucene.index.Term; // for javadocs
  *
  *

Precision Step

*

You can choose any precisionStep when encoding values.
- * Lower step values mean more precisions and so more terms in index (and index gets larger).
- * On the other hand, the maximum number of terms to match reduces, which optimized query speed.
- * The formula to calculate the maximum term count is:
- *

- *  n = [ (bitsPerValue/precisionStep - 1) * (2^precisionStep - 1 ) * 2 ] + (2^precisionStep - 1 )
- * 
- *

(this formula is only correct, when bitsPerValue/precisionStep is an integer;
- * in other cases, the value must be rounded up and the last summand must contain the modulo of the division as
- * precision step).
- * For longs stored using a precision step of 4, n = 15*15*2 + 15 = 465, and for a precision
- * step of 2, n = 31*3*2 + 3 = 189. But the faster search speed is reduced by more seeking
+ * Lower step values mean more precision and therefore more terms in the index (the index gets larger). The number
+ * of indexed terms per value (these terms are generated by {@link NumericTokenStream}) is:
+ *

+ *   indexedTermsPerValue = ceil(bitsPerValue / precisionStep)
+ *

+ * As the lower precision terms are shared by many values, the additional terms only
+ * slightly grow the term dictionary (approx. 7% for precisionStep=4), but have a larger
+ * impact on the postings (the postings file will have more entries, as every document is linked to
+ * indexedTermsPerValue terms instead of one). The formula to estimate the growth
+ * of the term dictionary in comparison to one term per value:
+ *

+ *
+ *   \mathrm{termDictOverhead} = \sum\limits_{i=0}^{\mathrm{indexedTermsPerValue}-1} \frac{1}{2^{\mathrm{precisionStep}\cdot i}}

+ *

On the other hand, if the precisionStep is smaller, the maximum number of terms to match reduces,
+ * which optimizes query speed. The formula to calculate the maximum number of terms that will be visited while
+ * executing the query is:
+ *

+ *
+ *   \mathrm{maxQueryTerms} = \left[ \left( \mathrm{indexedTermsPerValue} - 1 \right) \cdot \left( 2^{\mathrm{precisionStep}} - 1 \right) \cdot 2 \right] + \left( 2^{\mathrm{precisionStep}} - 1 \right)

+ *

For longs stored using a precision step of 4, maxQueryTerms = 15*15*2 + 15 = 465, and for a precision
+ * step of 2, maxQueryTerms = 31*3*2 + 3 = 189. But the faster search speed is reduced by more seeking
  * in the term enum of the index. Because of this, the ideal precisionStep value can only
  * be found out by testing. Important: You can index with a lower precision step value and test search speed
  * using a multiple of the original step value.
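The three formulas in this hunk can be evaluated directly. The sketch below is plain Java with hypothetical names (it does not call Lucene): it computes indexedTermsPerValue, the termDictOverhead sum and maxQueryTerms, and reproduces the worked numbers for longs (465 terms for a precision step of 4, 189 for a step of 2) as well as the roughly 7% term dictionary growth quoted for precisionStep=4. It assumes 1 ≤ precisionStep ≤ 32 so that 2^precisionStep fits comfortably in a long.

```java
import java.util.Locale;

// Hypothetical calculator for the Javadoc formulas; not Lucene API.
final class PrecisionStepMath {
  private static void requireStep(int precisionStep) {
    if (precisionStep < 1 || precisionStep > 32) {
      throw new IllegalArgumentException("precisionStep must be in 1..32 for this sketch");
    }
  }

  // indexedTermsPerValue = ceil(bitsPerValue / precisionStep)
  static int indexedTermsPerValue(int bitsPerValue, int precisionStep) {
    requireStep(precisionStep);
    return bitsPerValue / precisionStep + (bitsPerValue % precisionStep == 0 ? 0 : 1);
  }

  // termDictOverhead = sum over i in [0, indexedTermsPerValue) of 1 / 2^(precisionStep * i)
  static double termDictOverhead(int bitsPerValue, int precisionStep) {
    int n = indexedTermsPerValue(bitsPerValue, precisionStep);
    double sum = 0.0;
    for (int i = 0; i < n; i++) {
      sum += Math.pow(2.0, -(double) precisionStep * i);
    }
    return sum;
  }

  // maxQueryTerms = [(indexedTermsPerValue - 1) * (2^precisionStep - 1) * 2] + (2^precisionStep - 1)
  static long maxQueryTerms(int bitsPerValue, int precisionStep) {
    long n = indexedTermsPerValue(bitsPerValue, precisionStep);
    long perLevel = (1L << precisionStep) - 1;
    return (n - 1) * perLevel * 2 + perLevel;
  }

  public static void main(String[] args) {
    System.out.println(indexedTermsPerValue(64, 4)); // 16
    System.out.println(maxQueryTerms(64, 4));        // 465
    System.out.println(maxQueryTerms(64, 2));        // 189
    System.out.println(String.format(Locale.ROOT, "%.4f", termDictOverhead(64, 4))); // 1.0667
  }
}
```

termDictOverhead(64, 4) comes out at about 1.0667, i.e. a term dictionary roughly 6.7% larger than with one term per value, which matches the "approx. 7%" in the text.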

@@ -143,7 +150,7 @@ import org.apache.lucene.index.Term; // for javadocs * per value in the index and querying is as slow as a conventional {@link TermRangeQuery}. But it can be used * to produce fields, that are solely used for sorting (in this case simply use {@link Integer#MAX_VALUE} as * precisionStep). Using {@link IntField}, - * {@link LongField}, {@link FloatField} or {@link DoubleField} for sorting + * {@link LongField}, {@link FloatField} or {@link DoubleField} for sorting * is ideal, because building the field cache is much faster than with text-only numbers. * These fields have one term per value and therefore also work with term enumeration for building distinct lists * (e.g. facets / preselected values to search for). diff --git a/lucene/core/src/java/org/apache/lucene/search/doc-files/nrq-formula-1.png b/lucene/core/src/java/org/apache/lucene/search/doc-files/nrq-formula-1.png new file mode 100644 index 0000000000000000000000000000000000000000..fd7d936d8796bf1b2b295968ae7606d6edf8351c GIT binary patch literal 3171 zcmV-p44m_cP)_D~ zUx}h@3ZRaurTm&+s8c}4WhbyL^%{)LbDg+W#8^_m25D_2S# ztD$kDV6)kHp4YbRJ~xJ8nCH2BB~87%xN`UU z5gEw8SS;4-^*&=^V~jDTu4|l@Wf_K{=Xv|IauhzI&TD{ua92%LRjbtsZ;ml0Ns=^8 zyRL(N!Eu&lC~U38LDzL!YrNhVRBEk_F=?8{alBkE!!T4zbzNtS@jTC3o2IGn`%Tki zS?2qG*LA3Bn-)W1_jX7K!OF7SY&KeJ3~%ufS|rJZ5L)XX2v9>D$Ac!c)>6ta3@5we zHY9|I<2cJQYR+1V@=20-p4W9<5CnrJthL6NEX(@7$Fs$A9`x%Y^f>d2Q<;v;nc#j0 zgb;%PV;F|(^%_^&wxx@k%?204FvLM!*I^i@X-YLvv9@jFIK~zFGzfwnl1$!?GHu&Z z>p}<|k0nPFMNyyz`Y4WLrIb<%A8qRnYNqRb-=}FRrR@8Dy$+A-4Ibn8m=4DM zQd`*ZaF{(|yZ730$3ElZ@t6)5@k#@jOF{_PIP3MA3R5xP_v1KrYg?^WRC3(<pI z^0zfAb7qL0riW|6VX)S}HN?mcs_VKaiagKB+Hx%;RN`LoeSe1}ledq{h>?Bd9+MG7 z)}j!Cs&X3|cW3g|P)bQD-5OAgY+Fo3jD?{B+xqoe_*FfO|2#fE{%xJgoDEe~Jv}|W zyu85Q^!@vH9A{bf{QUgyzyCfwJUl%;RaI40Raurle*Cz-y?uOqM1iWRK7INW1cBE2 z>({T3kB`3ZfBW|B`T02rf-hgb;JB))AP8=6Z}0E#>B^5EKYsrF31h0NDr3y^^YhEg zOI20A?|Yu-77l{o<>dv9-rn8@K_G}3!^N{#k)CmCz!tkD>&xDkW$h@0*cH1E%FuI7)8_7KJp>tFQ72vWJTK4lvMkTakY)~soN+uc%8dEHg0t3AX6n-t$8lMf 
zd+zfD+_c&RdZd&ZC(idgPbrmU**WNR=ADo;t}IK|am;{!#H&-xS_`|zHEob<9~r5j zL@*h=SS*lfYq?w^X=)Tji^T$I_^Yad({XURk3eu>!LdSZ(Wgz*L{Ws=%ChXb4ox6; zD};yIGRBDQduvi6e`Cz&&!62*;j%1qPxpKRRCCvLcp!r=Bk{j6rYy@ONoZ37dPI<) zt`A!OQ>A}ZkOQtPOP)Gl-avy@G4db9aZK4+duQh~ID=4FO6ev+rM=XU!gO%zX0o63 z!BIVw8sBw1)phN89wlGJ-XhzChYH&%dk0;19e;VAQwOD# z)FbygT6cYgJDym-J+J?^9RKrU{FP>ytqKmWy}M~>pwuj zVK73)-2F}7qu~0!4}ySnWuE6yc1L*vkP1~%dF1}asoV5GoA3XV!1R4j&N%l=!dy99 zw+)1;kB!H1rRzF(lr*{B*}&noZR5xtV2n1Im$@uUH;7~qP%Bwbu3eVPm?wt}*zBGy#rE8t##8hp63n9AMpveRrRk!Doh#Y zc}`L>2U8Tq6)vP7Y);0wOe;8J3>3s*gYggAZ`=iYhEFOuDdk|^bJ9S=4#bEV&ay0y z!C9AJ|)DnQ=%Z4sy9ts`vr>7+*qO2>`M-7(4F&IXKy2jj@$wr#Nq`Jic3 z?%sRcCoq_=NYk`!+rj(>JrlGtm}J09J8qOA1UulExBRl;ilRu81SS&Rv@}hPF;Nte z5uubqz=RM&YfanMVp}=!8k`|sM_ZGpzV9RSLA-E|gLlzlXTVvrt zp67_zaF=SOX*#wLgCGdQaPZ52tS5l!SFLrDB#=Z(iQ#ab=UJ9P9;H;%G)a=63%DiA zvLFbowYV{gqBKp}H#a7VBFt!?T96r%Wm(fSlg~DaqVWpL-ME3P7=#^bY=L*vm6OvY zF>?SRzO3Co2n3u?{;cAs_m8F%@e_{P5Pl2&sI@-27ljoZbMbI~H}SeTDJJ@{1pCwy zZ6%5#*8`5BTi11;KYvC9)e+emCc-cm0fIsf&~9@IC8??kt`Z?c(=@Bqs_QxgKWVMu z53$xltzan*CgPGLsq312IGEpahmh9VqA1`h>HD6%OwdYjH{qkfib#zz4cUfx(Ys}F)qEi^Ztp; zUFI|$0cS3q#&PWXK91=fM47dIH*TTnWibq)8rUFa_d30 zgSH0RrmpKe&)pS1ZUL&TZCje18}|isgJ_ZDci|x_bHOYYi^GCKFe5P=3vRhwLiOO3 z3@#zWV)5%8r{lpzcZwiQQ}7s)7tVTH*td^EUg+!h@SZ0F^2iIFJ^z&3QHqN%*e zzEG?fCi;e6yq;k$6LDxz$m|1d1{NGlt$_tcg~KqU&@0%&t_A0=hU{MhnEJkN+jdZ! 
zYr)lZJ#I@lYcAc$twBjN;nqgNO!md?^=7l7g^z%JaLh&Wr$Hf0YeYYgpTuLd5qjIHk|5<~+xOkptj2T4!^AMG}JSbE7!OVEI19Svwjmh!mg>pb& z*n(p&l6R`7xoS6fauikT&>;FznxHT3p0`j{EO3XZu{u<@>V4Tay9v>oyK zlgk*H*~60z%%wt0tZ{njv8>ETz&~Zo1+!YM_S{qI=H`Yr_i>MzdBYYQbE#;pu|)=U zx7a;Il~Q7rPgPZOU8lv2VGEAAR7j)kGdSfik?VvN9K$fP<$o41%Vd=1400009a7bBm000o` z000o`0fN&Cm;e9(9dt!lbW?9;ba!ELWdKJ|Odv;SZ*z2WV{&P5bRak`HZ$}pI9UJy z4fjbzK~#90?Oh>&tGE`;{Qu39j2y|xN=B|^WMwK>GIG~}+~=r#&lS0KV8?;216xNd z4s0D?oIm;51X2o=_RZudAR)Qr-Xu5o-h_v?ZHa+_fq{X6fx$HV>(&GqToDf+*laep zrp3VE*G5qkkB^U+CiS6dn&1O3FE4lALz5)AQpvyc81?x0SQLf&e1tsUt+%%~h{W^r z^Be?T3p#md4L&a@rMD)*z~H)w<9M;U@;o1r)Cte?%Cf|DS(d!3?smJ; z)o9^Pk@(Nz#M}$AEX%S?lfiXe*L4@8XGtVU5{BWesThf`%(%}1%d%Y84T4~z5MuT{ zJAYtMPLjlN9NV@TW4qn1X__dC7-P0=CrMJ4CG>2|vKV7w7}~bYpCn0=rfHUCZ*OmH z+Y&;8Ai$i#81sEUiXxs=6a@%tn#S`y$8nGvhGEk*s4)yf-}g704M?JtLSmC7$+9d6 z0zwG#2SH$27Wt6sD2jq0pp=46-}gsMUE@G0-E1~-94}MdI1HpP2!bezT-W9DU|CiW z1fo2sjW5XqW9;?y6->vhZ`*d3WnmbK@*pYDzV8=B@%sAe`#vFr%L5_A^Sqar7f?kh z1#M9jffCR2M45z?LRxMj4{;nnJUnzHENCqGAJR1SJkJ-9rs?uz_`V;8VO`g4+tzgr zW+SOAOHjBxRr8|LHYGJUrPOhp5rt!2wE*v=itdeZyWI|>gX_9nq&?3QE!!8BhB?x4 z9LWZrrm1b)mSt%^aJSoGavJj`S3q6YaU6FF*z}ICer6oUn5`+Lp65YGq|9lWatkvd zq^@hQ$#q>^)OGE79-c@P@mWfoC~ zvouY`#=%g-Z$%pTwe*l{giz*GviXxd&#@v@Re>QDwMqzxZQD}LvMfcCI(3eD8C{ko z^t`@)tX31@vywrwv%$&x6FVlijimGBi9=Xu_CyG@dWN9~ahx9bh5njMbg zAjDzByGfF$UUMcz*ffC+uIwMj@TMrR2;|X%cwMtWAQ~%2*GYn9v2DE z^MW9Nq^VM>e#pX*<6)s$_`Yx3w(Gia9Pf5JxYP1HhpSKZuI+X^?A7sopQ~UzHU#zt zba>Zwp%}v7+_xyn^PIoP%MocKf?5ZwBlnxOA37Z&1cceP9mg^3c!Us2DKxt@O%WQV zI=M;LDzmO@*q7os=B~KiZs#}-z6)_x;=?@8<2dFmCf)L4(W+^hqL9uunGUsN$sGvA zmbY!+SMdCKo_n4L;=wX0RpN_YB*a$`k-*n&`iew!GOn}O*%>F+tCAr)J~{=7H-1@` zh?5ZYeKp2KET23F zuu`0L${2Z=QbnIUU`YQ*W`2c1je8#uRK*#9d79_sHkq<4MGI|R*L-B;@gu<5s4*f~ ziKSskyv_66wryUcXze6~phi)=5H!b&>KivZXC^Uh3Zt{wR+u3~n$;HLWQYwYxyk$b zl0N{QFKSA*EK7Q#HgzB0(ehjy1FuC(U?*QRA|buPwJUu^5>&?Z72D3uIMG!{wBeZ= zLE_z28WYi1MHO89OlbUSfvvXhw1d0SjLq)LgH0rBU3wW~7|R*<-blFO|fVvk#Va^s#J4=v9ioB*$?y8QuI}7Fa#H=6ml$u z6w1&U*?szG6mT$erzy62^Hh}{N@)-T 
z@b_GF3JSbmj4`o%(VyMv*dEe=zVE-jzHYZ$>_3Led{^isJ4q4^sT#iYiG=hDfNlL# z+N7^oeptoAj2kCNgb+$;6h&~LG)*IB%kw;nBJQ&vLPjTHnyw_`_4QR>6t=yMBV%@8 zA3qGJx2EMbxLbZTO|#i-U>DTq#Hk_T6ULYrYCwc&gSjgGtSE{?YC08A{ikD0cxDtu zQ4}FAp)&(^&u_O|k>!+9#I|%!>KNAso=}8%2wcP3tj|m-MZJ9rjRqn~?M%N|gS8n) zHYG_?6vbTZye%|*=@bd+6*6B7U#ZwBj2l4@VwRI6!Ra?*(%ah`Ap~Ym{lQ;)4D3qF z%O^}bBLICOj^i(@Pmo*i3(#{7tbz0KY*AuhGEfx7h_lfys`Rr0TU(c@Ym7<}1Oa>> z;zJ%djn!D5=g-g2;yH$n<9MFO?F1MH8Dn@0CxQ(zAwV+_pPY?^cU^b0*&yC+_-8@~ z5(^)0O3?!|D;mf5{g;=Q&TMD(N-|ha=a@pg+Vc8T-gJE;p}vA!eMKYaT$phxJ>vw) zvMj`eBaECs#|e5k=1TamQU_YQG+9 zI;pyLr3R|uf9gLzk|Yr~=b%etrznnv7so6pBK*X1*L7jqjiM;aGI3!cjceitA>a40 zK*TK$*3}b#X&nD7a0}vaG&QH5`CTtA_h&9NE6_!*zvQ$xhzK z7mY}SVJN;rK6XRtD>{q5Vy9yu<0MQGBuZ)LuMS0UhIY1+)Rjc;kLXJCVzf#DsQU)d z$BUwXC5a!KZ7>U_{C{M#*@&!(qG)#3sN^kBsey!l|Mk~jgpeQzvMf6ukEhcK$M0)y zUDr*Lu_t)H-)C9I7~^yKdc9_h9goMKKYt#N z$NhevB+2{x``>^6y1?BJQ=}A-3S;qAhdnLx91jrs%`KtM$Q;;ymq42)R z)wb>ZelISw2q8~TPft%z@9*zu2dEu~!{LuV{*VYpYkd3mjc3Lm3XaF4<2b9;N@W>3 z;4HugCP~5=LwA-MCyJugYBg?@KOT?A<53jvAP9c`{J9r!I2?Zb_%Y)beQt<-3Ii$m zr%L+$`}fsqHNr{}1OfUZXU+TjyTqEafo}W#{`cR1*JM!T;cTS_dSL3u2z)#q)nN); zTdh|Ae{}U|(Q^<_k-Ofm9w}h{i{RR)XBrqxfpj+iMHqdOqrqs5K9GB0;ra}h9>{I5 z0Ms_Wi=UfA9i9?pb{0~={FlWLL%M;%0?4uqma1{DV*}T`BpK+ju?rDQ9M3Al2368^^BMjPGMj$7osSd7$}F(oVn;R(BuC9Y#zt4 zQ7M*0r;2%dl>$GA%d!k7?aWv*P6~o+a9(BQmj?b&aBC_IzBGa$==|N!MZj=oW|75k zLleb7P)d!KxePX&P205@{-N`!!RNrk!vikh1m({W0&O-^8QcU2ah=O>cw1}y7s2&R z6n$gVXh6j@B5_sVP=?M%HY$aIfq{X6fx!&8z2n2cz`(%3z~B?$zhZU&O1jG&CjbBd M07*qoM6N<$g2-|NKL7v# literal 0 HcmV?d00001
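Two remarks in this patch, that an index built with a lower precision step can be searched using a multiple of that step, and that precisionStep = Integer.MAX_VALUE gives one term per value for sort-only fields, both follow from which shifts get indexed. The sketch below assumes that a value is indexed at shifts 0, precisionStep, 2 * precisionStep, and so on while the shift is below bitsPerValue (how NumericTokenStream generates its terms), and that a query with step q only needs terms at shifts that are multiples of q. It is plain Java with hypothetical names, not Lucene API.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical illustration of precision-step compatibility; not Lucene API.
final class ShiftCompatibility {
  // Shifts at which one value is indexed: 0, step, 2*step, ... while shift < bitsPerValue.
  static List<Integer> indexedShifts(int bitsPerValue, int precisionStep) {
    if (precisionStep < 1) {
      throw new IllegalArgumentException("precisionStep must be >= 1");
    }
    List<Integer> shifts = new ArrayList<>();
    // A long counter keeps shift + precisionStep from overflowing for huge steps.
    for (long shift = 0; shift < bitsPerValue; shift += precisionStep) {
      shifts.add((int) shift);
    }
    return shifts;
  }

  // Querying with queryStep works if every shift it needs was also indexed with indexStep.
  static boolean canQueryWith(int bitsPerValue, int indexStep, int queryStep) {
    return indexedShifts(bitsPerValue, indexStep)
        .containsAll(indexedShifts(bitsPerValue, queryStep));
  }

  public static void main(String[] args) {
    System.out.println(indexedShifts(64, 16));                        // [0, 16, 32, 48]
    System.out.println(canQueryWith(64, 4, 8));                       // true
    System.out.println(canQueryWith(64, 4, 6));                       // false
    System.out.println(indexedShifts(64, Integer.MAX_VALUE).size());  // 1
  }
}
```

A step of 8 works on an index built with step 4 because every multiple of 8 is also a multiple of 4, while a step of 6 needs shift 6, which was never indexed. With Integer.MAX_VALUE only shift 0 exists, so each value produces a single term.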