versioned site updates/fixes for 2.9
git-svn-id: https://svn.apache.org/repos/asf/lucene/java/trunk@808056 13f79535-47bb-0310-9956-ffa450edef68
|
@ -1,61 +0,0 @@
|
|||
<benchmark>
|
||||
<ul>
|
||||
<p>
|
||||
<b>Hardware Environment</b><br/>
|
||||
<li><i>Dedicated machine for indexing</i>: Self-explanatory
|
||||
(yes/no)</li>
|
||||
<li><i>CPU</i>: Self-explanatory (Type, Speed and Quantity)</li>
|
||||
<li><i>RAM</i>: Self-explanatory</li>
|
||||
<li><i>Drive configuration</i>: Self-explanatory (IDE, SCSI, RAID-1,
|
||||
RAID-5)</li>
|
||||
</p>
|
||||
<p>
|
||||
<b>Software environment</b><br/>
|
||||
<li><i>Lucene Version</i>: Self-explanatory</li>
|
||||
<li><i>Java Version</i>: Version of Java SDK/JRE that is run </li>
|
||||
<li><i>Java VM</i>: Server/client VM, Sun VM/JRockIt</li>
|
||||
<li><i>OS Version</i>: Self-explanatory</li>
|
||||
<li><i>Location of index</i>: Is the index stored in filesystem or
|
||||
database? Is it on the same server (local) or
|
||||
over the network?</li>
|
||||
</p>
|
||||
<p>
|
||||
<b>Lucene indexing variables</b><br/>
|
||||
<li><i>Number of source documents</i>: Number of documents being
|
||||
indexed</li>
|
||||
<li><i>Total filesize of source documents</i>: Self-explanatory</li>
|
||||
<li><i>Average filesize of source documents</i>:
|
||||
Self-explanatory</li>
|
||||
<li><i>Source documents storage location</i>: Where are the documents
|
||||
being indexed located?
|
||||
Filesystem, DB, HTTP, etc.</li>
|
||||
<li><i>File type of source documents</i>: Types of files being
|
||||
indexed, e.g. HTML files, XML files, PDF files, etc.</li>
|
||||
<li><i>Parser(s) used, if any</i>: Parsers used for parsing the
|
||||
various files for indexing,
|
||||
e.g. XML parser, HTML parser, etc.</li>
|
||||
<li><i>Analyzer(s) used</i>: Type of Lucene analyzer used</li>
|
||||
<li><i>Number of fields per document</i>: Number of Fields each
|
||||
Document contains</li>
|
||||
<li><i>Type of fields</i>: Type of each field</li>
|
||||
<li><i>Index persistence</i>: Where the index is stored, e.g.
|
||||
FSDirectory, SqlDirectory, etc</li>
|
||||
</p>
|
||||
<p>
|
||||
<b>Figures</b><br/>
|
||||
<li><i>Time taken (in ms/s as an average of at least 3 indexing
|
||||
runs)</i>: Time taken to index all files</li>
|
||||
<li><i>Time taken / 1000 docs indexed</i>: Time taken to index 1000
|
||||
files</li>
|
||||
<li><i>Memory consumption</i>: Self-explanatory</li>
|
||||
<li><i>Query speed</i>: average time a query takes, type
|
||||
of queries (e.g. simple one-term query, phrase query),
|
||||
not measuring any overhead outside Lucene</li>
|
||||
</p>
|
||||
<p>
|
||||
<b>Notes</b><br/>
|
||||
<li><i>Notes</i>: Any comments which don't belong in the above,
|
||||
special tuning/strategies, etc</li>
|
||||
</p>
|
||||
</ul>
|
||||
</benchmark>
|
|
@ -202,9 +202,6 @@ document.write("Last Published: " + document.lastModified);
|
|||
</div>
|
||||
</div>
|
||||
</div>
|
||||
<div class="menuitem">
|
||||
<a href="benchmarks.html">Benchmarks</a>
|
||||
</div>
|
||||
<div class="menupage">
|
||||
<div class="menupagetitle">Contributions</div>
|
||||
</div>
|
||||
|
|
|
@ -203,9 +203,6 @@ document.write("Last Published: " + document.lastModified);
|
|||
</div>
|
||||
</div>
|
||||
<div class="menuitem">
|
||||
<a href="benchmarks.html">Benchmarks</a>
|
||||
</div>
|
||||
<div class="menuitem">
|
||||
<a href="contributions.html">Contributions</a>
|
||||
</div>
|
||||
<div class="menuitem">
|
||||
|
|
|
@ -203,9 +203,6 @@ document.write("Last Published: " + document.lastModified);
|
|||
</div>
|
||||
</div>
|
||||
<div class="menuitem">
|
||||
<a href="benchmarks.html">Benchmarks</a>
|
||||
</div>
|
||||
<div class="menuitem">
|
||||
<a href="contributions.html">Contributions</a>
|
||||
</div>
|
||||
<div class="menuitem">
|
||||
|
@ -327,11 +324,11 @@ instance of <span class="codefrag">org.apache.lucene.analysis.Analyzer</span>.
|
|||
The particular <span class="codefrag">Analyzer</span> we
|
||||
are using, <span class="codefrag">StandardAnalyzer</span>, is
|
||||
little more than a standard Java Tokenizer, converting all strings to lowercase and filtering out
|
||||
useless words and characters from the index. By useless words and characters I mean common language
|
||||
words such as articles (a, an, the, etc.) and other strings that would be useless for searching
|
||||
stop words and characters from the index. By stop words and characters I mean common language
|
||||
words such as articles (a, an, the, etc.) and other strings that may have less value for searching
|
||||
(e.g. <b>'s</b>). It should be noted that there are different rules for every language, and you
|
||||
should use the proper analyzer for each. Lucene currently provides Analyzers for a number of
|
||||
different languages (see the <span class="codefrag">*Analyzer.java</span> sources under <a href="http://svn.apache.org/repos/asf/lucene/java/trunk/contrib/analyzers/src/java/org/apache/lucene/analysis/">contrib/analyzers/src/java/org/apache/lucene/analysis</a>).
|
||||
different languages (see the <span class="codefrag">*Analyzer.java</span> sources under <a href="http://svn.apache.org/repos/asf/lucene/java/trunk/contrib/analyzers/common/src/java/org/apache/lucene/analysis/">contrib/analyzers/src/java/org/apache/lucene/analysis</a>).
|
||||
</p>
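<p>
To make the lowercasing and stop-word filtering described above concrete, here is a minimal sketch in plain Java. It is not the <span class="codefrag">StandardAnalyzer</span> implementation: the class name <span class="codefrag">StopWordSketch</span>, the tiny stop list and the regular-expression tokenizer are all assumptions made for illustration only.
</p>

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;
import java.util.Set;

// Illustrative only: NOT Lucene's StandardAnalyzer, just the lowercase + stop-word idea.
class StopWordSketch {
    // Hypothetical stop list chosen for this example; a real analyzer ships its own.
    static final Set<String> STOP_WORDS = Set.of("a", "an", "the", "s");

    static List<String> analyze(String text) {
        List<String> tokens = new ArrayList<>();
        // Split on anything that is not a letter or digit (a crude stand-in for a real tokenizer).
        for (String raw : text.split("[^\\p{L}\\p{N}]+")) {
            String token = raw.toLowerCase(Locale.ROOT);
            // Skip empty fragments and anything on the stop list.
            if (!token.isEmpty() && !STOP_WORDS.contains(token)) {
                tokens.add(token);
            }
        }
        return tokens;
    }

    public static void main(String[] args) {
        System.out.println(analyze("The Analyzer's job: drop a stop word"));
    }
}
```

<p>
In this sketch the possessive <b>'s</b> disappears because the apostrophe splits it into a separate token that is on the stop list. A real analyzer may handle it differently.
</p>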
|
||||
<p>
|
||||
Looking further down in the file, you should see the <span class="codefrag">indexDocs()</span> code. This recursive
|
||||
|
|
|
@ -80,10 +80,10 @@ endobj
|
|||
>>
|
||||
endobj
|
||||
18 0 obj
|
||||
<< /Length 2722 /Filter [ /ASCII85Decode /FlateDecode ]
|
||||
<< /Length 2732 /Filter [ /ASCII85Decode /FlateDecode ]
|
||||
>>
|
||||
stream
|
||||
|
||||
endstream
|
||||
endobj
|
||||
19 0 obj
|
||||
|
@ -244,39 +244,39 @@ endobj
|
|||
xref
|
||||
0 34
|
||||
0000000000 65535 f
|
||||
0000008694 00000 n
|
||||
0000008766 00000 n
|
||||
0000008858 00000 n
|
||||
0000008704 00000 n
|
||||
0000008776 00000 n
|
||||
0000008868 00000 n
|
||||
0000000015 00000 n
|
||||
0000000071 00000 n
|
||||
0000000777 00000 n
|
||||
0000000897 00000 n
|
||||
0000000950 00000 n
|
||||
0000008992 00000 n
|
||||
0000009002 00000 n
|
||||
0000001085 00000 n
|
||||
0000009055 00000 n
|
||||
0000009065 00000 n
|
||||
0000001221 00000 n
|
||||
0000009121 00000 n
|
||||
0000009131 00000 n
|
||||
0000001358 00000 n
|
||||
0000009187 00000 n
|
||||
0000009197 00000 n
|
||||
0000001495 00000 n
|
||||
0000009251 00000 n
|
||||
0000009261 00000 n
|
||||
0000001632 00000 n
|
||||
0000004447 00000 n
|
||||
0000004555 00000 n
|
||||
0000006960 00000 n
|
||||
0000009317 00000 n
|
||||
0000007068 00000 n
|
||||
0000007241 00000 n
|
||||
0000007476 00000 n
|
||||
0000007642 00000 n
|
||||
0000007837 00000 n
|
||||
0000008032 00000 n
|
||||
0000008145 00000 n
|
||||
0000008255 00000 n
|
||||
0000008363 00000 n
|
||||
0000008469 00000 n
|
||||
0000008585 00000 n
|
||||
0000004457 00000 n
|
||||
0000004565 00000 n
|
||||
0000006970 00000 n
|
||||
0000009327 00000 n
|
||||
0000007078 00000 n
|
||||
0000007251 00000 n
|
||||
0000007486 00000 n
|
||||
0000007652 00000 n
|
||||
0000007847 00000 n
|
||||
0000008042 00000 n
|
||||
0000008155 00000 n
|
||||
0000008265 00000 n
|
||||
0000008373 00000 n
|
||||
0000008479 00000 n
|
||||
0000008595 00000 n
|
||||
trailer
|
||||
<<
|
||||
/Size 34
|
||||
|
@ -284,5 +284,5 @@ trailer
|
|||
/Info 4 0 R
|
||||
>>
|
||||
startxref
|
||||
9368
|
||||
9378
|
||||
%%EOF
|
||||
|
|
|
@ -203,9 +203,6 @@ document.write("Last Published: " + document.lastModified);
|
|||
</div>
|
||||
</div>
|
||||
<div class="menuitem">
|
||||
<a href="benchmarks.html">Benchmarks</a>
|
||||
</div>
|
||||
<div class="menuitem">
|
||||
<a href="contributions.html">Contributions</a>
|
||||
</div>
|
||||
<div class="menuitem">
|
||||
|
|
|
@ -203,9 +203,6 @@ document.write("Last Published: " + document.lastModified);
|
|||
</div>
|
||||
</div>
|
||||
<div class="menuitem">
|
||||
<a href="benchmarks.html">Benchmarks</a>
|
||||
</div>
|
||||
<div class="menuitem">
|
||||
<a href="contributions.html">Contributions</a>
|
||||
</div>
|
||||
<div class="menuitem">
|
||||
|
|
|
@ -203,9 +203,6 @@ document.write("Last Published: " + document.lastModified);
|
|||
</div>
|
||||
</div>
|
||||
<div class="menuitem">
|
||||
<a href="benchmarks.html">Benchmarks</a>
|
||||
</div>
|
||||
<div class="menuitem">
|
||||
<a href="contributions.html">Contributions</a>
|
||||
</div>
|
||||
<div class="menuitem">
|
||||
|
|
|
@ -203,9 +203,6 @@ document.write("Last Published: " + document.lastModified);
|
|||
</div>
|
||||
</div>
|
||||
<div class="menuitem">
|
||||
<a href="benchmarks.html">Benchmarks</a>
|
||||
</div>
|
||||
<div class="menuitem">
|
||||
<a href="contributions.html">Contributions</a>
|
||||
</div>
|
||||
<div class="menuitem">
|
||||
|
|
|
@ -201,9 +201,6 @@ document.write("Last Published: " + document.lastModified);
|
|||
</div>
|
||||
</div>
|
||||
<div class="menuitem">
|
||||
<a href="benchmarks.html">Benchmarks</a>
|
||||
</div>
|
||||
<div class="menuitem">
|
||||
<a href="contributions.html">Contributions</a>
|
||||
</div>
|
||||
<div class="menuitem">
|
||||
|
|
|
@ -201,9 +201,6 @@ document.write("Last Published: " + document.lastModified);
|
|||
</div>
|
||||
</div>
|
||||
<div class="menuitem">
|
||||
<a href="benchmarks.html">Benchmarks</a>
|
||||
</div>
|
||||
<div class="menuitem">
|
||||
<a href="contributions.html">Contributions</a>
|
||||
</div>
|
||||
<div class="menuitem">
|
||||
|
@ -459,12 +456,6 @@ document.write("Last Published: " + document.lastModified);
|
|||
</ul>
|
||||
</ul>
|
||||
|
||||
<ul>
|
||||
<li>
|
||||
<a href="benchmarks.html">Benchmarks</a> ___________________ <em>benchmarks</em>
|
||||
</li>
|
||||
</ul>
|
||||
|
||||
<ul>
|
||||
<li>
|
||||
<a href="contributions.html">Contributions</a> ___________________ <em>contributions</em>
|
||||
|
|
|
@ -20,10 +20,10 @@ endobj
|
|||
>>
|
||||
endobj
|
||||
7 0 obj
|
||||
<< /Length 1073 /Filter [ /ASCII85Decode /FlateDecode ]
|
||||
<< /Length 1045 /Filter [ /ASCII85Decode /FlateDecode ]
|
||||
>>
|
||||
stream
|
||||
|
||||
endstream
|
||||
endobj
|
||||
8 0 obj
|
||||
|
@ -87,19 +87,19 @@ endobj
|
|||
xref
|
||||
0 14
|
||||
0000000000 65535 f
|
||||
0000003189 00000 n
|
||||
0000003253 00000 n
|
||||
0000003303 00000 n
|
||||
0000003161 00000 n
|
||||
0000003225 00000 n
|
||||
0000003275 00000 n
|
||||
0000000015 00000 n
|
||||
0000000071 00000 n
|
||||
0000001255 00000 n
|
||||
0000001361 00000 n
|
||||
0000002526 00000 n
|
||||
0000002632 00000 n
|
||||
0000002744 00000 n
|
||||
0000002854 00000 n
|
||||
0000002965 00000 n
|
||||
0000003073 00000 n
|
||||
0000002498 00000 n
|
||||
0000002604 00000 n
|
||||
0000002716 00000 n
|
||||
0000002826 00000 n
|
||||
0000002937 00000 n
|
||||
0000003045 00000 n
|
||||
trailer
|
||||
<<
|
||||
/Size 14
|
||||
|
@ -107,5 +107,5 @@ trailer
|
|||
/Info 4 0 R
|
||||
>>
|
||||
startxref
|
||||
3425
|
||||
3397
|
||||
%%EOF
|
||||
|
|
|
@ -203,9 +203,6 @@ document.write("Last Published: " + document.lastModified);
|
|||
</div>
|
||||
</div>
|
||||
<div class="menuitem">
|
||||
<a href="../benchmarks.html">Benchmarks</a>
|
||||
</div>
|
||||
<div class="menuitem">
|
||||
<a href="../contributions.html">Contributions</a>
|
||||
</div>
|
||||
<div class="menuitem">
|
||||
|
|
|
@ -203,9 +203,6 @@ document.write("Last Published: " + document.lastModified);
|
|||
</div>
|
||||
</div>
|
||||
<div class="menuitem">
|
||||
<a href="benchmarks.html">Benchmarks</a>
|
||||
</div>
|
||||
<div class="menuitem">
|
||||
<a href="contributions.html">Contributions</a>
|
||||
</div>
|
||||
<div class="menuitem">
|
||||
|
|
|
@ -203,9 +203,6 @@ document.write("Last Published: " + document.lastModified);
|
|||
</div>
|
||||
</div>
|
||||
<div class="menuitem">
|
||||
<a href="benchmarks.html">Benchmarks</a>
|
||||
</div>
|
||||
<div class="menuitem">
|
||||
<a href="contributions.html">Contributions</a>
|
||||
</div>
|
||||
<div class="menuitem">
|
||||
|
@ -325,10 +322,10 @@ document.write("Last Published: " + document.lastModified);
|
|||
<a href="http://wiki.apache.org/lucene-java/InformationRetrieval">Lucene Wiki IR references</a>.
|
||||
</p>
|
||||
<p>The rest of this document will cover <a href="#Scoring">Scoring</a> basics and how to change your
|
||||
<a href="api/org/apache/lucene/search/Similarity.html">Similarity</a>. Next it will cover ways you can
|
||||
<a href="api/core/org/apache/lucene/search/Similarity.html">Similarity</a>. Next it will cover ways you can
|
||||
customize the Lucene internals in <a href="#Changing your Scoring -- Expert Level">Changing your Scoring
|
||||
-- Expert Level</a> which gives details on implementing your own
|
||||
<a href="api/org/apache/lucene/search/Query.html">Query</a> class and related functionality. Finally, we
|
||||
<a href="api/core/org/apache/lucene/search/Query.html">Query</a> class and related functionality. Finally, we
|
||||
will finish up with some reference material in the <a href="#Appendix">Appendix</a>.
|
||||
</p>
|
||||
</div>
|
||||
|
@ -342,21 +339,21 @@ document.write("Last Published: " + document.lastModified);
|
|||
and the Lucene
|
||||
<a href="fileformats.html">file formats</a>
|
||||
before continuing on with this section.) It is also assumed that readers know how to use the
|
||||
<a href="api/org/apache/lucene/search/Searcher.html#explain(Query query, int doc)">Searcher.explain(Query query, int doc)</a> functionality,
|
||||
<a href="api/core/org/apache/lucene/search/Searcher.html#explain(Query query, int doc)">Searcher.explain(Query query, int doc)</a> functionality,
|
||||
which can go a long way in informing why a score is returned.
|
||||
</p>
|
||||
<a name="N10059"></a><a name="Fields and Documents"></a>
|
||||
<h3 class="boxed">Fields and Documents</h3>
|
||||
<p>In Lucene, the objects we are scoring are
|
||||
<a href="api/org/apache/lucene/document/Document.html">Documents</a>. A Document is a collection
|
||||
<a href="api/core/org/apache/lucene/document/Document.html">Documents</a>. A Document is a collection
|
||||
of
|
||||
<a href="api/org/apache/lucene/document/Field.html">Fields</a>. Each Field has semantics about how
|
||||
<a href="api/core/org/apache/lucene/document/Field.html">Fields</a>. Each Field has semantics about how
|
||||
it is created and stored (i.e. tokenized, untokenized, raw data, compressed, etc.) It is important to
|
||||
note that Lucene scoring works on Fields and then combines the results to return Documents. This is
|
||||
important because two Documents with the exact same content, but one having the content in two Fields
|
||||
and the other in one Field will return different scores for the same query due to length normalization
|
||||
(assuming the
|
||||
<a href="api/org/apache/lucene/search/DefaultSimilarity.html">DefaultSimilarity</a>
|
||||
<a href="api/core/org/apache/lucene/search/DefaultSimilarity.html">DefaultSimilarity</a>
|
||||
on the Fields).
|
||||
</p>
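<p>
The effect of length normalization mentioned above can be sketched with a few lines of plain Java. This assumes the 1/sqrt(numTerms) formula documented for DefaultSimilarity's lengthNorm. The class name <span class="codefrag">LengthNormSketch</span> is hypothetical, and the sketch ignores boosts and the later one-byte encoding of the norm.
</p>

```java
// Illustrative only: reproduces the assumed 1/sqrt(numTerms) length norm, not Lucene code.
class LengthNormSketch {
    // Shorter fields get a larger norm, so a match in them counts for more.
    static float lengthNorm(int numTerms) {
        return (float) (1.0 / Math.sqrt(numTerms));
    }

    public static void main(String[] args) {
        // The same 8 terms of content: once in a single field, once split across two 4-term fields.
        System.out.println("one field of 8 terms:   " + lengthNorm(8));
        System.out.println("each field of 4 terms:  " + lengthNorm(4));
    }
}
```

<p>
Because the norm is computed per Field, a term matching in one of the 4-term Fields is weighted by a larger factor than the same term matching in the single 8-term Field, which is one reason the two Documents can score differently.
</p>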
|
||||
<a name="N1006E"></a><a name="Score Boosting"></a>
|
||||
|
@ -367,21 +364,21 @@ document.write("Last Published: " + document.lastModified);
|
|||
<li>
|
||||
<b>Document level boosting</b>
|
||||
- while indexing - by calling
|
||||
<a href="api/org/apache/lucene/document/Document.html#setBoost(float)">document.setBoost()</a>
|
||||
<a href="api/core/org/apache/lucene/document/Document.html#setBoost(float)">document.setBoost()</a>
|
||||
before a document is added to the index.
|
||||
</li>
|
||||
|
||||
<li>
|
||||
<b>Document's Field level boosting</b>
|
||||
- while indexing - by calling
|
||||
<a href="api/org/apache/lucene/document/Fieldable.html#setBoost(float)">field.setBoost()</a>
|
||||
<a href="api/core/org/apache/lucene/document/Fieldable.html#setBoost(float)">field.setBoost()</a>
|
||||
before adding a field to the document (and before adding the document to the index).
|
||||
</li>
|
||||
|
||||
<li>
|
||||
<b>Query level boosting</b>
|
||||
- during search, by setting a boost on a query clause, calling
|
||||
<a href="api/org/apache/lucene/search/Query.html#setBoost(float)">Query.setBoost()</a>.
|
||||
<a href="api/core/org/apache/lucene/search/Query.html#setBoost(float)">Query.setBoost()</a>.
|
||||
</li>
|
||||
|
||||
</ul>
|
||||
|
@ -402,66 +399,66 @@ document.write("Last Published: " + document.lastModified);
|
|||
<p>This composition of 1-byte representation of norms
|
||||
(that is, indexing time multiplication of field boosts & doc boost & field-length-norm)
|
||||
is nicely described in
|
||||
<a href="api/org/apache/lucene/document/Fieldable.html#setBoost(float)">Fieldable.setBoost()</a>.
|
||||
<a href="api/core/org/apache/lucene/document/Fieldable.html#setBoost(float)">Fieldable.setBoost()</a>.
|
||||
</p>
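<p>
As a rough sketch of the index-time multiplication described above (field boosts, document boost and field length norm), the following plain Java computes the float value before it is encoded into a single byte. The class and method names are hypothetical, and the 1/sqrt(numTerms) length norm is an assumption taken from the DefaultSimilarity documentation.
</p>

```java
// Illustrative only: the product that is later encoded into a one-byte norm.
class NormProductSketch {
    // docBoost: boost set on the document; fieldBoosts: boosts of all same-named fields;
    // numTerms: number of terms in the field (used for the assumed length norm).
    static float norm(float docBoost, float[] fieldBoosts, int numTerms) {
        float product = docBoost;
        for (float boost : fieldBoosts) {
            product *= boost;
        }
        float lengthNorm = (float) (1.0 / Math.sqrt(numTerms));
        return product * lengthNorm;
    }

    public static void main(String[] args) {
        // Document boost 2.0, one field boosted by 1.5, field length of 4 terms.
        System.out.println(norm(2.0f, new float[] {1.5f}, 4));
    }
}
```

<p>
The value computed here is only the unencoded float. The stored norm is a single byte, so the value read back at search time may differ from this product.
</p>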
|
||||
<p>Encoding and decoding of the resulting float norm in a single byte are done by the
|
||||
static methods of the class Similarity:
|
||||
<a href="api/org/apache/lucene/search/Similarity.html#encodeNorm(float)">encodeNorm()</a> and
|
||||
<a href="api/org/apache/lucene/search/Similarity.html#decodeNorm(byte)">decodeNorm()</a>.
|
||||
<a href="api/core/org/apache/lucene/search/Similarity.html#encodeNorm(float)">encodeNorm()</a> and
|
||||
<a href="api/core/org/apache/lucene/search/Similarity.html#decodeNorm(byte)">decodeNorm()</a>.
|
||||
Due to loss of precision, it is not guaranteed that decode(encode(x)) = x,
|
||||
e.g. decode(encode(0.89)) = 0.75.
|
||||
At scoring (search) time, this norm is brought into the score of document
|
||||
as <b>norm(t, d)</b>, as shown by the formula in
|
||||
<a href="api/org/apache/lucene/search/Similarity.html">Similarity</a>.
|
||||
<a href="api/core/org/apache/lucene/search/Similarity.html">Similarity</a>.
|
||||
</p>
|
||||
<a name="N100B1"></a><a name="Understanding the Scoring Formula"></a>
|
||||
<h3 class="boxed">Understanding the Scoring Formula</h3>
|
||||
<p>
|
||||
This scoring formula is described in the
|
||||
<a href="api/org/apache/lucene/search/Similarity.html">Similarity</a> class. Please take the time to study this formula, as it contains much of the information about how the
|
||||
<a href="api/core/org/apache/lucene/search/Similarity.html">Similarity</a> class. Please take the time to study this formula, as it contains much of the information about how the
|
||||
basics of Lucene scoring work, especially the
|
||||
<a href="api/org/apache/lucene/search/TermQuery.html">TermQuery</a>.
|
||||
<a href="api/core/org/apache/lucene/search/TermQuery.html">TermQuery</a>.
|
||||
</p>
|
||||
<a name="N100C2"></a><a name="The Big Picture"></a>
|
||||
<h3 class="boxed">The Big Picture</h3>
|
||||
<p>OK, so the tf-idf formula and the
|
||||
<a href="api/org/apache/lucene/search/Similarity.html">Similarity</a>
|
||||
<a href="api/core/org/apache/lucene/search/Similarity.html">Similarity</a>
|
||||
is great for understanding the basics of Lucene scoring, but what really drives Lucene scoring are
|
||||
the use and interactions between the
|
||||
<a href="api/org/apache/lucene/search/Query.html">Query</a> classes, as created by each application in
|
||||
<a href="api/core/org/apache/lucene/search/Query.html">Query</a> classes, as created by each application in
|
||||
response to a user's information need.
|
||||
</p>
|
||||
<p>In this regard, Lucene offers a wide variety of <a href="api/org/apache/lucene/search/Query.html">Query</a> implementations, most of which are in the
|
||||
<a href="api/org/apache/lucene/search/package-summary.html">org.apache.lucene.search</a> package.
|
||||
<p>In this regard, Lucene offers a wide variety of <a href="api/core/org/apache/lucene/search/Query.html">Query</a> implementations, most of which are in the
|
||||
<a href="api/core/org/apache/lucene/search/package-summary.html">org.apache.lucene.search</a> package.
|
||||
These implementations can be combined in a wide variety of ways to provide complex querying
|
||||
capabilities along with
|
||||
information about where matches took place in the document collection. The <a href="#Query Classes">Query</a>
|
||||
section below
|
||||
highlights some of the more important Query classes. For information on the other ones, see the
|
||||
<a href="api/org/apache/lucene/search/package-summary.html">package summary</a>. For details on implementing
|
||||
<a href="api/core/org/apache/lucene/search/package-summary.html">package summary</a>. For details on implementing
|
||||
your own Query class, see <a href="#Changing your Scoring -- Expert Level">Changing your Scoring --
|
||||
Expert Level</a> below.
|
||||
</p>
|
||||
<p>Once a Query has been created and submitted to the
|
||||
<a href="api/org/apache/lucene/search/IndexSearcher.html">IndexSearcher</a>, the scoring process
|
||||
<a href="api/core/org/apache/lucene/search/IndexSearcher.html">IndexSearcher</a>, the scoring process
|
||||
begins. (See the <a href="#Appendix">Appendix</a> Algorithm section for more notes on the process.) After some infrastructure setup,
|
||||
control finally passes to the <a href="api/org/apache/lucene/search/Weight.html">Weight</a> implementation and its
|
||||
<a href="api/org/apache/lucene/search/Scorer.html">Scorer</a> instance. In the case of any type of
|
||||
<a href="api/org/apache/lucene/search/BooleanQuery.html">BooleanQuery</a>, scoring is handled by the
|
||||
control finally passes to the <a href="api/core/org/apache/lucene/search/Weight.html">Weight</a> implementation and its
|
||||
<a href="api/core/org/apache/lucene/search/Scorer.html">Scorer</a> instance. In the case of any type of
|
||||
<a href="api/core/org/apache/lucene/search/BooleanQuery.html">BooleanQuery</a>, scoring is handled by the
|
||||
<a href="http://svn.apache.org/viewvc/lucene/java/trunk/src/java/org/apache/lucene/search/BooleanQuery.java?view=log">BooleanWeight2</a> (link goes to ViewVC BooleanQuery java code which contains the BooleanWeight2 inner class),
|
||||
unless the static
|
||||
<a href="api/org/apache/lucene/search/BooleanQuery.html#setUseScorer14(boolean)">
|
||||
BooleanQuery#setUseScorer14(boolean)</a> method is set to true,
|
||||
unless the
|
||||
<a href="api/core/org/apache/lucene/search/Weight.html#scoresDocsOutOfOrder()">
|
||||
Weight#scoresDocsOutOfOrder()</a> method returns true,
|
||||
in which case the
|
||||
<a href="http://svn.apache.org/viewvc/lucene/java/trunk/src/java/org/apache/lucene/search/BooleanQuery.java?view=log">BooleanWeight</a>
|
||||
(link goes to ViewVC BooleanQuery java code, which contains the BooleanWeight inner class) from the 1.4 version of Lucene is used by default.
|
||||
See <a href="http://svn.apache.org/repos/asf/lucene/java/trunk/CHANGES.txt">CHANGES.txt</a> under release 1.9 RC1 for more information on choosing which Scorer to use.
|
||||
</p>
|
||||
<p>
|
||||
<p>
|
||||
Assuming the use of the BooleanWeight2, a
|
||||
BooleanScorer2 is created by bringing together
|
||||
all of the
|
||||
<a href="api/org/apache/lucene/search/Scorer.html">Scorer</a>s from the sub-clauses of the BooleanQuery.
|
||||
<a href="api/core/org/apache/lucene/search/Scorer.html">Scorer</a>s from the sub-clauses of the BooleanQuery.
|
||||
When the BooleanScorer2 is asked to score it delegates its work to an internal Scorer based on the type
|
||||
of clauses in the Query. This internal Scorer essentially loops over the sub scorers and sums the scores
|
||||
provided by each scorer while factoring in the coord() score.
|
||||
|
@ -470,14 +467,14 @@ document.write("Last Published: " + document.lastModified);
|
|||
<a name="N1011A"></a><a name="Query Classes"></a>
|
||||
<h3 class="boxed">Query Classes</h3>
|
||||
<p>For information on the Query Classes, refer to the
|
||||
<a href="api/org/apache/lucene/search/package-summary.html#query">search package javadocs</a>
|
||||
<a href="api/core/org/apache/lucene/search/package-summary.html#query">search package javadocs</a>
|
||||
|
||||
</p>
|
||||
<a name="N10127"></a><a name="Changing Similarity"></a>
|
||||
<h3 class="boxed">Changing Similarity</h3>
|
||||
<p>One of the ways of changing the scoring characteristics of Lucene is to change the similarity factors. For information on
|
||||
how to do this, see the
|
||||
<a href="api/org/apache/lucene/search/package-summary.html#changingSimilarity">search package javadocs</a>
|
||||
<a href="api/core/org/apache/lucene/search/package-summary.html#changingSimilarity">search package javadocs</a>
|
||||
</p>
|
||||
</div>
|
||||
|
||||
|
@ -486,7 +483,7 @@ document.write("Last Published: " + document.lastModified);
|
|||
<div class="section">
|
||||
<p>At a much deeper level, one can affect scoring by implementing their own Query classes (and related scoring classes.) To learn more
|
||||
about how to do this, refer to the
|
||||
<a href="api/org/apache/lucene/search/package-summary.html#scoring">search package javadocs</a>
|
||||
<a href="api/core/org/apache/lucene/search/package-summary.html#scoring">search package javadocs</a>
|
||||
|
||||
</p>
|
||||
</div>
|
||||
|
@ -511,19 +508,19 @@ document.write("Last Published: " + document.lastModified);
|
|||
<p>This section is mostly notes on stepping through the Scoring process and serves as
|
||||
fertilizer for the earlier sections.</p>
|
||||
<p>In the typical search application, a
|
||||
<a href="api/org/apache/lucene/search/Query.html">Query</a>
|
||||
<a href="api/core/org/apache/lucene/search/Query.html">Query</a>
|
||||
is passed to the
|
||||
<a href="api/org/apache/lucene/search/Searcher.html">Searcher</a>
|
||||
<a href="api/core/org/apache/lucene/search/Searcher.html">Searcher</a>
|
||||
, beginning the scoring process.
|
||||
</p>
|
||||
<p>Once inside the Searcher, a
|
||||
<a href="api/org/apache/lucene/search/HitCollector.html">HitCollector</a>
|
||||
<a href="api/core/org/apache/lucene/search/Collector.html">Collector</a>
|
||||
is used for the scoring and sorting of the search results.
|
||||
These important objects are involved in a search:
|
||||
<ol>
|
||||
|
||||
<li>The
|
||||
<a href="api/org/apache/lucene/search/Weight.html">Weight</a>
|
||||
<a href="api/core/org/apache/lucene/search/Weight.html">Weight</a>
|
||||
object of the Query. The Weight object is an internal representation of the Query that
|
||||
allows the Query to be reused by the Searcher.
|
||||
</li>
|
||||
|
@ -531,12 +528,12 @@ document.write("Last Published: " + document.lastModified);
|
|||
<li>The Searcher that initiated the call.</li>
|
||||
|
||||
<li>A
|
||||
<a href="api/org/apache/lucene/search/Filter.html">Filter</a>
|
||||
<a href="api/core/org/apache/lucene/search/Filter.html">Filter</a>
|
||||
for limiting the result set. Note, the Filter may be null.
|
||||
</li>
|
||||
|
||||
<li>A
|
||||
<a href="api/org/apache/lucene/search/Sort.html">Sort</a>
|
||||
<a href="api/core/org/apache/lucene/search/Sort.html">Sort</a>
|
||||
object for specifying how to sort the results if the standard score-based sort method is not
|
||||
desired.
|
||||
</li>
|
||||
|
@ -546,45 +543,45 @@ document.write("Last Published: " + document.lastModified);
|
|||
</p>
|
||||
<p> Assuming we are not sorting (since sorting doesn't
|
||||
affect the raw Lucene score),
|
||||
we call one of the search method of the Searcher, passing in the
|
||||
<a href="api/org/apache/lucene/search/Weight.html">Weight</a>
|
||||
we call one of the search methods of the Searcher, passing in the
|
||||
<a href="api/core/org/apache/lucene/search/Weight.html">Weight</a>
|
||||
object created by Searcher.createWeight(Query),
|
||||
<a href="api/org/apache/lucene/search/Filter.html">Filter</a>
|
||||
<a href="api/core/org/apache/lucene/search/Filter.html">Filter</a>
|
||||
and the number of results we want. This method
|
||||
returns a
|
||||
<a href="api/org/apache/lucene/search/TopDocs.html">TopDocs</a>
|
||||
<a href="api/core/org/apache/lucene/search/TopDocs.html">TopDocs</a>
|
||||
object, which is an internal collection of search results.
|
||||
The Searcher creates a
|
||||
<a href="api/org/apache/lucene/search/TopDocCollector.html">TopDocCollector</a>
|
||||
<a href="api/core/org/apache/lucene/search/TopScoreDocCollector.html">TopScoreDocCollector</a>
|
||||
and passes it along with the Weight and Filter to another expert search method (for more on the
|
||||
<a href="api/org/apache/lucene/search/HitCollector.html">HitCollector</a>
|
||||
<a href="api/core/org/apache/lucene/search/Collector.html">Collector</a>
|
||||
mechanism, see
|
||||
<a href="api/org/apache/lucene/search/Searcher.html">Searcher</a>
|
||||
<a href="api/core/org/apache/lucene/search/Searcher.html">Searcher</a>
|
||||
.) The TopScoreDocCollector uses a
|
||||
<a href="api/org/apache/lucene/util/PriorityQueue.html">PriorityQueue</a>
|
||||
<a href="api/core/org/apache/lucene/util/PriorityQueue.html">PriorityQueue</a>
|
||||
to collect the top results for the search.
|
||||
</p>
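The top-results collection described above can be sketched with a bounded min-heap. This is a minimal illustration built on java.util.PriorityQueue, not Lucene's own PriorityQueue or collector classes; the names TopNSketch, Hit, collect and topDocs are hypothetical and chosen only to mirror the text.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.PriorityQueue;

// Illustrative only: keeps the n best-scoring hits seen so far.
final class TopNSketch {

    // Hypothetical value class pairing a document id with its score.
    static final class Hit {
        final int doc;
        final float score;

        Hit(int doc, float score) {
            this.doc = doc;
            this.score = score;
        }
    }

    private final int n;
    // Min-heap on score: the head is the weakest hit currently kept.
    private final PriorityQueue<Hit> queue =
            new PriorityQueue<>((a, b) -> Float.compare(a.score, b.score));

    TopNSketch(int n) {
        this.n = n;
    }

    // Called once per matching document, in any order.
    void collect(int doc, float score) {
        if (n <= 0) {
            return;
        }
        if (queue.size() < n) {
            queue.add(new Hit(doc, score));
        } else if (score > queue.peek().score) {
            queue.poll();                    // evict the current weakest hit
            queue.add(new Hit(doc, score));
        }
    }

    // Returns the kept hits, best score first.
    List<Hit> topDocs() {
        List<Hit> hits = new ArrayList<>(queue);
        hits.sort((a, b) -> Float.compare(b.score, a.score));
        return hits;
    }

    public static void main(String[] args) {
        TopNSketch collector = new TopNSketch(2);
        collector.collect(0, 0.3f);
        collector.collect(1, 0.9f);
        collector.collect(2, 0.1f);
        collector.collect(3, 0.5f);
        for (Hit h : collector.topDocs()) {
            System.out.println(h.doc + " " + h.score);
        }
    }
}
```

Keeping the weakest retained hit at the head of the heap means each new document needs only one comparison to decide whether it belongs in the top n, which is why a priority queue suits this step.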
|
||||
<p>If a Filter is being used, some initial setup is done to determine which docs to include. Otherwise,
|
||||
we ask the Weight for
|
||||
a
|
||||
<a href="api/org/apache/lucene/search/Scorer.html">Scorer</a>
|
||||
<a href="api/core/org/apache/lucene/search/Scorer.html">Scorer</a>
|
||||
for the
|
||||
<a href="api/org/apache/lucene/index/IndexReader.html">IndexReader</a>
|
||||
<a href="api/core/org/apache/lucene/index/IndexReader.html">IndexReader</a>
|
||||
of the current searcher and we proceed by
|
||||
calling the score method on the
|
||||
<a href="api/org/apache/lucene/search/Scorer.html">Scorer</a>
|
||||
<a href="api/core/org/apache/lucene/search/Scorer.html">Scorer</a>
|
||||
.
|
||||
</p>
|
||||
<p>At last, we are actually going to score some documents. The score method takes in the HitCollector
|
||||
(most likely the TopDocCollector) and does its business.
|
||||
<p>At last, we are actually going to score some documents. The score method takes in the Collector
|
||||
(most likely the TopScoreDocCollector or TopFieldCollector) and does its business.
|
||||
Of course, here is where things get involved. The
|
||||
<a href="api/org/apache/lucene/search/Scorer.html">Scorer</a>
|
||||
<a href="api/core/org/apache/lucene/search/Scorer.html">Scorer</a>
|
||||
that is returned by the
|
||||
<a href="api/org/apache/lucene/search/Weight.html">Weight</a>
|
||||
<a href="api/core/org/apache/lucene/search/Weight.html">Weight</a>
|
||||
object depends on what type of Query was submitted. In most real world applications with multiple
|
||||
query terms,
|
||||
the
|
||||
<a href="api/org/apache/lucene/search/Scorer.html">Scorer</a>
|
||||
<a href="api/core/org/apache/lucene/search/Scorer.html">Scorer</a>
|
||||
is going to be a
|
||||
<a href="http://svn.apache.org/viewvc/lucene/java/trunk/src/java/org/apache/lucene/search/BooleanScorer2.java?view=log">BooleanScorer2</a>
|
||||
(see the section on customizing your scoring for info on changing this.)
|
||||
|
|
108
docs/scoring.pdf
|
@ -198,10 +198,10 @@ endobj
|
|||
>>
|
||||
endobj
|
||||
38 0 obj
|
||||
<< /Length 2333 /Filter [ /ASCII85Decode /FlateDecode ]
|
||||
<< /Length 2402 /Filter [ /ASCII85Decode /FlateDecode ]
|
||||
>>
|
||||
stream
|
||||
|
||||
endstream
|
||||
endobj
|
||||
39 0 obj
|
||||
|
@ -213,10 +213,10 @@ endobj
|
|||
>>
|
||||
endobj
|
||||
40 0 obj
|
||||
<< /Length 2003 /Filter [ /ASCII85Decode /FlateDecode ]
|
||||
<< /Length 1958 /Filter [ /ASCII85Decode /FlateDecode ]
|
||||
>>
|
||||
stream
|
||||
|
||||
endstream
|
||||
endobj
|
||||
41 0 obj
|
||||
|
@ -228,10 +228,10 @@ endobj
|
|||
>>
|
||||
endobj
|
||||
42 0 obj
|
||||
<< /Length 1325 /Filter [ /ASCII85Decode /FlateDecode ]
|
||||
<< /Length 1277 /Filter [ /ASCII85Decode /FlateDecode ]
|
||||
>>
|
||||
stream
|
||||
|
||||
endstream
|
||||
endobj
|
||||
43 0 obj
|
||||
|
@ -437,43 +437,43 @@ endobj
|
|||
21 0 obj
|
||||
<<
|
||||
/S /GoTo
|
||||
/D [39 0 R /XYZ 85.0 297.0 null]
|
||||
/D [39 0 R /XYZ 85.0 310.2 null]
|
||||
>>
|
||||
endobj
|
||||
23 0 obj
|
||||
<<
|
||||
/S /GoTo
|
||||
/D [39 0 R /XYZ 85.0 245.747 null]
|
||||
/D [39 0 R /XYZ 85.0 258.947 null]
|
||||
>>
|
||||
endobj
|
||||
25 0 obj
|
||||
<<
|
||||
/S /GoTo
|
||||
/D [39 0 R /XYZ 85.0 181.294 null]
|
||||
/D [39 0 R /XYZ 85.0 194.494 null]
|
||||
>>
|
||||
endobj
|
||||
27 0 obj
|
||||
<<
|
||||
/S /GoTo
|
||||
/D [41 0 R /XYZ 85.0 624.6 null]
|
||||
/D [41 0 R /XYZ 85.0 637.8 null]
|
||||
>>
|
||||
endobj
|
||||
29 0 obj
|
||||
<<
|
||||
/S /GoTo
|
||||
/D [41 0 R /XYZ 85.0 593.466 null]
|
||||
/D [41 0 R /XYZ 85.0 606.666 null]
|
||||
>>
|
||||
endobj
|
||||
31 0 obj
|
||||
<<
|
||||
/S /GoTo
|
||||
/D [41 0 R /XYZ 85.0 542.213 null]
|
||||
/D [41 0 R /XYZ 85.0 555.413 null]
|
||||
>>
|
||||
endobj
|
||||
33 0 obj
|
||||
<<
|
||||
/S /GoTo
|
||||
/D [41 0 R /XYZ 85.0 490.96 null]
|
||||
/D [41 0 R /XYZ 85.0 504.16 null]
|
||||
>>
|
||||
endobj
|
||||
44 0 obj
|
||||
|
@ -484,68 +484,68 @@ endobj
|
|||
xref
|
||||
0 63
|
||||
0000000000 65535 f
|
||||
0000017815 00000 n
|
||||
0000017908 00000 n
|
||||
0000018000 00000 n
|
||||
0000017791 00000 n
|
||||
0000017884 00000 n
|
||||
0000017976 00000 n
|
||||
0000000015 00000 n
|
||||
0000000071 00000 n
|
||||
0000001068 00000 n
|
||||
0000001188 00000 n
|
||||
0000001297 00000 n
|
||||
0000018123 00000 n
|
||||
0000018099 00000 n
|
||||
0000001432 00000 n
|
||||
0000018186 00000 n
|
||||
0000018162 00000 n
|
||||
0000001569 00000 n
|
||||
0000018252 00000 n
|
||||
0000018228 00000 n
|
||||
0000001706 00000 n
|
||||
0000018318 00000 n
|
||||
0000018294 00000 n
|
||||
0000001843 00000 n
|
||||
0000018382 00000 n
|
||||
0000018358 00000 n
|
||||
0000001979 00000 n
|
||||
0000018448 00000 n
|
||||
0000018424 00000 n
|
||||
0000002116 00000 n
|
||||
0000018514 00000 n
|
||||
0000018490 00000 n
|
||||
0000002253 00000 n
|
||||
0000018578 00000 n
|
||||
0000018554 00000 n
|
||||
0000002389 00000 n
|
||||
0000018644 00000 n
|
||||
0000018620 00000 n
|
||||
0000002526 00000 n
|
||||
0000018710 00000 n
|
||||
0000018686 00000 n
|
||||
0000002663 00000 n
|
||||
0000018774 00000 n
|
||||
0000018750 00000 n
|
||||
0000002799 00000 n
|
||||
0000018840 00000 n
|
||||
0000018816 00000 n
|
||||
0000002935 00000 n
|
||||
0000018906 00000 n
|
||||
0000018882 00000 n
|
||||
0000003072 00000 n
|
||||
0000005698 00000 n
|
||||
0000005806 00000 n
|
||||
0000008135 00000 n
|
||||
0000008243 00000 n
|
||||
0000010669 00000 n
|
||||
0000010777 00000 n
|
||||
0000012873 00000 n
|
||||
0000012981 00000 n
|
||||
0000014399 00000 n
|
||||
0000018971 00000 n
|
||||
0000014507 00000 n
|
||||
0000014670 00000 n
|
||||
0000014858 00000 n
|
||||
0000015078 00000 n
|
||||
0000015277 00000 n
|
||||
0000015588 00000 n
|
||||
0000015792 00000 n
|
||||
0000015985 00000 n
|
||||
0000016200 00000 n
|
||||
0000016521 00000 n
|
||||
0000016701 00000 n
|
||||
0000016886 00000 n
|
||||
0000017103 00000 n
|
||||
0000017259 00000 n
|
||||
0000017372 00000 n
|
||||
0000017482 00000 n
|
||||
0000017590 00000 n
|
||||
0000017706 00000 n
|
||||
0000010738 00000 n
|
||||
0000010846 00000 n
|
||||
0000012897 00000 n
|
||||
0000013005 00000 n
|
||||
0000014375 00000 n
|
||||
0000018947 00000 n
|
||||
0000014483 00000 n
|
||||
0000014646 00000 n
|
||||
0000014834 00000 n
|
||||
0000015054 00000 n
|
||||
0000015253 00000 n
|
||||
0000015564 00000 n
|
||||
0000015768 00000 n
|
||||
0000015961 00000 n
|
||||
0000016176 00000 n
|
||||
0000016497 00000 n
|
||||
0000016677 00000 n
|
||||
0000016862 00000 n
|
||||
0000017079 00000 n
|
||||
0000017235 00000 n
|
||||
0000017348 00000 n
|
||||
0000017458 00000 n
|
||||
0000017566 00000 n
|
||||
0000017682 00000 n
|
||||
trailer
|
||||
<<
|
||||
/Size 63
|
||||
|
@ -553,5 +553,5 @@ trailer
|
|||
/Info 4 0 R
|
||||
>>
|
||||
startxref
|
||||
19022
|
||||
18998
|
||||
%%EOF
|
||||
|
|
|
@ -39,15 +39,15 @@ Bring it up in <code>vi</code> or your editor of choice and let's take a look at
|
|||
|
||||
<p>
|
||||
As we discussed in the previous walk-through, the <code><a
|
||||
href="api/org/apache/lucene/demo/IndexFiles.html">IndexFiles</a></code> class creates a Lucene
|
||||
href="api/core/org/apache/lucene/demo/IndexFiles.html">IndexFiles</a></code> class creates a Lucene
|
||||
Index. Let's take a look at how it does this.
|
||||
</p>
|
||||
|
||||
<p>
|
||||
The first substantial thing the <code>main</code> function does is instantiate <code><a
|
||||
href="api/org/apache/lucene/index/IndexWriter.html">IndexWriter</a></code>. It passes the string
|
||||
href="api/core/org/apache/lucene/index/IndexWriter.html">IndexWriter</a></code>. It passes the string
|
||||
"<code>index</code>" and a new instance of a class called <code><a
|
||||
href="api/org/apache/lucene/analysis/standard/StandardAnalyzer.html">StandardAnalyzer</a></code>.
|
||||
href="api/core/org/apache/lucene/analysis/standard/StandardAnalyzer.html">StandardAnalyzer</a></code>.
|
||||
The "<code>index</code>" string is the name of the filesystem directory where all index information
|
||||
should be stored. Because we're not passing a full path, this will be created as a subdirectory of
|
||||
the current working directory (if it does not already exist). On some platforms, it may be created
|
||||
|
@ -55,45 +55,45 @@ in other directories (such as the user's home directory).
|
|||
</p>
|
||||
|
||||
<p>
|
||||
The <code><a href="api/org/apache/lucene/index/IndexWriter.html">IndexWriter</a></code> is the main
|
||||
The <code><a href="api/core/org/apache/lucene/index/IndexWriter.html">IndexWriter</a></code> is the main
|
||||
class responsible for creating indices. To use it you must instantiate it with a path that it can
|
||||
write the index into. If this path does not exist it will first create it. Otherwise it will
|
||||
refresh the index at that path. You can also create an index using one of the subclasses of <code><a
|
||||
href="api/org/apache/lucene/store/Directory.html">Directory</a></code>. In any case, you must also pass an
|
||||
href="api/core/org/apache/lucene/store/Directory.html">Directory</a></code>. In any case, you must also pass an
|
||||
instance of <code><a
|
||||
href="api/org/apache/lucene/analysis/Analyzer.html">org.apache.lucene.analysis.Analyzer</a></code>.
|
||||
href="api/core/org/apache/lucene/analysis/Analyzer.html">org.apache.lucene.analysis.Analyzer</a></code>.
|
||||
</p>
|
||||
|
||||
<p>
|
||||
The particular <code><a href="api/org/apache/lucene/analysis/Analyzer.html">Analyzer</a></code> we
|
||||
The particular <code><a href="api/core/org/apache/lucene/analysis/Analyzer.html">Analyzer</a></code> we
|
||||
are using, <code><a
|
||||
href="api/org/apache/lucene/analysis/standard/StandardAnalyzer.html">StandardAnalyzer</a></code>, is
|
||||
href="api/core/org/apache/lucene/analysis/standard/StandardAnalyzer.html">StandardAnalyzer</a></code>, is
|
||||
little more than a standard Java Tokenizer, converting all strings to lowercase and filtering out
|
||||
useless words and characters from the index. By useless words and characters I mean common language
|
||||
words such as articles (a, an, the, etc.) and other strings that would be useless for searching
|
||||
stop words and characters from the index. By stop words and characters I mean common language
|
||||
words such as articles (a, an, the, etc.) and other strings that may have less value for searching
|
||||
(e.g. <b>'s</b>). It should be noted that there are different rules for every language, and you
|
||||
should use the proper analyzer for each. Lucene currently provides Analyzers for a number of
|
||||
different languages (see the <code>*Analyzer.java</code> sources under <a
|
||||
href="http://svn.apache.org/repos/asf/lucene/java/trunk/contrib/analyzers/src/java/org/apache/lucene/analysis/">contrib/analyzers/src/java/org/apache/lucene/analysis</a>).
|
||||
href="http://svn.apache.org/repos/asf/lucene/java/trunk/contrib/analyzers/common/src/java/org/apache/lucene/analysis/">contrib/analyzers/src/java/org/apache/lucene/analysis</a>).
|
||||
</p>
|
||||
|
||||
<p>
|
||||
Looking further down in the file, you should see the <code>indexDocs()</code> code. This recursive
|
||||
function simply crawls the directories and uses <code><a
|
||||
href="api/org/apache/lucene/demo/FileDocument.html">FileDocument</a></code> to create <code><a
|
||||
href="api/org/apache/lucene/document/Document.html">Document</a></code> objects. The <code><a
|
||||
href="api/org/apache/lucene/document/Document.html">Document</a></code> is simply a data object to
|
||||
href="api/core/org/apache/lucene/demo/FileDocument.html">FileDocument</a></code> to create <code><a
|
||||
href="api/core/org/apache/lucene/document/Document.html">Document</a></code> objects. The <code><a
|
||||
href="api/core/org/apache/lucene/document/Document.html">Document</a></code> is simply a data object to
|
||||
represent the content in the file as well as its creation time and location. These instances are
|
||||
added to the <code>indexWriter</code>. Take a look inside <code><a
|
||||
href="api/org/apache/lucene/demo/FileDocument.html">FileDocument</a></code>. It's not particularly
|
||||
href="api/core/org/apache/lucene/demo/FileDocument.html">FileDocument</a></code>. It's not particularly
|
||||
complicated. It just adds fields to the <code><a
|
||||
href="api/org/apache/lucene/document/Document.html">Document</a></code>.
|
||||
href="api/core/org/apache/lucene/document/Document.html">Document</a></code>.
|
||||
</p>
|
||||
|
||||
<p>
|
||||
As you can see there isn't much to creating an index. The devil is in the details. You may also
|
||||
wish to examine the other samples in this directory, particularly the <code><a
|
||||
href="api/org/apache/lucene/demo/IndexHTML.html">IndexHTML</a></code> class. It is a bit more
|
||||
href="api/core/org/apache/lucene/demo/IndexHTML.html">IndexHTML</a></code> class. It is a bit more
|
||||
complex but builds upon this example.
|
||||
</p>
|
||||
|
||||
|
@ -102,28 +102,28 @@ complex but builds upon this example.
|
|||
<section id="Searching Files"><title>Searching Files</title>
|
||||
|
||||
<p>
|
||||
The <code><a href="api/org/apache/lucene/demo/SearchFiles.html">SearchFiles</a></code> class is
|
||||
The <code><a href="api/core/org/apache/lucene/demo/SearchFiles.html">SearchFiles</a></code> class is
|
||||
quite simple. It primarily collaborates with an <code><a
|
||||
href="api/org/apache/lucene/search/IndexSearcher.html">IndexSearcher</a></code>, <code><a
|
||||
href="api/org/apache/lucene/analysis/standard/StandardAnalyzer.html">StandardAnalyzer</a></code>
|
||||
href="api/core/org/apache/lucene/search/IndexSearcher.html">IndexSearcher</a></code>, <code><a
|
||||
href="api/core/org/apache/lucene/analysis/standard/StandardAnalyzer.html">StandardAnalyzer</a></code>
|
||||
(which is used in the <code><a
|
||||
href="api/org/apache/lucene/demo/IndexFiles.html">IndexFiles</a></code> class as well) and a
|
||||
<code><a href="api/org/apache/lucene/queryParser/QueryParser.html">QueryParser</a></code>. The
|
||||
href="api/core/org/apache/lucene/demo/IndexFiles.html">IndexFiles</a></code> class as well) and a
|
||||
<code><a href="api/core/org/apache/lucene/queryParser/QueryParser.html">QueryParser</a></code>. The
|
||||
query parser is constructed with an analyzer used to interpret your query text in the same way the
|
||||
documents are interpreted: finding the end of words and removing useless words like 'a', 'an' and
|
||||
'the'. The <code><a href="api/org/apache/lucene/search/Query.html">Query</a></code> object contains
|
||||
'the'. The <code><a href="api/core/org/apache/lucene/search/Query.html">Query</a></code> object contains
|
||||
the results from the <code><a
|
||||
href="api/org/apache/lucene/queryParser/QueryParser.html">QueryParser</a></code> which is passed to
|
||||
href="api/core/org/apache/lucene/queryParser/QueryParser.html">QueryParser</a></code> which is passed to
|
||||
the searcher. Note that it's also possible to programmatically construct a rich <code><a
|
||||
href="api/org/apache/lucene/search/Query.html">Query</a></code> object without using the query
|
||||
href="api/core/org/apache/lucene/search/Query.html">Query</a></code> object without using the query
|
||||
parser. The query parser just enables decoding the <a href="queryparsersyntax.html">Lucene query
|
||||
syntax</a> into the corresponding <code><a
|
||||
href="api/org/apache/lucene/search/Query.html">Query</a></code> object. Search can be executed in
|
||||
href="api/core/org/apache/lucene/search/Query.html">Query</a></code> object. Search can be executed in
|
||||
two different ways:
|
||||
<ul>
|
||||
<li>Streaming: A <code><a href="api/org/apache/lucene/search/HitCollector.html">HitCollector</a></code> subclass
|
||||
<li>Streaming: A <code><a href="api/core/org/apache/lucene/search/HitCollector.html">HitCollector</a></code> subclass
|
||||
simply prints out the document ID and score for each matching document.</li>
|
||||
<li>Paging: Using a <code><a href="api/org/apache/lucene/search/TopDocCollector.html">TopDocCollector</a></code>
|
||||
<li>Paging: Using a <code><a href="api/core/org/apache/lucene/search/TopDocCollector.html">TopDocCollector</a></code>
|
||||
the search results are printed in pages, sorted by score (i.e. relevance).</li>
|
||||
</ul>
|
||||
</p>
|
||||
|
|
|
@ -34,10 +34,10 @@
|
|||
<a href="http://wiki.apache.org/lucene-java/InformationRetrieval">Lucene Wiki IR references</a>.
|
||||
</p>
|
||||
<p>The rest of this document will cover <a href="#Scoring">Scoring</a> basics and how to change your
|
||||
<a href="api/org/apache/lucene/search/Similarity.html">Similarity</a>. Next it will cover ways you can
|
||||
<a href="api/core/org/apache/lucene/search/Similarity.html">Similarity</a>. Next it will cover ways you can
|
||||
customize the Lucene internals in <a href="#Changing your Scoring -- Expert Level">Changing your Scoring
|
||||
-- Expert Level</a> which gives details on implementing your own
|
||||
<a href="api/org/apache/lucene/search/Query.html">Query</a> class and related functionality. Finally, we
|
||||
<a href="api/core/org/apache/lucene/search/Query.html">Query</a> class and related functionality. Finally, we
|
||||
will finish up with some reference material in the <a href="#Appendix">Appendix</a>.
|
||||
</p>
|
||||
</section>
|
||||
|
@ -48,20 +48,20 @@
|
|||
and the Lucene
|
||||
<a href="fileformats.html">file formats</a>
|
||||
before continuing on with this section.) It is also assumed that readers know how to use the
|
||||
<a href="api/org/apache/lucene/search/Searcher.html#explain(Query query, int doc)">Searcher.explain(Query query, int doc)</a> functionality,
|
||||
<a href="api/core/org/apache/lucene/search/Searcher.html#explain(Query query, int doc)">Searcher.explain(Query query, int doc)</a> functionality,
|
||||
which can go a long way in informing why a score is returned.
|
||||
</p>
|
||||
<section id="Fields and Documents"><title>Fields and Documents</title>
|
||||
<p>In Lucene, the objects we are scoring are
|
||||
<a href="api/org/apache/lucene/document/Document.html">Documents</a>. A Document is a collection
|
||||
<a href="api/core/org/apache/lucene/document/Document.html">Documents</a>. A Document is a collection
|
||||
of
|
||||
<a href="api/org/apache/lucene/document/Field.html">Fields</a>. Each Field has semantics about how
|
||||
<a href="api/core/org/apache/lucene/document/Field.html">Fields</a>. Each Field has semantics about how
|
||||
it is created and stored (i.e. tokenized, untokenized, raw data, compressed, etc.) It is important to
|
||||
note that Lucene scoring works on Fields and then combines the results to return Documents. This is
|
||||
important because two Documents with the exact same content, but one having the content in two Fields
|
||||
and the other in one Field will return different scores for the same query due to length normalization
|
||||
(assuming the
|
||||
<a href="api/org/apache/lucene/search/DefaultSimilarity.html">DefaultSimilarity</a>
|
||||
<a href="api/core/org/apache/lucene/search/DefaultSimilarity.html">DefaultSimilarity</a>
|
||||
on the Fields).
|
||||
</p>
|
||||
</section>
|
||||
|
@ -70,17 +70,17 @@
|
|||
<ul>
|
||||
<li><b>Document level boosting</b>
|
||||
- while indexing - by calling
|
||||
<a href="api/org/apache/lucene/document/Document.html#setBoost(float)">document.setBoost()</a>
|
||||
<a href="api/core/org/apache/lucene/document/Document.html#setBoost(float)">document.setBoost()</a>
|
||||
before a document is added to the index.
|
||||
</li>
|
||||
<li><b>Document's Field level boosting</b>
|
||||
- while indexing - by calling
|
||||
<a href="api/org/apache/lucene/document/Fieldable.html#setBoost(float)">field.setBoost()</a>
|
||||
<a href="api/core/org/apache/lucene/document/Fieldable.html#setBoost(float)">field.setBoost()</a>
|
||||
before adding a field to the document (and before adding the document to the index).
|
||||
</li>
|
||||
<li><b>Query level boosting</b>
|
||||
- during search, by setting a boost on a query clause, calling
|
||||
<a href="api/org/apache/lucene/search/Query.html#setBoost(float)">Query.setBoost()</a>.
|
||||
<a href="api/core/org/apache/lucene/search/Query.html#setBoost(float)">Query.setBoost()</a>.
|
||||
</li>
|
||||
</ul>
|
||||
</p>
|
||||
|
@ -99,68 +99,68 @@
|
|||
<p>This composition of 1-byte representation of norms
|
||||
(that is, indexing time multiplication of field boosts & doc boost & field-length-norm)
|
||||
is nicely described in
|
||||
<a href="api/org/apache/lucene/document/Fieldable.html#setBoost(float)">Fieldable.setBoost()</a>.
|
||||
<a href="api/core/org/apache/lucene/document/Fieldable.html#setBoost(float)">Fieldable.setBoost()</a>.
|
||||
</p>
|
||||
<p>Encoding and decoding of the resulting float norm in a single byte are done by the
|
||||
static methods of the class Similarity:
|
||||
<a href="api/org/apache/lucene/search/Similarity.html#encodeNorm(float)">encodeNorm()</a> and
|
||||
<a href="api/org/apache/lucene/search/Similarity.html#decodeNorm(byte)">decodeNorm()</a>.
|
||||
<a href="api/core/org/apache/lucene/search/Similarity.html#encodeNorm(float)">encodeNorm()</a> and
|
||||
<a href="api/core/org/apache/lucene/search/Similarity.html#decodeNorm(byte)">decodeNorm()</a>.
|
||||
Due to loss of precision, it is not guaranteed that decode(encode(x)) = x,
|
||||
e.g. decode(encode(0.89)) = 0.75.
|
||||
At scoring (search) time, this norm is brought into the score of document
|
||||
as <b>norm(t, d)</b>, as shown by the formula in
|
||||
<a href="api/org/apache/lucene/search/Similarity.html">Similarity</a>.
|
||||
<a href="api/core/org/apache/lucene/search/Similarity.html">Similarity</a>.
|
||||
</p>
|
||||
</section>
|
||||
<section id="Understanding the Scoring Formula"><title>Understanding the Scoring Formula</title>
|
||||
|
||||
<p>
|
||||
This scoring formula is described in the
|
||||
<a href="api/org/apache/lucene/search/Similarity.html">Similarity</a> class. Please take the time to study this formula, as it contains much of the information about how the
|
||||
<a href="api/core/org/apache/lucene/search/Similarity.html">Similarity</a> class. Please take the time to study this formula, as it contains much of the information about how the
|
||||
basics of Lucene scoring work, especially the
|
||||
<a href="api/org/apache/lucene/search/TermQuery.html">TermQuery</a>.
|
||||
<a href="api/core/org/apache/lucene/search/TermQuery.html">TermQuery</a>.
|
||||
</p>
|
||||
</section>
|
||||
<section id="The Big Picture"><title>The Big Picture</title>
|
||||
<p>OK, so the tf-idf formula and the
|
||||
<a href="api/org/apache/lucene/search/Similarity.html">Similarity</a>
|
||||
<a href="api/core/org/apache/lucene/search/Similarity.html">Similarity</a>
|
||||
is great for understanding the basics of Lucene scoring, but what really drives Lucene scoring are
|
||||
the use and interactions between the
|
||||
<a href="api/org/apache/lucene/search/Query.html">Query</a> classes, as created by each application in
|
||||
<a href="api/core/org/apache/lucene/search/Query.html">Query</a> classes, as created by each application in
|
||||
response to a user's information need.
|
||||
</p>
|
||||
<p>In this regard, Lucene offers a wide variety of <a href="api/org/apache/lucene/search/Query.html">Query</a> implementations, most of which are in the
|
||||
<a href="api/org/apache/lucene/search/package-summary.html">org.apache.lucene.search</a> package.
|
||||
<p>In this regard, Lucene offers a wide variety of <a href="api/core/org/apache/lucene/search/Query.html">Query</a> implementations, most of which are in the
|
||||
<a href="api/core/org/apache/lucene/search/package-summary.html">org.apache.lucene.search</a> package.
|
||||
These implementations can be combined in a wide variety of ways to provide complex querying
|
||||
capabilities along with
|
||||
information about where matches took place in the document collection. The <a href="#Query Classes">Query</a>
|
||||
section below
|
||||
highlights some of the more important Query classes. For information on the other ones, see the
|
||||
<a href="api/org/apache/lucene/search/package-summary.html">package summary</a>. For details on implementing
|
||||
<a href="api/core/org/apache/lucene/search/package-summary.html">package summary</a>. For details on implementing
|
||||
your own Query class, see <a href="#Changing your Scoring -- Expert Level">Changing your Scoring --
|
||||
Expert Level</a> below.
|
||||
</p>
|
||||
<p>Once a Query has been created and submitted to the
|
||||
<a href="api/org/apache/lucene/search/IndexSearcher.html">IndexSearcher</a>, the scoring process
|
||||
<a href="api/core/org/apache/lucene/search/IndexSearcher.html">IndexSearcher</a>, the scoring process
|
||||
begins. (See the <a
|
||||
href="#Appendix">Appendix</a> Algorithm section for more notes on the process.) After some infrastructure setup,
|
||||
control finally passes to the <a href="api/org/apache/lucene/search/Weight.html">Weight</a> implementation and its
|
||||
<a href="api/org/apache/lucene/search/Scorer.html">Scorer</a> instance. In the case of any type of
|
||||
<a href="api/org/apache/lucene/search/BooleanQuery.html">BooleanQuery</a>, scoring is handled by the
|
||||
control finally passes to the <a href="api/core/org/apache/lucene/search/Weight.html">Weight</a> implementation and its
|
||||
<a href="api/core/org/apache/lucene/search/Scorer.html">Scorer</a> instance. In the case of any type of
|
||||
<a href="api/core/org/apache/lucene/search/BooleanQuery.html">BooleanQuery</a>, scoring is handled by the
|
||||
<a href="http://svn.apache.org/viewvc/lucene/java/trunk/src/java/org/apache/lucene/search/BooleanQuery.java?view=log">BooleanWeight2</a> (link goes to ViewVC BooleanQuery java code which contains the BooleanWeight2 inner class),
|
||||
unless the static
|
||||
<a href="api/org/apache/lucene/search/BooleanQuery.html#setUseScorer14(boolean)">
|
||||
BooleanQuery#setUseScorer14(boolean)</a> method is set to true,
|
||||
unless
|
||||
<a href="api/core/org/apache/lucene/search/Weight.html#scoresDocsOutOfOrder()">
|
||||
Weight#scoresDocsOutOfOrder()</a> method is set to true,
|
||||
in which case the
|
||||
<a href="http://svn.apache.org/viewvc/lucene/java/trunk/src/java/org/apache/lucene/search/BooleanQuery.java?view=log">BooleanWeight</a>
|
||||
(link goes to ViewVC BooleanQuery java code, which contains the BooleanWeight inner class) from the 1.4 version of Lucene is used by default.
|
||||
See <a href="http://svn.apache.org/repos/asf/lucene/java/trunk/CHANGES.txt">CHANGES.txt</a> under release 1.9 RC1 for more information on choosing which Scorer to use.
|
||||
</p>
|
||||
<p>
|
||||
<p>ry#setUseScorer14(boolean)
|
||||
Assuming the use of the BooleanWeight2, a
|
||||
BooleanScorer2 is created by bringing together
|
||||
all of the
|
||||
<a href="api/org/apache/lucene/search/Scorer.html">Scorer</a>s from the sub-clauses of the BooleanQuery.
|
||||
<a href="api/core/org/apache/lucene/search/Scorer.html">Scorer</a>s from the sub-clauses of the BooleanQuery.
|
||||
When the BooleanScorer2 is asked to score it delegates its work to an internal Scorer based on the type
|
||||
of clauses in the Query. This internal Scorer essentially loops over the sub scorers and sums the scores
|
||||
provided by each scorer while factoring in the coord() score.
|
||||
|
@ -169,20 +169,20 @@
|
|||
</section>
|
||||
<section id="Query Classes"><title>Query Classes</title>
|
||||
<p>For information on the Query Classes, refer to the
|
||||
<a href="api/org/apache/lucene/search/package-summary.html#query">search package javadocs</a>
|
||||
<a href="api/core/org/apache/lucene/search/package-summary.html#query">search package javadocs</a>
|
||||
</p>
|
||||
</section>
|
||||
<section id="Changing Similarity"><title>Changing Similarity</title>
|
||||
<p>One of the ways of changing the scoring characteristics of Lucene is to change the similarity factors. For information on
|
||||
how to do this, see the
|
||||
<a href="api/org/apache/lucene/search/package-summary.html#changingSimilarity">search package javadocs</a></p>
|
||||
<a href="api/core/org/apache/lucene/search/package-summary.html#changingSimilarity">search package javadocs</a></p>
|
||||
</section>
|
||||
|
||||
</section>
|
||||
<section id="Changing your Scoring -- Expert Level"><title>Changing your Scoring -- Expert Level</title>
|
||||
<p>At a much deeper level, one can affect scoring by implementing their own Query classes (and related scoring classes.) To learn more
|
||||
about how to do this, refer to the
|
||||
<a href="api/org/apache/lucene/search/package-summary.html#scoring">search package javadocs</a>
|
||||
<a href="api/core/org/apache/lucene/search/package-summary.html#scoring">search package javadocs</a>
|
||||
</p>
|
||||
</section>
|
||||
|
||||
|
@ -200,29 +200,29 @@
|
|||
<p>This section is mostly notes on stepping through the Scoring process and serves as
|
||||
fertilizer for the earlier sections.</p>
|
||||
<p>In the typical search application, a
|
||||
<a href="api/org/apache/lucene/search/Query.html">Query</a>
|
||||
<a href="api/core/org/apache/lucene/search/Query.html">Query</a>
|
||||
is passed to the
|
||||
<a
|
||||
href="api/org/apache/lucene/search/Searcher.html">Searcher</a>
|
||||
href="api/core/org/apache/lucene/search/Searcher.html">Searcher</a>
|
||||
, beginning the scoring process.
|
||||
</p>
|
||||
<p>Once inside the Searcher, a
|
||||
<a href="api/org/apache/lucene/search/HitCollector.html">HitCollector</a>
|
||||
<a href="api/core/org/apache/lucene/search/Collector.html">Collector</a>
|
||||
is used for the scoring and sorting of the search results.
|
||||
These important objects are involved in a search:
|
||||
<ol>
|
||||
<li>The
|
||||
<a href="api/org/apache/lucene/search/Weight.html">Weight</a>
|
||||
<a href="api/core/org/apache/lucene/search/Weight.html">Weight</a>
|
||||
object of the Query. The Weight object is an internal representation of the Query that
|
||||
allows the Query to be reused by the Searcher.
|
||||
</li>
|
||||
<li>The Searcher that initiated the call.</li>
|
||||
<li>A
|
||||
<a href="api/org/apache/lucene/search/Filter.html">Filter</a>
|
||||
<a href="api/core/org/apache/lucene/search/Filter.html">Filter</a>
|
||||
for limiting the result set. Note, the Filter may be null.
|
||||
</li>
|
||||
<li>A
|
||||
<a href="api/org/apache/lucene/search/Sort.html">Sort</a>
|
||||
<a href="api/core/org/apache/lucene/search/Sort.html">Sort</a>
|
||||
object for specifying how to sort the results if the standard score based sort method is not
|
||||
desired.
|
||||
</li>
|
||||
|
@ -230,45 +230,45 @@
|
|||
</p>
|
||||
<p> Assuming we are not sorting (since sorting doesn't
|
||||
affect the raw Lucene score),
|
||||
we call one of the search method of the Searcher, passing in the
|
||||
<a href="api/org/apache/lucene/search/Weight.html">Weight</a>
|
||||
we call one of the search methods of the Searcher, passing in the
|
||||
<a href="api/core/org/apache/lucene/search/Weight.html">Weight</a>
|
||||
object created by Searcher.createWeight(Query),
|
||||
<a href="api/org/apache/lucene/search/Filter.html">Filter</a>
|
||||
<a href="api/core/org/apache/lucene/search/Filter.html">Filter</a>
|
||||
and the number of results we want. This method
|
||||
returns a
|
||||
<a href="api/org/apache/lucene/search/TopDocs.html">TopDocs</a>
|
||||
<a href="api/core/org/apache/lucene/search/TopDocs.html">TopDocs</a>
|
||||
object, which is an internal collection of search results.
|
||||
The Searcher creates a
|
||||
<a href="api/org/apache/lucene/search/TopDocCollector.html">TopDocCollector</a>
|
||||
<a href="api/core/org/apache/lucene/search/TopScoreDocCollector.html">TopScoreDocCollector</a>
|
||||
and passes it along with the Weight and Filter to another expert search method (for more on the
|
||||
<a href="api/org/apache/lucene/search/HitCollector.html">HitCollector</a>
|
||||
<a href="api/core/org/apache/lucene/search/Collector.html">Collector</a>
|
||||
mechanism, see
|
||||
<a href="api/org/apache/lucene/search/Searcher.html">Searcher</a>
|
||||
<a href="api/core/org/apache/lucene/search/Searcher.html">Searcher</a>
|
||||
.) The TopScoreDocCollector uses a
|
||||
<a href="api/org/apache/lucene/util/PriorityQueue.html">PriorityQueue</a>
|
||||
<a href="api/core/org/apache/lucene/util/PriorityQueue.html">PriorityQueue</a>
|
||||
to collect the top results for the search.
|
||||
</p>
|
||||
<p>If a Filter is being used, some initial setup is done to determine which docs to include. Otherwise,
|
||||
we ask the Weight for
|
||||
a
|
||||
<a href="api/org/apache/lucene/search/Scorer.html">Scorer</a>
|
||||
<a href="api/core/org/apache/lucene/search/Scorer.html">Scorer</a>
|
||||
for the
|
||||
<a href="api/org/apache/lucene/index/IndexReader.html">IndexReader</a>
|
||||
<a href="api/core/org/apache/lucene/index/IndexReader.html">IndexReader</a>
|
||||
of the current searcher and we proceed by
|
||||
calling the score method on the
|
||||
<a href="api/org/apache/lucene/search/Scorer.html">Scorer</a>
|
||||
<a href="api/core/org/apache/lucene/search/Scorer.html">Scorer</a>
|
||||
.
|
||||
</p>
|
||||
<p>At last, we are actually going to score some documents. The score method takes in the HitCollector
|
||||
(most likely the TopDocCollector) and does its business.
|
||||
<p>At last, we are actually going to score some documents. The score method takes in the Collector
|
||||
(most likely the TopScoreDocCollector or TopFieldCollector) and does its business.
|
||||
Of course, here is where things get involved. The
|
||||
<a href="api/org/apache/lucene/search/Scorer.html">Scorer</a>
|
||||
<a href="api/core/org/apache/lucene/search/Scorer.html">Scorer</a>
|
||||
that is returned by the
|
||||
<a href="api/org/apache/lucene/search/Weight.html">Weight</a>
|
||||
<a href="api/core/org/apache/lucene/search/Weight.html">Weight</a>
|
||||
object depends on what type of Query was submitted. In most real world applications with multiple
|
||||
query terms,
|
||||
the
|
||||
<a href="api/org/apache/lucene/search/Scorer.html">Scorer</a>
|
||||
<a href="api/core/org/apache/lucene/search/Scorer.html">Scorer</a>
|
||||
is going to be a
|
||||
<a href="http://svn.apache.org/viewvc/lucene/java/trunk/src/java/org/apache/lucene/search/BooleanScorer2.java?view=log">BooleanScorer2</a>
|
||||
(see the section on customizing your scoring for info on changing this.)
|
||||
|
|