From 571150f303127515a47c5c7f3d40acca99294489 Mon Sep 17 00:00:00 2001 From: Mark Robert Miller Date: Wed, 26 Aug 2009 15:35:26 +0000 Subject: [PATCH] versioned site updates/fixes for 2.9 git-svn-id: https://svn.apache.org/repos/asf/lucene/java/trunk@808056 13f79535-47bb-0310-9956-ffa450edef68 --- docs/benchmarktemplate.xml | 61 ---------- docs/contributions.html | 3 - docs/demo.html | 3 - docs/demo2.html | 9 +- docs/demo2.pdf | 52 ++++---- docs/demo3.html | 3 - docs/demo4.html | 3 - docs/fileformats.html | 3 - docs/gettingstarted.html | 3 - docs/index.html | 3 - docs/linkmap.html | 9 -- docs/linkmap.pdf | 24 ++-- docs/lucene-sandbox/index.html | 3 - docs/queryparsersyntax.html | 3 - docs/scoring.html | 111 +++++++++--------- docs/scoring.pdf | 108 ++++++++--------- .../images/rc-b-l-15-1body-2menu-3menu.png | Bin 350 -> 348 bytes .../images/rc-b-r-15-1body-2menu-3menu.png | Bin 308 -> 319 bytes ...-5-1header-2tab-selected-3tab-selected.png | Bin 191 -> 200 bytes ...rc-t-l-5-1header-2searchbox-3searchbox.png | Bin 197 -> 199 bytes ...-5-1header-2tab-selected-3tab-selected.png | Bin 222 -> 209 bytes ...header-2tab-unselected-3tab-unselected.png | Bin 197 -> 199 bytes .../images/rc-t-r-15-1body-2menu-3menu.png | Bin 390 -> 390 bytes ...rc-t-r-5-1header-2searchbox-3searchbox.png | Bin 207 -> 214 bytes ...-5-1header-2tab-selected-3tab-selected.png | Bin 219 -> 215 bytes ...header-2tab-unselected-3tab-unselected.png | Bin 207 -> 214 bytes .../src/documentation/content/xdocs/demo2.xml | 56 ++++----- .../documentation/content/xdocs/scoring.xml | 108 ++++++++--------- 28 files changed, 231 insertions(+), 334 deletions(-) delete mode 100644 docs/benchmarktemplate.xml diff --git a/docs/benchmarktemplate.xml b/docs/benchmarktemplate.xml deleted file mode 100644 index df7601f20f9..00000000000 --- a/docs/benchmarktemplate.xml +++ /dev/null @@ -1,61 +0,0 @@ - - - diff --git a/docs/contributions.html b/docs/contributions.html index 14736bb5624..1faa4bab6c8 100644 --- 
a/docs/contributions.html +++ b/docs/contributions.html @@ -202,9 +202,6 @@ document.write("Last Published: " + document.lastModified); - diff --git a/docs/demo.html b/docs/demo.html index 1935f4dac0c..1853f943b4c 100644 --- a/docs/demo.html +++ b/docs/demo.html @@ -203,9 +203,6 @@ document.write("Last Published: " + document.lastModified); - - - - - - - - - - - @@ -342,21 +339,21 @@ document.write("Last Published: " + document.lastModified); and the Lucene file formats before continuing on with this section.) It is also assumed that readers know how to use the - Searcher.explain(Query query, int doc) functionality, + Searcher.explain(Query query, int doc) functionality, which can go a long way in informing why a score is returned.

Fields and Documents

In Lucene, the objects we are scoring are - Documents. A Document is a collection + Documents. A Document is a collection of - Fields. Each Field has semantics about how + Fields. Each Field has semantics about how it is created and stored (i.e. tokenized, untokenized, raw data, compressed, etc.) It is important to note that Lucene scoring works on Fields and then combines the results to return Documents. This is important because two Documents with the exact same content, where one has the content in two Fields and the other in a single Field, will return different scores for the same query due to length normalization (assuming the - DefaultSimilarity + DefaultSimilarity on the Fields).
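The length-normalization effect described above can be illustrated with a standalone sketch (plain Java, no Lucene dependency; `lengthNorm` mirrors DefaultSimilarity's 1/sqrt(numTerms), but this is a simplified model, not Lucene's actual code path):

```java
// Sketch: why the same text indexed as one Field vs. two Fields scores
// differently under length normalization (DefaultSimilarity-style).
class LengthNormDemo {

    // DefaultSimilarity computes lengthNorm(field) = 1 / sqrt(numTerms).
    static float lengthNorm(int numTerms) {
        return (float) (1.0 / Math.sqrt(numTerms));
    }

    public static void main(String[] args) {
        // Ten terms in a single Field:
        float oneField = lengthNorm(10);

        // The same ten terms split across two Fields of five terms each;
        // each shorter Field gets its own, larger norm:
        float twoFields = lengthNorm(5);

        // A term match in the split Document is weighted higher, so the
        // two Documents score differently for the same query.
        System.out.println(oneField + " vs " + twoFields);
    }
}
```

Running this shows the norm of a 5-term field is larger than that of a 10-term field, which is exactly the divergence the paragraph above describes.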

@@ -367,21 +364,21 @@ document.write("Last Published: " + document.lastModified);
  • Document level boosting - while indexing - by calling - document.setBoost() + document.setBoost() before a document is added to the index.
  • Document's Field level boosting - while indexing - by calling - field.setBoost() + field.setBoost() before adding a field to the document (and before adding the document to the index).
  • Query level boosting - during search, by setting a boost on a query clause, calling - Query.setBoost(). + Query.setBoost().
  • @@ -402,66 +399,66 @@ document.write("Last Published: " + document.lastModified);

    This composition of 1-byte representation of norms (that is, indexing time multiplication of field boosts & doc boost & field-length-norm) is nicely described in - Fieldable.setBoost(). + Fieldable.setBoost().
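The indexing-time multiplication described here can be sketched in plain Java (an illustrative model only; the parameter names are hypothetical and the real computation lives inside Lucene's indexing chain):

```java
// Sketch of how index-time boosts fold into one norm value per Field:
// norm = document boost * field boost * length norm. Lucene then squeezes
// this float into a single byte (see encodeNorm/decodeNorm).
class NormComposition {

    static float norm(float docBoost, float fieldBoost, int numTermsInField) {
        // DefaultSimilarity-style length norm: 1 / sqrt(numTerms).
        float lengthNorm = (float) (1.0 / Math.sqrt(numTermsInField));
        return docBoost * fieldBoost * lengthNorm;
    }

    public static void main(String[] args) {
        // An unboosted 16-term field: 1 * 1 * 1/4 = 0.25
        System.out.println(norm(1.0f, 1.0f, 16));
        // Boosting the document by 2 doubles every field norm in it:
        System.out.println(norm(2.0f, 1.0f, 16));
    }
}
```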

    Encoding and decoding of the resulting float norm into a single byte are done by the static methods of the class Similarity: - encodeNorm() and - decodeNorm(). + encodeNorm() and + decodeNorm(). Due to loss of precision, it is not guaranteed that decode(encode(x)) = x, e.g. decode(encode(0.89)) = 0.75. At scoring (search) time, this norm is brought into the score of the document as norm(t, d), as shown by the formula in - Similarity. + Similarity.
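The lossy byte encoding can be sketched standalone, in the spirit of Lucene's SmallFloat (3 mantissa bits, 5 exponent bits, zero point 15). This is a re-implementation for illustration; the shipped encodeNorm/decodeNorm may differ in corner cases:

```java
// Sketch of the lossy float -> byte norm encoding used for norms:
// 3 mantissa bits, 5 exponent bits, zero point 15 (SmallFloat-style).
class NormByte {

    static byte encode(float f) {
        int bits = Float.floatToIntBits(f);
        int small = bits >> (24 - 3);                  // drop low mantissa bits
        if (small <= ((63 - 15) << 3)) {
            return (bits <= 0) ? (byte) 0 : (byte) 1; // underflow
        }
        if (small >= ((63 - 15) << 3) + 0x100) {
            return -1;                                 // overflow
        }
        return (byte) (small - ((63 - 15) << 3));
    }

    static float decode(byte b) {
        if (b == 0) return 0.0f;
        int bits = (b & 0xff) << (24 - 3);
        bits += (63 - 15) << 24;                       // restore exponent bias
        return Float.intBitsToFloat(bits);
    }

    public static void main(String[] args) {
        // decode(encode(x)) is not x in general -- precision is lost:
        System.out.println(NormByte.decode(NormByte.encode(0.89f)));
        // Values that fit the 3-bit mantissa survive the round trip:
        System.out.println(NormByte.decode(NormByte.encode(0.75f)));
    }
}
```

Note how 0.75 round-trips exactly while 0.89 decodes to a nearby representable value, which is the precision loss the text above warns about.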

    Understanding the Scoring Formula

    This scoring formula is described in the - Similarity class. Please take the time to study this formula, as it contains much of the information about how the + Similarity class. Please take the time to study this formula, as it contains much of the information about how the basics of Lucene scoring work, especially the - TermQuery. + TermQuery.
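The core tf-idf term of that formula can be sketched numerically (a simplified single-term model: tf = sqrt(freq) and idf = 1 + ln(N/(df+1)) follow DefaultSimilarity, while queryNorm and coord are omitted here for clarity):

```java
// Numerical sketch of the core tf-idf term in the Similarity formula:
// contribution of term t in doc d ~ tf(t,d) * idf(t)^2 * boost * norm(t,d).
class TfIdfSketch {

    static double tf(int freq) {                  // DefaultSimilarity: sqrt(freq)
        return Math.sqrt(freq);
    }

    static double idf(int docFreq, int numDocs) { // DefaultSimilarity: 1 + ln(N/(df+1))
        return 1.0 + Math.log(numDocs / (double) (docFreq + 1));
    }

    static double termScore(int freq, int docFreq, int numDocs,
                            double boost, double norm) {
        double idf = idf(docFreq, numDocs);
        return tf(freq) * idf * idf * boost * norm;
    }

    public static void main(String[] args) {
        // A rare term (df = 9 of 1000 docs) outscores a common one (df = 499)
        // at the same in-document frequency -- the essence of idf:
        double rare   = termScore(4, 9,   1000, 1.0, 1.0);
        double common = termScore(4, 499, 1000, 1.0, 1.0);
        System.out.println(rare + " > " + common);
    }
}
```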

    The Big Picture

    OK, so the tf-idf formula and the - Similarity + Similarity is great for understanding the basics of Lucene scoring, but what really drives Lucene scoring are the use and interactions between the - Query classes, as created by each application in + Query classes, as created by each application in response to a user's information need.

    -

    In this regard, Lucene offers a wide variety of Query implementations, most of which are in the - org.apache.lucene.search package. +

    In this regard, Lucene offers a wide variety of Query implementations, most of which are in the + org.apache.lucene.search package. These implementations can be combined in a wide variety of ways to provide complex querying capabilities along with information about where matches took place in the document collection. The Query section below highlights some of the more important Query classes. For information on the other ones, see the - package summary. For details on implementing + package summary. For details on implementing your own Query class, see Changing your Scoring -- Expert Level below.

    Once a Query has been created and submitted to the - IndexSearcher, the scoring process + IndexSearcher, the scoring process begins. (See the Appendix Algorithm section for more notes on the process.) After some infrastructure setup, - control finally passes to the Weight implementation and its - Scorer instance. In the case of any type of - BooleanQuery, scoring is handled by the + control finally passes to the Weight implementation and its + Scorer instance. In the case of any type of + BooleanQuery, scoring is handled by the BooleanWeight2 (link goes to ViewVC BooleanQuery java code which contains the BooleanWeight2 inner class), - unless the static - - BooleanQuery#setUseScorer14(boolean) method is set to true, + unless the + + Weight#scoresDocsOutOfOrder() method returns true, in which case the BooleanWeight (link goes to ViewVC BooleanQuery java code, which contains the BooleanWeight inner class) from the 1.4 version of Lucene is used instead. See CHANGES.txt under release 1.9 RC1 for more information on choosing which Scorer to use.

    -

    +

    Assuming the use of the BooleanWeight2, a BooleanScorer2 is created by bringing together all of the - Scorers from the sub-clauses of the BooleanQuery. + Scorers from the sub-clauses of the BooleanQuery. When the BooleanScorer2 is asked to score, it delegates its work to an internal Scorer based on the type of clauses in the Query. This internal Scorer essentially loops over the sub-scorers and sums the scores provided by each scorer while factoring in the coord() score. @@ -470,14 +467,14 @@ document.write("Last Published: " + document.lastModified);
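That delegation — sum the matching sub-scorers, then factor in coord() — can be sketched standalone (a hypothetical simplification, not Lucene's actual Scorer API):

```java
// Sketch of what a BooleanScorer2-style scorer does for one document:
// sum the scores of matching sub-clauses, then multiply by coord(),
// which rewards documents matching more of the clauses.
class BooleanScoreSketch {

    // DefaultSimilarity: coord(overlap, maxOverlap) = overlap / maxOverlap.
    static float coord(int overlap, int maxOverlap) {
        return overlap / (float) maxOverlap;
    }

    // subScores[i] < 0 means clause i did not match this document.
    static float score(float[] subScores) {
        float sum = 0f;
        int overlap = 0;
        for (float s : subScores) {
            if (s >= 0f) { sum += s; overlap++; }
        }
        return sum * coord(overlap, subScores.length);
    }

    public static void main(String[] args) {
        // Two of three clauses match: (0.4 + 0.6) * 2/3
        System.out.println(score(new float[] {0.4f, -1f, 0.6f}));
        // All three match: (0.4 + 0.2 + 0.6) * 3/3
        System.out.println(score(new float[] {0.4f, 0.2f, 0.6f}));
    }
}
```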

    Query Classes

    For information on the Query Classes, refer to the - search package javadocs + search package javadocs

    Changing Similarity

    One of the ways of changing the scoring characteristics of Lucene is to change the similarity factors. For information on how to do this, see the - search package javadocs + search package javadocs

    @@ -486,7 +483,7 @@ document.write("Last Published: " + document.lastModified);

    At a much deeper level, you can affect scoring by implementing your own Query classes (and related scoring classes). To learn more about how to do this, refer to the - search package javadocs + search package javadocs

    @@ -511,19 +508,19 @@ document.write("Last Published: " + document.lastModified);

    This section is mostly notes on stepping through the Scoring process and serves as fertilizer for the earlier sections.

    In the typical search application, a - Query + Query is passed to the - Searcher + Searcher , beginning the scoring process.

    Once inside the Searcher, a - HitCollector + Collector is used for the scoring and sorting of the search results. These important objects are involved in a search:

    1. The - Weight + Weight object of the Query. The Weight object is an internal representation of the Query that allows the Query to be reused by the Searcher.
    2. @@ -531,12 +528,12 @@ document.write("Last Published: " + document.lastModified);
    3. The Searcher that initiated the call.
    4. A - Filter + Filter for limiting the result set. Note, the Filter may be null.
    5. A - Sort + Sort object for specifying how to sort the results if the standard score based sort method is not desired.
    6. @@ -546,45 +543,45 @@ document.write("Last Published: " + document.lastModified);

      Assuming we are not sorting (since sorting doesn't affect the raw Lucene score), - we call one of the search method of the Searcher, passing in the - Weight + we call one of the search methods of the Searcher, passing in the + Weight object created by Searcher.createWeight(Query), - Filter + Filter and the number of results we want. This method returns a - TopDocs + TopDocs object, which is an internal collection of search results. The Searcher creates a - TopDocCollector + TopScoreDocCollector and passes it along with the Weight and Filter to another expert search method (for more on the - HitCollector + Collector mechanism, see - Searcher + Searcher .) The TopDocCollector uses a - PriorityQueue + PriorityQueue to collect the top results for the search.
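The priority-queue collection step can be sketched with java.util.PriorityQueue (Lucene ships its own heap class, but the idea — keep the n best hits seen so far, evicting the current weakest — is the same; the method shape here is hypothetical):

```java
import java.util.PriorityQueue;

// Sketch of top-n hit collection as done by a TopDocCollector-style
// collector: a min-heap of size n keeps the n highest-scoring docs seen
// so far; the weakest kept entry is evicted when a better one arrives.
class TopHitsSketch {

    static int[] topDocs(int[] docIds, float[] scores, int n) {
        // Order the heap by ascending score so peek() is the weakest kept hit.
        PriorityQueue<float[]> heap =
            new PriorityQueue<float[]>((a, b) -> Float.compare(a[1], b[1]));
        for (int i = 0; i < docIds.length; i++) {
            if (heap.size() < n) {
                heap.add(new float[] {docIds[i], scores[i]});
            } else if (scores[i] > heap.peek()[1]) {
                heap.poll();                         // evict the weakest hit
                heap.add(new float[] {docIds[i], scores[i]});
            }
        }
        // Drain in ascending order, filling from the back -> best-first result.
        int[] result = new int[heap.size()];
        for (int i = result.length - 1; i >= 0; i--) {
            result[i] = (int) heap.poll()[0];
        }
        return result;
    }

    public static void main(String[] args) {
        int[] docs = {0, 1, 2, 3, 4};
        float[] scores = {0.1f, 0.9f, 0.3f, 0.7f, 0.5f};
        // Best two hits, best first: doc 1 (0.9) then doc 3 (0.7).
        System.out.println(java.util.Arrays.toString(topDocs(docs, scores, 2)));
    }
}
```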

      If a Filter is being used, some initial setup is done to determine which docs to include. Otherwise, we ask the Weight for a - Scorer + Scorer for the - IndexReader + IndexReader of the current searcher and we proceed by calling the score method on the - Scorer + Scorer .

      -

      At last, we are actually going to score some documents. The score method takes in the HitCollector - (most likely the TopDocCollector) and does its business. +

      At last, we are actually going to score some documents. The score method takes in the Collector + (most likely the TopScoreDocCollector or TopFieldCollector) and does its business. Of course, here is where things get involved. The - Scorer + Scorer that is returned by the - Weight + Weight object depends on what type of Query was submitted. In most real world applications with multiple query terms, the - Scorer + Scorer is going to be a BooleanScorer2 (see the section on customizing your scoring for info on changing this.) diff --git a/docs/scoring.pdf b/docs/scoring.pdf index 4c202a59105..5e1f0624a06 100644 --- a/docs/scoring.pdf +++ b/docs/scoring.pdf @@ -198,10 +198,10 @@ endobj >> endobj 38 0 obj -<< /Length 2333 /Filter [ /ASCII85Decode /FlateDecode ] +<< /Length 2402 /Filter [ /ASCII85Decode /FlateDecode ] >> stream -Gau`UgN)%,&:O:SE2F.gAh8UZMNYMQBB9!D(?3q^sN7(DIuB]hBg6fUtcSce5W@XPV`>]Y2)THJqtApj%&qgVgP;0SKK)T_j)p7hLUEck=Q+?$s;0oATjYkeaOfp=Hd)j`HjWZO=Bpb(bD+2*;bd,d`!c^9W4&r8jiW=4Y7U\%.`NqRaie>%EO8c),X'7Fd]G\j5T2nE=.T'dXS`o/p;/jRi1M0\Rp::Q)YVR?.ko?IWc.r[TIV"RX6lR31.9eSst!f):Xc%_S2(HOaT'"hXgIZ"fCm],Vaqk*@=1P)<_ljRG;BS.;#5WWY+CBdGPh>1h^;Ss32H-4tEYhN2j)@E=NF6I'8-UW59^Z:JYA,.8F(-T>Es#*7h7'cP$b&oZ4?D74N1=A#sam"!`8FRKrbYj;Ks;itCSrPl/P[&5d#=/>'"/Vom)F^kT8:,kM<2C5D+h=PA7lP5bGIZ%K<-p2#8FSm1Y&B#&]mO:sh_8Vki%JYpFZH#T`UB5?E+$S+1,L_qTS7/>?+<[d_7.psNOmm*ce1B.nieUYFSnL,1XHRbpkd/e)CErUTE>r!`5b=h\?TV=9\FcO*%b'e2M[eJgT%"-fjnpA)HrA'Ds/A>'V.GZ()soe:D7;+F-G(!I:D6KGpQZGZs_]tb!'a92G>76I:-8[cBP,!j+6Tm-De^cVabnk8Jg95b7BO",Rm$qEiP$tH93%j8B$a&!F'BFc*E!?T&G7+&7S$\h/uA#2HmVh_]`lJoFLY;fek(2srk+kA$+aW9&96/;kFsW\l_\k&\@^T5X?UVQ53p/]Co/AUN&LF[Dc*DG*_+R4:UAn>M+GO$7nS]L)P)FkI#-4pQ1m9(_&U",6b%fcVT]J%/_m+6%gN=Hi>UmlPdLTQSYf.h#+0b!h<0S3]T$B(g1uqQJ_'Fl[HM-3nZ,h]d[t'e%$64!YKP1@,8aHs1@Fi1Tc^RPUqgX)RVaKO`.[g6R`,<-g:a/sKgbo>C7In(uo`jp@aJ&(T-,0VC`^:uZ3Zl,G`aE4jh4,rt-d8OYGkmB\:nN<\\]2&ANESraNV%bD=30(\k22'q)mGlo.SO7!m)\9\CP4AW/l<39?Hf50Y?2'<"N2*``U"3"Xd1?X=`Td*`+!45?qI>OR[OAW^gOrL;4VqBFJoha+&jeJePlfo
Kk-'EZ`3aYg^]htg+Nf-F+VYFEN%q;4)0^EmNPoqun;!(r17/PBj:.(,$3fgfjmZko7b7L>BF[lkJ_(%J&?n#oW@%Bl%jq20bUA!E.Z$LR(lG_\S?=Q+LT/E@+1RsPX9(^,^-I'2/1,`s=MtGs@=,28!I6QrZ[5M1-9A?eahnJ,m^B6Q(>*`8pjps\iZ\Adi)g;tNEPh&:`""n@`#Dj>8,;OQQl,RX::,^L?nql1G/+_>JBP>^(:FbLR(*;f9A+j3u>??$Yo/);!Jo;F.QAJ0fgN/+)^p[l/5FfctN#t0OZ9R+D%MBHUnm[0TKql^XD4edbN;<[#<4Bu&$e-Q0hLOll*!SpkhDYW?slq`GbZ7!a'S8%*=FuZVD&c/^sbk13_Si+.MX./(j/Verq4cJ@?a+cd,=GXjgI9\SM`-2"2$b6'A]LPtt~> +Gau`UgN)%,&:O:SE2F.gAo*-E8rnuOVeIr0',9bQP)ejmMk\M]p]2r[_lXGPOl]PQeUIEIPN1ON_.]\34>,nS@7"#!Guh'Vq_+qm@^I4IB"kG&(n:R%*7E9-UYnPK2!3E8#/hW"R;SV3]?@BNA/-d(%5CM&s'8eW-o_YhT&DM7j\:)\RA1*p6Kp-4tEYSe*j+@E=NV6I'8W7^m4mGo"lK6u4e/:2[jp$m*NM.Pur'-/PG[gLTK1Z%Qf_FDhn7Yj;Ks;3>+OrPm;#[&5d#QR1k4/Vom)F^kUK:,kKf2E<9i=g_oslNNW?IZ*#C;g?=QC%sig+IG6$2Gg&[Jr:A;)t4dkH&J"[7KqWi^ALZ+N/XA[edOWW!Wd"h,-mR!NgLX+XCR/o(.Z#Pn3jT;@@a3@8,#obYDqIE`GJut!-oJIpsQZEM:%W%\AN7J>cLF?8XBK+E)QhF6=7Z_3HCSCt-7.:Ign^V%T+RZb`_=^Wiph]BFlP%^4O70"8lR/W74Z!NZ#b!#)jfl,[Eda/6H2W(0\&T1\57rcX\9*T<5ihX7A)D>fI&D,k73gjX])`d0b,^bY>_a%A>\_L7ue?6<1?6MP''VuYr5"2Ar?=SV3j_:G_BPuG/>b0OB@9UdUYI.co@*Cg,Oi7CT:?(>T/YQR8RC']e$H]q7\WW6>rPHH$X/Q-50I8MiOf$jI_f2WIsJ'\itfGP(="FOrl>!N0:g+C`)e6.#Hd@.sQ$@/l%)'dUK2LGaW1jW`:Qeo$#.5m69Cl@>piX9t(gP!LN=JGLQnPB"0EQ$"U+KV88oJC-C;;q=k$]>F0USe>g$-BP;&L-Q^J.]"1]+"WV"n06+31XBP3`mhZPM6-#^3=4h1B%>p`\$8gC?e8?\Yi9^&B]CP6&IAm8,TLQcuV"_)k#dH`p_$bm/+?PXB/K>,I5!j>Ie9Md(224q#=A%nl6@#A-3GKdVJ%kaGc0GU44X$'WJU(K-##n+sb'[HA\96(H$I?.VnX=NuUm[&eLQ&4]%Ig*/!5']u1Q'ib!)`9AADL"@s*2*@r2`j?$,)UL;G`j'm"lF-cc&E+FN1hj^GSPKeB4:D;(nDLXTIE2VY6c]p0dH6Jp/f>XLX'9)X-tT/NkFNXJZFo4`'3:RiX!!*4P,ojhp"`>'9r<">gQV3=a`pl:9YqG$Jl[[jroQXQ3MP.!*ITtj#YNj4Er=7[YT(lrln)i"#P)0.hM!NK4BRS\@1SF[2U_H)NLM0('c&c*&X;.DHI]F.ZCm],K\mZ6jM9r$bP-R>K-D4T+bRW_0?hMcN1WpC]bP^IW2c_l]p$@4=__B^4M&gi`1cM][O0mA@KiWC=&BAcP>$Vc&M(ls2D;P$-UWLY8fptRMnYj7*sE0Muq]SMs[I5;_cOk+IQ`PT7b#Sj1c4Y7ZuIb\FhO7dm.GcPS8DIiG7UL+mQjRsUm*f=?"$/M-[\$\85g;!7iXt:;YSIE)HGJB%/k,gII'SqE=b*kOJ7:.3'=.1l>c1a@olnR(E=)T"Yr_6ngHH?J`6D$oEP=i<'Ts0XDrogQ(#9kaV:Z8[Aep9]f49ik0C.kZ
V+0h5ladnXI0U_qVlqH2g[8!R3@;(RCIf(Y$B]ka&W/RTN`ONg@YPDX'PasWr(k;_o\,ahDt5;FXNSp+V$.:qDFI1Cr^aI^th5Q*0Hr~> endstream endobj 39 0 obj @@ -213,10 +213,10 @@ endobj >> endobj 40 0 obj -<< /Length 2003 /Filter [ /ASCII85Decode /FlateDecode ] +<< /Length 1958 /Filter [ /ASCII85Decode /FlateDecode ] >> stream -Gatm<968iG&AII3n.pAj5mY3X9.)A?DUZ8"2M\jKLkI$!'YsdB*VJ##,Xl\r!OCP'G\hFI!k\?sm^ES:R*T*bBA[@S>Nt$(ZeS'bQR%Fe)o^Y)dkF+N-`FMAI@@j:c^nF^*Y.fYoB"3b(]!n4Hc*][5`$S.o8WM=R@k9lZon)S#5q;0ZJ^>bX6G82,#V`dVC>`*nOTag"?d)l*`EnHa"4^qHW&>?Vj81&1%X`Q=n6oja45]6_'$H3`rOH9XR-IJrt!0^gZ$[C)`.aBsE@OAQdgr-ou+Pf>+EK+Z-8Q=%&JS>b!X-YuYCU>/Q=C`N:H3]<8C&W3t8^!aIR4\04>@PI];QQ%PW[@QBI.k9pX.hun>K@&.Pn923_jRr0hs#5NG'G`-s@Cr2l(9/_>=nB(2(&PJ8mpcuj,)jCdpE^UcHShd)$oWVV0G7MI%=EiL9s7g0X0rHf8-S`EHO"8R9#&d?P[I0.6EPg"47l%?^3-<"j'W[5X7_T\&&>&J6j]TC04kq2B#n*.q4[qO^/.k=:[e$A0?1`u-YmCAJK`k^[GT'W,hL4UXZ^,_D$kUoADPaonm$.sk_QU]%iW!NaU]=+C*6.;Jck8/lRW(@gHIp$$85jR;`fs#]tEAJ0'J`L80)ZA;BZYh]0pa-.(W)@#RmWqY8n1><#_UE)l+O]n4SB\i*+s\Wj\^H,]rM-o?N+D`UB1iNc%8G[g1?o&JY:\G?T2#EQ4#?F\.(1VPJ=]Z8l'r6I>S-*081u!eB\iHhr<=p*W_hADQq?dt_3O*d(Xf=Wa@o7dSf>i+4;t\rK@%LiAIE`h.u"_e*V?nh^[2XKhfhW?GOdVt/e-b?GF"7+b:4)XjhbHkaNT"jqA'+Mq03nd&gkcCU*qAN!@#")i/QG'DW#Z65-9E93/@mQN2qZ:MU9$d23"CrMcLBfAYUh#oI(Uql_g8_W^e+95Miq3iteh/4&L5.TaVSo;5)Lqp<)mM^ab[feJRYEAN7W 
+Gatm!AP7IDDX99M^CNT0C&=b'B-$0nW.T,')HMrtn]LBILoEjP949tpVChQS^E:uY"%ZL\!DqM!^d2)"$YpQ)f]FYMjN/C47@li'BkGQ2/;cTG!>4nk;8hg,OJ_>n/B@lQq;N""<9%nD7fVHXs+*[ZMD&N\C6qBMTGBGt'XfluYSaTN*9An*Y8ANmtN%W)+U<'H%+3XB^b`t_cs`ApD-RZ4g25k_3';GPEeN,/ZsU_@F3BqRG55gW49;BH'%*Ba+VLiP9H(KH@C:Oj+.r;LS;n9VZH3st\dYNuLk9WV$L-=DlW2tBf_`$;en+SUbBCM@)$a7-,c2bY8e0@!Cc*9ueJnf?ZWo)3\/bF?grJ9ue:ZE4k<6N6lbM2_!FiW#%9;3iFk%7VC+=FT@t$`8*4PI;mW#lEk"22kFt4G:3m8PuKroL>0/#-q(R68=jUnB6\^:[9rFZkX%fIY&s=Z?\oqMuU9U1Yc&b]VU!2h!H6K%6$dkrk']WHlJ+;MPM^'[kGKjLF5&kBUa<*S77o^Z?Ed3O@'>.bdi]#g$OXN`8>7W^b_g[/r6f?SkTHr5,,rL6jh1/?>n+%3GLZFXR]r)X7-23p]ufXba:Mm_l:6E78GS&@tB&ap^jl-l!ISX=cdXbrVsA0bs)o07s8kB,q(o4r*^klI:goIn3`5>Bq#_HVLXko[HR\e_Y<&=9A1QMNXq="'/3bBH/6B!AK:jD0;cu1XAmQR;26HuQuYjG:dHdb(!**C\eUbtG0[3jA8dMrh7BiLI&bQ7NpTHjWB4r+Up?Ztg[#BN4+:s@'PX(QT`cV_F39W2O>RVHb-]NH)cjnu^qltO0`5IEL5MuOLIqYRdBgZr=B!oE0Q85[f2!\K>0G#.CJJI>"Iqb:)^Al?eHCQg?!G[%ehlU`ft#r)Y%$=N1fp2u"YI:*idGcI)$0e.0\9&moB[e.K+[6nkdX-H$UIM+kQK+]0aQj/iIl89KciF^nH.sNEej.oX^KPkp^2>WFioZiV):t:G,=oh*cOb6]HjN[#i;;WW1k0&e$\3@RI'uS\aCiJf&`7fW8]>G1$k`6I(#UT`b[i=M0`b_'cN;3'E.hj\4ZZ8D=oV9d(q0>KU5WpDs3"1K4HUEKJV9FN(K;i7M(Y>P(=,+@KTB@d/1^CBm:DWh%NH%Z6i00%hs=7.Y-$_]NoTKET1QCVVaU:Z??P\3F'!Zd/'OR:ZUD"JF5!Z!J4_V-W/K7h6lCW+*PJ@?LZ7/T!Mpf=I=]F#='nu<3F#&G=\E2m63eAjX(c2KOUge::[g]Zobn:4[^^he^3eG00T*n(P.pHki?#KMF3&Lr4kZeMSX8#U-)-O3,9O!H=j7]N)pY0]0(!n]HG+@MUM:hI`t;Jgq=`_N,Qb@eY,)l=aKT#,MAN_@t4c-GB`NO;&5s)%,U^<3(A0+HWm"[E!YtahX&3`?"ql6N$--h.e*CI=rmA[(4_^"IDn/1p5f^#-PsGGo6iP%+L#^M4,67C>Cna"hk;''E_Y>=*Fdrf~> endstream endobj 41 0 obj @@ -228,10 +228,10 @@ endobj >> endobj 42 0 obj -<< /Length 1325 /Filter [ /ASCII85Decode /FlateDecode ] +<< /Length 1277 /Filter [ /ASCII85Decode /FlateDecode ] >> stream 
-Gat%#>E>75'RnB35h^ufV3V2AQH.9mB.mGfgJ1dtkk)]A'ANY$Cm-^SokPZYT,C2/MA#_MHnOP8"QDqo%QUcPaVTp`J4FBYA\adfUJIEb1D%&8gDM,1k8NE$JG4IZ"mC(Q+@\7\acnR-p_-&bE?Ze[J.WS$fM:_C?n"s3=!6G><"A8`6F"*%73>7c'G[Hl3gs*nTp8Lb,RUs"X+Ar.VD;r8Tc`ScC)&obam]*#%P2g,@sdl2ua+<>m1lLATc=P1m3nK7LJj;$-HJt7E-gO.h-OC=dmZ`\<3//khn?Aca+tap@)$B%-_IVrF([CDGSud#O1d+<#op:ap^S2F]__43n'0bLg^IZX8T2M-&2MT!T`nI*)o3%^LLr(oHE*iRW4*iK.WTZ9P##4!oQTE@=s$>'!8?*C$.@Qs-b[t:&@0RXI&%Jq@Eopoa7[bn5%K+>W1Yr=sb7=*I(^$VXheg0Wc7ZkBD6I/j8h5%D5+2'G.qk:uQP.G5&?!B?/>plqQi3MT,[MQ;tc@4Dego4&"en#M5XK?E]d7'm]aMF)^Jk^.+\)!87fSfY-[O!j9JD"Qd)O[ElXY!lS?S:COH9X6ml1?9dLPQ]KT`t+22ZAeG~> +Gat%#>Ar7S'Roe[&?pqd5bgrWDLPjEA_8[ZCZD*!b8r'gZIhaE6;i!_?U%/kU5t>qXU#p9Luc*okK:F(lK@R"H0%+LoD<+,nQaCsj4O8?UOIXWVhl?SOuh:]j6H'EO2&;f*'Z,=gN)Bb[lj9F4/\QFi5T*eC`DQ@S+iY[Bs*/;j4jh@Z^[k^7OUFI2>`0[a,/@^SVXC]+*/Ui#Ed?:_R2CLcY-Us*.)ufD%^/K6m8Lt.&IID\$Q&#_M\$+%7_qS:,,M/Ei3[`jg<<%-P_%kIAj(I)]2+-fNYprHjACf$G\@0(\O@8+E`Up%)>DE7eF>m8TcC@"Uq\0^h42a@Q,,9pe9AD!q%90@R(,kBhN^P+pCK3;)*K8_$C'$IdF!9]ab2?W@htReeT1=Jda`<,UG<%'!__r8__fI-^k?*ms9KNHDBF(VCRbVeXEm'kl5JbXJp$cgC%ha*/6*"#nmVa:OoX!"dt[#6I_7h'@QsQ92W9nJMKu1kaHu!)M)S.ERPMN]Mhujm$jj/d=aWd2*+jo-"\LaV3hdP]HV`9gKcPqp6e";<0C"!jX6n9`k(t1OAWn#nBM'-.@F.9nhJB[c+Ybqq-F>00MY<<6&l]'X*G5"$53skg]un[f&SfZMNB,In"i8)@#Z[.?hGO)523(>mT,`!BXEV'Ni@(;Z5KpK5\F9s8c%3s8IL&<[!h-]o:[Y[=D\tt9AsT]W*%%r8o9F7=DGt^-&11X5lO`d@dUpI?V9IW+L@1glK=BIZZRTM/`rnT)"D(d-RPRlhUFm!hsZShQ9T?b[acr(o)q^uqloLAmM)A.rP[[t3Y.FEWn6oQ7HR^*S*5W'M_37m"]H>Q2Ds:*9QdHT?69_Ebn!Krl1iWS*kUQ7_Hp2O1VjrN9N=T#\S/9t\6J6c"&IQ$V.aJ#94;)2=`i=3IZ)bBiRbnG6Puu^Fd+qD).*5KF7,2Ca\OIO`8)\Yi8"7ejcnMlDCt-nm,`kqiF1)opk+R?QafV/IMI*F&G9snEdSPqQT@Q)QZ]l_2PHTeB9iH=4'R">2YfMnr_;@K5nZZVZLFX1h8GG>L;l9(Hs endstream endobj 43 0 obj @@ -437,43 +437,43 @@ endobj 21 0 obj << /S /GoTo -/D [39 0 R /XYZ 85.0 297.0 null] +/D [39 0 R /XYZ 85.0 310.2 null] >> endobj 23 0 obj << /S /GoTo -/D [39 0 R /XYZ 85.0 245.747 null] +/D [39 0 R /XYZ 85.0 258.947 null] >> endobj 25 0 obj << /S /GoTo -/D [39 0 R /XYZ 85.0 181.294 null] 
+/D [39 0 R /XYZ 85.0 194.494 null] >> endobj 27 0 obj << /S /GoTo -/D [41 0 R /XYZ 85.0 624.6 null] +/D [41 0 R /XYZ 85.0 637.8 null] >> endobj 29 0 obj << /S /GoTo -/D [41 0 R /XYZ 85.0 593.466 null] +/D [41 0 R /XYZ 85.0 606.666 null] >> endobj 31 0 obj << /S /GoTo -/D [41 0 R /XYZ 85.0 542.213 null] +/D [41 0 R /XYZ 85.0 555.413 null] >> endobj 33 0 obj << /S /GoTo -/D [41 0 R /XYZ 85.0 490.96 null] +/D [41 0 R /XYZ 85.0 504.16 null] >> endobj 44 0 obj @@ -484,68 +484,68 @@ endobj xref 0 63 0000000000 65535 f -0000017815 00000 n -0000017908 00000 n -0000018000 00000 n +0000017791 00000 n +0000017884 00000 n +0000017976 00000 n 0000000015 00000 n 0000000071 00000 n 0000001068 00000 n 0000001188 00000 n 0000001297 00000 n -0000018123 00000 n +0000018099 00000 n 0000001432 00000 n -0000018186 00000 n +0000018162 00000 n 0000001569 00000 n -0000018252 00000 n +0000018228 00000 n 0000001706 00000 n -0000018318 00000 n +0000018294 00000 n 0000001843 00000 n -0000018382 00000 n +0000018358 00000 n 0000001979 00000 n -0000018448 00000 n +0000018424 00000 n 0000002116 00000 n -0000018514 00000 n +0000018490 00000 n 0000002253 00000 n -0000018578 00000 n +0000018554 00000 n 0000002389 00000 n -0000018644 00000 n +0000018620 00000 n 0000002526 00000 n -0000018710 00000 n +0000018686 00000 n 0000002663 00000 n -0000018774 00000 n +0000018750 00000 n 0000002799 00000 n -0000018840 00000 n +0000018816 00000 n 0000002935 00000 n -0000018906 00000 n +0000018882 00000 n 0000003072 00000 n 0000005698 00000 n 0000005806 00000 n 0000008135 00000 n 0000008243 00000 n -0000010669 00000 n -0000010777 00000 n -0000012873 00000 n -0000012981 00000 n -0000014399 00000 n -0000018971 00000 n -0000014507 00000 n -0000014670 00000 n -0000014858 00000 n -0000015078 00000 n -0000015277 00000 n -0000015588 00000 n -0000015792 00000 n -0000015985 00000 n -0000016200 00000 n -0000016521 00000 n -0000016701 00000 n -0000016886 00000 n -0000017103 00000 n -0000017259 00000 n -0000017372 00000 n 
-0000017482 00000 n -0000017590 00000 n -0000017706 00000 n +0000010738 00000 n +0000010846 00000 n +0000012897 00000 n +0000013005 00000 n +0000014375 00000 n +0000018947 00000 n +0000014483 00000 n +0000014646 00000 n +0000014834 00000 n +0000015054 00000 n +0000015253 00000 n +0000015564 00000 n +0000015768 00000 n +0000015961 00000 n +0000016176 00000 n +0000016497 00000 n +0000016677 00000 n +0000016862 00000 n +0000017079 00000 n +0000017235 00000 n +0000017348 00000 n +0000017458 00000 n +0000017566 00000 n +0000017682 00000 n trailer << /Size 63 @@ -553,5 +553,5 @@ trailer /Info 4 0 R >> startxref -19022 +18998 %%EOF diff --git a/docs/skin/images/rc-b-l-15-1body-2menu-3menu.png b/docs/skin/images/rc-b-l-15-1body-2menu-3menu.png index dd790d316973eecd070581ad25d32085791f34dd..cdb460a1da7304fcc2d5e326db43e27c6fa9b025 100644 GIT binary patch delta 237 zcmVHp6_CSYP=VP+s(GY=O#`I;4_M9J4|sjtky z!Ny9SW>Xy{hQI&*k*8TifQLa{QG$Vqk%@H8AR;5wiGhukOwIC=!VJz9+6)}*tfXrO zfz)7o1_@z)1{P*!(ls+PF)`F<1~7o)omc}2GH57CGB7hUVNFOZ_?!ibh0cOdhBqI+Fr2<}kKxGq n+YFChyk+?E?I**ZKYtkj+WaLkdEpex00000NkvXXu0mjfLEva* delta 239 zcmV_@0IVp^nn`nb!RZkX!ARjjaGZPc( z$rI%NNfprys=)Xs6b4w0=jY{O=qe0jND8!N;NoDzX$a0l4GIHL{5zOxFzh{XjbYo7 pOAOCme*hZ9!tndgKOp|Y002E_C|et0yRQHM002ovPDHLkV1g}@Y+nEX diff --git a/docs/skin/images/rc-b-r-15-1body-2menu-3menu.png b/docs/skin/images/rc-b-r-15-1body-2menu-3menu.png index 790f43fa388874b7d172373244600b9bdd6df834..3eff254fd179852394e8bd960eb1f610f05f2d4f 100644 GIT binary patch delta 208 zcmV;>05AWv0>1)~cYm`Hp6_CSW30BMS>N`I@=d*~r%{D=9>tW;Qk! 
z26G)HGHnJ1v4x&8d771^MHwVT1xVM-#>&Ew7V1EHRI#$KFxVQZGN{Ullct%8k&!`E zkcS~7%!#x_0@5fVz{Aj-6U4y5&WcDV#5h$XipspZ^JAK0vIVi14iwiPX0wW)k06_6b2mk=Wh7lfZ(09lH0000< KMNUMnLSTXb=vaLK delta 197 zcmV;$06PD_0<;2Hp6_24G}lU|?dP0FbSjm4$^8%}nHL{_*P% z`I@Cg1<2FP!N$tqX`@G;W+o;^24@Q`(lvu@P4Ks2;ACebO*6#*ZMU);^^AqG-gc8Kep1M^NB^ z{0MU@ie{J?D6N8$94O_05&$S3LD7M38!`X@WR@2=SEv9300000NkvXXu0mjf(4tzj diff --git a/docs/skin/images/rc-b-r-5-1header-2tab-selected-3tab-selected.png b/docs/skin/images/rc-b-r-5-1header-2tab-selected-3tab-selected.png index 1987c0aa476fabb406131d64b38b3050df697324..b175f27b16e19464241853fd16c8030d649b6d4a 100644 GIT binary patch delta 88 zcmdnbc!F_4d7zD_i(^Q|t>iB!U;aPez&7IoE6a}^0jx7;&X8F1HB_)kfzi<}uZ^W% snoVqjSezakcdcmK6NYFuzOpbj2CHh;N8AF9@eDxV>FVdQ&MBb@05%gJM*si- delta 79 zcmX@XxSw%Cxrec*i(^Q|t>iB!U;aPez!EWIro;@14GcU7>;L~x<|-=QV7#H>3657XJXs8KLTtFwYS)1JvXgB!Ti!aZ2u+f>Z^`_c>6oOt-ikActZory*+hjK0H6K&$d8^ rnZuwlM1i}3)iG)U>wzW(;|q)op>{gY|JNUT#{dMLu6{1-oD!Mi!aEIbYji|YRV;yYq55s|=@A7`KT~)cB*WozHyK{P|IEMw5-9Co$?)Xm zJBB}h{xN_66G+9=m+#>MU_p?U-+%rxAOj|l8R!52AaE$5b4BTy00000NkvXXu0mjf Dy1Odv delta 110 zcmV-!0FnRE0p0?@REU%kr71@01wY50URV` Qt^fc407*qoM6N<$f{Gw9TL1t6 diff --git a/docs/skin/images/rc-t-l-5-1header-2tab-unselected-3tab-unselected.png b/docs/skin/images/rc-t-l-5-1header-2tab-unselected-3tab-unselected.png index b69812acfd7adaaf41ecb67a27c4434ff9985ad1..e9f4440d1f58d53fa16a2ea141bd04cb69f0a595 100644 GIT binary patch delta 87 zcmX@gc${%Ud4RR2i(^Q|t>i!aZ2u+f>Z^`_c>6oOt-ikActZory*+hjK0H6K&$d8^ rnZuwlM1i}3)iG)U>wzW(;|q)op>{gY|JNUT#{dMLu6{1-oD!Mi!aEIbYji|YRV;yYq55s|=@A7`KTHp8d!py+M&c+}sDa2r@r@~;SqsSmBD!@Ppuz-j^fBrIj z`tzOP>Yb+yw;#S_Sh(p3gQ~m)LrRbxgQ}c3PR-0}W*!a50)PMgWBB*)Kf~+ypBeU_ zzQOR~^#=w6bvXu*>(BucK4<;<{fFV;*;@<+T}v5Wz57I(W`7X){pSzE^Vc64O8Zt~ z7(h@n2>kp1pW(x&Zwz%)*D?I~`I|J&Faut_`@%45&3@7~!@K|rgL_Y2ldc&Sh|^c@ zBTq93+O7>FOp)6Brgz;NQyUGg;l`t^t5~PT{{R2a@bSxchO2j;GHl#` z9_R{o1{GN`21Z6k49!fqO#SieH^a9dzZmANKg>|pzl!1G=YMa+YeqKY_QRJ9xgARw 
zzW@A1nr4`nK79Vp5LPmc;q%uYq-h2LP*^ZAF)}pG+z4iqq!|SM{rk`G;OQHN9Y-&d zt{D~v3pO4B`=4aZFc<7Sag999AltVexkR335P16f1NoXc*jUKd{QJ*8N;Lmr006wA WXt!t5b0+`*00{s|MNUMnLSTZsMuh|b diff --git a/docs/skin/images/rc-t-r-5-1header-2searchbox-3searchbox.png b/docs/skin/images/rc-t-r-5-1header-2searchbox-3searchbox.png index cbaea9c9a7e5fb4c19a7a1a7166a2078c9fd255e..944ed73333d834f3509955d65fecc604393bf626 100644 GIT binary patch delta 102 zcmV-s0Ga>K0oDPKcV9|LL_t(|+GBkBi!oPq1Fns#-1|bNO`0@QSsvyiL0MqgxB>9U}@c;k-07*qo IM6N<$g6CK)ssI20 delta 95 zcmcb{c%E@Wd4#j4i(^Q|t>iynUcRkoQ%FdD%#h^A=H}+cHtqcUdkse#WYpNi#LU

      |{e63hIGg?50xr{z2(U0RwDt=fJ}Kc-#Q+4Ju6{1-oD!MHjal$Hib{pvDmIXU)LD&dR{V00bXCePh^u z=_o=FCUNA#EmT36Q2<>K9n7(PR;~a5002ov JPDHLkV1io0F311? delta 107 zcmV-x0F?jN0owtPcVtgVL_t(|+GAY3>(qY+AW&D3WYAWYW?*7sWMEmk^*Bh3ft7`Y zL0(Fjp*=r@f#uKNe_+8se}6IDxc{6Xu4*;|69XFf`r{|2AQK}a0{{q16th7(OAi15 N002ovPDHLkV1gF^E3g0n diff --git a/docs/skin/images/rc-t-r-5-1header-2tab-unselected-3tab-unselected.png b/docs/skin/images/rc-t-r-5-1header-2tab-unselected-3tab-unselected.png index cbaea9c9a7e5fb4c19a7a1a7166a2078c9fd255e..944ed73333d834f3509955d65fecc604393bf626 100644 GIT binary patch delta 102 zcmV-s0Ga>K0oDPKcV9|LL_t(|+GBkBi!oPq1Fns#-1|bNO`0@QSsvyiL0MqgxB>9U}@c;k-07*qo IM6N<$g6CK)ssI20 delta 95 zcmcb{c%E@Wd4#j4i(^Q|t>iynUcRkoQ%FdD%#h^A=H}+cHtqcUdkse#WYpNi#LU

      |{e63hIGg?50xr{z2(U0RwDt=fJ}Kc-#Q+4Ju6{1-oD!Mvi or your editor of choice and let's take a look at

      As we discussed in the previous walk-through, the IndexFiles class creates a Lucene +href="api/core/org/apache/lucene/demo/IndexFiles.html">IndexFiles class creates a Lucene Index. Let's take a look at how it does this.

      The first substantial thing the main function does is instantiate IndexWriter. It passes the string +href="api/core/org/apache/lucene/index/IndexWriter.html">IndexWriter. It passes the string "index" and a new instance of a class called StandardAnalyzer. +href="api/core/org/apache/lucene/analysis/standard/StandardAnalyzer.html">StandardAnalyzer. The "index" string is the name of the filesystem directory where all index information should be stored. Because we're not passing a full path, this will be created as a subdirectory of the current working directory (if it does not already exist). On some platforms, it may be created @@ -55,45 +55,45 @@ in other directories (such as the user's home directory).

      -The IndexWriter is the main +The IndexWriter is the main class responsible for creating indices. To use it you must instantiate it with a path that it can write the index into. If this path does not exist it will first create it. Otherwise it will refresh the index at that path. You can also create an index using one of the subclasses of Directory. In any case, you must also pass an +href="api/core/org/apache/lucene/store/Directory.html">Directory. In any case, you must also pass an instance of org.apache.lucene.analysis.Analyzer. +href="api/core/org/apache/lucene/analysis/Analyzer.html">org.apache.lucene.analysis.Analyzer.

      -The particular Analyzer we +The particular Analyzer we are using, StandardAnalyzer, is +href="api/core/org/apache/lucene/analysis/standard/StandardAnalyzer.html">StandardAnalyzer, is little more than a standard Java Tokenizer, converting all strings to lowercase and filtering out -useless words and characters from the index. By useless words and characters I mean common language -words such as articles (a, an, the, etc.) and other strings that would be useless for searching +stop words and characters from the index. By stop words and characters I mean common language +words such as articles (a, an, the, etc.) and other strings that may have less value for searching (e.g. 's) . It should be noted that there are different rules for every language, and you should use the proper analyzer for each. Lucene currently provides Analyzers for a number of different languages (see the *Analyzer.java sources under contrib/analyzers/src/java/org/apache/lucene/analysis). +href="http://svn.apache.org/repos/asf/lucene/java/trunk/contrib/analyzers/common/src/java/org/apache/lucene/analysis/">contrib/analyzers/src/java/org/apache/lucene/analysis).
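What the analyzer does conceptually — split into tokens, lowercase, drop stop words — can be sketched in plain Java (a toy model only; the real StandardAnalyzer is a Tokenizer/TokenFilter pipeline with far better Unicode and language handling, and this stop-word list is illustrative):

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Sketch of a StandardAnalyzer-style analysis chain:
// tokenize, lowercase, and filter out stop words.
class AnalyzerSketch {

    static final Set<String> STOP_WORDS =
        new HashSet<>(Arrays.asList("a", "an", "and", "the", "of", "to"));

    static List<String> analyze(String text) {
        List<String> tokens = new ArrayList<>();
        for (String raw : text.split("\\W+")) {   // crude tokenizer
            String t = raw.toLowerCase();
            if (!t.isEmpty() && !STOP_WORDS.contains(t)) {
                tokens.add(t);
            }
        }
        return tokens;
    }

    public static void main(String[] args) {
        System.out.println(analyze("The Quick Brown Fox and the Lazy Dog"));
    }
}
```

The common-language words are dropped and everything is lowercased, so "The" and "the" never reach the index — the behavior the paragraph above attributes to StandardAnalyzer.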

      Looking further down in the file, you should see the indexDocs() code. This recursive function simply crawls the directories and uses FileDocument to create Document objects. The Document is simply a data object to +href="api/core/org/apache/lucene/demo/FileDocument.html">FileDocument to create Document objects. The Document is simply a data object to represent the content in the file as well as its creation time and location. These instances are added to the indexWriter. Take a look inside FileDocument. It's not particularly +href="api/core/org/apache/lucene/demo/FileDocument.html">FileDocument. It's not particularly complicated. It just adds fields to the Document. +href="api/core/org/apache/lucene/document/Document.html">Document.

      As you can see there isn't much to creating an index. The devil is in the details. You may also wish to examine the other samples in this directory, particularly the IndexHTML class. It is a bit more +href="api/core/org/apache/lucene/demo/IndexHTML.html">IndexHTML class. It is a bit more complex but builds upon this example.

      @@ -102,28 +102,28 @@ complex but builds upon this example.
      Searching Files

      The SearchFiles class is quite simple. It primarily collaborates with an <a href="api/core/org/apache/lucene/search/IndexSearcher.html">IndexSearcher</a>, a StandardAnalyzer (which is used in the <a href="api/core/org/apache/lucene/demo/IndexFiles.html">IndexFiles</a> class as well) and a <a href="api/core/org/apache/lucene/queryParser/QueryParser.html">QueryParser</a>. The query parser is constructed with an analyzer used to interpret your query text in the same way the documents are interpreted: finding the end of words and removing useless words like 'a', 'an' and 'the'. The <a href="api/core/org/apache/lucene/search/Query.html">Query</a> object contains the results from the QueryParser and is passed to the searcher. Note that it is also possible to programmatically construct a rich <a href="api/core/org/apache/lucene/search/Query.html">Query</a> object without using the query parser; the query parser just decodes the Lucene query syntax into the corresponding Query object. A search can be executed in two different ways:

      • Streaming: A HitCollector subclass simply prints out the document ID and score for each matching document.
      • Paging: Using a TopDocCollector, the search results are printed in pages, sorted by score (i.e. relevance).
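The paging mode above boils down to keeping the best N hits in a bounded priority queue, evicting the weakest hit first. The sketch below is a self-contained analogue of what a TopDocCollector-style collector does; the class and method names are illustrative stand-ins, not Lucene's API.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.PriorityQueue;

// Illustrative sketch only: collects the top-N scoring documents, the
// way a paging collector (e.g. TopDocCollector) conceptually works.
public class TopNCollectorSketch {
    private final int n;
    // Min-heap on score: the lowest-scoring retained hit sits at the head.
    private final PriorityQueue<float[]> queue; // each entry: {docId, score}

    public TopNCollectorSketch(int n) {
        this.n = n;
        this.queue = new PriorityQueue<>(n, Comparator.comparingDouble((float[] e) -> e[1]));
    }

    // Called once per matching document, in docId order.
    public void collect(int docId, float score) {
        queue.offer(new float[] { docId, score });
        if (queue.size() > n) {
            queue.poll(); // evict the current weakest hit
        }
    }

    // Drain the queue into a best-first list (one "page" of results).
    public List<float[]> topDocs() {
        List<float[]> hits = new ArrayList<>(queue);
        hits.sort((a, b) -> Float.compare(b[1], a[1])); // descending score
        return hits;
    }

    public static void main(String[] args) {
        TopNCollectorSketch c = new TopNCollectorSketch(2);
        c.collect(0, 0.3f);
        c.collect(1, 0.9f);
        c.collect(2, 0.5f);
        for (float[] hit : c.topDocs()) {
            System.out.println((int) hit[0] + " " + hit[1]);
        }
    }
}
```

Because only the queue's contents survive, memory stays proportional to the page size rather than to the total hit count.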

      diff --git a/src/site/src/documentation/content/xdocs/scoring.xml b/src/site/src/documentation/content/xdocs/scoring.xml index dac9a015d80..f03c264ca7a 100644 --- a/src/site/src/documentation/content/xdocs/scoring.xml +++ b/src/site/src/documentation/content/xdocs/scoring.xml @@ -34,10 +34,10 @@ Lucene Wiki IR references.

      The rest of this document will cover the scoring basics and how to change your Similarity. Next, it will cover ways you can customize the Lucene internals in Changing your Scoring -- Expert Level, which gives details on implementing your own Query class and related functionality. Finally, we will finish up with some reference material in the Appendix.

      and the Lucene file formats before continuing on with this section.) It is also assumed that readers know how to use the Searcher.explain(Query query, int doc) functionality, which can go a long way in explaining why a score is returned.

      Fields and Documents

      In Lucene, the objects we are scoring are Documents. A Document is a collection of Fields. Each Field has semantics about how it is created and stored (i.e. tokenized, untokenized, raw data, compressed, etc.). It is important to note that Lucene scoring works on Fields and then combines the results to return Documents. This matters because two Documents with exactly the same content, but with one having that content in two Fields and the other in a single Field, will return different scores for the same query, due to length normalization (assuming the DefaultSimilarity is used on the Fields).
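To make the length-normalization point concrete, here is the arithmetic only, assuming the DefaultSimilarity's documented field-length norm of 1/sqrt(numTerms); the class below is a hypothetical sketch, not Lucene code.

```java
// Sketch of DefaultSimilarity-style length normalization:
// lengthNorm(field) = 1 / sqrt(number of terms in the field).
public class LengthNormSketch {

    static float lengthNorm(int numTerms) {
        return (float) (1.0 / Math.sqrt(numTerms));
    }

    public static void main(String[] args) {
        // Same four terms: one Document keeps them in a single field...
        float oneField = lengthNorm(4);   // 1/sqrt(4) = 0.5
        // ...the other splits them two-and-two across two fields.
        float twoFields = lengthNorm(2);  // 1/sqrt(2), per field
        System.out.println(oneField + " vs " + twoFields);
        // The per-field norm differs, so the two Documents score
        // differently for the same query despite identical content.
    }
}
```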



      This composition of the 1-byte representation of norms (that is, the indexing-time multiplication of field boosts, doc boost and field-length norm) is nicely described in Fieldable.setBoost().

      Encoding and decoding of the resulting float norm in a single byte are done by the static methods of the class Similarity: encodeNorm() and decodeNorm(). Due to loss of precision, it is not guaranteed that decode(encode(x)) = x; e.g. decode(encode(0.89)) = 0.75. At scoring (search) time, this norm is brought into the score of the document as norm(t, d), as shown by the formula in Similarity.
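The precision loss comes from squeezing a float into one byte. Below is a self-contained sketch of a 3-bit-mantissa, 5-bit-exponent single-byte encoding in the style of Lucene's SmallFloat; the exact constants in any given release may differ, so treat the decoded values as illustrative rather than as Lucene's exact output.

```java
// Sketch of a one-byte float encoding with a 3-bit mantissa and 5-bit
// exponent, in the style of Lucene's SmallFloat/Similarity norm encoding.
public class NormEncodingSketch {

    static byte encodeNorm(float f) {
        int bits = Float.floatToRawIntBits(f);
        int smallfloat = bits >> (24 - 3);              // keep 3 mantissa bits
        if (smallfloat <= ((63 - 15) << 3)) {
            return (bits <= 0) ? (byte) 0 : (byte) 1;   // underflow
        }
        if (smallfloat >= ((63 - 15) << 3) + 0x100) {
            return -1;                                  // overflow
        }
        return (byte) (smallfloat - ((63 - 15) << 3));
    }

    static float decodeNorm(byte b) {
        if (b == 0) return 0.0f;
        int bits = (b & 0xff) << (24 - 3);
        bits += (63 - 15) << 24;                        // re-bias the exponent
        return Float.intBitsToFloat(bits);
    }

    public static void main(String[] args) {
        // The round trip truncates: the decoded value is at most the
        // original, so decode(encode(x)) is generally not x.
        float decoded = decodeNorm(encodeNorm(0.89f));
        System.out.println("0.89 -> " + decoded);
    }
}
```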

      Understanding the Scoring Formula

      This scoring formula is described in the Similarity class. Please take the time to study it, as it contains much of the information about how the basics of Lucene scoring work, especially for the TermQuery.
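As a reading aid for that formula, the individual factors of the DefaultSimilarity can be sketched in plain Java. These follow the documented 2.x-era forms (tf as a square root, idf as a dampened log, coord as a matching-clause ratio); the class and method names below are illustrative, not Lucene's.

```java
// Sketch of the per-factor arithmetic behind the Similarity scoring
// formula, following DefaultSimilarity's documented forms.
public class SimilarityFactorsSketch {

    // Term frequency factor: damped by a square root.
    static float tf(float freq) {
        return (float) Math.sqrt(freq);
    }

    // Inverse document frequency: rare terms score higher.
    static float idf(int docFreq, int numDocs) {
        return (float) (Math.log((double) numDocs / (docFreq + 1)) + 1.0);
    }

    // Coordination factor: rewards matching more of the query's clauses.
    static float coord(int overlap, int maxOverlap) {
        return overlap / (float) maxOverlap;
    }

    public static void main(String[] args) {
        System.out.println(tf(4.0f));    // a term occurring 4 times
        System.out.println(coord(2, 2)); // all clauses matched
        // A term appearing in 1 of 100 docs outweighs one in 50 of 100.
        System.out.println(idf(1, 100) > idf(50, 100));
    }
}
```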

      The Big Picture

      OK, so the tf-idf formula and Similarity are great for understanding the basics of Lucene scoring, but what really drives Lucene scoring are the use of, and interactions between, the Query classes, as created by each application in response to a user's information need.

      In this regard, Lucene offers a wide variety of Query implementations, most of which are in the org.apache.lucene.search package. These implementations can be combined in a wide variety of ways to provide complex querying capabilities along with information about where matches took place in the document collection. The Query section below highlights some of the more important Query classes. For information on the other ones, see the package summary. For details on implementing your own Query class, see Changing your Scoring -- Expert Level below.

      Once a Query has been created and submitted to the IndexSearcher, the scoring process begins. (See the Appendix Algorithm section for more notes on the process.) After some infrastructure setup, control finally passes to the Weight implementation and its Scorer instance. In the case of any type of BooleanQuery, scoring is handled by the BooleanWeight2 (link goes to ViewVC BooleanQuery java code, which contains the BooleanWeight2 inner class), unless Weight#scoresDocsOutOfOrder() returns true, in which case the BooleanWeight (link goes to ViewVC BooleanQuery java code, which contains the BooleanWeight inner class) from the 1.4 version of Lucene is used. See CHANGES.txt under release 1.9 RC1 for more information on choosing which Scorer to use.

      Assuming the use of the BooleanWeight2, a BooleanScorer2 is created by bringing together all of the Scorers from the sub-clauses of the BooleanQuery. When the BooleanScorer2 is asked to score, it delegates its work to an internal Scorer based on the type of clauses in the Query. This internal Scorer essentially loops over the sub-scorers and sums the scores provided by each scorer, while factoring in the coord() score.
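The sum-plus-coord() behavior just described can be mimicked with a toy model (hypothetical names, not Lucene classes): sum the scores of the matching sub-scorers, then scale by the fraction of clauses that matched.

```java
// Toy model of how a BooleanScorer2-style scorer combines sub-clause
// scores for one document: sum the scores of the matching sub-scorers,
// then scale by coord(matchingClauses, totalClauses).
public class BooleanScoringSketch {

    static float combine(float[] matchingSubScores, int totalClauses) {
        float sum = 0f;
        for (float s : matchingSubScores) {
            sum += s;
        }
        float coord = matchingSubScores.length / (float) totalClauses;
        return sum * coord;
    }

    public static void main(String[] args) {
        // Both of two clauses match: full coord credit.
        System.out.println(combine(new float[] { 0.5f, 0.3f }, 2));
        // Only one of two clauses matches: the sum is halved by coord.
        System.out.println(combine(new float[] { 0.5f }, 2));
    }
}
```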

      Query Classes

      For information on the Query classes, refer to the search package javadocs.

      Changing Similarity

      One of the ways of changing the scoring characteristics of Lucene is to change the similarity factors. For information on how to do this, see the search package javadocs.

      Changing your Scoring -- Expert Level

      At a much deeper level, one can affect scoring by implementing one's own Query classes (and related scoring classes). To learn more about how to do this, refer to the search package javadocs.


      This section is mostly notes on stepping through the Scoring process and serves as fertilizer for the earlier sections.

      In the typical search application, a Query is passed to the Searcher, beginning the scoring process.

      Once inside the Searcher, a Collector is used for the scoring and sorting of the search results. These important objects are involved in a search:

      1. The Weight object of the Query. The Weight object is an internal representation of the Query that allows the Query to be reused by the Searcher.
      2. The Searcher that initiated the call.
      3. A Filter for limiting the result set. Note that the Filter may be null.
      4. A Sort object for specifying how to sort the results if the standard score-based sort method is not desired.
      5.

        Assuming we are not sorting (since sorting doesn't affect the raw Lucene score), we call one of the search methods of the Searcher, passing in the Weight object created by Searcher.createWeight(Query), the Filter and the number of results we want. This method returns a TopDocs object, which is an internal collection of search results. The Searcher creates a TopScoreDocCollector and passes it, along with the Weight and Filter, to another expert search method (for more on the Collector mechanism, see Searcher). The TopScoreDocCollector uses a PriorityQueue to collect the top results for the search.

        If a Filter is being used, some initial setup is done to determine which docs to include. Otherwise, we ask the Weight for a Scorer for the IndexReader of the current searcher, and we proceed by calling the score method on the Scorer.

        At last, we are actually going to score some documents. The score method takes in the Collector (most likely the TopScoreDocCollector or TopFieldCollector) and does its business. Of course, here is where things get involved. The Scorer that is returned by the Weight object depends on what type of Query was submitted. In most real-world applications with multiple query terms, the Scorer is going to be a BooleanScorer2 (see the section on customizing your scoring for information on changing this).
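The hand-off between the Scorer and the Collector can be sketched with minimal stand-in interfaces (hypothetical names; Lucene's real contracts carry more state, such as setting the scorer on the collector before collection begins).

```java
// Minimal stand-ins showing the shape of the score loop: the scorer
// advances over matching documents, and for each one the collector is
// handed the docId and score.
public class ScoreLoopSketch {

    interface SimpleScorer {
        boolean next();   // advance to the next matching doc
        int doc();        // current docId
        float score();    // score of the current doc
    }

    interface SimpleCollector {
        void collect(int doc, float score);
    }

    // The "does its business" part: drive the scorer, feed the collector.
    static void scoreAll(SimpleScorer scorer, SimpleCollector collector) {
        while (scorer.next()) {
            collector.collect(scorer.doc(), scorer.score());
        }
    }

    public static void main(String[] args) {
        // A fixed stream of (doc, score) hits stands in for a real scorer.
        final int[][] hits = { { 0, 9 }, { 3, 7 }, { 7, 5 } };
        SimpleScorer scorer = new SimpleScorer() {
            int i = -1;
            public boolean next() { return ++i < hits.length; }
            public int doc() { return hits[i][0]; }
            public float score() { return hits[i][1] / 10.0f; }
        };
        final StringBuilder out = new StringBuilder();
        SimpleCollector collector = (doc, score) -> out.append(doc).append(' ');
        scoreAll(scorer, collector);
        System.out.println(out.toString().trim()); // the matching docIds, in order
    }
}
```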