diff --git a/docs/demo2.html b/docs/demo2.html index f5cb1263cfe..dbbef389133 100644 --- a/docs/demo2.html +++ b/docs/demo2.html @@ -302,27 +302,27 @@ Bring it up in vi or your editor of choice and let

IndexFiles

-As we discussed in the previous walk-through, the IndexFiles class creates a Lucene +As we discussed in the previous walk-through, the IndexFiles class creates a Lucene Index. Let's take a look at how it does this.

-The first substantial thing the main function does is instantiate IndexWriter. It passes the string -"index" and a new instance of a class called StandardAnalyzer. +The first substantial thing the main function does is instantiate IndexWriter. It passes the string +"index" and a new instance of a class called StandardAnalyzer. The "index" string is the name of the filesystem directory where all index information should be stored. Because we're not passing a full path, this will be created as a subdirectory of the current working directory (if it does not already exist). On some platforms, it may be created in other directories (such as the user's home directory).

-The IndexWriter is the main +The IndexWriter is the main class responsible for creating indices. To use it you must instantiate it with a path that it can write the index into. If this path does not exist it will first create it. Otherwise it will -refresh the index at that path. You can also create an index using one of the subclasses of Directory. In any case, you must also pass an -instance of org.apache.lucene.analysis.Analyzer. +refresh the index at that path. You can also create an index using one of the subclasses of Directory. In any case, you must also pass an +instance of org.apache.lucene.analysis.Analyzer.

-The particular Analyzer we -are using, StandardAnalyzer, is +The particular Analyzer we +are using, StandardAnalyzer, is little more than a standard Java Tokenizer, converting all strings to lowercase and filtering out stop words and characters from the index. By stop words and characters I mean common language words such as articles (a, an, the, etc.) and other strings that may have less value for searching @@ -332,42 +332,42 @@ different languages (see the *Analyzer.java source

Looking further down in the file, you should see the indexDocs() code. This recursive -function simply crawls the directories and uses FileDocument to create Document objects. The Document is simply a data object to +function simply crawls the directories and uses FileDocument to create Document objects. The Document is simply a data object to represent the content in the file as well as its creation time and location. These instances are -added to the indexWriter. Take a look inside FileDocument. It's not particularly -complicated. It just adds fields to the Document. +added to the indexWriter. Take a look inside FileDocument. It's not particularly +complicated. It just adds fields to the Document.
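The recursive walk described above can be sketched in plain Java. This is a minimal illustration, not the demo's actual source: `CrawlSketch` and `collectFiles` are hypothetical names, and where the demo would build a Document and add it to the writer, the sketch only records the file. It also uses the last-modified time as a stand-in for the timestamp the walkthrough mentions.

```java
import java.io.File;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/** Hypothetical stand-in for a recursive indexDocs()-style walk; not Lucene code. */
class CrawlSketch {

    /** Collects every readable regular file under root, descending into subdirectories. */
    static List<File> collectFiles(File root) {
        List<File> found = new ArrayList<>();
        walk(root, found);
        return found;
    }

    private static void walk(File file, List<File> found) {
        if (!file.canRead()) {
            return; // skip entries we are not allowed to read
        }
        if (file.isDirectory()) {
            File[] children = file.listFiles();
            if (children == null) {
                return; // listing failed (I/O error)
            }
            Arrays.sort(children); // deterministic visiting order
            for (File child : children) {
                walk(child, found); // recurse into each entry
            }
        } else {
            // Assumption: a real indexer would turn the file into a document here.
            found.add(file);
        }
    }

    public static void main(String[] args) {
        File root = new File(args.length > 0 ? args[0] : ".");
        for (File f : collectFiles(root)) {
            System.out.println(f.getPath() + " (modified " + f.lastModified() + ")");
        }
    }
}
```

Sorting the children is a choice made for this sketch so that repeated runs visit files in the same order; the walkthrough does not say whether the demo does this.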

As you can see there isn't much to creating an index. The devil is in the details. You may also -wish to examine the other samples in this directory, particularly the IndexHTML class. It is a bit more +wish to examine the other samples in this directory, particularly the IndexHTML class. It is a bit more complex but builds upon this example.

- +

Searching Files

-The SearchFiles class is -quite simple. It primarily collaborates with an IndexSearcher, StandardAnalyzer -(which is used in the IndexFiles class as well) and a -QueryParser. The +The SearchFiles class is +quite simple. It primarily collaborates with an IndexSearcher, StandardAnalyzer +(which is used in the IndexFiles class as well) and a +QueryParser. The query parser is constructed with an analyzer used to interpret your query text in the same way the documents are interpreted: finding the end of words and removing useless words like 'a', 'an' and -'the'. The Query object contains -the results from the QueryParser which is passed to -the searcher. Note that it's also possible to programmatically construct a rich Query object without using the query +'the'. The Query object contains +the results from the QueryParser which is passed to +the searcher. Note that it's also possible to programmatically construct a rich Query object without using the query parser. The query parser just enables decoding the Lucene query -syntax into the corresponding Query object. Search can be executed in +syntax into the corresponding Query object. Search can be executed in two different ways:

@@ -375,7 +375,7 @@ the search results are printed in pages, sorted by score (i. e. relevance).
- +

The Web example...

diff --git a/docs/demo2.pdf b/docs/demo2.pdf index 006a280c7c6..7fa39bfcdc9 100644
Binary files a/docs/demo2.pdf and b/docs/demo2.pdf differ
diff --git a/docs/demo4.html b/docs/demo4.html index 00cd264e569..b4239ba2e63 100644 --- a/docs/demo4.html +++ b/docs/demo4.html @@ -345,7 +345,7 @@ the jars included in the WEB-INF/lib directory in

You'll notice that this file includes the same header and footer as index.jsp. From -there it constructs an IndexSearcher with the +there it constructs an IndexSearcher with the indexLocation that was specified in configuration.jsp. If there is an error of any kind in opening the index, it is displayed to the user and the boolean flag error is set to tell the rest of the sections of the jsp not to continue. @@ -358,42 +358,42 @@ default value. If the criteria isn't provided then a servlet error is thrown (i this is the result of url tampering or some form of browser malfunction).

-The jsp moves on to construct a StandardAnalyzer to -analyze the search text. This matches the analyzer used during indexing (IndexHTML), which is generally -recommended. This is passed to the QueryParser along with the -criteria to construct a Query +The jsp moves on to construct a StandardAnalyzer to +analyze the search text. This matches the analyzer used during indexing (IndexHTML), which is generally +recommended. This is passed to the QueryParser along with the +criteria to construct a Query object. You'll also notice the string literal "contents" included. This specifies that the search should cover the contents field and not the title, url or some other field in the indexed documents. If there is any error in -constructing a Query object an +constructing a Query object an error is displayed to the user.
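Why the query-time analyzer should match the index-time one is easier to see with a toy example. The sketch below is not StandardAnalyzer and uses no Lucene classes: `ToyAnalyzer` is a hypothetical name, and it only lowercases, splits on non-alphanumeric characters, and drops a three-word stop list. Running both the document text and the query text through the same method yields terms that can be compared directly.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Locale;
import java.util.Set;

/** Toy analysis chain for illustration only; far simpler than any real Lucene analyzer. */
class ToyAnalyzer {
    // Assumption: a tiny stop list, just the articles the walkthrough mentions.
    private static final Set<String> STOP_WORDS =
            new HashSet<>(Arrays.asList("a", "an", "the"));

    /** Lowercases the text, splits on non-alphanumerics, and removes stop words. */
    static List<String> analyze(String text) {
        List<String> terms = new ArrayList<>();
        for (String token : text.toLowerCase(Locale.ROOT).split("[^a-z0-9]+")) {
            if (!token.isEmpty() && !STOP_WORDS.contains(token)) {
                terms.add(token);
            }
        }
        return terms;
    }

    public static void main(String[] args) {
        // Same method for "indexed" text and query text, so terms line up.
        System.out.println(analyze("The Quick, Brown Fox!"));
        System.out.println(analyze("the FOX"));
    }
}
```

If the query were analyzed differently (for example, without lowercasing), the term `FOX` would not match the indexed term `fox`, which is the mismatch the recommendation above is meant to avoid.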

-In the next section of the jsp the IndexSearcher is asked to search +In the next section of the jsp the IndexSearcher is asked to search given the query object. The results are returned in a collection called hits. If the length property of the hits collection is 0 (meaning there were no results) then an error is displayed to the user and the error flag is set.

Finally the jsp iterates through the hits collection, taking the current page into -account, and displays properties of the Document objects we talked about in +account, and displays properties of the Document objects we talked about in the first walkthrough. These objects contain "known" fields specific to their indexer (in this case -IndexHTML constructs a document +IndexHTML constructs a document with "url", "title" and "contents").
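Taking the current page into account comes down to a little index arithmetic. The following sketch assumes zero-based page numbers and non-negative inputs; `PagingSketch` and `pageBounds` are hypothetical names and do not come from the jsp.

```java
/** Illustrative page arithmetic; not taken from the demo's results page. */
class PagingSketch {

    /**
     * Returns {start, end} for a zero-based page, where start is inclusive and
     * end is exclusive. Both values are clamped to totalHits, so a page past
     * the end yields an empty range.
     */
    static int[] pageBounds(int totalHits, int page, int pageSize) {
        int start = Math.min(page * pageSize, totalHits);
        int end = Math.min(start + pageSize, totalHits);
        return new int[] { start, end };
    }

    public static void main(String[] args) {
        int[] bounds = pageBounds(25, 2, 10);
        // Iterate only over the hits that belong to the requested page.
        for (int i = bounds[0]; i < bounds[1]; i++) {
            System.out.println("would display hit #" + i);
        }
    }
}
```

Clamping both bounds means the display loop never has to check for a short last page separately.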

-Please note that in a real deployment of Lucene, it's best to instantiate IndexSearcher and QueryParser once, and then +Please note that in a real deployment of Lucene, it's best to instantiate IndexSearcher and QueryParser once, and then share them across search requests, instead of re-instantiating per search request.
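One way to follow this advice is to create the expensive object lazily, once, and hand the same instance to every request. The sketch below is generic Java, not Lucene or servlet code: `SharedResource` is a hypothetical holder, and the `Supplier` stands in for whatever actually opens the searcher. Whether a given object is safe to share across threads depends on that object and should be checked against its own documentation.

```java
import java.util.function.Supplier;

/** Hypothetical lazy holder: builds the value on first use and reuses it afterwards. */
class SharedResource<T> {
    private final Supplier<T> factory;
    private volatile T instance; // volatile so other threads see the published value

    SharedResource(Supplier<T> factory) {
        this.factory = factory;
    }

    /** Returns the shared instance, creating it at most once (double-checked locking). */
    T get() {
        T local = instance;
        if (local == null) {
            synchronized (this) {
                local = instance;
                if (local == null) {
                    local = factory.get();
                    instance = local;
                }
            }
        }
        return local;
    }
}

class SharedResourceDemo {
    public static void main(String[] args) {
        // Assumption: the string is a placeholder for an expensive-to-open searcher.
        SharedResource<String> searcher =
                new SharedResource<>(() -> "opened once at " + System.nanoTime());
        System.out.println(searcher.get());
        System.out.println(searcher.get()); // same value, factory not called again
    }
}
```

The holder assumes the factory never returns null; a null result would simply cause the factory to run again on the next call.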

- +

More sources (developers)

There are additional sources used by the web app that were not specifically covered by either -walkthrough. For example the HTML parser, the IndexHTML class and HTMLDocument class. These are very +walkthrough. For example the HTML parser, the IndexHTML class and HTMLDocument class. These are very similar to the classes covered in the first example, with properties specific to parsing and indexing HTML. This is beyond our scope; however, by now you should feel like you're "getting started" with Lucene. @@ -401,7 +401,7 @@ started" with Lucene.

- +

Where to go from here? (everyone!)

@@ -423,7 +423,7 @@ Users' or Developers' +

When to contact the Author

diff --git a/docs/demo4.pdf b/docs/demo4.pdf index ec044415fed..29eb9975317 100644
Binary files a/docs/demo4.pdf and b/docs/demo4.pdf differ
diff --git a/src/site/src/documentation/content/xdocs/demo2.xml b/src/site/src/documentation/content/xdocs/demo2.xml index c4634ff9b84..7f2e780f94b 100644 --- a/src/site/src/documentation/content/xdocs/demo2.xml +++ b/src/site/src/documentation/content/xdocs/demo2.xml @@ -38,16 +38,16 @@ Bring it up in vi or your editor of choice
and let's take a look at

IndexFiles

-As we discussed in the previous walk-through, the IndexFiles class creates a Lucene +As we discussed in the previous walk-through, the IndexFiles class creates a Lucene Index. Let's take a look at how it does this.

-The first substantial thing the main function does is instantiate IndexWriter. It passes the string -"index" and a new instance of a class called StandardAnalyzer. +The first substantial thing the main function does is instantiate IndexWriter. It passes the string +"index" and a new instance of a class called StandardAnalyzer. The "index" string is the name of the filesystem directory where all index information should be stored. Because we're not passing a full path, this will be created as a subdirectory of the current working directory (if it does not already exist). On some platforms, it may be created @@ -55,19 +55,19 @@ in other directories (such as the user's home directory).

-The IndexWriter is the main +The IndexWriter is the main class responsible for creating indices. To use it you must instantiate it with a path that it can write the index into. If this path does not exist it will first create it. Otherwise it will -refresh the index at that path. You can also create an index using one of the subclasses of Directory. In any case, you must also pass an -instance of org.apache.lucene.analysis.Analyzer. +refresh the index at that path. You can also create an index using one of the subclasses of Directory. In any case, you must also pass an +instance of org.apache.lucene.analysis.Analyzer.

-The particular Analyzer we -are using, StandardAnalyzer, is +The particular Analyzer we +are using, StandardAnalyzer, is little more than a standard Java Tokenizer, converting all strings to lowercase and filtering out stop words and characters from the index. By stop words and characters I mean common language words such as articles (a, an, the, etc.) and other strings that may have less value for searching @@ -79,21 +79,21 @@ href="http://svn.apache.org/repos/asf/lucene/java/trunk/contrib/analyzers/common

Looking further down in the file, you should see the indexDocs() code. This recursive -function simply crawls the directories and uses FileDocument to create Document objects. The Document is simply a data object to +function simply crawls the directories and uses FileDocument to create Document objects. The Document is simply a data object to represent the content in the file as well as its creation time and location. These instances are -added to the indexWriter. Take a look inside FileDocument. It's not particularly -complicated. It just adds fields to the Document. +added to the indexWriter. Take a look inside FileDocument. It's not particularly +complicated. It just adds fields to the Document.

As you can see there isn't much to creating an index. The devil is in the details. You may also -wish to examine the other samples in this directory, particularly the IndexHTML class. It is a bit more +wish to examine the other samples in this directory, particularly the IndexHTML class. It is a bit more complex but builds upon this example.

@@ -102,29 +102,29 @@ complex but builds upon this example.
Searching Files

-The SearchFiles class is -quite simple. It primarily collaborates with an IndexSearcher, StandardAnalyzer -(which is used in the IndexFiles class as well) and a -QueryParser. The +The SearchFiles class is +quite simple. It primarily collaborates with an IndexSearcher, StandardAnalyzer +(which is used in the IndexFiles class as well) and a +QueryParser. The query parser is constructed with an analyzer used to interpret your query text in the same way the documents are interpreted: finding the end of words and removing useless words like 'a', 'an' and -'the'. The Query object contains -the results from the QueryParser which is passed to -the searcher. Note that it's also possible to programmatically construct a rich Query object without using the query +'the'. The Query object contains +the results from the QueryParser which is passed to +the searcher. Note that it's also possible to programmatically construct a rich Query object without using the query parser. The query parser just enables decoding the Lucene query -syntax into the corresponding Query object. Search can be executed in +syntax into the corresponding Query object. Search can be executed in two different ways:

  • -Streaming: A HitCollector subclass +Streaming: A Collector subclass simply prints out the document ID and score for each matching document.
  • -Paging: Using a TopDocCollector -the search results are printed in pages, sorted by score (i. e. relevance). +Paging: Using a TopScoreDocCollector + the search results are printed in pages, sorted by score (i. e. relevance).
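The difference between the two modes can be sketched without any Lucene classes. In the hedged example below, `CollectSketch`, `stream` and `topN` are hypothetical names, the array index plays the role of the document ID, and a score of zero or less is taken to mean "no match". Streaming invokes a callback for every match as it is seen; the top-N variant keeps only the best few in a bounded min-heap and returns them sorted by score.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.PriorityQueue;
import java.util.function.BiConsumer;

/** Illustration of "callback per hit" versus "keep the N best"; not the Lucene collector API. */
class CollectSketch {

    /** Streaming: reports every matching document to the callback as it is encountered. */
    static void stream(float[] scores, BiConsumer<Integer, Float> onHit) {
        for (int doc = 0; doc < scores.length; doc++) {
            if (scores[doc] > 0f) {
                onHit.accept(doc, scores[doc]);
            }
        }
    }

    /** Top-N: returns up to n matching document IDs, highest score first. */
    static List<Integer> topN(float[] scores, int n) {
        Comparator<Integer> byScore = Comparator.comparingDouble((Integer d) -> scores[d]);
        // Min-heap on score: the weakest of the current best sits at the head.
        PriorityQueue<Integer> heap = new PriorityQueue<>(byScore);
        for (int doc = 0; doc < scores.length; doc++) {
            if (scores[doc] <= 0f) {
                continue; // not a match
            }
            heap.add(doc);
            if (heap.size() > n) {
                heap.poll(); // evict the lowest-scoring candidate
            }
        }
        List<Integer> best = new ArrayList<>(heap);
        best.sort(byScore.reversed()); // present by descending score
        return best;
    }

    public static void main(String[] args) {
        float[] scores = { 0.5f, 0f, 2.0f, 1.0f };
        stream(scores, (doc, score) -> System.out.println("doc=" + doc + " score=" + score));
        System.out.println("top 2: " + topN(scores, 2));
    }
}
```

The bounded heap keeps memory proportional to the page of results wanted rather than to the number of matches, which is the usual reason to prefer a top-N approach for paged output.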

diff --git a/src/site/src/documentation/content/xdocs/demo4.xml b/src/site/src/documentation/content/xdocs/demo4.xml index 5ed6389a4af..edd84deeceb 100644 --- a/src/site/src/documentation/content/xdocs/demo4.xml +++ b/src/site/src/documentation/content/xdocs/demo4.xml @@ -62,8 +62,8 @@ the jars included in the WEB-INF/lib directory in the lucenew

You'll notice that this file includes the same header and footer as index.jsp. From -there it constructs an IndexSearcher with the +there it constructs an IndexSearcher with the indexLocation that was specified in configuration.jsp. If there is an error of any kind in opening the index, it is displayed to the user and the boolean flag error is set to tell the rest of the sections of the jsp not to continue. @@ -76,38 +76,38 @@ default value. If the criteria isn't provided then a servlet error is thrown (i this is the result of url tampering or some form of browser malfunction).

-The jsp moves on to construct a StandardAnalyzer to -analyze the search text. This matches the analyzer used during indexing (IndexHTML), which is generally -recommended. This is passed to the QueryParser along with the -criteria to construct a Query +The jsp moves on to construct a StandardAnalyzer to +analyze the search text. This matches the analyzer used during indexing (IndexHTML), which is generally +recommended. This is passed to the QueryParser along with the +criteria to construct a Query object. You'll also notice the string literal "contents" included. This specifies that the search should cover the contents field and not the title, url or some other field in the indexed documents. If there is any error in -constructing a Query object an +constructing a Query object an error is displayed to the user.

-In the next section of the jsp the IndexSearcher is asked to search +In the next section of the jsp the IndexSearcher is asked to search given the query object. The results are returned in a collection called hits. If the length property of the hits collection is 0 (meaning there were no results) then an error is displayed to the user and the error flag is set.

Finally the jsp iterates through the hits collection, taking the current page into -account, and displays properties of the Document objects we talked about in +account, and displays properties of the Document objects we talked about in the first walkthrough. These objects contain "known" fields specific to their indexer (in this case -IndexHTML constructs a document +IndexHTML constructs a document with "url", "title" and "contents").

-Please note that in a real deployment of Lucene, it's best to instantiate IndexSearcher and QueryParser once, and then +Please note that in a real deployment of Lucene, it's best to instantiate IndexSearcher and QueryParser once, and then share them across search requests, instead of re-instantiating per search request.

@@ -115,9 +115,9 @@ share them across search requests, instead of re-instantiating per search reques
More sources (developers)

There are additional sources used by the web app that were not specifically covered by either -walkthrough. For example the HTML parser, the IndexHTML class and HTMLDocument class. These are very +walkthrough. For example the HTML parser, the IndexHTML class and HTMLDocument class. These are very similar to the classes covered in the first example, with properties specific to parsing and indexing HTML. This is beyond our scope; however, by now you should feel like you're "getting started" with Lucene.