diff --git a/docs/demo2.html b/docs/demo2.html
index f5cb1263cfe..dbbef389133 100644
--- a/docs/demo2.html
+++ b/docs/demo2.html
@@ -302,27 +302,27 @@ Bring it up in vi or your editor of choice and let
-As we discussed in the previous walk-through, the IndexFiles class creates a Lucene
+As we discussed in the previous walk-through, the IndexFiles class creates a Lucene
Index. Let's take a look at how it does this.
-The first substantial thing the main function does is instantiate IndexWriter. It passes the string
-"index" and a new instance of a class called StandardAnalyzer.
+The first substantial thing the main function does is instantiate IndexWriter. It passes the string
+"index" and a new instance of a class called StandardAnalyzer.
The "index" string is the name of the filesystem directory where all index information
should be stored. Because we're not passing a full path, this will be created as a subdirectory of
the current working directory (if it does not already exist). On some platforms, it may be created
in other directories (such as the user's home directory).
-The IndexWriter is the main
+The IndexWriter is the main
class responsible for creating indices. To use it you must instantiate it with a path that it can
write the index into. If this path does not exist it will first create it. Otherwise it will
-refresh the index at that path. You can also create an index using one of the subclasses of Directory. In any case, you must also pass an
-instance of org.apache.lucene.analysis.Analyzer.
+refresh the index at that path. You can also create an index using one of the subclasses of Directory. In any case, you must also pass an
+instance of org.apache.lucene.analysis.Analyzer.
-The particular Analyzer we
-are using, StandardAnalyzer, is
+The particular Analyzer we
+are using, StandardAnalyzer, is
little more than a standard Java Tokenizer, converting all strings to lowercase and filtering out
stop words and characters from the index. By stop words and characters I mean common language
words such as articles (a, an, the, etc.) and other strings that may have less value for searching
@@ -332,42 +332,42 @@ different languages (see the *Analyzer.java source
Looking further down in the file, you should see the indexDocs() code. This recursive
-function simply crawls the directories and uses FileDocument to create Document objects. The Document is simply a data object to
+function simply crawls the directories and uses FileDocument to create Document objects. The Document is simply a data object to
represent the content in the file as well as its creation time and location. These instances are
-added to the indexWriter. Take a look inside FileDocument. It's not particularly
-complicated. It just adds fields to the Document.
+added to the indexWriter. Take a look inside FileDocument. It's not particularly
+complicated. It just adds fields to the Document.
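The recursive crawl described above can be sketched without Lucene on the classpath. This is only an illustration under stated assumptions: `CrawlSketch` and `FileRecord` are hypothetical names, `FileRecord` stands in for a Document, and the file's last-modified time is used as the timestamp. It is not the demo's actual indexDocs() source.

```java
import java.io.File;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

class CrawlSketch {
    // Hypothetical stand-in for a Document: records where a file lives and a timestamp.
    static final class FileRecord {
        final String path;
        final long lastModified;

        FileRecord(String path, long lastModified) {
            this.path = path;
            this.lastModified = lastModified;
        }
    }

    // Recursively walks 'file'; every readable regular file becomes one record in 'sink'.
    static void indexDocs(List<FileRecord> sink, File file) {
        if (!file.canRead()) {
            return; // skip missing or unreadable entries
        }
        if (file.isDirectory()) {
            String[] names = file.list();
            if (names != null) { // list() returns null on an I/O error
                Arrays.sort(names); // deterministic visiting order
                for (String name : names) {
                    indexDocs(sink, new File(file, name));
                }
            }
        } else {
            sink.add(new FileRecord(file.getPath(), file.lastModified()));
        }
    }

    public static void main(String[] args) {
        File root = new File(args.length > 0 ? args[0] : ".");
        List<FileRecord> sink = new ArrayList<>();
        indexDocs(sink, root);
        System.out.println("collected " + sink.size() + " file records under " + root);
    }
}
```

In the real demo the list would be replaced by the index writer, and each record by a Document with its fields filled in.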
As you can see there isn't much to creating an index. The devil is in the details. You may also
-wish to examine the other samples in this directory, particularly the IndexHTML class. It is a bit more
+wish to examine the other samples in this directory, particularly the IndexHTML class. It is a bit more
complex but builds upon this example.
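To make the analyzer behaviour described earlier concrete (lowercasing and dropping stop words), here is a minimal plain-Java sketch. `AnalyzerSketch`, `analyze` and the three-word stop list are assumptions made for illustration. The real StandardAnalyzer has its own tokenization rules and stop-word list.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Locale;
import java.util.Set;

class AnalyzerSketch {
    // Illustrative stop-word list only; not the list that ships with Lucene.
    static final Set<String> STOP_WORDS = new HashSet<>(Arrays.asList("a", "an", "the"));

    // Splits on anything that is not a letter or digit, lowercases, drops stop words.
    static List<String> analyze(String text) {
        List<String> tokens = new ArrayList<>();
        for (String raw : text.split("[^\\p{L}\\p{Nd}]+")) {
            if (raw.isEmpty()) {
                continue; // a leading separator produces one empty string
            }
            String token = raw.toLowerCase(Locale.ROOT);
            if (!STOP_WORDS.contains(token)) {
                tokens.add(token);
            }
        }
        return tokens;
    }

    public static void main(String[] args) {
        // prints [quick, fox, jumped, over, lazy, dog]
        System.out.println(analyze("The quick fox jumped over a lazy dog"));
    }
}
```

Running the same function over both the indexed text and the query text is what makes a search for "The Fox" match a document containing "fox".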
-The SearchFiles class is
-quite simple. It primarily collaborates with an IndexSearcher, StandardAnalyzer
-(which is used in the IndexFiles class as well) and a
-QueryParser. The
+The SearchFiles class is
+quite simple. It primarily collaborates with an IndexSearcher, StandardAnalyzer
+(which is used in the IndexFiles class as well) and a
+QueryParser. The
query parser is constructed with an analyzer used to interpret your query text in the same way the
documents are interpreted: finding the end of words and removing useless words like 'a', 'an' and
-'the'. The Query object contains
-the results from the QueryParser which is passed to
-the searcher. Note that it's also possible to programmatically construct a rich Query object without using the query
+'the'. The Query object contains
+the results from the QueryParser which is passed to
+the searcher. Note that it's also possible to programmatically construct a rich Query object without using the query
parser. The query parser just enables decoding the Lucene query
-syntax into the corresponding Query object. Search can be executed in
+syntax into the corresponding Query object. Search can be executed in
two different ways:
diff --git a/docs/demo2.pdf b/docs/demo2.pdf
index 006a280c7c6..7fa39bfcdc9 100644
--- a/docs/demo2.pdf
+++ b/docs/demo2.pdf
@@ -80,10 +80,10 @@ endobj
>>
endobj
18 0 obj
-<< /Length 2732 /Filter [ /ASCII85Decode /FlateDecode ]
+<< /Length 2590 /Filter [ /ASCII85Decode /FlateDecode ]
>>
stream
You'll notice that this file includes the same header and footer as index.jsp. From
-there it constructs an IndexSearcher with the
+there it constructs an IndexSearcher with the
indexLocation that was specified in configuration.jsp. If there is an
error of any kind in opening the index, it is displayed to the user and the boolean flag
error is set to tell the rest of the sections of the jsp not to continue.
@@ -358,42 +358,42 @@ default value. If the criteria isn't provided then a servlet error is thrown (i
this is the result of url tampering or some form of browser malfunction).
-The jsp moves on to construct a StandardAnalyzer to
-analyze the search text. This matches the analyzer used during indexing (IndexHTML), which is generally
-recommended. This is passed to the QueryParser along with the
-criteria to construct a Query
+The jsp moves on to construct a StandardAnalyzer to
+analyze the search text. This matches the analyzer used during indexing (IndexHTML), which is generally
+recommended. This is passed to the QueryParser along with the
+criteria to construct a Query
object. You'll also notice the string literal "contents" included. This specifies
that the search should cover the contents field and not the title,
url or some other field in the indexed documents. If there is any error in
-constructing a Query object an
+constructing a Query object an
error is displayed to the user.
-In the next section of the jsp the IndexSearcher is asked to search
+In the next section of the jsp the IndexSearcher is asked to search
given the query object. The results are returned in a collection called hits. If the
length property of the hits collection is 0 (meaning there were no results) then an
error is displayed to the user and the error flag is set.
Finally the jsp iterates through the hits collection, taking the current page into
-account, and displays properties of the Document objects we talked about in
+account, and displays properties of the Document objects we talked about in
the first walkthrough. These objects contain "known" fields specific to their indexer (in this case
-IndexHTML constructs a document
+IndexHTML constructs a document
with "url", "title" and "contents").
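Iterating over the hits "taking the current page into account" comes down to slicing a list by a start offset and a page size. The sketch below shows that arithmetic in plain Java. `PagingSketch`, `page`, `startIndex` and `pageSize` are hypothetical names, and a `List<String>` stands in for the hits collection.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

class PagingSketch {
    // Returns the slice of 'hits' that belongs on the page starting at 'startIndex'.
    static <T> List<T> page(List<T> hits, int startIndex, int pageSize) {
        if (startIndex < 0 || pageSize <= 0 || startIndex >= hits.size()) {
            return Collections.emptyList(); // nothing to show for this page
        }
        // The last page may be shorter than pageSize.
        int end = Math.min(hits.size(), startIndex + pageSize);
        return new ArrayList<>(hits.subList(startIndex, end));
    }

    public static void main(String[] args) {
        List<String> hits = Arrays.asList("doc0", "doc1", "doc2", "doc3", "doc4");
        System.out.println(page(hits, 0, 2)); // prints [doc0, doc1]
        System.out.println(page(hits, 4, 2)); // prints [doc4]
    }
}
```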
-Please note that in a real deployment of Lucene, it's best to instantiate IndexSearcher and QueryParser once, and then
+Please note that in a real deployment of Lucene, it's best to instantiate IndexSearcher and QueryParser once, and then
share them across search requests, instead of re-instantiating per search request.
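One common way to instantiate an object once and share it across requests is the initialization-on-demand holder idiom, sketched below. `ExpensiveSearcher` is a hypothetical stand-in and not a Lucene class. Whether a particular Lucene object can safely be shared between threads should be checked against the documentation for the version in use.

```java
import java.util.concurrent.atomic.AtomicInteger;

class SharedSearcherSketch {
    // Hypothetical stand-in for an object that is costly to open; not a Lucene class.
    static final class ExpensiveSearcher {
        static final AtomicInteger OPENED = new AtomicInteger();
        final String indexLocation;

        ExpensiveSearcher(String indexLocation) {
            this.indexLocation = indexLocation;
            OPENED.incrementAndGet(); // counts how many times we paid the opening cost
        }
    }

    // The JVM initializes Holder, and therefore INSTANCE, exactly once on first access.
    private static final class Holder {
        static final ExpensiveSearcher INSTANCE = new ExpensiveSearcher("index");
    }

    static ExpensiveSearcher searcher() {
        return Holder.INSTANCE;
    }

    // Simulates one search request that reuses the shared instance.
    static String handleRequest(String query) {
        return query + " @ " + searcher().indexLocation;
    }

    public static void main(String[] args) {
        handleRequest("first");
        handleRequest("second");
        System.out.println("searchers opened: " + ExpensiveSearcher.OPENED.get()); // prints searchers opened: 1
    }
}
```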
There are additional sources used by the web app that were not specifically covered by either
-walkthrough. For example the HTML parser, the IndexHTML class and HTMLDocument class. These are very
+walkthrough. For example the HTML parser, the IndexHTML class and HTMLDocument class. These are very
similar to the classes covered in the first example, with properties specific to parsing and
indexing HTML. This is beyond our scope; however, by now you should feel like you're "getting
started" with Lucene.
@@ -401,7 +401,7 @@ started" with Lucene.
@@ -423,7 +423,7 @@ Users' or Developers' +
diff --git a/docs/demo4.pdf b/docs/demo4.pdf
index ec044415fed..29eb9975317 100644
--- a/docs/demo4.pdf
+++ b/docs/demo4.pdf
@@ -128,10 +128,10 @@ endobj
>>
endobj
26 0 obj
-<< /Length 2858 /Filter [ /ASCII85Decode /FlateDecode ]
+<< /Length 2630 /Filter [ /ASCII85Decode /FlateDecode ]
>>
stream
-As we discussed in the previous walk-through, the
-The first substantial thing the
-The
-The particular
Looking further down in the file, you should see the
As you can see there isn't much to creating an index. The devil is in the details. You may also
-wish to examine the other samples in this directory, particularly the
-The
or your editor of choice and let's take a look at
IndexFiles
class creates a Lucene
+As we discussed in the previous walk-through, the IndexFiles class creates a Lucene
Index. Let's take a look at how it does this.
main
function does is instantiate IndexWriter
. It passes the string
-"index
" and a new instance of a class called StandardAnalyzer
.
+The first substantial thing the main
function does is instantiate IndexWriter. It passes the string
+"index
" and a new instance of a class called StandardAnalyzer.
The "index
" string is the name of the filesystem directory where all index information
should be stored. Because we're not passing a full path, this will be created as a subdirectory of
the current working directory (if it does not already exist). On some platforms, it may be created
@@ -55,19 +55,19 @@ in other directories (such as the user's home directory).
IndexWriter
is the main
+The IndexWriter is the main
class responsible for creating indices. To use it you must instantiate it with a path that it can
write the index into. If this path does not exist it will first create it. Otherwise it will
-refresh the index at that path. You can also create an index using one of the subclasses of Directory
. In any case, you must also pass an
-instance of org.apache.lucene.analysis.Analyzer
.
+refresh the index at that path. You can also create an index using one of the subclasses of Directory. In any case, you must also pass an
+instance of org.apache.lucene.analysis.Analyzer.
Analyzer
we
-are using, StandardAnalyzer
, is
+The particular Analyzer we
+are using, StandardAnalyzer, is
little more than a standard Java Tokenizer, converting all strings to lowercase and filtering out
stop words and characters from the index. By stop words and characters I mean common language
words such as articles (a, an, the, etc.) and other strings that may have less value for searching
@@ -79,21 +79,21 @@ href="http://svn.apache.org/repos/asf/lucene/java/trunk/contrib/analyzers/common
indexDocs()
code. This recursive
-function simply crawls the directories and uses FileDocument
to create Document
objects. The Document
is simply a data object to
+function simply crawls the directories and uses FileDocument to create Document objects. The Document is simply a data object to
represent the content in the file as well as its creation time and location. These instances are
-added to the indexWriter
. Take a look inside FileDocument
. It's not particularly
-complicated. It just adds fields to the Document
.
+added to the indexWriter
. Take a look inside FileDocument. It's not particularly
+complicated. It just adds fields to the Document.
IndexHTML
class. It is a bit more
+wish to examine the other samples in this directory, particularly the IndexHTML class. It is a bit more
complex but builds upon this example.
SearchFiles
class is
-quite simple. It primarily collaborates with an IndexSearcher
, StandardAnalyzer
-(which is used in the IndexFiles
class as well) and a
-QueryParser
. The
+The SearchFiles class is
+quite simple. It primarily collaborates with an IndexSearcher, StandardAnalyzer
+(which is used in the IndexFiles class as well) and a
+QueryParser. The
query parser is constructed with an analyzer used to interpret your query text in the same way the
documents are interpreted: finding the end of words and removing useless words like 'a', 'an' and
-'the'. The Query
object contains
-the results from the QueryParser
which is passed to
-the searcher. Note that it's also possible to programmatically construct a rich Query
object without using the query
+'the'. The Query object contains
+the results from the QueryParser which is passed to
+the searcher. Note that it's also possible to programmatically construct a rich Query object without using the query
parser. The query parser just enables decoding the Lucene query
-syntax into the corresponding Query
object. Search can be executed in
+syntax into the corresponding Query object. Search can be executed in
two different ways:
-
HitCollector
subclass
+TopDocCollector
-the search results are printed in pages, sorted by score (i. e. relevance).
WEB-INF/lib
directory in the lucenew
You'll notice that this file includes the same header and footer as index.jsp
. From
-there it constructs an IndexSearcher
with the
+there it constructs an IndexSearcher with the
indexLocation
that was specified in configuration.jsp
. If there is an
error of any kind in opening the index, it is displayed to the user and the boolean flag
error
is set to tell the rest of the sections of the jsp not to continue.
@@ -76,38 +76,38 @@ default value. If the criteria isn't provided then a servlet error is thrown (i
this is the result of url tampering or some form of browser malfunction).
-The jsp moves on to construct a StandardAnalyzer
to
-analyze the search text. This matches the analyzer used during indexing (IndexHTML
), which is generally
-recommended. This is passed to the QueryParser
along with the
-criteria to construct a Query
+The jsp moves on to construct a StandardAnalyzer to
+analyze the search text. This matches the analyzer used during indexing (IndexHTML), which is generally
+recommended. This is passed to the QueryParser along with the
+criteria to construct a Query
object. You'll also notice the string literal "contents"
included. This specifies
that the search should cover the contents
field and not the title
,
url
or some other field in the indexed documents. If there is any error in
-constructing a Query
object an
+constructing a Query object an
error is displayed to the user.
-In the next section of the jsp the IndexSearcher
is asked to search
+In the next section of the jsp the IndexSearcher is asked to search
given the query object. The results are returned in a collection called hits
. If the
length property of the hits
collection is 0 (meaning there were no results) then an
error is displayed to the user and the error flag is set.
Finally the jsp iterates through the hits
collection, taking the current page into
-account, and displays properties of the Document
objects we talked about in
+account, and displays properties of the Document objects we talked about in
the first walkthrough. These objects contain "known" fields specific to their indexer (in this case
-IndexHTML
constructs a document
+IndexHTML constructs a document
with "url", "title" and "contents").
-Please note that in a real deployment of Lucene, it's best to instantiate IndexSearcher
and QueryParser
once, and then
+Please note that in a real deployment of Lucene, it's best to instantiate IndexSearcher and QueryParser once, and then
share them across search requests, instead of re-instantiating per search request.
@@ -115,9 +115,9 @@ share them across search requests, instead of re-instantiating per search reques
More sources (developers)
There are additional sources used by the web app that were not specifically covered by either
-walkthrough. For example the HTML parser, the IndexHTML
class and HTMLDocument
class. These are very
+walkthrough. For example the HTML parser, the IndexHTML class and HTMLDocument class. These are very
similar to the classes covered in the first example, with properties specific to parsing and
indexing HTML. This is beyond our scope; however, by now you should feel like you're "getting
started" with Lucene.