mirror of https://github.com/apache/lucene.git
LUCENE-646: fix various small issues with the "getting started" demo pages (patch by Michael McCandless)
git-svn-id: https://svn.apache.org/repos/asf/lucene/java/trunk@428554 13f79535-47bb-0310-9956-ffa450edef68
parent 10517a310c
commit a9a325a4df
|
@ -114,8 +114,8 @@ limitations under the License.
|
|||
<tr><td>
|
||||
<blockquote>
|
||||
<p>
|
||||
This document is intended as a "getting started" guide to using and running the
|
||||
Apache Lucene demos. It walks you through some basic installation and configuration.
|
||||
This document is intended as a "getting started" guide to using and running the Lucene demos.
|
||||
It walks you through some basic installation and configuration.
|
||||
</p>
|
||||
</blockquote>
|
||||
</p>
|
||||
|
@ -131,9 +131,8 @@ Apache Lucene demos. It walks you through some basic installation and configura
|
|||
<tr><td>
|
||||
<blockquote>
|
||||
<p>
|
||||
The Lucene Demo code is a set of command line example applications that demonstrate various
|
||||
functionality of Lucene and how one should go about adding it to their
|
||||
applications.
|
||||
The Lucene command-line demo code consists of two applications that demonstrate various
|
||||
functionalities of Lucene and how one should go about adding Lucene to their applications.
|
||||
</p>
|
||||
</blockquote>
|
||||
</p>
|
||||
|
@ -143,22 +142,22 @@ applications.
|
|||
<table border="0" cellspacing="0" cellpadding="2" width="100%">
|
||||
<tr><td bgcolor="#525D76">
|
||||
<font color="#ffffff" face="arial,helvetica,sanserif">
|
||||
<a name="Setting your classpath"><strong>Setting your classpath</strong></a>
|
||||
<a name="Setting your CLASSPATH"><strong>Setting your CLASSPATH</strong></a>
|
||||
</font>
|
||||
</td></tr>
|
||||
<tr><td>
|
||||
<blockquote>
|
||||
<p>
|
||||
First, extract the latest Lucene distribution.
|
||||
First, you should <a href="http://www.apache.org/dyn/closer.cgi/lucene/java/">download</a> the
|
||||
latest Lucene distribution and then extract it to a working directory. Alternatively, you can <a href="http://wiki.apache.org/jakarta-lucene/SourceRepository">check out the sources from
|
||||
Subversion</a>, and then run <code>ant war-demo</code> to generate the JARs and WARs.
|
||||
</p>
|
||||
<p>
|
||||
You should see the Apache Lucene jar file in the directory you created
|
||||
when you extracted the archive. It should be named something like
|
||||
<b>lucene-{version}.jar</b>.
|
||||
</p>
|
||||
<p>
|
||||
You should also see a file called called <b>lucene-demos-{version}.jar</b>.
|
||||
Put both of these files in your Java CLASSPATH.
|
||||
You should see the Lucene JAR file in the directory you created when you extracted the archive. It
|
||||
should be named something like <code>lucene-core-{version}.jar</code>. You should also see a file
|
||||
called <code>lucene-demos-{version}.jar</code>. If you checked out the sources from Subversion then
|
||||
the JARs are located under the <code>build</code> subdirectory (after running <code>ant</code>
|
||||
successfully). Put both of these files in your Java CLASSPATH.
|
||||
</p>
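<p>
As a quick sanity check, a small Java program can report whether both JARs are visible on the CLASSPATH. This is only a sketch and is not part of the Lucene demo: the class name <code>ClasspathCheck</code> and the simple file-name prefix matching are assumptions made for illustration.
</p>

```java
import java.io.File;

final class ClasspathCheck {
    /** Returns true if some classpath entry's file name starts with the prefix and ends in ".jar". */
    static boolean hasJar(String classPath, String prefix) {
        for (String entry : classPath.split(File.pathSeparator)) {
            String name = new File(entry).getName();
            if (name.startsWith(prefix) && name.endsWith(".jar")) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        // "java.class.path" reflects the CLASSPATH the JVM was started with.
        String cp = System.getProperty("java.class.path");
        System.out.println("lucene-core on CLASSPATH:  " + hasJar(cp, "lucene-core-"));
        System.out.println("lucene-demos on CLASSPATH: " + hasJar(cp, "lucene-demos-"));
    }
}
```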
|
||||
</blockquote>
|
||||
</p>
|
||||
|
@ -174,18 +173,27 @@ Put both of these files in your Java CLASSPATH.
|
|||
<tr><td>
|
||||
<blockquote>
|
||||
<p>
|
||||
Once you've gotten this far you're probably itching to go. Let's <b> build an index!</b>
|
||||
Assuming you've set your classpath correctly, just type
|
||||
"java org.apache.lucene.demo.IndexFiles {full-path-to-lucene}/src". This will produce
|
||||
a subdirectory called "index" which will contain an index of all of the Lucene
|
||||
sourcecode.
|
||||
Once you've gotten this far you're probably itching to go. Let's <b>build an index!</b> Assuming
|
||||
you've set your CLASSPATH correctly, just type:
|
||||
|
||||
<pre>
|
||||
java org.apache.lucene.demo.IndexFiles {full-path-to-lucene}/src
|
||||
</pre>
|
||||
|
||||
This will produce a subdirectory called <code>index</code> which will contain an index of all of the
|
||||
Lucene source code.
|
||||
</p>
|
||||
<p>
|
||||
<b> To search the index </b> type "java org.apache.lucene.demo.SearchFiles". You'll be prompted
|
||||
for a query. Type in a swear word and press the enter key. You'll see that the Lucene
|
||||
developers are very well mannered and get no results. Now try entering the word "vector".
|
||||
That should return a whole bunch of documents. The results will page at every tenth
|
||||
result and ask you whether you want more results.
|
||||
To <b>search the index</b> type:
|
||||
|
||||
<pre>
|
||||
java org.apache.lucene.demo.SearchFiles
|
||||
</pre>
|
||||
|
||||
You'll be prompted for a query. Type in a swear word and press the enter key. You'll see that the
|
||||
Lucene developers are very well mannered and get no results. Now try entering the word "vector".
|
||||
That should return a whole bunch of documents. The results will page at every tenth result and ask
|
||||
you whether you want more results.
|
||||
</p>
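<p>
The paging behaviour can be pictured as slicing the result list into groups of ten. The following is a minimal sketch using plain Java collections; <code>HitPager</code> is a hypothetical helper and not the code that <code>SearchFiles</code> actually uses.
</p>

```java
import java.util.ArrayList;
import java.util.List;

final class HitPager {
    /** Splits hits into consecutive pages of at most pageSize entries each. */
    static <T> List<List<T>> pages(List<T> hits, int pageSize) {
        if (pageSize <= 0) {
            throw new IllegalArgumentException("pageSize must be positive");
        }
        List<List<T>> pages = new ArrayList<>();
        for (int start = 0; start < hits.size(); start += pageSize) {
            // The last page may be shorter than pageSize.
            int end = Math.min(start + pageSize, hits.size());
            pages.add(new ArrayList<>(hits.subList(start, end)));
        }
        return pages;
    }
}
```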
|
||||
</blockquote>
|
||||
</p>
|
||||
|
|
|
@ -34,7 +34,7 @@ limitations under the License.
|
|||
|
||||
|
||||
|
||||
<title>Apache Lucene - Apache Lucene - Basic Demo Sources Walkthrough</title>
|
||||
<title>Apache Lucene - Apache Lucene - Basic Demo Sources Walk-through</title>
|
||||
</head>
|
||||
|
||||
<body bgcolor="#ffffff" text="#000000" link="#525D76">
|
||||
|
@ -114,9 +114,9 @@ limitations under the License.
|
|||
<tr><td>
|
||||
<blockquote>
|
||||
<p>
|
||||
In this section we walk through the sources behind the basic Lucene demo such as where to
|
||||
find it, its parts and their function. This section is intended for Java developers
|
||||
wishing to understand how to use Apache Lucene in their applications.
|
||||
In this section we walk through the sources behind the command-line Lucene demo: where to find them,
|
||||
their parts and their function. This section is intended for Java developers wishing to understand
|
||||
how to use Lucene in their applications.
|
||||
</p>
|
||||
</blockquote>
|
||||
</p>
|
||||
|
@ -132,14 +132,14 @@ wishing to understand how to use Apache Lucene in their applications.
|
|||
<tr><td>
|
||||
<blockquote>
|
||||
<p>
|
||||
Relative to the directory created when you extracted Lucene or retreived it from Subversion, you
|
||||
should see a directory called "src" which in turn contains a directory called "demo".
|
||||
This is the root for all of the Lucene demos. Under this directory is org/apache/lucene/demo,
|
||||
this is where all the Java sources live.
|
||||
Relative to the directory created when you extracted Lucene or retrieved it from Subversion, you
|
||||
should see a directory called <code>src</code> which in turn contains a directory called
|
||||
<code>demo</code>. This is the root for all of the Lucene demos. Under this directory is
|
||||
<code>org/apache/lucene/demo</code>. This is where all the Java sources for the demos live.
|
||||
</p>
|
||||
<p>
|
||||
Within this directory you should see the IndexFiles class we executed earlier. Bring that
|
||||
up in vi or your alternative text editor and lets take a look at it.
|
||||
Within this directory you should see the <code>IndexFiles.java</code> class we executed earlier.
|
||||
Bring it up in <code>vi</code> or your editor of choice and let's take a look at it.
|
||||
</p>
|
||||
</blockquote>
|
||||
</p>
|
||||
|
@ -155,43 +155,45 @@ up in vi or your alternative text editor and lets take a look at it.
|
|||
<tr><td>
|
||||
<blockquote>
|
||||
<p>
|
||||
As we discussed in the previous walkthrough, the IndexFiles class creates a Lucene Index.
|
||||
Lets take a look at how it does this.
|
||||
As we discussed in the previous walk-through, the <code><a href="api/org/apache/lucene/demo/IndexFiles.html">IndexFiles</a></code> class creates a Lucene
|
||||
Index. Let's take a look at how it does this.
|
||||
</p>
|
||||
<p>
|
||||
The first substantial thing the main function does is instantiate an instance
|
||||
of IndexWriter. It passes a string called "index" and a new instance of a class called
|
||||
"StandardAnalyzer". The "index" string is the name of the directory that all index information
|
||||
should be stored in. Because we're not passing any path information, one must assume this
|
||||
will be created as a subdirectory of the current directory (if it does not already exist). On
|
||||
some platforms this may actually result in it being created in other directories (such as
|
||||
the user's home directory).
|
||||
The first substantial thing the <code>main</code> function does is instantiate <code><a href="api/org/apache/lucene/index/IndexWriter.html">IndexWriter</a></code>. It passes the string
|
||||
"<code>index</code>" and a new instance of a class called <code><a href="api/org/apache/lucene/analysis/standard/StandardAnalyzer.html">StandardAnalyzer</a></code>.
|
||||
The "<code>index</code>" string is the name of the filesystem directory where all index information
|
||||
should be stored. Because we're not passing a full path, this will be created as a subdirectory of
|
||||
the current working directory (if it does not already exist). On some platforms, it may be created
|
||||
in other directories (such as the user's home directory).
|
||||
</p>
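<p>
To see where a relative name such as "<code>index</code>" ends up, it helps to resolve it against the working directory by hand. The sketch below uses only <code>java.io.File</code>; the class <code>IndexPathSketch</code> is hypothetical and does not appear in the demo sources.
</p>

```java
import java.io.File;

final class IndexPathSketch {
    /** Resolves a possibly relative directory name against the given working directory. */
    static File resolve(String workingDir, String indexDir) {
        File dir = new File(indexDir);
        // Absolute paths are used as-is; relative ones hang off the working directory.
        return dir.isAbsolute() ? dir : new File(workingDir, indexDir);
    }

    public static void main(String[] args) {
        // "user.dir" holds the JVM's current working directory.
        System.out.println(resolve(System.getProperty("user.dir"), "index").getPath());
    }
}
```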
|
||||
<p>
|
||||
The <b>IndexWriter</b> is the main class responsible for creating indicies. To use it you
|
||||
must instantiate it with a path that it can write the index into, if this path does not
|
||||
exist it will create it, otherwise it will refresh the index living at that path. You
|
||||
must a also pass an instance of <b>org.apache.lucene.analysis.Analyzer</b>.
|
||||
The <code><a href="api/org/apache/lucene/index/IndexWriter.html">IndexWriter</a></code> is the main
|
||||
class responsible for creating indices. To use it you must instantiate it with a path that it can
|
||||
write the index into. If this path does not exist it will first create it. Otherwise it will
|
||||
refresh the index at that path. You can also create an index using one of the subclasses of <code><a href="api/org/apache/lucene/store/Directory.html">Directory</a></code>. In any case, you must also pass an
|
||||
instance of <code><a href="api/org/apache/lucene/analysis/Analyzer.html">org.apache.lucene.analysis.Analyzer</a></code>.
|
||||
</p>
|
||||
<p>
|
||||
The <b>Analyzer</b>, in this case, the <b>StandardAnalyzer</b> is little more than a standard Java
|
||||
Tokenizer, converting all strings to lowercase and filtering out useless words and characters from the index.
|
||||
By useless words and characters I mean common language words such as articles (a, an, the, etc.) and other
|
||||
strings that would be useless for searching (e.g. <b>'s</b>) . It should be noted that there are different
|
||||
rules for every language, and you should use the proper analyzer for each. Lucene currently
|
||||
provides Analyzers for English and German, more can be found in the Lucene Sandbox.
|
||||
The particular <code><a href="api/org/apache/lucene/analysis/Analyzer.html">Analyzer</a></code> we
|
||||
are using, <code><a href="api/org/apache/lucene/analysis/standard/StandardAnalyzer.html">StandardAnalyzer</a></code>, is
|
||||
little more than a standard Java Tokenizer, converting all strings to lowercase and filtering out
|
||||
useless words and characters from the index. By useless words and characters I mean common language
|
||||
words such as articles (a, an, the, etc.) and other strings that would be useless for searching
|
||||
(e.g. <b>'s</b>). It should be noted that there are different rules for every language, and you
|
||||
should use the proper analyzer for each. Lucene currently provides Analyzers for a number of
|
||||
different languages (see the <code>*Analyzer.java</code> sources under <a href="http://svn.apache.org/repos/asf/lucene/java/trunk/contrib/analyzers/src/java/org/apache/lucene/analysis/">contrib/analyzers/src/java/org/apache/lucene/analysis</a>).
|
||||
</p>
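<p>
The effect described above (lowercasing, dropping stop words and possessive endings) can be imitated in a few lines of plain Java. This is a toy illustration only: <code>ToyAnalyzer</code> and its short stop-word list are assumptions, and the real <code>StandardAnalyzer</code> uses its own tokenizer grammar and stop list.
</p>

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Locale;
import java.util.Set;

final class ToyAnalyzer {
    // Illustrative stop list; not the list Lucene ships with.
    private static final Set<String> STOP_WORDS =
            new HashSet<>(Arrays.asList("a", "an", "the", "and", "of", "to"));

    /** Lowercases the text, splits it into words, strips possessives and drops stop words. */
    static List<String> analyze(String text) {
        List<String> tokens = new ArrayList<>();
        // Split on anything that is not a letter, a digit or an apostrophe.
        for (String raw : text.toLowerCase(Locale.ROOT).split("[^\\p{L}\\p{Nd}']+")) {
            String token = raw.endsWith("'s") ? raw.substring(0, raw.length() - 2) : raw;
            token = token.replace("'", "");
            if (!token.isEmpty() && !STOP_WORDS.contains(token)) {
                tokens.add(token);
            }
        }
        return tokens;
    }
}
```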
|
||||
<p>
|
||||
Looking down further in the file, you should see the indexDocs() code. This recursive function
|
||||
simply crawls the directories and uses FileDocument to create Document objects. The Document
|
||||
is simply a data object to represent the content in the file as well as its creation time and
|
||||
location. These instances are added to the indexWriter. Take a look inside FileDocument. It's
|
||||
not particularly complicated, it just adds fields to the Document.
|
||||
Looking further down in the file, you should see the <code>indexDocs()</code> code. This recursive
|
||||
function simply crawls the directories and uses <code><a href="api/org/apache/lucene/demo/FileDocument.html">FileDocument</a></code> to create <code><a href="api/org/apache/lucene/document/Document.html">Document</a></code> objects. The <code><a href="api/org/apache/lucene/document/Document.html">Document</a></code> is simply a data object to
|
||||
represent the content in the file as well as its creation time and location. These instances are
|
||||
added to the <code>indexWriter</code>. Take a look inside <code><a href="api/org/apache/lucene/demo/FileDocument.html">FileDocument</a></code>. It's not particularly
|
||||
complicated. It just adds fields to the <code><a href="api/org/apache/lucene/document/Document.html">Document</a></code>.
|
||||
</p>
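<p>
The recursive shape of such a crawl looks roughly like the following. This is a sketch under stated assumptions, not the demo's code: <code>CrawlSketch</code> is a hypothetical class that merely collects the files it finds, at the point where the demo would build a <code>Document</code> and add it to the index.
</p>

```java
import java.io.File;
import java.util.Arrays;
import java.util.List;

final class CrawlSketch {
    /** Walks the tree rooted at file and appends every readable regular file to out. */
    static void collectFiles(File file, List<File> out) {
        if (!file.canRead()) {
            return; // skip anything we are not allowed to read
        }
        if (file.isDirectory()) {
            String[] names = file.list();
            if (names != null) { // list() returns null on an I/O error
                Arrays.sort(names); // deterministic order
                for (String name : names) {
                    collectFiles(new File(file, name), out);
                }
            }
        } else {
            out.add(file); // the demo would index the file here
        }
    }
}
```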
|
||||
<p>
|
||||
As you can see there isn't much to creating an index. The devil is in the details. You may also
|
||||
wish to examine the other samples in this directory, particularly the IndexHTML class. It is
|
||||
a bit more complex but builds upon this example.
|
||||
wish to examine the other samples in this directory, particularly the <code><a href="api/org/apache/lucene/demo/IndexHTML.html">IndexHTML</a></code> class. It is a bit more
|
||||
complex but builds upon this example.
|
||||
</p>
|
||||
</blockquote>
|
||||
</p>
|
||||
|
@ -207,12 +209,19 @@ a bit more complex but builds upon this example.
|
|||
<tr><td>
|
||||
<blockquote>
|
||||
<p>
|
||||
The SearchFiles class is quite simple. It primarily collaborates with an IndexSearcher, StandardAnalyzer
|
||||
(which is used in the IndexFiles class as well) and a QueryParser. The query parser is constructed
|
||||
with an analyzer used to interperate your query in the same way the Index was interperated: finding
|
||||
the end of words and removing useless words like 'a', 'an' and 'the'. The Query object contains the
|
||||
results from the QueryParser which is passed to the searcher. The searcher results are returned in
|
||||
a collection of Documents called "Hits" which is then iterated through and displayed to the user.
|
||||
The <code><a href="api/org/apache/lucene/demo/SearchFiles.html">SearchFiles</a></code> class is
|
||||
quite simple. It primarily collaborates with an <code><a href="api/org/apache/lucene/search/IndexSearcher.html">IndexSearcher</a></code>, <code><a href="api/org/apache/lucene/analysis/standard/StandardAnalyzer.html">StandardAnalyzer</a></code>
|
||||
(which is used in the <code><a href="api/org/apache/lucene/demo/IndexFiles.html">IndexFiles</a></code> class as well) and a
|
||||
<code><a href="api/org/apache/lucene/queryParser/QueryParser.html">QueryParser</a></code>. The
|
||||
query parser is constructed with an analyzer used to interpret your query text in the same way the
|
||||
documents are interpreted: finding the end of words and removing useless words like 'a', 'an' and
|
||||
'the'. The <code><a href="api/org/apache/lucene/search/Query.html">Query</a></code> object contains
|
||||
the results from the <code><a href="api/org/apache/lucene/queryParser/QueryParser.html">QueryParser</a></code> which is passed to
|
||||
the searcher. Note that it's also possible to programmatically construct a rich <code><a href="api/org/apache/lucene/search/Query.html">Query</a></code> object without using the query
|
||||
parser. The query parser just enables decoding the <a href="queryparsersyntax.html">Lucene query
|
||||
syntax</a> into the corresponding <code><a href="api/org/apache/lucene/search/Query.html">Query</a></code> object. The searcher results are
|
||||
returned in a collection of Documents called <code><a href="api/org/apache/lucene/search/Hits.html">Hits</a></code> which is then iterated through and
|
||||
displayed to the user.
|
||||
</p>
|
||||
</blockquote>
|
||||
</p>
|
||||
|
|
|
@ -114,11 +114,10 @@ limitations under the License.
|
|||
<tr><td>
|
||||
<blockquote>
|
||||
<p>
|
||||
This document is intended as a "getting started" guide to installing and running the
|
||||
Apache Lucene web application demo. This guide assumes that you have read the
|
||||
information in the previous two examples or already know it anyhow. We'll use
|
||||
Tomcat 4.0.1 as our reference web container. These demos should work with nearly
|
||||
any container, but it is up to you to adapt them appropriately.
|
||||
This document is intended as a "getting started" guide to installing and running the Lucene
|
||||
web application demo. This guide assumes that you have read the information in the previous two
|
||||
examples. We'll use Tomcat as our reference web container. These demos should work with nearly any
|
||||
container, but you may have to adapt them appropriately.
|
||||
</p>
|
||||
</blockquote>
|
||||
</p>
|
||||
|
@ -134,12 +133,11 @@ any container, but it is up to you to adapt them appropriately.
|
|||
<tr><td>
|
||||
<blockquote>
|
||||
<p>
|
||||
The Lucene Web Application demo is a template web application intended for deployment
|
||||
on Tomcat or a similar web container. It's NOT designed as a "best practices"
|
||||
implementation by ANY means. It's more of a "hello world" type Lucene Web App.
|
||||
The purpose of this application is to demonstrate Lucene. With that being said,
|
||||
it should be relatively simple to create a small searchable website in Tomcat or
|
||||
a similar application server.
|
||||
The Lucene Web Application demo is a template web application intended for deployment on Tomcat or a
|
||||
similar web container. It's NOT designed as a "best practices" implementation by ANY means. It's
|
||||
more of a "hello world" type Lucene Web App. The purpose of this application is to demonstrate
|
||||
Lucene. With that being said, it should be relatively simple to create a small searchable website
|
||||
in Tomcat or a similar application server.
|
||||
</p>
|
||||
</blockquote>
|
||||
</p>
|
||||
|
@ -154,18 +152,19 @@ a similar application server.
|
|||
</td></tr>
|
||||
<tr><td>
|
||||
<blockquote>
|
||||
<p>
|
||||
Once you've gotten this far you're probably itching to go.
|
||||
Let's start by creating the index you'll need for the web examples.
|
||||
Since you've already set your classpath in the previous examples,
|
||||
all you need to do is type
|
||||
<b> "java org.apache.lucene.demo.IndexHTML -create -index {index-dir} .."</b>.
|
||||
You'll need to do this from a (any) subdirectory of your {tomcat}/webapps directory (make sure you didn't leave off the ".." or you'll get a null pointer
|
||||
exception).
|
||||
{index-dir}
|
||||
should be a directory that Tomcat has permission to read and write, but is
|
||||
outside of a web accessible context. By default the webapp is configured
|
||||
to look in <b>/opt/lucene/index</b> for this index.
|
||||
<p> Once you've gotten this far you're probably itching to go. Let's start by creating the index
|
||||
you'll need for the web examples. Since you've already set your CLASSPATH in the previous examples,
|
||||
all you need to do is type:
|
||||
|
||||
<pre>
|
||||
java org.apache.lucene.demo.IndexHTML -create -index {index-dir} ..
|
||||
</pre>
|
||||
|
||||
You'll need to do this from any subdirectory of your <code>{tomcat}/webapps</code> directory
|
||||
(make sure you didn't leave off the <code>..</code> or you'll get a null pointer exception).
|
||||
<code>{index-dir}</code> should be a directory that Tomcat has permission to read and write, but is
|
||||
outside of a web accessible context. By default the webapp is configured to look in
|
||||
<code>/opt/lucene/index</code> for this index.
|
||||
</p>
|
||||
</blockquote>
|
||||
</p>
|
||||
|
@ -180,10 +179,10 @@ to look in <b>/opt/lucene/index</b> for this index.
|
|||
</td></tr>
|
||||
<tr><td>
|
||||
<blockquote>
|
||||
<p>Located in your distribution directory you should see
|
||||
a war file called luceneweb.war. Copy this to your
|
||||
{tomcat-home}/webapps directory. You may need to restart
|
||||
Tomcat. </p>
|
||||
<p>Located in your distribution directory you should see a war file called
|
||||
<code>luceneweb.war</code>. If you're working with a Subversion checkout, this will be under the
|
||||
<code>build</code> subdirectory. Copy this to your <code>{tomcat-home}/webapps</code> directory.
|
||||
You may need to restart Tomcat. </p>
|
||||
</blockquote>
|
||||
</p>
|
||||
</td></tr>
|
||||
|
@ -197,15 +196,15 @@ Tomcat. </p>
|
|||
</td></tr>
|
||||
<tr><td>
|
||||
<blockquote>
|
||||
<p>
|
||||
From your Tomcat directory look in the webapps/luceneweb subdirectory. If it's not
|
||||
present, try browsing to "http://localhost:8080/luceneweb" then look again.
|
||||
Edit a file called configuration.jsp. Ensure that the indexLocation is equal to the
|
||||
location you used for your index. You may also customize the appTitle and appFooter
|
||||
strings as you see fit. Once you have finished altering the configuration you should
|
||||
restart Tomcat. You may also wish to update the war file by typing
|
||||
<b>jar -uf luceneweb.war configuration.jsp</b> from the luceneweb subdirectory.
|
||||
(The -u option is not available in all versions of jar. In this case recreate the war file).
|
||||
<p> From your Tomcat directory look in the <code>webapps/luceneweb</code> subdirectory. If it's not
|
||||
present, try browsing to <code>http://localhost:8080/luceneweb</code> (which causes Tomcat to deploy
|
||||
the webapp), then look again. Edit a file called <code>configuration.jsp</code>. Ensure that the
|
||||
<code>indexLocation</code> is equal to the location you used for your index. You may also customize
|
||||
the <code>appTitle</code> and <code>appFooter</code> strings as you see fit. Once you have finished
|
||||
altering the configuration you may need to restart Tomcat. You may also wish to update the war file
|
||||
by typing <code>jar -uf luceneweb.war configuration.jsp</code> from the <code>luceneweb</code>
|
||||
subdirectory. (The -u option is not available in all versions of jar. In this case recreate the
|
||||
war file).
|
||||
</p>
|
||||
</blockquote>
|
||||
</p>
|
||||
|
@ -220,14 +219,15 @@ restart Tomcat. You may also wish to update the war file by typing
|
|||
</td></tr>
|
||||
<tr><td>
|
||||
<blockquote>
|
||||
<p>Now you're ready to roll. In your browser set the url to "http://localhost:8080/luceneweb"
|
||||
enter "test" and the number of items per page and press search.</p>
|
||||
<p>You should now be looking either at a number of results (provided you didn't erase the
|
||||
Tomcat examples) or nothing. Try other search terms. Depending on the number of items
|
||||
per page you set and results returned, there may be a link at the bottom that says "more results>>",
|
||||
clicking it goes to subsequent pages. If you get an error regarding opening the index, then you
|
||||
probably set the path in "configuration" incorrectly or Tomcat doesn't have permissions to the
|
||||
index (or you skipped the step of creating it).</p>
|
||||
<p>Now you're ready to roll. In your browser set the url to
|
||||
<code>http://localhost:8080/luceneweb</code>, enter <code>test</code> and the number of items per
|
||||
page and press search.</p>
|
||||
<p>You should now be looking either at a number of results (provided you didn't erase the Tomcat
|
||||
examples) or nothing. If you get an error regarding opening the index, then you probably set the
|
||||
path in <code>configuration.jsp</code> incorrectly or Tomcat doesn't have permissions to the index
|
||||
(or you skipped the step of creating it). Try other search terms. Depending on the number of items
|
||||
per page you set and results returned, there may be a link at the bottom that says <b>More
|
||||
Results>></b>; clicking it takes you to subsequent pages. </p>
|
||||
</blockquote>
|
||||
</p>
|
||||
</td></tr>
|
||||
|
@ -242,8 +242,7 @@ index (or you skipped the step of creating it).</p>
|
|||
<tr><td>
|
||||
<blockquote>
|
||||
<p>
|
||||
If you want to know more about how this web app works or how to customize it then
|
||||
<a href="demo4.html">read on>>></a>.
|
||||
If you want to know more about how this web app works or how to customize it then <a href="demo4.html">read on>>></a>.
|
||||
</p>
|
||||
</blockquote>
|
||||
</p>
|
||||
|
|
147
docs/demo4.html
|
@ -114,10 +114,10 @@ limitations under the License.
|
|||
<tr><td>
|
||||
<blockquote>
|
||||
<p>
|
||||
In this section we walk through the sources behind the basic Lucene Web Application demo.
|
||||
Where to find it, its parts, and their function. This section is intended for Java developers
|
||||
wishing to understand how to use Apache Lucene in their applications or for those involved
|
||||
in deploying web applications based on Lucene.
|
||||
In this section we walk through the sources behind the basic Lucene Web Application demo: where to
|
||||
find them, their parts and their function. This section is intended for Java developers wishing to
|
||||
understand how to use Lucene in their applications or for those involved in deploying web
|
||||
applications based on Lucene.
|
||||
</p>
|
||||
</blockquote>
|
||||
</p>
|
||||
|
@ -133,13 +133,13 @@ in deploying web applications based on Lucene.
|
|||
<tr><td>
|
||||
<blockquote>
|
||||
<p>
|
||||
Relative the directory created when you extracted Lucene or retreived it from Subversion, you
|
||||
should see a directory called "src" which in turn contains a directory called "jsp".
|
||||
This is the root for all of the Lucene web demo.
|
||||
Relative to the directory created when you extracted Lucene or retrieved it from Subversion, you
|
||||
should see a directory called <code>src</code> which in turn contains a directory called
|
||||
<code>jsp</code>. This is the root for all of the Lucene web demo.
|
||||
</p>
|
||||
<p>
|
||||
Within this directory you should see the index.jsp class. Bring this up in vi or your
|
||||
editor of choice.
|
||||
Within this directory you should see <code>index.jsp</code>. Bring this up in vi or your editor of
|
||||
choice.
|
||||
</p>
|
||||
</blockquote>
|
||||
</p>
|
||||
|
@ -155,14 +155,12 @@ editor of choice.
|
|||
<tr><td>
|
||||
<blockquote>
|
||||
<p>
|
||||
This jsp page is pretty boring by itself. All it does is include a header, display a form and
|
||||
include a footer. If you look at the form, it has two fields: query (where you enter your
|
||||
search criteria) and maxresults where you specify the number of results per page. If you look
|
||||
at the form tag, you'll notice it uses the get method as opposed to the post. While this is
|
||||
considered deprecated functionality by the latest w3c specs, its unlikely to go away due to the
|
||||
usefulness of being able to bookmark things like searches. By the structure of this JSP it should
|
||||
be easy to customize it without even editing this particular file. You could simply change the
|
||||
header and footer. Let's look at the header.jsp (located in the same directory) next.
|
||||
This jsp page is pretty boring by itself. All it does is include a header, display a form and
|
||||
include a footer. If you look at the form, it has two fields: <code>query</code> (where you enter
|
||||
your search criteria) and <code>maxresults</code> where you specify the number of results per page.
|
||||
By the structure of this JSP it should be easy to customize it without even editing this particular
|
||||
file. You could simply change the header and footer. Let's look at the <code>header.jsp</code>
|
||||
(located in the same directory) next.
|
||||
</p>
|
||||
</blockquote>
|
||||
</p>
|
||||
|
@ -178,11 +176,11 @@ header and footer. Let's look at the header.jsp (located in the same directory)
|
|||
<tr><td>
|
||||
<blockquote>
|
||||
<p>
|
||||
The header is also very simple by itself. The only thing it does is include the configuration.jsp
|
||||
(which you looked at in the last section of this guide) and set the title and a brief header. This
|
||||
would be a good place to put your own custom HTML to "pretty" things up a bit. We won't cover the
|
||||
footer because all it does is display the footer and close your tags. Let's look at the results.jsp,
|
||||
the meat of this application next.
|
||||
The header is also very simple by itself. The only thing it does is include the
|
||||
<code>configuration.jsp</code> (which you looked at in the last section of this guide) and set the
|
||||
title and a brief header. This would be a good place to put your own custom HTML to "pretty" things
|
||||
up a bit. We won't cover the footer because all it does is display the footer and close your tags.
|
||||
Let's look at the <code>results.jsp</code>, the meat of this application, next.
|
||||
</p>
|
||||
</blockquote>
|
||||
</p>
|
||||
|
@ -198,43 +196,52 @@ the meat of this application next.
|
|||
<tr><td>
|
||||
<blockquote>
|
||||
<p>
|
||||
The results.jsp had a lot more functionality. Much of it is for paging the search results we'll not
|
||||
cover this as it's commented well enough. It does not perform any optimizations such as caching results,
|
||||
etc. as that would make this a more complex example. The first thing in this page is the actual imports
|
||||
for the Lucene classes and Lucene demo classes. These classes are loaded from the jars included in the
|
||||
WEB-INF/lib directory in the final war file.
|
||||
Most of the functionality lies in <code>results.jsp</code>. Much of it is for paging the search
|
||||
results, which we'll not cover here as it's commented well enough. The first thing in this page is
|
||||
the actual imports for the Lucene classes and Lucene demo classes. These classes are loaded from
|
||||
the jars included in the <code>WEB-INF/lib</code> directory in the <code>luceneweb.war</code> file.
|
||||
</p>
|
||||
<p>
|
||||
You'll notice that this file includes the same header and footer as the "index.jsp". From there the jsp
|
||||
constructs an IndexSearcher with the "indexLocation" that was specified in the "configuration.jsp". If there
|
||||
is an error of any kind in opening the index, it is diplayed to the user and a boolean flag is set to tell
|
||||
the rest of the sections of the jsp not to continue.
|
||||
You'll notice that this file includes the same header and footer as <code>index.jsp</code>. From
|
||||
there it constructs an <code><a href="api/org/apache/lucene/search/IndexSearcher.html">IndexSearcher</a></code> with the
|
||||
<code>indexLocation</code> that was specified in <code>configuration.jsp</code>. If there is an
|
||||
error of any kind in opening the index, it is displayed to the user and the boolean flag
|
||||
<code>error</code> is set to tell the rest of the sections of the jsp not to continue.
|
||||
</p>
|
||||
<p>
|
||||
From there, this jsp attempts to get the search criteria, the start index (used for paging) and the maximum
|
||||
number of results per page. If the maximum results per page is not set or not valid then it and the
|
||||
start index are set to default values. If only the start index is invalid it is set to a default value. If
|
||||
the criteria isn't provided then a servlet error is thrown (it is assumed that this is the result of url tampering
|
||||
or some form of browser malfunction).
|
||||
From there, this jsp attempts to get the search criteria, the start index (used for paging) and the
|
||||
maximum number of results per page. If the maximum results per page is not set or not valid then it
|
||||
and the start index are set to default values. If only the start index is invalid it is set to a
|
||||
default value. If the criteria isn't provided then a servlet error is thrown (it is assumed that
|
||||
this is the result of url tampering or some form of browser malfunction).
|
||||
</p>
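The fallback rules described above can be sketched in plain Java. This is only an illustration, not the code in the demo's jsp: the class name `PagingParams`, the method `fromRequest`, and the default values (50 results per page, start index 0) are assumptions made for this example.

```java
/** Hypothetical helper illustrating the paging defaults; not part of the demo sources. */
class PagingParams {
    static final int DEFAULT_MAX_RESULTS = 50; // assumed default page size
    static final int DEFAULT_START_INDEX = 0;  // assumed default start index

    final int maxResults;
    final int startIndex;

    private PagingParams(int maxResults, int startIndex) {
        this.maxResults = maxResults;
        this.startIndex = startIndex;
    }

    /** Applies the rules from the text to the raw request parameter strings. */
    static PagingParams fromRequest(String maxParam, String startParam) {
        Integer max = parseAtLeast(maxParam, 1);
        if (max == null) {
            // Page size missing or invalid: reset both values to their defaults.
            return new PagingParams(DEFAULT_MAX_RESULTS, DEFAULT_START_INDEX);
        }
        Integer start = parseAtLeast(startParam, 0);
        // Only the start index is invalid: keep the page size, reset the start.
        return new PagingParams(max, start == null ? DEFAULT_START_INDEX : start);
    }

    /** Returns the parsed value, or null if it is missing, malformed or below the minimum. */
    private static Integer parseAtLeast(String raw, int minimum) {
        if (raw == null) {
            return null;
        }
        try {
            int value = Integer.parseInt(raw.trim());
            if (value >= minimum) {
                return Integer.valueOf(value);
            }
            return null;
        } catch (NumberFormatException e) {
            return null;
        }
    }
}
```

Keeping the parsing in one small helper like this makes the "reset both" versus "reset only the start index" cases easy to see and to change.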
|
||||
<p>
|
||||
The jsp moves on to construct a StandardAnalyzer just as in the simple demo, to analyze the search critieria, it
|
||||
is passed to the QueryParser along with the criteria to construct a Query object. You'll also notice the
|
||||
string literal "contents" included. This is to specify the search should include the contents and not
|
||||
the title, url or some other field in the indexed documents. If there is any error in constructing a Query
|
||||
object an error is displayed to the user.
|
||||
The jsp moves on to construct a <code><a href="api/org/apache/lucene/analysis/standard/StandardAnalyzer.html">StandardAnalyzer</a></code> to
|
||||
analyze the search text. This matches the analyzer used during indexing (<code><a href="api/org/apache/lucene/demo/IndexHTML.html">IndexHTML</a></code>), which is generally
|
||||
recommended. This is passed to the <code><a href="api/org/apache/lucene/queryParser/QueryParser.html">QueryParser</a></code> along with the
|
||||
criteria to construct a <code><a href="api/org/apache/lucene/search/Query.html">Query</a></code>
|
||||
object. You'll also notice the string literal <code>"contents"</code> included. This specifies
|
||||
that the search should cover the <code>contents</code> field and not the <code>title</code>,
|
||||
<code>url</code> or some other field in the indexed documents. If there is any error in
|
||||
constructing a <code><a href="api/org/apache/lucene/search/Query.html">Query</a></code> object an
|
||||
error is displayed to the user.
|
||||
</p>
|
||||
<p>
|
||||
In the next section of the jsp the IndexSearcher is asked to search given the query object. The results are
|
||||
returned in a collection called "hits". If the length property of the hits collection is 0 then an error
|
||||
is displayed to the user and the error flag is set.
|
||||
In the next section of the jsp the <code><a href="api/org/apache/lucene/search/IndexSearcher.html">IndexSearcher</a></code> is asked to search
|
||||
given the query object. The results are returned in a collection called <code>hits</code>. If the
|
||||
length property of the <code>hits</code> collection is 0 (meaning there were no results) then an
|
||||
error is displayed to the user and the error flag is set.
|
||||
</p>
|
||||
<p>
|
||||
Finally the jsp iterates through the hits collection and displayed properties of the "Document" objects we talked
|
||||
about in the first walkthrough. These objects contain "known" fields specific to their indexer (in this case
|
||||
"IndexHTML" constructs a document with "url", "title" and "contents"). You'll notice that these results are paged
|
||||
but the search is repeated every time. This is an area where optimization could improve performance for large
|
||||
result sets.
|
||||
Finally the jsp iterates through the <code>hits</code> collection, taking the current page into
|
||||
account, and displays properties of the <code><a href="api/org/apache/lucene/document/Document.html">Document</a></code> objects we talked about in
|
||||
the first walkthrough. These objects contain "known" fields specific to their indexer (in this case
|
||||
<code><a href="api/org/apache/lucene/demo/IndexHTML.html">IndexHTML</a></code> constructs a document
|
||||
with "url", "title" and "contents").
|
||||
</p>
|
||||
<p>
|
||||
Please note that in a real deployment of Lucene, it's best to instantiate <code><a href="api/org/apache/lucene/search/IndexSearcher.html">IndexSearcher</a></code> and <code><a href="api/org/apache/lucene/queryParser/QueryParser.html">QueryParser</a></code> once, and then
|
||||
share them across search requests, instead of re-instantiating per search request.
|
||||
</p>
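One way to follow this advice is to build the expensive object lazily and hand the same instance to every request. The sketch below is a generic, Lucene-free illustration and assumes a modern JDK (Java 8 or later); `SharedInstance` is a hypothetical name, and in a real deployment the factory would open the searcher or construct the parser.

```java
import java.util.function.Supplier;

/** Hypothetical holder that creates an expensive object once and shares it afterwards. */
class SharedInstance<T> {
    private final Supplier<T> factory;
    private T instance;

    SharedInstance(Supplier<T> factory) {
        this.factory = factory;
    }

    /** Returns the shared object, invoking the factory only on the first call. */
    synchronized T get() {
        if (instance == null) {
            instance = factory.get();
        }
        return instance;
    }
}
```

A web application could keep one such holder in application scope and call `get()` from each request instead of re-instantiating per search. Whether a given class is safe to share between threads should be checked against its own documentation.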
|
||||
</blockquote>
|
||||
</p>
|
||||
|
@ -250,10 +257,11 @@ result sets.
|
|||
<tr><td>
|
||||
<blockquote>
|
||||
<p>
|
||||
There are additional sources used by the web app that were not specifically covered by either walkthrough. For
|
||||
example the HTML parser, the IndexHTML class and HTMLDocument class. These are very similar to the classes
|
||||
covered in the first example, however they have properties sepecific to parsing and indexing HTML. This is
|
||||
beyond our scope; however, by now you should feel like you're "getting started" with Lucene.
|
||||
There are additional sources used by the web app that were not specifically covered by either
|
||||
walkthrough. For example the HTML parser, the <code><a href="api/org/apache/lucene/demo/IndexHTML.html">IndexHTML</a></code> class and <code><a href="api/org/apache/lucene/demo/HTMLDocument.html">HTMLDocument</a></code> class. These are very
|
||||
similar to the classes covered in the first example, with properties specific to parsing and
|
||||
indexing HTML. This is beyond our scope; however, by now you should feel like you're "getting
|
||||
started" with Lucene.
|
||||
</p>
|
||||
</blockquote>
|
||||
</p>
|
||||
|
@ -263,24 +271,26 @@ beyond our scope; however, by now you should feel like you're "getting started"
|
|||
<table border="0" cellspacing="0" cellpadding="2" width="100%">
|
||||
<tr><td bgcolor="#525D76">
|
||||
<font color="#ffffff" face="arial,helvetica,sanserif">
|
||||
<a name="Where to go from here? (Everyone!)"><strong>Where to go from here? (Everyone!)</strong></a>
|
||||
<a name="Where to go from here? (everyone!)"><strong>Where to go from here? (everyone!)</strong></a>
|
||||
</font>
|
||||
</td></tr>
|
||||
<tr><td>
|
||||
<blockquote>
|
||||
<p>
|
||||
There are a number of things this demo doesn't do or doesn't do quite right. For instance, you may
|
||||
There are a number of things this demo doesn't do or doesn't do quite right. For instance, you may
|
||||
have noticed that documents in the root context are unreachable (unless you reconfigure Tomcat to
|
||||
support that context or redirect to it), anywhere where the directory doesn't quite match the context mapping,
|
||||
you'll have a broken link in your results. If you want to index non-local files or have some other
|
||||
needs this isn't supported, plus there may be security issues with running the indexing application from
|
||||
your webapps directory. There are a number of things left for you the implementor or developer to do.
|
||||
support that context or redirect to it). Anywhere the directory doesn't quite match the
|
||||
context mapping, you'll have a broken link in your results. If you want to index non-local files or
|
||||
have some other needs this isn't supported, plus there may be security issues with running the
|
||||
indexing application from your webapps directory. There are a number of things left for you the
|
||||
developer to do.
|
||||
</p>
|
||||
<p>
|
||||
In time some of these things may be added to Lucene as features (if you've got a good idea we'd love to hear it!),
|
||||
but for now: this is where you begin and the search engine/indexer ends. Lastly, one would assume you'd
|
||||
want to follow the above advice and customize the application to look a little more fancy than black on
|
||||
white with "Lucene Template" at the top. We'll see you on the Lucene Users' or Developers' mailing lists!
|
||||
In time some of these things may be added to Lucene as features (if you've got a good idea we'd love
|
||||
to hear it!), but for now: this is where you begin and the search engine/indexer ends. Lastly, one
|
||||
would assume you'd want to follow the above advice and customize the application to look a little
|
||||
more fancy than black on white with "Lucene Template" at the top. We'll see you on the Lucene
|
||||
Users' or Developers' <a href="mailinglists.html">mailing lists</a>!
|
||||
</p>
|
||||
</blockquote>
|
||||
</p>
|
||||
|
@ -296,11 +306,12 @@ white with "Lucene Template" at the top. We'll see you on the Lucene Users' or
|
|||
<tr><td>
|
||||
<blockquote>
|
||||
<p>
|
||||
Please resist the urge to contact the authors of this document (without bribes of fame and fortune attached). First
|
||||
contact the <a href="http://lucene.apache.org/java/docs/mailinglists.html">mailing lists</a>. That being said feedback,
|
||||
and modifications to this document and samples are ever so greatly appreciated. They are just best sent to the
|
||||
lists so that everyone can share in them. Certainly you'll get the most help there as well.
|
||||
Thanks for understanding.
|
||||
Please resist the urge to contact the authors of this document (without bribes of fame and fortune
|
||||
attached). First contact the <a href="mailinglists.html">mailing lists</a>, taking care to <a href="http://www.catb.org/~esr/faqs/smart-questions.html">Ask Questions The Smart Way</a>.
|
||||
Certainly you'll get the most help that way as well. That being said, feedback, and modifications
|
||||
to this document and samples are ever so greatly appreciated. They are just best sent to the lists
|
||||
or <a href="http://wiki.apache.org/jakarta-lucene/HowToContribute">posted as patches</a>, so that
|
||||
everyone can share in them. Thanks for understanding!
|
||||
</p>
|
||||
</blockquote>
|
||||
</p>
|
||||
|
|
|
@ -114,40 +114,38 @@ limitations under the License.
|
|||
<tr><td>
|
||||
<blockquote>
|
||||
<p>
|
||||
This document is intended as a "getting started" guide. It has three basic
|
||||
audiences: novices looking to install Apache Lucene on their application or
|
||||
web server, developers looking to modify or base the applications they develop
|
||||
on Lucene, and developers looking to become involved in and contribute to the
|
||||
development of Lucene. This document is written in tutorial and walkthrough
|
||||
format. It intends to help you in "getting started", but does not go into great
|
||||
depth into some of the conceptual or inner details of Apache Lucene.
|
||||
This document is intended as a "getting started" guide. It has three audiences: first-time users
|
||||
looking to install Apache Lucene in their application or web server; developers looking to modify or base
|
||||
the applications they develop on Lucene; and developers looking to become involved in and contribute
|
||||
to the development of Lucene. This document is written in tutorial and walk-through format. The
|
||||
goal is to help you "get started". It does not go into great depth on some of the conceptual or
|
||||
inner details of Lucene.
|
||||
</p>
|
||||
<p>
|
||||
Each section listed below builds on one another. That being said more advanced users may
|
||||
wish to skip sections.
|
||||
Each section listed below builds on the previous one. More advanced users
|
||||
may wish to skip sections.
|
||||
</p>
|
||||
<ul>
|
||||
<li><a href="demo.html">About the basic Lucene demo and its usage</a>.
|
||||
This section is intended for anyone who wants a basic background on using the provided Lucene demos.</li>
|
||||
<li><a href="demo.html">About the command-line Lucene demo and its usage</a>. This section
|
||||
is intended for anyone who wants to use the command-line Lucene demo.</li> <p />
|
||||
|
||||
<li><a href="demo2.html">About the sources and implementation
|
||||
for the basic Lucene demo</a> section we walk through . This section is intended for developers.</li>
|
||||
<li><a href="demo2.html">About the sources and implementation for the command-line Lucene
|
||||
demo</a>. This section walks through the implementation details (sources) of the
|
||||
command-line Lucene demo. This section is intended for developers.</li> <p />
|
||||
|
||||
<li><a href="demo3.html">About installing
|
||||
and configuring the template web application</a>. While this walkthrough assumes
|
||||
Tomcat 4.0.x as your container of choice, there is no reason you can't (provided you have
|
||||
the requisite knowledge) adapt the instructions to your container. This section is intended
|
||||
for those responsible for the development or deployment of Lucene-based web applications.</li>
|
||||
<li><a href="demo3.html">About installing and configuring the demo template web
|
||||
application</a>. While this walk-through assumes Tomcat as your container of choice,
|
||||
there is no reason you can't (provided you have the requisite knowledge) adapt the
|
||||
instructions to your container. This section is intended for those responsible for the
|
||||
development or deployment of Lucene-based web applications.</li> <p />
|
||||
|
||||
<li><a href="demo4.html">About the sources used to construct the demo template web
|
||||
application</a>. Please note the template application is designed to highlight features of
|
||||
Lucene and is <b>not</b> an example of best practices. (One would hopefully use MVC
|
||||
architecture such as provided by Jakarta Struts and taglibs, but showing you how to do that
|
||||
would be WAY beyond the scope of this guide.) This section is intended for developers and
|
||||
those wishing to customize the demo template web application to their needs. </li>
|
||||
|
||||
<li><a href="demo4.html">About the sources used to construct the
|
||||
template web application</a>. Please note the template application is designed to highlight
|
||||
features of Lucene and is <b>not</b> an example of best practices. (One would hopefully
|
||||
use MVC architecture such as provided by Jakarta Struts and taglibs, or better yet XML
|
||||
with stylesheets, but showing you how to do that would be WAY beyond the scope of this
|
||||
demonstration. Additionally one could cache results, and perform other performance
|
||||
optimizations, but those are beyond the scope of this demo).
|
||||
This section is intended for developers and those wishing to customize the template web
|
||||
application to their needs. The sections useful to developers only are clearly delineated.</li>
|
||||
</ul>
|
||||
</blockquote>
|
||||
</p>
|
||||
|
|
|
@ -1,4 +1,4 @@
|
|||
<%@ page import = " javax.servlet.*, javax.servlet.http.*, java.io.*, org.apache.lucene.analysis.*, org.apache.lucene.document.*, org.apache.lucene.index.*, org.apache.lucene.search.*, org.apache.lucene.queryParser.*, org.apache.lucene.demo.*, org.apache.lucene.demo.html.Entities, java.net.URLEncoder" %>
|
||||
<%@ page import = " javax.servlet.*, javax.servlet.http.*, java.io.*, org.apache.lucene.analysis.*, org.apache.lucene.analysis.standard.StandardAnalyzer, org.apache.lucene.document.*, org.apache.lucene.index.*, org.apache.lucene.search.*, org.apache.lucene.queryParser.*, org.apache.lucene.demo.*, org.apache.lucene.demo.html.Entities, java.net.URLEncoder" %>
|
||||
|
||||
<%
|
||||
/*
|
||||
|
@ -76,7 +76,7 @@ public String escapeHTML(String s) {
|
|||
//query string so you get the
|
||||
//treatment
|
||||
|
||||
Analyzer analyzer = new StopAnalyzer(); //construct our usual analyzer
|
||||
Analyzer analyzer = new StandardAnalyzer(); //construct our usual analyzer
|
||||
try {
|
||||
QueryParser qp = new QueryParser("contents", analyzer);
|
||||
query = qp.parse(queryString); //parse the
|
||||
|
@ -126,8 +126,11 @@ public String escapeHTML(String s) {
|
|||
<%
|
||||
Document doc = hits.doc(i); //get the next document
|
||||
String doctitle = doc.get("title"); //get its title
|
||||
String url = doc.get("url"); //get its url field
|
||||
if ((doctitle == null) || doctitle.equals("")) //use the url if it has no title
|
||||
String url = doc.get("path"); //get its path field
|
||||
if (url != null && url.startsWith("../webapps/")) { // strip off ../webapps prefix if present
|
||||
url = url.substring(10);
|
||||
}
|
||||
if ((doctitle == null) || doctitle.equals("")) //use the path if it has no title
|
||||
doctitle = url;
|
||||
//then output!
|
||||
%>
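The prefix handling in the hunk above can be tried in isolation. The following is a small stand-alone sketch; the class and method names are made up for this example. It mirrors the `substring(10)` call: the prefix `../webapps/` is 11 characters long, so cutting at index 10 keeps its final slash and the result starts with `/`.

```java
/** Hypothetical stand-alone version of the path-to-URL step shown in the diff. */
class PathToUrl {
    static final String PREFIX = "../webapps/";

    static String toUrl(String path) {
        if (path != null && path.startsWith(PREFIX)) {
            // PREFIX.length() - 1 == 10: keep the prefix's trailing slash as the leading "/".
            return path.substring(PREFIX.length() - 1);
        }
        return path; // paths without the prefix (and null) pass through unchanged
    }
}
```

For instance, `toUrl("../webapps/docs/index.html")` yields `/docs/index.html`, while a path that does not start with the prefix is returned as-is.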
|
||||
|
|
|
@ -8,49 +8,58 @@
|
|||
|
||||
<section name="About this Document">
|
||||
<p>
|
||||
This document is intended as a "getting started" guide to using and running the
|
||||
Apache Lucene demos. It walks you through some basic installation and configuration.
|
||||
This document is intended as a "getting started" guide to using and running the Lucene demos.
|
||||
It walks you through some basic installation and configuration.
|
||||
</p>
|
||||
</section>
|
||||
|
||||
|
||||
<section name="About the Demos">
|
||||
<p>
|
||||
The Lucene Demo code is a set of command line example applications that demonstrate various
|
||||
functionality of Lucene and how one should go about adding it to their
|
||||
applications.
|
||||
The Lucene command-line demo code consists of two applications that demonstrate various
|
||||
functionalities of Lucene and how one should go about adding Lucene to their applications.
|
||||
</p>
|
||||
</section>
|
||||
|
||||
<section name="Setting your classpath">
|
||||
<section name="Setting your CLASSPATH">
|
||||
<p>
|
||||
First, extract the latest Lucene distribution.
|
||||
First, you should <a href="http://www.apache.org/dyn/closer.cgi/lucene/java/">download</a> the
|
||||
latest Lucene distribution and then extract it to a working directory. Alternatively, you can <a
|
||||
href="http://wiki.apache.org/jakarta-lucene/SourceRepository">check out the sources from
|
||||
Subversion</a>, and then run <code>ant war-demo</code> to generate the JARs and WARs.
|
||||
</p>
|
||||
<p>
|
||||
You should see the Apache Lucene jar file in the directory you created
|
||||
when you extracted the archive. It should be named something like
|
||||
<b>lucene-{version}.jar</b>.
|
||||
</p>
|
||||
<p>
|
||||
You should also see a file called called <b>lucene-demos-{version}.jar</b>.
|
||||
Put both of these files in your Java CLASSPATH.
|
||||
You should see the Lucene JAR file in the directory you created when you extracted the archive. It
|
||||
should be named something like <code>lucene-core-{version}.jar</code>. You should also see a file
|
||||
called <code>lucene-demos-{version}.jar</code>. If you checked out the sources from Subversion then
|
||||
the JARs are located under the <code>build</code> subdirectory (after running <code>ant</code>
|
||||
successfully). Put both of these files in your Java CLASSPATH.
|
||||
</p>
|
||||
</section>
|
||||
|
||||
<section name="Indexing Files">
|
||||
<p>
|
||||
Once you've gotten this far you're probably itching to go. Let's <b> build an index!</b>
|
||||
Assuming you've set your classpath correctly, just type
|
||||
"java org.apache.lucene.demo.IndexFiles {full-path-to-lucene}/src". This will produce
|
||||
a subdirectory called "index" which will contain an index of all of the Lucene
|
||||
sourcecode.
|
||||
Once you've gotten this far you're probably itching to go. Let's <b>build an index!</b> Assuming
|
||||
you've set your CLASSPATH correctly, just type:
|
||||
|
||||
<pre>
|
||||
java org.apache.lucene.demo.IndexFiles {full-path-to-lucene}/src
|
||||
</pre>
|
||||
|
||||
This will produce a subdirectory called <code>index</code> which will contain an index of all of the
|
||||
Lucene source code.
|
||||
</p>
|
||||
<p>
|
||||
<b> To search the index </b> type "java org.apache.lucene.demo.SearchFiles". You'll be prompted
|
||||
for a query. Type in a swear word and press the enter key. You'll see that the Lucene
|
||||
developers are very well mannered and get no results. Now try entering the word "vector".
|
||||
That should return a whole bunch of documents. The results will page at every tenth
|
||||
result and ask you whether you want more results.
|
||||
To <b>search the index</b> type:
|
||||
|
||||
<pre>
|
||||
java org.apache.lucene.demo.SearchFiles
|
||||
</pre>
|
||||
|
||||
You'll be prompted for a query. Type in a swear word and press the enter key. You'll see that the
|
||||
Lucene developers are very well mannered and get no results. Now try entering the word "vector".
|
||||
That should return a whole bunch of documents. The results will page at every tenth result and ask
|
||||
you whether you want more results.
|
||||
</p>
|
||||
</section>
|
||||
|
||||
|
|
127
xdocs/demo2.xml
|
@ -2,89 +2,132 @@
|
|||
<document>
|
||||
<properties>
|
||||
<author email="acoliver@apache.org">Andrew C. Oliver</author>
|
||||
<title>Apache Lucene - Basic Demo Sources Walkthrough</title>
|
||||
<title>Apache Lucene - Basic Demo Sources Walk-through</title>
|
||||
</properties>
|
||||
<body>
|
||||
|
||||
<section name="About the Code">
|
||||
<p>
|
||||
In this section we walk through the sources behind the basic Lucene demo such as where to
|
||||
find it, its parts and their function. This section is intended for Java developers
|
||||
wishing to understand how to use Apache Lucene in their applications.
|
||||
In this section we walk through the sources behind the command-line Lucene demo: where to find them,
|
||||
their parts and their function. This section is intended for Java developers wishing to understand
|
||||
how to use Lucene in their applications.
|
||||
</p>
|
||||
</section>
|
||||
|
||||
|
||||
<section name="Location of the source">
|
||||
|
||||
<p>
|
||||
Relative to the directory created when you extracted Lucene or retreived it from Subversion, you
|
||||
should see a directory called "src" which in turn contains a directory called "demo".
|
||||
This is the root for all of the Lucene demos. Under this directory is org/apache/lucene/demo,
|
||||
this is where all the Java sources live.
|
||||
Relative to the directory created when you extracted Lucene or retrieved it from Subversion, you
|
||||
should see a directory called <code>src</code> which in turn contains a directory called
|
||||
<code>demo</code>. This is the root for all of the Lucene demos. Under this directory is
|
||||
<code>org/apache/lucene/demo</code>. This is where all the Java sources for the demos live.
|
||||
</p>
|
||||
|
||||
<p>
|
||||
Within this directory you should see the IndexFiles class we executed earlier. Bring that
|
||||
up in vi or your alternative text editor and lets take a look at it.
|
||||
Within this directory you should see the <code>IndexFiles.java</code> class we executed earlier.
|
||||
Bring it up in <code>vi</code> or your editor of choice and let's take a look at it.
|
||||
</p>
|
||||
|
||||
</section>
|
||||
|
||||
<section name="IndexFiles">
|
||||
|
||||
<p>
|
||||
As we discussed in the previous walkthrough, the IndexFiles class creates a Lucene Index.
|
||||
Lets take a look at how it does this.
|
||||
As we discussed in the previous walk-through, the <code><a
|
||||
href="api/org/apache/lucene/demo/IndexFiles.html">IndexFiles</a></code> class creates a Lucene
|
||||
Index. Let's take a look at how it does this.
|
||||
</p>
|
||||
|
||||
<p>
|
||||
The first substantial thing the main function does is instantiate an instance
|
||||
of IndexWriter. It passes a string called "index" and a new instance of a class called
|
||||
"StandardAnalyzer". The "index" string is the name of the directory that all index information
|
||||
should be stored in. Because we're not passing any path information, one must assume this
|
||||
will be created as a subdirectory of the current directory (if it does not already exist). On
|
||||
some platforms this may actually result in it being created in other directories (such as
|
||||
the user's home directory).
|
||||
The first substantial thing the <code>main</code> function does is instantiate <code><a
|
||||
href="api/org/apache/lucene/index/IndexWriter.html">IndexWriter</a></code>. It passes the string
|
||||
"<code>index</code>" and a new instance of a class called <code><a
|
||||
href="api/org/apache/lucene/analysis/standard/StandardAnalyzer.html">StandardAnalyzer</a></code>.
|
||||
The "<code>index</code>" string is the name of the filesystem directory where all index information
|
||||
should be stored. Because we're not passing a full path, this will be created as a subdirectory of
|
||||
the current working directory (if it does not already exist). On some platforms, it may be created
|
||||
in other directories (such as the user's home directory).
|
||||
</p>
|
||||
|
||||
<p>
|
||||
The <b>IndexWriter</b> is the main class responsible for creating indicies. To use it you
|
||||
must instantiate it with a path that it can write the index into, if this path does not
|
||||
exist it will create it, otherwise it will refresh the index living at that path. You
|
||||
must a also pass an instance of <b>org.apache.lucene.analysis.Analyzer</b>.
|
||||
The <code><a href="api/org/apache/lucene/index/IndexWriter.html">IndexWriter</a></code> is the main
|
||||
class responsible for creating indices. To use it you must instantiate it with a path that it can
|
||||
write the index into. If this path does not exist it will first create it. Otherwise it will
|
||||
refresh the index at that path. You can also create an index using one of the subclasses of <code><a
|
||||
href="api/org/apache/lucene/store/Directory.html">Directory</a></code>. In any case, you must also pass an
|
||||
instance of <code><a
|
||||
href="api/org/apache/lucene/analysis/Analyzer.html">org.apache.lucene.analysis.Analyzer</a></code>.
|
||||
</p>
|
||||
|
||||
<p>
|
||||
The <b>Analyzer</b>, in this case, the <b>StandardAnalyzer</b> is little more than a standard Java
|
||||
Tokenizer, converting all strings to lowercase and filtering out useless words and characters from the index.
|
||||
By useless words and characters I mean common language words such as articles (a, an, the, etc.) and other
|
||||
strings that would be useless for searching (e.g. <b>'s</b>) . It should be noted that there are different
|
||||
rules for every language, and you should use the proper analyzer for each. Lucene currently
|
||||
provides Analyzers for English and German, more can be found in the Lucene Sandbox.
|
||||
The particular <code><a href="api/org/apache/lucene/analysis/Analyzer.html">Analyzer</a></code> we
|
||||
are using, <code><a
|
||||
href="api/org/apache/lucene/analysis/standard/StandardAnalyzer.html">StandardAnalyzer</a></code>, is
|
||||
little more than a standard Java Tokenizer, converting all strings to lowercase and filtering out
|
||||
useless words and characters from the index. By useless words and characters I mean common language
|
||||
words such as articles (a, an, the, etc.) and other strings that would be useless for searching
|
||||
(e.g. <b>'s</b>). Note that every language has different rules, and you
|
||||
should use the proper analyzer for each. Lucene currently provides Analyzers for a number of
|
||||
different languages (see the <code>*Analyzer.java</code> sources under <a
|
||||
href="http://svn.apache.org/repos/asf/lucene/java/trunk/contrib/analyzers/src/java/org/apache/lucene/analysis/">contrib/analyzers/src/java/org/apache/lucene/analysis</a>).
|
||||
</p>
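To make the idea of lowercasing and dropping "useless" words concrete, here is a toy sketch in plain Java. It is not Lucene's StandardAnalyzer and does not use any Lucene API: the class `ToyAnalyzer`, its three-word stop list and its simple splitting rule are assumptions chosen only to illustrate the kind of processing described above.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Locale;
import java.util.Set;

/** Toy illustration of analysis; far simpler than any real Lucene analyzer. */
class ToyAnalyzer {
    // Assumed, deliberately tiny stop list.
    private static final Set<String> STOP_WORDS =
            new HashSet<>(Arrays.asList("a", "an", "the"));

    static List<String> analyze(String text) {
        // Lowercase everything, then drop possessive 's endings.
        String lowered = text.toLowerCase(Locale.ROOT).replaceAll("'s\\b", "");
        List<String> tokens = new ArrayList<>();
        // Split on anything that is not an ASCII letter or digit.
        for (String token : lowered.split("[^a-z0-9]+")) {
            if (!token.isEmpty() && !STOP_WORDS.contains(token)) {
                tokens.add(token);
            }
        }
        return tokens;
    }
}
```

With this sketch, the text `The Quick fox's an Index` is reduced to the tokens `quick`, `fox` and `index`. A real analyzer handles many more cases, which is one reason to use the same analyzer for indexing and searching.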
|
||||
|
||||
<p>
|
||||
Looking down further in the file, you should see the indexDocs() code. This recursive function
|
||||
simply crawls the directories and uses FileDocument to create Document objects. The Document
|
||||
is simply a data object to represent the content in the file as well as its creation time and
|
||||
location. These instances are added to the indexWriter. Take a look inside FileDocument. It's
|
||||
not particularly complicated, it just adds fields to the Document.
|
||||
Looking further down in the file, you should see the <code>indexDocs()</code> code. This recursive
|
||||
function simply crawls the directories and uses <code><a
|
||||
href="api/org/apache/lucene/demo/FileDocument.html">FileDocument</a></code> to create <code><a
|
||||
href="api/org/apache/lucene/document/Document.html">Document</a></code> objects. The <code><a
|
||||
href="api/org/apache/lucene/document/Document.html">Document</a></code> is simply a data object to
|
||||
represent the content in the file as well as its creation time and location. These instances are
|
||||
added to the <code>indexWriter</code>. Take a look inside <code><a
|
||||
href="api/org/apache/lucene/demo/FileDocument.html">FileDocument</a></code>. It's not particularly
|
||||
complicated. It just adds fields to the <code><a
|
||||
href="api/org/apache/lucene/document/Document.html">Document</a></code>.
|
||||
</p>
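The recursive crawl can be sketched without any Lucene classes. In the example below the name `FileCrawler` and the choice to collect files into a list are assumptions for illustration; at the point where this sketch adds a file to the list, the demo instead builds a Document for the file and adds it to the index writer.

```java
import java.io.File;
import java.util.List;

/** Hypothetical, Lucene-free sketch of a recursive directory crawl. */
class FileCrawler {
    /** Visits file and, if it is a directory, everything readable beneath it. */
    static void collectFiles(File file, List<File> out) {
        if (!file.canRead()) {
            return; // skip unreadable or missing entries
        }
        if (file.isDirectory()) {
            File[] children = file.listFiles();
            if (children != null) { // listFiles() returns null on an I/O error
                for (File child : children) {
                    collectFiles(child, out);
                }
            }
        } else {
            // The demo would create a Document for this file here.
            out.add(file);
        }
    }
}
```

Separating the crawl from what is done with each file keeps the recursion easy to follow, which is essentially all the directory-walking part of the demo needs to do.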
|
||||
|
||||
<p>
|
||||
As you can see there isn't much to creating an index. The devil is in the details. You may also
|
||||
wish to examine the other samples in this directory, particularly the IndexHTML class. It is
|
||||
a bit more complex but builds upon this example.
|
||||
wish to examine the other samples in this directory, particularly the <code><a
|
||||
href="api/org/apache/lucene/demo/IndexHTML.html">IndexHTML</a></code> class. It is a bit more
|
||||
complex but builds upon this example.
|
||||
</p>
|
||||
|
||||
</section>
|
||||
|
||||
<section name="Searching Files">
|
||||
|
||||
<p>
|
||||
The SearchFiles class is quite simple. It primarily collaborates with an IndexSearcher, StandardAnalyzer
|
||||
(which is used in the IndexFiles class as well) and a QueryParser. The query parser is constructed
|
||||
with an analyzer used to interperate your query in the same way the Index was interperated: finding
|
||||
the end of words and removing useless words like 'a', 'an' and 'the'. The Query object contains the
|
||||
results from the QueryParser which is passed to the searcher. The searcher results are returned in
|
||||
a collection of Documents called "Hits" which is then iterated through and displayed to the user.
|
||||
The <code><a href="api/org/apache/lucene/demo/SearchFiles.html">SearchFiles</a></code> class is
|
||||
quite simple. It primarily collaborates with an <code><a
|
||||
href="api/org/apache/lucene/search/IndexSearcher.html">IndexSearcher</a></code>, <code><a
|
||||
href="api/org/apache/lucene/analysis/standard/StandardAnalyzer.html">StandardAnalyzer</a></code>
|
||||
(which is used in the <code><a
|
||||
href="api/org/apache/lucene/demo/IndexFiles.html">IndexFiles</a></code> class as well) and a
|
||||
<code><a href="api/org/apache/lucene/queryParser/QueryParser.html">QueryParser</a></code>. The
|
||||
query parser is constructed with an analyzer used to interpret your query text in the same way the
|
||||
documents are interpreted: finding the end of words and removing useless words like 'a', 'an' and
|
||||
'the'. The <code><a href="api/org/apache/lucene/search/Query.html">Query</a></code> object contains
|
||||
the results from the <code><a
|
||||
href="api/org/apache/lucene/queryParser/QueryParser.html">QueryParser</a></code> which is passed to
|
||||
the searcher. Note that it's also possible to programmatically construct a rich <code><a
|
||||
href="api/org/apache/lucene/search/Query.html">Query</a></code> object without using the query
|
||||
parser. The query parser just enables decoding the <a href="queryparsersyntax.html">Lucene query
|
||||
syntax</a> into the corresponding <code><a
|
||||
href="api/org/apache/lucene/search/Query.html">Query</a></code> object. The searcher results are
|
||||
returned in a collection of Documents called <code><a
|
||||
href="api/org/apache/lucene/search/Hits.html">Hits</a></code> which is then iterated through and
|
||||
displayed to the user.
|
||||
</p>
|
||||
|
||||
</section>
|
||||
|
||||
<section name="The Web example...">
|
||||
|
||||
<p>
|
||||
<a href="demo3.html">read on>>></a>
|
||||
</p>
|
||||
|
||||
</section>
|
||||
|
||||
</body>
|
||||
|
|
|
@ -9,77 +9,75 @@
|
|||
|
||||
<section name="About this Document">
|
||||
<p>
|
||||
This document is intended as a "getting started" guide to installing and running the
|
||||
Apache Lucene web application demo. This guide assumes that you have read the
|
||||
information in the previous two examples or already know it anyhow. We'll use
|
||||
Tomcat 4.0.1 as our reference web container. These demos should work with nearly
|
||||
any container, but it is up to you to adapt them appropriately.
|
||||
This document is intended as a "getting started" guide to installing and running the Lucene
|
||||
web application demo. This guide assumes that you have read the information in the previous two
|
||||
examples. We'll use Tomcat as our reference web container. These demos should work with nearly any
|
||||
container, but you may have to adapt them appropriately.
|
||||
</p>
|
||||
</section>
|
||||
|
||||
|
||||
<section name="About the Demos">
|
||||
<p>
|
||||
The Lucene Web Application demo is a template web application intended for deployment
|
||||
on Tomcat or a similar web container. It's NOT designed as a "best practices"
|
||||
implementation by ANY means. It's more of a "hello world" type Lucene Web App.
|
||||
The purpose of this application is to demonstrate Lucene. With that being said,
|
||||
it should be relatively simple to create a small searchable website in Tomcat or
|
||||
a similar application server.
|
||||
The Lucene Web Application demo is a template web application intended for deployment on Tomcat or a
|
||||
similar web container. It's NOT designed as a "best practices" implementation by ANY means. It's
|
||||
more of a "hello world" type Lucene Web App. The purpose of this application is to demonstrate
|
||||
Lucene. With that being said, it should be relatively simple to create a small searchable website
|
||||
in Tomcat or a similar application server.
|
||||
</p>
|
||||
</section>
|
||||
|
||||
<section name="Indexing Files">
|
||||
<p>
|
||||
Once you've gotten this far you're probably itching to go.
|
||||
Let's start by creating the index you'll need for the web examples.
|
||||
Since you've already set your classpath in the previous examples,
|
||||
all you need to do is type
|
||||
<b> "java org.apache.lucene.demo.IndexHTML -create -index {index-dir} .."</b>.
|
||||
You'll need to do this from a (any) subdirectory of your {tomcat}/webapps directory (make sure you didn't leave off the ".." or you'll get a null pointer
|
||||
exception).
|
||||
{index-dir}
|
||||
should be a directory that Tomcat has permission to read and write, but is
|
||||
outside of a web accessible context. By default the webapp is configured
|
||||
to look in <b>/opt/lucene/index</b> for this index.
|
||||
<p> Once you've gotten this far you're probably itching to go. Let's start by creating the index
|
||||
you'll need for the web examples. Since you've already set your CLASSPATH in the previous examples,
|
||||
all you need to do is type:
|
||||
|
||||
<pre>
|
||||
java org.apache.lucene.demo.IndexHTML -create -index {index-dir} ..
|
||||
</pre>
|
||||
|
||||
You'll need to do this from a (any) subdirectory of your <code>{tomcat}/webapps</code> directory
|
||||
(make sure you didn't leave off the <code>..</code> or you'll get a null pointer exception).
|
||||
<code>{index-dir}</code> should be a directory that Tomcat has permission to read and write, but is
|
||||
outside of a web accessible context. By default the webapp is configured to look in
|
||||
<code>/opt/lucene/index</code> for this index.
|
||||
</p>
|
||||
</section>
|
||||
|
||||
<section name="Deploying the Demos">
|
||||
<p>Located in your distribution directory you should see
|
||||
a war file called luceneweb.war. Copy this to your
|
||||
{tomcat-home}/webapps directory. You may need to restart
|
||||
Tomcat. </p>
|
||||
</section>
|
||||
<p>Located in your distribution directory you should see a war file called
|
||||
<code>luceneweb.war</code>. If you're working with a Subversion checkout, this will be under the
|
||||
<code>build</code> subdirectory. Copy this to your <code>{tomcat-home}/webapps</code> directory.
|
||||
You may need to restart Tomcat. </p> </section>
|
||||
|
||||
<section name="Configuration">
|
||||
<p>
|
||||
From your Tomcat directory look in the webapps/luceneweb subdirectory. If it's not
|
||||
present, try browsing to "http://localhost:8080/luceneweb" then look again.
|
||||
Edit a file called configuration.jsp. Ensure that the indexLocation is equal to the
|
||||
location you used for your index. You may also customize the appTitle and appFooter
|
||||
strings as you see fit. Once you have finished altering the configuration you should
|
||||
restart Tomcat. You may also wish to update the war file by typing
|
||||
<b>jar -uf luceneweb.war configuration.jsp</b> from the luceneweb subdirectory.
|
||||
(The -u option is not available in all versions of jar. In this case recreate the war file).
|
||||
<p> From your Tomcat directory look in the <code>webapps/luceneweb</code> subdirectory. If it's not
|
||||
present, try browsing to <code>http://localhost:8080/luceneweb</code> (which causes Tomcat to deploy
|
||||
the webapp), then look again. Edit a file called <code>configuration.jsp</code>. Ensure that the
|
||||
<code>indexLocation</code> is equal to the location you used for your index. You may also customize
|
||||
the <code>appTitle</code> and <code>appFooter</code> strings as you see fit. Once you have finished
|
||||
altering the configuration you may need to restart Tomcat. You may also wish to update the war file
|
||||
by typing <code>jar -uf luceneweb.war configuration.jsp</code> from the <code>luceneweb</code>
|
||||
subdirectory. (The -u option is not available in all versions of jar. In this case recreate the
|
||||
war file).
|
||||
</p>
|
||||
</section>
|
||||
|
||||
<section name="Running the Demos">
|
||||
<p>Now you're ready to roll. In your browser set the url to "http://localhost:8080/luceneweb"
|
||||
enter "test" and the number of items per page and press search.</p>
|
||||
<p>You should now be looking either at a number of results (provided you didn't erase the
|
||||
Tomcat examples) or nothing. Try other search terms. Depending on the number of items
|
||||
per page you set and results returned, there may be a link at the bottom that says "more results>>",
|
||||
clicking it goes to subsequent pages. If you get an error regarding opening the index, then you
|
||||
probably set the path in "configuration" incorrectly or Tomcat doesn't have permissions to the
|
||||
index (or you skipped the step of creating it).</p>
|
||||
</section>
|
||||
<p>Now you're ready to roll. In your browser set the url to
|
||||
<code>http://localhost:8080/luceneweb</code> enter <code>test</code> and the number of items per
|
||||
page and press search.</p>
|
||||
<p>You should now be looking either at a number of results (provided you didn't erase the Tomcat
|
||||
examples) or nothing. If you get an error regarding opening the index, then you probably set the
|
||||
path in <code>configuration.jsp</code> incorrectly or Tomcat doesn't have permissions to the index
|
||||
(or you skipped the step of creating it). Try other search terms. Depending on the number of items
|
||||
per page you set and results returned, there may be a link at the bottom that says <b>More
|
||||
Results>></b>; clicking it takes you to subsequent pages. </p> </section>
|
||||
|
||||
<section name="About the code...">
|
||||
<p>
|
||||
If you want to know more about how this web app works or how to customize it then
|
||||
<a href="demo4.html">read on>>></a>.
|
||||
If you want to know more about how this web app works or how to customize it then <a
|
||||
href="demo4.html">read on>>></a>.
|
||||
</p>
|
||||
</section>
|
||||
|
||||
|
|
158
xdocs/demo4.xml
|
@ -8,124 +8,146 @@
|
|||
|
||||
<section name="About the Code">
|
||||
<p>
|
||||
In this section we walk through the sources behind the basic Lucene Web Application demo.
|
||||
Where to find it, its parts, and their function. This section is intended for Java developers
|
||||
wishing to understand how to use Apache Lucene in their applications or for those involved
|
||||
in deploying web applications based on Lucene.
|
||||
In this section we walk through the sources behind the basic Lucene Web Application demo: where to
|
||||
find them, their parts and their function. This section is intended for Java developers wishing to
|
||||
understand how to use Lucene in their applications or for those involved in deploying web
|
||||
applications based on Lucene.
|
||||
</p>
|
||||
</section>
|
||||
|
||||
|
||||
<section name="Location of the source (developers/deployers)">
|
||||
<p>
|
||||
Relative the directory created when you extracted Lucene or retreived it from Subversion, you
|
||||
should see a directory called "src" which in turn contains a directory called "jsp".
|
||||
This is the root for all of the Lucene web demo.
|
||||
Relative to the directory created when you extracted Lucene or retrieved it from Subversion, you
|
||||
should see a directory called <code>src</code> which in turn contains a directory called
|
||||
<code>jsp</code>. This is the root for all of the Lucene web demo.
|
||||
</p>
|
||||
<p>
|
||||
Within this directory you should see the index.jsp class. Bring this up in vi or your
|
||||
editor of choice.
|
||||
Within this directory you should see <code>index.jsp</code>. Bring this up in vi or your editor of
|
||||
choice.
|
||||
</p>
|
||||
</section>
|
||||
|
||||
<section name="index.jsp (developers/deployers)">
|
||||
<p>
|
||||
This jsp page is pretty boring by itself. All it does is include a header, display a form and
|
||||
include a footer. If you look at the form, it has two fields: query (where you enter your
|
||||
search criteria) and maxresults where you specify the number of results per page. If you look
|
||||
at the form tag, you'll notice it uses the get method as opposed to the post. While this is
|
||||
considered deprecated functionality by the latest w3c specs, its unlikely to go away due to the
|
||||
usefulness of being able to bookmark things like searches. By the structure of this JSP it should
|
||||
be easy to customize it without even editing this particular file. You could simply change the
|
||||
header and footer. Let's look at the header.jsp (located in the same directory) next.
|
||||
This jsp page is pretty boring by itself. All it does is include a header, display a form and
|
||||
include a footer. If you look at the form, it has two fields: <code>query</code> (where you enter
|
||||
your search criteria) and <code>maxresults</code> where you specify the number of results per page.
|
||||
By the structure of this JSP it should be easy to customize it without even editing this particular
|
||||
file. You could simply change the header and footer. Let's look at the <code>header.jsp</code>
|
||||
(located in the same directory) next.
|
||||
</p>
|
||||
</section>
|
||||
|
||||
<section name="header.jsp (developers/deployers)">
|
||||
<p>
|
||||
The header is also very simple by itself. The only thing it does is include the configuration.jsp
|
||||
(which you looked at in the last section of this guide) and set the title and a brief header. This
|
||||
would be a good place to put your own custom HTML to "pretty" things up a bit. We won't cover the
|
||||
footer because all it does is display the footer and close your tags. Let's look at the results.jsp,
|
||||
the meat of this application next.
|
||||
The header is also very simple by itself. The only thing it does is include the
|
||||
<code>configuration.jsp</code> (which you looked at in the last section of this guide) and set the
|
||||
title and a brief header. This would be a good place to put your own custom HTML to "pretty" things
|
||||
up a bit. We won't cover the footer because all it does is display the footer and close your tags.
|
||||
Let's look at the <code>results.jsp</code>, the meat of this application, next.
|
||||
</p>
|
||||
</section>
|
||||
|
||||
<section name="results.jsp (developers)">
|
||||
<p>
|
||||
The results.jsp had a lot more functionality. Much of it is for paging the search results we'll not
|
||||
cover this as it's commented well enough. It does not perform any optimizations such as caching results,
|
||||
etc. as that would make this a more complex example. The first thing in this page is the actual imports
|
||||
for the Lucene classes and Lucene demo classes. These classes are loaded from the jars included in the
|
||||
WEB-INF/lib directory in the final war file.
|
||||
Most of the functionality lies in <code>results.jsp</code>. Much of it is for paging the search
|
||||
results, which we'll not cover here as it's commented well enough. The first thing in this page is
|
||||
the actual imports for the Lucene classes and Lucene demo classes. These classes are loaded from
|
||||
the jars included in the <code>WEB-INF/lib</code> directory in the <code>luceneweb.war</code> file.
|
||||
</p>
|
||||
<p>
|
||||
You'll notice that this file includes the same header and footer as the "index.jsp". From there the jsp
|
||||
constructs an IndexSearcher with the "indexLocation" that was specified in the "configuration.jsp". If there
|
||||
is an error of any kind in opening the index, it is diplayed to the user and a boolean flag is set to tell
|
||||
the rest of the sections of the jsp not to continue.
|
||||
You'll notice that this file includes the same header and footer as <code>index.jsp</code>. From
|
||||
there it constructs an <code><a
|
||||
href="api/org/apache/lucene/search/IndexSearcher.html">IndexSearcher</a></code> with the
|
||||
<code>indexLocation</code> that was specified in <code>configuration.jsp</code>. If there is an
|
||||
error of any kind in opening the index, it is displayed to the user and the boolean flag
|
||||
<code>error</code> is set to tell the rest of the sections of the jsp not to continue.
|
||||
</p>
|
||||
<p>
|
||||
From there, this jsp attempts to get the search criteria, the start index (used for paging) and the maximum
|
||||
number of results per page. If the maximum results per page is not set or not valid then it and the
|
||||
start index are set to default values. If only the start index is invalid it is set to a default value. If
|
||||
the criteria isn't provided then a servlet error is thrown (it is assumed that this is the result of url tampering
|
||||
or some form of browser malfunction).
|
||||
From there, this jsp attempts to get the search criteria, the start index (used for paging) and the
|
||||
maximum number of results per page. If the maximum results per page is not set or not valid then it
|
||||
and the start index are set to default values. If only the start index is invalid it is set to a
|
||||
default value. If the criteria isn't provided then a servlet error is thrown (it is assumed that
|
||||
this is the result of url tampering or some form of browser malfunction).
|
||||
</p>
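The parameter handling described above can be sketched in plain Java as follows. This is a minimal sketch, not the code in `results.jsp`: the class name `PagingParams`, the default values (50 results per page, start index 0), and the use of `IllegalArgumentException` in place of a servlet error are assumptions chosen so the example compiles without the servlet API.

```java
// Hypothetical stand-in for the request-parameter handling described above.
class PagingParams {
    static final int DEFAULT_MAX_RESULTS = 50; // assumed default, for illustration
    static final int DEFAULT_START_INDEX = 0;  // assumed default, for illustration

    final String query;
    final int startIndex;
    final int maxResults;

    private PagingParams(String query, int startIndex, int maxResults) {
        this.query = query;
        this.startIndex = startIndex;
        this.maxResults = maxResults;
    }

    // Returns the parsed value, or -1 if the text is missing or not a number.
    private static int parseOrInvalid(String text) {
        try {
            return Integer.parseInt(text);
        } catch (NumberFormatException e) {
            return -1;
        }
    }

    static PagingParams parse(String query, String start, String max) {
        if (query == null) {
            // The JSP throws a servlet error here; a plain exception stands in for it.
            throw new IllegalArgumentException("no query specified");
        }
        int maxResults = parseOrInvalid(max);
        if (maxResults <= 0) {
            // Invalid page size: reset both the page size and the start index.
            return new PagingParams(query, DEFAULT_START_INDEX, DEFAULT_MAX_RESULTS);
        }
        int startIndex = parseOrInvalid(start);
        if (startIndex < 0) {
            // Only the start index is invalid: reset just that value.
            startIndex = DEFAULT_START_INDEX;
        }
        return new PagingParams(query, startIndex, maxResults);
    }

    public static void main(String[] args) {
        PagingParams p = parse("test", "20", "10");
        System.out.println(p.query + " " + p.startIndex + " " + p.maxResults);
    }
}
```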
|
||||
<p>
|
||||
The jsp moves on to construct a StandardAnalyzer just as in the simple demo, to analyze the search critieria, it
|
||||
is passed to the QueryParser along with the criteria to construct a Query object. You'll also notice the
|
||||
string literal "contents" included. This is to specify the search should include the contents and not
|
||||
the title, url or some other field in the indexed documents. If there is any error in constructing a Query
|
||||
object an error is displayed to the user.
|
||||
The jsp moves on to construct a <code><a
|
||||
href="api/org/apache/lucene/analysis/standard/StandardAnalyzer.html">StandardAnalyzer</a></code> to
|
||||
analyze the search text. This matches the analyzer used during indexing (<code><a
|
||||
href="api/org/apache/lucene/demo/IndexHTML.html">IndexHTML</a></code>), which is generally
|
||||
recommended. This is passed to the <code><a
|
||||
href="api/org/apache/lucene/queryParser/QueryParser.html">QueryParser</a></code> along with the
|
||||
criteria to construct a <code><a href="api/org/apache/lucene/search/Query.html">Query</a></code>
|
||||
object. You'll also notice the string literal <code>"contents"</code> included. This specifies
|
||||
that the search should cover the <code>contents</code> field and not the <code>title</code>,
|
||||
<code>url</code> or some other field in the indexed documents. If there is any error in
|
||||
constructing a <code><a href="api/org/apache/lucene/search/Query.html">Query</a></code> object an
|
||||
error is displayed to the user.
|
||||
</p>
|
||||
<p>
|
||||
In the next section of the jsp the IndexSearcher is asked to search given the query object. The results are
|
||||
returned in a collection called "hits". If the length property of the hits collection is 0 then an error
|
||||
is displayed to the user and the error flag is set.
|
||||
In the next section of the jsp the <code><a
|
||||
href="api/org/apache/lucene/search/IndexSearcher.html">IndexSearcher</a></code> is asked to search
|
||||
given the query object. The results are returned in a collection called <code>hits</code>. If the
|
||||
length property of the <code>hits</code> collection is 0 (meaning there were no results) then an
|
||||
error is displayed to the user and the error flag is set.
|
||||
</p>
|
||||
<p>
|
||||
Finally the jsp iterates through the hits collection and displayed properties of the "Document" objects we talked
|
||||
about in the first walkthrough. These objects contain "known" fields specific to their indexer (in this case
|
||||
"IndexHTML" constructs a document with "url", "title" and "contents"). You'll notice that these results are paged
|
||||
but the search is repeated every time. This is an area where optimization could improve performance for large
|
||||
result sets.
|
||||
Finally the jsp iterates through the <code>hits</code> collection, taking the current page into
|
||||
account, and displays properties of the <code><a
|
||||
href="api/org/apache/lucene/document/Document.html">Document</a></code> objects we talked about in
|
||||
the first walkthrough. These objects contain "known" fields specific to their indexer (in this case
|
||||
<code><a href="api/org/apache/lucene/demo/IndexHTML.html">IndexHTML</a></code> constructs a document
|
||||
with "url", "title" and "contents").
|
||||
</p>
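Iterating "taking the current page into account" amounts to showing only the window of results that starts at the start index and holds at most one page of items. The sketch below illustrates that arithmetic with a plain list of strings standing in for the hits; the class name `PageWindow` and both method names are hypothetical and do not appear in the demo sources.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical helper; a List<String> stands in for the collection of hits.
class PageWindow {
    // Returns the slice of hits that belongs on the current page.
    static List<String> page(List<String> hits, int startIndex, int maxResults) {
        int from = Math.min(Math.max(startIndex, 0), hits.size());
        // Widen to long so start + page size cannot overflow an int.
        int to = (int) Math.min((long) from + Math.max(maxResults, 0), hits.size());
        return new ArrayList<>(hits.subList(from, to));
    }

    // True when results remain after this page, i.e. a "more results" link makes sense.
    static boolean hasMore(int totalHits, int startIndex, int maxResults) {
        return (long) startIndex + maxResults < totalHits;
    }

    public static void main(String[] args) {
        List<String> hits = Arrays.asList("d0", "d1", "d2", "d3", "d4");
        System.out.println(page(hits, 2, 2) + " more=" + hasMore(hits.size(), 2, 2));
    }
}
```

Because the window is computed from the start index on each request, the search itself still has to be run again for every page unless results are cached elsewhere.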
|
||||
<p>
|
||||
Please note that in a real deployment of Lucene, it's best to instantiate <code><a
|
||||
href="api/org/apache/lucene/search/IndexSearcher.html">IndexSearcher</a></code> and <code><a
|
||||
href="api/org/apache/lucene/queryParser/QueryParser.html">QueryParser</a></code> once, and then
|
||||
share them across search requests, instead of re-instantiating per search request.
|
||||
</p>
|
||||
</section>
|
||||
|
||||
<section name="More sources (developers)">
|
||||
<p>
|
||||
There are additional sources used by the web app that were not specifically covered by either walkthrough. For
|
||||
example the HTML parser, the IndexHTML class and HTMLDocument class. These are very similar to the classes
|
||||
covered in the first example, however they have properties sepecific to parsing and indexing HTML. This is
|
||||
beyond our scope; however, by now you should feel like you're "getting started" with Lucene.
|
||||
There are additional sources used by the web app that were not specifically covered by either
|
||||
walkthrough. For example the HTML parser, the <code><a
|
||||
href="api/org/apache/lucene/demo/IndexHTML.html">IndexHTML</a></code> class and <code><a
|
||||
href="api/org/apache/lucene/demo/HTMLDocument.html">HTMLDocument</a></code> class. These are very
|
||||
similar to the classes covered in the first example, with properties specific to parsing and
|
||||
indexing HTML. This is beyond our scope; however, by now you should feel like you're "getting
|
||||
started" with Lucene.
|
||||
</p>
|
||||
</section>
|
||||
|
||||
<section name="Where to go from here? (Everyone!)">
|
||||
<section name="Where to go from here? (everyone!)">
|
||||
<p>
|
||||
There are a number of things this demo doesn't do or doesn't do quite right. For instance, you may
|
||||
There are a number of things this demo doesn't do or doesn't do quite right. For instance, you may
|
||||
have noticed that documents in the root context are unreachable (unless you reconfigure Tomcat to
|
||||
support that context or redirect to it), anywhere where the directory doesn't quite match the context mapping,
|
||||
you'll have a broken link in your results. If you want to index non-local files or have some other
|
||||
needs this isn't supported, plus there may be security issues with running the indexing application from
|
||||
your webapps directory. There are a number of things left for you the implementor or developer to do.
|
||||
support that context or redirect to it), anywhere where the directory doesn't quite match the
|
||||
context mapping, you'll have a broken link in your results. If you want to index non-local files or
|
||||
have some other needs this isn't supported, plus there may be security issues with running the
|
||||
indexing application from your webapps directory. There are a number of things left for you the
|
||||
developer to do.
|
||||
</p>
|
||||
<p>
|
||||
In time some of these things may be added to Lucene as features (if you've got a good idea we'd love to hear it!),
|
||||
but for now: this is where you begin and the search engine/indexer ends. Lastly, one would assume you'd
|
||||
want to follow the above advice and customize the application to look a little more fancy than black on
|
||||
white with "Lucene Template" at the top. We'll see you on the Lucene Users' or Developers' mailing lists!
|
||||
In time some of these things may be added to Lucene as features (if you've got a good idea we'd love
|
||||
to hear it!), but for now: this is where you begin and the search engine/indexer ends. Lastly, one
|
||||
would assume you'd want to follow the above advice and customize the application to look a little
|
||||
more fancy than black on white with "Lucene Template" at the top. We'll see you on the Lucene
|
||||
Users' or Developers' <a href="mailinglists.html">mailing lists</a>!
|
||||
</p>
|
||||
</section>
|
||||
|
||||
<section name="When to contact the Author">
|
||||
<p>
|
||||
Please resist the urge to contact the authors of this document (without bribes of fame and fortune attached). First
|
||||
contact the <a href="http://lucene.apache.org/java/docs/mailinglists.html">mailing lists</a>. That being said feedback,
|
||||
and modifications to this document and samples are ever so greatly appreciated. They are just best sent to the
|
||||
lists so that everyone can share in them. Certainly you'll get the most help there as well.
|
||||
Thanks for understanding.
|
||||
Please resist the urge to contact the authors of this document (without bribes of fame and fortune
|
||||
attached). First contact the <a href="mailinglists.html">mailing lists</a>, taking care to <a
|
||||
href="http://www.catb.org/~esr/faqs/smart-questions.html">Ask Questions The Smart Way</a>.
|
||||
Certainly you'll get the most help that way as well. That being said, feedback, and modifications
|
||||
to this document and samples are ever so greatly appreciated. They are just best sent to the lists
|
||||
or <a href="http://wiki.apache.org/jakarta-lucene/HowToContribute">posted as patches</a>, so that
|
||||
everyone can share in them. Thanks for understanding!
|
||||
</p>
|
||||
</section>
|
||||
|
||||
|
|
|
@ -8,42 +8,40 @@
|
|||
|
||||
<section name="Getting Started">
|
||||
<p>
|
||||
This document is intended as a "getting started" guide. It has three basic
|
||||
audiences: novices looking to install Apache Lucene on their application or
|
||||
web server, developers looking to modify or base the applications they develop
|
||||
on Lucene, and developers looking to become involved in and contribute to the
|
||||
development of Lucene. This document is written in tutorial and walkthrough
|
||||
format. It intends to help you in "getting started", but does not go into great
|
||||
depth into some of the conceptual or inner details of Apache Lucene.
|
||||
This document is intended as a "getting started" guide. It has three audiences: first-time users
|
||||
looking to install Apache Lucene in their application or web server; developers looking to modify or base
|
||||
the applications they develop on Lucene; and developers looking to become involved in and contribute
|
||||
to the development of Lucene. This document is written in tutorial and walk-through format. The
|
||||
goal is to help you "get started". It does not go into great depth on some of the conceptual or
|
||||
inner details of Lucene.
|
||||
</p>
|
||||
|
||||
<p>
|
||||
Each section listed below builds on one another. That being said more advanced users may
|
||||
wish to skip sections.
|
||||
Each section listed below builds on one another. More advanced users
|
||||
may wish to skip sections.
|
||||
</p>
|
||||
|
||||
<ul>
|
||||
<li><a href="demo.html">About the basic Lucene demo and its usage</a>.
|
||||
This section is intended for anyone who wants a basic background on using the provided Lucene demos.</li>
|
||||
<li><a href="demo.html">About the command-line Lucene demo and its usage</a>. This section
|
||||
is intended for anyone who wants to use the command-line Lucene demo.</li> <p/>
|
||||
|
||||
<li><a href="demo2.html">About the sources and implementation
|
||||
for the basic Lucene demo</a> section we walk through . This section is intended for developers.</li>
|
||||
<li><a href="demo2.html">About the sources and implementation for the command-line Lucene
|
||||
demo</a>. This section walks through the implementation details (sources) of the
|
||||
command-line Lucene demo. This section is intended for developers.</li> <p/>
|
||||
|
||||
<li><a href="demo3.html">About installing
|
||||
and configuring the template web application</a>. While this walkthrough assumes
|
||||
Tomcat 4.0.x as your container of choice, there is no reason you can't (provided you have
|
||||
the requisite knowledge) adapt the instructions to your container. This section is intended
|
||||
for those responsible for the development or deployment of Lucene-based web applications.</li>
|
||||
<li><a href="demo3.html">About installing and configuring the demo template web
|
||||
application</a>. While this walk-through assumes Tomcat as your container of choice,
|
||||
there is no reason you can't (provided you have the requisite knowledge) adapt the
|
||||
instructions to your container. This section is intended for those responsible for the
|
||||
development or deployment of Lucene-based web applications.</li> <p/>
|
||||
|
||||
<li><a href="demo4.html">About the sources used to construct the demo template web
|
||||
application</a>. Please note the template application is designed to highlight features of
|
||||
Lucene and is <b>not</b> an example of best practices. (One would hopefully use MVC
|
||||
architecture such as provided by Jakarta Struts and taglibs, but showing you how to do that
|
||||
would be WAY beyond the scope of this guide.) This section is intended for developers and
|
||||
those wishing to customize the demo template web application to their needs. </li>
|
||||
|
||||
<li><a href="demo4.html">About the sources used to construct the
|
||||
template web application</a>. Please note the template application is designed to highlight
|
||||
features of Lucene and is <b>not</b> an example of best practices. (One would hopefully
|
||||
use MVC architecture such as provided by Jakarta Struts and taglibs, or better yet XML
|
||||
with stylesheets, but showing you how to do that would be WAY beyond the scope of this
|
||||
demonstration. Additionally one could cache results, and perform other performance
|
||||
optimizations, but those are beyond the scope of this demo).
|
||||
This section is intended for developers and those wishing to customize the template web
|
||||
application to their needs. The sections useful to developers only are clearly delineated.</li>
|
||||
</ul>
|
||||
</section>
|
||||
|
||||
|
|