LUCENE-646: fix various small issues with the "getting started" demo pages (patch by Michael McCandless)

git-svn-id: https://svn.apache.org/repos/asf/lucene/java/trunk@428554 13f79535-47bb-0310-9956-ffa450edef68
Daniel Naber 2006-08-03 22:24:42 +00:00
parent 10517a310c
commit a9a325a4df
11 changed files with 518 additions and 420 deletions


@ -114,8 +114,8 @@ limitations under the License.
<tr><td>
<blockquote>
<p>
This document is intended as a "getting started" guide to using and running the
Apache Lucene demos. It walks you through some basic installation and configuration.
This document is intended as a "getting started" guide to using and running the Lucene demos.
It walks you through some basic installation and configuration.
</p>
</blockquote>
</p>
@ -131,9 +131,8 @@ Apache Lucene demos. It walks you through some basic installation and configura
<tr><td>
<blockquote>
<p>
The Lucene Demo code is a set of command line example applications that demonstrate various
functionality of Lucene and how one should go about adding it to their
applications.
The Lucene command-line demo code consists of two applications that demonstrate various
functionalities of Lucene and how one should go about adding Lucene to their applications.
</p>
</blockquote>
</p>
@ -143,22 +142,22 @@ applications.
<table border="0" cellspacing="0" cellpadding="2" width="100%">
<tr><td bgcolor="#525D76">
<font color="#ffffff" face="arial,helvetica,sanserif">
<a name="Setting your classpath"><strong>Setting your classpath</strong></a>
<a name="Setting your CLASSPATH"><strong>Setting your CLASSPATH</strong></a>
</font>
</td></tr>
<tr><td>
<blockquote>
<p>
First, extract the latest Lucene distribution.
First, you should <a href="http://www.apache.org/dyn/closer.cgi/lucene/java/">download</a> the
latest Lucene distribution and then extract it to a working directory. Alternatively, you can <a href="http://wiki.apache.org/jakarta-lucene/SourceRepository">check out the sources from
Subversion</a>, and then run <code>ant war-demo</code> to generate the JARs and WARs.
</p>
<p>
You should see the Apache Lucene jar file in the directory you created
when you extracted the archive. It should be named something like
<b>lucene-{version}.jar</b>.
</p>
<p>
You should also see a file called called <b>lucene-demos-{version}.jar</b>.
Put both of these files in your Java CLASSPATH.
You should see the Lucene JAR file in the directory you created when you extracted the archive. It
should be named something like <code>lucene-core-{version}.jar</code>. You should also see a file
called <code>lucene-demos-{version}.jar</code>. If you checked out the sources from Subversion then
the JARs are located under the <code>build</code> subdirectory (after running <code>ant</code>
successfully). Put both of these files in your Java CLASSPATH.
</p>
</blockquote>
</p>
@ -174,18 +173,27 @@ Put both of these files in your Java CLASSPATH.
<tr><td>
<blockquote>
<p>
Once you've gotten this far you're probably itching to go. Let's <b> build an index!</b>
Assuming you've set your classpath correctly, just type
"java org.apache.lucene.demo.IndexFiles {full-path-to-lucene}/src". This will produce
a subdirectory called "index" which will contain an index of all of the Lucene
sourcecode.
Once you've gotten this far you're probably itching to go. Let's <b>build an index!</b> Assuming
you've set your CLASSPATH correctly, just type:
<pre>
java org.apache.lucene.demo.IndexFiles {full-path-to-lucene}/src
</pre>
This will produce a subdirectory called <code>index</code> which will contain an index of all of the
Lucene source code.
</p>
<p>
<b> To search the index </b> type "java org.apache.lucene.demo.SearchFiles". You'll be prompted
for a query. Type in a swear word and press the enter key. You'll see that the Lucene
developers are very well mannered and get no results. Now try entering the word "vector".
That should return a whole bunch of documents. The results will page at every tenth
result and ask you whether you want more results.
To <b>search the index</b> type:
<pre>
java org.apache.lucene.demo.SearchFiles
</pre>
You'll be prompted for a query. Type in a swear word and press the enter key. You'll see that the
Lucene developers are very well mannered and get no results. Now try entering the word "vector".
That should return a whole bunch of documents. The results will page at every tenth result and ask
you whether you want more results.
</p>
</blockquote>
</p>


@ -34,7 +34,7 @@ limitations under the License.
<title>Apache Lucene - Apache Lucene - Basic Demo Sources Walkthrough</title>
<title>Apache Lucene - Apache Lucene - Basic Demo Sources Walk-through</title>
</head>
<body bgcolor="#ffffff" text="#000000" link="#525D76">
@ -114,9 +114,9 @@ limitations under the License.
<tr><td>
<blockquote>
<p>
In this section we walk through the sources behind the basic Lucene demo such as where to
find it, its parts and their function. This section is intended for Java developers
wishing to understand how to use Apache Lucene in their applications.
In this section we walk through the sources behind the command-line Lucene demo: where to find them,
their parts and their function. This section is intended for Java developers wishing to understand
how to use Lucene in their applications.
</p>
</blockquote>
</p>
@ -132,14 +132,14 @@ wishing to understand how to use Apache Lucene in their applications.
<tr><td>
<blockquote>
<p>
Relative to the directory created when you extracted Lucene or retreived it from Subversion, you
should see a directory called "src" which in turn contains a directory called "demo".
This is the root for all of the Lucene demos. Under this directory is org/apache/lucene/demo,
this is where all the Java sources live.
Relative to the directory created when you extracted Lucene or retrieved it from Subversion, you
should see a directory called <code>src</code> which in turn contains a directory called
<code>demo</code>. This is the root for all of the Lucene demos. Under this directory is
<code>org/apache/lucene/demo</code>. This is where all the Java sources for the demos live.
</p>
<p>
Within this directory you should see the IndexFiles class we executed earlier. Bring that
up in vi or your alternative text editor and lets take a look at it.
Within this directory you should see the <code>IndexFiles.java</code> class we executed earlier.
Bring it up in <code>vi</code> or your editor of choice and let's take a look at it.
</p>
</blockquote>
</p>
@ -155,43 +155,45 @@ up in vi or your alternative text editor and lets take a look at it.
<tr><td>
<blockquote>
<p>
As we discussed in the previous walkthrough, the IndexFiles class creates a Lucene Index.
Lets take a look at how it does this.
As we discussed in the previous walk-through, the <code><a href="api/org/apache/lucene/demo/IndexFiles.html">IndexFiles</a></code> class creates a Lucene
Index. Let's take a look at how it does this.
</p>
<p>
The first substantial thing the main function does is instantiate an instance
of IndexWriter. It passes a string called "index" and a new instance of a class called
"StandardAnalyzer". The "index" string is the name of the directory that all index information
should be stored in. Because we're not passing any path information, one must assume this
will be created as a subdirectory of the current directory (if it does not already exist). On
some platforms this may actually result in it being created in other directories (such as
the user's home directory).
The first substantial thing the <code>main</code> function does is instantiate <code><a href="api/org/apache/lucene/index/IndexWriter.html">IndexWriter</a></code>. It passes the string
"<code>index</code>" and a new instance of a class called <code><a href="api/org/apache/lucene/analysis/standard/StandardAnalyzer.html">StandardAnalyzer</a></code>.
The "<code>index</code>" string is the name of the filesystem directory where all index information
should be stored. Because we're not passing a full path, this will be created as a subdirectory of
the current working directory (if it does not already exist). On some platforms, it may be created
in other directories (such as the user's home directory).
</p>
<p>
The <b>IndexWriter</b> is the main class responsible for creating indicies. To use it you
must instantiate it with a path that it can write the index into, if this path does not
exist it will create it, otherwise it will refresh the index living at that path. You
must a also pass an instance of <b>org.apache.lucene.analysis.Analyzer</b>.
The <code><a href="api/org/apache/lucene/index/IndexWriter.html">IndexWriter</a></code> is the main
class responsible for creating indices. To use it you must instantiate it with a path that it can
write the index into. If this path does not exist it will first create it. Otherwise it will
refresh the index at that path. You can also create an index using one of the subclasses of <code><a href="api/org/apache/lucene/store/Directory.html">Directory</a></code>. In any case, you must also pass an
instance of <code><a href="api/org/apache/lucene/analysis/Analyzer.html">org.apache.lucene.analysis.Analyzer</a></code>.
</p>
<p>
The <b>Analyzer</b>, in this case, the <b>StandardAnalyzer</b> is little more than a standard Java
Tokenizer, converting all strings to lowercase and filtering out useless words and characters from the index.
By useless words and characters I mean common language words such as articles (a, an, the, etc.) and other
strings that would be useless for searching (e.g. <b>'s</b>) . It should be noted that there are different
rules for every language, and you should use the proper analyzer for each. Lucene currently
provides Analyzers for English and German, more can be found in the Lucene Sandbox.
The particular <code><a href="api/org/apache/lucene/analysis/Analyzer.html">Analyzer</a></code> we
are using, <code><a href="api/org/apache/lucene/analysis/standard/StandardAnalyzer.html">StandardAnalyzer</a></code>, is
little more than a standard Java Tokenizer, converting all strings to lowercase and filtering out
useless words and characters from the index. By useless words and characters I mean common language
words such as articles (a, an, the, etc.) and other strings that would be useless for searching
(e.g. <b>'s</b>) . It should be noted that there are different rules for every language, and you
should use the proper analyzer for each. Lucene currently provides Analyzers for a number of
different languages (see the <code>*Analyzer.java</code> sources under <a href="http://svn.apache.org/repos/asf/lucene/java/trunk/contrib/analyzers/src/java/org/apache/lucene/analysis/">contrib/analyzers/src/java/org/apache/lucene/analysis</a>).
</p>
<p>
Looking down further in the file, you should see the indexDocs() code. This recursive function
simply crawls the directories and uses FileDocument to create Document objects. The Document
is simply a data object to represent the content in the file as well as its creation time and
location. These instances are added to the indexWriter. Take a look inside FileDocument. It's
not particularly complicated, it just adds fields to the Document.
Looking further down in the file, you should see the <code>indexDocs()</code> code. This recursive
function simply crawls the directories and uses <code><a href="api/org/apache/lucene/demo/FileDocument.html">FileDocument</a></code> to create <code><a href="api/org/apache/lucene/document/Document.html">Document</a></code> objects. The <code><a href="api/org/apache/lucene/document/Document.html">Document</a></code> is simply a data object to
represent the content in the file as well as its creation time and location. These instances are
added to the <code>indexWriter</code>. Take a look inside <code><a href="api/org/apache/lucene/demo/FileDocument.html">FileDocument</a></code>. It's not particularly
complicated. It just adds fields to the <code><a href="api/org/apache/lucene/document/Document.html">Document</a></code>.
</p>
<p>
As you can see there isn't much to creating an index. The devil is in the details. You may also
wish to examine the other samples in this directory, particularly the IndexHTML class. It is
a bit more complex but builds upon this example.
wish to examine the other samples in this directory, particularly the <code><a href="api/org/apache/lucene/demo/IndexHTML.html">IndexHTML</a></code> class. It is a bit more
complex but builds upon this example.
</p>
</blockquote>
</p>
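
Note on the indexDocs() paragraph in the hunk above: the recursive crawl it describes can be sketched without any Lucene classes. The following is a stand-alone illustration, not the demo's actual source. The names CrawlSketch and collectFiles are hypothetical, and where the demo would build a Document and add it to the IndexWriter, this sketch only records the file.

```java
import java.io.File;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Stand-alone sketch of a recursive directory crawl. It uses no Lucene
// classes; the class and method names are hypothetical.
final class CrawlSketch {

    // Collects every readable regular file under 'file'. Directory entries
    // are visited in sorted order so the result is deterministic.
    static void collectFiles(File file, List<File> out) {
        if (!file.canRead()) {
            return; // skip entries we are not allowed to read
        }
        if (file.isDirectory()) {
            String[] names = file.list(); // may be null on an I/O error
            if (names != null) {
                Arrays.sort(names);
                for (String name : names) {
                    collectFiles(new File(file, name), out);
                }
            }
        } else {
            // An indexer would turn the file into a document at this point.
            out.add(file);
        }
    }

    public static void main(String[] args) {
        List<File> files = new ArrayList<>();
        collectFiles(new File(args.length > 0 ? args[0] : "."), files);
        for (File f : files) {
            System.out.println("would index " + f.getPath());
        }
    }
}
```

Run with a directory argument (or none, for the current directory), it prints one line per file that an indexer would visit.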
@ -207,12 +209,19 @@ a bit more complex but builds upon this example.
<tr><td>
<blockquote>
<p>
The SearchFiles class is quite simple. It primarily collaborates with an IndexSearcher, StandardAnalyzer
(which is used in the IndexFiles class as well) and a QueryParser. The query parser is constructed
with an analyzer used to interperate your query in the same way the Index was interperated: finding
the end of words and removing useless words like 'a', 'an' and 'the'. The Query object contains the
results from the QueryParser which is passed to the searcher. The searcher results are returned in
a collection of Documents called "Hits" which is then iterated through and displayed to the user.
The <code><a href="api/org/apache/lucene/demo/SearchFiles.html">SearchFiles</a></code> class is
quite simple. It primarily collaborates with an <code><a href="api/org/apache/lucene/search/IndexSearcher.html">IndexSearcher</a></code>, <code><a href="api/org/apache/lucene/analysis/standard/StandardAnalyzer.html">StandardAnalyzer</a></code>
(which is used in the <code><a href="api/org/apache/lucene/demo/IndexFiles.html">IndexFiles</a></code> class as well) and a
<code><a href="api/org/apache/lucene/queryParser/QueryParser.html">QueryParser</a></code>. The
query parser is constructed with an analyzer used to interpret your query text in the same way the
documents are interpreted: finding the end of words and removing useless words like 'a', 'an' and
'the'. The <code><a href="api/org/apache/lucene/search/Query.html">Query</a></code> object contains
the results from the <code><a href="api/org/apache/lucene/queryParser/QueryParser.html">QueryParser</a></code> which is passed to
the searcher. Note that it's also possible to programmatically construct a rich <code><a href="api/org/apache/lucene/search/Query.html">Query</a></code> object without using the query
parser. The query parser just enables decoding the <a href="queryparsersyntax.html">Lucene query
syntax</a> into the corresponding <code><a href="api/org/apache/lucene/search/Query.html">Query</a></code> object. The searcher results are
returned in a collection of Documents called <code><a href="api/org/apache/lucene/search/Hits.html">Hits</a></code> which is then iterated through and
displayed to the user.
</p>
</blockquote>
</p>
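
Note on the SearchFiles paragraph above: the reason the query text is analyzed the same way as the documents can be shown with a toy example. This is a minimal sketch of the idea only (lowercase, split into words, drop a few stop words). It is not how StandardAnalyzer is implemented, and ToyAnalyzer, analyze and the three-word stop list are assumptions made for illustration.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Locale;
import java.util.Set;

// Toy text analysis: NOT Lucene's StandardAnalyzer, just the general idea.
final class ToyAnalyzer {

    // Illustrative stop-word list, far shorter than a real analyzer's.
    private static final Set<String> STOP_WORDS =
            new HashSet<>(Arrays.asList("a", "an", "the"));

    // Lowercases the text, splits it on anything that is not an ASCII
    // letter or digit, and drops stop words. Non-ASCII characters are
    // treated as separators, which a real analyzer would not do.
    static List<String> analyze(String text) {
        List<String> terms = new ArrayList<>();
        for (String token : text.toLowerCase(Locale.ROOT).split("[^a-z0-9]+")) {
            if (!token.isEmpty() && !STOP_WORDS.contains(token)) {
                terms.add(token);
            }
        }
        return terms;
    }

    public static void main(String[] args) {
        // Indexed text and query text go through the same analysis, so the
        // query term can match the indexed term despite different casing.
        System.out.println(analyze("The Vector class"));
        System.out.println(analyze("a VECTOR"));
    }
}
```

Because both the indexed text and the query reduce to the same term, a search for "a VECTOR" can find a document containing "The Vector class". If the two sides were analyzed differently, the terms might not line up.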


@ -114,11 +114,10 @@ limitations under the License.
<tr><td>
<blockquote>
<p>
This document is intended as a "getting started" guide to installing and running the
Apache Lucene web application demo. This guide assumes that you have read the
information in the previous two examples or already know it anyhow. We'll use
Tomcat 4.0.1 as our reference web container. These demos should work with nearly
any container, but it is up to you to adapt them appropriately.
This document is intended as a "getting started" guide to installing and running the Lucene
web application demo. This guide assumes that you have read the information in the previous two
examples. We'll use Tomcat as our reference web container. These demos should work with nearly any
container, but you may have to adapt them appropriately.
</p>
</blockquote>
</p>
@ -134,12 +133,11 @@ any container, but it is up to you to adapt them appropriately.
<tr><td>
<blockquote>
<p>
The Lucene Web Application demo is a template web application intended for deployment
on Tomcat or a similar web container. It's NOT designed as a "best practices"
implementation by ANY means. It's more of a "hello world" type Lucene Web App.
The purpose of this application is to demonstrate Lucene. With that being said,
it should be relatively simple to create a small searchable website in Tomcat or
a similar application server.
The Lucene Web Application demo is a template web application intended for deployment on Tomcat or a
similar web container. It's NOT designed as a "best practices" implementation by ANY means. It's
more of a "hello world" type Lucene Web App. The purpose of this application is to demonstrate
Lucene. With that being said, it should be relatively simple to create a small searchable website
in Tomcat or a similar application server.
</p>
</blockquote>
</p>
@ -154,18 +152,19 @@ a similar application server.
</td></tr>
<tr><td>
<blockquote>
<p>
Once you've gotten this far you're probably itching to go.
Let's start by creating the index you'll need for the web examples.
Since you've already set your classpath in the previous examples,
all you need to do is type
<b> "java org.apache.lucene.demo.IndexHTML -create -index {index-dir} .."</b>.
You'll need to do this from a (any) subdirectory of your {tomcat}/webapps directory (make sure you didn't leave off the ".." or you'll get a null pointer
exception).
{index-dir}
should be a directory that Tomcat has permission to read and write, but is
outside of a web accessible context. By default the webapp is configured
to look in <b>/opt/lucene/index</b> for this index.
<p> Once you've gotten this far you're probably itching to go. Let's start by creating the index
you'll need for the web examples. Since you've already set your CLASSPATH in the previous examples,
all you need to do is type:
<pre>
java org.apache.lucene.demo.IndexHTML -create -index {index-dir} ..
</pre>
You'll need to do this from a (any) subdirectory of your <code>{tomcat}/webapps</code> directory
(make sure you didn't leave off the <code>..</code> or you'll get a null pointer exception).
<code>{index-dir}</code> should be a directory that Tomcat has permission to read and write, but is
outside of a web accessible context. By default the webapp is configured to look in
<code>/opt/lucene/index</code> for this index.
</p>
</blockquote>
</p>
@ -180,10 +179,10 @@ to look in <b>/opt/lucene/index</b> for this index.
</td></tr>
<tr><td>
<blockquote>
<p>Located in your distribution directory you should see
a war file called luceneweb.war. Copy this to your
{tomcat-home}/webapps directory. You may need to restart
Tomcat. </p>
<p>Located in your distribution directory you should see a war file called
<code>luceneweb.war</code>. If you're working with a Subversion checkout, this will be under the
<code>build</code> subdirectory. Copy this to your <code>{tomcat-home}/webapps</code> directory.
You may need to restart Tomcat. </p>
</blockquote>
</p>
</td></tr>
@ -197,15 +196,15 @@ Tomcat. </p>
</td></tr>
<tr><td>
<blockquote>
<p>
From your Tomcat directory look in the webapps/luceneweb subdirectory. If it's not
present, try browsing to "http://localhost:8080/luceneweb" then look again.
Edit a file called configuration.jsp. Ensure that the indexLocation is equal to the
location you used for your index. You may also customize the appTitle and appFooter
strings as you see fit. Once you have finished altering the configuration you should
restart Tomcat. You may also wish to update the war file by typing
<b>jar -uf luceneweb.war configuration.jsp</b> from the luceneweb subdirectory.
(The -u option is not available in all versions of jar. In this case recreate the war file).
<p> From your Tomcat directory look in the <code>webapps/luceneweb</code> subdirectory. If it's not
present, try browsing to <code>http://localhost:8080/luceneweb</code> (which causes Tomcat to deploy
the webapp), then look again. Edit a file called <code>configuration.jsp</code>. Ensure that the
<code>indexLocation</code> is equal to the location you used for your index. You may also customize
the <code>appTitle</code> and <code>appFooter</code> strings as you see fit. Once you have finished
altering the configuration you may need to restart Tomcat. You may also wish to update the war file
by typing <code>jar -uf luceneweb.war configuration.jsp</code> from the <code>luceneweb</code>
subdirectory. (The -u option is not available in all versions of jar. In this case recreate the
war file).
</p>
</blockquote>
</p>
@ -220,14 +219,15 @@ restart Tomcat. You may also wish to update the war file by typing
</td></tr>
<tr><td>
<blockquote>
<p>Now you're ready to roll. In your browser set the url to "http://localhost:8080/luceneweb"
enter "test" and the number of items per page and press search.</p>
<p>You should now be looking either at a number of results (provided you didn't erase the
Tomcat examples) or nothing. Try other search terms. Depending on the number of items
per page you set and results returned, there may be a link at the bottom that says "more results&gt;&gt;",
clicking it goes to subsequent pages. If you get an error regarding opening the index, then you
probably set the path in "configuration" incorrectly or Tomcat doesn't have permissions to the
index (or you skipped the step of creating it).</p>
<p>Now you're ready to roll. In your browser set the url to
<code>http://localhost:8080/luceneweb</code> enter <code>test</code> and the number of items per
page and press search.</p>
<p>You should now be looking either at a number of results (provided you didn't erase the Tomcat
examples) or nothing. If you get an error regarding opening the index, then you probably set the
path in <code>configuration.jsp</code> incorrectly or Tomcat doesn't have permissions to the index
(or you skipped the step of creating it). Try other search terms. Depending on the number of items
per page you set and results returned, there may be a link at the bottom that says <b>More
Results&gt;&gt;</b>; clicking it takes you to subsequent pages. </p>
</blockquote>
</p>
</td></tr>
@ -242,8 +242,7 @@ index (or you skipped the step of creating it).</p>
<tr><td>
<blockquote>
<p>
If you want to know more about how this web app works or how to customize it then
<a href="demo4.html">read on&gt;&gt;&gt;</a>.
If you want to know more about how this web app works or how to customize it then <a href="demo4.html">read on&gt;&gt;&gt;</a>.
</p>
</blockquote>
</p>


@ -114,10 +114,10 @@ limitations under the License.
<tr><td>
<blockquote>
<p>
In this section we walk through the sources behind the basic Lucene Web Application demo.
Where to find it, its parts, and their function. This section is intended for Java developers
wishing to understand how to use Apache Lucene in their applications or for those involved
in deploying web applications based on Lucene.
In this section we walk through the sources behind the basic Lucene Web Application demo: where to
find them, their parts and their function. This section is intended for Java developers wishing to
understand how to use Lucene in their applications or for those involved in deploying web
applications based on Lucene.
</p>
</blockquote>
</p>
@ -133,13 +133,13 @@ in deploying web applications based on Lucene.
<tr><td>
<blockquote>
<p>
Relative the directory created when you extracted Lucene or retreived it from Subversion, you
should see a directory called "src" which in turn contains a directory called "jsp".
This is the root for all of the Lucene web demo.
Relative to the directory created when you extracted Lucene or retrieved it from Subversion, you
should see a directory called <code>src</code> which in turn contains a directory called
<code>jsp</code>. This is the root for all of the Lucene web demo.
</p>
<p>
Within this directory you should see the index.jsp class. Bring this up in vi or your
editor of choice.
Within this directory you should see <code>index.jsp</code>. Bring this up in vi or your editor of
choice.
</p>
</blockquote>
</p>
@ -155,14 +155,12 @@ editor of choice.
<tr><td>
<blockquote>
<p>
This jsp page is pretty boring by itself. All it does is include a header, display a form and
include a footer. If you look at the form, it has two fields: query (where you enter your
search criteria) and maxresults where you specify the number of results per page. If you look
at the form tag, you'll notice it uses the get method as opposed to the post. While this is
considered deprecated functionality by the latest w3c specs, its unlikely to go away due to the
usefulness of being able to bookmark things like searches. By the structure of this JSP it should
be easy to customize it without even editing this particular file. You could simply change the
header and footer. Let's look at the header.jsp (located in the same directory) next.
This jsp page is pretty boring by itself. All it does is include a header, display a form and
include a footer. If you look at the form, it has two fields: <code>query</code> (where you enter
your search criteria) and <code>maxresults</code> where you specify the number of results per page.
By the structure of this JSP it should be easy to customize it without even editing this particular
file. You could simply change the header and footer. Let's look at the <code>header.jsp</code>
(located in the same directory) next.
</p>
</blockquote>
</p>
@ -178,11 +176,11 @@ header and footer. Let's look at the header.jsp (located in the same directory)
<tr><td>
<blockquote>
<p>
The header is also very simple by itself. The only thing it does is include the configuration.jsp
(which you looked at in the last section of this guide) and set the title and a brief header. This
would be a good place to put your own custom HTML to "pretty" things up a bit. We won't cover the
footer because all it does is display the footer and close your tags. Let's look at the results.jsp,
the meat of this application next.
The header is also very simple by itself. The only thing it does is include the
<code>configuration.jsp</code> (which you looked at in the last section of this guide) and set the
title and a brief header. This would be a good place to put your own custom HTML to "pretty" things
up a bit. We won't cover the footer because all it does is display the footer and close your tags.
Let's look at the <code>results.jsp</code>, the meat of this application, next.
</p>
</blockquote>
</p>
@ -198,43 +196,52 @@ the meat of this application next.
<tr><td>
<blockquote>
<p>
The results.jsp had a lot more functionality. Much of it is for paging the search results we'll not
cover this as it's commented well enough. It does not perform any optimizations such as caching results,
etc. as that would make this a more complex example. The first thing in this page is the actual imports
for the Lucene classes and Lucene demo classes. These classes are loaded from the jars included in the
WEB-INF/lib directory in the final war file.
Most of the functionality lies in <code>results.jsp</code>. Much of it is for paging the search
results, which we'll not cover here as it's commented well enough. The first thing in this page is
the actual imports for the Lucene classes and Lucene demo classes. These classes are loaded from
the jars included in the <code>WEB-INF/lib</code> directory in the <code>luceneweb.war</code> file.
</p>
<p>
You'll notice that this file includes the same header and footer as the "index.jsp". From there the jsp
constructs an IndexSearcher with the "indexLocation" that was specified in the "configuration.jsp". If there
is an error of any kind in opening the index, it is diplayed to the user and a boolean flag is set to tell
the rest of the sections of the jsp not to continue.
You'll notice that this file includes the same header and footer as <code>index.jsp</code>. From
there it constructs an <code><a href="api/org/apache/lucene/search/IndexSearcher.html">IndexSearcher</a></code> with the
<code>indexLocation</code> that was specified in <code>configuration.jsp</code>. If there is an
error of any kind in opening the index, it is displayed to the user and the boolean flag
<code>error</code> is set to tell the rest of the sections of the jsp not to continue.
</p>
<p>
From there, this jsp attempts to get the search criteria, the start index (used for paging) and the maximum
number of results per page. If the maximum results per page is not set or not valid then it and the
start index are set to default values. If only the start index is invalid it is set to a default value. If
the criteria isn't provided then a servlet error is thrown (it is assumed that this is the result of url tampering
or some form of browser malfunction).
From there, this jsp attempts to get the search criteria, the start index (used for paging) and the
maximum number of results per page. If the maximum results per page is not set or not valid then it
and the start index are set to default values. If only the start index is invalid it is set to a
default value. If the criteria isn't provided then a servlet error is thrown (it is assumed that
this is the result of url tampering or some form of browser malfunction).
</p>
<p>
The jsp moves on to construct a StandardAnalyzer just as in the simple demo, to analyze the search critieria, it
is passed to the QueryParser along with the criteria to construct a Query object. You'll also notice the
string literal "contents" included. This is to specify the search should include the contents and not
the title, url or some other field in the indexed documents. If there is any error in constructing a Query
object an error is displayed to the user.
The jsp moves on to construct a <code><a href="api/org/apache/lucene/analysis/standard/StandardAnalyzer.html">StandardAnalyzer</a></code> to
analyze the search text. This matches the analyzer used during indexing (<code><a href="api/org/apache/lucene/demo/IndexHTML.html">IndexHTML</a></code>), which is generally
recommended. This is passed to the <code><a href="api/org/apache/lucene/queryParser/QueryParser.html">QueryParser</a></code> along with the
criteria to construct a <code><a href="api/org/apache/lucene/search/Query.html">Query</a></code>
object. You'll also notice the string literal <code>"contents"</code> included. This specifies
that the search should cover the <code>contents</code> field and not the <code>title</code>,
<code>url</code> or some other field in the indexed documents. If there is any error in
constructing a <code><a href="api/org/apache/lucene/search/Query.html">Query</a></code> object an
error is displayed to the user.
</p>
<p>
In the next section of the jsp the IndexSearcher is asked to search given the query object. The results are
returned in a collection called "hits". If the length property of the hits collection is 0 then an error
is displayed to the user and the error flag is set.
In the next section of the jsp the <code><a href="api/org/apache/lucene/search/IndexSearcher.html">IndexSearcher</a></code> is asked to search
given the query object. The results are returned in a collection called <code>hits</code>. If the
length property of the <code>hits</code> collection is 0 (meaning there were no results) then an
error is displayed to the user and the error flag is set.
</p>
<p>
Finally the jsp iterates through the hits collection and displayed properties of the "Document" objects we talked
about in the first walkthrough. These objects contain "known" fields specific to their indexer (in this case
"IndexHTML" constructs a document with "url", "title" and "contents"). You'll notice that these results are paged
but the search is repeated every time. This is an area where optimization could improve performance for large
result sets.
Finally the jsp iterates through the <code>hits</code> collection, taking the current page into
account, and displays properties of the <code><a href="api/org/apache/lucene/document/Document.html">Document</a></code> objects we talked about in
the first walkthrough. These objects contain "known" fields specific to their indexer (in this case
<code><a href="api/org/apache/lucene/demo/IndexHTML.html">IndexHTML</a></code> constructs a document
with "url", "title" and "contents").
</p>
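<p>
The paging step described above can be sketched in plain Java. This is only an illustration: the
<code>HitPager</code> class and the <code>startIndex</code> and <code>maxPerPage</code> parameter names are
hypothetical and are not taken from the demo jsp, and an ordinary <code>List</code> stands in for the
<code>hits</code> collection.
</p>

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical helper, not part of the Lucene demo: selects one page of results.
final class HitPager {

    /** Returns the items starting at startIndex, at most maxPerPage of them. */
    static <T> List<T> page(List<T> hits, int startIndex, int maxPerPage) {
        if (startIndex < 0 || maxPerPage <= 0) {
            throw new IllegalArgumentException("startIndex must be >= 0 and maxPerPage > 0");
        }
        int from = Math.min(startIndex, hits.size());
        // Clamp the page length so the last page may be shorter than maxPerPage.
        int count = Math.min(maxPerPage, hits.size() - from);
        return new ArrayList<>(hits.subList(from, from + count));
    }

    /** True when results remain after the page that starts at startIndex. */
    static boolean hasMore(List<?> hits, int startIndex, int maxPerPage) {
        return hits.size() - startIndex > maxPerPage;
    }

    public static void main(String[] args) {
        List<String> hits = Arrays.asList("a", "b", "c", "d", "e");
        System.out.println(page(hits, 2, 2));    // prints [c, d]
        System.out.println(hasMore(hits, 2, 2)); // prints true
    }
}
```

<p>
A "more results" link would then only be shown when <code>hasMore</code> returns true.
</p>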
<p>
Please note that in a real deployment of Lucene, it's best to instantiate <code><a href="api/org/apache/lucene/search/IndexSearcher.html">IndexSearcher</a></code> and <code><a href="api/org/apache/lucene/queryParser/QueryParser.html">QueryParser</a></code> once, and then
share them across search requests, instead of re-instantiating per search request.
</p>
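<p>
One way to picture the "instantiate once, then share" advice is the small holder below. It is a generic
sketch under stated assumptions: <code>SharedInstance</code> is a hypothetical class, and the factory passed
to it stands in for whatever code would open the searcher. It does not use any Lucene API.
</p>

```java
import java.util.Objects;
import java.util.function.Supplier;

// Hypothetical holder, not part of Lucene: builds an expensive object once
// and hands the same instance to every caller.
final class SharedInstance<T> {
    private final Supplier<T> factory;
    private T instance; // guarded by "this"

    SharedInstance(Supplier<T> factory) {
        this.factory = Objects.requireNonNull(factory);
    }

    /** Creates the instance on first use, then returns the same one on later calls. */
    synchronized T get() {
        if (instance == null) {
            instance = Objects.requireNonNull(factory.get(), "factory returned null");
        }
        return instance;
    }

    public static void main(String[] args) {
        // A StringBuilder stands in for an object that is costly to create.
        SharedInstance<StringBuilder> shared = new SharedInstance<>(StringBuilder::new);
        System.out.println(shared.get() == shared.get()); // prints true
    }
}
```

<p>
In a web application such a holder would typically live in application scope, so that each request calls
<code>get()</code> instead of constructing a new object.
</p>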
</blockquote>
</p>
@ -250,10 +257,11 @@ result sets.
<tr><td>
<blockquote>
<p>
There are additional sources used by the web app that were not specifically covered by either walkthrough. For
example the HTML parser, the IndexHTML class and HTMLDocument class. These are very similar to the classes
covered in the first example, however they have properties sepecific to parsing and indexing HTML. This is
beyond our scope; however, by now you should feel like you're "getting started" with Lucene.
There are additional sources used by the web app that were not specifically covered by either
walkthrough. For example the HTML parser, the <code><a href="api/org/apache/lucene/demo/IndexHTML.html">IndexHTML</a></code> class and <code><a href="api/org/apache/lucene/demo/HTMLDocument.html">HTMLDocument</a></code> class. These are very
similar to the classes covered in the first example, with properties specific to parsing and
indexing HTML. This is beyond our scope; however, by now you should feel like you're "getting
started" with Lucene.
</p>
</blockquote>
</p>
@ -263,24 +271,26 @@ beyond our scope; however, by now you should feel like you're "getting started"
<table border="0" cellspacing="0" cellpadding="2" width="100%">
<tr><td bgcolor="#525D76">
<font color="#ffffff" face="arial,helvetica,sanserif">
<a name="Where to go from here? (Everyone!)"><strong>Where to go from here? (Everyone!)</strong></a>
<a name="Where to go from here? (everyone!)"><strong>Where to go from here? (everyone!)</strong></a>
</font>
</td></tr>
<tr><td>
<blockquote>
<p>
There are a number of things this demo doesn't do or doesn't do quite right. For instance, you may
There are a number of things this demo doesn't do or doesn't do quite right. For instance, you may
have noticed that documents in the root context are unreachable (unless you reconfigure Tomcat to
support that context or redirect to it), anywhere where the directory doesn't quite match the context mapping,
you'll have a broken link in your results. If you want to index non-local files or have some other
needs this isn't supported, plus there may be security issues with running the indexing application from
your webapps directory. There are a number of things left for you the implementor or developer to do.
support that context or redirect to it), anywhere where the directory doesn't quite match the
context mapping, you'll have a broken link in your results. If you want to index non-local files or
have some other needs this isn't supported, plus there may be security issues with running the
indexing application from your webapps directory. There are a number of things left for you the
developer to do.
</p>
<p>
In time some of these things may be added to Lucene as features (if you've got a good idea we'd love to hear it!),
but for now: this is where you begin and the search engine/indexer ends. Lastly, one would assume you'd
want to follow the above advice and customize the application to look a little more fancy than black on
white with "Lucene Template" at the top. We'll see you on the Lucene Users' or Developers' mailing lists!
In time some of these things may be added to Lucene as features (if you've got a good idea we'd love
to hear it!), but for now: this is where you begin and the search engine/indexer ends. Lastly, one
would assume you'd want to follow the above advice and customize the application to look a little
more fancy than black on white with "Lucene Template" at the top. We'll see you on the Lucene
Users' or Developers' <a href="mailinglists.html">mailing lists</a>!
</p>
</blockquote>
</p>
@ -296,11 +306,12 @@ white with "Lucene Template" at the top. We'll see you on the Lucene Users' or
<tr><td>
<blockquote>
<p>
Please resist the urge to contact the authors of this document (without bribes of fame and fortune attached). First
contact the <a href="http://lucene.apache.org/java/docs/mailinglists.html">mailing lists</a>. That being said feedback,
and modifications to this document and samples are ever so greatly appreciated. They are just best sent to the
lists so that everyone can share in them. Certainly you'll get the most help there as well.
Thanks for understanding.
Please resist the urge to contact the authors of this document (without bribes of fame and fortune
attached). First contact the <a href="mailinglists.html">mailing lists</a>, taking care to <a href="http://www.catb.org/~esr/faqs/smart-questions.html">Ask Questions The Smart Way</a>.
Certainly you'll get the most help that way as well. That being said, feedback, and modifications
to this document and samples are ever so greatly appreciated. They are just best sent to the lists
or <a href="http://wiki.apache.org/jakarta-lucene/HowToContribute">posted as patches</a>, so that
everyone can share in them. Thanks for understanding!
</p>
</blockquote>
</p>


@ -114,40 +114,38 @@ limitations under the License.
<tr><td>
<blockquote>
<p>
This document is intended as a "getting started" guide. It has three basic
audiences: novices looking to install Apache Lucene on their application or
web server, developers looking to modify or base the applications they develop
on Lucene, and developers looking to become involved in and contribute to the
development of Lucene. This document is written in tutorial and walkthrough
format. It intends to help you in "getting started", but does not go into great
depth into some of the conceptual or inner details of Apache Lucene.
This document is intended as a "getting started" guide. It has three audiences: first-time users
looking to install Apache Lucene in their application or web server; developers looking to modify or base
the applications they develop on Lucene; and developers looking to become involved in and contribute
to the development of Lucene. This document is written in tutorial and walk-through format. The
goal is to help you "get started". It does not go into great depth on some of the conceptual or
inner details of Lucene.
</p>
<p>
Each section listed below builds on one another. That being said more advanced users may
wish to skip sections.
Each section listed below builds on one another. More advanced users
may wish to skip sections.
</p>
<ul>
<li><a href="demo.html">About the basic Lucene demo and its usage</a>.
This section is intended for anyone who wants a basic background on using the provided Lucene demos.</li>
<li><a href="demo.html">About the command-line Lucene demo and its usage</a>. This section
is intended for anyone who wants to use the command-line Lucene demo.</li> <p />
<li><a href="demo2.html">About the sources and implementation
for the basic Lucene demo</a> section we walk through . This section is intended for developers.</li>
<li><a href="demo2.html">About the sources and implementation for the command-line Lucene
demo</a>. This section walks through the implementation details (sources) of the
command-line Lucene demo. This section is intended for developers.</li> <p />
<li><a href="demo3.html">About installing
and configuring the template web application</a>. While this walkthrough assumes
Tomcat 4.0.x as your container of choice, there is no reason you can't (provided you have
the requisite knowledge) adapt the instructions to your container. This section is intended
for those responsible for the development or deployment of Lucene-based web applications.</li>
<li><a href="demo3.html">About installing and configuring the demo template web
application</a>. While this walk-through assumes Tomcat as your container of choice,
there is no reason you can't (provided you have the requisite knowledge) adapt the
instructions to your container. This section is intended for those responsible for the
development or deployment of Lucene-based web applications.</li> <p />
<li><a href="demo4.html">About the sources used to construct the demo template web
application</a>. Please note the template application is designed to highlight features of
Lucene and is <b>not</b> an example of best practices. (One would hopefully use MVC
architecture such as provided by Jakarta Struts and taglibs, but showing you how to do that
would be WAY beyond the scope of this guide.) This section is intended for developers and
those wishing to customize the demo template web application to their needs. </li>
<li><a href="demo4.html">About the sources used to construct the
template web application</a>. Please note the template application is designed to highlight
features of Lucene and is <b>not</b> an example of best practices. (One would hopefully
use MVC architecture such as provided by Jakarta Struts and taglibs, or better yet XML
with stylesheets, but showing you how to do that would be WAY beyond the scope of this
demonstration. Additionally one could cache results, and perform other performance
optimizations, but those are beyond the scope of this demo).
This section is intended for developers and those wishing to customize the template web
application to their needs. The sections useful to developers only are clearly delineated.</li>
</ul>
</blockquote>
</p>


@ -1,4 +1,4 @@
<%@ page import = " javax.servlet.*, javax.servlet.http.*, java.io.*, org.apache.lucene.analysis.*, org.apache.lucene.document.*, org.apache.lucene.index.*, org.apache.lucene.search.*, org.apache.lucene.queryParser.*, org.apache.lucene.demo.*, org.apache.lucene.demo.html.Entities, java.net.URLEncoder" %>
<%@ page import = " javax.servlet.*, javax.servlet.http.*, java.io.*, org.apache.lucene.analysis.*, org.apache.lucene.analysis.standard.StandardAnalyzer, org.apache.lucene.document.*, org.apache.lucene.index.*, org.apache.lucene.search.*, org.apache.lucene.queryParser.*, org.apache.lucene.demo.*, org.apache.lucene.demo.html.Entities, java.net.URLEncoder" %>
<%
/*
@ -76,7 +76,7 @@ public String escapeHTML(String s) {
//query string so you get the
//treatment
Analyzer analyzer = new StopAnalyzer(); //construct our usual analyzer
Analyzer analyzer = new StandardAnalyzer(); //construct our usual analyzer
try {
QueryParser qp = new QueryParser("contents", analyzer);
query = qp.parse(queryString); //parse the
@ -126,8 +126,11 @@ public String escapeHTML(String s) {
<%
Document doc = hits.doc(i); //get the next document
String doctitle = doc.get("title"); //get its title
String url = doc.get("url"); //get its url field
if ((doctitle == null) || doctitle.equals("")) //use the url if it has no title
String url = doc.get("path"); //get its path field
if (url != null && url.startsWith("../webapps/")) { // strip off ../webapps prefix if present
url = url.substring(10);
}
if ((doctitle == null) || doctitle.equals("")) //use the path if it has no title
doctitle = url;
//then output!
%>


@ -8,49 +8,58 @@
<section name="About this Document">
<p>
This document is intended as a "getting started" guide to using and running the
Apache Lucene demos. It walks you through some basic installation and configuration.
This document is intended as a "getting started" guide to using and running the Lucene demos.
It walks you through some basic installation and configuration.
</p>
</section>
<section name="About the Demos">
<p>
The Lucene Demo code is a set of command line example applications that demonstrate various
functionality of Lucene and how one should go about adding it to their
applications.
The Lucene command-line demo code consists of two applications that demonstrate various
functionalities of Lucene and how one should go about adding Lucene to their applications.
</p>
</section>
<section name="Setting your classpath">
<section name="Setting your CLASSPATH">
<p>
First, extract the latest Lucene distribution.
First, you should <a href="http://www.apache.org/dyn/closer.cgi/lucene/java/">download</a> the
latest Lucene distribution and then extract it to a working directory. Alternatively, you can <a
href="http://wiki.apache.org/jakarta-lucene/SourceRepository">check out the sources from
Subversion</a>, and then run <code>ant war-demo</code> to generate the JARs and WARs.
</p>
<p>
You should see the Apache Lucene jar file in the directory you created
when you extracted the archive. It should be named something like
<b>lucene-{version}.jar</b>.
</p>
<p>
You should also see a file called called <b>lucene-demos-{version}.jar</b>.
Put both of these files in your Java CLASSPATH.
You should see the Lucene JAR file in the directory you created when you extracted the archive. It
should be named something like <code>lucene-core-{version}.jar</code>. You should also see a file
called <code>lucene-demos-{version}.jar</code>. If you checked out the sources from Subversion then
the JARs are located under the <code>build</code> subdirectory (after running <code>ant</code>
successfully). Put both of these files in your Java CLASSPATH.
</p>
</section>
<section name="Indexing Files">
<p>
Once you've gotten this far you're probably itching to go. Let's <b> build an index!</b>
Assuming you've set your classpath correctly, just type
"java org.apache.lucene.demo.IndexFiles {full-path-to-lucene}/src". This will produce
a subdirectory called "index" which will contain an index of all of the Lucene
sourcecode.
Once you've gotten this far you're probably itching to go. Let's <b>build an index!</b> Assuming
you've set your CLASSPATH correctly, just type:
<pre>
java org.apache.lucene.demo.IndexFiles {full-path-to-lucene}/src
</pre>
This will produce a subdirectory called <code>index</code> which will contain an index of all of the
Lucene source code.
</p>
<p>
<b> To search the index </b> type "java org.apache.lucene.demo.SearchFiles". You'll be prompted
for a query. Type in a swear word and press the enter key. You'll see that the Lucene
developers are very well mannered and get no results. Now try entering the word "vector".
That should return a whole bunch of documents. The results will page at every tenth
result and ask you whether you want more results.
To <b>search the index</b> type:
<pre>
java org.apache.lucene.demo.SearchFiles
</pre>
You'll be prompted for a query. Type in a swear word and press the enter key. You'll see that the
Lucene developers are very well mannered and get no results. Now try entering the word "vector".
That should return a whole bunch of documents. The results will page at every tenth result and ask
you whether you want more results.
</p>
</section>


@ -2,89 +2,132 @@
<document>
<properties>
<author email="acoliver@apache.org">Andrew C. Oliver</author>
<title>Apache Lucene - Basic Demo Sources Walkthrough</title>
<title>Apache Lucene - Basic Demo Sources Walk-through</title>
</properties>
<body>
<section name="About the Code">
<p>
In this section we walk through the sources behind the basic Lucene demo such as where to
find it, its parts and their function. This section is intended for Java developers
wishing to understand how to use Apache Lucene in their applications.
In this section we walk through the sources behind the command-line Lucene demo: where to find them,
their parts and their function. This section is intended for Java developers wishing to understand
how to use Lucene in their applications.
</p>
</section>
<section name="Location of the source">
<p>
Relative to the directory created when you extracted Lucene or retreived it from Subversion, you
should see a directory called "src" which in turn contains a directory called "demo".
This is the root for all of the Lucene demos. Under this directory is org/apache/lucene/demo,
this is where all the Java sources live.
Relative to the directory created when you extracted Lucene or retrieved it from Subversion, you
should see a directory called <code>src</code> which in turn contains a directory called
<code>demo</code>. This is the root for all of the Lucene demos. Under this directory is
<code>org/apache/lucene/demo</code>. This is where all the Java sources for the demos live.
</p>
<p>
Within this directory you should see the IndexFiles class we executed earlier. Bring that
up in vi or your alternative text editor and lets take a look at it.
Within this directory you should see the <code>IndexFiles.java</code> class we executed earlier.
Bring it up in <code>vi</code> or your editor of choice and let's take a look at it.
</p>
</section>
<section name="IndexFiles">
<p>
As we discussed in the previous walkthrough, the IndexFiles class creates a Lucene Index.
Lets take a look at how it does this.
As we discussed in the previous walk-through, the <code><a
href="api/org/apache/lucene/demo/IndexFiles.html">IndexFiles</a></code> class creates a Lucene
Index. Let's take a look at how it does this.
</p>
<p>
The first substantial thing the main function does is instantiate an instance
of IndexWriter. It passes a string called "index" and a new instance of a class called
"StandardAnalyzer". The "index" string is the name of the directory that all index information
should be stored in. Because we're not passing any path information, one must assume this
will be created as a subdirectory of the current directory (if it does not already exist). On
some platforms this may actually result in it being created in other directories (such as
the user's home directory).
The first substantial thing the <code>main</code> function does is instantiate <code><a
href="api/org/apache/lucene/index/IndexWriter.html">IndexWriter</a></code>. It passes the string
"<code>index</code>" and a new instance of a class called <code><a
href="api/org/apache/lucene/analysis/standard/StandardAnalyzer.html">StandardAnalyzer</a></code>.
The "<code>index</code>" string is the name of the filesystem directory where all index information
should be stored. Because we're not passing a full path, this will be created as a subdirectory of
the current working directory (if it does not already exist). On some platforms, it may be created
in other directories (such as the user's home directory).
</p>
<p>
The <b>IndexWriter</b> is the main class responsible for creating indicies. To use it you
must instantiate it with a path that it can write the index into, if this path does not
exist it will create it, otherwise it will refresh the index living at that path. You
must a also pass an instance of <b>org.apache.lucene.analysis.Analyzer</b>.
The <code><a href="api/org/apache/lucene/index/IndexWriter.html">IndexWriter</a></code> is the main
class responsible for creating indices. To use it you must instantiate it with a path that it can
write the index into. If this path does not exist it will first create it. Otherwise it will
refresh the index at that path. You can also create an index using one of the subclasses of <code><a
href="api/org/apache/lucene/store/Directory.html">Directory</a></code>. In any case, you must also pass an
instance of <code><a
href="api/org/apache/lucene/analysis/Analyzer.html">org.apache.lucene.analysis.Analyzer</a></code>.
</p>
<p>
The <b>Analyzer</b>, in this case, the <b>StandardAnalyzer</b> is little more than a standard Java
Tokenizer, converting all strings to lowercase and filtering out useless words and characters from the index.
By useless words and characters I mean common language words such as articles (a, an, the, etc.) and other
strings that would be useless for searching (e.g. <b>'s</b>) . It should be noted that there are different
rules for every language, and you should use the proper analyzer for each. Lucene currently
provides Analyzers for English and German, more can be found in the Lucene Sandbox.
The particular <code><a href="api/org/apache/lucene/analysis/Analyzer.html">Analyzer</a></code> we
are using, <code><a
href="api/org/apache/lucene/analysis/standard/StandardAnalyzer.html">StandardAnalyzer</a></code>, is
little more than a standard Java Tokenizer, converting all strings to lowercase and filtering out
useless words and characters from the index. By useless words and characters I mean common language
words such as articles (a, an, the, etc.) and other strings that would be useless for searching
(e.g. <b>'s</b>) . It should be noted that there are different rules for every language, and you
should use the proper analyzer for each. Lucene currently provides Analyzers for a number of
different languages (see the <code>*Analyzer.java</code> sources under <a
href="http://svn.apache.org/repos/asf/lucene/java/trunk/contrib/analyzers/src/java/org/apache/lucene/analysis/">contrib/analyzers/src/java/org/apache/lucene/analysis</a>).
</p>
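<p>
To make the lowercasing and stop-word filtering concrete, here is a toy sketch in plain Java. It is not
Lucene's <code>StandardAnalyzer</code> and does not reproduce its rules: <code>ToyAnalyzer</code>, its
three-word stop list and its handling of <code>'s</code> are assumptions chosen only to mirror the
description above.
</p>

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Locale;
import java.util.Set;

// Toy illustration only; Lucene's real analyzers are more involved.
final class ToyAnalyzer {
    // Assumed, deliberately tiny stop-word list (not Lucene's actual list).
    private static final Set<String> STOP_WORDS =
            new HashSet<>(Arrays.asList("a", "an", "the"));

    /** Splits text into lowercase tokens, dropping stop words and possessive 's. */
    static List<String> analyze(String text) {
        List<String> tokens = new ArrayList<>();
        // Split on anything that is not a letter, a digit or an apostrophe.
        for (String raw : text.split("[^\\p{L}\\p{N}']+")) {
            String token = raw.toLowerCase(Locale.ROOT);
            if (token.endsWith("'s")) {
                token = token.substring(0, token.length() - 2);
            }
            token = token.replace("'", ""); // drop any remaining apostrophes
            if (!token.isEmpty() && !STOP_WORDS.contains(token)) {
                tokens.add(token);
            }
        }
        return tokens;
    }

    public static void main(String[] args) {
        System.out.println(analyze("The Quick fox's den, an example"));
        // prints [quick, fox, den, example]
    }
}
```

<p>
Running the same function over both the indexed text and the query text is what lets a search for
<code>Fox</code> match a document containing <code>fox's</code>.
</p>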
<p>
Looking down further in the file, you should see the indexDocs() code. This recursive function
simply crawls the directories and uses FileDocument to create Document objects. The Document
is simply a data object to represent the content in the file as well as its creation time and
location. These instances are added to the indexWriter. Take a look inside FileDocument. It's
not particularly complicated, it just adds fields to the Document.
Looking further down in the file, you should see the <code>indexDocs()</code> code. This recursive
function simply crawls the directories and uses <code><a
href="api/org/apache/lucene/demo/FileDocument.html">FileDocument</a></code> to create <code><a
href="api/org/apache/lucene/document/Document.html">Document</a></code> objects. The <code><a
href="api/org/apache/lucene/document/Document.html">Document</a></code> is simply a data object to
represent the content in the file as well as its creation time and location. These instances are
added to the <code>indexWriter</code>. Take a look inside <code><a
href="api/org/apache/lucene/demo/FileDocument.html">FileDocument</a></code>. It's not particularly
complicated. It just adds fields to the <code><a
href="api/org/apache/lucene/document/Document.html">Document</a></code>.
</p>
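<p>
The recursive crawl can be sketched without Lucene as follows. This is a simplified stand-in rather than the
demo's actual code: <code>CrawlSketch</code> is a hypothetical class, a plain <code>Map</code> plays the role
of a document, and the field names <code>path</code> and <code>modified</code> are assumptions made for
illustration.
</p>

```java
import java.io.File;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Simplified stand-in for the crawl described above; no Lucene classes are used.
final class CrawlSketch {

    /** Walks file recursively and appends one field map per readable regular file. */
    static void indexDocs(File file, List<Map<String, String>> out) {
        if (!file.canRead()) {
            return; // skip unreadable (or nonexistent) entries
        }
        if (file.isDirectory()) {
            File[] children = file.listFiles();
            if (children != null) { // null signals an I/O error
                Arrays.sort(children); // stable, predictable visiting order
                for (File child : children) {
                    indexDocs(child, out);
                }
            }
        } else {
            Map<String, String> doc = new LinkedHashMap<>();
            doc.put("path", file.getPath());                         // location
            doc.put("modified", Long.toString(file.lastModified())); // timestamp
            out.add(doc);
        }
    }

    public static void main(String[] args) {
        List<Map<String, String>> docs = new ArrayList<>();
        indexDocs(new File(args.length > 0 ? args[0] : "."), docs);
        System.out.println(docs.size() + " files visited");
    }
}
```

<p>
In the real demo each visited file would instead be turned into a document and added to the index writer.
</p>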
<p>
As you can see there isn't much to creating an index. The devil is in the details. You may also
wish to examine the other samples in this directory, particularly the IndexHTML class. It is
a bit more complex but builds upon this example.
wish to examine the other samples in this directory, particularly the <code><a
href="api/org/apache/lucene/demo/IndexHTML.html">IndexHTML</a></code> class. It is a bit more
complex but builds upon this example.
</p>
</section>
<section name="Searching Files">
<p>
The SearchFiles class is quite simple. It primarily collaborates with an IndexSearcher, StandardAnalyzer
(which is used in the IndexFiles class as well) and a QueryParser. The query parser is constructed
with an analyzer used to interperate your query in the same way the Index was interperated: finding
the end of words and removing useless words like 'a', 'an' and 'the'. The Query object contains the
results from the QueryParser which is passed to the searcher. The searcher results are returned in
a collection of Documents called "Hits" which is then iterated through and displayed to the user.
The <code><a href="api/org/apache/lucene/demo/SearchFiles.html">SearchFiles</a></code> class is
quite simple. It primarily collaborates with an <code><a
href="api/org/apache/lucene/search/IndexSearcher.html">IndexSearcher</a></code>, <code><a
href="api/org/apache/lucene/analysis/standard/StandardAnalyzer.html">StandardAnalyzer</a></code>
(which is used in the <code><a
href="api/org/apache/lucene/demo/IndexFiles.html">IndexFiles</a></code> class as well) and a
<code><a href="api/org/apache/lucene/queryParser/QueryParser.html">QueryParser</a></code>. The
query parser is constructed with an analyzer used to interpret your query text in the same way the
documents are interpreted: finding the end of words and removing useless words like 'a', 'an' and
'the'. The <code><a href="api/org/apache/lucene/search/Query.html">Query</a></code> object contains
the results from the <code><a
href="api/org/apache/lucene/queryParser/QueryParser.html">QueryParser</a></code> which is passed to
the searcher. Note that it's also possible to programmatically construct a rich <code><a
href="api/org/apache/lucene/search/Query.html">Query</a></code> object without using the query
parser. The query parser just enables decoding the <a href="queryparsersyntax.html">Lucene query
syntax</a> into the corresponding <code><a
href="api/org/apache/lucene/search/Query.html">Query</a></code> object. The searcher results are
returned in a collection of Documents called <code><a
href="api/org/apache/lucene/search/Hits.html">Hits</a></code> which is then iterated through and
displayed to the user.
</p>
</section>
<section name="The Web example...">
<p>
<a href="demo3.html">read on&gt;&gt;&gt;</a>
</p>
</section>
</body>


@ -9,77 +9,75 @@
<section name="About this Document">
<p>
This document is intended as a "getting started" guide to installing and running the
Apache Lucene web application demo. This guide assumes that you have read the
information in the previous two examples or already know it anyhow. We'll use
Tomcat 4.0.1 as our reference web container. These demos should work with nearly
any container, but it is up to you to adapt them appropriately.
This document is intended as a "getting started" guide to installing and running the Lucene
web application demo. This guide assumes that you have read the information in the previous two
examples. We'll use Tomcat as our reference web container. These demos should work with nearly any
container, but you may have to adapt them appropriately.
</p>
</section>
<section name="About the Demos">
<p>
The Lucene Web Application demo is a template web application intended for deployment
on Tomcat or a similar web container. It's NOT designed as a "best practices"
implementation by ANY means. It's more of a "hello world" type Lucene Web App.
The purpose of this application is to demonstrate Lucene. With that being said,
it should be relatively simple to create a small searchable website in Tomcat or
a similar application server.
The Lucene Web Application demo is a template web application intended for deployment on Tomcat or a
similar web container. It's NOT designed as a "best practices" implementation by ANY means. It's
more of a "hello world" type Lucene Web App. The purpose of this application is to demonstrate
Lucene. With that being said, it should be relatively simple to create a small searchable website
in Tomcat or a similar application server.
</p>
</section>
<section name="Indexing Files">
<p>
Once you've gotten this far you're probably itching to go.
Let's start by creating the index you'll need for the web examples.
Since you've already set your classpath in the previous examples,
all you need to do is type
<b> "java org.apache.lucene.demo.IndexHTML -create -index {index-dir} .."</b>.
You'll need to do this from a (any) subdirectory of your {tomcat}/webapps directory (make sure you didn't leave off the ".." or you'll get a null pointer
exception).
{index-dir}
should be a directory that Tomcat has permission to read and write, but is
outside of a web accessible context. By default the webapp is configured
to look in <b>/opt/lucene/index</b> for this index.
<p> Once you've gotten this far you're probably itching to go. Let's start by creating the index
you'll need for the web examples. Since you've already set your CLASSPATH in the previous examples,
all you need to do is type:
<pre>
java org.apache.lucene.demo.IndexHTML -create -index {index-dir} ..
</pre>
You'll need to do this from a (any) subdirectory of your <code>{tomcat}/webapps</code> directory
(make sure you didn't leave off the <code>..</code> or you'll get a null pointer exception).
<code>{index-dir}</code> should be a directory that Tomcat has permission to read and write, but is
outside of a web accessible context. By default the webapp is configured to look in
<code>/opt/lucene/index</code> for this index.
</p>
</section>
<section name="Deploying the Demos">
<p>Located in your distribution directory you should see
a war file called luceneweb.war. Copy this to your
{tomcat-home}/webapps directory. You may need to restart
Tomcat. </p>
</section>
<p>Located in your distribution directory you should see a war file called
<code>luceneweb.war</code>. If you're working with a Subversion checkout, this will be under the
<code>build</code> subdirectory. Copy this to your <code>{tomcat-home}/webapps</code> directory.
You may need to restart Tomcat. </p> </section>
<section name="Configuration">
<p>
From your Tomcat directory look in the webapps/luceneweb subdirectory. If it's not
present, try browsing to "http://localhost:8080/luceneweb" then look again.
Edit a file called configuration.jsp. Ensure that the indexLocation is equal to the
location you used for your index. You may also customize the appTitle and appFooter
strings as you see fit. Once you have finished altering the configuration you should
restart Tomcat. You may also wish to update the war file by typing
<b>jar -uf luceneweb.war configuration.jsp</b> from the luceneweb subdirectory.
(The -u option is not available in all versions of jar. In this case recreate the war file).
<p> From your Tomcat directory look in the <code>webapps/luceneweb</code> subdirectory. If it's not
present, try browsing to <code>http://localhost:8080/luceneweb</code> (which causes Tomcat to deploy
the webapp), then look again. Edit a file called <code>configuration.jsp</code>. Ensure that the
<code>indexLocation</code> is equal to the location you used for your index. You may also customize
the <code>appTitle</code> and <code>appFooter</code> strings as you see fit. Once you have finished
altering the configuration you may need to restart Tomcat. You may also wish to update the war file
by typing <code>jar -uf luceneweb.war configuration.jsp</code> from the <code>luceneweb</code>
subdirectory. (The -u option is not available in all versions of jar. In this case recreate the
war file).
</p>
</section>
<section name="Running the Demos">
<p>Now you're ready to roll. In your browser set the url to "http://localhost:8080/luceneweb"
enter "test" and the number of items per page and press search.</p>
<p>You should now be looking either at a number of results (provided you didn't erase the
Tomcat examples) or nothing. Try other search terms. Depending on the number of items
per page you set and results returned, there may be a link at the bottom that says "more results>>",
clicking it goes to subsequent pages. If you get an error regarding opening the index, then you
probably set the path in "configuration" incorrectly or Tomcat doesn't have permissions to the
index (or you skipped the step of creating it).</p>
</section>
<p>Now you're ready to roll. In your browser set the url to
<code>http://localhost:8080/luceneweb</code> enter <code>test</code> and the number of items per
page and press search.</p>
<p>You should now be looking either at a number of results (provided you didn't erase the Tomcat
examples) or nothing. If you get an error regarding opening the index, then you probably set the
path in <code>configuration.jsp</code> incorrectly or Tomcat doesn't have permissions to the index
(or you skipped the step of creating it). Try other search terms. Depending on the number of items
per page you set and results returned, there may be a link at the bottom that says <b>More
Results>></b>; clicking it takes you to subsequent pages. </p> </section>
<section name="About the code...">
<p>
If you want to know more about how this web app works or how to customize it then
<a href="demo4.html">read on&gt;&gt;&gt;</a>.
If you want to know more about how this web app works or how to customize it then <a
href="demo4.html">read on&gt;&gt;&gt;</a>.
</p>
</section>


@ -8,124 +8,146 @@
<section name="About the Code">
<p>
In this section we walk through the sources behind the basic Lucene Web Application demo.
Where to find it, its parts, and their function. This section is intended for Java developers
wishing to understand how to use Apache Lucene in their applications or for those involved
in deploying web applications based on Lucene.
In this section we walk through the sources behind the basic Lucene Web Application demo: where to
find them, their parts and their function. This section is intended for Java developers wishing to
understand how to use Lucene in their applications or for those involved in deploying web
applications based on Lucene.
</p>
</section>
<section name="Location of the source (developers/deployers)">
<p>
Relative the directory created when you extracted Lucene or retreived it from Subversion, you
should see a directory called "src" which in turn contains a directory called "jsp".
This is the root for all of the Lucene web demo.
Relative to the directory created when you extracted Lucene or retrieved it from Subversion, you
should see a directory called <code>src</code> which in turn contains a directory called
<code>jsp</code>. This is the root for all of the Lucene web demo.
</p>
<p>
Within this directory you should see the index.jsp class. Bring this up in vi or your
editor of choice.
Within this directory you should see <code>index.jsp</code>. Bring this up in vi or your editor of
choice.
</p>
</section>
<section name="index.jsp (developers/deployers)">
<p>
This jsp page is pretty boring by itself. All it does is include a header, display a form and
include a footer. If you look at the form, it has two fields: query (where you enter your
search criteria) and maxresults where you specify the number of results per page. If you look
at the form tag, you'll notice it uses the get method as opposed to the post. While this is
considered deprecated functionality by the latest w3c specs, its unlikely to go away due to the
usefulness of being able to bookmark things like searches. By the structure of this JSP it should
be easy to customize it without even editing this particular file. You could simply change the
header and footer. Let's look at the header.jsp (located in the same directory) next.
This jsp page is pretty boring by itself. All it does is include a header, display a form and
include a footer. If you look at the form, it has two fields: <code>query</code> (where you enter
your search criteria) and <code>maxresults</code> (where you specify the number of results per page).
By the structure of this JSP it should be easy to customize it without even editing this particular
file. You could simply change the header and footer. Let's look at the <code>header.jsp</code>
(located in the same directory) next.
</p>
</section>
<section name="header.jsp (developers/deployers)">
<p>
The header is also very simple by itself. The only thing it does is include the configuration.jsp
(which you looked at in the last section of this guide) and set the title and a brief header. This
would be a good place to put your own custom HTML to "pretty" things up a bit. We won't cover the
footer because all it does is display the footer and close your tags. Let's look at the results.jsp,
the meat of this application next.
The header is also very simple by itself. The only thing it does is include the
<code>configuration.jsp</code> (which you looked at in the last section of this guide) and set the
title and a brief header. This would be a good place to put your own custom HTML to "pretty" things
up a bit. We won't cover the footer because all it does is display the footer and close your tags.
Let's look at the <code>results.jsp</code>, the meat of this application, next.
</p>
</section>
<section name="results.jsp (developers)">
<p>
The results.jsp had a lot more functionality. Much of it is for paging the search results we'll not
cover this as it's commented well enough. It does not perform any optimizations such as caching results,
etc. as that would make this a more complex example. The first thing in this page is the actual imports
for the Lucene classes and Lucene demo classes. These classes are loaded from the jars included in the
WEB-INF/lib directory in the final war file.
Most of the functionality lies in <code>results.jsp</code>. Much of it is for paging the search
results, which we'll not cover here as it's commented well enough. The first thing in this page is
the actual imports for the Lucene classes and Lucene demo classes. These classes are loaded from
the jars included in the <code>WEB-INF/lib</code> directory in the <code>luceneweb.war</code> file.
</p>
<p>
You'll notice that this file includes the same header and footer as the "index.jsp". From there the jsp
constructs an IndexSearcher with the "indexLocation" that was specified in the "configuration.jsp". If there
is an error of any kind in opening the index, it is diplayed to the user and a boolean flag is set to tell
the rest of the sections of the jsp not to continue.
You'll notice that this file includes the same header and footer as <code>index.jsp</code>. From
there it constructs an <code><a
href="api/org/apache/lucene/search/IndexSearcher.html">IndexSearcher</a></code> with the
<code>indexLocation</code> that was specified in <code>configuration.jsp</code>. If there is an
error of any kind in opening the index, it is displayed to the user and the boolean flag
<code>error</code> is set to tell the rest of the sections of the jsp not to continue.
</p>
<p>
From there, this jsp attempts to get the search criteria, the start index (used for paging) and the maximum
number of results per page. If the maximum results per page is not set or not valid then it and the
start index are set to default values. If only the start index is invalid it is set to a default value. If
the criteria isn't provided then a servlet error is thrown (it is assumed that this is the result of url tampering
or some form of browser malfunction).
From there, this jsp attempts to get the search criteria, the start index (used for paging) and the
maximum number of results per page. If the maximum results per page is not set or not valid then it
and the start index are set to default values. If only the start index is invalid it is set to a
default value. If the criteria isn't provided then a servlet error is thrown (it is assumed that
this is the result of url tampering or some form of browser malfunction).
</p>
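<p>
As a minimal sketch of the parameter handling just described (not the actual code in
<code>results.jsp</code>), the following plain-Java helper applies the same fallback rules. The
class name <code>PageRequest</code>, the default values 0 and 50, the treatment of negative or zero
values as invalid, and the use of <code>IllegalArgumentException</code> in place of a servlet error
are all assumptions made for illustration.
</p>

```java
/** Hypothetical helper that mirrors the fallback rules described above. */
final class PageRequest {
    static final int DEFAULT_START = 0;        // assumed default start index
    static final int DEFAULT_MAX_RESULTS = 50; // assumed default page size

    final String query;
    final int start;
    final int maxResults;

    private PageRequest(String query, int start, int maxResults) {
        this.query = query;
        this.start = start;
        this.maxResults = maxResults;
    }

    /** Parses raw request parameters; any of the three strings may be null. */
    static PageRequest parse(String query, String startAt, String maxResults) {
        // A missing query is treated as an error (stand-in for a servlet error).
        if (query == null) {
            throw new IllegalArgumentException("no query specified");
        }
        int max;
        try {
            max = Integer.parseInt(maxResults); // a null string also throws here
        } catch (NumberFormatException e) {
            max = -1;
        }
        // Page size missing or invalid: reset both page size and start index.
        if (1 > max) {
            return new PageRequest(query, DEFAULT_START, DEFAULT_MAX_RESULTS);
        }
        int start;
        try {
            start = Integer.parseInt(startAt);
        } catch (NumberFormatException e) {
            start = -1;
        }
        // Only the start index is invalid: reset just the start index.
        if (0 > start) {
            start = DEFAULT_START;
        }
        return new PageRequest(query, start, max);
    }

    public static void main(String[] args) {
        PageRequest r = parse("test", "abc", "10");
        System.out.println(r.query + " start=" + r.start + " max=" + r.maxResults);
    }
}
```

<p>
Keeping the parsing in one small method like this makes the fallback rules easy to test without a
servlet container.
</p>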
<p>
The jsp moves on to construct a StandardAnalyzer just as in the simple demo, to analyze the search critieria, it
is passed to the QueryParser along with the criteria to construct a Query object. You'll also notice the
string literal "contents" included. This is to specify the search should include the contents and not
the title, url or some other field in the indexed documents. If there is any error in constructing a Query
object an error is displayed to the user.
The jsp moves on to construct a <code><a
href="api/org/apache/lucene/analysis/standard/StandardAnalyzer.html">StandardAnalyzer</a></code> to
analyze the search text. This matches the analyzer used during indexing (<code><a
href="api/org/apache/lucene/demo/IndexHTML.html">IndexHTML</a></code>), which is generally
recommended. This is passed to the <code><a
href="api/org/apache/lucene/queryParser/QueryParser.html">QueryParser</a></code> along with the
criteria to construct a <code><a href="api/org/apache/lucene/search/Query.html">Query</a></code>
object. You'll also notice the string literal <code>"contents"</code> included. This specifies
that the search should cover the <code>contents</code> field and not the <code>title</code>,
<code>url</code> or some other field in the indexed documents. If there is any error in
constructing a <code><a href="api/org/apache/lucene/search/Query.html">Query</a></code> object an
error is displayed to the user.
</p>
<p>
In the next section of the jsp the IndexSearcher is asked to search given the query object. The results are
returned in a collection called "hits". If the length property of the hits collection is 0 then an error
is displayed to the user and the error flag is set.
In the next section of the jsp the <code><a
href="api/org/apache/lucene/search/IndexSearcher.html">IndexSearcher</a></code> is asked to search
given the query object. The results are returned in a collection called <code>hits</code>. If the
length property of the <code>hits</code> collection is 0 (meaning there were no results) then an
error is displayed to the user and the error flag is set.
</p>
<p>
Finally the jsp iterates through the hits collection and displayed properties of the "Document" objects we talked
about in the first walkthrough. These objects contain "known" fields specific to their indexer (in this case
"IndexHTML" constructs a document with "url", "title" and "contents"). You'll notice that these results are paged
but the search is repeated every time. This is an area where optimization could improve performance for large
result sets.
Finally the jsp iterates through the <code>hits</code> collection, taking the current page into
account, and displays properties of the <code><a
href="api/org/apache/lucene/document/Document.html">Document</a></code> objects we talked about in
the first walkthrough. These objects contain "known" fields specific to their indexer (in this case
<code><a href="api/org/apache/lucene/demo/IndexHTML.html">IndexHTML</a></code> constructs a document
with <code>url</code>, <code>title</code> and <code>contents</code>).
</p>
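<p>
The paging arithmetic involved can be sketched in plain Java as follows. This is an illustration
under stated assumptions, not the code in <code>results.jsp</code>: the class name
<code>PageWindow</code> and its fields are hypothetical, and the page size is assumed to be
positive.
</p>

```java
/** Hypothetical value object describing which hits belong on the current page. */
final class PageWindow {
    final int from;        // index of the first hit to display (inclusive)
    final int to;          // index just past the last hit to display (exclusive)
    final boolean hasMore; // whether a "more results" link should be shown

    private PageWindow(int from, int to, boolean hasMore) {
        this.from = from;
        this.to = to;
        this.hasMore = hasMore;
    }

    /** Computes the window for totalHits results, starting at start, perPage per page. */
    static PageWindow of(int totalHits, int start, int perPage) {
        // Clamp the start index into the range [0, totalHits].
        int from = Math.min(Math.max(start, 0), totalHits);
        // The page ends after perPage hits or at the last hit, whichever comes first.
        int to = (int) Math.min((long) from + perPage, (long) totalHits);
        return new PageWindow(from, to, totalHits > to);
    }

    public static void main(String[] args) {
        PageWindow w = of(25, 20, 10);
        System.out.println("from=" + w.from + " to=" + w.to + " hasMore=" + w.hasMore);
    }
}
```

<p>
A display loop would then visit the indexes from <code>from</code> up to, but not including,
<code>to</code>, and render the link to the next page only when <code>hasMore</code> is true.
</p>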
<p>
Please note that in a real deployment of Lucene, it's best to instantiate <code><a
href="api/org/apache/lucene/search/IndexSearcher.html">IndexSearcher</a></code> and <code><a
href="api/org/apache/lucene/queryParser/QueryParser.html">QueryParser</a></code> once, and then
share them across search requests, instead of re-instantiating per search request.
</p>
</section>
<section name="More sources (developers)">
<p>
There are additional sources used by the web app that were not specifically covered by either walkthrough. For
example the HTML parser, the IndexHTML class and HTMLDocument class. These are very similar to the classes
covered in the first example, however they have properties sepecific to parsing and indexing HTML. This is
beyond our scope; however, by now you should feel like you're "getting started" with Lucene.
There are additional sources used by the web app that were not specifically covered by either
walkthrough, for example the HTML parser, the <code><a
href="api/org/apache/lucene/demo/IndexHTML.html">IndexHTML</a></code> class and the <code><a
href="api/org/apache/lucene/demo/HTMLDocument.html">HTMLDocument</a></code> class. These are very
similar to the classes covered in the first example, with properties specific to parsing and
indexing HTML. This is beyond our scope; however, by now you should feel like you're "getting
started" with Lucene.
</p>
</section>
<section name="Where to go from here? (Everyone!)">
<section name="Where to go from here? (everyone!)">
<p>
There are a number of things this demo doesn't do or doesn't do quite right. For instance, you may
There are a number of things this demo doesn't do or doesn't do quite right. For instance, you may
have noticed that documents in the root context are unreachable (unless you reconfigure Tomcat to
support that context or redirect to it), anywhere where the directory doesn't quite match the context mapping,
you'll have a broken link in your results. If you want to index non-local files or have some other
needs this isn't supported, plus there may be security issues with running the indexing application from
your webapps directory. There are a number of things left for you the implementor or developer to do.
support that context or redirect to it). Also, wherever the directory doesn't quite match the
context mapping, you'll have a broken link in your results. Indexing non-local files, and other
such needs, are not supported, and there may be security issues with running the indexing
application from your webapps directory. These are among the things left for you, the developer,
to do.
</p>
<p>
In time some of these things may be added to Lucene as features (if you've got a good idea we'd love to hear it!),
but for now: this is where you begin and the search engine/indexer ends. Lastly, one would assume you'd
want to follow the above advice and customize the application to look a little more fancy than black on
white with "Lucene Template" at the top. We'll see you on the Lucene Users' or Developers' mailing lists!
In time some of these things may be added to Lucene as features (if you've got a good idea we'd love
to hear it!), but for now: this is where you begin and the search engine/indexer ends. Lastly, one
would assume you'd want to follow the above advice and customize the application to look a little
more fancy than black on white with "Lucene Template" at the top. We'll see you on the Lucene
Users' or Developers' <a href="mailinglists.html">mailing lists</a>!
</p>
</section>
<section name="When to contact the Author">
<p>
Please resist the urge to contact the authors of this document (without bribes of fame and fortune attached). First
contact the <a href="http://lucene.apache.org/java/docs/mailinglists.html">mailing lists</a>. That being said feedback,
and modifications to this document and samples are ever so greatly appreciated. They are just best sent to the
lists so that everyone can share in them. Certainly you'll get the most help there as well.
Thanks for understanding.
Please resist the urge to contact the authors of this document (without bribes of fame and fortune
attached). First contact the <a href="mailinglists.html">mailing lists</a>, taking care to <a
href="http://www.catb.org/~esr/faqs/smart-questions.html">Ask Questions The Smart Way</a>.
Certainly you'll get the most help that way as well. That being said, feedback on this document and
samples, and modifications to them, are greatly appreciated. They are just best sent to the lists
or <a href="http://wiki.apache.org/jakarta-lucene/HowToContribute">posted as patches</a>, so that
everyone can share in them. Thanks for understanding!
</p>
</section>

View File

@ -8,42 +8,40 @@
<section name="Getting Started">
<p>
This document is intended as a "getting started" guide. It has three basic
audiences: novices looking to install Apache Lucene on their application or
web server, developers looking to modify or base the applications they develop
on Lucene, and developers looking to become involved in and contribute to the
development of Lucene. This document is written in tutorial and walkthrough
format. It intends to help you in "getting started", but does not go into great
depth into some of the conceptual or inner details of Apache Lucene.
This document is intended as a "getting started" guide. It has three audiences: first-time users
looking to install Apache Lucene in their application or web server; developers looking to modify or base
the applications they develop on Lucene; and developers looking to become involved in and contribute
to the development of Lucene. This document is written in tutorial and walk-through format. The
goal is to help you "get started". It does not go into great depth on some of the conceptual or
inner details of Lucene.
</p>
<p>
Each section listed below builds on one another. That being said more advanced users may
wish to skip sections.
Each section listed below builds on the previous one. More advanced users
may wish to skip sections.
</p>
<ul>
<li><a href="demo.html">About the basic Lucene demo and its usage</a>.
This section is intended for anyone who wants a basic background on using the provided Lucene demos.</li>
<li><a href="demo.html">About the command-line Lucene demo and its usage</a>. This section
is intended for anyone who wants to use the command-line Lucene demo.</li> <p/>
<li><a href="demo2.html">About the sources and implementation
for the basic Lucene demo</a> section we walk through . This section is intended for developers.</li>
<li><a href="demo2.html">About the sources and implementation for the command-line Lucene
demo</a>. This section walks through the implementation details (sources) of the
command-line Lucene demo. This section is intended for developers.</li> <p/>
<li><a href="demo3.html">About installing
and configuring the template web application</a>. While this walkthrough assumes
Tomcat 4.0.x as your container of choice, there is no reason you can't (provided you have
the requisite knowledge) adapt the instructions to your container. This section is intended
for those responsible for the development or deployment of Lucene-based web applications.</li>
<li><a href="demo3.html">About installing and configuring the demo template web
application</a>. While this walk-through assumes Tomcat as your container of choice,
there is no reason you can't (provided you have the requisite knowledge) adapt the
instructions to your container. This section is intended for those responsible for the
development or deployment of Lucene-based web applications.</li> <p/>
<li><a href="demo4.html">About the sources used to construct the demo template web
application</a>. Please note the template application is designed to highlight features of
Lucene and is <b>not</b> an example of best practices. (One would hopefully use MVC
architecture such as provided by Jakarta Struts and taglibs, but showing you how to do that
would be WAY beyond the scope of this guide.) This section is intended for developers and
those wishing to customize the demo template web application to their needs. </li>
<li><a href="demo4.html">About the sources used to construct the
template web application</a>. Please note the template application is designed to highlight
features of Lucene and is <b>not</b> an example of best practices. (One would hopefully
use MVC architecture such as provided by Jakarta Struts and taglibs, or better yet XML
with stylesheets, but showing you how to do that would be WAY beyond the scope of this
demonstration. Additionally one could cache results, and perform other performance
optimizations, but those are beyond the scope of this demo).
This section is intended for developers and those wishing to customize the template web
application to their needs. The sections useful to developers only are clearly delineated.</li>
</ul>
</section>