LUCENE-646: fix various small issues with the "getting started" demo pages (patch by Michael McCandless)

git-svn-id: https://svn.apache.org/repos/asf/lucene/java/trunk@428554 13f79535-47bb-0310-9956-ffa450edef68
Daniel Naber 2006-08-03 22:24:42 +00:00
parent 10517a310c
commit a9a325a4df
11 changed files with 518 additions and 420 deletions

@ -114,8 +114,8 @@ limitations under the License.
<tr><td> <tr><td>
<blockquote> <blockquote>
<p> <p>
This document is intended as a "getting started" guide to using and running the This document is intended as a "getting started" guide to using and running the Lucene demos.
Apache Lucene demos. It walks you through some basic installation and configuration. It walks you through some basic installation and configuration.
</p> </p>
</blockquote> </blockquote>
</p> </p>
@ -131,9 +131,8 @@ Apache Lucene demos. It walks you through some basic installation and configura
<tr><td> <tr><td>
<blockquote> <blockquote>
<p> <p>
The Lucene Demo code is a set of command line example applications that demonstrate various The Lucene command-line demo code consists of two applications that demonstrate various
functionality of Lucene and how one should go about adding it to their functionalities of Lucene and how one should go about adding Lucene to their applications.
applications.
</p> </p>
</blockquote> </blockquote>
</p> </p>
@ -143,22 +142,22 @@ applications.
<table border="0" cellspacing="0" cellpadding="2" width="100%"> <table border="0" cellspacing="0" cellpadding="2" width="100%">
<tr><td bgcolor="#525D76"> <tr><td bgcolor="#525D76">
<font color="#ffffff" face="arial,helvetica,sanserif"> <font color="#ffffff" face="arial,helvetica,sanserif">
<a name="Setting your classpath"><strong>Setting your classpath</strong></a> <a name="Setting your CLASSPATH"><strong>Setting your CLASSPATH</strong></a>
</font> </font>
</td></tr> </td></tr>
<tr><td> <tr><td>
<blockquote> <blockquote>
<p> <p>
First, extract the latest Lucene distribution. First, you should <a href="http://www.apache.org/dyn/closer.cgi/lucene/java/">download</a> the
latest Lucene distribution and then extract it to a working directory. Alternatively, you can <a href="http://wiki.apache.org/jakarta-lucene/SourceRepository">check out the sources from
Subversion</a>, and then run <code>ant war-demo</code> to generate the JARs and WARs.
</p> </p>
<p> <p>
You should see the Apache Lucene jar file in the directory you created You should see the Lucene JAR file in the directory you created when you extracted the archive. It
when you extracted the archive. It should be named something like should be named something like <code>lucene-core-{version}.jar</code>. You should also see a file
<b>lucene-{version}.jar</b>. called <code>lucene-demos-{version}.jar</code>. If you checked out the sources from Subversion then
</p> the JARs are located under the <code>build</code> subdirectory (after running <code>ant</code>
<p> successfully). Put both of these files in your Java CLASSPATH.
You should also see a file called called <b>lucene-demos-{version}.jar</b>.
Put both of these files in your Java CLASSPATH.
</p> </p>
</blockquote> </blockquote>
</p> </p>
@ -174,18 +173,27 @@ Put both of these files in your Java CLASSPATH.
<tr><td> <tr><td>
<blockquote> <blockquote>
<p> <p>
Once you've gotten this far you're probably itching to go. Let's <b> build an index!</b> Once you've gotten this far you're probably itching to go. Let's <b>build an index!</b> Assuming
Assuming you've set your classpath correctly, just type you've set your CLASSPATH correctly, just type:
"java org.apache.lucene.demo.IndexFiles {full-path-to-lucene}/src". This will produce
a subdirectory called "index" which will contain an index of all of the Lucene <pre>
sourcecode. java org.apache.lucene.demo.IndexFiles {full-path-to-lucene}/src
</pre>
This will produce a subdirectory called <code>index</code> which will contain an index of all of the
Lucene source code.
</p> </p>
<p> <p>
<b> To search the index </b> type "java org.apache.lucene.demo.SearchFiles". You'll be prompted To <b>search the index</b> type:
for a query. Type in a swear word and press the enter key. You'll see that the Lucene
developers are very well mannered and get no results. Now try entering the word "vector". <pre>
That should return a whole bunch of documents. The results will page at every tenth java org.apache.lucene.demo.SearchFiles
result and ask you whether you want more results. </pre>
You'll be prompted for a query. Type in a swear word and press the enter key. You'll see that the
Lucene developers are very well mannered and get no results. Now try entering the word "vector".
That should return a whole bunch of documents. The results will page at every tenth result and ask
you whether you want more results.
</p> </p>
</blockquote> </blockquote>
</p> </p>
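Note on the hunk above: the patched text says the search demo pages its output "at every tenth result". As a rough illustration of that behaviour, the sketch below slices a plain list into pages of ten. It does not use the Lucene API; `PagingSketch`, `pageOf`, and `HITS_PER_PAGE` are hypothetical names chosen for this example, and the real `SearchFiles` source may structure its loop differently.

```java
import java.util.ArrayList;
import java.util.List;

public class PagingSketch {
    // Assumption: page size of ten, taken from "every tenth result" in the text.
    static final int HITS_PER_PAGE = 10;

    /** Returns the results shown on the given zero-based page (empty if out of range). */
    static <T> List<T> pageOf(List<T> results, int page) {
        int start = page * HITS_PER_PAGE;
        if (page < 0 || start >= results.size()) {
            return new ArrayList<T>();
        }
        int end = Math.min(results.size(), start + HITS_PER_PAGE);
        return new ArrayList<T>(results.subList(start, end));
    }

    public static void main(String[] args) {
        // Stand-in data; a real search would supply the hit list.
        List<String> results = new ArrayList<String>();
        for (int i = 1; i <= 23; i++) {
            results.add("doc" + i);
        }
        for (int page = 0; !pageOf(results, page).isEmpty(); page++) {
            System.out.println("page " + page + ": " + pageOf(results, page));
        }
    }
}
```

An interactive tool would typically pause between pages and ask whether to continue, as the demo text describes.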

@ -34,7 +34,7 @@ limitations under the License.
<title>Apache Lucene - Apache Lucene - Basic Demo Sources Walkthrough</title> <title>Apache Lucene - Apache Lucene - Basic Demo Sources Walk-through</title>
</head> </head>
<body bgcolor="#ffffff" text="#000000" link="#525D76"> <body bgcolor="#ffffff" text="#000000" link="#525D76">
@ -114,9 +114,9 @@ limitations under the License.
<tr><td> <tr><td>
<blockquote> <blockquote>
<p> <p>
In this section we walk through the sources behind the basic Lucene demo such as where to In this section we walk through the sources behind the command-line Lucene demo: where to find them,
find it, its parts and their function. This section is intended for Java developers their parts and their function. This section is intended for Java developers wishing to understand
wishing to understand how to use Apache Lucene in their applications. how to use Lucene in their applications.
</p> </p>
</blockquote> </blockquote>
</p> </p>
@ -132,14 +132,14 @@ wishing to understand how to use Apache Lucene in their applications.
<tr><td> <tr><td>
<blockquote> <blockquote>
<p> <p>
Relative to the directory created when you extracted Lucene or retreived it from Subversion, you Relative to the directory created when you extracted Lucene or retrieved it from Subversion, you
should see a directory called "src" which in turn contains a directory called "demo". should see a directory called <code>src</code> which in turn contains a directory called
This is the root for all of the Lucene demos. Under this directory is org/apache/lucene/demo, <code>demo</code>. This is the root for all of the Lucene demos. Under this directory is
this is where all the Java sources live. <code>org/apache/lucene/demo</code>. This is where all the Java sources for the demos live.
</p> </p>
<p> <p>
Within this directory you should see the IndexFiles class we executed earlier. Bring that Within this directory you should see the <code>IndexFiles.java</code> class we executed earlier.
up in vi or your alternative text editor and lets take a look at it. Bring it up in <code>vi</code> or your editor of choice and let's take a look at it.
</p> </p>
</blockquote> </blockquote>
</p> </p>
@ -155,43 +155,45 @@ up in vi or your alternative text editor and lets take a look at it.
<tr><td> <tr><td>
<blockquote> <blockquote>
<p> <p>
As we discussed in the previous walkthrough, the IndexFiles class creates a Lucene Index. As we discussed in the previous walk-through, the <code><a href="api/org/apache/lucene/demo/IndexFiles.html">IndexFiles</a></code> class creates a Lucene
Lets take a look at how it does this. Index. Let's take a look at how it does this.
</p> </p>
<p> <p>
The first substantial thing the main function does is instantiate an instance The first substantial thing the <code>main</code> function does is instantiate <code><a href="api/org/apache/lucene/index/IndexWriter.html">IndexWriter</a></code>. It passes the string
of IndexWriter. It passes a string called "index" and a new instance of a class called "<code>index</code>" and a new instance of a class called <code><a href="api/org/apache/lucene/analysis/standard/StandardAnalyzer.html">StandardAnalyzer</a></code>.
"StandardAnalyzer". The "index" string is the name of the directory that all index information The "<code>index</code>" string is the name of the filesystem directory where all index information
should be stored in. Because we're not passing any path information, one must assume this should be stored. Because we're not passing a full path, this will be created as a subdirectory of
will be created as a subdirectory of the current directory (if it does not already exist). On the current working directory (if it does not already exist). On some platforms, it may be created
some platforms this may actually result in it being created in other directories (such as in other directories (such as the user's home directory).
the user's home directory).
</p> </p>
<p> <p>
The <b>IndexWriter</b> is the main class responsible for creating indicies. To use it you The <code><a href="api/org/apache/lucene/index/IndexWriter.html">IndexWriter</a></code> is the main
must instantiate it with a path that it can write the index into, if this path does not class responsible for creating indices. To use it you must instantiate it with a path that it can
exist it will create it, otherwise it will refresh the index living at that path. You write the index into. If this path does not exist it will first create it. Otherwise it will
must a also pass an instance of <b>org.apache.lucene.analysis.Analyzer</b>. refresh the index at that path. You can also create an index using one of the subclasses of <code><a href="api/org/apache/lucene/store/Directory.html">Directory</a></code>. In any case, you must also pass an
instance of <code><a href="api/org/apache/lucene/analysis/Analyzer.html">org.apache.lucene.analysis.Analyzer</a></code>.
</p> </p>
<p> <p>
The <b>Analyzer</b>, in this case, the <b>StandardAnalyzer</b> is little more than a standard Java The particular <code><a href="api/org/apache/lucene/analysis/Analyzer.html">Analyzer</a></code> we
Tokenizer, converting all strings to lowercase and filtering out useless words and characters from the index. are using, <code><a href="api/org/apache/lucene/analysis/standard/StandardAnalyzer.html">StandardAnalyzer</a></code>, is
By useless words and characters I mean common language words such as articles (a, an, the, etc.) and other little more than a standard Java Tokenizer, converting all strings to lowercase and filtering out
strings that would be useless for searching (e.g. <b>'s</b>) . It should be noted that there are different useless words and characters from the index. By useless words and characters I mean common language
rules for every language, and you should use the proper analyzer for each. Lucene currently words such as articles (a, an, the, etc.) and other strings that would be useless for searching
provides Analyzers for English and German, more can be found in the Lucene Sandbox. (e.g. <b>'s</b>) . It should be noted that there are different rules for every language, and you
should use the proper analyzer for each. Lucene currently provides Analyzers for a number of
different languages (see the <code>*Analyzer.java</code> sources under <a href="http://svn.apache.org/repos/asf/lucene/java/trunk/contrib/analyzers/src/java/org/apache/lucene/analysis/">contrib/analyzers/src/java/org/apache/lucene/analysis</a>).
</p> </p>
<p> <p>
Looking down further in the file, you should see the indexDocs() code. This recursive function Looking further down in the file, you should see the <code>indexDocs()</code> code. This recursive
simply crawls the directories and uses FileDocument to create Document objects. The Document function simply crawls the directories and uses <code><a href="api/org/apache/lucene/demo/FileDocument.html">FileDocument</a></code> to create <code><a href="api/org/apache/lucene/document/Document.html">Document</a></code> objects. The <code><a href="api/org/apache/lucene/document/Document.html">Document</a></code> is simply a data object to
is simply a data object to represent the content in the file as well as its creation time and represent the content in the file as well as its creation time and location. These instances are
location. These instances are added to the indexWriter. Take a look inside FileDocument. It's added to the <code>indexWriter</code>. Take a look inside <code><a href="api/org/apache/lucene/demo/FileDocument.html">FileDocument</a></code>. It's not particularly
not particularly complicated, it just adds fields to the Document. complicated. It just adds fields to the <code><a href="api/org/apache/lucene/document/Document.html">Document</a></code>.
</p> </p>
<p> <p>
As you can see there isn't much to creating an index. The devil is in the details. You may also As you can see there isn't much to creating an index. The devil is in the details. You may also
wish to examine the other samples in this directory, particularly the IndexHTML class. It is wish to examine the other samples in this directory, particularly the <code><a href="api/org/apache/lucene/demo/IndexHTML.html">IndexHTML</a></code> class. It is a bit more
a bit more complex but builds upon this example. complex but builds upon this example.
</p> </p>
</blockquote> </blockquote>
</p> </p>
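Note on the hunk above: the patched text describes the analyzer as lowercasing text and dropping common words and strings such as 's. The sketch below imitates that idea in plain Java so the effect is easy to see. It is not Lucene's `StandardAnalyzer`; `ToyAnalyzer`, `analyze`, and the short stop list are hypothetical and chosen only for illustration.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Locale;
import java.util.Set;

public class ToyAnalyzer {
    // Assumption: a tiny illustrative stop list; Lucene's own list differs.
    static final Set<String> STOP_WORDS =
        new HashSet<String>(Arrays.asList("a", "an", "the", "and", "of"));

    /** Lowercases the text, splits it into words, strips a trailing 's, drops stop words. */
    static List<String> analyze(String text) {
        List<String> tokens = new ArrayList<String>();
        for (String raw : text.toLowerCase(Locale.ROOT).split("[^a-z0-9']+")) {
            // Remove a possessive suffix, then any remaining apostrophes.
            String token = raw.endsWith("'s") ? raw.substring(0, raw.length() - 2) : raw;
            token = token.replace("'", "");
            if (!token.isEmpty() && !STOP_WORDS.contains(token)) {
                tokens.add(token);
            }
        }
        return tokens;
    }

    public static void main(String[] args) {
        System.out.println(analyze("The Vector's size and an Index"));
    }
}
```

Running the same function over both the indexed text and the query text is what makes a query for "Vector" match a document containing "vector's"; this is why the demo uses the same analyzer for indexing and searching.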
@ -207,12 +209,19 @@ a bit more complex but builds upon this example.
<tr><td> <tr><td>
<blockquote> <blockquote>
<p> <p>
The SearchFiles class is quite simple. It primarily collaborates with an IndexSearcher, StandardAnalyzer The <code><a href="api/org/apache/lucene/demo/SearchFiles.html">SearchFiles</a></code> class is
(which is used in the IndexFiles class as well) and a QueryParser. The query parser is constructed quite simple. It primarily collaborates with an <code><a href="api/org/apache/lucene/search/IndexSearcher.html">IndexSearcher</a></code>, <code><a href="api/org/apache/lucene/analysis/standard/StandardAnalyzer.html">StandardAnalyzer</a></code>
with an analyzer used to interperate your query in the same way the Index was interperated: finding (which is used in the <code><a href="api/org/apache/lucene/demo/IndexFiles.html">IndexFiles</a></code> class as well) and a
the end of words and removing useless words like 'a', 'an' and 'the'. The Query object contains the <code><a href="api/org/apache/lucene/queryParser/QueryParser.html">QueryParser</a></code>. The
results from the QueryParser which is passed to the searcher. The searcher results are returned in query parser is constructed with an analyzer used to interpret your query text in the same way the
a collection of Documents called "Hits" which is then iterated through and displayed to the user. documents are interpreted: finding the end of words and removing useless words like 'a', 'an' and
'the'. The <code><a href="api/org/apache/lucene/search/Query.html">Query</a></code> object contains
the results from the <code><a href="api/org/apache/lucene/queryParser/QueryParser.html">QueryParser</a></code> which is passed to
the searcher. Note that it's also possible to programmatically construct a rich <code><a href="api/org/apache/lucene/search/Query.html">Query</a></code> object without using the query
parser. The query parser just enables decoding the <a href="queryparsersyntax.html">Lucene query
syntax</a> into the corresponding <code><a href="api/org/apache/lucene/search/Query.html">Query</a></code> object. The searcher results are
returned in a collection of Documents called <code><a href="api/org/apache/lucene/search/Hits.html">Hits</a></code> which is then iterated through and
displayed to the user.
</p> </p>
</blockquote> </blockquote>
</p> </p>

@ -114,11 +114,10 @@ limitations under the License.
<tr><td> <tr><td>
<blockquote> <blockquote>
<p> <p>
This document is intended as a "getting started" guide to installing and running the This document is intended as a "getting started" guide to installing and running the Lucene
Apache Lucene web application demo. This guide assumes that you have read the web application demo. This guide assumes that you have read the information in the previous two
information in the previous two examples or already know it anyhow. We'll use examples. We'll use Tomcat as our reference web container. These demos should work with nearly any
Tomcat 4.0.1 as our reference web container. These demos should work with nearly container, but you may have to adapt them appropriately.
any container, but it is up to you to adapt them appropriately.
</p> </p>
</blockquote> </blockquote>
</p> </p>
@ -134,12 +133,11 @@ any container, but it is up to you to adapt them appropriately.
<tr><td> <tr><td>
<blockquote> <blockquote>
<p> <p>
The Lucene Web Application demo is a template web application intended for deployment The Lucene Web Application demo is a template web application intended for deployment on Tomcat or a
on Tomcat or a similar web container. It's NOT designed as a "best practices" similar web container. It's NOT designed as a "best practices" implementation by ANY means. It's
implementation by ANY means. It's more of a "hello world" type Lucene Web App. more of a "hello world" type Lucene Web App. The purpose of this application is to demonstrate
The purpose of this application is to demonstrate Lucene. With that being said, Lucene. With that being said, it should be relatively simple to create a small searchable website
it should be relatively simple to create a small searchable website in Tomcat or in Tomcat or a similar application server.
a similar application server.
</p> </p>
</blockquote> </blockquote>
</p> </p>
@ -154,18 +152,19 @@ a similar application server.
</td></tr> </td></tr>
<tr><td> <tr><td>
<blockquote> <blockquote>
<p> <p> Once you've gotten this far you're probably itching to go. Let's start by creating the index
Once you've gotten this far you're probably itching to go. you'll need for the web examples. Since you've already set your CLASSPATH in the previous examples,
Let's start by creating the index you'll need for the web examples. all you need to do is type:
Since you've already set your classpath in the previous examples,
all you need to do is type <pre>
<b> "java org.apache.lucene.demo.IndexHTML -create -index {index-dir} .."</b>. java org.apache.lucene.demo.IndexHTML -create -index {index-dir} ..
You'll need to do this from a (any) subdirectory of your {tomcat}/webapps directory (make sure you didn't leave off the ".." or you'll get a null pointer </pre>
exception).
{index-dir} You'll need to do this from a (any) subdirectory of your <code>{tomcat}/webapps</code> directory
should be a directory that Tomcat has permission to read and write, but is (make sure you didn't leave off the <code>..</code> or you'll get a null pointer exception).
outside of a web accessible context. By default the webapp is configured <code>{index-dir}</code> should be a directory that Tomcat has permission to read and write, but is
to look in <b>/opt/lucene/index</b> for this index. outside of a web accessible context. By default the webapp is configured to look in
<code>/opt/lucene/index</code> for this index.
</p> </p>
</blockquote> </blockquote>
</p> </p>
@ -180,10 +179,10 @@ to look in <b>/opt/lucene/index</b> for this index.
</td></tr> </td></tr>
<tr><td> <tr><td>
<blockquote> <blockquote>
<p>Located in your distribution directory you should see <p>Located in your distribution directory you should see a war file called
a war file called luceneweb.war. Copy this to your <code>luceneweb.war</code>. If you're working with a Subversion checkout, this will be under the
{tomcat-home}/webapps directory. You may need to restart <code>build</code> subdirectory. Copy this to your <code>{tomcat-home}/webapps</code> directory.
Tomcat. </p> You may need to restart Tomcat. </p>
</blockquote> </blockquote>
</p> </p>
</td></tr> </td></tr>
@ -197,15 +196,15 @@ Tomcat. </p>
</td></tr> </td></tr>
<tr><td> <tr><td>
<blockquote> <blockquote>
<p> <p> From your Tomcat directory look in the <code>webapps/luceneweb</code> subdirectory. If it's not
From your Tomcat directory look in the webapps/luceneweb subdirectory. If it's not present, try browsing to <code>http://localhost:8080/luceneweb</code> (which causes Tomcat to deploy
present, try browsing to "http://localhost:8080/luceneweb" then look again. the webapp), then look again. Edit a file called <code>configuration.jsp</code>. Ensure that the
Edit a file called configuration.jsp. Ensure that the indexLocation is equal to the <code>indexLocation</code> is equal to the location you used for your index. You may also customize
location you used for your index. You may also customize the appTitle and appFooter the <code>appTitle</code> and <code>appFooter</code> strings as you see fit. Once you have finished
strings as you see fit. Once you have finished altering the configuration you should altering the configuration you may need to restart Tomcat. You may also wish to update the war file
restart Tomcat. You may also wish to update the war file by typing by typing <code>jar -uf luceneweb.war configuration.jsp</code> from the <code>luceneweb</code>
<b>jar -uf luceneweb.war configuration.jsp</b> from the luceneweb subdirectory. subdirectory. (The -u option is not available in all versions of jar. In this case recreate the
(The -u option is not available in all versions of jar. In this case recreate the war file). war file).
</p> </p>
</blockquote> </blockquote>
</p> </p>
@ -220,14 +219,15 @@ restart Tomcat. You may also wish to update the war file by typing
</td></tr> </td></tr>
<tr><td> <tr><td>
<blockquote> <blockquote>
<p>Now you're ready to roll. In your browser set the url to "http://localhost:8080/luceneweb" <p>Now you're ready to roll. In your browser set the url to
enter "test" and the number of items per page and press search.</p> <code>http://localhost:8080/luceneweb</code> enter <code>test</code> and the number of items per
<p>You should now be looking either at a number of results (provided you didn't erase the page and press search.</p>
Tomcat examples) or nothing. Try other search terms. Depending on the number of items <p>You should now be looking either at a number of results (provided you didn't erase the Tomcat
per page you set and results returned, there may be a link at the bottom that says "more results&gt;&gt;", examples) or nothing. If you get an error regarding opening the index, then you probably set the
clicking it goes to subsequent pages. If you get an error regarding opening the index, then you path in <code>configuration.jsp</code> incorrectly or Tomcat doesn't have permissions to the index
probably set the path in "configuration" incorrectly or Tomcat doesn't have permissions to the (or you skipped the step of creating it). Try other search terms. Depending on the number of items
index (or you skipped the step of creating it).</p> per page you set and results returned, there may be a link at the bottom that says <b>More
Results&gt;&gt;</b>; clicking it takes you to subsequent pages. </p>
</blockquote> </blockquote>
</p> </p>
</td></tr> </td></tr>
@ -242,8 +242,7 @@ index (or you skipped the step of creating it).</p>
<tr><td> <tr><td>
<blockquote> <blockquote>
<p> <p>
If you want to know more about how this web app works or how to customize it then If you want to know more about how this web app works or how to customize it then <a href="demo4.html">read on&gt;&gt;&gt;</a>.
<a href="demo4.html">read on&gt;&gt;&gt;</a>.
</p> </p>
</blockquote> </blockquote>
</p> </p>

@ -114,10 +114,10 @@ limitations under the License.
<tr><td> <tr><td>
<blockquote> <blockquote>
<p> <p>
In this section we walk through the sources behind the basic Lucene Web Application demo. In this section we walk through the sources behind the basic Lucene Web Application demo: where to
Where to find it, its parts, and their function. This section is intended for Java developers find them, their parts and their function. This section is intended for Java developers wishing to
wishing to understand how to use Apache Lucene in their applications or for those involved understand how to use Lucene in their applications or for those involved in deploying web
in deploying web applications based on Lucene. applications based on Lucene.
</p> </p>
</blockquote> </blockquote>
</p> </p>
@ -133,13 +133,13 @@ in deploying web applications based on Lucene.
<tr><td> <tr><td>
<blockquote> <blockquote>
<p> <p>
Relative the directory created when you extracted Lucene or retreived it from Subversion, you Relative to the directory created when you extracted Lucene or retrieved it from Subversion, you
should see a directory called "src" which in turn contains a directory called "jsp". should see a directory called <code>src</code> which in turn contains a directory called
This is the root for all of the Lucene web demo. <code>jsp</code>. This is the root for all of the Lucene web demo.
</p> </p>
<p> <p>
Within this directory you should see the index.jsp class. Bring this up in vi or your Within this directory you should see <code>index.jsp</code>. Bring this up in vi or your editor of
editor of choice. choice.
</p> </p>
</blockquote> </blockquote>
</p> </p>
@ -156,13 +156,11 @@ editor of choice.
<blockquote> <blockquote>
<p> <p>
This jsp page is pretty boring by itself. All it does is include a header, display a form and This jsp page is pretty boring by itself. All it does is include a header, display a form and
include a footer. If you look at the form, it has two fields: query (where you enter your include a footer. If you look at the form, it has two fields: <code>query</code> (where you enter
search criteria) and maxresults where you specify the number of results per page. If you look your search criteria) and <code>maxresults</code> where you specify the number of results per page.
at the form tag, you'll notice it uses the get method as opposed to the post. While this is By the structure of this JSP it should be easy to customize it without even editing this particular
considered deprecated functionality by the latest w3c specs, its unlikely to go away due to the file. You could simply change the header and footer. Let's look at the <code>header.jsp</code>
usefulness of being able to bookmark things like searches. By the structure of this JSP it should (located in the same directory) next.
be easy to customize it without even editing this particular file. You could simply change the
header and footer. Let's look at the header.jsp (located in the same directory) next.
</p> </p>
</blockquote> </blockquote>
</p> </p>
@ -178,11 +176,11 @@ header and footer. Let's look at the header.jsp (located in the same directory)
<tr><td> <tr><td>
<blockquote> <blockquote>
<p> <p>
The header is also very simple by itself. The only thing it does is include the configuration.jsp The header is also very simple by itself. The only thing it does is include the
(which you looked at in the last section of this guide) and set the title and a brief header. This <code>configuration.jsp</code> (which you looked at in the last section of this guide) and set the
would be a good place to put your own custom HTML to "pretty" things up a bit. We won't cover the title and a brief header. This would be a good place to put your own custom HTML to "pretty" things
footer because all it does is display the footer and close your tags. Let's look at the results.jsp, up a bit. We won't cover the footer because all it does is display the footer and close your tags.
the meat of this application next. Let's look at the <code>results.jsp</code>, the meat of this application, next.
</p> </p>
</blockquote> </blockquote>
</p> </p>
@ -198,43 +196,52 @@ the meat of this application next.
<tr><td> <tr><td>
<blockquote> <blockquote>
<p> <p>
The results.jsp had a lot more functionality. Much of it is for paging the search results we'll not Most of the functionality lies in <code>results.jsp</code>. Much of it is for paging the search
cover this as it's commented well enough. It does not perform any optimizations such as caching results, results, which we'll not cover here as it's commented well enough. The first thing in this page is
etc. as that would make this a more complex example. The first thing in this page is the actual imports the actual imports for the Lucene classes and Lucene demo classes. These classes are loaded from
for the Lucene classes and Lucene demo classes. These classes are loaded from the jars included in the the jars included in the <code>WEB-INF/lib</code> directory in the <code>luceneweb.war</code> file.
WEB-INF/lib directory in the final war file.
</p> </p>
<p> <p>
You'll notice that this file includes the same header and footer as the "index.jsp". From there the jsp You'll notice that this file includes the same header and footer as <code>index.jsp</code>. From
constructs an IndexSearcher with the "indexLocation" that was specified in the "configuration.jsp". If there there it constructs an <code><a href="api/org/apache/lucene/search/IndexSearcher.html">IndexSearcher</a></code> with the
is an error of any kind in opening the index, it is diplayed to the user and a boolean flag is set to tell <code>indexLocation</code> that was specified in <code>configuration.jsp</code>. If there is an
the rest of the sections of the jsp not to continue. error of any kind in opening the index, it is displayed to the user and the boolean flag
<code>error</code> is set to tell the rest of the sections of the jsp not to continue.
</p> </p>
<p> <p>
From there, this jsp attempts to get the search criteria, the start index (used for paging) and the maximum From there, this jsp attempts to get the search criteria, the start index (used for paging) and the
number of results per page. If the maximum results per page is not set or not valid then it and the maximum number of results per page. If the maximum results per page is not set or not valid then it
start index are set to default values. If only the start index is invalid it is set to a default value. If and the start index are set to default values. If only the start index is invalid it is set to a
the criteria isn't provided then a servlet error is thrown (it is assumed that this is the result of url tampering default value. If the criteria isn't provided then a servlet error is thrown (it is assumed that
or some form of browser malfunction). this is the result of url tampering or some form of browser malfunction).
</p> </p>
<p> <p>
The jsp moves on to construct a StandardAnalyzer just as in the simple demo, to analyze the search critieria, it The jsp moves on to construct a <code><a href="api/org/apache/lucene/analysis/standard/StandardAnalyzer.html">StandardAnalyzer</a></code> to
is passed to the QueryParser along with the criteria to construct a Query object. You'll also notice the analyze the search text. This matches the analyzer used during indexing (<code><a href="api/org/apache/lucene/demo/IndexHTML.html">IndexHTML</a></code>), which is generally
string literal "contents" included. This is to specify the search should include the contents and not recommended. This is passed to the <code><a href="api/org/apache/lucene/queryParser/QueryParser.html">QueryParser</a></code> along with the
the title, url or some other field in the indexed documents. If there is any error in constructing a Query criteria to construct a <code><a href="api/org/apache/lucene/search/Query.html">Query</a></code>
object an error is displayed to the user. object. You'll also notice the string literal <code>"contents"</code> included. This specifies
that the search should cover the <code>contents</code> field and not the <code>title</code>,
<code>url</code> or some other field in the indexed documents. If there is any error in
constructing a <code><a href="api/org/apache/lucene/search/Query.html">Query</a></code> object an
error is displayed to the user.
</p> </p>
<p> <p>
In the next section of the jsp the IndexSearcher is asked to search given the query object. The results are In the next section of the jsp the <code><a href="api/org/apache/lucene/search/IndexSearcher.html">IndexSearcher</a></code> is asked to search
returned in a collection called "hits". If the length property of the hits collection is 0 then an error given the query object. The results are returned in a collection called <code>hits</code>. If the
is displayed to the user and the error flag is set. length property of the <code>hits</code> collection is 0 (meaning there were no results) then an
error is displayed to the user and the error flag is set.
</p> </p>
<p> <p>
Finally the jsp iterates through the hits collection and displayed properties of the "Document" objects we talked Finally the jsp iterates through the <code>hits</code> collection, taking the current page into
about in the first walkthrough. These objects contain "known" fields specific to their indexer (in this case account, and displays properties of the <code><a href="api/org/apache/lucene/document/Document.html">Document</a></code> objects we talked about in
"IndexHTML" constructs a document with "url", "title" and "contents"). You'll notice that these results are paged the first walkthrough. These objects contain "known" fields specific to their indexer (in this case
but the search is repeated every time. This is an area where optimization could improve performance for large <code><a href="api/org/apache/lucene/demo/IndexHTML.html">IndexHTML</a></code> constructs a document
result sets. with "url", "title" and "contents").
</p>
<p>
Please note that in a real deployment of Lucene, it's best to instantiate <code><a href="api/org/apache/lucene/search/IndexSearcher.html">IndexSearcher</a></code> and <code><a href="api/org/apache/lucene/queryParser/QueryParser.html">QueryParser</a></code> once, and then
share them across search requests, instead of re-instantiating per search request.
</p> </p>
</blockquote> </blockquote>
</p> </p>
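The paging step described in the walkthrough above (iterating over the hits while "taking the current page into account") can be sketched as a small window calculation. This is only an illustration under stated assumptions: `PageWindow`, `startIndex` and `maxPerPage` are hypothetical names, not identifiers taken from the demo sources, and the sketch uses no Lucene classes.

```java
/** Sketch of a paging window over a result list; all names here are hypothetical. */
final class PageWindow {
    final int from; // index of the first hit shown on this page (inclusive)
    final int to;   // index one past the last hit shown on this page (exclusive)

    private PageWindow(int from, int to) {
        this.from = from;
        this.to = to;
    }

    /** Clamps the requested page to the number of hits actually available. */
    static PageWindow of(int totalHits, int startIndex, int maxPerPage) {
        if (totalHits < 0 || startIndex < 0 || maxPerPage <= 0) {
            throw new IllegalArgumentException("negative count or non-positive page size");
        }
        int from = Math.min(startIndex, totalHits);
        // Use long arithmetic so a very large page size cannot overflow.
        int to = (int) Math.min((long) from + maxPerPage, (long) totalHits);
        return new PageWindow(from, to);
    }

    /** True when further hits exist beyond this page. */
    boolean hasMore(int totalHits) {
        return to < totalHits;
    }

    public static void main(String[] args) {
        PageWindow w = PageWindow.of(25, 20, 10);
        System.out.println(w.from + ".." + w.to); // prints "20..25"
    }
}
```

A page loop would then run from `from` up to (but not including) `to`, and a "more results" link would only be shown when `hasMore` returns true.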
@ -250,10 +257,11 @@ result sets.
<tr><td> <tr><td>
<blockquote> <blockquote>
<p> <p>
There are additional sources used by the web app that were not specifically covered by either walkthrough. For There are additional sources used by the web app that were not specifically covered by either
example the HTML parser, the IndexHTML class and HTMLDocument class. These are very similar to the classes walkthrough. For example the HTML parser, the <code><a href="api/org/apache/lucene/demo/IndexHTML.html">IndexHTML</a></code> class and <code><a href="api/org/apache/lucene/demo/HTMLDocument.html">HTMLDocument</a></code> class. These are very
covered in the first example, however they have properties sepecific to parsing and indexing HTML. This is similar to the classes covered in the first example, with properties specific to parsing and
beyond our scope; however, by now you should feel like you're "getting started" with Lucene. indexing HTML. This is beyond our scope; however, by now you should feel like you're "getting
started" with Lucene.
</p> </p>
</blockquote> </blockquote>
</p> </p>
@ -263,7 +271,7 @@ beyond our scope; however, by now you should feel like you're "getting started"
<table border="0" cellspacing="0" cellpadding="2" width="100%"> <table border="0" cellspacing="0" cellpadding="2" width="100%">
<tr><td bgcolor="#525D76"> <tr><td bgcolor="#525D76">
<font color="#ffffff" face="arial,helvetica,sanserif"> <font color="#ffffff" face="arial,helvetica,sanserif">
<a name="Where to go from here? (Everyone!)"><strong>Where to go from here? (Everyone!)</strong></a> <a name="Where to go from here? (everyone!)"><strong>Where to go from here? (everyone!)</strong></a>
</font> </font>
</td></tr> </td></tr>
<tr><td> <tr><td>
@ -271,16 +279,18 @@ beyond our scope; however, by now you should feel like you're "getting started"
<p> <p>
There are a number of things this demo doesn't do or doesn't do quite right. For instance, you may There are a number of things this demo doesn't do or doesn't do quite right. For instance, you may
have noticed that documents in the root context are unreachable (unless you reconfigure Tomcat to have noticed that documents in the root context are unreachable (unless you reconfigure Tomcat to
support that context or redirect to it), anywhere where the directory doesn't quite match the context mapping, support that context or redirect to it), anywhere where the directory doesn't quite match the
you'll have a broken link in your results. If you want to index non-local files or have some other context mapping, you'll have a broken link in your results. If you want to index non-local files or
needs this isn't supported, plus there may be security issues with running the indexing application from have some other needs this isn't supported, plus there may be security issues with running the
your webapps directory. There are a number of things left for you the implementor or developer to do. indexing application from your webapps directory. There are a number of things left for you the
developer to do.
</p> </p>
<p> <p>
In time some of these things may be added to Lucene as features (if you've got a good idea we'd love to hear it!), In time some of these things may be added to Lucene as features (if you've got a good idea we'd love
but for now: this is where you begin and the search engine/indexer ends. Lastly, one would assume you'd to hear it!), but for now: this is where you begin and the search engine/indexer ends. Lastly, one
want to follow the above advice and customize the application to look a little more fancy than black on would assume you'd want to follow the above advice and customize the application to look a little
white with "Lucene Template" at the top. We'll see you on the Lucene Users' or Developers' mailing lists! more fancy than black on white with "Lucene Template" at the top. We'll see you on the Lucene
Users' or Developers' <a href="mailinglists.html">mailing lists</a>!
</p> </p>
</blockquote> </blockquote>
</p> </p>
@ -296,11 +306,12 @@ white with "Lucene Template" at the top. We'll see you on the Lucene Users' or
<tr><td> <tr><td>
<blockquote> <blockquote>
<p> <p>
Please resist the urge to contact the authors of this document (without bribes of fame and fortune attached). First Please resist the urge to contact the authors of this document (without bribes of fame and fortune
contact the <a href="http://lucene.apache.org/java/docs/mailinglists.html">mailing lists</a>. That being said feedback, attached). First contact the <a href="mailinglists.html">mailing lists</a>, taking care to <a href="http://www.catb.org/~esr/faqs/smart-questions.html">Ask Questions The Smart Way</a>.
and modifications to this document and samples are ever so greatly appreciated. They are just best sent to the Certainly you'll get the most help that way as well. That being said, feedback, and modifications
lists so that everyone can share in them. Certainly you'll get the most help there as well. to this document and samples are ever so greatly appreciated. They are just best sent to the lists
Thanks for understanding. or <a href="http://wiki.apache.org/jakarta-lucene/HowToContribute">posted as patches</a>, so that
everyone can share in them. Thanks for understanding!
</p> </p>
</blockquote> </blockquote>
</p> </p>


@ -114,40 +114,38 @@ limitations under the License.
<tr><td> <tr><td>
<blockquote> <blockquote>
<p> <p>
This document is intended as a "getting started" guide. It has three basic This document is intended as a "getting started" guide. It has three audiences: first-time users
audiences: novices looking to install Apache Lucene on their application or looking to install Apache Lucene in their application or web server; developers looking to modify or base
web server, developers looking to modify or base the applications they develop the applications they develop on Lucene; and developers looking to become involved in and contribute
on Lucene, and developers looking to become involved in and contribute to the to the development of Lucene. This document is written in tutorial and walk-through format. The
development of Lucene. This document is written in tutorial and walkthrough goal is to help you "get started". It does not go into great depth on some of the conceptual or
format. It intends to help you in "getting started", but does not go into great inner details of Lucene.
depth into some of the conceptual or inner details of Apache Lucene.
</p> </p>
<p> <p>
Each section listed below builds on one another. That being said more advanced users may Each section listed below builds on one another. More advanced users
wish to skip sections. may wish to skip sections.
</p> </p>
<ul> <ul>
<li><a href="demo.html">About the basic Lucene demo and its usage</a>. <li><a href="demo.html">About the command-line Lucene demo and its usage</a>. This section
This section is intended for anyone who wants a basic background on using the provided Lucene demos.</li> is intended for anyone who wants to use the command-line Lucene demo.</li> <p />
<li><a href="demo2.html">About the sources and implementation <li><a href="demo2.html">About the sources and implementation for the command-line Lucene
for the basic Lucene demo</a> section we walk through . This section is intended for developers.</li> demo</a>. This section walks through the implementation details (sources) of the
command-line Lucene demo. This section is intended for developers.</li> <p />
<li><a href="demo3.html">About installing <li><a href="demo3.html">About installing and configuring the demo template web
and configuring the template web application</a>. While this walkthrough assumes application</a>. While this walk-through assumes Tomcat as your container of choice,
Tomcat 4.0.x as your container of choice, there is no reason you can't (provided you have there is no reason you can't (provided you have the requisite knowledge) adapt the
the requisite knowledge) adapt the instructions to your container. This section is intended instructions to your container. This section is intended for those responsible for the
for those responsible for the development or deployment of Lucene-based web applications.</li> development or deployment of Lucene-based web applications.</li> <p />
<li><a href="demo4.html">About the sources used to construct the demo template web
application</a>. Please note the template application is designed to highlight features of
Lucene and is <b>not</b> an example of best practices. (One would hopefully use MVC
architecture such as provided by Jakarta Struts and taglibs, but showing you how to do that
would be WAY beyond the scope of this guide.) This section is intended for developers and
those wishing to customize the demo template web application to their needs. </li>
<li><a href="demo4.html">About the sources used to construct the
template web application</a>. Please note the template application is designed to highlight
features of Lucene and is <b>not</b> an example of best practices. (One would hopefully
use MVC architecture such as provided by Jakarta Struts and taglibs, or better yet XML
with stylesheets, but showing you how to do that would be WAY beyond the scope of this
demonstration. Additionally one could cache results, and perform other performance
optimizations, but those are beyond the scope of this demo).
This section is intended for developers and those wishing to customize the template web
application to their needs. The sections useful to developers only are clearly delineated.</li>
</ul> </ul>
</blockquote> </blockquote>
</p> </p>


@ -1,4 +1,4 @@
<%@ page import = " javax.servlet.*, javax.servlet.http.*, java.io.*, org.apache.lucene.analysis.*, org.apache.lucene.document.*, org.apache.lucene.index.*, org.apache.lucene.search.*, org.apache.lucene.queryParser.*, org.apache.lucene.demo.*, org.apache.lucene.demo.html.Entities, java.net.URLEncoder" %> <%@ page import = " javax.servlet.*, javax.servlet.http.*, java.io.*, org.apache.lucene.analysis.*, org.apache.lucene.analysis.standard.StandardAnalyzer, org.apache.lucene.document.*, org.apache.lucene.index.*, org.apache.lucene.search.*, org.apache.lucene.queryParser.*, org.apache.lucene.demo.*, org.apache.lucene.demo.html.Entities, java.net.URLEncoder" %>
<% <%
/* /*
@ -76,7 +76,7 @@ public String escapeHTML(String s) {
//query string so you get the //query string so you get the
//treatment //treatment
Analyzer analyzer = new StopAnalyzer(); //construct our usual analyzer Analyzer analyzer = new StandardAnalyzer(); //construct our usual analyzer
try { try {
QueryParser qp = new QueryParser("contents", analyzer); QueryParser qp = new QueryParser("contents", analyzer);
query = qp.parse(queryString); //parse the query = qp.parse(queryString); //parse the
@ -126,8 +126,11 @@ public String escapeHTML(String s) {
<% <%
Document doc = hits.doc(i); //get the next document Document doc = hits.doc(i); //get the next document
String doctitle = doc.get("title"); //get its title String doctitle = doc.get("title"); //get its title
String url = doc.get("url"); //get its url field String url = doc.get("path"); //get its path field
if ((doctitle == null) || doctitle.equals("")) //use the url if it has no title if (url != null && url.startsWith("../webapps/")) { // strip off ../webapps prefix if present
url = url.substring(10);
}
if ((doctitle == null) || doctitle.equals("")) //use the path if it has no title
doctitle = url; doctitle = url;
//then output! //then output!
%> %>
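The new lines in this hunk read the `path` field, strip a leading `../webapps` and fall back to the path when the title is empty. As a rough standalone sketch of that logic, outside the JSP: the class and method names below (`ResultLink`, `toUrl`, `displayTitle`) are hypothetical and exist only for illustration. Note that `substring(10)` in the hunk removes the ten characters of `../webapps` and keeps the following `/`, so the result is a context-relative URL.

```java
/** Standalone sketch of the link handling in the hunk above; names are hypothetical. */
final class ResultLink {
    // The indexer is run from a subdirectory of webapps with ".." as the root,
    // so stored paths are assumed to start with this prefix.
    private static final String PREFIX = "../webapps";

    /** Drops the "../webapps" prefix but keeps the '/' that follows it. */
    static String toUrl(String path) {
        if (path != null && path.startsWith(PREFIX + "/")) {
            return path.substring(PREFIX.length());
        }
        return path; // null or paths without the prefix are returned unchanged
    }

    /** Uses the URL as the link text when the document has no title. */
    static String displayTitle(String title, String url) {
        return (title == null || title.isEmpty()) ? url : title;
    }

    public static void main(String[] args) {
        String url = toUrl("../webapps/luceneweb/index.html");
        System.out.println(url);                   // prints "/luceneweb/index.html"
        System.out.println(displayTitle("", url)); // prints "/luceneweb/index.html"
    }
}
```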


@ -8,49 +8,58 @@
<section name="About this Document"> <section name="About this Document">
<p> <p>
This document is intended as a "getting started" guide to using and running the This document is intended as a "getting started" guide to using and running the Lucene demos.
Apache Lucene demos. It walks you through some basic installation and configuration. It walks you through some basic installation and configuration.
</p> </p>
</section> </section>
<section name="About the Demos"> <section name="About the Demos">
<p> <p>
The Lucene Demo code is a set of command line example applications that demonstrate various The Lucene command-line demo code consists of two applications that demonstrate various
functionality of Lucene and how one should go about adding it to their functionalities of Lucene and how one should go about adding Lucene to their applications.
applications.
</p> </p>
</section> </section>
<section name="Setting your classpath"> <section name="Setting your CLASSPATH">
<p> <p>
First, extract the latest Lucene distribution. First, you should <a href="http://www.apache.org/dyn/closer.cgi/lucene/java/">download</a> the
latest Lucene distribution and then extract it to a working directory. Alternatively, you can <a
href="http://wiki.apache.org/jakarta-lucene/SourceRepository">check out the sources from
Subversion</a>, and then run <code>ant war-demo</code> to generate the JARs and WARs.
</p> </p>
<p> <p>
You should see the Apache Lucene jar file in the directory you created You should see the Lucene JAR file in the directory you created when you extracted the archive. It
when you extracted the archive. It should be named something like should be named something like <code>lucene-core-{version}.jar</code>. You should also see a file
<b>lucene-{version}.jar</b>. called <code>lucene-demos-{version}.jar</code>. If you checked out the sources from Subversion then
</p> the JARs are located under the <code>build</code> subdirectory (after running <code>ant</code>
<p> successfully). Put both of these files in your Java CLASSPATH.
You should also see a file called called <b>lucene-demos-{version}.jar</b>.
Put both of these files in your Java CLASSPATH.
</p> </p>
</section> </section>
<section name="Indexing Files"> <section name="Indexing Files">
<p> <p>
Once you've gotten this far you're probably itching to go. Let's <b> build an index!</b> Once you've gotten this far you're probably itching to go. Let's <b>build an index!</b> Assuming
Assuming you've set your classpath correctly, just type you've set your CLASSPATH correctly, just type:
"java org.apache.lucene.demo.IndexFiles {full-path-to-lucene}/src". This will produce
a subdirectory called "index" which will contain an index of all of the Lucene <pre>
sourcecode. java org.apache.lucene.demo.IndexFiles {full-path-to-lucene}/src
</pre>
This will produce a subdirectory called <code>index</code> which will contain an index of all of the
Lucene source code.
</p> </p>
<p> <p>
<b> To search the index </b> type "java org.apache.lucene.demo.SearchFiles". You'll be prompted To <b>search the index</b> type:
for a query. Type in a swear word and press the enter key. You'll see that the Lucene
developers are very well mannered and get no results. Now try entering the word "vector". <pre>
That should return a whole bunch of documents. The results will page at every tenth java org.apache.lucene.demo.SearchFiles
result and ask you whether you want more results. </pre>
You'll be prompted for a query. Type in a swear word and press the enter key. You'll see that the
Lucene developers are very well mannered and get no results. Now try entering the word "vector".
That should return a whole bunch of documents. The results will page at every tenth result and ask
you whether you want more results.
</p> </p>
</section> </section>


@ -2,89 +2,132 @@
<document> <document>
<properties> <properties>
<author email="acoliver@apache.org">Andrew C. Oliver</author> <author email="acoliver@apache.org">Andrew C. Oliver</author>
<title>Apache Lucene - Basic Demo Sources Walkthrough</title> <title>Apache Lucene - Basic Demo Sources Walk-through</title>
</properties> </properties>
<body> <body>
<section name="About the Code"> <section name="About the Code">
<p> <p>
In this section we walk through the sources behind the basic Lucene demo such as where to In this section we walk through the sources behind the command-line Lucene demo: where to find them,
find it, its parts and their function. This section is intended for Java developers their parts and their function. This section is intended for Java developers wishing to understand
wishing to understand how to use Apache Lucene in their applications. how to use Lucene in their applications.
</p> </p>
</section> </section>
<section name="Location of the source"> <section name="Location of the source">
<p> <p>
Relative to the directory created when you extracted Lucene or retreived it from Subversion, you Relative to the directory created when you extracted Lucene or retrieved it from Subversion, you
should see a directory called "src" which in turn contains a directory called "demo". should see a directory called <code>src</code> which in turn contains a directory called
This is the root for all of the Lucene demos. Under this directory is org/apache/lucene/demo, <code>demo</code>. This is the root for all of the Lucene demos. Under this directory is
this is where all the Java sources live. <code>org/apache/lucene/demo</code>. This is where all the Java sources for the demos live.
</p> </p>
<p> <p>
Within this directory you should see the IndexFiles class we executed earlier. Bring that Within this directory you should see the <code>IndexFiles.java</code> class we executed earlier.
up in vi or your alternative text editor and lets take a look at it. Bring it up in <code>vi</code> or your editor of choice and let's take a look at it.
</p> </p>
</section> </section>
<section name="IndexFiles"> <section name="IndexFiles">
<p> <p>
As we discussed in the previous walkthrough, the IndexFiles class creates a Lucene Index. As we discussed in the previous walk-through, the <code><a
Lets take a look at how it does this. href="api/org/apache/lucene/demo/IndexFiles.html">IndexFiles</a></code> class creates a Lucene
Index. Let's take a look at how it does this.
</p> </p>
<p> <p>
The first substantial thing the main function does is instantiate an instance The first substantial thing the <code>main</code> function does is instantiate <code><a
of IndexWriter. It passes a string called "index" and a new instance of a class called href="api/org/apache/lucene/index/IndexWriter.html">IndexWriter</a></code>. It passes the string
"StandardAnalyzer". The "index" string is the name of the directory that all index information "<code>index</code>" and a new instance of a class called <code><a
should be stored in. Because we're not passing any path information, one must assume this href="api/org/apache/lucene/analysis/standard/StandardAnalyzer.html">StandardAnalyzer</a></code>.
will be created as a subdirectory of the current directory (if it does not already exist). On The "<code>index</code>" string is the name of the filesystem directory where all index information
some platforms this may actually result in it being created in other directories (such as should be stored. Because we're not passing a full path, this will be created as a subdirectory of
the user's home directory). the current working directory (if it does not already exist). On some platforms, it may be created
in other directories (such as the user's home directory).
</p> </p>
<p> <p>
The <b>IndexWriter</b> is the main class responsible for creating indicies. To use it you The <code><a href="api/org/apache/lucene/index/IndexWriter.html">IndexWriter</a></code> is the main
must instantiate it with a path that it can write the index into, if this path does not class responsible for creating indices. To use it you must instantiate it with a path that it can
exist it will create it, otherwise it will refresh the index living at that path. You write the index into. If this path does not exist it will first create it. Otherwise it will
must a also pass an instance of <b>org.apache.lucene.analysis.Analyzer</b>. refresh the index at that path. You can also create an index using one of the subclasses of <code><a
href="api/org/apache/lucene/store/Directory.html">Directory</a></code>. In any case, you must also pass an
instance of <code><a
href="api/org/apache/lucene/analysis/Analyzer.html">org.apache.lucene.analysis.Analyzer</a></code>.
</p> </p>
<p> <p>
The <b>Analyzer</b>, in this case, the <b>StandardAnalyzer</b> is little more than a standard Java The particular <code><a href="api/org/apache/lucene/analysis/Analyzer.html">Analyzer</a></code> we
Tokenizer, converting all strings to lowercase and filtering out useless words and characters from the index. are using, <code><a
By useless words and characters I mean common language words such as articles (a, an, the, etc.) and other href="api/org/apache/lucene/analysis/standard/StandardAnalyzer.html">StandardAnalyzer</a></code>, is
strings that would be useless for searching (e.g. <b>'s</b>) . It should be noted that there are different little more than a standard Java Tokenizer, converting all strings to lowercase and filtering out
rules for every language, and you should use the proper analyzer for each. Lucene currently useless words and characters from the index. By useless words and characters I mean common language
provides Analyzers for English and German, more can be found in the Lucene Sandbox. words such as articles (a, an, the, etc.) and other strings that would be useless for searching
(e.g. <b>'s</b>) . It should be noted that there are different rules for every language, and you
should use the proper analyzer for each. Lucene currently provides Analyzers for a number of
different languages (see the <code>*Analyzer.java</code> sources under <a
href="http://svn.apache.org/repos/asf/lucene/java/trunk/contrib/analyzers/src/java/org/apache/lucene/analysis/">contrib/analyzers/src/java/org/apache/lucene/analysis</a>).
</p> </p>
<p> <p>
Looking down further in the file, you should see the indexDocs() code. This recursive function Looking further down in the file, you should see the <code>indexDocs()</code> code. This recursive
simply crawls the directories and uses FileDocument to create Document objects. The Document function simply crawls the directories and uses <code><a
is simply a data object to represent the content in the file as well as its creation time and href="api/org/apache/lucene/demo/FileDocument.html">FileDocument</a></code> to create <code><a
location. These instances are added to the indexWriter. Take a look inside FileDocument. It's href="api/org/apache/lucene/document/Document.html">Document</a></code> objects. The <code><a
not particularly complicated, it just adds fields to the Document. href="api/org/apache/lucene/document/Document.html">Document</a></code> is simply a data object to
represent the content in the file as well as its creation time and location. These instances are
added to the <code>indexWriter</code>. Take a look inside <code><a
href="api/org/apache/lucene/demo/FileDocument.html">FileDocument</a></code>. It's not particularly
complicated. It just adds fields to the <code><a
href="api/org/apache/lucene/document/Document.html">Document</a></code>.
</p> </p>
<p> <p>
As you can see there isn't much to creating an index. The devil is in the details. You may also As you can see there isn't much to creating an index. The devil is in the details. You may also
wish to examine the other samples in this directory, particularly the IndexHTML class. It is wish to examine the other samples in this directory, particularly the <code><a
a bit more complex but builds upon this example. href="api/org/apache/lucene/demo/IndexHTML.html">IndexHTML</a></code> class. It is a bit more
complex but builds upon this example.
</p> </p>
</section> </section>
<section name="Searching Files"> <section name="Searching Files">
<p> <p>
The SearchFiles class is quite simple. It primarily collaborates with an IndexSearcher, StandardAnalyzer The <code><a href="api/org/apache/lucene/demo/SearchFiles.html">SearchFiles</a></code> class is
(which is used in the IndexFiles class as well) and a QueryParser. The query parser is constructed quite simple. It primarily collaborates with an <code><a
with an analyzer used to interperate your query in the same way the Index was interperated: finding href="api/org/apache/lucene/search/IndexSearcher.html">IndexSearcher</a></code>, <code><a
the end of words and removing useless words like 'a', 'an' and 'the'. The Query object contains the href="api/org/apache/lucene/analysis/standard/StandardAnalyzer.html">StandardAnalyzer</a></code>
results from the QueryParser which is passed to the searcher. The searcher results are returned in (which is used in the <code><a
a collection of Documents called "Hits" which is then iterated through and displayed to the user. href="api/org/apache/lucene/demo/IndexFiles.html">IndexFiles</a></code> class as well) and a
<code><a href="api/org/apache/lucene/queryParser/QueryParser.html">QueryParser</a></code>. The
query parser is constructed with an analyzer used to interpret your query text in the same way the
documents are interpreted: finding the end of words and removing useless words like 'a', 'an' and
'the'. The <code><a href="api/org/apache/lucene/search/Query.html">Query</a></code> object contains
the results from the <code><a
href="api/org/apache/lucene/queryParser/QueryParser.html">QueryParser</a></code> which is passed to
the searcher. Note that it's also possible to programmatically construct a rich <code><a
href="api/org/apache/lucene/search/Query.html">Query</a></code> object without using the query
parser. The query parser just enables decoding the <a href="queryparsersyntax.html">Lucene query
syntax</a> into the corresponding <code><a
href="api/org/apache/lucene/search/Query.html">Query</a></code> object. The searcher results are
returned in a collection of Documents called <code><a
href="api/org/apache/lucene/search/Hits.html">Hits</a></code> which is then iterated through and
displayed to the user.
</p> </p>
</section> </section>
<section name="The Web example..."> <section name="The Web example...">
<p> <p>
<a href="demo3.html">read on&gt;&gt;&gt;</a> <a href="demo3.html">read on&gt;&gt;&gt;</a>
</p> </p>
</section> </section>
</body> </body>
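The IndexFiles section above describes the analyzer as lowercasing text and filtering out articles (a, an, the) and strings such as `'s`. The toy sketch below shows that idea in plain Java. It is an assumption-laden illustration and not how `StandardAnalyzer` is implemented: `ToyAnalyzer`, its three-word stop list and its splitting rule are all made up for this example, and no Lucene classes are involved.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Locale;
import java.util.Set;

/** Toy illustration of lowercasing plus stop-word removal; not Lucene code. */
final class ToyAnalyzer {
    // Hypothetical stop list, limited to the articles named in the walk-through.
    private static final Set<String> STOP_WORDS =
            new HashSet<>(Arrays.asList("a", "an", "the"));

    static List<String> analyze(String text) {
        List<String> terms = new ArrayList<>();
        // Split on anything that is not a letter, digit or apostrophe.
        for (String token : text.toLowerCase(Locale.ROOT).split("[^\\p{L}\\p{N}']+")) {
            if (token.endsWith("'s")) {
                token = token.substring(0, token.length() - 2); // drop possessive 's
            }
            if (!token.isEmpty() && !STOP_WORDS.contains(token)) {
                terms.add(token);
            }
        }
        return terms;
    }

    public static void main(String[] args) {
        System.out.println(analyze("The Vector's size is an int"));
        // prints "[vector, size, is, int]"
    }
}
```

Running the same routine over both the indexed text and the query text is what makes a query for "Vector" match a document containing "vector's", which is why the walk-through recommends using the same analyzer for indexing and searching.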


@ -9,77 +9,75 @@
<section name="About this Document"> <section name="About this Document">
<p> <p>
This document is intended as a "getting started" guide to installing and running the This document is intended as a "getting started" guide to installing and running the Lucene
Apache Lucene web application demo. This guide assumes that you have read the web application demo. This guide assumes that you have read the information in the previous two
information in the previous two examples or already know it anyhow. We'll use examples. We'll use Tomcat as our reference web container. These demos should work with nearly any
Tomcat 4.0.1 as our reference web container. These demos should work with nearly container, but you may have to adapt them appropriately.
any container, but it is up to you to adapt them appropriately.
</p> </p>
</section> </section>
<section name="About the Demos"> <section name="About the Demos">
<p> <p>
The Lucene Web Application demo is a template web application intended for deployment The Lucene Web Application demo is a template web application intended for deployment on Tomcat or a
on Tomcat or a similar web container. It's NOT designed as a "best practices" similar web container. It's NOT designed as a "best practices" implementation by ANY means. It's
implementation by ANY means. It's more of a "hello world" type Lucene Web App. more of a "hello world" type Lucene Web App. The purpose of this application is to demonstrate
The purpose of this application is to demonstrate Lucene. With that being said, Lucene. With that being said, it should be relatively simple to create a small searchable website
it should be relatively simple to create a small searchable website in Tomcat or in Tomcat or a similar application server.
a similar application server.
</p> </p>
</section> </section>
<section name="Indexing Files"> <section name="Indexing Files">
<p> <p> Once you've gotten this far you're probably itching to go. Let's start by creating the index
Once you've gotten this far you're probably itching to go. you'll need for the web examples. Since you've already set your CLASSPATH in the previous examples,
Let's start by creating the index you'll need for the web examples. all you need to do is type:
Since you've already set your classpath in the previous examples,
all you need to do is type <pre>
<b> "java org.apache.lucene.demo.IndexHTML -create -index {index-dir} .."</b>. java org.apache.lucene.demo.IndexHTML -create -index {index-dir} ..
You'll need to do this from a (any) subdirectory of your {tomcat}/webapps directory (make sure you didn't leave off the ".." or you'll get a null pointer </pre>
exception).
{index-dir} You'll need to do this from a (any) subdirectory of your <code>{tomcat}/webapps</code> directory
should be a directory that Tomcat has permission to read and write, but is (make sure you didn't leave off the <code>..</code> or you'll get a null pointer exception).
outside of a web accessible context. By default the webapp is configured <code>{index-dir}</code> should be a directory that Tomcat has permission to read and write, but is
to look in <b>/opt/lucene/index</b> for this index. outside of a web accessible context. By default the webapp is configured to look in
<code>/opt/lucene/index</code> for this index.
</p> </p>
</section> </section>
<section name="Deploying the Demos"> <section name="Deploying the Demos">
<p>Located in your distribution directory you should see <p>Located in your distribution directory you should see a war file called
a war file called luceneweb.war. Copy this to your <code>luceneweb.war</code>. If you're working with a Subversion checkout, this will be under the
{tomcat-home}/webapps directory. You may need to restart <code>build</code> subdirectory. Copy this to your <code>{tomcat-home}/webapps</code> directory.
Tomcat. </p> You may need to restart Tomcat. </p> </section>
</section>
<section name="Configuration"> <section name="Configuration">
<p> <p> From your Tomcat directory look in the <code>webapps/luceneweb</code> subdirectory. If it's not
From your Tomcat directory look in the webapps/luceneweb subdirectory. If it's not present, try browsing to <code>http://localhost:8080/luceneweb</code> (which causes Tomcat to deploy
present, try browsing to "http://localhost:8080/luceneweb" then look again. the webapp), then look again. Edit a file called <code>configuration.jsp</code>. Ensure that the
Edit a file called configuration.jsp. Ensure that the indexLocation is equal to the <code>indexLocation</code> is equal to the location you used for your index. You may also customize
location you used for your index. You may also customize the appTitle and appFooter the <code>appTitle</code> and <code>appFooter</code> strings as you see fit. Once you have finished
strings as you see fit. Once you have finished altering the configuration you should altering the configuration you may need to restart Tomcat. You may also wish to update the war file
restart Tomcat. You may also wish to update the war file by typing by typing <code>jar -uf luceneweb.war configuration.jsp</code> from the <code>luceneweb</code>
<b>jar -uf luceneweb.war configuration.jsp</b> from the luceneweb subdirectory. subdirectory. (The -u option is not available in all versions of jar. In this case recreate the
(The -u option is not available in all versions of jar. In this case recreate the war file). war file).
</p> </p>
</section> </section>
<section name="Running the Demos"> <section name="Running the Demos">
<p>Now you're ready to roll. In your browser set the url to "http://localhost:8080/luceneweb" <p>Now you're ready to roll. In your browser set the url to
enter "test" and the number of items per page and press search.</p> <code>http://localhost:8080/luceneweb</code> enter <code>test</code> and the number of items per
<p>You should now be looking either at a number of results (provided you didn't erase the page and press search.</p>
Tomcat examples) or nothing. Try other search terms. Depending on the number of items <p>You should now be looking either at a number of results (provided you didn't erase the Tomcat
per page you set and results returned, there may be a link at the bottom that says "more results>>", examples) or nothing. If you get an error regarding opening the index, then you probably set the
clicking it goes to subsequent pages. If you get an error regarding opening the index, then you path in <code>configuration.jsp</code> incorrectly or Tomcat doesn't have permissions to the index
probably set the path in "configuration" incorrectly or Tomcat doesn't have permissions to the (or you skipped the step of creating it). Try other search terms. Depending on the number of items
index (or you skipped the step of creating it).</p> per page you set and results returned, there may be a link at the bottom that says <b>More
</section> Results>></b>; clicking it takes you to subsequent pages. </p> </section>
<section name="About the code..."> <section name="About the code...">
<p> <p>
If you want to know more about how this web app works or how to customize it then If you want to know more about how this web app works or how to customize it then <a
<a href="demo4.html">read on&gt;&gt;&gt;</a>. href="demo4.html">read on&gt;&gt;&gt;</a>.
</p> </p>
</section> </section>


@ -8,124 +8,146 @@
<section name="About the Code"> <section name="About the Code">
<p> <p>
In this section we walk through the sources behind the basic Lucene Web Application demo. In this section we walk through the sources behind the basic Lucene Web Application demo: where to
Where to find it, its parts, and their function. This section is intended for Java developers find them, their parts and their function. This section is intended for Java developers wishing to
wishing to understand how to use Apache Lucene in their applications or for those involved understand how to use Lucene in their applications or for those involved in deploying web
in deploying web applications based on Lucene. applications based on Lucene.
</p> </p>
</section> </section>
<section name="Location of the source (developers/deployers)"> <section name="Location of the source (developers/deployers)">
<p> <p>
Relative the directory created when you extracted Lucene or retreived it from Subversion, you Relative to the directory created when you extracted Lucene or retrieved it from Subversion, you
should see a directory called "src" which in turn contains a directory called "jsp". should see a directory called <code>src</code> which in turn contains a directory called
This is the root for all of the Lucene web demo. <code>jsp</code>. This is the root for all of the Lucene web demo.
</p> </p>
<p> <p>
Within this directory you should see the index.jsp class. Bring this up in vi or your Within this directory you should see <code>index.jsp</code>. Bring this up in vi or your editor of
editor of choice. choice.
</p> </p>
</section> </section>
<section name="index.jsp (developers/deployers)"> <section name="index.jsp (developers/deployers)">
<p> <p>
This jsp page is pretty boring by itself. All it does is include a header, display a form and This jsp page is pretty boring by itself. All it does is include a header, display a form and
include a footer. If you look at the form, it has two fields: query (where you enter your include a footer. If you look at the form, it has two fields: <code>query</code> (where you enter
search criteria) and maxresults where you specify the number of results per page. If you look your search criteria) and <code>maxresults</code> where you specify the number of results per page.
at the form tag, you'll notice it uses the get method as opposed to the post. While this is By the structure of this JSP it should be easy to customize it without even editing this particular
considered deprecated functionality by the latest w3c specs, its unlikely to go away due to the file. You could simply change the header and footer. Let's look at the <code>header.jsp</code>
usefulness of being able to bookmark things like searches. By the structure of this JSP it should (located in the same directory) next.
be easy to customize it without even editing this particular file. You could simply change the
header and footer. Let's look at the header.jsp (located in the same directory) next.
</p> </p>
</section> </section>
<section name="header.jsp (developers/deployers)"> <section name="header.jsp (developers/deployers)">
<p> <p>
The header is also very simple by itself. The only thing it does is include the configuration.jsp The header is also very simple by itself. The only thing it does is include the
(which you looked at in the last section of this guide) and set the title and a brief header. This <code>configuration.jsp</code> (which you looked at in the last section of this guide) and set the
would be a good place to put your own custom HTML to "pretty" things up a bit. We won't cover the title and a brief header. This would be a good place to put your own custom HTML to "pretty" things
footer because all it does is display the footer and close your tags. Let's look at the results.jsp, up a bit. We won't cover the footer because all it does is display the footer and close your tags.
the meat of this application next. Let's look at the <code>results.jsp</code>, the meat of this application, next.
</p> </p>
</section> </section>
<section name="results.jsp (developers)"> <section name="results.jsp (developers)">
<p> <p>
The results.jsp had a lot more functionality. Much of it is for paging the search results we'll not Most of the functionality lies in <code>results.jsp</code>. Much of it is for paging the search
cover this as it's commented well enough. It does not perform any optimizations such as caching results, results, which we'll not cover here as it's commented well enough. The first thing in this page is
etc. as that would make this a more complex example. The first thing in this page is the actual imports the actual imports for the Lucene classes and Lucene demo classes. These classes are loaded from
for the Lucene classes and Lucene demo classes. These classes are loaded from the jars included in the the jars included in the <code>WEB-INF/lib</code> directory in the <code>luceneweb.war</code> file.
WEB-INF/lib directory in the final war file.
</p> </p>
<p> <p>
You'll notice that this file includes the same header and footer as the "index.jsp". From there the jsp You'll notice that this file includes the same header and footer as <code>index.jsp</code>. From
constructs an IndexSearcher with the "indexLocation" that was specified in the "configuration.jsp". If there there it constructs an <code><a
is an error of any kind in opening the index, it is diplayed to the user and a boolean flag is set to tell href="api/org/apache/lucene/search/IndexSearcher.html">IndexSearcher</a></code> with the
the rest of the sections of the jsp not to continue. <code>indexLocation</code> that was specified in <code>configuration.jsp</code>. If there is an
error of any kind in opening the index, it is displayed to the user and the boolean flag
<code>error</code> is set to tell the rest of the sections of the jsp not to continue.
</p> </p>
<p> <p>
From there, this jsp attempts to get the search criteria, the start index (used for paging) and the maximum From there, this jsp attempts to get the search criteria, the start index (used for paging) and the
number of results per page. If the maximum results per page is not set or not valid then it and the maximum number of results per page. If the maximum results per page is not set or not valid then it
start index are set to default values. If only the start index is invalid it is set to a default value. If and the start index are set to default values. If only the start index is invalid it is set to a
the criteria isn't provided then a servlet error is thrown (it is assumed that this is the result of url tampering default value. If the criteria isn't provided then a servlet error is thrown (it is assumed that
or some form of browser malfunction). this is the result of url tampering or some form of browser malfunction).
</p> </p>
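The defaulting rules described above can be sketched in plain Java. This is only an illustration, not the code in `results.jsp`: the class name `PagingParams`, the default of 50 results per page, and the choice to treat non-positive or negative values as invalid are all assumptions made for the example.

```java
// Hypothetical helper, not part of the Lucene demo sources.
final class PagingParams {
    // Assumed defaults; the demo's actual values may differ.
    static final int DEFAULT_START = 0;
    static final int DEFAULT_MAX_RESULTS = 50;

    final int start;
    final int maxResults;

    private PagingParams(int start, int maxResults) {
        this.start = start;
        this.maxResults = maxResults;
    }

    // Parses the raw request parameters (either may be null).
    static PagingParams parse(String startParam, String maxParam) {
        int max;
        try {
            // Integer.parseInt(null) also throws NumberFormatException.
            max = Integer.parseInt(maxParam);
        } catch (NumberFormatException e) {
            max = -1;
        }
        if (max <= 0) {
            // Missing or invalid page size: reset both values.
            return new PagingParams(DEFAULT_START, DEFAULT_MAX_RESULTS);
        }

        int start;
        try {
            start = Integer.parseInt(startParam);
        } catch (NumberFormatException e) {
            start = DEFAULT_START;
        }
        if (start < 0) {
            // Only the start index is invalid: reset it alone.
            start = DEFAULT_START;
        }
        return new PagingParams(start, max);
    }
}
```

A caller would pass the raw request parameter strings, for example `PagingParams.parse(request.getParameter("startat"), request.getParameter("maxresults"))`, where the parameter name `startat` is itself an assumption.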
<p> <p>
The jsp moves on to construct a StandardAnalyzer just as in the simple demo, to analyze the search critieria, it The jsp moves on to construct a <code><a
is passed to the QueryParser along with the criteria to construct a Query object. You'll also notice the href="api/org/apache/lucene/analysis/standard/StandardAnalyzer.html">StandardAnalyzer</a></code> to
string literal "contents" included. This is to specify the search should include the contents and not analyze the search text. This matches the analyzer used during indexing (<code><a
the title, url or some other field in the indexed documents. If there is any error in constructing a Query href="api/org/apache/lucene/demo/IndexHTML.html">IndexHTML</a></code>), which is generally
object an error is displayed to the user. recommended. This is passed to the <code><a
href="api/org/apache/lucene/queryParser/QueryParser.html">QueryParser</a></code> along with the
criteria to construct a <code><a href="api/org/apache/lucene/search/Query.html">Query</a></code>
object. You'll also notice the string literal <code>"contents"</code> included. This specifies
that the search should cover the <code>contents</code> field and not the <code>title</code>,
<code>url</code> or some other field in the indexed documents. If there is any error in
constructing a <code><a href="api/org/apache/lucene/search/Query.html">Query</a></code> object an
error is displayed to the user.
</p> </p>
<p> <p>
In the next section of the jsp the IndexSearcher is asked to search given the query object. The results are In the next section of the jsp the <code><a
returned in a collection called "hits". If the length property of the hits collection is 0 then an error href="api/org/apache/lucene/search/IndexSearcher.html">IndexSearcher</a></code> is asked to search
is displayed to the user and the error flag is set. given the query object. The results are returned in a collection called <code>hits</code>. If the
length property of the <code>hits</code> collection is 0 (meaning there were no results) then an
error is displayed to the user and the error flag is set.
</p> </p>
<p> <p>
Finally the jsp iterates through the hits collection and displayed properties of the "Document" objects we talked Finally the jsp iterates through the <code>hits</code> collection, taking the current page into
about in the first walkthrough. These objects contain "known" fields specific to their indexer (in this case account, and displays properties of the <code><a
"IndexHTML" constructs a document with "url", "title" and "contents"). You'll notice that these results are paged href="api/org/apache/lucene/document/Document.html">Document</a></code> objects we talked about in
but the search is repeated every time. This is an area where optimization could improve performance for large the first walkthrough. These objects contain "known" fields specific to their indexer (in this case
result sets. <code><a href="api/org/apache/lucene/demo/IndexHTML.html">IndexHTML</a></code> constructs a document
with "url", "title" and "contents").
</p>
<p>
Please note that in a real deployment of Lucene, it's best to instantiate <code><a
href="api/org/apache/lucene/search/IndexSearcher.html">IndexSearcher</a></code> and <code><a
href="api/org/apache/lucene/queryParser/QueryParser.html">QueryParser</a></code> once, and then
share them across search requests, instead of re-instantiating per search request.
</p> </p>
</section> </section>
<section name="More sources (developers)"> <section name="More sources (developers)">
<p> <p>
There are additional sources used by the web app that were not specifically covered by either walkthrough. For There are additional sources used by the web app that were not specifically covered by either
example the HTML parser, the IndexHTML class and HTMLDocument class. These are very similar to the classes walkthrough. For example the HTML parser, the <code><a
covered in the first example, however they have properties sepecific to parsing and indexing HTML. This is href="api/org/apache/lucene/demo/IndexHTML.html">IndexHTML</a></code> class and <code><a
beyond our scope; however, by now you should feel like you're "getting started" with Lucene. href="api/org/apache/lucene/demo/HTMLDocument.html">HTMLDocument</a></code> class. These are very
similar to the classes covered in the first example, with properties specific to parsing and
indexing HTML. This is beyond our scope; however, by now you should feel like you're "getting
started" with Lucene.
</p> </p>
</section> </section>
<section name="Where to go from here? (Everyone!)"> <section name="Where to go from here? (everyone!)">
<p> <p>
There are a number of things this demo doesn't do or doesn't do quite right. For instance, you may There are a number of things this demo doesn't do or doesn't do quite right. For instance, you may
have noticed that documents in the root context are unreachable (unless you reconfigure Tomcat to have noticed that documents in the root context are unreachable (unless you reconfigure Tomcat to
support that context or redirect to it), anywhere where the directory doesn't quite match the context mapping, support that context or redirect to it), anywhere where the directory doesn't quite match the
you'll have a broken link in your results. If you want to index non-local files or have some other context mapping, you'll have a broken link in your results. If you want to index non-local files or
needs this isn't supported, plus there may be security issues with running the indexing application from have some other needs this isn't supported, plus there may be security issues with running the
your webapps directory. There are a number of things left for you the implementor or developer to do. indexing application from your webapps directory. There are a number of things left for you the
developer to do.
</p> </p>
<p> <p>
In time some of these things may be added to Lucene as features (if you've got a good idea we'd love to hear it!), In time some of these things may be added to Lucene as features (if you've got a good idea we'd love
but for now: this is where you begin and the search engine/indexer ends. Lastly, one would assume you'd to hear it!), but for now: this is where you begin and the search engine/indexer ends. Lastly, one
want to follow the above advice and customize the application to look a little more fancy than black on would assume you'd want to follow the above advice and customize the application to look a little
white with "Lucene Template" at the top. We'll see you on the Lucene Users' or Developers' mailing lists! more fancy than black on white with "Lucene Template" at the top. We'll see you on the Lucene
Users' or Developers' <a href="mailinglists.html">mailing lists</a>!
</p> </p>
</section> </section>
<section name="When to contact the Author"> <section name="When to contact the Author">
<p> <p>
Please resist the urge to contact the authors of this document (without bribes of fame and fortune attached). First Please resist the urge to contact the authors of this document (without bribes of fame and fortune
contact the <a href="http://lucene.apache.org/java/docs/mailinglists.html">mailing lists</a>. That being said feedback, attached). First contact the <a href="mailinglists.html">mailing lists</a>, taking care to <a
and modifications to this document and samples are ever so greatly appreciated. They are just best sent to the href="http://www.catb.org/~esr/faqs/smart-questions.html">Ask Questions The Smart Way</a>.
lists so that everyone can share in them. Certainly you'll get the most help there as well. Certainly you'll get the most help that way as well. That being said, feedback, and modifications
Thanks for understanding. to this document and samples are ever so greatly appreciated. They are just best sent to the lists
or <a href="http://wiki.apache.org/jakarta-lucene/HowToContribute">posted as patches</a>, so that
everyone can share in them. Thanks for understanding!
</p> </p>
</section> </section>


@ -8,42 +8,40 @@
<section name="Getting Started"> <section name="Getting Started">
<p> <p>
This document is intended as a "getting started" guide. It has three basic This document is intended as a "getting started" guide. It has three audiences: first-time users
audiences: novices looking to install Apache Lucene on their application or looking to install Apache Lucene in their application or web server; developers looking to modify or base
web server, developers looking to modify or base the applications they develop the applications they develop on Lucene; and developers looking to become involved in and contribute
on Lucene, and developers looking to become involved in and contribute to the to the development of Lucene. This document is written in tutorial and walk-through format. The
development of Lucene. This document is written in tutorial and walkthrough goal is to help you "get started". It does not go into great depth on some of the conceptual or
format. It intends to help you in "getting started", but does not go into great inner details of Lucene.
depth into some of the conceptual or inner details of Apache Lucene.
</p> </p>
<p> <p>
Each section listed below builds on one another. That being said more advanced users may Each section listed below builds on one another. More advanced users
wish to skip sections. may wish to skip sections.
</p> </p>
<ul> <ul>
<li><a href="demo.html">About the basic Lucene demo and its usage</a>. <li><a href="demo.html">About the command-line Lucene demo and its usage</a>. This section
This section is intended for anyone who wants a basic background on using the provided Lucene demos.</li> is intended for anyone who wants to use the command-line Lucene demo.</li> <p/>
<li><a href="demo2.html">About the sources and implementation <li><a href="demo2.html">About the sources and implementation for the command-line Lucene
for the basic Lucene demo</a> section we walk through . This section is intended for developers.</li> demo</a>. This section walks through the implementation details (sources) of the
command-line Lucene demo. This section is intended for developers.</li> <p/>
<li><a href="demo3.html">About installing <li><a href="demo3.html">About installing and configuring the demo template web
and configuring the template web application</a>. While this walkthrough assumes application</a>. While this walk-through assumes Tomcat as your container of choice,
Tomcat 4.0.x as your container of choice, there is no reason you can't (provided you have there is no reason you can't (provided you have the requisite knowledge) adapt the
the requisite knowledge) adapt the instructions to your container. This section is intended instructions to your container. This section is intended for those responsible for the
for those responsible for the development or deployment of Lucene-based web applications.</li> development or deployment of Lucene-based web applications.</li> <p/>
<li><a href="demo4.html">About the sources used to construct the demo template web
application</a>. Please note the template application is designed to highlight features of
Lucene and is <b>not</b> an example of best practices. (One would hopefully use MVC
architecture such as provided by Jakarta Struts and taglibs, but showing you how to do that
would be WAY beyond the scope of this guide.) This section is intended for developers and
those wishing to customize the demo template web application to their needs. </li>
<li><a href="demo4.html">About the sources used to construct the
template web application</a>. Please note the template application is designed to highlight
features of Lucene and is <b>not</b> an example of best practices. (One would hopefully
use MVC architecture such as provided by Jakarta Struts and taglibs, or better yet XML
with stylesheets, but showing you how to do that would be WAY beyond the scope of this
demonstration. Additionally one could cache results, and perform other performance
optimizations, but those are beyond the scope of this demo).
This section is intended for developers and those wishing to customize the template web
application to their needs. The sections useful to developers only are clearly delineated.</li>
</ul> </ul>
</section> </section>