mirror of https://github.com/apache/lucene.git
LUCENE-646: fix various small issues with the "getting started" demo pages (patch by Michael McCandless)
git-svn-id: https://svn.apache.org/repos/asf/lucene/java/trunk@428554 13f79535-47bb-0310-9956-ffa450edef68
parent 10517a310c
commit a9a325a4df
@@ -114,8 +114,8 @@ limitations under the License.
 <tr><td>
 <blockquote>
 <p>
-This document is intended as a "getting started" guide to using and running the
-Apache Lucene demos. It walks you through some basic installation and configuration.
+This document is intended as a "getting started" guide to using and running the Lucene demos.
+It walks you through some basic installation and configuration.
 </p>
 </blockquote>
 </p>
@@ -131,9 +131,8 @@ Apache Lucene demos. It walks you through some basic installation and configura
 <tr><td>
 <blockquote>
 <p>
-The Lucene Demo code is a set of command line example applications that demonstrate various
-functionality of Lucene and how one should go about adding it to their
-applications.
+The Lucene command-line demo code consists of two applications that demonstrate various
+functionalities of Lucene and how one should go about adding Lucene to their applications.
 </p>
 </blockquote>
 </p>
@@ -143,22 +142,22 @@ applications.
 <table border="0" cellspacing="0" cellpadding="2" width="100%">
 <tr><td bgcolor="#525D76">
 <font color="#ffffff" face="arial,helvetica,sanserif">
-<a name="Setting your classpath"><strong>Setting your classpath</strong></a>
+<a name="Setting your CLASSPATH"><strong>Setting your CLASSPATH</strong></a>
 </font>
 </td></tr>
 <tr><td>
 <blockquote>
 <p>
-First, extract the latest Lucene distribution.
+First, you should <a href="http://www.apache.org/dyn/closer.cgi/lucene/java/">download</a> the
+latest Lucene distribution and then extract it to a working directory. Alternatively, you can <a href="http://wiki.apache.org/jakarta-lucene/SourceRepository">check out the sources from
+Subversion</a>, and then run <code>ant war-demo</code> to generate the JARs and WARs.
 </p>
 <p>
-You should see the Apache Lucene jar file in the directory you created
-when you extracted the archive. It should be named something like
-<b>lucene-{version}.jar</b>.
-</p>
-<p>
-You should also see a file called called <b>lucene-demos-{version}.jar</b>.
-Put both of these files in your Java CLASSPATH.
+You should see the Lucene JAR file in the directory you created when you extracted the archive. It
+should be named something like <code>lucene-core-{version}.jar</code>. You should also see a file
+called <code>lucene-demos-{version}.jar</code>. If you checked out the sources from Subversion then
+the JARs are located under the <code>build</code> subdirectory (after running <code>ant</code>
+successfully). Put both of these files in your Java CLASSPATH.
 </p>
 </blockquote>
 </p>
@@ -174,18 +173,27 @@ Put both of these files in your Java CLASSPATH.
 <tr><td>
 <blockquote>
 <p>
-Once you've gotten this far you're probably itching to go. Let's <b> build an index!</b>
-Assuming you've set your classpath correctly, just type
-"java org.apache.lucene.demo.IndexFiles {full-path-to-lucene}/src". This will produce
-a subdirectory called "index" which will contain an index of all of the Lucene
-sourcecode.
+Once you've gotten this far you're probably itching to go. Let's <b>build an index!</b> Assuming
+you've set your CLASSPATH correctly, just type:
+
+<pre>
+java org.apache.lucene.demo.IndexFiles {full-path-to-lucene}/src
+</pre>
+
+This will produce a subdirectory called <code>index</code> which will contain an index of all of the
+Lucene source code.
 </p>
 <p>
-<b> To search the index </b> type "java org.apache.lucene.demo.SearchFiles". You'll be prompted
-for a query. Type in a swear word and press the enter key. You'll see that the Lucene
-developers are very well mannered and get no results. Now try entering the word "vector".
-That should return a whole bunch of documents. The results will page at every tenth
-result and ask you whether you want more results.
+To <b>search the index</b> type:
+
+<pre>
+java org.apache.lucene.demo.SearchFiles
+</pre>
+
+You'll be prompted for a query. Type in a swear word and press the enter key. You'll see that the
+Lucene developers are very well mannered and get no results. Now try entering the word "vector".
+That should return a whole bunch of documents. The results will page at every tenth result and ask
+you whether you want more results.
 </p>
 </blockquote>
 </p>
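The hunk above says the command-line search results "page at every tenth result". The small Java sketch below illustrates only that paging arithmetic and uses no Lucene classes. The class and method names (`HitPager`, `pages`) are hypothetical. This is not the demo's SearchFiles source, which iterates over a Lucene `Hits` collection.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch, not the demo's SearchFiles source: split a ranked list
// of totalHits results into pages of PAGE_SIZE, as the text describes.
public class HitPager {
    // Page size taken from the text ("page at every tenth result").
    static final int PAGE_SIZE = 10;

    // Returns {start, end} pairs (end exclusive) covering all hits, one per page.
    static List<int[]> pages(int totalHits) {
        List<int[]> result = new ArrayList<>();
        for (int start = 0; start < totalHits; start += PAGE_SIZE) {
            int end = Math.min(start + PAGE_SIZE, totalHits);
            result.add(new int[] {start, end});
        }
        return result;
    }

    public static void main(String[] args) {
        // 23 hits -> three pages: 0-9, 10-19, 20-22
        for (int[] page : pages(23)) {
            System.out.println("hits " + page[0] + " to " + (page[1] - 1));
        }
    }
}
```

With 23 hits the last page is short (hits 20 to 22). With 0 hits there are no pages at all, which matches the "no results" case the text describes for an unmatched query.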
@@ -34,7 +34,7 @@ limitations under the License.
 
 
 
-<title>Apache Lucene - Apache Lucene - Basic Demo Sources Walkthrough</title>
+<title>Apache Lucene - Apache Lucene - Basic Demo Sources Walk-through</title>
 </head>
 
 <body bgcolor="#ffffff" text="#000000" link="#525D76">
@@ -114,9 +114,9 @@ limitations under the License.
 <tr><td>
 <blockquote>
 <p>
-In this section we walk through the sources behind the basic Lucene demo such as where to
-find it, its parts and their function. This section is intended for Java developers
-wishing to understand how to use Apache Lucene in their applications.
+In this section we walk through the sources behind the command-line Lucene demo: where to find them,
+their parts and their function. This section is intended for Java developers wishing to understand
+how to use Lucene in their applications.
 </p>
 </blockquote>
 </p>
@@ -132,14 +132,14 @@ wishing to understand how to use Apache Lucene in their applications.
 <tr><td>
 <blockquote>
 <p>
-Relative to the directory created when you extracted Lucene or retreived it from Subversion, you
-should see a directory called "src" which in turn contains a directory called "demo".
-This is the root for all of the Lucene demos. Under this directory is org/apache/lucene/demo,
-this is where all the Java sources live.
+Relative to the directory created when you extracted Lucene or retrieved it from Subversion, you
+should see a directory called <code>src</code> which in turn contains a directory called
+<code>demo</code>. This is the root for all of the Lucene demos. Under this directory is
+<code>org/apache/lucene/demo</code>. This is where all the Java sources for the demos live.
 </p>
 <p>
-Within this directory you should see the IndexFiles class we executed earlier. Bring that
-up in vi or your alternative text editor and lets take a look at it.
+Within this directory you should see the <code>IndexFiles.java</code> class we executed earlier.
+Bring it up in <code>vi</code> or your editor of choice and let's take a look at it.
 </p>
 </blockquote>
 </p>
@@ -155,43 +155,45 @@ up in vi or your alternative text editor and lets take a look at it.
 <tr><td>
 <blockquote>
 <p>
-As we discussed in the previous walkthrough, the IndexFiles class creates a Lucene Index.
-Lets take a look at how it does this.
+As we discussed in the previous walk-through, the <code><a href="api/org/apache/lucene/demo/IndexFiles.html">IndexFiles</a></code> class creates a Lucene
+Index. Let's take a look at how it does this.
 </p>
 <p>
-The first substantial thing the main function does is instantiate an instance
-of IndexWriter. It passes a string called "index" and a new instance of a class called
-"StandardAnalyzer". The "index" string is the name of the directory that all index information
-should be stored in. Because we're not passing any path information, one must assume this
-will be created as a subdirectory of the current directory (if it does not already exist). On
-some platforms this may actually result in it being created in other directories (such as
-the user's home directory).
+The first substantial thing the <code>main</code> function does is instantiate <code><a href="api/org/apache/lucene/index/IndexWriter.html">IndexWriter</a></code>. It passes the string
+"<code>index</code>" and a new instance of a class called <code><a href="api/org/apache/lucene/analysis/standard/StandardAnalyzer.html">StandardAnalyzer</a></code>.
+The "<code>index</code>" string is the name of the filesystem directory where all index information
+should be stored. Because we're not passing a full path, this will be created as a subdirectory of
+the current working directory (if it does not already exist). On some platforms, it may be created
+in other directories (such as the user's home directory).
 </p>
 <p>
-The <b>IndexWriter</b> is the main class responsible for creating indicies. To use it you
-must instantiate it with a path that it can write the index into, if this path does not
-exist it will create it, otherwise it will refresh the index living at that path. You
-must a also pass an instance of <b>org.apache.lucene.analysis.Analyzer</b>.
+The <code><a href="api/org/apache/lucene/index/IndexWriter.html">IndexWriter</a></code> is the main
+class responsible for creating indices. To use it you must instantiate it with a path that it can
+write the index into. If this path does not exist it will first create it. Otherwise it will
+refresh the index at that path. You can also create an index using one of the subclasses of <code><a href="api/org/apache/lucene/store/Directory.html">Directory</a></code>. In any case, you must also pass an
+instance of <code><a href="api/org/apache/lucene/analysis/Analyzer.html">org.apache.lucene.analysis.Analyzer</a></code>.
 </p>
 <p>
-The <b>Analyzer</b>, in this case, the <b>StandardAnalyzer</b> is little more than a standard Java
-Tokenizer, converting all strings to lowercase and filtering out useless words and characters from the index.
-By useless words and characters I mean common language words such as articles (a, an, the, etc.) and other
-strings that would be useless for searching (e.g. <b>'s</b>) . It should be noted that there are different
-rules for every language, and you should use the proper analyzer for each. Lucene currently
-provides Analyzers for English and German, more can be found in the Lucene Sandbox.
+The particular <code><a href="api/org/apache/lucene/analysis/Analyzer.html">Analyzer</a></code> we
+are using, <code><a href="api/org/apache/lucene/analysis/standard/StandardAnalyzer.html">StandardAnalyzer</a></code>, is
+little more than a standard Java Tokenizer, converting all strings to lowercase and filtering out
+useless words and characters from the index. By useless words and characters I mean common language
+words such as articles (a, an, the, etc.) and other strings that would be useless for searching
+(e.g. <b>'s</b>) . It should be noted that there are different rules for every language, and you
+should use the proper analyzer for each. Lucene currently provides Analyzers for a number of
+different languages (see the <code>*Analyzer.java</code> sources under <a href="http://svn.apache.org/repos/asf/lucene/java/trunk/contrib/analyzers/src/java/org/apache/lucene/analysis/">contrib/analyzers/src/java/org/apache/lucene/analysis</a>).
 </p>
 <p>
-Looking down further in the file, you should see the indexDocs() code. This recursive function
-simply crawls the directories and uses FileDocument to create Document objects. The Document
-is simply a data object to represent the content in the file as well as its creation time and
-location. These instances are added to the indexWriter. Take a look inside FileDocument. It's
-not particularly complicated, it just adds fields to the Document.
+Looking further down in the file, you should see the <code>indexDocs()</code> code. This recursive
+function simply crawls the directories and uses <code><a href="api/org/apache/lucene/demo/FileDocument.html">FileDocument</a></code> to create <code><a href="api/org/apache/lucene/document/Document.html">Document</a></code> objects. The <code><a href="api/org/apache/lucene/document/Document.html">Document</a></code> is simply a data object to
+represent the content in the file as well as its creation time and location. These instances are
+added to the <code>indexWriter</code>. Take a look inside <code><a href="api/org/apache/lucene/demo/FileDocument.html">FileDocument</a></code>. It's not particularly
+complicated. It just adds fields to the <code><a href="api/org/apache/lucene/document/Document.html">Document</a></code>.
 </p>
 <p>
 As you can see there isn't much to creating an index. The devil is in the details. You may also
-wish to examine the other samples in this directory, particularly the IndexHTML class. It is
-a bit more complex but builds upon this example.
+wish to examine the other samples in this directory, particularly the <code><a href="api/org/apache/lucene/demo/IndexHTML.html">IndexHTML</a></code> class. It is a bit more
+complex but builds upon this example.
 </p>
 </blockquote>
 </p>
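The `indexDocs()` paragraph above describes a recursive crawl that turns each file into a `Document` and adds it to the `IndexWriter`. The plain-Java sketch below shows only the recursion and has no Lucene dependency. `CrawlSketch` is a hypothetical name, and recording the file path stands in for the demo's `FileDocument` and `addDocument` step. It illustrates the idea under those assumptions and is not the demo's actual source.

```java
import java.io.File;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical sketch of a recursive directory crawl in the spirit of
// IndexFiles.indexDocs(); no Lucene classes are used.
public class CrawlSketch {
    // Visit every regular file under `file`, depth first.
    static void indexDocs(File file, List<String> visited) {
        if (file.isDirectory()) {
            String[] names = file.list();        // null if the directory can't be read
            if (names != null) {
                Arrays.sort(names);              // deterministic order for the example
                for (String name : names) {
                    indexDocs(new File(file, name), visited);
                }
            }
        } else if (file.isFile()) {
            // Stand-in for building a Document from the file and adding it
            // to the IndexWriter.
            visited.add(file.getPath());
        }
    }

    public static void main(String[] args) {
        List<String> visited = new ArrayList<>();
        indexDocs(new File(args.length > 0 ? args[0] : "."), visited);
        for (String path : visited) {
            System.out.println("would index " + path);
        }
    }
}
```

The `null` check matters because `File.list()` returns `null` rather than throwing when a directory cannot be read.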
@@ -207,12 +209,19 @@ a bit more complex but builds upon this example.
 <tr><td>
 <blockquote>
 <p>
-The SearchFiles class is quite simple. It primarily collaborates with an IndexSearcher, StandardAnalyzer
-(which is used in the IndexFiles class as well) and a QueryParser. The query parser is constructed
-with an analyzer used to interperate your query in the same way the Index was interperated: finding
-the end of words and removing useless words like 'a', 'an' and 'the'. The Query object contains the
-results from the QueryParser which is passed to the searcher. The searcher results are returned in
-a collection of Documents called "Hits" which is then iterated through and displayed to the user.
+The <code><a href="api/org/apache/lucene/demo/SearchFiles.html">SearchFiles</a></code> class is
+quite simple. It primarily collaborates with an <code><a href="api/org/apache/lucene/search/IndexSearcher.html">IndexSearcher</a></code>, <code><a href="api/org/apache/lucene/analysis/standard/StandardAnalyzer.html">StandardAnalyzer</a></code>
+(which is used in the <code><a href="api/org/apache/lucene/demo/IndexFiles.html">IndexFiles</a></code> class as well) and a
+<code><a href="api/org/apache/lucene/queryParser/QueryParser.html">QueryParser</a></code>. The
+query parser is constructed with an analyzer used to interpret your query text in the same way the
+documents are interpreted: finding the end of words and removing useless words like 'a', 'an' and
+'the'. The <code><a href="api/org/apache/lucene/search/Query.html">Query</a></code> object contains
+the results from the <code><a href="api/org/apache/lucene/queryParser/QueryParser.html">QueryParser</a></code> which is passed to
+the searcher. Note that it's also possible to programmatically construct a rich <code><a href="api/org/apache/lucene/search/Query.html">Query</a></code> object without using the query
+parser. The query parser just enables decoding the <a href="queryparsersyntax.html">Lucene query
+syntax</a> into the corresponding <code><a href="api/org/apache/lucene/search/Query.html">Query</a></code> object. The searcher results are
+returned in a collection of Documents called <code><a href="api/org/apache/lucene/search/Hits.html">Hits</a></code> which is then iterated through and
+displayed to the user.
 </p>
 </blockquote>
 </p>
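The SearchFiles and IndexFiles walk-throughs stress one point: the query text goes through the same analyzer as the documents, which lowercases it and drops common words. The toy Java program below illustrates why that matters. It is not Lucene's `StandardAnalyzer`, `QueryParser` or `IndexSearcher`. Several parts are assumptions made for the illustration:

- the tokenizing rule (split on anything that is not a letter or digit);
- the tiny stop-word list;
- the class name `SameAnalyzerSketch`.

Search here is a plain AND over an in-memory inverted index, with no scoring.

```java
import java.util.*;

// Hypothetical toy, NOT Lucene code: documents and queries pass through the
// same analyze() step, so their terms line up in the inverted index.
public class SameAnalyzerSketch {
    // Assumed, deliberately tiny stop-word list for the illustration.
    static final Set<String> STOP_WORDS =
            new HashSet<>(Arrays.asList("a", "an", "the", "and", "of"));

    // Split on non-letter/non-digit runs, lowercase, drop stop words.
    static List<String> analyze(String text) {
        List<String> terms = new ArrayList<>();
        for (String token : text.split("[^\\p{L}\\p{N}]+")) {
            String t = token.toLowerCase(Locale.ROOT);
            if (!t.isEmpty() && !STOP_WORDS.contains(t)) {
                terms.add(t);
            }
        }
        return terms;
    }

    // Inverted index: term -> ids of documents containing that term.
    private final Map<String, Set<Integer>> postings = new HashMap<>();

    void addDocument(int id, String text) {
        for (String term : analyze(text)) {
            postings.computeIfAbsent(term, k -> new TreeSet<>()).add(id);
        }
    }

    // Ids of documents containing every analyzed query term (AND semantics).
    Set<Integer> search(String queryText) {
        Set<Integer> result = null;
        for (String term : analyze(queryText)) {
            Set<Integer> ids = postings.getOrDefault(term, Collections.<Integer>emptySet());
            if (result == null) {
                result = new TreeSet<>(ids);
            } else {
                result.retainAll(ids);
            }
        }
        return result == null ? Collections.<Integer>emptySet() : result;
    }

    public static void main(String[] args) {
        SameAnalyzerSketch index = new SameAnalyzerSketch();
        index.addDocument(1, "The Vector class stores term vectors.");
        index.addDocument(2, "An index of the Lucene source code.");
        System.out.println(analyze("The Vector class")); // prints [vector, class]
        System.out.println(index.search("the VECTOR"));   // prints [1]
    }
}
```

Because `analyze()` lowercases and drops "the" on both sides, the query `the VECTOR` matches document 1. A query made only of stop words produces no terms and therefore no results.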
@@ -114,11 +114,10 @@ limitations under the License.
 <tr><td>
 <blockquote>
 <p>
-This document is intended as a "getting started" guide to installing and running the
-Apache Lucene web application demo. This guide assumes that you have read the
-information in the previous two examples or already know it anyhow. We'll use
-Tomcat 4.0.1 as our reference web container. These demos should work with nearly
-any container, but it is up to you to adapt them appropriately.
+This document is intended as a "getting started" guide to installing and running the Lucene
+web application demo. This guide assumes that you have read the information in the previous two
+examples. We'll use Tomcat as our reference web container. These demos should work with nearly any
+container, but you may have to adapt them appropriately.
 </p>
 </blockquote>
 </p>
@@ -134,12 +133,11 @@ any container, but it is up to you to adapt them appropriately.
 <tr><td>
 <blockquote>
 <p>
-The Lucene Web Application demo is a template web application intended for deployment
-on Tomcat or a similar web container. It's NOT designed as a "best practices"
-implementation by ANY means. It's more of a "hello world" type Lucene Web App.
-The purpose of this application is to demonstrate Lucene. With that being said,
-it should be relatively simple to create a small searchable website in Tomcat or
-a similar application server.
+The Lucene Web Application demo is a template web application intended for deployment on Tomcat or a
+similar web container. It's NOT designed as a "best practices" implementation by ANY means. It's
+more of a "hello world" type Lucene Web App. The purpose of this application is to demonstrate
+Lucene. With that being said, it should be relatively simple to create a small searchable website
+in Tomcat or a similar application server.
 </p>
 </blockquote>
 </p>
@@ -154,18 +152,19 @@ a similar application server.
 </td></tr>
 <tr><td>
 <blockquote>
-<p>
-Once you've gotten this far you're probably itching to go.
-Let's start by creating the index you'll need for the web examples.
-Since you've already set your classpath in the previous examples,
-all you need to do is type
-<b> "java org.apache.lucene.demo.IndexHTML -create -index {index-dir} .."</b>.
-You'll need to do this from a (any) subdirectory of your {tomcat}/webapps directory (make sure you didn't leave off the ".." or you'll get a null pointer
-exception).
-{index-dir}
-should be a directory that Tomcat has permission to read and write, but is
-outside of a web accessible context. By default the webapp is configured
-to look in <b>/opt/lucene/index</b> for this index.
+<p> Once you've gotten this far you're probably itching to go. Let's start by creating the index
+you'll need for the web examples. Since you've already set your CLASSPATH in the previous examples,
+all you need to do is type:
+
+<pre>
+java org.apache.lucene.demo.IndexHTML -create -index {index-dir} ..
+</pre>
+
+You'll need to do this from a (any) subdirectory of your <code>{tomcat}/webapps</code> directory
+(make sure you didn't leave off the <code>..</code> or you'll get a null pointer exception).
+<code>{index-dir}</code> should be a directory that Tomcat has permission to read and write, but is
+outside of a web accessible context. By default the webapp is configured to look in
+<code>/opt/lucene/index</code> for this index.
 </p>
 </blockquote>
 </p>
@@ -180,10 +179,10 @@ to look in <b>/opt/lucene/index</b> for this index.
 </td></tr>
 <tr><td>
 <blockquote>
-<p>Located in your distribution directory you should see
-a war file called luceneweb.war. Copy this to your
-{tomcat-home}/webapps directory. You may need to restart
-Tomcat. </p>
+<p>Located in your distribution directory you should see a war file called
+<code>luceneweb.war</code>. If you're working with a Subversion checkout, this will be under the
+<code>build</code> subdirectory. Copy this to your <code>{tomcat-home}/webapps</code> directory.
+You may need to restart Tomcat. </p>
 </blockquote>
 </p>
 </td></tr>
@@ -197,15 +196,15 @@ Tomcat. </p>
 </td></tr>
 <tr><td>
 <blockquote>
-<p>
-From your Tomcat directory look in the webapps/luceneweb subdirectory. If it's not
-present, try browsing to "http://localhost:8080/luceneweb" then look again.
-Edit a file called configuration.jsp. Ensure that the indexLocation is equal to the
-location you used for your index. You may also customize the appTitle and appFooter
-strings as you see fit. Once you have finished altering the configuration you should
-restart Tomcat. You may also wish to update the war file by typing
-<b>jar -uf luceneweb.war configuration.jsp</b> from the luceneweb subdirectory.
-(The -u option is not available in all versions of jar. In this case recreate the war file).
+<p> From your Tomcat directory look in the <code>webapps/luceneweb</code> subdirectory. If it's not
+present, try browsing to <code>http://localhost:8080/luceneweb</code> (which causes Tomcat to deploy
+the webapp), then look again. Edit a file called <code>configuration.jsp</code>. Ensure that the
+<code>indexLocation</code> is equal to the location you used for your index. You may also customize
+the <code>appTitle</code> and <code>appFooter</code> strings as you see fit. Once you have finished
+altering the configuration you may need to restart Tomcat. You may also wish to update the war file
+by typing <code>jar -uf luceneweb.war configuration.jsp</code> from the <code>luceneweb</code>
+subdirectory. (The -u option is not available in all versions of jar. In this case recreate the
+war file).
 </p>
 </blockquote>
 </p>
@@ -220,14 +219,15 @@ restart Tomcat. You may also wish to update the war file by typing
 </td></tr>
 <tr><td>
 <blockquote>
-<p>Now you're ready to roll. In your browser set the url to "http://localhost:8080/luceneweb"
-enter "test" and the number of items per page and press search.</p>
-<p>You should now be looking either at a number of results (provided you didn't erase the
-Tomcat examples) or nothing. Try other search terms. Depending on the number of items
-per page you set and results returned, there may be a link at the bottom that says "more results>>",
-clicking it goes to subsequent pages. If you get an error regarding opening the index, then you
-probably set the path in "configuration" incorrectly or Tomcat doesn't have permissions to the
-index (or you skipped the step of creating it).</p>
+<p>Now you're ready to roll. In your browser set the url to
+<code>http://localhost:8080/luceneweb</code> enter <code>test</code> and the number of items per
+page and press search.</p>
+<p>You should now be looking either at a number of results (provided you didn't erase the Tomcat
+examples) or nothing. If you get an error regarding opening the index, then you probably set the
+path in <code>configuration.jsp</code> incorrectly or Tomcat doesn't have permissions to the index
+(or you skipped the step of creating it). Try other search terms. Depending on the number of items
+per page you set and results returned, there may be a link at the bottom that says <b>More
+Results>></b>; clicking it takes you to subsequent pages. </p>
 </blockquote>
 </p>
 </td></tr>
@@ -242,8 +242,7 @@ index (or you skipped the step of creating it).</p>
 <tr><td>
 <blockquote>
 <p>
-If you want to know more about how this web app works or how to customize it then
-<a href="demo4.html">read on>>></a>.
+If you want to know more about how this web app works or how to customize it then <a href="demo4.html">read on>>></a>.
 </p>
 </blockquote>
 </p>
143
docs/demo4.html
@@ -114,10 +114,10 @@ limitations under the License.
 <tr><td>
 <blockquote>
 <p>
-In this section we walk through the sources behind the basic Lucene Web Application demo.
-Where to find it, its parts, and their function. This section is intended for Java developers
-wishing to understand how to use Apache Lucene in their applications or for those involved
-in deploying web applications based on Lucene.
+In this section we walk through the sources behind the basic Lucene Web Application demo: where to
+find them, their parts and their function. This section is intended for Java developers wishing to
+understand how to use Lucene in their applications or for those involved in deploying web
+applications based on Lucene.
 </p>
 </blockquote>
 </p>
@@ -133,13 +133,13 @@ in deploying web applications based on Lucene.
 <tr><td>
 <blockquote>
 <p>
-Relative the directory created when you extracted Lucene or retreived it from Subversion, you
-should see a directory called "src" which in turn contains a directory called "jsp".
-This is the root for all of the Lucene web demo.
+Relative to the directory created when you extracted Lucene or retrieved it from Subversion, you
+should see a directory called <code>src</code> which in turn contains a directory called
+<code>jsp</code>. This is the root for all of the Lucene web demo.
 </p>
 <p>
-Within this directory you should see the index.jsp class. Bring this up in vi or your
-editor of choice.
+Within this directory you should see <code>index.jsp</code>. Bring this up in vi or your editor of
+choice.
 </p>
 </blockquote>
 </p>
@@ -156,13 +156,11 @@ editor of choice.
 <blockquote>
 <p>
 This jsp page is pretty boring by itself. All it does is include a header, display a form and
-include a footer. If you look at the form, it has two fields: query (where you enter your
-search criteria) and maxresults where you specify the number of results per page. If you look
-at the form tag, you'll notice it uses the get method as opposed to the post. While this is
-considered deprecated functionality by the latest w3c specs, its unlikely to go away due to the
-usefulness of being able to bookmark things like searches. By the structure of this JSP it should
-be easy to customize it without even editing this particular file. You could simply change the
-header and footer. Let's look at the header.jsp (located in the same directory) next.
+include a footer. If you look at the form, it has two fields: <code>query</code> (where you enter
+your search criteria) and <code>maxresults</code> where you specify the number of results per page.
+By the structure of this JSP it should be easy to customize it without even editing this particular
+file. You could simply change the header and footer. Let's look at the <code>header.jsp</code>
+(located in the same directory) next.
 </p>
 </blockquote>
 </p>
@@ -178,11 +176,11 @@ header and footer. Let's look at the header.jsp (located in the same directory)
 <tr><td>
 <blockquote>
 <p>
-The header is also very simple by itself. The only thing it does is include the configuration.jsp
-(which you looked at in the last section of this guide) and set the title and a brief header. This
-would be a good place to put your own custom HTML to "pretty" things up a bit. We won't cover the
-footer because all it does is display the footer and close your tags. Let's look at the results.jsp,
-the meat of this application next.
+The header is also very simple by itself. The only thing it does is include the
+<code>configuration.jsp</code> (which you looked at in the last section of this guide) and set the
+title and a brief header. This would be a good place to put your own custom HTML to "pretty" things
+up a bit. We won't cover the footer because all it does is display the footer and close your tags.
+Let's look at the <code>results.jsp</code>, the meat of this application, next.
 </p>
 </blockquote>
 </p>
@@ -198,43 +196,52 @@ the meat of this application next.
|
||||||
<tr><td>
|
<tr><td>
|
||||||
<blockquote>
|
<blockquote>
|
||||||
<p>
|
<p>
|
||||||
The results.jsp had a lot more functionality. Much of it is for paging the search results we'll not
|
Most of the functionality lies in <code>results.jsp</code>. Much of it is for paging the search
|
||||||
cover this as it's commented well enough. It does not perform any optimizations such as caching results,
|
results, which we'll not cover here as it's commented well enough. The first thing in this page is
|
||||||
etc. as that would make this a more complex example. The first thing in this page is the actual imports
|
the actual imports for the Lucene classes and Lucene demo classes. These classes are loaded from
|
||||||
for the Lucene classes and Lucene demo classes. These classes are loaded from the jars included in the
|
the jars included in the <code>WEB-INF/lib</code> directory in the <code>luceneweb.war</code> file.
|
||||||
WEB-INF/lib directory in the final war file.
|
|
||||||
</p>
|
</p>
|
||||||
<p>
|
<p>
|
||||||
You'll notice that this file includes the same header and footer as the "index.jsp". From there the jsp
|
You'll notice that this file includes the same header and footer as <code>index.jsp</code>. From
|
||||||
constructs an IndexSearcher with the "indexLocation" that was specified in the "configuration.jsp". If there
|
there it constructs an <code><a href="api/org/apache/lucene/search/IndexSearcher.html">IndexSearcher</a></code> with the
|
||||||
is an error of any kind in opening the index, it is diplayed to the user and a boolean flag is set to tell
|
<code>indexLocation</code> that was specified in <code>configuration.jsp</code>. If there is an
|
||||||
the rest of the sections of the jsp not to continue.
|
error of any kind in opening the index, it is displayed to the user and the boolean flag
|
||||||
|
<code>error</code> is set to tell the rest of the sections of the jsp not to continue.
|
||||||
</p>
|
</p>
|
||||||
<p>
|
<p>
|
||||||
From there, this jsp attempts to get the search criteria, the start index (used for paging) and the maximum
|
From there, this jsp attempts to get the search criteria, the start index (used for paging) and the
|
||||||
number of results per page. If the maximum results per page is not set or not valid then it and the
|
maximum number of results per page. If the maximum results per page is not set or not valid then it
|
||||||
start index are set to default values. If only the start index is invalid it is set to a default value. If
|
and the start index are set to default values. If only the start index is invalid it is set to a
|
||||||
the criteria isn't provided then a servlet error is thrown (it is assumed that this is the result of url tampering
|
default value. If the criteria isn't provided then a servlet error is thrown (it is assumed that
|
||||||
or some form of browser malfunction).
|
this is the result of url tampering or some form of browser malfunction).
|
||||||
</p>
|
</p>
|
||||||
<p>
|
<p>
|
||||||
The jsp moves on to construct a StandardAnalyzer just as in the simple demo, to analyze the search critieria, it
|
The jsp moves on to construct a <code><a href="api/org/apache/lucene/analysis/standard/StandardAnalyzer.html">StandardAnalyzer</a></code> to
|
||||||
is passed to the QueryParser along with the criteria to construct a Query object. You'll also notice the
|
analyze the search text. This matches the analyzer used during indexing (<code><a href="api/org/apache/lucene/demo/IndexHTML.html">IndexHTML</a></code>), which is generally
|
||||||
string literal "contents" included. This is to specify the search should include the contents and not
|
recommended. This is passed to the <code><a href="api/org/apache/lucene/queryParser/QueryParser.html">QueryParser</a></code> along with the
|
||||||
the title, url or some other field in the indexed documents. If there is any error in constructing a Query
|
criteria to construct a <code><a href="api/org/apache/lucene/search/Query.html">Query</a></code>
|
||||||
object an error is displayed to the user.
|
object. You'll also notice the string literal <code>"contents"</code> included. This specifies
|
||||||
|
that the search should cover the <code>contents</code> field and not the <code>title</code>,
|
||||||
|
<code>url</code> or some other field in the indexed documents. If there is any error in
|
||||||
|
constructing a <code><a href="api/org/apache/lucene/search/Query.html">Query</a></code> object an
|
||||||
|
error is displayed to the user.
|
||||||
</p>
|
</p>
|
||||||
<p>
|
<p>
|
||||||
In the next section of the jsp the IndexSearcher is asked to search given the query object. The results are
|
In the next section of the jsp the <code><a href="api/org/apache/lucene/search/IndexSearcher.html">IndexSearcher</a></code> is asked to search
|
||||||
returned in a collection called "hits". If the length property of the hits collection is 0 then an error
|
given the query object. The results are returned in a collection called <code>hits</code>. If the
|
||||||
is displayed to the user and the error flag is set.
|
length property of the <code>hits</code> collection is 0 (meaning there were no results) then an
|
||||||
|
error is displayed to the user and the error flag is set.
|
||||||
</p>
|
</p>
|
||||||
<p>
|
<p>
|
||||||
Finally the jsp iterates through the hits collection and displayed properties of the "Document" objects we talked
|
Finally the jsp iterates through the <code>hits</code> collection, taking the current page into
|
||||||
about in the first walkthrough. These objects contain "known" fields specific to their indexer (in this case
|
account, and displays properties of the <code><a href="api/org/apache/lucene/document/Document.html">Document</a></code> objects we talked about in
|
||||||
"IndexHTML" constructs a document with "url", "title" and "contents"). You'll notice that these results are paged
|
the first walkthrough. These objects contain "known" fields specific to their indexer (in this case
|
||||||
but the search is repeated every time. This is an area where optimization could improve performance for large
|
<code><a href="api/org/apache/lucene/demo/IndexHTML.html">IndexHTML</a></code> constructs a document
|
||||||
result sets.
|
with "url", "title" and "contents").
|
||||||
|
</p>
|
||||||
|
<p>
|
||||||
|
Please note that in a real deployment of Lucene, it's best to instantiate <code><a href="api/org/apache/lucene/search/IndexSearcher.html">IndexSearcher</a></code> and <code><a href="api/org/apache/lucene/queryParser/QueryParser.html">QueryParser</a></code> once, and then
|
||||||
|
share them across search requests, instead of re-instantiating per search request.
|
||||||
</p>
|
</p>
|
||||||
</blockquote>
|
</blockquote>
|
||||||
</p>
|
</p>
|
||||||
|
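The results.jsp walkthrough above describes how the page validates its request parameters and then shows one page of hits. The following is a minimal plain-Java sketch of those rules. It has no servlet or Lucene dependencies and is not the actual JSP code. The class name `PagingParams`, its methods, and the defaults (start 0, 10 results per page) are hypothetical. The `IllegalArgumentException` stands in for the servlet error the JSP throws when the criteria are missing.

```java
/** Hypothetical helper mirroring the paging rules described for results.jsp. */
final class PagingParams {
    // Assumed defaults for illustration; the real JSP may use other values.
    static final int DEFAULT_START = 0;
    static final int DEFAULT_MAX = 10;

    final String criteria;
    final int start;
    final int maxPerPage;

    private PagingParams(String criteria, int start, int maxPerPage) {
        this.criteria = criteria;
        this.start = start;
        this.maxPerPage = maxPerPage;
    }

    /** Parses request-style string parameters; any of them may be null. */
    static PagingParams parse(String criteria, String startParam, String maxParam) {
        if (criteria == null || criteria.trim().isEmpty()) {
            // Stand-in for the servlet error: missing criteria suggests URL tampering.
            throw new IllegalArgumentException("no search criteria supplied");
        }
        Integer max = parseAtLeast(maxParam, 1);
        if (max == null) {
            // Max per page missing or invalid: both values fall back to defaults.
            return new PagingParams(criteria, DEFAULT_START, DEFAULT_MAX);
        }
        Integer start = parseAtLeast(startParam, 0);
        // Only the start index is invalid: reset just that value.
        return new PagingParams(criteria, start == null ? DEFAULT_START : start, max);
    }

    /** Returns the parsed int if it is at least min, otherwise null. */
    private static Integer parseAtLeast(String s, int min) {
        if (s == null) return null;
        try {
            int v = Integer.parseInt(s.trim());
            return v >= min ? v : null;
        } catch (NumberFormatException e) {
            return null;
        }
    }

    /** Exclusive end index of the current page, clamped to the total hit count. */
    int pageEnd(int totalHits) {
        return Math.min(start + maxPerPage, totalHits);
    }
}
```

A page would then display hits `start` through `pageEnd(total) - 1`. Because the search is re-run for every page, only this slice changes between requests.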
@ -250,10 +257,11 @@ result sets.
|
||||||
<tr><td>
|
<tr><td>
|
||||||
<blockquote>
|
<blockquote>
|
||||||
<p>
|
<p>
|
||||||
There are additional sources used by the web app that were not specifically covered by either walkthrough. For
|
There are additional sources used by the web app that were not specifically covered by either
|
||||||
example the HTML parser, the IndexHTML class and HTMLDocument class. These are very similar to the classes
|
walkthrough. For example the HTML parser, the <code><a href="api/org/apache/lucene/demo/IndexHTML.html">IndexHTML</a></code> class and <code><a href="api/org/apache/lucene/demo/HTMLDocument.html">HTMLDocument</a></code> class. These are very
|
||||||
covered in the first example, however they have properties sepecific to parsing and indexing HTML. This is
|
similar to the classes covered in the first example, with properties specific to parsing and
|
||||||
beyond our scope; however, by now you should feel like you're "getting started" with Lucene.
|
indexing HTML. This is beyond our scope; however, by now you should feel like you're "getting
|
||||||
|
started" with Lucene.
|
||||||
</p>
|
</p>
|
||||||
</blockquote>
|
</blockquote>
|
||||||
</p>
|
</p>
|
||||||
|
@ -263,7 +271,7 @@ beyond our scope; however, by now you should feel like you're "getting started"
|
||||||
<table border="0" cellspacing="0" cellpadding="2" width="100%">
|
<table border="0" cellspacing="0" cellpadding="2" width="100%">
|
||||||
<tr><td bgcolor="#525D76">
|
<tr><td bgcolor="#525D76">
|
||||||
<font color="#ffffff" face="arial,helvetica,sanserif">
|
<font color="#ffffff" face="arial,helvetica,sanserif">
|
||||||
<a name="Where to go from here? (Everyone!)"><strong>Where to go from here? (Everyone!)</strong></a>
|
<a name="Where to go from here? (everyone!)"><strong>Where to go from here? (everyone!)</strong></a>
|
||||||
</font>
|
</font>
|
||||||
</td></tr>
|
</td></tr>
|
||||||
<tr><td>
|
<tr><td>
|
||||||
|
@ -271,16 +279,18 @@ beyond our scope; however, by now you should feel like you're "getting started"
|
||||||
<p>
|
<p>
|
||||||
There are a number of things this demo doesn't do or doesn't do quite right. For instance, you may
|
There are a number of things this demo doesn't do or doesn't do quite right. For instance, you may
|
||||||
have noticed that documents in the root context are unreachable (unless you reconfigure Tomcat to
|
have noticed that documents in the root context are unreachable (unless you reconfigure Tomcat to
|
||||||
support that context or redirect to it), anywhere where the directory doesn't quite match the context mapping,
|
support that context or redirect to it), anywhere where the directory doesn't quite match the
|
||||||
you'll have a broken link in your results. If you want to index non-local files or have some other
|
context mapping, you'll have a broken link in your results. If you want to index non-local files or
|
||||||
needs this isn't supported, plus there may be security issues with running the indexing application from
|
have some other needs this isn't supported, plus there may be security issues with running the
|
||||||
your webapps directory. There are a number of things left for you the implementor or developer to do.
|
indexing application from your webapps directory. There are a number of things left for you the
|
||||||
|
developer to do.
|
||||||
</p>
|
</p>
|
||||||
<p>
|
<p>
|
||||||
In time some of these things may be added to Lucene as features (if you've got a good idea we'd love to hear it!),
|
In time some of these things may be added to Lucene as features (if you've got a good idea we'd love
|
||||||
but for now: this is where you begin and the search engine/indexer ends. Lastly, one would assume you'd
|
to hear it!), but for now: this is where you begin and the search engine/indexer ends. Lastly, one
|
||||||
want to follow the above advice and customize the application to look a little more fancy than black on
|
would assume you'd want to follow the above advice and customize the application to look a little
|
||||||
white with "Lucene Template" at the top. We'll see you on the Lucene Users' or Developers' mailing lists!
|
more fancy than black on white with "Lucene Template" at the top. We'll see you on the Lucene
|
||||||
|
Users' or Developers' <a href="mailinglists.html">mailing lists</a>!
|
||||||
</p>
|
</p>
|
||||||
</blockquote>
|
</blockquote>
|
||||||
</p>
|
</p>
|
||||||
|
@ -296,11 +306,12 @@ white with "Lucene Template" at the top. We'll see you on the Lucene Users' or
|
||||||
<tr><td>
|
<tr><td>
|
||||||
<blockquote>
|
<blockquote>
|
||||||
<p>
|
<p>
|
||||||
Please resist the urge to contact the authors of this document (without bribes of fame and fortune attached). First
|
Please resist the urge to contact the authors of this document (without bribes of fame and fortune
|
||||||
contact the <a href="http://lucene.apache.org/java/docs/mailinglists.html">mailing lists</a>. That being said feedback,
|
attached). First contact the <a href="mailinglists.html">mailing lists</a>, taking care to <a href="http://www.catb.org/~esr/faqs/smart-questions.html">Ask Questions The Smart Way</a>.
|
||||||
and modifications to this document and samples are ever so greatly appreciated. They are just best sent to the
|
Certainly you'll get the most help that way as well. That being said, feedback, and modifications
|
||||||
lists so that everyone can share in them. Certainly you'll get the most help there as well.
|
to this document and samples are ever so greatly appreciated. They are just best sent to the lists
|
||||||
Thanks for understanding.
|
or <a href="http://wiki.apache.org/jakarta-lucene/HowToContribute">posted as patches</a>, so that
|
||||||
|
everyone can share in them. Thanks for understanding!
|
||||||
</p>
|
</p>
|
||||||
</blockquote>
|
</blockquote>
|
||||||
</p>
|
</p>
|
||||||
|
|
|
@ -114,40 +114,38 @@ limitations under the License.
|
||||||
<tr><td>
|
<tr><td>
|
||||||
<blockquote>
|
<blockquote>
|
||||||
<p>
|
<p>
|
||||||
This document is intended as a "getting started" guide. It has three basic
|
This document is intended as a "getting started" guide. It has three audiences: first-time users
|
||||||
audiences: novices looking to install Apache Lucene on their application or
|
looking to install Apache Lucene in their application or web server; developers looking to modify or base
|
||||||
web server, developers looking to modify or base the applications they develop
|
the applications they develop on Lucene; and developers looking to become involved in and contribute
|
||||||
on Lucene, and developers looking to become involved in and contribute to the
|
to the development of Lucene. This document is written in tutorial and walk-through format. The
|
||||||
development of Lucene. This document is written in tutorial and walkthrough
|
goal is to help you "get started". It does not go into great depth on some of the conceptual or
|
||||||
format. It intends to help you in "getting started", but does not go into great
|
inner details of Lucene.
|
||||||
depth into some of the conceptual or inner details of Apache Lucene.
|
|
||||||
</p>
|
</p>
|
||||||
<p>
|
<p>
|
||||||
Each section listed below builds on one another. That being said more advanced users may
|
Each section listed below builds on one another. More advanced users
|
||||||
wish to skip sections.
|
may wish to skip sections.
|
||||||
</p>
|
</p>
|
||||||
<ul>
|
<ul>
|
||||||
<li><a href="demo.html">About the basic Lucene demo and its usage</a>.
|
<li><a href="demo.html">About the command-line Lucene demo and its usage</a>. This section
|
||||||
This section is intended for anyone who wants a basic background on using the provided Lucene demos.</li>
|
is intended for anyone who wants to use the command-line Lucene demo.</li> <p />
|
||||||
|
|
||||||
<li><a href="demo2.html">About the sources and implementation
|
<li><a href="demo2.html">About the sources and implementation for the command-line Lucene
|
||||||
for the basic Lucene demo</a> section we walk through . This section is intended for developers.</li>
|
demo</a>. This section walks through the implementation details (sources) of the
|
||||||
|
command-line Lucene demo. This section is intended for developers.</li> <p />
|
||||||
|
|
||||||
<li><a href="demo3.html">About installing
|
<li><a href="demo3.html">About installing and configuring the demo template web
|
||||||
and configuring the template web application</a>. While this walkthrough assumes
|
application</a>. While this walk-through assumes Tomcat as your container of choice,
|
||||||
Tomcat 4.0.x as your container of choice, there is no reason you can't (provided you have
|
there is no reason you can't (provided you have the requisite knowledge) adapt the
|
||||||
the requisite knowledge) adapt the instructions to your container. This section is intended
|
instructions to your container. This section is intended for those responsible for the
|
||||||
for those responsible for the development or deployment of Lucene-based web applications.</li>
|
development or deployment of Lucene-based web applications.</li> <p />
|
||||||
|
|
||||||
|
<li><a href="demo4.html">About the sources used to construct the demo template web
|
||||||
|
application</a>. Please note the template application is designed to highlight features of
|
||||||
|
Lucene and is <b>not</b> an example of best practices. (One would hopefully use MVC
|
||||||
|
architecture such as provided by Jakarta Struts and taglibs, but showing you how to do that
|
||||||
|
would be WAY beyond the scope of this guide.) This section is intended for developers and
|
||||||
|
those wishing to customize the demo template web application to their needs. </li>
|
||||||
|
|
||||||
<li><a href="demo4.html">About the sources used to construct the
|
|
||||||
template web application</a>. Please note the template application is designed to highlight
|
|
||||||
features of Lucene and is <b>not</b> an example of best practices. (One would hopefully
|
|
||||||
use MVC architecture such as provided by Jakarta Struts and taglibs, or better yet XML
|
|
||||||
with stylesheets, but showing you how to do that would be WAY beyond the scope of this
|
|
||||||
demonstration. Additionally one could cache results, and perform other performance
|
|
||||||
optimizations, but those are beyond the scope of this demo).
|
|
||||||
This section is intended for developers and those wishing to customize the template web
|
|
||||||
application to their needs. The sections useful to developers only are clearly delineated.</li>
|
|
||||||
</ul>
|
</ul>
|
||||||
</blockquote>
|
</blockquote>
|
||||||
</p>
|
</p>
|
||||||
|
|
|
@ -1,4 +1,4 @@
|
||||||
<%@ page import = " javax.servlet.*, javax.servlet.http.*, java.io.*, org.apache.lucene.analysis.*, org.apache.lucene.document.*, org.apache.lucene.index.*, org.apache.lucene.search.*, org.apache.lucene.queryParser.*, org.apache.lucene.demo.*, org.apache.lucene.demo.html.Entities, java.net.URLEncoder" %>
|
<%@ page import = " javax.servlet.*, javax.servlet.http.*, java.io.*, org.apache.lucene.analysis.*, org.apache.lucene.analysis.standard.StandardAnalyzer, org.apache.lucene.document.*, org.apache.lucene.index.*, org.apache.lucene.search.*, org.apache.lucene.queryParser.*, org.apache.lucene.demo.*, org.apache.lucene.demo.html.Entities, java.net.URLEncoder" %>
|
||||||
|
|
||||||
<%
|
<%
|
||||||
/*
|
/*
|
||||||
|
@ -76,7 +76,7 @@ public String escapeHTML(String s) {
|
||||||
//query string so you get the
|
//query string so you get the
|
||||||
//treatment
|
//treatment
|
||||||
|
|
||||||
Analyzer analyzer = new StopAnalyzer(); //construct our usual analyzer
|
Analyzer analyzer = new StandardAnalyzer(); //construct our usual analyzer
|
||||||
try {
|
try {
|
||||||
QueryParser qp = new QueryParser("contents", analyzer);
|
QueryParser qp = new QueryParser("contents", analyzer);
|
||||||
query = qp.parse(queryString); //parse the
|
query = qp.parse(queryString); //parse the
|
||||||
|
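The hunk above replaces `StopAnalyzer` with `StandardAnalyzer` so that queries are analyzed the same way `IndexHTML` analyzed the documents. The plain-Java toy below illustrates why matching analysis matters: lowercasing and stop-word removal must happen on both sides for terms to line up. It is not Lucene's `StandardAnalyzer`, which uses a grammar-based tokenizer and its own stop list. The class `ToyAnalyzer` and its small stop-word set are hypothetical.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Locale;
import java.util.Set;

/** Toy analyzer: split, lowercase, drop a few stop words (illustration only). */
final class ToyAnalyzer {
    // Tiny illustrative stop list (hypothetical; Lucene's list differs).
    private static final Set<String> STOP_WORDS =
        new HashSet<String>(Arrays.asList("a", "an", "and", "the", "of", "to"));

    static List<String> analyze(String text) {
        List<String> tokens = new ArrayList<String>();
        // Split on runs of characters that are not ASCII letters or digits.
        for (String raw : text.split("[^\\p{Alnum}]+")) {
            if (raw.isEmpty()) continue; // split may yield a leading empty string
            String token = raw.toLowerCase(Locale.ROOT);
            if (!STOP_WORDS.contains(token)) tokens.add(token);
        }
        return tokens;
    }

    /** True if any analyzed query token also appears among the analyzed document tokens. */
    static boolean matches(String document, String query) {
        Set<String> docTokens = new HashSet<String>(analyze(document));
        for (String q : analyze(query)) {
            if (docTokens.contains(q)) return true;
        }
        return false;
    }
}
```

Here a query for `VECTOR` matches a document containing `Vector` only because both strings pass through the same lowercasing step. A query consisting solely of a stop word such as `the` analyzes to nothing and matches no document.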
@ -126,8 +126,11 @@ public String escapeHTML(String s) {
|
||||||
<%
|
<%
|
||||||
Document doc = hits.doc(i); //get the next document
|
Document doc = hits.doc(i); //get the next document
|
||||||
String doctitle = doc.get("title"); //get its title
|
String doctitle = doc.get("title"); //get its title
|
||||||
String url = doc.get("url"); //get its url field
|
String url = doc.get("path"); //get its path field
|
||||||
if ((doctitle == null) || doctitle.equals("")) //use the url if it has no title
|
if (url != null && url.startsWith("../webapps/")) { // strip off ../webapps prefix if present
|
||||||
|
url = url.substring(10);
|
||||||
|
}
|
||||||
|
if ((doctitle == null) || doctitle.equals("")) //use the path if it has no title
|
||||||
doctitle = url;
|
doctitle = url;
|
||||||
//then output!
|
//then output!
|
||||||
%>
|
%>
|
||||||
|
|
|
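The results.jsp hunk above now reads the `path` field, strips a leading `../webapps` when present, and falls back to that URL when a document has no title. The helper below isolates that logic in plain Java so the string arithmetic is easy to check. `"../webapps/"` is 11 characters long, so `substring(10)` keeps the slash after `webapps` and yields a root-relative URL. The class `HitLink` and its method names are hypothetical. As the JSP's check implies, the sketch assumes some stored paths begin with `../webapps/` because of where the indexer was run.

```java
/** Hypothetical helper mirroring the path/title handling in results.jsp. */
final class HitLink {
    private static final String WEBAPPS_PREFIX = "../webapps/";

    /** Strips a leading "../webapps" so the stored path becomes a root-relative URL. */
    static String toUrl(String path) {
        if (path != null && path.startsWith(WEBAPPS_PREFIX)) {
            // Prefix length minus one (i.e. 10, as in the JSP) keeps the slash after "webapps".
            return path.substring(WEBAPPS_PREFIX.length() - 1);
        }
        return path;
    }

    /** Uses the URL as the displayed title when the document has none. */
    static String displayTitle(String title, String url) {
        return (title == null || title.equals("")) ? url : title;
    }
}
```

For example, a stored path of `../webapps/luceneweb/index.jsp` becomes `/luceneweb/index.jsp`, while paths without the prefix pass through unchanged.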
@ -8,49 +8,58 @@
|
||||||
|
|
||||||
<section name="About this Document">
|
<section name="About this Document">
|
||||||
<p>
|
<p>
|
||||||
This document is intended as a "getting started" guide to using and running the
|
This document is intended as a "getting started" guide to using and running the Lucene demos.
|
||||||
Apache Lucene demos. It walks you through some basic installation and configuration.
|
It walks you through some basic installation and configuration.
|
||||||
</p>
|
</p>
|
||||||
</section>
|
</section>
|
||||||
|
|
||||||
|
|
||||||
<section name="About the Demos">
|
<section name="About the Demos">
|
||||||
<p>
|
<p>
|
||||||
The Lucene Demo code is a set of command line example applications that demonstrate various
|
The Lucene command-line demo code consists of two applications that demonstrate various
|
||||||
functionality of Lucene and how one should go about adding it to their
|
functionalities of Lucene and how one should go about adding Lucene to their applications.
|
||||||
applications.
|
|
||||||
</p>
|
</p>
|
||||||
</section>
|
</section>
|
||||||
|
|
||||||
<section name="Setting your classpath">
|
<section name="Setting your CLASSPATH">
|
||||||
<p>
|
<p>
|
||||||
First, extract the latest Lucene distribution.
|
First, you should <a href="http://www.apache.org/dyn/closer.cgi/lucene/java/">download</a> the
|
||||||
|
latest Lucene distribution and then extract it to a working directory. Alternatively, you can <a
|
||||||
|
href="http://wiki.apache.org/jakarta-lucene/SourceRepository">check out the sources from
|
||||||
|
Subversion</a>, and then run <code>ant war-demo</code> to generate the JARs and WARs.
|
||||||
</p>
|
</p>
|
||||||
<p>
|
<p>
|
||||||
You should see the Apache Lucene jar file in the directory you created
|
You should see the Lucene JAR file in the directory you created when you extracted the archive. It
|
||||||
when you extracted the archive. It should be named something like
|
should be named something like <code>lucene-core-{version}.jar</code>. You should also see a file
|
||||||
<b>lucene-{version}.jar</b>.
|
called <code>lucene-demos-{version}.jar</code>. If you checked out the sources from Subversion then
|
||||||
</p>
|
the JARs are located under the <code>build</code> subdirectory (after running <code>ant</code>
|
||||||
<p>
|
successfully). Put both of these files in your Java CLASSPATH.
|
||||||
You should also see a file called called <b>lucene-demos-{version}.jar</b>.
|
|
||||||
Put both of these files in your Java CLASSPATH.
|
|
||||||
</p>
|
</p>
|
||||||
</section>
|
</section>
|
||||||
|
|
||||||
<section name="Indexing Files">
|
<section name="Indexing Files">
|
||||||
<p>
|
<p>
|
||||||
Once you've gotten this far you're probably itching to go. Let's <b> build an index!</b>
|
Once you've gotten this far you're probably itching to go. Let's <b>build an index!</b> Assuming
|
||||||
Assuming you've set your classpath correctly, just type
|
you've set your CLASSPATH correctly, just type:
|
||||||
"java org.apache.lucene.demo.IndexFiles {full-path-to-lucene}/src". This will produce
|
|
||||||
a subdirectory called "index" which will contain an index of all of the Lucene
|
<pre>
|
||||||
sourcecode.
|
java org.apache.lucene.demo.IndexFiles {full-path-to-lucene}/src
|
||||||
|
</pre>
|
||||||
|
|
||||||
|
This will produce a subdirectory called <code>index</code> which will contain an index of all of the
|
||||||
|
Lucene source code.
|
||||||
</p>
|
</p>
|
||||||
<p>
|
<p>
|
||||||
<b> To search the index </b> type "java org.apache.lucene.demo.SearchFiles". You'll be prompted
|
To <b>search the index</b> type:
|
||||||
for a query. Type in a swear word and press the enter key. You'll see that the Lucene
|
|
||||||
developers are very well mannered and get no results. Now try entering the word "vector".
|
<pre>
|
||||||
That should return a whole bunch of documents. The results will page at every tenth
|
java org.apache.lucene.demo.SearchFiles
|
||||||
result and ask you whether you want more results.
|
</pre>
|
||||||
|
|
||||||
|
You'll be prompted for a query. Type in a swear word and press the enter key. You'll see that the
|
||||||
|
Lucene developers are very well mannered and get no results. Now try entering the word "vector".
|
||||||
|
That should return a whole bunch of documents. The results will page at every tenth result and ask
|
||||||
|
you whether you want more results.
|
||||||
</p>
|
</p>
|
||||||
</section>
|
</section>
|
||||||
|
|
||||||
|
|
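The demo.xml walkthrough above runs `IndexFiles` to build an index and `SearchFiles` to query it. Conceptually, the index maps each term to the documents that contain it, which is why searching for "vector" finds files and an unknown word finds none. The in-memory toy below sketches that inverted-index idea in plain Java only. It is not how Lucene stores, analyzes, or scores an index. The class `ToyIndex`, its methods, and the sample paths are hypothetical.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;
import java.util.TreeSet;

/** Toy inverted index: each lowercased term maps to the sorted set of paths containing it. */
final class ToyIndex {
    private final Map<String, Set<String>> postings = new TreeMap<String, Set<String>>();

    /** Roughly what indexing one file means: tokenize its contents and record the path per term. */
    void add(String path, String contents) {
        for (String raw : contents.split("[^\\p{Alnum}]+")) {
            if (raw.isEmpty()) continue; // split may yield a leading empty string
            String term = raw.toLowerCase(Locale.ROOT);
            Set<String> paths = postings.get(term);
            if (paths == null) {
                paths = new TreeSet<String>();
                postings.put(term, paths);
            }
            paths.add(path);
        }
    }

    /** Single-term lookup; a term that never occurred yields an empty result. */
    List<String> search(String term) {
        Set<String> paths = postings.get(term.toLowerCase(Locale.ROOT));
        if (paths == null) return Collections.<String>emptyList();
        return new ArrayList<String>(paths);
    }
}
```

Real Lucene indexes add analysis, on-disk segment files, and relevance scoring on top of this basic term-to-document mapping.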
127
xdocs/demo2.xml
|
@ -2,89 +2,132 @@
|
||||||
<document>
|
<document>
|
||||||
<properties>
|
<properties>
|
||||||
<author email="acoliver@apache.org">Andrew C. Oliver</author>
|
<author email="acoliver@apache.org">Andrew C. Oliver</author>
|
||||||
<title>Apache Lucene - Basic Demo Sources Walkthrough</title>
|
<title>Apache Lucene - Basic Demo Sources Walk-through</title>
|
||||||
</properties>
|
</properties>
|
||||||
<body>
|
<body>
|
||||||
|
|
||||||
<section name="About the Code">
|
<section name="About the Code">
|
||||||
<p>
|
<p>
|
||||||
In this section we walk through the sources behind the basic Lucene demo such as where to
|
In this section we walk through the sources behind the command-line Lucene demo: where to find them,
|
||||||
find it, its parts and their function. This section is intended for Java developers
|
their parts and their function. This section is intended for Java developers wishing to understand
|
||||||
wishing to understand how to use Apache Lucene in their applications.
|
how to use Lucene in their applications.
|
||||||
</p>
|
</p>
|
||||||
</section>
|
</section>
|
||||||
|
|
||||||
|
|
||||||
<section name="Location of the source">
|
<section name="Location of the source">
|
||||||
|
|
||||||
<p>
|
<p>
|
||||||
Relative to the directory created when you extracted Lucene or retreived it from Subversion, you
|
Relative to the directory created when you extracted Lucene or retrieved it from Subversion, you
|
||||||
should see a directory called "src" which in turn contains a directory called "demo".
|
should see a directory called <code>src</code> which in turn contains a directory called
|
||||||
This is the root for all of the Lucene demos. Under this directory is org/apache/lucene/demo,
|
<code>demo</code>. This is the root for all of the Lucene demos. Under this directory is
|
||||||
this is where all the Java sources live.
|
<code>org/apache/lucene/demo</code>. This is where all the Java sources for the demos live.
|
||||||
</p>
|
</p>
|
||||||
|
|
||||||
<p>
|
<p>
|
||||||
Within this directory you should see the IndexFiles class we executed earlier. Bring that
|
Within this directory you should see the <code>IndexFiles.java</code> class we executed earlier.
|
||||||
up in vi or your alternative text editor and lets take a look at it.
|
Bring it up in <code>vi</code> or your editor of choice and let's take a look at it.
|
||||||
</p>
|
</p>
|
||||||
|
|
||||||
</section>
|
</section>
|
||||||
|
|
||||||
<section name="IndexFiles">
|
<section name="IndexFiles">
|
||||||
|
|
||||||
<p>
|
<p>
|
||||||
As we discussed in the previous walkthrough, the IndexFiles class creates a Lucene Index.
|
As we discussed in the previous walk-through, the <code><a
|
||||||
Lets take a look at how it does this.
|
href="api/org/apache/lucene/demo/IndexFiles.html">IndexFiles</a></code> class creates a Lucene
|
||||||
|
Index. Let's take a look at how it does this.
|
||||||
</p>
|
</p>
|
||||||
|
|
||||||
<p>
|
<p>
|
||||||
The first substantial thing the main function does is instantiate an instance
|
The first substantial thing the <code>main</code> function does is instantiate <code><a
|
||||||
of IndexWriter. It passes a string called "index" and a new instance of a class called
|
href="api/org/apache/lucene/index/IndexWriter.html">IndexWriter</a></code>. It passes the string
|
||||||
"StandardAnalyzer". The "index" string is the name of the directory that all index information
|
"<code>index</code>" and a new instance of a class called <code><a
|
||||||
should be stored in. Because we're not passing any path information, one must assume this
|
href="api/org/apache/lucene/analysis/standard/StandardAnalyzer.html">StandardAnalyzer</a></code>.
|
||||||
will be created as a subdirectory of the current directory (if it does not already exist). On
|
The "<code>index</code>" string is the name of the filesystem directory where all index information
|
||||||
some platforms this may actually result in it being created in other directories (such as
|
should be stored. Because we're not passing a full path, this will be created as a subdirectory of
|
||||||
the user's home directory).
|
the current working directory (if it does not already exist). On some platforms, it may be created
|
||||||
|
in other directories (such as the user's home directory).
|
||||||
</p>
|
</p>
|
||||||
|
|
||||||
<p>
|
<p>
|
||||||
The <b>IndexWriter</b> is the main class responsible for creating indicies. To use it you
|
The <code><a href="api/org/apache/lucene/index/IndexWriter.html">IndexWriter</a></code> is the main
|
||||||
must instantiate it with a path that it can write the index into, if this path does not
|
class responsible for creating indices. To use it you must instantiate it with a path that it can
|
||||||
exist it will create it, otherwise it will refresh the index living at that path. You
|
write the index into. If this path does not exist it will first create it. Otherwise it will
|
||||||
must a also pass an instance of <b>org.apache.lucene.analysis.Analyzer</b>.
|
refresh the index at that path. You can also create an index using one of the subclasses of <code><a
|
||||||
|
href="api/org/apache/lucene/store/Directory.html">Directory</a></code>. In any case, you must also pass an
|
||||||
|
instance of <code><a
|
||||||
|
href="api/org/apache/lucene/analysis/Analyzer.html">org.apache.lucene.analysis.Analyzer</a></code>.
|
||||||
</p>
|
</p>
|
||||||
|
|
||||||
<p>
|
<p>
|
||||||
The <b>Analyzer</b>, in this case, the <b>StandardAnalyzer</b> is little more than a standard Java
|
The particular <code><a href="api/org/apache/lucene/analysis/Analyzer.html">Analyzer</a></code> we
|
||||||
Tokenizer, converting all strings to lowercase and filtering out useless words and characters from the index.
|
are using, <code><a
|
||||||
By useless words and characters I mean common language words such as articles (a, an, the, etc.) and other
|
href="api/org/apache/lucene/analysis/standard/StandardAnalyzer.html">StandardAnalyzer</a></code>, is
|
||||||
strings that would be useless for searching (e.g. <b>'s</b>) . It should be noted that there are different
|
little more than a standard Java Tokenizer, converting all strings to lowercase and filtering out
|
||||||
rules for every language, and you should use the proper analyzer for each. Lucene currently
|
useless words and characters from the index. By useless words and characters I mean common language
|
||||||
provides Analyzers for English and German, more can be found in the Lucene Sandbox.
|
words such as articles (a, an, the, etc.) and other strings that would be useless for searching
|
||||||
|
(e.g. <b>'s</b>) . It should be noted that there are different rules for every language, and you
|
||||||
|
should use the proper analyzer for each. Lucene currently provides Analyzers for a number of
|
||||||
|
different languages (see the <code>*Analyzer.java</code> sources under <a
|
||||||
|
href="http://svn.apache.org/repos/asf/lucene/java/trunk/contrib/analyzers/src/java/org/apache/lucene/analysis/">contrib/analyzers/src/java/org/apache/lucene/analysis</a>).
|
||||||
</p>
|
</p>
|
||||||
|
|
||||||
<p>
|
<p>
|
||||||
Looking down further in the file, you should see the indexDocs() code. This recursive function
|
Looking further down in the file, you should see the <code>indexDocs()</code> code. This recursive
|
||||||
simply crawls the directories and uses FileDocument to create Document objects. The Document
|
function simply crawls the directories and uses <code><a
|
||||||
is simply a data object to represent the content in the file as well as its creation time and
|
href="api/org/apache/lucene/demo/FileDocument.html">FileDocument</a></code> to create <code><a
|
||||||
location. These instances are added to the indexWriter. Take a look inside FileDocument. It's
|
href="api/org/apache/lucene/document/Document.html">Document</a></code> objects. The <code><a
|
||||||
not particularly complicated, it just adds fields to the Document.
|
href="api/org/apache/lucene/document/Document.html">Document</a></code> is simply a data object to
|
||||||
|
represent the content in the file as well as its creation time and location. These instances are
|
||||||
|
added to the <code>indexWriter</code>. Take a look inside <code><a
|
||||||
|
href="api/org/apache/lucene/demo/FileDocument.html">FileDocument</a></code>. It's not particularly
|
||||||
|
complicated. It just adds fields to the <code><a
|
||||||
|
href="api/org/apache/lucene/document/Document.html">Document</a></code>.
|
||||||
</p>
|
</p>
|
||||||
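The recursive crawl described above can be pictured with plain Java: walk a directory tree and, for each readable file, build a simple field map holding its path, last-modified time and contents. This is a stdlib-only sketch, not the demo's actual <code>indexDocs()</code> or <code>FileDocument</code> code. The class name <code>ToyCrawler</code> is hypothetical, the field names are chosen to mirror the description, and a <code>Map</code> stands in for a Lucene <code>Document</code>.

```java
import java.io.File;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical stand-in for indexDocs() + FileDocument using only the JDK. */
public class ToyCrawler {

    /** Recursively visits file; every readable regular file becomes one field map ("document"). */
    public static void indexDocs(File file, List<Map<String, String>> docs) throws IOException {
        if (!file.canRead()) {
            return; // skip unreadable entries
        }
        if (file.isDirectory()) {
            String[] children = file.list();
            if (children != null) {
                for (String child : children) {
                    indexDocs(new File(file, child), docs); // recurse into subdirectories
                }
            }
        } else {
            // The "document" is just a data object: content plus creation time and location.
            Map<String, String> doc = new HashMap<>();
            doc.put("path", file.getPath());
            doc.put("modified", Long.toString(file.lastModified()));
            doc.put("contents", new String(Files.readAllBytes(file.toPath()), StandardCharsets.UTF_8));
            // In the demo, the Document would be handed to the IndexWriter at this point.
            docs.add(doc);
        }
    }
}
```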
|
|
||||||
<p>
|
<p>
|
||||||
As you can see there isn't much to creating an index. The devil is in the details. You may also
|
As you can see there isn't much to creating an index. The devil is in the details. You may also
|
||||||
wish to examine the other samples in this directory, particularly the IndexHTML class. It is
|
wish to examine the other samples in this directory, particularly the <code><a
|
||||||
a bit more complex but builds upon this example.
|
href="api/org/apache/lucene/demo/IndexHTML.html">IndexHTML</a></code> class. It is a bit more
|
||||||
|
complex but builds upon this example.
|
||||||
</p>
|
</p>
|
||||||
|
|
||||||
</section>
|
</section>
|
||||||
|
|
||||||
<section name="Searching Files">
|
<section name="Searching Files">
|
||||||
|
|
||||||
<p>
|
<p>
|
||||||
The SearchFiles class is quite simple. It primarily collaborates with an IndexSearcher, StandardAnalyzer
|
The <code><a href="api/org/apache/lucene/demo/SearchFiles.html">SearchFiles</a></code> class is
|
||||||
(which is used in the IndexFiles class as well) and a QueryParser. The query parser is constructed
|
quite simple. It primarily collaborates with an <code><a
|
||||||
with an analyzer used to interperate your query in the same way the Index was interperated: finding
|
href="api/org/apache/lucene/search/IndexSearcher.html">IndexSearcher</a></code>, <code><a
|
||||||
the end of words and removing useless words like 'a', 'an' and 'the'. The Query object contains the
|
href="api/org/apache/lucene/analysis/standard/StandardAnalyzer.html">StandardAnalyzer</a></code>
|
||||||
results from the QueryParser which is passed to the searcher. The searcher results are returned in
|
(which is used in the <code><a
|
||||||
a collection of Documents called "Hits" which is then iterated through and displayed to the user.
|
href="api/org/apache/lucene/demo/IndexFiles.html">IndexFiles</a></code> class as well) and a
|
||||||
|
<code><a href="api/org/apache/lucene/queryParser/QueryParser.html">QueryParser</a></code>. The
|
||||||
|
query parser is constructed with an analyzer used to interpret your query text in the same way the
|
||||||
|
documents are interpreted: finding the end of words and removing useless words like 'a', 'an' and
|
||||||
|
'the'. The <code><a href="api/org/apache/lucene/search/Query.html">Query</a></code> object contains
|
||||||
|
the results from the <code><a
|
||||||
|
href="api/org/apache/lucene/queryParser/QueryParser.html">QueryParser</a></code> which is passed to
|
||||||
|
the searcher. Note that it's also possible to programmatically construct a rich <code><a
|
||||||
|
href="api/org/apache/lucene/search/Query.html">Query</a></code> object without using the query
|
||||||
|
parser. The query parser just enables decoding the <a href="queryparsersyntax.html">Lucene query
|
||||||
|
syntax</a> into the corresponding <code><a
|
||||||
|
href="api/org/apache/lucene/search/Query.html">Query</a></code> object. The searcher results are
|
||||||
|
returned in a collection of Documents called <code><a
|
||||||
|
href="api/org/apache/lucene/search/Hits.html">Hits</a></code> which is then iterated through and
|
||||||
|
displayed to the user.
|
||||||
</p>
|
</p>
|
||||||
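The key point above, that the query text must be analyzed the same way as the documents, can be shown with a toy inverted index in plain Java. This is a hypothetical sketch, not Lucene's <code>IndexSearcher</code> or <code>QueryParser</code>. It ranks documents simply by how many distinct query terms they contain (OR semantics), whereas Lucene's real scoring is considerably more involved. The class name <code>ToySearch</code> and its stop-word list are assumptions.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.HashSet;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.Set;

/** Hypothetical toy index: same analysis at index time and query time. */
public class ToySearch {
    private static final Set<String> STOP_WORDS = Set.of("a", "an", "and", "the");

    // term -> ids of documents containing it (an inverted index)
    private final Map<String, Set<Integer>> postings = new HashMap<>();
    private int docCount = 0;

    // Shared analysis: lowercase, split on non-alphanumerics, drop stop words.
    public static List<String> analyze(String text) {
        List<String> out = new ArrayList<>();
        for (String t : text.toLowerCase(Locale.ROOT).split("[^a-z0-9]+")) {
            if (!t.isEmpty() && !STOP_WORDS.contains(t)) out.add(t);
        }
        return out;
    }

    /** Indexes a document; ids are assigned in insertion order starting at 0. */
    public void add(String text) {
        int id = docCount++;
        for (String term : analyze(text)) {
            postings.computeIfAbsent(term, k -> new HashSet<>()).add(id);
        }
    }

    /** Returns matching doc ids, most matching query terms first, ties by id. */
    public List<Integer> search(String query) {
        Map<Integer, Integer> matches = new HashMap<>();
        for (String term : new LinkedHashSet<>(analyze(query))) {
            for (int id : postings.getOrDefault(term, Set.of())) {
                matches.merge(id, 1, Integer::sum);
            }
        }
        List<Integer> hits = new ArrayList<>(matches.keySet());
        hits.sort((x, y) -> {
            int byCount = Integer.compare(matches.get(y), matches.get(x));
            return byCount != 0 ? byCount : Integer.compare(x, y);
        });
        return hits;
    }
}
```

Because both sides go through <code>analyze()</code>, a query such as "the INDEX" matches documents containing "index": the case difference disappears and "the" is ignored.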
|
|
||||||
</section>
|
</section>
|
||||||
|
|
||||||
<section name="The Web example...">
|
<section name="The Web example...">
|
||||||
|
|
||||||
<p>
|
<p>
|
||||||
<a href="demo3.html">read on>>></a>
|
<a href="demo3.html">read on>>></a>
|
||||||
</p>
|
</p>
|
||||||
|
|
||||||
</section>
|
</section>
|
||||||
|
|
||||||
</body>
|
</body>
|
||||||
|
|
|
@@ -9,77 +9,75 @@
|
||||||
|
|
||||||
<section name="About this Document">
|
<section name="About this Document">
|
||||||
<p>
|
<p>
|
||||||
This document is intended as a "getting started" guide to installing and running the
|
This document is intended as a "getting started" guide to installing and running the Lucene
|
||||||
Apache Lucene web application demo. This guide assumes that you have read the
|
web application demo. This guide assumes that you have read the information in the previous two
|
||||||
information in the previous two examples or already know it anyhow. We'll use
|
examples. We'll use Tomcat as our reference web container. These demos should work with nearly any
|
||||||
Tomcat 4.0.1 as our reference web container. These demos should work with nearly
|
container, but you may have to adapt them appropriately.
|
||||||
any container, but it is up to you to adapt them appropriately.
|
|
||||||
</p>
|
</p>
|
||||||
</section>
|
</section>
|
||||||
|
|
||||||
|
|
||||||
<section name="About the Demos">
|
<section name="About the Demos">
|
||||||
<p>
|
<p>
|
||||||
The Lucene Web Application demo is a template web application intended for deployment
|
The Lucene Web Application demo is a template web application intended for deployment on Tomcat or a
|
||||||
on Tomcat or a similar web container. It's NOT designed as a "best practices"
|
similar web container. It's NOT designed as a "best practices" implementation by ANY means. It's
|
||||||
implementation by ANY means. It's more of a "hello world" type Lucene Web App.
|
more of a "hello world" type Lucene Web App. The purpose of this application is to demonstrate
|
||||||
The purpose of this application is to demonstrate Lucene. With that being said,
|
Lucene. With that being said, it should be relatively simple to create a small searchable website
|
||||||
it should be relatively simple to create a small searchable website in Tomcat or
|
in Tomcat or a similar application server.
|
||||||
a similar application server.
|
|
||||||
</p>
|
</p>
|
||||||
</section>
|
</section>
|
||||||
|
|
||||||
<section name="Indexing Files">
|
<section name="Indexing Files">
|
||||||
<p>
|
<p> Once you've gotten this far you're probably itching to go. Let's start by creating the index
|
||||||
Once you've gotten this far you're probably itching to go.
|
you'll need for the web examples. Since you've already set your CLASSPATH in the previous examples,
|
||||||
Let's start by creating the index you'll need for the web examples.
|
all you need to do is type:
|
||||||
Since you've already set your classpath in the previous examples,
|
|
||||||
all you need to do is type
|
<pre>
|
||||||
<b> "java org.apache.lucene.demo.IndexHTML -create -index {index-dir} .."</b>.
|
java org.apache.lucene.demo.IndexHTML -create -index {index-dir} ..
|
||||||
You'll need to do this from a (any) subdirectory of your {tomcat}/webapps directory (make sure you didn't leave off the ".." or you'll get a null pointer
|
</pre>
|
||||||
exception).
|
|
||||||
{index-dir}
|
You'll need to do this from a (any) subdirectory of your <code>{tomcat}/webapps</code> directory
|
||||||
should be a directory that Tomcat has permission to read and write, but is
|
(make sure you didn't leave off the <code>..</code> or you'll get a null pointer exception).
|
||||||
outside of a web accessible context. By default the webapp is configured
|
<code>{index-dir}</code> should be a directory that Tomcat has permission to read and write, but is
|
||||||
to look in <b>/opt/lucene/index</b> for this index.
|
outside of a web accessible context. By default the webapp is configured to look in
|
||||||
|
<code>/opt/lucene/index</code> for this index.
|
||||||
</p>
|
</p>
|
||||||
</section>
|
</section>
|
||||||
|
|
||||||
<section name="Deploying the Demos">
|
<section name="Deploying the Demos">
|
||||||
<p>Located in your distribution directory you should see
|
<p>Located in your distribution directory you should see a war file called
|
||||||
a war file called luceneweb.war. Copy this to your
|
<code>luceneweb.war</code>. If you're working with a Subversion checkout, this will be under the
|
||||||
{tomcat-home}/webapps directory. You may need to restart
|
<code>build</code> subdirectory. Copy this to your <code>{tomcat-home}/webapps</code> directory.
|
||||||
Tomcat. </p>
|
You may need to restart Tomcat. </p> </section>
|
||||||
</section>
|
|
||||||
|
|
||||||
<section name="Configuration">
|
<section name="Configuration">
|
||||||
<p>
|
<p> From your Tomcat directory look in the <code>webapps/luceneweb</code> subdirectory. If it's not
|
||||||
From your Tomcat directory look in the webapps/luceneweb subdirectory. If it's not
|
present, try browsing to <code>http://localhost:8080/luceneweb</code> (which causes Tomcat to deploy
|
||||||
present, try browsing to "http://localhost:8080/luceneweb" then look again.
|
the webapp), then look again. Edit a file called <code>configuration.jsp</code>. Ensure that the
|
||||||
Edit a file called configuration.jsp. Ensure that the indexLocation is equal to the
|
<code>indexLocation</code> is equal to the location you used for your index. You may also customize
|
||||||
location you used for your index. You may also customize the appTitle and appFooter
|
the <code>appTitle</code> and <code>appFooter</code> strings as you see fit. Once you have finished
|
||||||
strings as you see fit. Once you have finished altering the configuration you should
|
altering the configuration you may need to restart Tomcat. You may also wish to update the war file
|
||||||
restart Tomcat. You may also wish to update the war file by typing
|
by typing <code>jar -uf luceneweb.war configuration.jsp</code> from the <code>luceneweb</code>
|
||||||
<b>jar -uf luceneweb.war configuration.jsp</b> from the luceneweb subdirectory.
|
subdirectory. (The -u option is not available in all versions of jar. In this case recreate the
|
||||||
(The -u option is not available in all versions of jar. In this case recreate the war file).
|
war file).
|
||||||
</p>
|
</p>
|
||||||
</section>
|
</section>
|
||||||
|
|
||||||
<section name="Running the Demos">
|
<section name="Running the Demos">
|
||||||
<p>Now you're ready to roll. In your browser set the url to "http://localhost:8080/luceneweb"
|
<p>Now you're ready to roll. In your browser set the url to
|
||||||
enter "test" and the number of items per page and press search.</p>
|
<code>http://localhost:8080/luceneweb</code> enter <code>test</code> and the number of items per
|
||||||
<p>You should now be looking either at a number of results (provided you didn't erase the
|
page and press search.</p>
|
||||||
Tomcat examples) or nothing. Try other search terms. Depending on the number of items
|
<p>You should now be looking either at a number of results (provided you didn't erase the Tomcat
|
||||||
per page you set and results returned, there may be a link at the bottom that says "more results>>",
|
examples) or nothing. If you get an error regarding opening the index, then you probably set the
|
||||||
clicking it goes to subsequent pages. If you get an error regarding opening the index, then you
|
path in <code>configuration.jsp</code> incorrectly or Tomcat doesn't have permissions to the index
|
||||||
probably set the path in "configuration" incorrectly or Tomcat doesn't have permissions to the
|
(or you skipped the step of creating it). Try other search terms. Depending on the number of items
|
||||||
index (or you skipped the step of creating it).</p>
|
per page you set and results returned, there may be a link at the bottom that says <b>More
|
||||||
</section>
|
Results>></b>; clicking it takes you to subsequent pages. </p> </section>
|
||||||
|
|
||||||
<section name="About the code...">
|
<section name="About the code...">
|
||||||
<p>
|
<p>
|
||||||
If you want to know more about how this web app works or how to customize it then
|
If you want to know more about how this web app works or how to customize it then <a
|
||||||
<a href="demo4.html">read on>>></a>.
|
href="demo4.html">read on>>></a>.
|
||||||
</p>
|
</p>
|
||||||
</section>
|
</section>
|
||||||
|
|
||||||
|
|
154
xdocs/demo4.xml
|
@@ -8,124 +8,146 @@
|
||||||
|
|
||||||
<section name="About the Code">
|
<section name="About the Code">
|
||||||
<p>
|
<p>
|
||||||
In this section we walk through the sources behind the basic Lucene Web Application demo.
|
In this section we walk through the sources behind the basic Lucene Web Application demo: where to
|
||||||
Where to find it, its parts, and their function. This section is intended for Java developers
|
find them, their parts and their function. This section is intended for Java developers wishing to
|
||||||
wishing to understand how to use Apache Lucene in their applications or for those involved
|
understand how to use Lucene in their applications or for those involved in deploying web
|
||||||
in deploying web applications based on Lucene.
|
applications based on Lucene.
|
||||||
</p>
|
</p>
|
||||||
</section>
|
</section>
|
||||||
|
|
||||||
|
|
||||||
<section name="Location of the source (developers/deployers)">
|
<section name="Location of the source (developers/deployers)">
|
||||||
<p>
|
<p>
|
||||||
Relative the directory created when you extracted Lucene or retreived it from Subversion, you
|
Relative to the directory created when you extracted Lucene or retrieved it from Subversion, you
|
||||||
should see a directory called "src" which in turn contains a directory called "jsp".
|
should see a directory called <code>src</code> which in turn contains a directory called
|
||||||
This is the root for all of the Lucene web demo.
|
<code>jsp</code>. This is the root for all of the Lucene web demo.
|
||||||
</p>
|
</p>
|
||||||
<p>
|
<p>
|
||||||
Within this directory you should see the index.jsp class. Bring this up in vi or your
|
Within this directory you should see <code>index.jsp</code>. Bring this up in vi or your editor of
|
||||||
editor of choice.
|
choice.
|
||||||
</p>
|
</p>
|
||||||
</section>
|
</section>
|
||||||
|
|
||||||
<section name="index.jsp (developers/deployers)">
|
<section name="index.jsp (developers/deployers)">
|
||||||
<p>
|
<p>
|
||||||
This jsp page is pretty boring by itself. All it does is include a header, display a form and
|
This jsp page is pretty boring by itself. All it does is include a header, display a form and
|
||||||
include a footer. If you look at the form, it has two fields: query (where you enter your
|
include a footer. If you look at the form, it has two fields: <code>query</code> (where you enter
|
||||||
search criteria) and maxresults where you specify the number of results per page. If you look
|
your search criteria) and <code>maxresults</code> where you specify the number of results per page.
|
||||||
at the form tag, you'll notice it uses the get method as opposed to the post. While this is
|
By the structure of this JSP it should be easy to customize it without even editing this particular
|
||||||
considered deprecated functionality by the latest w3c specs, its unlikely to go away due to the
|
file. You could simply change the header and footer. Let's look at the <code>header.jsp</code>
|
||||||
usefulness of being able to bookmark things like searches. By the structure of this JSP it should
|
(located in the same directory) next.
|
||||||
be easy to customize it without even editing this particular file. You could simply change the
|
|
||||||
header and footer. Let's look at the header.jsp (located in the same directory) next.
|
|
||||||
</p>
|
</p>
|
||||||
</section>
|
</section>
|
||||||
|
|
||||||
<section name="header.jsp (developers/deployers)">
|
<section name="header.jsp (developers/deployers)">
|
||||||
<p>
|
<p>
|
||||||
The header is also very simple by itself. The only thing it does is include the configuration.jsp
|
The header is also very simple by itself. The only thing it does is include the
|
||||||
(which you looked at in the last section of this guide) and set the title and a brief header. This
|
<code>configuration.jsp</code> (which you looked at in the last section of this guide) and set the
|
||||||
would be a good place to put your own custom HTML to "pretty" things up a bit. We won't cover the
|
title and a brief header. This would be a good place to put your own custom HTML to "pretty" things
|
||||||
footer because all it does is display the footer and close your tags. Let's look at the results.jsp,
|
up a bit. We won't cover the footer because all it does is display the footer and close your tags.
|
||||||
the meat of this application next.
|
Let's look at the <code>results.jsp</code>, the meat of this application, next.
|
||||||
</p>
|
</p>
|
||||||
</section>
|
</section>
|
||||||
|
|
||||||
<section name="results.jsp (developers)">
|
<section name="results.jsp (developers)">
|
||||||
<p>
|
<p>
|
||||||
The results.jsp had a lot more functionality. Much of it is for paging the search results we'll not
|
Most of the functionality lies in <code>results.jsp</code>. Much of it is for paging the search
|
||||||
cover this as it's commented well enough. It does not perform any optimizations such as caching results,
|
results, which we'll not cover here as it's commented well enough. The first thing in this page is
|
||||||
etc. as that would make this a more complex example. The first thing in this page is the actual imports
|
the actual imports for the Lucene classes and Lucene demo classes. These classes are loaded from
|
||||||
for the Lucene classes and Lucene demo classes. These classes are loaded from the jars included in the
|
the jars included in the <code>WEB-INF/lib</code> directory in the <code>luceneweb.war</code> file.
|
||||||
WEB-INF/lib directory in the final war file.
|
|
||||||
</p>
|
</p>
|
||||||
<p>
|
<p>
|
||||||
You'll notice that this file includes the same header and footer as the "index.jsp". From there the jsp
|
You'll notice that this file includes the same header and footer as <code>index.jsp</code>. From
|
||||||
constructs an IndexSearcher with the "indexLocation" that was specified in the "configuration.jsp". If there
|
there it constructs an <code><a
|
||||||
is an error of any kind in opening the index, it is diplayed to the user and a boolean flag is set to tell
|
href="api/org/apache/lucene/search/IndexSearcher.html">IndexSearcher</a></code> with the
|
||||||
the rest of the sections of the jsp not to continue.
|
<code>indexLocation</code> that was specified in <code>configuration.jsp</code>. If there is an
|
||||||
|
error of any kind in opening the index, it is displayed to the user and the boolean flag
|
||||||
|
<code>error</code> is set to tell the rest of the sections of the jsp not to continue.
|
||||||
</p>
|
</p>
|
||||||
<p>
|
<p>
|
||||||
From there, this jsp attempts to get the search criteria, the start index (used for paging) and the maximum
|
From there, this jsp attempts to get the search criteria, the start index (used for paging) and the
|
||||||
number of results per page. If the maximum results per page is not set or not valid then it and the
|
maximum number of results per page. If the maximum results per page is not set or not valid then it
|
||||||
start index are set to default values. If only the start index is invalid it is set to a default value. If
|
and the start index are set to default values. If only the start index is invalid it is set to a
|
||||||
the criteria isn't provided then a servlet error is thrown (it is assumed that this is the result of url tampering
|
default value. If the criteria isn't provided then a servlet error is thrown (it is assumed that
|
||||||
or some form of browser malfunction).
|
this is the result of url tampering or some form of browser malfunction).
|
||||||
</p>
|
</p>
|
||||||
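The parameter handling just described can be sketched in plain Java. This is a hypothetical illustration of the rules in the paragraph, not the code in <code>results.jsp</code>. The default values (start 0, 10 results per page) are assumptions, and an <code>IllegalArgumentException</code> stands in for the servlet error the JSP throws when the criteria are missing.

```java
/** Hypothetical sketch of the request-parameter rules described for results.jsp. */
public class PagingParams {
    // Assumed defaults for illustration; results.jsp defines its own values.
    static final int DEFAULT_START = 0;
    static final int DEFAULT_MAX = 10;

    public final String query;
    public final int start;
    public final int maxResults;

    private PagingParams(String query, int start, int maxResults) {
        this.query = query;
        this.start = start;
        this.maxResults = maxResults;
    }

    /** Returns null if the value is missing, not an integer, or below min. */
    private static Integer parse(String value, int min) {
        if (value == null) return null;
        try {
            int n = Integer.parseInt(value.trim());
            return n >= min ? n : null;
        } catch (NumberFormatException e) {
            return null;
        }
    }

    public static PagingParams from(String query, String startIndex, String maxResults) {
        if (query == null) {
            // Missing criteria: assumed to be URL tampering or a browser malfunction.
            throw new IllegalArgumentException("missing search criteria");
        }
        Integer max = parse(maxResults, 1);
        if (max == null) {
            // Invalid or missing page size: reset both values to their defaults.
            return new PagingParams(query, DEFAULT_START, DEFAULT_MAX);
        }
        // Only the start index is invalid: reset just that one.
        Integer start = parse(startIndex, 0);
        return new PagingParams(query, start != null ? start : DEFAULT_START, max);
    }
}
```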
<p>
|
<p>
|
||||||
The jsp moves on to construct a StandardAnalyzer just as in the simple demo, to analyze the search critieria, it
|
The jsp moves on to construct a <code><a
|
||||||
is passed to the QueryParser along with the criteria to construct a Query object. You'll also notice the
|
href="api/org/apache/lucene/analysis/standard/StandardAnalyzer.html">StandardAnalyzer</a></code> to
|
||||||
string literal "contents" included. This is to specify the search should include the contents and not
|
analyze the search text. This matches the analyzer used during indexing (<code><a
|
||||||
the title, url or some other field in the indexed documents. If there is any error in constructing a Query
|
href="api/org/apache/lucene/demo/IndexHTML.html">IndexHTML</a></code>), which is generally
|
||||||
object an error is displayed to the user.
|
recommended. This is passed to the <code><a
|
||||||
|
href="api/org/apache/lucene/queryParser/QueryParser.html">QueryParser</a></code> along with the
|
||||||
|
criteria to construct a <code><a href="api/org/apache/lucene/search/Query.html">Query</a></code>
|
||||||
|
object. You'll also notice the string literal <code>"contents"</code> included. This specifies
|
||||||
|
that the search should cover the <code>contents</code> field and not the <code>title</code>,
|
||||||
|
<code>url</code> or some other field in the indexed documents. If there is any error in
|
||||||
|
constructing a <code><a href="api/org/apache/lucene/search/Query.html">Query</a></code> object an
|
||||||
|
error is displayed to the user.
|
||||||
</p>
|
</p>
|
||||||
<p>
|
<p>
|
||||||
In the next section of the jsp the IndexSearcher is asked to search given the query object. The results are
|
In the next section of the jsp the <code><a
|
||||||
returned in a collection called "hits". If the length property of the hits collection is 0 then an error
|
href="api/org/apache/lucene/search/IndexSearcher.html">IndexSearcher</a></code> is asked to search
|
||||||
is displayed to the user and the error flag is set.
|
given the query object. The results are returned in a collection called <code>hits</code>. If the
|
||||||
|
length property of the <code>hits</code> collection is 0 (meaning there were no results) then an
|
||||||
|
error is displayed to the user and the error flag is set.
|
||||||
</p>
|
</p>
|
||||||
<p>
|
<p>
|
||||||
Finally the jsp iterates through the hits collection and displayed properties of the "Document" objects we talked
|
Finally the jsp iterates through the <code>hits</code> collection, taking the current page into
|
||||||
about in the first walkthrough. These objects contain "known" fields specific to their indexer (in this case
|
account, and displays properties of the <code><a
|
||||||
"IndexHTML" constructs a document with "url", "title" and "contents"). You'll notice that these results are paged
|
href="api/org/apache/lucene/document/Document.html">Document</a></code> objects we talked about in
|
||||||
but the search is repeated every time. This is an area where optimization could improve performance for large
|
the first walkthrough. These objects contain "known" fields specific to their indexer (in this case
|
||||||
result sets.
|
<code><a href="api/org/apache/lucene/demo/IndexHTML.html">IndexHTML</a></code> constructs a document
|
||||||
|
with "url", "title" and "contents").
|
||||||
|
</p>
|
||||||
|
<p>
|
||||||
|
Please note that in a real deployment of Lucene, it's best to instantiate <code><a
|
||||||
|
href="api/org/apache/lucene/search/IndexSearcher.html">IndexSearcher</a></code> and <code><a
|
||||||
|
href="api/org/apache/lucene/queryParser/QueryParser.html">QueryParser</a></code> once, and then
|
||||||
|
share them across search requests, instead of re-instantiating per search request.
|
||||||
</p>
|
</p>
|
||||||
</section>
|
</section>
|
||||||
|
|
||||||
<section name="More sources (developers)">
|
<section name="More sources (developers)">
|
||||||
<p>
|
<p>
|
||||||
There are additional sources used by the web app that were not specifically covered by either walkthrough. For
|
There are additional sources used by the web app that were not specifically covered by either
|
||||||
example the HTML parser, the IndexHTML class and HTMLDocument class. These are very similar to the classes
|
walkthrough. For example the HTML parser, the <code><a
|
||||||
covered in the first example, however they have properties sepecific to parsing and indexing HTML. This is
|
href="api/org/apache/lucene/demo/IndexHTML.html">IndexHTML</a></code> class and <code><a
|
||||||
beyond our scope; however, by now you should feel like you're "getting started" with Lucene.
|
href="api/org/apache/lucene/demo/HTMLDocument.html">HTMLDocument</a></code> class. These are very
|
||||||
|
similar to the classes covered in the first example, with properties specific to parsing and
|
||||||
|
indexing HTML. This is beyond our scope; however, by now you should feel like you're "getting
|
||||||
|
started" with Lucene.
|
||||||
</p>
|
</p>
|
||||||
</section>
|
</section>
|
||||||
|
|
||||||
<section name="Where to go from here? (Everyone!)">
|
<section name="Where to go from here? (everyone!)">
|
||||||
<p>
|
<p>
|
||||||
There are a number of things this demo doesn't do or doesn't do quite right. For instance, you may
|
There are a number of things this demo doesn't do or doesn't do quite right. For instance, you may
|
||||||
have noticed that documents in the root context are unreachable (unless you reconfigure Tomcat to
|
have noticed that documents in the root context are unreachable (unless you reconfigure Tomcat to
|
||||||
support that context or redirect to it), anywhere where the directory doesn't quite match the context mapping,
|
support that context or redirect to it), anywhere where the directory doesn't quite match the
|
||||||
you'll have a broken link in your results. If you want to index non-local files or have some other
|
context mapping, you'll have a broken link in your results. If you want to index non-local files or
|
||||||
needs this isn't supported, plus there may be security issues with running the indexing application from
|
have some other needs this isn't supported, plus there may be security issues with running the
|
||||||
your webapps directory. There are a number of things left for you the implementor or developer to do.
|
indexing application from your webapps directory. There are a number of things left for you the
|
||||||
|
developer to do.
|
||||||
</p>
|
</p>
|
||||||
<p>
|
<p>
|
||||||
In time some of these things may be added to Lucene as features (if you've got a good idea we'd love to hear it!),
|
In time some of these things may be added to Lucene as features (if you've got a good idea we'd love
|
||||||
but for now: this is where you begin and the search engine/indexer ends. Lastly, one would assume you'd
|
to hear it!), but for now: this is where you begin and the search engine/indexer ends. Lastly, one
|
||||||
want to follow the above advice and customize the application to look a little more fancy than black on
|
would assume you'd want to follow the above advice and customize the application to look a little
|
||||||
white with "Lucene Template" at the top. We'll see you on the Lucene Users' or Developers' mailing lists!
|
more fancy than black on white with "Lucene Template" at the top. We'll see you on the Lucene
|
||||||
|
Users' or Developers' <a href="mailinglists.html">mailing lists</a>!
|
||||||
</p>
|
</p>
|
||||||
</section>
|
</section>
|
||||||
|
|
||||||
<section name="When to contact the Author">
|
<section name="When to contact the Author">
|
||||||
<p>
|
<p>
|
||||||
Please resist the urge to contact the authors of this document (without bribes of fame and fortune attached). First
|
Please resist the urge to contact the authors of this document (without bribes of fame and fortune
|
||||||
contact the <a href="http://lucene.apache.org/java/docs/mailinglists.html">mailing lists</a>. That being said feedback,
|
attached). First contact the <a href="mailinglists.html">mailing lists</a>, taking care to <a
|
||||||
and modifications to this document and samples are ever so greatly appreciated. They are just best sent to the
|
href="http://www.catb.org/~esr/faqs/smart-questions.html">Ask Questions The Smart Way</a>.
|
||||||
lists so that everyone can share in them. Certainly you'll get the most help there as well.
|
Certainly you'll get the most help that way as well. That being said, feedback, and modifications
|
||||||
Thanks for understanding.
|
to this document and samples are ever so greatly appreciated. They are just best sent to the lists
|
||||||
|
or <a href="http://wiki.apache.org/jakarta-lucene/HowToContribute">posted as patches</a>, so that
|
||||||
|
everyone can share in them. Thanks for understanding!
|
||||||
</p>
|
</p>
|
||||||
</section>
|
</section>
|
||||||
|
|
||||||
|
|
|
@@ -8,42 +8,40 @@
|
||||||
|
|
||||||
<section name="Getting Started">
|
<section name="Getting Started">
|
||||||
<p>
|
<p>
|
||||||
This document is intended as a "getting started" guide. It has three basic
|
This document is intended as a "getting started" guide. It has three audiences: first-time users
|
||||||
audiences: novices looking to install Apache Lucene on their application or
|
looking to install Apache Lucene in their application or web server; developers looking to modify or base
|
||||||
web server, developers looking to modify or base the applications they develop
|
the applications they develop on Lucene; and developers looking to become involved in and contribute
|
||||||
on Lucene, and developers looking to become involved in and contribute to the
|
to the development of Lucene. This document is written in tutorial and walk-through format. The
|
||||||
development of Lucene. This document is written in tutorial and walkthrough
|
goal is to help you "get started". It does not go into great depth on some of the conceptual or
|
||||||
format. It intends to help you in "getting started", but does not go into great
|
inner details of Lucene.
|
||||||
depth into some of the conceptual or inner details of Apache Lucene.
|
|
||||||
</p>
|
</p>
|
||||||
|
|
||||||
<p>
|
<p>
|
||||||
Each section listed below builds on one another. That being said more advanced users may
|
Each section listed below builds on one another. More advanced users
|
||||||
wish to skip sections.
|
may wish to skip sections.
|
||||||
</p>
|
</p>
|
||||||
|
|
||||||
<ul>
|
<ul>
|
||||||
<li><a href="demo.html">About the basic Lucene demo and its usage</a>.
|
<li><a href="demo.html">About the command-line Lucene demo and its usage</a>. This section
|
||||||
This section is intended for anyone who wants a basic background on using the provided Lucene demos.</li>
|
is intended for anyone who wants to use the command-line Lucene demo.</li> <p/>
|
||||||
|
|
||||||
<li><a href="demo2.html">About the sources and implementation
|
<li><a href="demo2.html">About the sources and implementation for the command-line Lucene
|
||||||
for the basic Lucene demo</a> section we walk through . This section is intended for developers.</li>
|
demo</a>. This section walks through the implementation details (sources) of the
|
||||||
|
command-line Lucene demo. This section is intended for developers.</li> <p/>
|
||||||
|
|
||||||
<li><a href="demo3.html">About installing
|
<li><a href="demo3.html">About installing and configuring the demo template web
|
||||||
and configuring the template web application</a>. While this walkthrough assumes
|
application</a>. While this walk-through assumes Tomcat as your container of choice,
|
||||||
Tomcat 4.0.x as your container of choice, there is no reason you can't (provided you have
|
there is no reason you can't (provided you have the requisite knowledge) adapt the
|
||||||
the requisite knowledge) adapt the instructions to your container. This section is intended
|
instructions to your container. This section is intended for those responsible for the
|
||||||
for those responsible for the development or deployment of Lucene-based web applications.</li>
|
development or deployment of Lucene-based web applications.</li> <p/>
|
||||||
|
|
||||||
|
<li><a href="demo4.html">About the sources used to construct the demo template web
|
||||||
|
application</a>. Please note the template application is designed to highlight features of
|
||||||
|
Lucene and is <b>not</b> an example of best practices. (One would hopefully use MVC
|
||||||
|
architecture such as provided by Jakarta Struts and taglibs, but showing you how to do that
|
||||||
|
would be WAY beyond the scope of this guide.) This section is intended for developers and
|
||||||
|
those wishing to customize the demo template web application to their needs. </li>
|
||||||
|
|
||||||
<li><a href="demo4.html">About the sources used to construct the
|
|
||||||
template web application</a>. Please note the template application is designed to highlight
|
|
||||||
features of Lucene and is <b>not</b> an example of best practices. (One would hopefully
|
|
||||||
use MVC architecture such as provided by Jakarta Struts and taglibs, or better yet XML
|
|
||||||
with stylesheets, but showing you how to do that would be WAY beyond the scope of this
|
|
||||||
demonstration. Additionally one could cache results, and perform other performance
|
|
||||||
optimizations, but those are beyond the scope of this demo).
|
|
||||||
This section is intended for developers and those wishing to customize the template web
|
|
||||||
application to their needs. The sections useful to developers only are clearly delineated.</li>
|
|
||||||
</ul>
|
</ul>
|
||||||
</section>
|
</section>
|
||||||
|
|
||||||
|
|