<?xml version="1.0"?>
<document>
<properties>
<author email="acoliver@apache.org">Andrew C. Oliver</author>
<title>Apache Lucene - Basic Demo Sources Walkthrough</title>
</properties>
<body>
<section name="About the Code">
<p>
In this section we walk through the sources behind the basic Lucene Web Application demo: where to
find them, what their parts are and what each part does. This section is intended for Java
developers who want to understand how to use Lucene in their applications and for those involved
in deploying web applications based on Lucene.
</p>
</section>
<section name="Location of the source (developers/deployers)">
<p>
Relative to the directory created when you extracted Lucene or retrieved it from Subversion, you
should see a directory called <code>src</code> which in turn contains a directory called
<code>jsp</code>. This is the root directory of the Lucene web demo.
</p>
<p>
Within this directory you should see <code>index.jsp</code>. Open it in vi or your editor of
choice.
</p>
</section>
<section name="index.jsp (developers/deployers)">
<p>
This JSP page is pretty boring by itself. All it does is include a header, display a form and
include a footer. The form has two fields: <code>query</code> (where you enter your search
criteria) and <code>maxresults</code> (where you specify the number of results per page).
Because of the way this JSP is structured, it is easy to customize without even editing this
particular file: you can simply change the header and footer. Let's look at the
<code>header.jsp</code> (located in the same directory) next.
</p>
</section>
<section name="header.jsp (developers/deployers)">
<p>
The header is also very simple by itself. The only thing it does is include
<code>configuration.jsp</code> (which you looked at in the last section of this guide) and set the
title and a brief header. This would be a good place to put your own custom HTML to "pretty" things
up a bit. We won't cover the footer because all it does is display the footer text and close the
tags that are still open. Let's look at <code>results.jsp</code>, the meat of this application,
next.
</p>
</section>
<section name="results.jsp (developers)">
<p>
Most of the functionality lies in <code>results.jsp</code>. Much of it is for paging the search
results, which we won't cover in detail here because the code is commented well enough. The page
begins with the imports for the Lucene classes and the Lucene demo classes. These classes are
loaded from the jars included in the <code>WEB-INF/lib</code> directory of the
<code>luceneweb.war</code> file.
</p>
<p>
You'll notice that this file includes the same header and footer as <code>index.jsp</code>. From
there it constructs an <code><a
href="api/org/apache/lucene/search/IndexSearcher.html">IndexSearcher</a></code> with the
<code>indexLocation</code> that was specified in <code>configuration.jsp</code>. If there is an
error of any kind in opening the index, it is displayed to the user and the boolean flag
<code>error</code> is set, which tells the remaining sections of the JSP not to continue.
</p>
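<p>
The error-flag flow described above can be sketched in plain Java. This is only an illustration,
not the code in <code>results.jsp</code>: the class and method names are hypothetical, and a
simple directory check stands in for constructing the <code>IndexSearcher</code> so that the
sketch runs without the Lucene jars.
</p>
<source><![CDATA[
```java
import java.io.File;
import java.io.IOException;

// Hypothetical stand-in for the page logic; none of these names come from results.jsp.
final class ErrorFlagSketch {
    /** Message shown to the user when a stage fails, or null if nothing failed. */
    String message = null;
    /** Mirrors the boolean flag the walkthrough calls "error". */
    boolean error = false;

    // Stand-in for opening the index: this only checks that the directory exists.
    void openIndex(String indexLocation) {
        try {
            File dir = new File(indexLocation);
            if (!dir.isDirectory()) {
                throw new IOException("no index directory at " + indexLocation);
            }
        } catch (IOException e) {
            // Report the problem and remember that later stages must not run.
            message = "ERROR opening the index: " + e.getMessage();
            error = true;
        }
    }

    // Every later stage is wrapped in the same guard and is skipped after a failure.
    boolean runSearchStage() {
        if (error) {
            return false;
        }
        // ... parse the query, run the search, render the results ...
        return true;
    }
}
```
]]></source>
<p>
A single flag keeps each scriptlet section independent: a section only needs to check the flag
before doing its work, and the footer is still rendered after a failure.
</p>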
<p>
From there, this JSP attempts to get the search criteria, the start index (used for paging) and the
maximum number of results per page. If the maximum results per page is not set or not valid, then
both it and the start index are set to default values. If only the start index is invalid, only
the start index is set to its default value. If the criteria aren't provided, a servlet error is
thrown (it is assumed that this is the result of URL tampering or some form of browser
malfunction).
</p>
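<p>
As a rough sketch, the defaulting rules above could look like this in plain Java. The helper
class, the default values (50 results, start index 0) and the treatment of non-positive numbers
as invalid are assumptions made for illustration, and an <code>IllegalArgumentException</code>
stands in for the servlet error so that the sketch has no dependency on the servlet API.
</p>
<source><![CDATA[
```java
// Hypothetical helper; the real page does this work inline in a scriptlet.
final class SearchParams {
    static final int DEFAULT_MAX_RESULTS = 50; // assumed default, for illustration
    static final int DEFAULT_START_INDEX = 0;  // assumed default, for illustration

    final String query;
    final int startIndex;
    final int maxResults;

    private SearchParams(String query, int startIndex, int maxResults) {
        this.query = query;
        this.startIndex = startIndex;
        this.maxResults = maxResults;
    }

    static SearchParams parse(String query, String startParam, String maxResultsParam) {
        if (query == null) {
            // Stand-in for the servlet error thrown when no criteria are supplied.
            throw new IllegalArgumentException("no query specified");
        }
        int max;
        try {
            max = Integer.parseInt(maxResultsParam); // also fails for null
        } catch (NumberFormatException e) {
            max = -1;
        }
        if (max <= 0) {
            // Missing or invalid page size: reset the page size AND the start index.
            return new SearchParams(query, DEFAULT_START_INDEX, DEFAULT_MAX_RESULTS);
        }
        int start;
        try {
            start = Integer.parseInt(startParam);
        } catch (NumberFormatException e) {
            start = DEFAULT_START_INDEX; // only the start index falls back
        }
        if (start < 0) {
            start = DEFAULT_START_INDEX;
        }
        return new SearchParams(query, start, max);
    }
}
```
]]></source>
<p>
In a JSP the three strings would come from <code>request.getParameter(...)</code>; keeping the
parsing in one small method makes the fallback rules easy to test outside the container.
</p>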
<p>
The JSP moves on to construct a <code><a
href="api/org/apache/lucene/analysis/standard/StandardAnalyzer.html">StandardAnalyzer</a></code> to
analyze the search text. This matches the analyzer used during indexing (<code><a
href="api/org/apache/lucene/demo/IndexHTML.html">IndexHTML</a></code>), which is generally
recommended. The analyzer is passed to the <code><a
href="api/org/apache/lucene/queryParser/QueryParser.html">QueryParser</a></code> along with the
criteria to construct a <code><a href="api/org/apache/lucene/search/Query.html">Query</a></code>
object. You'll also notice the string literal <code>"contents"</code> included. This specifies
that the search should cover the <code>contents</code> field by default and not the
<code>title</code>, <code>url</code> or some other field in the indexed documents. If there is any
error in constructing the <code><a href="api/org/apache/lucene/search/Query.html">Query</a></code>
object, an error is displayed to the user.
</p>
<p>
In the next section of the JSP the <code><a
href="api/org/apache/lucene/search/IndexSearcher.html">IndexSearcher</a></code> is asked to search
using the query object. The results are returned in a collection called <code>hits</code>. If the
length of the <code>hits</code> collection is 0 (meaning there were no results), an error is
displayed to the user and the error flag is set.
</p>
<p>
Finally the JSP iterates through the <code>hits</code> collection, taking the current page into
account, and displays properties of the <code><a
href="api/org/apache/lucene/document/Document.html">Document</a></code> objects we talked about in
the first walkthrough. These objects contain "known" fields specific to their indexer (in this case
<code><a href="api/org/apache/lucene/demo/IndexHTML.html">IndexHTML</a></code> constructs a document
with "url", "title" and "contents").
</p>
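<p>
The paging arithmetic behind "taking the current page into account" can be sketched as follows. A
plain <code>List</code> stands in for the <code>hits</code> collection, and the class and method
names are hypothetical; the page itself does the equivalent work inline.
</p>
<source><![CDATA[
```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper: computes which results belong on the current page.
final class PageWindow {
    /** Returns the results from startIndex up to startIndex + maxResults, clamped to the list. */
    static <T> List<T> page(List<T> hits, int startIndex, int maxResults) {
        int from = Math.min(Math.max(startIndex, 0), hits.size());
        // Use long arithmetic so a very large page size cannot overflow.
        long wanted = (long) from + Math.max(maxResults, 0);
        int to = (int) Math.min(wanted, (long) hits.size());
        return new ArrayList<T>(hits.subList(from, to));
    }

    /** True when there are results beyond the current page, i.e. a "more results" link is needed. */
    static boolean hasMore(int totalHits, int startIndex, int maxResults) {
        return (long) startIndex + maxResults < totalHits;
    }
}
```
]]></source>
<p>
With five results and two results per page, a start index of 4 yields a final page holding one
result, and <code>hasMore</code> reports that no further page exists.
</p>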
<p>
Please note that in a real deployment of Lucene, it's best to instantiate <code><a
href="api/org/apache/lucene/search/IndexSearcher.html">IndexSearcher</a></code> and <code><a
href="api/org/apache/lucene/queryParser/QueryParser.html">QueryParser</a></code> once, and then
share them across search requests, instead of re-instantiating them for every search request.
An <code>IndexSearcher</code> can safely be used by several threads at once; a
<code>QueryParser</code> is not thread-safe, so guard access to a shared parser or keep one per
thread.
</p>
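<p>
One possible way to create an expensive object once and share it across requests is a small lazy
holder, sketched below. The <code>SharedInstance</code> class is hypothetical and not part of
Lucene or the demo; in a web application the factory would construct the object to be shared,
and a factory that can fail with a checked exception would need to wrap it.
</p>
<source><![CDATA[
```java
import java.util.function.Supplier;

// Hypothetical holder: builds the wrapped object on first use and then always returns that one.
final class SharedInstance<T> {
    private final Supplier<T> factory;
    private T instance;

    SharedInstance(Supplier<T> factory) {
        this.factory = factory;
    }

    // synchronized so that concurrent first requests cannot create two instances.
    synchronized T get() {
        if (instance == null) {
            instance = factory.get();
        }
        return instance;
    }
}
```
]]></source>
<p>
The holder only guarantees that a single instance is created; whether that instance may then be
used by several threads at the same time still depends on the class being shared.
</p>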
</section>
<section name="More sources (developers)">
<p>
There are additional sources used by the web app that were not specifically covered by either
walkthrough, for example the HTML parser, the <code><a
href="api/org/apache/lucene/demo/IndexHTML.html">IndexHTML</a></code> class and the <code><a
href="api/org/apache/lucene/demo/HTMLDocument.html">HTMLDocument</a></code> class. These are very
similar to the classes covered in the first example, with properties specific to parsing and
indexing HTML. They are beyond our scope; however, by now you should feel like you're "getting
started" with Lucene.
</p>
</section>
<section name="Where to go from here? (everyone!)">
<p>
There are a number of things this demo doesn't do or doesn't do quite right. For instance, you may
have noticed that documents in the root context are unreachable (unless you reconfigure Tomcat to
support that context or redirect to it). Anywhere the directory doesn't quite match the context
mapping, you'll have a broken link in your results. Indexing non-local files and other such needs
are not supported, and there may be security issues with running the indexing application from
your webapps directory. There are a number of things left for you, the developer, to do.
</p>
<p>
In time some of these things may be added to Lucene as features (if you've got a good idea we'd love
to hear it!), but for now this is where you begin and the search engine/indexer ends. Lastly, you
will probably want to follow the advice above and customize the application to look a little
more fancy than black on white with "Lucene Template" at the top. We'll see you on the Lucene
Users' or Developers' <a href="mailinglists.html">mailing lists</a>!
</p>
</section>
<section name="When to contact the Author">
<p>
Please resist the urge to contact the authors of this document (without bribes of fame and fortune
attached). First contact the <a href="mailinglists.html">mailing lists</a>, taking care to <a
href="http://www.catb.org/~esr/faqs/smart-questions.html">Ask Questions The Smart Way</a>.
You'll certainly get the most help that way as well. That being said, feedback on and modifications
to this document and the samples are ever so greatly appreciated. They are just best sent to the
lists or <a href="http://wiki.apache.org/jakarta-lucene/HowToContribute">posted as patches</a>, so
that everyone can share in them. Thanks for understanding!
</p>
</section>
</body>
</document>