lucene/docs/demo4.html

360 lines
17 KiB
HTML

<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN" "http://www.w3.org/TR/html4/loose.dtd">
<!--
Copyright 1999-2004 The Apache Software Foundation
Licensed under the Apache License, Version 2.0 (the "License");
you may not use this file except in compliance with the License.
You may obtain a copy of the License at
http://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing, software
distributed under the License is distributed on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
See the License for the specific language governing permissions and
limitations under the License.
-->
<!-- Content Stylesheet for Site -->
<!-- start the processing -->
<!-- ====================================================================== -->
<!-- GENERATED FILE, DO NOT EDIT, EDIT THE XML FILE IN xdocs INSTEAD! -->
<!-- Main Page Section -->
<!-- ====================================================================== -->
<html>
<head>
<meta http-equiv="Content-Type" content="text/html; charset=iso-8859-1"/>
<meta name="author" value="Andrew C. Oliver">
<meta name="email" value="acoliver@apache.org">
<title>Apache Lucene - Apache Lucene - Basic Demo Sources Walkthrough</title>
<link rel="stylesheet" type="text/css" href="styles/lucene.css">
</head>
<body bgcolor="#ffffff" text="#000000" link="#525D76">
<table border="0" width="100%" cellspacing="0">
<!-- TOP IMAGE -->
<tr>
<td align="left">
<a href="http://www.apache.org"><img src="http://lucene.apache.org/java/docs/images/asf-logo.gif" width="387" height="100" border="0"/></a>
</td>
<td align="right">
<a href="http://lucene.apache.org/"><img src="./images/lucene_green_300.gif" alt="Apache Lucene" border="0"/></a>
</td>
</tr>
</table>
<table border="0" width="100%" cellspacing="4">
<tr><td colspan="2">
<hr noshade="" size="1"/>
</td></tr>
<tr>
<!-- LEFT SIDE NAVIGATION -->
<td width="20%" valign="top" nowrap="true">
<!-- ============================================================ -->
<p><strong>About</strong></p>
<ul>
<li> <a href="./index.html">Overview</a>
</li>
<li> <a href="./features.html">Features</a>
</li>
<li> <a href="http://wiki.apache.org/jakarta-lucene/PoweredBy">Powered by Lucene</a>
</li>
<li> <a href="./whoweare.html">Who We Are</a>
</li>
<li> <a href="./mailinglists.html">Mailing Lists</a>
</li>
</ul>
<p><strong>Resources</strong></p>
<ul>
<li> <a href="http://wiki.apache.org/jakarta-lucene">Wiki</a>
</li>
<li> <a href="http://wiki.apache.org/jakarta-lucene/LuceneFAQ">FAQ</a>
</li>
<li> <a href="./gettingstarted.html">Getting Started</a>
</li>
<li> <a href="./queryparsersyntax.html">Query Syntax</a>
</li>
<li> <a href="./fileformats.html">File Formats</a>
</li>
<li> <a href="./scoring.html">Scoring</a>
</li>
<li> <a href="./api/index.html">Javadoc</a>
</li>
<li> <a href="./contributions.html">Contributions</a>
</li>
<li> <a href="./benchmarks.html">Benchmarks</a>
</li>
<li> <a href="http://issues.apache.org/jira/browse/LUCENE">Issue Tracker</a>
</li>
<li> <a href="./lucene-sandbox/">Lucene Sandbox</a>
</li>
</ul>
<p><strong>Download</strong></p>
<ul>
<li> <a href="http://www.apache.org/dyn/closer.cgi/lucene/java/">Releases</a>
</li>
<li> <a href="http://svn.apache.org/viewcvs.cgi/lucene/java/">Source Repository</a>
</li>
</ul>
</td>
<td width="80%" align="left" valign="top">
<table border="0" cellspacing="0" cellpadding="2" width="100%">
<tr><td bgcolor="#525D76">
<font color="#ffffff" face="arial,helvetica,sanserif">
<a name="About the Code"><strong>About the Code</strong></a>
</font>
</td></tr>
<tr><td>
<blockquote>
<p>
In this section we walk through the sources behind the basic Lucene Web Application demo: where to
find them, their parts and their function. This section is intended for Java developers wishing to
understand how to use Lucene in their applications or for those involved in deploying web
applications based on Lucene.
</p>
</blockquote>
</p>
</td></tr>
<tr><td><br/></td></tr>
</table>
<table border="0" cellspacing="0" cellpadding="2" width="100%">
<tr><td bgcolor="#525D76">
<font color="#ffffff" face="arial,helvetica,sanserif">
<a name="Location of the source (developers/deployers)"><strong>Location of the source (developers/deployers)</strong></a>
</font>
</td></tr>
<tr><td>
<blockquote>
<p>
Relative to the directory created when you extracted Lucene or retrieved it from Subversion, you
should see a directory called <code>src</code> which in turn contains a directory called
<code>jsp</code>. This is the root for all of the Lucene web demo.
</p>
<p>
Within this directory you should see <code>index.jsp</code>. Bring this up in vi or your editor of
choice.
</p>
</blockquote>
</p>
</td></tr>
<tr><td><br/></td></tr>
</table>
<table border="0" cellspacing="0" cellpadding="2" width="100%">
<tr><td bgcolor="#525D76">
<font color="#ffffff" face="arial,helvetica,sanserif">
<a name="index.jsp (developers/deployers)"><strong>index.jsp (developers/deployers)</strong></a>
</font>
</td></tr>
<tr><td>
<blockquote>
<p>
This jsp page is pretty boring by itself. All it does is include a header, display a form and
include a footer. If you look at the form, it has two fields: <code>query</code> (where you enter
your search criteria) and <code>maxresults</code> where you specify the number of results per page.
By the structure of this JSP it should be easy to customize it without even editing this particular
file. You could simply change the header and footer. Let's look at the <code>header.jsp</code>
(located in the same directory) next.
</p>
</blockquote>
</p>
</td></tr>
<tr><td><br/></td></tr>
</table>
<table border="0" cellspacing="0" cellpadding="2" width="100%">
<tr><td bgcolor="#525D76">
<font color="#ffffff" face="arial,helvetica,sanserif">
<a name="header.jsp (developers/deployers)"><strong>header.jsp (developers/deployers)</strong></a>
</font>
</td></tr>
<tr><td>
<blockquote>
<p>
The header is also very simple by itself. The only thing it does is include the
<code>configuration.jsp</code> (which you looked at in the last section of this guide) and set the
title and a brief header. This would be a good place to put your own custom HTML to "pretty" things
up a bit. We won't cover the footer because all it does is display the footer and close your tags.
Let's look at the <code>results.jsp</code>, the meat of this application, next.
</p>
</blockquote>
</p>
</td></tr>
<tr><td><br/></td></tr>
</table>
<table border="0" cellspacing="0" cellpadding="2" width="100%">
<tr><td bgcolor="#525D76">
<font color="#ffffff" face="arial,helvetica,sanserif">
<a name="results.jsp (developers)"><strong>results.jsp (developers)</strong></a>
</font>
</td></tr>
<tr><td>
<blockquote>
<p>
Most of the functionality lies in <code>results.jsp</code>. Much of it is for paging the search
results, which we'll not cover here as it's commented well enough. The first thing in this page is
the actual imports for the Lucene classes and Lucene demo classes. These classes are loaded from
the jars included in the <code>WEB-INF/lib</code> directory in the <code>luceneweb.war</code> file.
</p>
<p>
You'll notice that this file includes the same header and footer as <code>index.jsp</code>. From
there it constructs an <code><a href="api/org/apache/lucene/search/IndexSearcher.html">IndexSearcher</a></code> with the
<code>indexLocation</code> that was specified in <code>configuration.jsp</code>. If there is an
error of any kind in opening the index, it is displayed to the user and the boolean flag
<code>error</code> is set to tell the rest of the sections of the jsp not to continue.
</p>
<p>
From there, this jsp attempts to get the search criteria, the start index (used for paging) and the
maximum number of results per page. If the maximum results per page is not set or not valid then it
and the start index are set to default values. If only the start index is invalid it is set to a
default value. If the criteria isn't provided then a servlet error is thrown (it is assumed that
this is the result of url tampering or some form of browser malfunction).
</p>
<p>
The jsp moves on to construct a <code><a href="api/org/apache/lucene/analysis/standard/StandardAnalyzer.html">StandardAnalyzer</a></code> to
analyze the search text. This matches the analyzer used during indexing (<code><a href="api/org/apache/lucene/demo/IndexHTML.html">IndexHTML</a></code>), which is generally
recommended. This is passed to the <code><a href="api/org/apache/lucene/queryParser/QueryParser.html">QueryParser</a></code> along with the
criteria to construct a <code><a href="api/org/apache/lucene/search/Query.html">Query</a></code>
object. You'll also notice the string literal <code>"contents"</code> included. This specifies
that the search should cover the <code>contents</code> field and not the <code>title</code>,
<code>url</code> or some other field in the indexed documents. If there is any error in
constructing a <code><a href="api/org/apache/lucene/search/Query.html">Query</a></code> object an
error is displayed to the user.
</p>
<p>
In the next section of the jsp the <code><a href="api/org/apache/lucene/search/IndexSearcher.html">IndexSearcher</a></code> is asked to search
given the query object. The results are returned in a collection called <code>hits</code>. If the
length property of the <code>hits</code> collection is 0 (meaning there were no results) then an
error is displayed to the user and the error flag is set.
</p>
<p>
Finally the jsp iterates through the <code>hits</code> collection, taking the current page into
account, and displays properties of the <code><a href="api/org/apache/lucene/document/Document.html">Document</a></code> objects we talked about in
the first walkthrough. These objects contain "known" fields specific to their indexer (in this case
<code><a href="api/org/apache/lucene/demo/IndexHTML.html">IndexHTML</a></code> constructs a document
with "url", "title" and "contents").
</p>
<p>
Please note that in a real deployment of Lucene, it's best to instantiate <code><a href="api/org/apache/lucene/search/IndexSearcher.html">IndexSearcher</a></code> and <code><a href="api/org/apache/lucene/queryParser/QueryParser.html">QueryParser</a></code> once, and then
share them across search requests, instead of re-instantiating per search request.
</p>
</blockquote>
</p>
</td></tr>
<tr><td><br/></td></tr>
</table>
<table border="0" cellspacing="0" cellpadding="2" width="100%">
<tr><td bgcolor="#525D76">
<font color="#ffffff" face="arial,helvetica,sanserif">
<a name="More sources (developers)"><strong>More sources (developers)</strong></a>
</font>
</td></tr>
<tr><td>
<blockquote>
<p>
There are additional sources used by the web app that were not specifically covered by either
walkthrough. For example the HTML parser, the <code><a href="api/org/apache/lucene/demo/IndexHTML.html">IndexHTML</a></code> class and <code><a href="api/org/apache/lucene/demo/HTMLDocument.html">HTMLDocument</a></code> class. These are very
similar to the classes covered in the first example, with properties specific to parsing and
indexing HTML. This is beyond our scope; however, by now you should feel like you're "getting
started" with Lucene.
</p>
</blockquote>
</p>
</td></tr>
<tr><td><br/></td></tr>
</table>
<table border="0" cellspacing="0" cellpadding="2" width="100%">
<tr><td bgcolor="#525D76">
<font color="#ffffff" face="arial,helvetica,sanserif">
<a name="Where to go from here? (everyone!)"><strong>Where to go from here? (everyone!)</strong></a>
</font>
</td></tr>
<tr><td>
<blockquote>
<p>
There are a number of things this demo doesn't do or doesn't do quite right. For instance, you may
have noticed that documents in the root context are unreachable (unless you reconfigure Tomcat to
support that context or redirect to it), anywhere where the directory doesn't quite match the
context mapping, you'll have a broken link in your results. If you want to index non-local files or
have some other needs this isn't supported, plus there may be security issues with running the
indexing application from your webapps directory. There are a number of things left for you the
developer to do.
</p>
<p>
In time some of these things may be added to Lucene as features (if you've got a good idea we'd love
to hear it!), but for now: this is where you begin and the search engine/indexer ends. Lastly, one
would assume you'd want to follow the above advice and customize the application to look a little
more fancy than black on white with "Lucene Template" at the top. We'll see you on the Lucene
Users' or Developers' <a href="mailinglists.html">mailing lists</a>!
</p>
</blockquote>
</p>
</td></tr>
<tr><td><br/></td></tr>
</table>
<table border="0" cellspacing="0" cellpadding="2" width="100%">
<tr><td bgcolor="#525D76">
<font color="#ffffff" face="arial,helvetica,sanserif">
<a name="When to contact the Author"><strong>When to contact the Author</strong></a>
</font>
</td></tr>
<tr><td>
<blockquote>
<p>
Please resist the urge to contact the authors of this document (without bribes of fame and fortune
attached). First contact the <a href="mailinglists.html">mailing lists</a>, taking care to <a href="http://www.catb.org/~esr/faqs/smart-questions.html">Ask Questions The Smart Way</a>.
Certainly you'll get the most help that way as well. That being said, feedback, and modifications
to this document and samples are ever so greatly appreciated. They are just best sent to the lists
or <a href="http://wiki.apache.org/jakarta-lucene/HowToContribute">posted as patches</a>, so that
everyone can share in them. Thanks for understanding!
</p>
</blockquote>
</p>
</td></tr>
<tr><td><br/></td></tr>
</table>
</td>
</tr>
<!-- FOOTER -->
<tr><td colspan="2">
<hr noshade="" size="1"/>
</td></tr>
<tr><td colspan="2">
<div align="center"><font color="#525D76" size="-1"><em>
Copyright &#169; 1999-2005, The Apache Software Foundation
</em></font></div>
</td></tr>
</table>
</body>
</html>
<!-- end the processing -->