<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN" "http://www.w3.org/TR/html4/loose.dtd">

<!--
Copyright 1999-2004 The Apache Software Foundation

Licensed under the Apache License, Version 2.0 (the "License");
you may not use this file except in compliance with the License.
You may obtain a copy of the License at

http://www.apache.org/licenses/LICENSE-2.0

Unless required by applicable law or agreed to in writing, software
distributed under the License is distributed on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
See the License for the specific language governing permissions and
limitations under the License.
-->

<!-- Content Stylesheet for Site -->

<!-- start the processing -->
<!-- ====================================================================== -->
<!-- GENERATED FILE, DO NOT EDIT, EDIT THE XML FILE IN xdocs INSTEAD! -->
<!-- Main Page Section -->
<!-- ====================================================================== -->
<html>
<head>
<meta http-equiv="Content-Type" content="text/html; charset=iso-8859-1"/>

<meta name="author" value="Andrew C. Oliver">
<meta name="email" value="acoliver@apache.org">

<title>Apache Lucene - Apache Lucene - Basic Demo Sources Walkthrough</title>
</head>

<body bgcolor="#ffffff" text="#000000" link="#525D76">
<table border="0" width="100%" cellspacing="0">
<!-- TOP IMAGE -->
<tr>
<td align="left">
<a href="http://www.apache.org"><img src="http://lucene.apache.org/java/docs/images/asf-logo.gif" width="387" height="100" border="0"/></a>
</td>
<td align="right">
<a href="http://lucene.apache.org/"><img src="./images/lucene_green_300.gif" alt="Apache Lucene" border="0"/></a>
</td>
</tr>
</table>
<table border="0" width="100%" cellspacing="4">
<tr><td colspan="2">
<hr noshade="" size="1"/>
</td></tr>

<tr>
<!-- LEFT SIDE NAVIGATION -->
<td width="20%" valign="top" nowrap="true">

<!-- ============================================================ -->

<p><strong>About</strong></p>
<ul>
<li> <a href="./index.html">Overview</a>
</li>
<li> <a href="./features.html">Features</a>
</li>
<li> <a href="http://wiki.apache.org/jakarta-lucene/PoweredBy">Powered by Lucene</a>
</li>
<li> <a href="./whoweare.html">Who We Are</a>
</li>
<li> <a href="./mailinglists.html">Mailing Lists</a>
</li>
</ul>
<p><strong>Resources</strong></p>
<ul>
<li> <a href="http://wiki.apache.org/jakarta-lucene">Wiki</a>
</li>
<li> <a href="http://wiki.apache.org/jakarta-lucene/LuceneFAQ">FAQ</a>
</li>
<li> <a href="./gettingstarted.html">Getting Started</a>
</li>
<li> <a href="./queryparsersyntax.html">Query Syntax</a>
</li>
<li> <a href="./fileformats.html">File Formats</a>
</li>
<li> <a href="./api/index.html">Javadoc</a>
</li>
<li> <a href="./contributions.html">Contributions</a>
</li>
<li> <a href="./benchmarks.html">Benchmarks</a>
</li>
<li> <a href="http://issues.apache.org/jira/browse/LUCENE">Issue Tracker</a>
</li>
<li> <a href="./lucene-sandbox/">Lucene Sandbox</a>
</li>
</ul>
<p><strong>Download</strong></p>
<ul>
<li> <a href="http://www.apache.org/dyn/closer.cgi/lucene/java/">Releases</a>
</li>
<li> <a href="http://svn.apache.org/viewcvs.cgi/lucene/java/">Source Repository</a>
</li>
</ul>
</td>
<td width="80%" align="left" valign="top">
<table border="0" cellspacing="0" cellpadding="2" width="100%">
<tr><td bgcolor="#525D76">
<font color="#ffffff" face="arial,helvetica,sanserif">
<a name="About the Code"><strong>About the Code</strong></a>
</font>
</td></tr>
<tr><td>
<blockquote>
<p>
In this section we walk through the sources behind the basic Lucene Web Application demo:
where to find them, what their parts are, and what each part does. This section is intended for
Java developers who want to understand how to use Apache Lucene in their applications, and for
those involved in deploying web applications based on Lucene.
</p>
</blockquote>
</td></tr>
<tr><td><br/></td></tr>
</table>
<table border="0" cellspacing="0" cellpadding="2" width="100%">
<tr><td bgcolor="#525D76">
<font color="#ffffff" face="arial,helvetica,sanserif">
<a name="Location of the source (developers/deployers)"><strong>Location of the source (developers/deployers)</strong></a>
</font>
</td></tr>
<tr><td>
<blockquote>
<p>
Relative to the directory created when you extracted Lucene or retrieved it from Subversion, you
should see a directory called "src", which in turn contains a directory called "jsp".
This is the root of the Lucene web demo.
</p>
<p>
Within this directory you should see the index.jsp file. Open it in vi or your
editor of choice.
</p>
</blockquote>
</td></tr>
<tr><td><br/></td></tr>
</table>
<table border="0" cellspacing="0" cellpadding="2" width="100%">
<tr><td bgcolor="#525D76">
<font color="#ffffff" face="arial,helvetica,sanserif">
<a name="index.jsp (developers/deployers)"><strong>index.jsp (developers/deployers)</strong></a>
</font>
</td></tr>
<tr><td>
<blockquote>
<p>
This JSP page is pretty boring by itself. All it does is include a header, display a form and
include a footer. The form has two fields: query (where you enter your search criteria) and
maxresults (where you specify the number of results per page). If you look at the form tag,
you'll notice it uses the get method rather than post. Get is the appropriate method for a
search, because the request only reads data, and it puts the criteria in the URL so that
searches can be bookmarked. The structure of this JSP makes it easy to customize without even
editing this particular file: you can simply change the header and footer.
Let's look at header.jsp (located in the same directory) next.
</p>
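<p>
To illustrate why a get form makes searches bookmarkable, the following minimal Java sketch builds the kind of
URL a browser would request when the form is submitted. The field names query and maxresults come from the form
described above. The class name, the method name and the assumption that the form targets results.jsp are
hypothetical and serve only as illustration.
</p>

```java
import java.io.UnsupportedEncodingException;
import java.net.URLEncoder;

/** Illustrative sketch only; not part of the Lucene demo sources. */
final class SearchUrlSketch {

    /**
     * Builds the URL that a get form with "query" and "maxresults" fields
     * would produce. The page name is passed in because the form's real
     * target is an assumption here.
     */
    static String buildSearchUrl(String page, String query, int maxResults) {
        try {
            // URLEncoder applies form encoding: spaces become '+',
            // reserved characters such as '&' become %XX escapes.
            return page + "?query=" + URLEncoder.encode(query, "UTF-8")
                    + "&maxresults=" + maxResults;
        } catch (UnsupportedEncodingException e) {
            // Every Java platform is required to support UTF-8.
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        // prints results.jsp?query=apache+lucene&maxresults=10
        System.out.println(buildSearchUrl("results.jsp", "apache lucene", 10));
    }
}
```

<p>
Because the whole search is captured in the URL, a user can bookmark it or send it to someone else, which
would not work with a post form.
</p>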
</blockquote>
</td></tr>
<tr><td><br/></td></tr>
</table>
<table border="0" cellspacing="0" cellpadding="2" width="100%">
<tr><td bgcolor="#525D76">
<font color="#ffffff" face="arial,helvetica,sanserif">
<a name="header.jsp (developers/deployers)"><strong>header.jsp (developers/deployers)</strong></a>
</font>
</td></tr>
<tr><td>
<blockquote>
<p>
The header is also very simple by itself. The only thing it does is include configuration.jsp
(which you looked at in the last section of this guide) and set the title and a brief header. This
would be a good place to put your own custom HTML to "pretty" things up a bit. We won't cover the
footer, because all it does is display the footer and close your tags. Next, let's look at results.jsp,
the meat of this application.
</p>
</blockquote>
</td></tr>
<tr><td><br/></td></tr>
</table>
<table border="0" cellspacing="0" cellpadding="2" width="100%">
<tr><td bgcolor="#525D76">
<font color="#ffffff" face="arial,helvetica,sanserif">
<a name="results.jsp (developers)"><strong>results.jsp (developers)</strong></a>
</font>
</td></tr>
<tr><td>
<blockquote>
<p>
results.jsp has a lot more functionality. Much of it is for paging the search results, which we won't
cover here because the code is commented well enough. It does not perform any optimizations, such as caching
results, because that would make this a more complex example. The first thing in this page is the imports
for the Lucene classes and the Lucene demo classes. These classes are loaded from the jars in the
WEB-INF/lib directory of the final war file.
</p>
<p>
You'll notice that this file includes the same header and footer as index.jsp. From there the JSP
constructs an IndexSearcher with the "indexLocation" that was specified in configuration.jsp. If there
is an error of any kind in opening the index, it is displayed to the user and a boolean flag is set to tell
the remaining sections of the JSP not to continue.
</p>
<p>
Next, the JSP attempts to get the search criteria, the start index (used for paging) and the maximum
number of results per page. If the maximum number of results per page is not set or is not valid, then both it
and the start index are set to default values. If only the start index is invalid, only it is set to a default
value. If the criteria are not provided, a servlet error is thrown (this is assumed to be the result of URL
tampering or some form of browser malfunction).
</p>
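<p>
The defaulting rules above can be sketched in plain Java as follows. This is an illustration under stated
assumptions, not the code in results.jsp: the class and method names and the default values 0 and 50 are
hypothetical, a value of zero results per page is treated as invalid, and an IllegalArgumentException stands in
for the servlet error.
</p>

```java
/** Illustrative sketch of the parameter rules; not the demo's actual code. */
final class PagingParamsSketch {
    // Hypothetical defaults; the values used by the demo may differ.
    static final int DEFAULT_START_INDEX = 0;
    static final int DEFAULT_MAX_RESULTS = 50;

    final String query;
    final int startIndex;
    final int maxResults;

    private PagingParamsSketch(String query, int startIndex, int maxResults) {
        this.query = query;
        this.startIndex = startIndex;
        this.maxResults = maxResults;
    }

    /** Returns the parsed value, or -1 if it is missing, not a number, or negative. */
    static int parseOrInvalid(String raw) {
        if (raw == null) {
            return -1;
        }
        try {
            int value = Integer.parseInt(raw.trim());
            return value >= 0 ? value : -1;
        } catch (NumberFormatException e) {
            return -1;
        }
    }

    static PagingParamsSketch fromRequest(String query, String rawStartIndex, String rawMaxResults) {
        if (query == null) {
            // The JSP throws a servlet error here; a plain exception stands in for it.
            throw new IllegalArgumentException("no query specified");
        }
        int startIndex = parseOrInvalid(rawStartIndex);
        int maxResults = parseOrInvalid(rawMaxResults);
        if (maxResults <= 0) {
            // Invalid page size: reset both the page size and the start index.
            maxResults = DEFAULT_MAX_RESULTS;
            startIndex = DEFAULT_START_INDEX;
        } else if (startIndex < 0) {
            // Only the start index is invalid: reset it alone.
            startIndex = DEFAULT_START_INDEX;
        }
        return new PagingParamsSketch(query, startIndex, maxResults);
    }

    public static void main(String[] args) {
        PagingParamsSketch p = fromRequest("lucene", "abc", "10");
        System.out.println(p.startIndex + " " + p.maxResults); // prints 0 10
    }
}
```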
<p>
The JSP then constructs a StandardAnalyzer, just as in the simple demo, to analyze the search criteria. The
analyzer is passed to the QueryParser along with the criteria to construct a Query object. You'll also notice the
string literal "contents". It specifies that the search should cover the contents field and not
the title, url or some other field of the indexed documents. If there is any error in constructing the Query
object, an error is displayed to the user.
</p>
<p>
In the next section of the JSP, the IndexSearcher is asked to search with the Query object. The results are
returned in a collection called "hits". If the length of the hits collection is 0, an error
is displayed to the user and the error flag is set.
</p>
<p>
Finally, the JSP iterates through the hits collection and displays properties of the "Document" objects we talked
about in the first walkthrough. These objects contain "known" fields specific to their indexer (in this case
"IndexHTML" constructs a document with "url", "title" and "contents"). You'll notice that these results are paged
but the search is repeated every time a page is requested. This is an area where optimization, such as caching
the results, could improve performance for large result sets.
</p>
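<p>
The paging step can be sketched as selecting a window from the full result list. In this minimal example a plain
list of strings stands in for the hits collection, and the class and method names are hypothetical. It shows only
the index arithmetic, not the Lucene calls made by results.jsp.
</p>

```java
import java.util.Arrays;
import java.util.List;

/** Illustrative sketch of result paging; a List stands in for the hits collection. */
final class PageWindowSketch {

    /**
     * Returns the slice of allHits that belongs on the page starting at
     * startIndex and holding at most maxResults entries.
     */
    static List<String> pageOf(List<String> allHits, int startIndex, int maxResults) {
        // Clamp the start so that an index past the end yields an empty page.
        int from = Math.min(Math.max(startIndex, 0), allHits.size());
        // Use long arithmetic so that a huge page size cannot overflow.
        long wanted = (long) from + Math.max(maxResults, 0);
        int to = (int) Math.min((long) allHits.size(), wanted);
        return allHits.subList(from, to);
    }

    public static void main(String[] args) {
        List<String> titles = Arrays.asList("a.html", "b.html", "c.html", "d.html", "e.html");
        System.out.println(pageOf(titles, 2, 2)); // prints [c.html, d.html]
    }
}
```

<p>
If the full result list were kept between requests (for example in the user's session), each page request could
reuse it instead of repeating the search. Whether that trade-off is worthwhile depends on the size of the result
sets and on how much memory can be spared.
</p>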
</blockquote>
</td></tr>
<tr><td><br/></td></tr>
</table>
<table border="0" cellspacing="0" cellpadding="2" width="100%">
<tr><td bgcolor="#525D76">
<font color="#ffffff" face="arial,helvetica,sanserif">
<a name="More sources (developers)"><strong>More sources (developers)</strong></a>
</font>
</td></tr>
<tr><td>
<blockquote>
<p>
There are additional sources used by the web app that were not specifically covered by either walkthrough, for
example the HTML parser, the IndexHTML class and the HTMLDocument class. These are very similar to the classes
covered in the first example, but they have properties specific to parsing and indexing HTML. This is
beyond our scope; however, by now you should feel like you're "getting started" with Lucene.
</p>
</blockquote>
</td></tr>
<tr><td><br/></td></tr>
</table>
<table border="0" cellspacing="0" cellpadding="2" width="100%">
<tr><td bgcolor="#525D76">
<font color="#ffffff" face="arial,helvetica,sanserif">
<a name="Where to go from here? (Everyone!)"><strong>Where to go from here? (Everyone!)</strong></a>
</font>
</td></tr>
<tr><td>
<blockquote>
<p>
There are a number of things this demo doesn't do or doesn't do quite right. For instance, you may
have noticed that documents in the root context are unreachable (unless you reconfigure Tomcat to
support that context or redirect to it). Anywhere the directory doesn't quite match the context mapping,
you'll have a broken link in your results. Indexing non-local files, or other such needs, isn't
supported. There may also be security issues with running the indexing application from
your webapps directory. There are a number of things left for you, the implementor or developer, to do.
</p>
<p>
In time some of these things may be added to Lucene as features (if you've got a good idea, we'd love to hear it!),
but for now this is where you begin and the search engine/indexer ends. Lastly, one would assume you'd
want to follow the above advice and customize the application to look a little fancier than black on
white with "Lucene Template" at the top. We'll see you on the Lucene Users' or Developers' mailing lists!
</p>
</blockquote>
</td></tr>
<tr><td><br/></td></tr>
</table>
<table border="0" cellspacing="0" cellpadding="2" width="100%">
<tr><td bgcolor="#525D76">
<font color="#ffffff" face="arial,helvetica,sanserif">
<a name="When to contact the Author"><strong>When to contact the Author</strong></a>
</font>
</td></tr>
<tr><td>
<blockquote>
<p>
Please resist the urge to contact the authors of this document (without bribes of fame and fortune attached). First
contact the <a href="http://lucene.apache.org/java/docs/mailinglists.html">mailing lists</a>. That being said,
feedback on and modifications to this document and samples are ever so greatly appreciated. They are just best
sent to the lists so that everyone can share in them. Certainly you'll get the most help there as well.
Thanks for understanding.
</p>
</blockquote>
</td></tr>
<tr><td><br/></td></tr>
</table>
</td>
</tr>

<!-- FOOTER -->
<tr><td colspan="2">
<hr noshade="" size="1"/>
</td></tr>
<tr><td colspan="2">
<div align="center"><font color="#525D76" size="-1"><em>
Copyright © 1999-2005, The Apache Software Foundation
</em></font></div>
</td></tr>
</table>
</body>
</html>
<!-- end the processing -->