<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN" "http://www.w3.org/TR/html4/loose.dtd">
<!--
Licensed to the Apache Software Foundation (ASF) under one or more
contributor license agreements. See the NOTICE file distributed with
this work for additional information regarding copyright ownership.
The ASF licenses this file to You under the Apache License, Version 2.0
(the "License"); you may not use this file except in compliance with
the License. You may obtain a copy of the License at
http://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing, software
distributed under the License is distributed on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
See the License for the specific language governing permissions and
limitations under the License.
-->
<!-- Content Stylesheet for Site -->
<!-- start the processing -->
<!-- ====================================================================== -->
<!-- GENERATED FILE, DO NOT EDIT, EDIT THE XML FILE IN xdocs INSTEAD! -->
<!-- Main Page Section -->
<!-- ====================================================================== -->
<html>
<head>
<meta http-equiv="Content-Type" content="text/html; charset=iso-8859-1"/>
<meta name="author" value="Andrew C. Oliver">
<meta name="email" value="acoliver@apache.org">
<title>Apache Lucene - Basic Demo Sources Walk-through</title>
<link rel="stylesheet" type="text/css" href="styles/lucene.css">
</head>
<body bgcolor="#ffffff" text="#000000" link="#525D76">
<table border="0" width="100%" cellspacing="0">
<!-- TOP IMAGE -->
<tr>
<td align="left">
<a href="http://www.apache.org"><img src="http://lucene.apache.org/java/docs/images/asf-logo.gif" width="387" height="100" border="0"/></a>
</td>
<td align="right">
<a href="http://lucene.apache.org/"><img src="./images/lucene_green_300.gif" alt="Apache Lucene" border="0"/></a>
</td>
</tr>
</table>
<table border="0" width="100%" cellspacing="4">
<tr><td colspan="2">
<hr noshade="" size="1"/>
</td></tr>
<tr>
<!-- LEFT SIDE NAVIGATION -->
<td width="20%" valign="top" nowrap="true">
<!-- ============================================================ -->
<p><strong>About</strong></p>
<ul>
<li> <a href="./index.html">Overview</a>
</li>
<li> <a href="./features.html">Features</a>
</li>
<li> <a href="http://wiki.apache.org/jakarta-lucene/PoweredBy">Powered by Lucene</a>
</li>
<li> <a href="./whoweare.html">Who We Are</a>
</li>
<li> <a href="./mailinglists.html">Mailing Lists</a>
</li>
</ul>
<p><strong>Resources</strong></p>
<ul>
<li> <a href="http://wiki.apache.org/jakarta-lucene">Wiki</a>
</li>
<li> <a href="http://wiki.apache.org/jakarta-lucene/LuceneFAQ">FAQ</a>
</li>
<li> <a href="./gettingstarted.html">Getting Started</a>
</li>
<li> <a href="./queryparsersyntax.html">Query Syntax</a>
</li>
<li> <a href="./fileformats.html">File Formats</a>
</li>
<li> <a href="./scoring.html">Scoring</a>
</li>
<li> <a href="./api/index.html">Javadoc</a>
</li>
<li> <a href="./contributions.html">Contributions</a>
</li>
<li> <a href="./benchmarks.html">Benchmarks</a>
</li>
<li> <a href="http://issues.apache.org/jira/browse/LUCENE">Issue Tracker</a>
</li>
<li> <a href="./lucene-sandbox/">Lucene Sandbox</a>
</li>
</ul>
<p><strong>Download</strong></p>
<ul>
<li> <a href="http://www.apache.org/dyn/closer.cgi/lucene/java/">Releases</a>
</li>
<li> <a href="http://svn.apache.org/viewcvs.cgi/lucene/java/">Source Repository</a>
</li>
</ul>
</td>
<td width="80%" align="left" valign="top">
<table border="0" cellspacing="0" cellpadding="2" width="100%">
<tr><td bgcolor="#525D76">
<font color="#ffffff" face="arial,helvetica,sanserif">
<a name="About the Code"><strong>About the Code</strong></a>
</font>
</td></tr>
<tr><td>
<blockquote>
<p>
In this section we walk through the sources behind the command-line Lucene demo: where to find them,
their parts and their function. This section is intended for Java developers wishing to understand
how to use Lucene in their applications.
</p>
</blockquote>
</p>
</td></tr>
<tr><td><br/></td></tr>
</table>
<table border="0" cellspacing="0" cellpadding="2" width="100%">
<tr><td bgcolor="#525D76">
<font color="#ffffff" face="arial,helvetica,sanserif">
<a name="Location of the source"><strong>Location of the source</strong></a>
</font>
</td></tr>
<tr><td>
<blockquote>
<p>
Relative to the directory created when you extracted Lucene or retrieved it from Subversion, you
should see a directory called <code>src</code> which in turn contains a directory called
<code>demo</code>. This is the root for all of the Lucene demos. Under this directory is
<code>org/apache/lucene/demo</code>. This is where all the Java sources for the demos live.
</p>
<p>
Within this directory you should see the <code>IndexFiles.java</code> class we executed earlier.
Bring it up in <code>vi</code> or your editor of choice and let's take a look at it.
</p>
</blockquote>
</p>
</td></tr>
<tr><td><br/></td></tr>
</table>
<table border="0" cellspacing="0" cellpadding="2" width="100%">
<tr><td bgcolor="#525D76">
<font color="#ffffff" face="arial,helvetica,sanserif">
<a name="IndexFiles"><strong>IndexFiles</strong></a>
</font>
</td></tr>
<tr><td>
<blockquote>
<p>
As we discussed in the previous walk-through, the <code><a href="api/org/apache/lucene/demo/IndexFiles.html">IndexFiles</a></code> class creates a Lucene
Index. Let's take a look at how it does this.
</p>
<p>
The first substantial thing the <code>main</code> function does is instantiate <code><a href="api/org/apache/lucene/index/IndexWriter.html">IndexWriter</a></code>. It passes the string
"<code>index</code>" and a new instance of a class called <code><a href="api/org/apache/lucene/analysis/standard/StandardAnalyzer.html">StandardAnalyzer</a></code>.
The "<code>index</code>" string is the name of the filesystem directory where all index information
should be stored. Because we're not passing a full path, this will be created as a subdirectory of
the current working directory (if it does not already exist). On some platforms, it may be created
in other directories (such as the user's home directory).
</p>
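<p>
To see where a relative name such as "<code>index</code>" would end up, the following plain-Java
sketch resolves it against the current working directory. It does not use Lucene at all; the class
name <code>RelativeIndexPath</code> and the method <code>resolve</code> are hypothetical names
chosen for this illustration.
</p>

```java
import java.io.File;

// Illustration only: shows how the JVM resolves a relative directory name.
// RelativeIndexPath and resolve() are hypothetical names, not part of the demo.
class RelativeIndexPath {
    // Returns the absolute path that a relative name maps to for this JVM process.
    static String resolve(String name) {
        return new File(name).getAbsolutePath();
    }

    public static void main(String[] args) {
        // Prints the directory an index named "index" would be created in.
        System.out.println(resolve("index"));
    }
}
```

<p>
Running this from the directory where you start the demo shows the location you should expect the
index to appear in.
</p>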
<p>
The <code><a href="api/org/apache/lucene/index/IndexWriter.html">IndexWriter</a></code> is the main
class responsible for creating indices. To use it you must instantiate it with a path that it can
write the index into. If this path does not exist, it is created first; otherwise the index at
that path is refreshed. You can also create an index using one of the subclasses of <code><a href="api/org/apache/lucene/store/Directory.html">Directory</a></code>. In any case, you must also pass an
instance of <code><a href="api/org/apache/lucene/analysis/Analyzer.html">org.apache.lucene.analysis.Analyzer</a></code>.
</p>
<p>
The particular <code><a href="api/org/apache/lucene/analysis/Analyzer.html">Analyzer</a></code> we
are using, <code><a href="api/org/apache/lucene/analysis/standard/StandardAnalyzer.html">StandardAnalyzer</a></code>, is
little more than a standard Java Tokenizer that converts all strings to lowercase and filters
useless words and characters out of the index. By useless words and characters we mean common
words such as articles (a, an, the, etc.), often called stop words, and other strings that would be
useless for searching (e.g. <b>'s</b>). Note that the rules differ from language to language, and you
should use the proper analyzer for each. Lucene currently provides Analyzers for a number of
different languages (see the <code>*Analyzer.java</code> sources under <a href="http://svn.apache.org/repos/asf/lucene/java/trunk/contrib/analyzers/src/java/org/apache/lucene/analysis/">contrib/analyzers/src/java/org/apache/lucene/analysis</a>).
</p>
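<p>
The kind of processing described above can be sketched in plain Java. This is not the
<code>StandardAnalyzer</code> implementation, only a rough illustration of the same idea: split
text into tokens, lowercase them, strip a trailing <b>'s</b> and drop stop words. The class
<code>ToyAnalyzer</code> and its short stop-word list are assumptions made for this example.
</p>

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Locale;
import java.util.Set;

// Illustration only: NOT Lucene's StandardAnalyzer, just the same idea in plain Java.
class ToyAnalyzer {
    // Hypothetical stop-word list chosen for this sketch.
    private static final Set<String> STOP_WORDS =
            new HashSet<String>(Arrays.asList("a", "an", "the", "and", "of", "to"));

    // Splits on anything that is not a letter, digit or apostrophe, lowercases each
    // token, removes a trailing 's and any other apostrophes, and drops stop words.
    static List<String> analyze(String text) {
        List<String> tokens = new ArrayList<String>();
        for (String raw : text.split("[^\\p{L}\\p{N}']+")) {
            String token = raw.toLowerCase(Locale.ROOT);
            if (token.endsWith("'s")) {
                token = token.substring(0, token.length() - 2);
            }
            token = token.replace("'", "");
            if (!token.isEmpty() && !STOP_WORDS.contains(token)) {
                tokens.add(token);
            }
        }
        return tokens;
    }

    public static void main(String[] args) {
        // Prints [quick, fox, den, burrow]
        System.out.println(analyze("The Quick fox's den and a burrow"));
    }
}
```

<p>
A stop-word list like this one is specific to English, which is why a different analyzer is
needed for each language.
</p>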
<p>
Looking further down in the file, you should see the <code>indexDocs()</code> code. This recursive
function simply crawls the directories and uses <code><a href="api/org/apache/lucene/demo/FileDocument.html">FileDocument</a></code> to create <code><a href="api/org/apache/lucene/document/Document.html">Document</a></code> objects. The <code><a href="api/org/apache/lucene/document/Document.html">Document</a></code> is simply a data object that
represents the content of the file as well as its last-modified time and location. These instances are
added to the <code>indexWriter</code>. Take a look inside <code><a href="api/org/apache/lucene/demo/FileDocument.html">FileDocument</a></code>. It's not particularly
complicated. It just adds fields to the <code><a href="api/org/apache/lucene/document/Document.html">Document</a></code>.
</p>
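<p>
The shape of such a recursive crawl can be sketched without Lucene. In the sketch below a
<code>Map</code> of field names to values stands in for a <code>Document</code>, and a
<code>List</code> stands in for the writer. The class name <code>ToyIndexDocs</code> and the field
names used here are assumptions for illustration; consult the demo sources for the real ones.
</p>

```java
import java.io.File;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Illustration only: a plain-Java stand-in for the crawl, not the demo's code.
class ToyIndexDocs {
    // Stand-in for FileDocument: records where the file is and when it last changed.
    // The field names "path" and "modified" are assumptions made for this sketch.
    static Map<String, String> fileDocument(File f) {
        Map<String, String> doc = new LinkedHashMap<String, String>();
        doc.put("path", f.getPath());
        doc.put("modified", Long.toString(f.lastModified()));
        return doc;
    }

    // Recursively walks directories and adds one "document" per regular file.
    static void indexDocs(List<Map<String, String>> writer, File file) {
        if (!file.canRead()) {
            return; // skip anything we cannot read
        }
        if (file.isDirectory()) {
            String[] names = file.list();
            if (names != null) { // list() returns null on an I/O error
                Arrays.sort(names); // deterministic order for the example
                for (String name : names) {
                    indexDocs(writer, new File(file, name));
                }
            }
        } else {
            writer.add(fileDocument(file));
        }
    }
}
```

<p>
Calling <code>indexDocs(list, new File("docs"))</code> on a directory tree fills the list with one
entry per readable file, which mirrors how each file becomes one document in the index.
</p>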
<p>
As you can see, there isn't much to creating an index; the devil is in the details. You may also
wish to examine the other samples in this directory, particularly the <code><a href="api/org/apache/lucene/demo/IndexHTML.html">IndexHTML</a></code> class. It is a bit more
complex but builds upon this example.
</p>
</blockquote>
</p>
</td></tr>
<tr><td><br/></td></tr>
</table>
<table border="0" cellspacing="0" cellpadding="2" width="100%">
<tr><td bgcolor="#525D76">
<font color="#ffffff" face="arial,helvetica,sanserif">
<a name="Searching Files"><strong>Searching Files</strong></a>
</font>
</td></tr>
<tr><td>
<blockquote>
<p>
The <code><a href="api/org/apache/lucene/demo/SearchFiles.html">SearchFiles</a></code> class is
quite simple. It primarily collaborates with an <code><a href="api/org/apache/lucene/search/IndexSearcher.html">IndexSearcher</a></code>, <code><a href="api/org/apache/lucene/analysis/standard/StandardAnalyzer.html">StandardAnalyzer</a></code>
(which is used in the <code><a href="api/org/apache/lucene/demo/IndexFiles.html">IndexFiles</a></code> class as well) and a
<code><a href="api/org/apache/lucene/queryParser/QueryParser.html">QueryParser</a></code>. The
query parser is constructed with an analyzer so that your query text is interpreted in the same way
the documents are: finding word boundaries and removing useless words like 'a', 'an' and
'the'. The <code><a href="api/org/apache/lucene/search/Query.html">Query</a></code> object holds
the output of the <code><a href="api/org/apache/lucene/queryParser/QueryParser.html">QueryParser</a></code> and is passed to
the searcher. Note that it's also possible to programmatically construct a rich <code><a href="api/org/apache/lucene/search/Query.html">Query</a></code> object without using the query
parser. The query parser just enables decoding the <a href="queryparsersyntax.html">Lucene query
syntax</a> into the corresponding <code><a href="api/org/apache/lucene/search/Query.html">Query</a></code> object. The searcher returns its results
in a collection of Documents called <code><a href="api/org/apache/lucene/search/Hits.html">Hits</a></code>, which is then iterated through and
displayed to the user.
</p>
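<p>
Why the query must be analyzed the same way as the documents can be illustrated with a toy
inverted index in plain Java. This is a sketch under simplifying assumptions, not how
<code>IndexSearcher</code> or <code>QueryParser</code> work internally: the class
<code>ToySearch</code>, its tiny stop-word rule and its match-any-term behaviour are all
hypothetical choices made for the example.
</p>

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

// Illustration only: a toy inverted index, not Lucene's search implementation.
class ToySearch {
    private final Map<String, Set<Integer>> postings = new HashMap<String, Set<Integer>>();
    private final List<String> docs = new ArrayList<String>();

    // Shared analysis step: lowercase, split on non-alphanumerics, drop 'a', 'an', 'the'.
    static List<String> analyze(String text) {
        List<String> terms = new ArrayList<String>();
        for (String raw : text.toLowerCase(Locale.ROOT).split("[^a-z0-9]+")) {
            if (!raw.isEmpty() && !raw.equals("a") && !raw.equals("an") && !raw.equals("the")) {
                terms.add(raw);
            }
        }
        return terms;
    }

    // Indexing: record, for every analyzed term, which documents contain it.
    void add(String text) {
        int id = docs.size();
        docs.add(text);
        for (String term : analyze(text)) {
            Set<Integer> ids = postings.get(term);
            if (ids == null) {
                ids = new TreeSet<Integer>();
                postings.put(term, ids);
            }
            ids.add(id);
        }
    }

    // Searching: analyze the query with the SAME function, then collect every
    // document that contains at least one of the query terms.
    List<String> search(String queryText) {
        Set<Integer> ids = new TreeSet<Integer>();
        for (String term : analyze(queryText)) {
            Set<Integer> matches = postings.get(term);
            if (matches != null) {
                ids.addAll(matches);
            }
        }
        List<String> hits = new ArrayList<String>();
        for (int id : ids) {
            hits.add(docs.get(id));
        }
        return hits;
    }
}
```

<p>
Because both sides go through <code>analyze</code>, a query such as "the LICENSE" matches a
document containing "License". If the query were not lowercased, the lookup would miss the
indexed term.
</p>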
</blockquote>
</p>
</td></tr>
<tr><td><br/></td></tr>
</table>
<table border="0" cellspacing="0" cellpadding="2" width="100%">
<tr><td bgcolor="#525D76">
<font color="#ffffff" face="arial,helvetica,sanserif">
<a name="The Web example..."><strong>The Web example...</strong></a>
</font>
</td></tr>
<tr><td>
<blockquote>
<p>
<a href="demo3.html">read on&gt;&gt;&gt;</a>
</p>
</blockquote>
</p>
</td></tr>
<tr><td><br/></td></tr>
</table>
</td>
</tr>
<!-- FOOTER -->
<tr><td colspan="2">
<hr noshade="" size="1"/>
</td></tr>
<tr><td colspan="2">
<div align="center"><font color="#525D76" size="-1"><em>
Copyright &#169; 1999-2005, The Apache Software Foundation
</em></font></div>
</td></tr>
</table>
</body>
</html>
<!-- end the processing -->