<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN" "http://www.w3.org/TR/html4/loose.dtd">

<!--
Copyright 1999-2004 The Apache Software Foundation
Licensed under the Apache License, Version 2.0 (the "License");
you may not use this file except in compliance with the License.
You may obtain a copy of the License at

http://www.apache.org/licenses/LICENSE-2.0

Unless required by applicable law or agreed to in writing, software
distributed under the License is distributed on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
See the License for the specific language governing permissions and
limitations under the License.
-->

<!-- Content Stylesheet for Site -->

<!-- start the processing -->
<!-- ====================================================================== -->
<!-- GENERATED FILE, DO NOT EDIT, EDIT THE XML FILE IN xdocs INSTEAD! -->
<!-- Main Page Section -->
<!-- ====================================================================== -->
<html>
<head>
<meta http-equiv="Content-Type" content="text/html; charset=iso-8859-1"/>

<meta name="author" value="Andrew C. Oliver">
<meta name="email" value="acoliver@apache.org">

<title>Apache Lucene - Apache Lucene - Basic Demo Sources Walk-through</title>
<link rel="stylesheet" type="text/css" href="styles/lucene.css">
</head>

<body bgcolor="#ffffff" text="#000000" link="#525D76">
<table border="0" width="100%" cellspacing="0">
<!-- TOP IMAGE -->
<tr>
<td align="left">
<a href="http://www.apache.org"><img src="http://lucene.apache.org/java/docs/images/asf-logo.gif" width="387" height="100" border="0"/></a>
</td>
<td align="right">
<a href="http://lucene.apache.org/"><img src="./images/lucene_green_300.gif" alt="Apache Lucene" border="0"/></a>
</td>
</tr>
</table>
<table border="0" width="100%" cellspacing="4">
<tr><td colspan="2">
<hr noshade="" size="1"/>
</td></tr>

<tr>
<!-- LEFT SIDE NAVIGATION -->
<td width="20%" valign="top" nowrap="true">

<!-- ============================================================ -->

<p><strong>About</strong></p>
<ul>
<li> <a href="./index.html">Overview</a>
</li>
<li> <a href="./features.html">Features</a>
</li>
<li> <a href="http://wiki.apache.org/jakarta-lucene/PoweredBy">Powered by Lucene</a>
</li>
<li> <a href="./whoweare.html">Who We Are</a>
</li>
<li> <a href="./mailinglists.html">Mailing Lists</a>
</li>
</ul>
<p><strong>Resources</strong></p>
<ul>
<li> <a href="http://wiki.apache.org/jakarta-lucene">Wiki</a>
</li>
<li> <a href="http://wiki.apache.org/jakarta-lucene/LuceneFAQ">FAQ</a>
</li>
<li> <a href="./gettingstarted.html">Getting Started</a>
</li>
<li> <a href="./queryparsersyntax.html">Query Syntax</a>
</li>
<li> <a href="./fileformats.html">File Formats</a>
</li>
<li> <a href="./scoring.html">Scoring</a>
</li>
<li> <a href="./api/index.html">Javadoc</a>
</li>
<li> <a href="./contributions.html">Contributions</a>
</li>
<li> <a href="./benchmarks.html">Benchmarks</a>
</li>
<li> <a href="http://issues.apache.org/jira/browse/LUCENE">Issue Tracker</a>
</li>
<li> <a href="./lucene-sandbox/">Lucene Sandbox</a>
</li>
</ul>
<p><strong>Download</strong></p>
<ul>
<li> <a href="http://www.apache.org/dyn/closer.cgi/lucene/java/">Releases</a>
</li>
<li> <a href="http://svn.apache.org/viewcvs.cgi/lucene/java/">Source Repository</a>
</li>
</ul>
</td>
<td width="80%" align="left" valign="top">
<table border="0" cellspacing="0" cellpadding="2" width="100%">
<tr><td bgcolor="#525D76">
<font color="#ffffff" face="arial,helvetica,sanserif">
<a name="About the Code"><strong>About the Code</strong></a>
</font>
</td></tr>
<tr><td>
<blockquote>
<p>
In this section we walk through the sources behind the command-line Lucene demo: where to find them,
how they are organized, and what each part does. This section is intended for Java developers who want to understand
how to use Lucene in their applications.
</p>
</blockquote>
</td></tr>
<tr><td><br/></td></tr>
</table>
<table border="0" cellspacing="0" cellpadding="2" width="100%">
<tr><td bgcolor="#525D76">
<font color="#ffffff" face="arial,helvetica,sanserif">
<a name="Location of the source"><strong>Location of the source</strong></a>
</font>
</td></tr>
<tr><td>
<blockquote>
<p>
In the directory created when you extracted Lucene or checked it out from Subversion, you
should see a directory called <code>src</code>, which in turn contains a directory called
<code>demo</code>. This is the root for all of the Lucene demos. Under this directory is
<code>org/apache/lucene/demo</code>, where all the Java sources for the demos live.
</p>
<p>
Within this directory you should see <code>IndexFiles.java</code>, the source of the class we executed earlier.
Open it in <code>vi</code> or your editor of choice and let's take a look at it.
</p>
</blockquote>
</td></tr>
<tr><td><br/></td></tr>
</table>
<table border="0" cellspacing="0" cellpadding="2" width="100%">
<tr><td bgcolor="#525D76">
<font color="#ffffff" face="arial,helvetica,sanserif">
<a name="IndexFiles"><strong>IndexFiles</strong></a>
</font>
</td></tr>
<tr><td>
<blockquote>
<p>
As we discussed in the previous walk-through, the <code><a href="api/org/apache/lucene/demo/IndexFiles.html">IndexFiles</a></code> class creates a Lucene
index. Let's take a look at how it does this.
</p>
<p>
The first substantial thing the <code>main</code> function does is instantiate <code><a href="api/org/apache/lucene/index/IndexWriter.html">IndexWriter</a></code>. It passes the string
"<code>index</code>" and a new instance of a class called <code><a href="api/org/apache/lucene/analysis/standard/StandardAnalyzer.html">StandardAnalyzer</a></code>.
The "<code>index</code>" string is the name of the filesystem directory where all index information
should be stored. Because we're not passing a full path, the directory is created (if it does not already exist)
as a subdirectory of the current working directory. On some platforms the current working directory may be
somewhere else, such as the user's home directory, so the index may end up there.
</p>
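<p>
To check where a relative name such as "<code>index</code>" will land on your system, you can ask Java to resolve it.
The following is a minimal sketch that uses only <code>java.io.File</code>; the class name
<code>IndexPathSketch</code> is hypothetical and is not part of the demo sources.
</p>

```java
import java.io.File;

// Hypothetical helper, not part of the Lucene demo: shows how a relative
// directory name is resolved against the JVM's current working directory.
final class IndexPathSketch {
    static File resolve(String name) {
        return new File(name).getAbsoluteFile();
    }

    public static void main(String[] args) {
        System.out.println("Working directory: " + System.getProperty("user.dir"));
        System.out.println("Index would be at: " + resolve("index"));
    }
}
```

<p>
Running this from the directory where you plan to start the demo shows the location the index directory would get.
</p>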
<p>
The <code><a href="api/org/apache/lucene/index/IndexWriter.html">IndexWriter</a></code> is the main
class responsible for creating indices. To use it, you must instantiate it with a path where it can
write the index. If this path does not exist, the writer creates it first; otherwise it
refreshes the index at that path. You can also create an index using one of the subclasses of <code><a href="api/org/apache/lucene/store/Directory.html">Directory</a></code>. In either case, you must also pass an
instance of <code><a href="api/org/apache/lucene/analysis/Analyzer.html">org.apache.lucene.analysis.Analyzer</a></code>.
</p>
<p>
The particular <code><a href="api/org/apache/lucene/analysis/Analyzer.html">Analyzer</a></code> we
are using, <code><a href="api/org/apache/lucene/analysis/standard/StandardAnalyzer.html">StandardAnalyzer</a></code>, is
little more than a standard Java tokenizer that converts all strings to lowercase and filters
useless words and characters out of the index. "Useless" here means common
words such as articles (a, an, the, etc.) and other strings that do not help searching
(e.g. <b>'s</b>). The rules differ from language to language, so you
should use the proper analyzer for each. Lucene currently provides analyzers for a number of
different languages (see the <code>*Analyzer.java</code> sources under <a href="http://svn.apache.org/repos/asf/lucene/java/trunk/contrib/analyzers/src/java/org/apache/lucene/analysis/">contrib/analyzers/src/java/org/apache/lucene/analysis</a>).
</p>
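<p>
To make the idea of analysis concrete, here is a small plain-Java sketch of the same three steps: split the text into
tokens, lowercase them, and drop stop words and a trailing <b>'s</b>. It is not Lucene's
<code>StandardAnalyzer</code>; the class <code>SimpleAnalyzerSketch</code>, its stop-word list and its splitting
rule are assumptions made for illustration only.
</p>

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Locale;
import java.util.Set;

// Hypothetical stand-in for an analyzer; this is NOT Lucene's StandardAnalyzer.
final class SimpleAnalyzerSketch {
    // Illustrative stop-word list (an assumption, not Lucene's actual list).
    static final Set<String> STOP_WORDS =
        new HashSet<>(Arrays.asList("a", "an", "the", "and", "of"));

    static List<String> analyze(String text) {
        List<String> tokens = new ArrayList<>();
        // Lowercase, then split on anything that is not a letter, digit, or apostrophe.
        for (String raw : text.toLowerCase(Locale.ROOT).split("[^\\p{L}\\p{N}']+")) {
            // Strip a trailing possessive 's, which adds nothing for searching.
            String token = raw.endsWith("'s") ? raw.substring(0, raw.length() - 2) : raw;
            if (!token.isEmpty() && !STOP_WORDS.contains(token)) {
                tokens.add(token);
            }
        }
        return tokens;
    }

    public static void main(String[] args) {
        // prints [quick, fox, tail, index]
        System.out.println(analyze("The Quick fox's tail and an Index"));
    }
}
```

<p>
A real analyzer for another language would swap in a different stop-word list and different token rules, which is why
a separate analyzer per language is recommended.
</p>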
<p>
Looking further down in the file, you should see the <code>indexDocs()</code> code. This recursive
function simply crawls the directories and uses <code><a href="api/org/apache/lucene/demo/FileDocument.html">FileDocument</a></code> to create <code><a href="api/org/apache/lucene/document/Document.html">Document</a></code> objects. The <code><a href="api/org/apache/lucene/document/Document.html">Document</a></code> is simply a data object that
represents the content of the file as well as its last-modified time and location. These instances are
added to the <code><a href="api/org/apache/lucene/index/IndexWriter.html">IndexWriter</a></code>. Take a look inside <code><a href="api/org/apache/lucene/demo/FileDocument.html">FileDocument</a></code>. It's not particularly
complicated: it just adds fields to the <code><a href="api/org/apache/lucene/document/Document.html">Document</a></code>.
</p>
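<p>
The shape of that crawl can be sketched without Lucene at all. In the following hypothetical example, a
<code>Map</code> stands in for a <code>Document</code>, a <code>List</code> stands in for the writer, and
<code>toDocument()</code> plays the role of <code>FileDocument</code>. The class name and the field names
<code>path</code> and <code>modified</code> are assumptions made for illustration; consult the demo sources for the
real ones.
</p>

```java
import java.io.File;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch of a recursive crawl. The maps stand in for Lucene
// Document objects and the list stands in for the index writer.
final class IndexDocsSketch {
    // Plays the role of FileDocument: copies file metadata into named fields.
    static Map<String, String> toDocument(File file) {
        Map<String, String> doc = new LinkedHashMap<>();
        doc.put("path", file.getPath());
        doc.put("modified", Long.toString(file.lastModified()));
        return doc;
    }

    // Recursively visits directories and collects one document per readable file.
    static void indexDocs(File file, List<Map<String, String>> sink) {
        if (!file.canRead()) {
            return; // skip anything we are not allowed to read
        }
        if (file.isDirectory()) {
            String[] names = file.list();
            if (names != null) { // list() returns null on an I/O error
                Arrays.sort(names); // deterministic order for the example
                for (String name : names) {
                    indexDocs(new File(file, name), sink);
                }
            }
        } else {
            sink.add(toDocument(file));
        }
    }

    public static void main(String[] args) {
        List<Map<String, String>> docs = new ArrayList<>();
        indexDocs(new File(args.length > 0 ? args[0] : "."), docs);
        System.out.println("Collected " + docs.size() + " documents");
    }
}
```

<p>
Passing a directory as the first argument crawls that directory; with no argument the sketch crawls the current
working directory.
</p>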
<p>
As you can see, there isn't much to creating an index; the devil is in the details. You may also
wish to examine the other samples in this directory, particularly the <code><a href="api/org/apache/lucene/demo/IndexHTML.html">IndexHTML</a></code> class. It is a bit more
complex but builds upon this example.
</p>
</blockquote>
</td></tr>
<tr><td><br/></td></tr>
</table>
<table border="0" cellspacing="0" cellpadding="2" width="100%">
<tr><td bgcolor="#525D76">
<font color="#ffffff" face="arial,helvetica,sanserif">
<a name="Searching Files"><strong>Searching Files</strong></a>
</font>
</td></tr>
<tr><td>
<blockquote>
<p>
The <code><a href="api/org/apache/lucene/demo/SearchFiles.html">SearchFiles</a></code> class is
quite simple. It primarily collaborates with an <code><a href="api/org/apache/lucene/search/IndexSearcher.html">IndexSearcher</a></code>, a <code><a href="api/org/apache/lucene/analysis/standard/StandardAnalyzer.html">StandardAnalyzer</a></code>
(which is used in the <code><a href="api/org/apache/lucene/demo/IndexFiles.html">IndexFiles</a></code> class as well) and a
<code><a href="api/org/apache/lucene/queryParser/QueryParser.html">QueryParser</a></code>. The
query parser is constructed with an analyzer so that it interprets your query text in the same way the
documents were interpreted: finding word boundaries and removing useless words like 'a', 'an' and
'the'. The <code><a href="api/org/apache/lucene/queryParser/QueryParser.html">QueryParser</a></code> produces a
<code><a href="api/org/apache/lucene/search/Query.html">Query</a></code> object, which is passed to
the searcher. Note that it's also possible to programmatically construct a rich <code><a href="api/org/apache/lucene/search/Query.html">Query</a></code> object without using the query
parser. The query parser just decodes the <a href="queryparsersyntax.html">Lucene query
syntax</a> into the corresponding <code><a href="api/org/apache/lucene/search/Query.html">Query</a></code> object. The searcher returns its results
in a collection of Documents called <code><a href="api/org/apache/lucene/search/Hits.html">Hits</a></code>, which is then iterated through and
displayed to the user.
</p>
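<p>
Why the query must be analyzed the same way as the documents is easiest to see in a toy index. The sketch below is
plain Java and does not use Lucene: <code>TinySearchSketch</code>, its <code>analyze()</code> rule and its
"match any term" behaviour are assumptions made for illustration, and the real query syntax is far richer.
</p>

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

// Hypothetical toy index; it only illustrates sharing one analysis step
// between indexing and searching. It is not how Lucene is implemented.
final class TinySearchSketch {
    // term -> ids of the documents that contain it
    private final Map<String, Set<String>> postings = new HashMap<>();

    // Shared analysis: lowercase, split on non-alphanumerics, drop a few stop words.
    static List<String> analyze(String text) {
        List<String> terms = new ArrayList<>();
        for (String raw : text.toLowerCase(Locale.ROOT).split("[^\\p{L}\\p{N}]+")) {
            if (!raw.isEmpty() && !raw.equals("a") && !raw.equals("an") && !raw.equals("the")) {
                terms.add(raw);
            }
        }
        return terms;
    }

    void add(String docId, String contents) {
        for (String term : analyze(contents)) {
            postings.computeIfAbsent(term, k -> new TreeSet<>()).add(docId);
        }
    }

    // Returns the ids of documents containing at least one analyzed query term.
    Set<String> search(String queryText) {
        Set<String> hits = new TreeSet<>();
        for (String term : analyze(queryText)) {
            hits.addAll(postings.getOrDefault(term, Collections.<String>emptySet()));
        }
        return hits;
    }

    public static void main(String[] args) {
        TinySearchSketch index = new TinySearchSketch();
        index.add("doc1", "The Apache Lucene index");
        index.add("doc2", "A search engine");
        // The query is lowercased and stripped of "the" before lookup.
        System.out.println(index.search("the LUCENE")); // prints [doc1]
    }
}
```

<p>
If the query skipped the analysis step, the uppercase term <code>LUCENE</code> would not match the lowercased term
stored in the index, and the search would come back empty.
</p>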
</blockquote>
</td></tr>
<tr><td><br/></td></tr>
</table>
<table border="0" cellspacing="0" cellpadding="2" width="100%">
<tr><td bgcolor="#525D76">
<font color="#ffffff" face="arial,helvetica,sanserif">
<a name="The Web example..."><strong>The Web example...</strong></a>
</font>
</td></tr>
<tr><td>
<blockquote>
<p>
<a href="demo3.html">read on &gt;&gt;&gt;</a>
</p>
</blockquote>
</td></tr>
<tr><td><br/></td></tr>
</table>
</td>
</tr>

<!-- FOOTER -->
<tr><td colspan="2">
<hr noshade="" size="1"/>
</td></tr>
<tr><td colspan="2">
<div align="center"><font color="#525D76" size="-1"><em>
Copyright © 1999-2005, The Apache Software Foundation
</em></font></div>
</td></tr>
</table>
</body>
</html>
<!-- end the processing -->