mirror of https://github.com/apache/lucene.git
- Indyo docs.
PR: Obtained from: Submitted by: Kelvin Tan Reviewed by: git-svn-id: https://svn.apache.org/repos/asf/lucene/java/trunk@149844 13f79535-47bb-0310-9956-ffa450edef68
This commit is contained in:
parent
eeb7867808
commit
9a3636d255
|
@ -112,6 +112,24 @@
|
|||
You can access Lucene Sandbox CVS repository at
|
||||
<A HREF="http://cvs.apache.org/viewcvs/jakarta-lucene-sandbox/">http://cvs.apache.org/viewcvs/jakarta-lucene-sandbox/</A>.
|
||||
</P>
|
||||
<table border="0" cellspacing="0" cellpadding="2" width="100%">
|
||||
<tr><td bgcolor="#828DA6">
|
||||
<font color="#ffffff" face="arial,helvetica,sanserif">
|
||||
<a name="Indyo"><strong>Indyo</strong></a>
|
||||
</font>
|
||||
</td></tr>
|
||||
<tr><td>
|
||||
<blockquote>
|
||||
<p>
|
||||
Indyo is a datasource-independent Lucene indexing framework.
|
||||
</p>
|
||||
<p>
|
||||
A tutorial for using Indyo can be found <a href="indyo/tutorial.html">here</a>.
|
||||
</p>
|
||||
</blockquote>
|
||||
</td></tr>
|
||||
<tr><td><br/></td></tr>
|
||||
</table>
|
||||
</blockquote>
|
||||
</p>
|
||||
</td></tr>
|
||||
|
|
|
@ -0,0 +1,466 @@
|
|||
<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN" "http://www.w3.org/TR/html4/loose.dtd">
|
||||
|
||||
<!-- Content Stylesheet for Site -->
|
||||
|
||||
|
||||
<!-- start the processing -->
|
||||
<!-- ====================================================================== -->
|
||||
<!-- Main Page Section -->
|
||||
<!-- ====================================================================== -->
|
||||
<html>
|
||||
<head>
|
||||
<meta http-equiv="Content-Type" content="text/html; charset=iso-8859-1"/>
|
||||
|
||||
<meta name="author" value="Kelvin Tan">
|
||||
<meta name="email" value="kelvint@apache.org">
|
||||
|
||||
|
||||
|
||||
<title>Jakarta Lucene - Indyo Tutorial</title>
|
||||
</head>
|
||||
|
||||
<body bgcolor="#ffffff" text="#000000" link="#525D76">
|
||||
<table border="0" width="100%" cellspacing="0">
|
||||
<!-- TOP IMAGE -->
|
||||
<tr>
|
||||
<td align="left">
|
||||
<a href="http://jakarta.apache.org"><img src="http://jakarta.apache.org/images/jakarta-logo.gif" border="0"/></a>
|
||||
</td>
|
||||
<td align="right">
|
||||
<a href="http://jakarta.apache.org/lucene/"><img src="../../images/lucene_green_300.gif" alt="Jakarta Lucene" border="0"/></a>
|
||||
</td>
|
||||
</tr>
|
||||
</table>
|
||||
<table border="0" width="100%" cellspacing="4">
|
||||
<tr><td colspan="2">
|
||||
<hr noshade="" size="1"/>
|
||||
</td></tr>
|
||||
|
||||
<tr>
|
||||
<!-- LEFT SIDE NAVIGATION -->
|
||||
<td width="20%" valign="top" nowrap="true">
|
||||
<p><strong>About</strong></p>
|
||||
<ul>
|
||||
<li> <a href="../../index.html">Overview</a>
|
||||
</li>
|
||||
<li> <a href="../../powered.html">Powered by Lucene</a>
|
||||
</li>
|
||||
<li> <a href="../../whoweare.html">Who We Are</a>
|
||||
</li>
|
||||
<li> <a href="http://jakarta.apache.org/site/mail.html">Mailing Lists</a>
|
||||
</li>
|
||||
</ul>
|
||||
<p><strong>Resources</strong></p>
|
||||
<ul>
|
||||
<li> <a href="http://lucene.sourceforge.net/cgi-bin/faq/faqmanager.cgi">FAQ (Official)</a>
|
||||
</li>
|
||||
<li> <a href="http://www.jguru.com/faq/Lucene">JGuru FAQ</a>
|
||||
</li>
|
||||
<li> <a href="../../gettingstarted.html">Getting Started</a>
|
||||
</li>
|
||||
<li> <a href="http://jakarta.apache.org/site/bugs.html">Bugs</a>
|
||||
</li>
|
||||
<li> <a href="http://nagoya.apache.org/bugzilla/buglist.cgi?bug_status=NEW&bug_status=ASSIGNED&bug_status=REOPENED&email1=&emailtype1=substring&emailassigned_to1=1&email2=&emailtype2=substring&emailreporter2=1&bugidtype=include&bug_id=&changedin=&votes=&chfieldfrom=&chfieldto=Now&chfieldvalue=&product=Lucene&short_desc=&short_desc_type=allwordssubstr&long_desc=&long_desc_type=allwordssubstr&bug_file_loc=&bug_file_loc_type=allwordssubstr&keywords=&keywords_type=anywords&field0-0-0=noop&type0-0-0=noop&value0-0-0=&cmdtype=doit&order=%27Importance%27">Lucene Bugs</a>
|
||||
</li>
|
||||
<li> <a href="../../queryparsersyntax.html">Query Syntax</a>
|
||||
</li>
|
||||
<li> <a href="../../api/index.html">Javadoc</a>
|
||||
</li>
|
||||
<li> <a href="../../contributions.html">Contributions</a>
|
||||
</li>
|
||||
<li> <a href="../../lucenesandbox.html">Lucene Sandbox</a>
|
||||
</li>
|
||||
<li> <a href="../../resources.html">Articles, etc.</a>
|
||||
</li>
|
||||
</ul>
|
||||
<p><strong>Plans</strong></p>
|
||||
<ul>
|
||||
<li> <a href="../../luceneplan.html">Application Extensions</a>
|
||||
</li>
|
||||
</ul>
|
||||
<p><strong>Download</strong></p>
|
||||
<ul>
|
||||
<li> <a href="http://jakarta.apache.org/site/binindex.html">Binaries</a>
|
||||
</li>
|
||||
<li> <a href="http://jakarta.apache.org/site/sourceindex.html">Source Code</a>
|
||||
</li>
|
||||
<li> <a href="http://jakarta.apache.org/site/cvsindex.html">CVS Repositories</a>
|
||||
</li>
|
||||
</ul>
|
||||
<p><strong>Jakarta</strong></p>
|
||||
<ul>
|
||||
<li> <a href="http://jakarta.apache.org/site/getinvolved.html">Get Involved</a>
|
||||
</li>
|
||||
<li> <a href="http://jakarta.apache.org/site/acknowledgements.html">Acknowledgements</a>
|
||||
</li>
|
||||
<li> <a href="http://jakarta.apache.org/site/contact.html">Contact</a>
|
||||
</li>
|
||||
<li> <a href="http://jakarta.apache.org/site/legal.html">Legal</a>
|
||||
</li>
|
||||
</ul>
|
||||
</td>
|
||||
<td width="80%" align="left" valign="top">
|
||||
<table border="0" cellspacing="0" cellpadding="2" width="100%">
|
||||
<tr><td bgcolor="#525D76">
|
||||
<font color="#ffffff" face="arial,helvetica,sanserif">
|
||||
<a name="About this Tutorial"><strong>About this Tutorial</strong></a>
|
||||
</font>
|
||||
</td></tr>
|
||||
<tr><td>
|
||||
<blockquote>
|
||||
<p>
|
||||
This tutorial is intended to give first-time users an
|
||||
introduction to using Indyo, a datasource-independent
|
||||
Lucene indexing framework.
|
||||
</p>
|
||||
<p>
|
||||
This covers obtaining Indyo, configuring Indyo,
|
||||
and indexing a directory on a filesystem.
|
||||
</p>
|
||||
</blockquote>
|
||||
</p>
|
||||
</td></tr>
|
||||
<tr><td><br/></td></tr>
|
||||
</table>
|
||||
<table border="0" cellspacing="0" cellpadding="2" width="100%">
|
||||
<tr><td bgcolor="#525D76">
|
||||
<font color="#ffffff" face="arial,helvetica,sanserif">
|
||||
<a name="Step 1: Obtaining Indyo"><strong>Step 1: Obtaining Indyo</strong></a>
|
||||
</font>
|
||||
</td></tr>
|
||||
<tr><td>
|
||||
<blockquote>
|
||||
<p>
|
||||
First, you need to obtain Indyo. As
|
||||
of this writing, Indyo is only available via CVS, from the
|
||||
"jakarta-lucene-sandbox" repository. See
|
||||
<a href="http://jakarta.apache.org/cvsindex.html">Jakarta CVS</a>
|
||||
for instructions on accessing files via CVS.</p>
|
||||
</blockquote>
|
||||
</p>
|
||||
</td></tr>
|
||||
<tr><td><br/></td></tr>
|
||||
</table>
|
||||
<table border="0" cellspacing="0" cellpadding="2" width="100%">
|
||||
<tr><td bgcolor="#525D76">
|
||||
<font color="#ffffff" face="arial,helvetica,sanserif">
|
||||
<a name="Step 2: Building Indyo"><strong>Step 2: Building Indyo</strong></a>
|
||||
</font>
|
||||
</td></tr>
|
||||
<tr><td>
|
||||
<blockquote>
|
||||
<p>
|
||||
Get a copy of <a href="http://jakarta.apache.org/ant">Ant</a> if
|
||||
you don't already have it installed. Then simply type "ant" in the
|
||||
directory where the local copy of the Indyo sources resides.
|
||||
</p>
|
||||
<p>
|
||||
Voila! You should now have a jar file "indyo-<version number>.jar".
|
||||
</p>
|
||||
</blockquote>
|
||||
</p>
|
||||
</td></tr>
|
||||
<tr><td><br/></td></tr>
|
||||
</table>
|
||||
<table border="0" cellspacing="0" cellpadding="2" width="100%">
|
||||
<tr><td bgcolor="#525D76">
|
||||
<font color="#ffffff" face="arial,helvetica,sanserif">
|
||||
<a name="Step 3: Configuring Indyo"><strong>Step 3: Configuring Indyo</strong></a>
|
||||
</font>
|
||||
</td></tr>
|
||||
<tr><td>
|
||||
<blockquote>
|
||||
<p>
|
||||
The "src/conf" folder contains a default configuration file which is
|
||||
sufficient for normal use.
|
||||
</p>
|
||||
</blockquote>
|
||||
</p>
|
||||
</td></tr>
|
||||
<tr><td><br/></td></tr>
|
||||
</table>
|
||||
<table border="0" cellspacing="0" cellpadding="2" width="100%">
|
||||
<tr><td bgcolor="#525D76">
|
||||
<font color="#ffffff" face="arial,helvetica,sanserif">
|
||||
<a name="Step 4: Using Indyo"><strong>Step 4: Using Indyo</strong></a>
|
||||
</font>
|
||||
</td></tr>
|
||||
<tr><td>
|
||||
<blockquote>
|
||||
<p>
|
||||
Congratulations, you have finally reached the fun
|
||||
part of this tutorial. This is where you'll discover
|
||||
the power of Indyo.
|
||||
</p>
|
||||
<p>
|
||||
To index a datasource, first instantiate the respective
|
||||
datasource, then hand it to IndyoIndexer for indexing.
|
||||
For example:
|
||||
</p>
|
||||
<div align="left">
|
||||
<table cellspacing="4" cellpadding="0" border="0">
|
||||
<tr>
|
||||
<td bgcolor="#023264" width="1" height="1"><img src="/images/void.gif" width="1" height="1" vspace="0" hspace="0" border="0"/></td>
|
||||
<td bgcolor="#023264" height="1"><img src="/images/void.gif" width="1" height="1" vspace="0" hspace="0" border="0"/></td>
|
||||
<td bgcolor="#023264" width="1" height="1"><img src="/images/void.gif" width="1" height="1" vspace="0" hspace="0" border="0"/></td>
|
||||
</tr>
|
||||
<tr>
|
||||
<td bgcolor="#023264" width="1"><img src="/images/void.gif" width="1" height="1" vspace="0" hspace="0" border="0"/></td>
|
||||
<td bgcolor="#ffffff"><pre>
|
||||
IndexDataSource ds = new FSDataSource("/usr/local/lucene/docs");
|
||||
IndyoIndexer indexer = new IndyoIndexer("/usr/local/index",
|
||||
"/usr/local/indyo/default.config.xml");
|
||||
indexer.index(ds);
|
||||
</pre></td>
|
||||
<td bgcolor="#023264" width="1"><img src="/images/void.gif" width="1" height="1" vspace="0" hspace="0" border="0"/></td>
|
||||
</tr>
|
||||
<tr>
|
||||
<td bgcolor="#023264" width="1" height="1"><img src="/images/void.gif" width="1" height="1" vspace="0" hspace="0" border="0"/></td>
|
||||
<td bgcolor="#023264" height="1"><img src="/images/void.gif" width="1" height="1" vspace="0" hspace="0" border="0"/></td>
|
||||
<td bgcolor="#023264" width="1" height="1"><img src="/images/void.gif" width="1" height="1" vspace="0" hspace="0" border="0"/></td>
|
||||
</tr>
|
||||
</table>
|
||||
</div>
|
||||
<p>
|
||||
FSDataSource is a simple datasource which indexes both files
|
||||
and directories. The metadata FSDataSource adds to each document is:
|
||||
filePath, fileName, fileSize, fileFormat, fileContents,
|
||||
fileLastModifiedDate. Based on the file extension of the files indexed,
|
||||
Indyo will use file content-handlers according to the mappings found in the
|
||||
configuration file. If you're not happy with this list of file
|
||||
metadata, feel free to subclass FSDataSource, or, as we're about
|
||||
to cover next, write your own custom IndexDataSource.
|
||||
</p>
|
||||
<p>
|
||||
Get familiar with FSDataSource. You'll find it very handy, both for indexing
|
||||
files directly, as well as nesting it within another datasource. For example,
|
||||
you might need to index a database table in which one of the columns holds
|
||||
the location of a file, and you may want to use FSDataSource to index this
|
||||
file as well.
|
||||
</p>
|
||||
<table border="0" cellspacing="0" cellpadding="2" width="100%">
|
||||
<tr><td bgcolor="#828DA6">
|
||||
<font color="#ffffff" face="arial,helvetica,sanserif">
|
||||
<a name="Writing your custom IndexDataSource"><strong>Writing your custom IndexDataSource</strong></a>
|
||||
</font>
|
||||
</td></tr>
|
||||
<tr><td>
|
||||
<blockquote>
|
||||
<p>
|
||||
To write a custom IndexDataSource, you need to write a class
|
||||
which implements IndexDataSource, and provides an implementation
|
||||
for the getData() method which returns a Map[]. The javadoc of the
|
||||
getData() method reads:
|
||||
</p>
|
||||
<div align="left">
|
||||
<table cellspacing="4" cellpadding="0" border="0">
|
||||
<tr>
|
||||
<td bgcolor="#023264" width="1" height="1"><img src="/images/void.gif" width="1" height="1" vspace="0" hspace="0" border="0"/></td>
|
||||
<td bgcolor="#023264" height="1"><img src="/images/void.gif" width="1" height="1" vspace="0" hspace="0" border="0"/></td>
|
||||
<td bgcolor="#023264" width="1" height="1"><img src="/images/void.gif" width="1" height="1" vspace="0" hspace="0" border="0"/></td>
|
||||
</tr>
|
||||
<tr>
|
||||
<td bgcolor="#023264" width="1"><img src="/images/void.gif" width="1" height="1" vspace="0" hspace="0" border="0"/></td>
|
||||
<td bgcolor="#ffffff"><pre>
|
||||
/**
|
||||
* Retrieve an array of Maps. Each map represents
|
||||
* a document to be indexed. The key:value pair of the map
|
||||
* is the metadata of the document.
|
||||
*/
|
||||
</pre></td>
|
||||
<td bgcolor="#023264" width="1"><img src="/images/void.gif" width="1" height="1" vspace="0" hspace="0" border="0"/></td>
|
||||
</tr>
|
||||
<tr>
|
||||
<td bgcolor="#023264" width="1" height="1"><img src="/images/void.gif" width="1" height="1" vspace="0" hspace="0" border="0"/></td>
|
||||
<td bgcolor="#023264" height="1"><img src="/images/void.gif" width="1" height="1" vspace="0" hspace="0" border="0"/></td>
|
||||
<td bgcolor="#023264" width="1" height="1"><img src="/images/void.gif" width="1" height="1" vspace="0" hspace="0" border="0"/></td>
|
||||
</tr>
|
||||
</table>
|
||||
</div>
|
||||
<p>
|
||||
So, the getData() method provides a way for Indyo to retrieve document
|
||||
metadata from each IndexDataSource. A simple example of a custom
|
||||
IndexDataSource, HashMapDataSource, is provided below.
|
||||
</p>
|
||||
<div align="left">
|
||||
<table cellspacing="4" cellpadding="0" border="0">
|
||||
<tr>
|
||||
<td bgcolor="#023264" width="1" height="1"><img src="/images/void.gif" width="1" height="1" vspace="0" hspace="0" border="0"/></td>
|
||||
<td bgcolor="#023264" height="1"><img src="/images/void.gif" width="1" height="1" vspace="0" hspace="0" border="0"/></td>
|
||||
<td bgcolor="#023264" width="1" height="1"><img src="/images/void.gif" width="1" height="1" vspace="0" hspace="0" border="0"/></td>
|
||||
</tr>
|
||||
<tr>
|
||||
<td bgcolor="#023264" width="1"><img src="/images/void.gif" width="1" height="1" vspace="0" hspace="0" border="0"/></td>
|
||||
<td bgcolor="#ffffff"><pre>
|
||||
public class HashMapDataSource implements IndexDataSource
|
||||
{
|
||||
private Map data;
|
||||
|
||||
public HashMapDataSource(Map data)
|
||||
{
|
||||
this.data = data;
|
||||
}
|
||||
|
||||
public Map[] getData() throws Exception
|
||||
{
|
||||
return new Map[]{data};
|
||||
}
|
||||
}
|
||||
</pre></td>
|
||||
<td bgcolor="#023264" width="1"><img src="/images/void.gif" width="1" height="1" vspace="0" hspace="0" border="0"/></td>
|
||||
</tr>
|
||||
<tr>
|
||||
<td bgcolor="#023264" width="1" height="1"><img src="/images/void.gif" width="1" height="1" vspace="0" hspace="0" border="0"/></td>
|
||||
<td bgcolor="#023264" height="1"><img src="/images/void.gif" width="1" height="1" vspace="0" hspace="0" border="0"/></td>
|
||||
<td bgcolor="#023264" width="1" height="1"><img src="/images/void.gif" width="1" height="1" vspace="0" hspace="0" border="0"/></td>
|
||||
</tr>
|
||||
</table>
|
||||
</div>
|
||||
<p>
|
||||
As you can see, HashMapDataSource doesn't do anything very useful. It
|
||||
always results in one Document being indexed, and the document's fields
|
||||
depend on the contents of the map that HashMapDataSource was initialized
|
||||
with.
|
||||
</p>
|
||||
<p>
|
||||
A slightly more useful IndexDataSource, SingleDocumentFSDataSource,
|
||||
provides an example of how to nest datasources. Given a directory,
|
||||
SingleDocumentFSDataSource recursively indexes all directories
|
||||
and files within that directory <i>as the same Document</i>. In other
|
||||
words, only one Document is created in the index. This is accomplished
|
||||
by the use of a nested datasource. The code for
|
||||
SingleDocumentFSDataSource is listed below:
|
||||
</p>
|
||||
<div align="left">
|
||||
<table cellspacing="4" cellpadding="0" border="0">
|
||||
<tr>
|
||||
<td bgcolor="#023264" width="1" height="1"><img src="/images/void.gif" width="1" height="1" vspace="0" hspace="0" border="0"/></td>
|
||||
<td bgcolor="#023264" height="1"><img src="/images/void.gif" width="1" height="1" vspace="0" hspace="0" border="0"/></td>
|
||||
<td bgcolor="#023264" width="1" height="1"><img src="/images/void.gif" width="1" height="1" vspace="0" hspace="0" border="0"/></td>
|
||||
</tr>
|
||||
<tr>
|
||||
<td bgcolor="#023264" width="1"><img src="/images/void.gif" width="1" height="1" vspace="0" hspace="0" border="0"/></td>
|
||||
<td bgcolor="#ffffff"><pre>
|
||||
public class SingleDocumentFSDataSource
|
||||
implements IndexDataSource
|
||||
{
|
||||
private File file;
|
||||
|
||||
public SingleDocumentFSDataSource(File file)
|
||||
{
|
||||
this.file = file;
|
||||
}
|
||||
|
||||
public Map[] getData() throws Exception
|
||||
{
|
||||
Map data = new HashMap(1);
|
||||
data.put(NESTED_DATASOURCE, new FSDataSource(file));
|
||||
return new Map[]{data};
|
||||
}
|
||||
}
|
||||
</pre></td>
|
||||
<td bgcolor="#023264" width="1"><img src="/images/void.gif" width="1" height="1" vspace="0" hspace="0" border="0"/></td>
|
||||
</tr>
|
||||
<tr>
|
||||
<td bgcolor="#023264" width="1" height="1"><img src="/images/void.gif" width="1" height="1" vspace="0" hspace="0" border="0"/></td>
|
||||
<td bgcolor="#023264" height="1"><img src="/images/void.gif" width="1" height="1" vspace="0" hspace="0" border="0"/></td>
|
||||
<td bgcolor="#023264" width="1" height="1"><img src="/images/void.gif" width="1" height="1" vspace="0" hspace="0" border="0"/></td>
|
||||
</tr>
|
||||
</table>
|
||||
</div>
|
||||
<p>
|
||||
Nested datasources don't result in a separate Document being created.
|
||||
Use them when working with complex datasources, i.e., datasources
|
||||
which are an aggregation of multiple datasources. The current way to
|
||||
add a nested datasource is using the key "NESTED_DATASOURCE". Indyo
|
||||
accepts an IndexDataSource object, a List of IndexDataSources,
|
||||
or an IndexDataSource[] for this key.
|
||||
</p>
|
||||
</blockquote>
|
||||
</td></tr>
|
||||
<tr><td><br/></td></tr>
|
||||
</table>
|
||||
</blockquote>
|
||||
</p>
|
||||
</td></tr>
|
||||
<tr><td><br/></td></tr>
|
||||
</table>
|
||||
<table border="0" cellspacing="0" cellpadding="2" width="100%">
|
||||
<tr><td bgcolor="#525D76">
|
||||
<font color="#ffffff" face="arial,helvetica,sanserif">
|
||||
<a name="Where to Go From Here"><strong>Where to Go From Here</strong></a>
|
||||
</font>
|
||||
</td></tr>
|
||||
<tr><td>
|
||||
<blockquote>
|
||||
<p>
|
||||
Congratulations! You have completed the Indyo
|
||||
tutorial. Although this has only been an introduction
|
||||
to Indyo, it should be sufficient to get you started
|
||||
with Indyo in your applications. For those of you
|
||||
seeking additional information, there are several other
|
||||
documents on this site that can provide details on
|
||||
various subjects. Lastly, the source code is an
|
||||
invaluable resource when all else fails to provide
|
||||
answers!
|
||||
</p>
|
||||
</blockquote>
|
||||
</p>
|
||||
</td></tr>
|
||||
<tr><td><br/></td></tr>
|
||||
</table>
|
||||
<table border="0" cellspacing="0" cellpadding="2" width="100%">
|
||||
<tr><td bgcolor="#525D76">
|
||||
<font color="#ffffff" face="arial,helvetica,sanserif">
|
||||
<a name="Acknowledgements"><strong>Acknowledgements</strong></a>
|
||||
</font>
|
||||
</td></tr>
|
||||
<tr><td>
|
||||
<blockquote>
|
||||
<p>
|
||||
This document was shamelessly ripped from the extremely well-written
|
||||
and well-organized
|
||||
<a href="http://jakarta.apache.org/turbine/torque/tutorial.html">Torque
|
||||
</a> tutorial. Thanks Pete!
|
||||
</p>
|
||||
</blockquote>
|
||||
</p>
|
||||
</td></tr>
|
||||
<tr><td><br/></td></tr>
|
||||
</table>
|
||||
</td>
|
||||
</tr>
|
||||
|
||||
<!-- FOOTER -->
|
||||
<tr><td colspan="2">
|
||||
<hr noshade="" size="1"/>
|
||||
</td></tr>
|
||||
<tr><td colspan="2">
|
||||
<div align="center"><font color="#525D76" size="-1"><em>
|
||||
Copyright © 1999-2002, Apache Software Foundation
|
||||
</em></font></div>
|
||||
</td></tr>
|
||||
</table>
|
||||
</body>
|
||||
</html>
|
||||
<!-- end the processing -->
|
||||
|
||||
|
||||
|
||||
|
||||
|
||||
|
||||
|
||||
|
||||
|
||||
|
||||
|
||||
|
||||
|
||||
|
||||
|
||||
|
||||
|
||||
|
||||
|
||||
|
|
@ -17,6 +17,16 @@ not necessarily be maintained, particularly in their current state.
|
|||
You can access Lucene Sandbox CVS repository at
|
||||
<A HREF="http://cvs.apache.org/viewcvs/jakarta-lucene-sandbox/">http://cvs.apache.org/viewcvs/jakarta-lucene-sandbox/</A>.
|
||||
</P>
|
||||
|
||||
<subsection name="Indyo">
|
||||
<p>
|
||||
Indyo is a datasource-independent Lucene indexing framework.
|
||||
</p>
|
||||
<p>
|
||||
A tutorial for using Indyo can be found <a href="indyo/tutorial.html">here</a>.
|
||||
</p>
|
||||
</subsection>
|
||||
|
||||
</section>
|
||||
|
||||
</body>
|
||||
|
|
|
@ -0,0 +1,221 @@
|
|||
<?xml version="1.0"?>
|
||||
|
||||
<document>
|
||||
|
||||
<properties>
|
||||
<title>Indyo Tutorial</title>
|
||||
<author email="kelvint@apache.org">Kelvin Tan</author>
|
||||
</properties>
|
||||
|
||||
<body>
|
||||
|
||||
<section name="About this Tutorial">
|
||||
|
||||
<p>
|
||||
This tutorial is intended to give first-time users an
|
||||
introduction to using Indyo, a datasource-independent
|
||||
Lucene indexing framework.
|
||||
</p>
|
||||
|
||||
<p>
|
||||
This covers obtaining Indyo, configuring Indyo,
|
||||
and indexing a directory on a filesystem.
|
||||
</p>
|
||||
|
||||
</section>
|
||||
|
||||
<section name="Step 1: Obtaining Indyo">
|
||||
|
||||
<p>
|
||||
First, you need to obtain Indyo. As
|
||||
of this writing, Indyo is only available via CVS, from the
|
||||
"jakarta-lucene-sandbox" repository. See
|
||||
<a href="http://jakarta.apache.org/cvsindex.html">Jakarta CVS</a>
|
||||
for instructions on accessing files via CVS.</p>
|
||||
|
||||
|
||||
</section>
|
||||
|
||||
<section name="Step 2: Building Indyo">
|
||||
|
||||
<p>
|
||||
Get a copy of <a href="http://jakarta.apache.org/ant">Ant</a> if
|
||||
you don't already have it installed. Then simply type "ant" in the
|
||||
directory where the local copy of the Indyo sources resides.
|
||||
</p>
|
||||
|
||||
<p>
|
||||
Voila! You should now have a jar file "indyo-<version number>.jar".
|
||||
</p>
|
||||
|
||||
</section>
|
||||
|
||||
<section name="Step 3: Configuring Indyo">
|
||||
|
||||
<p>
|
||||
The "src/conf" folder contains a default configuration file which is
|
||||
sufficient for normal use.
|
||||
</p>
|
||||
|
||||
</section>
|
||||
|
||||
<section name="Step 4: Using Indyo">
|
||||
|
||||
<p>
|
||||
Congratulations, you have finally reached the fun
|
||||
part of this tutorial. This is where you'll discover
|
||||
the power of Indyo.
|
||||
</p>
|
||||
|
||||
<p>
|
||||
To index a datasource, first instantiate the respective
|
||||
datasource, then hand it to IndyoIndexer for indexing.
|
||||
For example:
|
||||
</p>
|
||||
|
||||
<source><![CDATA[
|
||||
IndexDataSource ds = new FSDataSource("/usr/local/lucene/docs");
|
||||
IndyoIndexer indexer = new IndyoIndexer("/usr/local/index",
|
||||
"/usr/local/indyo/default.config.xml");
|
||||
indexer.index(ds);
|
||||
]]></source>
|
||||
|
||||
<p>
|
||||
FSDataSource is a simple datasource which indexes both files
|
||||
and directories. The metadata FSDataSource adds to each document is:
|
||||
filePath, fileName, fileSize, fileFormat, fileContents,
|
||||
fileLastModifiedDate. Based on the file extension of the files indexed,
|
||||
Indyo will use file content-handlers according to the mappings found in the
|
||||
configuration file. If you're not happy with this list of file
|
||||
metadata, feel free to subclass FSDataSource, or, as we're about
|
||||
to cover next, write your own custom IndexDataSource.
|
||||
</p>
|
||||
|
||||
<p>
|
||||
Get familiar with FSDataSource. You'll find it very handy, both for indexing
|
||||
files directly, as well as nesting it within another datasource. For example,
|
||||
you might need to index a database table in which one of the columns holds
|
||||
the location of a file, and you may want to use FSDataSource to index this
|
||||
file as well.
|
||||
</p>
|
||||
|
||||
<subsection name="Writing your custom IndexDataSource">
|
||||
|
||||
<p>
|
||||
To write a custom IndexDataSource, you need to write a class
|
||||
which implements IndexDataSource, and provides an implementation
|
||||
for the getData() method which returns a Map[]. The javadoc of the
|
||||
getData() method reads:
|
||||
</p>
|
||||
|
||||
<source><![CDATA[
|
||||
/**
|
||||
* Retrieve an array of Maps. Each map represents
|
||||
* a document to be indexed. The key:value pair of the map
|
||||
* is the metadata of the document.
|
||||
*/
|
||||
]]></source>
|
||||
|
||||
<p>
|
||||
So, the getData() method provides a way for Indyo to retrieve document
|
||||
metadata from each IndexDataSource. A simple example of a custom
|
||||
IndexDataSource, HashMapDataSource, is provided below.
|
||||
</p>
|
||||
|
||||
<source><![CDATA[
|
||||
public class HashMapDataSource implements IndexDataSource
|
||||
{
|
||||
private Map data;
|
||||
|
||||
public HashMapDataSource(Map data)
|
||||
{
|
||||
this.data = data;
|
||||
}
|
||||
|
||||
public Map[] getData() throws Exception
|
||||
{
|
||||
return new Map[]{data};
|
||||
}
|
||||
}
|
||||
]]></source>
|
||||
|
||||
<p>
|
||||
As you can see, HashMapDataSource doesn't do anything very useful. It
|
||||
always results in one Document being indexed, and the document's fields
|
||||
depend on the contents of the map that HashMapDataSource was initialized
|
||||
with.
|
||||
</p>
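<p>
To see what HashMapDataSource hands back, the following is a minimal,
self-contained sketch. The IndexDataSource interface declared here is only
a stand-in so the example compiles without Indyo on the classpath, and the
demo class name and field names are made up for illustration. Raw types are
used to match the listings in this tutorial.
</p>

```java
import java.util.HashMap;
import java.util.Map;

// Stand-in for Indyo's IndexDataSource (assumption: a single getData() method).
interface IndexDataSource {
    Map[] getData() throws Exception;
}

// Same idea as the listing above: one map in, one document out.
class HashMapDataSource implements IndexDataSource {
    private final Map data;

    HashMapDataSource(Map data) {
        this.data = data;
    }

    public Map[] getData() {
        // One-element array: exactly one document will be indexed.
        return new Map[] { data };
    }
}

// Hypothetical demo class, not part of Indyo.
class HashMapDataSourceDemo {
    public static void main(String[] args) {
        Map fields = new HashMap();
        fields.put("title", "Indyo tutorial");
        fields.put("author", "Kelvin Tan");

        Map[] docs = new HashMapDataSource(fields).getData();

        // Each array element is one document; each map entry is one field.
        System.out.println("documents: " + docs.length);
        System.out.println("title: " + docs[0].get("title"));
    }
}
```

<p>
In a real application the datasource would be passed to
IndyoIndexer.index() rather than inspected by hand.
</p>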
|
||||
|
||||
<p>
|
||||
A slightly more useful IndexDataSource, SingleDocumentFSDataSource,
|
||||
provides an example of how to nest datasources. Given a directory,
|
||||
SingleDocumentFSDataSource recursively indexes all directories
|
||||
and files within that directory <i>as the same Document</i>. In other
|
||||
words, only one Document is created in the index. This is accomplished
|
||||
by the use of a nested datasource. The code for
|
||||
SingleDocumentFSDataSource is listed below:
|
||||
</p>
|
||||
|
||||
<source><![CDATA[
|
||||
public class SingleDocumentFSDataSource
|
||||
implements IndexDataSource
|
||||
{
|
||||
private File file;
|
||||
|
||||
public SingleDocumentFSDataSource(File file)
|
||||
{
|
||||
this.file = file;
|
||||
}
|
||||
|
||||
public Map[] getData() throws Exception
|
||||
{
|
||||
Map data = new HashMap(1);
|
||||
data.put(NESTED_DATASOURCE, new FSDataSource(file));
|
||||
return new Map[]{data};
|
||||
}
|
||||
}
|
||||
]]></source>
|
||||
|
||||
<p>
|
||||
Nested datasources don't result in a separate Document being created.
|
||||
Use them when working with complex datasources, i.e., datasources
|
||||
which are an aggregation of multiple datasources. The current way to
|
||||
add a nested datasource is using the key "NESTED_DATASOURCE". Indyo
|
||||
accepts an IndexDataSource object, a List of IndexDataSources,
|
||||
or an IndexDataSource[] for this key.
|
||||
</p>
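<p>
The sketch below illustrates one way a consumer could treat the three
accepted value types and fold nested fields into the parent document. It is
not Indyo's actual implementation: the IndexDataSource stand-in, the value
of the NESTED_DATASOURCE constant, and the NestedDataSources helper are all
assumptions made so the example is self-contained. As a simplification, a
nested field with the same name as a parent field overwrites it.
</p>

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Stand-in for Indyo's interface; the constant's value is an assumption.
interface IndexDataSource {
    String NESTED_DATASOURCE = "NESTED_DATASOURCE";

    Map[] getData() throws Exception;
}

// Trivial datasource that yields one document from a fixed map.
class HashMapDataSource implements IndexDataSource {
    private final Map data;

    HashMapDataSource(Map data) {
        this.data = data;
    }

    public Map[] getData() {
        return new Map[] { data };
    }
}

// Hypothetical helper, not part of Indyo.
class NestedDataSources {
    // Normalise a single datasource, an array, or a List to a List.
    static List asList(Object nested) {
        if (nested instanceof IndexDataSource) {
            List single = new ArrayList();
            single.add(nested);
            return single;
        }
        if (nested instanceof IndexDataSource[]) {
            return Arrays.asList((IndexDataSource[]) nested);
        }
        if (nested instanceof List) {
            return (List) nested;
        }
        throw new IllegalArgumentException("Unsupported nested value: " + nested);
    }

    // Merge the fields of all nested datasources into a copy of the parent,
    // so that a single document results.
    static Map flatten(Map document) {
        Map merged = new HashMap(document);
        Object nested = merged.remove(IndexDataSource.NESTED_DATASOURCE);
        if (nested == null) {
            return merged;
        }
        for (Object source : asList(nested)) {
            Map[] docs;
            try {
                docs = ((IndexDataSource) source).getData();
            } catch (Exception e) {
                throw new IllegalStateException("Nested datasource failed", e);
            }
            for (Map doc : docs) {
                merged.putAll(flatten(doc));
            }
        }
        return merged;
    }
}

class NestedDemo {
    public static void main(String[] args) {
        Map file = new HashMap();
        file.put("fileName", "report.pdf");

        Map row = new HashMap();
        row.put("title", "Report");
        List nested = new ArrayList();
        nested.add(new HashMapDataSource(file));
        row.put(IndexDataSource.NESTED_DATASOURCE, nested);

        // The merged map holds "title" and "fileName": one document, two fields.
        System.out.println(NestedDataSources.flatten(row).keySet());
    }
}
```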
|
||||
|
||||
</subsection>
|
||||
|
||||
</section>
|
||||
|
||||
<section name="Where to Go From Here">
|
||||
|
||||
<p>
|
||||
Congratulations! You have completed the Indyo
|
||||
tutorial. Although this has only been an introduction
|
||||
to Indyo, it should be sufficient to get you started
|
||||
with Indyo in your applications. For those of you
|
||||
seeking additional information, there are several other
|
||||
documents on this site that can provide details on
|
||||
various subjects. Lastly, the source code is an
|
||||
invaluable resource when all else fails to provide
|
||||
answers!
|
||||
</p>
|
||||
|
||||
</section>
|
||||
|
||||
<section name="Acknowledgements">
|
||||
|
||||
<p>
|
||||
This document was shamelessly ripped from the extremely well-written
|
||||
and well-organized
|
||||
<a href="http://jakarta.apache.org/turbine/torque/tutorial.html">Torque
|
||||
</a> tutorial. Thanks Pete!
|
||||
</p>
|
||||
|
||||
</section>
|
||||
|
||||
</body>
|
||||
</document>
|
||||
|