This tutorial introduces first-time users to Indyo, a datasource-independent Lucene indexing framework.
It covers how to obtain Indyo, how to configure it, and how to index a directory on a filesystem.
First, you need to obtain Indyo. As of this writing, Indyo is only available via CVS, from the "jakarta-lucene-sandbox" repository. See the Jakarta CVS page for instructions on accessing files via CVS.
Get a copy of Ant if you don't already have it installed. Then simply type "ant" in the directory where your local copy of the Indyo sources resides.
Voila! You should now have a jar file "indyo-<version number>.jar".
The "src/conf" folder contains a default configuration file which is sufficient for normal use.
Congratulations, you have finally reached the fun part of this tutorial. This is where you'll discover the power of Indyo.
To index a datasource, first instantiate the respective datasource, then hand it to IndyoIndexer for indexing. For example:
FSDataSource is a simple datasource which indexes both files and directories. The metadata FSDataSource adds to each document is: filePath, fileName, fileSize, fileFormat, fileContents, fileLastModifiedDate. Based on the file extension of the files indexed, Indyo will use file content-handlers according to the mappings found in the configuration file. If you're not happy with this list of file metadata, feel free to subclass FSDataSource, or, as we're about to cover next, write your own custom IndexDataSource.
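To make the field list above concrete, here is a minimal sketch, using only the JDK, of how such a metadata map could be assembled for a single file. FileMetadataSketch and describe are hypothetical names and are not part of Indyo. The fileContents field is left out because it would be produced by a content handler, and treating the lower-cased file extension as fileFormat is an assumption, not a statement about how FSDataSource works.

```java
import java.io.File;
import java.util.Date;
import java.util.LinkedHashMap;
import java.util.Locale;
import java.util.Map;

// Hypothetical helper, NOT part of Indyo: builds a metadata map that
// uses the field names listed in the tutorial.
class FileMetadataSketch {

    static Map<String, Object> describe(File file) {
        Map<String, Object> fields = new LinkedHashMap<>();
        String name = file.getName();
        int dot = name.lastIndexOf('.');

        fields.put("filePath", file.getPath());
        fields.put("fileName", name);
        // length() and lastModified() return 0 if the file does not exist.
        fields.put("fileSize", Long.valueOf(file.length()));
        // Assumption: the format is the lower-cased extension, or "" if none.
        fields.put("fileFormat",
                dot >= 0 ? name.substring(dot + 1).toLowerCase(Locale.ROOT) : "");
        fields.put("fileLastModifiedDate", new Date(file.lastModified()));
        return fields;
    }
}
```

A subclass or custom datasource that wants different metadata would add or remove entries in a map like this one before handing it to the indexer.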
Get familiar with FSDataSource. You'll find it very handy, both for indexing files directly and for nesting within another datasource. For example, you might need to index a database table in which one of the columns holds the location of a file, and you may want to use FSDataSource to index that file as well.
To write a custom IndexDataSource, you need to write a class which implements IndexDataSource, and provides an implementation for the getData() method which returns a Map[]. The javadoc of the getData() method reads:
So, the getData() method provides a way for Indyo to retrieve document metadata from each IndexDataSource. A simple example of a custom IndexDataSource, HashMapDataSource, is provided below.
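The following is a minimal sketch of such a datasource, written from the description in this tutorial rather than copied from the Indyo sources. The IndexDataSource interface declared here is a stand-in so that the example compiles on its own; it assumes only the getData() method returning a Map[] mentioned above, and the real interface may differ (for instance, it may declare exceptions).

```java
import java.util.Map;

// Stand-in for Indyo's IndexDataSource (an assumption, see the text above).
interface IndexDataSource {
    // Each Map in the array describes one document to index;
    // its key/value pairs are that document's metadata.
    Map[] getData();
}

// A datasource backed by a single map handed in by the caller.
class HashMapDataSource implements IndexDataSource {
    private final Map<String, Object> data;

    HashMapDataSource(Map<String, Object> data) {
        this.data = data;
    }

    @Override
    public Map[] getData() {
        // Always exactly one document, whose fields are the map's entries.
        return new Map[] { data };
    }
}
```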
As you can see, HashMapDataSource doesn't do anything very useful. It always results in one Document being indexed, and the document's fields depend on the contents of the map that HashMapDataSource was initialized with.
A slightly more useful IndexDataSource, SingleDocumentFSDataSource, provides an example of how to nest datasources. Given a directory, SingleDocumentFSDataSource recursively indexes all directories and files within that directory as the same Document. In other words, only one Document is created in the index. This is accomplished by the use of a nested datasource. The code for SingleDocumentFSDataSource is listed below:
Nested datasources don't result in a separate Document being created. Use them when working with complex datasources, i.e., datasources which are an aggregation of multiple datasources. The current way to add a nested datasource is using the key "NESTED_DATASOURCE". Indyo accepts an IndexDataSource object, a List of IndexDataSources, or an IndexDataSource[] for this key.
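To illustrate the idea, here is a toy sketch of how the fields of nested datasources could be folded into their parent document, handling the three value forms accepted for the "NESTED_DATASOURCE" key. This is not Indyo's implementation: the IndexDataSource interface is a stand-in with only the getData() method described earlier, NestedSketch, flatten and asSources are hypothetical names, and letting a later field overwrite an earlier one with the same name is an assumption made for brevity.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Stand-in for Indyo's IndexDataSource (an assumption).
interface IndexDataSource {
    Map[] getData();
}

// Hypothetical helper, NOT part of Indyo.
class NestedSketch {
    static final String NESTED = "NESTED_DATASOURCE";

    // Returns one flat field map: nested sources contribute their fields
    // to the parent document instead of becoming separate documents.
    static Map<String, Object> flatten(Map<?, ?> doc) {
        Map<String, Object> out = new LinkedHashMap<>();
        for (Map.Entry<?, ?> e : doc.entrySet()) {
            if (NESTED.equals(e.getKey())) {
                for (IndexDataSource ds : asSources(e.getValue())) {
                    for (Map nestedDoc : ds.getData()) {
                        // Assumption: same-named fields simply overwrite.
                        out.putAll(flatten(nestedDoc));
                    }
                }
            } else {
                out.put(String.valueOf(e.getKey()), e.getValue());
            }
        }
        return out;
    }

    // Normalizes a single source, an array, or a List into a List.
    static List<IndexDataSource> asSources(Object value) {
        List<IndexDataSource> sources = new ArrayList<>();
        if (value instanceof IndexDataSource) {
            sources.add((IndexDataSource) value);
        } else if (value instanceof IndexDataSource[]) {
            sources.addAll(Arrays.asList((IndexDataSource[]) value));
        } else if (value instanceof List) {
            for (Object o : (List<?>) value) {
                sources.add((IndexDataSource) o);
            }
        }
        return sources;
    }
}
```

In this sketch, a parent map holding an "id" field plus a nested datasource that yields a "fileName" field flattens to a single map containing both fields and no "NESTED_DATASOURCE" entry.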
Congratulations! You have completed the Indyo tutorial. Although this has only been an introduction to Indyo, it should be sufficient to get you started with Indyo in your applications. For those of you seeking additional information, there are several other documents on this site that provide details on various subjects. Lastly, the source code is an invaluable resource when all else fails to provide answers!
This document was shamelessly ripped from the extremely well-written and well-organized Torque tutorial. Thanks Pete!