diff --git a/docs/lucene-sandbox/index.html b/docs/lucene-sandbox/index.html index f45719ce498..cf9c561311d 100644 --- a/docs/lucene-sandbox/index.html +++ b/docs/lucene-sandbox/index.html @@ -112,6 +112,24 @@ You can access Lucene Sandbox CVS repository at http://cvs.apache.org/viewcvs/jakarta-lucene-sandbox/.

+ + + + +
+ + Indyo + +
+
+

+Indyo is a datasource-independent Lucene indexing framework. +

+

+A tutorial for using Indyo can be found here. +

+
+

diff --git a/docs/lucene-sandbox/indyo/tutorial.html b/docs/lucene-sandbox/indyo/tutorial.html new file mode 100644 index 00000000000..bbe6a302256 --- /dev/null +++ b/docs/lucene-sandbox/indyo/tutorial.html @@ -0,0 +1,466 @@ + + + + + + + + + + + + + + + + + + + Jakarta Lucene - Indyo Tutorial + + + + + + + + + +
+ + +Jakarta Lucene +
+ + + + + + + + + + + + +
+
+
+

About

+ +

Resources

+ +

Plans

+ +

Download

+ +

Jakarta

+ +
+ + + + +
+ + About this Tutorial + +
+
+

+ This tutorial is intended to give first-time users an + introduction to using Indyo, a datasource-independent + Lucene indexing framework. +

+

+ It covers how to obtain Indyo, how to configure it, + and how to index a directory on a filesystem. +

+
+

+

+ + + + +
+ + Step 1: Obtaining Indyo + +
+
+

+ First, you need to obtain Indyo. As + of this writing, Indyo is only available via CVS, from the + "jakarta-lucene-sandbox" repository. See + Jakarta CVS + for instructions on accessing files via CVS.

+
+

+

+ + + + +
+ + Step 2: Building Indyo + +
+
+

+ Get a copy of Ant if + you don't already have it installed. Then simply type "ant" in the + directory where the local copy of the Indyo sources resides. +

+

+ Voila! You should now have a jar file "indyo-<version number>.jar". +

+
+

+

+ + + + +
+ + Step 3: Configuring Indyo + +
+
+

+ The "src/conf" folder contains a default configuration file which is + sufficient for normal use. +

+
+

+

+ + + + +
+ + Step 4: Using Indyo + +
+
+

+ Congratulations, you have finally reached the fun + part of this tutorial. This is where you'll discover + the power of Indyo. +

+

+ To index a datasource, first instantiate the respective + datasource, then hand it to IndyoIndexer for indexing. + For example: +

+
+ + + + + + + + + + + + + + + + +
+IndexDataSource ds = new FSDataSource("/usr/local/lucene/docs");
+IndyoIndexer indexer = new IndyoIndexer("/usr/local/index", 
+                    "/usr/local/indyo/default.config.xml");
+indexer.index(ds);                    
+
+
+

+ FSDataSource is a simple datasource which indexes both files + and directories. The metadata FSDataSource adds to each document is: + filePath, fileName, fileSize, fileFormat, fileContents, + fileLastModifiedDate. Based on the file extension of the files indexed, + Indyo will use file content-handlers according to the mappings found in the + configuration file. If you're not happy with this list of file + metadata, feel free to subclass FSDataSource, or, as we're about + to cover next, write your own custom IndexDataSource. +
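+ To make the metadata list above concrete, here is a minimal sketch of how such a per-file map could be assembled with plain JDK calls. It is not FSDataSource's actual code: the class name FileMetadataSketch, the value types, the UTF-8 decoding, and deriving fileFormat from the file extension are all assumptions made for illustration, and the sketch uses modern Java syntax rather than the Java of the original sources.

```java
import java.io.File;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.util.HashMap;
import java.util.Map;

// Hypothetical helper, not part of Indyo: builds one metadata map per file
// using the field names listed in the tutorial.
class FileMetadataSketch {

    static Map<String, Object> describe(File file) throws IOException {
        Map<String, Object> doc = new HashMap<>();
        String name = file.getName();
        int dot = name.lastIndexOf('.');

        doc.put("filePath", file.getPath());
        doc.put("fileName", name);
        doc.put("fileSize", file.length());
        // Assumption: the format is simply the extension after the last dot.
        doc.put("fileFormat", dot >= 0 ? name.substring(dot + 1) : "");
        doc.put("fileLastModifiedDate", file.lastModified());
        // Assumption: the file is small, plain text, and UTF-8 encoded.
        doc.put("fileContents",
                new String(Files.readAllBytes(file.toPath()), StandardCharsets.UTF_8));
        return doc;
    }

    public static void main(String[] args) throws IOException {
        File tmp = File.createTempFile("indyo", ".txt");
        tmp.deleteOnExit();
        Files.write(tmp.toPath(), "hello".getBytes(StandardCharsets.UTF_8));
        System.out.println(describe(tmp));
    }
}
```

+ In Indyo itself, the content-handler mapped to the file extension in the configuration file is what extracts the contents, so reading the raw bytes here only stands in for that step.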

+

+ Get familiar with FSDataSource. You'll find it very handy, both for indexing + files directly and for nesting within another datasource. For example, + you might need to index a database table in which one of the columns holds + the location of a file, and you may want to use FSDataSource to index this + file as well. +

+ + + + +
+ + Writing your custom IndexDataSource + +
+
+

+ To write a custom IndexDataSource, you need to write a class + which implements IndexDataSource, and provides an implementation + for the getData() method which returns a Map[]. The javadoc of the + getData() method reads: +

+
+ + + + + + + + + + + + + + + + +
+/**
+ * Retrieve an array of Maps. Each map represents
+ * a document to be indexed. The key:value pairs of the map
+ * are the metadata of the document.
+ */
+
+
+

+ So, the getData() method provides a way for Indyo to retrieve document + metadata from each IndexDataSource. A simple example of a custom + IndexDataSource, HashMapDataSource, is provided below. +

+
+ + + + + + + + + + + + + + + + +
+import java.util.Map;
+
+public class HashMapDataSource implements IndexDataSource
+{
+    private Map data;
+
+    public HashMapDataSource(Map data)
+    {
+        this.data = data;
+    }
+
+    public Map[] getData() throws Exception
+    {
+        return new Map[]{data};
+    }
+}
+
+
+

+ As you can see, HashMapDataSource doesn't do anything very useful. It + always results in one Document being indexed, and the document's fields + depend on the contents of the map that HashMapDataSource was initialized + with. +
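+ The following is a small, self-contained sketch of how a map-backed datasource of this kind might be exercised. The IndexDataSource interface declared here is only a stand-in so the example compiles on its own; it assumes the real interface declares the single getData() method described above. The field names "title" and "author" and the demo class are hypothetical.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical demo class: builds one map and wraps it in the datasource.
class HashMapDataSourceDemo {
    public static void main(String[] args) {
        Map<String, String> fields = new HashMap<>();
        fields.put("title", "Indyo tutorial");
        fields.put("author", "Kelvin Tan");

        Map[] docs = new HashMapDataSource(fields).getData();
        System.out.println(docs.length + " document(s), fields: " + docs[0].keySet());
    }
}

// Stand-in for Indyo's interface (assumption: one method, as in the tutorial).
interface IndexDataSource {
    Map[] getData() throws Exception;
}

// Same shape as the tutorial's class; getData() omits "throws Exception",
// which Java permits for an implementing method.
class HashMapDataSource implements IndexDataSource {
    private final Map data;

    HashMapDataSource(Map data) {
        this.data = data;
    }

    public Map[] getData() {
        // One map in, one document out.
        return new Map[] { data };
    }
}
```

+ In a real application the datasource would be handed to IndyoIndexer.index() rather than inspected directly, as in the FSDataSource example earlier.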

+

+ A slightly more useful IndexDataSource, SingleDocumentFSDataSource, + provides an example of how to nest datasources. Given a directory, + SingleDocumentFSDataSource recursively indexes all directories + and files within that directory as the same Document. In other + words, only one Document is created in the index. This is accomplished + by the use of a nested datasource. The code for + SingleDocumentFSDataSource is listed below: +

+
+ + + + + + + + + + + + + + + + +
 
+import java.io.File;
+import java.util.HashMap;
+import java.util.Map;
+
+public class SingleDocumentFSDataSource
+        implements IndexDataSource
+{
+    private File file;
+
+    public SingleDocumentFSDataSource(File file)
+    {
+        this.file = file;
+    }
+
+    public Map[] getData() throws Exception
+    {
+        Map data = new HashMap(1);
+        data.put(NESTED_DATASOURCE, new FSDataSource(file));
+        return new Map[]{data};
+    }
+}
+
+
+

+ Nested datasources don't result in a separate Document being created. + Use them when working with complex datasources, i.e., datasources + which are an aggregation of multiple datasources. The current way to + add a nested datasource is using the key "NESTED_DATASOURCE". Indyo + accepts an IndexDataSource object, a List of IndexDataSources, + or an IndexDataSource[] for this key. +
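+ Because the "NESTED_DATASOURCE" key accepts three different shapes of value, code that consumes it has to tell them apart. The sketch below shows one way that could be done; it is not Indyo's implementation. The helper asList and the class NestedDataSourceSketch are hypothetical, and IndexDataSource is again a local stand-in assumed to declare only getData().

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;
import java.util.Map;

// Hypothetical helper: normalizes a nested-datasource value to a flat list.
class NestedDataSourceSketch {

    static List<IndexDataSource> asList(Object value) {
        if (value instanceof IndexDataSource) {
            // A single datasource.
            return Collections.singletonList((IndexDataSource) value);
        }
        if (value instanceof IndexDataSource[]) {
            // An array of datasources.
            return Arrays.asList((IndexDataSource[]) value);
        }
        if (value instanceof List) {
            // A List of datasources; each element is checked by the cast.
            List<IndexDataSource> out = new ArrayList<>();
            for (Object element : (List<?>) value) {
                out.add((IndexDataSource) element);
            }
            return out;
        }
        throw new IllegalArgumentException(
                "Unsupported nested datasource value: " + value);
    }

    public static void main(String[] args) {
        IndexDataSource a = () -> new Map[0];
        IndexDataSource b = () -> new Map[0];
        System.out.println(asList(a).size());
        System.out.println(asList(new IndexDataSource[] { a, b }).size());
        System.out.println(asList(Arrays.asList(a, b)).size());
    }
}

// Stand-in for Indyo's interface (assumption: one method, as in the tutorial).
interface IndexDataSource {
    Map[] getData() throws Exception;
}
```

+ Rejecting anything else with an exception is a design choice of this sketch; how Indyo treats an unexpected value under this key is not stated in the tutorial.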

+
+

+
+

+

+ + + + +
+ + Where to Go From Here + +
+
+

+ Congratulations! You have completed the Indyo + tutorial. Although this has only been an introduction + to Indyo, it should be sufficient to get you started + with Indyo in your applications. For those of you + seeking additional information, there are several other + documents on this site that can provide details on + various subjects. Lastly, the source code is an + invaluable resource when all else fails to provide + answers! +

+
+

+

+ + + + +
+ + Acknowledgements + +
+
+

+ This document was shamelessly ripped from the extremely well-written + and well-organized + Torque + tutorial. Thanks Pete! +

+
+

+

+
+
+
+
+ Copyright © 1999-2002, Apache Software Foundation +
+
+ + + + + + + + + + + + + + + + + + + + + + + diff --git a/xdocs/lucene-sandbox/index.xml b/xdocs/lucene-sandbox/index.xml index 0a78523d2b1..476a7027db1 100644 --- a/xdocs/lucene-sandbox/index.xml +++ b/xdocs/lucene-sandbox/index.xml @@ -17,6 +17,16 @@ not necessarily be maintained, particularly in their current state. You can access Lucene Sandbox CVS repository at http://cvs.apache.org/viewcvs/jakarta-lucene-sandbox/.

+ + +

+Indyo is a datasource-independent Lucene indexing framework. +

+

+A tutorial for using Indyo can be found here. +

+
+ diff --git a/xdocs/lucene-sandbox/indyo/tutorial.xml b/xdocs/lucene-sandbox/indyo/tutorial.xml new file mode 100644 index 00000000000..3f1a282fb1e --- /dev/null +++ b/xdocs/lucene-sandbox/indyo/tutorial.xml @@ -0,0 +1,221 @@ + + + + + + Indyo Tutorial + Kelvin Tan + + + + +
+ +

+ This tutorial is intended to give first-time users an + introduction to using Indyo, a datasource-independent + Lucene indexing framework. +

+ +

+ It covers how to obtain Indyo, how to configure it, + and how to index a directory on a filesystem. +

+ +
+ +
+ +

+ First, you need to obtain Indyo. As + of this writing, Indyo is only available via CVS, from the + "jakarta-lucene-sandbox" repository. See + Jakarta CVS + for instructions on accessing files via CVS.

+ + +
+ +
+ +

+ Get a copy of Ant if + you don't already have it installed. Then simply type "ant" in the + directory where the local copy of the Indyo sources resides. +

+ +

+ Voila! You should now have a jar file "indyo-<version number>.jar". +

+ +
+ +
+ +

+ The "src/conf" folder contains a default configuration file which is + sufficient for normal use. +

+ +
+ +
+ +

+ Congratulations, you have finally reached the fun + part of this tutorial. This is where you'll discover + the power of Indyo. +

+ +

+ To index a datasource, first instantiate the respective + datasource, then hand it to IndyoIndexer for indexing. + For example: +

+ + + +

+ FSDataSource is a simple datasource which indexes both files + and directories. The metadata FSDataSource adds to each document is: + filePath, fileName, fileSize, fileFormat, fileContents, + fileLastModifiedDate. Based on the file extension of the files indexed, + Indyo will use file content-handlers according to the mappings found in the + configuration file. If you're not happy with this list of file + metadata, feel free to subclass FSDataSource, or, as we're about + to cover next, write your own custom IndexDataSource. +

+ +

+ Get familiar with FSDataSource. You'll find it very handy, both for indexing + files directly and for nesting within another datasource. For example, + you might need to index a database table in which one of the columns holds + the location of a file, and you may want to use FSDataSource to index this + file as well. +

+ + + +

+ To write a custom IndexDataSource, you need to write a class + which implements IndexDataSource, and provides an implementation + for the getData() method which returns a Map[]. The javadoc of the + getData() method reads: +

+ + + +

+ So, the getData() method provides a way for Indyo to retrieve document + metadata from each IndexDataSource. A simple example of a custom + IndexDataSource, HashMapDataSource, is provided below. +

+ + + +

+ As you can see, HashMapDataSource doesn't do anything very useful. It + always results in one Document being indexed, and the document's fields + depend on the contents of the map that HashMapDataSource was initialized + with. +

+ +

+ A slightly more useful IndexDataSource, SingleDocumentFSDataSource, + provides an example of how to nest datasources. Given a directory, + SingleDocumentFSDataSource recursively indexes all directories + and files within that directory as the same Document. In other + words, only one Document is created in the index. This is accomplished + by the use of a nested datasource. The code for + SingleDocumentFSDataSource is listed below: +

+ + + +

+ Nested datasources don't result in a separate Document being created. + Use them when working with complex datasources, i.e., datasources + which are an aggregation of multiple datasources. The current way to + add a nested datasource is using the key "NESTED_DATASOURCE". Indyo + accepts an IndexDataSource object, a List of IndexDataSources, + or an IndexDataSource[] for this key. +

+ +
+ +
+ +
+ +

+ Congratulations! You have completed the Indyo + tutorial. Although this has only been an introduction + to Indyo, it should be sufficient to get you started + with Indyo in your applications. For those of you + seeking additional information, there are several other + documents on this site that can provide details on + various subjects. Lastly, the source code is an + invaluable resource when all else fails to provide + answers! +

+ +
+ +
+ +

+ This document was shamelessly ripped from the extremely well-written + and well-organized + Torque + tutorial. Thanks Pete! +

+ +
+ + +
+