lucene/sandbox/contributions/indyo/docs/introduction.txt

9 lines
1.3 KiB
Plaintext
Raw Normal View History

Indyo is a datasource-independent Lucene indexing framework. What this means, is that Indyo allows a myriad of sources from which data is fed to the search engine to be indexed. Datasources can take the form of traditional storage mediums (filesystem, database, web site, etc), objects, complex datasources which consist of a mixture of objects and storage medium, and pretty much anything which implements com.relevanz.indyo.IndexDataSource. If it's a file that's being indexed (via com.relevanz.indyo.FSDataSource), the contents of the file can be indexed by a class which implements com.relevanz.indyo.contenthandler.FileContentHandler (e.g. TextHandler, ZIPHandler, etc). Via the datasource, applications can also associate a search result object with the object that was indexed (or optionally use Peter's SearchBean contribution), for display purposes.
To summarize, if you:
a) Want a way of indexing various sources of data, and even nested datasources (like indexing a HTML file, which spawns a custom datasource, say RemoteHTMLDataSource, for every link it encounters)
b) Simply want a pluggable system of indexing different types of file content (currently plain text, Zip, Tar, GZip file formats are supported, but writing new file content handlers are easy)
then Indyo may be worth checking out.