Added simple README and GETTING STARTED docs. Hope it will ease the learning curve somewhat.

git-svn-id: https://svn.apache.org/repos/asf/lucene/java/trunk@150766 13f79535-47bb-0310-9956-ffa450edef68
Kelvin Tan 2002-05-10 17:20:29 +00:00
parent bb68f17330
commit bcfa0cbc60
2 changed files with 33 additions and 0 deletions


@@ -0,0 +1,14 @@
The fastest way to get started is to instantiate FSDataSource, passing a File or
Directory into the constructor. Then, in SearchIndexer, invoke indexDataSource and
pass the FSDataSource in as a parameter. The other argument of indexDataSource,
customFields, declares the types of the fields you wish to index.
It's an optional argument.
Next, you might want to try writing your own DataSource by writing a class which
implements the DataSource interface. The only method you need to implement in the
DataSource interface is the getData method, which returns an array of Maps.
From the javadoc of this method: "Each map represents a document to be indexed.
The key:value pairs of the map is the metadata of the document."
That should be pretty self-explanatory. What the framework essentially does is
convert every key in the map into a Field, with the value stored under that key
becoming the value of the Field. It's that simple!
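
A minimal sketch of the idea above, under stated assumptions: the nested
DataSource interface below is a hypothetical stand-in (the text only says that
getData returns an array of Maps, so the exact signature is a guess), and
describeFields is an illustrative helper that merely imitates the key-to-Field
conversion with plain strings instead of real Lucene Field objects.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

class DataSourceSketch {

    // ASSUMPTION: stand-in for the framework's DataSource interface.
    // Only the method name and the "array of Maps" return type come from the text.
    interface DataSource {
        Map<?, ?>[] getData();
    }

    // A custom DataSource that serves two hard-coded documents.
    static class InMemoryDataSource implements DataSource {
        @Override
        public Map<?, ?>[] getData() {
            // Each map is one document; each key:value pair is a piece of metadata.
            Map<String, String> first = new LinkedHashMap<>();
            first.put("title", "Getting Started");
            first.put("body", "Instantiate a data source and index it.");

            Map<String, String> second = new LinkedHashMap<>();
            second.put("title", "README");
            second.put("body", "A framework around the search API.");

            return new Map<?, ?>[] { first, second };
        }
    }

    // ASSUMPTION: illustrative helper, not part of the framework. It turns every
    // key into a "field name" and the associated value into the "field value".
    static List<String> describeFields(Map<?, ?> document) {
        List<String> fields = new ArrayList<>();
        for (Map.Entry<?, ?> entry : document.entrySet()) {
            fields.add(entry.getKey() + "=" + entry.getValue());
        }
        return fields;
    }

    public static void main(String[] args) {
        DataSource source = new InMemoryDataSource();
        for (Map<?, ?> document : source.getData()) {
            System.out.println(describeFields(document));
        }
    }
}
```

In the real framework an instance of such a class would presumably be handed to
SearchIndexer's indexDataSource in place of the FSDataSource.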


@@ -0,0 +1,19 @@
This is the README file for a search framework contribution to the Lucene Sandbox.
It is an attempt at constructing a framework around the Lucene search API.
(Can I have a name for it?)
Three interesting features of this framework are:
1. datasource independence - whether the datasource is a database table, an
   object, a filesystem directory, or a website, it can be indexed through the
   corresponding datasource implementation.
2. complex datasource support - complex datasources are containers for what are
   potentially new datasources (a Zip archive, an HTML document containing links
   to other HTML documents, a Java object which contains references to other
   objects to be indexed, etc). The framework has basic support for complex
   datasources.
3. pluggable file content handlers - content handlers which 'know' how to index
   various file formats (MS Word, Zip, Tar, etc) can be easily configured via an
   XML configuration file.
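
To make the pluggable content handler idea concrete, here is a rough sketch. It
is not the framework's API: ContentHandler, Registry, register and handlerFor
are hypothetical names, and the handlers are registered in code here, whereas
the framework is described as reading them from an XML configuration file whose
format is not shown in this README.

```java
import java.nio.charset.StandardCharsets;
import java.util.HashMap;
import java.util.Locale;
import java.util.Map;
import java.util.Optional;

class ContentHandlerSketch {

    // ASSUMPTION: hypothetical handler contract; turns raw file bytes into text.
    interface ContentHandler {
        String extractText(byte[] content);
    }

    // ASSUMPTION: hypothetical registry mapping file extensions to handlers.
    static class Registry {
        private final Map<String, ContentHandler> byExtension = new HashMap<>();

        void register(String extension, ContentHandler handler) {
            byExtension.put(extension.toLowerCase(Locale.ROOT), handler);
        }

        // Looks up a handler by the file name's extension, ignoring case.
        Optional<ContentHandler> handlerFor(String fileName) {
            int dot = fileName.lastIndexOf('.');
            if (dot < 0 || dot == fileName.length() - 1) {
                return Optional.empty(); // no extension, so no handler
            }
            String extension = fileName.substring(dot + 1).toLowerCase(Locale.ROOT);
            return Optional.ofNullable(byExtension.get(extension));
        }
    }

    public static void main(String[] args) {
        Registry registry = new Registry();
        // A plain-text handler; handlers for Word, Zip, Tar, etc. would plug in the same way.
        registry.register("txt", content -> new String(content, StandardCharsets.UTF_8));

        byte[] bytes = "hello lucene".getBytes(StandardCharsets.UTF_8);
        String text = registry.handlerFor("notes.TXT")
                .map(handler -> handler.extractText(bytes))
                .orElse("(no handler registered)");
        System.out.println(text);
        System.out.println(registry.handlerFor("archive.zip").isPresent());
    }
}
```

Keying the lookup on the file extension is only one possible design; the actual
framework may select handlers differently.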