mirror of https://github.com/apache/lucene.git
Added simple README and GETTING STARTED docs. Hope it will ease the learning curve somewhat.
git-svn-id: https://svn.apache.org/repos/asf/lucene/java/trunk@150766 13f79535-47bb-0310-9956-ffa450edef68
parent bb68f17330
commit bcfa0cbc60
@@ -0,0 +1,14 @@
The fastest way to get started is to instantiate FSDataSource, passing a File or
Directory into the constructor. Then, in SearchIndexer, invoke indexDataSource and
pass in the FSDataSource as a parameter. The other argument of indexDataSource,
customFields, declares the types of the fields you wish to have indexed.
It is an optional argument.

Now you might want to try writing your own DataSource by writing a class that
implements the DataSource interface. The only method you need to implement in the
DataSource interface is the getData method, which returns an array of Maps.
From the javadoc of this method: "Each map represents a document to be indexed.
The key:value pairs of the map is the metadata of the document."
That should be pretty self-explanatory. What the framework essentially does is
convert every key in the map into a Field, with the key's value becoming the
value of that Field. It's that simple!
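The custom DataSource idea described above can be sketched roughly as follows. This is only an illustration under stated assumptions, not the framework's actual code: the DataSource interface and the Field class here are hypothetical stand-ins declared inside the example, getData is assumed to take no arguments, and InMemoryDataSource and toFields are made-up names. The real interface and the real Lucene Field class may well differ.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class DataSourceSketch {

    /** Hypothetical stand-in for the framework's DataSource interface (signature assumed). */
    interface DataSource {
        /** Each map represents one document; its entries are the document's metadata. */
        Map<String, String>[] getData();
    }

    /** Hypothetical stand-in for a Lucene field: just a name and a value. */
    static final class Field {
        final String name;
        final String value;

        Field(String name, String value) {
            this.name = name;
            this.value = value;
        }

        @Override
        public String toString() {
            return name + "=" + value;
        }
    }

    /** A toy data source that returns two hard-coded documents. */
    static final class InMemoryDataSource implements DataSource {
        @Override
        @SuppressWarnings("unchecked")
        public Map<String, String>[] getData() {
            Map<String, String> first = new LinkedHashMap<>();
            first.put("title", "Getting started");
            first.put("body", "Instantiate a data source and index it.");

            Map<String, String> second = new LinkedHashMap<>();
            second.put("title", "README");
            second.put("body", "A search framework around Lucene.");

            // Generic arrays cannot be created directly, hence the raw array.
            return new Map[] { first, second };
        }
    }

    /** Turns every key of a document map into a field whose value is the key's value. */
    static List<Field> toFields(Map<String, String> document) {
        List<Field> fields = new ArrayList<>();
        for (Map.Entry<String, String> entry : document.entrySet()) {
            fields.add(new Field(entry.getKey(), entry.getValue()));
        }
        return fields;
    }

    public static void main(String[] args) {
        DataSource source = new InMemoryDataSource();
        for (Map<String, String> document : source.getData()) {
            // One line of name=value pairs per document.
            System.out.println(toFields(document));
        }
    }
}
```

In the real framework the conversion step is presumably done for you once the DataSource is handed to the indexer; the toFields helper only spells out the key-to-Field mapping that the text describes.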
@@ -0,0 +1,19 @@
This is the README file for a search framework contribution to Lucene Sandbox.

It is an attempt at constructing a framework around the Lucene search API.
(Can I have a name for it?)

Three interesting features of this framework are:

datasource independence - through various datasource implementations,
a datasource can be indexed regardless of whether it is a database table,
an object, a filesystem directory, or a website.

complex datasource support - complex datasources are containers for what are
potentially new datasources (a Zip archive, an HTML document containing links to
other HTML documents, a Java object which contains references to other objects
to be indexed, etc). The framework has basic support for complex datasources.

pluggable file content handlers - content handlers which 'know' how to index
various file formats (MS Word, Zip, Tar, etc) can be easily configured via an
XML configuration file.