<?xml version="1.0"?>
<document>
<properties>
<author>Otis Gospodnetic</author>
<title>Lucene Sandbox</title>
</properties>
<body>
<section name="Lucene Sandbox">
<p>
The Lucene project also contains a workspace, the Lucene Sandbox, that is open to all Lucene committers as well
as a few other developers. The purpose of the Sandbox is to host various third-party contributions
and to serve as a place to try out new ideas and prepare them for inclusion in the core Lucene
distribution.<br/>
Users are free to experiment with the components developed in the Sandbox, but Sandbox components will
not necessarily be maintained, particularly in their current state.
</p>
<p>
You can access the Lucene Sandbox CVS repository at
<a href="http://cvs.apache.org/viewcvs/jakarta-lucene-sandbox/">http://cvs.apache.org/viewcvs/jakarta-lucene-sandbox/</a>.
</p>
<subsection name="Indyo">
<p>
Indyo is a datasource-independent Lucene indexing framework.
</p>
<p>
The <a href="indyo/tutorial.html">Indyo tutorial</a> explains how to use it.
</p>
</subsection>
<subsection name="LARM">
<p>
LARM is a web crawler optimized for large intranets with up to a couple of hundred hosts.
</p>
<p>
<a href="larm/overview.html">Technical Overview</a>
</p>
</subsection>
<subsection name="Snowball Stemmers for Lucene">
<p>
This project provides pre-compiled versions of the Snowball stemmers
for Lucene.
</p>
<p>
More information can be found on the
<a href="http://jakarta.apache.org/lucene/docs/lucene-sandbox/snowball/">Snowball stemmers page</a>.
</p>
<p>
<a href="http://snowball.tartarus.org/">Background information on Snowball</a>,
a language for stemmers developed by Martin Porter, is also available.
</p>
</subsection>
<subsection name="Ant">
<p>
The Ant contribution provides an Ant task that creates a Lucene index from an Ant fileset. It also
contains an example HTML parser that uses JTidy.
</p>
<p>
<a href="http://cvs.apache.org/viewcvs/jakarta-lucene-sandbox/contributions/ant/">The
CVS repository for the Ant contribution.</a>
</p>
</subsection>
<subsection name="SearchBean">
<p>
SearchBean is a UI component that can be used to browse through the results of a Lucene search.
The SearchBean searches the index for a given query string, retrieves the hits, and then passes
them to the HitsIterator class, which can be used for paging and sorting through search results.
</p>
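<p>
The paging and sorting idea described above can be sketched in plain Java. This is only an illustration
and not the HitsIterator API: the class name PagerSketch and all of its methods are hypothetical, and an
ordinary list stands in for the hits returned by a Lucene search.
</p>
<source><![CDATA[
```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.Comparator;
import java.util.List;

// Hypothetical helper, not part of the SearchBean contribution.
public class PagerSketch<T> {
    private final List<T> hits;
    private final int pageSize;

    public PagerSketch(List<T> hits, int pageSize) {
        if (pageSize <= 0) {
            throw new IllegalArgumentException("pageSize must be positive");
        }
        this.hits = new ArrayList<>(hits); // copy so sorting does not disturb the caller's list
        this.pageSize = pageSize;
    }

    // Re-orders all hits; pages are computed from the sorted order afterwards.
    public void sort(Comparator<? super T> order) {
        hits.sort(order);
    }

    // Number of pages needed to show every hit (the last page may be partial).
    public int pageCount() {
        return (hits.size() + pageSize - 1) / pageSize;
    }

    // Returns the hits on the given zero-based page, or an empty list when out of range.
    public List<T> page(int index) {
        int from = index * pageSize;
        if (index < 0 || from >= hits.size()) {
            return Collections.emptyList();
        }
        int to = Math.min(from + pageSize, hits.size());
        return new ArrayList<>(hits.subList(from, to));
    }
}
```
]]></source>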
<p>
<a href="http://cvs.apache.org/viewcvs/jakarta-lucene-sandbox/contributions/searchBean/">The
CVS repository for the SearchBean contribution.</a>
</p>
</subsection>
<subsection name="Lucene Service for Fulcrum">
<p>
Lucene can be run as a service inside <a href="http://jakarta.apache.org/turbine/fulcrum/index.html">Fulcrum</a>,
which is the services framework from the
<a href="http://jakarta.apache.org/turbine/">Turbine</a> project.</p>
<p>
The implementation consists of a SearchService interface, a LuceneSearchService implementation, and a
SearchResults object that gets an array of Document objects from a Hits object. Calls to the search methods on
the service return the SearchResults object.
</p>
<p>
The service supports querying, but does not support indexing.
</p>
<p>
<a href="http://cvs.apache.org/viewcvs/jakarta-lucene-sandbox/contributions/fulcrum/">
CVS repository for the Fulcrum Service.</a>
</p>
</subsection>
<subsection name="WordNet/Synonyms">
<p>
The Lucene WordNet code consists of a single class that parses a Prolog file
from the WordNet site containing a list of English words and their synonyms.
The class builds a Lucene index from the synonyms file. Your querying code can
then consult this index to build up a set of synonyms for the terms in the
search query.
</p>
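<p>
The parse-then-look-up flow described above might be sketched as follows. This is not the code from the
contribution: the class SynonymSketch and its methods are hypothetical, an in-memory map stands in for the
Lucene index, and the line shape s(synset_id,w_num,'word',ss_type,sense_number,tag_count). with doubled
apostrophes as escapes is an assumption about the Prolog synonyms file.
</p>
<source><![CDATA[
```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical sketch; a HashMap replaces the Lucene index built by the real class.
public class SynonymSketch {
    // Assumed fact shape: s(synset_id,w_num,'word',ss_type,sense_number,tag_count).
    private static final Pattern LINE =
        Pattern.compile("^s\\((\\d+),\\d+,'(.*)',[a-z],\\d+,\\d+\\)\\.$");

    // Groups words by synset id, then maps each word to the other words in its synsets.
    public static Map<String, Set<String>> buildSynonyms(List<String> lines) {
        Map<String, List<String>> bySynset = new HashMap<>();
        for (String line : lines) {
            Matcher m = LINE.matcher(line.trim());
            if (!m.matches()) {
                continue; // skip anything that is not a synonym fact
            }
            // Assumption: apostrophes inside a word are written as two single quotes.
            String word = m.group(2).replace("''", "'").toLowerCase(Locale.ROOT);
            bySynset.computeIfAbsent(m.group(1), k -> new ArrayList<>()).add(word);
        }
        Map<String, Set<String>> synonyms = new HashMap<>();
        for (List<String> group : bySynset.values()) {
            for (String word : group) {
                Set<String> others = synonyms.computeIfAbsent(word, k -> new TreeSet<>());
                for (String other : group) {
                    if (!other.equals(word)) {
                        others.add(other);
                    }
                }
            }
        }
        return synonyms;
    }

    // Returns the query term itself plus any known synonyms, all lower-cased.
    public static Set<String> expand(Map<String, Set<String>> synonyms, String term) {
        String key = term.toLowerCase(Locale.ROOT);
        Set<String> expanded = new TreeSet<>();
        expanded.add(key);
        expanded.addAll(synonyms.getOrDefault(key, Collections.emptySet()));
        return expanded;
    }
}
```
]]></source>
<p>
Querying code could call expand for each term of the user's query and combine the returned words into a
single query, which is the role the synonym index plays in the description above.
</p>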
<p>
More information is available on the <a href="http://www.tropo.com/techno/java/lucene/wordnet.html">Lucene WordNet package</a> page.
<a href="http://www.cogsci.princeton.edu/~wn/">WordNet</a> is an online database of English-language words that contains
synonyms, definitions, and various relationships between synonym sets.
</p>
<p>
<a href="http://cvs.apache.org/viewcvs.cgi/jakarta-lucene-sandbox/contributions/WordNet/">
CVS for the WordNet module.</a>
</p>
</subsection>
<subsection name="SAX/DOM XML Indexing demo">
<p>
This contribution is sample code that demonstrates adding simple XML documents to an index. It creates
a new Document object for each file and then populates the Document with a Field for each XML element, recursively.
Examples are included for both SAX and DOM.
</p>
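<p>
As a rough illustration of the SAX variant, the sketch below walks an XML document and collects one
name/value pair per element that contains text. It is not the contributed demo: the class XmlFieldsSketch
is hypothetical, it uses only the JDK's SAX parser, and each pair merely stands in for a Field that real
code would add to a Lucene Document.
</p>
<source><![CDATA[
```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical sketch; String[]{name, value} pairs stand in for Lucene Field objects.
public class XmlFieldsSketch {

    static class FieldHandler extends DefaultHandler {
        final List<String[]> fields = new ArrayList<>();
        // One text buffer per open element, so nested elements keep their text separate.
        private final Deque<StringBuilder> text = new ArrayDeque<>();

        @Override
        public void startElement(String uri, String localName, String qName, Attributes atts) {
            text.push(new StringBuilder());
        }

        @Override
        public void characters(char[] ch, int start, int length) {
            if (!text.isEmpty()) {
                text.peek().append(ch, start, length);
            }
        }

        @Override
        public void endElement(String uri, String localName, String qName) {
            String value = text.pop().toString().trim();
            if (!value.isEmpty()) {
                fields.add(new String[] {qName, value}); // element name becomes the field name
            }
        }
    }

    // Parses the XML text and returns the collected pairs in end-of-element order.
    public static List<String[]> extractFields(String xml) {
        FieldHandler handler = new FieldHandler();
        try {
            SAXParserFactory.newInstance().newSAXParser().parse(
                new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)), handler);
        } catch (Exception e) {
            throw new IllegalArgumentException("could not parse XML", e);
        }
        return handler.fields;
    }
}
```
]]></source>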
<p>
<a href="http://cvs.apache.org/viewcvs.cgi/jakarta-lucene-sandbox/contributions/XML-Indexing-Demo/">
CVS for the XML Indexing Demo.</a>
</p>
</subsection>
<subsection name="High Frequency Terms">
<p>
The miscellaneous package is for classes that don't fit anywhere else. The only class in it right now determines
which terms occur most frequently in a Lucene index. This can help identify terms that may need to go
into a custom stop word list for better search results.
</p>
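<p>
One common way to pick the most frequent terms is a bounded min-heap, sketched below. This is an
illustration rather than the class from the miscellaneous package: TopTermsSketch is a hypothetical name,
and the map of term to document frequency is assumed to have been read from the index beforehand.
</p>
<source><![CDATA[
```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.Comparator;
import java.util.List;
import java.util.Map;
import java.util.PriorityQueue;

// Hypothetical sketch; the frequency map stands in for counts read from a Lucene index.
public class TopTermsSketch {

    // Returns up to n terms, most frequent first; ties are ordered alphabetically.
    public static List<String> topTerms(Map<String, Integer> docFreqs, int n) {
        List<String> result = new ArrayList<>();
        if (n <= 0) {
            return result;
        }
        // Smallest frequency sorts first, so the heap's head is always the weakest candidate.
        Comparator<Map.Entry<String, Integer>> byFreq = (a, b) -> {
            int c = Integer.compare(a.getValue(), b.getValue());
            return c != 0 ? c : b.getKey().compareTo(a.getKey());
        };
        PriorityQueue<Map.Entry<String, Integer>> heap = new PriorityQueue<>(byFreq);
        for (Map.Entry<String, Integer> entry : docFreqs.entrySet()) {
            heap.offer(entry);
            if (heap.size() > n) {
                heap.poll(); // drop the current least frequent term
            }
        }
        // The heap yields least frequent first, so collect and reverse.
        while (!heap.isEmpty()) {
            result.add(heap.poll().getKey());
        }
        Collections.reverse(result);
        return result;
    }
}
```
]]></source>
<p>
Terms returned by a call such as topTerms(docFreqs, 20) are candidates for a custom stop word list,
although each one still needs a manual check before it is excluded from searches.
</p>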
<p>
<a href="http://cvs.apache.org/viewcvs.cgi/jakarta-lucene-sandbox/contributions/miscellaneous/src/java/org/apache/lucene/misc/">
CVS for miscellaneous classes.</a>
</p>
</subsection>
</section>
</body>
</document>