<?xml version="1.0"?>
<document>
<properties>
<author>Otis Gospodnetic</author>
<title>Lucene Sandbox</title>
</properties>
<body>
<section name="Lucene Sandbox">
<p>
The Lucene project also contains a workspace, the Lucene Sandbox, that is open to all Lucene committers as well
as a few other developers. The purpose of the Sandbox is to host various third-party contributions
and to serve as a place to try out new ideas and prepare them for inclusion in the core Lucene
distribution.<br/>
Users are free to experiment with the components developed in the Sandbox, but those components will
not necessarily be maintained, particularly in their current state.
</p>
<p>
You can access the Lucene Sandbox CVS repository at
<a href="http://cvs.apache.org/viewcvs/jakarta-lucene-sandbox/">http://cvs.apache.org/viewcvs/jakarta-lucene-sandbox/</a>.
</p>
<subsection name="Indyo">
<p>
Indyo is a datasource-independent Lucene indexing framework.
</p>
<p>
A <a href="indyo/tutorial.html">tutorial for using Indyo</a> is available.
</p>
</subsection>
<subsection name="LARM">
<p>
LARM is a web crawler optimized for large intranets with up to a couple of hundred hosts.
</p>
<p>
<a href="larm/overview.html">Technical Overview</a>
</p>
</subsection>
<subsection name="Snowball Stemmers for Lucene">
<p>
This project provides pre-compiled versions of the Snowball stemmers
for Lucene.
</p>
<p>
More information can be found
<a href="http://jakarta.apache.org/lucene/docs/lucene-sandbox/snowball/">here</a>.
</p>
<p>
<a href="http://snowball.tartarus.org/">Background information on Snowball</a>,
which is a language for stemmers developed by Martin Porter.
</p>
</subsection>
<subsection name="Ant">
<p>
The Ant contribution is an Ant task that creates a Lucene index from an Ant fileset. It also
contains an example HTML parser that uses JTidy.
</p>
<p>
<a href="http://cvs.apache.org/viewcvs/jakarta-lucene-sandbox/contributions/ant/">The
CVS repository for the Ant contribution.</a>
</p>
</subsection>
<subsection name="SearchBean">
<p>
SearchBean is a UI component for browsing the results of a Lucene search.
It searches the index for a given query string, retrieves the hits, and wraps
them in a HitsIterator, which can be used to page through and sort the search results.
</p>
<p>
<a href="http://cvs.apache.org/viewcvs/jakarta-lucene-sandbox/contributions/searchBean/">The
CVS repository for the SearchBean contribution.</a>
</p>
</subsection>
<subsection name="Lucene Service for Fulcrum">
<p>
Lucene can be run as a service inside <a href="http://jakarta.apache.org/turbine/fulcrum/index.html">Fulcrum</a>,
which is the services framework from the
<a href="http://jakarta.apache.org/turbine/">Turbine</a> project.</p>
<p>
The implementation consists of a SearchService interface, a LuceneSearchSearchService implementation, and a
SearchResults object that gets an array of Document objects from a Hits object. Calls to the search methods on
the service return the SearchResults object.
</p>
<p>
The service supports querying, but does not support indexing.
</p>
<p>
<a href="http://cvs.apache.org/viewcvs/jakarta-lucene-sandbox/contributions/fulcrum/">
CVS repository for the Fulcrum Service.</a>
</p>
</subsection>
<subsection name="WordNet/Synonyms">
<p>
The Lucene WordNet code consists of a single class that parses a Prolog file
from the WordNet site containing a list of English words and their synonyms.
The class builds a Lucene index from this synonyms file. Your querying code can
then search this index to build up a set of synonyms for the terms in the
search query.
</p>
<p>
More information is available on the <a href="http://www.tropo.com/techno/java/lucene/wordnet.html">Lucene WordNet package</a> page.
<a href="http://www.cogsci.princeton.edu/~wn/">WordNet</a> is an online database of English words that contains
synonyms, definitions, and various relationships between synonym sets.
</p>
<p>
<a href="http://cvs.apache.org/viewcvs.cgi/jakarta-lucene-sandbox/contributions/WordNet/">
CVS for the WordNet module.</a>
</p>
</subsection>
<subsection name="SAX/DOM XML Indexing demo">
<p>
This contribution is sample code that demonstrates adding simple XML documents to an index. It creates
a new Document object for each file and then populates the Document with a Field for each XML element, recursively.
Examples are included for both SAX and DOM.
</p>
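<p>
The recursive walk described above can be sketched roughly as follows. This is not the contributed demo code: it
uses only the JDK's DOM parser, and the XmlFieldSketch class and its NameValue type are hypothetical stand-ins for
a Lucene Document and its Fields, so that the example runs without Lucene on the classpath.
</p>
<source><![CDATA[
```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;

final class XmlFieldSketch {

    /** Hypothetical stand-in for a Lucene Field: an element name plus its text. */
    static final class NameValue {
        final String name;
        final String value;

        NameValue(String name, String value) {
            this.name = name;
            this.value = value;
        }
    }

    /** Parses the XML and returns one NameValue per element that has text of its own. */
    static List<NameValue> extractFields(String xml) {
        try {
            Document dom = DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
            List<NameValue> fields = new ArrayList<>();
            collect(dom.getDocumentElement(), fields);
            return fields;
        } catch (Exception e) {
            throw new IllegalArgumentException("Could not parse XML", e);
        }
    }

    /** Records the element's direct text, then recurses into its child elements. */
    private static void collect(Element element, List<NameValue> fields) {
        NodeList children = element.getChildNodes();

        // Gather only the text that belongs directly to this element.
        StringBuilder text = new StringBuilder();
        for (int i = 0; i < children.getLength(); i++) {
            Node child = children.item(i);
            short type = child.getNodeType();
            if (type == Node.TEXT_NODE || type == Node.CDATA_SECTION_NODE) {
                text.append(child.getNodeValue());
            }
        }
        String value = text.toString().trim();
        if (!value.isEmpty()) {
            fields.add(new NameValue(element.getTagName(), value));
        }

        // Visit nested elements in document order.
        for (int i = 0; i < children.getLength(); i++) {
            Node child = children.item(i);
            if (child.getNodeType() == Node.ELEMENT_NODE) {
                collect((Element) child, fields);
            }
        }
    }
}
```
]]></source>
<p>
In a real indexer, each NameValue would presumably become a Field added to the Document for that file; how the
demo itself names and stores its fields is not shown here.
</p>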
<p>
<a href="http://cvs.apache.org/viewcvs.cgi/jakarta-lucene-sandbox/contributions/XML-Indexing-Demo/">
CVS for the XML Indexing Demo.</a>
</p>
</subsection>
<subsection name="High Frequency Terms">
<p>
The miscellaneous package is for classes that don't fit anywhere else. The only class in it right now determines
which terms occur most often in a Lucene index. This can be useful for deciding which terms to add to
a custom stop word list for better search results.
</p>
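<p>
To illustrate the idea of picking out the most frequent terms, here is a minimal sketch. It is not the class from
the miscellaneous package: the TopTermsSketch class is hypothetical, and a plain map of term to frequency stands in
for the statistics that would normally be read from a Lucene index.
</p>
<source><![CDATA[
```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.Comparator;
import java.util.List;
import java.util.Map;
import java.util.PriorityQueue;

final class TopTermsSketch {

    /** Returns up to n terms with the highest counts, most frequent first. */
    static List<String> topTerms(Map<String, Integer> termCounts, int n) {
        List<String> result = new ArrayList<>();
        if (n <= 0) {
            return result;
        }

        // Orders entries weakest first: lower count, then alphabetically later term.
        Comparator<Map.Entry<String, Integer>> weakestFirst = (a, b) -> {
            int byCount = Integer.compare(a.getValue(), b.getValue());
            return byCount != 0 ? byCount : b.getKey().compareTo(a.getKey());
        };

        // A heap bounded to n entries keeps memory small even for large vocabularies.
        PriorityQueue<Map.Entry<String, Integer>> heap = new PriorityQueue<>(weakestFirst);
        for (Map.Entry<String, Integer> entry : termCounts.entrySet()) {
            heap.offer(entry);
            if (heap.size() > n) {
                heap.poll(); // drop the current weakest candidate
            }
        }

        // The heap yields weakest first, so reverse to get most frequent first.
        while (!heap.isEmpty()) {
            result.add(heap.poll().getKey());
        }
        Collections.reverse(result);
        return result;
    }
}
```
]]></source>
<p>
Terms that surface at the top of such a list and carry little meaning are natural candidates for a stop word list.
</p>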
<p>
<a href="http://cvs.apache.org/viewcvs.cgi/jakarta-lucene-sandbox/contributions/miscellaneous/src/java/org/apache/lucene/misc/">
CVS for miscellaneous classes.</a>
</p>
</subsection>
</section>
</body>
</document>