2002-07-14 15:00:11 -04:00
|
|
|
<?xml version="1.0"?>
|
|
|
|
<document>
|
2003-10-15 09:51:38 -04:00
|
|
|
<properties>
|
|
|
|
<author>Otis Gospodnetic</author>
|
|
|
|
<title>Lucene Sandbox</title>
|
|
|
|
</properties>
|
|
|
|
<body>
|
|
|
|
|
|
|
|
<section name="Lucene Sandbox">
|
|
|
|
<p>
The Lucene project also contains a workspace, the Lucene Sandbox, that is open to all Lucene committers as well
as a few other developers. The purpose of the Sandbox is to host various third-party contributions
and to serve as a place to try out new ideas and prepare them for inclusion in the core Lucene
distribution.<br/>
Users are free to experiment with the components developed in the Sandbox, but Sandbox components will
not necessarily be maintained, particularly in their current state.
</p>
|
|
|
|
|
|
|
|
<p>
You can access the Lucene Sandbox CVS repository at
<a href="http://cvs.apache.org/viewcvs/jakarta-lucene-sandbox/">http://cvs.apache.org/viewcvs/jakarta-lucene-sandbox/</a>.
</p>
|
|
|
|
|
|
|
|
<subsection name="Snowball Stemmers for Lucene">
|
|
|
|
<p>
This project provides pre-compiled versions of the Snowball stemmers
for Lucene.
</p>
<p>
More information can be found on the
<a href="http://jakarta.apache.org/lucene/docs/lucene-sandbox/snowball/">Snowball stemmers for Lucene page</a>.
</p>
<p>
<a href="http://snowball.tartarus.org/">Background information on Snowball</a>,
a language for writing stemmers, developed by Martin Porter.
</p>
|
2004-01-28 06:45:19 -05:00
|
|
|
</subsection>
|
|
|
|
|
|
|
|
<subsection name="Analyzers, Tokenizers, Filters">
|
|
|
|
<p>
Contributed Analyzers, Tokenizers, and Filters for various languages.
</p>
<p>
<a href="http://cvs.apache.org/viewcvs/jakarta-lucene-sandbox/contributions/analyzers/">The
CVS repository for the Analyzers contribution.</a>
</p>
|
2003-10-15 09:51:38 -04:00
|
|
|
</subsection>
|
|
|
|
|
|
|
|
<subsection name="SearchBean">
|
|
|
|
<p>
SearchBean is a UI component for browsing the results of a Lucene search.
It searches the index for a given query string, retrieves the hits, and wraps
them in the HitsIterator class, which can be used to page through and sort the search results.
</p>
<p>
<a href="http://cvs.apache.org/viewcvs/jakarta-lucene-sandbox/contributions/searchbean/">The
CVS repository for the SearchBean contribution.</a>
</p>
|
|
|
|
</subsection>
|
|
|
|
|
|
|
|
<subsection name="Ant">
|
|
|
|
<p>
The Ant contribution provides an Ant task that creates a Lucene index from an Ant fileset. It also
contains an example HTML parser that uses JTidy.
</p>
<p>
<a href="http://cvs.apache.org/viewcvs/jakarta-lucene-sandbox/contributions/ant/">The
CVS repository for the Ant contribution.</a>
</p>
|
|
|
|
</subsection>
|
|
|
|
|
|
|
|
<subsection name="WordNet/Synonyms">
|
|
|
|
<p>
The Lucene WordNet code consists of a single class that parses a Prolog file
from the WordNet site containing a list of English words and their synonyms.
The class builds a Lucene index from this synonyms file. Your querying code can then
consult this index to build up a set of synonyms for the terms in the
search query.
</p>
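<p>
The grouping step behind such a synonym lookup can be sketched in plain Java. This is not the Sandbox class and
it uses no Lucene API: it assumes the Prolog file consists of facts of the form s(synset_id, w_num, 'word', ...),
where words that share a synset id are synonyms. The class name SynonymSketch, its method, and the sample facts
(including the ids) are made up for illustration.
</p>
<source><![CDATA[
```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;
import java.util.TreeSet;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical sketch: groups words by synset id and derives a word -> synonyms map.
class SynonymSketch {
    // Captures the synset id and the quoted word of a fact such as s(100000001,1,'car',n,1,0).
    private static final Pattern FACT = Pattern.compile("^s\\((\\d+),\\d+,'(.+)',");

    static Map<String, Set<String>> synonyms(List<String> facts) {
        // Step 1: collect the words of each synset.
        Map<String, Set<String>> bySynset = new HashMap<>();
        for (String line : facts) {
            Matcher m = FACT.matcher(line.trim());
            if (m.find()) {
                // Prolog escapes an apostrophe inside a quoted atom by doubling it.
                String word = m.group(2).replace("''", "'").toLowerCase(Locale.ROOT);
                bySynset.computeIfAbsent(m.group(1), k -> new TreeSet<>()).add(word);
            }
        }
        // Step 2: every word is a synonym of all other words in each of its synsets.
        Map<String, Set<String>> result = new TreeMap<>();
        for (Set<String> group : bySynset.values()) {
            for (String word : group) {
                Set<String> others = result.computeIfAbsent(word, k -> new TreeSet<>());
                others.addAll(group);
                others.remove(word);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        // Made-up sample facts; the ids are not real WordNet synset ids.
        List<String> facts = Arrays.asList(
            "s(100000001,1,'car',n,1,0).",
            "s(100000001,2,'automobile',n,1,0).",
            "s(100000001,3,'motorcar',n,1,0).",
            "s(100000002,1,'car',n,2,0).",
            "s(100000002,2,'railcar',n,1,0).");
        System.out.println(synonyms(facts).get("car")); // prints [automobile, motorcar, railcar]
    }
}
```
]]></source>
<p>
In a real setup the resulting map would be stored in a Lucene index rather than kept in memory, so that query code
can look up each query term and add its synonyms to the search.
</p>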
|
|
|
|
<p>
More information is available on the <a href="http://www.tropo.com/techno/java/lucene/wordnet.html">Lucene WordNet package</a> page.
<a href="http://www.cogsci.princeton.edu/~wn/">WordNet</a> is an online database of English words that contains
synonyms, definitions, and various relationships between synonym sets.
</p>
<p>
<a href="http://cvs.apache.org/viewcvs.cgi/jakarta-lucene-sandbox/contributions/WordNet/">CVS for the WordNet module.</a>
</p>
|
|
|
|
</subsection>
|
|
|
|
|
|
|
|
<subsection name="SAX/DOM XML Indexing demo">
|
|
|
|
<p>
This contribution is sample code that demonstrates how to add simple XML documents to an index. It creates
a new Document object for each file and then populates the Document with a Field for each XML element, recursively.
Examples are included for both SAX and DOM.
</p>
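<p>
The SAX variant of this idea can be sketched with the JDK's built-in parser alone. The sketch below is not the
demo code from the Sandbox: a plain Map stands in for a Lucene Document, and each entry (element name mapped to
element text) stands in for one Field. The class name XmlFieldsSketch and the sample XML are assumptions made for
illustration, and only the text of leaf elements is captured.
</p>
<source><![CDATA[
```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import java.util.LinkedHashMap;
import java.util.Map;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical sketch: collects one "field" (name -> text) per XML element that contains text.
class XmlFieldsSketch extends DefaultHandler {
    private final Map<String, String> fields = new LinkedHashMap<>();
    private final StringBuilder text = new StringBuilder();

    @Override
    public void startElement(String uri, String localName, String qName, Attributes atts) {
        text.setLength(0); // start collecting text for the element that just opened
    }

    @Override
    public void characters(char[] ch, int start, int length) {
        text.append(ch, start, length); // may be called several times per element
    }

    @Override
    public void endElement(String uri, String localName, String qName) {
        String value = text.toString().trim();
        if (!value.isEmpty()) {
            // Repeated element names are joined with a space in this sketch.
            fields.merge(qName, value, (a, b) -> a + " " + b);
        }
        text.setLength(0);
    }

    static Map<String, String> parse(String xml) {
        XmlFieldsSketch handler = new XmlFieldsSketch();
        try {
            SAXParserFactory.newInstance().newSAXParser()
                .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)), handler);
        } catch (Exception e) {
            throw new IllegalArgumentException("could not parse XML", e);
        }
        return handler.fields;
    }

    public static void main(String[] args) {
        String xml = "<note><to>Alice</to><body>Hello</body></note>";
        System.out.println(parse(xml)); // prints {to=Alice, body=Hello}
    }
}
```
]]></source>
<p>
With Lucene on the classpath, each map entry would instead become a Field added to the Document for the file being indexed.
</p>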
|
|
|
|
<p>
<a href="http://cvs.apache.org/viewcvs.cgi/jakarta-lucene-sandbox/contributions/XML-Indexing-Demo/">CVS for the XML Indexing Demo.</a>
</p>
|
|
|
|
</subsection>
|
|
|
|
|
2004-01-28 06:45:19 -05:00
|
|
|
<subsection name="Lucli - Lucene Command-line Interface">
|
2003-10-15 09:51:38 -04:00
|
|
|
<p>
The Lucli application allows index manipulation from the
command line.
</p>
<p>
<a href="http://cvs.apache.org/viewcvs/jakarta-lucene-sandbox/contributions/lucli/">The
CVS repository for the Lucli contribution.</a>
</p>
|
2003-10-15 09:51:38 -04:00
|
|
|
</subsection>
|
|
|
|
|
2004-04-20 03:17:23 -04:00
|
|
|
<subsection name="Term Highlighter">
|
|
|
|
<p>
A small set of classes for highlighting matching terms in
search results.
</p>
<p>
<a href="http://cvs.apache.org/viewcvs/jakarta-lucene-sandbox/contributions/highlighter/">The
CVS repository for the Highlighter contribution.</a>
</p>
|
|
|
|
</subsection>
|
|
|
|
|
2003-10-15 09:51:38 -04:00
|
|
|
<subsection name="Javascript Query Constructor">
|
|
|
|
<p>
A JavaScript library that supports client-side query building. It supports a user interface similar to
<a href="http://www.google.com.sg/advanced_search">Google's Advanced Search</a>.
</p>
<p>
<a href="http://cvs.apache.org/viewcvs.cgi/jakarta-lucene-sandbox/contributions/javascript/queryConstructor/">CVS for the files.</a>
</p>
|
|
|
|
</subsection>
|
|
|
|
|
|
|
|
<subsection name="Javascript Query Validator">
|
|
|
|
<p>
A JavaScript library that supports client-side query validation. Lucene rejects malformed queries by
throwing a ParseException, whose message is often difficult to interpret and pass on to the user. This library aims to
alleviate that problem.
</p>
<p>
<a href="http://cvs.apache.org/viewcvs.cgi/jakarta-lucene-sandbox/contributions/javascript/queryValidator/">CVS for the files.</a>
</p>
|
|
|
|
</subsection>
|
|
|
|
|
|
|
|
<subsection name="High Frequency Terms">
|
|
|
|
<p>
The miscellaneous package is for classes that don't fit anywhere else. The only class in it right now determines
which terms occur most frequently in a Lucene index. This can be useful for deciding which terms should go
into a custom stop word list for better search results.
</p>
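<p>
The idea of ranking terms by frequency can be sketched without Lucene. The example below is not the Sandbox class:
it uses a crude tokenizer in place of a Lucene Analyzer and assumes that "most frequent" means the number of
documents containing a term. The class name HighFreqTermsSketch and the sample documents are made up for illustration.
</p>
<source><![CDATA[
```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.Set;

// Hypothetical sketch: ranks terms by how many documents contain them.
class HighFreqTermsSketch {

    /** Returns up to n terms, most frequent first; ties are broken alphabetically. */
    static List<String> topTerms(List<String> documents, int n) {
        Map<String, Integer> docFreq = new HashMap<>();
        for (String doc : documents) {
            // Crude analysis: lower-case, split on non-letters, count each term once per document.
            Set<String> seen = new HashSet<>();
            for (String token : doc.toLowerCase(Locale.ROOT).split("[^a-z]+")) {
                if (!token.isEmpty() && seen.add(token)) {
                    docFreq.merge(token, 1, Integer::sum);
                }
            }
        }
        List<String> terms = new ArrayList<>(docFreq.keySet());
        terms.sort((a, b) -> {
            int byFreq = Integer.compare(docFreq.get(b), docFreq.get(a)); // descending frequency
            return byFreq != 0 ? byFreq : a.compareTo(b);
        });
        return new ArrayList<>(terms.subList(0, Math.min(n, terms.size())));
    }

    public static void main(String[] args) {
        List<String> docs = Arrays.asList(
            "The quick brown fox",
            "The lazy dog and the fox",
            "A dog is the best friend");
        System.out.println(topTerms(docs, 3)); // prints [the, dog, fox]
    }
}
```
]]></source>
<p>
Here "the" appears in every document, which makes it an obvious stop word candidate.
</p>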
|
|
|
|
<p>
<a href="http://cvs.apache.org/viewcvs.cgi/jakarta-lucene-sandbox/contributions/miscellaneous/src/java/org/apache/lucene/misc/">CVS for miscellaneous classes.</a>
</p>
|
|
|
|
</subsection>
|
|
|
|
|
2004-01-28 06:45:19 -05:00
|
|
|
<subsection name="LARM">
|
|
|
|
<p>
LARM is a web crawler optimized for large intranets with up to a couple of hundred hosts.
</p>
<p>
<a href="larm/overview.html">Technical Overview</a>.
See also: <a href="http://larm.sourceforge.net/">LARM's home page on SourceForge</a>.
</p>
|
|
|
|
</subsection>
|
|
|
|
|
2003-10-15 09:51:38 -04:00
|
|
|
</section>
|
|
|
|
|
|
|
|
</body>
|
2002-07-14 15:00:11 -04:00
|
|
|
</document>
|