Updated my bio, since I'm no longer at Grand Central.

Added a missing tag in lucenesandbox.xml that broke Anakia.
Rebuilt out-of-date docs.


git-svn-id: https://svn.apache.org/repos/asf/lucene/java/trunk@149772 13f79535-47bb-0310-9956-ffa450edef68
This commit is contained in:
Doug Cutting 2002-06-08 20:43:01 +00:00
parent 2ef3150b70
commit ce605b9a08
7 changed files with 117 additions and 100 deletions

View File

@ -194,10 +194,7 @@ Tomcat. </p>
From your Tomcat directory look in the webapps/luceneweb subdirectory. If its not From your Tomcat directory look in the webapps/luceneweb subdirectory. If its not
present, try browsing to "http://localhost:8080/luceneweb" then look again. present, try browsing to "http://localhost:8080/luceneweb" then look again.
Edit a file called configuration.jsp. Ensure that the indexLocation is equal to the Edit a file called configuration.jsp. Ensure that the indexLocation is equal to the
location you used for your index. Note that if you are using a location you used for your index. You may also customize the appTitle and appFooter
Windows OS, you will need to escape backslashes. For example, the indexLocation may look
something like "c:\\dev\\lucene-index".
You may also customize the appTitle and appFooter
strings as you see fit. Once you have finsihed altering the configuration you should strings as you see fit. Once you have finsihed altering the configuration you should
restart Tomcat. You may also wish to update the war file by typing restart Tomcat. You may also wish to update the war file by typing
<b>jar -uf luceneweb.war configuration.jsp</b> from the luceneweb subdirectory. <b>jar -uf luceneweb.war configuration.jsp</b> from the luceneweb subdirectory.

View File

@ -2,7 +2,7 @@
<!-- Content Stylesheet for Site --> <!-- Content Stylesheet for Site -->
<!-- start the processing --> <!-- start the processing -->
<!-- ====================================================================== --> <!-- ====================================================================== -->
<!-- Main Page Section --> <!-- Main Page Section -->
@ -19,13 +19,13 @@
<meta name="email" value="cutting@apache.org"> <meta name="email" value="cutting@apache.org">
<meta name="author" value="Peer Carlson"> <meta name="author" value="Peer Carlson">
<meta name="email" value="carlson@apache.org"> <meta name="email" value="carlson@apache.org">
<title>Jakarta Lucene - Overview - Jakarta Lucene</title> <title>Jakarta Lucene - Overview - Jakarta Lucene</title>
</head> </head>
<body bgcolor="#ffffff" text="#000000" link="#525D76"> <body bgcolor="#ffffff" text="#000000" link="#525D76">
<table border="0" width="100%" cellspacing="0"> <table border="0" width="100%" cellspacing="0">
<!-- TOP IMAGE --> <!-- TOP IMAGE -->
<tr> <tr>
@ -41,7 +41,7 @@
<tr><td colspan="2"> <tr><td colspan="2">
<hr noshade="" size="1"/> <hr noshade="" size="1"/>
</td></tr> </td></tr>
<tr> <tr>
<!-- LEFT SIDE NAVIGATION --> <!-- LEFT SIDE NAVIGATION -->
<td width="20%" valign="top" nowrap="true"> <td width="20%" valign="top" nowrap="true">
@ -59,24 +59,22 @@
<p><strong>Resources</strong></p> <p><strong>Resources</strong></p>
<ul> <ul>
<li> <a href="http://lucene.sourceforge.net/cgi-bin/faq/faqmanager.cgi">FAQ (Official)</a> <li> <a href="http://lucene.sourceforge.net/cgi-bin/faq/faqmanager.cgi">FAQ (Official)</a>
</li>
<li> <a href="http://www.jguru.com/faq/Lucene">JGuru FAQ</a>
</li> </li>
<li> <a href="./gettingstarted.html">Getting Started</a> <li> <a href="./gettingstarted.html">Getting Started</a>
</li>
<li> <a href="http://www.jguru.com/faq/Lucene">JGuru FAQ</a>
</li> </li>
<li> <a href="http://jakarta.apache.org/site/bugs.html">Bugs</a> <li> <a href="http://jakarta.apache.org/site/bugs.html">Bugs</a>
</li> </li>
<li> <a href="http://nagoya.apache.org/bugzilla/buglist.cgi?bug_status=NEW&bug_status=ASSIGNED&bug_status=REOPENED&email1=&emailtype1=substring&emailassigned_to1=1&email2=&emailtype2=substring&emailreporter2=1&bugidtype=include&bug_id=&changedin=&votes=&chfieldfrom=&chfieldto=Now&chfieldvalue=&product=Lucene&short_desc=&short_desc_type=allwordssubstr&long_desc=&long_desc_type=allwordssubstr&bug_file_loc=&bug_file_loc_type=allwordssubstr&keywords=&keywords_type=anywords&field0-0-0=noop&type0-0-0=noop&value0-0-0=&cmdtype=doit&order=%27Importance%27">Lucene Bugs</a> <li> <a href="http://nagoya.apache.org/bugzilla/buglist.cgi?bug_status=NEW&bug_status=ASSIGNED&bug_status=REOPENED&email1=&emailtype1=substring&emailassigned_to1=1&email2=&emailtype2=substring&emailreporter2=1&bugidtype=include&bug_id=&changedin=&votes=&chfieldfrom=&chfieldto=Now&chfieldvalue=&product=Lucene&short_desc=&short_desc_type=allwordssubstr&long_desc=&long_desc_type=allwordssubstr&bug_file_loc=&bug_file_loc_type=allwordssubstr&keywords=&keywords_type=anywords&field0-0-0=noop&type0-0-0=noop&value0-0-0=&cmdtype=doit&order=%27Importance%27">Lucene Bugs</a>
</li>
<li> <a href="./resources.html">Articles</a>
</li> </li>
<li> <a href="./queryparsersyntax.html">Query Syntax</a> <li> <a href="./queryparsersyntax.html">Query Syntax</a>
</li> </li>
<li> <a href="./api/index.html">Javadoc API</a> <li> <a href="./api/index.html">Javadoc</a>
</li> </li>
<li> <a href="./contributions.html">Contributions</a> <li> <a href="./contributions.html">Contributions</a>
</li>
<li> <a href="./lucenesandbox.html">Lucene Sandbox</a>
</li>
<li> <a href="./resources.html">Articles</a>
</li> </li>
</ul> </ul>
<p><strong>Plans</strong></p> <p><strong>Plans</strong></p>
@ -117,11 +115,11 @@
<p> <p>
Jakarta Lucene is a high-performance, full-featured text search engine Jakarta Lucene is a high-performance, full-featured text search engine
written entirely in Java. It is a technology suitable for nearly any written entirely in Java. It is a technology suitable for nearly any
application that requires full-text search, especially cross-platform. application that requires full-text search, especially cross-platform.
</p> </p>
<p> <p>
Jakarta Lucene is an open source project available for Jakarta Lucene is an open source project available for
<a href="http://jakarta.apache.org/site/binindex.html">free download</a> from Apache Jakarta. <a href="http://jakarta.apache.org/site/binindex.html">free download</a> from Apache Jakarta.
Please use the links on the left to access Lucene. Please use the links on the left to access Lucene.
</p> </p>
</blockquote> </blockquote>
@ -142,14 +140,14 @@ Please use the links on the left to access Lucene.
Download it <a href="http://jakarta.apache.org/builds/jakarta-lucene/release/v1.2-rc5/"> here</a>. Download it <a href="http://jakarta.apache.org/builds/jakarta-lucene/release/v1.2-rc5/"> here</a>.
</p> </p>
<p> <p>
<b>Lucene v1.02 released</b> - This release repackages Lucene as a product <b>Lucene v1.02 released</b> - This release repackages Lucene as a product
of the Apache Software Foundation. Download it of the Apache Software Foundation. Download it
<a href="http://jakarta.apache.org/site/binindex.html">here</a>. <a href="http://jakarta.apache.org/site/binindex.html">here</a>.
</p> </p>
<p> <p>
<b>Lucene Joins Jakarta</b> - The Lucene Team is happy to announce that <b>Lucene Joins Jakarta</b> - The Lucene Team is happy to announce that
Lucene is now a part of the Apache Jakarta Project. This move will Lucene is now a part of the Apache Jakarta Project. This move will
help Lucene continue to grow, and enhance its position as the leading help Lucene continue to grow, and enhance its position as the leading
server-side searching solution for Java. server-side searching solution for Java.
</p> </p>
</blockquote> </blockquote>
@ -166,7 +164,7 @@ server-side searching solution for Java.
<tr><td> <tr><td>
<blockquote> <blockquote>
<p> <p>
The goal of the <a href="http://jakarta.apache.org/">Apache Jakarta Project</a> The goal of the <a href="http://jakarta.apache.org/">Apache Jakarta Project</a>
is to provide commercial-quality server solutions, based on the Java Platform, is to provide commercial-quality server solutions, based on the Java Platform,
developed in an open and cooperative fashion. developed in an open and cooperative fashion.
</p> </p>
@ -191,3 +189,23 @@ developed in an open and cooperative fashion.
</body> </body>
</html> </html>
<!-- end the processing --> <!-- end the processing -->

View File

@ -115,8 +115,8 @@
The best reference is <a href="http://www.htdig.org"> The best reference is <a href="http://www.htdig.org">
htDig</a>, though it is not quite as sophisticated as htDig</a>, though it is not quite as sophisticated as
Lucene, it has a number of features that make it Lucene, it has a number of features that make it
desireable. It however is a traditional c-compiled app desirable. It however is a traditional c-compiled app
which makes it somewhat unpleasent to install on some which makes it somewhat unpleasant to install on some
platforms (like Solaris!). platforms (like Solaris!).
</p> </p>
<p> <p>
@ -124,12 +124,12 @@
community for an initial reaction, advice, feedback and community for an initial reaction, advice, feedback and
consent. Following this it will be submitted to the consent. Following this it will be submitted to the
Lucene user community for support. Although, I'm (Andy Lucene user community for support. Although, I'm (Andy
Oliver) capable of providing these enhancements by Oliver) capable of providing these enhancements by
myself, I'd of course prefer to work on them in concert myself, I'd of course prefer to work on them in concert
with others. with others.
</p> </p>
<p> <p>
While I'm outlaying a fairly large featureset, these can While I'm outlaying a fairly large feature set, these can
be implemented incrementally of course (and are probably be implemented incrementally of course (and are probably
best if done that way). best if done that way).
</p> </p>
@ -148,27 +148,27 @@
<blockquote> <blockquote>
<p> <p>
The goal is to provide features to Lucene that allow it The goal is to provide features to Lucene that allow it
to be used as a dropin search engine. It should provide to be used as a drop-in search engine. It should provide
many of the features of projects like <a href="http://www.htdig.org">htDig</a> while surpassing many of the features of projects like <a href="http://www.htdig.org">htDig</a> while surpassing
them with unique Lucene features and capabillities such as them with unique Lucene features and capabilities such as
easy installation on and java-supporting platform, easy installation on and java-supporting platform,
and support for document fields and field searches. And and support for document fields and field searches. And
of course, <a href="http://apache.org/LICENSE"> of course, <a href="http://apache.org/LICENSE">
a pragmatic software license</a>. a pragmatic software license</a>.
</p> </p>
<p> <p>
To reach this goal we'll implement code to support the To reach this goal we'll implement code to support the
following objectives that augment but do not replace following objectives that augment but do not replace
the current Lucene featureset. the current Lucene feature set.
</p> </p>
<ul> <ul>
<li> <li>
Document Location Independance - meaning mapping Document Location Independence - meaning mapping
real contexts to runtime contexts. real contexts to runtime contexts.
Essentially, if the document is at Essentially, if the document is at
/var/www/htdocs/mydoc.html, I probably want it /var/www/htdocs/mydoc.html, I probably want it
indexed as indexed as
http://www.bigevilmegacorp.com/mydoc.html. http://www.bigevilmegacorp.com/mydoc.html.
</li> </li>
<li> <li>
Standard methods of creating central indicies - Standard methods of creating central indicies -
@ -176,21 +176,21 @@
many environments than is *remote* indexing (for many environments than is *remote* indexing (for
instance http). I would suggest that most folks instance http). I would suggest that most folks
would prefer that general functionality be would prefer that general functionality be
suppored by Lucene instead of having to write supported by Lucene instead of having to write
code for every indexing project. Obviously, if code for every indexing project. Obviously, if
what they are doing is *special* they'll have to what they are doing is *special* they'll have to
code, but general document indexing accross code, but general document indexing across
webservers would not qualify. web servers would not qualify.
</li> </li>
<li> <li>
Document interperatation abstraction - currently Document interpretation abstraction - currently
one must handle document object construction via one must handle document object construction via
custom code. A standard interface for plugging custom code. A standard interface for plugging
in format handlers should be supported. in format handlers should be supported.
</li> </li>
<li> <li>
Mime and file-extension to document Mime and file-extension to document
interperatation mapping. interpretation mapping.
</li> </li>
</ul> </ul>
</blockquote> </blockquote>
@ -241,7 +241,7 @@
</li> </li>
<li> <li>
replacement type - the type of replacement type - the type of
replacewith path: relative, url or replace with path: relative, URL or
path. path.
</li> </li>
<li> <li>
@ -266,8 +266,8 @@
0 - Long.MAX_VALUE. 0 - Long.MAX_VALUE.
</li> </li>
<li> <li>
SleeptimeBetweenCalls - can be used to SleeptimeBetweenCalls - can be used to
avoid flooding a machine with too many avoid flooding a machine with too many
requests requests
</li> </li>
<li> <li>
@ -276,12 +276,12 @@
inactivity. inactivity.
</li> </li>
<li> <li>
IncludeFilter - include only items IncludeFilter - include only items
matching filter. (can occur mulitple matching filter. (can occur multiple
times) times)
</li> </li>
<li> <li>
ExcludeFilter - exclude only items ExcludeFilter - exclude only items
matching filter. (can occur multiple matching filter. (can occur multiple
times) times)
</li> </li>
@ -309,9 +309,9 @@
(probably from the command line) read (probably from the command line) read
this properties file and get them from this properties file and get them from
it. Command line options override it. Command line options override
the properties file in the case of the properties file in the case of
duplicates. There should also be an duplicates. There should also be an
enivironment variable or VM parameter to environment variable or VM parameter to
set this. set this.
</li> </li>
</ul> </ul>
@ -320,19 +320,19 @@
</p> </p>
<p> <p>
This should extend the AbstractCrawler and This should extend the AbstractCrawler and
support any addtional options required for a support any additional options required for a
filesystem index. file system index.
</p> </p>
<p> <p>
<b>HTTP Crawler </b> <b>HTTP Crawler </b>
</p> </p>
<p> <p>
Supports the AbstractCrawler options as well as: Supports the AbstractCrawler options as well as:
</p> </p>
<ul> <ul>
<li> <li>
span hosts - Wheter to span hosts or not, span hosts - Whether to span hosts or not,
by default this should be no. by default this should be no.
</li> </li>
<li> <li>
restrict domains - (ignored if span restrict domains - (ignored if span
@ -346,11 +346,11 @@
recurse and go to recurse and go to
/nextcontext/index.html this option says /nextcontext/index.html this option says
to also try /nextcontext to get the dir to also try /nextcontext to get the dir
lsiting) listing)
</li> </li>
<li> <li>
map extensions - map extensions -
(always/default/never/fallback). Wether (always/default/never/fallback). Whether
to always use extension mapping, by to always use extension mapping, by
default (fallback to mime type), NEVER default (fallback to mime type), NEVER
or fallback if mime is not available or fallback if mime is not available
@ -376,8 +376,8 @@
<blockquote> <blockquote>
<p> <p>
A configurable registry of document types, their A configurable registry of document types, their
description, an identifyer, mime-type and file description, an identifier, mime-type and file
extension. This should map both MIME -&gt; factory extension. This should map both MIME -&gt; factory
and extension -&gt; factory. and extension -&gt; factory.
</p> </p>
<p> <p>
@ -500,17 +500,17 @@
<tr><td> <tr><td>
<blockquote> <blockquote>
<p> <p>
A class taht maps standard fields from the A class that maps standard fields from the
DocumentFactories into *fields* in the Document objects DocumentFactories into *fields* in the Document objects
they create. I suggest that a regular expression system they create. I suggest that a regular expression system
or xpath might be the most universal way to do this. or xpath might be the most universal way to do this.
For instance if perhaps I had an XML factory that For instance if perhaps I had an XML factory that
represented XML elements as fields, I could map content represented XML elements as fields, I could map content
from particular fields to ther fields or supress them from particular fields to their fields or suppress them
entirely. We could even make this configurable. entirely. We could even make this configurable.
</p> </p>
<p> <p>
for example: for example:
</p> </p>
<ul> <ul>
@ -533,11 +533,11 @@
title.suppress=false title.suppress=false
</li> </li>
</ul> </ul>
<p> <p>
In this example we map html documents such that all In this example we map html documents such that all
fields are suppressed but author and title. We map fields are suppressed but author and title. We map
author and title to anything in the content matching author and title to anything in the content matching
author: (and x characters). Okay my regular expresions author: (and x characters). Okay my regular expresions
suck but hopefully you get the idea. suck but hopefully you get the idea.
</p> </p>
</blockquote> </blockquote>
@ -554,35 +554,35 @@
<tr><td> <tr><td>
<blockquote> <blockquote>
<p> <p>
We might also consider eliminating the DocumentFactory We might also consider eliminating the DocumentFactory
entirely by making an AbstractDocument from which the entirely by making an AbstractDocument from which the
current document object would inherit from. I current document object would inherit from. I
experimented with this locally, and it was a relatively experimented with this locally, and it was a relatively
minor code change and there was of course no difference minor code change and there was of course no difference
in performance. The Document Factory classes would in performance. The Document Factory classes would
instead be instances of various subclasses of instead be instances of various subclasses of
AbstractDocument. AbstractDocument.
</p> </p>
<p> <p>
My inspiration for this is HTDig (http://www.htdig.org/). My inspiration for this is HTDig (http://www.htdig.org/).
While this goes slightly beyond what HTDig provides by While this goes slightly beyond what HTDig provides by
providing field mapping (where HTDIG is just interested providing field mapping (where HTDIG is just interested
in Strings/numbers wherever they are found), it provides in Strings/numbers wherever they are found), it provides
at least what I would need to use this as a dropin for at least what I would need to use this as a drop-in for
most places I contract at (with the obvious exception of most places I contract at (with the obvious exception of
a default set of content handlers which would of course a default set of content handlers which would of course
develop naturally over time). develop naturally over time).
</p> </p>
<p> <p>
I am able to certainly contribute to this effort if the I am able to certainly contribute to this effort if the
development community is open to it. I'd suggest we do development community is open to it. I'd suggest we do
it iteratively in stages and not aim for all of this at it iteratively in stages and not aim for all of this at
once (for instance leave out the field mapping at first). once (for instance leave out the field mapping at first).
</p> </p>
<p> <p>
Anyhow, please give me some feedback, counter Anyhow, please give me some feedback, counter
suggestions, let me know if I'm way off base or out of suggestions, let me know if I'm way off base or out of
line, etc. -Andy line, etc. -Andy
</p> </p>
</blockquote> </blockquote>

View File

@ -106,7 +106,11 @@
</td></tr> </td></tr>
<tr><td> <tr><td>
<blockquote> <blockquote>
</blockquote> <P>
You can access Lucene Sandbox CVS repository at
<A HREF="http://cvs.apache.org/viewcvs/jakarta-lucene-sandbox/">http://cvs.apache.org/viewcvs/jakarta-lucene-sandbox/</A>.
</P>
</blockquote>
</p> </p>
</td></tr> </td></tr>
<tr><td><br/></td></tr> <tr><td><br/></td></tr>

View File

@ -131,11 +131,10 @@ Palo Alto Research Center (PARC), Apple, and Excite@Home, and authored
several information retrieval <a href="http://lucene.sourceforge.net/publications.html">papers and several information retrieval <a href="http://lucene.sourceforge.net/publications.html">papers and
patents</a>.</p> patents</a>.</p>
<p>Doug currently works for <a href="http://www.grandcentral.com/">Grand <p>Recently Doug has worked on peer-to-peer search at Infrasearch
Central</a>.</p> (aquired by Sun's JXTA project) and on web services at Grand Central.
Currently he continues to help develop Lucene and is available for
<p>Please do not email Doug directly about Lucene. Instead use contract work.</p>
the <a href="http://jakarta.apache.org/site/mail.html">Jakarta-Lucene mailing lists</a>.</p>
</li> </li>
<li><b>Otis Gospodnetic</b> (otis at apache.org)</li> <li><b>Otis Gospodnetic</b> (otis at apache.org)</li>

View File

@ -16,7 +16,7 @@ not necessarily be maintained, particularly in their current state.
<P> <P>
You can access Lucene Sandbox CVS repository at You can access Lucene Sandbox CVS repository at
<A HREF="http://cvs.apache.org/viewcvs/jakarta-lucene-sandbox/">http://cvs.apache.org/viewcvs/jakarta-lucene-sandbox/</A>. <A HREF="http://cvs.apache.org/viewcvs/jakarta-lucene-sandbox/">http://cvs.apache.org/viewcvs/jakarta-lucene-sandbox/</A>.
</P>
</section> </section>
</body> </body>

View File

@ -22,11 +22,10 @@ several information retrieval <a
href="http://lucene.sourceforge.net/publications.html">papers and href="http://lucene.sourceforge.net/publications.html">papers and
patents</a>.</p> patents</a>.</p>
<p>Doug currently works for <a href="http://www.grandcentral.com/">Grand <p>Recently Doug has worked on peer-to-peer search at Infrasearch
Central</a>.</p> (aquired by Sun's JXTA project) and on web services at Grand Central.
Currently he continues to help develop Lucene and is available for
<p>Please do not email Doug directly about Lucene. Instead use contract work.</p>
the <a href="http://jakarta.apache.org/site/mail.html">Jakarta-Lucene mailing lists</a>.</p>
</li> </li>
<li><b>Otis Gospodnetic</b> (otis at apache.org)</li> <li><b>Otis Gospodnetic</b> (otis at apache.org)</li>