From ce605b9a08f350ae6302cfa1f43018b724d06cb8 Mon Sep 17 00:00:00 2001
From: Doug Cutting <cutting@apache.org>
Date: Sat, 8 Jun 2002 20:43:01 +0000
Subject: [PATCH] Updated my bio, since I'm no longer at Grand Central. Added a
 missing tag in lucenesandbox.xml that broke Anakia. Rebuilt out-of-date docs.

git-svn-id: https://svn.apache.org/repos/asf/lucene/java/trunk@149772 13f79535-47bb-0310-9956-ffa450edef68
---
 docs/demo3.html         |   5 +-
 docs/index.html         |  62 +++++++++++++-------
 docs/luceneplan.html    | 124 ++++++++++++++++++++--------------------
 docs/lucenesandbox.html |   6 +-
 docs/whoweare.html      |   9 ++-
 xdocs/lucenesandbox.xml |   2 +-
 xdocs/whoweare.xml      |   9 ++-
 7 files changed, 117 insertions(+), 100 deletions(-)
diff --git a/docs/demo3.html b/docs/demo3.html
index cd9a8637c4a..a4adabd7129 100644
--- a/docs/demo3.html
+++ b/docs/demo3.html
@@ -194,10 +194,7 @@ Tomcat.  </p>
 From your Tomcat directory look in the webapps/luceneweb subdirectory.  If its not 
 present, try browsing to "http://localhost:8080/luceneweb" then look again.  
 Edit a file called configuration.jsp.  Ensure that the indexLocation is equal to the 
-location you used for your index.  Note that if you are using a
-Windows OS, you will need to escape backslashes.  For example, the indexLocation may look
-something like "c:\\dev\\lucene-index".  
-You may also customize the appTitle and appFooter 
+location you used for your index.  You may also customize the appTitle and appFooter 
 strings as you see fit.  Once you have finsihed altering the configuration you should 
 restart Tomcat.  You may also wish to update the war file by typing 
 <b>jar -uf luceneweb.war configuration.jsp</b> from the luceneweb subdirectory.  
diff --git a/docs/index.html b/docs/index.html
index 0b21d526b2b..0a79af5ea79 100644
--- a/docs/index.html
+++ b/docs/index.html
@@ -2,7 +2,7 @@
 
 <!-- Content Stylesheet for Site -->
 
-
+        
 <!-- start the processing -->
     <!-- ====================================================================== -->
     <!-- Main Page Section -->
@@ -19,13 +19,13 @@
             <meta name="email" value="cutting@apache.org">
                                         <meta name="author" value="Peer Carlson">
             <meta name="email" value="carlson@apache.org">
-
-
-
+            
+           
+                                    
             <title>Jakarta Lucene - Overview - Jakarta Lucene</title>
         </head>
 
-        <body bgcolor="#ffffff" text="#000000" link="#525D76">
+        <body bgcolor="#ffffff" text="#000000" link="#525D76">        
             <table border="0" width="100%" cellspacing="0">
                 <!-- TOP IMAGE -->
                 <tr>
@@ -41,7 +41,7 @@
                 <tr><td colspan="2">
                     <hr noshade="" size="1"/>
                 </td></tr>
-
+                
                 <tr>
                     <!-- LEFT SIDE NAVIGATION -->
                     <td width="20%" valign="top" nowrap="true">
@@ -59,24 +59,22 @@
             <p><strong>Resources</strong></p>
         <ul>
                     <li>    <a href="http://lucene.sourceforge.net/cgi-bin/faq/faqmanager.cgi">FAQ (Official)</a>
-</li>
-                    <li>    <a href="http://www.jguru.com/faq/Lucene">JGuru FAQ</a>
 </li>
                     <li>    <a href="./gettingstarted.html">Getting Started</a>
+</li>
+                    <li>    <a href="http://www.jguru.com/faq/Lucene">JGuru FAQ</a>
 </li>
                     <li>    <a href="http://jakarta.apache.org/site/bugs.html">Bugs</a>
 </li>
                     <li>    <a href="http://nagoya.apache.org/bugzilla/buglist.cgi?bug_status=NEW&bug_status=ASSIGNED&bug_status=REOPENED&email1=&emailtype1=substring&emailassigned_to1=1&email2=&emailtype2=substring&emailreporter2=1&bugidtype=include&bug_id=&changedin=&votes=&chfieldfrom=&chfieldto=Now&chfieldvalue=&product=Lucene&short_desc=&short_desc_type=allwordssubstr&long_desc=&long_desc_type=allwordssubstr&bug_file_loc=&bug_file_loc_type=allwordssubstr&keywords=&keywords_type=anywords&field0-0-0=noop&type0-0-0=noop&value0-0-0=&cmdtype=doit&order=%27Importance%27">Lucene Bugs</a>
+</li>
+                    <li>    <a href="./resources.html">Articles</a>
 </li>
                     <li>    <a href="./queryparsersyntax.html">Query Syntax</a>
 </li>
-                    <li>    <a href="./api/index.html">Javadoc API</a>
+                    <li>    <a href="./api/index.html">Javadoc</a>
 </li>
                     <li>    <a href="./contributions.html">Contributions</a>
-</li>
-                    <li>    <a href="./lucenesandbox.html">Lucene Sandbox</a>
-</li>
-                    <li>    <a href="./resources.html">Articles</a>
 </li>
                 </ul>
             <p><strong>Plans</strong></p>
@@ -117,11 +115,11 @@
                                     <p>
 Jakarta Lucene is a high-performance, full-featured text search engine
 written entirely in Java.  It is a technology suitable for nearly any
-application that requires full-text search, especially cross-platform.
+application that requires full-text search, especially cross-platform. 
 </p>
                                                 <p>
-Jakarta Lucene is an open source project available for
-<a href="http://jakarta.apache.org/site/binindex.html">free download</a> from Apache Jakarta.
+Jakarta Lucene is an open source project available for 
+<a href="http://jakarta.apache.org/site/binindex.html">free download</a> from Apache Jakarta.  
 Please use the links on the left to access Lucene.
 </p>
                             </blockquote>
@@ -142,14 +140,14 @@ Please use the links on the left to access Lucene.
 Download it <a href="http://jakarta.apache.org/builds/jakarta-lucene/release/v1.2-rc5/"> here</a>.
 </p>
                                                 <p>
-<b>Lucene v1.02 released</b> - This release repackages Lucene as a product
-of the Apache Software Foundation. Download it
+<b>Lucene v1.02 released</b> - This release repackages Lucene as a product 
+of the Apache Software Foundation. Download it 
 <a href="http://jakarta.apache.org/site/binindex.html">here</a>.
 </p>
                                                 <p>
-<b>Lucene Joins Jakarta</b> - The Lucene Team is happy to announce that
-Lucene is now a part of the Apache Jakarta Project. This move will
-help Lucene continue to grow, and enhance its position as the leading
+<b>Lucene Joins Jakarta</b> - The Lucene Team is happy to announce that 
+Lucene is now a part of the Apache Jakarta Project. This move will 
+help Lucene continue to grow, and enhance its position as the leading 
 server-side searching solution for Java.
 </p>
                             </blockquote>
@@ -166,7 +164,7 @@ server-side searching solution for Java.
       <tr><td>
         <blockquote>
                                     <p>
-The goal of the <a href="http://jakarta.apache.org/">Apache Jakarta Project</a>
+The goal of the <a href="http://jakarta.apache.org/">Apache Jakarta Project</a> 
 is to provide commercial-quality server solutions, based on the Java Platform,
 developed in an open and cooperative fashion.
 </p>
@@ -191,3 +189,23 @@ developed in an open and cooperative fashion.
         </body>
     </html>
 <!-- end the processing -->
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
diff --git a/docs/luceneplan.html b/docs/luceneplan.html
index d76548dbb99..8c73a02c8dc 100644
--- a/docs/luceneplan.html
+++ b/docs/luceneplan.html
@@ -115,8 +115,8 @@
                         The best reference is <a href="http://www.htdig.org">
                         htDig</a>, though it is not quite as sophisticated as
                         Lucene, it has a number of features that make it
-                        desireable.  It however is a traditional c-compiled app
-                        which makes it somewhat unpleasent to install on some
+                        desirable.  It however is a traditional c-compiled app
+                        which makes it somewhat unpleasant to install on some
                         platforms (like Solaris!).
                 </p>
                                                 <p>
@@ -124,12 +124,12 @@
                         community for an initial reaction, advice, feedback and
                         consent.  Following this it will be submitted to the
                         Lucene user community for support.  Although, I'm (Andy
-                        Oliver) capable of providing these enhancements by 
-                        myself, I'd of course prefer to work on them in concert 
+                        Oliver) capable of providing these enhancements by
+                        myself, I'd of course prefer to work on them in concert
                         with others.
                 </p>
                                                 <p>
-                        While I'm outlaying a fairly large featureset, these can
+                        While I'm outlaying a fairly large feature set, these can
                         be implemented incrementally of course (and are probably
                         best if done that way).
                 </p>
@@ -148,27 +148,27 @@
         <blockquote>
                                     <p>
                         The goal is to provide features to Lucene that allow it
-                        to be used as a dropin search engine.  It should provide
+                        to be used as a drop-in search engine.  It should provide
                         many of the features of projects like <a href="http://www.htdig.org">htDig</a> while surpassing
-                        them with unique Lucene features and capabillities such as
+                        them with unique Lucene features and capabilities such as
                         easy installation on and java-supporting platform,
-                        and support for document fields and field searches.  And 
+                        and support for document fields and field searches.  And
                         of course, <a href="http://apache.org/LICENSE">
                         a pragmatic software license</a>.
                 </p>
                                                 <p>
                         To reach this goal we'll implement code to support the
                         following objectives that augment but do not replace
-                        the current Lucene featureset.  
+                        the current Lucene feature set.
                 </p>
                                                 <ul>
                         <li>
-                                Document Location Independance - meaning mapping
+                                Document Location Independence - meaning mapping
                                 real contexts to runtime contexts.
                                 Essentially, if the document is at
                                 /var/www/htdocs/mydoc.html, I probably want it
                                 indexed as
-                                http://www.bigevilmegacorp.com/mydoc.html.                                
+                                http://www.bigevilmegacorp.com/mydoc.html.
                         </li>
                         <li>
                                 Standard methods of creating central indicies -
@@ -176,21 +176,21 @@
                                 many environments than is *remote* indexing (for
                                 instance http).  I would suggest that most folks
                                 would prefer that general functionality be
-                                suppored by Lucene instead of having to write
+                                supported by Lucene instead of having to write
                                 code for every indexing project.  Obviously, if
                                 what they are doing is *special* they'll have to
-                                code, but general document indexing accross
-                                webservers would not qualify.
+                                code, but general document indexing across
+                                web servers would not qualify.
                         </li>
                         <li>
-                                Document interperatation abstraction - currently
+                                Document interpretation abstraction - currently
                                 one must handle document object construction via
                                 custom code.  A standard interface for plugging
-                                in format handlers should be supported.  
+                                in format handlers should be supported.
                         </li>
                         <li>
                                 Mime and file-extension to document
-                                interperatation mapping.                                  
+                                interpretation mapping.
                         </li>
                 </ul>
                             </blockquote>
@@ -241,7 +241,7 @@
                                 </li>
                                 <li>
                                         replacement type - the type of
-                                        replacewith path:  relative, url or
+                                        replace with path:  relative, URL or
                                         path.
                                 </li>
                                 <li>
@@ -266,8 +266,8 @@
                                         0 - Long.MAX_VALUE.
                                 </li>
                                 <li>
-                                        SleeptimeBetweenCalls - can be used to 
-                                        avoid flooding a machine with too many 
+                                        SleeptimeBetweenCalls - can be used to
+                                        avoid flooding a machine with too many
                                         requests
                                 </li>
                                 <li>
@@ -276,12 +276,12 @@
                                         inactivity.
                                 </li>
                                 <li>
-                                        IncludeFilter - include only items 
-                                        matching filter.  (can occur mulitple
+                                        IncludeFilter - include only items
+                                        matching filter.  (can occur multiple
                                         times)
                                 </li>
                                 <li>
-                                        ExcludeFilter - exclude only items 
+                                        ExcludeFilter - exclude only items
                                         matching filter.  (can occur multiple
                                         times)
                                 </li>
@@ -309,9 +309,9 @@
                                         (probably from the command line) read
                                         this properties file and get them from
                                         it.  Command line options override
-                                        the properties file in the case of 
+                                        the properties file in the case of
                                         duplicates.  There should also be an
-                                        enivironment variable or VM parameter to
+                                        environment variable or VM parameter to
                                         set this.
                                 </li>
                         </ul>
@@ -320,19 +320,19 @@
                         </p>
                                                 <p>
                                 This should extend the AbstractCrawler and
-                                support any addtional options required for a
-                                filesystem index.
+                                support any additional options required for a
+                                file system index.
                         </p>
                                                 <p>
 			      <b>HTTP Crawler </b>
                         </p>
                                                 <p>
-                                Supports the AbstractCrawler options as well as:                                
+                                Supports the AbstractCrawler options as well as:
                         </p>
                                                 <ul>
                                 <li>
-                                        span hosts - Wheter to span hosts or not,
-                                        by default this should be no.                                        
+                                        span hosts - Whether to span hosts or not,
+                                        by default this should be no.
                                 </li>
                                 <li>
                                         restrict domains - (ignored if span
@@ -346,11 +346,11 @@
                                         recurse and go to
                                         /nextcontext/index.html this option says
                                         to also try /nextcontext to get the dir
-                                        lsiting)
+                                        listing)
                                 </li>
                                 <li>
                                         map extensions -
-                                        (always/default/never/fallback).  Wether
+                                        (always/default/never/fallback).  Whether
                                         to always use extension mapping, by
                                         default (fallback to mime type), NEVER
                                         or fallback if mime is not available
@@ -376,8 +376,8 @@
         <blockquote>
                                     <p>
                         A configurable registry of document types, their
-                        description, an identifyer, mime-type and file
-                        extension.  This should map both MIME -&gt; factory 
+                        description, an identifier, mime-type and file
+                        extension.  This should map both MIME -&gt; factory
                         and extension -&gt; factory.
                 </p>
                                                 <p>
@@ -500,17 +500,17 @@
       <tr><td>
         <blockquote>
                                     <p>
-                        A class taht maps standard fields from the
+                        A class that maps standard fields from the
                         DocumentFactories into *fields* in the Document objects
                         they create.  I suggest that a regular expression system
                         or xpath might be the most universal way to do this.
                         For instance if perhaps I had an XML factory that
                         represented XML elements as fields, I could map content
-                        from particular fields to ther fields or supress them
+                        from particular fields to their fields or suppress them
                         entirely.  We could even make this configurable.
                 </p>
                                                 <p>
-                
+
                         for example:
                 </p>
                                                 <ul>
@@ -533,11 +533,11 @@
                         title.suppress=false
                         </li>
                 </ul>
-                                                <p>                
-                        In this example we map html documents such that all 
-                        fields are suppressed but author and title.  We map 
-                        author and title to anything in the content matching 
-                        author: (and x characters).  Okay my regular expresions 
+                                                <p>
+                        In this example we map html documents such that all
+                        fields are suppressed but author and title.  We map
+                        author and title to anything in the content matching
+                        author: (and x characters).  Okay my regular expresions
                         suck but hopefully you get the idea.
                 </p>
                             </blockquote>
@@ -554,35 +554,35 @@
       <tr><td>
         <blockquote>
                                     <p>
-                        We might also consider eliminating the DocumentFactory 
-                        entirely by making an AbstractDocument from which the 
-                        current document object would inherit from.  I 
-                        experimented with this locally, and it was a relatively 
-                        minor code change and there was of course no difference 
-                        in performance.  The Document Factory classes would 
-                        instead be instances of various subclasses of 
+                        We might also consider eliminating the DocumentFactory
+                        entirely by making an AbstractDocument from which the
+                        current document object would inherit from.  I
+                        experimented with this locally, and it was a relatively
+                        minor code change and there was of course no difference
+                        in performance.  The Document Factory classes would
+                        instead be instances of various subclasses of
                         AbstractDocument.
                 </p>
                                                 <p>
-                        My inspiration for this is HTDig (http://www.htdig.org/).  
-                        While this goes slightly beyond what HTDig provides by 
-                        providing field mapping (where HTDIG is just interested 
-                        in Strings/numbers wherever they are found), it provides 
-                        at least what I would need to use this as a dropin for 
-                        most places I contract at (with the obvious exception of 
-                        a default set of content handlers which would of course 
+                        My inspiration for this is HTDig (http://www.htdig.org/).
+                        While this goes slightly beyond what HTDig provides by
+                        providing field mapping (where HTDIG is just interested
+                        in Strings/numbers wherever they are found), it provides
+                        at least what I would need to use this as a drop-in for
+                        most places I contract at (with the obvious exception of
+                        a default set of content handlers which would of course
                         develop naturally over time).
                 </p>
                                                 <p>
-                        I am able to certainly contribute to this effort if the 
-                        development community is open to it.  I'd suggest we do 
-                        it iteratively in stages and not aim for all of this at 
+                        I am able to certainly contribute to this effort if the
+                        development community is open to it.  I'd suggest we do
+                        it iteratively in stages and not aim for all of this at
                         once (for instance leave out the field mapping at first).
                 </p>
                                                 <p>
-                
-                        Anyhow, please give me some feedback, counter 
-                        suggestions, let me know if I'm way off base or out of 
+
+                        Anyhow, please give me some feedback, counter
+                        suggestions, let me know if I'm way off base or out of
                         line, etc. -Andy
                 </p>
                             </blockquote>
diff --git a/docs/lucenesandbox.html b/docs/lucenesandbox.html
index df5704069d7..8b0799f3fc1 100644
--- a/docs/lucenesandbox.html
+++ b/docs/lucenesandbox.html
@@ -106,7 +106,11 @@
       </td></tr>
       <tr><td>
         <blockquote>
-                </blockquote>
+                                    <P>
+You can access Lucene Sandbox CVS repository at
+<A HREF="http://cvs.apache.org/viewcvs/jakarta-lucene-sandbox/">http://cvs.apache.org/viewcvs/jakarta-lucene-sandbox/</A>.
+</P>
+                            </blockquote>
         </p>
       </td></tr>
       <tr><td><br/></td></tr>
diff --git a/docs/whoweare.html b/docs/whoweare.html
index 53b8ec47470..e3030702806 100644
--- a/docs/whoweare.html
+++ b/docs/whoweare.html
@@ -131,11 +131,10 @@ Palo Alto Research Center (PARC), Apple, and Excite@Home, and authored
 several information retrieval <a href="http://lucene.sourceforge.net/publications.html">papers and
 patents</a>.</p>
 
-<p>Doug currently works for <a href="http://www.grandcentral.com/">Grand
-Central</a>.</p>
-
-<p>Please do not email Doug directly about Lucene. Instead use
-the <a href="http://jakarta.apache.org/site/mail.html">Jakarta-Lucene mailing lists</a>.</p>
+<p>Recently Doug has worked on peer-to-peer search at Infrasearch
+(aquired by Sun's JXTA project) and on web services at Grand Central.
+Currently he continues to help develop Lucene and is available for
+contract work.</p>
 
 </li>
 <li><b>Otis Gospodnetic</b> (otis at apache.org)</li>
diff --git a/xdocs/lucenesandbox.xml b/xdocs/lucenesandbox.xml
index ded1cc40f09..0a78523d2b1 100644
--- a/xdocs/lucenesandbox.xml
+++ b/xdocs/lucenesandbox.xml
@@ -16,7 +16,7 @@ not necessarily be maintained, particularly in their current state.
 <P>
 You can access Lucene Sandbox CVS repository at
 <A HREF="http://cvs.apache.org/viewcvs/jakarta-lucene-sandbox/">http://cvs.apache.org/viewcvs/jakarta-lucene-sandbox/</A>.
-
+</P>
 </section>
 
 </body>
diff --git a/xdocs/whoweare.xml b/xdocs/whoweare.xml
index 40a61269c38..0671b5daa9a 100644
--- a/xdocs/whoweare.xml
+++ b/xdocs/whoweare.xml
@@ -22,11 +22,10 @@ several information retrieval <a
 href="http://lucene.sourceforge.net/publications.html">papers and
 patents</a>.</p>
 
-<p>Doug currently works for <a href="http://www.grandcentral.com/">Grand
-Central</a>.</p>
-
-<p>Please do not email Doug directly about Lucene. Instead use
-the <a href="http://jakarta.apache.org/site/mail.html">Jakarta-Lucene mailing lists</a>.</p>
+<p>Recently Doug has worked on peer-to-peer search at Infrasearch
+(aquired by Sun's JXTA project) and on web services at Grand Central.
+Currently he continues to help develop Lucene and is available for
+contract work.</p>
 
 </li>
 <li><b>Otis Gospodnetic</b> (otis at apache.org)</li>


@@ -59,24 +59,22 @@ Resources FAQ (Official) - - JGuru FAQ Getting Started + + JGuru FAQ Bugs Lucene Bugs + + Articles Query Syntax - Javadoc API + Javadoc Contributions - - Lucene Sandbox - - Articles Plans @@ -117,11 +115,11 @@ Jakarta Lucene is a high-performance, full-featured text search engine written entirely in Java. It is a technology suitable for nearly any -application that requires full-text search, especially cross-platform. +application that requires full-text search, especially cross-platform. -Jakarta Lucene is an open source project available for -free download from Apache Jakarta. +Jakarta Lucene is an open source project available for +free download from Apache Jakarta. Please use the links on the left to access Lucene. @@ -142,14 +140,14 @@ Please use the links on the left to access Lucene. Download it here. -Lucene v1.02 released - This release repackages Lucene as a product -of the Apache Software Foundation. Download it +Lucene v1.02 released - This release repackages Lucene as a product +of the Apache Software Foundation. Download it here. -Lucene Joins Jakarta - The Lucene Team is happy to announce that -Lucene is now a part of the Apache Jakarta Project. This move will -help Lucene continue to grow, and enhance its position as the leading +Lucene Joins Jakarta - The Lucene Team is happy to announce that +Lucene is now a part of the Apache Jakarta Project. This move will +help Lucene continue to grow, and enhance its position as the leading server-side searching solution for Java. @@ -166,7 +164,7 @@ server-side searching solution for Java.
-The goal of the Apache Jakarta Project +The goal of the Apache Jakarta Project is to provide commercial-quality server solutions, based on the Java Platform, developed in an open and cooperative fashion. @@ -191,3 +189,23 @@ developed in an open and cooperative fashion. + + + + + + + + + + + + + + + + + + + + diff --git a/docs/luceneplan.html b/docs/luceneplan.html index d76548dbb99..8c73a02c8dc 100644 --- a/docs/luceneplan.html +++ b/docs/luceneplan.html @@ -115,8 +115,8 @@ The best reference is htDig, though it is not quite as sophisticated as Lucene, it has a number of features that make it - desireable. It however is a traditional c-compiled app - which makes it somewhat unpleasent to install on some + desirable. It however is a traditional c-compiled app + which makes it somewhat unpleasant to install on some platforms (like Solaris!). @@ -124,12 +124,12 @@ community for an initial reaction, advice, feedback and consent. Following this it will be submitted to the Lucene user community for support. Although, I'm (Andy - Oliver) capable of providing these enhancements by - myself, I'd of course prefer to work on them in concert + Oliver) capable of providing these enhancements by + myself, I'd of course prefer to work on them in concert with others. - While I'm outlaying a fairly large featureset, these can + While I'm outlaying a fairly large feature set, these can be implemented incrementally of course (and are probably best if done that way). @@ -148,27 +148,27 @@ The goal is to provide features to Lucene that allow it - to be used as a dropin search engine. It should provide + to be used as a drop-in search engine. It should provide many of the features of projects like htDig while surpassing - them with unique Lucene features and capabillities such as + them with unique Lucene features and capabilities such as easy installation on and java-supporting platform, - and support for document fields and field searches. And + and support for document fields and field searches. And of course, a pragmatic software license. To reach this goal we'll implement code to support the following objectives that augment but do not replace - the current Lucene featureset. + the current Lucene feature set. - Document Location Independance - meaning mapping + Document Location Independence - meaning mapping real contexts to runtime contexts. Essentially, if the document is at /var/www/htdocs/mydoc.html, I probably want it indexed as - http://www.bigevilmegacorp.com/mydoc.html. + http://www.bigevilmegacorp.com/mydoc.html. Standard methods of creating central indicies - @@ -176,21 +176,21 @@ many environments than is remote indexing (for instance http). I would suggest that most folks would prefer that general functionality be - suppored by Lucene instead of having to write + supported by Lucene instead of having to write code for every indexing project. Obviously, if what they are doing is special they'll have to - code, but general document indexing accross - webservers would not qualify. + code, but general document indexing across + web servers would not qualify. - Document interperatation abstraction - currently + Document interpretation abstraction - currently one must handle document object construction via custom code. A standard interface for plugging - in format handlers should be supported. + in format handlers should be supported. Mime and file-extension to document - interperatation mapping. + interpretation mapping. @@ -241,7 +241,7 @@ replacement type - the type of - replacewith path: relative, url or + replace with path: relative, URL or path. @@ -266,8 +266,8 @@ 0 - Long.MAX_VALUE. - SleeptimeBetweenCalls - can be used to - avoid flooding a machine with too many + SleeptimeBetweenCalls - can be used to + avoid flooding a machine with too many requests @@ -276,12 +276,12 @@ inactivity. - IncludeFilter - include only items - matching filter. (can occur mulitple + IncludeFilter - include only items + matching filter. (can occur multiple times) - ExcludeFilter - exclude only items + ExcludeFilter - exclude only items matching filter. (can occur multiple times) @@ -309,9 +309,9 @@ (probably from the command line) read this properties file and get them from it. Command line options override - the properties file in the case of + the properties file in the case of duplicates. There should also be an - enivironment variable or VM parameter to + environment variable or VM parameter to set this. @@ -320,19 +320,19 @@ This should extend the AbstractCrawler and - support any addtional options required for a - filesystem index. + support any additional options required for a + file system index. HTTP Crawler - Supports the AbstractCrawler options as well as: + Supports the AbstractCrawler options as well as: - span hosts - Wheter to span hosts or not, - by default this should be no. + span hosts - Whether to span hosts or not, + by default this should be no. restrict domains - (ignored if span @@ -346,11 +346,11 @@ recurse and go to /nextcontext/index.html this option says to also try /nextcontext to get the dir - lsiting) + listing) map extensions - - (always/default/never/fallback). Wether + (always/default/never/fallback). Whether to always use extension mapping, by default (fallback to mime type), NEVER or fallback if mime is not available @@ -376,8 +376,8 @@ A configurable registry of document types, their - description, an identifyer, mime-type and file - extension. This should map both MIME -> factory + description, an identifier, mime-type and file + extension. This should map both MIME -> factory and extension -> factory. @@ -500,17 +500,17 @@
- A class taht maps standard fields from the + A class that maps standard fields from the DocumentFactories into fields in the Document objects they create. I suggest that a regular expression system or xpath might be the most universal way to do this. For instance if perhaps I had an XML factory that represented XML elements as fields, I could map content - from particular fields to ther fields or supress them + from particular fields to their fields or suppress them entirely. We could even make this configurable. - + for example: @@ -533,11 +533,11 @@ title.suppress=false - - In this example we map html documents such that all - fields are suppressed but author and title. We map - author and title to anything in the content matching - author: (and x characters). Okay my regular expresions + + In this example we map html documents such that all + fields are suppressed but author and title. We map + author and title to anything in the content matching + author: (and x characters). Okay my regular expresions suck but hopefully you get the idea. @@ -554,35 +554,35 @@
- We might also consider eliminating the DocumentFactory - entirely by making an AbstractDocument from which the - current document object would inherit from. I - experimented with this locally, and it was a relatively - minor code change and there was of course no difference - in performance. The Document Factory classes would - instead be instances of various subclasses of + We might also consider eliminating the DocumentFactory + entirely by making an AbstractDocument from which the + current document object would inherit from. I + experimented with this locally, and it was a relatively + minor code change and there was of course no difference + in performance. The Document Factory classes would + instead be instances of various subclasses of AbstractDocument. - My inspiration for this is HTDig (http://www.htdig.org/). - While this goes slightly beyond what HTDig provides by - providing field mapping (where HTDIG is just interested - in Strings/numbers wherever they are found), it provides - at least what I would need to use this as a dropin for - most places I contract at (with the obvious exception of - a default set of content handlers which would of course + My inspiration for this is HTDig (http://www.htdig.org/). + While this goes slightly beyond what HTDig provides by + providing field mapping (where HTDIG is just interested + in Strings/numbers wherever they are found), it provides + at least what I would need to use this as a drop-in for + most places I contract at (with the obvious exception of + a default set of content handlers which would of course develop naturally over time). - I am able to certainly contribute to this effort if the - development community is open to it. I'd suggest we do - it iteratively in stages and not aim for all of this at + I am able to certainly contribute to this effort if the + development community is open to it. I'd suggest we do + it iteratively in stages and not aim for all of this at once (for instance leave out the field mapping at first). - - Anyhow, please give me some feedback, counter - suggestions, let me know if I'm way off base or out of + + Anyhow, please give me some feedback, counter + suggestions, let me know if I'm way off base or out of line, etc. -Andy diff --git a/docs/lucenesandbox.html b/docs/lucenesandbox.html index df5704069d7..8b0799f3fc1 100644 --- a/docs/lucenesandbox.html +++ b/docs/lucenesandbox.html @@ -106,7 +106,11 @@
- + +You can access Lucene Sandbox CVS repository at +http://cvs.apache.org/viewcvs/jakarta-lucene-sandbox/. + +