- Fixed spelling a bit.

- Nukes trailing blank spaces.

git-svn-id: https://svn.apache.org/repos/asf/lucene/java/trunk@149766 13f79535-47bb-0310-9956-ffa450edef68
2002-06-04 15:29:32 +00:00 · abefb1b48e
parent 241f32309d
commit abefb1b48e
1 changed file with 68 additions and 68 deletions
--- a/xdocs/luceneplan.xml
+++ b/xdocs/luceneplan.xml
@@ -1,5 +1,5 @@
 <?xml version="1.0" encoding="UTF-8"?>
-     
+
 <document>
  <properties>
   <title>Plan for enhancements to Lucene</title>
@@ -8,7 +8,7 @@
   </authors>
  </properties>
  <body>
-  
+
        <section name="Purpose">
                <p>
                        The purpose of this document is to outline plans for
@@ -21,8 +21,8 @@
                        The best reference is <a href="http://www.htdig.org">
                        htDig</a>, though it is not quite as sophisticated as
                        Lucene, it has a number of features that make it
-                        desireable.  It however is a traditional c-compiled app
-                        which makes it somewhat unpleasent to install on some
+                        desirable.  It however is a traditional c-compiled app
+                        which makes it somewhat unpleasant to install on some
                        platforms (like Solaris!).
                </p>
                <p>
@@ -30,42 +30,42 @@
                        community for an initial reaction, advice, feedback and
                        consent.  Following this it will be submitted to the
                        Lucene user community for support.  Although, I'm (Andy
-                        Oliver) capable of providing these enhancements by 
-                        myself, I'd of course prefer to work on them in concert 
+                        Oliver) capable of providing these enhancements by
+                        myself, I'd of course prefer to work on them in concert
                        with others.
                </p>
                <p>
-                        While I'm outlaying a fairly large featureset, these can
+                        While I'm outlaying a fairly large feature set, these can
                        be implemented incrementally of course (and are probably
                        best if done that way).
                </p>
        </section>
-  
+
        <section name="Goal and Objectives">
                <p>
                        The goal is to provide features to Lucene that allow it
-                        to be used as a dropin search engine.  It should provide
+                        to be used as a drop-in search engine.  It should provide
                        many of the features of projects like <a
                        href="http://www.htdig.org">htDig</a> while surpassing
-                        them with unique Lucene features and capabillities such as
+                        them with unique Lucene features and capabilities such as
                        easy installation on and java-supporting platform,
-                        and support for document fields and field searches.  And 
+                        and support for document fields and field searches.  And
                        of course, <a href="http://apache.org/LICENSE">
                        a pragmatic software license</a>.
                </p>
                <p>
                        To reach this goal we'll implement code to support the
                        following objectives that augment but do not replace
-                        the current Lucene featureset.  
+                        the current Lucene feature set.
                </p>
                <ul>
                        <li>
-                                Document Location Independance - meaning mapping
+                                Document Location Independence - meaning mapping
                                real contexts to runtime contexts.
                                Essentially, if the document is at
                                /var/www/htdocs/mydoc.html, I probably want it
                                indexed as
-                                http://www.bigevilmegacorp.com/mydoc.html.                                
+                                http://www.bigevilmegacorp.com/mydoc.html.
                        </li>
                        <li>
                                Standard methods of creating central indicies -
@@ -73,21 +73,21 @@
                                many environments than is *remote* indexing (for
                                instance http).  I would suggest that most folks
                                would prefer that general functionality be
-                                suppored by Lucene instead of having to write
+                                supported by Lucene instead of having to write
                                code for every indexing project.  Obviously, if
                                what they are doing is *special* they'll have to
-                                code, but general document indexing accross
-                                webservers would not qualify.
+                                code, but general document indexing across
+                                web servers would not qualify.
                        </li>
                        <li>
-                                Document interperatation abstraction - currently
+                                Document interpretation abstraction - currently
                                one must handle document object construction via
                                custom code.  A standard interface for plugging
-                                in format handlers should be supported.  
+                                in format handlers should be supported.
                        </li>
                        <li>
                                Mime and file-extension to document
-                                interperatation mapping.                                  
+                                interpretation mapping.
                        </li>
                </ul>
        </section>
@@ -128,7 +128,7 @@
                                </li>
                                <li>
                                        replacement type - the type of
-                                        replacewith path:  relative, url or
+                                        replace with path:  relative, URL or
                                        path.
                                </li>
                                <li>
@@ -153,8 +153,8 @@
                                        0 - Long.MAX_VALUE.
                                </li>
                                <li>
-                                        SleeptimeBetweenCalls - can be used to 
-                                        avoid flooding a machine with too many 
+                                        SleeptimeBetweenCalls - can be used to
+                                        avoid flooding a machine with too many
                                        requests
                                </li>
                                <li>
@@ -163,12 +163,12 @@
                                        inactivity.
                                </li>
                                <li>
-                                        IncludeFilter - include only items 
-                                        matching filter.  (can occur mulitple
+                                        IncludeFilter - include only items
+                                        matching filter.  (can occur multiple
                                        times)
                                </li>
                                <li>
-                                        ExcludeFilter - exclude only items 
+                                        ExcludeFilter - exclude only items
                                        matching filter.  (can occur multiple
                                        times)
                                </li>
@@ -196,9 +196,9 @@
                                        (probably from the command line) read
                                        this properties file and get them from
                                        it.  Command line options override
-                                        the properties file in the case of 
+                                        the properties file in the case of
                                        duplicates.  There should also be an
-                                        enivironment variable or VM parameter to
+                                        environment variable or VM parameter to
                                        set this.
                                </li>
                        </ul>
@@ -209,8 +209,8 @@
                        </p>
                        <p>
                                This should extend the AbstractCrawler and
-                                support any addtional options required for a
-                                filesystem index.
+                                support any additional options required for a
+                                file system index.
                        </p>
                <!--</s2>-->
                <!--<s2 title="HTTPIndexer">-->
@@ -218,12 +218,12 @@
 			      <b>HTTP Crawler </b>
                        </p>
                        <p>
-                                Supports the AbstractCrawler options as well as:                                
+                                Supports the AbstractCrawler options as well as:
                        </p>
                        <ul>
                                <li>
-                                        span hosts - Wheter to span hosts or not,
-                                        by default this should be no.                                        
+                                        span hosts - Whether to span hosts or not,
+                                        by default this should be no.
                                </li>
                                <li>
                                        restrict domains - (ignored if span
@@ -237,11 +237,11 @@
                                        recurse and go to
                                        /nextcontext/index.html this option says
                                        to also try /nextcontext to get the dir
-                                        lsiting)
+                                        listing)
                                </li>
                                <li>
                                        map extensions -
-                                        (always/default/never/fallback).  Wether
+                                        (always/default/never/fallback).  Whether
                                        to always use extension mapping, by
                                        default (fallback to mime type), NEVER
                                        or fallback if mime is not available
@@ -254,12 +254,12 @@
                        </ul>
        <!--        </s2> -->
        </section>
-        
+
        <section name="MIMEMap">
                <p>
                        A configurable registry of document types, their
-                        description, an identifyer, mime-type and file
-                        extension.  This should map both MIME -> factory 
+                        description, an identifier, mime-type and file
+                        extension.  This should map both MIME -> factory
                        and extension -> factory.
                </p>
                <p>
@@ -287,7 +287,7 @@
                                        <td>"html,htm"</td>
                                        <td></td>
                                        <td>HTMLDocumentFactory</td>
-                                </tr>                                
+                                </tr>
                        </table>
        </section>
        <section name="DocumentFactory">
@@ -300,17 +300,17 @@
        </section>
        <section name="FieldMapping classes">
                <p>
-                        A class taht maps standard fields from the
+                        A class that maps standard fields from the
                        DocumentFactories into *fields* in the Document objects
                        they create.  I suggest that a regular expression system
                        or xpath might be the most universal way to do this.
                        For instance if perhaps I had an XML factory that
                        represented XML elements as fields, I could map content
-                        from particular fields to ther fields or supress them
+                        from particular fields to their fields or suppress them
                        entirely.  We could even make this configurable.
                </p>
                <p>
-                
+
                        for example:
                </p>
                <ul>
@@ -333,48 +333,48 @@
                        title.suppress=false
                        </li>
                </ul>
-                <p>                
-                        In this example we map html documents such that all 
-                        fields are suppressed but author and title.  We map 
-                        author and title to anything in the content matching 
-                        author: (and x characters).  Okay my regular expresions 
+                <p>
+                        In this example we map html documents such that all
+                        fields are suppressed but author and title.  We map
+                        author and title to anything in the content matching
+                        author: (and x characters).  Okay my regular expresions
                        suck but hopefully you get the idea.
                </p>
        </section>
        <section name="Final Thoughts">
                <p>
-                        We might also consider eliminating the DocumentFactory 
-                        entirely by making an AbstractDocument from which the 
-                        current document object would inherit from.  I 
-                        experimented with this locally, and it was a relatively 
-                        minor code change and there was of course no difference 
-                        in performance.  The Document Factory classes would 
-                        instead be instances of various subclasses of 
+                        We might also consider eliminating the DocumentFactory
+                        entirely by making an AbstractDocument from which the
+                        current document object would inherit from.  I
+                        experimented with this locally, and it was a relatively
+                        minor code change and there was of course no difference
+                        in performance.  The Document Factory classes would
+                        instead be instances of various subclasses of
                        AbstractDocument.
                </p>
                <p>
-                        My inspiration for this is HTDig (http://www.htdig.org/).  
-                        While this goes slightly beyond what HTDig provides by 
-                        providing field mapping (where HTDIG is just interested 
-                        in Strings/numbers wherever they are found), it provides 
-                        at least what I would need to use this as a dropin for 
-                        most places I contract at (with the obvious exception of 
-                        a default set of content handlers which would of course 
+                        My inspiration for this is HTDig (http://www.htdig.org/).
+                        While this goes slightly beyond what HTDig provides by
+                        providing field mapping (where HTDIG is just interested
+                        in Strings/numbers wherever they are found), it provides
+                        at least what I would need to use this as a drop-in for
+                        most places I contract at (with the obvious exception of
+                        a default set of content handlers which would of course
                        develop naturally over time).
                </p>
                <p>
-                        I am able to certainly contribute to this effort if the 
-                        development community is open to it.  I'd suggest we do 
-                        it iteratively in stages and not aim for all of this at 
+                        I am able to certainly contribute to this effort if the
+                        development community is open to it.  I'd suggest we do
+                        it iteratively in stages and not aim for all of this at
                        once (for instance leave out the field mapping at first).
                </p>
                <p>
-                
-                        Anyhow, please give me some feedback, counter 
-                        suggestions, let me know if I'm way off base or out of 
+
+                        Anyhow, please give me some feedback, counter
+                        suggestions, let me know if I'm way off base or out of
                        line, etc. -Andy
                </p>
        </section>
-                
+
  </body>
 </document>