Obtained from:
Submitted by:
Reviewed by:
implemented suggestions by Marc Tucker


git-svn-id: https://svn.apache.org/repos/asf/lucene/java/trunk@149703 13f79535-47bb-0310-9956-ffa450edef68
Andrew C. Oliver 2002-02-24 15:58:41 +00:00
parent 0aa1ebe281
commit 3929335807
2 changed files with 78 additions and 20 deletions

@@ -199,24 +199,24 @@
 <table border="0" cellspacing="0" cellpadding="2" width="100%">
 <tr><td bgcolor="#525D76">
 <font color="#ffffff" face="arial,helvetica,sanserif">
-<a name="Indexers"><strong>Indexers</strong></a>
+<a name="Crawlers"><strong>Crawlers</strong></a>
 </font>
 </td></tr>
 <tr><td>
 <blockquote>
 <p>
-Indexers are standard crawlers. They go crawl a file
+Crawlers are data source executable code. They crawl a file
 system, ftp site, web site, etc. to create the index.
-These standard indexers may not make ALL of Lucene's
+These standard crawlers may not make ALL of Lucene's
 functionality available, though they should be able to
 make most of it available through configuration.
 </p>
 <p>
-<b> Abstract Indexer </b>
+<b> Abstract Crawler </b>
 </p>
 <p>
-The Abstract indexer is basically the parent for all
-Indexer classes. It provides implementation for the
+The AbstractCrawler is basically the parent for all
+Crawler classes. It provides implementation for the
 following functions/properties:
 </p>
 <ul>
@@ -263,6 +263,35 @@
 be turned on or this is ignored. Range:
 0 - Long.MAX_VALUE.
 </li>
+<li>
+SleeptimeBetweenCalls - can be used to
+avoid flooding a machine with too many
+requests
+</li>
+<li>
+RequestTimeout - kill the crawler
+request after the specified period of
+inactivity.
+</li>
+<li>
+IncludeFilter - include only items
+matching filter. (can occur multiple
+times)
+</li>
+<li>
+ExcludeFilter - exclude only items
+matching filter. (can occur multiple
+times)
+</li>
+<li>
+MaxItems - stops indexing after x
+documents have been indexed.
+</li>
+<li>
+MaxMegs - stops indexing after x megs
+have been indexed. (should this be in
+specific crawlers?)
+</li>
 <li>
 properties - in addition to the settings
 (probably from the command line) read
@@ -275,18 +304,18 @@
 </li>
 </ul>
 <p>
-<b>FileSystemIndexer</b>
+<b>FileSystemCrawler</b>
 </p>
 <p>
-This should extend the AbstractIndexer and
+This should extend the AbstractCrawler and
 support any additional options required for a
 filesystem index.
 </p>
 <p>
-<b>HTTP Indexer </b>
+<b>HTTP Crawler </b>
 </p>
 <p>
-Supports the AbstractIndexer options as well as:
+Supports the AbstractCrawler options as well as:
 </p>
 <ul>
 <li>

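The AbstractCrawler properties in this diff are described only in prose. A minimal Java sketch of how such a base class might hold them, assuming millisecond units for SleeptimeBetweenCalls and RequestTimeout, full-match regular expressions for IncludeFilter/ExcludeFilter, and 0 meaning "no limit" for MaxItems/MaxMegs; none of these choices, nor the method names `accepts`, `recordIndexed`, `limitReached` and `pauseBetweenCalls`, come from the proposal, so treat them as hypothetical:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Pattern;

// Hypothetical sketch of the shared settings the proposed AbstractCrawler
// might hold. Property names follow the proposal; field types, units,
// regex filter syntax and method names are assumptions.
abstract class AbstractCrawler {
    // SleeptimeBetweenCalls: pause between requests, in milliseconds (assumed unit).
    protected long sleeptimeBetweenCalls = 0L;
    // RequestTimeout: abandon a request after this much inactivity, in milliseconds
    // (assumed unit). Concrete crawlers would pass it to their transport.
    protected long requestTimeout = 30_000L;
    // IncludeFilter and ExcludeFilter may each occur multiple times.
    protected final List<Pattern> includeFilters = new ArrayList<>();
    protected final List<Pattern> excludeFilters = new ArrayList<>();
    // MaxItems and MaxMegs: stop once either limit is reached; 0 means no limit (assumption).
    protected long maxItems = 0L;
    protected long maxMegs = 0L;

    private long itemsIndexed = 0L;
    private long bytesIndexed = 0L;

    void addIncludeFilter(String regex) {
        includeFilters.add(Pattern.compile(regex));
    }

    void addExcludeFilter(String regex) {
        excludeFilters.add(Pattern.compile(regex));
    }

    // Accept an item if it fully matches some include filter (or no include
    // filter is set) and matches no exclude filter.
    boolean accepts(String item) {
        boolean included = includeFilters.isEmpty()
                || includeFilters.stream().anyMatch(p -> p.matcher(item).matches());
        boolean excluded = excludeFilters.stream().anyMatch(p -> p.matcher(item).matches());
        return included && !excluded;
    }

    // Record one indexed document of the given size in bytes.
    void recordIndexed(long sizeInBytes) {
        itemsIndexed++;
        bytesIndexed += sizeInBytes;
    }

    // True once MaxItems or MaxMegs (when set) has been reached.
    boolean limitReached() {
        boolean itemLimit = maxItems > 0 && itemsIndexed >= maxItems;
        boolean megLimit = maxMegs > 0 && bytesIndexed >= maxMegs * 1024L * 1024L;
        return itemLimit || megLimit;
    }

    // Honor SleeptimeBetweenCalls between two consecutive requests.
    protected void pauseBetweenCalls() throws InterruptedException {
        if (sleeptimeBetweenCalls > 0) {
            Thread.sleep(sleeptimeBetweenCalls);
        }
    }

    // Each concrete crawler (file system, HTTP, ...) supplies its own traversal.
    abstract void crawl() throws Exception;
}
```

Keeping the filter and limit checks in the base class would let each concrete crawler implement only `crawl()`, calling `accepts`, `recordIndexed` and `limitReached` as it walks its data source.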
@@ -91,21 +91,21 @@
 </li>
 </ul>
 </section>
-<section name="Indexers">
+<section name="Crawlers">
 <p>
-Indexers are standard crawlers. They go crawl a file
+Crawlers are data source executable code. They crawl a file
 system, ftp site, web site, etc. to create the index.
-These standard indexers may not make ALL of Lucene's
+These standard crawlers may not make ALL of Lucene's
 functionality available, though they should be able to
 make most of it available through configuration.
 </p>
 <!--<section name="AbstractIndexer">-->
 <p>
-<b> Abstract Indexer </b>
+<b> Abstract Crawler </b>
 </p>
 <p>
-The Abstract indexer is basically the parent for all
-Indexer classes. It provides implementation for the
+The AbstractCrawler is basically the parent for all
+Crawler classes. It provides implementation for the
 following functions/properties:
 </p>
 <ul>
@@ -152,6 +152,35 @@
 be turned on or this is ignored. Range:
 0 - Long.MAX_VALUE.
 </li>
+<li>
+SleeptimeBetweenCalls - can be used to
+avoid flooding a machine with too many
+requests
+</li>
+<li>
+RequestTimeout - kill the crawler
+request after the specified period of
+inactivity.
+</li>
+<li>
+IncludeFilter - include only items
+matching filter. (can occur multiple
+times)
+</li>
+<li>
+ExcludeFilter - exclude only items
+matching filter. (can occur multiple
+times)
+</li>
+<li>
+MaxItems - stops indexing after x
+documents have been indexed.
+</li>
+<li>
+MaxMegs - stops indexing after x megs
+have been indexed. (should this be in
+specific crawlers?)
+</li>
 <li>
 properties - in addition to the settings
 (probably from the command line) read
@@ -166,20 +195,20 @@
 <!--</section>-->
 <!--<s2 title="FileSystemIndexer">-->
 <p>
-<b>FileSystemIndexer</b>
+<b>FileSystemCrawler</b>
 </p>
 <p>
-This should extend the AbstractIndexer and
+This should extend the AbstractCrawler and
 support any additional options required for a
 filesystem index.
 </p>
 <!--</s2>-->
 <!--<s2 title="HTTPIndexer">-->
 <p>
-<b>HTTP Indexer </b>
+<b>HTTP Crawler </b>
 </p>
 <p>
-Supports the AbstractIndexer options as well as:
+Supports the AbstractCrawler options as well as:
 </p>
 <ul>
 <li>
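The proposal says only that FileSystemCrawler should extend AbstractCrawler and add any filesystem-specific options. A minimal Java sketch of that relationship, under assumptions that are not in the source: the include filter is a single `java.nio.file` glob matched against paths relative to the crawl root, MaxItems counts accepted files, and `crawl()` returns the selected paths instead of indexing them. Both class bodies are hypothetical, and a stripped-down AbstractCrawler is repeated so the example compiles on its own:

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.PathMatcher;
import java.util.ArrayList;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;

// Stripped-down stand-in for the proposed AbstractCrawler so this sketch
// compiles on its own; only the settings used below are included.
abstract class AbstractCrawler {
    protected long maxItems = 0L;             // MaxItems; 0 means no limit (assumption)
    protected String includeGlob = "glob:**"; // one IncludeFilter, as a glob (assumption)

    abstract List<Path> crawl() throws IOException;
}

// Hypothetical FileSystemCrawler: walks a directory tree and returns the
// files it would hand to Lucene for indexing.
class FileSystemCrawler extends AbstractCrawler {
    private final Path root;

    FileSystemCrawler(Path root) {
        this.root = root;
    }

    @Override
    List<Path> crawl() throws IOException {
        PathMatcher matcher = root.getFileSystem().getPathMatcher(includeGlob);
        List<Path> candidates;
        // Files.walk holds directories open, so close the stream when done.
        try (Stream<Path> paths = Files.walk(root)) {
            candidates = paths.sorted().collect(Collectors.toList());
        }
        List<Path> selected = new ArrayList<>();
        for (Path p : candidates) {
            if (maxItems > 0 && selected.size() >= maxItems) {
                break; // MaxItems reached
            }
            // The filter is matched against the path relative to the crawl root.
            if (Files.isRegularFile(p) && matcher.matches(root.relativize(p))) {
                selected.add(p); // a real crawler would build a Lucene Document here
            }
        }
        return selected;
    }
}
```

With `includeGlob = "glob:**.html"`, `new FileSystemCrawler(Path.of("docs")).crawl()` would return every `.html` file under `docs` in sorted order; the `**` pattern is used because, in `java.nio.file` globs, `*` does not cross directory boundaries.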