mirror of https://github.com/apache/lucene.git
PR:
Obtained from: Submitted by: Reviewed by: implemented suggestions by Marc Tucker git-svn-id: https://svn.apache.org/repos/asf/lucene/java/trunk@149703 13f79535-47bb-0310-9956-ffa450edef68
This commit is contained in:
parent
0aa1ebe281
commit
3929335807
|
@ -199,24 +199,24 @@
|
|||
<table border="0" cellspacing="0" cellpadding="2" width="100%">
|
||||
<tr><td bgcolor="#525D76">
|
||||
<font color="#ffffff" face="arial,helvetica,sanserif">
|
||||
<a name="Indexers"><strong>Indexers</strong></a>
|
||||
<a name="Crawlers"><strong>Crawlers</strong></a>
|
||||
</font>
|
||||
</td></tr>
|
||||
<tr><td>
|
||||
<blockquote>
|
||||
<p>
|
||||
Indexers are standard crawlers. They go crawl a file
|
||||
Crawlers are data source executable code. They crawl a file
|
||||
system, ftp site, web site, etc. to create the index.
|
||||
These standard indexers may not make ALL of Lucene's
|
||||
These standard crawlers may not make ALL of Lucene's
|
||||
functionality available, though they should be able to
|
||||
make most of it available through configuration.
|
||||
</p>
|
||||
<p>
|
||||
<b> Abstract Indexer </b>
|
||||
<b> Abstract Crawler </b>
|
||||
</p>
|
||||
<p>
|
||||
The Abstract indexer is basically the parent for all
|
||||
Indexer classes. It provides implementation for the
|
||||
The AbstractCrawler is basically the parent for all
|
||||
Crawler classes. It provides implementation for the
|
||||
following functions/properties:
|
||||
</p>
|
||||
<ul>
|
||||
|
@ -263,6 +263,35 @@
|
|||
be turned on or this is ignored. Range:
|
||||
0 - Long.MAX_VALUE.
|
||||
</li>
|
||||
<li>
|
||||
SleeptimeBetweenCalls - can be used to
|
||||
avoid flooding a machine with too many
|
||||
requests
|
||||
</li>
|
||||
<li>
|
||||
RequestTimeout - kill the crawler
|
||||
request after the specified period of
|
||||
inactivity.
|
||||
</li>
|
||||
<li>
|
||||
IncludeFilter - include only items
|
||||
matching filter. (can occur mulitple
|
||||
times)
|
||||
</li>
|
||||
<li>
|
||||
ExcludeFilter - exclude only items
|
||||
matching filter. (can occur multiple
|
||||
times)
|
||||
</li>
|
||||
<li>
|
||||
MaxItems - stops indexing after x
|
||||
documents have been indexed.
|
||||
</li>
|
||||
<li>
|
||||
MaxMegs - stops indexing after x megs
|
||||
have been indexed.. (should this be in
|
||||
specific crawlers?)
|
||||
</li>
|
||||
<li>
|
||||
properties - in addition to the settings
|
||||
(probably from the command line) read
|
||||
|
@ -275,18 +304,18 @@
|
|||
</li>
|
||||
</ul>
|
||||
<p>
|
||||
<b>FileSystemIndexer</b>
|
||||
<b>FileSystemCrawler</b>
|
||||
</p>
|
||||
<p>
|
||||
This should extend the AbstractIndexer and
|
||||
This should extend the AbstractCrawler and
|
||||
support any addtional options required for a
|
||||
filesystem index.
|
||||
</p>
|
||||
<p>
|
||||
<b>HTTP Indexer </b>
|
||||
<b>HTTP Crawler </b>
|
||||
</p>
|
||||
<p>
|
||||
Supports the AbstractIndexer options as well as:
|
||||
Supports the AbstractCrawler options as well as:
|
||||
</p>
|
||||
<ul>
|
||||
<li>
|
||||
|
|
|
@ -91,21 +91,21 @@
|
|||
</li>
|
||||
</ul>
|
||||
</section>
|
||||
<section name="Indexers">
|
||||
<section name="Crawlers">
|
||||
<p>
|
||||
Indexers are standard crawlers. They go crawl a file
|
||||
Crawlers are data source executable code. They crawl a file
|
||||
system, ftp site, web site, etc. to create the index.
|
||||
These standard indexers may not make ALL of Lucene's
|
||||
These standard crawlers may not make ALL of Lucene's
|
||||
functionality available, though they should be able to
|
||||
make most of it available through configuration.
|
||||
</p>
|
||||
<!--<section name="AbstractIndexer">-->
|
||||
<p>
|
||||
<b> Abstract Indexer </b>
|
||||
<b> Abstract Crawler </b>
|
||||
</p>
|
||||
<p>
|
||||
The Abstract indexer is basically the parent for all
|
||||
Indexer classes. It provides implementation for the
|
||||
The AbstractCrawler is basically the parent for all
|
||||
Crawler classes. It provides implementation for the
|
||||
following functions/properties:
|
||||
</p>
|
||||
<ul>
|
||||
|
@ -152,6 +152,35 @@
|
|||
be turned on or this is ignored. Range:
|
||||
0 - Long.MAX_VALUE.
|
||||
</li>
|
||||
<li>
|
||||
SleeptimeBetweenCalls - can be used to
|
||||
avoid flooding a machine with too many
|
||||
requests
|
||||
</li>
|
||||
<li>
|
||||
RequestTimeout - kill the crawler
|
||||
request after the specified period of
|
||||
inactivity.
|
||||
</li>
|
||||
<li>
|
||||
IncludeFilter - include only items
|
||||
matching filter. (can occur mulitple
|
||||
times)
|
||||
</li>
|
||||
<li>
|
||||
ExcludeFilter - exclude only items
|
||||
matching filter. (can occur multiple
|
||||
times)
|
||||
</li>
|
||||
<li>
|
||||
MaxItems - stops indexing after x
|
||||
documents have been indexed.
|
||||
</li>
|
||||
<li>
|
||||
MaxMegs - stops indexing after x megs
|
||||
have been indexed.. (should this be in
|
||||
specific crawlers?)
|
||||
</li>
|
||||
<li>
|
||||
properties - in addition to the settings
|
||||
(probably from the command line) read
|
||||
|
@ -166,20 +195,20 @@
|
|||
<!--</section>-->
|
||||
<!--<s2 title="FileSystemIndexer">-->
|
||||
<p>
|
||||
<b>FileSystemIndexer</b>
|
||||
<b>FileSystemCrawler</b>
|
||||
</p>
|
||||
<p>
|
||||
This should extend the AbstractIndexer and
|
||||
This should extend the AbstractCrawler and
|
||||
support any addtional options required for a
|
||||
filesystem index.
|
||||
</p>
|
||||
<!--</s2>-->
|
||||
<!--<s2 title="HTTPIndexer">-->
|
||||
<p>
|
||||
<b>HTTP Indexer </b>
|
||||
<b>HTTP Crawler </b>
|
||||
</p>
|
||||
<p>
|
||||
Supports the AbstractIndexer options as well as:
|
||||
Supports the AbstractCrawler options as well as:
|
||||
</p>
|
||||
<ul>
|
||||
<li>
|
||||
|
|
Loading…
Reference in New Issue