lucene/sandbox/contributions/webcrawler-LARM
cmarschner 922db8cfe6 added host semaphores
git-svn-id: https://svn.apache.org/repos/asf/lucene/java/trunk@150847 13f79535-47bb-0310-9956-ffa450edef68
2002-10-22 15:22:59 +00:00
doc added with -kb option 2002-06-30 18:05:15 +00:00
libs added storage pipeline; some fixes on Tokenizer 2002-06-01 18:55:16 +00:00
src added host semaphores 2002-10-22 15:22:59 +00:00
.cvsignore - Files to ignore. 2002-09-15 18:18:10 +00:00
CHANGES.txt added LuceneStorage 2002-06-18 00:49:57 +00:00
README.txt added documentation 2002-05-13 21:26:09 +00:00
TODO.txt see file 2002-06-18 11:39:51 +00:00
build.bat added license info; added anchor text extraction 2002-05-22 23:09:22 +00:00
build.properties.sample - Added oro.jar property. 2002-06-30 14:58:49 +00:00
build.sh added license info; added anchor text extraction 2002-05-22 23:09:22 +00:00
build.xml - Split target build into targets compile and dist. Made compile target the 2002-09-15 00:58:07 +00:00
clean.sh Initial revision 2002-05-04 13:58:45 +00:00
cleanlastrun.sh - Modified to clean dirs/files created by run.sh. 2002-09-14 19:22:45 +00:00
default.build.properties - Default build properties for LARM. Initial checkin. 2002-09-14 19:19:47 +00:00
run.bat added license info; added anchor text extraction 2002-05-22 23:09:22 +00:00
run.sh - Modified to make it usable. This way we don't have to use Ant to run LARM. 2002-09-14 18:51:49 +00:00

README.txt

$Id$

This is the README file for the webcrawler-LARM contribution to the Lucene Sandbox.

This contribution requires:

a) HTTPClient.jar (not Jakarta's, but this one:
    http://www.innovation.ch/java/HTTPClient/)
b) Jakarta ORO package for regular expressions

Put the .jars into the lib directory. 
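
A quick way to confirm that both libraries are actually visible before
starting a build or a crawl is to try loading one class from each. The sketch
below is only an illustration: the class name DependencyCheck is made up for
this example, and the two class names it probes (HTTPClient.HTTPConnection
and org.apache.oro.text.regex.Perl5Compiler) are chosen as representative
classes of the two packages, not as something the LARM build itself checks.

```java
// Hypothetical helper, not part of LARM: reports whether the two required
// libraries can be found on the current classpath.
class DependencyCheck {

    /** Returns true if the named class can be located without initializing it. */
    public static boolean isAvailable(String className) {
        try {
            Class.forName(className, false, DependencyCheck.class.getClassLoader());
            return true;
        } catch (ClassNotFoundException | LinkageError e) {
            return false;
        }
    }

    public static void main(String[] args) {
        // Assumption: one representative class per required jar.
        String[] required = {
            "HTTPClient.HTTPConnection",               // HTTPClient.jar
            "org.apache.oro.text.regex.Perl5Compiler"  // Jakarta ORO
        };
        for (String name : required) {
            System.out.println(name + ": " + (isAvailable(name) ? "found" : "MISSING"));
        }
    }
}
```

Run it with the same classpath that is used for the crawler; a MISSING line
means the corresponding jar is not being picked up.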

Some of the HTTPClient source files are replaced during the build, so the
HTTPClient sources must be available at build time. Sorry, as far as I
remember, I couldn't do that with inheritance.

- This contribution also uses portions of the HeX HTML parser, which is
included.

OG>  I am not sure if Clemens modified this parser in any way.  If not,
OG>  maybe we don't have to include it and can instead just add it to the
OG>  list of required packages.

The parser was turned upside down. Although it apparently still needs some
of the original interfaces, most of them can probably be removed. I will check
that.

OG>  This code requires(?) JDK 1.4, as it uses the assert keyword.

No. It still contains a method called assert() for testing. I will probably 
rename this sometime (e.g. when changing the tests to JUnit).
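
The rename matters because, from JDK 1.4 on, assert is a keyword when
compiling with -source 1.4 or later, so a method with that name no longer
compiles there. A minimal sketch of what a renamed helper could look like is
shown below; the class name Check and the method name check are made up for
this illustration and are not the names used in the LARM sources.

```java
// Hypothetical replacement for a test helper method named assert().
final class Check {

    private Check() {
        // static helper only, no instances
    }

    /** Throws IllegalStateException carrying the message if the condition is false. */
    public static void check(boolean condition, String message) {
        if (!condition) {
            throw new IllegalStateException("check failed: " + message);
        }
    }

    public static void main(String[] args) {
        check(1 + 1 == 2, "arithmetic still works");
        System.out.println("all checks passed");
    }
}
```

Because check is an ordinary identifier, code like this compiles the same way
on JDK 1.3 and on later versions, and it can later be replaced by JUnit
assertions.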
