From 91b3058b38fb382c2a8f3209d8e141fc573b176d Mon Sep 17 00:00:00 2001
From: cmarschner
Date: Tue, 18 Jun 2002 00:49:57 +0000
Subject: [PATCH] added LuceneStorage

git-svn-id: https://svn.apache.org/repos/asf/lucene/java/trunk@150788 13f79535-47bb-0310-9956-ffa450edef68
---
 sandbox/contributions/webcrawler-LARM/CHANGES.txt | 14 +++++++++++++-
 1 file changed, 13 insertions(+), 1 deletion(-)

diff --git a/sandbox/contributions/webcrawler-LARM/CHANGES.txt b/sandbox/contributions/webcrawler-LARM/CHANGES.txt
index 58d8bf752b7..b11e0036f8e 100644
--- a/sandbox/contributions/webcrawler-LARM/CHANGES.txt
+++ b/sandbox/contributions/webcrawler-LARM/CHANGES.txt
@@ -1,4 +1,16 @@
-$id: $
+$Id$
+
+2002-06-18 (cmarschner)
+ * added an experimental version of Lucene storage. see FetcherMain.java for details how to use it
+   LuceneStorage simply saves all fields as specified in WebDocument. add a converter to the
+   storage pipeline before LuceneStorage to do preprocessing
+
+2002-06-17 (cmarschner)
+ * moved HostInfo and HostManager to larm.net package
+ * included URLNormalizer (todo: source code Docs)
+ * changed filters to use normalized URLs when appropriate;
+   logs contain normalized version of referer and URL now
+   (todo: change description of log format in technical_overview.rtf)
 
 2002-06-01 (cmarschner)
  * divided Storage into LinkStorage and DocumentStorage