mirror of https://github.com/apache/lucene.git
- LARM has left the building... and has been living at larm.sf.net for a long time now.
git-svn-id: https://svn.apache.org/repos/asf/lucene/java/trunk@151018 13f79535-47bb-0310-9956-ffa450edef68
parent 22c837d3be
commit 98faec8805
@@ -10,7 +10,7 @@
 
 <fileset dir="."
 includes="*/build.xml"
-excludes="webcrawler-LARM/build.xml,taglib/build.xml"
+excludes="taglib/build.xml"
 />
 </subant>
 </sequential>
@@ -256,7 +256,7 @@ public class ChainedFilter extends Filter
 break;
 default:
 doChain(result, reader, DEFAULT, filter);
-break;
+break;
 }
 }
 }
@@ -1,2 +0,0 @@
-build
-build.properties
@@ -1,33 +0,0 @@
-$Id$
-
-2003-04-11 (cmarschner)
-* fixed build issues
-
-2002-06-18 (cmarschner)
-* added an experimental version of Lucene storage. see FetcherMain.java for details how to use it
-LuceneStorage simply saves all fields as specified in WebDocument. add a converter to the
-storage pipeline before LuceneStorage to do preprocessing
-
-2002-06-17 (cmarschner)
-* moved HostInfo and HostManager to larm.net package
-* included URLNormalizer (todo: source code Docs)
-* changed filters to use normalized URLs when appropriate;
-logs contain normalized version of referer and URL now
-(todo: change description of log format in technical_overview.rtf)
-
-2002-06-01 (cmarschner)
-* divided Storage into LinkStorage and DocumentStorage
-* introduced StoragePipeline, made MessageHandler a LinkStorage. Fetcher now stores everything in storages
-* removed a couple of unused classes
-now everything's prepared for a LuceneStorage
-* added build.xml by Mehran Mehr
-
-2002-05-23 (cmarschner)
-* removed 0x0d0d from the source files (Otis?)
-* included Apache License into all of the source files in de.lanlab.larm.* directories
-* added anchor text deparsing to the Tokenizer
-* split store.log in two files:
-- store.log contains the page file index: <referer> <URL> <ResultCode> <MimeType> <Size> <Title> <PageFileNo> <PageFileOffset>
-- links.log contains link information: <referer> <URL> <isFrame> <AnchorText>
-* changed lib to libs in the startup scripts
-* added .bat files for Windows
@@ -1 +0,0 @@
-LARM is now hosted at http://larm.sourceforge.net/
@@ -1,96 +0,0 @@
-
-Todos for 1.0 (not yet ordered in decreasing priority)
-
-$Id$
-
------------------------------------------------------------------------------------------------
-solved:
------------------------------------------------------------------------------------------------
-
-
-Bugs:
-- some relative URLs are not appended appropriately, leading to wrong and growing URLs
-- 301/302 URLs were not updated: the docs were saved under the old URL, which lead to
-wrong relative URLs (cmarschner, 2002-06-17)
-- fixed build.xml
-
-URLs:
-- include a URLNormalizer
-* lowercase host names
-* avoid ambiguities like '%20' / '+'
-* make sure http://host URLs end with "/"
-* avoid host name aliases
-- two host names / one ip adress can point to the same web site: www.lmu.de / www.uni-muenchen.de
-- two host names / one ip adress can point to different web sites (then other URLs / pages must differ)
-suche.lmu.de / interesse.lmu.de
-* cater 301/302 result codes
-STATUS: seems to be solved except that URL parameters can occur in different orders, which is NOT resolved
-host names are resolved by hand, via a synonym in HostManager. (cmarschner, 2002-06-17)
-problem: URLMessage size doubles
-
------------------------------------------------------------------------------------------------
-remaining:
------------------------------------------------------------------------------------------------
-
-* Bugs
-- on very fast LAN connections (100MBit), sockets are not freed as fast as allocated
-probably this will be solved by changing from HTTPClient.* to Jakarta HTTP client and reuse sockets
-
-
-
-* LuceneStorage
-- define a configurable interface that saves fetched pages into a Lucene index
-
-* Configuration
-- move all configuration stuff into a meaningful properties file
-
-
-* Repository
-- optionally use a database as repository (caches, queues, logs)
-- if done so, use URL reordering to speed things up
-
-* Tests
-- Put all tests into a JUnit test suite
-
-* distribution
-- optionally send messages through a JMS topic.
-- create an executable that installs a source (like JMS, page files) and a storage pipeline
-- partition the URL space for distributed Fetchers
-
-* Speed
-- avoid synchronization delays by putting several URLMessages into one FetcherTask
-
-* Services
-- clean up ThreadMonitor
-- incorporate a CRON-like service that enables timed GC'ing, batched data transfer, and
-monitoring
-
-* Politeness
-- add the option to restrict the number of host accesses per hour/minute
-
-* URL Extraction
-- URLs can be encoded in different encoding styles - see http://www.unicode.org/unicode/faq/unicode_web.html
-
-* I18N, HTML encoding
-- determine document encoding style in content-type, meta tag (http-equiv), or Doctype-tag; adapt URLs to
-encoding style
-
-* Anchor text extraction
-* read until a meaningful end tag, not just the first encountered
-* remove entities
-* optionally remove Tags, leave ALT attribute
-* remove redundant spaces
-
-* URLNormalizer
-* add possibility to add synonyms to top level domains, i.e. "d1.com = d2.com" --> "sub1.d1.com = sub1.d2.com"
-* add possibility to detect synonyms automatically, i.e. by comparing IP addresses or file checksums
-
-Nice-to-have:
-
-* Stop and Continue (probably with database repository)
-* "Hot Configure" from outside
-* Web Interface
-
-Next topic:
-* Incremental crawling
-
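Reviewer note: the removed TODO file above lists two URLNormalizer rules that are easy to picture in code, "lowercase host names" and "make sure http://host URLs end with "/"". The following is a minimal sketch of those two rules only, assuming the JDK's `java.net.URI`; the class name `UrlNormalizerSketch` and its `normalize` method are hypothetical and are not taken from the removed LARM sources. Host aliases, `%20` / `+` ambiguities, and parameter ordering are deliberately not handled.

```java
import java.net.URI;

// Hypothetical illustration; NOT the removed LARM URLNormalizer.
class UrlNormalizerSketch {

    /**
     * Lowercases scheme and host, resolves "." and ".." path segments,
     * and makes sure a URL with an empty path ends with "/".
     * The fragment is dropped; the query string is kept unchanged.
     */
    static String normalize(String url) {
        URI uri = URI.create(url).normalize();
        if (uri.getScheme() == null || uri.getHost() == null) {
            throw new IllegalArgumentException("not an absolute URL with a host: " + url);
        }
        StringBuilder sb = new StringBuilder();
        sb.append(uri.getScheme().toLowerCase());
        sb.append("://");
        sb.append(uri.getHost().toLowerCase());
        if (uri.getPort() != -1) {
            sb.append(':').append(uri.getPort());
        }
        String path = uri.getRawPath();
        // Rule from the TODO list: http://host becomes http://host/
        sb.append(path == null || path.isEmpty() ? "/" : path);
        if (uri.getRawQuery() != null) {
            sb.append('?').append(uri.getRawQuery());
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.println(normalize("HTTP://WWW.Example.COM"));
        System.out.println(normalize("http://Example.com/a/../b?x=1"));
    }
}
```

Building the result by hand from the raw components, rather than through the multi-argument `URI` constructors, avoids re-quoting percent escapes that are already present in the path or query.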
@@ -1,27 +0,0 @@
-# -------------------------------------------------------------
-# D E F A U L T   L U C E N E   B U I L D   P R O P E R T I E S
-# -------------------------------------------------------------
-#
-# DO NOT EDIT THIS FILE IN ORDER TO CUSTOMIZE BUILD PROPERTIES.
-# CREATE AND EDIT build.properties FILE INSTEAD.
-#
-name = webcrawler_LARM
-version = 0.5
-final.name = ${name}-${version}
-
-src.dir = ./src
-lib.dir = ./libs
-build.dir = ./build
-
-src.httpclient = ${lib.dir}/HTTPClient.zip
-build.classes = ${build.dir}/classes
-build.src = ${build.dir}/src
-build.encoding = ISO-8859-1
-
-
-lucene.jar = /usr/local/jakarta-lucene/lucene.jar
-oro.jar = /usr/local/jakarta-oro/oro.jar
-
-build.compiler = modern
-debug = on
-deprecation = on
Binary file not shown.
Binary file not shown.
|
@ -1,857 +0,0 @@
|
|||
{\rtf1\ansi\ansicpg1252\uc1 \deff0\deflang1031\deflangfe1031{\fonttbl{\f0\froman\fcharset0\fprq2{\*\panose 02020603050405020304}Times New Roman;}{\f1\fswiss\fcharset0\fprq2{\*\panose 020b0604020202020204}Arial;}
|
||||
{\f2\fmodern\fcharset0\fprq1{\*\panose 02070309020205020404}Courier New;}{\f3\froman\fcharset2\fprq2{\*\panose 05050102010706020507}Symbol;}{\f14\fnil\fcharset2\fprq2{\*\panose 05000000000000000000}Wingdings;}
|
||||
{\f29\fswiss\fcharset0\fprq2{\*\panose 020b0603020202020204}Trebuchet MS;}{\f31\froman\fcharset0\fprq2{\*\panose 00000000000000000000}Palatino{\*\falt Book Antiqua};}{\f165\froman\fcharset238\fprq2 Times New Roman CE;}
|
||||
{\f166\froman\fcharset204\fprq2 Times New Roman Cyr;}{\f168\froman\fcharset161\fprq2 Times New Roman Greek;}{\f169\froman\fcharset162\fprq2 Times New Roman Tur;}{\f170\froman\fcharset177\fprq2 Times New Roman (Hebrew);}
|
||||
{\f171\froman\fcharset178\fprq2 Times New Roman (Arabic);}{\f172\froman\fcharset186\fprq2 Times New Roman Baltic;}{\f173\fswiss\fcharset238\fprq2 Arial CE;}{\f174\fswiss\fcharset204\fprq2 Arial Cyr;}{\f176\fswiss\fcharset161\fprq2 Arial Greek;}
|
||||
{\f177\fswiss\fcharset162\fprq2 Arial Tur;}{\f178\fswiss\fcharset177\fprq2 Arial (Hebrew);}{\f179\fswiss\fcharset178\fprq2 Arial (Arabic);}{\f180\fswiss\fcharset186\fprq2 Arial Baltic;}{\f181\fmodern\fcharset238\fprq1 Courier New CE;}
|
||||
{\f182\fmodern\fcharset204\fprq1 Courier New Cyr;}{\f184\fmodern\fcharset161\fprq1 Courier New Greek;}{\f185\fmodern\fcharset162\fprq1 Courier New Tur;}{\f186\fmodern\fcharset177\fprq1 Courier New (Hebrew);}
|
||||
{\f187\fmodern\fcharset178\fprq1 Courier New (Arabic);}{\f188\fmodern\fcharset186\fprq1 Courier New Baltic;}{\f397\fswiss\fcharset238\fprq2 Trebuchet MS CE;}{\f401\fswiss\fcharset162\fprq2 Trebuchet MS Tur;}}{\colortbl;\red0\green0\blue0;
|
||||
\red0\green0\blue255;\red0\green255\blue255;\red0\green255\blue0;\red255\green0\blue255;\red255\green0\blue0;\red255\green255\blue0;\red255\green255\blue255;\red0\green0\blue128;\red0\green128\blue128;\red0\green128\blue0;\red128\green0\blue128;
|
||||
\red128\green0\blue0;\red128\green128\blue0;\red128\green128\blue128;\red192\green192\blue192;\red255\green255\blue255;}{\stylesheet{\ql \li0\ri0\widctlpar\aspalpha\aspnum\faauto\adjustright\rin0\lin0\itap0
|
||||
\f31\fs22\lang1031\langfe1031\cgrid\langnp1031\langfenp1031 \snext0 Normal;}{\s1\ql \fi-432\li0\ri0\sb240\sa60\keepn\widctlpar\jclisttab\tx0\aspalpha\aspnum\faauto\ls8\adjustright\rin0\lin0\itap0
|
||||
\b\fs36\lang2057\langfe1031\kerning28\cgrid\langnp2057\langfenp1031 \sbasedon0 \snext0 heading 1;}{\s2\ql \fi-578\li0\ri0\sb480\sa60\keepn\widctlpar\jclisttab\tx0\aspalpha\aspnum\faauto\ls8\ilvl1\adjustright\rin0\lin0\itap0
|
||||
\b\fs28\lang2057\langfe1031\cgrid\langnp2057\langfenp1031 \sbasedon0 \snext0 heading 2;}{\s3\ql \fi-720\li0\ri0\sb240\sa60\keepn\widctlpar\jclisttab\tx0\aspalpha\aspnum\faauto\ls8\ilvl2\adjustright\rin0\lin0\itap0
|
||||
\fs24\lang1031\langfe1031\cgrid\langnp1031\langfenp1031 \sbasedon0 \snext0 heading 3;}{\s4\ql \fi-864\li0\ri0\sb240\sa60\keepn\widctlpar\jclisttab\tx0\aspalpha\aspnum\faauto\ls8\ilvl3\adjustright\rin0\lin0\itap0
|
||||
\f29\fs24\lang1031\langfe1031\cgrid\langnp1031\langfenp1031 \sbasedon0 \snext0 heading 4;}{\s5\ql \fi-1008\li1008\ri0\sb240\sa60\widctlpar\jclisttab\tx1008\aspalpha\aspnum\faauto\ls8\ilvl4\adjustright\rin0\lin1008\itap0
|
||||
\f31\fs22\lang1031\langfe1031\cgrid\langnp1031\langfenp1031 \sbasedon0 \snext0 heading 5;}{\s6\ql \fi-1152\li1152\ri0\sb240\sa60\widctlpar\jclisttab\tx1152\aspalpha\aspnum\faauto\ls8\ilvl5\adjustright\rin0\lin1152\itap0
|
||||
\i\f31\fs22\lang1031\langfe1031\cgrid\langnp1031\langfenp1031 \sbasedon0 \snext0 heading 6;}{\s7\ql \fi-1296\li1296\ri0\sb240\sa60\widctlpar\jclisttab\tx1296\aspalpha\aspnum\faauto\ls8\ilvl6\adjustright\rin0\lin1296\itap0
|
||||
\f1\fs22\lang1031\langfe1031\cgrid\langnp1031\langfenp1031 \sbasedon0 \snext0 heading 7;}{\s8\ql \fi-1440\li1440\ri0\sb240\sa60\widctlpar\jclisttab\tx1440\aspalpha\aspnum\faauto\ls8\ilvl7\adjustright\rin0\lin1440\itap0
|
||||
\i\f1\fs22\lang1031\langfe1031\cgrid\langnp1031\langfenp1031 \sbasedon0 \snext0 heading 8;}{\s9\ql \fi-1584\li1584\ri0\sb240\sa60\widctlpar\jclisttab\tx1584\aspalpha\aspnum\faauto\ls8\ilvl8\adjustright\rin0\lin1584\itap0
|
||||
\b\i\f1\fs18\lang1031\langfe1031\cgrid\langnp1031\langfenp1031 \sbasedon0 \snext0 heading 9;}{\*\cs10 \additive Default Paragraph Font;}{\s15\ql \li0\ri0\sa60\widctlpar\tqc\tx4536\tqr\tx9072\aspalpha\aspnum\faauto\adjustright\rin0\lin0\itap0
|
||||
\f1\fs16\lang1031\langfe1031\langnp1031\langfenp1031 \sbasedon0 \snext15 header;}{\s16\ql \li0\ri0\sl-240\slmult0\widctlpar\tqc\tx4536\tqr\tx9072\aspalpha\aspnum\faauto\adjustright\rin0\lin0\itap0
|
||||
\f1\fs16\lang1031\langfe1031\cgrid\langnp1031\langfenp1031 \sbasedon0 \snext16 footer;}{\*\cs17 \additive \b \sbasedon10 Strong;}{\*\cs18 \additive \i \sbasedon10 Emphasis;}{\s19\ql \li0\ri0\sa120\widctlpar
|
||||
\tqc\tx4536\tqr\tx9072\aspalpha\aspnum\faauto\adjustright\rin0\lin0\itap0 \f1\fs16\lang1031\langfe1031\langnp1031\langfenp1031 \sbasedon15 \snext19 Fu\'dfzeile 1;}{\s20\ql \li0\ri0\sl-240\slmult0\widctlpar
|
||||
\tqc\tx4536\tqr\tx9072\aspalpha\aspnum\faauto\adjustright\rin0\lin0\itap0 \f1\fs12\lang1031\langfe1031\cgrid\langnp1031\langfenp1031 \sbasedon16 \snext20 Fu\'dfzeile 2;}{\s21\ql \li0\ri0\sa60\widctlpar
|
||||
\tqc\tx4536\tqr\tx9072\aspalpha\aspnum\faauto\adjustright\rin0\lin0\itap0 \f1\fs12\lang1031\langfe1031\langnp1031\langfenp1031 \sbasedon15 \snext21 Kopfzeile 2;}{\s22\ql \fi-357\li357\ri0\sa120\widctlpar
|
||||
\jclisttab\tx360\aspalpha\aspnum\faauto\ls7\adjustright\rin0\lin357\itap0 \f31\fs22\lang1031\langfe1031\cgrid\langnp1031\langfenp1031 \sbasedon0 \snext22 Aufz\'e4hlung 1;}{\s23\ql \fi-357\li714\ri0\widctlpar
|
||||
\jclisttab\tx720\aspalpha\aspnum\faauto\ls9\ilvl1\adjustright\rin0\lin714\itap0 \f31\fs22\lang1031\langfe1031\cgrid\langnp1031\langfenp1031 \sbasedon0 \snext23 Aufz\'e4hlung 2;}{\s24\ql \fi-357\li1077\ri0\widctlpar
|
||||
\jclisttab\tx1080\aspalpha\aspnum\faauto\ls9\ilvl2\adjustright\rin0\lin1077\itap0 \f31\fs22\lang1031\langfe1031\cgrid\langnp1031\langfenp1031 \sbasedon0 \snext24 Aufz\'e4hlung 3;}{\s25\ql \fi-360\li1440\ri0\widctlpar
|
||||
\jclisttab\tx1440\aspalpha\aspnum\faauto\ls7\ilvl3\adjustright\rin0\lin1440\itap0 \f31\fs22\lang1031\langfe1031\cgrid\langnp1031\langfenp1031 \sbasedon0 \snext25 Aufz\'e4hlung 4;}{\s26\ql \li0\ri0\sl360\slmult1\widctlpar
|
||||
\tx440\tqr\tldot\tx9062\aspalpha\aspnum\faauto\adjustright\rin0\lin0\itap0 \f31\fs22\lang1024\langfe1024\cgrid\noproof\langnp2057\langfenp1031 \sbasedon0 \snext0 \sautoupd toc 1;}{\s27\ql \li220\ri0\sl360\slmult1\widctlpar
|
||||
\tx660\tx880\tqr\tldot\tx9072\aspalpha\aspnum\faauto\adjustright\rin0\lin220\itap0 \f31\fs22\lang1024\langfe1024\cgrid\noproof\langnp2057\langfenp1031 \sbasedon0 \snext0 \sautoupd toc 2;}{\s28\ql \li440\ri0\sl360\slmult1\widctlpar
|
||||
\tx1100\tx1320\tqr\tldot\tx9072\aspalpha\aspnum\faauto\adjustright\rin0\lin440\itap0 \f31\fs22\lang1024\langfe1024\cgrid\noproof\langnp2057\langfenp1031 \sbasedon0 \snext0 \sautoupd toc 3;}{
|
||||
\s29\ql \li660\ri0\widctlpar\aspalpha\aspnum\faauto\adjustright\rin0\lin660\itap0 \f31\fs22\lang1031\langfe1031\cgrid\langnp1031\langfenp1031 \sbasedon0 \snext0 \sautoupd toc 4;}{
|
||||
\s30\ql \li880\ri0\widctlpar\aspalpha\aspnum\faauto\adjustright\rin0\lin880\itap0 \f31\fs22\lang1031\langfe1031\cgrid\langnp1031\langfenp1031 \sbasedon0 \snext0 \sautoupd toc 5;}{
|
||||
\s31\ql \li1100\ri0\widctlpar\aspalpha\aspnum\faauto\adjustright\rin0\lin1100\itap0 \f31\fs22\lang1031\langfe1031\cgrid\langnp1031\langfenp1031 \sbasedon0 \snext0 \sautoupd toc 6;}{
|
||||
\s32\ql \li1320\ri0\widctlpar\aspalpha\aspnum\faauto\adjustright\rin0\lin1320\itap0 \f31\fs22\lang1031\langfe1031\cgrid\langnp1031\langfenp1031 \sbasedon0 \snext0 \sautoupd toc 7;}{
|
||||
\s33\ql \li1540\ri0\widctlpar\aspalpha\aspnum\faauto\adjustright\rin0\lin1540\itap0 \f31\fs22\lang1031\langfe1031\cgrid\langnp1031\langfenp1031 \sbasedon0 \snext0 \sautoupd toc 8;}{
|
||||
\s34\ql \li1760\ri0\widctlpar\aspalpha\aspnum\faauto\adjustright\rin0\lin1760\itap0 \f31\fs22\lang1031\langfe1031\cgrid\langnp1031\langfenp1031 \sbasedon0 \snext0 \sautoupd toc 9;}{
|
||||
\s35\ql \li0\ri0\widctlpar\aspalpha\aspnum\faauto\adjustright\rin0\lin0\itap0 \b\fs40\lang1031\langfe1031\cgrid\langnp1031\langfenp1031 \sbasedon0 \snext35 Inhalt-\'dcberschrift;}{
|
||||
\s36\ql \fi426\li0\ri0\widctlpar\aspalpha\aspnum\faauto\adjustright\rin0\lin0\itap0 \f31\fs22\lang1031\langfe1031\cgrid\langnp1031\langfenp1031 \sbasedon0 \snext36 Body Text 2;}{
|
||||
\s37\ql \li0\ri0\widctlpar\aspalpha\aspnum\faauto\adjustright\rin0\lin0\itap0 \b\fs56\lang1031\langfe1031\cgrid\langnp1031\langfenp1031 \sbasedon0 \snext37 Dokumenten-Titel;}{\s38\ql \li0\ri0\widctlpar\aspalpha\aspnum\faauto\adjustright\rin0\lin0\itap0
|
||||
\f31\fs20\lang1031\langfe1031\cgrid\langnp1031\langfenp1031 \sbasedon0 \snext38 footnote text;}{\*\cs39 \additive \super \sbasedon10 footnote reference;}{\s40\ql \fi-360\li360\ri0\widctlpar\jclisttab\tx360{\*\pn \pnlvlbody\ilvl0\ls13\pnrnot0\pndec }
|
||||
\aspalpha\aspnum\faauto\ls13\adjustright\rin0\lin360\itap0 \f31\fs22\lang1031\langfe1031\cgrid\langnp1031\langfenp1031 \sbasedon0 \snext40 \sautoupd List Bullet;}{\*\cs41 \additive \ul\cf2 \sbasedon10 Hyperlink;}{\*\cs42 \additive \ul\cf12 \sbasedon10
|
||||
FollowedHyperlink;}}{\*\listtable{\list\listtemplateid1288180088\listsimple{\listlevel\levelnfc23\levelnfcn23\leveljc0\leveljcn0\levelfollow0\levelstartat1\levelspace0\levelindent0{\leveltext\'01\u-3913 ?;}{\levelnumbers;}\f3\chbrdr\brdrnone\brdrcf1
|
||||
\chshdng0\chcfpat1\chcbpat1\fbias0 \s40\fi-360\li360\jclisttab\tx360 }{\listname ;}\listid-119}{\list\listtemplateid2025747750\listhybrid{\listlevel\levelnfc23\levelnfcn23\leveljc0\leveljcn0\levelfollow0\levelstartat9230\levelspace0\levelindent0
|
||||
{\leveltext\leveltemplateid426543360\'01-;}{\levelnumbers;}\loch\af0\hich\af0\dbch\af0\chbrdr\brdrnone\brdrcf1 \chshdng0\chcfpat1\chcbpat1\fbias0 \fi-360\li720\jclisttab\tx720 }{\listlevel\levelnfc23\levelnfcn23\leveljc0\leveljcn0\levelfollow0
|
||||
\levelstartat1\levelspace0\levelindent0{\leveltext\leveltemplateid67567619\'01o;}{\levelnumbers;}\f2\chbrdr\brdrnone\brdrcf1 \chshdng0\chcfpat1\chcbpat1\fbias0 \fi-360\li1440\jclisttab\tx1440 }{\listlevel\levelnfc23\levelnfcn23\leveljc0\leveljcn0
|
||||
\levelfollow0\levelstartat1\levelspace0\levelindent0{\leveltext\leveltemplateid67567621\'01\u-3929 ?;}{\levelnumbers;}\f14\chbrdr\brdrnone\brdrcf1 \chshdng0\chcfpat1\chcbpat1\fbias0 \fi-360\li2160\jclisttab\tx2160 }{\listlevel\levelnfc23\levelnfcn23
|
||||
\leveljc0\leveljcn0\levelfollow0\levelstartat1\levelspace0\levelindent0{\leveltext\leveltemplateid67567617\'01\u-3913 ?;}{\levelnumbers;}\f3\chbrdr\brdrnone\brdrcf1 \chshdng0\chcfpat1\chcbpat1\fbias0 \fi-360\li2880\jclisttab\tx2880 }{\listlevel\levelnfc23
|
||||
\levelnfcn23\leveljc0\leveljcn0\levelfollow0\levelstartat1\levelspace0\levelindent0{\leveltext\leveltemplateid67567619\'01o;}{\levelnumbers;}\f2\chbrdr\brdrnone\brdrcf1 \chshdng0\chcfpat1\chcbpat1\fbias0 \fi-360\li3600\jclisttab\tx3600 }{\listlevel
|
||||
\levelnfc23\levelnfcn23\leveljc0\leveljcn0\levelfollow0\levelstartat1\levelspace0\levelindent0{\leveltext\leveltemplateid67567621\'01\u-3929 ?;}{\levelnumbers;}\f14\chbrdr\brdrnone\brdrcf1 \chshdng0\chcfpat1\chcbpat1\fbias0 \fi-360\li4320
|
||||
\jclisttab\tx4320 }{\listlevel\levelnfc23\levelnfcn23\leveljc0\leveljcn0\levelfollow0\levelstartat1\levelspace0\levelindent0{\leveltext\leveltemplateid67567617\'01\u-3913 ?;}{\levelnumbers;}\f3\chbrdr\brdrnone\brdrcf1 \chshdng0\chcfpat1\chcbpat1\fbias0
|
||||
\fi-360\li5040\jclisttab\tx5040 }{\listlevel\levelnfc23\levelnfcn23\leveljc0\leveljcn0\levelfollow0\levelstartat1\levelspace0\levelindent0{\leveltext\leveltemplateid67567619\'01o;}{\levelnumbers;}\f2\chbrdr\brdrnone\brdrcf1
|
||||
\chshdng0\chcfpat1\chcbpat1\fbias0 \fi-360\li5760\jclisttab\tx5760 }{\listlevel\levelnfc23\levelnfcn23\leveljc0\leveljcn0\levelfollow0\levelstartat1\levelspace0\levelindent0{\leveltext\leveltemplateid67567621\'01\u-3929 ?;}{\levelnumbers;}\f14\chbrdr
|
||||
\brdrnone\brdrcf1 \chshdng0\chcfpat1\chcbpat1\fbias0 \fi-360\li6480\jclisttab\tx6480 }{\listname ;}\listid77531085}{\list\listtemplateid67567617\listsimple{\listlevel\levelnfc23\levelnfcn23\leveljc0\leveljcn0\levelfollow0\levelstartat1\levelspace0
|
||||
\levelindent0{\leveltext\'01\u-3913 ?;}{\levelnumbers;}\f3\chbrdr\brdrnone\brdrcf1 \chshdng0\chcfpat1\chcbpat1\fbias0 \fi-360\li360\jclisttab\tx360 }{\listname ;}\listid130220302}{\list\listtemplateid1247607680{\listlevel\levelnfc23\levelnfcn23\leveljc0
|
||||
\leveljcn0\levelfollow0\levelstartat1\levelspace0\levelindent0{\leveltext\'01\u-3978 ?;}{\levelnumbers;}\f14\chbrdr\brdrnone\brdrcf1 \chshdng0\chcfpat1\chcbpat1\fbias0 \s22\fi-360\li360\jclisttab\tx360 }{\listlevel\levelnfc23\levelnfcn23\leveljc0
|
||||
\leveljcn0\levelfollow0\levelstartat1\levelspace0\levelindent0{\leveltext\'01\u-3880 ?;}{\levelnumbers;}\f14\chbrdr\brdrnone\brdrcf1 \chshdng0\chcfpat1\chcbpat1\fbias0 \s23\fi-360\li720\jclisttab\tx720 }{\listlevel\levelnfc23\levelnfcn23\leveljc0
|
||||
\leveljcn0\levelfollow0\levelstartat1\levelspace0\levelindent0{\leveltext\'01\u-3929 ?;}{\levelnumbers;}\f14\chbrdr\brdrnone\brdrcf1 \chshdng0\chcfpat1\chcbpat1\fbias0 \s24\fi-360\li1080\jclisttab\tx1080 }{\listlevel\levelnfc23\levelnfcn23\leveljc0
|
||||
\leveljcn0\levelfollow0\levelstartat1\levelspace0\levelindent0{\leveltext\'01\u-3913 ?;}{\levelnumbers;}\f3\chbrdr\brdrnone\brdrcf1 \chshdng0\chcfpat1\chcbpat1\fbias0 \s25\fi-360\li1440\jclisttab\tx1440 }{\listlevel\levelnfc23\levelnfcn23\leveljc0
|
||||
\leveljcn0\levelfollow0\levelstartat1\levelspace0\levelindent0{\leveltext\'01\u-3928 ?;}{\levelnumbers;}\f3\chbrdr\brdrnone\brdrcf1 \chshdng0\chcfpat1\chcbpat1\fbias0 \fi-360\li1800\jclisttab\tx1800 }{\listlevel\levelnfc23\levelnfcn23\leveljc0\leveljcn0
|
||||
\levelfollow0\levelstartat1\levelspace0\levelindent0{\leveltext\'01\u-3880 ?;}{\levelnumbers;}\f14\chbrdr\brdrnone\brdrcf1 \chshdng0\chcfpat1\chcbpat1\fbias0 \fi-360\li2160\jclisttab\tx2160 }{\listlevel\levelnfc23\levelnfcn23\leveljc0\leveljcn0
|
||||
\levelfollow0\levelstartat1\levelspace0\levelindent0{\leveltext\'01\u-3929 ?;}{\levelnumbers;}\f14\chbrdr\brdrnone\brdrcf1 \chshdng0\chcfpat1\chcbpat1\fbias0 \fi-360\li2520\jclisttab\tx2520 }{\listlevel\levelnfc23\levelnfcn23\leveljc0\leveljcn0
|
||||
\levelfollow0\levelstartat1\levelspace0\levelindent0{\leveltext\'01\u-3913 ?;}{\levelnumbers;}\f3\chbrdr\brdrnone\brdrcf1 \chshdng0\chcfpat1\chcbpat1\fbias0 \fi-360\li2880\jclisttab\tx2880 }{\listlevel\levelnfc23\levelnfcn23\leveljc0\leveljcn0
|
||||
\levelfollow0\levelstartat1\levelspace0\levelindent0{\leveltext\'01\u-3928 ?;}{\levelnumbers;}\f3\chbrdr\brdrnone\brdrcf1 \chshdng0\chcfpat1\chcbpat1\fbias0 \fi-360\li3240\jclisttab\tx3240 }{\listname ;}\listid163085644}{\list\listtemplateid1464243652
|
||||
\listsimple{\listlevel\levelnfc23\levelnfcn23\leveljc0\leveljcn0\levelfollow0\levelstartat1\levelspace0\levelindent0{\leveltext\'01\u-3928 ?;}{\levelnumbers;}\f3\chbrdr\brdrnone\brdrcf1 \chshdng0\chcfpat1\chcbpat1\fbias0 \fi-360\li360\jclisttab\tx360 }
|
||||
{\listname ;}\listid278416750}{\list\listtemplateid67567617\listsimple{\listlevel\levelnfc23\levelnfcn23\leveljc0\leveljcn0\levelfollow0\levelstartat0\levelspace0\levelindent0{\leveltext\'01\u-3913 ?;}{\levelnumbers;}\f3\chbrdr\brdrnone\brdrcf1
|
||||
\chshdng0\chcfpat1\chcbpat1\fbias0 \fi-360\li360\jclisttab\tx360 }{\listname ;}\listid450631953}{\list\listtemplateid67567617\listsimple{\listlevel\levelnfc23\levelnfcn23\leveljc0\leveljcn0\levelfollow0\levelstartat1\levelspace0\levelindent0{\leveltext
|
||||
\'01\u-3913 ?;}{\levelnumbers;}\f3\chbrdr\brdrnone\brdrcf1 \chshdng0\chcfpat1\chcbpat1\fbias0 \fi-360\li360\jclisttab\tx360 }{\listname ;}\listid907614837}{\list\listtemplateid1148328050{\listlevel\levelnfc0\levelnfcn0\leveljc0\leveljcn0\levelfollow0
|
||||
\levelstartat1\levelspace0\levelindent0{\leveltext\'01\'00;}{\levelnumbers\'01;}\chbrdr\brdrnone\brdrcf1 \chshdng0\chcfpat1\chcbpat1 \s1\fi-432\li432\jclisttab\tx432 }{\listlevel\levelnfc0\levelnfcn0\leveljc0\leveljcn0\levelfollow0\levelstartat1
|
||||
\levelspace0\levelindent0{\leveltext\'03\'00.\'01;}{\levelnumbers\'01\'03;}\chbrdr\brdrnone\brdrcf1 \chshdng0\chcfpat1\chcbpat1 \s2\fi-576\li576\jclisttab\tx576 }{\listlevel\levelnfc0\levelnfcn0\leveljc0\leveljcn0\levelfollow0\levelstartat1\levelspace0
|
||||
\levelindent0{\leveltext\'05\'00.\'01.\'02;}{\levelnumbers\'01\'03\'05;}\chbrdr\brdrnone\brdrcf1 \chshdng0\chcfpat1\chcbpat1 \s3\fi-720\li720\jclisttab\tx720 }{\listlevel\levelnfc0\levelnfcn0\leveljc0\leveljcn0\levelfollow0\levelstartat1\levelspace0
|
||||
\levelindent0{\leveltext\'07\'00.\'01.\'02.\'03;}{\levelnumbers\'01\'03\'05\'07;}\chbrdr\brdrnone\brdrcf1 \chshdng0\chcfpat1\chcbpat1 \s4\fi-864\li864\jclisttab\tx864 }{\listlevel\levelnfc0\levelnfcn0\leveljc0\leveljcn0\levelfollow0\levelstartat1
|
||||
\levelspace0\levelindent0{\leveltext\'09\'00.\'01.\'02.\'03.\'04;}{\levelnumbers\'01\'03\'05\'07\'09;}\chbrdr\brdrnone\brdrcf1 \chshdng0\chcfpat1\chcbpat1 \s5\fi-1008\li1008\jclisttab\tx1008 }{\listlevel\levelnfc0\levelnfcn0\leveljc0\leveljcn0
|
||||
\levelfollow0\levelstartat1\levelspace0\levelindent0{\leveltext\'0b\'00.\'01.\'02.\'03.\'04.\'05;}{\levelnumbers\'01\'03\'05\'07\'09\'0b;}\chbrdr\brdrnone\brdrcf1 \chshdng0\chcfpat1\chcbpat1 \s6\fi-1152\li1152\jclisttab\tx1152 }{\listlevel\levelnfc0
|
||||
\levelnfcn0\leveljc0\leveljcn0\levelfollow0\levelstartat1\levelspace0\levelindent0{\leveltext\'0d\'00.\'01.\'02.\'03.\'04.\'05.\'06;}{\levelnumbers\'01\'03\'05\'07\'09\'0b\'0d;}\chbrdr\brdrnone\brdrcf1 \chshdng0\chcfpat1\chcbpat1 \s7\fi-1296\li1296
|
||||
\jclisttab\tx1296 }{\listlevel\levelnfc0\levelnfcn0\leveljc0\leveljcn0\levelfollow0\levelstartat1\levelspace0\levelindent0{\leveltext\'0f\'00.\'01.\'02.\'03.\'04.\'05.\'06.\'07;}{\levelnumbers\'01\'03\'05\'07\'09\'0b\'0d\'0f;}\chbrdr\brdrnone\brdrcf1
|
||||
\chshdng0\chcfpat1\chcbpat1 \s8\fi-1440\li1440\jclisttab\tx1440 }{\listlevel\levelnfc0\levelnfcn0\leveljc0\leveljcn0\levelfollow0\levelstartat1\levelspace0\levelindent0{\leveltext\'11\'00.\'01.\'02.\'03.\'04.\'05.\'06.\'07.\'08;}{\levelnumbers
|
||||
\'01\'03\'05\'07\'09\'0b\'0d\'0f\'11;}\chbrdr\brdrnone\brdrcf1 \chshdng0\chcfpat1\chcbpat1 \s9\fi-1584\li1584\jclisttab\tx1584 }{\listname ;}\listid983581600}{\list\listtemplateid1464243652\listsimple{\listlevel\levelnfc23\levelnfcn23\leveljc0\leveljcn0
|
||||
\levelfollow0\levelstartat1\levelspace0\levelindent0{\leveltext\'01\u-3928 ?;}{\levelnumbers;}\f3\chbrdr\brdrnone\brdrcf1 \chshdng0\chcfpat1\chcbpat1\fbias0 \fi-360\li360\jclisttab\tx360 }{\listname ;}\listid1104501034}{\list\listtemplateid-748938384
|
||||
\listhybrid{\listlevel\levelnfc23\levelnfcn23\leveljc0\leveljcn0\levelfollow0\levelstartat0\levelspace0\levelindent0{\leveltext\leveltemplateid426543360\'01-;}{\levelnumbers;}\loch\af0\hich\af0\dbch\af0\chbrdr\brdrnone\brdrcf1
|
||||
\chshdng0\chcfpat1\chcbpat1\fbias0 \fi-360\li720\jclisttab\tx720 }{\listlevel\levelnfc23\levelnfcn23\leveljc0\leveljcn0\levelfollow0\levelstartat1\levelspace0\levelindent0{\leveltext\leveltemplateid67567619\'01o;}{\levelnumbers;}\f2\chbrdr
|
||||
\brdrnone\brdrcf1 \chshdng0\chcfpat1\chcbpat1\fbias0 \fi-360\li1440\jclisttab\tx1440 }{\listlevel\levelnfc23\levelnfcn23\leveljc0\leveljcn0\levelfollow0\levelstartat1\levelspace0\levelindent0{\leveltext\leveltemplateid67567621\'01\u-3929 ?;}{\levelnumbers
|
||||
;}\f14\chbrdr\brdrnone\brdrcf1 \chshdng0\chcfpat1\chcbpat1\fbias0 \fi-360\li2160\jclisttab\tx2160 }{\listlevel\levelnfc23\levelnfcn23\leveljc0\leveljcn0\levelfollow0\levelstartat1\levelspace0\levelindent0{\leveltext\leveltemplateid67567617
|
||||
\'01\u-3913 ?;}{\levelnumbers;}\f3\chbrdr\brdrnone\brdrcf1 \chshdng0\chcfpat1\chcbpat1\fbias0 \fi-360\li2880\jclisttab\tx2880 }{\listlevel\levelnfc23\levelnfcn23\leveljc0\leveljcn0\levelfollow0\levelstartat1\levelspace0\levelindent0{\leveltext
|
||||
\leveltemplateid67567619\'01o;}{\levelnumbers;}\f2\chbrdr\brdrnone\brdrcf1 \chshdng0\chcfpat1\chcbpat1\fbias0 \fi-360\li3600\jclisttab\tx3600 }{\listlevel\levelnfc23\levelnfcn23\leveljc0\leveljcn0\levelfollow0\levelstartat1\levelspace0\levelindent0
|
||||
{\leveltext\leveltemplateid67567621\'01\u-3929 ?;}{\levelnumbers;}\f14\chbrdr\brdrnone\brdrcf1 \chshdng0\chcfpat1\chcbpat1\fbias0 \fi-360\li4320\jclisttab\tx4320 }{\listlevel\levelnfc23\levelnfcn23\leveljc0\leveljcn0\levelfollow0\levelstartat1\levelspace0
|
||||
\levelindent0{\leveltext\leveltemplateid67567617\'01\u-3913 ?;}{\levelnumbers;}\f3\chbrdr\brdrnone\brdrcf1 \chshdng0\chcfpat1\chcbpat1\fbias0 \fi-360\li5040\jclisttab\tx5040 }{\listlevel\levelnfc23\levelnfcn23\leveljc0\leveljcn0\levelfollow0\levelstartat1
|
||||
\levelspace0\levelindent0{\leveltext\leveltemplateid67567619\'01o;}{\levelnumbers;}\f2\chbrdr\brdrnone\brdrcf1 \chshdng0\chcfpat1\chcbpat1\fbias0 \fi-360\li5760\jclisttab\tx5760 }{\listlevel\levelnfc23\levelnfcn23\leveljc0\leveljcn0\levelfollow0
|
||||
\levelstartat1\levelspace0\levelindent0{\leveltext\leveltemplateid67567621\'01\u-3929 ?;}{\levelnumbers;}\f14\chbrdr\brdrnone\brdrcf1 \chshdng0\chcfpat1\chcbpat1\fbias0 \fi-360\li6480\jclisttab\tx6480 }{\listname ;}\listid1374885547}
|
||||
{\list\listtemplateid2068468094\listhybrid{\listlevel\levelnfc0\levelnfcn0\leveljc0\leveljcn0\levelfollow0\levelstartat1\levelspace0\levelindent0{\leveltext\leveltemplateid-1418546572\'02\'00.;}{\levelnumbers\'01;}\chbrdr\brdrnone\brdrcf1
|
||||
\chshdng0\chcfpat1\chcbpat1\fbias0 \fi-360\li720\jclisttab\tx720 }{\listlevel\levelnfc4\levelnfcn4\leveljc0\leveljcn0\levelfollow0\levelstartat1\levelspace0\levelindent0{\leveltext\leveltemplateid67567641\'02\'01.;}{\levelnumbers\'01;}\chbrdr
|
||||
\brdrnone\brdrcf1 \chshdng0\chcfpat1\chcbpat1 \fi-360\li1440\jclisttab\tx1440 }{\listlevel\levelnfc2\levelnfcn2\leveljc2\leveljcn2\levelfollow0\levelstartat1\levelspace0\levelindent0{\leveltext\leveltemplateid67567643\'02\'02.;}{\levelnumbers\'01;}\chbrdr
|
||||
\brdrnone\brdrcf1 \chshdng0\chcfpat1\chcbpat1 \fi-180\li2160\jclisttab\tx2160 }{\listlevel\levelnfc0\levelnfcn0\leveljc0\leveljcn0\levelfollow0\levelstartat1\levelspace0\levelindent0{\leveltext\leveltemplateid67567631\'02\'03.;}{\levelnumbers\'01;}\chbrdr
|
||||
\brdrnone\brdrcf1 \chshdng0\chcfpat1\chcbpat1 \fi-360\li2880\jclisttab\tx2880 }{\listlevel\levelnfc4\levelnfcn4\leveljc0\leveljcn0\levelfollow0\levelstartat1\levelspace0\levelindent0{\leveltext\leveltemplateid67567641\'02\'04.;}{\levelnumbers\'01;}\chbrdr
|
||||
\brdrnone\brdrcf1 \chshdng0\chcfpat1\chcbpat1 \fi-360\li3600\jclisttab\tx3600 }{\listlevel\levelnfc2\levelnfcn2\leveljc2\leveljcn2\levelfollow0\levelstartat1\levelspace0\levelindent0{\leveltext\leveltemplateid67567643\'02\'05.;}{\levelnumbers\'01;}\chbrdr
|
||||
\brdrnone\brdrcf1 \chshdng0\chcfpat1\chcbpat1 \fi-180\li4320\jclisttab\tx4320 }{\listlevel\levelnfc0\levelnfcn0\leveljc0\leveljcn0\levelfollow0\levelstartat1\levelspace0\levelindent0{\leveltext\leveltemplateid67567631\'02\'06.;}{\levelnumbers\'01;}\chbrdr
|
||||
\brdrnone\brdrcf1 \chshdng0\chcfpat1\chcbpat1 \fi-360\li5040\jclisttab\tx5040 }{\listlevel\levelnfc4\levelnfcn4\leveljc0\leveljcn0\levelfollow0\levelstartat1\levelspace0\levelindent0{\leveltext\leveltemplateid67567641\'02\'07.;}{\levelnumbers\'01;}\chbrdr
|
||||
\brdrnone\brdrcf1 \chshdng0\chcfpat1\chcbpat1 \fi-360\li5760\jclisttab\tx5760 }{\listlevel\levelnfc2\levelnfcn2\leveljc2\leveljcn2\levelfollow0\levelstartat1\levelspace0\levelindent0{\leveltext\leveltemplateid67567643\'02\'08.;}{\levelnumbers\'01;}\chbrdr
|
||||
\brdrnone\brdrcf1 \chshdng0\chcfpat1\chcbpat1 \fi-180\li6480\jclisttab\tx6480 }{\listname ;}\listid1380398570}{\list\listtemplateid648566982\listsimple{\listlevel\levelnfc23\levelnfcn23\leveljc0\leveljcn0\levelfollow0\levelstartat0\levelspace0
|
||||
\levelindent0{\leveltext\'01-;}{\levelnumbers;}\chbrdr\brdrnone\brdrcf1 \chshdng0\chcfpat1\chcbpat1\fbias0 \fi-360\li360\jclisttab\tx360 }{\listname ;}\listid1506508248}{\list\listtemplateid67567617\listsimple{\listlevel\levelnfc23\levelnfcn23\leveljc0
|
||||
\leveljcn0\levelfollow0\levelstartat1\levelspace0\levelindent0{\leveltext\'01\u-3913 ?;}{\levelnumbers;}\f3\chbrdr\brdrnone\brdrcf1 \chshdng0\chcfpat1\chcbpat1\fbias0 \fi-360\li360\jclisttab\tx360 }{\listname ;}\listid1542666708}
|
||||
{\list\listtemplateid67567631\listsimple{\listlevel\levelnfc0\levelnfcn0\leveljc0\leveljcn0\levelfollow0\levelstartat1\levelspace0\levelindent0{\leveltext\'02\'00.;}{\levelnumbers\'01;}\chbrdr\brdrnone\brdrcf1 \chshdng0\chcfpat1\chcbpat1\fbias0
|
||||
\fi-360\li360\jclisttab\tx360 }{\listname ;}\listid2092700867}{\list\listtemplateid-1297428624\listhybrid{\listlevel\levelnfc0\levelnfcn0\leveljc0\leveljcn0\levelfollow0\levelstartat1\levelspace0\levelindent0{\leveltext\leveltemplateid1167366418
|
||||
\'02\'00.;}{\levelnumbers\'01;}\chbrdr\brdrnone\brdrcf1 \chshdng0\chcfpat1\chcbpat1\fbias0 \fi-360\li720\jclisttab\tx720 }{\listlevel\levelnfc4\levelnfcn4\leveljc0\leveljcn0\levelfollow0\levelstartat1\levelspace0\levelindent0{\leveltext
|
||||
\leveltemplateid67567641\'02\'01.;}{\levelnumbers\'01;}\chbrdr\brdrnone\brdrcf1 \chshdng0\chcfpat1\chcbpat1 \fi-360\li1440\jclisttab\tx1440 }{\listlevel\levelnfc2\levelnfcn2\leveljc2\leveljcn2\levelfollow0\levelstartat1\levelspace0\levelindent0{\leveltext
|
||||
\leveltemplateid67567643\'02\'02.;}{\levelnumbers\'01;}\chbrdr\brdrnone\brdrcf1 \chshdng0\chcfpat1\chcbpat1 \fi-180\li2160\jclisttab\tx2160 }{\listlevel\levelnfc0\levelnfcn0\leveljc0\leveljcn0\levelfollow0\levelstartat1\levelspace0\levelindent0{\leveltext
|
||||
\leveltemplateid67567631\'02\'03.;}{\levelnumbers\'01;}\chbrdr\brdrnone\brdrcf1 \chshdng0\chcfpat1\chcbpat1 \fi-360\li2880\jclisttab\tx2880 }{\listlevel\levelnfc4\levelnfcn4\leveljc0\leveljcn0\levelfollow0\levelstartat1\levelspace0\levelindent0{\leveltext
|
||||
\leveltemplateid67567641\'02\'04.;}{\levelnumbers\'01;}\chbrdr\brdrnone\brdrcf1 \chshdng0\chcfpat1\chcbpat1 \fi-360\li3600\jclisttab\tx3600 }{\listlevel\levelnfc2\levelnfcn2\leveljc2\leveljcn2\levelfollow0\levelstartat1\levelspace0\levelindent0{\leveltext
|
||||
\leveltemplateid67567643\'02\'05.;}{\levelnumbers\'01;}\chbrdr\brdrnone\brdrcf1 \chshdng0\chcfpat1\chcbpat1 \fi-180\li4320\jclisttab\tx4320 }{\listlevel\levelnfc0\levelnfcn0\leveljc0\leveljcn0\levelfollow0\levelstartat1\levelspace0\levelindent0{\leveltext
|
||||
\leveltemplateid67567631\'02\'06.;}{\levelnumbers\'01;}\chbrdr\brdrnone\brdrcf1 \chshdng0\chcfpat1\chcbpat1 \fi-360\li5040\jclisttab\tx5040 }{\listlevel\levelnfc4\levelnfcn4\leveljc0\leveljcn0\levelfollow0\levelstartat1\levelspace0\levelindent0{\leveltext
|
||||
\leveltemplateid67567641\'02\'07.;}{\levelnumbers\'01;}\chbrdr\brdrnone\brdrcf1 \chshdng0\chcfpat1\chcbpat1 \fi-360\li5760\jclisttab\tx5760 }{\listlevel\levelnfc2\levelnfcn2\leveljc2\leveljcn2\levelfollow0\levelstartat1\levelspace0\levelindent0{\leveltext
|
||||
\leveltemplateid67567643\'02\'08.;}{\levelnumbers\'01;}\chbrdr\brdrnone\brdrcf1 \chshdng0\chcfpat1\chcbpat1 \fi-180\li6480\jclisttab\tx6480 }{\listname ;}\listid2133397028}}{\*\listoverridetable{\listoverride\listid450631953\listoverridecount0\ls1}
|
||||
{\listoverride\listid1506508248\listoverridecount0\ls2}{\listoverride\listid2092700867\listoverridecount0\ls3}{\listoverride\listid278416750\listoverridecount0\ls4}{\listoverride\listid1104501034\listoverridecount0\ls5}{\listoverride\listid1542666708
|
||||
\listoverridecount0\ls6}{\listoverride\listid163085644\listoverridecount0\ls7}{\listoverride\listid983581600\listoverridecount0\ls8}{\listoverride\listid163085644\listoverridecount9{\lfolevel\listoverrideformat{\listlevel\levelnfc23\levelnfcn23\leveljc0
|
||||
\leveljcn0\levelfollow0\levelstartat1\levelspace0\levelindent0{\leveltext\'01\u-3928 ?;}{\levelnumbers;}\f3\chbrdr\brdrnone\brdrcf1 \chshdng0\chcfpat1\chcbpat1\fbias0 \fi-360\li360\jclisttab\tx360 }}{\lfolevel\listoverrideformat{\listlevel\levelnfc23
|
||||
\levelnfcn23\leveljc0\leveljcn0\levelfollow0\levelstartat1\levelspace0\levelindent0{\leveltext\'01\u-3979 ?;}{\levelnumbers;}\f14\fs12\chbrdr\brdrnone\brdrcf1 \chshdng0\chcfpat1\chcbpat1\fbias0 \fi-360\li720\jclisttab\tx720 }}{\lfolevel\listoverrideformat
|
||||
{\listlevel\levelnfc23\levelnfcn23\leveljc0\leveljcn0\levelfollow0\levelstartat1\levelspace0\levelindent0{\leveltext\'01\u-3929 ?;}{\levelnumbers;}\f14\chbrdr\brdrnone\brdrcf1 \chshdng0\chcfpat1\chcbpat1\fbias0 \fi-360\li1080\jclisttab\tx1080 }}{\lfolevel
|
||||
\listoverrideformat{\listlevel\levelnfc23\levelnfcn23\leveljc0\leveljcn0\levelfollow0\levelstartat1\levelspace0\levelindent0{\leveltext\'01\u-3913 ?;}{\levelnumbers;}\f3\chbrdr\brdrnone\brdrcf1 \chshdng0\chcfpat1\chcbpat1\fbias0 \fi-360\li1440
|
||||
\jclisttab\tx1440 }}{\lfolevel\listoverrideformat{\listlevel\levelnfc23\levelnfcn23\leveljc0\leveljcn0\levelfollow0\levelstartat1\levelspace0\levelindent0{\leveltext\'01\u-3928 ?;}{\levelnumbers;}\f3\chbrdr\brdrnone\brdrcf1
|
||||
\chshdng0\chcfpat1\chcbpat1\fbias0 \fi-360\li1800\jclisttab\tx1800 }}{\lfolevel\listoverrideformat{\listlevel\levelnfc23\levelnfcn23\leveljc0\leveljcn0\levelfollow0\levelstartat1\levelspace0\levelindent0{\leveltext\'01\u-3880 ?;}{\levelnumbers;}\f14
|
||||
\chbrdr\brdrnone\brdrcf1 \chshdng0\chcfpat1\chcbpat1\fbias0 \fi-360\li2160\jclisttab\tx2160 }}{\lfolevel\listoverrideformat{\listlevel\levelnfc23\levelnfcn23\leveljc0\leveljcn0\levelfollow0\levelstartat1\levelspace0\levelindent0{\leveltext
|
||||
\'01\u-3929 ?;}{\levelnumbers;}\f14\chbrdr\brdrnone\brdrcf1 \chshdng0\chcfpat1\chcbpat1\fbias0 \fi-360\li2520\jclisttab\tx2520 }}{\lfolevel\listoverrideformat{\listlevel\levelnfc23\levelnfcn23\leveljc0\leveljcn0\levelfollow0\levelstartat1\levelspace0
|
||||
\levelindent0{\leveltext\'01\u-3913 ?;}{\levelnumbers;}\f3\chbrdr\brdrnone\brdrcf1 \chshdng0\chcfpat1\chcbpat1\fbias0 \fi-360\li2880\jclisttab\tx2880 }}{\lfolevel\listoverrideformat{\listlevel\levelnfc23\levelnfcn23\leveljc0\leveljcn0\levelfollow0
|
||||
\levelstartat1\levelspace0\levelindent0{\leveltext\'01\u-3928 ?;}{\levelnumbers;}\f3\chbrdr\brdrnone\brdrcf1 \chshdng0\chcfpat1\chcbpat1\fbias0 \fi-360\li3240\jclisttab\tx3240 }}\ls9}{\listoverride\listid130220302\listoverridecount0\ls10}
|
||||
{\listoverride\listid907614837\listoverridecount0\ls11}{\listoverride\listid77531085\listoverridecount0\ls12}{\listoverride\listid-119\listoverridecount0\ls13}{\listoverride\listid1374885547\listoverridecount0\ls14}{\listoverride\listid2133397028
|
||||
\listoverridecount0\ls15}{\listoverride\listid1380398570\listoverridecount0\ls16}}{\info{\title Meine \'dcberschrift}{\author Clemens Marschner}{\operator Clemens Marschner}{\creatim\yr2002\mo6\dy30\hr17\min10}{\revtim\yr2002\mo6\dy30\hr17\min10}
|
||||
{\printim\yr2002\mo5\dy6\hr19\min52}{\version2}{\edmins0}{\nofpages16}{\nofwords5234}{\nofchars24603}{\*\company Ludwig-Maximilians-Universit\'e4t}{\nofcharsws36642}{\vern8249}}\paperw11906\paperh16838\margl1701\margr1133\margt1702\margb1985
|
||||
\deftab708\widowctrl\ftnbj\aenddoc\hyphhotz425\noxlattoyen\expshrtn\noultrlspc\dntblnsbdb\nospaceforul\formshade\horzdoc\dghspace180\dgvspace180\dghorigin1701\dgvorigin1984\dghshow0\dgvshow0
|
||||
\jexpand\viewkind1\viewscale100\pgbrdrhead\pgbrdrfoot\nolnhtadjtbl \fet0{\*\template C:\\vorlagen\\Standard-Vorlage.dot}\sectd \psz9\linex0\footery866\endnhere\sectdefaultcl {\header \pard\plain \s21\ql \li0\ri0\sa60\widctlpar
|
||||
\tx0\tqr\tx9072\aspalpha\aspnum\faauto\adjustright\rin0\lin0\itap0 \f1\fs12\lang1031\langfe1031\langnp1031\langfenp1031 {\fs16\lang2057\langfe1031\langnp2057 The Fetcher Web Crawler \endash Technical Overview \endash Version 0.5}{
|
||||
\lang2057\langfe1031\langnp2057
|
||||
\par }}{\footer \pard\plain \s16\ql \li0\ri0\sl-240\slmult0\widctlpar\brdrt\brdrs\brdrw10\brsp20 \tqc\tx4536\tqr\tx9072\aspalpha\aspnum\faauto\adjustright\rin0\lin0\itap0 \f1\fs16\lang1031\langfe1031\cgrid\langnp1031\langfenp1031 {Version: }{\field{\*\fldinst
|
||||
{ REVNUM \\* MERGEFORMAT }}{\fldrslt {\lang1024\langfe1024\noproof 13}}}{\pard\plain \s16\ql \li0\ri0\sl-240\slmult0\widctlpar\brdrt\brdrs\brdrw10\brsp20 \tqc\tx4536\tqr\tx9072\aspalpha\aspnum\faauto\adjustright\rin0\lin0\itap0
|
||||
\v\f1\fs16\lang1031\langfe1031\cgrid\langnp1031\langfenp1031 {\tc {\tcf92l\tcn }}}{\tab \tab }{\field{\*\fldinst {\cgrid0 PAGE }}{\fldrslt {\lang1024\langfe1024\cgrid0\noproof 1}}}{\cgrid0 / }{\field{\*\fldinst {\cgrid0 NUMPAGES }}{\fldrslt {
|
||||
\lang1024\langfe1024\cgrid0\noproof 16}}}{
|
||||
\par }\pard \s16\ql \li0\ri0\sl-240\slmult0\widctlpar\tqc\tx4536\tqr\tx9072\aspalpha\aspnum\faauto\adjustright\rin0\lin0\itap0 {\tab }{\fs24
|
||||
\par }}{\*\pnseclvl1\pnucrm\pnstart1\pnindent720\pnhang{\pntxta .}}{\*\pnseclvl2\pnucltr\pnstart1\pnindent720\pnhang{\pntxta .}}{\*\pnseclvl3\pndec\pnstart1\pnindent720\pnhang{\pntxta .}}{\*\pnseclvl4\pnlcltr\pnstart1\pnindent720\pnhang{\pntxta )}}
|
||||
{\*\pnseclvl5\pndec\pnstart1\pnindent720\pnhang{\pntxtb (}{\pntxta )}}{\*\pnseclvl6\pnlcltr\pnstart1\pnindent720\pnhang{\pntxtb (}{\pntxta )}}{\*\pnseclvl7\pnlcrm\pnstart1\pnindent720\pnhang{\pntxtb (}{\pntxta )}}{\*\pnseclvl8
|
||||
\pnlcltr\pnstart1\pnindent720\pnhang{\pntxtb (}{\pntxta )}}{\*\pnseclvl9\pnlcrm\pnstart1\pnindent720\pnhang{\pntxtb (}{\pntxta )}}\pard\plain \s26\ql \li0\ri0\sl360\slmult1\widctlpar
|
||||
\tx440\tqr\tldot\tx9062\aspalpha\aspnum\faauto\adjustright\rin0\lin0\itap0 \f31\fs22\lang1024\langfe1024\cgrid\noproof\langnp2057\langfenp1031 {
|
||||
\par
|
||||
\par
|
||||
\par }\pard\plain \ql \li0\ri0\widctlpar\aspalpha\aspnum\faauto\adjustright\rin0\lin0\itap0 \f31\fs22\lang1031\langfe1031\cgrid\langnp1031\langfenp1031 {
|
||||
\par
|
||||
\par
|
||||
\par
|
||||
\par Apache Jakarta Lucene
|
||||
\par
|
||||
\par }\pard\plain \s37\ql \li0\ri0\widctlpar\aspalpha\aspnum\faauto\adjustright\rin0\lin0\itap0 \b\fs56\lang1031\langfe1031\cgrid\langnp1031\langfenp1031 {\lang2057\langfe1031\langnp2057 The Fetcher Web Crawler
|
||||
\par }\pard\plain \ql \li0\ri0\widctlpar\aspalpha\aspnum\faauto\adjustright\rin0\lin0\itap0 \f31\fs22\lang1031\langfe1031\cgrid\langnp1031\langfenp1031 {\lang2057\langfe1031\langnp2057 Technical Overview
|
||||
\par
|
||||
\par Version 0.5
|
||||
\par
|
||||
\par
|
||||
\par
|
||||
\par Author:
|
||||
\par Clemens Marschner\tab \tab LMU \endash University of Munich, Germany
|
||||
\par }\pard \ql \fi708\li2124\ri0\widctlpar\aspalpha\aspnum\faauto\adjustright\rin0\lin2124\itap0 {\lang2057\langfe1031\langnp2057 Clemens.Marschner at campus.lmu.de
|
||||
\par }\pard \ql \li0\ri0\widctlpar\aspalpha\aspnum\faauto\adjustright\rin0\lin0\itap0 {\lang2057\langfe1031\langnp2057
|
||||
\par
|
||||
\par
|
||||
\par
|
||||
\par
|
||||
\par }\pard\plain \s35\ql \li0\ri0\widctlpar\aspalpha\aspnum\faauto\adjustright\rin0\lin0\itap0 \b\fs40\lang1031\langfe1031\cgrid\langnp1031\langfenp1031 {\lang2057\langfe1031\langnp2057 \page Table Of Contents
|
||||
\par }\pard\plain \ql \li0\ri0\widctlpar\aspalpha\aspnum\faauto\adjustright\rin0\lin0\itap0 \f31\fs22\lang1031\langfe1031\cgrid\langnp1031\langfenp1031 {\lang2057\langfe1031\langnp2057
|
||||
\par }\pard\plain \s26\ql \li0\ri0\sl360\slmult1\widctlpar\tx440\tqr\tldot\tx9062\aspalpha\aspnum\faauto\adjustright\rin0\lin0\itap0 \f31\fs22\lang1024\langfe1024\cgrid\noproof\langnp2057\langfenp1031 {\field\fldedit{\*\fldinst { TOC \\o "1-3" }}{\fldrslt {1}{
|
||||
\f0\fs24 \tab }{Overview\tab }{\field{\*\fldinst { PAGEREF _Toc8477592 \\h }{{\*\datafield 08d0c9ea79f9bace118c8200aa004ba90b02000000080000000c0000005f0054006f00630038003400370037003500390032000000}}}{\fldrslt {3}}}{\f0\fs24
|
||||
\par }\pard\plain \s27\ql \li220\ri0\sl360\slmult1\widctlpar\tx660\tx880\tqr\tldot\tx9072\aspalpha\aspnum\faauto\adjustright\rin0\lin220\itap0 \f31\fs22\lang1024\langfe1024\cgrid\noproof\langnp2057\langfenp1031 {1.1}{\f0\fs24 \tab }{
|
||||
Purpose and Intended Audience}{\tab }{\field{\*\fldinst { PAGEREF _Toc8477593 \\h }{{\*\datafield 08d0c9ea79f9bace118c8200aa004ba90b02000000080000000c0000005f0054006f00630038003400370037003500390033000000}}}{\fldrslt {3}}}{\f0\fs24
|
||||
\par }{1.2}{\f0\fs24 \tab }{Why do we need web crawlers?}{\tab }{\field{\*\fldinst { PAGEREF _Toc8477594 \\h }{{\*\datafield 08d0c9ea79f9bace118c8200aa004ba90b02000000080000000c0000005f0054006f00630038003400370037003500390034000000}}}{\fldrslt {3}}}{\f0\fs24
|
||||
|
||||
\par }{1.3}{\f0\fs24 \tab }{Implementation \endash the first attempt}{\tab }{\field{\*\fldinst { PAGEREF _Toc8477595 \\h }{{\*\datafield 08d0c9ea79f9bace118c8200aa004ba90b02000000080000000c0000005f0054006f00630038003400370037003500390035000000}}}{\fldrslt {4}
|
||||
}}{\f0\fs24
|
||||
\par }{1.4}{\f0\fs24 \tab }{Features of the Fetcher crawler}{\tab }{\field{\*\fldinst { PAGEREF _Toc8477596 \\h }{{\*\datafield 08d0c9ea79f9bace118c8200aa004ba90b02000000080000000c0000005f0054006f00630038003400370037003500390036000000}}}{\fldrslt {4}}}{
|
||||
\f0\fs24
|
||||
\par }{1.5}{\f0\fs24 \tab }{What the crawler can do for you, and what it cannot (yet)}{\tab }{\field{\*\fldinst { PAGEREF _Toc8477597 \\h }{{\*\datafield 08d0c9ea79f9bace118c8200aa004ba90b02000000080000000c0000005f0054006f00630038003400370037003500390037000000
|
||||
}}}{\fldrslt {5}}}{\f0\fs24
|
||||
\par }{1.6}{\f0\fs24 \tab }{Syntax and runtime behaviour}{\tab }{\field{\*\fldinst { PAGEREF _Toc8477598 \\h }{{\*\datafield 08d0c9ea79f9bace118c8200aa004ba90b02000000080000000c0000005f0054006f00630038003400370037003500390038000000}}}{\fldrslt {6}}}{\f0\fs24
|
||||
|
||||
\par }\pard\plain \s26\ql \li0\ri0\sl360\slmult1\widctlpar\tx440\tqr\tldot\tx9062\aspalpha\aspnum\faauto\adjustright\rin0\lin0\itap0 \f31\fs22\lang1024\langfe1024\cgrid\noproof\langnp2057\langfenp1031 {2}{\f0\fs24 \tab }{Architecture\tab }{\field{\*\fldinst {
|
||||
PAGEREF _Toc8477599 \\h }{{\*\datafield 08d0c9ea79f9bace118c8200aa004ba90b02000000080000000c0000005f0054006f00630038003400370037003500390039000000}}}{\fldrslt {7}}}{\f0\fs24
|
||||
\par }\pard\plain \s27\ql \li220\ri0\sl360\slmult1\widctlpar\tx660\tx880\tqr\tldot\tx9072\aspalpha\aspnum\faauto\adjustright\rin0\lin220\itap0 \f31\fs22\lang1024\langfe1024\cgrid\noproof\langnp2057\langfenp1031 {2.1}{\f0\fs24 \tab }{Performance}{\tab }
|
||||
{\field{\*\fldinst { PAGEREF _Toc8477600 \\h }{{\*\datafield 08d0c9ea79f9bace118c8200aa004ba90b02000000080000000c0000005f0054006f00630038003400370037003600300030000000}}}{\fldrslt {8}}}{\f0\fs24
|
||||
\par }{2.2}{\f0\fs24 \tab }{Memory Usage}{\tab }{\field{\*\fldinst { PAGEREF _Toc8477601 \\h }{{\*\datafield 08d0c9ea79f9bace118c8200aa004ba90b02000000080000000c0000005f0054006f00630038003400370037003600300031000000}}}{\fldrslt {10}}}{\f0\fs24
|
||||
\par }{2.3}{\f0\fs24 \tab }{The Filters}{\tab }{\field{\*\fldinst { PAGEREF _Toc8477602 \\h }{{\*\datafield 08d0c9ea79f9bace118c8200aa004ba90b02000000080000000c0000005f0054006f00630038003400370037003600300032000000}}}{\fldrslt {12}}}{\f0\fs24
|
||||
\par }\pard\plain \s28\ql \li440\ri0\sl360\slmult1\widctlpar\tx1100\tx1320\tqr\tldot\tx9072\aspalpha\aspnum\faauto\adjustright\rin0\lin440\itap0 \f31\fs22\lang1024\langfe1024\cgrid\noproof\langnp2057\langfenp1031 {2.3.1}{\f0\fs24 \tab }{RobotExclusionFilter}{
|
||||
\tab }{\field{\*\fldinst { PAGEREF _Toc8477603 \\h }{{\*\datafield 08d0c9ea79f9bace118c8200aa004ba90b02000000080000000c0000005f0054006f00630038003400370037003600300033000000}}}{\fldrslt {12}}}{\f0\fs24
|
||||
\par }{2.3.2}{\f0\fs24 \tab }{URLLengthFilter}{\tab }{\field{\*\fldinst { PAGEREF _Toc8477604 \\h }{{\*\datafield 08d0c9ea79f9bace118c8200aa004ba90b02000000080000000c0000005f0054006f00630038003400370037003600300034000000}}}{\fldrslt {12}}}{\f0\fs24
|
||||
\par }{2.3.3}{\f0\fs24 \tab }{KnownPathsFilter}{\tab }{\field{\*\fldinst { PAGEREF _Toc8477605 \\h }{{\*\datafield 08d0c9ea79f9bace118c8200aa004ba90b02000000080000000c0000005f0054006f00630038003400370037003600300035000000}}}{\fldrslt {12}}}{\f0\fs24
|
||||
\par }{2.3.4}{\f0\fs24 \tab }{URLScopeFilter}{\tab }{\field{\*\fldinst { PAGEREF _Toc8477606 \\h }{{\*\datafield 08d0c9ea79f9bace118c8200aa004ba90b02000000080000000c0000005f0054006f00630038003400370037003600300036000000}}}{\fldrslt {12}}}{\f0\fs24
|
||||
\par }{2.3.5}{\f0\fs24 \tab }{URLVisitedFilter}{\tab }{\field{\*\fldinst { PAGEREF _Toc8477607 \\h }{{\*\datafield 08d0c9ea79f9bace118c8200aa004ba90b02000000080000000c0000005f0054006f00630038003400370037003600300037000000}}}{\fldrslt {12}}}{\f0\fs24
|
||||
\par }{2.3.6}{\f0\fs24 \tab }{Fetcher}{\tab }{\field{\*\fldinst { PAGEREF _Toc8477608 \\h }{{\*\datafield 08d0c9ea79f9bace118c8200aa004ba90b02000000080000000c0000005f0054006f00630038003400370037003600300038000000}}}{\fldrslt {12}}}{\f0\fs24
|
||||
\par }{2.3.7}{\f0\fs24 \tab }{A Note on DNS}{\tab }{\field{\*\fldinst { PAGEREF _Toc8477609 \\h }{{\*\datafield 08d0c9ea79f9bace118c8200aa004ba90b02000000080000000c0000005f0054006f00630038003400370037003600300039000000}}}{\fldrslt {13}}}{\f0\fs24
|
||||
\par }\pard\plain \s26\ql \li0\ri0\sl360\slmult1\widctlpar\tx440\tqr\tldot\tx9062\aspalpha\aspnum\faauto\adjustright\rin0\lin0\itap0 \f31\fs22\lang1024\langfe1024\cgrid\noproof\langnp2057\langfenp1031 {3}{\f0\fs24 \tab }{Future Enhancements\tab }
|
||||
{\field{\*\fldinst { PAGEREF _Toc8477610 \\h }{{\*\datafield 08d0c9ea79f9bace118c8200aa004ba90b02000000080000000c0000005f0054006f00630038003400370037003600310030000000}}}{\fldrslt {14}}}{\f0\fs24
|
||||
\par }\pard\plain \s27\ql \li220\ri0\sl360\slmult1\widctlpar\tx660\tx880\tqr\tldot\tx9072\aspalpha\aspnum\faauto\adjustright\rin0\lin220\itap0 \f31\fs22\lang1024\langfe1024\cgrid\noproof\langnp2057\langfenp1031 {3.1}{\f0\fs24 \tab }{\'93Politeness\'94}{\tab }
|
||||
{\field{\*\fldinst { PAGEREF _Toc8477611 \\h }{{\*\datafield 08d0c9ea79f9bace118c8200aa004ba90b02000000080000000c0000005f0054006f00630038003400370037003600310031000000}}}{\fldrslt {14}}}{\f0\fs24
|
||||
\par }{3.2}{\f0\fs24 \tab }{The processing pipeline}{\tab }{\field{\*\fldinst { PAGEREF _Toc8477612 \\h }{{\*\datafield 08d0c9ea79f9bace118c8200aa004ba90b02000000080000000c0000005f0054006f00630038003400370037003600310032000000}}}{\fldrslt {14}}}{\f0\fs24
|
||||
|
||||
\par }{3.3}{\f0\fs24 \tab }{Lucene integration}{\tab }{\field{\*\fldinst { PAGEREF _Toc8477613 \\h }{{\*\datafield 08d0c9ea79f9bace118c8200aa004ba90b02000000080000000c0000005f0054006f00630038003400370037003600310033000000}}}{\fldrslt {14}}}{\f0\fs24
|
||||
\par }{3.4}{\f0\fs24 \tab }{A Real Server}{\tab }{\field{\*\fldinst { PAGEREF _Toc8477614 \\h }{{\*\datafield 08d0c9ea79f9bace118c8200aa004ba90b02000000080000000c0000005f0054006f00630038003400370037003600310034000000}}}{\fldrslt {14}}}{\f0\fs24
|
||||
\par }{3.5}{\f0\fs24 \tab }{Distribution}{\tab }{\field{\*\fldinst { PAGEREF _Toc8477615 \\h }{{\*\datafield 08d0c9ea79f9bace118c8200aa004ba90b02000000080000000c0000005f0054006f00630038003400370037003600310035000000}}}{\fldrslt {14}}}{\f0\fs24
|
||||
\par }{3.6}{\f0\fs24 \tab }{URL Reordering}{\tab }{\field{\*\fldinst { PAGEREF _Toc8477616 \\h }{{\*\datafield 08d0c9ea79f9bace118c8200aa004ba90b02000000080000000c0000005f0054006f00630038003400370037003600310036000000}}}{\fldrslt {15}}}{\f0\fs24
|
||||
\par }{3.7}{\f0\fs24\lang1024\langfe1024\langnp1031 \tab }{Recovery}{\tab }{\field{\*\fldinst { PAGEREF _Toc8477617 \\h }{{\*\datafield 08d0c9ea79f9bace118c8200aa004ba90b02000000080000000c0000005f0054006f00630038003400370037003600310037000000}}}{\fldrslt {15}
|
||||
}}{\f0\fs24\lang1024\langfe1024\langnp1031
|
||||
\par }\pard\plain \s26\ql \li0\ri0\sl360\slmult1\widctlpar\tx440\tqr\tldot\tx9062\aspalpha\aspnum\faauto\adjustright\rin0\lin0\itap0 \f31\fs22\lang1024\langfe1024\cgrid\noproof\langnp2057\langfenp1031 }}\pard\plain \s26\ql \li0\ri0\sl360\slmult1\widctlpar
|
||||
\tx440\tqr\tldot\tx9062\aspalpha\aspnum\faauto\adjustright\rin0\lin0\itap0 \f31\fs22\lang1024\langfe1024\cgrid\noproof\langnp2057\langfenp1031 {
|
||||
\par {\*\bkmkstart _Toc8477592}{\listtext\pard\plain\s1 \b\fs36\lang2057\langfe1031\kerning28\langnp2057 \hich\af0\dbch\af0\loch\f0 1\tab}}\pard\plain \s1\ql \fi-432\li0\ri0\sb240\sa60\keepn\widctlpar
|
||||
\jclisttab\tx0\aspalpha\aspnum\faauto\ls8\outlinelevel0\adjustright\rin0\lin0\itap0 \b\fs36\lang2057\langfe1031\kerning28\cgrid\langnp2057\langfenp1031 {Overview{\*\bkmkend _Toc8477592}
|
||||
\par {\*\bkmkstart _Toc8477593}{\listtext\pard\plain\s2 \b\fs28\lang2057\langfe1031\langnp2057 \hich\af0\dbch\af0\loch\f0 1.1\tab}}\pard\plain \s2\ql \fi-578\li0\ri0\sb480\sa60\keepn\widctlpar
|
||||
\jclisttab\tx0\aspalpha\aspnum\faauto\ls8\ilvl1\outlinelevel1\adjustright\rin0\lin0\itap0 \b\fs28\lang2057\langfe1031\cgrid\langnp2057\langfenp1031 {Purpose and Intended Audience{\*\bkmkend _Toc8477593}
|
||||
\par }\pard\plain \ql \li0\ri0\widctlpar\aspalpha\aspnum\faauto\adjustright\rin0\lin0\itap0 \f31\fs22\lang1031\langfe1031\cgrid\langnp1031\langfenp1031 {\lang2057\langfe1031\langnp2057
|
||||
This document is intended to help Lucene developers, who do not necessarily have any background knowledge of crawlers, understand the inner workings of the Fetcher crawler, the current problems and some directions for future development. The aim is to keep the entry costs low for people who have an interest in developing this piece of software further.
|
||||
\par {\*\bkmkstart _Toc8477594}{\listtext\pard\plain\s2 \b\fs28\lang2057\langfe1031\langnp2057 \hich\af0\dbch\af0\loch\f0 1.2\tab}}\pard\plain \s2\ql \fi-578\li0\ri0\sb480\sa60\keepn\widctlpar
|
||||
\jclisttab\tx0\aspalpha\aspnum\faauto\ls8\ilvl1\outlinelevel1\adjustright\rin0\lin0\itap0 \b\fs28\lang2057\langfe1031\cgrid\langnp2057\langfenp1031 {Why do we need web crawlers?{\*\bkmkend _Toc8477594}
|
||||
\par }\pard\plain \ql \li0\ri0\widctlpar\aspalpha\aspnum\faauto\adjustright\rin0\lin0\itap0 \f31\fs22\lang1031\langfe1031\cgrid\langnp1031\langfenp1031 {\lang2057\langfe1031\langnp2057
|
||||
The answer is: because the web is not perfect. Crawlers became necessary because the standard web protocols didn\rquote t contain any mechanism to inform search engines that the data on a web server had changed. If this were possible, a search engine c
|
||||
ould be notified in a \'93push\'94 fashion, which would simplify the whole process and make indexes as current as possible.
|
||||
\par Imagine a web server that notifies another web server that a link was created from one of its pages to the other server. That other server could then send a message back if the page was removed.}{\cs39\lang2057\langfe1031\super\langnp2057 \chftn
|
||||
{\footnote \pard\plain \s38\ql \li0\ri0\widctlpar\aspalpha\aspnum\faauto\adjustright\rin0\lin0\itap0 \f31\fs20\lang1031\langfe1031\cgrid\langnp1031\langfenp1031 {\cs39\super \chftn }{\lang2057\langfe1031\langnp2057
|
||||
I know that there is research on that matter. }}}{\lang2057\langfe1031\langnp2057
|
||||
\par On the other hand, this system would be a lot more complicated to handle. Keeping distributed information up to date is an error-prone task. Even in a single relational database it is often co
|
||||
mplicated to define and handle dependencies between relations. Should it be possible to allow inconsistencies for a short period of time? Should dependent data be deleted if a record is removed? Handling relationships between clusters of information well introduces a new level of complexity.
|
||||
\par In order to keep the software (web servers and browsers) simple, the inventors of the web concentrated on just a few core elements \endash URLs for (more or less) uniquely identifying distributed information, HTTP for handl
|
||||
ing the information, and HTML for structuring it. That system was so simple that one could understand it in a very short time. This is probably one of the main reasons why the WWW became so popular. Well, another one would probably be coloured, moving gra
|
||||
phics of naked people.
|
||||
\par But the WWW has some major disadvantages: There is no single index of all available pages. Information can change without notice. URLs can point to pages that no longer exist. There is no mechanism to get \'93all\'94 pages from a web server
|
||||
. The whole system is in a constant process of change. And after all, the whole thing is growing at phenomenal rates. Building a search engine on top of that is not something you can do on a Saturday afternoon. Given the sheer size, it would take months t
|
||||
o search through all the pages in order to answer a single query, even if we had a means to get from server to server, get the pages from there, and search them. But we don\rquote t even know how to do }{\i\lang2057\langfe1031\langnp2057 that}{
|
||||
\lang2057\langfe1031\langnp2057 , since we don\rquote t know all the web servers.
|
||||
\par That first problem was addressed by bookmark collections, which soon became very popular. The most popular was probably Yahoo, which evolved into one of the most popular pages on the web just a year after it emerged from a college dorm room.
|
||||
\par The second problem was how to get the information from all those pages lying around. This is where a web crawler comes in.
|
||||
\par Ok, those engineers said, we are not able to get a list of all the pages. But almost every page contains links to other pages. We can save a page, extract all the
|
||||
links, and load all of the pages these links point to. If we start at a popular location which contains a lot of links, like Yahoo for example, chances are that we can get \'93all\'94 pages on the web.
|
||||
\par More formally, the web can be seen as a directed graph, with pages as nodes and links as edges between them. A web crawler, also called \'93spider\'94 or \'93fetcher\'94
|
||||
, uses the graph structure of the web to get documents in order to be able to index them. Since there is no \'93push\'94 mechanism for updating our index, we need to \'93pull\'94 the information on our own, by repeatedly crawling the web.
|
||||
\par {\*\bkmkstart _Toc8477595}{\listtext\pard\plain\s2 \b\fs28\lang2057\langfe1031\langnp2057 \hich\af0\dbch\af0\loch\f0 1.3\tab}}\pard\plain \s2\ql \fi-578\li0\ri0\sb480\sa60\keepn\widctlpar
|
||||
\jclisttab\tx0\aspalpha\aspnum\faauto\ls8\ilvl1\outlinelevel1\adjustright\rin0\lin0\itap0 \b\fs28\lang2057\langfe1031\cgrid\langnp2057\langfenp1031 {Implementation \endash the first attempt{\*\bkmkend _Toc8477595}
|
||||
\par }\pard\plain \ql \li0\ri0\widctlpar\aspalpha\aspnum\faauto\adjustright\rin0\lin0\itap0 \f31\fs22\lang1031\langfe1031\cgrid\langnp1031\langfenp1031 {\lang2057\langfe1031\langnp2057 \'93Easy\'94, you may think now, \'93
|
||||
just implement what he said in the paragraph before.\'94 So you start getting a page, extracting the links, following all the pages you have not already visited\'85 In Perl that can be done in a few lines of code.
|
||||
\par But then, very soon (I can tell you), you run into a lot of problems:
|
||||
\par {\listtext\pard\plain\fs22\lang2057\langfe1031\langnp2057 \hich\af0\dbch\af0\loch\f0 -\tab}}\pard \ql \fi-360\li720\ri0\widctlpar\jclisttab\tx720\aspalpha\aspnum\faauto\ls14\adjustright\rin0\lin720\itap0 {\lang2057\langfe1031\langnp2057 a server doesn
|
||||
\rquote t respond. Your program always waits for it to time out
|
||||
\par {\listtext\pard\plain\fs22\lang2057\langfe1031\langnp2057 \hich\af0\dbch\af0\loch\f0 -\tab}you get OutOfMemory errors soon after the beginning
|
||||
\par {\listtext\pard\plain\fs22\lang2057\langfe1031\langnp2057 \hich\af0\dbch\af0\loch\f0 -\tab}your hard drive fills up
|
||||
\par {\listtext\pard\plain\fs22\lang2057\langfe1031\langnp2057 \hich\af0\dbch\af0\loch\f0 -\tab}You notice that one page is loaded again and again, because its URL changed a little
|
||||
\par {\listtext\pard\plain\fs22\lang2057\langfe1031\langnp2057 \hich\af0\dbch\af0\loch\f0 -\tab}Some servers will behave very strangely. They will respond after 30 seconds, sometimes they time out, sometimes they are not accessible at all
|
||||
\par {\listtext\pard\plain\fs22\lang2057\langfe1031\langnp2057 \hich\af0\dbch\af0\loch\f0 -\tab}some URLs will get longer and longer. Suddenly you will get URLs with a length of thousands of characters.
|
||||
\par {\listtext\pard\plain\fs22\lang2057\langfe1031\langnp2057 \hich\af0\dbch\af0\loch\f0 -\tab}But the main problem will be: you notice that your network interface card (NIC) is waiting, and your CPU is waiting. What\rquote
|
||||
s going on? The overall process will take days
|
||||
\par {\*\bkmkstart _Toc8477596}{\listtext\pard\plain\s2 \b\fs28\lang2057\langfe1031\langnp2057 \hich\af0\dbch\af0\loch\f0 1.4\tab}}\pard\plain \s2\ql \fi-578\li0\ri0\sb480\sa60\keepn\widctlpar
|
||||
\jclisttab\tx0\aspalpha\aspnum\faauto\ls8\ilvl1\outlinelevel1\adjustright\rin0\lin0\itap0 \b\fs28\lang2057\langfe1031\cgrid\langnp2057\langfenp1031 {Features of the Fetcher crawler{\*\bkmkend _Toc8477596}
|
||||
\par }\pard\plain \ql \li0\ri0\widctlpar\aspalpha\aspnum\faauto\adjustright\rin0\lin0\itap0 \f31\fs22\lang1031\langfe1031\cgrid\langnp1031\langfenp1031 {\lang2057\langfe1031\langnp2057
|
||||
The Fetcher web crawler is the result of experience with the problems mentioned above, combined with a lot of monitoring to get the maximum out of the given system resources. It was designed with several different aspects in mind:
|
||||
\par {\listtext\pard\plain\fs22\lang2057\langfe1031\langnp2057 \hich\af0\dbch\af0\loch\f0 -\tab}}\pard \ql \fi-360\li720\ri0\widctlpar\jclisttab\tx720\aspalpha\aspnum\faauto\ls14\adjustright\rin0\lin720\itap0 {\lang2057\langfe1031\langnp2057
|
||||
Speed. This involves balancing the resources to prevent bottlenecks. The crawler is multithreaded. A lot of work went into avoiding synchronization between threads, i.e. by rewriting or replacing the standard Java classes, whose synchronization slows down multithreaded programs a lot
|
||||
\par {\listtext\pard\plain\fs22\lang2057\langfe1031\langnp2057 \hich\af0\dbch\af0\loch\f0 -\tab}Simplicity. The underlying scheme is quite modular and comprehensible. See the description of the pipeline below
|
||||
\par {\listtext\pard\plain\fs22\lang2057\langfe1031\langnp2057 \hich\af0\dbch\af0\loch\f0 -\tab}Power. The modular design and the ease of the Java language make customisation simple
|
||||
\par {\listtext\pard\plain\fs22\lang2057\langfe1031\langnp2057 \hich\af0\dbch\af0\loch\f0 -\tab}Java. Although there were many crawlers around when I started to think about it (in summer 2000), I couldn\rquote
|
||||
t find a good available implementation in Java. If this crawler were to be integrated into a Java search engine, a homogeneous system would be an advantage. And after all, I wanted to see if a fast implementation could be done in this language.
|
||||
\par }\pard \ql \li0\ri0\widctlpar\aspalpha\aspnum\faauto\adjustright\rin0\lin0\itap0 {\lang2057\langfe1031\langnp2057
|
||||
\par {\*\bkmkstart _Toc8477597}{\listtext\pard\plain\s2 \b\fs28\lang2057\langfe1031\langnp2057 \hich\af0\dbch\af0\loch\f0 1.5\tab}}\pard\plain \s2\ql \fi-578\li0\ri0\sb480\sa60\keepn\widctlpar
|
||||
\jclisttab\tx0\aspalpha\aspnum\faauto\ls8\ilvl1\outlinelevel1\adjustright\rin0\lin0\itap0 \b\fs28\lang2057\langfe1031\cgrid\langnp2057\langfenp1031 {What the crawler can do for you, and what it cannot (yet){\*\bkmkend _Toc8477597}
|
||||
\par }\pard\plain \ql \li0\ri0\widctlpar\aspalpha\aspnum\faauto\adjustright\rin0\lin0\itap0 \f31\fs22\lang1031\langfe1031\cgrid\langnp1031\langfenp1031 {\lang2057\langfe1031\langnp2057 What it can do for you:
|
||||
\par {\listtext\pard\plain\fs22\lang2057\langfe1031\langnp2057 \hich\af0\dbch\af0\loch\f0 -\tab}}\pard \ql \fi-360\li720\ri0\widctlpar\jclisttab\tx720\aspalpha\aspnum\faauto\ls14\adjustright\rin0\lin720\itap0 {\lang2057\langfe1031\langnp2057
|
||||
Crawl a distinct subset of the web, restricted only by a given regular expression that all pages have to match. The pages are saved into page files of max. 50 MB and an index file that maps each URL to its position in the page file. Links are logged as well. This is part of the standard LogStorage. Other storages exist as well (see below)
|
||||
\par {\listtext\pard\plain\fs22\lang2057\langfe1031\langnp2057 \hich\af0\dbch\af0\loch\f0 -\tab}
|
||||
Crawling is done breadth first. Hosts are accessed in a round-robin manner, to prevent all threads from accessing one host at once. However, at the moment there is no means to throttle access to a server; the crawler works as fast as it can. There are also some problems with this technique, as will be described below.
|
||||
\par {\listtext\pard\plain\fs22\lang2057\langfe1031\langnp2057 \hich\af0\dbch\af0\loch\f0 -\tab}The main part of the crawler is implemented as a pool of concurrent threads, which speeds up I/O access
|
||||
\par {\listtext\pard\plain\fs22\lang2057\langfe1031\langnp2057 \hich\af0\dbch\af0\loch\f0 -\tab}The HTML link extractor has been optimised for speed. It was made 10 times faster than a generic SAX parser implementation
|
||||
\par {\listtext\pard\plain\fs22\lang2057\langfe1031\langnp2057 \hich\af0\dbch\af0\loch\f0 -\tab}A lot of logging and monitoring is done, so that what goes on inside the crawler can be tracked down
|
||||
\par {\listtext\pard\plain\fs22\lang2057\langfe1031\langnp2057 \hich\af0\dbch\af0\loch\f0 -\tab}Many parts of the crawler have already been optimised to consume no more memory than needed. Many of the internal queues are cached on the hard drive, for example. Only the HashMap of already crawled pages and the HostInfo structures still remain completely in memory, which limits the number of crawled hosts and the number of crawled pages. At the moment, OutOfMemory errors are not prevented, so beware.
|
||||
\par {\listtext\pard\plain\fs22\lang2057\langfe1031\langnp2057 \hich\af0\dbch\af0\loch\f0 -\tab}URLs are passed through a pipeline of filters that limit, for example, the length of a URL, load robots.txt the first time a host is accessed, etc. This p
|
||||
ipeline can be extended easily by adding a Java class to the pipeline.
|
||||
\par {\listtext\pard\plain\fs22\lang2057\langfe1031\langnp2057 \hich\af0\dbch\af0\loch\f0 -\tab}
|
||||
The storage mechanism is also pluggable. One of the next steps would be to integrate this storage mechanism into the pipeline, to allow a separation of logging, processing, and storage
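The filter pipeline mentioned in the list above can be pictured roughly as follows. This is only a minimal sketch under assumptions: the interface UrlFilter and the classes MaxLengthFilter, RegexScopeFilter, SeenFilter and FilterPipeline are hypothetical names chosen for illustration and are not the crawler's actual classes.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;
import java.util.regex.Pattern;

// Hypothetical interface; the crawler's real filter classes may look different.
interface UrlFilter {
    // Returns true if the URL may pass on to the next stage.
    boolean accept(String url);
}

// Rejects URLs longer than a fixed limit.
class MaxLengthFilter implements UrlFilter {
    private final int maxLength;
    MaxLengthFilter(int maxLength) { this.maxLength = maxLength; }
    public boolean accept(String url) { return url.length() <= maxLength; }
}

// Accepts only URLs that match the restricting regular expression as a whole.
class RegexScopeFilter implements UrlFilter {
    private final Pattern scope;
    RegexScopeFilter(String regex) { this.scope = Pattern.compile(regex); }
    public boolean accept(String url) { return scope.matcher(url).matches(); }
}

// Rejects URLs that have already passed through this filter once.
class SeenFilter implements UrlFilter {
    private final Set<String> seen = new HashSet<>();
    public boolean accept(String url) { return seen.add(url); }
}

// Runs a URL through all filters in order and stops at the first rejection.
class FilterPipeline {
    private final List<UrlFilter> filters = new ArrayList<>();
    FilterPipeline add(UrlFilter filter) { filters.add(filter); return this; }
    boolean accept(String url) {
        for (UrlFilter filter : filters) {
            if (!filter.accept(url)) {
                return false;
            }
        }
        return true;
    }
}
```

In this sketch, extending the pipeline means writing one more class that implements UrlFilter and adding it with add(). SeenFilter is placed last so that URLs rejected by an earlier stage are not recorded as seen.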
|
||||
\par }\pard \ql \li0\ri0\widctlpar\aspalpha\aspnum\faauto\adjustright\rin0\lin0\itap0 {\lang2057\langfe1031\langnp2057
|
||||
\par On the other hand, at the time of this writing, the crawler has not yet evolved into a production release. The reason is: until now, it just served me alone. These issues remain:
|
||||
\par {\listtext\pard\plain\fs22\lang2057\langfe1031\langnp2057 \hich\af0\dbch\af0\loch\f0 -\tab}}\pard \ql \fi-360\li720\ri0\widctlpar\jclisttab\tx720\aspalpha\aspnum\faauto\ls14\adjustright\rin0\lin720\itap0 {\lang2057\langfe1031\langnp2057
|
||||
The missing things as noted above
|
||||
\par {\listtext\pard\plain\fs22\lang2057\langfe1031\langnp2057 \hich\af0\dbch\af0\loch\f0 -\tab}There may be bugs which prevent it from running for longer than a couple of hours. I noticed, for example, that system sockets were slowly being used up, although the Java code seemed to be ok. One reason why I wanted to publish it now was to have other people look at the code, to learn from their experiences, and to let them find errors I couldn\rquote t see anymore.
|
||||
\par {\listtext\pard\plain\fs22\lang2057\langfe1031\langnp2057 \hich\af0\dbch\af0\loch\f0 -\tab}
|
||||
Only some of the configuration can be done with command line parameters. The pipeline is put together in the startup procedure. It should not be very hard to put that into a property file
|
||||
\par {\listtext\pard\plain\fs22\lang2057\langfe1031\langnp2057 \hich\af0\dbch\af0\loch\f0 -\tab}The ThreadMonitor is very experimental. It has evolved from a pure monitoring mechanism to a central part of the whole crawler. It should probably be refactored.
|
||||
|
||||
\par {\listtext\pard\plain\fs22\lang2057\langfe1031\langnp2057 \hich\af0\dbch\af0\loch\f0 -\tab}Speed could still be optimised. Synchronization takes place too often
|
||||
\par {\listtext\pard\plain\fs22\lang2057\langfe1031\langnp2057 \hich\af0\dbch\af0\loch\f0 -\tab}After all, the crawler is not yet incorporated into the Lucene engine.
|
||||
\par {\listtext\pard\plain\fs22\lang2057\langfe1031\langnp2057 \hich\af0\dbch\af0\loch\f0 -\tab}URLs should be handled in a more intelligent manner. At the moment \'93http://host?id=1\'94, \'93http://host/?id=1\'94, and \'93http://host/index.shtml?id=1\'94
|
||||
are handled as three different URLs. The crawler also doesn\rquote t recognize host aliases or mirrors. Other crawlers also calculate fingerprints of the pages loaded, to avoid loading mirrors. This one does not.
|
||||
\par {\listtext\pard\plain\fs22\lang2057\langfe1031\langnp2057 \hich\af0\dbch\af0\loch\f0 -\tab}No processing whatsoever is done on the documents (except extracting the links). It should be decided how much of this is supposed to be done within the crawler, and what should be done in a post-processing step
|
||||
\par {\listtext\pard\plain\fs22\lang2057\langfe1031\langnp2057 \hich\af0\dbch\af0\loch\f0 -\tab}Unix is the favoured operating system. I used SUSE Linux with a 2.2 kernel. I remember that I ran into problems with the I/O routines on Windows machines. I haven\rquote t tried it for a long time now, though.
|
||||
\par {\listtext\pard\plain\fs22\lang2057\langfe1031\langnp2057 \hich\af0\dbch\af0\loch\f0 -\tab}Only http is supported; there is no file server crawling with recursive directory options, etc.
|
||||
\par {\listtext\pard\plain\fs22\lang2057\langfe1031\langnp2057 \hich\af0\dbch\af0\loch\f0 -\tab}It\rquote s not polite. It hits servers as fast as it can, which can cause denial-of-service (DoS) problems
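To illustrate the URL handling issue in the list above, here is a minimal sketch of how the three example URLs could be mapped to one canonical form. It is not the crawler's code. The class name UrlNormalizerSketch and the list of default page names are assumptions made for this example, and it expects absolute http URLs with a host.

```java
import java.net.URI;

class UrlNormalizerSketch {
    // Default document names treated as equal to the directory itself.
    // This list is an assumption; real servers may be configured differently.
    private static final String[] DEFAULT_PAGES = { "index.html", "index.htm", "index.shtml" };

    // Returns a canonical form of an absolute http URL.
    // Throws IllegalArgumentException if the URL cannot be parsed.
    static String normalize(String url) {
        URI uri = URI.create(url).normalize();
        String scheme = uri.getScheme().toLowerCase();
        String host = uri.getHost().toLowerCase();
        String path = uri.getRawPath();
        if (path == null || path.isEmpty()) {
            path = "/"; // "http://host" and "http://host/" name the same page
        }
        for (String page : DEFAULT_PAGES) {
            if (path.endsWith("/" + page)) {
                // Cut off the default page name but keep the trailing slash.
                path = path.substring(0, path.length() - page.length());
                break;
            }
        }
        StringBuilder sb = new StringBuilder();
        sb.append(scheme).append("://").append(host);
        int port = uri.getPort();
        if (port != -1 && port != 80) {
            sb.append(':').append(port); // keep only non-default ports
        }
        sb.append(path);
        String query = uri.getRawQuery();
        if (query != null) {
            sb.append('?').append(query);
        }
        return sb.toString();
    }
}
```

With these assumptions, all three example URLs normalize to the same string, so a visited-URL check keyed on the normalized form would treat them as one page. Host aliases and mirrors are not covered by this sketch.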
|
||||
\par {\*\bkmkstart _Toc8477598}{\listtext\pard\plain\s2 \b\fs28\lang2057\langfe1031\langnp2057 \hich\af0\dbch\af0\loch\f0 1.6\tab}}\pard\plain \s2\ql \fi-578\li0\ri0\sb480\sa60\keepn\widctlpar
|
||||
\jclisttab\tx0\aspalpha\aspnum\faauto\ls8\ilvl1\outlinelevel1\adjustright\rin0\lin0\itap0 \b\fs28\lang2057\langfe1031\cgrid\langnp2057\langfenp1031 {Syntax and runtime behaviour{\*\bkmkend _Toc8477598}
|
||||
\par }\pard\plain \ql \li0\ri0\widctlpar\aspalpha\aspnum\faauto\adjustright\rin0\lin0\itap0 \f31\fs22\lang1031\langfe1031\cgrid\langnp1031\langfenp1031 {\lang2057\langfe1031\langnp2057 The command line options are very simple:
|
||||
\par }{\b\lang2057\langfe1031\langnp2057 java [-server] [-Xmx<ZZ>m] -classpath fetcher.jar de.lanlab.larm.fetcher.FetcherMain
|
||||
\par \tab \tab -start STARTURL
|
||||
\par \tab \tab -restrictto REGEX
|
||||
\par \tab \tab [-threads[=10]]
|
||||
\par }\pard \ql \fi-1416\li1416\ri0\widctlpar\aspalpha\aspnum\faauto\adjustright\rin0\lin1416\itap0 {\lang2057\langfe1031\langnp2057 -start\tab a start URL. Currently only one. It must be a valid http-URL, including the http prefix
|
||||
\par -restrictto\tab a (Perl5) regular expression that all pages have to match
|
||||
\par \tab If you are not familiar with regular expressions
|
||||
\par -threads\tab the number of concurrent threads that crawl the pages. At this time, more than 25 threads don\rquote t provide any advantages because synchronization effects and (probably) the overhead of the scheduler slow the system down
|
||||
\par }{\b\lang2057\langfe1031\langnp2057 Java runtime options:
|
||||
\par }{\lang2057\langfe1031\langnp2057 -server\tab starts the HotSpot VM in server mode, which starts up a little more slowly but runs faster afterwards
|
||||
\par -Xmx<ZZ>m\tab sets the maximum size of the heap to <ZZ> MB. This should be a lot; set it to as much memory as you have
|
||||
\par
|
||||
\par You also have to provide a \'93logs/\'94 directory (won\rquote t be created for you).
|
||||
\par }\pard \ql \li0\ri0\widctlpar\aspalpha\aspnum\faauto\adjustright\rin0\lin0\itap0 {\lang2057\langfe1031\langnp2057 You may also want to have a look at the source code, because some options cannot be dealt with from the outside at this time.
|
||||
\par
|
||||
\par }\pard \ql \fi-1416\li1416\ri0\widctlpar\aspalpha\aspnum\faauto\adjustright\rin0\lin1416\itap0 {\lang2057\langfe1031\langnp2057 What happens now?
|
||||
\par {\listtext\pard\plain\f31\fs22\lang2057\langfe1031\langnp2057 \hich\af31\dbch\af0\loch\f31 1.\tab}}\pard \ql \fi-360\li720\ri0\widctlpar\jclisttab\tx720\aspalpha\aspnum\faauto\ls15\adjustright\rin0\lin720\itap0 {\lang2057\langfe1031\langnp2057
|
||||
The filter pipeline is built. The ScopeFilter is initialised with the expression given by restrictto
|
||||
\par {\listtext\pard\plain\f31\fs22\lang2057\langfe1031\langnp2057 \hich\af31\dbch\af0\loch\f31 2.\tab}The URL is put into the pipeline
|
||||
\par {\listtext\pard\plain\f31\fs22\lang2057\langfe1031\langnp2057 \hich\af31\dbch\af0\loch\f31 3.\tab}The documents are fetched. If the mime type is text/html, links are extracted and put back into the queue
|
||||
. The documents and URLs are forwarded to the storage, which saves them
|
||||
\par {\listtext\pard\plain\f31\fs22\lang2057\langfe1031\langnp2057 \hich\af31\dbch\af0\loch\f31 4.\tab}
|
||||
Meanwhile, every 5 seconds, the ThreadMonitor gathers statistics, flushes log files, starts the garbage collection, and stops the fetcher when everything seems to be done: all threads are idle, and nothing remains in the queues
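The steps above can be sketched as a single-threaded, breadth-first loop. This is a simplified illustration under assumptions, not the crawler's implementation: the class CrawlLoopSketch is hypothetical, the network is replaced by an in-memory map from each URL to the links on that page, and the thread pool, robots.txt handling and ThreadMonitor are left out.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Queue;
import java.util.Set;
import java.util.regex.Pattern;

class CrawlLoopSketch {
    // web stands in for the network: it maps a URL to the links found on that page.
    // Returns the URLs in the order in which they would be handed to the storage.
    static List<String> crawl(String startUrl, String restrictTo, Map<String, List<String>> web) {
        // Step 1: build the scope check from the restricting expression.
        Pattern scope = Pattern.compile(restrictTo);
        Set<String> visited = new HashSet<>();
        Queue<String> queue = new ArrayDeque<>();
        List<String> stored = new ArrayList<>();

        // Step 2: the start URL enters the queue if it passes the checks.
        if (scope.matcher(startUrl).matches() && visited.add(startUrl)) {
            queue.add(startUrl);
        }
        // Step 3: fetch each document, store it, and feed its links back into the queue.
        while (!queue.isEmpty()) {
            String url = queue.remove();
            List<String> links = web.getOrDefault(url, Collections.emptyList());
            stored.add(url);
            for (String link : links) {
                if (scope.matcher(link).matches() && visited.add(link)) {
                    queue.add(link);
                }
            }
        }
        // Step 4 (simplified): with one thread, an empty queue means the crawl is done.
        return stored;
    }
}
```

Because a FIFO queue is used, pages are visited level by level, which corresponds to the breadth-first order described earlier. In a multi-threaded crawler the end condition also has to check that all threads are idle, since a busy thread may still add new links.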
|
||||
\par {\listtext\pard\plain\s1 \b\fs36\lang2057\langfe1031\kerning28\langnp2057 \hich\af0\dbch\af0\loch\f0 2\tab}}\pard\plain \s1\ql \fi-432\li0\ri0\sb240\sa60\keepn\widctlpar\jclisttab\tx0\aspalpha\aspnum\faauto\ls8\outlinelevel0\adjustright\rin0\lin0\itap0
|
||||
\b\fs36\lang2057\langfe1031\kerning28\cgrid\langnp2057\langfenp1031 {\page {\*\bkmkstart _Toc8477599}Architecture{\*\bkmkend _Toc8477599}
|
||||
\par }\pard\plain \ql \li0\ri0\widctlpar\aspalpha\aspnum\faauto\adjustright\rin0\lin0\itap0 \f31\fs22\lang1031\langfe1031\cgrid\langnp1031\langfenp1031 {\lang2057\langfe1031\langnp2057 I studied the Mercator web crawler}{
|
||||
\cs39\lang2057\langfe1031\super\langnp2057 \chftn {\footnote \pard\plain \s38\ql \li0\ri0\widctlpar\aspalpha\aspnum\faauto\adjustright\rin0\lin0\itap0 \f31\fs20\lang1031\langfe1031\cgrid\langnp1031\langfenp1031 {\cs39\super \chftn }{
|
||||
\lang2057\langfe1031\langnp2057 see }{\field{\*\fldinst {\lang2057\langfe1031\langnp2057 HYPERLINK "http://citeseer.nj.nec.com/heydon99mercator.html" }{\lang2057\langfe1031\langnp2057 {\*\datafield
|
||||
00d0c9ea79f9bace118c8200aa004ba90b02000000170000003100000068007400740070003a002f002f00630069007400650073006500650072002e006e006a002e006e00650063002e0063006f006d002f0068006500790064006f006e00390039006d00650072006300610074006f0072002e00680074006d006c000000
|
||||
e0c9ea79f9bace118c8200aa004ba90b6200000068007400740070003a002f002f00630069007400650073006500650072002e006e006a002e006e00650063002e0063006f006d002f0068006500790064006f006e00390039006d00650072006300610074006f0072002e00680074006d006c000000}}}{\fldrslt {
|
||||
\cs41\ul\cf2\lang2057\langfe1031\langnp2057 http://citeseer.nj.nec.com/heydon99mercator.html}}}{\lang2057\langfe1031\langnp2057 }}}{\lang2057\langfe1031\langnp2057
|
||||
but decided to implement a somewhat different architecture. Here is a high level overview of the default configuration:
|
||||
\par }{\fs20\lang1024\langfe1024\noproof {\shpgrp{\*\shpinst\shpleft-180\shptop140\shpright8280\shpbottom6980\shpfhdr0\shpbxcolumn\shpbxignore\shpbypara\shpbyignore\shpwr3\shpwrk0\shpfblwtxt0\shpz7\shplid1110
|
||||
{\sp{\sn groupLeft}{\sv 1521}}{\sp{\sn groupTop}{\sv 3150}}{\sp{\sn groupRight}{\sv 9981}}{\sp{\sn groupBottom}{\sv 9990}}{\sp{\sn fFlipH}{\sv 0}}{\sp{\sn fFlipV}{\sv 0}}
|
||||
{\sp{\sn fLayoutInCell}{\sv 1}}{\shp{\*\shpinst\shplid1046{\sp{\sn relLeft}{\sv 2961}}{\sp{\sn relTop}{\sv 3150}}{\sp{\sn relRight}{\sv 3501}}{\sp{\sn relBottom}{\sv 8910}}{\sp{\sn fRelFlipH}{\sv 0}}{\sp{\sn fRelFlipV}{\sv 0}}{\sp{\sn shapeType}{\sv 202}}
|
||||
{\sp{\sn lTxid}{\sv 65536}}{\sp{\sn txflTextFlow}{\sv 3}}{\sp{\sn lidRegroup}{\sv 5}}{\sp{\sn fLayoutInCell}{\sv 1}}{\shptxt \pard\plain \ql \li0\ri0\widctlpar\aspalpha\aspnum\faauto\adjustright\rin0\lin0\itap0
|
||||
\f31\fs22\lang1031\langfe1031\cgrid\langnp1031\langfenp1031 {\fs18\lang2057\langfe1031\langnp2057 Message Handler }{\i\fs18\lang2057\langfe1031\langnp2057 (Thread)}{\fs18\lang2057\langfe1031\langnp2057
|
||||
\par }}}}{\shp{\*\shpinst\shplid1047{\sp{\sn relLeft}{\sv 4041}}{\sp{\sn relTop}{\sv 4144}}{\sp{\sn relRight}{\sv 6741}}{\sp{\sn relBottom}{\sv 4504}}{\sp{\sn fRelFlipH}{\sv 0}}
|
||||
{\sp{\sn fRelFlipV}{\sv 0}}{\sp{\sn shapeType}{\sv 202}}{\sp{\sn lTxid}{\sv 131072}}{\sp{\sn lidRegroup}{\sv 5}}{\sp{\sn fLayoutInCell}{\sv 1}}{\shptxt \pard\plain \qc \li0\ri0\widctlpar\aspalpha\aspnum\faauto\adjustright\rin0\lin0\itap0
|
||||
\f31\fs22\lang1031\langfe1031\cgrid\langnp1031\langfenp1031 {\fs18 URLScopeFilter}{
|
||||
\par }}}}{\shp{\*\shpinst\shplid1048{\sp{\sn relLeft}{\sv 4041}}{\sp{\sn relTop}{\sv 3604}}{\sp{\sn relRight}{\sv 6741}}{\sp{\sn relBottom}{\sv 3964}}{\sp{\sn fRelFlipH}{\sv 0}}
|
||||
{\sp{\sn fRelFlipV}{\sv 0}}{\sp{\sn shapeType}{\sv 202}}{\sp{\sn lTxid}{\sv 196608}}{\sp{\sn lidRegroup}{\sv 5}}{\sp{\sn fLayoutInCell}{\sv 1}}{\shptxt \pard\plain \qc \li0\ri0\widctlpar\aspalpha\aspnum\faauto\adjustright\rin0\lin0\itap0
|
||||
\f31\fs22\lang1031\langfe1031\cgrid\langnp1031\langfenp1031 {\fs18 URLLengthFilter}{
|
||||
\par }}}}{\shp{\*\shpinst\shplid1049{\sp{\sn relLeft}{\sv 4041}}{\sp{\sn relTop}{\sv 4684}}{\sp{\sn relRight}{\sv 6741}}{\sp{\sn relBottom}{\sv 5044}}{\sp{\sn fRelFlipH}{\sv 0}}
|
||||
{\sp{\sn fRelFlipV}{\sv 0}}{\sp{\sn shapeType}{\sv 202}}{\sp{\sn lTxid}{\sv 262144}}{\sp{\sn lidRegroup}{\sv 5}}{\sp{\sn fLayoutInCell}{\sv 1}}{\shptxt \pard\plain \qc \li0\ri0\widctlpar\aspalpha\aspnum\faauto\adjustright\rin0\lin0\itap0
|
||||
\f31\fs22\lang1031\langfe1031\cgrid\langnp1031\langfenp1031 {\fs18 RobotExclusionFilter}{
|
||||
\par }}}}{\shp{\*\shpinst\shplid1050{\sp{\sn relLeft}{\sv 4041}}{\sp{\sn relTop}{\sv 5223}}{\sp{\sn relRight}{\sv 6741}}{\sp{\sn relBottom}{\sv 5583}}{\sp{\sn fRelFlipH}{\sv 0}}
|
||||
{\sp{\sn fRelFlipV}{\sv 0}}{\sp{\sn shapeType}{\sv 202}}{\sp{\sn lTxid}{\sv 327680}}{\sp{\sn hspNext}{\sv 1050}}{\sp{\sn lidRegroup}{\sv 5}}{\sp{\sn fLayoutInCell}{\sv 1}}{\shptxt \pard\plain
|
||||
\qc \li0\ri0\widctlpar\aspalpha\aspnum\faauto\adjustright\rin0\lin0\itap0 \f31\fs22\lang1031\langfe1031\cgrid\langnp1031\langfenp1031 {\fs18 URLVisitedFilter}{
|
||||
\par }}}}{\shp{\*\shpinst\shplid1051{\sp{\sn relLeft}{\sv 4041}}{\sp{\sn relTop}{\sv 5763}}{\sp{\sn relRight}{\sv 6741}}{\sp{\sn relBottom}{\sv 6123}}{\sp{\sn fRelFlipH}{\sv 0}}
|
||||
{\sp{\sn fRelFlipV}{\sv 0}}{\sp{\sn shapeType}{\sv 202}}{\sp{\sn lTxid}{\sv 393216}}{\sp{\sn hspNext}{\sv 1051}}{\sp{\sn lidRegroup}{\sv 5}}{\sp{\sn fLayoutInCell}{\sv 1}}{\shptxt \pard\plain
|
||||
\qc \li0\ri0\widctlpar\aspalpha\aspnum\faauto\adjustright\rin0\lin0\itap0 \f31\fs22\lang1031\langfe1031\cgrid\langnp1031\langfenp1031 {\fs18 KnownPathsFilter}{
|
||||
\par }}}}{\shp{\*\shpinst\shplid1053{\sp{\sn relLeft}{\sv 4041}}{\sp{\sn relTop}{\sv 9270}}{\sp{\sn relRight}{\sv 6741}}{\sp{\sn relBottom}{\sv 9990}}{\sp{\sn fRelFlipH}{\sv 0}}
|
||||
{\sp{\sn fRelFlipV}{\sv 0}}{\sp{\sn shapeType}{\sv 202}}{\sp{\sn lTxid}{\sv 524288}}{\sp{\sn lidRegroup}{\sv 5}}{\sp{\sn fLayoutInCell}{\sv 1}}{\shptxt \pard\plain \qc \li0\ri0\widctlpar\aspalpha\aspnum\faauto\adjustright\rin0\lin0\itap0
|
||||
\f31\fs22\lang1031\langfe1031\cgrid\langnp1031\langfenp1031 {\fs18 Storage}{
|
||||
\par }}}}{\shp{\*\shpinst\shplid1054{\sp{\sn relLeft}{\sv 8718}}{\sp{\sn relTop}{\sv 4050}}{\sp{\sn relRight}{\sv 9978}}{\sp{\sn relBottom}{\sv 4770}}{\sp{\sn fRelFlipH}{\sv 0}}
|
||||
{\sp{\sn fRelFlipV}{\sv 0}}{\sp{\sn shapeType}{\sv 202}}{\sp{\sn lTxid}{\sv 589824}}{\sp{\sn lidRegroup}{\sv 5}}{\sp{\sn fLayoutInCell}{\sv 1}}{\shptxt \pard\plain \qc \li0\ri0\widctlpar\aspalpha\aspnum\faauto\adjustright\rin0\lin0\itap0
|
||||
\f31\fs22\lang1031\langfe1031\cgrid\langnp1031\langfenp1031 {\fs18 Host\line Manager}{
|
||||
\par }}}}{\shp{\*\shpinst\shplid1055{\sp{\sn relLeft}{\sv 4038}}{\sp{\sn relTop}{\sv 6390}}{\sp{\sn relRight}{\sv 6738}}{\sp{\sn relBottom}{\sv 8910}}{\sp{\sn fRelFlipH}{\sv 0}}
|
||||
{\sp{\sn fRelFlipV}{\sv 0}}{\sp{\sn shapeType}{\sv 202}}{\sp{\sn lTxid}{\sv 655360}}{\sp{\sn hspNext}{\sv 1055}}{\sp{\sn lidRegroup}{\sv 5}}{\sp{\sn fLayoutInCell}{\sv 1}}{\shptxt \pard\plain
|
||||
\qc \li0\ri0\widctlpar\aspalpha\aspnum\faauto\adjustright\rin0\lin0\itap0 \f31\fs22\lang1031\langfe1031\cgrid\langnp1031\langfenp1031 {\fs18 Fetcher}{
|
||||
\par }}}}{\shp{\*\shpinst\shplid1056{\sp{\sn relLeft}{\sv 5301}}{\sp{\sn relTop}{\sv 3963}}{\sp{\sn relRight}{\sv 5301}}{\sp{\sn relBottom}{\sv 4143}}{\sp{\sn fRelFlipH}{\sv 0}}{\sp{\sn fRelFlipV}{\sv 0}}{\sp{\sn shapeType}{\sv 20}}{\sp{\sn shapePath}{\sv 4}}
|
||||
{\sp{\sn fFillOK}{\sv 0}}{\sp{\sn fFilled}{\sv 0}}{\sp{\sn lineEndArrowhead}{\sv 1}}{\sp{\sn fArrowheadsOK}{\sv 1}}{\sp{\sn lidRegroup}{\sv 5}}{\sp{\sn fLayoutInCell}{\sv 1}}}}
|
||||
{\shp{\*\shpinst\shplid1057{\sp{\sn relLeft}{\sv 5301}}{\sp{\sn relTop}{\sv 5043}}{\sp{\sn relRight}{\sv 5301}}{\sp{\sn relBottom}{\sv 5223}}{\sp{\sn fRelFlipH}{\sv 0}}{\sp{\sn fRelFlipV}{\sv 0}}{\sp{\sn shapeType}{\sv 20}}{\sp{\sn shapePath}{\sv 4}}
|
||||
{\sp{\sn fFillOK}{\sv 0}}{\sp{\sn fFilled}{\sv 0}}{\sp{\sn lineEndArrowhead}{\sv 1}}{\sp{\sn fArrowheadsOK}{\sv 1}}{\sp{\sn lidRegroup}{\sv 5}}{\sp{\sn fLayoutInCell}{\sv 1}}}}
|
||||
{\shp{\*\shpinst\shplid1058{\sp{\sn relLeft}{\sv 5301}}{\sp{\sn relTop}{\sv 4503}}{\sp{\sn relRight}{\sv 5301}}{\sp{\sn relBottom}{\sv 4683}}{\sp{\sn fRelFlipH}{\sv 0}}{\sp{\sn fRelFlipV}{\sv 0}}{\sp{\sn shapeType}{\sv 20}}{\sp{\sn shapePath}{\sv 4}}
|
||||
{\sp{\sn fFillOK}{\sv 0}}{\sp{\sn fFilled}{\sv 0}}{\sp{\sn lineEndArrowhead}{\sv 1}}{\sp{\sn fArrowheadsOK}{\sv 1}}{\sp{\sn lidRegroup}{\sv 5}}{\sp{\sn fLayoutInCell}{\sv 1}}}}
|
||||
{\shp{\*\shpinst\shplid1059{\sp{\sn relLeft}{\sv 5301}}{\sp{\sn relTop}{\sv 5583}}{\sp{\sn relRight}{\sv 5301}}{\sp{\sn relBottom}{\sv 5763}}{\sp{\sn fRelFlipH}{\sv 0}}{\sp{\sn fRelFlipV}{\sv 0}}{\sp{\sn shapeType}{\sv 20}}{\sp{\sn shapePath}{\sv 4}}
|
||||
{\sp{\sn fFillOK}{\sv 0}}{\sp{\sn fFilled}{\sv 0}}{\sp{\sn lineEndArrowhead}{\sv 1}}{\sp{\sn fArrowheadsOK}{\sv 1}}{\sp{\sn lidRegroup}{\sv 5}}{\sp{\sn fLayoutInCell}{\sv 1}}}}
|
||||
{\shp{\*\shpinst\shplid1061{\sp{\sn relLeft}{\sv 5298}}{\sp{\sn relTop}{\sv 6124}}{\sp{\sn relRight}{\sv 5301}}{\sp{\sn relBottom}{\sv 6390}}{\sp{\sn fRelFlipH}{\sv 1}}{\sp{\sn fRelFlipV}{\sv 0}}{\sp{\sn shapeType}{\sv 20}}{\sp{\sn shapePath}{\sv 4}}
|
||||
{\sp{\sn fFillOK}{\sv 0}}{\sp{\sn fFilled}{\sv 0}}{\sp{\sn lineEndArrowhead}{\sv 1}}{\sp{\sn fArrowheadsOK}{\sv 1}}{\sp{\sn lidRegroup}{\sv 5}}{\sp{\sn fLayoutInCell}{\sv 1}}}}
|
||||
{\shp{\*\shpinst\shplid1062{\sp{\sn relLeft}{\sv 3501}}{\sp{\sn relTop}{\sv 3784}}{\sp{\sn relRight}{\sv 4038}}{\sp{\sn relBottom}{\sv 3784}}{\sp{\sn fRelFlipH}{\sv 0}}{\sp{\sn fRelFlipV}{\sv 0}}{\sp{\sn shapeType}{\sv 20}}{\sp{\sn shapePath}{\sv 4}}
|
||||
{\sp{\sn fFillOK}{\sv 0}}{\sp{\sn fFilled}{\sv 0}}{\sp{\sn lineEndArrowhead}{\sv 1}}{\sp{\sn fArrowheadsOK}{\sv 1}}{\sp{\sn lidRegroup}{\sv 5}}{\sp{\sn fLayoutInCell}{\sv 1}}}}
|
||||
{\shp{\*\shpinst\shplid1064{\sp{\sn relLeft}{\sv 6738}}{\sp{\sn relTop}{\sv 4410}}{\sp{\sn relRight}{\sv 8718}}{\sp{\sn relBottom}{\sv 4410}}{\sp{\sn fRelFlipH}{\sv 0}}{\sp{\sn fRelFlipV}{\sv 1}}{\sp{\sn shapeType}{\sv 20}}{\sp{\sn shapePath}{\sv 4}}
|
||||
{\sp{\sn fFillOK}{\sv 0}}{\sp{\sn fFilled}{\sv 0}}{\sp{\sn lineDashing}{\sv 6}}{\sp{\sn lineEndArrowhead}{\sv 1}}{\sp{\sn fArrowheadsOK}{\sv 1}}{\sp{\sn fLine}{\sv 1}}{\sp{\sn lidRegroup}{\sv 5}}{\sp{\sn fLayoutInCell}{\sv 1}}}}
|
||||
{\shp{\*\shpinst\shplid1065{\sp{\sn relLeft}{\sv 4218}}{\sp{\sn relTop}{\sv 6750}}{\sp{\sn relRight}{\sv 6558}}{\sp{\sn relBottom}{\sv 8730}}{\sp{\sn fRelFlipH}{\sv 0}}{\sp{\sn fRelFlipV}{\sv 0}}{\sp{\sn shapeType}{\sv 202}}{\sp{\sn lTxid}{\sv 720896}}
|
||||
{\sp{\sn hspNext}{\sv 1065}}{\sp{\sn lidRegroup}{\sv 5}}{\sp{\sn fLayoutInCell}{\sv 1}}{\shptxt \pard\plain \qc \li0\ri0\widctlpar\aspalpha\aspnum\faauto\adjustright\rin0\lin0\itap0 \f31\fs22\lang1031\langfe1031\cgrid\langnp1031\langfenp1031 {\fs18
|
||||
FetcherPool }{\i\fs18 (Thread)}{
|
||||
\par }}}}{\shp{\*\shpinst\shplid1066{\sp{\sn relLeft}{\sv 4398}}{\sp{\sn relTop}{\sv 7110}}{\sp{\sn relRight}{\sv 4938}}{\sp{\sn relBottom}{\sv 8550}}{\sp{\sn fRelFlipH}{\sv 0}}
|
||||
{\sp{\sn fRelFlipV}{\sv 0}}{\sp{\sn shapeType}{\sv 202}}{\sp{\sn lTxid}{\sv 786432}}{\sp{\sn txflTextFlow}{\sv 3}}{\sp{\sn hspNext}{\sv 1066}}{\sp{\sn lidRegroup}{\sv 5}}{\sp{\sn fLayoutInCell}{\sv 1}}{\shptxt \pard\plain
|
||||
\ql \li0\ri0\widctlpar\aspalpha\aspnum\faauto\adjustright\rin0\lin0\itap0 \f31\fs22\lang1031\langfe1031\cgrid\langnp1031\langfenp1031 {\fs18 FetcherThread
|
||||
\par }}}}{\shp{\*\shpinst\shplid1067{\sp{\sn relLeft}{\sv 5118}}{\sp{\sn relTop}{\sv 7110}}{\sp{\sn relRight}{\sv 5658}}{\sp{\sn relBottom}{\sv 8550}}{\sp{\sn fRelFlipH}{\sv 0}}
|
||||
{\sp{\sn fRelFlipV}{\sv 0}}{\sp{\sn shapeType}{\sv 202}}{\sp{\sn lTxid}{\sv 851968}}{\sp{\sn txflTextFlow}{\sv 3}}{\sp{\sn hspNext}{\sv 1067}}{\sp{\sn lidRegroup}{\sv 5}}{\sp{\sn fLayoutInCell}{\sv 1}}{\shptxt \pard\plain
|
||||
\ql \li0\ri0\widctlpar\aspalpha\aspnum\faauto\adjustright\rin0\lin0\itap0 \f31\fs22\lang1031\langfe1031\cgrid\langnp1031\langfenp1031 {\fs18 FetcherThread
|
||||
\par }}}}{\shp{\*\shpinst\shplid1068{\sp{\sn relLeft}{\sv 5838}}{\sp{\sn relTop}{\sv 7110}}{\sp{\sn relRight}{\sv 6378}}{\sp{\sn relBottom}{\sv 8550}}{\sp{\sn fRelFlipH}{\sv 0}}
|
||||
{\sp{\sn fRelFlipV}{\sv 0}}{\sp{\sn shapeType}{\sv 202}}{\sp{\sn lTxid}{\sv 917504}}{\sp{\sn txflTextFlow}{\sv 3}}{\sp{\sn hspNext}{\sv 1068}}{\sp{\sn lidRegroup}{\sv 5}}{\sp{\sn fLayoutInCell}{\sv 1}}{\shptxt \pard\plain
|
||||
\ql \li0\ri0\widctlpar\aspalpha\aspnum\faauto\adjustright\rin0\lin0\itap0 \f31\fs22\lang1031\langfe1031\cgrid\langnp1031\langfenp1031 {\fs18 FetcherThread
|
||||
\par }}}}{\shp{\*\shpinst\shplid1071{\sp{\sn relLeft}{\sv 2241}}{\sp{\sn relTop}{\sv 3510}}{\sp{\sn relRight}{\sv 2781}}{\sp{\sn relBottom}{\sv 5310}}{\sp{\sn fRelFlipH}{\sv 0}}
|
||||
{\sp{\sn fRelFlipV}{\sv 0}}{\sp{\sn shapeType}{\sv 202}}{\sp{\sn lTxid}{\sv 983040}}{\sp{\sn dxTextRight}{\sv 0}}{\sp{\sn txflTextFlow}{\sv 3}}{\sp{\sn hspNext}{\sv 1071}}
|
||||
{\sp{\sn fLine}{\sv 0}}{\sp{\sn lidRegroup}{\sv 5}}{\sp{\sn fLayoutInCell}{\sv 1}}{\shptxt \pard\plain \ql \li0\ri0\widctlpar\aspalpha\aspnum\faauto\adjustright\rin0\lin0\itap0 \f31\fs22\lang1031\langfe1031\cgrid\langnp1031\langfenp1031 {\i\fs18 puts U
|
||||
RLs into
|
||||
\par }}}}{\shp{\*\shpinst\shplid1073{\sp{\sn relLeft}{\sv 5841}}{\sp{\sn relTop}{\sv 8550}}{\sp{\sn relRight}{\sv 7281}}{\sp{\sn relBottom}{\sv 8910}}{\sp{\sn fRelFlipH}{\sv 0}}
|
||||
{\sp{\sn fRelFlipV}{\sv 0}}{\sp{\sn shapeType}{\sv 202}}{\sp{\sn lTxid}{\sv 1114112}}{\sp{\sn dyTextTop}{\sv 0}}{\sp{\sn hspNext}{\sv 1073}}{\sp{\sn fFilled}{\sv 0}}
|
||||
{\sp{\sn fLine}{\sv 0}}{\sp{\sn lidRegroup}{\sv 5}}{\sp{\sn fLayoutInCell}{\sv 1}}{\shptxt \pard\plain \qc \li0\ri0\widctlpar\aspalpha\aspnum\faauto\adjustright\rin0\lin0\itap0 \f31\fs22\lang1031\langfe1031\cgrid\langnp1031\langfenp1031 {\i\fs18
|
||||
WebDocument}{\i
|
||||
\par }}}}{\shp{\*\shpinst\shplid1075{\sp{\sn relLeft}{\sv 6381}}{\sp{\sn relTop}{\sv 4590}}{\sp{\sn relRight}{\sv 8721}}{\sp{\sn relBottom}{\sv 7290}}{\sp{\sn fRelFlipH}{\sv 0}}{\sp{\sn fRelFlipV}{\sv 0}}{\sp{\sn shapeType}{\sv 0}}{\sp{\sn rotation}{\sv 0}}
|
||||
{\sp{\sn geoRight}{\sv 2340}}{\sp{\sn geoBottom}{\sv 2700}}{\sp{\sn shapePath}{\sv 4}}{\sp{\sn pVerticies}{\sv 8;4;(0,2700);(720,2700);(720,0);(2340,0)}}{\sp{\sn pSegmentInfo}{\sv 2;9;16384;44032;1;44032;1;44032;1;44032
|
||||
;32768}}{\sp{\sn fFillOK}{\sv 1}}{\sp{\sn fFilled}{\sv 0}}{\sp{\sn lineOpacity}{\sv 65536}}{\sp{\sn lineType}{\sv 0}}{\sp{\sn lineDashing}{\sv 6}}{\sp{\sn lineEndArrowhead}{\sv 1}}
|
||||
{\sp{\sn lineEndCapStyle}{\sv 2}}{\sp{\sn fArrowheadsOK}{\sv 1}}{\sp{\sn fLine}{\sv 1}}{\sp{\sn lidRegroup}{\sv 5}}{\sp{\sn posh}{\sv 0}}{\sp{\sn posv}{\sv 0}}{\sp{\sn fLayoutInCell}{\sv 1}}}}
|
||||
{\shp{\*\shpinst\shplid1076{\sp{\sn relLeft}{\sv 6738}}{\sp{\sn relTop}{\sv 4050}}{\sp{\sn relRight}{\sv 8718}}{\sp{\sn relBottom}{\sv 4410}}{\sp{\sn fRelFlipH}{\sv 0}}{\sp{\sn fRelFlipV}{\sv 0}}{\sp{\sn shapeType}{\sv 202}}{\sp{\sn lTxid}{\sv 1179648}}
|
||||
{\sp{\sn fFilled}{\sv 0}}{\sp{\sn fLine}{\sv 0}}{\sp{\sn lidRegroup}{\sv 5}}{\sp{\sn fLayoutInCell}{\sv 1}}{\shptxt \pard\plain \qc \li0\ri0\widctlpar\aspalpha\aspnum\faauto\adjustright\rin0\lin0\itap0
|
||||
\f31\fs22\lang1031\langfe1031\cgrid\langnp1031\langfenp1031 {\i\fs18 use}{\i
|
||||
\par }}}}{\shpgrp{\*\shpinst\shplid1078{\sp{\sn groupLeft}{\sv 1881}}{\sp{\sn groupTop}{\sv 9544}}{\sp{\sn groupRight}{\sv 4941}}{\sp{\sn groupBottom}{\sv 15124}}{\sp{\sn relLeft}{\sv 2781}}
|
||||
{\sp{\sn relTop}{\sv 3510}}{\sp{\sn relRight}{\sv 5301}}{\sp{\sn relBottom}{\sv 9090}}{\sp{\sn fRelFlipH}{\sv 0}}{\sp{\sn fRelFlipV}{\sv 0}}{\sp{\sn rotation}{\sv 0}}{\sp{\sn lidRegroup}{\sv 5}}{\sp{\sn posh}{\sv 0}}{\sp{\sn posv}{\sv 0}}
|
||||
{\sp{\sn fLayoutInCell}{\sv 1}}{\shp{\*\shpinst\shplid1070{\sp{\sn relLeft}{\sv 1881}}{\sp{\sn relTop}{\sv 9544}}{\sp{\sn relRight}{\sv 3141}}{\sp{\sn relBottom}{\sv 15124}}{\sp{\sn fRelFlipH}{\sv 0}}{\sp{\sn fRelFlipV}{\sv 0}}{\sp{\sn shapeType}{\sv 0}}
|
||||
{\sp{\sn geoRight}{\sv 1260}}{\sp{\sn geoBottom}{\sv 4860}}{\sp{\sn shapePath}{\sv 4}}{\sp{\sn pVerticies}{\sv 8;4;(1260,4860);(0,4860);(0,0);(180,0)}}{\sp{\sn pSegmentInfo}{\sv 2;9;16384;44032;1;44032;1;44032;1;44032
|
||||
;32768}}{\sp{\sn fFillOK}{\sv 1}}{\sp{\sn fFilled}{\sv 0}}{\sp{\sn lineOpacity}{\sv 65536}}{\sp{\sn lineType}{\sv 0}}{\sp{\sn lineDashing}{\sv 0}}{\sp{\sn lineEndArrowhead}{\sv 1}}
|
||||
{\sp{\sn lineEndCapStyle}{\sv 2}}{\sp{\sn fArrowheadsOK}{\sv 1}}{\sp{\sn fLine}{\sv 1}}{\sp{\sn fLayoutInCell}{\sv 1}}}}{\shp{\*\shpinst\shplid1077{\sp{\sn relLeft}{\sv 3141}}{\sp{\sn relTop}{\sv 14584}}
|
||||
{\sp{\sn relRight}{\sv 4941}}{\sp{\sn relBottom}{\sv 15124}}{\sp{\sn fRelFlipH}{\sv 0}}{\sp{\sn fRelFlipV}{\sv 0}}{\sp{\sn shapeType}{\sv 0}}{\sp{\sn geoRight}{\sv 1800}}{\sp{\sn geoBottom}{\sv 540}}{\sp{\sn shapePath}{\sv 4}}{\sp{\sn pVerticies}{\sv 8;3
|
||||
;(0,540);(1800,540);(1800,0)}}{\sp{\sn pSegmentInfo}{\sv 2;7;16384;44032;1;44032;1;44032;32768}}{\sp{\sn fFillOK}{\sv 1}}{\sp{\sn fFilled}{\sv 0}}{\sp{\sn lineOpacity}{\sv 65536}}{\sp{\sn lineType}{\sv 0}}{\sp{\sn lineDashing}{\sv 0}}
|
||||
{\sp{\sn lineEndCapStyle}{\sv 2}}{\sp{\sn fArrowheadsOK}{\sv 1}}{\sp{\sn fLine}{\sv 1}}{\sp{\sn fLayoutInCell}{\sv 1}}}}}}{\shp{\*\shpinst\shplid1080{\sp{\sn relLeft}{\sv 8721}}{\sp{\sn relTop}{\sv 5786}}
|
||||
{\sp{\sn relRight}{\sv 9981}}{\sp{\sn relBottom}{\sv 6570}}{\sp{\sn fRelFlipH}{\sv 0}}{\sp{\sn fRelFlipV}{\sv 0}}{\sp{\sn shapeType}{\sv 202}}{\sp{\sn lTxid}{\sv 1245184}}
|
||||
{\sp{\sn hspNext}{\sv 1080}}{\sp{\sn lidRegroup}{\sv 5}}{\sp{\sn fLayoutInCell}{\sv 1}}{\shptxt \pard\plain \qc \li0\ri0\widctlpar\aspalpha\aspnum\faauto\adjustright\rin0\lin0\itap0 \f31\fs22\lang1031\langfe1031\cgrid\langnp1031\langfenp1031 {
|
||||
\fs16\lang2057\langfe1031\langnp2057 Thread\line Monitor }{\i\fs16\lang2057\langfe1031\langnp2057 (Thread)}{\fs20\lang2057\langfe1031\langnp2057
|
||||
\par }}}}{\shp{\*\shpinst\shplid1081{\sp{\sn relLeft}{\sv 8181}}{\sp{\sn relTop}{\sv 5426}}{\sp{\sn relRight}{\sv 8721}}{\sp{\sn relBottom}{\sv 5966}}{\sp{\sn fRelFlipH}{\sv 1}}{\sp{\sn fRelFlipV}{\sv 1}}{\sp{\sn shapeType}{\sv 20}}{\sp{\sn shapePath}{\sv 4}}
|
||||
{\sp{\sn fFillOK}{\sv 0}}{\sp{\sn fFilled}{\sv 0}}{\sp{\sn lineEndArrowhead}{\sv 1}}{\sp{\sn fArrowheadsOK}{\sv 1}}{\sp{\sn lidRegroup}{\sv 5}}{\sp{\sn fLayoutInCell}{\sv 1}}}}
|
||||
{\shp{\*\shpinst\shplid1082{\sp{\sn relLeft}{\sv 8181}}{\sp{\sn relTop}{\sv 6146}}{\sp{\sn relRight}{\sv 8721}}{\sp{\sn relBottom}{\sv 6146}}{\sp{\sn fRelFlipH}{\sv 1}}{\sp{\sn fRelFlipV}{\sv 0}}{\sp{\sn shapeType}{\sv 20}}{\sp{\sn shapePath}{\sv 4}}
|
||||
{\sp{\sn fFillOK}{\sv 0}}{\sp{\sn fFilled}{\sv 0}}{\sp{\sn lineEndArrowhead}{\sv 1}}{\sp{\sn fArrowheadsOK}{\sv 1}}{\sp{\sn lidRegroup}{\sv 5}}{\sp{\sn fLayoutInCell}{\sv 1}}}}
|
||||
{\shp{\*\shpinst\shplid1083{\sp{\sn relLeft}{\sv 8181}}{\sp{\sn relTop}{\sv 6326}}{\sp{\sn relRight}{\sv 8721}}{\sp{\sn relBottom}{\sv 6866}}{\sp{\sn fRelFlipH}{\sv 1}}{\sp{\sn fRelFlipV}{\sv 0}}{\sp{\sn shapeType}{\sv 20}}{\sp{\sn shapePath}{\sv 4}}
|
||||
{\sp{\sn fFillOK}{\sv 0}}{\sp{\sn fFilled}{\sv 0}}{\sp{\sn lineEndArrowhead}{\sv 1}}{\sp{\sn fArrowheadsOK}{\sv 1}}{\sp{\sn lidRegroup}{\sv 5}}{\sp{\sn fLayoutInCell}{\sv 1}}}}
|
||||
{\shp{\*\shpinst\shplid1084{\sp{\sn relLeft}{\sv 7281}}{\sp{\sn relTop}{\sv 5786}}{\sp{\sn relRight}{\sv 8361}}{\sp{\sn relBottom}{\sv 6686}}{\sp{\sn fRelFlipH}{\sv 0}}{\sp{\sn fRelFlipV}{\sv 0}}{\sp{\sn shapeType}{\sv 202}}{\sp{\sn lTxid}{\sv 1310720}}
|
||||
{\sp{\sn dyTextTop}{\sv 0}}{\sp{\sn hspNext}{\sv 1084}}{\sp{\sn fFilled}{\sv 0}}{\sp{\sn fLine}{\sv 0}}{\sp{\sn lidRegroup}{\sv 5}}{\sp{\sn fLayoutInCell}{\sv 1}}{\shptxt \pard\plain
|
||||
\qc \li0\ri0\widctlpar\aspalpha\aspnum\faauto\adjustright\rin0\lin0\itap0 \f31\fs22\lang1031\langfe1031\cgrid\langnp1031\langfenp1031 {\i\fs18 monitors every 5 seconds}{\i
|
||||
\par }}}}{\shp{\*\shpinst\shplid1086{\sp{\sn relLeft}{\sv 9081}}{\sp{\sn relTop}{\sv 6750}}{\sp{\sn relRight}{\sv 9981}}{\sp{\sn relBottom}{\sv 7650}}{\sp{\sn fRelFlipH}{\sv 0}}
|
||||
{\sp{\sn fRelFlipV}{\sv 0}}{\sp{\sn shapeType}{\sv 22}}{\sp{\sn lTxid}{\sv 1376256}}{\sp{\sn hspNext}{\sv 1086}}{\sp{\sn lidRegroup}{\sv 5}}{\sp{\sn fLayoutInCell}{\sv 1}}{\shptxt \pard\plain
|
||||
\qc \li0\ri0\widctlpar\aspalpha\aspnum\faauto\adjustright\rin0\lin0\itap0 \f31\fs22\lang1031\langfe1031\cgrid\langnp1031\langfenp1031 {log
|
||||
\par }}}}{\shp{\*\shpinst\shplid1089{\sp{\sn relLeft}{\sv 7461}}{\sp{\sn relTop}{\sv 9090}}{\sp{\sn relRight}{\sv 8361}}{\sp{\sn relBottom}{\sv 9990}}{\sp{\sn fRelFlipH}{\sv 0}}
|
||||
{\sp{\sn fRelFlipV}{\sv 0}}{\sp{\sn shapeType}{\sv 22}}{\sp{\sn lTxid}{\sv 1507328}}{\sp{\sn lidRegroup}{\sv 5}}{\sp{\sn fLayoutInCell}{\sv 1}}{\shptxt \pard\plain \qc \li0\ri0\widctlpar\aspalpha\aspnum\faauto\adjustright\rin0\lin0\itap0
|
||||
\f31\fs22\lang1031\langfe1031\cgrid\langnp1031\langfenp1031 {Log
|
||||
\par }}}}{\shp{\*\shpinst\shplid1092{\sp{\sn relLeft}{\sv 6201}}{\sp{\sn relTop}{\sv 8550}}{\sp{\sn relRight}{\sv 6201}}{\sp{\sn relBottom}{\sv 9270}}{\sp{\sn fRelFlipH}{\sv 0}}{\sp{\sn fRelFlipV}{\sv 0}}{\sp{\sn shapeType}{\sv 20}}{\sp{\sn shapePath}{\sv 4}}
|
||||
{\sp{\sn fFillOK}{\sv 0}}{\sp{\sn fFilled}{\sv 0}}{\sp{\sn lineEndArrowhead}{\sv 1}}{\sp{\sn fArrowheadsOK}{\sv 1}}{\sp{\sn lidRegroup}{\sv 5}}{\sp{\sn fLayoutInCell}{\sv 1}}}}
|
||||
{\shp{\*\shpinst\shplid1093{\sp{\sn relLeft}{\sv 5481}}{\sp{\sn relTop}{\sv 8550}}{\sp{\sn relRight}{\sv 5481}}{\sp{\sn relBottom}{\sv 9270}}{\sp{\sn fRelFlipH}{\sv 0}}{\sp{\sn fRelFlipV}{\sv 0}}{\sp{\sn shapeType}{\sv 20}}{\sp{\sn shapePath}{\sv 4}}
|
||||
{\sp{\sn fFillOK}{\sv 0}}{\sp{\sn fFilled}{\sv 0}}{\sp{\sn lineEndArrowhead}{\sv 1}}{\sp{\sn fArrowheadsOK}{\sv 1}}{\sp{\sn lidRegroup}{\sv 5}}{\sp{\sn fLayoutInCell}{\sv 1}}}}
|
||||
{\shp{\*\shpinst\shplid1094{\sp{\sn relLeft}{\sv 4761}}{\sp{\sn relTop}{\sv 8550}}{\sp{\sn relRight}{\sv 4761}}{\sp{\sn relBottom}{\sv 9270}}{\sp{\sn fRelFlipH}{\sv 0}}{\sp{\sn fRelFlipV}{\sv 0}}{\sp{\sn shapeType}{\sv 20}}{\sp{\sn shapePath}{\sv 4}}
|
||||
{\sp{\sn fFillOK}{\sv 0}}{\sp{\sn fFilled}{\sv 0}}{\sp{\sn lineEndArrowhead}{\sv 1}}{\sp{\sn fArrowheadsOK}{\sv 1}}{\sp{\sn lidRegroup}{\sv 5}}{\sp{\sn fLayoutInCell}{\sv 1}}}}
|
||||
{\shp{\*\shpinst\shplid1095{\sp{\sn relLeft}{\sv 7461}}{\sp{\sn relTop}{\sv 8010}}{\sp{\sn relRight}{\sv 8361}}{\sp{\sn relBottom}{\sv 8910}}{\sp{\sn fRelFlipH}{\sv 0}}{\sp{\sn fRelFlipV}{\sv 0}}{\sp{\sn shapeType}{\sv 22}}{\sp{\sn lTxid}{\sv 1572864}}
|
||||
{\sp{\sn lidRegroup}{\sv 5}}{\sp{\sn fLayoutInCell}{\sv 1}}{\shptxt \pard\plain \qc \li0\ri0\widctlpar\aspalpha\aspnum\faauto\adjustright\rin0\lin0\itap0 \f31\fs22\lang1031\langfe1031\cgrid\langnp1031\langfenp1031 {Store
|
||||
\par }}}}{\shp{\*\shpinst\shplid1096{\sp{\sn relLeft}{\sv 3501}}{\sp{\sn relTop}{\sv 3964}}{\sp{\sn relRight}{\sv 3861}}{\sp{\sn relBottom}{\sv 6304}}{\sp{\sn fRelFlipH}{\sv 0}}
|
||||
{\sp{\sn fRelFlipV}{\sv 0}}{\sp{\sn shapeType}{\sv 202}}{\sp{\sn lTxid}{\sv 1048576}}{\sp{\sn dxTextLeft}{\sv 0}}{\sp{\sn dyTextTop}{\sv 0}}{\sp{\sn dxTextRight}{\sv 0}}
|
||||
{\sp{\sn dyTextBottom}{\sv 0}}{\sp{\sn txflTextFlow}{\sv 3}}{\sp{\sn hspNext}{\sv 1096}}{\sp{\sn fFilled}{\sv 0}}{\sp{\sn fLine}{\sv 0}}{\sp{\sn lidRegroup}{\sv 5}}{\sp{\sn fLayoutInCell}{\sv 1}}{\shptxt \pard\plain
|
||||
\ql \li0\ri0\widctlpar\aspalpha\aspnum\faauto\adjustright\rin0\lin0\itap0 \f31\fs22\lang1031\langfe1031\cgrid\langnp1031\langfenp1031 {\fs18 URLMessage}{
|
||||
\par }}}}{\shp{\*\shpinst\shplid1097{\sp{\sn relLeft}{\sv 2421}}{\sp{\sn relTop}{\sv 4950}}{\sp{\sn relRight}{\sv 2781}}{\sp{\sn relBottom}{\sv 7290}}{\sp{\sn fRelFlipH}{\sv 0}}
|
||||
{\sp{\sn fRelFlipV}{\sv 0}}{\sp{\sn shapeType}{\sv 202}}{\sp{\sn lTxid}{\sv 1441792}}{\sp{\sn dxTextLeft}{\sv 0}}{\sp{\sn dyTextTop}{\sv 0}}{\sp{\sn dxTextRight}{\sv 0}}
|
||||
{\sp{\sn dyTextBottom}{\sv 0}}{\sp{\sn txflTextFlow}{\sv 3}}{\sp{\sn hspNext}{\sv 1097}}{\sp{\sn fFilled}{\sv 0}}{\sp{\sn fLine}{\sv 0}}{\sp{\sn lidRegroup}{\sv 5}}{\sp{\sn fLayoutInCell}{\sv 1}}{\shptxt \pard\plain
|
||||
\ql \li0\ri0\widctlpar\aspalpha\aspnum\faauto\adjustright\rin0\lin0\itap0 \f31\fs22\lang1031\langfe1031\cgrid\langnp1031\langfenp1031 {\fs18 URLMessage}{
|
||||
\par }}}}{\shp{\*\shpinst\shplid1098{\sp{\sn relLeft}{\sv 9441}}{\sp{\sn relTop}{\sv 6570}}{\sp{\sn relRight}{\sv 9441}}{\sp{\sn relBottom}{\sv 6750}}{\sp{\sn fRelFlipH}{\sv 0}}{\sp{\sn fRelFlipV}{\sv 1}}{\sp{\sn shapeType}{\sv 20}}{\sp{\sn shapePath}{\sv 4}}
|
||||
{\sp{\sn fFillOK}{\sv 0}}{\sp{\sn fFilled}{\sv 0}}{\sp{\sn fArrowheadsOK}{\sv 1}}{\sp{\sn lidRegroup}{\sv 5}}{\sp{\sn fLayoutInCell}{\sv 1}}}}{\shp{\*\shpinst\shplid1099{\sp{\sn relLeft}{\sv 6741}}{\sp{\sn relTop}{\sv 9810}}
|
||||
{\sp{\sn relRight}{\sv 7461}}{\sp{\sn relBottom}{\sv 9810}}{\sp{\sn fRelFlipH}{\sv 1}}{\sp{\sn fRelFlipV}{\sv 1}}{\sp{\sn shapeType}{\sv 20}}{\sp{\sn shapePath}{\sv 4}}{\sp{\sn fFillOK}{\sv 0}}{\sp{\sn fFilled}{\sv 0}}{\sp{\sn fArrowheadsOK}{\sv 1}}
|
||||
{\sp{\sn lidRegroup}{\sv 5}}{\sp{\sn fLayoutInCell}{\sv 1}}}}{\shp{\*\shpinst\shplid1100{\sp{\sn relLeft}{\sv 6741}}{\sp{\sn relTop}{\sv 8730}}{\sp{\sn relRight}{\sv 7461}}{\sp{\sn relBottom}{\sv 9450}}{\sp{\sn fRelFlipH}{\sv 1}}
|
||||
{\sp{\sn fRelFlipV}{\sv 0}}{\sp{\sn shapeType}{\sv 20}}{\sp{\sn shapePath}{\sv 4}}{\sp{\sn fFillOK}{\sv 0}}{\sp{\sn fFilled}{\sv 0}}{\sp{\sn fArrowheadsOK}{\sv 1}}{\sp{\sn lidRegroup}{\sv 5}}{\sp{\sn fLayoutInCell}{\sv 1}}}}
|
||||
{\shp{\*\shpinst\shplid1105{\sp{\sn relLeft}{\sv 7461}}{\sp{\sn relTop}{\sv 7024}}{\sp{\sn relRight}{\sv 8361}}{\sp{\sn relBottom}{\sv 7924}}{\sp{\sn fRelFlipH}{\sv 0}}{\sp{\sn fRelFlipV}{\sv 0}}{\sp{\sn shapeType}{\sv 22}}{\sp{\sn lTxid}{\sv 458752}}
|
||||
{\sp{\sn fLayoutInCell}{\sv 1}}{\shptxt \pard\plain \qc \li0\ri0\widctlpar\aspalpha\aspnum\faauto\adjustright\rin0\lin0\itap0 \f31\fs22\lang1031\langfe1031\cgrid\langnp1031\langfenp1031 {\fs20 Queue}{
|
||||
\par }}}}{\shp{\*\shpinst\shplid1106{\sp{\sn relLeft}{\sv 1521}}{\sp{\sn relTop}{\sv 6484}}{\sp{\sn relRight}{\sv 2421}}{\sp{\sn relBottom}{\sv 7384}}{\sp{\sn fRelFlipH}{\sv 0}}
|
||||
{\sp{\sn fRelFlipV}{\sv 0}}{\sp{\sn shapeType}{\sv 22}}{\sp{\sn lTxid}{\sv 1638400}}{\sp{\sn fLayoutInCell}{\sv 1}}{\shptxt \pard\plain \qc \li0\ri0\widctlpar\aspalpha\aspnum\faauto\adjustright\rin0\lin0\itap0
|
||||
\f31\fs22\lang1031\langfe1031\cgrid\langnp1031\langfenp1031 {\fs20 Queue
|
||||
\par }}}}{\shp{\*\shpinst\shplid1107{\sp{\sn relLeft}{\sv 6561}}{\sp{\sn relTop}{\sv 7564}}{\sp{\sn relRight}{\sv 7461}}{\sp{\sn relBottom}{\sv 7564}}{\sp{\sn fRelFlipH}{\sv 1}}{\sp{\sn fRelFlipV}{\sv 1}}{\sp{\sn shapeType}{\sv 20}}{\sp{\sn shapePath}{\sv 4}}
|
||||
{\sp{\sn fFillOK}{\sv 0}}{\sp{\sn fFilled}{\sv 0}}{\sp{\sn fArrowheadsOK}{\sv 1}}{\sp{\sn fLayoutInCell}{\sv 1}}}}{\shp{\*\shpinst\shplid1108{\sp{\sn relLeft}{\sv 2421}}{\sp{\sn relTop}{\sv 6844}}
|
||||
{\sp{\sn relRight}{\sv 2961}}{\sp{\sn relBottom}{\sv 6844}}{\sp{\sn fRelFlipH}{\sv 1}}{\sp{\sn fRelFlipV}{\sv 1}}{\sp{\sn shapeType}{\sv 20}}{\sp{\sn shapePath}{\sv 4}}{\sp{\sn fFillOK}{\sv 0}}{\sp{\sn fFilled}{\sv 0}}{\sp{\sn fArrowheadsOK}{\sv 1}}
|
||||
{\sp{\sn fLayoutInCell}{\sv 1}}}}{\shp{\*\shpinst\shplid1109{\sp{\sn relLeft}{\sv 3321}}{\sp{\sn relTop}{\sv 5404}}{\sp{\sn relRight}{\sv 4221}}{\sp{\sn relBottom}{\sv 6304}}{\sp{\sn fRelFlipH}{\sv 0}}
|
||||
{\sp{\sn fRelFlipV}{\sv 0}}{\sp{\sn shapeType}{\sv 22}}{\sp{\sn lTxid}{\sv 1703936}}{\sp{\sn lineDashing}{\sv 2}}{\sp{\sn fLine}{\sv 1}}{\sp{\sn fLayoutInCell}{\sv 1}}{\shptxt \pard\plain
|
||||
\qc \li0\ri0\widctlpar\aspalpha\aspnum\faauto\adjustright\rin0\lin0\itap0 \f31\fs22\lang1031\langfe1031\cgrid\langnp1031\langfenp1031 {Logs
|
||||
\par }}}}}{\shprslt{\*\do\dobxcolumn\dobypara\dodhgt8199\dpgroup\dpcount47\dpx-180\dpy140\dpxsize8460\dpysize6840\dptxbx\dptxtbrl{\dptxbxtext\pard\plain \ql \li0\ri0\widctlpar\aspalpha\aspnum\faauto\adjustright\rin0\lin0\itap0
|
||||
\f31\fs22\lang1031\langfe1031\cgrid\langnp1031\langfenp1031 {\fs18\lang2057\langfe1031\langnp2057 Message Handler }{\i\fs18\lang2057\langfe1031\langnp2057 (Thread)}{\fs18\lang2057\langfe1031\langnp2057
|
||||
\par }}\dpx1440\dpy0\dpxsize540\dpysize5760\dpfillfgcr255\dpfillfgcg255\dpfillfgcb255\dpfillbgcr255\dpfillbgcg255\dpfillbgcb255\dpfillpat1\dplinew15\dplinecor0\dplinecog0\dplinecob0\dptxbx\dptxlrtb{\dptxbxtext\pard\plain
|
||||
\qc \li0\ri0\widctlpar\aspalpha\aspnum\faauto\adjustright\rin0\lin0\itap0 \f31\fs22\lang1031\langfe1031\cgrid\langnp1031\langfenp1031 {\fs18 URLScopeFilter}{
|
||||
\par }}\dpx2520\dpy994\dpxsize2700\dpysize360\dpfillfgcr255\dpfillfgcg255\dpfillfgcb255\dpfillbgcr255\dpfillbgcg255\dpfillbgcb255\dpfillpat1\dplinew15\dplinecor0\dplinecog0\dplinecob0\dptxbx\dptxlrtb{\dptxbxtext\pard\plain
|
||||
\qc \li0\ri0\widctlpar\aspalpha\aspnum\faauto\adjustright\rin0\lin0\itap0 \f31\fs22\lang1031\langfe1031\cgrid\langnp1031\langfenp1031 {\fs18 URLLengthFilter}{
|
||||
\par }}\dpx2520\dpy454\dpxsize2700\dpysize360\dpfillfgcr255\dpfillfgcg255\dpfillfgcb255\dpfillbgcr255\dpfillbgcg255\dpfillbgcb255\dpfillpat1\dplinew15\dplinecor0\dplinecog0\dplinecob0\dptxbx\dptxlrtb{\dptxbxtext\pard\plain
|
||||
\qc \li0\ri0\widctlpar\aspalpha\aspnum\faauto\adjustright\rin0\lin0\itap0 \f31\fs22\lang1031\langfe1031\cgrid\langnp1031\langfenp1031 {\fs18 RobotExclusionFilter}{
|
||||
\par }}\dpx2520\dpy1534\dpxsize2700\dpysize360\dpfillfgcr255\dpfillfgcg255\dpfillfgcb255\dpfillbgcr255\dpfillbgcg255\dpfillbgcb255\dpfillpat1\dplinew15\dplinecor0\dplinecog0\dplinecob0\dptxbx\dptxlrtb{\dptxbxtext\pard\plain
|
||||
\qc \li0\ri0\widctlpar\aspalpha\aspnum\faauto\adjustright\rin0\lin0\itap0 \f31\fs22\lang1031\langfe1031\cgrid\langnp1031\langfenp1031 {\fs18 URLVisitedFilter}{
|
||||
\par }}\dpx2520\dpy2073\dpxsize2700\dpysize360\dpfillfgcr255\dpfillfgcg255\dpfillfgcb255\dpfillbgcr255\dpfillbgcg255\dpfillbgcb255\dpfillpat1\dplinew15\dplinecor0\dplinecog0\dplinecob0\dptxbx\dptxlrtb{\dptxbxtext\pard\plain
|
||||
\qc \li0\ri0\widctlpar\aspalpha\aspnum\faauto\adjustright\rin0\lin0\itap0 \f31\fs22\lang1031\langfe1031\cgrid\langnp1031\langfenp1031 {\fs18 KnownPathsFilter}{
|
||||
\par }}\dpx2520\dpy2613\dpxsize2700\dpysize360\dpfillfgcr255\dpfillfgcg255\dpfillfgcb255\dpfillbgcr255\dpfillbgcg255\dpfillbgcb255\dpfillpat1\dplinew15\dplinecor0\dplinecog0\dplinecob0\dptxbx\dptxlrtb{\dptxbxtext\pard\plain
|
||||
\qc \li0\ri0\widctlpar\aspalpha\aspnum\faauto\adjustright\rin0\lin0\itap0 \f31\fs22\lang1031\langfe1031\cgrid\langnp1031\langfenp1031 {\fs18 Storage}{
|
||||
\par }}\dpx2520\dpy6120\dpxsize2700\dpysize720\dpfillfgcr255\dpfillfgcg255\dpfillfgcb255\dpfillbgcr255\dpfillbgcg255\dpfillbgcb255\dpfillpat1\dplinew15\dplinecor0\dplinecog0\dplinecob0\dptxbx\dptxlrtb{\dptxbxtext\pard\plain
|
||||
\qc \li0\ri0\widctlpar\aspalpha\aspnum\faauto\adjustright\rin0\lin0\itap0 \f31\fs22\lang1031\langfe1031\cgrid\langnp1031\langfenp1031 {\fs18 Host\line Manager}{
|
||||
\par }}\dpx7197\dpy900\dpxsize1260\dpysize720\dpfillfgcr255\dpfillfgcg255\dpfillfgcb255\dpfillbgcr255\dpfillbgcg255\dpfillbgcb255\dpfillpat1\dplinew15\dplinecor0\dplinecog0\dplinecob0\dptxbx\dptxlrtb{\dptxbxtext\pard\plain
|
||||
\qc \li0\ri0\widctlpar\aspalpha\aspnum\faauto\adjustright\rin0\lin0\itap0 \f31\fs22\lang1031\langfe1031\cgrid\langnp1031\langfenp1031 {\fs18 Fetcher}{
|
||||
\par }}\dpx2517\dpy3240\dpxsize2700\dpysize2520\dpfillfgcr255\dpfillfgcg255\dpfillfgcb255\dpfillbgcr255\dpfillbgcg255\dpfillbgcb255\dpfillpat1\dplinew15\dplinecor0\dplinecog0\dplinecob0\dpline\dpptx0\dppty0\dpptx8460\dppty6840
|
||||
\dpx3780\dpy813\dpxsize0\dpysize180\dplinew15\dplinecor0\dplinecog0\dplinecob0\dpline\dpptx0\dppty0\dpptx8460\dppty6840\dpx3780\dpy1893\dpxsize0\dpysize180\dplinew15\dplinecor0\dplinecog0\dplinecob0\dpline\dpptx0\dppty0\dpptx8460\dppty6840
|
||||
\dpx3780\dpy1353\dpxsize0\dpysize180\dplinew15\dplinecor0\dplinecog0\dplinecob0\dpline\dpptx0\dppty0\dpptx8460\dppty6840\dpx3780\dpy2433\dpxsize0\dpysize180\dplinew15\dplinecor0\dplinecog0\dplinecob0\dpline\dpptx8460\dppty0\dpptx0\dppty6840
|
||||
\dpx3777\dpy2974\dpxsize3\dpysize266\dplinew15\dplinecor0\dplinecog0\dplinecob0\dpline\dpptx0\dppty0\dpptx8460\dppty6840\dpx1980\dpy634\dpxsize537\dpysize0\dplinew15\dplinecor0\dplinecog0\dplinecob0\dpline\dpptx8460\dppty0\dpptx0\dppty6840
|
||||
\dpx5217\dpy1260\dpxsize1980\dpysize0\dplinew15\dplinecor0\dplinecog0\dplinecob0\dptxbx\dptxlrtb{\dptxbxtext\pard\plain \qc \li0\ri0\widctlpar\aspalpha\aspnum\faauto\adjustright\rin0\lin0\itap0 \f31\fs22\lang1031\langfe1031\cgrid\langnp1031\langfenp1031 {
|
||||
\fs18 FetcherPool }{\i\fs18 (Thread)}{
|
||||
\par }}\dpx2697\dpy3600\dpxsize2340\dpysize1980\dpfillfgcr255\dpfillfgcg255\dpfillfgcb255\dpfillbgcr255\dpfillbgcg255\dpfillbgcb255\dpfillpat1\dplinew15\dplinecor0\dplinecog0\dplinecob0\dptxbx\dptxtbrl{\dptxbxtext\pard\plain
|
||||
\ql \li0\ri0\widctlpar\aspalpha\aspnum\faauto\adjustright\rin0\lin0\itap0 \f31\fs22\lang1031\langfe1031\cgrid\langnp1031\langfenp1031 {\fs18 FetcherThread
|
||||
\par }}\dpx2877\dpy3960\dpxsize540\dpysize1440\dpfillfgcr255\dpfillfgcg255\dpfillfgcb255\dpfillbgcr255\dpfillbgcg255\dpfillbgcb255\dpfillpat1\dplinew15\dplinecor0\dplinecog0\dplinecob0\dptxbx\dptxtbrl{\dptxbxtext\pard\plain
|
||||
\ql \li0\ri0\widctlpar\aspalpha\aspnum\faauto\adjustright\rin0\lin0\itap0 \f31\fs22\lang1031\langfe1031\cgrid\langnp1031\langfenp1031 {\fs18 FetcherThread
|
||||
\par }}\dpx3597\dpy3960\dpxsize540\dpysize1440\dpfillfgcr255\dpfillfgcg255\dpfillfgcb255\dpfillbgcr255\dpfillbgcg255\dpfillbgcb255\dpfillpat1\dplinew15\dplinecor0\dplinecog0\dplinecob0\dptxbx\dptxtbrl{\dptxbxtext\pard\plain
|
||||
\ql \li0\ri0\widctlpar\aspalpha\aspnum\faauto\adjustright\rin0\lin0\itap0 \f31\fs22\lang1031\langfe1031\cgrid\langnp1031\langfenp1031 {\fs18 FetcherThread
|
||||
\par }}\dpx4317\dpy3960\dpxsize540\dpysize1440\dpfillfgcr255\dpfillfgcg255\dpfillfgcb255\dpfillbgcr255\dpfillbgcg255\dpfillbgcb255\dpfillpat1\dplinew15\dplinecor0\dplinecog0\dplinecob0\dptxbx\dptxtbrl{\dptxbxtext\pard\plain
|
||||
\ql \li0\ri0\widctlpar\aspalpha\aspnum\faauto\adjustright\rin0\lin0\itap0 \f31\fs22\lang1031\langfe1031\cgrid\langnp1031\langfenp1031 {\i\fs18 puts URLs into
|
||||
\par }}\dpx720\dpy360\dpxsize540\dpysize1800\dpfillfgcr255\dpfillfgcg255\dpfillfgcb255\dpfillbgcr255\dpfillbgcg255\dpfillbgcb255\dpfillpat1\dplinehollow\dptxbx\dptxlrtb{\dptxbxtext\pard\plain
|
||||
\qc \li0\ri0\widctlpar\aspalpha\aspnum\faauto\adjustright\rin0\lin0\itap0 \f31\fs22\lang1031\langfe1031\cgrid\langnp1031\langfenp1031 {\i\fs18 WebDocument}{\i
|
||||
\par }}\dpx4320\dpy5400\dpxsize1440\dpysize360\dpfillfgcr255\dpfillfgcg255\dpfillfgcb255\dpfillbgcr255\dpfillbgcg255\dpfillbgcb255\dpfillpat0\dplinehollow\dppolygon\dppolycount4\dpptx0\dppty2700\dpptx720\dppty2700\dpptx720\dppty0\dpptx2340\dppty0
|
||||
\dpx4860\dpy1440\dpxsize2340\dpysize2700\dpfillfgcr255\dpfillfgcg255\dpfillfgcb255\dpfillbgcr255\dpfillbgcg255\dpfillbgcb255\dpfillpat0\dplinew15\dplinecor0\dplinecog0\dplinecob0\dptxbx\dptxlrtb{\dptxbxtext\pard\plain
|
||||
\qc \li0\ri0\widctlpar\aspalpha\aspnum\faauto\adjustright\rin0\lin0\itap0 \f31\fs22\lang1031\langfe1031\cgrid\langnp1031\langfenp1031 {\i\fs18 use}{\i
|
||||
\par }}\dpx5217\dpy900\dpxsize1980\dpysize360\dpfillfgcr255\dpfillfgcg255\dpfillfgcb255\dpfillbgcr255\dpfillbgcg255\dpfillbgcb255\dpfillpat0\dplinehollow\dpgroup\dpcount3\dpx1260\dpy360\dpxsize2520\dpysize5580\dppolygon\dppolycount4\dpptx1260\dppty5580
|
||||
\dpptx0\dppty5580\dpptx0\dppty0\dpptx180\dppty0\dpx0\dpy0\dpxsize1038\dpysize5580\dpfillfgcr255\dpfillfgcg255\dpfillfgcb255\dpfillbgcr255\dpfillbgcg255\dpfillbgcb255\dpfillpat0\dplinew15\dplinecor0\dplinecog0\dplinecob0\dppolygon\dppolycount3
|
||||
\dpptx0\dppty540\dpptx1800\dppty540\dpptx1800\dppty0\dpx1038\dpy5040\dpxsize1482\dpysize540\dpfillfgcr255\dpfillfgcg255\dpfillfgcb255\dpfillbgcr255\dpfillbgcg255\dpfillbgcb255\dpfillpat0\dplinew15\dplinecor0\dplinecog0\dplinecob0
|
||||
\dpendgroup\dpx0\dpy0\dpxsize0\dpysize0\dptxbx\dptxlrtb{\dptxbxtext\pard\plain \qc \li0\ri0\widctlpar\aspalpha\aspnum\faauto\adjustright\rin0\lin0\itap0 \f31\fs22\lang1031\langfe1031\cgrid\langnp1031\langfenp1031 {\fs16\lang2057\langfe1031\langnp2057
|
||||
Thread\line Monitor }{\i\fs16\lang2057\langfe1031\langnp2057 (Thread)}{\fs20\lang2057\langfe1031\langnp2057
|
||||
\par }}\dpx7200\dpy2636\dpxsize1260\dpysize784\dpfillfgcr255\dpfillfgcg255\dpfillfgcb255\dpfillbgcr255\dpfillbgcg255\dpfillbgcb255\dpfillpat1\dplinew15\dplinecor0\dplinecog0\dplinecob0\dpline\dpptx0\dppty0\dpptx8460\dppty6840
|
||||
\dpx6660\dpy2276\dpxsize540\dpysize540\dplinew15\dplinecor0\dplinecog0\dplinecob0\dpline\dpptx8460\dppty0\dpptx0\dppty6840\dpx6660\dpy2996\dpxsize540\dpysize0\dplinew15\dplinecor0\dplinecog0\dplinecob0\dpline\dpptx8460\dppty0\dpptx0\dppty6840
|
||||
\dpx6660\dpy3176\dpxsize540\dpysize540\dplinew15\dplinecor0\dplinecog0\dplinecob0\dptxbx\dptxlrtb{\dptxbxtext\pard\plain \qc \li0\ri0\widctlpar\aspalpha\aspnum\faauto\adjustright\rin0\lin0\itap0 \f31\fs22\lang1031\langfe1031\cgrid\langnp1031\langfenp1031
|
||||
{\i\fs18 monitors every 5 seconds}{\i
|
||||
\par }}\dpx5760\dpy2636\dpxsize1080\dpysize900\dpfillfgcr255\dpfillfgcg255\dpfillfgcb255\dpfillbgcr255\dpfillbgcg255\dpfillbgcb255\dpfillpat0\dplinehollow\dptxbx\dptxlrtb{\dptxbxtext\pard\plain
|
||||
\qc \li0\ri0\widctlpar\aspalpha\aspnum\faauto\adjustright\rin0\lin0\itap0 \f31\fs22\lang1031\langfe1031\cgrid\langnp1031\langfenp1031 {log
|
||||
\par }}\dpx7560\dpy3600\dpxsize900\dpysize900\dpfillfgcr255\dpfillfgcg255\dpfillfgcb255\dpfillbgcr255\dpfillbgcg255\dpfillbgcb255\dpfillpat1\dplinew15\dplinecor0\dplinecog0\dplinecob0\dptxbx\dptxlrtb{\dptxbxtext\pard\plain
|
||||
\qc \li0\ri0\widctlpar\aspalpha\aspnum\faauto\adjustright\rin0\lin0\itap0 \f31\fs22\lang1031\langfe1031\cgrid\langnp1031\langfenp1031 {Log
|
||||
\par }}\dpx5940\dpy5940\dpxsize900\dpysize900\dpfillfgcr255\dpfillfgcg255\dpfillfgcb255\dpfillbgcr255\dpfillbgcg255\dpfillbgcb255\dpfillpat1\dplinew15\dplinecor0\dplinecog0\dplinecob0\dpline\dpptx0\dppty0\dpptx8460\dppty6840\dpx4680\dpy5400\dpxsize0\dpysize720
|
||||
\dplinew15\dplinecor0\dplinecog0\dplinecob0\dpline\dpptx0\dppty0\dpptx8460\dppty6840\dpx3960\dpy5400\dpxsize0\dpysize720\dplinew15\dplinecor0\dplinecog0\dplinecob0\dpline\dpptx0\dppty0\dpptx8460\dppty6840\dpx3240\dpy5400\dpxsize0\dpysize720
|
||||
\dplinew15\dplinecor0\dplinecog0\dplinecob0\dptxbx\dptxlrtb{\dptxbxtext\pard\plain \qc \li0\ri0\widctlpar\aspalpha\aspnum\faauto\adjustright\rin0\lin0\itap0 \f31\fs22\lang1031\langfe1031\cgrid\langnp1031\langfenp1031 {Store
|
||||
\par }}\dpx5940\dpy4860\dpxsize900\dpysize900\dpfillfgcr255\dpfillfgcg255\dpfillfgcb255\dpfillbgcr255\dpfillbgcg255\dpfillbgcb255\dpfillpat1\dplinew15\dplinecor0\dplinecog0\dplinecob0\dptxbx\dptxtbrl{\dptxbxtext\pard\plain
|
||||
\ql \li0\ri0\widctlpar\aspalpha\aspnum\faauto\adjustright\rin0\lin0\itap0 \f31\fs22\lang1031\langfe1031\cgrid\langnp1031\langfenp1031 {\fs18 URLMessage}{
|
||||
\par }}\dpx1980\dpy814\dpxsize360\dpysize2340\dpfillfgcr255\dpfillfgcg255\dpfillfgcb255\dpfillbgcr255\dpfillbgcg255\dpfillbgcb255\dpfillpat0\dplinehollow\dptxbx\dptxtbrl{\dptxbxtext\pard\plain
|
||||
\ql \li0\ri0\widctlpar\aspalpha\aspnum\faauto\adjustright\rin0\lin0\itap0 \f31\fs22\lang1031\langfe1031\cgrid\langnp1031\langfenp1031 {\fs18 URLMessage}{
|
||||
\par }}\dpx900\dpy1800\dpxsize360\dpysize2340\dpfillfgcr255\dpfillfgcg255\dpfillfgcb255\dpfillbgcr255\dpfillbgcg255\dpfillbgcb255\dpfillpat0\dplinehollow\dpline\dpptx8460\dppty0\dpptx0\dppty6840\dpx7920\dpy3420\dpxsize0\dpysize180
|
||||
\dplinew15\dplinecor0\dplinecog0\dplinecob0\dpline\dpptx0\dppty0\dpptx8460\dppty6840\dpx5220\dpy6660\dpxsize720\dpysize0\dplinew15\dplinecor0\dplinecog0\dplinecob0\dpline\dpptx8460\dppty0\dpptx0\dppty6840\dpx5220\dpy5580\dpxsize720\dpysize720
|
||||
\dplinew15\dplinecor0\dplinecog0\dplinecob0\dptxbx\dptxlrtb{\dptxbxtext\pard\plain \qc \li0\ri0\widctlpar\aspalpha\aspnum\faauto\adjustright\rin0\lin0\itap0 \f31\fs22\lang1031\langfe1031\cgrid\langnp1031\langfenp1031 {\fs20 Queue}{
|
||||
\par }}\dpx5940\dpy3874\dpxsize900\dpysize900\dpfillfgcr255\dpfillfgcg255\dpfillfgcb255\dpfillbgcr255\dpfillbgcg255\dpfillbgcb255\dpfillpat1\dplinew15\dplinecor0\dplinecog0\dplinecob0\dptxbx\dptxlrtb{\dptxbxtext\pard\plain
|
||||
\qc \li0\ri0\widctlpar\aspalpha\aspnum\faauto\adjustright\rin0\lin0\itap0 \f31\fs22\lang1031\langfe1031\cgrid\langnp1031\langfenp1031 {\fs20 Queue
|
||||
\par }}\dpx0\dpy3334\dpxsize900\dpysize900\dpfillfgcr255\dpfillfgcg255\dpfillfgcb255\dpfillbgcr255\dpfillbgcg255\dpfillbgcb255\dpfillpat1\dplinew15\dplinecor0\dplinecog0\dplinecob0\dpline\dpptx0\dppty0\dpptx8460\dppty6840\dpx5040\dpy4414\dpxsize900\dpysize0
|
||||
\dplinew15\dplinecor0\dplinecog0\dplinecob0\dpline\dpptx0\dppty0\dpptx8460\dppty6840\dpx900\dpy3694\dpxsize540\dpysize0\dplinew15\dplinecor0\dplinecog0\dplinecob0\dptxbx\dptxlrtb{\dptxbxtext\pard\plain
|
||||
\qc \li0\ri0\widctlpar\aspalpha\aspnum\faauto\adjustright\rin0\lin0\itap0 \f31\fs22\lang1031\langfe1031\cgrid\langnp1031\langfenp1031 {Logs
|
||||
\par }}\dpx1800\dpy2254\dpxsize900\dpysize900\dpfillfgcr255\dpfillfgcg255\dpfillfgcb255\dpfillbgcr255\dpfillbgcg255\dpfillbgcb255\dpfillpat1\dplinew15\dplinecor0\dplinecog0\dplinecob0\dpendgroup\dpx0\dpy0\dpxsize0\dpysize0}}}}{\lang2057\langfe1031\langnp2057
|
||||
|
||||
\par
|
||||
\par
|
||||
\par
|
||||
\par
|
||||
\par
|
||||
\par
|
||||
\par
|
||||
\par
|
||||
\par
|
||||
\par
|
||||
\par
|
||||
\par
|
||||
\par
|
||||
\par
|
||||
\par
|
||||
\par
|
||||
\par
|
||||
\par
|
||||
\par
|
||||
\par
|
||||
\par
|
||||
\par
|
||||
\par
|
||||
\par
|
||||
\par
|
||||
\par
|
||||
\par
|
||||
\par The message handler is an implementation of a simple }{\i\lang2057\langfe1031\langnp2057 chain of responsibility}{\lang2057\langfe1031\langnp2057 . Implementations of }{\i\lang2057\langfe1031\langnp2057 Message}{\lang2057\langfe1031\langnp2057
|
||||
are passed down a filter chain. Each of the filters can decide whether to send the message along, change it, or even delete it. In this case, Messages of type URLMessage are used. The message handler runs in its own thread. Thus, a call of }{
|
||||
\i\lang2057\langfe1031\langnp2057 putMessage()}{\lang2057\langfe1031\langnp2057 or }{\i\lang2057\langfe1031\langnp2057 putMessages()}{\lang2057\langfe1031\langnp2057 involves a }{\i\lang2057\langfe1031\langnp2057 producer-consumer-}{
|
||||
\lang2057\langfe1031\langnp2057 like message transfer. The filters themselves run within the message handler thread.
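The filter chain described above can be sketched roughly as follows. This is only an illustration, not LARM's actual code: the names UrlMessage, MessageFilter, runChain and exampleFilters are hypothetical, the two filters are invented for the example, and the producer-consumer hand-over to a separate handler thread is left out so that only the chain itself is shown.

```java
import java.util.ArrayList;
import java.util.List;

class FilterChainSketch {
    /** Hypothetical stand-in for URLMessage; it only carries the URL string. */
    static final class UrlMessage {
        final String url;
        UrlMessage(String url) { this.url = url; }
    }

    /** A filter returns the message (possibly changed), or null to drop it. */
    interface MessageFilter {
        UrlMessage handle(UrlMessage message);
    }

    /** Passes the message down the chain and stops as soon as a filter drops it. */
    static UrlMessage runChain(List<MessageFilter> filters, UrlMessage message) {
        UrlMessage current = message;
        for (MessageFilter filter : filters) {
            if (current == null) {
                return null;
            }
            current = filter.handle(current);
        }
        return current;
    }

    /** Two illustrative filters (assumptions): a length limit and a fragment stripper. */
    static List<MessageFilter> exampleFilters(int maxLength) {
        List<MessageFilter> filters = new ArrayList<>();
        // Drops messages whose URL is longer than maxLength.
        filters.add(m -> m.url.length() > maxLength ? null : m);
        // Changes the message: removes everything from the first '#' onwards.
        filters.add(m -> {
            int hash = m.url.indexOf('#');
            return hash < 0 ? m : new UrlMessage(m.url.substring(0, hash));
        });
        return filters;
    }

    public static void main(String[] args) {
        UrlMessage kept = runChain(exampleFilters(40), new UrlMessage("http://example.org/a#top"));
        System.out.println(kept == null ? "dropped" : kept.url);
    }
}
```

Returning null to signal a dropped message keeps the sketch short; a real implementation might prefer an explicit result type.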
|
||||
\par At the end of the pipeline the Fetcher distributes the incoming messages to its worker threads. They are implemented as a }{\i\lang2057\langfe1031\langnp2057 thread pool}{\lang2057\langfe1031\langnp2057 : Several }{\i\lang2057\langfe1031\langnp2057
|
||||
ServerThreads}{\lang2057\langfe1031\langnp2057 run concurrently and wait for }{\i\lang2057\langfe1031\langnp2057 Tasks}{\lang2057\langfe1031\langnp2057 , which contain the procedure to be executed. If there are more tasks than available threads, the surplus tasks are kept in a queue, which is read whenever a task finishes.
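As a rough illustration of this thread-pool behaviour, the following sketch uses the standard java.util.concurrent executor rather than LARM's own pool classes, which are not reproduced here. The names FetcherPoolSketch and fetchAll are hypothetical, and the task body only builds a string instead of fetching anything.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

class FetcherPoolSketch {
    /** Runs one task per URL on a fixed number of worker threads and collects the results in order. */
    static List<String> fetchAll(List<String> urls, int workerThreads) throws Exception {
        ExecutorService pool = Executors.newFixedThreadPool(workerThreads);
        try {
            List<Future<String>> pending = new ArrayList<>();
            for (String url : urls) {
                // Placeholder task (assumption): a real task would fetch and parse the document.
                Callable<String> task = () -> "fetched " + url;
                // Tasks beyond the number of free threads wait in the pool's internal queue.
                pending.add(pool.submit(task));
            }
            List<String> results = new ArrayList<>();
            for (Future<String> future : pending) {
                results.add(future.get());
            }
            return results;
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) throws Exception {
        System.out.println(fetchAll(Arrays.asList("a", "b", "c"), 2));
    }
}
```

With two worker threads and three tasks, the third task stays queued until one of the threads becomes free.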
|
||||
\par At this point the pipeline pattern is abandoned}{\cs39\lang2057\langfe1031\super\langnp2057 \chftn {\footnote \pard\plain \s38\ql \li0\ri0\widctlpar\aspalpha\aspnum\faauto\adjustright\rin0\lin0\itap0 \f31\fs20\lang1031\langfe1031\cgrid\langnp1031\langfenp1031
|
||||
{\cs39\super \chftn }{\lang2057\langfe1031\langnp2057 probably this will be one of the foremost places to work on}}}{\lang2057\langfe1031\langnp2057 . The }{\i\lang2057\langfe1031\langnp2057 FetcherTask}{\lang2057\langfe1031\langnp2057
|
||||
itself is still quite monolithic. It gets the document, parses it if possible, and stores it into a
|
||||
storage. In the future one might think of additional configurable processing steps within another processing pipeline. I thought about incorporating it into the filter pipeline, but since the filters are passive components and the }{
|
||||
\i\lang2057\langfe1031\langnp2057 FetcherThreads}{\lang2057\langfe1031\langnp2057 are active, this didn\rquote t work.
|
||||
\par {\*\bkmkstart _Toc8477600}{\listtext\pard\plain\s2 \b\fs28\lang2057\langfe1031\langnp2057 \hich\af0\dbch\af0\loch\f0 2.1\tab}}\pard\plain \s2\ql \fi-578\li0\ri0\sb480\sa60\keepn\widctlpar
|
||||
\jclisttab\tx0\aspalpha\aspnum\faauto\ls8\ilvl1\outlinelevel1\adjustright\rin0\lin0\itap0 \b\fs28\lang2057\langfe1031\cgrid\langnp2057\langfenp1031 {Performance{\*\bkmkend _Toc8477600}
|
||||
\par }\pard\plain \ql \li0\ri0\widctlpar\aspalpha\aspnum\faauto\adjustright\rin0\lin0\itap0 \f31\fs22\lang1031\langfe1031\cgrid\langnp1031\langfenp1031 {\lang2057\langfe1031\langnp2057 Performance improved by a factor of about 10-15 compared to the first na\'efve attempts with a pre-built parser and Sun\rquote s network classes, and there is still room for improvement. On a network with about 150 web servers, to which the crawler server was connected by a 100 MBit FDDI connection, I was able to crawl an average of 60 documents per second, or 3.7 MB, after a startup period of 10 minutes. During this startup period, crawling is slower because the number of servers is still small, so the server output limits crawling. There may also be servers that don\rquote t respond. They are excluded from the crawl after a few attempts.
|
||||
\par Overall, performance is affected by many factors: the operating system, the native interface, the Java libraries, the web servers, the number of threads, whether dynamic pages are included in the crawl, etc.
|
||||
\par From a development point of view, speed is determined by the balance between I/O and CPU usage. Both have to be kept at 100%; otherwise one of them becomes the bottleneck. Managing these resources is the central task of a crawler.
|
||||
\par Imagine that only one thread is crawling. This is the worst case, as quickly becomes apparent:
|
||||
\par
|
||||
\par }\trowd \trgaph70\trrh564\trleft-70\trbrdrt\brdrs\brdrw10 \trbrdrl\brdrs\brdrw10 \trbrdrb\brdrs\brdrw10 \trbrdrr\brdrs\brdrw10 \trbrdrh\brdrs\brdrw10 \trbrdrv\brdrs\brdrw10 \trftsWidth1\trautofit1\trpaddl70\trpaddr70\trpaddfl3\trpaddfr3 \clvertalt\clbrdrt
|
||||
\brdrs\brdrw10 \clbrdrl\brdrs\brdrw10 \clbrdrb\brdrs\brdrw10 \clbrdrr\brdrs\brdrw10 \cltxlrtb\clftsWidth3\clwWidth1488 \cellx1418\clvertalt\clbrdrt\brdrs\brdrw10 \clbrdrl\brdrs\brdrw10 \clbrdrb\brdrs\brdrw10 \clbrdrr\brdrs\brdrw10
|
||||
\cltxlrtb\clftsWidth3\clwWidth1488 \cellx2906\clvertalt\clbrdrt\brdrs\brdrw10 \clbrdrl\brdrs\brdrw10 \clbrdrb\brdrs\brdrw10 \clbrdrr\brdrs\brdrw10 \cltxlrtb\clftsWidth3\clwWidth1417 \cellx4323\clvertalt\clbrdrt\brdrs\brdrw10 \clbrdrl\brdrs\brdrw10
|
||||
\clbrdrb\brdrnone \clbrdrr\brdrdash\brdrw10 \cltxlrtb\clftsWidth3\clwWidth1417 \cellx5740\clvertalt\clbrdrt\brdrs\brdrw10 \clbrdrl\brdrdash\brdrw10 \clbrdrb\brdrnone \clbrdrr\brdrdash\brdrw10 \cltxlrtb\clftsWidth3\clwWidth1417 \cellx7157\clvertalt\clbrdrt
|
||||
\brdrs\brdrw10 \clbrdrl\brdrdash\brdrw10 \clbrdrb\brdrnone \clbrdrr\brdrs\brdrw10 \cltxlrtb\clftsWidth3\clwWidth1417 \cellx8574\pard \ql \li0\ri0\widctlpar\intbl\aspalpha\aspnum\faauto\adjustright\rin0\lin0 {\b\fs18\lang1036\langfe1031\langnp1036 Action
|
||||
\cell CPU Usage\cell }{\b\fs18\lang1040\langfe1031\langnp1040 I/O Usage\cell }\pard \qc \li0\ri0\widctlpar\intbl\aspalpha\aspnum\faauto\adjustright\rin0\lin0 {\b\fs18\lang2057\langfe1031\langnp2057 :}{\b\fs18\ul\lang2057\langfe1031\langnp2057 Crawler}{
|
||||
\b\fs18\lang2057\langfe1031\langnp2057 \cell Network}{\b\fs18\ul\lang2057\langfe1031\langnp2057 \cell :Web Server\cell }\pard \ql \li0\ri0\widctlpar\intbl\aspalpha\aspnum\faauto\adjustright\rin0\lin0 {\b\fs18\lang2057\langfe1031\langnp2057 \trowd
|
||||
\trgaph70\trrh564\trleft-70\trbrdrt\brdrs\brdrw10 \trbrdrl\brdrs\brdrw10 \trbrdrb\brdrs\brdrw10 \trbrdrr\brdrs\brdrw10 \trbrdrh\brdrs\brdrw10 \trbrdrv\brdrs\brdrw10 \trftsWidth1\trautofit1\trpaddl70\trpaddr70\trpaddfl3\trpaddfr3 \clvertalt\clbrdrt
|
||||
\brdrs\brdrw10 \clbrdrl\brdrs\brdrw10 \clbrdrb\brdrs\brdrw10 \clbrdrr\brdrs\brdrw10 \cltxlrtb\clftsWidth3\clwWidth1488 \cellx1418\clvertalt\clbrdrt\brdrs\brdrw10 \clbrdrl\brdrs\brdrw10 \clbrdrb\brdrs\brdrw10 \clbrdrr\brdrs\brdrw10
|
||||
\cltxlrtb\clftsWidth3\clwWidth1488 \cellx2906\clvertalt\clbrdrt\brdrs\brdrw10 \clbrdrl\brdrs\brdrw10 \clbrdrb\brdrs\brdrw10 \clbrdrr\brdrs\brdrw10 \cltxlrtb\clftsWidth3\clwWidth1417 \cellx4323\clvertalt\clbrdrt\brdrs\brdrw10 \clbrdrl\brdrs\brdrw10
|
||||
\clbrdrb\brdrnone \clbrdrr\brdrdash\brdrw10 \cltxlrtb\clftsWidth3\clwWidth1417 \cellx5740\clvertalt\clbrdrt\brdrs\brdrw10 \clbrdrl\brdrdash\brdrw10 \clbrdrb\brdrnone \clbrdrr\brdrdash\brdrw10 \cltxlrtb\clftsWidth3\clwWidth1417 \cellx7157\clvertalt\clbrdrt
|
||||
\brdrs\brdrw10 \clbrdrl\brdrdash\brdrw10 \clbrdrb\brdrnone \clbrdrr\brdrs\brdrw10 \cltxlrtb\clftsWidth3\clwWidth1417 \cellx8574\row }\trowd \trgaph70\trrh407\trleft-70\trbrdrt\brdrs\brdrw10 \trbrdrl\brdrs\brdrw10 \trbrdrb\brdrs\brdrw10 \trbrdrr
|
||||
\brdrs\brdrw10 \trbrdrh\brdrs\brdrw10 \trbrdrv\brdrs\brdrw10 \trftsWidth1\trautofit1\trpaddl70\trpaddr70\trpaddfl3\trpaddfr3 \clvertalt\clbrdrt\brdrs\brdrw10 \clbrdrl\brdrs\brdrw10 \clbrdrb\brdrs\brdrw10 \clbrdrr\brdrs\brdrw10
|
||||
\cltxlrtb\clftsWidth3\clwWidth1488 \cellx1418\clvertalt\clbrdrt\brdrs\brdrw10 \clbrdrl\brdrs\brdrw10 \clbrdrb\brdrs\brdrw10 \clbrdrr\brdrs\brdrw10 \cltxlrtb\clftsWidth3\clwWidth1488 \cellx2906\clvertalt\clbrdrt\brdrs\brdrw10 \clbrdrl\brdrs\brdrw10
|
||||
\clbrdrb\brdrs\brdrw10 \clbrdrr\brdrs\brdrw10 \cltxlrtb\clftsWidth3\clwWidth1417 \cellx4323\clvertalt\clbrdrt\brdrnone \clbrdrl\brdrs\brdrw10 \clbrdrb\brdrnone \clbrdrr\brdrdash\brdrw10 \cltxlrtb\clftsWidth3\clwWidth1417 \cellx5740\clvertalt\clbrdrt
|
||||
\brdrnone \clbrdrl\brdrdash\brdrw10 \clbrdrb\brdrnone \clbrdrr\brdrdash\brdrw10 \cltxlrtb\clftsWidth3\clwWidth1417 \cellx7157\clvertalt\clbrdrt\brdrnone \clbrdrl\brdrdash\brdrw10 \clbrdrb\brdrnone \clbrdrr\brdrs\brdrw10 \cltxlrtb\clftsWidth3\clwWidth1417
|
||||
\cellx8574\pard \ql \li0\ri0\widctlpar\intbl\aspalpha\aspnum\faauto\adjustright\rin0\lin0 {\fs18\lang2057\langfe1031\langnp2057 1. Process URL\cell 100%\cell 0%\cell }\pard \qc \li0\ri0\widctlpar\intbl\aspalpha\aspnum\faauto\adjustright\rin0\lin0 {
|
||||
\fs20\lang1024\langfe1024\noproof {\shp{\*\shpinst\shpleft647\shptop381\shpright3559\shpbottom686\shpfhdr0\shpbxcolumn\shpbxignore\shpbypara\shpbyignore\shpwr3\shpwrk0\shpfblwtxt0\shpz4\shplid1042
|
||||
{\sp{\sn shapeType}{\sv 20}}{\sp{\sn fFlipH}{\sv 0}}{\sp{\sn fFlipV}{\sv 0}}{\sp{\sn shapePath}{\sv 4}}{\sp{\sn fFillOK}{\sv 0}}{\sp{\sn fFilled}{\sv 0}}{\sp{\sn lineEndArrowhead}{\sv 1}}{\sp{\sn fArrowheadsOK}{\sv 1}}
|
||||
{\sp{\sn fLayoutInCell}{\sv 1}}}{\shprslt{\*\do\dobxcolumn\dobypara\dodhgt8196\dpline\dpptx0\dppty0\dpptx2912\dppty305\dpx647\dpy381\dpxsize2912\dpysize305\dplinew15\dplinecor0\dplinecog0\dplinecob0}}}
|
||||
{\shp{\*\shpinst\shpleft557\shptop21\shpright737\shpbottom381\shpfhdr0\shpbxcolumn\shpbxignore\shpbypara\shpbyignore\shpwr3\shpwrk0\shpfblwtxt0\shpz2\shplid1040{\sp{\sn shapeType}{\sv 1}}{\sp{\sn fFlipH}{\sv 0}}{\sp{\sn fFlipV}{\sv 0}}
|
||||
{\sp{\sn fLayoutInCell}{\sv 1}}}{\shprslt{\*\do\dobxcolumn\dobypara\dodhgt8194\dprect\dpx557\dpy21\dpxsize180\dpysize360
|
||||
\dpfillfgcr255\dpfillfgcg255\dpfillfgcb255\dpfillbgcr255\dpfillbgcg255\dpfillbgcb255\dpfillpat1\dplinew15\dplinecor0\dplinecog0\dplinecob0}}}
|
||||
{\shp{\*\shpinst\shpleft647\shptop21\shpright647\shpbottom3801\shpfhdr0\shpbxcolumn\shpbxignore\shpbypara\shpbyignore\shpwr3\shpwrk0\shpfblwtxt0\shpz0\shplid1038{\sp{\sn shapeType}{\sv 20}}{\sp{\sn fFlipH}{\sv 0}}{\sp{\sn fFlipV}{\sv 0}}
|
||||
{\sp{\sn shapePath}{\sv 4}}{\sp{\sn fFillOK}{\sv 0}}{\sp{\sn fFilled}{\sv 0}}{\sp{\sn fArrowheadsOK}{\sv 1}}{\sp{\sn fLayoutInCell}{\sv 1}}}{\shprslt{\*\do\dobxcolumn\dobypara\dodhgt8192\dpline\dpptx0\dppty0\dpptx0\dppty3780
|
||||
\dpx647\dpy21\dpxsize0\dpysize3780\dplinew15\dplinecor0\dplinecog0\dplinecob0}}}}{\fs18\lang2057\langfe1031\langnp2057 \cell \cell }{\fs20\lang1024\langfe1024\noproof
|
||||
{\shp{\*\shpinst\shpleft694\shptop31\shpright694\shpbottom3811\shpfhdr0\shpbxcolumn\shpbxignore\shpbypara\shpbyignore\shpwr3\shpwrk0\shpfblwtxt0\shpz1\shplid1039{\sp{\sn shapeType}{\sv 20}}{\sp{\sn fFlipH}{\sv 0}}{\sp{\sn fFlipV}{\sv 0}}
|
||||
{\sp{\sn shapePath}{\sv 4}}{\sp{\sn fFillOK}{\sv 0}}{\sp{\sn fFilled}{\sv 0}}{\sp{\sn fArrowheadsOK}{\sv 1}}{\sp{\sn fLayoutInCell}{\sv 1}}}{\shprslt{\*\do\dobxcolumn\dobypara\dodhgt8193\dpline\dpptx0\dppty0\dpptx0\dppty3780
|
||||
\dpx694\dpy31\dpxsize0\dpysize3780\dplinew15\dplinecor0\dplinecog0\dplinecob0}}}}{\fs18\lang2057\langfe1031\langnp2057 \cell }\pard \ql \li0\ri0\widctlpar\intbl\aspalpha\aspnum\faauto\adjustright\rin0\lin0 {\fs18\lang2057\langfe1031\langnp2057 \trowd
|
||||
\trgaph70\trrh407\trleft-70\trbrdrt\brdrs\brdrw10 \trbrdrl\brdrs\brdrw10 \trbrdrb\brdrs\brdrw10 \trbrdrr\brdrs\brdrw10 \trbrdrh\brdrs\brdrw10 \trbrdrv\brdrs\brdrw10 \trftsWidth1\trautofit1\trpaddl70\trpaddr70\trpaddfl3\trpaddfr3 \clvertalt\clbrdrt
|
||||
\brdrs\brdrw10 \clbrdrl\brdrs\brdrw10 \clbrdrb\brdrs\brdrw10 \clbrdrr\brdrs\brdrw10 \cltxlrtb\clftsWidth3\clwWidth1488 \cellx1418\clvertalt\clbrdrt\brdrs\brdrw10 \clbrdrl\brdrs\brdrw10 \clbrdrb\brdrs\brdrw10 \clbrdrr\brdrs\brdrw10
|
||||
\cltxlrtb\clftsWidth3\clwWidth1488 \cellx2906\clvertalt\clbrdrt\brdrs\brdrw10 \clbrdrl\brdrs\brdrw10 \clbrdrb\brdrs\brdrw10 \clbrdrr\brdrs\brdrw10 \cltxlrtb\clftsWidth3\clwWidth1417 \cellx4323\clvertalt\clbrdrt\brdrnone \clbrdrl\brdrs\brdrw10 \clbrdrb
|
||||
\brdrnone \clbrdrr\brdrdash\brdrw10 \cltxlrtb\clftsWidth3\clwWidth1417 \cellx5740\clvertalt\clbrdrt\brdrnone \clbrdrl\brdrdash\brdrw10 \clbrdrb\brdrnone \clbrdrr\brdrdash\brdrw10 \cltxlrtb\clftsWidth3\clwWidth1417 \cellx7157\clvertalt\clbrdrt\brdrnone
|
||||
\clbrdrl\brdrdash\brdrw10 \clbrdrb\brdrnone \clbrdrr\brdrs\brdrw10 \cltxlrtb\clftsWidth3\clwWidth1417 \cellx8574\row }\trowd \trgaph70\trrh270\trleft-70\trbrdrt\brdrs\brdrw10 \trbrdrl\brdrs\brdrw10 \trbrdrb\brdrs\brdrw10 \trbrdrr\brdrs\brdrw10 \trbrdrh
|
||||
\brdrs\brdrw10 \trbrdrv\brdrs\brdrw10 \trftsWidth1\trautofit1\trpaddl70\trpaddr70\trpaddfl3\trpaddfr3 \clvertalt\clbrdrt\brdrs\brdrw10 \clbrdrl\brdrs\brdrw10 \clbrdrb\brdrs\brdrw10 \clbrdrr\brdrs\brdrw10 \cltxlrtb\clftsWidth3\clwWidth1488 \cellx1418
|
||||
\clvertalt\clbrdrt\brdrs\brdrw10 \clbrdrl\brdrs\brdrw10 \clbrdrb\brdrs\brdrw10 \clbrdrr\brdrs\brdrw10 \cltxlrtb\clftsWidth3\clwWidth1488 \cellx2906\clvertalt\clbrdrt\brdrs\brdrw10 \clbrdrl\brdrs\brdrw10 \clbrdrb\brdrs\brdrw10 \clbrdrr\brdrs\brdrw10
|
||||
\cltxlrtb\clftsWidth3\clwWidth1417 \cellx4323\clvertalt\clbrdrt\brdrnone \clbrdrl\brdrs\brdrw10 \clbrdrb\brdrnone \clbrdrr\brdrdash\brdrw10 \cltxlrtb\clftsWidth3\clwWidth1417 \cellx5740\clvertalt\clbrdrt\brdrnone \clbrdrl\brdrdash\brdrw10 \clbrdrb
|
||||
\brdrnone \clbrdrr\brdrdash\brdrw10 \cltxlrtb\clftsWidth3\clwWidth1417 \cellx7157\clvertalt\clbrdrt\brdrnone \clbrdrl\brdrdash\brdrw10 \clbrdrb\brdrnone \clbrdrr\brdrs\brdrw10 \cltxlrtb\clftsWidth3\clwWidth1417 \cellx8574\pard
|
||||
\ql \li0\ri0\widctlpar\intbl\aspalpha\aspnum\faauto\adjustright\rin0\lin0 {\fs18\lang2057\langfe1031\langnp2057 2. Send Request\cell <10%?\cell <100%\cell }\pard \qc \li0\ri0\widctlpar\intbl\aspalpha\aspnum\faauto\adjustright\rin0\lin0 {
|
||||
\fs18\lang2057\langfe1031\langnp2057 \cell \cell \cell }\pard \ql \li0\ri0\widctlpar\intbl\aspalpha\aspnum\faauto\adjustright\rin0\lin0 {\fs18\lang2057\langfe1031\langnp2057 \trowd \trgaph70\trrh270\trleft-70\trbrdrt\brdrs\brdrw10 \trbrdrl\brdrs\brdrw10
|
||||
\trbrdrb\brdrs\brdrw10 \trbrdrr\brdrs\brdrw10 \trbrdrh\brdrs\brdrw10 \trbrdrv\brdrs\brdrw10 \trftsWidth1\trautofit1\trpaddl70\trpaddr70\trpaddfl3\trpaddfr3 \clvertalt\clbrdrt\brdrs\brdrw10 \clbrdrl\brdrs\brdrw10 \clbrdrb\brdrs\brdrw10 \clbrdrr
|
||||
\brdrs\brdrw10 \cltxlrtb\clftsWidth3\clwWidth1488 \cellx1418\clvertalt\clbrdrt\brdrs\brdrw10 \clbrdrl\brdrs\brdrw10 \clbrdrb\brdrs\brdrw10 \clbrdrr\brdrs\brdrw10 \cltxlrtb\clftsWidth3\clwWidth1488 \cellx2906\clvertalt\clbrdrt\brdrs\brdrw10 \clbrdrl
|
||||
\brdrs\brdrw10 \clbrdrb\brdrs\brdrw10 \clbrdrr\brdrs\brdrw10 \cltxlrtb\clftsWidth3\clwWidth1417 \cellx4323\clvertalt\clbrdrt\brdrnone \clbrdrl\brdrs\brdrw10 \clbrdrb\brdrnone \clbrdrr\brdrdash\brdrw10 \cltxlrtb\clftsWidth3\clwWidth1417 \cellx5740
|
||||
\clvertalt\clbrdrt\brdrnone \clbrdrl\brdrdash\brdrw10 \clbrdrb\brdrnone \clbrdrr\brdrdash\brdrw10 \cltxlrtb\clftsWidth3\clwWidth1417 \cellx7157\clvertalt\clbrdrt\brdrnone \clbrdrl\brdrdash\brdrw10 \clbrdrb\brdrnone \clbrdrr\brdrs\brdrw10
|
||||
\cltxlrtb\clftsWidth3\clwWidth1417 \cellx8574\row }\trowd \trgaph70\trrh702\trleft-70\trbrdrt\brdrs\brdrw10 \trbrdrl\brdrs\brdrw10 \trbrdrb\brdrs\brdrw10 \trbrdrr\brdrs\brdrw10 \trbrdrh\brdrs\brdrw10 \trbrdrv\brdrs\brdrw10
|
||||
\trftsWidth1\trautofit1\trpaddl70\trpaddr70\trpaddfl3\trpaddfr3 \clvertalt\clbrdrt\brdrs\brdrw10 \clbrdrl\brdrs\brdrw10 \clbrdrb\brdrs\brdrw10 \clbrdrr\brdrs\brdrw10 \cltxlrtb\clftsWidth3\clwWidth1488 \cellx1418\clvertalt\clbrdrt\brdrs\brdrw10 \clbrdrl
|
||||
\brdrs\brdrw10 \clbrdrb\brdrs\brdrw10 \clbrdrr\brdrs\brdrw10 \cltxlrtb\clftsWidth3\clwWidth1488 \cellx2906\clvertalt\clbrdrt\brdrs\brdrw10 \clbrdrl\brdrs\brdrw10 \clbrdrb\brdrs\brdrw10 \clbrdrr\brdrs\brdrw10 \cltxlrtb\clftsWidth3\clwWidth1417 \cellx4323
|
||||
\clvertalt\clbrdrt\brdrnone \clbrdrl\brdrs\brdrw10 \clbrdrb\brdrnone \clbrdrr\brdrdash\brdrw10 \cltxlrtb\clftsWidth3\clwWidth1417 \cellx5740\clvertalt\clbrdrt\brdrnone \clbrdrl\brdrdash\brdrw10 \clbrdrb\brdrnone \clbrdrr\brdrdash\brdrw10
|
||||
\cltxlrtb\clftsWidth3\clwWidth1417 \cellx7157\clvertalt\clbrdrt\brdrnone \clbrdrl\brdrdash\brdrw10 \clbrdrb\brdrnone \clbrdrr\brdrs\brdrw10 \cltxlrtb\clftsWidth3\clwWidth1417 \cellx8574\pard
|
||||
\ql \li0\ri0\widctlpar\intbl\aspalpha\aspnum\faauto\adjustright\rin0\lin0 {\fs18\lang2057\langfe1031\langnp2057 3. Wait\cell 0%\cell 0%\cell }\pard \qc \li0\ri0\widctlpar\intbl\aspalpha\aspnum\faauto\adjustright\rin0\lin0 {
|
||||
\fs18\lang2057\langfe1031\langnp2057 \cell \cell }{\fs20\lang1024\langfe1024\noproof {\shp{\*\shpinst\shpleft598\shptop2\shpright847\shpbottom723\shpfhdr0\shpbxcolumn\shpbxignore\shpbypara\shpbyignore\shpwr3\shpwrk0\shpfblwtxt0\shpz3\shplid1041
|
||||
{\sp{\sn shapeType}{\sv 1}}{\sp{\sn fFlipH}{\sv 0}}{\sp{\sn fFlipV}{\sv 0}}{\sp{\sn fLayoutInCell}{\sv 1}}}{\shprslt{\*\do\dobxcolumn\dobypara\dodhgt8195\dprect\dpx598\dpy2\dpxsize249\dpysize721
|
||||
\dpfillfgcr255\dpfillfgcg255\dpfillfgcb255\dpfillbgcr255\dpfillbgcg255\dpfillbgcb255\dpfillpat1\dplinew15\dplinecor0\dplinecog0\dplinecob0}}}}{\fs18\lang2057\langfe1031\langnp2057 \cell }\pard
|
||||
\ql \li0\ri0\widctlpar\intbl\aspalpha\aspnum\faauto\adjustright\rin0\lin0 {\fs18\lang2057\langfe1031\langnp2057 \trowd \trgaph70\trrh702\trleft-70\trbrdrt\brdrs\brdrw10 \trbrdrl\brdrs\brdrw10 \trbrdrb\brdrs\brdrw10 \trbrdrr\brdrs\brdrw10 \trbrdrh
|
||||
\brdrs\brdrw10 \trbrdrv\brdrs\brdrw10 \trftsWidth1\trautofit1\trpaddl70\trpaddr70\trpaddfl3\trpaddfr3 \clvertalt\clbrdrt\brdrs\brdrw10 \clbrdrl\brdrs\brdrw10 \clbrdrb\brdrs\brdrw10 \clbrdrr\brdrs\brdrw10 \cltxlrtb\clftsWidth3\clwWidth1488 \cellx1418
|
||||
\clvertalt\clbrdrt\brdrs\brdrw10 \clbrdrl\brdrs\brdrw10 \clbrdrb\brdrs\brdrw10 \clbrdrr\brdrs\brdrw10 \cltxlrtb\clftsWidth3\clwWidth1488 \cellx2906\clvertalt\clbrdrt\brdrs\brdrw10 \clbrdrl\brdrs\brdrw10 \clbrdrb\brdrs\brdrw10 \clbrdrr\brdrs\brdrw10
|
||||
\cltxlrtb\clftsWidth3\clwWidth1417 \cellx4323\clvertalt\clbrdrt\brdrnone \clbrdrl\brdrs\brdrw10 \clbrdrb\brdrnone \clbrdrr\brdrdash\brdrw10 \cltxlrtb\clftsWidth3\clwWidth1417 \cellx5740\clvertalt\clbrdrt\brdrnone \clbrdrl\brdrdash\brdrw10 \clbrdrb
|
||||
\brdrnone \clbrdrr\brdrdash\brdrw10 \cltxlrtb\clftsWidth3\clwWidth1417 \cellx7157\clvertalt\clbrdrt\brdrnone \clbrdrl\brdrdash\brdrw10 \clbrdrb\brdrnone \clbrdrr\brdrs\brdrw10 \cltxlrtb\clftsWidth3\clwWidth1417 \cellx8574\row }\trowd
|
||||
\trgaph70\trrh1264\trleft-70\trbrdrt\brdrs\brdrw10 \trbrdrl\brdrs\brdrw10 \trbrdrb\brdrs\brdrw10 \trbrdrr\brdrs\brdrw10 \trbrdrh\brdrs\brdrw10 \trbrdrv\brdrs\brdrw10 \trftsWidth1\trautofit1\trpaddl70\trpaddr70\trpaddfl3\trpaddfr3 \clvertalt\clbrdrt
|
||||
\brdrs\brdrw10 \clbrdrl\brdrs\brdrw10 \clbrdrb\brdrs\brdrw10 \clbrdrr\brdrs\brdrw10 \cltxlrtb\clftsWidth3\clwWidth1488 \cellx1418\clvertalt\clbrdrt\brdrs\brdrw10 \clbrdrl\brdrs\brdrw10 \clbrdrb\brdrs\brdrw10 \clbrdrr\brdrs\brdrw10
|
||||
\cltxlrtb\clftsWidth3\clwWidth1488 \cellx2906\clvertalt\clbrdrt\brdrs\brdrw10 \clbrdrl\brdrs\brdrw10 \clbrdrb\brdrs\brdrw10 \clbrdrr\brdrs\brdrw10 \cltxlrtb\clftsWidth3\clwWidth1417 \cellx4323\clvertalt\clbrdrt\brdrnone \clbrdrl\brdrs\brdrw10 \clbrdrb
|
||||
\brdrnone \clbrdrr\brdrdash\brdrw10 \cltxlrtb\clftsWidth3\clwWidth1417 \cellx5740\clvertalt\clbrdrt\brdrnone \clbrdrl\brdrdash\brdrw10 \clbrdrb\brdrnone \clbrdrr\brdrdash\brdrw10 \cltxlrtb\clftsWidth3\clwWidth1417 \cellx7157\clvertalt\clbrdrt\brdrnone
|
||||
\clbrdrl\brdrdash\brdrw10 \clbrdrb\brdrnone \clbrdrr\brdrs\brdrw10 \cltxlrtb\clftsWidth3\clwWidth1417 \cellx8574\pard \ql \li0\ri0\widctlpar\intbl\aspalpha\aspnum\faauto\adjustright\rin0\lin0 {\fs18\lang2057\langfe1031\langnp2057 4. Receive\cell <10%?
|
||||
\cell <100%\cell }\pard \qc \li0\ri0\widctlpar\intbl\aspalpha\aspnum\faauto\adjustright\rin0\lin0 {\fs20\lang1024\langfe1024\noproof
|
||||
{\shp{\*\shpinst\shpleft652\shptop0\shpright3532\shpbottom1274\shpfhdr0\shpbxcolumn\shpbxignore\shpbypara\shpbyignore\shpwr3\shpwrk0\shpfblwtxt0\shpz5\shplid1043{\sp{\sn shapeType}{\sv 20}}{\sp{\sn fFlipH}{\sv 1}}{\sp{\sn fFlipV}{\sv 0}}
|
||||
{\sp{\sn shapePath}{\sv 4}}{\sp{\sn fFillOK}{\sv 0}}{\sp{\sn fFilled}{\sv 0}}{\sp{\sn lineEndArrowhead}{\sv 1}}{\sp{\sn fArrowheadsOK}{\sv 1}}{\sp{\sn fLayoutInCell}{\sv 1}}}{\shprslt{\*\do\dobxcolumn\dobypara\dodhgt8197
|
||||
\dpline\dpptx2880\dppty0\dpptx0\dppty1274\dpx652\dpy0\dpxsize2880\dpysize1274\dplinew15\dplinecor0\dplinecog0\dplinecob0}}}}{\fs18\lang2057\langfe1031\langnp2057 \cell \cell \cell }\pard
|
||||
\ql \li0\ri0\widctlpar\intbl\aspalpha\aspnum\faauto\adjustright\rin0\lin0 {\fs18\lang2057\langfe1031\langnp2057 \trowd \trgaph70\trrh1264\trleft-70\trbrdrt\brdrs\brdrw10 \trbrdrl\brdrs\brdrw10 \trbrdrb\brdrs\brdrw10 \trbrdrr\brdrs\brdrw10 \trbrdrh
|
||||
\brdrs\brdrw10 \trbrdrv\brdrs\brdrw10 \trftsWidth1\trautofit1\trpaddl70\trpaddr70\trpaddfl3\trpaddfr3 \clvertalt\clbrdrt\brdrs\brdrw10 \clbrdrl\brdrs\brdrw10 \clbrdrb\brdrs\brdrw10 \clbrdrr\brdrs\brdrw10 \cltxlrtb\clftsWidth3\clwWidth1488 \cellx1418
|
||||
\clvertalt\clbrdrt\brdrs\brdrw10 \clbrdrl\brdrs\brdrw10 \clbrdrb\brdrs\brdrw10 \clbrdrr\brdrs\brdrw10 \cltxlrtb\clftsWidth3\clwWidth1488 \cellx2906\clvertalt\clbrdrt\brdrs\brdrw10 \clbrdrl\brdrs\brdrw10 \clbrdrb\brdrs\brdrw10 \clbrdrr\brdrs\brdrw10
|
||||
\cltxlrtb\clftsWidth3\clwWidth1417 \cellx4323\clvertalt\clbrdrt\brdrnone \clbrdrl\brdrs\brdrw10 \clbrdrb\brdrnone \clbrdrr\brdrdash\brdrw10 \cltxlrtb\clftsWidth3\clwWidth1417 \cellx5740\clvertalt\clbrdrt\brdrnone \clbrdrl\brdrdash\brdrw10 \clbrdrb
|
||||
\brdrnone \clbrdrr\brdrdash\brdrw10 \cltxlrtb\clftsWidth3\clwWidth1417 \cellx7157\clvertalt\clbrdrt\brdrnone \clbrdrl\brdrdash\brdrw10 \clbrdrb\brdrnone \clbrdrr\brdrs\brdrw10 \cltxlrtb\clftsWidth3\clwWidth1417 \cellx8574\row }\trowd
|
||||
\trgaph70\trrh976\trleft-70\trbrdrt\brdrs\brdrw10 \trbrdrl\brdrs\brdrw10 \trbrdrb\brdrs\brdrw10 \trbrdrr\brdrs\brdrw10 \trbrdrh\brdrs\brdrw10 \trbrdrv\brdrs\brdrw10 \trftsWidth1\trautofit1\trpaddl70\trpaddr70\trpaddfl3\trpaddfr3 \clvertalt\clbrdrt
|
||||
\brdrs\brdrw10 \clbrdrl\brdrs\brdrw10 \clbrdrb\brdrs\brdrw10 \clbrdrr\brdrs\brdrw10 \cltxlrtb\clftsWidth3\clwWidth1488 \cellx1418\clvertalt\clbrdrt\brdrs\brdrw10 \clbrdrl\brdrs\brdrw10 \clbrdrb\brdrs\brdrw10 \clbrdrr\brdrs\brdrw10
|
||||
\cltxlrtb\clftsWidth3\clwWidth1488 \cellx2906\clvertalt\clbrdrt\brdrs\brdrw10 \clbrdrl\brdrs\brdrw10 \clbrdrb\brdrs\brdrw10 \clbrdrr\brdrs\brdrw10 \cltxlrtb\clftsWidth3\clwWidth1417 \cellx4323\clvertalt\clbrdrt\brdrnone \clbrdrl\brdrs\brdrw10 \clbrdrb
|
||||
\brdrs\brdrw10 \clbrdrr\brdrdash\brdrw10 \cltxlrtb\clftsWidth3\clwWidth1417 \cellx5740\clvertalt\clbrdrt\brdrnone \clbrdrl\brdrdash\brdrw10 \clbrdrb\brdrs\brdrw10 \clbrdrr\brdrdash\brdrw10 \cltxlrtb\clftsWidth3\clwWidth1417 \cellx7157\clvertalt\clbrdrt
|
||||
\brdrnone \clbrdrl\brdrdash\brdrw10 \clbrdrb\brdrs\brdrw10 \clbrdrr\brdrs\brdrw10 \cltxlrtb\clftsWidth3\clwWidth1417 \cellx8574\pard \ql \li0\ri0\widctlpar\intbl\aspalpha\aspnum\faauto\adjustright\rin0\lin0 {\fs18\lang2057\langfe1031\langnp2057
|
||||
Process Doc.\cell 100%\cell 0%\cell }\pard \qc \li0\ri0\widctlpar\intbl\aspalpha\aspnum\faauto\adjustright\rin0\lin0 {\fs20\lang1024\langfe1024\noproof
|
||||
{\shp{\*\shpinst\shpleft599\shptop34\shpright807\shpbottom969\shpfhdr0\shpbxcolumn\shpbxignore\shpbypara\shpbyignore\shpwr3\shpwrk0\shpfblwtxt0\shpz6\shplid1044{\sp{\sn shapeType}{\sv 1}}{\sp{\sn fFlipH}{\sv 0}}{\sp{\sn fFlipV}{\sv 0}}
|
||||
{\sp{\sn fLayoutInCell}{\sv 1}}}{\shprslt{\*\do\dobxcolumn\dobypara\dodhgt8198\dprect\dpx599\dpy34\dpxsize208\dpysize935
|
||||
\dpfillfgcr255\dpfillfgcg255\dpfillfgcb255\dpfillbgcr255\dpfillbgcg255\dpfillbgcb255\dpfillpat1\dplinew15\dplinecor0\dplinecog0\dplinecob0}}}}{\fs18\lang2057\langfe1031\langnp2057 \cell \cell \cell }\pard
|
||||
\ql \li0\ri0\widctlpar\intbl\aspalpha\aspnum\faauto\adjustright\rin0\lin0 {\fs18\lang2057\langfe1031\langnp2057 \trowd \trgaph70\trrh976\trleft-70\trbrdrt\brdrs\brdrw10 \trbrdrl\brdrs\brdrw10 \trbrdrb\brdrs\brdrw10 \trbrdrr\brdrs\brdrw10 \trbrdrh
|
||||
\brdrs\brdrw10 \trbrdrv\brdrs\brdrw10 \trftsWidth1\trautofit1\trpaddl70\trpaddr70\trpaddfl3\trpaddfr3 \clvertalt\clbrdrt\brdrs\brdrw10 \clbrdrl\brdrs\brdrw10 \clbrdrb\brdrs\brdrw10 \clbrdrr\brdrs\brdrw10 \cltxlrtb\clftsWidth3\clwWidth1488 \cellx1418
|
||||
\clvertalt\clbrdrt\brdrs\brdrw10 \clbrdrl\brdrs\brdrw10 \clbrdrb\brdrs\brdrw10 \clbrdrr\brdrs\brdrw10 \cltxlrtb\clftsWidth3\clwWidth1488 \cellx2906\clvertalt\clbrdrt\brdrs\brdrw10 \clbrdrl\brdrs\brdrw10 \clbrdrb\brdrs\brdrw10 \clbrdrr\brdrs\brdrw10
|
||||
\cltxlrtb\clftsWidth3\clwWidth1417 \cellx4323\clvertalt\clbrdrt\brdrnone \clbrdrl\brdrs\brdrw10 \clbrdrb\brdrs\brdrw10 \clbrdrr\brdrdash\brdrw10 \cltxlrtb\clftsWidth3\clwWidth1417 \cellx5740\clvertalt\clbrdrt\brdrnone \clbrdrl\brdrdash\brdrw10 \clbrdrb
|
||||
\brdrs\brdrw10 \clbrdrr\brdrdash\brdrw10 \cltxlrtb\clftsWidth3\clwWidth1417 \cellx7157\clvertalt\clbrdrt\brdrnone \clbrdrl\brdrdash\brdrw10 \clbrdrb\brdrs\brdrw10 \clbrdrr\brdrs\brdrw10 \cltxlrtb\clftsWidth3\clwWidth1417 \cellx8574\row }\pard
|
||||
\ql \li0\ri0\widctlpar\aspalpha\aspnum\faauto\adjustright\rin0\lin0\itap0 {\lang2057\langfe1031\langnp2057
|
||||
\par The diagram to the right resembles a UML sequence diagram, except that it stresses the time that a message needs to traverse the network.
|
||||
\par 1. The URL is processed. That\rquote s the filter part described above.
|
||||
\par 2. The request is sent. It goes through the different network layers of the crawler server. A TCP/IP connection is established. Several packets are sent back and forth. Then the crawler waits until the web server proc
|
||||
esses the request, looks up the file or renders the page (which can take several seconds or even minutes), then sends the file to the crawler.
|
||||
\par 3. The crawler receives packet after packet and combines them into a file. The data is probably copied through several buffers until it is complete. This takes some CPU time, but mostly the crawler waits for the next packet to arrive. The network transfer itself is also affected by many factors, e.g. the speed of the web server, acknowledgement messages, retransmitted packets etc., so 100% network utilization will almost never be reached.
|
||||
\par 4. The document is processed, which will take up the whole CPU. The network will be idle at that time.
|
||||
\par The storage process, which by itself uses CPU and disk I/O resources, was left out here. That process will be very similar, although the traversal will be faster.
|
||||
\par As you can see, neither CPU nor I/O is used most of the time; each waits for the other (or the network) to complete. This is why single-threaded web crawlers (wget, for example) tend to be very slow. The slowest component always becomes the bottleneck.
|
||||
\par Two strategies can be followed to make this situation better:
|
||||
\par {\listtext\pard\plain\f31\fs22\lang2057\langfe1031\langnp2057 \hich\af31\dbch\af0\loch\f31 1.\tab}}\pard \ql \fi-360\li720\ri0\widctlpar\jclisttab\tx720\aspalpha\aspnum\faauto\ls16\adjustright\rin0\lin720\itap0 {\lang2057\langfe1031\langnp2057
|
||||
use asynchronous I/O
|
||||
\par {\listtext\pard\plain\f31\fs22\lang2057\langfe1031\langnp2057 \hich\af31\dbch\af0\loch\f31 2.\tab}use several threads
|
||||
\par }\pard \ql \li0\ri0\widctlpar\aspalpha\aspnum\faauto\adjustright\rin0\lin0\itap0 {\lang2057\langfe1031\langnp2057 Asynchronous I/O means that I/O requests are sent, and the crawler then continues to process documents it has already crawled.
|
||||
\par Actually I haven\rquote t seen an implementation of this technique; asynchronous I/O was not available in Java until version 1.4. Its advantage would be that it avoids thread handling, which is an expensive process in terms of CPU and memory usage. Threads are resources and, thus, limited. I heard that application server developers wanted asynchronous I/O to be able to cope with hundreds of simultaneous requests without spawning an extra thread for each of them. This may be a solution in the future, but from what I know about it today, it will not be necessary.
|
||||
\par In Java, this problem is usually solved with several threads. If many threads are used, chances are good that at any given moment at least one thread is in each of the states above, which means that both CPU and I/O are used to their maximum.
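\par The effect of overlapping the waiting times can be sketched as follows. This is only an illustration under stated assumptions: the class and method names are hypothetical, the network fetch is simulated with a sleep, and java.util.concurrent is used for brevity although it is newer than the Java version this document refers to.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

// Hypothetical sketch: none of these names come from the crawler's sources.
final class OverlappedFetchSketch {

    // Stand-in for a network fetch: mostly waiting, very little CPU work.
    static String simulatedFetch(String url, long waitMillis) {
        try {
            Thread.sleep(waitMillis);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        return "content of " + url;
    }

    // Fetches all URLs with a fixed number of worker threads and returns
    // the results in the order of the input list.
    static List<String> fetchAll(List<String> urls, int threads, long waitMillis) {
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        try {
            List<Future<String>> pending = new ArrayList<>();
            for (String url : urls) {
                Callable<String> task = () -> simulatedFetch(url, waitMillis);
                pending.add(pool.submit(task));
            }
            List<String> results = new ArrayList<>();
            for (Future<String> f : pending) {
                results.add(f.get()); // blocks until this fetch has finished
            }
            return results;
        } catch (Exception e) {
            throw new IllegalStateException("fetch failed", e);
        } finally {
            pool.shutdown();
        }
    }
}
```

With as many threads as URLs, the simulated waits overlap, so the total time is roughly that of one fetch instead of the sum of all fetches.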
|
||||
|
||||
\par The problem with this is that multithreaded programming is considered to be one of the most difficult areas in computer science. But given the simple linear structure of web crawlers, it is not very hard to avoid race conditions or deadlocks. You always run into problems when threads are supposed to access shared resources, though. Don\rquote t touch this until you have read the standard literature and have made at least 10 mistakes (and solved them!)}{\cs39\lang2057\langfe1031\super\langnp2057 \chftn {\footnote \pard\plain
|
||||
\s38\ql \li0\ri0\widctlpar\aspalpha\aspnum\faauto\adjustright\rin0\lin0\itap0 \f31\fs20\lang1031\langfe1031\cgrid\langnp1031\langfenp1031 {\cs39\super \chftn }{\lang2057\langfe1031\langnp2057
|
||||
see for example Magee, Kramer: Concurrency. State Models and Java Programs. Wiley 1999; Lea, Doug: Concurrent Programming in Java, Second Edition. Design Principles and Patterns. Addison-Wesley 2000}}}{\lang2057\langfe1031\langnp2057 .
|
||||
\par Multithreading doesn\rquote t come without a cost, however. First, there is the cost of thread scheduling. I don\rquote t have numbers for that in Java, but I suppose that this should not be very expensive. Mutexes, however, can affect the whole program a lot}{
|
||||
\cs39\lang2057\langfe1031\super\langnp2057 \chftn {\footnote \pard\plain \s38\ql \li0\ri0\widctlpar\aspalpha\aspnum\faauto\adjustright\rin0\lin0\itap0 \f31\fs20\lang1031\langfe1031\cgrid\langnp1031\langfenp1031 {\cs39\super \chftn }{
|
||||
\lang2057\langfe1031\langnp2057 the sequential part of a parallel program has a massive effect on the maximum speed gain of parallelization. See, e.g., \'93Amdahl\rquote s law\'94
|
||||
(I hope this can be transferred to a single-processor, multithreaded system) in Amdahl, G.: The validity of the single processor approach to achieving large
|
||||
\par scale computing capabilities. In: AFIPS conference proceedings, Spring Joint Computing Conference, Issue 30, pp. 483-485, 1967. Cited by Pizka, Markus: }{\lang2057\langfe1031\langnp2057 Integrated Management of Extensible Distributed Systems}{
|
||||
\i\lang2057\langfe1031\langnp2057 }{\lang2057\langfe1031\langnp2057 (Ph.D. thesis), online at }{\field{\*\fldinst {\lang2057\langfe1031\langnp2057 HYPERLINK "http://wwwbroy.in.tum.de/~pizka/dissertation.pdf" }{\lang2057\langfe1031\langnp2057
|
||||
{\*\datafield
|
||||
00d0c9ea79f9bace118c8200aa004ba90b02000000170000003100000068007400740070003a002f002f00770077007700620072006f0079002e0069006e002e00740075006d002e00640065002f007e00700069007a006b0061002f0064006900730073006500720074006100740069006f006e002e007000640066000000
|
||||
e0c9ea79f9bace118c8200aa004ba90b6200000068007400740070003a002f002f00770077007700620072006f0079002e0069006e002e00740075006d002e00640065002f007e00700069007a006b0061002f0064006900730073006500720074006100740069006f006e002e007000640066000000}}}{\fldrslt {
|
||||
\cs41\ul\cf2\lang2057\langfe1031\langnp2057 http://wwwbroy.in.tum.de/~pizka/dissertation.pdf}}}{\lang2057\langfe1031\langnp2057 (in German)}}}{\lang2057\langfe1031\langnp2057
|
||||
. I found that they should be avoided wherever possible. In a crawler, a mutex is used, for example, when a new URL is passed to the thread, or when the fetched documents are supposed to be stored linearly, one after the other.
|
||||
\par For example, the tasks used to insert a new URL into the global message handler each time a new URL was found in the document. I was able to speed this up considerably by collecting the URLs locally and inserting them only once per document. This could probably be improved even further if each task comprised several documents that are fetched one after the other and then stored together.
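\par A minimal sketch of this batching idea, assuming a shared queue guarded by a single lock. The names SharedUrlQueue and DocumentTask are hypothetical and do not appear in the crawler; the counter exists only to make the number of lock acquisitions visible.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for the global message handler.
final class SharedUrlQueue {
    private final List<String> urls = new ArrayList<>();
    private int lockAcquisitions = 0;

    // One lock acquisition per URL: the slow variant.
    synchronized void put(String url) {
        lockAcquisitions++;
        urls.add(url);
    }

    // One lock acquisition per document: the whole batch goes in at once.
    synchronized void putAll(List<String> batch) {
        lockAcquisitions++;
        urls.addAll(batch);
    }

    synchronized int size() { return urls.size(); }

    synchronized int lockAcquisitions() { return lockAcquisitions; }
}

final class DocumentTask {
    // Collects the links in a list that is private to this task,
    // then hands them over to the shared queue in a single call.
    static void processDocument(List<String> linksInDocument, SharedUrlQueue queue) {
        List<String> local = new ArrayList<>(linksInDocument.size());
        for (String link : linksInDocument) {
            local.add(link); // no synchronization needed here
        }
        queue.putAll(local);
    }
}
```

A document with three links then costs one lock acquisition instead of three.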
|
||||
\par Nonetheless, keeping the right balance between the two resources is a big concern. At the moment, the number of threads and the number of processing steps are static and are only optimised by trial and error. Few hosts, slow network -> few threads. Slow CPU -> few processing steps. Many hosts, fast network -> many threads. These heuristics will probably do well, but I wonder whether these figures could also be fine-tuned dynamically at runtime.
|
||||
\par Another issue that was optimised was very fine-grained method calls. For example, the original implementation of the HTML parser used to call the read() method for each character. This call probably had to traverse several }{
|
||||
\i\lang2057\langfe1031\langnp2057 Decorators}{\lang2057\langfe1031\langnp2057 until it got to a \endash synchronized call. That\rquote
|
||||
s why the CharArrayReader was replaced by a SimpleCharArrayReader, because only one thread works on a document at a time.
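\par The idea behind such a replacement can be sketched like this. The class below is a hypothetical illustration, not the actual SimpleCharArrayReader; it simply omits the lock that java.io.CharArrayReader takes in read(), which is safe only as long as a single thread uses the instance.

```java
// Hypothetical sketch of an unsynchronized reader over a char array.
// It must only be used by one thread at a time.
final class UnsyncCharArrayReader {
    private final char[] buf;
    private int pos = 0;

    UnsyncCharArrayReader(char[] buf) {
        this.buf = buf;
    }

    // Returns the next character, or -1 at the end of the buffer.
    // No lock is taken and no decorator is traversed.
    int read() {
        return pos < buf.length ? buf[pos++] : -1;
    }
}
```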
|
||||
\par These issues can only be tracked down with special tools such as profilers. The work is worth it, because it allows one to work on the 20% of the code that costs 80% of the time.
|
||||
\par {\*\bkmkstart _Toc8477601}{\listtext\pard\plain\s2 \b\fs28\lang2057\langfe1031\langnp2057 \hich\af0\dbch\af0\loch\f0 2.2\tab}}\pard\plain \s2\ql \fi-578\li0\ri0\sb480\sa60\keepn\widctlpar
|
||||
\jclisttab\tx0\aspalpha\aspnum\faauto\ls8\ilvl1\outlinelevel1\adjustright\rin0\lin0\itap0 \b\fs28\lang2057\langfe1031\cgrid\langnp2057\langfenp1031 {Memory Usage{\*\bkmkend _Toc8477601}
|
||||
\par }\pard\plain \ql \li0\ri0\widctlpar\aspalpha\aspnum\faauto\adjustright\rin0\lin0\itap0 \f31\fs22\lang1031\langfe1031\cgrid\langnp1031\langfenp1031 {\lang2057\langfe1031\langnp2057 One \'93web crawler law\'94 could be defined as:
|
||||
\par }\pard \qc \li0\ri0\widctlpar\aspalpha\aspnum\faauto\adjustright\rin0\lin0\itap0 {\i\lang2057\langfe1031\langnp2057 What can get infinite, will get infinite. Eventually. Very soon.
|
||||
\par }\pard\plain \s26\ql \li0\ri0\sl360\slmult1\widctlpar\aspalpha\aspnum\faauto\adjustright\rin0\lin0\itap0 \f31\fs22\lang1024\langfe1024\cgrid\noproof\langnp2057\langfenp1031 {
|
||||
A major task during development was to keep memory usage low, but a lot of work still needs to be done here. Most of the optimizations incorporated so far move the problem from main memory to the hard disk, which doesn\rquote t solve the problem.
|
||||
\par }\pard\plain \ql \li0\ri0\widctlpar\aspalpha\aspnum\faauto\adjustright\rin0\lin0\itap0 \f31\fs22\lang1031\langfe1031\cgrid\langnp1031\langfenp1031 {\lang2057\langfe1031\langnp2057 Here are some means that were used:
|
||||
\par {\listtext\pard\plain\fs22\lang2057\langfe1031\langnp2057 \hich\af0\dbch\af0\loch\f0 -\tab}}\pard \ql \fi-360\li720\ri0\widctlpar\jclisttab\tx720\aspalpha\aspnum\faauto\ls14\adjustright\rin0\lin720\itap0 {\lang2057\langfe1031\langnp2057
|
||||
CachingQueues: The message queue, the Fetcher queue, the robot exclusion queue (see below) \endash a lot of queues can fill up the whole main memory in a very short period of time. But since queues are only acc
|
||||
essed at their ends, a very simple mechanism was implemented to keep memory usage low: The queue was divided into blocks of fixed size. Only the two blocks at its end are kept in RAM. The rest is serialized on disk. In the end, only a list of block refere
|
||||
nces has to be managed
|
||||
\par {\listtext\pard\plain\fs22\lang2057\langfe1031\langnp2057 \hich\af0\dbch\af0\loch\f0 -\tab}Define a maximum value for everything, and keep an eye on it. Downloaded files can get \'93infinitely\'94
|
||||
large. URLs can get infinitely long. Servers may contain an infinite set of documents. A lot of these checks had to be included even in the university network mentioned. URLs were a special case: some .shtml pages on a web server pointed to a subdirectory that didn\rquote t exist but returned the same page. If such errors are introduced deliberately, they are called crawler traps: an infinite URL space. The only way of dealing with this is to exclude the hosts manually.
|
||||
\par {\listtext\pard\plain\fs22\lang2057\langfe1031\langnp2057 \hich\af0\dbch\af0\loch\f0 -\tab}
|
||||
Optimized HTML parser. Current parsers tend to create a huge amount of very small objects. Most of that work is unnecessary for the task to be done. This can only be optimised b
|
||||
y stripping down the parser to do only what it is supposed to do in that special task.
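\par The block-wise queue described in the first item above can be sketched as follows. This is a simplified illustration, not the crawler's CachingQueue: the class name, the use of Java serialization into temporary files, and the blocksOnDisk() helper are all assumptions made for the example.

```java
import java.io.File;
import java.io.FileInputStream;
import java.io.FileOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;

// Hypothetical sketch: only the block being read and the block being
// written stay in RAM; full blocks in between are serialized to disk.
final class BlockedDiskQueue<T extends Serializable> {
    private final int blockSize;
    private ArrayList<T> head = new ArrayList<>(); // block being read
    private int headPos = 0;
    private ArrayList<T> tail = new ArrayList<>(); // block being written
    private final Deque<File> onDisk = new ArrayDeque<>(); // oldest first

    BlockedDiskQueue(int blockSize) {
        this.blockSize = blockSize;
    }

    void add(T item) {
        if (tail.size() == blockSize) {
            onDisk.addLast(writeBlock(tail)); // tail is full: move it to disk
            tail = new ArrayList<>();
        }
        tail.add(item);
    }

    // Returns the oldest element, or null if the queue is empty.
    T poll() {
        if (headPos == head.size()) {
            headPos = 0;
            if (!onDisk.isEmpty()) {
                head = readBlock(onDisk.removeFirst());
            } else {
                head = tail; // nothing on disk: continue with the write block
                tail = new ArrayList<>();
            }
        }
        return headPos < head.size() ? head.get(headPos++) : null;
    }

    int blocksOnDisk() {
        return onDisk.size();
    }

    private File writeBlock(ArrayList<T> block) {
        try {
            File f = Files.createTempFile("queue-block", ".ser").toFile();
            f.deleteOnExit();
            try (ObjectOutputStream out = new ObjectOutputStream(new FileOutputStream(f))) {
                out.writeObject(block);
            }
            return f;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    @SuppressWarnings("unchecked")
    private ArrayList<T> readBlock(File f) {
        try (ObjectInputStream in = new ObjectInputStream(new FileInputStream(f))) {
            return (ArrayList<T>) in.readObject();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        } catch (ClassNotFoundException e) {
            throw new IllegalStateException(e);
        } finally {
            f.delete(); // the block is back in RAM, the file is no longer needed
        }
    }
}
```

Only the list of block files has to be managed in memory; as noted above, this moves the data to the hard disk rather than making it smaller.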
|
||||
\par }\pard \ql \li0\ri0\widctlpar\aspalpha\aspnum\faauto\adjustright\rin0\lin0\itap0 {\lang2057\langfe1031\langnp2057 However, there still remains a problem: The HashMap of already visited URLs needs to be accessed randomly while reading }{
|
||||
\i\lang2057\langfe1031\langnp2057 and}{\lang2057\langfe1031\langnp2057 writing. I can imagine only two ways to overcome this:
|
||||
\par {\listtext\pard\plain\fs22\lang2057\langfe1031\langnp2057 \hich\af0\dbch\af0\loch\f0 -\tab}}\pard \ql \fi-360\li720\ri0\widctlpar\jclisttab\tx720\aspalpha\aspnum\faauto\ls14\adjustright\rin0\lin720\itap0 {\lang2057\langfe1031\langnp2057
|
||||
Limiting, in some way, the number of URLs in RAM. If the crawler were distributed, this could be done by assigning only a certain number of hosts to each crawler node, while at the same time limiting the number of pages read from one host. In the end this will only limit the number of hosts that can be crawled by the number of crawler nodes available. Another solution would be to store complete hosts on disk, together with the list of unresolved URLs. Again, this only shifts the problem from RAM to the hard drive.
|
||||
\par {\listtext\pard\plain\fs22\lang2057\langfe1031\langnp2057 \hich\af0\dbch\af0\loch\f0 -\tab}
|
||||
It would be worthwhile to compress the URLs. Many parts of a URL are shared among hundreds of URLs (e.g. the host name). And since only a limited set of characters is allowed in URLs, Huffman compression will lead to
|
||||
a good compression rate}{\cs39\lang2057\langfe1031\super\langnp2057 \chftn {\footnote \pard\plain \s38\ql \li0\ri0\widctlpar\aspalpha\aspnum\faauto\adjustright\rin0\lin0\itap0 \f31\fs20\lang1031\langfe1031\cgrid\langnp1031\langfenp1031 {\cs39\super
|
||||
\chftn }{\lang2057\langfe1031\langnp2057 see Randall, Stata et al.: The Link Database: Fast Access to Graphs of the Web, 2000; Witten, Moffat, Bell: Managing Gigabytes, Morgan Kaufmann 1999}}}{\lang2057\langfe1031\langnp2057
|
||||
. A first attempt would be to incorporate the visited URLs hash into the HostInfo structure.
|
||||
\par }\pard \ql \li0\ri0\widctlpar\aspalpha\aspnum\faauto\adjustright\rin0\lin0\itap0 {\lang2057\langfe1031\langnp2057 After all, the VisitedFilter hash map turned out to be the data structure that will take up most of the RAM after some time.
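\par One way to picture the idea of moving the visited URLs into a per-host structure is the sketch below. It is an assumption about how this could look, not existing code: the class name is hypothetical, and a plain map from host name to a set of paths stands in for the HostInfo structure. The host name is stored once instead of once per URL.

```java
import java.net.URI;
import java.util.HashMap;
import java.util.HashSet;
import java.util.Locale;
import java.util.Map;
import java.util.Set;

// Hypothetical sketch: visited URLs grouped by host.
final class PerHostVisitedSet {
    // host name -> paths (plus query) already seen on that host
    private final Map<String, Set<String>> pathsByHost = new HashMap<>();

    // Returns true if the URL was new and has now been recorded.
    boolean markVisited(String url) {
        URI uri = URI.create(url);
        String host = uri.getHost() == null ? "" : uri.getHost().toLowerCase(Locale.ROOT);
        String path = uri.getRawPath() == null ? "" : uri.getRawPath();
        if (uri.getRawQuery() != null) {
            path = path + "?" + uri.getRawQuery();
        }
        return pathsByHost.computeIfAbsent(host, h -> new HashSet<>()).add(path);
    }

    int hostCount() {
        return pathsByHost.size();
    }
}
```

This does not compress the paths themselves; it only removes the repeated host part.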
|
||||
\par {\listtext\pard\plain\s2 \b\fs28\lang2057\langfe1031\langnp2057 \hich\af0\dbch\af0\loch\f0 2.3\tab}}\pard\plain \s2\ql \fi-578\li0\ri0\sb480\sa60\keepn\widctlpar\jclisttab\tx0\aspalpha\aspnum\faauto\ls8\ilvl1\outlinelevel1\adjustright\rin0\lin0\itap0
|
||||
\b\fs28\lang2057\langfe1031\cgrid\langnp2057\langfenp1031 {\page {\*\bkmkstart _Toc8477602}The Filters{\*\bkmkend _Toc8477602}
|
||||
\par }\pard\plain \ql \li0\ri0\widctlpar\aspalpha\aspnum\faauto\adjustright\rin0\lin0\itap0 \f31\fs22\lang1031\langfe1031\cgrid\langnp1031\langfenp1031 {\lang2057\langfe1031\langnp2057
|
||||
Most of the functionality of the different filters has already been described. Here\rquote s another, more detailed view}{\cs39\lang2057\langfe1031\super\langnp2057 \chftn {\footnote \pard\plain
|
||||
\s38\ql \li0\ri0\widctlpar\aspalpha\aspnum\faauto\adjustright\rin0\lin0\itap0 \f31\fs20\lang1031\langfe1031\cgrid\langnp1031\langfenp1031 {\cs39\super \chftn }{\lang2057\langfe1031\langnp2057 this chapter will
|
||||
probably be left out in future revisions, since that information can also be found in the Javadoc and the source code. Or do you disagree?}}}{\lang2057\langfe1031\langnp2057 :
|
||||
\par {\*\bkmkstart _Toc8477603}{\listtext\pard\plain\s3 \hich\af0\dbch\af0\loch\f0 2.3.1\tab}}\pard\plain \s3\ql \fi-720\li0\ri0\sb240\sa60\keepn\widctlpar\jclisttab\tx0\aspalpha\aspnum\faauto\ls8\ilvl2\outlinelevel2\adjustright\rin0\lin0\itap0
|
||||
\fs24\lang1031\langfe1031\cgrid\langnp1031\langfenp1031 {RobotExclusionFilter{\*\bkmkend _Toc8477603}
|
||||
\par }\pard\plain \ql \li0\ri0\widctlpar\aspalpha\aspnum\faauto\adjustright\rin0\lin0\itap0 \f31\fs22\lang1031\langfe1031\cgrid\langnp1031\langfenp1031 {\lang2057\langfe1031\langnp2057
|
||||
The first implementation of this filter just kept a list of hosts, and every time a new URLMessage with an unknown host came by, it attempted to read the robots.txt file first to determine whether the URL should be filtered.
|
||||
\par A major drawback was that when the server was not accessible, the whole crawler was held up until the connection timed out (with Sun\rquote s classes even that didn\rquote t happen, causing the whole program to die).
|
||||
\par The second implementation has its own little ThreadPool, and keeps a state machine of each host in the HostInfo structure.
|
||||
\par If the host manager doesn\rquote t contain a HostInfo structure at all, the filter creates it and creates a task to get the robots.txt file. During this time, the host state is set to \'93isLoadingRobotsTxt\'94
|
||||
, which means further requests to that host are put into a queue. When loading is finished, these URLs (and all subsequent ones) are put back to the beginning of the queue.
|
||||
\par After this initial step, every URL that enters the filter is compared to the disallow rules set (if present), and is filtered if necessary.
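\par The per-host state handling described above can be sketched as a small state machine. All names below are hypothetical; loading robots.txt is not implemented here, the caller is assumed to report the result through robotsLoaded(), and disallow rules are reduced to simple path prefixes.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Queue;

// Hypothetical sketch of the robots.txt state handling per host.
final class RobotsFilterSketch {
    enum State { LOADING_ROBOTS_TXT, READY }
    enum Verdict { PASS, FILTERED, DEFERRED }

    static final class HostInfo {
        State state = State.LOADING_ROBOTS_TXT;
        final Queue<String> waiting = new ArrayDeque<>();
        List<String> disallowedPrefixes = new ArrayList<>();
    }

    private final Map<String, HostInfo> hosts = new HashMap<>();

    // Decides what happens to a path on the given host.
    Verdict filter(String host, String path) {
        // First contact creates the host entry; a real crawler would now
        // schedule a task that fetches robots.txt for this host.
        HostInfo info = hosts.computeIfAbsent(host, h -> new HostInfo());
        if (info.state == State.LOADING_ROBOTS_TXT) {
            info.waiting.add(path); // park the URL until the rules are known
            return Verdict.DEFERRED;
        }
        for (String prefix : info.disallowedPrefixes) {
            if (path.startsWith(prefix)) {
                return Verdict.FILTERED;
            }
        }
        return Verdict.PASS;
    }

    // Called when robots.txt has been loaded. Returns the parked paths,
    // which the caller puts back at the beginning of the queue.
    List<String> robotsLoaded(String host, List<String> disallowedPrefixes) {
        HostInfo info = hosts.computeIfAbsent(host, h -> new HostInfo());
        info.disallowedPrefixes = new ArrayList<>(disallowedPrefixes);
        info.state = State.READY;
        List<String> requeue = new ArrayList<>(info.waiting);
        info.waiting.clear();
        return requeue;
    }
}
```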
|
||||
\par Since the URLs are put back to the beginning of the queue, the filter has to be put in front of the VisitedFilter.
|
||||
\par In the host info structure, which is also used by the FetcherTasks, some information about the health of the hosts is stored as well. If the server is in a bad state several times, it is excluded from
|
||||
the crawl. Note that it is possible that a server will be accessed more than the (predefined) 5 times that it can time out, since a FetcherThread may already have started to get a document when another one marks it as bad.
|
||||
\par {\*\bkmkstart _Toc8477604}{\listtext\pard\plain\s3 \hich\af0\dbch\af0\loch\f0 2.3.2\tab}}\pard\plain \s3\ql \fi-720\li0\ri0\sb240\sa60\keepn\widctlpar\jclisttab\tx0\aspalpha\aspnum\faauto\ls8\ilvl2\outlinelevel2\adjustright\rin0\lin0\itap0
|
||||
\fs24\lang1031\langfe1031\cgrid\langnp1031\langfenp1031 {URLLengthFilter{\*\bkmkend _Toc8477604}
|
||||
\par }\pard\plain \ql \li0\ri0\widctlpar\aspalpha\aspnum\faauto\adjustright\rin0\lin0\itap0 \f31\fs22\lang1031\langfe1031\cgrid\langnp1031\langfenp1031 {\lang2057\langfe1031\langnp2057 This very simple
|
||||
filter just filters out a URL if it exceeds a certain (total) length.
|
||||
\par {\*\bkmkstart _Toc8477605}{\listtext\pard\plain\s3 \hich\af0\dbch\af0\loch\f0 2.3.3\tab}}\pard\plain \s3\ql \fi-720\li0\ri0\sb240\sa60\keepn\widctlpar\jclisttab\tx0\aspalpha\aspnum\faauto\ls8\ilvl2\outlinelevel2\adjustright\rin0\lin0\itap0
|
||||
\fs24\lang1031\langfe1031\cgrid\langnp1031\langfenp1031 {KnownPathsFilter{\*\bkmkend _Toc8477605}
|
||||
\par }\pard\plain \ql \li0\ri0\widctlpar\aspalpha\aspnum\faauto\adjustright\rin0\lin0\itap0 \f31\fs22\lang1031\langfe1031\cgrid\langnp1031\langfenp1031 {\lang2057\langfe1031\langnp2057
|
||||
This one filters some very common URLs (e.g. different views of an Apache directory index) and hosts known to cause problems. It should become more configurable from outside in the future\'85
|
||||
\par {\*\bkmkstart _Toc8477606}{\listtext\pard\plain\s3 \hich\af0\dbch\af0\loch\f0 2.3.4\tab}}\pard\plain \s3\ql \fi-720\li0\ri0\sb240\sa60\keepn\widctlpar\jclisttab\tx0\aspalpha\aspnum\faauto\ls8\ilvl2\outlinelevel2\adjustright\rin0\lin0\itap0
|
||||
\fs24\lang1031\langfe1031\cgrid\langnp1031\langfenp1031 {URLScopeFilter{\*\bkmkend _Toc8477606}
|
||||
\par }\pard\plain \ql \li0\ri0\widctlpar\aspalpha\aspnum\faauto\adjustright\rin0\lin0\itap0 \f31\fs22\lang1031\langfe1031\cgrid\langnp1031\langfenp1031 {\lang2057\langfe1031\langnp2057 The scope filter filters a URL if it doesn\rquote
|
||||
t match a given regular expression.
|
||||
\par {\*\bkmkstart _Toc8477607}{\listtext\pard\plain\s3 \hich\af0\dbch\af0\loch\f0 2.3.5\tab}}\pard\plain \s3\ql \fi-720\li0\ri0\sb240\sa60\keepn\widctlpar\jclisttab\tx0\aspalpha\aspnum\faauto\ls8\ilvl2\outlinelevel2\adjustright\rin0\lin0\itap0
|
||||
\fs24\lang1031\langfe1031\cgrid\langnp1031\langfenp1031 {URLVisitedFilter{\*\bkmkend _Toc8477607}
|
||||
\par }\pard\plain \ql \li0\ri0\widctlpar\aspalpha\aspnum\faauto\adjustright\rin0\lin0\itap0 \f31\fs22\lang1031\langfe1031\cgrid\langnp1031\langfenp1031 {\lang2057\langfe1031\langnp2057
|
||||
This filter keeps a HashMap of already visited URLs and filters out those it already knows.
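\par The three simple filters (length, scope, visited) can be pictured as a small chain. The interface and class names below are hypothetical and simplified to plain strings; the real filters work on URLMessage objects, and their API is not reproduced here.

```java
import java.util.HashSet;
import java.util.Set;
import java.util.regex.Pattern;

// Hypothetical filter interface: true means the URL may pass.
interface UrlFilter {
    boolean accept(String url);
}

final class LengthFilter implements UrlFilter {
    private final int maxLength;
    LengthFilter(int maxLength) { this.maxLength = maxLength; }
    public boolean accept(String url) { return url.length() <= maxLength; }
}

final class ScopeFilter implements UrlFilter {
    private final Pattern scope;
    ScopeFilter(String regex) { this.scope = Pattern.compile(regex); }
    // The whole URL has to match the regular expression.
    public boolean accept(String url) { return scope.matcher(url).matches(); }
}

final class VisitedFilter implements UrlFilter {
    private final Set<String> visited = new HashSet<>();
    // Set.add returns false if the URL is already known.
    public boolean accept(String url) { return visited.add(url); }
}

final class FilterChain implements UrlFilter {
    private final UrlFilter[] filters;
    FilterChain(UrlFilter... filters) { this.filters = filters; }
    public boolean accept(String url) {
        for (UrlFilter f : filters) {
            if (!f.accept(url)) {
                return false; // later filters never see a rejected URL
            }
        }
        return true;
    }
}
```

In this sketch the visited filter is placed last in the chain, so that URLs rejected by an earlier filter are not recorded as visited.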
|
||||
\par {\*\bkmkstart _Toc8477608}{\listtext\pard\plain\s3 \hich\af0\dbch\af0\loch\f0 2.3.6\tab}}\pard\plain \s3\ql \fi-720\li0\ri0\sb240\sa60\keepn\widctlpar\jclisttab\tx0\aspalpha\aspnum\faauto\ls8\ilvl2\outlinelevel2\adjustright\rin0\lin0\itap0
|
||||
\fs24\lang1031\langfe1031\cgrid\langnp1031\langfenp1031 {Fetcher{\*\bkmkend _Toc8477608}
|
||||
\par }\pard\plain \ql \li0\ri0\widctlpar\aspalpha\aspnum\faauto\adjustright\rin0\lin0\itap0 \f31\fs22\lang1031\langfe1031\cgrid\langnp1031\langfenp1031 {\lang2057\langfe1031\langnp2057 The fetcher itself is also a filter that filters all URLs \endash
|
||||
they are passed along to the storage as WebDocuments, in a different manner. It contains a ThreadPool that runs in its own thread of control, which takes tasks from the queue and distributes them to the different FetcherThreads.
|
||||
\par In the first implementation the fetcher would simply distribute the incoming URLs to the threads. The thread pool would use a simple queue to store the remaining tasks. But this can lead to a very \'93impolite\'94 distribution of the tasks: Since \'be
|
||||
of the links in a
|
||||
page point to the same server, and all links of a page are added to the message handler at once, groups of successive tasks would all try to access the same server, probably causing denial of service, while other hosts present in the queue are not accesse
|
||||
d.
|
||||
\par To overcome this, the queue is divided into different parts, one for each host. Each host contains its own (caching) queue. But the methods used to pull tasks from the \'93end\'94
|
||||
of this queue cycle through the hosts and always get a URL from a different host.
|
||||
\par One major problem still remains with this technique: If one host is very slow, it can still slow down everything. Since with n hosts every n}{\lang2057\langfe1031\sub\langnp2057 th}{\lang2057\langfe1031\langnp2057
|
||||
task goes to this host, it can eat one thread after the other if loading a document takes longer t
|
||||
han loading it from the (n-1) other servers. Then two concurrent requests will hit the same server, which slows down the response times even more, and so on. In reality, this will clog up the queue very fast. A little more work has to be done to avo
|
||||
id these situations, e.g. by limiting the number of threads that access one host at a time.
|
||||
\par {\*\bkmkstart _Toc8477609}{\listtext\pard\plain\s3 \hich\af0\dbch\af0\loch\f0 2.3.7\tab}}\pard\plain \s3\ql \fi-720\li0\ri0\sb240\sa60\keepn\widctlpar\jclisttab\tx0\aspalpha\aspnum\faauto\ls8\ilvl2\outlinelevel2\adjustright\rin0\lin0\itap0
|
||||
\fs24\lang1031\langfe1031\cgrid\langnp1031\langfenp1031 {A Note on DNS{\*\bkmkend _Toc8477609}
|
||||
\par }\pard\plain \ql \li0\ri0\widctlpar\aspalpha\aspnum\faauto\adjustright\rin0\lin0\itap0 \f31\fs22\lang1031\langfe1031\cgrid\langnp1031\langfenp1031 {\lang2057\langfe1031\langnp2057
|
||||
The Mercator crawler document puts a lot of emphasis on resolving host names. Because of that, a DNSResolver filter was implemented at the very beginning. Two reasons led to it no longer being used:
|
||||
\par {\listtext\pard\plain\fs22\lang2057\langfe1031\langnp2057 \hich\af0\dbch\af0\loch\f0 -\tab}}\pard \ql \fi-360\li720\ri0\widctlpar\jclisttab\tx720\aspalpha\aspnum\faauto\ls14\adjustright\rin0\lin720\itap0 {\lang2057\langfe1031\langnp2057
|
||||
newer versions of the JDK than the one Mercator used resolve the IP address of a host the first time it is accessed, and keep a cache of already resolved host names.
|
||||
\par {\listtext\pard\plain\fs22\lang2057\langfe1031\langnp2057 \hich\af0\dbch\af0\loch\f0 -\tab}the crawler itself was designed to crawl large local networks, and not the internet. Thus, the number of hosts is very limited.
|
||||
\par }\pard \ql \li0\ri0\widctlpar\aspalpha\aspnum\faauto\adjustright\rin0\lin0\itap0 {\lang2057\langfe1031\langnp2057
|
||||
\par
|
||||
\par {\listtext\pard\plain\s1 \b\fs36\lang2057\langfe1031\kerning28\langnp2057 \hich\af0\dbch\af0\loch\f0 3\tab}}\pard\plain \s1\ql \fi-432\li0\ri0\sb240\sa60\keepn\widctlpar\jclisttab\tx0\aspalpha\aspnum\faauto\ls8\outlinelevel0\adjustright\rin0\lin0\itap0
|
||||
\b\fs36\lang2057\langfe1031\kerning28\cgrid\langnp2057\langfenp1031 {\page {\*\bkmkstart _Toc8477610}Future Enhancements{\*\bkmkend _Toc8477610}
|
||||
\par {\*\bkmkstart _Toc8477611}{\listtext\pard\plain\s2 \b\fs28\lang2057\langfe1031\langnp2057 \hich\af0\dbch\af0\loch\f0 3.1\tab}}\pard\plain \s2\ql \fi-578\li0\ri0\sb480\sa60\keepn\widctlpar
|
||||
\jclisttab\tx0\aspalpha\aspnum\faauto\ls8\ilvl1\outlinelevel1\adjustright\rin0\lin0\itap0 \b\fs28\lang2057\langfe1031\cgrid\langnp2057\langfenp1031 {\'93Politeness\'94{\*\bkmkend _Toc8477611}
|
||||
\par }\pard\plain \ql \li0\ri0\widctlpar\aspalpha\aspnum\faauto\adjustright\rin0\lin0\itap0 \f31\fs22\lang1031\langfe1031\cgrid\langnp1031\langfenp1031 {\lang2057\langfe1031\langnp2057
|
||||
A crawler should not cause a Denial of Service attack. So this has to be addressed.
|
||||
\par {\*\bkmkstart _Toc8477612}{\listtext\pard\plain\s2 \b\fs28\lang2057\langfe1031\langnp2057 \hich\af0\dbch\af0\loch\f0 3.2\tab}}\pard\plain \s2\ql \fi-578\li0\ri0\sb480\sa60\keepn\widctlpar
|
||||
\jclisttab\tx0\aspalpha\aspnum\faauto\ls8\ilvl1\outlinelevel1\adjustright\rin0\lin0\itap0 \b\fs28\lang2057\langfe1031\cgrid\langnp2057\langfenp1031 {The processing pipeline{\*\bkmkend _Toc8477612}
|
||||
\par }\pard\plain \ql \li0\ri0\widctlpar\aspalpha\aspnum\faauto\adjustright\rin0\lin0\itap0 \f31\fs22\lang1031\langfe1031\cgrid\langnp1031\langfenp1031 {\lang2057\langfe1031\langnp2057 The FetcherTask, as already
|
||||
stated, is very monolithic at this time. Probably some more processing should be done at this step (the problem with balanced CPU/IO usage taken into account). At least different handlers for different MIME types should be provided, e.g. to extract links
|
||||
from PDF documents. The Storage should also be broken up. I only used the LogStorage within the last months, which now doesn\rquote
|
||||
t only write to log files, but also stores the files on disk. This should probably be replaced by a storage chain where different stores could be appended.
|
||||
\par {\*\bkmkstart _Toc8477613}{\listtext\pard\plain\s2 \b\fs28\lang2057\langfe1031\langnp2057 \hich\af0\dbch\af0\loch\f0 3.3\tab}}\pard\plain \s2\ql \fi-578\li0\ri0\sb480\sa60\keepn\widctlpar
|
||||
\jclisttab\tx0\aspalpha\aspnum\faauto\ls8\ilvl1\outlinelevel1\adjustright\rin0\lin0\itap0 \b\fs28\lang2057\langfe1031\cgrid\langnp2057\langfenp1031 {Lucene integration{\*\bkmkend _Toc8477613}
|
||||
\par }\pard\plain \ql \li0\ri0\widctlpar\aspalpha\aspnum\faauto\adjustright\rin0\lin0\itap0 \f31\fs22\lang1031\langfe1031\cgrid\langnp1031\langfenp1031 {\lang2057\langfe1031\langnp2057
|
||||
A very simple enhancement would be a LuceneStorage, which takes the document, parses it, and puts it into a Lucene store. But this will probably be very CPU intensive. Probably this should be done in a distributed environment.
|
||||
\par {\*\bkmkstart _Toc8477614}{\listtext\pard\plain\s2 \b\fs28\lang2057\langfe1031\langnp2057 \hich\af0\dbch\af0\loch\f0 3.4\tab}}\pard\plain \s2\ql \fi-578\li0\ri0\sb480\sa60\keepn\widctlpar
|
||||
\jclisttab\tx0\aspalpha\aspnum\faauto\ls8\ilvl1\outlinelevel1\adjustright\rin0\lin0\itap0 \b\fs28\lang2057\langfe1031\cgrid\langnp2057\langfenp1031 {A Real Server{\*\bkmkend _Toc8477614}
|
||||
\par }\pard\plain \ql \li0\ri0\widctlpar\aspalpha\aspnum\faauto\adjustright\rin0\lin0\itap0 \f31\fs22\lang1031\langfe1031\cgrid\langnp1031\langfenp1031 {\lang2057\langfe1031\langnp2057
|
||||
The only way to start a crawl today is to start the crawler from the shell. But it could also remain idle and wait for commands from an RMI connection or expose a Web Service. Monitoring could be done by a simple included web s
|
||||
erver that provides current statistics via HTML.
|
||||
\par {\*\bkmkstart _Toc8477615}{\listtext\pard\plain\s2 \b\fs28\lang2057\langfe1031\langnp2057 \hich\af0\dbch\af0\loch\f0 3.5\tab}}\pard\plain \s2\ql \fi-578\li0\ri0\sb480\sa60\keepn\widctlpar
|
||||
\jclisttab\tx0\aspalpha\aspnum\faauto\ls8\ilvl1\outlinelevel1\adjustright\rin0\lin0\itap0 \b\fs28\lang2057\langfe1031\cgrid\langnp2057\langfenp1031 {Distribution{\*\bkmkend _Toc8477615}
|
||||
\par }\pard\plain \ql \li0\ri0\widctlpar\aspalpha\aspnum\faauto\adjustright\rin0\lin0\itap0 \f31\fs22\lang1031\langfe1031\cgrid\langnp1031\langfenp1031 {\lang2057\langfe1031\langnp2057 Distribution is a big issue. Some people say \'93
|
||||
Distribute your program late. And then later.\'94 But as others have implemented distributed crawlers, this should not be very hard.
|
||||
\par I see two possible architectures for that:
|
||||
\par {\listtext\pard\plain\fs22\lang2057\langfe1031\langnp2057 \hich\af0\dbch\af0\loch\f0 -\tab}}\pard \ql \fi-360\li720\ri0\widctlpar\jclisttab\tx720\aspalpha\aspnum\faauto\ls14\adjustright\rin0\lin720\itap0 {\lang2057\langfe1031\langnp2057
|
||||
Write a single dispatcher (a star network) that contains the whole MessageHandler except the Fetcher itself. The crawlers are run as servers (see above), and are configured with a URL source that gets their input from the dispatcher
|
||||
and a MessageHandler that stores URLs back to the dispatcher. The main drawback is that the dispatcher can become a bottleneck.
|
||||
\par {\listtext\pard\plain\fs22\lang2057\langfe1031\langnp2057 \hich\af0\dbch\af0\loch\f0 -\tab}Partition the domain to be crawled into several parts. This could be done for example by dividing up different intervals of the hash v
|
||||
alue of the host names. Then plugging in another crawler could be done dynamically, even within a peer to peer network. Each node knows which node is responsible for which interval, and sends all URLs to the right node. This could even be implemented as a
|
||||
filter.
|
||||
\par }\pard \ql \li0\ri0\widctlpar\aspalpha\aspnum\faauto\adjustright\rin0\lin0\itap0 {\lang2057\langfe1031\langnp2057 One thing to keep in mind is that the number of URLs transferred to other nodes should be as large as possible.
|
||||
\par The next thing to be distributed is the storage mechanism. Here, the number of pure crawling nodes and the number of storing (post processing) nodes could possibly diverge. An issue here is that the whole documents have to be transferred over the net.
|
||||
|
||||
\par {\*\bkmkstart _Toc8477616}{\listtext\pard\plain\s2 \b\fs28\lang2057\langfe1031\langnp2057 \hich\af0\dbch\af0\loch\f0 3.6\tab}}\pard\plain \s2\ql \fi-578\li0\ri0\sb480\sa60\keepn\widctlpar
|
||||
\jclisttab\tx0\aspalpha\aspnum\faauto\ls8\ilvl1\outlinelevel1\adjustright\rin0\lin0\itap0 \b\fs28\lang2057\langfe1031\cgrid\langnp2057\langfenp1031 {URL Reordering{\*\bkmkend _Toc8477616}
|
||||
\par }\pard\plain \ql \li0\ri0\widctlpar\aspalpha\aspnum\faauto\adjustright\rin0\lin0\itap0 \f31\fs22\lang1031\langfe1031\cgrid\langnp1031\langfenp1031 {\lang2057\langfe1031\langnp2057 One paper discussed different types of reordering URLs while crawling}{
|
||||
\cs39\lang2057\langfe1031\super\langnp2057 \chftn {\footnote \pard\plain \s38\ql \li0\ri0\widctlpar\aspalpha\aspnum\faauto\adjustright\rin0\lin0\itap0 \f31\fs20\lang1031\langfe1031\cgrid\langnp1031\langfenp1031 {\cs39\super \chftn }{
|
||||
\lang2057\langfe1031\langnp2057 see J. Cho, H. Garcia-Molina, and L. Page. Efficient crawling through URL ordering. In Proc. 7th Intl. World Wide Web Conference, Brisbane, Australia, 1998}}}{\lang2057\langfe1031\langnp2057
|
||||
. One of the most promising attempts was to take the calculated PageRank into account}{\cs39\lang2057\langfe1031\super\langnp2057 \chftn {\footnote \pard\plain \s38\ql \li0\ri0\widctlpar\aspalpha\aspnum\faauto\adjustright\rin0\lin0\itap0
|
||||
\f31\fs20\lang1031\langfe1031\cgrid\langnp1031\langfenp1031 {\cs39\super \chftn }{\lang2057\langfe1031\langnp2057 see Brin, S., Page, L.: The Anatomy of a Large-Scale Hypertextual Web Search Engine, 1998}}}{\lang2057\langfe1031\langnp2057
|
||||
. Crawling pages with higher PageRanks first seemed to get important pages earlier. Yes, this is not rocket science, folks, the research was already done years ago.
|
||||
\par
|
||||
\par {\*\bkmkstart _Toc8477617}{\listtext\pard\plain\s2 \b\fs28\lang2057\langfe1031\langnp2057 \hich\af0\dbch\af0\loch\f0 3.7\tab}}\pard\plain \s2\ql \fi-578\li0\ri0\sb480\sa60\keepn\widctlpar
|
||||
\jclisttab\tx0\aspalpha\aspnum\faauto\ls8\ilvl1\outlinelevel1\adjustright\rin0\lin0\itap0 \b\fs28\lang2057\langfe1031\cgrid\langnp2057\langfenp1031 {Recovery{\*\bkmkend _Toc8477617}
|
||||
\par }\pard\plain \ql \li0\ri0\widctlpar\aspalpha\aspnum\faauto\adjustright\rin0\lin0\itap0 \f31\fs22\lang1031\langfe1031\cgrid\langnp1031\langfenp1031 {\lang2057\langfe1031\langnp2057 At the moment there is no way of stopping and restarting a crawl.
|
||||
\par
|
||||
\par
|
||||
\par }}
|
|
@ -1,2 +0,0 @@
|
|||
AnyObjectId[d34fb76425342af72284e9f06c085244aa8dfe27] was removed in git history.
|
||||
Apache SVN contains full history.
|
|
@ -1,278 +0,0 @@
|
|||
/*
|
||||
* @(#)ContentEncodingModule.java 0.3-3 06/05/2001
|
||||
*
|
||||
* This file is part of the HTTPClient package
|
||||
* Copyright (C) 1996-2001 Ronald Tschalär
|
||||
*
|
||||
* This library is free software; you can redistribute it and/or
|
||||
* modify it under the terms of the GNU Lesser General Public
|
||||
* License as published by the Free Software Foundation; either
|
||||
* version 2 of the License, or (at your option) any later version.
|
||||
*
|
||||
* This library is distributed in the hope that it will be useful,
|
||||
* but WITHOUT ANY WARRANTY; without even the implied warranty of
|
||||
* MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU
|
||||
* Lesser General Public License for more details.
|
||||
*
|
||||
* You should have received a copy of the GNU Lesser General Public
|
||||
* License along with this library; if not, write to the Free
|
||||
* Software Foundation, Inc., 59 Temple Place, Suite 330, Boston,
|
||||
* MA 02111-1307, USA
|
||||
*
|
||||
* For questions, suggestions, bug-reports, enhancement-requests etc.
|
||||
* I may be contacted at:
|
||||
*
|
||||
* ronald@innovation.ch
|
||||
*
|
||||
* The HTTPClient's home page is located at:
|
||||
*
|
||||
* http://www.innovation.ch/java/HTTPClient/
|
||||
*
|
||||
*/
|
||||
package HTTPClient;
|
||||
|
||||
import java.io.IOException;
|
||||
import java.util.Vector;
|
||||
import java.util.zip.InflaterInputStream;
|
||||
import java.util.zip.GZIPInputStream;
|
||||
|
||||
/**
|
||||
* This module handles the Content-Encoding response header. It currently
|
||||
* handles the "gzip", "deflate", "compress" and "identity" tokens.
|
||||
*
|
||||
* @author Ronald Tschalär
|
||||
* @created 29. Dezember 2001
|
||||
* @version 0.3-3 06/05/2001
|
||||
*/
|
||||
public class ContentEncodingModule implements HTTPClientModule
|
||||
{
|
||||
// Methods
|
||||
|
||||
/**
|
||||
* Invoked by the HTTPClient.
|
||||
*
|
||||
* @param req Description of the Parameter
|
||||
* @param resp Description of the Parameter
|
||||
* @return Description of the Return Value
|
||||
* @exception ModuleException Description of the Exception
|
||||
*/
|
||||
public int requestHandler(Request req, Response[] resp)
|
||||
throws ModuleException
|
||||
{
|
||||
// parse Accept-Encoding header
|
||||
|
||||
int idx;
|
||||
NVPair[] hdrs = req.getHeaders();
|
||||
for (idx = 0; idx < hdrs.length; idx++)
|
||||
{
|
||||
if (hdrs[idx].getName().equalsIgnoreCase("Accept-Encoding"))
|
||||
{
|
||||
break;
|
||||
}
|
||||
}
|
||||
|
||||
Vector pae;
|
||||
if (idx == hdrs.length)
|
||||
{
|
||||
hdrs = Util.resizeArray(hdrs, idx + 1);
|
||||
req.setHeaders(hdrs);
|
||||
pae = new Vector();
|
||||
}
|
||||
else
|
||||
{
|
||||
try
|
||||
{
|
||||
pae = Util.parseHeader(hdrs[idx].getValue());
|
||||
}
|
||||
catch (ParseException pe)
|
||||
{
|
||||
throw new ModuleException(pe.toString());
|
||||
}
|
||||
}
|
||||
|
||||
// done if "*;q=1.0" present
|
||||
|
||||
HttpHeaderElement all = Util.getElement(pae, "*");
|
||||
if (all != null)
|
||||
{
|
||||
NVPair[] params = all.getParams();
|
||||
for (idx = 0; idx < params.length; idx++)
|
||||
{
|
||||
if (params[idx].getName().equalsIgnoreCase("q"))
|
||||
{
|
||||
break;
|
||||
}
|
||||
}
|
||||
|
||||
if (idx == params.length)
|
||||
{
|
||||
// no qvalue, i.e. q=1.0
|
||||
return REQ_CONTINUE;
|
||||
}
|
||||
|
||||
if (params[idx].getValue() == null ||
|
||||
params[idx].getValue().length() == 0)
|
||||
{
|
||||
throw new ModuleException("Invalid q value for \"*\" in " +
|
||||
"Accept-Encoding header: ");
|
||||
}
|
||||
|
||||
try
|
||||
{
|
||||
if (Float.valueOf(params[idx].getValue()).floatValue() > 0.)
|
||||
{
|
||||
return REQ_CONTINUE;
|
||||
}
|
||||
}
|
||||
catch (NumberFormatException nfe)
|
||||
{
|
||||
throw new ModuleException("Invalid q value for \"*\" in " +
|
||||
"Accept-Encoding header: " + nfe.getMessage());
|
||||
}
|
||||
}
|
||||
|
||||
// Add gzip, deflate and compress tokens to the Accept-Encoding header
|
||||
|
||||
if (!pae.contains(new HttpHeaderElement("deflate")))
|
||||
{
|
||||
pae.addElement(new HttpHeaderElement("deflate"));
|
||||
}
|
||||
if (!pae.contains(new HttpHeaderElement("gzip")))
|
||||
{
|
||||
pae.addElement(new HttpHeaderElement("gzip"));
|
||||
}
|
||||
if (!pae.contains(new HttpHeaderElement("x-gzip")))
|
||||
{
|
||||
pae.addElement(new HttpHeaderElement("x-gzip"));
|
||||
}
|
||||
if (!pae.contains(new HttpHeaderElement("compress")))
|
||||
{
|
||||
pae.addElement(new HttpHeaderElement("compress"));
|
||||
}
|
||||
if (!pae.contains(new HttpHeaderElement("x-compress")))
|
||||
{
|
||||
pae.addElement(new HttpHeaderElement("x-compress"));
|
||||
}
|
||||
|
||||
hdrs[idx] = new NVPair("Accept-Encoding", Util.assembleHeader(pae));
|
||||
|
||||
return REQ_CONTINUE;
|
||||
}
|
||||
|
||||
|
||||
/**
|
||||
* Invoked by the HTTPClient.
|
||||
*
|
||||
* @param resp Description of the Parameter
|
||||
* @param req Description of the Parameter
|
||||
*/
|
||||
public void responsePhase1Handler(Response resp, RoRequest req)
|
||||
{
|
||||
}
|
||||
|
||||
|
||||
/**
|
||||
* Invoked by the HTTPClient.
|
||||
*
|
||||
* @param resp Description of the Parameter
|
||||
* @param req Description of the Parameter
|
||||
* @return Description of the Return Value
|
||||
*/
|
||||
public int responsePhase2Handler(Response resp, Request req)
|
||||
{
|
||||
return RSP_CONTINUE;
|
||||
}
|
||||
|
||||
|
||||
/**
|
||||
* Invoked by the HTTPClient.
|
||||
*
|
||||
* @param resp Description of the Parameter
|
||||
* @param req Description of the Parameter
|
||||
* @exception IOException Description of the Exception
|
||||
* @exception ModuleException Description of the Exception
|
||||
*/
|
||||
public void responsePhase3Handler(Response resp, RoRequest req)
|
||||
throws IOException, ModuleException
|
||||
{
|
||||
String ce = resp.getHeader("Content-Encoding");
|
||||
if (ce == null || req.getMethod().equals("HEAD") ||
|
||||
resp.getStatusCode() == 206)
|
||||
{
|
||||
return;
|
||||
}
|
||||
|
||||
Vector pce;
|
||||
try
|
||||
{
|
||||
pce = Util.parseHeader(ce);
|
||||
}
|
||||
catch (ParseException pe)
|
||||
{
|
||||
throw new ModuleException(pe.toString());
|
||||
}
|
||||
|
||||
if (pce.size() == 0)
|
||||
{
|
||||
return;
|
||||
}
|
||||
|
||||
String encoding = ((HttpHeaderElement) pce.firstElement()).getName();
|
||||
if (encoding.equalsIgnoreCase("gzip") ||
|
||||
encoding.equalsIgnoreCase("x-gzip"))
|
||||
{
|
||||
Log.write(Log.MODS, "CEM: pushing gzip-input-stream");
|
||||
|
||||
resp.inp_stream = new GZIPInputStream(resp.inp_stream);
|
||||
pce.removeElementAt(pce.size() - 1);
|
||||
resp.deleteHeader("Content-length");
|
||||
}
|
||||
else if (encoding.equalsIgnoreCase("deflate"))
|
||||
{
|
||||
Log.write(Log.MODS, "CEM: pushing inflater-input-stream");
|
||||
|
||||
resp.inp_stream = new InflaterInputStream(resp.inp_stream);
|
||||
pce.removeElementAt(pce.size() - 1);
|
||||
resp.deleteHeader("Content-length");
|
||||
}
|
||||
else if (encoding.equalsIgnoreCase("compress") ||
|
||||
encoding.equalsIgnoreCase("x-compress"))
|
||||
{
|
||||
Log.write(Log.MODS, "CEM: pushing uncompress-input-stream");
|
||||
|
||||
resp.inp_stream = new UncompressInputStream(resp.inp_stream);
|
||||
pce.removeElementAt(pce.size() - 1);
|
||||
resp.deleteHeader("Content-length");
|
||||
}
|
||||
else if (encoding.equalsIgnoreCase("identity"))
|
||||
{
|
||||
Log.write(Log.MODS, "CEM: ignoring 'identity' token");
|
||||
pce.removeElementAt(pce.size() - 1);
|
||||
}
|
||||
else
|
||||
{
|
||||
Log.write(Log.MODS, "CEM: Unknown content encoding '" +
|
||||
encoding + "'");
|
||||
}
|
||||
|
||||
if (pce.size() > 0)
|
||||
{
|
||||
resp.setHeader("Content-Encoding", Util.assembleHeader(pce));
|
||||
}
|
||||
else
|
||||
{
|
||||
resp.deleteHeader("Content-Encoding");
|
||||
}
|
||||
}
|
||||
|
||||
|
||||
/**
|
||||
* Invoked by the HTTPClient.
|
||||
*
|
||||
* @param resp Description of the Parameter
|
||||
* @param req Description of the Parameter
|
||||
*/
|
||||
public void trailerHandler(Response resp, RoRequest req)
|
||||
{
|
||||
}
|
||||
}
|
File diff suppressed because it is too large
Load Diff
File diff suppressed because it is too large
Load Diff
|
@ -1,9 +0,0 @@
|
|||
This directory contains a patch to the HTTPClient which adds a maximum bytes
|
||||
attribute to the getData() method. This method retrieves a file at once but
|
||||
would crash the server if the remote file was too large.
|
||||
|
||||
To compile it, download HTTPClient.jar from http://www.innovation.ch, extract it,
|
||||
overwrite the source files with the ones in this folder, and compile it.
|
||||
|
||||
The file ../HTTPClient-0.3.3-patched.jar already contains a patched version.
|
||||
|
|
@ -1,2 +0,0 @@
|
|||
AnyObjectId[de6bd859d1cfbb70c2374ec4a9b8cb2e00cb18da] was removed in git history.
|
||||
Apache SVN contains full history.
|
|
@ -1,2 +0,0 @@
|
|||
AnyObjectId[6f2c19103cb79576ac58437c18a3cf2d272ec422] was removed in git history.
|
||||
Apache SVN contains full history.
|
|
@ -1,2 +0,0 @@
|
|||
AnyObjectId[c64e88c4dc1601a42658356c5a581243618fceea] was removed in git history.
|
||||
Apache SVN contains full history.
|
|
@ -1,2 +0,0 @@
|
|||
AnyObjectId[6eafa4a78ed116ae83df3c0da41f971742ee9c37] was removed in git history.
|
||||
Apache SVN contains full history.
|
|
@ -1,2 +0,0 @@
|
|||
AnyObjectId[e899308968d6b3260bc7990c843bf79a097b7be4] was removed in git history.
|
||||
Apache SVN contains full history.
|
|
@ -1,278 +0,0 @@
|
|||
/*
|
||||
* @(#)ContentEncodingModule.java 0.3-3 06/05/2001
|
||||
*
|
||||
* This file is part of the HTTPClient package
|
||||
* Copyright (C) 1996-2001 Ronald Tschalär
|
||||
*
|
||||
* This library is free software; you can redistribute it and/or
|
||||
* modify it under the terms of the GNU Lesser General Public
|
||||
* License as published by the Free Software Foundation; either
|
||||
* version 2 of the License, or (at your option) any later version.
|
||||
*
|
||||
* This library is distributed in the hope that it will be useful,
|
||||
* but WITHOUT ANY WARRANTY; without even the implied warranty of
|
||||
* MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU
|
||||
* Lesser General Public License for more details.
|
||||
*
|
||||
* You should have received a copy of the GNU Lesser General Public
|
||||
* License along with this library; if not, write to the Free
|
||||
* Software Foundation, Inc., 59 Temple Place, Suite 330, Boston,
|
||||
* MA 02111-1307, USA
|
||||
*
|
||||
* For questions, suggestions, bug-reports, enhancement-requests etc.
|
||||
* I may be contacted at:
|
||||
*
|
||||
* ronald@innovation.ch
|
||||
*
|
||||
* The HTTPClient's home page is located at:
|
||||
*
|
||||
* http://www.innovation.ch/java/HTTPClient/
|
||||
*
|
||||
*/
|
||||
package HTTPClient;
|
||||
|
||||
import java.io.IOException;
|
||||
import java.util.Vector;
|
||||
import java.util.zip.InflaterInputStream;
|
||||
import java.util.zip.GZIPInputStream;
|
||||
|
||||
/**
|
||||
* This module handles the Content-Encoding response header. It currently
|
||||
* handles the "gzip", "deflate", "compress" and "identity" tokens.
|
||||
*
|
||||
* @author Ronald Tschalär
|
||||
* @created 29. Dezember 2001
|
||||
* @version 0.3-3 06/05/2001
|
||||
*/
|
||||
public class ContentEncodingModule implements HTTPClientModule
|
||||
{
|
||||
// Methods
|
||||
|
||||
/**
|
||||
* Invoked by the HTTPClient.
|
||||
*
|
||||
* @param req Description of the Parameter
|
||||
* @param resp Description of the Parameter
|
||||
* @return Description of the Return Value
|
||||
* @exception ModuleException Description of the Exception
|
||||
*/
|
||||
public int requestHandler(Request req, Response[] resp)
|
||||
throws ModuleException
|
||||
{
|
||||
// parse Accept-Encoding header
|
||||
|
||||
int idx;
|
||||
NVPair[] hdrs = req.getHeaders();
|
||||
for (idx = 0; idx < hdrs.length; idx++)
|
||||
{
|
||||
if (hdrs[idx].getName().equalsIgnoreCase("Accept-Encoding"))
|
||||
{
|
||||
break;
|
||||
}
|
||||
}
|
||||
|
||||
Vector pae;
|
||||
if (idx == hdrs.length)
|
||||
{
|
||||
hdrs = Util.resizeArray(hdrs, idx + 1);
|
||||
req.setHeaders(hdrs);
|
||||
pae = new Vector();
|
||||
}
|
||||
else
|
||||
{
|
||||
try
|
||||
{
|
||||
pae = Util.parseHeader(hdrs[idx].getValue());
|
||||
}
|
||||
catch (ParseException pe)
|
||||
{
|
||||
throw new ModuleException(pe.toString());
|
||||
}
|
||||
}
|
||||
|
||||
// done if "*;q=1.0" present
|
||||
|
||||
HttpHeaderElement all = Util.getElement(pae, "*");
|
||||
if (all != null)
|
||||
{
|
||||
NVPair[] params = all.getParams();
|
||||
for (idx = 0; idx < params.length; idx++)
|
||||
{
|
||||
if (params[idx].getName().equalsIgnoreCase("q"))
|
||||
{
|
||||
break;
|
||||
}
|
||||
}
|
||||
|
||||
if (idx == params.length)
|
||||
{
|
||||
// no qvalue, i.e. q=1.0
|
||||
return REQ_CONTINUE;
|
||||
}
|
||||
|
||||
if (params[idx].getValue() == null ||
|
||||
params[idx].getValue().length() == 0)
|
||||
{
|
||||
throw new ModuleException("Invalid q value for \"*\" in " +
|
||||
"Accept-Encoding header: ");
|
||||
}
|
||||
|
||||
try
|
||||
{
|
||||
if (Float.valueOf(params[idx].getValue()).floatValue() > 0.)
|
||||
{
|
||||
return REQ_CONTINUE;
|
||||
}
|
||||
}
|
||||
catch (NumberFormatException nfe)
|
||||
{
|
||||
throw new ModuleException("Invalid q value for \"*\" in " +
|
||||
"Accept-Encoding header: " + nfe.getMessage());
|
||||
}
|
||||
}
|
||||
|
||||
// Add gzip, deflate and compress tokens to the Accept-Encoding header
|
||||
|
||||
if (!pae.contains(new HttpHeaderElement("deflate")))
|
||||
{
|
||||
pae.addElement(new HttpHeaderElement("deflate"));
|
||||
}
|
||||
if (!pae.contains(new HttpHeaderElement("gzip")))
|
||||
{
|
||||
pae.addElement(new HttpHeaderElement("gzip"));
|
||||
}
|
||||
if (!pae.contains(new HttpHeaderElement("x-gzip")))
|
||||
{
|
||||
pae.addElement(new HttpHeaderElement("x-gzip"));
|
||||
}
|
||||
if (!pae.contains(new HttpHeaderElement("compress")))
|
||||
{
|
||||
pae.addElement(new HttpHeaderElement("compress"));
|
||||
}
|
||||
if (!pae.contains(new HttpHeaderElement("x-compress")))
|
||||
{
|
||||
pae.addElement(new HttpHeaderElement("x-compress"));
|
||||
}
|
||||
|
||||
hdrs[idx] = new NVPair("Accept-Encoding", Util.assembleHeader(pae));
|
||||
|
||||
return REQ_CONTINUE;
|
||||
}
|
||||
|
||||
|
||||
/**
|
||||
* Invoked by the HTTPClient.
|
||||
*
|
||||
* @param resp Description of the Parameter
|
||||
* @param req Description of the Parameter
|
||||
*/
|
||||
public void responsePhase1Handler(Response resp, RoRequest req)
|
||||
{
|
||||
}
|
||||
|
||||
|
||||
/**
|
||||
* Invoked by the HTTPClient.
|
||||
*
|
||||
* @param resp Description of the Parameter
|
||||
* @param req Description of the Parameter
|
||||
* @return Description of the Return Value
|
||||
*/
|
||||
public int responsePhase2Handler(Response resp, Request req)
|
||||
{
|
||||
return RSP_CONTINUE;
|
||||
}
|
||||
|
||||
|
||||
/**
|
||||
* Invoked by the HTTPClient.
|
||||
*
|
||||
* @param resp Description of the Parameter
|
||||
* @param req Description of the Parameter
|
||||
* @exception IOException Description of the Exception
|
||||
* @exception ModuleException Description of the Exception
|
||||
*/
|
||||
public void responsePhase3Handler(Response resp, RoRequest req)
|
||||
throws IOException, ModuleException
|
||||
{
|
||||
String ce = resp.getHeader("Content-Encoding");
|
||||
if (ce == null || req.getMethod().equals("HEAD") ||
|
||||
resp.getStatusCode() == 206)
|
||||
{
|
||||
return;
|
||||
}
|
||||
|
||||
Vector pce;
|
||||
try
|
||||
{
|
||||
pce = Util.parseHeader(ce);
|
||||
}
|
||||
catch (ParseException pe)
|
||||
{
|
||||
throw new ModuleException(pe.toString());
|
||||
}
|
||||
|
||||
if (pce.size() == 0)
|
||||
{
|
||||
return;
|
||||
}
|
||||
|
||||
String encoding = ((HttpHeaderElement) pce.firstElement()).getName();
|
||||
if (encoding.equalsIgnoreCase("gzip") ||
|
||||
encoding.equalsIgnoreCase("x-gzip"))
|
||||
{
|
||||
Log.write(Log.MODS, "CEM: pushing gzip-input-stream");
|
||||
|
||||
resp.inp_stream = new GZIPInputStream(resp.inp_stream);
|
||||
pce.removeElementAt(pce.size() - 1);
|
||||
resp.deleteHeader("Content-length");
|
||||
}
|
||||
else if (encoding.equalsIgnoreCase("deflate"))
|
||||
{
|
||||
Log.write(Log.MODS, "CEM: pushing inflater-input-stream");
|
||||
|
||||
resp.inp_stream = new InflaterInputStream(resp.inp_stream);
|
||||
pce.removeElementAt(pce.size() - 1);
|
||||
resp.deleteHeader("Content-length");
|
||||
}
|
||||
else if (encoding.equalsIgnoreCase("compress") ||
|
||||
encoding.equalsIgnoreCase("x-compress"))
|
||||
{
|
||||
Log.write(Log.MODS, "CEM: pushing uncompress-input-stream");
|
||||
|
||||
resp.inp_stream = new UncompressInputStream(resp.inp_stream);
|
||||
pce.removeElementAt(pce.size() - 1);
|
||||
resp.deleteHeader("Content-length");
|
||||
}
|
||||
else if (encoding.equalsIgnoreCase("identity"))
|
||||
{
|
||||
Log.write(Log.MODS, "CEM: ignoring 'identity' token");
|
||||
pce.removeElementAt(pce.size() - 1);
|
||||
}
|
||||
else
|
||||
{
|
||||
Log.write(Log.MODS, "CEM: Unknown content encoding '" +
|
||||
encoding + "'");
|
||||
}
|
||||
|
||||
if (pce.size() > 0)
|
||||
{
|
||||
resp.setHeader("Content-Encoding", Util.assembleHeader(pce));
|
||||
}
|
||||
else
|
||||
{
|
||||
resp.deleteHeader("Content-Encoding");
|
||||
}
|
||||
}
|
||||
|
||||
|
||||
/**
|
||||
* Invoked by the HTTPClient.
|
||||
*
|
||||
* @param resp Description of the Parameter
|
||||
* @param req Description of the Parameter
|
||||
*/
|
||||
public void trailerHandler(Response resp, RoRequest req)
|
||||
{
|
||||
}
|
||||
}
|
File diff suppressed because it is too large
Load Diff
File diff suppressed because it is too large
Load Diff
|
@ -1,85 +0,0 @@
|
|||
/* ====================================================================
|
||||
* The Apache Software License, Version 1.1
|
||||
*
|
||||
* Copyright (c) 2001 The Apache Software Foundation. All rights
|
||||
* reserved.
|
||||
*
|
||||
* Redistribution and use in source and binary forms, with or without
|
||||
* modification, are permitted provided that the following conditions
|
||||
* are met:
|
||||
*
|
||||
* 1. Redistributions of source code must retain the above copyright
|
||||
* notice, this list of conditions and the following disclaimer.
|
||||
*
|
||||
* 2. Redistributions in binary form must reproduce the above copyright
|
||||
* notice, this list of conditions and the following disclaimer in
|
||||
* the documentation and/or other materials provided with the
|
||||
* distribution.
|
||||
*
|
||||
* 3. The end-user documentation included with the redistribution,
|
||||
* if any, must include the following acknowledgment:
|
||||
* "This product includes software developed by the
|
||||
* Apache Software Foundation (http://www.apache.org/)."
|
||||
* Alternately, this acknowledgment may appear in the software itself,
|
||||
* if and wherever such third-party acknowledgments normally appear.
|
||||
*
|
||||
* 4. The names "Apache" and "Apache Software Foundation" and
|
||||
* "Apache Lucene" must not be used to endorse or promote products
|
||||
* derived from this software without prior written permission. For
|
||||
* written permission, please contact apache@apache.org.
|
||||
*
|
||||
* 5. Products derived from this software may not be called "Apache",
|
||||
* "Apache Lucene", nor may "Apache" appear in their name, without
|
||||
* prior written permission of the Apache Software Foundation.
|
||||
*
|
||||
* THIS SOFTWARE IS PROVIDED ``AS IS'' AND ANY EXPRESSED OR IMPLIED
|
||||
* WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED WARRANTIES
|
||||
* OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE
|
||||
* DISCLAIMED. IN NO EVENT SHALL THE APACHE SOFTWARE FOUNDATION OR
|
||||
* ITS CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL,
|
||||
* SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT
|
||||
* LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF
|
||||
* USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND
|
||||
* ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY,
|
||||
* OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT
|
||||
* OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF
|
||||
* SUCH DAMAGE.
|
||||
* ====================================================================
|
||||
*
|
||||
* This software consists of voluntary contributions made by many
|
||||
* individuals on behalf of the Apache Software Foundation. For more
|
||||
* information on the Apache Software Foundation, please see
|
||||
* <http://www.apache.org/>.
|
||||
*/
|
||||
|
||||
package de.lanlab.larm.fetcher;
|
||||
|
||||
/**
|
||||
* contains all global constants used in this package
|
||||
* @version $Id$
|
||||
*
|
||||
*/
|
||||
public class Constants
|
||||
{
|
||||
|
||||
/**
|
||||
* user agent string a fetcher task gives to the corresponding server
|
||||
*/
|
||||
public static final String USER_AGENT = "Mozilla/4.06 [en] (WinNT; I)";
|
||||
|
||||
/**
|
||||
* Crawler Identification
|
||||
*/
|
||||
public static final String CRAWLER_AGENT = "Fetcher/0.95";
|
||||
|
||||
/**
|
||||
* size of the temporary buffer to read web documents in
|
||||
*/
|
||||
public final static int FETCHERTASK_READSIZE = 4096;
|
||||
|
||||
/**
|
||||
* maximum number of bytes to read from a single document
|
||||
*/
|
||||
public final static int FETCHERTASK_MAXFILESIZE = 2000000;
|
||||
|
||||
}
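The two size constants above suggest a read loop that pulls a document in fixed-size chunks and stops at a hard cap. The following is a minimal sketch of such a loop, not the original FetcherTask code: the class `CappedReader` and the method `readCapped` are hypothetical names introduced for illustration, and the real crawler may enforce the limit differently.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;

// Hypothetical helper for illustration; not part of the LARM sources.
final class CappedReader
{
    // Values mirror Constants.FETCHERTASK_READSIZE and Constants.FETCHERTASK_MAXFILESIZE.
    static final int READ_SIZE = 4096;
    static final int MAX_FILE_SIZE = 2000000;

    /**
     * Reads from the stream in chunks of at most readSize bytes and returns
     * at most maxSize bytes. Anything beyond the cap is left unread.
     */
    static byte[] readCapped(InputStream in, int readSize, int maxSize) throws IOException
    {
        byte[] buffer = new byte[readSize];
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        int remaining = maxSize;
        while (remaining > 0)
        {
            int n = in.read(buffer, 0, Math.min(buffer.length, remaining));
            if (n < 0)
            {
                break; // end of stream reached before the cap
            }
            out.write(buffer, 0, n);
            remaining -= n;
        }
        return out.toByteArray();
    }

    public static void main(String[] args) throws IOException
    {
        byte[] doc = new byte[10000];
        byte[] read = readCapped(new ByteArrayInputStream(doc), READ_SIZE, 6000);
        System.out.println(read.length); // prints 6000
    }
}
```

Capping inside the loop, instead of checking the size afterwards, keeps memory use bounded even when a server sends an unexpectedly large body.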
|
|
@ -1,119 +0,0 @@
|
|||
/* ====================================================================
|
||||
* The Apache Software License, Version 1.1
|
||||
*
|
||||
* Copyright (c) 2001 The Apache Software Foundation. All rights
|
||||
* reserved.
|
||||
*
|
||||
* Redistribution and use in source and binary forms, with or without
|
||||
* modification, are permitted provided that the following conditions
|
||||
* are met:
|
||||
*
|
||||
* 1. Redistributions of source code must retain the above copyright
|
||||
* notice, this list of conditions and the following disclaimer.
|
||||
*
|
||||
* 2. Redistributions in binary form must reproduce the above copyright
|
||||
* notice, this list of conditions and the following disclaimer in
|
||||
* the documentation and/or other materials provided with the
|
||||
* distribution.
|
||||
*
|
||||
* 3. The end-user documentation included with the redistribution,
|
||||
* if any, must include the following acknowledgment:
|
||||
* "This product includes software developed by the
|
||||
* Apache Software Foundation (http://www.apache.org/)."
|
||||
* Alternately, this acknowledgment may appear in the software itself,
|
||||
* if and wherever such third-party acknowledgments normally appear.
|
||||
*
|
||||
* 4. The names "Apache" and "Apache Software Foundation" and
|
||||
* "Apache Lucene" must not be used to endorse or promote products
|
||||
* derived from this software without prior written permission. For
|
||||
* written permission, please contact apache@apache.org.
|
||||
*
|
||||
* 5. Products derived from this software may not be called "Apache",
|
||||
* "Apache Lucene", nor may "Apache" appear in their name, without
|
||||
* prior written permission of the Apache Software Foundation.
|
||||
*
|
||||
* THIS SOFTWARE IS PROVIDED ``AS IS'' AND ANY EXPRESSED OR IMPLIED
|
||||
* WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED WARRANTIES
|
||||
* OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE
|
||||
* DISCLAIMED. IN NO EVENT SHALL THE APACHE SOFTWARE FOUNDATION OR
|
||||
* ITS CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL,
|
||||
* SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT
|
||||
* LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF
|
||||
* USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND
|
||||
* ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY,
|
||||
* OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT
|
||||
* OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF
|
||||
* SUCH DAMAGE.
|
||||
* ====================================================================
|
||||
*
|
||||
* This software consists of voluntary contributions made by many
|
||||
* individuals on behalf of the Apache Software Foundation. For more
|
||||
* information on the Apache Software Foundation, please see
|
||||
* <http://www.apache.org/>.
|
||||
*/
|
||||
|
||||
package de.lanlab.larm.fetcher;
|
||||
|
||||
import java.util.*;
|
||||
import java.net.*;
|
||||
|
||||
/**
|
||||
* filter class; gets IP addresses from host names and forwards them to
|
||||
* the other parts of the application.
|
||||
* Since URLs cache their IP addresses themselves, and HTTP 1.1 needs the
|
||||
* host names to be sent to the server, this class is not used anymore
|
||||
* @version $Id$
|
||||
*/
|
||||
public class DNSResolver implements MessageListener
|
||||
{
|
||||
|
||||
HashMap ipCache = new HashMap();
|
||||
|
||||
|
||||
public DNSResolver()
|
||||
{
|
||||
}
|
||||
|
||||
public void notifyAddedToMessageHandler(MessageHandler m)
|
||||
{
|
||||
this.messageHandler = m;
|
||||
}
|
||||
|
||||
MessageHandler messageHandler;
|
||||
|
||||
public Message handleRequest(Message message)
|
||||
{
|
||||
if(message instanceof URLMessage)
|
||||
{
|
||||
URL url = ((URLMessage)message).getUrl();
|
||||
String host = url.getHost();
|
||||
InetAddress ip;
|
||||
/*InetAddress ip = (InetAddress)ipCache.get(host);
|
||||
|
||||
if(ip == null)
|
||||
{
|
||||
*/
|
||||
|
||||
try
|
||||
{
|
||||
ip = InetAddress.getByName(host);
|
||||
/*
|
||||
ipCache.put(host, ip);
|
||||
//System.out.println("DNSResolver: new Cache Entry \"" + host + "\" = \"" + ip.getHostAddress() + "\"");*/
|
||||
}
|
||||
catch(UnknownHostException e)
|
||||
{
|
||||
ip = null;
|
||||
return null;
|
||||
//System.out.println("DNSResolver: unknown host \"" + host + "\"");
|
||||
}
|
||||
/*}
|
||||
else
|
||||
{
|
||||
//System.out.println("DNSResolver: Cache hit: " + ip.getHostAddress());
|
||||
}*/
|
||||
//((URLMessage)message).setIpAddress(ip);
|
||||
}
|
||||
return message;
|
||||
}
|
||||
}
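The commented-out lines in `handleRequest` above hint at a host-to-address cache that was later disabled. A minimal sketch of that idea follows, under stated assumptions: `CachingResolver`, `resolve`, and `getLookups` are hypothetical names, the map is typed with generics that the original code predates, and unknown hosts are deliberately not cached.

```java
import java.net.InetAddress;
import java.net.UnknownHostException;
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch for illustration; not part of the LARM sources.
final class CachingResolver
{
    private final Map<String, InetAddress> ipCache = new HashMap<String, InetAddress>();
    private int lookups = 0;

    /** Returns the address for the host, or null if the host is unknown. */
    synchronized InetAddress resolve(String host)
    {
        InetAddress ip = ipCache.get(host);
        if (ip == null)
        {
            try
            {
                ip = InetAddress.getByName(host);
                lookups++;
                ipCache.put(host, ip); // later calls for this host skip the lookup
            }
            catch (UnknownHostException e)
            {
                return null; // unknown hosts are not cached in this sketch
            }
        }
        return ip;
    }

    /** Number of successful lookups that missed the cache. */
    synchronized int getLookups()
    {
        return lookups;
    }

    public static void main(String[] args)
    {
        CachingResolver resolver = new CachingResolver();
        // A literal IP address is only validated, so no DNS query is needed here.
        resolver.resolve("127.0.0.1");
        resolver.resolve("127.0.0.1");
        System.out.println(resolver.getLookups()); // prints 1
    }
}
```

As the class comment notes, `java.net.URL` and the JVM already cache addresses, which is presumably why the original cache was switched off.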
|
|
@ -1,280 +0,0 @@
|
|||
/* ====================================================================
|
||||
* The Apache Software License, Version 1.1
|
||||
*
|
||||
* Copyright (c) 2001 The Apache Software Foundation. All rights
|
||||
* reserved.
|
||||
*
|
||||
* Redistribution and use in source and binary forms, with or without
|
||||
* modification, are permitted provided that the following conditions
|
||||
* are met:
|
||||
*
|
||||
* 1. Redistributions of source code must retain the above copyright
|
||||
* notice, this list of conditions and the following disclaimer.
|
||||
*
|
||||
* 2. Redistributions in binary form must reproduce the above copyright
|
||||
* notice, this list of conditions and the following disclaimer in
|
||||
* the documentation and/or other materials provided with the
|
||||
* distribution.
|
||||
*
|
||||
* 3. The end-user documentation included with the redistribution,
|
||||
* if any, must include the following acknowledgment:
|
||||
* "This product includes software developed by the
|
||||
* Apache Software Foundation (http://www.apache.org/)."
|
||||
* Alternately, this acknowledgment may appear in the software itself,
|
||||
* if and wherever such third-party acknowledgments normally appear.
|
||||
*
|
||||
* 4. The names "Apache" and "Apache Software Foundation" and
|
||||
* "Apache Lucene" must not be used to endorse or promote products
|
||||
* derived from this software without prior written permission. For
|
||||
* written permission, please contact apache@apache.org.
|
||||
*
|
||||
* 5. Products derived from this software may not be called "Apache",
|
||||
* "Apache Lucene", nor may "Apache" appear in their name, without
|
||||
* prior written permission of the Apache Software Foundation.
|
||||
*
|
||||
* THIS SOFTWARE IS PROVIDED ``AS IS'' AND ANY EXPRESSED OR IMPLIED
|
||||
* WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED WARRANTIES
|
||||
* OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE
|
||||
* DISCLAIMED. IN NO EVENT SHALL THE APACHE SOFTWARE FOUNDATION OR
|
||||
* ITS CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL,
|
||||
* SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT
|
||||
* LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF
|
||||
* USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND
|
||||
* ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY,
|
||||
* OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT
|
||||
* OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF
|
||||
* SUCH DAMAGE.
|
||||
* ====================================================================
|
||||
*
|
||||
* This software consists of voluntary contributions made by many
|
||||
* individuals on behalf of the Apache Software Foundation. For more
|
||||
* information on the Apache Software Foundation, please see
|
||||
* <http://www.apache.org/>.
|
||||
*/
|
||||
|
||||
|
||||
package de.lanlab.larm.fetcher;
|
||||
|
||||
import de.lanlab.larm.threads.ThreadPool;
|
||||
import de.lanlab.larm.threads.ThreadPoolObserver;
|
||||
import de.lanlab.larm.threads.InterruptableTask;
|
||||
import de.lanlab.larm.storage.*;
|
||||
|
||||
import java.net.MalformedURLException;
|
||||
import java.net.URL;
|
||||
import java.util.LinkedList;
|
||||
|
||||
import de.lanlab.larm.fetcher.FetcherTask;
|
||||
import de.lanlab.larm.net.*;
|
||||
|
||||
/**
|
||||
* filter class; the Fetcher is the main class which keeps the ThreadPool that
|
||||
* gets the documents. It should be placed at the very end of the MessageQueue,
|
||||
* so that all filtering is done beforehand.
|
||||
*
|
||||
* @author Clemens Marschner
|
||||
* @version $Id$
|
||||
*/
|
||||
|
||||
public class Fetcher implements MessageListener
|
||||
{
|
||||
/**
|
||||
* holds the threads
|
||||
*/
|
||||
ThreadPool fetcherPool;
|
||||
|
||||
/**
|
||||
* total number of docs read
|
||||
*/
|
||||
int docsRead = 0;
|
||||
|
||||
/**
|
||||
* the storage where the docs are saved to
|
||||
*/
|
||||
DocumentStorage storage;
|
||||
|
||||
/**
|
||||
* the storage where the links are saved to
|
||||
*/
|
||||
LinkStorage linkStorage;
|
||||
|
||||
/**
|
||||
* the host manager keeps track of host information
|
||||
*/
|
||||
HostManager hostManager;
|
||||
|
||||
|
||||
/**
|
||||
* initializes the fetcher with the given number of threads in the thread
|
||||
* pool and a document storage.
|
||||
*
|
||||
* @param maxThreads the number of threads in the ThreadPool
|
||||
* @param docStorage the storage where all documents are stored
|
||||
* @param linkStorage the storage where all links are stored
|
||||
* @param hostManager the host manager
|
||||
*/
|
||||
public Fetcher(int maxThreads, DocumentStorage docStorage, LinkStorage linkStorage, HostManager hostManager)
|
||||
{
|
||||
this.storage = docStorage;
|
||||
this.linkStorage = linkStorage;
|
||||
FetcherTask.setDocStorage(docStorage);
|
||||
FetcherTask.setLinkStorage(linkStorage);
|
||||
fetcherPool = new ThreadPool(maxThreads, new FetcherThreadFactory(hostManager));
|
||||
fetcherPool.setQueue(new FetcherTaskQueue(hostManager));
|
||||
docsRead = 0;
|
||||
this.hostManager = hostManager;
|
||||
}
|
||||
|
||||
|
||||
/**
|
||||
* initializes the thread pool with the settings passed to the constructor
|
||||
*/
|
||||
public void init()
|
||||
{
|
||||
fetcherPool.init();
|
||||
}
|
||||
|
||||
|
||||
/**
|
||||
* initializes the thread pool and resets the document counter
|
||||
*
|
||||
* @param maxThreads currently ignored; the pool size is set in the constructor
|
||||
*/
|
||||
public void init(int maxThreads)
|
||||
{
|
||||
fetcherPool.init();
|
||||
docsRead = 0;
|
||||
}
|
||||
|
||||
|
||||
/**
|
||||
* this function will be called by the message handler each time a URL
|
||||
* passes all filters and gets to the fetcher. From here, it will be
|
||||
* distributed to the FetcherPool, a thread pool which carries out the task,
|
||||
* that is to fetch the document from the web.
|
||||
*
|
||||
* @param message the message, which should actually be a URLMessage
|
||||
* @return always null, because the fetcher consumes the message
|
||||
*/
|
||||
public Message handleRequest(Message message)
|
||||
{
|
||||
URLMessage urlMessage = (URLMessage) message;
|
||||
|
||||
fetcherPool.doTask(new FetcherTask(urlMessage), "");
|
||||
docsRead++;
|
||||
|
||||
// eat the message
|
||||
return null;
|
||||
}
|
||||
|
||||
|
||||
/**
|
||||
* called by the message handler when this object is added to it
|
||||
*
|
||||
* @param handler the message handler
|
||||
*/
|
||||
public void notifyAddedToMessageHandler(MessageHandler handler)
|
||||
{
|
||||
this.messageHandler = handler;
|
||||
FetcherTask.setMessageHandler(handler);
|
||||
}
|
||||
|
||||
|
||||
MessageHandler messageHandler;
|
||||
|
||||
|
||||
/**
|
||||
* the thread pool observer will be called each time a thread changes its
|
||||
* state, i.e. from IDLE to RUNNING, and each time the number of thread
|
||||
* queue entries changes.
|
||||
* this just wraps the thread pool method
|
||||
*
|
||||
* @param t the class that implements the ThreadPoolObserver interface
|
||||
*/
|
||||
public void addThreadPoolObserver(ThreadPoolObserver t)
|
||||
{
|
||||
fetcherPool.addThreadPoolObserver(t);
|
||||
}
|
||||
|
||||
|
||||
/**
|
||||
* returns the number of tasks queued. Should return 0 if there are any idle
|
||||
* threads. this method just wraps the ThreadPool method
|
||||
*
|
||||
* @return The queueSize value
|
||||
*/
|
||||
public int getQueueSize()
|
||||
{
|
||||
return fetcherPool.getQueueSize();
|
||||
}
|
||||
|
||||
|
||||
/**
|
||||
* get the total number of threads.
|
||||
* this method just wraps the ThreadPool method
|
||||
*
|
||||
* @return The workingThreadsCount value
|
||||
*/
|
||||
public int getWorkingThreadsCount()
|
||||
{
|
||||
return fetcherPool.getIdleThreadsCount() + fetcherPool.getBusyThreadsCount();
|
||||
}
|
||||
|
||||
|
||||
/**
|
||||
* get the number of threads that are currently idle.
|
||||
* this method just wraps the ThreadPool method
|
||||
*
|
||||
* @return The idleThreadsCount value
|
||||
*/
|
||||
public int getIdleThreadsCount()
|
||||
{
|
||||
return fetcherPool.getIdleThreadsCount();
|
||||
}
|
||||
|
||||
|
||||
/**
|
||||
* get the number of threads that are currently busy.
|
||||
* this method just wraps the ThreadPool method
|
||||
*
|
||||
* @return The busyThreadsCount value
|
||||
*/
|
||||
public int getBusyThreadsCount()
|
||||
{
|
||||
return fetcherPool.getBusyThreadsCount();
|
||||
}
|
||||
|
||||
|
||||
/**
|
||||
* Gets the threadPool attribute of the Fetcher object
|
||||
* beware: the original object is returned
|
||||
*
|
||||
* @TODO remove this / make it private if possible
|
||||
* @return The threadPool value
|
||||
*/
|
||||
public ThreadPool getThreadPool()
|
||||
{
|
||||
return fetcherPool;
|
||||
}
|
||||
|
||||
|
||||
/**
|
||||
* Gets the total number of docs read
|
||||
*
|
||||
* @return number of docs read
|
||||
*/
|
||||
public int getDocsRead()
|
||||
{
|
||||
return docsRead;
|
||||
}
|
||||
|
||||
|
||||
/**
|
||||
* returns the (original) task queue
|
||||
* @TODO remove this if possible
|
||||
* @return The taskQueue value
|
||||
*/
|
||||
public FetcherTaskQueue getTaskQueue()
|
||||
{
|
||||
return (FetcherTaskQueue) this.fetcherPool.getTaskQueue();
|
||||
}
|
||||
}
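`Fetcher.handleRequest` returns null to "eat" a message, which relies on a convention where each listener in the chain either passes a message on or drops it. The sketch below illustrates that convention with simplified, hypothetical stand-ins (`Listener`, `Pipeline`, and plain strings instead of `URLMessage`); it is not the actual `MessageHandler` implementation, which is not shown in this part of the diff.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical, simplified stand-ins for LARM's MessageListener and MessageHandler.
final class FilterChainSketch
{
    interface Listener
    {
        /** Returns the (possibly modified) message, or null to drop it. */
        String handleRequest(String message);
    }

    static final class Pipeline
    {
        private final List<Listener> listeners = new ArrayList<Listener>();

        void addListener(Listener listener)
        {
            listeners.add(listener);
        }

        /** Passes the message down the chain; returns true if it survived every listener. */
        boolean dispatch(String message)
        {
            String current = message;
            for (Listener listener : listeners)
            {
                current = listener.handleRequest(current);
                if (current == null)
                {
                    return false; // a listener filtered out or consumed the message
                }
            }
            return true;
        }
    }

    public static void main(String[] args)
    {
        final List<String> fetched = new ArrayList<String>();
        Pipeline pipeline = new Pipeline();
        // Scope filter: drop every URL outside example.org.
        pipeline.addListener(m -> m.startsWith("http://example.org/") ? m : null);
        // Terminal "fetcher": records the URL and eats the message.
        pipeline.addListener(m -> { fetched.add(m); return null; });

        pipeline.dispatch("http://example.org/a");
        pipeline.dispatch("http://elsewhere.test/b");
        System.out.println(fetched); // prints [http://example.org/a]
    }
}
```

Placing the consuming listener last means that every filter registered before it gets a chance to reject the message first, which matches the class comment's advice to put the Fetcher at the very end.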
|
|
@ -1,205 +0,0 @@
|
|||
/* ====================================================================
|
||||
* The Apache Software License, Version 1.1
|
||||
*
|
||||
* Copyright (c) 2001 The Apache Software Foundation. All rights
|
||||
* reserved.
|
||||
*
|
||||
* Redistribution and use in source and binary forms, with or without
|
||||
* modification, are permitted provided that the following conditions
|
||||
* are met:
|
||||
*
|
||||
* 1. Redistributions of source code must retain the above copyright
|
||||
* notice, this list of conditions and the following disclaimer.
|
||||
*
|
||||
* 2. Redistributions in binary form must reproduce the above copyright
|
||||
* notice, this list of conditions and the following disclaimer in
|
||||
* the documentation and/or other materials provided with the
|
||||
* distribution.
|
||||
*
|
||||
* 3. The end-user documentation included with the redistribution,
|
||||
* if any, must include the following acknowledgment:
|
||||
* "This product includes software developed by the
|
||||
* Apache Software Foundation (http://www.apache.org/)."
|
||||
* Alternately, this acknowledgment may appear in the software itself,
|
||||
* if and wherever such third-party acknowledgments normally appear.
|
||||
*
|
||||
* 4. The names "Apache" and "Apache Software Foundation" and
|
||||
* "Apache Lucene" must not be used to endorse or promote products
|
||||
* derived from this software without prior written permission. For
|
||||
* written permission, please contact apache@apache.org.
|
||||
*
|
||||
* 5. Products derived from this software may not be called "Apache",
|
||||
* "Apache Lucene", nor may "Apache" appear in their name, without
|
||||
* prior written permission of the Apache Software Foundation.
|
||||
*
|
||||
* THIS SOFTWARE IS PROVIDED ``AS IS'' AND ANY EXPRESSED OR IMPLIED
|
||||
* WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED WARRANTIES
|
||||
* OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE
|
||||
* DISCLAIMED. IN NO EVENT SHALL THE APACHE SOFTWARE FOUNDATION OR
|
||||
* ITS CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL,
|
||||
* SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT
|
||||
* LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF
|
||||
* USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND
|
||||
* ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY,
|
||||
* OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT
|
||||
* OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF
|
||||
* SUCH DAMAGE.
|
||||
* ====================================================================
|
||||
*
|
||||
* This software consists of voluntary contributions made by many
|
||||
* individuals on behalf of the Apache Software Foundation. For more
|
||||
* information on the Apache Software Foundation, please see
|
||||
* <http://www.apache.org/>.
|
||||
*/
|
||||
|
||||
package de.lanlab.larm.fetcher;
|
||||
|
||||
import java.awt.event.ActionListener;
|
||||
import java.awt.event.ActionEvent;
|
||||
import java.net.MalformedURLException;
|
||||
import java.net.URL;
|
||||
import java.util.*;
|
||||
import java.awt.event.*;
|
||||
import de.lanlab.larm.gui.*;
|
||||
import de.lanlab.larm.threads.*;
|
||||
|
||||
/**
|
||||
* this was used to connect the GUI to the fetcher
|
||||
* @TODO put this into the GUI package, probably?
|
||||
* @version $Id$
|
||||
*/
|
||||
public class FetcherGUIController implements ActionListener
|
||||
{
|
||||
FetcherMain fetcherMain;
|
||||
FetcherSummaryFrame fetcherFrame;
|
||||
|
||||
|
||||
public FetcherGUIController(FetcherMain fetcherMainPrg, FetcherSummaryFrame fetcherFrameWin, String defaultStartURL)
|
||||
{
|
||||
this.fetcherMain = fetcherMainPrg;
|
||||
this.fetcherFrame = fetcherFrameWin;
|
||||
|
||||
fetcherFrame.setRestrictTo(fetcherMain.urlScopeFilter.getRexString());
|
||||
fetcherFrame.setStartURL(defaultStartURL);
|
||||
|
||||
fetcherMain.fetcher.addThreadPoolObserver(
|
||||
new ThreadPoolObserver()
|
||||
{
|
||||
public void threadUpdate(int threadNr, String action, String info)
|
||||
{
|
||||
String status = threadNr + ": " + action + ": " + info;
|
||||
fetcherFrame.setIdleThreadsCount(fetcherMain.fetcher.getIdleThreadsCount());
|
||||
fetcherFrame.setBusyThreadsCount(fetcherMain.fetcher.getBusyThreadsCount());
|
||||
fetcherFrame.setWorkingThreadsCount(fetcherMain.fetcher.getWorkingThreadsCount());
|
||||
}
|
||||
|
||||
public void queueUpdate(String info, String action)
|
||||
{
|
||||
fetcherFrame.setRequestQueueCount(fetcherMain.fetcher.getQueueSize());
|
||||
}
|
||||
}
|
||||
);
|
||||
|
||||
fetcherMain.monitor.addObserver(new Observer()
|
||||
{
|
||||
public void update(Observable o, Object arg)
|
||||
{
|
||||
// the ThreadMonitor has been updated
|
||||
//fetcherFrame.setStalledThreads(fetcherMain.monitor.getStalledThreadCount(10, 500.0));
|
||||
//fetcherFrame.setBytesPerSecond(fetcherMain.monitor.getAverageReadCount(5));
|
||||
// fetcherFrame.setDocsPerSecond(fetcherMain.monitor.getDocsPerSecond(5));
|
||||
// we take the opportunity to display the current memory usage
|
||||
fetcherFrame.setFreeMem(Runtime.getRuntime().freeMemory());
|
||||
fetcherFrame.setTotalMem(Runtime.getRuntime().totalMemory());
|
||||
|
||||
}
|
||||
|
||||
});
|
||||
|
||||
/* fetcherMain.reFilter.addObserver(
|
||||
new Observer()
|
||||
{
|
||||
public void update(Observable o, Object arg)
|
||||
{
|
||||
fetcherFrame.setRobotsTxtCount(fetcherMain.reFilter.getExcludingHostsCount());
|
||||
}
|
||||
}
|
||||
);*/
|
||||
|
||||
fetcherMain.messageHandler.addMessageQueueObserver(new Observer()
|
||||
{
|
||||
public void update(Observable o, Object arg)
|
||||
{
|
||||
// a message has been added or deleted
|
||||
|
||||
fetcherFrame.setURLsQueued(fetcherMain.messageHandler.getQueued());
|
||||
}
|
||||
|
||||
}
|
||||
);
|
||||
|
||||
// this observer will be called if a filter has decided to throw a
|
||||
// message away.
|
||||
fetcherMain.messageHandler.addMessageProcessorObserver(new Observer()
|
||||
{
|
||||
public void update(Observable o, Object arg)
|
||||
{
|
||||
if(arg == fetcherMain.urlScopeFilter)
|
||||
{
|
||||
fetcherFrame.setScopeFiltered(fetcherMain.urlScopeFilter.getFiltered());
|
||||
}
|
||||
else if(arg == fetcherMain.urlVisitedFilter)
|
||||
{
|
||||
fetcherFrame.setVisitedFiltered(fetcherMain.urlVisitedFilter.getFiltered());
|
||||
}
|
||||
else if(arg == fetcherMain.reFilter)
|
||||
{
|
||||
fetcherFrame.setURLsCaughtCount(fetcherMain.reFilter.getFiltered());
|
||||
}
|
||||
else // it's the fetcher
|
||||
{
|
||||
fetcherFrame.setDocsRead(fetcherMain.fetcher.getDocsRead());
|
||||
}
|
||||
}
|
||||
}
|
||||
);
|
||||
|
||||
fetcherFrame.addWindowListener(
|
||||
new WindowAdapter()
|
||||
{
|
||||
public void windowClosed(WindowEvent e)
|
||||
{
|
||||
System.out.println("window Closed");
|
||||
System.exit(0);
|
||||
}
|
||||
|
||||
|
||||
}
|
||||
);
|
||||
|
||||
fetcherFrame.addStartButtonListener((ActionListener)this);
|
||||
}
|
||||
|
||||
/**
|
||||
* will be called when the start button is pressed
|
||||
*/
|
||||
public void actionPerformed(ActionEvent e)
|
||||
{
|
||||
System.out.println("Adding start URL");
|
||||
try
|
||||
{
|
||||
// urlVisitedFilter.printAllURLs();
|
||||
// urlVisitedFilter.clearHashtable();
|
||||
fetcherMain.setRexString(fetcherFrame.getRestrictTo());
|
||||
fetcherMain.startMonitor();
|
||||
fetcherMain.putURL(new URL(fetcherFrame.getStartURL()), false);
|
||||
}
|
||||
catch(Exception ex)
|
||||
{
|
||||
System.out.println("actionPerformed: Exception: " + ex.getMessage());
|
||||
}
|
||||
}
|
||||
|
||||
}
|
||||
|
||||
|
|
@ -1,571 +0,0 @@
|
|||
/* ====================================================================
|
||||
* The Apache Software License, Version 1.1
|
||||
*
|
||||
* Copyright (c) 2001 The Apache Software Foundation. All rights
|
||||
* reserved.
|
||||
*
|
||||
* Redistribution and use in source and binary forms, with or without
|
||||
* modification, are permitted provided that the following conditions
|
||||
* are met:
|
||||
*
|
||||
* 1. Redistributions of source code must retain the above copyright
|
||||
* notice, this list of conditions and the following disclaimer.
|
||||
*
|
||||
* 2. Redistributions in binary form must reproduce the above copyright
|
||||
* notice, this list of conditions and the following disclaimer in
|
||||
* the documentation and/or other materials provided with the
|
||||
* distribution.
|
||||
*
|
||||
* 3. The end-user documentation included with the redistribution,
|
||||
* if any, must include the following acknowledgment:
|
||||
* "This product includes software developed by the
|
||||
* Apache Software Foundation (http://www.apache.org/)."
|
||||
* Alternately, this acknowledgment may appear in the software itself,
|
||||
* if and wherever such third-party acknowledgments normally appear.
|
||||
*
|
||||
* 4. The names "Apache" and "Apache Software Foundation" and
|
||||
* "Apache Lucene" must not be used to endorse or promote products
|
||||
* derived from this software without prior written permission. For
|
||||
* written permission, please contact apache@apache.org.
|
||||
*
|
||||
* 5. Products derived from this software may not be called "Apache",
|
||||
* "Apache Lucene", nor may "Apache" appear in their name, without
|
||||
* prior written permission of the Apache Software Foundation.
|
||||
*
|
||||
* THIS SOFTWARE IS PROVIDED ``AS IS'' AND ANY EXPRESSED OR IMPLIED
|
||||
* WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED WARRANTIES
|
||||
* OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE
|
||||
* DISCLAIMED. IN NO EVENT SHALL THE APACHE SOFTWARE FOUNDATION OR
|
||||
* ITS CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL,
|
||||
* SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT
|
||||
* LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF
|
||||
* USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND
|
||||
* ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY,
|
||||
* OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT
|
||||
* OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF
|
||||
* SUCH DAMAGE.
|
||||
* ====================================================================
|
||||
*
|
||||
* This software consists of voluntary contributions made by many
|
||||
* individuals on behalf of the Apache Software Foundation. For more
|
||||
* information on the Apache Software Foundation, please see
|
||||
* <http://www.apache.org/>.
|
||||
*/
|
||||
|
||||
package de.lanlab.larm.fetcher;
|
||||
|
||||
import de.lanlab.larm.threads.ThreadPoolObserver;
|
||||
import de.lanlab.larm.threads.ThreadPool;
|
||||
import de.lanlab.larm.util.*;
|
||||
import de.lanlab.larm.storage.*;
|
||||
import de.lanlab.larm.net.*;
|
||||
import HTTPClient.*;
|
||||
import org.apache.oro.text.regex.MalformedPatternException;
|
||||
import java.io.*;
|
||||
import java.net.MalformedURLException;
|
||||
import java.net.URL;
|
||||
import java.util.*;
|
||||
|
||||
|
||||
/**
|
||||
* ENTRY POINT: this class contains the main()-method of the application, does
|
||||
* all the initializing and optionally connects the fetcher with the GUI.
|
||||
*
|
||||
* @author Clemens Marschner
|
||||
* @created December 16, 2000
|
||||
* @version $Id$
|
||||
*/
|
||||
public class FetcherMain
|
||||
{
|
||||
|
||||
/**
|
||||
* the main message pipeline
|
||||
*/
|
||||
protected MessageHandler messageHandler;
|
||||
|
||||
/**
|
||||
* this filter records all incoming URLs and filters everything it already
|
||||
* knows
|
||||
*/
|
||||
protected URLVisitedFilter urlVisitedFilter;
|
||||
|
||||
/**
|
||||
* the scope filter filters URLs that fall out of the scope given by the
|
||||
* regular expression
|
||||
*/
|
||||
protected URLScopeFilter urlScopeFilter;
|
||||
|
||||
/*
|
||||
* The DNS resolver was supposed to hold the host addresses for all hosts
|
||||
* this is done by URL itself today
|
||||
*
|
||||
* protected DNSResolver dnsResolver;
|
||||
*/
|
||||
|
||||
/**
|
||||
* the robot exclusion filter looks if a robots.txt is present on a host
|
||||
* before it is first accessed
|
||||
*/
|
||||
protected RobotExclusionFilter reFilter;
|
||||
|
||||
/**
|
||||
* the host manager keeps track of all hosts and is used by the filters.
|
||||
*/
|
||||
protected HostManager hostManager;
|
||||
|
||||
/**
|
||||
* the host resolver can change a host that occurs within a URL to a different
|
||||
* host, depending on the rules specified in a configuration file
|
||||
*/
|
||||
protected HostResolver hostResolver;
|
||||
|
||||
/**
|
||||
* this rather flaky filter just filters out some URLs, e.g. the different views
|
||||
* generated by the Apache DirIndex module. Has to be made
|
||||
* configurable in the near future
|
||||
*/
|
||||
protected KnownPathsFilter knownPathsFilter;
|
||||
|
||||
/**
|
||||
* the URL length filter filters URLs that are too long, e.g. because of errors
|
||||
* in the implementation of dynamic web sites
|
||||
*/
|
||||
protected URLLengthFilter urlLengthFilter;
|
||||
|
||||
|
||||
/**
|
||||
* this is the main document fetcher. It contains a thread pool that fetches the
|
||||
* documents and stores them
|
||||
*/
|
||||
protected Fetcher fetcher;
|
||||
|
||||
/**
|
||||
* the thread monitor once was only a monitoring tool, but now has become a
|
||||
* vital part of the system that computes statistics and
|
||||
* flushes the log file buffers
|
||||
*/
|
||||
protected ThreadMonitor monitor;
|
||||
|
||||
/**
|
||||
* the storage is a central class that puts all fetched documents somewhere.
|
||||
* Several different implementations exist.
|
||||
*/
|
||||
protected DocumentStorage storage;
|
||||
|
||||
/**
|
||||
* initializes all classes and registers anonymous adapter classes as
|
||||
* listeners for fetcher events.
|
||||
*
|
||||
* @param nrThreads number of fetcher threads to be created
|
||||
* @param hostResolverFile configuration file with the rules for the host resolver
|
||||
*/
|
||||
public FetcherMain(int nrThreads, String hostResolverFile) throws Exception
|
||||
{
|
||||
// to make things clear, this method is commented a bit better than
|
||||
// the rest of the program...
|
||||
|
||||
// this is the main message queue. handlers are registered with
|
||||
// the queue, and whenever a message is put in it, the message is passed to the
|
||||
// filters in a "chain of responsibility" manner. Every listener can decide
|
||||
// to throw the message away
|
||||
messageHandler = new MessageHandler();
|
||||
|
||||
// the storage is the class which saves a WebDocument somewhere, no
|
||||
// matter how it does it, whether it's in a file, in a database or
|
||||
// whatever
|
||||
|
||||
// example for the (very slow) SQL Server storage:
|
||||
// this.storage = new SQLServerStorage("sun.jdbc.odbc.JdbcOdbcDriver","jdbc:odbc:search","sa","...",nrThreads);
|
||||
|
||||
// the LogStorage used here does extensive logging. It logs all links and
|
||||
// document information.
|
||||
// it also saves all documents to page files.
|
||||
File logsDir = new File("logs");
|
||||
logsDir.mkdir(); // ensure log directory exists
|
||||
|
||||
// in this experimental implementation, the crawler is pretty verbose
|
||||
// the SimpleLogger, however, is a FlyWeight logger which is buffered and
|
||||
// not thread safe by default
|
||||
SimpleLogger storeLog = new SimpleLogger("store", /* add date/time? */ false);
|
||||
SimpleLogger visitedLog = new SimpleLogger("URLVisitedFilter", /* add date/time? */ false);
|
||||
SimpleLogger scopeLog = new SimpleLogger("URLScopeFilter", /* add date/time? */ false);
|
||||
SimpleLogger pathsLog = new SimpleLogger("KnownPathsFilter", /* add date/time? */ false);
|
||||
SimpleLogger linksLog = new SimpleLogger("links", /* add date/time? */ false);
|
||||
SimpleLogger lengthLog = new SimpleLogger("length", /* add date/time? */ false);
|
||||
|
||||
StoragePipeline storage = new StoragePipeline();
|
||||
|
||||
|
||||
// in the default configuration, the crawler will only save the document
|
||||
// information to store.log and the link information to links.log
|
||||
// The contents of the files are _not_ saved. If you set
|
||||
// "save in page files" to "true", they will be saved in "page files",
|
||||
// binary files each containing a set of documents. Here, the
|
||||
// maximum file size is ~50 MB (crawled files won't be split up into different
|
||||
// files). The logs/store.log file contains pointers to these files: a page
|
||||
// file number, the offset within that file, and the document's length
|
||||
|
||||
// FIXME: default constructor for all storages + bean access methods
|
||||
storage.addDocStorage(new LogStorage(storeLog, /* save in page files? */ true,
|
||||
/* page file prefix */ "logs/pagefile"));
|
||||
storage.addLinkStorage(new LinkLogStorage(linksLog));
|
||||
storage.addLinkStorage(messageHandler);
|
||||
/*
|
||||
// experimental Lucene storage. will slow the crawler down *a lot*
|
||||
LuceneStorage luceneStorage = new LuceneStorage();
|
||||
luceneStorage.setAnalyzer(new org.apache.lucene.analysis.de.GermanAnalyzer());
|
||||
luceneStorage.setCreate(true);
|
||||
// FIXME: index name and path need to be configurable
|
||||
luceneStorage.setIndexName("luceneIndex");
|
||||
// the field names come from URLMessage.java and WebDocument.java. See
|
||||
// LuceneStorage source for details
|
||||
luceneStorage.setFieldInfo("url", LuceneStorage.INDEX | LuceneStorage.STORE);
|
||||
luceneStorage.setFieldInfo("content", LuceneStorage.INDEX | LuceneStorage.STORE | LuceneStorage.TOKEN);
|
||||
storage.addDocStorage(luceneStorage);
|
||||
*/
|
||||
|
||||
storage.open();
|
||||
|
||||
//storage.addStorage(new JMSStorage(...));
|
||||
|
||||
// create the filters and add them to the message queue
|
||||
urlScopeFilter = new URLScopeFilter(scopeLog);
|
||||
|
||||
// dnsResolver = new DNSResolver();
|
||||
hostManager = new HostManager(1000);
|
||||
hostResolver = new HostResolver();
|
||||
if(hostResolverFile != null && !"".equals(hostResolverFile))
|
||||
{
|
||||
hostResolver.initFromFile(hostResolverFile);
|
||||
}
|
||||
hostManager.setHostResolver(hostResolver);
|
||||
|
||||
// hostManager.addSynonym("www.fachsprachen.uni-muenchen.de", "www.fremdsprachen.uni-muenchen.de");
|
||||
// hostManager.addSynonym("www.uni-muenchen.de", "www.lmu.de");
|
||||
// hostManager.addSynonym("www.uni-muenchen.de", "uni-muenchen.de");
|
||||
// hostManager.addSynonym("webinfo.uni-muenchen.de", "www.webinfo.uni-muenchen.de");
|
||||
// hostManager.addSynonym("webinfo.uni-muenchen.de", "webinfo.campus.lmu.de");
|
||||
// hostManager.addSynonym("www.s-a.uni-muenchen.de", "s-a.uni-muenchen.de");
|
||||
|
||||
reFilter = new RobotExclusionFilter(hostManager);
|
||||
|
||||
fetcher = new Fetcher(nrThreads, storage, storage, hostManager);
|
||||
|
||||
urlLengthFilter = new URLLengthFilter(500, lengthLog);
|
||||
|
||||
//knownPathsFilter = new KnownPathsFilter()
|
||||
|
||||
// prevent message box popups
|
||||
HTTPConnection.setDefaultAllowUserInteraction(false);
|
||||
|
||||
// prevent GZipped files from being decoded
|
||||
HTTPConnection.removeDefaultModule(HTTPClient.ContentEncodingModule.class);
|
||||
|
||||
urlVisitedFilter = new URLVisitedFilter(visitedLog, 100000);
|
||||
|
||||
// initialize the threads
|
||||
fetcher.init();
|
||||
|
||||
// the thread monitor watches the thread pool.
|
||||
|
||||
monitor = new ThreadMonitor(urlLengthFilter,
|
||||
urlVisitedFilter,
|
||||
urlScopeFilter,
|
||||
/*dnsResolver,*/
|
||||
reFilter,
|
||||
messageHandler,
|
||||
fetcher.getThreadPool(),
|
||||
hostManager,
|
||||
5000 // wake up every 5 seconds
|
||||
);
|
||||
|
||||
|
||||
// add all filters to the handler.
|
||||
messageHandler.addListener(urlLengthFilter);
|
||||
messageHandler.addListener(urlScopeFilter);
|
||||
messageHandler.addListener(reFilter);
|
||||
messageHandler.addListener(urlVisitedFilter);
|
||||
//messageHandler.addListener(knownPathsFilter);
|
||||
|
||||
messageHandler.addListener(fetcher);
|
||||
|
||||
//uncomment this to enable HTTPClient logging
|
||||
/*
|
||||
try
|
||||
{
|
||||
HTTPClient.Log.setLogWriter(new java.io.OutputStreamWriter(System.out) //new java.io.FileWriter("logs/HttpClient.log")
|
||||
,false);
|
||||
HTTPClient.Log.setLogging(HTTPClient.Log.ALL, true);
|
||||
}
|
||||
catch (Exception e)
|
||||
{
|
||||
e.printStackTrace();
|
||||
}
|
||||
*/
|
||||
|
||||
}
|
||||
|
||||
|
||||
/**
|
||||
* Sets the regular expression that <code>URLScopeFilter</code> uses to restrict crawled URLs.
|
||||
*
|
||||
* @param restrictTo a Perl 5 regular expression each URL must match
|
||||
*/
|
||||
public void setRexString(String restrictTo) throws MalformedPatternException
|
||||
{
|
||||
urlScopeFilter.setRexString(restrictTo);
|
||||
}
|
||||
|
||||
|
||||
/**
|
||||
* Puts a URL into the message queue as a new URLMessage.
|
||||
*
|
||||
* @param url the URL to put into the message queue
|
||||
* @param isFrame true if the URL is a frame link, false if it is an anchor link
|
||||
*/
|
||||
public void putURL(URL url, boolean isFrame)
|
||||
// throws java.net.MalformedURLException
|
||||
{
|
||||
try
|
||||
{
|
||||
messageHandler.putMessage(new URLMessage(url, null, isFrame ? URLMessage.LINKTYPE_FRAME : URLMessage.LINKTYPE_ANCHOR, null, this.hostResolver));
|
||||
}
|
||||
catch (Exception e)
|
||||
{
|
||||
// FIXME: replace with logging
|
||||
System.out.println("Exception: " + e.getMessage());
|
||||
e.printStackTrace();
|
||||
}
|
||||
}
|
||||
|
||||
|
||||
/**
|
||||
* Starts the thread monitor, which watches the thread pool and prints status messages.
|
||||
*/
|
||||
public void startMonitor()
|
||||
{
|
||||
monitor.start();
|
||||
}
|
||||
|
||||
|
||||
|
||||
/*
|
||||
* the GUI is not working at this time. It was used in the very beginning, but
|
||||
* synchronous updates turned out to slow down the program a lot, even if the
|
||||
* GUI was turned off. Thus, a lot
|
||||
* of Observer messages were removed later. Nonetheless, it's quite cool to see
|
||||
* it working...
|
||||
*
|
||||
* @param f Description of Parameter
|
||||
* @param startURL Description of Parameter
|
||||
*/
|
||||
|
||||
/*
|
||||
public void initGui(FetcherMain f, String startURL)
|
||||
{
|
||||
// if we're on a windows platform, make it look a bit more convenient
|
||||
try
|
||||
{
|
||||
UIManager.setLookAndFeel(UIManager.getSystemLookAndFeelClassName());
|
||||
}
|
||||
catch (Exception e)
|
||||
{
|
||||
// well, never mind then...
|
||||
}
|
||||
System.out.println("Init FetcherFrame");
|
||||
|
||||
FetcherSummaryFrame fetcherFrame;
|
||||
fetcherFrame = new FetcherSummaryFrame();
|
||||
fetcherFrame.setSize(640, 450);
|
||||
fetcherFrame.setVisible(true);
|
||||
FetcherGUIController guiController = new FetcherGUIController(f, fetcherFrame, startURL);
|
||||
}
|
||||
*/
|
||||
|
||||
|
||||
/**
|
||||
* The main program.
|
||||
*
|
||||
* @param args The command line arguments
|
||||
*/
|
||||
public static void main(String[] args) throws Exception
|
||||
{
|
||||
int nrThreads = 10;
|
||||
|
||||
ArrayList startURLs = new ArrayList();
|
||||
String restrictTo = ".*";
|
||||
boolean gui = false;
|
||||
boolean showInfo = false;
|
||||
String hostResolverFile = "";
|
||||
System.out.println("LARM - LANLab Retrieval Machine - Fetcher - V 1.00 - B.20020914");
|
||||
// FIXME: consider using Jakarta Commons' CLI package for command line parameters
|
||||
|
||||
for (int i = 0; i < args.length; i++)
|
||||
{
|
||||
if (args[i].equals("-start"))
|
||||
{
|
||||
i++;
|
||||
String arg = args[i];
|
||||
if(arg.startsWith("@"))
|
||||
{
|
||||
// input is a file with one URL per line
|
||||
String fileName = arg.substring(1);
|
||||
System.out.println("reading URL file " + fileName);
|
||||
try
|
||||
{
|
||||
BufferedReader r = new BufferedReader(new FileReader(fileName));
|
||||
String line;
|
||||
int count=0;
|
||||
while((line = r.readLine()) != null)
|
||||
{
|
||||
try
|
||||
{
|
||||
startURLs.add(new URL(line));
|
||||
count++;
|
||||
}
|
||||
catch (MalformedURLException e)
|
||||
{
|
||||
System.out.println("Malformed URL '" + line + "' in line " + (count+1) + " of file " + fileName);
|
||||
|
||||
}
|
||||
}
|
||||
r.close();
|
||||
System.out.println("added " + count + " URLs from " + fileName);
|
||||
}
|
||||
catch(IOException e)
|
||||
{
|
||||
System.out.println("Couldn't read '" + fileName + "': " + e);
|
||||
}
|
||||
}
|
||||
else
|
||||
{
|
||||
System.out.println("got URL " + arg);
|
||||
try
|
||||
{
|
||||
startURLs.add(new URL(arg));
|
||||
System.out.println("Start-URL added: " + arg);
|
||||
}
|
||||
catch (MalformedURLException e)
|
||||
{
|
||||
System.out.println("Malformed URL '" + arg + "'");
|
||||
|
||||
}
|
||||
}
|
||||
}
|
||||
else if (args[i].equals("-restrictto"))
|
||||
{
|
||||
i++;
|
||||
restrictTo = args[i];
|
||||
System.out.println("Restricting URLs to " + restrictTo);
|
||||
}
|
||||
else if (args[i].equals("-threads"))
|
||||
{
|
||||
i++;
|
||||
nrThreads = Integer.parseInt(args[i]);
|
||||
System.out.println("Threads set to " + nrThreads);
|
||||
}
|
||||
else if (args[i].equals("-hostresolver"))
|
||||
{
|
||||
i++;
|
||||
hostResolverFile = args[i];
|
||||
System.out.println("reading host resolver props from '" + hostResolverFile + "'");
|
||||
|
||||
}
|
||||
else if (args[i].equals("-gui"))
|
||||
{
|
||||
gui = true;
|
||||
}
|
||||
else if (args[i].equals("-?"))
|
||||
{
|
||||
showInfo = true;
|
||||
}
|
||||
else
|
||||
{
|
||||
System.out.println("Unknown option: " + args[i] + "; use -? to get syntax");
|
||||
System.exit(1);
|
||||
}
|
||||
}
|
||||
|
||||
//URL.setURLStreamHandlerFactory(new HttpTimeoutFactory(500));
|
||||
// replaced by HTTPClient
|
||||
|
||||
FetcherMain f = new FetcherMain(nrThreads, hostResolverFile);
|
||||
if (showInfo || (startURLs.isEmpty() && gui == false))
|
||||
{
|
||||
System.out.println("The LARM crawler\n" +
|
||||
"\n" +
|
||||
"The LARM crawler is a fast parallel crawler, currently designed for\n" +
|
||||
"large intranets (up to a couple hundred hosts with some hundred thousand\n" +
|
||||
"documents). It is currently restricted by a relatively high memory overhead\n" +
|
||||
"per crawled host, and by a HashMap of already crawled URLs which is also held\n" +
|
||||
"in memory.\n" +
|
||||
"\n" +
|
||||
"Usage: FetcherMain <-start <URL>|@<filename>>+ -restrictto <RegEx>\n" +
|
||||
" [-threads <nr=10>] [-hostresolver <filename>]\n" +
|
||||
"\n" +
|
||||
"Commands:\n" +
|
||||
" -start specify one or more URLs to start with. You can also specify a file\n" +
|
||||
" that contains URLs, one per line\n" +
|
||||
" -restrictto a Perl 5 regular expression each URL must match. It is run against the\n" +
|
||||
" _complete_ URL, including the http:// part\n" +
|
||||
" -threads the number of crawling threads. defaults to 10\n" +
|
||||
" -hostresolver specify a file that contains rules for changing the host part of \n" +
|
||||
" a URL during the normalization process (experimental).\n" +
|
||||
"Caution: The <RegEx> is applied to the _normalized_ form of a URL.\n" +
|
||||
" See URLNormalizer for details\n" +
|
||||
"Example:\n" +
|
||||
" -start @urls1.txt -start @urls2.txt -start http://localhost/ " +
|
||||
" -restrictto http://[^/]*\\.localhost/.* -threads 25\n" +
|
||||
"\n" +
|
||||
"The host resolver file may contain the following commands: \n" +
|
||||
" startsWith(part1) = part2\n" +
|
||||
" if host starts with part1, this part will be replaced by part2\n" +
|
||||
" endsWith(part1) = part2\n" +
|
||||
" if host ends with part1, this part will be replaced by part2. This is done after\n" +
|
||||
" startsWith was processed\n" +
|
||||
" synonym(host1) = host2\n" +
|
||||
" the keywords startsWith, endsWith and synonym are case sensitive\n" +
|
||||
" host1 will be replaced with host2. this is done _after_ startsWith and endsWith were \n" +
|
||||
" processed. Due to a bug in BeanUtils, dots are not allowed in the keys (in parentheses)\n" +
|
||||
" and have to be escaped with commas. To simplify, commas are also replaced in property \n" +
|
||||
" values. So just use commas instead of dots. The resulting host names are only used for \n" +
|
||||
" comparisons and do not have to be existing URLs (although the syntax has to be valid).\n" +
|
||||
" However, the names will often be passed to java.net.URL which will try to make a DNS name\n" +
|
||||
" resolution, which will time out if the server can't be found. \n" +
|
||||
" Example:\n" +
|
||||
" synonym(www1,host,com) = host,com\n" +
|
||||
" startsWith(www,) = ,\n" +
|
||||
" endsWith(host1,com) = host,com\n" +
|
||||
"The crawler will show a status message every 5 seconds, which is printed by ThreadMonitor.java\n" +
|
||||
"It will stop after the ThreadMonitor found the message queue and the crawling threads to be idle a \n" +
|
||||
"couple of times.\n" +
|
||||
"The crawled data will be saved within a logs/ directory. A cachingqueue/ directory is used for\n" +
|
||||
"temporary queues.\n" +
|
||||
"Note that this implementation is experimental, and that the command line options cover only a part \n" +
|
||||
"of the parameters. Much of the configuration can only be done by modifying FetcherMain.java\n");
|
||||
System.exit(0);
|
||||
}
|
||||
try
|
||||
{
|
||||
f.setRexString(restrictTo);
|
||||
|
||||
if (gui)
|
||||
{
|
||||
// f.initGui(f, startURL);
|
||||
// the GUI is no longer supported
|
||||
}
|
||||
else
|
||||
{
|
||||
f.startMonitor();
|
||||
for(Iterator it = startURLs.iterator(); it.hasNext(); )
|
||||
{
|
||||
f.putURL((URL)it.next(), false);
|
||||
}
|
||||
}
|
||||
}
|
||||
catch (MalformedPatternException e)
|
||||
{
|
||||
System.out.println("Wrong RegEx syntax. Must be a valid PERL RE");
|
||||
}
|
||||
}
|
||||
}
|
|
@ -1,882 +0,0 @@
|
|||
/*
|
||||
* ====================================================================
|
||||
* The Apache Software License, Version 1.1
|
||||
*
|
||||
* Copyright (c) 2001 The Apache Software Foundation. All rights
|
||||
* reserved.
|
||||
*
|
||||
* Redistribution and use in source and binary forms, with or without
|
||||
* modification, are permitted provided that the following conditions
|
||||
* are met:
|
||||
*
|
||||
* 1. Redistributions of source code must retain the above copyright
|
||||
* notice, this list of conditions and the following disclaimer.
|
||||
*
|
||||
* 2. Redistributions in binary form must reproduce the above copyright
|
||||
* notice, this list of conditions and the following disclaimer in
|
||||
* the documentation and/or other materials provided with the
|
||||
* distribution.
|
||||
*
|
||||
* 3. The end-user documentation included with the redistribution,
|
||||
* if any, must include the following acknowledgment:
|
||||
* "This product includes software developed by the
|
||||
* Apache Software Foundation (http://www.apache.org/)."
|
||||
* Alternately, this acknowledgment may appear in the software itself,
|
||||
* if and wherever such third-party acknowledgments normally appear.
|
||||
*
|
||||
* 4. The names "Apache" and "Apache Software Foundation" and
|
||||
* "Apache Lucene" must not be used to endorse or promote products
|
||||
* derived from this software without prior written permission. For
|
||||
* written permission, please contact apache@apache.org.
|
||||
*
|
||||
* 5. Products derived from this software may not be called "Apache",
|
||||
* "Apache Lucene", nor may "Apache" appear in their name, without
|
||||
* prior written permission of the Apache Software Foundation.
|
||||
*
|
||||
* THIS SOFTWARE IS PROVIDED ``AS IS'' AND ANY EXPRESSED OR IMPLIED
|
||||
* WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED WARRANTIES
|
||||
* OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE
|
||||
* DISCLAIMED. IN NO EVENT SHALL THE APACHE SOFTWARE FOUNDATION OR
|
||||
* ITS CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL,
|
||||
* SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT
|
||||
* LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF
|
||||
* USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND
|
||||
* ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY,
|
||||
* OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT
|
||||
* OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF
|
||||
* SUCH DAMAGE.
|
||||
* ====================================================================
|
||||
*
|
||||
* This software consists of voluntary contributions made by many
|
||||
* individuals on behalf of the Apache Software Foundation. For more
|
||||
* information on the Apache Software Foundation, please see
|
||||
* <http://www.apache.org/>.
|
||||
*/
|
||||
package de.lanlab.larm.fetcher;
|
||||
|
||||
import java.net.URL;
|
||||
import de.lanlab.larm.threads.*;
|
||||
import de.lanlab.larm.util.InputStreamObserver;
|
||||
import de.lanlab.larm.util.ObservableInputStream;
|
||||
import de.lanlab.larm.util.WebDocument;
|
||||
import de.lanlab.larm.util.SimpleCharArrayReader;
|
||||
import de.lanlab.larm.storage.DocumentStorage;
|
||||
import de.lanlab.larm.storage.LinkStorage;
|
||||
|
||||
import de.lanlab.larm.util.State;
|
||||
import de.lanlab.larm.util.SimpleLogger;
|
||||
import HTTPClient.*;
|
||||
import java.net.*;
|
||||
import java.io.*;
|
||||
import java.util.*;
|
||||
import java.text.*;
|
||||
import de.lanlab.larm.parser.Tokenizer;
|
||||
import de.lanlab.larm.parser.LinkHandler;
|
||||
import de.lanlab.larm.net.*;
|
||||
|
||||
/**
|
||||
* this class gets the documents from the web. It connects to the server given
|
||||
* by the IP address in the URLMessage, gets the document, and forwards it to
|
||||
* the storage. If it's an HTML document, it will be parsed and all links will
|
||||
* be put into the message handler again. stores contents of the files in field
|
||||
* "content" (characters) and "contentBytes" (raw bytes)
|
||||
*
|
||||
* @author Clemens Marschner
|
||||
* @created 28 June 2002
|
||||
* @version $Id$
|
||||
*/
|
||||
public class FetcherTask
|
||||
implements InterruptableTask, LinkHandler, Serializable
|
||||
{
|
||||
|
||||
/**
|
||||
* set when the task has been interrupted; checked after the document was read
|
||||
*/
|
||||
protected volatile boolean isInterrupted = false;
|
||||
|
||||
/**
|
||||
* each task has its own number. the class variable counts up if an instance
|
||||
* of a fetcher task is created
|
||||
*/
|
||||
static volatile int taskIdentity = 0;
|
||||
|
||||
/**
|
||||
* the number of this object
|
||||
*/
|
||||
int taskNr;
|
||||
|
||||
/**
|
||||
* the BASE href (defaults to contextUrl, may be changed with a <base> tag);
|
||||
* only valid within a doTask call
|
||||
*/
|
||||
private volatile URL base;
|
||||
|
||||
/**
|
||||
* the URL of the document; only valid within a doTask call
|
||||
*/
|
||||
private volatile URL contextUrl;
|
||||
|
||||
/**
|
||||
* the message handler the URL message comes from; same for all tasks
|
||||
*/
|
||||
protected static volatile MessageHandler messageHandler;
|
||||
|
||||
/**
|
||||
* actual number of bytes read; only valid within a doTask call
|
||||
*/
|
||||
private volatile long bytesRead = 0;
|
||||
|
||||
/**
|
||||
* the docStorage this task will put the document to
|
||||
*/
|
||||
private static volatile DocumentStorage docStorage;
|
||||
|
||||
/**
|
||||
* the linkStorage this task will put the links to
|
||||
*/
|
||||
private static volatile LinkStorage linkStorage;
|
||||
|
||||
/**
|
||||
* task state IDs. comparisons will be done by their references, so always
|
||||
* use the IDs
|
||||
*/
|
||||
public final static String FT_IDLE = "idle";
|
||||
/**
|
||||
* state: the task has been started
|
||||
*/
|
||||
public final static String FT_STARTED = "started";
|
||||
/**
|
||||
* state: about to open a connection to the host
|
||||
*/
|
||||
public final static String FT_OPENCONNECTION = "opening connection";
|
||||
/**
|
||||
* state: connecting to the host
|
||||
*/
|
||||
public final static String FT_CONNECTING = "connecting";
|
||||
/**
|
||||
* state: sending the GET request
|
||||
*/
|
||||
public final static String FT_GETTING = "getting";
|
||||
/**
|
||||
* state: reading the response
|
||||
*/
|
||||
public final static String FT_READING = "reading";
|
||||
/**
|
||||
* state: scanning the document for links
|
||||
*/
|
||||
public final static String FT_SCANNING = "scanning";
|
||||
/**
|
||||
* state: storing the document and its links
|
||||
*/
|
||||
public final static String FT_STORING = "storing";
|
||||
/**
|
||||
* state: the task has finished
|
||||
*/
|
||||
public final static String FT_READY = "ready";
|
||||
/**
|
||||
* Description of the Field
|
||||
*/
|
||||
public final static String FT_CLOSING = "closing";
|
||||
/**
|
||||
* Description of the Field
|
||||
*/
|
||||
public final static String FT_EXCEPTION = "exception";
|
||||
/**
|
||||
* Description of the Field
|
||||
*/
|
||||
public final static String FT_INTERRUPTED = "interrupted";
|
||||
|
||||
private volatile State taskState = new State(FT_IDLE);
|
||||
|
||||
/**
|
||||
* the URLs found will be stored and only added to the message handler in
|
||||
* the very end, to avoid too many synchronizations
|
||||
*/
|
||||
private volatile LinkedList foundUrls;
|
||||
|
||||
/**
|
||||
* the URL to be fetched
|
||||
*/
|
||||
protected volatile URLMessage actURLMessage;
|
||||
|
||||
/**
|
||||
* the document title, if present
|
||||
*/
|
||||
private volatile String title;
|
||||
|
||||
|
||||
/**
|
||||
* Gets a copy of the current taskState
|
||||
*
|
||||
* @return The taskState value
|
||||
*/
|
||||
public State getTaskState()
|
||||
{
|
||||
return taskState.cloneState();
|
||||
}
|
||||
|
||||
|
||||
/**
|
||||
* Constructor for the FetcherTask object
|
||||
*
|
||||
* @param urlMessage the message containing the URL to be fetched
|
||||
*/
|
||||
public FetcherTask(URLMessage urlMessage)
|
||||
{
|
||||
actURLMessage = urlMessage;
|
||||
}
|
||||
|
||||
|
||||
/**
|
||||
* Gets the URLMessage this task is working on
|
||||
*
|
||||
* @return the current URLMessage
|
||||
*/
|
||||
public URLMessage getActURLMessage()
|
||||
{
|
||||
return this.actURLMessage;
|
||||
}
|
||||
|
||||
|
||||
/**
|
||||
* Sets the document docStorage
|
||||
*
|
||||
* @param docStorage The new docStorage
|
||||
*/
|
||||
public static void setDocStorage(DocumentStorage docStorage)
|
||||
{
|
||||
FetcherTask.docStorage = docStorage;
|
||||
}
|
||||
|
||||
|
||||
/**
|
||||
* Sets the linkStorage
|
||||
*
|
||||
* @param linkStorage The new linkStorage
|
||||
*/
|
||||
public static void setLinkStorage(LinkStorage linkStorage)
|
||||
{
|
||||
FetcherTask.linkStorage = linkStorage;
|
||||
}
|
||||
|
||||
|
||||
/**
|
||||
* Sets the messageHandler
|
||||
*
|
||||
* @param messageHandler The new messageHandler
|
||||
*/
|
||||
public static void setMessageHandler(MessageHandler messageHandler)
|
||||
{
|
||||
FetcherTask.messageHandler = messageHandler;
|
||||
}
|
||||
|
||||
|
||||
/**
|
||||
* @return the URL as a string
|
||||
*/
|
||||
public String getInfo()
|
||||
{
|
||||
return actURLMessage.getURLString();
|
||||
}
|
||||
|
||||
|
||||
/**
|
||||
* Gets the URL this task is fetching
|
||||
*
|
||||
* @return the URL of the current URLMessage
|
||||
*/
|
||||
public URL getURL()
|
||||
{
|
||||
return actURLMessage.getUrl();
|
||||
}
|
||||
|
||||
|
||||
volatile SimpleLogger log;
|
||||
|
||||
volatile SimpleLogger errorLog;
|
||||
|
||||
volatile HostManager hostManager;
|
||||
volatile HostResolver hostResolver;
|
||||
|
||||
//private long startTime;
|
||||
|
||||
/**
|
||||
* this will be called by the fetcher thread and will do all the work
|
||||
*
|
||||
* @param thread the thread that executes this task; must be a FetcherThread
|
||||
* @TODO probably split this up into different processing steps
|
||||
*/
|
||||
public void run(ServerThread thread)
|
||||
{
|
||||
|
||||
|
||||
taskState.setState(FT_STARTED);
|
||||
// state information is always set to make the thread monitor happy
|
||||
|
||||
log = thread.getLog();
|
||||
hostManager = ((FetcherThread) thread).getHostManager();
|
||||
hostResolver = hostManager.getHostResolver();
|
||||
base = contextUrl = actURLMessage.getUrl();
|
||||
String urlString = actURLMessage.getURLString();
|
||||
String host = contextUrl.getHost().toLowerCase();
|
||||
HostInfo hi = hostManager.getHostInfo(host);
|
||||
// System.out.println("FetcherTask with " + urlString + " started");
|
||||
if(actURLMessage.linkType == URLMessage.LINKTYPE_REDIRECT)
|
||||
{
|
||||
taskState.setState(FT_READY, null);
|
||||
hi.releaseLock();
|
||||
return; // we've already crawled that (see below)
|
||||
}
|
||||
|
||||
NVPair[] headers = ((FetcherThread) thread).getDefaultHeaders();
|
||||
int numHeaders = ((FetcherThread) thread).getUsedDefaultHeaders();
|
||||
boolean isIncremental = false;
|
||||
if (actURLMessage instanceof WebDocument)
|
||||
{
|
||||
// this is an incremental crawl where we only have to check whether the doc crawled
|
||||
// is newer
|
||||
isIncremental = true;
|
||||
headers[numHeaders] = new NVPair("If-Modified-Since", HTTPClient.Util.httpDate(((WebDocument) actURLMessage).getLastModified()));
|
||||
}
|
||||
//HostManager hm = ((FetcherThread)thread).getHostManager();
|
||||
|
||||
errorLog = thread.getErrorLog();
|
||||
|
||||
// startTime = System.currentTimeMillis();
|
||||
int threadNr = ((FetcherThread) thread).getThreadNumber();
|
||||
|
||||
log.log("start");
|
||||
int hostPos = urlString.indexOf(host);
|
||||
int hostLen = host.length();
|
||||
|
||||
// get and create
|
||||
|
||||
if (!hi.isHealthy())
|
||||
{
|
||||
// we make this check as late as possible to get the most current information
|
||||
log.log("Bad Host: " + contextUrl + "; returning");
|
||||
// System.out.println("[" + threadNr + "] bad host: " + this.actURLMessage.getUrl());
|
||||
|
||||
taskState.setState(FT_READY, null);
|
||||
hi.releaseLock();
|
||||
return;
|
||||
}
|
||||
|
||||
foundUrls = new java.util.LinkedList();
|
||||
|
||||
HTTPConnection conn = null;
|
||||
|
||||
title = "";
|
||||
|
||||
int size = 1;
|
||||
|
||||
InputStream in = null;
|
||||
bytesRead = 0;
|
||||
|
||||
try
|
||||
{
|
||||
|
||||
URL ipURL = contextUrl;
|
||||
|
||||
taskState.setState(FT_OPENCONNECTION, urlString);
|
||||
|
||||
log.log("connecting to " + ipURL.getHost());
|
||||
taskState.setState(FT_CONNECTING, ipURL);
|
||||
conn = new HTTPConnection(host);
|
||||
|
||||
conn.setDefaultTimeout(75000);
|
||||
|
||||
// 75 s
|
||||
conn.setDefaultAllowUserInteraction(false);
|
||||
|
||||
taskState.setState(FT_GETTING, ipURL);
|
||||
log.log("getting");
|
||||
|
||||
HTTPResponse response = conn.Get(ipURL.getFile(), "", headers);
|
||||
response.setReadIncrement(2720);
|
||||
int statusCode = response.getStatusCode();
|
||||
byte[] fullBuffer = null;
|
||||
String contentType = "";
|
||||
int contentLength = 0;
|
||||
Date date = null;
|
||||
|
||||
if (isIncremental)
|
||||
{
|
||||
// experimental
|
||||
System.out.println("ftask: if modified since: " + HTTPClient.Util.httpDate(((WebDocument) actURLMessage).getLastModified()));
|
||||
}
|
||||
|
||||
URL realURL;
|
||||
|
||||
switch (statusCode)
|
||||
{
|
||||
case 404: // file not found
|
||||
case 403: // access forbidden
|
||||
|
||||
// if this is an incremental crawl, remove the doc from the repository
|
||||
if (isIncremental)
|
||||
{
|
||||
WebDocument d = (WebDocument) actURLMessage;
|
||||
d.setResultCode(statusCode);
|
||||
// the repository will remove the doc if this statuscode is matched
|
||||
docStorage.store(d);
|
||||
}
|
||||
// otherwise, do nothing
|
||||
// Todo: we could add an error marker to the referral link
|
||||
break;
|
||||
case 304:
|
||||
// not modified
|
||||
System.out.println("ftask: -> not modified");
|
||||
// "not modified since"
|
||||
taskState.setState(FT_STORING, ipURL);
|
||||
// let the repository take care of the links
|
||||
// it will determine that this is the old document (because it already
|
||||
// has a docId), and will put back the links associated with it
|
||||
try
|
||||
{
|
||||
WebDocument doc = (WebDocument) this.actURLMessage;
|
||||
doc.setModified(false);
|
||||
docStorage.store(doc);
|
||||
this.bytesRead += doc.getSize();
|
||||
}
|
||||
catch (ClassCastException e)
|
||||
{
|
||||
System.out.println("error while casting to WebDoc: " + actURLMessage.getInfo());
|
||||
}
|
||||
break;
|
||||
case 301: // moved permanently
|
||||
case 302: // moved temporarily
|
||||
case 303: // see other
|
||||
case 307: // temporary redirect
|
||||
/*
|
||||
* this is a redirect. save it as a link and return.
|
||||
* note that we could read the doc from the open connection here, but this could mean
|
||||
* the filters were useless
|
||||
*/
|
||||
realURL = response.getEffectiveURI().toURL();
|
||||
foundUrls.add(new URLMessage(realURL, contextUrl, URLMessage.LINKTYPE_REDIRECT, "", hostResolver));
|
||||
linkStorage.storeLinks(foundUrls);
|
||||
break;
|
||||
default:
|
||||
// this can be a 30x code that was resolved by the HTTPClient and is passed to us as 200
|
||||
// we could turn this off and do it ourselves. But then we'd have to take care that
|
||||
// we don't get into an endless redirection loop -> i.e. extend URLMessage by a counter
|
||||
// at the moment we add the real URL to the message queue and mark it as a REDIRECT link
|
||||
// that way it is added to the visited filter. Then we take care that we don't crawl it again
|
||||
|
||||
// the other possibility is that we receive a "Location:" header along with a 200 status code
|
||||
// I have experienced that HTTPClient has an error with parsing this, so we do it ourselves
|
||||
//String location = response.getHeader("Location");
|
||||
realURL = response.getEffectiveURI().toURL();
|
||||
|
||||
/*if(location != null)
|
||||
{
|
||||
//System.out.println("interesting: location header with url " + location);
|
||||
foundUrls.add(new URLMessage(new URL(location), contextUrl, URLMessage.LINKTYPE_REDIRECT, "", hostManager));
|
||||
this.base = this.contextUrl = location;
|
||||
}
|
||||
else*/
|
||||
if(!(realURL.equals(contextUrl)))
|
||||
{
|
||||
//System.out.println("interesting: redirect with url " + realURL + " -context: " + contextUrl);
|
||||
foundUrls.add(new URLMessage(realURL, contextUrl, URLMessage.LINKTYPE_REDIRECT, "", hostResolver));
|
||||
this.base = this.contextUrl = realURL;
|
||||
//System.out.println(response);
|
||||
|
||||
}
|
||||
|
||||
|
||||
|
||||
|
||||
if (isIncremental)
|
||||
{
|
||||
// experimental
|
||||
System.out.println("ftask: -> was modified at " + response.getHeaderAsDate("Last-Modified"));
|
||||
}
|
||||
// read up to Constants.FETCHERTASK_MAXFILESIZE bytes into a byte array
|
||||
taskState.setState(FT_READING, ipURL);
|
||||
contentType = response.getHeader("Content-Type");
|
||||
String length = response.getHeader("Content-Length");
|
||||
date = response.getHeaderAsDate("Last-Modified");
|
||||
|
||||
if (length != null)
|
||||
{
|
||||
contentLength = Integer.parseInt(length);
|
||||
}
|
||||
log.log("reading");
|
||||
realURL = response.getEffectiveURI().toURL();
|
||||
if (contentType != null && contentType.startsWith("text/html"))
|
||||
{
|
||||
fullBuffer = response.getData(Constants.FETCHERTASK_MAXFILESIZE);
|
||||
hi.releaseLock();
|
||||
// max. 2 MB
|
||||
if (fullBuffer != null)
|
||||
{
|
||||
contentLength = fullBuffer.length;
|
||||
this.bytesRead += contentLength;
|
||||
}
|
||||
|
||||
/*
|
||||
* conn.disconnect();
|
||||
*/
|
||||
if (isInterrupted)
|
||||
{
|
||||
System.out.println("FetcherTask: interrupted while reading. File truncated");
|
||||
log.log("interrupted while reading. File truncated");
|
||||
}
|
||||
else
|
||||
{
|
||||
if (fullBuffer != null)
|
||||
{
|
||||
taskState.setState(FT_SCANNING, ipURL);
|
||||
|
||||
log.log("read file (" + fullBuffer.length + " bytes). Now scanning.");
|
||||
|
||||
// convert the bytes to Java characters
|
||||
// ouch. I haven't found a better solution yet. just slower ones.
|
||||
// remember: for better runtime performance avoid decorators, since they
|
||||
// multiply function calls
|
||||
char[] fullCharBuffer = new char[contentLength];
|
||||
new InputStreamReader(new ByteArrayInputStream(fullBuffer)).read(fullCharBuffer);
|
||||
Tokenizer tok = new Tokenizer();
|
||||
tok.setLinkHandler(this);
|
||||
tok.parse(new SimpleCharArrayReader(fullCharBuffer));
|
||||
|
||||
taskState.setState(FT_STORING, ipURL);
|
||||
linkStorage.storeLinks(foundUrls);
|
||||
WebDocument d;
|
||||
if (isIncremental)
|
||||
{
|
||||
d = ((WebDocument) this.actURLMessage);
|
||||
d.setModified(true);
|
||||
// file is new or newer
|
||||
d.setUrl(contextUrl);
|
||||
d.setMimeType(contentType);
|
||||
d.setResultCode(statusCode);
|
||||
d.setSize(contentLength);
|
||||
d.setTitle(title);
|
||||
d.setLastModified(date);
|
||||
}
|
||||
else
|
||||
{
|
||||
d = new WebDocument(contextUrl, contentType, statusCode, actURLMessage.getReferer(), contentLength, title, date, hostResolver);
|
||||
}
|
||||
d.addField("content", fullCharBuffer);
|
||||
d.addField("contentBytes", fullBuffer);
|
||||
docStorage.store(d);
|
||||
}
|
||||
|
||||
log.log("scanned");
|
||||
}
|
||||
|
||||
log.log("stored");
|
||||
}
|
||||
else
|
||||
{
|
||||
// System.out.println("Discovered unknown content type: " + contentType + " at " + urlString);
|
||||
//errorLog.log("[" + threadNr + "] Discovered unknown content type at " + urlString + ": " + contentType + ". just storing");
|
||||
taskState.setState(FT_STORING, ipURL);
|
||||
linkStorage.storeLinks(foundUrls);
|
||||
WebDocument d = new WebDocument(contextUrl, contentType, statusCode, actURLMessage.getReferer(),
|
||||
/*
|
||||
* contentLength
|
||||
*/
|
||||
0, title, date, hostResolver);
|
||||
//d.addField("content", fullBuffer);
|
||||
//d.addField("content", null);
|
||||
docStorage.store(d);
|
||||
}
|
||||
break;
|
||||
}
|
||||
/*
|
||||
* switch
|
||||
*/
|
||||
//conn.stop(); // close connection. todo: Do some caching...
|
||||
|
||||
}
|
||||
catch (InterruptedIOException e)
|
||||
{
|
||||
// timeout while reading this file
|
||||
//System.out.println("[" + threadNr + "] FetcherTask: Timeout while opening: " + this.actURLMessage.getUrl());
|
||||
errorLog.log("error: Timeout: " + this.actURLMessage.getUrl());
|
||||
hi.badRequest();
|
||||
}
|
||||
catch (FileNotFoundException e)
|
||||
{
|
||||
taskState.setState(FT_EXCEPTION);
|
||||
//System.out.println("[" + threadNr + "] FetcherTask: File not Found: " + this.actURLMessage.getUrl());
|
||||
errorLog.log("error: File not Found: " + this.actURLMessage.getUrl());
|
||||
}
|
||||
catch (NoRouteToHostException e)
|
||||
{
|
||||
// router is down or a firewall prevents the connection
|
||||
hi.setReachable(false);
|
||||
taskState.setState(FT_EXCEPTION);
|
||||
//System.out.println("[" + threadNr + "] " + e.getClass().getName() + ": " + e.getMessage());
|
||||
// e.printStackTrace();
|
||||
errorLog.log("error: " + e.getClass().getName() + ": " + e.getMessage());
|
||||
}
|
||||
catch (ConnectException e)
|
||||
{
|
||||
// no server is listening at this port
|
||||
hi.setReachable(false);
|
||||
taskState.setState(FT_EXCEPTION);
|
||||
//System.out.println("[" + threadNr + "] " + e.getClass().getName() + ": " + e.getMessage());
|
||||
// e.printStackTrace();
|
||||
errorLog.log("error: " + e.getClass().getName() + ": " + e.getMessage());
|
||||
|
||||
}
|
||||
catch (SocketException e)
|
||||
{
|
||||
taskState.setState(FT_EXCEPTION);
|
||||
//System.out.println("[" + threadNr + "]: SocketException:" + e.getMessage());
|
||||
errorLog.log("error: " + e.getClass().getName() + ": " + e.getMessage());
|
||||
|
||||
}
|
||||
catch (UnknownHostException e)
|
||||
{
|
||||
// IP address could not be determined
|
||||
hi.setReachable(false);
|
||||
taskState.setState(FT_EXCEPTION);
|
||||
//System.out.println("[" + threadNr + "] " + e.getClass().getName() + ": " + e.getMessage());
|
||||
// e.printStackTrace();
|
||||
errorLog.log("error: " + e.getClass().getName() + ": " + e.getMessage());
|
||||
|
||||
}
|
||||
catch (IOException e)
|
||||
{
|
||||
taskState.setState(FT_EXCEPTION);
|
||||
//System.out.println("[" + threadNr + "] " + e.getClass().getName() + ": " + e.getMessage());
|
||||
// e.printStackTrace();
|
||||
errorLog.log("error: IOException: " + e.getClass().getName() + ": " + e.getMessage());
|
||||
|
||||
}
|
||||
catch (OutOfMemoryError ome)
|
||||
{
|
||||
taskState.setState(FT_EXCEPTION);
|
||||
System.out.println("[" + threadNr + "] Task " + this.taskNr + " OutOfMemory after " + size + " bytes");
|
||||
errorLog.log("error: OutOfMemory after " + size + " bytes");
|
||||
}
|
||||
catch (Throwable e)
|
||||
{
|
||||
taskState.setState(FT_EXCEPTION);
|
||||
System.out.println("[" + threadNr + "] " + e.getMessage() + " type: " + e.getClass().getName());
|
||||
e.printStackTrace();
|
||||
System.out.println("[" + threadNr + "]: stopping");
|
||||
errorLog.log("error: " + e.getClass().getName() + ": " + e.getMessage() + "; stopping");
|
||||
}
|
||||
finally
|
||||
{
|
||||
hi.releaseLock();
|
||||
|
||||
if (isInterrupted)
|
||||
{
|
||||
System.out.println("Task was interrupted");
|
||||
log.log("interrupted");
|
||||
taskState.setState(FT_INTERRUPTED);
|
||||
}
|
||||
}
|
||||
if (isInterrupted)
|
||||
{
|
||||
System.out.println("Task: closed everything");
|
||||
}
|
||||
/*
|
||||
* }
|
||||
*/
|
||||
taskState.setState(FT_CLOSING);
|
||||
conn.stop();
|
||||
taskState.setState(FT_READY);
|
||||
foundUrls = null;
|
||||
}
|
||||
|
||||
|
||||
/**
|
||||
* the interrupt method. not in use since the change to HTTPClient
|
||||
*
|
||||
* @TODO decide if we need this anymore
|
||||
*/
|
||||
public void interrupt()
|
||||
{
|
||||
System.out.println("FetcherTask: interrupted!");
|
||||
this.isInterrupted = true;
|
||||
/*
|
||||
* try
|
||||
* {
|
||||
* if (conn != null)
|
||||
* {
|
||||
* ((HttpURLConnection) conn).disconnect();
|
||||
* System.out.println("FetcherTask: disconnected URL Connection");
|
||||
* conn = null;
|
||||
* }
|
||||
* if (in != null)
|
||||
* {
|
||||
* in.close();
|
||||
* // possibly hangs at close() -> KeepAliveStream.close() -> MeteredStream.skip()
|
||||
* System.out.println("FetcherTask: Closed Input Stream");
|
||||
* in = null;
|
||||
* }
|
||||
* }
|
||||
* catch (IOException e)
|
||||
* {
|
||||
* System.out.println("IOException while interrupting: ");
|
||||
* e.printStackTrace();
|
||||
* }
|
||||
* System.out.println("FetcherTask: Set all IOs to null");
|
||||
*/
|
||||
}
|
||||
|
||||
|
||||
/**
|
||||
* this is called whenever a link is found in the current document. Don't
|
||||
* create too many objects here, as this will be called millions of times
|
||||
*
|
||||
* @param link the link target as found in the document (absolute or relative)
|
||||
* @param anchor the anchor text that belongs to the link
|
||||
* @param isFrame true if the link is stored as a frame link, false for a plain anchor
|
||||
*/
|
||||
public void handleLink(String link, String anchor, boolean isFrame)
|
||||
{
|
||||
try
|
||||
{
|
||||
// cut out Ref part
|
||||
|
||||
|
||||
int refPart = link.indexOf("#");
|
||||
//System.out.println(link);
|
||||
if (refPart == 0)
|
||||
{
|
||||
return;
|
||||
}
|
||||
else if (refPart > 0)
|
||||
{
|
||||
link = link.substring(0, refPart);
|
||||
}
|
||||
|
||||
URL url = null;
|
||||
if (link.startsWith("http:"))
|
||||
{
|
||||
// distinguish between absolute and relative URLs
|
||||
|
||||
url = new URL(link);
|
||||
}
|
||||
else
|
||||
{
|
||||
// relative url
|
||||
url = new URL(base, link);
|
||||
}
|
||||
if(url.getPath() == null || url.getPath().length() == 0)
|
||||
{
|
||||
url = new URL(url.getProtocol(), url.getHost(), url.getPort(), "/" + url.getFile());
|
||||
}
|
||||
URLMessage urlMessage = new URLMessage(url, contextUrl, isFrame ? URLMessage.LINKTYPE_FRAME : URLMessage.LINKTYPE_ANCHOR, anchor, hostResolver);
|
||||
|
||||
//String urlString = urlMessage.getURLString();
|
||||
|
||||
foundUrls.add(urlMessage);
|
||||
//messageHandler.putMessage(new actURLMessage(url)); // put them in the very end
|
||||
}
|
||||
catch (MalformedURLException e)
|
||||
{
|
||||
//log.log("malformed url: base:" + base + " -+- link:" + link);
|
||||
log.log("warning: " + e.getClass().getName() + ": " + e.getMessage());
|
||||
}
|
||||
catch (Exception e)
|
||||
{
|
||||
log.log("warning: " + e.getClass().getName() + ": " + e.getMessage());
|
||||
// e.printStackTrace();
|
||||
}
|
||||
|
||||
}
|
||||
|
||||
|
||||
/**
|
||||
* called when a BASE tag was found
|
||||
*
|
||||
* @param base the HREF attribute
|
||||
*/
|
||||
public void handleBase(String base)
|
||||
{
|
||||
try
|
||||
{
|
||||
this.base = new URL(base);
|
||||
}
|
||||
catch (MalformedURLException e)
|
||||
{
|
||||
log.log("warning: " + e.getClass().getName() + ": " + e.getMessage() + " while converting '" + base + "' to URL in document " + contextUrl);
|
||||
}
|
||||
}
|
||||
|
||||
|
||||
/**
|
||||
* called when a TITLE tag was found
|
||||
*
|
||||
* @param title the string between <title> and </title>
|
||||
*/
|
||||
public void handleTitle(String title)
|
||||
{
|
||||
this.title = title;
|
||||
}
|
||||
|
||||
|
||||
|
||||
/*
|
||||
* public void notifyOpened(ObservableInputStream in, long timeElapsed)
|
||||
* {
|
||||
* }
|
||||
* public void notifyClosed(ObservableInputStream in, long timeElapsed)
|
||||
* {
|
||||
* }
|
||||
* public void notifyRead(ObservableInputStream in, long timeElapsed, int nrRead, int totalRead)
|
||||
* {
|
||||
* if(totalRead / ((double)timeElapsed) < 0.3) // less than 300 bytes/s
|
||||
* {
|
||||
* System.out.println("Task " + this.taskNr + " stalled at pos " + totalRead + " with " + totalRead / (timeElapsed / 1000.0) + " bytes/s");
|
||||
* }
|
||||
* }
|
||||
* public void notifyFinished(ObservableInputStream in, long timeElapsed, int totalRead)
|
||||
* {
|
||||
* //System.out.println("Task " + this.taskNr + " finished (" + totalRead + " bytes in " + timeElapsed + " ms with " + totalRead / (timeElapsed / 1000.0) + " bytes/s)");
|
||||
* }
|
||||
*/
|
||||
/**
|
||||
* Gets the bytesRead attribute of the FetcherTask object
|
||||
*
|
||||
* @return The bytesRead value
|
||||
*/
|
||||
public long getBytesRead()
|
||||
{
|
||||
return bytesRead;
|
||||
}
|
||||
|
||||
|
||||
/**
|
||||
* do nothing if a warning occurs within the html parser
|
||||
*
|
||||
* @param message Description of the Parameter
|
||||
* @param systemID Description of the Parameter
|
||||
* @param line Description of the Parameter
|
||||
* @param column Description of the Parameter
|
||||
* @exception java.lang.Exception Description of the Exception
|
||||
*/
|
||||
public void warning(String message, String systemID, int line, int column)
|
||||
throws java.lang.Exception { }
|
||||
|
||||
|
||||
/**
|
||||
* do nothing if a fatal error occurs...
|
||||
*
|
||||
* @param message Description of the Parameter
|
||||
* @param systemID Description of the Parameter
|
||||
* @param line Description of the Parameter
|
||||
* @param column Description of the Parameter
|
||||
* @exception Exception Description of the Exception
|
||||
*/
|
||||
public void fatal(String message, String systemID, int line, int column)
|
||||
throws Exception
|
||||
{
|
||||
System.out.println("fatal error: " + message);
|
||||
log.log("fatal error: " + message);
|
||||
}
|
||||
|
||||
}
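The link handling in handleLink above (drop the fragment, resolve against the base URL, make sure the path is at least "/") can be sketched on its own. This is a minimal illustration, not the removed LARM code: the class name `LinkResolver` and the method `resolve` are hypothetical, and the sketch lets `java.net.URL` resolve absolute and relative links through the same constructor instead of checking for an "http:" prefix.

```java
import java.net.MalformedURLException;
import java.net.URL;

/** Hypothetical helper that mirrors the normalization steps of handleLink. */
public class LinkResolver {

    /**
     * Returns the absolute URL for a link, or null if the link is only an
     * in-page reference such as "#top".
     * Note: the URL constructors are deprecated in recent JDKs but still work.
     */
    public static URL resolve(URL base, String link) throws MalformedURLException {
        int refPart = link.indexOf('#');
        if (refPart == 0) {
            return null;                          // points into the current page
        }
        if (refPart > 0) {
            link = link.substring(0, refPart);    // cut off the fragment
        }
        URL url = new URL(base, link);            // handles absolute and relative links
        if (url.getPath() == null || url.getPath().isEmpty()) {
            // "http://host" and "http://host/" should end up as the same URL
            url = new URL(url.getProtocol(), url.getHost(), url.getPort(), "/" + url.getFile());
        }
        return url;
    }
}
```

A caller would pass the document's base URL and each link found by the parser, and skip links for which null is returned.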
|
||||
|
|
@ -1,297 +0,0 @@
|
|||
/* ====================================================================
|
||||
* The Apache Software License, Version 1.1
|
||||
*
|
||||
* Copyright (c) 2001 The Apache Software Foundation. All rights
|
||||
* reserved.
|
||||
*
|
||||
* Redistribution and use in source and binary forms, with or without
|
||||
* modification, are permitted provided that the following conditions
|
||||
* are met:
|
||||
*
|
||||
* 1. Redistributions of source code must retain the above copyright
|
||||
* notice, this list of conditions and the following disclaimer.
|
||||
*
|
||||
* 2. Redistributions in binary form must reproduce the above copyright
|
||||
* notice, this list of conditions and the following disclaimer in
|
||||
* the documentation and/or other materials provided with the
|
||||
* distribution.
|
||||
*
|
||||
* 3. The end-user documentation included with the redistribution,
|
||||
* if any, must include the following acknowledgment:
|
||||
* "This product includes software developed by the
|
||||
* Apache Software Foundation (http://www.apache.org/)."
|
||||
* Alternately, this acknowledgment may appear in the software itself,
|
||||
* if and wherever such third-party acknowledgments normally appear.
|
||||
*
|
||||
* 4. The names "Apache" and "Apache Software Foundation" and
|
||||
* "Apache Lucene" must not be used to endorse or promote products
|
||||
* derived from this software without prior written permission. For
|
||||
* written permission, please contact apache@apache.org.
|
||||
*
|
||||
* 5. Products derived from this software may not be called "Apache",
|
||||
* "Apache Lucene", nor may "Apache" appear in their name, without
|
||||
* prior written permission of the Apache Software Foundation.
|
||||
*
|
||||
* THIS SOFTWARE IS PROVIDED ``AS IS'' AND ANY EXPRESSED OR IMPLIED
|
||||
* WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED WARRANTIES
|
||||
* OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE
|
||||
* DISCLAIMED. IN NO EVENT SHALL THE APACHE SOFTWARE FOUNDATION OR
|
||||
* ITS CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL,
|
||||
* SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT
|
||||
* LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF
|
||||
* USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND
|
||||
* ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY,
|
||||
* OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT
|
||||
* OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF
|
||||
* SUCH DAMAGE.
|
||||
* ====================================================================
|
||||
*
|
||||
* This software consists of voluntary contributions made by many
|
||||
* individuals on behalf of the Apache Software Foundation. For more
|
||||
* information on the Apache Software Foundation, please see
|
||||
* <http://www.apache.org/>.
|
||||
*/
|
||||
|
||||
package de.lanlab.larm.fetcher;
|
||||
|
||||
import de.lanlab.larm.threads.TaskQueue;
|
||||
import de.lanlab.larm.util.Queue;
|
||||
import de.lanlab.larm.util.CachingQueue;
|
||||
import de.lanlab.larm.util.HashedCircularLinkedList;
|
||||
import java.net.URL;
|
||||
import de.lanlab.larm.net.*;
|
||||
|
||||
/**
|
||||
* this special kind of task queue reorders the incoming tasks so that every subsequent
|
||||
* task is for a different host.
|
||||
* This is done by a "HashedCircularLinkedList" which allows random adding while
|
||||
* a different thread iterates through the collection circularly.
|
||||
*
|
||||
* @author Clemens Marschner
|
||||
* @created 23. November 2001
|
||||
* @version $Id$
|
||||
*/
|
||||
public class FetcherTaskQueue extends TaskQueue
|
||||
{
|
||||
/**
|
||||
* this is a hash that contains an entry for each server, which by itself is a
|
||||
* CachingQueue that stores all tasks for this server
|
||||
* @TODO probably link this to the host info structure
|
||||
*/
|
||||
private HashedCircularLinkedList servers = new HashedCircularLinkedList(100, 0.75f);
|
||||
private int size = 0;
|
||||
|
||||
|
||||
/**
|
||||
* Constructor for the FetcherTaskQueue object. Only stores the host manager
|
||||
*/
|
||||
public FetcherTaskQueue(HostManager manager)
|
||||
{
|
||||
this.manager = manager;
|
||||
}
|
||||
|
||||
|
||||
/**
|
||||
* true if no task is queued
|
||||
*
|
||||
* @return The empty value
|
||||
*/
|
||||
public boolean isEmpty()
|
||||
{
|
||||
return (size == 0);
|
||||
}
|
||||
|
||||
|
||||
/**
|
||||
* clear the queue. not synchronized.
|
||||
*/
|
||||
public void clear()
|
||||
{
|
||||
servers.clear();
|
||||
}
|
||||
|
||||
|
||||
/**
|
||||
* puts task into Queue.
|
||||
* Warning: not synchronized
|
||||
*
|
||||
* @param t the task to be added. must be a FetcherTask
|
||||
*/
|
||||
public void insert(Object t)
|
||||
{
|
||||
// assert (t != null && t.getURL() != null)
|
||||
|
||||
URLMessage um = ((FetcherTask)t).getActURLMessage();
|
||||
URL act = um.getUrl();
|
||||
String host = act.getHost();
|
||||
Queue q;
|
||||
q = ((Queue) servers.get(host));
|
||||
if (q == null)
|
||||
{
|
||||
// add a new host to the queue
|
||||
//String host2 = host.replace(':', '_').replace('/', '_').replace('\\', '_');
|
||||
// make it file system ready
|
||||
// FIXME: put '100' in properties. This is block size (the number of objects/block)
|
||||
q = new CachingQueue(host, 100);
|
||||
servers.put(host, q);
|
||||
}
|
||||
// assert((q != null) && (q instanceof FetcherTaskQueue));
|
||||
q.insert(t);
|
||||
size++;
|
||||
}
|
||||
|
||||
|
||||
/**
|
||||
* the size of the queue. make sure that insert() and size() calls are synchronized
|
||||
* if the exact number matters.
|
||||
*
|
||||
* @return the number of tasks currently queued
|
||||
*/
|
||||
public int size()
|
||||
{
|
||||
return size;
|
||||
}
|
||||
|
||||
/**
|
||||
* the number of different hosts queued at the moment
|
||||
*/
|
||||
public int getNumHosts()
|
||||
{
|
||||
return servers.size();
|
||||
}
|
||||
|
||||
HostManager manager;
|
||||
/**
|
||||
* get the next task. warning: not synchronized
|
||||
*
|
||||
* @return the next task, or null if no task is queued
|
||||
*/
|
||||
public Object remove()
|
||||
{
|
||||
FetcherTask t = null;
|
||||
String start=null;
|
||||
if (servers.size() > 0)
|
||||
{
|
||||
// while(true)
|
||||
// {
|
||||
Queue q = (Queue) servers.next();
|
||||
String host = (String)servers.getCurrentKey();
|
||||
// if(start == null)
|
||||
// {
|
||||
// start = host;
|
||||
// }
|
||||
// else if(host.equals(start))
|
||||
// {
|
||||
// System.out.println("FetcherTaskQueue: all hosts busy. waiting 1sec");
|
||||
// try
|
||||
// {
|
||||
// Thread.sleep(1000);
|
||||
// }
|
||||
// catch(InterruptedException e)
|
||||
// {
|
||||
// break;
|
||||
// }
|
||||
// }
|
||||
// HostInfo hInfo = manager.getHostInfo(host);
|
||||
// System.out.println("getting sync on " + hInfo.getHostName());
|
||||
// synchronized(hInfo.getLockMonitor())
|
||||
// {
|
||||
// if(!hInfo.isBusy())
|
||||
// {
|
||||
// System.out.println("FetcherTaskQueue: host " + host + " ok");
|
||||
// hInfo.obtainLock(); // decreased in FetcherTask
|
||||
// assert(q != null && q.size() > 0)
|
||||
t = (FetcherTask)q.remove();
|
||||
if (q.size() == 0)
|
||||
{
|
||||
servers.removeCurrent();
|
||||
q = null;
|
||||
}
|
||||
size--;
|
||||
// break;
|
||||
// }
|
||||
// else
|
||||
// {
|
||||
// System.out.println("FetcherTaskQueue: host " + host + " is busy. next...");
|
||||
// }
|
||||
// }
|
||||
// }
|
||||
}
|
||||
return t;
|
||||
}
|
||||
|
||||
|
||||
/**
|
||||
* tests
|
||||
*
|
||||
* @param args Description of the Parameter
|
||||
*/
|
||||
public static void main(String args[])
|
||||
{
|
||||
// FIXME: put that into a JUnit test case
|
||||
// FetcherTaskQueue q = new FetcherTaskQueue();
|
||||
// de.lanlab.larm.net.HostResolver hm = new de.lanlab.larm.net.HostResolver();
|
||||
// System.out.println("Test 1. put in 4 yahoos and 3 lmus. pull out LMU/Yahoo/LMU/Yahoo/LMU/Yahoo/Yahoo");
|
||||
// try
|
||||
// {
|
||||
// q.insert(new FetcherTask(new URLMessage(new URL("http://www.lmu.de/1"), null, URLMessage.LINKTYPE_ANCHOR, null, hm)));
|
||||
// q.insert(new FetcherTask(new URLMessage(new URL("http://www.lmu.de/2"), null, URLMessage.LINKTYPE_ANCHOR, null, hm)));
|
||||
// q.insert(new FetcherTask(new URLMessage(new URL("http://www.yahoo.de/1"), null, URLMessage.LINKTYPE_ANCHOR, null, hm)));
|
||||
// q.insert(new FetcherTask(new URLMessage(new URL("http://www.yahoo.de/2"), null, URLMessage.LINKTYPE_ANCHOR, null, hm)));
|
||||
// q.insert(new FetcherTask(new URLMessage(new URL("http://www.yahoo.de/3"), null, URLMessage.LINKTYPE_ANCHOR, null, hm)));
|
||||
// q.insert(new FetcherTask(new URLMessage(new URL("http://www.yahoo.de/4"), null, URLMessage.LINKTYPE_ANCHOR, null, hm)));
|
||||
// q.insert(new FetcherTask(new URLMessage(new URL("http://www.lmu.de/3"), null, URLMessage.LINKTYPE_ANCHOR, null, hm)));
|
||||
// }
|
||||
// catch (Throwable t)
|
||||
// {
|
||||
// t.printStackTrace();
|
||||
// }
|
||||
//
|
||||
// System.out.println(((FetcherTask) q.remove()).getInfo());
|
||||
// System.out.println(((FetcherTask) q.remove()).getInfo());
|
||||
// System.out.println(((FetcherTask) q.remove()).getInfo());
|
||||
// System.out.println(((FetcherTask) q.remove()).getInfo());
|
||||
// System.out.println(((FetcherTask) q.remove()).getInfo());
|
||||
// System.out.println(((FetcherTask) q.remove()).getInfo());
|
||||
// System.out.println(((FetcherTask) q.remove()).getInfo());
|
||||
//
|
||||
// System.out.println("Test 2. new Queue");
|
||||
// q = new FetcherTaskQueue();
|
||||
// System.out.println("size [0]:");
|
||||
// System.out.println(q.size());
|
||||
// try
|
||||
// {
|
||||
// System.out.println("put 3 lmus.");
|
||||
// q.insert(new FetcherTask(new URLMessage(new URL("http://www.lmu.de/1"), null, URLMessage.LINKTYPE_ANCHOR, null, hm)));
|
||||
// q.insert(new FetcherTask(new URLMessage(new URL("http://www.lmu.de/2"), null, URLMessage.LINKTYPE_ANCHOR, null, hm)));
|
||||
// q.insert(new FetcherTask(new URLMessage(new URL("http://www.lmu.de/3"), null, URLMessage.LINKTYPE_ANCHOR, null, hm)));
|
||||
// System.out.print("pull out 1st element [lmu/1]: ");
|
||||
// System.out.println(((FetcherTask) q.remove()).getInfo());
|
||||
// System.out.println("size now [2]: " + q.size());
|
||||
// System.out.print("pull out 2nd element [lmu/2]: ");
|
||||
// System.out.println(((FetcherTask) q.remove()).getInfo());
|
||||
// System.out.println("size now [1]: " + q.size());
|
||||
// System.out.println("put in 3 yahoos");
|
||||
// q.insert(new FetcherTask(new URLMessage(new URL("http://www.yahoo.de/1"), null, URLMessage.LINKTYPE_ANCHOR, null, hm)));
|
||||
// q.insert(new FetcherTask(new URLMessage(new URL("http://www.yahoo.de/2"), null, URLMessage.LINKTYPE_ANCHOR, null, hm)));
|
||||
// q.insert(new FetcherTask(new URLMessage(new URL("http://www.yahoo.de/3"), null, URLMessage.LINKTYPE_ANCHOR, null, hm)));
|
||||
// System.out.println("remove [?]: " + ((FetcherTask) q.remove()).getInfo());
|
||||
// System.out.println("Size now [3]: " + q.size());
|
||||
// System.out.println("remove [?]: " + ((FetcherTask) q.remove()).getInfo());
|
||||
// System.out.println("Size now [2]: " + q.size());
|
||||
// System.out.println("remove [?]: " + ((FetcherTask) q.remove()).getInfo());
|
||||
// System.out.println("Size now [1]: " + q.size());
|
||||
// System.out.println("put in another Yahoo");
|
||||
// q.insert(new FetcherTask(new URLMessage(new URL("http://www.yahoo.de/4"), null, URLMessage.LINKTYPE_ANCHOR, null, hm)));
|
||||
// System.out.println("remove [?]: " + ((FetcherTask) q.remove()).getInfo());
|
||||
// System.out.println("Size now [1]: " + q.size());
|
||||
// System.out.println("remove [?]: " + ((FetcherTask) q.remove()).getInfo());
|
||||
// System.out.println("Size now [0]: " + q.size());
|
||||
// }
|
||||
// catch (Throwable t)
|
||||
// {
|
||||
// t.printStackTrace();
|
||||
// }
|
||||
|
||||
}
|
||||
}
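The host rotation described in the FetcherTaskQueue class comment can be sketched with standard collections. This is only an illustration under stated assumptions, not the removed LARM code: `HostRoundRobinQueue` is a hypothetical name, and `LinkedHashMap` plus `ArrayDeque` stand in for `HashedCircularLinkedList` and `CachingQueue`. Like the original, it is not synchronized.

```java
import java.net.URI;
import java.util.ArrayDeque;
import java.util.Iterator;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Queue;

/** Hypothetical sketch: consecutive remove() calls rotate across hosts. */
public class HostRoundRobinQueue {

    // one FIFO queue per host; the map's insertion order is the rotation order
    private final Map<String, Queue<URI>> hosts = new LinkedHashMap<>();
    private int size = 0;

    public void insert(URI uri) {
        hosts.computeIfAbsent(uri.getHost(), h -> new ArrayDeque<>()).add(uri);
        size++;
    }

    /** Returns the next URI, or null if nothing is queued. */
    public URI remove() {
        Iterator<Map.Entry<String, Queue<URI>>> it = hosts.entrySet().iterator();
        if (!it.hasNext()) {
            return null;
        }
        Map.Entry<String, Queue<URI>> entry = it.next();
        it.remove();                                   // take the host off the front
        URI next = entry.getValue().remove();
        if (!entry.getValue().isEmpty()) {
            hosts.put(entry.getKey(), entry.getValue()); // re-append it at the end
        }
        size--;
        return next;
    }

    public int size() {
        return size;
    }

    public int getNumHosts() {
        return hosts.size();
    }
}
```

With the inputs from the commented-out test in main() (tasks for two hosts inserted in blocks), the sketch alternates between the hosts until one of them runs out of tasks.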
|
|
@ -1,157 +0,0 @@
|
|||
/* ====================================================================
|
||||
* The Apache Software License, Version 1.1
|
||||
*
|
||||
* Copyright (c) 2001 The Apache Software Foundation. All rights
|
||||
* reserved.
|
||||
*
|
||||
* Redistribution and use in source and binary forms, with or without
|
||||
* modification, are permitted provided that the following conditions
|
||||
* are met:
|
||||
*
|
||||
* 1. Redistributions of source code must retain the above copyright
|
||||
* notice, this list of conditions and the following disclaimer.
|
||||
*
|
||||
* 2. Redistributions in binary form must reproduce the above copyright
|
||||
* notice, this list of conditions and the following disclaimer in
|
||||
* the documentation and/or other materials provided with the
|
||||
* distribution.
|
||||
*
|
||||
* 3. The end-user documentation included with the redistribution,
|
||||
* if any, must include the following acknowledgment:
|
||||
* "This product includes software developed by the
|
||||
* Apache Software Foundation (http://www.apache.org/)."
|
||||
* Alternately, this acknowledgment may appear in the software itself,
|
||||
* if and wherever such third-party acknowledgments normally appear.
|
||||
*
|
||||
* 4. The names "Apache" and "Apache Software Foundation" and
|
||||
* "Apache Lucene" must not be used to endorse or promote products
|
||||
* derived from this software without prior written permission. For
|
||||
* written permission, please contact apache@apache.org.
|
||||
*
|
||||
* 5. Products derived from this software may not be called "Apache",
|
||||
* "Apache Lucene", nor may "Apache" appear in their name, without
|
||||
* prior written permission of the Apache Software Foundation.
|
||||
*
|
||||
* THIS SOFTWARE IS PROVIDED ``AS IS'' AND ANY EXPRESSED OR IMPLIED
|
||||
* WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED WARRANTIES
|
||||
* OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE
|
||||
* DISCLAIMED. IN NO EVENT SHALL THE APACHE SOFTWARE FOUNDATION OR
|
||||
* ITS CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL,
|
||||
* SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT
|
||||
* LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF
|
||||
* USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND
|
||||
* ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY,
|
||||
* OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT
|
||||
* OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF
|
||||
* SUCH DAMAGE.
|
||||
* ====================================================================
|
||||
*
|
||||
* This software consists of voluntary contributions made by many
|
||||
* individuals on behalf of the Apache Software Foundation. For more
|
||||
* information on the Apache Software Foundation, please see
|
||||
* <http://www.apache.org/>.
|
||||
*/
|
||||
|
||||
package de.lanlab.larm.fetcher;
|
||||
|
||||
import de.lanlab.larm.threads.ServerThread;
|
||||
import de.lanlab.larm.util.State;
|
||||
import de.lanlab.larm.net.HostManager;
|
||||
import HTTPClient.NVPair;
|
||||
|
||||
/**
|
||||
* a server thread for the thread pool that records the number
|
||||
* of bytes read and the number of tasks run
|
||||
* mainly for statistical purposes and to keep most of the information a task needs
|
||||
* static
|
||||
* @version $Id$
|
||||
*/
|
||||
public class FetcherThread extends ServerThread
|
||||
{
|
||||
|
||||
long totalBytesRead = 0;
|
||||
long totalTasksRun = 0;
|
||||
|
||||
HostManager hostManager;
|
||||
|
||||
byte[] documentBuffer = new byte[Constants.FETCHERTASK_READSIZE];
|
||||
|
||||
/**
|
||||
* default headers for HTTPClient
|
||||
*/
|
||||
private volatile NVPair headers[] = new NVPair[2];
|
||||
|
||||
|
||||
public NVPair[] getDefaultHeaders()
|
||||
{
|
||||
return headers;
|
||||
}
|
||||
|
||||
public int getUsedDefaultHeaders()
|
||||
{
|
||||
return 1;
|
||||
}
|
||||
|
||||
public HostManager getHostManager()
|
||||
{
|
||||
return hostManager;
|
||||
}
|
||||
|
||||
public FetcherThread(int threadNumber, ThreadGroup threadGroup, HostManager hostManager)
|
||||
{
|
||||
super(threadNumber,"FetcherThread " + threadNumber, threadGroup);
|
||||
this.hostManager = hostManager;
|
||||
headers[0] = new HTTPClient.NVPair("User-Agent", Constants.CRAWLER_AGENT);
|
||||
headers[1] = null; // may contain an additional field
|
||||
}
|
||||
|
||||
public static String STATE_IDLE = "Idle";
|
||||
|
||||
State idleState = new State(STATE_IDLE); // only set if task is finished
|
||||
|
||||
protected void taskReady()
|
||||
{
|
||||
totalBytesRead += ((FetcherTask)task).getBytesRead();
|
||||
totalTasksRun++;
|
||||
super.taskReady();
|
||||
idleState.setState(STATE_IDLE);
|
||||
|
||||
}
|
||||
|
||||
|
||||
public long getTotalBytesRead()
|
||||
{
|
||||
if(task != null)
|
||||
{
|
||||
return totalBytesRead + ((FetcherTask)task).getBytesRead();
|
||||
}
|
||||
else
|
||||
{
|
||||
return totalBytesRead;
|
||||
}
|
||||
}
|
||||
|
||||
public long getTotalTasksRun()
|
||||
{
|
||||
return totalTasksRun;
|
||||
}
|
||||
|
||||
public byte[] getDocumentBuffer()
|
||||
{
|
||||
return documentBuffer;
|
||||
}
|
||||
|
||||
public State getTaskState()
|
||||
{
|
||||
if(task != null)
|
||||
{
|
||||
// task could be null here
|
||||
return ((FetcherTask)task).getTaskState();
|
||||
}
|
||||
else
|
||||
{
|
||||
return idleState.cloneState();
|
||||
}
|
||||
}
|
||||
|
||||
}
|
|
@ -1,101 +0,0 @@
|
|||
/*
|
||||
* ====================================================================
|
||||
* The Apache Software License, Version 1.1
|
||||
*
|
||||
* Copyright (c) 2001 The Apache Software Foundation. All rights
|
||||
* reserved.
|
||||
*
|
||||
* Redistribution and use in source and binary forms, with or without
|
||||
* modification, are permitted provided that the following conditions
|
||||
* are met:
|
||||
*
|
||||
* 1. Redistributions of source code must retain the above copyright
|
||||
* notice, this list of conditions and the following disclaimer.
|
||||
*
|
||||
* 2. Redistributions in binary form must reproduce the above copyright
|
||||
* notice, this list of conditions and the following disclaimer in
|
||||
* the documentation and/or other materials provided with the
|
||||
* distribution.
|
||||
*
|
||||
* 3. The end-user documentation included with the redistribution,
|
||||
* if any, must include the following acknowledgment:
|
||||
* "This product includes software developed by the
|
||||
* Apache Software Foundation (http://www.apache.org/)."
|
||||
* Alternately, this acknowledgment may appear in the software itself,
|
||||
* if and wherever such third-party acknowledgments normally appear.
|
||||
*
|
||||
* 4. The names "Apache" and "Apache Software Foundation" and
|
||||
* "Apache Lucene" must not be used to endorse or promote products
|
||||
* derived from this software without prior written permission. For
|
||||
* written permission, please contact apache@apache.org.
|
||||
*
|
||||
* 5. Products derived from this software may not be called "Apache",
|
||||
* "Apache Lucene", nor may "Apache" appear in their name, without
|
||||
* prior written permission of the Apache Software Foundation.
|
||||
*
|
||||
* THIS SOFTWARE IS PROVIDED ``AS IS'' AND ANY EXPRESSED OR IMPLIED
|
||||
* WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED WARRANTIES
|
||||
* OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE
|
||||
* DISCLAIMED. IN NO EVENT SHALL THE APACHE SOFTWARE FOUNDATION OR
|
||||
* ITS CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL,
|
||||
* SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT
|
||||
* LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF
|
||||
* USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND
|
||||
* ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY,
|
||||
* OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT
|
||||
* OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF
|
||||
* SUCH DAMAGE.
|
||||
* ====================================================================
|
||||
*
|
||||
* This software consists of voluntary contributions made by many
|
||||
* individuals on behalf of the Apache Software Foundation. For more
|
||||
* information on the Apache Software Foundation, please see
|
||||
* <http://www.apache.org/>.
|
||||
*/
|
||||
package de.lanlab.larm.fetcher;
|
||||
import de.lanlab.larm.threads.*;
|
||||
import de.lanlab.larm.net.*;
|
||||
|
||||
/**
|
||||
* this factory simply creates fetcher threads. It's passed to the ThreadPool
|
||||
* because the pool is creating the threads on its own
|
||||
*
|
||||
* @author Administrator
|
||||
* @created 14 June 2002
|
||||
* @version $Id: FetcherThreadFactory.java,v 1.2 2002/05/22 23:09:17
|
||||
* cmarschner Exp $
|
||||
*/
|
||||
public class FetcherThreadFactory extends ThreadFactory
|
||||
{
|
||||
|
||||
//static int count = 0;
|
||||
|
||||
ThreadGroup threadGroup = new ThreadGroup("FetcherThreads");
|
||||
|
||||
HostManager hostManager;
|
||||
|
||||
|
||||
/**
|
||||
* Constructor for the FetcherThreadFactory object
|
||||
*
|
||||
* @param hostManager the host manager handed to every fetcher thread this factory creates
|
||||
*/
|
||||
public FetcherThreadFactory(HostManager hostManager)
|
||||
{
|
||||
this.hostManager = hostManager;
|
||||
}
|
||||
|
||||
|
||||
/**
|
||||
* Creates a new FetcherThread in the factory's thread group and sets its priority to 4
|
||||
*
|
||||
* @param count value passed through to the FetcherThread constructor
|
||||
* @return the newly created fetcher thread
|
||||
*/
|
||||
public ServerThread createServerThread(int count)
|
||||
{
|
||||
ServerThread newThread = new FetcherThread(count, threadGroup, hostManager);
|
||||
newThread.setPriority(4);
|
||||
return newThread;
|
||||
}
|
||||
}
|
|
@ -1,75 +0,0 @@
|
|||
/* ====================================================================
|
||||
* The Apache Software License, Version 1.1
|
||||
*
|
||||
* Copyright (c) 2001 The Apache Software Foundation. All rights
|
||||
* reserved.
|
||||
*
|
||||
* Redistribution and use in source and binary forms, with or without
|
||||
* modification, are permitted provided that the following conditions
|
||||
* are met:
|
||||
*
|
||||
* 1. Redistributions of source code must retain the above copyright
|
||||
* notice, this list of conditions and the following disclaimer.
|
||||
*
|
||||
* 2. Redistributions in binary form must reproduce the above copyright
|
||||
* notice, this list of conditions and the following disclaimer in
|
||||
* the documentation and/or other materials provided with the
|
||||
* distribution.
|
||||
*
|
||||
* 3. The end-user documentation included with the redistribution,
|
||||
* if any, must include the following acknowledgment:
|
||||
* "This product includes software developed by the
|
||||
* Apache Software Foundation (http://www.apache.org/)."
|
||||
* Alternately, this acknowledgment may appear in the software itself,
|
||||
* if and wherever such third-party acknowledgments normally appear.
|
||||
*
|
||||
* 4. The names "Apache" and "Apache Software Foundation" and
|
||||
* "Apache Lucene" must not be used to endorse or promote products
|
||||
* derived from this software without prior written permission. For
|
||||
* written permission, please contact apache@apache.org.
|
||||
*
|
||||
* 5. Products derived from this software may not be called "Apache",
|
||||
* "Apache Lucene", nor may "Apache" appear in their name, without
|
||||
* prior written permission of the Apache Software Foundation.
|
||||
*
|
||||
* THIS SOFTWARE IS PROVIDED ``AS IS'' AND ANY EXPRESSED OR IMPLIED
|
||||
* WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED WARRANTIES
|
||||
* OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE
|
||||
* DISCLAIMED. IN NO EVENT SHALL THE APACHE SOFTWARE FOUNDATION OR
|
||||
* ITS CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL,
|
||||
* SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT
|
||||
* LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF
|
||||
* USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND
|
||||
* ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY,
|
||||
* OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT
|
||||
* OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF
|
||||
* SUCH DAMAGE.
|
||||
* ====================================================================
|
||||
*
|
||||
* This software consists of voluntary contributions made by many
|
||||
* individuals on behalf of the Apache Software Foundation. For more
|
||||
* information on the Apache Software Foundation, please see
|
||||
* <http://www.apache.org/>.
|
||||
*/
|
||||
|
||||
package de.lanlab.larm.fetcher;
|
||||
|
||||
|
||||
/**
|
||||
* base class of all filter classes
|
||||
* @version $Id$
|
||||
*/
|
||||
public abstract class Filter
|
||||
{
|
||||
/**
|
||||
* number of items filtered; incremented directly by
|
||||
* the inheriting classes
|
||||
*/
|
||||
protected int filtered = 0;
|
||||
|
||||
|
||||
public int getFiltered()
|
||||
{
|
||||
return filtered;
|
||||
}
|
||||
}
|
|
@ -1,103 +0,0 @@
|
|||
/* ====================================================================
|
||||
* The Apache Software License, Version 1.1
|
||||
*
|
||||
* Copyright (c) 2001 The Apache Software Foundation. All rights
|
||||
* reserved.
|
||||
*
|
||||
* Redistribution and use in source and binary forms, with or without
|
||||
* modification, are permitted provided that the following conditions
|
||||
* are met:
|
||||
*
|
||||
* 1. Redistributions of source code must retain the above copyright
|
||||
* notice, this list of conditions and the following disclaimer.
|
||||
*
|
||||
* 2. Redistributions in binary form must reproduce the above copyright
|
||||
* notice, this list of conditions and the following disclaimer in
|
||||
* the documentation and/or other materials provided with the
|
||||
* distribution.
|
||||
*
|
||||
* 3. The end-user documentation included with the redistribution,
|
||||
* if any, must include the following acknowledgment:
|
||||
* "This product includes software developed by the
|
||||
* Apache Software Foundation (http://www.apache.org/)."
|
||||
* Alternately, this acknowledgment may appear in the software itself,
|
||||
* if and wherever such third-party acknowledgments normally appear.
|
||||
*
|
||||
* 4. The names "Apache" and "Apache Software Foundation" and
|
||||
* "Apache Lucene" must not be used to endorse or promote products
|
||||
* derived from this software without prior written permission. For
|
||||
* written permission, please contact apache@apache.org.
|
||||
*
|
||||
* 5. Products derived from this software may not be called "Apache",
|
||||
* "Apache Lucene", nor may "Apache" appear in their name, without
|
||||
* prior written permission of the Apache Software Foundation.
|
||||
*
|
||||
* THIS SOFTWARE IS PROVIDED ``AS IS'' AND ANY EXPRESSED OR IMPLIED
|
||||
* WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED WARRANTIES
|
||||
* OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE
|
||||
* DISCLAIMED. IN NO EVENT SHALL THE APACHE SOFTWARE FOUNDATION OR
|
||||
* ITS CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL,
|
||||
* SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT
|
||||
* LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF
|
||||
* USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND
|
||||
* ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY,
|
||||
* OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT
|
||||
* OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF
|
||||
* SUCH DAMAGE.
|
||||
* ====================================================================
|
||||
*
|
||||
* This software consists of voluntary contributions made by many
|
||||
* individuals on behalf of the Apache Software Foundation. For more
|
||||
* information on the Apache Software Foundation, please see
|
||||
* <http://www.apache.org/>.
|
||||
*/
|
||||
|
||||
package de.lanlab.larm.fetcher;
|
||||
|
||||
import java.io.*;
|
||||
import java.util.zip.*;
|
||||
import java.net.*;
|
||||
|
||||
/**
|
||||
* Small test program: gzips a sample URL string and prints its original and compressed lengths
|
||||
*
|
||||
* @author Administrator
|
||||
* @created 28 January 2002
|
||||
* @version $Id$
|
||||
*/
|
||||
public class GZipTest
|
||||
{
|
||||
|
||||
/**
|
||||
* Constructor for the GZipTest object
|
||||
*/
|
||||
public GZipTest() { }
|
||||
|
||||
|
||||
/**
|
||||
* The main program for the GZipTest class
|
||||
*
|
||||
* @param args The command line arguments
|
||||
*/
|
||||
public static void main(String[] args)
|
||||
{
|
||||
try
|
||||
{
|
||||
String url = "http://speechdat.phonetik.uni-muenchen.de/speechdt//speechDB/FIXED1SL/BLOCK00/SES0006/A10006O5.aif";
|
||||
|
||||
ByteArrayOutputStream a = new ByteArrayOutputStream(url.length());
|
||||
GZIPOutputStream g = new GZIPOutputStream(a);
|
||||
OutputStreamWriter o = new OutputStreamWriter(g,"ISO-8859-1");
|
||||
|
||||
o.write(url);
|
||||
o.close();
|
||||
// no g.finish() needed: o.close() above has already finished and closed the GZIP stream
|
||||
byte[] array = a.toByteArray();
|
||||
System.out.println("URL: " + url + " \n Length: " + url.length() + "\n zipped: " + array.length
|
||||
);
|
||||
}
|
||||
catch (Exception e)
|
||||
{ e.printStackTrace();
|
||||
}
|
||||
}
|
||||
}
|
|
@ -1,183 +0,0 @@
|
|||
/* ====================================================================
|
||||
* The Apache Software License, Version 1.1
|
||||
*
|
||||
* Copyright (c) 2001 The Apache Software Foundation. All rights
|
||||
* reserved.
|
||||
*
|
||||
* Redistribution and use in source and binary forms, with or without
|
||||
* modification, are permitted provided that the following conditions
|
||||
* are met:
|
||||
*
|
||||
* 1. Redistributions of source code must retain the above copyright
|
||||
* notice, this list of conditions and the following disclaimer.
|
||||
*
|
||||
* 2. Redistributions in binary form must reproduce the above copyright
|
||||
* notice, this list of conditions and the following disclaimer in
|
||||
* the documentation and/or other materials provided with the
|
||||
* distribution.
|
||||
*
|
||||
* 3. The end-user documentation included with the redistribution,
|
||||
* if any, must include the following acknowledgment:
|
||||
* "This product includes software developed by the
|
||||
* Apache Software Foundation (http://www.apache.org/)."
|
||||
* Alternately, this acknowledgment may appear in the software itself,
|
||||
* if and wherever such third-party acknowledgments normally appear.
|
||||
*
|
||||
* 4. The names "Apache" and "Apache Software Foundation" and
|
||||
* "Apache Lucene" must not be used to endorse or promote products
|
||||
* derived from this software without prior written permission. For
|
||||
* written permission, please contact apache@apache.org.
|
||||
*
|
||||
* 5. Products derived from this software may not be called "Apache",
|
||||
* "Apache Lucene", nor may "Apache" appear in their name, without
|
||||
* prior written permission of the Apache Software Foundation.
|
||||
*
|
||||
* THIS SOFTWARE IS PROVIDED ``AS IS'' AND ANY EXPRESSED OR IMPLIED
|
||||
* WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED WARRANTIES
|
||||
* OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE
|
||||
* DISCLAIMED. IN NO EVENT SHALL THE APACHE SOFTWARE FOUNDATION OR
|
||||
* ITS CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL,
|
||||
* SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT
|
||||
* LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF
|
||||
* USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND
|
||||
* ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY,
|
||||
* OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT
|
||||
* OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF
|
||||
* SUCH DAMAGE.
|
||||
* ====================================================================
|
||||
*
|
||||
* This software consists of voluntary contributions made by many
|
||||
* individuals on behalf of the Apache Software Foundation. For more
|
||||
* information on the Apache Software Foundation, please see
|
||||
* <http://www.apache.org/>.
|
||||
*/
|
||||
|
||||
package de.lanlab.larm.fetcher;
|
||||
|
||||
import java.net.*;
|
||||
import java.util.ArrayList;
|
||||
import java.io.*;
|
||||
import de.lanlab.larm.util.*;
|
||||
|
||||
/**
|
||||
* this can be considered a hack
|
||||
* @TODO implement this as a fast way to filter out different URL endings or beginnings
|
||||
* @version $Id$
|
||||
*/
|
||||
public class KnownPathsFilter extends Filter implements MessageListener
|
||||
{
|
||||
|
||||
MessageHandler messageHandler;
|
||||
|
||||
String[] pathsToFilter =
|
||||
{
|
||||
"/robots.txt",
|
||||
"/lmu-32321800/"
|
||||
};
|
||||
|
||||
ArrayList hosts = new ArrayList();
|
||||
Object[] hostsToFilter = null;
|
||||
|
||||
String[] filesToFilter =
|
||||
{
|
||||
// exclude Apache directory files
|
||||
"/?D=D",
|
||||
"/?S=D",
|
||||
"/?M=D",
|
||||
"/?N=D",
|
||||
"/?D=A",
|
||||
"/?S=A",
|
||||
"/?M=A",
|
||||
"/?N=A",
|
||||
};
|
||||
|
||||
int pathLength;
|
||||
int fileLength;
|
||||
int hostLength;
|
||||
SimpleLogger log;
|
||||
|
||||
/**
|
||||
* Constructor for the KnownPathsFilter object
|
||||
*/
|
||||
public KnownPathsFilter(SimpleLogger log)
|
||||
{
|
||||
pathLength = pathsToFilter.length;
|
||||
this.log = log;
|
||||
fileLength = filesToFilter.length;
|
||||
}
|
||||
|
||||
/**
|
||||
* add "forbidden" host name
|
||||
* note: this has no effect after the filter has been added to the message handler
|
||||
* @param hostname
|
||||
*/
|
||||
public void addHostToFilter(String hostname)
|
||||
{
|
||||
this.hosts.add(hostname);
|
||||
}
|
||||
|
||||
/**
|
||||
* Consumes messages whose URL starts with a filtered path, ends with a filtered file suffix, or names a filtered host
|
||||
*
|
||||
* @param message the message to check; it is cast to URLMessage
|
||||
* @return the unchanged message, or null if it was filtered out
|
||||
*/
|
||||
public Message handleRequest(Message message)
|
||||
{
|
||||
try
|
||||
{
|
||||
URL url = new URL(((URLMessage)message).getNormalizedURLString());
|
||||
String file = url.getFile();
|
||||
String host = url.getHost();
|
||||
int i;
|
||||
for (i = 0; i < pathLength; i++)
|
||||
{
|
||||
if (file.startsWith(pathsToFilter[i]))
|
||||
{
|
||||
filtered++;
|
||||
//log.log("KnownPathsFilter: filtered file '" + url + "' - file starts with " + pathsToFilter[i]);
|
||||
log.log(message.toString());
|
||||
return null;
|
||||
}
|
||||
}
|
||||
for (i = 0; i < fileLength; i++)
|
||||
{
|
||||
if (file.endsWith(filesToFilter[i]))
|
||||
{
|
||||
filtered++;
|
||||
//log.log("KnownPathsFilter: filtered file '" + url + "' - file ends with " + filesToFilter[i]);
|
||||
log.log(message.toString());
|
||||
return null;
|
||||
}
|
||||
}
|
||||
for (i = 0; i<hostLength; i++)
|
||||
{
|
||||
if(hostsToFilter[i].equals(host))
|
||||
{
|
||||
filtered++;
|
||||
//log.log("KnownPathsFilter: filtered file '" + url + "' - host equals " + host);
|
||||
log.log(message.toString());
|
||||
return null;
|
||||
}
|
||||
}
|
||||
}
|
||||
catch(MalformedURLException e)
|
||||
{
|
||||
e.printStackTrace();
|
||||
}
|
||||
return message;
|
||||
}
|
||||
|
||||
|
||||
/**
|
||||
* will be called as soon as the Listener is added to the Message Queue
|
||||
*
|
||||
* @param handler the Message Handler
|
||||
*/
|
||||
public void notifyAddedToMessageHandler(MessageHandler handler)
|
||||
{
|
||||
this.messageHandler = handler;
|
||||
this.hostsToFilter = hosts.toArray();
|
||||
this.hostLength = hostsToFilter.length;
|
||||
}
|
||||
}
|
|
@ -1,66 +0,0 @@
|
|||
/* ====================================================================
|
||||
* The Apache Software License, Version 1.1
|
||||
*
|
||||
* Copyright (c) 2001 The Apache Software Foundation. All rights
|
||||
* reserved.
|
||||
*
|
||||
* Redistribution and use in source and binary forms, with or without
|
||||
* modification, are permitted provided that the following conditions
|
||||
* are met:
|
||||
*
|
||||
* 1. Redistributions of source code must retain the above copyright
|
||||
* notice, this list of conditions and the following disclaimer.
|
||||
*
|
||||
* 2. Redistributions in binary form must reproduce the above copyright
|
||||
* notice, this list of conditions and the following disclaimer in
|
||||
* the documentation and/or other materials provided with the
|
||||
* distribution.
|
||||
*
|
||||
* 3. The end-user documentation included with the redistribution,
|
||||
* if any, must include the following acknowledgment:
|
||||
* "This product includes software developed by the
|
||||
* Apache Software Foundation (http://www.apache.org/)."
|
||||
* Alternately, this acknowledgment may appear in the software itself,
|
||||
* if and wherever such third-party acknowledgments normally appear.
|
||||
*
|
||||
* 4. The names "Apache" and "Apache Software Foundation" and
|
||||
* "Apache Lucene" must not be used to endorse or promote products
|
||||
* derived from this software without prior written permission. For
|
||||
* written permission, please contact apache@apache.org.
|
||||
*
|
||||
* 5. Products derived from this software may not be called "Apache",
|
||||
* "Apache Lucene", nor may "Apache" appear in their name, without
|
||||
* prior written permission of the Apache Software Foundation.
|
||||
*
|
||||
* THIS SOFTWARE IS PROVIDED ``AS IS'' AND ANY EXPRESSED OR IMPLIED
|
||||
* WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED WARRANTIES
|
||||
* OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE
|
||||
* DISCLAIMED. IN NO EVENT SHALL THE APACHE SOFTWARE FOUNDATION OR
|
||||
* ITS CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL,
|
||||
* SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT
|
||||
* LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF
|
||||
* USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND
|
||||
* ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY,
|
||||
* OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT
|
||||
* OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF
|
||||
* SUCH DAMAGE.
|
||||
* ====================================================================
|
||||
*
|
||||
* This software consists of voluntary contributions made by many
|
||||
* individuals on behalf of the Apache Software Foundation. For more
|
||||
* information on the Apache Software Foundation, please see
|
||||
* <http://www.apache.org/>.
|
||||
*/
|
||||
|
||||
package de.lanlab.larm.fetcher;
|
||||
|
||||
import java.io.*;
|
||||
|
||||
/**
|
||||
* Marker interface.
|
||||
* represents a simple message.
|
||||
* @version $Id$
|
||||
*/
|
||||
public interface Message
|
||||
{
|
||||
}
|
|
@ -1,319 +0,0 @@
|
|||
/* ====================================================================
|
||||
* The Apache Software License, Version 1.1
|
||||
*
|
||||
* Copyright (c) 2001 The Apache Software Foundation. All rights
|
||||
* reserved.
|
||||
*
|
||||
* Redistribution and use in source and binary forms, with or without
|
||||
* modification, are permitted provided that the following conditions
|
||||
* are met:
|
||||
*
|
||||
* 1. Redistributions of source code must retain the above copyright
|
||||
* notice, this list of conditions and the following disclaimer.
|
||||
*
|
||||
* 2. Redistributions in binary form must reproduce the above copyright
|
||||
* notice, this list of conditions and the following disclaimer in
|
||||
* the documentation and/or other materials provided with the
|
||||
* distribution.
|
||||
*
|
||||
* 3. The end-user documentation included with the redistribution,
|
||||
* if any, must include the following acknowledgment:
|
||||
* "This product includes software developed by the
|
||||
* Apache Software Foundation (http://www.apache.org/)."
|
||||
* Alternately, this acknowledgment may appear in the software itself,
|
||||
* if and wherever such third-party acknowledgments normally appear.
|
||||
*
|
||||
* 4. The names "Apache" and "Apache Software Foundation" and
|
||||
* "Apache Lucene" must not be used to endorse or promote products
|
||||
* derived from this software without prior written permission. For
|
||||
* written permission, please contact apache@apache.org.
|
||||
*
|
||||
* 5. Products derived from this software may not be called "Apache",
|
||||
* "Apache Lucene", nor may "Apache" appear in their name, without
|
||||
* prior written permission of the Apache Software Foundation.
|
||||
*
|
||||
* THIS SOFTWARE IS PROVIDED ``AS IS'' AND ANY EXPRESSED OR IMPLIED
|
||||
* WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED WARRANTIES
|
||||
* OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE
|
||||
* DISCLAIMED. IN NO EVENT SHALL THE APACHE SOFTWARE FOUNDATION OR
|
||||
* ITS CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL,
|
||||
* SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT
|
||||
* LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF
|
||||
* USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND
|
||||
* ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY,
|
||||
* OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT
|
||||
* OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF
|
||||
* SUCH DAMAGE.
|
||||
* ====================================================================
|
||||
*
|
||||
* This software consists of voluntary contributions made by many
|
||||
* individuals on behalf of the Apache Software Foundation. For more
|
||||
* information on the Apache Software Foundation, please see
|
||||
* <http://www.apache.org/>.
|
||||
*/
|
||||
|
||||
package de.lanlab.larm.fetcher;
|
||||
|
||||
import java.util.*;
|
||||
import de.lanlab.larm.util.SimpleObservable;
|
||||
import de.lanlab.larm.util.CachingQueue;
|
||||
import de.lanlab.larm.util.UnderflowException;
|
||||
import de.lanlab.larm.storage.LinkStorage;
|
||||
|
||||
/**
|
||||
* This is a message handler that runs in its own thread.
|
||||
* Messages can be put via <code>putMessage</code> or <code>putMessages</code>
|
||||
* (use the latter whenever possible).<br>
|
||||
* Messages are passed to the filters in the order in which the filters were
|
||||
* added to the handler.<br>
|
||||
* They can consume a message by returning <code>null</code>. Otherwise, they
|
||||
* return a Message object, usually the one they got.<br>
|
||||
* The filters will run synchronously within the message handler thread<br>
|
||||
* This implements a chain of responsibility-style message handling.
|
||||
* @version $Id$
|
||||
*/
|
||||
public class MessageHandler implements Runnable, LinkStorage
|
||||
{
|
||||
|
||||
/**
|
||||
* the queue where messages are put in.
|
||||
* Holds max. 2 x 5000 = 10,000 messages in RAM
|
||||
*/
|
||||
private CachingQueue messageQueue = new CachingQueue("fetcherURLMessageQueue", 5000);
|
||||
|
||||
/**
|
||||
* list of Observers
|
||||
*/
|
||||
private LinkedList listeners = new LinkedList();
|
||||
|
||||
/**
|
||||
* true as long as the thread is running
|
||||
*/
|
||||
private boolean running = true;
|
||||
|
||||
/**
|
||||
* the message handler thread
|
||||
*/
|
||||
private Thread t;
|
||||
|
||||
/**
|
||||
* flag for thread communication
|
||||
*/
|
||||
boolean messagesWaiting = false;
|
||||
|
||||
/**
|
||||
* true when a message is processed by the filters
|
||||
*/
|
||||
boolean workingOnMessage = false;
|
||||
|
||||
Object queueMonitor = new Object();
|
||||
|
||||
SimpleObservable messageQueueObservable = new SimpleObservable();
|
||||
SimpleObservable messageProcessorObservable = new SimpleObservable();
|
||||
|
||||
/**
|
||||
* creates and starts the message handler thread
|
||||
*/
|
||||
public MessageHandler()
|
||||
{
|
||||
t = new Thread(this, "MessageHandler Thread");
|
||||
// higher priority to prevent starving when a lot of fetcher threads are used
|
||||
t.setPriority(5);
|
||||
t.start();
|
||||
}
|
||||
|
||||
public boolean isWorkingOnMessage()
|
||||
{
|
||||
return workingOnMessage;
|
||||
}
|
||||
|
||||
/**
|
||||
* joins the message handler thread
|
||||
*/
|
||||
public void finalize()
|
||||
{
|
||||
if (t != null)
|
||||
{
|
||||
try
|
||||
{
|
||||
t.join();
|
||||
t = null;
|
||||
}
|
||||
catch(InterruptedException e) {}
|
||||
}
|
||||
}
|
||||
|
||||
/**
|
||||
* registers a filter to the message handler
|
||||
* @param m the listener to register
|
||||
*/
|
||||
public void addListener(MessageListener m)
|
||||
{
|
||||
m.notifyAddedToMessageHandler(this);
|
||||
listeners.addLast(m);
|
||||
}
|
||||
|
||||
/**
|
||||
* registers a MessageQueueObserver
|
||||
* It will be notified whenever a message is put into the Queue (Parameter is Int(1)) or
|
||||
* removed (Parameter is Int(-1))
|
||||
* @param o the Observer
|
||||
*/
|
||||
public void addMessageQueueObserver(Observer o)
|
||||
{
|
||||
messageQueueObservable.addObserver(o);
|
||||
}
|
||||
|
||||
/**
|
||||
* adds a message processor observer
|
||||
* It will be notified when a message is consumed. In this case the parameter
|
||||
* is the filter that consumed the message
|
||||
* @param o the Observer
|
||||
*/
|
||||
public void addMessageProcessorObserver(Observer o)
|
||||
{
|
||||
messageProcessorObservable.addObserver(o);
|
||||
}
|
||||
|
||||
/**
|
||||
* insert one message into the queue
|
||||
*/
|
||||
public void putMessage(Message msg)
|
||||
{
|
||||
messageQueue.insert(msg);
|
||||
messageQueueObservable.setChanged();
|
||||
messageQueueObservable.notifyObservers(new Integer(1));
|
||||
synchronized(queueMonitor)
|
||||
{
|
||||
messagesWaiting = true;
|
||||
queueMonitor.notify();
|
||||
}
|
||||
}
|
||||
|
||||
/**
|
||||
* add a collection of events to the message queue
|
||||
*/
|
||||
public void putMessages(Collection msgs)
|
||||
{
|
||||
for(Iterator i = msgs.iterator(); i.hasNext();)
|
||||
{
|
||||
Message msg = (Message)i.next();
|
||||
messageQueue.insert(msg);
|
||||
}
|
||||
messageQueueObservable.setChanged();
|
||||
messageQueueObservable.notifyObservers(new Integer(1));
|
||||
synchronized(queueMonitor)
|
||||
{
|
||||
messagesWaiting = true;
|
||||
queueMonitor.notify();
|
||||
}
|
||||
}
|
||||
|
||||
public Collection storeLinks(Collection links)
|
||||
{
|
||||
putMessages(links);
|
||||
return links;
|
||||
}
|
||||
|
||||
|
||||
/**
|
||||
* the main messageHandler-Thread.
|
||||
*/
|
||||
public void run()
|
||||
{
|
||||
while(running)
|
||||
{
|
||||
//System.out.println("MessageHandler-Thread started");
|
||||
|
||||
synchronized(queueMonitor)
|
||||
{
|
||||
// wait for new messages
|
||||
workingOnMessage=false;
|
||||
try
|
||||
{
|
||||
while (!messagesWaiting) { queueMonitor.wait(); } // re-check the flag so a notify sent before we wait is not lost
|
||||
}
|
||||
catch(InterruptedException e)
|
||||
{
|
||||
System.out.println("MessageHandler: Caught InterruptedException");
|
||||
}
|
||||
workingOnMessage=true;
|
||||
}
|
||||
//messagesWaiting = false;
|
||||
Message m;
|
||||
try
|
||||
{
|
||||
while(messagesWaiting)
|
||||
{
|
||||
synchronized(this.queueMonitor)
|
||||
{
|
||||
// note: another thread may put a new message in the queue after
|
||||
// messageQueue.size() is called below, which would result in the
|
||||
// inconsistent state: messageWaiting would be set to false, but
|
||||
// the queue would actually not be empty
|
||||
m = (Message)messageQueue.remove();
|
||||
if (messageQueue.size() == 0)
|
||||
{
|
||||
messagesWaiting = false;
|
||||
}
|
||||
|
||||
}
|
||||
//System.out.println("MessageHandler:run: removing first element");
|
||||
|
||||
messageQueueObservable.setChanged();
|
||||
messageQueueObservable.notifyObservers(new Integer(-1)); // Message processed
|
||||
|
||||
// now distribute them. The handlers get the messages in the order
|
||||
// of insertion and have the right to change them
|
||||
|
||||
Iterator i = listeners.iterator();
|
||||
while(i.hasNext())
|
||||
{
|
||||
try
|
||||
{
|
||||
MessageListener listener = (MessageListener)i.next();
|
||||
m = (Message)listener.handleRequest(m);
|
||||
if (m == null)
|
||||
{
|
||||
// handler has consumed the message
|
||||
messageProcessorObservable.setChanged();
|
||||
messageProcessorObservable.notifyObservers(listener);
|
||||
break;
|
||||
}
|
||||
}
|
||||
catch(ClassCastException e)
|
||||
{
|
||||
System.out.println("MessageHandler:run: ClassCastException(2): " + e.getMessage());
|
||||
}
|
||||
}
|
||||
}
|
||||
}
|
||||
catch (ClassCastException e)
|
||||
{
|
||||
System.out.println("MessageHandler:run: ClassCastException: " + e.getMessage());
|
||||
}
|
||||
catch (UnderflowException e)
|
||||
{
|
||||
messagesWaiting = false;
|
||||
// System.out.println("MessageHandler: messagesWaiting = true although nothing queued!");
|
||||
// @FIXME: here is still a multi threading issue. I don't get it why this happens.
|
||||
// does someone want to draw a petri net of this? ;-)
|
||||
}
|
||||
catch (Exception e)
|
||||
{
|
||||
System.out.println("MessageHandler: " + e.getClass() + " " + e.getMessage());
|
||||
e.printStackTrace();
|
||||
}
|
||||
|
||||
}
|
||||
}
|
||||
|
||||
public int getQueued()
|
||||
{
|
||||
return messageQueue.size();
|
||||
}
|
||||
|
||||
public void openLinkStorage()
|
||||
{
|
||||
}
|
||||
}
|
|
@ -1,84 +0,0 @@
|
|||
/* ====================================================================
|
||||
* The Apache Software License, Version 1.1
|
||||
*
|
||||
* Copyright (c) 2001 The Apache Software Foundation. All rights
|
||||
* reserved.
|
||||
*
|
||||
* Redistribution and use in source and binary forms, with or without
|
||||
* modification, are permitted provided that the following conditions
|
||||
* are met:
|
||||
*
|
||||
* 1. Redistributions of source code must retain the above copyright
|
||||
* notice, this list of conditions and the following disclaimer.
|
||||
*
|
||||
* 2. Redistributions in binary form must reproduce the above copyright
|
||||
* notice, this list of conditions and the following disclaimer in
|
||||
* the documentation and/or other materials provided with the
|
||||
* distribution.
|
||||
*
|
||||
* 3. The end-user documentation included with the redistribution,
|
||||
* if any, must include the following acknowledgment:
|
||||
* "This product includes software developed by the
|
||||
* Apache Software Foundation (http://www.apache.org/)."
|
||||
* Alternately, this acknowledgment may appear in the software itself,
|
||||
* if and wherever such third-party acknowledgments normally appear.
|
||||
*
|
||||
* 4. The names "Apache" and "Apache Software Foundation" and
|
||||
* "Apache Lucene" must not be used to endorse or promote products
|
||||
* derived from this software without prior written permission. For
|
||||
* written permission, please contact apache@apache.org.
|
||||
*
|
||||
* 5. Products derived from this software may not be called "Apache",
|
||||
* "Apache Lucene", nor may "Apache" appear in their name, without
|
||||
* prior written permission of the Apache Software Foundation.
|
||||
*
|
||||
* THIS SOFTWARE IS PROVIDED ``AS IS'' AND ANY EXPRESSED OR IMPLIED
|
||||
* WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED WARRANTIES
|
||||
* OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE
|
||||
* DISCLAIMED. IN NO EVENT SHALL THE APACHE SOFTWARE FOUNDATION OR
|
||||
* ITS CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL,
|
||||
* SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT
|
||||
* LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF
|
||||
* USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND
|
||||
* ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY,
|
||||
* OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT
|
||||
* OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF
|
||||
* SUCH DAMAGE.
|
||||
* ====================================================================
|
||||
*
|
||||
* This software consists of voluntary contributions made by many
|
||||
* individuals on behalf of the Apache Software Foundation. For more
|
||||
* information on the Apache Software Foundation, please see
|
||||
* <http://www.apache.org/>.
|
||||
*/
|
||||
|
||||
package de.lanlab.larm.fetcher;
|
||||
|
||||
/**
|
||||
* A Message Listener works on messages in a message queue. Usually it returns
|
||||
* the message back into the queue. But it can also change the message or create
|
||||
* a new object. If it returns null, the message handler stops processing the message.
|
||||
*
|
||||
* @author Administrator
|
||||
* @created 24. November 2001
|
||||
* @version $Id$
|
||||
*/
|
||||
public interface MessageListener
|
||||
{
|
||||
/**
|
||||
* handles a message taken from the message queue
|
||||
*
|
||||
* @param message the message to be handled
|
||||
* @return Message usually the original message
|
||||
* null: the message was consumed
|
||||
*/
|
||||
public Message handleRequest(Message message);
|
||||
|
||||
|
||||
/**
|
||||
* will be called as soon as the Listener is added to the Message Queue
|
||||
*
|
||||
* @param handler the Message Handler
|
||||
*/
|
||||
public void notifyAddedToMessageHandler(MessageHandler handler);
|
||||
}
|
|
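The MessageListener contract documented above (return the message, return a changed message, or return null to consume it) can be sketched roughly as follows. This is only an illustration under stated assumptions: `ListenerSketch` and `HandlerSketch` are hypothetical names, messages are plain strings instead of `Message` objects, and the real `MessageHandler` is not shown in this section, so its actual dispatch logic may differ.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical stand-in for the listener contract: return the message, a changed message, or null. */
interface ListenerSketch {
    String handleRequest(String message);
}

/** Hypothetical handler that passes one message through its listeners in order. */
class HandlerSketch {
    private final List<ListenerSketch> listeners = new ArrayList<>();

    void addListener(ListenerSketch listener) {
        listeners.add(listener);
    }

    /** Returns the final message, or null if some listener consumed it. */
    String process(String message) {
        String current = message;
        for (ListenerSketch listener : listeners) {
            current = listener.handleRequest(current);
            if (current == null) {
                return null; // consumed: later listeners never see the message
            }
        }
        return current;
    }
}
```

A filter in this scheme is simply a listener that returns null for messages it rejects, which is how RobotExclusionFilter uses the return value of handleRequest.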
@ -1,485 +0,0 @@
|
|||
/* ====================================================================
|
||||
* The Apache Software License, Version 1.1
|
||||
*
|
||||
* Copyright (c) 2001 The Apache Software Foundation. All rights
|
||||
* reserved.
|
||||
*
|
||||
* Redistribution and use in source and binary forms, with or without
|
||||
* modification, are permitted provided that the following conditions
|
||||
* are met:
|
||||
*
|
||||
* 1. Redistributions of source code must retain the above copyright
|
||||
* notice, this list of conditions and the following disclaimer.
|
||||
*
|
||||
* 2. Redistributions in binary form must reproduce the above copyright
|
||||
* notice, this list of conditions and the following disclaimer in
|
||||
* the documentation and/or other materials provided with the
|
||||
* distribution.
|
||||
*
|
||||
* 3. The end-user documentation included with the redistribution,
|
||||
* if any, must include the following acknowledgment:
|
||||
* "This product includes software developed by the
|
||||
* Apache Software Foundation (http://www.apache.org/)."
|
||||
* Alternately, this acknowledgment may appear in the software itself,
|
||||
* if and wherever such third-party acknowledgments normally appear.
|
||||
*
|
||||
* 4. The names "Apache" and "Apache Software Foundation" and
|
||||
* "Apache Lucene" must not be used to endorse or promote products
|
||||
* derived from this software without prior written permission. For
|
||||
* written permission, please contact apache@apache.org.
|
||||
*
|
||||
* 5. Products derived from this software may not be called "Apache",
|
||||
* "Apache Lucene", nor may "Apache" appear in their name, without
|
||||
* prior written permission of the Apache Software Foundation.
|
||||
*
|
||||
* THIS SOFTWARE IS PROVIDED ``AS IS'' AND ANY EXPRESSED OR IMPLIED
|
||||
* WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED WARRANTIES
|
||||
* OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE
|
||||
* DISCLAIMED. IN NO EVENT SHALL THE APACHE SOFTWARE FOUNDATION OR
|
||||
* ITS CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL,
|
||||
* SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT
|
||||
* LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF
|
||||
* USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND
|
||||
* ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY,
|
||||
* OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT
|
||||
* OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF
|
||||
* SUCH DAMAGE.
|
||||
* ====================================================================
|
||||
*
|
||||
* This software consists of voluntary contributions made by many
|
||||
* individuals on behalf of the Apache Software Foundation. For more
|
||||
* information on the Apache Software Foundation, please see
|
||||
* <http://www.apache.org/>.
|
||||
*/
|
||||
|
||||
package de.lanlab.larm.fetcher;
|
||||
|
||||
import de.lanlab.larm.util.SimpleObservable;
|
||||
import de.lanlab.larm.util.State;
|
||||
import java.util.*;
|
||||
import java.net.*;
|
||||
import java.io.*;
|
||||
import org.apache.oro.text.perl.Perl5Util;
|
||||
import de.lanlab.larm.util.*;
|
||||
import de.lanlab.larm.threads.*;
|
||||
import HTTPClient.*;
|
||||
import de.lanlab.larm.net.*;
|
||||
|
||||
/**
|
||||
* This factory simply creates the threads of the robot exclusion filter. It is passed to the
|
||||
* ThreadPool because the pool creates the threads on its own.
|
||||
*
|
||||
* @author Administrator
|
||||
* @created 17. Februar 2002
|
||||
* @version $Id$
|
||||
*/
|
||||
class REFThreadFactory extends ThreadFactory
|
||||
{
|
||||
|
||||
ThreadGroup threadGroup = new ThreadGroup("RobotExclusionFilter");
|
||||
|
||||
|
||||
/**
|
||||
* Creates a server thread for the robot exclusion filter's thread pool.
|
||||
*
|
||||
* @param count the thread number; also used in the thread name ("REF-" + count)
|
||||
* @return the new server thread, with its priority set to 4
|
||||
*/
|
||||
public ServerThread createServerThread(int count)
|
||||
{
|
||||
ServerThread newThread = new ServerThread(count, "REF-" + count, threadGroup);
|
||||
newThread.setPriority(4);
|
||||
return newThread;
|
||||
}
|
||||
}
|
||||
|
||||
/**
|
||||
* the RE filter obeys the robot exclusion standard. If a new host name is supposed
|
||||
* to be accessed, it first loads a "/robots.txt" on the given server and records the
|
||||
* disallows stated in that file.
|
||||
* The REFilter has a thread pool on its own to prevent the message handler from being
|
||||
* clogged up if the server doesn't respond. Incoming messages are queued while the
|
||||
* robots.txt is loaded.
|
||||
* The information is stored in HostInfo records of the host manager class
|
||||
*
|
||||
* @author Clemens Marschner
|
||||
* @created 17. Februar 2002
|
||||
*/
|
||||
public class RobotExclusionFilter extends Filter implements MessageListener
|
||||
{
|
||||
|
||||
|
||||
protected HostManager hostManager;
|
||||
|
||||
protected SimpleLogger log;
|
||||
|
||||
|
||||
/**
|
||||
* Constructor for the RobotExclusionFilter object
|
||||
*
|
||||
* @param hm the host manager that provides the HostInfo records
|
||||
*/
|
||||
public RobotExclusionFilter(HostManager hm)
|
||||
{
|
||||
log = new SimpleLogger("RobotExclusionFilter", true);
|
||||
hostManager = hm;
|
||||
rePool = new ThreadPool(5, new REFThreadFactory());
|
||||
rePool.init();
|
||||
log.setFlushAtOnce(true);
|
||||
log.log("refilter: initialized");
|
||||
}
|
||||
|
||||
|
||||
/**
|
||||
* called by the message handler
|
||||
*/
|
||||
public void notifyAddedToMessageHandler(MessageHandler handler)
|
||||
{
|
||||
this.messageHandler = handler;
|
||||
}
|
||||
|
||||
|
||||
MessageHandler messageHandler = null;
|
||||
ThreadPool rePool;
|
||||
|
||||
|
||||
/**
|
||||
* method that handles each URL request<p>
|
||||
*
|
||||
* This method will get the robots.txt file the first time a server is
|
||||
* requested. See the description above.
|
||||
*
|
||||
* @param message
|
||||
* the (URL)Message
|
||||
* @return
|
||||
* the original message or NULL if this host had a disallow on that URL
|
||||
* (see http://info.webcrawler.com/mak/projects/robots/norobots.html)
|
||||
*/
|
||||
|
||||
public Message handleRequest(Message message)
|
||||
{
|
||||
//log.logThreadSafe("handleRequest: got message: " + message);
|
||||
try
|
||||
{
|
||||
// assert message instanceof URLMessage;
|
||||
URLMessage urlMsg = ((URLMessage) message);
|
||||
URL url = urlMsg.getUrl();
|
||||
// String urlString = urlMsg.getNormalizedURLString();
|
||||
// URL nUrl = new URL(urlString);
|
||||
//assert url != null;
|
||||
HostInfo h = hostManager.getHostInfo(url.getHost());
|
||||
synchronized (h)
|
||||
{
|
||||
if (!h.isRobotTxtChecked() && !h.isLoadingRobotsTxt())
|
||||
{
|
||||
log.logThreadSafe("handleRequest: starting to get robots.txt");
|
||||
// probably this results in Race Conditions here
|
||||
|
||||
rePool.doTask(new RobotExclusionTask(h), new Integer(h.getId()));
|
||||
h.setLoadingRobotsTxt(true);
|
||||
}
|
||||
|
||||
// isLoading...() and queuedRequest.insert() must be atomic
|
||||
if (h.isLoadingRobotsTxt())
|
||||
{
|
||||
|
||||
//log.logThreadSafe("handleRequest: other thread is loading");
|
||||
// assert h.queuedRequests != null
|
||||
h.insertIntoQueue(message);
|
||||
// not thread safe
|
||||
log.logThreadSafe("handleRequest: queued file " + url);
|
||||
return null;
|
||||
}
|
||||
}
|
||||
|
||||
//log.logThreadSafe("handleRequest: no thread is loading; robots.txt loaded");
|
||||
//log.logThreadSafe("handleRequest: checking if allowed");
|
||||
String path = url.getPath();
|
||||
if (path == null || path.equals(""))
|
||||
{
|
||||
path = "/";
|
||||
}
|
||||
|
||||
if (h.isAllowed(path))
|
||||
{
|
||||
// log.logThreadSafe("handleRequest: file " + urlMsg.getURLString() + " ok");
|
||||
return message;
|
||||
}
|
||||
log.logThreadSafe("handleRequest: file " + urlMsg.getURLString() + " filtered");
|
||||
this.filtered++;
|
||||
}
|
||||
catch (Exception e)
|
||||
{
|
||||
e.printStackTrace();
|
||||
}
|
||||
return null;
|
||||
}
|
||||
|
||||
|
||||
private static volatile NVPair headers[] = new NVPair[1];
|
||||
|
||||
static
|
||||
{
|
||||
headers[0] = new HTTPClient.NVPair("User-Agent", Constants.CRAWLER_AGENT);
|
||||
|
||||
}
|
||||
|
||||
|
||||
/**
|
||||
* the task that actually loads and parses the robots.txt files
|
||||
*
|
||||
* @author Clemens Marschner
|
||||
* @created 17. Februar 2002
|
||||
*/
|
||||
class RobotExclusionTask implements InterruptableTask
|
||||
{
|
||||
HostInfo hostInfo;
|
||||
|
||||
|
||||
|
||||
/**
|
||||
* Constructor for the RobotExclusionTask object
|
||||
*
|
||||
* @param hostInfo the host whose robots.txt is to be loaded
|
||||
*/
|
||||
public RobotExclusionTask(HostInfo hostInfo)
|
||||
{
|
||||
this.hostInfo = hostInfo;
|
||||
}
|
||||
|
||||
|
||||
/**
|
||||
* dummy
|
||||
*
|
||||
* @return The info value
|
||||
*/
|
||||
public String getInfo()
|
||||
{
|
||||
return "";
|
||||
}
|
||||
|
||||
|
||||
/**
|
||||
* not used
|
||||
*/
|
||||
public void interrupt() { }
|
||||
|
||||
|
||||
/**
|
||||
* gets a robots.txt file and adds the information to the hostInfo
|
||||
* structure
|
||||
*
|
||||
* @param thread the server thread (passed by the thread pool)
|
||||
*/
|
||||
public void run(ServerThread thread)
|
||||
{
|
||||
String threadName = Thread.currentThread().getName();
|
||||
synchronized(hostInfo)
|
||||
{
|
||||
if(hostInfo.isRobotTxtChecked())
|
||||
{
|
||||
log.logThreadSafe("task " + threadName + ": already loaded " + hostInfo.getHostName());
|
||||
return; // may happen 'cause check is not synchronized
|
||||
}
|
||||
}
|
||||
// assert hostInfo != null;
|
||||
|
||||
log.logThreadSafe("task " + threadName + ": starting to load " + hostInfo.getHostName());
|
||||
//hostInfo.setLoadingRobotsTxt(true);
|
||||
String[] disallows = null;
|
||||
boolean errorOccured = false;
|
||||
try
|
||||
{
|
||||
log.logThreadSafe("task " + threadName + ": getting connection");
|
||||
HTTPConnection conn = new HTTPConnection(hostInfo.getHostName());
|
||||
conn.setTimeout(30000);
|
||||
// wait at most 30 secs
|
||||
|
||||
HTTPResponse res = conn.Get("/robots.txt", (String) null, headers);
|
||||
log.logThreadSafe("task " + threadName + ": got connection.");
|
||||
if (res.getStatusCode() != 200)
|
||||
{
|
||||
errorOccured = true;
|
||||
log.log("task " + threadName + ": return code was " + res.getStatusCode());
|
||||
}
|
||||
else
|
||||
{
|
||||
|
||||
log.logThreadSafe("task " + threadName + ": reading");
|
||||
byte[] file = res.getData(40000);
|
||||
// max. 40 kb
|
||||
log.logThreadSafe("task " + threadName + ": reading done. parsing");
|
||||
disallows = parse(new BufferedReader(new InputStreamReader(new ByteArrayInputStream(file))));
|
||||
log.logThreadSafe("task " + threadName + ": parsing done. found " + disallows.length + " disallows");
|
||||
// assert disallows != null
|
||||
// HostInfo hostInfo = hostManager.getHostInfo(this.hostName);
|
||||
// assert hostInfo != null
|
||||
log.logThreadSafe("task " + threadName + ": setting disallows");
|
||||
}
|
||||
}
|
||||
catch (java.net.UnknownHostException e)
|
||||
{
|
||||
hostInfo.setReachable(false);
|
||||
log.logThreadSafe("task " + threadName + ": unknown host '" + hostInfo.getHostName() + "'. setting to unreachable");
|
||||
errorOccured = true;
|
||||
}
|
||||
catch (java.net.NoRouteToHostException e)
|
||||
{
|
||||
hostInfo.setReachable(false);
|
||||
log.logThreadSafe("task " + threadName + ": no route to '"+hostInfo.getHostName()+"'. setting to unreachable");
|
||||
errorOccured = true;
|
||||
}
|
||||
catch (java.net.ConnectException e)
|
||||
{
|
||||
hostInfo.setReachable(false);
|
||||
log.logThreadSafe("task " + threadName + ": connect exception while connecting to '"+hostInfo.getHostName()+"'. setting to unreachable");
|
||||
errorOccured = true;
|
||||
}
|
||||
catch (java.io.InterruptedIOException e)
|
||||
{
|
||||
// time out. fatal in this case
|
||||
hostInfo.setReachable(false);
|
||||
log.logThreadSafe("task " + threadName + ": time out while connecting to '" +hostInfo.getHostName() + "'. setting to unreachable");
|
||||
errorOccured = true;
|
||||
}
|
||||
|
||||
catch (Throwable e)
|
||||
{
|
||||
errorOccured = true;
|
||||
log.log("task " + threadName + ": unknown exception: " + e.getClass().getName() + ": " + e.getMessage() + ". continuing");
|
||||
log.log(e);
|
||||
|
||||
}
|
||||
finally
|
||||
{
|
||||
if (errorOccured)
|
||||
{
|
||||
log.logThreadSafe("task " + threadName + ": error occurred. putback...");
|
||||
synchronized (hostInfo)
|
||||
{
|
||||
hostInfo.setRobotsChecked(true, null);
|
||||
// crawl everything
|
||||
hostInfo.setLoadingRobotsTxt(false);
|
||||
log.logThreadSafe("task " + threadName + ": now put " + hostInfo.getQueueSize() + " queued requests back");
|
||||
//hostInfo.setLoadingRobotsTxt(false);
|
||||
putBackURLs();
|
||||
}
|
||||
}
|
||||
else
|
||||
{
|
||||
log.logThreadSafe("task " + threadName + ": finished. putback...");
|
||||
synchronized (hostInfo)
|
||||
{
|
||||
hostInfo.setRobotsChecked(true, disallows);
|
||||
log.logThreadSafe("task " + threadName + ": done");
|
||||
log.logThreadSafe("task " + threadName + ": now put " + hostInfo.getQueueSize() + " queued requests back");
|
||||
hostInfo.setLoadingRobotsTxt(false);
|
||||
putBackURLs();
|
||||
}
|
||||
}
|
||||
}
|
||||
}
|
||||
|
||||
|
||||
/**
|
||||
* put back queued URLs
|
||||
*/
|
||||
private void putBackURLs()
|
||||
{
|
||||
|
||||
int qSize = hostInfo.getQueueSize();
|
||||
while (hostInfo.getQueueSize() > 0)
|
||||
{
|
||||
messageHandler.putMessage((Message) hostInfo.removeFromQueue());
|
||||
}
|
||||
log.logThreadSafe("task " + Thread.currentThread().getName() + ": finished. put " + qSize + " URLs back");
|
||||
hostInfo.removeQueue();
|
||||
}
|
||||
|
||||
|
||||
/**
|
||||
* This parses the robots.txt file. It was taken from the Perl implementation.
|
||||
* Since this is only rarely called, it is not optimized for speed.
|
||||
*
|
||||
* @param r the robots.txt file
|
||||
* @return the disallows
|
||||
* @exception IOException any IOException
|
||||
*/
|
||||
public String[] parse(BufferedReader r)
|
||||
throws IOException
|
||||
{
|
||||
// taken from Perl
|
||||
Perl5Util p = new Perl5Util();
|
||||
String line;
|
||||
boolean isMe = false;
|
||||
boolean isAnon = false;
|
||||
ArrayList disallowed = new ArrayList();
|
||||
String ua = null;
|
||||
|
||||
while ((line = r.readLine()) != null)
|
||||
{
|
||||
if (p.match("/^#.*/", line))
|
||||
{
|
||||
// a comment
|
||||
continue;
|
||||
}
|
||||
line = p.substitute("s/\\s*\\#.*//", line);
|
||||
if (p.match("/^\\s*$/", line))
|
||||
{
|
||||
if (isMe)
|
||||
{
|
||||
break;
|
||||
}
|
||||
}
|
||||
else if (p.match("/^User-Agent:\\s*(.*)/i", line))
|
||||
{
|
||||
ua = p.group(1);
|
||||
ua = p.substitute("s/\\s+$//", ua);
|
||||
if (isMe)
|
||||
{
|
||||
break;
|
||||
}
|
||||
else if (ua.equals("*"))
|
||||
{
|
||||
isAnon = true;
|
||||
}
|
||||
else if (Constants.CRAWLER_AGENT.startsWith(ua))
|
||||
{
|
||||
isMe = true;
|
||||
}
|
||||
}
|
||||
else if (p.match("/^Disallow:\\s*(.*)/i", line))
|
||||
{
|
||||
if (ua == null)
|
||||
{
|
||||
isAnon = true;
|
||||
// warn...
|
||||
}
|
||||
String disallow = p.group(1);
|
||||
if (disallow != null && disallow.length() > 0)
|
||||
{
|
||||
// assume we have a relative path
|
||||
;
|
||||
}
|
||||
else
|
||||
{
|
||||
disallow = "/";
|
||||
}
|
||||
if (isMe || isAnon)
|
||||
{
|
||||
disallowed.add(disallow);
|
||||
}
|
||||
}
|
||||
else
|
||||
{
|
||||
// warn: unexpected line
|
||||
}
|
||||
}
|
||||
String[] disalloweds = new String[disallowed.size()];
|
||||
disallowed.toArray(disalloweds);
|
||||
return disalloweds;
|
||||
}
|
||||
|
||||
}
|
||||
|
||||
}
|
|
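The record-matching logic of the parse() method above can be sketched without the ORO Perl5Util dependency, using only java.util.regex. This is a rough sketch, not the original implementation: `RobotsTxtSketch`, `parseDisallows` and the `agent` parameter are hypothetical names, and the agent string stands in for Constants.CRAWLER_AGENT. It mirrors the behavior of the code above, including its treatment of an empty Disallow value as "/", and additionally trims the path.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class RobotsTxtSketch {
    private static final Pattern USER_AGENT =
            Pattern.compile("^User-Agent:\\s*(.*)", Pattern.CASE_INSENSITIVE);
    private static final Pattern DISALLOW =
            Pattern.compile("^Disallow:\\s*(.*)", Pattern.CASE_INSENSITIVE);

    /** Collects the Disallow paths that apply to the given agent name (assumed parameter). */
    static List<String> parseDisallows(String robotsTxt, String agent) {
        List<String> disallowed = new ArrayList<>();
        boolean isMe = false;   // inside a record that names this crawler
        boolean isAnon = false; // a "User-Agent: *" record was seen
        String ua = null;
        for (String raw : robotsTxt.split("\\r?\\n")) {
            if (raw.startsWith("#")) {
                continue; // whole-line comment
            }
            String line = raw.replaceFirst("\\s*#.*", ""); // strip trailing comment
            if (line.trim().isEmpty()) {
                if (isMe) {
                    break; // a blank line ends the record for this crawler
                }
                continue;
            }
            Matcher m = USER_AGENT.matcher(line);
            if (m.find()) {
                ua = m.group(1).trim();
                if (isMe) {
                    break; // next record starts; ours is complete
                } else if (ua.equals("*")) {
                    isAnon = true;
                } else if (agent.startsWith(ua)) {
                    isMe = true;
                }
                continue;
            }
            m = DISALLOW.matcher(line);
            if (m.find()) {
                if (ua == null) {
                    isAnon = true; // Disallow before any User-Agent line
                }
                String path = m.group(1).trim();
                if (path.isEmpty()) {
                    path = "/"; // mirrors the code above
                }
                if (isMe || isAnon) {
                    disallowed.add(path);
                }
            }
        }
        return disallowed;
    }
}
```

For example, a file containing a "User-agent: *" record with the lines "Disallow: /private" and "Disallow: /tmp" yields those two paths for any agent name.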
@ -1,603 +0,0 @@
|
|||
/* ====================================================================
|
||||
* The Apache Software License, Version 1.1
|
||||
*
|
||||
* Copyright (c) 2001 The Apache Software Foundation. All rights
|
||||
* reserved.
|
||||
*
|
||||
* Redistribution and use in source and binary forms, with or without
|
||||
* modification, are permitted provided that the following conditions
|
||||
* are met:
|
||||
*
|
||||
* 1. Redistributions of source code must retain the above copyright
|
||||
* notice, this list of conditions and the following disclaimer.
|
||||
*
|
||||
* 2. Redistributions in binary form must reproduce the above copyright
|
||||
* notice, this list of conditions and the following disclaimer in
|
||||
* the documentation and/or other materials provided with the
|
||||
* distribution.
|
||||
*
|
||||
* 3. The end-user documentation included with the redistribution,
|
||||
* if any, must include the following acknowledgment:
|
||||
* "This product includes software developed by the
|
||||
* Apache Software Foundation (http://www.apache.org/)."
|
||||
* Alternately, this acknowledgment may appear in the software itself,
|
||||
* if and wherever such third-party acknowledgments normally appear.
|
||||
*
|
||||
* 4. The names "Apache" and "Apache Software Foundation" and
|
||||
* "Apache Lucene" must not be used to endorse or promote products
|
||||
* derived from this software without prior written permission. For
|
||||
* written permission, please contact apache@apache.org.
|
||||
*
|
||||
* 5. Products derived from this software may not be called "Apache",
|
||||
* "Apache Lucene", nor may "Apache" appear in their name, without
|
||||
* prior written permission of the Apache Software Foundation.
|
||||
*
|
||||
* THIS SOFTWARE IS PROVIDED ``AS IS'' AND ANY EXPRESSED OR IMPLIED
|
||||
* WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED WARRANTIES
|
||||
* OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE
|
||||
* DISCLAIMED. IN NO EVENT SHALL THE APACHE SOFTWARE FOUNDATION OR
|
||||
* ITS CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL,
|
||||
* SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT
|
||||
* LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF
|
||||
* USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND
|
||||
* ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY,
|
||||
* OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT
|
||||
* OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF
|
||||
* SUCH DAMAGE.
|
||||
* ====================================================================
|
||||
*
|
||||
* This software consists of voluntary contributions made by many
|
||||
* individuals on behalf of the Apache Software Foundation. For more
|
||||
* information on the Apache Software Foundation, please see
|
||||
* <http://www.apache.org/>.
|
||||
*/
|
||||
|
||||
package de.lanlab.larm.fetcher;
|
||||
|
||||
|
||||
import de.lanlab.larm.threads.*;
|
||||
import java.util.*;
|
||||
import java.text.*;
|
||||
import java.io.*;
|
||||
import de.lanlab.larm.util.State;
|
||||
import de.lanlab.larm.util.SimpleLoggerManager;
|
||||
import de.lanlab.larm.net.*;
|
||||
|
||||
/**
|
||||
* this monitor takes a sample of every thread every x milliseconds,
|
||||
* and logs a lot of information. It has recently evolved into a
|
||||
* multi-purpose monitoring and maintenance facility.
|
||||
* At the moment it prints status information
|
||||
* to log files and to the console
|
||||
* @TODO this can be done better. Probably with an agent where different services
|
||||
* can be registered to be called every X seconds
|
||||
* @version $Id$
|
||||
*/
|
||||
public class ThreadMonitor extends Observable implements Runnable
|
||||
{
|
||||
/**
|
||||
* a reference to the thread pool that is observed
|
||||
*/
|
||||
private ThreadPool threadPool;
|
||||
|
||||
|
||||
class Sample
|
||||
{
|
||||
long bytesRead;
|
||||
long docsRead;
|
||||
long time;
|
||||
public Sample(long bytesRead, long docsRead, long time)
|
||||
{
|
||||
this.bytesRead = bytesRead;
|
||||
this.docsRead = docsRead;
|
||||
this.time = time;
|
||||
}
|
||||
}
|
||||
|
||||
ArrayList bytesReadPerPeriod;
|
||||
|
||||
/**
|
||||
* time between samples, in ms
|
||||
*/
|
||||
int sampleDelta;
|
||||
|
||||
/**
|
||||
* the thread where this monitor runs in. Will run with high priority
|
||||
*/
|
||||
Thread thread;
|
||||
|
||||
|
||||
URLVisitedFilter urlVisitedFilter;
|
||||
URLScopeFilter urlScopeFilter;
|
||||
// DNSResolver dnsResolver;
|
||||
RobotExclusionFilter reFilter;
|
||||
MessageHandler messageHandler;
|
||||
URLLengthFilter urlLengthFilter;
|
||||
HostManager hostManager;
|
||||
|
||||
public final static double KBYTE = 1024;
|
||||
public final static double MBYTE = 1024 * KBYTE;
|
||||
public final static double ONEGBYTE = 1024 * MBYTE;
|
||||
|
||||
|
||||
String formatBytes(long lbytes)
|
||||
{
|
||||
double bytes = (double)lbytes;
|
||||
if(bytes >= ONEGBYTE)
|
||||
{
|
||||
return fractionFormat.format((bytes/ONEGBYTE)) + " GB";
|
||||
}
|
||||
else if(bytes >= MBYTE)
|
||||
{
|
||||
return fractionFormat.format(bytes/MBYTE) + " MB";
|
||||
}
|
||||
else if(bytes >= KBYTE)
|
||||
{
|
||||
return fractionFormat.format(bytes/KBYTE) + " KB";
|
||||
}
|
||||
else
|
||||
{
|
||||
return fractionFormat.format(bytes) + " Bytes";
|
||||
}
|
||||
|
||||
}
|
||||
|
||||
/**
|
||||
* a logfile where status information is posted
|
||||
* FIXME: put that in a separate class (duplicated code in FetcherTask)
|
||||
*/
|
||||
PrintWriter logWriter;
|
||||
private SimpleDateFormat formatter
|
||||
= new SimpleDateFormat ("hh:mm:ss:SSSS");
|
||||
private DecimalFormat fractionFormat = new DecimalFormat("0.00");
|
||||
|
||||
long startTime = System.currentTimeMillis();
|
||||
|
||||
private void log(String text)
|
||||
{
|
||||
try
|
||||
{
|
||||
logWriter.println(formatter.format(new Date()) + ";" + (System.currentTimeMillis()-startTime) + ";" + text);
|
||||
logWriter.flush();
|
||||
}
|
||||
catch(Exception e)
|
||||
{
|
||||
System.out.println("Couldn't write to logfile");
|
||||
}
|
||||
}
|
||||
|
||||
/**
|
||||
* constructs the monitor; gets a reference to all monitored filters
|
||||
* @param threadPool the pool to be observed
|
||||
* @param sampleDelta time in ms between samples
|
||||
*/
|
||||
public ThreadMonitor(URLLengthFilter urlLengthFilter,
|
||||
URLVisitedFilter urlVisitedFilter,
|
||||
URLScopeFilter urlScopeFilter,
|
||||
/*DNSResolver dnsResolver,*/
|
||||
RobotExclusionFilter reFilter,
|
||||
MessageHandler messageHandler,
|
||||
ThreadPool threadPool,
|
||||
HostManager hostManager,
|
||||
int sampleDelta)
|
||||
{
|
||||
this.urlLengthFilter = urlLengthFilter;
|
||||
this.urlVisitedFilter = urlVisitedFilter;
|
||||
this.urlScopeFilter = urlScopeFilter;
|
||||
/* this.dnsResolver = dnsResolver;*/
|
||||
this.hostManager = hostManager;
|
||||
this.reFilter = reFilter;
|
||||
this.messageHandler = messageHandler;
|
||||
|
||||
this.threadPool = threadPool;
|
||||
bytesReadPerPeriod = new ArrayList();
|
||||
this.sampleDelta = sampleDelta;
|
||||
this.thread = new Thread(this, "ThreadMonitor");
|
||||
this.thread.setPriority(7);
|
||||
|
||||
try
|
||||
{
|
||||
// FIXME: at least take SimpleLogger, if not something else
|
||||
File logDir = new File("logs");
|
||||
logDir.mkdir();
|
||||
logWriter = new PrintWriter(new BufferedWriter(new FileWriter("logs/ThreadMonitor.log")));
|
||||
}
|
||||
catch(IOException e)
|
||||
{
|
||||
System.out.println("Couldn't create logfile (ThreadMonitor)");
|
||||
}
|
||||
|
||||
}
|
||||
|
||||
/**
|
||||
* java.lang.Threads run method. To be invoked via start()
|
||||
* the monitor's main thread takes the samples every sampleDelta ms
|
||||
* Since Java is not real time, it remembers
|
||||
*/
|
||||
public void run()
|
||||
{
|
||||
int nothingReadCount = 0;
|
||||
long lastPeriodBytesRead = -1;
|
||||
long monitorRunCount = 0;
|
||||
long startTime = System.currentTimeMillis();
|
||||
log("time;overallBytesRead;overallTasksRun;urlsQueued;urlsWaiting;isWorkingOnMessage;urlsScopeFiltered;urlsVisitedFiltered;urlsREFiltered;memUsed;memFree;totalMem;nrHosts;visitedSize;visitedStringSize;urlLengthFiltered");
|
||||
while(true)
|
||||
{
|
||||
try
|
||||
{
|
||||
try
|
||||
{
|
||||
Thread.sleep(sampleDelta);
|
||||
}
|
||||
catch(InterruptedException e)
|
||||
{
|
||||
return;
|
||||
}
|
||||
|
||||
Iterator threadIterator = threadPool.getThreadIterator();
|
||||
int i=0;
|
||||
StringBuffer bytesReadString = new StringBuffer(200);
|
||||
StringBuffer rawBytesReadString = new StringBuffer(200);
|
||||
StringBuffer tasksRunString = new StringBuffer(200);
|
||||
long overallBytesRead = 0;
|
||||
long overallTasksRun = 0;
|
||||
long now = System.currentTimeMillis();
|
||||
boolean finished = false;
|
||||
//System.out.print("\f");
|
||||
/*while(!finished)
|
||||
{
|
||||
boolean restart = false;*/
|
||||
boolean allThreadsIdle = true;
|
||||
StringBuffer sb = new StringBuffer(500);
|
||||
|
||||
while(threadIterator.hasNext())
|
||||
{
|
||||
FetcherThread thread = (FetcherThread)threadIterator.next();
|
||||
long totalBytesRead = thread.getTotalBytesRead();
|
||||
overallBytesRead += totalBytesRead;
|
||||
bytesReadString.append(formatBytes(totalBytesRead)).append( "; ");
|
||||
rawBytesReadString.append(totalBytesRead).append("; ");
|
||||
long tasksRun = thread.getTotalTasksRun();
|
||||
overallTasksRun += tasksRun;
|
||||
tasksRunString.append(tasksRun).append("; ");
|
||||
|
||||
// check task status
|
||||
State state = thread.getTaskState();
|
||||
//StringBuffer sb = new StringBuffer(200);
|
||||
sb.setLength(0);
|
||||
|
||||
System.out.println(sb + "[" + thread.getThreadNumber() + "] " + state.getState() + " for " +
|
||||
(now - state.getStateSince() ) + " ms " +
|
||||
(state.getInfo() != null ? "(" + state.getInfo() +")" : "")
|
||||
);
|
||||
if(!(state.getState().equals(FetcherThread.STATE_IDLE)))
|
||||
{
|
||||
//if(allThreadsIdle) System.out.println("(not all threads are idle, '"+state.getState()+"' != '"+FetcherThread.STATE_IDLE+"')");
|
||||
allThreadsIdle = false;
|
||||
}
|
||||
if (((state.getState().equals(FetcherTask.FT_CONNECTING)) || (state.getState().equals(FetcherTask.FT_GETTING)) || (state.getState().equals(FetcherTask.FT_READING)) || (state.getState().equals(FetcherTask.FT_CLOSING)))
|
||||
&& ((now - state.getStateSince()) > 160000))
|
||||
{
|
||||
System.out.println("****Restarting Thread " + thread.getThreadNumber());
|
||||
threadPool.restartThread(thread.getThreadNumber());
|
||||
break; // Iterator is invalid
|
||||
}
|
||||
|
||||
}
|
||||
/*if(restart)
|
||||
{
|
||||
continue;
|
||||
}
|
||||
finished = true;
|
||||
}*/
|
||||
/*
|
||||
if(overallBytesRead == lastPeriodBytesRead)
|
||||
{
|
||||
*
|
||||
disabled kickout feature - cm
|
||||
|
||||
nothingReadCount ++;
|
||||
System.out.println("Anomaly: nothing read during the last period(s). " + (20-nothingReadCount+1) + " periods to exit");
|
||||
if(nothingReadCount > 20) // nothing happens anymore
|
||||
{
|
||||
log("Ending");
|
||||
System.out.println("End at " + new Date().toString());
|
||||
// print some information
|
||||
System.exit(0);
|
||||
}
|
||||
|
||||
|
||||
}
|
||||
else
|
||||
{
|
||||
nothingReadCount = 0;
|
||||
}*/
|
||||
|
||||
lastPeriodBytesRead = overallBytesRead;
|
||||
|
||||
//State reState = new State("hhh"); //reFilter.getState();
|
||||
sb.setLength(0);
|
||||
//System.out.println(sb + "Robot-Excl.Filter State: " + reState.getState() + " since " + (now-reState.getStateSince()) + " ms " + (reState.getInfo() != null ? " at " + reState.getInfo() : ""));
|
||||
|
||||
addSample(new Sample(overallBytesRead, overallTasksRun, System.currentTimeMillis()));
|
||||
int nrHosts = ((FetcherTaskQueue)threadPool.getTaskQueue()).getNumHosts();
|
||||
int visitedSize = urlVisitedFilter.size();
|
||||
int visitedStringSize = urlVisitedFilter.getStringSize();
|
||||
|
||||
double bytesPerSecond = getAverageBytesRead();
|
||||
double docsPerSecond = getAverageDocsRead();
|
||||
sb.setLength(0);
|
||||
System.out.print(sb + "\nBytes total: ");
|
||||
System.out.print(formatBytes(overallBytesRead) + " (" + formatBytes((long)(((double)overallBytesRead)*1000/(System.currentTimeMillis()-startTime))) + " per second since start)");
|
||||
System.out.print("\nBytes per Second: " + formatBytes((int)bytesPerSecond) + " (50 secs)");
|
||||
System.out.print( "\nDocs per Second: " + docsPerSecond);
|
||||
String bs = bytesReadString.toString();
|
||||
System.out.print( "\nBytes per Thread: " + bs + "\n");
|
||||
double docsPerSecondTotal = ((double)overallTasksRun)*1000/(System.currentTimeMillis()-startTime);
|
||||
sb.setLength(0);
|
||||
System.out.println(sb + "Docs read total: " + overallTasksRun + " Docs/s: " + fractionFormat.format(docsPerSecondTotal) +
|
||||
"\nDocs p.thread: " + tasksRunString);
|
||||
|
||||
long memUsed = Runtime.getRuntime().totalMemory()-Runtime.getRuntime().freeMemory();
|
||||
long memFree = Runtime.getRuntime().freeMemory();
|
||||
long totalMem = Runtime.getRuntime().totalMemory();
|
||||
sb.setLength(0);
|
||||
System.out.println(sb + "Mem used: " + formatBytes(memUsed) + ", free: " + formatBytes(memFree) + " total VM: " + totalMem);
|
||||
int urlsQueued = messageHandler.getQueued();
|
||||
int urlsWaiting = threadPool.getQueueSize();
|
||||
boolean isWorkingOnMessage = messageHandler.isWorkingOnMessage();
|
||||
int urlsScopeFiltered = urlScopeFilter.getFiltered();
|
||||
int urlsVisitedFiltered = urlVisitedFilter.getFiltered();
|
||||
int urlsREFiltered = reFilter.getFiltered();
|
||||
int urlLengthFiltered = urlLengthFilter.getFiltered();
|
||||
sb.setLength(0);
|
||||
System.out.println(sb + "URLs queued: " + urlsQueued + " waiting: " + urlsWaiting);
|
||||
sb.setLength(0);
|
||||
System.out.println(sb + "Message is being processed: " + isWorkingOnMessage);
|
||||
sb.setLength(0);
|
||||
System.out.println(sb + "URLs Filtered: length: " + urlLengthFiltered + " scope: " + urlsScopeFiltered + " visited: " + urlsVisitedFiltered + " robot.txt: " + urlsREFiltered);
|
||||
sb.setLength(0);
|
||||
System.out.println(sb + "Visited size: " + visitedSize + "; String Size in VisitedFilter: " + visitedStringSize + "; Number of Hosts: " + nrHosts + "; hosts in Host Manager: " + hostManager.getSize() + "\n");
|
||||
sb.setLength(0);
|
||||
log(sb + "" + now + ";" + overallBytesRead + ";" + overallTasksRun + ";" + urlsQueued + ";" + urlsWaiting + ";" + isWorkingOnMessage + ";" + urlsScopeFiltered + ";" + urlsVisitedFiltered + ";" + urlsREFiltered + ";" + memUsed + ";" + memFree + ";" + totalMem + ";" + nrHosts + ";" + visitedSize + ";" + visitedStringSize + ";" + rawBytesReadString + ";" + urlLengthFiltered);
|
||||
|
||||
|
||||
if(!isWorkingOnMessage && (urlsQueued == 0) && (urlsWaiting == 0) && allThreadsIdle)
|
||||
{
|
||||
nothingReadCount++;
|
||||
if(nothingReadCount > 20)
|
||||
{
|
||||
SimpleLoggerManager.getInstance().flush();
|
||||
System.exit(0);
|
||||
}
|
||||
|
||||
}
|
||||
else
|
||||
{
|
||||
nothingReadCount = 0;
|
||||
}
|
||||
|
||||
this.setChanged();
|
||||
this.notifyObservers();
|
||||
|
||||
// Request Garbage Collection
|
||||
monitorRunCount++;
|
||||
|
||||
if(monitorRunCount % 6 == 0)
|
||||
{
|
||||
System.runFinalization();
|
||||
}
|
||||
|
||||
if(monitorRunCount % 2 == 0)
|
||||
{
|
||||
System.gc();
|
||||
SimpleLoggerManager.getInstance().flush();
|
||||
}
|
||||
|
||||
}
|
||||
catch(NoSuchMethodError e)
|
||||
{
|
||||
e.printStackTrace();
|
||||
//System.out.println("cause: " + e.getCause());
|
||||
System.out.println("msg: " + e.getMessage());
|
||||
|
||||
}
|
||||
catch(Exception e)
|
||||
{
|
||||
System.out.println("Monitor: Exception: " + e.getClass().getName());
|
||||
e.printStackTrace();
|
||||
}
|
||||
}
|
||||
}
|
||||
|
||||
/**
|
||||
* start the thread
|
||||
*/
|
||||
public void start()
|
||||
{
|
||||
this.clear();
|
||||
thread.start();
|
||||
}
|
||||
|
||||
/**
|
||||
* interrupt the monitor thread
|
||||
*/
|
||||
public void interrupt()
|
||||
{
|
||||
thread.interrupt();
|
||||
}
|
||||
|
||||
|
||||
public synchronized void clear()
|
||||
{
|
||||
//sampleTimeStamps.clear();
|
||||
/*for(int i=0; i < timeSamples.length; i++)
|
||||
{
|
||||
timeSamples[i].clear();
|
||||
}
|
||||
*/
|
||||
}
|
||||
|
||||
/* public synchronized double getAverageReadCount(int maxPeriods)
|
||||
{
|
||||
int lastPeriod = bytesReadPerPeriod.size()-1;
|
||||
int periods = Math.min(lastPeriod, maxPeriods);
|
||||
if(periods < 2)
|
||||
{
|
||||
return 0.0;
|
||||
}
|
||||
|
||||
|
||||
long bytesLastPeriod = ((Sample)bytesReadPerPeriod.get(lastPeriod)).bytesRead;
|
||||
long bytesBeforePeriod = ((Sample)bytesReadPerPeriod.get(lastPeriod - periods)).bytesRead;
|
||||
long bytesRead = bytesLastPeriod - bytesBeforePeriod;
|
||||
|
||||
long endTime = ((Long)sampleTimeStamps.get(sampleTimeStamps.size()-1)).longValue();
|
||||
long startTime = ((Long)sampleTimeStamps.get(sampleTimeStamps.size()-1 - periods)).longValue();
|
||||
long duration = endTime - startTime;
|
||||
System.out.println("bytes read: " + bytesRead + " duration in s: " + duration/1000.0 + " = " + ((double)bytesRead) / (duration/1000.0) + " per second");
|
||||
|
||||
return ((double)bytesRead) / (duration/1000.0);
|
||||
}
|
||||
*/
|
||||
|
||||
/*public synchronized double getDocsPerSecond(int maxPeriods)
|
||||
{
|
||||
int lastPeriod = bytesReadPerPeriod.size()-1;
|
||||
int periods = Math.min(lastPeriod, maxPeriods);
|
||||
if(periods < 2)
|
||||
{
|
||||
return 0.0;
|
||||
}
|
||||
|
||||
|
||||
long docsLastPeriod = ((Sample)bytesReadPerPeriod.get(lastPeriod)).docsRead;
|
||||
long docsBeforePeriod = ((Sample)bytesReadPerPeriod.get(lastPeriod - periods)).docsRead;
|
||||
long docsRead = docsLastPeriod - docsBeforePeriod;
|
||||
|
||||
long endTime = ((Long)sampleTimeStamps.get(sampleTimeStamps.size()-1)).longValue();
|
||||
long startTime = ((Long)sampleTimeStamps.get(sampleTimeStamps.size() - periods)).longValue();
|
||||
long duration = endTime - startTime;
|
||||
System.out.println("docs read: " + docsRead + " duration in s: " + duration/1000.0 + " = " + ((double)docsRead) / (duration/1000.0) + " per second");
|
||||
|
||||
return ((double)docsRead) / (duration/1000.0);
|
||||
}*/
|
||||
|
||||
/**
|
||||
* retrieves the number of threads whose byteCount is below the threshold
|
||||
* @param maxPeriods the number of periods to look back
|
||||
* @param threshold the number of bytes per second that acts as the threshold for a stalled thread
|
||||
*/
|
||||
/*public synchronized int getStalledThreadCount(int maxPeriods, double threshold)
|
||||
{
|
||||
int periods = Math.min(sampleTimeStamps.size(), maxPeriods);
|
||||
int stalledThreads = 0;
|
||||
int j=0, i=0;
|
||||
if(periods > 1)
|
||||
{
|
||||
for(j=0; j<timeSamples.length; j++)
|
||||
{
|
||||
long threadByteCount = 0;
|
||||
ArrayList actArrayList = timeSamples[j];
|
||||
double bytesPerSecond = 0;
|
||||
try
|
||||
{
|
||||
for(i=0; i<periods; i++)
|
||||
{
|
||||
|
||||
Sample actSample = (Sample)(actArrayList.get(i));
|
||||
threadByteCount += actSample.bytesRead;
|
||||
}
|
||||
}
|
||||
catch(Exception e)
|
||||
{
|
||||
System.out.println("getAverageReadCount: " + e.getClass().getName() + ": " + e.getMessage() + "(" + i + ";" + j + ")");
|
||||
e.printStackTrace();
|
||||
}
|
||||
|
||||
bytesPerSecond = ((double)threadByteCount) /
|
||||
((double)((Long)sampleTimeStamps.get(sampleTimeStamps.size()-1)).longValue()
|
||||
- ((Long)sampleTimeStamps.get(sampleTimeStamps.size()-periods)).longValue()) * 1000.0;
|
||||
if(bytesPerSecond < threshold)
|
||||
{
|
||||
stalledThreads++;
|
||||
}
|
||||
}
|
||||
}
|
||||
|
||||
return stalledThreads;
|
||||
}
|
||||
*/
|
||||
|
||||
int samples=0;
|
||||
|
||||
public void addSample(Sample s)
|
||||
{
|
||||
if(samples < 10)
|
||||
{
|
||||
bytesReadPerPeriod.add(s);
|
||||
samples++;
|
||||
}
|
||||
else
|
||||
{
|
||||
bytesReadPerPeriod.set(samples++ % 10, s); // advance the index so the oldest slot is overwritten, not always slot 0
|
||||
}
|
||||
}
|
||||
|
||||
public double getAverageBytesRead()
|
||||
{
|
||||
Iterator i = bytesReadPerPeriod.iterator();
|
||||
Sample oldest = null;
|
||||
Sample newest = null;
|
||||
while(i.hasNext())
|
||||
{
|
||||
|
||||
Sample s = (Sample)i.next();
|
||||
if(oldest == null)
|
||||
{
|
||||
oldest = newest = s;
|
||||
}
|
||||
else
|
||||
{
|
||||
if(s.time < oldest.time)
|
||||
{
|
||||
oldest = s;
|
||||
}
|
||||
else if(s.time > newest.time)
|
||||
{
|
||||
newest = s;
|
||||
}
|
||||
}
|
||||
}
|
||||
return ((newest.bytesRead - oldest.bytesRead)/((newest.time - oldest.time)/1000.0));
|
||||
}
|
||||
public double getAverageDocsRead()
|
||||
{
|
||||
Iterator i = bytesReadPerPeriod.iterator();
|
||||
Sample oldest = null;
|
||||
Sample newest = null;
|
||||
while(i.hasNext())
|
||||
{
|
||||
|
||||
Sample s = (Sample)i.next();
|
||||
if(oldest == null)
|
||||
{
|
||||
oldest = newest = s;
|
||||
}
|
||||
else
|
||||
{
|
||||
if(s.time < oldest.time)
|
||||
{
|
||||
oldest = s;
|
||||
}
|
||||
else if(s.time > newest.time)
|
||||
{
|
||||
newest = s;
|
||||
}
|
||||
}
|
||||
}
|
||||
return ((newest.docsRead - oldest.docsRead)/((newest.time - oldest.time)/1000.0));
|
||||
}
|
||||
}
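The `addSample` / `getAverageBytesRead` pair above keeps a window of the ten most recent samples and derives a rate from the oldest and newest entries. A minimal sketch of that idea follows. The `SampleWindow` class and its members are hypothetical stand-ins rather than LARM's own types, and the guard for fewer than two distinct timestamps is an added assumption.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical stand-in for a fixed-size sample window; not LARM's own class. */
class SampleWindow {
    /** One measurement: timestamp in milliseconds and cumulative bytes read so far. */
    static final class Sample {
        final long time;
        final long bytesRead;

        Sample(long time, long bytesRead) {
            this.time = time;
            this.bytesRead = bytesRead;
        }
    }

    private final int capacity;
    private final List<Sample> window = new ArrayList<>();
    private long added = 0; // total samples ever added; selects the slot to overwrite

    SampleWindow(int capacity) {
        this.capacity = capacity;
    }

    void add(Sample s) {
        if (window.size() < capacity) {
            window.add(s);
        } else {
            // the window is full: overwrite the oldest slot
            window.set((int) (added % capacity), s);
        }
        added++;
    }

    /** Bytes per second between the oldest and newest sample, or 0.0 if undefined. */
    double averageBytesPerSecond() {
        Sample oldest = null;
        Sample newest = null;
        for (Sample s : window) {
            if (oldest == null || s.time < oldest.time) oldest = s;
            if (newest == null || s.time > newest.time) newest = s;
        }
        if (oldest == null || newest.time == oldest.time) {
            return 0.0; // empty window or a single timestamp: no rate to report
        }
        return (newest.bytesRead - oldest.bytesRead) / ((newest.time - oldest.time) / 1000.0);
    }
}
```

Counting every added sample, including those that overwrite a slot, is what keeps the write position moving through the window.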
|
||||
|
||||
|
|
@ -1,127 +0,0 @@
|
|||
/* ====================================================================
|
||||
* The Apache Software License, Version 1.1
|
||||
*
|
||||
* Copyright (c) 2001 The Apache Software Foundation. All rights
|
||||
* reserved.
|
||||
*
|
||||
* Redistribution and use in source and binary forms, with or without
|
||||
* modification, are permitted provided that the following conditions
|
||||
* are met:
|
||||
*
|
||||
* 1. Redistributions of source code must retain the above copyright
|
||||
* notice, this list of conditions and the following disclaimer.
|
||||
*
|
||||
* 2. Redistributions in binary form must reproduce the above copyright
|
||||
* notice, this list of conditions and the following disclaimer in
|
||||
* the documentation and/or other materials provided with the
|
||||
* distribution.
|
||||
*
|
||||
* 3. The end-user documentation included with the redistribution,
|
||||
* if any, must include the following acknowledgment:
|
||||
* "This product includes software developed by the
|
||||
* Apache Software Foundation (http://www.apache.org/)."
|
||||
* Alternately, this acknowledgment may appear in the software itself,
|
||||
* if and wherever such third-party acknowledgments normally appear.
|
||||
*
|
||||
* 4. The names "Apache" and "Apache Software Foundation" and
|
||||
* "Apache Lucene" must not be used to endorse or promote products
|
||||
* derived from this software without prior written permission. For
|
||||
* written permission, please contact apache@apache.org.
|
||||
*
|
||||
* 5. Products derived from this software may not be called "Apache",
|
||||
* "Apache Lucene", nor may "Apache" appear in their name, without
|
||||
* prior written permission of the Apache Software Foundation.
|
||||
*
|
||||
* THIS SOFTWARE IS PROVIDED ``AS IS'' AND ANY EXPRESSED OR IMPLIED
|
||||
* WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED WARRANTIES
|
||||
* OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE
|
||||
* DISCLAIMED. IN NO EVENT SHALL THE APACHE SOFTWARE FOUNDATION OR
|
||||
* ITS CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL,
|
||||
* SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT
|
||||
* LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF
|
||||
* USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND
|
||||
* ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY,
|
||||
* OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT
|
||||
* OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF
|
||||
* SUCH DAMAGE.
|
||||
* ====================================================================
|
||||
*
|
||||
* This software consists of voluntary contributions made by many
|
||||
* individuals on behalf of the Apache Software Foundation. For more
|
||||
* information on the Apache Software Foundation, please see
|
||||
* <http://www.apache.org/>.
|
||||
*/
|
||||
|
||||
package de.lanlab.larm.fetcher;
|
||||
|
||||
import de.lanlab.larm.util.*;
|
||||
|
||||
/**
|
||||
* kills URLs longer than X characters. Used to prevent endless loops where
|
||||
* the page contains the current URL + some extension
|
||||
*
|
||||
* @author Clemens Marschner
|
||||
* @created 28 January 2002
|
||||
* @version $Id$
|
||||
*/
|
||||
|
||||
public class URLLengthFilter extends Filter implements MessageListener
|
||||
{
|
||||
/**
|
||||
* called by the message handler
|
||||
*
|
||||
* @param handler the handler
|
||||
*/
|
||||
public void notifyAddedToMessageHandler(MessageHandler handler)
|
||||
{
|
||||
this.messageHandler = handler;
|
||||
}
|
||||
|
||||
|
||||
MessageHandler messageHandler;
|
||||
|
||||
int maxLength;
|
||||
|
||||
// URLLengthFilter()
|
||||
// {
|
||||
// maxLength = 0;
|
||||
// }
|
||||
SimpleLogger log;
|
||||
|
||||
/**
|
||||
* Constructor for the URLLengthFilter object
|
||||
*
|
||||
* @param maxLength max length of the URL's file part (path + query), as checked in handleRequest
* @param log logger that records each filtered message
|
||||
*/
|
||||
public URLLengthFilter(int maxLength, SimpleLogger log)
|
||||
{
|
||||
this.maxLength = maxLength;
|
||||
this.log = log;
|
||||
}
|
||||
|
||||
public void setMaxLength(int maxLength)
|
||||
{
|
||||
this.maxLength = maxLength;
|
||||
}
|
||||
|
||||
|
||||
/**
|
||||
* handles the message
|
||||
*
|
||||
* @param message the URLMessage to check
|
||||
* @return the original message or NULL if the URL was too long
|
||||
*/
|
||||
public Message handleRequest(Message message)
|
||||
{
|
||||
URLMessage m = (URLMessage) message;
|
||||
String file = m.getUrl().getFile();
|
||||
if (file != null && file.length() > maxLength) // path + query
|
||||
{
|
||||
filtered++;
|
||||
//log.log("URLLengthFilter: URL " + m.getUrl() + " exceeds maxLength " + this.maxLength);
|
||||
log.log(message.toString());
|
||||
return null;
|
||||
}
|
||||
return message;
|
||||
}
|
||||
}
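The length check in `handleRequest` above can be tried in isolation. The sketch below is a hypothetical stand-alone version (the class name `UrlLengthCheck`, the `parse` helper and the example URLs are assumptions, not part of LARM); like the original it measures only `URL.getFile()`, that is the path plus query, not the host.

```java
import java.net.MalformedURLException;
import java.net.URI;
import java.net.URL;

/** Hypothetical stand-alone version of the length check; not the LARM class. */
class UrlLengthCheck {
    private final int maxLength;
    private int filtered = 0;

    UrlLengthCheck(int maxLength) {
        this.maxLength = maxLength;
    }

    /** Returns true if the URL passes, false if its path + query is too long. */
    boolean accept(URL url) {
        String file = url.getFile(); // path plus "?query"; host and fragment are excluded
        if (file != null && file.length() > maxLength) {
            filtered++;
            return false;
        }
        return true;
    }

    int getFiltered() {
        return filtered;
    }

    /** Helper for the examples: parses an absolute URL, rethrowing as unchecked. */
    static URL parse(String spec) {
        try {
            return URI.create(spec).toURL();
        } catch (MalformedURLException e) {
            throw new IllegalArgumentException(e);
        }
    }

    public static void main(String[] args) {
        UrlLengthCheck check = new UrlLengthCheck(20);
        System.out.println(check.accept(parse("http://example.org/a/b.html")));
        // a page that keeps linking to "current URL + /a" eventually exceeds the limit
        System.out.println(check.accept(parse("http://example.org/a/a/a/a/a/a/a/a/a/a/b.html")));
        System.out.println(check.getFiltered());
    }
}
```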
|
|
@ -1,360 +0,0 @@
|
|||
/*
|
||||
* ====================================================================
|
||||
* The Apache Software License, Version 1.1
|
||||
*
|
||||
* Copyright (c) 2001 The Apache Software Foundation. All rights
|
||||
* reserved.
|
||||
*
|
||||
* Redistribution and use in source and binary forms, with or without
|
||||
* modification, are permitted provided that the following conditions
|
||||
* are met:
|
||||
*
|
||||
* 1. Redistributions of source code must retain the above copyright
|
||||
* notice, this list of conditions and the following disclaimer.
|
||||
*
|
||||
* 2. Redistributions in binary form must reproduce the above copyright
|
||||
* notice, this list of conditions and the following disclaimer in
|
||||
* the documentation and/or other materials provided with the
|
||||
* distribution.
|
||||
*
|
||||
* 3. The end-user documentation included with the redistribution,
|
||||
* if any, must include the following acknowledgment:
|
||||
* "This product includes software developed by the
|
||||
* Apache Software Foundation (http://www.apache.org/)."
|
||||
* Alternately, this acknowledgment may appear in the software itself,
|
||||
* if and wherever such third-party acknowledgments normally appear.
|
||||
*
|
||||
* 4. The names "Apache" and "Apache Software Foundation" and
|
||||
* "Apache Lucene" must not be used to endorse or promote products
|
||||
* derived from this software without prior written permission. For
|
||||
* written permission, please contact apache@apache.org.
|
||||
*
|
||||
* 5. Products derived from this software may not be called "Apache",
|
||||
* "Apache Lucene", nor may "Apache" appear in their name, without
|
||||
* prior written permission of the Apache Software Foundation.
|
||||
*
|
||||
* THIS SOFTWARE IS PROVIDED ``AS IS'' AND ANY EXPRESSED OR IMPLIED
|
||||
* WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED WARRANTIES
|
||||
* OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE
|
||||
* DISCLAIMED. IN NO EVENT SHALL THE APACHE SOFTWARE FOUNDATION OR
|
||||
* ITS CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL,
|
||||
* SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT
|
||||
* LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF
|
||||
* USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND
|
||||
* ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY,
|
||||
* OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT
|
||||
* OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF
|
||||
* SUCH DAMAGE.
|
||||
* ====================================================================
|
||||
*
|
||||
* This software consists of voluntary contributions made by many
|
||||
* individuals on behalf of the Apache Software Foundation. For more
|
||||
* information on the Apache Software Foundation, please see
|
||||
* <http://www.apache.org/>.
|
||||
*/
|
||||
package de.lanlab.larm.fetcher;
|
||||
|
||||
import java.net.*;
|
||||
import java.io.*;
|
||||
import de.lanlab.larm.util.URLUtils;
|
||||
import de.lanlab.larm.net.URLNormalizer;
|
||||
import de.lanlab.larm.net.HostManager;
|
||||
import de.lanlab.larm.net.*;
|
||||
|
||||
/**
|
||||
* represents a URL which is passed around in the messageHandler
|
||||
*
|
||||
* @author Administrator
|
||||
* @created 14 June 2002
|
||||
* @version $Id$
|
||||
*/
|
||||
public class URLMessage implements Message, Serializable
|
||||
{
|
||||
/**
|
||||
* the URL
|
||||
*/
|
||||
protected URL url;
|
||||
|
||||
/**
|
||||
* docID or 0 (used with repository)
|
||||
*/
|
||||
long docId;
|
||||
|
||||
/**
|
||||
* externalized URL, cached at construction time
|
||||
*/
|
||||
protected volatile String urlString;
|
||||
|
||||
/**
|
||||
* referer or null
|
||||
*/
|
||||
protected URL referer;
|
||||
|
||||
/**
|
||||
* externalized referer URL, to prevent multiple calls to
|
||||
* url.toExternalForm()
|
||||
*/
|
||||
protected volatile String refererString;
|
||||
|
||||
/**
|
||||
* normalized referer URL, as defined by {@link de.lanlab.larm.net.URLNormalizer}
|
||||
*/
|
||||
protected volatile String refererNormalizedString;
|
||||
|
||||
/**
|
||||
* normalized URL, as defined by {@link de.lanlab.larm.net.URLNormalizer}
|
||||
* (lower case, index.* removed, all characters except alphanumeric ones
|
||||
* escaped)
|
||||
*/
|
||||
protected String normalizedURLString;
|
||||
|
||||
/**
|
||||
* ANCHOR: an ordinary link like <a href="..."> (or AREA or IMG)<br>
|
||||
* FRAME: a <FRAME src="..."> tag<br>
|
||||
* REDIRECT: the link between two pages after a 301/302/307 result code
|
||||
*/
|
||||
byte linkType;
|
||||
|
||||
public final static byte LINKTYPE_ANCHOR=0;
|
||||
public final static byte LINKTYPE_FRAME=1;
|
||||
public final static byte LINKTYPE_REDIRECT=2;
|
||||
protected final static String LINKTYPE_STRING[] = { "A/IMG/AREA", "FRAME", "Redirect" };
|
||||
|
||||
|
||||
public int getLinkType()
|
||||
{
|
||||
return linkType;
|
||||
}
|
||||
|
||||
public String getLinkTypeString()
|
||||
{
|
||||
return LINKTYPE_STRING[linkType];
|
||||
}
|
||||
/**
|
||||
* anchor text, as in <a href="...">Anchor</a>
|
||||
*/
|
||||
protected String anchor;
|
||||
|
||||
|
||||
public void setDocId(long docId)
|
||||
{
|
||||
this.docId = docId;
|
||||
}
|
||||
|
||||
public long getDocId()
|
||||
{
|
||||
return docId;
|
||||
}
|
||||
|
||||
/**
|
||||
* Constructor for the URLMessage object
|
||||
*
|
||||
* @param url the URL this message refers to (may be null)
* @param referer the URL of the referring page, or null
* @param linkType one of the LINKTYPE_* constants
* @param anchor the anchor text; null is stored as ""
* @param hostResolver passed to URLNormalizer.normalize() for url and referer
|
||||
*/
|
||||
public URLMessage(URL url, URL referer, byte linkType, String anchor, HostResolver hostResolver)
|
||||
{
|
||||
//super();
|
||||
this.url = url;
|
||||
this.urlString = url != null ? URLUtils.toExternalFormNoRef(url) : null;
|
||||
|
||||
this.referer = referer;
|
||||
this.refererString = referer != null ? URLUtils.toExternalFormNoRef(referer) : null;
|
||||
this.refererNormalizedString = referer != null ? URLUtils.toExternalFormNoRef(URLNormalizer.normalize(referer, hostResolver)) : null;
|
||||
this.linkType = linkType;
|
||||
this.anchor = anchor != null ? anchor : "";
|
||||
this.normalizedURLString = url != null ? URLUtils.toExternalFormNoRef(URLNormalizer.normalize(url, hostResolver)) : null;
|
||||
//this.normalizedURLString = URLNormalizer.
|
||||
//System.out.println("" + refererString + " -> " + urlString);
|
||||
this.docId = 0;
|
||||
}
|
||||
|
||||
public URLMessage(URL url, String normalizedURL, URL referer, String normalizedReferer, byte linkType, String anchor)
|
||||
{
|
||||
//super();
|
||||
this.url = url;
|
||||
this.urlString = url != null ? URLUtils.toExternalFormNoRef(url) : null;
|
||||
|
||||
this.referer = referer;
|
||||
this.refererString = referer != null ? URLUtils.toExternalFormNoRef(referer) : null;
|
||||
this.refererNormalizedString = normalizedReferer;
|
||||
this.linkType = linkType;
|
||||
this.anchor = anchor != null ? anchor : "";
|
||||
this.normalizedURLString = normalizedURL;
|
||||
//this.normalizedURLString = URLNormalizer.
|
||||
//System.out.println("" + refererString + " -> " + urlString);
|
||||
this.docId = 0;
|
||||
}
|
||||
|
||||
public URLMessage(URLMessage other)
|
||||
{
|
||||
this.url = other.url;
|
||||
this.urlString = other.urlString;
|
||||
this.referer = other.referer;
|
||||
this.refererString = other.refererString;
|
||||
this.refererNormalizedString = other.refererNormalizedString;
|
||||
this.linkType = other.linkType;
|
||||
this.anchor = other.anchor;
|
||||
this.normalizedURLString = other.normalizedURLString;
|
||||
this.docId = other.docId;
|
||||
}
|
||||
|
||||
/**
|
||||
* Gets the normalizedURLString attribute of the URLMessage object
|
||||
*
|
||||
* @return The normalizedURLString value
|
||||
|
||||
*/
|
||||
public String getNormalizedURLString()
|
||||
{
|
||||
return this.normalizedURLString;
|
||||
}
|
||||
|
||||
|
||||
/**
|
||||
* Gets the url attribute of the URLMessage object
|
||||
*
|
||||
* @return The url value
|
||||
*/
|
||||
public URL getUrl()
|
||||
{
|
||||
return this.url;
|
||||
}
|
||||
|
||||
|
||||
/**
|
||||
* Gets the referer attribute of the URLMessage object
|
||||
*
|
||||
* @return The referer value
|
||||
*/
|
||||
public URL getReferer()
|
||||
{
|
||||
return this.referer;
|
||||
}
|
||||
|
||||
|
||||
/**
|
||||
* Returns the externalized URL string
*
* @return the URL as a string
|
||||
*/
|
||||
public String toString()
|
||||
{
|
||||
return urlString;
|
||||
}
|
||||
|
||||
|
||||
/**
|
||||
* Gets the urlString attribute of the URLMessage object
*
* @return The urlString value
|
||||
*/
|
||||
public String getURLString()
|
||||
{
|
||||
return urlString;
|
||||
}
|
||||
|
||||
|
||||
/**
|
||||
* Gets the refererString attribute of the URLMessage object
|
||||
*
|
||||
* @return The refererString value
|
||||
*/
|
||||
public String getRefererString()
|
||||
{
|
||||
return refererString;
|
||||
}
|
||||
|
||||
|
||||
/**
|
||||
* Gets the normalizedRefererString attribute of the URLMessage object
|
||||
*
|
||||
* @return The normalizedRefererString value
|
||||
*/
|
||||
public String getNormalizedRefererString()
|
||||
{
|
||||
return this.refererNormalizedString;
|
||||
}
|
||||
|
||||
|
||||
/**
|
||||
* Gets the anchor attribute of the URLMessage object
|
||||
*
|
||||
* @return The anchor value
|
||||
*/
|
||||
public String getAnchor()
|
||||
{
|
||||
return anchor;
|
||||
}
|
||||
|
||||
|
||||
/**
|
||||
* Returns the hash code of the wrapped URL
*
* @return url.hashCode()
|
||||
*/
|
||||
public int hashCode()
|
||||
{
|
||||
return url.hashCode();
|
||||
}
|
||||
|
||||
|
||||
/**
|
||||
* Custom serialization: writes url, referer, linkType, anchor, both normalized strings and docId
*
* @param out the stream to write to
* @exception IOException if writing to the stream fails
|
||||
*/
|
||||
private void writeObject(java.io.ObjectOutputStream out)
|
||||
throws IOException
|
||||
{
|
||||
out.writeObject(url);
|
||||
out.writeObject(referer);
|
||||
out.writeByte(linkType);
|
||||
out.writeUTF(anchor != null ? anchor : "");
|
||||
out.writeUTF(refererNormalizedString != null ? refererNormalizedString : "");
|
||||
out.writeUTF(normalizedURLString != null ? normalizedURLString : "");
|
||||
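// NOTE: write(int) emits only the low-order byte, so these two calls store 2 bytes of docId, not all 8
out.write((int)((docId >> 32) & 0xffffffff) );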
|
||||
out.write((int)(docId & 0xffffffff));
|
||||
|
||||
}
|
||||
|
||||
|
||||
/**
|
||||
* Custom deserialization: reads the fields in the order writeObject wrote them
*
* @param in the stream to read from
* @exception IOException if reading from the stream fails
* @exception ClassNotFoundException if the class of a serialized URL cannot be found
|
||||
*/
|
||||
private void readObject(java.io.ObjectInputStream in)
|
||||
throws IOException, ClassNotFoundException
|
||||
{
|
||||
url = (URL) in.readObject();
|
||||
referer = (URL) in.readObject();
|
||||
urlString = url.toExternalForm();
|
||||
refererString = referer != null ? referer.toExternalForm() : "";
|
||||
linkType = in.readByte();
|
||||
anchor = in.readUTF();
|
||||
refererNormalizedString = in.readUTF();
|
||||
normalizedURLString = in.readUTF();
|
||||
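// NOTE: in.read() returns one byte as an int, and an int shift by 32 is a no-op in Java, so the upper half of docId is not restored
docId = in.read() << 32;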
|
||||
docId |= in.read();
|
||||
}
|
||||
|
||||
|
||||
/**
|
||||
* Gets the info attribute of the URLMessage object
|
||||
*
|
||||
* @return The info value
|
||||
*/
|
||||
public String getInfo()
|
||||
{
|
||||
return (referer != null ? refererString : "<start>") + "\t" + urlString + "\t" + this.getNormalizedURLString() + "\t" + linkType + "\t" + anchor;
|
||||
}
|
||||
|
||||
}
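The `writeObject` / `readObject` pair above stores `docId` with two `write(int)` calls and restores it with two `read()` calls. The sketch below compares that with `writeLong` / `readLong`. It is an illustration under assumptions: `DocIdRoundTrip` is a hypothetical class, and it uses `DataOutputStream` / `DataInputStream` on in-memory buffers for brevity (`ObjectOutputStream` and `ObjectInputStream` offer the same `write(int)`, `writeLong`, `read()` and `readLong` methods).

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;

/** Hypothetical demo class; compares write(int) with writeLong for a 64-bit id. */
class DocIdRoundTrip {
    /** Writes the id with writeLong (8 bytes) and reads it back with readLong. */
    static long viaLong(long docId) {
        try {
            ByteArrayOutputStream bytes = new ByteArrayOutputStream();
            try (DataOutputStream out = new DataOutputStream(bytes)) {
                out.writeLong(docId);
            }
            try (DataInputStream in = new DataInputStream(new ByteArrayInputStream(bytes.toByteArray()))) {
                return in.readLong();
            }
        } catch (IOException e) {
            throw new IllegalStateException(e); // not expected for in-memory streams
        }
    }

    /** Mirrors the two write(int) / read() calls: one byte per call survives. */
    static long viaTwoBytes(long docId) {
        try {
            ByteArrayOutputStream bytes = new ByteArrayOutputStream();
            try (DataOutputStream out = new DataOutputStream(bytes)) {
                out.write((int) ((docId >> 32) & 0xffffffff)); // low byte of the high word
                out.write((int) (docId & 0xffffffff));         // low byte of the low word
            }
            try (DataInputStream in = new DataInputStream(new ByteArrayInputStream(bytes.toByteArray()))) {
                long result = in.read() << 32; // int shift distance is taken mod 32, so this shifts by 0
                result |= in.read();
                return result;
            }
        } catch (IOException e) {
            throw new IllegalStateException(e);
        }
    }
}
```

With `writeLong` the value round-trips unchanged; with the two single-byte calls anything above the low byte of each 32-bit half is lost.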
|
|
@ -1,133 +0,0 @@
|
|||
/* ====================================================================
|
||||
* The Apache Software License, Version 1.1
|
||||
*
|
||||
* Copyright (c) 2001 The Apache Software Foundation. All rights
|
||||
* reserved.
|
||||
*
|
||||
* Redistribution and use in source and binary forms, with or without
|
||||
* modification, are permitted provided that the following conditions
|
||||
* are met:
|
||||
*
|
||||
* 1. Redistributions of source code must retain the above copyright
|
||||
* notice, this list of conditions and the following disclaimer.
|
||||
*
|
||||
* 2. Redistributions in binary form must reproduce the above copyright
|
||||
* notice, this list of conditions and the following disclaimer in
|
||||
* the documentation and/or other materials provided with the
|
||||
* distribution.
|
||||
*
|
||||
* 3. The end-user documentation included with the redistribution,
|
||||
* if any, must include the following acknowledgment:
|
||||
* "This product includes software developed by the
|
||||
* Apache Software Foundation (http://www.apache.org/)."
|
||||
* Alternately, this acknowledgment may appear in the software itself,
|
||||
* if and wherever such third-party acknowledgments normally appear.
|
||||
*
|
||||
* 4. The names "Apache" and "Apache Software Foundation" and
|
||||
* "Apache Lucene" must not be used to endorse or promote products
|
||||
* derived from this software without prior written permission. For
|
||||
* written permission, please contact apache@apache.org.
|
||||
*
|
||||
* 5. Products derived from this software may not be called "Apache",
|
||||
* "Apache Lucene", nor may "Apache" appear in their name, without
|
||||
* prior written permission of the Apache Software Foundation.
|
||||
*
|
||||
* THIS SOFTWARE IS PROVIDED ``AS IS'' AND ANY EXPRESSED OR IMPLIED
|
||||
* WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED WARRANTIES
|
||||
* OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE
|
||||
* DISCLAIMED. IN NO EVENT SHALL THE APACHE SOFTWARE FOUNDATION OR
|
||||
* ITS CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL,
|
||||
* SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT
|
||||
* LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF
|
||||
* USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND
|
||||
* ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY,
|
||||
* OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT
|
||||
* OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF
|
||||
* SUCH DAMAGE.
|
||||
* ====================================================================
|
||||
*
|
||||
* This software consists of voluntary contributions made by many
|
||||
* individuals on behalf of the Apache Software Foundation. For more
|
||||
* information on the Apache Software Foundation, please see
|
||||
* <http://www.apache.org/>.
|
||||
*/
|
||||
|
||||
package de.lanlab.larm.fetcher;
|
||||
|
||||
import org.apache.oro.text.regex.Perl5Matcher;
|
||||
import org.apache.oro.text.regex.Perl5Compiler;
|
||||
import org.apache.oro.text.regex.Pattern;
|
||||
import de.lanlab.larm.util.*;
|
||||
|
||||
/**
|
||||
* filter class. Tries to match a regular expression with an incoming URL
|
||||
* @author Clemens Marschner
|
||||
* @version $Id$
|
||||
*/
|
||||
class URLScopeFilter extends Filter implements MessageListener
|
||||
{
|
||||
public void notifyAddedToMessageHandler(MessageHandler handler)
|
||||
{
|
||||
this.messageHandler = handler;
|
||||
}
|
||||
MessageHandler messageHandler;
|
||||
|
||||
/**
|
||||
* the regular expression which describes a valid URL
|
||||
*/
|
||||
private Pattern pattern;
|
||||
private Perl5Matcher matcher;
|
||||
private Perl5Compiler compiler;
|
||||
SimpleLogger log;
|
||||
|
||||
public URLScopeFilter(SimpleLogger log)
|
||||
{
|
||||
matcher = new Perl5Matcher();
|
||||
compiler = new Perl5Compiler();
|
||||
this.log = log;
|
||||
}
|
||||
|
||||
public String getRexString()
|
||||
{
|
||||
return pattern.toString();
|
||||
}
|
||||
|
||||
/**
|
||||
* set the regular expression
|
||||
* @param rexString the expression
|
||||
*/
|
||||
public void setRexString(String rexString) throws org.apache.oro.text.regex.MalformedPatternException
|
||||
{
|
||||
this.pattern = compiler.compile(rexString, Perl5Compiler.CASE_INSENSITIVE_MASK | Perl5Compiler.SINGLELINE_MASK);
|
||||
//System.out.println("pattern set to: " + pattern);
|
||||
}
|
||||
|
||||
|
||||
/**
|
||||
* this method will be called by the message handler. Tests the URL
|
||||
* and throws it out if it's not in the scope
|
||||
*/
|
||||
public Message handleRequest(Message message)
|
||||
{
|
||||
if(message instanceof URLMessage)
|
||||
{
|
||||
String urlString = ((URLMessage)message).getNormalizedURLString();
|
||||
int length = urlString.length();
|
||||
char buffer[] = new char[length];
|
||||
urlString.getChars(0,length,buffer,0);
|
||||
|
||||
//System.out.println("using pattern: " + pattern);
|
||||
boolean match = matcher.matches(buffer, pattern);
|
||||
if(!match)
|
||||
{
|
||||
//log.log("URLScopeFilter: not in scope: " + urlString);
|
||||
log.log(message.toString());
|
||||
filtered++;
|
||||
|
||||
return null;
|
||||
}
|
||||
}
|
||||
return message;
|
||||
}
|
||||
|
||||
}
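The scope test in `handleRequest` above accepts a URL only if the whole normalized string matches the configured expression. A minimal sketch of the same check follows, with `java.util.regex` swapped in for Jakarta ORO. The class name `UrlScopeCheck` and the example patterns are hypothetical, and `Pattern.DOTALL` is used on the assumption that it roughly corresponds to ORO's `SINGLELINE_MASK`.

```java
import java.util.regex.Pattern;

/** Hypothetical scope check using java.util.regex instead of Jakarta ORO. */
class UrlScopeCheck {
    private final Pattern pattern;
    private int filtered = 0;

    UrlScopeCheck(String regex) {
        // case-insensitive, and '.' also matches line terminators
        this.pattern = Pattern.compile(regex, Pattern.CASE_INSENSITIVE | Pattern.DOTALL);
    }

    /** Returns true if the entire URL string matches the scope expression. */
    boolean inScope(String normalizedUrl) {
        boolean match = pattern.matcher(normalizedUrl).matches();
        if (!match) {
            filtered++;
        }
        return match;
    }

    int getFiltered() {
        return filtered;
    }
}
```

Because `matches()` tests the whole input, a scope expression normally ends in `.*`; a bare prefix such as `http://www\.example\.org` would reject every page below that host.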
|
|
@ -1,165 +0,0 @@
|
|||
/* ====================================================================
|
||||
* The Apache Software License, Version 1.1
|
||||
*
|
||||
* Copyright (c) 2001 The Apache Software Foundation. All rights
|
||||
* reserved.
|
||||
*
|
||||
* Redistribution and use in source and binary forms, with or without
|
||||
* modification, are permitted provided that the following conditions
|
||||
* are met:
|
||||
*
|
||||
* 1. Redistributions of source code must retain the above copyright
|
||||
* notice, this list of conditions and the following disclaimer.
|
||||
*
|
||||
* 2. Redistributions in binary form must reproduce the above copyright
|
||||
* notice, this list of conditions and the following disclaimer in
|
||||
* the documentation and/or other materials provided with the
|
||||
* distribution.
|
||||
*
|
||||
* 3. The end-user documentation included with the redistribution,
|
||||
* if any, must include the following acknowledgment:
|
||||
* "This product includes software developed by the
|
||||
* Apache Software Foundation (http://www.apache.org/)."
|
||||
* Alternately, this acknowledgment may appear in the software itself,
|
||||
* if and wherever such third-party acknowledgments normally appear.
|
||||
*
|
||||
* 4. The names "Apache" and "Apache Software Foundation" and
|
||||
* "Apache Lucene" must not be used to endorse or promote products
|
||||
* derived from this software without prior written permission. For
|
||||
* written permission, please contact apache@apache.org.
|
||||
*
|
||||
* 5. Products derived from this software may not be called "Apache",
|
||||
* "Apache Lucene", nor may "Apache" appear in their name, without
|
||||
* prior written permission of the Apache Software Foundation.
|
||||
*
|
||||
* THIS SOFTWARE IS PROVIDED ``AS IS'' AND ANY EXPRESSED OR IMPLIED
|
||||
* WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED WARRANTIES
|
||||
* OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE
|
||||
* DISCLAIMED. IN NO EVENT SHALL THE APACHE SOFTWARE FOUNDATION OR
|
||||
* ITS CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL,
|
||||
* SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT
|
||||
* LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF
|
||||
* USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND
|
||||
* ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY,
|
||||
* OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT
|
||||
* OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF
|
||||
* SUCH DAMAGE.
|
||||
* ====================================================================
|
||||
*
|
||||
* This software consists of voluntary contributions made by many
|
||||
* individuals on behalf of the Apache Software Foundation. For more
|
||||
* information on the Apache Software Foundation, please see
|
||||
* <http://www.apache.org/>.
|
||||
*/
|
||||
|
||||
package de.lanlab.larm.fetcher;
|
||||
|
||||
import java.net.URL;
|
||||
import java.util.*;
|
||||
|
||||
import de.lanlab.larm.util.SimpleLogger;
|
||||
|
||||
/**
|
||||
* contains a HashSet of all normalized URLs already passed. Adds each URL to that set, or
|
||||
* consumes it if it is already present
|
||||
*
|
||||
* @todo find ways to reduce memory consumption here. the approach is somewhat naive
|
||||
*
|
||||
* @author Clemens Marschner
|
||||
* @created 3 January 2002
|
||||
* @version $Id$
|
||||
*/
|
||||
public class URLVisitedFilter extends Filter implements MessageListener
|
||||
{
|
||||
|
||||
/**
|
||||
* called by the message handler; this filter does not keep a reference to it
*
* @param handler the handler (unused)
|
||||
*/
|
||||
public void notifyAddedToMessageHandler(MessageHandler handler)
|
||||
{
|
||||
}
|
||||
|
||||
|
||||
//SimpleLogger log;
|
||||
|
||||
HashSet urlHash;
|
||||
|
||||
static Boolean dummy = new Boolean(true);
|
||||
|
||||
|
||||
|
||||
/**
|
||||
* Constructor for the URLVisitedFilter object
|
||||
*
|
||||
* @param log logger that receives each filtered (already visited) message
* @param initialHashCapacity initial capacity of the HashSet of visited URLs
|
||||
*/
|
||||
public URLVisitedFilter(SimpleLogger log, int initialHashCapacity)
|
||||
{
|
||||
urlHash = new HashSet(initialHashCapacity);
|
||||
//urlVector = new Vector(initialHashCapacity);
|
||||
this.log = log;
|
||||
}
|
||||
SimpleLogger log;
|
||||
|
||||
|
||||
/**
|
||||
* Clears the set of visited URLs.
|
||||
*/
|
||||
public void clearHashtable()
|
||||
{
|
||||
urlHash.clear();
|
||||
// urlVector.clear();
|
||||
}
|
||||
|
||||
|
||||
|
||||
/**
* Passes a message through unless it is a URLMessage whose normalized URL
* has already been seen.
*
* @param message the message to check
* @return the message, or null if its URL was already visited
|
||||
*/
|
||||
public Message handleRequest(Message message)
|
||||
{
|
||||
if (message instanceof URLMessage)
|
||||
{
|
||||
URLMessage urlMessage = ((URLMessage) message);
|
||||
URL url = urlMessage.getUrl();
|
||||
String urlString = urlMessage.getNormalizedURLString();
|
||||
if (urlHash.contains(urlString))
|
||||
{
|
||||
//log.log("URLVisitedFilter: " + urlString + " already present.");
|
||||
log.log(message.toString());
|
||||
filtered++;
|
||||
|
||||
return null;
|
||||
}
|
||||
else
|
||||
{
|
||||
// System.out.println("URLVisitedFilter: " + urlString + " not present yet.");
|
||||
urlHash.add(urlString);
|
||||
stringSize += urlString.length(); // see below
|
||||
//urlVector.add(urlString);
|
||||
}
|
||||
}
|
||||
return message;
|
||||
}
|
||||
|
||||
|
||||
private int stringSize = 0;
|
||||
|
||||
/**
|
||||
* Returns a rough count of the characters stored in the URL set, which
* shows that most of the total memory is used by this class.
|
||||
*/
|
||||
public int getStringSize()
|
||||
{
|
||||
return stringSize;
|
||||
}
|
||||
|
||||
public int size()
|
||||
{
|
||||
return urlHash.size();
|
||||
}
|
||||
|
||||
}
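The visited-URL check in the removed URLVisitedFilter can be sketched in isolation as follows. This is a minimal sketch, not the LARM code: the class name `VisitedUrlSet` and its methods are hypothetical, and it assumes the caller already passes normalized URL strings. It relies on `Set.add` returning false for an element that is already present, so one call covers both the lookup and the insert.

```java
import java.util.HashSet;
import java.util.Set;

/** Hypothetical stand-alone sketch; not part of the LARM sources. */
final class VisitedUrlSet {
    private final Set<String> seen;
    // Rough memory indicator, in the spirit of stringSize in the removed class.
    private long charCount = 0;

    VisitedUrlSet(int initialCapacity) {
        seen = new HashSet<>(initialCapacity);
    }

    /** Returns true the first time a URL is offered, false on every repeat. */
    boolean firstVisit(String normalizedUrl) {
        // Set.add returns false when the element is already present.
        if (!seen.add(normalizedUrl)) {
            return false;
        }
        charCount += normalizedUrl.length();
        return true;
    }

    int size() {
        return seen.size();
    }

    long charCount() {
        return charCount;
    }
}
```

A caller would drop a message when `firstVisit` returns false and pass it on otherwise, which mirrors the null-or-message result of `handleRequest`.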
|
File diff suppressed because it is too large
|
@@ -1,208 +0,0 @@
|
|||
/* ====================================================================
|
||||
* The Apache Software License, Version 1.1
|
||||
*
|
||||
* Copyright (c) 2001 The Apache Software Foundation. All rights
|
||||
* reserved.
|
||||
*
|
||||
* Redistribution and use in source and binary forms, with or without
|
||||
* modification, are permitted provided that the following conditions
|
||||
* are met:
|
||||
*
|
||||
* 1. Redistributions of source code must retain the above copyright
|
||||
* notice, this list of conditions and the following disclaimer.
|
||||
*
|
||||
* 2. Redistributions in binary form must reproduce the above copyright
|
||||
* notice, this list of conditions and the following disclaimer in
|
||||
* the documentation and/or other materials provided with the
|
||||
* distribution.
|
||||
*
|
||||
* 3. The end-user documentation included with the redistribution,
|
||||
* if any, must include the following acknowledgment:
|
||||
* "This product includes software developed by the
|
||||
* Apache Software Foundation (http://www.apache.org/)."
|
||||
* Alternately, this acknowledgment may appear in the software itself,
|
||||
* if and wherever such third-party acknowledgments normally appear.
|
||||
*
|
||||
* 4. The names "Apache" and "Apache Software Foundation" and
|
||||
* "Apache Lucene" must not be used to endorse or promote products
|
||||
* derived from this software without prior written permission. For
|
||||
* written permission, please contact apache@apache.org.
|
||||
*
|
||||
* 5. Products derived from this software may not be called "Apache",
|
||||
* "Apache Lucene", nor may "Apache" appear in their name, without
|
||||
* prior written permission of the Apache Software Foundation.
|
||||
*
|
||||
* THIS SOFTWARE IS PROVIDED ``AS IS'' AND ANY EXPRESSED OR IMPLIED
|
||||
* WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED WARRANTIES
|
||||
* OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE
|
||||
* DISCLAIMED. IN NO EVENT SHALL THE APACHE SOFTWARE FOUNDATION OR
|
||||
* ITS CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL,
|
||||
* SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT
|
||||
* LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF
|
||||
* USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND
|
||||
* ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY,
|
||||
* OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT
|
||||
* OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF
|
||||
* SUCH DAMAGE.
|
||||
* ====================================================================
|
||||
*
|
||||
* This software consists of voluntary contributions made by many
|
||||
* individuals on behalf of the Apache Software Foundation. For more
|
||||
* information on the Apache Software Foundation, please see
|
||||
* <http://www.apache.org/>.
|
||||
*/
|
||||
|
||||
package de.lanlab.larm.gui;
|
||||
|
||||
/*
|
||||
A basic extension of the java.awt.Dialog class
|
||||
*/
|
||||
|
||||
import java.awt.*;
|
||||
|
||||
public class AboutDialog extends Dialog {
|
||||
|
||||
public AboutDialog(Frame parent, boolean modal)
|
||||
{
|
||||
super(parent, modal);
|
||||
|
||||
// This code is automatically generated by Visual Cafe when you add
|
||||
// components to the visual environment. It instantiates and initializes
|
||||
// the components. To modify the code, only use code syntax that matches
|
||||
// what Visual Cafe can generate, or Visual Cafe may be unable to back
|
||||
// parse your Java file into its visual environment.
|
||||
|
||||
//{{INIT_CONTROLS
|
||||
setLayout(null);
|
||||
setSize(249,150);
|
||||
setVisible(false);
|
||||
label1.setText("LARM - LANLab Retrieval Machine");
|
||||
add(label1);
|
||||
label1.setBounds(12,12,228,24);
|
||||
okButton.setLabel("OK");
|
||||
add(okButton);
|
||||
okButton.setBounds(95,85,66,27);
|
||||
label2.setText("(C) 2000 Clemens Marschner");
|
||||
add(label2);
|
||||
label2.setBounds(12,36,228,24);
|
||||
setTitle("AWT-Anwendung - Info");
|
||||
//}}
|
||||
|
||||
//{{REGISTER_LISTENERS
|
||||
SymWindow aSymWindow = new SymWindow();
|
||||
this.addWindowListener(aSymWindow);
|
||||
SymAction lSymAction = new SymAction();
|
||||
okButton.addActionListener(lSymAction);
|
||||
//}}
|
||||
|
||||
}
|
||||
|
||||
public AboutDialog(Frame parent, String title, boolean modal)
|
||||
{
|
||||
this(parent, modal);
|
||||
setTitle(title);
|
||||
}
|
||||
|
||||
public void addNotify()
|
||||
{
|
||||
// Record the size of the window prior to calling the parent's addNotify.
|
||||
Dimension d = getSize();
|
||||
|
||||
super.addNotify();
|
||||
|
||||
// Only do this once.
|
||||
if (fComponentsAdjusted)
|
||||
return;
|
||||
|
||||
// Adjust components according to the insets
|
||||
Insets insets = getInsets();
|
||||
setSize(insets.left + insets.right + d.width, insets.top + insets.bottom + d.height);
|
||||
Component components[] = getComponents();
|
||||
for (int i = 0; i < components.length; i++)
|
||||
{
|
||||
Point p = components[i].getLocation();
|
||||
p.translate(insets.left, insets.top);
|
||||
components[i].setLocation(p);
|
||||
}
|
||||
|
||||
// Used for addNotify check.
|
||||
fComponentsAdjusted = true;
|
||||
}
|
||||
|
||||
public void setVisible(boolean b)
|
||||
{
|
||||
if (b)
|
||||
{
|
||||
Rectangle bounds = getParent().getBounds();
|
||||
Rectangle abounds = getBounds();
|
||||
|
||||
setLocation(bounds.x + (bounds.width - abounds.width)/ 2,
|
||||
bounds.y + (bounds.height - abounds.height)/2);
|
||||
}
|
||||
|
||||
super.setVisible(b);
|
||||
}
|
||||
|
||||
//{{DECLARE_CONTROLS
|
||||
java.awt.Label label1 = new java.awt.Label();
|
||||
java.awt.Button okButton = new java.awt.Button();
|
||||
java.awt.Label label2 = new java.awt.Label();
|
||||
//}}
|
||||
|
||||
// Used for addNotify check.
|
||||
boolean fComponentsAdjusted = false;
|
||||
|
||||
class SymAction implements java.awt.event.ActionListener
|
||||
{
|
||||
public void actionPerformed(java.awt.event.ActionEvent event)
|
||||
{
|
||||
Object object = event.getSource();
|
||||
if (object == okButton)
|
||||
okButton_ActionPerformed(event);
|
||||
}
|
||||
}
|
||||
|
||||
void okButton_ActionPerformed(java.awt.event.ActionEvent event)
|
||||
{
|
||||
// to do: code goes here.
|
||||
|
||||
okButton_ActionPerformed_Interaction1(event);
|
||||
}
|
||||
|
||||
|
||||
void okButton_ActionPerformed_Interaction1(java.awt.event.ActionEvent event)
|
||||
{
|
||||
try {
|
||||
this.dispose();
|
||||
} catch (Exception e) {
|
||||
}
|
||||
}
|
||||
|
||||
|
||||
class SymWindow extends java.awt.event.WindowAdapter
|
||||
{
|
||||
public void windowClosing(java.awt.event.WindowEvent event)
|
||||
{
|
||||
Object object = event.getSource();
|
||||
if (object == AboutDialog.this)
|
||||
AboutDialog_WindowClosing(event);
|
||||
}
|
||||
}
|
||||
|
||||
void AboutDialog_WindowClosing(java.awt.event.WindowEvent event)
|
||||
{
|
||||
// to do: code goes here.
|
||||
|
||||
AboutDialog_WindowClosing_Interaction1(event);
|
||||
}
|
||||
|
||||
|
||||
void AboutDialog_WindowClosing_Interaction1(java.awt.event.WindowEvent event)
|
||||
{
|
||||
try {
|
||||
this.dispose();
|
||||
} catch (Exception e) {
|
||||
}
|
||||
}
|
||||
|
||||
}
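The centering arithmetic in `AboutDialog.setVisible` can be checked without opening a window. The sketch below is an illustration only: the class `DialogCentering` and the method `centeredOrigin` are hypothetical names, and it uses just the `java.awt` geometry classes.

```java
import java.awt.Point;
import java.awt.Rectangle;

/** Hypothetical helper; mirrors the arithmetic of the removed setVisible method. */
final class DialogCentering {
    /** Top-left corner that centers child over parent (integer division, as in the original). */
    static Point centeredOrigin(Rectangle parent, Rectangle child) {
        int x = parent.x + (parent.width - child.width) / 2;
        int y = parent.y + (parent.height - child.height) / 2;
        return new Point(x, y);
    }
}
```

For a 400x300 parent at (100,100) and a 200x100 dialog, the dialog origin works out to (200,200).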
|
|
@@ -1,539 +0,0 @@
|
|||
/* ====================================================================
|
||||
* The Apache Software License, Version 1.1
|
||||
*
|
||||
* Copyright (c) 2001 The Apache Software Foundation. All rights
|
||||
* reserved.
|
||||
*
|
||||
* Redistribution and use in source and binary forms, with or without
|
||||
* modification, are permitted provided that the following conditions
|
||||
* are met:
|
||||
*
|
||||
* 1. Redistributions of source code must retain the above copyright
|
||||
* notice, this list of conditions and the following disclaimer.
|
||||
*
|
||||
* 2. Redistributions in binary form must reproduce the above copyright
|
||||
* notice, this list of conditions and the following disclaimer in
|
||||
* the documentation and/or other materials provided with the
|
||||
* distribution.
|
||||
*
|
||||
* 3. The end-user documentation included with the redistribution,
|
||||
* if any, must include the following acknowledgment:
|
||||
* "This product includes software developed by the
|
||||
* Apache Software Foundation (http://www.apache.org/)."
|
||||
* Alternately, this acknowledgment may appear in the software itself,
|
||||
* if and wherever such third-party acknowledgments normally appear.
|
||||
*
|
||||
* 4. The names "Apache" and "Apache Software Foundation" and
|
||||
* "Apache Lucene" must not be used to endorse or promote products
|
||||
* derived from this software without prior written permission. For
|
||||
* written permission, please contact apache@apache.org.
|
||||
*
|
||||
* 5. Products derived from this software may not be called "Apache",
|
||||
* "Apache Lucene", nor may "Apache" appear in their name, without
|
||||
* prior written permission of the Apache Software Foundation.
|
||||
*
|
||||
* THIS SOFTWARE IS PROVIDED ``AS IS'' AND ANY EXPRESSED OR IMPLIED
|
||||
* WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED WARRANTIES
|
||||
* OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE
|
||||
* DISCLAIMED. IN NO EVENT SHALL THE APACHE SOFTWARE FOUNDATION OR
|
||||
* ITS CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL,
|
||||
* SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT
|
||||
* LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF
|
||||
* USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND
|
||||
* ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY,
|
||||
* OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT
|
||||
* OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF
|
||||
* SUCH DAMAGE.
|
||||
* ====================================================================
|
||||
*
|
||||
* This software consists of voluntary contributions made by many
|
||||
* individuals on behalf of the Apache Software Foundation. For more
|
||||
* information on the Apache Software Foundation, please see
|
||||
* <http://www.apache.org/>.
|
||||
*/
|
||||
|
||||
package de.lanlab.larm.gui;
|
||||
|
||||
/*
|
||||
This simple extension of the java.awt.Frame class
|
||||
contains all the elements necessary to act as the
|
||||
main window of an application.
|
||||
*/
|
||||
|
||||
import java.awt.*;
|
||||
import java.awt.event.ActionListener;
|
||||
//import com.sun.java.swing.*;
|
||||
|
||||
public class FetcherFrame extends Frame
|
||||
{
|
||||
public FetcherFrame()
|
||||
{
|
||||
// This code is automatically generated by Visual Cafe when you add
|
||||
// components to the visual environment. It instantiates and initializes
|
||||
// the components. To modify the code, only use code syntax that matches
|
||||
// what Visual Cafe can generate, or Visual Cafe may be unable to back
|
||||
// parse your Java file into its visual environment.
|
||||
|
||||
//{{INIT_CONTROLS
|
||||
setLayout(new BorderLayout(0,0));
|
||||
setSize(800,600);
|
||||
setVisible(false);
|
||||
openFileDialog1.setMode(FileDialog.LOAD);
|
||||
openFileDialog1.setTitle("Öffnen");
|
||||
//$$ openFileDialog1.move(24,312);
|
||||
mainPanelWithBorders.setLayout(new BorderLayout(0,0));
|
||||
add("Center", mainPanelWithBorders);
|
||||
mainPanelWithBorders.setBounds(0,0,800,600);
|
||||
northBorder.setLayout(null);
|
||||
mainPanelWithBorders.add("North", northBorder);
|
||||
northBorder.setBackground(java.awt.Color.lightGray);
|
||||
northBorder.setBounds(0,0,800,3);
|
||||
southBorder.setLayout(null);
|
||||
mainPanelWithBorders.add("South", southBorder);
|
||||
southBorder.setBackground(java.awt.Color.lightGray);
|
||||
southBorder.setBounds(0,597,800,3);
|
||||
westBorder.setLayout(null);
|
||||
mainPanelWithBorders.add("West", westBorder);
|
||||
westBorder.setBackground(java.awt.Color.lightGray);
|
||||
westBorder.setBounds(0,3,3,594);
|
||||
eastBorder.setLayout(null);
|
||||
mainPanelWithBorders.add("East", eastBorder);
|
||||
eastBorder.setBackground(java.awt.Color.lightGray);
|
||||
eastBorder.setBounds(797,3,3,594);
|
||||
mainPanel.setLayout(new BorderLayout(0,3));
|
||||
mainPanelWithBorders.add("Center", mainPanel);
|
||||
mainPanel.setBackground(java.awt.Color.lightGray);
|
||||
mainPanel.setBounds(3,3,794,594);
|
||||
upperPanel.setLayout(new GridLayout(1,2,0,0));
|
||||
mainPanel.add("North", upperPanel);
|
||||
upperPanel.setBounds(0,0,794,150);
|
||||
preferencesPanel.setLayout(null);
|
||||
upperPanel.add(preferencesPanel);
|
||||
preferencesPanel.setBounds(0,0,397,150);
|
||||
startURLlabel.setText("Start-URL");
|
||||
preferencesPanel.add(startURLlabel);
|
||||
startURLlabel.setBounds(12,0,121,24);
|
||||
startURL.setText("uni-muenchen.de");
|
||||
preferencesPanel.add(startURL);
|
||||
startURL.setBounds(132,0,133,24);
|
||||
startButton.setLabel("Start");
|
||||
preferencesPanel.add(startButton);
|
||||
startButton.setFont(new Font("Dialog", Font.BOLD, 12));
|
||||
startButton.setBounds(288,36,99,24);
|
||||
restrictToLabel.setText("Restrict host to");
|
||||
preferencesPanel.add(restrictToLabel);
|
||||
restrictToLabel.setBounds(12,36,121,28);
|
||||
preferencesPanel.add(restrictTo);
|
||||
restrictTo.setBounds(133,36,133,24);
|
||||
logPanel.setLayout(new BorderLayout(0,0));
|
||||
upperPanel.add(logPanel);
|
||||
logPanel.setBounds(397,0,397,150);
|
||||
logPanel.add("Center", logList);
|
||||
logList.setBackground(java.awt.Color.white);
|
||||
logList.setBounds(0,0,397,150);
|
||||
lowerPanel.setLayout(new GridLayout(1,3,3,3));
|
||||
mainPanel.add("Center", lowerPanel);
|
||||
lowerPanel.setBounds(0,153,794,441);
|
||||
urlQueuePanel.setLayout(new BorderLayout(0,0));
|
||||
lowerPanel.add(urlQueuePanel);
|
||||
urlQueuePanel.setBounds(0,0,196,441);
|
||||
urlQueueLabel.setText("URLQueue");
|
||||
urlQueuePanel.add("North", urlQueueLabel);
|
||||
urlQueueLabel.setBounds(0,0,196,23);
|
||||
urlQueuePanel.add("Center", urlQueueList);
|
||||
urlQueueList.setBackground(java.awt.Color.white);
|
||||
urlQueueList.setBounds(0,23,196,418);
|
||||
urlThreadPanel.setLayout(new BorderLayout(0,0));
|
||||
lowerPanel.add(urlThreadPanel);
|
||||
urlThreadPanel.setBounds(199,0,196,441);
|
||||
urlThreadLabel.setText("URLThreads");
|
||||
urlThreadPanel.add("North", urlThreadLabel);
|
||||
urlThreadLabel.setBounds(0,0,196,23);
|
||||
urlThreadPanel.add("Center", urlThreadList);
|
||||
urlThreadList.setBackground(java.awt.Color.white);
|
||||
urlThreadList.setBounds(0,23,196,418);
|
||||
docQueuePanel.setLayout(new BorderLayout(0,0));
|
||||
lowerPanel.add(docQueuePanel);
|
||||
docQueuePanel.setBounds(398,0,196,441);
|
||||
docQueueLabel.setText("DocQueue");
|
||||
docQueuePanel.add("North", docQueueLabel);
|
||||
docQueueLabel.setBounds(0,0,196,23);
|
||||
docQueuePanel.add("Center", docQueueList);
|
||||
docQueueList.setBackground(java.awt.Color.white);
|
||||
docQueueList.setBounds(0,23,196,418);
|
||||
docThreadPanel.setLayout(new BorderLayout(0,0));
|
||||
lowerPanel.add(docThreadPanel);
|
||||
docThreadPanel.setBounds(597,0,196,441);
|
||||
docThreadLabel.setText("DocThreads");
|
||||
docThreadPanel.add("North", docThreadLabel);
|
||||
docThreadLabel.setBounds(0,0,196,23);
|
||||
docThreadPanel.add("Center", docThreadList);
|
||||
docThreadList.setBackground(java.awt.Color.white);
|
||||
docThreadList.setBounds(0,23,196,418);
|
||||
setTitle("LARM - Fetcher");
|
||||
//}}
|
||||
|
||||
//{{INIT_MENUS
|
||||
menu1.setLabel("Datei");
|
||||
menu1.add(newMenuItem);
|
||||
newMenuItem.setEnabled(false);
|
||||
newMenuItem.setLabel("Neu");
|
||||
newMenuItem.setShortcut(new MenuShortcut(java.awt.event.KeyEvent.VK_N,false));
|
||||
menu1.add(openMenuItem);
|
||||
openMenuItem.setLabel("Öffnen...");
|
||||
openMenuItem.setShortcut(new MenuShortcut(java.awt.event.KeyEvent.VK_O,false));
|
||||
menu1.add(saveMenuItem);
|
||||
saveMenuItem.setEnabled(false);
|
||||
saveMenuItem.setLabel("Speichern");
|
||||
saveMenuItem.setShortcut(new MenuShortcut(java.awt.event.KeyEvent.VK_S,false));
|
||||
menu1.add(saveAsMenuItem);
|
||||
saveAsMenuItem.setEnabled(false);
|
||||
saveAsMenuItem.setLabel("Speichern unter...");
|
||||
menu1.add(separatorMenuItem);
|
||||
separatorMenuItem.setLabel("-");
|
||||
menu1.add(exitMenuItem);
|
||||
exitMenuItem.setLabel("Beenden");
|
||||
mainMenuBar.add(menu1);
|
||||
menu2.setLabel("Bearbeiten");
|
||||
menu2.add(cutMenuItem);
|
||||
cutMenuItem.setEnabled(false);
|
||||
cutMenuItem.setLabel("Ausschneiden");
|
||||
cutMenuItem.setShortcut(new MenuShortcut(java.awt.event.KeyEvent.VK_X,false));
|
||||
menu2.add(copyMenuItem);
|
||||
copyMenuItem.setEnabled(false);
|
||||
copyMenuItem.setLabel("Kopieren");
|
||||
copyMenuItem.setShortcut(new MenuShortcut(java.awt.event.KeyEvent.VK_C,false));
|
||||
menu2.add(pasteMenuItem);
|
||||
pasteMenuItem.setEnabled(false);
|
||||
pasteMenuItem.setLabel("Einfügen");
|
||||
pasteMenuItem.setShortcut(new MenuShortcut(java.awt.event.KeyEvent.VK_V,false));
|
||||
mainMenuBar.add(menu2);
|
||||
menu3.setLabel("Hilfe");
|
||||
menu3.add(aboutMenuItem);
|
||||
aboutMenuItem.setLabel("Info...");
|
||||
mainMenuBar.add(menu3);
|
||||
//$$ mainMenuBar.move(0,312);
|
||||
setMenuBar(mainMenuBar);
|
||||
//}}
|
||||
|
||||
//{{REGISTER_LISTENERS
|
||||
SymWindow aSymWindow = new SymWindow();
|
||||
this.addWindowListener(aSymWindow);
|
||||
SymAction lSymAction = new SymAction();
|
||||
openMenuItem.addActionListener(lSymAction);
|
||||
exitMenuItem.addActionListener(lSymAction);
|
||||
aboutMenuItem.addActionListener(lSymAction);
|
||||
startButton.addActionListener(lSymAction);
|
||||
//}}
|
||||
}
|
||||
|
||||
public FetcherFrame(String title)
|
||||
{
|
||||
this();
|
||||
setTitle(title);
|
||||
}
|
||||
|
||||
/**
|
||||
* Shows or hides the component depending on the boolean flag b.
|
||||
* @param b if true, show the component; otherwise, hide the component.
|
||||
* @see java.awt.Component#isVisible
|
||||
*/
|
||||
public void setVisible(boolean b)
|
||||
{
|
||||
if(b)
|
||||
{
|
||||
setLocation(50, 50);
|
||||
}
|
||||
super.setVisible(b);
|
||||
}
|
||||
|
||||
static public void main(String args[])
|
||||
{
|
||||
try
|
||||
{
|
||||
//Create a new instance of our application's frame, and make it visible.
|
||||
(new FetcherFrame()).setVisible(true);
|
||||
}
|
||||
catch (Throwable t)
|
||||
{
|
||||
System.err.println(t);
|
||||
t.printStackTrace();
|
||||
//Ensure the application exits with an error condition.
|
||||
System.exit(1);
|
||||
}
|
||||
}
|
||||
|
||||
public void addNotify()
|
||||
{
|
||||
// Record the size of the window prior to calling the parent's addNotify.
|
||||
Dimension d = getSize();
|
||||
|
||||
super.addNotify();
|
||||
|
||||
if (fComponentsAdjusted)
|
||||
return;
|
||||
|
||||
// Adjust components according to the insets
|
||||
setSize(getInsets().left + getInsets().right + d.width, getInsets().top + getInsets().bottom + d.height);
|
||||
Component components[] = getComponents();
|
||||
for (int i = 0; i < components.length; i++)
|
||||
{
|
||||
Point p = components[i].getLocation();
|
||||
p.translate(getInsets().left, getInsets().top);
|
||||
components[i].setLocation(p);
|
||||
}
|
||||
fComponentsAdjusted = true;
|
||||
}
|
||||
|
||||
// Used for addNotify check.
|
||||
boolean fComponentsAdjusted = false;
|
||||
|
||||
//{{DECLARE_CONTROLS
|
||||
java.awt.FileDialog openFileDialog1 = new java.awt.FileDialog(this);
|
||||
java.awt.Panel mainPanelWithBorders = new java.awt.Panel();
|
||||
java.awt.Panel northBorder = new java.awt.Panel();
|
||||
java.awt.Panel southBorder = new java.awt.Panel();
|
||||
java.awt.Panel westBorder = new java.awt.Panel();
|
||||
java.awt.Panel eastBorder = new java.awt.Panel();
|
||||
java.awt.Panel mainPanel = new java.awt.Panel();
|
||||
java.awt.Panel upperPanel = new java.awt.Panel();
|
||||
java.awt.Panel preferencesPanel = new java.awt.Panel();
|
||||
java.awt.Label startURLlabel = new java.awt.Label();
|
||||
java.awt.TextField startURL = new java.awt.TextField(30);
|
||||
java.awt.Button startButton = new java.awt.Button();
|
||||
java.awt.Label restrictToLabel = new java.awt.Label();
|
||||
java.awt.TextField restrictTo = new java.awt.TextField();
|
||||
java.awt.Panel logPanel = new java.awt.Panel();
|
||||
java.awt.List logList = new java.awt.List(8);
|
||||
java.awt.Panel lowerPanel = new java.awt.Panel();
|
||||
java.awt.Panel urlQueuePanel = new java.awt.Panel();
|
||||
java.awt.Label urlQueueLabel = new java.awt.Label();
|
||||
java.awt.List urlQueueList = new java.awt.List(5);
|
||||
java.awt.Panel urlThreadPanel = new java.awt.Panel();
|
||||
java.awt.Label urlThreadLabel = new java.awt.Label();
|
||||
java.awt.List urlThreadList = new java.awt.List(4);
|
||||
java.awt.Panel docQueuePanel = new java.awt.Panel();
|
||||
java.awt.Label docQueueLabel = new java.awt.Label();
|
||||
java.awt.List docQueueList = new java.awt.List(4);
|
||||
java.awt.Panel docThreadPanel = new java.awt.Panel();
|
||||
java.awt.Label docThreadLabel = new java.awt.Label();
|
||||
java.awt.List docThreadList = new java.awt.List(4);
|
||||
//}}
|
||||
|
||||
//{{DECLARE_MENUS
|
||||
java.awt.MenuBar mainMenuBar = new java.awt.MenuBar();
|
||||
java.awt.Menu menu1 = new java.awt.Menu();
|
||||
java.awt.MenuItem newMenuItem = new java.awt.MenuItem();
|
||||
java.awt.MenuItem openMenuItem = new java.awt.MenuItem();
|
||||
java.awt.MenuItem saveMenuItem = new java.awt.MenuItem();
|
||||
java.awt.MenuItem saveAsMenuItem = new java.awt.MenuItem();
|
||||
java.awt.MenuItem separatorMenuItem = new java.awt.MenuItem();
|
||||
java.awt.MenuItem exitMenuItem = new java.awt.MenuItem();
|
||||
java.awt.Menu menu2 = new java.awt.Menu();
|
||||
java.awt.MenuItem cutMenuItem = new java.awt.MenuItem();
|
||||
java.awt.MenuItem copyMenuItem = new java.awt.MenuItem();
|
||||
java.awt.MenuItem pasteMenuItem = new java.awt.MenuItem();
|
||||
java.awt.Menu menu3 = new java.awt.Menu();
|
||||
java.awt.MenuItem aboutMenuItem = new java.awt.MenuItem();
|
||||
//}}
|
||||
|
||||
class SymWindow extends java.awt.event.WindowAdapter
|
||||
{
|
||||
public void windowClosing(java.awt.event.WindowEvent event)
|
||||
{
|
||||
Object object = event.getSource();
|
||||
if (object == FetcherFrame.this)
|
||||
FetcherFrame_WindowClosing(event);
|
||||
}
|
||||
}
|
||||
|
||||
void FetcherFrame_WindowClosing(java.awt.event.WindowEvent event)
|
||||
{
|
||||
// to do: code goes here.
|
||||
|
||||
FetcherFrame_WindowClosing_Interaction1(event);
|
||||
}
|
||||
|
||||
|
||||
void FetcherFrame_WindowClosing_Interaction1(java.awt.event.WindowEvent event)
|
||||
{
|
||||
try {
|
||||
// QuitDialog Create and show as modal
|
||||
(new QuitDialog(this, true)).setVisible(true);
|
||||
} catch (Exception e) {
|
||||
}
|
||||
}
|
||||
|
||||
|
||||
class SymAction implements java.awt.event.ActionListener
|
||||
{
|
||||
public void actionPerformed(java.awt.event.ActionEvent event)
|
||||
{
|
||||
Object object = event.getSource();
|
||||
if (object == openMenuItem)
|
||||
openMenuItem_ActionPerformed(event);
|
||||
else if (object == aboutMenuItem)
|
||||
aboutMenuItem_ActionPerformed(event);
|
||||
else if (object == exitMenuItem)
|
||||
exitMenuItem_ActionPerformed(event);
|
||||
else if (object == startButton)
|
||||
startButton_ActionPerformed(event);
|
||||
}
|
||||
}
|
||||
|
||||
void openMenuItem_ActionPerformed(java.awt.event.ActionEvent event)
|
||||
{
|
||||
// to do: code goes here.
|
||||
|
||||
openMenuItem_ActionPerformed_Interaction1(event);
|
||||
}
|
||||
|
||||
|
||||
void openMenuItem_ActionPerformed_Interaction1(java.awt.event.ActionEvent event)
|
||||
{
|
||||
try {
|
||||
// OpenFileDialog Create and show as modal
|
||||
int defMode = openFileDialog1.getMode();
|
||||
String defTitle = openFileDialog1.getTitle();
|
||||
String defDirectory = openFileDialog1.getDirectory();
|
||||
String defFile = openFileDialog1.getFile();
|
||||
|
||||
openFileDialog1 = new java.awt.FileDialog(this, defTitle, defMode);
|
||||
openFileDialog1.setDirectory(defDirectory);
|
||||
openFileDialog1.setFile(defFile);
|
||||
openFileDialog1.setVisible(true);
|
||||
} catch (Exception e) {
|
||||
}
|
||||
}
|
||||
|
||||
|
||||
void aboutMenuItem_ActionPerformed(java.awt.event.ActionEvent event)
|
||||
{
|
||||
// to do: code goes here.
|
||||
|
||||
aboutMenuItem_ActionPerformed_Interaction1(event);
|
||||
}
|
||||
|
||||
|
||||
void aboutMenuItem_ActionPerformed_Interaction1(java.awt.event.ActionEvent event)
|
||||
{
|
||||
try {
|
||||
// AboutDialog Create and show as modal
|
||||
(new AboutDialog(this, true)).setVisible(true);
|
||||
} catch (Exception e) {
|
||||
}
|
||||
}
|
||||
|
||||
|
||||
void exitMenuItem_ActionPerformed(java.awt.event.ActionEvent event)
|
||||
{
|
||||
// to do: code goes here.
|
||||
|
||||
exitMenuItem_ActionPerformed_Interaction1(event);
|
||||
}
|
||||
|
||||
|
||||
void exitMenuItem_ActionPerformed_Interaction1(java.awt.event.ActionEvent event)
|
||||
{
|
||||
try {
|
||||
// QuitDialog Create and show as modal
|
||||
(new QuitDialog(this, true)).setVisible(true);
|
||||
} catch (Exception e) {
|
||||
}
|
||||
}
|
||||
|
||||
|
||||
public void startButton_ActionPerformed(java.awt.event.ActionEvent event)
|
||||
{
|
||||
// to do: code goes here.
|
||||
|
||||
}
|
||||
|
||||
public void addUrlQueueItem(String item)
|
||||
{
|
||||
urlQueueList.add(item);
|
||||
}
|
||||
|
||||
public void removeUrlQueueItem(String item)
|
||||
{
|
||||
urlQueueList.remove(item);
|
||||
}
|
||||
public void addDocQueueItem(String item)
|
||||
{
|
||||
docQueueList.add(item);
|
||||
}
|
||||
|
||||
public void removeDocQueueItem(String item)
|
||||
{
|
||||
docQueueList.remove(item);
|
||||
}
|
||||
|
||||
public synchronized int addUrlThreadItem(String item)
|
||||
{
|
||||
urlThreadList.add(item);
|
||||
return urlThreadList.getItemCount();
|
||||
}
|
||||
|
||||
public synchronized int addUrlThreadItem(String item, int pos)
|
||||
{
|
||||
urlThreadList.add(item,pos);
|
||||
return urlThreadList.getItemCount();
|
||||
}
|
||||
|
||||
public void replaceUrlThreadItem(String item, int index)
|
||||
{
|
||||
urlThreadList.replaceItem(item,index);
|
||||
}
|
||||
|
||||
public synchronized int addDocThreadItem(String item)
|
||||
{
|
||||
docThreadList.add(item);
|
||||
return docThreadList.getItemCount();
|
||||
}
|
||||
|
||||
public void replaceDocThreadItem(String item, int index)
|
||||
{
|
||||
docThreadList.replaceItem(item,index);
|
||||
}
|
||||
|
||||
|
||||
|
||||
public void addLogEntry(String entry)
|
||||
{
|
||||
logList.add(entry);
|
||||
logList.makeVisible(logList.getItemCount()-1);
|
||||
}
|
||||
|
||||
public void clearLog()
|
||||
{
|
||||
logList.removeAll();
|
||||
}
|
||||
|
||||
public void addStartButtonListener(ActionListener a)
|
||||
{
|
||||
startButton.addActionListener(a);
|
||||
}
|
||||
|
||||
public String getRestrictTo()
|
||||
{
|
||||
return restrictTo.getText();
|
||||
}
|
||||
public void setRestrictTo(String restrictTo)
|
||||
{
|
||||
this.restrictTo.setText(restrictTo);
|
||||
}
|
||||
public String getStartURL()
|
||||
{
|
||||
return startURL.getText();
|
||||
}
|
||||
public void setStartURL(String startURL)
|
||||
{
|
||||
this.startURL.setText(startURL);
|
||||
}
|
||||
|
||||
//public void setInfoText(String text)
|
||||
//{
|
||||
// thi
|
||||
//}
|
||||
}
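The `addNotify` methods in the removed frames grow the window by its insets and shift every child by the left and top inset. A minimal sketch of that arithmetic follows, assuming nothing beyond the `java.awt` value classes. The class `InsetsMath` and its method names are hypothetical.

```java
import java.awt.Dimension;
import java.awt.Insets;
import java.awt.Point;

/** Hypothetical helper; restates the insets adjustment done in addNotify. */
final class InsetsMath {
    /** Outer window size needed so the content area keeps its designed size. */
    static Dimension outerSize(Dimension content, Insets insets) {
        return new Dimension(insets.left + insets.right + content.width,
                             insets.top + insets.bottom + content.height);
    }

    /** Child location after moving it out from under the window border. */
    static Point shifted(Point location, Insets insets) {
        Point moved = new Point(location); // copy, so the argument is not modified
        moved.translate(insets.left, insets.top);
        return moved;
    }
}
```

With an 800x600 layout and insets of top 30, left 4, bottom 4, right 4, the outer size becomes 808x634 and a child at (0,0) moves to (4,30). The removed code guards this with `fComponentsAdjusted` so that the shift is applied only once.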
|
||||
|
|
@@ -1,377 +0,0 @@
|
|||
/* ====================================================================
|
||||
* The Apache Software License, Version 1.1
|
||||
*
|
||||
* Copyright (c) 2001 The Apache Software Foundation. All rights
|
||||
* reserved.
|
||||
*
|
||||
* Redistribution and use in source and binary forms, with or without
|
||||
* modification, are permitted provided that the following conditions
|
||||
* are met:
|
||||
*
|
||||
* 1. Redistributions of source code must retain the above copyright
|
||||
* notice, this list of conditions and the following disclaimer.
|
||||
*
|
||||
* 2. Redistributions in binary form must reproduce the above copyright
|
||||
* notice, this list of conditions and the following disclaimer in
|
||||
* the documentation and/or other materials provided with the
|
||||
* distribution.
|
||||
*
|
||||
* 3. The end-user documentation included with the redistribution,
|
||||
* if any, must include the following acknowledgment:
|
||||
* "This product includes software developed by the
|
||||
* Apache Software Foundation (http://www.apache.org/)."
|
||||
* Alternately, this acknowledgment may appear in the software itself,
|
||||
* if and wherever such third-party acknowledgments normally appear.
|
||||
*
|
||||
* 4. The names "Apache" and "Apache Software Foundation" and
|
||||
* "Apache Lucene" must not be used to endorse or promote products
|
||||
* derived from this software without prior written permission. For
|
||||
* written permission, please contact apache@apache.org.
|
||||
*
|
||||
* 5. Products derived from this software may not be called "Apache",
|
||||
* "Apache Lucene", nor may "Apache" appear in their name, without
|
||||
* prior written permission of the Apache Software Foundation.
|
||||
*
|
||||
* THIS SOFTWARE IS PROVIDED ``AS IS'' AND ANY EXPRESSED OR IMPLIED
|
||||
* WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED WARRANTIES
|
||||
* OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE
|
||||
* DISCLAIMED. IN NO EVENT SHALL THE APACHE SOFTWARE FOUNDATION OR
|
||||
* ITS CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL,
|
||||
* SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT
|
||||
* LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF
|
||||
* USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND
|
||||
* ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY,
|
||||
* OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT
|
||||
* OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF
|
||||
* SUCH DAMAGE.
|
||||
* ====================================================================
|
||||
*
|
||||
* This software consists of voluntary contributions made by many
|
||||
* individuals on behalf of the Apache Software Foundation. For more
|
||||
* information on the Apache Software Foundation, please see
|
||||
* <http://www.apache.org/>.
|
||||
*/
|
||||
|
||||
package de.lanlab.larm.gui;
|
||||
|
||||
import javax.swing.*;
|
||||
import java.awt.*;
|
||||
import java.awt.event.*;
|
||||
|
||||
|
||||
public class FetcherSummaryFrame extends JFrame
|
||||
{
|
||||
JPanel lowerPanel = new JPanel();
|
||||
JPanel progressPanel = new JPanel();
|
||||
JPanel middlePanel = new JPanel();
|
||||
JPanel rightPanel = new JPanel();
|
||||
BorderLayout borderLayout1 = new BorderLayout();
|
||||
JPanel propertyPanel = new JPanel();
|
||||
JLabel hostLabel = new JLabel();
|
||||
JLabel urlRestrictionFrame = new JLabel();
|
||||
JTextField startURL = new JTextField();
|
||||
JTextField restrictTo = new JTextField();
|
||||
JButton startButton = new JButton();
|
||||
GridLayout gridLayout1 = new GridLayout();
|
||||
JProgressBar urlQueuedProgress = new JProgressBar(0,100);
|
||||
JLabel urlQueuedLabel = new JLabel();
|
||||
JLabel scopeFilteredLabel = new JLabel();
|
||||
JProgressBar scopeFilteredProgress = new JProgressBar(0,100);
|
||||
JLabel visitedFilteredLabel = new JLabel();
|
||||
JProgressBar visitedFilteredProgress = new JProgressBar(0,100);
|
||||
JLabel workingThreadsLabel = new JLabel();
|
||||
JProgressBar workingThreadsProgress = new JProgressBar(0,100);
|
||||
JLabel idleThreadsLabel = new JLabel();
|
||||
JProgressBar idleThreadsProgress = new JProgressBar(0,100);
|
||||
JLabel busyThreadsLabel = new JLabel();
|
||||
JProgressBar busyThreadsProgress = new JProgressBar(0,100);
|
||||
JLabel requestQueueLabel = new JLabel();
|
||||
JProgressBar requestQueueProgress = new JProgressBar();
|
||||
JLabel stalledThreadsLabel = new JLabel();
|
||||
JProgressBar stalledThreadsProgress = new JProgressBar();
|
||||
JLabel dnsLabel = new JLabel();
|
||||
JProgressBar dnsProgress = new JProgressBar(0,100);
|
||||
JLabel freeMemLabel = new JLabel();
|
||||
JLabel freeMemText = new JLabel();
|
||||
JLabel totalMemLabel = new JLabel();
|
||||
JLabel totalMemText = new JLabel();
|
||||
JLabel bpsLabel = new JLabel();
|
||||
JLabel bpsText = new JLabel();
|
||||
JLabel docsLabel = new JLabel();
|
||||
JLabel docsText = new JLabel();
|
||||
JLabel docsReadLabel = new JLabel();
|
||||
JLabel docsReadText = new JLabel();
|
||||
JProgressBar urlsCaughtProgress = new JProgressBar(0,100);
|
||||
JLabel urlsCaughtText = new JLabel();
|
||||
JLabel robotsTxtsText = new JLabel();
|
||||
JProgressBar robotsTxtsProgress = new JProgressBar(0,100);
|
||||
|
||||
public FetcherSummaryFrame()
|
||||
{
|
||||
try
|
||||
{
|
||||
jbInit();
|
||||
this.setTitle("LARM - LANLab Retrieval Machine");
|
||||
this.setSize(new Dimension(640,350));
|
||||
this.urlQueuedProgress.setStringPainted(true);
|
||||
this.urlQueuedProgress.setString("0");
|
||||
this.scopeFilteredProgress.setStringPainted(true);
|
||||
this.scopeFilteredProgress.setString("0");
|
||||
this.visitedFilteredProgress.setStringPainted(true);
|
||||
this.visitedFilteredProgress.setString("0");
|
||||
workingThreadsProgress.setStringPainted(true);
|
||||
workingThreadsProgress.setString("0");
|
||||
idleThreadsProgress.setStringPainted(true);
|
||||
idleThreadsProgress.setString("0");
|
||||
busyThreadsProgress.setStringPainted(true);
|
||||
busyThreadsProgress.setString("0");
|
||||
stalledThreadsProgress.setStringPainted(true);
|
||||
stalledThreadsProgress.setString("0");
|
||||
requestQueueProgress.setStringPainted(true);
|
||||
requestQueueProgress.setString("0");
|
||||
dnsProgress.setStringPainted(true);
|
||||
dnsProgress.setString("0");
|
||||
urlsCaughtProgress.setStringPainted(true);
|
||||
urlsCaughtProgress.setString("0");
|
||||
robotsTxtsProgress.setStringPainted(true);
|
||||
robotsTxtsProgress.setString("0");
|
||||
}
|
||||
catch(Exception e)
|
||||
{
|
||||
e.printStackTrace();
|
||||
}
|
||||
}
|
||||
|
||||
private void jbInit() throws Exception
|
||||
{
|
||||
this.getContentPane().setLayout(borderLayout1);
|
||||
propertyPanel.setMinimumSize(new Dimension(10, 70));
|
||||
propertyPanel.setPreferredSize(new Dimension(10, 80));
|
||||
propertyPanel.setLayout(null);
|
||||
hostLabel.setText("Start page");
|
||||
hostLabel.setBounds(new Rectangle(18, 15, 76, 17));
|
||||
urlRestrictionFrame.setText("URL restriction (regular expression)");
|
||||
urlRestrictionFrame.setBounds(new Rectangle(18, 37, 208, 17));
|
||||
startURL.setBounds(new Rectangle(224, 14, 281, 21));
|
||||
restrictTo.setBounds(new Rectangle(224, 38, 281, 21));
|
||||
startButton.setActionCommand("start");
|
||||
startButton.setText("Start");
|
||||
startButton.setBounds(new Rectangle(528, 14, 79, 47));
|
||||
lowerPanel.setLayout(gridLayout1);
|
||||
urlQueuedLabel.setToolTipText("");
|
||||
urlQueuedLabel.setText("URLs queued");
|
||||
scopeFilteredLabel.setToolTipText("");
|
||||
scopeFilteredLabel.setText("Scope filtered");
|
||||
visitedFilteredLabel.setText("Visited filtered");
|
||||
workingThreadsLabel.setText("Number of Working Threads");
|
||||
idleThreadsLabel.setText("Idle Threads");
|
||||
busyThreadsLabel.setText("Busy Threads");
|
||||
requestQueueLabel.setText("requests queued");
|
||||
stalledThreadsLabel.setText("stalled Threads");
|
||||
stalledThreadsProgress.setPreferredSize(new Dimension(190, 25));
|
||||
requestQueueProgress.setPreferredSize(new Dimension(190, 25));
|
||||
busyThreadsProgress.setPreferredSize(new Dimension(190, 25));
|
||||
idleThreadsProgress.setPreferredSize(new Dimension(190, 25));
|
||||
workingThreadsProgress.setPreferredSize(new Dimension(190, 25));
|
||||
urlQueuedProgress.setPreferredSize(new Dimension(190, 25));
|
||||
scopeFilteredProgress.setPreferredSize(new Dimension(190, 25));
|
||||
visitedFilteredProgress.setPreferredSize(new Dimension(190, 25));
|
||||
dnsLabel.setText("DNS Hosts cached");
|
||||
dnsProgress.setPreferredSize(new Dimension(190, 25));
|
||||
freeMemLabel.setText("Free Mem");
|
||||
freeMemLabel.setPreferredSize(new Dimension(60, 17));
|
||||
freeMemText.setText("0");
|
||||
freeMemText.setPreferredSize(new Dimension(120, 17));
|
||||
freeMemText.setMinimumSize(new Dimension(100, 17));
|
||||
totalMemLabel.setText("total Mem");
|
||||
totalMemLabel.setPreferredSize(new Dimension(60, 17));
|
||||
totalMemText.setText("0");
|
||||
totalMemText.setPreferredSize(new Dimension(120, 17));
|
||||
totalMemText.setMinimumSize(new Dimension(100, 17));
|
||||
bpsLabel.setPreferredSize(new Dimension(60, 17));
|
||||
bpsLabel.setText("Bytes/s");
|
||||
bpsText.setMinimumSize(new Dimension(100, 17));
|
||||
bpsText.setPreferredSize(new Dimension(120, 17));
|
||||
bpsText.setText("0");
|
||||
docsLabel.setText("Docs/s");
|
||||
docsLabel.setPreferredSize(new Dimension(60, 17));
|
||||
docsText.setText("0");
|
||||
docsText.setPreferredSize(new Dimension(120, 17));
|
||||
docsText.setMinimumSize(new Dimension(100, 17));
|
||||
docsReadLabel.setText("Docs read");
|
||||
docsReadLabel.setPreferredSize(new Dimension(60, 17));
|
||||
docsReadText.setText("0");
|
||||
docsReadText.setPreferredSize(new Dimension(120, 17));
|
||||
docsReadText.setMinimumSize(new Dimension(100, 17));
|
||||
urlsCaughtProgress.setPreferredSize(new Dimension(190, 25));
|
||||
urlsCaughtText.setText("URLs caught by Robots.txt");
|
||||
robotsTxtsText.setText("Robots.txts found");
|
||||
robotsTxtsProgress.setPreferredSize(new Dimension(190, 25));
|
||||
this.getContentPane().add(lowerPanel, BorderLayout.CENTER);
|
||||
lowerPanel.add(progressPanel, null);
|
||||
progressPanel.add(urlQueuedLabel, null);
|
||||
progressPanel.add(urlQueuedProgress, null);
|
||||
progressPanel.add(scopeFilteredLabel, null);
|
||||
progressPanel.add(scopeFilteredProgress, null);
|
||||
progressPanel.add(visitedFilteredLabel, null);
|
||||
progressPanel.add(visitedFilteredProgress, null);
|
||||
progressPanel.add(dnsLabel, null);
|
||||
progressPanel.add(dnsProgress, null);
|
||||
progressPanel.add(robotsTxtsText, null);
|
||||
progressPanel.add(robotsTxtsProgress, null);
|
||||
progressPanel.add(urlsCaughtText, null);
|
||||
progressPanel.add(urlsCaughtProgress, null);
|
||||
lowerPanel.add(middlePanel, null);
|
||||
middlePanel.add(workingThreadsLabel, null);
|
||||
middlePanel.add(workingThreadsProgress, null);
|
||||
middlePanel.add(idleThreadsLabel, null);
|
||||
middlePanel.add(idleThreadsProgress, null);
|
||||
middlePanel.add(busyThreadsLabel, null);
|
||||
middlePanel.add(busyThreadsProgress, null);
|
||||
middlePanel.add(requestQueueLabel, null);
|
||||
middlePanel.add(requestQueueProgress, null);
|
||||
middlePanel.add(stalledThreadsLabel, null);
|
||||
middlePanel.add(stalledThreadsProgress, null);
|
||||
lowerPanel.add(rightPanel, null);
|
||||
rightPanel.add(docsLabel, null);
|
||||
rightPanel.add(docsText, null);
|
||||
rightPanel.add(docsReadLabel, null);
|
||||
rightPanel.add(docsReadText, null);
|
||||
rightPanel.add(bpsLabel, null);
|
||||
rightPanel.add(bpsText, null);
|
||||
rightPanel.add(totalMemLabel, null);
|
||||
rightPanel.add(totalMemText, null);
|
||||
rightPanel.add(freeMemLabel, null);
|
||||
rightPanel.add(freeMemText, null);
|
||||
this.getContentPane().add(propertyPanel, BorderLayout.NORTH);
|
||||
propertyPanel.add(urlRestrictionFrame, null);
|
||||
propertyPanel.add(restrictTo, null);
|
||||
propertyPanel.add(hostLabel, null);
|
||||
propertyPanel.add(startButton, null);
|
||||
propertyPanel.add(startURL, null);
|
||||
}
|
||||
|
||||
public void setCounterProgressBar(JProgressBar p, int value)
|
||||
{
|
||||
int oldMax = p.getMaximum();
|
||||
int oldValue = p.getValue();
|
||||
|
||||
if(value > oldMax)
|
||||
{
|
||||
p.setMaximum(Math.max(oldMax * 2, value)); // grow far enough that value is not clamped to the maximum
|
||||
}
|
||||
else if (value < oldMax / 2 && oldValue >= oldMax / 2)
|
||||
{
|
||||
p.setMaximum(oldMax / 2);
|
||||
}
|
||||
p.setValue(value);
|
||||
p.setString("" + value);
|
||||
}
|
||||
|
||||
public void setURLsQueued(int queued)
|
||||
{
|
||||
setCounterProgressBar(this.urlQueuedProgress, queued);
|
||||
}
|
||||
|
||||
public void setScopeFiltered(int filtered)
|
||||
{
|
||||
setCounterProgressBar(this.scopeFilteredProgress, filtered);
|
||||
}
|
||||
|
||||
public void setVisitedFiltered(int filtered)
|
||||
{
|
||||
setCounterProgressBar(this.visitedFilteredProgress, filtered);
|
||||
}
|
||||
|
||||
public void setWorkingThreadsCount(int threads)
|
||||
{
|
||||
setCounterProgressBar(this.workingThreadsProgress, threads);
|
||||
}
|
||||
|
||||
public void setIdleThreadsCount(int threads)
|
||||
{
|
||||
setCounterProgressBar(this.idleThreadsProgress, threads);
|
||||
}
|
||||
|
||||
public void setBusyThreadsCount(int threads)
|
||||
{
|
||||
setCounterProgressBar(this.busyThreadsProgress, threads);
|
||||
}
|
||||
|
||||
public void setRequestQueueCount(int requests)
|
||||
{
|
||||
setCounterProgressBar(this.requestQueueProgress, requests);
|
||||
}
|
||||
|
||||
public void setDNSCount(int count)
|
||||
{
|
||||
setCounterProgressBar(this.dnsProgress, count);
|
||||
}
|
||||
|
||||
public void setURLsCaughtCount(int count)
|
||||
{
|
||||
setCounterProgressBar(this.urlsCaughtProgress, count);
|
||||
}
|
||||
|
||||
public void addStartButtonListener(ActionListener a)
|
||||
{
|
||||
startButton.addActionListener(a);
|
||||
}
|
||||
|
||||
|
||||
|
||||
public String getRestrictTo()
|
||||
{
|
||||
return restrictTo.getText();
|
||||
}
|
||||
public void setRestrictTo(String restrictTo)
|
||||
{
|
||||
this.restrictTo.setText(restrictTo);
|
||||
}
|
||||
public String getStartURL()
|
||||
{
|
||||
return startURL.getText();
|
||||
}
|
||||
public void setStartURL(String startURL)
|
||||
{
|
||||
this.startURL.setText(startURL);
|
||||
}
|
||||
|
||||
public void setStalledThreads(int stalled)
|
||||
{
|
||||
setCounterProgressBar(stalledThreadsProgress, stalled); // also updates the painted string
|
||||
}
|
||||
|
||||
public void setBytesPerSecond(double bps)
|
||||
{
|
||||
bpsText.setText("" + bps);
|
||||
}
|
||||
|
||||
|
||||
public void setDocsPerSecond(double docs)
|
||||
{
|
||||
docsText.setText("" + docs);
|
||||
}
|
||||
|
||||
public void setFreeMem(long freeMem)
|
||||
{
|
||||
freeMemText.setText("" + freeMem);
|
||||
}
|
||||
|
||||
public void setTotalMem(long totalMem)
|
||||
{
|
||||
totalMemText.setText("" + totalMem);
|
||||
}
|
||||
|
||||
public void setRobotsTxtCount(int robotsTxtCount)
|
||||
{
|
||||
setCounterProgressBar(robotsTxtsProgress, robotsTxtCount);
|
||||
}
|
||||
|
||||
public void setDocsRead(int docs)
|
||||
{
|
||||
docsReadText.setText("" + docs);
|
||||
}
|
||||
|
||||
}
|
||||
|
|
@ -1,238 +0,0 @@
|
|||
/* ====================================================================
|
||||
* The Apache Software License, Version 1.1
|
||||
*
|
||||
* Copyright (c) 2001 The Apache Software Foundation. All rights
|
||||
* reserved.
|
||||
*
|
||||
* Redistribution and use in source and binary forms, with or without
|
||||
* modification, are permitted provided that the following conditions
|
||||
* are met:
|
||||
*
|
||||
* 1. Redistributions of source code must retain the above copyright
|
||||
* notice, this list of conditions and the following disclaimer.
|
||||
*
|
||||
* 2. Redistributions in binary form must reproduce the above copyright
|
||||
* notice, this list of conditions and the following disclaimer in
|
||||
* the documentation and/or other materials provided with the
|
||||
* distribution.
|
||||
*
|
||||
* 3. The end-user documentation included with the redistribution,
|
||||
* if any, must include the following acknowledgment:
|
||||
* "This product includes software developed by the
|
||||
* Apache Software Foundation (http://www.apache.org/)."
|
||||
* Alternately, this acknowledgment may appear in the software itself,
|
||||
* if and wherever such third-party acknowledgments normally appear.
|
||||
*
|
||||
* 4. The names "Apache" and "Apache Software Foundation" and
|
||||
* "Apache Lucene" must not be used to endorse or promote products
|
||||
* derived from this software without prior written permission. For
|
||||
* written permission, please contact apache@apache.org.
|
||||
*
|
||||
* 5. Products derived from this software may not be called "Apache",
|
||||
* "Apache Lucene", nor may "Apache" appear in their name, without
|
||||
* prior written permission of the Apache Software Foundation.
|
||||
*
|
||||
* THIS SOFTWARE IS PROVIDED ``AS IS'' AND ANY EXPRESSED OR IMPLIED
|
||||
* WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED WARRANTIES
|
||||
* OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE
|
||||
* DISCLAIMED. IN NO EVENT SHALL THE APACHE SOFTWARE FOUNDATION OR
|
||||
* ITS CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL,
|
||||
* SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT
|
||||
* LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF
|
||||
* USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND
|
||||
* ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY,
|
||||
* OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT
|
||||
* OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF
|
||||
* SUCH DAMAGE.
|
||||
* ====================================================================
|
||||
*
|
||||
* This software consists of voluntary contributions made by many
|
||||
* individuals on behalf of the Apache Software Foundation. For more
|
||||
* information on the Apache Software Foundation, please see
|
||||
* <http://www.apache.org/>.
|
||||
*/
|
||||
|
||||
package de.lanlab.larm.gui;
|
||||
/*
|
||||
A basic extension of the java.awt.Dialog class
|
||||
*/
|
||||
|
||||
import java.awt.*;
|
||||
import java.awt.event.*;
|
||||
|
||||
public class QuitDialog extends Dialog
|
||||
{
|
||||
public QuitDialog(Frame parent, boolean modal)
|
||||
{
|
||||
super(parent, modal);
|
||||
|
||||
//Keep a local reference to the invoking frame
|
||||
frame = parent;
|
||||
|
||||
// This code is automatically generated by Visual Cafe when you add
|
||||
// components to the visual environment. It instantiates and initializes
|
||||
// the components. To modify the code, only use code syntax that matches
|
||||
// what Visual Cafe can generate, or Visual Cafe may be unable to back
|
||||
// parse your Java file into its visual environment.
|
||||
//{{INIT_CONTROLS
|
||||
setLayout(null);
|
||||
setSize(337,135);
|
||||
setVisible(false);
|
||||
yesButton.setLabel(" Yes ");
|
||||
add(yesButton);
|
||||
yesButton.setFont(new Font("Dialog", Font.BOLD, 12));
|
||||
yesButton.setBounds(72,80,79,22);
|
||||
noButton.setLabel(" No ");
|
||||
add(noButton);
|
||||
noButton.setFont(new Font("Dialog", Font.BOLD, 12));
|
||||
noButton.setBounds(185,80,79,22);
|
||||
label1.setText("Do you want to quit LARM?");
|
||||
label1.setAlignment(java.awt.Label.CENTER);
|
||||
add(label1);
|
||||
label1.setBounds(68,33,220,23);
|
||||
setTitle("LARM - Quit");
|
||||
//}}
|
||||
|
||||
//{{REGISTER_LISTENERS
|
||||
SymWindow aSymWindow = new SymWindow();
|
||||
this.addWindowListener(aSymWindow);
|
||||
SymAction lSymAction = new SymAction();
|
||||
noButton.addActionListener(lSymAction);
|
||||
yesButton.addActionListener(lSymAction);
|
||||
//}}
|
||||
}
|
||||
|
||||
public void addNotify()
|
||||
{
|
||||
// Record the size of the window before calling the parent's addNotify.
|
||||
Dimension d = getSize();
|
||||
|
||||
super.addNotify();
|
||||
|
||||
if (fComponentsAdjusted)
|
||||
return;
|
||||
|
||||
// Adjust components according to the insets
|
||||
setSize(getInsets().left + getInsets().right + d.width, getInsets().top + getInsets().bottom + d.height);
|
||||
Component components[] = getComponents();
|
||||
for (int i = 0; i < components.length; i++)
|
||||
{
|
||||
Point p = components[i].getLocation();
|
||||
p.translate(getInsets().left, getInsets().top);
|
||||
components[i].setLocation(p);
|
||||
}
|
||||
fComponentsAdjusted = true;
|
||||
}
|
||||
|
||||
public QuitDialog(Frame parent, String title, boolean modal)
|
||||
{
|
||||
this(parent, modal);
|
||||
setTitle(title);
|
||||
}
|
||||
|
||||
/**
|
||||
* Shows or hides the component depending on the boolean flag b.
|
||||
* @param b if true, show the component; otherwise, hide the component.
|
||||
* @see java.awt.Component#isVisible
|
||||
*/
|
||||
public void setVisible(boolean b)
|
||||
{
|
||||
if(b)
|
||||
{
|
||||
Rectangle bounds = getParent().getBounds();
|
||||
Rectangle abounds = getBounds();
|
||||
|
||||
setLocation(bounds.x + (bounds.width - abounds.width)/ 2,
|
||||
bounds.y + (bounds.height - abounds.height)/2);
|
||||
Toolkit.getDefaultToolkit().beep();
|
||||
}
|
||||
super.setVisible(b);
|
||||
}
|
||||
|
||||
// Used for addNotify check.
|
||||
boolean fComponentsAdjusted = false;
|
||||
// Invoking frame
|
||||
Frame frame = null;
|
||||
|
||||
//{{DECLARE_CONTROLS
|
||||
java.awt.Button yesButton = new java.awt.Button();
|
||||
java.awt.Button noButton = new java.awt.Button();
|
||||
java.awt.Label label1 = new java.awt.Label();
|
||||
//}}
|
||||
|
||||
class SymAction implements java.awt.event.ActionListener
|
||||
{
|
||||
public void actionPerformed(java.awt.event.ActionEvent event)
|
||||
{
|
||||
Object object = event.getSource();
|
||||
if (object == yesButton)
|
||||
yesButton_ActionPerformed(event);
|
||||
else if (object == noButton)
|
||||
noButton_ActionPerformed(event);
|
||||
}
|
||||
}
|
||||
|
||||
void yesButton_ActionPerformed(java.awt.event.ActionEvent event)
|
||||
{
|
||||
// to do: code goes here.
|
||||
|
||||
yesButton_ActionPerformed_Interaction1(event);
|
||||
}
|
||||
|
||||
|
||||
void yesButton_ActionPerformed_Interaction1(java.awt.event.ActionEvent event)
|
||||
{
|
||||
try {
|
||||
frame.setVisible(false); // Hide the invoking frame
|
||||
frame.dispose(); // Free system resources
|
||||
this.dispose(); // Free system resources
|
||||
System.exit(0); // close the application
|
||||
} catch (Exception e) {
|
||||
}
|
||||
}
|
||||
|
||||
|
||||
void noButton_ActionPerformed(java.awt.event.ActionEvent event)
|
||||
{
|
||||
// to do: code goes here.
|
||||
|
||||
noButton_ActionPerformed_Interaction1(event);
|
||||
}
|
||||
|
||||
|
||||
void noButton_ActionPerformed_Interaction1(java.awt.event.ActionEvent event)
|
||||
{
|
||||
try {
|
||||
this.dispose();
|
||||
} catch (Exception e) {
|
||||
}
|
||||
}
|
||||
|
||||
|
||||
class SymWindow extends java.awt.event.WindowAdapter
|
||||
{
|
||||
public void windowClosing(java.awt.event.WindowEvent event)
|
||||
{
|
||||
Object object = event.getSource();
|
||||
if (object == QuitDialog.this)
|
||||
QuitDialog_WindowClosing(event);
|
||||
}
|
||||
}
|
||||
|
||||
void QuitDialog_WindowClosing(java.awt.event.WindowEvent event)
|
||||
{
|
||||
// to do: code goes here.
|
||||
|
||||
QuitDialog_WindowClosing_Interaction1(event);
|
||||
}
|
||||
|
||||
|
||||
void QuitDialog_WindowClosing_Interaction1(java.awt.event.WindowEvent event)
|
||||
{
|
||||
try {
|
||||
this.dispose();
|
||||
} catch (Exception e) {
|
||||
}
|
||||
}
|
||||
|
||||
}
|
|
@ -1,359 +0,0 @@
|
|||
/*
|
||||
* ====================================================================
|
||||
* The Apache Software License, Version 1.1
|
||||
*
|
||||
* Copyright (c) 2001 The Apache Software Foundation. All rights
|
||||
* reserved.
|
||||
*
|
||||
* Redistribution and use in source and binary forms, with or without
|
||||
* modification, are permitted provided that the following conditions
|
||||
* are met:
|
||||
*
|
||||
* 1. Redistributions of source code must retain the above copyright
|
||||
* notice, this list of conditions and the following disclaimer.
|
||||
*
|
||||
* 2. Redistributions in binary form must reproduce the above copyright
|
||||
* notice, this list of conditions and the following disclaimer in
|
||||
* the documentation and/or other materials provided with the
|
||||
* distribution.
|
||||
*
|
||||
* 3. The end-user documentation included with the redistribution,
|
||||
* if any, must include the following acknowledgment:
|
||||
* "This product includes software developed by the
|
||||
* Apache Software Foundation (http://www.apache.org/)."
|
||||
* Alternately, this acknowledgment may appear in the software itself,
|
||||
* if and wherever such third-party acknowledgments normally appear.
|
||||
*
|
||||
* 4. The names "Apache" and "Apache Software Foundation" and
|
||||
* "Apache Lucene" must not be used to endorse or promote products
|
||||
* derived from this software without prior written permission. For
|
||||
* written permission, please contact apache@apache.org.
|
||||
*
|
||||
* 5. Products derived from this software may not be called "Apache",
|
||||
* "Apache Lucene", nor may "Apache" appear in their name, without
|
||||
* prior written permission of the Apache Software Foundation.
|
||||
*
|
||||
* THIS SOFTWARE IS PROVIDED ``AS IS'' AND ANY EXPRESSED OR IMPLIED
|
||||
* WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED WARRANTIES
|
||||
* OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE
|
||||
* DISCLAIMED. IN NO EVENT SHALL THE APACHE SOFTWARE FOUNDATION OR
|
||||
* ITS CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL,
|
||||
* SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT
|
||||
* LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF
|
||||
* USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND
|
||||
* ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY,
|
||||
* OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT
|
||||
* OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF
|
||||
* SUCH DAMAGE.
|
||||
* ====================================================================
|
||||
*
|
||||
* This software consists of voluntary contributions made by many
|
||||
* individuals on behalf of the Apache Software Foundation. For more
|
||||
* information on the Apache Software Foundation, please see
|
||||
* <http://www.apache.org/>.
|
||||
*/
|
||||
|
||||
package de.lanlab.larm.net;
|
||||
|
||||
import java.util.HashMap;
|
||||
import java.net.*;
|
||||
import de.lanlab.larm.util.CachingQueue;
|
||||
import de.lanlab.larm.util.Queue;
|
||||
import java.util.LinkedList;
|
||||
import de.lanlab.larm.fetcher.Message;
|
||||
|
||||
/**
|
||||
* Contains information about a host. If a host fails to respond too often, it is
|
||||
* excluded from the crawl. This class is used by the HostManager.
|
||||
* TODO: there needs to be a way to re-include the host in the crawl. Perhaps
|
||||
* all hosts marked as unhealthy should be checked periodically and marked
|
||||
* healthy again, if they respond.
|
||||
*
|
||||
* @author Clemens Marschner
|
||||
* @created 16 February 2002
|
||||
* @version $Id$
|
||||
*/
|
||||
public class HostInfo
|
||||
{
|
||||
final static String[] emptyKeepOutDirectories = new String[0];
|
||||
|
||||
private int id;
|
||||
|
||||
int healthyCount = 8;
|
||||
|
||||
int locks = 2; // max. concurrent requests
|
||||
int lockObtained = 0; // for debugging
|
||||
|
||||
Object lockMonitor = new Object();
|
||||
public Object getLockMonitor()
|
||||
{
|
||||
return lockMonitor;
|
||||
}
|
||||
public void releaseLock()
|
||||
{
|
||||
synchronized(lockMonitor)
|
||||
{
|
||||
if(lockObtained > 0) // only release a lock that was actually obtained
|
||||
{
|
||||
locks++;
|
||||
lockObtained--;
|
||||
// try
|
||||
// {
|
||||
// throw new Exception();
|
||||
//
|
||||
// }
|
||||
// catch(Exception e)
|
||||
// {
|
||||
// System.out.println("HostInfo: release called at: " + e.getStackTrace()[1]);
|
||||
// }
|
||||
// System.out.println("HostInfo " + hostName + ": releaseing Lock. now " + lockObtained + " locks obtained, " + locks + " available");
|
||||
}
|
||||
// else
|
||||
// {
|
||||
// System.out.println("HostInfo: lock released although no lock acquired!?");
|
||||
// }
|
||||
}
|
||||
}
|
||||
// must be synchronized
|
||||
public void obtainLock()
|
||||
{
|
||||
locks--;
|
||||
lockObtained++;
|
||||
// try
|
||||
// {
|
||||
// throw new Exception();
|
||||
//
|
||||
// }
|
||||
// catch(Exception e)
|
||||
// {
|
||||
// System.out.println("obtain called at: " + e.getStackTrace()[1]);
|
||||
// }
|
||||
// System.out.println("HostInfo " + hostName + ": obtaining Lock. now " + lockObtained + " locks obtained, " + locks + " available");
|
||||
}
|
||||
// must be synchronized
|
||||
public boolean isBusy()
|
||||
{
|
||||
return locks<=0;
|
||||
}
|
||||
|
||||
// five strikes, and you're out
|
||||
private boolean isReachable = true;
|
||||
|
||||
private boolean robotTxtChecked = false;
|
||||
|
||||
private String[] disallows;
|
||||
|
||||
// robot exclusion
|
||||
private boolean isLoadingRobotsTxt = false;
|
||||
|
||||
private Queue queuedRequests = null;
|
||||
|
||||
// robot exclusion
|
||||
private String hostName;
|
||||
|
||||
|
||||
//LinkedList synonyms = new LinkedList();
|
||||
|
||||
/**
|
||||
* Constructor for the HostInfo object
|
||||
*
|
||||
* @param hostName the name of the host
|
||||
* @param id the numeric id of this host
|
||||
*/
|
||||
public HostInfo(String hostName, int id)
|
||||
{
|
||||
this.id = id;
|
||||
this.disallows = HostInfo.emptyKeepOutDirectories;
|
||||
this.hostName = hostName;
|
||||
}
|
||||
|
||||
|
||||
/**
|
||||
* Discards the queue of pending requests.
|
||||
*/
|
||||
public void removeQueue()
|
||||
{
|
||||
queuedRequests = null;
|
||||
}
|
||||
|
||||
|
||||
/**
|
||||
* Gets the id attribute of the HostInfo object
|
||||
*
|
||||
* @return The id value
|
||||
*/
|
||||
public int getId()
|
||||
{
|
||||
return id;
|
||||
}
|
||||
|
||||
|
||||
/**
|
||||
* Inserts a message into the queue of pending requests. No error checking is done when the queue is null
|
||||
*
|
||||
* @param message the message to enqueue
|
||||
*/
|
||||
public void insertIntoQueue(Message message)
|
||||
{
|
||||
queuedRequests.insert(message);
|
||||
}
|
||||
|
||||
|
||||
/**
|
||||
* Gets the hostName attribute of the HostInfo object
|
||||
*
|
||||
* @return The hostName value
|
||||
*/
|
||||
public String getHostName()
|
||||
{
|
||||
return hostName;
|
||||
}
|
||||
|
||||
|
||||
/**
|
||||
* Gets the queueSize. No error checking is done when the queue is null
|
||||
*
|
||||
* @return The queueSize value
|
||||
*/
|
||||
public int getQueueSize()
|
||||
{
|
||||
return queuedRequests.size();
|
||||
}
|
||||
|
||||
|
||||
/**
|
||||
* gets last entry from queue. No error checking is done when the queue is null
|
||||
*
|
||||
* @return the message removed from the queue
|
||||
*/
|
||||
public Message removeFromQueue()
|
||||
{
|
||||
return (Message) queuedRequests.remove();
|
||||
}
|
||||
|
||||
|
||||
/**
|
||||
* is this host reachable and responding?
|
||||
*
|
||||
* @return The healthy value
|
||||
*/
|
||||
public boolean isHealthy()
|
||||
{
|
||||
return (healthyCount > 0) && isReachable;
|
||||
}
|
||||
|
||||
|
||||
/**
|
||||
* signals that the host returned with a bad request of whatever type
|
||||
*/
|
||||
public void badRequest()
|
||||
{
|
||||
healthyCount--;
|
||||
System.out.println("HostInfo: " + this.hostName + ": badRequest. " + healthyCount + " left");
|
||||
}
|
||||
|
||||
|
||||
/**
|
||||
* Sets the reachable attribute of the HostInfo object
|
||||
*
|
||||
* @param reachable The new reachable value
|
||||
*/
|
||||
public void setReachable(boolean reachable)
|
||||
{
|
||||
isReachable = reachable;
|
||||
System.out.println("HostInfo: " + this.hostName + ": setting to " + (reachable ? "reachable" : "unreachable"));
|
||||
}
|
||||
|
||||
|
||||
/**
|
||||
* Gets the reachable attribute of the HostInfo object
|
||||
*
|
||||
* @return The reachable value
|
||||
*/
|
||||
public boolean isReachable()
|
||||
{
|
||||
return isReachable;
|
||||
}
|
||||
|
||||
|
||||
/**
|
||||
* Gets the robotTxtChecked attribute of the HostInfo object
|
||||
*
|
||||
* @return The robotTxtChecked value
|
||||
*/
|
||||
public boolean isRobotTxtChecked()
|
||||
{
|
||||
return robotTxtChecked;
|
||||
}
|
||||
|
||||
|
||||
/**
|
||||
* must be synchronized externally
|
||||
*
|
||||
* @return The loadingRobotsTxt value
|
||||
*/
|
||||
public boolean isLoadingRobotsTxt()
|
||||
{
|
||||
return this.isLoadingRobotsTxt;
|
||||
}
|
||||
|
||||
|
||||
/**
|
||||
* Sets the loadingRobotsTxt attribute of the HostInfo object
|
||||
*
|
||||
* @param isLoading The new loadingRobotsTxt value
|
||||
*/
|
||||
public void setLoadingRobotsTxt(boolean isLoading)
|
||||
{
|
||||
this.isLoadingRobotsTxt = isLoading;
|
||||
if (isLoading)
|
||||
{
|
||||
// FIXME: move '100' to properties
|
||||
this.queuedRequests = new CachingQueue("HostInfo_" + id + "_QueuedRequests", 100);
|
||||
}
|
||||
|
||||
}
|
||||
|
||||
|
||||
/**
|
||||
* Sets the robotsChecked attribute of the HostInfo object
|
||||
*
|
||||
* @param isChecked The new robotsChecked value
|
||||
* @param disallows the disallowed path prefixes; null is treated as an empty list
|
||||
*/
|
||||
public void setRobotsChecked(boolean isChecked, String[] disallows)
|
||||
{
|
||||
this.robotTxtChecked = isChecked;
|
||||
if (disallows != null)
|
||||
{
|
||||
this.disallows = disallows;
|
||||
}
|
||||
else
|
||||
{
|
||||
this.disallows = emptyKeepOutDirectories;
|
||||
}
|
||||
|
||||
}
|
||||
|
||||
|
||||
/**
|
||||
* Checks whether a path is allowed, i.e. does not start with any disallowed prefix
|
||||
*
|
||||
* @param path the path to check
|
||||
* @return The allowed value
|
||||
*/
|
||||
public synchronized boolean isAllowed(String path)
|
||||
{
|
||||
// assume keepOutDirectories is pretty short
|
||||
// assert disallows != null
|
||||
int length = disallows.length;
|
||||
for (int i = 0; i < length; i++)
|
||||
{
|
||||
if (path.startsWith(disallows[i]))
|
||||
{
|
||||
return false;
|
||||
}
|
||||
}
|
||||
return true;
|
||||
}
|
||||
}
|
|
@ -1,188 +0,0 @@
|
|||
/* ====================================================================
|
||||
* The Apache Software License, Version 1.1
|
||||
*
|
||||
* Copyright (c) 2001 The Apache Software Foundation. All rights
|
||||
* reserved.
|
||||
*
|
||||
* Redistribution and use in source and binary forms, with or without
|
||||
* modification, are permitted provided that the following conditions
|
||||
* are met:
|
||||
*
|
||||
* 1. Redistributions of source code must retain the above copyright
|
||||
* notice, this list of conditions and the following disclaimer.
|
||||
*
|
||||
* 2. Redistributions in binary form must reproduce the above copyright
|
||||
* notice, this list of conditions and the following disclaimer in
|
||||
* the documentation and/or other materials provided with the
|
||||
* distribution.
|
||||
*
|
||||
* 3. The end-user documentation included with the redistribution,
|
||||
* if any, must include the following acknowledgment:
|
||||
* "This product includes software developed by the
|
||||
* Apache Software Foundation (http://www.apache.org/)."
|
||||
* Alternately, this acknowledgment may appear in the software itself,
|
||||
* if and wherever such third-party acknowledgments normally appear.
|
||||
*
|
||||
* 4. The names "Apache" and "Apache Software Foundation" and
|
||||
* "Apache Lucene" must not be used to endorse or promote products
|
||||
* derived from this software without prior written permission. For
|
||||
* written permission, please contact apache@apache.org.
|
||||
*
|
||||
* 5. Products derived from this software may not be called "Apache",
|
||||
* "Apache Lucene", nor may "Apache" appear in their name, without
|
||||
* prior written permission of the Apache Software Foundation.
|
||||
*
|
||||
* THIS SOFTWARE IS PROVIDED ``AS IS'' AND ANY EXPRESSED OR IMPLIED
|
||||
* WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED WARRANTIES
|
||||
* OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE
|
||||
* DISCLAIMED. IN NO EVENT SHALL THE APACHE SOFTWARE FOUNDATION OR
|
||||
* ITS CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL,
|
||||
* SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT
|
||||
* LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF
|
||||
* USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND
|
||||
* ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY,
|
||||
* OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT
|
||||
* OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF
|
||||
* SUCH DAMAGE.
|
||||
* ====================================================================
|
||||
*
|
||||
* This software consists of voluntary contributions made by many
|
||||
* individuals on behalf of the Apache Software Foundation. For more
|
||||
* information on the Apache Software Foundation, please see
|
||||
* <http://www.apache.org/>.
|
||||
*/
|
||||
|
||||
package de.lanlab.larm.net;
|
||||
|
||||
import java.util.HashMap;
|
||||
import java.util.*;
|
||||
import org.apache.oro.text.perl.*;
|
||||
import org.apache.oro.text.regex.*;
|
||||
import org.apache.oro.text.*;
|
||||
import org.apache.oro.util.*;
|
||||
|
||||
/**
|
||||
* Description of the Class
|
||||
*
|
||||
* @author Administrator
|
||||
* @created 16. Februar 2002
|
||||
* @version $Id$
|
||||
*/
|
||||
public class HostManager
|
||||
{
|
||||
HashMap hosts;
|
||||
static int hostCount = 0;
|
||||
HostResolver resolver;
|
||||
|
||||
|
||||
|
||||
// ArrayList rewriteRules = new ArrayList();
|
||||
|
||||
/**
|
||||
* Constructor for the HostManager object
|
||||
*
|
||||
* @param initialCapacity Initial capacity of the host map
|
||||
*/
|
||||
public HostManager(int initialCapacity)
|
||||
{
|
||||
hosts = new HashMap(initialCapacity);
|
||||
}
|
||||
|
||||
public void setHostResolver(HostResolver resolver)
|
||||
{
|
||||
this.resolver = resolver;
|
||||
}
|
||||
|
||||
/**
|
||||
* returns the hostResolver
|
||||
* @return the host resolver, or null if none has been set
|
||||
*/
|
||||
public HostResolver getHostResolver()
|
||||
{
|
||||
return this.resolver;
|
||||
}
|
||||
|
||||
|
||||
/**
|
||||
* Description of the Method
|
||||
*
|
||||
* @param hostName Description of the Parameter
|
||||
* @return Description of the Return Value
|
||||
*/
|
||||
public HostInfo put(String hostName)
|
||||
{
|
||||
if(resolver != null)
|
||||
{
|
||||
return putResolved(hostName, resolver.resolveHost(hostName));
|
||||
}
|
||||
else
|
||||
{
|
||||
return putResolved(hostName, hostName);
|
||||
}
|
||||
}
|
||||
|
||||
|
||||
public HostInfo putResolved(String hostName, String resolvedHostName)
|
||||
{
|
||||
if (!hosts.containsKey(resolvedHostName))
|
||||
{
|
||||
int hostID;
|
||||
synchronized (this)
|
||||
{
|
||||
hostID = hostCount++;
|
||||
}
|
||||
HostInfo hi = new HostInfo(hostName,hostID);
|
||||
hosts.put(resolvedHostName, hi);
|
||||
//System.out.println("hostManager: + " + hostName);
|
||||
// if(!hostName.equals(hostName.toLowerCase()))
|
||||
// {
|
||||
// try
|
||||
// {
|
||||
// throw new Exception();
|
||||
// }
|
||||
// catch(Exception e)
|
||||
// {
|
||||
// e.printStackTrace();
|
||||
// }
|
||||
// }
|
||||
return hi;
|
||||
}
|
||||
return (HostInfo)hosts.get(resolvedHostName);
|
||||
}
|
||||
|
||||
|
||||
public HostInfo getHostInfo(String hostName)
|
||||
{
|
||||
return getHostInfoNormalized(hostName, resolver.resolveHost(hostName));
|
||||
}
|
||||
|
||||
/**
|
||||
* Gets the HostInfo stored under the normalized host name, creating it if absent
|
||||
*
|
||||
* @param hostName Description of the Parameter
|
||||
* @return The HostInfo for the host
|
||||
*/
|
||||
public HostInfo getHostInfoNormalized(String hostName, String normalizedHostName)
|
||||
{
|
||||
HostInfo hi = (HostInfo)hosts.get(normalizedHostName);
|
||||
if(hi == null)
|
||||
{
|
||||
// System.out.println("new host: " + normalizedHostName);
|
||||
return putResolved(hostName, normalizedHostName);
|
||||
}
|
||||
return hi;
|
||||
}
|
||||
|
||||
public int getSize()
|
||||
{
|
||||
return hosts.size();
|
||||
}
|
||||
|
||||
public HostInfo addSynonym(String hostName, String synonym)
|
||||
{
|
||||
resolver.addSynonym(hostName, synonym);
|
||||
return getHostInfo(hostName);
|
||||
}
|
||||
|
||||
|
||||
}
|
|
@ -1,273 +0,0 @@
|
|||
package de.lanlab.larm.net;
|
||||
|
||||
import java.util.*;
|
||||
import xxl.collections.*;
|
||||
import java.io.*;
|
||||
import org.apache.commons.beanutils.*;
|
||||
import java.lang.reflect.*;
|
||||
import org.apache.commons.logging.*;
|
||||
|
||||
/* ====================================================================
|
||||
* The Apache Software License, Version 1.1
|
||||
*
|
||||
* Copyright (c) 2001 The Apache Software Foundation. All rights
|
||||
* reserved.
|
||||
*
|
||||
* Redistribution and use in source and binary forms, with or without
|
||||
* modification, are permitted provided that the following conditions
|
||||
* are met:
|
||||
*
|
||||
* 1. Redistributions of source code must retain the above copyright
|
||||
* notice, this list of conditions and the following disclaimer.
|
||||
*
|
||||
* 2. Redistributions in binary form must reproduce the above copyright
|
||||
* notice, this list of conditions and the following disclaimer in
|
||||
* the documentation and/or other materials provided with the
|
||||
* distribution.
|
||||
*
|
||||
* 3. The end-user documentation included with the redistribution,
|
||||
* if any, must include the following acknowledgment:
|
||||
* "This product includes software developed by the
|
||||
* Apache Software Foundation (http://www.apache.org/)."
|
||||
* Alternately, this acknowledgment may appear in the software itself,
|
||||
* if and wherever such third-party acknowledgments normally appear.
|
||||
*
|
||||
* 4. The names "Apache" and "Apache Software Foundation" and
|
||||
* "Apache Lucene" must not be used to endorse or promote products
|
||||
* derived from this software without prior written permission. For
|
||||
* written permission, please contact apache@apache.org.
|
||||
*
|
||||
* 5. Products derived from this software may not be called "Apache",
|
||||
* "Apache Lucene", nor may "Apache" appear in their name, without
|
||||
* prior written permission of the Apache Software Foundation.
|
||||
*
|
||||
* THIS SOFTWARE IS PROVIDED ``AS IS'' AND ANY EXPRESSED OR IMPLIED
|
||||
* WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED WARRANTIES
|
||||
* OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE
|
||||
* DISCLAIMED. IN NO EVENT SHALL THE APACHE SOFTWARE FOUNDATION OR
|
||||
* ITS CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL,
|
||||
* SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT
|
||||
* LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF
|
||||
* USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND
|
||||
* ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY,
|
||||
* OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT
|
||||
* OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF
|
||||
* SUCH DAMAGE.
|
||||
* ====================================================================
|
||||
*
|
||||
* This software consists of voluntary contributions made by many
|
||||
* individuals on behalf of the Apache Software Foundation. For more
|
||||
* information on the Apache Software Foundation, please see
|
||||
* <http://www.apache.org/>.
|
||||
*/
|
||||
|
||||
|
||||
|
||||
|
||||
//class LRUCache
|
||||
//{
|
||||
// HashMap cache = null;
|
||||
// LinkedList order = null;
|
||||
// int max;
|
||||
//
|
||||
// public LRUCache(int max)
|
||||
// {
|
||||
//
|
||||
// this.max = max;
|
||||
// cache = new HashMap((int)(max/0.6));
|
||||
// order = new LinkedList();
|
||||
// }
|
||||
//
|
||||
// public Object get(Object key)
|
||||
// {
|
||||
// return cache.get(key);
|
||||
// }
|
||||
//
|
||||
//
|
||||
//
|
||||
// public void put(Object key, Object value)
|
||||
// {
|
||||
// if(!cache.containsKey(key))
|
||||
// {
|
||||
// if(order.size() > max)
|
||||
// {
|
||||
// cache.remove(order.removeLast());
|
||||
// }
|
||||
// }
|
||||
// else
|
||||
// {
|
||||
// //assert order.contains(key);
|
||||
// order.remove(key);
|
||||
// // quite expensive, probably need a hashed list
|
||||
// // or something even simpler
|
||||
// }
|
||||
// order.addFirst(key);
|
||||
// cache.put(key, value);
|
||||
// }
|
||||
//}
|
||||
|
||||
/**
|
||||
* Uses {@link #resolveHost(String)}, which transforms a host name according to a set of rules.
|
||||
* The rules are applied in this order:
|
||||
* <ul>
|
||||
* <li>if host starts with (startsWith), replace this part with (replacement)
|
||||
* <li>if host ends with (endsWith), replace it with (replacement)
|
||||
* <li>if host is (synonym), replace it with (replacement)
|
||||
* </ul>
|
||||
* the resolver can be configured through a property file, which is loaded by an
|
||||
* Apache BeanUtils property loader.<p>
|
||||
* Actually the resolver doesn't do any network calls, so this class can be used
|
||||
* with any string, if you really need to
|
||||
* @author Clemens Marschner
|
||||
* @version 1.0
|
||||
*/
|
||||
public class HostResolver
|
||||
{
|
||||
|
||||
HashMap synonym;
|
||||
public HostResolver()
|
||||
{
|
||||
synonym = new HashMap();
|
||||
}
|
||||
|
||||
/**
|
||||
* convenience method that loads the config from a properties file
|
||||
* @param fileName a property file
|
||||
* @throws IOException thrown if fileName is wrong or something went wrong while reading
|
||||
* @throws InvocationTargetException thrown by BeanUtils.populate()
|
||||
* @throws IllegalAccessException thrown by BeanUtils.populate()
|
||||
*/
|
||||
public void initFromFile(String fileName) throws IOException, InvocationTargetException, IllegalAccessException
|
||||
{
|
||||
InputStream in = new FileInputStream(fileName);
|
||||
Properties p = new Properties();
|
||||
p.load(in);
|
||||
in.close();
|
||||
initFromProperties(p);
|
||||
}
|
||||
|
||||
/**
|
||||
* populates the synonym, startsWith and endsWith properties with a BeanUtils.populate()
|
||||
* @param props
|
||||
* @throws InvocationTargetException
|
||||
* @throws IllegalAccessException
|
||||
*/
|
||||
public void initFromProperties(Properties props) throws InvocationTargetException, IllegalAccessException
|
||||
{
|
||||
BeanUtils.populate(this, props);
|
||||
}
|
||||
|
||||
ArrayList startsWithArray = new ArrayList();
|
||||
int startsWithSize = 0;
|
||||
ArrayList endsWithArray = new ArrayList();
|
||||
int endsWithSize = 0;
|
||||
|
||||
public String getStartsWith(String name) throws IllegalAccessException
|
||||
{
|
||||
throw new IllegalAccessException("brrffz");
|
||||
}
|
||||
|
||||
public void setStartsWith(String name, String rep)
|
||||
{
|
||||
addHostStartsWithReplace(name.replace(',','.'), rep.replace(',','.'));
|
||||
}
|
||||
public String getEndsWith(String name) throws IllegalAccessException
|
||||
{
|
||||
throw new IllegalAccessException("brrffz");
|
||||
}
|
||||
public void setEndsWith(String name, String rep)
|
||||
{
|
||||
this.addHostEndsWithReplace(name.replace(',','.'), rep.replace(',','.'));
|
||||
}
|
||||
|
||||
public void setSynonym(String name, String syn)
|
||||
{
|
||||
addSynonym(name.replace(',','.'), syn.replace(',','.'));
|
||||
}
|
||||
public String getSynonym(String name) throws IllegalAccessException
|
||||
{
|
||||
throw new IllegalAccessException("brrffz");
|
||||
}
|
||||
public void addSynonym(String name, String syn)
|
||||
{
|
||||
System.out.println("adding synonym " + name + " -> " + syn);
|
||||
synonym.put(name, syn);
|
||||
}
|
||||
|
||||
/**
|
||||
* transforms a host name if a rule is found
|
||||
* @param hostName
|
||||
* @return the possibly transformed host name, or null if hostName is null
|
||||
*/
|
||||
public String resolveHost(String hostName)
|
||||
{
|
||||
if(hostName == null)
|
||||
{
|
||||
return null;
|
||||
}
|
||||
for(int i=0; i<startsWithSize; i++)
|
||||
{
|
||||
String[] test = (String[])startsWithArray.get(i);
|
||||
if(hostName.startsWith(test[0]))
|
||||
{
|
||||
hostName = test[1] + hostName.substring(test[0].length());
|
||||
break;
|
||||
}
|
||||
}
|
||||
for(int i=0; i<endsWithSize; i++)
|
||||
{
|
||||
String[] test = (String[])endsWithArray.get(i);
|
||||
if(hostName.endsWith(test[0]))
|
||||
{
|
||||
hostName = hostName.substring(0, hostName.length() - test[0].length()) + test[1];
|
||||
break;
|
||||
}
|
||||
}
|
||||
String syn = (String)synonym.get(hostName);
|
||||
return syn != null ? syn : hostName;
|
||||
}
|
||||
|
||||
public void addHostStartsWithReplace(String startsWith, String replace)
|
||||
{
|
||||
System.out.println("adding sw replace " + startsWith + " -> " + replace);
|
||||
startsWithArray.add(new String[] { startsWith, replace });
|
||||
startsWithSize++;
|
||||
}
|
||||
|
||||
public void addHostEndsWithReplace(String endsWith, String replace)
|
||||
{
|
||||
System.out.println("adding ew replace " + endsWith + " -> " + replace);
|
||||
endsWithArray.add(new String[] { endsWith, replace });
|
||||
endsWithSize++;
|
||||
}
|
||||
|
||||
// /** The pattern cache to compile and store patterns */
|
||||
// private PatternCache __patternCache;
|
||||
// /** The hashtable to cache higher-level expressions */
|
||||
// private Cache __expressionCache;
|
||||
// /** The pattern matcher to perform matching operations. */
|
||||
// private Perl5Matcher __matcher = new Perl5Matcher();
|
||||
//
|
||||
// public void addReplaceRegEx(String findRegEx, String replaceRegEx, boolean greedy)
|
||||
// {
|
||||
// int compileOptions = Perl5Compiler.CASE_INSENSITIVE_MASK;
|
||||
// int numSubstitutions = 1;
|
||||
// if(greedy)
|
||||
// {
|
||||
// numSubstitutions = Util.SUBSTITUTE_ALL;
|
||||
// }
|
||||
//
|
||||
// Pattern compiledPattern = __patternCache.getPattern(findRegEx, compileOptions);
|
||||
// Perl5Substitution substitution = new Perl5Substitution(replaceRegEx, numInterpolations);
|
||||
// ParsedSubstitutionEntry entry = new ParsedSubstitutionEntry(compiledPattern, substitution, numSubstitutions);
|
||||
// __expressionCache.addElement(expression, entry);
|
||||
//
|
||||
// result = Util.substitute(__matcher, compiledPattern, substitution,
|
||||
// input, numSubstitutions);
|
||||
//
|
||||
// __lastMatch = __matcher.getMatch();
|
||||
//
|
||||
// return result;
|
||||
// }
|
||||
|
||||
}
|
|
@ -1,190 +0,0 @@
|
|||
/* ====================================================================
|
||||
* The Apache Software License, Version 1.1
|
||||
*
|
||||
* Copyright (c) 2001 The Apache Software Foundation. All rights
|
||||
* reserved.
|
||||
*
|
||||
* Redistribution and use in source and binary forms, with or without
|
||||
* modification, are permitted provided that the following conditions
|
||||
* are met:
|
||||
*
|
||||
* 1. Redistributions of source code must retain the above copyright
|
||||
* notice, this list of conditions and the following disclaimer.
|
||||
*
|
||||
* 2. Redistributions in binary form must reproduce the above copyright
|
||||
* notice, this list of conditions and the following disclaimer in
|
||||
* the documentation and/or other materials provided with the
|
||||
* distribution.
|
||||
*
|
||||
* 3. The end-user documentation included with the redistribution,
|
||||
* if any, must include the following acknowledgment:
|
||||
* "This product includes software developed by the
|
||||
* Apache Software Foundation (http://www.apache.org/)."
|
||||
* Alternately, this acknowledgment may appear in the software itself,
|
||||
* if and wherever such third-party acknowledgments normally appear.
|
||||
*
|
||||
* 4. The names "Apache" and "Apache Software Foundation" and
|
||||
* "Apache Lucene" must not be used to endorse or promote products
|
||||
* derived from this software without prior written permission. For
|
||||
* written permission, please contact apache@apache.org.
|
||||
*
|
||||
* 5. Products derived from this software may not be called "Apache",
|
||||
* "Apache Lucene", nor may "Apache" appear in their name, without
|
||||
* prior written permission of the Apache Software Foundation.
|
||||
*
|
||||
* THIS SOFTWARE IS PROVIDED ``AS IS'' AND ANY EXPRESSED OR IMPLIED
|
||||
* WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED WARRANTIES
|
||||
* OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE
|
||||
* DISCLAIMED. IN NO EVENT SHALL THE APACHE SOFTWARE FOUNDATION OR
|
||||
* ITS CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL,
|
||||
* SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT
|
||||
* LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF
|
||||
* USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND
|
||||
* ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY,
|
||||
* OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT
|
||||
* OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF
|
||||
* SUCH DAMAGE.
|
||||
* ====================================================================
|
||||
*
|
||||
* This software consists of voluntary contributions made by many
|
||||
* individuals on behalf of the Apache Software Foundation. For more
|
||||
* information on the Apache Software Foundation, please see
|
||||
* <http://www.apache.org/>.
|
||||
*/
|
||||
|
||||
package de.lanlab.larm.net;
|
||||
|
||||
// whatever package you want
|
||||
import sun.net.www.http.HttpClient;
|
||||
import sun.net.www.MessageHeader;
|
||||
import sun.net.ProgressEntry;
|
||||
|
||||
import java.net.*;
|
||||
import java.io.*;
|
||||
|
||||
|
||||
/**
|
||||
* Description of the Class
|
||||
*
|
||||
*@author cmarschn
|
||||
*@created 2. Mai 2001
|
||||
*/
|
||||
public class HttpClientTimeout extends HttpClient {
|
||||
private int timeout = -1;
|
||||
|
||||
|
||||
/**
|
||||
* Constructor for the HttpClientTimeout object
|
||||
*
|
||||
*@param url Description of Parameter
|
||||
*@param proxy Description of Parameter
|
||||
*@param proxyPort Description of Parameter
|
||||
*@exception IOException Description of Exception
|
||||
*/
|
||||
public HttpClientTimeout(URL url, String proxy, int proxyPort) throws IOException {
|
||||
super(url, proxy, proxyPort);
|
||||
}
|
||||
|
||||
|
||||
/**
|
||||
* Constructor for the HttpClientTimeout object
|
||||
*
|
||||
*@param url Description of Parameter
|
||||
*@exception IOException Description of Exception
|
||||
*/
|
||||
public HttpClientTimeout(URL url) throws IOException {
|
||||
super(url, null, -1);
|
||||
}
|
||||
|
||||
|
||||
/**
|
||||
* Sets the Timeout attribute of the HttpClientTimeout object
|
||||
*
|
||||
*@param i The new Timeout value
|
||||
*@exception SocketException Description of Exception
|
||||
*/
|
||||
public void setTimeout(int i) throws SocketException {
|
||||
this.timeout = -1;
|
||||
serverSocket.setSoTimeout(i);
|
||||
}
|
||||
|
||||
|
||||
/**
|
||||
* Gets the Socket attribute of the HttpClientTimeout object
|
||||
*
|
||||
*@return The Socket value
|
||||
*/
|
||||
public Socket getSocket() {
|
||||
return serverSocket;
|
||||
}
|
||||
|
||||
|
||||
/**
|
||||
* Description of the Method
|
||||
*
|
||||
*@param header Description of Parameter
|
||||
*@param entry Description of Parameter
|
||||
*@return Description of the Returned Value
|
||||
*@exception java.io.IOException Description of Exception
|
||||
*/
|
||||
public boolean parseHTTP(MessageHeader header, ProgressEntry entry) throws java.io.IOException {
|
||||
if (this.timeout != -1) {
|
||||
try {
|
||||
serverSocket.setSoTimeout(this.timeout);
|
||||
}
|
||||
catch (SocketException e) {
|
||||
throw new java.io.IOException("unable to set socket timeout!");
|
||||
}
|
||||
}
|
||||
return super.parseHTTP(header, entry);
|
||||
}
|
||||
|
||||
|
||||
/**
|
||||
* Description of the Method
|
||||
*
|
||||
*@exception IOException Description of Exception
|
||||
*/
|
||||
public void close() throws IOException {
|
||||
serverSocket.close();
|
||||
}
|
||||
|
||||
|
||||
/*
|
||||
* public void SetTimeout(int i) throws SocketException {
|
||||
* serverSocket.setSoTimeout(i);
|
||||
* }
|
||||
*/
|
||||
/*
|
||||
* This class has no public constructor for HTTP. This method is used to
|
||||
* get an HttpClient to the specified URL. If there's currently an
|
||||
* active HttpClient to that server/port, you'll get that one.
|
||||
*
|
||||
* no longer synchronized -- it slows things down too much
|
||||
* synchronize at a higher level
|
||||
*/
|
||||
/**
|
||||
* Gets the New attribute of the HttpClientTimeout class
|
||||
*
|
||||
*@param url Description of Parameter
|
||||
*@return The New value
|
||||
*@exception IOException Description of Exception
|
||||
*/
|
||||
public static HttpClientTimeout getNew(URL url) throws IOException {
|
||||
/*
|
||||
* see if one's already around
|
||||
*/
|
||||
HttpClientTimeout ret = (HttpClientTimeout) kac.get(url);
|
||||
if (ret == null) {
|
||||
ret = new HttpClientTimeout(url);
|
||||
// CTOR called openServer()
|
||||
}
|
||||
else {
|
||||
ret.url = url;
|
||||
}
|
||||
// don't know if we're keeping alive until we parse the headers
|
||||
// for now, keepingAlive is false
|
||||
return ret;
|
||||
}
|
||||
}
|
||||
|
|
@ -1,104 +0,0 @@
|
|||
/* ====================================================================
|
||||
* The Apache Software License, Version 1.1
|
||||
*
|
||||
* Copyright (c) 2001 The Apache Software Foundation. All rights
|
||||
* reserved.
|
||||
*
|
||||
* Redistribution and use in source and binary forms, with or without
|
||||
* modification, are permitted provided that the following conditions
|
||||
* are met:
|
||||
*
|
||||
* 1. Redistributions of source code must retain the above copyright
|
||||
* notice, this list of conditions and the following disclaimer.
|
||||
*
|
||||
* 2. Redistributions in binary form must reproduce the above copyright
|
||||
* notice, this list of conditions and the following disclaimer in
|
||||
* the documentation and/or other materials provided with the
|
||||
* distribution.
|
||||
*
|
||||
* 3. The end-user documentation included with the redistribution,
|
||||
* if any, must include the following acknowledgment:
|
||||
* "This product includes software developed by the
|
||||
* Apache Software Foundation (http://www.apache.org/)."
|
||||
* Alternately, this acknowledgment may appear in the software itself,
|
||||
* if and wherever such third-party acknowledgments normally appear.
|
||||
*
|
||||
* 4. The names "Apache" and "Apache Software Foundation" and
|
||||
* "Apache Lucene" must not be used to endorse or promote products
|
||||
* derived from this software without prior written permission. For
|
||||
* written permission, please contact apache@apache.org.
|
||||
*
|
||||
* 5. Products derived from this software may not be called "Apache",
|
||||
* "Apache Lucene", nor may "Apache" appear in their name, without
|
||||
* prior written permission of the Apache Software Foundation.
|
||||
*
|
||||
* THIS SOFTWARE IS PROVIDED ``AS IS'' AND ANY EXPRESSED OR IMPLIED
|
||||
* WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED WARRANTIES
|
||||
* OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE
|
||||
* DISCLAIMED. IN NO EVENT SHALL THE APACHE SOFTWARE FOUNDATION OR
|
||||
* ITS CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL,
|
||||
* SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT
|
||||
* LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF
|
||||
* USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND
|
||||
* ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY,
|
||||
* OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT
|
||||
* OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF
|
||||
* SUCH DAMAGE.
|
||||
* ====================================================================
|
||||
*
|
||||
* This software consists of voluntary contributions made by many
|
||||
* individuals on behalf of the Apache Software Foundation. For more
|
||||
* information on the Apache Software Foundation, please see
|
||||
* <http://www.apache.org/>.
|
||||
*/
|
||||
|
||||
package de.lanlab.larm.net;
|
||||
|
||||
import java.net.*;
|
||||
|
||||
/**
|
||||
* Description of the Class
|
||||
*
|
||||
*@author cmarschn
|
||||
*@created 2. Mai 2001
|
||||
*/
|
||||
public class HttpTimeoutFactory implements URLStreamHandlerFactory {
|
||||
int fiTimeoutVal;
|
||||
|
||||
|
||||
/**
|
||||
* Constructor for the HttpTimeoutFactory object
|
||||
*
|
||||
*@param iT Description of Parameter
|
||||
*/
|
||||
public HttpTimeoutFactory(int iT) {
|
||||
fiTimeoutVal = iT;
|
||||
}
|
||||
|
||||
|
||||
/**
|
||||
* Description of the Method
|
||||
*
|
||||
*@param str Description of Parameter
|
||||
*@return Description of the Returned Value
|
||||
*/
|
||||
public URLStreamHandler createURLStreamHandler(String str) {
|
||||
return new HttpTimeoutHandler(fiTimeoutVal);
|
||||
}
|
||||
|
||||
static HttpTimeoutFactory instance = null;
|
||||
|
||||
/**
|
||||
* Gets the singleton instance. Only the first call creates it; in subsequent calls the iT
|
||||
* parameter is ignored.
|
||||
*/
|
||||
public static HttpTimeoutFactory getInstance(int iT)
|
||||
{
|
||||
if(instance == null)
|
||||
{
|
||||
instance = new HttpTimeoutFactory(iT);
|
||||
}
|
||||
return instance;
|
||||
}
|
||||
}
|
||||
|
|
@ -1,134 +0,0 @@
|
|||
/* ====================================================================
|
||||
* The Apache Software License, Version 1.1
|
||||
*
|
||||
* Copyright (c) 2001 The Apache Software Foundation. All rights
|
||||
* reserved.
|
||||
*
|
||||
* Redistribution and use in source and binary forms, with or without
|
||||
* modification, are permitted provided that the following conditions
|
||||
* are met:
|
||||
*
|
||||
* 1. Redistributions of source code must retain the above copyright
|
||||
* notice, this list of conditions and the following disclaimer.
|
||||
*
|
||||
* 2. Redistributions in binary form must reproduce the above copyright
|
||||
* notice, this list of conditions and the following disclaimer in
|
||||
* the documentation and/or other materials provided with the
|
||||
* distribution.
|
||||
*
|
||||
* 3. The end-user documentation included with the redistribution,
|
||||
* if any, must include the following acknowledgment:
|
||||
* "This product includes software developed by the
|
||||
* Apache Software Foundation (http://www.apache.org/)."
|
||||
* Alternately, this acknowledgment may appear in the software itself,
|
||||
* if and wherever such third-party acknowledgments normally appear.
|
||||
*
|
||||
* 4. The names "Apache" and "Apache Software Foundation" and
|
||||
* "Apache Lucene" must not be used to endorse or promote products
|
||||
* derived from this software without prior written permission. For
|
||||
* written permission, please contact apache@apache.org.
|
||||
*
|
||||
* 5. Products derived from this software may not be called "Apache",
|
||||
* "Apache Lucene", nor may "Apache" appear in their name, without
|
||||
* prior written permission of the Apache Software Foundation.
|
||||
*
|
||||
* THIS SOFTWARE IS PROVIDED ``AS IS'' AND ANY EXPRESSED OR IMPLIED
|
||||
* WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED WARRANTIES
|
||||
* OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE
|
||||
* DISCLAIMED. IN NO EVENT SHALL THE APACHE SOFTWARE FOUNDATION OR
|
||||
* ITS CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL,
|
||||
* SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT
|
||||
* LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF
|
||||
* USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND
|
||||
* ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY,
|
||||
* OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT
|
||||
* OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF
|
||||
* SUCH DAMAGE.
|
||||
* ====================================================================
|
||||
*
|
||||
* This software consists of voluntary contributions made by many
|
||||
* individuals on behalf of the Apache Software Foundation. For more
|
||||
* information on the Apache Software Foundation, please see
|
||||
* <http://www.apache.org/>.
|
||||
*/
|
||||
|
||||
package de.lanlab.larm.net;
|
||||
|
||||
import java.net.*;
|
||||
import java.io.IOException;
|
||||
|
||||
/**
|
||||
* Description of the Class
|
||||
*
|
||||
*@author cmarschn
|
||||
*@created 2. Mai 2001
|
||||
*/
|
||||
public class HttpTimeoutHandler extends sun.net.www.protocol.http.Handler {
|
||||
int timeoutVal;
|
||||
HttpURLConnectionTimeout fHUCT;
|
||||
|
||||
|
||||
/**
|
||||
* Constructor for the HttpTimeoutHandler object
|
||||
*
|
||||
*@param iT Description of Parameter
|
||||
*/
|
||||
public HttpTimeoutHandler(int iT) {
|
||||
timeoutVal = iT;
|
||||
}
|
||||
|
||||
|
||||
/**
|
||||
* Gets the Socket attribute of the HttpTimeoutHandler object
|
||||
*
|
||||
*@return The Socket value
|
||||
*/
|
||||
public Socket getSocket() {
|
||||
return fHUCT.getSocket();
|
||||
}
|
||||
|
||||
|
||||
/**
|
||||
* Description of the Method
|
||||
*
|
||||
*@exception Exception Description of Exception
|
||||
*/
|
||||
public void close() throws Exception {
|
||||
fHUCT.close();
|
||||
}
|
||||
|
||||
|
||||
/**
|
||||
* Description of the Method
|
||||
*
|
||||
*@param u Description of Parameter
|
||||
*@return Description of the Returned Value
|
||||
*@exception IOException Description of Exception
|
||||
*/
|
||||
protected java.net.URLConnection openConnection(URL u) throws IOException {
|
||||
return fHUCT = new HttpURLConnectionTimeout(u, this, timeoutVal);
|
||||
}
|
||||
|
||||
|
||||
/**
|
||||
* Gets the Proxy attribute of the HttpTimeoutHandler object
|
||||
*
|
||||
*@return The Proxy value
|
||||
*/
|
||||
String getProxy() {
|
||||
return proxy;
|
||||
// breaking encapsulation
|
||||
}
|
||||
|
||||
|
||||
/**
|
||||
* Gets the ProxyPort attribute of the HttpTimeoutHandler object
|
||||
*
|
||||
*@return The ProxyPort value
|
||||
*/
|
||||
int getProxyPort() {
|
||||
return proxyPort;
|
||||
// breaking encapsulation
|
||||
}
|
||||
}
|
||||
|
|
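The handler above reaches into `sun.net.www` internals to impose a timeout on HTTP connections. As a minimal sketch, assuming a Java 5 or later runtime (an assumption; the deleted code predates it), a comparable effect is available through the public `setConnectTimeout` and `setReadTimeout` methods of `java.net.URLConnection`. The class name `TimeoutSketch` and the helper `openWithTimeout` are hypothetical and not part of LARM.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.net.HttpURLConnection;
import java.net.URI;

public class TimeoutSketch {

    /**
     * Creates (but does not connect) an HTTP connection with both the
     * connect and the read timeout set to the same value.
     * Hypothetical helper, not part of the original LARM code.
     */
    static HttpURLConnection openWithTimeout(String address, int timeoutMillis) {
        try {
            // openConnection() only builds the connection object; no network I/O happens here.
            HttpURLConnection conn =
                    (HttpURLConnection) URI.create(address).toURL().openConnection();
            conn.setConnectTimeout(timeoutMillis); // limit for establishing the connection
            conn.setReadTimeout(timeoutMillis);    // limit for each blocking read
            return conn;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        HttpURLConnection conn = openWithTimeout("http://localhost/", 5000);
        System.out.println(conn.getConnectTimeout() + " " + conn.getReadTimeout());
    }
}
```

Because the timeouts are set on the connection object itself, no custom `URLStreamHandler` subclass is needed in this variant.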
@ -1,280 +0,0 @@
|
|||
/* ====================================================================
|
||||
* The Apache Software License, Version 1.1
|
||||
*
|
||||
* Copyright (c) 2001 The Apache Software Foundation. All rights
|
||||
* reserved.
|
||||
*
|
||||
* Redistribution and use in source and binary forms, with or without
|
||||
* modification, are permitted provided that the following conditions
|
||||
* are met:
|
||||
*
|
||||
* 1. Redistributions of source code must retain the above copyright
|
||||
* notice, this list of conditions and the following disclaimer.
|
||||
*
|
||||
* 2. Redistributions in binary form must reproduce the above copyright
|
||||
* notice, this list of conditions and the following disclaimer in
|
||||
* the documentation and/or other materials provided with the
|
||||
* distribution.
|
||||
*
|
||||
* 3. The end-user documentation included with the redistribution,
|
||||
* if any, must include the following acknowledgment:
|
||||
* "This product includes software developed by the
|
||||
* Apache Software Foundation (http://www.apache.org/)."
|
||||
* Alternately, this acknowledgment may appear in the software itself,
|
||||
* if and wherever such third-party acknowledgments normally appear.
|
||||
*
|
||||
* 4. The names "Apache" and "Apache Software Foundation" and
|
||||
* "Apache Lucene" must not be used to endorse or promote products
|
||||
* derived from this software without prior written permission. For
|
||||
* written permission, please contact apache@apache.org.
|
||||
*
|
||||
* 5. Products derived from this software may not be called "Apache",
|
||||
* "Apache Lucene", nor may "Apache" appear in their name, without
|
||||
* prior written permission of the Apache Software Foundation.
|
||||
*
|
||||
* THIS SOFTWARE IS PROVIDED ``AS IS'' AND ANY EXPRESSED OR IMPLIED
|
||||
* WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED WARRANTIES
|
||||
* OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE
|
||||
* DISCLAIMED. IN NO EVENT SHALL THE APACHE SOFTWARE FOUNDATION OR
|
||||
* ITS CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL,
|
||||
* SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT
|
||||
* LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF
|
||||
* USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND
|
||||
* ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY,
|
||||
* OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT
|
||||
* OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF
|
||||
* SUCH DAMAGE.
|
||||
* ====================================================================
|
||||
*
|
||||
* This software consists of voluntary contributions made by many
|
||||
* individuals on behalf of the Apache Software Foundation. For more
|
||||
* information on the Apache Software Foundation, please see
|
||||
* <http://www.apache.org/>.
|
||||
*/
|
||||
|
||||
package de.lanlab.larm.net;
|
||||
|
||||
import java.net.*;
|
||||
import java.io.*;
|
||||
import sun.net.www.http.HttpClient;
|
||||
|
||||
/**
|
||||
* Description of the Class
|
||||
*
|
||||
*@author cmarschn
|
||||
*@created 2 May 2001
|
||||
*/
|
||||
public class HttpURLConnectionTimeout extends sun.net.www.protocol.http.HttpURLConnection {
|
||||
int fiTimeoutVal;
|
||||
HttpTimeoutHandler fHandler;
|
||||
HttpClientTimeout fClient;
|
||||
|
||||
|
||||
/**
|
||||
* Constructor for the HttpURLConnectionTimeout object
|
||||
*
|
||||
*@param u Description of Parameter
|
||||
*@param handler Description of Parameter
|
||||
*@param iTimeout Description of Parameter
|
||||
*@exception IOException Description of Exception
|
||||
*/
|
||||
public HttpURLConnectionTimeout(URL u, HttpTimeoutHandler handler, int iTimeout) throws IOException {
|
||||
super(u, handler);
|
||||
fHandler = handler;
|
||||
fiTimeoutVal = iTimeout;
|
||||
}
|
||||
|
||||
|
||||
/**
|
||||
* Constructor for the HttpURLConnectionTimeout object
|
||||
*
|
||||
*@param u Description of Parameter
|
||||
*@param host Description of Parameter
|
||||
*@param port Description of Parameter
|
||||
*@exception IOException Description of Exception
|
||||
*/
|
||||
public HttpURLConnectionTimeout(URL u, String host, int port) throws IOException {
|
||||
super(u, host, port);
|
||||
}
|
||||
|
||||
|
||||
/**
|
||||
* Description of the Method
|
||||
*
|
||||
*@exception IOException Description of Exception
|
||||
*/
|
||||
public void connect() throws IOException {
|
||||
if (connected) {
|
||||
return;
|
||||
}
|
||||
try {
|
||||
if ("http".equals(url.getProtocol())
|
||||
/*
|
||||
* && !failedOnce <- PRIVATE
|
||||
*/
|
||||
) {
|
||||
// for safety's sake, as reported by KLGroup
|
||||
synchronized (url) {
|
||||
http = HttpClientTimeout.getNew(url);
|
||||
}
|
||||
fClient = (HttpClientTimeout) http;
|
||||
((HttpClientTimeout) http).setTimeout(fiTimeoutVal);
|
||||
}
|
||||
else {
|
||||
// make sure to construct new connection if first
|
||||
// attempt failed
|
||||
http = new HttpClientTimeout(url, fHandler.getProxy(), fHandler.getProxyPort());
|
||||
}
|
||||
ps = (PrintStream) http.getOutputStream();
|
||||
}
|
||||
catch (IOException e) {
|
||||
throw e;
|
||||
}
|
||||
// this was missing from the original version
|
||||
connected = true;
|
||||
}
|
||||
|
||||
|
||||
/**
|
||||
* Create a new HttpClient object, bypassing the cache of HTTP client
|
||||
* objects/connections.
|
||||
*
|
||||
*@param url the URL being accessed
|
||||
*@return The NewClient value
|
||||
*@exception IOException Description of Exception
|
||||
*/
|
||||
protected HttpClient getNewClient(URL url)
|
||||
throws IOException {
|
||||
HttpClientTimeout client = new HttpClientTimeout(url, (String) null, -1);
|
||||
try {
|
||||
client.setTimeout(fiTimeoutVal);
|
||||
}
|
||||
catch (Exception e) {
|
||||
System.out.println("Unable to set timeout value");
|
||||
}
|
||||
return (HttpClient) client;
|
||||
}
|
||||
|
||||
|
||||
/**
|
||||
* Gets the Socket attribute of the HttpURLConnectionTimeout object
|
||||
*
|
||||
*@return The Socket value
|
||||
*/
|
||||
Socket getSocket() {
|
||||
return fClient.getSocket();
|
||||
}
|
||||
|
||||
|
||||
/**
|
||||
* Description of the Method
|
||||
*
|
||||
*@exception Exception Description of Exception
|
||||
*/
|
||||
void close() throws Exception {
|
||||
fClient.close();
|
||||
}
|
||||
|
||||
|
||||
/**
|
||||
* Opens a stream, following at most 5 redirects and only to the same protocol, host, and port.
|
||||
*
|
||||
*@param c Description of Parameter
|
||||
*@return Description of the Returned Value
|
||||
*@exception IOException Description of Exception
|
||||
*/
|
||||
public static InputStream openConnectionCheckRedirects(URLConnection c)
|
||||
throws IOException {
|
||||
boolean redir;
|
||||
int redirects = 0;
|
||||
InputStream in = null;
|
||||
|
||||
do {
|
||||
if (c instanceof HttpURLConnectionTimeout) {
|
||||
((HttpURLConnectionTimeout) c).setInstanceFollowRedirects(false);
|
||||
}
|
||||
|
||||
// We want to open the input stream before
|
||||
// getting headers, because getHeaderField()
|
||||
// et al swallow IOExceptions.
|
||||
in = c.getInputStream();
|
||||
redir = false;
|
||||
|
||||
if (c instanceof HttpURLConnectionTimeout) {
|
||||
HttpURLConnectionTimeout http = (HttpURLConnectionTimeout) c;
|
||||
int stat = http.getResponseCode();
|
||||
if (stat >= 300 && stat <= 305 &&
|
||||
stat != HttpURLConnection.HTTP_NOT_MODIFIED) {
|
||||
URL base = http.getURL();
|
||||
String loc = http.getHeaderField("Location");
|
||||
URL target = null;
|
||||
if (loc != null) {
|
||||
target = new URL(base, loc);
|
||||
}
|
||||
http.disconnect();
|
||||
if (target == null
|
||||
|| !base.getProtocol().equals(target.getProtocol())
|
||||
|| base.getPort() != target.getPort()
|
||||
|| !HostsEquals(base, target)
|
||||
|| redirects >= 5) {
|
||||
throw new SecurityException("illegal URL redirect");
|
||||
}
|
||||
redir = true;
|
||||
c = target.openConnection();
|
||||
redirects++;
|
||||
}
|
||||
}
|
||||
} while (redir);
|
||||
return in;
|
||||
}
|
||||
|
||||
|
||||
// Same as java.net.URL.hostsEqual
|
||||
|
||||
/**
|
||||
* Description of the Method
|
||||
*
|
||||
*@param u1 Description of Parameter
|
||||
*@param u2 Description of Parameter
|
||||
*@return Description of the Returned Value
|
||||
*/
|
||||
static boolean HostsEquals(URL u1, URL u2) {
|
||||
final String h1 = u1.getHost();
|
||||
final String h2 = u2.getHost();
|
||||
|
||||
if (h1 == null) {
|
||||
return h2 == null;
|
||||
}
|
||||
else if (h2 == null) {
|
||||
return false;
|
||||
}
|
||||
else if (h1.equalsIgnoreCase(h2)) {
|
||||
return true;
|
||||
}
|
||||
// Have to resolve addresses before comparing, otherwise
|
||||
// names like tachyon and tachyon.eng would compare different
|
||||
final boolean result[] = {false};
|
||||
|
||||
java.security.AccessController.doPrivileged(
|
||||
new java.security.PrivilegedAction() {
|
||||
/**
|
||||
* Resolves both host names and records whether their addresses are equal.
|
||||
*
|
||||
*@return Description of the Returned Value
|
||||
*/
|
||||
public Object run() {
|
||||
try {
|
||||
InetAddress a1 = InetAddress.getByName(h1);
|
||||
InetAddress a2 = InetAddress.getByName(h2);
|
||||
result[0] = a1.equals(a2);
|
||||
}
|
||||
catch (UnknownHostException e) {
|
||||
}
|
||||
catch (SecurityException e) {
|
||||
}
|
||||
return null;
|
||||
}
|
||||
});
|
||||
return result[0];
|
||||
}
|
||||
}
|
|
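The redirect check in `openConnectionCheckRedirects` above rejects a redirect when the target is missing, when protocol or port differ, when the hosts differ, or when five redirects have already been followed. A minimal sketch of that decision as a pure function follows. It is a simplification under stated assumptions: it compares host names case-insensitively and skips the DNS lookup that `HostsEquals` falls back to. `RedirectPolicySketch` and `isRedirectAllowed` are hypothetical names.

```java
import java.net.URI;

public class RedirectPolicySketch {

    /** Same limit as the "redirects >= 5" test in the deleted code. */
    static final int MAX_REDIRECTS = 5;

    /**
     * Returns true if a redirect from base to target may be followed.
     * Simplified: host names are compared as strings only, without
     * resolving them to addresses.
     */
    static boolean isRedirectAllowed(URI base, URI target, int redirectsSoFar) {
        if (target == null || redirectsSoFar >= MAX_REDIRECTS) {
            return false;
        }
        String scheme = base.getScheme();
        String host = base.getHost();
        return scheme != null
                && scheme.equalsIgnoreCase(target.getScheme())
                && base.getPort() == target.getPort()   // -1 when no port is given
                && host != null
                && host.equalsIgnoreCase(target.getHost());
    }

    public static void main(String[] args) {
        URI base = URI.create("http://example.com/a");
        System.out.println(isRedirectAllowed(base, URI.create("http://EXAMPLE.com/b"), 0));
        System.out.println(isRedirectAllowed(base, URI.create("https://example.com/b"), 0));
    }
}
```

Keeping the policy separate from the I/O loop makes it possible to exercise the rules without opening any connection.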
@ -1,419 +0,0 @@
|
|||
package de.lanlab.larm.net;
|
||||
/*
|
||||
* ====================================================================
|
||||
* The Apache Software License, Version 1.1
|
||||
*
|
||||
* Copyright (c) 2001 The Apache Software Foundation. All rights
|
||||
* reserved.
|
||||
*
|
||||
* Redistribution and use in source and binary forms, with or without
|
||||
* modification, are permitted provided that the following conditions
|
||||
* are met:
|
||||
*
|
||||
* 1. Redistributions of source code must retain the above copyright
|
||||
* notice, this list of conditions and the following disclaimer.
|
||||
*
|
||||
* 2. Redistributions in binary form must reproduce the above copyright
|
||||
* notice, this list of conditions and the following disclaimer in
|
||||
* the documentation and/or other materials provided with the
|
||||
* distribution.
|
||||
*
|
||||
* 3. The end-user documentation included with the redistribution,
|
||||
* if any, must include the following acknowledgment:
|
||||
* "This product includes software developed by the
|
||||
* Apache Software Foundation (http://www.apache.org/)."
|
||||
* Alternately, this acknowledgment may appear in the software itself,
|
||||
* if and wherever such third-party acknowledgments normally appear.
|
||||
*
|
||||
* 4. The names "Apache" and "Apache Software Foundation" and
|
||||
* "Apache Lucene" must not be used to endorse or promote products
|
||||
* derived from this software without prior written permission. For
|
||||
* written permission, please contact apache@apache.org.
|
||||
*
|
||||
* 5. Products derived from this software may not be called "Apache",
|
||||
* "Apache Lucene", nor may "Apache" appear in their name, without
|
||||
* prior written permission of the Apache Software Foundation.
|
||||
*
|
||||
* THIS SOFTWARE IS PROVIDED ``AS IS'' AND ANY EXPRESSED OR IMPLIED
|
||||
* WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED WARRANTIES
|
||||
* OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE
|
||||
* DISCLAIMED. IN NO EVENT SHALL THE APACHE SOFTWARE FOUNDATION OR
|
||||
* ITS CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL,
|
||||
* SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT
|
||||
* LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF
|
||||
* USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND
|
||||
* ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY,
|
||||
* OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT
|
||||
* OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF
|
||||
* SUCH DAMAGE.
|
||||
* ====================================================================
|
||||
*
|
||||
* This software consists of voluntary contributions made by many
|
||||
* individuals on behalf of the Apache Software Foundation. For more
|
||||
* information on the Apache Software Foundation, please see
|
||||
* <http://www.apache.org/>.
|
||||
*/
|
||||
import java.io.*;
|
||||
import java.net.*;
|
||||
import org.apache.oro.text.perl.*;
|
||||
|
||||
|
||||
/**
|
||||
* Description of the Class
|
||||
*
|
||||
* @author Administrator
|
||||
* @created 14 June 2002
|
||||
*/
|
||||
public class URLNormalizer
|
||||
{
|
||||
final static int NP_SLASH = 1;
|
||||
final static int NP_CHAR = 2;
|
||||
final static int NP_PERCENT = 3;
|
||||
final static int NP_POINT = 4;
|
||||
final static int NP_HEX = 5;
|
||||
|
||||
/**
|
||||
* Contains the escaped form of each character, with letters lowercased. Uses char arrays instead
|
||||
* of strings for faster processing
|
||||
*/
|
||||
protected final static char[][] charMap = {
|
||||
{'%', '0', '0'}, {'%', '0', '1'}, {'%', '0', '2'}, {'%', '0', '3'}, {'%', '0', '4'}, {'%', '0', '5'}, {'%', '0', '6'}, {'%', '0', '7'}, {'%', '0', '8'}, {'%', '0', '9'}, {'%', '0', 'A'}, {'%', '0', 'B'}, {'%', '0', 'C'}, {'%', '0', 'D'}, {'%', '0', 'E'}, {'%', '0', 'F'},
|
||||
{'%', '1', '0'}, {'%', '1', '1'}, {'%', '1', '2'}, {'%', '1', '3'}, {'%', '1', '4'}, {'%', '1', '5'}, {'%', '1', '6'}, {'%', '1', '7'}, {'%', '1', '8'}, {'%', '1', '9'}, {'%', '1', 'A'}, {'%', '1', 'B'}, {'%', '1', 'C'}, {'%', '1', 'D'}, {'%', '1', 'E'}, {'%', '1', 'F'},
|
||||
{'%', '2', '0'}, {'%', '2', '1'}, {'%', '2', '2'}, {'%', '2', '3'}, {'$'}, {'%', '2', '5'}, {'%', '2', '6'}, {'%', '2', '7'}, {'%', '2', '8'}, {'%', '2', '9'}, {'%', '2', 'A'}, {'%', '2', 'B'}, {'%', '2', 'C'}, {'-'}, {'.'}, {'%', '2', 'F'},
|
||||
{'0'}, {'1'}, {'2'}, {'3'}, {'4'}, {'5'}, {'6'}, {'7'}, {'8'}, {'9'}, {'%', '3', 'A'}, {'%', '3', 'B'}, {'%', '3', 'C'}, {'%', '3', 'D'}, {'%', '3', 'E'}, {'%', '3', 'F'},
|
||||
{'%', '4', '0'}, {'a'}, {'b'}, {'c'}, {'d'}, {'e'}, {'f'}, {'g'}, {'h'}, {'i'}, {'j'}, {'k'}, {'l'}, {'m'}, {'n'}, {'o'},
|
||||
{'p'}, {'q'}, {'r'}, {'s'}, {'t'}, {'u'}, {'v'}, {'w'}, {'x'}, {'y'}, {'z'}, {'%', '5', 'B'}, {'%', '5', 'C'}, {'%', '5', 'D'}, {'%', '5', 'E'}, {'_'},
|
||||
{'%', '6', '0'}, {'a'}, {'b'}, {'c'}, {'d'}, {'e'}, {'f'}, {'g'}, {'h'}, {'i'}, {'j'}, {'k'}, {'l'}, {'m'}, {'n'}, {'o'},
|
||||
{'p'}, {'q'}, {'r'}, {'s'}, {'t'}, {'u'}, {'v'}, {'w'}, {'x'}, {'y'}, {'z'}, {'%', '7', 'B'}, {'%', '7', 'C'}, {'%', '7', 'D'}, {'%', '7', 'E'}, {'%', '7', 'F'},
|
||||
{'%', '8', '0'}, {'%', '8', '1'}, {'%', '8', '2'}, {'%', '8', '3'}, {'%', '8', '4'}, {'%', '8', '5'}, {'%', '8', '6'}, {'%', '8', '7'}, {'%', '8', '8'}, {'%', '8', '9'}, {'%', '8', 'A'}, {'%', '8', 'B'}, {'%', '8', 'C'}, {'%', '8', 'D'}, {'%', '8', 'E'}, {'%', '8', 'F'},
|
||||
{'%', '9', '0'}, {'%', '9', '1'}, {'%', '9', '2'}, {'%', '9', '3'}, {'%', '9', '4'}, {'%', '9', '5'}, {'%', '9', '6'}, {'%', '9', '7'}, {'%', '9', '8'}, {'%', '9', '9'}, {'%', '9', 'A'}, {'%', '9', 'B'}, {'%', '9', 'C'}, {'%', '9', 'D'}, {'%', '9', 'E'}, {'%', '9', 'F'},
|
||||
{'%', 'A', '0'}, {'%', 'A', '1'}, {'%', 'A', '2'}, {'%', 'A', '3'}, {'%', 'A', '4'}, {'%', 'A', '5'}, {'%', 'A', '6'}, {'%', 'A', '7'}, {'%', 'A', '8'}, {'%', 'A', '9'}, {'%', 'A', 'A'}, {'%', 'A', 'B'}, {'%', 'A', 'C'}, {'%', 'A', 'D'}, {'%', 'A', 'E'}, {'%', 'A', 'F'},
|
||||
{'%', 'B', '0'}, {'%', 'B', '1'}, {'%', 'B', '2'}, {'%', 'B', '3'}, {'%', 'B', '4'}, {'%', 'B', '5'}, {'%', 'B', '6'}, {'%', 'B', '7'}, {'%', 'B', '8'}, {'%', 'B', '9'}, {'%', 'B', 'A'}, {'%', 'B', 'B'}, {'%', 'B', 'C'}, {'%', 'B', 'D'}, {'%', 'B', 'E'}, {'%', 'B', 'F'},
|
||||
{'%', 'E', '0'}, {'%', 'E', '1'}, {'%', 'E', '2'}, {'%', 'E', '3'}, {'%', 'E', '4'}, {'%', 'E', '5'}, {'%', 'E', '6'}, {'%', 'E', '7'}, {'%', 'E', '8'}, {'%', 'E', '9'}, {'%', 'E', 'A'}, {'%', 'E', 'B'}, {'%', 'E', 'C'}, {'%', 'E', 'D'}, {'%', 'E', 'E'}, {'%', 'E', 'F'},
|
||||
{'%', 'F', '0'}, {'%', 'F', '1'}, {'%', 'F', '2'}, {'%', 'F', '3'}, {'%', 'F', '4'}, {'%', 'F', '5'}, {'%', 'F', '6'}, {'%', 'D', '7'}, {'%', 'F', '8'}, {'%', 'F', '9'}, {'%', 'F', 'A'}, {'%', 'F', 'B'}, {'%', 'F', 'C'}, {'%', 'F', 'D'}, {'%', 'F', 'E'}, {'%', 'D', 'F'},
|
||||
{'%', 'E', '0'}, {'%', 'E', '1'}, {'%', 'E', '2'}, {'%', 'E', '3'}, {'%', 'E', '4'}, {'%', 'E', '5'}, {'%', 'E', '6'}, {'%', 'E', '7'}, {'%', 'E', '8'}, {'%', 'E', '9'}, {'%', 'E', 'A'}, {'%', 'E', 'B'}, {'%', 'E', 'C'}, {'%', 'E', 'D'}, {'%', 'E', 'E'}, {'%', 'E', 'F'},
|
||||
{'%', 'F', '0'}, {'%', 'F', '1'}, {'%', 'F', '2'}, {'%', 'F', '3'}, {'%', 'F', '4'}, {'%', 'F', '5'}, {'%', 'F', '6'}, {'%', 'F', '7'}, {'%', 'F', '8'}, {'%', 'F', '9'}, {'%', 'F', 'A'}, {'%', 'F', 'B'}, {'%', 'F', 'C'}, {'%', 'F', 'D'}, {'%', 'F', 'E'}, {'%', 'F', 'F'},
|
||||
};
|
||||
|
||||
|
||||
/**
|
||||
* Description of the Method
|
||||
*
|
||||
* @param path Description of the Parameter
|
||||
* @return Description of the Return Value
|
||||
* @exception IOException Description of the Exception
|
||||
*/
|
||||
protected static String normalizePath(String path)
|
||||
throws IOException
|
||||
{
|
||||
// rule 1: if the path is empty, return "/"
|
||||
if (path.length() == 0)
|
||||
{
|
||||
return "/";
|
||||
}
|
||||
|
||||
// Finite State Machine to convert characters to lowercase, remove "//" and "/./"
|
||||
// and make sure that all characters are escaped in a uniform way, i.e.
|
||||
// {" ", "+", "%20"} -> "%20"
|
||||
|
||||
StringBuffer w = new StringBuffer((int) (path.length() * 1.5));
|
||||
|
||||
int status = NP_CHAR;
|
||||
|
||||
int pos = 0;
|
||||
int length = path.length();
|
||||
char savedChar = '?';
|
||||
int hexChar = '?';
|
||||
int pathPos = -1; // position of last "/"
|
||||
int questionPos = -1; // assert length >0
|
||||
boolean isInQuery = false; // question mark reached?
|
||||
|
||||
while (pos < length)
|
||||
{
|
||||
char c = path.charAt(pos++);
|
||||
try
|
||||
{
|
||||
switch (status)
|
||||
{
|
||||
case NP_SLASH:
|
||||
if (c == '/')
|
||||
{
|
||||
// ignore subsequent slashes
|
||||
}
|
||||
else if (c == '.')
|
||||
{
|
||||
status = NP_POINT;
|
||||
}
|
||||
else if (c == '%')
|
||||
{
|
||||
status = NP_PERCENT;
|
||||
}
|
||||
else
|
||||
{
|
||||
pos--;
|
||||
status = NP_CHAR;
|
||||
}
|
||||
break;
|
||||
case NP_POINT:
|
||||
if (c == '/')
|
||||
{
|
||||
// ignore
|
||||
}
|
||||
else if (c == '.')
|
||||
{
|
||||
// ignore; this shouldn't happen
|
||||
}
|
||||
else
|
||||
{
|
||||
w.append('.');
|
||||
pos--;
|
||||
status = NP_SLASH;
|
||||
}
|
||||
break;
|
||||
case NP_PERCENT:
|
||||
if (c >= '0' && c <= '9')
|
||||
{
|
||||
hexChar = (c - '0') << 4;
|
||||
}
|
||||
else if (c >= 'a' && c <= 'f')
|
||||
{
|
||||
hexChar = (c - 'a' + 10) << 4;
|
||||
}
|
||||
else if (c >= 'A' && c <= 'F')
|
||||
{
|
||||
hexChar = (c - 'A' + 10) << 4;
|
||||
}
|
||||
else
|
||||
{
|
||||
w.append(charMap['%']);
|
||||
w.append(charMap[c]);
|
||||
break;
|
||||
}
|
||||
savedChar = c;
|
||||
status = NP_HEX;
|
||||
break;
|
||||
case NP_HEX:
|
||||
if (c >= '0' && c <= '9')
|
||||
{
|
||||
hexChar |= (c - '0');
|
||||
}
|
||||
else if (c >= 'a' && c <= 'f')
|
||||
{
|
||||
hexChar |= (c - 'a' + 10);
|
||||
}
|
||||
else if (c >= 'A' && c <= 'F')
|
||||
{
|
||||
hexChar |= (c - 'A' + 10);
|
||||
}
|
||||
else
|
||||
{
|
||||
w.append(charMap['%']);
|
||||
w.append(charMap[savedChar]);
|
||||
w.append(charMap[c]);
|
||||
break;
|
||||
}
|
||||
w.append(charMap[hexChar]);
|
||||
status = NP_CHAR;
|
||||
break;
|
||||
case NP_CHAR:
|
||||
switch (c)
|
||||
{
|
||||
case '%':
|
||||
status = NP_PERCENT;
|
||||
break;
|
||||
case '/':
|
||||
if(!isInQuery)
|
||||
{
|
||||
w.append(c);
|
||||
pathPos = w.length(); // points to the char. after "/"
|
||||
status = NP_SLASH;
|
||||
}
|
||||
else
|
||||
{
|
||||
w.append(charMap[c]);
|
||||
}
|
||||
break;
|
||||
case '?':
|
||||
if(!isInQuery)
|
||||
{
|
||||
if(pathPos == -1)
|
||||
{
|
||||
w.append('/');
|
||||
pathPos = w.length();
|
||||
}
|
||||
questionPos = w.length(); // points to the char at "?"
|
||||
isInQuery = true;
|
||||
}
|
||||
else
|
||||
{
|
||||
w.append(charMap[c]);
|
||||
break;
|
||||
}
|
||||
case '&':
|
||||
case ';':
|
||||
case '@':
|
||||
//case ':':
|
||||
case '=':
|
||||
w.append(c);
|
||||
break;
|
||||
case '+':
|
||||
w.append("%20");
|
||||
break;
|
||||
default:
|
||||
w.append(charMap[c]);
|
||||
break;
|
||||
}
|
||||
}
|
||||
|
||||
}
|
||||
catch (ArrayIndexOutOfBoundsException e)
|
||||
{
|
||||
// we encountered a unicode character > 0x00ff (outside the 256-entry charMap)
|
||||
// write UTF-8 to distinguish it from other characters
|
||||
// note that this does NOT lead to a pure UTF-8 URL since we
|
||||
// write 0x80 <= c <= 0xff as one-byte strings
|
||||
/*
|
||||
* if (ch <= 0x007f) { // other ASCII
|
||||
* sbuf.append(hex[ch]);
|
||||
* } else
|
||||
*/
|
||||
// note that we ignore the case that we receive "%" + unicode + c
|
||||
// (status = NP_HEX + Exception when writing savedchar); in that case
|
||||
// only the second character is written. we consider this to be very
|
||||
// unlikely
|
||||
|
||||
// see http://www.w3.org/International/O-URL-code.html
|
||||
if (c <= 0x07FF)
|
||||
{
|
||||
// non-ASCII <= 0x7FF
|
||||
w.append(charMap[0xc0 | (c >> 6)]);
|
||||
w.append(charMap[0x80 | (c & 0x3F)]);
|
||||
}
|
||||
else
|
||||
{
|
||||
// 0x7FF < c <= 0xFFFF
|
||||
w.append(charMap[0xe0 | (c >> 12)]);
|
||||
w.append(charMap[0x80 | ((c >> 6) & 0x3F)]);
|
||||
w.append(charMap[0x80 | (c & 0x3F)]);
|
||||
}
|
||||
}
|
||||
}
|
||||
|
||||
// rule 3: delete index.* or default.*
|
||||
|
||||
if(questionPos == -1) // no query
|
||||
{
|
||||
questionPos = w.length();
|
||||
}
|
||||
else
|
||||
{
|
||||
if(questionPos == w.length()-1)
|
||||
{
|
||||
// empty query. assert questionPos > 0
|
||||
w.deleteCharAt(questionPos);
|
||||
}
|
||||
}
|
||||
if(pathPos == -1) // no "/" seen
|
||||
{
|
||||
pathPos = 0;
|
||||
}
|
||||
if(questionPos > pathPos)
|
||||
{
|
||||
String file = w.substring(pathPos, questionPos);
|
||||
{
|
||||
//System.out.println("file: " + file);
|
||||
if(file.startsWith("index.") || file.startsWith("default."))
|
||||
{
|
||||
w.delete(pathPos, questionPos); // delete default page to avoid ambiguities
|
||||
}
|
||||
}
|
||||
}
|
||||
return w.toString();
|
||||
}
|
||||
|
||||
|
||||
/**
|
||||
* Description of the Method
|
||||
*
|
||||
* @param host Description of the Parameter
|
||||
* @return Description of the Return Value
|
||||
*/
|
||||
protected static String normalizeHost(HostResolver hostResolver, String host)
|
||||
{
|
||||
return hostResolver.resolveHost(host.toLowerCase());
|
||||
}
|
||||
|
||||
|
||||
|
||||
|
||||
HostResolver hostResolver;
|
||||
|
||||
|
||||
/**
|
||||
* Constructor for the URLNormalizer object
|
||||
*
|
||||
* @param hostResolver Description of the Parameter
|
||||
*/
|
||||
public URLNormalizer(HostResolver hostResolver)
|
||||
{
|
||||
this.hostResolver = hostResolver;
|
||||
}
|
||||
|
||||
public void setHostResolver(HostResolver hostResolver)
|
||||
{
|
||||
this.hostResolver = hostResolver;
|
||||
}
|
||||
|
||||
/**
|
||||
* Description of the Method
|
||||
*
|
||||
* @param u Description of the Parameter
|
||||
* @return Description of the Return Value
|
||||
* @exception IOException Description of the Exception
|
||||
* @exception MalformedURLException Description of the Exception
|
||||
*/
|
||||
public static URL normalize(URL u, HostResolver hostResolver)
|
||||
{
|
||||
if(u == null)
|
||||
{
|
||||
return null;
|
||||
}
|
||||
if (u.getProtocol().equals("http"))
|
||||
{
|
||||
try
|
||||
{
|
||||
int port = u.getPort();
|
||||
/*URL url =*/
|
||||
return new URL(u.getProtocol(), normalizeHost(hostResolver, u.getHost()), port == 80 ? -1 : port, normalizePath(u.getFile()));
|
||||
/*if(!u.equals(url))
|
||||
{
|
||||
System.out.println(u.toExternalForm() + " -> " + url.toExternalForm());
|
||||
}
|
||||
return url;*/
|
||||
}
|
||||
catch(MalformedURLException e)
|
||||
{
|
||||
System.out.println("assertion failed: MalformedURLException in URLNormalizer.normalize()");
|
||||
throw new java.lang.InternalError("assertion failed: MalformedURLException in URLNormalizer.normalize()");
|
||||
}
|
||||
catch(IOException e)
|
||||
{
|
||||
System.out.println("assertion failed: IOException in URLNormalizer.normalize()");
|
||||
throw new java.lang.InternalError("assertion failed: IOException in URLNormalizer.normalize()");
|
||||
}
|
||||
|
||||
//return url
|
||||
}
|
||||
else
|
||||
{
|
||||
return u;
|
||||
}
|
||||
}
|
||||
|
||||
|
||||
public URL normalize(URL u)
|
||||
{
|
||||
return this.normalize(u, hostResolver);
|
||||
}
|
||||
|
||||
}
|
|
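The `catch (ArrayIndexOutOfBoundsException e)` branch of `normalizePath` above escapes characters beyond the 256-entry table as UTF-8 byte sequences. A minimal sketch of that bit arithmetic follows, using the same shifts and masks. It differs from the deleted code in two labeled ways: it writes plain uppercase hex instead of going through `charMap` (which also lowercases), and it treats every character above 0x7F as multi-byte, whereas the original writes 0x80 to 0xFF as single bytes. `Utf8EscapeSketch` and its methods are hypothetical names.

```java
public class Utf8EscapeSketch {

    private static final char[] HEX = "0123456789ABCDEF".toCharArray();

    /** Appends one byte value (0..255) as "%XX". */
    private static void appendEscaped(StringBuilder out, int b) {
        out.append('%').append(HEX[(b >> 4) & 0xF]).append(HEX[b & 0xF]);
    }

    /**
     * Percent-encodes a single char as its UTF-8 byte sequence.
     * Limitation: surrogate halves are encoded as three bytes each,
     * so characters outside the Basic Multilingual Plane are not handled.
     */
    static String escapeUtf8(char c) {
        StringBuilder out = new StringBuilder(9);
        if (c <= 0x7F) {
            appendEscaped(out, c);                          // one byte: 0xxxxxxx
        } else if (c <= 0x7FF) {
            appendEscaped(out, 0xC0 | (c >> 6));            // 110xxxxx
            appendEscaped(out, 0x80 | (c & 0x3F));          // 10xxxxxx
        } else {
            appendEscaped(out, 0xE0 | (c >> 12));           // 1110xxxx
            appendEscaped(out, 0x80 | ((c >> 6) & 0x3F));   // 10xxxxxx
            appendEscaped(out, 0x80 | (c & 0x3F));          // 10xxxxxx
        }
        return out.toString();
    }

    public static void main(String[] args) {
        System.out.println(escapeUtf8('\u0151')); // two-byte case
        System.out.println(escapeUtf8('\u20AC')); // three-byte case
    }
}
```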
@ -1,264 +0,0 @@
|
|||
/*
|
||||
* ====================================================================
|
||||
* The Apache Software License, Version 1.1
|
||||
*
|
||||
* Copyright (c) 2001 The Apache Software Foundation. All rights
|
||||
* reserved.
|
||||
*
|
||||
* Redistribution and use in source and binary forms, with or without
|
||||
* modification, are permitted provided that the following conditions
|
||||
* are met:
|
||||
*
|
||||
* 1. Redistributions of source code must retain the above copyright
|
||||
* notice, this list of conditions and the following disclaimer.
|
||||
*
|
||||
* 2. Redistributions in binary form must reproduce the above copyright
|
||||
* notice, this list of conditions and the following disclaimer in
|
||||
* the documentation and/or other materials provided with the
|
||||
* distribution.
|
||||
*
|
||||
* 3. The end-user documentation included with the redistribution,
|
||||
* if any, must include the following acknowledgment:
|
||||
* "This product includes software developed by the
|
||||
* Apache Software Foundation (http://www.apache.org/)."
|
||||
* Alternately, this acknowledgment may appear in the software itself,
|
||||
* if and wherever such third-party acknowledgments normally appear.
|
||||
*
|
||||
* 4. The names "Apache" and "Apache Software Foundation" and
|
||||
* "Apache Lucene" must not be used to endorse or promote products
|
||||
* derived from this software without prior written permission. For
|
||||
* written permission, please contact apache@apache.org.
|
||||
*
|
||||
* 5. Products derived from this software may not be called "Apache",
|
||||
* "Apache Lucene", nor may "Apache" appear in their name, without
|
||||
* prior written permission of the Apache Software Foundation.
|
||||
*
|
||||
* THIS SOFTWARE IS PROVIDED ``AS IS'' AND ANY EXPRESSED OR IMPLIED
|
||||
* WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED WARRANTIES
|
||||
* OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE
|
||||
* DISCLAIMED. IN NO EVENT SHALL THE APACHE SOFTWARE FOUNDATION OR
|
||||
* ITS CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL,
|
||||
* SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT
|
||||
* LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF
|
||||
* USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND
|
||||
* ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY,
|
||||
* OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT
|
||||
* OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF
|
||||
* SUCH DAMAGE.
|
||||
* ====================================================================
|
||||
*
|
||||
* This software consists of voluntary contributions made by many
|
||||
* individuals on behalf of the Apache Software Foundation. For more
|
||||
* information on the Apache Software Foundation, please see
|
||||
* <http://www.apache.org/>.
|
||||
*/
|
||||
package de.lanlab.larm.parser;
|
||||
|
||||
import java.util.Hashtable;
|
||||
import java.io.*;
|
||||
|
||||
/**
|
||||
* A very simple entity manager. Based on HeX, the HTML enabled XML parser, by
|
||||
* Anders Kristensen, HP Labs Bristol
|
||||
*
|
||||
* @author Administrator
|
||||
* @created 1 June 2002
|
||||
*/
|
||||
public class EntityManager
|
||||
{
|
||||
/**
|
||||
* Description of the Field
|
||||
*/
|
||||
protected Hashtable entities = new Hashtable();
|
||||
|
||||
/**
|
||||
* Description of the Field
|
||||
*/
|
||||
private Tokenizer tok;
|
||||
|
||||
|
||||
/**
|
||||
* Constructor for the EntityManager object
|
||||
*
|
||||
* @param tok Description of the Parameter
|
||||
*/
|
||||
public EntityManager(Tokenizer tok)
|
||||
{
|
||||
this.tok = tok;
|
||||
entities.put("amp", "&");
|
||||
entities.put("lt", "<");
|
||||
entities.put("gt", ">");
|
||||
entities.put("apos", "'");
|
||||
entities.put("quot", "\"");
|
||||
entities.put("auml", "ä");
|
||||
entities.put("ouml", "ö");
|
||||
entities.put("uuml", "ü");
|
||||
entities.put("Auml", "Ä");
|
||||
entities.put("Ouml", "Ö");
|
||||
entities.put("Uuml", "Ü");
|
||||
entities.put("szlig", "ß");
|
||||
|
||||
}
|
||||
|
||||
|
||||
/**
|
||||
* Finds entity and character references in the provided char array and
|
||||
* decodes them. The operation is destructive, i.e. the decoded string
|
||||
* replaces the original - this is straightforward since the new string can
|
||||
* only get shorter.
|
||||
*
|
||||
* @param buffer Description of the Parameter
|
||||
* @return Description of the Return Value
|
||||
* @exception Exception Description of the Exception
|
||||
*/
|
||||
public final SimpleCharArrayWriter entityDecode(SimpleCharArrayWriter buffer)
|
||||
throws Exception
|
||||
{
|
||||
char[] buf = buffer.getCharArray();
|
||||
// avoids method calls
|
||||
int len = buffer.size();
|
||||
|
||||
// not fastest but certainly simplest:
|
||||
if (indexOf(buf, '&', 0, len) == -1)
|
||||
{
|
||||
return buffer;
|
||||
}
|
||||
SimpleCharArrayWriter newbuf = new SimpleCharArrayWriter(len);
|
||||
|
||||
for (int start = 0; ; )
|
||||
{
|
||||
int x = indexOf(buf, '&', start, len);
|
||||
if (x == -1)
|
||||
{
|
||||
newbuf.write(buf, start, len - start);
|
||||
return newbuf;
|
||||
}
|
||||
else
|
||||
{
|
||||
newbuf.write(buf, start, x - start);
|
||||
start = x + 1;
|
||||
x = indexOf(buf, ';', start, len);
|
||||
if (x == -1)
|
||||
{
|
||||
//tok.warning("Entity reference not semicolon terminated");
|
||||
newbuf.write('&');
|
||||
//break; //???????????
|
||||
}
|
||||
else
|
||||
{
|
||||
try
|
||||
{
|
||||
writeEntityDef(buf, start, x - start, newbuf);
|
||||
start = x + 1;
|
||||
}
|
||||
catch (Exception ex)
|
||||
{
|
||||
//tok.warning("Bad entity reference");
|
||||
}
|
||||
}
|
||||
}
|
||||
}
|
||||
}
|
||||
|
||||
|
||||
// character references are rare enough that we don't care about
|
||||
// creating a String object for them unnecessarily...
|
||||
/**
|
||||
* Description of the Method
|
||||
*
|
||||
* @param buf Description of the Parameter
|
||||
* @param off Description of the Parameter
|
||||
* @param len Description of the Parameter
|
||||
* @param out Description of the Parameter
|
||||
* @exception Exception Description of the Exception
|
||||
* @exception IOException Description of the Exception
|
||||
* @exception NumberFormatException Description of the Exception
|
||||
*/
|
||||
public void writeEntityDef(char[] buf, int off, int len, Writer out)
|
||||
throws Exception, IOException, NumberFormatException
|
||||
{
|
||||
Integer ch;
|
||||
//System.out.println("Entity: " + new String(buf, off, len) +" "+off+" "+len);
|
||||
|
||||
if (buf[off] == '#')
|
||||
{
|
||||
// character reference
|
||||
off++;
|
||||
len--;
|
||||
if (buf[off] == 'x' || buf[off] == 'X')
|
||||
{
|
||||
ch = Integer.valueOf(new String(buf, off + 1, len - 1), 16);
|
||||
}
|
||||
else
|
||||
{
|
||||
ch = Integer.valueOf(new String(buf, off, len));
|
||||
}
|
||||
out.write(ch.intValue());
|
||||
}
|
||||
else
|
||||
{
|
||||
String ent = new String(buf, off, len);
|
||||
String val = (String) entities.get(ent);
|
||||
if (val != null)
|
||||
{
|
||||
out.write(val);
|
||||
}
|
||||
else
|
||||
{
|
||||
out.write("&" + ent + ";");
|
||||
//tok.warning("unknown entity reference: " + ent);
|
||||
}
|
||||
}
|
||||
}
|
||||
|
||||
|
||||
/**
|
||||
* Description of the Method
|
||||
*
|
||||
* @param entity Description of the Parameter
|
||||
* @param value Description of the Parameter
|
||||
* @return Description of the Return Value
|
||||
*/
|
||||
public String defTextEntity(String entity, String value)
|
||||
{
|
||||
return (String) entities.put(entity, value);
|
||||
}
|
||||
|
||||
|
||||
/**
|
||||
* Returns the index within the buffer of the first occurrence of the
|
||||
* specified character, starting the search at <code>from</code>. This method
|
||||
* returns -1 if the character is not found.
|
||||
*
|
||||
* @param buf the buffer to search
|
||||
* @param ch the character to search for
|
||||
* @param from the index to start the search from
|
||||
* @param to the highest possible index returned plus 1
|
||||
* @return the index of the first occurrence, or -1 if the character is not found
|
||||
* @throws IndexOutOfBoundsException if index out of bounds...
|
||||
*/
|
||||
public final static int indexOf(char[] buf, int ch, int from, int to)
|
||||
{
|
||||
int i;
|
||||
for (i = from; i < to && buf[i] != ch; i++)
|
||||
{
|
||||
;
|
||||
}
|
||||
// do nothing
|
||||
if (i < to)
|
||||
{
|
||||
return i;
|
||||
}
|
||||
else
|
||||
{
|
||||
return -1;
|
||||
}
|
||||
}
|
||||
|
||||
}
|
|
@ -1,68 +0,0 @@
|
|||
/* ====================================================================
|
||||
* The Apache Software License, Version 1.1
|
||||
*
|
||||
* Copyright (c) 2001 The Apache Software Foundation. All rights
|
||||
* reserved.
|
||||
*
|
||||
* Redistribution and use in source and binary forms, with or without
|
||||
* modification, are permitted provided that the following conditions
|
||||
* are met:
|
||||
*
|
||||
* 1. Redistributions of source code must retain the above copyright
|
||||
* notice, this list of conditions and the following disclaimer.
|
||||
*
|
||||
* 2. Redistributions in binary form must reproduce the above copyright
|
||||
* notice, this list of conditions and the following disclaimer in
|
||||
* the documentation and/or other materials provided with the
|
||||
* distribution.
|
||||
*
|
||||
* 3. The end-user documentation included with the redistribution,
|
||||
* if any, must include the following acknowledgment:
|
||||
* "This product includes software developed by the
|
||||
* Apache Software Foundation (http://www.apache.org/)."
|
||||
* Alternately, this acknowledgment may appear in the software itself,
|
||||
* if and wherever such third-party acknowledgments normally appear.
|
||||
*
|
||||
* 4. The names "Apache" and "Apache Software Foundation" and
|
||||
* "Apache Lucene" must not be used to endorse or promote products
|
||||
* derived from this software without prior written permission. For
|
||||
* written permission, please contact apache@apache.org.
|
||||
*
|
||||
* 5. Products derived from this software may not be called "Apache",
|
||||
* "Apache Lucene", nor may "Apache" appear in their name, without
|
||||
* prior written permission of the Apache Software Foundation.
|
||||
*
|
||||
* THIS SOFTWARE IS PROVIDED ``AS IS'' AND ANY EXPRESSED OR IMPLIED
|
||||
* WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED WARRANTIES
|
||||
* OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE
|
||||
* DISCLAIMED. IN NO EVENT SHALL THE APACHE SOFTWARE FOUNDATION OR
|
||||
* ITS CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL,
|
||||
* SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT
|
||||
* LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF
|
||||
* USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND
|
||||
* ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY,
|
||||
* OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT
|
||||
* OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF
|
||||
* SUCH DAMAGE.
|
||||
* ====================================================================
|
||||
*
|
||||
* This software consists of voluntary contributions made by many
|
||||
* individuals on behalf of the Apache Software Foundation. For more
|
||||
* information on the Apache Software Foundation, please see
|
||||
* <http://www.apache.org/>.
|
||||
*/
|
||||
|
||||
package de.lanlab.larm.parser;
|
||||
|
||||
/**
|
||||
*
|
||||
*
|
||||
* @author Clemens Marschner
|
||||
* @version $Id$
|
||||
*/
|
||||
public interface LinkHandler
|
||||
{
|
||||
public void handleLink(String value, String anchorText, boolean isFrame);
|
||||
public void handleBase(String value);
|
||||
public void handleTitle(String value);
|
||||
}
|
|
@ -1,42 +0,0 @@
|
|||
package de.lanlab.larm.parser;
|
||||
|
||||
import java.io.CharArrayWriter;
|
||||
|
||||
/**
|
||||
* <p>Title: </p>
|
||||
* <p>Description: </p>
|
||||
* <p>Copyright: Copyright (c) 2002</p>
|
||||
* <p>Company: </p>
|
||||
* @author unascribed
|
||||
* @version 1.0
|
||||
*/
|
||||
|
||||
public final class SimpleCharArrayWriter extends java.io.CharArrayWriter {
|
||||
public SimpleCharArrayWriter() {
|
||||
super();
|
||||
}
|
||||
|
||||
public SimpleCharArrayWriter(int size) {
|
||||
super(size);
|
||||
}
|
||||
|
||||
// use only to *decrement* size
|
||||
public void setLength(int size) {
|
||||
// synchronized (lock) {
|
||||
if (size < count) count = size;
|
||||
// }
|
||||
}
|
||||
|
||||
public char[] getCharArray() {
|
||||
// synchronized (lock) {
|
||||
return buf;
|
||||
// }
|
||||
}
|
||||
|
||||
public int getLength()
|
||||
{
|
||||
return count;
|
||||
}
|
||||
|
||||
|
||||
}
|
File diff suppressed because it is too large
|
@ -1,80 +0,0 @@
|
|||
/* ====================================================================
|
||||
* The Apache Software License, Version 1.1
|
||||
*
|
||||
* Copyright (c) 2001 The Apache Software Foundation. All rights
|
||||
* reserved.
|
||||
*
|
||||
* Redistribution and use in source and binary forms, with or without
|
||||
* modification, are permitted provided that the following conditions
|
||||
* are met:
|
||||
*
|
||||
* 1. Redistributions of source code must retain the above copyright
|
||||
* notice, this list of conditions and the following disclaimer.
|
||||
*
|
||||
* 2. Redistributions in binary form must reproduce the above copyright
|
||||
* notice, this list of conditions and the following disclaimer in
|
||||
* the documentation and/or other materials provided with the
|
||||
* distribution.
|
||||
*
|
||||
* 3. The end-user documentation included with the redistribution,
|
||||
* if any, must include the following acknowledgment:
|
||||
* "This product includes software developed by the
|
||||
* Apache Software Foundation (http://www.apache.org/)."
|
||||
* Alternately, this acknowledgment may appear in the software itself,
|
||||
* if and wherever such third-party acknowledgments normally appear.
|
||||
*
|
||||
* 4. The names "Apache" and "Apache Software Foundation" and
|
||||
* "Apache Lucene" must not be used to endorse or promote products
|
||||
* derived from this software without prior written permission. For
|
||||
* written permission, please contact apache@apache.org.
|
||||
*
|
||||
* 5. Products derived from this software may not be called "Apache",
|
||||
* "Apache Lucene", nor may "Apache" appear in their name, without
|
||||
* prior written permission of the Apache Software Foundation.
|
||||
*
|
||||
* THIS SOFTWARE IS PROVIDED ``AS IS'' AND ANY EXPRESSED OR IMPLIED
|
||||
* WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED WARRANTIES
|
||||
* OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE
|
||||
* DISCLAIMED. IN NO EVENT SHALL THE APACHE SOFTWARE FOUNDATION OR
|
||||
* ITS CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL,
|
||||
* SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT
|
||||
* LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF
|
||||
* USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND
|
||||
* ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY,
|
||||
* OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT
|
||||
* OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF
|
||||
* SUCH DAMAGE.
|
||||
* ====================================================================
|
||||
*
|
||||
* This software consists of voluntary contributions made by many
|
||||
* individuals on behalf of the Apache Software Foundation. For more
|
||||
* information on the Apache Software Foundation, please see
|
||||
* <http://www.apache.org/>.
|
||||
*/
|
||||
|
||||
package de.lanlab.larm.storage;
|
||||
|
||||
import de.lanlab.larm.util.*;
|
||||
|
||||
/**
|
||||
* This interface stores documents provided by a fetcher task
|
||||
* @author Clemens Marschner
|
||||
*/
|
||||
public interface DocumentStorage
|
||||
{
|
||||
/**
|
||||
* called once when the storage is supposed to be initialized
|
||||
*/
|
||||
public void open();
|
||||
|
||||
|
||||
/**
|
||||
* called to store a web document
|
||||
*
|
||||
* @param doc the document
|
||||
* @return the document itself or a changed version. Only makes sense if
|
||||
* storage pipeline is used; usually the storage would return the document
|
||||
* as is.
|
||||
*/
|
||||
public WebDocument store(WebDocument doc);
|
||||
}
|
|
@ -1,111 +0,0 @@
|
|||
/* ====================================================================
|
||||
* The Apache Software License, Version 1.1
|
||||
*
|
||||
* Copyright (c) 2002 The Apache Software Foundation. All rights
|
||||
* reserved.
|
||||
*
|
||||
* Redistribution and use in source and binary forms, with or without
|
||||
* modification, are permitted provided that the following conditions
|
||||
* are met:
|
||||
*
|
||||
* 1. Redistributions of source code must retain the above copyright
|
||||
* notice, this list of conditions and the following disclaimer.
|
||||
*
|
||||
* 2. Redistributions in binary form must reproduce the above copyright
|
||||
* notice, this list of conditions and the following disclaimer in
|
||||
* the documentation and/or other materials provided with the
|
||||
* distribution.
|
||||
*
|
||||
* 3. The end-user documentation included with the redistribution,
|
||||
* if any, must include the following acknowledgment:
|
||||
* "This product includes software developed by the
|
||||
* Apache Software Foundation (http://www.apache.org/)."
|
||||
* Alternately, this acknowledgment may appear in the software itself,
|
||||
* if and wherever such third-party acknowledgments normally appear.
|
||||
*
|
||||
* 4. The names "Apache" and "Apache Software Foundation" and
|
||||
* "Apache Lucene" must not be used to endorse or promote products
|
||||
* derived from this software without prior written permission. For
|
||||
* written permission, please contact apache@apache.org.
|
||||
*
|
||||
* 5. Products derived from this software may not be called "Apache",
|
||||
* "Apache Lucene", nor may "Apache" appear in their name, without
|
||||
* prior written permission of the Apache Software Foundation.
|
||||
*
|
||||
* THIS SOFTWARE IS PROVIDED ``AS IS'' AND ANY EXPRESSED OR IMPLIED
|
||||
* WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED WARRANTIES
|
||||
* OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE
|
||||
* DISCLAIMED. IN NO EVENT SHALL THE APACHE SOFTWARE FOUNDATION OR
|
||||
* ITS CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL,
|
||||
* SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT
|
||||
* LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF
|
||||
* USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND
|
||||
* ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY,
|
||||
* OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT
|
||||
* OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF
|
||||
* SUCH DAMAGE.
|
||||
* ====================================================================
|
||||
*
|
||||
* This software consists of voluntary contributions made by many
|
||||
* individuals on behalf of the Apache Software Foundation. For more
|
||||
* information on the Apache Software Foundation, please see
|
||||
* <http://www.apache.org/>.
|
||||
*/
|
||||
|
||||
package de.lanlab.larm.storage;
|
||||
|
||||
import de.lanlab.larm.storage.LinkStorage;
|
||||
import de.lanlab.larm.util.SimpleLogger;
|
||||
import de.lanlab.larm.fetcher.URLMessage;
|
||||
|
||||
import java.util.Collection;
|
||||
import java.util.Iterator;
|
||||
|
||||
/**
|
||||
* Description of the Class
|
||||
*
|
||||
* @author Administrator
|
||||
* @created 1 June 2002
|
||||
* @version $Id$
|
||||
*/
|
||||
public class LinkLogStorage implements LinkStorage
|
||||
{
|
||||
private SimpleLogger log;
|
||||
|
||||
|
||||
/**
|
||||
* Creates a new <code>LinkLogStorage</code> instance.
|
||||
*
|
||||
* @param logFile an instance of <code>SimpleLogger</code>
|
||||
*/
|
||||
public LinkLogStorage(SimpleLogger logFile)
|
||||
{
|
||||
this.log = logFile;
|
||||
}
|
||||
|
||||
/**
|
||||
* Describe <code>openLinkStorage</code> method here.
|
||||
*
|
||||
*/
|
||||
public void openLinkStorage()
|
||||
{
|
||||
}
|
||||
|
||||
/**
|
||||
* Description of the Method
|
||||
*
|
||||
* @param c Description of the Parameter
|
||||
* @return Description of the Return Value
|
||||
*/
|
||||
public Collection storeLinks(Collection c)
|
||||
{
|
||||
synchronized (log)
|
||||
{
|
||||
for (Iterator it = c.iterator(); it.hasNext(); )
|
||||
{
|
||||
log.log(((URLMessage) it.next()).getInfo());
|
||||
}
|
||||
}
|
||||
return c;
|
||||
}
|
||||
}
|
|
@ -1,74 +0,0 @@
|
|||
/* ====================================================================
|
||||
* The Apache Software License, Version 1.1
|
||||
*
|
||||
* Copyright (c) 2001 The Apache Software Foundation. All rights
|
||||
* reserved.
|
||||
*
|
||||
* Redistribution and use in source and binary forms, with or without
|
||||
* modification, are permitted provided that the following conditions
|
||||
* are met:
|
||||
*
|
||||
* 1. Redistributions of source code must retain the above copyright
|
||||
* notice, this list of conditions and the following disclaimer.
|
||||
*
|
||||
* 2. Redistributions in binary form must reproduce the above copyright
|
||||
* notice, this list of conditions and the following disclaimer in
|
||||
* the documentation and/or other materials provided with the
|
||||
* distribution.
|
||||
*
|
||||
* 3. The end-user documentation included with the redistribution,
|
||||
* if any, must include the following acknowledgment:
|
||||
* "This product includes software developed by the
|
||||
* Apache Software Foundation (http://www.apache.org/)."
|
||||
* Alternately, this acknowledgment may appear in the software itself,
|
||||
* if and wherever such third-party acknowledgments normally appear.
|
||||
*
|
||||
* 4. The names "Apache" and "Apache Software Foundation" and
|
||||
* "Apache Lucene" must not be used to endorse or promote products
|
||||
* derived from this software without prior written permission. For
|
||||
* written permission, please contact apache@apache.org.
|
||||
*
|
||||
* 5. Products derived from this software may not be called "Apache",
|
||||
* "Apache Lucene", nor may "Apache" appear in their name, without
|
||||
* prior written permission of the Apache Software Foundation.
|
||||
*
|
||||
* THIS SOFTWARE IS PROVIDED ``AS IS'' AND ANY EXPRESSED OR IMPLIED
|
||||
* WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED WARRANTIES
|
||||
* OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE
|
||||
* DISCLAIMED. IN NO EVENT SHALL THE APACHE SOFTWARE FOUNDATION OR
|
||||
* ITS CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL,
|
||||
* SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT
|
||||
* LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF
|
||||
* USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND
|
||||
* ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY,
|
||||
* OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT
|
||||
* OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF
|
||||
* SUCH DAMAGE.
|
||||
* ====================================================================
|
||||
*
|
||||
* This software consists of voluntary contributions made by many
|
||||
* individuals on behalf of the Apache Software Foundation. For more
|
||||
* information on the Apache Software Foundation, please see
|
||||
* <http://www.apache.org/>.
|
||||
*/
|
||||
|
||||
package de.lanlab.larm.storage;
|
||||
import java.util.Collection;
|
||||
|
||||
public interface LinkStorage
|
||||
{
|
||||
|
||||
/**
|
||||
* Description of the Method
|
||||
*/
|
||||
public void openLinkStorage();
|
||||
|
||||
|
||||
/**
|
||||
* stores the extracted links; the collection may contain links from more than one document
|
||||
*
|
||||
* @param c the collection of extracted links to store
|
||||
* @return the collection, may have been changed or set to null
|
||||
*/
|
||||
public Collection storeLinks(Collection c);
|
||||
}
|
|
@ -1,231 +0,0 @@
|
|||
/* ====================================================================
|
||||
* The Apache Software License, Version 1.1
|
||||
*
|
||||
* Copyright (c) 2001 The Apache Software Foundation. All rights
|
||||
* reserved.
|
||||
*
|
||||
* Redistribution and use in source and binary forms, with or without
|
||||
* modification, are permitted provided that the following conditions
|
||||
* are met:
|
||||
*
|
||||
* 1. Redistributions of source code must retain the above copyright
|
||||
* notice, this list of conditions and the following disclaimer.
|
||||
*
|
||||
* 2. Redistributions in binary form must reproduce the above copyright
|
||||
* notice, this list of conditions and the following disclaimer in
|
||||
* the documentation and/or other materials provided with the
|
||||
* distribution.
|
||||
*
|
||||
* 3. The end-user documentation included with the redistribution,
|
||||
* if any, must include the following acknowledgment:
|
||||
* "This product includes software developed by the
|
||||
* Apache Software Foundation (http://www.apache.org/)."
|
||||
* Alternately, this acknowledgment may appear in the software itself,
|
||||
* if and wherever such third-party acknowledgments normally appear.
|
||||
*
|
||||
* 4. The names "Apache" and "Apache Software Foundation" and
|
||||
* "Apache Lucene" must not be used to endorse or promote products
|
||||
* derived from this software without prior written permission. For
|
||||
* written permission, please contact apache@apache.org.
|
||||
*
|
||||
* 5. Products derived from this software may not be called "Apache",
|
||||
* "Apache Lucene", nor may "Apache" appear in their name, without
|
||||
* prior written permission of the Apache Software Foundation.
|
||||
*
|
||||
* THIS SOFTWARE IS PROVIDED ``AS IS'' AND ANY EXPRESSED OR IMPLIED
|
||||
* WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED WARRANTIES
|
||||
* OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE
|
||||
* DISCLAIMED. IN NO EVENT SHALL THE APACHE SOFTWARE FOUNDATION OR
|
||||
* ITS CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL,
|
||||
* SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT
|
||||
* LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF
|
||||
* USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND
|
||||
* ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY,
|
||||
* OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT
|
||||
* OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF
|
||||
* SUCH DAMAGE.
|
||||
* ====================================================================
|
||||
*
|
||||
* This software consists of voluntary contributions made by many
|
||||
* individuals on behalf of the Apache Software Foundation. For more
|
||||
* information on the Apache Software Foundation, please see
|
||||
* <http://www.apache.org/>.
|
||||
*/
|
||||
|
||||
package de.lanlab.larm.storage;
|
||||
|
||||
import de.lanlab.larm.util.WebDocument;
|
||||
import de.lanlab.larm.util.SimpleLogger;
|
||||
import java.io.*;
|
||||
|
||||
|
||||
/**
|
||||
* This class saves the documents into page files of 50 MB and keeps a record of all
|
||||
* the positions in a logger. The log file contains the URL, page file number, and
|
||||
* index within the page file.
|
||||
*
|
||||
*/
|
||||
|
||||
public class LogStorage implements DocumentStorage
|
||||
{
|
||||
|
||||
SimpleLogger log;
|
||||
|
||||
File pageFile;
|
||||
FileOutputStream out;
|
||||
/*OutputStreamWriter outw;*/
|
||||
|
||||
int pageFileCount;
|
||||
String filePrefix;
|
||||
int offset;
|
||||
boolean isValid = false;
|
||||
/**
|
||||
* Description of the Field
|
||||
*/
|
||||
public final static int MAXLENGTH = 50000000;
|
||||
boolean logContents = false;
|
||||
String fileName;
|
||||
|
||||
|
||||
/**
|
||||
* Constructor for the LogStorage object
|
||||
*
|
||||
* @param log the logger where index information is saved to
|
||||
* @param logContents whether all docs are to be stored in page files or not
|
||||
* @param filePrefix the file name prefix to which the page file number is appended
|
||||
*/
|
||||
public LogStorage(SimpleLogger log, boolean logContents, String filePrefix)
|
||||
{
|
||||
this.log = log;
|
||||
pageFileCount = 0;
|
||||
this.filePrefix = filePrefix;
|
||||
this.logContents = logContents;
|
||||
if (logContents)
|
||||
{
|
||||
openPageFile();
|
||||
}
|
||||
}
|
||||
|
||||
|
||||
/**
|
||||
* Description of the Method
|
||||
*/
|
||||
public void open() { }
|
||||
|
||||
|
||||
/**
|
||||
* Description of the Method
|
||||
*/
|
||||
public void openPageFile()
|
||||
{
|
||||
int id = ++pageFileCount;
|
||||
fileName = filePrefix + "_" + id + ".pfl";
|
||||
try
|
||||
{
|
||||
this.offset = 0;
|
||||
out = new FileOutputStream(fileName);
|
||||
/*outw = new OutputStreamWriter(out);*/
|
||||
isValid = true;
|
||||
}
|
||||
catch (IOException io)
|
||||
{
|
||||
log.logThreadSafe("**ERROR: IOException while opening pageFile " + fileName + ": " + io.getClass().getName() + "; " + io.getMessage());
|
||||
isValid = false;
|
||||
}
|
||||
}
|
||||
|
||||
|
||||
/**
|
||||
* Gets the outputStream attribute of the LogStorage object
|
||||
*
|
||||
* @return The outputStream value
|
||||
*/
|
||||
public OutputStream getOutputStream()
|
||||
{
|
||||
if (offset > MAXLENGTH)
|
||||
{
|
||||
try
|
||||
{
|
||||
out.close();
|
||||
}
|
||||
catch (IOException io)
|
||||
{
|
||||
log.logThreadSafe("**ERROR: IOException while closing pageFile " + fileName + ": " + io.getClass().getName() + "; " + io.getMessage());
|
||||
}
|
||||
openPageFile();
|
||||
}
|
||||
return out;
|
||||
}
|
||||
|
||||
|
||||
/**
|
||||
* Description of the Method
|
||||
*
|
||||
* @param bytes Description of the Parameter
|
||||
* @return Description of the Return Value
|
||||
*/
|
||||
public synchronized int writeToPageFile(byte[] bytes)
|
||||
{
|
||||
try
|
||||
{
|
||||
OutputStream out = getOutputStream();
|
||||
int oldOffset = this.offset;
|
||||
out.write(bytes);
|
||||
this.offset += bytes.length;
|
||||
return oldOffset;
|
||||
}
|
||||
catch (IOException io)
|
||||
{
|
||||
log.logThreadSafe("**ERROR: IOException while writing " + bytes.length + " bytes to pageFile " + fileName + ": " + io.getClass().getName() + "; " + io.getMessage());
|
||||
}
|
||||
return -1;
|
||||
}
|
||||
|
||||
/*
|
||||
public synchronized int writeToPageFile(char[] chars)
|
||||
{
|
||||
try
|
||||
{
|
||||
getOutputStream();
|
||||
int oldOffset = this.offset;
|
||||
this.offset += outw.write(chars);
|
||||
new java.io.BufferedWriter().
|
||||
|
||||
}
|
||||
|
||||
|
||||
}
|
||||
*/
|
||||
|
||||
/**
|
||||
* Sets the logger attribute of the LogStorage object
|
||||
*
|
||||
* @param log The new logger value
|
||||
*/
|
||||
public void setLogger(SimpleLogger log)
|
||||
{
|
||||
this.log = log;
|
||||
}
|
||||
|
||||
|
||||
/**
|
||||
* writes file info to log file;
|
||||
* stores the document if storing is enabled. in that case the log line contains
|
||||
* the page file number and the index within that file
|
||||
*
|
||||
* @param doc the document to log and, if storing is enabled, to write to the page file
|
||||
* @return the unchanged document
|
||||
*/
|
||||
public WebDocument store(WebDocument doc)
|
||||
{
|
||||
String docInfo = doc.getInfo();
|
||||
byte[] content = (byte[])doc.getField("contentBytes");
|
||||
if (logContents && isValid && content != null && content.length != 0)
|
||||
{
|
||||
int offset = writeToPageFile(content);
|
||||
docInfo = docInfo + "\t" + pageFileCount + "\t" + offset;
|
||||
}
|
||||
log.logThreadSafe(docInfo);
|
||||
return doc;
|
||||
}
|
||||
}
|
|
@ -1,241 +0,0 @@
|
|||
/* ====================================================================
|
||||
* The Apache Software License, Version 1.1
|
||||
*
|
||||
* Copyright (c) 2001 The Apache Software Foundation. All rights
|
||||
* reserved.
|
||||
*
|
||||
* Redistribution and use in source and binary forms, with or without
|
||||
* modification, are permitted provided that the following conditions
|
||||
* are met:
|
||||
*
|
||||
* 1. Redistributions of source code must retain the above copyright
|
||||
* notice, this list of conditions and the following disclaimer.
|
||||
*
|
||||
* 2. Redistributions in binary form must reproduce the above copyright
|
||||
* notice, this list of conditions and the following disclaimer in
|
||||
* the documentation and/or other materials provided with the
|
||||
* distribution.
|
||||
*
|
||||
* 3. The end-user documentation included with the redistribution,
|
||||
* if any, must include the following acknowledgment:
|
||||
* "This product includes software developed by the
|
||||
* Apache Software Foundation (http://www.apache.org/)."
|
||||
* Alternately, this acknowledgment may appear in the software itself,
|
||||
* if and wherever such third-party acknowledgments normally appear.
|
||||
*
|
||||
* 4. The names "Apache" and "Apache Software Foundation" and
|
||||
* "Apache Lucene" must not be used to endorse or promote products
|
||||
* derived from this software without prior written permission. For
|
||||
* written permission, please contact apache@apache.org.
|
||||
*
|
||||
* 5. Products derived from this software may not be called "Apache",
|
||||
* "Apache Lucene", nor may "Apache" appear in their name, without
|
||||
* prior written permission of the Apache Software Foundation.
|
||||
*
|
||||
* THIS SOFTWARE IS PROVIDED ``AS IS'' AND ANY EXPRESSED OR IMPLIED
|
||||
* WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED WARRANTIES
|
||||
* OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE
|
||||
* DISCLAIMED. IN NO EVENT SHALL THE APACHE SOFTWARE FOUNDATION OR
|
||||
* ITS CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL,
|
||||
* SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT
|
||||
* LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF
|
||||
* USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND
|
||||
* ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY,
|
||||
* OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT
|
||||
* OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF
|
||||
* SUCH DAMAGE.
|
||||
* ====================================================================
|
||||
*
|
||||
* This software consists of voluntary contributions made by many
|
||||
* individuals on behalf of the Apache Software Foundation. For more
|
||||
* information on the Apache Software Foundation, please see
|
||||
* <http://www.apache.org/>.
|
||||
*/
|
||||
|
||||
package de.lanlab.larm.storage;
|
||||
|
||||
import de.lanlab.larm.util.WebDocument;
|
||||
import org.apache.lucene.index.*;
|
||||
import org.apache.lucene.document.*;
|
||||
import org.apache.lucene.analysis.*;
|
||||
import java.util.*;
|
||||
import java.io.*;
|
||||
|
||||
/**
|
||||
* FIXME document this class
|
||||
* Title: LARM Lanlab Retrieval Machine Description: Copyright: Copyright (c)
|
||||
* Company:
|
||||
*
|
||||
* @author Administrator
|
||||
* @created 14 June 2002
|
||||
* @version $Id$
|
||||
*/
|
||||
public class LuceneStorage implements DocumentStorage
|
||||
{
|
||||
public final static int INDEX = 1;
|
||||
public final static int STORE = 2;
|
||||
public final static int TOKEN = 4;
|
||||
|
||||
private HashMap fieldInfos = new HashMap();
|
||||
private IndexWriter writer;
|
||||
private Analyzer analyzer;
|
||||
private String indexName;
|
||||
private boolean create;
|
||||
|
||||
|
||||
/**
|
||||
* Constructor for the LuceneStorage object
|
||||
*/
|
||||
public LuceneStorage() { }
|
||||
|
||||
|
||||
/**
|
||||
* Sets the analyzer attribute of the LuceneStorage object
|
||||
*
|
||||
* @param a The new analyzer value
|
||||
*/
|
||||
public void setAnalyzer(Analyzer a)
|
||||
{
|
||||
this.analyzer = a;
|
||||
}
|
||||
|
||||
|
||||
/**
|
||||
* Sets the indexName attribute of the LuceneStorage object
|
||||
*
|
||||
* @param name The new indexName value
|
||||
*/
|
||||
public void setIndexName(String name)
|
||||
{
|
||||
this.indexName = name;
|
||||
}
|
||||
|
||||
|
||||
/**
|
||||
* Sets the fieldInfo attribute of the LuceneStorage object
|
||||
*
|
||||
* @param fieldName The new fieldInfo value
|
||||
* @param value The new fieldInfo value
|
||||
*/
|
||||
public void setFieldInfo(String fieldName, int value)
|
||||
{
|
||||
fieldInfos.put(fieldName, new Integer(value));
|
||||
}
|
||||
|
||||
|
||||
/**
|
||||
* Sets the create attribute of the LuceneStorage object
|
||||
*
|
||||
* @param create The new create value
|
||||
*/
|
||||
public void setCreate(boolean create)
|
||||
{
|
||||
this.create = create;
|
||||
}
|
||||
|
||||
|
||||
/**
|
||||
* Description of the Method
|
||||
*/
|
||||
public void open()
|
||||
{
|
||||
// FIXME: replace with logging
|
||||
System.out.println("opening Lucene storage with index name '" + indexName + "'");
|
||||
try
|
||||
{
|
||||
writer = new IndexWriter(indexName, analyzer, create);
|
||||
}
|
||||
catch(IOException e)
|
||||
{
|
||||
// FIXME: replace with logging
|
||||
System.err.println("IOException occurred when opening Lucene Index with index name '" + indexName + "'");
|
||||
e.printStackTrace();
|
||||
}
|
||||
if (writer != null)
|
||||
{
|
||||
// FIXME: replace with logging
|
||||
System.out.println("lucene storage opened successfully");
|
||||
}
|
||||
}
|
||||
|
||||
|
||||
/**
|
||||
* Gets the fieldInfo attribute of the LuceneStorage object
|
||||
*
|
||||
* @param fieldName Description of the Parameter
|
||||
* @param defaultValue Description of the Parameter
|
||||
* @return The fieldInfo value
|
||||
*/
|
||||
protected int getFieldInfo(String fieldName, int defaultValue)
|
||||
{
|
||||
Integer info = (Integer) fieldInfos.get(fieldName);
|
||||
if (info != null)
|
||||
{
|
||||
return info.intValue();
|
||||
}
|
||||
else
|
||||
{
|
||||
return defaultValue;
|
||||
}
|
||||
}
|
||||
|
||||
|
||||
protected void addField(Document doc, String name, String value, int defaultIndexFlags)
|
||||
{
|
||||
int flags = getFieldInfo(name, defaultIndexFlags);
|
||||
if (flags != 0)
|
||||
{
|
||||
doc.add(new Field(name, value, (flags & STORE) != 0, (flags & INDEX) != 0, (flags & TOKEN) != 0));
|
||||
}
|
||||
}
|
||||
|
||||
/**
|
||||
* Converts the web document into a Lucene document and adds it to the index
|
||||
*
|
||||
* @param webDoc the document to be indexed
|
||||
* @return the unchanged document
|
||||
*/
|
||||
public WebDocument store(WebDocument webDoc)
|
||||
{
|
||||
//System.out.println("storing " + webDoc.getUrl());
|
||||
|
||||
Document doc = new Document();
|
||||
int flags;
|
||||
|
||||
addField(doc, "url", webDoc.getUrl().toExternalForm(), STORE | INDEX);
|
||||
addField(doc, "mimetype", webDoc.getMimeType(), STORE | INDEX);
|
||||
// addField(doc, "...", webDoc.getNormalizedURLString(), STORE | INDEX); and so forth
|
||||
// todo: other fields
|
||||
Set fields = webDoc.getFieldNames();
|
||||
|
||||
for (Iterator it = fields.iterator(); it.hasNext(); )
|
||||
{
|
||||
String fieldName = (String) it.next();
|
||||
Object field = webDoc.getField(fieldName);
|
||||
|
||||
if (field instanceof char[])
|
||||
{
|
||||
addField(doc, fieldName, new String((char[]) field), STORE | INDEX);
|
||||
}
|
||||
else if (field instanceof String)
|
||||
{
|
||||
addField(doc, fieldName, (String)field, STORE | INDEX);
|
||||
}
|
||||
/* else ? */
|
||||
// ignore byte[] fields
|
||||
}
|
||||
try
|
||||
{
|
||||
writer.addDocument(doc);
|
||||
}
|
||||
catch(IOException e)
|
||||
{
|
||||
// FIXME: replace with logging
|
||||
System.err.println("IOException occurred when adding document to Lucene index");
|
||||
e.printStackTrace();
|
||||
}
|
||||
return webDoc;
|
||||
}
|
||||
|
||||
//public void set
|
||||
}
|
|
@ -1,77 +0,0 @@
|
|||
/* ====================================================================
|
||||
* The Apache Software License, Version 1.1
|
||||
*
|
||||
* Copyright (c) 2001 The Apache Software Foundation. All rights
|
||||
* reserved.
|
||||
*
|
||||
* Redistribution and use in source and binary forms, with or without
|
||||
* modification, are permitted provided that the following conditions
|
||||
* are met:
|
||||
*
|
||||
* 1. Redistributions of source code must retain the above copyright
|
||||
* notice, this list of conditions and the following disclaimer.
|
||||
*
|
||||
* 2. Redistributions in binary form must reproduce the above copyright
|
||||
* notice, this list of conditions and the following disclaimer in
|
||||
* the documentation and/or other materials provided with the
|
||||
* distribution.
|
||||
*
|
||||
* 3. The end-user documentation included with the redistribution,
|
||||
* if any, must include the following acknowledgment:
|
||||
* "This product includes software developed by the
|
||||
* Apache Software Foundation (http://www.apache.org/)."
|
||||
* Alternately, this acknowledgment may appear in the software itself,
|
||||
* if and wherever such third-party acknowledgments normally appear.
|
||||
*
|
||||
* 4. The names "Apache" and "Apache Software Foundation" and
|
||||
* "Apache Lucene" must not be used to endorse or promote products
|
||||
* derived from this software without prior written permission. For
|
||||
* written permission, please contact apache@apache.org.
|
||||
*
|
||||
* 5. Products derived from this software may not be called "Apache",
|
||||
* "Apache Lucene", nor may "Apache" appear in their name, without
|
||||
* prior written permission of the Apache Software Foundation.
|
||||
*
|
||||
* THIS SOFTWARE IS PROVIDED ``AS IS'' AND ANY EXPRESSED OR IMPLIED
|
||||
* WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED WARRANTIES
|
||||
* OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE
|
||||
* DISCLAIMED. IN NO EVENT SHALL THE APACHE SOFTWARE FOUNDATION OR
|
||||
* ITS CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL,
|
||||
* SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT
|
||||
* LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF
|
||||
* USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND
|
||||
* ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY,
|
||||
* OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT
|
||||
* OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF
|
||||
* SUCH DAMAGE.
|
||||
* ====================================================================
|
||||
*
|
||||
* This software consists of voluntary contributions made by many
|
||||
* individuals on behalf of the Apache Software Foundation. For more
|
||||
* information on the Apache Software Foundation, please see
|
||||
* <http://www.apache.org/>.
|
||||
*/
|
||||
|
||||
package de.lanlab.larm.storage;
|
||||
import de.lanlab.larm.util.*;
|
||||
|
||||
/**
|
||||
* doesn't do a lot
|
||||
*/
|
||||
public class NullStorage implements DocumentStorage
|
||||
{
|
||||
|
||||
public NullStorage()
|
||||
{
|
||||
}
|
||||
|
||||
public void open()
|
||||
{
|
||||
}
|
||||
|
||||
public WebDocument store(WebDocument doc)
|
||||
{
|
||||
return doc;
|
||||
}
|
||||
|
||||
}
|
|
@ -1,227 +0,0 @@
|
|||
/* ====================================================================
|
||||
* The Apache Software License, Version 1.1
|
||||
*
|
||||
* Copyright (c) 2001 The Apache Software Foundation. All rights
|
||||
* reserved.
|
||||
*
|
||||
* Redistribution and use in source and binary forms, with or without
|
||||
* modification, are permitted provided that the following conditions
|
||||
* are met:
|
||||
*
|
||||
* 1. Redistributions of source code must retain the above copyright
|
||||
* notice, this list of conditions and the following disclaimer.
|
||||
*
|
||||
* 2. Redistributions in binary form must reproduce the above copyright
|
||||
* notice, this list of conditions and the following disclaimer in
|
||||
* the documentation and/or other materials provided with the
|
||||
* distribution.
|
||||
*
|
||||
* 3. The end-user documentation included with the redistribution,
|
||||
* if any, must include the following acknowledgment:
|
||||
* "This product includes software developed by the
|
||||
* Apache Software Foundation (http://www.apache.org/)."
|
||||
* Alternately, this acknowledgment may appear in the software itself,
|
||||
* if and wherever such third-party acknowledgments normally appear.
|
||||
*
|
||||
* 4. The names "Apache" and "Apache Software Foundation" and
|
||||
* "Apache Lucene" must not be used to endorse or promote products
|
||||
* derived from this software without prior written permission. For
|
||||
* written permission, please contact apache@apache.org.
|
||||
*
|
||||
* 5. Products derived from this software may not be called "Apache",
|
||||
* "Apache Lucene", nor may "Apache" appear in their name, without
|
||||
* prior written permission of the Apache Software Foundation.
|
||||
*
|
||||
* THIS SOFTWARE IS PROVIDED ``AS IS'' AND ANY EXPRESSED OR IMPLIED
|
||||
* WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED WARRANTIES
|
||||
* OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE
|
||||
* DISCLAIMED. IN NO EVENT SHALL THE APACHE SOFTWARE FOUNDATION OR
|
||||
* ITS CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL,
|
||||
* SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT
|
||||
* LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF
|
||||
* USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND
|
||||
* ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY,
|
||||
* OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT
|
||||
* OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF
|
||||
* SUCH DAMAGE.
|
||||
* ====================================================================
|
||||
*
|
||||
* This software consists of voluntary contributions made by many
|
||||
* individuals on behalf of the Apache Software Foundation. For more
|
||||
* information on the Apache Software Foundation, please see
|
||||
* <http://www.apache.org/>.
|
||||
*/
|
||||
|
||||
package de.lanlab.larm.storage;
|
||||
import java.sql.*;
|
||||
import de.lanlab.larm.util.*;
|
||||
import java.util.*;
|
||||
|
||||
/**
|
||||
* saves the document into an sql table. At this time only in MS SQL (and probably Sybase)
|
||||
* a table "Document" with the columns DO_URL(varchar), DO_MimeType(varchar) and
|
||||
* DO_Data2(BLOB) is created after start<br>
|
||||
* notes: experimental; slow
|
||||
*/
|
||||
public class SQLServerStorage implements DocumentStorage
|
||||
{
|
||||
|
||||
private Vector freeCons;
|
||||
private Vector busyCons;
|
||||
|
||||
private Vector freeStatements;
|
||||
private Vector busyStatements;
|
||||
|
||||
private PreparedStatement addDoc;
|
||||
|
||||
public SQLServerStorage(String driver, String connectionString, String account, String password, int nrConnections)
|
||||
{
|
||||
try
|
||||
{
|
||||
Class.forName(driver);
|
||||
freeCons = new Vector(nrConnections);
|
||||
busyCons = new Vector(nrConnections);
|
||||
freeStatements = new Vector(nrConnections);
|
||||
busyStatements = new Vector(nrConnections);
|
||||
|
||||
Connection sqlConn;
|
||||
PreparedStatement statement;
|
||||
for(int i=0; i<nrConnections; i++)
|
||||
{
|
||||
sqlConn = DriverManager.getConnection(connectionString, account, password);
|
||||
statement = sqlConn.prepareStatement("INSERT INTO Document (DO_URL, DO_MimeType, DO_Data2) VALUES (?,?,?)");
|
||||
freeCons.add(sqlConn);
|
||||
freeStatements.add(statement);
|
||||
}
|
||||
|
||||
|
||||
|
||||
}
|
||||
catch(SQLException e)
|
||||
{
|
||||
synchronized(this)
|
||||
{
|
||||
System.out.println(/*"Task " + taskNr + ": */ "SQLException: " + e.getMessage());
|
||||
System.err.println(" SQLState: " + e.getSQLState());
|
||||
System.err.println(" VendorError: " + e.getErrorCode());
|
||||
}
|
||||
return;
|
||||
}
|
||||
|
||||
catch(Exception e)
|
||||
{
|
||||
System.out.println("SQLServerStorage: " + e.getClass().getName() + ": " + e.getMessage());
|
||||
e.printStackTrace();
|
||||
System.exit(0);
|
||||
}
|
||||
}
|
||||
|
||||
public Connection getConnection()
|
||||
{
|
||||
synchronized(this)
|
||||
{
|
||||
Connection actual = (Connection)freeCons.firstElement();
|
||||
freeCons.removeElementAt(0);
|
||||
if(actual == null)
|
||||
{
|
||||
return null;
|
||||
}
|
||||
busyCons.add(actual);
|
||||
return actual;
|
||||
}
|
||||
}
|
||||
|
||||
public void releaseConnection(Connection con)
|
||||
{
|
||||
synchronized(this)
|
||||
{
|
||||
busyCons.remove(con);
|
||||
freeCons.add(con);
|
||||
}
|
||||
}
|
||||
|
||||
public PreparedStatement getStatement()
|
||||
{
|
||||
synchronized(this)
|
||||
{
|
||||
PreparedStatement actual = (PreparedStatement)freeStatements.firstElement();
|
||||
freeStatements.removeElementAt(0);
|
||||
if(actual == null)
|
||||
{
|
||||
return null;
|
||||
}
|
||||
busyStatements.add(actual);
|
||||
return actual;
|
||||
}
|
||||
}
|
||||
|
||||
public void releaseStatement(PreparedStatement statement)
|
||||
{
|
||||
synchronized(this)
|
||||
{
|
||||
busyStatements.remove(statement);
|
||||
freeStatements.add(statement);
|
||||
}
|
||||
}
|
||||
|
||||
public void open()
|
||||
{
|
||||
Connection conn = null;
|
||||
try
|
||||
{
|
||||
conn = getConnection();
|
||||
Statement delDoc = conn.createStatement();
|
||||
|
||||
// recreate table (faster than delete from table)
|
||||
|
||||
delDoc.executeUpdate("if exists (select * from sysobjects where id = object_id(N'[dbo].[Document]') and OBJECTPROPERTY(id, N'IsUserTable') = 1)drop table [dbo].[Document]");
|
||||
delDoc.executeUpdate("CREATE TABLE [dbo].[Document] ([DO_ID] [int] IDENTITY (1, 1) NOT NULL , [DA_CrawlPass] [int] NULL , [DO_URL] [varchar] (255) NULL , [DO_ContentType] [varchar] (50) NULL , [DO_Data] [text] NULL , [DO_Hashcode] [int] NULL , [DO_ContentLength] [int] NULL , [DO_ContentEncoding] [varchar] (20) NULL , [DO_Data2] [image] NULL, [DO_MimeType] [varchar] (255) NULL) ON [PRIMARY] TEXTIMAGE_ON [PRIMARY]"); // delete
|
||||
}
|
||||
catch(SQLException e)
|
||||
{
|
||||
System.out.println(/*"Task " + taskNr + ": */"SQLException: " + e.getMessage());
|
||||
System.err.println(" SQLState: " + e.getSQLState());
|
||||
System.err.println(" VendorError: " + e.getErrorCode());
|
||||
}
|
||||
finally
|
||||
{
|
||||
if(conn != null)
|
||||
{
|
||||
releaseConnection(conn);
|
||||
}
|
||||
}
|
||||
}
|
||||
|
||||
/**
|
||||
*
|
||||
* @param document
|
||||
* @return the unchanged document
|
||||
*/
|
||||
public WebDocument store(WebDocument document)
|
||||
{
|
||||
|
||||
PreparedStatement addDoc = null;
|
||||
try
|
||||
{
|
||||
addDoc = getStatement();
|
||||
addDoc.setString(1, document.getURLString());
|
||||
addDoc.setString(2, document.getMimeType());
|
||||
addDoc.setBytes(3, (byte[])document.getField("content"));
|
||||
addDoc.execute();
|
||||
}
|
||||
catch(SQLException e)
|
||||
{
|
||||
System.out.println(/* "Task " + taskNr + ": */ "SQLException: " + e.getMessage());
|
||||
System.err.println(" SQLState: " + e.getSQLState());
|
||||
System.err.println(" VendorError: " + e.getErrorCode());
|
||||
}
|
||||
finally
|
||||
{
|
||||
if(addDoc != null)
|
||||
{
|
||||
releaseStatement(addDoc);
|
||||
}
|
||||
}
|
||||
return document;
|
||||
}
|
||||
}
|
|
@ -1,185 +0,0 @@
|
|||
/*
|
||||
* ====================================================================
|
||||
* The Apache Software License, Version 1.1
|
||||
*
|
||||
* Copyright (c) 2001 The Apache Software Foundation. All rights
|
||||
* reserved.
|
||||
*
|
||||
* Redistribution and use in source and binary forms, with or without
|
||||
* modification, are permitted provided that the following conditions
|
||||
* are met:
|
||||
*
|
||||
* 1. Redistributions of source code must retain the above copyright
|
||||
* notice, this list of conditions and the following disclaimer.
|
||||
*
|
||||
* 2. Redistributions in binary form must reproduce the above copyright
|
||||
* notice, this list of conditions and the following disclaimer in
|
||||
* the documentation and/or other materials provided with the
|
||||
* distribution.
|
||||
*
|
||||
* 3. The end-user documentation included with the redistribution,
|
||||
* if any, must include the following acknowledgment:
|
||||
* "This product includes software developed by the
|
||||
* Apache Software Foundation (http://www.apache.org/)."
|
||||
* Alternately, this acknowledgment may appear in the software itself,
|
||||
* if and wherever such third-party acknowledgments normally appear.
|
||||
*
|
||||
* 4. The names "Apache" and "Apache Software Foundation" and
|
||||
* "Apache Lucene" must not be used to endorse or promote products
|
||||
* derived from this software without prior written permission. For
|
||||
* written permission, please contact apache@apache.org.
|
||||
*
|
||||
* 5. Products derived from this software may not be called "Apache",
|
||||
* "Apache Lucene", nor may "Apache" appear in their name, without
|
||||
* prior written permission of the Apache Software Foundation.
|
||||
*
|
||||
* THIS SOFTWARE IS PROVIDED ``AS IS'' AND ANY EXPRESSED OR IMPLIED
|
||||
* WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED WARRANTIES
|
||||
* OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE
|
||||
* DISCLAIMED. IN NO EVENT SHALL THE APACHE SOFTWARE FOUNDATION OR
|
||||
* ITS CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL,
|
||||
* SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT
|
||||
* LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF
|
||||
* USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND
|
||||
* ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY,
|
||||
* OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT
|
||||
* OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF
|
||||
* SUCH DAMAGE.
|
||||
* ====================================================================
|
||||
*
|
||||
* This software consists of voluntary contributions made by many
|
||||
* individuals on behalf of the Apache Software Foundation. For more
|
||||
* information on the Apache Software Foundation, please see
|
||||
* <http://www.apache.org/>.
|
||||
*/
|
||||
package de.lanlab.larm.storage;
|
||||
|
||||
import de.lanlab.larm.util.WebDocument;
|
||||
import de.lanlab.larm.fetcher.URLMessage;
|
||||
|
||||
import java.util.ArrayList;
|
||||
import java.util.Iterator;
|
||||
import java.util.Collection;
|
||||
|
||||
/**
|
||||
* @author Clemens Marschner
|
||||
* @created 1 June 2002
|
||||
* @version $Id$
|
||||
*/
|
||||
public class StoragePipeline implements DocumentStorage, LinkStorage
|
||||
{
|
||||
private boolean isOpen;
|
||||
private boolean isLinkStorageOpen;
|
||||
private ArrayList docStorages;
|
||||
private ArrayList linkStorages;
|
||||
|
||||
|
||||
/**
|
||||
* Constructor for the StoragePipeline object
|
||||
*/
|
||||
public StoragePipeline()
|
||||
{
|
||||
isOpen = false;
|
||||
isLinkStorageOpen = false;
|
||||
docStorages = new ArrayList();
|
||||
linkStorages = new ArrayList();
|
||||
}
|
||||
|
||||
/**
|
||||
* open all docStorages
|
||||
*/
|
||||
public void open()
|
||||
{
|
||||
for (Iterator it = docStorages.iterator(); it.hasNext(); )
|
||||
{
|
||||
// FIXME: replace with logging
|
||||
System.out.println("opening...");
|
||||
((DocumentStorage) it.next()).open();
|
||||
}
|
||||
isOpen = true;
|
||||
}
|
||||
|
||||
/**
|
||||
* store the doc into all docStorages
|
||||
* document is discarded if a storage.store() returns null
|
||||
*
|
||||
* @see de.lanlab.larm.storage.DocumentStorage#store
|
||||
* @param doc the document to pass through the pipeline
|
||||
* @return the document as returned by the last storage, or null if a storage discarded it
|
||||
*/
|
||||
public WebDocument store(WebDocument doc)
|
||||
{
|
||||
for(Iterator it = docStorages.iterator(); it.hasNext();)
|
||||
{
|
||||
doc = ((DocumentStorage)it.next()).store(doc);
|
||||
if (doc == null)
|
||||
{
|
||||
break;
|
||||
}
|
||||
}
|
||||
return doc;
|
||||
}
|
||||
|
||||
|
||||
/**
|
||||
* Appends a document storage to the pipeline; must be called before open()
|
||||
*
|
||||
* @param storage the document storage to append
|
||||
*/
|
||||
public void addDocStorage(DocumentStorage storage)
|
||||
{
|
||||
// FIXME: use JDK 1.4 asserts instead?
|
||||
if (isOpen)
|
||||
{
|
||||
throw new IllegalStateException("storage can't be added if pipeline is already open");
|
||||
}
|
||||
docStorages.add(storage);
|
||||
}
|
||||
|
||||
/**
|
||||
* Appends a link storage to the pipeline; must be called before open()
|
||||
*
|
||||
* @param storage the link storage to append
|
||||
*/
|
||||
public void addLinkStorage(LinkStorage storage)
|
||||
{
|
||||
// FIXME: use JDK 1.4 asserts instead?
|
||||
if (isOpen)
|
||||
{
|
||||
throw new IllegalStateException("storage can't be added if pipeline is already open");
|
||||
}
|
||||
linkStorages.add(storage);
|
||||
}
|
||||
|
||||
/**
|
||||
* Opens all link storages in the pipeline.
|
||||
*
|
||||
*/
|
||||
public void openLinkStorage()
|
||||
{
|
||||
for (Iterator it = linkStorages.iterator(); it.hasNext(); )
|
||||
{
|
||||
((LinkStorage) it.next()).openLinkStorage();
|
||||
}
|
||||
isLinkStorageOpen = true;
|
||||
}
|
||||
|
||||
/**
|
||||
* Passes the links through all link storages, stopping as soon as one of them returns null.
|
||||
*
|
||||
* @param c the <code>Collection</code> of links to store
|
||||
* @return the <code>Collection</code> returned by the last link storage, or null if a storage discarded it
|
||||
*/
|
||||
public Collection storeLinks(Collection c)
|
||||
{
|
||||
for(Iterator it = linkStorages.iterator(); it.hasNext();)
|
||||
{
|
||||
c = ((LinkStorage)it.next()).storeLinks(c);
|
||||
if (c == null)
|
||||
{
|
||||
break;
|
||||
}
|
||||
}
|
||||
return c;
|
||||
}
|
||||
}
|
|
@ -1,62 +0,0 @@
|
|||
/* ====================================================================
|
||||
* The Apache Software License, Version 1.1
|
||||
*
|
||||
* Copyright (c) 2001 The Apache Software Foundation. All rights
|
||||
* reserved.
|
||||
*
|
||||
* Redistribution and use in source and binary forms, with or without
|
||||
* modification, are permitted provided that the following conditions
|
||||
* are met:
|
||||
*
|
||||
* 1. Redistributions of source code must retain the above copyright
|
||||
* notice, this list of conditions and the following disclaimer.
|
||||
*
|
||||
* 2. Redistributions in binary form must reproduce the above copyright
|
||||
* notice, this list of conditions and the following disclaimer in
|
||||
* the documentation and/or other materials provided with the
|
||||
* distribution.
|
||||
*
|
||||
* 3. The end-user documentation included with the redistribution,
|
||||
* if any, must include the following acknowledgment:
|
||||
* "This product includes software developed by the
|
||||
* Apache Software Foundation (http://www.apache.org/)."
|
||||
* Alternately, this acknowledgment may appear in the software itself,
|
||||
* if and wherever such third-party acknowledgments normally appear.
|
||||
*
|
||||
* 4. The names "Apache" and "Apache Software Foundation" and
|
||||
* "Apache Lucene" must not be used to endorse or promote products
|
||||
* derived from this software without prior written permission. For
|
||||
* written permission, please contact apache@apache.org.
|
||||
*
|
||||
* 5. Products derived from this software may not be called "Apache",
|
||||
* "Apache Lucene", nor may "Apache" appear in their name, without
|
||||
* prior written permission of the Apache Software Foundation.
|
||||
*
|
||||
* THIS SOFTWARE IS PROVIDED ``AS IS'' AND ANY EXPRESSED OR IMPLIED
|
||||
* WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED WARRANTIES
|
||||
* OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE
|
||||
* DISCLAIMED. IN NO EVENT SHALL THE APACHE SOFTWARE FOUNDATION OR
|
||||
* ITS CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL,
|
||||
* SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT
|
||||
* LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF
|
||||
* USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND
|
||||
* ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY,
|
||||
* OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT
|
||||
* OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF
|
||||
* SUCH DAMAGE.
|
||||
* ====================================================================
|
||||
*
|
||||
* This software consists of voluntary contributions made by many
|
||||
* individuals on behalf of the Apache Software Foundation. For more
|
||||
* information on the Apache Software Foundation, please see
|
||||
* <http://www.apache.org/>.
|
||||
*/
|
||||
|
||||
package de.lanlab.larm.threads;
|
||||
|
||||
public interface InterruptableTask
|
||||
{
|
||||
public void run(ServerThread thread);
|
||||
public void interrupt();
|
||||
public String getInfo();
|
||||
}
|
|
@ -1,227 +0,0 @@
|
|||
/* ====================================================================
|
||||
* The Apache Software License, Version 1.1
|
||||
*
|
||||
* Copyright (c) 2001 The Apache Software Foundation. All rights
|
||||
* reserved.
|
||||
*
|
||||
* Redistribution and use in source and binary forms, with or without
|
||||
* modification, are permitted provided that the following conditions
|
||||
* are met:
|
||||
*
|
||||
* 1. Redistributions of source code must retain the above copyright
|
||||
* notice, this list of conditions and the following disclaimer.
|
||||
*
|
||||
* 2. Redistributions in binary form must reproduce the above copyright
|
||||
* notice, this list of conditions and the following disclaimer in
|
||||
* the documentation and/or other materials provided with the
|
||||
* distribution.
|
||||
*
|
||||
* 3. The end-user documentation included with the redistribution,
|
||||
* if any, must include the following acknowledgment:
|
||||
* "This product includes software developed by the
|
||||
* Apache Software Foundation (http://www.apache.org/)."
|
||||
* Alternately, this acknowledgment may appear in the software itself,
|
||||
* if and wherever such third-party acknowledgments normally appear.
|
||||
*
|
||||
* 4. The names "Apache" and "Apache Software Foundation" and
|
||||
* "Apache Lucene" must not be used to endorse or promote products
|
||||
* derived from this software without prior written permission. For
|
||||
* written permission, please contact apache@apache.org.
|
||||
*
|
||||
* 5. Products derived from this software may not be called "Apache",
|
||||
* "Apache Lucene", nor may "Apache" appear in their name, without
|
||||
* prior written permission of the Apache Software Foundation.
|
||||
*
|
||||
* THIS SOFTWARE IS PROVIDED ``AS IS'' AND ANY EXPRESSED OR IMPLIED
|
||||
* WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED WARRANTIES
|
||||
* OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE
|
||||
* DISCLAIMED. IN NO EVENT SHALL THE APACHE SOFTWARE FOUNDATION OR
|
||||
* ITS CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL,
|
||||
* SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT
|
||||
* LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF
|
||||
* USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND
|
||||
* ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY,
|
||||
* OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT
|
||||
* OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF
|
||||
* SUCH DAMAGE.
|
||||
* ====================================================================
|
||||
*
|
||||
* This software consists of voluntary contributions made by many
|
||||
* individuals on behalf of the Apache Software Foundation. For more
|
||||
* information on the Apache Software Foundation, please see
|
||||
* <http://www.apache.org/>.
|
||||
*/
|
||||
|
||||
package de.lanlab.larm.threads;
|
||||
|
||||
import java.util.Vector;
|
||||
import java.util.Iterator;
|
||||
import java.io.*;
|
||||
import java.util.*;
|
||||
import de.lanlab.larm.util.*;
|
||||
|
||||
/**
|
||||
* This thread class acts like a server. It's running idle within
|
||||
* a thread pool until "runTask" is called. The given task will then
|
||||
* be executed asynchronously
|
||||
*/
|
||||
public class ServerThread extends Thread
|
||||
{
|
||||
/**
|
||||
* the task that is to be executed. null in idle-mode
|
||||
*/
|
||||
protected InterruptableTask task = null;
|
||||
|
||||
private boolean busy = false;
|
||||
|
||||
private ArrayList listeners = new ArrayList();
|
||||
private boolean isInterrupted = false;
|
||||
private int threadNumber;
|
||||
|
||||
SimpleLogger log;
|
||||
SimpleLogger errorLog;
|
||||
|
||||
public ServerThread(int threadNumber, String name, ThreadGroup threadGroup)
|
||||
{
|
||||
super(threadGroup, name);
|
||||
init(threadNumber);
|
||||
}
|
||||
|
||||
|
||||
public ServerThread(int threadNumber, String name)
|
||||
{
|
||||
super(name);
|
||||
init(threadNumber);
|
||||
}
|
||||
|
||||
void init(int threadNumber)
|
||||
{
|
||||
this.threadNumber = threadNumber;
|
||||
File logDir = new File("logs");
|
||||
logDir.mkdir();
|
||||
log = new SimpleLogger("thread" + threadNumber);
|
||||
errorLog = new SimpleLogger("thread" + threadNumber + "_errors");
|
||||
|
||||
}
|
||||
|
||||
/**
|
||||
* constructor
|
||||
* @param threadNumber assigns an arbitrary number to this thread
|
||||
* used by ServerThreadFactory
|
||||
*/
|
||||
public ServerThread(int threadNumber)
|
||||
{
|
||||
init(threadNumber);
|
||||
}
|
||||
|
||||
/**
|
||||
* the run method runs asynchronously. It waits until runTask() is
|
||||
* called
|
||||
*/
|
||||
public void run()
|
||||
{
|
||||
try
|
||||
{
|
||||
|
||||
while(!isInterrupted)
|
||||
{
|
||||
synchronized(this)
|
||||
{
|
||||
while(task == null)
|
||||
{
|
||||
wait();
|
||||
}
|
||||
}
|
||||
task.run(this);
|
||||
taskReady();
|
||||
}
|
||||
}
|
||||
catch(InterruptedException e)
|
||||
{
|
||||
System.out.println("ServerThread " + threadNumber + " interrupted");
|
||||
log.log("** Thread Interrupted **");
|
||||
}
|
||||
}
|
||||
|
||||
|
||||
/**
|
||||
* this is the main method that will invoke a task to run.
|
||||
*/
|
||||
public synchronized void runTask(InterruptableTask t)
|
||||
{
|
||||
busy = true;
|
||||
task = t;
|
||||
notify();
|
||||
}
|
||||
|
||||
/**
|
||||
* it should be possible to interrupt a task with this function.
|
||||
* therefore, the task has to check its interrupted()-state
|
||||
*/
|
||||
public void interruptTask()
|
||||
{
|
||||
if(task != null)
|
||||
{
|
||||
task.interrupt();
|
||||
}
|
||||
}
|
||||
|
||||
/**
|
||||
* the server thread can either be in idle or busy mode
|
||||
*/
|
||||
public boolean isBusy()
|
||||
{
|
||||
return busy;
|
||||
}
|
||||
|
||||
public void addTaskReadyListener(TaskReadyListener l)
|
||||
{
|
||||
listeners.add(l);
|
||||
}
|
||||
|
||||
public void removeTaskReadyListener(TaskReadyListener l)
|
||||
{
|
||||
listeners.remove(l);
|
||||
}
|
||||
|
||||
public void interrupt()
|
||||
{
|
||||
super.interrupt();
|
||||
isInterrupted = true;
|
||||
}
|
||||
|
||||
public int getThreadNumber()
|
||||
{
|
||||
return this.threadNumber;
|
||||
}
|
||||
|
||||
public InterruptableTask getTask()
|
||||
{
|
||||
return task;
|
||||
}
|
||||
|
||||
/**
|
||||
* this method will be called when the task ends. It notifies all
|
||||
* of its observers about its changed state
|
||||
*/
|
||||
protected void taskReady()
|
||||
{
|
||||
task = null;
|
||||
busy = false;
|
||||
Iterator Ie = listeners.iterator();
|
||||
while(Ie.hasNext())
|
||||
{
|
||||
((TaskReadyListener)Ie.next()).taskReady(this);
|
||||
}
|
||||
}
|
||||
|
||||
public SimpleLogger getLog()
|
||||
{
|
||||
return log;
|
||||
}
|
||||
|
||||
public SimpleLogger getErrorLog()
|
||||
{
|
||||
return errorLog;
|
||||
}
|
||||
}
|
|
@ -1,132 +0,0 @@
|
|||
/* ====================================================================
|
||||
* The Apache Software License, Version 1.1
|
||||
*
|
||||
* Copyright (c) 2001 The Apache Software Foundation. All rights
|
||||
* reserved.
|
||||
*
|
||||
* Redistribution and use in source and binary forms, with or without
|
||||
* modification, are permitted provided that the following conditions
|
||||
* are met:
|
||||
*
|
||||
* 1. Redistributions of source code must retain the above copyright
|
||||
* notice, this list of conditions and the following disclaimer.
|
||||
*
|
||||
* 2. Redistributions in binary form must reproduce the above copyright
|
||||
* notice, this list of conditions and the following disclaimer in
|
||||
* the documentation and/or other materials provided with the
|
||||
* distribution.
|
||||
*
|
||||
* 3. The end-user documentation included with the redistribution,
|
||||
* if any, must include the following acknowledgment:
|
||||
* "This product includes software developed by the
|
||||
* Apache Software Foundation (http://www.apache.org/)."
|
||||
* Alternately, this acknowledgment may appear in the software itself,
|
||||
* if and wherever such third-party acknowledgments normally appear.
|
||||
*
|
||||
* 4. The names "Apache" and "Apache Software Foundation" and
|
||||
* "Apache Lucene" must not be used to endorse or promote products
|
||||
* derived from this software without prior written permission. For
|
||||
* written permission, please contact apache@apache.org.
|
||||
*
|
||||
* 5. Products derived from this software may not be called "Apache",
|
||||
* "Apache Lucene", nor may "Apache" appear in their name, without
|
||||
* prior written permission of the Apache Software Foundation.
|
||||
*
|
||||
* THIS SOFTWARE IS PROVIDED ``AS IS'' AND ANY EXPRESSED OR IMPLIED
|
||||
* WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED WARRANTIES
|
||||
* OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE
|
||||
* DISCLAIMED. IN NO EVENT SHALL THE APACHE SOFTWARE FOUNDATION OR
|
||||
* ITS CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL,
|
||||
* SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT
|
||||
* LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF
|
||||
* USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND
|
||||
* ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY,
|
||||
* OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT
|
||||
* OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF
|
||||
* SUCH DAMAGE.
|
||||
* ====================================================================
|
||||
*
|
||||
* This software consists of voluntary contributions made by many
|
||||
* individuals on behalf of the Apache Software Foundation. For more
|
||||
* information on the Apache Software Foundation, please see
|
||||
* <http://www.apache.org/>.
|
||||
*/
|
||||
|
||||
package de.lanlab.larm.threads;
|
||||
|
||||
import de.lanlab.larm.util.Queue;
|
||||
import java.util.Collection;
|
||||
import java.util.LinkedList;
|
||||
import java.util.Iterator;
|
||||
|
||||
|
||||
/**
|
||||
* Title: LARM Lanlab Retrieval Machine
|
||||
* Description:
|
||||
* Copyright: Copyright (c)
|
||||
* Company:
|
||||
* @author
|
||||
* @version $Id$
|
||||
*/
|
||||
public class TaskQueue implements Queue
|
||||
{
|
||||
private LinkedList queue = new LinkedList();
|
||||
|
||||
/**
|
||||
*
|
||||
*/
|
||||
public TaskQueue()
|
||||
{
|
||||
}
|
||||
|
||||
|
||||
public void insertMultiple(Collection c)
|
||||
{
|
||||
throw new UnsupportedOperationException();
|
||||
}
|
||||
|
||||
/**
|
||||
* push a task to the start of the queue
|
||||
* @param i the task
|
||||
*/
|
||||
public void insert(Object i)
|
||||
{
|
||||
queue.addFirst(i);
|
||||
}
|
||||
|
||||
/**
|
||||
* get the last element out of the queue
|
||||
* The element will be removed from the queue
|
||||
* @return the task
|
||||
*/
|
||||
public Object remove()
|
||||
{
|
||||
return queue.isEmpty() ? null : (InterruptableTask)queue.removeLast();
|
||||
}
|
||||
|
||||
/**
|
||||
*
|
||||
*/
|
||||
public Iterator iterator()
|
||||
{
|
||||
return queue.iterator();
|
||||
}
|
||||
|
||||
/**
|
||||
*
|
||||
*/
|
||||
public void clear()
|
||||
{
|
||||
queue.clear();
|
||||
}
|
||||
|
||||
public boolean isEmpty()
|
||||
{
|
||||
return queue.isEmpty();
|
||||
}
|
||||
|
||||
public int size()
|
||||
{
|
||||
return queue.size();
|
||||
}
|
||||
}
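The TaskQueue above inserts at the head of a LinkedList and removes from the tail, which yields first-in, first-out order. A minimal sketch of the same idea with generics follows. The class name `FifoTaskQueueSketch` is hypothetical and not part of LARM, and `ArrayDeque` is swapped in for the original `LinkedList`.

```java
import java.util.ArrayDeque;
import java.util.Deque;

// Hypothetical illustration only; not the LARM TaskQueue itself.
public class FifoTaskQueueSketch {
    private final Deque<Object> queue = new ArrayDeque<>();

    // New tasks go to the head of the deque, as in TaskQueue.insert().
    public void insert(Object task) {
        queue.addFirst(task);
    }

    // The oldest task sits at the tail; pollLast() returns null when empty,
    // matching the null result of TaskQueue.remove() on an empty queue.
    public Object remove() {
        return queue.pollLast();
    }

    public int size() {
        return queue.size();
    }

    public static void main(String[] args) {
        FifoTaskQueueSketch q = new FifoTaskQueueSketch();
        q.insert("first");
        q.insert("second");
        System.out.println(q.remove()); // first
        System.out.println(q.remove()); // second
    }
}
```

Because insertion and removal happen at opposite ends, tasks leave the queue in the order in which they arrived.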
|
|
@ -1,63 +0,0 @@
|
|||
/* ====================================================================
|
||||
* The Apache Software License, Version 1.1
|
||||
*
|
||||
* Copyright (c) 2001 The Apache Software Foundation. All rights
|
||||
* reserved.
|
||||
*
|
||||
* Redistribution and use in source and binary forms, with or without
|
||||
* modification, are permitted provided that the following conditions
|
||||
* are met:
|
||||
*
|
||||
* 1. Redistributions of source code must retain the above copyright
|
||||
* notice, this list of conditions and the following disclaimer.
|
||||
*
|
||||
* 2. Redistributions in binary form must reproduce the above copyright
|
||||
* notice, this list of conditions and the following disclaimer in
|
||||
* the documentation and/or other materials provided with the
|
||||
* distribution.
|
||||
*
|
||||
* 3. The end-user documentation included with the redistribution,
|
||||
* if any, must include the following acknowledgment:
|
||||
* "This product includes software developed by the
|
||||
* Apache Software Foundation (http://www.apache.org/)."
|
||||
* Alternately, this acknowledgment may appear in the software itself,
|
||||
* if and wherever such third-party acknowledgments normally appear.
|
||||
*
|
||||
* 4. The names "Apache" and "Apache Software Foundation" and
|
||||
* "Apache Lucene" must not be used to endorse or promote products
|
||||
* derived from this software without prior written permission. For
|
||||
* written permission, please contact apache@apache.org.
|
||||
*
|
||||
* 5. Products derived from this software may not be called "Apache",
|
||||
* "Apache Lucene", nor may "Apache" appear in their name, without
|
||||
* prior written permission of the Apache Software Foundation.
|
||||
*
|
||||
* THIS SOFTWARE IS PROVIDED ``AS IS'' AND ANY EXPRESSED OR IMPLIED
|
||||
* WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED WARRANTIES
|
||||
* OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE
|
||||
* DISCLAIMED. IN NO EVENT SHALL THE APACHE SOFTWARE FOUNDATION OR
|
||||
* ITS CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL,
|
||||
* SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT
|
||||
* LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF
|
||||
* USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND
|
||||
* ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY,
|
||||
* OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT
|
||||
* OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF
|
||||
* SUCH DAMAGE.
|
||||
* ====================================================================
|
||||
*
|
||||
* This software consists of voluntary contributions made by many
|
||||
* individuals on behalf of the Apache Software Foundation. For more
|
||||
* information on the Apache Software Foundation, please see
|
||||
* <http://www.apache.org/>.
|
||||
*/
|
||||
|
||||
package de.lanlab.larm.threads;
|
||||
|
||||
import de.lanlab.larm.util.Observer;
|
||||
|
||||
public interface TaskReadyListener extends Observer
|
||||
{
|
||||
public void taskReady(ServerThread s);
|
||||
}
|
||||
|
|
@ -1,65 +0,0 @@
|
|||
/* ====================================================================
|
||||
* The Apache Software License, Version 1.1
|
||||
*
|
||||
* Copyright (c) 2001 The Apache Software Foundation. All rights
|
||||
* reserved.
|
||||
*
|
||||
* Redistribution and use in source and binary forms, with or without
|
||||
* modification, are permitted provided that the following conditions
|
||||
* are met:
|
||||
*
|
||||
* 1. Redistributions of source code must retain the above copyright
|
||||
* notice, this list of conditions and the following disclaimer.
|
||||
*
|
||||
* 2. Redistributions in binary form must reproduce the above copyright
|
||||
* notice, this list of conditions and the following disclaimer in
|
||||
* the documentation and/or other materials provided with the
|
||||
* distribution.
|
||||
*
|
||||
* 3. The end-user documentation included with the redistribution,
|
||||
* if any, must include the following acknowledgment:
|
||||
* "This product includes software developed by the
|
||||
* Apache Software Foundation (http://www.apache.org/)."
|
||||
* Alternately, this acknowledgment may appear in the software itself,
|
||||
* if and wherever such third-party acknowledgments normally appear.
|
||||
*
|
||||
* 4. The names "Apache" and "Apache Software Foundation" and
|
||||
* "Apache Lucene" must not be used to endorse or promote products
|
||||
* derived from this software without prior written permission. For
|
||||
* written permission, please contact apache@apache.org.
|
||||
*
|
||||
* 5. Products derived from this software may not be called "Apache",
|
||||
* "Apache Lucene", nor may "Apache" appear in their name, without
|
||||
* prior written permission of the Apache Software Foundation.
|
||||
*
|
||||
* THIS SOFTWARE IS PROVIDED ``AS IS'' AND ANY EXPRESSED OR IMPLIED
|
||||
* WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED WARRANTIES
|
||||
* OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE
|
||||
* DISCLAIMED. IN NO EVENT SHALL THE APACHE SOFTWARE FOUNDATION OR
|
||||
* ITS CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL,
|
||||
* SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT
|
||||
* LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF
|
||||
* USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND
|
||||
* ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY,
|
||||
* OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT
|
||||
* OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF
|
||||
* SUCH DAMAGE.
|
||||
* ====================================================================
|
||||
*
|
||||
* This software consists of voluntary contributions made by many
|
||||
* individuals on behalf of the Apache Software Foundation. For more
|
||||
* information on the Apache Software Foundation, please see
|
||||
* <http://www.apache.org/>.
|
||||
*/
|
||||
|
||||
package de.lanlab.larm.threads;
|
||||
|
||||
public class ThreadFactory
|
||||
{
|
||||
// static int count = 0;
|
||||
|
||||
public ServerThread createServerThread(int count)
|
||||
{
|
||||
return new ServerThread(count);
|
||||
}
|
||||
}
|
|
@ -1,433 +0,0 @@
|
|||
/* ====================================================================
|
||||
* The Apache Software License, Version 1.1
|
||||
*
|
||||
* Copyright (c) 2001 The Apache Software Foundation. All rights
|
||||
* reserved.
|
||||
*
|
||||
* Redistribution and use in source and binary forms, with or without
|
||||
* modification, are permitted provided that the following conditions
|
||||
* are met:
|
||||
*
|
||||
* 1. Redistributions of source code must retain the above copyright
|
||||
* notice, this list of conditions and the following disclaimer.
|
||||
*
|
||||
* 2. Redistributions in binary form must reproduce the above copyright
|
||||
* notice, this list of conditions and the following disclaimer in
|
||||
* the documentation and/or other materials provided with the
|
||||
* distribution.
|
||||
*
|
||||
* 3. The end-user documentation included with the redistribution,
|
||||
* if any, must include the following acknowledgment:
|
||||
* "This product includes software developed by the
|
||||
* Apache Software Foundation (http://www.apache.org/)."
|
||||
* Alternately, this acknowledgment may appear in the software itself,
|
||||
* if and wherever such third-party acknowledgments normally appear.
|
||||
*
|
||||
* 4. The names "Apache" and "Apache Software Foundation" and
|
||||
* "Apache Lucene" must not be used to endorse or promote products
|
||||
* derived from this software without prior written permission. For
|
||||
* written permission, please contact apache@apache.org.
|
||||
*
|
||||
* 5. Products derived from this software may not be called "Apache",
|
||||
* "Apache Lucene", nor may "Apache" appear in their name, without
|
||||
* prior written permission of the Apache Software Foundation.
|
||||
*
|
||||
* THIS SOFTWARE IS PROVIDED ``AS IS'' AND ANY EXPRESSED OR IMPLIED
|
||||
* WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED WARRANTIES
|
||||
* OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE
|
||||
* DISCLAIMED. IN NO EVENT SHALL THE APACHE SOFTWARE FOUNDATION OR
|
||||
* ITS CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL,
|
||||
* SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT
|
||||
* LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF
|
||||
* USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND
|
||||
* ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY,
|
||||
* OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT
|
||||
* OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF
|
||||
* SUCH DAMAGE.
|
||||
* ====================================================================
|
||||
*
|
||||
* This software consists of voluntary contributions made by many
|
||||
* individuals on behalf of the Apache Software Foundation. For more
|
||||
* information on the Apache Software Foundation, please see
|
||||
* <http://www.apache.org/>.
|
||||
*/
|
||||
|
||||
package de.lanlab.larm.threads;
|
||||
|
||||
//import java.util.Vector;
|
||||
import java.util.*;
|
||||
|
||||
/**
|
||||
* if you have many tasks to accomplish, you can do this with one of the
|
||||
* following strategies:
|
||||
* <ul>
|
||||
* <li> do it one after another (single threaded). this may often be
|
||||
* inefficient because most programs often wait for external resources
|
||||
* <li> assign a new thread for each task (thread on demand). This will clog
|
||||
* up the system if many tasks have to be accomplished simultaneously
|
||||
* <li> hold a fixed number of threads, and queue the requests if there are more
|
||||
* tasks than threads (ThreadPool).
|
||||
* </ul>
|
||||
* This thread pool is based on an article in Java-Magazin 06/2000.
|
||||
* synchronizations were removed unless necessary
|
||||
*
|
||||
*
|
||||
*/
|
||||
public class ThreadPool implements ThreadingStrategy, TaskReadyListener {
|
||||
private int maxThreads = MAX_THREADS;
|
||||
/**
|
||||
* references to all threads are stored here
|
||||
*/
|
||||
private HashMap allThreads = new HashMap();
|
||||
/**
|
||||
* this vector takes all idle threads
|
||||
*/
|
||||
private Vector idleThreads = new Vector();
|
||||
/**
|
||||
* this vector takes all threads that are in operation (busy)
|
||||
*/
|
||||
private Vector busyThreads = new Vector();
|
||||
|
||||
/**
|
||||
* if there are no idleThreads, tasks will go here
|
||||
*/
|
||||
private TaskQueue queue = new TaskQueue();
|
||||
|
||||
/**
|
||||
* thread pool observers will be notified of status changes
|
||||
*/
|
||||
private Vector threadPoolObservers = new Vector();
|
||||
|
||||
private boolean isStopped = false;
|
||||
|
||||
/**
|
||||
* default maximum number of threads, if not given by the user
|
||||
*/
|
||||
public final static int MAX_THREADS = 5;
|
||||
|
||||
/**
|
||||
* thread was created
|
||||
*/
|
||||
public final static String THREAD_CREATE = "T_CREATE";
|
||||
/**
|
||||
* thread was started
|
||||
*/
|
||||
public final static String THREAD_START = "T_START";
|
||||
/**
|
||||
* thread is running
|
||||
*/
|
||||
public final static String THREAD_RUNNING = "T_RUNNING";
|
||||
/**
|
||||
* thread was stopped
|
||||
*/
|
||||
public final static String THREAD_STOP = "T_STOP";
|
||||
/**
|
||||
* thread was destroyed
|
||||
*/
|
||||
public final static String THREAD_END = "T_END";
|
||||
/**
|
||||
* thread is idle
|
||||
*/
|
||||
public final static String THREAD_IDLE = "T_IDLE";
|
||||
|
||||
/**
|
||||
* a task was added to the queue, because all threads were busy
|
||||
*/
|
||||
public final static String THREADQUEUE_ADD = "TQ_ADD";
|
||||
|
||||
/**
|
||||
* a task was removed from the queue, because a thread had finished and was
|
||||
* ready
|
||||
*/
|
||||
public final static String THREADQUEUE_REMOVE = "TQ_REMOVE";
|
||||
|
||||
/**
|
||||
* this factory will create the threads
|
||||
*/
|
||||
ThreadFactory factory;
|
||||
|
||||
|
||||
/**
|
||||
* this constructor will create the pool with MAX_THREADS threads and the
|
||||
* default factory
|
||||
*/
|
||||
public ThreadPool() {
|
||||
this(MAX_THREADS, new ThreadFactory());
|
||||
}
|
||||
|
||||
|
||||
/**
|
||||
* this constructor will create the pool with the default Factory
|
||||
*
|
||||
*@param max the maximum number of threads
|
||||
*/
|
||||
public ThreadPool(int max) {
|
||||
this(max, new ThreadFactory());
|
||||
}
|
||||
|
||||
|
||||
/**
|
||||
* constructor
|
||||
*
|
||||
*@param max maximum number of threads
|
||||
*@param factory the thread factory with which the threads will be created
|
||||
*/
|
||||
public ThreadPool(int max, ThreadFactory factory) {
|
||||
maxThreads = max;
|
||||
this.factory = factory;
|
||||
}
|
||||
|
||||
|
||||
/**
|
||||
* this init method will create the threads. It must be called explicitly
|
||||
*/
|
||||
public void init() {
|
||||
for (int i = 0; i < maxThreads; i++) {
|
||||
createThread(i);
|
||||
}
|
||||
}
|
||||
|
||||
|
||||
/**
|
||||
* creates a server thread through the factory, registers it as idle and starts it
|
||||
*
|
||||
*@param i the thread number
|
||||
*/
|
||||
public void createThread(int i) {
|
||||
ServerThread s = factory.createServerThread(i);
|
||||
idleThreads.add(s);
|
||||
allThreads.put(new Integer(i), s);
|
||||
s.addTaskReadyListener(this);
|
||||
sendMessage(i, THREAD_CREATE, "");
|
||||
s.start();
|
||||
sendMessage(i, THREAD_IDLE, "");
|
||||
}
|
||||
|
||||
|
||||
// FIXME: buggy with respect to synchronization
|
||||
/**
|
||||
* interrupts the thread with the given number, removes it from the pool and creates a replacement
|
||||
*
|
||||
*@param i the thread number
|
||||
*/
|
||||
public void restartThread(int i) {
|
||||
sendMessage(i, THREAD_STOP, "");
|
||||
ServerThread t = (ServerThread) allThreads.get(new Integer(i));
|
||||
idleThreads.remove(t);
|
||||
busyThreads.remove(t);
|
||||
allThreads.remove(new Integer(i));
|
||||
t.interruptTask();
|
||||
t.interrupt();
|
||||
//t.join();
|
||||
// deprecated, I know, but the only way to overcome SUN's bugs
|
||||
t = null;
|
||||
createThread(i);
|
||||
}
|
||||
|
||||
|
||||
/**
|
||||
* runs the task on an idle thread if one is available; otherwise the task is queued
|
||||
*
|
||||
*@param t the task to run
|
||||
*@param key not used by this implementation
|
||||
*/
|
||||
public synchronized void doTask(InterruptableTask t, Object key) {
|
||||
if (!idleThreads.isEmpty()) {
|
||||
ServerThread s = (ServerThread) idleThreads.firstElement();
|
||||
idleThreads.remove(s);
|
||||
busyThreads.add(s);
|
||||
sendMessage(s.getThreadNumber(), THREAD_START, t.getInfo());
|
||||
s.runTask(t);
|
||||
sendMessage(s.getThreadNumber(), THREAD_RUNNING, t.getInfo());
|
||||
} else {
|
||||
|
||||
queue.insert(t);
|
||||
sendMessage(-1, THREADQUEUE_ADD, t.getInfo());
|
||||
}
|
||||
}
|
||||
|
||||
|
||||
/**
|
||||
* this will interrupt all queued tasks and the tasks of all busy threads. The
|
||||
* InterruptableTasks must therefore check their interrupted flag
|
||||
*/
|
||||
public void interrupt() {
|
||||
Iterator tasks = queue.iterator();
|
||||
while (tasks.hasNext()) {
|
||||
InterruptableTask t = (InterruptableTask) tasks.next();
|
||||
t.interrupt();
|
||||
sendMessage(-1, THREADQUEUE_REMOVE, t.getInfo());
|
||||
// hoping that everything works out...
|
||||
}
|
||||
queue.clear();
|
||||
Iterator threads = busyThreads.iterator();
|
||||
while (threads.hasNext()) {
|
||||
((ServerThread) threads.next()).interruptTask();
|
||||
}
|
||||
}
|
||||
|
||||
|
||||
/**
|
||||
* this will interrupt the tasks and end all threads
|
||||
*/
|
||||
public void stop() {
|
||||
isStopped = true;
|
||||
interrupt();
|
||||
Iterator threads = idleThreads.iterator();
|
||||
while (threads.hasNext()) {
|
||||
((ServerThread) threads.next()).interruptTask();
|
||||
}
|
||||
idleThreads.clear();
|
||||
}
|
||||
|
||||
|
||||
/**
|
||||
* called by a ServerThread when it has finished its task
|
||||
*
|
||||
*@param s the calling thread
|
||||
*/
|
||||
public synchronized void taskReady(ServerThread s) {
|
||||
if (isStopped) {
|
||||
s.interrupt();
|
||||
sendMessage(s.getThreadNumber(), THREAD_STOP, s.getTask().getInfo());
|
||||
busyThreads.remove(s);
|
||||
} else if (!queue.isEmpty()) {
|
||||
InterruptableTask t = (InterruptableTask) queue.remove();
|
||||
//queue.remove(t);
|
||||
sendMessage(-1, THREADQUEUE_REMOVE, t.getInfo());
|
||||
sendMessage(s.getThreadNumber(), THREAD_START, "");
|
||||
s.runTask(t);
|
||||
sendMessage(s.getThreadNumber(), THREAD_RUNNING, s.getTask().getInfo());
|
||||
} else {
|
||||
sendMessage(s.getThreadNumber(), THREAD_IDLE, "");
|
||||
idleThreads.add(s);
|
||||
busyThreads.remove(s);
|
||||
}
|
||||
synchronized (idleThreads) {
|
||||
idleThreads.notify();
|
||||
}
|
||||
|
||||
}
|
||||
|
||||
|
||||
/**
|
||||
* blocks the caller until no thread is busy any more
|
||||
*/
|
||||
public void waitForFinish() {
|
||||
synchronized (idleThreads) {
|
||||
while (busyThreads.size() != 0) {
|
||||
//System.out.println("busyThreads: " + busyThreads.size());
|
||||
try {
|
||||
idleThreads.wait();
|
||||
} catch (InterruptedException e) {
|
||||
System.out.println("Interrupted: " + e.getMessage());
|
||||
}
|
||||
}
|
||||
//System.out.println("busyThreads: " + busyThreads.size());
|
||||
}
|
||||
}
|
||||
|
||||
|
||||
/**
|
||||
* registers an observer that will be notified of thread and queue status changes
|
||||
*
|
||||
*@param o the observer to add
|
||||
*/
|
||||
public void addThreadPoolObserver(ThreadPoolObserver o) {
|
||||
threadPoolObservers.add(o);
|
||||
}
|
||||
|
||||
|
||||
/**
|
||||
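* notifies all observers: queueUpdate() if threadNr is -1, threadUpdate() otherwise
|
||||
*
|
||||
*@param threadNr the thread number, or -1 for a queue event
|
||||
*@param action one of the THREAD_* or THREADQUEUE_* constants
|
||||
*@param info additional information, e.g. the task info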
|
||||
*/
|
||||
protected void sendMessage(int threadNr, String action, String info) {
|
||||
|
||||
Iterator Ie = threadPoolObservers.iterator();
|
||||
//System.out.println("ThreadPool: sending " + action + " message to " + threadPoolObservers.size() + " observers");
|
||||
if (threadNr != -1) {
|
||||
while (Ie.hasNext()) {
|
||||
((ThreadPoolObserver) Ie.next()).threadUpdate(threadNr, action, info);
|
||||
}
|
||||
} else {
|
||||
while (Ie.hasNext()) {
|
||||
((ThreadPoolObserver) Ie.next()).queueUpdate(info, action);
|
||||
}
|
||||
}
|
||||
}
|
||||
|
||||
|
||||
/**
|
||||
* Gets the queueSize attribute of the ThreadPool object
|
||||
*
|
||||
*@return The queueSize value
|
||||
*/
|
||||
public synchronized int getQueueSize() {
|
||||
return this.queue.size();
|
||||
}
|
||||
|
||||
|
||||
/**
|
||||
* Gets the idleThreadsCount attribute of the ThreadPool object
|
||||
*
|
||||
*@return The idleThreadsCount value
|
||||
*/
|
||||
public synchronized int getIdleThreadsCount() {
|
||||
return this.idleThreads.size();
|
||||
}
|
||||
|
||||
|
||||
/**
|
||||
* Gets the busyThreadsCount attribute of the ThreadPool object
|
||||
*
|
||||
*@return The busyThreadsCount value
|
||||
*/
|
||||
public synchronized int getBusyThreadsCount() {
|
||||
return this.busyThreads.size();
|
||||
}
|
||||
|
||||
|
||||
/**
|
||||
* Gets the threadCount attribute of the ThreadPool object
|
||||
*
|
||||
*@return The threadCount value
|
||||
*/
|
||||
public synchronized int getThreadCount() {
|
||||
return this.idleThreads.size() + this.busyThreads.size();
|
||||
}
|
||||
|
||||
|
||||
/**
|
||||
* Gets the threadIterator attribute of the ThreadPool object
|
||||
*
|
||||
*@return The threadIterator value
|
||||
*/
|
||||
public Iterator getThreadIterator() {
|
||||
return allThreads.values().iterator();
|
||||
// return allThreads.iterator();
|
||||
}
|
||||
|
||||
|
||||
/**
|
||||
* replaces the task queue used when no thread is idle
|
||||
*
|
||||
*@param queue the new task queue
|
||||
*/
|
||||
public void setQueue(TaskQueue queue) {
|
||||
this.queue = queue;
|
||||
}
|
||||
|
||||
public TaskQueue getTaskQueue()
|
||||
{
|
||||
return queue;
|
||||
}
|
||||
|
||||
}
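The class comment of ThreadPool describes the third strategy: a fixed set of threads plus a queue for surplus tasks. As a point of comparison only, the sketch below shows that strategy with `java.util.concurrent`, which appeared in the JDK after this code was written. The class name `FixedPoolSketch` and the method `runTasks` are hypothetical; this is not how LARM implements its pool.

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical illustration only; not the LARM ThreadPool itself.
public class FixedPoolSketch {

    // Runs the given number of trivial tasks on a fixed number of worker
    // threads and returns how many tasks completed.
    public static int runTasks(int threads, int tasks) {
        // Surplus tasks wait in the executor's internal queue,
        // much like ThreadPool.doTask() falls back to its TaskQueue.
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        AtomicInteger completed = new AtomicInteger();
        for (int i = 0; i < tasks; i++) {
            pool.execute(completed::incrementAndGet);
        }
        // Comparable to waitForFinish(): stop accepting work, then wait.
        pool.shutdown();
        try {
            if (!pool.awaitTermination(10, TimeUnit.SECONDS)) {
                pool.shutdownNow();
            }
        } catch (InterruptedException e) {
            pool.shutdownNow();
            Thread.currentThread().interrupt();
        }
        return completed.get();
    }

    public static void main(String[] args) {
        System.out.println(runTasks(5, 20));
    }
}
```

The executor keeps the idle/busy bookkeeping internal, so the explicit `idleThreads` and `busyThreads` vectors of the original have no counterpart here.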
|
||||
|
||||
|
|
@ -1,66 +0,0 @@
|
|||
/* ====================================================================
|
||||
* The Apache Software License, Version 1.1
|
||||
*
|
||||
* Copyright (c) 2001 The Apache Software Foundation. All rights
|
||||
* reserved.
|
||||
*
|
||||
* Redistribution and use in source and binary forms, with or without
|
||||
* modification, are permitted provided that the following conditions
|
||||
* are met:
|
||||
*
|
||||
* 1. Redistributions of source code must retain the above copyright
|
||||
* notice, this list of conditions and the following disclaimer.
|
||||
*
|
||||
* 2. Redistributions in binary form must reproduce the above copyright
|
||||
* notice, this list of conditions and the following disclaimer in
|
||||
* the documentation and/or other materials provided with the
|
||||
* distribution.
|
||||
*
|
||||
* 3. The end-user documentation included with the redistribution,
|
||||
* if any, must include the following acknowledgment:
|
||||
* "This product includes software developed by the
|
||||
* Apache Software Foundation (http://www.apache.org/)."
|
||||
* Alternately, this acknowledgment may appear in the software itself,
|
||||
* if and wherever such third-party acknowledgments normally appear.
|
||||
*
|
||||
* 4. The names "Apache" and "Apache Software Foundation" and
|
||||
* "Apache Lucene" must not be used to endorse or promote products
|
||||
* derived from this software without prior written permission. For
|
||||
* written permission, please contact apache@apache.org.
|
||||
*
|
||||
* 5. Products derived from this software may not be called "Apache",
|
||||
* "Apache Lucene", nor may "Apache" appear in their name, without
|
||||
* prior written permission of the Apache Software Foundation.
|
||||
*
|
||||
* THIS SOFTWARE IS PROVIDED ``AS IS'' AND ANY EXPRESSED OR IMPLIED
|
||||
* WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED WARRANTIES
|
||||
* OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE
|
||||
* DISCLAIMED. IN NO EVENT SHALL THE APACHE SOFTWARE FOUNDATION OR
|
||||
* ITS CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL,
|
||||
* SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT
|
||||
* LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF
|
||||
* USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND
|
||||
* ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY,
|
||||
* OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT
|
||||
* OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF
|
||||
* SUCH DAMAGE.
|
||||
* ====================================================================
|
||||
*
|
||||
* This software consists of voluntary contributions made by many
|
||||
* individuals on behalf of the Apache Software Foundation. For more
|
||||
* information on the Apache Software Foundation, please see
|
||||
* <http://www.apache.org/>.
|
||||
*/
|
||||
|
||||
package de.lanlab.larm.threads;
|
||||
|
||||
import de.lanlab.larm.util.Observer;
|
||||
|
||||
/**
|
||||
* an observer that observes the thread pool...
|
||||
*/
|
||||
public interface ThreadPoolObserver extends Observer
|
||||
{
|
||||
public void queueUpdate(String info, String action);
|
||||
public void threadUpdate(int threadNr, String action, String info);
|
||||
}
|
|
@ -1,62 +0,0 @@
|
|||
/* ====================================================================
|
||||
* The Apache Software License, Version 1.1
|
||||
*
|
||||
* Copyright (c) 2001 The Apache Software Foundation. All rights
|
||||
* reserved.
|
||||
*
|
||||
* Redistribution and use in source and binary forms, with or without
|
||||
* modification, are permitted provided that the following conditions
|
||||
* are met:
|
||||
*
|
||||
* 1. Redistributions of source code must retain the above copyright
|
||||
* notice, this list of conditions and the following disclaimer.
|
||||
*
|
||||
* 2. Redistributions in binary form must reproduce the above copyright
|
||||
* notice, this list of conditions and the following disclaimer in
|
||||
* the documentation and/or other materials provided with the
|
||||
* distribution.
|
||||
*
|
||||
* 3. The end-user documentation included with the redistribution,
|
||||
* if any, must include the following acknowledgment:
|
||||
* "This product includes software developed by the
|
||||
* Apache Software Foundation (http://www.apache.org/)."
|
||||
* Alternately, this acknowledgment may appear in the software itself,
|
||||
* if and wherever such third-party acknowledgments normally appear.
|
||||
*
|
||||
* 4. The names "Apache" and "Apache Software Foundation" and
|
||||
* "Apache Lucene" must not be used to endorse or promote products
|
||||
* derived from this software without prior written permission. For
|
||||
* written permission, please contact apache@apache.org.
|
||||
*
|
||||
* 5. Products derived from this software may not be called "Apache",
|
||||
* "Apache Lucene", nor may "Apache" appear in their name, without
|
||||
* prior written permission of the Apache Software Foundation.
|
||||
*
|
||||
* THIS SOFTWARE IS PROVIDED ``AS IS'' AND ANY EXPRESSED OR IMPLIED
|
||||
* WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED WARRANTIES
|
||||
* OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE
|
||||
* DISCLAIMED. IN NO EVENT SHALL THE APACHE SOFTWARE FOUNDATION OR
|
||||
* ITS CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL,
|
||||
* SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT
|
||||
* LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF
|
||||
* USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND
|
||||
* ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY,
|
||||
* OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT
|
||||
* OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF
|
||||
* SUCH DAMAGE.
|
||||
* ====================================================================
|
||||
*
|
||||
* This software consists of voluntary contributions made by many
|
||||
* individuals on behalf of the Apache Software Foundation. For more
|
||||
* information on the Apache Software Foundation, please see
|
||||
* <http://www.apache.org/>.
|
||||
*/
|
||||
|
||||
package de.lanlab.larm.threads;
|
||||
|
||||
public interface ThreadingStrategy
|
||||
{
|
||||
public void doTask(InterruptableTask t, Object key);
|
||||
public void interrupt();
|
||||
public void stop();
|
||||
}
|
|
@ -1,772 +0,0 @@
|
|||
/* ====================================================================
|
||||
* The Apache Software License, Version 1.1
|
||||
*
|
||||
* Copyright (c) 2001 The Apache Software Foundation. All rights
|
||||
* reserved.
|
||||
*
|
||||
* Redistribution and use in source and binary forms, with or without
|
||||
* modification, are permitted provided that the following conditions
|
||||
* are met:
|
||||
*
|
||||
* 1. Redistributions of source code must retain the above copyright
|
||||
* notice, this list of conditions and the following disclaimer.
|
||||
*
|
||||
* 2. Redistributions in binary form must reproduce the above copyright
|
||||
* notice, this list of conditions and the following disclaimer in
|
||||
* the documentation and/or other materials provided with the
|
||||
* distribution.
|
||||
*
|
||||
* 3. The end-user documentation included with the redistribution,
|
||||
* if any, must include the following acknowledgment:
|
||||
* "This product includes software developed by the
|
||||
* Apache Software Foundation (http://www.apache.org/)."
|
||||
* Alternately, this acknowledgment may appear in the software itself,
|
||||
* if and wherever such third-party acknowledgments normally appear.
|
||||
*
|
||||
* 4. The names "Apache" and "Apache Software Foundation" and
|
||||
* "Apache Lucene" must not be used to endorse or promote products
|
||||
* derived from this software without prior written permission. For
|
||||
* written permission, please contact apache@apache.org.
|
||||
*
|
||||
* 5. Products derived from this software may not be called "Apache",
|
||||
* "Apache Lucene", nor may "Apache" appear in their name, without
|
||||
* prior written permission of the Apache Software Foundation.
|
||||
*
|
||||
* THIS SOFTWARE IS PROVIDED ``AS IS'' AND ANY EXPRESSED OR IMPLIED
|
||||
* WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED WARRANTIES
|
||||
* OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE
|
||||
* DISCLAIMED. IN NO EVENT SHALL THE APACHE SOFTWARE FOUNDATION OR
|
||||
* ITS CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL,
|
||||
* SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT
|
||||
* LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF
|
||||
* USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND
|
||||
* ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY,
|
||||
* OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT
|
||||
* OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF
|
||||
* SUCH DAMAGE.
|
||||
* ====================================================================
|
||||
*
|
||||
* This software consists of voluntary contributions made by many
|
||||
* individuals on behalf of the Apache Software Foundation. For more
|
||||
* information on the Apache Software Foundation, please see
|
||||
* <http://www.apache.org/>.
|
||||
*/
|
||||
|
||||
package de.lanlab.larm.util;
|
||||
|
||||
import java.io.*;
|
||||
import java.util.*;
|
||||
|
||||
|
||||
class StoreException extends RuntimeException
|
||||
{
|
||||
Exception origException;
|
||||
|
||||
|
||||
/**
|
||||
* Constructor for the StoreException object
|
||||
*
|
||||
* @param e the exception that caused the storage operation to fail
|
||||
*/
|
||||
public StoreException(Exception e)
|
||||
{
|
||||
origException = e;
|
||||
}
|
||||
|
||||
|
||||
/**
|
||||
* Gets the message attribute of the StoreException object
|
||||
*
|
||||
* @return the message of the wrapped exception
|
||||
*/
|
||||
public String getMessage()
|
||||
{
|
||||
return origException.getMessage();
|
||||
}
|
||||
|
||||
|
||||
/**
|
||||
* Prints a short notice followed by the stack trace of the wrapped exception
|
||||
*/
|
||||
public void printStackTrace()
|
||||
{
|
||||
System.err.println("StoreException occurred with reason: " + origException.getMessage());
|
||||
origException.printStackTrace();
|
||||
}
|
||||
}
|
||||
|
||||
/**
|
||||
* internal class that represents one block within a queue
|
||||
*
|
||||
* @author Clemens Marschner
|
||||
* @created 3 January 2002
|
||||
*/
|
||||
class QueueBlock
|
||||
{
|
||||
|
||||
|
||||
/**
|
||||
* the elements list is set to null while the block is on disk.
|
||||
* Elements must be Serializable
|
||||
*/
|
||||
LinkedList elements;
|
||||
|
||||
/**
|
||||
* number of elements in the block; a copy of elements.size()
|
||||
*/
|
||||
int size;
|
||||
|
||||
/**
|
||||
* maximum block size
|
||||
*/
|
||||
int maxSize;
|
||||
|
||||
/**
|
||||
* if set, elements is null and block was written to file
|
||||
*/
|
||||
boolean onDisk;
|
||||
|
||||
/**
|
||||
* block name
|
||||
*/
|
||||
String name;
|
||||
|
||||
|
||||
/**
|
||||
* initializes the block
|
||||
*
|
||||
* @param name the block name (must be unique, otherwise blocks collide at
|
||||
* the file level)
|
||||
* @param maxSize maximum block size. Overflows and underflows are reported
|
||||
* through exceptions
|
||||
*/
|
||||
public QueueBlock(String name, int maxSize)
|
||||
{
|
||||
this.name = name;
|
||||
this.onDisk = false;
|
||||
this.elements = new LinkedList();
|
||||
this.maxSize = maxSize;
|
||||
}
|
||||
|
||||
|
||||
/**
|
||||
* serializes the block and stores it on disk
|
||||
*
|
||||
* @exception StoreException if the block could not be written to disk
|
||||
*/
|
||||
public void store()
|
||||
throws StoreException
|
||||
{
|
||||
try
|
||||
{
|
||||
ObjectOutputStream o = new ObjectOutputStream(new FileOutputStream(getFileName()));
|
||||
o.writeObject(elements);
|
||||
elements = null;
|
||||
o.close();
|
||||
onDisk = true;
|
||||
//System.out.println("CachingQueue.store: Block stored");
|
||||
}
|
||||
catch (IOException e)
|
||||
{
|
||||
System.err.println("CachingQueue.store: IOException");
|
||||
throw new StoreException(e);
|
||||
}
|
||||
}
|
||||
|
||||
|
||||
/**
|
||||
* @return the filename of the block
|
||||
*/
|
||||
String getFileName()
|
||||
{
|
||||
// package protected!
|
||||
|
||||
return "cachingqueue/" + name + ".cqb";
|
||||
}
|
||||
|
||||
|
||||
/**
|
||||
* load the block from disk
|
||||
*
|
||||
* @exception StoreException if the block could not be read from disk
|
||||
*/
|
||||
public void load()
|
||||
throws StoreException
|
||||
{
|
||||
try
|
||||
{
|
||||
ObjectInputStream i = new ObjectInputStream(new FileInputStream(getFileName()));
|
||||
elements = (LinkedList) i.readObject();
|
||||
i.close();
|
||||
onDisk = false;
|
||||
size = elements.size();
|
||||
if (!(new File(getFileName()).delete()))
|
||||
{
|
||||
System.err.println("CachingQueue.load: file could not be deleted");
|
||||
}
|
||||
//System.out.println("CachingQueue.load: Block loaded");
|
||||
}
|
||||
catch (Exception e)
|
||||
{
|
||||
System.err.println("CachingQueue.load: Exception " + e.getClass().getName() + " occurred");
|
||||
throw new StoreException(e);
|
||||
}
|
||||
}
|
||||
|
||||
|
||||
/**
|
||||
* inserts an object at the start of the queue. Must be synchronized by the
|
||||
* calling class to be thread safe
|
||||
*
|
||||
* @param o the object to insert
|
||||
* @exception StoreException Description of the Exception
|
||||
*/
|
||||
public void insert(Object o)
|
||||
throws StoreException
|
||||
{
|
||||
if (onDisk)
|
||||
{
|
||||
load();
|
||||
}
|
||||
if (size >= maxSize)
|
||||
{
|
||||
throw new OverflowException();
|
||||
}
|
||||
elements.addFirst(o);
|
||||
size++;
|
||||
}
|
||||
|
||||
|
||||
/**
|
||||
* returns the last element of the queue and removes it. Must be
|
||||
* synchronized by the calling class to be thread safe
|
||||
*
|
||||
* @return the element removed from the end of the block
|
||||
* @exception UnderflowException Description of the Exception
|
||||
* @exception StoreException Description of the Exception
|
||||
*/
|
||||
public Object remove()
|
||||
throws UnderflowException, StoreException
|
||||
{
|
||||
if (onDisk)
|
||||
{
|
||||
load();
|
||||
}
|
||||
if (size <= 0)
|
||||
{
|
||||
throw new UnderflowException();
|
||||
}
|
||||
size--;
|
||||
return elements.removeLast();
|
||||
}
|
||||
|
||||
|
||||
/**
|
||||
* @return the number of elements in the block
|
||||
*/
|
||||
public int size()
|
||||
{
|
||||
return size;
|
||||
}
|
||||
|
||||
|
||||
/**
|
||||
* destructor. Assures that all files are deleted, even if the queue was not
|
||||
* empty at the time when the program ended
|
||||
*/
|
||||
public void finalize()
|
||||
{
|
||||
// System.err.println("finalize of " + name + " called");
|
||||
if (onDisk)
|
||||
{
|
||||
// delete the temp file. Happens if, e.g., an exception has occurred
|
||||
// System.err.println("CachingQueue.finalize of block " + name + ": deleting file");
|
||||
if (!(new File(getFileName()).delete()))
|
||||
{
|
||||
// a file error is possible because of an exception: ignore it
|
||||
|
||||
// System.err.println("CachingQueue.finalize: file could not be deleted although onDisk was true");
|
||||
}
|
||||
}
|
||||
}
|
||||
}
|
||||
|
||||
|
||||
/**
|
||||
* this class holds a queue whose data is kept on disk whenever possible.
|
||||
* It's a single ended queue, meaning data can only be added at the front and
|
||||
* taken from the back. The queue itself is divided into blocks. Only the first
|
||||
* and last blocks are kept in main memory, the rest is stored on disk. Only a
|
||||
* LinkedList entry is kept in memory then.
|
||||
* Blocks are swapped if an overflow (in case of insertions) or underflow (in case
|
||||
* of removals) occurs.<br>
|
||||
*
|
||||
* <pre>
|
||||
* +---+---+---+---+-+
|
||||
* put -> | M | S | S | S |M| -> remove
|
||||
* +---+---+---+---+-+
|
||||
* </pre>
|
||||
* The maximum number of entries per block can be specified with the blockSize parameter. Thus,
|
||||
* the queue actually holds a maximum number of 2 x blockSize objects in main memory,
|
||||
* plus a few bytes for each block.<br>
|
||||
* The objects contained in the blocks are stored with the standard Java
|
||||
* serialization mechanism.<br>
|
||||
* The files are named "cachingqueue/Queuename_BlockNumber.cqb".<br>
|
||||
* Note that only insert() and remove() are synchronized.
|
||||
* @author Clemens Marschner
|
||||
* @created 3 January 2002
|
||||
*/
|
||||
|
||||
public class CachingQueue implements Queue
|
||||
{
|
||||
/**
|
||||
* the Blocks
|
||||
*/
|
||||
LinkedList queueBlocks;
|
||||
|
||||
/**
|
||||
* fast access to the first block
|
||||
*/
|
||||
QueueBlock first = null;
|
||||
|
||||
/**
|
||||
* fast access to the last block
|
||||
*/
|
||||
QueueBlock last = null;
|
||||
|
||||
/**
|
||||
* maximum block size
|
||||
*/
|
||||
int blockSize;
|
||||
|
||||
/**
|
||||
* "primary key" identity count for each block
|
||||
*/
|
||||
int blockCount = 0;
|
||||
|
||||
/**
|
||||
* active blocks
|
||||
*/
|
||||
int numBlocks = 0;
|
||||
|
||||
/**
|
||||
* queue name
|
||||
*/
|
||||
String name;
|
||||
|
||||
/**
|
||||
* total number of objects
|
||||
*/
|
||||
int size;
|
||||
|
||||
|
||||
/**
|
||||
* init
|
||||
*
|
||||
* @param name the name of the queue, used in file names
|
||||
* @param blockSize maximum number of objects stored in one block
|
||||
*/
|
||||
public CachingQueue(String name, int blockSize)
|
||||
{
|
||||
queueBlocks = new LinkedList();
|
||||
this.name = name;
|
||||
this.blockSize = blockSize;
|
||||
// FIXME: the name of the caching queue directory needs to be in properties
|
||||
File cq = new File("cachingqueue");
|
||||
cq.mkdir();
|
||||
}
|
||||
|
||||
|
||||
/**
|
||||
* inserts an object to the front of the queue
|
||||
*
|
||||
* @param o the object to be inserted. must implement Serializable
|
||||
* @exception StoreException encapsulates Exceptions that occur when writing to hard disk
|
||||
*/
|
||||
public synchronized void insert(Object o)
|
||||
throws StoreException
|
||||
{
|
||||
if (last == null && first == null)
|
||||
{
|
||||
first = last = newBlock();
|
||||
queueBlocks.addFirst(first);
|
||||
numBlocks++;
|
||||
}
|
||||
if (last == null && first != null)
|
||||
{
|
||||
// affirm((last==null && first==null) || (last!= null && first!=null));
|
||||
System.err.println("Error in CachingQueue: last==null but first!=null");
|
||||
}
|
||||
|
||||
if (first.size() >= blockSize)
|
||||
{
|
||||
// save block and create a new one
|
||||
QueueBlock newBlock = newBlock();
|
||||
numBlocks++;
|
||||
if (last != first)
|
||||
{
|
||||
first.store();
|
||||
}
|
||||
queueBlocks.addFirst(newBlock);
|
||||
first = newBlock;
|
||||
}
|
||||
first.insert(o);
|
||||
size++;
|
||||
}
|
||||
|
||||
|
||||
/**
|
||||
* returns the last object from the queue
|
||||
*
|
||||
* @return the object returned
|
||||
*
|
||||
* @exception StoreException if a block could not be read from disk
|
||||
* @exception UnderflowException if the queue was empty
|
||||
*/
|
||||
public synchronized Object remove()
|
||||
throws StoreException, UnderflowException
|
||||
{
|
||||
if (last == null)
|
||||
{
|
||||
throw new UnderflowException();
|
||||
}
|
||||
if (last.size() <= 0)
|
||||
{
|
||||
queueBlocks.removeLast();
|
||||
numBlocks--;
|
||||
if (numBlocks == 1)
|
||||
{
|
||||
last = first;
|
||||
}
|
||||
else if (numBlocks == 0)
|
||||
{
|
||||
first = last = null;
|
||||
throw new UnderflowException();
|
||||
}
|
||||
else if (numBlocks < 0)
|
||||
{
|
||||
// affirm(numBlocks >= 0)
|
||||
System.err.println("CachingQueue.remove: numBlocks<0!");
|
||||
throw new UnderflowException();
|
||||
}
|
||||
else
|
||||
{
|
||||
last = (QueueBlock) queueBlocks.getLast();
|
||||
}
|
||||
}
|
||||
--size;
|
||||
return last.remove();
|
||||
}
|
||||
|
||||
|
||||
/**
|
||||
* not supported
|
||||
*
|
||||
* @param c Description of the Parameter
|
||||
*/
|
||||
public void insertMultiple(java.util.Collection c)
|
||||
{
|
||||
throw new UnsupportedOperationException();
|
||||
}
|
||||
|
||||
|
||||
/**
|
||||
* creates a new block
|
||||
*
|
||||
* @return a new, empty block with a name unique within this queue
|
||||
*/
|
||||
private QueueBlock newBlock()
|
||||
{
|
||||
return new QueueBlock(name + "_" + blockCount++, blockSize);
|
||||
}
|
||||
|
||||
|
||||
/**
|
||||
* total number of objects contained in the queue
|
||||
*
|
||||
* @return the number of objects in the queue
|
||||
*/
|
||||
public int size()
|
||||
{
|
||||
return size;
|
||||
}
|
||||
|
||||
public boolean isEmpty()
|
||||
{
|
||||
return size == 0;
|
||||
}
|
||||
|
||||
public void clear()
|
||||
{
|
||||
while(!isEmpty())
|
||||
{
|
||||
remove();
|
||||
}
|
||||
}
|
||||
|
||||
/**
|
||||
* testing
|
||||
*
|
||||
* @param args The command line arguments
|
||||
*/
|
||||
public static void main(String[] args)
|
||||
{
|
||||
System.out.println("Test1: " + CachingQueueTester.testUnderflow());
|
||||
System.out.println("Test2: " + CachingQueueTester.testInsert());
|
||||
System.out.println("Test3: " + CachingQueueTester.testBufReadWrite());
|
||||
System.out.println("Test4: " + CachingQueueTester.testBufReadWrite2());
|
||||
System.out.println("Test5: " + CachingQueueTester.testUnderflow2());
|
||||
System.out.println("Test6: " + CachingQueueTester.testBufReadWrite3());
|
||||
System.out.println("Test7: " + CachingQueueTester.testExceptions());
|
||||
}
|
||||
}
|
||||
|
||||
/**
|
||||
* Test class. TODO: move out and handle with JUnit
|
||||
*
|
||||
* @author Administrator
|
||||
* @created 3 January 2002
|
||||
*/
|
||||
class AssertionFailedException extends RuntimeException
|
||||
{
|
||||
}
|
||||
|
||||
/**
|
||||
* Test class. Contains some tests for the caching queue
|
||||
*
|
||||
* @author Administrator
|
||||
* @created 3 January 2002
|
||||
*/
|
||||
class CachingQueueTester
|
||||
{
|
||||
|
||||
|
||||
/**
|
||||
* A unit test for JUnit
|
||||
*
|
||||
* @return Description of the Return Value
|
||||
*/
|
||||
public static boolean testUnderflow()
|
||||
{
|
||||
CachingQueue cq = new CachingQueue("testQueue1", 10);
|
||||
try
|
||||
{
|
||||
cq.remove();
|
||||
}
|
||||
catch (UnderflowException e)
|
||||
{
|
||||
return true;
|
||||
}
|
||||
catch (Exception e)
|
||||
{
|
||||
e.printStackTrace();
|
||||
}
|
||||
return false;
|
||||
}
|
||||
|
||||
|
||||
/**
|
||||
* A unit test for JUnit
|
||||
*
|
||||
* @return Description of the Return Value
|
||||
*/
|
||||
public static boolean testInsert()
|
||||
{
|
||||
CachingQueue cq = new CachingQueue("testQueue2", 10);
|
||||
String test = "Test1";
|
||||
affirm(cq.size() == 0);
|
||||
cq.insert(test);
|
||||
affirm(cq.size() == 1);
|
||||
return (cq.remove() == test);
|
||||
}
|
||||
|
||||
|
||||
/**
|
||||
* A unit test for JUnit
|
||||
*
|
||||
* @return Description of the Return Value
|
||||
*/
|
||||
public static boolean testBufReadWrite()
|
||||
{
|
||||
CachingQueue cq = new CachingQueue("testQueue3", 2);
|
||||
String test1 = "Test1";
|
||||
String test2 = "Test2";
|
||||
String test3 = "Test3";
|
||||
cq.insert(test1);
|
||||
cq.insert(test2);
|
||||
cq.insert(test3);
|
||||
affirm(cq.size() == 3);
|
||||
cq.remove();
|
||||
cq.remove();
|
||||
affirm(cq.size() == 1);
|
||||
return (cq.remove() == test3);
|
||||
}
|
||||
|
||||
|
||||
/**
|
||||
* A unit test for JUnit
|
||||
*
|
||||
* @return Description of the Return Value
|
||||
*/
|
||||
public static boolean testBufReadWrite2()
|
||||
{
|
||||
CachingQueue cq = new CachingQueue("testQueue4", 2);
|
||||
String test1 = "Test1";
|
||||
String test2 = "Test2";
|
||||
String test3 = "Test3";
|
||||
String test4 = "Test4";
|
||||
String test5 = "Test5";
|
||||
cq.insert(test1);
|
||||
cq.insert(test2);
|
||||
cq.insert(test3);
|
||||
cq.insert(test4);
|
||||
cq.insert(test5);
|
||||
affirm(cq.size() == 5);
|
||||
String t = (String) cq.remove();
|
||||
affirm(t.equals(test1));
|
||||
t = (String) cq.remove();
|
||||
affirm(t.equals(test2));
|
||||
t = (String) cq.remove();
|
||||
affirm(t.equals(test3));
|
||||
t = (String) cq.remove();
|
||||
affirm(t.equals(test4));
|
||||
t = (String) cq.remove();
|
||||
affirm(cq.size() == 0);
|
||||
return (t.equals(test5));
|
||||
}
|
||||
|
||||
|
||||
/**
|
||||
* Throws an AssertionFailedException if the given expression is false
|
||||
*
|
||||
* @param expr the condition that is expected to hold
|
||||
*/
|
||||
public static void affirm(boolean expr)
|
||||
{
|
||||
if (!expr)
|
||||
{
|
||||
throw new AssertionFailedException();
|
||||
}
|
||||
}
|
||||
|
||||
|
||||
/**
|
||||
* A unit test for JUnit
|
||||
*
|
||||
* @return Description of the Return Value
|
||||
*/
|
||||
public static boolean testUnderflow2()
|
||||
{
|
||||
CachingQueue cq = new CachingQueue("testQueue5", 2);
|
||||
String test1 = "Test1";
|
||||
String test2 = "Test2";
|
||||
String test3 = "Test3";
|
||||
String test4 = "Test4";
|
||||
String test5 = "Test5";
|
||||
cq.insert(test1);
|
||||
cq.insert(test2);
|
||||
cq.insert(test3);
|
||||
cq.insert(test4);
|
||||
cq.insert(test5);
|
||||
affirm(cq.remove().equals(test1));
|
||||
affirm(cq.remove().equals(test2));
|
||||
affirm(cq.remove().equals(test3));
|
||||
affirm(cq.remove().equals(test4));
|
||||
affirm(cq.remove().equals(test5));
|
||||
try
|
||||
{
|
||||
cq.remove();
|
||||
}
|
||||
catch (UnderflowException e)
|
||||
{
|
||||
return true;
|
||||
}
|
||||
return false;
|
||||
}
|
||||
|
||||
|
||||
/**
|
||||
* A unit test for JUnit
|
||||
*
|
||||
* @return Description of the Return Value
|
||||
*/
|
||||
public static boolean testBufReadWrite3()
|
||||
{
|
||||
CachingQueue cq = new CachingQueue("testQueue6", 1);
|
||||
String test1 = "Test1";
|
||||
String test2 = "Test2";
|
||||
String test3 = "Test3";
|
||||
String test4 = "Test4";
|
||||
String test5 = "Test5";
|
||||
cq.insert(test1);
|
||||
cq.insert(test2);
|
||||
cq.insert(test3);
|
||||
cq.insert(test4);
|
||||
cq.insert(test5);
|
||||
String t = (String) cq.remove();
|
||||
affirm(t.equals(test1));
|
||||
t = (String) cq.remove();
|
||||
affirm(t.equals(test2));
|
||||
t = (String) cq.remove();
|
||||
affirm(t.equals(test3));
|
||||
t = (String) cq.remove();
|
||||
affirm(t.equals(test4));
|
||||
t = (String) cq.remove();
|
||||
return (t.equals(test5));
|
||||
}
|
||||
|
||||
|
||||
/**
|
||||
* A unit test for JUnit
|
||||
*
|
||||
* @return Description of the Return Value
|
||||
*/
|
||||
public static boolean testExceptions()
|
||||
{
|
||||
System.gc();
|
||||
CachingQueue cq = new CachingQueue("testQueue5", 1);
|
||||
String test1 = "Test1";
|
||||
String test2 = "Test2";
|
||||
String test3 = "Test3";
|
||||
String test4 = "Test4";
|
||||
String test5 = "Test5";
|
||||
cq.insert(test1);
|
||||
cq.insert(test2);
|
||||
cq.insert(test3);
|
||||
cq.insert(test4);
|
||||
cq.insert(test5);
|
||||
try
|
||||
{
|
||||
if (!(new File("cachingqueue/testQueue5_1.cqb").delete()))
|
||||
{
|
||||
System.err.println("CachingQueueTester.testExceptions: Store 1 could not be deleted. file name changed?");
|
||||
}
|
||||
if (!(new File("cachingqueue/testQueue5_2.cqb").delete()))
|
||||
{
|
||||
System.err.println("CachingQueueTester.testExceptions: Store 2 could not be deleted. file name changed?");
|
||||
}
|
||||
String t = (String) cq.remove();
|
||||
affirm(t.equals(test1));
|
||||
t = (String) cq.remove();
|
||||
affirm(t.equals(test2));
|
||||
t = (String) cq.remove();
|
||||
affirm(t.equals(test3));
|
||||
t = (String) cq.remove();
|
||||
affirm(t.equals(test4));
|
||||
t = (String) cq.remove();
|
||||
affirm(t.equals(test5));
|
||||
}
|
||||
catch (StoreException e)
|
||||
{
|
||||
return true;
|
||||
}
|
||||
finally
|
||||
{
|
||||
cq = null;
|
||||
System.gc();
|
||||
// the finalizers should be called now
|
||||
}
|
||||
return false;
|
||||
}
|
||||
|
||||
}
|
|
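The block-swapping scheme described in the CachingQueue Javadoc above can be sketched roughly as follows. This is a minimal illustration, not the LARM implementation: the class name `BlockQueueSketch`, the `spilledBlocks()` helper, and the use of a temporary directory are all assumptions made for the example. It keeps only the first and last blocks in memory, serializes a full head block to disk when a new head is needed, and reloads a block when it becomes the tail.

```java
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.LinkedList;
import java.util.NoSuchElementException;

/** Hypothetical sketch of a FIFO queue whose middle blocks live on disk. */
class BlockQueueSketch {
    private static final class Block {
        LinkedList<Serializable> elements = new LinkedList<>();
        Path file; // non-null only while the block is spilled to disk
    }

    private final Deque<Block> blocks = new ArrayDeque<>();
    private final int blockSize;
    private final Path dir; // assumption: a temp directory, not cleaned up here
    private int size;

    BlockQueueSketch(int blockSize) {
        if (blockSize < 1) throw new IllegalArgumentException("blockSize must be >= 1");
        this.blockSize = blockSize;
        try {
            this.dir = Files.createTempDirectory("blockqueue");
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    /** Adds at the front; a full head block is spilled unless it is also the tail. */
    void insert(Serializable o) {
        Block head = blocks.peekFirst();
        if (head == null || head.elements.size() >= blockSize) {
            if (head != null && head != blocks.peekLast()) spill(head);
            head = new Block();
            blocks.addFirst(head);
        }
        head.elements.addFirst(o);
        size++;
    }

    /** Removes from the back, dropping exhausted blocks and reloading a spilled tail. */
    Serializable remove() {
        if (size == 0) throw new NoSuchElementException("queue is empty");
        Block tail = blocks.peekLast();
        while (tail.file == null && tail.elements.isEmpty()) {
            blocks.pollLast();
            tail = blocks.peekLast();
        }
        if (tail.file != null) load(tail);
        size--;
        return tail.elements.removeLast();
    }

    int size() { return size; }

    /** Number of blocks currently held on disk instead of in memory. */
    int spilledBlocks() {
        int n = 0;
        for (Block b : blocks) if (b.file != null) n++;
        return n;
    }

    private void spill(Block b) {
        try {
            b.file = Files.createTempFile(dir, "block", ".cqb");
            try (ObjectOutputStream out = new ObjectOutputStream(Files.newOutputStream(b.file))) {
                out.writeObject(b.elements);
            }
            b.elements = null; // release the in-memory copy
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    @SuppressWarnings("unchecked")
    private void load(Block b) {
        try {
            try (ObjectInputStream in = new ObjectInputStream(Files.newInputStream(b.file))) {
                b.elements = (LinkedList<Serializable>) in.readObject();
            }
            Files.delete(b.file);
            b.file = null;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        } catch (ClassNotFoundException e) {
            throw new IllegalStateException(e);
        }
    }
}
```

With a block size of 2, inserting five strings leaves one block on disk and two in memory, and removing them returns the strings in insertion order. Unlike the deleted class, the sketch closes its streams with try-with-resources and reports I/O failures as unchecked exceptions; both are design choices of the example only.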
@ -1,323 +0,0 @@
|
|||
/* ====================================================================
|
||||
* The Apache Software License, Version 1.1
|
||||
*
|
||||
* Copyright (c) 2001 The Apache Software Foundation. All rights
|
||||
* reserved.
|
||||
*
|
||||
* Redistribution and use in source and binary forms, with or without
|
||||
* modification, are permitted provided that the following conditions
|
||||
* are met:
|
||||
*
|
||||
* 1. Redistributions of source code must retain the above copyright
|
||||
* notice, this list of conditions and the following disclaimer.
|
||||
*
|
||||
* 2. Redistributions in binary form must reproduce the above copyright
|
||||
* notice, this list of conditions and the following disclaimer in
|
||||
* the documentation and/or other materials provided with the
|
||||
* distribution.
|
||||
*
|
||||
* 3. The end-user documentation included with the redistribution,
|
||||
* if any, must include the following acknowledgment:
|
||||
* "This product includes software developed by the
|
||||
* Apache Software Foundation (http://www.apache.org/)."
|
||||
* Alternately, this acknowledgment may appear in the software itself,
|
||||
* if and wherever such third-party acknowledgments normally appear.
|
||||
*
|
||||
* 4. The names "Apache" and "Apache Software Foundation" and
|
||||
* "Apache Lucene" must not be used to endorse or promote products
|
||||
* derived from this software without prior written permission. For
|
||||
* written permission, please contact apache@apache.org.
|
||||
*
|
||||
* 5. Products derived from this software may not be called "Apache",
|
||||
* "Apache Lucene", nor may "Apache" appear in their name, without
|
||||
* prior written permission of the Apache Software Foundation.
|
||||
*
|
||||
* THIS SOFTWARE IS PROVIDED ``AS IS'' AND ANY EXPRESSED OR IMPLIED
|
||||
* WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED WARRANTIES
|
||||
* OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE
|
||||
* DISCLAIMED. IN NO EVENT SHALL THE APACHE SOFTWARE FOUNDATION OR
|
||||
* ITS CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL,
|
||||
* SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT
|
||||
* LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF
|
||||
* USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND
|
||||
* ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY,
|
||||
* OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT
|
||||
* OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF
|
||||
* SUCH DAMAGE.
|
||||
* ====================================================================
|
||||
*
|
||||
* This software consists of voluntary contributions made by many
|
||||
* individuals on behalf of the Apache Software Foundation. For more
|
||||
* information on the Apache Software Foundation, please see
|
||||
* <http://www.apache.org/>.
|
||||
*/
|
||||
|
||||
package de.lanlab.larm.util;
|
||||
|
||||
import java.lang.reflect.*;
|
||||
import java.io.*;
|
||||
import java.util.*;
|
||||
|
||||
/**
|
||||
* prints class information with the reflection api
|
||||
* for debugging only
|
||||
*/
|
||||
public class ClassInfo
|
||||
{
|
||||
|
||||
public ClassInfo()
|
||||
{
|
||||
}
|
||||
|
||||
/**
|
||||
* Usage: java ClassInfo PackageName.MyNewClassName PackageName.DerivedClassName
|
||||
*/
|
||||
public static void main(String[] args)
|
||||
{
|
||||
|
||||
String name = args[0];
|
||||
String derivedName = args[1];
|
||||
LinkedList l = new LinkedList();
|
||||
ListIterator itry = l.listIterator();
|
||||
|
||||
try
|
||||
{
|
||||
Class cls = Class.forName(name);
|
||||
classInfo(derivedName, cls);
|
||||
}
|
||||
catch(Throwable t)
|
||||
{
|
||||
t.printStackTrace();
|
||||
}
|
||||
}
|
||||
|
||||
public static void classInfo(String derivedName, Class cls) throws SecurityException
|
||||
{
|
||||
String name = cls.getName();
|
||||
String pkg = getPackageName(name);
|
||||
String clss = getClassName(name);
|
||||
|
||||
StringWriter importsWriter = new StringWriter();
|
||||
PrintWriter imports = new PrintWriter(importsWriter);
|
||||
StringWriter outWriter = new StringWriter();
|
||||
PrintWriter out = new PrintWriter(outWriter);
|
||||
|
||||
TreeSet importClasses = new TreeSet();
|
||||
importClasses.add(getImportStatement(name));
|
||||
|
||||
out.println("/**\n * (class description here)\n */\npublic class " + derivedName + " " + (cls.isInterface() ? "implements " : "extends ") + clss + "\n{");
|
||||
|
||||
Method[] m = cls.getMethods();
|
||||
for(int i= 0; i< m.length; i++)
|
||||
{
|
||||
Method thism = m[i];
|
||||
if((thism.getModifiers() & Modifier.PRIVATE) == 0 && ((thism.getModifiers() & Modifier.FINAL) == 0)
|
||||
&& !"java.lang.Object".equals(thism.getDeclaringClass().getName()))
|
||||
{
|
||||
out.println(" /**");
|
||||
out.println(" * (method description here)");
|
||||
out.println(" * defined in " + thism.getDeclaringClass().getName());
|
||||
|
||||
Class[] parameters = thism.getParameterTypes();
|
||||
for(int j = 0; j < parameters.length; j ++)
|
||||
{
|
||||
if(!getPackageName(parameters[j].getName()).equals(""))
|
||||
{
|
||||
importClasses.add(getImportStatement(parameters[j].getName()));
|
||||
}
|
||||
out.println(" * @param p" + j + " (parameter description here)");
|
||||
}
|
||||
|
||||
if(!thism.getReturnType().getName().equals("void"))
|
||||
{
|
||||
String returnPackage = getPackageName(thism.getReturnType().getName());
|
||||
if(!returnPackage.equals(""))
|
||||
{
|
||||
importClasses.add(getImportStatement(thism.getReturnType().getName()));
|
||||
}
|
||||
out.println(" * @return (return value description here)");
|
||||
}
|
||||
|
||||
out.println(" */");
|
||||
|
||||
out.print(" " + getModifierString(thism.getModifiers()) + getClassName(thism.getReturnType().getName()) + " ");
|
||||
out.print(thism.getName() + "(");
|
||||
|
||||
for(int j = 0; j < parameters.length; j ++)
|
||||
{
|
||||
if(j>0)
|
||||
{
|
||||
out.print(", ");
|
||||
}
|
||||
out.print(getClassName(parameters[j].getName()) + " p" + j);
|
||||
}
|
||||
out.print(")");
|
||||
Class[] exceptions = thism.getExceptionTypes();
|
||||
|
||||
if (exceptions.length > 0)
|
||||
{
|
||||
out.print(" throws ");
|
||||
}
|
||||
|
||||
for(int k = 0; k < exceptions.length; k++)
|
||||
{
|
||||
if(k > 0)
|
||||
{
|
||||
out.print(", ");
|
||||
}
|
||||
String exCompleteName = exceptions[k].getName();
|
||||
String exName = getClassName(exCompleteName);
|
||||
importClasses.add(getImportStatement(exCompleteName));
|
||||
|
||||
out.print(exName);
|
||||
}
|
||||
out.print("\n" +
|
||||
" {\n" +
|
||||
" /**@todo: Implement this " + thism.getName() + "() method */\n" +
|
||||
" throw new UnsupportedOperationException(\"Method " + thism.getName() + "() not yet implemented.\");\n" +
|
||||
" }\n\n");
|
||||
|
||||
|
||||
}
|
||||
}
|
||||
out.println("}");
|
||||
|
||||
Iterator importIterator = importClasses.iterator();
|
||||
while(importIterator.hasNext())
|
||||
{
|
||||
String importName = (String)importIterator.next();
|
||||
if(!importName.startsWith("java.lang"))
|
||||
{
|
||||
imports.println("import " + importName + ";");
|
||||
}
|
||||
}
|
||||
|
||||
out.flush();
|
||||
imports.flush();
|
||||
|
||||
if(!getPackageName(derivedName).equals(""))
|
||||
{
|
||||
System.out.println("package " + getPackageName(derivedName) + ";\n");
|
||||
}
|
||||
System.out.println( "/**\n" +
|
||||
" * Title: \n" +
|
||||
" * Description:\n" +
|
||||
" * Copyright: Copyright (c)\n" +
|
||||
" * Company:\n" +
|
||||
" * @author\n" +
|
||||
" * @version 1.0\n" +
|
||||
" */\n");
|
||||
System.out.println(importsWriter.getBuffer());
|
||||
System.out.print(outWriter.getBuffer());
|
||||
}
|
||||
|
||||
public static String getPackageName(String className)
|
||||
{
|
||||
if(className.charAt(0) == '[')
|
||||
{
|
||||
switch(className.charAt(1))
|
||||
{
|
||||
case 'L':
|
||||
return getPackageName(className.substring(2,className.length()-1));
|
||||
default:
|
||||
return "";
|
||||
}
|
||||
}
|
||||
String name = className.lastIndexOf(".") != -1 ? className.substring(0, className.lastIndexOf(".")) : "";
|
||||
//System.out.println("Package: " + name);
|
||||
return name;
|
||||
}
|
||||
|
||||
public static String getClassName(String className)
|
||||
{
|
||||
if(className.charAt(0) == '[')
|
||||
{
|
||||
switch(className.charAt(1))
|
||||
{
|
||||
case 'L':
|
||||
return getClassName(className.substring(2,className.length()-1)) + "[]";
|
||||
case 'C':
|
||||
return "char[]";
|
||||
case 'I':
|
||||
return "int[]";
|
||||
case 'B':
|
||||
return "byte[]";
|
||||
// rest is missing here
|
||||
|
||||
}
|
||||
}
|
||||
String name = (className.lastIndexOf(".") > -1) ? className.substring(className.lastIndexOf(".")+1) : className;
|
||||
//System.out.println("Class: " + name);
|
||||
return name;
|
||||
}
|
||||
|
||||
static String getImportStatement(String className)
|
||||
{
|
||||
String pack = getPackageName(className);
|
||||
String clss = getClassName(className);
|
||||
if(clss.indexOf("[]") > -1)
|
||||
{
|
||||
return pack + "." + clss.substring(0,clss.length() - 2);
|
||||
}
|
||||
else
|
||||
{
|
||||
return pack + "." + clss;
|
||||
}
|
||||
}
|
||||
|
||||
public static String getModifierString(int modifiers)
|
||||
{
|
||||
StringBuffer mods = new StringBuffer();
|
||||
if((modifiers & Modifier.ABSTRACT) != 0)
|
||||
{
|
||||
mods.append("abstract ");
|
||||
}
|
||||
if((modifiers & Modifier.FINAL) != 0)
|
||||
{
|
||||
mods.append("final ");
|
||||
}
|
||||
if((modifiers & Modifier.INTERFACE) != 0)
|
||||
{
|
||||
mods.append("interface ");
|
||||
}
|
||||
if((modifiers & Modifier.NATIVE) != 0)
|
||||
{
|
||||
mods.append("native ");
|
||||
}
|
||||
if((modifiers & Modifier.PRIVATE) != 0)
|
||||
{
|
||||
mods.append("private ");
|
||||
}
|
||||
if((modifiers & Modifier.PROTECTED) != 0)
|
||||
{
|
||||
mods.append("protected ");
|
||||
}
|
||||
if((modifiers & Modifier.PUBLIC) != 0)
|
||||
{
|
||||
mods.append("public ");
|
||||
}
|
||||
if((modifiers & Modifier.STATIC) != 0)
|
||||
{
|
||||
mods.append("static ");
|
||||
}
|
||||
if((modifiers & Modifier.STRICT) != 0)
|
||||
{
|
||||
mods.append("strictfp ");
|
||||
}
|
||||
if((modifiers & Modifier.SYNCHRONIZED) != 0)
|
||||
{
|
||||
mods.append("synchronized ");
|
||||
}
|
||||
if((modifiers & Modifier.TRANSIENT) != 0)
|
||||
{
|
||||
mods.append("transient ");
|
||||
}
|
||||
if((modifiers & Modifier.VOLATILE) != 0)
|
||||
{
|
||||
mods.append("volatile ");
|
||||
}
|
||||
return mods.toString();
|
||||
}
|
||||
|
||||
|
||||
}
|
|
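The `getClassName` method above handles only the `L`, `C`, `I`, and `B` array tags and notes that the rest is missing. A possible completion, shown as a sketch only, is given below. The class name `ArrayTypeNames` and the method `toSourceName` are hypothetical; the tag letters are the ones `Class.getName()` uses for array types, and the sketch also handles multi-dimensional arrays.

```java
/** Hypothetical helper: turns Class.getName() output into a source-style simple name. */
final class ArrayTypeNames {
    private ArrayTypeNames() {}

    static String toSourceName(String binaryName) {
        // Count leading '[' characters: one per array dimension.
        int dims = 0;
        while (dims < binaryName.length() && binaryName.charAt(dims) == '[') {
            dims++;
        }
        if (dims == 0) {
            return simpleName(binaryName);
        }
        if (dims >= binaryName.length()) {
            throw new IllegalArgumentException("not a type name: " + binaryName);
        }
        String element;
        switch (binaryName.charAt(dims)) {
            case 'L': // object type, written as L<class name>;
                element = simpleName(binaryName.substring(dims + 1, binaryName.length() - 1));
                break;
            case 'Z': element = "boolean"; break;
            case 'B': element = "byte"; break;
            case 'C': element = "char"; break;
            case 'D': element = "double"; break;
            case 'F': element = "float"; break;
            case 'I': element = "int"; break;
            case 'J': element = "long"; break;
            case 'S': element = "short"; break;
            default:
                throw new IllegalArgumentException("unknown array type: " + binaryName);
        }
        StringBuilder sb = new StringBuilder(element);
        for (int i = 0; i < dims; i++) {
            sb.append("[]");
        }
        return sb.toString();
    }

    /** Strips the package prefix, if any. */
    private static String simpleName(String name) {
        int dot = name.lastIndexOf('.');
        return dot >= 0 ? name.substring(dot + 1) : name;
    }
}
```

For example, `ArrayTypeNames.toSourceName(String[][].class.getName())` yields `String[][]`, and a plain class name such as `java.util.List` is reduced to `List`.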
@ -1,371 +0,0 @@
|
|||
package de.lanlab.larm.util;
|
||||
|
||||
/* ====================================================================
|
||||
* The Apache Software License, Version 1.1
|
||||
*
|
||||
* Copyright (c) 2001 The Apache Software Foundation. All rights
|
||||
* reserved.
|
||||
*
|
||||
* Redistribution and use in source and binary forms, with or without
|
||||
* modification, are permitted provided that the following conditions
|
||||
* are met:
|
||||
*
|
||||
* 1. Redistributions of source code must retain the above copyright
|
||||
* notice, this list of conditions and the following disclaimer.
|
||||
*
|
||||
* 2. Redistributions in binary form must reproduce the above copyright
|
||||
* notice, this list of conditions and the following disclaimer in
|
||||
* the documentation and/or other materials provided with the
|
||||
* distribution.
|
||||
*
|
||||
* 3. The end-user documentation included with the redistribution,
|
||||
* if any, must include the following acknowledgment:
|
||||
* "This product includes software developed by the
|
||||
* Apache Software Foundation (http://www.apache.org/)."
|
||||
* Alternately, this acknowledgment may appear in the software itself,
|
||||
* if and wherever such third-party acknowledgments normally appear.
|
||||
*
|
||||
* 4. The names "Apache" and "Apache Software Foundation" and
|
||||
* "Apache Lucene" must not be used to endorse or promote products
|
||||
* derived from this software without prior written permission. For
|
||||
* written permission, please contact apache@apache.org.
|
||||
*
|
||||
* 5. Products derived from this software may not be called "Apache",
|
||||
* "Apache Lucene", nor may "Apache" appear in their name, without
|
||||
* prior written permission of the Apache Software Foundation.
|
||||
*
|
||||
* THIS SOFTWARE IS PROVIDED ``AS IS'' AND ANY EXPRESSED OR IMPLIED
|
||||
* WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED WARRANTIES
|
||||
* OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE
|
||||
* DISCLAIMED. IN NO EVENT SHALL THE APACHE SOFTWARE FOUNDATION OR
|
||||
* ITS CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL,
|
||||
* SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT
|
||||
* LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF
|
||||
* USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND
|
||||
* ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY,
|
||||
* OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT
|
||||
* OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF
|
||||
* SUCH DAMAGE.
|
||||
* ====================================================================
|
||||
*
|
||||
* This software consists of voluntary contributions made by many
|
||||
* individuals on behalf of the Apache Software Foundation. For more
|
||||
* information on the Apache Software Foundation, please see
|
||||
* <http://www.apache.org/>.
|
||||
*/
|
||||
|
||||
import java.util.*;
|
||||
|
||||
/**
|
||||
* simple hashed linked list. It allows for inserting and removing elements like
|
||||
* in a hash table (in fact, it uses a HashMap), while still being able to easily
|
||||
* traverse the collection like a list. In addition, the iterator is circular. It
|
||||
* always returns a next element as long as there are elements in the list. In
|
||||
* contrast to the iterator of Sun's collection classes, this class can cope with
|
||||
* inserts and removals while traversing the list.<p>
|
||||
* Elements are always added to the end of the list, that is, always at the same place<br>
|
||||
* All operations should work in near constant time as the list grows. Only the
|
||||
* trade-off costs of a hash (memory versus speed) have to be considered.
|
||||
* The list doesn't accept null keys.
|
||||
* @todo put the traversal function into an Iterator
|
||||
* @todo implement the class as a class derived from a Hash
|
||||
*/
|
||||
public class HashedCircularLinkedList
|
||||
{
|
||||
|
||||
|
||||
/**
|
||||
* Entry class.
|
||||
*/
|
||||
private static class Entry
|
||||
{
|
||||
Object key;
|
||||
Object element;
|
||||
Entry next;
|
||||
Entry previous;
|
||||
|
||||
Entry(Object element, Entry next, Entry previous, Object key)
|
||||
{
|
||||
this.element = element;
|
||||
this.next = next;
|
||||
this.previous = previous;
|
||||
this.key = key;
|
||||
}
|
||||
}
|
||||
|
||||
public Object getCurrentKey()
|
||||
{
|
||||
|
||||
return current != null ? current.key : null;
|
||||
|
||||
}
|
||||
|
||||
/**
|
||||
* the list. contains objects
|
||||
*/
|
||||
private transient Entry header = new Entry(null, null, null, null);
|
||||
|
||||
/**
|
||||
* the hash. maps keys to entries, which by themselves map to objects
|
||||
*/
|
||||
HashMap keys;
|
||||
|
||||
private transient int size = 0;
|
||||
|
||||
/** the current entry in the traversal */
|
||||
Entry current = null;
|
||||
|
||||
/**
|
||||
* Constructs an empty list.
|
||||
*/
|
||||
public HashedCircularLinkedList(int initialCapacity, float loadFactor)
|
||||
{
|
||||
header.next = header.previous = header;
|
||||
keys = new HashMap(initialCapacity, loadFactor);
|
||||
}
|
||||
|
||||
/**
|
||||
* Returns the number of elements in this list.
|
||||
*
|
||||
* @return the number of elements in this list.
|
||||
*/
|
||||
public int size()
|
||||
{
|
||||
return size;
|
||||
}
|
||||
|
||||
/**
|
||||
* Removes the entry stored under the specified key. If the list does not
|
||||
* contain the key, it is unchanged. If the removed entry is the current
|
||||
* position of the traversal, the position moves back to the previous
|
||||
* entry (or is reset if the list becomes empty).
|
||||
*
|
||||
* @param o key of the entry to be removed from this list, if present.
|
||||
* @return <tt>true</tt> if the list contained an entry for the key.
|
||||
*/
|
||||
public boolean removeByKey(Object o)
|
||||
{
|
||||
// assert(o != null)
|
||||
Entry e = (Entry)keys.get(o);
|
||||
if(e != null)
|
||||
{
|
||||
if(e == current)
|
||||
{
|
||||
if(size > 1)
|
||||
{
|
||||
current = previousEntry(current);
|
||||
}
|
||||
else
|
||||
{
|
||||
current = null;
|
||||
}
|
||||
}
|
||||
this.removeEntryFromList(e);
|
||||
keys.remove(o);
|
||||
size--;
|
||||
return true;
|
||||
}
|
||||
else
|
||||
{
|
||||
return false;
|
||||
}
|
||||
}
|
||||
|
||||
/**
|
||||
* Removes all of the elements from this list.
|
||||
*/
|
||||
public void clear()
|
||||
{
|
||||
// list
|
||||
header.next = header.previous = header;
|
||||
|
||||
// hash
|
||||
keys.clear();
|
||||
|
||||
size = 0;
|
||||
current = null;
|
||||
}
|
||||
|
||||
|
||||
private Entry addEntryBefore(Object key, Object o, Entry e)
|
||||
{
|
||||
Entry newEntry = new Entry(o, e, e.previous, key);
|
||||
newEntry.previous.next = newEntry;
|
||||
newEntry.next.previous = newEntry;
|
||||
return newEntry;
|
||||
}
|
||||
|
||||
private void removeEntryFromList(Entry e)
|
||||
{
|
||||
if(e != null)
|
||||
{
|
||||
if (e == header)
|
||||
{
|
||||
throw new NoSuchElementException();
|
||||
}
|
||||
|
||||
e.previous.next = e.next;
|
||||
e.next.previous = e.previous;
|
||||
}
|
||||
}
|
||||
|
||||
|
||||
/**
|
||||
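* Appends the value as the last element of the list and registers it under the key.
|
||||
* Null keys and keys that are already present are rejected.
|
||||
* @param key the key under which the value can be looked up or removed
|
||||
* @param value the element to store
|
||||
* @return <tt>true</tt> if the entry was added, <tt>false</tt> otherwise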
|
||||
*/
|
||||
public boolean put(Object key, Object value)
|
||||
{
|
||||
if(key != null && !keys.containsKey(key))
|
||||
{
|
||||
Entry e = addEntryBefore(key, value, header); // add it as the last element
|
||||
keys.put(key, e); // link key to entry
|
||||
size++;
|
||||
return true;
|
||||
}
|
||||
else
|
||||
{
|
||||
return false;
|
||||
}
|
||||
}
|
||||
|
||||
|
||||
public boolean hasNext()
|
||||
{
|
||||
return (size > 0);
|
||||
}
|
||||
|
||||
private Entry nextEntry(Entry e)
|
||||
{
|
||||
// assert(e != null)
|
||||
if(size > 1)
|
||||
{
|
||||
if(e == null)
|
||||
{
|
||||
e = header;
|
||||
}
|
||||
Entry next = e.next;
|
||||
if(next == header)
|
||||
{
|
||||
next = next.next;
|
||||
}
|
||||
return next;
|
||||
}
|
||||
else if(size == 1)
|
||||
{
|
||||
return header.next;
|
||||
}
|
||||
else
|
||||
{
|
||||
return null;
|
||||
}
|
||||
}
|
||||
|
||||
|
||||
|
||||
private Entry previousEntry(Entry e)
|
||||
{
|
||||
// assert(e != null)
|
||||
if(size > 1)
|
||||
{
|
||||
if(e == null)
|
||||
{
|
||||
e = header;
|
||||
}
|
||||
Entry previous = e.previous;
|
||||
if(previous == header)
|
||||
{
|
||||
previous = previous.previous;
|
||||
}
|
||||
return previous;
|
||||
}
|
||||
else if(size == 1)
|
||||
{
|
||||
return header.previous;
|
||||
}
|
||||
else
|
||||
{
|
||||
return null;
|
||||
}
|
||||
}
|
||||
|
||||
public Object next()
|
||||
{
|
||||
current = nextEntry(current);
|
||||
if(current != null)
|
||||
{
|
||||
return current.element;
|
||||
}
|
||||
else
|
||||
{
|
||||
return null;
|
||||
}
|
||||
}
|
||||
|
||||
public void removeCurrent()
|
||||
{
|
||||
// delegate to removeByKey so that size and the traversal position stay consistent
|
||||
if(current != null)
|
||||
{
|
||||
removeByKey(current.key);
|
||||
}
|
||||
}
|
||||
|
||||
|
||||
public Object get(Object key)
|
||||
{
|
||||
Entry e = ((Entry)keys.get(key));
|
||||
if(e != null)
|
||||
{
|
||||
return e.element;
|
||||
}
|
||||
else
|
||||
{
|
||||
return null;
|
||||
}
|
||||
}
|
||||
|
||||
/**
|
||||
* testing
|
||||
*/
|
||||
public static void main(String[] args)
|
||||
{
|
||||
HashedCircularLinkedList h = new HashedCircularLinkedList(20, 0.75f);
|
||||
h.put("1", "a");
|
||||
h.put("2", "b");
|
||||
h.put("3", "c");
|
||||
String t;
|
||||
System.out.println("size [3]: " + h.size());
|
||||
t = (String)h.next();
|
||||
System.out.println("2nd element via get [b]: " + h.get("2"));
|
||||
|
||||
System.out.println("next element [a]: " + t);
|
||||
t = (String)h.next();
|
||||
System.out.println("next element [b]: " + t);
|
||||
t = (String)h.next();
|
||||
System.out.println("next element [c]: " + t);
|
||||
t = (String)h.next();
|
||||
System.out.println("1st element after circular traversal [a]: " + t);
|
||||
h.removeByKey("1");
|
||||
System.out.println("1st element after remove [null]: " + h.get("1"));
|
||||
System.out.println("size after removal [2]: " + h.size());
|
||||
t = (String)h.next();
|
||||
System.out.println("next element [b]: " + t);
|
||||
t = (String)h.next();
|
||||
System.out.println("next element [c]: " + t);
|
||||
t = (String)h.next();
|
||||
System.out.println("next element [b]: " + t);
|
||||
h.removeCurrent();
|
||||
t = (String)h.next();
|
||||
System.out.println("next element after 1 removal [c]: " + t);
|
||||
t = (String)h.next();
|
||||
System.out.println("next element: [c]: " + t);
|
||||
h.removeByKey("3");
|
||||
System.out.println("size after 3 removals [0]: " + h.size());
|
||||
t = (String)h.next();
|
||||
System.out.println("next element [null]: " + t);
|
||||
}
|
||||
}
|
||||
|
||||
|
|
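The class comment of the removed HashedCircularLinkedList describes a hash map combined with a circular doubly linked list whose traversal tolerates inserts and removals. The following is a minimal generic sketch of that idea, not the removed LARM code: the class name `CircularHashList`, the `Node` type, and the use of a sentinel as the "before first" cursor position are assumptions made for illustration.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical generic sketch; not part of the removed LARM sources. */
class CircularHashList<K, V> {
    private static final class Node<K, V> {
        final K key;
        final V value;
        Node<K, V> prev, next;
        Node(K key, V value) { this.key = key; this.value = value; }
    }

    // Sentinel node: an empty list is the sentinel linked to itself.
    private final Node<K, V> head = new Node<>(null, null);
    private final Map<K, Node<K, V>> index = new HashMap<>();
    // Traversal cursor; pointing at head means "before the first element".
    private Node<K, V> current = head;

    CircularHashList() { head.prev = head.next = head; }

    int size() { return index.size(); }

    /** Appends at the tail; rejects null keys and duplicate keys. */
    boolean put(K key, V value) {
        if (key == null || index.containsKey(key)) return false;
        Node<K, V> n = new Node<>(key, value);
        n.prev = head.prev;
        n.next = head;
        head.prev.next = n;
        head.prev = n;
        index.put(key, n);
        return true;
    }

    V get(K key) {
        Node<K, V> n = index.get(key);
        return n == null ? null : n.value;
    }

    /** Unlinks the entry; the cursor steps back if it pointed at it. */
    boolean removeByKey(K key) {
        Node<K, V> n = index.remove(key);
        if (n == null) return false;
        if (n == current) current = n.prev;
        n.prev.next = n.next;
        n.next.prev = n.prev;
        return true;
    }

    /** Advances circularly, skipping the sentinel; returns null only when empty. */
    V next() {
        if (index.isEmpty()) { current = head; return null; }
        current = current.next;
        if (current == head) current = current.next;
        return current.value;
    }
}
```

Deriving the size from the map rather than keeping a separate counter means a removal path cannot forget to update it. When no cursor that survives modification is needed, `java.util.LinkedHashMap` already offers hash lookup with insertion-order iteration.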
@ -1,63 +0,0 @@
|
|||
/* ====================================================================
|
||||
* The Apache Software License, Version 1.1
|
||||
*
|
||||
* Copyright (c) 2001 The Apache Software Foundation. All rights
|
||||
* reserved.
|
||||
*
|
||||
* Redistribution and use in source and binary forms, with or without
|
||||
* modification, are permitted provided that the following conditions
|
||||
* are met:
|
||||
*
|
||||
* 1. Redistributions of source code must retain the above copyright
|
||||
* notice, this list of conditions and the following disclaimer.
|
||||
*
|
||||
* 2. Redistributions in binary form must reproduce the above copyright
|
||||
* notice, this list of conditions and the following disclaimer in
|
||||
* the documentation and/or other materials provided with the
|
||||
* distribution.
|
||||
*
|
||||
* 3. The end-user documentation included with the redistribution,
|
||||
* if any, must include the following acknowledgment:
|
||||
* "This product includes software developed by the
|
||||
* Apache Software Foundation (http://www.apache.org/)."
|
||||
* Alternately, this acknowledgment may appear in the software itself,
|
||||
* if and wherever such third-party acknowledgments normally appear.
|
||||
*
|
||||
* 4. The names "Apache" and "Apache Software Foundation" and
|
||||
* "Apache Lucene" must not be used to endorse or promote products
|
||||
* derived from this software without prior written permission. For
|
||||
* written permission, please contact apache@apache.org.
|
||||
*
|
||||
* 5. Products derived from this software may not be called "Apache",
|
||||
* "Apache Lucene", nor may "Apache" appear in their name, without
|
||||
* prior written permission of the Apache Software Foundation.
|
||||
*
|
||||
* THIS SOFTWARE IS PROVIDED ``AS IS'' AND ANY EXPRESSED OR IMPLIED
|
||||
* WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED WARRANTIES
|
||||
* OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE
|
||||
* DISCLAIMED. IN NO EVENT SHALL THE APACHE SOFTWARE FOUNDATION OR
|
||||
* ITS CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL,
|
||||
* SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT
|
||||
* LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF
|
||||
* USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND
|
||||
* ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY,
|
||||
* OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT
|
||||
* OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF
|
||||
* SUCH DAMAGE.
|
||||
* ====================================================================
|
||||
*
|
||||
* This software consists of voluntary contributions made by many
|
||||
* individuals on behalf of the Apache Software Foundation. For more
|
||||
* information on the Apache Software Foundation, please see
|
||||
* <http://www.apache.org/>.
|
||||
*/
|
||||
|
||||
package de.lanlab.larm.util;
|
||||
|
||||
public interface InputStreamObserver
|
||||
{
|
||||
public void notifyOpened(ObservableInputStream in, long timeElapsed);
|
||||
public void notifyClosed(ObservableInputStream in, long timeElapsed);
|
||||
public void notifyRead(ObservableInputStream in, long timeElapsed, int nrRead, int totalRead);
|
||||
public void notifyFinished(ObservableInputStream in, long timeElapsed, int totalRead);
|
||||
}
|
|
@ -1,68 +0,0 @@
|
|||
/* ====================================================================
|
||||
* The Apache Software License, Version 1.1
|
||||
*
|
||||
* Copyright (c) 2001 The Apache Software Foundation. All rights
|
||||
* reserved.
|
||||
*
|
||||
* Redistribution and use in source and binary forms, with or without
|
||||
* modification, are permitted provided that the following conditions
|
||||
* are met:
|
||||
*
|
||||
* 1. Redistributions of source code must retain the above copyright
|
||||
* notice, this list of conditions and the following disclaimer.
|
||||
*
|
||||
* 2. Redistributions in binary form must reproduce the above copyright
|
||||
* notice, this list of conditions and the following disclaimer in
|
||||
* the documentation and/or other materials provided with the
|
||||
* distribution.
|
||||
*
|
||||
* 3. The end-user documentation included with the redistribution,
|
||||
* if any, must include the following acknowledgment:
|
||||
* "This product includes software developed by the
|
||||
* Apache Software Foundation (http://www.apache.org/)."
|
||||
* Alternately, this acknowledgment may appear in the software itself,
|
||||
* if and wherever such third-party acknowledgments normally appear.
|
||||
*
|
||||
* 4. The names "Apache" and "Apache Software Foundation" and
|
||||
* "Apache Lucene" must not be used to endorse or promote products
|
||||
* derived from this software without prior written permission. For
|
||||
* written permission, please contact apache@apache.org.
|
||||
*
|
||||
* 5. Products derived from this software may not be called "Apache",
|
||||
* "Apache Lucene", nor may "Apache" appear in their name, without
|
||||
* prior written permission of the Apache Software Foundation.
|
||||
*
|
||||
* THIS SOFTWARE IS PROVIDED ``AS IS'' AND ANY EXPRESSED OR IMPLIED
|
||||
* WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED WARRANTIES
|
||||
* OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE
|
||||
* DISCLAIMED. IN NO EVENT SHALL THE APACHE SOFTWARE FOUNDATION OR
|
||||
* ITS CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL,
|
||||
* SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT
|
||||
* LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF
|
||||
* USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND
|
||||
* ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY,
|
||||
* OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT
|
||||
* OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF
|
||||
* SUCH DAMAGE.
|
||||
* ====================================================================
|
||||
*
|
||||
* This software consists of voluntary contributions made by many
|
||||
* individuals on behalf of the Apache Software Foundation. For more
|
||||
* information on the Apache Software Foundation, please see
|
||||
* <http://www.apache.org/>.
|
||||
*/
|
||||
|
||||
package de.lanlab.larm.util;
|
||||
|
||||
import java.io.*;
|
||||
|
||||
public class Logger
|
||||
{
|
||||
private FileOutputStream out;
|
||||
|
||||
public Logger(String fileName)
|
||||
{
|
||||
|
||||
}
|
||||
|
||||
}
|
|
@ -1,146 +0,0 @@
|
|||
/* ====================================================================
|
||||
* The Apache Software License, Version 1.1
|
||||
*
|
||||
* Copyright (c) 2001 The Apache Software Foundation. All rights
|
||||
* reserved.
|
||||
*
|
||||
* Redistribution and use in source and binary forms, with or without
|
||||
* modification, are permitted provided that the following conditions
|
||||
* are met:
|
||||
*
|
||||
* 1. Redistributions of source code must retain the above copyright
|
||||
* notice, this list of conditions and the following disclaimer.
|
||||
*
|
||||
* 2. Redistributions in binary form must reproduce the above copyright
|
||||
* notice, this list of conditions and the following disclaimer in
|
||||
* the documentation and/or other materials provided with the
|
||||
* distribution.
|
||||
*
|
||||
* 3. The end-user documentation included with the redistribution,
|
||||
* if any, must include the following acknowledgment:
|
||||
* "This product includes software developed by the
|
||||
* Apache Software Foundation (http://www.apache.org/)."
|
||||
* Alternately, this acknowledgment may appear in the software itself,
|
||||
* if and wherever such third-party acknowledgments normally appear.
|
||||
*
|
||||
* 4. The names "Apache" and "Apache Software Foundation" and
|
||||
* "Apache Lucene" must not be used to endorse or promote products
|
||||
* derived from this software without prior written permission. For
|
||||
* written permission, please contact apache@apache.org.
|
||||
*
|
||||
* 5. Products derived from this software may not be called "Apache",
|
||||
* "Apache Lucene", nor may "Apache" appear in their name, without
|
||||
* prior written permission of the Apache Software Foundation.
|
||||
*
|
||||
* THIS SOFTWARE IS PROVIDED ``AS IS'' AND ANY EXPRESSED OR IMPLIED
|
||||
* WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED WARRANTIES
|
||||
* OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE
|
||||
* DISCLAIMED. IN NO EVENT SHALL THE APACHE SOFTWARE FOUNDATION OR
|
||||
* ITS CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL,
|
||||
* SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT
|
||||
* LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF
|
||||
* USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND
|
||||
* ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY,
|
||||
* OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT
|
||||
* OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF
|
||||
* SUCH DAMAGE.
|
||||
* ====================================================================
|
||||
*
|
||||
* This software consists of voluntary contributions made by many
|
||||
* individuals on behalf of the Apache Software Foundation. For more
|
||||
* information on the Apache Software Foundation, please see
|
||||
* <http://www.apache.org/>.
|
||||
*/
|
||||
|
||||
package de.lanlab.larm.util;
|
||||
|
||||
import java.io.*;
|
||||
|
||||
public class ObservableInputStream extends FilterInputStream
|
||||
{
|
||||
private boolean reporting = true;
|
||||
private long startTime;
|
||||
private int totalRead = 0;
|
||||
private int step = 1;
|
||||
private int nextStep = 0;
|
||||
|
||||
InputStreamObserver observer;
|
||||
|
||||
public ObservableInputStream(InputStream in, InputStreamObserver iso, int reportingStep)
|
||||
{
|
||||
super(in);
|
||||
startTime = System.currentTimeMillis();
|
||||
observer = iso;
|
||||
observer.notifyOpened(this, System.currentTimeMillis() - startTime);
|
||||
nextStep = step = reportingStep;
|
||||
}
|
||||
|
||||
public void close() throws IOException
|
||||
{
|
||||
super.close();
|
||||
observer.notifyClosed(this, System.currentTimeMillis() - startTime);
|
||||
}
|
||||
|
||||
public void setReporting(boolean reporting)
|
||||
{
|
||||
this.reporting = reporting;
|
||||
}
|
||||
|
||||
public boolean isReporting()
|
||||
{
|
||||
return reporting;
|
||||
}
|
||||
|
||||
public void setReportingStep(int step)
|
||||
{
|
||||
this.step = step;
|
||||
}
|
||||
|
||||
public int read() throws IOException
|
||||
{
|
||||
int readByte = super.read();
|
||||
if(reporting)
|
||||
{
|
||||
notifyObserver(readByte>=0? 1 : 0);
|
||||
}
|
||||
return readByte;
|
||||
}
|
||||
|
||||
public int read(byte[] b) throws IOException
|
||||
{
|
||||
// FilterInputStream.read(byte[]) itself calls read(b, 0, b.length), which
|
||||
// already notifies the observer; delegating directly avoids counting the bytes twice
|
||||
return read(b, 0, b.length);
|
||||
}
|
||||
|
||||
private void notifyObserver(int nrRead)
|
||||
{
|
||||
if(nrRead > 0)
|
||||
{
|
||||
totalRead += nrRead;
|
||||
if(totalRead > nextStep)
|
||||
{
|
||||
nextStep += step;
|
||||
observer.notifyRead(this, System.currentTimeMillis() - startTime, nrRead, totalRead);
|
||||
}
|
||||
}
|
||||
else
|
||||
{
|
||||
observer.notifyFinished(this, System.currentTimeMillis() - startTime, totalRead);
|
||||
}
|
||||
}
|
||||
|
||||
public int read(byte[] b, int offs, int size) throws IOException
|
||||
{
|
||||
int nrRead = super.read(b, offs, size);
|
||||
if(reporting)
|
||||
{
|
||||
notifyObserver(nrRead);
|
||||
}
|
||||
return nrRead;
|
||||
}
|
||||
}
|
||||
|
|
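The removed ObservableInputStream reports progress to an observer whenever the running byte count passes a step boundary. A minimal, self-contained sketch of that step-based reporting follows. It is an illustration under assumptions, not the LARM class: the name `StepReportingInputStream` and the `reports` list (standing in for an observer callback) are hypothetical, and it uses `>=` with a catch-up loop where the removed code used a plain `>` comparison.

```java
import java.io.ByteArrayInputStream;
import java.io.FilterInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.util.ArrayList;
import java.util.List;

/** Hypothetical sketch: records the running byte count each time it crosses a step boundary. */
class StepReportingInputStream extends FilterInputStream {
    private final int step;
    private int nextStep;
    private int totalRead = 0;
    // Stands in for an observer callback such as notifyRead(...).
    final List<Integer> reports = new ArrayList<>();

    StepReportingInputStream(InputStream in, int step) {
        super(in);
        this.step = step;
        this.nextStep = step;
    }

    @Override
    public int read() throws IOException {
        int b = super.read();
        if (b >= 0) count(1);
        return b;
    }

    // read(byte[]) is deliberately not overridden: FilterInputStream routes it
    // through read(byte[], int, int), so counting here covers both bulk variants once.
    @Override
    public int read(byte[] buf, int off, int len) throws IOException {
        int n = super.read(buf, off, len);
        if (n > 0) count(n);
        return n;
    }

    private void count(int n) {
        totalRead += n;
        if (totalRead >= nextStep) {
            // Catch up in case a single read crossed several step boundaries.
            while (nextStep <= totalRead) nextStep += step;
            reports.add(totalRead);
        }
    }
}
```

Reading a 10-byte `ByteArrayInputStream` through this wrapper with a step of 4 and a 3-byte buffer records the totals 6 and 9: the first read stays below the step, the second and third each cross a boundary, and the final 1-byte read does not.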
@ -1,63 +0,0 @@
|
|||
/* ====================================================================
|
||||
* The Apache Software License, Version 1.1
|
||||
*
|
||||
* Copyright (c) 2001 The Apache Software Foundation. All rights
|
||||
* reserved.
|
||||
*
|
||||
* Redistribution and use in source and binary forms, with or without
|
||||
* modification, are permitted provided that the following conditions
|
||||
* are met:
|
||||
*
|
||||
* 1. Redistributions of source code must retain the above copyright
|
||||
* notice, this list of conditions and the following disclaimer.
|
||||
*
|
||||
* 2. Redistributions in binary form must reproduce the above copyright
|
||||
* notice, this list of conditions and the following disclaimer in
|
||||
* the documentation and/or other materials provided with the
|
||||
* distribution.
|
||||
*
|
||||
* 3. The end-user documentation included with the redistribution,
|
||||
* if any, must include the following acknowledgment:
|
||||
* "This product includes software developed by the
|
||||
* Apache Software Foundation (http://www.apache.org/)."
|
||||
* Alternately, this acknowledgment may appear in the software itself,
|
||||
* if and wherever such third-party acknowledgments normally appear.
|
||||
*
|
||||
* 4. The names "Apache" and "Apache Software Foundation" and
|
||||
* "Apache Lucene" must not be used to endorse or promote products
|
||||
* derived from this software without prior written permission. For
|
||||
* written permission, please contact apache@apache.org.
|
||||
*
|
||||
* 5. Products derived from this software may not be called "Apache",
|
||||
* "Apache Lucene", nor may "Apache" appear in their name, without
|
||||
* prior written permission of the Apache Software Foundation.
|
||||
*
|
||||
* THIS SOFTWARE IS PROVIDED ``AS IS'' AND ANY EXPRESSED OR IMPLIED
|
||||
* WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED WARRANTIES
|
||||
* OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE
|
||||
* DISCLAIMED. IN NO EVENT SHALL THE APACHE SOFTWARE FOUNDATION OR
|
||||
* ITS CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL,
|
||||
* SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT
|
||||
* LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF
|
||||
* USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND
|
||||
* ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY,
|
||||
* OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT
|
||||
* OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF
|
||||
* SUCH DAMAGE.
|
||||
* ====================================================================
|
||||
*
|
||||
* This software consists of voluntary contributions made by many
|
||||
* individuals on behalf of the Apache Software Foundation. For more
|
||||
* information on the Apache Software Foundation, please see
|
||||
* <http://www.apache.org/>.
|
||||
*/
|
||||
|
||||
package de.lanlab.larm.util;
|
||||
|
||||
|
||||
/**
|
||||
* not used
|
||||
*/
|
||||
public interface Observer
|
||||
{
|
||||
}
|
|
@ -1,69 +0,0 @@
|
|||
/* ====================================================================
|
||||
* The Apache Software License, Version 1.1
|
||||
*
|
||||
* Copyright (c) 2001 The Apache Software Foundation. All rights
|
||||
* reserved.
|
||||
*
|
||||
* Redistribution and use in source and binary forms, with or without
|
||||
* modification, are permitted provided that the following conditions
|
||||
* are met:
|
||||
*
|
||||
* 1. Redistributions of source code must retain the above copyright
|
||||
* notice, this list of conditions and the following disclaimer.
|
||||
*
|
||||
* 2. Redistributions in binary form must reproduce the above copyright
|
||||
* notice, this list of conditions and the following disclaimer in
|
||||
* the documentation and/or other materials provided with the
|
||||
* distribution.
|
||||
*
|
||||
* 3. The end-user documentation included with the redistribution,
|
||||
* if any, must include the following acknowledgment:
|
||||
* "This product includes software developed by the
|
||||
* Apache Software Foundation (http://www.apache.org/)."
|
||||
* Alternately, this acknowledgment may appear in the software itself,
|
||||
* if and wherever such third-party acknowledgments normally appear.
|
||||
*
|
||||
* 4. The names "Apache" and "Apache Software Foundation" and
|
||||
* "Apache Lucene" must not be used to endorse or promote products
|
||||
* derived from this software without prior written permission. For
|
||||
* written permission, please contact apache@apache.org.
|
||||
*
|
||||
* 5. Products derived from this software may not be called "Apache",
|
||||
* "Apache Lucene", nor may "Apache" appear in their name, without
|
||||
* prior written permission of the Apache Software Foundation.
|
||||
*
|
||||
* THIS SOFTWARE IS PROVIDED ``AS IS'' AND ANY EXPRESSED OR IMPLIED
|
||||
* WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED WARRANTIES
|
||||
* OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE
|
||||
* DISCLAIMED. IN NO EVENT SHALL THE APACHE SOFTWARE FOUNDATION OR
|
||||
* ITS CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL,
|
||||
* SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT
|
||||
* LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF
|
||||
* USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND
|
||||
* ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY,
|
||||
* OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT
|
||||
* OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF
|
||||
* SUCH DAMAGE.
|
||||
* ====================================================================
|
||||
*
|
||||
* This software consists of voluntary contributions made by many
|
||||
* individuals on behalf of the Apache Software Foundation. For more
|
||||
* information on the Apache Software Foundation, please see
|
||||
* <http://www.apache.org/>.
|
||||
*/
|
||||
|
||||
package de.lanlab.larm.util;
|
||||
|
||||
/**
|
||||
* Title: LARM
|
||||
* Description:
|
||||
* Copyright: Copyright (c) 2001
|
||||
* Company: LMU-IP
|
||||
* @author Clemens Marschner
|
||||
* @version 1.0
|
||||
*/
|
||||
|
||||
|
||||
public class OverflowException extends RuntimeException
|
||||
{
|
||||
}
|
|
@ -1,76 +0,0 @@
|
|||
/* ====================================================================
|
||||
* The Apache Software License, Version 1.1
|
||||
*
|
||||
* Copyright (c) 2001 The Apache Software Foundation. All rights
|
||||
* reserved.
|
||||
*
|
||||
* Redistribution and use in source and binary forms, with or without
|
||||
* modification, are permitted provided that the following conditions
|
||||
* are met:
|
||||
*
|
||||
* 1. Redistributions of source code must retain the above copyright
|
||||
* notice, this list of conditions and the following disclaimer.
|
||||
*
|
||||
* 2. Redistributions in binary form must reproduce the above copyright
|
||||
* notice, this list of conditions and the following disclaimer in
|
||||
* the documentation and/or other materials provided with the
|
||||
* distribution.
|
||||
*
|
||||
* 3. The end-user documentation included with the redistribution,
|
||||
* if any, must include the following acknowledgment:
|
||||
* "This product includes software developed by the
|
||||
* Apache Software Foundation (http://www.apache.org/)."
|
||||
* Alternately, this acknowledgment may appear in the software itself,
|
||||
* if and wherever such third-party acknowledgments normally appear.
|
||||
*
|
||||
* 4. The names "Apache" and "Apache Software Foundation" and
|
||||
* "Apache Lucene" must not be used to endorse or promote products
|
||||
* derived from this software without prior written permission. For
|
||||
* written permission, please contact apache@apache.org.
|
||||
*
|
||||
* 5. Products derived from this software may not be called "Apache",
|
||||
* "Apache Lucene", nor may "Apache" appear in their name, without
|
||||
* prior written permission of the Apache Software Foundation.
|
||||
*
|
||||
* THIS SOFTWARE IS PROVIDED ``AS IS'' AND ANY EXPRESSED OR IMPLIED
|
||||
* WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED WARRANTIES
|
||||
* OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE
|
||||
* DISCLAIMED. IN NO EVENT SHALL THE APACHE SOFTWARE FOUNDATION OR
|
||||
* ITS CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL,
|
||||
* SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT
|
||||
* LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF
|
||||
* USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND
|
||||
* ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY,
|
||||
* OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT
|
||||
* OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF
|
||||
* SUCH DAMAGE.
|
||||
* ====================================================================
|
||||
*
|
||||
* This software consists of voluntary contributions made by many
|
||||
* individuals on behalf of the Apache Software Foundation. For more
|
||||
* information on the Apache Software Foundation, please see
|
||||
* <http://www.apache.org/>.
|
||||
*/
|
||||
|
||||
package de.lanlab.larm.util;
|
||||
|
||||
/**
|
||||
* Title: LARM Lanlab Retrieval Machine
|
||||
* Description:
|
||||
* Copyright: Copyright (c)
|
||||
* Company:
|
||||
* @author
|
||||
* @version 1.0
|
||||
*/
|
||||
|
||||
import java.util.Collection;
|
||||
|
||||
public interface Queue
|
||||
{
|
||||
public Object remove();
|
||||
public void insert(Object o);
|
||||
public void insertMultiple(Collection c);
|
||||
public int size();
|
||||
public boolean isEmpty();
|
||||
public void clear();
|
||||
}
|
|
@ -1,334 +0,0 @@
|
|||
/* ====================================================================
|
||||
* The Apache Software License, Version 1.1
|
||||
*
|
||||
* Copyright (c) 2001 The Apache Software Foundation. All rights
|
||||
* reserved.
|
||||
*
|
||||
* Redistribution and use in source and binary forms, with or without
|
||||
* modification, are permitted provided that the following conditions
|
||||
* are met:
|
||||
*
|
||||
* 1. Redistributions of source code must retain the above copyright
|
||||
* notice, this list of conditions and the following disclaimer.
|
||||
*
|
||||
* 2. Redistributions in binary form must reproduce the above copyright
|
||||
* notice, this list of conditions and the following disclaimer in
|
||||
* the documentation and/or other materials provided with the
|
||||
* distribution.
|
||||
*
|
||||
* 3. The end-user documentation included with the redistribution,
|
||||
* if any, must include the following acknowledgment:
|
||||
* "This product includes software developed by the
|
||||
* Apache Software Foundation (http://www.apache.org/)."
|
||||
* Alternately, this acknowledgment may appear in the software itself,
|
||||
* if and wherever such third-party acknowledgments normally appear.
|
||||
*
|
||||
* 4. The names "Apache" and "Apache Software Foundation" and
|
||||
* "Apache Lucene" must not be used to endorse or promote products
|
||||
* derived from this software without prior written permission. For
|
||||
* written permission, please contact apache@apache.org.
|
||||
*
|
||||
* 5. Products derived from this software may not be called "Apache",
|
||||
* "Apache Lucene", nor may "Apache" appear in their name, without
|
||||
* prior written permission of the Apache Software Foundation.
|
||||
*
|
||||
* THIS SOFTWARE IS PROVIDED ``AS IS'' AND ANY EXPRESSED OR IMPLIED
|
||||
* WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED WARRANTIES
|
||||
* OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE
|
||||
* DISCLAIMED. IN NO EVENT SHALL THE APACHE SOFTWARE FOUNDATION OR
|
||||
* ITS CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL,
|
||||
* SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT
|
||||
* LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF
|
||||
* USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND
|
||||
* ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY,
|
||||
* OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT
|
||||
* OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF
|
||||
* SUCH DAMAGE.
|
||||
* ====================================================================
|
||||
*
|
||||
* This software consists of voluntary contributions made by many
|
||||
* individuals on behalf of the Apache Software Foundation. For more
|
||||
* information on the Apache Software Foundation, please see
|
||||
* <http://www.apache.org/>.
|
||||
*/
|
||||
|
||||
package de.lanlab.larm.util;
|
||||
import java.io.*;
|
||||
|
||||
/**
|
||||
* A <code>SimpleCharArrayReader</code> contains
|
||||
* an internal buffer that contains characters that
|
||||
* may be read from the stream. An internal
|
||||
* counter keeps track of the next character to
|
||||
* be supplied by the <code>read</code> method.
|
||||
* <br>
|
||||
* In contrast to the original <code>CharArrayReader</code> this
|
||||
* version is not thread safe. Synchronizing on read() slowed programs
|
||||
* down considerably, because that method is called for every character. This
|
||||
* class can thus only be used if a single thread accesses the stream.
|
||||
* @author Clemens Marschner
|
||||
* @version 1.00
|
||||
* @see java.io.ByteArrayInputStream
|
||||
*/
|
||||
public
|
||||
class SimpleCharArrayReader extends Reader
|
||||
{
|
||||
|
||||
/**
|
||||
* A flag that is set to true when this stream is closed.
|
||||
*/
|
||||
private boolean isClosed = false;
|
||||
|
||||
/**
|
||||
* An array of characters that was provided
|
||||
* by the creator of the stream. Elements <code>buf[0]</code>
|
||||
* through <code>buf[count-1]</code> are the
|
||||
* only characters that can ever be read from the
|
||||
* stream; element <code>buf[pos]</code> is
|
||||
* the next character to be read.
|
||||
*/
|
||||
protected char buf[];
|
||||
|
||||
/**
|
||||
* The index of the next character to read from the input stream buffer.
|
||||
* This value should always be nonnegative
|
||||
* and not larger than the value of <code>count</code>.
|
||||
* The next character to be read from the input stream buffer
|
||||
* will be <code>buf[pos]</code>.
|
||||
*/
|
||||
protected int pos;
|
||||
|
||||
/**
|
||||
* The currently marked position in the stream.
|
||||
* SimpleCharArrayReader objects are marked at position zero by
|
||||
* default when constructed. They may be marked at another
|
||||
* position within the buffer by the <code>mark()</code> method.
|
||||
* The current buffer position is set to this point by the
|
||||
* <code>reset()</code> method.
|
||||
*
|
||||
* @since JDK1.1
|
||||
*/
|
||||
protected int mark = 0;
|
||||
|
||||
/**
|
||||
* The index one greater than the last valid character in the input
|
||||
* stream buffer.
|
||||
* This value should always be nonnegative
|
||||
* and not larger than the length of <code>buf</code>.
|
||||
* It is one greater than the position of
|
||||
* the last character within <code>buf</code> that
|
||||
* can ever be read from the input stream buffer.
|
||||
*/
|
||||
protected int count;
|
||||
|
||||
/**
|
||||
* Creates a <code>SimpleCharArrayReader</code>
|
||||
* so that it uses <code>buf</code> as its
|
||||
* buffer array.
|
||||
* The buffer array is not copied.
|
||||
* The initial value of <code>pos</code>
|
||||
* is <code>0</code> and the initial value
|
||||
* of <code>count</code> is the length of
|
||||
* <code>buf</code>.
|
||||
*
|
||||
* @param buf the input buffer.
|
||||
*/
|
||||
public SimpleCharArrayReader(char buf[])
|
||||
{
|
||||
this.buf = buf;
|
||||
this.pos = 0;
|
||||
this.count = buf.length;
|
||||
}
|
||||
|
||||
/**
|
||||
* Creates <code>SimpleCharArrayReader</code>
|
||||
* that uses <code>buf</code> as its
|
||||
* buffer array. The initial value of <code>pos</code>
|
||||
* is <code>offset</code> and the initial value
|
||||
* of <code>count</code> is the smaller of <code>offset+length</code> and <code>buf.length</code>.
|
||||
* The buffer array is not copied.
|
||||
* <p>
|
||||
* Note that if bytes are simply read from
|
||||
* the resulting input stream, elements <code>buf[pos]</code>
|
||||
* through <code>buf[pos+len-1]</code> will
|
||||
* be read; however, if a <code>reset</code>
|
||||
* operation is performed, then bytes <code>buf[0]</code>
|
||||
* through <code>buf[pos-1]</code> will then
|
||||
* become available for input.
|
||||
*
|
||||
* @param buf the input buffer.
|
||||
* @param offset the offset in the buffer of the first byte to read.
|
||||
* @param length the maximum number of bytes to read from the buffer.
|
||||
*/
|
||||
public SimpleCharArrayReader(char buf[], int offset, int length)
|
||||
{
|
||||
this.buf = buf;
|
||||
this.pos = offset;
|
||||
this.count = Math.min(offset + length, buf.length);
|
||||
this.mark = offset;
|
||||
}
|
||||
|
||||
/**
|
||||
* Reads the next byte of data from this input stream. The value
|
||||
* byte is returned as an <code>int</code> in the range
|
||||
* <code>0</code> to <code>255</code>. If no byte is available
|
||||
* because the end of the stream has been reached, the value
|
||||
* <code>-1</code> is returned.
|
||||
* <p>
|
||||
*
|
||||
* @return the next byte of data, or <code>-1</code> if the end of the
|
||||
* stream has been reached.
|
||||
*/
|
||||
public int read()
|
||||
{
|
||||
return (pos < count) ? (buf[pos++] & 0xff) : -1;
|
||||
}
|
||||
|
||||
/**
|
||||
* Reads up to <code>len</code> bytes of data into an array of bytes
|
||||
* from this input stream.
|
||||
* If <code>pos</code> equals <code>count</code>,
|
||||
* then <code>-1</code> is returned to indicate
|
||||
* end of file. Otherwise, the number <code>k</code>
|
||||
* of bytes read is equal to the smaller of
|
||||
* <code>len</code> and <code>count-pos</code>.
|
||||
* If <code>k</code> is positive, then bytes
|
||||
* <code>buf[pos]</code> through <code>buf[pos+k-1]</code>
|
||||
* are copied into <code>b[off]</code> through
|
||||
* <code>b[off+k-1]</code> in the manner performed
|
||||
* by <code>System.arraycopy</code>. The
|
||||
* value <code>k</code> is added into <code>pos</code>
|
||||
* and <code>k</code> is returned.
|
||||
* <p>
|
||||
* This <code>read</code> method cannot block.
|
||||
*
|
||||
* @param b the buffer into which the data is read.
|
||||
* @param off the start offset of the data.
|
||||
* @param len the maximum number of bytes read.
|
||||
* @return the total number of bytes read into the buffer, or
|
||||
* <code>-1</code> if there is no more data because the end of
|
||||
* the stream has been reached.
|
||||
*/
|
||||
public int read(char b[], int off, int len)
|
||||
{
|
||||
if (b == null)
|
||||
{
|
||||
throw new NullPointerException();
|
||||
}
|
||||
else if ((off < 0) || (off > b.length) || (len < 0) ||
|
||||
((off + len) > b.length) || ((off + len) < 0))
|
||||
{
|
||||
throw new IndexOutOfBoundsException();
|
||||
}
|
||||
if (pos >= count)
|
||||
{
|
||||
return -1;
|
||||
}
|
||||
if (pos + len > count)
|
||||
{
|
||||
len = count - pos;
|
||||
}
|
||||
if (len <= 0)
|
||||
{
|
||||
return 0;
|
||||
}
|
||||
System.arraycopy(buf, pos, b, off, len);
|
||||
pos += len;
|
||||
return len;
|
||||
}
|
||||
|
||||
/**
|
||||
* Skips <code>n</code> bytes of input from this input stream. Fewer
|
||||
* bytes might be skipped if the end of the input stream is reached.
|
||||
* The actual number <code>k</code>
|
||||
* of bytes to be skipped is equal to the smaller
|
||||
* of <code>n</code> and <code>count-pos</code>.
|
||||
* The value <code>k</code> is added into <code>pos</code>
|
||||
* and <code>k</code> is returned.
|
||||
*
|
||||
* @param n the number of bytes to be skipped.
|
||||
* @return the actual number of bytes skipped.
|
||||
*/
|
||||
public long skip(long n)
|
||||
{
|
||||
if (pos + n > count)
|
||||
{
|
||||
n = count - pos;
|
||||
}
|
||||
if (n < 0)
|
||||
{
|
||||
return 0;
|
||||
}
|
||||
pos += n;
|
||||
return n;
|
||||
}
|
||||
|
||||
/**
|
||||
* Returns the number of bytes that can be read from this input
|
||||
* stream without blocking.
|
||||
* The value returned is
|
||||
* <code>count - pos</code>,
|
||||
* which is the number of bytes remaining to be read from the input buffer.
|
||||
*
|
||||
* @return the number of bytes that can be read from the input stream
|
||||
* without blocking.
|
||||
*/
|
||||
public int available()
|
||||
{
|
||||
return count - pos;
|
||||
}
|
||||
|
||||
/**
|
||||
* Tests if SimpleCharArrayReader supports mark/reset.
|
||||
*
|
||||
* @since JDK1.1
|
||||
*/
|
||||
public boolean markSupported()
|
||||
{
|
||||
return true;
|
||||
}
|
||||
|
||||
/**
|
||||
* Set the current marked position in the stream.
|
||||
* SimpleCharArrayReader objects are marked at position zero by
|
||||
* default when constructed. They may be marked at another
|
||||
* position within the buffer by this method.
|
||||
*
|
||||
* @since JDK1.1
|
||||
*/
|
||||
public void mark(int readAheadLimit)
|
||||
{
|
||||
mark = pos;
|
||||
}
|
||||
|
||||
/**
|
||||
* Resets the buffer to the marked position. The marked position
|
||||
* is the beginning unless another position was marked.
|
||||
* The value of <code>pos</code> is set to <code>mark</code>.
|
||||
*/
|
||||
public void reset()
|
||||
{
|
||||
|
||||
pos = mark;
|
||||
}
|
||||
|
||||
/**
|
||||
* Closes this input stream and releases any system resources
|
||||
* associated with the stream.
|
||||
* <p>
|
||||
*/
|
||||
public void close() throws IOException
|
||||
{
|
||||
isClosed = true;
|
||||
}
|
||||
|
||||
/** Check to make sure that the stream has not been closed */
|
||||
private void ensureOpen()
|
||||
{
|
||||
/* This method does nothing for now. Once we add throws clauses
|
||||
* to the I/O methods in this class, it will throw an IOException
|
||||
* if the stream has been closed.
|
||||
*/
|
||||
}
|
||||
|
||||
}
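A note on the single-character `read()` in the class above: it masks each value with `0xff`, so any character above U+00FF comes back truncated to its low 8 bits. The following is a minimal sketch of that effect, compared with the JDK's `java.io.CharArrayReader`. The class name `CharReadSketch` and both helper methods are hypothetical and were not part of LARM.

```java
import java.io.CharArrayReader;
import java.io.IOException;
import java.io.UncheckedIOException;

public class CharReadSketch {

    /** Mirrors the masked read: only the low 8 bits of the char survive. */
    public static int maskedRead(char[] buf, int pos) {
        return (pos < buf.length) ? (buf[pos] & 0xff) : -1;
    }

    /** Reads the first char through the JDK reader, which returns the full 16-bit value. */
    public static int jdkRead(char[] buf) {
        try (CharArrayReader reader = new CharArrayReader(buf)) {
            return reader.read();
        } catch (IOException e) {
            // CharArrayReader only throws if it has already been closed.
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        char[] buf = { '\u20ac' }; // euro sign, code unit 0x20AC
        System.out.println(Integer.toHexString(maskedRead(buf, 0)));
        System.out.println(Integer.toHexString(jdkRead(buf)));
    }
}
```

The bulk `read(char[], int, int)` in the class above copies with `System.arraycopy` and is not affected; only the single-character path truncates.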
|
|
@ -1,170 +0,0 @@
|
|||
/* ====================================================================
|
||||
* The Apache Software License, Version 1.1
|
||||
*
|
||||
* Copyright (c) 2001 The Apache Software Foundation. All rights
|
||||
* reserved.
|
||||
*
|
||||
* Redistribution and use in source and binary forms, with or without
|
||||
* modification, are permitted provided that the following conditions
|
||||
* are met:
|
||||
*
|
||||
* 1. Redistributions of source code must retain the above copyright
|
||||
* notice, this list of conditions and the following disclaimer.
|
||||
*
|
||||
* 2. Redistributions in binary form must reproduce the above copyright
|
||||
* notice, this list of conditions and the following disclaimer in
|
||||
* the documentation and/or other materials provided with the
|
||||
* distribution.
|
||||
*
|
||||
* 3. The end-user documentation included with the redistribution,
|
||||
* if any, must include the following acknowledgment:
|
||||
* "This product includes software developed by the
|
||||
* Apache Software Foundation (http://www.apache.org/)."
|
||||
* Alternately, this acknowledgment may appear in the software itself,
|
||||
* if and wherever such third-party acknowledgments normally appear.
|
||||
*
|
||||
* 4. The names "Apache" and "Apache Software Foundation" and
|
||||
* "Apache Lucene" must not be used to endorse or promote products
|
||||
* derived from this software without prior written permission. For
|
||||
* written permission, please contact apache@apache.org.
|
||||
*
|
||||
* 5. Products derived from this software may not be called "Apache",
|
||||
* "Apache Lucene", nor may "Apache" appear in their name, without
|
||||
* prior written permission of the Apache Software Foundation.
|
||||
*
|
||||
* THIS SOFTWARE IS PROVIDED ``AS IS'' AND ANY EXPRESSED OR IMPLIED
|
||||
* WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED WARRANTIES
|
||||
* OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE
|
||||
* DISCLAIMED. IN NO EVENT SHALL THE APACHE SOFTWARE FOUNDATION OR
|
||||
* ITS CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL,
|
||||
* SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT
|
||||
* LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF
|
||||
* USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND
|
||||
* ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY,
|
||||
* OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT
|
||||
* OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF
|
||||
* SUCH DAMAGE.
|
||||
* ====================================================================
|
||||
*
|
||||
* This software consists of voluntary contributions made by many
|
||||
* individuals on behalf of the Apache Software Foundation. For more
|
||||
* information on the Apache Software Foundation, please see
|
||||
* <http://www.apache.org/>.
|
||||
*/
|
||||
|
||||
package de.lanlab.larm.util;
|
||||
|
||||
import java.io.*;
|
||||
import java.util.*;
|
||||
import java.text.*;
|
||||
|
||||
/**
|
||||
* This class is only used for SPEED. Its log function is not thread safe by
|
||||
* default.
|
||||
* It uses a BufferedWriter.
|
||||
* It registers with a logger manager, which can be used to flush several loggers
|
||||
* at once.
|
||||
* @todo: including the date slows down a lot
|
||||
* @version $Id$
|
||||
*/
|
||||
public class SimpleLogger
|
||||
{
|
||||
private final SimpleDateFormat formatter = new SimpleDateFormat ("HH:mm:ss:SSSS");
|
||||
|
||||
private Writer logFile;
|
||||
|
||||
private StringBuffer buffer = new StringBuffer(1000);
|
||||
|
||||
private long startTime = System.currentTimeMillis();
|
||||
private boolean includeDate;
|
||||
private boolean flushAtOnce = false;
|
||||
|
||||
|
||||
/**
|
||||
* Creates a new <code>SimpleLogger</code> instance.
|
||||
*
|
||||
* @param name a <code>String</code> value
|
||||
*/
|
||||
public SimpleLogger(String name)
|
||||
{
|
||||
init(name, true);
|
||||
}
|
||||
|
||||
/**
|
||||
* Creates a new <code>SimpleLogger</code> instance.
|
||||
*
|
||||
* @param name a <code>String</code> value
|
||||
* @param includeDate a <code>boolean</code> value
|
||||
*/
|
||||
public SimpleLogger(String name, boolean includeDate)
|
||||
{
|
||||
init(name, includeDate);
|
||||
}
|
||||
|
||||
public void setStartTime(long startTime)
|
||||
{
|
||||
this.startTime = startTime;
|
||||
}
|
||||
|
||||
public synchronized void logThreadSafe(String text)
|
||||
{
|
||||
log(text);
|
||||
}
|
||||
|
||||
public synchronized void logThreadSafe(Throwable t)
|
||||
{
|
||||
log(t);
|
||||
}
|
||||
|
||||
public void log(String text)
|
||||
{
|
||||
try
|
||||
{
|
||||
buffer.setLength(0);
|
||||
if (includeDate)
|
||||
{
|
||||
buffer.append(formatter.format(new Date())).append(": ").append(System.currentTimeMillis()-startTime).append(" ms: ");
|
||||
}
|
||||
buffer.append(text).append("\n");
|
||||
logFile.write(buffer.toString());
|
||||
if (flushAtOnce)
|
||||
{
|
||||
logFile.flush();
|
||||
}
|
||||
}
|
||||
catch(IOException e)
|
||||
{
|
||||
System.out.println("Couldn't write to logfile");
|
||||
}
|
||||
}
|
||||
|
||||
public void log(Throwable t)
|
||||
{
|
||||
t.printStackTrace(new PrintWriter(logFile));
|
||||
}
|
||||
|
||||
public void setFlushAtOnce(boolean flush)
|
||||
{
|
||||
this.flushAtOnce = flush;
|
||||
}
|
||||
|
||||
public void flush() throws IOException
|
||||
{
|
||||
logFile.flush();
|
||||
}
|
||||
|
||||
private void init(String name, boolean includeDate)
|
||||
{
|
||||
try
|
||||
{
|
||||
// FIXME: the logs directory needs to be configurable
|
||||
logFile = new BufferedWriter(new FileWriter("logs/" + name + ".log"));
|
||||
SimpleLoggerManager.getInstance().register(this);
|
||||
}
|
||||
catch (IOException e)
|
||||
{
|
||||
System.out.println("IOException while creating logfile " + name + ":");
|
||||
e.printStackTrace();
|
||||
}
|
||||
}
|
||||
}
|
|
@ -1,111 +0,0 @@
|
|||
/* ====================================================================
|
||||
* The Apache Software License, Version 1.1
|
||||
*
|
||||
* Copyright (c) 2001 The Apache Software Foundation. All rights
|
||||
* reserved.
|
||||
*
|
||||
* Redistribution and use in source and binary forms, with or without
|
||||
* modification, are permitted provided that the following conditions
|
||||
* are met:
|
||||
*
|
||||
* 1. Redistributions of source code must retain the above copyright
|
||||
* notice, this list of conditions and the following disclaimer.
|
||||
*
|
||||
* 2. Redistributions in binary form must reproduce the above copyright
|
||||
* notice, this list of conditions and the following disclaimer in
|
||||
* the documentation and/or other materials provided with the
|
||||
* distribution.
|
||||
*
|
||||
* 3. The end-user documentation included with the redistribution,
|
||||
* if any, must include the following acknowledgment:
|
||||
* "This product includes software developed by the
|
||||
* Apache Software Foundation (http://www.apache.org/)."
|
||||
* Alternately, this acknowledgment may appear in the software itself,
|
||||
* if and wherever such third-party acknowledgments normally appear.
|
||||
*
|
||||
* 4. The names "Apache" and "Apache Software Foundation" and
|
||||
* "Apache Lucene" must not be used to endorse or promote products
|
||||
* derived from this software without prior written permission. For
|
||||
* written permission, please contact apache@apache.org.
|
||||
*
|
||||
* 5. Products derived from this software may not be called "Apache",
|
||||
* "Apache Lucene", nor may "Apache" appear in their name, without
|
||||
* prior written permission of the Apache Software Foundation.
|
||||
*
|
||||
* THIS SOFTWARE IS PROVIDED ``AS IS'' AND ANY EXPRESSED OR IMPLIED
|
||||
* WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED WARRANTIES
|
||||
* OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE
|
||||
* DISCLAIMED. IN NO EVENT SHALL THE APACHE SOFTWARE FOUNDATION OR
|
||||
* ITS CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL,
|
||||
* SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT
|
||||
* LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF
|
||||
* USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND
|
||||
* ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY,
|
||||
* OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT
|
||||
* OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF
|
||||
* SUCH DAMAGE.
|
||||
* ====================================================================
|
||||
*
|
||||
* This software consists of voluntary contributions made by many
|
||||
* individuals on behalf of the Apache Software Foundation. For more
|
||||
* information on the Apache Software Foundation, please see
|
||||
* <http://www.apache.org/>.
|
||||
*/
|
||||
|
||||
package de.lanlab.larm.util;
|
||||
|
||||
import java.util.*;
|
||||
import java.io.IOException;
|
||||
|
||||
/**
|
||||
* This singleton manages all loggers. It can be used to flush all SimpleLoggers
|
||||
* at once.
|
||||
* @version $Id$
|
||||
*/
|
||||
public class SimpleLoggerManager
|
||||
{
|
||||
private static SimpleLoggerManager instance = null;
|
||||
|
||||
private ArrayList logs;
|
||||
|
||||
private SimpleLoggerManager()
|
||||
{
|
||||
logs = new ArrayList();
|
||||
}
|
||||
|
||||
public void register(SimpleLogger logger)
|
||||
{
|
||||
logs.add(logger);
|
||||
}
|
||||
|
||||
public void flush() throws IOException
|
||||
{
|
||||
Iterator it = logs.iterator();
|
||||
IOException ex = null;
|
||||
while(it.hasNext())
|
||||
{
|
||||
try
|
||||
{
|
||||
SimpleLogger logger = (SimpleLogger)it.next();
|
||||
logger.flush();
|
||||
}
|
||||
catch(IOException e)
|
||||
{
|
||||
ex = e;
|
||||
}
|
||||
}
|
||||
if (ex != null)
|
||||
{
|
||||
throw ex;
|
||||
}
|
||||
}
|
||||
|
||||
public synchronized static SimpleLoggerManager getInstance()
|
||||
{
|
||||
if (instance == null)
|
||||
{
|
||||
instance = new SimpleLoggerManager();
|
||||
}
|
||||
return instance;
|
||||
}
|
||||
}
|
|
@ -1,66 +0,0 @@
|
|||
/* ====================================================================
|
||||
* The Apache Software License, Version 1.1
|
||||
*
|
||||
* Copyright (c) 2001 The Apache Software Foundation. All rights
|
||||
* reserved.
|
||||
*
|
||||
* Redistribution and use in source and binary forms, with or without
|
||||
* modification, are permitted provided that the following conditions
|
||||
* are met:
|
||||
*
|
||||
* 1. Redistributions of source code must retain the above copyright
|
||||
* notice, this list of conditions and the following disclaimer.
|
||||
*
|
||||
* 2. Redistributions in binary form must reproduce the above copyright
|
||||
* notice, this list of conditions and the following disclaimer in
|
||||
* the documentation and/or other materials provided with the
|
||||
* distribution.
|
||||
*
|
||||
* 3. The end-user documentation included with the redistribution,
|
||||
* if any, must include the following acknowledgment:
|
||||
* "This product includes software developed by the
|
||||
* Apache Software Foundation (http://www.apache.org/)."
|
||||
* Alternately, this acknowledgment may appear in the software itself,
|
||||
* if and wherever such third-party acknowledgments normally appear.
|
||||
*
|
||||
* 4. The names "Apache" and "Apache Software Foundation" and
|
||||
* "Apache Lucene" must not be used to endorse or promote products
|
||||
* derived from this software without prior written permission. For
|
||||
* written permission, please contact apache@apache.org.
|
||||
*
|
||||
* 5. Products derived from this software may not be called "Apache",
|
||||
* "Apache Lucene", nor may "Apache" appear in their name, without
|
||||
* prior written permission of the Apache Software Foundation.
|
||||
*
|
||||
* THIS SOFTWARE IS PROVIDED ``AS IS'' AND ANY EXPRESSED OR IMPLIED
|
||||
* WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED WARRANTIES
|
||||
* OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE
|
||||
* DISCLAIMED. IN NO EVENT SHALL THE APACHE SOFTWARE FOUNDATION OR
|
||||
* ITS CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL,
|
||||
* SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT
|
||||
* LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF
|
||||
* USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND
|
||||
* ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY,
|
||||
* OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT
|
||||
* OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF
|
||||
* SUCH DAMAGE.
|
||||
* ====================================================================
|
||||
*
|
||||
* This software consists of voluntary contributions made by many
|
||||
* individuals on behalf of the Apache Software Foundation. For more
|
||||
* information on the Apache Software Foundation, please see
|
||||
* <http://www.apache.org/>.
|
||||
*/
|
||||
|
||||
package de.lanlab.larm.util;
|
||||
|
||||
import java.util.Observable;
|
||||
|
||||
public class SimpleObservable extends Observable
|
||||
{
|
||||
|
||||
public void setChanged()
|
||||
{
|
||||
super.setChanged();
|
||||
}
|
||||
}
|
|
@ -1,162 +0,0 @@
|
|||
package de.lanlab.larm.util;
|
||||
|
||||
/* ====================================================================
|
||||
* The Apache Software License, Version 1.1
|
||||
*
|
||||
* Copyright (c) 2001 The Apache Software Foundation. All rights
|
||||
* reserved.
|
||||
*
|
||||
* Redistribution and use in source and binary forms, with or without
|
||||
* modification, are permitted provided that the following conditions
|
||||
* are met:
|
||||
*
|
||||
* 1. Redistributions of source code must retain the above copyright
|
||||
* notice, this list of conditions and the following disclaimer.
|
||||
*
|
||||
* 2. Redistributions in binary form must reproduce the above copyright
|
||||
* notice, this list of conditions and the following disclaimer in
|
||||
* the documentation and/or other materials provided with the
|
||||
* distribution.
|
||||
*
|
||||
* 3. The end-user documentation included with the redistribution,
|
||||
* if any, must include the following acknowledgment:
|
||||
* "This product includes software developed by the
|
||||
* Apache Software Foundation (http://www.apache.org/)."
|
||||
* Alternately, this acknowledgment may appear in the software itself,
|
||||
* if and wherever such third-party acknowledgments normally appear.
|
||||
*
|
||||
* 4. The names "Apache" and "Apache Software Foundation" and
|
||||
* "Apache Lucene" must not be used to endorse or promote products
|
||||
* derived from this software without prior written permission. For
|
||||
* written permission, please contact apache@apache.org.
|
||||
*
|
||||
* 5. Products derived from this software may not be called "Apache",
|
||||
* "Apache Lucene", nor may "Apache" appear in their name, without
|
||||
* prior written permission of the Apache Software Foundation.
|
||||
*
|
||||
* THIS SOFTWARE IS PROVIDED ``AS IS'' AND ANY EXPRESSED OR IMPLIED
|
||||
* WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED WARRANTIES
|
||||
* OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE
|
||||
* DISCLAIMED. IN NO EVENT SHALL THE APACHE SOFTWARE FOUNDATION OR
|
||||
* ITS CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL,
|
||||
* SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT
|
||||
* LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF
|
||||
* USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND
|
||||
* ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY,
|
||||
* OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT
|
||||
* OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF
|
||||
* SUCH DAMAGE.
|
||||
* ====================================================================
|
||||
*
|
||||
* This software consists of voluntary contributions made by many
|
||||
* individuals on behalf of the Apache Software Foundation. For more
|
||||
* information on the Apache Software Foundation, please see
|
||||
* <http://www.apache.org/>.
|
||||
*/
|
||||
|
||||
|
||||
/**
|
||||
* A simple string tokenizer that regards <b>one</b> character as a delimiter.
|
||||
* Compared to Sun's StringTokenizer, it returns an empty token if two
|
||||
* subsequent delimiters are found
|
||||
*
|
||||
* @author Clemens Marschner
|
||||
* @created 24 March 2002
|
||||
*/
|
||||
public class SimpleStringTokenizer
|
||||
{
|
||||
|
||||
String string;
|
||||
|
||||
int currPos;
|
||||
int maxPos;
|
||||
char delim;
|
||||
|
||||
|
||||
/**
|
||||
* Constructor for the SimpleStringTokenizer object
|
||||
*
|
||||
* @param string the string to be tokenized
|
||||
* @param delim the delimiter that splits the string
|
||||
*/
|
||||
public SimpleStringTokenizer(String string, char delim)
|
||||
{
|
||||
setString(string);
|
||||
setDelim(delim);
|
||||
}
|
||||
|
||||
|
||||
/**
|
||||
* sets the delimiter. The tokenizer is not reset.
|
||||
*
|
||||
* @param delim The new delim value
|
||||
*/
|
||||
public void setDelim(char delim)
|
||||
{
|
||||
this.delim = delim;
|
||||
}
|
||||
|
||||
|
||||
/**
|
||||
* sets the string and reinitializes the tokenizer. Allows for reusing the
|
||||
* tokenizer object
|
||||
*
|
||||
* @param string string to be tokenized
|
||||
*/
|
||||
public void setString(String string)
|
||||
{
|
||||
this.string = string;
|
||||
reset();
|
||||
|
||||
maxPos = string.length() - 1;
|
||||
}
|
||||
|
||||
|
||||
/**
|
||||
* resets the tokenizer. It will act like newly created
|
||||
*/
|
||||
public void reset()
|
||||
{
|
||||
currPos = 0;
|
||||
}
|
||||
|
||||
|
||||
/**
|
||||
* returns true if the end is not reached
|
||||
*
|
||||
* @return false if the end is reached.
|
||||
*/
|
||||
public boolean hasMore()
|
||||
{
|
||||
return currPos <= maxPos;
|
||||
}
|
||||
|
||||
|
||||
/**
|
||||
* returns the next token from the stream. returns an empty string if the
|
||||
* end is reached
|
||||
*
|
||||
* @return the next token, or an empty string if the end is reached
|
||||
* @see java.util.StringTokenizer#nextToken
|
||||
*/
|
||||
public String nextToken()
|
||||
{
|
||||
int nextPos = string.indexOf(delim, currPos);
|
||||
if (nextPos == -1)
|
||||
{
|
||||
nextPos = maxPos + 1;
|
||||
}
|
||||
String sub;
|
||||
if (nextPos > currPos)
|
||||
{
|
||||
sub = string.substring(currPos, nextPos);
|
||||
}
|
||||
else
|
||||
{
|
||||
sub = "";
|
||||
}
|
||||
currPos = nextPos + 1;
|
||||
return sub;
|
||||
}
|
||||
}
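The difference from `java.util.StringTokenizer` described in the class comment above can be sketched as follows. This is an illustrative re-implementation under the assumption of a single delimiter character, not the removed class itself. The names `TokenizerSketch`, `split` and `splitJdk` are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.StringTokenizer;

public class TokenizerSketch {

    /** Splits on one delimiter char and keeps the empty token between adjacent delimiters. */
    public static List<String> split(String s, char delim) {
        List<String> tokens = new ArrayList<>();
        int pos = 0;
        while (pos < s.length()) {
            int next = s.indexOf(delim, pos);
            if (next == -1) {
                next = s.length(); // no further delimiter: take the rest of the string
            }
            tokens.add(s.substring(pos, next)); // empty when pos == next
            pos = next + 1;
        }
        return tokens;
    }

    /** The same input through java.util.StringTokenizer, which skips empty tokens. */
    public static List<String> splitJdk(String s, String delims) {
        List<String> tokens = new ArrayList<>();
        StringTokenizer st = new StringTokenizer(s, delims);
        while (st.hasMoreTokens()) {
            tokens.add(st.nextToken());
        }
        return tokens;
    }

    public static void main(String[] args) {
        System.out.println(split("a,,b", ','));    // three tokens, the middle one empty
        System.out.println(splitJdk("a,,b", ",")); // two tokens
    }
}
```

Like the removed class, this loop does not emit a trailing empty token when the input ends with a delimiter, because scanning stops once the position passes the last character.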
|
||||
|
|
@ -1,145 +0,0 @@
|
|||
/* ====================================================================
|
||||
* The Apache Software License, Version 1.1
|
||||
*
|
||||
* Copyright (c) 2001 The Apache Software Foundation. All rights
|
||||
* reserved.
|
||||
*
|
||||
* Redistribution and use in source and binary forms, with or without
|
||||
* modification, are permitted provided that the following conditions
|
||||
* are met:
|
||||
*
|
||||
* 1. Redistributions of source code must retain the above copyright
|
||||
* notice, this list of conditions and the following disclaimer.
|
||||
*
|
||||
* 2. Redistributions in binary form must reproduce the above copyright
|
||||
* notice, this list of conditions and the following disclaimer in
|
||||
* the documentation and/or other materials provided with the
|
||||
* distribution.
|
||||
*
|
||||
* 3. The end-user documentation included with the redistribution,
|
||||
* if any, must include the following acknowledgment:
|
||||
* "This product includes software developed by the
|
||||
* Apache Software Foundation (http://www.apache.org/)."
|
||||
* Alternately, this acknowledgment may appear in the software itself,
|
||||
* if and wherever such third-party acknowledgments normally appear.
|
||||
*
|
||||
* 4. The names "Apache" and "Apache Software Foundation" and
|
||||
* "Apache Lucene" must not be used to endorse or promote products
|
||||
* derived from this software without prior written permission. For
|
||||
* written permission, please contact apache@apache.org.
|
||||
*
|
||||
* 5. Products derived from this software may not be called "Apache",
|
||||
* "Apache Lucene", nor may "Apache" appear in their name, without
|
||||
* prior written permission of the Apache Software Foundation.
|
||||
*
|
||||
* THIS SOFTWARE IS PROVIDED ``AS IS'' AND ANY EXPRESSED OR IMPLIED
|
||||
* WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED WARRANTIES
|
||||
* OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE
|
||||
* DISCLAIMED. IN NO EVENT SHALL THE APACHE SOFTWARE FOUNDATION OR
|
||||
* ITS CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL,
|
||||
* SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT
|
||||
* LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF
|
||||
* USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND
|
||||
* ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY,
|
||||
* OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT
|
||||
* OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF
|
||||
* SUCH DAMAGE.
|
||||
* ====================================================================
|
||||
*
|
||||
* This software consists of voluntary contributions made by many
|
||||
* individuals on behalf of the Apache Software Foundation. For more
|
||||
* information on the Apache Software Foundation, please see
|
||||
* <http://www.apache.org/>.
|
||||
*/
|
||||
|
||||
package de.lanlab.larm.util;
|
||||
|
||||
import java.io.Serializable;
|
||||
/**
|
||||
* Title: LARM Lanlab Retrieval Machine
|
||||
* Description:
|
||||
* Copyright: Copyright (c)
|
||||
* Company:
|
||||
* @author
|
||||
* @version 1.0
|
||||
*/
|
||||
|
||||
/**
|
||||
* Thread-safe state information.
|
||||
* The get methods are not synchronized; clone the state object before calling them.
|
||||
* If you use a state object in a class, always return a clone:
|
||||
* <pre>public class MyClass {
|
||||
* State state = new State("Running");
|
||||
* public State getState() { return state.cloneState(); }</pre>
|
||||
*
|
||||
* Note on serialization: if you deserialize a state, the state string will be newly created.
|
||||
* That means you then have to compare the states via equals() and not ==.
|
||||
*/
|
||||
public class State implements Cloneable, Serializable
|
||||
{
|
||||
|
||||
private String state;
|
||||
private long stateSince;
|
||||
private Object info;
|
||||
|
||||
public State(String state)
|
||||
{
|
||||
setState(state);
|
||||
}
|
||||
|
||||
|
||||
private State(String state, long stateSince)
|
||||
{
|
||||
init(state, stateSince, null);
|
||||
}
|
||||
|
||||
private State(String state, long stateSince, Object info)
|
||||
{
|
||||
init(state, stateSince, info);
|
||||
}
|
||||
|
||||
private void init(String state, long stateSince, Object info)
|
||||
{
|
||||
this.state = state;
|
||||
this.stateSince = stateSince;
|
||||
this.info = info;
|
||||
}
|
||||
|
||||
public void setState(String state)
|
||||
{
|
||||
setState(state, null);
|
||||
}
|
||||
|
||||
public synchronized void setState(String state, Object info)
|
||||
{
|
||||
this.state = state;
|
||||
this.stateSince = System.currentTimeMillis();
|
||||
this.info = info;
|
||||
}
|
||||
|
||||
public String getState()
|
||||
{
|
||||
return state;
|
||||
}
|
||||
|
||||
public long getStateSince()
|
||||
{
|
||||
return stateSince;
|
||||
}
|
||||
|
||||
public Object getInfo()
|
||||
{
|
||||
return info;
|
||||
}
|
||||
|
||||
public synchronized Object clone()
|
||||
{
|
||||
return new State(state, stateSince, info);
|
||||
}
|
||||
|
||||
public State cloneState()
|
||||
{
|
||||
return (State)clone();
|
||||
}
|
||||
|
||||
}
|
|
@ -1,416 +0,0 @@
|
|||
package de.lanlab.larm.util;
|
||||
|
||||
/* ====================================================================
|
||||
* The Apache Software License, Version 1.1
|
||||
*
|
||||
* Copyright (c) 2001 The Apache Software Foundation. All rights
|
||||
* reserved.
|
||||
*
|
||||
* Redistribution and use in source and binary forms, with or without
|
||||
* modification, are permitted provided that the following conditions
|
||||
* are met:
|
||||
*
|
||||
* 1. Redistributions of source code must retain the above copyright
|
||||
* notice, this list of conditions and the following disclaimer.
|
||||
*
|
||||
* 2. Redistributions in binary form must reproduce the above copyright
|
||||
* notice, this list of conditions and the following disclaimer in
|
||||
* the documentation and/or other materials provided with the
|
||||
* distribution.
|
||||
*
|
||||
* 3. The end-user documentation included with the redistribution,
|
||||
* if any, must include the following acknowledgment:
|
||||
* "This product includes software developed by the
|
||||
* Apache Software Foundation (http://www.apache.org/)."
|
||||
* Alternately, this acknowledgment may appear in the software itself,
|
||||
* if and wherever such third-party acknowledgments normally appear.
|
||||
*
|
||||
* 4. The names "Apache" and "Apache Software Foundation" and
|
||||
* "Apache Lucene" must not be used to endorse or promote products
|
||||
* derived from this software without prior written permission. For
|
||||
* written permission, please contact apache@apache.org.
|
||||
*
|
||||
* 5. Products derived from this software may not be called "Apache",
|
||||
* "Apache Lucene", nor may "Apache" appear in their name, without
|
||||
* prior written permission of the Apache Software Foundation.
|
||||
*
|
||||
* THIS SOFTWARE IS PROVIDED ``AS IS'' AND ANY EXPRESSED OR IMPLIED
|
||||
* WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED WARRANTIES
|
||||
* OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE
|
||||
* DISCLAIMED. IN NO EVENT SHALL THE APACHE SOFTWARE FOUNDATION OR
|
||||
* ITS CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL,
|
||||
* SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT
|
||||
* LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF
|
||||
* USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND
|
||||
* ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY,
|
||||
* OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT
|
||||
* OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF
|
||||
* SUCH DAMAGE.
|
||||
* ====================================================================
|
||||
*
|
||||
* This software consists of voluntary contributions made by many
|
||||
* individuals on behalf of the Apache Software Foundation. For more
|
||||
* information on the Apache Software Foundation, please see
|
||||
* <http://www.apache.org/>.
|
||||
*/
|
||||
|
||||
import java.io.*;
|
||||
import java.util.*;
|
||||
import de.lanlab.larm.parser.*;
|
||||
import java.net.*;
|
||||
import de.lanlab.larm.fetcher.*;
|
||||
import de.lanlab.larm.net.*;
|
||||
|
||||
/**
|
||||
* Utility class for accessing page files through the store.log file.
|
||||
* Works like an iterator
|
||||
*/
|
||||
public class StoreLogFile implements Iterator
|
||||
{
|
||||
|
||||
public void remove()
|
||||
{
|
||||
throw new UnsupportedOperationException();
|
||||
}
|
||||
|
||||
|
||||
/**
|
||||
* @author Clemens Marschner
|
||||
* @version 1.0
|
||||
*/
|
||||
public class PageFileEntry
|
||||
{
|
||||
String url;
|
||||
int pageFileNo;
|
||||
int resultCode;
|
||||
String mimeType;
|
||||
int size;
|
||||
String title;
|
||||
int pageFileOffset;
|
||||
File pageFileDirectory;
|
||||
boolean hasPageFileEntry;
|
||||
int isFrame;
|
||||
|
||||
class PageFileInputStream extends InputStream
|
||||
{
|
||||
InputStream pageFileIS;
|
||||
long offset;
|
||||
|
||||
public PageFileInputStream() throws IOException
|
||||
{
|
||||
pageFileIS = new FileInputStream(new File(pageFileDirectory, "pagefile_" + pageFileNo + ".pfl"));
|
||||
offset = 0;
|
||||
pageFileIS.skip(pageFileOffset);
|
||||
}
|
||||
public int available() throws IOException
|
||||
{
|
||||
return Math.min(pageFileIS.available(), (int)(size - offset));
|
||||
}
|
||||
public void close() throws IOException
|
||||
{
|
||||
pageFileIS.close();
|
||||
}
|
||||
public void mark(int readLimit)
|
||||
{
|
||||
throw new UnsupportedOperationException();
|
||||
}
|
||||
public boolean markSupported()
|
||||
{
|
||||
return false;
|
||||
}
|
||||
public int read() throws IOException
|
||||
{
|
||||
if(offset >= size)
|
||||
{
|
||||
return -1;
|
||||
}
|
||||
int c = pageFileIS.read();
|
||||
if(c != -1)
|
||||
{
|
||||
offset ++;
|
||||
}
|
||||
return c;
|
||||
}
|
||||
|
||||
public int read(byte[] b) throws IOException
|
||||
{
|
||||
int len = Math.min((int)(size-offset), b.length);
|
||||
if(len > 0)
|
||||
{
|
||||
len = pageFileIS.read(b, 0, len);
|
||||
if(len != -1)
|
||||
{
|
||||
offset += len;
|
||||
}
|
||||
return len;
|
||||
}
|
||||
return -1;
|
||||
}
|
||||
public int read(byte[] b, int off, int maxLen) throws IOException
|
||||
{
|
||||
int len = Math.min(Math.min((int)(size-offset), b.length), maxLen);
|
||||
if(len > 0)
|
||||
{
|
||||
len = pageFileIS.read(b, off, len); // read at most len bytes so we never run past this entry
|
||||
if(len != -1)
|
||||
{
|
||||
offset += len;
|
||||
}
|
||||
return len;
|
||||
}
|
||||
return -1;
|
||||
}
|
||||
public long skip(long n) throws IOException
|
||||
{
|
||||
n = Math.min(n, size-offset);
|
||||
n = pageFileIS.skip(n);
|
||||
if(n > 0)
|
||||
{
|
||||
offset+=n;
|
||||
}
|
||||
return n;
|
||||
}
|
||||
|
||||
|
||||
|
||||
}
|
||||
|
||||
public PageFileEntry(String storeLogLine, File pageFileDirectory)
|
||||
{
|
||||
String column=null;
|
||||
SimpleStringTokenizer t = new SimpleStringTokenizer(storeLogLine, '\t');
|
||||
try
|
||||
{
|
||||
|
||||
hasPageFileEntry = false;
|
||||
t.nextToken();
|
||||
url = t.nextToken();
|
||||
column = "isFrame";
|
||||
isFrame = Integer.parseInt(t.nextToken());
|
||||
t.nextToken(); // anchor
|
||||
column = "resultCode";
|
||||
resultCode = Integer.parseInt(t.nextToken());
|
||||
mimeType = t.nextToken();
|
||||
column = "size";
|
||||
size = Integer.parseInt(t.nextToken());
|
||||
title = t.nextToken();
|
||||
if(size > 0)
|
||||
{
|
||||
column = "pageFileNo";
|
||||
pageFileNo = Integer.parseInt(t.nextToken());
|
||||
column = "pageFileOffset";
|
||||
pageFileOffset = Integer.parseInt(t.nextToken());
|
||||
this.pageFileDirectory = pageFileDirectory;
|
||||
hasPageFileEntry = true;
|
||||
}
|
||||
}
|
||||
catch(NumberFormatException e) // possibly tab characters in title. ignore
|
||||
{
|
||||
//System.out.println(e + " at " + url + " in column " + column);
|
||||
}
|
||||
}
|
||||
|
||||
public InputStream getInputStream() throws IOException
|
||||
{
|
||||
if(hasPageFileEntry)
|
||||
{
|
||||
return new PageFileInputStream();
|
||||
}
|
||||
else return null;
|
||||
}
|
||||
|
||||
}
|
||||
|
||||
BufferedReader reader;
|
||||
boolean isOpen = false;
|
||||
File storeLog;
|
||||
|
||||
/**
|
||||
*
|
||||
* @param storeLog location of store.log from LogStorage. pagefile_xy.pfl
|
||||
* must be in the same directory
|
||||
* @throws IOException if store.log cannot be opened
|
||||
*/
|
||||
public StoreLogFile(File storeLog) throws IOException
|
||||
{
|
||||
this.storeLog = storeLog;
|
||||
reader = new BufferedReader(new FileReader(storeLog));
|
||||
isOpen = true; // unless exception
|
||||
|
||||
}
|
||||
|
||||
public boolean hasNext()
|
||||
{
|
||||
try
|
||||
{
|
||||
reader.mark(1000);
|
||||
if(reader.readLine() != null)
|
||||
{
|
||||
reader.reset();
|
||||
return true;
|
||||
}
|
||||
else
|
||||
{
|
||||
return false;
|
||||
}
|
||||
}
|
||||
catch(IOException e)
|
||||
{
|
||||
throw new RuntimeException("IOException occurred");
|
||||
}
|
||||
}
|
||||
|
||||
/**
|
||||
* @return a StoreLogFile.PageFileEntry with the current file
|
||||
* @throws RuntimeException if an IOException occurs while reading store.log
|
||||
*/
|
||||
public Object next()
|
||||
{
|
||||
try
|
||||
{
|
||||
return new PageFileEntry(reader.readLine(), storeLog.getParentFile());
|
||||
}
|
||||
catch(IOException e)
|
||||
{
|
||||
throw new RuntimeException("IOException occurred");
|
||||
}
|
||||
}
|
||||
|
||||
|
||||
|
||||
|
||||
// static SimpleLogger log;
|
||||
// static PageFileEntry entry;
|
||||
// static ArrayList foundURLs;
|
||||
// static URL base;
|
||||
// static URL contextUrl;
|
||||
//
|
||||
// static void test1(StoreLogFile store) throws IOException
|
||||
// {
|
||||
// while(store.hasNext())
|
||||
// {
|
||||
// PageFileEntry entry = store.next();
|
||||
// if(entry.mimeType.equals("text/plain") && entry.hasPageFileEntry)
|
||||
// {
|
||||
// BufferedReader r = new BufferedReader(new InputStreamReader(entry.getInputStream()));
|
||||
// String l;
|
||||
// while((l = r.readLine()) != null)
|
||||
// {
|
||||
// System.out.println(entry.url + " >> " + l);
|
||||
// }
|
||||
// r.close();
|
||||
// }
|
||||
// //System.out.println(entry.title);
|
||||
// }
|
||||
// }
|
||||
// static void test2(StoreLogFile store) throws Exception
|
||||
// {
|
||||
// MessageHandler msgH = new MessageHandler();
|
||||
// log = new SimpleLogger("errors.log");
|
||||
// msgH.addListener(new URLVisitedFilter(log, 100000));
|
||||
// final de.lanlab.larm.net.HostManager hm = new de.lanlab.larm.net.HostManager(1000);
|
||||
// hm.setHostResolver(new HostResolver());
|
||||
//
|
||||
// while(store.hasNext())
|
||||
// {
|
||||
// entry = store.next();
|
||||
// foundURLs = new ArrayList();
|
||||
// if(entry.mimeType.startsWith("text/html") && entry.hasPageFileEntry)
|
||||
// {
|
||||
// Tokenizer t = new Tokenizer();
|
||||
// base = new URL(entry.url);
|
||||
// contextUrl = new URL(entry.url);
|
||||
//
|
||||
// t.setLinkHandler(new LinkHandler()
|
||||
// {
|
||||
//
|
||||
// public void handleLink(String link, String anchor, boolean isFrame)
|
||||
// {
|
||||
// try
|
||||
// {
|
||||
// // cut out Ref part
|
||||
//
|
||||
//
|
||||
// int refPart = link.indexOf("#");
|
||||
// //System.out.println(link);
|
||||
// if (refPart == 0)
|
||||
// {
|
||||
// return;
|
||||
// }
|
||||
// else if (refPart > 0)
|
||||
// {
|
||||
// link = link.substring(0, refPart);
|
||||
// }
|
||||
//
|
||||
// URL url = null;
|
||||
// if (link.startsWith("http:"))
|
||||
// {
|
||||
// // distinguish between absolute and relative URLs
|
||||
//
|
||||
// url = new URL(link);
|
||||
// }
|
||||
// else
|
||||
// {
|
||||
// // relative url
|
||||
// url = new URL(base, link);
|
||||
// }
|
||||
//
|
||||
// URLMessage urlMessage = new URLMessage(url, contextUrl, isFrame ? URLMessage.LINKTYPE_FRAME : URLMessage.LINKTYPE_ANCHOR, anchor, hm.getHostResolver());
|
||||
//
|
||||
// String urlString = urlMessage.getURLString();
|
||||
//
|
||||
// foundURLs.add(urlMessage);
|
||||
// //messageHandler.putMessage(new actURLMessage(url)); // put them in the very end
|
||||
// }
|
||||
// catch (MalformedURLException e)
|
||||
// {
|
||||
// //log.log("malformed url: base:" + base + " -+- link:" + link);
|
||||
// log.log("warning: " + e.getClass().getName() + ": " + e.getMessage());
|
||||
// }
|
||||
// catch (Exception e)
|
||||
// {
|
||||
// log.log("warning: " + e.getClass().getName() + ": " + e.getMessage());
|
||||
// // e.printStackTrace();
|
||||
// }
|
||||
//
|
||||
// }
|
||||
//
|
||||
//
|
||||
// /**
|
||||
// * called when a BASE tag was found
|
||||
// *
|
||||
// * @param base the HREF attribute
|
||||
// */
|
||||
// public void handleBase(String baseString)
|
||||
// {
|
||||
// try
|
||||
// {
|
||||
// base = new URL(baseString);
|
||||
// }
|
||||
// catch (MalformedURLException e)
|
||||
// {
|
||||
// log.log("warning: " + e.getClass().getName() + ": " + e.getMessage() + " while converting '" + base + "' to URL in document " + contextUrl);
|
||||
// }
|
||||
// }
|
||||
//
|
||||
// public void handleTitle(String value)
|
||||
// {}
|
||||
//
|
||||
//
|
||||
// });
|
||||
// t.parse(new BufferedReader(new InputStreamReader(entry.getInputStream())));
|
||||
// msgH.putMessages(foundURLs);
|
||||
// }
|
||||
//
|
||||
// }
|
||||
//
|
||||
// }
|
||||
//
|
||||
// public static void main(String[] args) throws Exception
|
||||
// {
|
||||
// StoreLogFile store = new StoreLogFile(new File("c:/java/jakarta-lucene-sandbox/contributions/webcrawler-LARM/logs/store.log"));
|
||||
// test2(store);
|
||||
// }
|
||||
|
||||
}
|
||||
|
|
@ -1,107 +0,0 @@
|
|||
/* ====================================================================
|
||||
* The Apache Software License, Version 1.1
|
||||
*
|
||||
* Copyright (c) 2001 The Apache Software Foundation. All rights
|
||||
* reserved.
|
||||
*
|
||||
* Redistribution and use in source and binary forms, with or without
|
||||
* modification, are permitted provided that the following conditions
|
||||
* are met:
|
||||
*
|
||||
* 1. Redistributions of source code must retain the above copyright
|
||||
* notice, this list of conditions and the following disclaimer.
|
||||
*
|
||||
* 2. Redistributions in binary form must reproduce the above copyright
|
||||
* notice, this list of conditions and the following disclaimer in
|
||||
* the documentation and/or other materials provided with the
|
||||
* distribution.
|
||||
*
|
||||
* 3. The end-user documentation included with the redistribution,
|
||||
* if any, must include the following acknowledgment:
|
||||
* "This product includes software developed by the
|
||||
* Apache Software Foundation (http://www.apache.org/)."
|
||||
* Alternately, this acknowledgment may appear in the software itself,
|
||||
* if and wherever such third-party acknowledgments normally appear.
|
||||
*
|
||||
* 4. The names "Apache" and "Apache Software Foundation" and
|
||||
* "Apache Lucene" must not be used to endorse or promote products
|
||||
* derived from this software without prior written permission. For
|
||||
* written permission, please contact apache@apache.org.
|
||||
*
|
||||
* 5. Products derived from this software may not be called "Apache",
|
||||
* "Apache Lucene", nor may "Apache" appear in their name, without
|
||||
* prior written permission of the Apache Software Foundation.
|
||||
*
|
||||
* THIS SOFTWARE IS PROVIDED ``AS IS'' AND ANY EXPRESSED OR IMPLIED
|
||||
* WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED WARRANTIES
|
||||
* OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE
|
||||
* DISCLAIMED. IN NO EVENT SHALL THE APACHE SOFTWARE FOUNDATION OR
|
||||
* ITS CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL,
|
||||
* SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT
|
||||
* LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF
|
||||
* USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND
|
||||
* ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY,
|
||||
* OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT
|
||||
* OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF
|
||||
* SUCH DAMAGE.
|
||||
* ====================================================================
|
||||
*
|
||||
* This software consists of voluntary contributions made by many
|
||||
* individuals on behalf of the Apache Software Foundation. For more
|
||||
* information on the Apache Software Foundation, please see
|
||||
* <http://www.apache.org/>.
|
||||
*/
|
||||
|
||||
package de.lanlab.larm.util;
|
||||
|
||||
import java.net.URL;
|
||||
|
||||
/**
|
||||
* Utility methods for working with java.net.URL objects.
|
||||
*
|
||||
* @author Administrator
|
||||
* @created 27 January 2002
|
||||
*/
|
||||
public class URLUtils
|
||||
{
|
||||
/**
|
||||
* does the same as URL.toExternalForm(), but leaves out the Ref part (which we would
|
||||
* cut off anyway) and presizes the StringBuffer so that no call of expandCapacity() will
|
||||
* be necessary
|
||||
* only meaningful if the default URLStreamHandler is used (as is the case with http, https, or shttp)
|
||||
*
|
||||
* @param u the URL to be converted
|
||||
* @return the URL as String
|
||||
*/
|
||||
public static String toExternalFormNoRef(URL u)
|
||||
{
|
||||
String protocol = u.getProtocol();
|
||||
String authority = u.getAuthority();
|
||||
String file = u.getFile();
|
||||
|
||||
StringBuffer result = new StringBuffer(
|
||||
(protocol == null ? 0 : protocol.length()) +
|
||||
(authority == null ? 0 : authority.length()) +
|
||||
(file == null ? 1 : file.length()) + 3
|
||||
);
|
||||
|
||||
result.append(protocol);
|
||||
result.append(":");
|
||||
if (u.getAuthority() != null && u.getAuthority().length() > 0)
|
||||
{
|
||||
result.append("//");
|
||||
result.append(u.getAuthority());
|
||||
}
|
||||
if (u.getFile() != null && u.getFile().length() > 0)
|
||||
{
|
||||
result.append(u.getFile());
|
||||
}
|
||||
else
|
||||
{
|
||||
result.append("/");
|
||||
}
|
||||
|
||||
return result.toString();
|
||||
}
|
||||
|
||||
}
|
|
@ -1,69 +0,0 @@
|
|||
/* ====================================================================
|
||||
* The Apache Software License, Version 1.1
|
||||
*
|
||||
* Copyright (c) 2001 The Apache Software Foundation. All rights
|
||||
* reserved.
|
||||
*
|
||||
* Redistribution and use in source and binary forms, with or without
|
||||
* modification, are permitted provided that the following conditions
|
||||
* are met:
|
||||
*
|
||||
* 1. Redistributions of source code must retain the above copyright
|
||||
* notice, this list of conditions and the following disclaimer.
|
||||
*
|
||||
* 2. Redistributions in binary form must reproduce the above copyright
|
||||
* notice, this list of conditions and the following disclaimer in
|
||||
* the documentation and/or other materials provided with the
|
||||
* distribution.
|
||||
*
|
||||
* 3. The end-user documentation included with the redistribution,
|
||||
* if any, must include the following acknowledgment:
|
||||
* "This product includes software developed by the
|
||||
* Apache Software Foundation (http://www.apache.org/)."
|
||||
* Alternately, this acknowledgment may appear in the software itself,
|
||||
* if and wherever such third-party acknowledgments normally appear.
|
||||
*
|
||||
* 4. The names "Apache" and "Apache Software Foundation" and
|
||||
* "Apache Lucene" must not be used to endorse or promote products
|
||||
* derived from this software without prior written permission. For
|
||||
* written permission, please contact apache@apache.org.
|
||||
*
|
||||
* 5. Products derived from this software may not be called "Apache",
|
||||
* "Apache Lucene", nor may "Apache" appear in their name, without
|
||||
* prior written permission of the Apache Software Foundation.
|
||||
*
|
||||
* THIS SOFTWARE IS PROVIDED ``AS IS'' AND ANY EXPRESSED OR IMPLIED
|
||||
* WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED WARRANTIES
|
||||
* OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE
|
||||
* DISCLAIMED. IN NO EVENT SHALL THE APACHE SOFTWARE FOUNDATION OR
|
||||
* ITS CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL,
|
||||
* SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT
|
||||
* LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF
|
||||
* USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND
|
||||
* ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY,
|
||||
* OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT
|
||||
* OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF
|
||||
* SUCH DAMAGE.
|
||||
* ====================================================================
|
||||
*
|
||||
* This software consists of voluntary contributions made by many
|
||||
* individuals on behalf of the Apache Software Foundation. For more
|
||||
* information on the Apache Software Foundation, please see
|
||||
* <http://www.apache.org/>.
|
||||
*/
|
||||
|
||||
package de.lanlab.larm.util;
|
||||
|
||||
/**
|
||||
* Title: LARM
|
||||
* Description:
|
||||
* Copyright: Copyright (c) 2001
|
||||
* Company: LMU-IP
|
||||
* @author Clemens Marschner
|
||||
* @version 1.0
|
||||
*/
|
||||
|
||||
|
||||
public class UnderflowException extends RuntimeException
|
||||
{
|
||||
}
|
|
@ -1,232 +0,0 @@
|
|||
/* ====================================================================
|
||||
* The Apache Software License, Version 1.1
|
||||
*
|
||||
* Copyright (c) 2001 The Apache Software Foundation. All rights
|
||||
* reserved.
|
||||
*
|
||||
* Redistribution and use in source and binary forms, with or without
|
||||
* modification, are permitted provided that the following conditions
|
||||
* are met:
|
||||
*
|
||||
* 1. Redistributions of source code must retain the above copyright
|
||||
* notice, this list of conditions and the following disclaimer.
|
||||
*
|
||||
* 2. Redistributions in binary form must reproduce the above copyright
|
||||
* notice, this list of conditions and the following disclaimer in
|
||||
* the documentation and/or other materials provided with the
|
||||
* distribution.
|
||||
*
|
||||
* 3. The end-user documentation included with the redistribution,
|
||||
* if any, must include the following acknowledgment:
|
||||
* "This product includes software developed by the
|
||||
* Apache Software Foundation (http://www.apache.org/)."
|
||||
* Alternately, this acknowledgment may appear in the software itself,
|
||||
* if and wherever such third-party acknowledgments normally appear.
|
||||
*
|
||||
* 4. The names "Apache" and "Apache Software Foundation" and
|
||||
* "Apache Lucene" must not be used to endorse or promote products
|
||||
* derived from this software without prior written permission. For
|
||||
* written permission, please contact apache@apache.org.
|
||||
*
|
||||
* 5. Products derived from this software may not be called "Apache",
|
||||
* "Apache Lucene", nor may "Apache" appear in their name, without
|
||||
* prior written permission of the Apache Software Foundation.
|
||||
*
|
||||
* THIS SOFTWARE IS PROVIDED ``AS IS'' AND ANY EXPRESSED OR IMPLIED
|
||||
* WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED WARRANTIES
|
||||
* OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE
|
||||
* DISCLAIMED. IN NO EVENT SHALL THE APACHE SOFTWARE FOUNDATION OR
|
||||
* ITS CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL,
|
||||
* SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT
|
||||
* LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF
|
||||
* USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND
|
||||
* ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY,
|
||||
* OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT
|
||||
* OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF
|
||||
* SUCH DAMAGE.
|
||||
* ====================================================================
|
||||
*
|
||||
* This software consists of voluntary contributions made by many
|
||||
* individuals on behalf of the Apache Software Foundation. For more
|
||||
* information on the Apache Software Foundation, please see
|
||||
* <http://www.apache.org/>.
|
||||
*/
|
||||
|
||||
package de.lanlab.larm.util;
|
||||
|
||||
|
||||
import java.net.URL;
|
||||
import java.util.HashMap;
|
||||
import java.util.Date;
|
||||
import java.util.Set;
|
||||
import de.lanlab.larm.fetcher.URLMessage;
|
||||
import de.lanlab.larm.net.HostManager;
|
||||
import de.lanlab.larm.net.*;
|
||||
|
||||
/**
|
||||
* A web document of any type, generated by a fetcher task.
|
||||
*/
|
||||
public class WebDocument extends URLMessage
|
||||
{
|
||||
protected String mimeType;
|
||||
// protected byte[] document;
|
||||
protected int resultCode;
|
||||
protected int size;
|
||||
protected String title;
|
||||
protected Date lastModified;
|
||||
HashMap fields;
|
||||
boolean isModified;
|
||||
|
||||
public WebDocument(URLMessage msg)
|
||||
{
|
||||
super(msg);
|
||||
this.mimeType = "";
|
||||
this.resultCode = -1;
|
||||
this.size = -1;
|
||||
this.title = "";
|
||||
this.lastModified = new Date();
|
||||
clearFields();
|
||||
this.isModified = true;
|
||||
}
|
||||
|
||||
public WebDocument(URL url, String mimeType, int resultCode, URL referer, int size, String title, Date lastModified, HostResolver hm)
|
||||
{
|
||||
super(url, referer, URLMessage.LINKTYPE_ANCHOR, null, hm);
|
||||
this.url = url;
|
||||
this.mimeType = mimeType;
|
||||
//this.document = document;
|
||||
this.resultCode = resultCode;
|
||||
this.size = size;
|
||||
this.title = title;
|
||||
this.lastModified = lastModified;
|
||||
clearFields();
|
||||
this.isModified = true;
|
||||
}
|
||||
|
||||
public void setModified(boolean modified)
|
||||
{
|
||||
this.isModified = modified;
|
||||
}
|
||||
|
||||
public boolean isModified()
|
||||
{
|
||||
return isModified;
|
||||
}
|
||||
|
||||
public void clearFields()
|
||||
{
|
||||
this.fields = new HashMap(7);
|
||||
}
|
||||
|
||||
public Set getFieldNames()
|
||||
{
|
||||
return fields.keySet();
|
||||
}
|
||||
|
||||
public Object getField(String name)
|
||||
{
|
||||
return fields.get(name);
|
||||
}
|
||||
|
||||
public void addField(String name, Object value)
|
||||
{
|
||||
fields.put(name, value);
|
||||
}
|
||||
|
||||
public void removeField(String name)
|
||||
{
|
||||
fields.remove(name);
|
||||
}
|
||||
|
||||
public int getNumFields()
|
||||
{
|
||||
return fields.size();
|
||||
}
|
||||
|
||||
|
||||
public Date getLastModified()
|
||||
{
|
||||
return lastModified;
|
||||
}
|
||||
|
||||
public void setLastModified(Date lastModified)
|
||||
{
|
||||
this.lastModified = lastModified;
|
||||
}
|
||||
|
||||
public String getTitle()
|
||||
{
|
||||
return title;
|
||||
}
|
||||
|
||||
public URL getUrl()
|
||||
{
|
||||
return url;
|
||||
}
|
||||
|
||||
public int getSize()
|
||||
{
|
||||
return this.size;
|
||||
}
|
||||
|
||||
public void setSize(int size)
|
||||
{
|
||||
this.size = size;
|
||||
}
|
||||
|
||||
/*
|
||||
public void setDocument(byte[] document)
|
||||
{
|
||||
this.document = document;
|
||||
}
|
||||
*/
|
||||
|
||||
public int getResultCode()
|
||||
{
|
||||
return resultCode;
|
||||
}
|
||||
|
||||
public void setResultCode(int resultCode)
|
||||
{
|
||||
this.resultCode = resultCode;
|
||||
}
|
||||
|
||||
/*
|
||||
public byte[] getDocumentBytes()
|
||||
{
|
||||
return this.document;
|
||||
}
|
||||
*/
|
||||
|
||||
public void setUrl(URL url)
|
||||
{
|
||||
this.url = url;
|
||||
}
|
||||
|
||||
public void setMimeType(String mimeType)
|
||||
{
|
||||
this.mimeType = mimeType;
|
||||
}
|
||||
|
||||
public void setTitle(String title)
|
||||
{
|
||||
this.title = title;
|
||||
}
|
||||
|
||||
|
||||
public String getMimeType()
|
||||
{
|
||||
return mimeType;
|
||||
}
|
||||
|
||||
public String getInfo()
|
||||
{
|
||||
return super.getInfo() + "\t" +
|
||||
this.resultCode + "\t" +
|
||||
this.mimeType + "\t" +
|
||||
this.size + "\t" +
|
||||
"\"" + this.title.replace('\t',' ').replace('\"', (char)0xff ).replace('\n',' ').replace('\r',' ') + "\"\t" + (this.lastModified != null ? java.text.DateFormat.getDateTimeInstance(java.text.DateFormat.SHORT, java.text.DateFormat.SHORT).format(this.lastModified) : "");
|
||||
}
|
||||
|
||||
|
||||
}
|
|
@ -1,3 +0,0 @@
|
|||
LARM - Lucene Advanced Retrieval Machine
|
||||
|
||||
goes here
|
|
@ -1,282 +0,0 @@
|
|||
{\rtf1\ansi\ansicpg1252\uc1 \deff0\deflang1031\deflangfe1031{\fonttbl{\f0\froman\fcharset0\fprq2{\*\panose 02020603050405020304}Times New Roman;}{\f1\fswiss\fcharset0\fprq2{\*\panose 020b0604020202020204}Arial;}
|
||||
{\f2\fmodern\fcharset0\fprq1{\*\panose 02070309020205020404}Courier New;}{\f3\froman\fcharset2\fprq2{\*\panose 05050102010706020507}Symbol;}{\f4\froman\fcharset0\fprq2{\*\panose 02020603050405020304}Times;}
|
||||
{\f5\fswiss\fcharset0\fprq2{\*\panose 020b0604020202020204}Helvetica;}{\f14\fnil\fcharset2\fprq2{\*\panose 05000000000000000000}Wingdings;}{\f28\froman\fcharset238\fprq2 Times New Roman CE;}{\f29\froman\fcharset204\fprq2 Times New Roman Cyr;}
|
||||
{\f31\froman\fcharset161\fprq2 Times New Roman Greek;}{\f32\froman\fcharset162\fprq2 Times New Roman Tur;}{\f33\froman\fcharset177\fprq2 Times New Roman (Hebrew);}{\f34\froman\fcharset178\fprq2 Times New Roman (Arabic);}
|
||||
{\f35\froman\fcharset186\fprq2 Times New Roman Baltic;}{\f36\fswiss\fcharset238\fprq2 Arial CE;}{\f37\fswiss\fcharset204\fprq2 Arial Cyr;}{\f39\fswiss\fcharset161\fprq2 Arial Greek;}{\f40\fswiss\fcharset162\fprq2 Arial Tur;}
|
||||
{\f41\fswiss\fcharset177\fprq2 Arial (Hebrew);}{\f42\fswiss\fcharset178\fprq2 Arial (Arabic);}{\f43\fswiss\fcharset186\fprq2 Arial Baltic;}{\f60\froman\fcharset238\fprq2 Times CE;}{\f61\froman\fcharset204\fprq2 Times Cyr;}
|
||||
{\f63\froman\fcharset161\fprq2 Times Greek;}{\f64\froman\fcharset162\fprq2 Times Tur;}{\f65\froman\fcharset177\fprq2 Times (Hebrew);}{\f66\froman\fcharset178\fprq2 Times (Arabic);}{\f67\froman\fcharset186\fprq2 Times Baltic;}
|
||||
{\f68\fswiss\fcharset238\fprq2 Helvetica CE;}{\f69\fswiss\fcharset204\fprq2 Helvetica Cyr;}{\f71\fswiss\fcharset161\fprq2 Helvetica Greek;}{\f72\fswiss\fcharset162\fprq2 Helvetica Tur;}{\f73\fswiss\fcharset177\fprq2 Helvetica (Hebrew);}
|
||||
{\f74\fswiss\fcharset178\fprq2 Helvetica (Arabic);}{\f75\fswiss\fcharset186\fprq2 Helvetica Baltic;}}{\colortbl;\red0\green0\blue0;\red0\green0\blue255;\red0\green255\blue255;\red0\green255\blue0;\red255\green0\blue255;\red255\green0\blue0;
|
||||
\red255\green255\blue0;\red255\green255\blue255;\red0\green0\blue128;\red0\green128\blue128;\red0\green128\blue0;\red128\green0\blue128;\red128\green0\blue0;\red128\green128\blue0;\red128\green128\blue128;\red192\green192\blue192;}{\stylesheet{
|
||||
\ql \li0\ri0\widctlpar\aspalpha\aspnum\faauto\adjustright\rin0\lin0\itap0 \fs24\lang1033\langfe1031\cgrid\langnp1033\langfenp1031 \snext0 Normal;}{\s1\ql \fi-432\li432\ri0\sb240\sa60\keepn\widctlpar
|
||||
\jclisttab\tx432\aspalpha\aspnum\faauto\ls1\adjustright\rin0\lin432\itap0 \b\f1\fs32\lang1033\langfe1031\kerning32\cgrid\langnp1033\langfenp1031 \sbasedon0 \snext0 heading 1;}{\s2\ql \fi-576\li576\ri0\sb240\sa60\keepn\widctlpar
|
||||
\jclisttab\tx576\aspalpha\aspnum\faauto\ls1\ilvl1\adjustright\rin0\lin576\itap0 \b\i\f1\fs28\lang1033\langfe1031\cgrid\langnp1033\langfenp1031 \sbasedon0 \snext0 heading 2;}{\s3\ql \fi-720\li720\ri0\sb240\sa60\keepn\widctlpar
|
||||
\jclisttab\tx720\aspalpha\aspnum\faauto\ls1\ilvl2\adjustright\rin0\lin720\itap0 \b\f1\fs26\lang1033\langfe1031\cgrid\langnp1033\langfenp1031 \sbasedon0 \snext0 heading 3;}{\s4\ql \fi-864\li864\ri0\sb240\sa60\keepn\widctlpar
|
||||
\jclisttab\tx864\aspalpha\aspnum\faauto\ls1\ilvl3\adjustright\rin0\lin864\itap0 \b\fs28\lang1033\langfe1031\cgrid\langnp1033\langfenp1031 \sbasedon0 \snext0 heading 4;}{\s5\ql \fi-1008\li1008\ri0\sb240\sa60\widctlpar
|
||||
\jclisttab\tx1008\aspalpha\aspnum\faauto\ls1\ilvl4\adjustright\rin0\lin1008\itap0 \b\i\fs26\lang1033\langfe1031\cgrid\langnp1033\langfenp1031 \sbasedon0 \snext0 heading 5;}{\s6\ql \fi-1152\li1152\ri0\sb240\sa60\widctlpar
|
||||
\jclisttab\tx1152\aspalpha\aspnum\faauto\ls1\ilvl5\adjustright\rin0\lin1152\itap0 \b\f4\fs22\lang1033\langfe1031\cgrid\langnp1033\langfenp1031 \sbasedon0 \snext0 heading 6;}{\s7\ql \fi-1296\li1296\ri0\sb240\sa60\widctlpar
|
||||
\jclisttab\tx1296\aspalpha\aspnum\faauto\ls1\ilvl6\adjustright\rin0\lin1296\itap0 \f4\fs24\lang1033\langfe1031\cgrid\langnp1033\langfenp1031 \sbasedon0 \snext0 heading 7;}{\s8\ql \fi-1440\li1440\ri0\sb240\sa60\widctlpar
|
||||
\jclisttab\tx1440\aspalpha\aspnum\faauto\ls1\ilvl7\adjustright\rin0\lin1440\itap0 \i\f4\fs24\lang1033\langfe1031\cgrid\langnp1033\langfenp1031 \sbasedon0 \snext0 heading 8;}{\s9\ql \fi-1584\li1584\ri0\sb240\sa60\widctlpar
|
||||
\jclisttab\tx1584\aspalpha\aspnum\faauto\ls1\ilvl8\adjustright\rin0\lin1584\itap0 \f5\fs22\lang1033\langfe1031\cgrid\langnp1033\langfenp1031 \sbasedon0 \snext0 heading 9;}{\*\cs10 \additive Default Paragraph Font;}{
|
||||
\s15\qc \li0\ri0\sb240\sa60\widctlpar\aspalpha\aspnum\faauto\outlinelevel0\adjustright\rin0\lin0\itap0 \b\f5\fs32\lang1033\langfe1031\kerning28\cgrid\langnp1033\langfenp1031 \sbasedon0 \snext15 Title;}{
|
||||
\s16\ql \li0\ri0\widctlpar\aspalpha\aspnum\faauto\adjustright\rin0\lin0\itap0 \fs24\lang1033\langfe1031\cgrid\langnp1033\langfenp1031 \sbasedon0 \snext16 Heading1;}{\s17\ql \li0\ri0\widctlpar\aspalpha\aspnum\faauto\adjustright\rin0\lin0\itap0
|
||||
\i\fs24\lang1033\langfe1031\cgrid\langnp1033\langfenp1031 \sbasedon0 \snext17 Body Text;}{\s18\ql \li0\ri0\widctlpar\aspalpha\aspnum\faauto\adjustright\rin0\lin0\itap0 \fs24\lang1033\langfe1031\cgrid\langnp1033\langfenp1031 \sbasedon0 \snext0 \sautoupd
|
||||
toc 1;}{\s19\ql \li240\ri0\widctlpar\aspalpha\aspnum\faauto\adjustright\rin0\lin240\itap0 \fs24\lang1033\langfe1031\cgrid\langnp1033\langfenp1031 \sbasedon0 \snext0 \sautoupd toc 2;}{
|
||||
\s20\ql \li480\ri0\widctlpar\aspalpha\aspnum\faauto\adjustright\rin0\lin480\itap0 \fs24\lang1033\langfe1031\cgrid\langnp1033\langfenp1031 \sbasedon0 \snext0 \sautoupd toc 3;}{
|
||||
\s21\ql \li720\ri0\widctlpar\aspalpha\aspnum\faauto\adjustright\rin0\lin720\itap0 \fs24\lang1033\langfe1031\cgrid\langnp1033\langfenp1031 \sbasedon0 \snext0 \sautoupd toc 4;}{
|
||||
\s22\ql \li960\ri0\widctlpar\aspalpha\aspnum\faauto\adjustright\rin0\lin960\itap0 \fs24\lang1033\langfe1031\cgrid\langnp1033\langfenp1031 \sbasedon0 \snext0 \sautoupd toc 5;}{
|
||||
\s23\ql \li1200\ri0\widctlpar\aspalpha\aspnum\faauto\adjustright\rin0\lin1200\itap0 \fs24\lang1033\langfe1031\cgrid\langnp1033\langfenp1031 \sbasedon0 \snext0 \sautoupd toc 6;}{
|
||||
\s24\ql \li1440\ri0\widctlpar\aspalpha\aspnum\faauto\adjustright\rin0\lin1440\itap0 \fs24\lang1033\langfe1031\cgrid\langnp1033\langfenp1031 \sbasedon0 \snext0 \sautoupd toc 7;}{
|
||||
\s25\ql \li1680\ri0\widctlpar\aspalpha\aspnum\faauto\adjustright\rin0\lin1680\itap0 \fs24\lang1033\langfe1031\cgrid\langnp1033\langfenp1031 \sbasedon0 \snext0 \sautoupd toc 8;}{
|
||||
\s26\ql \li1920\ri0\widctlpar\aspalpha\aspnum\faauto\adjustright\rin0\lin1920\itap0 \fs24\lang1033\langfe1031\cgrid\langnp1033\langfenp1031 \sbasedon0 \snext0 \sautoupd toc 9;}{\*\cs27 \additive \ul\cf12 \sbasedon10 FollowedHyperlink;}{\*\cs28 \additive
|
||||
\ul\cf2 \sbasedon10 Hyperlink;}{\s29\ql \li0\ri0\widctlpar\aspalpha\aspnum\faauto\adjustright\rin0\lin0\itap0 \fs18\lang1031\langfe1031\cgrid\langnp1031\langfenp1031 \sbasedon0 \snext29 Body Text 2;}{
|
||||
\s30\qc \li0\ri0\widctlpar\aspalpha\aspnum\faauto\adjustright\rin0\lin0\itap0 \fs18\lang1031\langfe1031\cgrid\langnp1031\langfenp1031 \sbasedon0 \snext30 Body Text 3;}}{\*\listtable{\list\listtemplateid-100094782\listhybrid{\listlevel\levelnfc23
|
||||
\levelnfcn23\leveljc0\leveljcn0\levelfollow0\levelstartat0\levelspace0\levelindent0{\leveltext\'01-;}{\levelnumbers;}\loch\af0\hich\af0\dbch\af0\chbrdr\brdrnone\brdrcf1 \chshdng0\chcfpat1\chcbpat1\fbias0 \fi-360\li720\jclisttab\tx720 }{\listlevel
|
||||
\levelnfc23\levelnfcn23\leveljc0\leveljcn0\levelfollow0\levelstartat1\levelspace0\levelindent0{\leveltext\'01o;}{\levelnumbers;}\f2\chbrdr\brdrnone\brdrcf1 \chshdng0\chcfpat1\chcbpat1\fbias0 \fi-360\li1440\jclisttab\tx1440 }{\listlevel\levelnfc23
|
||||
\levelnfcn23\leveljc0\leveljcn0\levelfollow0\levelstartat1\levelspace0\levelindent0{\leveltext\'01\u-3929 ?;}{\levelnumbers;}\f14\chbrdr\brdrnone\brdrcf1 \chshdng0\chcfpat1\chcbpat1\fbias0 \fi-360\li2160\jclisttab\tx2160 }{\listlevel\levelnfc23
|
||||
\levelnfcn23\leveljc0\leveljcn0\levelfollow0\levelstartat1\levelspace0\levelindent0{\leveltext\'01\u-3913 ?;}{\levelnumbers;}\f3\chbrdr\brdrnone\brdrcf1 \chshdng0\chcfpat1\chcbpat1\fbias0 \fi-360\li2880\jclisttab\tx2880 }{\listlevel\levelnfc23\levelnfcn23
|
||||
\leveljc0\leveljcn0\levelfollow0\levelstartat1\levelspace0\levelindent0{\leveltext\'01o;}{\levelnumbers;}\f2\chbrdr\brdrnone\brdrcf1 \chshdng0\chcfpat1\chcbpat1\fbias0 \fi-360\li3600\jclisttab\tx3600 }{\listlevel\levelnfc23\levelnfcn23\leveljc0\leveljcn0
|
||||
\levelfollow0\levelstartat1\levelspace0\levelindent0{\leveltext\'01\u-3929 ?;}{\levelnumbers;}\f14\chbrdr\brdrnone\brdrcf1 \chshdng0\chcfpat1\chcbpat1\fbias0 \fi-360\li4320\jclisttab\tx4320 }{\listlevel\levelnfc23\levelnfcn23\leveljc0\leveljcn0
|
||||
\levelfollow0\levelstartat1\levelspace0\levelindent0{\leveltext\'01\u-3913 ?;}{\levelnumbers;}\f3\chbrdr\brdrnone\brdrcf1 \chshdng0\chcfpat1\chcbpat1\fbias0 \fi-360\li5040\jclisttab\tx5040 }{\listlevel\levelnfc23\levelnfcn23\leveljc0\leveljcn0
|
||||
\levelfollow0\levelstartat1\levelspace0\levelindent0{\leveltext\'01o;}{\levelnumbers;}\f2\chbrdr\brdrnone\brdrcf1 \chshdng0\chcfpat1\chcbpat1\fbias0 \fi-360\li5760\jclisttab\tx5760 }{\listlevel\levelnfc23\levelnfcn23\leveljc0\leveljcn0\levelfollow0
|
||||
\levelstartat1\levelspace0\levelindent0{\leveltext\'01\u-3929 ?;}{\levelnumbers;}\f14\chbrdr\brdrnone\brdrcf1 \chshdng0\chcfpat1\chcbpat1\fbias0 \fi-360\li6480\jclisttab\tx6480 }{\listname ;}\listid72943879}{\list\listtemplateid1225178542\listhybrid
|
||||
{\listlevel\levelnfc23\levelnfcn23\leveljc0\leveljcn0\levelfollow0\levelstartat0\levelspace0\levelindent0{\leveltext\'01\u-3913 ?;}{\levelnumbers;}\loch\af3\hich\af3\dbch\af0\chbrdr\brdrnone\brdrcf1 \chshdng0\chcfpat1\chcbpat1\fbias0 \fi-360\li720
|
||||
\jclisttab\tx720 }{\listlevel\levelnfc23\levelnfcn23\leveljc0\leveljcn0\levelfollow0\levelstartat1\levelspace0\levelindent0{\leveltext\'01o;}{\levelnumbers;}\f2\chbrdr\brdrnone\brdrcf1 \chshdng0\chcfpat1\chcbpat1\fbias0 \fi-360\li1440\jclisttab\tx1440 }
|
||||
{\listlevel\levelnfc23\levelnfcn23\leveljc0\leveljcn0\levelfollow0\levelstartat1\levelspace0\levelindent0{\leveltext\'01\u-3929 ?;}{\levelnumbers;}\f14\chbrdr\brdrnone\brdrcf1 \chshdng0\chcfpat1\chcbpat1\fbias0 \fi-360\li2160\jclisttab\tx2160 }{\listlevel
|
||||
\levelnfc23\levelnfcn23\leveljc0\leveljcn0\levelfollow0\levelstartat1\levelspace0\levelindent0{\leveltext\'01\u-3913 ?;}{\levelnumbers;}\f3\chbrdr\brdrnone\brdrcf1 \chshdng0\chcfpat1\chcbpat1\fbias0 \fi-360\li2880\jclisttab\tx2880 }{\listlevel\levelnfc23
|
||||
\levelnfcn23\leveljc0\leveljcn0\levelfollow0\levelstartat1\levelspace0\levelindent0{\leveltext\'01o;}{\levelnumbers;}\f2\chbrdr\brdrnone\brdrcf1 \chshdng0\chcfpat1\chcbpat1\fbias0 \fi-360\li3600\jclisttab\tx3600 }{\listlevel\levelnfc23\levelnfcn23
|
||||
\leveljc0\leveljcn0\levelfollow0\levelstartat1\levelspace0\levelindent0{\leveltext\'01\u-3929 ?;}{\levelnumbers;}\f14\chbrdr\brdrnone\brdrcf1 \chshdng0\chcfpat1\chcbpat1\fbias0 \fi-360\li4320\jclisttab\tx4320 }{\listlevel\levelnfc23\levelnfcn23\leveljc0
|
||||
\leveljcn0\levelfollow0\levelstartat1\levelspace0\levelindent0{\leveltext\'01\u-3913 ?;}{\levelnumbers;}\f3\chbrdr\brdrnone\brdrcf1 \chshdng0\chcfpat1\chcbpat1\fbias0 \fi-360\li5040\jclisttab\tx5040 }{\listlevel\levelnfc23\levelnfcn23\leveljc0\leveljcn0
|
||||
\levelfollow0\levelstartat1\levelspace0\levelindent0{\leveltext\'01o;}{\levelnumbers;}\f2\chbrdr\brdrnone\brdrcf1 \chshdng0\chcfpat1\chcbpat1\fbias0 \fi-360\li5760\jclisttab\tx5760 }{\listlevel\levelnfc23\levelnfcn23\leveljc0\leveljcn0\levelfollow0
|
||||
\levelstartat1\levelspace0\levelindent0{\leveltext\'01\u-3929 ?;}{\levelnumbers;}\f14\chbrdr\brdrnone\brdrcf1 \chshdng0\chcfpat1\chcbpat1\fbias0 \fi-360\li6480\jclisttab\tx6480 }{\listname ;}\listid401611123}{\list\listtemplateid1804128586{\listlevel
|
||||
\levelnfc0\levelnfcn0\leveljc0\leveljcn0\levelfollow0\levelstartat1\levelspace0\levelindent0{\leveltext\'01\'00;}{\levelnumbers\'01;}\chbrdr\brdrnone\brdrcf1 \chshdng0\chcfpat1\chcbpat1 \s1\fi-432\li432\jclisttab\tx432 }{\listlevel\levelnfc0\levelnfcn0
|
||||
\leveljc0\leveljcn0\levelfollow0\levelstartat1\levelspace0\levelindent0{\leveltext\'03\'00.\'01;}{\levelnumbers\'01\'03;}\chbrdr\brdrnone\brdrcf1 \chshdng0\chcfpat1\chcbpat1 \s2\fi-576\li576\jclisttab\tx576 }{\listlevel\levelnfc0\levelnfcn0\leveljc0
|
||||
\leveljcn0\levelfollow0\levelstartat1\levelspace0\levelindent0{\leveltext\'05\'00.\'01.\'02;}{\levelnumbers\'01\'03\'05;}\chbrdr\brdrnone\brdrcf1 \chshdng0\chcfpat1\chcbpat1 \s3\fi-720\li720\jclisttab\tx720 }{\listlevel\levelnfc0\levelnfcn0\leveljc0
|
||||
\leveljcn0\levelfollow0\levelstartat1\levelspace0\levelindent0{\leveltext\'07\'00.\'01.\'02.\'03;}{\levelnumbers\'01\'03\'05\'07;}\chbrdr\brdrnone\brdrcf1 \chshdng0\chcfpat1\chcbpat1 \s4\fi-864\li864\jclisttab\tx864 }{\listlevel\levelnfc0\levelnfcn0
|
||||
\leveljc0\leveljcn0\levelfollow0\levelstartat1\levelspace0\levelindent0{\leveltext\'09\'00.\'01.\'02.\'03.\'04;}{\levelnumbers\'01\'03\'05\'07\'09;}\chbrdr\brdrnone\brdrcf1 \chshdng0\chcfpat1\chcbpat1 \s5\fi-1008\li1008\jclisttab\tx1008 }{\listlevel
|
||||
\levelnfc0\levelnfcn0\leveljc0\leveljcn0\levelfollow0\levelstartat1\levelspace0\levelindent0{\leveltext\'0b\'00.\'01.\'02.\'03.\'04.\'05;}{\levelnumbers\'01\'03\'05\'07\'09\'0b;}\chbrdr\brdrnone\brdrcf1 \chshdng0\chcfpat1\chcbpat1 \s6\fi-1152\li1152
|
||||
\jclisttab\tx1152 }{\listlevel\levelnfc0\levelnfcn0\leveljc0\leveljcn0\levelfollow0\levelstartat1\levelspace0\levelindent0{\leveltext\'0d\'00.\'01.\'02.\'03.\'04.\'05.\'06;}{\levelnumbers\'01\'03\'05\'07\'09\'0b\'0d;}\chbrdr\brdrnone\brdrcf1
|
||||
\chshdng0\chcfpat1\chcbpat1 \s7\fi-1296\li1296\jclisttab\tx1296 }{\listlevel\levelnfc0\levelnfcn0\leveljc0\leveljcn0\levelfollow0\levelstartat1\levelspace0\levelindent0{\leveltext\'0f\'00.\'01.\'02.\'03.\'04.\'05.\'06.\'07;}{\levelnumbers
|
||||
\'01\'03\'05\'07\'09\'0b\'0d\'0f;}\chbrdr\brdrnone\brdrcf1 \chshdng0\chcfpat1\chcbpat1 \s8\fi-1440\li1440\jclisttab\tx1440 }{\listlevel\levelnfc0\levelnfcn0\leveljc0\leveljcn0\levelfollow0\levelstartat1\levelspace0\levelindent0{\leveltext
|
||||
\'11\'00.\'01.\'02.\'03.\'04.\'05.\'06.\'07.\'08;}{\levelnumbers\'01\'03\'05\'07\'09\'0b\'0d\'0f\'11;}\chbrdr\brdrnone\brdrcf1 \chshdng0\chcfpat1\chcbpat1 \s9\fi-1584\li1584\jclisttab\tx1584 }{\listname ;}\listid854879813}{\list\listtemplateid-1571007954
|
||||
\listhybrid{\listlevel\levelnfc23\levelnfcn23\leveljc0\leveljcn0\levelfollow0\levelstartat0\levelspace0\levelindent0{\leveltext\'01-;}{\levelnumbers;}\loch\af0\hich\af0\dbch\af0\chbrdr\brdrnone\brdrcf1 \chshdng0\chcfpat1\chcbpat1\fbias0 \fi-360\li720
|
||||
\jclisttab\tx720 }{\listlevel\levelnfc23\levelnfcn23\leveljc0\leveljcn0\levelfollow0\levelstartat1\levelspace0\levelindent0{\leveltext\'01o;}{\levelnumbers;}\f2\chbrdr\brdrnone\brdrcf1 \chshdng0\chcfpat1\chcbpat1\fbias0 \fi-360\li1440\jclisttab\tx1440 }
|
||||
{\listlevel\levelnfc23\levelnfcn23\leveljc0\leveljcn0\levelfollow0\levelstartat1\levelspace0\levelindent0{\leveltext\'01\u-3929 ?;}{\levelnumbers;}\f14\chbrdr\brdrnone\brdrcf1 \chshdng0\chcfpat1\chcbpat1\fbias0 \fi-360\li2160\jclisttab\tx2160 }{\listlevel
|
||||
\levelnfc23\levelnfcn23\leveljc0\leveljcn0\levelfollow0\levelstartat1\levelspace0\levelindent0{\leveltext\'01\u-3913 ?;}{\levelnumbers;}\f3\chbrdr\brdrnone\brdrcf1 \chshdng0\chcfpat1\chcbpat1\fbias0 \fi-360\li2880\jclisttab\tx2880 }{\listlevel\levelnfc23
|
||||
\levelnfcn23\leveljc0\leveljcn0\levelfollow0\levelstartat1\levelspace0\levelindent0{\leveltext\'01o;}{\levelnumbers;}\f2\chbrdr\brdrnone\brdrcf1 \chshdng0\chcfpat1\chcbpat1\fbias0 \fi-360\li3600\jclisttab\tx3600 }{\listlevel\levelnfc23\levelnfcn23
|
||||
\leveljc0\leveljcn0\levelfollow0\levelstartat1\levelspace0\levelindent0{\leveltext\'01\u-3929 ?;}{\levelnumbers;}\f14\chbrdr\brdrnone\brdrcf1 \chshdng0\chcfpat1\chcbpat1\fbias0 \fi-360\li4320\jclisttab\tx4320 }{\listlevel\levelnfc23\levelnfcn23\leveljc0
|
||||
\leveljcn0\levelfollow0\levelstartat1\levelspace0\levelindent0{\leveltext\'01\u-3913 ?;}{\levelnumbers;}\f3\chbrdr\brdrnone\brdrcf1 \chshdng0\chcfpat1\chcbpat1\fbias0 \fi-360\li5040\jclisttab\tx5040 }{\listlevel\levelnfc23\levelnfcn23\leveljc0\leveljcn0
|
||||
\levelfollow0\levelstartat1\levelspace0\levelindent0{\leveltext\'01o;}{\levelnumbers;}\f2\chbrdr\brdrnone\brdrcf1 \chshdng0\chcfpat1\chcbpat1\fbias0 \fi-360\li5760\jclisttab\tx5760 }{\listlevel\levelnfc23\levelnfcn23\leveljc0\leveljcn0\levelfollow0
|
||||
\levelstartat1\levelspace0\levelindent0{\leveltext\'01\u-3929 ?;}{\levelnumbers;}\f14\chbrdr\brdrnone\brdrcf1 \chshdng0\chcfpat1\chcbpat1\fbias0 \fi-360\li6480\jclisttab\tx6480 }{\listname ;}\listid1061900166}{\list\listtemplateid358938326\listhybrid
|
||||
{\listlevel\levelnfc23\levelnfcn23\leveljc0\leveljcn0\levelfollow0\levelstartat4\levelspace0\levelindent0{\leveltext\'01-;}{\levelnumbers;}\loch\af0\hich\af0\dbch\af0\chbrdr\brdrnone\brdrcf1 \chshdng0\chcfpat1\chcbpat1\fbias0 \fi-360\li360
|
||||
\jclisttab\tx360 }{\listlevel\levelnfc23\levelnfcn23\leveljc0\leveljcn0\levelfollow0\levelstartat1\levelspace0\levelindent0{\leveltext\'01o;}{\levelnumbers;}\f2\chbrdr\brdrnone\brdrcf1 \chshdng0\chcfpat1\chcbpat1\fbias0 \fi-360\li1080\jclisttab\tx1080 }
|
||||
{\listlevel\levelnfc23\levelnfcn23\leveljc0\leveljcn0\levelfollow0\levelstartat1\levelspace0\levelindent0{\leveltext\'01\u-3929 ?;}{\levelnumbers;}\f14\chbrdr\brdrnone\brdrcf1 \chshdng0\chcfpat1\chcbpat1\fbias0 \fi-360\li1800\jclisttab\tx1800 }{\listlevel
|
||||
\levelnfc23\levelnfcn23\leveljc0\leveljcn0\levelfollow0\levelstartat1\levelspace0\levelindent0{\leveltext\'01\u-3913 ?;}{\levelnumbers;}\f3\chbrdr\brdrnone\brdrcf1 \chshdng0\chcfpat1\chcbpat1\fbias0 \fi-360\li2520\jclisttab\tx2520 }{\listlevel\levelnfc23
|
||||
\levelnfcn23\leveljc0\leveljcn0\levelfollow0\levelstartat1\levelspace0\levelindent0{\leveltext\'01o;}{\levelnumbers;}\f2\chbrdr\brdrnone\brdrcf1 \chshdng0\chcfpat1\chcbpat1\fbias0 \fi-360\li3240\jclisttab\tx3240 }{\listlevel\levelnfc23\levelnfcn23
|
||||
\leveljc0\leveljcn0\levelfollow0\levelstartat1\levelspace0\levelindent0{\leveltext\'01\u-3929 ?;}{\levelnumbers;}\f14\chbrdr\brdrnone\brdrcf1 \chshdng0\chcfpat1\chcbpat1\fbias0 \fi-360\li3960\jclisttab\tx3960 }{\listlevel\levelnfc23\levelnfcn23\leveljc0
|
||||
\leveljcn0\levelfollow0\levelstartat1\levelspace0\levelindent0{\leveltext\'01\u-3913 ?;}{\levelnumbers;}\f3\chbrdr\brdrnone\brdrcf1 \chshdng0\chcfpat1\chcbpat1\fbias0 \fi-360\li4680\jclisttab\tx4680 }{\listlevel\levelnfc23\levelnfcn23\leveljc0\leveljcn0
|
||||
\levelfollow0\levelstartat1\levelspace0\levelindent0{\leveltext\'01o;}{\levelnumbers;}\f2\chbrdr\brdrnone\brdrcf1 \chshdng0\chcfpat1\chcbpat1\fbias0 \fi-360\li5400\jclisttab\tx5400 }{\listlevel\levelnfc23\levelnfcn23\leveljc0\leveljcn0\levelfollow0
|
||||
\levelstartat1\levelspace0\levelindent0{\leveltext\'01\u-3929 ?;}{\levelnumbers;}\f14\chbrdr\brdrnone\brdrcf1 \chshdng0\chcfpat1\chcbpat1\fbias0 \fi-360\li6120\jclisttab\tx6120 }{\listname ;}\listid1318614571}{\list\listtemplateid375532796\listhybrid
|
||||
{\listlevel\levelnfc0\levelnfcn0\leveljc0\leveljcn0\levelfollow0\levelstartat1\levelspace0\levelindent0{\leveltext\leveltemplateid1944109128\'02\'00.;}{\levelnumbers\'01;}\chbrdr\brdrnone\brdrcf1 \chshdng0\chcfpat1\chcbpat1\fbias0 \fi-360\li720
|
||||
\jclisttab\tx720 }{\listlevel\levelnfc4\levelnfcn4\leveljc0\leveljcn0\levelfollow0\levelstartat1\levelspace0\levelindent0{\leveltext\leveltemplateid67567641\'02\'01.;}{\levelnumbers\'01;}\chbrdr\brdrnone\brdrcf1 \chshdng0\chcfpat1\chcbpat1 \fi-360\li1440
|
||||
\jclisttab\tx1440 }{\listlevel\levelnfc2\levelnfcn2\leveljc2\leveljcn2\levelfollow0\levelstartat1\levelspace0\levelindent0{\leveltext\leveltemplateid67567643\'02\'02.;}{\levelnumbers\'01;}\chbrdr\brdrnone\brdrcf1 \chshdng0\chcfpat1\chcbpat1 \fi-180\li2160
|
||||
\jclisttab\tx2160 }{\listlevel\levelnfc0\levelnfcn0\leveljc0\leveljcn0\levelfollow0\levelstartat1\levelspace0\levelindent0{\leveltext\leveltemplateid67567631\'02\'03.;}{\levelnumbers\'01;}\chbrdr\brdrnone\brdrcf1 \chshdng0\chcfpat1\chcbpat1 \fi-360\li2880
|
||||
\jclisttab\tx2880 }{\listlevel\levelnfc4\levelnfcn4\leveljc0\leveljcn0\levelfollow0\levelstartat1\levelspace0\levelindent0{\leveltext\leveltemplateid67567641\'02\'04.;}{\levelnumbers\'01;}\chbrdr\brdrnone\brdrcf1 \chshdng0\chcfpat1\chcbpat1 \fi-360\li3600
|
||||
\jclisttab\tx3600 }{\listlevel\levelnfc2\levelnfcn2\leveljc2\leveljcn2\levelfollow0\levelstartat1\levelspace0\levelindent0{\leveltext\leveltemplateid67567643\'02\'05.;}{\levelnumbers\'01;}\chbrdr\brdrnone\brdrcf1 \chshdng0\chcfpat1\chcbpat1 \fi-180\li4320
|
||||
\jclisttab\tx4320 }{\listlevel\levelnfc0\levelnfcn0\leveljc0\leveljcn0\levelfollow0\levelstartat1\levelspace0\levelindent0{\leveltext\leveltemplateid67567631\'02\'06.;}{\levelnumbers\'01;}\chbrdr\brdrnone\brdrcf1 \chshdng0\chcfpat1\chcbpat1 \fi-360\li5040
|
||||
\jclisttab\tx5040 }{\listlevel\levelnfc4\levelnfcn4\leveljc0\leveljcn0\levelfollow0\levelstartat1\levelspace0\levelindent0{\leveltext\leveltemplateid67567641\'02\'07.;}{\levelnumbers\'01;}\chbrdr\brdrnone\brdrcf1 \chshdng0\chcfpat1\chcbpat1 \fi-360\li5760
|
||||
\jclisttab\tx5760 }{\listlevel\levelnfc2\levelnfcn2\leveljc2\leveljcn2\levelfollow0\levelstartat1\levelspace0\levelindent0{\leveltext\leveltemplateid67567643\'02\'08.;}{\levelnumbers\'01;}\chbrdr\brdrnone\brdrcf1 \chshdng0\chcfpat1\chcbpat1 \fi-180\li6480
|
||||
\jclisttab\tx6480 }{\listname ;}\listid1371959374}}{\*\listoverridetable{\listoverride\listid854879813\listoverridecount0\ls1}{\listoverride\listid1061900166\listoverridecount0\ls2}{\listoverride\listid401611123\listoverridecount0\ls3}
|
||||
{\listoverride\listid72943879\listoverridecount0\ls4}{\listoverride\listid1318614571\listoverridecount0\ls5}{\listoverride\listid1371959374\listoverridecount0\ls6}}{\*\revtbl {Unknown;}{Peter Carlson;}{Clemens Marschner;}}{\info
|
||||
{\title Create the ability to take data from different data sources including web pages, SQL queries, and file systems, and put them}{\author Peter Carlson}{\operator Clemens Marschner}{\creatim\yr2002\mo12\dy2\min41}{\revtim\yr2002\mo12\dy2\min41}
|
||||
{\version2}{\edmins0}{\nofpages6}{\nofwords1570}{\nofchars8952}{\*\company Book and Hammer}{\nofcharsws10993}{\vern8249}}\widowctrl\ftnbj\aenddoc\noxlattoyen\expshrtn\noultrlspc\dntblnsbdb\nospaceforul\formshade\horzdoc\dgmargin\dghspace180\dgvspace180
|
||||
\dghorigin1800\dgvorigin1440\dghshow1\dgvshow1\jexpand\viewkind1\viewscale133\viewzk2\pgbrdrhead\pgbrdrfoot\splytwnine\ftnlytwnine\htmautsp\nolnhtadjtbl\useltbaln\alntblind\lytcalctblwd\lyttblrtgr\lnbrkrule \fet0\sectd
|
||||
\linex0\endnhere\sectlinegrid360\sectdefaultcl {\*\pnseclvl1\pnucrm\pnstart1\pnindent720\pnhang{\pntxta .}}{\*\pnseclvl2\pnucltr\pnstart1\pnindent720\pnhang{\pntxta .}}{\*\pnseclvl3\pndec\pnstart1\pnindent720\pnhang{\pntxta .}}{\*\pnseclvl4
|
||||
\pnlcltr\pnstart1\pnindent720\pnhang{\pntxta )}}{\*\pnseclvl5\pndec\pnstart1\pnindent720\pnhang{\pntxtb (}{\pntxta )}}{\*\pnseclvl6\pnlcltr\pnstart1\pnindent720\pnhang{\pntxtb (}{\pntxta )}}{\*\pnseclvl7\pnlcrm\pnstart1\pnindent720\pnhang{\pntxtb (}
|
||||
{\pntxta )}}{\*\pnseclvl8\pnlcltr\pnstart1\pnindent720\pnhang{\pntxtb (}{\pntxta )}}{\*\pnseclvl9\pnlcrm\pnstart1\pnindent720\pnhang{\pntxtb (}{\pntxta )}}\pard\plain
|
||||
\s15\qc \li0\ri0\sb240\sa60\widctlpar\aspalpha\aspnum\faauto\outlinelevel0\adjustright\rin0\lin0\itap0 \b\f5\fs32\lang1033\langfe1031\kerning28\cgrid\langnp1033\langfenp1031 {\fs28\lang1040\langfe1031\langnp1040 Lucene Retrieval Machine
|
||||
\par Lucene Framework
|
||||
\par }{\fs28\lang1031\langfe1031\langnp1031 Mission Document
|
||||
\par }\pard\plain \qc \li0\ri0\widctlpar\aspalpha\aspnum\faauto\adjustright\rin0\lin0\itap0 \fs24\lang1033\langfe1031\cgrid\langnp1033\langfenp1031 {\lang1040\langfe1031\langnp1040
|
||||
\par }{\lang1031\langfe1031\langnp1031 Revision: 5 (cmarschn, 2002-12-01)
|
||||
\par
|
||||
\par Clemens Marschner - Otis Gospodnetic - Peter Carlson - Kelvin Tan
|
||||
\par
|
||||
\par
|
||||
\par
|
||||
\par
|
||||
\par
|
||||
\par
|
||||
\par {\*\bkmkstart _Toc26539139}{\listtext\pard\plain\s1 \b\f1\fs32\lang1033\langfe1031\kerning32\langnp1033 \hich\af1\dbch\af0\loch\f1 1\tab}}\pard\plain \s1\ql \fi-432\li432\ri0\sb240\sa60\keepn\widctlpar
|
||||
\jclisttab\tx432\aspalpha\aspnum\faauto\ls1\outlinelevel0\adjustright\rin0\lin432\itap0 \b\f1\fs32\lang1033\langfe1031\kerning32\cgrid\langnp1033\langfenp1031 {Mission{\*\bkmkend _Toc26539139}
|
||||
\par }\pard\plain \ql \li0\ri0\widctlpar\aspalpha\aspnum\faauto\adjustright\rin0\lin0\itap0 \fs24\lang1033\langfe1031\cgrid\langnp1033\langfenp1031 {\fs56 \'93
|
||||
\par }\pard \ql \fi720\li0\ri0\widctlpar\aspalpha\aspnum\faauto\adjustright\rin0\lin0\itap0 {The }{\b Lucene Retrieval Machine}{ forms a complete and highly scalable search solut
|
||||
ion for end-users of the Lucene search engine: Capable of intelligently indexing data from various sources, preprocessing of source documents configurable by the end user, up to a best-practice implementation of online search functionality.
|
||||
\par }\pard \ql \li0\ri0\widctlpar\aspalpha\aspnum\faauto\adjustright\rin0\lin0\itap0 {
|
||||
\par }\pard \ql \fi720\li0\ri0\widctlpar\aspalpha\aspnum\faauto\adjustright\rin0\lin0\itap0 {It will be based on the }{\b Lucene Framework }{
|
||||
that provides implementations for data aggregation and indexing functionality utilizing the Lucene indexing API, while being easily extensible and constructible by application developers or researchers.
|
||||
\par }\pard \qr \li0\ri0\widctlpar\aspalpha\aspnum\faauto\adjustright\rin0\lin0\itap0 {\fs56 \'94}{
|
||||
\par {\*\bkmkstart _Toc26539140}{\listtext\pard\plain\s1 \b\f1\fs32\lang1033\langfe1031\kerning32\langnp1033 \hich\af1\dbch\af0\loch\f1 2\tab}}\pard\plain \s1\ql \fi-432\li432\ri0\sb240\sa60\keepn\widctlpar
|
||||
\jclisttab\tx432\aspalpha\aspnum\faauto\ls1\outlinelevel0
|
||||
\jclisttab\tx432\aspalpha\aspnum\faauto\ls1\outlinelevel0\pnrauth1\pnrdate1718329849\pnrstart0\pnrxst1\pnrxst0\pnrxst0\pnrxst0\pnrstop4\pnrstart1\pnrrgb1\pnrrgb0\pnrrgb0\pnrrgb0\pnrrgb0\pnrrgb0\pnrrgb0\pnrrgb0\pnrrgb0\pnrstop9\pnrstart2\pnrnfc0
|
||||
\pnrnfc0\pnrnfc0\pnrnfc0\pnrnfc0\pnrnfc0\pnrnfc0\pnrnfc0\pnrnfc0\pnrnfc0\pnrnfc0\pnrnfc2\pnrnfc0\pnrnfc0\pnrnfc0\pnrnfc0\pnrnfc0\pnrnfc0\pnrstop18\pnrstart3\pnrpnbr2\pnrpnbr0\pnrpnbr0\pnrpnbr0\pnrpnbr0\pnrpnbr0\pnrpnbr0\pnrpnbr0\pnrpnbr0\pnrpnbr0
|
||||
\pnrpnbr0\pnrpnbr0\pnrpnbr0\pnrpnbr0\pnrpnbr0\pnrpnbr0\pnrpnbr0\pnrpnbr0\pnrpnbr0\pnrpnbr0\pnrpnbr0\pnrpnbr0\pnrpnbr0\pnrpnbr0\pnrpnbr0\pnrpnbr0\pnrpnbr0\pnrpnbr0\pnrpnbr0\pnrpnbr0\pnrpnbr0\pnrpnbr0\pnrpnbr0\pnrpnbr0\pnrpnbr0\pnrpnbr0\pnrstop36
|
||||
\adjustright\rin0\lin432\itap0 \b\f1\fs32\lang1033\langfe1031\kerning32\cgrid\langnp1033\langfenp1031 {\page Background and goals{\*\bkmkend _Toc26539140}
|
||||
\par }\pard\plain \ql \li0\ri0\widctlpar\aspalpha\aspnum\faauto\adjustright\rin0\lin0\itap0 \fs24\lang1033\langfe1031\cgrid\langnp1033\langfenp1031 {L
|
||||
ucene 1.2 has become an increasingly popular library for incorporating search functionality into Java applications. As it stands, the Lucene codebase is an API, as opposed to a search framework, or a full-fledged search application. The consequence is tha
|
||||
t developers (users of the API) often need to implement commonly-used but non-API functionality, in addition to any custom search requirements, to arrive at a working search application.
|
||||
\par
|
||||
\par This proposal is a plan to leverage existing contributions and integr
|
||||
ate them into a Lucene search framework. We believe it will significantly reduce the time taken to implement a searching and indexing solution using Lucene, provide deeper functionality for a basic application and lower the time it takes for a new user to
|
||||
be productive. It will also constitute a best-practice model of how the Lucene user community creates efficient, flexible search applications, by means of the framework.
|
||||
\par
|
||||
\par This framework is a foundation of what we call the Lucene Retrieval Machine (LARM). In contrast to the framework, the retrieval machine is an application that can be installed and used by an end user. We think that the framework can be so modular that LARM can in fact form many different applications, from indexing database tables to file systems, up to indexing a large portion of the web. Many of these configurations can work out of the box.
|
||||
\par
|
||||
\par To take a familiar example, the Apache Web Server provides a framework for pluggable modules (the Apache modules) that allow for easily extending
|
||||
the web server using standard interfaces. On the other hand, it is also easily configurable by an end user and can be used without further programming. We want LARM to be even more flexible: it should consist of merely a microkernel plus a set of pluggable components.
|
||||
\par
|
||||
\par {\*\bkmkstart _Toc26539142}{\listtext\pard\plain\s1 \b\f1\fs32\lang1033\langfe1031\kerning32\langnp1033 \hich\af1\dbch\af0\loch\f1 3\tab}}\pard\plain \s1\ql \fi-432\li432\ri0\sb240\sa60\keepn\widctlpar
|
||||
\jclisttab\tx432\aspalpha\aspnum\faauto\ls1\outlinelevel0\adjustright\rin0\lin432\itap0 \b\f1\fs32\lang1033\langfe1031\kerning32\cgrid\langnp1033\langfenp1031 {\page Fulfilled User Needs{\*\bkmkend _Toc26539142}
|
||||
\par }\pard\plain \ql \li0\ri0\widctlpar\aspalpha\aspnum\faauto\adjustright\rin0\lin0\itap0 \fs24\lang1033\langfe1031\cgrid\langnp1033\langfenp1031 {The challenges faced by a typical Lucene user are:\line \line 1. Retrieving data from a myriad of data sources
|
||||
\line 2. Processing this data to be added to the search index\line 3. Searching the index and displaying the search results
|
||||
\par 4. Providing a consistent mechanism to access and display the detailed content\line \line Correspondingly, Lucene Framework addresses these challenges by providing\line \line - A data-gathering interface for retrieving data from various sources\line
|
||||
- An indexing framework for updating indexes and indexing different kinds of data \line - A Search API which simplifies searching through abstraction, and provides display-related functionality\line \line In addition, Lucene Framework also provides\line
|
||||
\line - IndexReader/IndexSearcher pooling mechanism (?)\line - Ability to run in single-VM or client/server mode (?)
|
||||
\par {\listtext\pard\plain\lang1033\langfe1031\langnp1033 \hich\af0\dbch\af0\loch\f0 -\tab}}\pard \ql \fi-360\li360\ri0\widctlpar\jclisttab\tx360\aspalpha\aspnum\faauto\ls5\adjustright\rin0\lin360\itap0 {Ability to access multiple indexes to search
|
||||
\par {\listtext\pard\plain\lang1033\langfe1031\langnp1033 \hich\af0\dbch\af0\loch\f0 -\tab}A consistent link between the search and index analyzers\line \line
|
||||
Hooks in the framework allow users to customize how the framework functions. These exist in the form of pluggable implementations of classes and user-defined "steps" in a pipeline. These are:\line \line
|
||||
- Data sources which allow the user to abstract the data to be retrieved\line - Filters for pre- and post(?)-processing of the data to be indexed\line - Content handlers which define strategies for indexing different file types (e.g. XML, PDF, etc.)
|
||||
|
||||
\par {\listtext\pard\plain\lang1033\langfe1031\langnp1033 \hich\af0\dbch\af0\loch\f0 -\tab}Storage handlers so there can be a consistent way to access data from multiple data sources\line \line
|
||||
With these, Lucene Framework should address most, if not all, of a developer's requirements for a Lucene application.
|
||||
\par }\pard \ql \li0\ri0\widctlpar\aspalpha\aspnum\faauto\adjustright\rin0\lin0\itap0 {\lang2057\langfe1031\langnp2057
|
||||
\par
|
||||
\par {\*\bkmkstart _Toc26539143}{\listtext\pard\plain\s1 \b\f1\fs32\lang2057\langfe1031\kerning32\langnp2057 \hich\af1\dbch\af0\loch\f1 4\tab}}\pard\plain \s1\ql \fi-432\li432\ri0\sb240\sa60\keepn\widctlpar
|
||||
\jclisttab\tx432\aspalpha\aspnum\faauto\ls1\outlinelevel0\adjustright\rin0\lin432\itap0 \b\f1\fs32\lang1033\langfe1031\kerning32\cgrid\langnp1033\langfenp1031 {\lang2057\langfe1031\langnp2057 \page Scope/Requirements{\*\bkmkend _Toc26539143}
|
||||
\par }\pard\plain \ql \li0\ri0\widctlpar\aspalpha\aspnum\faauto\adjustright\rin0\lin0\itap0 \fs24\lang1033\langfe1031\cgrid\langnp1033\langfenp1031 {\lang2057\langfe1031\langnp2057 We see LARM as comprising the following major components:
|
||||
\par 1) Data Gathering (web pages, database, file system)\line 2) Index creation (ability to handle data in different formats with configuration)\line 3) Search interface \line 4) Data display (ability to see content details of the data)
|
||||
\par {\*\bkmkstart _Toc26539144}{\listtext\pard\plain\s2 \b\i\f1\fs28\lang2057\langfe1031\langnp2057 \hich\af1\dbch\af0\loch\f1 4.1\tab}}\pard\plain \s2\ql \fi-576\li576\ri0\sb240\sa60\keepn\widctlpar
|
||||
\jclisttab\tx576\aspalpha\aspnum\faauto\ls1\ilvl1\outlinelevel1\adjustright\rin0\lin576\itap0 \b\i\f1\fs28\lang1033\langfe1031\cgrid\langnp1033\langfenp1031 {\lang2057\langfe1031\langnp2057 Data Gathering{\*\bkmkend _Toc26539144}
|
||||
\par {\*\bkmkstart _Toc26539145}{\listtext\pard\plain\s3 \b\f1\fs26\lang2057\langfe1031\langnp2057 \hich\af1\dbch\af0\loch\f1 4.1.1\tab}}\pard\plain \s3\ql \fi-720\li720\ri0\sb240\sa60\keepn\widctlpar
|
||||
\jclisttab\tx720\aspalpha\aspnum\faauto\ls1\ilvl2\outlinelevel2\adjustright\rin0\lin720\itap0 \b\f1\fs26\lang1033\langfe1031\cgrid\langnp1033\langfenp1031 {\lang2057\langfe1031\langnp2057 Data Sources{\*\bkmkend _Toc26539145}
|
||||
\par }\pard\plain \ql \li0\ri0\widctlpar\aspalpha\aspnum\faauto\adjustright\rin0\lin0\itap0 \fs24\lang1033\langfe1031\cgrid\langnp1033\langfenp1031 {
|
||||
A major area of work will be the document sources section. It will comprise components that range from very simple to complex.
|
||||
\par
|
||||
\par {\listtext\pard\plain\f3\lang1033\langfe1031\langnp1033 \loch\af3\dbch\af0\hich\f3 \'b7\tab}}\pard \ql \fi-360\li720\ri0\widctlpar\jclisttab\tx720\aspalpha\aspnum\faauto\ls3
|
||||
\jclisttab\tx720\aspalpha\aspnum\faauto\ls3\pnrauth1\pnrdate1718329849\pnrstart0\pnrxst1\pnrxst0\pnrxst183\pnrxst240\pnrstop4\pnrstart1\pnrrgb0\pnrrgb0\pnrrgb0\pnrrgb0\pnrrgb0\pnrrgb0\pnrrgb0\pnrrgb0\pnrrgb0\pnrstop9\pnrstart2\pnrnfc23\pnrnfc23
|
||||
\pnrnfc23\pnrnfc23\pnrnfc23\pnrnfc23\pnrnfc23\pnrnfc23\pnrnfc23\pnrnfc0\pnrnfc0\pnrnfc0\pnrnfc0\pnrnfc0\pnrnfc0\pnrnfc0\pnrnfc0\pnrnfc0\pnrstop18\pnrstart3\pnrpnbr0\pnrpnbr0\pnrpnbr0\pnrpnbr0\pnrpnbr0\pnrpnbr0\pnrpnbr0\pnrpnbr0\pnrpnbr0\pnrpnbr0
|
||||
\pnrpnbr0\pnrpnbr0\pnrpnbr0\pnrpnbr0\pnrpnbr0\pnrpnbr0\pnrpnbr0\pnrpnbr0\pnrpnbr0\pnrpnbr0\pnrpnbr0\pnrpnbr0\pnrpnbr0\pnrpnbr0\pnrpnbr0\pnrpnbr0\pnrpnbr0\pnrpnbr0\pnrpnbr0\pnrpnbr0\pnrpnbr0\pnrpnbr0\pnrpnbr0\pnrpnbr0\pnrpnbr0\pnrpnbr0\pnrstop36
|
||||
\adjustright\rin0\lin720\itap0 {Database
|
||||
\par {\listtext\pard\plain\f3\lang1033\langfe1031\langnp1033 \loch\af3\dbch\af0\hich\f3 \'b7\tab}File System
|
||||
\par {\listtext\pard\plain\f3\lang1033\langfe1031\langnp1033 \loch\af3\dbch\af0\hich\f3 \'b7\tab}Web: This (presumably large) subproject aims to crawl the web efficiently. It\rquote s described elsewhere
|
||||
\par {\listtext\pard\plain\f3\lang1033\langfe1031\langnp1033 \loch\af3\dbch\af0\hich\f3 \'b7\tab}Something else: JMS Queues, Mailing lists, mail servers, or whatever else contains text or can be converted to text
|
||||
\par {\*\bkmkstart _Toc26539146}{\listtext\pard\plain\s3 \b\f1\fs26\lang1033\langfe1031\langnp1033 \hich\af1\dbch\af0\loch\f1 4.1.2\tab}}\pard\plain \s3\ql \fi-720\li720\ri0\sb240\sa60\keepn\widctlpar
|
||||
\jclisttab\tx720\aspalpha\aspnum\faauto\ls1\ilvl2\outlinelevel2\adjustright\rin0\lin720\itap0 \b\f1\fs26\lang1033\langfe1031\cgrid\langnp1033\langfenp1031 {Operating Modes{\*\bkmkend _Toc26539146}
|
||||
\par }\pard\plain \ql \li0\ri0\widctlpar\aspalpha\aspnum\faauto\adjustright\rin0\lin0\itap0 \fs24\lang1033\langfe1031\cgrid\langnp1033\langfenp1031 {LARM will mostly be concerned with updating an already existing index. We don\rquote
|
||||
t want a solution that has to rebuild the whole index from scratch. It should do only as much work as necessary while indexing: if possible, only the data that has changed is updated. We see two modes:
|
||||
\par {\listtext\pard\plain\lang1033\langfe1031\langnp1033 \hich\af0\dbch\af0\loch\f0 -\tab}}\pard \ql \fi-360\li720\ri0\widctlpar\jclisttab\tx720\aspalpha\aspnum\faauto\ls2
|
||||
\jclisttab\tx720\aspalpha\aspnum\faauto\ls2\pnrauth1\pnrdate1718329849\pnrstart0\pnrxst1\pnrxst0\pnrxst45\pnrxst0\pnrstop4\pnrstart1\pnrrgb0\pnrrgb0\pnrrgb0\pnrrgb0\pnrrgb0\pnrrgb0\pnrrgb0\pnrrgb0\pnrrgb0\pnrstop9\pnrstart2\pnrnfc23\pnrnfc23
|
||||
\pnrnfc23\pnrnfc23\pnrnfc23\pnrnfc23\pnrnfc23\pnrnfc23\pnrnfc23\pnrnfc0\pnrnfc0\pnrnfc0\pnrnfc0\pnrnfc0\pnrnfc0\pnrnfc0\pnrnfc0\pnrnfc0\pnrstop18\pnrstart3\pnrpnbr0\pnrpnbr0\pnrpnbr0\pnrpnbr0\pnrpnbr0\pnrpnbr0\pnrpnbr0\pnrpnbr0\pnrpnbr0\pnrpnbr0
|
||||
\pnrpnbr0\pnrpnbr0\pnrpnbr0\pnrpnbr0\pnrpnbr0\pnrpnbr0\pnrpnbr0\pnrpnbr0\pnrpnbr0\pnrpnbr0\pnrpnbr0\pnrpnbr0\pnrpnbr0\pnrpnbr0\pnrpnbr0\pnrpnbr0\pnrpnbr0\pnrpnbr0\pnrpnbr0\pnrpnbr0\pnrpnbr0\pnrpnbr0\pnrpnbr0\pnrpnbr0\pnrpnbr0\pnrpnbr0\pnrstop36
|
||||
\adjustright\rin0\lin720\itap0 {Scheduled full (re)indexing: The index is completely rebuilt from time to time. We want to avoid relying on external programs to do that, so a configurable scheduler that triggers the process should be incorporated.
|
||||
|
||||
\par {\listtext\pard\plain\lang1033\langfe1031\langnp1033 \hich\af0\dbch\af0\loch\f0 -\tab}Increme
|
||||
ntal re-indexing: If possible, re-indexing should take place incrementally: the index should only be updated if documents were added, modified or deleted. This requires a notification from the data source, e.g. by database triggers or file system notifica
|
||||
tion. As data sources will typically not be aware of LARM indexing, LARM will instead poll them periodically and check for changes such as additions, modifications or deletions.
|
||||
\par {\*\bkmkstart _Toc26539147}{\listtext\pard\plain\s2 \b\i\f1\fs28\lang1033\langfe1031\langnp1033 \hich\af1\dbch\af0\loch\f1 4.2\tab}}\pard\plain \s2\ql \fi-576\li576\ri0\sb240\sa60\keepn\widctlpar
|
||||
\jclisttab\tx576\aspalpha\aspnum\faauto\ls1\ilvl1\outlinelevel1\adjustright\rin0\lin576\itap0 \b\i\f1\fs28\lang1033\langfe1031\cgrid\langnp1033\langfenp1031 {Indexing{\*\bkmkend _Toc26539147}
|
||||
\par {\*\bkmkstart _Toc26539148}{\listtext\pard\plain\s3 \b\f1\fs26\lang1033\langfe1031\langnp1033 \hich\af1\dbch\af0\loch\f1 4.2.1\tab}}\pard\plain \s3\ql \fi-720\li720\ri0\sb240\sa60\keepn\widctlpar
|
||||
\jclisttab\tx720\aspalpha\aspnum\faauto\ls1\ilvl2\outlinelevel2\adjustright\rin0\lin720\itap0 \b\f1\fs26\lang1033\langfe1031\cgrid\langnp1033\langfenp1031 {Data Processing{\*\bkmkend _Toc26539148}
|
||||
\par {\listtext\pard\plain\lang1033\langfe1031\langnp1033 \hich\af0\dbch\af0\loch\f0 -\tab}}\pard\plain \ql \fi-360\li720\ri0\widctlpar\jclisttab\tx720\aspalpha\aspnum\faauto\ls2
|
||||
\jclisttab\tx720\aspalpha\aspnum\faauto\ls2\pnrauth1\pnrdate1718329849\pnrstart0\pnrxst1\pnrxst0\pnrxst45\pnrxst0\pnrstop4\pnrstart1\pnrrgb0\pnrrgb0\pnrrgb0\pnrrgb0\pnrrgb0\pnrrgb0\pnrrgb0\pnrrgb0\pnrrgb0\pnrstop9\pnrstart2\pnrnfc23\pnrnfc23
|
||||
\pnrnfc23\pnrnfc23\pnrnfc23\pnrnfc23\pnrnfc23\pnrnfc23\pnrnfc23\pnrnfc0\pnrnfc0\pnrnfc2\pnrnfc0\pnrnfc0\pnrnfc0\pnrnfc0\pnrnfc0\pnrnfc0\pnrstop18\pnrstart3\pnrpnbr2\pnrpnbr0\pnrpnbr0\pnrpnbr0\pnrpnbr0\pnrpnbr0\pnrpnbr0\pnrpnbr0\pnrpnbr0\pnrpnbr0
|
||||
\pnrpnbr0\pnrpnbr0\pnrpnbr0\pnrpnbr0\pnrpnbr0\pnrpnbr0\pnrpnbr0\pnrpnbr0\pnrpnbr0\pnrpnbr0\pnrpnbr0\pnrpnbr0\pnrpnbr0\pnrpnbr0\pnrpnbr0\pnrpnbr0\pnrpnbr0\pnrpnbr0\pnrpnbr0\pnrpnbr0\pnrpnbr0\pnrpnbr0\pnrpnbr0\pnrpnbr0\pnrpnbr0\pnrpnbr0\pnrstop36
|
||||
\adjustright\rin0\lin720\itap0 \fs24\lang1033\langfe1031\cgrid\langnp1033\langfenp1031 {Preprocessing means making differen
|
||||
t kinds of data available to the indexer: HTML, PDF etc. must first be transformed into a format that can be understood by the indexer and other storage devices, i.e. fielded data in text, number or date format.
|
||||
\par {\listtext\pard\plain\lang1033\langfe1031\langnp1033 \hich\af0\dbch\af0\loch\f0 -\tab}The data is then put into a
|
||||
pipeline of components that may work on the data in order to transform different input formats into a set of fields and tokens. This step can involve putting headings and body text into different fields, or doing linguistic analysis on the content. [I thi
|
||||
nk a distinction needs to be made between the Lucene analyzer/tokenizer and the pre- or post-processor]
|
||||
\par {\listtext\pard\plain\lang1033\langfe1031\langnp1033 \hich\af0\dbch\af0\loch\f0 -\tab}Also, are we going to support field types?
|
||||
\par {\*\bkmkstart _Toc26539149}{\listtext\pard\plain\s3 \b\f1\fs26\lang1033\langfe1031\langnp1033 \hich\af1\dbch\af0\loch\f1 4.2.2\tab}}\pard\plain \s3\ql \fi-720\li720\ri0\sb240\sa60\keepn\widctlpar
|
||||
\jclisttab\tx720\aspalpha\aspnum\faauto\ls1\ilvl2\outlinelevel2\adjustright\rin0\lin720\itap0 \b\f1\fs26\lang1033\langfe1031\cgrid\langnp1033\langfenp1031 {Indexing Step{\*\bkmkend _Toc26539149}
|
||||
\par }\pard\plain \ql \li0\ri0\widctlpar\aspalpha\aspnum\faauto\adjustright\rin0\lin0\itap0 \fs24\lang1033\langfe1031\cgrid\langnp1033\langfenp1031 {For indexing itself we want to use Lucene. One of the components in the data processing pipeline w
|
||||
ill use the Lucene API to add incoming data to the index. It puts tokenized fields into a searchable index. It can also store the raw data or index the raw data. The index option will be configurable. No other components in the pipeline should know about
|
||||
Lucene. For them this is just another step in the pipeline, another component.
|
||||
\par {\*\bkmkstart _Toc26539150}{\listtext\pard\plain\s2 \b\i\f1\fs28\lang1033\langfe1031\langnp1033 \hich\af1\dbch\af0\loch\f1 4.3\tab}}\pard\plain \s2\ql \fi-576\li576\ri0\sb240\sa60\keepn\widctlpar
|
||||
\jclisttab\tx576\aspalpha\aspnum\faauto\ls1\ilvl1\outlinelevel1\adjustright\rin0\lin576\itap0 \b\i\f1\fs28\lang1033\langfe1031\cgrid\langnp1033\langfenp1031 {Searching{\*\bkmkend _Toc26539150}
|
||||
\par }\pard\plain \ql \li0\ri0\widctlpar\aspalpha\aspnum\faauto\adjustright\rin0\lin0\itap0 \fs24\lang1033\langfe1031\cgrid\langnp1033\langfenp1031 {
|
||||
A best-practice way of searching and returning search results, including functionality such as index reader pooling, sorting of search results and paging.
|
||||
\par Proposed: SearchBean, in conjunction with the indexing framework.
|
||||
\par Implementation:
|
||||
\par Create a standard API to search the index. When items are updated, deleted or added by the indexing engine, have the index}{\revised\revauth2\revdttm107744546 }{automatically reflect this updated information including creati
|
||||
ng new sorted indexes and not having to worry about which index searcher to use. Also, increase performance by using pooled index searchers, which get updated at the appropriate time.}{\f5\cf1
|
||||
\par {\*\bkmkstart _Toc26539151}{\listtext\pard\plain\s2 \b\i\f1\fs28\lang1033\langfe1031\langnp1033 \hich\af1\dbch\af0\loch\f1 4.4\tab}}\pard\plain \s2\ql \fi-576\li576\ri0\sb240\sa60\keepn\widctlpar
|
||||
\jclisttab\tx576\aspalpha\aspnum\faauto\ls1\ilvl1\outlinelevel1\adjustright\rin0\lin576\itap0 \b\i\f1\fs28\lang1033\langfe1031\cgrid\langnp1033\langfenp1031 {Nonfunctional Requirements{\*\bkmkend _Toc26539151}
|
||||
\par {\*\bkmkstart _Toc26539152}{\listtext\pard\plain\s3 \b\f1\fs26\lang1033\langfe1031\langnp1033 \hich\af1\dbch\af0\loch\f1 4.4.1\tab}}\pard\plain \s3\ql \fi-720\li720\ri0\sb240\sa60\keepn\widctlpar
|
||||
\jclisttab\tx720\aspalpha\aspnum\faauto\ls1\ilvl2\outlinelevel2\adjustright\rin0\lin720\itap0 \b\f1\fs26\lang1033\langfe1031\cgrid\langnp1033\langfenp1031 {Scalability{\*\bkmkend _Toc26539152}
|
||||
\par }\pard\plain \ql \li0\ri0\widctlpar\aspalpha\aspnum\faauto\adjustright\rin0\lin0\itap0 \fs24\lang1033\langfe1031\cgrid\langnp1033\langfenp1031 {We want the whole system to scal
|
||||
e up to hundreds of millions of documents. We see this as one of the major challenges that motivate our work on this project.
|
||||
\par Scalability means
|
||||
\par {\listtext\pard\plain\lang1033\langfe1031\langnp1033 \hich\af0\dbch\af0\loch\f0 -\tab}}\pard \ql \fi-360\li720\ri0\widctlpar\jclisttab\tx720\aspalpha\aspnum\faauto\ls2
|
||||
\jclisttab\tx720\aspalpha\aspnum\faauto\ls2\pnrauth1\pnrdate1718329849\pnrstart0\pnrxst1\pnrxst0\pnrxst45\pnrxst0\pnrstop4\pnrstart1\pnrrgb0\pnrrgb0\pnrrgb0\pnrrgb0\pnrrgb0\pnrrgb0\pnrrgb0\pnrrgb0\pnrrgb0\pnrstop9\pnrstart2\pnrnfc23\pnrnfc23
|
||||
\pnrnfc23\pnrnfc23\pnrnfc23\pnrnfc23\pnrnfc23\pnrnfc23\pnrnfc23\pnrnfc0\pnrnfc0\pnrnfc4\pnrnfc0\pnrnfc0\pnrnfc0\pnrnfc0\pnrnfc0\pnrnfc0\pnrstop18\pnrstart3\pnrpnbr4\pnrpnbr0\pnrpnbr0\pnrpnbr0\pnrpnbr0\pnrpnbr0\pnrpnbr0\pnrpnbr0\pnrpnbr0\pnrpnbr0
|
||||
\pnrpnbr0\pnrpnbr0\pnrpnbr0\pnrpnbr0\pnrpnbr0\pnrpnbr0\pnrpnbr0\pnrpnbr0\pnrpnbr0\pnrpnbr0\pnrpnbr0\pnrpnbr0\pnrpnbr0\pnrpnbr0\pnrpnbr0\pnrpnbr0\pnrpnbr0\pnrpnbr0\pnrpnbr0\pnrpnbr0\pnrpnbr0\pnrpnbr0\pnrpnbr0\pnrpnbr0\pnrpnbr0\pnrpnbr0\pnrstop36
|
||||
\adjustright\rin0\lin720\itap0 {We have to avoid algorithms that have a higher than constant complexity,
|
||||
\par {\listtext\pard\plain\lang1033\langfe1031\langnp1033 \hich\af0\dbch\af0\loch\f0 -\tab}We have to avoid internal I/O (hard drive or network accesses),
|
||||
\par {\listtext\pard\plain\lang1033\langfe1031\langnp1033 \hich\af0\dbch\af0\loch\f0 -\tab}We will have to use efficient data structures and/or compression, where applicable.
|
||||
\par {\listtext\pard\plain\lang1033\langfe1031\langnp1033 \hich\af0\dbch\af0\loch\f0 -\tab}In order to avoid more than necessary synchronization points between processes, we have to be able to exchange much of the data in }{\b batch mode}{, as
|
||||
well as to cater for communication through network or hard drive I/O, which is several orders of magnitude slower.
|
||||
\par {\*\bkmkstart _Toc26539153}{\listtext\pard\plain\s3 \b\f1\fs26\lang1033\langfe1031\langnp1033 \hich\af1\dbch\af0\loch\f1 4.4.2\tab}}\pard\plain \s3\ql \fi-720\li720\ri0\sb240\sa60\keepn\widctlpar
|
||||
\jclisttab\tx720\aspalpha\aspnum\faauto\ls1\ilvl2\outlinelevel2\adjustright\rin0\lin720\itap0 \b\f1\fs26\lang1033\langfe1031\cgrid\langnp1033\langfenp1031 {Configuration{\*\bkmkend _Toc26539153}
|
||||
\par }\pard\plain \ql \li0\ri0\widctlpar\aspalpha\aspnum\faauto\adjustright\rin0\lin0\itap0 \fs24\lang1033\langfe1031\cgrid\langnp1033\langfenp1031 {
|
||||
LARM provides a standard way of configuring all components in the system. Parameters that may be configured include:
|
||||
\par {\listtext\pard\plain\lang1033\langfe1031\langnp1033 \hich\af0\dbch\af0\loch\f0 -\tab}}\pard \ql \fi-360\li720\ri0\widctlpar\jclisttab\tx720\aspalpha\aspnum\faauto\ls2
|
||||
\jclisttab\tx720\aspalpha\aspnum\faauto\ls2\pnrauth1\pnrdate1718329849\pnrstart0\pnrxst1\pnrxst0\pnrxst45\pnrxst0\pnrstop4\pnrstart1\pnrrgb0\pnrrgb0\pnrrgb0\pnrrgb0\pnrrgb0\pnrrgb0\pnrrgb0\pnrrgb0\pnrrgb0\pnrstop9\pnrstart2\pnrnfc23\pnrnfc23
|
||||
\pnrnfc23\pnrnfc23\pnrnfc23\pnrnfc23\pnrnfc23\pnrnfc23\pnrnfc23\pnrnfc0\pnrnfc0\pnrnfc8\pnrnfc0\pnrnfc0\pnrnfc0\pnrnfc0\pnrnfc0\pnrnfc0\pnrstop18\pnrstart3\pnrpnbr8\pnrpnbr0\pnrpnbr0\pnrpnbr0\pnrpnbr0\pnrpnbr0\pnrpnbr0\pnrpnbr0\pnrpnbr0\pnrpnbr0
|
||||
\pnrpnbr0\pnrpnbr0\pnrpnbr0\pnrpnbr0\pnrpnbr0\pnrpnbr0\pnrpnbr0\pnrpnbr0\pnrpnbr0\pnrpnbr0\pnrpnbr0\pnrpnbr0\pnrpnbr0\pnrpnbr0\pnrpnbr0\pnrpnbr0\pnrpnbr0\pnrpnbr0\pnrpnbr0\pnrpnbr0\pnrpnbr0\pnrpnbr0\pnrpnbr0\pnrpnbr0\pnrpnbr0\pnrpnbr0\pnrstop36
|
||||
\adjustright\rin0\lin720\itap0 {Mapping of file name extensions or MIME types to data handlers
|
||||
\par {\listtext\pard\plain\lang1033\langfe1031\langnp1033 \hich\af0\dbch\af0\loch\f0 -\tab}Index handler configuration
|
||||
\par {\listtext\pard\plain\lang1033\langfe1031\langnp1033 \hich\af0\dbch\af0\loch\f0 -\tab}Sorted fields
|
||||
\par {\listtext\pard\plain\lang1033\langfe1031\langnp1033 \hich\af0\dbch\af0\loch\f0 -\tab}Default conjunction operator
|
||||
\par {\listtext\pard\plain\lang1033\langfe1031\langnp1033 \hich\af0\dbch\af0\loch\f0 -\tab}[Full list needs to be defined]}{\f5\cf1
|
||||
\par {\*\bkmkstart _Toc26539154}{\listtext\pard\plain\s3 \b\f1\fs26\lang1033\langfe1031\langnp1033 \hich\af1\dbch\af0\loch\f1 4.4.3\tab}}\pard\plain \s3\ql \fi-720\li720\ri0\sb240\sa60\keepn\widctlpar
|
||||
\jclisttab\tx720\aspalpha\aspnum\faauto\ls1\ilvl2\outlinelevel2\adjustright\rin0\lin720\itap0 \b\f1\fs26\lang1033\langfe1031\cgrid\langnp1033\langfenp1031 {Inherently Distributable{\*\bkmkend _Toc26539154}
|
||||
\par }\pard\plain \ql \li0\ri0\widctlpar\aspalpha\aspnum\faauto\adjustright\rin0\lin0\itap0 \fs24\lang1033\langfe1031\cgrid\langnp1033\langfenp1031 {Constrained system resources (hard drive & network I/O, CPU and RAM) require that dif
|
||||
ferent components can be distributed over multiple machines. This calls for a message-oriented paradigm with coarse-grained data exchange packets.
|
||||
\par {\*\bkmkstart _Toc26539155}{\listtext\pard\plain\s3 \b\f1\fs26\lang1033\langfe1031\langnp1033 \hich\af1\dbch\af0\loch\f1 4.4.4\tab}}\pard\plain \s3\ql \fi-720\li720\ri0\sb240\sa60\keepn\widctlpar
|
||||
\jclisttab\tx720\aspalpha\aspnum\faauto\ls1\ilvl2\outlinelevel2\adjustright\rin0\lin720\itap0 \b\f1\fs26\lang1033\langfe1031\cgrid\langnp1033\langfenp1031 {Component-Based Approach{\*\bkmkend _Toc26539155}
|
||||
\par }\pard\plain \ql \li0\ri0\widctlpar\aspalpha\aspnum\faauto\adjustright\rin0\lin0\itap0 \fs24\lang1033\langfe1031\cgrid\langnp1033\langfenp1031 {Our intention is to use Avalon as the basis for the framework. Whilst we recognize t
|
||||
hat this might significantly steepen the learning curve for using the framework, we feel that it\rquote s an acceptable tradeoff in exchange for a well-designed, production-quality server framework.
|
||||
\par The component-based approach means different parts of the system
|
||||
(e.g. data sources and the pipeline) are only loosely coupled. This will enable us to assemble a system using configuration files rather than source code, to incorporate external developments more easily through standard interfaces, and to break up di
|
||||
fferent steps of the pipeline into several processes, connected by data exchange components.
|
||||
\par {\*\bkmkstart _Toc26539156}{\listtext\pard\plain\s3 \b\f1\fs26\lang1033\langfe1031\langnp1033 \hich\af1\dbch\af0\loch\f1 4.4.5\tab}}\pard\plain \s3\ql \fi-720\li720\ri0\sb240\sa60\keepn\widctlpar
|
||||
\jclisttab\tx720\aspalpha\aspnum\faauto\ls1\ilvl2\outlinelevel2\adjustright\rin0\lin720\itap0 \b\f1\fs26\lang1033\langfe1031\cgrid\langnp1033\langfenp1031 {Fail-Safety{\*\bkmkend _Toc26539156}/Check-pointing
|
||||
\par }\pard\plain \ql \li0\ri0\widctlpar\aspalpha\aspnum\faauto\adjustright\rin0\lin0\itap0 \fs24\lang1033\langfe1031\cgrid\langnp1033\langfenp1031 {
|
||||
If LARM is to be used in a production environment, it must be possible to shut the system down, at least at specified checkpoint
|
||||
s, and to recover after a system crash. Since indexing operations may take days or even weeks, it must be possible to save the global status of an indexing operation, at least every now and then.
|
||||
\par {\*\bkmkstart _Toc26539157}{\listtext\pard\plain\s3 \b\f1\fs26\lang1033\langfe1031\langnp1033 \hich\af1\dbch\af0\loch\f1 4.4.6\tab}}\pard\plain \s3\ql \fi-720\li720\ri0\sb240\sa60\keepn\widctlpar
|
||||
\jclisttab\tx720\aspalpha\aspnum\faauto\ls1\ilvl2\outlinelevel2\adjustright\rin0\lin720\itap0 \b\f1\fs26\lang1033\langfe1031\cgrid\langnp1033\langfenp1031 {Other Nonfunctional Requirements{\*\bkmkend _Toc26539157}
|
||||
\par {\listtext\pard\plain\lang1033\langfe1031\langnp1033 \hich\af0\dbch\af0\loch\f0 -\tab}}\pard\plain \ql \fi-360\li720\ri0\widctlpar\jclisttab\tx720\aspalpha\aspnum\faauto\ls2
|
||||
\jclisttab\tx720\aspalpha\aspnum\faauto\ls2\pnrauth1\pnrdate1718329849\pnrstart0\pnrxst1\pnrxst0\pnrxst45\pnrxst0\pnrstop4\pnrstart1\pnrrgb0\pnrrgb0\pnrrgb0\pnrrgb0\pnrrgb0\pnrrgb0\pnrrgb0\pnrrgb0\pnrrgb0\pnrstop9\pnrstart2\pnrnfc23\pnrnfc23
|
||||
\pnrnfc23\pnrnfc23\pnrnfc23\pnrnfc23\pnrnfc23\pnrnfc23\pnrnfc23\pnrnfc0\pnrnfc0\pnrnfc13\pnrnfc0\pnrnfc0\pnrnfc0\pnrnfc0\pnrnfc0\pnrnfc0\pnrstop18\pnrstart3\pnrpnbr13\pnrpnbr0\pnrpnbr0\pnrpnbr0\pnrpnbr0\pnrpnbr0\pnrpnbr0\pnrpnbr0\pnrpnbr0\pnrpnbr0
|
||||
\pnrpnbr0\pnrpnbr0\pnrpnbr0\pnrpnbr0\pnrpnbr0\pnrpnbr0\pnrpnbr0\pnrpnbr0\pnrpnbr0\pnrpnbr0\pnrpnbr0\pnrpnbr0\pnrpnbr0\pnrpnbr0\pnrpnbr0\pnrpnbr0\pnrpnbr0\pnrpnbr0\pnrpnbr0\pnrpnbr0\pnrpnbr0\pnrpnbr0\pnrpnbr0\pnrpnbr0\pnrpnbr0\pnrpnbr0\pnrstop36
|
||||
\adjustright\rin0\lin720\itap0 \fs24\lang1033\langfe1031\cgrid\langnp1033\langfenp1031 {Common logging mechanism
|
||||
\par {\listtext\pard\plain\lang1033\langfe1031\langnp1033 \hich\af0\dbch\af0\loch\f0 -\tab}Common configuration mechanism
|
||||
\par {\*\bkmkstart _Toc26539158}{\listtext\pard\plain\s2 \b\i\f1\fs28\lang1033\langfe1031\langnp1033 \hich\af1\dbch\af0\loch\f1 4.5\tab}}\pard\plain \s2\ql \fi-576\li576\ri0\sb240\sa60\keepn\widctlpar
|
||||
\jclisttab\tx576\aspalpha\aspnum\faauto\ls1\ilvl1\outlinelevel1\adjustright\rin0\lin576\itap0 \b\i\f1\fs28\lang1033\langfe1031\cgrid\langnp1033\langfenp1031 {Out of Scope{\*\bkmkend _Toc26539158}
|
||||
\par }\pard\plain \ql \li0\ri0\widctlpar\aspalpha\aspnum\faauto\adjustright\rin0\lin0\itap0 \fs24\lang1033\langfe1031\cgrid\langnp1033\langfenp1031 {We consider the following aspects to be out of scope for our approach:
|
||||
\par {\listtext\pard\plain\lang1033\langfe1031\langnp1033 \hich\af0\dbch\af0\loch\f0 -\tab}}\pard \ql \fi-360\li720\ri0\widctlpar\jclisttab\tx720\aspalpha\aspnum\faauto\ls2\adjustright\rin0\lin720\itap0 {
|
||||
Complete transaction safety. This would involve an XA-based architecture that would guarantee that a change, e.g. in a database table, would result in an index update in the LARM engine. Core Lucene is not transaction-safe at this time, so this wouldn
|
||||
\rquote t make sense. In general, we favor a lean approach that allows slight differences between the index and the underlying data.}{\b\f1\fs32\kerning32
|
||||
\par }\pard \ql \li720\ri0\widctlpar\aspalpha\aspnum\faauto\adjustright\rin0\lin720\itap0 {
|
||||
\par }{\i [Peter: }{\i\kerning32 I think we should include transaction-like updating. We need to be able to update a live index.]}{\kerning32
|
||||
\par {\*\bkmkstart _Toc26539159}{\listtext\pard\plain\s2 \b\i\f1\fs28\lang1033\langfe1031\kerning32\langnp1033 \hich\af1\dbch\af0\loch\f1 4.6\tab}}\pard\plain \s2\ql \fi-576\li576\ri0\sb240\sa60\keepn\widctlpar
|
||||
\jclisttab\tx576\aspalpha\aspnum\faauto\ls1\ilvl1\outlinelevel1\adjustright\rin0\lin576\itap0 \b\i\f1\fs28\lang1033\langfe1031\cgrid\langnp1033\langfenp1031 {\kerning32 Development Standards{\*\bkmkend _Toc26539159}
|
||||
\par }\pard\plain \ql \li0\ri0\widctlpar\aspalpha\aspnum\faauto\adjustright\rin0\lin0\itap0 \fs24\lang1033\langfe1031\cgrid\langnp1033\langfenp1031 {\kerning32 - Coding Style: Every LARM developer agrees to use the Avalon guidelines avail
|
||||
able at http://jakarta.apache.org/avalon/code-standards.html}{
|
||||
\par }{\kerning32 - Logging: In case we use Avalon, Avalon LogKit is used. Otherwise we use Jakarta Log4J [configuration to be defined]}{
|
||||
\par }}
|
File diff suppressed because it is too large
|
@ -1,113 +0,0 @@
|
|||
{\rtf1\ansi\ansicpg1252\uc1 \deff0\deflang1031\deflangfe1031{\fonttbl{\f0\froman\fcharset0\fprq2{\*\panose 02020603050405020304}Times New Roman;}{\f1\fswiss\fcharset0\fprq2{\*\panose 020b0604020202020204}Arial;}
|
||||
{\f2\fmodern\fcharset0\fprq1{\*\panose 02070309020205020404}Courier New;}{\f3\froman\fcharset2\fprq2{\*\panose 05050102010706020507}Symbol;}{\f4\froman\fcharset0\fprq2{\*\panose 02020603050405020304}Times;}
|
||||
{\f5\fswiss\fcharset0\fprq2{\*\panose 020b0604020202020204}Helvetica;}{\f14\fnil\fcharset2\fprq2{\*\panose 05000000000000000000}Wingdings;}{\f28\froman\fcharset238\fprq2 Times New Roman CE;}{\f29\froman\fcharset204\fprq2 Times New Roman Cyr;}
|
||||
{\f31\froman\fcharset161\fprq2 Times New Roman Greek;}{\f32\froman\fcharset162\fprq2 Times New Roman Tur;}{\f33\froman\fcharset177\fprq2 Times New Roman (Hebrew);}{\f34\froman\fcharset178\fprq2 Times New Roman (Arabic);}
|
||||
{\f35\froman\fcharset186\fprq2 Times New Roman Baltic;}{\f36\fswiss\fcharset238\fprq2 Arial CE;}{\f37\fswiss\fcharset204\fprq2 Arial Cyr;}{\f39\fswiss\fcharset161\fprq2 Arial Greek;}{\f40\fswiss\fcharset162\fprq2 Arial Tur;}
|
||||
{\f41\fswiss\fcharset177\fprq2 Arial (Hebrew);}{\f42\fswiss\fcharset178\fprq2 Arial (Arabic);}{\f43\fswiss\fcharset186\fprq2 Arial Baltic;}{\f60\froman\fcharset238\fprq2 Times CE;}{\f61\froman\fcharset204\fprq2 Times Cyr;}
|
||||
{\f63\froman\fcharset161\fprq2 Times Greek;}{\f64\froman\fcharset162\fprq2 Times Tur;}{\f65\froman\fcharset177\fprq2 Times (Hebrew);}{\f66\froman\fcharset178\fprq2 Times (Arabic);}{\f67\froman\fcharset186\fprq2 Times Baltic;}
|
||||
{\f68\fswiss\fcharset238\fprq2 Helvetica CE;}{\f69\fswiss\fcharset204\fprq2 Helvetica Cyr;}{\f71\fswiss\fcharset161\fprq2 Helvetica Greek;}{\f72\fswiss\fcharset162\fprq2 Helvetica Tur;}{\f73\fswiss\fcharset177\fprq2 Helvetica (Hebrew);}
|
||||
{\f74\fswiss\fcharset178\fprq2 Helvetica (Arabic);}{\f75\fswiss\fcharset186\fprq2 Helvetica Baltic;}}{\colortbl;\red0\green0\blue0;\red0\green0\blue255;\red0\green255\blue255;\red0\green255\blue0;\red255\green0\blue255;\red255\green0\blue0;
|
||||
\red255\green255\blue0;\red255\green255\blue255;\red0\green0\blue128;\red0\green128\blue128;\red0\green128\blue0;\red128\green0\blue128;\red128\green0\blue0;\red128\green128\blue0;\red128\green128\blue128;\red192\green192\blue192;}{\stylesheet{
|
||||
\ql \li0\ri0\widctlpar\aspalpha\aspnum\faauto\adjustright\rin0\lin0\itap0 \fs24\lang1031\langfe1031\cgrid\langnp1031\langfenp1031 \snext0 Normal;}{\s1\ql \fi-432\li432\ri0\sb240\sa60\keepn\widctlpar
|
||||
\jclisttab\tx432\aspalpha\aspnum\faauto\ls1\adjustright\rin0\lin432\itap0 \b\f1\fs32\lang1033\langfe1031\kerning32\cgrid\langnp1033\langfenp1031 \sbasedon0 \snext0 heading 1;}{\s2\ql \fi-576\li576\ri0\sb240\sa60\keepn\widctlpar
|
||||
\jclisttab\tx576\aspalpha\aspnum\faauto\ls1\ilvl1\adjustright\rin0\lin576\itap0 \b\i\f1\fs28\lang1033\langfe1031\cgrid\langnp1033\langfenp1031 \sbasedon0 \snext0 heading 2;}{\s3\ql \fi-720\li720\ri0\sb240\sa60\keepn\widctlpar
|
||||
\jclisttab\tx720\aspalpha\aspnum\faauto\ls1\ilvl2\adjustright\rin0\lin720\itap0 \b\f1\fs26\lang1033\langfe1031\cgrid\langnp1033\langfenp1031 \sbasedon0 \snext0 heading 3;}{\s4\ql \fi-864\li864\ri0\sb240\sa60\keepn\widctlpar
|
||||
\jclisttab\tx864\aspalpha\aspnum\faauto\ls1\ilvl3\adjustright\rin0\lin864\itap0 \b\fs28\lang1033\langfe1031\cgrid\langnp1033\langfenp1031 \sbasedon0 \snext0 heading 4;}{\s5\ql \fi-1008\li1008\ri0\sb240\sa60\widctlpar
|
||||
\jclisttab\tx1008\aspalpha\aspnum\faauto\ls1\ilvl4\adjustright\rin0\lin1008\itap0 \b\i\fs26\lang1033\langfe1031\cgrid\langnp1033\langfenp1031 \sbasedon0 \snext0 heading 5;}{\s6\ql \fi-1152\li1152\ri0\sb240\sa60\widctlpar
|
||||
\jclisttab\tx1152\aspalpha\aspnum\faauto\ls1\ilvl5\adjustright\rin0\lin1152\itap0 \b\f4\fs22\lang1033\langfe1031\cgrid\langnp1033\langfenp1031 \sbasedon0 \snext0 heading 6;}{\s7\ql \fi-1296\li1296\ri0\sb240\sa60\widctlpar
|
||||
\jclisttab\tx1296\aspalpha\aspnum\faauto\ls1\ilvl6\adjustright\rin0\lin1296\itap0 \f4\fs24\lang1033\langfe1031\cgrid\langnp1033\langfenp1031 \sbasedon0 \snext0 heading 7;}{\s8\ql \fi-1440\li1440\ri0\sb240\sa60\widctlpar
|
||||
\jclisttab\tx1440\aspalpha\aspnum\faauto\ls1\ilvl7\adjustright\rin0\lin1440\itap0 \i\f4\fs24\lang1033\langfe1031\cgrid\langnp1033\langfenp1031 \sbasedon0 \snext0 heading 8;}{\s9\ql \fi-1584\li1584\ri0\sb240\sa60\widctlpar
|
||||
\jclisttab\tx1584\aspalpha\aspnum\faauto\ls1\ilvl8\adjustright\rin0\lin1584\itap0 \f5\fs22\lang1033\langfe1031\cgrid\langnp1033\langfenp1031 \sbasedon0 \snext0 heading 9;}{\*\cs10 \additive Default Paragraph Font;}{
|
||||
\s15\qc \li0\ri0\sb240\sa60\widctlpar\aspalpha\aspnum\faauto\outlinelevel0\adjustright\rin0\lin0\itap0 \b\f5\fs32\lang1033\langfe1031\kerning28\cgrid\langnp1033\langfenp1031 \sbasedon0 \snext15 Title;}}{\*\listtable{\list\listtemplateid-100094782
|
||||
\listhybrid{\listlevel\levelnfc23\levelnfcn23\leveljc0\leveljcn0\levelfollow0\levelstartat0\levelspace0\levelindent0{\leveltext\'01-;}{\levelnumbers;}\loch\af0\hich\af0\dbch\af0\chbrdr\brdrnone\brdrcf1 \chshdng0\chcfpat1\chcbpat1\fbias0 \fi-360\li720
|
||||
\jclisttab\tx720 }{\listlevel\levelnfc23\levelnfcn23\leveljc0\leveljcn0\levelfollow0\levelstartat1\levelspace0\levelindent0{\leveltext\'01o;}{\levelnumbers;}\f2\chbrdr\brdrnone\brdrcf1 \chshdng0\chcfpat1\chcbpat1\fbias0 \fi-360\li1440\jclisttab\tx1440 }
|
||||
{\listlevel\levelnfc23\levelnfcn23\leveljc0\leveljcn0\levelfollow0\levelstartat1\levelspace0\levelindent0{\leveltext\'01\u-3929 ?;}{\levelnumbers;}\f14\chbrdr\brdrnone\brdrcf1 \chshdng0\chcfpat1\chcbpat1\fbias0 \fi-360\li2160\jclisttab\tx2160 }{\listlevel
|
||||
\levelnfc23\levelnfcn23\leveljc0\leveljcn0\levelfollow0\levelstartat1\levelspace0\levelindent0{\leveltext\'01\u-3913 ?;}{\levelnumbers;}\f3\chbrdr\brdrnone\brdrcf1 \chshdng0\chcfpat1\chcbpat1\fbias0 \fi-360\li2880\jclisttab\tx2880 }{\listlevel\levelnfc23
|
||||
\levelnfcn23\leveljc0\leveljcn0\levelfollow0\levelstartat1\levelspace0\levelindent0{\leveltext\'01o;}{\levelnumbers;}\f2\chbrdr\brdrnone\brdrcf1 \chshdng0\chcfpat1\chcbpat1\fbias0 \fi-360\li3600\jclisttab\tx3600 }{\listlevel\levelnfc23\levelnfcn23
|
||||
\leveljc0\leveljcn0\levelfollow0\levelstartat1\levelspace0\levelindent0{\leveltext\'01\u-3929 ?;}{\levelnumbers;}\f14\chbrdr\brdrnone\brdrcf1 \chshdng0\chcfpat1\chcbpat1\fbias0 \fi-360\li4320\jclisttab\tx4320 }{\listlevel\levelnfc23\levelnfcn23\leveljc0
|
||||
\leveljcn0\levelfollow0\levelstartat1\levelspace0\levelindent0{\leveltext\'01\u-3913 ?;}{\levelnumbers;}\f3\chbrdr\brdrnone\brdrcf1 \chshdng0\chcfpat1\chcbpat1\fbias0 \fi-360\li5040\jclisttab\tx5040 }{\listlevel\levelnfc23\levelnfcn23\leveljc0\leveljcn0
|
||||
\levelfollow0\levelstartat1\levelspace0\levelindent0{\leveltext\'01o;}{\levelnumbers;}\f2\chbrdr\brdrnone\brdrcf1 \chshdng0\chcfpat1\chcbpat1\fbias0 \fi-360\li5760\jclisttab\tx5760 }{\listlevel\levelnfc23\levelnfcn23\leveljc0\leveljcn0\levelfollow0
|
||||
\levelstartat1\levelspace0\levelindent0{\leveltext\'01\u-3929 ?;}{\levelnumbers;}\f14\chbrdr\brdrnone\brdrcf1 \chshdng0\chcfpat1\chcbpat1\fbias0 \fi-360\li6480\jclisttab\tx6480 }{\listname ;}\listid72943879}{\list\listtemplateid1804128586{\listlevel
|
||||
\levelnfc0\levelnfcn0\leveljc0\leveljcn0\levelfollow0\levelstartat1\levelspace0\levelindent0{\leveltext\'01\'00;}{\levelnumbers\'01;}\chbrdr\brdrnone\brdrcf1 \chshdng0\chcfpat1\chcbpat1 \s1\fi-432\li432\jclisttab\tx432 }{\listlevel\levelnfc0\levelnfcn0
|
||||
\leveljc0\leveljcn0\levelfollow0\levelstartat1\levelspace0\levelindent0{\leveltext\'03\'00.\'01;}{\levelnumbers\'01\'03;}\chbrdr\brdrnone\brdrcf1 \chshdng0\chcfpat1\chcbpat1 \s2\fi-576\li576\jclisttab\tx576 }{\listlevel\levelnfc0\levelnfcn0\leveljc0
|
||||
\leveljcn0\levelfollow0\levelstartat1\levelspace0\levelindent0{\leveltext\'05\'00.\'01.\'02;}{\levelnumbers\'01\'03\'05;}\chbrdr\brdrnone\brdrcf1 \chshdng0\chcfpat1\chcbpat1 \s3\fi-720\li720\jclisttab\tx720 }{\listlevel\levelnfc0\levelnfcn0\leveljc0
|
||||
\leveljcn0\levelfollow0\levelstartat1\levelspace0\levelindent0{\leveltext\'07\'00.\'01.\'02.\'03;}{\levelnumbers\'01\'03\'05\'07;}\chbrdr\brdrnone\brdrcf1 \chshdng0\chcfpat1\chcbpat1 \s4\fi-864\li864\jclisttab\tx864 }{\listlevel\levelnfc0\levelnfcn0
|
||||
\leveljc0\leveljcn0\levelfollow0\levelstartat1\levelspace0\levelindent0{\leveltext\'09\'00.\'01.\'02.\'03.\'04;}{\levelnumbers\'01\'03\'05\'07\'09;}\chbrdr\brdrnone\brdrcf1 \chshdng0\chcfpat1\chcbpat1 \s5\fi-1008\li1008\jclisttab\tx1008 }{\listlevel
|
||||
\levelnfc0\levelnfcn0\leveljc0\leveljcn0\levelfollow0\levelstartat1\levelspace0\levelindent0{\leveltext\'0b\'00.\'01.\'02.\'03.\'04.\'05;}{\levelnumbers\'01\'03\'05\'07\'09\'0b;}\chbrdr\brdrnone\brdrcf1 \chshdng0\chcfpat1\chcbpat1 \s6\fi-1152\li1152
|
||||
\jclisttab\tx1152 }{\listlevel\levelnfc0\levelnfcn0\leveljc0\leveljcn0\levelfollow0\levelstartat1\levelspace0\levelindent0{\leveltext\'0d\'00.\'01.\'02.\'03.\'04.\'05.\'06;}{\levelnumbers\'01\'03\'05\'07\'09\'0b\'0d;}\chbrdr\brdrnone\brdrcf1
|
||||
\chshdng0\chcfpat1\chcbpat1 \s7\fi-1296\li1296\jclisttab\tx1296 }{\listlevel\levelnfc0\levelnfcn0\leveljc0\leveljcn0\levelfollow0\levelstartat1\levelspace0\levelindent0{\leveltext\'0f\'00.\'01.\'02.\'03.\'04.\'05.\'06.\'07;}{\levelnumbers
|
||||
\'01\'03\'05\'07\'09\'0b\'0d\'0f;}\chbrdr\brdrnone\brdrcf1 \chshdng0\chcfpat1\chcbpat1 \s8\fi-1440\li1440\jclisttab\tx1440 }{\listlevel\levelnfc0\levelnfcn0\leveljc0\leveljcn0\levelfollow0\levelstartat1\levelspace0\levelindent0{\leveltext
|
||||
\'11\'00.\'01.\'02.\'03.\'04.\'05.\'06.\'07.\'08;}{\levelnumbers\'01\'03\'05\'07\'09\'0b\'0d\'0f\'11;}\chbrdr\brdrnone\brdrcf1 \chshdng0\chcfpat1\chcbpat1 \s9\fi-1584\li1584\jclisttab\tx1584 }{\listname ;}\listid854879813}}{\*\listoverridetable
|
||||
{\listoverride\listid854879813\listoverridecount0\ls1}{\listoverride\listid72943879\listoverridecount0\ls2}}{\info{\title Usage Scenarios}{\author Clemens Marschner}{\operator Clemens Marschner}{\creatim\yr2002\mo12\dy2\min42}
|
||||
{\revtim\yr2002\mo12\dy2\min42}{\version2}{\edmins0}{\nofpages2}{\nofwords638}{\nofchars3642}{\*\company Dell Computer Corporation}{\nofcharsws4472}{\vern8249}}\paperw11906\paperh16838\margl1417\margr1417\margt1417\margb1134
|
||||
\deftab708\widowctrl\ftnbj\aenddoc\hyphhotz425\noxlattoyen\expshrtn\noultrlspc\dntblnsbdb\nospaceforul\formshade\horzdoc\dgmargin\dghspace180\dgvspace180\dghorigin1417\dgvorigin1417\dghshow1\dgvshow1
|
||||
\jexpand\viewkind1\viewscale137\viewzk2\pgbrdrhead\pgbrdrfoot\splytwnine\ftnlytwnine\htmautsp\nolnhtadjtbl\useltbaln\alntblind\lytcalctblwd\lyttblrtgr\lnbrkrule \fet0\sectd \linex0\headery708\footery708\colsx708\endnhere\sectlinegrid360\sectdefaultcl
|
||||
{\*\pnseclvl1\pnucrm\pnstart1\pnindent720\pnhang{\pntxta .}}{\*\pnseclvl2\pnucltr\pnstart1\pnindent720\pnhang{\pntxta .}}{\*\pnseclvl3\pndec\pnstart1\pnindent720\pnhang{\pntxta .}}{\*\pnseclvl4\pnlcltr\pnstart1\pnindent720\pnhang{\pntxta )}}{\*\pnseclvl5
|
||||
\pndec\pnstart1\pnindent720\pnhang{\pntxtb (}{\pntxta )}}{\*\pnseclvl6\pnlcltr\pnstart1\pnindent720\pnhang{\pntxtb (}{\pntxta )}}{\*\pnseclvl7\pnlcrm\pnstart1\pnindent720\pnhang{\pntxtb (}{\pntxta )}}{\*\pnseclvl8\pnlcltr\pnstart1\pnindent720\pnhang
|
||||
{\pntxtb (}{\pntxta )}}{\*\pnseclvl9\pnlcrm\pnstart1\pnindent720\pnhang{\pntxtb (}{\pntxta )}}\pard\plain \s15\qc \li0\ri0\sb240\sa60\widctlpar\aspalpha\aspnum\faauto\outlinelevel0\adjustright\rin0\lin0\itap0
|
||||
\b\f5\fs32\lang1033\langfe1031\kerning28\cgrid\langnp1033\langfenp1031 {\fs28\lang1040\langfe1031\langnp1040 {\*\bkmkstart _Toc26538554}Lucene Retrieval Machine
|
||||
\par Lucene Framework
|
||||
\par }{\fs28\lang1031\langfe1031\langnp1031 Usage Scenarios Document
|
||||
\par }\pard\plain \qc \li0\ri0\widctlpar\aspalpha\aspnum\faauto\adjustright\rin0\lin0\itap0 \fs24\lang1031\langfe1031\cgrid\langnp1031\langfenp1031 {\lang1040\langfe1031\langnp1040
|
||||
\par }{Revision: 5 (cmarschn, 2002-12-01)
|
||||
\par
|
||||
\par Clemens Marschner - Otis Gospodnetic - Peter Carlson - Kelvin Tan
|
||||
\par }\pard\plain \s1\ql \li0\ri0\sb240\sa60\keepn\widctlpar\aspalpha\aspnum\faauto\outlinelevel0\adjustright\rin0\lin0\itap0 \b\f1\fs32\lang1033\langfe1031\kerning32\cgrid\langnp1033\langfenp1031 {
|
||||
\par Usage Scenarios{\*\bkmkend _Toc26538554}
|
||||
\par {\*\bkmkstart _Toc26538555}{\listtext\pard\plain\s2 \b\i\f1\fs28\lang1033\langfe1031\langnp1033 \hich\af1\dbch\af0\loch\f1 1.1\tab}}\pard\plain \s2\ql \fi-576\li576\ri0\sb240\sa60\keepn\widctlpar
|
||||
\jclisttab\tx576\aspalpha\aspnum\faauto\ls1\ilvl1\outlinelevel1\adjustright\rin0\lin576\itap0 \b\i\f1\fs28\lang1033\langfe1031\cgrid\langnp1033\langfenp1031 {File System Indexer{\*\bkmkend _Toc26538555}
|
||||
\par }\pard\plain \ql \li0\ri0\widctlpar\aspalpha\aspnum\faauto\adjustright\rin0\lin0\itap0 \fs24\lang1031\langfe1031\cgrid\langnp1031\langfenp1031 {\lang2057\langfe1031\langnp2057 A file system indexer would work like the \'93Microsoft Index server\'94
|
||||
. It may consist of only one pipeline.
|
||||
\par The Scheduler puts document locations (i.e. file URLs) in an asynchronous (Request-) pipeline. The first MP loads the document, replacing the file URL message by a document message, and tries to detec
|
||||
t a MIME type. After that, the MessageDispatcher component dispatches the messages to different MPs, depending on the MIME type, that extract the text of each document. In the end, a LuceneStorage takes the resulting message and saves it in a Lucene index
|
||||
.
|
||||
\par In an extension of that, one MP first detects the MIME type, a second checks whether that MIME type can be handled by the application, a third then loads the document, a fourth analyses the document, and a fifth saves it to a LuceneStorage.
|
||||
\par In an incremental operation, the Source is connected to the LuceneStorage and checks whether documents have to be refreshed. An additional MP may check whether the loaded document is newer than the one already indexed, and may discard the message if it is not.
|
||||
\par
|
||||
\par {\*\bkmkstart _Toc26538556}{\listtext\pard\plain\s2 \b\i\f1\fs28\lang1033\langfe1031\langnp1033 \hich\af1\dbch\af0\loch\f1 1.2\tab}}\pard\plain \s2\ql \fi-576\li576\ri0\sb240\sa60\keepn\widctlpar
|
||||
\jclisttab\tx576\aspalpha\aspnum\faauto\ls1\ilvl1\outlinelevel1\adjustright\rin0\lin576\itap0 \b\i\f1\fs28\lang1033\langfe1031\cgrid\langnp1033\langfenp1031 {Intranet Web Crawler{\*\bkmkend _Toc26538556}
|
||||
\par }\pard\plain \ql \li0\ri0\widctlpar\aspalpha\aspnum\faauto\adjustright\rin0\lin0\itap0 \fs24\lang1031\langfe1031\cgrid\langnp1031\langfenp1031 {\lang2057\langfe1031\langnp2057 An
|
||||
intranet web crawler (only a few hosts) is not that different from the file system indexer, except that the loading process may be multithreaded and loads documents over the net instead of from the file system. In this case there have to be at least two Messag
|
||||
e
|
||||
Pipelines, since the crawling parts are again active components. An additional processing step extracts links from the loaded documents and puts them back into the queue. A URLSeenFilter (called URLVisitedFilter at this time) makes sure no URL is put into
|
||||
the pipeline twice. A RobotExclusionFilter makes sure the robot exclusion standard is followed, and filters out URLs that are marked as \'93disallowed\'94. At the end there is again a LuceneStorage.
|
||||
\par (This is how LARM is implemented right now. Some effort has already been made to move some of the data structures to disk.)
|
||||
\par
|
||||
\par {\*\bkmkstart _Toc26538557}{\listtext\pard\plain\s2 \b\i\f1\fs28\lang1033\langfe1031\langnp1033 \hich\af1\dbch\af0\loch\f1 1.3\tab}}\pard\plain \s2\ql \fi-576\li576\ri0\sb240\sa60\keepn\widctlpar
|
||||
\jclisttab\tx576\aspalpha\aspnum\faauto\ls1\ilvl1\outlinelevel1\adjustright\rin0\lin576\itap0 \b\i\f1\fs28\lang1033\langfe1031\cgrid\langnp1033\langfenp1031 {Small WWW Web Crawler{\*\bkmkend _Toc26538557}
|
||||
\par }\pard\plain \ql \li0\ri0\widctlpar\aspalpha\aspnum\faauto\adjustright\rin0\lin0\itap0 \fs24\lang1031\langfe1031\cgrid\langnp1031\langfenp1031 {\lang2057\langfe1031\langnp2057
|
||||
If the system is supposed to scale to more than a few hosts, memory, efficiency and fault tolerance become major concerns, and it increasingly becomes a matter
|
||||
of juggling with the system resources (network bandwidth, CPU time, RAM, hard drive space). If one of them becomes a bottleneck, the whole system may become very slow or (in case of RAM or HD shortage) may crash.
|
||||
\par Suppose LuceneStorage is much slower than t
|
||||
he crawler. Since the indexer is pretty much CPU-bound, it becomes necessary to distribute it across two hosts. This can be done easily with the pipeline framework if the pipeline is broken up into two parts and connected via JMS. The loaded document i
|
||||
s
|
||||
put into a JMS topic which is configured such that the JMS messages are routed to one of the destinations in a round-robin manner. On the other side there are indexing components on different hosts that build Lucene indexes that are merged from time to t
|
||||
ime.
|
||||
\par {\*\bkmkstart _Toc26538558}{\listtext\pard\plain\s2 \b\i\f1\fs28\lang1033\langfe1031\langnp1033 \hich\af1\dbch\af0\loch\f1 1.4\tab}}\pard\plain \s2\ql \fi-576\li576\ri0\sb240\sa60\keepn\widctlpar
|
||||
\jclisttab\tx576\aspalpha\aspnum\faauto\ls1\ilvl1\outlinelevel1\adjustright\rin0\lin576\itap0 \b\i\f1\fs28\lang1033\langfe1031\cgrid\langnp1033\langfenp1031 {Large WWW Web Crawler{\*\bkmkend _Toc26538558}
|
||||
\par }\pard\plain \ql \li0\ri0\widctlpar\aspalpha\aspnum\faauto\adjustright\rin0\lin0\itap0 \fs24\lang1031\langfe1031\cgrid\langnp1031\langfenp1031 {\lang2057\langfe1031\langnp2057 (todo)
|
||||
\par If a web crawler is supposed to scale to the whole WWW, a whole set of precautions has to be taken.
|
||||
\par {\listtext\pard\plain\lang2057\langfe1031\langnp2057 \hich\af0\dbch\af0\loch\f0 -\tab}}\pard \ql \fi-360\li720\ri0\widctlpar\jclisttab\tx720\aspalpha\aspnum\faauto\ls2
|
||||
\jclisttab\tx720\aspalpha\aspnum\faauto\ls2\pnrauth1\pnrdate1718329849\pnrstart0\pnrxst1\pnrxst0\pnrxst45\pnrxst0\pnrstop4\pnrstart1\pnrrgb0\pnrrgb0\pnrrgb0\pnrrgb0\pnrrgb0\pnrrgb0\pnrrgb0\pnrrgb0\pnrrgb0\pnrstop9\pnrstart2\pnrnfc23\pnrnfc23
|
||||
\pnrnfc23\pnrnfc23\pnrnfc23\pnrnfc23\pnrnfc23\pnrnfc23\pnrnfc23\pnrnfc0\pnrnfc0\pnrnfc3\pnrnfc0\pnrnfc0\pnrnfc0\pnrnfc0\pnrnfc0\pnrnfc0\pnrstop18\pnrstart3\pnrpnbr3\pnrpnbr0\pnrpnbr0\pnrpnbr0\pnrpnbr0\pnrpnbr0\pnrpnbr0\pnrpnbr0\pnrpnbr0\pnrpnbr0
|
||||
\pnrpnbr0\pnrpnbr0\pnrpnbr0\pnrpnbr0\pnrpnbr0\pnrpnbr0\pnrpnbr0\pnrpnbr0\pnrpnbr0\pnrpnbr0\pnrpnbr0\pnrpnbr0\pnrpnbr0\pnrpnbr0\pnrpnbr0\pnrpnbr0\pnrpnbr0\pnrpnbr0\pnrpnbr0\pnrpnbr0\pnrpnbr0\pnrpnbr0\pnrpnbr0\pnrpnbr0\pnrpnbr0\pnrpnbr0\pnrstop36
|
||||
\adjustright\rin0\lin720\itap0 {\lang2057\langfe1031\langnp2057 The URLSeen structure must scale to billions of URLs (i.e. constant memory usage) and must also be distributed
|
||||
\par {\listtext\pard\plain\lang2057\langfe1031\langnp2057 \hich\af0\dbch\af0\loch\f0 -\tab}Crawlers and indexers must be distributed
|
||||
\par {\listtext\pard\plain\lang2057\langfe1031\langnp2057 \hich\af0\dbch\af0\loch\f0 -\tab}Most of the data must be kept on disk
|
||||
\par {\listtext\pard\plain\lang2057\langfe1031\langnp2057 \hich\af0\dbch\af0\loch\f0 -\tab}The server must be able to save its state on disk and recover after failures
|
||||
\par {\listtext\pard\plain\lang2057\langfe1031\langnp2057 \hich\af0\dbch\af0\loch\f0 -\tab}Exchange of messages should take place in batch operation
|
||||
\par {\listtext\pard\plain\lang2057\langfe1031\langnp2057 \hich\af0\dbch\af0\loch\f0 -\tab}Special services, e.g. DNS resolvers, have to be installed to prevent bottlenecks
|
||||
\par }\pard \ql \li0\ri0\widctlpar\aspalpha\aspnum\faauto\adjustright\rin0\lin0\itap0 {\lang2057\langfe1031\langnp2057
|
||||
\par {\*\bkmkstart _Toc26538559}{\listtext\pard\plain\s2 \b\i\f1\fs28\lang1033\langfe1031\langnp1033 \hich\af1\dbch\af0\loch\f1 1.5\tab}}\pard\plain \s2\ql \fi-576\li576\ri0\sb240\sa60\keepn\widctlpar
|
||||
\jclisttab\tx576\aspalpha\aspnum\faauto\ls1\ilvl1\outlinelevel1\adjustright\rin0\lin576\itap0 \b\i\f1\fs28\lang1033\langfe1031\cgrid\langnp1033\langfenp1031 {Database Indexer{\*\bkmkend _Toc26538559}
|
||||
\par }\pard\plain \ql \li0\ri0\widctlpar\aspalpha\aspnum\faauto\adjustright\rin0\lin0\itap0 \fs24\lang1031\langfe1031\cgrid\langnp1031\langfenp1031 {\lang2057\langfe1031\langnp2057
|
||||
A database indexer may consist of a message source that is connected to the messaging mechanism of the database (e.g. triggers). It then reads the contents of changed database fields and passes them alon
|
||||
g to the indexer. That way Lucene may be integrated into HSSQL or even Oracle.
|
||||
\par Problem here: transaction safety may make it necessary to operate largely with disk-based structures (e.g. transaction logs).
|
||||
\par {\*\bkmkstart _Toc26538560}{\listtext\pard\plain\s2 \b\i\f1\fs28\lang1033\langfe1031\langnp1033 \hich\af1\dbch\af0\loch\f1 1.6\tab}}\pard\plain \s2\ql \fi-576\li576\ri0\sb240\sa60\keepn\widctlpar
|
||||
\jclisttab\tx576\aspalpha\aspnum\faauto\ls1\ilvl1\outlinelevel1\adjustright\rin0\lin576\itap0 \b\i\f1\fs28\lang1033\langfe1031\cgrid\langnp1033\langfenp1031 {Single Subsystem Test Scenario{\*\bkmkend _Toc26538560}
|
||||
\par }\pard\plain \ql \li0\ri0\widctlpar\aspalpha\aspnum\faauto\adjustright\rin0\lin0\itap0 \fs24\lang1031\langfe1031\cgrid\langnp1031\langfenp1031 {\lang2057\langfe1031\langnp2057 For testing purposes
|
||||
it may be viable to interface the system under test (SUT) with dummy or utility components. That way, if the document processor subsystem is to be tested, the LuceneStorage may be replaced by a LogStorage, which does nothing but log everything it receives to log files. This lo
|
||||
g storage may also be placed between different processing steps.
|
||||
\par
|
||||
\par }}
|
Some files were not shown because too many files have changed in this diff