
1623 Commits

Author SHA1 Message Date
David Spencer 1d68f8c88d Logic to ignore stop words was in an early version of this code, but it was taken out in the belief that there
was no point in explicitly looking for them, as the scoring algorithm would effectively ignore them.

I ran a test: I indexed 700 pages on a corporate web site and then ran the MoreLikeThis code on them,
and half of the docs had stop words identified as interesting.

So I added code to ignore stop words, but made it backward compatible so that by default this code
is not used.

git-svn-id: https://svn.apache.org/repos/asf/lucene/java/trunk@169512 13f79535-47bb-0310-9956-ffa450edef68
2005-05-10 19:29:56 +00:00
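The commit above makes stop-word filtering opt-in so that existing callers are unaffected. A minimal Java sketch of that idea follows. The class `InterestingTermsSketch`, its `setStopWords` and `topTerms` methods, and the precomputed term-to-score map are hypothetical illustrations, not the actual MoreLikeThis API; the sketch also caps the result at the top N terms, in the spirit of the later `retrieveInterestingTerms` change.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.Set;

/**
 * Hypothetical sketch (not the real MoreLikeThis code): pick the highest-scoring
 * "interesting" terms of a document, optionally skipping stop words.
 * Assumes term scores were already computed elsewhere (e.g. some tf-idf variant).
 */
class InterestingTermsSketch {

    // Null by default, so callers that never set it keep the old, unfiltered behaviour.
    private Set<String> stopWords = null;

    public void setStopWords(Set<String> stopWords) {
        this.stopWords = stopWords;
    }

    /** Returns at most maxTerms terms, highest score first, skipping stop words if a set was given. */
    public List<String> topTerms(Map<String, Float> termScores, int maxTerms) {
        List<Map.Entry<String, Float>> candidates = new ArrayList<>();
        for (Map.Entry<String, Float> entry : termScores.entrySet()) {
            if (stopWords != null && stopWords.contains(entry.getKey())) {
                continue; // drop stop words explicitly instead of trusting their score to be low
            }
            candidates.add(entry);
        }
        // Highest score first; ties broken alphabetically so the order is deterministic.
        candidates.sort((a, b) -> {
            int cmp = Float.compare(b.getValue(), a.getValue());
            return cmp != 0 ? cmp : a.getKey().compareTo(b.getKey());
        });
        List<String> result = new ArrayList<>();
        for (int i = 0; i < candidates.size() && i < maxTerms; i++) {
            result.add(candidates.get(i).getKey());
        }
        return result;
    }
}
```

Keeping `stopWords` null by default mirrors the backward-compatible choice described in the commit: callers that never set a stop-word set get the same ranking as before, while callers that do can keep a high-scoring stop word such as "the" from surfacing as an interesting term.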
David Spencer 81087e8bb6 Touchup javadoc.
Make retrieveInterestingTerms only return the top terms, not all terms.

git-svn-id: https://svn.apache.org/repos/asf/lucene/java/trunk@169511 13f79535-47bb-0310-9956-ffa450edef68
2005-05-10 19:10:28 +00:00
David Spencer 175cf8a9fd [1] Added comments to retrieveTerms() to document the return value.
[2] Added convenience routine retrieveInterestingTerms() which makes it easier to get at the "interesting words" in a document.

git-svn-id: https://svn.apache.org/repos/asf/lucene/java/trunk@169508 13f79535-47bb-0310-9956-ffa450edef68
2005-05-10 18:49:43 +00:00
Erik Hatcher a79c508580 #34816 - adjust for contrib/WordNet renaming
git-svn-id: https://svn.apache.org/repos/asf/lucene/java/trunk@169391 13f79535-47bb-0310-9956-ffa450edef68
2005-05-10 01:19:03 +00:00
David Spencer c696188668 don't print out summary unless it's present
git-svn-id: https://svn.apache.org/repos/asf/lucene/java/trunk@169366 13f79535-47bb-0310-9956-ffa450edef68
2005-05-09 21:37:50 +00:00
David Spencer 7f8bf69311 clean up deprecation warnings so it compiles cleanly w/ the current lucene code, lucene-core-1.9-rc1-dev.jar
git-svn-id: https://svn.apache.org/repos/asf/lucene/java/trunk@169365 13f79535-47bb-0310-9956-ffa450edef68
2005-05-09 21:36:22 +00:00
David Spencer c680751f63 test checkin of README, just to verify my permissions
git-svn-id: https://svn.apache.org/repos/asf/lucene/java/trunk@169349 13f79535-47bb-0310-9956-ffa450edef68
2005-05-09 19:25:40 +00:00
Daniel Naber 129227dce1 throw a more helpful exception if supposed directory is a file
git-svn-id: https://svn.apache.org/repos/asf/lucene/java/trunk@169136 13f79535-47bb-0310-9956-ffa450edef68
2005-05-08 14:51:29 +00:00
Erik Hatcher 78dbe41805 prefix all JARs with lucene-
git-svn-id: https://svn.apache.org/repos/asf/lucene/java/trunk@168986 13f79535-47bb-0310-9956-ffa450edef68
2005-05-06 23:43:54 +00:00
Daniel Naber 9f78244f9e convenience constructors that load list of stop words from a file
git-svn-id: https://svn.apache.org/repos/asf/lucene/java/trunk@168970 13f79535-47bb-0310-9956-ffa450edef68
2005-05-06 22:28:52 +00:00
Daniel Naber c3f90ad76e use non-deprecated API
git-svn-id: https://svn.apache.org/repos/asf/lucene/java/trunk@168642 13f79535-47bb-0310-9956-ffa450edef68
2005-05-06 19:32:54 +00:00
Daniel Naber 529214394c remove useless parameter
git-svn-id: https://svn.apache.org/repos/asf/lucene/java/trunk@168640 13f79535-47bb-0310-9956-ffa450edef68
2005-05-06 19:29:40 +00:00
Erik Hatcher e8c90fb050 rename WordNet to wordnet, required intermediate move due to OS case insensitivity
git-svn-id: https://svn.apache.org/repos/asf/lucene/java/trunk@168480 13f79535-47bb-0310-9956-ffa450edef68
2005-05-06 00:32:00 +00:00
Erik Hatcher 5fd5169a6f temporary move to lowercase WordNet
git-svn-id: https://svn.apache.org/repos/asf/lucene/java/trunk@168479 13f79535-47bb-0310-9956-ffa450edef68
2005-05-06 00:31:11 +00:00
Erik Hatcher dd472377dd adjust code to fix compile/javadoc errors on JDK 1.5
git-svn-id: https://svn.apache.org/repos/asf/lucene/java/trunk@168478 13f79535-47bb-0310-9956-ffa450edef68
2005-05-06 00:26:08 +00:00
Erik Hatcher a12dac37b4 adjust project names for consistency
git-svn-id: https://svn.apache.org/repos/asf/lucene/java/trunk@168476 13f79535-47bb-0310-9956-ffa450edef68
2005-05-06 00:24:18 +00:00
Daniel Naber 170bdc33a3 call static methods via class, not via object (avoids warning in Eclipse)
git-svn-id: https://svn.apache.org/repos/asf/lucene/java/trunk@168454 13f79535-47bb-0310-9956-ffa450edef68
2005-05-05 22:46:09 +00:00
Daniel Naber ffbdf0b882 test using a non-existing field as first sort key
git-svn-id: https://svn.apache.org/repos/asf/lucene/java/trunk@168453 13f79535-47bb-0310-9956-ffa450edef68
2005-05-05 22:41:44 +00:00
Mark Harwood 12a91b4395 Fixed bug where docs larger than maxDocBytesToAnalyze would cause last fragment to be sized as remainder of doc (which could be huge).
git-svn-id: https://svn.apache.org/repos/asf/lucene/java/trunk@168452 13f79535-47bb-0310-9956-ffa450edef68
2005-05-05 22:40:45 +00:00
Daniel Naber a20246c68c don't declare Exceptions that are never thrown; remove an unused variable
git-svn-id: https://svn.apache.org/repos/asf/lucene/java/trunk@168450 13f79535-47bb-0310-9956-ffa450edef68
2005-05-05 22:37:09 +00:00
Daniel Naber c97ba92ebd refactoring so that filename extensions are in one place
git-svn-id: https://svn.apache.org/repos/asf/lucene/java/trunk@168449 13f79535-47bb-0310-9956-ffa450edef68
2005-05-05 22:20:49 +00:00
Daniel Naber 0209ce959b don't print to stdout in test cases
git-svn-id: https://svn.apache.org/repos/asf/lucene/java/trunk@168338 13f79535-47bb-0310-9956-ffa450edef68
2005-05-05 15:22:03 +00:00
Daniel Naber 30fe087036 update build instructions and version numbers
git-svn-id: https://svn.apache.org/repos/asf/lucene/java/trunk@168332 13f79535-47bb-0310-9956-ffa450edef68
2005-05-05 13:38:34 +00:00
Daniel Naber 4b00637662 only delete our own files when re-creating an index (#34695)
git-svn-id: https://svn.apache.org/repos/asf/lucene/java/trunk@168213 13f79535-47bb-0310-9956-ffa450edef68
2005-05-04 23:34:52 +00:00
Daniel Naber 77f94fb60c mention the new Java 1.4 requirement that we already agreed on in January
git-svn-id: https://svn.apache.org/repos/asf/lucene/java/trunk@168212 13f79535-47bb-0310-9956-ffa450edef68
2005-05-04 23:26:00 +00:00
Daniel Naber 0e9579345a fixing typos; WordNet url update
git-svn-id: https://svn.apache.org/repos/asf/lucene/java/trunk@168208 13f79535-47bb-0310-9956-ffa450edef68
2005-05-04 23:10:37 +00:00
Erik Hatcher 8f70c09b9b Wolfgang is non-stop with the additions. Easy enough to paste in, so here it is with a Collection-based TokenStream
git-svn-id: https://svn.apache.org/repos/asf/lucene/java/trunk@168029 13f79535-47bb-0310-9956-ffa450edef68
2005-05-04 00:24:17 +00:00
Erik Hatcher f94ebdb41e applied norm caching patch from Wolfgang
git-svn-id: https://svn.apache.org/repos/asf/lucene/java/trunk@167958 13f79535-47bb-0310-9956-ffa450edef68
2005-05-03 19:01:58 +00:00
Erik Hatcher 2a37a3e820 Apply Wolfgang's fix to the tests
git-svn-id: https://svn.apache.org/repos/asf/lucene/java/trunk@167835 13f79535-47bb-0310-9956-ffa450edef68
2005-05-03 00:33:27 +00:00
Andreas Vajda 572633f8c4 - reworked store I/O to use new IndexInput and IndexOutput classes
- reworked store I/O to avoid upstream buffering giving better txn control
- added DbStoreTest unit test adapted from StoreTest

git-svn-id: https://svn.apache.org/repos/asf/lucene/java/trunk@165674 13f79535-47bb-0310-9956-ffa450edef68
2005-05-02 20:06:00 +00:00
Daniel Naber 4b1834eeee sorry, typo in image URL
git-svn-id: https://svn.apache.org/repos/asf/lucene/java/trunk@165660 13f79535-47bb-0310-9956-ffa450edef68
2005-05-02 18:50:45 +00:00
Daniel Naber 4b2d7f3fe0 use non-relative URL for image to make it work in sub directories; remove non-existing stuff from sandbox page
git-svn-id: https://svn.apache.org/repos/asf/lucene/java/trunk@165659 13f79535-47bb-0310-9956-ffa450edef68
2005-05-02 18:47:19 +00:00
Daniel Naber cfb14e1be8 improve text of exception
git-svn-id: https://svn.apache.org/repos/asf/lucene/java/trunk@165658 13f79535-47bb-0310-9956-ffa450edef68
2005-05-02 18:43:48 +00:00
Erik Hatcher b01de31134 Add contrib/memory to javadocs, and add imported build files into src distribution
git-svn-id: https://svn.apache.org/repos/asf/lucene/java/trunk@165616 13f79535-47bb-0310-9956-ffa450edef68
2005-05-02 10:00:44 +00:00
Erik Hatcher 9464b37949 remove ignores since artifacts now are built into main directory rather than here
git-svn-id: https://svn.apache.org/repos/asf/lucene/java/trunk@165615 13f79535-47bb-0310-9956-ffa450edef68
2005-05-02 09:53:43 +00:00
Erik Hatcher 8f9e2a15e7 Enhancement #34585 - high-performance in-memory index contributed by Wolfgang Hoschek
git-svn-id: https://svn.apache.org/repos/asf/lucene/java/trunk@165606 13f79535-47bb-0310-9956-ffa450edef68
2005-05-02 09:04:07 +00:00
Erik Hatcher bc49f328c6 aggregate duplicated distribution patterns into reusable patternsets
git-svn-id: https://svn.apache.org/repos/asf/lucene/java/trunk@165571 13f79535-47bb-0310-9956-ffa450edef68
2005-05-02 01:06:14 +00:00
Erik Hatcher fe95807ca8 belated checkin - moved deprecated build/test targets to separate easily removable import build file
git-svn-id: https://svn.apache.org/repos/asf/lucene/java/trunk@165569 13f79535-47bb-0310-9956-ffa450edef68
2005-05-02 00:53:41 +00:00
Erik Hatcher eb50b47c8b add contrib pieces to distribution files
git-svn-id: https://svn.apache.org/repos/asf/lucene/java/trunk@165568 13f79535-47bb-0310-9956-ffa450edef68
2005-05-02 00:51:18 +00:00
Erik Hatcher c3847f26ea overhaul of build system to facilitate building and packaging of contrib sub-projects. some work still to be done, but core Lucene build still working fine
git-svn-id: https://svn.apache.org/repos/asf/lucene/java/trunk@165566 13f79535-47bb-0310-9956-ffa450edef68
2005-05-02 00:11:11 +00:00
Erik Hatcher 21431112fe adjust license headers to be ASL 2.0
git-svn-id: https://svn.apache.org/repos/asf/lucene/java/trunk@165565 13f79535-47bb-0310-9956-ffa450edef68
2005-05-02 00:08:04 +00:00
Erik Hatcher df52ba1ec6 standardizing source layout
git-svn-id: https://svn.apache.org/repos/asf/lucene/java/trunk@165562 13f79535-47bb-0310-9956-ffa450edef68
2005-05-01 23:57:31 +00:00
Erik Hatcher f56d33e2d4 Add ASL header - sorry for the oversight on this.
git-svn-id: https://svn.apache.org/repos/asf/lucene/java/trunk@165559 13f79535-47bb-0310-9956-ffa450edef68
2005-05-01 22:57:39 +00:00
Daniel Naber b8dfd507eb whitespace cleanup only (no more tabs/spaces mix)
git-svn-id: https://svn.apache.org/repos/asf/lucene/java/trunk@165552 13f79535-47bb-0310-9956-ffa450edef68
2005-05-01 22:04:24 +00:00
Daniel Naber e8fd6b347c remove non-existing projects and fix a link
git-svn-id: https://svn.apache.org/repos/asf/lucene/java/trunk@165509 13f79535-47bb-0310-9956-ffa450edef68
2005-05-01 14:24:51 +00:00
Daniel Naber d087df635f move resource page to the wiki to avoid content duplication
git-svn-id: https://svn.apache.org/repos/asf/lucene/java/trunk@165508 13f79535-47bb-0310-9956-ffa450edef68
2005-05-01 14:07:36 +00:00
Daniel Naber db8246f137 forgot to commit these files
git-svn-id: https://svn.apache.org/repos/asf/lucene/java/trunk@165500 13f79535-47bb-0310-9956-ffa450edef68
2005-05-01 13:09:36 +00:00
Daniel Naber 9c3bd9ca86 import cleanup
git-svn-id: https://svn.apache.org/repos/asf/lucene/java/trunk@165484 13f79535-47bb-0310-9956-ffa450edef68
2005-05-01 11:59:02 +00:00
Erik Hatcher acf2b4c60c Remove outdated sandbox code
git-svn-id: https://svn.apache.org/repos/asf/lucene/java/trunk@165365 13f79535-47bb-0310-9956-ffa450edef68
2005-04-30 00:07:27 +00:00
Daniel Naber f848854278 fixing property
git-svn-id: https://svn.apache.org/repos/asf/lucene/java/trunk@165355 13f79535-47bb-0310-9956-ffa450edef68
2005-04-29 22:58:33 +00:00