Commit Graph

262 Commits

Author SHA1 Message Date
David Spencer 1d68f8c88d Logic to ignore stop words was in an early version of this code, but it was taken out in the belief that there
was no point in explicitly looking for them, as the scoring algorithm would effectively ignore them.

I did a test and indexed 700 pages on a corporate web site and then ran the MoreLikeThis code on them
and 1/2 of the docs had stop words identified as interesting.

So I added code to ignore stop words, but made it backward compatible so that by default this code
is not used.

git-svn-id: https://svn.apache.org/repos/asf/lucene/java/trunk@169512 13f79535-47bb-0310-9956-ffa450edef68
2005-05-10 19:29:56 +00:00
David Spencer 81087e8bb6 Touchup javadoc.
Make retrieveInterestingTerms only return the top terms, not all terms.

git-svn-id: https://svn.apache.org/repos/asf/lucene/java/trunk@169511 13f79535-47bb-0310-9956-ffa450edef68
2005-05-10 19:10:28 +00:00
David Spencer 175cf8a9fd [1] Added comments to retrieveTerms() to document the return value.
[2] Added convenience routine retrieveInterestingTerms() which makes it easier to get at the "interesting words" in a document.

git-svn-id: https://svn.apache.org/repos/asf/lucene/java/trunk@169508 13f79535-47bb-0310-9956-ffa450edef68
2005-05-10 18:49:43 +00:00
David Spencer c696188668 don't print out summary unless it's present
git-svn-id: https://svn.apache.org/repos/asf/lucene/java/trunk@169366 13f79535-47bb-0310-9956-ffa450edef68
2005-05-09 21:37:50 +00:00
David Spencer 7f8bf69311 cleanup deprecated warnings so it compiles cleanly w/ the current lucene code, lucene-core-1.9-rc1-dev.jar
git-svn-id: https://svn.apache.org/repos/asf/lucene/java/trunk@169365 13f79535-47bb-0310-9956-ffa450edef68
2005-05-09 21:36:22 +00:00
David Spencer c680751f63 test checkin of README, just to verify my permissions
git-svn-id: https://svn.apache.org/repos/asf/lucene/java/trunk@169349 13f79535-47bb-0310-9956-ffa450edef68
2005-05-09 19:25:40 +00:00
Erik Hatcher 78dbe41805 prefix all JARs with lucene-
git-svn-id: https://svn.apache.org/repos/asf/lucene/java/trunk@168986 13f79535-47bb-0310-9956-ffa450edef68
2005-05-06 23:43:54 +00:00
Erik Hatcher e8c90fb050 rename WordNet to wordnet, required intermediate move due to OS case insensitivity
git-svn-id: https://svn.apache.org/repos/asf/lucene/java/trunk@168480 13f79535-47bb-0310-9956-ffa450edef68
2005-05-06 00:32:00 +00:00
Erik Hatcher 5fd5169a6f temporary move to lowercase WordNet
git-svn-id: https://svn.apache.org/repos/asf/lucene/java/trunk@168479 13f79535-47bb-0310-9956-ffa450edef68
2005-05-06 00:31:11 +00:00
Erik Hatcher dd472377dd adjust code to fix compile/javadoc errors on JDK 1.5
git-svn-id: https://svn.apache.org/repos/asf/lucene/java/trunk@168478 13f79535-47bb-0310-9956-ffa450edef68
2005-05-06 00:26:08 +00:00
Erik Hatcher a12dac37b4 adjust project names for consistency
git-svn-id: https://svn.apache.org/repos/asf/lucene/java/trunk@168476 13f79535-47bb-0310-9956-ffa450edef68
2005-05-06 00:24:18 +00:00
Mark Harwood 12a91b4395 Fixed bug where docs larger than maxDocBytesToAnalyze would cause last fragment to be sized as remainder of doc (which could be huge).
git-svn-id: https://svn.apache.org/repos/asf/lucene/java/trunk@168452 13f79535-47bb-0310-9956-ffa450edef68
2005-05-05 22:40:45 +00:00
Erik Hatcher 8f70c09b9b Wolfgang is non-stop with the additions. Easy enough to paste in, so here it is with a Collection-based TokenStream
git-svn-id: https://svn.apache.org/repos/asf/lucene/java/trunk@168029 13f79535-47bb-0310-9956-ffa450edef68
2005-05-04 00:24:17 +00:00
Erik Hatcher f94ebdb41e applied norm caching patch from Wolfgang
git-svn-id: https://svn.apache.org/repos/asf/lucene/java/trunk@167958 13f79535-47bb-0310-9956-ffa450edef68
2005-05-03 19:01:58 +00:00
Erik Hatcher 2a37a3e820 Apply Wolfgang's fix to the tests
git-svn-id: https://svn.apache.org/repos/asf/lucene/java/trunk@167835 13f79535-47bb-0310-9956-ffa450edef68
2005-05-03 00:33:27 +00:00
Andreas Vajda 572633f8c4 - reworked store I/O to use new IndexInput and IndexOutput classes
- reworked store I/O to avoid upstream buffering giving better txn control
- added DbStoreTest unit test adapted from StoreTest

git-svn-id: https://svn.apache.org/repos/asf/lucene/java/trunk@165674 13f79535-47bb-0310-9956-ffa450edef68
2005-05-02 20:06:00 +00:00
Erik Hatcher 8f9e2a15e7 Enhancement #34585 - high-performance in-memory index contributed by Wolfgang Hoschek
git-svn-id: https://svn.apache.org/repos/asf/lucene/java/trunk@165606 13f79535-47bb-0310-9956-ffa450edef68
2005-05-02 09:04:07 +00:00
Erik Hatcher c3847f26ea overhaul of build system to facilitate building and packaging of contrib sub-projects. some work still to be done, but core Lucene build still working fine
git-svn-id: https://svn.apache.org/repos/asf/lucene/java/trunk@165566 13f79535-47bb-0310-9956-ffa450edef68
2005-05-02 00:11:11 +00:00
Erik Hatcher 21431112fe adjust license headers to be ASL 2.0
git-svn-id: https://svn.apache.org/repos/asf/lucene/java/trunk@165565 13f79535-47bb-0310-9956-ffa450edef68
2005-05-02 00:08:04 +00:00
Erik Hatcher df52ba1ec6 standardizing source layout
git-svn-id: https://svn.apache.org/repos/asf/lucene/java/trunk@165562 13f79535-47bb-0310-9956-ffa450edef68
2005-05-01 23:57:31 +00:00
Erik Hatcher f56d33e2d4 Add ASL header - sorry for the oversight on this.
git-svn-id: https://svn.apache.org/repos/asf/lucene/java/trunk@165559 13f79535-47bb-0310-9956-ffa450edef68
2005-05-01 22:57:39 +00:00
Andreas Vajda 77130721ce - replaced db.jar with db-4.3.27.jar
- downloading db-4.3.27.jar from http://downloads.osafoundation.org/db

git-svn-id: https://svn.apache.org/repos/asf/lucene/java/trunk@165319 13f79535-47bb-0310-9956-ffa450edef68
2005-04-29 17:33:27 +00:00
Erik Hatcher d9042b00d8 move PrecedenceQueryParser to contrib/misc until the kinks are worked out
git-svn-id: https://svn.apache.org/repos/asf/lucene/java/trunk@164964 13f79535-47bb-0310-9956-ffa450edef68
2005-04-27 09:32:33 +00:00
Erik Hatcher 7b8f43ec7c move misc over to official contrib area
git-svn-id: https://svn.apache.org/repos/asf/lucene/java/trunk@164963 13f79535-47bb-0310-9956-ffa450edef68
2005-04-27 09:16:31 +00:00
Erik Hatcher 5c9ccb2442 Add Lucene's test classes to contrib test classpath, some tests rely on the utility methods in the core tests
git-svn-id: https://svn.apache.org/repos/asf/lucene/java/trunk@164937 13f79535-47bb-0310-9956-ffa450edef68
2005-04-27 01:52:17 +00:00
Erik Hatcher 790dfc1490 javadoc fixup
git-svn-id: https://svn.apache.org/repos/asf/lucene/java/trunk@164742 13f79535-47bb-0310-9956-ffa450edef68
2005-04-26 04:41:54 +00:00
Erik Hatcher 26aab23901 add ignores
git-svn-id: https://svn.apache.org/repos/asf/lucene/java/trunk@164698 13f79535-47bb-0310-9956-ffa450edef68
2005-04-26 00:30:08 +00:00
Erik Hatcher d650384d4b add GreekAnalyzer, contributed by Panagiotis Astithas (past@ebs.gr)
git-svn-id: https://svn.apache.org/repos/asf/lucene/java/trunk@164686 13f79535-47bb-0310-9956-ffa450edef68
2005-04-25 23:23:37 +00:00
Erik Hatcher 2fe0a80189 rename misspelled indexDictionnary method
git-svn-id: https://svn.apache.org/repos/asf/lucene/java/trunk@160988 13f79535-47bb-0310-9956-ffa450edef68
2005-04-12 00:11:33 +00:00
Erik Hatcher ec522fc1c8 Fixed deprecation issues, adjusted test cases to use assertEquals better, reformatted style
git-svn-id: https://svn.apache.org/repos/asf/lucene/java/trunk@160987 13f79535-47bb-0310-9956-ffa450edef68
2005-04-11 23:48:02 +00:00
Erik Hatcher 0c99b57cc1 Fixed issue with ctor parameter being ignored
git-svn-id: https://svn.apache.org/repos/asf/lucene/java/trunk@160984 13f79535-47bb-0310-9956-ffa450edef68
2005-04-11 23:43:57 +00:00
Erik Hatcher e88213a2d9 refactor build to use common contrib build system
git-svn-id: https://svn.apache.org/repos/asf/lucene/java/trunk@160983 13f79535-47bb-0310-9956-ffa450edef68
2005-04-11 23:42:26 +00:00
Daniel Naber c4f1ee70a9 use lowercase method names; remove javadoc that's inherited anyway
git-svn-id: https://svn.apache.org/repos/asf/lucene/java/trunk@160070 13f79535-47bb-0310-9956-ffa450edef68
2005-04-04 17:50:38 +00:00
Daniel Naber 04ea892fbe import cleanup
git-svn-id: https://svn.apache.org/repos/asf/lucene/java/trunk@160065 13f79535-47bb-0310-9956-ffa450edef68
2005-04-04 17:45:36 +00:00
Erik Hatcher 6f5f23444c enhanced test contributed by Sven. Encoding tweaks
git-svn-id: https://svn.apache.org/repos/asf/lucene/java/trunk@160034 13f79535-47bb-0310-9956-ffa450edef68
2005-04-04 12:25:16 +00:00
Erik Hatcher 0ff227ff0a switch dotted u character to use unicode value reference
git-svn-id: https://svn.apache.org/repos/asf/lucene/java/trunk@160023 13f79535-47bb-0310-9956-ffa450edef68
2005-04-04 10:16:37 +00:00
Erik Hatcher 4e580e221e Issue deprecation warnings when building test cases. Fixed deprecation warnings on TestKeywordAnalyzer
git-svn-id: https://svn.apache.org/repos/asf/lucene/java/trunk@160012 13f79535-47bb-0310-9956-ffa450edef68
2005-04-04 09:10:59 +00:00
Erik Hatcher 3be3e8ab5d Add accent character normalizer filter contributed by Sven Duzont. Also created simple test case.
git-svn-id: https://svn.apache.org/repos/asf/lucene/java/trunk@160011 13f79535-47bb-0310-9956-ffa450edef68
2005-04-04 09:10:05 +00:00
Daniel Naber 69380a1815 adapt to use of jline
git-svn-id: https://svn.apache.org/repos/asf/lucene/java/trunk@158852 13f79535-47bb-0310-9956-ffa450edef68
2005-03-23 23:49:08 +00:00
Daniel Naber 84db65bfde adapt to use of jline
git-svn-id: https://svn.apache.org/repos/asf/lucene/java/trunk@158851 13f79535-47bb-0310-9956-ffa450edef68
2005-03-23 23:42:23 +00:00
Daniel Naber 5a59714f4a use jline instead of java-readline. jline can be added to SVN thanks to its BSD license. plus some small cleanup.
git-svn-id: https://svn.apache.org/repos/asf/lucene/java/trunk@158850 13f79535-47bb-0310-9956-ffa450edef68
2005-03-23 23:40:50 +00:00
Erik Hatcher b54f22aaab Fix max word length issue (though don't know why anyone would limit long words in a more-like-this query).
Also, modified to take into account all values of a field rather than just the first one.

git-svn-id: https://svn.apache.org/repos/asf/lucene/java/trunk@158076 13f79535-47bb-0310-9956-ffa450edef68
2005-03-18 15:03:00 +00:00
Erik Hatcher 1cb674fc04 regenerated from latest Snowball CVS
git-svn-id: https://svn.apache.org/repos/asf/lucene/java/trunk@157834 13f79535-47bb-0310-9956-ffa450edef68
2005-03-17 00:41:31 +00:00
Erik Hatcher 9621a0985c added title to documentation
git-svn-id: https://svn.apache.org/repos/asf/lucene/java/trunk@156593 13f79535-47bb-0310-9956-ffa450edef68
2005-03-09 01:59:14 +00:00
Erik Hatcher 9824226394 Contribution of slick Swing models to enable on-the-fly searching of
tables and lists.  Created by Jonathan Simon.

git-svn-id: https://svn.apache.org/repos/asf/lucene/java/trunk@156591 13f79535-47bb-0310-9956-ffa450edef68
2005-03-09 01:52:13 +00:00
Mark Harwood fdf05bd088 Fixed missing fieldname in API
git-svn-id: https://svn.apache.org/repos/asf/lucene/java/trunk@154447 13f79535-47bb-0310-9956-ffa450edef68
2005-02-19 19:51:04 +00:00
Daniel Naber 05d0335dcd offer additional methods that take analyzer + text instead of tokenstream; fix some unused imports and variables
git-svn-id: https://svn.apache.org/repos/asf/lucene/java/trunk@154444 13f79535-47bb-0310-9956-ffa450edef68
2005-02-19 19:08:52 +00:00
Daniel Naber 335c1567d8 remove empty "@return" tags so javadoc stops complaining; small whitespace cleanup
git-svn-id: https://svn.apache.org/repos/asf/lucene/java/trunk@154083 13f79535-47bb-0310-9956-ffa450edef68
2005-02-16 20:37:57 +00:00
Daniel Naber 45864d1c9c clean up imports, remove unused variables and remove the declaration of an Exception that was never thrown
git-svn-id: https://svn.apache.org/repos/asf/lucene/java/trunk@154080 13f79535-47bb-0310-9956-ffa450edef68
2005-02-16 20:20:15 +00:00
Erik Hatcher 28e712b2ee update docs to account for TLP migration
git-svn-id: https://svn.apache.org/repos/asf/lucene/java/trunk@153802 13f79535-47bb-0310-9956-ffa450edef68
2005-02-14 16:48:47 +00:00
Erik Hatcher 373e613341 remove unnecessary import
git-svn-id: https://svn.apache.org/repos/asf/lucene/java/trunk@153430 13f79535-47bb-0310-9956-ffa450edef68
2005-02-11 18:11:37 +00:00
Erik Hatcher 2ac412f6b7 move similarity and spellchecker to new contrib area
git-svn-id: https://svn.apache.org/repos/asf/lucene/java/trunk@153429 13f79535-47bb-0310-9956-ffa450edef68
2005-02-11 18:11:05 +00:00
Erik Hatcher f375d09898 add customizable buffer size
git-svn-id: https://svn.apache.org/repos/asf/lucene/java/trunk@153412 13f79535-47bb-0310-9956-ffa450edef68
2005-02-11 15:30:14 +00:00
Erik Hatcher cd0d0937e1 split keyword tokenizer out of KeywordAnalyzer
git-svn-id: https://svn.apache.org/repos/asf/lucene/java/trunk@153398 13f79535-47bb-0310-9956-ffa450edef68
2005-02-11 13:50:37 +00:00
Erik Hatcher 826fef7f6a KeywordAnalyzer contribution - adapted from _Lucene in Action_ code
git-svn-id: https://svn.apache.org/repos/asf/lucene/java/trunk@152921 13f79535-47bb-0310-9956-ffa450edef68
2005-02-08 19:13:05 +00:00
Mark Harwood 276ab079f5 Added Nicko Cadell's Encoder contribution
git-svn-id: https://svn.apache.org/repos/asf/lucene/java/trunk@151622 13f79535-47bb-0310-9956-ffa450edef68
2005-02-06 21:31:54 +00:00
Mark Harwood b1555b0bbf Test SVN Commit
git-svn-id: https://svn.apache.org/repos/asf/lucene/java/trunk@151615 13f79535-47bb-0310-9956-ffa450edef68
2005-02-06 18:12:57 +00:00
Erik Hatcher 0ee1728e6d move two more projects over to contrib
git-svn-id: https://svn.apache.org/repos/asf/lucene/java/trunk@151590 13f79535-47bb-0310-9956-ffa450edef68
2005-02-06 15:35:12 +00:00
Erik Hatcher 646f0f0434 Switch ant project to conventional src/java directory structure
git-svn-id: https://svn.apache.org/repos/asf/lucene/java/trunk@151589 13f79535-47bb-0310-9956-ffa450edef68
2005-02-06 14:51:59 +00:00
Erik Hatcher 767312d611 add convenient TODO file to keep track of sandbox -> contrib move
git-svn-id: https://svn.apache.org/repos/asf/lucene/java/trunk@151469 13f79535-47bb-0310-9956-ffa450edef68
2005-02-05 02:23:19 +00:00
Erik Hatcher 10904d02f6 fix most deprecation warnings
git-svn-id: https://svn.apache.org/repos/asf/lucene/java/trunk@151468 13f79535-47bb-0310-9956-ffa450edef68
2005-02-05 02:21:39 +00:00
Erik Hatcher 0955eef89f move parts of the sandbox over to contrib area
git-svn-id: https://svn.apache.org/repos/asf/lucene/java/trunk@151459 13f79535-47bb-0310-9956-ffa450edef68
2005-02-05 01:25:43 +00:00