348 Commits

Author SHA1 Message Date
Erik Hatcher
7a3103fac0 Applied patch for LUCENE-324, correcting token offsets returned by ChineseTokenizer
git-svn-id: https://svn.apache.org/repos/asf/lucene/java/trunk@353930 13f79535-47bb-0310-9956-ffa450edef68
2005-12-04 23:07:42 +00:00
Wolfgang Hoschek
ebe44ace90
git-svn-id: https://svn.apache.org/repos/asf/lucene/java/trunk@351896 13f79535-47bb-0310-9956-ffa450edef68
2005-12-03 05:44:16 +00:00
Wolfgang Hoschek
a155416b4d tentative add: Various fulltext analysis utilities avoiding redundant code in several classes.

git-svn-id: https://svn.apache.org/repos/asf/lucene/java/trunk@351895 13f79535-47bb-0310-9956-ffa450edef68
2005-12-03 05:42:59 +00:00
Wolfgang Hoschek
860733f32e indentation fixes
git-svn-id: https://svn.apache.org/repos/asf/lucene/java/trunk@351893 13f79535-47bb-0310-9956-ffa450edef68
2005-12-03 05:27:50 +00:00
Wolfgang Hoschek
f42d7a1e9b indentation fixes
git-svn-id: https://svn.apache.org/repos/asf/lucene/java/trunk@351892 13f79535-47bb-0310-9956-ffa450edef68
2005-12-03 05:26:16 +00:00
Wolfgang Hoschek
e28541354d some performance improvements
git-svn-id: https://svn.apache.org/repos/asf/lucene/java/trunk@351891 13f79535-47bb-0310-9956-ffa450edef68
2005-12-03 05:24:31 +00:00
Wolfgang Hoschek
efa4d10fa1 some performance improvements
git-svn-id: https://svn.apache.org/repos/asf/lucene/java/trunk@351890 13f79535-47bb-0310-9956-ffa450edef68
2005-12-03 05:22:08 +00:00
Wolfgang Hoschek
317f3f77e9 just a SVN test - please ignore
git-svn-id: https://svn.apache.org/repos/asf/lucene/java/trunk@351887 13f79535-47bb-0310-9956-ffa450edef68
2005-12-03 04:32:53 +00:00
Wolfgang Hoschek
ec49618824 just a SVN test - please ignore
git-svn-id: https://svn.apache.org/repos/asf/lucene/java/trunk@351886 13f79535-47bb-0310-9956-ffa450edef68
2005-12-03 04:31:52 +00:00
Mark Harwood
2da431d139 Added support for field-specific highlighting which respects the field names found in queries. Pass a field name to the QueryScorer in order to select only that field's query terms for highlighting. Updated JUnit tests too.
git-svn-id: https://svn.apache.org/repos/asf/lucene/java/trunk@351504 13f79535-47bb-0310-9956-ffa450edef68
2005-12-01 22:18:33 +00:00
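Field-specific highlighting amounts to filtering the query's terms by field before fragments are scored. A minimal sketch, assuming the terms arrive as (field, text) pairs and that a null field keeps the previous behavior of using every term; `FieldTerm` and `FieldTermSelector` are hypothetical and do not reproduce QueryScorer's constructors.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical (field, term text) pair extracted from a query. */
final class FieldTerm {
    final String field;
    final String text;

    FieldTerm(String field, String text) {
        this.field = field;
        this.text = text;
    }
}

final class FieldTermSelector {
    /**
     * Returns the term texts to highlight. A null field keeps every term
     * (field-agnostic behavior); otherwise only terms from that field are kept.
     */
    static List<String> termsFor(List<FieldTerm> queryTerms, String field) {
        List<String> selected = new ArrayList<>();
        for (FieldTerm ft : queryTerms) {
            if (field == null || field.equals(ft.field)) {
                selected.add(ft.text);
            }
        }
        return selected;
    }
}
```

For a query like `title:lucene body:search title:java`, selecting `"title"` highlights only "lucene" and "java", so a body fragment is not lit up by a term that only matched the title.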
Andreas Vajda
fa24e67d6d - changed build to use version 4.3.29 of the C Berkeley DB Java API
- updated copyright notice year ranges to include 2005

git-svn-id: https://svn.apache.org/repos/asf/lucene/java/trunk@350095 13f79535-47bb-0310-9956-ffa450edef68
2005-12-01 01:43:07 +00:00
Erik Hatcher
a4c714d9d5 no longer needed
git-svn-id: https://svn.apache.org/repos/asf/lucene/java/trunk@348059 13f79535-47bb-0310-9956-ffa450edef68
2005-11-22 01:40:32 +00:00
Daniel Naber
bfde3257dc moving the non-language-specific analyzers to core, where most users will probably expect them
git-svn-id: https://svn.apache.org/repos/asf/lucene/java/trunk@347991 13f79535-47bb-0310-9956-ffa450edef68
2005-11-21 21:35:24 +00:00
Daniel Naber
31c271c84b import cleanup to avoid Eclipse warnings
git-svn-id: https://svn.apache.org/repos/asf/lucene/java/trunk@344474 13f79535-47bb-0310-9956-ffa450edef68
2005-11-15 23:21:44 +00:00
Daniel Naber
4fd74d2554 Rename *Test files that are not unit tests so that "ant test" works. See LUCENE-465.
git-svn-id: https://svn.apache.org/repos/asf/lucene/java/trunk@344471 13f79535-47bb-0310-9956-ffa450edef68
2005-11-15 23:18:22 +00:00
Daniel Naber
7e079d2950 avoid compiler/Eclipse warnings
git-svn-id: https://svn.apache.org/repos/asf/lucene/java/trunk@344468 13f79535-47bb-0310-9956-ffa450edef68
2005-11-15 23:15:53 +00:00
Erik Hatcher
1687a79648 Add NullFragmenter
git-svn-id: https://svn.apache.org/repos/asf/lucene/java/trunk@332696 13f79535-47bb-0310-9956-ffa450edef68
2005-11-12 01:08:01 +00:00
Erik Hatcher
32fb624ebc LUCENE-437 - Add position increment pass through on SnowballFilter tokens
git-svn-id: https://svn.apache.org/repos/asf/lucene/java/trunk@290943 13f79535-47bb-0310-9956-ffa450edef68
2005-09-22 13:38:58 +00:00
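A position increment greater than 1 records a gap in the token stream, for example where a stop filter removed a word, and phrase and span queries depend on it. A plausible reading of LUCENE-437 is that the stemmed token must carry over the incoming token's increment instead of defaulting to 1; the sketch below shows that pass-through. `SimpleToken` and the toy `stem` rule are hypothetical stand-ins, not Lucene's Token class or the Snowball stemmer.

```java
/** Hypothetical minimal token: text, offsets and position increment (not Lucene's Token). */
final class SimpleToken {
    final String text;
    final int start;
    final int end;
    final int positionIncrement;

    SimpleToken(String text, int start, int end, int positionIncrement) {
        this.text = text;
        this.start = start;
        this.end = end;
        this.positionIncrement = positionIncrement;
    }
}

final class StemmingFilterSketch {
    /** Toy stemmer for illustration only: strips a plural "s" from words longer than three chars. */
    static String stem(String word) {
        return word.length() > 3 && word.endsWith("s") ? word.substring(0, word.length() - 1) : word;
    }

    /** Emits the stemmed text while passing offsets and the position increment through unchanged. */
    static SimpleToken next(SimpleToken input) {
        return new SimpleToken(stem(input.text), input.start, input.end, input.positionIncrement);
    }
}
```

If the new token were built with an increment of 1, a gap of 2 left by a removed stop word would silently disappear after stemming.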
Mark Harwood
c00b260ecf Added fix to QueryScorer: if a query has multiple WeightedTerms with different weights for the same term, the highest weight is now used for scoring that term (previously the last weight in the list was selected).
SimpleHTMLEncoder now encodes characters outside the ASCII range as character entities, as suggested here: http://issues.apache.org/bugzilla/show_bug.cgi?id=36333

git-svn-id: https://svn.apache.org/repos/asf/lucene/java/trunk@279088 13f79535-47bb-0310-9956-ffa450edef68
2005-09-06 20:19:50 +00:00
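Both changes in this commit are small enough to sketch in isolation. `maxWeights` keeps the highest weight per term instead of whichever came last, and `htmlEncode` escapes the four HTML-special characters and writes every code point above 127 as a decimal numeric reference such as `&#233;`. `TermWeight` and `HighlighterSketches` are hypothetical names; this is not the actual QueryScorer or SimpleHTMLEncoder source.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical (term, weight) pair; not Lucene's WeightedTerm class. */
final class TermWeight {
    final String term;
    final float weight;

    TermWeight(String term, float weight) {
        this.term = term;
        this.weight = weight;
    }
}

final class HighlighterSketches {
    /** Maps each term to the highest weight it was given, regardless of list order. */
    static Map<String, Float> maxWeights(List<TermWeight> terms) {
        Map<String, Float> byTerm = new HashMap<>();
        for (TermWeight tw : terms) {
            // merge() keeps the larger weight; a plain put() would keep the last one seen
            byTerm.merge(tw.term, tw.weight, (a, b) -> Math.max(a, b));
        }
        return byTerm;
    }

    /** Escapes HTML specials and writes each non-ASCII code point as a decimal character reference. */
    static String htmlEncode(String text) {
        StringBuilder sb = new StringBuilder(text.length());
        for (int i = 0; i < text.length(); ) {
            int cp = text.codePointAt(i);
            i += Character.charCount(cp); // step over a surrogate pair as one character
            switch (cp) {
                case '"': sb.append("&quot;"); break;
                case '&': sb.append("&amp;"); break;
                case '<': sb.append("&lt;"); break;
                case '>': sb.append("&gt;"); break;
                default:
                    if (cp < 128) {
                        sb.appendCodePoint(cp);
                    } else {
                        sb.append("&#").append(cp).append(';');
                    }
            }
        }
        return sb.toString();
    }
}
```

For example, `htmlEncode("café <b>")` returns `caf&#233; &lt;b&gt;`. Iterating by code point rather than by `char` means a character outside the Basic Multilingual Plane becomes one reference instead of two invalid surrogate references.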
Mark Harwood
f6b07dabe8 Changed TokenGroup.isDistinct after problems reported with JapaneseAnalyzer (no gaps between tokens)
git-svn-id: https://svn.apache.org/repos/asf/lucene/java/trunk@279078 13f79535-47bb-0310-9956-ffa450edef68
2005-09-06 19:38:12 +00:00
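TokenGroup collects tokens whose offsets overlap so they can be highlighted as one unit. The log only says isDistinct was changed for analyzers that leave no gaps between tokens; the sketch below assumes the intended rule is that a token starting exactly where the group ends counts as distinct, so the comparison is `>=` rather than `>`. `TokenGroupSketch` is hypothetical and uses plain int offsets instead of Lucene Tokens.

```java
/**
 * Hypothetical stand-in for a highlighter token group: it tracks the offset span
 * of the overlapping tokens collected so far (plain ints instead of Lucene Tokens).
 */
final class TokenGroupSketch {
    private int groupStart = 0;
    private int groupEnd = 0;
    private int size = 0;

    /**
     * A token is distinct from the group if it starts at or after the group's end.
     * Using >= means adjacent tokens (start == previous end), as produced by
     * analyzers that leave no gaps, are not mistaken for overlapping ones.
     */
    boolean isDistinct(int tokenStart) {
        return size == 0 || tokenStart >= groupEnd;
    }

    /** Adds a token and widens the group's span to cover it. */
    void add(int tokenStart, int tokenEnd) {
        if (size == 0) {
            groupStart = tokenStart;
            groupEnd = tokenEnd;
        } else {
            groupStart = Math.min(groupStart, tokenStart);
            groupEnd = Math.max(groupEnd, tokenEnd);
        }
        size++;
    }

    int start() { return groupStart; }

    int end() { return groupEnd; }
}
```

With a strict `>` the token starting at offset 2 after a group ending at 2 would be folded into that group, which is the kind of symptom an analyzer with no gaps between tokens would expose.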
Daniel Naber
dd5c74112f a query parser by Ronnie Kolehmainen that also sends PrefixQuerys etc. through the analyzer
git-svn-id: https://svn.apache.org/repos/asf/lucene/java/trunk@231523 13f79535-47bb-0310-9956-ffa450edef68
2005-08-11 21:28:58 +00:00
Erik Hatcher
6e9c0b6f45 remove unused file
git-svn-id: https://svn.apache.org/repos/asf/lucene/java/trunk@209184 13f79535-47bb-0310-9956-ffa450edef68
2005-07-05 02:29:34 +00:00
Erik Hatcher
9d70229506 #34331 - Add Paul Elschot's Surround query language parser
git-svn-id: https://svn.apache.org/repos/asf/lucene/java/trunk@209183 13f79535-47bb-0310-9956-ffa450edef68
2005-07-05 02:29:03 +00:00
Mark Harwood
0062898ada Updated version of MemoryIndex - reliant on new Term.createTerm() method in Trunk
git-svn-id: https://svn.apache.org/repos/asf/lucene/java/trunk@208688 13f79535-47bb-0310-9956-ffa450edef68
2005-06-30 21:40:05 +00:00
Mark Harwood
7894a0c0c0 Added (simple) SpanQuery support - matches any terms declared in Spans - proper impl should check for distances
git-svn-id: https://svn.apache.org/repos/asf/lucene/java/trunk@208673 13f79535-47bb-0310-9956-ffa450edef68
2005-06-30 20:09:58 +00:00
Daniel Naber
6da2ef197d update to Apache Software License 2.0
git-svn-id: https://svn.apache.org/repos/asf/lucene/java/trunk@189623 13f79535-47bb-0310-9956-ffa450edef68
2005-06-08 19:48:19 +00:00
Mark Harwood
07cee0b287
git-svn-id: https://svn.apache.org/repos/asf/lucene/java/trunk@179637 13f79535-47bb-0310-9956-ffa450edef68
2005-06-02 20:27:06 +00:00
Daniel Naber
a3f99b1f43 small javadoc improvements
git-svn-id: https://svn.apache.org/repos/asf/lucene/java/trunk@178893 13f79535-47bb-0310-9956-ffa450edef68
2005-05-28 22:58:17 +00:00
Daniel Naber
27597a5c71 small javadoc fixes
git-svn-id: https://svn.apache.org/repos/asf/lucene/java/trunk@178892 13f79535-47bb-0310-9956-ffa450edef68
2005-05-28 22:40:36 +00:00
Daniel Naber
fe52019614 javadoc fixes
git-svn-id: https://svn.apache.org/repos/asf/lucene/java/trunk@178880 13f79535-47bb-0310-9956-ffa450edef68
2005-05-28 19:21:49 +00:00
Daniel Naber
06bb3230ff make this non-public, as it's not documented properly and has a confusing name
git-svn-id: https://svn.apache.org/repos/asf/lucene/java/trunk@178878 13f79535-47bb-0310-9956-ffa450edef68
2005-05-28 19:05:54 +00:00
Daniel Naber
bd2345d856 small javadoc fixes
git-svn-id: https://svn.apache.org/repos/asf/lucene/java/trunk@178839 13f79535-47bb-0310-9956-ffa450edef68
2005-05-27 23:07:00 +00:00
Daniel Naber
952cfd54be small javadoc fixes
git-svn-id: https://svn.apache.org/repos/asf/lucene/java/trunk@178833 13f79535-47bb-0310-9956-ffa450edef68
2005-05-27 23:02:07 +00:00
Daniel Naber
816f370c0e small javadoc fixes
git-svn-id: https://svn.apache.org/repos/asf/lucene/java/trunk@178832 13f79535-47bb-0310-9956-ffa450edef68
2005-05-27 23:00:49 +00:00
Daniel Naber
9d2d4ead75 use entity for umlaut
git-svn-id: https://svn.apache.org/repos/asf/lucene/java/trunk@178239 13f79535-47bb-0310-9956-ffa450edef68
2005-05-24 18:44:20 +00:00
Daniel Naber
69b1f490df javadoc: fix typo and use HTML entity so generated HTML is correct
git-svn-id: https://svn.apache.org/repos/asf/lucene/java/trunk@169681 13f79535-47bb-0310-9956-ffa450edef68
2005-05-11 19:33:12 +00:00
David Spencer
1d68f8c88d Logic that ignored stop words was in an early version of this code, but it was taken out in the belief that there
was no point in explicitly looking for them, as the scoring algorithm would effectively ignore them.

I ran a test: I indexed 700 pages on a corporate web site and then ran the MoreLikeThis code on them,
and half of the docs had stop words identified as interesting.

So I added code to ignore stop words, but made it backward compatible so that by default this code
is not used.

git-svn-id: https://svn.apache.org/repos/asf/lucene/java/trunk@169512 13f79535-47bb-0310-9956-ffa450edef68
2005-05-10 19:29:56 +00:00
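The backward-compatible stop-word option described above can be sketched as a filter that is off by default: with no stop-word set configured, term counting behaves exactly as before. `InterestingTermCounter` and its methods are hypothetical names, not the MoreLikeThis API.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;

/** Hypothetical sketch of optional stop-word filtering when counting candidate terms. */
final class InterestingTermCounter {
    // null means "no stop words", which keeps the old behavior by default
    private Set<String> stopWords = null;

    void setStopWords(Set<String> stopWords) {
        this.stopWords = stopWords;
    }

    private boolean isStopWord(String term) {
        return stopWords != null && stopWords.contains(term);
    }

    /** Counts how often each term occurs, skipping configured stop words. */
    Map<String, Integer> termFrequencies(List<String> tokens) {
        Map<String, Integer> freq = new HashMap<>();
        for (String token : tokens) {
            if (isStopWord(token)) {
                continue;
            }
            freq.merge(token, 1, Integer::sum);
        }
        return freq;
    }
}
```

Callers that never set stop words see identical counts, which is what makes the change safe to ship with the feature disabled.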
David Spencer
81087e8bb6 Touchup javadoc.
Make retrieveInterestingTerms only return the top terms, not all terms.

git-svn-id: https://svn.apache.org/repos/asf/lucene/java/trunk@169511 13f79535-47bb-0310-9956-ffa450edef68
2005-05-10 19:10:28 +00:00
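Returning only the top terms is a bounded top-N selection. One common way to do it, assumed here rather than taken from the commit, is a min-heap capped at N entries, which costs O(m log N) for m candidate terms. `TopTerms.top` is a hypothetical helper, not the retrieveInterestingTerms implementation.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Map;
import java.util.PriorityQueue;

final class TopTerms {
    /** Returns up to maxTerms keys with the highest scores, best first. */
    static List<String> top(Map<String, Float> scores, int maxTerms) {
        // Min-heap ordered by score: its head is the weakest of the terms kept so far.
        PriorityQueue<Map.Entry<String, Float>> heap =
            new PriorityQueue<>((a, b) -> Float.compare(a.getValue(), b.getValue()));
        for (Map.Entry<String, Float> e : scores.entrySet()) {
            heap.offer(e);
            if (heap.size() > maxTerms) {
                heap.poll(); // evict the lowest score so at most maxTerms remain
            }
        }
        List<String> result = new ArrayList<>(heap.size());
        while (!heap.isEmpty()) {
            result.add(heap.poll().getKey());
        }
        Collections.reverse(result); // heap order is ascending; callers want highest first
        return result;
    }
}
```

Terms with equal scores come out in an unspecified order, so a caller that needs stable output should break ties explicitly, for example by term text.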
David Spencer
175cf8a9fd [1] Added comments to retrieveTerms() to document the return value.
[2] Added convenience routine retrieveInterestingTerms() which makes it easier to get at the "interesting words" in a document.

git-svn-id: https://svn.apache.org/repos/asf/lucene/java/trunk@169508 13f79535-47bb-0310-9956-ffa450edef68
2005-05-10 18:49:43 +00:00
David Spencer
c696188668 don't print out summary unless it's present
git-svn-id: https://svn.apache.org/repos/asf/lucene/java/trunk@169366 13f79535-47bb-0310-9956-ffa450edef68
2005-05-09 21:37:50 +00:00
David Spencer
7f8bf69311 clean up deprecation warnings so it compiles cleanly with the current Lucene code, lucene-core-1.9-rc1-dev.jar
git-svn-id: https://svn.apache.org/repos/asf/lucene/java/trunk@169365 13f79535-47bb-0310-9956-ffa450edef68
2005-05-09 21:36:22 +00:00
David Spencer
c680751f63 test checkin of README, just to verify my permissions
git-svn-id: https://svn.apache.org/repos/asf/lucene/java/trunk@169349 13f79535-47bb-0310-9956-ffa450edef68
2005-05-09 19:25:40 +00:00
Erik Hatcher
78dbe41805 prefix all JARs with lucene-
git-svn-id: https://svn.apache.org/repos/asf/lucene/java/trunk@168986 13f79535-47bb-0310-9956-ffa450edef68
2005-05-06 23:43:54 +00:00
Erik Hatcher
e8c90fb050 rename WordNet to wordnet, required intermediate move due to OS case insensitivity
git-svn-id: https://svn.apache.org/repos/asf/lucene/java/trunk@168480 13f79535-47bb-0310-9956-ffa450edef68
2005-05-06 00:32:00 +00:00
Erik Hatcher
5fd5169a6f temporary move to lowercase WordNet
git-svn-id: https://svn.apache.org/repos/asf/lucene/java/trunk@168479 13f79535-47bb-0310-9956-ffa450edef68
2005-05-06 00:31:11 +00:00
Erik Hatcher
dd472377dd adjust code to fix compile/javadoc errors on JDK 1.5
git-svn-id: https://svn.apache.org/repos/asf/lucene/java/trunk@168478 13f79535-47bb-0310-9956-ffa450edef68
2005-05-06 00:26:08 +00:00
Erik Hatcher
a12dac37b4 adjust project names for consistency
git-svn-id: https://svn.apache.org/repos/asf/lucene/java/trunk@168476 13f79535-47bb-0310-9956-ffa450edef68
2005-05-06 00:24:18 +00:00
Mark Harwood
12a91b4395 Fixed bug where docs larger than maxDocBytesToAnalyze caused the last fragment to be sized as the remainder of the doc (which could be huge).
git-svn-id: https://svn.apache.org/repos/asf/lucene/java/trunk@168452 13f79535-47bb-0310-9956-ffa450edef68
2005-05-05 22:40:45 +00:00
Erik Hatcher
8f70c09b9b Wolfgang is non-stop with the additions. Easy enough to paste in, so here it is with a Collection-based TokenStream
git-svn-id: https://svn.apache.org/repos/asf/lucene/java/trunk@168029 13f79535-47bb-0310-9956-ffa450edef68
2005-05-04 00:24:17 +00:00
Erik Hatcher
f94ebdb41e applied norm caching patch from Wolfgang
git-svn-id: https://svn.apache.org/repos/asf/lucene/java/trunk@167958 13f79535-47bb-0310-9956-ffa450edef68
2005-05-03 19:01:58 +00:00