203 Commits

Author SHA1 Message Date
Uwe Schindler
5abaff61fa LUCENE-2266: Fixed offset calculations in NGramTokenFilter and EdgeNGramTokenFilter
git-svn-id: https://svn.apache.org/repos/asf/lucene/java/trunk@910078 13f79535-47bb-0310-9956-ffa450edef68
2010-02-14 21:33:12 +00:00
Robert Muir
a6b7c5552b LUCENE-2055: better snowball integration, deprecate buggy handcoded snowball impls, restructure lang support
git-svn-id: https://svn.apache.org/repos/asf/lucene/java/trunk@907125 13f79535-47bb-0310-9956-ffa450edef68
2010-02-05 23:05:46 +00:00
Robert Muir
23d403b6bb LUCENE-2234: Hindi Analyzer
git-svn-id: https://svn.apache.org/repos/asf/lucene/java/trunk@906468 13f79535-47bb-0310-9956-ffa450edef68
2010-02-04 12:41:56 +00:00
Simon Willnauer
5ad7974d3f LUCENE-2242: Contrib CharTokenizer classes should be instantiated using their new Version based ctors
git-svn-id: https://svn.apache.org/repos/asf/lucene/java/trunk@905065 13f79535-47bb-0310-9956-ffa450edef68
2010-01-31 16:01:17 +00:00
Robert Muir
39b9f97cd4 LUCENE-2209: add @lucene.internal/@lucene.experimental javadoc tags
git-svn-id: https://svn.apache.org/repos/asf/lucene/java/trunk@905057 13f79535-47bb-0310-9956-ffa450edef68
2010-01-31 15:20:26 +00:00
Robert Muir
fdf4ea2448 LUCENE-2218: Improvements to ShingleFilter (performance, configurable sep. char and min shingle size)
git-svn-id: https://svn.apache.org/repos/asf/lucene/java/trunk@905043 13f79535-47bb-0310-9956-ffa450edef68
2010-01-31 14:04:01 +00:00
Simon Willnauer
537bb742cd LUCENE-2238: deprecated ChineseAnalyzer / ChineseTokenizer in favor of StandardAnalyzer / Tokenizer, which do the same thing
git-svn-id: https://svn.apache.org/repos/asf/lucene/java/trunk@904521 13f79535-47bb-0310-9956-ffa450edef68
2010-01-29 15:44:54 +00:00
Uwe Schindler
49b3a12971 LUCENE-2183: Added Unicode 4 support to CharTokenizer and its subclasses
git-svn-id: https://svn.apache.org/repos/asf/lucene/java/trunk@904401 13f79535-47bb-0310-9956-ffa450edef68
2010-01-29 07:43:45 +00:00
Uwe Schindler
e9a979f1eb LUCENE-2198: Support protected words in stemming TokenFilters using a new KeywordAttribute
git-svn-id: https://svn.apache.org/repos/asf/lucene/java/trunk@903608 13f79535-47bb-0310-9956-ffa450edef68
2010-01-27 11:19:05 +00:00
Michael McCandless
9b3b890f45 LUCENE-2213: rename ArrayUtil.getNextSize -> oversize; tweak how it picks the next size
git-svn-id: https://svn.apache.org/repos/asf/lucene/java/trunk@901662 13f79535-47bb-0310-9956-ffa450edef68
2010-01-21 11:54:50 +00:00
Robert Muir
ba2b0851b8 LUCENE-2226: move contrib/snowball to contrib/analyzers
git-svn-id: https://svn.apache.org/repos/asf/lucene/java/trunk@901505 13f79535-47bb-0310-9956-ffa450edef68
2010-01-21 02:45:09 +00:00
Robert Muir
78e45c92a7 LUCENE-2207: CJKTokenizer generates tokens with incorrect offsets
LUCENE-2219: Chinese, SmartChinese, Wikipedia tokenizers generate incorrect offsets, test end() in BaseTokenStreamTestCase
git-svn-id: https://svn.apache.org/repos/asf/lucene/java/trunk@900196 13f79535-47bb-0310-9956-ffa450edef68
2010-01-17 19:25:57 +00:00
Michael McCandless
9caaad0fea remove $ tags
git-svn-id: https://svn.apache.org/repos/asf/lucene/java/trunk@899914 13f79535-47bb-0310-9956-ffa450edef68
2010-01-16 10:31:33 +00:00
Uwe Schindler
3f722b66a5 LUCENE-2211: Fix various missing clearAttributes() and improve BaseTokenStreamTestCase to check for this trap
git-svn-id: https://svn.apache.org/repos/asf/lucene/java/trunk@899627 13f79535-47bb-0310-9956-ffa450edef68
2010-01-15 13:42:18 +00:00
Robert Muir
0ad2f181aa remaining eol-style and inconsistent newlines fixes
git-svn-id: https://svn.apache.org/repos/asf/lucene/java/trunk@897712 13f79535-47bb-0310-9956-ffa450edef68
2010-01-10 21:53:13 +00:00
Robert Muir
7d5844740e LUCENE-2200: final classes had non-overriding protected members
git-svn-id: https://svn.apache.org/repos/asf/lucene/java/trunk@897707 13f79535-47bb-0310-9956-ffa450edef68
2010-01-10 21:09:58 +00:00
Simon Willnauer
673e368bf7 LUCENE-2199: ShingleFilter skipped over tri-gram shingles if outputUnigram was set to false
git-svn-id: https://svn.apache.org/repos/asf/lucene/java/trunk@897672 13f79535-47bb-0310-9956-ffa450edef68
2010-01-10 18:06:19 +00:00
Robert Muir
36ba04585e fix javadoc header
git-svn-id: https://svn.apache.org/repos/asf/lucene/java/trunk@896112 13f79535-47bb-0310-9956-ffa450edef68
2010-01-05 16:16:01 +00:00
Robert Muir
d22b7a98cd LUCENE-2185: add @Deprecated annotations
git-svn-id: https://svn.apache.org/repos/asf/lucene/java/trunk@895342 13f79535-47bb-0310-9956-ffa450edef68
2010-01-03 10:31:42 +00:00
Robert Muir
a949836869 LUCENE-2034: Refactor analyzer reuse and stopword handling
git-svn-id: https://svn.apache.org/repos/asf/lucene/java/trunk@895339 13f79535-47bb-0310-9956-ffa450edef68
2010-01-03 08:48:17 +00:00
Robert Muir
16eaa6198f LUCENE-1786: improve performance of TestCompoundWordTokenFilter
git-svn-id: https://svn.apache.org/repos/asf/lucene/java/trunk@892355 13f79535-47bb-0310-9956-ffa450edef68
2009-12-18 19:25:24 +00:00
Uwe Schindler
dad7e60253 LUCENE-2157: DelimitedPayloadTokenFilter no longer copies the buffer over itself; instead it sets the length to the offset of the delimiter. Also optimizes logic and IdentityEncoder to use NIO.
git-svn-id: https://svn.apache.org/repos/asf/lucene/java/trunk@890791 13f79535-47bb-0310-9956-ffa450edef68
2009-12-15 13:27:27 +00:00
Simon Willnauer
6c0c318218 LUCENE-2100: Marked all contrib Analyzer subclasses as final. Analyzers should only act as a composition of TokenStreams; users should compose their own analyzers instead of subclassing existing ones.
git-svn-id: https://svn.apache.org/repos/asf/lucene/java/trunk@888799 13f79535-47bb-0310-9956-ffa450edef68
2009-12-09 13:32:32 +00:00
Simon Willnauer
9ee4ce0fd5 LUCENE-2102: Add Turkish LowerCaseFilter which handles Turkish and Azeri unique casing behavior correctly.
git-svn-id: https://svn.apache.org/repos/asf/lucene/java/trunk@887535 13f79535-47bb-0310-9956-ffa450edef68
2009-12-05 12:46:05 +00:00
Simon Willnauer
a0bf23d762 fixed javadoc warnings due to missing closing braces
git-svn-id: https://svn.apache.org/repos/asf/lucene/java/trunk@887122 13f79535-47bb-0310-9956-ffa450edef68
2009-12-04 09:10:21 +00:00
Robert Muir
892bc7f55a LUCENE-2062: Bulgarian Analyzer
git-svn-id: https://svn.apache.org/repos/asf/lucene/java/trunk@886190 13f79535-47bb-0310-9956-ffa450edef68
2009-12-02 16:08:56 +00:00
Uwe Schindler
9edfb3b66a LUCENE-2094: Prepare CharArraySet for Unicode 4.0
git-svn-id: https://svn.apache.org/repos/asf/lucene/java/trunk@885592 13f79535-47bb-0310-9956-ffa450edef68
2009-11-30 21:49:21 +00:00
Uwe Schindler
09fd7abd7a fix javadoc
git-svn-id: https://svn.apache.org/repos/asf/lucene/java/trunk@885571 13f79535-47bb-0310-9956-ffa450edef68
2009-11-30 19:55:57 +00:00
Robert Muir
2ef402eefa LUCENE-2067: Add a stemmer for Czech
git-svn-id: https://svn.apache.org/repos/asf/lucene/java/trunk@885216 13f79535-47bb-0310-9956-ffa450edef68
2009-11-29 11:59:38 +00:00
Robert Muir
f0e064eb41 LUCENE-2069: supplementary char support for lowercasefilter
git-svn-id: https://svn.apache.org/repos/asf/lucene/java/trunk@885024 13f79535-47bb-0310-9956-ffa450edef68
2009-11-27 21:34:11 +00:00
Simon Willnauer
e69141c51a LUCENE-2068: Fixed ReverseStringFilter for Unicode 4.0. Reverse Supplementary Characters correctly.
git-svn-id: https://svn.apache.org/repos/asf/lucene/java/trunk@883149 13f79535-47bb-0310-9956-ffa450edef68
2009-11-22 21:09:42 +00:00
Simon Willnauer
ba4769d418 Fixed JavaDoc - spelling issues in @param
git-svn-id: https://svn.apache.org/repos/asf/lucene/java/trunk@880727 13f79535-47bb-0310-9956-ffa450edef68
2009-11-16 12:33:10 +00:00
Uwe Schindler
00f07ee460 LUCENE-2051: Contrib Analyzer setters should be deprecated and replaced with ctor arguments, thanks to Simon Willnauer
git-svn-id: https://svn.apache.org/repos/asf/lucene/java/trunk@880715 13f79535-47bb-0310-9956-ffa450edef68
2009-11-16 11:48:37 +00:00
Uwe Schindler
7370094ead Fix some javadocs errors in contrib
git-svn-id: https://svn.apache.org/repos/asf/lucene/java/trunk@880706 13f79535-47bb-0310-9956-ffa450edef68
2009-11-16 11:08:25 +00:00
Uwe Schindler
945e7eda52 LUCENE-2052: add varargs where possible
git-svn-id: https://svn.apache.org/repos/asf/lucene/java/trunk@836248 13f79535-47bb-0310-9956-ffa450edef68
2009-11-14 19:26:49 +00:00
Uwe Schindler
5b83cc59b2 LUCENE-1257: Generics: *heavy* patch by Robert Muir and me
git-svn-id: https://svn.apache.org/repos/asf/lucene/java/trunk@834847 13f79535-47bb-0310-9956-ffa450edef68
2009-11-11 12:18:34 +00:00
Robert Muir
786eb6ce0d LUCENE-2012: add remaining @overrides (contrib,demo)
git-svn-id: https://svn.apache.org/repos/asf/lucene/java/trunk@833867 13f79535-47bb-0310-9956-ffa450edef68
2009-11-08 12:45:12 +00:00
Robert Muir
80e8bfbbc9 LUCENE-2031: Move patternanalyzer from memory contrib into analyzers
git-svn-id: https://svn.apache.org/repos/asf/lucene/java/trunk@832889 13f79535-47bb-0310-9956-ffa450edef68
2009-11-04 22:37:01 +00:00
Simon Willnauer
a5da31ef90 Trivial fix of ignored return value of reader.read(). Done during hackathon - review by uschindler
git-svn-id: https://svn.apache.org/repos/asf/lucene/java/trunk@832554 13f79535-47bb-0310-9956-ffa450edef68
2009-11-03 20:58:12 +00:00
Simon Willnauer
e84f86d497 Trivial fix - changed new Character('_') into Character.valueOf('_')
git-svn-id: https://svn.apache.org/repos/asf/lucene/java/trunk@832549 13f79535-47bb-0310-9956-ffa450edef68
2009-11-03 20:49:57 +00:00
Robert Muir
9b0c42a9c1 fix confusing smartcn javadoc bug
git-svn-id: https://svn.apache.org/repos/asf/lucene/java/trunk@831913 13f79535-47bb-0310-9956-ffa450edef68
2009-11-02 15:08:42 +00:00
Robert Muir
066eac49a4 LUCENE-2022: remove deprecated api from contrib/analysis and wikipedia
git-svn-id: https://svn.apache.org/repos/asf/lucene/java/trunk@831425 13f79535-47bb-0310-9956-ffa450edef68
2009-10-30 19:04:30 +00:00
Robert Muir
cc374d7efc set RussianLowerCaseFilter deprecation to 4.0
git-svn-id: https://svn.apache.org/repos/asf/lucene/java/trunk@831391 13f79535-47bb-0310-9956-ffa450edef68
2009-10-30 17:11:10 +00:00
Michael McCandless
13593aa802 LUCENE-2002: restore RussianLowerCaseFilter
git-svn-id: https://svn.apache.org/repos/asf/lucene/java/trunk@831284 13f79535-47bb-0310-9956-ffa450edef68
2009-10-30 12:45:11 +00:00
Robert Muir
0733caac5f LUCENE-2021: use chararrayset in french elision filter
git-svn-id: https://svn.apache.org/repos/asf/lucene/java/trunk@831268 13f79535-47bb-0310-9956-ffa450edef68
2009-10-30 11:25:10 +00:00
Robert Muir
8861ba2ffd LUCENE-2014: add a thai test to prevent any similar regression
git-svn-id: https://svn.apache.org/repos/asf/lucene/java/trunk@831189 13f79535-47bb-0310-9956-ffa450edef68
2009-10-30 03:26:44 +00:00
Robert Muir
19e55ea991 LUCENE-1257: port smartchineseanalyzer to java 5
git-svn-id: https://svn.apache.org/repos/asf/lucene/java/trunk@831121 13f79535-47bb-0310-9956-ffa450edef68
2009-10-29 22:29:50 +00:00
Robert Muir
1b38f9c24d LUCENE-2014: SmartChineseAnalyzer position increment bug
git-svn-id: https://svn.apache.org/repos/asf/lucene/java/trunk@830871 13f79535-47bb-0310-9956-ffa450edef68
2009-10-29 09:22:37 +00:00
Michael McCandless
74f872182e fix some javadoc warnings
git-svn-id: https://svn.apache.org/repos/asf/lucene/java/trunk@829817 13f79535-47bb-0310-9956-ffa450edef68
2009-10-26 14:55:51 +00:00
Uwe Schindler
7902c4b729 Remove the remaining deprecated ctors from TokenStream API test base class (BaseTokenStreamTestCase). They were used to test old and new TokenStream API.
git-svn-id: https://svn.apache.org/repos/asf/lucene/java/trunk@829244 13f79535-47bb-0310-9956-ffa450edef68
2009-10-23 21:21:17 +00:00