Uwe Schindler
5abaff61fa
LUCENE-2266: Fixed offset calculations in NGramTokenFilter and EdgeNGramTokenFilter
...
git-svn-id: https://svn.apache.org/repos/asf/lucene/java/trunk@910078 13f79535-47bb-0310-9956-ffa450edef68
2010-02-14 21:33:12 +00:00
Robert Muir
a6b7c5552b
LUCENE-2055: better snowball integration, deprecate buggy handcoded snowball impls, restructure lang support
...
git-svn-id: https://svn.apache.org/repos/asf/lucene/java/trunk@907125 13f79535-47bb-0310-9956-ffa450edef68
2010-02-05 23:05:46 +00:00
Robert Muir
23d403b6bb
LUCENE-2234: Hindi Analyzer
...
git-svn-id: https://svn.apache.org/repos/asf/lucene/java/trunk@906468 13f79535-47bb-0310-9956-ffa450edef68
2010-02-04 12:41:56 +00:00
Simon Willnauer
5ad7974d3f
LUCENE-2242: Contrib CharTokenizer classes should be instantiated using their new Version based ctors
...
git-svn-id: https://svn.apache.org/repos/asf/lucene/java/trunk@905065 13f79535-47bb-0310-9956-ffa450edef68
2010-01-31 16:01:17 +00:00
Robert Muir
39b9f97cd4
LUCENE-2209: add @lucene.internal/@lucene.experimental javadoc tags
...
git-svn-id: https://svn.apache.org/repos/asf/lucene/java/trunk@905057 13f79535-47bb-0310-9956-ffa450edef68
2010-01-31 15:20:26 +00:00
Robert Muir
fdf4ea2448
LUCENE-2218: Improvements to ShingleFilter (performance, configurable sep. char and min shingle size)
...
git-svn-id: https://svn.apache.org/repos/asf/lucene/java/trunk@905043 13f79535-47bb-0310-9956-ffa450edef68
2010-01-31 14:04:01 +00:00
Simon Willnauer
537bb742cd
LUCENE-2238: deprecated ChineseAnalyzer / ChineseTokenizer in favor of StandardAnalyzer / Tokenizer which does the same thing
...
git-svn-id: https://svn.apache.org/repos/asf/lucene/java/trunk@904521 13f79535-47bb-0310-9956-ffa450edef68
2010-01-29 15:44:54 +00:00
Uwe Schindler
49b3a12971
LUCENE-2183: Added Unicode 4 support to CharTokenizer and its subclasses
...
git-svn-id: https://svn.apache.org/repos/asf/lucene/java/trunk@904401 13f79535-47bb-0310-9956-ffa450edef68
2010-01-29 07:43:45 +00:00
Uwe Schindler
e9a979f1eb
LUCENE-2198: Support protected words in stemming TokenFilters using a new KeywordAttribute
...
git-svn-id: https://svn.apache.org/repos/asf/lucene/java/trunk@903608 13f79535-47bb-0310-9956-ffa450edef68
2010-01-27 11:19:05 +00:00
Michael McCandless
9b3b890f45
LUCENE-2213: rename ArrayUtil.getNextSize -> oversize; tweak how it picks the next size
...
git-svn-id: https://svn.apache.org/repos/asf/lucene/java/trunk@901662 13f79535-47bb-0310-9956-ffa450edef68
2010-01-21 11:54:50 +00:00
Robert Muir
ba2b0851b8
LUCENE-2226: move contrib/snowball to contrib/analyzers
...
git-svn-id: https://svn.apache.org/repos/asf/lucene/java/trunk@901505 13f79535-47bb-0310-9956-ffa450edef68
2010-01-21 02:45:09 +00:00
Robert Muir
78e45c92a7
LUCENE-2207: CJKTokenizer generates tokens with incorrect offsets
...
LUCENE-2219: Chinese, SmartChinese, Wikipedia tokenizers generate incorrect offsets, test end() in BaseTokenStreamTestCase
git-svn-id: https://svn.apache.org/repos/asf/lucene/java/trunk@900196 13f79535-47bb-0310-9956-ffa450edef68
2010-01-17 19:25:57 +00:00
Michael McCandless
9caaad0fea
remove $ tags
...
git-svn-id: https://svn.apache.org/repos/asf/lucene/java/trunk@899914 13f79535-47bb-0310-9956-ffa450edef68
2010-01-16 10:31:33 +00:00
Uwe Schindler
3f722b66a5
LUCENE-2211: Fix various missing clearAttributes() and improve BaseTokenStreamTestCase to check for this trap
...
git-svn-id: https://svn.apache.org/repos/asf/lucene/java/trunk@899627 13f79535-47bb-0310-9956-ffa450edef68
2010-01-15 13:42:18 +00:00
Robert Muir
0ad2f181aa
remaining eol-style and inconsistent newlines fixes
...
git-svn-id: https://svn.apache.org/repos/asf/lucene/java/trunk@897712 13f79535-47bb-0310-9956-ffa450edef68
2010-01-10 21:53:13 +00:00
Robert Muir
7d5844740e
LUCENE-2200: final classes had non-overriding protected members
...
git-svn-id: https://svn.apache.org/repos/asf/lucene/java/trunk@897707 13f79535-47bb-0310-9956-ffa450edef68
2010-01-10 21:09:58 +00:00
Simon Willnauer
673e368bf7
LUCENE-2199: ShingleFilter skipped over tri-gram shingles if outputUnigram was set to false
...
git-svn-id: https://svn.apache.org/repos/asf/lucene/java/trunk@897672 13f79535-47bb-0310-9956-ffa450edef68
2010-01-10 18:06:19 +00:00
Robert Muir
36ba04585e
fix javadoc header
...
git-svn-id: https://svn.apache.org/repos/asf/lucene/java/trunk@896112 13f79535-47bb-0310-9956-ffa450edef68
2010-01-05 16:16:01 +00:00
Robert Muir
d22b7a98cd
LUCENE-2185: add @Deprecated annotations
...
git-svn-id: https://svn.apache.org/repos/asf/lucene/java/trunk@895342 13f79535-47bb-0310-9956-ffa450edef68
2010-01-03 10:31:42 +00:00
Robert Muir
a949836869
LUCENE-2034: Refactor analyzer reuse and stopword handling
...
git-svn-id: https://svn.apache.org/repos/asf/lucene/java/trunk@895339 13f79535-47bb-0310-9956-ffa450edef68
2010-01-03 08:48:17 +00:00
Robert Muir
16eaa6198f
LUCENE-1786: improve performance of TestCompoundWordTokenFilter
...
git-svn-id: https://svn.apache.org/repos/asf/lucene/java/trunk@892355 13f79535-47bb-0310-9956-ffa450edef68
2009-12-18 19:25:24 +00:00
Uwe Schindler
dad7e60253
LUCENE-2157: DelimitedPayloadTokenFilter no longer copies the buffer over itself; instead it sets the length to the offset of the delimiter. Also optimizes logic and IdentityEncoder to use NIO.
...
git-svn-id: https://svn.apache.org/repos/asf/lucene/java/trunk@890791 13f79535-47bb-0310-9956-ffa450edef68
2009-12-15 13:27:27 +00:00
Simon Willnauer
6c0c318218
LUCENE-2100: Marked all contrib Analyzer subclasses as final. Analyzers should only act as a composition of TokenStreams; users should compose their own analyzers instead of subclassing existing ones.
...
git-svn-id: https://svn.apache.org/repos/asf/lucene/java/trunk@888799 13f79535-47bb-0310-9956-ffa450edef68
2009-12-09 13:32:32 +00:00
Simon Willnauer
9ee4ce0fd5
LUCENE-2102: Add Turkish LowerCaseFilter which handles Turkish and Azeri unique casing behavior correctly.
...
git-svn-id: https://svn.apache.org/repos/asf/lucene/java/trunk@887535 13f79535-47bb-0310-9956-ffa450edef68
2009-12-05 12:46:05 +00:00
Simon Willnauer
a0bf23d762
fixed javadoc warnings due to missing closing braces
...
git-svn-id: https://svn.apache.org/repos/asf/lucene/java/trunk@887122 13f79535-47bb-0310-9956-ffa450edef68
2009-12-04 09:10:21 +00:00
Robert Muir
892bc7f55a
LUCENE-2062: Bulgarian Analyzer
...
git-svn-id: https://svn.apache.org/repos/asf/lucene/java/trunk@886190 13f79535-47bb-0310-9956-ffa450edef68
2009-12-02 16:08:56 +00:00
Uwe Schindler
9edfb3b66a
LUCENE-2094: Prepare CharArraySet for Unicode 4.0
...
git-svn-id: https://svn.apache.org/repos/asf/lucene/java/trunk@885592 13f79535-47bb-0310-9956-ffa450edef68
2009-11-30 21:49:21 +00:00
Uwe Schindler
09fd7abd7a
fix javadoc
...
git-svn-id: https://svn.apache.org/repos/asf/lucene/java/trunk@885571 13f79535-47bb-0310-9956-ffa450edef68
2009-11-30 19:55:57 +00:00
Robert Muir
2ef402eefa
LUCENE-2067: Add a stemmer for Czech
...
git-svn-id: https://svn.apache.org/repos/asf/lucene/java/trunk@885216 13f79535-47bb-0310-9956-ffa450edef68
2009-11-29 11:59:38 +00:00
Robert Muir
f0e064eb41
LUCENE-2069: supplementary char support for lowercasefilter
...
git-svn-id: https://svn.apache.org/repos/asf/lucene/java/trunk@885024 13f79535-47bb-0310-9956-ffa450edef68
2009-11-27 21:34:11 +00:00
Simon Willnauer
e69141c51a
LUCENE-2068: Fixed ReverseStringFilter for Unicode 4.0. Reverse Supplementary Characters correctly.
...
git-svn-id: https://svn.apache.org/repos/asf/lucene/java/trunk@883149 13f79535-47bb-0310-9956-ffa450edef68
2009-11-22 21:09:42 +00:00
Simon Willnauer
ba4769d418
Fixed JavaDoc - spelling issues in @param
...
git-svn-id: https://svn.apache.org/repos/asf/lucene/java/trunk@880727 13f79535-47bb-0310-9956-ffa450edef68
2009-11-16 12:33:10 +00:00
Uwe Schindler
00f07ee460
LUCENE-2051: Contrib Analyzer setters should be deprecated and replaced with ctor arguments, thanks to Simon Willnauer
...
git-svn-id: https://svn.apache.org/repos/asf/lucene/java/trunk@880715 13f79535-47bb-0310-9956-ffa450edef68
2009-11-16 11:48:37 +00:00
Uwe Schindler
7370094ead
Fix some javadocs errors in contrib
...
git-svn-id: https://svn.apache.org/repos/asf/lucene/java/trunk@880706 13f79535-47bb-0310-9956-ffa450edef68
2009-11-16 11:08:25 +00:00
Uwe Schindler
945e7eda52
LUCENE-2052: add varargs where possible
...
git-svn-id: https://svn.apache.org/repos/asf/lucene/java/trunk@836248 13f79535-47bb-0310-9956-ffa450edef68
2009-11-14 19:26:49 +00:00
Uwe Schindler
5b83cc59b2
LUCENE-1257: Generics: *heavy* patch by Robert Muir and me
...
git-svn-id: https://svn.apache.org/repos/asf/lucene/java/trunk@834847 13f79535-47bb-0310-9956-ffa450edef68
2009-11-11 12:18:34 +00:00
Robert Muir
786eb6ce0d
LUCENE-2012: add remaining @overrides (contrib,demo)
...
git-svn-id: https://svn.apache.org/repos/asf/lucene/java/trunk@833867 13f79535-47bb-0310-9956-ffa450edef68
2009-11-08 12:45:12 +00:00
Robert Muir
80e8bfbbc9
LUCENE-2031: Move patternanalyzer from memory contrib into analyzers
...
git-svn-id: https://svn.apache.org/repos/asf/lucene/java/trunk@832889 13f79535-47bb-0310-9956-ffa450edef68
2009-11-04 22:37:01 +00:00
Simon Willnauer
a5da31ef90
Trivial fix of ignored return value of reader.read(). Done during hackathon - review by uschindler
...
git-svn-id: https://svn.apache.org/repos/asf/lucene/java/trunk@832554 13f79535-47bb-0310-9956-ffa450edef68
2009-11-03 20:58:12 +00:00
Simon Willnauer
e84f86d497
Trivial fix - changed new Character('_') into Character.valueOf('_')
...
git-svn-id: https://svn.apache.org/repos/asf/lucene/java/trunk@832549 13f79535-47bb-0310-9956-ffa450edef68
2009-11-03 20:49:57 +00:00
Robert Muir
9b0c42a9c1
fix confusing smartcn javadoc bug
...
git-svn-id: https://svn.apache.org/repos/asf/lucene/java/trunk@831913 13f79535-47bb-0310-9956-ffa450edef68
2009-11-02 15:08:42 +00:00
Robert Muir
066eac49a4
LUCENE-2022: remove deprecated api from contrib/analysis and wikipedia
...
git-svn-id: https://svn.apache.org/repos/asf/lucene/java/trunk@831425 13f79535-47bb-0310-9956-ffa450edef68
2009-10-30 19:04:30 +00:00
Robert Muir
cc374d7efc
set RussianLowerCaseFilter deprecation to 4.0
...
git-svn-id: https://svn.apache.org/repos/asf/lucene/java/trunk@831391 13f79535-47bb-0310-9956-ffa450edef68
2009-10-30 17:11:10 +00:00
Michael McCandless
13593aa802
LUCENE-2002: restore RussianLowerCaseFilter
...
git-svn-id: https://svn.apache.org/repos/asf/lucene/java/trunk@831284 13f79535-47bb-0310-9956-ffa450edef68
2009-10-30 12:45:11 +00:00
Robert Muir
0733caac5f
LUCENE-2021: use chararrayset in french elision filter
...
git-svn-id: https://svn.apache.org/repos/asf/lucene/java/trunk@831268 13f79535-47bb-0310-9956-ffa450edef68
2009-10-30 11:25:10 +00:00
Robert Muir
8861ba2ffd
LUCENE-2014: add a thai test to prevent any similar regression
...
git-svn-id: https://svn.apache.org/repos/asf/lucene/java/trunk@831189 13f79535-47bb-0310-9956-ffa450edef68
2009-10-30 03:26:44 +00:00
Robert Muir
19e55ea991
LUCENE-1257: port smartchineseanalyzer to java 5
...
git-svn-id: https://svn.apache.org/repos/asf/lucene/java/trunk@831121 13f79535-47bb-0310-9956-ffa450edef68
2009-10-29 22:29:50 +00:00
Robert Muir
1b38f9c24d
LUCENE-2014: SmartChineseAnalyzer position increment bug
...
git-svn-id: https://svn.apache.org/repos/asf/lucene/java/trunk@830871 13f79535-47bb-0310-9956-ffa450edef68
2009-10-29 09:22:37 +00:00
Michael McCandless
74f872182e
fix some javadoc warnings
...
git-svn-id: https://svn.apache.org/repos/asf/lucene/java/trunk@829817 13f79535-47bb-0310-9956-ffa450edef68
2009-10-26 14:55:51 +00:00
Uwe Schindler
7902c4b729
Remove the remaining deprecated ctors from TokenStream API test base class (BaseTokenStreamTestCase). They were used to test old and new TokenStream API.
...
git-svn-id: https://svn.apache.org/repos/asf/lucene/java/trunk@829244 13f79535-47bb-0310-9956-ffa450edef68
2009-10-23 21:21:17 +00:00