Robert Muir
e053d80455
LUCENE-1966: ArabicAnalyzer stopwords cleanup
...
git-svn-id: https://svn.apache.org/repos/asf/lucene/java/trunk@825110 13f79535-47bb-0310-9956-ffa450edef68
2009-10-14 12:24:18 +00:00
Uwe Schindler
4cded8042c
LUCENE-1946, LUCENE-1753: Remove deprecated TokenStream API. What a pity, my wonderful backwards layer is gone! :-( Enforce decorator pattern by making the rest of TokenStreams final.
...
git-svn-id: https://svn.apache.org/repos/asf/lucene/java/trunk@824116 13f79535-47bb-0310-9956-ffa450edef68
2009-10-11 17:35:09 +00:00
Robert Muir
877c9ff521
For fa analyzer, add a test for custom stopwords
...
git-svn-id: https://svn.apache.org/repos/asf/lucene/java/trunk@823546 13f79535-47bb-0310-9956-ffa450edef68
2009-10-09 13:27:14 +00:00
Robert Muir
956c8cda82
LUCENE-1963: Lowercase before stopfilter in ArabicAnalyzer
...
git-svn-id: https://svn.apache.org/repos/asf/lucene/java/trunk@823534 13f79535-47bb-0310-9956-ffa450edef68
2009-10-09 12:55:47 +00:00
Michael McCandless
f20e419aff
LUCENE-1950: remove autoCommit=true from IndexWriter
...
git-svn-id: https://svn.apache.org/repos/asf/lucene/java/trunk@823321 13f79535-47bb-0310-9956-ffa450edef68
2009-10-08 20:57:32 +00:00
Simon Willnauer
286cb1f9d2
LUCENE-1962: Cleaned up Persian & Arabic Analyzer. Prevent default stopword list from being loaded more than once.
...
- replace if blocks with a single switch
- marking private members final where needed
- changed protected visibility to final in final class.
git-svn-id: https://svn.apache.org/repos/asf/lucene/java/trunk@823180 13f79535-47bb-0310-9956-ffa450edef68
2009-10-08 13:54:18 +00:00
Michael McCandless
c11776d2c6
remove tags
...
git-svn-id: https://svn.apache.org/repos/asf/lucene/java/trunk@822781 13f79535-47bb-0310-9956-ffa450edef68
2009-10-07 15:41:09 +00:00
Michael Busch
d7d9241ef7
LUCENE-1856: Remove Hits.
...
git-svn-id: https://svn.apache.org/repos/asf/lucene/java/trunk@822587 13f79535-47bb-0310-9956-ffa450edef68
2009-10-07 05:08:22 +00:00
Karl-Johan Wettin
b3f73db537
LUCENE-1939: IndexOutOfBoundsException at ShingleMatrixFilter's Iterator#hasNext method on exhausted streams.
...
git-svn-id: https://svn.apache.org/repos/asf/lucene/java/trunk@821888 13f79535-47bb-0310-9956-ffa450edef68
2009-10-05 16:01:17 +00:00
Uwe Schindler
236baf9fcb
LUCENE-1944: Cleanup contrib to not use deprecated APIs
...
git-svn-id: https://svn.apache.org/repos/asf/lucene/java/trunk@821444 13f79535-47bb-0310-9956-ffa450edef68
2009-10-03 23:24:33 +00:00
Robert Muir
1f9088b038
LUCENE-1943: Improve performance of ChineseFilter
...
git-svn-id: https://svn.apache.org/repos/asf/lucene/java/trunk@821322 13f79535-47bb-0310-9956-ffa450edef68
2009-10-03 13:54:12 +00:00
Karl-Johan Wettin
4f878bdc93
LUCENE-1257: Generified ShingleMatrixFilter
...
git-svn-id: https://svn.apache.org/repos/asf/lucene/java/trunk@821311 13f79535-47bb-0310-9956-ffa450edef68
2009-10-03 13:17:11 +00:00
Uwe Schindler
af0e97fd72
LUCENE-1257: Replace StringBuffer with StringBuilder where possible
...
git-svn-id: https://svn.apache.org/repos/asf/lucene/java/trunk@821185 13f79535-47bb-0310-9956-ffa450edef68
2009-10-02 22:11:10 +00:00
Robert Muir
dd9c1b0101
LUCENE-1936: Remove deprecated charset support from Greek and Russian analyzers
...
git-svn-id: https://svn.apache.org/repos/asf/lucene/java/trunk@820756 13f79535-47bb-0310-9956-ffa450edef68
2009-10-01 19:20:09 +00:00
Uwe Schindler
c1f5e753d7
LUCENE-1933: Provide a convenience AttributeFactory that creates a Token instance for all basic attributes
...
git-svn-id: https://svn.apache.org/repos/asf/lucene/java/trunk@820658 13f79535-47bb-0310-9956-ffa450edef68
2009-10-01 13:49:46 +00:00
Uwe Schindler
ec90bc2202
LUCENE-1855: Change AttributeSource API to use generics
...
git-svn-id: https://svn.apache.org/repos/asf/lucene/java/trunk@820553 13f79535-47bb-0310-9956-ffa450edef68
2009-10-01 07:53:43 +00:00
Uwe Schindler
4666489857
LUCENE-1906: Fix backwards problems with CharStream and Tokenizers with custom reset(Reader) method.
...
git-svn-id: https://svn.apache.org/repos/asf/lucene/java/trunk@813671 13f79535-47bb-0310-9956-ffa450edef68
2009-09-11 06:12:13 +00:00
Uwe Schindler
a8eb5c4b80
LUCENE-1903: Fix incorrect ShingleFilter behavior when outputUnigrams == false
...
git-svn-id: https://svn.apache.org/repos/asf/lucene/java/trunk@812779 13f79535-47bb-0310-9956-ffa450edef68
2009-09-09 06:02:54 +00:00
Mark Robert Miller
53dcf2c320
cleanup unused reusableToken instances that were left behind
...
git-svn-id: https://svn.apache.org/repos/asf/lucene/java/trunk@811984 13f79535-47bb-0310-9956-ffa450edef68
2009-09-07 03:29:38 +00:00
Chris M. Hostetter
c56f4c224f
LUCENE-1884: massive javadoc and comment cleanup -- primarily dealing with typos
...
git-svn-id: https://svn.apache.org/repos/asf/lucene/java/trunk@811070 13f79535-47bb-0310-9956-ffa450edef68
2009-09-03 18:31:41 +00:00
Mark Robert Miller
26c5af3a33
LUCENE-1865: Add a ton of missing license headers throughout test/demo/contrib
...
git-svn-id: https://svn.apache.org/repos/asf/lucene/java/trunk@808567 13f79535-47bb-0310-9956-ffa450edef68
2009-08-27 18:48:16 +00:00
Chris M. Hostetter
6b2eae0b5a
javadoc is historically very finicky about relative names in @link tags when the name doesn't resolve in the class hierarchy at the current access level (i.e., even if a class is in the same package, an @link to its short name won't resolve if it's not actually part of a signature for a method/field being documented)
...
git-svn-id: https://svn.apache.org/repos/asf/lucene/java/trunk@808427 13f79535-47bb-0310-9956-ffa450edef68
2009-08-27 14:28:22 +00:00
Mark Robert Miller
8cc45886d5
fix javadoc links in ngram contrib
...
git-svn-id: https://svn.apache.org/repos/asf/lucene/java/trunk@808221 13f79535-47bb-0310-9956-ffa450edef68
2009-08-26 23:26:10 +00:00
Mark Robert Miller
c593328eb0
convert the remaining @todo's to TODO:
...
git-svn-id: https://svn.apache.org/repos/asf/lucene/java/trunk@808211 13f79535-47bb-0310-9956-ffa450edef68
2009-08-26 22:39:40 +00:00
Uwe Schindler
367b35f0cb
LUCENE-1843: Update contrib tests to conform to onlyUseNewAPI; refactored assertAnalyzesTo and others into the new BaseTokenStreamTestCase class; Rewrote TestMappingCharFilter to use the new assert functions, too; performance improvements of Token.copyTo(); new impl of SingleTokenTokenStream
...
git-svn-id: https://svn.apache.org/repos/asf/lucene/java/trunk@807190 13f79535-47bb-0310-9956-ffa450edef68
2009-08-24 12:44:13 +00:00
Uwe Schindler
5dd1810b0c
LUCENE-1846: Fix more Locale problems
...
git-svn-id: https://svn.apache.org/repos/asf/lucene/java/trunk@807117 13f79535-47bb-0310-9956-ffa450edef68
2009-08-24 08:31:34 +00:00
Uwe Schindler
4745c8db05
LUCENE-1825: Another one :(
...
git-svn-id: https://svn.apache.org/repos/asf/lucene/java/trunk@806990 13f79535-47bb-0310-9956-ffa450edef68
2009-08-23 16:35:50 +00:00
Uwe Schindler
c2f95d474b
LUCENE-1825: Additional incorrect getAttribute usage
...
git-svn-id: https://svn.apache.org/repos/asf/lucene/java/trunk@806986 13f79535-47bb-0310-9956-ffa450edef68
2009-08-23 16:17:08 +00:00
Robert Muir
6847c0e2bd
LUCENE-1826: the new tokenizer constructors should not allow deprecated charsets
...
git-svn-id: https://svn.apache.org/repos/asf/lucene/java/trunk@806961 13f79535-47bb-0310-9956-ffa450edef68
2009-08-23 12:39:28 +00:00
Michael Busch
64ed5f39a5
LUCENE-1826: Add constructors that take AttributeSource and AttributeFactory to all Tokenizer implementations.
...
git-svn-id: https://svn.apache.org/repos/asf/lucene/java/trunk@806942 13f79535-47bb-0310-9956-ffa450edef68
2009-08-23 08:34:22 +00:00
Robert Muir
1ebbe2abd1
LUCENE-1793: Deprecate custom encoding support in Greek and Russian analyzers
...
git-svn-id: https://svn.apache.org/repos/asf/lucene/java/trunk@806886 13f79535-47bb-0310-9956-ffa450edef68
2009-08-22 20:36:06 +00:00
Robert Muir
1d9a96c2fc
LUCENE-1813: Add option to ReverseStringFilter to mark reversed tokens
...
git-svn-id: https://svn.apache.org/repos/asf/lucene/java/trunk@805769 13f79535-47bb-0310-9956-ffa450edef68
2009-08-19 12:07:15 +00:00
Robert Muir
58cd4a04d7
LUCENE-1794: Ensure analyzer options are applied immediately when using reusable token streams
...
git-svn-id: https://svn.apache.org/repos/asf/lucene/java/trunk@805766 13f79535-47bb-0310-9956-ffa450edef68
2009-08-19 11:56:31 +00:00
Robert Muir
3887cf9419
LUCENE-1692: Additional tests and javadocs for contrib/analyzers
...
git-svn-id: https://svn.apache.org/repos/asf/lucene/java/trunk@805400 13f79535-47bb-0310-9956-ffa450edef68
2009-08-18 12:55:26 +00:00
Robert Muir
d2af6ef0bd
LUCENE-1794: Implement TokenStream reuse for contrib Analyzers
...
git-svn-id: https://svn.apache.org/repos/asf/lucene/java/trunk@804680 13f79535-47bb-0310-9956-ffa450edef68
2009-08-16 12:37:05 +00:00
Uwe Schindler
b16e0aa31b
LUCENE-1801: All Tokenizers/TokenStreams that are a source of tokens call AttributeSource.clearAttributes() first. Made Token.clear() consistent with AttributeImpl (clear everything)
...
git-svn-id: https://svn.apache.org/repos/asf/lucene/java/trunk@804392 13f79535-47bb-0310-9956-ffa450edef68
2009-08-14 22:01:42 +00:00
Robert Muir
43a5bd6c19
LUCENE-1628: Add Persian Analyzer
...
git-svn-id: https://svn.apache.org/repos/asf/lucene/java/trunk@802955 13f79535-47bb-0310-9956-ffa450edef68
2009-08-10 23:29:27 +00:00
Michael McCandless
35ea5c1350
LUCENE-1786: make the patternsFileContent static, so we only load it once, not 4 times, when running this test
...
git-svn-id: https://svn.apache.org/repos/asf/lucene/java/trunk@802767 13f79535-47bb-0310-9956-ffa450edef68
2009-08-10 12:47:52 +00:00
Uwe Schindler
911df49bcb
LUCENE-1607: Change some more String.intern() in contrib
...
git-svn-id: https://svn.apache.org/repos/asf/lucene/java/trunk@802095 13f79535-47bb-0310-9956-ffa450edef68
2009-08-07 17:19:53 +00:00
Robert Muir
820620f3a7
LUCENE-1758: Update ArabicAnalyzer to light10 stemming, stopwords improvements, lowercase non-arabic text
...
git-svn-id: https://svn.apache.org/repos/asf/lucene/java/trunk@801348 13f79535-47bb-0310-9956-ffa450edef68
2009-08-05 18:22:22 +00:00
Grant Ingersoll
ab276a5ab9
Javadoc updates
...
git-svn-id: https://svn.apache.org/repos/asf/lucene/java/trunk@801219 13f79535-47bb-0310-9956-ffa450edef68
2009-08-05 13:17:20 +00:00
Grant Ingersoll
c0d86a4e30
Javadoc updates
...
git-svn-id: https://svn.apache.org/repos/asf/lucene/java/trunk@801218 13f79535-47bb-0310-9956-ffa450edef68
2009-08-05 13:17:11 +00:00
Michael Busch
c91651e4f2
LUCENE-1775: Change contrib tee/sink filters to use new TokenStream API.
...
git-svn-id: https://svn.apache.org/repos/asf/lucene/java/trunk@800606 13f79535-47bb-0310-9956-ffa450edef68
2009-08-03 22:45:27 +00:00
Michael Busch
457c29d31e
LUCENE-1775: Change remaining contrib TokenFilters (shingle, prefix-suffix) to use the new TokenStream API.
...
git-svn-id: https://svn.apache.org/repos/asf/lucene/java/trunk@800195 13f79535-47bb-0310-9956-ffa450edef68
2009-08-03 04:33:10 +00:00
Michael Busch
b91f993a0e
LUCENE-1460: Additional cleanup in two contrib junit tests.
...
git-svn-id: https://svn.apache.org/repos/asf/lucene/java/trunk@799973 13f79535-47bb-0310-9956-ffa450edef68
2009-08-02 02:57:30 +00:00
Michael Busch
537aeb24e0
LUCENE-1759: Set final offset correctly in contrib TokenStreams.
...
git-svn-id: https://svn.apache.org/repos/asf/lucene/java/trunk@799968 13f79535-47bb-0310-9956-ffa450edef68
2009-08-02 02:10:46 +00:00
Michael Busch
1743081b07
LUCENE-1460: Changed TokenStreams/TokenFilters in contrib to use the new TokenStream API.
...
git-svn-id: https://svn.apache.org/repos/asf/lucene/java/trunk@799953 13f79535-47bb-0310-9956-ffa450edef68
2009-08-01 22:52:32 +00:00
Simon Willnauer
999f6157c7
LUCENE-1728: Split contrib/analyzers into common and smartcn. Smartcn depends on a large dictionary that causes the analyzers jar to grow to 3MB compressed size.
...
git-svn-id: https://svn.apache.org/repos/asf/lucene/java/trunk@797150 13f79535-47bb-0310-9956-ffa450edef68
2009-07-23 17:11:22 +00:00