188 Commits

Author SHA1 Message Date
Robert Muir 6a07201844 don't fail test due to jre bugs in String.toLowerCase
git-svn-id: https://svn.apache.org/repos/asf/lucene/dev/trunk@1243415 13f79535-47bb-0310-9956-ffa450edef68
2012-02-13 04:50:12 +00:00
Robert Muir 590741dcfe LUCENE-3766: Remove Tokenizer's default ctor
git-svn-id: https://svn.apache.org/repos/asf/lucene/dev/trunk@1242890 13f79535-47bb-0310-9956-ffa450edef68
2012-02-10 19:12:35 +00:00
Robert Muir 8a50cefc6b LUCENE-3748: EnglishPossessiveFilter did not work with a proper right quotation mark
git-svn-id: https://svn.apache.org/repos/asf/lucene/dev/trunk@1242740 13f79535-47bb-0310-9956-ffa450edef68
2012-02-10 11:01:11 +00:00
Robert Muir 72ae3171be LUCENE-3765: Trappy behavior with StopFilter/ignoreCase
git-svn-id: https://svn.apache.org/repos/asf/lucene/dev/trunk@1242497 13f79535-47bb-0310-9956-ffa450edef68
2012-02-09 19:59:50 +00:00
Robert Muir c0319d5928 SOLR-3056: document expectations in these files
git-svn-id: https://svn.apache.org/repos/asf/lucene/dev/trunk@1241960 13f79535-47bb-0310-9956-ffa450edef68
2012-02-08 16:27:47 +00:00
Robert Muir dac1b58277 SOLR-3097, SOLR-3105: add fieldtypes for different languages to the example
git-svn-id: https://svn.apache.org/repos/asf/lucene/dev/trunk@1241878 13f79535-47bb-0310-9956-ffa450edef68
2012-02-08 12:07:52 +00:00
Tommaso Teofili 6d3bb736f3 [LUCENE-3744] - applied patch for whiteList usage in TypeTokenFilter
git-svn-id: https://svn.apache.org/repos/asf/lucene/dev/trunk@1240034 13f79535-47bb-0310-9956-ffa450edef68
2012-02-03 09:13:17 +00:00
Michael McCandless 8e40ea5bf8 LUCENE-3742: fix token offset for hangs-off-end output in SynonymFilter
git-svn-id: https://svn.apache.org/repos/asf/lucene/dev/trunk@1238851 13f79535-47bb-0310-9956-ffa450edef68
2012-01-31 23:01:55 +00:00
Uwe Schindler 10ba9abeb2 Reverse merged revision(s) from lucene/dev/trunk up to 1237502
git-svn-id: https://svn.apache.org/repos/asf/lucene/dev/branches/lucene2858@1237505 13f79535-47bb-0310-9956-ffa450edef68
2012-01-29 23:19:05 +00:00
Michael McCandless d1165b1972 LUCENE-3725: add optional packing to FSTs
git-svn-id: https://svn.apache.org/repos/asf/lucene/dev/trunk@1237500 13f79535-47bb-0310-9956-ffa450edef68
2012-01-29 22:48:45 +00:00
Robert Muir d7fe56ddae LUCENE-2858: fix analyzer
git-svn-id: https://svn.apache.org/repos/asf/lucene/dev/branches/lucene2858@1237312 13f79535-47bb-0310-9956-ffa450edef68
2012-01-29 15:16:04 +00:00
Steven Rowe 97d62cc383 Fix offset array assertion off-by-one
git-svn-id: https://svn.apache.org/repos/asf/lucene/dev/trunk@1236912 13f79535-47bb-0310-9956-ffa450edef68
2012-01-27 22:43:48 +00:00
Robert Muir 6edfe4f157 LUCENE-3717: add tests
git-svn-id: https://svn.apache.org/repos/asf/lucene/dev/trunk@1235199 13f79535-47bb-0310-9956-ffa450edef68
2012-01-24 10:40:46 +00:00
Robert Muir 35a73d5f55 LUCENE-3717: fix broken offsets in ngramtokenizers, and check return value of Reader.read
git-svn-id: https://svn.apache.org/repos/asf/lucene/dev/trunk@1235187 13f79535-47bb-0310-9956-ffa450edef68
2012-01-24 09:50:21 +00:00
Robert Muir 7fafdd3576 LUCENE-3717: add checkRandomData to more analyzers and fix more offsets bugs
git-svn-id: https://svn.apache.org/repos/asf/lucene/dev/trunk@1234850 13f79535-47bb-0310-9956-ffa450edef68
2012-01-23 15:19:58 +00:00
Steven Rowe 059410d424 LUCENE-3690: fix handling of unpaired numeric character entity UTF-16 surrogates to output U+FFFD REPLACEMENT CHARACTER; and add handling of properly paired numeric character entity UTF-16 surrogates, to output the corresponding pair of code units.
git-svn-id: https://svn.apache.org/repos/asf/lucene/dev/trunk@1234687 13f79535-47bb-0310-9956-ffa450edef68
2012-01-23 07:36:38 +00:00
Robert Muir c754c1c9c8 LUCENE-3717: add better offsets testing to BaseTokenStreamTestCase, fix offsets bugs in ThaiWordFilter and ICUTokenizer
git-svn-id: https://svn.apache.org/repos/asf/lucene/dev/trunk@1234652 13f79535-47bb-0310-9956-ffa450edef68
2012-01-23 00:08:52 +00:00
Robert Muir a7cfee6b07 SOLR-2891: fix CompoundWordTokenFilter to not create invalid offsets when the length of the text was changed by a previous filter
git-svn-id: https://svn.apache.org/repos/asf/lucene/dev/trunk@1234546 13f79535-47bb-0310-9956-ffa450edef68
2012-01-22 16:41:06 +00:00
Steven Rowe f3a363708f LUCENE-3690: Re-implemented HTMLStripCharFilter as a JFlex-generated scanner. Fixes LUCENE-2208, SOLR-882, and SOLR-42.
git-svn-id: https://svn.apache.org/repos/asf/lucene/dev/trunk@1234452 13f79535-47bb-0310-9956-ffa450edef68
2012-01-22 05:20:46 +00:00
Uwe Schindler af9b4d816f LUCENE-3671: Add TypeTokenFilter that filters tokens based on their TypeAttribute
git-svn-id: https://svn.apache.org/repos/asf/lucene/dev/trunk@1234396 13f79535-47bb-0310-9956-ffa450edef68
2012-01-21 19:02:44 +00:00
Michael McCandless 87bc4521c2 LUCENE-3695: move some confusing FST sugar out
git-svn-id: https://svn.apache.org/repos/asf/lucene/dev/trunk@1231795 13f79535-47bb-0310-9956-ffa450edef68
2012-01-15 23:25:38 +00:00
Michael McCandless 11f33ee521 LUCENE-3684: add offsets to postings APIs
git-svn-id: https://svn.apache.org/repos/asf/lucene/dev/trunk@1231794 13f79535-47bb-0310-9956-ffa450edef68
2012-01-15 23:17:45 +00:00
Yonik Seeley b2a0040e98 tests: silliness
git-svn-id: https://svn.apache.org/repos/asf/lucene/dev/trunk@1231526 13f79535-47bb-0310-9956-ffa450edef68
2012-01-14 16:59:36 +00:00
Michael McCandless 5ca66287ea woops
git-svn-id: https://svn.apache.org/repos/asf/lucene/dev/trunk@1231513 13f79535-47bb-0310-9956-ffa450edef68
2012-01-14 15:20:58 +00:00
Michael McCandless d584f6361d LUCENE-3685: add ToChildBlockJoinQuery, to join from parent to child
git-svn-id: https://svn.apache.org/repos/asf/lucene/dev/trunk@1231512 13f79535-47bb-0310-9956-ffa450edef68
2012-01-14 15:17:04 +00:00
Robert Muir 8b8c2b4dee LUCENE-3690: since this filter handles all kinds of bad partial and wierd input, this should be fine to enable
git-svn-id: https://svn.apache.org/repos/asf/lucene/dev/trunk@1231272 13f79535-47bb-0310-9956-ffa450edef68
2012-01-13 19:46:30 +00:00
Robert Muir cd372bdc83 LUCENE-3305: add Kuromoji Japanese morphological analyzer
git-svn-id: https://svn.apache.org/repos/asf/lucene/dev/trunk@1230748 13f79535-47bb-0310-9956-ffa450edef68
2012-01-12 20:10:48 +00:00
Simon Willnauer 3b8458f6de use TEST_VERSION_CURRENT instead of 4_0 in test
git-svn-id: https://svn.apache.org/repos/asf/lucene/dev/trunk@1229523 13f79535-47bb-0310-9956-ffa450edef68
2012-01-10 12:46:38 +00:00
Simon Willnauer f19317d318 SOLR-3020: Add KeywordAttribute support to HunspellStemFilter
git-svn-id: https://svn.apache.org/repos/asf/lucene/dev/trunk@1229519 13f79535-47bb-0310-9956-ffa450edef68
2012-01-10 12:33:29 +00:00
Michael McCandless cdb2ee8a7b LUCENE-3679: replace IR.getFieldNames with IR.getFieldInfos
git-svn-id: https://svn.apache.org/repos/asf/lucene/dev/trunk@1229401 13f79535-47bb-0310-9956-ffa450edef68
2012-01-09 22:29:40 +00:00
Michael McCandless defd51a11b fix syn test bug
git-svn-id: https://svn.apache.org/repos/asf/lucene/dev/trunk@1228704 13f79535-47bb-0310-9956-ffa450edef68
2012-01-07 19:28:07 +00:00
Michael McCandless ed9f0fd5ef LUCENE-3668: if there's only 1 output for a synonym rule then set start/endOffset to match the full span of the input tokens
git-svn-id: https://svn.apache.org/repos/asf/lucene/dev/trunk@1228650 13f79535-47bb-0310-9956-ffa450edef68
2012-01-07 16:26:15 +00:00
Steven Rowe 76d1662cb7 - Added license declaration
- Removed unused 'length' param to combine()

git-svn-id: https://svn.apache.org/repos/asf/lucene/dev/trunk@1225615 13f79535-47bb-0310-9956-ffa450edef68
2011-12-29 18:53:10 +00:00
Robert Muir b2970db4bc LUCENE-2906: filter to process output of Standard/ICUTokenizer and create overlapping bigrams for CJK
git-svn-id: https://svn.apache.org/repos/asf/lucene/dev/trunk@1225433 13f79535-47bb-0310-9956-ffa450edef68
2011-12-29 05:04:49 +00:00
Robert Muir a55f511a77 LUCENE-3650: move o.a.l.index.codecs to o.a.l.codecs
git-svn-id: https://svn.apache.org/repos/asf/lucene/dev/trunk@1215245 13f79535-47bb-0310-9956-ffa450edef68
2011-12-16 19:03:12 +00:00
Robert Muir 18febd69e4 LUCENE-2208: improve charfilter offset testing
git-svn-id: https://svn.apache.org/repos/asf/lucene/dev/trunk@1215038 13f79535-47bb-0310-9956-ffa450edef68
2011-12-16 04:37:47 +00:00
Steven Rowe 60929a5adb LUCENE-3645: Remove unnecessary array wrapping when calling varargs methods
git-svn-id: https://svn.apache.org/repos/asf/lucene/dev/trunk@1214413 13f79535-47bb-0310-9956-ffa450edef68
2011-12-14 19:15:47 +00:00
Robert Muir 7dc025bdce LUCENE-3642: fix invalid offsets from CharTokenizer, [Edge]NGramFilters, SmartChinese, add sanity check to BaseTokenStreamTestCase
git-svn-id: https://svn.apache.org/repos/asf/lucene/dev/trunk@1213329 13f79535-47bb-0310-9956-ffa450edef68
2011-12-12 17:28:09 +00:00
Robert Muir 3899e18ca3 LUCENE-3640: Remove IndexSearcher.close
git-svn-id: https://svn.apache.org/repos/asf/lucene/dev/trunk@1213117 13f79535-47bb-0310-9956-ffa450edef68
2011-12-12 00:21:40 +00:00
Uwe Schindler 905a0f211c LUCENE-3606: Make IndexReader really read-only
git-svn-id: https://svn.apache.org/repos/asf/lucene/dev/trunk@1212292 13f79535-47bb-0310-9956-ffa450edef68
2011-12-09 09:13:39 +00:00
Robert Muir 9b15b1d3b0 consolidate assumes in ThaiAnalyzer test so we don't miss it for individual tests
git-svn-id: https://svn.apache.org/repos/asf/lucene/dev/trunk@1212141 13f79535-47bb-0310-9956-ffa450edef68
2011-12-08 21:47:12 +00:00
Robert Muir 3843ac5b8b LUCENE-3606: fix more tests
git-svn-id: https://svn.apache.org/repos/asf/lucene/dev/branches/lucene3606@1210308 13f79535-47bb-0310-9956-ffa450edef68
2011-12-05 01:59:11 +00:00
Michael McCandless 961b820e53 LUCENE-2929: specify up front if you need freqs from DocsEnum
git-svn-id: https://svn.apache.org/repos/asf/lucene/dev/trunk@1210176 13f79535-47bb-0310-9956-ffa450edef68
2011-12-04 18:50:58 +00:00
Chris M. Hostetter 3ed5106920 SOLR-2819: Improved speed of parsing hex entities in HTMLStripCharFilter
git-svn-id: https://svn.apache.org/repos/asf/lucene/dev/trunk@1208032 13f79535-47bb-0310-9956-ffa450edef68
2011-11-29 19:15:54 +00:00
Robert Muir 7f766cf603 LUCENE-3590: nuke BytesRef.utf8ToChars
git-svn-id: https://svn.apache.org/repos/asf/lucene/dev/trunk@1206174 13f79535-47bb-0310-9956-ffa450edef68
2011-11-25 13:55:41 +00:00
Robert Muir 3b6da22aa7 LUCENE-3590: clearly mark bogus deep-copying apis in BytesRef
git-svn-id: https://svn.apache.org/repos/asf/lucene/dev/trunk@1206143 13f79535-47bb-0310-9956-ffa450edef68
2011-11-25 12:50:13 +00:00
Robert Muir 873f199924 LUCENE-2621: move TermVectors,FieldInfos,SegmentInfos to codec
git-svn-id: https://svn.apache.org/repos/asf/lucene/dev/trunk@1202842 13f79535-47bb-0310-9956-ffa450edef68
2011-11-16 19:09:35 +00:00
Robert Muir 598920d7bd LUCENE-3571: nuke IndexSearcher(Directory)
git-svn-id: https://svn.apache.org/repos/asf/lucene/dev/trunk@1202657 13f79535-47bb-0310-9956-ffa450edef68
2011-11-16 12:19:41 +00:00
Simon Willnauer ee293e7e7d fix javadoc
git-svn-id: https://svn.apache.org/repos/asf/lucene/dev/trunk@1200111 13f79535-47bb-0310-9956-ffa450edef68
2011-11-10 03:32:33 +00:00
Simon Willnauer dc6b4b6533 LUCENE-2564: Cut over WordListLoader to CharArrayMap/Set and use CharSetDecoder to detect encoding problems early
git-svn-id: https://svn.apache.org/repos/asf/lucene/dev/trunk@1200080 13f79535-47bb-0310-9956-ffa450edef68
2011-11-10 01:21:25 +00:00