Robert Muir
3d2d144f92
LUCENE-3848: don't produce tokenstreams that start with posinc=0
...
git-svn-id: https://svn.apache.org/repos/asf/lucene/dev/trunk@1301478 13f79535-47bb-0310-9956-ffa450edef68
2012-03-16 13:06:30 +00:00
Uwe Schindler
3d8b22ffd0
LUCENE-3850: Fix rawtypes warnings for Java 7 compiler (#2)
...
git-svn-id: https://svn.apache.org/repos/asf/lucene/dev/trunk@1297162 13f79535-47bb-0310-9956-ffa450edef68
2012-03-05 18:48:04 +00:00
Uwe Schindler
989530e17e
LUCENE-3850: Fix rawtypes warnings for Java 7 compiler
...
git-svn-id: https://svn.apache.org/repos/asf/lucene/dev/trunk@1297048 13f79535-47bb-0310-9956-ffa450edef68
2012-03-05 13:34:40 +00:00
Christian Moen
430365f7cc
Kuromoji now produces both compound words and the segmentation of those words in search mode (LUCENE-3767)
...
git-svn-id: https://svn.apache.org/repos/asf/lucene/dev/trunk@1296805 13f79535-47bb-0310-9956-ffa450edef68
2012-03-04 13:34:13 +00:00
Dawid Weiss
8c2e3cef8f
LUCENE-3820: limiting the amount of input for pattern matching to go past exponential time patterns, even if they happen. A nice catch from Mike too -- un-ignore testNastyPattern and look at processing time go wild with each additional input character...
...
git-svn-id: https://svn.apache.org/repos/asf/lucene/dev/trunk@1294797 13f79535-47bb-0310-9956-ffa450edef68
2012-02-28 19:26:05 +00:00
Dawid Weiss
f3cc65733b
Sysout of the randomized pattern.
...
git-svn-id: https://svn.apache.org/repos/asf/lucene/dev/trunk@1294518 13f79535-47bb-0310-9956-ffa450edef68
2012-02-28 08:15:38 +00:00
Dawid Weiss
4d401ca87d
Test thread's name reflects the current seed.
...
git-svn-id: https://svn.apache.org/repos/asf/lucene/dev/trunk@1294514 13f79535-47bb-0310-9956-ffa450edef68
2012-02-28 08:04:42 +00:00
Dawid Weiss
493bd8b42f
LUCENE-3820: optimistic limit on running time for the randomized pattern test. This doesn't eliminate the possibility of hitting an exponential time pattern, but I re-ran it a few times and it seems to be pretty stable.
...
git-svn-id: https://svn.apache.org/repos/asf/lucene/dev/trunk@1294322 13f79535-47bb-0310-9956-ffa450edef68
2012-02-27 20:50:24 +00:00
Dawid Weiss
7be5533989
LUCENE-3820: Wrong trailing index calculation in PatternReplaceCharFilter.
...
git-svn-id: https://svn.apache.org/repos/asf/lucene/dev/trunk@1294141 13f79535-47bb-0310-9956-ffa450edef68
2012-02-27 13:13:10 +00:00
Tommaso Teofili
482c0610fd
[LUCENE-3731] - refactored analyzeText method to initializeIterator and made it abstract inside BaseUIMATokenizer
...
git-svn-id: https://svn.apache.org/repos/asf/lucene/dev/trunk@1293614 13f79535-47bb-0310-9956-ffa450edef68
2012-02-25 14:14:00 +00:00
Tommaso Teofili
930816cc5b
LUCENE-3731 - AEProviderFactory getAEProvider logic cleaned
...
git-svn-id: https://svn.apache.org/repos/asf/lucene/dev/trunk@1292585 13f79535-47bb-0310-9956-ffa450edef68
2012-02-22 23:39:51 +00:00
Robert Muir
e51795be39
LUCENE-3731: remove unnecessary code
...
git-svn-id: https://svn.apache.org/repos/asf/lucene/dev/trunk@1244714 13f79535-47bb-0310-9956-ffa450edef68
2012-02-15 20:53:53 +00:00
Robert Muir
c97e3edbb9
LUCENE-3731: performance improvements and thread safety fixes to UIMA tokenizers
...
git-svn-id: https://svn.apache.org/repos/asf/lucene/dev/trunk@1244688 13f79535-47bb-0310-9956-ffa450edef68
2012-02-15 20:29:20 +00:00
Tommaso Teofili
c454ae6a66
[LUCENE-3731] - creating and using simple wst and pos tagger implementations for analyzers' random string testing
...
git-svn-id: https://svn.apache.org/repos/asf/lucene/dev/trunk@1244474 13f79535-47bb-0310-9956-ffa450edef68
2012-02-15 13:17:57 +00:00
Ryan McKinley
cea3acb111
LUCENE-3731: fix javadoc warnings, add uima to eclipse project
...
git-svn-id: https://svn.apache.org/repos/asf/lucene/dev/trunk@1244350 13f79535-47bb-0310-9956-ffa450edef68
2012-02-15 04:41:32 +00:00
Ryan McKinley
8d9bfe9245
LUCENE-3731: adding missing overview.html
...
git-svn-id: https://svn.apache.org/repos/asf/lucene/dev/trunk@1244340 13f79535-47bb-0310-9956-ffa450edef68
2012-02-15 04:01:57 +00:00
Tommaso Teofili
d66d97790b
[LUCENE-3731] - Creating the analysis-uima module for UIMA based tokenizers/analyzers
...
git-svn-id: https://svn.apache.org/repos/asf/lucene/dev/trunk@1244236 13f79535-47bb-0310-9956-ffa450edef68
2012-02-14 22:13:34 +00:00
Dawid Weiss
087f1e3126
LUCENE-3774: Optimized and streamlined license and notice file validation
by refactoring the build task into an ANT task and modifying build scripts
to perform top-level checks. (Dawid Weiss, Steve Rowe, Robert Muir)
git-svn-id: https://svn.apache.org/repos/asf/lucene/dev/trunk@1243527 13f79535-47bb-0310-9956-ffa450edef68
2012-02-13 14:12:59 +00:00
Robert Muir
6a07201844
don't fail test due to jre bugs in String.toLowerCase
...
git-svn-id: https://svn.apache.org/repos/asf/lucene/dev/trunk@1243415 13f79535-47bb-0310-9956-ffa450edef68
2012-02-13 04:50:12 +00:00
Robert Muir
590741dcfe
LUCENE-3766: Remove Tokenizer's default ctor
...
git-svn-id: https://svn.apache.org/repos/asf/lucene/dev/trunk@1242890 13f79535-47bb-0310-9956-ffa450edef68
2012-02-10 19:12:35 +00:00
Robert Muir
8a50cefc6b
LUCENE-3748: EnglishPossessiveFilter did not work with a proper right quotation mark
...
git-svn-id: https://svn.apache.org/repos/asf/lucene/dev/trunk@1242740 13f79535-47bb-0310-9956-ffa450edef68
2012-02-10 11:01:11 +00:00
Robert Muir
9f783ead67
SOLR-3115: improve japanese stopwords.txt description
...
git-svn-id: https://svn.apache.org/repos/asf/lucene/dev/trunk@1242557 13f79535-47bb-0310-9956-ffa450edef68
2012-02-09 22:17:44 +00:00
Robert Muir
509f4c557d
LUCENE-3751: align default japanese configurations for lucene/solr
...
git-svn-id: https://svn.apache.org/repos/asf/lucene/dev/trunk@1242543 13f79535-47bb-0310-9956-ffa450edef68
2012-02-09 21:45:41 +00:00
Robert Muir
72ae3171be
LUCENE-3765: Trappy behavior with StopFilter/ignoreCase
...
git-svn-id: https://svn.apache.org/repos/asf/lucene/dev/trunk@1242497 13f79535-47bb-0310-9956-ffa450edef68
2012-02-09 19:59:50 +00:00
Robert Muir
c0319d5928
SOLR-3056: document expectations in these files
...
git-svn-id: https://svn.apache.org/repos/asf/lucene/dev/trunk@1241960 13f79535-47bb-0310-9956-ffa450edef68
2012-02-08 16:27:47 +00:00
Robert Muir
dac1b58277
SOLR-3097, SOLR-3105: add fieldtypes for different languages to the example
...
git-svn-id: https://svn.apache.org/repos/asf/lucene/dev/trunk@1241878 13f79535-47bb-0310-9956-ffa450edef68
2012-02-08 12:07:52 +00:00
Robert Muir
bef6e3664d
LUCENE-3726: additional tests
...
git-svn-id: https://svn.apache.org/repos/asf/lucene/dev/trunk@1240760 13f79535-47bb-0310-9956-ffa450edef68
2012-02-05 16:16:02 +00:00
Robert Muir
03497e7595
LUCENE-3745: add proper Japanese stopping
...
git-svn-id: https://svn.apache.org/repos/asf/lucene/dev/trunk@1240714 13f79535-47bb-0310-9956-ffa450edef68
2012-02-05 13:05:42 +00:00
Robert Muir
009608d9f2
LUCENE-3726: default Kuromoji to search mode
...
git-svn-id: https://svn.apache.org/repos/asf/lucene/dev/trunk@1240710 13f79535-47bb-0310-9956-ffa450edef68
2012-02-05 12:41:13 +00:00
Tommaso Teofili
6d3bb736f3
[LUCENE-3744] - applied patch for whiteList usage in TypeTokenFilter
...
git-svn-id: https://svn.apache.org/repos/asf/lucene/dev/trunk@1240034 13f79535-47bb-0310-9956-ffa450edef68
2012-02-03 09:13:17 +00:00
Michael McCandless
60c36c24fb
don't let prefix's output bleed into full string's output (potential/latent bug)
...
git-svn-id: https://svn.apache.org/repos/asf/lucene/dev/trunk@1239658 13f79535-47bb-0310-9956-ffa450edef68
2012-02-02 15:01:13 +00:00
Robert Muir
995c5b9ef1
LUCENE-3730: improve Kuromoji search mode heuristics
...
git-svn-id: https://svn.apache.org/repos/asf/lucene/dev/trunk@1239061 13f79535-47bb-0310-9956-ffa450edef68
2012-02-01 11:03:17 +00:00
Michael McCandless
8e40ea5bf8
LUCENE-3742: fix token offset for hangs-off-end output in SynonymFilter
...
git-svn-id: https://svn.apache.org/repos/asf/lucene/dev/trunk@1238851 13f79535-47bb-0310-9956-ffa450edef68
2012-01-31 23:01:55 +00:00
Uwe Schindler
10ba9abeb2
Reverse merged revision(s) from lucene/dev/trunk up to 1237502
...
git-svn-id: https://svn.apache.org/repos/asf/lucene/dev/branches/lucene2858@1237505 13f79535-47bb-0310-9956-ffa450edef68
2012-01-29 23:19:05 +00:00
Michael McCandless
d1165b1972
LUCENE-3725: add optional packing to FSTs
...
git-svn-id: https://svn.apache.org/repos/asf/lucene/dev/trunk@1237500 13f79535-47bb-0310-9956-ffa450edef68
2012-01-29 22:48:45 +00:00
Robert Muir
d7fe56ddae
LUCENE-2858: fix analyzer
...
git-svn-id: https://svn.apache.org/repos/asf/lucene/dev/branches/lucene2858@1237312 13f79535-47bb-0310-9956-ffa450edef68
2012-01-29 15:16:04 +00:00
Steven Rowe
97d62cc383
Fix offset array assertion off-by-one
...
git-svn-id: https://svn.apache.org/repos/asf/lucene/dev/trunk@1236912 13f79535-47bb-0310-9956-ffa450edef68
2012-01-27 22:43:48 +00:00
Robert Muir
f640687877
LUCENE-3720: add warning+experimental and disable test
...
git-svn-id: https://svn.apache.org/repos/asf/lucene/dev/trunk@1236341 13f79535-47bb-0310-9956-ffa450edef68
2012-01-26 18:26:07 +00:00
Robert Muir
6edfe4f157
LUCENE-3717: add tests
...
git-svn-id: https://svn.apache.org/repos/asf/lucene/dev/trunk@1235199 13f79535-47bb-0310-9956-ffa450edef68
2012-01-24 10:40:46 +00:00
Robert Muir
35a73d5f55
LUCENE-3717: fix broken offsets in ngramtokenizers, and check return value of Reader.read
...
git-svn-id: https://svn.apache.org/repos/asf/lucene/dev/trunk@1235187 13f79535-47bb-0310-9956-ffa450edef68
2012-01-24 09:50:21 +00:00
Robert Muir
7fafdd3576
LUCENE-3717: add checkRandomData to more analyzers and fix more offsets bugs
...
git-svn-id: https://svn.apache.org/repos/asf/lucene/dev/trunk@1234850 13f79535-47bb-0310-9956-ffa450edef68
2012-01-23 15:19:58 +00:00
Steven Rowe
059410d424
LUCENE-3690: fix handling of unpaired numeric character entity UTF-16 surrogates to output U+FFFD REPLACEMENT CHARACTER; and add handling of properly paired numeric character entity UTF-16 surrogates, to output the corresponding pair of code units.
...
git-svn-id: https://svn.apache.org/repos/asf/lucene/dev/trunk@1234687 13f79535-47bb-0310-9956-ffa450edef68
2012-01-23 07:36:38 +00:00
Robert Muir
c754c1c9c8
LUCENE-3717: add better offsets testing to BaseTokenStreamTestCase, fix offsets bugs in ThaiWordFilter and ICUTokenizer
...
git-svn-id: https://svn.apache.org/repos/asf/lucene/dev/trunk@1234652 13f79535-47bb-0310-9956-ffa450edef68
2012-01-23 00:08:52 +00:00
Robert Muir
a7cfee6b07
SOLR-2891: fix CompoundWordTokenFilter to not create invalid offsets when the length of the text was changed by a previous filter
...
git-svn-id: https://svn.apache.org/repos/asf/lucene/dev/trunk@1234546 13f79535-47bb-0310-9956-ffa450edef68
2012-01-22 16:41:06 +00:00
Steven Rowe
f3a363708f
LUCENE-3690: Re-implemented HTMLStripCharFilter as a JFlex-generated scanner. Fixes LUCENE-2208, SOLR-882, and SOLR-42.
...
git-svn-id: https://svn.apache.org/repos/asf/lucene/dev/trunk@1234452 13f79535-47bb-0310-9956-ffa450edef68
2012-01-22 05:20:46 +00:00
Uwe Schindler
af9b4d816f
LUCENE-3671: Add TypeTokenFilter that filters tokens based on their TypeAttribute
...
git-svn-id: https://svn.apache.org/repos/asf/lucene/dev/trunk@1234396 13f79535-47bb-0310-9956-ffa450edef68
2012-01-21 19:02:44 +00:00
Robert Muir
e869b1fbf7
LUCENE-3700: give enough ram so that you can build naist-jdic with java 5
...
git-svn-id: https://svn.apache.org/repos/asf/lucene/dev/trunk@1232274 13f79535-47bb-0310-9956-ffa450edef68
2012-01-17 02:27:31 +00:00
Robert Muir
f562a8a0dc
LUCENE-3700: optionally support naist-jdic for kuromoji
...
git-svn-id: https://svn.apache.org/repos/asf/lucene/dev/trunk@1232268 13f79535-47bb-0310-9956-ffa450edef68
2012-01-17 02:20:24 +00:00
Robert Muir
48c01e5a2b
LUCENE-3699: share baseform with surface and flag if the reading can be computed from surface
...
git-svn-id: https://svn.apache.org/repos/asf/lucene/dev/trunk@1232265 13f79535-47bb-0310-9956-ffa450edef68
2012-01-17 02:12:27 +00:00
Robert Muir
c902f63125
unbreak clover/nightly builds until we do this right
...
git-svn-id: https://svn.apache.org/repos/asf/lucene/dev/trunk@1232254 13f79535-47bb-0310-9956-ffa450edef68
2012-01-17 01:37:28 +00:00
Robert Muir
12c9b8b4bf
LUCENE-3699: simplify dictionary access and reduce tokeninfodictionary 1.5MB
...
git-svn-id: https://svn.apache.org/repos/asf/lucene/dev/trunk@1232120 13f79535-47bb-0310-9956-ffa450edef68
2012-01-16 19:19:48 +00:00
Robert Muir
354a3be78f
LUCENE-3696: fix dictionary construction to work on java5
...
git-svn-id: https://svn.apache.org/repos/asf/lucene/dev/trunk@1232012 13f79535-47bb-0310-9956-ffa450edef68
2012-01-16 14:50:09 +00:00
Michael McCandless
87bc4521c2
LUCENE-3695: move some confusing FST sugar out
...
git-svn-id: https://svn.apache.org/repos/asf/lucene/dev/trunk@1231795 13f79535-47bb-0310-9956-ffa450edef68
2012-01-15 23:25:38 +00:00
Michael McCandless
11f33ee521
LUCENE-3684: add offsets to postings APIs
...
git-svn-id: https://svn.apache.org/repos/asf/lucene/dev/trunk@1231794 13f79535-47bb-0310-9956-ffa450edef68
2012-01-15 23:17:45 +00:00
Robert Muir
fbd34b4390
cleanups to 4.x CHANGES
...
git-svn-id: https://svn.apache.org/repos/asf/lucene/dev/trunk@1231552 13f79535-47bb-0310-9956-ffa450edef68
2012-01-14 18:24:48 +00:00
Yonik Seeley
b2a0040e98
tests: silliness
...
git-svn-id: https://svn.apache.org/repos/asf/lucene/dev/trunk@1231526 13f79535-47bb-0310-9956-ffa450edef68
2012-01-14 16:59:36 +00:00
Michael McCandless
5ca66287ea
woops
...
git-svn-id: https://svn.apache.org/repos/asf/lucene/dev/trunk@1231513 13f79535-47bb-0310-9956-ffa450edef68
2012-01-14 15:20:58 +00:00
Michael McCandless
d584f6361d
LUCENE-3685: add ToChildBlockJoinQuery, to join from parent to child
...
git-svn-id: https://svn.apache.org/repos/asf/lucene/dev/trunk@1231512 13f79535-47bb-0310-9956-ffa450edef68
2012-01-14 15:17:04 +00:00
Robert Muir
8b8c2b4dee
LUCENE-3690: since this filter handles all kinds of bad partial and weird input, this should be fine to enable
...
git-svn-id: https://svn.apache.org/repos/asf/lucene/dev/trunk@1231272 13f79535-47bb-0310-9956-ffa450edef68
2012-01-13 19:46:30 +00:00
Robert Muir
2ff4bdb04f
enable assertions when executing various tools
...
git-svn-id: https://svn.apache.org/repos/asf/lucene/dev/trunk@1231013 13f79535-47bb-0310-9956-ffa450edef68
2012-01-13 11:36:50 +00:00
Robert Muir
05a65507af
LUCENE-3305: optimization, don't retrieve the base form twice in this filter
...
git-svn-id: https://svn.apache.org/repos/asf/lucene/dev/trunk@1230769 13f79535-47bb-0310-9956-ffa450edef68
2012-01-12 20:36:58 +00:00
Robert Muir
4ebdc0872a
LUCENE-3305: sorry Mike (thanks for the help with the FST optimization)
...
git-svn-id: https://svn.apache.org/repos/asf/lucene/dev/trunk@1230756 13f79535-47bb-0310-9956-ffa450edef68
2012-01-12 20:24:40 +00:00
Robert Muir
cd372bdc83
LUCENE-3305: add Kuromoji Japanese morphological analyzer
...
git-svn-id: https://svn.apache.org/repos/asf/lucene/dev/trunk@1230748 13f79535-47bb-0310-9956-ffa450edef68
2012-01-12 20:10:48 +00:00
Simon Willnauer
3b8458f6de
use TEST_VERSION_CURRENT instead of 4_0 in test
...
git-svn-id: https://svn.apache.org/repos/asf/lucene/dev/trunk@1229523 13f79535-47bb-0310-9956-ffa450edef68
2012-01-10 12:46:38 +00:00
Simon Willnauer
f19317d318
SOLR-3020: Add KeywordAttribute support to HunspellStemFilter
...
git-svn-id: https://svn.apache.org/repos/asf/lucene/dev/trunk@1229519 13f79535-47bb-0310-9956-ffa450edef68
2012-01-10 12:33:29 +00:00
Michael McCandless
cdb2ee8a7b
LUCENE-3679: replace IR.getFieldNames with IR.getFieldInfos
...
git-svn-id: https://svn.apache.org/repos/asf/lucene/dev/trunk@1229401 13f79535-47bb-0310-9956-ffa450edef68
2012-01-09 22:29:40 +00:00
Michael McCandless
defd51a11b
fix syn test bug
...
git-svn-id: https://svn.apache.org/repos/asf/lucene/dev/trunk@1228704 13f79535-47bb-0310-9956-ffa450edef68
2012-01-07 19:28:07 +00:00
Michael McCandless
ed9f0fd5ef
LUCENE-3668: if there's only 1 output for a synonym rule then set start/endOffset to match the full span of the input tokens
...
git-svn-id: https://svn.apache.org/repos/asf/lucene/dev/trunk@1228650 13f79535-47bb-0310-9956-ffa450edef68
2012-01-07 16:26:15 +00:00
Steven Rowe
76d1662cb7
- Added license declaration
- Removed unused 'length' param to combine()
git-svn-id: https://svn.apache.org/repos/asf/lucene/dev/trunk@1225615 13f79535-47bb-0310-9956-ffa450edef68
2011-12-29 18:53:10 +00:00
Robert Muir
b2970db4bc
LUCENE-2906: filter to process output of Standard/ICUTokenizer and create overlapping bigrams for CJK
...
git-svn-id: https://svn.apache.org/repos/asf/lucene/dev/trunk@1225433 13f79535-47bb-0310-9956-ffa450edef68
2011-12-29 05:04:49 +00:00
Robert Muir
e2f81e84f2
SOLR-2982: workaround bug in sun javadoc
...
git-svn-id: https://svn.apache.org/repos/asf/lucene/dev/trunk@1225228 13f79535-47bb-0310-9956-ffa450edef68
2011-12-28 16:51:23 +00:00
Robert Muir
f3869ef3ce
SOLR-2982: add Beider-Morse phonetic filter
...
git-svn-id: https://svn.apache.org/repos/asf/lucene/dev/trunk@1225211 13f79535-47bb-0310-9956-ffa450edef68
2011-12-28 16:00:52 +00:00
Robert Muir
a55f511a77
LUCENE-3650: move o.a.l.index.codecs to o.a.l.codecs
...
git-svn-id: https://svn.apache.org/repos/asf/lucene/dev/trunk@1215245 13f79535-47bb-0310-9956-ffa450edef68
2011-12-16 19:03:12 +00:00
Robert Muir
18febd69e4
LUCENE-2208: improve charfilter offset testing
...
git-svn-id: https://svn.apache.org/repos/asf/lucene/dev/trunk@1215038 13f79535-47bb-0310-9956-ffa450edef68
2011-12-16 04:37:47 +00:00
Steven Rowe
60929a5adb
LUCENE-3645: Remove unnecessary array wrapping when calling varargs methods
...
git-svn-id: https://svn.apache.org/repos/asf/lucene/dev/trunk@1214413 13f79535-47bb-0310-9956-ffa450edef68
2011-12-14 19:15:47 +00:00
Robert Muir
7dc025bdce
LUCENE-3642: fix invalid offsets from CharTokenizer, [Edge]NGramFilters, SmartChinese, add sanity check to BaseTokenStreamTestCase
...
git-svn-id: https://svn.apache.org/repos/asf/lucene/dev/trunk@1213329 13f79535-47bb-0310-9956-ffa450edef68
2011-12-12 17:28:09 +00:00
Robert Muir
3899e18ca3
LUCENE-3640: Remove IndexSearcher.close
...
git-svn-id: https://svn.apache.org/repos/asf/lucene/dev/trunk@1213117 13f79535-47bb-0310-9956-ffa450edef68
2011-12-12 00:21:40 +00:00
Uwe Schindler
905a0f211c
LUCENE-3606: Make IndexReader really read-only
...
git-svn-id: https://svn.apache.org/repos/asf/lucene/dev/trunk@1212292 13f79535-47bb-0310-9956-ffa450edef68
2011-12-09 09:13:39 +00:00
Robert Muir
9b15b1d3b0
consolidate assumes in ThaiAnalyzer test so we don't miss it for individual tests
...
git-svn-id: https://svn.apache.org/repos/asf/lucene/dev/trunk@1212141 13f79535-47bb-0310-9956-ffa450edef68
2011-12-08 21:47:12 +00:00
Robert Muir
3843ac5b8b
LUCENE-3606: fix more tests
...
git-svn-id: https://svn.apache.org/repos/asf/lucene/dev/branches/lucene3606@1210308 13f79535-47bb-0310-9956-ffa450edef68
2011-12-05 01:59:11 +00:00
Michael McCandless
961b820e53
LUCENE-2929: specify up front if you need freqs from DocsEnum
...
git-svn-id: https://svn.apache.org/repos/asf/lucene/dev/trunk@1210176 13f79535-47bb-0310-9956-ffa450edef68
2011-12-04 18:50:58 +00:00
Chris M. Hostetter
3ed5106920
SOLR-2819: Improved speed of parsing hex entities in HTMLStripCharFilter
...
git-svn-id: https://svn.apache.org/repos/asf/lucene/dev/trunk@1208032 13f79535-47bb-0310-9956-ffa450edef68
2011-11-29 19:15:54 +00:00
Robert Muir
7f766cf603
LUCENE-3590: nuke BytesRef.utf8ToChars
...
git-svn-id: https://svn.apache.org/repos/asf/lucene/dev/trunk@1206174 13f79535-47bb-0310-9956-ffa450edef68
2011-11-25 13:55:41 +00:00
Robert Muir
3b6da22aa7
LUCENE-3590: clearly mark bogus deep-copying apis in BytesRef
...
git-svn-id: https://svn.apache.org/repos/asf/lucene/dev/trunk@1206143 13f79535-47bb-0310-9956-ffa450edef68
2011-11-25 12:50:13 +00:00
Robert Muir
873f199924
LUCENE-2621: move TermVectors,FieldInfos,SegmentInfos to codec
...
git-svn-id: https://svn.apache.org/repos/asf/lucene/dev/trunk@1202842 13f79535-47bb-0310-9956-ffa450edef68
2011-11-16 19:09:35 +00:00
Robert Muir
598920d7bd
LUCENE-3571: nuke IndexSearcher(Directory)
...
git-svn-id: https://svn.apache.org/repos/asf/lucene/dev/trunk@1202657 13f79535-47bb-0310-9956-ffa450edef68
2011-11-16 12:19:41 +00:00
Simon Willnauer
ee293e7e7d
fix javadoc
...
git-svn-id: https://svn.apache.org/repos/asf/lucene/dev/trunk@1200111 13f79535-47bb-0310-9956-ffa450edef68
2011-11-10 03:32:33 +00:00
Simon Willnauer
c0a7abbec0
LUCENE-2564: Cut over WordListLoader to CharArrayMap/Set and use CharSetDecoder to detect encoding problems early
...
git-svn-id: https://svn.apache.org/repos/asf/lucene/dev/trunk@1200091 13f79535-47bb-0310-9956-ffa450edef68
2011-11-10 01:52:48 +00:00
Simon Willnauer
dc6b4b6533
LUCENE-2564: Cut over WordListLoader to CharArrayMap/Set and use CharSetDecoder to detect encoding problems early
...
git-svn-id: https://svn.apache.org/repos/asf/lucene/dev/trunk@1200080 13f79535-47bb-0310-9956-ffa450edef68
2011-11-10 01:21:25 +00:00
Robert Muir
fa6500fa6c
LUCENE-3490: restructure codec hierarchy
...
git-svn-id: https://svn.apache.org/repos/asf/lucene/dev/trunk@1197603 13f79535-47bb-0310-9956-ffa450edef68
2011-11-04 15:43:35 +00:00
Robert Muir
d5601eb371
SOLR-2276: Support for cologne phonetic
...
git-svn-id: https://svn.apache.org/repos/asf/lucene/dev/trunk@1195082 13f79535-47bb-0310-9956-ffa450edef68
2011-10-30 01:00:06 +00:00
Uwe Schindler
a91efbedd1
LUCENE-3530: Remove deprecated methods in CompoundTokenFilters
...
git-svn-id: https://svn.apache.org/repos/asf/lucene/dev/trunk@1188613 13f79535-47bb-0310-9956-ffa450edef68
2011-10-25 11:31:16 +00:00
Uwe Schindler
ec186e7280
LUCENE-3508: Decompounders based on CompoundWordTokenFilterBase can now be used with custom attributes. All those attributes are preserved and set on all added decompounded tokens
...
git-svn-id: https://svn.apache.org/repos/asf/lucene/dev/trunk@1188597 13f79535-47bb-0310-9956-ffa450edef68
2011-10-25 10:44:36 +00:00
Robert Muir
f21ac2f58c
LUCENE-3301: add workaround for jre breakiterator bugs
...
git-svn-id: https://svn.apache.org/repos/asf/lucene/dev/trunk@1187900 13f79535-47bb-0310-9956-ffa450edef68
2011-10-23 14:55:25 +00:00
Robert Muir
7af9fbd16d
LUCENE-3521: upgrade icu jar to 4.8.1.1 / remove lucenetestcase hack
...
git-svn-id: https://svn.apache.org/repos/asf/lucene/dev/trunk@1183738 13f79535-47bb-0310-9956-ffa450edef68
2011-10-15 21:48:50 +00:00
Robert Muir
9ba4ce2ed5
javadocs fixes
...
git-svn-id: https://svn.apache.org/repos/asf/lucene/dev/trunk@1182505 13f79535-47bb-0310-9956-ffa450edef68
2011-10-12 18:20:41 +00:00
Jan Høydahl
22dcd39d9e
SOLR-2792: Allow case insensitive Hunspell stemming
...
git-svn-id: https://svn.apache.org/repos/asf/lucene/dev/trunk@1179459 13f79535-47bb-0310-9956-ffa450edef68
2011-10-05 22:08:55 +00:00
Michael McCandless
ec2b654231
LUCENE-3477: add explicit breaks in jflex sources so we don't hit compiler warnings; fix a couple other warnings
...
git-svn-id: https://svn.apache.org/repos/asf/lucene/dev/trunk@1177723 13f79535-47bb-0310-9956-ffa450edef68
2011-09-30 16:23:24 +00:00
Christopher John Male
8d28270460
LUCENE-3470: Changed Field constructor signatures order to value, fieldtype
...
git-svn-id: https://svn.apache.org/repos/asf/lucene/dev/trunk@1176773 13f79535-47bb-0310-9956-ffa450edef68
2011-09-28 08:07:16 +00:00
Christopher John Male
67c13bd2fe
LUCENE-3455: Renamed Analyzer.reusableTokenStream to Analyzer.tokenStream
...
git-svn-id: https://svn.apache.org/repos/asf/lucene/dev/trunk@1176728 13f79535-47bb-0310-9956-ffa450edef68
2011-09-28 05:26:54 +00:00
Christopher John Male
0bed3142bb
LUCENE-3455: Test Analysis consumers now use reusableTokenStream
...
git-svn-id: https://svn.apache.org/repos/asf/lucene/dev/trunk@1175670 13f79535-47bb-0310-9956-ffa450edef68
2011-09-26 04:58:48 +00:00
Christopher John Male
4ff0b2f82c
LUCENE-3396: Collapsing Analyzer and ReusableAnalyzerBase together, mandating use of TokenStreamComponents
...
git-svn-id: https://svn.apache.org/repos/asf/lucene/dev/trunk@1175297 13f79535-47bb-0310-9956-ffa450edef68
2011-09-25 05:10:25 +00:00
Christopher John Male
318911200d
LUCENE-3434: Removed state changing setters in ShingleAnalyzerWrapper and PerFieldAnalyzerWrapper
...
git-svn-id: https://svn.apache.org/repos/asf/lucene/dev/trunk@1170942 13f79535-47bb-0310-9956-ffa450edef68
2011-09-15 03:21:17 +00:00
Christopher John Male
94028fe11a
LUCENE-3431: Removed deprecated addStopwords methods in QueryAutoStopWordAnalyzer
...
git-svn-id: https://svn.apache.org/repos/asf/lucene/dev/trunk@1170424 13f79535-47bb-0310-9956-ffa450edef68
2011-09-14 03:33:50 +00:00
Christopher John Male
3597bc4bf4
LUCENE-3396: Converted simple Analyzers which got lost in merging
...
git-svn-id: https://svn.apache.org/repos/asf/lucene/dev/trunk@1169654 13f79535-47bb-0310-9956-ffa450edef68
2011-09-12 09:00:42 +00:00
Christopher John Male
4c5606ee29
LUCENE-3396: Converted most Analyzers over to using ReusableAnalyzerBase
...
git-svn-id: https://svn.apache.org/repos/asf/lucene/dev/trunk@1169607 13f79535-47bb-0310-9956-ffa450edef68
2011-09-12 05:50:26 +00:00
Robert Muir
a027a35583
nocommit -> TODO
...
git-svn-id: https://svn.apache.org/repos/asf/lucene/dev/trunk@1169474 13f79535-47bb-0310-9956-ffa450edef68
2011-09-11 16:39:59 +00:00
Christopher John Male
e3172b9239
LUCENE-3414: Added Hunspell for Lucene
...
git-svn-id: https://svn.apache.org/repos/asf/lucene/dev/trunk@1167467 13f79535-47bb-0310-9956-ffa450edef68
2011-09-10 06:00:39 +00:00
Robert Muir
128aaf8387
LUCENE-3410: move changes to 3.5 and nuke deprecated code in trunk
...
git-svn-id: https://svn.apache.org/repos/asf/lucene/dev/trunk@1166770 13f79535-47bb-0310-9956-ffa450edef68
2011-09-08 15:56:01 +00:00
Robert Muir
b265d499f2
LUCENE-3417: DictionaryCompoundWordFilter did not properly add tokens from the end compound word
...
git-svn-id: https://svn.apache.org/repos/asf/lucene/dev/trunk@1166728 13f79535-47bb-0310-9956-ffa450edef68
2011-09-08 14:59:15 +00:00
Christopher John Male
4b44bd7d83
LUCENE-3410: Deprecated multi-int constructors in WordDelimiterFilter. Now uses int bitfield
...
git-svn-id: https://svn.apache.org/repos/asf/lucene/dev/trunk@1165995 13f79535-47bb-0310-9956-ffa450edef68
2011-09-07 04:43:10 +00:00
Michael McCandless
4dad0ba89f
LUCENE-2308: cutover to FieldType
...
git-svn-id: https://svn.apache.org/repos/asf/lucene/dev/trunk@1162347 13f79535-47bb-0310-9956-ffa450edef68
2011-08-27 13:27:01 +00:00
Christopher John Male
0f2d7ad556
LUCENE-3397: Cleaned up remaining test TSs and PatternAnalyzer
...
git-svn-id: https://svn.apache.org/repos/asf/lucene/dev/trunk@1161986 13f79535-47bb-0310-9956-ffa450edef68
2011-08-26 04:16:19 +00:00
Christopher John Male
1057d24e7f
LUCENE-3400: Removed DutchAnalyzer.setStemDictionary
...
git-svn-id: https://svn.apache.org/repos/asf/lucene/dev/trunk@1161484 13f79535-47bb-0310-9956-ffa450edef68
2011-08-25 10:32:21 +00:00
Christopher John Male
0ef9c3c25f
LUCENE-3376: Moved ReusableAnalyzerBase to core
...
git-svn-id: https://svn.apache.org/repos/asf/lucene/dev/trunk@1160117 13f79535-47bb-0310-9956-ffa450edef68
2011-08-22 06:01:31 +00:00
Robert Muir
a5d2d78cec
LUCENE-3378: nuke another useless custom test-classpath
...
git-svn-id: https://svn.apache.org/repos/asf/lucene/dev/trunk@1158857 13f79535-47bb-0310-9956-ffa450edef68
2011-08-17 18:21:41 +00:00
Robert Muir
efbdae6dd2
LUCENE-3378: remove unneeded special test-classpaths in build
...
git-svn-id: https://svn.apache.org/repos/asf/lucene/dev/trunk@1158821 13f79535-47bb-0310-9956-ffa450edef68
2011-08-17 16:45:37 +00:00
Robert Muir
99ac972281
LUCENE-3378: move collationtestbase to tests-framework
...
git-svn-id: https://svn.apache.org/repos/asf/lucene/dev/trunk@1158819 13f79535-47bb-0310-9956-ffa450edef68
2011-08-17 16:43:13 +00:00
Robert Muir
8a0578dfe2
LUCENE-3378: move VocabularyAssert to test-framework
...
git-svn-id: https://svn.apache.org/repos/asf/lucene/dev/trunk@1158730 13f79535-47bb-0310-9956-ffa450edef68
2011-08-17 14:19:15 +00:00
Robert Muir
7eab19aff7
LUCENE-3375: fix synonyms bug where keepOrig=false would discard unmatched inputs
...
git-svn-id: https://svn.apache.org/repos/asf/lucene/dev/trunk@1158342 13f79535-47bb-0310-9956-ffa450edef68
2011-08-16 16:01:05 +00:00
Robert Muir
f7237cb165
LUCENE-3361: remove api deprecations in trunk
...
git-svn-id: https://svn.apache.org/repos/asf/lucene/dev/trunk@1154943 13f79535-47bb-0310-9956-ffa450edef68
2011-08-08 12:17:33 +00:00
Robert Muir
ef56f5d551
LUCENE-3361: port url+email tokenizer to standardtokenizerinterface, fix combining marks bug
...
git-svn-id: https://svn.apache.org/repos/asf/lucene/dev/trunk@1154936 13f79535-47bb-0310-9956-ffa450edef68
2011-08-08 11:57:59 +00:00
Robert Muir
2dda5bc35f
LUCENE-3358: StandardTokenizer wrongly discarded combining marks attached to Han/Hiragana
...
git-svn-id: https://svn.apache.org/repos/asf/lucene/dev/trunk@1154005 13f79535-47bb-0310-9956-ffa450edef68
2011-08-04 20:49:47 +00:00
Steven Rowe
23d22e4d47
LUCENE-3337: avoid building jar files unless necessary in build
...
git-svn-id: https://svn.apache.org/repos/asf/lucene/dev/trunk@1151720 13f79535-47bb-0310-9956-ffa450edef68
2011-07-28 04:02:09 +00:00
Uwe Schindler
014dee7cf5
revert accidental commit
...
git-svn-id: https://svn.apache.org/repos/asf/lucene/dev/trunk@1150488 13f79535-47bb-0310-9956-ffa450edef68
2011-07-24 21:22:07 +00:00
Uwe Schindler
9c73f9d03b
LUCENE-3336: Speed up javadocs-all builds by minimizing compile costs
...
git-svn-id: https://svn.apache.org/repos/asf/lucene/dev/trunk@1150486 13f79535-47bb-0310-9956-ffa450edef68
2011-07-24 21:19:42 +00:00
Robert Muir
3626220146
use a different character for test, one that is still enclosing mark in 6.0
...
git-svn-id: https://svn.apache.org/repos/asf/lucene/dev/trunk@1150091 13f79535-47bb-0310-9956-ffa450edef68
2011-07-23 12:18:48 +00:00
Michael McCandless
fbf9f4ccad
LUCENE-3289: add options to FST Builder to tradeoff RAM/CPU used during build vs how small the resulting FST is
...
git-svn-id: https://svn.apache.org/repos/asf/lucene/dev/trunk@1145292 13f79535-47bb-0310-9956-ffa450edef68
2011-07-11 18:53:13 +00:00
Robert Muir
015ecfa0a0
LUCENE-3233: improve ram/perf of SynonymFilter, add wordnet parsing, nuke contrib/wordnet
...
git-svn-id: https://svn.apache.org/repos/asf/lucene/dev/trunk@1145158 13f79535-47bb-0310-9956-ffa450edef68
2011-07-11 12:58:52 +00:00
Christopher John Male
5f30bedccc
LUCENE-3283: Moved core QueryParsers to queryparser module
...
git-svn-id: https://svn.apache.org/repos/asf/lucene/dev/trunk@1145016 13f79535-47bb-0310-9956-ffa450edef68
2011-07-11 03:37:00 +00:00
Steven Rowe
9e020991ef
Merged with trunk up to r1144714
...
git-svn-id: https://svn.apache.org/repos/asf/lucene/dev/branches/solr2452@1144715 13f79535-47bb-0310-9956-ffa450edef68
2011-07-09 18:50:54 +00:00
Christopher John Male
f16f395a30
LUCENE-3284: Decoupled remaining module/contrib tests from QueryParser
...
git-svn-id: https://svn.apache.org/repos/asf/lucene/dev/trunk@1144566 13f79535-47bb-0310-9956-ffa450edef68
2011-07-09 01:11:18 +00:00
Steven Rowe
88fe5d121f
Merged with trunk
...
git-svn-id: https://svn.apache.org/repos/asf/lucene/dev/branches/solr2452@1144510 13f79535-47bb-0310-9956-ffa450edef68
2011-07-08 21:02:20 +00:00
Simon Willnauer
6c5621f16c
fixed dead store variable
...
git-svn-id: https://svn.apache.org/repos/asf/lucene/dev/trunk@1144269 13f79535-47bb-0310-9956-ffa450edef68
2011-07-08 11:26:03 +00:00
Steven Rowe
4505c08643
SOLR-2452: merged with trunk up r1144161; applied the svn movement script and the latest version of the post-svn-movement patch
...
git-svn-id: https://svn.apache.org/repos/asf/lucene/dev/branches/solr2452@1144174 13f79535-47bb-0310-9956-ffa450edef68
2011-07-08 06:41:23 +00:00
Michael McCandless
b55eeb510d
LUCENE-3246: invert getDelDocs to getLiveDocs as precursor for LUCENE-1536
...
git-svn-id: https://svn.apache.org/repos/asf/lucene/dev/trunk@1143415 13f79535-47bb-0310-9956-ffa450edef68
2011-07-06 13:54:38 +00:00
Dawid Weiss
796fa6def3
JavaDoc warnings squashed.
...
git-svn-id: https://svn.apache.org/repos/asf/lucene/dev/trunk@1141689 13f79535-47bb-0310-9956-ffa450edef68
2011-06-30 19:52:31 +00:00
Dawid Weiss
dab351a096
Changing the licensing from CC-SA (approved by Apache anyway, but we don't want any issues) to MPL (we've got an agreement from Marcin Milkowski; the license statement has been updated in Morfologik's repository too).
...
git-svn-id: https://svn.apache.org/repos/asf/lucene/dev/trunk@1141673 13f79535-47bb-0310-9956-ffa450edef68
2011-06-30 19:16:32 +00:00
Dawid Weiss
29b09032d3
LUCENE-2341: integrating morfologik (Polish stemming/ morphosyntactic dictionary) as an analysis module.
...
git-svn-id: https://svn.apache.org/repos/asf/lucene/dev/trunk@1141671 13f79535-47bb-0310-9956-ffa450edef68
2011-06-30 19:12:54 +00:00
Dawid Weiss
f85c4e7c88
Reverting 1141022 (needs to wait for 1.6 support).
...
git-svn-id: https://svn.apache.org/repos/asf/lucene/dev/trunk@1141032 13f79535-47bb-0310-9956-ffa450edef68
2011-06-29 10:00:36 +00:00
Dawid Weiss
d188d3df90
LUCENE-2341: integrating morfologik (Polish stemming/ morphosyntactic dictionary) as an analysis module.
...
git-svn-id: https://svn.apache.org/repos/asf/lucene/dev/trunk@1141022 13f79535-47bb-0310-9956-ffa450edef68
2011-06-29 09:24:14 +00:00
Christopher John Male
f9ed2c19cd
LUCENE-3219: Moved SortField types to Enum
...
git-svn-id: https://svn.apache.org/repos/asf/lucene/dev/trunk@1138276 13f79535-47bb-0310-9956-ffa450edef68
2011-06-22 01:48:45 +00:00
Robert Muir
eca56e0564
LUCENE-152: minor optimization to avoid some char[]/String creation
...
git-svn-id: https://svn.apache.org/repos/asf/lucene/dev/trunk@1134328 13f79535-47bb-0310-9956-ffa450edef68
2011-06-10 14:00:32 +00:00
Simon Willnauer
2007a4b4e0
Remove @Version tags from JavaDoc
...
git-svn-id: https://svn.apache.org/repos/asf/lucene/dev/trunk@1133805 13f79535-47bb-0310-9956-ffa450edef68
2011-06-09 11:43:35 +00:00
Steven Rowe
d2cc7f1330
LUCENE-3149: Switch ICU4J dependency to mavenized version
...
git-svn-id: https://svn.apache.org/repos/asf/lucene/dev/trunk@1130718 13f79535-47bb-0310-9956-ffa450edef68
2011-06-02 18:42:26 +00:00
Steven Rowe
8428aa9c0d
LUCENE-3149: Updated ICU4J notice
...
git-svn-id: https://svn.apache.org/repos/asf/lucene/dev/trunk@1130676 13f79535-47bb-0310-9956-ffa450edef68
2011-06-02 17:34:25 +00:00
Ryan McKinley
50fb06de1a
LUCENE-3149 -- fix maven-dist
...
git-svn-id: https://svn.apache.org/repos/asf/lucene/dev/trunk@1130547 13f79535-47bb-0310-9956-ffa450edef68
2011-06-02 13:42:55 +00:00
Robert Muir
b7277878e8
LUCENE-152: add KStem
...
git-svn-id: https://svn.apache.org/repos/asf/lucene/dev/trunk@1130527 13f79535-47bb-0310-9956-ffa450edef68
2011-06-02 12:58:22 +00:00
Robert Muir
5fff60467f
LUCENE-3149: upgrade icu to 4.8
...
git-svn-id: https://svn.apache.org/repos/asf/lucene/dev/trunk@1130439 13f79535-47bb-0310-9956-ffa450edef68
2011-06-02 08:58:34 +00:00
Robert Muir
063d18e280
LUCENE-3163: add link to jira versions information to CHANGES.txt files
...
git-svn-id: https://svn.apache.org/repos/asf/lucene/dev/trunk@1129656 13f79535-47bb-0310-9956-ffa450edef68
2011-05-31 13:03:40 +00:00