Michael McCandless
78b4be5dc6
LUCENE-3940: fix Kuromoji to not produce invalid token graph due to UNK with punctuation being decompounded
...
git-svn-id: https://svn.apache.org/repos/asf/lucene/dev/trunk@1311072 13f79535-47bb-0310-9956-ffa450edef68
2012-04-08 19:17:17 +00:00
Michael McCandless
755ebafa49
LUCENE-3873: add MockGraphTokenFilter, inserting random graph tokens
...
git-svn-id: https://svn.apache.org/repos/asf/lucene/dev/trunk@1310910 13f79535-47bb-0310-9956-ffa450edef68
2012-04-07 23:06:12 +00:00
Uwe Schindler
62890c8089
LUCENE-3919: Remove useless loop
...
git-svn-id: https://svn.apache.org/repos/asf/lucene/dev/trunk@1310898 13f79535-47bb-0310-9956-ffa450edef68
2012-04-07 22:33:13 +00:00
Uwe Schindler
bdaa79206d
LUCENE-3919: Die, context class loader, die. Also don't initialize (run static ctors) unrelated classes!
...
@UweSays: "If you get the context classloader from a thread, in most cases you are doing something wrong because you don't understand how Java classloading works."
git-svn-id: https://svn.apache.org/repos/asf/lucene/dev/trunk@1310893 13f79535-47bb-0310-9956-ffa450edef68
2012-04-07 22:27:57 +00:00
Uwe Schindler
7154c5466d
LUCENE-3919: Fix generics and additional checks
...
git-svn-id: https://svn.apache.org/repos/asf/lucene/dev/trunk@1310883 13f79535-47bb-0310-9956-ffa450edef68
2012-04-07 22:00:28 +00:00
Robert Muir
ed485b29ec
add basic charfilter support to TestRandomChains
...
git-svn-id: https://svn.apache.org/repos/asf/lucene/dev/trunk@1310805 13f79535-47bb-0310-9956-ffa450edef68
2012-04-07 17:37:16 +00:00
Robert Muir
fbc8429905
LUCENE-3919: more thorough testing of analysis chains
...
git-svn-id: https://svn.apache.org/repos/asf/lucene/dev/trunk@1310789 13f79535-47bb-0310-9956-ffa450edef68
2012-04-07 15:48:02 +00:00
Chris M. Hostetter
bb7bc2ff44
LUCENE-3945: use sha1 checksums to verify jars pulled from ivy match expectations
...
git-svn-id: https://svn.apache.org/repos/asf/lucene/dev/trunk@1309503 13f79535-47bb-0310-9956-ffa450edef68
2012-04-04 17:53:32 +00:00
Steven Rowe
0a47c9d4d9
nuke obsolete comment
...
git-svn-id: https://svn.apache.org/repos/asf/lucene/dev/trunk@1309393 13f79535-47bb-0310-9956-ffa450edef68
2012-04-04 14:04:50 +00:00
Robert Muir
6c7c89c3f9
LUCENE-1866: add exclusion for bocchan test file
...
git-svn-id: https://svn.apache.org/repos/asf/lucene/dev/trunk@1309255 13f79535-47bb-0310-9956-ffa450edef68
2012-04-04 05:36:52 +00:00
Robert Muir
2fe2e82584
LUCENE-1866: better RAT reporting
...
git-svn-id: https://svn.apache.org/repos/asf/lucene/dev/trunk@1309248 13f79535-47bb-0310-9956-ffa450edef68
2012-04-04 05:03:53 +00:00
Robert Muir
e5448e2e20
LUCENE-3947: fix rat-sources task to work with tools/ directories
...
git-svn-id: https://svn.apache.org/repos/asf/lucene/dev/trunk@1309207 13f79535-47bb-0310-9956-ffa450edef68
2012-04-04 01:51:56 +00:00
Robert Muir
6b16efdc22
LUCENE-3930: kuromoji steals icu's jar
...
git-svn-id: https://svn.apache.org/repos/asf/lucene/dev/trunk@1308423 13f79535-47bb-0310-9956-ffa450edef68
2012-04-02 16:31:59 +00:00
Robert Muir
8f0d7cc135
LUCENE-3930: nuke jars from source tree and use ivy
...
git-svn-id: https://svn.apache.org/repos/asf/lucene/dev/trunk@1307563 13f79535-47bb-0310-9956-ffa450edef68
2012-03-30 18:04:43 +00:00
Ryan McKinley
49f43806a8
LUCENE-2000: remove redundant casts
...
git-svn-id: https://svn.apache.org/repos/asf/lucene/dev/trunk@1307012 13f79535-47bb-0310-9956-ffa450edef68
2012-03-29 17:34:34 +00:00
Michael McCandless
e49b69d459
tests: get JRE bug workaround working for this test again
...
git-svn-id: https://svn.apache.org/repos/asf/lucene/dev/trunk@1306931 13f79535-47bb-0310-9956-ffa450edef68
2012-03-29 15:43:03 +00:00
Ryan McKinley
05fe168961
LUCENE-2000: clone() now returns covariant types where possible.
...
git-svn-id: https://svn.apache.org/repos/asf/lucene/dev/trunk@1306626 13f79535-47bb-0310-9956-ffa450edef68
2012-03-28 22:22:25 +00:00
Christian Moen
ec18632428
Fixed various issues related to config and user dictionaries for Kuromoji (SOLR-3276)
...
git-svn-id: https://svn.apache.org/repos/asf/lucene/dev/trunk@1306476 13f79535-47bb-0310-9956-ffa450edef68
2012-03-28 17:20:48 +00:00
Robert Muir
bca62a44d3
LUCENE-3929: add a test demonstrating this works
...
git-svn-id: https://svn.apache.org/repos/asf/lucene/dev/trunk@1305870 13f79535-47bb-0310-9956-ffa450edef68
2012-03-27 15:16:42 +00:00
Robert Muir
620f9a5739
small opto when charfilter is used: don't call this method twice in end
...
git-svn-id: https://svn.apache.org/repos/asf/lucene/dev/trunk@1305742 13f79535-47bb-0310-9956-ffa450edef68
2012-03-27 06:06:51 +00:00
Robert Muir
ae0f44fcb9
remaining eol-style fixes to trunk, native except .sh (LF)
...
git-svn-id: https://svn.apache.org/repos/asf/lucene/dev/trunk@1305492 13f79535-47bb-0310-9956-ffa450edef68
2012-03-26 18:57:08 +00:00
Robert Muir
a29a14698e
fix eol-style
...
git-svn-id: https://svn.apache.org/repos/asf/lucene/dev/trunk@1305339 13f79535-47bb-0310-9956-ffa450edef68
2012-03-26 12:58:58 +00:00
Christian Moen
f5770479e3
Move and rename Kuromoji (LUCENE-3909)
...
git-svn-id: https://svn.apache.org/repos/asf/lucene/dev/trunk@1305297 13f79535-47bb-0310-9956-ffa450edef68
2012-03-26 10:31:48 +00:00
Robert Muir
35705cc396
LUCENE-3919: fix czechstemmer aioobe on the empty term
...
git-svn-id: https://svn.apache.org/repos/asf/lucene/dev/trunk@1305177 13f79535-47bb-0310-9956-ffa450edef68
2012-03-25 23:40:44 +00:00
Michael McCandless
cb1a9a0cdf
LUCENE-3897: if best scoring path is ahead of current pos, move forward
...
git-svn-id: https://svn.apache.org/repos/asf/lucene/dev/trunk@1305149 13f79535-47bb-0310-9956-ffa450edef68
2012-03-25 21:37:55 +00:00
Michael McCandless
a278ba7a0c
LUCENE-3897: fix silly bug in forced backtrace
...
git-svn-id: https://svn.apache.org/repos/asf/lucene/dev/trunk@1305086 13f79535-47bb-0310-9956-ffa450edef68
2012-03-25 17:51:26 +00:00
Christian Moen
c3ddb9dc67
Added KuromojiReadingFormFilter (LUCENE-3915)
...
git-svn-id: https://svn.apache.org/repos/asf/lucene/dev/trunk@1305046 13f79535-47bb-0310-9956-ffa450edef68
2012-03-25 14:17:23 +00:00
Steven Rowe
fb33754168
LUCENE-3881: Added UAX29URLEmailAnalyzer
...
git-svn-id: https://svn.apache.org/repos/asf/lucene/dev/trunk@1304975 13f79535-47bb-0310-9956-ffa450edef68
2012-03-25 01:20:55 +00:00
Steven Rowe
ada9780484
LUCENE-3913: Fix HTMLStripCharFilter invalid final offset for input containing </br>
...
git-svn-id: https://svn.apache.org/repos/asf/lucene/dev/trunk@1304912 13f79535-47bb-0310-9956-ffa450edef68
2012-03-24 20:54:31 +00:00
Robert Muir
f597b9a1cc
LUCENE-3883: Irish Analyzer
...
git-svn-id: https://svn.apache.org/repos/asf/lucene/dev/trunk@1304836 13f79535-47bb-0310-9956-ffa450edef68
2012-03-24 15:59:04 +00:00
Christian Moen
63f1c48b7d
Added katakana stem filter (LUCENE-3901)
...
git-svn-id: https://svn.apache.org/repos/asf/lucene/dev/trunk@1304719 13f79535-47bb-0310-9956-ffa450edef68
2012-03-24 06:38:53 +00:00
Michael McCandless
7291d38535
LUCENE-3905: sometimes run real-ish content (from LineFileDocs) through the analyzers too; fix end() offset bugs in the ngram tokenizers/filters
...
git-svn-id: https://svn.apache.org/repos/asf/lucene/dev/trunk@1304525 13f79535-47bb-0310-9956-ffa450edef68
2012-03-23 17:39:13 +00:00
Robert Muir
86c2da0eac
happy new year
...
git-svn-id: https://svn.apache.org/repos/asf/lucene/dev/trunk@1303828 13f79535-47bb-0310-9956-ffa450edef68
2012-03-22 15:21:17 +00:00
Robert Muir
c3305a50ff
add some more kuromoji javadocs
...
git-svn-id: https://svn.apache.org/repos/asf/lucene/dev/trunk@1303746 13f79535-47bb-0310-9956-ffa450edef68
2012-03-22 12:21:48 +00:00
Christian Moen
d2eebf9330
Fix for LUCENE-3897 (KuromojiTokenizer fails with large docs)
...
git-svn-id: https://svn.apache.org/repos/asf/lucene/dev/trunk@1303739 13f79535-47bb-0310-9956-ffa450edef68
2012-03-22 11:41:54 +00:00
Robert Muir
a6fd306dfb
add missing license headers
...
git-svn-id: https://svn.apache.org/repos/asf/lucene/dev/trunk@1303738 13f79535-47bb-0310-9956-ffa450edef68
2012-03-22 11:33:45 +00:00
Michael McCandless
1a191f4edc
LUCENE-3898: reset() was missing some state
...
git-svn-id: https://svn.apache.org/repos/asf/lucene/dev/trunk@1303441 13f79535-47bb-0310-9956-ffa450edef68
2012-03-21 15:22:28 +00:00
Robert Muir
fb395f66a3
use MockTokenizer instead of WhitespaceTokenizer for better testing
...
git-svn-id: https://svn.apache.org/repos/asf/lucene/dev/trunk@1303382 13f79535-47bb-0310-9956-ffa450edef68
2012-03-21 13:10:38 +00:00
Michael McCandless
595744089a
LUCENE-3896: CharacterUtils.fill must call Reader.read again if it only got a single high surrogate char on the first read
...
git-svn-id: https://svn.apache.org/repos/asf/lucene/dev/trunk@1303374 13f79535-47bb-0310-9956-ffa450edef68
2012-03-21 12:53:27 +00:00
Robert Muir
f75d40dad5
LUCENE-3894: try toning down for this tokenizer (it builds lots of tokens from the input treated as a path)
...
git-svn-id: https://svn.apache.org/repos/asf/lucene/dev/trunk@1303276 13f79535-47bb-0310-9956-ffa450edef68
2012-03-21 04:30:11 +00:00
Robert Muir
1156de050f
LUCENE-3894: add large docs tests for more tokenizers
...
git-svn-id: https://svn.apache.org/repos/asf/lucene/dev/trunk@1303273 13f79535-47bb-0310-9956-ffa450edef68
2012-03-21 03:59:14 +00:00
Robert Muir
dd7bfc78d9
LUCENE-3894: for tokenizers, add some tests for larger documents
...
git-svn-id: https://svn.apache.org/repos/asf/lucene/dev/trunk@1303258 13f79535-47bb-0310-9956-ffa450edef68
2012-03-21 02:54:07 +00:00
Robert Muir
3d73a3014e
LUCENE-3896: beef up TestDuelingAnalyzers for larger documents
...
git-svn-id: https://svn.apache.org/repos/asf/lucene/dev/trunk@1303253 13f79535-47bb-0310-9956-ffa450edef68
2012-03-21 01:52:22 +00:00
Michael McCandless
c20242721f
LUCENE-3894: some tokenizers weren't reading all input chars
...
git-svn-id: https://svn.apache.org/repos/asf/lucene/dev/trunk@1303193 13f79535-47bb-0310-9956-ffa450edef68
2012-03-20 23:02:37 +00:00
Robert Muir
b7a7e5a625
LUCENE-3889: remove unnecessary/unused base class
...
git-svn-id: https://svn.apache.org/repos/asf/lucene/dev/trunk@1303026 13f79535-47bb-0310-9956-ffa450edef68
2012-03-20 17:28:26 +00:00
Jan Høydahl
5648222e86
SOLR-2764: Fix testcase for minimal stemmer
...
git-svn-id: https://svn.apache.org/repos/asf/lucene/dev/trunk@1302872 13f79535-47bb-0310-9956-ffa450edef68
2012-03-20 13:12:39 +00:00
Jan Høydahl
54d48eb98b
SOLR-2764: Create a NorwegianLightStemmer and NorwegianMinimalStemmer
...
git-svn-id: https://svn.apache.org/repos/asf/lucene/dev/trunk@1302833 13f79535-47bb-0310-9956-ffa450edef68
2012-03-20 10:57:50 +00:00
Robert Muir
790323780f
basic javadocs improvements, mostly simple descriptions where the class had nothing before
...
git-svn-id: https://svn.apache.org/repos/asf/lucene/dev/trunk@1302752 13f79535-47bb-0310-9956-ffa450edef68
2012-03-20 02:09:25 +00:00
Robert Muir
4a2b1d974a
javadocs: add missing package.htmls
...
git-svn-id: https://svn.apache.org/repos/asf/lucene/dev/trunk@1302713 13f79535-47bb-0310-9956-ffa450edef68
2012-03-19 23:20:25 +00:00
Steven Rowe
c4f72f61ac
LUCENE-3880: UAX29URLEmailTokenizer now recognizes emails when the mailto: scheme is prepended.
...
git-svn-id: https://svn.apache.org/repos/asf/lucene/dev/trunk@1302265 13f79535-47bb-0310-9956-ffa450edef68
2012-03-19 03:13:52 +00:00
Robert Muir
3d2d144f92
LUCENE-3848: don't produce tokenstreams that start with posinc=0
...
git-svn-id: https://svn.apache.org/repos/asf/lucene/dev/trunk@1301478 13f79535-47bb-0310-9956-ffa450edef68
2012-03-16 13:06:30 +00:00
Uwe Schindler
3d8b22ffd0
LUCENE-3850: Fix rawtypes warnings for Java 7 compiler ( #2 )
...
git-svn-id: https://svn.apache.org/repos/asf/lucene/dev/trunk@1297162 13f79535-47bb-0310-9956-ffa450edef68
2012-03-05 18:48:04 +00:00
Uwe Schindler
989530e17e
LUCENE-3850: Fix rawtypes warnings for Java 7 compiler
...
git-svn-id: https://svn.apache.org/repos/asf/lucene/dev/trunk@1297048 13f79535-47bb-0310-9956-ffa450edef68
2012-03-05 13:34:40 +00:00
Christian Moen
430365f7cc
Kuromoji now produces both compound words and the segmentation of those words in search mode (LUCENE-3767)
...
git-svn-id: https://svn.apache.org/repos/asf/lucene/dev/trunk@1296805 13f79535-47bb-0310-9956-ffa450edef68
2012-03-04 13:34:13 +00:00
Dawid Weiss
8c2e3cef8f
LUCENE-3820: limiting the amount of input for pattern matching to go past exponential time patterns, even if they happen. A nice catch from Mike too -- un-ignore testNastyPattern and look at processing time go wild with each additional input character...
...
git-svn-id: https://svn.apache.org/repos/asf/lucene/dev/trunk@1294797 13f79535-47bb-0310-9956-ffa450edef68
2012-02-28 19:26:05 +00:00
Dawid Weiss
f3cc65733b
Sysout of the randomized pattern.
...
git-svn-id: https://svn.apache.org/repos/asf/lucene/dev/trunk@1294518 13f79535-47bb-0310-9956-ffa450edef68
2012-02-28 08:15:38 +00:00
Dawid Weiss
4d401ca87d
Test thread's name reflects the current seed.
...
git-svn-id: https://svn.apache.org/repos/asf/lucene/dev/trunk@1294514 13f79535-47bb-0310-9956-ffa450edef68
2012-02-28 08:04:42 +00:00
Dawid Weiss
493bd8b42f
LUCENE-3820: optimistic limit on running time for the randomized pattern test. This doesn't eliminate the possibility of hitting an exponential time pattern, but I re-ran it a few times and it seems to be pretty stable.
...
git-svn-id: https://svn.apache.org/repos/asf/lucene/dev/trunk@1294322 13f79535-47bb-0310-9956-ffa450edef68
2012-02-27 20:50:24 +00:00
Dawid Weiss
7be5533989
LUCENE-3820: Wrong trailing index calculation in PatternReplaceCharFilter.
...
git-svn-id: https://svn.apache.org/repos/asf/lucene/dev/trunk@1294141 13f79535-47bb-0310-9956-ffa450edef68
2012-02-27 13:13:10 +00:00
Tommaso Teofili
482c0610fd
[LUCENE-3731] - refactored analyzeText method to initializeIterator and made it abstract inside BaseUIMATokenizer
...
git-svn-id: https://svn.apache.org/repos/asf/lucene/dev/trunk@1293614 13f79535-47bb-0310-9956-ffa450edef68
2012-02-25 14:14:00 +00:00
Tommaso Teofili
930816cc5b
LUCENE-3731 - AEProviderFactory getAEProvider logic cleaned
...
git-svn-id: https://svn.apache.org/repos/asf/lucene/dev/trunk@1292585 13f79535-47bb-0310-9956-ffa450edef68
2012-02-22 23:39:51 +00:00
Robert Muir
e51795be39
LUCENE-3731: remove unnecessary code
...
git-svn-id: https://svn.apache.org/repos/asf/lucene/dev/trunk@1244714 13f79535-47bb-0310-9956-ffa450edef68
2012-02-15 20:53:53 +00:00
Robert Muir
c97e3edbb9
LUCENE-3731: performance improvements and thread safety fixes to UIMA tokenizers
...
git-svn-id: https://svn.apache.org/repos/asf/lucene/dev/trunk@1244688 13f79535-47bb-0310-9956-ffa450edef68
2012-02-15 20:29:20 +00:00
Tommaso Teofili
c454ae6a66
[LUCENE-3731] - creating and using simple wst and pos tagger implementations for analyzers' random string testing
...
git-svn-id: https://svn.apache.org/repos/asf/lucene/dev/trunk@1244474 13f79535-47bb-0310-9956-ffa450edef68
2012-02-15 13:17:57 +00:00
Ryan McKinley
cea3acb111
LUCENE-3731: fix javadoc warnings, add uima to eclipse project
...
git-svn-id: https://svn.apache.org/repos/asf/lucene/dev/trunk@1244350 13f79535-47bb-0310-9956-ffa450edef68
2012-02-15 04:41:32 +00:00
Ryan McKinley
8d9bfe9245
LUCENE-3731: adding missing overview.html
...
git-svn-id: https://svn.apache.org/repos/asf/lucene/dev/trunk@1244340 13f79535-47bb-0310-9956-ffa450edef68
2012-02-15 04:01:57 +00:00
Tommaso Teofili
d66d97790b
[LUCENE-3731] - Creating the analysis-uima module for UIMA based tokenizers/analyzers
...
git-svn-id: https://svn.apache.org/repos/asf/lucene/dev/trunk@1244236 13f79535-47bb-0310-9956-ffa450edef68
2012-02-14 22:13:34 +00:00
Dawid Weiss
087f1e3126
LUCENE-3774: Optimized and streamlined license and notice file validation
...
by refactoring the build task into an ANT task and modifying build scripts
to perform top-level checks. (Dawid Weiss, Steve Rowe, Robert Muir)
git-svn-id: https://svn.apache.org/repos/asf/lucene/dev/trunk@1243527 13f79535-47bb-0310-9956-ffa450edef68
2012-02-13 14:12:59 +00:00
Robert Muir
6a07201844
don't fail test due to jre bugs in String.toLowerCase
...
git-svn-id: https://svn.apache.org/repos/asf/lucene/dev/trunk@1243415 13f79535-47bb-0310-9956-ffa450edef68
2012-02-13 04:50:12 +00:00
Robert Muir
590741dcfe
LUCENE-3766: Remove Tokenizer's default ctor
...
git-svn-id: https://svn.apache.org/repos/asf/lucene/dev/trunk@1242890 13f79535-47bb-0310-9956-ffa450edef68
2012-02-10 19:12:35 +00:00
Robert Muir
8a50cefc6b
LUCENE-3748: EnglishPossessiveFilter did not work with a proper right quotation mark
...
git-svn-id: https://svn.apache.org/repos/asf/lucene/dev/trunk@1242740 13f79535-47bb-0310-9956-ffa450edef68
2012-02-10 11:01:11 +00:00
Robert Muir
9f783ead67
SOLR-3115: improve japanese stopwords.txt description
...
git-svn-id: https://svn.apache.org/repos/asf/lucene/dev/trunk@1242557 13f79535-47bb-0310-9956-ffa450edef68
2012-02-09 22:17:44 +00:00
Robert Muir
509f4c557d
LUCENE-3751: align default japanese configurations for lucene/solr
...
git-svn-id: https://svn.apache.org/repos/asf/lucene/dev/trunk@1242543 13f79535-47bb-0310-9956-ffa450edef68
2012-02-09 21:45:41 +00:00
Robert Muir
72ae3171be
LUCENE-3765: Trappy behavior with StopFilter/ignoreCase
...
git-svn-id: https://svn.apache.org/repos/asf/lucene/dev/trunk@1242497 13f79535-47bb-0310-9956-ffa450edef68
2012-02-09 19:59:50 +00:00
Robert Muir
c0319d5928
SOLR-3056: document expectations in these files
...
git-svn-id: https://svn.apache.org/repos/asf/lucene/dev/trunk@1241960 13f79535-47bb-0310-9956-ffa450edef68
2012-02-08 16:27:47 +00:00
Robert Muir
dac1b58277
SOLR-3097, SOLR-3105: add fieldtypes for different languages to the example
...
git-svn-id: https://svn.apache.org/repos/asf/lucene/dev/trunk@1241878 13f79535-47bb-0310-9956-ffa450edef68
2012-02-08 12:07:52 +00:00
Robert Muir
bef6e3664d
LUCENE-3726: additional tests
...
git-svn-id: https://svn.apache.org/repos/asf/lucene/dev/trunk@1240760 13f79535-47bb-0310-9956-ffa450edef68
2012-02-05 16:16:02 +00:00
Robert Muir
03497e7595
LUCENE-3745: add proper Japanese stopping
...
git-svn-id: https://svn.apache.org/repos/asf/lucene/dev/trunk@1240714 13f79535-47bb-0310-9956-ffa450edef68
2012-02-05 13:05:42 +00:00
Robert Muir
009608d9f2
LUCENE-3726: default Kuromoji to search mode
...
git-svn-id: https://svn.apache.org/repos/asf/lucene/dev/trunk@1240710 13f79535-47bb-0310-9956-ffa450edef68
2012-02-05 12:41:13 +00:00
Tommaso Teofili
6d3bb736f3
[LUCENE-3744] - applied patch for whiteList usage in TypeTokenFilter
...
git-svn-id: https://svn.apache.org/repos/asf/lucene/dev/trunk@1240034 13f79535-47bb-0310-9956-ffa450edef68
2012-02-03 09:13:17 +00:00
Michael McCandless
60c36c24fb
don't let prefix's output bleed into full string's output (potential/latent bug)
...
git-svn-id: https://svn.apache.org/repos/asf/lucene/dev/trunk@1239658 13f79535-47bb-0310-9956-ffa450edef68
2012-02-02 15:01:13 +00:00
Robert Muir
995c5b9ef1
LUCENE-3730: improve Kuromoji search mode heuristics
...
git-svn-id: https://svn.apache.org/repos/asf/lucene/dev/trunk@1239061 13f79535-47bb-0310-9956-ffa450edef68
2012-02-01 11:03:17 +00:00
Michael McCandless
8e40ea5bf8
LUCENE-3742: fix token offset for hangs-off-end output in SynonymFilter
...
git-svn-id: https://svn.apache.org/repos/asf/lucene/dev/trunk@1238851 13f79535-47bb-0310-9956-ffa450edef68
2012-01-31 23:01:55 +00:00
Uwe Schindler
10ba9abeb2
Reverse merged revision(s) from lucene/dev/trunk up to 1237502
...
git-svn-id: https://svn.apache.org/repos/asf/lucene/dev/branches/lucene2858@1237505 13f79535-47bb-0310-9956-ffa450edef68
2012-01-29 23:19:05 +00:00
Michael McCandless
d1165b1972
LUCENE-3725: add optional packing to FSTs
...
git-svn-id: https://svn.apache.org/repos/asf/lucene/dev/trunk@1237500 13f79535-47bb-0310-9956-ffa450edef68
2012-01-29 22:48:45 +00:00
Robert Muir
d7fe56ddae
LUCENE-2858: fix analyzer
...
git-svn-id: https://svn.apache.org/repos/asf/lucene/dev/branches/lucene2858@1237312 13f79535-47bb-0310-9956-ffa450edef68
2012-01-29 15:16:04 +00:00
Steven Rowe
97d62cc383
Fix offset array assertion off-by-one
...
git-svn-id: https://svn.apache.org/repos/asf/lucene/dev/trunk@1236912 13f79535-47bb-0310-9956-ffa450edef68
2012-01-27 22:43:48 +00:00
Robert Muir
f640687877
LUCENE-3720: add warning+experimental and disable test
...
git-svn-id: https://svn.apache.org/repos/asf/lucene/dev/trunk@1236341 13f79535-47bb-0310-9956-ffa450edef68
2012-01-26 18:26:07 +00:00
Robert Muir
6edfe4f157
LUCENE-3717: add tests
...
git-svn-id: https://svn.apache.org/repos/asf/lucene/dev/trunk@1235199 13f79535-47bb-0310-9956-ffa450edef68
2012-01-24 10:40:46 +00:00
Robert Muir
35a73d5f55
LUCENE-3717: fix broken offsets in ngramtokenizers, and check return value of Reader.read
...
git-svn-id: https://svn.apache.org/repos/asf/lucene/dev/trunk@1235187 13f79535-47bb-0310-9956-ffa450edef68
2012-01-24 09:50:21 +00:00
Robert Muir
7fafdd3576
LUCENE-3717: add checkRandomData to more analyzers and fix more offsets bugs
...
git-svn-id: https://svn.apache.org/repos/asf/lucene/dev/trunk@1234850 13f79535-47bb-0310-9956-ffa450edef68
2012-01-23 15:19:58 +00:00
Steven Rowe
059410d424
LUCENE-3690: fix handling of unpaired numeric character entity UTF-16 surrogates to output U+FFFD REPLACEMENT CHARACTER; and add handling of properly paired numeric character entity UTF-16 surrogates, to output the corresponding pair of code units.
...
git-svn-id: https://svn.apache.org/repos/asf/lucene/dev/trunk@1234687 13f79535-47bb-0310-9956-ffa450edef68
2012-01-23 07:36:38 +00:00
Robert Muir
c754c1c9c8
LUCENE-3717: add better offsets testing to BaseTokenStreamTestCase, fix offsets bugs in ThaiWordFilter and ICUTokenizer
...
git-svn-id: https://svn.apache.org/repos/asf/lucene/dev/trunk@1234652 13f79535-47bb-0310-9956-ffa450edef68
2012-01-23 00:08:52 +00:00
Robert Muir
a7cfee6b07
SOLR-2891: fix CompoundWordTokenFilter to not create invalid offsets when the length of the text was changed by a previous filter
...
git-svn-id: https://svn.apache.org/repos/asf/lucene/dev/trunk@1234546 13f79535-47bb-0310-9956-ffa450edef68
2012-01-22 16:41:06 +00:00
Steven Rowe
f3a363708f
LUCENE-3690: Re-implemented HTMLStripCharFilter as a JFlex-generated scanner. Fixes LUCENE-2208, SOLR-882, and SOLR-42.
...
git-svn-id: https://svn.apache.org/repos/asf/lucene/dev/trunk@1234452 13f79535-47bb-0310-9956-ffa450edef68
2012-01-22 05:20:46 +00:00
Uwe Schindler
af9b4d816f
LUCENE-3671: Add TypeTokenFilter that filters tokens based on their TypeAttribute
...
git-svn-id: https://svn.apache.org/repos/asf/lucene/dev/trunk@1234396 13f79535-47bb-0310-9956-ffa450edef68
2012-01-21 19:02:44 +00:00
Robert Muir
e869b1fbf7
LUCENE-3700: give enough ram so that you can build naist-jdic with java 5
...
git-svn-id: https://svn.apache.org/repos/asf/lucene/dev/trunk@1232274 13f79535-47bb-0310-9956-ffa450edef68
2012-01-17 02:27:31 +00:00
Robert Muir
f562a8a0dc
LUCENE-3700: optionally support naist-jdic for kuromoji
...
git-svn-id: https://svn.apache.org/repos/asf/lucene/dev/trunk@1232268 13f79535-47bb-0310-9956-ffa450edef68
2012-01-17 02:20:24 +00:00
Robert Muir
48c01e5a2b
LUCENE-3699: share baseform with surface and flag if the reading can be computed from surface
...
git-svn-id: https://svn.apache.org/repos/asf/lucene/dev/trunk@1232265 13f79535-47bb-0310-9956-ffa450edef68
2012-01-17 02:12:27 +00:00
Robert Muir
c902f63125
unbreak clover/nightly builds until we do this right
...
git-svn-id: https://svn.apache.org/repos/asf/lucene/dev/trunk@1232254 13f79535-47bb-0310-9956-ffa450edef68
2012-01-17 01:37:28 +00:00