Commit Graph

254 Commits

Author SHA1 Message Date
Robert Muir e51795be39 LUCENE-3731: remove unnecessary code
git-svn-id: https://svn.apache.org/repos/asf/lucene/dev/trunk@1244714 13f79535-47bb-0310-9956-ffa450edef68
2012-02-15 20:53:53 +00:00
Robert Muir c97e3edbb9 LUCENE-3731: performance improvements and thread safety fixes to UIMA tokenizers
git-svn-id: https://svn.apache.org/repos/asf/lucene/dev/trunk@1244688 13f79535-47bb-0310-9956-ffa450edef68
2012-02-15 20:29:20 +00:00
Tommaso Teofili c454ae6a66 [LUCENE-3731] - creating and using simple wst and pos tagger implementations for analyzers' random string testing
git-svn-id: https://svn.apache.org/repos/asf/lucene/dev/trunk@1244474 13f79535-47bb-0310-9956-ffa450edef68
2012-02-15 13:17:57 +00:00
Ryan McKinley cea3acb111 LUCENE-3731: fix javadoc warnings, add uima to eclipse project
git-svn-id: https://svn.apache.org/repos/asf/lucene/dev/trunk@1244350 13f79535-47bb-0310-9956-ffa450edef68
2012-02-15 04:41:32 +00:00
Ryan McKinley 8d9bfe9245 LUCENE-3731: adding missing overview.html
git-svn-id: https://svn.apache.org/repos/asf/lucene/dev/trunk@1244340 13f79535-47bb-0310-9956-ffa450edef68
2012-02-15 04:01:57 +00:00
Tommaso Teofili d66d97790b [LUCENE-3731] - Creating the analysis-uima module for UIMA based tokenizers/analyzers
git-svn-id: https://svn.apache.org/repos/asf/lucene/dev/trunk@1244236 13f79535-47bb-0310-9956-ffa450edef68
2012-02-14 22:13:34 +00:00
Dawid Weiss 087f1e3126 LUCENE-3774: Optimized and streamlined license and notice file validation
by refactoring the build task into an ANT task and modifying build scripts
to perform top-level checks. (Dawid Weiss, Steve Rowe, Robert Muir)

git-svn-id: https://svn.apache.org/repos/asf/lucene/dev/trunk@1243527 13f79535-47bb-0310-9956-ffa450edef68
2012-02-13 14:12:59 +00:00
Robert Muir 6a07201844 don't fail test due to jre bugs in String.toLowerCase
git-svn-id: https://svn.apache.org/repos/asf/lucene/dev/trunk@1243415 13f79535-47bb-0310-9956-ffa450edef68
2012-02-13 04:50:12 +00:00
Robert Muir 590741dcfe LUCENE-3766: Remove Tokenizer's default ctor
git-svn-id: https://svn.apache.org/repos/asf/lucene/dev/trunk@1242890 13f79535-47bb-0310-9956-ffa450edef68
2012-02-10 19:12:35 +00:00
Robert Muir 8a50cefc6b LUCENE-3748: EnglishPossessiveFilter did not work with a proper right quotation mark
git-svn-id: https://svn.apache.org/repos/asf/lucene/dev/trunk@1242740 13f79535-47bb-0310-9956-ffa450edef68
2012-02-10 11:01:11 +00:00
Robert Muir 9f783ead67 SOLR-3115: improve japanese stopwords.txt description
git-svn-id: https://svn.apache.org/repos/asf/lucene/dev/trunk@1242557 13f79535-47bb-0310-9956-ffa450edef68
2012-02-09 22:17:44 +00:00
Robert Muir 509f4c557d LUCENE-3751: align default japanese configurations for lucene/solr
git-svn-id: https://svn.apache.org/repos/asf/lucene/dev/trunk@1242543 13f79535-47bb-0310-9956-ffa450edef68
2012-02-09 21:45:41 +00:00
Robert Muir 72ae3171be LUCENE-3765: Trappy behavior with StopFilter/ignoreCase
git-svn-id: https://svn.apache.org/repos/asf/lucene/dev/trunk@1242497 13f79535-47bb-0310-9956-ffa450edef68
2012-02-09 19:59:50 +00:00
Robert Muir c0319d5928 SOLR-3056: document expectations in these files
git-svn-id: https://svn.apache.org/repos/asf/lucene/dev/trunk@1241960 13f79535-47bb-0310-9956-ffa450edef68
2012-02-08 16:27:47 +00:00
Robert Muir dac1b58277 SOLR-3097, SOLR-3105: add fieldtypes for different languages to the example
git-svn-id: https://svn.apache.org/repos/asf/lucene/dev/trunk@1241878 13f79535-47bb-0310-9956-ffa450edef68
2012-02-08 12:07:52 +00:00
Robert Muir bef6e3664d LUCENE-3726: additional tests
git-svn-id: https://svn.apache.org/repos/asf/lucene/dev/trunk@1240760 13f79535-47bb-0310-9956-ffa450edef68
2012-02-05 16:16:02 +00:00
Robert Muir 03497e7595 LUCENE-3745: add proper Japanese stopping
git-svn-id: https://svn.apache.org/repos/asf/lucene/dev/trunk@1240714 13f79535-47bb-0310-9956-ffa450edef68
2012-02-05 13:05:42 +00:00
Robert Muir 009608d9f2 LUCENE-3726: default Kuromoji to search mode
git-svn-id: https://svn.apache.org/repos/asf/lucene/dev/trunk@1240710 13f79535-47bb-0310-9956-ffa450edef68
2012-02-05 12:41:13 +00:00
Tommaso Teofili 6d3bb736f3 [LUCENE-3744] - applied patch for whiteList usage in TypeTokenFilter
git-svn-id: https://svn.apache.org/repos/asf/lucene/dev/trunk@1240034 13f79535-47bb-0310-9956-ffa450edef68
2012-02-03 09:13:17 +00:00
Michael McCandless 60c36c24fb don't let prefix's output bleed into full string's output (potential/latent bug)
git-svn-id: https://svn.apache.org/repos/asf/lucene/dev/trunk@1239658 13f79535-47bb-0310-9956-ffa450edef68
2012-02-02 15:01:13 +00:00
Robert Muir 995c5b9ef1 LUCENE-3730: improve Kuromoji search mode heuristics
git-svn-id: https://svn.apache.org/repos/asf/lucene/dev/trunk@1239061 13f79535-47bb-0310-9956-ffa450edef68
2012-02-01 11:03:17 +00:00
Michael McCandless 8e40ea5bf8 LUCENE-3742: fix token offset for hangs-off-end output in SynonymFilter
git-svn-id: https://svn.apache.org/repos/asf/lucene/dev/trunk@1238851 13f79535-47bb-0310-9956-ffa450edef68
2012-01-31 23:01:55 +00:00
Uwe Schindler 10ba9abeb2 Reverse merged revision(s) from lucene/dev/trunk up to 1237502
git-svn-id: https://svn.apache.org/repos/asf/lucene/dev/branches/lucene2858@1237505 13f79535-47bb-0310-9956-ffa450edef68
2012-01-29 23:19:05 +00:00
Michael McCandless d1165b1972 LUCENE-3725: add optional packing to FSTs
git-svn-id: https://svn.apache.org/repos/asf/lucene/dev/trunk@1237500 13f79535-47bb-0310-9956-ffa450edef68
2012-01-29 22:48:45 +00:00
Robert Muir d7fe56ddae LUCENE-2858: fix analyzer
git-svn-id: https://svn.apache.org/repos/asf/lucene/dev/branches/lucene2858@1237312 13f79535-47bb-0310-9956-ffa450edef68
2012-01-29 15:16:04 +00:00
Steven Rowe 97d62cc383 Fix offset array assertion off-by-one
git-svn-id: https://svn.apache.org/repos/asf/lucene/dev/trunk@1236912 13f79535-47bb-0310-9956-ffa450edef68
2012-01-27 22:43:48 +00:00
Robert Muir f640687877 LUCENE-3720: add warning+experimental and disable test
git-svn-id: https://svn.apache.org/repos/asf/lucene/dev/trunk@1236341 13f79535-47bb-0310-9956-ffa450edef68
2012-01-26 18:26:07 +00:00
Robert Muir 6edfe4f157 LUCENE-3717: add tests
git-svn-id: https://svn.apache.org/repos/asf/lucene/dev/trunk@1235199 13f79535-47bb-0310-9956-ffa450edef68
2012-01-24 10:40:46 +00:00
Robert Muir 35a73d5f55 LUCENE-3717: fix broken offsets in ngramtokenizers, and check return value of Reader.read
git-svn-id: https://svn.apache.org/repos/asf/lucene/dev/trunk@1235187 13f79535-47bb-0310-9956-ffa450edef68
2012-01-24 09:50:21 +00:00
Robert Muir 7fafdd3576 LUCENE-3717: add checkRandomData to more analyzers and fix more offsets bugs
git-svn-id: https://svn.apache.org/repos/asf/lucene/dev/trunk@1234850 13f79535-47bb-0310-9956-ffa450edef68
2012-01-23 15:19:58 +00:00
Steven Rowe 059410d424 LUCENE-3690: fix handling of unpaired numeric character entity UTF-16 surrogates to output U+FFFD REPLACEMENT CHARACTER; and add handling of properly paired numeric character entity UTF-16 surrogates, to output the corresponding pair of code units.
git-svn-id: https://svn.apache.org/repos/asf/lucene/dev/trunk@1234687 13f79535-47bb-0310-9956-ffa450edef68
2012-01-23 07:36:38 +00:00
Robert Muir c754c1c9c8 LUCENE-3717: add better offsets testing to BaseTokenStreamTestCase, fix offsets bugs in ThaiWordFilter and ICUTokenizer
git-svn-id: https://svn.apache.org/repos/asf/lucene/dev/trunk@1234652 13f79535-47bb-0310-9956-ffa450edef68
2012-01-23 00:08:52 +00:00
Robert Muir a7cfee6b07 SOLR-2891: fix CompoundWordTokenFilter to not create invalid offsets when the length of the text was changed by a previous filter
git-svn-id: https://svn.apache.org/repos/asf/lucene/dev/trunk@1234546 13f79535-47bb-0310-9956-ffa450edef68
2012-01-22 16:41:06 +00:00
Steven Rowe f3a363708f LUCENE-3690: Re-implemented HTMLStripCharFilter as a JFlex-generated scanner. Fixes LUCENE-2208, SOLR-882, and SOLR-42.
git-svn-id: https://svn.apache.org/repos/asf/lucene/dev/trunk@1234452 13f79535-47bb-0310-9956-ffa450edef68
2012-01-22 05:20:46 +00:00
Uwe Schindler af9b4d816f LUCENE-3671: Add TypeTokenFilter that filters tokens based on their TypeAttribute
git-svn-id: https://svn.apache.org/repos/asf/lucene/dev/trunk@1234396 13f79535-47bb-0310-9956-ffa450edef68
2012-01-21 19:02:44 +00:00
Robert Muir e869b1fbf7 LUCENE-3700: give enough ram so that you can build naist-jdic with java 5
git-svn-id: https://svn.apache.org/repos/asf/lucene/dev/trunk@1232274 13f79535-47bb-0310-9956-ffa450edef68
2012-01-17 02:27:31 +00:00
Robert Muir f562a8a0dc LUCENE-3700: optionally support naist-jdic for kuromoji
git-svn-id: https://svn.apache.org/repos/asf/lucene/dev/trunk@1232268 13f79535-47bb-0310-9956-ffa450edef68
2012-01-17 02:20:24 +00:00
Robert Muir 48c01e5a2b LUCENE-3699: share baseform with surface and flag if the reading can be computed from surface
git-svn-id: https://svn.apache.org/repos/asf/lucene/dev/trunk@1232265 13f79535-47bb-0310-9956-ffa450edef68
2012-01-17 02:12:27 +00:00
Robert Muir c902f63125 unbreak clover/nightly builds until we do this right
git-svn-id: https://svn.apache.org/repos/asf/lucene/dev/trunk@1232254 13f79535-47bb-0310-9956-ffa450edef68
2012-01-17 01:37:28 +00:00
Robert Muir 12c9b8b4bf LUCENE-3699: simplify dictionary access and reduce tokeninfodictionary 1.5MB
git-svn-id: https://svn.apache.org/repos/asf/lucene/dev/trunk@1232120 13f79535-47bb-0310-9956-ffa450edef68
2012-01-16 19:19:48 +00:00
Robert Muir 354a3be78f LUCENE-3696: fix dictionary construction to work on java5
git-svn-id: https://svn.apache.org/repos/asf/lucene/dev/trunk@1232012 13f79535-47bb-0310-9956-ffa450edef68
2012-01-16 14:50:09 +00:00
Michael McCandless 87bc4521c2 LUCENE-3695: move some confusing FST sugar out
git-svn-id: https://svn.apache.org/repos/asf/lucene/dev/trunk@1231795 13f79535-47bb-0310-9956-ffa450edef68
2012-01-15 23:25:38 +00:00
Michael McCandless 11f33ee521 LUCENE-3684: add offsets to postings APIs
git-svn-id: https://svn.apache.org/repos/asf/lucene/dev/trunk@1231794 13f79535-47bb-0310-9956-ffa450edef68
2012-01-15 23:17:45 +00:00
Robert Muir fbd34b4390 cleanups to 4.x CHANGES
git-svn-id: https://svn.apache.org/repos/asf/lucene/dev/trunk@1231552 13f79535-47bb-0310-9956-ffa450edef68
2012-01-14 18:24:48 +00:00
Yonik Seeley b2a0040e98 tests: silliness
git-svn-id: https://svn.apache.org/repos/asf/lucene/dev/trunk@1231526 13f79535-47bb-0310-9956-ffa450edef68
2012-01-14 16:59:36 +00:00
Michael McCandless 5ca66287ea woops
git-svn-id: https://svn.apache.org/repos/asf/lucene/dev/trunk@1231513 13f79535-47bb-0310-9956-ffa450edef68
2012-01-14 15:20:58 +00:00
Michael McCandless d584f6361d LUCENE-3685: add ToChildBlockJoinQuery, to join from parent to child
git-svn-id: https://svn.apache.org/repos/asf/lucene/dev/trunk@1231512 13f79535-47bb-0310-9956-ffa450edef68
2012-01-14 15:17:04 +00:00
Robert Muir 8b8c2b4dee LUCENE-3690: since this filter handles all kinds of bad partial and weird input, this should be fine to enable
git-svn-id: https://svn.apache.org/repos/asf/lucene/dev/trunk@1231272 13f79535-47bb-0310-9956-ffa450edef68
2012-01-13 19:46:30 +00:00
Robert Muir 2ff4bdb04f enable assertions when executing various tools
git-svn-id: https://svn.apache.org/repos/asf/lucene/dev/trunk@1231013 13f79535-47bb-0310-9956-ffa450edef68
2012-01-13 11:36:50 +00:00
Robert Muir 05a65507af LUCENE-3305: optimization, don't retrieve the base form twice in this filter
git-svn-id: https://svn.apache.org/repos/asf/lucene/dev/trunk@1230769 13f79535-47bb-0310-9956-ffa450edef68
2012-01-12 20:36:58 +00:00