182 Commits

Author SHA1 Message Date
Michael McCandless 19234f12bd LUCENE-1692: add new contrib analyzer tests
git-svn-id: https://svn.apache.org/repos/asf/lucene/java/trunk@786606 13f79535-47bb-0310-9956-ffa450edef68
2009-06-19 18:02:12 +00:00
Michael McCandless 2f2cd20828 LUCENE-1692: add tests for Thai & SmartChinese analyzers; fix wrong endOffset bug in ThaiWordFilter; use stop words by default with SmartChineseAnalyzer
git-svn-id: https://svn.apache.org/repos/asf/lucene/java/trunk@786560 13f79535-47bb-0310-9956-ffa450edef68
2009-06-19 15:52:36 +00:00
Michael McCandless 835c405be0 LUCENE-973: add test case for CJKAnalyzer; fix trailing empty string bug
git-svn-id: https://svn.apache.org/repos/asf/lucene/java/trunk@785287 13f79535-47bb-0310-9956-ffa450edef68
2009-06-16 16:38:39 +00:00
Grant Ingersoll 1511ec5e31 LUCENE-1676: in-stream payload support
git-svn-id: https://svn.apache.org/repos/asf/lucene/java/trunk@784297 13f79535-47bb-0310-9956-ffa450edef68
2009-06-12 22:26:01 +00:00
Michael McCandless af550281cb LUCENE-1629: remove unnecessary source files
git-svn-id: https://svn.apache.org/repos/asf/lucene/java/trunk@775468 13f79535-47bb-0310-9956-ffa450edef68
2009-05-16 14:08:38 +00:00
Michael McCandless f81f6796a2 LUCENE-1629: correct ASF source headers
git-svn-id: https://svn.apache.org/repos/asf/lucene/java/trunk@775444 13f79535-47bb-0310-9956-ffa450edef68
2009-05-16 09:55:34 +00:00
Michael McCandless be0a47b7e3 LUCENE-1629: move CHANGES entry to contrib; add TestArabicAnalyzer
git-svn-id: https://svn.apache.org/repos/asf/lucene/java/trunk@774727 13f79535-47bb-0310-9956-ffa450edef68
2009-05-14 10:50:52 +00:00
Michael McCandless e01aad89fe LUCENE-1629: adding new contrib analyzer SmartChineseAnalyzer
git-svn-id: https://svn.apache.org/repos/asf/lucene/java/trunk@774718 13f79535-47bb-0310-9956-ffa450edef68
2009-05-14 10:09:22 +00:00
Michael McCandless c73712d1bb LUCENE-1576: fix BrazilianAnalyzer to downcase before filtering stop words
git-svn-id: https://svn.apache.org/repos/asf/lucene/java/trunk@759307 13f79535-47bb-0310-9956-ffa450edef68
2009-03-27 19:04:25 +00:00
Michael McCandless 0f17904f1e remove slow download host; only download zip file once for all tests
git-svn-id: https://svn.apache.org/repos/asf/lucene/java/trunk@759061 13f79535-47bb-0310-9956-ffa450edef68
2009-03-27 08:44:33 +00:00
Michael McCandless 96863198a5 LUCENE-1490: fix latin1 conversion of HALFWIDTH_AND_FULLWIDTH_FORMS characters to only apply to the correct subset
git-svn-id: https://svn.apache.org/repos/asf/lucene/java/trunk@755666 13f79535-47bb-0310-9956-ffa450edef68
2009-03-18 17:28:53 +00:00
Yonik Seeley 6c176eb016 LUCENE-1398: Add ReverseStringFilter to contrib/analyzers
git-svn-id: https://svn.apache.org/repos/asf/lucene/java/trunk@747915 13f79535-47bb-0310-9956-ffa450edef68
2009-02-25 20:44:05 +00:00
Karl-Johan Wettin d7376608b2 LUCENE-1514
ShingleMatrixFilter#next(Token) easily throws a StackOverflowException due to recursive invocation. (Karl Wettin)

git-svn-id: https://svn.apache.org/repos/asf/lucene/java/trunk@733064 13f79535-47bb-0310-9956-ffa450edef68
2009-01-09 15:34:52 +00:00
Grant Ingersoll 2225462178 LUCENE-1380: Add PositionFilter
git-svn-id: https://svn.apache.org/repos/asf/lucene/java/trunk@725691 13f79535-47bb-0310-9956-ffa450edef68
2008-12-11 14:17:44 +00:00
Grant Ingersoll 702ea32da7 make constructors public
git-svn-id: https://svn.apache.org/repos/asf/lucene/java/trunk@724059 13f79535-47bb-0310-9956-ffa450edef68
2008-12-07 00:39:35 +00:00
Yonik Seeley 8e8e8ddec4 set svn:eol-style to native on java files
git-svn-id: https://svn.apache.org/repos/asf/lucene/java/trunk@712922 13f79535-47bb-0310-9956-ffa450edef68
2008-11-11 02:35:46 +00:00
Grant Ingersoll 8dfe073760 LUCENE-1406. Added Arabic stemming and normalization. Also added new method to WordListLoader to allow for comments in word lists.
git-svn-id: https://svn.apache.org/repos/asf/lucene/java/trunk@706342 13f79535-47bb-0310-9956-ffa450edef68
2008-10-20 17:19:29 +00:00
Michael McCandless 3f27b17a89 fix non-1.4-compatible throws clause
git-svn-id: https://svn.apache.org/repos/asf/lucene/java/trunk@701827 13f79535-47bb-0310-9956-ffa450edef68
2008-10-05 16:40:59 +00:00
Otis Gospodnetic 0195fcd03d LUCENE-1378 - Removed the remaining 199 @author references
git-svn-id: https://svn.apache.org/repos/asf/lucene/java/trunk@695514 13f79535-47bb-0310-9956-ffa450edef68
2008-09-15 15:42:11 +00:00
Karl-Johan Wettin 71f2d8199b LUCENE-1320
ShingleMatrixFilter JDK downgrade 1.5 -> 1.4 
Grant Ingersoll via Karl Wettin

git-svn-id: https://svn.apache.org/repos/asf/lucene/java/trunk@694393 13f79535-47bb-0310-9956-ffa450edef68
2008-09-11 18:23:18 +00:00
Michael McCandless 4218996230 LUCENE-1366: rename Field.Index.* options
git-svn-id: https://svn.apache.org/repos/asf/lucene/java/trunk@694004 13f79535-47bb-0310-9956-ffa450edef68
2008-09-10 21:38:52 +00:00
Michael McCandless d5a40278bc LUCENE-1369: switch from Hashtable to HashMap and from Vector to List, when possible
git-svn-id: https://svn.apache.org/repos/asf/lucene/java/trunk@692921 13f79535-47bb-0310-9956-ffa450edef68
2008-09-07 19:22:40 +00:00
Karl-Johan Wettin bf238a5743 Javadocs fix
git-svn-id: https://svn.apache.org/repos/asf/lucene/java/trunk@690779 13f79535-47bb-0310-9956-ffa450edef68
2008-08-31 20:46:47 +00:00
Michael McCandless 003a853cc8 LUCENE-1333: don't use LuceneTestCase in contrib until we can fix the build dependency
git-svn-id: https://svn.apache.org/repos/asf/lucene/java/trunk@687539 13f79535-47bb-0310-9956-ffa450edef68
2008-08-21 02:45:37 +00:00
Michael McCandless bb6b711718 LUCENE-1333: improvements to Token reuse API and full cutover to reuse API for all core and contrib analyzers
git-svn-id: https://svn.apache.org/repos/asf/lucene/java/trunk@687357 13f79535-47bb-0310-9956-ffa450edef68
2008-08-20 14:38:07 +00:00
Michael McCandless e31a9da835 LUCENE-1334: add Term(String fieldName) constructor that sets term text to empty string
git-svn-id: https://svn.apache.org/repos/asf/lucene/java/trunk@687014 13f79535-47bb-0310-9956-ffa450edef68
2008-08-19 10:40:39 +00:00
Karl-Johan Wettin ddc7c290d0 LUCENE-1320
ShingleMatrixFilter, a multidimensional shingle token filter.

Bug fix, did not support empty input token streams.

git-svn-id: https://svn.apache.org/repos/asf/lucene/java/trunk@674367 13f79535-47bb-0310-9956-ffa450edef68
2008-07-07 00:08:41 +00:00
Karl-Johan Wettin bca43ea3ea LUCENE-1320
ShingleMatrixFilter, a multidimensional shingle token filter.

git-svn-id: https://svn.apache.org/repos/asf/lucene/java/trunk@673549 13f79535-47bb-0310-9956-ffa450edef68
2008-07-02 23:53:51 +00:00
Otis Gospodnetic f5df30327e - Fixed messed up indentation/tabs
git-svn-id: https://svn.apache.org/repos/asf/lucene/java/trunk@657281 13f79535-47bb-0310-9956-ffa450edef68
2008-05-17 01:57:32 +00:00
Otis Gospodnetic 1d5ba345cc - Javadocs fixes
git-svn-id: https://svn.apache.org/repos/asf/lucene/java/trunk@657280 13f79535-47bb-0310-9956-ffa450edef68
2008-05-17 01:56:46 +00:00
Otis Gospodnetic d5c708a161 - Renamed vars a bit, so test is easier to understand
git-svn-id: https://svn.apache.org/repos/asf/lucene/java/trunk@657279 13f79535-47bb-0310-9956-ffa450edef68
2008-05-17 01:55:48 +00:00
Grant Ingersoll 7a27cdcbc9 LUCENE-1166: Added token filter for decomposing compound words
git-svn-id: https://svn.apache.org/repos/asf/lucene/java/trunk@657027 13f79535-47bb-0310-9956-ffa450edef68
2008-05-16 12:22:50 +00:00
Otis Gospodnetic aa0074f5db LUCENE-1003: Don't let RussianAnalyzer drop numbers.
git-svn-id: https://svn.apache.org/repos/asf/lucene/java/trunk@656111 13f79535-47bb-0310-9956-ffa450edef68
2008-05-14 05:37:45 +00:00
Grant Ingersoll cc955c9748 LUCENE-400: Added ShingleFilter (token based ngram)
git-svn-id: https://svn.apache.org/repos/asf/lucene/java/trunk@642612 13f79535-47bb-0310-9956-ffa450edef68
2008-03-29 21:11:33 +00:00
Grant Ingersoll 9ac963952f LUCENE-1236: Added some more javadocs. Also removed @author tags
git-svn-id: https://svn.apache.org/repos/asf/lucene/java/trunk@637449 13f79535-47bb-0310-9956-ffa450edef68
2008-03-15 18:05:10 +00:00
Grant Ingersoll 0dc6c59ac1 LUCENE-494: Added QueryAutoStopWordAnalyzer in a new query subpackage
git-svn-id: https://svn.apache.org/repos/asf/lucene/java/trunk@619420 13f79535-47bb-0310-9956-ffa450edef68
2008-02-07 14:13:38 +00:00
Grant Ingersoll 55d0c3a2f8 LUCENE-1077: refactored to have a common PayloadHelper classes. Also added TokenOffsetPayloadTokenFilter, which encodes the Token offset into the payloads
git-svn-id: https://svn.apache.org/repos/asf/lucene/java/trunk@604870 13f79535-47bb-0310-9956-ffa450edef68
2007-12-17 13:55:46 +00:00
Grant Ingersoll f9b2e971f2 LUCENE-1077 new sinks and payloads analysis packages
git-svn-id: https://svn.apache.org/repos/asf/lucene/java/trunk@602081 13f79535-47bb-0310-9956-ffa450edef68
2007-12-07 12:21:49 +00:00
Michael Busch 9c2a036db3 - LUCENE-908: Improvements and simplifications for how the MANIFEST file and the META-INF dir are created.
- LUCENE-935: Various improvements for the maven artifacts. Now the artifacts also include the sources as .jar files. 

git-svn-id: https://svn.apache.org/repos/asf/lucene/java/trunk@568766 13f79535-47bb-0310-9956-ffa450edef68
2007-08-22 23:16:48 +00:00
Grant Ingersoll 82eb074afd LUCENE-974: Removed Author tags from all existing code
git-svn-id: https://svn.apache.org/repos/asf/lucene/java/trunk@564236 13f79535-47bb-0310-9956-ffa450edef68
2007-08-09 15:21:19 +00:00
Doron Cohen 9ff9bf8142 fix javadoc unknown tag warning.
git-svn-id: https://svn.apache.org/repos/asf/lucene/java/trunk@552111 13f79535-47bb-0310-9956-ffa450edef68
2007-06-30 07:04:27 +00:00
Otis Gospodnetic 71f2c1da8b - LUCENE-906: Elision filter for French.
git-svn-id: https://svn.apache.org/repos/asf/lucene/java/trunk@551744 13f79535-47bb-0310-9956-ffa450edef68
2007-06-29 00:36:09 +00:00
Michael Busch d955a970b6 LUCENE-622: Add ant target and pom.xml files for building maven artifacts of the Lucene core and the contrib modules.
git-svn-id: https://svn.apache.org/repos/asf/lucene/java/trunk@547860 13f79535-47bb-0310-9956-ffa450edef68
2007-06-16 04:45:13 +00:00
Michael Busch df0a188415 LUCENE-931: adding missing license headers to various files
git-svn-id: https://svn.apache.org/repos/asf/lucene/java/trunk@545696 13f79535-47bb-0310-9956-ffa450edef68
2007-06-09 06:09:46 +00:00
Otis Gospodnetic 1a48e218d6 - Committing forgotten classes for LUCENE-759
git-svn-id: https://svn.apache.org/repos/asf/lucene/java/trunk@517477 13f79535-47bb-0310-9956-ffa450edef68
2007-03-13 00:30:13 +00:00
Otis Gospodnetic 534be1599d - LUCENE-759: Two n-gram producing TokenFilters (using them for the spellchecker in SOLR-81)
git-svn-id: https://svn.apache.org/repos/asf/lucene/java/trunk@513876 13f79535-47bb-0310-9956-ffa450edef68
2007-03-02 18:19:53 +00:00
Otis Gospodnetic 6636d88def - 2-char indentation
git-svn-id: https://svn.apache.org/repos/asf/lucene/java/trunk@513866 13f79535-47bb-0310-9956-ffa450edef68
2007-03-02 17:54:27 +00:00
Otis Gospodnetic 7b570fc8b2 - LUCENE-759: Made the tokenizer capable of creating n-grams of varying sizes - from min to max characters per n-gram. Patch from Adam Hiatt.
git-svn-id: https://svn.apache.org/repos/asf/lucene/java/trunk@513344 13f79535-47bb-0310-9956-ffa450edef68
2007-03-01 14:22:57 +00:00
Otis Gospodnetic 8cafdd9b64 - Removed isEmpty() Java 6 method, so Andrzej can compile Luke
git-svn-id: https://svn.apache.org/repos/asf/lucene/java/trunk@496628 13f79535-47bb-0310-9956-ffa450edef68
2007-01-16 09:07:01 +00:00
Otis Gospodnetic 74e68c9287 - Removed extra ;
git-svn-id: https://svn.apache.org/repos/asf/lucene/java/trunk@496283 13f79535-47bb-0310-9956-ffa450edef68
2007-01-15 11:45:04 +00:00
Otis Gospodnetic 2cf113a022 - Javadocs
git-svn-id: https://svn.apache.org/repos/asf/lucene/java/trunk@489847 13f79535-47bb-0310-9956-ffa450edef68
2006-12-23 03:36:34 +00:00
Otis Gospodnetic 8b7f6e4ef6 - LUCENE-759: New n-gram-capable tokenizers and their unit tests.
git-svn-id: https://svn.apache.org/repos/asf/lucene/java/trunk@489802 13f79535-47bb-0310-9956-ffa450edef68
2006-12-22 23:43:17 +00:00
Yonik Seeley 7ca20ee19f new ASF licenses header
git-svn-id: https://svn.apache.org/repos/asf/lucene/java/trunk@472959 13f79535-47bb-0310-9956-ffa450edef68
2006-11-09 16:21:50 +00:00
Daniel Naber 2b9effb894 deprecate the analysis.nl.WordlistLoader class because it's not robust (fails silently) and use analysis.WordlistLoader instead
git-svn-id: https://svn.apache.org/repos/asf/lucene/java/trunk@413180 13f79535-47bb-0310-9956-ffa450edef68
2006-06-09 22:15:47 +00:00
Chris M. Hostetter 2123b476df LUCENE-503: New ThaiAnalyzer and ThaiWordFilter
git-svn-id: https://svn.apache.org/repos/asf/lucene/java/trunk@411863 13f79535-47bb-0310-9956-ffa450edef68
2006-06-05 17:29:01 +00:00
Daniel Naber 18f330f6a6 add missing license header
git-svn-id: https://svn.apache.org/repos/asf/lucene/java/trunk@398112 13f79535-47bb-0310-9956-ffa450edef68
2006-04-29 09:54:16 +00:00
Yonik Seeley 3666a166a1 remove deprecations
git-svn-id: https://svn.apache.org/repos/asf/lucene/java/trunk@387550 13f79535-47bb-0310-9956-ffa450edef68
2006-03-21 15:36:32 +00:00
Doug Cutting f9f3161f57 Minor javadoc improvements.
git-svn-id: https://svn.apache.org/repos/asf/lucene/java/trunk@379189 13f79535-47bb-0310-9956-ffa450edef68
2006-02-20 18:11:02 +00:00
Erik Hatcher 7a3103fac0 Applied patch for LUCENE-324, correcting token offsets returned by ChineseTokenizer
git-svn-id: https://svn.apache.org/repos/asf/lucene/java/trunk@353930 13f79535-47bb-0310-9956-ffa450edef68
2005-12-04 23:07:42 +00:00
Daniel Naber bfde3257dc moving the non-language specific analyzers to core, this is where most users will probably expect them
git-svn-id: https://svn.apache.org/repos/asf/lucene/java/trunk@347991 13f79535-47bb-0310-9956-ffa450edef68
2005-11-21 21:35:24 +00:00
Daniel Naber 6da2ef197d update to Apache Software License 2.0
git-svn-id: https://svn.apache.org/repos/asf/lucene/java/trunk@189623 13f79535-47bb-0310-9956-ffa450edef68
2005-06-08 19:48:19 +00:00
Daniel Naber a3f99b1f43 small javadoc improvements
git-svn-id: https://svn.apache.org/repos/asf/lucene/java/trunk@178893 13f79535-47bb-0310-9956-ffa450edef68
2005-05-28 22:58:17 +00:00
Daniel Naber bd2345d856 small javadoc fixes
git-svn-id: https://svn.apache.org/repos/asf/lucene/java/trunk@178839 13f79535-47bb-0310-9956-ffa450edef68
2005-05-27 23:07:00 +00:00
Daniel Naber 952cfd54be small javadoc fixes
git-svn-id: https://svn.apache.org/repos/asf/lucene/java/trunk@178833 13f79535-47bb-0310-9956-ffa450edef68
2005-05-27 23:02:07 +00:00
Daniel Naber 816f370c0e small javadoc fixes
git-svn-id: https://svn.apache.org/repos/asf/lucene/java/trunk@178832 13f79535-47bb-0310-9956-ffa450edef68
2005-05-27 23:00:49 +00:00
Daniel Naber 9d2d4ead75 use entity for umlaut
git-svn-id: https://svn.apache.org/repos/asf/lucene/java/trunk@178239 13f79535-47bb-0310-9956-ffa450edef68
2005-05-24 18:44:20 +00:00
Daniel Naber 69b1f490df javadoc: fix typo and use HTML entity so generated HTML is correct
git-svn-id: https://svn.apache.org/repos/asf/lucene/java/trunk@169681 13f79535-47bb-0310-9956-ffa450edef68
2005-05-11 19:33:12 +00:00
Erik Hatcher c3847f26ea overhaul of build system to facilitate building and packaging of contrib sub-projects. some work still to be done, but core Lucene build still working fine
git-svn-id: https://svn.apache.org/repos/asf/lucene/java/trunk@165566 13f79535-47bb-0310-9956-ffa450edef68
2005-05-02 00:11:11 +00:00
Erik Hatcher 21431112fe adjust license headers to be ASL 2.0
git-svn-id: https://svn.apache.org/repos/asf/lucene/java/trunk@165565 13f79535-47bb-0310-9956-ffa450edef68
2005-05-02 00:08:04 +00:00
Erik Hatcher 790dfc1490 javadoc fixup
git-svn-id: https://svn.apache.org/repos/asf/lucene/java/trunk@164742 13f79535-47bb-0310-9956-ffa450edef68
2005-04-26 04:41:54 +00:00
Erik Hatcher d650384d4b add GreekAnalyzer, contributed by Panagiotis Astithas (past@ebs.gr)
git-svn-id: https://svn.apache.org/repos/asf/lucene/java/trunk@164686 13f79535-47bb-0310-9956-ffa450edef68
2005-04-25 23:23:37 +00:00
Daniel Naber c4f1ee70a9 use lowercase method names; remove javadoc that's inherited anyway
git-svn-id: https://svn.apache.org/repos/asf/lucene/java/trunk@160070 13f79535-47bb-0310-9956-ffa450edef68
2005-04-04 17:50:38 +00:00
Daniel Naber 04ea892fbe import cleanup
git-svn-id: https://svn.apache.org/repos/asf/lucene/java/trunk@160065 13f79535-47bb-0310-9956-ffa450edef68
2005-04-04 17:45:36 +00:00
Erik Hatcher 6f5f23444c enhanced test contributed by Sven. Encoding tweaks
git-svn-id: https://svn.apache.org/repos/asf/lucene/java/trunk@160034 13f79535-47bb-0310-9956-ffa450edef68
2005-04-04 12:25:16 +00:00
Erik Hatcher 0ff227ff0a switch dotted u character to use unicode value reference
git-svn-id: https://svn.apache.org/repos/asf/lucene/java/trunk@160023 13f79535-47bb-0310-9956-ffa450edef68
2005-04-04 10:16:37 +00:00
Erik Hatcher 4e580e221e Issue deprecation warnings when building test cases. Fixed deprecation warnings on TestKeywordAnalyzer
git-svn-id: https://svn.apache.org/repos/asf/lucene/java/trunk@160012 13f79535-47bb-0310-9956-ffa450edef68
2005-04-04 09:10:59 +00:00
Erik Hatcher 3be3e8ab5d Add accent character normalizer filter contributed by Sven Duzont. Also created simple test case.
git-svn-id: https://svn.apache.org/repos/asf/lucene/java/trunk@160011 13f79535-47bb-0310-9956-ffa450edef68
2005-04-04 09:10:05 +00:00
Erik Hatcher 28e712b2ee update docs to account for TLP migration
git-svn-id: https://svn.apache.org/repos/asf/lucene/java/trunk@153802 13f79535-47bb-0310-9956-ffa450edef68
2005-02-14 16:48:47 +00:00
Erik Hatcher f375d09898 add customizable buffer size
git-svn-id: https://svn.apache.org/repos/asf/lucene/java/trunk@153412 13f79535-47bb-0310-9956-ffa450edef68
2005-02-11 15:30:14 +00:00
Erik Hatcher cd0d0937e1 split keyword tokenizer out of KeywordAnalyzer
git-svn-id: https://svn.apache.org/repos/asf/lucene/java/trunk@153398 13f79535-47bb-0310-9956-ffa450edef68
2005-02-11 13:50:37 +00:00
Erik Hatcher 826fef7f6a KeywordAnalyzer contribution - adapted from _Lucene in Action_ code
git-svn-id: https://svn.apache.org/repos/asf/lucene/java/trunk@152921 13f79535-47bb-0310-9956-ffa450edef68
2005-02-08 19:13:05 +00:00
Erik Hatcher 0955eef89f move parts of the sandbox over to contrib area
git-svn-id: https://svn.apache.org/repos/asf/lucene/java/trunk@151459 13f79535-47bb-0310-9956-ffa450edef68
2005-02-05 01:25:43 +00:00