update contrib changes

git-svn-id: https://svn.apache.org/repos/asf/lucene/dev/trunk@940820 13f79535-47bb-0310-9956-ffa450edef68
This commit is contained in:
Uwe Schindler 2010-05-04 12:12:03 +00:00
parent 20a0dc280d
commit 265a64259c
1 changed files with 77 additions and 65 deletions

View File

@ -2,6 +2,81 @@ Lucene contrib change Log
======================= Trunk (not yet released) =======================
Changes in backwards compatibility policy
* LUCENE-2323: Moved contrib/wikipedia functionality into contrib/analyzers.
Additionally the package was changed from org.apache.lucene.wikipedia.analysis
to org.apache.lucene.analysis.wikipedia. (Robert Muir)
* LUCENE-2413: Consolidated all analyzers into contrib/analyzers.
- contrib/analyzers/smartcn now depends on contrib/analyzers/common
- The "AnalyzerUtil" in wordnet was removed.
... (in progress)
Bug fixes
* LUCENE-2404: Fix bugs with position increment and empty tokens in ThaiWordFilter.
For matchVersion >= 3.1 the filter also no longer lowercases. ThaiAnalyzer
API Changes
* LUCENE-2413: Deprecated PatternAnalyzer in contrib/analyzers, in favor of the
pattern package (CharFilter, Tokenizer, TokenFilter). (Robert Muir)
New features
* LUCENE-2399: Add ICUNormalizer2Filter, which normalizes tokens with ICU's
Normalizer2. This allows for efficient combinations of normalization and custom
mappings in addition to standard normalization, and normalization combined
with unicode case folding. (Robert Muir)
* LUCENE-1343: Add ICUFoldingFilter, a replacement for ASCIIFoldingFilter that
does a more thorough job of normalizing unicode text for search.
(Robert Haschart, Robert Muir)
* LUCENE-2409: Add ICUTransformFilter, which transforms text in a context
sensitive way, either from ICU built-in rules (such as Traditional-Simplified),
or from rules you write yourself. (Robert Muir)
* LUCENE-2298: Add analyzers/stempel, an algorithmic stemmer with support for
the Polish language. (Andrzej Bialecki via Robert Muir)
* LUCENE-2414: Add ICUTokenizer, a tailorable tokenizer that implements Unicode
Text Segmentation. This tokenizer is useful for documents or collections with
multiple languages. The default configuration includes special support for
Thai, Lao, Myanmar, and Khmer. (Robert Muir, Uwe Schindler)
* LUCENE-2413: Consolidated Solr analysis components into contrib/analyzers.
New features from Solr now available to Lucene users include:
- o.a.l.analysis.commongrams: Constructs n-grams for frequently occurring terms
and phrases.
- o.a.l.analysis.charfilter.HTMLStripCharFilter: CharFilter that strips HTML
constructs.
- o.a.l.analysis.miscellaneous.WordDelimiterFilter: TokenFilter that splits words
into subwords and performs optional transformations on subword groups.
- o.a.l.analysis.miscellaneous.RemoveDuplicatesTokenFilter: TokenFilter which
filters out Tokens at the same position and Term text as the previous token.
- o.a.l.analysis.pattern: Package for pattern-based analysis, containing a
CharFilter, Tokenizer, and Tokenfilter for transforming text with regexes.
(... in progress)
Build
* LUCENE-2399: Upgrade contrib/icu's ICU jar file to ICU 4.4. (Robert Muir)
Optimizations
* LUCENE-2404: Improve performance of ThaiWordFilter by using a char[]-backed
CharacterIterator (currently from javax.swing). (Uwe Schindler, Robert Muir)
Other
* LUCENE-2415: Use reflection instead of a shim class to access Jakarta
Regex prefix. (Uwe Schindler)
======================= Lucene 3.x (not yet released) =======================
Changes in backwards compatibility policy
* LUCENE-2100: All Analyzers in Lucene-contrib have been marked as final.
@ -19,15 +94,6 @@ Changes in backwards compatibility policy
* LUCENE-2226: Moved contrib/snowball functionality into contrib/analyzers.
Be sure to remove any old obselete lucene-snowball jar files from your
classpath! (Robert Muir)
* LUCENE-2323: Moved contrib/wikipedia functionality into contrib/analyzers.
Additionally the package was changed from org.apache.lucene.wikipedia.analysis
to org.apache.lucene.analysis.wikipedia. (Robert Muir)
* LUCENE-2413: Consolidated all analyzers into contrib/analyzers.
- contrib/analyzers/smartcn now depends on contrib/analyzers/common
- The "AnalyzerUtil" in wordnet was removed.
... (in progress)
Changes in runtime behavior
@ -72,10 +138,6 @@ Bug fixes
* LUCENE-2359: Fix bug in CartesianPolyFilterBuilder related to handling of behavior around
the 180th meridian (Grant Ingersoll)
* LUCENE-2404: Fix bugs with position increment and empty tokens in ThaiWordFilter.
For matchVersion >= 3.1 the filter also no longer lowercases. ThaiAnalyzer
will use a separate LowerCaseFilter instead. (Uwe Schindler, Robert Muir)
API Changes
@ -92,11 +154,7 @@ API Changes
stemming. Add Turkish and Romanian stopwords lists to support this.
(Robert Muir, Uwe Schindler, Simon Willnauer)
* LUCENE-2413: Deprecated PatternAnalyzer in contrib/analyzers, in favor of the
pattern package (CharFilter, Tokenizer, TokenFilter). (Robert Muir)
New features
* LUCENE-2306: Add NumericRangeFilter and NumericRangeQuery support to XMLQueryParser.
(Jingkei Ly, via Mark Harwood)
@ -132,45 +190,9 @@ New features
the ability to override any stemmer with a custom dictionary map.
(Robert Muir, Uwe Schindler, Simon Willnauer)
* LUCENE-2399: Add ICUNormalizer2Filter, which normalizes tokens with ICU's
Normalizer2. This allows for efficient combinations of normalization and custom
mappings in addition to standard normalization, and normalization combined
with unicode case folding. (Robert Muir)
* LUCENE-1343: Add ICUFoldingFilter, a replacement for ASCIIFoldingFilter that
does a more thorough job of normalizing unicode text for search.
(Robert Haschart, Robert Muir)
* LUCENE-2409: Add ICUTransformFilter, which transforms text in a context
sensitive way, either from ICU built-in rules (such as Traditional-Simplified),
or from rules you write yourself. (Robert Muir)
* LUCENE-2298: Add analyzers/stempel, an algorithmic stemmer with support for
the Polish language. (Andrzej Bialecki via Robert Muir)
* LUCENE-2414: Add ICUTokenizer, a tailorable tokenizer that implements Unicode
Text Segmentation. This tokenizer is useful for documents or collections with
multiple languages. The default configuration includes special support for
Thai, Lao, Myanmar, and Khmer. (Robert Muir, Uwe Schindler)
* LUCENE-2400: ShingleFilter was changed to don't output all-filler shingles and
unigrams, and uses a more performant algorithm to build grams using a linked list
of AttributeSource.cloneAttributes() instances and the new copyTo() method.
(Steven Rowe via Uwe Schindler)
* LUCENE-2413: Consolidated Solr analysis components into contrib/analyzers.
New features from Solr now available to Lucene users include:
- o.a.l.analysis.commongrams: Constructs n-grams for frequently occurring terms
and phrases.
- o.a.l.analysis.charfilter.HTMLStripCharFilter: CharFilter that strips HTML
constructs.
- o.a.l.analysis.miscellaneous.WordDelimiterFilter: TokenFilter that splits words
into subwords and performs optional transformations on subword groups.
- o.a.l.analysis.miscellaneous.RemoveDuplicatesTokenFilter: TokenFilter which
filters out Tokens at the same position and Term text as the previous token.
- o.a.l.analysis.pattern: Package for pattern-based analysis, containing a
CharFilter, Tokenizer, and Tokenfilter for transforming text with regexes.
(... in progress)
Build
@ -179,17 +201,13 @@ Build
(Steven Rowe, Robert Muir)
* LUCENE-2323: Moved contrib/regex into contrib/queries. Moved the
queryparsers under contrib/misc and contrib/surround into contrib/queryparser.
Moved contrib/fast-vector-highlighter into contrib/highlighter.
Moved ChainedFilter from contrib/misc to contrib/queries. contrib/spatial now
depends on contrib/queries instead of contrib/misc. (Robert Muir)
queryparsers under contrib/misc into contrib/queryparser. Moved
contrib/fast-vector-highlighter into contrib/highlighter. (Robert Muir)
* LUCENE-2333: Fix failures during contrib builds, when classes in
core were changed without ant clean. This fix also optimizes the
dependency management between contribs by a new ANT macro.
(Uwe Schindler, Shai Erera)
* LUCENE-2399: Upgrade contrib/icu's ICU jar file to ICU 4.4. (Robert Muir)
Optimizations
@ -206,9 +224,6 @@ Optimizations
have been optimized to work on char[] and remove unnecessary object creation.
(Shai Erera, Robert Muir)
* LUCENE-2404: Improve performance of ThaiWordFilter by using a char[]-backed
CharacterIterator (currently from javax.swing). (Uwe Schindler, Robert Muir)
Test Cases
* LUCENE-2115: Cutover contrib tests to use Java5 generics. (Kay Kay
@ -219,9 +234,6 @@ Other
* LUCENE-1845: Updated bdb-je jar from version 3.3.69 to 3.3.93.
(Simon Willnauer via Mike McCandless)
* LUCENE-2415: Use reflection instead of a shim class to access Jakarta
Regex prefix. (Uwe Schindler)
================== Release 2.9.2 / 3.0.1 2010-02-26 ====================
New features