Lucene contrib change Log
======================= Trunk (not yet released) =======================
Changes in runtime behavior
1. LUCENE-1505: Local Lucene now uses org.apache.lucene.util.NumericUtils for all
number conversion. You'll need to fully re-index any previously created indexes.
This isn't a break in back-compatibility because local Lucene has not yet
been released. (Mike McCandless)
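The re-index requirement follows from numbers now being written as differently encoded terms. As a minimal sketch of one idea behind sortable numeric terms, flipping the sign bit of a two's-complement long makes the unsigned, most-significant-byte-first order of its bytes agree with signed numeric order. This is not NumericUtils itself: that class also produces prefix-coded terms at several precisions, which this sketch omits. The class and method names (SortableLongSketch, encode, decode) are hypothetical.

```java
import java.util.Arrays;

/**
 * Hypothetical sketch, not Lucene's NumericUtils: flipping the sign bit of a
 * long makes the unsigned byte order of its big-endian encoding match
 * signed numeric order.
 */
final class SortableLongSketch {

    /** Flip the sign bit; applying it twice restores the original value. */
    static long flipSign(long value) {
        return value ^ 0x8000000000000000L;
    }

    /** Encode as 8 big-endian bytes after flipping the sign bit. */
    static byte[] encode(long value) {
        long bits = flipSign(value);
        byte[] out = new byte[8];
        for (int i = 7; i >= 0; i--) {
            out[i] = (byte) bits; // keep the lowest 8 bits
            bits >>>= 8;
        }
        return out;
    }

    /** Reverse of encode(). */
    static long decode(byte[] bytes) {
        long bits = 0;
        for (byte b : bytes) {
            bits = (bits << 8) | (b & 0xFF);
        }
        return flipSign(bits);
    }

    /** Unsigned lexicographic byte comparison, as a byte-sorted term dictionary would use. */
    static int compareUnsigned(byte[] a, byte[] b) {
        int n = Math.min(a.length, b.length);
        for (int i = 0; i < n; i++) {
            int cmp = Integer.compare(a[i] & 0xFF, b[i] & 0xFF);
            if (cmp != 0) {
                return cmp;
            }
        }
        return Integer.compare(a.length, b.length);
    }

    public static void main(String[] args) {
        long[] values = {42L, -5L, Long.MAX_VALUE, 0L, Long.MIN_VALUE};
        byte[][] encoded = new byte[values.length][];
        for (int i = 0; i < values.length; i++) {
            encoded[i] = encode(values[i]);
        }
        // Sort by raw bytes only, then decode: the values come out in numeric order.
        Arrays.sort(encoded, SortableLongSketch::compareUnsigned);
        for (byte[] e : encoded) {
            System.out.println(decode(e));
        }
    }
}
```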
API Changes
1. LUCENE-1695: Update the Highlighter to use the new TokenStream API. This issue breaks backwards
compatibility with some public classes. If you have implemented custom Fragmenters or Scorers,
you will need to adjust them to work with the new TokenStream API. Rather than being passed a
Token at a time, your implementation is given a TokenStream to initialize itself with: store the
Attributes you are interested in locally and read them on each call to the method that used to
be passed a new Token. See the included updated implementations for examples. (Mark Miller)
Bug fixes
1. LUCENE-1423: InstantiatedTermEnum#skipTo(Term) threw ArrayIndexOutOfBoundsException on an empty index.
(Karl Wettin)
2. LUCENE-1462: InstantiatedIndexWriter did not reset pre-analyzed TokenStreams the
same way IndexWriter does. Parts of InstantiatedIndex were not Serializable.
(Karl Wettin)
3. LUCENE-1510: InstantiatedIndexReader#norms methods threw NullPointerException on an empty index.
(Karl Wettin, Robert Newson)
4. LUCENE-1514: ShingleMatrixFilter#next(Token) could easily throw a StackOverflowError
due to recursive invocation. (Karl Wettin)
5. LUCENE-1548: Fix distance normalization in LevenshteinDistance to
not produce negative distances (Thomas Morton via Mike McCandless)
6. LUCENE-1490: Fix latin1 conversion of HALFWIDTH_AND_FULLWIDTH_FORMS
characters to only apply to the correct subset (Daniel Cheng via
Mike McCandless)
7. LUCENE-1576: Fix BrazilianAnalyzer to downcase tokens after
StandardTokenizer so that stop words with mixed case are filtered
out. (Rafael Cunha de Almeida, Douglas Campos via Mike McCandless)
8. LUCENE-1491: EdgeNGramTokenFilter no longer stops on tokens shorter than minimum n-gram size.
(Todd Teak via Otis Gospodnetic)
9. LUCENE-1683: Fixed JavaUtilRegexCapabilities (an impl used by
RegexQuery) to use Matcher.matches() not Matcher.lookingAt() so
that the regexp must match the entire string, not just a prefix.
(Trejkaz via Mike McCandless)
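The difference behind LUCENE-1683 can be shown with plain java.util.regex: lookingAt() succeeds when a prefix of the input matches, while matches() requires the whole input to match. The class and method names below (RegexAnchoringDemo, prefixMatch, fullMatch) are hypothetical and only illustrate the two Matcher calls; they are not the JavaUtilRegexCapabilities code.

```java
import java.util.regex.Pattern;

/** Hypothetical demo: prefix matching (lookingAt) versus whole-input matching (matches). */
final class RegexAnchoringDemo {

    /** Old behavior: true if some prefix of the term matches the regex. */
    static boolean prefixMatch(String regex, String term) {
        return Pattern.compile(regex).matcher(term).lookingAt();
    }

    /** New behavior: true only if the entire term matches the regex. */
    static boolean fullMatch(String regex, String term) {
        return Pattern.compile(regex).matcher(term).matches();
    }

    public static void main(String[] args) {
        String regex = "luc.*e";
        for (String term : new String[] {"lucene", "lucenes", "lucid"}) {
            System.out.printf("%-8s lookingAt=%b matches=%b%n",
                    term, prefixMatch(regex, term), fullMatch(regex, term));
        }
    }
}
```

With the old behavior a RegexQuery for "luc.*e" would also have matched the term "lucenes", because its prefix "lucene" matches.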
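For LUCENE-1490, a minimal sketch of the kind of range check involved, assuming the "correct subset" is the fullwidth ASCII variants U+FF01 to U+FF5E. Those map to U+0021 to U+007E by subtracting 0xFEE0, while the rest of the HALFWIDTH_AND_FULLWIDTH_FORMS block (for example halfwidth katakana) has no such ASCII counterpart. The entry does not name the affected class, so FullwidthFoldingSketch and fold are hypothetical.

```java
/**
 * Hypothetical sketch: fold only the fullwidth ASCII variants (U+FF01..U+FF5E)
 * of the HALFWIDTH_AND_FULLWIDTH_FORMS block to their Basic Latin forms.
 */
final class FullwidthFoldingSketch {

    private static final int FULLWIDTH_FIRST = 0xFF01; // FULLWIDTH EXCLAMATION MARK
    private static final int FULLWIDTH_LAST = 0xFF5E;  // FULLWIDTH TILDE
    private static final int OFFSET = 0xFEE0;          // 0xFF01 - 0x0021

    static char fold(char c) {
        if (c >= FULLWIDTH_FIRST && c <= FULLWIDTH_LAST) {
            return (char) (c - OFFSET);
        }
        // Everything else in the block (e.g. halfwidth katakana, U+FF65..U+FF9F)
        // is left untouched instead of being shifted into an unrelated range.
        return c;
    }

    static String fold(String s) {
        StringBuilder sb = new StringBuilder(s.length());
        for (int i = 0; i < s.length(); i++) {
            sb.append(fold(s.charAt(i)));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // Fullwidth "Lucene" folds to ASCII "Lucene".
        System.out.println(fold("\uFF2C\uFF55\uFF43\uFF45\uFF4E\uFF45"));
    }
}
```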
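For LUCENE-1491, a minimal sketch of front-edge n-gram generation over a token list that skips tokens shorter than the minimum n-gram size instead of stopping there. It works on plain strings rather than a TokenStream, and EdgeNGramSketch and frontEdgeNGrams are hypothetical names, not EdgeNGramTokenFilter's API.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/** Hypothetical sketch of front-edge n-grams that skips, rather than stops at, short tokens. */
final class EdgeNGramSketch {

    static List<String> frontEdgeNGrams(List<String> tokens, int minGram, int maxGram) {
        List<String> grams = new ArrayList<>();
        for (String token : tokens) {
            if (token.length() < minGram) {
                // Stopping here instead would silently drop every later token.
                continue;
            }
            int longest = Math.min(maxGram, token.length());
            for (int n = minGram; n <= longest; n++) {
                grams.add(token.substring(0, n));
            }
        }
        return grams;
    }

    public static void main(String[] args) {
        List<String> tokens = Arrays.asList("a", "lucene", "is", "fast");
        System.out.println(frontEdgeNGrams(tokens, 2, 3));
    }
}
```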
New features
1. LUCENE-1531: Added support for BoostingTermQuery to XML query parser. (Karl Wettin)
2. LUCENE-1435: Added contrib/collation, a CollationKeyFilter
allowing you to convert tokens into CollationKeys encoded using
IndexableBinaryStringTools. This allows for faster RangeQuery when
a field needs to use a custom Collator. (Steven Rowe via Mike
McCandless)
3. LUCENE-1591: EnWikiDocMaker, LineDocMaker, WriteLineDoc can now
read/write bz2 using Apache commons compress library. This means
you can download the .bz2 export from http://wikipedia.org and
immediately index it. (Shai Erera via Mike McCandless)
4. LUCENE-1629: Add SmartChineseAnalyzer to contrib/analyzers. It
improves on CJKAnalyzer and ChineseAnalyzer by handling Chinese
sentences properly. SmartChineseAnalyzer uses a Hidden Markov
Model to tokenize Chinese words in a more intelligent way.
(Xiaoping Gao via Mike McCandless)
5. LUCENE-1676: Added DelimitedPayloadTokenFilter class for automatically adding payloads "in-stream" (Grant Ingersoll)
6. LUCENE-1578: Added support for passing unoptimized readers to the
constructor of InstantiatedIndex. (Karl Wettin)
7. LUCENE-1704: Allow specifying the Tidy configuration file when
parsing HTML docs with contrib/ant. (Keith Sprochi via Mike
McCandless)
8. LUCENE-1522: Added contrib/fast-vector-highlighter, a new alternative
highlighter. (Koji Sekiguchi via Mike McCandless)
9. LUCENE-1740: Added "analyzer" command to Lucli, enabling changing
the analyzer from the default StandardAnalyzer. (Bernd Fondermann
via Mike McCandless)
10. LUCENE-1272: Add get/setBoost to MoreLikeThis. (Jonathan
Leibiusky via Mike McCandless)
11. LUCENE-1745: Added constructors to JakartaRegexpCapabilities and
JavaUtilRegexCapabilities as well as static flags to support
configuring a RegexCapabilities implementation with the
implementation-specific modifier flags. This allows callers to
customize the RegexQuery using the implementation-specific options
and fine-tune how regular expressions are compiled and
matched. (Marc Zampetti zampettim@aim.com via Mike McCandless)
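The reason LUCENE-1435 speeds up range queries can be sketched with java.text alone: once each term is turned into its CollationKey bytes, a plain unsigned byte comparison reproduces the Collator's ordering, so no Collator call is needed per term at search time. This sketch does not use CollationKeyFilter or IndexableBinaryStringTools (which additionally encodes the bytes as indexable characters), and CollationKeySketch and its methods are hypothetical names.

```java
import java.text.Collator;
import java.util.Arrays;
import java.util.Locale;

/** Hypothetical sketch: collation key bytes compared as unsigned bytes follow the Collator's order. */
final class CollationKeySketch {

    /** Compute a term's collation key bytes once, as an indexing-time filter would. */
    static byte[] keyBytes(Collator collator, String term) {
        return collator.getCollationKey(term).toByteArray();
    }

    /** Unsigned lexicographic comparison of two byte arrays. */
    static int compareUnsigned(byte[] a, byte[] b) {
        int n = Math.min(a.length, b.length);
        for (int i = 0; i < n; i++) {
            int cmp = Integer.compare(a[i] & 0xFF, b[i] & 0xFF);
            if (cmp != 0) {
                return cmp;
            }
        }
        return Integer.compare(a.length, b.length);
    }

    static String[] sortByCollator(Collator collator, String[] terms) {
        String[] sorted = terms.clone();
        Arrays.sort(sorted, collator);
        return sorted;
    }

    static String[] sortByKeyBytes(Collator collator, String[] terms) {
        String[] sorted = terms.clone();
        Arrays.sort(sorted, (x, y) -> compareUnsigned(keyBytes(collator, x), keyBytes(collator, y)));
        return sorted;
    }

    public static void main(String[] args) {
        Collator collator = Collator.getInstance(Locale.US);
        String[] terms = {"banana", "Apple", "\u00e9clair", "apple", "cherry", "Banana", "eclair"};
        System.out.println(Arrays.toString(sortByCollator(collator, terms)));
        System.out.println(Arrays.toString(sortByKeyBytes(collator, terms)));
    }
}
```

Both lines print the same order, whereas plain String order would put "Banana" before "apple".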
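For LUCENE-1676, a minimal sketch of the "in-stream" payload idea: a token such as "quick|2.0" is split at a delimiter into the term and a payload. The '|' delimiter and the 4-byte big-endian float encoding are assumptions made for this sketch, and DelimitedPayloadSketch, TermWithPayload, split and decodeFloat are hypothetical names; the real filter's defaults and payload encoders may differ.

```java
import java.nio.ByteBuffer;

/** Hypothetical sketch: split "term<delimiter>weight" into a term and float payload bytes. */
final class DelimitedPayloadSketch {

    /** Illustrative holder for a term and its optional payload bytes. */
    static final class TermWithPayload {
        final String term;
        final byte[] payload; // null when the token had no delimiter

        TermWithPayload(String term, byte[] payload) {
            this.term = term;
            this.payload = payload;
        }
    }

    /** Encode the text after the first delimiter as 4 big-endian float bytes (assumed encoding). */
    static TermWithPayload split(String token, char delimiter) {
        int idx = token.indexOf(delimiter);
        if (idx < 0) {
            return new TermWithPayload(token, null);
        }
        float weight = Float.parseFloat(token.substring(idx + 1));
        byte[] payload = ByteBuffer.allocate(4).putFloat(weight).array();
        return new TermWithPayload(token.substring(0, idx), payload);
    }

    static float decodeFloat(byte[] payload) {
        return ByteBuffer.wrap(payload).getFloat();
    }

    public static void main(String[] args) {
        for (String token : "the|0.1 quick|2.0 fox".split(" ")) {
            TermWithPayload t = split(token, '|');
            String shown = t.payload == null ? "no payload" : Float.toString(decodeFloat(t.payload));
            System.out.println(t.term + " -> " + shown);
        }
    }
}
```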
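For LUCENE-1745, a minimal sketch of a regex capability object configured with implementation-specific flags, here java.util.regex flags handed straight to Pattern.compile. FlaggedRegexSketch, compile and match are hypothetical stand-ins, not the actual constructors or flag constants of JavaUtilRegexCapabilities. Like the LUCENE-1683 fix, it uses matches() so the whole term must match.

```java
import java.util.regex.Pattern;

/** Hypothetical stand-in for a RegexCapabilities implementation configured with modifier flags. */
final class FlaggedRegexSketch {

    private final int flags;
    private Pattern pattern;

    /** Flags are java.util.regex.Pattern constants, e.g. Pattern.CASE_INSENSITIVE. */
    FlaggedRegexSketch(int flags) {
        this.flags = flags;
    }

    /** Compile the query's regex once, applying the configured flags. */
    void compile(String regex) {
        pattern = Pattern.compile(regex, flags);
    }

    /** Whole-term match, as required since LUCENE-1683. */
    boolean match(String term) {
        return pattern.matcher(term).matches();
    }

    public static void main(String[] args) {
        FlaggedRegexSketch caseSensitive = new FlaggedRegexSketch(0);
        FlaggedRegexSketch caseInsensitive = new FlaggedRegexSketch(Pattern.CASE_INSENSITIVE);
        caseSensitive.compile("lucene.*");
        caseInsensitive.compile("lucene.*");
        System.out.println(caseSensitive.match("LuceneRocks"));
        System.out.println(caseInsensitive.match("LuceneRocks"));
    }
}
```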
Optimizations
1. LUCENE-1643: Re-use the collation key (RawCollationKey) in
ICUCollationKeyFilter for better performance. (Robert Muir via
Mike McCandless)
Documentation
(None)
Build
(None)
Test Cases
(None)
======================= Release 2.4.0 2008-10-06 =======================
Changes in runtime behavior
(None)
API Changes
(None)
Bug fixes
1. LUCENE-1312: Added full support for InstantiatedIndexReader#getFieldNames()
and tests that assert that deleted documents behave as they should (they did).
(Jason Rutherglen, Karl Wettin)
2. LUCENE-1318: InstantiatedIndexReader.norms(String, byte[], int) did not
handle the array offset correctly. (Jason Rutherglen via Karl Wettin)
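For LUCENE-1318, a minimal sketch of what honoring the offset means: a norms(field, bytes, offset) style method should copy one norm per document into the caller's array starting at offset, not at index 0. Presumably this matters because a composite reader can gather several sub-readers' norms into one shared array at different offsets. NormsOffsetSketch and copyNorms are hypothetical names, not InstantiatedIndexReader's code.

```java
import java.util.Arrays;

/** Hypothetical sketch: copy per-document norms into a caller's array at a given offset. */
final class NormsOffsetSketch {

    /** Write fieldNorms into bytes starting at offset; ignoring offset would overwrite earlier entries. */
    static void copyNorms(byte[] fieldNorms, byte[] bytes, int offset) {
        System.arraycopy(fieldNorms, 0, bytes, offset, fieldNorms.length);
    }

    public static void main(String[] args) {
        byte[] all = new byte[5];
        // Two "sub-readers" with 3 and 2 documents fill one shared array.
        copyNorms(new byte[] {1, 2, 3}, all, 0);
        copyNorms(new byte[] {4, 5}, all, 3);
        System.out.println(Arrays.toString(all));
    }
}
```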
New features
1. LUCENE-1320: ShingleMatrixFilter, multidimensional shingle token filter. (Karl Wettin)
2. LUCENE-1142: Updated Snowball package, org.tartarus distribution revision 500.
Introducing Hungarian, Turkish and Romanian support, updated older stemmers
and optimized (reflectionless) SnowballFilter.
IMPORTANT NOTICE ON BACKWARDS COMPATIBILITY: an index created using 2.3.2 (or older)
might not be compatible with these updated classes as some algorithms have changed.
(Karl Wettin)
3. LUCENE-1016: TermVectorAccessor, transparent vector space access via stored vectors
or by resolving the inverted index. (Karl Wettin)
Documentation
(None)
Build
(None)
Test Cases
(None)