Lucene contrib change Log

======================= Trunk (not yet released) =======================

Changes in runtime behavior

 (None)

API Changes

 (None)

Bug fixes

 1. LUCENE-1423: InstantiatedTermEnum#skipTo(Term) throws
    ArrayIndexOutOfBounds on empty index.
    (Karl Wettin)

 2. LUCENE-1462: InstantiatedIndexWriter did not reset pre-analyzed
    TokenStreams the same way IndexWriter does. Parts of
    InstantiatedIndex were not Serializable.
    (Karl Wettin)

 3. LUCENE-1510: InstantiatedIndexReader#norms methods throw
    NullPointerException on empty index.
    (Karl Wettin, Robert Newson)

 4. LUCENE-1514: ShingleMatrixFilter#next(Token) easily throws a
    StackOverflowError due to recursive invocation.
    (Karl Wettin)

 5. LUCENE-1548: Fix distance normalization in LevenshteinDistance to
    not produce negative distances.
    (Thomas Morton via Mike McCandless)

 6. LUCENE-1490: Fix latin1 conversion of
    HALFWIDTH_AND_FULLWIDTH_FORMS characters to only apply to the
    correct subset.
    (Daniel Cheng via Mike McCandless)

 7. LUCENE-1576: Fix BrazilianAnalyzer to downcase tokens after
    StandardTokenizer so that stop words with mixed case are
    filtered out.
    (Rafael Cunha de Almeida, Douglas Campos via Mike McCandless)

New features

 1. LUCENE-1531: Added support for BoostingTermQuery to XML query
    parser.
    (Karl Wettin)

 2. LUCENE-1435: Added contrib/collation, a CollationKeyFilter
    allowing you to convert tokens into CollationKeys encoded using
    IndexableBinaryStringTools. This allows for faster RangeQuery
    when a field needs to use a custom Collator.
    (Steven Rowe via Mike McCandless)

 3. LUCENE-1591: EnWikiDocMaker, LineDocMaker, WriteLineDoc can now
    read/write bz2 using Apache commons compress library. This means
    you can download the .bz2 export from http://wikipedia.org and
    immediately index it.
    (Shai Erera via Mike McCandless)

 4. LUCENE-1629: Add SmartChineseAnalyzer to contrib/analyzers. It
    improves on CJKAnalyzer and ChineseAnalyzer by handling Chinese
    sentences properly. SmartChineseAnalyzer uses a Hidden Markov
    Model to tokenize Chinese words in a more intelligent way.
    (Xiaoping Gao via Mike McCandless)

 5. LUCENE-1676: Added DelimitedPayloadTokenFilter class for
    automatically adding payloads "in-stream".
    (Grant Ingersoll)

 6. LUCENE-1578: Support for loading unoptimized readers to the
    constructor of InstantiatedIndex.
    (Karl Wettin)

 7. LUCENE-1704: Allow specifying the Tidy configuration file when
    parsing HTML docs with contrib/ant.
    (Keith Sprochi via Mike McCandless)

 8. LUCENE-1522: Added contrib/fast-vector-highlighter, a new
    alternative highlighter.
    (Koji Sekiguchi via Mike McCandless)

Optimizations

 1. LUCENE-1643: Re-use the collation key (RawCollationKey) for
    better performance, in ICUCollationKeyFilter.
    (Robert Muir via Mike McCandless)

Documentation

 (None)

Build

 (None)

Test Cases

 (None)

======================= Release 2.4.0 2008-10-06 =======================

Changes in runtime behavior

 (None)

API Changes

 (None)

Bug fixes

 1. LUCENE-1312: Added full support for
    InstantiatedIndexReader#getFieldNames() and tests that assert
    that deleted documents behave as they should (they did).
    (Jason Rutherglen, Karl Wettin)

 2. LUCENE-1318: InstantiatedIndexReader.norms(String, b[], int)
    didn't treat the array offset right.
    (Jason Rutherglen via Karl Wettin)

New features

 1. LUCENE-1320: ShingleMatrixFilter, multidimensional shingle token
    filter.
    (Karl Wettin)

 2. LUCENE-1142: Updated Snowball package, org.tartarus distribution
    revision 500. Introducing Hungarian, Turkish and Romanian
    support, updated older stemmers and optimized (reflectionless)
    SnowballFilter.
    IMPORTANT NOTICE ON BACKWARDS COMPATIBILITY: an index created
    using version 2.3.2 (or older) might not be compatible with
    these updated classes as some algorithms have changed.
    (Karl Wettin)

 3. LUCENE-1016: TermVectorAccessor, transparent vector space access
    via stored vectors or by resolving the inverted index.
    (Karl Wettin)

Documentation

 (None)

Build

 (None)

Test Cases

 (None)