Lucene contrib change Log

======================= Trunk (not yet released) =======================

Changes in runtime behavior

 1. LUCENE-1505: Local Lucene now uses org.apache.lucene.util.NumericUtils
    for all number conversion. You'll need to fully re-index any
    previously created indexes. This isn't a break in back-compatibility
    because Local Lucene has not yet been released. (Mike McCandless)

API Changes

 1. LUCENE-1695: Update the Highlighter to use the new TokenStream API.
    This issue breaks backwards compatibility with some public classes.
    If you have implemented custom Fragmenters or Scorers, you will need
    to adjust them to work with the new TokenStream API. Rather than
    getting passed a Token at a time, you will be given a TokenStream to
    init your impl with. Store the Attributes you are interested in
    locally and access them on each call to the method that used to pass
    a new Token. Look at the included updated impls for examples.
    (Mark Miller)

Bug fixes

 1. LUCENE-1423: InstantiatedTermEnum#skipTo(Term) throws
    ArrayIndexOutOfBoundsException on empty index. (Karl Wettin)

 2. LUCENE-1462: InstantiatedIndexWriter did not reset pre-analyzed
    TokenStreams the same way IndexWriter does. Parts of
    InstantiatedIndex were not Serializable. (Karl Wettin)

 3. LUCENE-1510: InstantiatedIndexReader#norms methods throw
    NullPointerException on empty index. (Karl Wettin, Robert Newson)

 4. LUCENE-1514: ShingleMatrixFilter#next(Token) easily throws a
    StackOverflowError due to recursive invocation. (Karl Wettin)

 5. LUCENE-1548: Fix distance normalization in LevenshteinDistance to
    not produce negative distances. (Thomas Morton via Mike McCandless)

 6. LUCENE-1490: Fix latin1 conversion of HALFWIDTH_AND_FULLWIDTH_FORMS
    characters to only apply to the correct subset. (Daniel Cheng via
    Mike McCandless)

 7. LUCENE-1576: Fix BrazilianAnalyzer to downcase tokens after
    StandardTokenizer so that stop words with mixed case are filtered
    out. (Rafael Cunha de Almeida, Douglas Campos via Mike McCandless)

 8. LUCENE-1491: EdgeNGramTokenFilter no longer stops on tokens shorter
    than minimum n-gram size. (Todd Teak via Otis Gospodnetic)

 9. LUCENE-1683: Fixed JavaUtilRegexCapabilities (an impl used by
    RegexQuery) to use Matcher.matches() not Matcher.lookingAt() so that
    the regexp must match the entire string, not just a prefix.
    (Trejkaz via Mike McCandless)

New features

 1. LUCENE-1531: Added support for BoostingTermQuery to XML query parser.
    (Karl Wettin)

 2. LUCENE-1435: Added contrib/collation, a CollationKeyFilter allowing
    you to convert tokens into CollationKeys encoded using
    IndexableBinaryStringTools. This allows for faster RangeQuery when a
    field needs to use a custom Collator. (Steven Rowe via Mike
    McCandless)

 3. LUCENE-1591: EnWikiDocMaker, LineDocMaker, WriteLineDoc can now
    read/write bz2 using Apache commons compress library. This means you
    can download the .bz2 export from http://wikipedia.org and
    immediately index it. (Shai Erera via Mike McCandless)

 4. LUCENE-1629: Add SmartChineseAnalyzer to contrib/analyzers. It
    improves on CJKAnalyzer and ChineseAnalyzer by handling Chinese
    sentences properly. SmartChineseAnalyzer uses a Hidden Markov Model
    to tokenize Chinese words in a more intelligent way. (Xiaoping Gao
    via Mike McCandless)

 5. LUCENE-1676: Added DelimitedPayloadTokenFilter class for
    automatically adding payloads "in-stream". (Grant Ingersoll)

 6. LUCENE-1578: Added support for loading unoptimized readers through
    the constructor of InstantiatedIndex. (Karl Wettin)

 7. LUCENE-1704: Allow specifying the Tidy configuration file when
    parsing HTML docs with contrib/ant. (Keith Sprochi via Mike
    McCandless)

 8. LUCENE-1522: Added contrib/fast-vector-highlighter, a new
    alternative highlighter. (Koji Sekiguchi via Mike McCandless)

 9. LUCENE-1740: Added "analyzer" command to Lucli, enabling changing
    the analyzer from the default StandardAnalyzer. (Bernd Fondermann
    via Mike McCandless)

10. LUCENE-1272: Add get/setBoost to MoreLikeThis. (Jonathan Leibiusky
    via Mike McCandless)

11. LUCENE-1745: Added constructors to JakartaRegexpCapabilities and
    JavaUtilRegexCapabilities as well as static flags to support
    configuring a RegexCapabilities implementation with the
    implementation-specific modifier flags. This allows callers to
    customize the RegexQuery using the implementation-specific options
    and fine-tune how regular expressions are compiled and matched.
    (Marc Zampetti zampettim@aim.com via Mike McCandless)

Optimizations

 1. LUCENE-1643: Re-use the collation key (RawCollationKey) for better
    performance, in ICUCollationKeyFilter. (Robert Muir via Mike
    McCandless)

Documentation

 (None)

Build

 (None)

Test Cases

 (None)

======================= Release 2.4.0 2008-10-06 =======================

Changes in runtime behavior

 (None)

API Changes

 (None)

Bug fixes

 1. LUCENE-1312: Added full support for
    InstantiatedIndexReader#getFieldNames() and tests that assert that
    deleted documents behave as they should (they did).
    (Jason Rutherglen, Karl Wettin)

 2. LUCENE-1318: InstantiatedIndexReader.norms(String, byte[], int)
    didn't treat the array offset right. (Jason Rutherglen via Karl
    Wettin)

New features

 1. LUCENE-1320: ShingleMatrixFilter, multidimensional shingle token
    filter. (Karl Wettin)

 2. LUCENE-1142: Updated Snowball package, org.tartarus distribution
    revision 500. Introducing Hungarian, Turkish and Romanian support,
    updated older stemmers and optimized (reflectionless)
    SnowballFilter.

    IMPORTANT NOTICE ON BACKWARDS COMPATIBILITY: an index created using
    2.3.2 (or older) might not be compatible with these updated classes
    as some algorithms have changed. (Karl Wettin)

 3. LUCENE-1016: TermVectorAccessor, transparent vector space access via
    stored vectors or by resolving the inverted index. (Karl Wettin)

Documentation

 (None)

Build

 (None)

Test Cases

 (None)
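
Note on LUCENE-1683 (Trunk, Bug fixes): the entry turns on the difference between Matcher.lookingAt() and Matcher.matches() in java.util.regex. The sketch below uses only the JDK and shows that difference in isolation. It is not the code from JavaUtilRegexCapabilities; the class name RegexMatchDemo and the helpers prefixMatch and fullMatch are hypothetical names made up for this illustration.

```java
import java.util.regex.Pattern;

// Hypothetical demo class, not part of Lucene.
public class RegexMatchDemo {

    // Anchored at the start only: true when the pattern matches a prefix
    // of the term, even if characters are left over afterwards.
    static boolean prefixMatch(String regex, String term) {
        return Pattern.compile(regex).matcher(term).lookingAt();
    }

    // Anchored at both ends: true only when the pattern consumes the
    // entire term.
    static boolean fullMatch(String regex, String term) {
        return Pattern.compile(regex).matcher(term).matches();
    }

    public static void main(String[] args) {
        String term = "lucene";
        System.out.println("lookingAt(luc):   " + prefixMatch("luc", term));
        System.out.println("matches(luc):     " + fullMatch("luc", term));
        System.out.println("matches(luc.*):   " + fullMatch("luc.*", term));
    }
}
```

With lookingAt(), the pattern "luc" accepts the term "lucene"; with matches() it does not, and the pattern has to be written as "luc.*" to accept it. Expressions that relied on the old prefix behavior may therefore need a trailing ".*".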