lucene/contrib/CHANGES.txt

Lucene contrib change Log

======================= Trunk (not yet released) =======================

Changes in runtime behavior

 1. LUCENE-1505: Local Lucene now uses org.apache.lucene.util.NumericUtils for all
    number conversion.  You'll need to fully re-index any previously created indexes.
    This isn't a break in back-compatibility because local Lucene has not yet
    been released.  (Mike McCandless)

API Changes

 1. LUCENE-1695: Update the Highlighter to use the new TokenStream API. This issue breaks backwards
    compatibility with some public classes. If you have implemented custom Fragmenters or Scorers,
    you will need to adjust them to work with the new TokenStream API. Rather than being passed one
    Token at a time, you will be given a TokenStream to initialize your implementation with. Store
    the Attributes you are interested in locally and access them on each call to the method that
    used to be passed a new Token. See the updated implementations included with the Highlighter
    for examples.  (Mark Miller)

Bug fixes

 1. LUCENE-1423: InstantiatedTermEnum#skipTo(Term) throws ArrayIndexOutOfBounds on empty index.
    (Karl Wettin)

 2. LUCENE-1462: InstantiatedIndexWriter did not reset pre-analyzed TokenStreams the
    same way IndexWriter does. Parts of InstantiatedIndex were not Serializable.
    (Karl Wettin)

 3. LUCENE-1510: InstantiatedIndexReader#norms methods throw NullPointerException on empty index.
    (Karl Wettin, Robert Newson)

 4. LUCENE-1514: ShingleMatrixFilter#next(Token) easily throws a StackOverflowError
    due to recursive invocation. (Karl Wettin)

 5. LUCENE-1548: Fix distance normalization in LevenshteinDistance to
    not produce negative distances (Thomas Morton via Mike McCandless)

 6. LUCENE-1490: Fix latin1 conversion of HALFWIDTH_AND_FULLWIDTH_FORMS
    characters to only apply to the correct subset (Daniel Cheng via
    Mike McCandless)

 7. LUCENE-1576: Fix BrazilianAnalyzer to downcase tokens after
    StandardTokenizer so that stop words with mixed case are filtered
    out.  (Rafael Cunha de Almeida, Douglas Campos via Mike McCandless)

 8. LUCENE-1491: EdgeNGramTokenFilter no longer stops on tokens shorter than minimum n-gram size.
    (Todd Teak via Otis Gospodnetic)

 9. LUCENE-1683: Fixed JavaUtilRegexCapabilities (an impl used by
    RegexQuery) to use Matcher.matches() not Matcher.lookingAt() so
    that the regexp must match the entire string, not just a prefix.
    (Trejkaz via Mike McCandless)
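
The difference between the two Matcher methods named in item 9 can be seen
with plain java.util.regex. The following is a minimal sketch, not the code
from JavaUtilRegexCapabilities; the class and method names are hypothetical
and chosen only for illustration.

```java
import java.util.regex.Pattern;

// Hypothetical helper class; it is not part of Lucene.
public class MatchesVsLookingAt {

    // True only when the entire input matches the pattern.
    static boolean fullMatch(String regex, String input) {
        return Pattern.compile(regex).matcher(input).matches();
    }

    // True when the pattern matches starting at the beginning of the
    // input, even if trailing characters are left unmatched.
    static boolean prefixMatch(String regex, String input) {
        return Pattern.compile(regex).matcher(input).lookingAt();
    }

    public static void main(String[] args) {
        System.out.println(prefixMatch("luc", "lucene"));   // true
        System.out.println(fullMatch("luc", "lucene"));     // false
        System.out.println(fullMatch("luc.*", "lucene"));   // true
    }
}
```

With whole-string matching, a pattern intended to match terms by prefix has
to say so explicitly, for example with a trailing ".*".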

New features

 1. LUCENE-1531: Added support for BoostingTermQuery to XML query parser. (Karl Wettin)

 2. LUCENE-1435: Added contrib/collation, a CollationKeyFilter
    allowing you to convert tokens into CollationKeys encoded using
    IndexableBinaryStringTools.  This allows for faster RangeQuery when
    a field needs to use a custom Collator.  (Steven Rowe via Mike
    McCandless)

 3. LUCENE-1591: EnWikiDocMaker, LineDocMaker, WriteLineDoc can now
    read/write bz2 using the Apache Commons Compress library.  This means
    you can download the .bz2 export from http://wikipedia.org and
    immediately index it.  (Shai Erera via Mike McCandless)

 4. LUCENE-1629: Add SmartChineseAnalyzer to contrib/analyzers.  It
    improves on CJKAnalyzer and ChineseAnalyzer by handling Chinese
    sentences properly.  SmartChineseAnalyzer uses a Hidden Markov
    Model to tokenize Chinese words in a more intelligent way.
    (Xiaoping Gao via Mike McCandless)

 5. LUCENE-1676: Added DelimitedPayloadTokenFilter class for
    automatically adding payloads "in-stream".  (Grant Ingersoll)

 6. LUCENE-1578: Support for loading unoptimized readers to the
    constructor of InstantiatedIndex. (Karl Wettin)

 7. LUCENE-1704: Allow specifying the Tidy configuration file when
    parsing HTML docs with contrib/ant.  (Keith Sprochi via Mike
    McCandless)

 8. LUCENE-1522: Added contrib/fast-vector-highlighter, a new alternative
    highlighter.  (Koji Sekiguchi via Mike McCandless)

 9. LUCENE-1740: Added "analyzer" command to Lucli, enabling changing
    the analyzer from the default StandardAnalyzer.  (Bernd Fondermann
    via Mike McCandless)

10. LUCENE-1272: Add get/setBoost to MoreLikeThis. (Jonathan
    Leibiusky via Mike McCandless)

11. LUCENE-1745: Added constructors to JakartaRegexpCapabilities and
    JavaUtilRegexCapabilities as well as static flags to support
    configuring a RegexCapabilities implementation with the
    implementation-specific modifier flags. Allows for callers to
    customize the RegexQuery using the implementation-specific options
    and fine tune how regular expressions are compiled and
    matched. (Marc Zampetti zampettim@aim.com via Mike McCandless)
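
As an illustration of the idea in item 11, implementation-specific modifier
flags can be captured in a constructor and applied when the expression is
compiled. The following is a minimal sketch using only java.util.regex. The
FlaggedRegexMatcher class is hypothetical and is not the Lucene
RegexCapabilities API; the actual flag constants exposed by
JavaUtilRegexCapabilities are not reproduced here.

```java
import java.util.regex.Pattern;

// Hypothetical stand-in for a flag-aware regex capability; not a Lucene class.
public class FlaggedRegexMatcher {

    // Bitmask of java.util.regex.Pattern modifier flags, e.g.
    // Pattern.CASE_INSENSITIVE | Pattern.DOTALL. Zero means no flags.
    private final int flags;

    // Default behavior: compile with no modifier flags.
    public FlaggedRegexMatcher() {
        this(0);
    }

    // Caller-supplied flags are applied to every pattern compiled later.
    public FlaggedRegexMatcher(int flags) {
        this.flags = flags;
    }

    // Whole-string match using the configured flags.
    public boolean matches(String regex, String text) {
        return Pattern.compile(regex, flags).matcher(text).matches();
    }

    public static void main(String[] args) {
        FlaggedRegexMatcher plain = new FlaggedRegexMatcher();
        FlaggedRegexMatcher ignoreCase =
            new FlaggedRegexMatcher(Pattern.CASE_INSENSITIVE);

        System.out.println(plain.matches("lucene", "LUCENE"));      // false
        System.out.println(ignoreCase.matches("lucene", "LUCENE")); // true
    }
}
```

Passing the flags at construction time keeps the matching call unchanged, so
the same query code can run with different compilation options.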

Optimizations

 1. LUCENE-1643: Re-use the collation key (RawCollationKey) for
    better performance, in ICUCollationKeyFilter.  (Robert Muir via
    Mike McCandless)

Documentation

 (None)

Build

 (None)

Test Cases

 (None)

======================= Release 2.4.0 2008-10-06 =======================

Changes in runtime behavior

 (None)

API Changes

 (None)

Bug fixes

 1. LUCENE-1312: Added full support for InstantiatedIndexReader#getFieldNames()
    and tests that assert that deleted documents behave as they should (they did).
    (Jason Rutherglen, Karl Wettin)

 2. LUCENE-1318: InstantiatedIndexReader.norms(String, byte[], int) did not handle
    the array offset correctly. (Jason Rutherglen via Karl Wettin)

New features

 1. LUCENE-1320: ShingleMatrixFilter, multidimensional shingle token filter. (Karl Wettin)

 2. LUCENE-1142: Updated Snowball package, org.tartarus distribution revision 500.
    Introducing Hungarian, Turkish and Romanian support, updated older stemmers
    and optimized (reflectionless) SnowballFilter.
    IMPORTANT NOTICE ON BACKWARDS COMPATIBILITY: an index created using 2.3.2 (or older)
    might not be compatible with these updated classes as some algorithms have changed.
    (Karl Wettin)

 3. LUCENE-1016: TermVectorAccessor, transparent vector space access via stored vectors
    or by resolving the inverted index. (Karl Wettin)

Documentation

 (None)

Build

 (None)

Test Cases

 (None)