Lucene contrib change Log

======================= Trunk (not yet released) =======================

Changes in runtime behavior

 1. LUCENE-1505: Local Lucene now uses org.apache.lucene.util.NumericUtils for all
    number conversion. You'll need to fully re-index any previously created indexes.
    This isn't a break in back-compatibility because Local Lucene has not yet
    been released. (Mike McCandless)

 2. LUCENE-1758: ArabicAnalyzer now uses the light10 algorithm, has a refined
    default stopword list, and lowercases non-Arabic text.
    You'll need to fully re-index any previously created indexes. This isn't a
    break in back-compatibility because ArabicAnalyzer has not yet been
    released. (Robert Muir)

API Changes

 1. LUCENE-1695: Update the Highlighter to use the new TokenStream API. This issue breaks
    backwards compatibility with some public classes. If you have implemented custom
    Fragmenters or Scorers, you will need to adjust them to work with the new TokenStream
    API. Rather than being passed one Token at a time, your implementation is given a
    TokenStream to initialize itself with. Store the Attributes you are interested in
    locally and read them on each call to the method that used to receive a new Token.
    See the updated implementations included in the Highlighter for examples. (Mark Miller)

 2. LUCENE-1460: Change contrib TokenStreams/Filters to use the new
    TokenStream API. (Robert Muir, Michael Busch)

 3. LUCENE-1775: Change remaining TokenFilters (shingle, prefix-suffix) to
    use the new TokenStream API. (Robert Muir, Michael Busch)

 4. LUCENE-1685: The position-aware SpanScorer has become the default scorer
    for Highlighting. The SpanScorer implementation has replaced QueryScorer
    and the old term highlighting QueryScorer has been renamed to
    QueryTermScorer. Multi-term queries are also now expanded by default. If
    you were previously rewriting the query for multi-term query highlighting,
    you should no longer do that (unless you switch to using QueryTermScorer).
    The SpanScorer API (now QueryScorer) has also been improved to more closely
    match the API of the previous QueryScorer implementation. (Mark Miller)
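
The pattern described in LUCENE-1695 (capture the attribute once at init time,
then read it on every call instead of receiving a Token) can be sketched as
follows. This is a minimal illustration only: TermAttr, ListTokenStream and
TargetTermScorer are hypothetical stand-in types written for this example, not
the Lucene TokenStream, Attribute or Scorer classes, and the real method
signatures may differ.

```java
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;

public class AttributeScorerSketch {
    /** Stand-in for a term attribute: one mutable instance shared by stream and consumer. */
    static final class TermAttr {
        String term = "";
    }

    /** Stand-in for a token stream: each incrementToken() overwrites the shared attribute. */
    static final class ListTokenStream {
        final TermAttr termAttr = new TermAttr();
        private final Iterator<String> terms;

        ListTokenStream(List<String> terms) {
            this.terms = terms.iterator();
        }

        boolean incrementToken() {
            if (!terms.hasNext()) {
                return false;
            }
            termAttr.term = terms.next();
            return true;
        }
    }

    /** Stand-in scorer: init() stores the attribute; getTokenScore() is passed no Token. */
    static final class TargetTermScorer {
        private final String target;
        private TermAttr termAttr;

        TargetTermScorer(String target) {
            this.target = target;
        }

        void init(ListTokenStream stream) {
            // Keep a local reference to the attribute we care about.
            this.termAttr = stream.termAttr;
        }

        float getTokenScore() {
            // Reads the current token's state through the stored attribute.
            return target.equals(termAttr.term) ? 1.0f : 0.0f;
        }
    }

    /** Drives the stream and sums the per-token scores. */
    static float totalScore(List<String> terms, String target) {
        ListTokenStream stream = new ListTokenStream(terms);
        TargetTermScorer scorer = new TargetTermScorer(target);
        scorer.init(stream);
        float total = 0f;
        while (stream.incrementToken()) {
            total += scorer.getTokenScore();
        }
        return total;
    }

    public static void main(String[] args) {
        System.out.println(totalScore(Arrays.asList("fast", "index", "fast"), "fast"));
    }
}
```

The point of the design is that the consumer never sees a per-token object; it
holds one attribute reference whose contents change as the stream advances.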

Bug fixes

 1. LUCENE-1423: InstantiatedTermEnum#skipTo(Term) throws ArrayIndexOutOfBoundsException
    on empty index. (Karl Wettin)

 2. LUCENE-1462: InstantiatedIndexWriter did not reset pre-analyzed TokenStreams the
    same way IndexWriter does. Parts of InstantiatedIndex were not Serializable.
    (Karl Wettin)

 3. LUCENE-1510: InstantiatedIndexReader#norms methods throw NullPointerException on
    empty index. (Karl Wettin, Robert Newson)

 4. LUCENE-1514: ShingleMatrixFilter#next(Token) easily throws a StackOverflowError
    due to recursive invocation. (Karl Wettin)

 5. LUCENE-1548: Fix distance normalization in LevenshteinDistance to
    not produce negative distances. (Thomas Morton via Mike McCandless)

 6. LUCENE-1490: Fix latin1 conversion of HALFWIDTH_AND_FULLWIDTH_FORMS
    characters to only apply to the correct subset. (Daniel Cheng via
    Mike McCandless)

 7. LUCENE-1576: Fix BrazilianAnalyzer to downcase tokens after
    StandardTokenizer so that stop words with mixed case are filtered
    out. (Rafael Cunha de Almeida, Douglas Campos via Mike McCandless)

 8. LUCENE-1491: EdgeNGramTokenFilter no longer stops on tokens shorter than
    the minimum n-gram size. (Todd Teak via Otis Gospodnetic)

 9. LUCENE-1683: Fixed JavaUtilRegexCapabilities (an impl used by
    RegexQuery) to use Matcher.matches() not Matcher.lookingAt() so
    that the regexp must match the entire string, not just a prefix.
    (Trejkaz via Mike McCandless)

10. LUCENE-1792: Fix new query parser to set rewrite method for
    multi-term queries. (Luis Alves, Mike McCandless via Michael Busch)

11. LUCENE-1828: Fix memory index to call TokenStream.reset() and
    TokenStream.end(). (Tim Smith via Michael Busch)
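
The difference behind LUCENE-1683 can be seen with java.util.regex alone. The
sketch below is not the JavaUtilRegexCapabilities code; the class and method
names are made up for illustration, and only Pattern, Matcher.matches() and
Matcher.lookingAt() are real JDK calls.

```java
import java.util.regex.Pattern;

public class RegexMatchDemo {
    /** True only if the whole input matches the pattern (the behavior after the fix). */
    static boolean matchesWhole(String regex, String input) {
        return Pattern.compile(regex).matcher(input).matches();
    }

    /** True if a prefix of the input matches the pattern (the behavior before the fix). */
    static boolean matchesPrefix(String regex, String input) {
        return Pattern.compile(regex).matcher(input).lookingAt();
    }

    public static void main(String[] args) {
        String regex = "luc";
        String term = "lucene";
        // lookingAt() accepts "lucene" because it starts with "luc".
        System.out.println("lookingAt: " + matchesPrefix(regex, term));
        // matches() rejects it because "ene" is left unmatched.
        System.out.println("matches:   " + matchesWhole(regex, term));
    }
}
```

With whole-string matching, a query that should hit every term starting with
"luc" has to say so explicitly, for example "luc.*".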

New features

 1. LUCENE-1531: Added support for BoostingTermQuery to XML query parser. (Karl Wettin)

 2. LUCENE-1435: Added contrib/collation, a CollationKeyFilter
    allowing you to convert tokens into CollationKeys encoded using
    IndexableBinaryStringTools. This allows for faster RangeQuery when
    a field needs to use a custom Collator. (Steven Rowe via Mike
    McCandless)

 3. LUCENE-1591: EnWikiDocMaker, LineDocMaker, WriteLineDoc can now
    read/write bz2 using the Apache Commons Compress library. This means
    you can download the .bz2 export from http://wikipedia.org and
    immediately index it. (Shai Erera via Mike McCandless)

 4. LUCENE-1629: Add SmartChineseAnalyzer to contrib/analyzers. It
    improves on CJKAnalyzer and ChineseAnalyzer by handling Chinese
    sentences properly. SmartChineseAnalyzer uses a Hidden Markov
    Model to tokenize Chinese words in a more intelligent way.
    (Xiaoping Gao via Mike McCandless)

 5. LUCENE-1676: Added DelimitedPayloadTokenFilter class for automatically
    adding payloads "in-stream". (Grant Ingersoll)

 6. LUCENE-1578: Support for loading unoptimized readers to the
    constructor of InstantiatedIndex. (Karl Wettin)

 7. LUCENE-1704: Allow specifying the Tidy configuration file when
    parsing HTML docs with contrib/ant. (Keith Sprochi via Mike
    McCandless)

 8. LUCENE-1522: Added contrib/fast-vector-highlighter, a new alternative
    highlighter. (Koji Sekiguchi via Mike McCandless)

 9. LUCENE-1740: Added "analyzer" command to Lucli, enabling changing
    the analyzer from the default StandardAnalyzer. (Bernd Fondermann
    via Mike McCandless)

10. LUCENE-1272: Add get/setBoost to MoreLikeThis. (Jonathan
    Leibiusky via Mike McCandless)

11. LUCENE-1745: Added constructors to JakartaRegexpCapabilities and
    JavaUtilRegexCapabilities as well as static flags to support
    configuring a RegexCapabilities implementation with the
    implementation-specific modifier flags. This lets callers
    customize the RegexQuery using the implementation-specific options
    and fine-tune how regular expressions are compiled and
    matched. (Marc Zampetti zampettim@aim.com via Mike McCandless)

12. LUCENE-1567: Added a new QueryParser framework that allows
    implementing a new query syntax in a flexible and efficient way.
    This new QueryParser will be moved to Lucene's core in release
    3.0 and will then replace the current core QueryParser, which
    has been deprecated with this patch.
    (Luis Alves and Adriano Campos via Michael Busch)

13. LUCENE-1486: Added ComplexPhraseQueryParser, an extension of QueryParser
    that allows a subset of the Lucene query language to be embedded in
    PhraseQuerys. Wildcard, Range, and Fuzzy queries, as well as limited
    boolean logic, can be used within quote operators with this parser, e.g.:
    "(jo* -john) smyth~". (Mark Harwood via Mark Miller)

14. Added web-based demo of functionality in contrib's XML Query Parser,
    packaged as a WAR file. (Mark Harwood)

15. LUCENE-1406: Added Arabic analyzer. (Robert Muir via Grant Ingersoll)

16. LUCENE-1628: Added Persian analyzer. (Robert Muir)

17. LUCENE-1813: Add option to ReverseStringFilter to mark reversed tokens.
    (Andrzej Bialecki via Robert Muir)
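
The idea behind LUCENE-1435 (item 2 above) is that a collation key can be
compared byte by byte and still give the Collator's ordering, so range
comparisons no longer need to call the Collator per term. The sketch below
uses only java.text.Collator and CollationKey from the JDK. It is not the
CollationKeyFilter code: the class and helper names are assumptions made for
illustration, and the IndexableBinaryStringTools encoding step is left out.

```java
import java.text.Collator;
import java.util.Locale;

public class CollationKeyDemo {
    /** Returns the collation key bytes for a term under the given Collator. */
    static byte[] keyBytes(Collator collator, String term) {
        return collator.getCollationKey(term).toByteArray();
    }

    /** Compares two byte arrays as unsigned bytes, most significant byte first. */
    static int compareUnsigned(byte[] a, byte[] b) {
        int n = Math.min(a.length, b.length);
        for (int i = 0; i < n; i++) {
            int diff = (a[i] & 0xff) - (b[i] & 0xff);
            if (diff != 0) {
                return diff;
            }
        }
        return a.length - b.length;
    }

    public static void main(String[] args) {
        Collator collator = Collator.getInstance(Locale.ENGLISH);
        String x = "apple";
        String y = "Banana";
        // Plain code-unit order puts the uppercase "B" before the lowercase "a".
        System.out.println("String.compareTo > 0: " + (x.compareTo(y) > 0));
        // The Collator and the raw key bytes both put "apple" first.
        System.out.println("Collator < 0:         " + (collator.compare(x, y) < 0));
        System.out.println("key bytes < 0:        "
            + (compareUnsigned(keyBytes(collator, x), keyBytes(collator, y)) < 0));
    }
}
```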

Optimizations

 1. LUCENE-1643: Re-use the collation key (RawCollationKey) for
    better performance, in ICUCollationKeyFilter. (Robert Muir via
    Mike McCandless)

 2. LUCENE-1794: Implement TokenStream reuse for contrib Analyzers,
    and implement reset() for TokenStreams to support reuse. (Robert Muir)

Documentation

 (None)

Build

 (None)

Test Cases

 (None)

======================= Release 2.4.0 2008-10-06 =======================

Changes in runtime behavior

 (None)

API Changes

 (None)

Bug fixes

 1. LUCENE-1312: Added full support for InstantiatedIndexReader#getFieldNames()
    and tests that assert that deleted documents behave as they should (they did).
    (Jason Rutherglen, Karl Wettin)

 2. LUCENE-1318: InstantiatedIndexReader.norms(String, byte[], int) didn't handle
    the array offset correctly. (Jason Rutherglen via Karl Wettin)

New features

 1. LUCENE-1320: ShingleMatrixFilter, multidimensional shingle token filter. (Karl Wettin)

 2. LUCENE-1142: Updated the Snowball package to org.tartarus distribution revision 500.
    This introduces Hungarian, Turkish and Romanian support, updates older stemmers,
    and optimizes SnowballFilter (now reflectionless).
    IMPORTANT NOTICE ON BACKWARDS COMPATIBILITY: an index created using 2.3.2 (or older)
    might not be compatible with these updated classes as some algorithms have changed.
    (Karl Wettin)

 3. LUCENE-1016: TermVectorAccessor, transparent vector space access via stored vectors
    or by resolving the inverted index. (Karl Wettin)

Documentation

 (None)

Build

 (None)

Test Cases

 (None)