Lucene contrib change Log

======================= Trunk (not yet released) =======================

Changes in runtime behavior

API Changes

* LUCENE-1936: Deprecated RussianLowerCaseFilter, because it transforms
  text exactly the same as LowerCaseFilter. Please use LowerCaseFilter
  instead, which has the same functionality. (Robert Muir)

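As a rough illustration of the LUCENE-1936 note, general-purpose Unicode lowercasing already covers Cyrillic text. The sketch below uses only the JDK; LowerCaseSketch is a hypothetical helper, not Lucene code, and it assumes per-character lowercasing is all that is needed.

```java
final class LowerCaseSketch {
    // Lower-cases each UTF-16 code unit with the JDK's Unicode-aware mapping.
    // Hypothetical helper for illustration only; not part of Lucene.
    static String lowerCase(String term) {
        char[] buf = term.toCharArray();
        for (int i = 0; i < buf.length; i++) {
            buf[i] = Character.toLowerCase(buf[i]);
        }
        return new String(buf);
    }
}
```

Calling lowerCase on an upper-case Russian word yields the lower-case word, with no language-specific code involved.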
Bug fixes

* LUCENE-1781: Fixed various issues with the lat/lng bounding box
  distance filter created for radius search in contrib/spatial.
  (Bill Bell via Mike McCandless)

New features

* LUCENE-1924: Added BalancedSegmentMergePolicy to contrib/misc,
  which is a merge policy that tries to avoid doing very large
  segment merges to give better search performance in a mixed
  indexing/searching environment. (John Wang via Mike McCandless)

Optimizations

Documentation

Build

Test Cases

======================= Release 2.9.0 2009-09-23 =======================

Changes in runtime behavior

* LUCENE-1505: Local Lucene now uses org.apache.lucene.util.NumericUtils for all
  number conversion. You'll need to fully re-index any previously created indexes.
  This isn't a break in back-compatibility because Local Lucene has not yet
  been released. (Mike McCandless)

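The re-indexing requirement in LUCENE-1505 follows from the indexed terms themselves changing when the number encoding changes. The sketch below shows the general idea of an order-preserving encoding of a double as a long, using only the JDK. SortableDoubleSketch is a hypothetical helper and is not claimed to be the NumericUtils implementation.

```java
final class SortableDoubleSketch {
    // Maps a double to a long whose signed ordering matches the numeric
    // ordering of the doubles. Hypothetical helper, for illustration only.
    static long toSortableLong(double value) {
        long bits = Double.doubleToLongBits(value);
        // Negative doubles have the sign bit set; flipping the other 63 bits
        // makes larger magnitudes sort lower, as required for negatives.
        if (bits < 0) {
            bits ^= 0x7fffffffffffffffL;
        }
        return bits;
    }
}
```

Any index built with a different encoding holds different term values for the same numbers, so the two cannot be mixed.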
* LUCENE-1758: ArabicAnalyzer now uses the light10 algorithm, has a refined
  default stopword list, and lowercases non-Arabic text.
  You'll need to fully re-index any previously created indexes. This isn't a
  break in back-compatibility because ArabicAnalyzer has not yet been
  released. (Robert Muir)

API Changes

* LUCENE-1695: Update the Highlighter to use the new TokenStream API. This issue
  breaks backwards compatibility with some public classes. If you have implemented
  custom Fragmenters or Scorers, you will need to adjust them to work with the new
  TokenStream API. Rather than being passed one Token at a time, you are given a
  TokenStream to initialize your implementation with. Store the Attributes you are
  interested in locally and access them on each call to the method that used to
  receive a new Token. See the updated implementations included with the
  Highlighter for examples. (Mark Miller)

* LUCENE-1460: Change contrib TokenStreams/Filters to use the new
  TokenStream API. (Robert Muir, Michael Busch)

* LUCENE-1775, LUCENE-1903: Change remaining TokenFilters (shingle, prefix-suffix)
  to use the new TokenStream API. ShingleFilter is now much more efficient:
  it clones much less often and computes the tokens mostly on the fly.
  Also added more tests. (Robert Muir, Michael Busch, Uwe Schindler, Chris Harris)

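For readers unfamiliar with the term used in LUCENE-1775, a shingle is a word n-gram built from adjacent tokens. The following is a minimal plain-Java sketch of that idea. ShingleSketch is a hypothetical helper that returns only shingles of one fixed size; it does not use the TokenStream API and says nothing about how ShingleFilter is implemented.

```java
import java.util.ArrayList;
import java.util.List;

final class ShingleSketch {
    // Joins each run of `size` adjacent tokens into one space-separated shingle.
    // Hypothetical helper for illustration only; not part of Lucene.
    static List<String> shingles(List<String> tokens, int size) {
        List<String> out = new ArrayList<>();
        for (int start = 0; start + size <= tokens.size(); start++) {
            out.add(String.join(" ", tokens.subList(start, start + size)));
        }
        return out;
    }
}
```

With size 2, the tokens "please", "divide", "this" produce the shingles "please divide" and "divide this".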
* LUCENE-1685: The position-aware SpanScorer has become the default scorer
  for Highlighting. The SpanScorer implementation has replaced QueryScorer
  and the old term highlighting QueryScorer has been renamed to
  QueryTermScorer. Multi-term queries are also now expanded by default. If
  you were previously rewriting the query for multi-term query highlighting,
  you should no longer do that (unless you switch to using QueryTermScorer).
  The SpanScorer API (now QueryScorer) has also been improved to more closely
  match the API of the previous QueryScorer implementation. (Mark Miller)

* LUCENE-1793: Deprecate the custom encoding support in the Greek and Russian
  Analyzers. If you need to index text in these encodings, please use Java's
  character set conversion facilities (InputStreamReader, etc.) during I/O,
  so that Lucene can analyze this text as Unicode instead. (Robert Muir)

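The conversion recommended in LUCENE-1793 can be sketched with the JDK alone: decode the legacy bytes into Unicode text while reading, then hand the resulting text to the analyzer. LegacyEncodingSketch and its decode method are hypothetical names, and the example assumes the named charset (for instance KOI8-R) is available in the running JVM.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.Reader;
import java.io.UncheckedIOException;
import java.nio.charset.Charset;

final class LegacyEncodingSketch {
    // Decodes bytes in a legacy encoding into a Unicode String during I/O.
    // Hypothetical helper for illustration only; not part of Lucene.
    static String decode(byte[] raw, String charsetName) {
        Charset charset = Charset.forName(charsetName);
        StringBuilder text = new StringBuilder();
        try (Reader reader = new InputStreamReader(new ByteArrayInputStream(raw), charset)) {
            char[] buf = new char[1024];
            int n;
            while ((n = reader.read(buf)) != -1) {
                text.append(buf, 0, n);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return text.toString();
    }
}
```

In a real indexing pipeline the InputStreamReader would wrap a file or network stream rather than a byte array.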
Bug fixes

* LUCENE-1423: InstantiatedTermEnum#skipTo(Term) threw
  ArrayIndexOutOfBoundsException on an empty index.
  (Karl Wettin)

* LUCENE-1462: InstantiatedIndexWriter did not reset pre-analyzed TokenStreams the
  same way IndexWriter does. Parts of InstantiatedIndex were not Serializable.
  (Karl Wettin)

* LUCENE-1510: InstantiatedIndexReader#norms methods threw NullPointerException
  on an empty index.
  (Karl Wettin, Robert Newson)

* LUCENE-1514: ShingleMatrixFilter#next(Token) could easily throw a
  StackOverflowError due to recursive invocation. (Karl Wettin)

* LUCENE-1548: Fix distance normalization in LevenshteinDistance to
  not produce negative distances (Thomas Morton via Mike McCandless)

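To illustrate the kind of normalization LUCENE-1548 refers to, the sketch below computes an edit distance and scales it into the range [0, 1] by dividing by the length of the longer string. Dividing by the shorter length instead could push the result below zero; whether that was the exact cause of the original bug is an assumption here. EditDistanceSketch is a hypothetical class, not the contrib implementation.

```java
final class EditDistanceSketch {
    // Classic two-row dynamic-programming Levenshtein distance.
    static int distance(String a, String b) {
        int[] prev = new int[b.length() + 1];
        int[] curr = new int[b.length() + 1];
        for (int j = 0; j <= b.length(); j++) {
            prev[j] = j;
        }
        for (int i = 1; i <= a.length(); i++) {
            curr[0] = i;
            for (int j = 1; j <= b.length(); j++) {
                int cost = a.charAt(i - 1) == b.charAt(j - 1) ? 0 : 1;
                curr[j] = Math.min(Math.min(curr[j - 1] + 1, prev[j] + 1), prev[j - 1] + cost);
            }
            int[] tmp = prev;
            prev = curr;
            curr = tmp;
        }
        return prev[b.length()];
    }

    // Similarity in [0, 1]: the distance never exceeds the longer length,
    // so normalizing by it cannot yield a negative value.
    static float similarity(String a, String b) {
        int longest = Math.max(a.length(), b.length());
        if (longest == 0) {
            return 1.0f;
        }
        return 1.0f - (float) distance(a, b) / longest;
    }
}
```

For "a" and "xyz" the distance is 3, so the similarity is 0 rather than the negative value that a division by the shorter length would give.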
* LUCENE-1490: Fix latin1 conversion of HALFWIDTH_AND_FULLWIDTH_FORMS
  characters to only apply to the correct subset (Daniel Cheng via
  Mike McCandless)

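A sketch of the restriction described in LUCENE-1490: within the Halfwidth and Fullwidth Forms block, only the fullwidth ASCII variants U+FF01 to U+FF5E have a Latin counterpart, reached by subtracting 0xFEE0. Other characters in the block, such as halfwidth katakana, must be left alone. FullwidthFoldSketch is a hypothetical helper and is not the tokenizer code that was patched.

```java
final class FullwidthFoldSketch {
    // Folds fullwidth ASCII variants (U+FF01..U+FF5E) to plain ASCII and
    // leaves every other character untouched.
    // Hypothetical helper for illustration only; not part of Lucene.
    static String fold(String text) {
        StringBuilder out = new StringBuilder(text.length());
        for (int i = 0; i < text.length(); i++) {
            char c = text.charAt(i);
            if (c >= 0xFF01 && c <= 0xFF5E) {
                c = (char) (c - 0xFEE0);
            }
            out.append(c);
        }
        return out.toString();
    }
}
```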
* LUCENE-1576: Fix BrazilianAnalyzer to downcase tokens after
  StandardTokenizer so that stop words with mixed case are filtered
  out. (Rafael Cunha de Almeida, Douglas Campos via Mike McCandless)

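The ordering issue behind LUCENE-1576 can be shown without any analyzer classes: if the stop list is lower-case, tokens must be lower-cased before the stop-word check, or "De" slips through. StopWordOrderSketch and its two-word stop list are hypothetical and only illustrate the order of the two steps.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Locale;
import java.util.Set;

final class StopWordOrderSketch {
    // Tiny illustrative stop list; the real analyzer's list is much larger.
    static final Set<String> STOP_WORDS = new HashSet<>(Arrays.asList("de", "para"));

    // Lower-cases each token first, then drops it if it is a stop word.
    static List<String> analyze(List<String> tokens) {
        List<String> kept = new ArrayList<>();
        for (String token : tokens) {
            String lower = token.toLowerCase(Locale.ROOT);
            if (!STOP_WORDS.contains(lower)) {
                kept.add(lower);
            }
        }
        return kept;
    }
}
```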
* LUCENE-1491: EdgeNGramTokenFilter no longer stops on tokens shorter than
  minimum n-gram size.
  (Todd Teak via Otis Gospodnetic)

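The behavior described in LUCENE-1491 amounts to skipping a too-short token and continuing with the next one, instead of ending the stream. A minimal sketch over a plain list of strings follows. EdgeNGramSketch is a hypothetical helper that produces front-edge n-grams only and does not use the TokenStream API.

```java
import java.util.ArrayList;
import java.util.List;

final class EdgeNGramSketch {
    // Emits the leading n-grams of each token for n in [minGram, maxGram].
    // Hypothetical helper for illustration only; not part of Lucene.
    static List<String> frontEdgeNGrams(List<String> tokens, int minGram, int maxGram) {
        List<String> grams = new ArrayList<>();
        for (String token : tokens) {
            if (token.length() < minGram) {
                continue; // skip the short token, but keep consuming input
            }
            int longest = Math.min(maxGram, token.length());
            for (int n = minGram; n <= longest; n++) {
                grams.add(token.substring(0, n));
            }
        }
        return grams;
    }
}
```

With minGram 3 and maxGram 4, the input "ab", "lucene" still yields "luc" and "luce" even though the first token is too short.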
* LUCENE-1683: Fixed JavaUtilRegexCapabilities (an impl used by
  RegexQuery) to use Matcher.matches() not Matcher.lookingAt() so
  that the regexp must match the entire string, not just a prefix.
  (Trejkaz via Mike McCandless)

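The difference between the two java.util.regex calls named in LUCENE-1683 can be seen directly with the JDK. RegexMatchSketch is a hypothetical wrapper; only Pattern and Matcher are real APIs here.

```java
import java.util.regex.Pattern;

final class RegexMatchSketch {
    // True only if the regex matches the entire term.
    static boolean matchesWholeTerm(String regex, String term) {
        return Pattern.compile(regex).matcher(term).matches();
    }

    // True if the regex matches some prefix of the term.
    static boolean matchesPrefix(String regex, String term) {
        return Pattern.compile(regex).matcher(term).lookingAt();
    }
}
```

The regex "luc" matches a prefix of "lucene" but not the whole term, so after the fix a query for "luc" would need to be written as "luc.*" to match it.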
* LUCENE-1792: Fix new query parser to set rewrite method for
  multi-term queries. (Luis Alves, Mike McCandless via Michael Busch)

* LUCENE-1828: Fix memory index to call TokenStream.reset() and
  TokenStream.end(). (Tim Smith via Michael Busch)

* LUCENE-1912: Fix fast-vector-highlighter issue when two or more
  terms are concatenated (Koji Sekiguchi via Mike McCandless)

New features

* LUCENE-1531: Added support for BoostingTermQuery to XML query parser. (Karl Wettin)

* LUCENE-1435: Added contrib/collation, a CollationKeyFilter
  allowing you to convert tokens into CollationKeys encoded using
  IndexableBinaryStringTools. This allows for faster RangeQuery when
  a field needs to use a custom Collator. (Steven Rowe via Mike
  McCandless)

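The JDK side of LUCENE-1435 can be sketched as follows: a Collator turns each string into a CollationKey, and comparing keys follows the locale's ordering rather than raw character codes. CollationKeySketch is a hypothetical helper, and the encoding step with IndexableBinaryStringTools is deliberately not shown.

```java
import java.text.CollationKey;
import java.text.Collator;
import java.util.Locale;

final class CollationKeySketch {
    // Compares two strings through precomputed collation keys for a locale.
    // Hypothetical helper for illustration only; not part of Lucene.
    static int compareWithCollator(String a, String b, Locale locale) {
        Collator collator = Collator.getInstance(locale);
        CollationKey keyA = collator.getCollationKey(a);
        CollationKey keyB = collator.getCollationKey(b);
        return keyA.compareTo(keyB);
    }
}
```

Under English collation "apple" sorts before "Banana", whereas String.compareTo puts "Banana" first because upper-case letters have lower code points.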
* LUCENE-1591: EnWikiDocMaker, LineDocMaker, WriteLineDoc can now
  read/write bz2 using Apache Commons Compress library. This means
  you can download the .bz2 export from http://wikipedia.org and
  immediately index it. (Shai Erera via Mike McCandless)

* LUCENE-1629: Add SmartChineseAnalyzer to contrib/analyzers. It
  improves on CJKAnalyzer and ChineseAnalyzer by handling Chinese
  sentences properly. SmartChineseAnalyzer uses a Hidden Markov
  Model to tokenize Chinese words in a more intelligent way.
  (Xiaoping Gao via Mike McCandless)

* LUCENE-1676: Added DelimitedPayloadTokenFilter class for automatically
  adding payloads "in-stream" (Grant Ingersoll)

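The "in-stream" idea in LUCENE-1676 is that a payload travels inside the token text and is split off at a delimiter. The sketch below shows only that splitting step on plain strings. DelimitedPayloadSketch is a hypothetical helper, the '|' delimiter in the usage is an assumption, and the real filter additionally encodes the payload into bytes.

```java
final class DelimitedPayloadSketch {
    // Returns the term part of "term<delimiter>payload", or the whole token
    // when no delimiter is present.
    static String term(String token, char delimiter) {
        int at = token.indexOf(delimiter);
        return at < 0 ? token : token.substring(0, at);
    }

    // Returns the payload part as text, or null when the token has none.
    static String payload(String token, char delimiter) {
        int at = token.indexOf(delimiter);
        return at < 0 ? null : token.substring(at + 1);
    }
}
```

For the token "fox|2.0" with delimiter '|', the term is "fox" and the payload text is "2.0".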
* LUCENE-1578: Added support for passing unoptimized readers to the
  constructor of InstantiatedIndex. (Karl Wettin)

* LUCENE-1704: Allow specifying the Tidy configuration file when
  parsing HTML docs with contrib/ant. (Keith Sprochi via Mike
  McCandless)

* LUCENE-1522: Added contrib/fast-vector-highlighter, a new alternative
  highlighter. (Koji Sekiguchi via Mike McCandless)

* LUCENE-1740: Added "analyzer" command to Lucli, enabling changing
  the analyzer from the default StandardAnalyzer. (Bernd Fondermann
  via Mike McCandless)

* LUCENE-1272: Add get/setBoost to MoreLikeThis. (Jonathan
  Leibiusky via Mike McCandless)

* LUCENE-1745: Added constructors to JakartaRegexpCapabilities and
  JavaUtilRegexCapabilities as well as static flags to support
  configuring a RegexCapabilities implementation with the
  implementation-specific modifier flags. This allows callers to
  customize the RegexQuery using the implementation-specific options
  and fine-tune how regular expressions are compiled and
  matched. (Marc Zampetti zampettim@aim.com via Mike McCandless)

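For the java.util.regex case in LUCENE-1745, modifier flags change how a pattern is compiled and matched. The sketch below shows that effect with the JDK only. RegexFlagsSketch is a hypothetical wrapper, and how the flags are handed to the RegexCapabilities constructors is not shown because this changelog does not spell it out.

```java
import java.util.regex.Pattern;

final class RegexFlagsSketch {
    // Compiles the regex with the given java.util.regex flags and requires
    // a match of the whole term.
    static boolean matches(String regex, int flags, String term) {
        return Pattern.compile(regex, flags).matcher(term).matches();
    }
}
```

With flags set to 0 the regex "lucene" does not match "LUCENE"; with Pattern.CASE_INSENSITIVE it does.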
* LUCENE-1567: Added a new QueryParser framework that allows
  implementing a new query syntax in a flexible and efficient way.
  This new QueryParser will be moved to Lucene's core in release
  3.0 and will then replace the current core QueryParser, which
  has been deprecated with this patch.
  (Luis Alves and Adriano Campos via Michael Busch)

* LUCENE-1486: Added ComplexPhraseQueryParser, an extension of QueryParser
  that allows a subset of the Lucene query language to be embedded in
  PhraseQuerys. Wildcard, Range, and Fuzzy queries, as well as limited
  boolean logic, can be used within quote operators with this parser, e.g.:
  "(jo* -john) smyth~". (Mark Harwood via Mark Miller)

* Added web-based demo of functionality in contrib's XML Query Parser
  packaged as a WAR file (Mark Harwood)

* LUCENE-1406: Added Arabic analyzer. (Robert Muir via Grant Ingersoll)

* LUCENE-1628: Added Persian analyzer. (Robert Muir)

* LUCENE-1813: Add option to ReverseStringFilter to mark reversed tokens.
  (Andrzej Bialecki via Robert Muir)

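Marking reversed tokens, as in LUCENE-1813, lets reversed and unreversed forms share a field without colliding. A minimal sketch of the idea on plain strings follows. ReverseMarkSketch is a hypothetical helper, and both the marker character and its placement at the front of the reversed token are assumptions made for illustration.

```java
final class ReverseMarkSketch {
    // Reverses the token's characters.
    static String reverse(String token) {
        return new StringBuilder(token).reverse().toString();
    }

    // Reverses the token and prepends a marker character so the result
    // cannot be confused with an ordinary, unreversed token.
    static String reverseWithMarker(String token, char marker) {
        return marker + reverse(token);
    }
}
```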
Optimizations

* LUCENE-1643: Re-use the collation key (RawCollationKey) for
  better performance, in ICUCollationKeyFilter. (Robert Muir via
  Mike McCandless)

* LUCENE-1794: Implement TokenStream reuse for contrib Analyzers,
  and implement reset() for TokenStreams to support reuse. (Robert Muir)

Documentation

* LUCENE-1876: Added missing package level documentation for numerous
  contrib packages.
  (Steven Rowe & Robert Muir)

Build

* LUCENE-1728: Split contrib/analyzers into common and smartcn modules.
  Contrib/analyzers now builds an additional lucene-smartcn JAR file. The
  smartcn classes are no longer included in the lucene-analyzers JAR file.
  (Robert Muir via Simon Willnauer)

* LUCENE-1829: Fix contrib query parser to properly create javacc files.
  (Jan-Pascal and Luis Alves via Michael Busch)

Test Cases

======================= Release 2.4.0 2008-10-06 =======================

Changes in runtime behavior

(None)

API Changes

(None)

Bug fixes

1. LUCENE-1312: Added full support for InstantiatedIndexReader#getFieldNames()
   and tests that assert that deleted documents behave as they should (they did).
   (Jason Rutherglen, Karl Wettin)

2. LUCENE-1318: InstantiatedIndexReader.norms(String, byte[], int) did not handle
   the array offset correctly. (Jason Rutherglen via Karl Wettin)

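For the offset handling mentioned in LUCENE-1318, the expected contract of a "fill this array starting at offset" method can be sketched with System.arraycopy. NormsOffsetSketch is a hypothetical helper that only demonstrates honoring the offset; it is not the InstantiatedIndexReader code.

```java
final class NormsOffsetSketch {
    // Copies all source bytes into target, starting at target[offset],
    // and leaves the bytes before the offset untouched.
    static void copyNorms(byte[] source, byte[] target, int offset) {
        System.arraycopy(source, 0, target, offset, source.length);
    }
}
```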
New features

1. LUCENE-1320: ShingleMatrixFilter, multidimensional shingle token filter. (Karl Wettin)

2. LUCENE-1142: Updated Snowball package, org.tartarus distribution revision 500.
   Introducing Hungarian, Turkish and Romanian support, updated older stemmers
   and optimized (reflectionless) SnowballFilter.
   IMPORTANT NOTICE ON BACKWARDS COMPATIBILITY: an index created using 2.3.2 (or older)
   might not be compatible with these updated classes as some algorithms have changed.
   (Karl Wettin)

3. LUCENE-1016: TermVectorAccessor, transparent vector space access via stored vectors
   or by resolving the inverted index. (Karl Wettin)

Documentation

(None)

Build

(None)

Test Cases

(None)