sync CHANGEs for 3.1

git-svn-id: https://svn.apache.org/repos/asf/lucene/dev/trunk@1087056 13f79535-47bb-0310-9956-ffa450edef68
Grant Ingersoll 2011-03-30 19:37:38 +00:00
parent a4c7a88834
commit 9fdc41f0f8
3 changed files with 191 additions and 169 deletions

@ -393,7 +393,7 @@ Optimizations
* LUCENE-2990: ArrayUtil/CollectionUtil.*Sort() methods now exit early * LUCENE-2990: ArrayUtil/CollectionUtil.*Sort() methods now exit early
on empty or one-element lists/arrays. (Uwe Schindler) on empty or one-element lists/arrays. (Uwe Schindler)
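The early exit described in LUCENE-2990 can be illustrated with a minimal sketch. `EarlyExitSort` is a hypothetical helper written for this example, not Lucene's ArrayUtil or CollectionUtil, and the boolean return value exists only to make the fast path visible.

```java
import java.util.Collections;
import java.util.List;

// Hypothetical helper for illustration; not Lucene's ArrayUtil/CollectionUtil.
final class EarlyExitSort {
    private EarlyExitSort() {}

    /**
     * Sorts the list in place using natural ordering.
     * Returns false when the early-exit path was taken (0 or 1 elements),
     * true when an actual sort was performed.
     */
    static <T extends Comparable<? super T>> boolean sort(List<T> list) {
        if (list.size() <= 1) {
            return false; // already sorted by definition, skip all work
        }
        Collections.sort(list);
        return true;
    }
}
```

A side effect of returning early is that immutable empty or singleton lists are never handed to the sort routine.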
======================= Lucene 3.1 (not yet released) ======================= ======================= Lucene 3.1.0 =======================
Changes in backwards compatibility policy Changes in backwards compatibility policy
@ -409,7 +409,7 @@ Changes in backwards compatibility policy
* LUCENE-2190: Removed deprecated customScore() and customExplain() * LUCENE-2190: Removed deprecated customScore() and customExplain()
methods from experimental CustomScoreQuery. (Uwe Schindler) methods from experimental CustomScoreQuery. (Uwe Schindler)
* LUCENE-2286: Enabled DefaultSimilarity.setDiscountOverlaps by default. * LUCENE-2286: Enabled DefaultSimilarity.setDiscountOverlaps by default.
This means that terms with a position increment gap of zero do not This means that terms with a position increment gap of zero do not
affect the norms calculation by default. (Robert Muir) affect the norms calculation by default. (Robert Muir)
@ -447,10 +447,10 @@ Changes in backwards compatibility policy
actual file's length if the file exists, and throws FileNotFoundException actual file's length if the file exists, and throws FileNotFoundException
otherwise. Returning length=0 for a non-existent file is no longer allowed. If otherwise. Returning length=0 for a non-existent file is no longer allowed. If
you relied on that, make sure to catch the exception. (Shai Erera) you relied on that, make sure to catch the exception. (Shai Erera)
* LUCENE-2386: IndexWriter no longer performs an empty commit upon new index * LUCENE-2386: IndexWriter no longer performs an empty commit upon new index
creation. Previously, if you passed an empty Directory and set OpenMode to creation. Previously, if you passed an empty Directory and set OpenMode to
CREATE*, IndexWriter would make a first empty commit. If you need that CREATE*, IndexWriter would make a first empty commit. If you need that
behavior you can call writer.commit()/close() immediately after you create it. behavior you can call writer.commit()/close() immediately after you create it.
(Shai Erera, Mike McCandless) (Shai Erera, Mike McCandless)
@ -466,10 +466,10 @@ Changes in backwards compatibility policy
values in multi-valued field has been changed for some cases in index. values in multi-valued field has been changed for some cases in index.
If you index empty fields and use positions/offsets information on those If you index empty fields and use positions/offsets information on those
fields, reindexing is recommended. (David Smiley, Koji Sekiguchi) fields, reindexing is recommended. (David Smiley, Koji Sekiguchi)
* LUCENE-2804: Directory.setLockFactory now declares throwing an IOException. * LUCENE-2804: Directory.setLockFactory now declares throwing an IOException.
(Shai Erera, Robert Muir) (Shai Erera, Robert Muir)
* LUCENE-2837: Added deprecations noting that in 4.0, Searcher and * LUCENE-2837: Added deprecations noting that in 4.0, Searcher and
Searchable are collapsed into IndexSearcher; contrib/remote and Searchable are collapsed into IndexSearcher; contrib/remote and
MultiSearcher have been removed. (Mike McCandless) MultiSearcher have been removed. (Mike McCandless)
@ -496,7 +496,7 @@ Changes in runtime behavior
* LUCENE-2179: CharArraySet.clear() is now functional. * LUCENE-2179: CharArraySet.clear() is now functional.
(Robert Muir, Uwe Schindler) (Robert Muir, Uwe Schindler)
* LUCENE-2455: IndexWriter.addIndexes no longer optimizes the target index * LUCENE-2455: IndexWriter.addIndexes no longer optimizes the target index
before it adds the new ones. Also, the existing segments are not merged and so before it adds the new ones. Also, the existing segments are not merged and so
the index will not end up with a single segment (unless it was empty before). the index will not end up with a single segment (unless it was empty before).
In addition, addIndexesNoOptimize was renamed to addIndexes and no longer In addition, addIndexesNoOptimize was renamed to addIndexes and no longer
@ -515,9 +515,9 @@ Changes in runtime behavior
usage, allowing applications to accidentally open two writers on the usage, allowing applications to accidentally open two writers on the
same directory. (Mike McCandless) same directory. (Mike McCandless)
* LUCENE-2701: maxMergeMBForOptimize and maxMergeDocs constraints set on * LUCENE-2701: maxMergeMBForOptimize and maxMergeDocs constraints set on
LogMergePolicy now affect optimize() as well (as opposed to only regular LogMergePolicy now affect optimize() as well (as opposed to only regular
merges). This means that when you run optimize(), segments that are too large merges). This means that when you run optimize(), segments that are too large
won't be merged. (Shai Erera) won't be merged. (Shai Erera)
* LUCENE-2753: IndexReader and DirectoryReader .listCommits() now return a List, * LUCENE-2753: IndexReader and DirectoryReader .listCommits() now return a List,
@ -527,9 +527,9 @@ Changes in runtime behavior
the IndexSearcher search methods that take an int nDocs will now the IndexSearcher search methods that take an int nDocs will now
throw IllegalArgumentException if nDocs is 0. Instead, you should throw IllegalArgumentException if nDocs is 0. Instead, you should
use the newly added TotalHitCountCollector. (Mike McCandless) use the newly added TotalHitCountCollector. (Mike McCandless)
* LUCENE-2790: LogMergePolicy.useCompoundFile's logic now factors in noCFSRatio * LUCENE-2790: LogMergePolicy.useCompoundFile's logic now factors in noCFSRatio
to determine whether the passed in segment should be compound. to determine whether the passed in segment should be compound.
(Shai Erera, Earwin Burrfoot) (Shai Erera, Earwin Burrfoot)
* LUCENE-2805: IndexWriter now increments the index version on every change to * LUCENE-2805: IndexWriter now increments the index version on every change to
@ -549,7 +549,7 @@ Changes in runtime behavior
* LUCENE-2010: Segments with 100% deleted documents are now removed on * LUCENE-2010: Segments with 100% deleted documents are now removed on
IndexReader or IndexWriter commit. (Uwe Schindler, Mike McCandless) IndexReader or IndexWriter commit. (Uwe Schindler, Mike McCandless)
* LUCENE-2960: Allow some changes to IndexWriterConfig to take effect * LUCENE-2960: Allow some changes to IndexWriterConfig to take effect
"live" (after an IW is instantiated), via "live" (after an IW is instantiated), via
IndexWriter.getConfig().setXXX(...) (Shay Banon, Mike McCandless) IndexWriter.getConfig().setXXX(...) (Shay Banon, Mike McCandless)
@ -567,7 +567,7 @@ API Changes
* LUCENE-2103: NoLockFactory should have a private constructor; * LUCENE-2103: NoLockFactory should have a private constructor;
until Lucene 4.0 the default one will be deprecated. until Lucene 4.0 the default one will be deprecated.
(Shai Erera via Uwe Schindler) (Shai Erera via Uwe Schindler)
* LUCENE-2177: Deprecate the Field ctors that take byte[] and Store. * LUCENE-2177: Deprecate the Field ctors that take byte[] and Store.
Since the removal of compressed fields, Store can only be YES, so Since the removal of compressed fields, Store can only be YES, so
@ -587,30 +587,30 @@ API Changes
files are no longer open by IndexReaders. (luocanrao via Mike files are no longer open by IndexReaders. (luocanrao via Mike
McCandless) McCandless)
* LUCENE-2282: IndexFileNames is exposed as a public class allowing for easier * LUCENE-2282: IndexFileNames is exposed as a public class allowing for easier
use by external code. In addition it offers a matchExtension method which use by external code. In addition it offers a matchExtension method which
callers can use to query whether a certain file matches a certain extension. callers can use to query whether a certain file matches a certain extension.
(Shai Erera via Mike McCandless) (Shai Erera via Mike McCandless)
* LUCENE-124: Add a TopTermsBoostOnlyBooleanQueryRewrite to MultiTermQuery. * LUCENE-124: Add a TopTermsBoostOnlyBooleanQueryRewrite to MultiTermQuery.
This rewrite method is similar to TopTermsScoringBooleanQueryRewrite, but This rewrite method is similar to TopTermsScoringBooleanQueryRewrite, but
only scores terms by their boost values. For example, this can be used only scores terms by their boost values. For example, this can be used
with FuzzyQuery to ensure that exact matches are always scored higher, with FuzzyQuery to ensure that exact matches are always scored higher,
because only the boost will be used in scoring. (Robert Muir) because only the boost will be used in scoring. (Robert Muir)
* LUCENE-2015: Add a static method foldToASCII to ASCIIFoldingFilter to * LUCENE-2015: Add a static method foldToASCII to ASCIIFoldingFilter to
expose its folding logic. (Cédrik Lime via Robert Muir) expose its folding logic. (Cédrik Lime via Robert Muir)
* LUCENE-2294: IndexWriter constructors have been deprecated in favor of a * LUCENE-2294: IndexWriter constructors have been deprecated in favor of a
single ctor which accepts IndexWriterConfig and a Directory. You can set all single ctor which accepts IndexWriterConfig and a Directory. You can set all
the parameters related to IndexWriter on IndexWriterConfig. The different the parameters related to IndexWriter on IndexWriterConfig. The different
setter/getter methods were deprecated as well. One should call setter/getter methods were deprecated as well. One should call
writer.getConfig().getXYZ() to query for a parameter XYZ. writer.getConfig().getXYZ() to query for a parameter XYZ.
Additionally, the setter/getter related to MergePolicy were deprecated as Additionally, the setter/getter related to MergePolicy were deprecated as
well. One should interact with the MergePolicy directly. well. One should interact with the MergePolicy directly.
(Shai Erera via Mike McCandless) (Shai Erera via Mike McCandless)
* LUCENE-2320: IndexWriter's MergePolicy configuration was moved to * LUCENE-2320: IndexWriter's MergePolicy configuration was moved to
IndexWriterConfig and the respective methods on IndexWriter were deprecated. IndexWriterConfig and the respective methods on IndexWriter were deprecated.
(Shai Erera via Mike McCandless) (Shai Erera via Mike McCandless)
@ -634,14 +634,14 @@ API Changes
* LUCENE-2402: IndexWriter.deleteUnusedFiles now deletes unreferenced commit * LUCENE-2402: IndexWriter.deleteUnusedFiles now deletes unreferenced commit
points too. If you use an IndexDeletionPolicy which holds onto index commits points too. If you use an IndexDeletionPolicy which holds onto index commits
(such as SnapshotDeletionPolicy), you can call this method to remove those (such as SnapshotDeletionPolicy), you can call this method to remove those
commit points when they are not needed anymore (instead of waiting for the commit points when they are not needed anymore (instead of waiting for the
next commit). (Shai Erera) next commit). (Shai Erera)
* LUCENE-2481: SnapshotDeletionPolicy.snapshot() and release() were replaced * LUCENE-2481: SnapshotDeletionPolicy.snapshot() and release() were replaced
with equivalent ones that take a String (id) as argument. You can pass with equivalent ones that take a String (id) as argument. You can pass
whatever ID you want, as long as you use the same one when calling both. whatever ID you want, as long as you use the same one when calling both.
(Shai Erera) (Shai Erera)
* LUCENE-2356: Add IndexWriterConfig.set/getReaderTermIndexDivisor, to * LUCENE-2356: Add IndexWriterConfig.set/getReaderTermIndexDivisor, to
set what IndexWriter passes for termsIndexDivisor to the readers it set what IndexWriter passes for termsIndexDivisor to the readers it
opens internally when applying deletions or creating a near-real-time opens internally when applying deletions or creating a near-real-time
@ -651,7 +651,7 @@ API Changes
in common/standard/ now implement the Word Break rules from the Unicode 6.0.0 in common/standard/ now implement the Word Break rules from the Unicode 6.0.0
Text Segmentation algorithm (UAX#29), covering the full range of Unicode code Text Segmentation algorithm (UAX#29), covering the full range of Unicode code
points, including values from U+FFFF to U+10FFFF. points, including values from U+FFFF to U+10FFFF.
ClassicTokenizer/Analyzer retains the old (pre-Lucene 3.1) StandardTokenizer/ ClassicTokenizer/Analyzer retains the old (pre-Lucene 3.1) StandardTokenizer/
Analyzer implementation and behavior. Only the Unicode Basic Multilingual Analyzer implementation and behavior. Only the Unicode Basic Multilingual
Plane (code points from U+0000 to U+FFFF) is covered. Plane (code points from U+0000 to U+FFFF) is covered.
@ -659,16 +659,16 @@ API Changes
UAX29URLEmailTokenizer tokenizes URLs and E-mail addresses according to the UAX29URLEmailTokenizer tokenizes URLs and E-mail addresses according to the
relevant RFCs, in addition to implementing the UAX#29 Word Break rules. relevant RFCs, in addition to implementing the UAX#29 Word Break rules.
(Steven Rowe, Robert Muir, Uwe Schindler) (Steven Rowe, Robert Muir, Uwe Schindler)
* LUCENE-2778: RAMDirectory now exposes newRAMFile(), which subclasses can override * LUCENE-2778: RAMDirectory now exposes newRAMFile(), which subclasses can override
to return a different RAMFile implementation. (Shai Erera) to return a different RAMFile implementation. (Shai Erera)
* LUCENE-2785: Added TotalHitCountCollector whose sole purpose is to * LUCENE-2785: Added TotalHitCountCollector whose sole purpose is to
count the number of hits matching the query. (Mike McCandless) count the number of hits matching the query. (Mike McCandless)
* LUCENE-2846: Deprecated IndexReader.setNorm(int, String, float). This method * LUCENE-2846: Deprecated IndexReader.setNorm(int, String, float). This method
is only syntactic sugar for setNorm(int, String, byte), but using the global is only syntactic sugar for setNorm(int, String, byte), but using the global
Similarity.getDefault().encodeNormValue(). Use the byte-based method instead Similarity.getDefault().encodeNormValue(). Use the byte-based method instead
to ensure that the norm is encoded with your Similarity. to ensure that the norm is encoded with your Similarity.
(Robert Muir, Mike McCandless) (Robert Muir, Mike McCandless)
@ -689,6 +689,9 @@ API Changes
for AttributeImpls, but can still be provided (if needed). for AttributeImpls, but can still be provided (if needed).
(Uwe Schindler) (Uwe Schindler)
* LUCENE-2691: Deprecate IndexWriter.getReader in favor of
IndexReader.open(IndexWriter) (Grant Ingersoll, Mike McCandless)
* LUCENE-2876: Deprecated Scorer.getSimilarity(). If your Scorer uses a Similarity, * LUCENE-2876: Deprecated Scorer.getSimilarity(). If your Scorer uses a Similarity,
it should keep it itself. Fixed Scorers to pass their parent Weight, so that it should keep it itself. Fixed Scorers to pass their parent Weight, so that
Scorer.visitSubScorers (LUCENE-2590) will work correctly. Scorer.visitSubScorers (LUCENE-2590) will work correctly.
@ -700,7 +703,7 @@ API Changes
expert use cases can handle seeing deleted documents returned. The expert use cases can handle seeing deleted documents returned. The
deletes remain buffered so that the next time you open an NRT reader deletes remain buffered so that the next time you open an NRT reader
and pass true, all deletes will be applied. (Mike McCandless) and pass true, all deletes will be applied. (Mike McCandless)
* LUCENE-1253: LengthFilter (and Solr's KeepWordTokenFilter) now * LUCENE-1253: LengthFilter (and Solr's KeepWordTokenFilter) now
require up front specification of enablePositionIncrement. Together with require up front specification of enablePositionIncrement. Together with
StopFilter they have a common base class (FilteringTokenFilter) that handles StopFilter they have a common base class (FilteringTokenFilter) that handles
@ -711,7 +714,7 @@ Bug fixes
* LUCENE-2249: ParallelMultiSearcher should shut down thread pool on * LUCENE-2249: ParallelMultiSearcher should shut down thread pool on
close. (Martin Traverso via Uwe Schindler) close. (Martin Traverso via Uwe Schindler)
* LUCENE-2273: FieldCacheImpl.getCacheEntries() used WeakHashMap * LUCENE-2273: FieldCacheImpl.getCacheEntries() used WeakHashMap
incorrectly and led to ConcurrentModificationException. incorrectly and led to ConcurrentModificationException.
(Uwe Schindler, Robert Muir) (Uwe Schindler, Robert Muir)
@ -722,7 +725,7 @@ Bug fixes
* LUCENE-2074: Reduce buffer size of lexer back to default on reset. * LUCENE-2074: Reduce buffer size of lexer back to default on reset.
(Ruben Laguna, Shai Erera via Uwe Schindler) (Ruben Laguna, Shai Erera via Uwe Schindler)
* LUCENE-2496: Don't throw NPE if IndexWriter is opened with CREATE on * LUCENE-2496: Don't throw NPE if IndexWriter is opened with CREATE on
a prior (corrupt) index missing its segments_N file. (Mike a prior (corrupt) index missing its segments_N file. (Mike
McCandless) McCandless)
@ -731,10 +734,10 @@ Bug fixes
assuming whitespace tokenization. Previously all CJK queries, for example, assuming whitespace tokenization. Previously all CJK queries, for example,
would be turned into phrase queries. The old behavior is preserved with would be turned into phrase queries. The old behavior is preserved with
the matchVersion parameter for previous versions. Additionally, you can the matchVersion parameter for previous versions. Additionally, you can
explicitly enable the old behavior with setAutoGeneratePhraseQueries(true) explicitly enable the old behavior with setAutoGeneratePhraseQueries(true)
(Robert Muir) (Robert Muir)
* LUCENE-2537: FSDirectory.copy() implementation was unsafe and could result in * LUCENE-2537: FSDirectory.copy() implementation was unsafe and could result in
OOM if a large file was copied. (Shai Erera) OOM if a large file was copied. (Shai Erera)
* LUCENE-2580: MultiPhraseQuery throws AIOOBE if number of positions * LUCENE-2580: MultiPhraseQuery throws AIOOBE if number of positions
@ -752,14 +755,14 @@ Bug fixes
* LUCENE-2802: NRT DirectoryReader returned incorrect values from * LUCENE-2802: NRT DirectoryReader returned incorrect values from
getVersion, isOptimized, getCommitUserData, getIndexCommit and isCurrent due getVersion, isOptimized, getCommitUserData, getIndexCommit and isCurrent due
to a mutable reference to the IndexWriter's SegmentInfos. to a mutable reference to the IndexWriter's SegmentInfos.
(Simon Willnauer, Earwin Burrfoot) (Simon Willnauer, Earwin Burrfoot)
* LUCENE-2852: Fixed corner case in RAMInputStream that would hit a * LUCENE-2852: Fixed corner case in RAMInputStream that would hit a
false EOF after seeking to EOF then seeking back to same block you false EOF after seeking to EOF then seeking back to same block you
were just in and then calling readBytes (Robert Muir, Mike McCandless) were just in and then calling readBytes (Robert Muir, Mike McCandless)
* LUCENE-2860: Fixed SegmentInfo.sizeInBytes to factor in includeDocStores when it * LUCENE-2860: Fixed SegmentInfo.sizeInBytes to factor in includeDocStores when it
decides whether to return the cached computed size or not. (Shai Erera) decides whether to return the cached computed size or not. (Shai Erera)
* LUCENE-2584: SegmentInfo.files() could hit ConcurrentModificationException if * LUCENE-2584: SegmentInfo.files() could hit ConcurrentModificationException if
@ -772,7 +775,7 @@ Bug fixes
internally, it now calls Similarity.idfExplain(Collection, IndexSearcher). internally, it now calls Similarity.idfExplain(Collection, IndexSearcher).
(Robert Muir) (Robert Muir)
* LUCENE-2693: RAM used by IndexWriter was slightly incorrectly computed. * LUCENE-2693: RAM used by IndexWriter was slightly incorrectly computed.
(Jason Rutherglen via Shai Erera) (Jason Rutherglen via Shai Erera)
* LUCENE-1846: DateTools now uses the US locale everywhere, so DateTools.round() * LUCENE-1846: DateTools now uses the US locale everywhere, so DateTools.round()
@ -788,6 +791,9 @@ Bug fixes
been rounded down to 0 instead of being rounded up to the smallest been rounded down to 0 instead of being rounded up to the smallest
positive number. (yonik) positive number. (yonik)
* LUCENE-2936: PhraseQuery score explanations were not correctly
identifying matches vs non-matches. (hossman)
* LUCENE-2975: A hotspot bug corrupts IndexInput#readVInt()/readVLong() if * LUCENE-2975: A hotspot bug corrupts IndexInput#readVInt()/readVLong() if
the underlying readByte() is inlined (which happens e.g. in MMapDirectory). the underlying readByte() is inlined (which happens e.g. in MMapDirectory).
The loop was unrolled, which makes the hotspot bug disappear. The loop was unrolled, which makes the hotspot bug disappear.
@ -796,30 +802,30 @@ Bug fixes
New features New features
* LUCENE-2128: Parallelized fetching document frequencies during weight * LUCENE-2128: Parallelized fetching document frequencies during weight
creation. (Israel Tsadok, Simon Willnauer via Uwe Schindler) creation. (Israel Tsadok, Simon Willnauer via Uwe Schindler)
* LUCENE-2069: Added Unicode 4 support to CharArraySet. Due to the switch * LUCENE-2069: Added Unicode 4 support to CharArraySet. Due to the switch
to Java 5, supplementary characters are now lowercased correctly if the to Java 5, supplementary characters are now lowercased correctly if the
set is created as case insensitive. set is created as case insensitive.
CharArraySet now requires a Version argument to preserve CharArraySet now requires a Version argument to preserve
backwards compatibility. If Version < 3.1 is passed to the constructor, backwards compatibility. If Version < 3.1 is passed to the constructor,
CharArraySet yields the old behavior. (Simon Willnauer) CharArraySet yields the old behavior. (Simon Willnauer)
* LUCENE-2069: Added Unicode 4 support to LowerCaseFilter. Due to the switch * LUCENE-2069: Added Unicode 4 support to LowerCaseFilter. Due to the switch
to Java 5, supplementary characters are now lowercased correctly. to Java 5, supplementary characters are now lowercased correctly.
LowerCaseFilter now requires a Version argument to preserve LowerCaseFilter now requires a Version argument to preserve
backwards compatibility. If Version < 3.1 is passed to the constructor, backwards compatibility. If Version < 3.1 is passed to the constructor,
LowerCaseFilter yields the old behavior. (Simon Willnauer, Robert Muir) LowerCaseFilter yields the old behavior. (Simon Willnauer, Robert Muir)
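The difference between char-based and code-point-based lowercasing mentioned in the LUCENE-2069 entries can be sketched with plain JDK calls. `CodePointLowerCase` is a hypothetical class for illustration only and is not how LowerCaseFilter or CharArraySet are implemented; it only shows why supplementary characters need the int (code point) API available since Java 5.

```java
// Hypothetical illustration; not Lucene's LowerCaseFilter implementation.
final class CodePointLowerCase {
    private CodePointLowerCase() {}

    /** Lowercases one UTF-16 char at a time; surrogate halves pass through unchanged. */
    static String lowerByChar(String s) {
        char[] out = s.toCharArray();
        for (int i = 0; i < out.length; i++) {
            out[i] = Character.toLowerCase(out[i]);
        }
        return new String(out);
    }

    /** Lowercases one Unicode code point at a time, so supplementary characters are handled. */
    static String lowerByCodePoint(String s) {
        StringBuilder sb = new StringBuilder(s.length());
        for (int i = 0; i < s.length(); ) {
            int cp = s.codePointAt(i);                     // may span two chars
            sb.appendCodePoint(Character.toLowerCase(cp));
            i += Character.charCount(cp);                  // advance by 1 or 2 chars
        }
        return sb.toString();
    }
}
```

For example, DESERET CAPITAL LETTER LONG I (U+10400) is left untouched by `lowerByChar` but mapped to U+10428 by `lowerByCodePoint`.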
* LUCENE-2034: Added ReusableAnalyzerBase, an abstract subclass of Analyzer * LUCENE-2034: Added ReusableAnalyzerBase, an abstract subclass of Analyzer
that makes it easier to reuse TokenStreams correctly. This issue also added that makes it easier to reuse TokenStreams correctly. This issue also added
StopwordAnalyzerBase, which improves consistency of all Analyzers that use StopwordAnalyzerBase, which improves consistency of all Analyzers that use
stopwords, and implemented many analyzers in contrib with it. stopwords, and implemented many analyzers in contrib with it.
(Simon Willnauer via Robert Muir) (Simon Willnauer via Robert Muir)
* LUCENE-2198, LUCENE-2901: Support protected words in stemming TokenFilters using a * LUCENE-2198, LUCENE-2901: Support protected words in stemming TokenFilters using a
new KeywordAttribute. (Simon Willnauer, Drew Farris via Uwe Schindler) new KeywordAttribute. (Simon Willnauer, Drew Farris via Uwe Schindler)
* LUCENE-2183, LUCENE-2240, LUCENE-2241: Added Unicode 4 support * LUCENE-2183, LUCENE-2240, LUCENE-2241: Added Unicode 4 support
to CharTokenizer and its subclasses. CharTokenizer now has new to CharTokenizer and its subclasses. CharTokenizer now has new
int-API which is conditionally preferred to the old char-API depending int-API which is conditionally preferred to the old char-API depending
@ -828,8 +834,8 @@ New features
* LUCENE-2247: Added a CharArrayMap<V> for performance improvements * LUCENE-2247: Added a CharArrayMap<V> for performance improvements
in some stemmers and synonym filters. (Uwe Schindler) in some stemmers and synonym filters. (Uwe Schindler)
* LUCENE-2320: Added SetOnce which wraps an object and allows it to be set * LUCENE-2320: Added SetOnce which wraps an object and allows it to be set
exactly once. (Shai Erera via Mike McCandless) exactly once. (Shai Erera via Mike McCandless)
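The set-exactly-once idea behind the SetOnce entry can be sketched as follows. `SetOnceSketch` is a hypothetical class written for this example; the exception type and method names are assumptions and may differ from Lucene's actual SetOnce.

```java
import java.util.concurrent.atomic.AtomicBoolean;

// Hypothetical sketch of a set-exactly-once holder; not Lucene's SetOnce class.
final class SetOnceSketch<T> {
    private final AtomicBoolean alreadySet = new AtomicBoolean(false);
    private volatile T value;

    /** Stores the value on the first call; every later call fails. */
    void set(T newValue) {
        // compareAndSet guarantees that only one caller can win, even under contention
        if (!alreadySet.compareAndSet(false, true)) {
            throw new IllegalStateException("value was already set");
        }
        value = newValue;
    }

    /** Returns the stored value, or null if set() has not been called yet. */
    T get() {
        return value;
    }
}
```

A holder like this is useful when an object must be handed a collaborator after construction but must never be re-bound to a different one.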
* LUCENE-2314: Added AttributeSource.copyTo(AttributeSource) that * LUCENE-2314: Added AttributeSource.copyTo(AttributeSource) that
@ -856,19 +862,19 @@ New features
Directory.copyTo, and use nio's FileChannel.transferTo when copying Directory.copyTo, and use nio's FileChannel.transferTo when copying
files between FSDirectory instances. (Earwin Burrfoot via Mike files between FSDirectory instances. (Earwin Burrfoot via Mike
McCandless). McCandless).
* LUCENE-2074: Make StandardTokenizer fit for Unicode 4.0, if the * LUCENE-2074: Make StandardTokenizer fit for Unicode 4.0, if the
matchVersion parameter is Version.LUCENE_31. (Uwe Schindler) matchVersion parameter is Version.LUCENE_31. (Uwe Schindler)
* LUCENE-2385: Moved NoDeletionPolicy from benchmark to core. NoDeletionPolicy * LUCENE-2385: Moved NoDeletionPolicy from benchmark to core. NoDeletionPolicy
can be used to prevent commits from ever getting deleted from the index. can be used to prevent commits from ever getting deleted from the index.
(Shai Erera) (Shai Erera)
* LUCENE-1585: IndexWriter now accepts a PayloadProcessorProvider which can * LUCENE-1585: IndexWriter now accepts a PayloadProcessorProvider which can
return a DirPayloadProcessor for a given Directory, which returns a return a DirPayloadProcessor for a given Directory, which returns a
PayloadProcessor for a given Term. The PayloadProcessor will be used to PayloadProcessor for a given Term. The PayloadProcessor will be used to
process the payloads of the segments as they are merged (e.g. if one wants to process the payloads of the segments as they are merged (e.g. if one wants to
rewrite payloads of external indexes as they are added, or of local ones). rewrite payloads of external indexes as they are added, or of local ones).
(Shai Erera, Michael Busch, Mike McCandless) (Shai Erera, Michael Busch, Mike McCandless)
* LUCENE-2440: Add support for custom ExecutorService in * LUCENE-2440: Add support for custom ExecutorService in
@ -881,7 +887,7 @@ New features
* LUCENE-2526: Don't throw NPE from MultiPhraseQuery.toString when * LUCENE-2526: Don't throw NPE from MultiPhraseQuery.toString when
it's empty. (Ross Woolf via Mike McCandless) it's empty. (Ross Woolf via Mike McCandless)
* LUCENE-2559: Added SegmentReader.reopen methods (John Wang via Mike * LUCENE-2559: Added SegmentReader.reopen methods (John Wang via Mike
McCandless) McCandless)
@ -897,17 +903,20 @@ New features
to add span support: SpanMultiTermQueryWrapper<Q extends MultiTermQuery>. to add span support: SpanMultiTermQueryWrapper<Q extends MultiTermQuery>.
Using this wrapper it's easy to add fuzzy/wildcard to e.g. a SpanNearQuery. Using this wrapper it's easy to add fuzzy/wildcard to e.g. a SpanNearQuery.
(Robert Muir, Uwe Schindler) (Robert Muir, Uwe Schindler)
* LUCENE-2838: ConstantScoreQuery now directly supports wrapping a Query * LUCENE-2838: ConstantScoreQuery now directly supports wrapping a Query
instance for stripping off scores. The use of a QueryWrapperFilter instance for stripping off scores. The use of a QueryWrapperFilter
is no longer needed and is discouraged for that use case. Directly wrapping is no longer needed and is discouraged for that use case. Directly wrapping
Query improves performance, as out-of-order collection is now supported. Query improves performance, as out-of-order collection is now supported.
(Uwe Schindler) (Uwe Schindler)
* LUCENE-2864: Add getMaxTermFrequency (maximum within-document TF) to * LUCENE-2864: Add getMaxTermFrequency (maximum within-document TF) to
FieldInvertState so that it can be used in Similarity.computeNorm. FieldInvertState so that it can be used in Similarity.computeNorm.
(Robert Muir) (Robert Muir)
* LUCENE-2720: Segments now record the code version which created them.
(Shai Erera, Mike McCandless, Uwe Schindler)
* LUCENE-2474: Added expert ReaderFinishedListener API to * LUCENE-2474: Added expert ReaderFinishedListener API to
IndexReader, to allow apps that maintain external per-segment caches IndexReader, to allow apps that maintain external per-segment caches
to evict entries when a segment is finished. (Shay Banon, Yonik to evict entries when a segment is finished. (Shay Banon, Yonik
@ -916,8 +925,8 @@ New features
* LUCENE-2911: The new StandardTokenizer, UAX29URLEmailTokenizer, and * LUCENE-2911: The new StandardTokenizer, UAX29URLEmailTokenizer, and
the ICUTokenizer in contrib now all tag types with a consistent set the ICUTokenizer in contrib now all tag types with a consistent set
of token types (defined in StandardTokenizer). Tokens in the major of token types (defined in StandardTokenizer). Tokens in the major
CJK types are explicitly marked to allow for custom downstream handling: CJK types are explicitly marked to allow for custom downstream handling:
<IDEOGRAPHIC>, <HANGUL>, <KATAKANA>, and <HIRAGANA>. <IDEOGRAPHIC>, <HANGUL>, <KATAKANA>, and <HIRAGANA>.
(Robert Muir, Steven Rowe) (Robert Muir, Steven Rowe)
* LUCENE-2913: Add missing getters to Numeric* classes. (Uwe Schindler) * LUCENE-2913: Add missing getters to Numeric* classes. (Uwe Schindler)
@ -942,7 +951,7 @@ Optimizations
* LUCENE-2137: Switch to AtomicInteger for some ref counting (Earwin * LUCENE-2137: Switch to AtomicInteger for some ref counting (Earwin
Burrfoot via Mike McCandless) Burrfoot via Mike McCandless)
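Reference counting with an AtomicInteger, as named in LUCENE-2137, might look roughly like the sketch below. `RefCounted` and its method names are hypothetical and chosen for this example; the entry does not say which Lucene classes were changed or how.

```java
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical sketch of lock-free reference counting; not taken from Lucene's source.
final class RefCounted {
    private final AtomicInteger refCount = new AtomicInteger(1); // creator holds one reference
    private volatile boolean closed;

    /** Acquires a reference; fails if the resource was already released. */
    void incRef() {
        int current;
        do {
            current = refCount.get();
            if (current <= 0) {
                throw new IllegalStateException("already closed");
            }
        } while (!refCount.compareAndSet(current, current + 1));
    }

    /** Releases a reference; the last release marks the resource closed. */
    void decRef() {
        int remaining = refCount.decrementAndGet();
        if (remaining == 0) {
            closed = true; // a real resource would free files or buffers here
        } else if (remaining < 0) {
            throw new IllegalStateException("too many decRef calls");
        }
    }

    int getRefCount() {
        return refCount.get();
    }

    boolean isClosed() {
        return closed;
    }
}
```

The compare-and-set loop in `incRef` avoids a synchronized block while still refusing to revive a resource whose count has already reached zero.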
* LUCENE-2123, LUCENE-2261: Move FuzzyQuery rewrite to separate RewriteMode * LUCENE-2123, LUCENE-2261: Move FuzzyQuery rewrite to separate RewriteMode
into MultiTermQuery. The number of fuzzy expansions can be specified with into MultiTermQuery. The number of fuzzy expansions can be specified with
the maxExpansions parameter to FuzzyQuery. the maxExpansions parameter to FuzzyQuery.
(Uwe Schindler, Robert Muir, Mike McCandless) (Uwe Schindler, Robert Muir, Mike McCandless)
@ -976,12 +985,12 @@ Optimizations
TermAttributeImpl, move DEFAULT_TYPE constant to TypeInterface, improve TermAttributeImpl, move DEFAULT_TYPE constant to TypeInterface, improve
null-handling for TypeAttribute. (Uwe Schindler) null-handling for TypeAttribute. (Uwe Schindler)
* LUCENE-2329: Switch TermsHash* from using a PostingList object per unique * LUCENE-2329: Switch TermsHash* from using a PostingList object per unique
term to parallel arrays, indexed by termID. This reduces garbage collection term to parallel arrays, indexed by termID. This reduces garbage collection
overhead significantly, which results in great indexing performance wins overhead significantly, which results in great indexing performance wins
when the available JVM heap space is low. This will become even more when the available JVM heap space is low. This will become even more
important when the DocumentsWriter RAM buffer is searchable in the future, important when the DocumentsWriter RAM buffer is searchable in the future,
because then it will make sense to make the RAM buffers as large as because then it will make sense to make the RAM buffers as large as
possible. (Mike McCandless, Michael Busch) possible. (Mike McCandless, Michael Busch)
* LUCENE-2380: The terms field cache methods (getTerms, * LUCENE-2380: The terms field cache methods (getTerms,
@ -996,7 +1005,7 @@ Optimizations
causing too many fallbacks to compare-by-value (instead of by-ord). causing too many fallbacks to compare-by-value (instead of by-ord).
(Mike McCandless) (Mike McCandless)
* LUCENE-2574: IndexInput exposes copyBytes(IndexOutput, long) to allow for * LUCENE-2574: IndexInput exposes copyBytes(IndexOutput, long) to allow for
efficient copying by sub-classes. Optimized copy is implemented for RAM and FS efficient copying by sub-classes. Optimized copy is implemented for RAM and FS
streams. (Shai Erera) streams. (Shai Erera)
@ -1019,15 +1028,15 @@ Optimizations
* LUCENE-2010: Segments with 100% deleted documents are now removed on * LUCENE-2010: Segments with 100% deleted documents are now removed on
IndexReader or IndexWriter commit. (Uwe Schindler, Mike McCandless) IndexReader or IndexWriter commit. (Uwe Schindler, Mike McCandless)
* LUCENE-1472: Removed synchronization from static DateTools methods * LUCENE-1472: Removed synchronization from static DateTools methods
by using a ThreadLocal. Also converted DateTools.Resolution to a by using a ThreadLocal. Also converted DateTools.Resolution to a
Java 5 enum (this should not break backwards). (Uwe Schindler) Java 5 enum (this should not break backwards). (Uwe Schindler)
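The LUCENE-1472 entry above removes synchronization by using a ThreadLocal. A minimal JDK-only sketch of that technique follows. The class ThreadLocalDates, the method formatDay, and the "yyyyMMdd" pattern are assumptions made for illustration and are not taken from DateTools.

```java
import java.text.SimpleDateFormat;
import java.util.Date;
import java.util.Locale;
import java.util.TimeZone;

// Hypothetical illustration; not the DateTools implementation.
public class ThreadLocalDates {
    // SimpleDateFormat is not thread-safe, so each thread lazily gets its own
    // instance instead of all threads synchronizing on a shared one.
    private static final ThreadLocal<SimpleDateFormat> DAY_FORMAT =
        new ThreadLocal<SimpleDateFormat>() {
            @Override
            protected SimpleDateFormat initialValue() {
                SimpleDateFormat format = new SimpleDateFormat("yyyyMMdd", Locale.ROOT);
                format.setTimeZone(TimeZone.getTimeZone("GMT"));
                return format;
            }
        };

    // Static and unsynchronized: safe because the formatter is per-thread.
    public static String formatDay(Date date) {
        return DAY_FORMAT.get().format(date);
    }

    public static void main(String[] args) {
        System.out.println(formatDay(new Date(0L)));
    }
}
```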
Build Build
* LUCENE-2124: Moved the JDK-based collation support from contrib/collation * LUCENE-2124: Moved the JDK-based collation support from contrib/collation
into core, and moved the ICU-based collation support into contrib/icu. into core, and moved the ICU-based collation support into contrib/icu.
(Robert Muir) (Robert Muir)
* LUCENE-2326: Removed SVN checkouts for backwards tests. The backwards * LUCENE-2326: Removed SVN checkouts for backwards tests. The backwards
@ -1039,14 +1048,14 @@ Build
* LUCENE-1709: Tests are now parallelized by default (except for benchmark). You * LUCENE-1709: Tests are now parallelized by default (except for benchmark). You
can force them to run sequentially by passing -Drunsequential=1 on the command can force them to run sequentially by passing -Drunsequential=1 on the command
line. The number of threads that are spawned per CPU defaults to '1'. If you line. The number of threads that are spawned per CPU defaults to '1'. If you
wish to change that, you can run the tests with -DthreadsPerProcessor=[num]. wish to change that, you can run the tests with -DthreadsPerProcessor=[num].
(Robert Muir, Shai Erera, Peter Kofler) (Robert Muir, Shai Erera, Peter Kofler)
* LUCENE-2516: Backwards tests are now compiled against released lucene-core.jar * LUCENE-2516: Backwards tests are now compiled against released lucene-core.jar
from tarball of previous version. Backwards tests are now packaged together from tarball of previous version. Backwards tests are now packaged together
with src distribution. (Uwe Schindler) with src distribution. (Uwe Schindler)
* LUCENE-2611: Added Ant target to install IntelliJ IDEA configuration: * LUCENE-2611: Added Ant target to install IntelliJ IDEA configuration:
"ant idea". See http://wiki.apache.org/lucene-java/HowtoConfigureIntelliJ "ant idea". See http://wiki.apache.org/lucene-java/HowtoConfigureIntelliJ
(Steven Rowe) (Steven Rowe)
@ -1055,8 +1064,8 @@ Build
generating Maven artifacts (Steven Rowe) generating Maven artifacts (Steven Rowe)
* LUCENE-2609: Added jar-test-framework Ant target which packages Lucene's * LUCENE-2609: Added jar-test-framework Ant target which packages Lucene's
tests' framework classes. (Drew Farris, Grant Ingersoll, Shai Erera, Steven tests' framework classes. (Drew Farris, Grant Ingersoll, Shai Erera,
Rowe) Steven Rowe)
Test Cases Test Cases
@ -1092,18 +1101,18 @@ Test Cases
access to "real" files from the test folder itself, can use access to "real" files from the test folder itself, can use
LuceneTestCase(J4).getDataFile(). (Uwe Schindler) LuceneTestCase(J4).getDataFile(). (Uwe Schindler)
* LUCENE-2398, LUCENE-2611: Improve tests to work better from IDEs such * LUCENE-2398, LUCENE-2611: Improve tests to work better from IDEs such
as Eclipse and IntelliJ. as Eclipse and IntelliJ.
(Paolo Castagna, Steven Rowe via Robert Muir) (Paolo Castagna, Steven Rowe via Robert Muir)
* LUCENE-2804: add newFSDirectory to LuceneTestCase to create a FSDirectory at * LUCENE-2804: add newFSDirectory to LuceneTestCase to create a FSDirectory at
random. (Shai Erera, Robert Muir) random. (Shai Erera, Robert Muir)
Documentation Documentation
* LUCENE-2579: Fix oal.search's package.html description of abstract * LUCENE-2579: Fix oal.search's package.html description of abstract
methods. (Santiago M. Mola via Mike McCandless) methods. (Santiago M. Mola via Mike McCandless)
* LUCENE-2625: Add a note to IndexReader.termDocs() with additional verbiage * LUCENE-2625: Add a note to IndexReader.termDocs() with additional verbiage
that the TermEnum must be seeked since it is unpositioned. that the TermEnum must be seeked since it is unpositioned.
(Adriano Crestani via Robert Muir) (Adriano Crestani via Robert Muir)


@ -47,26 +47,26 @@ API Changes
(No changes) (No changes)
======================= Lucene 3.1 (not yet released) ======================= ======================= Lucene 3.1.0 =======================
Changes in backwards compatibility policy Changes in backwards compatibility policy
* LUCENE-2100: All Analyzers in Lucene-contrib have been marked as final. * LUCENE-2100: All Analyzers in Lucene-contrib have been marked as final.
Analyzers should only act as a composition of TokenStreams; users should Analyzers should only act as a composition of TokenStreams; users should
compose their own analyzers instead of subclassing existing ones. compose their own analyzers instead of subclassing existing ones.
(Simon Willnauer) (Simon Willnauer)
* LUCENE-2194, LUCENE-2201: Snowball APIs were upgraded to snowball revision * LUCENE-2194, LUCENE-2201: Snowball APIs were upgraded to snowball revision
502 (with some local modifications for improved performance). 502 (with some local modifications for improved performance).
Index backwards compatibility and binary backwards compatibility are Index backwards compatibility and binary backwards compatibility are
preserved, but some protected/public member variables changed type. This preserved, but some protected/public member variables changed type. This
does NOT affect java code/class files produced by the snowball compiler, does NOT affect java code/class files produced by the snowball compiler,
but technically is a backwards compatibility break. (Robert Muir) but technically is a backwards compatibility break. (Robert Muir)
* LUCENE-2226: Moved contrib/snowball functionality into contrib/analyzers. * LUCENE-2226: Moved contrib/snowball functionality into contrib/analyzers.
Be sure to remove any old obsolete lucene-snowball jar files from your Be sure to remove any old obsolete lucene-snowball jar files from your
classpath! (Robert Muir) classpath! (Robert Muir)
* LUCENE-2323: Moved contrib/wikipedia functionality into contrib/analyzers. * LUCENE-2323: Moved contrib/wikipedia functionality into contrib/analyzers.
Additionally the package was changed from org.apache.lucene.wikipedia.analysis Additionally the package was changed from org.apache.lucene.wikipedia.analysis
to org.apache.lucene.analysis.wikipedia. (Robert Muir) to org.apache.lucene.analysis.wikipedia. (Robert Muir)
@ -74,30 +74,30 @@ Changes in backwards compatibility policy
* LUCENE-2581: Added new methods to FragmentsBuilder interface. These methods * LUCENE-2581: Added new methods to FragmentsBuilder interface. These methods
are used to set pre/post tags and Encoder. (Koji Sekiguchi) are used to set pre/post tags and Encoder. (Koji Sekiguchi)
* LUCENE-2391: Improved spellchecker (re)build time/ram usage by omitting * LUCENE-2391: Improved spellchecker (re)build time/ram usage by omitting
frequencies/positions/norms for single-valued fields, modifying the default frequencies/positions/norms for single-valued fields, modifying the default
ramBufferMBSize to match IndexWriterConfig (16MB), making index optimization ramBufferMBSize to match IndexWriterConfig (16MB), making index optimization
an optional boolean parameter, and modifying the incremental update logic an optional boolean parameter, and modifying the incremental update logic
to work well with unoptimized spellcheck indexes. The indexDictionary() methods to work well with unoptimized spellcheck indexes. The indexDictionary() methods
were made final to ensure a hard backwards break in case you were subclassing were made final to ensure a hard backwards break in case you were subclassing
Spellchecker. In general, subclassing Spellchecker is not recommended. (Robert Muir) Spellchecker. In general, subclassing Spellchecker is not recommended. (Robert Muir)
Changes in runtime behavior Changes in runtime behavior
* LUCENE-2117: SnowballAnalyzer uses TurkishLowerCaseFilter instead of * LUCENE-2117: SnowballAnalyzer uses TurkishLowerCaseFilter instead of
LowercaseFilter to correctly handle the unique Turkish casing behavior if LowercaseFilter to correctly handle the unique Turkish casing behavior if
used with Version > 3.0 and the TurkishStemmer. used with Version > 3.0 and the TurkishStemmer.
(Robert Muir via Simon Willnauer) (Robert Muir via Simon Willnauer)
* LUCENE-2055: GermanAnalyzer now uses the Snowball German2 algorithm and * LUCENE-2055: GermanAnalyzer now uses the Snowball German2 algorithm and
stopwords list by default for Version > 3.0. stopwords list by default for Version > 3.0.
(Robert Muir, Uwe Schindler, Simon Willnauer) (Robert Muir, Uwe Schindler, Simon Willnauer)
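The LUCENE-2117 entry above refers to the unique Turkish casing behavior. The JDK-only sketch below shows what that behavior is: under Turkish rules, uppercase "I" lowercases to dotless "ı" (U+0131), and dotted "İ" (U+0130) lowercases to "i". The class TurkishCasing is hypothetical and does not use or reproduce TurkishLowerCaseFilter.

```java
import java.util.Locale;

// Hypothetical illustration of locale-sensitive lowercasing with the JDK.
public class TurkishCasing {
    private static final Locale TURKISH = Locale.forLanguageTag("tr-TR");

    // Locale-neutral lowercasing: "I" becomes "i".
    public static String lowerRoot(String s) {
        return s.toLowerCase(Locale.ROOT);
    }

    // Turkish lowercasing: "I" becomes dotless "ı", "İ" becomes "i".
    public static String lowerTurkish(String s) {
        return s.toLowerCase(TURKISH);
    }

    public static void main(String[] args) {
        System.out.println(lowerRoot("I"));
        System.out.println(lowerTurkish("I"));
    }
}
```

A stemmer that expects Turkish text will not match terms lowercased with the locale-neutral rule, which is why the analyzer needs a dedicated filter.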
Bug fixes Bug fixes
* LUCENE-2855: contrib queryparser was using CharSequence as key in some internal * LUCENE-2855: contrib queryparser was using CharSequence as key in some internal
Map instances, which was leading to incorrect behaviour, since some CharSequence Map instances, which was leading to incorrect behavior, since some CharSequence
implementors do not override hashcode and equals methods. Now the internal Maps implementors do not override hashcode and equals methods. Now the internal Maps
are using String instead. (Adriano Crestani) are using String instead. (Adriano Crestani)
* LUCENE-2068: Fixed ReverseStringFilter which was not aware of supplementary * LUCENE-2068: Fixed ReverseStringFilter which was not aware of supplementary
@ -106,9 +106,9 @@ Bug fixes
now reverses supplementary characters correctly if used with Version > 3.0. now reverses supplementary characters correctly if used with Version > 3.0.
(Simon Willnauer, Robert Muir) (Simon Willnauer, Robert Muir)
* LUCENE-2035: TokenSources.getTokenStream() does not assign positionIncrement. * LUCENE-2035: TokenSources.getTokenStream() does not assign positionIncrement.
(Christopher Morris via Mark Miller) (Christopher Morris via Mark Miller)
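The LUCENE-2068 entry above concerns reversing text that contains supplementary characters. As a rough illustration of the problem, the sketch below contrasts a naive reversal of UTF-16 code units with a code-point-aware reversal. The class CodePointReverse and its methods are hypothetical and are not the ReverseStringFilter implementation.

```java
// Hypothetical illustration; not Lucene's ReverseStringFilter.
public class CodePointReverse {
    // Naive: reverses UTF-16 code units, which swaps the two halves of a
    // surrogate pair and produces an invalid sequence.
    public static String reverseChars(String s) {
        char[] in = s.toCharArray();
        char[] out = new char[in.length];
        for (int i = 0; i < in.length; i++) {
            out[in.length - 1 - i] = in[i];
        }
        return new String(out);
    }

    // Code-point aware: walks backwards one code point at a time, so a
    // surrogate pair is copied as a unit.
    public static String reverseCodePoints(String s) {
        StringBuilder sb = new StringBuilder(s.length());
        int i = s.length();
        while (i > 0) {
            int codePoint = s.codePointBefore(i);
            sb.appendCodePoint(codePoint);
            i -= Character.charCount(codePoint);
        }
        return sb.toString();
    }
}
```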
* LUCENE-2055: Deprecated RussianTokenizer, RussianStemmer, RussianStemFilter, * LUCENE-2055: Deprecated RussianTokenizer, RussianStemmer, RussianStemFilter,
FrenchStemmer, FrenchStemFilter, DutchStemmer, and DutchStemFilter. For FrenchStemmer, FrenchStemFilter, DutchStemmer, and DutchStemFilter. For
these Analyzers, SnowballFilter is used instead (for Version > 3.0), as these Analyzers, SnowballFilter is used instead (for Version > 3.0), as
@ -118,7 +118,7 @@ Bug fixes
* LUCENE-2184: Fixed bug with handling best fit value when the proper best fit value is * LUCENE-2184: Fixed bug with handling best fit value when the proper best fit value is
not an indexed field. Note, this change affects the APIs. (Grant Ingersoll) not an indexed field. Note, this change affects the APIs. (Grant Ingersoll)
* LUCENE-2359: Fix bug in CartesianPolyFilterBuilder related to handling of behavior around * LUCENE-2359: Fix bug in CartesianPolyFilterBuilder related to handling of behavior around
the 180th meridian (Grant Ingersoll) the 180th meridian (Grant Ingersoll)
@ -135,15 +135,15 @@ Bug fixes
and regenerating a new .nrm with 'ant gennorm2'. (David Bowen via Robert Muir) and regenerating a new .nrm with 'ant gennorm2'. (David Bowen via Robert Muir)
* LUCENE-2653: ThaiWordFilter depends on the JRE having a Thai dictionary, which is not * LUCENE-2653: ThaiWordFilter depends on the JRE having a Thai dictionary, which is not
always the case. If the dictionary is unavailable, the filter will now throw always the case. If the dictionary is unavailable, the filter will now throw
UnsupportedOperationException in the constructor. (Robert Muir) UnsupportedOperationException in the constructor. (Robert Muir)
* LUCENE-589: Fix contrib/demo for international documents. * LUCENE-589: Fix contrib/demo for international documents.
(Curtis d'Entremont via Robert Muir) (Curtis d'Entremont via Robert Muir)
* LUCENE-2246: Fix contrib/demo for Turkish html documents. * LUCENE-2246: Fix contrib/demo for Turkish html documents.
(Selim Nadi via Robert Muir) (Selim Nadi via Robert Muir)
* LUCENE-590: Demo HTML parser gives incorrect summaries when title is repeated as a heading * LUCENE-590: Demo HTML parser gives incorrect summaries when title is repeated as a heading
(Curtis d'Entremont via Robert Muir) (Curtis d'Entremont via Robert Muir)
@ -153,9 +153,9 @@ Bug fixes
* LUCENE-2874: Highlighting overlapping tokens outputted doubled words. * LUCENE-2874: Highlighting overlapping tokens outputted doubled words.
(Pierre Gossé via Robert Muir) (Pierre Gossé via Robert Muir)
* LUCENE-2943: Fix thread-safety issues with ICUCollationKeyFilter. * LUCENE-2943: Fix thread-safety issues with ICUCollationKeyFilter.
(Robert Muir) (Robert Muir)
API Changes API Changes
* LUCENE-2867: Some contrib queryparser methods that receive CharSequence as * LUCENE-2867: Some contrib queryparser methods that receive CharSequence as
@ -165,7 +165,7 @@ API Changes
* LUCENE-2147: Spatial GeoHashUtils now always decode GeoHash strings * LUCENE-2147: Spatial GeoHashUtils now always decode GeoHash strings
with full precision. GeoHash#decode_exactly(String) was merged into with full precision. GeoHash#decode_exactly(String) was merged into
GeoHash#decode(String). (Chris Male, Simon Willnauer) GeoHash#decode(String). (Chris Male, Simon Willnauer)
* LUCENE-2204: Change some package private classes/members to publicly accessible to implement * LUCENE-2204: Change some package private classes/members to publicly accessible to implement
custom FragmentsBuilders. (Koji Sekiguchi) custom FragmentsBuilders. (Koji Sekiguchi)
@ -182,14 +182,14 @@ API Changes
* LUCENE-2626: FastVectorHighlighter: enable FragListBuilder and FragmentsBuilder * LUCENE-2626: FastVectorHighlighter: enable FragListBuilder and FragmentsBuilder
to be set per-field override. (Koji Sekiguchi) to be set per-field override. (Koji Sekiguchi)
* LUCENE-2712: FieldBoostMapAttribute in contrib/queryparser was changed from * LUCENE-2712: FieldBoostMapAttribute in contrib/queryparser was changed from
a Map<CharSequence,Float> to a Map<String,Float>. Per the CharSequence javadoc, a Map<CharSequence,Float> to a Map<String,Float>. Per the CharSequence javadoc,
CharSequence is inappropriate as a map key. (Robert Muir) CharSequence is inappropriate as a map key. (Robert Muir)
* LUCENE-1937: Add more methods to manipulate QueryNodeProcessorPipeline elements. * LUCENE-1937: Add more methods to manipulate QueryNodeProcessorPipeline elements.
QueryNodeProcessorPipeline now implements the List interface; this is useful QueryNodeProcessorPipeline now implements the List interface; this is useful
if you want to extend or modify an existing pipeline. (Adriano Crestani via Robert Muir) if you want to extend or modify an existing pipeline. (Adriano Crestani via Robert Muir)
* LUCENE-2754, LUCENE-2757: Deprecated SpanRegexQuery. Use * LUCENE-2754, LUCENE-2757: Deprecated SpanRegexQuery. Use
new SpanMultiTermQueryWrapper<RegexQuery>(new RegexQuery()) instead. new SpanMultiTermQueryWrapper<RegexQuery>(new RegexQuery()) instead.
(Robert Muir, Uwe Schindler) (Robert Muir, Uwe Schindler)
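The LUCENE-2855 and LUCENE-2712 entries in this file both move map keys from CharSequence to String. The JDK-only sketch below shows why: CharSequence implementations such as StringBuilder do not override equals and hashCode, so a lookup with equal characters can miss. The class CharSequenceKeys and its method names are hypothetical.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical illustration of why CharSequence is a poor map key.
public class CharSequenceKeys {
    // Looks up using the CharSequence object itself as the key.
    public static Float lookupByCharSequence(Map<CharSequence, Float> boosts,
                                             CharSequence field) {
        return boosts.get(field);
    }

    // Looks up using the String value, which has content-based equality.
    public static Float lookupByString(Map<String, Float> boosts,
                                       CharSequence field) {
        return boosts.get(field.toString());
    }

    public static void main(String[] args) {
        Map<CharSequence, Float> bySequence = new HashMap<>();
        bySequence.put("title", 2.0f);
        Map<String, Float> byString = new HashMap<>();
        byString.put("title", 2.0f);

        CharSequence field = new StringBuilder("title");
        System.out.println(lookupByCharSequence(bySequence, field));
        System.out.println(lookupByString(byString, field));
    }
}
```

The first lookup misses because StringBuilder inherits identity-based equals and hashCode from Object; converting to String before the lookup gives content-based comparison.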
@ -199,10 +199,10 @@ API Changes
* LUCENE-2830: Use StringBuilder instead of StringBuffer across Benchmark, and * LUCENE-2830: Use StringBuilder instead of StringBuffer across Benchmark, and
remove the StringBuffer HtmlParser.parse() variant. (Shai Erera) remove the StringBuffer HtmlParser.parse() variant. (Shai Erera)
* LUCENE-2920: Deprecated ShingleMatrixFilter as it is unmaintained and does * LUCENE-2920: Deprecated ShingleMatrixFilter as it is unmaintained and does
not work with custom Attributes or custom payload encoders. (Uwe Schindler) not work with custom Attributes or custom payload encoders. (Uwe Schindler)
New features New features
* LUCENE-2500: Added DirectIOLinuxDirectory, a Linux-specific * LUCENE-2500: Added DirectIOLinuxDirectory, a Linux-specific
@ -210,14 +210,14 @@ New features
cache. This is useful to prevent segment merging from evicting cache. This is useful to prevent segment merging from evicting
pages from the buffer cache, since fadvise/madvise do not seem. pages from the buffer cache, since fadvise/madvise do not seem.
(Michael McCandless) (Michael McCandless)
* LUCENE-2306: Add NumericRangeFilter and NumericRangeQuery support to XMLQueryParser. * LUCENE-2306: Add NumericRangeFilter and NumericRangeQuery support to XMLQueryParser.
(Jingkei Ly, via Mark Harwood) (Jingkei Ly, via Mark Harwood)
* LUCENE-2102: Add a Turkish LowerCase Filter. TurkishLowerCaseFilter handles * LUCENE-2102: Add a Turkish LowerCase Filter. TurkishLowerCaseFilter handles
Turkish and Azeri unique casing behavior correctly. Turkish and Azeri unique casing behavior correctly.
(Ahmet Arslan, Robert Muir via Simon Willnauer) (Ahmet Arslan, Robert Muir via Simon Willnauer)
* LUCENE-2039: Add an extensible query parser to contrib/misc. * LUCENE-2039: Add an extensible query parser to contrib/misc.
ExtendableQueryParser enables arbitrary parser extensions based on a ExtendableQueryParser enables arbitrary parser extensions based on a
customizable field naming scheme. customizable field naming scheme.
@ -225,11 +225,11 @@ New features
* LUCENE-2067: Add a Czech light stemmer. CzechAnalyzer will now stem words * LUCENE-2067: Add a Czech light stemmer. CzechAnalyzer will now stem words
when Version is set to 3.1 or higher. (Robert Muir) when Version is set to 3.1 or higher. (Robert Muir)
* LUCENE-2062: Add a Bulgarian analyzer. (Robert Muir, Simon Willnauer) * LUCENE-2062: Add a Bulgarian analyzer. (Robert Muir, Simon Willnauer)
* LUCENE-2206: Add Snowball's stopword lists for Danish, Dutch, English, * LUCENE-2206: Add Snowball's stopword lists for Danish, Dutch, English,
Finnish, French, German, Hungarian, Italian, Norwegian, Russian, Spanish, Finnish, French, German, Hungarian, Italian, Norwegian, Russian, Spanish,
and Swedish. These can be loaded with WordListLoader.getSnowballWordSet. and Swedish. These can be loaded with WordListLoader.getSnowballWordSet.
(Robert Muir, Simon Willnauer) (Robert Muir, Simon Willnauer)
@ -237,7 +237,7 @@ New features
(Koji Sekiguchi) (Koji Sekiguchi)
* LUCENE-2218: ShingleFilter supports minimum shingle size, and the separator * LUCENE-2218: ShingleFilter supports minimum shingle size, and the separator
character is now configurable. It's also up to 20% faster. character is now configurable. It's also up to 20% faster.
(Steven Rowe via Robert Muir) (Steven Rowe via Robert Muir)
* LUCENE-2234: Add a Hindi analyzer. (Robert Muir) * LUCENE-2234: Add a Hindi analyzer. (Robert Muir)
@ -267,7 +267,7 @@ New features
* LUCENE-2298: Add analyzers/stempel, an algorithmic stemmer with support for * LUCENE-2298: Add analyzers/stempel, an algorithmic stemmer with support for
the Polish language. (Andrzej Bialecki via Robert Muir) the Polish language. (Andrzej Bialecki via Robert Muir)
* LUCENE-2400: ShingleFilter was changed to not output all-filler shingles and * LUCENE-2400: ShingleFilter was changed to not output all-filler shingles and
unigrams, and uses a more performant algorithm to build grams using a linked list unigrams, and uses a more performant algorithm to build grams using a linked list
of AttributeSource.cloneAttributes() instances and the new copyTo() method. of AttributeSource.cloneAttributes() instances and the new copyTo() method.
(Steven Rowe via Uwe Schindler) (Steven Rowe via Uwe Schindler)
@ -286,7 +286,7 @@ New features
* LUCENE-2464: FastVectorHighlighter: add SingleFragListBuilder to return * LUCENE-2464: FastVectorHighlighter: add SingleFragListBuilder to return
entire field contents. (Koji Sekiguchi) entire field contents. (Koji Sekiguchi)
* LUCENE-2503: Added lighter stemming alternatives for European languages. * LUCENE-2503: Added lighter stemming alternatives for European languages.
(Robert Muir) (Robert Muir)
* LUCENE-2581: FastVectorHighlighter: add Encoder to FragmentsBuilder. * LUCENE-2581: FastVectorHighlighter: add Encoder to FragmentsBuilder.
@ -294,20 +294,23 @@ New features
* LUCENE-2624: Add Analyzers for Armenian, Basque, and Catalan, from snowball. * LUCENE-2624: Add Analyzers for Armenian, Basque, and Catalan, from snowball.
(Robert Muir) (Robert Muir)
* LUCENE-1938: PrecedenceQueryParser is now implemented with the flexible QP framework. * LUCENE-1938: PrecedenceQueryParser is now implemented with the flexible QP framework.
This means that you can also add this functionality to your own QP pipeline by using This means that you can also add this functionality to your own QP pipeline by using
BooleanModifiersQueryNodeProcessor, for example instead of GroupQueryNodeProcessor. BooleanModifiersQueryNodeProcessor, for example instead of GroupQueryNodeProcessor.
(Adriano Crestani via Robert Muir) (Adriano Crestani via Robert Muir)
* LUCENE-2791: Added WindowsDirectory, a Windows-specific Directory impl * LUCENE-2791: Added WindowsDirectory, a Windows-specific Directory impl
that doesn't synchronize on the file handle. This can be useful to that doesn't synchronize on the file handle. This can be useful to
avoid the performance problems of SimpleFSDirectory and NIOFSDirectory. avoid the performance problems of SimpleFSDirectory and NIOFSDirectory.
(Robert Muir, Simon Willnauer, Uwe Schindler, Michael McCandless) (Robert Muir, Simon Willnauer, Uwe Schindler, Michael McCandless)
* LUCENE-2842: Add analyzer for Galician. Also adds the RSLP (Orengo) stemmer * LUCENE-2842: Add analyzer for Galician. Also adds the RSLP (Orengo) stemmer
for Portuguese. (Robert Muir) for Portuguese. (Robert Muir)
* SOLR-1057: Add PathHierarchyTokenizer that represents file path hierarchies as synonyms of
/something, /something/something, /something/something/else. (Ryan McKinley, Koji Sekiguchi)
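The SOLR-1057 entry above describes emitting each level of a path as its own token. A rough JDK-only sketch of that expansion follows. The class PathPrefixes and the method prefixes are hypothetical; the real PathHierarchyTokenizer is a Lucene Tokenizer with its own options, which this sketch does not reproduce.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical illustration of path-hierarchy expansion; not the tokenizer.
public class PathPrefixes {
    // Returns every prefix of path that ends just before a delimiter,
    // followed by the full path itself.
    public static List<String> prefixes(String path, char delimiter) {
        List<String> out = new ArrayList<>();
        if (path.isEmpty()) {
            return out;
        }
        // Start at 1 so a leading delimiter does not yield an empty prefix.
        for (int i = 1; i < path.length(); i++) {
            if (path.charAt(i) == delimiter) {
                out.add(path.substring(0, i));
            }
        }
        out.add(path);
        return out;
    }

    public static void main(String[] args) {
        for (String prefix : prefixes("/something/something/else", '/')) {
            System.out.println(prefix);
        }
    }
}
```

Indexing all prefixes lets a query for a parent directory match documents stored anywhere beneath it.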
Build Build
* LUCENE-2124: Moved the JDK-based collation support from contrib/collation * LUCENE-2124: Moved the JDK-based collation support from contrib/collation


@ -247,24 +247,26 @@ Documentation
---------------------- ----------------------
================== 3.1.0-dev ================== ================== 3.1.0 ==================
Versions of Major Components Versions of Major Components
--------------------- ---------------------
Apache Lucene trunk Apache Lucene 3.1.0
Apache Tika 0.8 Apache Tika 0.8
Carrot2 3.4.2 Carrot2 3.4.2
Velocity 1.6.1 and Velocity Tools 2.0-beta3
Apache UIMA 2.3.1-SNAPSHOT
Upgrading from Solr 1.4 Upgrading from Solr 1.4
---------------------- ----------------------
* The Lucene index format has changed and as a result, once you upgrade, * The Lucene index format has changed and as a result, once you upgrade,
previous versions of Solr will no longer be able to read your indices. previous versions of Solr will no longer be able to read your indices.
In a master/slave configuration, all searchers/slaves should be upgraded In a master/slave configuration, all searchers/slaves should be upgraded
before the master. If the master were to be updated first, the older before the master. If the master were to be updated first, the older
searchers would not be able to read the new index format. searchers would not be able to read the new index format.
* The Solr JavaBin format has changed as of Solr 3.1. If you are using the * The Solr JavaBin format has changed as of Solr 3.1. If you are using the
JavaBin format, you will need to upgrade your SolrJ client. (SOLR-2034) JavaBin format, you will need to upgrade your SolrJ client. (SOLR-2034)
* The experimental ALIAS command has been removed (SOLR-1637) * The experimental ALIAS command has been removed (SOLR-1637)
@ -275,10 +277,10 @@ Upgrading from Solr 1.4
is deprecated (SOLR-1696) is deprecated (SOLR-1696)
* The deprecated HTMLStripReader, HTMLStripWhitespaceTokenizerFactory and * The deprecated HTMLStripReader, HTMLStripWhitespaceTokenizerFactory and
HTMLStripStandardTokenizerFactory were removed. To strip HTML tags, HTMLStripStandardTokenizerFactory were removed. To strip HTML tags,
HTMLStripCharFilter should be used instead, and it works with any HTMLStripCharFilter should be used instead, and it works with any
Tokenizer of your choice. (SOLR-1657) Tokenizer of your choice. (SOLR-1657)
* Field compression is no longer supported. Fields that were formerly * Field compression is no longer supported. Fields that were formerly
compressed will be uncompressed as index segments are merged. For compressed will be uncompressed as index segments are merged. For
shorter fields, this may actually be an improvement, as the compression shorter fields, this may actually be an improvement, as the compression
@ -287,24 +289,24 @@ Upgrading from Solr 1.4
* SOLR-1845: The TermsComponent response format was changed so that the * SOLR-1845: The TermsComponent response format was changed so that the
"terms" container is a map instead of a named list. This affects "terms" container is a map instead of a named list. This affects
response formats like JSON, but not XML. (yonik) response formats like JSON, but not XML. (yonik)
* SOLR-1876: All Analyzers and TokenStreams are now final to enforce * SOLR-1876: All Analyzers and TokenStreams are now final to enforce
the decorator pattern. (rmuir, uschindler) the decorator pattern. (rmuir, uschindler)
* LUCENE-2608: Added the ability to specify the accuracy on a per request basis. * LUCENE-2608: Added the ability to specify the accuracy on a per request basis.
It is recommended that implementations of SolrSpellChecker should change over to the new SolrSpellChecker It is recommended that implementations of SolrSpellChecker should change over to the new SolrSpellChecker
methods using the new SpellingOptions class, but are not required to. While this change is methods using the new SpellingOptions class, but are not required to. While this change is
backward compatible, the trunk version of Solr has already dropped support for all but the SpellingOptions method. (gsingers) backward compatible, the trunk version of Solr has already dropped support for all but the SpellingOptions method. (gsingers)
* readercycle script was removed. (SOLR-2046) * readercycle script was removed. (SOLR-2046)
* In previous releases, sorting or evaluating function queries on * In previous releases, sorting or evaluating function queries on
fields that were "multiValued" (either by explicit declaration in fields that were "multiValued" (either by explicit declaration in
schema.xml or by implicit behavior because the "version" attribute on schema.xml or by implicit behavior because the "version" attribute on
the schema was less than 1.2) did not generally work, but it would the schema was less than 1.2) did not generally work, but it would
sometimes silently act as if it succeeded and order the docs sometimes silently act as if it succeeded and order the docs
arbitrarily. Solr will now fail on any attempt to sort, or apply a arbitrarily. Solr will now fail on any attempt to sort, or apply a
function to, multi-valued fields. function to, multi-valued fields.
* The DataImportHandler jars are no longer included in the solr * The DataImportHandler jars are no longer included in the solr
WAR and should be added in Solr's lib directory, or referenced WAR and should be added in Solr's lib directory, or referenced
@ -374,13 +376,13 @@ New Features
* SOLR-1379: Add RAMDirectoryFactory for non-persistent in memory index storage. * SOLR-1379: Add RAMDirectoryFactory for non-persistent in memory index storage.
(Alex Baranov via yonik) (Alex Baranov via yonik)
* SOLR-1857: Synced Solr analysis with Lucene 3.1. Added KeywordMarkerFilterFactory * SOLR-1857: Synced Solr analysis with Lucene 3.1. Added KeywordMarkerFilterFactory
and StemmerOverrideFilterFactory, which can be used to tune stemming algorithms. and StemmerOverrideFilterFactory, which can be used to tune stemming algorithms.
Added factories for Bulgarian, Czech, Hindi, Turkish, and Wikipedia analysis. Improved the Added factories for Bulgarian, Czech, Hindi, Turkish, and Wikipedia analysis. Improved the
performance of SnowballPorterFilterFactory. (rmuir) performance of SnowballPorterFilterFactory. (rmuir)
* SOLR-1657: Converted remaining TokenStreams to the Attributes-based API. All Solr * SOLR-1657: Converted remaining TokenStreams to the Attributes-based API. All Solr
TokenFilters now support custom Attributes, and some have improved performance: TokenFilters now support custom Attributes, and some have improved performance:
especially WordDelimiterFilter and CommonGramsFilter. (rmuir, cmale, uschindler) especially WordDelimiterFilter and CommonGramsFilter. (rmuir, cmale, uschindler)
* SOLR-1740: ShingleFilterFactory supports the "minShingleSize" and "tokenSeparator" * SOLR-1740: ShingleFilterFactory supports the "minShingleSize" and "tokenSeparator"
@ -389,10 +391,10 @@ New Features
* SOLR-744: ShingleFilterFactory supports the "outputUnigramsIfNoShingles" * SOLR-744: ShingleFilterFactory supports the "outputUnigramsIfNoShingles"
parameter, to output unigrams if the number of input tokens is fewer than parameter, to output unigrams if the number of input tokens is fewer than
minShingleSize, and no shingles can be generated. minShingleSize, and no shingles can be generated.
(Chris Harris via Steven Rowe) (Chris Harris via Steven Rowe)
* SOLR-1923: PhoneticFilterFactory now has support for the * SOLR-1923: PhoneticFilterFactory now has support for the
Caverphone algorithm. (rmuir) Caverphone algorithm. (rmuir)
* SOLR-1957: The VelocityResponseWriter contrib moved to core. * SOLR-1957: The VelocityResponseWriter contrib moved to core.
@ -460,7 +462,7 @@ New Features
(Ankul Garg, Jason Rutherglen, Shalin Shekhar Mangar, Grant Ingersoll, Robert Muir, ab) (Ankul Garg, Jason Rutherglen, Shalin Shekhar Mangar, Grant Ingersoll, Robert Muir, ab)
* SOLR-1568: Added "native" filtering support for PointType, GeohashField. Added LatLonType with filtering support too. See * SOLR-1568: Added "native" filtering support for PointType, GeohashField. Added LatLonType with filtering support too. See
http://wiki.apache.org/solr/SpatialSearch and the example. Refactored some items in Lucene spatial. http://wiki.apache.org/solr/SpatialSearch and the example. Refactored some items in Lucene spatial.
Removed SpatialTileField as the underlying CartesianTier is broken beyond repair and is going to be moved. (gsingers) Removed SpatialTileField as the underlying CartesianTier is broken beyond repair and is going to be moved. (gsingers)
* SOLR-2128: Full parameter substitution for function queries. * SOLR-2128: Full parameter substitution for function queries.
@ -515,7 +517,7 @@ Optimizations
Bug Fixes Bug Fixes
---------------------- ----------------------
* SOLR-1769: Solr 1.4 Replication - Repeater throwing NullPointerException (Jörgen Rydenius via noble) * SOLR-1769: Solr 1.4 Replication - Repeater throwing NullPointerException (Jörgen Rydenius via noble)
* SOLR-1432: Make the new ValueSource.getValues(context,reader) delegate * SOLR-1432: Make the new ValueSource.getValues(context,reader) delegate
to the original ValueSource.getValues(reader) so custom sources to the original ValueSource.getValues(reader) so custom sources
@ -538,8 +540,8 @@ Bug Fixes
* SOLR-1584: SolrJ - SolrQuery.setIncludeScore() incorrectly added * SOLR-1584: SolrJ - SolrQuery.setIncludeScore() incorrectly added
fl=score to the parameter list instead of appending score to the fl=score to the parameter list instead of appending score to the
existing field list. (yonik) existing field list. (yonik)
* SOLR-1580: Solr Configuration ignores 'mergeFactor' parameter, always * SOLR-1580: Solr Configuration ignores 'mergeFactor' parameter, always
uses Lucene default. (Lance Norskog via Mark Miller) uses Lucene default. (Lance Norskog via Mark Miller)
* SOLR-1593: ReverseWildcardFilter didn't work for surrogate pairs * SOLR-1593: ReverseWildcardFilter didn't work for surrogate pairs
@ -556,7 +558,7 @@ Bug Fixes
set when streaming updates, rather than using UTF-8 as the HTTP headers set when streaming updates, rather than using UTF-8 as the HTTP headers
indicated, leading to an encoding mismatch. (hossman, yonik) indicated, leading to an encoding mismatch. (hossman, yonik)
* SOLR-1587: A distributed search request with fl=score didn't match * SOLR-1587: A distributed search request with fl=score didn't match
the behavior of a non-distributed request since it only returned the behavior of a non-distributed request since it only returned
the id,score fields instead of all fields in addition to score. (yonik) the id,score fields instead of all fields in addition to score. (yonik)
@ -565,7 +567,7 @@ Bug Fixes
* SOLR-1615: Backslash escaping did not work in quoted strings * SOLR-1615: Backslash escaping did not work in quoted strings
for local param arguments. (Wojtek Piaseczny, yonik) for local param arguments. (Wojtek Piaseczny, yonik)
* SOLR-1628: log contains incorrect number of adds and deletes. * SOLR-1628: log contains incorrect number of adds and deletes.
(Thijs Vonk via yonik) (Thijs Vonk via yonik)
* SOLR-343: Date faceting now respects facet.mincount limiting * SOLR-343: Date faceting now respects facet.mincount limiting
@ -593,7 +595,7 @@ Bug Fixes
(never officially released) introduced another hanging bug due to (never officially released) introduced another hanging bug due to
connections not being released. connections not being released.
(Attila Babo, Erik Hetzner, Johannes Tuchscherer via yonik) (Attila Babo, Erik Hetzner, Johannes Tuchscherer via yonik)
* SOLR-1748, SOLR-1747, SOLR-1746, SOLR-1745, SOLR-1744: Streams and Readers * SOLR-1748, SOLR-1747, SOLR-1746, SOLR-1745, SOLR-1744: Streams and Readers
retrieved from ContentStreams are not closed in various places, resulting retrieved from ContentStreams are not closed in various places, resulting
in file descriptor leaks. in file descriptor leaks.
@ -602,7 +604,7 @@ Bug Fixes
* SOLR-1753: StatsComponent throws NPE when getting statistics for facets in distributed search * SOLR-1753: StatsComponent throws NPE when getting statistics for facets in distributed search
(Janne Majaranta via koji) (Janne Majaranta via koji)
* SOLR-1736: In the slave, if moving a file does not succeed, copy the file. (noble) * SOLR-1736: In the slave, if moving a file does not succeed, copy the file. (noble)
* SOLR-1579: Fixes to XML escaping in stats.jsp * SOLR-1579: Fixes to XML escaping in stats.jsp
(David Bowen and hossman) (David Bowen and hossman)
@ -656,7 +658,7 @@ Bug Fixes
* SOLR-2047: ReplicationHandler should accept bool type for enable flag. (koji) * SOLR-2047: ReplicationHandler should accept bool type for enable flag. (koji)
* SOLR-1630: Fix spell checking collation issue related to token positions (rmuir, gsingers) * SOLR-1630: Fix spell checking collation issue related to token positions (rmuir, gsingers)
* SOLR-2100: The replication handler backup command didn't save the commit * SOLR-2100: The replication handler backup command didn't save the commit
point and hence could fail when a newer commit caused the older commit point point and hence could fail when a newer commit caused the older commit point
@ -665,7 +667,7 @@ Bug Fixes
* SOLR-2114: Fixed parsing error in hsin function. The function signature has changed slightly. (gsingers) * SOLR-2114: Fixed parsing error in hsin function. The function signature has changed slightly. (gsingers)
* SOLR-2083: SpellCheckComponent misreports suggestions when distributed (James Dyer via gsingers) * SOLR-2083: SpellCheckComponent misreports suggestions when distributed (James Dyer via gsingers)
* SOLR-2111: Change exception handling in distributed faceting to work more * SOLR-2111: Change exception handling in distributed faceting to work more
like non-distributed faceting, change facet_counts/exception from a String like non-distributed faceting, change facet_counts/exception from a String
@ -689,9 +691,9 @@ Bug Fixes
* SOLR-2173: Suggester should always rebuild Lookup data if Lookup.load fails. (ab) * SOLR-2173: Suggester should always rebuild Lookup data if Lookup.load fails. (ab)
* SOLR-2081: BaseResponseWriter.isStreamingDocs causes * SOLR-2081: BaseResponseWriter.isStreamingDocs causes
SingleResponseWriter.end to be called 2x SingleResponseWriter.end to be called 2x
(Chris A. Mattmann via hossman) (Chris A. Mattmann via hossman)
* SOLR-2219: The init() method of every SolrRequestHandler was being * SOLR-2219: The init() method of every SolrRequestHandler was being
called twice. (ambikeshwar singh and hossman) called twice. (ambikeshwar singh and hossman)
@ -716,7 +718,7 @@ Bug Fixes
* SOLR-482: Provide more exception handling in CSVLoader (gsingers) * SOLR-482: Provide more exception handling in CSVLoader (gsingers)
* SOLR-1283: HTMLStripCharFilter sometimes threw a "Mark Invalid" exception. * SOLR-1283: HTMLStripCharFilter sometimes threw a "Mark Invalid" exception.
(Julien Coloos, hossman, yonik) (Julien Coloos, hossman, yonik)
* SOLR-2085: Improve SolrJ behavior when FacetComponent comes before * SOLR-2085: Improve SolrJ behavior when FacetComponent comes before
@ -743,21 +745,29 @@ Bug Fixes
* SOLR-2380: Distributed faceting could miss values when facet.sort=index * SOLR-2380: Distributed faceting could miss values when facet.sort=index
and when facet.offset was greater than 0. (yonik) and when facet.offset was greater than 0. (yonik)
* SOLR-1656: XIncludes and other HREFs in XML files loaded by ResourceLoader * SOLR-1656: XIncludes and other HREFs in XML files loaded by ResourceLoader
are fixed to be resolved using the URI standard (RFC 2396). The system are fixed to be resolved using the URI standard (RFC 2396). The system
identifier is no longer a plain filename with path; it is initialized identifier is no longer a plain filename with path; it is initialized
using a custom URI scheme "solrres:". This scheme is resolved using an using a custom URI scheme "solrres:". This scheme is resolved using an
EntityResolver that utilizes ResourceLoader EntityResolver that utilizes ResourceLoader
(org.apache.solr.common.util.SystemIdResolver). This makes all relative (org.apache.solr.common.util.SystemIdResolver). This makes all relative
paths in Solr's config files behave as expected. This change paths in Solr's config files behave as expected. This change
introduces some backwards breaks in the API: Some config classes introduces some backwards breaks in the API: Some config classes
(Config, SolrConfig, IndexSchema) were changed to take (Config, SolrConfig, IndexSchema) were changed to take
org.xml.sax.InputSource instead of InputStream. There may also be some org.xml.sax.InputSource instead of InputStream. There may also be some
backwards breaks in existing config files; it is recommended to check backwards breaks in existing config files; it is recommended to check
your config files / XSLTs and replace all XIncludes/HREFs that were your config files / XSLTs and replace all XIncludes/HREFs that were
hacked to use absolute paths to use relative ones. (uschindler) hacked to use absolute paths to use relative ones. (uschindler)
* SOLR-309: Fix FieldType so setting an analyzer on a FieldType that
doesn't expect it will generate an error. Practically speaking this
means that Solr will now correctly generate an error on
initialization if the schema.xml contains an analyzer configuration
for a fieldType that does not use TextField. (hossman)
* SOLR-2192: StreamingUpdateSolrServer.blockUntilFinished was not
thread safe and could throw an exception. (yonik)
Other Changes Other Changes
---------------------- ----------------------