From 86d8937f3af4b6a9a7764fb6737c6bcf769050c6 Mon Sep 17 00:00:00 2001 From: Uwe Schindler Date: Wed, 1 Dec 2010 13:37:53 +0000 Subject: [PATCH] Clean up changes my merging in 2.9/3.0 fixes git-svn-id: https://svn.apache.org/repos/asf/lucene/dev/trunk@1041007 13f79535-47bb-0310-9956-ffa450edef68 --- lucene/CHANGES.txt | 229 +++++++++++++++++++++++++------------ lucene/contrib/CHANGES.txt | 50 +++++--- 2 files changed, 188 insertions(+), 91 deletions(-) diff --git a/lucene/CHANGES.txt b/lucene/CHANGES.txt index 0b787a0fa5d..c4ca3d6aead 100644 --- a/lucene/CHANGES.txt +++ b/lucene/CHANGES.txt @@ -330,10 +330,6 @@ Documentation * LUCENE-2579: Fix oal.search's package.html description of abstract methods. (Santiago M. Mola via Mike McCandless) -* LUCENE-2239: Documented limitations in NIOFSDirectory and MMapDirectory due to - Java NIO behavior when a Thread is interrupted while blocking on IO. - (Simon Willnauer, Robert Muir) - Bug fixes * LUCENE-2633: PackedInts Packed32 and Packed64 did not support internal @@ -434,10 +430,6 @@ Changes in runtime behavior usage, allowing applications to accidentally open two writers on the same directory. (Mike McCandless) -* LUCENE-2689: NativeFSLockFactory no longer attempts to acquire a - test lock just before the real lock is acquired. (Surinder Pal - Singh Bindra via Mike McCandless) - * LUCENE-2701: maxMergeMB and maxMergeDocs constraints set on LogMergePolicy now affect optimize() as well (as opposed to only regular merges). This means that you can run optimize() and too large segments won't be merged. (Shai Erera) @@ -537,12 +529,6 @@ API Changes Bug fixes -* LUCENE-2216: OpenBitSet.hashCode returned different hash codes for - sets that only differed by trailing zeros. (Dawid Weiss, yonik) - -* LUCENE-2235: Implement missing PerFieldAnalyzerWrapper.getOffsetGap(). - (Javier Godoy via Uwe Schindler) - * LUCENE-2249: ParallelMultiSearcher should shut down thread pool on close. (Martin Traverso via Uwe Schindler) @@ -554,10 +540,6 @@ Bug fixes IndexWriter/IndexReader to Directory, and it no longer leaks memory. (Earwin Burrfoot via Mike McCandless) -* LUCENE-2365: IndexWriter.newestSegment (used normally for testing) - is fixed to return null if there are no segments. (Karthick - Sankarachary via Mike McCandless) - * LUCENE-2074: Reduce buffer size of lexer back to default on reset. (Ruben Laguna, Shai Erera via Uwe Schindler) @@ -565,42 +547,10 @@ Bug fixes a prior (corrupt) index missing its segments_N file. (Mike McCandless) -* LUCENE-2142 (correct fix): FieldCacheImpl.getStringIndex no longer - throws an exception when term count exceeds doc count. - (Mike McCandless, Uwe Schindler) - -* LUCENE-2513: when opening writable IndexReader on a not-current - commit, do not overwrite "future" commits. (Mike McCandless) - -* LUCENE-2533: fix FileSwitchDirectory.listAll to not return dups when - primary & secondary dirs share the same underlying directory. - (Michael McCandless) - * LUCENE-2534: fix over-sharing bug in MultiTermsEnum.docs/AndPositionsEnum. (Robert Muir, Mike McCandless) -* LUCENE-2536: IndexWriter.rollback was failing to properly rollback - buffered deletions against segments that were flushed (Mark Harwood - via Mike McCandless) - -* LUCENE-2541: Fixed NumericRangeQuery that returned incorrect results - with endpoints near Long.MIN_VALUE and Long.MAX_VALUE: - NumericUtils.splitRange() overflowed, if - - the range contained a LOWER bound - that was greater than (Long.MAX_VALUE - (1L << precisionStep)) - - the range contained an UPPER bound - that was less than (Long.MIN_VALUE + (1L << precisionStep)) - With standard precision steps around 4, this had no effect on - most queries, only those that met the above conditions. - Queries with large precision steps failed more easy. Queries with - precision step >=64 were not affected. Also 32 bit data types int - and float were not affected. - (Yonik Seeley, Uwe Schindler) - -* LUCENE-2549: Fix TimeLimitingCollector#TimeExceededException to record - the absolute docid. (Uwe Schindler) - * LUCENE-2458: QueryParser no longer automatically forms phrase queries, assuming whitespace tokenization. Previously all CJK queries, for example, would be turned into phrase queries. The old behavior is preserved with @@ -614,33 +564,13 @@ Bug fixes * LUCENE-2580: MultiPhraseQuery throws AIOOBE if number of positions exceeds number of terms at one position (Jayendra Patil via Mike McCandless) -* LUCENE-2593: Fixed certain rare cases where a disk full could lead - to a corrupted index (Robert Muir, Mike McCandless) - * LUCENE-2617: Optional clauses of a BooleanQuery were not factored into coord if the scorer for that segment returned null. This can cause the same document to score to differently depending on what segment it resides in. (yonik) -* LUCENE-2627: Fixed bug in MMapDirectory chunking when a file is an - exact multiple of the chunk size. (Robert Muir) - * LUCENE-2272: Fix explain in PayloadNearQuery and also fix scoring issue (Peter Keegan via Grant Ingersoll) -* LUCENE-2634: isCurrent on an NRT reader was failing to return false - if the writer had just committed (Nikolay Zamosenchuk via Mike McCandless) - -* LUCENE-2650: Added extra safety to MMapIndexInput clones to prevent accessing - an unmapped buffer if the input is closed (Mike McCandless, Uwe Schindler, Robert Muir) - -* LUCENE-2658: Exceptions while processing term vectors enabled for multiple - fields could lead to invalid ArrayIndexOutOfBoundsExceptions. - (Robert Muir, Mike McCandless) - -* LUCENE-2744: CheckIndex was stating total number of fields, - not the number that have norms enabled, on the "test: field - norms..." output. (Mark Kristensson via Mike McCandless) - New features * LUCENE-2128: Parallelized fetching document frequencies during weight @@ -792,12 +722,6 @@ Optimizations (getStrings, getStringIndex), consume quite a bit less RAM in most cases. (Mike McCandless) -* LUCENE-2098: Improve the performance of BaseCharFilter, especially for - large documents. (Robin Wojciki, Koji Sekiguchi, Robert Muir) - -* LUCENE-2556: Improve memory usage after cloning (Char)TermAttribute. - (Adriano Crestani via Uwe Schindler) - * LUCENE-2719: Improved TermsHashPerField's sorting to use a better quick sort algorithm that dereferences the privot element not on every compare call. Also replaced lots of sorting code in Lucene @@ -873,6 +797,159 @@ Test Cases as Eclipse and IntelliJ. (Paolo Castagna, Steven Rowe via Robert Muir) +================== Release 2.9.4 / 3.0.3 2010-12-03 ==================== + +Changes in runtime behavior + +* LUCENE-2689: NativeFSLockFactory no longer attempts to acquire a + test lock just before the real lock is acquired. (Surinder Pal + Singh Bindra via Mike McCandless) + +* LUCENE-2762: Fixed bug in IndexWriter causing it to hold open file + handles against deleted files when compound-file was enabled (the + default) and readers are pooled. As a result of this the peak + worst-case free disk space required during optimize is now 3X the + index size, when compound file is enabled (else 2X). (Mike + McCandless) + +* LUCENE-2773: LogMergePolicy accepts a double noCFSRatio (default = + 0.1), which means any time a merged segment is greater than 10% of + the index size, it will be left in non-compound format even if + compound format is on. This change was made to reduce peak + transient disk usage during optimize which increased due to + LUCENE-2762. (Mike McCandless) + +Bug fixes + +* LUCENE-2142 (correct fix): FieldCacheImpl.getStringIndex no longer + throws an exception when term count exceeds doc count. + (Mike McCandless, Uwe Schindler) + +* LUCENE-2513: when opening writable IndexReader on a not-current + commit, do not overwrite "future" commits. (Mike McCandless) + +* LUCENE-2536: IndexWriter.rollback was failing to properly rollback + buffered deletions against segments that were flushed (Mark Harwood + via Mike McCandless) + +* LUCENE-2541: Fixed NumericRangeQuery that returned incorrect results + with endpoints near Long.MIN_VALUE and Long.MAX_VALUE: + NumericUtils.splitRange() overflowed, if + - the range contained a LOWER bound + that was greater than (Long.MAX_VALUE - (1L << precisionStep)) + - the range contained an UPPER bound + that was less than (Long.MIN_VALUE + (1L << precisionStep)) + With standard precision steps around 4, this had no effect on + most queries, only those that met the above conditions. + Queries with large precision steps failed more easy. Queries with + precision step >=64 were not affected. Also 32 bit data types int + and float were not affected. + (Yonik Seeley, Uwe Schindler) + +* LUCENE-2593: Fixed certain rare cases where a disk full could lead + to a corrupted index (Robert Muir, Mike McCandless) + +* LUCENE-2620: Fixed a bug in WildcardQuery where too many asterisks + would result in unbearably slow performance. (Nick Barkas via Robert Muir) + +* LUCENE-2627: Fixed bug in MMapDirectory chunking when a file is an + exact multiple of the chunk size. (Robert Muir) + +* LUCENE-2634: isCurrent on an NRT reader was failing to return false + if the writer had just committed (Nikolay Zamosenchuk via Mike McCandless) + +* LUCENE-2650: Added extra safety to MMapIndexInput clones to prevent accessing + an unmapped buffer if the input is closed (Mike McCandless, Uwe Schindler, Robert Muir) + +* LUCENE-2384: Reset zzBuffer in StandardTokenizerImpl when lexer is reset. + (Ruben Laguna via Uwe Schindler, sub-issue of LUCENE-2074) + +* LUCENE-2658: Exceptions while processing term vectors enabled for multiple + fields could lead to invalid ArrayIndexOutOfBoundsExceptions. + (Robert Muir, Mike McCandless) + +* LUCENE-2235: Implement missing PerFieldAnalyzerWrapper.getOffsetGap(). + (Javier Godoy via Uwe Schindler) + +* LUCENE-2328: Fixed memory leak in how IndexWriter/Reader tracked + already sync'd files. (Earwin Burrfoot via Mike McCandless) + +* LUCENE-2549: Fix TimeLimitingCollector#TimeExceededException to record + the absolute docid. (Uwe Schindler) + +* LUCENE-2533: fix FileSwitchDirectory.listAll to not return dups when + primary & secondary dirs share the same underlying directory. + (Michael McCandless) + +* LUCENE-2365: IndexWriter.newestSegment (used normally for testing) + is fixed to return null if there are no segments. (Karthick + Sankarachary via Mike McCandless) + +* LUCENE-2730: Fix two rare deadlock cases in IndexWriter (Mike McCandless) + +* LUCENE-2744: CheckIndex was stating total number of fields, + not the number that have norms enabled, on the "test: field + norms..." output. (Mark Kristensson via Mike McCandless) + +* LUCENE-2759: Fixed two near-real-time cases where doc store files + may be opened for read even though they are still open for write. + (Mike McCandless) + +* LUCENE-2618: Fix rare thread safety issue whereby + IndexWriter.optimize could sometimes return even though the index + wasn't fully optimized (Mike McCandless) + +* LUCENE-2767: Fix thread safety issue in addIndexes(IndexReader[]) + that could potentially result in index corruption. (Mike + McCandless) + +* LUCENE-2762: Fixed bug in IndexWriter causing it to hold open file + handles against deleted files when compound-file was enabled (the + default) and readers are pooled. As a result of this the peak + worst-case free disk space required during optimize is now 3X the + index size, when compound file is enabled (else 2X). (Mike + McCandless) + +* LUCENE-2216: OpenBitSet.hashCode returned different hash codes for + sets that only differed by trailing zeros. (Dawid Weiss, yonik) + +* LUCENE-2782: Fix rare potential thread hazard with + IndexWriter.commit (Mike McCandless) + +API Changes + +* LUCENE-2773: LogMergePolicy accepts a double noCFSRatio (default = + 0.1), which means any time a merged segment is greater than 10% of + the index size, it will be left in non-compound format even if + compound format is on. This change was made to reduce peak + transient disk usage during optimize which increased due to + LUCENE-2762. (Mike McCandless) + +Optimizations + +* LUCENE-2556: Improve memory usage after cloning TermAttribute. + (Adriano Crestani via Uwe Schindler) + +* LUCENE-2098: Improve the performance of BaseCharFilter, especially for + large documents. (Robin Wojciki, Koji Sekiguchi, Robert Muir) + +New features + +* LUCENE-2675 (2.9.4 only): Add support for Lucene 3.0 stored field files + also in 2.9. The file format did not change, only the version number was + upgraded to mark segments that have no compression. FieldsWriter still only + writes 2.9 segments as they could contain compressed fields. This cross-version + index format compatibility is provided here solely because Lucene 2.9 and 3.0 + have the same bugfix level, features, and the same index format with this slight + compression difference. In general, Lucene does not support reading newer + indexes with older library versions. (Uwe Schindler) + +Documentation + +* LUCENE-2239: Documented limitations in NIOFSDirectory and MMapDirectory due to + Java NIO behavior when a Thread is interrupted while blocking on IO. + (Simon Willnauer, Robert Muir) + ================== Release 2.9.3 / 3.0.2 2010-06-18 ==================== Changes in backwards compatibility policy diff --git a/lucene/contrib/CHANGES.txt b/lucene/contrib/CHANGES.txt index aa0dbf0e78f..2a3c9ee7161 100644 --- a/lucene/contrib/CHANGES.txt +++ b/lucene/contrib/CHANGES.txt @@ -98,15 +98,6 @@ Bug fixes Additionally, for Version > 3.0, the Snowball stopword lists are used by default. (Robert Muir, Uwe Schindler, Simon Willnauer) - * LUCENE-2278: FastVectorHighlighter: Highlighted term is out of alignment - in multi-valued NOT_ANALYZED field. (Koji Sekiguchi) - - * LUCENE-2284: MatchAllDocsQueryNode toString() created an invalid XML tag. - (Frank Wesemann via Robert Muir) - - * LUCENE-2277: QueryNodeImpl threw ConcurrentModificationException on - add(List). (Frank Wesemann via Robert Muir) - * LUCENE-2184: Fixed bug with handling best fit value when the proper best fit value is not an indexed field. Note, this change affects the APIs. (Grant Ingersoll) @@ -117,9 +108,6 @@ Bug fixes For matchVersion >= 3.1 the filter also no longer lowercases. ThaiAnalyzer will use a separate LowerCaseFilter instead. (Uwe Schindler, Robert Muir) -* LUCENE-2524: FastVectorHighlighter: use mod for getting colored tag. - (Koji Sekiguchi) - * LUCENE-2615: Fix DirectIOLinuxDirectory to not assign bogus permissions to newly created files, and to not silently hardwire buffer size to 1 MB. (Mark Miller, Robert Muir, Mike McCandless) @@ -132,9 +120,6 @@ Bug fixes always the case. If the dictionary is unavailable, the filter will now throw UnsupportedOperationException in the constructor. (Robert Muir) -* LUCENE-2616: FastVectorHighlighter: out of alignment when the first value is - empty in multiValued field (Koji Sekiguchi) - * LUCENE-589: Fix contrib/demo for international documents. (Curtis d'Entremont via Robert Muir) @@ -323,6 +308,41 @@ Other * LUCENE-2415: Use reflection instead of a shim class to access Jakarta Regex prefix. (Uwe Schindler) +================== Release 2.9.4 / 3.0.3 2010-12-03 ==================== + +Bug Fixes + + * LUCENE-2277: QueryNodeImpl threw ConcurrentModificationException on + add(List). (Frank Wesemann via Robert Muir) + + * LUCENE-2284: MatchAllDocsQueryNode toString() created an invalid XML tag. + (Frank Wesemann via Robert Muir) + + * LUCENE-2278: FastVectorHighlighter: Highlighted term is out of alignment + in multi-valued NOT_ANALYZED field. (Koji Sekiguchi) + + * LUCENE-2524: FastVectorHighlighter: use mod for getting colored tag. + (Koji Sekiguchi) + + * LUCENE-2616: FastVectorHighlighter: out of alignment when the first value is + empty in multiValued field (Koji Sekiguchi) + + * LUCENE-2731, LUCENE-2732: Fix (charset) problems in XML loading in + HyphenationCompoundWordTokenFilter (partial bugfix-only in 2.9 and 3.0, + full fix will be in later 3.1). + (Uwe Schinder) + +Documentation + + * LUCENE-2055: Add documentation noting that the Dutch and French stemmers + in contrib/analyzers do not implement the Snowball algorithm correctly, + and recommend to use the equivalents in contrib/snowball if possible. + (Robert Muir, Uwe Schindler, Simon Willnauer) + + * LUCENE-2653: Add documentation noting that ThaiWordFilter will not work + as expected on all JRE's. For example, on an IBM JRE, it does nothing. + (Robert Muir) + ================== Release 2.9.3 / 3.0.2 2010-06-18 ==================== No changes.