From 9fdc41f0f8250f275876630dc5731f9362b049a9 Mon Sep 17 00:00:00 2001 From: Grant Ingersoll Date: Wed, 30 Mar 2011 19:37:38 +0000 Subject: [PATCH] sync CHANGEs for 3.1 git-svn-id: https://svn.apache.org/repos/asf/lucene/dev/trunk@1087056 13f79535-47bb-0310-9956-ffa450edef68 --- lucene/CHANGES.txt | 179 +++++++++++++++++++------------------ lucene/contrib/CHANGES.txt | 85 +++++++++--------- solr/CHANGES.txt | 96 +++++++++++--------- 3 files changed, 191 insertions(+), 169 deletions(-) diff --git a/lucene/CHANGES.txt b/lucene/CHANGES.txt index 0221b958c3f..0ec3f79127e 100644 --- a/lucene/CHANGES.txt +++ b/lucene/CHANGES.txt @@ -393,7 +393,7 @@ Optimizations * LUCENE-2990: ArrayUtil/CollectionUtil.*Sort() methods now exit early on empty or one-element lists/arrays. (Uwe Schindler) -======================= Lucene 3.1 (not yet released) ======================= +======================= Lucene 3.1.0 ======================= Changes in backwards compatibility policy @@ -409,7 +409,7 @@ Changes in backwards compatibility policy * LUCENE-2190: Removed deprecated customScore() and customExplain() methods from experimental CustomScoreQuery. (Uwe Schindler) - + * LUCENE-2286: Enabled DefaultSimilarity.setDiscountOverlaps by default. This means that terms with a position increment gap of zero do not affect the norms calculation by default. (Robert Muir) @@ -447,10 +447,10 @@ Changes in backwards compatibility policy actual file's length if the file exists, and throws FileNotFoundException otherwise. Returning length=0 for a non-existent file is no longer allowed. If you relied on that, make sure to catch the exception. (Shai Erera) - + * LUCENE-2386: IndexWriter no longer performs an empty commit upon new index creation. Previously, if you passed an empty Directory and set OpenMode to - CREATE*, IndexWriter would make a first empty commit. If you need that + CREATE*, IndexWriter would make a first empty commit. If you need that behavior you can call writer.commit()/close() immediately after you create it. (Shai Erera, Mike McCandless) @@ -466,10 +466,10 @@ Changes in backwards compatibility policy values in multi-valued field has been changed for some cases in index. If you index empty fields and uses positions/offsets information on that fields, reindex is recommended. (David Smiley, Koji Sekiguchi) - + * LUCENE-2804: Directory.setLockFactory new declares throwing an IOException. (Shai Erera, Robert Muir) - + * LUCENE-2837: Added deprecations noting that in 4.0, Searcher and Searchable are collapsed into IndexSearcher; contrib/remote and MultiSearcher have been removed. (Mike McCandless) @@ -496,7 +496,7 @@ Changes in runtime behavior * LUCENE-2179: CharArraySet.clear() is now functional. (Robert Muir, Uwe Schindler) -* LUCENE-2455: IndexWriter.addIndexes no longer optimizes the target index +* LUCENE-2455: IndexWriter.addIndexes no longer optimizes the target index before it adds the new ones. Also, the existing segments are not merged and so the index will not end up with a single segment (unless it was empty before). In addition, addIndexesNoOptimize was renamed to addIndexes and no longer @@ -515,9 +515,9 @@ Changes in runtime behavior usage, allowing applications to accidentally open two writers on the same directory. (Mike McCandless) -* LUCENE-2701: maxMergeMBForOptimize and maxMergeDocs constraints set on - LogMergePolicy now affect optimize() as well (as opposed to only regular - merges). This means that you can run optimize() and too large segments won't +* LUCENE-2701: maxMergeMBForOptimize and maxMergeDocs constraints set on + LogMergePolicy now affect optimize() as well (as opposed to only regular + merges). This means that you can run optimize() and too large segments won't be merged. (Shai Erera) * LUCENE-2753: IndexReader and DirectoryReader .listCommits() now return a List, @@ -527,9 +527,9 @@ Changes in runtime behavior the IndexSearcher search methods that take an int nDocs will now throw IllegalArgumentException if nDocs is 0. Instead, you should use the newly added TotalHitCountCollector. (Mike McCandless) - -* LUCENE-2790: LogMergePolicy.useCompoundFile's logic now factors in noCFSRatio - to determine whether the passed in segment should be compound. + +* LUCENE-2790: LogMergePolicy.useCompoundFile's logic now factors in noCFSRatio + to determine whether the passed in segment should be compound. (Shai Erera, Earwin Burrfoot) * LUCENE-2805: IndexWriter now increments the index version on every change to @@ -549,7 +549,7 @@ Changes in runtime behavior * LUCENE-2010: Segments with 100% deleted documents are now removed on IndexReader or IndexWriter commit. (Uwe Schindler, Mike McCandless) - + * LUCENE-2960: Allow some changes to IndexWriterConfig to take effect "live" (after an IW is instantiated), via IndexWriter.getConfig().setXXX(...) (Shay Banon, Mike McCandless) @@ -567,7 +567,7 @@ API Changes * LUCENE-2103: NoLockFactory should have a private constructor; until Lucene 4.0 the default one will be deprecated. - (Shai Erera via Uwe Schindler) + (Shai Erera via Uwe Schindler) * LUCENE-2177: Deprecate the Field ctors that take byte[] and Store. Since the removal of compressed fields, Store can only be YES, so @@ -587,30 +587,30 @@ API Changes files are no longer open by IndexReaders. (luocanrao via Mike McCandless) -* LUCENE-2282: IndexFileNames is exposed as a public class allowing for easier - use by external code. In addition it offers a matchExtension method which +* LUCENE-2282: IndexFileNames is exposed as a public class allowing for easier + use by external code. In addition it offers a matchExtension method which callers can use to query whether a certain file matches a certain extension. - (Shai Erera via Mike McCandless) + (Shai Erera via Mike McCandless) * LUCENE-124: Add a TopTermsBoostOnlyBooleanQueryRewrite to MultiTermQuery. This rewrite method is similar to TopTermsScoringBooleanQueryRewrite, but - only scores terms by their boost values. For example, this can be used - with FuzzyQuery to ensure that exact matches are always scored higher, + only scores terms by their boost values. For example, this can be used + with FuzzyQuery to ensure that exact matches are always scored higher, because only the boost will be used in scoring. (Robert Muir) - -* LUCENE-2015: Add a static method foldToASCII to ASCIIFoldingFilter to + +* LUCENE-2015: Add a static method foldToASCII to ASCIIFoldingFilter to expose its folding logic. (Cédrik Lime via Robert Muir) - -* LUCENE-2294: IndexWriter constructors have been deprecated in favor of a + +* LUCENE-2294: IndexWriter constructors have been deprecated in favor of a single ctor which accepts IndexWriterConfig and a Directory. You can set all - the parameters related to IndexWriter on IndexWriterConfig. The different - setter/getter methods were deprecated as well. One should call + the parameters related to IndexWriter on IndexWriterConfig. The different + setter/getter methods were deprecated as well. One should call writer.getConfig().getXYZ() to query for a parameter XYZ. - Additionally, the setter/getter related to MergePolicy were deprecated as + Additionally, the setter/getter related to MergePolicy were deprecated as well. One should interact with the MergePolicy directly. (Shai Erera via Mike McCandless) - -* LUCENE-2320: IndexWriter's MergePolicy configuration was moved to + +* LUCENE-2320: IndexWriter's MergePolicy configuration was moved to IndexWriterConfig and the respective methods on IndexWriter were deprecated. (Shai Erera via Mike McCandless) @@ -634,14 +634,14 @@ API Changes * LUCENE-2402: IndexWriter.deleteUnusedFiles now deletes unreferenced commit points too. If you use an IndexDeletionPolicy which holds onto index commits (such as SnapshotDeletionPolicy), you can call this method to remove those - commit points when they are not needed anymore (instead of waiting for the + commit points when they are not needed anymore (instead of waiting for the next commit). (Shai Erera) - + * LUCENE-2481: SnapshotDeletionPolicy.snapshot() and release() were replaced with equivalent ones that take a String (id) as argument. You can pass - whatever ID you want, as long as you use the same one when calling both. + whatever ID you want, as long as you use the same one when calling both. (Shai Erera) - + * LUCENE-2356: Add IndexWriterConfig.set/getReaderTermIndexDivisor, to set what IndexWriter passes for termsIndexDivisor to the readers it opens internally when apply deletions or creating a near-real-time @@ -651,7 +651,7 @@ API Changes in common/standard/ now implement the Word Break rules from the Unicode 6.0.0 Text Segmentation algorithm (UAX#29), covering the full range of Unicode code points, including values from U+FFFF to U+10FFFF - + ClassicTokenizer/Analyzer retains the old (pre-Lucene 3.1) StandardTokenizer/ Analyzer implementation and behavior. Only the Unicode Basic Multilingual Plane (code points from U+0000 to U+FFFF) is covered. @@ -659,16 +659,16 @@ API Changes UAX29URLEmailTokenizer tokenizes URLs and E-mail addresses according to the relevant RFCs, in addition to implementing the UAX#29 Word Break rules. (Steven Rowe, Robert Muir, Uwe Schindler) - + * LUCENE-2778: RAMDirectory now exposes newRAMFile() which allows to override and return a different RAMFile implementation. (Shai Erera) - + * LUCENE-2785: Added TotalHitCountCollector whose sole purpose is to count the number of hits matching the query. (Mike McCandless) -* LUCENE-2846: Deprecated IndexReader.setNorm(int, String, float). This method - is only syntactic sugar for setNorm(int, String, byte), but using the global - Similarity.getDefault().encodeNormValue(). Use the byte-based method instead +* LUCENE-2846: Deprecated IndexReader.setNorm(int, String, float). This method + is only syntactic sugar for setNorm(int, String, byte), but using the global + Similarity.getDefault().encodeNormValue(). Use the byte-based method instead to ensure that the norm is encoded with your Similarity. (Robert Muir, Mike McCandless) @@ -689,6 +689,9 @@ API Changes for AttributeImpls, but can still be provided (if needed). (Uwe Schindler) +* LUCENE-2691: Deprecate IndexWriter.getReader in favor of + IndexReader.open(IndexWriter) (Grant Ingersoll, Mike McCandless) + * LUCENE-2876: Deprecated Scorer.getSimilarity(). If your Scorer uses a Similarity, it should keep it itself. Fixed Scorers to pass their parent Weight, so that Scorer.visitSubScorers (LUCENE-2590) will work correctly. @@ -700,7 +703,7 @@ API Changes expert use cases can handle seeing deleted documents returned. The deletes remain buffered so that the next time you open an NRT reader and pass true, all deletes will be a applied. (Mike McCandless) - + * LUCENE-1253: LengthFilter (and Solr's KeepWordTokenFilter) now require up front specification of enablePositionIncrement. Together with StopFilter they have a common base class (FilteringTokenFilter) that handles @@ -711,7 +714,7 @@ Bug fixes * LUCENE-2249: ParallelMultiSearcher should shut down thread pool on close. (Martin Traverso via Uwe Schindler) - + * LUCENE-2273: FieldCacheImpl.getCacheEntries() used WeakHashMap incorrectly and lead to ConcurrentModificationException. (Uwe Schindler, Robert Muir) @@ -722,7 +725,7 @@ Bug fixes * LUCENE-2074: Reduce buffer size of lexer back to default on reset. (Ruben Laguna, Shai Erera via Uwe Schindler) - + * LUCENE-2496: Don't throw NPE if IndexWriter is opened with CREATE on a prior (corrupt) index missing its segments_N file. (Mike McCandless) @@ -731,10 +734,10 @@ Bug fixes assuming whitespace tokenization. Previously all CJK queries, for example, would be turned into phrase queries. The old behavior is preserved with the matchVersion parameter for previous versions. Additionally, you can - explicitly enable the old behavior with setAutoGeneratePhraseQueries(true) + explicitly enable the old behavior with setAutoGeneratePhraseQueries(true) (Robert Muir) - -* LUCENE-2537: FSDirectory.copy() implementation was unsafe and could result in + +* LUCENE-2537: FSDirectory.copy() implementation was unsafe and could result in OOM if a large file was copied. (Shai Erera) * LUCENE-2580: MultiPhraseQuery throws AIOOBE if number of positions @@ -752,14 +755,14 @@ Bug fixes * LUCENE-2802: NRT DirectoryReader returned incorrect values from getVersion, isOptimized, getCommitUserData, getIndexCommit and isCurrent due - to a mutable reference to the IndexWriters SegmentInfos. + to a mutable reference to the IndexWriters SegmentInfos. (Simon Willnauer, Earwin Burrfoot) * LUCENE-2852: Fixed corner case in RAMInputStream that would hit a false EOF after seeking to EOF then seeking back to same block you were just in and then calling readBytes (Robert Muir, Mike McCandless) -* LUCENE-2860: Fixed SegmentInfo.sizeInBytes to factor includeDocStores when it +* LUCENE-2860: Fixed SegmentInfo.sizeInBytes to factor includeDocStores when it decides whether to return the cached computed size or not. (Shai Erera) * LUCENE-2584: SegmentInfo.files() could hit ConcurrentModificationException if @@ -772,7 +775,7 @@ Bug fixes internally, it now calls Similarity.idfExplain(Collection, IndexSearcher). (Robert Muir) -* LUCENE-2693: RAM used by IndexWriter was slightly incorrectly computed. +* LUCENE-2693: RAM used by IndexWriter was slightly incorrectly computed. (Jason Rutherglen via Shai Erera) * LUCENE-1846: DateTools now uses the US locale everywhere, so DateTools.round() @@ -788,6 +791,9 @@ Bug fixes been rounded down to 0 instead of being rounded up to the smallest positive number. (yonik) +* LUCENE-2936: PhraseQuery score explanations were not correctly + identifying matches vs non-matches. (hossman) + * LUCENE-2975: A hotspot bug corrupts IndexInput#readVInt()/readVLong() if the underlying readByte() is inlined (which happens e.g. in MMapDirectory). The loop was unwinded which makes the hotspot bug disappear. @@ -796,30 +802,30 @@ Bug fixes New features * LUCENE-2128: Parallelized fetching document frequencies during weight - creation. (Israel Tsadok, Simon Willnauer via Uwe Schindler) + creation. (Israel Tsadok, Simon Willnauer via Uwe Schindler) * LUCENE-2069: Added Unicode 4 support to CharArraySet. Due to the switch to Java 5, supplementary characters are now lowercased correctly if the set is created as case insensitive. - CharArraySet now requires a Version argument to preserve - backwards compatibility. If Version < 3.1 is passed to the constructor, + CharArraySet now requires a Version argument to preserve + backwards compatibility. If Version < 3.1 is passed to the constructor, CharArraySet yields the old behavior. (Simon Willnauer) - + * LUCENE-2069: Added Unicode 4 support to LowerCaseFilter. Due to the switch to Java 5, supplementary characters are now lowercased correctly. - LowerCaseFilter now requires a Version argument to preserve - backwards compatibility. If Version < 3.1 is passed to the constructor, - LowerCaseFilter yields the old behavior. (Simon Willnauer, Robert Muir) + LowerCaseFilter now requires a Version argument to preserve + backwards compatibility. If Version < 3.1 is passed to the constructor, + LowerCaseFilter yields the old behavior. (Simon Willnauer, Robert Muir) * LUCENE-2034: Added ReusableAnalyzerBase, an abstract subclass of Analyzer that makes it easier to reuse TokenStreams correctly. This issue also added StopwordAnalyzerBase, which improves consistency of all Analyzers that use - stopwords, and implement many analyzers in contrib with it. + stopwords, and implement many analyzers in contrib with it. (Simon Willnauer via Robert Muir) - + * LUCENE-2198, LUCENE-2901: Support protected words in stemming TokenFilters using a new KeywordAttribute. (Simon Willnauer, Drew Farris via Uwe Schindler) - + * LUCENE-2183, LUCENE-2240, LUCENE-2241: Added Unicode 4 support to CharTokenizer and its subclasses. CharTokenizer now has new int-API which is conditionally preferred to the old char-API depending @@ -828,8 +834,8 @@ New features * LUCENE-2247: Added a CharArrayMap for performance improvements in some stemmers and synonym filters. (Uwe Schindler) - -* LUCENE-2320: Added SetOnce which wraps an object and allows it to be set + +* LUCENE-2320: Added SetOnce which wraps an object and allows it to be set exactly once. (Shai Erera via Mike McCandless) * LUCENE-2314: Added AttributeSource.copyTo(AttributeSource) that @@ -856,19 +862,19 @@ New features Directory.copyTo, and use nio's FileChannel.transferTo when copying files between FSDirectory instances. (Earwin Burrfoot via Mike McCandless). - + * LUCENE-2074: Make StandardTokenizer fit for Unicode 4.0, if the matchVersion parameter is Version.LUCENE_31. (Uwe Schindler) * LUCENE-2385: Moved NoDeletionPolicy from benchmark to core. NoDeletionPolicy can be used to prevent commits from ever getting deleted from the index. (Shai Erera) - -* LUCENE-1585: IndexWriter now accepts a PayloadProcessorProvider which can - return a DirPayloadProcessor for a given Directory, which returns a - PayloadProcessor for a given Term. The PayloadProcessor will be used to + +* LUCENE-1585: IndexWriter now accepts a PayloadProcessorProvider which can + return a DirPayloadProcessor for a given Directory, which returns a + PayloadProcessor for a given Term. The PayloadProcessor will be used to process the payloads of the segments as they are merged (e.g. if one wants to - rewrite payloads of external indexes as they are added, or of local ones). + rewrite payloads of external indexes as they are added, or of local ones). (Shai Erera, Michael Busch, Mike McCandless) * LUCENE-2440: Add support for custom ExecutorService in @@ -881,7 +887,7 @@ New features * LUCENE-2526: Don't throw NPE from MultiPhraseQuery.toString when it's empty. (Ross Woolf via Mike McCandless) - + * LUCENE-2559: Added SegmentReader.reopen methods (John Wang via Mike McCandless) @@ -897,17 +903,20 @@ New features to add span support: SpanMultiTermQueryWrapper. Using this wrapper its easy to add fuzzy/wildcard to e.g. a SpanNearQuery. (Robert Muir, Uwe Schindler) - + * LUCENE-2838: ConstantScoreQuery now directly supports wrapping a Query instance for stripping off scores. The use of a QueryWrapperFilter is no longer needed and discouraged for that use case. Directly wrapping Query improves performance, as out-of-order collection is now supported. (Uwe Schindler) -* LUCENE-2864: Add getMaxTermFrequency (maximum within-document TF) to +* LUCENE-2864: Add getMaxTermFrequency (maximum within-document TF) to FieldInvertState so that it can be used in Similarity.computeNorm. (Robert Muir) +* LUCENE-2720: Segments now record the code version which created them. + (Shai Erera, Mike McCandless, Uwe Schindler) + * LUCENE-2474: Added expert ReaderFinishedListener API to IndexReader, to allow apps that maintain external per-segment caches to evict entries when a segment is finished. (Shay Banon, Yonik @@ -916,8 +925,8 @@ New features * LUCENE-2911: The new StandardTokenizer, UAX29URLEmailTokenizer, and the ICUTokenizer in contrib now all tag types with a consistent set of token types (defined in StandardTokenizer). Tokens in the major - CJK types are explicitly marked to allow for custom downstream handling: - , , , and . + CJK types are explicitly marked to allow for custom downstream handling: + , , , and . (Robert Muir, Steven Rowe) * LUCENE-2913: Add missing getters to Numeric* classes. (Uwe Schindler) @@ -942,7 +951,7 @@ Optimizations * LUCENE-2137: Switch to AtomicInteger for some ref counting (Earwin Burrfoot via Mike McCandless) -* LUCENE-2123, LUCENE-2261: Move FuzzyQuery rewrite to separate RewriteMode +* LUCENE-2123, LUCENE-2261: Move FuzzyQuery rewrite to separate RewriteMode into MultiTermQuery. The number of fuzzy expansions can be specified with the maxExpansions parameter to FuzzyQuery. (Uwe Schindler, Robert Muir, Mike McCandless) @@ -976,12 +985,12 @@ Optimizations TermAttributeImpl, move DEFAULT_TYPE constant to TypeInterface, improve null-handling for TypeAttribute. (Uwe Schindler) -* LUCENE-2329: Switch TermsHash* from using a PostingList object per unique +* LUCENE-2329: Switch TermsHash* from using a PostingList object per unique term to parallel arrays, indexed by termID. This reduces garbage collection overhead significantly, which results in great indexing performance wins when the available JVM heap space is low. This will become even more important when the DocumentsWriter RAM buffer is searchable in the future, - because then it will make sense to make the RAM buffers as large as + because then it will make sense to make the RAM buffers as large as possible. (Mike McCandless, Michael Busch) * LUCENE-2380: The terms field cache methods (getTerms, @@ -996,7 +1005,7 @@ Optimizations causing too many fallbacks to compare-by-value (instead of by-ord). (Mike McCandless) -* LUCENE-2574: IndexInput exposes copyBytes(IndexOutput, long) to allow for +* LUCENE-2574: IndexInput exposes copyBytes(IndexOutput, long) to allow for efficient copying by sub-classes. Optimized copy is implemented for RAM and FS streams. (Shai Erera) @@ -1019,15 +1028,15 @@ Optimizations * LUCENE-2010: Segments with 100% deleted documents are now removed on IndexReader or IndexWriter commit. (Uwe Schindler, Mike McCandless) - + * LUCENE-1472: Removed synchronization from static DateTools methods by using a ThreadLocal. Also converted DateTools.Resolution to a Java 5 enum (this should not break backwards). (Uwe Schindler) Build -* LUCENE-2124: Moved the JDK-based collation support from contrib/collation - into core, and moved the ICU-based collation support into contrib/icu. +* LUCENE-2124: Moved the JDK-based collation support from contrib/collation + into core, and moved the ICU-based collation support into contrib/icu. (Robert Muir) * LUCENE-2326: Removed SVN checkouts for backwards tests. The backwards @@ -1039,14 +1048,14 @@ Build * LUCENE-1709: Tests are now parallelized by default (except for benchmark). You can force them to run sequentially by passing -Drunsequential=1 on the command - line. The number of threads that are spawned per CPU defaults to '1'. If you + line. The number of threads that are spawned per CPU defaults to '1'. If you wish to change that, you can run the tests with -DthreadsPerProcessor=[num]. (Robert Muir, Shai Erera, Peter Kofler) * LUCENE-2516: Backwards tests are now compiled against released lucene-core.jar from tarball of previous version. Backwards tests are now packaged together with src distribution. (Uwe Schindler) - + * LUCENE-2611: Added Ant target to install IntelliJ IDEA configuration: "ant idea". See http://wiki.apache.org/lucene-java/HowtoConfigureIntelliJ (Steven Rowe) @@ -1055,8 +1064,8 @@ Build generating Maven artifacts (Steven Rowe) * LUCENE-2609: Added jar-test-framework Ant target which packages Lucene's - tests' framework classes. (Drew Farris, Grant Ingersoll, Shai Erera, Steven - Rowe) + tests' framework classes. (Drew Farris, Grant Ingersoll, Shai Erera, + Steven Rowe) Test Cases @@ -1092,18 +1101,18 @@ Test Cases access to "real" files from the test folder itself, can use LuceneTestCase(J4).getDataFile(). (Uwe Schindler) -* LUCENE-2398, LUCENE-2611: Improve tests to work better from IDEs such +* LUCENE-2398, LUCENE-2611: Improve tests to work better from IDEs such as Eclipse and IntelliJ. (Paolo Castagna, Steven Rowe via Robert Muir) * LUCENE-2804: add newFSDirectory to LuceneTestCase to create a FSDirectory at random. (Shai Erera, Robert Muir) - + Documentation * LUCENE-2579: Fix oal.search's package.html description of abstract methods. (Santiago M. Mola via Mike McCandless) - + * LUCENE-2625: Add a note to IndexReader.termDocs() with additional verbiage that the TermEnum must be seeked since it is unpositioned. (Adriano Crestani via Robert Muir) diff --git a/lucene/contrib/CHANGES.txt b/lucene/contrib/CHANGES.txt index cbf58336eb7..a4a474b512a 100644 --- a/lucene/contrib/CHANGES.txt +++ b/lucene/contrib/CHANGES.txt @@ -47,26 +47,26 @@ API Changes (No changes) -======================= Lucene 3.1 (not yet released) ======================= +======================= Lucene 3.1.0 ======================= Changes in backwards compatibility policy * LUCENE-2100: All Analyzers in Lucene-contrib have been marked as final. Analyzers should be only act as a composition of TokenStreams, users should compose their own analyzers instead of subclassing existing ones. - (Simon Willnauer) + (Simon Willnauer) * LUCENE-2194, LUCENE-2201: Snowball APIs were upgraded to snowball revision - 502 (with some local modifications for improved performance). - Index backwards compatibility and binary backwards compatibility is - preserved, but some protected/public member variables changed type. This - does NOT affect java code/class files produced by the snowball compiler, + 502 (with some local modifications for improved performance). + Index backwards compatibility and binary backwards compatibility is + preserved, but some protected/public member variables changed type. This + does NOT affect java code/class files produced by the snowball compiler, but technically is a backwards compatibility break. (Robert Muir) - + * LUCENE-2226: Moved contrib/snowball functionality into contrib/analyzers. Be sure to remove any old obselete lucene-snowball jar files from your classpath! (Robert Muir) - + * LUCENE-2323: Moved contrib/wikipedia functionality into contrib/analyzers. Additionally the package was changed from org.apache.lucene.wikipedia.analysis to org.apache.lucene.analysis.wikipedia. (Robert Muir) @@ -74,30 +74,30 @@ Changes in backwards compatibility policy * LUCENE-2581: Added new methods to FragmentsBuilder interface. These methods are used to set pre/post tags and Encoder. (Koji Sekiguchi) - * LUCENE-2391: Improved spellchecker (re)build time/ram usage by omitting + * LUCENE-2391: Improved spellchecker (re)build time/ram usage by omitting frequencies/positions/norms for single-valued fields, modifying the default ramBufferMBSize to match IndexWriterConfig (16MB), making index optimization an optional boolean parameter, and modifying the incremental update logic - to work well with unoptimized spellcheck indexes. The indexDictionary() methods - were made final to ensure a hard backwards break in case you were subclassing + to work well with unoptimized spellcheck indexes. The indexDictionary() methods + were made final to ensure a hard backwards break in case you were subclassing Spellchecker. In general, subclassing Spellchecker is not recommended. (Robert Muir) - + Changes in runtime behavior * LUCENE-2117: SnowballAnalyzer uses TurkishLowerCaseFilter instead of LowercaseFilter to correctly handle the unique Turkish casing behavior if used with Version > 3.0 and the TurkishStemmer. - (Robert Muir via Simon Willnauer) + (Robert Muir via Simon Willnauer) - * LUCENE-2055: GermanAnalyzer now uses the Snowball German2 algorithm and + * LUCENE-2055: GermanAnalyzer now uses the Snowball German2 algorithm and stopwords list by default for Version > 3.0. (Robert Muir, Uwe Schindler, Simon Willnauer) Bug fixes - * LUCENE-2855: contrib queryparser was using CharSequence as key in some internal - Map instances, which was leading to incorrect behaviour, since some CharSequence - implementors do not override hashcode and equals methods. Now the internal Maps + * LUCENE-2855: contrib queryparser was using CharSequence as key in some internal + Map instances, which was leading to incorrect behavior, since some CharSequence + implementors do not override hashcode and equals methods. Now the internal Maps are using String instead. (Adriano Crestani) * LUCENE-2068: Fixed ReverseStringFilter which was not aware of supplementary @@ -106,9 +106,9 @@ Bug fixes now reverses supplementary characters correctly if used with Version > 3.0. (Simon Willnauer, Robert Muir) - * LUCENE-2035: TokenSources.getTokenStream() does not assign positionIncrement. + * LUCENE-2035: TokenSources.getTokenStream() does not assign positionIncrement. (Christopher Morris via Mark Miller) - + * LUCENE-2055: Deprecated RussianTokenizer, RussianStemmer, RussianStemFilter, FrenchStemmer, FrenchStemFilter, DutchStemmer, and DutchStemFilter. For these Analyzers, SnowballFilter is used instead (for Version > 3.0), as @@ -118,7 +118,7 @@ Bug fixes * LUCENE-2184: Fixed bug with handling best fit value when the proper best fit value is not an indexed field. Note, this change affects the APIs. (Grant Ingersoll) - + * LUCENE-2359: Fix bug in CartesianPolyFilterBuilder related to handling of behavior around the 180th meridian (Grant Ingersoll) @@ -135,15 +135,15 @@ Bug fixes and regenerating a new .nrm with 'ant gennorm2'. (David Bowen via Robert Muir) * LUCENE-2653: ThaiWordFilter depends on the JRE having a Thai dictionary, which is not - always the case. If the dictionary is unavailable, the filter will now throw + always the case. If the dictionary is unavailable, the filter will now throw UnsupportedOperationException in the constructor. (Robert Muir) - * LUCENE-589: Fix contrib/demo for international documents. + * LUCENE-589: Fix contrib/demo for international documents. (Curtis d'Entremont via Robert Muir) - + * LUCENE-2246: Fix contrib/demo for Turkish html documents. - (Selim Nadi via Robert Muir) - + (Selim Nadi via Robert Muir) + * LUCENE-590: Demo HTML parser gives incorrect summaries when title is repeated as a heading (Curtis d'Entremont via Robert Muir) @@ -153,9 +153,9 @@ Bug fixes * LUCENE-2874: Highlighting overlapping tokens outputted doubled words. (Pierre Gossé via Robert Muir) - * LUCENE-2943: Fix thread-safety issues with ICUCollationKeyFilter. + * LUCENE-2943: Fix thread-safety issues with ICUCollationKeyFilter. (Robert Muir) - + API Changes * LUCENE-2867: Some contrib queryparser methods that receives CharSequence as @@ -165,7 +165,7 @@ API Changes * LUCENE-2147: Spatial GeoHashUtils now always decode GeoHash strings with full precision. GeoHash#decode_exactly(String) was merged into GeoHash#decode(String). (Chris Male, Simon Willnauer) - + * LUCENE-2204: Change some package private classes/members to publicly accessible to implement custom FragmentsBuilders. (Koji Sekiguchi) @@ -182,14 +182,14 @@ API Changes * LUCENE-2626: FastVectorHighlighter: enable FragListBuilder and FragmentsBuilder to be set per-field override. (Koji Sekiguchi) - * LUCENE-2712: FieldBoostMapAttribute in contrib/queryparser was changed from + * LUCENE-2712: FieldBoostMapAttribute in contrib/queryparser was changed from a Map to a Map. Per the CharSequence javadoc, CharSequence is inappropriate as a map key. (Robert Muir) * LUCENE-1937: Add more methods to manipulate QueryNodeProcessorPipeline elements. QueryNodeProcessorPipeline now implements the List interface, this is useful if you want to extend or modify an existing pipeline. (Adriano Crestani via Robert Muir) - + * LUCENE-2754, LUCENE-2757: Deprecated SpanRegexQuery. Use new SpanMultiTermQueryWrapper(new RegexQuery()) instead. (Robert Muir, Uwe Schindler) @@ -199,10 +199,10 @@ API Changes * LUCENE-2830: Use StringBuilder instead of StringBuffer across Benchmark, and remove the StringBuffer HtmlParser.parse() variant. (Shai Erera) - + * LUCENE-2920: Deprecated ShingleMatrixFilter as it is unmaintained and does not work with custom Attributes or custom payload encoders. (Uwe Schindler) - + New features * LUCENE-2500: Added DirectIOLinuxDirectory, a Linux-specific @@ -210,14 +210,14 @@ New features cache. This is useful to prevent segment merging from evicting pages from the buffer cache, since fadvise/madvise do not seem. (Michael McCandless) - + * LUCENE-2306: Add NumericRangeFilter and NumericRangeQuery support to XMLQueryParser. (Jingkei Ly, via Mark Harwood) * LUCENE-2102: Add a Turkish LowerCase Filter. TurkishLowerCaseFilter handles Turkish and Azeri unique casing behavior correctly. (Ahmet Arslan, Robert Muir via Simon Willnauer) - + * LUCENE-2039: Add a extensible query parser to contrib/misc. ExtendableQueryParser enables arbitrary parser extensions based on a customizable field naming scheme. @@ -225,11 +225,11 @@ New features * LUCENE-2067: Add a Czech light stemmer. CzechAnalyzer will now stem words when Version is set to 3.1 or higher. (Robert Muir) - + * LUCENE-2062: Add a Bulgarian analyzer. (Robert Muir, Simon Willnauer) * LUCENE-2206: Add Snowball's stopword lists for Danish, Dutch, English, - Finnish, French, German, Hungarian, Italian, Norwegian, Russian, Spanish, + Finnish, French, German, Hungarian, Italian, Norwegian, Russian, Spanish, and Swedish. These can be loaded with WordListLoader.getSnowballWordSet. (Robert Muir, Simon Willnauer) @@ -237,7 +237,7 @@ New features (Koji Sekiguchi) * LUCENE-2218: ShingleFilter supports minimum shingle size, and the separator - character is now configurable. Its also up to 20% faster. + character is now configurable. Its also up to 20% faster. (Steven Rowe via Robert Muir) * LUCENE-2234: Add a Hindi analyzer. (Robert Muir) @@ -267,7 +267,7 @@ New features * LUCENE-2298: Add analyzers/stempel, an algorithmic stemmer with support for the Polish language. (Andrzej Bialecki via Robert Muir) - * LUCENE-2400: ShingleFilter was changed to don't output all-filler shingles and + * LUCENE-2400: ShingleFilter was changed to don't output all-filler shingles and unigrams, and uses a more performant algorithm to build grams using a linked list of AttributeSource.cloneAttributes() instances and the new copyTo() method. (Steven Rowe via Uwe Schindler) @@ -286,7 +286,7 @@ New features * LUCENE-2464: FastVectorHighlighter: add SingleFragListBuilder to return entire field contents. (Koji Sekiguchi) - * LUCENE-2503: Added lighter stemming alternatives for European languages. + * LUCENE-2503: Added lighter stemming alternatives for European languages. (Robert Muir) * LUCENE-2581: FastVectorHighlighter: add Encoder to FragmentsBuilder. @@ -294,20 +294,23 @@ New features * LUCENE-2624: Add Analyzers for Armenian, Basque, and Catalan, from snowball. (Robert Muir) - + * LUCENE-1938: PrecedenceQueryParser is now implemented with the flexible QP framework. This means that you can also add this functionality to your own QP pipeline by using BooleanModifiersQueryNodeProcessor, for example instead of GroupQueryNodeProcessor. (Adriano Crestani via Robert Muir) * LUCENE-2791: Added WindowsDirectory, a Windows-specific Directory impl - that doesn't synchronize on the file handle. This can be useful to + that doesn't synchronize on the file handle. This can be useful to avoid the performance problems of SimpleFSDirectory and NIOFSDirectory. (Robert Muir, Simon Willnauer, Uwe Schindler, Michael McCandless) * LUCENE-2842: Add analyzer for Galician. Also adds the RSLP (Orengo) stemmer for Portuguese. (Robert Muir) + * SOLR-1057: Add PathHierarchyTokenizer that represents file path hierarchies as synonyms of + /something, /something/something, /something/something/else. (Ryan McKinley, Koji Sekiguchi) + Build * LUCENE-2124: Moved the JDK-based collation support from contrib/collation diff --git a/solr/CHANGES.txt b/solr/CHANGES.txt index d6e9b20d24c..0f36491e0be 100644 --- a/solr/CHANGES.txt +++ b/solr/CHANGES.txt @@ -247,24 +247,26 @@ Documentation ---------------------- -================== 3.1.0-dev ================== +================== 3.1.0 ================== Versions of Major Components --------------------- -Apache Lucene trunk +Apache Lucene 3.1.0 Apache Tika 0.8 Carrot2 3.4.2 +Velocity 1.6.1 and Velocity Tools 2.0-beta3 +Apache UIMA 2.3.1-SNAPSHOT Upgrading from Solr 1.4 ---------------------- -* The Lucene index format has changed and as a result, once you upgrade, +* The Lucene index format has changed and as a result, once you upgrade, previous versions of Solr will no longer be able to read your indices. In a master/slave configuration, all searchers/slaves should be upgraded before the master. If the master were to be updated first, the older searchers would not be able to read the new index format. -* The Solr JavaBin format has changed as of Solr 3.1. If you are using the +* The Solr JavaBin format has changed as of Solr 3.1. If you are using the JavaBin format, you will need to upgrade your SolrJ client. (SOLR-2034) * The experimental ALIAS command has been removed (SOLR-1637) @@ -275,10 +277,10 @@ Upgrading from Solr 1.4 is deprecated (SOLR-1696) * The deprecated HTMLStripReader, HTMLStripWhitespaceTokenizerFactory and - HTMLStripStandardTokenizerFactory were removed. To strip HTML tags, - HTMLStripCharFilter should be used instead, and it works with any + HTMLStripStandardTokenizerFactory were removed. To strip HTML tags, + HTMLStripCharFilter should be used instead, and it works with any Tokenizer of your choice. (SOLR-1657) - + * Field compression is no longer supported. Fields that were formerly compressed will be uncompressed as index segments are merged. For shorter fields, this may actually be an improvement, as the compression @@ -287,24 +289,24 @@ Upgrading from Solr 1.4 * SOLR-1845: The TermsComponent response format was changed so that the "terms" container is a map instead of a named list. This affects response formats like JSON, but not XML. (yonik) - + * SOLR-1876: All Analyzers and TokenStreams are now final to enforce the decorator pattern. (rmuir, uschindler) -* LUCENE-2608: Added the ability to specify the accuracy on a per request basis. +* LUCENE-2608: Added the ability to specify the accuracy on a per request basis. It is recommended that implementations of SolrSpellChecker should change over to the new SolrSpellChecker methods using the new SpellingOptions class, but are not required to. While this change is backward compatible, the trunk version of Solr has already dropped support for all but the SpellingOptions method. (gsingers) * readercycle script was removed. (SOLR-2046) -* In previous releases, sorting or evaluating function queries on +* In previous releases, sorting or evaluating function queries on fields that were "multiValued" (either by explicit declaration in schema.xml or by implict behavior because the "version" attribute on the schema was less then 1.2) did not generally work, but it would sometimes silently act as if it succeeded and order the docs arbitrarily. Solr will now fail on any attempt to sort, or apply a - function to, multi-valued fields + function to, multi-valued fields * The DataImportHandler jars are no longer included in the solr WAR and should be added in Solr's lib directory, or referenced @@ -374,13 +376,13 @@ New Features * SOLR-1379: Add RAMDirectoryFactory for non-persistent in memory index storage. (Alex Baranov via yonik) -* SOLR-1857: Synced Solr analysis with Lucene 3.1. Added KeywordMarkerFilterFactory - and StemmerOverrideFilterFactory, which can be used to tune stemming algorithms. +* SOLR-1857: Synced Solr analysis with Lucene 3.1. Added KeywordMarkerFilterFactory + and StemmerOverrideFilterFactory, which can be used to tune stemming algorithms. Added factories for Bulgarian, Czech, Hindi, Turkish, and Wikipedia analysis. Improved the performance of SnowballPorterFilterFactory. (rmuir) -* SOLR-1657: Converted remaining TokenStreams to the Attributes-based API. All Solr - TokenFilters now support custom Attributes, and some have improved performance: +* SOLR-1657: Converted remaining TokenStreams to the Attributes-based API. All Solr + TokenFilters now support custom Attributes, and some have improved performance: especially WordDelimiterFilter and CommonGramsFilter. (rmuir, cmale, uschindler) * SOLR-1740: ShingleFilterFactory supports the "minShingleSize" and "tokenSeparator" @@ -389,10 +391,10 @@ New Features * SOLR-744: ShingleFilterFactory supports the "outputUnigramsIfNoShingles" parameter, to output unigrams if the number of input tokens is fewer than - minShingleSize, and no shingles can be generated. + minShingleSize, and no shingles can be generated. (Chris Harris via Steven Rowe) -* SOLR-1923: PhoneticFilterFactory now has support for the +* SOLR-1923: PhoneticFilterFactory now has support for the Caverphone algorithm. (rmuir) * SOLR-1957: The VelocityResponseWriter contrib moved to core. @@ -460,7 +462,7 @@ New Features (Ankul Garg, Jason Rutherglen, Shalin Shekhar Mangar, Grant Ingersoll, Robert Muir, ab) * SOLR-1568: Added "native" filtering support for PointType, GeohashField. Added LatLonType with filtering support too. See - http://wiki.apache.org/solr/SpatialSearch and the example. Refactored some items in Lucene spatial. + http://wiki.apache.org/solr/SpatialSearch and the example. Refactored some items in Lucene spatial. Removed SpatialTileField as the underlying CartesianTier is broken beyond repair and is going to be moved. (gsingers) * SOLR-2128: Full parameter substitution for function queries. @@ -515,7 +517,7 @@ Optimizations Bug Fixes ---------------------- -* SOLR-1769: Solr 1.4 Replication - Repeater throwing NullPointerException (Jörgen Rydenius via noble) +* SOLR-1769: Solr 1.4 Replication - Repeater throwing NullPointerException (Jörgen Rydenius via noble) * SOLR-1432: Make the new ValueSource.getValues(context,reader) delegate to the original ValueSource.getValues(reader) so custom sources @@ -538,8 +540,8 @@ Bug Fixes * SOLR-1584: SolrJ - SolrQuery.setIncludeScore() incorrectly added fl=score to the parameter list instead of appending score to the existing field list. (yonik) - -* SOLR-1580: Solr Configuration ignores 'mergeFactor' parameter, always + +* SOLR-1580: Solr Configuration ignores 'mergeFactor' parameter, always uses Lucene default. (Lance Norskog via Mark Miller) * SOLR-1593: ReverseWildcardFilter didn't work for surrogate pairs @@ -556,7 +558,7 @@ Bug Fixes set when streaming updates, rather than using UTF-8 as the HTTP headers indicated, leading to an encoding mismatch. (hossman, yonik) -* SOLR-1587: A distributed search request with fl=score, didn't match +* SOLR-1587: A distributed search request with fl=score, didn't match the behavior of a non-distributed request since it only returned the id,score fields instead of all fields in addition to score. (yonik) @@ -565,7 +567,7 @@ Bug Fixes * SOLR-1615: Backslash escaping did not work in quoted strings for local param arguments. (Wojtek Piaseczny, yonik) -* SOLR-1628: log contains incorrect number of adds and deletes. +* SOLR-1628: log contains incorrect number of adds and deletes. (Thijs Vonk via yonik) * SOLR-343: Date faceting now respects facet.mincount limiting @@ -593,7 +595,7 @@ Bug Fixes (never officially released) introduced another hanging bug due to connections not being released. (Attila Babo, Erik Hetzner, Johannes Tuchscherer via yonik) - + * SOLR-1748, SOLR-1747, SOLR-1746, SOLR-1745, SOLR-1744: Streams and Readers retrieved from ContentStreams are not closed in various places, resulting in file descriptor leaks. @@ -602,7 +604,7 @@ Bug Fixes * SOLR-1753: StatsComponent throws NPE when getting statistics for facets in distributed search (Janne Majaranta via koji) -* SOLR-1736:In the slave , If 'mov'ing file does not succeed , copy the file (noble) +* SOLR-1736:In the slave , If 'mov'ing file does not succeed , copy the file (noble) * SOLR-1579: Fixes to XML escaping in stats.jsp (David Bowen and hossman) @@ -656,7 +658,7 @@ Bug Fixes * SOLR-2047: ReplicationHandler should accept bool type for enable flag. (koji) -* SOLR-1630: Fix spell checking collation issue related to token positions (rmuir, gsingers) +* SOLR-1630: Fix spell checking collation issue related to token positions (rmuir, gsingers) * SOLR-2100: The replication handler backup command didn't save the commit point and hence could fail when a newer commit caused the older commit point @@ -665,7 +667,7 @@ Bug Fixes * SOLR-2114: Fixed parsing error in hsin function. The function signature has changed slightly. (gsingers) -* SOLR-2083: SpellCheckComponent misreports suggestions when distributed (James Dyer via gsingers) +* SOLR-2083: SpellCheckComponent misreports suggestions when distributed (James Dyer via gsingers) * SOLR-2111: Change exception handling in distributed faceting to work more like non-distributed faceting, change facet_counts/exception from a String @@ -689,9 +691,9 @@ Bug Fixes * SOLR-2173: Suggester should always rebuild Lookup data if Lookup.load fails. (ab) * SOLR-2081: BaseResponseWriter.isStreamingDocs causes - SingleResponseWriter.end to be called 2x - (Chris A. Mattmann via hossman) - + SingleResponseWriter.end to be called 2x + (Chris A. Mattmann via hossman) + * SOLR-2219: The init() method of every SolrRequestHandler was being called twice. (ambikeshwar singh and hossman) @@ -716,7 +718,7 @@ Bug Fixes * SOLR-482: Provide more exception handling in CSVLoader (gsingers) -* SOLR-1283: HTMLStripCharFilter sometimes threw a "Mark Invalid" exception. +* SOLR-1283: HTMLStripCharFilter sometimes threw a "Mark Invalid" exception. (Julien Coloos, hossman, yonik) * SOLR-2085: Improve SolrJ behavior when FacetComponent comes before @@ -743,21 +745,29 @@ Bug Fixes * SOLR-2380: Distributed faceting could miss values when facet.sort=index and when facet.offset was greater than 0. (yonik) - + * SOLR-1656: XIncludes and other HREFs in XML files loaded by ResourceLoader - are fixed to be resolved using the URI standard (RFC 2396). The system - identifier is no longer a plain filename with path, it gets initialized - using a custom URI scheme "solrres:". This scheme is resolved using a - EntityResolver that utilizes ResourceLoader - (org.apache.solr.common.util.SystemIdResolver). This makes all relative - pathes in Solr's config files behave like expected. This change - introduces some backwards breaks in the API: Some config classes - (Config, SolrConfig, IndexSchema) were changed to take - org.xml.sax.InputSource instead of InputStream. There may also be some - backwards breaks in existing config files, it is recommended to check - your config files / XSLTs and replace all XIncludes/HREFs that were + are fixed to be resolved using the URI standard (RFC 2396). The system + identifier is no longer a plain filename with path, it gets initialized + using a custom URI scheme "solrres:". This scheme is resolved using a + EntityResolver that utilizes ResourceLoader + (org.apache.solr.common.util.SystemIdResolver). This makes all relative + pathes in Solr's config files behave like expected. This change + introduces some backwards breaks in the API: Some config classes + (Config, SolrConfig, IndexSchema) were changed to take + org.xml.sax.InputSource instead of InputStream. There may also be some + backwards breaks in existing config files, it is recommended to check + your config files / XSLTs and replace all XIncludes/HREFs that were hacked to use absolute paths to use relative ones. (uschindler) +* SOLR-309: Fix FieldType so setting an analyzer on a FieldType that + doesn't expect it will generate an error. Practically speaking this + means that Solr will now correctly generate an error on + initialization if the schema.xml contains an analyzer configuration + for a fieldType that does not use TextField. (hossman) + +* SOLR-2192: StreamingUpdateSolrServer.blockUntilFinished was not + thread safe and could throw an exception. (yonik) Other Changes ----------------------