synchronize CHANGES.txt

git-svn-id: https://svn.apache.org/repos/asf/lucene/dev/trunk@1059914 13f79535-47bb-0310-9956-ffa450edef68
Robert Muir 2011-01-17 13:20:54 +00:00
parent 9906198ff3
commit 140101dc38
1 changed files with 147 additions and 105 deletions


@ -89,19 +89,9 @@ Changes in backwards compatibility policy
* LUCENE-2484: Removed deprecated TermAttribute. Use CharTermAttribute
and TermToBytesRefAttribute instead. (Uwe Schindler)
* LUCENE-2602: The default (LogByteSizeMergePolicy) merge policy now
takes deletions into account by default. You can disable this by
calling setCalibrateSizeByDeletes(false) on the merge policy. (Mike
McCandless)
* LUCENE-2600: Remove IndexReader.isDeleted in favor of
IndexReader.getDeletedDocs(). (Mike McCandless)
* LUCENE-2529, LUCENE-2668: Position increment gap and offset gap of empty
values in multi-valued fields have been changed for some cases in the index.
If you index empty fields and use position/offset information on those
fields, reindexing is recommended. (David Smiley, Koji Sekiguchi)
* LUCENE-2667: FuzzyQuery's defaults have changed for more performant
behavior: the minimum similarity is 2 edit distances from the word,
and the priority queue size is 50. To support this, FuzzyQuery now allows
@ -140,21 +130,6 @@ Changes in backwards compatibility policy
Changes in Runtime Behavior
* LUCENE-2650, LUCENE-2825: The behavior of FSDirectory.open has changed. On 64-bit
Windows and Solaris systems that support unmapping, FSDirectory.open returns
MMapDirectory. Additionally the behavior of MMapDirectory has been
changed to enable unmapping by default if supported by the JRE.
(Mike McCandless, Uwe Schindler, Robert Muir)
* LUCENE-2790: LogMergePolicy.useCompoundFile's logic now factors in noCFSRatio
to determine whether the passed in segment should be compound.
(Shai Erera, Earwin Burrfoot)
* LUCENE-2805: IndexWriter now increments the index version on every change to
the index instead of for every commit. Committing or closing the IndexWriter
without any changes to the index will not cause any index version increment.
(Simon Willnauer, Mike McCandless)
* LUCENE-2846: omitNorms now behaves like omitTermFrequencyAndPositions: if you
set omitNorms(true) for field "a" for 1000 documents, but then add a document with
omitNorms(false) for field "a", all documents for field "a" will have no norms.
@ -181,17 +156,6 @@ API Changes
deleted docs (getDeletedDocs), providing a new Bits interface to
directly query by doc ID.
* LUCENE-2402: IndexWriter.deleteUnusedFiles now deletes unreferenced commit
points too. If you use an IndexDeletionPolicy which holds onto index commits
(such as SnapshotDeletionPolicy), you can call this method to remove those
commit points when they are not needed anymore (instead of waiting for the
next commit). (Shai Erera)
* LUCENE-2674: A new idfExplain method was added to Similarity, that
accepts an incoming docFreq. If you subclass Similarity, make sure
you also override this method on upgrade. (Robert Muir, Mike
McCandless)
* LUCENE-2691: IndexWriter.getReader() has been made package local and is now
exposed via open and reopen methods on IndexReader. The semantics of the
call is the same as it was prior to the API change.
@ -199,9 +163,6 @@ API Changes
* LUCENE-2566: QueryParser: Unary operators +,-,! will not be treated as
operators if they are followed by whitespace. (yonik)
* LUCENE-2778: RAMDirectory now exposes newRAMFile(), which subclasses can
override to return a different RAMFile implementation. (Shai Erera)
* LUCENE-2831: Weight#scorer, Weight#explain, Filter#getDocIdSet,
Collector#setNextReader & FieldComparator#setNextReader now expect an
@ -253,10 +214,6 @@ New features
data and payloads in 5 separate files instead of the 2 used by
standard codec), and int block (really a "base" for using
block-based compressors like PForDelta for storing postings data).
* LUCENE-2385: Moved NoDeletionPolicy from benchmark to core. NoDeletionPolicy
can be used to prevent commits from ever getting deleted from the index.
(Shai Erera)
* LUCENE-1458, LUCENE-2111: The in-memory terms index used by standard
codec is more RAM efficient: terms data is stored as block byte
@ -271,16 +228,6 @@ New features
applications that have many unique terms, since it reduces how often
a new segment must be flushed given a fixed RAM buffer size.
* LUCENE-1585: IndexWriter now accepts a PayloadProcessorProvider which can
return a DirPayloadProcessor for a given Directory, which returns a
PayloadProcessor for a given Term. The PayloadProcessor will be used to
process the payloads of the segments as they are merged (e.g. if one wants to
rewrite payloads of external indexes as they are added, or of local ones).
(Shai Erera, Michael Busch, Mike McCandless)
* LUCENE-2440: Add support for custom ExecutorService in
ParallelMultiSearcher (Edward Drapkin via Mike McCandless)
* LUCENE-2489: Added PerFieldCodecWrapper (in oal.index.codecs) which
lets you set the Codec per field (Mike McCandless)
@ -291,17 +238,6 @@ New features
SegmentInfosReader to allow customization of SegmentInfos data.
(Andrzej Bialecki)
* LUCENE-2559: Added SegmentReader.reopen methods (John Wang via Mike
McCandless)
* LUCENE-2590: Added Scorer.visitSubScorers, and Scorer.freq. Along
with a custom Collector these experimental methods make it possible
to gather the hit-count per sub-clause and per document while a
search is running. (Simon Willnauer, Mike McCandless)
* LUCENE-2636: Added MultiCollector which allows running the search with several
Collectors. (Shai Erera)
* LUCENE-2504: FieldComparator.setNextReader now returns a
FieldComparator instance. You can "return this", to just reuse the
same instance, or you can return a comparator optimized to the new
@ -364,17 +300,6 @@ New features
Optimizations
* LUCENE-2410: ~20% speedup on exact (slop=0) PhraseQuery matching.
(Mike McCandless)
* LUCENE-2531: Fix issue when sorting by a String field that was
causing too many fallbacks to compare-by-value (instead of by-ord).
(Mike McCandless)
* LUCENE-2574: IndexInput exposes copyBytes(IndexOutput, long) to allow for
efficient copying by sub-classes. Optimized copy is implemented for RAM and FS
streams. (Shai Erera)
* LUCENE-2588: Don't store unnecessary suffixes when writing the terms
index, saving RAM in IndexReader; change default terms index
interval from 128 to 32, because the terms index now requires much
@ -389,11 +314,6 @@ Optimizations
MultiTermQuery now stores TermState per leaf reader during rewrite to re-
seek the term dictionary in TermQuery / TermWeight.
(Simon Willnauer, Mike McCandless, Robert Muir)
Documentation
* LUCENE-2579: Fix oal.search's package.html description of abstract
methods. (Santiago M. Mola via Mike McCandless)
Bug fixes
@ -404,14 +324,6 @@ Bug fixes
with more document deletions is requested before a reader with fewer
deletions, provided they share some segments. (yonik)
* LUCENE-2802: NRT DirectoryReader returned incorrect values from
getVersion, isOptimized, getCommitUserData, getIndexCommit and isCurrent due
to a mutable reference to the IndexWriter's SegmentInfos.
(Simon Willnauer, Earwin Burrfoot)
* LUCENE-2860: Fixed SegmentInfo.sizeInBytes to factor includeDocStores when it
decides whether to return the cached computed size or not. (Shai Erera)
======================= Lucene 3.x (not yet released) =======================
Changes in backwards compatibility policy
@ -476,10 +388,33 @@ Changes in backwards compatibility policy
* LUCENE-2733: Removed public constructors of utility classes with only static
methods to prevent instantiation. (Uwe Schindler)
* LUCENE-2753: IndexReader and DirectoryReader .listCommits() now return a List
instead of a Collection, guaranteeing the commits are sorted from oldest to
latest. (Shai Erera)
* LUCENE-2602: The default (LogByteSizeMergePolicy) merge policy now
takes deletions into account by default. You can disable this by
calling setCalibrateSizeByDeletes(false) on the merge policy. (Mike
McCandless)
* LUCENE-2529, LUCENE-2668: Position increment gap and offset gap of empty
values in multi-valued fields have been changed for some cases in the index.
If you index empty fields and use position/offset information on those
fields, reindexing is recommended. (David Smiley, Koji Sekiguchi)
* LUCENE-2804: Directory.setLockFactory now declares throwing an IOException.
(Shai Erera, Robert Muir)
* LUCENE-2837: Added deprecations noting that in 4.0, Searcher and
Searchable are collapsed into IndexSearcher; contrib/remote and
MultiSearcher have been removed. (Mike McCandless)
* LUCENE-2854: Deprecated SimilarityDelegator and
Similarity.lengthNorm; the latter is now final, forcing any custom
Similarity impls to cutover to the more general computeNorm (Robert
Muir, Mike McCandless)
* LUCENE-2674: A new idfExplain method was added to Similarity, that
accepts an incoming docFreq. If you subclass Similarity, make sure
you also override this method on upgrade. (Robert Muir, Mike
McCandless)
Changes in runtime behavior
* LUCENE-1923: Made IndexReader.toString() produce something
@ -495,7 +430,7 @@ Changes in runtime behavior
invokes a merge on the incoming and target segments, but instead copies the
segments to the target index. You can call maybeMerge or optimize after this
method completes, if you need to.
In addition, Directory.copyTo* were removed in favor of copy which takes the
target Directory, source and target files as arguments, and copies the source
file to the target Directory under the target file name. (Shai Erera)
@ -512,6 +447,33 @@ Changes in runtime behavior
merges). This means that you can run optimize() and too large segments won't
be merged. (Shai Erera)
* LUCENE-2753: IndexReader and DirectoryReader .listCommits() now return a List,
guaranteeing the commits are sorted from oldest to latest. (Shai Erera)
* LUCENE-2785: TopScoreDocCollector, TopFieldCollector and
the IndexSearcher search methods that take an int nDocs will now
throw IllegalArgumentException if nDocs is 0. Instead, you should
use the newly added TotalHitCountCollector. (Mike McCandless)
* LUCENE-2790: LogMergePolicy.useCompoundFile's logic now factors in noCFSRatio
to determine whether the passed in segment should be compound.
(Shai Erera, Earwin Burrfoot)
* LUCENE-2805: IndexWriter now increments the index version on every change to
the index instead of for every commit. Committing or closing the IndexWriter
without any changes to the index will not cause any index version increment.
(Simon Willnauer, Mike McCandless)
* LUCENE-2650, LUCENE-2825: The behavior of FSDirectory.open has changed. On 64-bit
Windows and Solaris systems that support unmapping, FSDirectory.open returns
MMapDirectory. Additionally the behavior of MMapDirectory has been
changed to enable unmapping by default if supported by the JRE.
(Mike McCandless, Uwe Schindler, Robert Muir)
* LUCENE-2829: Improve the performance of "primary key" lookup use
case (running a TermQuery that matches one document) on a
multi-segment index. (Robert Muir, Mike McCandless)
API Changes
* LUCENE-2076: Rename FSDirectory.getFile -> getDirectory. (George
@ -522,7 +484,7 @@ API Changes
custom Similarity can alter how norms are encoded, though they must
still be encoded as a single byte (Johan Kindgren via Mike
McCandless)
* LUCENE-2103: NoLockFactory should have a private constructor;
until Lucene 4.0 the default one will be deprecated.
(Shai Erera via Uwe Schindler)
@ -594,17 +556,42 @@ API Changes
(such as SnapshotDeletionPolicy), you can call this method to remove those
commit points when they are not needed anymore (instead of waiting for the
next commit). (Shai Erera)
* LUCENE-2455: IndexWriter.addIndexesNoOptimize was renamed to addIndexes.
IndexFileNames.segmentFileName now takes another parameter to accommodate
custom file names. You should use this method to name all your files.
(Shai Erera)
* LUCENE-2481: SnapshotDeletionPolicy.snapshot() and release() were replaced
with equivalent ones that take a String (id) as argument. You can pass
whatever ID you want, as long as you use the same one when calling both.
(Shai Erera)
* LUCENE-2356: Add IndexWriterConfig.set/getReaderTermIndexDivisor, to
set what IndexWriter passes for termsIndexDivisor to the readers it
opens internally when applying deletions or creating a near-real-time
reader. (Earwin Burrfoot via Mike McCandless)
* LUCENE-2167,LUCENE-2699,LUCENE-2763,LUCENE-2847: StandardTokenizer/Analyzer
in common/standard/ now implement the Word Break rules from the Unicode 6.0.0
Text Segmentation algorithm (UAX#29), covering the full range of Unicode code
points, including values from U+FFFF to U+10FFFF.
ClassicTokenizer/Analyzer retains the old (pre-Lucene 3.1) StandardTokenizer/
Analyzer implementation and behavior. Only the Unicode Basic Multilingual
Plane (code points from U+0000 to U+FFFF) is covered.
UAX29URLEmailTokenizer tokenizes URLs and E-mail addresses according to the
relevant RFCs, in addition to implementing the UAX#29 Word Break rules.
(Steven Rowe, Robert Muir, Uwe Schindler)
* LUCENE-2778: RAMDirectory now exposes newRAMFile(), which subclasses can
override to return a different RAMFile implementation. (Shai Erera)
* LUCENE-2785: Added TotalHitCountCollector whose sole purpose is to
count the number of hits matching the query. (Mike McCandless)
* LUCENE-2846: Deprecated IndexReader.setNorm(int, String, float). This method
is only syntactic sugar for setNorm(int, String, byte), but using the global
Similarity.getDefault().encodeNormValue(). Use the byte-based method instead
to ensure that the norm is encoded with your Similarity.
(Robert Muir, Mike McCandless)
Bug fixes
* LUCENE-2249: ParallelMultiSearcher should shut down thread pool on
@ -625,10 +612,6 @@ Bug fixes
a prior (corrupt) index missing its segments_N file. (Mike
McCandless)
* LUCENE-2534: fix over-sharing bug in
MultiTermsEnum.docs/AndPositionsEnum. (Robert Muir, Mike
McCandless)
* LUCENE-2458: QueryParser no longer automatically forms phrase queries,
assuming whitespace tokenization. Previously all CJK queries, for example,
would be turned into phrase queries. The old behavior is preserved with
@ -647,7 +630,22 @@ Bug fixes
can cause the same document to score differently depending on
what segment it resides in. (yonik)
* LUCENE-2272: Fix explain in PayloadNearQuery and also fix scoring issue (Peter Keegan via Grant Ingersoll)
* LUCENE-2732: Fix charset problems in XML loading in
HyphenationCompoundWordTokenFilter. (Uwe Schindler)
* LUCENE-2802: NRT DirectoryReader returned incorrect values from
getVersion, isOptimized, getCommitUserData, getIndexCommit and isCurrent due
to a mutable reference to the IndexWriter's SegmentInfos.
(Simon Willnauer, Earwin Burrfoot)
* LUCENE-2852: Fixed corner case in RAMInputStream that would hit a
false EOF after seeking to EOF then seeking back to same block you
were just in and then calling readBytes (Robert Muir, Mike McCandless)
* LUCENE-2860: Fixed SegmentInfo.sizeInBytes to factor includeDocStores when it
decides whether to return the cached computed size or not. (Shai Erera)
New features
@ -720,6 +718,16 @@ New features
can be used to prevent commits from ever getting deleted from the index.
(Shai Erera)
* LUCENE-1585: IndexWriter now accepts a PayloadProcessorProvider which can
return a DirPayloadProcessor for a given Directory, which returns a
PayloadProcessor for a given Term. The PayloadProcessor will be used to
process the payloads of the segments as they are merged (e.g. if one wants to
rewrite payloads of external indexes as they are added, or of local ones).
(Shai Erera, Michael Busch, Mike McCandless)
* LUCENE-2440: Add support for custom ExecutorService in
ParallelMultiSearcher (Edward Drapkin via Mike McCandless)
* LUCENE-2295: Added a LimitTokenCountAnalyzer / LimitTokenCountFilter
to wrap any other Analyzer and provide the same functionality as
MaxFieldLength provided on IndexWriter. This patch also fixes a bug
@ -727,9 +735,17 @@ New features
* LUCENE-2526: Don't throw NPE from MultiPhraseQuery.toString when
it's empty. (Ross Woolf via Mike McCandless)
* LUCENE-2559: Added SegmentReader.reopen methods (John Wang via Mike
McCandless)
* LUCENE-2671: Add SortField.setMissingValue( v ) to enable sorting
behavior for documents that do not include the given field. (ryan)
* LUCENE-2590: Added Scorer.visitSubScorers, and Scorer.freq. Along
with a custom Collector these experimental methods make it possible
to gather the hit-count per sub-clause and per document while a
search is running. (Simon Willnauer, Mike McCandless)
* LUCENE-2636: Added MultiCollector which allows running the search with several
Collectors. (Shai Erera)
* LUCENE-2754, LUCENE-2757: Added a wrapper around MultiTermQueries
to add span support: SpanMultiTermQueryWrapper<Q extends MultiTermQuery>.
@ -748,6 +764,9 @@ New features
Optimizations
* LUCENE-2494: Use CompletionService in ParallelMultiSearcher instead of
simple polling for results. (Edward Drapkin, Simon Willnauer)
* LUCENE-2075: Terms dict cache is now shared across threads instead
of being stored separately in thread local storage. Also fixed
terms dict so that the cache is used when seeking the thread local
@ -810,6 +829,17 @@ Optimizations
(getStrings, getStringIndex), consume quite a bit less RAM in most
cases. (Mike McCandless)
* LUCENE-2410: ~20% speedup on exact (slop=0) PhraseQuery matching.
(Mike McCandless)
* LUCENE-2531: Fix issue when sorting by a String field that was
causing too many fallbacks to compare-by-value (instead of by-ord).
(Mike McCandless)
* LUCENE-2574: IndexInput exposes copyBytes(IndexOutput, long) to allow for
efficient copying by sub-classes. Optimized copy is implemented for RAM and FS
streams. (Shai Erera)
* LUCENE-2719: Improved TermsHashPerField's sorting to use a better
quick sort algorithm that does not dereference the pivot element on
every compare call. Also replaced lots of sorting code in Lucene
@ -889,6 +919,18 @@ Test Cases
as Eclipse and IntelliJ.
(Paolo Castagna, Steven Rowe via Robert Muir)
* LUCENE-2804: add newFSDirectory to LuceneTestCase to create a FSDirectory at
random. (Shai Erera, Robert Muir)
Documentation
* LUCENE-2579: Fix oal.search's package.html description of abstract
methods. (Santiago M. Mola via Mike McCandless)
* LUCENE-2625: Add a note to IndexReader.termDocs() with additional verbiage
that the TermEnum must be seeked since it is unpositioned.
(Adriano Crestani via Robert Muir)
================== Release 2.9.4 / 3.0.3 2010-12-03 ====================
Changes in runtime behavior