synchronize CHANGES.txt

git-svn-id: https://svn.apache.org/repos/asf/lucene/dev/trunk@1059914 13f79535-47bb-0310-9956-ffa450edef68
2011-01-17 13:20:54 +00:00 · 2011-01-17 13:20:54 +00:00 · 140101dc38
parent 9906198ff3
commit 140101dc38
1 changed files with 147 additions and 105 deletions
--- a/lucene/CHANGES.txt
+++ b/lucene/CHANGES.txt
@ -89,19 +89,9 @@ Changes in backwards compatibility policy
 * LUCENE-2484: Removed deprecated TermAttribute. Use CharTermAttribute
  and TermToBytesRefAttribute instead.  (Uwe Schindler)

-* LUCENE-2602: The default (LogByteSizeMergePolicy) merge policy now
-  takes deletions into account by default.  You can disable this by
-  calling setCalibrateSizeByDeletes(false) on the merge policy.  (Mike
-  McCandless)
-
 * LUCENE-2600: Remove IndexReader.isDeleted in favor of
  IndexReader.getDeletedDocs().  (Mike McCandless)

-* LUCENE-2529, LUCENE-2668: Position increment gap and offset gap of empty
-  values in multi-valued field has been changed for some cases in index.
-  If you index empty fields and uses positions/offsets information on that
-  fields, reindex is recommended. (David Smiley, Koji Sekiguchi)
-
 * LUCENE-2667: FuzzyQuery's defaults have changed for more performant 
  behavior: the minimum similarity is 2 edit distances from the word,
  and the priority queue size is 50. To support this, FuzzyQuery now allows
@ -140,21 +130,6 @@ Changes in backwards compatibility policy

 Changes in Runtime Behavior

-* LUCENE-2650, LUCENE-2825: The behavior of FSDirectory.open has changed. On 64-bit
-  Windows and Solaris systems that support unmapping, FSDirectory.open returns
-  MMapDirectory. Additionally the behavior of MMapDirectory has been
-  changed to enable unmapping by default if supported by the JRE.
-  (Mike McCandless, Uwe Schindler, Robert Muir)
-
-* LUCENE-2790: LogMergePolicy.useCompoundFile's logic now factors in noCFSRatio 
-  to determine whether the passed in segment should be compound. 
-  (Shai Erera, Earwin Burrfoot)
-
-* LUCENE-2805: IndexWriter now increments the index version on every change to
-  the index instead of for every commit. Committing or closing the IndexWriter
-  without any changes to the index will not cause any index version increment.
-  (Simon Willnauer, Mike McCandless)
-
 * LUCENE-2846: omitNorms now behaves like omitTermFrequencyAndPositions, if you
  omitNorms(true) for field "a" for 1000 documents, but then add a document with
  omitNorms(false) for field "a", all documents for field "a" will have no norms.
@ -181,17 +156,6 @@ API Changes
  deleted docs (getDeletedDocs), providing a new Bits interface to
  directly query by doc ID.

-* LUCENE-2402: IndexWriter.deleteUnusedFiles now deletes unreferenced commit
-  points too. If you use an IndexDeletionPolicy which holds onto index commits
-  (such as SnapshotDeletionPolicy), you can call this method to remove those
-  commit points when they are not needed anymore (instead of waiting for the 
-  next commit). (Shai Erera)
-
-* LUCENE-2674: A new idfExplain method was added to Similarity, that
-  accepts an incoming docFreq.  If you subclass Similarity, make sure
-  you also override this method on upgrade.  (Robert Muir, Mike
-  McCandless)
-
 * LUCENE-2691: IndexWriter.getReader() has been made package local and is now
  exposed via open and reopen methods on IndexReader.  The semantics of the
  call is the same as it was prior to the API change.
@ -199,9 +163,6 @@ API Changes

 * LUCENE-2566: QueryParser: Unary operators +,-,! will not be treated as
  operators if they are followed by whitespace. (yonik)
-
-* LUCENE-2778: RAMDirectory now exposes newRAMFile() which allows to override
-  and return a different RAMFile implementation. (Shai Erera)
  
 * LUCENE-2831: Weight#scorer, Weight#explain, Filter#getDocIdSet,
  Collector#setNextReader & FieldComparator#setNextReader now expect an
@ -253,10 +214,6 @@ New features
  data and payloads in 5 separate files instead of the 2 used by
  standard codec), and int block (really a "base" for using
  block-based compressors like PForDelta for storing postings data).
-
-* LUCENE-2385: Moved NoDeletionPolicy from benchmark to core. NoDeletionPolicy
-  can be used to prevent commits from ever getting deleted from the index.
-  (Shai Erera)
  
 * LUCENE-1458, LUCENE-2111: The in-memory terms index used by standard
  codec is more RAM efficient: terms data is stored as block byte
@ -271,16 +228,6 @@ New features
  applications that have many unique terms, since it reduces how often
  a new segment must be flushed given a fixed RAM buffer size.

-* LUCENE-1585: IndexWriter now accepts a PayloadProcessorProvider which can 
-  return a DirPayloadProcessor for a given Directory, which returns a 
-  PayloadProcessor for a given Term. The PayloadProcessor will be used to 
-  process the payloads of the segments as they are merged (e.g. if one wants to
-  rewrite payloads of external indexes as they are added, or of local ones). 
-  (Shai Erera, Michael Busch, Mike McCandless)
-
-* LUCENE-2440: Add support for custom ExecutorService in
-  ParallelMultiSearcher (Edward Drapkin via Mike McCandless)
-
 * LUCENE-2489: Added PerFieldCodecWrapper (in oal.index.codecs) which
  lets you set the Codec per field (Mike McCandless)

@ -291,17 +238,6 @@ New features
  SegmentInfosReader to allow customization of SegmentInfos data.
  (Andrzej Bialecki)

-* LUCENE-2559: Added SegmentReader.reopen methods (John Wang via Mike
-  McCandless)
-
-* LUCENE-2590: Added Scorer.visitSubScorers, and Scorer.freq.  Along
-  with a custom Collector these experimental methods make it possible
-  to gather the hit-count per sub-clause and per document while a
-  search is running.  (Simon Willnauer, Mike McCandless)
-
-* LUCENE-2636: Added MultiCollector which allows running the search with several
-  Collectors. (Shai Erera)
-
 * LUCENE-2504: FieldComparator.setNextReader now returns a
  FieldComparator instance.  You can "return this", to just reuse the
  same instance, or you can return a comparator optimized to the new
@ -364,17 +300,6 @@ New features

 Optimizations

-* LUCENE-2410: ~20% speedup on exact (slop=0) PhraseQuery matching.
-  (Mike McCandless)
-
-* LUCENE-2531: Fix issue when sorting by a String field that was
-  causing too many fallbacks to compare-by-value (instead of by-ord).
-  (Mike McCandless)
-
-* LUCENE-2574: IndexInput exposes copyBytes(IndexOutput, long) to allow for 
-  efficient copying by sub-classes. Optimized copy is implemented for RAM and FS
-  streams. (Shai Erera)
-
 * LUCENE-2588: Don't store unnecessary suffixes when writing the terms
  index, saving RAM in IndexReader; change default terms index
  interval from 128 to 32, because the terms index now requires much
@ -389,11 +314,6 @@ Optimizations
  MultiTermQuery now stores TermState per leaf reader during rewrite to re-
  seek the term dictionary in TermQuery / TermWeight.
  (Simon Willnauer, Mike McCandless, Robert Muir)
-
-Documentation
-
-* LUCENE-2579: Fix oal.search's package.html description of abstract
-  methods.  (Santiago M. Mola via Mike McCandless)
  
 Bug fixes

@ -404,14 +324,6 @@ Bug fixes
  with more document deletions is requested before a reader with fewer
  deletions, provided they share some segments. (yonik)

-* LUCENE-2802: NRT DirectoryReader returned incorrect values from
-  getVersion, isOptimized, getCommitUserData, getIndexCommit and isCurrent due
-  to a mutable reference to the IndexWriters SegmentInfos. 
-  (Simon Willnauer, Earwin Burrfoot)
-
-* LUCENE-2860: Fixed SegmentInfo.sizeInBytes to factor includeDocStores when it 
-  decides whether to return the cached computed size or not. (Shai Erera)
-
 ======================= Lucene 3.x (not yet released) =======================

 Changes in backwards compatibility policy
@ -476,10 +388,33 @@ Changes in backwards compatibility policy
 * LUCENE-2733: Removed public constructors of utility classes with only static
  methods to prevent instantiation.  (Uwe Schindler)

-* LUCENE-2753: IndexReader and DirectoryReader .listCommits() now return a List
-  instead of a Collection, guaranteeing the commits are sorted from oldest to 
-  latest. (Shai Erera)
+* LUCENE-2602: The default (LogByteSizeMergePolicy) merge policy now
+  takes deletions into account by default.  You can disable this by
+  calling setCalibrateSizeByDeletes(false) on the merge policy.  (Mike
+  McCandless)
+
+* LUCENE-2529, LUCENE-2668: Position increment gap and offset gap of empty
+  values in multi-valued field has been changed for some cases in index.
+  If you index empty fields and uses positions/offsets information on that
+  fields, reindex is recommended. (David Smiley, Koji Sekiguchi)
  
+* LUCENE-2804: Directory.setLockFactory new declares throwing an IOException.
+  (Shai Erera, Robert Muir)
+  
+* LUCENE-2837: Added deprecations noting that in 4.0, Searcher and
+  Searchable are collapsed into IndexSearcher; contrib/remote and
+  MultiSearcher have been removed.  (Mike McCandless)
+
+* LUCENE-2854: Deprecated SimilarityDelegator and
+  Similarity.lengthNorm; the latter is now final, forcing any custom
+  Similarity impls to cutover to the more general computeNorm (Robert
+  Muir, Mike McCandless)
+
+* LUCENE-2674: A new idfExplain method was added to Similarity, that
+  accepts an incoming docFreq.  If you subclass Similarity, make sure
+  you also override this method on upgrade.  (Robert Muir, Mike
+  McCandless)
+
 Changes in runtime behavior

 * LUCENE-1923: Made IndexReader.toString() produce something
@ -495,7 +430,7 @@ Changes in runtime behavior
  invokes a merge on the incoming and target segments, but instead copies the
  segments to the target index. You can call maybeMerge or optimize after this
  method completes, if you need to.
-  
+
  In addition, Directory.copyTo* were removed in favor of copy which takes the
  target Directory, source and target files as arguments, and copies the source
  file to the target Directory under the target file name. (Shai Erera)
@ -512,6 +447,33 @@ Changes in runtime behavior
  merges). This means that you can run optimize() and too large segments won't 
  be merged. (Shai Erera)

+* LUCENE-2753: IndexReader and DirectoryReader .listCommits() now return a List,
+  guaranteeing the commits are sorted from oldest to latest. (Shai Erera)
+
+* LUCENE-2785: TopScoreDocCollector, TopFieldCollector and
+  the IndexSearcher search methods that take an int nDocs will now
+  throw IllegalArgumentException if nDocs is 0.  Instead, you should
+  use the newly added TotalHitCountCollector.  (Mike McCandless)
+  
+* LUCENE-2790: LogMergePolicy.useCompoundFile's logic now factors in noCFSRatio 
+  to determine whether the passed in segment should be compound. 
+  (Shai Erera, Earwin Burrfoot)
+
+* LUCENE-2805: IndexWriter now increments the index version on every change to
+  the index instead of for every commit. Committing or closing the IndexWriter
+  without any changes to the index will not cause any index version increment.
+  (Simon Willnauer, Mike McCandless)
+
+* LUCENE-2650, LUCENE-2825: The behavior of FSDirectory.open has changed. On 64-bit
+  Windows and Solaris systems that support unmapping, FSDirectory.open returns
+  MMapDirectory. Additionally the behavior of MMapDirectory has been
+  changed to enable unmapping by default if supported by the JRE.
+  (Mike McCandless, Uwe Schindler, Robert Muir)
+
+* LUCENE-2829: Improve the performance of "primary key" lookup use
+  case (running a TermQuery that matches one document) on a
+  multi-segment index.  (Robert Muir, Mike McCandless)
+  
 API Changes

 * LUCENE-2076: Rename FSDirectory.getFile -> getDirectory.  (George
@ -522,7 +484,7 @@ API Changes
  custom Similarity can alter how norms are encoded, though they must
  still be encoded as a single byte (Johan Kindgren via Mike
  McCandless)
-  
+
 * LUCENE-2103: NoLockFactory should have a private constructor;
  until Lucene 4.0 the default one will be deprecated.
  (Shai Erera via Uwe Schindler) 
@ -594,17 +556,42 @@ API Changes
  (such as SnapshotDeletionPolicy), you can call this method to remove those
  commit points when they are not needed anymore (instead of waiting for the 
  next commit). (Shai Erera)
-
-* LUCENE-2455: IndexWriter.addIndexesNoOptimize was renamed to addIndexes.
-  IndexFileNames.segmentFileName now takes another parameter to accommodate
-  custom file names. You should use this method to name all your files.
-  (Shai Erera)
  
 * LUCENE-2481: SnapshotDeletionPolicy.snapshot() and release() were replaced
  with equivalent ones that take a String (id) as argument. You can pass
  whatever ID you want, as long as you use the same one when calling both. 
  (Shai Erera)
  
+* LUCENE-2356: Add IndexWriterConfig.set/getReaderTermIndexDivisor, to
+  set what IndexWriter passes for termsIndexDivisor to the readers it
+  opens internally when apply deletions or creating a near-real-time
+  reader.  (Earwin Burrfoot via Mike McCandless)
+
+* LUCENE-2167,LUCENE-2699,LUCENE-2763,LUCENE-2847: StandardTokenizer/Analyzer
+  in common/standard/ now implement the Word Break rules from the Unicode 6.0.0
+  Text Segmentation algorithm (UAX#29), covering the full range of Unicode code
+  points, including values from U+FFFF to U+10FFFF
+   
+  ClassicTokenizer/Analyzer retains the old (pre-Lucene 3.1) StandardTokenizer/
+  Analyzer implementation and behavior.  Only the Unicode Basic Multilingual
+  Plane (code points from U+0000 to U+FFFF) is covered.
+
+  UAX29URLEmailTokenizer tokenizes URLs and E-mail addresses according to the
+  relevant RFCs, in addition to implementing the UAX#29 Word Break rules.
+  (Steven Rowe, Robert Muir, Uwe Schindler)
+   
+* LUCENE-2778: RAMDirectory now exposes newRAMFile() which allows to override
+  and return a different RAMFile implementation. (Shai Erera)
+  
+* LUCENE-2785: Added TotalHitCountCollector whose sole purpose is to
+  count the number of hits matching the query.  (Mike McCandless)
+
+* LUCENE-2846: Deprecated IndexReader.setNorm(int, String, float). This method 
+  is only syntactic sugar for setNorm(int, String, byte), but  using the global 
+  Similarity.getDefault().encodeNormValue().  Use the byte-based method instead 
+  to ensure that the norm is encoded with your Similarity.
+  (Robert Muir, Mike McCandless)
+
 Bug fixes

 * LUCENE-2249: ParallelMultiSearcher should shut down thread pool on
@ -625,10 +612,6 @@ Bug fixes
  a prior (corrupt) index missing its segments_N file.  (Mike
  McCandless)

-* LUCENE-2534: fix over-sharing bug in
-  MultiTermsEnum.docs/AndPositionsEnum.  (Robert Muir, Mike
-  McCandless)
-
 * LUCENE-2458: QueryParser no longer automatically forms phrase queries,
  assuming whitespace tokenization. Previously all CJK queries, for example,
  would be turned into phrase queries. The old behavior is preserved with
@ -647,7 +630,22 @@ Bug fixes
  can cause the same document to score to differently depending on
  what segment it resides in. (yonik)

-* LUCENE-2272: Fix explain in PayloadNearQuery and also fix scoring issue (Peter Keegan via Grant Ingersoll)  
+* LUCENE-2272: Fix explain in PayloadNearQuery and also fix scoring issue (Peter Keegan via Grant Ingersoll)
+
+* LUCENE-2732: Fix charset problems in XML loading in
+  HyphenationCompoundWordTokenFilter.  (Uwe Schindler)
+
+* LUCENE-2802: NRT DirectoryReader returned incorrect values from
+  getVersion, isOptimized, getCommitUserData, getIndexCommit and isCurrent due
+  to a mutable reference to the IndexWriters SegmentInfos. 
+  (Simon Willnauer, Earwin Burrfoot)
+
+* LUCENE-2852: Fixed corner case in RAMInputStream that would hit a
+  false EOF after seeking to EOF then seeking back to same block you
+  were just in and then calling readBytes (Robert Muir, Mike McCandless)
+
+* LUCENE-2860: Fixed SegmentInfo.sizeInBytes to factor includeDocStores when it 
+  decides whether to return the cached computed size or not. (Shai Erera)

 New features

@ -720,6 +718,16 @@ New features
  can be used to prevent commits from ever getting deleted from the index.
  (Shai Erera)
  
+* LUCENE-1585: IndexWriter now accepts a PayloadProcessorProvider which can 
+  return a DirPayloadProcessor for a given Directory, which returns a 
+  PayloadProcessor for a given Term. The PayloadProcessor will be used to 
+  process the payloads of the segments as they are merged (e.g. if one wants to
+  rewrite payloads of external indexes as they are added, or of local ones). 
+  (Shai Erera, Michael Busch, Mike McCandless)
+
+* LUCENE-2440: Add support for custom ExecutorService in
+  ParallelMultiSearcher (Edward Drapkin via Mike McCandless)
+
 * LUCENE-2295: Added a LimitTokenCountAnalyzer / LimitTokenCountFilter
  to wrap any other Analyzer and provide the same functionality as
  MaxFieldLength provided on IndexWriter.  This patch also fixes a bug
@ -727,9 +735,17 @@ New features

 * LUCENE-2526: Don't throw NPE from MultiPhraseQuery.toString when
  it's empty.  (Ross Woolf via Mike McCandless)
+  
+* LUCENE-2559: Added SegmentReader.reopen methods (John Wang via Mike
+  McCandless)

-* LUCENE-2671: Add SortField.setMissingValue( v ) to enable sorting
-  behavior for documents that do not include the given field. (ryan)
+* LUCENE-2590: Added Scorer.visitSubScorers, and Scorer.freq.  Along
+  with a custom Collector these experimental methods make it possible
+  to gather the hit-count per sub-clause and per document while a
+  search is running.  (Simon Willnauer, Mike McCandless)
+
+* LUCENE-2636: Added MultiCollector which allows running the search with several
+  Collectors. (Shai Erera)

 * LUCENE-2754, LUCENE-2757: Added a wrapper around MultiTermQueries
  to add span support: SpanMultiTermQueryWrapper<Q extends MultiTermQuery>.
@ -748,6 +764,9 @@ New features
  
 Optimizations

+* LUCENE-2494: Use CompletionService in ParallelMultiSearcher instead of
+  simple polling for results. (Edward Drapkin, Simon Willnauer)
+
 * LUCENE-2075: Terms dict cache is now shared across threads instead
  of being stored separately in thread local storage.  Also fixed
  terms dict so that the cache is used when seeking the thread local
@ -810,6 +829,17 @@ Optimizations
  (getStrings, getStringIndex), consume quite a bit less RAM in most
  cases.  (Mike McCandless)

+* LUCENE-2410: ~20% speedup on exact (slop=0) PhraseQuery matching.
+  (Mike McCandless)
+
+* LUCENE-2531: Fix issue when sorting by a String field that was
+  causing too many fallbacks to compare-by-value (instead of by-ord).
+  (Mike McCandless)
+
+* LUCENE-2574: IndexInput exposes copyBytes(IndexOutput, long) to allow for 
+  efficient copying by sub-classes. Optimized copy is implemented for RAM and FS
+  streams. (Shai Erera)
+
 * LUCENE-2719: Improved TermsHashPerField's sorting to use a better
  quick sort algorithm that dereferences the pivot element not on
  every compare call. Also replaced lots of sorting code in Lucene
@ -889,6 +919,18 @@ Test Cases
  as Eclipse and IntelliJ.
  (Paolo Castagna, Steven Rowe via Robert Muir)

+* LUCENE-2804: add newFSDirectory to LuceneTestCase to create a FSDirectory at
+  random. (Shai Erera, Robert Muir)
+  
+Documentation
+
+* LUCENE-2579: Fix oal.search's package.html description of abstract
+  methods.  (Santiago M. Mola via Mike McCandless)
+   
+* LUCENE-2625: Add a note to IndexReader.termDocs() with additional verbiage
+  that the TermEnum must be seeked since it is unpositioned.
+  (Adriano Crestani via Robert Muir)
+  
 ================== Release 2.9.4 / 3.0.3 2010-12-03 ====================

 Changes in runtime behavior