mirror of https://github.com/apache/lucene.git
split changes for 3.x and trunk
git-svn-id: https://svn.apache.org/repos/asf/lucene/dev/trunk@940818 13f79535-47bb-0310-9956-ffa450edef68
parent cd320b5f57
commit 20a0dc280d

@@ -70,6 +70,118 @@ Changes in backwards compatibility policy
  TermAttribute/CharTermAttribute. If you want to further filter
  or attach Payloads to NTS, use the new NumericTermAttribute.

* LUCENE-2386: IndexWriter no longer performs an empty commit upon new index
  creation. Previously, if you passed an empty Directory and set OpenMode to
  CREATE*, IndexWriter would make a first empty commit. If you need that
  behavior you can call writer.commit()/close() immediately after you create it.
  (Shai Erera, Mike McCandless)
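
  A minimal sketch of the new pattern (imports omitted), using the
  IndexWriterConfig-based constructor; the path and analyzer are illustrative:

    Directory dir = FSDirectory.open(new File("/path/to/index"));
    IndexWriterConfig conf = new IndexWriterConfig(Version.LUCENE_CURRENT,
        new StandardAnalyzer(Version.LUCENE_CURRENT));
    conf.setOpenMode(IndexWriterConfig.OpenMode.CREATE);
    IndexWriter writer = new IndexWriter(dir, conf);
    writer.commit();   // explicitly recreate the old "empty first commit" behavior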

* LUCENE-2316: Directory.fileLength contract was clarified - it returns the
  actual file's length if the file exists, and throws FileNotFoundException
  otherwise. Returning length=0 for a non-existent file is no longer allowed. If
  you relied on that, make sure to catch the exception. (Shai Erera)
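
  A hedged sketch of the defensive pattern the entry suggests; dir is any
  Directory and the file name is made up:

    long length;
    try {
      length = dir.fileLength("_1.cfs");
    } catch (FileNotFoundException fnfe) {
      length = 0;   // emulate the old lenient behavior if you depended on it
    }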

* LUCENE-2265: FuzzyQuery and WildcardQuery now operate on Unicode code points,
  not Unicode code units. For example, a wildcard "?" represents any Unicode
  character. Furthermore, the rest of the automaton package and RegexpQuery use
  true Unicode code point representation. (Robert Muir, Mike McCandless)
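
  For illustration (field and pattern are made up), "?" now matches exactly one
  code point, including supplementary characters outside the BMP:

    Query q = new WildcardQuery(new Term("body", "te?t"));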

Changes in runtime behavior

* LUCENE-2421: NativeFSLockFactory does not throw LockReleaseFailedException if
  it cannot delete the lock file, since obtaining the lock does not fail if the
  file is there. (Shai Erera)

API Changes

* LUCENE-1458, LUCENE-2111: The postings APIs (TermEnum, TermDocsEnum,
  TermPositionsEnum) have been deprecated in favor of the new flexible
  indexing (flex) APIs (Fields, FieldsEnum, Terms, TermsEnum,
  DocsEnum, DocsAndPositionsEnum). One big difference is that fields
  and terms are now enumerated separately: a TermsEnum provides a
  BytesRef (wraps a byte[]) per term within a single field, not a
  Term. Another is that when asking for a Docs/AndPositionsEnum, you
  now specify the skipDocs explicitly (typically this will be the
  deleted docs, but in general you can provide any Bits).
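
  A rough sketch of term/doc enumeration under the flex APIs (imports omitted);
  exact method signatures were still evolving on trunk, so treat this as
  illustrative rather than definitive:

    Fields fields = reader.fields();
    Terms terms = fields.terms("body");          // one field at a time
    TermsEnum termsEnum = terms.iterator();
    BytesRef term;
    while ((term = termsEnum.next()) != null) {  // terms come back as byte[] wrappers
      DocsEnum docs = termsEnum.docs(reader.getDeletedDocs(), null);  // skipDocs passed explicitly
      while (docs.nextDoc() != DocsEnum.NO_MORE_DOCS) {
        // docs.docID(), docs.freq(), ...
      }
    }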

* LUCENE-1458, LUCENE-2111: IndexReader now directly exposes its
  deleted docs (getDeletedDocs), providing a new Bits interface to
  directly query by doc ID.
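
  For example (assuming getDeletedDocs returns null when the reader has no
  deletions):

    Bits delDocs = reader.getDeletedDocs();
    boolean deleted = delDocs != null && delDocs.get(docID);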

* LUCENE-2402: IndexWriter.deleteUnusedFiles now deletes unreferenced commit
  points too. If you use an IndexDeletionPolicy which holds onto index commits
  (such as SnapshotDeletionPolicy), you can call this method to remove those
  commit points when they are not needed anymore (instead of waiting for the
  next commit). (Shai Erera)
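
  An illustrative sketch using the single-snapshot SnapshotDeletionPolicy API of
  this era; method names may differ in your version:

    SnapshotDeletionPolicy sdp =
        new SnapshotDeletionPolicy(new KeepOnlyLastCommitDeletionPolicy());
    // ... create the IndexWriter with sdp, take a snapshot, copy the backup ...
    sdp.release();                  // snapshot no longer needed
    writer.deleteUnusedFiles();     // drop the now-unreferenced commit point right away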

New features

* LUCENE-1606, LUCENE-2089: Adds AutomatonQuery, a MultiTermQuery that
  matches terms against a finite-state machine. Implements WildcardQuery
  and FuzzyQuery with finite-state methods. Adds RegexpQuery.
  (Robert Muir, Mike McCandless, Uwe Schindler, Mark Miller)
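
  For example (field and expression are made up), RegexpQuery compiles the
  regular expression into an automaton that is intersected with the terms
  dictionary:

    Query q = new RegexpQuery(new Term("body", "lu.*ene"));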

* LUCENE-1990: Adds internal packed ints implementation, to be used
  for more efficient storage of int arrays when the values are
  bounded, for example for storing the terms dict index. (Toke
  Eskildsen via Mike McCandless)
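
  A sketch of the internal API (signatures approximate, since this is an
  internal utility): values are packed into the minimum number of bits rather
  than full 32-bit ints.

    int bitsPerValue = PackedInts.bitsRequired(9999);              // 14 bits
    PackedInts.Mutable packed = PackedInts.getMutable(1000, bitsPerValue);
    packed.set(0, 42);
    long v = packed.get(0);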

* LUCENE-2321: Cutover to a more RAM efficient packed-ints based
  representation for the in-memory terms dict index. (Mike
  McCandless)

* LUCENE-2126: Add new classes for data (de)serialization: DataInput
  and DataOutput. IndexInput and IndexOutput extend these new classes.
  (Michael Busch)
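
  A small sketch (file name made up; Directory method signatures as of this
  era): the primitive read/write methods now live on DataOutput/DataInput,
  which IndexOutput/IndexInput inherit.

    IndexOutput out = dir.createOutput("demo.bin");
    out.writeVInt(1234);
    out.writeString("hello");
    out.close();

    IndexInput in = dir.openInput("demo.bin");
    int value = in.readVInt();
    String s = in.readString();
    in.close();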

* LUCENE-1458, LUCENE-2111: With flexible indexing it is now possible
  for an application to create its own postings codec, to alter how
  fields, terms, docs and positions are encoded into the index. The
  standard codec is the default codec. Both IndexWriter and
  IndexReader accept a CodecProvider class to obtain codecs for newly
  written segments as well as existing segments opened for reading.

* LUCENE-1458, LUCENE-2111: Some experimental codecs have been added
  for flexible indexing, including pulsing codec (inlines
  low-frequency terms directly into the terms dict, avoiding seeking
  for some queries), sep codec (stores docs, freqs, positions, skip
  data and payloads in 5 separate files instead of the 2 used by
  standard codec), and int block (really a "base" for using
  block-based compressors like PForDelta for storing postings data).

* LUCENE-2302, LUCENE-1458, LUCENE-2111: Terms are no longer required
  to be character based. Lucene views a term as an arbitrary byte[]:
  during analysis, character-based terms are converted to UTF8 byte[],
  but analyzers are free to directly create terms as byte[]
  (NumericField does this, for example). The term data is buffered as
  byte[] during indexing, written as byte[] into the terms dictionary,
  and iterated as byte[] (wrapped in a BytesRef) by IndexReader for
  searching.
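
  For illustration, BytesRef is just a slice (bytes/offset/length) over a
  byte[]; the constructor shown here UTF-8 encodes a String, and utf8ToString()
  is only meaningful when the bytes really are UTF-8:

    BytesRef term = new BytesRef("kilimanjaro");
    byte[] raw = term.bytes;            // term.offset / term.length delimit the slice
    String back = term.utf8ToString();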

* LUCENE-2385: Moved NoDeletionPolicy from benchmark to core. NoDeletionPolicy
  can be used to prevent commits from ever getting deleted from the index.
  (Shai Erera)
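
  A minimal sketch (analyzer and Version are illustrative) of wiring
  NoDeletionPolicy in through IndexWriterConfig:

    IndexWriterConfig conf = new IndexWriterConfig(Version.LUCENE_CURRENT,
        new StandardAnalyzer(Version.LUCENE_CURRENT));
    conf.setIndexDeletionPolicy(NoDeletionPolicy.INSTANCE);
    IndexWriter writer = new IndexWriter(dir, conf);   // every commit is now kept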

* LUCENE-1458, LUCENE-2111: The in-memory terms index used by standard
  codec is more RAM efficient: terms data is stored as block byte
  arrays and packed integers. Net RAM reduction for indexes that have
  many unique terms should be substantial, and initial open time for
  IndexReaders should be faster. These gains only apply for newly
  written segments after upgrading.

* LUCENE-1458, LUCENE-2111: Terms data are now buffered directly as
  byte[] during indexing, which uses half the RAM for ASCII terms (and
  also numeric fields). This can improve indexing throughput for
  applications that have many unique terms, since it reduces how often
  a new segment must be flushed given a fixed RAM buffer size.

* LUCENE-2398: Improve tests to work better from IDEs such as Eclipse.
  (Paolo Castagna via Robert Muir)

======================= Lucene 3.x (not yet released) =======================

Changes in backwards compatibility policy

* LUCENE-1483: Removed utility class oal.util.SorterTemplate; this
  class is no longer used by Lucene. (Gunnar Wagenknecht via Mike
  McCandless)

@@ -117,22 +229,6 @@ Changes in backwards compatibility policy
  of incrementToken(), tokenStream(), and reusableTokenStream().
  (Uwe Schindler, Robert Muir)

* LUCENE-2386: IndexWriter no longer performs an empty commit upon new index
  creation. Previously, if you passed an empty Directory and set OpenMode to
  CREATE*, IndexWriter would make a first empty commit. If you need that
  behavior you can call writer.commit()/close() immediately after you create it.
  (Shai Erera, Mike McCandless)

* LUCENE-2316: Directory.fileLength contract was clarified - it returns the
  actual file's length if the file exists, and throws FileNotFoundException
  otherwise. Returning length=0 for a non-existent file is no longer allowed. If
  you relied on that, make sure to catch the exception. (Shai Erera)

* LUCENE-2265: FuzzyQuery and WildcardQuery now operate on Unicode code points,
  not Unicode code units. For example, a wildcard "?" represents any Unicode
  character. Furthermore, the rest of the automaton package and RegexpQuery use
  true Unicode code point representation. (Robert Muir, Mike McCandless)

Changes in runtime behavior

* LUCENE-1923: Made IndexReader.toString() produce something

@@ -141,10 +237,6 @@ Changes in runtime behavior
* LUCENE-2179: CharArraySet.clear() is now functional.
  (Robert Muir, Uwe Schindler)

* LUCENE-2421: NativeFSLockFactory does not throw LockReleaseFailedException if
  it cannot delete the lock file, since obtaining the lock does not fail if the
  file is there. (Shai Erera)

API Changes

* LUCENE-2076: Rename FSDirectory.getFile -> getDirectory. (George

@@ -215,20 +307,6 @@ API Changes
  FSDirectory to see a sample of what such tracking might look like, if needed
  in your custom Directories. (Earwin Burrfoot via Mike McCandless)

* LUCENE-1458, LUCENE-2111: The postings APIs (TermEnum, TermDocsEnum,
  TermPositionsEnum) have been deprecated in favor of the new flexible
  indexing (flex) APIs (Fields, FieldsEnum, Terms, TermsEnum,
  DocsEnum, DocsAndPositionsEnum). One big difference is that fields
  and terms are now enumerated separately: a TermsEnum provides a
  BytesRef (wraps a byte[]) per term within a single field, not a
  Term. Another is that when asking for a Docs/AndPositionsEnum, you
  now specify the skipDocs explicitly (typically this will be the
  deleted docs, but in general you can provide any Bits).

* LUCENE-1458, LUCENE-2111: IndexReader now directly exposes its
  deleted docs (getDeletedDocs), providing a new Bits interface to
  directly query by doc ID.

* LUCENE-2302: Deprecated TermAttribute and replaced by a new
  CharTermAttribute. The change is backwards compatible, so
  mixed new/old TokenStreams all work on the same char[] buffer

@@ -240,12 +318,6 @@ API Changes
  expressions).
  (Uwe Schindler, Robert Muir)

* LUCENE-2402: IndexWriter.deleteUnusedFiles now deletes unreferenced commit
  points too. If you use an IndexDeletionPolicy which holds onto index commits
  (such as SnapshotDeletionPolicy), you can call this method to remove those
  commit points when they are not needed anymore (instead of waiting for the
  next commit). (Shai Erera)

Bug fixes

* LUCENE-2119: Don't throw NegativeArraySizeException if you pass

@@ -290,28 +362,9 @@ Bug fixes
  is fixed to return null if there are no segments. (Karthick
  Sankarachary via Mike McCandless)

* LUCENE-2222: FixedIntBlockIndexInput incorrectly read one block of
  0s before the actual data. (Renaud Delbru via Mike McCandless)

* LUCENE-2344: PostingsConsumer.merge was failing to call finishDoc,
  which caused corruption for sep codec. Also fixed several tests to
  test all 4 core codecs. (Renaud Delbru via Mike McCandless)

* LUCENE-2074: Reduce buffer size of lexer back to default on reset.
  (Ruben Laguna, Shai Erera via Uwe Schindler)

* LUCENE-2387: Don't hang onto Fieldables from the last doc indexed,
  in IndexWriter, nor the Reader in Tokenizer after close is
  called. (Ruben Laguna, Uwe Schindler, Mike McCandless)

* LUCENE-2417: IndexCommit did not implement hashCode() and equals()
  consistently. Now they both take Directory and version into consideration. In
  addition, all of IndexCommit's methods which threw
  UnsupportedOperationException are now abstract. (Shai Erera)

* LUCENE-2424: Fix FieldDoc.toString to actually return its fields.
  (Stephen Green via Mike McCandless)

New features

* LUCENE-2128: Parallelized fetching document frequencies during weight

@@ -376,52 +429,6 @@ New features
  files between FSDirectory instances. (Earwin Burrfoot via Mike
  McCandless).

* LUCENE-1606, LUCENE-2089: Adds AutomatonQuery, a MultiTermQuery that
  matches terms against a finite-state machine. Implements WildcardQuery
  and FuzzyQuery with finite-state methods. Adds RegexpQuery.
  (Robert Muir, Mike McCandless, Uwe Schindler, Mark Miller)

* LUCENE-1990: Adds internal packed ints implementation, to be used
  for more efficient storage of int arrays when the values are
  bounded, for example for storing the terms dict index. (Toke
  Eskildsen via Mike McCandless)

* LUCENE-2321: Cutover to a more RAM efficient packed-ints based
  representation for the in-memory terms dict index. (Mike
  McCandless)

* LUCENE-2126: Add new classes for data (de)serialization: DataInput
  and DataOutput. IndexInput and IndexOutput extend these new classes.
  (Michael Busch)

* LUCENE-1458, LUCENE-2111: With flexible indexing it is now possible
  for an application to create its own postings codec, to alter how
  fields, terms, docs and positions are encoded into the index. The
  standard codec is the default codec. Both IndexWriter and
  IndexReader accept a CodecProvider class to obtain codecs for newly
  written segments as well as existing segments opened for reading.

* LUCENE-1458, LUCENE-2111: Some experimental codecs have been added
  for flexible indexing, including pulsing codec (inlines
  low-frequency terms directly into the terms dict, avoiding seeking
  for some queries), sep codec (stores docs, freqs, positions, skip
  data and payloads in 5 separate files instead of the 2 used by
  standard codec), and int block (really a "base" for using
  block-based compressors like PForDelta for storing postings data).

* LUCENE-2302, LUCENE-1458, LUCENE-2111: Terms are no longer required
  to be character based. Lucene views a term as an arbitrary byte[]:
  during analysis, character-based terms are converted to UTF8 byte[],
  but analyzers are free to directly create terms as byte[]
  (NumericField does this, for example). The term data is buffered as
  byte[] during indexing, written as byte[] into the terms dictionary,
  and iterated as byte[] (wrapped in a BytesRef) by IndexReader for
  searching.

* LUCENE-2385: Moved NoDeletionPolicy from benchmark to core. NoDeletionPolicy
  can be used to prevent commits from ever getting deleted from the index.
  (Shai Erera)

* LUCENE-2074: Make StandardTokenizer fit for Unicode 4.0, if the
  matchVersion parameter is Version.LUCENE_31. (Uwe Schindler)

@@ -492,24 +499,12 @@ Optimizations
  because then it will make sense to make the RAM buffers as large as
  possible. (Mike McCandless, Michael Busch)

* LUCENE-1458, LUCENE-2111: The in-memory terms index used by standard
  codec is more RAM efficient: terms data is stored as block byte
  arrays and packed integers. Net RAM reduction for indexes that have
  many unique terms should be substantial, and initial open time for
  IndexReaders should be faster. These gains only apply for newly
  written segments after upgrading.

* LUCENE-1458, LUCENE-2111: Terms data are now buffered directly as
  byte[] during indexing, which uses half the RAM for ASCII terms (and
  also numeric fields). This can improve indexing throughput for
  applications that have many unique terms, since it reduces how often
  a new segment must be flushed given a fixed RAM buffer size.

Build

* LUCENE-2124: Moved the JDK-based collation support from contrib/collation
  into core, and moved the ICU-based collation support into contrib/icu.
  (Robert Muir)

* LUCENE-2326: Removed SVN checkouts for backwards tests. The backwards
  branch is now included in the svn repository using "svn copy"

@@ -518,12 +513,6 @@ Build
* LUCENE-2074: Regenerating StandardTokenizerImpl files now needs
  JFlex 1.5 (currently only available on SVN). (Uwe Schindler)

* LUCENE-1709: Tests are now parallelized by default (except for benchmark). You
  can force them to run sequentially by passing -Drunsequential=1 on the command
  line. The number of threads that are spawned per CPU defaults to '1'. If you
  wish to change that, you can run the tests with -DthreadsPerProcessor=[num].
  (Robert Muir, Shai Erera, Peter Kofler)

Test Cases

* LUCENE-2037: Allow Junit4 tests in our environment (Erick Erickson

@@ -558,9 +547,6 @@ Test Cases
  access to "real" files from the test folder itself, can use
  LuceneTestCase(J4).getDataFile(). (Uwe Schindler)

* LUCENE-2398: Improve tests to work better from IDEs such as Eclipse.
  (Paolo Castagna via Robert Muir)

================== Release 2.9.2 / 3.0.1 2010-02-26 ====================

Changes in backwards compatibility policy