LUCENE-3733: Update documentation, changes,...

git-svn-id: https://svn.apache.org/repos/asf/lucene/dev/trunk@1243048 13f79535-47bb-0310-9956-ffa450edef68
Uwe Schindler 2012-02-11 12:38:41 +00:00
parent cdc2173e09
commit 0095bd10fd
4 changed files with 99 additions and 44 deletions


@ -78,6 +78,19 @@ Changes in backwards compatibility policy
(Mike McCandless, Robert Muir, Uwe Schindler, Mark Miller, Michael Busch)
* LUCENE-2858, LUCENE-3733: IndexReader was refactored into abstract
AtomicReader, CompositeReader, and DirectoryReader. To open Directory-
based indexes use DirectoryReader.open(); the corresponding method in
IndexReader is now deprecated for easier migration. Only DirectoryReader
supports commits, versions, and reopening with openIfChanged(). Terms,
postings, docvalues, and norms can from now on only be retrieved using
AtomicReader; DirectoryReader and MultiReader extend CompositeReader,
only offering stored fields and access to the sub-readers (which may be
composite or atomic). SlowCompositeReaderWrapper (LUCENE-2597) can be
used to emulate atomic readers on top of composites.
Please review MIGRATE.txt for information on how to migrate old code.
(Uwe Schindler, Robert Muir, Mike McCandless)
* LUCENE-2265: FuzzyQuery and WildcardQuery now operate on Unicode codepoints,
not unicode code units. For example, a Wildcard "?" represents any unicode
character. Furthermore, the rest of the automaton package and RegexpQuery use
@ -98,7 +111,7 @@ Changes in backwards compatibility policy
and TermToBytesRefAttribute instead. (Uwe Schindler)
* LUCENE-2600: Remove IndexReader.isDeleted in favor of
IndexReader.getDeletedDocs(). (Mike McCandless) AtomicReader.getDeletedDocs(). (Mike McCandless)
* LUCENE-2667: FuzzyQuery's defaults have changed for more performant
behavior: the minimum similarity is 2 edit distances from the word,
@ -117,10 +130,6 @@ Changes in backwards compatibility policy
need to change it (e.g. using "\\" to escape '\' itself).
(Sunil Kamath, Terry Yang via Robert Muir)
* LUCENE-2771: IndexReader.norms() now throws UOE on non-atomic IndexReaders. If
you really want a top-level norms, use MultiNorms or SlowMultiReaderWrapper.
(Uwe Schindler, Robert Muir)
* LUCENE-2837: Collapsed Searcher, Searchable into IndexSearcher;
removed contrib/remote and MultiSearcher (Mike McCandless); absorbed
ParallelMultiSearcher into IndexSearcher as an optional
@ -189,9 +198,9 @@ Changes in backwards compatibility policy
with the old tokenStream() method removed. Consequently it is now mandatory
for all Analyzers to support reusability. (Chris Male)
* LUCENE-3473: IndexReader.getUniqueTermCount() no longer throws UOE when * LUCENE-3473: AtomicReader.getUniqueTermCount() no longer throws UOE when
it cannot be easily determined (e.g. Multi*Readers). Instead, it returns it cannot be easily determined. Instead, it returns -1 to be consistent with
-1 to be consistent with this behavior across other index statistics. this behavior across other index statistics.
(Robert Muir)
* LUCENE-1536: The abstract FilteredDocIdSet.match() method is no longer
@ -207,18 +216,18 @@ Changes in backwards compatibility policy
* LUCENE-3533: Removed SpanFilters, they created large lists of objects and
did not scale. (Robert Muir)
* LUCENE-3606: IndexReader was made read-only. It is no longer possible to * LUCENE-3606: IndexReader and subclasses were made read-only. It is no longer
delete or undelete documents using IndexReader; you have to use IndexWriter possible to delete or undelete documents using IndexReader; you have to use
now. As deleting by internal Lucene docID is no longer possible, this IndexWriter now. As deleting by internal Lucene docID is no longer possible,
requires adding a unique identifier field to your index. Deleting/relying this requires adding a unique identifier field to your index. Deleting/
upon Lucene docIDs is not recommended anyway, because they can change. relying upon Lucene docIDs is not recommended anyway, because they can
Consequently commit() was removed and IndexReader.open(), openIfChanged(), change. Consequently commit() was removed and DirectoryReader.open(),
and clone() no longer take readOnly booleans or IndexDeletionPolicy openIfChanged() no longer take readOnly booleans or IndexDeletionPolicy
instances. Furthermore, IndexReader.setNorm() was removed. If you need
customized norm values, the recommended way to do this is by modifying
Similarity to use an external byte[] or one of the new DocValues
fields (LUCENE-3108). Alternatively, to dynamically change norms (boost
*and* length norm) at query time, wrap your IndexReader using *and* length norm) at query time, wrap your AtomicReader using
FilterIndexReader, overriding FilterIndexReader.norms(). To persist the
changes on disk, copy the FilteredIndexReader to a new index using
IndexWriter.addIndexes(). (Uwe Schindler, Robert Muir)
@ -231,13 +240,10 @@ Changes in backwards compatibility policy
FieldInfo.IndexOption: DOCS_AND_POSITIONS_AND_OFFSETS. (Robert
Muir, Mike McCandless)
* LUCENE-3646: FieldCacheImpl now throws UOE on non-atomic IndexReaders. If * LUCENE-2858: FilterIndexReader now extends AtomicReader. If you want to
you really want a top-level fieldcache, use SlowMultiReaderWrapper. filter composite readers like DirectoryReader or MultiReader, filter
(Robert Muir) their atomic leaves and build a new CompositeReader (e.g. MultiReader)
around them. (Uwe Schindler, Robert Muir)
* LUCENE-2858, LUCENE-3733: IndexReader was refactored into abstract
AtomicReader, CompositeReader, and DirectoryReader. TODO:add more info
(Uwe Schindler, Mike McCandless, Robert Muir)
* LUCENE-3736: ParallelReader was split into ParallelAtomicReader
and ParallelCompositeReader. Lucene 3.x's ParallelReader is now
@ -335,9 +341,6 @@ Changes in Runtime Behavior
(Mike McCandless, Michael Busch, Simon Willnauer)
* LUCENE-3146: IndexReader.setNorm throws IllegalStateException if the field
does not store norms. (Shai Erera, Mike McCandless)
* LUCENE-3309: Stored fields no longer record whether they were
tokenized or not. In general you should not rely on stored fields
to record any "metadata" from indexing (tokenized, omitNorms,
@ -354,12 +357,12 @@ API Changes
and iterated as byte[] (wrapped in a BytesRef) by IndexReader for
searching.
* LUCENE-1458, LUCENE-2111: IndexReader now directly exposes its * LUCENE-1458, LUCENE-2111: AtomicReader now directly exposes its
deleted docs (getDeletedDocs), providing a new Bits interface to
directly query by doc ID.
* LUCENE-2691: IndexWriter.getReader() has been made package local and is now
exposed via open and reopen methods on IndexReader. The semantics of the exposed via open and reopen methods on DirectoryReader. The semantics of the
call is the same as it was prior to the API change.
(Grant Ingersoll, Mike McCandless)
@ -370,14 +373,6 @@ API Changes
Collector#setNextReader & FieldComparator#setNextReader now expect an
AtomicReaderContext instead of an IndexReader. (Simon Willnauer)
* LUCENE-2846: Remove the deprecated IndexReader.setNorm(int, String, float).
This method was only syntactic sugar for setNorm(int, String, byte), but
using the global Similarity.getDefault().encodeNormValue. Use the byte-based
method instead to ensure that the norm is encoded with your Similarity.
Also removed norms(String, byte[], int), which was only used by MultiReader
for building top-level norms. If you really need a top-level norms, use
MultiNorms or SlowMultiReaderWrapper. (Robert Muir, Mike Mccandless)
* LUCENE-2892: Add QueryParser.newFieldQuery (called by getFieldQuery by
default) which takes Analyzer as a parameter, for easier customization by
subclasses. (Robert Muir)
@ -485,7 +480,7 @@ New features
Sep*, makes it simple to take any variable sized int block coders
(like Simple9/16) and use them in a codec. (Mike McCandless)
* LUCENE-2597: Add oal.index.SlowMultiReaderWrapper, to wrap a * LUCENE-2597: Add oal.index.SlowCompositeReaderWrapper, to wrap a
composite reader (eg MultiReader or DirectoryReader), making it
pretend it's an atomic reader. This is a convenience class (you can
use MultiFields static methods directly, instead) if you need to use
@ -621,7 +616,7 @@ New features
* LUCENE-1536: Filters can now be applied down-low, if their DocIdSet implements
a new bits() method, returning all documents in a random access way. If the
DocIdSet is not too sparse, it will be passed as acceptDocs down to the Scorer
as replacement for IndexReader's live docs. as replacement for AtomicReader's live docs.
In addition, FilteredQuery backs now IndexSearcher's filtering search methods.
Using FilteredQuery you can chain Filters in a very performant way
[new FilteredQuery(new FilteredQuery(query, filter1), filter2)], which was not
@ -635,7 +630,7 @@ New features
load only certain fields when loading a document. (Peter Chang via
Mike McCandless)
* LUCENE-3628: Norms are represented as DocValues. IndexReader exposes * LUCENE-3628: Norms are represented as DocValues. AtomicReader exposes
a #normValues(String) method to obtain norms per field. (Simon Willnauer)
* LUCENE-3687: Similarity#computeNorm(FieldInvertState, Norm) allows to compute


@ -278,6 +278,66 @@ LUCENE-1458, LUCENE-2111: Flexible Indexing
// document is deleted...
}
* LUCENE-2858, LUCENE-3733: The abstract class IndexReader has been
refactored to expose only essential methods to access stored fields
during display of search results. It is no longer possible to retrieve
terms or postings data from the underlying index; not even deletions are
visible anymore. You can still pass IndexReader as a constructor parameter
to IndexSearcher and execute your searches; Lucene will automatically
delegate procedures like query rewriting and document collection to the
atomic subreaders.
If you want to dive deeper into the index and want to write your own
queries, take a closer look at the new abstract sub-classes AtomicReader
and CompositeReader:
AtomicReader instances are now the only source of Terms, Postings,
DocValues and FieldCache. Queries are forced to execute on an
AtomicReader on a per-segment basis and FieldCaches are keyed by
AtomicReaders.
Its counterpart CompositeReader exposes a utility method to retrieve
its sub-readers. But watch out, sub-readers are not necessarily atomic.
In addition to the added type-safety, we also removed the notion of
index-commits and version numbers from the abstract IndexReader; the
associations with IndexWriter were pulled into a specialized
DirectoryReader. To open Directory-based indexes use
DirectoryReader.open(); the corresponding method in IndexReader is now
deprecated for easier migration. Only DirectoryReader supports commits,
versions, and reopening with openIfChanged(). Terms, postings,
docvalues, and norms can from now on only be retrieved using
AtomicReader; DirectoryReader and MultiReader extend CompositeReader,
only offering stored fields and access to the sub-readers (which may be
composite or atomic).
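Because sub-readers may themselves be composite, code that needs the atomic leaves has to walk the hierarchy recursively. The following is only a minimal sketch of that walk using JDK classes: ToyReader, ToyAtomicReader, ToyCompositeReader and LeafWalker are hypothetical stand-ins invented for illustration and are not Lucene classes or methods.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical stand-ins for the reader hierarchy; these are NOT Lucene classes.
abstract class ToyReader {
    final String name;

    ToyReader(String name) {
        this.name = name;
    }
}

// A leaf: in the real hierarchy only this kind of reader serves terms and postings.
class ToyAtomicReader extends ToyReader {
    ToyAtomicReader(String name) {
        super(name);
    }
}

// A composite only knows its sub-readers, which may themselves be composite.
class ToyCompositeReader extends ToyReader {
    final List<ToyReader> subReaders;

    ToyCompositeReader(String name, ToyReader... subReaders) {
        super(name);
        this.subReaders = Arrays.asList(subReaders);
    }
}

class LeafWalker {
    // Depth-first walk that gathers the atomic leaves in their original order.
    static List<ToyAtomicReader> leaves(ToyReader reader) {
        List<ToyAtomicReader> out = new ArrayList<>();
        collect(reader, out);
        return out;
    }

    private static void collect(ToyReader reader, List<ToyAtomicReader> out) {
        if (reader instanceof ToyAtomicReader) {
            out.add((ToyAtomicReader) reader);
        } else {
            // Anything that is not atomic is treated as a composite in this toy model.
            for (ToyReader sub : ((ToyCompositeReader) reader).subReaders) {
                collect(sub, out);
            }
        }
    }
}
```

In this toy model, a composite wrapping another composite with two leaves plus one direct leaf yields three leaves in index order.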
If you have more advanced code dealing with custom Filters, you might
have noticed another new class hierarchy in Lucene (see LUCENE-2831):
IndexReaderContext with corresponding Atomic-/CompositeReaderContext.
The move towards per-segment search in Lucene 2.9 exposed lots of custom
Queries and Filters that couldn't handle it. For example, some Filter
implementations expected the IndexReader passed in to be identical to the
IndexReader passed to IndexSearcher, with all its advantages like
absolute document IDs etc. Obviously this "paradigm-shift" broke lots of
applications, especially those that utilized cross-segment data
structures (like Apache Solr).
In Lucene 4.0, we introduce IndexReaderContexts, a "searcher-private"
reader hierarchy. During Query or Filter execution Lucene no longer
passes raw readers down to Queries, Filters or Collectors; instead
components are provided an AtomicReaderContext (essentially a hierarchy
leaf) holding relative properties like the document-basis in relation to
the top-level reader. This allows Queries and Filters to build up logic
based on document IDs, despite the per-segment orientation.
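The document-basis idea can be illustrated with plain arithmetic: a top-level document ID is the leaf's base plus the segment-relative ID. The sketch below assumes segments are numbered consecutively; ToyLeafContext and ToyContexts are hypothetical names made up for this example and only mimic the concept, they are not Lucene's AtomicReaderContext.

```java
// Hypothetical leaf context; it only mimics the idea of a per-segment
// document base and is NOT Lucene's AtomicReaderContext.
final class ToyLeafContext {
    final int docBase; // first top-level doc ID covered by this segment
    final int maxDoc;  // number of documents in this segment

    ToyLeafContext(int docBase, int maxDoc) {
        this.docBase = docBase;
        this.maxDoc = maxDoc;
    }

    // Segment-relative doc ID -> top-level doc ID.
    int toTopLevel(int segmentDoc) {
        if (segmentDoc < 0 || segmentDoc >= maxDoc) {
            throw new IllegalArgumentException("doc out of range: " + segmentDoc);
        }
        return docBase + segmentDoc;
    }
}

final class ToyContexts {
    // Assign consecutive document bases to segments of the given sizes.
    static ToyLeafContext[] build(int... segmentSizes) {
        ToyLeafContext[] leaves = new ToyLeafContext[segmentSizes.length];
        int base = 0;
        for (int i = 0; i < segmentSizes.length; i++) {
            leaves[i] = new ToyLeafContext(base, segmentSizes[i]);
            base += segmentSizes[i];
        }
        return leaves;
    }

    // Top-level doc ID -> index of the leaf that contains it (linear scan).
    static int leafIndexFor(ToyLeafContext[] leaves, int topLevelDoc) {
        for (int i = 0; i < leaves.length; i++) {
            ToyLeafContext leaf = leaves[i];
            if (topLevelDoc >= leaf.docBase && topLevelDoc < leaf.docBase + leaf.maxDoc) {
                return i;
            }
        }
        throw new IllegalArgumentException("doc out of range: " + topLevelDoc);
    }
}
```

With segment sizes 5, 3 and 4, the bases are 0, 5 and 8, so segment-relative document 2 of the second segment maps to top-level document 7.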
There are still valid use-cases where top-level readers, i.e. "atomic
views" on the index, are desirable. Let's say you want to iterate all terms
of a complete index for auto-completion or faceting: Lucene provides
utility wrappers like SlowCompositeReaderWrapper (LUCENE-2597) emulating
an AtomicReader. Note: using "atomicity emulators" can cause serious
slowdowns due to the need to merge terms, postings, DocValues, and
FieldCache; use them with care!
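To get a feel for why an "atomic view" costs extra work, consider what it takes to present the terms of several segments as one sorted, de-duplicated list. The sketch below is a deliberately naive illustration with JDK collections; ToyTermMerger is a hypothetical helper and does not reflect how SlowCompositeReaderWrapper is implemented.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.TreeSet;

// Hypothetical helper, NOT a Lucene class: merges per-segment term lists
// into the single sorted view an "atomic" reader would have to present.
final class ToyTermMerger {
    // Each inner list stands for the terms of one segment.
    static List<String> mergedTerms(List<List<String>> perSegmentTerms) {
        // A TreeSet both sorts the terms and drops duplicates across segments.
        TreeSet<String> merged = new TreeSet<>();
        for (List<String> segmentTerms : perSegmentTerms) {
            merged.addAll(segmentTerms);
        }
        return new ArrayList<>(merged);
    }
}
```

Every term of every segment passes through the merge step here, which is work that per-segment execution avoids. A less naive version would probably merge lazily instead of materializing the whole set.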
* LUCENE-2674: A new idfExplain method was added to Similarity, that
accepts an incoming docFreq. If you subclass Similarity, make sure
you also override this method on upgrade, otherwise your


@ -326,8 +326,8 @@ org.apache.lucene.search.IndexSearcher#search(Query,Filter,int)}.
Weight#normalize(float)</a> &mdash; Determine the query normalization factor. The query normalization may
allow for comparing scores between queries.</li>
<li>
<a href="Weight.html#scorer(org.apache.lucene.index.IndexReader, boolean, boolean)"> <a href="Weight.html#scorer(org.apache.lucene.index.AtomicReaderContext, boolean, boolean)">
Weight#scorer(IndexReader, boolean, boolean)</a> &mdash; Construct a new Weight#scorer(AtomicReaderContext, boolean, boolean)</a> &mdash; Construct a new
<a href="Scorer.html">Scorer</a>
for this Weight. See
<a href="#The Scorer Class">The Scorer Class</a>
@ -335,8 +335,8 @@ org.apache.lucene.search.IndexSearcher#search(Query,Filter,int)}.
Scorer is responsible for doing the actual scoring of documents given the Query.
</li>
<li>
<a href="Weight.html#explain(org.apache.lucene.search.Searcher, org.apache.lucene.index.IndexReader, int)"> <a href="Weight.html#explain(org.apache.lucene.search.Searcher, org.apache.lucene.index.AtomicReaderContext, int)">
Weight#explain(Searcher, IndexReader, int)</a> &mdash; Provide a means for explaining why a given document was Weight#explain(Searcher, AtomicReaderContext, int)</a> &mdash; Provide a means for explaining why a given document was
scored
the way it was.</li>
</ol>


@ -61,7 +61,7 @@ to check if the results are what we expect):</p>
iwriter.close();
// Now search the index:
IndexReader ireader = IndexReader.open(directory); // read-only=true DirectoryReader ireader = DirectoryReader.open(directory); // read-only=true
IndexSearcher isearcher = new IndexSearcher(ireader);
// Parse a simple query that searches for "text":
QueryParser parser = new QueryParser("fieldname", analyzer);