mirror of https://github.com/apache/lucene.git
LUCENE-3733: Update documentation, changes,...
git-svn-id: https://svn.apache.org/repos/asf/lucene/dev/trunk@1243048 13f79535-47bb-0310-9956-ffa450edef68
This commit is contained in:
parent
cdc2173e09
commit
0095bd10fd
|
@ -78,6 +78,19 @@ Changes in backwards compatibility policy
|
||||||
|
|
||||||
(Mike McCandless, Robert Muir, Uwe Schindler, Mark Miller, Michael Busch)
|
(Mike McCandless, Robert Muir, Uwe Schindler, Mark Miller, Michael Busch)
|
||||||
|
|
||||||
|
* LUCENE-2858, LUCENE-3733: IndexReader was refactored into abstract
|
||||||
|
AtomicReader, CompositeReader, and DirectoryReader. To open Directory-
|
||||||
|
based indexes use DirectoryReader.open(), the corresponding method in
|
||||||
|
IndexReader is now deprecated for easier migration. Only DirectoryReader
|
||||||
|
supports commits, versions, and reopening with openIfChanged(). Terms,
|
||||||
|
postings, docvalues, and norms can from now on only be retrieved using
|
||||||
|
AtomicReader; DirectoryReader and MultiReader extend CompositeReader,
|
||||||
|
only offering stored fields and access to the sub-readers (which may be
|
||||||
|
composite or atomic). SlowCompositeReaderWrapper (LUCENE-2597) can be
|
||||||
|
used to emulate atomic readers on top of composites.
|
||||||
|
Please review MIGRATE.txt for information how to migrate old code.
|
||||||
|
(Uwe Schindler, Robert Muir, Mike McCandless)
|
||||||
|
|
||||||
* LUCENE-2265: FuzzyQuery and WildcardQuery now operate on Unicode codepoints,
|
* LUCENE-2265: FuzzyQuery and WildcardQuery now operate on Unicode codepoints,
|
||||||
not unicode code units. For example, a Wildcard "?" represents any unicode
|
not unicode code units. For example, a Wildcard "?" represents any unicode
|
||||||
character. Furthermore, the rest of the automaton package and RegexpQuery use
|
character. Furthermore, the rest of the automaton package and RegexpQuery use
|
||||||
|
@ -98,7 +111,7 @@ Changes in backwards compatibility policy
|
||||||
and TermToBytesRefAttribute instead. (Uwe Schindler)
|
and TermToBytesRefAttribute instead. (Uwe Schindler)
|
||||||
|
|
||||||
* LUCENE-2600: Remove IndexReader.isDeleted in favor of
|
* LUCENE-2600: Remove IndexReader.isDeleted in favor of
|
||||||
IndexReader.getDeletedDocs(). (Mike McCandless)
|
AtomicReader.getDeletedDocs(). (Mike McCandless)
|
||||||
|
|
||||||
* LUCENE-2667: FuzzyQuery's defaults have changed for more performant
|
* LUCENE-2667: FuzzyQuery's defaults have changed for more performant
|
||||||
behavior: the minimum similarity is 2 edit distances from the word,
|
behavior: the minimum similarity is 2 edit distances from the word,
|
||||||
|
@ -117,10 +130,6 @@ Changes in backwards compatibility policy
|
||||||
need to change it (e.g. using "\\" to escape '\' itself).
|
need to change it (e.g. using "\\" to escape '\' itself).
|
||||||
(Sunil Kamath, Terry Yang via Robert Muir)
|
(Sunil Kamath, Terry Yang via Robert Muir)
|
||||||
|
|
||||||
* LUCENE-2771: IndexReader.norms() now throws UOE on non-atomic IndexReaders. If
|
|
||||||
you really want a top-level norms, use MultiNorms or SlowMultiReaderWrapper.
|
|
||||||
(Uwe Schindler, Robert Muir)
|
|
||||||
|
|
||||||
* LUCENE-2837: Collapsed Searcher, Searchable into IndexSearcher;
|
* LUCENE-2837: Collapsed Searcher, Searchable into IndexSearcher;
|
||||||
removed contrib/remote and MultiSearcher (Mike McCandless); absorbed
|
removed contrib/remote and MultiSearcher (Mike McCandless); absorbed
|
||||||
ParallelMultiSearcher into IndexSearcher as an optional
|
ParallelMultiSearcher into IndexSearcher as an optional
|
||||||
|
@ -189,9 +198,9 @@ Changes in backwards compatibility policy
|
||||||
with the old tokenStream() method removed. Consequently it is now mandatory
|
with the old tokenStream() method removed. Consequently it is now mandatory
|
||||||
for all Analyzers to support reusability. (Chris Male)
|
for all Analyzers to support reusability. (Chris Male)
|
||||||
|
|
||||||
* LUCENE-3473: IndexReader.getUniqueTermCount() no longer throws UOE when
|
* LUCENE-3473: AtomicReader.getUniqueTermCount() no longer throws UOE when
|
||||||
it cannot be easily determined (e.g. Multi*Readers). Instead, it returns
|
it cannot be easily determined. Instead, it returns -1 to be consistent with
|
||||||
-1 to be consistent with this behavior across other index statistics.
|
this behavior across other index statistics.
|
||||||
(Robert Muir)
|
(Robert Muir)
|
||||||
|
|
||||||
* LUCENE-1536: The abstract FilteredDocIdSet.match() method is no longer
|
* LUCENE-1536: The abstract FilteredDocIdSet.match() method is no longer
|
||||||
|
@ -207,18 +216,18 @@ Changes in backwards compatibility policy
|
||||||
* LUCENE-3533: Removed SpanFilters, they created large lists of objects and
|
* LUCENE-3533: Removed SpanFilters, they created large lists of objects and
|
||||||
did not scale. (Robert Muir)
|
did not scale. (Robert Muir)
|
||||||
|
|
||||||
* LUCENE-3606: IndexReader was made read-only. It is no longer possible to
|
* LUCENE-3606: IndexReader and subclasses were made read-only. It is no longer
|
||||||
delete or undelete documents using IndexReader; you have to use IndexWriter
|
possible to delete or undelete documents using IndexReader; you have to use
|
||||||
now. As deleting by internal Lucene docID is no longer possible, this
|
IndexWriter now. As deleting by internal Lucene docID is no longer possible,
|
||||||
requires adding a unique identifier field to your index. Deleting/relying
|
this requires adding a unique identifier field to your index. Deleting/
|
||||||
upon Lucene docIDs is not recommended anyway, because they can change.
|
relying upon Lucene docIDs is not recommended anyway, because they can
|
||||||
Consequently commit() was removed and IndexReader.open(), openIfChanged(),
|
change. Consequently commit() was removed and DirectoryReader.open(),
|
||||||
and clone() no longer take readOnly booleans or IndexDeletionPolicy
|
openIfChanged() no longer take readOnly booleans or IndexDeletionPolicy
|
||||||
instances. Furthermore, IndexReader.setNorm() was removed. If you need
|
instances. Furthermore, IndexReader.setNorm() was removed. If you need
|
||||||
customized norm values, the recommended way to do this is by modifying
|
customized norm values, the recommended way to do this is by modifying
|
||||||
Similarity to use an external byte[] or one of the new DocValues
|
Similarity to use an external byte[] or one of the new DocValues
|
||||||
fields (LUCENE-3108). Alternatively, to dynamically change norms (boost
|
fields (LUCENE-3108). Alternatively, to dynamically change norms (boost
|
||||||
*and* length norm) at query time, wrap your IndexReader using
|
*and* length norm) at query time, wrap your AtomicReader using
|
||||||
FilterIndexReader, overriding FilterIndexReader.norms(). To persist the
|
FilterIndexReader, overriding FilterIndexReader.norms(). To persist the
|
||||||
changes on disk, copy the FilteredIndexReader to a new index using
|
changes on disk, copy the FilteredIndexReader to a new index using
|
||||||
IndexWriter.addIndexes(). (Uwe Schindler, Robert Muir)
|
IndexWriter.addIndexes(). (Uwe Schindler, Robert Muir)
|
||||||
|
@ -231,13 +240,10 @@ Changes in backwards compatibility policy
|
||||||
FieldInfo.IndexOption: DOCS_AND_POSITIONS_AND_OFFSETS. (Robert
|
FieldInfo.IndexOption: DOCS_AND_POSITIONS_AND_OFFSETS. (Robert
|
||||||
Muir, Mike McCandless)
|
Muir, Mike McCandless)
|
||||||
|
|
||||||
* LUCENE-3646: FieldCacheImpl now throws UOE on non-atomic IndexReaders. If
|
* LUCENE-2858: FilterIndexReader now extends AtomicReader. If you want to
|
||||||
you really want a top-level fieldcache, use SlowMultiReaderWrapper.
|
filter composite readers like DirectoryReader or MultiReader, filter
|
||||||
(Robert Muir)
|
their atomic leaves and build a new CompositeReader (e.g. MultiReader)
|
||||||
|
around them. (Uwe Schindler, Robert Muir)
|
||||||
* LUCENE-2858, LUCENE-3733: IndexReader was refactored into abstract
|
|
||||||
AtomicReader, CompositeReader, and DirectoryReader. TODO:add more info
|
|
||||||
(Uwe Schindler, Mike McCandless, Robert Muir)
|
|
||||||
|
|
||||||
* LUCENE-3736: ParallelReader was split into ParallelAtomicReader
|
* LUCENE-3736: ParallelReader was split into ParallelAtomicReader
|
||||||
and ParallelCompositeReader. Lucene 3.x's ParallelReader is now
|
and ParallelCompositeReader. Lucene 3.x's ParallelReader is now
|
||||||
|
@ -335,9 +341,6 @@ Changes in Runtime Behavior
|
||||||
|
|
||||||
(Mike McCandless, Michael Busch, Simon Willnauer)
|
(Mike McCandless, Michael Busch, Simon Willnauer)
|
||||||
|
|
||||||
* LUCENE-3146: IndexReader.setNorm throws IllegalStateException if the field
|
|
||||||
does not store norms. (Shai Erera, Mike McCandless)
|
|
||||||
|
|
||||||
* LUCENE-3309: Stored fields no longer record whether they were
|
* LUCENE-3309: Stored fields no longer record whether they were
|
||||||
tokenized or not. In general you should not rely on stored fields
|
tokenized or not. In general you should not rely on stored fields
|
||||||
to record any "metadata" from indexing (tokenized, omitNorms,
|
to record any "metadata" from indexing (tokenized, omitNorms,
|
||||||
|
@ -354,12 +357,12 @@ API Changes
|
||||||
and iterated as byte[] (wrapped in a BytesRef) by IndexReader for
|
and iterated as byte[] (wrapped in a BytesRef) by IndexReader for
|
||||||
searching.
|
searching.
|
||||||
|
|
||||||
* LUCENE-1458, LUCENE-2111: IndexReader now directly exposes its
|
* LUCENE-1458, LUCENE-2111: AtomicReader now directly exposes its
|
||||||
deleted docs (getDeletedDocs), providing a new Bits interface to
|
deleted docs (getDeletedDocs), providing a new Bits interface to
|
||||||
directly query by doc ID.
|
directly query by doc ID.
|
||||||
|
|
||||||
* LUCENE-2691: IndexWriter.getReader() has been made package local and is now
|
* LUCENE-2691: IndexWriter.getReader() has been made package local and is now
|
||||||
exposed via open and reopen methods on IndexReader. The semantics of the
|
exposed via open and reopen methods on DirectoryReader. The semantics of the
|
||||||
call is the same as it was prior to the API change.
|
call is the same as it was prior to the API change.
|
||||||
(Grant Ingersoll, Mike McCandless)
|
(Grant Ingersoll, Mike McCandless)
|
||||||
|
|
||||||
|
@ -370,14 +373,6 @@ API Changes
|
||||||
Collector#setNextReader & FieldComparator#setNextReader now expect an
|
Collector#setNextReader & FieldComparator#setNextReader now expect an
|
||||||
AtomicReaderContext instead of an IndexReader. (Simon Willnauer)
|
AtomicReaderContext instead of an IndexReader. (Simon Willnauer)
|
||||||
|
|
||||||
* LUCENE-2846: Remove the deprecated IndexReader.setNorm(int, String, float).
|
|
||||||
This method was only syntactic sugar for setNorm(int, String, byte), but
|
|
||||||
using the global Similarity.getDefault().encodeNormValue. Use the byte-based
|
|
||||||
method instead to ensure that the norm is encoded with your Similarity.
|
|
||||||
Also removed norms(String, byte[], int), which was only used by MultiReader
|
|
||||||
for building top-level norms. If you really need a top-level norms, use
|
|
||||||
MultiNorms or SlowMultiReaderWrapper. (Robert Muir, Mike Mccandless)
|
|
||||||
|
|
||||||
* LUCENE-2892: Add QueryParser.newFieldQuery (called by getFieldQuery by
|
* LUCENE-2892: Add QueryParser.newFieldQuery (called by getFieldQuery by
|
||||||
default) which takes Analyzer as a parameter, for easier customization by
|
default) which takes Analyzer as a parameter, for easier customization by
|
||||||
subclasses. (Robert Muir)
|
subclasses. (Robert Muir)
|
||||||
|
@ -485,7 +480,7 @@ New features
|
||||||
Sep*, makes it simple to take any variable sized int block coders
|
Sep*, makes it simple to take any variable sized int block coders
|
||||||
(like Simple9/16) and use them in a codec. (Mike McCandless)
|
(like Simple9/16) and use them in a codec. (Mike McCandless)
|
||||||
|
|
||||||
* LUCENE-2597: Add oal.index.SlowMultiReaderWrapper, to wrap a
|
* LUCENE-2597: Add oal.index.SlowCompositeReaderWrapper, to wrap a
|
||||||
composite reader (eg MultiReader or DirectoryReader), making it
|
composite reader (eg MultiReader or DirectoryReader), making it
|
||||||
pretend it's an atomic reader. This is a convenience class (you can
|
pretend it's an atomic reader. This is a convenience class (you can
|
||||||
use MultiFields static methods directly, instead) if you need to use
|
use MultiFields static methods directly, instead) if you need to use
|
||||||
|
@ -621,7 +616,7 @@ New features
|
||||||
* LUCENE-1536: Filters can now be applied down-low, if their DocIdSet implements
|
* LUCENE-1536: Filters can now be applied down-low, if their DocIdSet implements
|
||||||
a new bits() method, returning all documents in a random access way. If the
|
a new bits() method, returning all documents in a random access way. If the
|
||||||
DocIdSet is not too sparse, it will be passed as acceptDocs down to the Scorer
|
DocIdSet is not too sparse, it will be passed as acceptDocs down to the Scorer
|
||||||
as replacement for IndexReader's live docs.
|
as replacement for AtomicReader's live docs.
|
||||||
In addition, FilteredQuery backs now IndexSearcher's filtering search methods.
|
In addition, FilteredQuery backs now IndexSearcher's filtering search methods.
|
||||||
Using FilteredQuery you can chain Filters in a very performant way
|
Using FilteredQuery you can chain Filters in a very performant way
|
||||||
[new FilteredQuery(new FilteredQuery(query, filter1), filter2)], which was not
|
[new FilteredQuery(new FilteredQuery(query, filter1), filter2)], which was not
|
||||||
|
@ -635,7 +630,7 @@ New features
|
||||||
load only certain fields when loading a document. (Peter Chang via
|
load only certain fields when loading a document. (Peter Chang via
|
||||||
Mike McCandless)
|
Mike McCandless)
|
||||||
|
|
||||||
* LUCENE-3628: Norms are represented as DocValues. IndexReader exposes
|
* LUCENE-3628: Norms are represented as DocValues. AtomicReader exposes
|
||||||
a #normValues(String) method to obtain norms per field. (Simon Willnauer)
|
a #normValues(String) method to obtain norms per field. (Simon Willnauer)
|
||||||
|
|
||||||
* LUCENE-3687: Similarity#computeNorm(FieldInvertState, Norm) allows to compute
|
* LUCENE-3687: Similarity#computeNorm(FieldInvertState, Norm) allows to compute
|
||||||
|
|
|
@ -278,6 +278,66 @@ LUCENE-1458, LUCENE-2111: Flexible Indexing
|
||||||
// document is deleted...
|
// document is deleted...
|
||||||
}
|
}
|
||||||
|
|
||||||
|
* LUCENE-2858, LUCENE-3733: The abstract class IndexReader has been
|
||||||
|
refactored to expose only essential methods to access stored fields
|
||||||
|
during display of search results. It is no longer possible to retrieve
|
||||||
|
terms or postings data from the underlying index, not even deletions are
|
||||||
|
visible anymore. You can still pass IndexReader as constructor parameter
|
||||||
|
to IndexSearcher and execute your searches; Lucene will automatically
|
||||||
|
delegate procedures like query rewriting and document collection atomic
|
||||||
|
subreaders.
|
||||||
|
|
||||||
|
If you want to dive deeper into the index and want to write own queries,
|
||||||
|
take a closer look at the new abstract sub-classes AtomicReader and
|
||||||
|
CompositeReader:
|
||||||
|
|
||||||
|
AtomicReader instances are now the only source of Terms, Postings,
|
||||||
|
DocValues and FieldCache. Queries are forced to execute on a Atomic
|
||||||
|
reader on a per-segment basis and FieldCaches are keyed by
|
||||||
|
AtomicReaders.
|
||||||
|
|
||||||
|
Its counterpart CompositeReader exposes a utility method to retrieve
|
||||||
|
its composites. But watch out, composites are not necessarily atomic.
|
||||||
|
Next to the added type-safety we also removed the notion of
|
||||||
|
index-commits and version numbers from the abstract IndexReader, the
|
||||||
|
associations with IndexWriter were pulled into a specialized
|
||||||
|
DirectoryReader. To open Directory-based indexes use
|
||||||
|
DirectoryReader.open(), the corresponding method in IndexReader is now
|
||||||
|
deprecated for easier migration. Only DirectoryReader supports commits,
|
||||||
|
versions, and reopening with openIfChanged(). Terms, postings,
|
||||||
|
docvalues, and norms can from now on only be retrieved using
|
||||||
|
AtomicReader; DirectoryReader and MultiReader extend CompositeReader,
|
||||||
|
only offering stored fields and access to the sub-readers (which may be
|
||||||
|
composite or atomic).
|
||||||
|
|
||||||
|
If you have more advanced code dealing with custom Filters, you might
|
||||||
|
have noticed another new class hierarchy in Lucene (see LUCENE-2831):
|
||||||
|
IndexReaderContext with corresponding Atomic-/CompositeReaderContext.
|
||||||
|
|
||||||
|
The move towards per-segment search Lucene 2.9 exposed lots of custom
|
||||||
|
Queries and Filters that couldn't handle it. For example, some Filter
|
||||||
|
implementations expected the IndexReader passed in is identical to the
|
||||||
|
IndexReader passed to IndexSearcher with all its advantages like
|
||||||
|
absolute document IDs etc. Obviously this "paradigm-shift" broke lots of
|
||||||
|
applications and especially those that utilized cross-segment data
|
||||||
|
structures (like Apache Solr).
|
||||||
|
|
||||||
|
In Lucene 4.0, we introduce IndexReaderContexts "searcher-private"
|
||||||
|
reader hierarchy. During Query or Filter execution Lucene no longer
|
||||||
|
passes raw readers down Queries, Filters or Collectors; instead
|
||||||
|
components are provided an AtomicReaderContext (essentially a hierarchy
|
||||||
|
leaf) holding relative properties like the document-basis in relation to
|
||||||
|
the top-level reader. This allows Queries & Filter to build up logic
|
||||||
|
based on document IDs, albeit the per-segment orientation.
|
||||||
|
|
||||||
|
There are still valid use-cases where top-level readers ie. "atomic
|
||||||
|
views" on the index are desirable. Let say you want to iterate all terms
|
||||||
|
of a complete index for auto-completion or facetting, Lucene provides
|
||||||
|
utility wrappers like SlowCompositeReaderWrapper (LUCENE-2597) emulating
|
||||||
|
an AtomicReader. Note: using "atomicity emulators" can cause serious
|
||||||
|
slowdowns due to the need to merge terms, postings, DocValues, and
|
||||||
|
FieldCache, use them with care!
|
||||||
|
|
||||||
* LUCENE-2674: A new idfExplain method was added to Similarity, that
|
* LUCENE-2674: A new idfExplain method was added to Similarity, that
|
||||||
accepts an incoming docFreq. If you subclass Similarity, make sure
|
accepts an incoming docFreq. If you subclass Similarity, make sure
|
||||||
you also override this method on upgrade, otherwise your
|
you also override this method on upgrade, otherwise your
|
||||||
|
|
|
@ -326,8 +326,8 @@ org.apache.lucene.search.IndexSearcher#search(Query,Filter,int)}.
|
||||||
Weight#normalize(float)</a> — Determine the query normalization factor. The query normalization may
|
Weight#normalize(float)</a> — Determine the query normalization factor. The query normalization may
|
||||||
allow for comparing scores between queries.</li>
|
allow for comparing scores between queries.</li>
|
||||||
<li>
|
<li>
|
||||||
<a href="Weight.html#scorer(org.apache.lucene.index.IndexReader, boolean, boolean)">
|
<a href="Weight.html#scorer(org.apache.lucene.index.AtomicReaderContext, boolean, boolean)">
|
||||||
Weight#scorer(IndexReader, boolean, boolean)</a> — Construct a new
|
Weight#scorer(AtomicReaderContext, boolean, boolean)</a> — Construct a new
|
||||||
<a href="Scorer.html">Scorer</a>
|
<a href="Scorer.html">Scorer</a>
|
||||||
for this Weight. See
|
for this Weight. See
|
||||||
<a href="#The Scorer Class">The Scorer Class</a>
|
<a href="#The Scorer Class">The Scorer Class</a>
|
||||||
|
@ -335,8 +335,8 @@ org.apache.lucene.search.IndexSearcher#search(Query,Filter,int)}.
|
||||||
Scorer is responsible for doing the actual scoring of documents given the Query.
|
Scorer is responsible for doing the actual scoring of documents given the Query.
|
||||||
</li>
|
</li>
|
||||||
<li>
|
<li>
|
||||||
<a href="Weight.html#explain(org.apache.lucene.search.Searcher, org.apache.lucene.index.IndexReader, int)">
|
<a href="Weight.html#explain(org.apache.lucene.search.Searcher, org.apache.lucene.index.AtomicReaderContext, int)">
|
||||||
Weight#explain(Searcher, IndexReader, int)</a> — Provide a means for explaining why a given document was
|
Weight#explain(Searcher, AtomicReaderContext, int)</a> — Provide a means for explaining why a given document was
|
||||||
scored
|
scored
|
||||||
the way it was.</li>
|
the way it was.</li>
|
||||||
</ol>
|
</ol>
|
||||||
|
|
|
@ -61,7 +61,7 @@ to check if the results are what we expect):</p>
|
||||||
iwriter.close();
|
iwriter.close();
|
||||||
|
|
||||||
// Now search the index:
|
// Now search the index:
|
||||||
IndexReader ireader = IndexReader.open(directory); // read-only=true
|
DirectoryReader ireader = DirectoryReader.open(directory); // read-only=true
|
||||||
IndexSearcher isearcher = new IndexSearcher(ireader);
|
IndexSearcher isearcher = new IndexSearcher(ireader);
|
||||||
// Parse a simple query that searches for "text":
|
// Parse a simple query that searches for "text":
|
||||||
QueryParser parser = new QueryParser("fieldname", analyzer);
|
QueryParser parser = new QueryParser("fieldname", analyzer);
|
||||||
|
|
Loading…
Reference in New Issue