mirror of https://github.com/apache/lucene.git
LUCENE-3733: Update documentation, changes,...
git-svn-id: https://svn.apache.org/repos/asf/lucene/dev/trunk@1243048 13f79535-47bb-0310-9956-ffa450edef68
This commit is contained in:
parent
cdc2173e09
commit
0095bd10fd
|
@ -78,6 +78,19 @@ Changes in backwards compatibility policy
|
|||
|
||||
(Mike McCandless, Robert Muir, Uwe Schindler, Mark Miller, Michael Busch)
|
||||
|
||||
* LUCENE-2858, LUCENE-3733: IndexReader was refactored into abstract
|
||||
AtomicReader, CompositeReader, and DirectoryReader. To open Directory-
|
||||
based indexes use DirectoryReader.open(), the corresponding method in
|
||||
IndexReader is now deprecated for easier migration. Only DirectoryReader
|
||||
supports commits, versions, and reopening with openIfChanged(). Terms,
|
||||
postings, docvalues, and norms can from now on only be retrieved using
|
||||
AtomicReader; DirectoryReader and MultiReader extend CompositeReader,
|
||||
only offering stored fields and access to the sub-readers (which may be
|
||||
composite or atomic). SlowCompositeReaderWrapper (LUCENE-2597) can be
|
||||
used to emulate atomic readers on top of composites.
|
||||
Please review MIGRATE.txt for information how to migrate old code.
|
||||
(Uwe Schindler, Robert Muir, Mike McCandless)
|
||||
|
||||
* LUCENE-2265: FuzzyQuery and WildcardQuery now operate on Unicode codepoints,
|
||||
not unicode code units. For example, a Wildcard "?" represents any unicode
|
||||
character. Furthermore, the rest of the automaton package and RegexpQuery use
|
||||
|
@ -98,7 +111,7 @@ Changes in backwards compatibility policy
|
|||
and TermToBytesRefAttribute instead. (Uwe Schindler)
|
||||
|
||||
* LUCENE-2600: Remove IndexReader.isDeleted in favor of
|
||||
IndexReader.getDeletedDocs(). (Mike McCandless)
|
||||
AtomicReader.getDeletedDocs(). (Mike McCandless)
|
||||
|
||||
* LUCENE-2667: FuzzyQuery's defaults have changed for more performant
|
||||
behavior: the minimum similarity is 2 edit distances from the word,
|
||||
|
@ -117,10 +130,6 @@ Changes in backwards compatibility policy
|
|||
need to change it (e.g. using "\\" to escape '\' itself).
|
||||
(Sunil Kamath, Terry Yang via Robert Muir)
|
||||
|
||||
* LUCENE-2771: IndexReader.norms() now throws UOE on non-atomic IndexReaders. If
|
||||
you really want a top-level norms, use MultiNorms or SlowMultiReaderWrapper.
|
||||
(Uwe Schindler, Robert Muir)
|
||||
|
||||
* LUCENE-2837: Collapsed Searcher, Searchable into IndexSearcher;
|
||||
removed contrib/remote and MultiSearcher (Mike McCandless); absorbed
|
||||
ParallelMultiSearcher into IndexSearcher as an optional
|
||||
|
@ -189,9 +198,9 @@ Changes in backwards compatibility policy
|
|||
with the old tokenStream() method removed. Consequently it is now mandatory
|
||||
for all Analyzers to support reusability. (Chris Male)
|
||||
|
||||
* LUCENE-3473: IndexReader.getUniqueTermCount() no longer throws UOE when
|
||||
it cannot be easily determined (e.g. Multi*Readers). Instead, it returns
|
||||
-1 to be consistent with this behavior across other index statistics.
|
||||
* LUCENE-3473: AtomicReader.getUniqueTermCount() no longer throws UOE when
|
||||
it cannot be easily determined. Instead, it returns -1 to be consistent with
|
||||
this behavior across other index statistics.
|
||||
(Robert Muir)
|
||||
|
||||
* LUCENE-1536: The abstract FilteredDocIdSet.match() method is no longer
|
||||
|
@ -207,18 +216,18 @@ Changes in backwards compatibility policy
|
|||
* LUCENE-3533: Removed SpanFilters, they created large lists of objects and
|
||||
did not scale. (Robert Muir)
|
||||
|
||||
* LUCENE-3606: IndexReader was made read-only. It is no longer possible to
|
||||
delete or undelete documents using IndexReader; you have to use IndexWriter
|
||||
now. As deleting by internal Lucene docID is no longer possible, this
|
||||
requires adding a unique identifier field to your index. Deleting/relying
|
||||
upon Lucene docIDs is not recommended anyway, because they can change.
|
||||
Consequently commit() was removed and IndexReader.open(), openIfChanged(),
|
||||
and clone() no longer take readOnly booleans or IndexDeletionPolicy
|
||||
* LUCENE-3606: IndexReader and subclasses were made read-only. It is no longer
|
||||
possible to delete or undelete documents using IndexReader; you have to use
|
||||
IndexWriter now. As deleting by internal Lucene docID is no longer possible,
|
||||
this requires adding a unique identifier field to your index. Deleting/
|
||||
relying upon Lucene docIDs is not recommended anyway, because they can
|
||||
change. Consequently commit() was removed and DirectoryReader.open(),
|
||||
openIfChanged() no longer take readOnly booleans or IndexDeletionPolicy
|
||||
instances. Furthermore, IndexReader.setNorm() was removed. If you need
|
||||
customized norm values, the recommended way to do this is by modifying
|
||||
Similarity to use an external byte[] or one of the new DocValues
|
||||
fields (LUCENE-3108). Alternatively, to dynamically change norms (boost
|
||||
*and* length norm) at query time, wrap your IndexReader using
|
||||
*and* length norm) at query time, wrap your AtomicReader using
|
||||
FilterIndexReader, overriding FilterIndexReader.norms(). To persist the
|
||||
changes on disk, copy the FilteredIndexReader to a new index using
|
||||
IndexWriter.addIndexes(). (Uwe Schindler, Robert Muir)
|
||||
|
@ -231,13 +240,10 @@ Changes in backwards compatibility policy
|
|||
FieldInfo.IndexOption: DOCS_AND_POSITIONS_AND_OFFSETS. (Robert
|
||||
Muir, Mike McCandless)
|
||||
|
||||
* LUCENE-3646: FieldCacheImpl now throws UOE on non-atomic IndexReaders. If
|
||||
you really want a top-level fieldcache, use SlowMultiReaderWrapper.
|
||||
(Robert Muir)
|
||||
|
||||
* LUCENE-2858, LUCENE-3733: IndexReader was refactored into abstract
|
||||
AtomicReader, CompositeReader, and DirectoryReader. TODO:add more info
|
||||
(Uwe Schindler, Mike McCandless, Robert Muir)
|
||||
* LUCENE-2858: FilterIndexReader now extends AtomicReader. If you want to
|
||||
filter composite readers like DirectoryReader or MultiReader, filter
|
||||
their atomic leaves and build a new CompositeReader (e.g. MultiReader)
|
||||
around them. (Uwe Schindler, Robert Muir)
|
||||
|
||||
* LUCENE-3736: ParallelReader was split into ParallelAtomicReader
|
||||
and ParallelCompositeReader. Lucene 3.x's ParallelReader is now
|
||||
|
@ -335,9 +341,6 @@ Changes in Runtime Behavior
|
|||
|
||||
(Mike McCandless, Michael Busch, Simon Willnauer)
|
||||
|
||||
* LUCENE-3146: IndexReader.setNorm throws IllegalStateException if the field
|
||||
does not store norms. (Shai Erera, Mike McCandless)
|
||||
|
||||
* LUCENE-3309: Stored fields no longer record whether they were
|
||||
tokenized or not. In general you should not rely on stored fields
|
||||
to record any "metadata" from indexing (tokenized, omitNorms,
|
||||
|
@ -354,12 +357,12 @@ API Changes
|
|||
and iterated as byte[] (wrapped in a BytesRef) by IndexReader for
|
||||
searching.
|
||||
|
||||
* LUCENE-1458, LUCENE-2111: IndexReader now directly exposes its
|
||||
* LUCENE-1458, LUCENE-2111: AtomicReader now directly exposes its
|
||||
deleted docs (getDeletedDocs), providing a new Bits interface to
|
||||
directly query by doc ID.
|
||||
|
||||
* LUCENE-2691: IndexWriter.getReader() has been made package local and is now
|
||||
exposed via open and reopen methods on IndexReader. The semantics of the
|
||||
exposed via open and reopen methods on DirectoryReader. The semantics of the
|
||||
call is the same as it was prior to the API change.
|
||||
(Grant Ingersoll, Mike McCandless)
|
||||
|
||||
|
@ -370,14 +373,6 @@ API Changes
|
|||
Collector#setNextReader & FieldComparator#setNextReader now expect an
|
||||
AtomicReaderContext instead of an IndexReader. (Simon Willnauer)
|
||||
|
||||
* LUCENE-2846: Remove the deprecated IndexReader.setNorm(int, String, float).
|
||||
This method was only syntactic sugar for setNorm(int, String, byte), but
|
||||
using the global Similarity.getDefault().encodeNormValue. Use the byte-based
|
||||
method instead to ensure that the norm is encoded with your Similarity.
|
||||
Also removed norms(String, byte[], int), which was only used by MultiReader
|
||||
for building top-level norms. If you really need a top-level norms, use
|
||||
MultiNorms or SlowMultiReaderWrapper. (Robert Muir, Mike Mccandless)
|
||||
|
||||
* LUCENE-2892: Add QueryParser.newFieldQuery (called by getFieldQuery by
|
||||
default) which takes Analyzer as a parameter, for easier customization by
|
||||
subclasses. (Robert Muir)
|
||||
|
@ -485,7 +480,7 @@ New features
|
|||
Sep*, makes it simple to take any variable sized int block coders
|
||||
(like Simple9/16) and use them in a codec. (Mike McCandless)
|
||||
|
||||
* LUCENE-2597: Add oal.index.SlowMultiReaderWrapper, to wrap a
|
||||
* LUCENE-2597: Add oal.index.SlowCompositeReaderWrapper, to wrap a
|
||||
composite reader (eg MultiReader or DirectoryReader), making it
|
||||
pretend it's an atomic reader. This is a convenience class (you can
|
||||
use MultiFields static methods directly, instead) if you need to use
|
||||
|
@ -621,7 +616,7 @@ New features
|
|||
* LUCENE-1536: Filters can now be applied down-low, if their DocIdSet implements
|
||||
a new bits() method, returning all documents in a random access way. If the
|
||||
DocIdSet is not too sparse, it will be passed as acceptDocs down to the Scorer
|
||||
as replacement for IndexReader's live docs.
|
||||
as replacement for AtomicReader's live docs.
|
||||
In addition, FilteredQuery backs now IndexSearcher's filtering search methods.
|
||||
Using FilteredQuery you can chain Filters in a very performant way
|
||||
[new FilteredQuery(new FilteredQuery(query, filter1), filter2)], which was not
|
||||
|
@ -635,7 +630,7 @@ New features
|
|||
load only certain fields when loading a document. (Peter Chang via
|
||||
Mike McCandless)
|
||||
|
||||
* LUCENE-3628: Norms are represented as DocValues. IndexReader exposes
|
||||
* LUCENE-3628: Norms are represented as DocValues. AtomicReader exposes
|
||||
a #normValues(String) method to obtain norms per field. (Simon Willnauer)
|
||||
|
||||
* LUCENE-3687: Similarity#computeNorm(FieldInvertState, Norm) allows to compute
|
||||
|
|
|
@ -278,6 +278,66 @@ LUCENE-1458, LUCENE-2111: Flexible Indexing
|
|||
// document is deleted...
|
||||
}
|
||||
|
||||
* LUCENE-2858, LUCENE-3733: The abstract class IndexReader has been
|
||||
refactored to expose only essential methods to access stored fields
|
||||
during display of search results. It is no longer possible to retrieve
|
||||
terms or postings data from the underlying index, not even deletions are
|
||||
visible anymore. You can still pass IndexReader as constructor parameter
|
||||
to IndexSearcher and execute your searches; Lucene will automatically
|
||||
delegate procedures like query rewriting and document collection atomic
|
||||
subreaders.
|
||||
|
||||
If you want to dive deeper into the index and want to write own queries,
|
||||
take a closer look at the new abstract sub-classes AtomicReader and
|
||||
CompositeReader:
|
||||
|
||||
AtomicReader instances are now the only source of Terms, Postings,
|
||||
DocValues and FieldCache. Queries are forced to execute on a Atomic
|
||||
reader on a per-segment basis and FieldCaches are keyed by
|
||||
AtomicReaders.
|
||||
|
||||
Its counterpart CompositeReader exposes a utility method to retrieve
|
||||
its composites. But watch out, composites are not necessarily atomic.
|
||||
Next to the added type-safety we also removed the notion of
|
||||
index-commits and version numbers from the abstract IndexReader, the
|
||||
associations with IndexWriter were pulled into a specialized
|
||||
DirectoryReader. To open Directory-based indexes use
|
||||
DirectoryReader.open(), the corresponding method in IndexReader is now
|
||||
deprecated for easier migration. Only DirectoryReader supports commits,
|
||||
versions, and reopening with openIfChanged(). Terms, postings,
|
||||
docvalues, and norms can from now on only be retrieved using
|
||||
AtomicReader; DirectoryReader and MultiReader extend CompositeReader,
|
||||
only offering stored fields and access to the sub-readers (which may be
|
||||
composite or atomic).
|
||||
|
||||
If you have more advanced code dealing with custom Filters, you might
|
||||
have noticed another new class hierarchy in Lucene (see LUCENE-2831):
|
||||
IndexReaderContext with corresponding Atomic-/CompositeReaderContext.
|
||||
|
||||
The move towards per-segment search Lucene 2.9 exposed lots of custom
|
||||
Queries and Filters that couldn't handle it. For example, some Filter
|
||||
implementations expected the IndexReader passed in is identical to the
|
||||
IndexReader passed to IndexSearcher with all its advantages like
|
||||
absolute document IDs etc. Obviously this "paradigm-shift" broke lots of
|
||||
applications and especially those that utilized cross-segment data
|
||||
structures (like Apache Solr).
|
||||
|
||||
In Lucene 4.0, we introduce IndexReaderContexts "searcher-private"
|
||||
reader hierarchy. During Query or Filter execution Lucene no longer
|
||||
passes raw readers down Queries, Filters or Collectors; instead
|
||||
components are provided an AtomicReaderContext (essentially a hierarchy
|
||||
leaf) holding relative properties like the document-basis in relation to
|
||||
the top-level reader. This allows Queries & Filter to build up logic
|
||||
based on document IDs, albeit the per-segment orientation.
|
||||
|
||||
There are still valid use-cases where top-level readers ie. "atomic
|
||||
views" on the index are desirable. Let say you want to iterate all terms
|
||||
of a complete index for auto-completion or facetting, Lucene provides
|
||||
utility wrappers like SlowCompositeReaderWrapper (LUCENE-2597) emulating
|
||||
an AtomicReader. Note: using "atomicity emulators" can cause serious
|
||||
slowdowns due to the need to merge terms, postings, DocValues, and
|
||||
FieldCache, use them with care!
|
||||
|
||||
* LUCENE-2674: A new idfExplain method was added to Similarity, that
|
||||
accepts an incoming docFreq. If you subclass Similarity, make sure
|
||||
you also override this method on upgrade, otherwise your
|
||||
|
|
|
@ -326,8 +326,8 @@ org.apache.lucene.search.IndexSearcher#search(Query,Filter,int)}.
|
|||
Weight#normalize(float)</a> — Determine the query normalization factor. The query normalization may
|
||||
allow for comparing scores between queries.</li>
|
||||
<li>
|
||||
<a href="Weight.html#scorer(org.apache.lucene.index.IndexReader, boolean, boolean)">
|
||||
Weight#scorer(IndexReader, boolean, boolean)</a> — Construct a new
|
||||
<a href="Weight.html#scorer(org.apache.lucene.index.AtomicReaderContext, boolean, boolean)">
|
||||
Weight#scorer(AtomicReaderContext, boolean, boolean)</a> — Construct a new
|
||||
<a href="Scorer.html">Scorer</a>
|
||||
for this Weight. See
|
||||
<a href="#The Scorer Class">The Scorer Class</a>
|
||||
|
@ -335,8 +335,8 @@ org.apache.lucene.search.IndexSearcher#search(Query,Filter,int)}.
|
|||
Scorer is responsible for doing the actual scoring of documents given the Query.
|
||||
</li>
|
||||
<li>
|
||||
<a href="Weight.html#explain(org.apache.lucene.search.Searcher, org.apache.lucene.index.IndexReader, int)">
|
||||
Weight#explain(Searcher, IndexReader, int)</a> — Provide a means for explaining why a given document was
|
||||
<a href="Weight.html#explain(org.apache.lucene.search.Searcher, org.apache.lucene.index.AtomicReaderContext, int)">
|
||||
Weight#explain(Searcher, AtomicReaderContext, int)</a> — Provide a means for explaining why a given document was
|
||||
scored
|
||||
the way it was.</li>
|
||||
</ol>
|
||||
|
|
|
@ -61,7 +61,7 @@ to check if the results are what we expect):</p>
|
|||
iwriter.close();
|
||||
|
||||
// Now search the index:
|
||||
IndexReader ireader = IndexReader.open(directory); // read-only=true
|
||||
DirectoryReader ireader = DirectoryReader.open(directory); // read-only=true
|
||||
IndexSearcher isearcher = new IndexSearcher(ireader);
|
||||
// Parse a simple query that searches for "text":
|
||||
QueryParser parser = new QueryParser("fieldname", analyzer);
|
||||
|
|
Loading…
Reference in New Issue