mirror of https://github.com/apache/lucene.git
javadocs
git-svn-id: https://svn.apache.org/repos/asf/lucene/dev/branches/lucene4547@1436696 13f79535-47bb-0310-9956-ffa450edef68
parent 28567c2327
commit a53962cf5f
@@ -37,10 +37,7 @@ import org.apache.lucene.util.PriorityQueue;
 
 // prototype streaming DV api
 public abstract class DocValuesConsumer implements Closeable {
-  // TODO: are any of these params too "infringing" on codec?
-  // we want codec to get necessary stuff from IW, but trading off against merge complexity.
 
-  // nocommit should we pass SegmentWriteState...?
   public abstract void addNumericField(FieldInfo field, Iterable<Number> values) throws IOException;
 
   public abstract void addBinaryField(FieldInfo field, Iterable<BytesRef> values) throws IOException;
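The `addNumericField` signature above takes an `Iterable<Number>` rather than a one-shot stream, so an implementation may walk the values more than once. The following is a minimal sketch of that idea and is not Lucene code: `NumericFieldSketch` and `encodeAsDeltas` are hypothetical names, and the delta encoding is only an assumed example of what a consumer might do with two passes.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical stand-in for illustration; not a Lucene class and not a real DocValuesConsumer.
final class NumericFieldSketch {

  /** Two passes over the same Iterable: find the minimum, then emit each value relative to it. */
  static long[] encodeAsDeltas(Iterable<Number> values) {
    long min = Long.MAX_VALUE;
    int count = 0;
    for (Number v : values) {            // pass 1: gather statistics
      min = Math.min(min, v.longValue());
      count++;
    }
    long[] deltas = new long[count];
    int i = 0;
    for (Number v : values) {            // pass 2: write values relative to the minimum
      deltas[i++] = v.longValue() - min; // assumes the value range fits in a long
    }
    return deltas;
  }

  public static void main(String[] args) {
    List<Number> perDocument = Arrays.<Number>asList(10L, 12L, 10L, 15L);
    System.out.println(Arrays.toString(encodeAsDeltas(perDocument)));
  }
}
```

Because the caller hands over something re-iterable, the sketch never needs to buffer the raw values itself.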
@@ -363,11 +363,11 @@ file, previously they were stored in text format only.</li>
 frequencies.</li>
 <li>In version 4.0, the format of the inverted index became extensible via
 the {@link org.apache.lucene.codecs.Codec Codec} api. Fast per-document storage
-({@link org.apache.lucene.index.DocValues DocValues}) was introduced. Normalization
-factors need no longer be a single byte, they can be any DocValues
-{@link org.apache.lucene.index.DocValues.Type type}. Terms need not be unicode
-strings, they can be any byte sequence. Term offsets can optionally be indexed
-into the postings lists. Payloads can be stored in the term vectors.</li>
+({@code DocValues}) was introduced. Normalization factors need no longer be a
+single byte, they can be any {@link org.apache.lucene.index.NumericDocValues NumericDocValues}.
+Terms need not be unicode strings, they can be any byte sequence. Term offsets
+can optionally be indexed into the postings lists. Payloads can be stored in the
+term vectors.</li>
 </ul>
 <a name="Limitations" id="Limitations"></a>
 <h2>Limitations</h2>
@@ -368,11 +368,11 @@ file, previously they were stored in text format only.</li>
 frequencies.</li>
 <li>In version 4.0, the format of the inverted index became extensible via
 the {@link org.apache.lucene.codecs.Codec Codec} api. Fast per-document storage
-({@link org.apache.lucene.index.DocValues DocValues}) was introduced. Normalization
-factors need no longer be a single byte, they can be any DocValues
-{@link org.apache.lucene.index.DocValues.Type type}. Terms need not be unicode
-strings, they can be any byte sequence. Term offsets can optionally be indexed
-into the postings lists. Payloads can be stored in the term vectors.</li>
+({@code DocValues}) was introduced. Normalization factors need no longer be a
+single byte, they can be any {@link org.apache.lucene.index.NumericDocValues NumericDocValues}.
+Terms need not be unicode strings, they can be any byte sequence. Term offsets
+can optionally be indexed into the postings lists. Payloads can be stored in the
+term vectors.</li>
 <li>In version 4.1, the format of the postings list changed to use either
 of FOR compression or variable-byte encoding, depending upon the frequency
 of the term.</li>
@@ -368,11 +368,11 @@ file, previously they were stored in text format only.</li>
 frequencies.</li>
 <li>In version 4.0, the format of the inverted index became extensible via
 the {@link org.apache.lucene.codecs.Codec Codec} api. Fast per-document storage
-({@link org.apache.lucene.index.DocValues DocValues}) was introduced. Normalization
-factors need no longer be a single byte, they can be any DocValues
-{@link org.apache.lucene.index.DocValues.Type type}. Terms need not be unicode
-strings, they can be any byte sequence. Term offsets can optionally be indexed
-into the postings lists. Payloads can be stored in the term vectors.</li>
+({@code DocValues}) was introduced. Normalization factors need no longer be a
+single byte, they can be any {@link org.apache.lucene.index.NumericDocValues NumericDocValues}.
+Terms need not be unicode strings, they can be any byte sequence. Term offsets
+can optionally be indexed into the postings lists. Payloads can be stored in the
+term vectors.</li>
 <li>In version 4.1, the format of the postings list changed to use either
 of FOR compression or variable-byte encoding, depending upon the frequency
 of the term.</li>
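The 4.1 note above names variable-byte encoding as one of the two postings encodings. As a rough illustration of the general technique only (not the actual on-disk layout, which this page does not show), a variable-byte integer stores seven payload bits per byte and uses the high bit as a continuation flag. `VByteSketch` is a hypothetical name.

```java
import java.io.ByteArrayOutputStream;

// Generic variable-byte integer coding; a sketch, not Lucene's postings format.
final class VByteSketch {

  /** Writes 7 bits per byte, low-order group first; the high bit marks "more bytes follow". */
  static byte[] encode(int value) {
    ByteArrayOutputStream out = new ByteArrayOutputStream();
    while ((value & ~0x7F) != 0) {
      out.write((value & 0x7F) | 0x80);
      value >>>= 7;
    }
    out.write(value);
    return out.toByteArray();
  }

  /** Reassembles the 7-bit groups until a byte without the continuation bit is seen. */
  static int decode(byte[] bytes) {
    int value = 0;
    int shift = 0;
    for (byte b : bytes) {
      value |= (b & 0x7F) << shift;
      shift += 7;
      if ((b & 0x80) == 0) {
        break;
      }
    }
    return value;
  }
}
```

Small numbers take a single byte under this scheme, which is why it suits short postings lists where block-oriented compression has little to work with.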
@@ -182,9 +182,6 @@ public abstract class PerFieldDocValuesFormat extends DocValuesFormat {
     }
   }
 
-  // nocommit what if SimpleNormsFormat wants to use this
-  // ...? we have a "boolean isNorms" issue...? I guess we
-  // just need to make a PerFieldNormsFormat?
   private class FieldsReader extends DocValuesProducer {
 
     private final Map<String,DocValuesProducer> fields = new TreeMap<String,DocValuesProducer>();
@@ -416,7 +416,7 @@ public class FieldType implements IndexableFieldType {
    * {@inheritDoc}
    * <p>
    * The default is <code>null</code> (no docValues)
-   * @see #setDocValueType(DocValuesType)
+   * @see #setDocValueType(org.apache.lucene.index.FieldInfo.DocValuesType)
    */
   @Override
   public DocValuesType docValueType() {
@@ -175,10 +175,10 @@ public abstract class AtomicReader extends IndexReader {
    * used by a single thread. */
   public abstract SortedDocValues getSortedDocValues(String field) throws IOException;
 
-  // nocommit document that these are thread-private:
   /** Returns {@link NumericDocValues} representing norms
    * for this field, or null if no {@link NumericDocValues}
-   * were indexed. */
+   * were indexed. The returned instance should only be
+   * used by a single thread. */
   public abstract NumericDocValues getNormValues(String field) throws IOException;
 
   /**
@@ -19,6 +19,9 @@ package org.apache.lucene.index;
 
 import org.apache.lucene.util.BytesRef;
 
+/**
+ * A per-document byte[]
+ */
 public abstract class BinaryDocValues {
 
   /** Lookup the value for document.
@@ -29,8 +32,12 @@ public abstract class BinaryDocValues {
    * "private" instance should be used for each source. */
   public abstract void get(int docID, BytesRef result);
 
+  /**
+   * Indicates the value was missing for the document.
+   */
   public static final byte[] MISSING = new byte[0];
 
+  /** An empty BinaryDocValues which returns empty bytes for every document */
   public static final BinaryDocValues EMPTY = new BinaryDocValues() {
     @Override
     public void get(int docID, BytesRef result) {
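The `get(int docID, BytesRef result)` shape above has the caller supply the object that receives the value, and `MISSING` gives a shared sentinel for absent values. A minimal standalone sketch of that calling convention follows. `SliceSketch` and `BinaryValuesSketch` are hypothetical stand-ins, not Lucene's `BytesRef` or `BinaryDocValues`, and pointing the result at `MISSING` for absent documents is an assumption about how the sentinel might be used.

```java
// Hypothetical stand-ins for illustration; not Lucene classes.
final class SliceSketch {
  byte[] bytes = new byte[0];
  int offset;
  int length;
}

final class BinaryValuesSketch {
  static final byte[] MISSING = new byte[0]; // shared sentinel, compared by identity

  private final byte[][] perDocument;        // a null entry means "no value for this document"

  BinaryValuesSketch(byte[][] perDocument) {
    this.perDocument = perDocument;
  }

  /** Points the caller-owned result at the document's bytes without copying them. */
  void get(int docID, SliceSketch result) {
    byte[] value = perDocument[docID];
    result.bytes = (value == null) ? MISSING : value;
    result.offset = 0;
    result.length = result.bytes.length;
  }
}
```

Reusing one result object across calls avoids an allocation per lookup, but it also means the result is overwritten by the next call, which fits the hunk's advice to use a "private" instance per source.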
@@ -17,9 +17,18 @@ package org.apache.lucene.index;
  * limitations under the License.
  */
 
+/**
+ * A per-document numeric value.
+ */
 public abstract class NumericDocValues {
+  /**
+   * Returns the numeric value for the specified document ID.
+   * @param docID document ID to lookup
+   * @return numeric value
+   */
   public abstract long get(int docID);
 
+  /** An empty NumericDocValues which returns zero for every document */
   public static final NumericDocValues EMPTY = new NumericDocValues() {
     @Override
     public long get(int docID) {
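The `EMPTY` constant documented above follows the null-object pattern: callers can substitute it for a missing field instead of checking for null on every lookup. A small standalone sketch of the pattern is shown here. `NumericValuesSketch`, `of`, and `orEmpty` are hypothetical names that mirror the shape of the class in the hunk but do not come from Lucene.

```java
// Hypothetical mirror of the class shape in the hunk above; not a Lucene class.
abstract class NumericValuesSketch {

  /** Returns the numeric value for the given document ID. */
  abstract long get(int docID);

  /** Null object: reports zero for every document. */
  static final NumericValuesSketch EMPTY = new NumericValuesSketch() {
    @Override
    long get(int docID) {
      return 0;
    }
  };

  /** Array-backed instance, used here only to have something to look up. */
  static NumericValuesSketch of(final long... values) {
    return new NumericValuesSketch() {
      @Override
      long get(int docID) {
        return values[docID];
      }
    };
  }

  /** Substitutes EMPTY when a field has no values, so callers can skip null checks. */
  static NumericValuesSketch orEmpty(NumericValuesSketch maybeNull) {
    return maybeNull == null ? EMPTY : maybeNull;
  }
}
```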
@@ -19,11 +19,35 @@ package org.apache.lucene.index;
 
 import org.apache.lucene.util.BytesRef;
 
+/**
+ * A per-document byte[] with presorted values.
+ * <p>
+ * Per-Document values in a SortedDocValues are deduplicated, dereferenced,
+ * and sorted into a dictionary of unique values. A pointer to the
+ * dictionary value (ordinal) can be retrieved for each document. Ordinals
+ * are dense and in increasing sorted order.
+ */
 public abstract class SortedDocValues extends BinaryDocValues {
+  /**
+   * Returns the ordinal for the specified docID.
+   * @param docID document ID to lookup
+   * @return ordinal for the document: this is dense, starts at 0, then
+   * increments by 1 for the next value in sorted order.
+   */
   public abstract int getOrd(int docID);
 
+  /** Retrieves the value for the specified ordinal.
+   * @param ord ordinal to lookup
+   * @param result will be populated with the ordinal's value
+   * @see #getOrd(int)
+   */
   public abstract void lookupOrd(int ord, BytesRef result);
 
+  /**
+   * Returns the number of unique values.
+   * @return number of unique values in this SortedDocValues. This is
+   * also equivalent to one plus the maximum ordinal.
+   */
   public abstract int getValueCount();
 
   @Override
@@ -37,6 +61,7 @@ public abstract class SortedDocValues extends BinaryDocValues {
     }
   }
 
+  /** An empty SortedDocValues which returns empty bytes for every document */
   public static final SortedDocValues EMPTY = new SortedDocValues() {
     @Override
     public int getOrd(int docID) {
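The SortedDocValues javadoc added in this commit describes a dictionary of unique, sorted values with one dense ordinal per document. The sketch below rebuilds that structure in plain Java to make the relationship between `getOrd`, `lookupOrd`, and `getValueCount` concrete. `SortedValuesSketch` is a hypothetical class, and it uses `String` with natural ordering where the real API works on bytes; that substitution is an assumption made for brevity.

```java
import java.util.Arrays;
import java.util.TreeSet;

// Hypothetical illustration of the ordinal scheme; not a Lucene class.
final class SortedValuesSketch {
  private final String[] dictionary; // unique values in sorted order
  private final int[] ords;          // one ordinal per document

  /** Assumes every document has a non-null value. */
  SortedValuesSketch(String[] perDocument) {
    // Deduplicate and sort in one step.
    dictionary = new TreeSet<String>(Arrays.asList(perDocument)).toArray(new String[0]);
    ords = new int[perDocument.length];
    for (int doc = 0; doc < perDocument.length; doc++) {
      // Every value is present in the dictionary, so the search result is its index.
      ords[doc] = Arrays.binarySearch(dictionary, perDocument[doc]);
    }
  }

  int getOrd(int docID) {
    return ords[docID];
  }

  String lookupOrd(int ord) {
    return dictionary[ord];
  }

  int getValueCount() {
    return dictionary.length;
  }
}
```

Since ordinals follow sort order, two documents can be compared by ordinal alone, without touching the underlying values.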
@@ -254,7 +254,7 @@ its {@link org.apache.lucene.search.similarities.Similarity#computeNorm} method.
 </p>
 <p>
 Additional user-supplied statistics can be added to the document as DocValues fields and
-accessed via {@link org.apache.lucene.index.AtomicReader#docValues}.
+accessed via {@link org.apache.lucene.index.AtomicReader#getNumericDocValues}.
 </p>
 <p>
 </body>
@@ -338,7 +338,7 @@ extend by plugging in a different component (e.g. term frequency normalizer).
 Finally, you can extend the low level {@link org.apache.lucene.search.similarities.Similarity Similarity} directly
 to implement a new retrieval model, or to use external scoring factors particular to your application. For example,
 a custom Similarity can access per-document values via {@link org.apache.lucene.search.FieldCache FieldCache} or
-{@link org.apache.lucene.index.DocValues} and integrate them into the score.
+{@link org.apache.lucene.index.NumericDocValues} and integrate them into the score.
 </p>
 <p>
 See the {@link org.apache.lucene.search.similarities} package documentation for information
@@ -132,7 +132,7 @@ subclassing the Similarity, one can simply introduce a new basic model and tell
 matching term occurs. In these
 cases people have overridden Similarity to return 1 from the tf() method.</p></li>
 <li><p>Changing Length Normalization — By overriding
-{@link org.apache.lucene.search.similarities.Similarity#computeNorm(FieldInvertState state, Norm)},
+{@link org.apache.lucene.search.similarities.Similarity#computeNorm(FieldInvertState state)},
 it is possible to discount how the length of a field contributes
 to a score. In {@link org.apache.lucene.search.similarities.DefaultSimilarity},
 lengthNorm = 1 / (numTerms in field)^0.5, but if one changes this to be
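The context lines above give the default length normalization as lengthNorm = 1 / (numTerms in field)^0.5. The arithmetic can be sketched on its own as follows. `LengthNormSketch` is a hypothetical helper rather than part of any Similarity class, and the `exponent` parameter is an assumption added only to show how a different exponent changes the discount.

```java
// Standalone arithmetic sketch; not a Lucene Similarity implementation.
final class LengthNormSketch {

  /** The default formula quoted in the hunk: 1 / sqrt(numTerms). */
  static float lengthNorm(int numTerms) {
    return lengthNorm(numTerms, 0.5);
  }

  /** Hypothetical generalization: 1 / numTerms^exponent. An exponent of 0 ignores length entirely. */
  static float lengthNorm(int numTerms, double exponent) {
    return (float) (1.0 / Math.pow(numTerms, exponent));
  }
}
```

With the default exponent, a 4-term field gets a norm of 0.5 and a 16-term field gets 0.25, so longer fields contribute less per matching term.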