mirror of https://github.com/apache/lucene.git
file formats
git-svn-id: https://svn.apache.org/repos/asf/lucene/dev/branches/lucene4765@1446988 13f79535-47bb-0310-9956-ffa450edef68
parent a4af217886
commit 4991491bda
@@ -375,7 +375,8 @@ can optionally be indexed into the postings lists. Payloads can be stored in the
 term vectors.</li>
 <li>In version 4.1, the format of the postings list changed to use either
 of FOR compression or variable-byte encoding, depending upon the frequency
-of the term.</li>
+of the term. Terms appearing only once were changed to inline directly into
+the term dictionary. Stored fields are compressed by default. </li>
 </ul>
 <a name="Limitations" id="Limitations"></a>
 <h2>Limitations</h2>
@@ -34,7 +34,7 @@ import org.apache.lucene.util.packed.BlockPackedWriter;
 /**
  * Lucene 4.2 DocValues format.
  * <p>
- * Encodes the three per-document value types (Numeric,Binary,Sorted) with five basic strategies.
+ * Encodes the four per-document value types (Numeric,Binary,Sorted,SortedSet) with seven basic strategies.
  * <p>
  * <ul>
  * <li>Delta-compressed Numerics: per-document integers written in blocks of 4096. For each block
@@ -51,7 +51,9 @@ import org.apache.lucene.util.packed.BlockPackedWriter;
  * start for the block, and the average (expected) delta per entry. For each document the
  * deviation from the delta (actual - expected) is written.
  * <li>Sorted: an FST mapping deduplicated terms to ordinals is written, along with the per-document
- * ordinals written using one of the numeric stratgies above.
+ * ordinals written using one of the numeric strategies above.
+ * <li>SortedSet: an FST mapping deduplicated terms to ordinals is written, along with the per-document
+ * ordinal list written using one of the binary strategies above.
  * </ul>
  * <p>
  * Files:
@@ -77,6 +79,8 @@ import org.apache.lucene.util.packed.BlockPackedWriter;
  * </ul>
  * <p>Sorted fields have two entries: a SortedEntry with the FST metadata,
  * and an ordinary NumericEntry for the document-to-ord metadata.</p>
+ * <p>SortedSet fields have two entries: a SortedEntry with the FST metadata,
+ * and an ordinary BinaryEntry for the document-to-ord-list metadata.</p>
  * <p>FieldNumber of -1 indicates the end of metadata.</p>
  * <p>EntryType is a 0 (NumericEntry), 1 (BinaryEntry, or 2 (SortedEntry)</p>
  * <p>DataOffset is the pointer to the start of the data in the DocValues data (.dvd)</p>
@@ -107,6 +111,8 @@ import org.apache.lucene.util.packed.BlockPackedWriter;
  * <li>UncompressedNumerics --> {@link DataOutput#writeByte Byte}<sup>maxdoc</sup></li>
  * <li>Addresses --> {@link MonotonicBlockPackedWriter MonotonicBlockPackedInts(blockSize=4096)}</li>
  * </ul>
+ * <p>SortedSet entries store the list of ordinals in their BinaryData as a
+ * sequences of increasing {@link DataOutput#writeVLong vLong}s, delta-encoded.</p>
  * </ol>
  */
 public final class Lucene42DocValuesFormat extends DocValuesFormat {
@@ -375,7 +375,11 @@ can optionally be indexed into the postings lists. Payloads can be stored in the
 term vectors.</li>
 <li>In version 4.1, the format of the postings list changed to use either
 of FOR compression or variable-byte encoding, depending upon the frequency
-of the term.</li>
+of the term. Terms appearing only once were changed to inline directly into
+the term dictionary. Stored fields are compressed by default. </li>
+<li>In version 4.2, term vectors are compressed by default. DocValues has
+a new multi-valued type (SortedSet), that can be used for faceting/grouping/joining
+on multi-valued fields.</li>
 </ul>
 <a name="Limitations" id="Limitations"></a>
 <h2>Limitations</h2>
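Note on the SortedSet encoding: the Javadoc added in this commit says each document's ordinal list is stored in its BinaryData as increasing vLongs, delta-encoded. The following is a minimal, self-contained sketch of that idea, not the code from this commit. The class `OrdListCodec` and its methods are hypothetical names, and the variable-length routine is a plain 7-bits-per-byte encoding written here for illustration rather than a call into Lucene's `DataOutput`.

```java
import java.io.ByteArrayOutputStream;
import java.util.ArrayList;
import java.util.List;

/** Hypothetical helper: delta-encodes an increasing ordinal list as variable-length longs. */
public final class OrdListCodec {
    private OrdListCodec() {}

    /** Writes a non-negative long in 7-bit groups, low-order group first; the high bit marks continuation. */
    static void writeVLong(ByteArrayOutputStream out, long value) {
        while ((value & ~0x7FL) != 0) {
            out.write((int) ((value & 0x7F) | 0x80));
            value >>>= 7;
        }
        out.write((int) value);
    }

    /** Encodes ordinals (must be non-decreasing and non-negative) as the gaps between neighbours. */
    static byte[] encode(long[] sortedOrds) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        long previous = 0;
        for (long ord : sortedOrds) {
            if (ord < previous) {
                throw new IllegalArgumentException("ordinals must be increasing");
            }
            writeVLong(out, ord - previous); // small gaps take a single byte
            previous = ord;
        }
        return out.toByteArray();
    }

    /** Reads gaps until the bytes are exhausted and rebuilds the ordinals by running sum. */
    static long[] decode(byte[] bytes) {
        List<Long> ords = new ArrayList<>();
        long previous = 0;
        int pos = 0;
        while (pos < bytes.length) {
            long delta = 0;
            int shift = 0;
            byte b;
            do {
                b = bytes[pos++];
                delta |= (long) (b & 0x7F) << shift;
                shift += 7;
            } while ((b & 0x80) != 0);
            previous += delta;
            ords.add(previous);
        }
        long[] result = new long[ords.size()];
        for (int i = 0; i < result.length; i++) {
            result[i] = ords.get(i);
        }
        return result;
    }
}
```

Because the ordinals within one document are sorted, the gaps are usually small, so most entries fit in one byte regardless of how large the ordinals themselves are.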