mirror of https://github.com/apache/lucene.git
LUCENE-1848: remove old version references where it makes sense
git-svn-id: https://svn.apache.org/repos/asf/lucene/java/trunk@807653 13f79535-47bb-0310-9956-ffa450edef68
parent
3519f543e7
commit
7dd9b440aa
|
@ -368,7 +368,7 @@ document.write("Last Published: " + document.lastModified);
|
||||||
<div class="section">
|
<div class="section">
|
||||||
<p>
|
<p>
|
||||||
This document defines the index file formats used
|
This document defines the index file formats used
|
||||||
in Lucene version 2.1. If you are using a different
|
in Lucene version 2.9. If you are using a different
|
||||||
version of Lucene, please consult the copy of
|
version of Lucene, please consult the copy of
|
||||||
<span class="codefrag">docs/fileformats.html</span>
|
<span class="codefrag">docs/fileformats.html</span>
|
||||||
that was distributed
|
that was distributed
|
||||||
|
@ -382,7 +382,7 @@ document.write("Last Published: " + document.lastModified);
|
||||||
languages</a>. If these versions are to remain compatible with Apache
|
languages</a>. If these versions are to remain compatible with Apache
|
||||||
Lucene, then a language-independent definition of the Lucene index
|
Lucene, then a language-independent definition of the Lucene index
|
||||||
format is required. This document thus attempts to provide a
|
format is required. This document thus attempts to provide a
|
||||||
complete and independent definition of the Apache Lucene 2.1 file
|
complete and independent definition of the Apache Lucene 2.9 file
|
||||||
formats.
|
formats.
|
||||||
</p>
|
</p>
|
||||||
<p>
|
<p>
|
||||||
|
@ -786,7 +786,7 @@ document.write("Last Published: " + document.lastModified);
|
||||||
<tr>
|
<tr>
|
||||||
|
|
||||||
<td><a href="#Normalization Factors">Norms</a></td>
|
<td><a href="#Normalization Factors">Norms</a></td>
|
||||||
<td>.nrm (pre 2.1: .f[0-9]*)</td>
|
<td>.nrm</td>
|
||||||
<td>Encodes length and boost factors for docs and fields</td>
|
<td>Encodes length and boost factors for docs and fields</td>
|
||||||
|
|
||||||
</tr>
|
</tr>
|
||||||
|
@ -1492,37 +1492,7 @@ document.write("Last Published: " + document.lastModified);
|
||||||
</p>
|
</p>
|
||||||
<p>
|
<p>
|
||||||
|
|
||||||
<b>Pre-2.1:</b>
|
<b>2.9</b>
|
||||||
Segments --> Format, Version, NameCounter, SegCount, <SegName, SegSize>
|
|
||||||
<sup>SegCount</sup>
|
|
||||||
|
|
||||||
</p>
|
|
||||||
<p>
|
|
||||||
|
|
||||||
<b>2.1 and above:</b>
|
|
||||||
Segments --> Format, Version, NameCounter, SegCount, <SegName, SegSize, DelGen, HasSingleNormFile, NumField,
|
|
||||||
NormGen<sup>NumField</sup>,
|
|
||||||
IsCompoundFile><sup>SegCount</sup>
|
|
||||||
|
|
||||||
</p>
|
|
||||||
<p>
|
|
||||||
|
|
||||||
<b>2.3:</b>
|
|
||||||
Segments --> Format, Version, NameCounter, SegCount, <SegName, SegSize, DelGen, DocStoreOffset, [DocStoreSegment, DocStoreIsCompoundFile], HasSingleNormFile, NumField,
|
|
||||||
NormGen<sup>NumField</sup>,
|
|
||||||
IsCompoundFile><sup>SegCount</sup>
|
|
||||||
|
|
||||||
</p>
|
|
||||||
<p>
|
|
||||||
|
|
||||||
<b>2.4 and above:</b>
|
|
||||||
Segments --> Format, Version, NameCounter, SegCount, <SegName, SegSize, DelGen, DocStoreOffset, [DocStoreSegment, DocStoreIsCompoundFile], HasSingleNormFile, NumField,
|
|
||||||
NormGen<sup>NumField</sup>,
|
|
||||||
IsCompoundFile, DeletionCount, HasProx><sup>SegCount</sup>, Checksum
|
|
||||||
</p>
|
|
||||||
<p>
|
|
||||||
|
|
||||||
<b>2.9 and above:</b>
|
|
||||||
Segments --> Format, Version, NameCounter, SegCount, <SegName, SegSize, DelGen, DocStoreOffset, [DocStoreSegment, DocStoreIsCompoundFile], HasSingleNormFile, NumField,
|
Segments --> Format, Version, NameCounter, SegCount, <SegName, SegSize, DelGen, DocStoreOffset, [DocStoreSegment, DocStoreIsCompoundFile], HasSingleNormFile, NumField,
|
||||||
NormGen<sup>NumField</sup>,
|
NormGen<sup>NumField</sup>,
|
||||||
IsCompoundFile, DeletionCount, HasProx, Diagnostics><sup>SegCount</sup>, CommitUserData, Checksum
|
IsCompoundFile, DeletionCount, HasProx, Diagnostics><sup>SegCount</sup>, CommitUserData, Checksum
|
||||||
|
@ -1548,7 +1518,7 @@ document.write("Last Published: " + document.lastModified);
|
||||||
CommitUserData --> Map<String,String>
|
CommitUserData --> Map<String,String>
|
||||||
</p>
|
</p>
|
||||||
<p>
|
<p>
|
||||||
Format is -1 as of Lucene 1.4, -3 (SegmentInfos.FORMAT_SINGLE_NORM_FILE) as of Lucene 2.1 and 2.2, -4 (SegmentInfos.FORMAT_SHARED_DOC_STORE) as of Lucene 2.3, -7 (SegmentInfos.FORMAT_HAS_PROX) as of Lucene 2.4, and -9 (SegmentInfos.FORMAT_DIAGNOSTICS) as of Lucene 2.9.
|
Format is -9 (SegmentInfos.FORMAT_DIAGNOSTICS).
|
||||||
</p>
|
</p>
|
||||||
<p>
|
<p>
|
||||||
Version counts how often the index has been
|
Version counts how often the index has been
|
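Reviewer sketch for the Format change above: a reader can check the leading Format field of a segments file against the single value now documented, -9 (SegmentInfos.FORMAT_DIAGNOSTICS). The sketch assumes Format is stored as a 4-byte big-endian integer at the start of the file; that width and byte order are not stated in this hunk. The class and method names are hypothetical and are not Lucene API.

```java
import java.nio.ByteBuffer;

// Hypothetical helper, not part of Lucene.
final class SegmentsFormatSketch {
    // Value given in the new text for SegmentInfos.FORMAT_DIAGNOSTICS.
    static final int FORMAT_DIAGNOSTICS = -9;

    // Assumption: Format is the first field, a 4-byte big-endian integer.
    static int readFormat(byte[] fileStart) {
        return ByteBuffer.wrap(fileStart).getInt(); // ByteBuffer is big-endian by default
    }

    static boolean isDiagnosticsFormat(byte[] fileStart) {
        return readFormat(fileStart) == FORMAT_DIAGNOSTICS;
    }

    public static void main(String[] args) {
        byte[] start = {(byte) 0xFF, (byte) 0xFF, (byte) 0xFF, (byte) 0xF7}; // two's complement of -9
        System.out.println(isDiagnosticsFormat(start));
    }
}
```

Since the older Format values are dropped from the text, a reader written from this document alone would have to treat any other value as unsupported.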
||||||
|
@ -1648,7 +1618,7 @@ document.write("Last Published: " + document.lastModified);
|
||||||
Lucene version, OS, Java version, why the segment
|
Lucene version, OS, Java version, why the segment
|
||||||
was created (merge, flush, addIndexes), etc.
|
was created (merge, flush, addIndexes), etc.
|
||||||
</p>
|
</p>
|
||||||
<a name="N105EB"></a><a name="Lock File"></a>
|
<a name="N105BE"></a><a name="Lock File"></a>
|
||||||
<h3 class="boxed">Lock File</h3>
|
<h3 class="boxed">Lock File</h3>
|
||||||
<p>
|
<p>
|
||||||
The write lock, which is stored in the index
|
The write lock, which is stored in the index
|
||||||
|
@ -1662,20 +1632,14 @@ document.write("Last Published: " + document.lastModified);
|
||||||
documents). This lock file ensures that only one
|
documents). This lock file ensures that only one
|
||||||
writer is modifying the index at a time.
|
writer is modifying the index at a time.
|
||||||
</p>
|
</p>
|
||||||
<p>
|
<a name="N105C7"></a><a name="Deletable File"></a>
|
||||||
Note that prior to version 2.1, Lucene also used a
|
|
||||||
commit lock. This was removed in 2.1.
|
|
||||||
</p>
|
|
||||||
<a name="N105F7"></a><a name="Deletable File"></a>
|
|
||||||
<h3 class="boxed">Deletable File</h3>
|
<h3 class="boxed">Deletable File</h3>
|
||||||
<p>
|
<p>
|
||||||
Prior to Lucene 2.1 there was a file "deletable"
|
A writer dynamically computes
|
||||||
that contained details about files that need to be
|
|
||||||
deleted. As of 2.1, a writer dynamically computes
|
|
||||||
the files that are deletable, instead, so no file
|
the files that are deletable, instead, so no file
|
||||||
is written.
|
is written.
|
||||||
</p>
|
</p>
|
||||||
<a name="N10600"></a><a name="Compound Files"></a>
|
<a name="N105D0"></a><a name="Compound Files"></a>
|
||||||
<h3 class="boxed">Compound Files</h3>
|
<h3 class="boxed">Compound Files</h3>
|
||||||
<p>Starting with Lucene 1.4 the compound file format became default. This
|
<p>Starting with Lucene 1.4 the compound file format became default. This
|
||||||
is simply a container for all files described in the next section
|
is simply a container for all files described in the next section
|
||||||
|
@ -1702,14 +1666,14 @@ document.write("Last Published: " + document.lastModified);
|
||||||
</div>
|
</div>
|
||||||
|
|
||||||
|
|
||||||
<a name="N10628"></a><a name="Per-Segment Files"></a>
|
<a name="N105F8"></a><a name="Per-Segment Files"></a>
|
||||||
<h2 class="boxed">Per-Segment Files</h2>
|
<h2 class="boxed">Per-Segment Files</h2>
|
||||||
<div class="section">
|
<div class="section">
|
||||||
<p>
|
<p>
|
||||||
The remaining files are all per-segment, and are
|
The remaining files are all per-segment, and are
|
||||||
thus defined by suffix.
|
thus defined by suffix.
|
||||||
</p>
|
</p>
|
||||||
<a name="N10630"></a><a name="Fields"></a>
|
<a name="N10600"></a><a name="Fields"></a>
|
||||||
<h3 class="boxed">Fields</h3>
|
<h3 class="boxed">Fields</h3>
|
||||||
<p>
|
<p>
|
||||||
|
|
||||||
|
@ -1755,12 +1719,6 @@ document.write("Last Published: " + document.lastModified);
|
||||||
without term vectors.
|
without term vectors.
|
||||||
</li>
|
</li>
|
||||||
|
|
||||||
<p>
|
|
||||||
|
|
||||||
<b>Lucene >= 1.9:</b>
|
|
||||||
|
|
||||||
</p>
|
|
||||||
|
|
||||||
<li>If the third lowest-order bit is set (0x04), term positions are stored with the term vectors.</li>
|
<li>If the third lowest-order bit is set (0x04), term positions are stored with the term vectors.</li>
|
||||||
|
|
||||||
<li>If the fourth lowest-order bit is set (0x08), term offsets are stored with the term vectors.</li>
|
<li>If the fourth lowest-order bit is set (0x08), term offsets are stored with the term vectors.</li>
|
||||||
|
@ -1873,31 +1831,6 @@ document.write("Last Published: " + document.lastModified);
|
||||||
VInt
|
VInt
|
||||||
</p>
|
</p>
|
||||||
|
|
||||||
|
|
||||||
<p>
|
|
||||||
|
|
||||||
<b>Lucene <= 1.4:</b>
|
|
||||||
|
|
||||||
</p>
|
|
||||||
|
|
||||||
<p>Bits -->
|
|
||||||
Byte
|
|
||||||
</p>
|
|
||||||
|
|
||||||
<p>Value -->
|
|
||||||
String
|
|
||||||
</p>
|
|
||||||
|
|
||||||
<p>Only the low-order bit of Bits is used. It is one for
|
|
||||||
tokenized fields, and zero for non-tokenized fields.
|
|
||||||
</p>
|
|
||||||
|
|
||||||
<p>
|
|
||||||
|
|
||||||
<b>Lucene >= 1.9:</b>
|
|
||||||
|
|
||||||
</p>
|
|
||||||
|
|
||||||
<p>Bits -->
|
<p>Bits -->
|
||||||
Byte
|
Byte
|
||||||
</p>
|
</p>
|
||||||
|
@ -1933,7 +1866,7 @@ document.write("Last Published: " + document.lastModified);
|
||||||
</li>
|
</li>
|
||||||
|
|
||||||
</ol>
|
</ol>
|
||||||
<a name="N106F2"></a><a name="Term Dictionary"></a>
|
<a name="N106A7"></a><a name="Term Dictionary"></a>
|
||||||
<h3 class="boxed">Term Dictionary</h3>
|
<h3 class="boxed">Term Dictionary</h3>
|
||||||
<p>
|
<p>
|
||||||
The term dictionary is represented as two files:
|
The term dictionary is represented as two files:
|
||||||
|
@ -2006,7 +1939,7 @@ document.write("Last Published: " + document.lastModified);
|
||||||
</p>
|
</p>
|
||||||
|
|
||||||
<p>TIVersion names the version of the format
|
<p>TIVersion names the version of the format
|
||||||
of this file and is -2 in Lucene 1.4.
|
of this file and is equal to TermInfosWriter.FORMAT_CURRENT.
|
||||||
</p>
|
</p>
|
||||||
|
|
||||||
<p>Term
|
<p>Term
|
||||||
|
@ -2125,7 +2058,7 @@ document.write("Last Published: " + document.lastModified);
|
||||||
</li>
|
</li>
|
||||||
|
|
||||||
</ol>
|
</ol>
|
||||||
<a name="N10776"></a><a name="Frequencies"></a>
|
<a name="N1072B"></a><a name="Frequencies"></a>
|
||||||
<h3 class="boxed">Frequencies</h3>
|
<h3 class="boxed">Frequencies</h3>
|
||||||
<p>
|
<p>
|
||||||
The .frq file contains the lists of documents
|
The .frq file contains the lists of documents
|
||||||
|
@ -2241,7 +2174,7 @@ document.write("Last Published: " + document.lastModified);
|
||||||
<sup>nd</sup>
|
<sup>nd</sup>
|
||||||
starts.
|
starts.
|
||||||
</p>
|
</p>
|
||||||
<p>Lucene 2.2 introduces the notion of skip levels. Each term can have multiple skip levels.
|
<p>Each term can have multiple skip levels.
|
||||||
The amount of skip levels for a term is NumSkipLevels = Min(MaxSkipLevels, floor(log(DocFreq/log(SkipInterval)))).
|
The amount of skip levels for a term is NumSkipLevels = Min(MaxSkipLevels, floor(log(DocFreq/log(SkipInterval)))).
|
||||||
The number of SkipData entries for a skip level is DocFreq/(SkipInterval^(Level + 1)), whereas the lowest skip
|
The number of SkipData entries for a skip level is DocFreq/(SkipInterval^(Level + 1)), whereas the lowest skip
|
||||||
level is Level=0. <br>
|
level is Level=0. <br>
|
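Reviewer sketch for the skip-level paragraph above. The formula as printed nests the logarithms as log(DocFreq/log(SkipInterval)). The code below instead reads it as floor(log(DocFreq) / log(SkipInterval)), which is an assumption about the intended meaning and not something this diff confirms. The class name, method names, and the sample numbers are hypothetical.

```java
// Hypothetical helper, not part of Lucene.
final class SkipLevelsSketch {
    // NumSkipLevels = Min(MaxSkipLevels, floor(log(DocFreq) / log(SkipInterval))),
    // under the assumed reading of the printed formula.
    static int numSkipLevels(int docFreq, int skipInterval, int maxSkipLevels) {
        if (docFreq <= 0) {
            return 0; // no documents, so nothing to skip over
        }
        int levels = (int) Math.floor(Math.log(docFreq) / Math.log(skipInterval));
        return Math.min(maxSkipLevels, levels);
    }

    // Number of SkipData entries on a level: DocFreq / SkipInterval^(Level + 1),
    // using integer division; Level 0 is the lowest level.
    static int entriesOnLevel(int docFreq, int skipInterval, int level) {
        long divisor = 1;
        for (int i = 0; i <= level; i++) {
            divisor *= skipInterval;
        }
        return (int) (docFreq / divisor);
    }

    public static void main(String[] args) {
        int docFreq = 1000, skipInterval = 16, maxSkipLevels = 10; // illustrative values only
        int levels = numSkipLevels(docFreq, skipInterval, maxSkipLevels);
        for (int level = 0; level < levels; level++) {
            System.out.println("level " + level + ": " + entriesOnLevel(docFreq, skipInterval, level));
        }
    }
}
```

With the illustrative values, the term gets two skip levels, holding 62 and 3 entries respectively.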
||||||
|
@ -2253,7 +2186,7 @@ document.write("Last Published: " + document.lastModified);
|
||||||
entry in level-1. In the example has entry 15 on level 1 a pointer to entry 15 on level 0 and entry 31 on level 1 a pointer
|
entry in level-1. In the example has entry 15 on level 1 a pointer to entry 15 on level 0 and entry 31 on level 1 a pointer
|
||||||
to entry 31 on level 0.
|
to entry 31 on level 0.
|
||||||
</p>
|
</p>
|
||||||
<a name="N107FE"></a><a name="Positions"></a>
|
<a name="N107B3"></a><a name="Positions"></a>
|
||||||
<h3 class="boxed">Positions</h3>
|
<h3 class="boxed">Positions</h3>
|
||||||
<p>
|
<p>
|
||||||
The .prx file contains the lists of positions that
|
The .prx file contains the lists of positions that
|
||||||
|
@ -2323,25 +2256,9 @@ document.write("Last Published: " + document.lastModified);
|
||||||
Payload. If PayloadLength is not stored, then this Payload has the same
|
Payload. If PayloadLength is not stored, then this Payload has the same
|
||||||
length as the Payload at the previous position.
|
length as the Payload at the previous position.
|
||||||
</p>
|
</p>
|
||||||
<a name="N1083A"></a><a name="Normalization Factors"></a>
|
<a name="N107EF"></a><a name="Normalization Factors"></a>
|
||||||
<h3 class="boxed">Normalization Factors</h3>
|
<h3 class="boxed">Normalization Factors</h3>
|
||||||
<p>
|
<p>There's a single .nrm file containing all norms:
|
||||||
|
|
||||||
<b>Pre-2.1:</b>
|
|
||||||
There's a norm file for each indexed field with a byte for
|
|
||||||
each document. The .f[0-9]* file contains,
|
|
||||||
for each document, a byte that encodes a value that is multiplied
|
|
||||||
into the score for hits on that field:
|
|
||||||
</p>
|
|
||||||
<p>Norms
|
|
||||||
(.f[0-9]*) --> <Byte>
|
|
||||||
<sup>SegSize</sup>
|
|
||||||
|
|
||||||
</p>
|
|
||||||
<p>
|
|
||||||
|
|
||||||
<b>2.1 and above:</b>
|
|
||||||
There's a single .nrm file containing all norms:
|
|
||||||
</p>
|
</p>
|
||||||
<p>AllNorms
|
<p>AllNorms
|
||||||
(.nrm) --> NormsHeader,<Norms>
|
(.nrm) --> NormsHeader,<Norms>
|
||||||
|
@ -2417,17 +2334,9 @@ document.write("Last Published: " + document.lastModified);
|
||||||
When field <em>N</em> is modified, a separate norm file <em>.sN</em>
|
When field <em>N</em> is modified, a separate norm file <em>.sN</em>
|
||||||
is created, to maintain the norm values for that field.
|
is created, to maintain the norm values for that field.
|
||||||
</p>
|
</p>
|
||||||
<p>
|
<p>Separate norm files are created (when adequate) for both compound and non compound segments.
|
||||||
|
|
||||||
<b>Pre-2.1:</b>
|
|
||||||
Separate norm files are created only for compound segments.
|
|
||||||
</p>
|
</p>
|
||||||
<p>
|
<a name="N10840"></a><a name="Term Vectors"></a>
|
||||||
|
|
||||||
<b>2.1 and above:</b>
|
|
||||||
Separate norm files are created (when adequate) for both compound and non compound segments.
|
|
||||||
</p>
|
|
||||||
<a name="N108A3"></a><a name="Term Vectors"></a>
|
|
||||||
<h3 class="boxed">Term Vectors</h3>
|
<h3 class="boxed">Term Vectors</h3>
|
||||||
<p>
|
<p>
|
||||||
Term Vector support is an optional on a field by
|
Term Vector support is an optional on a field by
|
||||||
|
@ -2450,7 +2359,7 @@ document.write("Last Published: " + document.lastModified);
|
||||||
|
|
||||||
</p>
|
</p>
|
||||||
|
|
||||||
<p>TVXVersion --> Int (3 (TermVectorsReader.FORMAT_VERSION2) for Lucene 2.4)</p>
|
<p>TVXVersion --> Int (TermVectorsReader.CURRENT)</p>
|
||||||
|
|
||||||
<p>DocumentPosition --> UInt64 (offset in
|
<p>DocumentPosition --> UInt64 (offset in
|
||||||
the .tvd file)</p>
|
the .tvd file)</p>
|
||||||
|
@ -2475,7 +2384,7 @@ document.write("Last Published: " + document.lastModified);
|
||||||
|
|
||||||
</p>
|
</p>
|
||||||
|
|
||||||
<p>TVDVersion --> Int (3 (TermVectorsReader.FORMAT_VERSION2) for Lucene 2.4)</p>
|
<p>TVDVersion --> Int (TermVectorsReader.FORMAT_CURRENT)</p>
|
||||||
|
|
||||||
<p>NumFields --> VInt</p>
|
<p>NumFields --> VInt</p>
|
||||||
|
|
||||||
|
@ -2511,7 +2420,7 @@ document.write("Last Published: " + document.lastModified);
|
||||||
|
|
||||||
</p>
|
</p>
|
||||||
|
|
||||||
<p>TVFVersion --> Int (3 (TermVectorsReader.FORMAT_VERSION2) for Lucene 2.4)</p>
|
<p>TVFVersion --> Int (TermVectorsReader.FORMAT_CURRENT)</p>
|
||||||
|
|
||||||
<p>NumTerms --> VInt</p>
|
<p>NumTerms --> VInt</p>
|
||||||
|
|
||||||
|
@ -2563,7 +2472,7 @@ document.write("Last Published: " + document.lastModified);
|
||||||
</li>
|
</li>
|
||||||
|
|
||||||
</ol>
|
</ol>
|
||||||
<a name="N1093F"></a><a name="Deleted Documents"></a>
|
<a name="N108DC"></a><a name="Deleted Documents"></a>
|
||||||
<h3 class="boxed">Deleted Documents</h3>
|
<h3 class="boxed">Deleted Documents</h3>
|
||||||
<p>The .del file is
|
<p>The .del file is
|
||||||
optional, and only exists when a segment contains deletions.
|
optional, and only exists when a segment contains deletions.
|
||||||
|
@ -2571,14 +2480,6 @@ document.write("Last Published: " + document.lastModified);
|
||||||
<p>Although per-segment, this file is maintained exterior to compound segment files.
|
<p>Although per-segment, this file is maintained exterior to compound segment files.
|
||||||
</p>
|
</p>
|
||||||
<p>
|
<p>
|
||||||
|
|
||||||
<b>Pre-2.1:</b>
|
|
||||||
Deletions
|
|
||||||
(.del) --> ByteCount,BitCount,Bits
|
|
||||||
</p>
|
|
||||||
<p>
|
|
||||||
|
|
||||||
<b>2.1 and above:</b>
|
|
||||||
Deletions
|
Deletions
|
||||||
(.del) --> [Format],ByteCount,BitCount, Bits | DGaps (depending on Format)
|
(.del) --> [Format],ByteCount,BitCount, Bits | DGaps (depending on Format)
|
||||||
</p>
|
</p>
|
||||||
|
@ -2635,7 +2536,7 @@ document.write("Last Published: " + document.lastModified);
|
||||||
</div>
|
</div>
|
||||||
|
|
||||||
|
|
||||||
<a name="N10982"></a><a name="Limitations"></a>
|
<a name="N10916"></a><a name="Limitations"></a>
|
||||||
<h2 class="boxed">Limitations</h2>
|
<h2 class="boxed">Limitations</h2>
|
||||||
<div class="section">
|
<div class="section">
|
||||||
<p>
|
<p>
|
||||||
|
|
File diff suppressed because it is too large
|
@ -12,7 +12,7 @@
|
||||||
|
|
||||||
<p>
|
<p>
|
||||||
This document defines the index file formats used
|
This document defines the index file formats used
|
||||||
in Lucene version 2.1. If you are using a different
|
in Lucene version 2.9. If you are using a different
|
||||||
version of Lucene, please consult the copy of
|
version of Lucene, please consult the copy of
|
||||||
<code>docs/fileformats.html</code>
|
<code>docs/fileformats.html</code>
|
||||||
that was distributed
|
that was distributed
|
||||||
|
@ -27,7 +27,7 @@
|
||||||
languages</a>. If these versions are to remain compatible with Apache
|
languages</a>. If these versions are to remain compatible with Apache
|
||||||
Lucene, then a language-independent definition of the Lucene index
|
Lucene, then a language-independent definition of the Lucene index
|
||||||
format is required. This document thus attempts to provide a
|
format is required. This document thus attempts to provide a
|
||||||
complete and independent definition of the Apache Lucene 2.1 file
|
complete and independent definition of the Apache Lucene 2.9 file
|
||||||
formats.
|
formats.
|
||||||
</p>
|
</p>
|
||||||
|
|
||||||
|
@ -367,7 +367,7 @@
|
||||||
</tr>
|
</tr>
|
||||||
<tr>
|
<tr>
|
||||||
<td><a href="#Normalization Factors">Norms</a></td>
|
<td><a href="#Normalization Factors">Norms</a></td>
|
||||||
<td>.nrm (pre 2.1: .f[0-9]*)</td>
|
<td>.nrm</td>
|
||||||
<td>Encodes length and boost factors for docs and fields</td>
|
<td>Encodes length and boost factors for docs and fields</td>
|
||||||
</tr>
|
</tr>
|
||||||
<tr>
|
<tr>
|
||||||
|
@ -903,32 +903,8 @@
|
||||||
-2), followed by the generation recorded as Int64,
|
-2), followed by the generation recorded as Int64,
|
||||||
written twice.
|
written twice.
|
||||||
</p>
|
</p>
|
||||||
|
|
||||||
<p>
|
<p>
|
||||||
<b>Pre-2.1:</b>
|
<b>2.9</b>
|
||||||
Segments --> Format, Version, NameCounter, SegCount, <SegName, SegSize>
|
|
||||||
<sup>SegCount</sup>
|
|
||||||
</p>
|
|
||||||
<p>
|
|
||||||
<b>2.1 and above:</b>
|
|
||||||
Segments --> Format, Version, NameCounter, SegCount, <SegName, SegSize, DelGen, HasSingleNormFile, NumField,
|
|
||||||
NormGen<sup>NumField</sup>,
|
|
||||||
IsCompoundFile><sup>SegCount</sup>
|
|
||||||
</p>
|
|
||||||
<p>
|
|
||||||
<b>2.3:</b>
|
|
||||||
Segments --> Format, Version, NameCounter, SegCount, <SegName, SegSize, DelGen, DocStoreOffset, [DocStoreSegment, DocStoreIsCompoundFile], HasSingleNormFile, NumField,
|
|
||||||
NormGen<sup>NumField</sup>,
|
|
||||||
IsCompoundFile><sup>SegCount</sup>
|
|
||||||
</p>
|
|
||||||
<p>
|
|
||||||
<b>2.4 and above:</b>
|
|
||||||
Segments --> Format, Version, NameCounter, SegCount, <SegName, SegSize, DelGen, DocStoreOffset, [DocStoreSegment, DocStoreIsCompoundFile], HasSingleNormFile, NumField,
|
|
||||||
NormGen<sup>NumField</sup>,
|
|
||||||
IsCompoundFile, DeletionCount, HasProx><sup>SegCount</sup>, Checksum
|
|
||||||
</p>
|
|
||||||
<p>
|
|
||||||
<b>2.9 and above:</b>
|
|
||||||
Segments --> Format, Version, NameCounter, SegCount, <SegName, SegSize, DelGen, DocStoreOffset, [DocStoreSegment, DocStoreIsCompoundFile], HasSingleNormFile, NumField,
|
Segments --> Format, Version, NameCounter, SegCount, <SegName, SegSize, DelGen, DocStoreOffset, [DocStoreSegment, DocStoreIsCompoundFile], HasSingleNormFile, NumField,
|
||||||
NormGen<sup>NumField</sup>,
|
NormGen<sup>NumField</sup>,
|
||||||
IsCompoundFile, DeletionCount, HasProx, Diagnostics><sup>SegCount</sup>, CommitUserData, Checksum
|
IsCompoundFile, DeletionCount, HasProx, Diagnostics><sup>SegCount</sup>, CommitUserData, Checksum
|
||||||
|
@ -961,7 +937,7 @@
|
||||||
</p>
|
</p>
|
||||||
|
|
||||||
<p>
|
<p>
|
||||||
Format is -1 as of Lucene 1.4, -3 (SegmentInfos.FORMAT_SINGLE_NORM_FILE) as of Lucene 2.1 and 2.2, -4 (SegmentInfos.FORMAT_SHARED_DOC_STORE) as of Lucene 2.3, -7 (SegmentInfos.FORMAT_HAS_PROX) as of Lucene 2.4, and -9 (SegmentInfos.FORMAT_DIAGNOSTICS) as of Lucene 2.9.
|
Format is -9 (SegmentInfos.FORMAT_DIAGNOSTICS).
|
||||||
</p>
|
</p>
|
||||||
|
|
||||||
<p>
|
<p>
|
||||||
|
@ -1092,20 +1068,12 @@
|
||||||
documents). This lock file ensures that only one
|
documents). This lock file ensures that only one
|
||||||
writer is modifying the index at a time.
|
writer is modifying the index at a time.
|
||||||
</p>
|
</p>
|
||||||
|
|
||||||
<p>
|
|
||||||
Note that prior to version 2.1, Lucene also used a
|
|
||||||
commit lock. This was removed in 2.1.
|
|
||||||
</p>
|
|
||||||
|
|
||||||
</section>
|
</section>
|
||||||
|
|
||||||
<section id="Deletable File"><title>Deletable File</title>
|
<section id="Deletable File"><title>Deletable File</title>
|
||||||
|
|
||||||
<p>
|
<p>
|
||||||
Prior to Lucene 2.1 there was a file "deletable"
|
A writer dynamically computes
|
||||||
that contained details about files that need to be
|
|
||||||
deleted. As of 2.1, a writer dynamically computes
|
|
||||||
the files that are deletable, instead, so no file
|
the files that are deletable, instead, so no file
|
||||||
is written.
|
is written.
|
||||||
</p>
|
</p>
|
||||||
|
@ -1193,9 +1161,6 @@
|
||||||
bit is one for fields that have term vectors stored, and zero for fields
|
bit is one for fields that have term vectors stored, and zero for fields
|
||||||
without term vectors.
|
without term vectors.
|
||||||
</li>
|
</li>
|
||||||
<p>
|
|
||||||
<b>Lucene >= 1.9:</b>
|
|
||||||
</p>
|
|
||||||
<li>If the third lowest-order bit is set (0x04), term positions are stored with the term vectors.</li>
|
<li>If the third lowest-order bit is set (0x04), term positions are stored with the term vectors.</li>
|
||||||
<li>If the fourth lowest-order bit is set (0x08), term offsets are stored with the term vectors.</li>
|
<li>If the fourth lowest-order bit is set (0x08), term offsets are stored with the term vectors.</li>
|
||||||
<li>If the fifth lowest-order bit is set (0x10), norms are omitted for the indexed field.</li>
|
<li>If the fifth lowest-order bit is set (0x10), norms are omitted for the indexed field.</li>
|
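Reviewer sketch for the field-bits list above, now that the "Lucene >= 1.9" label is gone and the three flags read as one list. It tests only the masks quoted in this hunk (0x04, 0x08, 0x10). The class name, constant names, and method names are hypothetical and are not taken from Lucene.

```java
// Hypothetical helper, not part of Lucene.
final class FieldBitsSketch {
    static final int TV_POSITIONS = 0x04; // term positions stored with the term vectors
    static final int TV_OFFSETS = 0x08;   // term offsets stored with the term vectors
    static final int OMIT_NORMS = 0x10;   // norms omitted for the indexed field

    static boolean storesPositions(byte bits) {
        return (bits & TV_POSITIONS) != 0;
    }

    static boolean storesOffsets(byte bits) {
        return (bits & TV_OFFSETS) != 0;
    }

    static boolean omitsNorms(byte bits) {
        return (bits & OMIT_NORMS) != 0;
    }

    public static void main(String[] args) {
        byte bits = 0x14; // 0x10 | 0x04: positions stored, norms omitted, no offsets
        System.out.println(storesPositions(bits) + " " + storesOffsets(bits) + " " + omitsNorms(bits));
    }
}
```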
||||||
|
@ -1286,22 +1251,6 @@
|
||||||
<p>FieldNum -->
|
<p>FieldNum -->
|
||||||
VInt
|
VInt
|
||||||
</p>
|
</p>
|
||||||
|
|
||||||
<p>
|
|
||||||
<b>Lucene <= 1.4:</b>
|
|
||||||
</p>
|
|
||||||
<p>Bits -->
|
|
||||||
Byte
|
|
||||||
</p>
|
|
||||||
<p>Value -->
|
|
||||||
String
|
|
||||||
</p>
|
|
||||||
<p>Only the low-order bit of Bits is used. It is one for
|
|
||||||
tokenized fields, and zero for non-tokenized fields.
|
|
||||||
</p>
|
|
||||||
<p>
|
|
||||||
<b>Lucene >= 1.9:</b>
|
|
||||||
</p>
|
|
||||||
<p>Bits -->
|
<p>Bits -->
|
||||||
Byte
|
Byte
|
||||||
</p>
|
</p>
|
||||||
|
@ -1383,7 +1332,7 @@
|
||||||
UTF16 character code) by the term's text.
|
UTF16 character code) by the term's text.
|
||||||
</p>
|
</p>
|
||||||
<p>TIVersion names the version of the format
|
<p>TIVersion names the version of the format
|
||||||
of this file and is -2 in Lucene 1.4.
|
of this file and is equal to TermInfosWriter.FORMAT_CURRENT.
|
||||||
</p>
|
</p>
|
||||||
<p>Term
|
<p>Term
|
||||||
text prefixes are shared. The PrefixLength is the number of initial
|
text prefixes are shared. The PrefixLength is the number of initial
|
||||||
|
@ -1592,7 +1541,7 @@
|
||||||
<sup>nd</sup>
|
<sup>nd</sup>
|
||||||
starts.
|
starts.
|
||||||
</p>
|
</p>
|
||||||
<p>Lucene 2.2 introduces the notion of skip levels. Each term can have multiple skip levels.
|
<p>Each term can have multiple skip levels.
|
||||||
The amount of skip levels for a term is NumSkipLevels = Min(MaxSkipLevels, floor(log(DocFreq/log(SkipInterval)))).
|
The amount of skip levels for a term is NumSkipLevels = Min(MaxSkipLevels, floor(log(DocFreq/log(SkipInterval)))).
|
||||||
The number of SkipData entries for a skip level is DocFreq/(SkipInterval^(Level + 1)), whereas the lowest skip
|
The number of SkipData entries for a skip level is DocFreq/(SkipInterval^(Level + 1)), whereas the lowest skip
|
||||||
level is Level=0. <br></br>
|
level is Level=0. <br></br>
|
||||||
|
@ -1674,20 +1623,8 @@
|
||||||
</p>
|
</p>
|
||||||
</section>
|
</section>
|
||||||
<section id="Normalization Factors"><title>Normalization Factors</title>
|
<section id="Normalization Factors"><title>Normalization Factors</title>
|
||||||
<p>
|
|
||||||
<b>Pre-2.1:</b>
|
<p>There's a single .nrm file containing all norms:
|
||||||
There's a norm file for each indexed field with a byte for
|
|
||||||
each document. The .f[0-9]* file contains,
|
|
||||||
for each document, a byte that encodes a value that is multiplied
|
|
||||||
into the score for hits on that field:
|
|
||||||
</p>
|
|
||||||
<p>Norms
|
|
||||||
(.f[0-9]*) --> <Byte>
|
|
||||||
<sup>SegSize</sup>
|
|
||||||
</p>
|
|
||||||
<p>
|
|
||||||
<b>2.1 and above:</b>
|
|
||||||
There's a single .nrm file containing all norms:
|
|
||||||
</p>
|
</p>
|
||||||
<p>AllNorms
|
<p>AllNorms
|
||||||
(.nrm) --> NormsHeader,<Norms>
|
(.nrm) --> NormsHeader,<Norms>
|
||||||
|
@ -1745,13 +1682,7 @@
|
||||||
When field <em>N</em> is modified, a separate norm file <em>.sN</em>
|
When field <em>N</em> is modified, a separate norm file <em>.sN</em>
|
||||||
is created, to maintain the norm values for that field.
|
is created, to maintain the norm values for that field.
|
||||||
</p>
|
</p>
|
||||||
<p>
|
<p>Separate norm files are created (when adequate) for both compound and non compound segments.
|
||||||
<b>Pre-2.1:</b>
|
|
||||||
Separate norm files are created only for compound segments.
|
|
||||||
</p>
|
|
||||||
<p>
|
|
||||||
<b>2.1 and above:</b>
|
|
||||||
Separate norm files are created (when adequate) for both compound and non compound segments.
|
|
||||||
</p>
|
</p>
|
||||||
|
|
||||||
</section>
|
</section>
|
||||||
|
@ -1770,7 +1701,7 @@
|
||||||
<p>DocumentIndex (.tvx) --> TVXVersion<DocumentPosition,FieldPosition>
|
<p>DocumentIndex (.tvx) --> TVXVersion<DocumentPosition,FieldPosition>
|
||||||
<sup>NumDocs</sup>
|
<sup>NumDocs</sup>
|
||||||
</p>
|
</p>
|
||||||
<p>TVXVersion --> Int (3 (TermVectorsReader.FORMAT_VERSION2) for Lucene 2.4)</p>
|
<p>TVXVersion --> Int (TermVectorsReader.CURRENT)</p>
|
||||||
<p>DocumentPosition --> UInt64 (offset in
|
<p>DocumentPosition --> UInt64 (offset in
|
||||||
the .tvd file)</p>
|
the .tvd file)</p>
|
||||||
<p>FieldPosition --> UInt64 (offset in the
|
<p>FieldPosition --> UInt64 (offset in the
|
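Reviewer sketch for the .tvx grammar above: because every document contributes one fixed-size DocumentPosition/FieldPosition pair after the TVXVersion header, the entry for a document can be located by arithmetic alone. The sketch assumes Int occupies 4 bytes and UInt64 occupies 8 bytes, and that the document index is zero-based within this file. The class and method names are hypothetical.

```java
// Hypothetical helper, not part of Lucene.
final class TvxOffsetSketch {
    static final int VERSION_BYTES = 4; // TVXVersion --> Int (assumed 4 bytes)
    static final int ENTRY_BYTES = 16;  // DocumentPosition + FieldPosition, two UInt64 (assumed 8 bytes each)

    // Byte offset in the .tvx file of the entry for the given zero-based document index.
    static long entryOffset(int docIndex) {
        return VERSION_BYTES + (long) ENTRY_BYTES * docIndex;
    }

    public static void main(String[] args) {
        System.out.println(entryOffset(0));
        System.out.println(entryOffset(3));
    }
}
```

The cast to long keeps the multiplication from overflowing for large document indexes.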
||||||
|
@ -1785,7 +1716,7 @@
|
||||||
Document (.tvd) --> TVDVersion<NumFields, FieldNums, FieldPositions>
|
Document (.tvd) --> TVDVersion<NumFields, FieldNums, FieldPositions>
|
||||||
<sup>NumDocs</sup>
|
<sup>NumDocs</sup>
|
||||||
</p>
|
</p>
|
||||||
<p>TVDVersion --> Int (3 (TermVectorsReader.FORMAT_VERSION2) for Lucene 2.4)</p>
|
<p>TVDVersion --> Int (TermVectorsReader.FORMAT_CURRENT)</p>
|
||||||
<p>NumFields --> VInt</p>
|
<p>NumFields --> VInt</p>
|
||||||
<p>FieldNums --> <FieldNumDelta>
|
<p>FieldNums --> <FieldNumDelta>
|
||||||
<sup>NumFields</sup>
|
<sup>NumFields</sup>
|
||||||
|
@ -1805,7 +1736,7 @@
|
||||||
<p>Field (.tvf) --> TVFVersion<NumTerms, Position/Offset, TermFreqs>
|
<p>Field (.tvf) --> TVFVersion<NumTerms, Position/Offset, TermFreqs>
|
||||||
<sup>NumFields</sup>
|
<sup>NumFields</sup>
|
||||||
</p>
|
</p>
|
||||||
<p>TVFVersion --> Int (3 (TermVectorsReader.FORMAT_VERSION2) for Lucene 2.4)</p>
|
<p>TVFVersion --> Int (TermVectorsReader.FORMAT_CURRENT)</p>
|
||||||
<p>NumTerms --> VInt</p>
|
<p>NumTerms --> VInt</p>
|
||||||
<p>Position/Offset --> Byte</p>
|
<p>Position/Offset --> Byte</p>
|
||||||
<p>TermFreqs --> <TermText, TermFreq, Positions?, Offsets?>
|
<p>TermFreqs --> <TermText, TermFreq, Positions?, Offsets?>
|
||||||
|
@ -1845,15 +1776,7 @@
|
||||||
|
|
||||||
<p>Although per-segment, this file is maintained exterior to compound segment files.
|
<p>Although per-segment, this file is maintained exterior to compound segment files.
|
||||||
</p>
|
</p>
|
||||||
|
|
||||||
<p>
|
<p>
|
||||||
<b>Pre-2.1:</b>
|
|
||||||
Deletions
|
|
||||||
(.del) --> ByteCount,BitCount,Bits
|
|
||||||
</p>
|
|
||||||
|
|
||||||
<p>
|
|
||||||
<b>2.1 and above:</b>
|
|
||||||
Deletions
|
Deletions
|
||||||
(.del) --> [Format],ByteCount,BitCount, Bits | DGaps (depending on Format)
|
(.del) --> [Format],ByteCount,BitCount, Bits | DGaps (depending on Format)
|
||||||
</p>
|
</p>
|
||||||
|
|