LUCENE-2048: regen site

git-svn-id: https://svn.apache.org/repos/asf/lucene/dev/trunk@1145655 13f79535-47bb-0310-9956-ffa450edef68
Robert Muir 2011-07-12 16:21:19 +00:00
parent a2cf98a7bb
commit f9928d27a7
5 changed files with 55 additions and 48 deletions


@ -412,10 +412,14 @@ document.write("Last Published: " + document.lastModified);
to stored fields file, previously they were stored in
text format only.
</p>
<p>
In version 3.4, fields can omit position data while
still indexing term frequencies.
</p>
</div>
<a name="N1003A"></a><a name="Definitions"></a>
<a name="N1003D"></a><a name="Definitions"></a>
<h2 class="boxed">Definitions</h2>
<div class="section">
<p>
@ -456,7 +460,7 @@ document.write("Last Published: " + document.lastModified);
strings, the first naming the field, and the second naming text
within the field.
</p>
<a name="N1005A"></a><a name="Inverted Indexing"></a>
<a name="N1005D"></a><a name="Inverted Indexing"></a>
<h3 class="boxed">Inverted Indexing</h3>
<p>
The index stores statistics about terms in order
@ -466,7 +470,7 @@ document.write("Last Published: " + document.lastModified);
it. This is the inverse of the natural relationship, in which
documents list terms.
</p>
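<p>
As a rough illustration of the inverted relationship described above, the following sketch maps each term to the numbers of the documents that contain it. The class name, the whitespace tokenization, and the in-memory maps are assumptions made for this example; they are not how Lucene stores its index.
</p>

```java
import java.util.Arrays;
import java.util.List;
import java.util.SortedMap;
import java.util.SortedSet;
import java.util.TreeMap;
import java.util.TreeSet;

// Hypothetical sketch: a toy in-memory inverted index.
class InvertedIndexSketch {
    // Natural relationship: each document lists its terms.
    // Inverted relationship: each term lists the documents containing it.
    static SortedMap<String, SortedSet<Integer>> invert(List<String> docs) {
        SortedMap<String, SortedSet<Integer>> postings = new TreeMap<>();
        for (int docId = 0; docId < docs.size(); docId++) {
            // Assumption: terms are separated by whitespace only.
            for (String term : docs.get(docId).split("\\s+")) {
                if (term.isEmpty()) {
                    continue;
                }
                postings.computeIfAbsent(term, t -> new TreeSet<>()).add(docId);
            }
        }
        return postings;
    }

    public static void main(String[] args) {
        List<String> docs = Arrays.asList("apache lucene", "lucene index", "index files");
        // Each term maps to the sorted set of document numbers that contain it.
        System.out.println(invert(docs));
    }
}
```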
<a name="N10066"></a><a name="Types of Fields"></a>
<a name="N10069"></a><a name="Types of Fields"></a>
<h3 class="boxed">Types of Fields</h3>
<p>
In Lucene, fields may be <i>stored</i>, in which
@ -480,7 +484,7 @@ document.write("Last Published: " + document.lastModified);
to be indexed literally.
</p>
<p>See the <a href="api/core/org/apache/lucene/document/Field.html">Field</a> java docs for more information on Fields.</p>
<a name="N10083"></a><a name="Segments"></a>
<a name="N10086"></a><a name="Segments"></a>
<h3 class="boxed">Segments</h3>
<p>
Lucene indexes may be composed of multiple sub-indexes, or
@ -506,7 +510,7 @@ document.write("Last Published: " + document.lastModified);
Searches may involve multiple segments and/or multiple indexes, each
index potentially composed of a set of segments.
</p>
<a name="N100A1"></a><a name="Document Numbers"></a>
<a name="N100A4"></a><a name="Document Numbers"></a>
<h3 class="boxed">Document Numbers</h3>
<p>
Internally, Lucene refers to documents by an integer <i>document
@ -561,7 +565,7 @@ document.write("Last Published: " + document.lastModified);
</div>
<a name="N100C8"></a><a name="Overview"></a>
<a name="N100CB"></a><a name="Overview"></a>
<h2 class="boxed">Overview</h2>
<div class="section">
<p>
@ -608,7 +612,7 @@ document.write("Last Published: " + document.lastModified);
<p>Term Frequency
data. For each term in the dictionary, the numbers of all the
documents that contain that term, and the frequency of the term in
that document if omitTf is false.
that document, unless frequencies are omitted (IndexOptions.DOCS_ONLY).
</p>
</li>
@ -619,8 +623,7 @@ document.write("Last Published: " + document.lastModified);
<p>Term Proximity
data. For each term in the dictionary, the positions that the term
occurs in each document. Note that this will
not exist if all fields in all documents set
omitTf to true.
not exist if all fields in all documents omit position data.
</p>
</li>
@ -660,7 +663,7 @@ document.write("Last Published: " + document.lastModified);
</div>
<a name="N1010B"></a><a name="File Naming"></a>
<a name="N1010E"></a><a name="File Naming"></a>
<h2 class="boxed">File Naming</h2>
<div class="section">
<p>
@ -687,7 +690,7 @@ document.write("Last Published: " + document.lastModified);
</p>
</div>
<a name="N1011A"></a><a name="file-names"></a>
<a name="N1011D"></a><a name="file-names"></a>
<h2 class="boxed">Summary of File Extensions</h2>
<div class="section">
<p>The following table summarizes the names and extensions of the files in Lucene:
@ -837,10 +840,10 @@ document.write("Last Published: " + document.lastModified);
</div>
<a name="N10212"></a><a name="Primitive Types"></a>
<a name="N10215"></a><a name="Primitive Types"></a>
<h2 class="boxed">Primitive Types</h2>
<div class="section">
<a name="N10217"></a><a name="Byte"></a>
<a name="N1021A"></a><a name="Byte"></a>
<h3 class="boxed">Byte</h3>
<p>
The most primitive type
@ -848,7 +851,7 @@ document.write("Last Published: " + document.lastModified);
other data types are defined as sequences
of bytes, so file formats are byte-order independent.
</p>
<a name="N10220"></a><a name="UInt32"></a>
<a name="N10223"></a><a name="UInt32"></a>
<h3 class="boxed">UInt32</h3>
<p>
32-bit unsigned integers are written as four
@ -858,7 +861,7 @@ document.write("Last Published: " + document.lastModified);
UInt32 --&gt; &lt;Byte&gt;<sup>4</sup>
</p>
<a name="N1022F"></a><a name="Uint64"></a>
<a name="N10232"></a><a name="Uint64"></a>
<h3 class="boxed">UInt64</h3>
<p>
64-bit unsigned integers are written as eight
@ -867,7 +870,7 @@ document.write("Last Published: " + document.lastModified);
<p>UInt64 --&gt; &lt;Byte&gt;<sup>8</sup>
</p>
<a name="N1023E"></a><a name="VInt"></a>
<a name="N10241"></a><a name="VInt"></a>
<h3 class="boxed">VInt</h3>
<p>
A variable-length format for positive integers is
@ -1417,13 +1420,13 @@ document.write("Last Published: " + document.lastModified);
This provides compression while still being
efficient to decode.
</p>
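<p>
A minimal sketch of a VInt-style codec follows. It assumes the usual definition, which this excerpt does not show in full: each byte carries seven payload bits, low-order group first, and the high bit of a byte is set when more bytes follow. The class and method names are hypothetical and are not Lucene's API.
</p>

```java
import java.io.ByteArrayOutputStream;

// Hypothetical sketch of a VInt-style variable-length integer codec.
class VIntSketch {
    // Encodes a non-negative int using seven payload bits per byte.
    static byte[] encode(int value) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        while ((value & ~0x7F) != 0) {
            // More bytes follow: emit the low seven bits with the high bit set.
            out.write((value & 0x7F) | 0x80);
            value >>>= 7;
        }
        out.write(value); // Last byte: high bit clear.
        return out.toByteArray();
    }

    // Decodes one value from the start of the byte array.
    static int decode(byte[] bytes) {
        int value = 0;
        int shift = 0;
        for (byte b : bytes) {
            value |= (b & 0x7F) << shift;
            if ((b & 0x80) == 0) {
                break; // High bit clear marks the final byte.
            }
            shift += 7;
        }
        return value;
    }

    public static void main(String[] args) {
        for (int v : new int[] {0, 127, 128, 16384}) {
            byte[] encoded = encode(v);
            System.out.println(v + " -> " + encoded.length + " byte(s), decodes to " + decode(encoded));
        }
    }
}
```

<p>
Small values, which dominate when gaps between document numbers are stored, take a single byte under this scheme.
</p>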
<a name="N10523"></a><a name="Chars"></a>
<a name="N10526"></a><a name="Chars"></a>
<h3 class="boxed">Chars</h3>
<p>
Lucene writes Unicode
character sequences as UTF-8 encoded bytes.
</p>
<a name="N1052C"></a><a name="String"></a>
<a name="N1052F"></a><a name="String"></a>
<h3 class="boxed">String</h3>
<p>
Lucene writes strings as UTF-8 encoded bytes.
@ -1436,10 +1439,10 @@ document.write("Last Published: " + document.lastModified);
</div>
<a name="N10539"></a><a name="Compound Types"></a>
<a name="N1053C"></a><a name="Compound Types"></a>
<h2 class="boxed">Compound Types</h2>
<div class="section">
<a name="N1053E"></a><a name="MapStringString"></a>
<a name="N10541"></a><a name="MapStringString"></a>
<h3 class="boxed">Map&lt;String,String&gt;</h3>
<p>
In a couple of places Lucene stores a Map
@ -1452,13 +1455,13 @@ document.write("Last Published: " + document.lastModified);
</div>
<a name="N1054E"></a><a name="Per-Index Files"></a>
<a name="N10551"></a><a name="Per-Index Files"></a>
<h2 class="boxed">Per-Index Files</h2>
<div class="section">
<p>
The files in this section exist one-per-index.
</p>
<a name="N10556"></a><a name="Segments File"></a>
<a name="N10559"></a><a name="Segments File"></a>
<h3 class="boxed">Segments File</h3>
<p>
The active segments in the index are stored in the
@ -1613,7 +1616,7 @@ document.write("Last Published: " + document.lastModified);
</p>
<p>
HasProx is 1 if any fields in this segment have
omitTf set to false; else, it's 0.
position data (IndexOptions.DOCS_AND_FREQS_AND_POSITIONS); else, it's 0.
</p>
<p>
CommitUserData stores an optional user-supplied
@ -1631,7 +1634,7 @@ document.write("Last Published: " + document.lastModified);
<p> HasVectors is 1 if this segment stores term vectors,
else it's 0.
</p>
<a name="N105E1"></a><a name="Lock File"></a>
<a name="N105E4"></a><a name="Lock File"></a>
<h3 class="boxed">Lock File</h3>
<p>
The write lock, which is stored in the index
@ -1645,14 +1648,14 @@ document.write("Last Published: " + document.lastModified);
documents). This lock file ensures that only one
writer is modifying the index at a time.
</p>
<a name="N105EA"></a><a name="Deletable File"></a>
<a name="N105ED"></a><a name="Deletable File"></a>
<h3 class="boxed">Deletable File</h3>
<p>
A writer dynamically computes
the deletable files instead, so no file
is written.
</p>
<a name="N105F3"></a><a name="Compound Files"></a>
<a name="N105F6"></a><a name="Compound Files"></a>
<h3 class="boxed">Compound Files</h3>
<p>Starting with Lucene 1.4, the compound file format became the default. This
is simply a container for all files described in the next section
@ -1681,14 +1684,14 @@ document.write("Last Published: " + document.lastModified);
</div>
<a name="N10624"></a><a name="Per-Segment Files"></a>
<a name="N10627"></a><a name="Per-Segment Files"></a>
<h2 class="boxed">Per-Segment Files</h2>
<div class="section">
<p>
The remaining files are all per-segment, and are
thus defined by suffix.
</p>
<a name="N1062C"></a><a name="Fields"></a>
<a name="N1062F"></a><a name="Fields"></a>
<h3 class="boxed">Fields</h3>
<p>
@ -1742,11 +1745,15 @@ document.write("Last Published: " + document.lastModified);
<li>If the sixth lowest-order bit is set (0x20), payloads are stored for the indexed field.</li>
<li>If the seventh lowest-order bit is set (0x40), term frequencies and positions are omitted for the indexed field.</li>
<li>If the eighth lowest-order bit is set (0x80), positions are omitted for the indexed field.</li>
</ul>
</p>
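<p>
The three flag bits listed above could be decoded along the following lines. The constant and method names are hypothetical. Mapping the 0x80 bit to IndexOptions.DOCS_AND_FREQS, and checking 0x40 first when both bits are set, are assumptions of this sketch rather than something stated here.
</p>

```java
// Hypothetical sketch: interpreting the high three bits of a field's flag byte.
class FieldBitsSketch {
    static final int STORE_PAYLOADS = 0x20;        // sixth lowest-order bit
    static final int OMIT_TF_AND_POSITIONS = 0x40; // seventh lowest-order bit
    static final int OMIT_POSITIONS = 0x80;        // eighth lowest-order bit

    static boolean storesPayloads(int fieldBits) {
        return (fieldBits & 0xFF & STORE_PAYLOADS) != 0;
    }

    // Returns the name of the index option implied by the flag bits.
    static String indexOptions(int fieldBits) {
        int bits = fieldBits & 0xFF; // treat the flag byte as unsigned
        if ((bits & OMIT_TF_AND_POSITIONS) != 0) {
            return "DOCS_ONLY";
        } else if ((bits & OMIT_POSITIONS) != 0) {
            return "DOCS_AND_FREQS"; // assumed mapping for the 0x80 bit
        }
        return "DOCS_AND_FREQS_AND_POSITIONS";
    }

    public static void main(String[] args) {
        for (int bits : new int[] {0x00, 0x20, 0x40, 0x80}) {
            System.out.println("0x" + Integer.toHexString(bits)
                + ": payloads=" + storesPayloads(bits)
                + ", options=" + indexOptions(bits));
        }
    }
}
```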
<p>
FNMVersion (added in 2.9) is always -2.
FNMVersion (added in 2.9) is -2 for indexes from 2.9 through 3.3. It is -3 for indexes written by Lucene 3.4 and later.
</p>
<p>
Fields are numbered by their order in this file. Thus field zero is
@ -1898,7 +1905,7 @@ document.write("Last Published: " + document.lastModified);
</li>
</ol>
<a name="N106E7"></a><a name="Term Dictionary"></a>
<a name="N106F0"></a><a name="Term Dictionary"></a>
<h3 class="boxed">Term Dictionary</h3>
<p>
The term dictionary is represented as two files:
@ -2002,7 +2009,7 @@ document.write("Last Published: " + document.lastModified);
file. In particular, it is the difference between the position of
this term's data in that file and the position of the previous
term's data (or zero, for the first term in the file). For fields
with omitTf true, this will be 0 since
that omit position data, this will be 0 since
prox information is not stored.
</p>
@ -2090,12 +2097,12 @@ document.write("Last Published: " + document.lastModified);
</li>
</ol>
<a name="N1076B"></a><a name="Frequencies"></a>
<a name="N10774"></a><a name="Frequencies"></a>
<h3 class="boxed">Frequencies</h3>
<p>
The .frq file contains the lists of documents
which contain each term, along with the frequency of the term in that
document (if omitTf is false).
document (except when frequencies are omitted: IndexOptions.DOCS_ONLY).
</p>
<p>FreqFile (.frq) --&gt;
&lt;TermFreqs, SkipData&gt;
@ -2135,26 +2142,26 @@ document.write("Last Published: " + document.lastModified);
<p>TermFreq
entries are ordered by increasing document number.
</p>
<p>DocDelta: if omitTf is false, this determines both
<p>DocDelta: if frequencies are indexed, this determines both
the document number and the frequency. In
particular, DocDelta/2 is the difference between
this document number and the previous document
number (or zero when this is the first document in
a TermFreqs). When DocDelta is odd, the frequency
is one. When DocDelta is even, the frequency is
read as another VInt. If omitTf is true, DocDelta
read as another VInt. If frequencies are omitted, DocDelta
contains the gap (not multiplied by 2) between
document numbers and no frequency information is
stored.
</p>
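<p>
The DocDelta rule above can be sketched as follows. The class and method names are hypothetical, and the method returns the integer values that would each be written as a VInt rather than the encoded bytes.
</p>

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch of the DocDelta / Freq rule for one term's postings.
class DocDeltaSketch {
    // docs: ascending document numbers; freqs: term frequency in each document.
    // withFreqs: false corresponds to frequencies being omitted.
    static List<Integer> encode(int[] docs, int[] freqs, boolean withFreqs) {
        List<Integer> out = new ArrayList<>();
        int previousDoc = 0; // the first gap is measured from zero
        for (int i = 0; i < docs.length; i++) {
            int gap = docs[i] - previousDoc;
            previousDoc = docs[i];
            if (!withFreqs) {
                out.add(gap);         // plain gap, no frequency stored
            } else if (freqs[i] == 1) {
                out.add(gap * 2 + 1); // odd DocDelta: frequency is one
            } else {
                out.add(gap * 2);     // even DocDelta: frequency follows
                out.add(freqs[i]);
            }
        }
        return out;
    }

    public static void main(String[] args) {
        int[] docs = {7, 11};
        int[] freqs = {1, 3};
        System.out.println(encode(docs, freqs, true));  // prints [15, 8, 3]
        System.out.println(encode(docs, freqs, false)); // gaps only
    }
}
```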
<p>For example, the TermFreqs for a term which occurs
once in document seven and three times in document
eleven, with omitTf false, would be the following
eleven, with frequencies indexed, would be the following
sequence of VInts:
</p>
<p>15, 8, 3
</p>
<p> If omitTf were true it would be this sequence
<p> If frequencies were omitted (IndexOptions.DOCS_ONLY) it would be this sequence
of VInts instead:
</p>
<p>
@ -2218,14 +2225,14 @@ document.write("Last Published: " + document.lastModified);
entry in level-1. In the example, entry 15 on level 1 has a pointer to entry 15 on level 0, and entry 31 on level 1 has a pointer
to entry 31 on level 0.
</p>
<a name="N107F3"></a><a name="Positions"></a>
<a name="N107FC"></a><a name="Positions"></a>
<h3 class="boxed">Positions</h3>
<p>
The .prx file contains the lists of positions that
each term occurs at within documents. Note that
fields with omitTf true do not store
fields omitting positional data do not store
anything into this file, and if all fields in the
index have omitTf true then the .prx file will not
index omit positional data then the .prx file will not
exist.
</p>
<p>ProxFile (.prx) --&gt;
@ -2288,7 +2295,7 @@ document.write("Last Published: " + document.lastModified);
Payload. If PayloadLength is not stored, then this Payload has the same
length as the Payload at the previous position.
</p>
<a name="N1082F"></a><a name="Normalization Factors"></a>
<a name="N10838"></a><a name="Normalization Factors"></a>
<h3 class="boxed">Normalization Factors</h3>
<p>There's a single .nrm file containing all norms:
</p>
@ -2368,7 +2375,7 @@ document.write("Last Published: " + document.lastModified);
</p>
<p>Separate norm files are created (when adequate) for both compound and non-compound segments.
</p>
<a name="N10880"></a><a name="Term Vectors"></a>
<a name="N10889"></a><a name="Term Vectors"></a>
<h3 class="boxed">Term Vectors</h3>
<p>
Term Vector support is optional on a field by
@ -2504,7 +2511,7 @@ document.write("Last Published: " + document.lastModified);
</li>
</ol>
<a name="N1091C"></a><a name="Deleted Documents"></a>
<a name="N10925"></a><a name="Deleted Documents"></a>
<h3 class="boxed">Deleted Documents</h3>
<p>The .del file is
optional, and only exists when a segment contains deletions.
@ -2568,7 +2575,7 @@ document.write("Last Published: " + document.lastModified);
</div>
<a name="N10956"></a><a name="Limitations"></a>
<a name="N1095F"></a><a name="Limitations"></a>
<h2 class="boxed">Limitations</h2>
<div class="section">
<p>