mirror of https://github.com/apache/lucene.git
javadocs
git-svn-id: https://svn.apache.org/repos/asf/lucene/dev/trunk@1332797 13f79535-47bb-0310-9956-ffa450edef68
parent 9554a045e9
commit 22f2879134
@@ -51,15 +51,23 @@ import org.apache.lucene.util.fst.FST; // javadocs
  * <p>
  * <a name="Termdictionary" id="Termdictionary"></a>
  * <h3>Term Dictionary</h3>
- * <p>The .tim file contains the list of terms in each field, in UTF-8 order,
- * along with per-term statistics (such as docfreq) and pointers to the frequencies,
- * positions, and skip data in the .frq and .prx files.
+ *
+ * <p>The .tim file contains the list of terms in each
+ * field along with per-term statistics (such as docfreq)
+ * and pointers to the frequencies, positions, payloads and
+ * skip data in the .frq and .prx files.
  * </p>
- * <p>The .tim is arranged in blocks: with blocks containing either terms or
- * sub-blocks.</p>
+ *
+ * <p>The .tim is arranged in blocks: with blocks containing
+ * a variable number of entries (by default 25-48), where
+ * each entry is either a term or a reference to a
+ * sub-block. It's written by {@link BlockTreeTermsWriter}
+ * and read by {@link BlockTreeTermsReader}.</p>
+ *
  * <p>NOTE: The term dictionary can plug into different postings implementations:
  * for example the postings writer/reader are actually responsible for encoding
  * and decoding the MetadataBlock.</p>
+ *
  * <ul>
  * <!-- TODO: expand on this, its not really correct and doesnt explain sub-blocks etc -->
  * <li>TermsDict (.tim) --> Header, DirOffset, PostingsHeader, SkipInterval,
@@ -122,7 +130,8 @@ import org.apache.lucene.util.fst.FST; // javadocs
  * <a name="Termindex" id="Termindex"></a>
  * <h3>Term Index</h3>
  * <p>The .tip file contains an index into the term dictionary, so that it can be
- * accessed randomly.</p>
+ * accessed randomly. The index is also used to determine
+ * when a given term cannot exist on disk (in the .tim file), saving a disk seek.</p>
  * <ul>
  * <li>TermsIndex (.tip) --> Header, <IndexStartFP><sup>NumFields</sup>,
  * FSTIndex<sup>NumFields</sup></li>
@@ -133,8 +142,18 @@ import org.apache.lucene.util.fst.FST; // javadocs
  * </ul>
  * <p>Notes:</p>
  * <ul>
- * <li>The .tip file contains a separate FST for each field. Each field's IndexStartFP points
- * to its FST.</li>
+ * <li>The .tip file contains a separate FST for each
+ * field. The FST maps a term prefix to the on-disk
+ * block that holds all terms starting with that
+ * prefix. Each field's IndexStartFP points to its
+ * FST.</li>
+ * <li>It's possible that an on-disk block would contain
+ * too many terms (more than the allowed maximum
+ * (default: 48)). When this happens, the block is
+ * sub-divided into new blocks (called "floor
+ * blocks"), and then the output in the FST for the
+ * block's prefix encodes the leading byte of each
+ * sub-block, and its file pointer.
  * </ul>
  * <a name="Frequencies" id="Frequencies"></a>
  * <h3>Frequencies</h3>
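Note on the javadoc added above: it says the per-field FST maps a term prefix to the on-disk block holding all terms with that prefix, and that an oversized block is split into floor blocks whose leading bytes and file pointers are stored in the FST output. The Java sketch below illustrates that lookup in a toy model. It is not Lucene's implementation. The class PrefixIndexSketch, its methods, the HashMap standing in for the FST, the file-pointer values, and the rule "pick the last floor block whose leading byte is not greater than the next byte of the term" are all assumptions made for illustration.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.OptionalLong;

/** Toy stand-in for a per-field term index (hypothetical, not a Lucene class). */
final class PrefixIndexSketch {

    /** One floor block: the leading byte of its range and its (made-up) file pointer. */
    static final class Floor {
        final int leadByte;
        final long fp;

        Floor(int leadByte, long fp) {
            this.leadByte = leadByte;
            this.fp = fp;
        }
    }

    // prefix -> floor blocks in ascending leadByte order.
    // A block that was never split is stored as a single-element list.
    private final Map<String, List<Floor>> index = new HashMap<>();

    /** Registers an unsplit block for the given prefix. */
    void addBlock(String prefix, long fp) {
        index.put(prefix, Collections.singletonList(new Floor(0, fp)));
    }

    /** Registers a block that was split into floor blocks (pass them in ascending order). */
    void addFloorBlocks(String prefix, Floor... floors) {
        index.put(prefix, new ArrayList<>(Arrays.asList(floors)));
    }

    /**
     * Returns the file pointer of the block that could hold the term, or empty when
     * no indexed prefix matches, in which case no block needs to be read at all.
     * Terms are assumed to be ASCII so that one char stands for one byte.
     */
    OptionalLong locate(String term) {
        // Try the longest prefix first, the way an FST walk ends at the deepest match.
        for (int len = term.length(); len >= 0; len--) {
            List<Floor> floors = index.get(term.substring(0, len));
            if (floors == null) {
                continue;
            }
            int next = len < term.length() ? term.charAt(len) : 0;
            long fp = floors.get(0).fp; // the first floor block covers the lowest range
            for (Floor f : floors) {
                if (f.leadByte <= next) {
                    fp = f.fp; // assumed rule: last floor whose lead byte <= next byte
                }
            }
            return OptionalLong.of(fp);
        }
        return OptionalLong.empty();
    }

    public static void main(String[] args) {
        PrefixIndexSketch idx = new PrefixIndexSketch();
        idx.addBlock("ab", 100L);
        idx.addFloorBlocks("ca", new Floor('a', 200L), new Floor('m', 300L));

        System.out.println(idx.locate("abc").isPresent()); // prefix "ab" is indexed
        System.out.println(idx.locate("cat").getAsLong()); // falls in the 'm' floor block
        System.out.println(idx.locate("xyz").isPresent()); // no prefix: skip the disk read
    }
}
```

In this model the "term cannot exist" case is simply a lookup that finds no indexed prefix. The real reader walks an FST and applies further checks, so treat the sketch only as a reading aid for the javadoc text.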