mirror of https://github.com/apache/lucene.git
Improve Javadoc for Lucene90StoredFieldsFormat (#12984)
parent 20ea551b95
commit a9480316e2
@@ -37,21 +37,21 @@ import org.apache.lucene.util.packed.DirectMonotonicWriter;
  *
  * <p>This {@link StoredFieldsFormat} compresses blocks of documents in order to improve the
  * compression ratio compared to document-level compression. It uses the <a
- * href="http://code.google.com/p/lz4/">LZ4</a> compression algorithm by default in 16KB blocks,
- * which is fast to compress and very fast to decompress data. Although the default compression
- * method that is used ({@link Mode#BEST_SPEED BEST_SPEED}) focuses more on speed than on
- * compression ratio, it should provide interesting compression ratios for redundant inputs (such as
- * log files, HTML or plain text). For higher compression, you can choose ({@link
+ * href="http://code.google.com/p/lz4/">LZ4</a> compression algorithm by default in 8KB blocks and
+ * shared dictionaries, which is fast to compress and very fast to decompress data. Although the
+ * default compression method that is used ({@link Mode#BEST_SPEED BEST_SPEED}) focuses more on
+ * speed than on compression ratio, it should provide interesting compression ratios for redundant
+ * inputs (such as log files, HTML or plain text). For higher compression, you can choose ({@link
  * Mode#BEST_COMPRESSION BEST_COMPRESSION}), which uses the <a
- * href="http://en.wikipedia.org/wiki/DEFLATE">DEFLATE</a> algorithm with 48kB blocks and shared
+ * href="http://en.wikipedia.org/wiki/DEFLATE">DEFLATE</a> algorithm with 48KB blocks and shared
  * dictionaries for a better ratio at the expense of slower performance. These two options can be
  * configured like this:
  *
  * <pre class="prettyprint">
  * // the default: for high performance
- * indexWriterConfig.setCodec(new Lucene87Codec(Mode.BEST_SPEED));
+ * indexWriterConfig.setCodec(new Lucene99Codec(Mode.BEST_SPEED));
  * // instead for higher performance (but slower):
- * // indexWriterConfig.setCodec(new Lucene87Codec(Mode.BEST_COMPRESSION));
+ * // indexWriterConfig.setCodec(new Lucene99Codec(Mode.BEST_COMPRESSION));
  * </pre>
  *
  * <p><b>File formats</b>
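The updated Javadoc says both modes now pair small blocks with shared dictionaries. The JDK ships no LZ4, so the following sketch uses `java.util.zip.Deflater`/`Inflater` as a stand-in to illustrate why a preset dictionary helps when many small, redundant blocks are compressed independently. It is not Lucene's implementation: the class and method names, the 1KB block size, the log-like sample input, and the choice of the first block as the dictionary are all assumptions made for illustration.

```java
import java.io.ByteArrayOutputStream;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;
import java.util.zip.DataFormatException;
import java.util.zip.Deflater;
import java.util.zip.Inflater;

/** Hypothetical demo: independent small blocks compressed with and without a shared dictionary. */
public class SharedDictionaryDemo {

  /** DEFLATE-compresses one block, optionally primed with a preset (shared) dictionary. */
  static byte[] compress(byte[] block, byte[] dict) {
    Deflater deflater = new Deflater(Deflater.BEST_COMPRESSION);
    try {
      if (dict != null) {
        deflater.setDictionary(dict); // set before any output is produced
      }
      deflater.setInput(block);
      deflater.finish();
      ByteArrayOutputStream out = new ByteArrayOutputStream();
      byte[] buf = new byte[4096];
      while (!deflater.finished()) {
        int n = deflater.deflate(buf);
        out.write(buf, 0, n);
      }
      return out.toByteArray();
    } finally {
      deflater.end();
    }
  }

  /** Inflates a block whose original length is known up front (e.g. from chunk metadata). */
  static byte[] decompress(byte[] compressed, byte[] dict, int originalLength)
      throws DataFormatException {
    Inflater inflater = new Inflater();
    try {
      inflater.setInput(compressed);
      byte[] result = new byte[originalLength];
      int off = 0;
      while (off < originalLength) {
        int n = inflater.inflate(result, off, originalLength - off);
        if (n == 0) {
          if (inflater.needsDictionary() && dict != null) {
            inflater.setDictionary(dict); // stream was written with a preset dictionary
          } else {
            throw new DataFormatException("stream ended before " + originalLength + " bytes");
          }
        }
        off += n;
      }
      return result;
    } finally {
      inflater.end();
    }
  }

  /** Builds redundant, log-like ASCII input and cuts it into fixed-size blocks. */
  static byte[][] logBlocks(int blockCount, int blockSize) {
    StringBuilder sb = new StringBuilder();
    for (int i = 0; sb.length() < blockCount * blockSize; i++) {
      sb.append("2024-01-").append(10 + i % 20).append(" INFO [IndexWriter] flushed segment _")
          .append(i).append(" with ").append(100 + (i * 37) % 900).append(" docs\n");
    }
    byte[] all = sb.toString().getBytes(StandardCharsets.US_ASCII);
    byte[][] blocks = new byte[blockCount][];
    for (int b = 0; b < blockCount; b++) {
      blocks[b] = Arrays.copyOfRange(all, b * blockSize, (b + 1) * blockSize);
    }
    return blocks;
  }

  /** Returns {total bytes without dictionary, total bytes with dictionary} for blocks 1..n-1. */
  static int[] compressedTotals(byte[][] blocks) {
    byte[] dict = blocks[0]; // assumption: the first block doubles as the shared dictionary
    int plain = 0;
    int withDict = 0;
    for (int b = 1; b < blocks.length; b++) {
      plain += compress(blocks[b], null).length;
      byte[] c = compress(blocks[b], dict);
      withDict += c.length;
      try {
        if (!Arrays.equals(blocks[b], decompress(c, dict, blocks[b].length))) {
          throw new IllegalStateException("round trip failed for block " + b);
        }
      } catch (DataFormatException e) {
        throw new IllegalStateException(e);
      }
    }
    return new int[] {plain, withDict};
  }

  public static void main(String[] args) {
    int[] totals = compressedTotals(logBlocks(16, 1024));
    System.out.println("without dictionary: " + totals[0] + " bytes, with: " + totals[1] + " bytes");
  }
}
```

Each block stays independently decodable (it only needs the shared dictionary, not its neighbours), which is what keeps random access to a single document cheap while recovering much of the ratio that tiny blocks would otherwise lose.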
@@ -61,9 +61,9 @@ import org.apache.lucene.util.packed.DirectMonotonicWriter;
  * <ol>
  * <li><a id="field_data"></a>
  * <p>A fields data file (extension <code>.fdt</code>). This file stores a compact
- * representation of documents in compressed blocks of 16KB or more. When writing a segment,
+ * representation of documents in compressed blocks of 8KB or more. When writing a segment,
  * documents are appended to an in-memory <code>byte[]</code> buffer. When its size reaches
- * 16KB or more, some metadata about the documents is flushed to disk, immediately followed by
+ * 80KB or more, some metadata about the documents is flushed to disk, immediately followed by
  * a compressed representation of the buffer using the <a
  * href="https://github.com/lz4/lz4">LZ4</a> <a
  * href="http://fastcompression.blogspot.fr/2011/05/lz4-explained.html">compression
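The `.fdt` description in this hunk (documents appended to an in-memory buffer, with metadata and the buffered data flushed once a size threshold is reached) can be sketched with plain JDK types. This is a hypothetical illustration, not Lucene's writer: `ChunkBufferSketch`, `Chunk`, and `addDocument` are invented names, the payload is kept uncompressed where the real format would apply LZ4, and the 80KB threshold is simply taken from the updated Javadoc.

```java
import java.io.ByteArrayOutputStream;
import java.util.ArrayList;
import java.util.List;

/** Hypothetical sketch: buffer serialized documents and cut a chunk once a byte threshold is hit. */
public class ChunkBufferSketch {

  /** One flushed chunk: per-document lengths (the metadata) plus the buffered bytes. */
  static final class Chunk {
    final int[] docLengths;
    final byte[] data; // uncompressed here; a real writer would store a compressed form

    Chunk(int[] docLengths, byte[] data) {
      this.docLengths = docLengths;
      this.data = data;
    }
  }

  private final int flushThresholdBytes;
  private final ByteArrayOutputStream buffer = new ByteArrayOutputStream();
  private final List<Integer> pendingLengths = new ArrayList<>();
  final List<Chunk> chunks = new ArrayList<>();

  ChunkBufferSketch(int flushThresholdBytes) {
    this.flushThresholdBytes = flushThresholdBytes;
  }

  /** Appends a serialized document and flushes once the buffer reaches the threshold. */
  void addDocument(byte[] doc) {
    buffer.write(doc, 0, doc.length);
    pendingLengths.add(doc.length);
    if (buffer.size() >= flushThresholdBytes) {
      flush();
    }
  }

  /** Flushes any remaining documents, e.g. when the segment is finished. */
  void finish() {
    if (!pendingLengths.isEmpty()) {
      flush();
    }
  }

  private void flush() {
    int[] lengths = pendingLengths.stream().mapToInt(Integer::intValue).toArray();
    // Keep the per-document lengths next to the payload; compression would happen here.
    chunks.add(new Chunk(lengths, buffer.toByteArray()));
    buffer.reset();
    pendingLengths.clear();
  }

  public static void main(String[] args) {
    ChunkBufferSketch writer = new ChunkBufferSketch(80 * 1024); // 80KB, per the updated Javadoc
    for (int i = 0; i < 5; i++) {
      writer.addDocument(new byte[30 * 1024]); // five 30KB documents
    }
    writer.finish();
    for (Chunk c : writer.chunks) {
      System.out.println(c.docLengths.length + " docs, " + c.data.length + " bytes");
    }
  }
}
```

Running `main` prints `3 docs, 92160 bytes` and then `2 docs, 61440 bytes`: the third 30KB document pushes the buffer past 80KB and triggers a flush, and `finish()` writes the two leftover documents. Because the threshold is checked after appending, a chunk can exceed it by up to one document, which is why the Javadoc speaks of blocks of a given size "or more".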
@@ -71,10 +71,10 @@ import org.apache.lucene.util.packed.DirectMonotonicWriter;
  * <p>Notes
  * <ul>
  * <li>When at least one document in a chunk is large enough so that the chunk is larger
- * than 32KB, the chunk will actually be compressed in several LZ4 blocks of 16KB. This
+ * than 80KB, the chunk will actually be compressed in several LZ4 blocks of 8KB. This
  * allows {@link StoredFieldVisitor}s which are only interested in the first fields of a
  * document to not have to decompress 10MB of data if the document is 10MB, but only
- * 16KB.
+ * 8-16KB(may cross the block).
  * <li>Given that the original lengths are written in the metadata of the chunk, the
  * decompressor can leverage this information to stop decoding as soon as enough data
  * has been decompressed.
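The two notes in this hunk rest on simple offset arithmetic and on the decompressor knowing how many bytes it needs. The following is a minimal sketch, assuming fixed-size 8KB sub-blocks and using `java.util.zip.Inflater` in place of LZ4. `blocksToDecode` shows which blocks overlap a requested byte range: a range that starts near the end of one block spills into the next, which is one plausible reading of the "8-16KB (may cross the block)" wording. `inflatePrefix` stops once the requested number of bytes has been produced. All class and method names are hypothetical.

```java
import java.io.ByteArrayOutputStream;
import java.util.Arrays;
import java.util.zip.DataFormatException;
import java.util.zip.Deflater;
import java.util.zip.Inflater;

/** Hypothetical sketch of partial decoding of a large chunk. */
public class PartialDecodeSketch {

  /** Inclusive [first, last] indexes of blocks overlapping [start, start + length); length > 0. */
  static long[] blocksToDecode(long start, long length, int blockSize) {
    long first = start / blockSize;
    long last = (start + length - 1) / blockSize;
    return new long[] {first, last};
  }

  /** Inflates only the first prefixLength bytes of a zlib/DEFLATE stream, then stops. */
  static byte[] inflatePrefix(byte[] compressed, int prefixLength) throws DataFormatException {
    Inflater inflater = new Inflater();
    try {
      inflater.setInput(compressed);
      byte[] out = new byte[prefixLength];
      int off = 0;
      while (off < prefixLength) {
        int n = inflater.inflate(out, off, prefixLength - off);
        if (n == 0) {
          throw new DataFormatException("stream shorter than " + prefixLength + " bytes");
        }
        off += n;
      }
      return out; // the remainder of the stream is never decoded
    } finally {
      inflater.end();
    }
  }

  /** Compresses a whole buffer; used here only to produce test input. */
  static byte[] deflate(byte[] data) {
    Deflater deflater = new Deflater();
    try {
      deflater.setInput(data);
      deflater.finish();
      ByteArrayOutputStream out = new ByteArrayOutputStream();
      byte[] buf = new byte[8192];
      while (!deflater.finished()) {
        int n = deflater.deflate(buf);
        out.write(buf, 0, n);
      }
      return out.toByteArray();
    } finally {
      deflater.end();
    }
  }

  public static void main(String[] args) throws DataFormatException {
    int blockSize = 8 * 1024;
    // Bytes [8000, 9000) straddle a block boundary: blocks 0 and 1, i.e. 16KB to decode.
    System.out.println(Arrays.toString(blocksToDecode(8000, 1000, blockSize)));
    // The first 200 bytes of a document starting at offset 0 need only block 0,
    // however large the document is.
    System.out.println(Arrays.toString(blocksToDecode(0, 200, blockSize)));

    byte[] doc = new byte[1 << 20]; // 1MB of synthetic data
    for (int i = 0; i < doc.length; i++) {
      doc[i] = (byte) (i % 251);
    }
    byte[] prefix = inflatePrefix(deflate(doc), 200);
    System.out.println(Arrays.equals(prefix, Arrays.copyOf(doc, 200)));
  }
}
```

`main` prints `[0, 1]`, `[0, 0]` and `true`. The first two lines show why the amount to decompress is bounded by one or two block sizes rather than by the document size; the last shows that, with the target length known in advance, decoding can stop after the prefix.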