[DOCS] Documented index.codec.bloom.load for #4525

Clinton Gormley 2013-12-20 10:46:43 +01:00
parent 51dc057244
commit 2b8c82c883
2 changed files with 49 additions and 21 deletions

View File

@ -40,7 +40,7 @@ curl -XPUT 'http://localhost:9200/twitter/' -d '{
"my_format" : {
"type" : "pulsing",
"freq_cut_off" : "5"
}
}
}
}
}
@ -77,13 +77,13 @@ substantial increase in search performance. Because this holds all term
bytes as a single byte[], you cannot have more than 2.1GB worth of terms
in a single segment.
This postings format offers the following parameters:
`min_skip_count`::
The minimum number of terms with a shared prefix to
allow a skip pointer to be written. The default is *8*.
`low_freq_cutoff`::
Terms with a lower document frequency use a
single array object representation for postings and positions. The
default is *32*.
@ -97,15 +97,15 @@ Type name: `direct`
A postings format that stores terms & postings (docs, positions,
payloads) in RAM, using an FST. This postings format does write to disk,
but loads everything into memory. The memory postings format has the
following options:
`pack_fst`::
A boolean option that defines if the in memory structure
should be packed once it is built. Packing reduces the size of the
data structure in memory but requires more memory during building.
Default is *false*.
`acceptable_overhead_ratio`::
The compression ratio specified as a
float, that is used to compress internal structures. Example ratios `0`
(Compact, no memory overhead at all, but the returned implementation may
@ -124,13 +124,13 @@ top of this creates a bloom filter that is written to disk. During
opening this bloom filter is loaded into memory and used to offer
"fast-fail" reads. This postings format is useful for low doc-frequency
fields such as primary keys. The bloom filter postings format has the
following options:
`delegate`::
The name of the configured postings format that the
bloom filter postings format will wrap.
`fpp`::
The desired false positive probability specified as a
floating point number between 0 and 1.0. The `fpp` can be configured for
multiple expected insertions. Example expression: *10k=0.01,1m=0.03*. If
@ -141,6 +141,30 @@ following options:
Type name: `bloom`
[[codec-bloom-load]]
[TIP]
==================================================
It can sometimes make sense to disable bloom filters. For instance, if you are
logging into an index per day, and you have thousands of indices, the bloom
filters can take up a sizable amount of memory. For most queries you are only
interested in recent indices, so you don't mind queries on older indices
taking slightly longer.
In these cases you can disable loading of the bloom filter on a per-index
basis by updating the index settings:
[source,js]
--------------------------------------------------
PUT /old_index/_settings?index.codec.bloom.load=false
--------------------------------------------------
This setting, which defaults to `true`, can be updated on a live index. Note,
however, that changing the value will cause the index to be reopened, which
will invalidate any existing caches.
==================================================
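As a rough illustration of the tip above, the following Python sketch builds
(without sending) the same settings request for a batch of older daily
indices. The `logs-YYYY.MM.DD` naming scheme, the helper names and the
`localhost:9200` address are assumptions made for the example and are not
part of Elasticsearch.

```python
from datetime import date, timedelta
from urllib.parse import urlencode
from urllib.request import Request


def bloom_load_request(index, load, host="http://localhost:9200"):
    """Build (but do not send) a PUT request that sets index.codec.bloom.load.

    Mirrors the query-string form shown above:
    PUT /old_index/_settings?index.codec.bloom.load=false
    """
    query = urlencode({"index.codec.bloom.load": "true" if load else "false"})
    return Request(url=f"{host}/{index}/_settings?{query}", method="PUT")


def old_daily_indices(today, keep_days, total_days, prefix="logs-"):
    """Names of daily indices that are at least keep_days old.

    Assumes one index per day named prefix + YYYY.MM.DD (hypothetical scheme).
    """
    return [
        prefix + (today - timedelta(days=age)).strftime("%Y.%m.%d")
        for age in range(keep_days, total_days)
    ]


if __name__ == "__main__":
    for name in old_daily_indices(date(2013, 12, 20), keep_days=7, total_days=10):
        req = bloom_load_request(name, load=False)
        # Only print the request; pass it to urllib.request.urlopen to apply it.
        print(req.get_method(), req.full_url)
```

Because changing the setting reopens the index and invalidates its caches,
a script like this is probably best run once per index as it ages out,
rather than repeatedly against the same indices.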
[float]
[[pulsing-postings]]
==== Pulsing postings format
@ -148,17 +172,17 @@ Type name: `bloom`
The pulsing implementation in-lines the posting lists for very
low-frequency terms in the term dictionary. This is useful to improve lookup
performance for low-frequency terms. This postings format offers the
following parameters:
`min_block_size`::
The minimum block size the default Lucene term
dictionary uses to encode on-disk blocks. Defaults to *25*.
`max_block_size`::
The maximum block size the default Lucene term
dictionary uses to encode on-disk blocks. Defaults to *48*.
`freq_cut_off`::
The document frequency cut off where pulsing
in-lines posting lists into the term dictionary. Terms with a document
frequency less than or equal to the cutoff will be in-lined. The default is
@ -170,11 +194,11 @@ Type name: `pulsing`
[[default-postings]]
==== Default postings format
The default postings format has the following options:
`min_block_size`::
The minimum block size the default Lucene term
dictionary uses to encode on-disk blocks. Defaults to *25*.
`max_block_size`::
The maximum block size the default Lucene term

View File

@ -59,6 +59,10 @@ settings API:
`index.codec`::
Codec. Defaults to `default`.
`index.codec.bloom.load`::
Whether to load the bloom filter. Defaults to `true`.
See <<bloom-postings>>.
`index.fail_on_merge_failure`::
Defaults to `true`.