[DOCS] Documented index.codec.bloom.load for #4525

2025-03-25 17:38:44 +00:00 · 2013-12-20 10:46:43 +01:00 · 2013-12-20 10:46:43 +01:00 · 2b8c82c883
commit 2b8c82c883
parent 51dc057244
2 changed files with 49 additions and 21 deletions
--- a/docs/reference/index-modules/codec.asciidoc
+++ b/docs/reference/index-modules/codec.asciidoc
@@ -40,7 +40,7 @@ curl -XPUT 'http://localhost:9200/twitter/' -d '{
             "my_format" : {
                "type" : "pulsing",
                "freq_cut_off" : "5"
-             } 
+             }
          }
       }
        }
@@ -77,13 +77,13 @@ substantial increase in search performance. Because this holds all term
 bytes as a single byte[], you cannot have more than 2.1GB worth of terms
 in a single segment.

-This postings format offers the following parameters: 
+This postings format offers the following parameters:

-`min_skip_count`:: 
+`min_skip_count`::
    The minimum number terms with a shared prefix to
-    allow a skip pointer to be written. The default is *8*. 
+    allow a skip pointer to be written. The default is *8*.

-`low_freq_cutoff`:: 
+`low_freq_cutoff`::
    Terms with a lower document frequency use a
    single array object representation for postings and positions. The
    default is *32*.
@@ -97,15 +97,15 @@ Type name: `direct`
 A postings format that stores terms & postings (docs, positions,
 payloads) in RAM, using an FST. This postings format does write to disk,
 but loads everything into memory. The memory postings format has the
-following options: 
+following options:

-`pack_fst`:: 
+`pack_fst`::
    A boolean option that defines if the in memory structure
    should be packed once its build. Packed will reduce the size for the
    data-structure in memory but requires more memory during building.
    Default is *false*.

-`acceptable_overhead_ratio`:: 
+`acceptable_overhead_ratio`::
    The compression ratio specified as a
    float, that is used to compress internal structures. Example ratios `0`
    (Compact, no memory overhead at all, but the returned implementation may
@@ -124,13 +124,13 @@ top of this creates a bloom filter that is written to disk. During
 opening this bloom filter is loaded into memory and used to offer
 "fast-fail" reads. This postings format is useful for low doc-frequency
 fields such as primary keys. The bloom filter postings format has the
-following options: 
+following options:

-`delegate`:: 
+`delegate`::
    The name of the configured postings format that the
-    bloom filter postings format will wrap. 
+    bloom filter postings format will wrap.

-`fpp`:: 
+`fpp`::
    The desired false positive probability specified as a
    floating point number between 0 and 1.0. The `fpp` can be configured for
    multiple expected insertions. Example expression: *10k=0.01,1m=0.03*. If
@@ -141,6 +141,30 @@ following options:

 Type name: `bloom`

+[[codec-bloom-load]]
+[TIP]
+==================================================
+
+It can sometimes make sense to disable bloom filters. For instance, if you are
+logging into an index per day, and you have thousands of indices, the bloom
+filters can take up a sizable amount of memory. For most queries you are only
+interested in recent indices, so you don't mind queries on older indices
+taking slightly longer.
+
+In these cases you can disable loading of the bloom filter on a per-index
+basis by updating the index settings:
+
+[source,js]
+--------------------------------------------------
+PUT /old_index/_settings?index.codec.bloom.load=false
+--------------------------------------------------
+
+This setting, which defaults to `true`, can be updated on a live index. Note,
+however, that changing the value will cause the index to be reopened, which
+will invalidate any existing caches.
+
+==================================================
+
 [float]
 [[pulsing-postings]]
 ==== Pulsing postings format
@@ -148,17 +172,17 @@ Type name: `bloom`
 The pulsing implementation in-lines the posting lists for very low
 frequent terms in the term dictionary. This is useful to improve lookup
 performance for low-frequent terms. This postings format offers the
-following parameters: 
+following parameters:

-`min_block_size`:: 
+`min_block_size`::
    The minimum block size the default Lucene term
-    dictionary uses to encode on-disk blocks. Defaults to *25*. 
+    dictionary uses to encode on-disk blocks. Defaults to *25*.

-`max_block_size`:: 
+`max_block_size`::
    The maximum block size the default Lucene term
-    dictionary uses to encode on-disk blocks. Defaults to *48*. 
+    dictionary uses to encode on-disk blocks. Defaults to *48*.

-`freq_cut_off`:: 
+`freq_cut_off`::
    The document frequency cut off where pulsing
    in-lines posting lists into the term dictionary. Terms with a document
    frequency less or equal to the cutoff will be in-lined. The default is
@@ -170,11 +194,11 @@ Type name: `pulsing`
 [[default-postings]]
 ==== Default postings format

-The default postings format has the following options: 
+The default postings format has the following options:

-`min_block_size`:: 
+`min_block_size`::
    The minimum block size the default Lucene term
-    dictionary uses to encode on-disk blocks. Defaults to *25*. 
+    dictionary uses to encode on-disk blocks. Defaults to *25*.

 `max_block_size`::
    The maximum block size the default Lucene term
--- a/docs/reference/indices/update-settings.asciidoc
+++ b/docs/reference/indices/update-settings.asciidoc
@@ -59,6 +59,10 @@ settings API:
 `index.codec`::
    Codec. Default to `default`.

+`index.codec.bloom.load`::
+    Whether to load the bloom filter. Defaults to `true`.
+    See <<bloom-postings>>.
+
 `index.fail_on_merge_failure`::
    Default to `true`.