diff --git a/src/docbkx/book.xml b/src/docbkx/book.xml index 9ac92155030..3bed8018218 100644 --- a/src/docbkx/book.xml +++ b/src/docbkx/book.xml @@ -508,6 +508,20 @@ admin.enableTable(table); See HColumnDescriptor for more information. +
+ Bloom Filters + Bloom Filters can be enabled per-ColumnFamily. + Use HColumnDescriptor.setBloomFilterType(NONE | ROW | + ROWCOL) to enable blooms per Column Family. Default = + NONE for no bloom filters. If + ROW, the hash of the row will be added to the bloom + on each insert. If ROWCOL, the hash of the row, + column family, and column family qualifier will be added to the bloom on + each key insert. + See HColumnDescriptor and + for more information. + +
Secondary Indexes and Alternate Query Paths @@ -1420,6 +1434,65 @@ HTable table2 = new HTable(conf2, "myTable");</programlisting> </section> </section> <!-- store --> + + <section xml:id="blooms"> + <title>Bloom Filters + Bloom filters were developed over in HBase-1200 + Add bloomfilters. + For description of the development process -- why static blooms + rather than dynamic -- and for an overview of the unique properties + that pertain to blooms in HBase, as well as possible future + directions, see the Development Process section + of the document BloomFilters + in HBase attached to HBase-1200. + + The bloom filters described here are actually version two of + blooms in HBase. In versions up to 0.19.x, HBase had a dynamic bloom + option based on work done by the European Commission One-Lab + Project 034819. The core of the HBase bloom work was later + pulled up into Hadoop to implement org.apache.hadoop.io.BloomMapFile. + Version 1 of HBase blooms never worked that well. Version 2 is a + rewrite from scratch though again it starts with the one-lab + work. + + See also and . + + +
+ Bloom StoreFile footprint + + Bloom filters add an entry to the StoreFile + general FileInfo data structure and then two + extra entries to the StoreFile metadata + section. + +
+ BloomFilter in the <classname>StoreFile</classname> + <classname>FileInfo</classname> data structure + + FileInfo has a + BLOOM_FILTER_TYPE entry which is set to + NONE, ROW or + ROWCOL. +
+ +
+ BloomFilter entries in <classname>StoreFile</classname> + metadata + + BLOOM_FILTER_META holds Bloom Size, Hash + Function used, etc. It's small in size and is cached on + StoreFile.Reader load. + BLOOM_FILTER_DATA is the actual bloomfilter + data. Obtained on-demand. Stored in the LRU cache, if it is enabled + (It's enabled by default). +
+
+
Block Cache @@ -1502,126 +1575,6 @@ HTable table2 = new HTable(conf2, "myTable"); - - - Bloom Filters - - Bloom filters were developed over in HBase-1200 - Add bloomfilters. - For description of the development process -- why static blooms - rather than dynamic -- and for an overview of the unique properties - that pertain to blooms in HBase, as well as possible future - directions, see the Development Process section - of the document BloomFilters - in HBase attached to HBase-1200. - - The bloom filters described here are actually version two of - blooms in HBase. In versions up to 0.19.x, HBase had a dynamic bloom - option based on work done by the European Commission One-Lab - Project 034819. The core of the HBase bloom work was later - pulled up into Hadoop to implement org.apache.hadoop.io.BloomMapFile. - Version 1 of HBase blooms never worked that well. Version 2 is a - rewrite from scratch though again it starts with the one-lab - work. - - -
- Configurations - - Blooms are enabled by specifying options on a column family in the - HBase shell or in java code as specification on - org.apache.hadoop.hbase.HColumnDescriptor. - -
- <code>HColumnDescriptor</code> option - - Use HColumnDescriptor.setBloomFilterType(NONE | ROW | - ROWCOL) to enable blooms per Column Family. Default = - NONE for no bloom filters. If - ROW, the hash of the row will be added to the bloom - on each insert. If ROWCOL, the hash of the row + - column family + column family qualifier will be added to the bloom on - each key insert. -
- -
- <varname>io.hfile.bloom.enabled</varname> global kill - switch - - io.hfile.bloom.enabled in - Configuration serves as the kill switch in case - something goes wrong. Default = true. -
- -
- <varname>io.hfile.bloom.error.rate</varname> - - io.hfile.bloom.error.rate = average false - positive rate. Default = 1%. Decrease rate by ½ (e.g. to .5%) == +1 - bit per bloom entry. -
- -
- <varname>io.hfile.bloom.max.fold</varname> - - io.hfile.bloom.max.fold = guaranteed minimum - fold rate. Most people should leave this alone. Default = 7, or can - collapse to at least 1/128th of original size. See the - Development Process section of the document BloomFilters - in HBase for more on what this option means. -
-
- -
- Bloom StoreFile footprint - - Bloom filters add an entry to the StoreFile - general FileInfo data structure and then two - extra entries to the StoreFile metadata - section. - -
- BloomFilter in the <classname>StoreFile</classname> - <classname>FileInfo</classname> data structure - -
- <varname>BLOOM_FILTER_TYPE</varname> - - FileInfo has a - BLOOM_FILTER_TYPE entry which is set to - NONE, ROW or - ROWCOL. -
-
- -
- BloomFilter entries in <classname>StoreFile</classname> - metadata - -
- <varname>BLOOM_FILTER_META</varname> - - BLOOM_FILTER_META holds Bloom Size, Hash - Function used, etc. Its small in size and is cached on - StoreFile.Reader load -
- -
- <varname>BLOOM_FILTER_DATA</varname> - - BLOOM_FILTER_DATA is the actual bloomfilter - data. Obtained on-demand. Stored in the LRU cache, if it is enabled - (Its enabled by default). -
-
-
-
- diff --git a/src/docbkx/configuration.xml b/src/docbkx/configuration.xml index bf6071c7609..bde6c25f42c 100644 --- a/src/docbkx/configuration.xml +++ b/src/docbkx/configuration.xml @@ -1092,4 +1092,34 @@ of all regions.
+
+ Bloom Filter Configuration +
+ <varname>io.hfile.bloom.enabled</varname> global kill + switch + + io.hfile.bloom.enabled in + Configuration serves as the kill switch in case + something goes wrong. Default = true. +
+ +
+ <varname>io.hfile.bloom.error.rate</varname> + + io.hfile.bloom.error.rate = average false + positive rate. Default = 1%. Halving the rate (e.g. to 0.5%) costs roughly one + additional bit per bloom entry. +
+ +
+ <varname>io.hfile.bloom.max.fold</varname> + + io.hfile.bloom.max.fold = guaranteed minimum + fold rate. Most people should leave this alone. Default = 7, meaning a bloom can + collapse to at least 1/128th (1/2^7) of its original size. See the + Development Process section of the document BloomFilters + in HBase for more on what this option means. +
+
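
Note on the two sizing options above: the arithmetic behind io.hfile.bloom.error.rate and io.hfile.bloom.max.fold can be sketched in a few lines of plain Java. This is an illustration under stated assumptions, not HBase code. The class BloomSizing and its methods are hypothetical names. The sizing formula is the textbook one for a Bloom filter with an optimal number of hash functions, and whether HBase sizes its blooms exactly this way is an assumption. Under that formula, halving the error rate costs about 1.44 bits per entry, so the one-bit figure in the documentation is best read as a rule of thumb. The fold step shown here (OR-ing the upper half of the bit array onto the lower half) is the generic way to halve a Bloom filter whose positions are taken modulo its size; it is assumed, not confirmed, to match what HBase does.

```java
import java.util.BitSet;

public class BloomSizing {

    /** Textbook bits per entry for a target false positive rate p: -ln(p) / (ln 2)^2. */
    public static double bitsPerEntry(double errorRate) {
        double ln2 = Math.log(2.0);
        return -Math.log(errorRate) / (ln2 * ln2);
    }

    /** Largest shrink factor allowed by a fold limit: 2^maxFold, so 7 gives 128. */
    public static int foldDivisor(int maxFold) {
        return 1 << maxFold;
    }

    /**
     * Folds a bit array of sizeBits bits (sizeBits must be even) in half by
     * OR-ing the upper half onto the lower half. A key that set position
     * h % sizeBits is still found at position h % (sizeBits / 2) afterwards.
     */
    public static BitSet foldOnce(BitSet bits, int sizeBits) {
        int half = sizeBits / 2;
        BitSet folded = bits.get(0, half);    // copy of the lower half
        folded.or(bits.get(half, sizeBits));  // upper half, re-indexed from 0
        return folded;
    }

    public static void main(String[] args) {
        System.out.printf("bits/entry at 1%%:   %.2f%n", bitsPerEntry(0.01));
        System.out.printf("bits/entry at 0.5%%: %.2f%n", bitsPerEntry(0.005));
        System.out.println("max.fold=7 divisor: " + foldDivisor(7));

        BitSet bloom = new BitSet(16);
        bloom.set(11);                        // a key hashed to position 11 of 16
        BitSet folded = foldOnce(bloom, 16);  // now 8 positions
        System.out.println("position 11 % 8 set after fold: " + folded.get(11 % 8));
    }
}
```

Each fold halves the storage but also raises the false positive rate, because twice as many keys now share each bit. That is why folding only makes sense when a bloom was allocated for many more keys than were actually written.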