Add doc on direct memory, the block cache UI additions, list block cache options, downplay slab cache even more
This commit is contained in:
parent 5f4e85d3f9
commit 19979d770d
@@ -1948,42 +1948,43 @@ rs.close();
LruBlockCache, BucketCache, and SlabCache, which are both (usually) offheap. This section
discusses benefits and drawbacks of each implementation, how to choose the appropriate
option, and configuration options for each.</para>

<note><title>Block Cache Reporting: UI</title>
  <para>See the RegionServer UI for detail on the caching deploy. Since HBase 1.0, the
  Block Cache detail has been significantly extended, showing configurations,
  sizings, current usage, and even detail on block counts and types.</para>
</note>

<section>
  <title>Cache Choices</title>
  <para><classname>LruBlockCache</classname> is the original implementation, and is
  entirely within the Java heap. <classname>BucketCache</classname> is mainly
  intended for keeping blockcache data offheap, although BucketCache can also
  keep data onheap and serve from a file-backed cache. There is also an older
  offheap BlockCache, called SlabCache, which has since been deprecated and
  removed in HBase 1.0.
  </para>

  <para>Fetching will always be slower when fetching from BucketCache,
  as compared with the native onheap LruBlockCache. However, latencies tend to be
  less erratic across time, because there is less garbage collection. This is why
  you would use BucketCache: your latencies are less erratic, and you mitigate GCs
  and heap fragmentation. See Nick Dimiduk's <link
  xlink:href="http://www.n10k.com/blog/blockcache-101/">BlockCache 101</link> for
  comparisons from running onheap vs offheap tests.
  </para>

  <para>When you enable BucketCache, you are enabling a two tier caching
  system, an L1 cache which is implemented by an instance of LruBlockCache and
  an offheap L2 cache which is implemented by BucketCache. Management of these
  two tiers and the policy that dictates how blocks move between them is done by
  <classname>CombinedBlockCache</classname>. It keeps all DATA blocks in the L2
  BucketCache and meta blocks -- INDEX and BLOOM blocks --
  onheap in the L1 <classname>LruBlockCache</classname>.
  See <xref linkend="offheap.blockcache" /> for more detail on going offheap.</para>
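  <para>As a quick sketch of what such a two tier deploy looks like in
  <filename>hbase-site.xml</filename> (the properties are described in
  <xref linkend="enable.bucketcache" />; the size value here is an arbitrary example,
  not a recommendation):
  <programlisting><![CDATA[<!-- Sketch: enable an offheap L2 BucketCache alongside the onheap L1 LruBlockCache. -->
<property>
  <name>hbase.bucketcache.ioengine</name>
  <value>offheap</value>
</property>
<!-- Capacity to give the bucket cache; see the hbase.bucketcache.size description
     in hbase-default.xml for how the value is interpreted. -->
<property>
  <name>hbase.bucketcache.size</name>
  <value>4096</value>
</property>]]></programlisting>
  See <xref linkend="enable.bucketcache" /> for the full enabling procedure, including
  direct memory sizing.</para>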
</section>

<section xml:id="cache.configurations">
  <title>General Cache Configurations</title>
  <para>Apart from the cache implementation itself, you can set some general
@@ -1993,6 +1994,7 @@ rs.close();
After setting any of these options, restart or rolling restart your cluster for the
configuration to take effect. Check logs for errors or unexpected behavior.</para>
</section>

<section
  xml:id="block.cache.design">
  <title>LruBlockCache Design</title>
@@ -2136,7 +2138,7 @@ rs.close();
  xml:id="offheap.blockcache">
  <title>Offheap Block Cache</title>
  <section xml:id="offheap.blockcache.slabcache">
    <title>How to Enable SlabCache</title>
    <para><emphasis>SlabCache is deprecated and will be removed in 1.0!</emphasis></para>
    <para>SlabCache was originally described in <link
    xlink:href="http://blog.cloudera.com/blog/2012/01/caching-in-hbase-slabcache/">Caching
@@ -2160,29 +2162,39 @@ rs.close();
  Check logs for errors or unexpected behavior.</para>
  </section>
  <section xml:id="enable.bucketcache">
    <title>How to Enable BucketCache</title>
    <para>The usual deploy of BucketCache is via a managing class that sets up two caching tiers: an L1 onheap cache
    implemented by LruBlockCache and a second L2 cache implemented with BucketCache. The managing class is <link
    xlink:href="http://hbase.apache.org/devapidocs/org/apache/hadoop/hbase/io/hfile/CombinedBlockCache.html">CombinedBlockCache</link> by default.
    The previous link describes the caching 'policy' implemented by CombinedBlockCache. In short, it works
    by keeping meta blocks -- INDEX and BLOOM -- in the L1, onheap LruBlockCache tier, while DATA
    blocks are kept in the L2, BucketCache tier. Since HBase 1.0, it is possible to amend this behavior
    and ask that a column family have both its meta and DATA blocks hosted onheap in the L1 tier by
    setting <varname>cacheDataInL1</varname> via <programlisting>HColumnDescriptor.setCacheDataInL1(true)</programlisting>
    or in the shell, creating or amending column families setting <varname>CACHE_DATA_IN_L1</varname>
    to true: e.g. <programlisting>hbase(main):003:0> create 't', {NAME => 't', CONFIGURATION => {CACHE_DATA_IN_L1 => 'true'}}</programlisting></para>
    <para>The BucketCache Block Cache can be deployed onheap, offheap, or file based.
    You set which via the <varname>hbase.bucketcache.ioengine</varname> setting. Setting it to
    <varname>heap</varname> will have BucketCache deployed inside the allocated java heap. Setting it to
    <varname>offheap</varname> will have BucketCache make its allocations offheap,
    and an ioengine setting of <varname>file:PATH_TO_FILE</varname> will direct
    BucketCache to use file caching (useful in particular if you have some fast I/O attached to the box, such
    as SSDs).
    </para>
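    <para>For example, a file-backed deploy might look like the following sketch in
    <filename>hbase-site.xml</filename> (the mount path is hypothetical and the size value
    is an arbitrary example):
    <programlisting><![CDATA[<!-- Sketch: back the L2 BucketCache with a file on fast local storage such as SSD. -->
<property>
  <name>hbase.bucketcache.ioengine</name>
  <value>file:/mnt/ssd/hbase-bucketcache.data</value>
</property>
<property>
  <name>hbase.bucketcache.size</name>
  <value>8192</value>
</property>]]></programlisting></para>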
    <para xml:id="raw.l1.l2">It is possible to deploy an L1+L2 setup where we bypass the CombinedBlockCache
    policy and have BucketCache working as a strict L2 cache to the L1
    LruBlockCache. For such a setup, set <varname>CacheConfig.BUCKET_CACHE_COMBINED_KEY</varname> to
    <literal>false</literal>. In this mode, on eviction from L1, blocks go to L2.
    When a block is cached, it is cached first in L1. When we go to look for a cached block,
    we look first in L1 and, if none is found, we then search L2. Let us call this deploy format
    <emphasis><indexterm><primary>Raw L1+L2</primary></indexterm></emphasis>.</para>
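    <para>A minimal sketch of a Raw L1+L2 configuration follows; it assumes
    <varname>CacheConfig.BUCKET_CACHE_COMBINED_KEY</varname> resolves to the
    <varname>hbase.bucketcache.combinedcache.enabled</varname> property name (confirm against
    the CacheConfig class referenced in the next paragraph):
    <programlisting><![CDATA[<!-- Sketch: bypass the CombinedBlockCache policy so BucketCache acts as a strict L2. -->
<!-- Assumes BUCKET_CACHE_COMBINED_KEY maps to hbase.bucketcache.combinedcache.enabled. -->
<property>
  <name>hbase.bucketcache.combinedcache.enabled</name>
  <value>false</value>
</property>]]></programlisting></para>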
    <para>Other BucketCache configs include specifying a location to persist the cache across
    restarts, how many threads to use when writing the cache, and so on. See the
    <link xlink:href="https://hbase.apache.org/apidocs/org/apache/hadoop/hbase/io/hfile/CacheConfig.html">CacheConfig</link>
    class for configuration options and descriptions.</para>
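    <para>As an illustration only (the property names below are assumptions to be verified
    against the CacheConfig class linked above, and the values are arbitrary), a deploy that
    persists the file-backed cache across restarts and adds cache writer threads might look like:
    <programlisting><![CDATA[<!-- Sketch: assumed property names; check CacheConfig for the authoritative list. -->
<property>
  <name>hbase.bucketcache.persistent.path</name>
  <value>/mnt/ssd/hbase-bucketcache.meta</value>
</property>
<property>
  <name>hbase.bucketcache.writer.threads</name>
  <value>3</value>
</property>]]></programlisting></para>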
    <procedure>
      <title>BucketCache Example Configuration</title>
@@ -2230,6 +2242,27 @@ rs.close();
      In other words, you configure the L1 LruBlockCache as you normally would,
      just as if there were no L2 BucketCache present.
      </para>
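      <para>For instance, the L1 LruBlockCache size is set as a fraction of the RegionServer heap
      via <varname>hfile.block.cache.size</varname> in <filename>hbase-site.xml</filename>
      (the 0.4 shown is an arbitrary example, not a recommendation):
      <programlisting><![CDATA[<!-- Sketch: give the onheap L1 LruBlockCache 40% of the RegionServer heap. -->
<property>
  <name>hfile.block.cache.size</name>
  <value>0.4</value>
</property>]]></programlisting></para>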
      <note xml:id="direct.memory">
      <title>Direct Memory Usage In HBase</title>
      <para>The default maximum direct memory varies by JVM. Traditionally it is 64M,
      or some relation to the allocated heap size (-Xmx), or no limit at all (JDK7 apparently).
      HBase servers use direct memory; in particular, with short-circuit reading enabled, the hosted
      DFSClient will allocate direct memory buffers. If you do offheap block caching, you will
      be making use of direct memory. When starting your JVM, make sure
      the <varname>-XX:MaxDirectMemorySize</varname> setting in
      <filename>conf/hbase-env.sh</filename> is set to some value that is
      higher than what you have allocated to your offheap blockcache
      (<varname>hbase.bucketcache.size</varname>). It should be larger than your offheap block
      cache, with some extra for DFSClient usage (how much the DFSClient uses is not
      easy to quantify; roughly, it is the number of open hfiles * <varname>hbase.dfs.client.read.shortcircuit.buffer.size</varname>,
      where hbase.dfs.client.read.shortcircuit.buffer.size is set to 128k in HBase -- see the <filename>hbase-default.xml</filename>
      default configurations).
      </para>
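      <para>A sketch of what this might look like in <filename>conf/hbase-env.sh</filename>
      (<varname>HBASE_REGIONSERVER_OPTS</varname> is the conventional variable for RegionServer JVM
      flags; the sizes are arbitrary examples, not recommendations):
      <programlisting># Sketch: 8G heap plus direct memory room for a 4G offheap bucket cache and DFSClient buffers.
export HBASE_REGIONSERVER_OPTS="$HBASE_REGIONSERVER_OPTS -Xmx8g -XX:MaxDirectMemorySize=5g"</programlisting></para>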
      <para>You can see how much memory -- onheap and offheap/direct -- a RegionServer is configured to use,
      and how much it is using at any one time, by looking at the
      <emphasis>Server Metrics: Memory</emphasis> tab in the UI.
      </para>
      </note>
    </section>
  </section>
</section>