Add doc on direct memory, the block cache UI additions, list block cache options, downplay slab cache even more
This commit is contained in:
parent 5f4e85d3f9
commit 19979d770d

@@ -1948,42 +1948,43 @@ rs.close();
LruBlockCache, and BucketCache and SlabCache, which are both (usually) offheap. This section
discusses benefits and drawbacks of each implementation, how to choose the appropriate
option, and configuration options for each.</para>
<note><title>Block Cache Reporting: UI</title>
<para>See the RegionServer UI for detail on the caching deploy. Since HBase 1.0, the
Block Cache detail has been significantly extended to show configurations,
sizings, current usage, and even detail on block counts and types.</para>
</note>
<section>
<title>Cache Choices</title>
<para><classname>LruBlockCache</classname> is the original implementation, and is
entirely within the Java heap. <classname>BucketCache</classname> is mainly
intended for keeping blockcache data offheap, although BucketCache can also
keep data onheap and serve from a file-backed cache. There is also an older
offheap BlockCache, called SlabCache, that has since been deprecated and
removed in HBase 1.0.</para>
<para>Fetching will always be slower when fetching from BucketCache,
as compared with the native onheap LruBlockCache. However, latencies tend to be
less erratic across time, because there is less garbage collection. This is why
you would use BucketCache: your latencies are less erratic, and GCs and heap
fragmentation are mitigated. See Nick Dimiduk's <link
xlink:href="http://www.n10k.com/blog/blockcache-101/">BlockCache 101</link> for
comparisons running onheap vs offheap tests.</para>
<para>When you enable BucketCache, you are enabling a two tier caching
system: an L1 cache implemented by an instance of LruBlockCache and
an offheap L2 cache implemented by BucketCache. Management of these
two tiers, and the policy that dictates how blocks move between them, is done by
<classname>CombinedBlockCache</classname>. It keeps all DATA blocks in the L2
BucketCache and meta blocks -- INDEX and BLOOM blocks --
onheap in the L1 <classname>LruBlockCache</classname>.
See <xref linkend="offheap.blockcache" /> for more detail on going offheap.</para>
</section>

<section xml:id="cache.configurations">
<title>General Cache Configurations</title>
<para>Apart from the cache implementation itself, you can set some general
@@ -1993,6 +1994,7 @@ rs.close();
After setting any of these options, restart or rolling restart your cluster for the
configuration to take effect. Check logs for errors or unexpected behavior.</para>
</section>

<section
xml:id="block.cache.design">
<title>LruBlockCache Design</title>
@@ -2136,7 +2138,7 @@ rs.close();
xml:id="offheap.blockcache">
<title>Offheap Block Cache</title>
<section xml:id="offheap.blockcache.slabcache">
<title>How to Enable SlabCache</title>
<para><emphasis>SlabCache is deprecated and will be removed in 1.0!</emphasis></para>
<para>SlabCache was originally described in <link
xlink:href="http://blog.cloudera.com/blog/2012/01/caching-in-hbase-slabcache/">Caching
@@ -2160,29 +2162,39 @@ rs.close();
Check logs for errors or unexpected behavior.</para>
</section>
<section xml:id="enable.bucketcache">
<title>How to Enable BucketCache</title>
<para>The usual deploy of BucketCache is via a managing class that sets up two caching tiers: an L1 onheap cache
implemented by LruBlockCache and a second L2 cache implemented with BucketCache. The managing class is <link
xlink:href="http://hbase.apache.org/devapidocs/org/apache/hadoop/hbase/io/hfile/CombinedBlockCache.html">CombinedBlockCache</link> by default.
The just-previous link describes the caching 'policy' implemented by CombinedBlockCache. In short, it works
by keeping meta blocks -- INDEX and BLOOM -- in the L1, onheap LruBlockCache tier, while DATA
blocks are kept in the L2, BucketCache tier. It is possible to amend this behavior in
HBase since version 1.0 and ask that a column family have both its meta and DATA blocks hosted onheap in the L1 tier by
setting <varname>cacheDataInL1</varname> via <programlisting>HColumnDescriptor.setCacheDataInL1(true)</programlisting>
or, in the shell, by creating or amending column families with <varname>CACHE_DATA_IN_L1</varname>
set to true, e.g. <programlisting>hbase(main):003:0> create 't', {NAME => 't', CONFIGURATION => {CACHE_DATA_IN_L1 => 'true'}}</programlisting></para>
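For an already-existing table, the same attribute can presumably be toggled from the shell with alter; this mirrors the create example above, and the exact syntax is an assumption to verify against your shell's help output:

```ruby
# Sketch only (syntax assumed): amend an existing column family to cache
# its DATA blocks in the onheap L1 tier.
hbase(main):004:0> alter 't', {NAME => 't', CONFIGURATION => {CACHE_DATA_IN_L1 => 'true'}}
```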
<para>The BucketCache Block Cache can be deployed onheap, offheap, or file based.
You set which via the <varname>hbase.bucketcache.ioengine</varname> setting.
Setting it to <varname>heap</varname> will have BucketCache deployed inside the
allocated Java heap. Setting it to <varname>offheap</varname> will have
BucketCache make its allocations offheap, and an ioengine setting of
<varname>file:PATH_TO_FILE</varname> will direct BucketCache to use file
caching (useful in particular if you have some fast I/O attached to the box,
such as SSDs).
</para>
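As a minimal sketch, an offheap BucketCache might be configured in hbase-site.xml along the following lines. The property names hbase.bucketcache.ioengine and hbase.bucketcache.size are the ones discussed in this section, but the example value is made up; verify the expected unit of hbase.bucketcache.size against your version's hbase-default.xml before copying:

```xml
<!-- Sketch only: an offheap BucketCache. The unit of hbase.bucketcache.size
     should be verified against your hbase-default.xml before use. -->
<property>
  <name>hbase.bucketcache.ioengine</name>
  <value>offheap</value>
</property>
<property>
  <name>hbase.bucketcache.size</name>
  <value>4196</value>
</property>
```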
<para xml:id="raw.l1.l2">It is possible to deploy an L1+L2 setup where we bypass the CombinedBlockCache
policy and have BucketCache working as a strict L2 cache to the L1
LruBlockCache. For such a setup, set <varname>CacheConfig.BUCKET_CACHE_COMBINED_KEY</varname> to
<literal>false</literal>. In this mode, on eviction from L1, blocks go to L2.
When a block is cached, it is cached first in L1. When we go to look for a cached block,
we look first in L1 and, if none is found, we then search L2. Let us call this deploy format
<emphasis><indexterm><primary>Raw L1+L2</primary></indexterm></emphasis>.</para>
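As a sketch, the Raw L1+L2 mode might be enabled with a property like the following in hbase-site.xml. The property name here is an assumption -- it should be whatever String the <varname>CacheConfig.BUCKET_CACHE_COMBINED_KEY</varname> constant holds in your version, so confirm it against the CacheConfig source before relying on it:

```xml
<!-- Assumption: this name is believed to be the value of
     CacheConfig.BUCKET_CACHE_COMBINED_KEY; confirm in CacheConfig.java. -->
<property>
  <name>hbase.bucketcache.combinedcache.enabled</name>
  <value>false</value>
</property>
```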
<para>Other BucketCache configs include: specifying a location to persist the cache to across
restarts, how many threads to use writing the cache, and so on. See the
<link xlink:href="https://hbase.apache.org/apidocs/org/apache/hadoop/hbase/io/hfile/CacheConfig.html">CacheConfig</link>
class for configuration options and descriptions.</para>
<procedure>
<title>BucketCache Example Configuration</title>
@@ -2230,6 +2242,27 @@ rs.close();
In other words, you configure the L1 LruBlockCache just as you would
when there is no L2 BucketCache present.
</para>
<note xml:id="direct.memory">
<title>Direct Memory Usage In HBase</title>
<para>The default maximum direct memory varies by JVM. Traditionally it is 64M,
some relation to the allocated heap size (-Xmx), or no limit at all (JDK7, apparently).
HBase servers use direct memory; in particular, with short-circuit reading, the hosted DFSClient will
allocate direct memory buffers. If you do offheap block caching, you will also
be making use of direct memory. When starting your JVM, make sure
the <varname>-XX:MaxDirectMemorySize</varname> setting in
<filename>conf/hbase-env.sh</filename> is set to some value that is
higher than what you have allocated to your offheap blockcache
(<varname>hbase.bucketcache.size</varname>). It should be larger than your offheap block
cache, and then some more for DFSClient usage. How much the DFSClient uses is not
easy to quantify; it is roughly the number of open hfiles * <varname>hbase.dfs.client.read.shortcircuit.buffer.size</varname>,
where <varname>hbase.dfs.client.read.shortcircuit.buffer.size</varname> is set to 128k in HBase -- see <filename>hbase-default.xml</filename>
default configurations.
</para>
<para>You can see how much memory -- onheap and offheap/direct -- a RegionServer is configured to use,
and how much it is using at any one time, by looking at the
<emphasis>Server Metrics: Memory</emphasis> tab in the UI.
</para>
</note>
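The sizing arithmetic above can be sketched with made-up numbers: a 4G offheap blockcache, plus roughly 1000 open hfiles * 128k (about 128M) of DFSClient short-circuit buffers, fits comfortably under a 5g direct memory ceiling. The example assumes HBASE_REGIONSERVER_OPTS is how your conf/hbase-env.sh passes JVM arguments to the RegionServer:

```shell
# Sketch only, with made-up numbers, for conf/hbase-env.sh:
# 4G bucketcache + ~1000 open hfiles * 128k (~128M) of DFSClient
# buffers, rounded up to 5g of direct memory for headroom.
export HBASE_REGIONSERVER_OPTS="$HBASE_REGIONSERVER_OPTS -XX:MaxDirectMemorySize=5g"
```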
</section>
</section>
</section>