HBASE-8143 HBase on Hadoop 2 with local short circuit reads (ssr) causes OOM; DOC HOW TO AVOID

git-svn-id: https://svn.apache.org/repos/asf/hbase/trunk@1534504 13f79535-47bb-0310-9956-ffa450edef68
Michael Stack 2013-10-22 05:44:17 +00:00
parent 670bc625b2
commit 4c47c09a31
1 changed file with 28 additions and 8 deletions


@@ -664,7 +664,19 @@ faster<footnote><para>See JD's <link xlink:href="http://files.meetup.com/1350427
Also see <link xlink:href="http://search-hadoop.com/m/zV6dKrLCVh1">HBase, mail # dev - read short circuit</link> thread for
more discussion around short circuit reads.
</para>
<para>To enable "short circuit" reads, you must set two configurations.
<para>To enable "short circuit" reads, it will depend on your version of Hadoop.
The original shortcircuit read patch was much improved upon in Hadoop 2 in
<link xlink:href="https://issues.apache.org/jira/browse/HDFS-347">HDFS-347</link>.
See <link xlink:href="http://blog.cloudera.com/blog/2013/08/how-improved-short-circuit-local-reads-bring-better-performance-and-security-to-hadoop/"></link> for details
on the difference between the old and new implementations. See
<link xlink:href="http://archive.cloudera.com/cdh4/cdh/4/hadoop/hadoop-project-dist/hadoop-hdfs/ShortCircuitLocalReads.html">Hadoop shortcircuit reads configuration page</link>
for how to enable the later version of shortcircuit.
</para>
<para>If you are running on an older Hadoop, one without
<link xlink:href="https://issues.apache.org/jira/browse/HDFS-347">HDFS-347</link> but with
<link xlink:href="https://issues.apache.org/jira/browse/HDFS-2246">HDFS-2246</link>,
you must set two configurations.
First, hdfs-site.xml needs to be amended. Set
the property <varname>dfs.block.local-path-access.user</varname>
to the <emphasis>only</emphasis> user that can use the shortcut.
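For the Hadoop 2 (HDFS-347) implementation referenced above, a minimal sketch of the relevant properties might look like the following. The socket path is only an example; the linked Hadoop configuration page is the authority for your distribution.

    <!-- hdfs-site.xml (HDFS-347-style short-circuit reads). These are HDFS
         client-side settings, so they must also be visible to HBase.
         The socket path below is an example location, not a requirement. -->
    <property>
      <name>dfs.client.read.shortcircuit</name>
      <value>true</value>
    </property>
    <property>
      <name>dfs.domain.socket.path</name>
      <!-- The DataNode must be able to create this domain socket. -->
      <value>/var/lib/hadoop-hdfs/dn_socket</value>
    </property>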
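For the older HDFS-2246 path, here is a sketch of the first of the two configurations named above, assuming the RegionServers run as an example user named hbase:

    <!-- hdfs-site.xml: restrict the HDFS-2246 shortcut to a single user.
         "hbase" is an assumed example; use the user HBase actually runs as. -->
    <property>
      <name>dfs.block.local-path-access.user</name>
      <value>hbase</value>
    </property>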
@@ -686,7 +698,15 @@ username than the one configured here also has the shortcircuit
enabled, it will get an Exception regarding unauthorized access, but
the data will still be read.
</para>
<note xml:id="dfs.client.read.shortcircuit.buffer.size">
<title>dfs.client.read.shortcircuit.buffer.size</title>
<para>The default for this value is too high for a heavily trafficked HBase install. Set it down from its
default of 1MB to 128k or so. Put this configuration in the HBase configs (it is an HDFS client-side configuration).
The Hadoop DFSClient in HBase will allocate a direct byte buffer of this size for <emphasis>each</emphasis>
block it has open; since HBase keeps its HDFS files open all the time, this can add up quickly.</para>
</note>
</section>
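As a sketch of the note above, the buffer size can be lowered in hbase-site.xml, where the HDFS client embedded in HBase will pick it up; 131072 bytes is one reasonable "128k" value:

    <!-- hbase-site.xml: shrink the per-open-block direct buffer used by
         short-circuit reads from the 1MB default to 128k. -->
    <property>
      <name>dfs.client.read.shortcircuit.buffer.size</name>
      <value>131072</value>
    </property>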
<section xml:id="perf.hdfs.comp"><title>Performance Comparisons of HBase vs. HDFS</title>
<para>A fairly common question on the dist-list is why HBase isn't as performant as HDFS files in a batch context (e.g., as
a MapReduce source or sink). The short answer is that HBase is doing a lot more than HDFS (e.g., reading the KeyValues,