HBASE-8143 HBase on Hadoop 2 with local short circuit reads (ssr) causes OOM; DOC HOW TO AVOID

git-svn-id: https://svn.apache.org/repos/asf/hbase/trunk@1534504 13f79535-47bb-0310-9956-ffa450edef68
Michael Stack 2013-10-22 05:44:17 +00:00
parent 670bc625b2
commit 4c47c09a31
1 changed files with 28 additions and 8 deletions


@ -664,7 +664,19 @@ faster<footnote><para>See JD's <link xlink:href="http://files.meetup.com/1350427
Also see <link xlink:href="http://search-hadoop.com/m/zV6dKrLCVh1">HBase, mail # dev - read short circuit</link> thread for
more discussion around short circuit reads.
</para>
<para>How you enable "short circuit" reads depends on your version of Hadoop.
The original shortcircuit read patch was much improved upon in Hadoop 2 in
<link xlink:href="https://issues.apache.org/jira/browse/HDFS-347">HDFS-347</link>.
See <link xlink:href="http://blog.cloudera.com/blog/2013/08/how-improved-short-circuit-local-reads-bring-better-performance-and-security-to-hadoop/">How Improved Short-Circuit Local Reads Bring Better Performance and Security to Hadoop</link> for details
on the difference between the old and new implementations. See the
<link xlink:href="http://archive.cloudera.com/cdh4/cdh/4/hadoop/hadoop-project-dist/hadoop-hdfs/ShortCircuitLocalReads.html">Hadoop shortcircuit reads configuration page</link>
for how to enable the newer implementation.
</para>
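<para>As an illustration only (the property names are those documented on the Hadoop shortcircuit configuration page linked above; the socket path shown is an assumption and must be a path the DataNode user can create), a new-style HDFS-347 configuration in hdfs-site.xml looks roughly like the following:
<programlisting><![CDATA[
<!-- Sketch of HDFS-347 style shortcircuit config; adjust the socket path
     for your deployment. -->
<property>
  <name>dfs.client.read.shortcircuit</name>
  <value>true</value>
</property>
<property>
  <name>dfs.domain.socket.path</name>
  <value>/var/lib/hadoop-hdfs/dn_socket</value>
</property>
]]></programlisting>
The same <varname>dfs.client.read.shortcircuit</varname> setting must be visible to the HDFS client side, i.e. to the HBase processes, as well as to the DataNodes.
</para>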
<para>If you are running on an older Hadoop, one without
<link xlink:href="https://issues.apache.org/jira/browse/HDFS-347">HDFS-347</link> but with
<link xlink:href="https://issues.apache.org/jira/browse/HDFS-2246">HDFS-2246</link>,
you must set two configurations.
First, the hdfs-site.xml needs to be amended. Set
the property <varname>dfs.block.local-path-access.user</varname>
to be the <emphasis>only</emphasis> user that can use the shortcut.
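For example (a sketch only; the username shown is an assumption, substitute the user your RegionServers run as):
<programlisting><![CDATA[
<!-- hdfs-site.xml: only this user may use HDFS-2246 shortcircuit reads.
     "hbase" is an example; use your actual HBase service user. -->
<property>
  <name>dfs.block.local-path-access.user</name>
  <value>hbase</value>
</property>
]]></programlisting>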
@ -686,7 +698,15 @@ username than the one configured here also has the shortcircuit
enabled, it will get an Exception regarding an unauthorized access but
the data will still be read.
</para>
<note xml:id="dfs.client.read.shortcircuit.buffer.size">
<title>dfs.client.read.shortcircuit.buffer.size</title>
<para>The default for this value is too high when running on a heavily trafficked HBase. Lower it from its
default of 1M to 128k or so. Put this configuration in the HBase configs (it is an HDFS client-side configuration).
The Hadoop DFSClient in HBase will allocate a direct byte buffer of this size for <emphasis>each</emphasis>
block it has open; given HBase keeps its HDFS files open all the time, this can add up quickly.</para>
</note>
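<para>A sketch of the suggested override, placed in hbase-site.xml since it is read by the HDFS client inside HBase (the exact value is a judgment call; 131072 bytes is 128k):
<programlisting><![CDATA[
<!-- hbase-site.xml: shrink the per-open-block direct buffer used by the
     DFSClient when shortcircuit reads are enabled. -->
<property>
  <name>dfs.client.read.shortcircuit.buffer.size</name>
  <value>131072</value>
</property>
]]></programlisting>
</para>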
</section>
<section xml:id="perf.hdfs.comp"><title>Performance Comparisons of HBase vs. HDFS</title>
<para>A fairly common question on the dist-list is why HBase isn't as performant as HDFS files in a batch context (e.g., as
a MapReduce source or sink). The short answer is that HBase is doing a lot more than HDFS (e.g., reading the KeyValues,