HBASE-8143 HBase on Hadoop 2 with local short circuit reads (ssr) causes OOM; DOC HOW TO AVOID

git-svn-id: https://svn.apache.org/repos/asf/hbase/trunk@1534504 13f79535-47bb-0310-9956-ffa450edef68
Michael Stack 2013-10-22 05:44:17 +00:00
parent 670bc625b2
commit 4c47c09a31
1 changed files with 28 additions and 8 deletions


@ -664,7 +664,19 @@ faster<footnote><para>See JD's <link xlink:href="http://files.meetup.com/1350427
Also see <link xlink:href="http://search-hadoop.com/m/zV6dKrLCVh1">HBase, mail # dev - read short circuit</link> thread for
more discussion around short circuit reads.
</para>
<para>How you enable "short circuit" reads depends on your version of Hadoop.
The original shortcircuit read patch was much improved upon in Hadoop 2 in
<link xlink:href="https://issues.apache.org/jira/browse/HDFS-347">HDFS-347</link>.
See <link xlink:href="http://blog.cloudera.com/blog/2013/08/how-improved-short-circuit-local-reads-bring-better-performance-and-security-to-hadoop/">How Improved Short-Circuit Local Reads Bring Better Performance and Security to Hadoop</link> for details
on the difference between the old and new implementations. See the
<link xlink:href="http://archive.cloudera.com/cdh4/cdh/4/hadoop/hadoop-project-dist/hadoop-hdfs/ShortCircuitLocalReads.html">Hadoop shortcircuit reads configuration page</link>
for how to enable the newer implementation.
</para>
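<para>As an illustration only (the property names are those documented on the Hadoop shortcircuit configuration page linked above; the socket path shown is an assumption and must be a path the DataNode user can create), a new-style HDFS-347 configuration in hdfs-site.xml looks roughly like the following:
<programlisting><![CDATA[
<!-- Sketch of HDFS-347 style shortcircuit config; adjust the socket path
     for your deployment. -->
<property>
  <name>dfs.client.read.shortcircuit</name>
  <value>true</value>
</property>
<property>
  <name>dfs.domain.socket.path</name>
  <value>/var/lib/hadoop-hdfs/dn_socket</value>
</property>
]]></programlisting>
The same <varname>dfs.client.read.shortcircuit</varname> setting must be visible to the HDFS client side, i.e. to the HBase processes, as well as to the DataNodes.
</para>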
<para>If you are running on an older Hadoop, one without
<link xlink:href="https://issues.apache.org/jira/browse/HDFS-347">HDFS-347</link> but with
<link xlink:href="https://issues.apache.org/jira/browse/HDFS-2246">HDFS-2246</link>,
you must set two configurations.
First, the hdfs-site.xml needs to be amended. Set
the property <varname>dfs.block.local-path-access.user</varname>
to be the <emphasis>only</emphasis> user that can use the shortcut.
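For example (a sketch only; the username shown is an assumption, substitute the user your RegionServers run as):
<programlisting><![CDATA[
<!-- hdfs-site.xml: only this user may use HDFS-2246 shortcircuit reads.
     "hbase" is an example; use your actual HBase service user. -->
<property>
  <name>dfs.block.local-path-access.user</name>
  <value>hbase</value>
</property>
]]></programlisting>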
@ -686,7 +698,15 @@ username than the one configured here also has the shortcircuit
enabled, it will get an Exception regarding an unauthorized access but
the data will still be read.
</para>
<note xml:id="dfs.client.read.shortcircuit.buffer.size">
<title>dfs.client.read.shortcircuit.buffer.size</title>
<para>The default for this value is too high when running on a heavily trafficked HBase. Lower it from its
default of 1M to 128k or so. Put this configuration in the HBase configs (it is an HDFS client-side configuration).
The Hadoop DFSClient in HBase will allocate a direct byte buffer of this size for <emphasis>each</emphasis>
block it has open; given HBase keeps its HDFS files open all the time, this can add up quickly.</para>
</note>
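<para>A sketch of the suggested override, placed in hbase-site.xml since it is read by the HDFS client inside HBase (the exact value is a judgment call; 131072 bytes is 128k):
<programlisting><![CDATA[
<!-- hbase-site.xml: shrink the per-open-block direct buffer used by the
     DFSClient when shortcircuit reads are enabled. -->
<property>
  <name>dfs.client.read.shortcircuit.buffer.size</name>
  <value>131072</value>
</property>
]]></programlisting>
</para>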
</section>
<section xml:id="perf.hdfs.comp"><title>Performance Comparisons of HBase vs. HDFS</title>
<para>A fairly common question on the dist-list is why HBase isn't as performant as HDFS files in a batch context (e.g., as
a MapReduce source or sink). The short answer is that HBase is doing a lot more than HDFS (e.g., reading the KeyValues,