HBASE-7217 Documentation: Update section 11.5.1 to recommend that hbase.regionserver.checksum.verify is set

git-svn-id: https://svn.apache.org/repos/asf/hbase/trunk@1413793 13f79535-47bb-0310-9956-ffa450edef68
This commit is contained in:
Michael Stack 2012-11-26 19:26:15 +00:00
parent 7233553030
commit 6351fbaa34
1 changed files with 36 additions and 31 deletions

View File

@ -208,38 +208,10 @@
</section>
</section>
<section xml:id="perf.hdfs.configs">
<title>HDFS Configuration</title>
<section xml:id="perf.hdfs.configs.localread">
<title>Leveraging local data</title>
<para>Since Hadoop 1.0.0 (also 0.22.1, 0.23.1, CDH3u3 and HDP 1.0) via
<link xlink:href="https://issues.apache.org/jira/browse/HDFS-2246">HDFS-2246</link>,
it is possible for the DFSClient to take a "short circuit" and
read directly from disk instead of going through the DataNode when the
data is local. What this means for HBase is that the RegionServers can
read directly off their machine's disks instead of having to open a
socket to talk to the DataNode, the former being generally much
faster<footnote><para>See JD's <link xlink:href="http://files.meetup.com/1350427/hug_ebay_jdcryans.pdf">Performance Talk</link></para></footnote>.
Also see <link xlink:href="http://search-hadoop.com/m/zV6dKrLCVh1">HBase, mail # dev - read short circuit</link> thread for
more discussion around short circuit reads.
</para>
<para>To enable "short circuit" reads, you must set two configurations.
First, the hdfs-site.xml needs to be amended. Set
the property <varname>dfs.block.local-path-access.user</varname>
to be the <emphasis>only</emphasis> user that can use the shortcut.
This has to be the user that started HBase. Then in hbase-site.xml,
set <varname>dfs.client.read.shortcircuit</varname> to be <varname>true</varname>
</para>
<para>
The DataNodes need to be restarted in order to pick up the new
configuration. Be aware that if a process started under another
username than the one configured here also has the shortcircuit
enabled, it will get an Exception regarding an unauthorized access but
the data will still be read.
</para>
</section>
</section>
<section xml:id="perf.zookeeper">
<title>ZooKeeper</title>
<para>See <xref linkend="zookeeper"/> for information on configuring ZooKeeper, and see the part
@ -658,6 +630,39 @@ htable.close();</programlisting></para>
<link xlink:href="https://issues.apache.org/jira/browse/HDFS-1599">Umbrella Jira Ticket for HDFS Improvements for HBase</link>.
</para>
</section>
<section xml:id="perf.hdfs.configs.localread">
<title>Leveraging local data</title>
<para>Since Hadoop 1.0.0 (also 0.22.1, 0.23.1, CDH3u3 and HDP 1.0) via
<link xlink:href="https://issues.apache.org/jira/browse/HDFS-2246">HDFS-2246</link>,
it is possible for the DFSClient to take a "short circuit" and
read directly from disk instead of going through the DataNode when the
data is local. What this means for HBase is that the RegionServers can
read directly off their machine's disks instead of having to open a
socket to talk to the DataNode, the former being generally much
faster<footnote><para>See JD's <link xlink:href="http://files.meetup.com/1350427/hug_ebay_jdcryans.pdf">Performance Talk</link></para></footnote>.
Also see <link xlink:href="http://search-hadoop.com/m/zV6dKrLCVh1">HBase, mail # dev - read short circuit</link> thread for
more discussion around short circuit reads.
</para>
<para>To enable "short circuit" reads, you must set two configurations.
First, the hdfs-site.xml needs to be amended. Set
the property <varname>dfs.block.local-path-access.user</varname>
to be the <emphasis>only</emphasis> user that can use the shortcut.
This has to be the user that started HBase. Then in hbase-site.xml,
set <varname>dfs.client.read.shortcircuit</varname> to be <varname>true</varname>
</para>
<para>
For optimal performance when short-circuit reads are enabled, it is recommended that HDFS checksums are disabled.
To maintain data integrity with HDFS checksums disabled, HBase can be configured to write its own checksums into
its datablocks and verify against these. See <xref linkend="hbase.regionserver.checksum.verify" />.
</para>
<para>
The DataNodes need to be restarted in order to pick up the new
configuration. Be aware that if a process started under another
username than the one configured here also has the shortcircuit
enabled, it will get an Exception regarding an unauthorized access but
the data will still be read.
</para>
</section>
<section xml:id="perf.hdfs.comp"><title>Performance Comparisons of HBase vs. HDFS</title>
<para>A fairly common question on the dist-list is why HBase isn't as performant as HDFS files in a batch context (e.g., as
a MapReduce source or sink). The short answer is that HBase is doing a lot more than HDFS (e.g., reading the KeyValues,