HBASE-7217 Documentation: Update section 11.5.1 to recommend that hbase.regionserver.checksum.verify is set
git-svn-id: https://svn.apache.org/repos/asf/hbase/trunk@1413793 13f79535-47bb-0310-9956-ffa450edef68
parent 7233553030
commit 6351fbaa34
@@ -208,38 +208,10 @@
|
|||
</section>
|
||||
|
||||
</section>
|
||||
<section xml:id="perf.hdfs.configs">
|
||||
<title>HDFS Configuration</title>
|
||||
<section xml:id="perf.hdfs.configs.localread">
|
||||
<title>Leveraging local data</title>
|
||||
<para>Since Hadoop 1.0.0 (also 0.22.1, 0.23.1, CDH3u3 and HDP 1.0) via
|
||||
<link xlink:href="https://issues.apache.org/jira/browse/HDFS-2246">HDFS-2246</link>,
|
||||
it is possible for the DFSClient to take a "short circuit" and
|
||||
read directly from disk instead of going through the DataNode when the
|
||||
data is local. What this means for HBase is that the RegionServers can
|
||||
read directly off their machine's disks instead of having to open a
|
||||
socket to talk to the DataNode, the former being generally much
|
||||
faster<footnote><para>See JD's <link xlink:href="http://files.meetup.com/1350427/hug_ebay_jdcryans.pdf">Performance Talk</link></para></footnote>.
|
||||
Also see <link xlink:href="http://search-hadoop.com/m/zV6dKrLCVh1">HBase, mail # dev - read short circuit</link> thread for
|
||||
more discussion around short circuit reads.
|
||||
</para>
|
||||
<para>To enable "short circuit" reads, you must set two configurations.
|
||||
First, the hdfs-site.xml needs to be amended. Set
|
||||
the property <varname>dfs.block.local-path-access.user</varname>
|
||||
to be the <emphasis>only</emphasis> user that can use the shortcut.
|
||||
This has to be the user that started HBase. Then in hbase-site.xml,
|
||||
set <varname>dfs.client.read.shortcircuit</varname> to be <varname>true</varname>
|
||||
</para>
|
||||
<para>
|
||||
The DataNodes need to be restarted in order to pick up the new
|
||||
configuration. Be aware that if a process started under another
|
||||
username than the one configured here also has the shortcircuit
|
||||
enabled, it will get an Exception regarding an unauthorized access but
|
||||
the data will still be read.
|
||||
</para>
|
||||
</section>
|
||||
|
||||
</section>
|
||||
|
||||
|
||||
|
||||
<section xml:id="perf.zookeeper">
|
||||
<title>ZooKeeper</title>
|
||||
<para>See <xref linkend="zookeeper"/> for information on configuring ZooKeeper, and see the part
|
||||
|
@@ -658,6 +630,39 @@ htable.close();</programlisting></para>
|
|||
<link xlink:href="https://issues.apache.org/jira/browse/HDFS-1599">Umbrella Jira Ticket for HDFS Improvements for HBase</link>.
|
||||
</para>
|
||||
</section>
|
||||
<section xml:id="perf.hdfs.configs.localread">
|
||||
<title>Leveraging local data</title>
|
||||
<para>Since Hadoop 1.0.0 (also 0.22.1, 0.23.1, CDH3u3 and HDP 1.0) via
|
||||
<link xlink:href="https://issues.apache.org/jira/browse/HDFS-2246">HDFS-2246</link>,
|
||||
it is possible for the DFSClient to take a "short circuit" and
|
||||
read directly from disk instead of going through the DataNode when the
|
||||
data is local. What this means for HBase is that the RegionServers can
|
||||
read directly off their machine's disks instead of having to open a
|
||||
socket to talk to the DataNode, the former being generally much
|
||||
faster<footnote><para>See JD's <link xlink:href="http://files.meetup.com/1350427/hug_ebay_jdcryans.pdf">Performance Talk</link></para></footnote>.
|
||||
Also see <link xlink:href="http://search-hadoop.com/m/zV6dKrLCVh1">HBase, mail # dev - read short circuit</link> thread for
|
||||
more discussion around short circuit reads.
|
||||
</para>
|
||||
<para>To enable "short circuit" reads, you must set two configurations.
|
||||
First, the hdfs-site.xml needs to be amended. Set
|
||||
the property <varname>dfs.block.local-path-access.user</varname>
|
||||
to be the <emphasis>only</emphasis> user that can use the shortcut.
|
||||
This has to be the user that started HBase. Then in hbase-site.xml,
|
||||
set <varname>dfs.client.read.shortcircuit</varname> to <varname>true</varname>.
|
||||
</para>
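<para>As a rough sketch of the two settings described above (the user name
<literal>hbase</literal> is an assumption for illustration only; substitute the
account that actually starts HBase on your cluster):</para>
<programlisting><![CDATA[
<!-- hdfs-site.xml: the single user allowed to use short-circuit reads.
     "hbase" is a placeholder, not a required value. -->
<property>
  <name>dfs.block.local-path-access.user</name>
  <value>hbase</value>
</property>

<!-- hbase-site.xml: let the DFSClient inside the RegionServer read local blocks directly. -->
<property>
  <name>dfs.client.read.shortcircuit</name>
  <value>true</value>
</property>
]]></programlisting>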
|
||||
<para>
|
||||
For optimal performance when short-circuit reads are enabled, it is recommended that HDFS checksums be disabled.
|
||||
To maintain data integrity with HDFS checksums disabled, HBase can be configured to write its own checksums into
|
||||
its data blocks and verify reads against them. See <xref linkend="hbase.regionserver.checksum.verify" />.
|
||||
</para>
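<para>A minimal sketch of the HBase side of this recommendation, assuming the property
is set in hbase-site.xml on each RegionServer. How HDFS-level checksum verification is
turned off depends on the Hadoop version in use and is not shown here:</para>
<programlisting><![CDATA[
<!-- hbase-site.xml: have HBase verify the checksums it stores in its own data blocks. -->
<property>
  <name>hbase.regionserver.checksum.verify</name>
  <value>true</value>
</property>
]]></programlisting>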
|
||||
<para>
|
||||
The DataNodes need to be restarted in order to pick up the new
|
||||
configuration. Be aware that if a process started under another
|
||||
username than the one configured here also has short-circuit reads
|
||||
enabled, it will get an Exception about unauthorized access, but
|
||||
the data will still be read.
|
||||
</para>
|
||||
</section>
|
||||
<section xml:id="perf.hdfs.comp"><title>Performance Comparisons of HBase vs. HDFS</title>
|
||||
<para>A fairly common question on the dist-list is why HBase isn't as performant as HDFS files in a batch context (e.g., as
|
||||
a MapReduce source or sink). The short answer is that HBase is doing a lot more than HDFS (e.g., reading the KeyValues,
|
||||