HBASE-7217 Documentation: Update section 11.5.1 to recommend that hbase.regionserver.checksum.verify is set
git-svn-id: https://svn.apache.org/repos/asf/hbase/trunk@1413793 13f79535-47bb-0310-9956-ffa450edef68
parent 7233553030
commit 6351fbaa34
@@ -208,38 +208,10 @@
 </section>

 </section>
-<section xml:id="perf.hdfs.configs">
-<title>HDFS Configuration</title>
-<section xml:id="perf.hdfs.configs.localread">
-<title>Leveraging local data</title>
-<para>Since Hadoop 1.0.0 (also 0.22.1, 0.23.1, CDH3u3 and HDP 1.0) via
-<link xlink:href="https://issues.apache.org/jira/browse/HDFS-2246">HDFS-2246</link>,
-it is possible for the DFSClient to take a "short circuit" and
-read directly from disk instead of going through the DataNode when the
-data is local. What this means for HBase is that the RegionServers can
-read directly off their machine's disks instead of having to open a
-socket to talk to the DataNode, the former being generally much
-faster<footnote><para>See JD's <link xlink:href="http://files.meetup.com/1350427/hug_ebay_jdcryans.pdf">Performance Talk</link></para></footnote>.
-Also see <link xlink:href="http://search-hadoop.com/m/zV6dKrLCVh1">HBase, mail # dev - read short circuit</link> thread for
-more discussion around short circuit reads.
-</para>
-<para>To enable "short circuit" reads, you must set two configurations.
-First, the hdfs-site.xml needs to be amended. Set
-the property <varname>dfs.block.local-path-access.user</varname>
-to be the <emphasis>only</emphasis> user that can use the shortcut.
-This has to be the user that started HBase. Then in hbase-site.xml,
-set <varname>dfs.client.read.shortcircuit</varname> to be <varname>true</varname>
-</para>
-<para>
-The DataNodes need to be restarted in order to pick up the new
-configuration. Be aware that if a process started under another
-username than the one configured here also has the shortcircuit
-enabled, it will get an Exception regarding an unauthorized access but
-the data will still be read.
-</para>
-</section>

-</section>


 <section xml:id="perf.zookeeper">
 <title>ZooKeeper</title>
 <para>See <xref linkend="zookeeper"/> for information on configuring ZooKeeper, and see the part
@@ -657,6 +629,39 @@ htable.close();</programlisting></para>
 See the
 <link xlink:href="https://issues.apache.org/jira/browse/HDFS-1599">Umbrella Jira Ticket for HDFS Improvements for HBase</link>.
 </para>
+</section>
+<section xml:id="perf.hdfs.configs.localread">
+<title>Leveraging local data</title>
+<para>Since Hadoop 1.0.0 (also 0.22.1, 0.23.1, CDH3u3 and HDP 1.0) via
+<link xlink:href="https://issues.apache.org/jira/browse/HDFS-2246">HDFS-2246</link>,
+it is possible for the DFSClient to take a "short circuit" and
+read directly from disk instead of going through the DataNode when the
+data is local. What this means for HBase is that the RegionServers can
+read directly off their machine's disks instead of having to open a
+socket to talk to the DataNode, the former being generally much
+faster<footnote><para>See JD's <link xlink:href="http://files.meetup.com/1350427/hug_ebay_jdcryans.pdf">Performance Talk</link></para></footnote>.
+Also see <link xlink:href="http://search-hadoop.com/m/zV6dKrLCVh1">HBase, mail # dev - read short circuit</link> thread for
+more discussion around short circuit reads.
+</para>
+<para>To enable "short circuit" reads, you must set two configurations.
+First, the hdfs-site.xml needs to be amended. Set
+the property <varname>dfs.block.local-path-access.user</varname>
+to be the <emphasis>only</emphasis> user that can use the shortcut.
+This has to be the user that started HBase. Then in hbase-site.xml,
+set <varname>dfs.client.read.shortcircuit</varname> to be <varname>true</varname>
+</para>
+<para>
+For optimal performance when short-circuit reads are enabled, it is recommended that HDFS checksums are disabled.
+To maintain data integrity with HDFS checksums disabled, HBase can be configured to write its own checksums into
+its datablocks and verify against these. See <xref linkend="hbase.regionserver.checksum.verify" />.
+</para>
+<para>
+The DataNodes need to be restarted in order to pick up the new
+configuration. Be aware that if a process started under another
+username than the one configured here also has the shortcircuit
+enabled, it will get an Exception regarding an unauthorized access but
+the data will still be read.
+</para>
 </section>
 <section xml:id="perf.hdfs.comp"><title>Performance Comparisons of HBase vs. HDFS</title>
 <para>A fairly common question on the dist-list is why HBase isn't as performant as HDFS files in a batch context (e.g., as
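
For reviewers, the settings described in the relocated "Leveraging local data" section might look like the fragments below. This is a minimal sketch and not part of the commit: the user name `hbase` is an assumed placeholder for whichever user starts HBase, and the value `true` for hbase.regionserver.checksum.verify is inferred from the commit title rather than stated in the diff. The diff does not name the property that disables HDFS checksums, so it is left out here.

```xml
<!-- hdfs-site.xml fragment (sketch). "hbase" is an assumed placeholder:
     use the user that actually starts HBase. -->
<configuration>
  <property>
    <name>dfs.block.local-path-access.user</name>
    <value>hbase</value>
  </property>
</configuration>

<!-- hbase-site.xml fragment (sketch), kept in its own file. -->
<configuration>
  <property>
    <name>dfs.client.read.shortcircuit</name>
    <value>true</value>
  </property>
  <!-- Assumed value: the commit title recommends setting this property so
       that HBase verifies its own checksums. -->
  <property>
    <name>hbase.regionserver.checksum.verify</name>
    <value>true</value>
  </property>
</configuration>
```

As the documentation text in the diff notes, the DataNodes need to be restarted to pick up the hdfs-site.xml change.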