HBASE-4598 book update (book.xml, perf.xml, trouble.xml)
git-svn-id: https://svn.apache.org/repos/asf/hbase/trunk@1184830 13f79535-47bb-0310-9956-ffa450edef68
parent 8adbd7f62e
commit 6f65994f51
@@ -1316,7 +1316,7 @@ scan.setFilter(filter);
<section xml:id="master"><title>Master</title>
<para><code>HMaster</code> is the implementation of the Master Server. The Master server
is responsible for monitoring all RegionServer instances in the cluster, and is
the interface for all metadata changes.
the interface for all metadata changes. In a distributed cluster, the Master typically runs on the <xref linkend="arch.hdfs.nn" />.
</para>
<section xml:id="master.startup"><title>Startup Behavior</title>
<para>If run in a multi-Master environment, all Masters compete to run the cluster. If the active
@@ -1353,6 +1353,7 @@ scan.setFilter(filter);
</section>
<section xml:id="regionserver.arch"><title>RegionServer</title>
<para><code>HRegionServer</code> is the RegionServer implementation. It is responsible for serving and managing regions.
In a distributed cluster, a RegionServer runs on a <xref linkend="arch.hdfs.dn" />.
</para>
<section xml:id="regionserver.arch.api"><title>Interface</title>
<para>The methods exposed by <code>HRegionInterface</code> contain both data-oriented and region-maintenance methods:
@@ -1711,6 +1712,27 @@ scan.setFilter(filter);
</section> <!-- bloom -->

</section>

<section xml:id="arch.hdfs"><title>HDFS</title>
<para>As HBase runs on HDFS (and each StoreFile is written as a file on HDFS),
it is important to have an understanding of the HDFS Architecture,
especially in terms of how it stores files, handles failovers, and replicates blocks.
</para>
<para>See the Hadoop documentation on <link xlink:href="http://hadoop.apache.org/common/docs/current/hdfs_design.html">HDFS Architecture</link>
for more information.
</para>
<section xml:id="arch.hdfs.nn"><title>NameNode</title>
<para>The NameNode is responsible for maintaining the filesystem metadata. See the above HDFS Architecture link
for more information.
</para>
</section>
<section xml:id="arch.hdfs.dn"><title>DataNode</title>
<para>The DataNodes are responsible for storing HDFS blocks. See the above HDFS Architecture link
for more information.
</para>
</section>
</section>

</chapter> <!-- architecture -->

<xi:include xmlns:xi="http://www.w3.org/2001/XInclude" href="external_apis.xml" />
@@ -1889,15 +1911,15 @@ hbase> describe 't1'</programlisting>
</answer>
</qandaentry>
</qandadiv>
<qandadiv xml:id="ec2"><title>EC2</title>
<qandadiv xml:id="ec2"><title>Amazon EC2</title>
<qandaentry>
<question><para>
Why doesn't my remote java connection into my ec2 cluster work?
I am running HBase on Amazon EC2 and...
</para></question>
<answer>
<para>
See Andrew's answer here, up on the user list: <link xlink:href="http://search-hadoop.com/m/sPdqNFAwyg2">Remote Java client connection into EC2 instance</link>.
</para>
See the Troubleshooting <xref linkend="trouble.ec2" /> and Performance <xref linkend="perf.ec2" /> sections.
</para>
</answer>
</qandaentry>
</qandadiv>

@@ -410,14 +410,34 @@ htable.close();</programlisting></para>
</section>
</section> <!-- deleting -->

<section xml:id="perf.ec2"><title>Amazon EC2</title>
<para>Performance questions are common on Amazon EC2 environments because it is a shared environment. You will
not see the same throughput as a dedicated server. In terms of running tests on EC2, run them several times for the same
reason (i.e., it's a shared environment and you don't know what else is happening on the server).
</para>
<para>If you are running on EC2 and post performance questions on the dist-list, please state this fact up front,
because EC2 issues are practically a separate class of performance issues.
<section xml:id="perf.hdfs"><title>HDFS</title>
<para>Because HBase runs on <xref linkend="arch.hdfs" />, it is important to understand how it works and how it affects
HBase.
</para>
<section xml:id="perf.hdfs.curr"><title>Current Issues With Low-Latency Reads</title>
<para>The original use-case for HDFS was batch processing. As such, low-latency reads were historically not a priority.
With the increased adoption of HBase this is changing, and several improvements are already in development.
See the
<link xlink:href="https://issues.apache.org/jira/browse/HDFS-1599">Umbrella Jira Ticket for HDFS Improvements for HBase</link>.
</para>
</section>
<section xml:id="perf.hdfs.comp"><title>Performance Comparisons of HBase vs. HDFS</title>
<para>A fairly common question on the dist-list is why HBase isn't as performant as HDFS files in a batch context (e.g., as
a MapReduce source or sink). The short answer is that HBase is doing a lot more than HDFS (e.g., reading the KeyValues,
returning the most current row or specified timestamps, etc.), and as such HBase is 4-5 times slower than HDFS in this
processing context. Not that there isn't room for improvement (and this gap will, over time, be reduced), but HDFS
will always be faster in this use-case.
</para>
</section>
</section>

</para>
<section xml:id="perf.ec2"><title>Amazon EC2</title>
<para>Performance questions are common on Amazon EC2 environments because it is a shared environment. You will
not see the same throughput as a dedicated server. In terms of running tests on EC2, run them several times for the same
reason (i.e., it's a shared environment and you don't know what else is happening on the server).
</para>
<para>If you are running on EC2 and post performance questions on the dist-list, please state this fact up front,
because EC2 issues are practically a separate class of performance issues.
</para>
</section>
</chapter>

@@ -793,6 +793,13 @@ ERROR org.apache.hadoop.hbase.regionserver.HRegionServer: ZooKeeper session expi
<para>Questions on HBase and Amazon EC2 come up frequently on the HBase dist-list. Search for old threads using <link xlink:href="http://search-hadoop.com/">Search Hadoop</link>.
</para>
</section>
<section xml:id="trouble.ec2.connection">
<title>Remote Java Connection into EC2 Cluster Not Working</title>
<para>
See Andrew's answer here, up on the user list: <link xlink:href="http://search-hadoop.com/m/sPdqNFAwyg2">Remote Java client connection into EC2 instance</link>.
</para>
</section>

</section>

</chapter>