HBASE-4249 - performance.xml (adding network section)

git-svn-id: https://svn.apache.org/repos/asf/hbase/trunk@1161273 13f79535-47bb-0310-9956-ffa450edef68
This commit is contained in:
Doug Meil 2011-08-24 21:05:38 +00:00
parent 0c02bd374a
commit 31e9c81780
1 changed file with 54 additions and 2 deletions


@@ -24,7 +24,59 @@
<para>Watch out for swapping. Set swappiness to 0.</para>
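<para>For example, on most Linux distributions the setting can be changed at runtime with <command>sysctl</command> and persisted in <filename>/etc/sysctl.conf</filename>. The commands below are a sketch; check your distribution's documentation for the exact procedure.
</para>
<programlisting>
# Set swappiness to 0 for the running kernel (takes effect immediately).
sysctl -w vm.swappiness=0

# Persist the setting across reboots.
echo "vm.swappiness = 0" >> /etc/sysctl.conf
</programlisting>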
</section>
</section>
<section xml:id="perf.network">
<title>Network</title>
<para>
Perhaps the most important factor in avoiding network issues degrading Hadoop and HBase performance is the switching hardware
that is used. Decisions made early in the scope of the project can cause major problems when you double or triple the size of your cluster (or more).
</para>
<para>
Important items to consider:
<itemizedlist>
<listitem>Switching capacity of the device</listitem>
<listitem>Number of systems connected</listitem>
<listitem>Uplink capacity</listitem>
</itemizedlist>
</para>
<section xml:id="perf.network.1switch">
<title>Single Switch</title>
<para>The single most important factor in this configuration is that the switching capacity of the device is capable of
handling the traffic generated by all the systems connected to the switch. Some lower-priced commodity hardware
has a switching capacity lower than the aggregate bandwidth of its ports, and so cannot sustain full speed on every port at once.
</para>
</para>
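<para>As a purely illustrative example: a 24-port switch with a node attached to every port at 1Gbps can be asked to carry up to 24Gbps of traffic; a device whose internal switching capacity is rated below that figure cannot sustain full speed on all ports at once and will delay or drop traffic under load.
</para>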
</section>
<section xml:id="perf.network.2switch">
<title>Multiple Switches</title>
<para>Multiple switches are a potential pitfall in the architecture. The most common configuration of lower-priced hardware is a
simple 1Gbps uplink from one switch to another. This often-overlooked pinch point can easily become a bottleneck for cluster communication.
Especially with MapReduce jobs that both read and write a lot of data, the communication across this uplink can become saturated.
</para>
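<para>As an illustrative example (the numbers are hypothetical): two 24-port switches joined by a single 1Gbps uplink, with roughly 20 nodes attached to each, funnel the traffic of 20 nodes through that one link whenever data moves between the two halves of the cluster, an oversubscription of roughly 20:1 on the uplink.
</para>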
<para>Mitigation of this issue is fairly simple and can be accomplished in multiple ways:
<itemizedlist>
<listitem>Use appropriate hardware for the scale of the cluster that you're attempting to build.</listitem>
<listitem>Use larger single-switch configurations, e.g., a single 48-port switch as opposed to two 24-port switches.</listitem>
<listitem>Configure port trunking for uplinks so that multiple interfaces are used, increasing cross-switch bandwidth (see the example following this list).</listitem>
</itemizedlist>
</para>
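<para>Continuing the illustrative example above, trunking four 1Gbps ports on each switch into a single bonded 4Gbps uplink raises the cross-switch bandwidth from 1Gbps to 4Gbps and reduces the oversubscription from roughly 20:1 to roughly 5:1, at the cost of four host ports per switch.
</para>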
</section>
<section xml:id="perf.network.multirack">
<title>Multiple Racks</title>
<para>Multiple rack configurations carry the same potential issues as multiple switches, and can suffer performance degradation from two main areas:
<itemizedlist>
<listitem>Poor switch capacity performance</listitem>
<listitem>Insufficient uplink to another rack</listitem>
</itemizedlist>
If the switches in your rack have appropriate switching capacity to handle all the hosts at full speed, the next most likely issue will be caused by homing
more of your cluster across racks. The easiest way to avoid issues when spanning multiple racks is to use port trunking to create a bonded uplink to other racks.
The downside of this method, however, is the overhead of ports that could otherwise be used for hosts. For example, creating an 8Gbps port channel from rack
A to rack B uses 8 of your 24 ports to communicate between racks and gives you a poor ROI; using too few ports, however, can mean you're not getting the most out of your cluster.
</para>
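<para>To make the trade-off concrete with the same illustrative numbers: building an 8Gbps port channel from eight 1Gbps ports on a 24-port top-of-rack switch leaves 16 ports for hosts; 16 hosts at 1Gbps sharing an 8Gbps uplink is only a 2:1 oversubscription, but a third of the switch's ports are consumed by the uplink.
</para>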
<para>Using 10GbE links between racks will greatly increase performance, and assuming your switches support a 10GbE uplink or allow for an expansion card, it will also let you
save your ports for machines as opposed to uplinks.
</para>
</section>
</section> <!-- network -->
<section xml:id="jvm">
<title>Java</title>
@@ -56,7 +108,7 @@
</section>
<section xml:id="perf.configurations">
<title>Configurations</title>
<title>HBase Configurations</title>
<para>See <xref linkend="recommended_configurations" />.</para>