From 31e9c8178086f865611be943b4281cb36aea204c Mon Sep 17 00:00:00 2001 From: Doug Meil Date: Wed, 24 Aug 2011 21:05:38 +0000 Subject: [PATCH] HBASE-4249 - performance.xml (adding network section) git-svn-id: https://svn.apache.org/repos/asf/hbase/trunk@1161273 13f79535-47bb-0310-9956-ffa450edef68 --- src/docbkx/performance.xml | 56 ++++++++++++++++++++++++++++++++++++-- 1 file changed, 54 insertions(+), 2 deletions(-) diff --git a/src/docbkx/performance.xml b/src/docbkx/performance.xml index f002775ad48..1a48858368c 100644 --- a/src/docbkx/performance.xml +++ b/src/docbkx/performance.xml @@ -24,7 +24,59 @@ Watch out for swapping. Set swappiness to 0. - +
+
      Network
      
      Perhaps the most important factor in avoiding network issues degrading Hadoop and HBase performance is the switching hardware
      that is used. Decisions made early in the scope of the project can cause major problems when you double or triple the size of your cluster (or more).
      
      
      Important items to consider:
          
              Switching capacity of the device
              Number of systems connected
              Uplink capacity
          
      
      
+
      Single Switch
      The single most important factor in this configuration is whether the switching capacity of the hardware can handle the
      traffic generated by all of the systems connected to the switch. Some lower-priced commodity hardware offers less switching
      capacity than a fully populated switch can generate, making the switch itself the bottleneck.
      
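      To make the capacity question concrete, here is a minimal Python sketch of the arithmetic, assuming a hypothetical
      48-port gigabit switch with a 64 Gbps backplane (both numbers invented for illustration, not specs of any real device):

          # Back-of-the-envelope check: can the switch backplane keep up with
          # every port running at line rate?  All figures are hypothetical
          # examples, not specs of any particular switch.

          PORT_COUNT = 48          # fully populated 48-port switch
          PORT_SPEED_GBPS = 1      # 1 Gbps per port
          BACKPLANE_GBPS = 64      # advertised switching capacity (assumed)

          # Full duplex: each port can send and receive simultaneously.
          worst_case_gbps = PORT_COUNT * PORT_SPEED_GBPS * 2   # 96 Gbps

          if BACKPLANE_GBPS < worst_case_gbps:
              ratio = worst_case_gbps / BACKPLANE_GBPS
              print(f"Oversubscribed {ratio:.1f}x: {worst_case_gbps} Gbps "
                    f"possible demand vs {BACKPLANE_GBPS} Gbps capacity")

      With these example numbers the switch is oversubscribed 1.5x even though "64 Gbps" sounds generous on paper.
      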
+
+
      Multiple Switches
      Multiple switches are a potential pitfall in the architecture. The most common configuration of lower-priced hardware is a
      simple 1Gbps uplink from one switch to another. This often-overlooked pinch point can easily become a bottleneck for cluster
      communication: MapReduce jobs that both read and write a lot of data can easily saturate a single uplink.
      
      Mitigation of this issue is fairly simple and can be accomplished in multiple ways (see the sketch after this list):
      
          Use appropriate hardware for the scale of the cluster you are attempting to build.
          Use larger single-switch configurations, e.g., a single 48-port switch as opposed to two 24-port switches.
          Configure port trunking for uplinks so that multiple interfaces are bonded together, increasing cross-switch bandwidth.
      
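      As a rough illustration of why trunking helps, a short Python sketch of the oversubscription math; the node counts
      and link speeds below are assumed example values, not recommendations:

          # Cross-switch oversubscription for two switches joined by an uplink.
          # Node counts and link speeds are assumed example values.

          NODES_PER_SWITCH = 20    # hosts on each switch
          NIC_GBPS = 1             # 1 Gbps NIC per host
          TRUNK_LINKS = 4          # interfaces bonded into the uplink trunk

          demand_gbps = NODES_PER_SWITCH * NIC_GBPS    # worst case: 20 Gbps
          for links in (1, TRUNK_LINKS):
              uplink_gbps = links * NIC_GBPS
              print(f"{links}-link uplink: {demand_gbps / uplink_gbps:.0f}:1 "
                    f"oversubscription")
          # A single 1 Gbps uplink is 20:1 in the worst case; bonding four
          # links cuts it to 5:1.

      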
+
+
      Multiple Racks
      Multiple rack configurations carry the same potential issues as multiple switches, and can suffer performance degradation from two main areas:
      
          Poor switch capacity performance
          Insufficient uplink to another rack
      
      If the switches in your rack have appropriate switching capacity to handle all the hosts at full speed, the next most likely issue is the uplink
      carrying traffic between racks. The easiest way to avoid issues when spanning multiple racks is to use port trunking to create a bonded uplink to
      other racks. The downside of this method, however, is the ports it consumes that could otherwise connect machines. For example, creating an 8Gbps
      port channel from rack A to rack B uses 8 of your 24 ports just for inter-rack communication, a poor return on those ports; using too few, on the
      other hand, can mean you're not getting the most out of your cluster.
      
      Using 10GbE links between racks will greatly increase performance, and if your switches support a 10GbE uplink or allow for an expansion card,
      you can save your ports for machines rather than uplinks; the sketch below works through this port-budget trade-off.
      
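      A minimal Python sketch of that trade-off, reusing the 24-port and 8Gbps figures from the paragraphs above; the 10GbE option
      assumes, purely for illustration, that the switch can take an uplink module:

          # Port cost of the rack-to-rack uplink options described above,
          # using the text's example of 24-port 1 Gbps switches.

          TOTAL_PORTS = 24

          def describe_uplink(ports_used, uplink_gbps):
              host_ports = TOTAL_PORTS - ports_used
              print(f"{uplink_gbps} Gbps uplink via {ports_used} port(s): "
                    f"{host_ports} ports left for hosts")

          describe_uplink(8, 8)    # 8x 1 Gbps port channel: a third of the switch
          describe_uplink(1, 10)   # one 10GbE uplink (assumed expansion card)

      The 10GbE option buys more inter-rack bandwidth while returning seven ports to hosts, which is the ROI point the paragraph above is making.
      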
+
Java @@ -56,7 +108,7 @@
- Configurations + HBase Configurations See .