diff --git a/src/main/docbkx/performance.xml b/src/main/docbkx/performance.xml
index ca83609165e..1f3222c54c6 100644
--- a/src/main/docbkx/performance.xml
+++ b/src/main/docbkx/performance.xml
@@ -202,8 +202,8 @@
hbase.regionserver.checksum.verify: Have HBase write the checksum into the datablock and save
- having to do the checksum seek whenever you read.
-
+ having to do the checksum seek whenever you read.
+
See ,
and
For more information see the
@@ -313,7 +313,7 @@ Result r = htable.get(get);
byte[] b = r.getValue(CF, ATTR); // returns current version of value
-
+
@@ -332,11 +332,11 @@ byte[] b = r.getValue(CF, ATTR); // returns current version of value
Table Creation: Pre-Creating Regions
-Tables in HBase are initially created with one region by default. For bulk imports, this means that all clients will write to the same region
-until it is large enough to split and become distributed across the cluster. A useful pattern to speed up the bulk import process is to pre-create empty regions.
- Be somewhat conservative in this, because too-many regions can actually degrade performance.
+Tables in HBase are initially created with one region by default. For bulk imports, this means that all clients will write to the same region
+until it is large enough to split and become distributed across the cluster. A useful pattern to speed up the bulk import process is to pre-create empty regions.
+ Be somewhat conservative in this, because too many regions can actually degrade performance.
- There are two different approaches to pre-creating splits. The first approach is to rely on the default HBaseAdmin strategy
+ There are two different approaches to pre-creating splits. The first approach is to rely on the default HBaseAdmin strategy
(which is implemented in Bytes.split)...
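The interpolation that Bytes.split performs can be sketched in plain Java. This is a hypothetical, simplified helper (single-byte keys only; the real Bytes.split interpolates arbitrary byte arrays), meant only to illustrate how evenly spaced split points yield pre-created regions:

```java
import java.util.Arrays;

// Sketch of evenly spaced split keys, in the spirit of HBase's
// Bytes.split (hypothetical simplification: one-byte keys only).
public class SplitSketch {
    // Returns numRegions - 1 split points strictly between lo and hi.
    static byte[][] split(int lo, int hi, int numRegions) {
        byte[][] splits = new byte[numRegions - 1][];
        int range = hi - lo;
        for (int i = 1; i < numRegions; i++) {
            splits[i - 1] = new byte[] { (byte) (lo + (range * i) / numRegions) };
        }
        return splits;
    }

    public static void main(String[] args) {
        // Pre-creating 4 regions over the key space [0, 100)
        byte[][] splits = split(0, 100, 4);
        System.out.println(Arrays.deepToString(splits)); // [[25], [50], [75]]
    }
}
```

In a real deployment these split keys would be handed to the table-creation API so each client writes to its own region from the start.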
@@ -664,7 +664,19 @@ faster. See JD's HBase, mail # dev - read short circuit thread for
more discussion around short circuit reads.
-To enable "short circuit" reads, you must set two configurations.
+How you enable "short circuit" reads depends on your version of Hadoop.
+ The original shortcircuit read patch was much improved upon in Hadoop 2 in
+ HDFS-347.
+ See for details
+ on the difference between the old and new implementations. See
+ Hadoop shortcircuit reads configuration page
+ for how to enable the latter, newer version of shortcircuit.
+
+If you are running on an old Hadoop, one that is without
+ HDFS-347 but that
+ has
+HDFS-2246,
+you must set two configurations.
First, the hdfs-site.xml needs to be amended. Set
the property dfs.block.local-path-access.user
to be the only user that can use the shortcut.
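As a sketch, the hdfs-site.xml addition might look like the following (the username hbase is an assumption; use whatever user your RegionServers actually run as):

```
<property>
  <name>dfs.block.local-path-access.user</name>
  <!-- assumed username; set to the user running HBase -->
  <value>hbase</value>
</property>
```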
@@ -686,7 +698,15 @@ username than the one configured here also has the shortcircuit
enabled, it will get an Exception regarding an unauthorized access but
the data will still be read.
+
+ dfs.client.read.shortcircuit.buffer.size
+ The default for this value is too high when running on a highly trafficked HBase cluster. Reduce it from its
+ 1MB default to 128KB or so. Put this configuration in the HBase configs (it is an HDFS client-side configuration).
+ The Hadoop DFSClient in HBase will allocate a direct byte buffer of this size for each
+ block it has open; since HBase keeps its HDFS files open all the time, this can add up quickly.
+
+
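A sketch of the corresponding entry for the HBase configs (128KB = 131072 bytes):

```
<property>
  <name>dfs.client.read.shortcircuit.buffer.size</name>
  <!-- reduced from the 1MB default -->
  <value>131072</value>
</property>
```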
Performance Comparisons of HBase vs. HDFS
A fairly common question on the dist-list is why HBase isn't as performant as HDFS files in a batch context (e.g., as
a MapReduce source or sink). The short answer is that HBase is doing a lot more than HDFS (e.g., reading the KeyValues,