diff --git a/src/docbkx/book.xml b/src/docbkx/book.xml
index 62a3514c27e..21348888c35 100644
--- a/src/docbkx/book.xml
+++ b/src/docbkx/book.xml
@@ -1316,7 +1316,7 @@ scan.setFilter(filter);
MasterHMaster is the implementation of the Master Server. The Master server
is responsible for monitoring all RegionServer instances in the cluster, and is
- the interface for all metadata changes.
+ the interface for all metadata changes. In a distributed cluster, the Master typically runs on the NameNode.
Startup BehaviorIf run in a multi-Master environment, all Masters compete to run the cluster. If the active
@@ -1352,7 +1352,8 @@ scan.setFilter(filter);
RegionServer
- HRegionServer is the RegionServer implementation. It is responsible for serving and managing regions.
+ HRegionServer is the RegionServer implementation. It is responsible for serving and managing regions.
+ In a distributed cluster, a RegionServer runs on a DataNode.
InterfaceThe methods exposed by HRegionRegionInterface contain both data-oriented and region-maintenance methods:
@@ -1711,6 +1712,27 @@ scan.setFilter(filter);
+
+ HDFS
+ As HBase runs on HDFS (and each StoreFile is written as a file on HDFS),
+ it is important to have an understanding of the HDFS Architecture,
+ especially in terms of how it stores files, handles failovers, and replicates blocks.
+
+ See the Hadoop documentation on HDFS Architecture
+ for more information.
+
+ NameNode
+ The NameNode is responsible for maintaining the filesystem metadata. See the above HDFS Architecture link
+ for more information.
+
+
+ DataNode
+ The DataNodes are responsible for storing HDFS blocks. See the above HDFS Architecture link
+ for more information.
+
+
+
+
@@ -1889,15 +1911,15 @@ hbase> describe 't1'
- EC2
+ Amazon EC2
- Why doesn't my remote java connection into my ec2 cluster work?
+ I am running HBase on Amazon EC2 and...
- See Andrew's answer here, up on the user list: Remote Java client connection into EC2 instance.
-
+ See the Troubleshooting and Performance sections.
+
diff --git a/src/docbkx/performance.xml b/src/docbkx/performance.xml
index 6c0a0bbd33c..1dfd4db8a36 100644
--- a/src/docbkx/performance.xml
+++ b/src/docbkx/performance.xml
@@ -409,15 +409,35 @@ htable.close();
+
+ HDFS
+ Because HBase runs on HDFS, it is important to understand how it works and how it affects
+ HBase.
+
+ Current Issues With Low-Latency Reads
+ The original use-case for HDFS was batch processing. As such, low-latency reads were historically not a priority.
+ With the increased adoption of HBase this is changing, and several improvements are already in development.
+ See the
+ Umbrella Jira Ticket for HDFS Improvements for HBase.
+
+
+ Performance Comparisons of HBase vs. HDFS
+ A fairly common question on the dist-list is why HBase isn't as performant as HDFS files in a batch context (e.g., as
+ a MapReduce source or sink). The short answer is that HBase is doing a lot more than HDFS (e.g., reading the KeyValues,
+ returning the most current row or specified timestamps, etc.), and as such HBase is 4-5 times slower than HDFS in this
+ processing context. There is room for improvement, and this gap will narrow over time, but HDFS
+ will always be faster in this use-case.
+
+
+ Amazon EC2
- Performance questions are common on Amazon EC2 environments because it is is a shared environment. You will
- not see the same throughput as a dedicated server. In terms of running tests on EC2, run them several times for the same
- reason (i.e., it's a shared environment and you don't know what else is happening on the server).
-
- If you are running on EC2 and post performance questions on the dist-list, please state this fact up-front that
- because EC2 issues are practically a separate class of performance issues.
-
-
+ Performance questions are common on Amazon EC2 environments because it is a shared environment. You will
+ not see the same throughput as a dedicated server. In terms of running tests on EC2, run them several times for the same
+ reason (i.e., it's a shared environment and you don't know what else is happening on the server).
+
+ If you are running on EC2 and post performance questions on the dist-list, please state this fact up-front,
+ because EC2 issues are practically a separate class of performance issues.
+
diff --git a/src/docbkx/troubleshooting.xml b/src/docbkx/troubleshooting.xml
index 1ba03fe0045..fd757d489f2 100644
--- a/src/docbkx/troubleshooting.xml
+++ b/src/docbkx/troubleshooting.xml
@@ -793,6 +793,13 @@ ERROR org.apache.hadoop.hbase.regionserver.HRegionServer: ZooKeeper session expi
Questions on HBase and Amazon EC2 come up frequently on the HBase dist-list. Search for old threads using Search Hadoop
+
+ Remote Java Connection into EC2 Cluster Not Working
+
+ See Andrew's answer here, up on the user list: Remote Java client connection into EC2 instance.
+
+
+