diff --git a/src/docbkx/configuration.xml b/src/docbkx/configuration.xml
index 8d8124952d5..e898e1d5489 100644
--- a/src/docbkx/configuration.xml
+++ b/src/docbkx/configuration.xml
@@ -30,10 +30,10 @@
This chapter is the Not-So-Quick start guide to HBase configuration. It goes
over system requirements, Hadoop setup, the different HBase run modes, and the
various configurations in HBase. Please read this chapter carefully. At a minimum
- ensure that all have
+ ensure that all requirements have
been satisfied. Failure to do so will cause you (and us) grief debugging strange errors
and/or data loss.
-
+
HBase uses the same configuration system as Hadoop.
To configure a deploy, edit a file of environment variables
@@ -57,7 +57,7 @@ to ensure well-formedness of your document after an edit session.
content of the conf directory to
all nodes of the cluster. HBase will not do this for you.
Use rsync.
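For example, a minimal sketch (the hostname and install path are placeholders for your own nodes):
$ rsync -az conf/ rs1.example.org:/usr/local/hbase/conf/
Repeat, or loop, for every node of the cluster.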
-
+
Basic Prerequisites
This section lists required services and some required system configuration.
@@ -69,7 +69,7 @@ to ensure well-formedness of your document after an edit session.
xlink:href="http://www.java.com/download/">Oracle.
- Operating System
+ Operating System
ssh
@@ -151,9 +151,9 @@ to ensure well-formedness of your document after an edit session.
2010-04-06 03:04:37,542 INFO org.apache.hadoop.hdfs.DFSClient: Abandoning block blk_-6935524980745310745_1391901
Do yourself a favor and change the upper bound on the
number of file descriptors. Set it to north of 10k. The math runs roughly as follows: per ColumnFamily
- there is at least one StoreFile and possibly up to 5 or 6 if the region is under load. Multiply the
+ there is at least one StoreFile and possibly up to 5 or 6 if the region is under load. Multiply the
average number of StoreFiles per ColumnFamily times the number of regions per RegionServer. For example, assuming
- that a schema had 3 ColumnFamilies per region with an average of 3 StoreFiles per ColumnFamily,
+ that a schema had 3 ColumnFamilies per region with an average of 3 StoreFiles per ColumnFamily,
and there are 100 regions per RegionServer, the JVM will open 3 * 3 * 100 = 900 file descriptors
(not counting open jar files, config files, etc.)
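As a sketch for Linux systems using PAM, assuming the HBase daemons run as the hadoop user, the bound can be raised in /etc/security/limits.conf:
# /etc/security/limits.conf -- "hadoop" is an assumption for the daemon user
hadoop  -  nofile  32768
On Ubuntu, /etc/pam.d/common-session must also load the limits module or the setting is ignored:
session required pam_limits.so
Confirm with ulimit -n in a fresh login shell.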
@@ -216,13 +216,13 @@ to ensure well-formedness of your document after an edit session.
xlink:href="http://cygwin.com/">Cygwin to have a *nix-like
environment for the shell scripts. The full details are explained in
the Windows
- Installation guide. Also
+ Installation guide. Also
search our user mailing list to pick
up the latest fixes figured out by Windows users.
-
+
Hadoop
@@ -289,7 +289,7 @@ to ensure well-formedness of your document after an edit session.
Cloudera or
MapR distributions.
Cloudera's CDH3
- is Apache Hadoop 0.20.x plus patches including all of the
+ is Apache Hadoop 0.20.x plus patches including all of the
branch-0.20-append
additions needed to add a durable sync. Use the most recent released version of CDH3. In CDH, append
support is enabled by default, so you do not need to make the above-mentioned edits to
@@ -311,6 +311,16 @@ to ensure well-formedness of your document after an edit session.
replace the jar in HBase everywhere on your cluster. Hadoop version
mismatch issues have various manifestations, but often everything simply
looks like it has hung up.
+ Packaging and Apache Bigtop
+ Apache Bigtop
+ is an umbrella for packaging and tests of the Apache Hadoop
+ ecosystem, including Apache HBase. Bigtop performs testing at various
+ levels (packaging, platform, runtime, upgrade, etc.), developed by a
+ community, with a focus on the system as a whole rather than on individual
+ projects. We recommend installing Apache HBase packages as provided by a
+ Bigtop release rather than rolling your own piecemeal integration of
+ various component releases.
+ HBase on Secure Hadoop
@@ -320,7 +330,7 @@ to ensure well-formedness of your document after an edit session.
with the secure version. If you want to read more about how to set up
Secure HBase, see .
-
+
dfs.datanode.max.xcievers
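HBase routinely exceeds the HDFS default for this bound. A sketch of raising it in conf/hdfs-site.xml on each DataNode follows; 4096 is a commonly cited starting value, an assumption to tune rather than a prescription:
<property>
  <name>dfs.datanode.max.xcievers</name>
  <!-- upper bound on the files a DataNode will serve at any one time -->
  <value>4096</value>
</property>
Restart HDFS after making the change.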
@@ -354,7 +364,7 @@ to ensure well-formedness of your document after an edit session.
See also
-
+
@@ -418,7 +428,7 @@ to ensure well-formedness of your document after an edit session.
HBase. Do not use this configuration for production or for
evaluating HBase performance.
- First, setup your HDFS in pseudo-distributed mode.
+ First, set up your HDFS in pseudo-distributed mode.
Next, configure HBase. Below is an example conf/hbase-site.xml.
This is the file into
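A minimal sketch of such a file, assuming a pseudo-distributed HDFS NameNode at localhost:8020 (match the port and the /hbase path to your own HDFS):
<configuration>
  <property>
    <name>hbase.rootdir</name>
    <!-- where HBase persists its data; must point at your running HDFS -->
    <value>hdfs://localhost:8020/hbase</value>
  </property>
  <property>
    <name>dfs.replication</name>
    <!-- a single-node HDFS can hold only one replica -->
    <value>1</value>
  </property>
</configuration>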
@@ -501,10 +511,10 @@ to ensure well-formedness of your document after an edit session.
-
+
-
+
Fully-distributed
@@ -600,7 +610,7 @@ to ensure well-formedness of your document after an edit session.
Running and Confirming Your Installation
-
+
Make sure HDFS is running first. Start and stop the Hadoop HDFS
daemons by running bin/start-dfs.sh over in the
@@ -610,31 +620,31 @@ to ensure well-formedness of your document after an edit session.
not normally use the mapreduce daemons. These do not need to be
started.
-
+
If you are managing your own ZooKeeper,
start it and confirm it is running; otherwise, HBase will start up ZooKeeper
for you as part of its start process.
-
+
Start HBase with the following command:
-
+
bin/start-hbase.sh
- Run the above from the
+ Run the above from the
HBASE_HOME
- directory.
+ directory.
You should now have a running HBase instance. HBase logs can be
found in the logs subdirectory. Check them out
especially if HBase had trouble starting.
-
+
HBase also puts up a UI listing vital attributes. By default it is
deployed on the Master host at port 60010 (HBase RegionServers listen
@@ -644,13 +654,13 @@ to ensure well-formedness of your document after an edit session.
Master's homepage you'd point your browser at
http://master.example.org:60010.
-
+
Once HBase has started, see the shell exercises for how to
create tables, add data, scan your insertions, and finally disable and
drop your tables.
-
+
To stop HBase after exiting the HBase shell, enter
$ ./bin/stop-hbase.sh
@@ -660,15 +670,15 @@ stopping hbase............... Shutdown can take a moment to
until HBase has shut down completely before stopping the Hadoop
daemons.
-
+
-
-
-
-
+
+
+
+ Configuration Files
-
+
hbase-site.xml and hbase-default.xml
Just as in Hadoop where you add site-specific HDFS configuration
@@ -744,11 +754,11 @@ stopping hbase............... Shutdown can take a moment to
Minimally, a client of HBase needs several libraries in its CLASSPATH when connecting to a cluster, including:
commons-configuration (commons-configuration-1.6.jar)
-commons-lang (commons-lang-2.5.jar)
-commons-logging (commons-logging-1.1.1.jar)
-hadoop-core (hadoop-core-1.0.0.jar)
+commons-lang (commons-lang-2.5.jar)
+commons-logging (commons-logging-1.1.1.jar)
+hadoop-core (hadoop-core-1.0.0.jar)
hbase (hbase-0.92.0.jar)
-log4j (log4j-1.2.16.jar)
+log4j (log4j-1.2.16.jar)
slf4j-api (slf4j-api-1.5.8.jar)
slf4j-log4j (slf4j-log4j12-1.5.8.jar)
zookeeper (zookeeper-3.4.2.jar)
@@ -769,7 +779,7 @@ zookeeper (zookeeper-3.4.2.jar)
]]>
-
+
Java client configuration
The configuration used by a Java client is kept
@@ -778,15 +788,15 @@ zookeeper (zookeeper-3.4.2.jar)
on invocation, will read in the content of the first hbase-site.xml found on
the client's CLASSPATH, if one is present
(Invocation will also factor in any hbase-default.xml found;
- an hbase-default.xml ships inside the hbase.X.X.X.jar).
+ an hbase-default.xml ships inside the hbase.X.X.X.jar).
It is also possible to specify configuration directly without having to read from a
hbase-site.xml. For example, to set the ZooKeeper
ensemble for the cluster programmatically do as follows:
Configuration config = HBaseConfiguration.create();
-config.set("hbase.zookeeper.quorum", "localhost"); // Here we are running zookeeper locally
+config.set("hbase.zookeeper.quorum", "localhost"); // Here we are running zookeeper locally
If multiple ZooKeeper instances make up your ZooKeeper ensemble,
they may be specified in a comma-separated list (just as in the hbase-site.xml file).
- This populated Configuration instance can then be passed to an
+ This populated Configuration instance can then be passed to an
HTable,
and so on.
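As a one-line sketch of that handoff, using the 0.92-era client API (the table name is a placeholder):
HTable table = new HTable(config, "myTable"); // "myTable" is hypothetical; picks up the quorum set on config above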
@@ -794,7 +804,7 @@ config.set("hbase.zookeeper.quorum", "localhost"); // Here we are running zooke
-
+
Example Configurations
@@ -886,7 +896,7 @@ config.set("hbase.zookeeper.quorum", "localhost"); // Here we are running zooke
1G.
-
+
$ git diff hbase-env.sh
diff --git a/conf/hbase-env.sh b/conf/hbase-env.sh
index e70ebc6..96f8c27 100644
@@ -894,11 +904,11 @@ index e70ebc6..96f8c27 100644
+++ b/conf/hbase-env.sh
@@ -31,7 +31,7 @@ export JAVA_HOME=/usr/lib/jvm/java-6-sun/
# export HBASE_CLASSPATH=
-
+
# The maximum amount of heap to use, in MB. Default is 1000.
-# export HBASE_HEAPSIZE=1000
+export HBASE_HEAPSIZE=4096
-
+
# Extra Java runtime options.
# Below are what we set by default. May only work with SUN JVM.
@@ -910,8 +920,8 @@ index e70ebc6..96f8c27 100644
-
-
+
+
The Important Configurations
Below we list what the important
@@ -935,7 +945,7 @@ index e70ebc6..96f8c27 100644
configuration under control; otherwise, a long garbage collection that lasts
beyond the ZooKeeper session timeout will take out
your RegionServer (You might be fine with this -- you probably want recovery to start
- on the server if a RegionServer has been in GC for a long period of time).
+ on the server if a RegionServer has been in GC for a long period of time).
To change this configuration, edit hbase-site.xml,
copy the changed file around the cluster and restart.
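For example, a sketch of raising the timeout in hbase-site.xml (the 20-minute value is purely illustrative; pick one that fits your GC profile):
<property>
  <name>zookeeper.session.timeout</name>
  <!-- milliseconds; 20 minutes here, illustrative only -->
  <value>1200000</value>
</property>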
@@ -1011,7 +1021,7 @@ index e70ebc6..96f8c27 100644
cluster (You can always later manually split the big Regions should one prove
hot and you want to spread the request load over the cluster). A lower number of regions is
preferred, generally in the range of 20 to low-hundreds
- per RegionServer. Adjust the regionsize as appropriate to achieve this number.
+ per RegionServer. Adjust the regionsize as appropriate to achieve this number.
For the 0.90.x codebase, the upper-bound of regionsize is about 4Gb, with a default of 256Mb.
For the 0.92.x codebase, due to the HFile v2 change, much larger regionsizes can be supported (e.g., 20Gb).
@@ -1019,10 +1029,10 @@ index e70ebc6..96f8c27 100644
You may need to experiment with this setting based on your hardware configuration and application needs.
Adjust hbase.hregion.max.filesize in your hbase-site.xml.
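For example, a sketch raising the bound to 10Gb on a 0.92.x cluster (the value is illustrative; derive yours from the region-count math above):
<property>
  <name>hbase.hregion.max.filesize</name>
  <!-- bytes; 10Gb, illustrative -->
  <value>10737418240</value>
</property>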
- RegionSize can also be set on a per-table basis via
+ RegionSize can also be set on a per-table basis via
HTableDescriptor.
-
+
Managed Splitting
@@ -1075,22 +1085,22 @@ of all regions.
Managed Compactions
- A common administrative technique is to manage major compactions manually, rather than letting
+ A common administrative technique is to manage major compactions manually, rather than letting
HBase do it. By default, HConstants.MAJOR_COMPACTION_PERIOD is one day and major compactions
may kick in when you least desire it - especially on a busy system. To turn off automatic major compactions set
- the value to 0.
+ the value to 0.
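Concretely, that period surfaces as the hbase.hregion.majorcompaction property; a sketch of disabling time-based major compactions in hbase-site.xml:
<property>
  <name>hbase.hregion.majorcompaction</name>
  <!-- milliseconds between automatic major compactions; 0 turns them off -->
  <value>0</value>
</property>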
It is important to stress that major compactions are absolutely necessary for StoreFile cleanup; the only variable is when
- they occur. They can be administered through the HBase shell, or via
+ they occur. They can be administered through the HBase shell, or via
HBaseAdmin.
For more information about compactions and the compaction file selection process, see
-
+
Speculative Execution
- Speculative Execution of MapReduce tasks is on by default, and for HBase clusters it is generally advised to turn off
+ Speculative Execution of MapReduce tasks is on by default, and for HBase clusters it is generally advised to turn off
Speculative Execution at the system level unless you need it for a specific case, where it can be configured per-job.
- Set the properties mapred.map.tasks.speculative.execution and
+ Set the properties mapred.map.tasks.speculative.execution and
mapred.reduce.tasks.speculative.execution to false.
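A sketch of both settings in mapred-site.xml (they may equally be set per-job on the job configuration):
<property>
  <name>mapred.map.tasks.speculative.execution</name>
  <value>false</value>
</property>
<property>
  <name>mapred.reduce.tasks.speculative.execution</name>
  <value>false</value>
</property>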
@@ -1118,9 +1128,9 @@ of all regions.
Inconsistent scan performance with caching set to 1
and the issue cited therein where setting notcpdelay improved scan speeds.
-
+
-
+