diff --git a/src/docbkx/configuration.xml b/src/docbkx/configuration.xml index 8d8124952d5..e898e1d5489 100644 --- a/src/docbkx/configuration.xml +++ b/src/docbkx/configuration.xml @@ -30,10 +30,10 @@ This chapter is the Not-So-Quick start guide to HBase configuration. It goes over system requirements, Hadoop setup, the different HBase run modes, and the various configurations in HBase. Please read this chapter carefully. At a minimum - ensure that all have + ensure that all have been satisfied. Failure to do so will cause you (and us) grief debugging strange errors and/or data loss. - + HBase uses the same configuration system as Hadoop. To configure a deploy, edit a file of environment variables @@ -57,7 +57,7 @@ to ensure well-formedness of your document after an edit session. content of the conf directory to all nodes of the cluster. HBase will not do this for you. Use rsync. - +
Basic Prerequisites This section lists required services and some required system configuration. @@ -69,7 +69,7 @@ to ensure well-formedness of your document after an edit session. xlink:href="http://www.java.com/download/">Oracle.
- Operating System + Operating System
ssh @@ -151,9 +151,9 @@ to ensure well-formedness of your document after an edit session. 2010-04-06 03:04:37,542 INFO org.apache.hadoop.hdfs.DFSClient: Abandoning block blk_-6935524980745310745_1391901 Do yourself a favor and change the upper bound on the number of file descriptors. Set it to north of 10k. The math runs roughly as follows: per ColumnFamily - there is at least one StoreFile and possibly up to 5 or 6 if the region is under load. Multiply the + there is at least one StoreFile and possibly up to 5 or 6 if the region is under load. Multiply the average number of StoreFiles per ColumnFamily times the number of regions per RegionServer. For example, assuming - that a schema had 3 ColumnFamilies per region with an average of 3 StoreFiles per ColumnFamily, + that a schema had 3 ColumnFamilies per region with an average of 3 StoreFiles per ColumnFamily, and there are 100 regions per RegionServer, the JVM will open 3 * 3 * 100 = 900 file descriptors (not counting open jar files, config files, etc.) @@ -216,13 +216,13 @@ to ensure well-formedness of your document after an edit session. xlink:href="http://cygwin.com/">Cygwin to have a *nix-like environment for the shell scripts. The full details are explained in the Windows - Installation guide. Also + Installation guide. Also search our user mailing list to pick up the latest fixes figured out by Windows users.
- +
<link xlink:href="http://hadoop.apache.org">Hadoop</link><indexterm> @@ -289,7 +289,7 @@ to ensure well-formedness of your document after an edit session. <link xlink:href="http://www.cloudera.com/">Cloudera</link> or <link xlink:href="http://www.mapr.com/">MapR</link> distributions. Cloudera' <link xlink:href="http://archive.cloudera.com/docs/">CDH3</link> - is Apache Hadoop 0.20.x plus patches including all of the + is Apache Hadoop 0.20.x plus patches including all of the <link xlink:href="http://svn.apache.org/viewvc/hadoop/common/branches/branch-0.20-append/">branch-0.20-append</link> additions needed to add a durable sync. Use the released, most recent version of CDH3. In CDH, append support is enabled by default so you do not need to make the above mentioned edits to @@ -311,6 +311,16 @@ to ensure well-formedness of your document after an edit session. replace the jar in HBase everywhere on your cluster. Hadoop version mismatch issues have various manifestations but often all looks like its hung up.</para> + <note xml:id="bigtop"><title>Packaging and Apache BigTop + Apache Bigtop + is an umbrella for packaging and tests of the Apache Hadoop + ecosystem, including Apache HBase. Bigtop performs testing at various + levels (packaging, platform, runtime, upgrade, etc...), developed by a + community, with a focus on the system as a whole, rather than individual + projects. We recommend installing Apache HBase packages as provided by a + Bigtop release rather than rolling your own piecemeal integration of + various component releases. +
HBase on Secure Hadoop @@ -320,7 +330,7 @@ to ensure well-formedness of your document after an edit session. with the secure version. If you want to read more about how to set up Secure HBase, see .
- +
<varname>dfs.datanode.max.xcievers</varname><indexterm> <primary>xcievers</primary> @@ -354,7 +364,7 @@ to ensure well-formedness of your document after an edit session. <para>See also <xref linkend="casestudies.xceivers"/> </para> </section> - + </section> <!-- hadoop --> </section> @@ -418,7 +428,7 @@ to ensure well-formedness of your document after an edit session. HBase. Do not use this configuration for production or for evaluating HBase performance.</para> - <para>First, set up your HDFS in <link xlink:href="http://hadoop.apache.org/docs/r1.0.3/single_node_setup.html">pseudo-distributed mode</link>. + <para>First, set up your HDFS in <link xlink:href="http://hadoop.apache.org/docs/r1.0.3/single_node_setup.html">pseudo-distributed mode</link>. </para> <para>Next, configure HBase. Below is an example <filename>conf/hbase-site.xml</filename>. This is the file into @@ -501,10 +511,10 @@ to ensure well-formedness of your document after an edit session. </programlisting> </para> </section> - + </section> - </section> + </section> <section xml:id="fully_dist"> <title>Fully-distributed
Running and Confirming Your Installation - + Make sure HDFS is running first. Start and stop the Hadoop HDFS daemons by running bin/start-dfs.sh over in the @@ -610,31 +620,31 @@ to ensure well-formedness of your document after an edit session. not normally use the mapreduce daemons. These do not need to be started. - + If you are managing your own ZooKeeper, start it and confirm it is running; otherwise, HBase will start up ZooKeeper for you as part of its start process. - + Start HBase with the following command: - + bin/start-hbase.sh - Run the above from the + Run the above from the HBASE_HOME - directory. + directory. You should now have a running HBase instance. HBase logs can be found in the logs subdirectory. Check them out especially if HBase had trouble starting. - + HBase also puts up a UI listing vital attributes. By default it is deployed on the Master host at port 60010 (HBase RegionServers listen @@ -644,13 +654,13 @@ to ensure well-formedness of your document after an edit session. Master's homepage you'd point your browser at http://master.example.org:60010. - + Once HBase has started, see the for how to create tables, add data, scan your insertions, and finally disable and drop your tables. - + To stop HBase after exiting the HBase shell enter $ ./bin/stop-hbase.sh stopping hbase............... Shutdown can take a moment to complete. Wait until HBase has shut down completely before stopping the Hadoop daemons. - +
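<para>As a quick programmatic check to go with the steps above, the client API can be asked whether a Master is running. This is a minimal sketch, not part of the original guide; it assumes an hbase-site.xml pointing at your cluster is on the CLASSPATH:</para>
<programlisting>
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.client.HBaseAdmin;

public class CheckHBase {
  public static void main(String[] args) throws Exception {
    // Reads hbase-site.xml (and the bundled hbase-default.xml) from the CLASSPATH.
    Configuration conf = HBaseConfiguration.create();
    // Throws MasterNotRunningException if no Master can be reached.
    HBaseAdmin.checkHBaseAvailable(conf);
    System.out.println("HBase is up and running");
  }
}
</programlisting>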
- - - -
+ + + +
Configuration Files - +
<filename>hbase-site.xml</filename> and <filename>hbase-default.xml</filename> Just as in Hadoop where you add site-specific HDFS configuration @@ -744,11 +754,11 @@ stopping hbase............... Shutdown can take a moment to Minimally, a client of HBase needs several libraries in its CLASSPATH when connecting to a cluster, including: commons-configuration (commons-configuration-1.6.jar) -commons-lang (commons-lang-2.5.jar) -commons-logging (commons-logging-1.1.1.jar) -hadoop-core (hadoop-core-1.0.0.jar) +commons-lang (commons-lang-2.5.jar) +commons-logging (commons-logging-1.1.1.jar) +hadoop-core (hadoop-core-1.0.0.jar) hbase (hbase-0.92.0.jar) -log4j (log4j-1.2.16.jar) +log4j (log4j-1.2.16.jar) slf4j-api (slf4j-api-1.5.8.jar) slf4j-log4j (slf4j-log4j12-1.5.8.jar) zookeeper (zookeeper-3.4.2.jar) @@ -769,7 +779,7 @@ zookeeper (zookeeper-3.4.2.jar) ]]> - +
Java client configuration The configuration used by a Java client is kept in an HBaseConfiguration instance. The factory method on HBaseConfiguration, HBaseConfiguration.create(), will on invocation read in the content of the first hbase-site.xml found on the client's CLASSPATH, if one is present (invocation will also factor in any hbase-default.xml found; - an hbase-default.xml ships inside the hbase.X.X.X.jar). + an hbase-default.xml ships inside the hbase.X.X.X.jar). It is also possible to specify configuration directly without having to read from a hbase-site.xml. For example, to set the ZooKeeper ensemble for the cluster programmatically do as follows: Configuration config = HBaseConfiguration.create(); -config.set("hbase.zookeeper.quorum", "localhost"); // Here we are running zookeeper locally +config.set("hbase.zookeeper.quorum", "localhost"); // Here we are running zookeeper locally If multiple ZooKeeper instances make up your ZooKeeper ensemble, they may be specified in a comma-separated list (just as in the hbase-site.xml file). - This populated Configuration instance can then be passed to an + This populated Configuration instance can then be passed to an HTable, and so on (see the sketch below). @@ -794,7 +804,7 @@
- +
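<para>To make the last point concrete, here is a minimal sketch of handing the populated Configuration to an HTable and issuing a write; the ZooKeeper hosts, table name, and column family are hypothetical:</para>
<programlisting>
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.client.HTable;
import org.apache.hadoop.hbase.client.Put;
import org.apache.hadoop.hbase.util.Bytes;

Configuration config = HBaseConfiguration.create();
// A comma-separated ensemble list, just as in hbase-site.xml.
config.set("hbase.zookeeper.quorum", "zk1.example.org,zk2.example.org,zk3.example.org");
HTable table = new HTable(config, "myTable");   // hypothetical table name
Put put = new Put(Bytes.toBytes("row1"));
put.add(Bytes.toBytes("cf"), Bytes.toBytes("qual"), Bytes.toBytes("value"));
table.put(put);
table.close();
</programlisting>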
Example Configurations @@ -886,7 +896,7 @@ config.set("hbase.zookeeper.quorum", "localhost"); // Here we are running zooke 1G. - + $ git diff hbase-env.sh diff --git a/conf/hbase-env.sh b/conf/hbase-env.sh index e70ebc6..96f8c27 100644 @@ -894,11 +904,11 @@ index e70ebc6..96f8c27 100644 +++ b/conf/hbase-env.sh @@ -31,7 +31,7 @@ export JAVA_HOME=/usr/lib//jvm/java-6-sun/ # export HBASE_CLASSPATH= - + # The maximum amount of heap to use, in MB. Default is 1000. -# export HBASE_HEAPSIZE=1000 +export HBASE_HEAPSIZE=4096 - + # Extra Java runtime options. # Below are what we set by default. May only work with SUN JVM. @@ -910,8 +920,8 @@ index e70ebc6..96f8c27 100644
- - + +
The Important Configurations Below we list what the important @@ -935,7 +945,7 @@ index e70ebc6..96f8c27 100644 configuration under control; otherwise, a long garbage collection that lasts beyond the ZooKeeper session timeout will take out your RegionServer (You might be fine with this -- you probably want recovery to start - on the server if a RegionServer has been in GC for a long period of time). + on the server if a RegionServer has been in GC for a long period of time). To change this configuration, edit hbase-site.xml, copy the changed file around the cluster and restart. @@ -1011,7 +1021,7 @@ index e70ebc6..96f8c27 100644 cluster (You can always later manually split the big Regions should one prove hot and you want to spread the request load over the cluster). A lower number of regions is preferred, generally in the range of 20 to low-hundreds - per RegionServer. Adjust the regionsize as appropriate to achieve this number. + per RegionServer. Adjust the regionsize as appropriate to achieve this number. For the 0.90.x codebase, the upper bound of regionsize is about 4Gb, with a default of 256Mb. For the 0.92.x codebase, much larger regionsizes can be supported (e.g., 20Gb), due to the HFile v2 change. @@ -1019,10 +1029,10 @@ index e70ebc6..96f8c27 100644 You may need to experiment with this setting based on your hardware configuration and application needs. Adjust hbase.hregion.max.filesize in your hbase-site.xml. - RegionSize can also be set on a per-table basis via + RegionSize can also be set on a per-table basis via HTableDescriptor (see the sketch below). - +
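<para>A minimal sketch of the per-table route follows; the table and column-family names are hypothetical, and setMaxFileSize takes the region size ceiling in bytes:</para>
<programlisting>
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.HColumnDescriptor;
import org.apache.hadoop.hbase.HTableDescriptor;
import org.apache.hadoop.hbase.client.HBaseAdmin;

Configuration conf = HBaseConfiguration.create();
HBaseAdmin admin = new HBaseAdmin(conf);
HTableDescriptor htd = new HTableDescriptor("myTable");  // hypothetical table
htd.addFamily(new HColumnDescriptor("cf"));
// Regions of this table split at ~10Gb rather than at the global
// hbase.hregion.max.filesize value.
htd.setMaxFileSize(10L * 1024 * 1024 * 1024);
admin.createTable(htd);
</programlisting>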
Managed Splitting @@ -1075,22 +1085,22 @@ of all regions.
Managed Compactions - A common administrative technique is to manage major compactions manually, rather than letting + A common administrative technique is to manage major compactions manually, rather than letting HBase do it. By default, HConstants.MAJOR_COMPACTION_PERIOD is one day and major compactions may kick in when you least desire it - especially on a busy system. To turn off automatic major compactions set - the value to 0. + the value to 0. It is important to stress that major compactions are absolutely necessary for StoreFile cleanup; the only variable is when - they occur. They can be administered through the HBase shell, or via + they occur. They can be administered through the HBase shell, or via HBaseAdmin (see the sketch below). For more information about compactions and the compaction file selection process, see
- +
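<para>A minimal sketch of the client-API route; the table name is hypothetical, and turning off the time-based trigger itself is done by setting hbase.hregion.majorcompaction to 0 in hbase-site.xml on the cluster:</para>
<programlisting>
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.client.HBaseAdmin;

Configuration conf = HBaseConfiguration.create();
HBaseAdmin admin = new HBaseAdmin(conf);
// Queues a major compaction of every region of the table, e.g. from a
// cron-driven tool; the call returns without waiting for completion.
admin.majorCompact("myTable");
</programlisting>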
Speculative Execution - Speculative Execution of MapReduce tasks is on by default, and for HBase clusters it is generally advised to turn off + Speculative Execution of MapReduce tasks is on by default, and for HBase clusters it is generally advised to turn off Speculative Execution at a system-level unless you need it for a specific case, where it can be configured per-job. - Set the properties mapred.map.tasks.speculative.execution and + Set the properties mapred.map.tasks.speculative.execution and mapred.reduce.tasks.speculative.execution to false.
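<para>For the per-job case, the two properties can be set on the job's Configuration before submission. A minimal sketch, with a hypothetical job name; cluster-wide, the same properties would go in mapred-site.xml instead:</para>
<programlisting>
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.mapreduce.Job;

Configuration conf = HBaseConfiguration.create();
// Disable speculative execution for this job only.
conf.setBoolean("mapred.map.tasks.speculative.execution", false);
conf.setBoolean("mapred.reduce.tasks.speculative.execution", false);
Job job = new Job(conf, "myHBaseJob");  // hypothetical job name
</programlisting>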
@@ -1118,9 +1128,9 @@ of all regions. Inconsistent scan performance with caching set to 1 and the issue cited therein where setting tcpnodelay improved scan speeds.
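<para>Scan caching itself is set client-side. A minimal sketch, with the value chosen only for illustration:</para>
<programlisting>
import org.apache.hadoop.hbase.client.Scan;

Scan scan = new Scan();
// Fetch 500 rows per RPC rather than one round trip per row
// (the default caching is 1).
scan.setCaching(500);
</programlisting>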
- + - +