Add in Andrew Purtell's BigTop pointer

git-svn-id: https://svn.apache.org/repos/asf/hbase/trunk@1400526 13f79535-47bb-0310-9956-ffa450edef68
Michael Stack 2012-10-20 22:28:59 +00:00
parent a5bd102cd8
commit 77707f9b0a
1 changed file with 65 additions and 55 deletions

@ -30,10 +30,10 @@
<para>This chapter is the Not-So-Quick start guide to HBase configuration. It goes
over system requirements, Hadoop setup, the different HBase run modes, and the
various configurations in HBase. Please read this chapter carefully. At a minimum
ensure that all <xref linkend="basic.prerequisites" /> have
been satisfied. Failure to do so will cause you (and us) grief debugging strange errors
and/or data loss.</para>
<para>
HBase uses the same configuration system as Hadoop.
To configure a deploy, edit a file of environment variables
@ -57,7 +57,7 @@ to ensure well-formedness of your document after an edit session.
content of the <filename>conf</filename> directory to
all nodes of the cluster. HBase will not do this for you.
Use <command>rsync</command>.</para>
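<para>For example, a quick way to push the <filename>conf</filename> directory
out from the master is sketched below; it assumes HBase is installed at
<filename>/usr/local/hbase</filename> on every node and that
<filename>conf/regionservers</filename> lists your cluster hosts:</para>
<programlisting>$ for host in $(cat conf/regionservers); do
    rsync -az /usr/local/hbase/conf/ $host:/usr/local/hbase/conf/
  done</programlisting>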
<section xml:id="basic.prerequisites">
<title>Basic Prerequisites</title>
<para>This section lists required services and some required system configuration.
@ -69,7 +69,7 @@ to ensure well-formedness of your document after an edit session.
xlink:href="http://www.java.com/download/">Oracle</link>.</para>
</section>
<section xml:id="os">
<title>Operating System</title>
<section xml:id="ssh">
<title>ssh</title>
@ -151,9 +151,9 @@ to ensure well-formedness of your document after an edit session.
2010-04-06 03:04:37,542 INFO org.apache.hadoop.hdfs.DFSClient: Abandoning block blk_-6935524980745310745_1391901
</programlisting> Do yourself a favor and change the upper bound on the
number of file descriptors. Set it to north of 10k. The math runs roughly as follows: per ColumnFamily
there is at least one StoreFile and possibly up to 5 or 6 if the region is under load. Multiply the
average number of StoreFiles per ColumnFamily times the number of regions per RegionServer. For example, assuming
that a schema had 3 ColumnFamilies per region with an average of 3 StoreFiles per ColumnFamily,
and there are 100 regions per RegionServer, the JVM will open 3 * 3 * 100 = 900 file descriptors
(not counting open jar files, config files, etc.)
</para>
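<para>As an illustration, on a Linux system using PAM you might raise the
limit in <filename>/etc/security/limits.conf</filename> as below; this is a
sketch only, and assumes the HBase and Hadoop daemons run as the user
<varname>hadoop</varname>:</para>
<programlisting>hadoop  -  nofile  32768</programlisting>
<para>Verify what a running daemon actually received with
<command>cat /proc/&lt;pid&gt;/limits</command> rather than trusting the
shell's <command>ulimit -n</command> alone.</para>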
@ -216,13 +216,13 @@ to ensure well-formedness of your document after an edit session.
xlink:href="http://cygwin.com/">Cygwin</link> to have a *nix-like
environment for the shell scripts. The full details are explained in
the <link xlink:href="http://hbase.apache.org/cygwin.html">Windows
Installation</link> guide. Also
<link xlink:href="http://search-hadoop.com/?q=hbase+windows&amp;fc_project=HBase&amp;fc_type=mail+_hash_+dev">search our user mailing list</link> to pick
up the latest fixes figured out by Windows users.</para>
</section>
</section> <!-- OS -->
<section xml:id="hadoop">
<title><link
xlink:href="http://hadoop.apache.org">Hadoop</link><indexterm>
@ -289,7 +289,7 @@ to ensure well-formedness of your document after an edit session.
<link xlink:href="http://www.cloudera.com/">Cloudera</link> or
<link xlink:href="http://www.mapr.com/">MapR</link> distributions.
Cloudera's <link xlink:href="http://archive.cloudera.com/docs/">CDH3</link>
is Apache Hadoop 0.20.x plus patches including all of the
<link xlink:href="http://svn.apache.org/viewvc/hadoop/common/branches/branch-0.20-append/">branch-0.20-append</link>
additions needed to add a durable sync. Use the most recent released version of CDH3. In CDH, append
support is enabled by default so you do not need to make the above mentioned edits to
@ -311,6 +311,16 @@ to ensure well-formedness of your document after an edit session.
replace the jar in HBase everywhere on your cluster. Hadoop version
mismatch issues have various manifestations, but often everything simply
looks like it is hung up.</para>
<note xml:id="bigtop"><title>Packaging and Apache BigTop</title>
<para><link xlink:href="http://bigtop.apache.org">Apache Bigtop</link>
is an umbrella for packaging and tests of the Apache Hadoop
ecosystem, including Apache HBase. Bigtop performs testing at various
levels (packaging, platform, runtime, upgrade, etc.), developed by a
community, with a focus on the system as a whole, rather than individual
projects. We recommend installing Apache HBase packages as provided by a
Bigtop release rather than rolling your own piecemeal integration of
various component releases.</para>
</note>
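<para>As an illustration, on an RPM-based system with a Bigtop package
repository configured, installation can be as simple as the following
(package names are those used by Bigtop and may vary by release):</para>
<programlisting>$ sudo yum install hbase hbase-master hbase-regionserver</programlisting>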
<section xml:id="hadoop.security">
<title>HBase on Secure Hadoop</title>
@ -320,7 +330,7 @@ to ensure well-formedness of your document after an edit session.
with the secure version. If you want to read more about how to setup
Secure HBase, see <xref linkend="hbase.secure.configuration" />.</para>
</section>
<section xml:id="dfs.datanode.max.xcievers">
<title><varname>dfs.datanode.max.xcievers</varname><indexterm>
<primary>xcievers</primary>
@ -354,7 +364,7 @@ to ensure well-formedness of your document after an edit session.
<para>See also <xref linkend="casestudies.xceivers"/>
</para>
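<para>A sketch of the corresponding entry in the Hadoop side's
<filename>conf/hdfs-site.xml</filename>; 4096 is a commonly used value, so
tune it to your cluster:</para>
<programlisting><![CDATA[<property>
  <name>dfs.datanode.max.xcievers</name>
  <value>4096</value>
</property>]]></programlisting>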
</section>
</section> <!-- hadoop -->
</section>
@ -418,7 +428,7 @@ to ensure well-formedness of your document after an edit session.
HBase. Do not use this configuration for production nor for
evaluating HBase performance.</para>
<para>First, set up your HDFS in <link xlink:href="http://hadoop.apache.org/docs/r1.0.3/single_node_setup.html">pseudo-distributed mode</link>.
</para>
<para>Next, configure HBase. Below is an example <filename>conf/hbase-site.xml</filename>.
This is the file into
@ -501,10 +511,10 @@ to ensure well-formedness of your document after an edit session.
</programlisting>
</para>
</section>
</section>
</section>
</section>
<section xml:id="fully_dist">
<title>Fully-distributed</title>
@ -600,7 +610,7 @@ to ensure well-formedness of your document after an edit session.
<section xml:id="confirm">
<title>Running and Confirming Your Installation</title>
<para>Make sure HDFS is running first. Start and stop the Hadoop HDFS
daemons by running <filename>bin/start-dfs.sh</filename> over in the
@ -610,31 +620,31 @@ to ensure well-formedness of your document after an edit session.
not normally use the mapreduce daemons. These do not need to be
started.</para>
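<para>A quick smoke test that HDFS is up and answering, run from the
<varname>HADOOP_HOME</varname> directory (the path used is arbitrary):</para>
<programlisting>$ bin/hadoop fs -mkdir /tmp/hbase-smoke
$ bin/hadoop fs -ls /tmp
$ bin/hadoop fs -rmr /tmp/hbase-smoke</programlisting>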
<para><emphasis>If</emphasis> you are managing your own ZooKeeper,
start it and confirm it is running; otherwise, HBase will start up ZooKeeper
for you as part of its start process.</para>
<para>Start HBase with the following command:</para>
<programlisting>bin/start-hbase.sh</programlisting>
Run the above from the
<varname>HBASE_HOME</varname>
directory.
<para>You should now have a running HBase instance. HBase logs can be
found in the <filename>logs</filename> subdirectory. Check them out
especially if HBase had trouble starting.</para>
<para>HBase also puts up a UI listing vital attributes. By default it is
deployed on the Master host at port 60010 (HBase RegionServers listen
@ -644,13 +654,13 @@ to ensure well-formedness of your document after an edit session.
Master's homepage you'd point your browser at
<filename>http://master.example.org:60010</filename>.</para>
<para>Once HBase has started, see <xref linkend="shell_exercises" /> for how to
create tables, add data, scan your insertions, and finally disable and
drop your tables.</para>
<para>To stop HBase, after exiting the HBase shell, enter
<programlisting>$ ./bin/stop-hbase.sh
@ -660,15 +670,15 @@ stopping hbase...............</programlisting> Shutdown can take a moment to
complete. If you are running a distributed operation, be sure to wait
until HBase has shut down completely before stopping the Hadoop
daemons.</para>
</section>
</section> <!-- run modes -->
<section xml:id="config.files">
<section xml:id="config.files">
<title>Configuration Files</title>
<section xml:id="hbase.site">
<title><filename>hbase-site.xml</filename> and <filename>hbase-default.xml</filename></title>
<para>Just as in Hadoop where you add site-specific HDFS configuration
@ -744,11 +754,11 @@ stopping hbase...............</programlisting> Shutdown can take a moment to
Minimally, a client of HBase needs several libraries in its <varname>CLASSPATH</varname> when connecting to a cluster, including:
<programlisting>
commons-configuration (commons-configuration-1.6.jar)
commons-lang (commons-lang-2.5.jar)
commons-logging (commons-logging-1.1.1.jar)
hadoop-core (hadoop-core-1.0.0.jar)
hbase (hbase-0.92.0.jar)
log4j (log4j-1.2.16.jar)
slf4j-api (slf4j-api-1.5.8.jar)
slf4j-log4j (slf4j-log4j12-1.5.8.jar)
zookeeper (zookeeper-3.4.2.jar)</programlisting>
@ -769,7 +779,7 @@ zookeeper (zookeeper-3.4.2.jar)</programlisting>
</configuration>
]]></programlisting>
</para>
<section xml:id="java.client.config">
<title>Java client configuration</title>
<para>The configuration used by a Java client is kept
@ -778,15 +788,15 @@ zookeeper (zookeeper-3.4.2.jar)</programlisting>
on invocation, will read in the content of the first <filename>hbase-site.xml</filename> found on
the client's <varname>CLASSPATH</varname>, if one is present
(Invocation will also factor in any <filename>hbase-default.xml</filename> found;
an hbase-default.xml ships inside the <filename>hbase.X.X.X.jar</filename>).
It is also possible to specify configuration directly without having to read from a
<filename>hbase-site.xml</filename>. For example, to set the ZooKeeper
ensemble for the cluster programmatically, do as follows:
<programlisting>Configuration config = HBaseConfiguration.create();
config.set("hbase.zookeeper.quorum", "localhost"); // Here we are running zookeeper locally</programlisting>
If multiple ZooKeeper instances make up your ZooKeeper ensemble,
they may be specified in a comma-separated list (just as in the <filename>hbase-site.xml</filename> file).
This populated <classname>Configuration</classname> instance can then be passed to an
<link xlink:href="http://hbase.apache.org/apidocs/org/apache/hadoop/hbase/client/HTable.html">HTable</link>,
and so on.
</para>
@ -794,7 +804,7 @@ config.set("hbase.zookeeper.quorum", "localhost"); // Here we are running zooke
</section>
</section> <!-- config files -->
<section xml:id="example_config">
<title>Example Configurations</title>
@ -886,7 +896,7 @@ config.set("hbase.zookeeper.quorum", "localhost"); // Here we are running zooke
1G.</para>
<programlisting>
$ git diff hbase-env.sh
diff --git a/conf/hbase-env.sh b/conf/hbase-env.sh
index e70ebc6..96f8c27 100644
@ -894,11 +904,11 @@ index e70ebc6..96f8c27 100644
+++ b/conf/hbase-env.sh
@@ -31,7 +31,7 @@ export JAVA_HOME=/usr/lib//jvm/java-6-sun/
# export HBASE_CLASSPATH=
# The maximum amount of heap to use, in MB. Default is 1000.
-# export HBASE_HEAPSIZE=1000
+export HBASE_HEAPSIZE=4096
# Extra Java runtime options.
# Below are what we set by default. May only work with SUN JVM.
@ -910,8 +920,8 @@ index e70ebc6..96f8c27 100644
</section>
</section>
</section> <!-- example config -->
<section xml:id="important_configurations">
<title>The Important Configurations</title>
<para>Below we list what the <emphasis>important</emphasis>
@ -935,7 +945,7 @@ index e70ebc6..96f8c27 100644
configuration under control; otherwise, a long garbage collection that lasts
beyond the ZooKeeper session timeout will take out
your RegionServer (You might be fine with this -- you probably want recovery to start
on the server if a RegionServer has been in GC for a long period of time).</para>
<para>To change this configuration, edit <filename>hbase-site.xml</filename>,
copy the changed file around the cluster and restart.</para>
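<para>A sketch of such an edit in <filename>conf/hbase-site.xml</filename>;
120000 milliseconds is only an illustrative value:</para>
<programlisting><![CDATA[<property>
  <name>zookeeper.session.timeout</name>
  <value>120000</value>
</property>]]></programlisting>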
@ -1011,7 +1021,7 @@ index e70ebc6..96f8c27 100644
cluster (You can always later manually split the big Regions should one prove
hot and you want to spread the request load over the cluster). A lower number of regions is
preferred, generally in the range of 20 to low-hundreds
per RegionServer. Adjust the regionsize as appropriate to achieve this number.
</para>
<para>For the 0.90.x codebase, the upper-bound of regionsize is about 4Gb, with a default of 256Mb.
For the 0.92.x codebase, due to the HFile v2 change, much larger regionsizes can be supported (e.g., 20Gb).
@ -1019,10 +1029,10 @@ index e70ebc6..96f8c27 100644
<para>You may need to experiment with this setting based on your hardware configuration and application needs.
</para>
<para>Adjust <code>hbase.hregion.max.filesize</code> in your <filename>hbase-site.xml</filename>.
RegionSize can also be set on a per-table basis via
<link xlink:href="http://hbase.apache.org/apidocs/org/apache/hadoop/hbase/HTableDescriptor.html">HTableDescriptor</link>.
</para>
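<para>For example, to let regions grow to roughly 10Gb before splitting
(an illustrative value, not a recommendation), in
<filename>hbase-site.xml</filename>:</para>
<programlisting><![CDATA[<property>
  <name>hbase.hregion.max.filesize</name>
  <value>10737418240</value>
</property>]]></programlisting>
<para>The per-table equivalent is <code>HTableDescriptor.setMaxFileSize(long)</code>,
set before creating or altering the table.</para>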
</section>
<section xml:id="disable.splitting">
<title>Managed Splitting</title>
@ -1075,22 +1085,22 @@ of all regions.
</para>
</section>
<section xml:id="managed.compactions"><title>Managed Compactions</title>
<para>A common administrative technique is to manage major compactions manually, rather than letting
HBase do it. By default, <varname>HConstants.MAJOR_COMPACTION_PERIOD</varname> is one day and major compactions
may kick in when you least desire it - especially on a busy system. To turn off automatic major compactions set
the value to <varname>0</varname>.
</para>
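<para>A sketch of disabling time-based major compactions in
<filename>hbase-site.xml</filename>; this is the property behind
<varname>HConstants.MAJOR_COMPACTION_PERIOD</varname>:</para>
<programlisting><![CDATA[<property>
  <name>hbase.hregion.majorcompaction</name>
  <value>0</value>
</property>]]></programlisting>
<para>With that in place, major compactions can be kicked off by hand from the
HBase shell, e.g. <command>major_compact 'myTable'</command> (table name
hypothetical), typically during off-peak hours.</para>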
<para>It is important to stress that major compactions are absolutely necessary for StoreFile cleanup; the only variable is when
they occur. They can be administered through the HBase shell, or via
<link xlink:href="http://hbase.apache.org/apidocs/org/apache/hadoop/hbase/client/HBaseAdmin.html#majorCompact%28java.lang.String%29">HBaseAdmin</link>.
</para>
<para>For more information about compactions and the compaction file selection process, see <xref linkend="compaction"/></para>
</section>
<section xml:id="spec.ex"><title>Speculative Execution</title>
<para>Speculative Execution of MapReduce tasks is on by default, and for HBase clusters it is generally advised to turn off
Speculative Execution at a system-level unless you need it for a specific case, where it can be configured per-job.
Set the properties <varname>mapred.map.tasks.speculative.execution</varname> and
<varname>mapred.reduce.tasks.speculative.execution</varname> to false.
</para>
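<para>A sketch of the corresponding entries in
<filename>mapred-site.xml</filename>:</para>
<programlisting><![CDATA[<property>
  <name>mapred.map.tasks.speculative.execution</name>
  <value>false</value>
</property>
<property>
  <name>mapred.reduce.tasks.speculative.execution</name>
  <value>false</value>
</property>]]></programlisting>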
</section>
@ -1118,9 +1128,9 @@ of all regions.
<link xlink:href="http://search-hadoop.com/m/pduLg2fydtE/Inconsistent+scan+performance+with+caching+set+&amp;subj=Re+Inconsistent+scan+performance+with+caching+set+to+1">Inconsistent scan performance with caching set to 1</link>
and the issue cited therein where setting notcpdelay improved scan speeds.</para>
</section>
</section>
</section> <!-- important config -->
</chapter>