Add in Andrew Purtell's BigTop pointer
git-svn-id: https://svn.apache.org/repos/asf/hbase/trunk@1400526 13f79535-47bb-0310-9956-ffa450edef68
parent a5bd102cd8
commit 77707f9b0a
@@ -30,10 +30,10 @@
<para>This chapter is the Not-So-Quick start guide to HBase configuration. It goes
over system requirements, Hadoop setup, the different HBase run modes, and the
various configurations in HBase. Please read this chapter carefully. At a minimum,
ensure that all <xref linkend="basic.prerequisites" /> have
been satisfied. Failure to do so will cause you (and us) grief debugging strange errors
and/or data loss.</para>

<para>
HBase uses the same configuration system as Hadoop.
To configure a deploy, edit a file of environment variables
@@ -57,7 +57,7 @@ to ensure well-formedness of your document after an edit session.
content of the <filename>conf</filename> directory to
all nodes of the cluster. HBase will not do this for you.
Use <command>rsync</command>.</para>

<section xml:id="basic.prerequisites">
<title>Basic Prerequisites</title>
<para>This section lists required services and some required system configuration.
@@ -69,7 +69,7 @@ to ensure well-formedness of your document after an edit session.
xlink:href="http://www.java.com/download/">Oracle</link>.</para>
</section>
<section xml:id="os">
<title>Operating System</title>
<section xml:id="ssh">
<title>ssh</title>
@@ -151,9 +151,9 @@ to ensure well-formedness of your document after an edit session.
2010-04-06 03:04:37,542 INFO org.apache.hadoop.hdfs.DFSClient: Abandoning block blk_-6935524980745310745_1391901
</programlisting> Do yourself a favor and change the upper bound on the
number of file descriptors. Set it to north of 10k. The math runs roughly as follows: per ColumnFamily
there is at least one StoreFile and possibly up to 5 or 6 if the region is under load. Multiply the
average number of StoreFiles per ColumnFamily times the number of regions per RegionServer. For example, assuming
that a schema had 3 ColumnFamilies per region with an average of 3 StoreFiles per ColumnFamily,
and there are 100 regions per RegionServer, the JVM will open 3 * 3 * 100 = 900 file descriptors
(not counting open jar files, config files, etc.)
</para>
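The arithmetic above can be sketched in a few lines of shell; the cluster profile numbers are the example's assumptions, not measurements from a real deploy:

```shell
# Hypothetical cluster profile from the example above; substitute your own numbers.
FAMILIES_PER_REGION=3       # ColumnFamilies per region
STOREFILES_PER_FAMILY=3     # average StoreFiles per ColumnFamily
REGIONS_PER_SERVER=100      # regions hosted by one RegionServer

# Descriptors for StoreFiles alone; open jars, config files, and sockets are extra.
FDS=$((FAMILIES_PER_REGION * STOREFILES_PER_FAMILY * REGIONS_PER_SERVER))
echo "Estimated StoreFile descriptors: $FDS"
```

With the common default of `ulimit -n 1024`, 900 StoreFile descriptors alone leaves almost no headroom, which is why the text suggests an upper bound north of 10k.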
@@ -216,13 +216,13 @@ to ensure well-formedness of your document after an edit session.
xlink:href="http://cygwin.com/">Cygwin</link> to have a *nix-like
environment for the shell scripts. The full details are explained in
the <link xlink:href="http://hbase.apache.org/cygwin.html">Windows
Installation</link> guide. Also
<link xlink:href="http://search-hadoop.com/?q=hbase+windows&amp;fc_project=HBase&amp;fc_type=mail+_hash_+dev">search our user mailing list</link> to pick
up the latest fixes figured out by Windows users.</para>
</section>

</section> <!-- OS -->

<section xml:id="hadoop">
<title><link
xlink:href="http://hadoop.apache.org">Hadoop</link><indexterm>
@@ -289,7 +289,7 @@ to ensure well-formedness of your document after an edit session.
<link xlink:href="http://www.cloudera.com/">Cloudera</link> or
<link xlink:href="http://www.mapr.com/">MapR</link> distributions.
Cloudera's <link xlink:href="http://archive.cloudera.com/docs/">CDH3</link>
is Apache Hadoop 0.20.x plus patches including all of the
<link xlink:href="http://svn.apache.org/viewvc/hadoop/common/branches/branch-0.20-append/">branch-0.20-append</link>
additions needed to add a durable sync. Use the released, most recent version of CDH3. In CDH, append
support is enabled by default so you do not need to make the above-mentioned edits to
@@ -311,6 +311,16 @@ to ensure well-formedness of your document after an edit session.
replace the jar in HBase everywhere on your cluster. Hadoop version
mismatch issues have various manifestations, but often it all looks like
it's hung up.</para>
<note xml:id="bigtop"><title>Packaging and Apache BigTop</title>
<para><link xlink:href="http://bigtop.apache.org">Apache Bigtop</link>
is an umbrella for packaging and tests of the Apache Hadoop
ecosystem, including Apache HBase. Bigtop performs testing at various
levels (packaging, platform, runtime, upgrade, etc.), developed by a
community, with a focus on the system as a whole, rather than individual
projects. We recommend installing Apache HBase packages as provided by a
Bigtop release rather than rolling your own piecemeal integration of
various component releases.</para>
</note>

<section xml:id="hadoop.security">
<title>HBase on Secure Hadoop</title>
@@ -320,7 +330,7 @@ to ensure well-formedness of your document after an edit session.
with the secure version. If you want to read more about how to set up
Secure HBase, see <xref linkend="hbase.secure.configuration" />.</para>
</section>

<section xml:id="dfs.datanode.max.xcievers">
<title><varname>dfs.datanode.max.xcievers</varname><indexterm>
<primary>xcievers</primary>
@@ -354,7 +364,7 @@ to ensure well-formedness of your document after an edit session.
<para>See also <xref linkend="casestudies.xceivers"/>
</para>
</section>

</section> <!-- hadoop -->
</section>
@@ -418,7 +428,7 @@ to ensure well-formedness of your document after an edit session.
HBase. Do not use this configuration for production nor for
evaluating HBase performance.</para>

<para>First, set up your HDFS in <link xlink:href="http://hadoop.apache.org/docs/r1.0.3/single_node_setup.html">pseudo-distributed mode</link>.
</para>
<para>Next, configure HBase. Below is an example <filename>conf/hbase-site.xml</filename>.
This is the file into
@@ -501,10 +511,10 @@ to ensure well-formedness of your document after an edit session.
</programlisting>
</para>
</section>

</section>

</section>
</section>

<section xml:id="fully_dist">
<title>Fully-distributed</title>
@@ -600,7 +610,7 @@ to ensure well-formedness of your document after an edit session.
<section xml:id="confirm">
<title>Running and Confirming Your Installation</title>

<para>Make sure HDFS is running first. Start and stop the Hadoop HDFS
daemons by running <filename>bin/start-hdfs.sh</filename> over in the
@@ -610,31 +620,31 @@ to ensure well-formedness of your document after an edit session.
not normally use the mapreduce daemons. These do not need to be
started.</para>

<para><emphasis>If</emphasis> you are managing your own ZooKeeper,
start it and confirm it's running; otherwise, HBase will start up ZooKeeper
for you as part of its start process.</para>

<para>Start HBase with the following command:</para>

<programlisting>bin/start-hbase.sh</programlisting>

Run the above from the
<varname>HBASE_HOME</varname>
directory.

<para>You should now have a running HBase instance. HBase logs can be
found in the <filename>logs</filename> subdirectory. Check them out
especially if HBase had trouble starting.</para>

<para>HBase also puts up a UI listing vital attributes. By default it's
deployed on the Master host at port 60010 (HBase RegionServers listen
@@ -644,13 +654,13 @@ to ensure well-formedness of your document after an edit session.
Master's homepage you'd point your browser at
<filename>http://master.example.org:60010</filename>.</para>

<para>Once HBase has started, see the <xref linkend="shell_exercises" /> for how to
create tables, add data, scan your insertions, and finally disable and
drop your tables.</para>

<para>To stop HBase after exiting the HBase shell enter
<programlisting>$ ./bin/stop-hbase.sh
@@ -660,15 +670,15 @@ stopping hbase...............</programlisting> Shutdown can take a moment to
until HBase has shut down completely before stopping the Hadoop
daemons.</para>

</section>
</section> <!-- run modes -->

<section xml:id="config.files">
<title>Configuration Files</title>

<section xml:id="hbase.site">
<title><filename>hbase-site.xml</filename> and <filename>hbase-default.xml</filename></title>
<para>Just as in Hadoop where you add site-specific HDFS configuration
@@ -744,11 +754,11 @@ stopping hbase...............</programlisting> Shutdown can take a moment to
Minimally, a client of HBase needs several libraries in its <varname>CLASSPATH</varname> when connecting to a cluster, including:
<programlisting>
commons-configuration (commons-configuration-1.6.jar)
commons-lang (commons-lang-2.5.jar)
commons-logging (commons-logging-1.1.1.jar)
hadoop-core (hadoop-core-1.0.0.jar)
hbase (hbase-0.92.0.jar)
log4j (log4j-1.2.16.jar)
slf4j-api (slf4j-api-1.5.8.jar)
slf4j-log4j (slf4j-log4j12-1.5.8.jar)
zookeeper (zookeeper-3.4.2.jar)</programlisting>
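A sketch of assembling that client <varname>CLASSPATH</varname> in shell; the `/usr/local/hbase/lib` location is an assumed install path, not anything the text prescribes:

```shell
# Assumed layout: all client jars unpacked under one lib directory (hypothetical path).
HBASE_LIB=/usr/local/hbase/lib

CLASSPATH=""
for jar in commons-configuration-1.6.jar commons-lang-2.5.jar \
           commons-logging-1.1.1.jar hadoop-core-1.0.0.jar \
           hbase-0.92.0.jar log4j-1.2.16.jar slf4j-api-1.5.8.jar \
           slf4j-log4j12-1.5.8.jar zookeeper-3.4.2.jar; do
  # Append each jar, colon-separated, skipping the leading colon on the first entry.
  CLASSPATH="${CLASSPATH:+$CLASSPATH:}$HBASE_LIB/$jar"
done
echo "$CLASSPATH"
```

Exact jar versions will differ per release; match them to the jars shipped in your HBase distribution's `lib` directory.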
@@ -769,7 +779,7 @@ zookeeper (zookeeper-3.4.2.jar)</programlisting>
</configuration>
]]></programlisting>
</para>

<section xml:id="java.client.config">
<title>Java client configuration</title>
<para>The configuration used by a Java client is kept
@@ -778,15 +788,15 @@ zookeeper (zookeeper-3.4.2.jar)</programlisting>
on invocation, will read in the content of the first <filename>hbase-site.xml</filename> found on
the client's <varname>CLASSPATH</varname>, if one is present
(Invocation will also factor in any <filename>hbase-default.xml</filename> found;
an hbase-default.xml ships inside the <filename>hbase.X.X.X.jar</filename>).
It is also possible to specify configuration directly without having to read from a
<filename>hbase-site.xml</filename>. For example, to set the ZooKeeper
ensemble for the cluster programmatically do as follows:
<programlisting>Configuration config = HBaseConfiguration.create();
config.set("hbase.zookeeper.quorum", "localhost");  // Here we are running zookeeper locally</programlisting>
If multiple ZooKeeper instances make up your ZooKeeper ensemble,
they may be specified in a comma-separated list (just as in the <filename>hbase-site.xml</filename> file).
This populated <classname>Configuration</classname> instance can then be passed to an
<link xlink:href="http://hbase.apache.org/apidocs/org/apache/hadoop/hbase/client/HTable.html">HTable</link>,
and so on.
</para>
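In <filename>hbase-site.xml</filename> the equivalent comma-separated form would look like the following; the host names are placeholders:

```xml
<property>
  <name>hbase.zookeeper.quorum</name>
  <!-- one entry per ZooKeeper ensemble member; example hosts only -->
  <value>zk1.example.org,zk2.example.org,zk3.example.org</value>
</property>
```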
@@ -794,7 +804,7 @@ config.set("hbase.zookeeper.quorum", "localhost"); // Here we are running zooke
</section>

</section> <!-- config files -->

<section xml:id="example_config">
<title>Example Configurations</title>
@@ -886,7 +896,7 @@ config.set("hbase.zookeeper.quorum", "localhost"); // Here we are running zooke
1G.</para>

<programlisting>
$ git diff hbase-env.sh
diff --git a/conf/hbase-env.sh b/conf/hbase-env.sh
index e70ebc6..96f8c27 100644
@@ -894,11 +904,11 @@ index e70ebc6..96f8c27 100644
+++ b/conf/hbase-env.sh
@@ -31,7 +31,7 @@ export JAVA_HOME=/usr/lib//jvm/java-6-sun/
# export HBASE_CLASSPATH=

# The maximum amount of heap to use, in MB. Default is 1000.
-# export HBASE_HEAPSIZE=1000
+export HBASE_HEAPSIZE=4096

# Extra Java runtime options.
# Below are what we set by default. May only work with SUN JVM.
@@ -910,8 +920,8 @@ index e70ebc6..96f8c27 100644
</section>
</section>
</section> <!-- example config -->

<section xml:id="important_configurations">
<title>The Important Configurations</title>
<para>Below we list what the <emphasis>important</emphasis>
@@ -935,7 +945,7 @@ index e70ebc6..96f8c27 100644
configuration under control; otherwise, a long garbage collection that lasts
beyond the ZooKeeper session timeout will take out
your RegionServer (You might be fine with this -- you probably want recovery to start
on the server if a RegionServer has been in GC for a long period of time).</para>

<para>To change this configuration, edit <filename>hbase-site.xml</filename>,
copy the changed file around the cluster and restart.</para>
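For instance, a deploy that chooses to tolerate longer GC pauses might raise the session timeout in <filename>hbase-site.xml</filename>; the 180000 ms value below is illustrative, not a recommendation:

```xml
<property>
  <name>zookeeper.session.timeout</name>
  <!-- milliseconds; illustrative value, tune against your observed GC pauses -->
  <value>180000</value>
</property>
```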
@@ -1011,7 +1021,7 @@ index e70ebc6..96f8c27 100644
cluster (You can always later manually split the big Regions should one prove
hot and you want to spread the request load over the cluster). A lower number of regions is
preferred, generally in the range of 20 to low-hundreds
per RegionServer. Adjust the regionsize as appropriate to achieve this number.
</para>
<para>For the 0.90.x codebase, the upper-bound of regionsize is about 4Gb, with a default of 256Mb.
For 0.92.x codebase, due to the HFile v2 change much larger regionsizes can be supported (e.g., 20Gb).
@@ -1019,10 +1029,10 @@ index e70ebc6..96f8c27 100644
<para>You may need to experiment with this setting based on your hardware configuration and application needs.
</para>
<para>Adjust <code>hbase.hregion.max.filesize</code> in your <filename>hbase-site.xml</filename>.
RegionSize can also be set on a per-table basis via
<link xlink:href="http://hbase.apache.org/apidocs/org/apache/hadoop/hbase/HTableDescriptor.html">HTableDescriptor</link>.
</para>
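A cluster-wide setting in <filename>hbase-site.xml</filename> might look like the following; the 1 GB value is illustrative only:

```xml
<property>
  <name>hbase.hregion.max.filesize</name>
  <!-- 1073741824 bytes = 1 GB; pick a size that yields your target region count -->
  <value>1073741824</value>
</property>
```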

</section>
<section xml:id="disable.splitting">
<title>Managed Splitting</title>
@@ -1075,22 +1085,22 @@ of all regions.
</para>
</section>
<section xml:id="managed.compactions"><title>Managed Compactions</title>
<para>A common administrative technique is to manage major compactions manually, rather than letting
HBase do it. By default, <varname>HConstants.MAJOR_COMPACTION_PERIOD</varname> is one day and major compactions
may kick in when you least desire it - especially on a busy system. To turn off automatic major compactions set
the value to <varname>0</varname>.
</para>
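A sketch of the corresponding <filename>hbase-site.xml</filename> entry, assuming the 0.92-era property name that backs <varname>HConstants.MAJOR_COMPACTION_PERIOD</varname>:

```xml
<property>
  <name>hbase.hregion.majorcompaction</name>
  <!-- period in milliseconds; 0 disables time-based major compactions -->
  <value>0</value>
</property>
```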
<para>It is important to stress that major compactions are absolutely necessary for StoreFile cleanup; the only variable is when
they occur. They can be administered through the HBase shell, or via
<link xlink:href="http://hbase.apache.org/apidocs/org/apache/hadoop/hbase/client/HBaseAdmin.html#majorCompact%28java.lang.String%29">HBaseAdmin</link>.
</para>
<para>For more information about compactions and the compaction file selection process, see <xref linkend="compaction"/></para>
</section>

<section xml:id="spec.ex"><title>Speculative Execution</title>
<para>Speculative Execution of MapReduce tasks is on by default, and for HBase clusters it is generally advised to turn off
Speculative Execution at the system level unless you need it for a specific case, where it can be configured per-job.
Set the properties <varname>mapred.map.tasks.speculative.execution</varname> and
<varname>mapred.reduce.tasks.speculative.execution</varname> to false.
</para>
</section>
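The two properties named above can be set cluster-wide in <filename>mapred-site.xml</filename>, for example:

```xml
<property>
  <name>mapred.map.tasks.speculative.execution</name>
  <value>false</value>
</property>
<property>
  <name>mapred.reduce.tasks.speculative.execution</name>
  <value>false</value>
</property>
```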
@@ -1118,9 +1128,9 @@ of all regions.
<link xlink:href="http://search-hadoop.com/m/pduLg2fydtE/Inconsistent+scan+performance+with+caching+set+&amp;subj=Re+Inconsistent+scan+performance+with+caching+set+to+1">Inconsistent scan performance with caching set to 1</link>
and the issue cited therein where setting notcpdelay improved scan speeds.</para>
</section>

</section>

</section> <!-- important config -->

</chapter>