HBASE-2328 Make important configurations more obvious to new users; first part
git-svn-id: https://svn.apache.org/repos/asf/hbase/trunk@1030839 13f79535-47bb-0310-9956-ffa450edef68
parent e3249c8431
commit 0ea21f78bc
@ -226,14 +226,196 @@ stopping hbase...............</programlisting></para>
</section>

<section xml:id="notsoquick">
<title>Not-so-quick Start</title>

<para>The HBase API overview document contains a detailed <link
xlink:href="http://hbase.apache.org/docs/current/api/overview-summary.html#overview_description">Getting
Started</link> guide with a list of requirements and a description of the
different HBase run modes: standalone, which is described above in <link
linkend="quickstart">Quick Start</link>; pseudo-distributed, where all
daemons run on a single server; and distributed.</para>
<para>Be sure to read the <link linkend="important_configurations">Important Configurations</link> section.</para>

<section xml:id="requirements"><title>Requirements</title>
<para>HBase has the following requirements. Please read the
sections below carefully and ensure that all requirements have been
satisfied. Failure to do so will cause you (and us) grief debugging
strange errors and/or data loss.
</para>

<section xml:id="java"><title>java</title>
<para>
Just like Hadoop, HBase requires Java 6 from <link xlink:href="http://www.java.com/download/">Oracle</link>.
Usually you'll want the latest version available, except the problematic u18 (u22 is the latest as of this writing).</para>
</section>

<section xml:id="hadoop"><title><link xlink:href="http://hadoop.apache.org">hadoop</link></title>
<para>This version of HBase will only run on <link xlink:href="http://hadoop.apache.org/common/releases.html">Hadoop 0.20.x</link>.
HBase will lose data unless it is running on an HDFS that has a durable sync.
Currently only the <link xlink:href="http://svn.apache.org/viewvc/hadoop/common/branches/branch-0.20-append/">branch-0.20-append</link>
branch has this attribute. No official releases have been made from this branch as of this writing,
so you will have to build your own Hadoop from the tip of this branch
(or install Cloudera's <link xlink:href="http://archive.cloudera.com/docs/">CDH3</link>, which, as of this writing, is in beta; it has the
0.20-append patches needed to add a durable sync).
See <link xlink:href="http://svn.apache.org/viewvc/hadoop/common/branches/branch-0.20-append/CHANGES.txt">CHANGES.txt</link>
in branch-0.20-append for the list of patches involved.</para>
</section>

<section xml:id="ssh"> <title>ssh</title>
<para><command>ssh</command> must be installed and <command>sshd</command> must be running to use Hadoop's scripts to manage remote Hadoop daemons.
You must be able to ssh to all nodes, including your local node, using passwordless login (Google "ssh passwordless login").
</para>
</section>
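A minimal sketch of the passwordless-login setup, assuming OpenSSH. The key path below is a scratch location purely for illustration; on a real node you would generate the key at ~/.ssh/id_rsa and append the public key to ~/.ssh/authorized_keys on every node (ssh-copy-id automates that last step):

```shell
# Generate a key with an empty passphrase (illustrative scratch path).
rm -f /tmp/hbase_demo_key /tmp/hbase_demo_key.pub
ssh-keygen -q -t rsa -N "" -f /tmp/hbase_demo_key
# Appending the public key to a node's authorized_keys file is what
# actually enables passwordless login for this key.
cat /tmp/hbase_demo_key.pub >> /tmp/hbase_demo_authorized_keys
chmod 600 /tmp/hbase_demo_authorized_keys
```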

<section><title>DNS</title>
<para>Basic name resolution must work correctly on your cluster.
</para>
</section>

<section><title>NTP</title>
<para>
The clocks on cluster members should be in basic alignment. Some skew is tolerable, but
wild skew can generate odd behavior. Run <link xlink:href="http://en.wikipedia.org/wiki/Network_Time_Protocol">NTP</link>
on your cluster, or an equivalent.
</para>
</section>
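One quick way to eyeball skew is to print each node's clock as UTC epoch seconds and compare the values across the cluster; a sketch:

```shell
# Print this node's clock as UTC seconds since the epoch. Run the same
# command on every node (e.g. over ssh) and compare; differences of more
# than a few seconds deserve attention.
date -u +%s
```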

<section xml:id="ulimit">
<title><varname>ulimit</varname></title>
<para>HBase is a database; it uses a lot of files at the same time.
The default ulimit -n of 1024 on *nix systems is insufficient.
Any significant amount of loading will lead you to
<link xlink:href="http://wiki.apache.org/hadoop/Hbase/FAQ#A6">FAQ: Why do I see "java.io.IOException...(Too many open files)" in my logs?</link>.
You will also notice errors like:
<programlisting>2010-04-06 03:04:37,542 INFO org.apache.hadoop.hdfs.DFSClient: Exception in createBlockOutputStream java.io.EOFException
2010-04-06 03:04:37,542 INFO org.apache.hadoop.hdfs.DFSClient: Abandoning block blk_-6935524980745310745_1391901
</programlisting>
Do yourself a favor and change the upper bound on the number of file descriptors.
Set it to north of 10k. See the above-referenced FAQ for how.</para>
<para>To be clear, upping the file descriptors for the user who is
running the HBase process is an operating system configuration, not an
HBase configuration.
</para>
</section>
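A sketch of checking and raising the limit, assuming a PAM-based Linux; "hadoop" below stands in for whatever user runs the HBase daemons, and the exact file paths vary by distribution:

```shell
# Show the current per-process open-file limit (e.g. 1024).
ulimit -n
# To raise it persistently for the HBase user, append lines like these
# to /etc/security/limits.conf:
#   hadoop  soft  nofile  32768
#   hadoop  hard  nofile  32768
# then log that user out and back in so the new limit takes effect.
```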

<section xml:id="dfs.datanode.max.xcievers">
<title><varname>dfs.datanode.max.xcievers</varname></title>
<para>
Hadoop HDFS has an upper bound on the number of files that it will serve at the same time,
called <varname>xcievers</varname> (yes, this is misspelled). Again, before
doing any loading, make sure you have configured Hadoop's <filename>conf/hdfs-site.xml</filename>,
setting the <varname>xcievers</varname> value to at least the following:
<programlisting>
<property>
  <name>dfs.datanode.max.xcievers</name>
  <value>2047</value>
</property>
</programlisting>
</para>
</section>
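After editing, a quick grep confirms the property is in place. The sketch below fabricates a stand-in file under /tmp purely so it is self-contained; on a real cluster you would grep Hadoop's actual conf/hdfs-site.xml:

```shell
# Write a stand-in hdfs-site.xml fragment (illustrative path).
cat > /tmp/hdfs-site-demo.xml <<'EOF'
<property>
  <name>dfs.datanode.max.xcievers</name>
  <value>2047</value>
</property>
EOF
# Count matching lines; one match means the property name is present.
grep -c 'dfs.datanode.max.xcievers' /tmp/hdfs-site-demo.xml   # prints 1
```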
</section>

<section><title>HBase run modes: Standalone, Pseudo-distributed, and Distributed</title>
<para>HBase has three different run modes: standalone, which is described above in
<link linkend="quickstart">Quick Start</link>; pseudo-distributed mode, where all
daemons run on a single server; and distributed, where each of the daemons runs
on a different cluster node.</para>
<section><title>Standalone HBase</title>
<para>TODO</para>
</section>
<section><title>Pseudo-distributed</title>
<para>TODO</para>
</section>
<section><title>Distributed</title>
<para>TODO</para>
</section>
</section>
<section><title>Client configuration and dependencies for connecting to an HBase cluster</title>
<para>TODO</para>
</section>

<section><title>Example Configurations</title>
<para>In this section we provide a few sample configurations.</para>
<section><title>Basic Distributed HBase Install</title>
<para>Here is a basic example configuration for a ten-node cluster running in
distributed mode. The nodes
are named <varname>example0</varname>, <varname>example1</varname>, etc., through
node <varname>example9</varname> in this example. The HBase Master and the HDFS namenode
are running on the node <varname>example0</varname>. RegionServers run on nodes
<varname>example1</varname>-<varname>example9</varname>.
A 3-node ZooKeeper ensemble runs on <varname>example1</varname>, <varname>example2</varname>, and <varname>example3</varname>.
Below we show what the main configuration files
-- <filename>hbase-site.xml</filename>, <filename>regionservers</filename>, and
<filename>hbase-env.sh</filename> -- found in the <filename>conf</filename> directory
might look like.
</para>
<section><title><filename>hbase-site.xml</filename></title>
<programlisting>
<![CDATA[
<configuration>
  <property>
    <name>hbase.zookeeper.quorum</name>
    <value>example1,example2,example3</value>
    <description>Comma separated list of servers in the ZooKeeper Quorum.
    </description>
  </property>
  <property>
    <name>hbase.zookeeper.property.dataDir</name>
    <value>/export/stack/zookeeper</value>
    <description>Property from ZooKeeper's config zoo.cfg.
    The directory where the snapshot is stored.
    </description>
  </property>
  <property>
    <name>hbase.rootdir</name>
    <value>hdfs://example0:9000/hbase</value>
    <description>The directory shared by region servers.
    </description>
  </property>
  <property>
    <name>hbase.cluster.distributed</name>
    <value>true</value>
    <description>The mode the cluster will be in. Possible values are
      false: standalone and pseudo-distributed setups with managed Zookeeper
      true: fully-distributed with unmanaged Zookeeper Quorum (see hbase-env.sh)
    </description>
  </property>
</configuration>
]]>
</programlisting>
</section>

<section><title><filename>regionservers</filename></title>
<para>In this file you list the nodes that will run regionservers. In
our case we run regionservers on all nodes but the head node <varname>example0</varname>, which is
carrying the HBase Master and the HDFS namenode.</para>
<programlisting>
example1
example2
example3
example4
example5
example6
example7
example8
example9
</programlisting>
</section>

<section><title><filename>hbase-env.sh</filename></title>
<para>Below we use a <command>diff</command> to show the differences from
the default in the <filename>hbase-env.sh</filename> file. Here we are setting
the HBase heap to be 4G instead of the default 1G.
</para>
<programlisting>
<![CDATA[
$ git diff hbase-env.sh
diff --git a/conf/hbase-env.sh b/conf/hbase-env.sh
index e70ebc6..96f8c27 100644
--- a/conf/hbase-env.sh
+++ b/conf/hbase-env.sh
@@ -31,7 +31,7 @@ export JAVA_HOME=/usr/lib//jvm/java-6-sun/
 # export HBASE_CLASSPATH=
 
 # The maximum amount of heap to use, in MB. Default is 1000.
-# export HBASE_HEAPSIZE=1000
+export HBASE_HEAPSIZE=4096
 
 # Extra Java runtime options.
 # Below are what we set by default. May only work with SUN JVM.
]]>
</programlisting>
</section>
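The same one-line change can be scripted. This sketch edits a scratch copy so it is safe to run anywhere; on a real install you would point sed at conf/hbase-env.sh (GNU sed is assumed for -i):

```shell
# Copy the stock commented line into a scratch file (illustrative path).
echo '# export HBASE_HEAPSIZE=1000' > /tmp/hbase-env-demo.sh
# Uncomment the line and bump the heap to 4096 MB in place.
sed -i 's/^# export HBASE_HEAPSIZE=1000/export HBASE_HEAPSIZE=4096/' /tmp/hbase-env-demo.sh
grep HBASE_HEAPSIZE /tmp/hbase-env-demo.sh   # prints: export HBASE_HEAPSIZE=4096
```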

</section>

</section>
</section>
</chapter>
@ -286,52 +468,11 @@ stopping hbase...............</programlisting></para>

<section xml:id="required_configuration"><title>Required Configurations</title>
<para>Here are some configurations you must set to suit
your deploy. Failure to do so will almost certainly result in
<emphasis>data loss</emphasis>.
</para>
<para>See the <link linkend="requirements">Requirements</link> section.
It lists at least two required configurations for running HBase under
load: the <link linkend="ulimit">file descriptor <varname>ulimit</varname></link> and
<link linkend="dfs.datanode.max.xcievers"><varname>dfs.datanode.max.xcievers</varname></link>.
</para>

<section xml:id="ulimit">
<title><varname>ulimit</varname></title>
<para>HBase is a database; it uses a lot of files at the same time.
The default ulimit -n of 1024 on *nix systems is insufficient.
Any significant amount of loading will lead you to
<link xlink:href="http://wiki.apache.org/hadoop/Hbase/FAQ#A6">FAQ: Why do I see "java.io.IOException...(Too many open files)" in my logs?</link>.
You will also notice errors like:
<programlisting>2010-04-06 03:04:37,542 INFO org.apache.hadoop.hdfs.DFSClient: Exception in createBlockOutputStream java.io.EOFException
2010-04-06 03:04:37,542 INFO org.apache.hadoop.hdfs.DFSClient: Abandoning block blk_-6935524980745310745_1391901
</programlisting>
Do yourself a favor and change the upper bound on the number of file descriptors.
Set it to north of 10k. See the above-referenced FAQ for how. Verify this change
has taken effect by checking the first line in your HBase logs. It'll print out
the ulimit available to the running process (a frequent mistake is upping the ulimit
for a user other than the user running the HBase process).</para>
<para>To be clear, upping the file descriptors for the user who is
running the HBase process is an operating system configuration, not an
HBase configuration.
</para>
</section>
<section xml:id="dfs.datanode.max.xcievers">
<title><varname>dfs.datanode.max.xcievers</varname></title>
<para>
Hadoop HDFS has an upper bound on the number of files that it will serve at the same time,
called <varname>xcievers</varname> (yes, this is misspelled). Again, before
doing any loading, make sure you have configured Hadoop's <filename>conf/hdfs-site.xml</filename>,
setting the <varname>xcievers</varname> value to at least the following:
<programlisting>
<property>
  <name>dfs.datanode.max.xcievers</name>
  <value>2047</value>
</property>
</programlisting>
<footnote>
<para>
For background see
<link xlink:href="http://wiki.apache.org/hadoop/Hbase/Troubleshooting#A5">Problem: "xceiverCount 258 exceeds the limit of concurrent xcievers 256"</link>.
</para>
</footnote>

</para>
</section>
</section>

<section xml:id="recommended_configurations"><title>Recommended Configurations</title>
@ -51,73 +51,11 @@
<li><a href="#related" >Related Documentation</a></li>
</ul>

<h2><a name="requirements">Requirements</a></h2>
<ul>
<li>Java 1.6.x, preferably from <a href="http://www.java.com/download/">Sun</a>. Use the latest version available except u18 (u19 is fine).</li>
<li>This version of HBase will only run on <a href="http://hadoop.apache.org/common/releases.html">Hadoop 0.20.x</a>.
HBase will lose data unless it is running on an HDFS that has a durable sync operation.
Currently only the <a href="http://svn.apache.org/viewvc/hadoop/common/branches/branch-0.20-append/">branch-0.20-append</a>
branch has this attribute. No official releases have been made from this branch as of this writing,
so you will have to build your own Hadoop from the tip of this branch
(or install Cloudera's <a href="http://archive.cloudera.com/docs/">CDH3</a>, which, as of this writing, is in beta; it has the
0.20-append patches needed to add a durable sync).
See <a href="http://svn.apache.org/viewvc/hadoop/common/branches/branch-0.20-append/CHANGES.txt">CHANGES.txt</a>
in branch-0.20-append for the list of patches involved.</li>
<li>
<em>ssh</em> must be installed and <em>sshd</em> must be running to use Hadoop's scripts to manage remote Hadoop daemons.
You must be able to ssh to all nodes, including your local node, using passwordless login
(Google "ssh passwordless login").
</li>
<li>
HBase depends on <a href="http://hadoop.apache.org/zookeeper/">ZooKeeper</a> as of release 0.20.0.
HBase keeps in ZooKeeper the location of its root table, who the current master is, and which regions are
currently participating in the cluster.
Clients and servers must now know their <em>ZooKeeper Quorum locations</em> before
they can do anything else (usually they pick up this information from configuration
supplied on their CLASSPATH). By default, HBase will manage a single ZooKeeper instance for you.
In <em>standalone</em> and <em>pseudo-distributed</em> modes this is usually enough, but for
<em>fully-distributed</em> mode you should configure a ZooKeeper quorum (more info below).
</li>
<li>Hosts must be able to resolve the fully-qualified domain name of the master.</li>
<li>
The clocks on cluster members should be in basic alignment. Some skew is tolerable, but
wild skew can generate odd behavior. Run <a href="http://en.wikipedia.org/wiki/Network_Time_Protocol">NTP</a>
on your cluster, or an equivalent.
</li>
<li>
The default <b>ulimit -n</b> of 1024 on *nix systems will be insufficient.
Any significant amount of loading will lead you to
<a href="http://wiki.apache.org/hadoop/Hbase/FAQ#A6">FAQ: Why do I see "java.io.IOException...(Too many open files)" in my logs?</a>.
You will also notice errors like:
<pre>
2010-04-06 03:04:37,542 INFO org.apache.hadoop.hdfs.DFSClient: Exception in createBlockOutputStream java.io.EOFException
2010-04-06 03:04:37,542 INFO org.apache.hadoop.hdfs.DFSClient: Abandoning block blk_-6935524980745310745_1391901
</pre>
Do yourself a favor and change this to more than 10k. See the FAQ in the hbase wiki for how.
Also, HDFS has an upper bound on the number of files that it can serve at the same time,
called xcievers (yes, this is <em>misspelled</em>). Again, before doing any loading,
make sure you have configured Hadoop's conf/hdfs-site.xml with this:
<pre>
<property>
  <name>dfs.datanode.max.xcievers</name>
  <value>2047</value>
</property>
</pre>
See the background of this issue here: <a href="http://wiki.apache.org/hadoop/Hbase/Troubleshooting#A5">Problem: "xceiverCount 258 exceeds the limit of concurrent xcievers 256"</a>.
Failure to follow these instructions will result in <b>data loss</b>.
</li>
</ul>
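For fully-distributed mode, the ZooKeeper item above boils down to one property in hbase-site.xml. A minimal sketch, to be placed inside the <configuration> element; the property name is HBase's real hbase.zookeeper.quorum, but the host names are made up:

```xml
<property>
  <name>hbase.zookeeper.quorum</name>
  <value>zk1.example.com,zk2.example.com,zk3.example.com</value>
</property>
```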

<h3><a name="windows">Windows</a></h3>
If you are running HBase on Windows, you must install
<a href="http://cygwin.com/">Cygwin</a>
to have a *nix-like environment for the shell scripts. The full details
are explained in
the <a href="../cygwin.html">Windows Installation</a>
guide.

<h2><a name="getting_started" >Getting Started</a></h2>
<p>First review the <a href="http://hbase.apache.org/docs/current/book.html#requirements">requirements</a>
section of the HBase Book. A careful reading will save you grief down the road.</p>

<p>What follows presumes you have obtained a copy of HBase,
see <a href="http://hadoop.apache.org/hbase/releases.html">Releases</a>, and are installing
for the first time. If upgrading your HBase instance, see <a href="#upgrading">Upgrading</a>.</p>
@ -409,6 +347,14 @@ the HBase version. It does not change your install unless you explicitly ask it

<p>If your client is NOT Java, consider the Thrift or REST libraries.</p>

<h2><a name="windows">Windows</a></h2>
If you are running HBase on Windows, you must install
<a href="http://cygwin.com/">Cygwin</a>
to have a *nix-like environment for the shell scripts. The full details
are explained in
the <a href="../cygwin.html">Windows Installation</a>
guide.

<h2><a name="related" >Related Documentation</a></h2>
<ul>
<li><a href="http://hbase.org">HBase Home Page</a>