HBASE-2423 Update 'Getting Started' for 0.20.4 including making

"important configurations more visiable"


git-svn-id: https://svn.apache.org/repos/asf/hadoop/hbase/trunk@932109 13f79535-47bb-0310-9956-ffa450edef68
Jean-Daniel Cryans 2010-04-08 20:48:20 +00:00
parent a303215a28
commit e48a8ab0fc
2 changed files with 24 additions and 13 deletions


@@ -492,6 +492,8 @@ Release 0.21.0 - Unreleased
writes occur in same millisecond (Clint Morgan via J-D)
HBASE-2360 Make sure we have all the hadoop fixes in our copy of its rpc
(Todd Lipcon via Stack)
HBASE-2423 Update 'Getting Started' for 0.20.4 including making
"important configurations more visiable"
NEW FEATURES
HBASE-1961 HBase EC2 scripts


@@ -53,7 +53,7 @@
<h2><a name="requirements">Requirements</a></h2>
<ul>
<li>Java 1.6.x, preferably from <a href="http://www.java.com/download/">Sun</a>. Use the latest version available.</li>
<li>Java 1.6.x, preferably from <a href="http://www.java.com/download/">Sun</a>. Use the latest version available except u18 (u19 is fine).</li>
<li>This version of HBase will only run on <a href="http://hadoop.apache.org/common/releases.html">Hadoop 0.20.x</a>.</li>
<li>
<em>ssh</em> must be installed and <em>sshd</em> must be running to use Hadoop's scripts to manage remote Hadoop daemons.
@@ -71,23 +71,11 @@
<em>fully-distributed</em> mode you should configure a ZooKeeper quorum (more info below).
</li>
<li>Hosts must be able to resolve the fully-qualified domain name of the master.</li>
<li>
HBase currently is a file handle hog. The usual default of 1024 on *nix systems is insufficient
if you are loading any significant amount of data into regionservers.
See the <a href="http://wiki.apache.org/hadoop/Hbase/FAQ#A6">FAQ: Why do I see "java.io.IOException...(Too many open files)" in my logs?</a>
for how to up the limit. Also, as of 0.18.x Hadoop DataNodes have an upper-bound on the number of threads they will
support (<code>dfs.datanode.max.xcievers</code>). The default is 256 threads. Up this limit on your hadoop cluster.
</li>
<li>
The clocks on cluster members should be in basic alignments. Some skew is tolerable but
wild skew could generate odd behaviors. Run <a href="http://en.wikipedia.org/wiki/Network_Time_Protocol">NTP</a>
on your cluster, or an equivalent.
</li>
<li>
HBase servers put up 10 listeners for incoming connections by default.
Up this number if you have a dataset of any substance by setting <code>hbase.regionserver.handler.count</code>
in your <code>hbase-site.xml</code>.
</li>
<li>
This is the current list of patches we recommend you apply to your running Hadoop cluster:
<ul>
@@ -103,6 +91,27 @@
</li>
</ul>
</li>
<li>
HBase is a database and uses many file handles at the same time; the default <b>ulimit -n</b> of 1024 on *nix systems is insufficient.
Loading any significant amount of data will run you into the
<a href="http://wiki.apache.org/hadoop/Hbase/FAQ#A6">FAQ: Why do I see "java.io.IOException...(Too many open files)" in my logs?</a>.
You will also notice errors like the following:
<pre>
2010-04-06 03:04:37,542 INFO org.apache.hadoop.hdfs.DFSClient: Exception in createBlockOutputStream java.io.EOFException
2010-04-06 03:04:37,542 INFO org.apache.hadoop.hdfs.DFSClient: Abandoning block blk_-6935524980745310745_1391901
</pre>
Do yourself a favor and raise the limit to more than 10k as described in the FAQ.
Also, each HDFS DataNode has an upper bound on the number of files it will serve at the same time, set by <code>dfs.datanode.max.xcievers</code> (yes, the parameter name is <em>misspelled</em>). Again, before doing any loading,
make sure you have configured Hadoop's conf/hdfs-site.xml with the following:
<pre>
&lt;property&gt;
  &lt;name&gt;dfs.datanode.max.xcievers&lt;/name&gt;
  &lt;value&gt;2047&lt;/value&gt;
&lt;/property&gt;
</pre>
See the background of this issue here: <a href="http://wiki.apache.org/hadoop/Hbase/Troubleshooting#A5">Problem: "xceiverCount 258 exceeds the limit of concurrent xcievers 256"</a>.
Failure to follow these instructions will result in <b>data loss</b>.
</li>
</ul>
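The two requirements above (a raised open-file limit and a configured xcievers bound) can be sanity-checked from the shell before loading data. This is a minimal sketch, not part of HBase or Hadoop: the helper names, the 10240 threshold, and the conf-file location in the example are assumptions.

```shell
#!/bin/sh
# Hedged sketch: sanity-check the two limits discussed above.
# check_nofile and check_xcievers are hypothetical helpers, not HBase scripts.

# Pass if the open-file limit is above the ~10k the FAQ recommends (assumed threshold).
check_nofile() {
  [ "$1" -ge 10240 ]
}

# Pass if dfs.datanode.max.xcievers is mentioned in the given hdfs-site.xml.
check_xcievers() {
  grep -q 'dfs.datanode.max.xcievers' "$1" 2>/dev/null
}

# Example run: test this shell's actual limit and an assumed conf location.
if check_nofile "$(ulimit -n)"; then
  echo "nofile limit OK"
else
  echo "WARN: ulimit -n is $(ulimit -n); raise it per FAQ A6"
fi
if check_xcievers /etc/hadoop/conf/hdfs-site.xml; then   # path is an assumption
  echo "xcievers configured"
else
  echo "WARN: dfs.datanode.max.xcievers not set; DataNodes default to 256"
fi
```

Note that `ulimit -n` reports the limit of the current shell only; the limit that matters is the one in effect for the user running the DataNode and RegionServer processes.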
<h3><a name="windows">Windows</a></h3>