HBASE-7834. Document Hadoop version support matrix in the book

git-svn-id: https://svn.apache.org/repos/asf/hbase/trunk@1446379 13f79535-47bb-0310-9956-ffa450edef68
This commit is contained in:
Enis Soztutar 2013-02-14 22:55:17 +00:00
parent 67ba5e3d50
commit 432c100d3b
1 changed files with 45 additions and 75 deletions

@ -221,18 +221,48 @@ to ensure well-formedness of your document after an edit session.
xlink:href="http://hadoop.apache.org">Hadoop</link><indexterm>
<primary>Hadoop</primary>
</indexterm></title>
<note><title>Please read all of this section</title>
<para>Please read this section to the end. Up front we
wade through the weeds of Hadoop versions. Later we talk of what you must do in HBase
to make it work with a particular Hadoop version.</para>
</note>
<para>Selecting a Hadoop version is critical for your HBase deployment. The table below shows which versions of Hadoop are supported by the various HBase versions. Based on your version of HBase, select the most appropriate version of Hadoop. We are not in the Hadoop distro selection business. You can use Hadoop distributions from Apache, or learn about vendor distributions of Hadoop at <link xlink:href="http://wiki.apache.org/hadoop/Distributions%20and%20Commercial%20Support"/>.</para>
<para>
<table>
<title>Hadoop version support matrix</title>
<tgroup cols='4' align='left' colsep='1' rowsep='1'><colspec colname='c1' align='left'/><colspec colname='c2' align='center'/><colspec colname='c3' align='center'/><colspec colname='c4' align='center'/>
<thead>
<row><entry> </entry><entry>HBase-0.92.x</entry><entry>HBase-0.94.x</entry><entry>HBase-0.96</entry></row>
</thead><tbody>
<row><entry>Hadoop-0.20.205</entry><entry>S</entry> <entry>S</entry> <entry>X</entry></row>
<row><entry>Hadoop-0.22.x </entry><entry>S</entry> <entry>S</entry> <entry>X</entry></row>
<row><entry>Hadoop-1.0.x </entry><entry>S</entry> <entry>S</entry> <entry>S</entry></row>
<row><entry>Hadoop-1.1.x </entry><entry>NT</entry> <entry>S</entry> <entry>S</entry></row>
<row><entry>Hadoop-0.23.x </entry><entry>X</entry> <entry>S</entry> <entry>NT</entry></row>
<row><entry>Hadoop-2.x </entry><entry>X</entry> <entry>S</entry> <entry>S</entry></row>
</tbody></tgroup></table>
Where
<simplelist type='vert' columns='1'>
<member>S = supported and tested,</member>
<member>X = not supported,</member>
<member>NT = should run, but is not tested enough.</member>
</simplelist>
</para>
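<para>As a rough illustration only, the matrix above can be transcribed into a small lookup table for use in deployment scripts. The names <code>SUPPORT_MATRIX</code>, <code>support_status</code> and <code>supported_hadoops</code> are hypothetical and are not part of HBase; the values are copied from the table.</para>

```python
# Hypothetical helper: the support matrix from the table above as a lookup.
# S = supported and tested, X = not supported, NT = should run, not tested enough.
SUPPORT_MATRIX = {
    "0.20.205": {"0.92.x": "S", "0.94.x": "S", "0.96": "X"},
    "0.22.x": {"0.92.x": "S", "0.94.x": "S", "0.96": "X"},
    "1.0.x": {"0.92.x": "S", "0.94.x": "S", "0.96": "S"},
    "1.1.x": {"0.92.x": "NT", "0.94.x": "S", "0.96": "S"},
    "0.23.x": {"0.92.x": "X", "0.94.x": "S", "0.96": "NT"},
    "2.x": {"0.92.x": "X", "0.94.x": "S", "0.96": "S"},
}


def support_status(hadoop, hbase):
    """Return 'S', 'X' or 'NT' for a Hadoop row and an HBase column."""
    return SUPPORT_MATRIX[hadoop][hbase]


def supported_hadoops(hbase):
    """List the Hadoop rows marked 'S' for the given HBase column, in table order."""
    return [hadoop for hadoop, row in SUPPORT_MATRIX.items() if row[hbase] == "S"]


if __name__ == "__main__":
    print(support_status("1.1.x", "0.92.x"))
    print(supported_hadoops("0.96"))
```

<para>The keys are the row and column labels from the table, so a lookup with any other spelling raises a <code>KeyError</code> rather than guessing.</para>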
<para>
Because HBase depends on Hadoop, it bundles an instance of the Hadoop jar under its <filename>lib</filename> directory. The bundled jar is ONLY for use in standalone mode. In distributed mode, it is <emphasis>critical</emphasis> that the version of Hadoop running on your cluster matches the version under HBase. Replace the hadoop jar found in the HBase <filename>lib</filename> directory with the hadoop jar you are running on your cluster to avoid version mismatch issues. Make sure you replace the jar in HBase everywhere on your cluster. Hadoop version mismatch issues have various manifestations, but often it all looks like it's hung up.
</para>
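<para>A minimal sketch of a pre-flight check for the mismatch described above. It assumes the bundled jars are named like <filename>hadoop-core-1.0.4.jar</filename> (a <code>hadoop-</code> prefix, optional lowercase name parts, then the version); that naming, the function names and the paths you pass in are assumptions, so adjust them for your installation.</para>

```python
import re
from pathlib import Path

# Assumed naming convention: hadoop-[name-parts-]VERSION.jar
JAR_PATTERN = re.compile(r"^hadoop-(?:[a-z]+-)*(\d[\w.\-]*)\.jar$")


def hadoop_versions(jar_names):
    """Return the sorted, de-duplicated Hadoop versions found in jar file names."""
    versions = set()
    for name in jar_names:
        match = JAR_PATTERN.match(name)
        if match:
            versions.add(match.group(1))
    return sorted(versions)


def mismatched_versions(jar_names, cluster_version):
    """Return the bundled Hadoop versions that differ from the cluster's version."""
    return [v for v in hadoop_versions(jar_names) if v != cluster_version]


def lib_jar_names(lib_dir):
    """List the jar file names directly under a lib directory (hypothetical path)."""
    return [path.name for path in Path(lib_dir).glob("*.jar")]
```

<para>For example, <code>mismatched_versions(lib_jar_names("/path/to/hbase/lib"), "1.0.4")</code> returns an empty list when every Hadoop jar under that directory is version 1.0.4. Run it on each node, since the jar must be replaced everywhere on the cluster.</para>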
<section xml:id="hadoop.hbase-0.94">
<title>Apache HBase 0.92 and 0.94</title>
<para>HBase 0.92 and 0.94 work with Hadoop versions 0.20.205, 0.22.x, 1.0.x, and 1.1.x. HBase 0.94 also works with Hadoop 0.23.x and 2.x, but you may have to recompile the code using the specific Maven profile (see the top-level <filename>pom.xml</filename>).</para>
</section>
<section xml:id="hadoop.hbase-0.96">
<title>Apache HBase 0.96</title>
<para>As of Apache HBase 0.96.x, Apache Hadoop 1.0.x is the minimum requirement; HBase 0.96 runs equally well on Hadoop 2.0.
HBase 0.96 will no longer run properly on older Hadoops such as 0.20.205 or branch-0.20-append. Do not move to Apache HBase 0.96.x if you cannot upgrade your Hadoop<footnote><para>See <link xlink:href="http://search-hadoop.com/m/7vFVx4EsUb2">HBase, mail # dev - DISCUSS: Have hbase require at least hadoop 1.0.0 in hbase 0.96.0?</link></para></footnote>.</para>
</section>
<section xml:id="hadoop.older.versions">
<title>Hadoop versions 0.20.x - 1.x</title>
<para>
HBase will lose data unless it is running on an HDFS that has a durable
<code>sync</code> implementation. Hadoop 0.20.2, Hadoop 0.20.203.0, and Hadoop 0.20.204.0
DO NOT have this attribute.
Currently only Hadoop versions 0.20.205.x or any release in excess of this
version -- this includes hadoop 1.0.0 -- have a working, durable sync
<code>sync</code> implementation. DO NOT use Hadoop 0.20.2, Hadoop 0.20.203.0, or Hadoop 0.20.204.0, which DO NOT have this attribute. Currently only Hadoop versions 0.20.205.x and later (this includes hadoop-1.0.0) have a working, durable sync
<footnote>
<para>The Cloudera blog post <link xlink:href="http://www.cloudera.com/blog/2012/01/an-update-on-apache-hadoop-1-0/">An update on Apache Hadoop 1.0</link>
by Charles Zedlewski has a nice exposition on how all the Hadoop versions relate.
@ -252,73 +282,13 @@ to ensure well-formedness of your document after an edit session.
</programlisting>
You will have to restart your cluster after making this edit. Ignore the chicken-little
comment you'll find in the <filename>hdfs-default.xml</filename> in the
description for the <varname>dfs.support.append</varname> configuration; it says it is not enabled because there
are <quote>... bugs in the 'append code' and is not supported in any production
cluster.</quote> This comment is stale, from another era, and while I'm sure there
are bugs, the sync/append code has been running
in production in large-scale deployments and is on
by default in the Hadoop offerings of commercial vendors
<footnote><para>Until recently only the
<link xlink:href="http://svn.apache.org/viewvc/hadoop/common/branches/branch-0.20-append/">branch-0.20-append</link>
branch had a working sync but no official release was ever made from this branch.
You had to build it yourself. Michael Noll wrote a detailed blog,
<link xlink:href="http://www.michael-noll.com/blog/2011/04/14/building-an-hadoop-0-20-x-version-for-hbase-0-90-2/">Building
an Hadoop 0.20.x version for Apache HBase 0.90.2</link>, on how to build an
Hadoop from branch-0.20-append. Recommended.</para></footnote>
<footnote><para>Praveen Kumar has written
a complementary article,
<link xlink:href="http://praveen.kumar.in/2011/06/20/building-hadoop-and-hbase-for-hbase-maven-application-development/">Building Hadoop and HBase for HBase Maven application development</link>.
</para></footnote><footnote><para>Cloudera has <varname>dfs.support.append</varname> set to true by default.</para></footnote>.
Please use the most up-to-date Hadoop possible.</para>
<note><title>Apache HBase 0.96.0 requires Apache Hadoop 1.0.0 at a minimum</title>
<para>As of Apache HBase 0.96.x, Apache Hadoop 1.0.x at least is required. We will no
longer run properly on older Hadoops such as <filename>0.20.205</filename> or <filename>branch-0.20-append</filename>.
Do not move to Apache HBase 0.96.x if you cannot upgrade your Hadoop<footnote><para>See <link xlink:href="http://search-hadoop.com/m/7vFVx4EsUb2">HBase, mail # dev - DISCUSS: Have hbase require at least hadoop 1.0.0 in hbase 0.96.0?</link></para></footnote>.</para>
<para>Apache HBase 0.96.0 runs on Apache Hadoop 2.0.
description for the <varname>dfs.support.append</varname> configuration.
</para>
</note>
<para>Or use the
<link xlink:href="http://www.cloudera.com/">Cloudera</link> or
<link xlink:href="http://www.mapr.com/">MapR</link> distributions.
Cloudera's <link xlink:href="http://archive.cloudera.com/docs/">CDH3</link>
is Apache Hadoop 0.20.x plus patches including all of the
<link xlink:href="http://svn.apache.org/viewvc/hadoop/common/branches/branch-0.20-append/">branch-0.20-append</link>
additions needed to add a durable sync. Use the released, most recent version of CDH3. In CDH, append
support is enabled by default so you do not need to make the above mentioned edits to
<filename>hdfs-site.xml</filename> or to <filename>hbase-site.xml</filename>.</para>
<para>
<link xlink:href="http://www.mapr.com/">MapR</link>
includes a commercial reimplementation of HDFS.
It has a durable sync as well as some other interesting features that are not
yet in Apache Hadoop. Their <link xlink:href="http://www.mapr.com/products/mapr-editions/m3-edition">M3</link>
product is free to use and unlimited.
</para>
<para>Because HBase depends on Hadoop, it bundles an instance of the
Hadoop jar under its <filename>lib</filename> directory. The bundled jar is ONLY for use in standalone mode.
In distributed mode, it is <emphasis>critical</emphasis> that the version of Hadoop that is out
on your cluster match what is under HBase. Replace the hadoop jar found in the HBase
<filename>lib</filename> directory with the hadoop jar you are running on
your cluster to avoid version mismatch issues. Make sure you
replace the jar in HBase everywhere on your cluster. Hadoop version
mismatch issues have various manifestations, but often it all looks like
it's hung up.</para>
<note xml:id="bigtop"><title>Packaging and Apache Bigtop</title>
<para><link xlink:href="http://bigtop.apache.org">Apache Bigtop</link>
is an umbrella for packaging and tests of the Apache Hadoop
ecosystem, including Apache HBase. Bigtop performs testing at various
levels (packaging, platform, runtime, upgrade, etc.), developed by a
community, with a focus on the system as a whole, rather than individual
projects. We recommend installing Apache HBase packages as provided by a
Bigtop release rather than rolling your own piecemeal integration of
various component releases.</para>
</note>
</section>
<section xml:id="hadoop.security">
<title>Apache HBase on Secure Hadoop</title>
<para>Apache HBase will run on any Hadoop 0.20.x that incorporates Hadoop
security features -- e.g. Y! 0.20S or CDH3B3 -- as long as you do as
security features as long as you do as
suggested above and replace the Hadoop jar that ships with HBase
with the secure version. If you want to read more about how to set up
Secure HBase, see <xref linkend="hbase.secure.configuration" />.</para>