HADOOP-6292. Update native libraries guide. Contributed by Corinne Chandel

git-svn-id: https://svn.apache.org/repos/asf/hadoop/common/trunk@827855 13f79535-47bb-0310-9956-ffa450edef68
Christopher Douglas 2009-10-21 00:32:20 +00:00
parent 7313955d04
commit 449ac4ab87
2 changed files with 130 additions and 108 deletions

View File

@@ -1122,6 +1122,8 @@ Release 0.21.0 - Unreleased
     HADOOP-6286. Fix bugs in related to URI handling in glob methods in
     FileContext. (Boris Shkolnik via suresh)
 
+    HADOOP-6292. Update native libraries guide. (Corinne Chandel via cdouglas)
+
 Release 0.20.2 - Unreleased
 
     HADOOP-6231. Allow caching of filesystem instances to be disabled on a

View File

@@ -26,90 +26,82 @@
     <body>
       <section>
-        <title>Purpose</title>
-        <p>Hadoop has native implementations of certain components for reasons of
-        both performance and non-availability of Java implementations. These
-        components are available in a single, dynamically-linked, native library.
-        On the *nix platform it is <em>libhadoop.so</em>. This document describes
-        the usage and details on how to build the native libraries.</p>
+        <title>Overview</title>
+        <p>This guide describes the native hadoop library and includes a small
+        discussion about native shared libraries.</p>
+        <p><strong>Note:</strong> Depending on your environment, the term "native
+        libraries" <em>could</em> refer to all *.so's you need to compile; and, the
+        term "native compression" <em>could</em> refer to all *.so's you need to
+        compile that are specifically related to compression. Currently, however,
+        this document only addresses the native hadoop library
+        (<em>libhadoop.so</em>).</p>
       </section>
       <section>
-        <title>Components</title>
-        <p>Hadoop currently has the following
-        <a href="ext:api/org/apache/hadoop/io/compress/compressioncodec">
-        compression codecs</a> as the native components:</p>
-        <ul>
-          <li><a href="ext:zlib">zlib</a></li>
-          <li><a href="ext:gzip">gzip</a></li>
-          <li><a href="ext:bzip">bzip2</a></li>
-        </ul>
-        <p>Of the above, the availability of native hadoop libraries is imperative
-        for the gzip and bzip2 compression codecs to work.</p>
-      </section>
-      <section>
-        <title>Usage</title>
-        <p>It is fairly simple to use the native hadoop libraries:</p>
-        <ul>
-          <li>
-            Take a look at the
-            <a href="#Supported+Platforms">supported platforms</a>.
-          </li>
-          <li>
-            Either <a href="ext:releases/download">download</a> the pre-built
-            32-bit i386-Linux native hadoop libraries (available as part of hadoop
-            distribution in <code>lib/native</code> directory) or
-            <a href="#Building+Native+Hadoop+Libraries">build</a> them yourself.
-          </li>
-          <li>
-            Make sure you have any of or all of <strong>&gt;zlib-1.2</strong>,
-            <strong>&gt;gzip-1.2</strong>, and <strong>&gt;bzip2-1.0</strong>
-            packages for your platform installed; depending on your needs.
-          </li>
-        </ul>
-        <p>The <code>bin/hadoop</code> script ensures that the native hadoop
-        library is on the library path via the system property
-        <em>-Djava.library.path=&lt;path&gt;</em>.</p>
-        <p>To check everything went alright check the hadoop log files for:</p>
-        <p>
-          <code>
-            DEBUG util.NativeCodeLoader - Trying to load the custom-built
-            native-hadoop library...
-          </code><br/>
-          <code>
-            INFO util.NativeCodeLoader - Loaded the native-hadoop library
-          </code>
-        </p>
-        <p>If something goes wrong, then:</p>
-        <p>
-          <code>
-            INFO util.NativeCodeLoader - Unable to load native-hadoop library for
-            your platform... using builtin-java classes where applicable
-          </code>
-        </p>
-      </section>
+        <title>Native Hadoop Library</title>
+        <p>Hadoop has native implementations of certain components for
+        performance reasons and for non-availability of Java implementations.
+        These components are available in a single, dynamically-linked native
+        library called the native hadoop library. On the *nix platforms the
+        library is named <em>libhadoop.so</em>.</p>
+        <section>
+          <title>Usage</title>
+          <p>It is fairly easy to use the native hadoop library:</p>
+          <ol>
+            <li>
+              Review the <a href="#Components">components</a>.
+            </li>
+            <li>
+              Review the <a href="#Supported+Platforms">supported platforms</a>.
+            </li>
+            <li>
+              Either <a href="#Download">download</a> a hadoop release, which will
+              include a pre-built version of the native hadoop library, or
+              <a href="#Build">build</a> your own version of the native hadoop
+              library. Whether you download or build, the name for the library is
+              the same: <em>libhadoop.so</em>.
+            </li>
+            <li>
+              Install the compression codec development packages
+              (<strong>&gt;zlib-1.2</strong>, <strong>&gt;gzip-1.2</strong>):
+              <ul>
+                <li>If you download the library, install one or more development
+                packages - whichever compression codecs you want to use with your
+                deployment.</li>
+                <li>If you build the library, it is <strong>mandatory</strong> to
+                install both development packages.</li>
+              </ul>
+            </li>
+            <li>
+              Check the <a href="#Runtime">runtime</a> log files.
+            </li>
+          </ol>
+        </section>
+        <section>
+          <title>Components</title>
+          <p>The native hadoop library includes two components, the zlib and gzip
+          <a href="http://hadoop.apache.org/common/docs/current/api/org/apache/hadoop/io/compress/CompressionCodec.html">
+          compression codecs</a>:</p>
+          <ul>
+            <li><a href="ext:zlib">zlib</a></li>
+            <li><a href="ext:gzip">gzip</a></li>
+          </ul>
+          <p>The native hadoop library is imperative for gzip to work.</p>
+        </section>
       <section>
         <title>Supported Platforms</title>
-        <p>Hadoop native library is supported only on *nix platforms only.
-        Unfortunately it is known not to work on <a href="ext:cygwin">Cygwin</a>
-        and <a href="ext:osx">Mac OS X</a> and has mainly been used on the
-        GNU/Linux platform.</p>
-        <p>It has been tested on the following GNU/Linux distributions:</p>
+        <p>The native hadoop library is supported on *nix platforms only.
+        The library does not work with <a href="ext:cygwin">Cygwin</a>
+        or the <a href="ext:osx">Mac OS X</a> platform.</p>
+        <p>The native hadoop library is mainly used on the GNU/Linux platform
+        and has been tested on these distributions:</p>
         <ul>
           <li>
             <a href="http://www.redhat.com/rhel/">RHEL4</a>/<a href="http://fedora.redhat.com/">Fedora</a>
@@ -118,22 +110,30 @@
           <li><a href="http://www.gentoo.org/">Gentoo</a></li>
         </ul>
-        <p>On all the above platforms a 32/64 bit Hadoop native library will work
-        with a respective 32/64 bit jvm.</p>
+        <p>On all the above distributions a 32/64 bit native hadoop library will
+        work with a respective 32/64 bit jvm.</p>
       </section>
       <section>
-        <title>Building Native Hadoop Libraries</title>
-        <p>Hadoop native library is written in
-        <a href="http://en.wikipedia.org/wiki/ANSI_C">ANSI C</a> and built using
-        the GNU autotools-chain (autoconf, autoheader, automake, autoscan, libtool).
-        This means it should be straight-forward to build them on any platform with
-        a standards compliant C compiler and the GNU autotools-chain.
-        See <a href="#Supported+Platforms">supported platforms</a>.</p>
-        <p>In particular the various packages you would need on the target
-        platform are:</p>
+        <title>Download</title>
+        <p>The pre-built 32-bit i386-Linux native hadoop library is available as
+        part of the hadoop distribution and is located in the
+        <code>lib/native</code> directory. You can download the hadoop
+        distribution from <a href="ext:releases/download">Hadoop Common Releases</a>.</p>
+        <p>Be sure to install the zlib and/or gzip development packages -
+        whichever compression codecs you want to use with your deployment.</p>
+      </section>
+      <section>
+        <title>Build</title>
+        <p>The native hadoop library is written in
+        <a href="http://en.wikipedia.org/wiki/ANSI_C">ANSI C</a> and is built using
+        the GNU autotools-chain (autoconf, autoheader, automake, autoscan, libtool).
+        This means it should be straightforward to build the library on any
+        platform with a standards-compliant C compiler and the GNU autotools-chain
+        (see the <a href="#Supported+Platforms">supported platforms</a>).</p>
+        <p>The packages you need to install on the target platform are:</p>
         <ul>
           <li>
             C compiler (e.g. <a href="http://gcc.gnu.org/">GNU C Compiler</a>)
@@ -149,52 +149,69 @@
           </li>
         </ul>
-        <p>Once you have the prerequisites use the standard <code>build.xml</code>
-        and pass along the <code>compile.native</code> flag (set to
-        <code>true</code>) to build the native hadoop library:</p>
+        <p>Once you have installed the prerequisite packages, use the standard
+        hadoop <code>build.xml</code> file and pass along the
+        <code>compile.native</code> flag (set to <code>true</code>) to build the
+        native hadoop library:</p>
         <p><code>$ ant -Dcompile.native=true &lt;target&gt;</code></p>
-        <p>The native hadoop library is not built by default since not everyone is
-        interested in building them.</p>
-        <p>You should see the newly-built native hadoop library in:</p>
+        <p>You should see the newly-built library in:</p>
         <p><code>$ build/native/&lt;platform&gt;/lib</code></p>
-        <p>where &lt;platform&gt; is combination of the system-properties:
-        <code>${os.name}-${os.arch}-${sun.arch.data.model}</code>; for e.g.
-        Linux-i386-32.</p>
-        <section>
-          <title>Notes</title>
-          <p>Please note the following:</p>
+        <p>where &lt;<code>platform</code>&gt; is a combination of the
+        system-properties: <code>${os.name}-${os.arch}-${sun.arch.data.model}</code>
+        (for example, Linux-i386-32).</p>
         <ul>
           <li>
-            It is <strong>mandatory</strong> to have the
-            zlib, gzip, and bzip2
-            development packages on the target platform for building the
-            native hadoop library; however for deployment it is sufficient to
-            install one of them if you wish to use only one of them.
+            It is <strong>mandatory</strong> to install both the zlib and gzip
+            development packages on the target platform in order to build the
+            native hadoop library; however, for deployment it is sufficient to
+            install just one package if you wish to use only one codec.
           </li>
           <li>
-            It is necessary to have the correct 32/64 libraries of both zlib
-            depending on the 32/64 bit jvm for the target platform for
-            building/deployment of the native hadoop library.
+            It is necessary to have the correct 32/64 libraries for zlib,
+            depending on the 32/64 bit jvm for the target platform, in order to
+            build and deploy the native hadoop library.
           </li>
         </ul>
-        </section>
       </section>
+      <section>
+        <title>Runtime</title>
+        <p>The <code>bin/hadoop</code> script ensures that the native hadoop
+        library is on the library path via the system property:<br/>
+        <em>-Djava.library.path=&lt;path&gt;</em></p>
+        <p>During runtime, check the hadoop log files for your MapReduce tasks.</p>
+        <ul>
+          <li>If everything is all right, then:<br/><br/>
+            <code>DEBUG util.NativeCodeLoader - Trying to load the custom-built native-hadoop library...</code><br/>
+            <code>INFO util.NativeCodeLoader - Loaded the native-hadoop library</code><br/>
+          </li>
+          <li>If something goes wrong, then:<br/><br/>
+            <code>INFO util.NativeCodeLoader - Unable to load native-hadoop library for
+            your platform... using builtin-java classes where applicable</code>
+          </li>
+        </ul>
+      </section>
+      </section>
       <section>
-        <title> Loading Native Libraries Through DistributedCache </title>
-        <p>User can load native shared libraries through
+        <title>Native Shared Libraries</title>
+        <p>You can load <strong>any</strong> native shared library using
         <a href="http://hadoop.apache.org/mapreduce/docs/current/mapred_tutorial.html#DistributedCache">DistributedCache</a>
         for <em>distributing</em> and <em>symlinking</em> the library files.</p>
-        <p>Here is an example, describing how to distribute the library and
-        load it from a MapReduce task. </p>
+        <p>This example shows you how to distribute a shared library,
+        <code>mylib.so</code>, and load it from a MapReduce task.</p>
         <ol>
-          <li> First copy the library to the HDFS. <br/>
+          <li> First copy the library to the HDFS: <br/>
            <code>bin/hadoop fs -copyFromLocal mylib.so.1 /libraries/mylib.so.1</code>
           </li>
           <li> The job launching program should contain the following: <br/>
@@ -206,6 +223,9 @@
             <code> System.loadLibrary("mylib.so"); </code>
           </li>
         </ol>
+        <p><br/><strong>Note:</strong> If you downloaded or built the native hadoop
+        library, you don't need to use DistributedCache to make the library
+        available to your MapReduce tasks.</p>
       </section>
     </body>
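To make the copy/configure/load steps in that last section concrete, here is a minimal, hypothetical sketch of the job-launching side using the org.apache.hadoop.filecache.DistributedCache API of this era; the class name, the host:port placeholder, and the cached path simply mirror the guide's example rather than a tested deployment:

    import java.net.URI;

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.filecache.DistributedCache;

    public class NativeLibJobSetup {
      // Call this on the job Configuration before submitting the job.
      public static void addNativeLib(Configuration conf) throws Exception {
        // Ask the framework to symlink cached files into each task's
        // working directory.
        DistributedCache.createSymlink(conf);
        // The "#mylib.so" fragment names the symlink the task will see.
        DistributedCache.addCacheFile(
            new URI("hdfs://host:port/libraries/mylib.so.1#mylib.so"), conf);
      }
    }

    // In the MapReduce task itself, load the symlinked library once per
    // JVM, exactly as the guide's snippet does:
    //
    //   static { System.loadLibrary("mylib.so"); }

As the guide's closing note says, this is only needed for your own shared libraries; the native hadoop library itself is picked up from java.library.path.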