This guide describes the native hadoop library and includes a small discussion about native shared libraries.
Note: Depending on your environment, the term “native libraries” could refer to all *.so’s you need to compile; and, the term “native compression” could refer to all *.so’s you need to compile that are specifically related to compression. Currently, however, this document only addresses the native hadoop library (libhadoop.so
). The document for libhdfs library (libhdfs.so
) is here.
Hadoop has native implementations of certain components for performance reasons and for non-availability of Java implementations. These components are available in a single, dynamically-linked native library called the native hadoop library. On the *nix platforms the library is named libhadoop.so
.
It is fairly easy to use the native hadoop library:
The native hadoop library includes various components:
The native hadoop library is supported on *nix platforms only. The library does not to work with Cygwin or the Mac OS X platform.
The native hadoop library is mainly used on the GNU/Linus platform and has been tested on these distributions:
On all the above distributions a 32/64 bit native hadoop library will work with a respective 32/64 bit jvm.
The pre-built 32-bit i386-Linux native hadoop library is available as part of the hadoop distribution and is located in the lib/native
directory. You can download the hadoop distribution from Hadoop Common Releases.
Be sure to install the zlib and/or gzip development packages - whichever compression codecs you want to use with your deployment.
The native hadoop library is written in ANSI C and is built using the GNU autotools-chain (autoconf, autoheader, automake, autoscan, libtool). This means it should be straight-forward to build the library on any platform with a standards-compliant C compiler and the GNU autotools-chain (see the supported platforms).
The packages you need to install on the target platform are:
Once you installed the prerequisite packages use the standard hadoop pom.xml file and pass along the native flag to build the native hadoop library:
$ mvn package -Pdist,native -DskipTests -Dtar
You should see the newly-built library in:
$ hadoop-dist/target/hadoop-3.4.0-SNAPSHOT/lib/native
Please note the following:
The bin/hadoop script ensures that the native hadoop library is on the library path via the system property: -Djava.library.path=<path>
During runtime, check the hadoop log files for your MapReduce tasks.
DEBUG util.NativeCodeLoader - Trying to load the custom-built native-hadoop library...
INFO util.NativeCodeLoader - Loaded the native-hadoop library
INFO util.NativeCodeLoader - Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
NativeLibraryChecker is a tool to check whether native libraries are loaded correctly. You can launch NativeLibraryChecker as follows:
$ hadoop checknative -a 14/12/06 01:30:45 WARN bzip2.Bzip2Factory: Failed to load/initialize native-bzip2 library system-native, will use pure-Java version 14/12/06 01:30:45 INFO zlib.ZlibFactory: Successfully loaded & initialized native-zlib library Native library checking: hadoop: true /home/ozawa/hadoop/lib/native/libhadoop.so.1.0.0 zlib: true /lib/x86_64-linux-gnu/libz.so.1 zstd: true /usr/lib/libzstd.so.1 lz4: true revision:99 bzip2: false
You can load any native shared library using DistributedCache for distributing and symlinking the library files.
This example shows you how to distribute a shared library, mylib.so, and load it from a MapReduce task.
bin/hadoop fs -copyFromLocal mylib.so.1 /libraries/mylib.so.1
DistributedCache.createSymlink(conf);
DistributedCache.addCacheFile("hdfs://host:port/libraries/mylib.so. 1#mylib.so", conf);
System.loadLibrary("mylib.so");
Note: If you downloaded or built the native hadoop library, you don’t need to use DistibutedCache to make the library available to your MapReduce tasks.