From 257ab6525efc3575adaf7f4ef69d52a9bca7d1ec Mon Sep 17 00:00:00 2001
From: stack
Date: Wed, 2 Jul 2014 16:42:32 -0700
Subject: [PATCH] HBASE-11459 Add more doc on compression codecs, how to hook up native lib, lz4, etc.
---
 src/main/docbkx/book.xml | 86 +++++++++++++++++++++++++++++++---------
 1 file changed, 67 insertions(+), 19 deletions(-)

diff --git a/src/main/docbkx/book.xml b/src/main/docbkx/book.xml
index 6c4c9efd248..a5e351752fd 100644
--- a/src/main/docbkx/book.xml
+++ b/src/main/docbkx/book.xml
@@ -4330,7 +4330,20 @@ This option should not normally be used, and it is not in -fixAll.
 Compression In HBase<indexterm><primary>Compression</primary></indexterm>
- There are a bunch of compression options in HBase. There is some helpful discussion
+ There are a bunch of compression options in HBase. Some codecs ship with Java --
+ e.g. gzip -- and so require no additional installation. Others require native
+ libraries. The native libraries may already be present in your Hadoop install, as
+ is the case with lz4; it is then just a matter of making sure the Hadoop native
+ .so files are available to HBase. For other codecs you may have to do extra work
+ to make the codec accessible; for example, if the codec has an Apache-incompatible
+ license, Hadoop cannot bundle the library and you must install it yourself.
+ Below we discuss what is necessary for the common codecs. Whatever codec you use,
+ be sure to test that it is installed properly and is available on all nodes that
+ make up your cluster. Add any necessary operational step to verify the codec is
+ present whenever you add new nodes to your cluster. The compression tool discussed
+ below can help check that a codec is properly installed.
+ As to which codec to use, there is some helpful discussion
 to be found in Documenting Guidance on compression and codecs.
@@ -4341,11 +4354,25 @@ This option should not normally be used, and it is not in -fixAll.
 There is a tool to test that compression is set up properly. To run it, type
 $ ./bin/hbase org.apache.hadoop.hbase.util.CompressionTest.
 This will emit usage on how to run the tool.
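The CompressionTest tool above can be pointed at a file path and a codec name to verify a codec on a node. A hedged sketch of what such an invocation might look like (the file path and the choice of lz4 are illustrative values, and the command assumes you are in $HBASE_HOME):

```shell
# Example per-node codec check (path and codec are illustrative values).
# CompressionTest writes a small file with the named codec and reads it back;
# running the class with no arguments prints full usage.
CMD="./bin/hbase org.apache.hadoop.hbase.util.CompressionTest file:///tmp/test.txt lz4"
echo "$CMD"
```

Run the same check on every node; a codec that loads on one host can still be missing on another.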
- You need to restart the regionserver for it to pick up changes!
 Be aware that the regionserver caches the result of the compression check it runs
 ahead of each region open. This means that you will have to restart the regionserver
 for it to notice that you have fixed any codec issues; e.g. changed symlinks or
 moved lib locations under HBase.

 On the location of native libraries
 Hadoop looks in lib/native for .so files. HBase looks in
 lib/native/PLATFORM. See the bin/hbase script and search it for
 native to see how this works: we figure out what platform we are
 running on by running a little java program,
 org.apache.hadoop.util.PlatformName, and then add
 ./lib/native/PLATFORM to the LD_LIBRARY_PATH
 environment set for the JVM when it starts. The JVM will look in here (as well as
 in any other dirs specified on LD_LIBRARY_PATH) for codec native libs.
 If you are unable to figure out your 'platform', run:
 $ ./bin/hbase org.apache.hadoop.util.PlatformName
 An example platform would be Linux-amd64-64.
@@ -4376,6 +4403,41 @@ This option should not normally be used, and it is not in -fixAll.
+
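The platform lookup described in the note above can be approximated in plain shell. This is a sketch, not the actual bin/hbase logic: it assumes a 64-bit JVM and that uname's x86_64 corresponds to the amd64 string the JVM reports.

```shell
# Approximate the PLATFORM string that bin/hbase derives by running
# org.apache.hadoop.util.PlatformName (roughly os.name-os.arch-bitness).
# Assumes a 64-bit JVM; maps uname's x86_64 to the JVM's amd64.
OS="$(uname -s)"
ARCH="$(uname -m | sed 's/^x86_64$/amd64/')"
PLATFORM="${OS}-${ARCH}-64"
echo "$PLATFORM"
# HBase then prepends ./lib/native/$PLATFORM to LD_LIBRARY_PATH on start.
```

On a 64-bit x86 Linux host this prints Linux-amd64-64, matching the example platform above; when in doubt, prefer the output of PlatformName itself.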
+
 GZIP

 GZIP will generally compress better than LZO but it will run slower.
 For some setups, better compression may be preferred (e.g. for 'cold' data).
 Java will use java's built-in GZIP unless the native Hadoop libs are
 available on the CLASSPATH, in which case it will use the native
 compressors instead. (If the native libs are NOT present,
 you will see lots of Got brand-new compressor
 reports in your logs; see ).
+ +
+
 LZ4

 LZ4 is bundled with Hadoop. Make sure the Hadoop native .so is
 accessible when you start HBase. One means of doing this, after figuring out
 your platform (see the note on the location of native libraries above), is to
 make a symlink from HBase to the native Hadoop libraries, presuming the two
 software installs are colocated. For example, if my 'platform' is Linux-amd64-64:
 $ cd $HBASE_HOME
$ mkdir lib/native
$ ln -s $HADOOP_HOME/lib/native lib/native/Linux-amd64-64
 Use the compression tool to check that lz4 is installed on all nodes.
 Start up (or restart) HBase. From here on out you will be able to create
 and alter tables to enable LZ4 as a compression codec. E.g.:
 hbase(main):003:0> alter 'TestTable', {NAME => 'info', COMPRESSION => 'LZ4'}
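After creating the symlink it is worth confirming that a native Hadoop library is actually visible under HBase's platform directory before restarting. A sketch, assuming colocated installs; HBASE_HOME, the platform string, and the libhadoop.so file name are example values to adjust for your hosts:

```shell
# Verify a native Hadoop lib is reachable via HBase's lib/native/PLATFORM
# (example values; substitute your own HBASE_HOME and PlatformName output).
HBASE_HOME="${HBASE_HOME:-/opt/hbase}"
PLATFORM="${PLATFORM:-Linux-amd64-64}"
LIB="$HBASE_HOME/lib/native/$PLATFORM/libhadoop.so"
if [ -e "$LIB" ]; then
  echo "found $LIB; native codecs such as lz4 should be loadable"
else
  echo "missing $LIB; lz4 will not be available until this is fixed"
fi
```

A check like this is a candidate for the per-node verification step recommended earlier when adding nodes to the cluster.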
+
LZO @@ -4395,20 +4457,6 @@ This option should not normally be used, and it is not in <code>-fixAll</code>. for a feature to help protect against failed LZO install.</para> </section> - <section xml:id="gzip.compression"> - <title> - GZIP - - - GZIP will generally compress better than LZO though slower. - For some setups, better compression may be preferred. - Java will use java's GZIP unless the native Hadoop libs are - available on the CLASSPATH; in this case it will use native - compressors instead (If the native libs are NOT present, - you will see lots of Got brand-new compressor - reports in your logs; see ). - -
SNAPPY