HBASE-7264 Improve Snappy installation documentation

git-svn-id: https://svn.apache.org/repos/asf/hbase/trunk@1416667 13f79535-47bb-0310-9956-ffa450edef68
Michael Stack 2012-12-03 21:20:28 +00:00
parent d269cd2687
commit b3271bd279
1 changed file with 34 additions and 6 deletions


@@ -300,7 +300,7 @@ try {
 <title>Versions<indexterm><primary>Versions</primary></indexterm></title>
 <para>A <emphasis>{row, column, version} </emphasis>tuple exactly
-specifies a <literal>cell</literal> in HBase. Its possible to have an
+specifies a <literal>cell</literal> in HBase. It's possible to have an
 unbounded number of cells where the row and column are the same but the
 cell address differs only in its version dimension.</para>
@@ -633,7 +633,7 @@ admin.enableTable(table);
 <para>
 In the HBase chapter of Tom White's book <link xlink:url="http://oreilly.com/catalog/9780596521981">Hadoop: The Definitive Guide</link> (O'Reilly) there is a an optimization note on watching out for a phenomenon where an import process walks in lock-step with all clients in concert pounding one of the table's regions (and thus, a single node), then moving onto the next region, etc. With monotonically increasing row-keys (i.e., using a timestamp), this will happen. See this comic by IKai Lan on why monotonically increasing row keys are problematic in BigTable-like datastores:
 <link xlink:href="http://ikaisays.com/2011/01/25/app-engine-datastore-tip-monotonically-increasing-values-are-bad/">monotonically increasing values are bad</link>. The pile-up on a single region brought on
-by monotonically increasing keys can be mitigated by randomizing the input records to not be in sorted order, but in general its best to avoid using a timestamp or a sequence (e.g. 1, 2, 3) as the row-key.
+by monotonically increasing keys can be mitigated by randomizing the input records to not be in sorted order, but in general it's best to avoid using a timestamp or a sequence (e.g. 1, 2, 3) as the row-key.
 </para>
@@ -1448,7 +1448,7 @@ if (!b) {
 After the region has been split eventually this row will be deleted.
 </para>
 <para>Notes on HRegionInfo: the empty key is used to denote table start and table end. A region with an empty start key
-is the first region in a table. If region has both an empty start and an empty end key, its the only region in the table
+is the first region in a table. If region has both an empty start and an empty end key, it's the only region in the table
 </para>
 <para>In the (hopefully unlikely) event that programmatic processing of catalog metadata is required, see the
 <link xlink:href="http://hbase.apache.org/apidocs/org/apache/hadoop/hbase/util/Writables.html#getHRegionInfo%28byte[]%29">Writables</link> utility.
@@ -2037,7 +2037,7 @@ rs.close();
 </listitem>
 </itemizedlist>
-<para>When starting off, its probably best to stick to the default region-size, perhaps going
+<para>When starting off, it's probably best to stick to the default region-size, perhaps going
 smaller for hot tables (or manually split hot regions to spread the load over
 the cluster), or go with larger region sizes if your cell sizes tend to be
 largish (100k and up).</para>
@@ -3030,7 +3030,7 @@ This option should not normally be used, and it is not in <code>-fixAll</code>.
 <listitem>
 <para>
 Build and install <link xlink:href="http://code.google.com/p/snappy/">snappy</link> on all nodes
-of your cluster.
+of your cluster (see below)
 </para>
 </listitem>
 <listitem>
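
The "Build and install snappy on all nodes of your cluster" step above is an ordinary autotools build. As a hedged illustration only (the download location, the 1.0.5 version, and the /home/hbase build directory are example values consistent with the paths used in the Installation section added below, not requirements), the per-node build might look like:

# Example only: build the Snappy native library on one node; repeat on every node of the cluster.
cd /home/hbase
tar xzf snappy-1.0.5.tar.gz      # source tarball obtained from the Snappy project site
cd snappy-1.0.5
./configure && make              # produces .libs/libsnappy.so.1.x.x, referenced in the Installation section below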
@@ -3051,6 +3051,34 @@ hbase> describe 't1'</programlisting>
 </orderedlist>
 </para>
+<section xml:id="snappy.compression.installation">
+<title>
+Installation
+</title>
+<para>
+In order to install Snappy on your HBase server, you need to make sure that both the Snappy and Hadoop native libraries are
+copied to the right place. Native libraries need to be installed under ${HBASE_HOME}/lib/native/Linux-amd64-64 or
+${HBASE_HOME}/lib/native/Linux-i386-32, depending on your server architecture.
+</para>
+<para>
+You will find the Snappy library file under the .libs directory of your Snappy build (for example
+/home/hbase/snappy-1.0.5/.libs/). The file is called libsnappy.so.1.x.x, where 1.x.x is the version of the Snappy
+code you are building. You can either copy this file into your HBase directory under the name libsnappy.so, or simply
+create a symbolic link to it.
+</para>
+<para>
+The second file you need is the Hadoop native library. You will find this file in your Hadoop installation directory
+under lib/native/Linux-amd64-64/ or lib/native/Linux-i386-32/. The file you are looking for is libhadoop.so.1.x.x.
+Again, you can simply copy this file or link to it, under the name libhadoop.so.
+</para>
+<para>
+At the end of the installation, you should have both the libsnappy.so and libhadoop.so links or files present in
+lib/native/Linux-amd64-64 or in lib/native/Linux-i386-32.
+</para>
+</section>
 </section>
 <section xml:id="changing.compression">
 <title>Changing Compression Schemes</title>
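
To make the Installation prose added above concrete, here is a minimal shell sketch of the copy-or-symlink steps for a 64-bit Linux server. Only the target directory under ${HBASE_HOME}/lib/native comes from the text; the Snappy build directory, the Hadoop install path, and the CompressionTest sanity check at the end are assumptions for illustration. Substitute the paths that apply to your hosts, and use Linux-i386-32 on 32-bit servers.

# Minimal sketch with assumed example paths; adjust for your environment.
SNAPPY_BUILD=/home/hbase/snappy-1.0.5                  # directory where Snappy was built (contains .libs/)
HADOOP_HOME=/usr/lib/hadoop                            # hypothetical Hadoop installation directory
NATIVE_DIR="${HBASE_HOME}/lib/native/Linux-amd64-64"   # from the text above; Linux-i386-32 on 32-bit servers

mkdir -p "${NATIVE_DIR}"

# Expose the versioned Snappy library under the name libsnappy.so (symlink instead of copy).
snappy_lib=$(ls "${SNAPPY_BUILD}"/.libs/libsnappy.so.1.* | head -n 1)
ln -sf "${snappy_lib}" "${NATIVE_DIR}/libsnappy.so"

# Do the same for the Hadoop native library: libhadoop.so.1.x.x -> libhadoop.so.
hadoop_lib=$(ls "${HADOOP_HOME}"/lib/native/Linux-amd64-64/libhadoop.so.1.* | head -n 1)
ln -sf "${hadoop_lib}" "${NATIVE_DIR}/libhadoop.so"

# Optional sanity check (assumes the CompressionTest tool shipped with your HBase version).
hbase org.apache.hadoop.hbase.util.CompressionTest file:///tmp/snappy-test snappy

Symlinking rather than copying keeps the versioned library files in place, so upgrading Snappy or Hadoop later only requires re-pointing the links.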
@@ -3066,7 +3094,7 @@ hbase> describe 't1'</programlisting>
 <title xml:id="ycsb"><link xlink:href="https://github.com/brianfrankcooper/YCSB/">YCSB: The Yahoo! Cloud Serving Benchmark</link> and HBase</title>
 <para>TODO: Describe how YCSB is poor for putting up a decent cluster load.</para>
 <para>TODO: Describe setup of YCSB for HBase</para>
-<para>Ted Dunning redid YCSB so its mavenized and added facility for verifying workloads. See <link xlink:href="https://github.com/tdunning/YCSB">Ted Dunning's YCSB</link>.</para>
+<para>Ted Dunning redid YCSB so it's mavenized and added facility for verifying workloads. See <link xlink:href="https://github.com/tdunning/YCSB">Ted Dunning's YCSB</link>.</para>
 </appendix>