diff --git a/src/docbkx/book.xml b/src/docbkx/book.xml index fba4881208c..afafa2132e7 100644 --- a/src/docbkx/book.xml +++ b/src/docbkx/book.xml @@ -300,7 +300,7 @@ try { Versions<indexterm><primary>Versions</primary></indexterm> A {row, column, version} tuple exactly - specifies a cell in HBase. Its possible to have an + specifies a cell in HBase. It's possible to have an unbounded number of cells where the row and column are the same but the cell address differs only in its version dimension. @@ -633,7 +633,7 @@ admin.enableTable(table); In the HBase chapter of Tom White's book Hadoop: The Definitive Guide (O'Reilly) there is an optimization note on watching out for a phenomenon where an import process walks in lock-step with all clients in concert pounding one of the table's regions (and thus, a single node), then moving on to the next region, etc. With monotonically increasing row-keys (i.e., using a timestamp), this will happen. See this comic by IKai Lan on why monotonically increasing row keys are problematic in BigTable-like datastores: monotonically increasing values are bad. The pile-up on a single region brought on - by monotonically increasing keys can be mitigated by randomizing the input records to not be in sorted order, but in general its best to avoid using a timestamp or a sequence (e.g. 1, 2, 3) as the row-key. + by monotonically increasing keys can be mitigated by randomizing the input records to not be in sorted order, but in general it's best to avoid using a timestamp or a sequence (e.g. 1, 2, 3) as the row-key. @@ -1448,7 +1448,7 @@ if (!b) { After the region has been split, this row will eventually be deleted. Notes on HRegionInfo: the empty key is used to denote table start and table end. A region with an empty start key - is the first region in a table. If region has both an empty start and an empty end key, its the only region in the table
If a region has both an empty start and an empty end key, it's the only region in the table. In the (hopefully unlikely) event that programmatic processing of catalog metadata is required, see the Writables utility. @@ -2037,7 +2037,7 @@ rs.close(); - When starting off, its probably best to stick to the default region-size, perhaps going + When starting off, it's probably best to stick to the default region-size, perhaps going smaller for hot tables (or manually split hot regions to spread the load over the cluster), or go with larger region sizes if your cell sizes tend to be largish (100k and up). @@ -3030,7 +3030,7 @@ This option should not normally be used, and it is not in -fixAll. Build and install snappy on all nodes - of your cluster. + of your cluster (see below). @@ -3051,6 +3051,34 @@ hbase> describe 't1' +
+ + Installation + + + In order to install Snappy on your HBase server, you need to make sure both the Snappy and Hadoop native libraries are + copied to the right place. Native libraries need to be installed under ${HBASE_HOME}/lib/native/Linux-amd64-64 or + ${HBASE_HOME}/lib/native/Linux-i386-32 depending on your server architecture. + + + + You will find the Snappy library file under the .libs directory from your Snappy build (for example + /home/hbase/snappy-1.0.5/.libs/). The file is called libsnappy.so.1.x.x, where 1.x.x is the version of the Snappy + code you are building. You can either copy this file into your HBase directory under the name libsnappy.so, or simply + create a symbolic link to it. + + + + The second file you need is the Hadoop native library. You will find this file in your Hadoop installation directory + under lib/native/Linux-amd64-64/ or lib/native/Linux-i386-32/. The file you are looking for is libhadoop.so.1.x.x. + Again, you can simply copy this file or link to it, under the name libhadoop.so. + + + + At the end of the installation, you should have both libsnappy.so and libhadoop.so links or files present in + lib/native/Linux-amd64-64 or in lib/native/Linux-i386-32. +
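The copy-or-symlink steps described above can be sketched as a short shell snippet. All paths and version numbers below are placeholders, not taken from the original text; substitute your real HBase install directory, Snappy build directory, Hadoop install directory, and the actual library versions you built.

```shell
# Sketch only: every path here is an assumed example, not a required layout.
HBASE_HOME=${HBASE_HOME:-/tmp/hbase-demo}
SNAPPY_BUILD=${SNAPPY_BUILD:-/home/hbase/snappy-1.0.5}
HADOOP_HOME=${HADOOP_HOME:-/usr/lib/hadoop}

# Pick the directory matching your architecture
# (use Linux-i386-32 on 32-bit servers).
NATIVE_DIR="$HBASE_HOME/lib/native/Linux-amd64-64"
mkdir -p "$NATIVE_DIR"

# Link the versioned libraries under the unversioned names HBase looks for.
# "1.x.x" stands in for whatever versions you actually have; note that
# ln -sf will happily create a dangling link, so check the links resolve.
ln -sf "$SNAPPY_BUILD/.libs/libsnappy.so.1.x.x" "$NATIVE_DIR/libsnappy.so"
ln -sf "$HADOOP_HOME/lib/native/Linux-amd64-64/libhadoop.so.1.x.x" "$NATIVE_DIR/libhadoop.so"
```

Once the links are in place, a reasonable sanity check is HBase's CompressionTest utility (e.g. `hbase org.apache.hadoop.hbase.util.CompressionTest` against a test file with the `snappy` codec), which fails fast if the native libraries cannot be loaded.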
Changing Compression Schemes @@ -3066,7 +3094,7 @@ hbase> describe 't1' <link xlink:href="https://github.com/brianfrankcooper/YCSB/">YCSB: The Yahoo! Cloud Serving Benchmark</link> and HBase TODO: Describe how YCSB is poor for putting up a decent cluster load. TODO: Describe setup of YCSB for HBase - Ted Dunning redid YCSB so its mavenized and added facility for verifying workloads. See Ted Dunning's YCSB. + Ted Dunning redid YCSB so it's mavenized and added a facility for verifying workloads. See Ted Dunning's YCSB.