diff --git a/src/docbkx/book.xml b/src/docbkx/book.xml
index 0f7695c098a..638441563b8 100644
--- a/src/docbkx/book.xml
+++ b/src/docbkx/book.xml
@@ -1064,6 +1064,7 @@ to ensure well-formedness of your document after an edit session.
+
LZO compression
You should consider enabling LZO compression. Its
@@ -1084,7 +1085,72 @@ to ensure well-formedness of your document after an edit session.
hbase.regionserver.codecs
for a feature to help protect against failed LZO install.
+ See also the Compression Appendix
+ at the tail of this book.
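+
+      As a rough sketch of the codec-check feature mentioned above: list the
+      codecs a regionserver must be able to load in your
+      hbase-site.xml. The values below are illustrative, not required;
+      the regionserver will refuse to start if any listed codec fails to load:
+
+      <property>
+        <name>hbase.regionserver.codecs</name>
+        <value>lzo,gz</value>
+      </property>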
+
+ Bigger Regions
+
+      Consider going to larger regions to cut down on the total number of regions
+      on your cluster. Generally, fewer regions to manage makes for a smoother-running
+      cluster (you can always manually split a big region later should it prove
+      hot and you want to spread the request load over the cluster). By default,
+      regions are 256MB in size. You could run with
+      1GB. Some run with even larger regions, 4GB or more. Adjust
+      hbase.hregion.max.filesize in your hbase-site.xml.
+
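+      A minimal sketch of such a change in hbase-site.xml; the value shown is
+      illustrative only (1073741824 bytes is 1GB), so adjust it to your own data sizes:
+
+      <property>
+        <name>hbase.hregion.max.filesize</name>
+        <value>1073741824</value>
+      </property>
+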
+
+
+ Managed Splitting
+
+      Rather than let HBase auto-split your regions, manage the splitting manually.
+      (What follows is taken from the javadoc at the head of
+      the org.apache.hadoop.hbase.util.RegionSplitter tool,
+      added to HBase post-0.90.0.)
+
+      With growing amounts of data, splits will continually be needed. Since
+      you always know exactly what regions you have, long-term debugging and
+      profiling is much easier with manual splits. It is hard to trace the logs to
+      understand region-level problems if regions keep splitting and getting renamed.
+      Data offlining bugs + an unknown number of split regions == oh crap! If an
+      HLog or StoreFile
+      was mistakenly unprocessed by HBase due to a weird bug and
+      you notice it a day or so later, you can be assured that the regions
+      specified in these files are the same as the current regions, and you will have
+      fewer headaches trying to restore/replay your data.
+      You can finely tune your compaction algorithm. With roughly uniform data
+      growth, it is easy to cause split/compaction storms as the regions all
+      roughly hit the same data size at the same time. With manual splits, you can
+      let staggered, time-based major compactions spread out your network IO load.
+
+
+      How do I turn off automatic splitting? Automatic splitting is determined by the configuration value
+      hbase.hregion.max.filesize. Setting this to Long.MAX_VALUE is not recommended,
+      in case you forget about manual splits. A suggested setting
+      is 100GB, which would result in major compactions of more than an hour if ever reached.
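+
+      As a hedged sketch of such a setting in hbase-site.xml
+      (107374182400 bytes is 100GB; pick a value that suits your cluster):
+
+      <property>
+        <name>hbase.hregion.max.filesize</name>
+        <value>107374182400</value>
+      </property>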
+
+      What's the optimal number of pre-split regions to create?
+      Mileage will vary depending upon your application.
+      You could start low with 10 pre-split regions per server and watch as data grows
+      over time. It is better to err on the side of too few regions and to roll-split later.
+      A more complicated answer is that this depends upon the largest storefile
+      in your region. With a growing data size, this will get larger over time. You
+      want the largest region to be just big enough that the Store compact
+      selection algorithm only compacts it due to a timed major compaction. If you don't, your
+      cluster can be prone to compaction storms as the algorithm decides to run
+      major compactions on a large series of regions all at once. Note that
+      compaction storms are due to the uniform data growth, not the manual split
+      decision.
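+
+      As a rough, version-dependent sketch: the exact flags and split-algorithm
+      names are best confirmed against the RegionSplitter javadoc for your release,
+      and the table name myTable and column family f1 below are made up for
+      illustration. Pre-splitting a new table into 60 regions might look like:
+
+      $ bin/hbase org.apache.hadoop.hbase.util.RegionSplitter myTable HexStringSplit -c 60 -f f1
+
+      Running the class with no arguments prints its usage and the available options.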
+
+      If you pre-split your regions too thin, you can increase the major compaction
+      interval by configuring HConstants.MAJOR_COMPACTION_PERIOD. If your data size
+      grows too large, use the (post-0.90.0 HBase) org.apache.hadoop.hbase.util.RegionSplitter
+      script to perform a network-IO-safe rolling split
+      of all regions.
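+
+      As a hedged example, HConstants.MAJOR_COMPACTION_PERIOD corresponds to the
+      hbase.hregion.majorcompaction property in hbase-site.xml (milliseconds
+      between timed major compactions); the one-week value below is illustrative:
+
+      <property>
+        <name>hbase.hregion.majorcompaction</name>
+        <value>604800000</value>
+      </property>
+
+      A rolling split with RegionSplitter is again version-dependent, so consult the
+      tool's javadoc; an invocation along these lines is typical (argument order and
+      flags may differ in your release; -o caps the number of outstanding split regions):
+
+      $ bin/hbase org.apache.hadoop.hbase.util.RegionSplitter myTable HexStringSplit -r -o 2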
+
+
+
@@ -1861,18 +1927,29 @@ to ensure well-formedness of your document after an edit session.
-
+
LZO
- Running with LZO enabled is recommended though HBase does not ship with
- LZO because of licensing issues. See the HBase wiki page
- Using LZO Compression
- for help installing LZO.
+ See LZO Compression above.
+
+
+ GZIP
+
+
+        GZIP will generally compress better than LZO, though it is slower.
+        For some setups, better compression may be preferred.
+        HBase will use Java's built-in GZIP unless the native Hadoop libraries are
+        available on the CLASSPATH, in which case it will use the native
+        compressors instead (if the native libs are NOT present,
+        you will see lots of Got brand-new compressor
+        reports in your logs; TO BE FIXED).
+
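+        Compression is set per column family rather than in hbase-site.xml.
+        As a small sketch using the HBase shell (the table and family names are
+        made up for illustration; LZO works the same way with COMPRESSION => 'LZO'):
+
+        create 'myTable', {NAME => 'f1', COMPRESSION => 'GZ'}
+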
+