diff --git a/src/docbkx/book.xml b/src/docbkx/book.xml index 0f7695c098a..638441563b8 100644 --- a/src/docbkx/book.xml +++ b/src/docbkx/book.xml @@ -1064,6 +1064,7 @@ to ensure well-formedness of your document after an edit session. +
LZO compression You should consider enabling LZO compression. Its @@ -1084,7 +1085,72 @@ to ensure well-formedness of your document after an edit session. hbase.regionserver.codecs for a feature to help protect against failed LZO install. + See also the Compression Appendix + at the tail of this book.
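+ As a minimal sketch of the codec-check feature mentioned above (the value shown
+ is only an illustration; list whichever codecs your tables depend on), you might
+ add the following to hbase-site.xml:
+ <property>
+   <name>hbase.regionserver.codecs</name>
+   <value>lzo</value>
+ </property>
+ A RegionServer configured this way refuses to start if it cannot load the listed
+ codecs, so a failed LZO install is caught at startup rather than surfacing later
+ as data you cannot read.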
+
+ Bigger Regions
+
+ Consider going to larger regions to cut down on the total number of regions
+ on your cluster. Generally, fewer Regions to manage makes for a smoother-running
+ cluster (you can always manually split the big Regions later should one prove
+ hot and you want to spread the request load over the cluster). By default,
+ regions are 256MB in size. You could instead run with 1GB regions, and some
+ installations run with even larger regions of 4GB or more. Adjust
+ hbase.hregion.max.filesize in your hbase-site.xml.
+
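+ For example, a hbase-site.xml fragment along these lines (the 1GB figure is
+ illustrative; pick whatever size suits your data) raises the split threshold to 1GB:
+ <property>
+   <name>hbase.hregion.max.filesize</name>
+   <value>1073741824</value>
+ </property>
+ The value is in bytes, so 1073741824 is 1GB.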
+
+ Managed Splitting
+
+ Rather than let HBase auto-split your Regions, manage the splitting manually
+ (what follows is taken from the javadoc at the head of
+ the org.apache.hadoop.hbase.util.RegionSplitter tool,
+ added to HBase post-0.90.0 release).
+
+ With growing amounts of data, splits will continually be needed. Since
+ you always know exactly what regions you have, long-term debugging and
+ profiling is much easier with manual splits. It is hard to trace the logs to
+ understand region-level problems if regions keep splitting and getting renamed.
+ Data offlining bugs + unknown number of split regions == oh crap! If an
+ HLog or StoreFile
+ was mistakenly unprocessed by HBase due to a weird bug and
+ you notice it a day or so later, you can be assured that the regions
+ specified in these files are the same as the current regions, and you have
+ fewer headaches trying to restore/replay your data.
+ You can also finely tune your compaction algorithm. With roughly uniform data
+ growth, it is easy to cause split/compaction storms as the regions all
+ roughly hit the same data size at the same time. With manual splits, you can
+ let staggered, time-based major compactions spread out your network IO load.
+
+ How do I turn off automatic splitting? Automatic splitting is determined by the configuration value
+ hbase.hregion.max.filesize. Setting it to Long.MAX_VALUE is not recommended,
+ in case you ever forget to do your manual splits. A suggested setting
+ is 100GB; a region that actually reached that size would take more than an hour to major compact.
+
+ What's the optimal number of pre-split regions to create?
+ Mileage will vary depending upon your application.
+ You could start low with 10 pre-split regions per server and watch as data grows
+ over time. It is better to err on the side of too few regions and rolling split later.
+ A more complicated answer is that this depends upon the largest storefile
+ in your region. With a growing data size, this will get larger over time. You
+ want the largest region to be just big enough that the Store compaction
+ selection algorithm only compacts it because of a timed major compaction. Otherwise, your
+ cluster can be prone to compaction storms as the algorithm decides to run
+ major compactions on a large series of regions all at once. Note that
+ compaction storms are due to the uniform data growth, not the manual split
+ decision.
+
+ If you pre-split your regions too thin, you can increase the major compaction
+ interval by configuring HConstants.MAJOR_COMPACTION_PERIOD. If your data size
+ grows too large, use the (post-0.90.0 HBase) org.apache.hadoop.hbase.util.RegionSplitter
+ script to perform a network-IO-safe rolling split
+ of all regions.
+
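+ As a sketch, the following hbase-site.xml fragment raises the split threshold
+ to 100GB (effectively turning off automatic splits, per the discussion above) and,
+ assuming hbase.hregion.majorcompaction is the property behind
+ HConstants.MAJOR_COMPACTION_PERIOD, stretches the timed major compaction
+ interval to one week (both values are illustrative):
+ <property>
+   <name>hbase.hregion.max.filesize</name>
+   <value>107374182400</value>
+ </property>
+ <property>
+   <name>hbase.hregion.majorcompaction</name>
+   <value>604800000</value>
+ </property>
+ The values are bytes and milliseconds respectively. You would then pre-split
+ tables yourself and use the RegionSplitter rolling split when regions grow too large.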
+ @@ -1861,18 +1927,29 @@ to ensure well-formedness of your document after an edit session. -
+
LZO - Running with LZO enabled is recommended though HBase does not ship with - LZO because of licensing issues. See the HBase wiki page - Using LZO Compression - for help installing LZO. + See LZO Compression above.
+
+
+ GZIP
+
+ GZIP will generally compress better than LZO, though it is slower.
+ For some setups, better compression may be preferred.
+ HBase will use Java's built-in GZIP unless the native Hadoop libraries are
+ available on the CLASSPATH, in which case it will use the native
+ compressors instead (if the native libs are NOT present,
+ you will see lots of Got brand-new compressor
+ reports in your logs; TO BE FIXED).
+