From bcfc6d65af7860c83ec147f673cd1ff8970290c4 Mon Sep 17 00:00:00 2001
From: Jonathan M Hsieh
Date: Wed, 3 Sep 2014 16:24:59 -0700
Subject: [PATCH] HBASE-11781 Document new TableMapReduceUtil scanning options (Misty Stanley-Jones)

---
 src/main/docbkx/book.xml | 36 ++++++++++++++++++++++++++++++++++++
 1 file changed, 36 insertions(+)

diff --git a/src/main/docbkx/book.xml b/src/main/docbkx/book.xml
index 3d8216b19d2..19dd770157b 100644
--- a/src/main/docbkx/book.xml
+++ b/src/main/docbkx/book.xml
@@ -1048,6 +1048,42 @@ $ HADOOP_CLASSPATH=$(hbase mapredcp):/etc/hbase/conf hadoop jar MyApp
+
+ MapReduce Scan Caching
+
+ TableMapReduceUtil now restores the option to set scanner caching (the number of rows
+ that are cached before the result is returned to the client) on the Scan object that is
+ passed in. This functionality was lost due to a bug in HBase 0.95 (HBASE-11558), which
+ is fixed in HBase 0.98.5 and 0.96.3. The scanner caching value is chosen in the following
+ priority order:
+
+ 1. Caching settings that are set on the Scan object.
+ 2. Caching settings that are specified via the scanner caching configuration option,
+    which can be set either manually in hbase-site.xml or via the helper method
+    TableMapReduceUtil.setScannerCaching().
+ 3. The default value HConstants.DEFAULT_HBASE_CLIENT_SCANNER_CACHING, which is set to
+    100.
+
+ Optimizing the caching settings is a balance between the time the client waits for a
+ result and the number of sets of results the client needs to receive. If the caching setting
+ is too large, the client could end up waiting for a long time or the request could even time
+ out. If the setting is too small, the scan has to return its results in many small pieces.
+ If you think of the scan as a shovel, a bigger cache setting is analogous to a bigger
+ shovel, and a smaller cache setting is equivalent to more shoveling in order to fill the
+ bucket.
+
+ The priority order listed above allows you to set a reasonable default and
+ override it for specific operations.
+
+ See the API documentation for Scan for more details.
+
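The priority order and the shovel analogy above can be sketched in plain Java. This is an illustrative model only, not the HBase implementation: the class `ScanCachingPriority` and its methods `resolve` and `roundTrips` are hypothetical names, the sketch assumes that a non-positive value means "caching not set", and the round-trip count is a rough lower bound that ignores timeouts and any extra call needed to detect the end of the scan.

```java
// Illustrative model only: plain Java with no HBase dependency.
// All names here are hypothetical and exist only for this sketch.
final class ScanCachingPriority {

    // Mirrors the default of 100 quoted for
    // HConstants.DEFAULT_HBASE_CLIENT_SCANNER_CACHING.
    static final int DEFAULT_CACHING = 100;

    /**
     * Picks the effective caching value using the documented priority order.
     *
     * @param scanCaching       value set on the scan; assumption: <= 0 means "not set"
     * @param configuredCaching value from configuration or the helper method, or null if absent
     */
    static int resolve(int scanCaching, Integer configuredCaching) {
        if (scanCaching > 0) {
            return scanCaching;       // 1. the setting on the scan object wins
        }
        if (configuredCaching != null && configuredCaching > 0) {
            return configuredCaching; // 2. then the configured value
        }
        return DEFAULT_CACHING;       // 3. otherwise the built-in default
    }

    /** Rough lower bound on result batches: ceil(totalRows / caching). */
    static long roundTrips(long totalRows, int caching) {
        return (totalRows + caching - 1) / caching;
    }

    public static void main(String[] args) {
        System.out.println(resolve(500, 200));     // prints 500
        System.out.println(resolve(-1, 200));      // prints 200
        System.out.println(resolve(-1, null));     // prints 100
        System.out.println(roundTrips(1000, 100)); // prints 10
        System.out.println(roundTrips(1000, 1));   // prints 1000
    }
}
```

In this model a job-wide default of 200 applies to every scan that does not set its own caching, while a single scan that sets 500 overrides it. The last two calls show the trade-off: a bigger shovel means fewer, larger batches, and a smaller one means many more trips.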
Bundled HBase MapReduce Jobs