diff --git a/src/main/docbkx/book.xml b/src/main/docbkx/book.xml
index 59e3fd4aade..9233035974b 100644
--- a/src/main/docbkx/book.xml
+++ b/src/main/docbkx/book.xml
@@ -1040,6 +1040,42 @@ $ HADOOP_CLASSPATH=$(hbase mapredcp):/etc/hbase/conf hadoop jar MyApp
+    <section>
+      <title>MapReduce Scan Caching</title>
+      <para>TableMapReduceUtil now restores the option to set scanner caching (the number of rows
+        which are cached before returning the result to the client) on the Scan object that is
+        passed in. This functionality was lost due to a bug in HBase 0.95 (<link
+          xlink:href="https://issues.apache.org/jira/browse/HBASE-11558">HBASE-11558</link>),
+        which is fixed for HBase 0.98.5 and 0.96.3. The priority order for choosing the scanner
+        caching is as follows:</para>
+      <orderedlist>
+        <listitem>
+          <para>Caching settings which are set on the Scan object.</para>
+        </listitem>
+        <listitem>
+          <para>Caching settings which are specified via the configuration option
+            <option>hbase.client.scanner.caching</option>, which can be set either manually in
+            <filename>hbase-site.xml</filename> or via the helper method
+            <code>TableMapReduceUtil.setScannerCaching()</code>.</para>
+        </listitem>
+        <listitem>
+          <para>The default value <code>HConstants.DEFAULT_HBASE_CLIENT_SCANNER_CACHING</code>,
+            which is set to 100.</para>
+        </listitem>
+      </orderedlist>
+      <para>Optimizing the caching settings is a balance between the time the client waits for a
+        result and the number of sets of results the client needs to receive. If the caching
+        setting is too large, the client could end up waiting for a long time or the request
+        could even time out. If the setting is too small, the scan needs to return results in
+        several pieces. If you think of the scan as a shovel, a bigger cache setting is analogous
+        to a bigger shovel, and a smaller cache setting means more trips with the shovel to fill
+        the bucket.</para>
+      <para>The priority order above allows you to set a reasonable default and override it for
+        specific operations.</para>
+      <para>See the API documentation for <link
+          xlink:href="https://hbase.apache.org/apidocs/org/apache/hadoop/hbase/client/Scan.html">Scan</link>
+        for more details.</para>
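+      <para>As an illustration, here is a minimal sketch of a job setup that sets the caching on
+        the Scan object, which takes the highest priority. The table name and caching value are
+        illustrative, and the output-format configuration is omitted for brevity.</para>
+      <programlisting language="java">
+import org.apache.hadoop.conf.Configuration;
+import org.apache.hadoop.hbase.HBaseConfiguration;
+import org.apache.hadoop.hbase.client.Result;
+import org.apache.hadoop.hbase.client.Scan;
+import org.apache.hadoop.hbase.io.ImmutableBytesWritable;
+import org.apache.hadoop.hbase.mapreduce.IdentityTableMapper;
+import org.apache.hadoop.hbase.mapreduce.TableMapReduceUtil;
+import org.apache.hadoop.mapreduce.Job;
+
+public class ScanCachingExample {
+  public static void main(String[] args) throws Exception {
+    Configuration conf = HBaseConfiguration.create();
+    Job job = Job.getInstance(conf, "ScanCachingExample");
+
+    // Highest priority: caching set directly on the Scan object.
+    Scan scan = new Scan();
+    scan.setCaching(500);        // illustrative value; overrides any configured default
+    scan.setCacheBlocks(false);  // commonly disabled for full scans in MapReduce jobs
+
+    // Lower-priority alternative: set hbase.client.scanner.caching instead,
+    // for example via TableMapReduceUtil.setScannerCaching(job, 500).
+    // It applies only when no caching is set on the Scan itself.
+
+    TableMapReduceUtil.initTableMapperJob(
+        "my_table",                    // input table name (illustrative)
+        scan,
+        IdentityTableMapper.class,     // pass-through mapper bundled with HBase
+        ImmutableBytesWritable.class,  // mapper output key class
+        Result.class,                  // mapper output value class
+        job);
+
+    job.setNumReduceTasks(0);  // map-only job; output setup omitted for brevity
+    System.exit(job.waitForCompletion(true) ? 0 : 1);
+  }
+}
+      </programlisting>
+    </section>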
     <section>
       <title>Bundled HBase MapReduce Jobs</title>