HBASE-11781 Document new TableMapReduceUtil scanning options (Misty Stanley-Jones)

Jonathan M Hsieh 2014-09-03 16:24:59 -07:00
parent 1a6eea335f
commit bcfc6d65af
1 changed file with 36 additions and 0 deletions


@@ -1048,6 +1048,42 @@ $ <userinput>HADOOP_CLASSPATH=$(hbase mapredcp):/etc/hbase/conf hadoop jar MyApp
</caution>
</section>
<section>
<title>MapReduce Scan Caching</title>
<para>TableMapReduceUtil once again honors scanner caching (the number of rows that are
cached before the result is returned to the client) when it is set on the Scan object that
is passed in. This functionality was lost due to a bug in HBase 0.95 (<link
xlink:href="https://issues.apache.org/jira/browse/HBASE-11558">HBASE-11558</link>), which
is fixed in HBase 0.98.5 and 0.96.3. The scanner caching value is chosen in the following
priority order:</para>
<orderedlist>
<listitem>
<para>Caching settings that are set on the Scan object.</para>
</listitem>
<listitem>
<para>Caching settings which are specified via the configuration option
<option>hbase.client.scanner.caching</option>, which can either be set manually in
<filename>hbase-site.xml</filename> or via the helper method
<code>TableMapReduceUtil.setScannerCaching()</code>.</para>
</listitem>
<listitem>
<para>The default value <code>HConstants.DEFAULT_HBASE_CLIENT_SCANNER_CACHING</code>, which is set to
<literal>100</literal>.</para>
</listitem>
</orderedlist>
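<para>The priority order above can be modeled with a minimal, dependency-free sketch. This is
not HBase code: the class <code>ScannerCachingPriority</code> and the method
<code>effectiveCaching</code> are hypothetical names used only for illustration, and the
sketch assumes that an unset Scan-level value is represented by a non-positive number and an
unset configuration value by <code>null</code>.</para>

```java
public class ScannerCachingPriority {

    // Mirrors the documented value of HConstants.DEFAULT_HBASE_CLIENT_SCANNER_CACHING.
    static final int DEFAULT_CACHING = 100;

    /**
     * Hypothetical helper that models the documented priority order.
     *
     * scanCaching:       value set on the Scan object; assumed non-positive when unset
     * configuredCaching: value of hbase.client.scanner.caching; assumed null when unset
     */
    static int effectiveCaching(int scanCaching, Integer configuredCaching) {
        // 1. A value set on the Scan object wins.
        if (scanCaching > 0) {
            return scanCaching;
        }
        // 2. Otherwise fall back to the configuration option.
        if (configuredCaching != null) {
            return configuredCaching;
        }
        // 3. Otherwise use the built-in default.
        return DEFAULT_CACHING;
    }

    public static void main(String[] args) {
        System.out.println(effectiveCaching(500, 200));  // Scan-level value: 500
        System.out.println(effectiveCaching(-1, 200));   // configuration value: 200
        System.out.println(effectiveCaching(-1, null));  // default: 100
    }
}
```

<para>In a real job, the first argument corresponds to the value passed to
<code>Scan.setCaching()</code> and the second to whatever
<option>hbase.client.scanner.caching</option> resolves to in the job configuration.</para>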
<para>Optimizing the caching setting is a balance between the time the client waits for each
result and the number of result sets the client needs to receive. If the caching setting
is too large, the client could wait a long time for each batch, or the request could even
time out. If the setting is too small, the scan must return its results in many small
pieces, each of which costs a separate request. If you think of the scan as a shovel, a
larger caching setting is analogous to a bigger shovel, and a smaller caching setting is
analogous to more shoveling to fill the bucket.</para>
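<para>To make the trade-off concrete, the following rough sketch estimates how many batches
a scan needs for a given caching value. It is a simplified model, not HBase code: the class
<code>ScanRoundTrips</code>, the method <code>roundTrips</code>, and the row count are
hypothetical, and the model ignores region boundaries, result size limits, and the final
call that detects the end of the scan.</para>

```java
public class ScanRoundTrips {

    /**
     * Estimates the number of batches needed to return totalRows rows when each
     * batch carries at most caching rows (ceiling division).
     */
    static long roundTrips(long totalRows, int caching) {
        if (caching > 0) {
            return (totalRows + caching - 1) / caching;
        }
        throw new IllegalArgumentException("caching must be positive");
    }

    public static void main(String[] args) {
        long totalRows = 1000000L;  // hypothetical table size
        int[] cachingValues = {1, 100, 10000};
        for (int caching : cachingValues) {
            System.out.println("caching=" + caching + " -> "
                + roundTrips(totalRows, caching) + " round trips");
        }
        // caching=1 -> 1000000 round trips
        // caching=100 -> 10000 round trips
        // caching=10000 -> 100 round trips
    }
}
```

<para>Fewer round trips mean each batch is larger, so every individual request takes longer
to assemble and transfer, which is where the risk of a timeout comes from.</para>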
<para>The priority order described above lets you set a reasonable default and override it
for specific operations.</para>
<para>See the API documentation for <link
xlink:href="https://hbase.apache.org/apidocs/org/apache/hadoop/hbase/client/Scan.html"
>Scan</link> for more details.</para>
</section>
<section>
<title>Bundled HBase MapReduce Jobs</title>