HBASE-11781 Document new TableMapReduceUtil scanning options (Misty Stanley-Jones)
parent 1a6eea335f
commit bcfc6d65af
@ -1048,6 +1048,42 @@ $ <userinput>HADOOP_CLASSPATH=$(hbase mapredcp):/etc/hbase/conf hadoop jar MyApp
</caution>
</section>

<section>
  <title>MapReduce Scan Caching</title>
  <para>TableMapReduceUtil once again honors scanner caching (the number of rows that are
    cached before the result is returned to the client) when it is set on the Scan object that
    is passed in. This functionality was lost due to a bug in HBase 0.95 (<link
      xlink:href="https://issues.apache.org/jira/browse/HBASE-11558">HBASE-11558</link>), which
    is fixed in HBase 0.98.5 and 0.96.3. The scanner caching value is chosen in the following
    order of priority:</para>
  <orderedlist>
    <listitem>
      <para>Caching settings which are set on the scan object.</para>
    </listitem>
    <listitem>
      <para>Caching settings which are specified via the configuration option
        <option>hbase.client.scanner.caching</option>, which can either be set manually in
        <filename>hbase-site.xml</filename> or via the helper method
        <code>TableMapReduceUtil.setScannerCaching()</code>.</para>
    </listitem>
    <listitem>
      <para>The default value <code>HConstants.DEFAULT_HBASE_CLIENT_SCANNER_CACHING</code>, which is set to
        <literal>100</literal>.</para>
    </listitem>
  </orderedlist>
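  <para>The priority order above can be modeled as a small lookup. The following is a minimal
    sketch, not HBase's actual implementation: the class <code>ScannerCachingPriority</code>,
    the methods <code>isSet</code> and <code>effectiveCaching</code>, and the convention that
    <code>null</code> or a non-positive value means "not set" are all assumptions made for
    illustration.</para>

```java
public class ScannerCachingPriority {
    // Assumed to mirror HConstants.DEFAULT_HBASE_CLIENT_SCANNER_CACHING (100, per the text above).
    public static final int DEFAULT_CACHING = 100;

    // Hypothetical helper: treats null or non-positive values as "not set".
    public static boolean isSet(Integer value) {
        if (value == null) {
            return false;
        }
        return value > 0;
    }

    // Resolves the caching value using the documented priority order.
    public static int effectiveCaching(Integer scanCaching, Integer configCaching) {
        if (isSet(scanCaching)) {
            return scanCaching;      // 1. value set on the Scan object
        }
        if (isSet(configCaching)) {
            return configCaching;    // 2. hbase.client.scanner.caching
        }
        return DEFAULT_CACHING;      // 3. built-in default
    }

    public static void main(String[] args) {
        System.out.println(effectiveCaching(500, 200));   // prints 500
        System.out.println(effectiveCaching(null, 200));  // prints 200
        System.out.println(effectiveCaching(null, null)); // prints 100
    }
}
```

  <para>In this model, a value set on the scan always wins, so a job-wide setting in
    <filename>hbase-site.xml</filename> only applies to scans that leave caching unset.</para>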
  <para>Optimizing the caching setting means balancing the time the client waits for each
    result against the number of batches of results the client needs to receive. If the caching
    setting is too large, the client could end up waiting for a long time, or the request could
    even time out. If the setting is too small, the scan has to return its results in many
    small pieces. If you think of the scan as a shovel, a larger caching setting is a bigger
    shovel, and a smaller caching setting means more shoveling in order to fill the
    bucket.</para>
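  <para>To make the shovel analogy concrete, the number of batches a scan needs can be
    estimated with a ceiling division. This is a simplified sketch: it assumes every batch is
    filled up to the caching limit and ignores region boundaries, result-size limits, and
    timeouts. The class <code>ScanBatchEstimate</code> and the method
    <code>batchesNeeded</code> are hypothetical names, not part of HBase.</para>

```java
public class ScanBatchEstimate {
    // Ceiling division: batches needed to return totalRows rows
    // when each batch carries at most `caching` rows.
    public static long batchesNeeded(long totalRows, int caching) {
        if (caching > 0) {
            return (totalRows + caching - 1) / caching;
        }
        throw new IllegalArgumentException("caching must be positive");
    }

    public static void main(String[] args) {
        long totalRows = 1000000L;
        int[] settings = {10, 100, 1000};
        for (int caching : settings) {
            // Prints 100000, 10000, and 1000 batches respectively.
            System.out.println("caching=" + caching
                + " batches=" + batchesNeeded(totalRows, caching));
        }
    }
}
```

  <para>Under these assumptions, raising the caching value tenfold cuts the number of batches
    tenfold, at the cost of each batch taking longer to assemble and using more memory.</para>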
  <para>The priority order listed above lets you set a reasonable default and override it
    for specific operations.</para>
  <para>See the API documentation for <link
      xlink:href="https://hbase.apache.org/apidocs/org/apache/hadoop/hbase/client/Scan.html"
      >Scan</link> for more details.</para>
</section>

<section>
  <title>Bundled HBase MapReduce Jobs</title>