HBASE-4743 book.xml, performance.xml, troubleshooting.xml scan info
git-svn-id: https://svn.apache.org/repos/asf/hbase/trunk@1197315 13f79535-47bb-0310-9956-ffa450edef68
parent 92b57170e9
commit ae6af9e630
@@ -567,7 +567,7 @@ admin.enableTable(table);
</para>
<section xml:id="number.of.cfs.card"><title>Cardinality of ColumnFamilies</title>
<para>Where multiple ColumnFamilies exist in a single table, be aware of the cardinality (i.e., number of rows).
If ColumnFamilyA has 1000,000 rows and ColumnFamilyB has 1 billion rows, ColumnFamilyA's data will likely be spread
If ColumnFamilyA has 1 million rows and ColumnFamilyB has 1 billion rows, ColumnFamilyA's data will likely be spread
across many, many regions (and RegionServers). This makes mass scans for ColumnFamilyA less efficient.
</para>
</section>
@@ -353,6 +353,18 @@ Deferred log flush can be configured on tables via <link
rows at a time to the client to be processed. There is a cost/benefit to
have the cache value be large because it costs more in memory for both
client and RegionServer, so bigger isn't always better.</para>
<section xml:id="perf.hbase.client.caching.mr">
<title>Scan Caching in MapReduce Jobs</title>
<para>Scan settings in MapReduce jobs deserve special attention. Timeouts can result (e.g., UnknownScannerException)
in Map tasks if it takes longer to process a batch of records before the client goes back to the RegionServer for the
next set of data. This problem can occur because there is non-trivial processing occurring per row. If you process
rows quickly, set caching higher. If you process rows more slowly (e.g., lots of transformations per row, writes),
then set caching lower.
</para>
<para>Timeouts can also happen in a non-MapReduce use case (i.e., single threaded HBase client doing a Scan), but the
processing that is often performed in MapReduce jobs tends to exacerbate this issue.
</para>
</section>
</section>
<section xml:id="perf.hbase.client.selection">
<title>Scan Attribute Selection</title>
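Note on the caching guidance added in this hunk: a minimal sketch of the underlying arithmetic, assuming the gap between scanner RPCs is roughly the caching value multiplied by the per-row processing time. The class and method names are hypothetical, the 60,000 ms timeout is an assumed figure rather than a value taken from this commit, and the model ignores network and RegionServer time. It is plain Java, not HBase client code.

```java
// Hypothetical helper: estimates how large scan caching can be before a
// batch of rows takes longer to process than the scanner timeout.
final class ScanCachingSketch {

    // Simplified model: time spent between two scanner RPCs is
    // (rows per batch) * (processing time per row).
    static long batchProcessingMs(int caching, long perRowMs) {
        return (long) caching * perRowMs;
    }

    // Largest caching value whose batch finishes strictly before the
    // timeout under the model above; never returns less than 1.
    static int maxCachingFor(long scannerTimeoutMs, long perRowMs) {
        if (scannerTimeoutMs <= 0 || perRowMs <= 0) {
            throw new IllegalArgumentException("timeout and per-row time must be positive");
        }
        long rows = (scannerTimeoutMs - 1) / perRowMs;
        return (int) Math.max(1L, Math.min(rows, (long) Integer.MAX_VALUE));
    }

    public static void main(String[] args) {
        long timeoutMs = 60_000L;          // assumed scanner timeout
        long[] perRowCosts = {1L, 50L, 200L}; // assumed per-row processing times in ms
        for (long cost : perRowCosts) {
            System.out.printf("per-row %d ms -> caching up to %d%n",
                    cost, maxCachingFor(timeoutMs, cost));
        }
    }
}
```

Under this model, slow per-row work (transformations, writes) shrinks the safe caching value quickly, which matches the advice to lower caching for slow mappers and raise it for fast ones.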
@@ -464,12 +464,14 @@ hadoop 17789 155 35.2 9067824 8604364 ? S<l Mar04 9855:48 /usr/java/j
<para>For more information on the HBase client, see <xref linkend="client"/>.
</para>
<section xml:id="trouble.client.scantimeout">
<title>ScannerTimeoutException</title>
<title>ScannerTimeoutException or UnknownScannerException</title>
<para>This is thrown if the time between RPC calls from the client to RegionServer exceeds the scan timeout.
For example, if <code>Scan.setCaching</code> is set to 500, then there will be an RPC call to fetch the next batch of rows every 500 <code>.next()</code> calls on the ResultScanner
because data is being transferred in blocks of 500 rows to the client. Reducing the setCaching value may be an option, but setting this value too low makes for inefficient
processing on numbers of rows.
</para>
<para>See <xref linkend="perf.hbase.client.caching"/>.
</para>
</section>
<section xml:id="trouble.client.scarylogs">
<title>Shell or client application throws lots of scary exceptions during normal operation</title>
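Note on the setCaching trade-off described in the troubleshooting hunk: the sketch below counts how many fetch RPCs a scan needs for a given caching value, to show why a very low setting is inefficient. The class and method names are hypothetical, and the count is a simplification that ignores region boundaries and any extra call the client makes to detect the end of the scan.

```java
// Hypothetical helper: counts row-fetching RPCs for a scan, assuming each
// RPC returns exactly `caching` rows until the rows run out.
final class ScanRpcCount {

    // Ceiling of totalRows / caching, computed in long arithmetic.
    static long fetchRpcs(long totalRows, int caching) {
        if (totalRows < 0 || caching <= 0) {
            throw new IllegalArgumentException("totalRows must be >= 0 and caching > 0");
        }
        return (totalRows + caching - 1) / caching;
    }

    public static void main(String[] args) {
        long totalRows = 1_000_000L;       // assumed table size for illustration
        int[] cachingValues = {1, 100, 500};
        for (int caching : cachingValues) {
            System.out.printf("caching=%d -> %d fetch RPCs%n",
                    caching, fetchRpcs(totalRows, caching));
        }
    }
}
```

Lowering caching reduces the time spent between RPCs but multiplies the number of round trips, so the value is a balance between avoiding scanner timeouts and keeping the scan efficient.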