HBASE-7758 Update book to include documentation of CellCounter utility
git-svn-id: https://svn.apache.org/repos/asf/hbase/trunk@1442290 13f79535-47bb-0310-9956-ffa450edef68
parent 968facdd0a
commit 6430721b83
@@ -265,16 +265,35 @@ row10 c1 c2
  </para>
  </section>
  <section xml:id="rowcounter">
- <title>RowCounter</title>
- <para>RowCounter is a mapreduce job to count all the rows of a table. This is a good utility to use
- as a sanity check to ensure that HBase can read all the blocks of a table if there are any concerns of metadata inconsistency.
- It will run the mapreduce all in a single process but it will run faster if you have a MapReduce cluster in place for it to
- exploit.
+ <title>RowCounter and CellCounter</title>
+ <para><ulink url="http://hbase.apache.org/apidocs/org/apache/hadoop/hbase/mapreduce/RowCounter.html">RowCounter</ulink> is a
+ MapReduce job to count all the rows of a table. This is a good utility to use as a sanity check to ensure that HBase can read
+ all the blocks of a table if there are any concerns about metadata inconsistency. It will run the MapReduce job in a single
+ process, but it will run faster if you have a MapReduce cluster in place for it to exploit.
  <programlisting>$ bin/hbase org.apache.hadoop.hbase.mapreduce.RowCounter &lt;tablename&gt; [&lt;column1&gt; &lt;column2&gt;...]
  </programlisting>
  </para>
- <para>Note: caching for the input Scan is configured via <code>hbase.client.scanner.caching</code> in the job configuration.
+ <para>Note: caching for the input Scan is configured via <code>hbase.client.scanner.caching</code> in the job configuration.
  </para>
+ <para>HBase ships another diagnostic MapReduce job called
+ <ulink url="http://hbase.apache.org/apidocs/org/apache/hadoop/hbase/mapreduce/CellCounter.html">CellCounter</ulink>. Like
+ RowCounter, it scans a table, but it gathers more fine-grained statistics about it. The statistics gathered by CellCounter
+ include:
+ <itemizedlist>
+ <listitem>Total number of rows in the table.</listitem>
+ <listitem>Total number of CFs across all rows.</listitem>
+ <listitem>Total number of qualifiers across all rows.</listitem>
+ <listitem>Total occurrences of each CF.</listitem>
+ <listitem>Total occurrences of each qualifier.</listitem>
+ <listitem>Total number of versions of each qualifier.</listitem>
+ </itemizedlist>
+ </para>
+ <para>The program allows you to limit the scope of the run. Provide a row regex or prefix to limit the rows to analyze. Use
+ <code>hbase.mapreduce.scan.column.family</code> to restrict the scan to a single column family.
+ <programlisting>$ bin/hbase org.apache.hadoop.hbase.mapreduce.CellCounter &lt;tablename&gt; &lt;outputDir&gt; [regex or prefix]</programlisting>
+ </para>
+ <para>Note: just like RowCounter, caching for the input Scan is configured via <code>hbase.client.scanner.caching</code> in the
+ job configuration. </para>
  </section>

</section> <!-- tools -->
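
The invocations documented in this hunk can be tried out with a minimal dry-run sketch in shell. It only prints the two command lines and does not run them. The table name, output directory, column family, row filter, and caching value are hypothetical placeholders. Passing the two configuration properties as `-D` options ahead of the positional arguments is an assumption (it relies on the tools accepting Hadoop generic options); the commit itself only says the properties are set "in the job configuration".

```shell
#!/bin/sh
# Dry-run sketch: builds and prints the RowCounter and CellCounter command
# lines described in the documentation above. Nothing is executed.
# All values below are hypothetical placeholders.
TABLE="mytable"
OUTPUT_DIR="/tmp/cellcounter-out"
FAMILY="cf1"
ROW_FILTER="row1"
CACHING=500

# ASSUMPTION: properties are passed as -Dkey=value ahead of the positional
# arguments; the documentation only says they live in the job configuration.
rowcounter_cmd() {
  printf '%s %s %s %s\n' \
    "bin/hbase" \
    "org.apache.hadoop.hbase.mapreduce.RowCounter" \
    "-Dhbase.client.scanner.caching=$CACHING" \
    "$TABLE"
}

# CellCounter takes the table, an output directory, and an optional row
# regex or prefix; the column family is limited through a property.
cellcounter_cmd() {
  printf '%s %s %s %s %s %s %s\n' \
    "bin/hbase" \
    "org.apache.hadoop.hbase.mapreduce.CellCounter" \
    "-Dhbase.client.scanner.caching=$CACHING" \
    "-Dhbase.mapreduce.scan.column.family=$FAMILY" \
    "$TABLE" "$OUTPUT_DIR" "$ROW_FILTER"
}

rowcounter_cmd
cellcounter_cmd
```

Once the placeholders are replaced with real values, each printed line can be run from the HBase installation directory. Keeping the script as a dry run first makes it easy to check the property names against the documentation before starting a job.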