HBASE-7758 Update book to include documentation of CellCounter utility

git-svn-id: https://svn.apache.org/repos/asf/hbase/trunk@1442290 13f79535-47bb-0310-9956-ffa450edef68
Michael Stack 2013-02-04 18:25:47 +00:00
parent 968facdd0a
commit 6430721b83
1 changed files with 25 additions and 6 deletions


@ -265,16 +265,35 @@ row10 c1 c2
</para>
</section>
<section xml:id="rowcounter">
<title>RowCounter and CellCounter</title>
<para><ulink url="http://hbase.apache.org/apidocs/org/apache/hadoop/hbase/mapreduce/RowCounter.html">RowCounter</ulink> is a
MapReduce job that counts all the rows of a table. It is a good sanity check to ensure that HBase can read all the blocks of a
table if there are any concerns about metadata inconsistency. If no MapReduce cluster is configured, the job runs entirely in a
single local process; it runs faster when it can exploit a MapReduce cluster.
<programlisting>$ bin/hbase org.apache.hadoop.hbase.mapreduce.RowCounter &lt;tablename&gt; [&lt;column1&gt; &lt;column2&gt;...]
</programlisting>
</para>
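<para>To make the optional column arguments concrete, here is a toy, in-memory model of what RowCounter counts. It is a sketch, not HBase code, and it relies on two assumptions. First, each column argument is either <code>family</code> or <code>family:qualifier</code>, which is our understanding of how the job parses it. Second, a scan restricted to specific columns only returns rows that have at least one cell in those columns, so only those rows are counted. The names <code>RowCounterModel</code>, <code>Cell</code> and <code>countRows</code> are hypothetical. Records require Java 16 or later.</para>

```java
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical toy model of RowCounter's counting semantics; not HBase code.
final class RowCounterModel {
    // One stored cell: row key, column family and qualifier (versions are ignored here).
    record Cell(String row, String family, String qualifier) {}

    // True if the cell matches a column argument of the form "family" or "family:qualifier".
    static boolean matches(Cell cell, String column) {
        int colon = column.indexOf(':');
        if (colon == -1) {
            return cell.family().equals(column);
        }
        return cell.family().equals(column.substring(0, colon))
            && cell.qualifier().equals(column.substring(colon + 1));
    }

    // Counts distinct row keys. With no column arguments every row is counted; otherwise
    // only rows holding at least one cell in one of the requested columns are counted.
    static long countRows(List<Cell> cells, List<String> columns) {
        Set<String> rows = new HashSet<>();
        for (Cell cell : cells) {
            if (columns.isEmpty() || columns.stream().anyMatch(c -> matches(cell, c))) {
                rows.add(cell.row());
            }
        }
        return rows.size();
    }

    public static void main(String[] args) {
        List<Cell> cells = List.of(
            new Cell("row1", "cf1", "a"),
            new Cell("row1", "cf2", "b"),
            new Cell("row2", "cf1", "a"),
            new Cell("row3", "cf2", "c"));
        System.out.println(countRows(cells, List.of()));        // prints 3
        System.out.println(countRows(cells, List.of("cf1")));   // prints 2 (row1, row2)
        System.out.println(countRows(cells, List.of("cf2:c"))); // prints 1 (row3)
    }
}
```

<para>Under these assumptions, passing columns does not count "cells in those columns". It counts the rows in which at least one of those columns is present.</para>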
<para>Note: caching for the input Scan is configured via <code>hbase.client.scanner.caching</code> in the job configuration.
</para>
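<para>This setting matters for a full-table scan because <code>hbase.client.scanner.caching</code> is the number of rows the scanner fetches per <code>next</code> call. A region of n rows therefore needs roughly ceil(n / caching) fetch round trips. The sketch below is only a back-of-envelope estimate under stated assumptions: one scanner per region (one map task per region), no scanner open/close calls, and no batching or size-based limits. The class and method names are hypothetical.</para>

```java
// Hypothetical back-of-envelope model, not HBase code: estimates how many
// "next" round trips a full scan needs for a given scanner caching value.
final class ScanRoundTrips {
    // Ceiling division: rows are fetched in batches of `caching` rows per call.
    static long roundTrips(long rowsInRegion, int caching) {
        if (caching <= 0) {
            throw new IllegalArgumentException("caching must be positive");
        }
        return (rowsInRegion + caching - 1) / caching;
    }

    // Sums the estimate over regions, assuming one scanner (one map task) per region.
    static long totalRoundTrips(long[] rowsPerRegion, int caching) {
        long total = 0;
        for (long rows : rowsPerRegion) {
            total += roundTrips(rows, caching);
        }
        return total;
    }

    public static void main(String[] args) {
        long[] regions = {1_000_000L, 250_000L}; // made-up region sizes
        for (int caching : new int[] {1, 100, 1000}) {
            // prints 1250000, 12500 and 1250 round trips respectively
            System.out.println("caching=" + caching + " -> ~"
                + totalRoundTrips(regions, caching) + " round trips");
        }
    }
}
```

<para>With these made-up region sizes, raising caching from 1 to 1000 cuts the estimate from 1,250,000 to 1,250 round trips. The cost is more client memory per batch, and a greater risk of scanner lease timeouts if each batch takes a long time to process.</para>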
<para>HBase ships another diagnostic MapReduce job called
<ulink url="http://hbase.apache.org/apidocs/org/apache/hadoop/hbase/mapreduce/CellCounter.html">CellCounter</ulink>. Like
RowCounter, it scans your table, but it gathers more fine-grained statistics. The statistics gathered by CellCounter
include:
<itemizedlist>
<listitem>Total number of rows in the table.</listitem>
<listitem>Total number of column families (CFs) across all rows.</listitem>
<listitem>Total number of qualifiers across all rows.</listitem>
<listitem>Total occurrences of each CF.</listitem>
<listitem>Total occurrences of each qualifier.</listitem>
<listitem>Total number of versions of each qualifier.</listitem>
</itemizedlist>
</para>
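<para>The following toy, in-memory model shows how these statistics relate to one another on a small data set. It is a sketch under stated assumptions, not the CellCounter implementation, and it does not reproduce CellCounter's output format or counter names. It assumes three things, which is our reading of the list above: a column family is counted once per row in which it appears; a qualifier is identified by its <code>family:qualifier</code> pair and is also counted once per row; and versions are counted per row and qualifier. <code>CellStats</code>, <code>Cell</code> and <code>compute</code> are hypothetical names.</para>

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;
import java.util.TreeSet;

// Hypothetical toy model of the statistics CellCounter reports; not HBase code,
// and not CellCounter's output format or counter names.
final class CellStats {
    // One stored cell version: row key, column family, qualifier and timestamp.
    record Cell(String row, String family, String qualifier, long timestamp) {}

    long rows;                 // total number of rows
    long familiesAcrossRows;   // sum over rows of the distinct CFs in each row
    long qualifiersAcrossRows; // sum over rows of the distinct "cf:qualifier" columns in each row
    final Map<String, Long> familyOccurrences = new TreeMap<>();    // CF -> rows containing it
    final Map<String, Long> qualifierOccurrences = new TreeMap<>(); // "cf:q" -> rows containing it
    final Map<String, Long> versions = new TreeMap<>();             // "row:cf:q" -> number of versions

    static CellStats compute(List<Cell> cells) {
        CellStats s = new CellStats();
        // Group cells by row key.
        Map<String, List<Cell>> byRow = new TreeMap<>();
        for (Cell c : cells) {
            byRow.computeIfAbsent(c.row(), k -> new ArrayList<>()).add(c);
        }
        s.rows = byRow.size();
        for (List<Cell> rowCells : byRow.values()) {
            Set<String> families = new TreeSet<>();
            Set<String> columns = new TreeSet<>();
            for (Cell c : rowCells) {
                String column = c.family() + ":" + c.qualifier();
                families.add(c.family());
                columns.add(column);
                // Every cell is one version of its (row, column) pair.
                s.versions.merge(c.row() + ":" + column, 1L, Long::sum);
            }
            s.familiesAcrossRows += families.size();
            s.qualifiersAcrossRows += columns.size();
            // Each family and each column is counted once per row in which it appears.
            for (String f : families) {
                s.familyOccurrences.merge(f, 1L, Long::sum);
            }
            for (String q : columns) {
                s.qualifierOccurrences.merge(q, 1L, Long::sum);
            }
        }
        return s;
    }

    public static void main(String[] args) {
        List<Cell> cells = List.of(
            new Cell("row1", "cf1", "a", 2L),
            new Cell("row1", "cf1", "a", 1L), // an older version of row1/cf1:a
            new Cell("row1", "cf2", "b", 1L),
            new Cell("row2", "cf1", "a", 1L));
        CellStats s = compute(cells);
        System.out.println("rows = " + s.rows);                                   // 2
        System.out.println("families across rows = " + s.familiesAcrossRows);     // 3
        System.out.println("qualifiers across rows = " + s.qualifiersAcrossRows); // 3
        System.out.println("CF occurrences = " + s.familyOccurrences);            // {cf1=2, cf2=1}
        System.out.println("qualifier occurrences = " + s.qualifierOccurrences);  // {cf1:a=2, cf2:b=1}
        System.out.println("versions = " + s.versions); // {row1:cf1:a=2, row1:cf2:b=1, row2:cf1:a=1}
    }
}
```

<para>In this model a family that appears in many rows is counted once per row. The "CFs across all rows" total can therefore exceed the number of column families defined on the table.</para>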
<para>You can limit the scope of the run. Provide a row regex or prefix as the optional last argument to restrict the rows
analyzed, and set <code>hbase.mapreduce.scan.column.family</code> in the job configuration to scan a single column family.
<programlisting>$ bin/hbase org.apache.hadoop.hbase.mapreduce.CellCounter &lt;tablename&gt; &lt;outputDir&gt; [regex or prefix]</programlisting>
</para>
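<para>Our reading of the CellCounter source, which should be checked against the usage message of your HBase version, is that the optional argument is interpreted as follows. If it starts with <code>^</code>, it is treated as a regular expression, with that leading <code>^</code> stripped. Anything else is treated as a literal row-key prefix. The plain-Java sketch below models that decision with <code>java.util.regex</code>, not with HBase filters. It also assumes that a regex selects a row if the pattern is found anywhere in the row key, which is our understanding of HBase's <code>RegexStringComparator</code>. <code>RowScope</code>, <code>selects</code> and <code>filter</code> are hypothetical names.</para>

```java
import java.util.List;
import java.util.regex.Pattern;
import java.util.stream.Collectors;

// Hypothetical model of how the optional row-scope argument might select rows; not HBase code.
// Assumption: "^..." means regex (leading '^' stripped, find-anywhere matching);
// anything else is a literal row-key prefix.
final class RowScope {
    static boolean selects(String scopeArg, String rowKey) {
        if (scopeArg.startsWith("^")) {
            Pattern pattern = Pattern.compile(scopeArg.substring(1));
            return pattern.matcher(rowKey).find();
        }
        return rowKey.startsWith(scopeArg);
    }

    // Returns the row keys selected by the scope argument, in their original order.
    static List<String> filter(List<String> rowKeys, String scopeArg) {
        return rowKeys.stream().filter(k -> selects(scopeArg, k)).collect(Collectors.toList());
    }

    public static void main(String[] args) {
        List<String> rowKeys = List.of("user-001", "user-002", "order-001", "order-100");
        System.out.println(filter(rowKeys, "user-")); // prefix: [user-001, user-002]
        System.out.println(filter(rowKeys, "^001$")); // regex "001$": [user-001, order-001]
    }
}
```

<para>If this assumption holds, the leading <code>^</code> is consumed as the regex marker rather than anchoring the pattern. In the example, <code>^001$</code> selects keys that end in 001, which a prefix argument cannot express.</para>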
<para>Note: just like RowCounter, caching for the input Scan is configured via <code>hbase.client.scanner.caching</code> in the
job configuration. </para>
</section>
</section> <!-- tools -->