HBASE-7758 Update book to include documentation of CellCounter utility

git-svn-id: https://svn.apache.org/repos/asf/hbase/trunk@1442290 13f79535-47bb-0310-9956-ffa450edef68
2013-02-04 18:25:47 +00:00 · 2013-02-04 18:25:47 +00:00 · 6430721b83
parent 968facdd0a
commit 6430721b83
1 changed files with 25 additions and 6 deletions
--- a/src/docbkx/ops_mgt.xml
+++ b/src/docbkx/ops_mgt.xml
@@ -265,16 +265,35 @@ row10	c1	c2
       </para>
    </section>
    <section xml:id="rowcounter">
-       <title>RowCounter</title>
-       <para>RowCounter is a mapreduce job to count all the rows of a table.  This is a good utility to use
-           as a sanity check to ensure that HBase can read all the blocks of a table if there are any concerns of metadata inconsistency.
-           It will run the mapreduce all in a single process but it will run faster if you have a MapReduce cluster in place for it to
-           exploit.
+       <title>RowCounter and CellCounter</title>
+       <para><ulink url="http://hbase.apache.org/apidocs/org/apache/hadoop/hbase/mapreduce/RowCounter.html">RowCounter</ulink> is a
+       MapReduce job that counts all the rows of a table.  It is a useful sanity check that HBase can read
+       all the blocks of a table when there are concerns about metadata inconsistency. It can run the whole job in a single
+       process, but it runs faster if you have a MapReduce cluster in place for it to exploit.
 <programlisting>$ bin/hbase org.apache.hadoop.hbase.mapreduce.RowCounter &lt;tablename&gt; [&lt;column1&gt; &lt;column2&gt;...]
 </programlisting>
       </para>
       <para>Note: caching for the input Scan is configured via <code>hbase.client.scanner.caching</code> in the job configuration.
       </para>
+       <para>HBase ships another diagnostic mapreduce job called
+         <ulink url="http://hbase.apache.org/apidocs/org/apache/hadoop/hbase/mapreduce/CellCounter.html">CellCounter</ulink>. Like
+         RowCounter, it counts the contents of a table. The statistics gathered by CellCounter are more fine-grained
+         and include:
+         <itemizedlist>
+           <listitem><para>Total number of rows in the table.</para></listitem>
+           <listitem><para>Total number of column families across all rows.</para></listitem>
+           <listitem><para>Total number of qualifiers across all rows.</para></listitem>
+           <listitem><para>Total occurrences of each column family.</para></listitem>
+           <listitem><para>Total occurrences of each qualifier.</para></listitem>
+           <listitem><para>Total number of versions of each qualifier.</para></listitem>
+         </itemizedlist>
+       </para>
+       <para>The program lets you limit the scope of the run. Provide a row regex or prefix to limit the rows analyzed. Set
+         <code>hbase.mapreduce.scan.column.family</code> to restrict the scan to a single column family.
+         <programlisting>$ bin/hbase org.apache.hadoop.hbase.mapreduce.CellCounter &lt;tablename&gt; &lt;outputDir&gt; [regex or prefix]</programlisting>
+       </para>
+       <para>Note: just like RowCounter, caching for the input Scan is configured via <code>hbase.client.scanner.caching</code> in the
+       job configuration. </para>
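+       <para>As a rough sketch of how these settings fit together, the invocation below passes both properties as
+         <code>-D</code> options ahead of the positional arguments. It assumes the tool accepts generic Hadoop
+         <code>-D</code> options; the table name <code>mytable</code>, column family <code>cf1</code>, output directory
+         <code>/tmp/cellcounter-out</code>, and caching value <code>500</code> are placeholders for illustration only.
+         <programlisting>$ bin/hbase org.apache.hadoop.hbase.mapreduce.CellCounter -Dhbase.mapreduce.scan.column.family=cf1 -Dhbase.client.scanner.caching=500 mytable /tmp/cellcounter-out</programlisting>
+       </para>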
    </section>

    </section>  <!--  tools -->