Added note on the HLog tool: it can also be used to look at files in the recovered.edits directory

git-svn-id: https://svn.apache.org/repos/asf/hbase/trunk@1001907 13f79535-47bb-0310-9956-ffa450edef68
2010-09-27 20:58:26 +00:00 · 2010-09-27 20:58:26 +00:00 · 5efa0ba9c9
parent 1339f395d6
commit 5efa0ba9c9
1 changed files with 42 additions and 47 deletions
--- a/src/docbkx/book.xml
+++ b/src/docbkx/book.xml
@@ -66,54 +66,48 @@
      <para>TODO: Review all of the below to ensure it matches what was
      committed -- St.Ack 20100901</para>
    </note>
    <section>
-       <title>
-          Region Size
-      </title>
-<para>Region size is one of those tricky things, there are a few factors to consider:
-</para>
-       <itemizedlist>
-         <listitem>
-         <para>
-Regions are the basic element of availability and distribution.
-         </para>
-         </listitem>
-         <listitem>
-         <para>
-HBase scales by having regions across many servers.  Thus if you
-have 2 regions for 16GB data, on a 20 node machine you are a net loss
-there.
-         </para>
-         </listitem>
-         <listitem>
-         <para>
-High region count has been known to make things slow, this is
-getting better, but it is probably better to have 700 regions than
-3000 for the same amount of data.
-         </para>
-         </listitem>
-         <listitem>
-         <para>
-Low region count prevents parallel scalability as per point #2.
-This really cant be stressed enough, since a common problem is loading
-200MB data into HBase then wondering why your awesome 10 node cluster
-is mostly idle.
-         </para>
-         </listitem>
-         <listitem>
-         <para>
-There is not much memory footprint difference between 1 region and
-10 in terms of indexes, etc, held by the regionserver.
-         </para>
-         </listitem>
-       </itemizedlist>
-<para>Its probably best to stick to the default,
-perhaps going smaller for hot tables (or manually split hot regions
-to spread the load over the cluster), or go with a 1GB region size
-if your cell sizes tend to be largish (100k and up).
-</para>
+      <title>Region Size</title>
+      <para>Region size is one of those tricky things, there are a few factors
+      to consider:</para>
+      <itemizedlist>
+        <listitem>
+          <para>Regions are the basic element of availability and
+          distribution.</para>
+        </listitem>
+        <listitem>
+          <para>HBase scales by having regions across many servers. Thus if
+          you have 2 regions for 16GB data, on a 20 node machine you are a net
+          loss there.</para>
+        </listitem>
+        <listitem>
+          <para>High region count has been known to make things slow, this is
+          getting better, but it is probably better to have 700 regions than
+          3000 for the same amount of data.</para>
+        </listitem>
+        <listitem>
+          <para>Low region count prevents parallel scalability as per point
+          #2. This really cant be stressed enough, since a common problem is
+          loading 200MB data into HBase then wondering why your awesome 10
+          node cluster is mostly idle.</para>
+        </listitem>
+        <listitem>
+          <para>There is not much memory footprint difference between 1 region
+          and 10 in terms of indexes, etc, held by the regionserver.</para>
+        </listitem>
+      </itemizedlist>
+      <para>Its probably best to stick to the default, perhaps going smaller
+      for hot tables (or manually split hot regions to spread the load over
+      the cluster), or go with a 1GB region size if your cell sizes tend to be
+      largish (100k and up).</para>
    </section>
    <section>
@@ -739,10 +733,11 @@ if your cell sizes tend to be largish (100k and up).
      <title>WAL Tools</title>
      <section>
-        <title><classname>HLog</classname> main</title>
+        <title><classname>HLog</classname> tool</title>
        <para>The main method on <classname>HLog</classname> offers manual
-        split and dump facilities.</para>
+        split and dump facilities. Pass it WALs or the product of a split, the
+        content of the <filename>recovered.edits</filename>. directory.</para>
        <para>You can get a textual dump of a WAL file content by doing the
        following:<programlisting> <code>$ ./bin/hbase org.apache.hadoop.hbase.regionserver.wal.HLog --dump hdfs://example.org:9000/hbase/.logs/example.org,60020,1283516293161/10.10.21.10%3A60020.1283973724012</code> </programlisting>The