Added note on hlog tool, that it can be used to look at files in recovered edits file

git-svn-id: https://svn.apache.org/repos/asf/hbase/trunk@1001907 13f79535-47bb-0310-9956-ffa450edef68
This commit is contained in:
Michael Stack 2010-09-27 20:58:26 +00:00
parent 1339f395d6
commit 5efa0ba9c9
1 changed file with 42 additions and 47 deletions


@@ -66,54 +66,48 @@
<para>TODO: Review all of the below to ensure it matches what was
committed -- St.Ack 20100901</para>
</note>
<section>
<title>Region Size</title>
<para>Region size is one of those tricky things; there are a few factors
to consider:</para>
<itemizedlist>
<listitem>
<para>Regions are the basic element of availability and
distribution.</para>
</listitem>
<listitem>
<para>HBase scales by having regions across many servers. Thus if
you have only 2 regions for 16GB of data on a 20-node cluster, most of
your nodes are sitting idle.</para>
</listitem>
<listitem>
<para>High region count has been known to make things slow; this is
getting better, but it is probably better to have 700 regions than
3000 for the same amount of data.</para>
</listitem>
<listitem>
<para>Low region count prevents the parallel scalability described in
point #2. This really can't be stressed enough, since a common problem
is loading 200MB of data into HBase and then wondering why your awesome
10-node cluster is mostly idle.</para>
</listitem>
<listitem>
<para>There is not much memory-footprint difference between 1 region
and 10 in terms of indexes, etc., held by the regionserver.</para>
</listitem>
</itemizedlist>
<para>It's probably best to stick to the default, perhaps going smaller
for hot tables (or manually splitting hot regions to spread the load over
the cluster), or going with a 1GB region size if your cell sizes tend to
be largish (100k and up).</para>
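<para>As a sketch of the latter approach, the maximum region size can be
raised cluster-wide via the <varname>hbase.hregion.max.filesize</varname>
property in <filename>hbase-site.xml</filename>; the 1GB value below is
illustrative, so verify the property name and default against your
release:
<programlisting>
&lt;property&gt;
  &lt;name&gt;hbase.hregion.max.filesize&lt;/name&gt;
  &lt;!-- 1GB; a region splits once a store file grows past this size --&gt;
  &lt;value&gt;1073741824&lt;/value&gt;
&lt;/property&gt;
</programlisting></para>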
</section>
<section>
@@ -739,10 +733,11 @@ if your cell sizes tend to be largish (100k and up).
<title>WAL Tools</title>
<section>
<title><classname>HLog</classname> tool</title>
<para>The main method on <classname>HLog</classname> offers manual
split and dump facilities. Pass it WALs or the product of a split, the
content of the <filename>recovered.edits</filename> directory.</para>
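<para>For example, the same main method can force a split of a log file
directory; the <option>--split</option> flag and path below are a sketch
modeled on the dump invocation that follows, so check the usage string
printed by the class for your release:
<programlisting> <code>$ ./bin/hbase org.apache.hadoop.hbase.regionserver.wal.HLog --split hdfs://example.org:9000/hbase/.logs/example.org,60020,1283516293161/</code> </programlisting></para>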
<para>You can get a textual dump of a WAL file's content by doing the
following:<programlisting> <code>$ ./bin/hbase org.apache.hadoop.hbase.regionserver.wal.HLog --dump hdfs://example.org:9000/hbase/.logs/example.org,60020,1283516293161/10.10.21.10%3A60020.1283973724012</code> </programlisting>The