Added note on hlog tool: it can be used to look at files in the recovered.edits directory

git-svn-id: https://svn.apache.org/repos/asf/hbase/trunk@1001907 13f79535-47bb-0310-9956-ffa450edef68
Michael Stack 2010-09-27 20:58:26 +00:00
parent 1339f395d6
commit 5efa0ba9c9
1 changed file with 42 additions and 47 deletions

@@ -66,54 +66,48 @@
     <para>TODO: Review all of the below to ensure it matches what was
     committed -- St.Ack 20100901</para>
   </note>
   <section>
-    <title>
-    Region Size
-    </title>
-    <para>Region size is one of those tricky things, there are a few factors to consider:
-    </para>
+    <title>Region Size</title>
+    <para>Region size is one of those tricky things, there are a few factors
+    to consider:</para>
     <itemizedlist>
       <listitem>
-        <para>
-          Regions are the basic element of availability and distribution.
-        </para>
+        <para>Regions are the basic element of availability and
+        distribution.</para>
       </listitem>
       <listitem>
-        <para>
-          HBase scales by having regions across many servers. Thus if you
-          have 2 regions for 16GB data, on a 20 node machine you are a net loss
-          there.
-        </para>
+        <para>HBase scales by having regions across many servers. Thus if
+        you have 2 regions for 16GB data, on a 20 node machine you are a net
+        loss there.</para>
       </listitem>
       <listitem>
-        <para>
-          High region count has been known to make things slow, this is
-          getting better, but it is probably better to have 700 regions than
-          3000 for the same amount of data.
-        </para>
+        <para>High region count has been known to make things slow, this is
+        getting better, but it is probably better to have 700 regions than
+        3000 for the same amount of data.</para>
      </listitem>
       <listitem>
-        <para>
-          Low region count prevents parallel scalability as per point #2.
-          This really cant be stressed enough, since a common problem is loading
-          200MB data into HBase then wondering why your awesome 10 node cluster
-          is mostly idle.
-        </para>
+        <para>Low region count prevents parallel scalability as per point
+        #2. This really cant be stressed enough, since a common problem is
+        loading 200MB data into HBase then wondering why your awesome 10
+        node cluster is mostly idle.</para>
      </listitem>
       <listitem>
-        <para>
-          There is not much memory footprint difference between 1 region and
-          10 in terms of indexes, etc, held by the regionserver.
-        </para>
+        <para>There is not much memory footprint difference between 1 region
+        and 10 in terms of indexes, etc, held by the regionserver.</para>
       </listitem>
     </itemizedlist>
-    <para>Its probably best to stick to the default,
-    perhaps going smaller for hot tables (or manually split hot regions
-    to spread the load over the cluster), or go with a 1GB region size
-    if your cell sizes tend to be largish (100k and up).
-    </para>
+    <para>Its probably best to stick to the default, perhaps going smaller
+    for hot tables (or manually split hot regions to spread the load over
+    the cluster), or go with a 1GB region size if your cell sizes tend to be
+    largish (100k and up).</para>
   </section>
   <section>
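The 1GB sizing discussed in the hunk above maps onto the cluster-wide region split threshold set in hbase-site.xml. A minimal sketch follows; hbase.hregion.max.filesize is the standard HBase property, but the 1GB value is only an illustration of the "largish cells" case and is not prescribed by this commit:

    <!-- hbase-site.xml sketch: 1GB region split threshold (illustrative value) -->
    <property>
      <name>hbase.hregion.max.filesize</name>
      <!-- 1GB in bytes; a region is split once one of its store files grows past this size -->
      <value>1073741824</value>
    </property>

Larger values mean fewer, bigger regions; smaller values yield more regions and therefore more parallelism across regionservers, per the points listed in the diff.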
@@ -739,10 +733,11 @@ if your cell sizes tend to be largish (100k and up).
     <title>WAL Tools</title>
     <section>
-      <title><classname>HLog</classname> main</title>
+      <title><classname>HLog</classname> tool</title>
       <para>The main method on <classname>HLog</classname> offers manual
-      split and dump facilities.</para>
+      split and dump facilities. Pass it WALs or the product of a split, the
+      content of the <filename>recovered.edits</filename>. directory.</para>
       <para>You can get a textual dump of a WAL file content by doing the
       following:<programlisting> <code>$ ./bin/hbase org.apache.hadoop.hbase.regionserver.wal.HLog --dump hdfs://example.org:9000/hbase/.logs/example.org,60020,1283516293161/10.10.21.10%3A60020.1283973724012</code> </programlisting>The