Added note on hlog tool, that it can be used to look at files in recovered edits file
git-svn-id: https://svn.apache.org/repos/asf/hbase/trunk@1001907 13f79535-47bb-0310-9956-ffa450edef68

parent 1339f395d6
commit 5efa0ba9c9

@@ -66,54 +66,48 @@
        <para>TODO: Review all of the below to ensure it matches what was
        committed -- St.Ack 20100901</para>
      </note>

      <section>
        <title>Region Size</title>

        <para>Region size is one of those tricky things; there are a few factors
        to consider:</para>

        <itemizedlist>
          <listitem>
            <para>Regions are the basic element of availability and
            distribution.</para>
          </listitem>

          <listitem>
            <para>HBase scales by having regions across many servers. Thus if
            you have 2 regions for 16GB data, on a 20 node machine you are a net
            loss there.</para>
          </listitem>

          <listitem>
            <para>High region count has been known to make things slow; this is
            getting better, but it is probably better to have 700 regions than
            3000 for the same amount of data.</para>
          </listitem>

          <listitem>
            <para>Low region count prevents parallel scalability as per point
            #2. This really can't be stressed enough, since a common problem is
            loading 200MB of data into HBase and then wondering why your awesome
            10 node cluster is mostly idle.</para>
          </listitem>

          <listitem>
            <para>There is not much memory footprint difference between 1 region
            and 10 in terms of indexes, etc., held by the regionserver.</para>
          </listitem>
        </itemizedlist>

        <para>It's probably best to stick to the default, perhaps going smaller
        for hot tables (or manually split hot regions to spread the load over
        the cluster), or go with a 1GB region size if your cell sizes tend to be
        largish (100k and up).</para>
      </section>
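
The 1GB figure in the paragraph above maps to the hbase.hregion.max.filesize property
(a byte count) in hbase-site.xml: when a store file in a region grows past it, the
region is split. A minimal sketch of that configuration; the value is just the 1GB
example from the text, not a blanket recommendation:

    <!-- hbase-site.xml (sketch): raise the split threshold to ~1GB
         for tables whose cells are largish (100k and up) -->
    <property>
      <name>hbase.hregion.max.filesize</name>
      <value>1073741824</value>
    </property>

The shipped default is considerably smaller, so only raise it once the table has
enough regions to keep every server in the cluster busy.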

      <section>

@@ -739,10 +733,11 @@ if your cell sizes tend to be largish (100k and up).
      <title>WAL Tools</title>

      <section>
        <title><classname>HLog</classname> tool</title>

        <para>The main method on <classname>HLog</classname> offers manual
        split and dump facilities. Pass it WALs or the product of a split, the
        content of the <filename>recovered.edits</filename> directory.</para>

        <para>You can get a textual dump of a WAL file content by doing the
        following:<programlisting> <code>$ ./bin/hbase org.apache.hadoop.hbase.regionserver.wal.HLog --dump hdfs://example.org:9000/hbase/.logs/example.org,60020,1283516293161/10.10.21.10%3A60020.1283973724012</code> </programlisting>The
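
As the commit message says, the same dump facility can also be pointed at the files a
log split leaves under a region's recovered.edits directory, not just at live WALs
under .logs. A sketch of both uses; the table, region and file names below are
placeholders, and the --split option name is an assumption taken from the "split and
dump facilities" wording above, so check the usage the tool prints in your version:

    # Dump the edits recovered for one region after a log split
    # (hypothetical path; substitute your table, region and edits file)
    $ ./bin/hbase org.apache.hadoop.hbase.regionserver.wal.HLog --dump \
        hdfs://example.org:9000/hbase/TestTable/<encoded-region-name>/recovered.edits/0000000000000012345

    # Manually split the WALs left behind by a region server
    # (assumes a --split option; confirm against the tool's usage output)
    $ ./bin/hbase org.apache.hadoop.hbase.regionserver.wal.HLog --split \
        hdfs://example.org:9000/hbase/.logs/example.org,60020,1283516293161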