HBASE-11692 Document how and why to do a manual region split
Incorporated Stack's feedback
parent 695261c4a9
commit 141e31b7bd
@@ -3018,6 +3018,92 @@ myHtd.setValue(HTableDescriptor.SPLIT_POLICY, MyCustomSplitPolicy.class.getName(
</section>
</section>

<section xml:id="manual_region_splitting_decisions">
<title>Manual Region Splitting</title>
<para>It is possible to manually split your table, either at table creation (pre-splitting),
or at a later time as an administrative action. You might choose to split your regions for
one or more of the following reasons. There may be other valid reasons, but the need to
manually split your table might also point to problems with your schema design.</para>
<itemizedlist>
<title>Reasons to Manually Split Your Table</title>
<listitem>
<para>Your rowkeys are based on a timeseries or a similar scheme that sorts new data
at the end of the table. This means that the Region Server holding the last region is
always under load, and the other Region Servers are idle, or mostly idle. See also
<xref linkend="timeseries"/>.</para>
</listitem>
<listitem>
<para>You have developed an unexpected hotspot in one region of your table. For
instance, an application which tracks web searches might be inundated by a lot of
searches for a celebrity in the event of news about that celebrity. See <xref
linkend="perf.one.region"/> for more discussion about this particular
scenario.</para>
</listitem>
<listitem>
<para>You have greatly increased the number of Region Servers in your cluster and want
to spread the load out quickly.</para>
</listitem>
<listitem>
<para>You are about to run a bulk load that is likely to cause unusual and uneven load
across regions.</para>
</listitem>
</itemizedlist>
<para>See <xref linkend="disable.splitting"/> for a discussion about the dangers and
possible benefits of managing splitting completely manually.</para>
<section>
<title>Determining Split Points</title>
<para>The goal of splitting your table manually is to improve the chances of balancing the
load across the cluster in situations where good rowkey design alone won't get you
there. With that in mind, how you split your regions depends heavily on the
characteristics of your data. You may already know the best way to split your table. If
not, choose split points based on what your rowkeys look like.</para>
<variablelist>
<varlistentry>
<term>Alphanumeric Rowkeys</term>
<listitem>
<para>If your rowkeys start with a letter or number, you can split your table at
letter or number boundaries. For instance, the following command creates a table
with regions that split at each vowel. Five split points produce six regions: the
first region holds every rowkey that sorts before a, the second region has a-d, the
third region has e-h, the fourth region has i-n, the fifth region has o-t, and the
sixth region has u and everything that sorts after it.</para>
<screen>hbase> create 'test_table', 'f1', SPLITS=> ['a', 'e', 'i', 'o', 'u']</screen>
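<para>The mapping from split points to region boundaries can be sketched in plain Java.
This is only an illustration, assuming each split point becomes the start key of the next
region. The class name SplitRanges and the method regionRanges are hypothetical and are
not part of HBase.</para>

```java
import java.util.ArrayList;
import java.util.List;

/** Illustrative only: shows how N split points produce N + 1 regions. */
class SplitRanges {

    /**
     * Returns one "[start, end)" description per region for the given
     * sorted split points. An empty string stands for an open boundary.
     */
    static List<String> regionRanges(String... splitPoints) {
        List<String> ranges = new ArrayList<>();
        String start = ""; // the first region has no start key
        for (String split : splitPoints) {
            ranges.add("[" + start + ", " + split + ")");
            start = split; // a split point is the start key of the next region
        }
        ranges.add("[" + start + ", )"); // the last region has no end key
        return ranges;
    }

    public static void main(String[] args) {
        // Same split points as the create command above.
        for (String range : regionRanges("a", "e", "i", "o", "u")) {
            System.out.println(range);
        }
    }
}
```

<para>The five split points yield six ranges, because the region before the first split
point and the region after the last one are both open-ended.</para>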
<para>The following command splits an existing table at split point '2'.</para>
<screen>hbase> split 'test_table', '2'</screen>
<para>You can also split a specific region by referring to its ID. You can find the
region ID by looking at either the table or region in the Web UI. It will be a
long string such as
<literal>t2,1,1410227759524.829850c6eaba1acc689480acd8f081bd.</literal>. The
format is <replaceable>table_name,start_key,region_id</replaceable>. To split that
region into two, as close to equally as possible (at the nearest row boundary),
issue the following command.</para>
<screen>hbase> split 't2,1,1410227759524.829850c6eaba1acc689480acd8f081bd.'</screen>
<para>The split key is optional. If it is omitted, the table or region is split in
half.</para>
<para>The following example shows how to use the RegionSplitter to create 10
regions, split at hexadecimal values.</para>
<screen>hbase org.apache.hadoop.hbase.util.RegionSplitter test_table HexStringSplit -c 10 -f f1</screen>
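<para>To give a feel for what evenly spaced hexadecimal split points look like, here is a
minimal sketch in plain Java with no HBase dependency. It is not the RegionSplitter
implementation, and its output may differ from what HexStringSplit produces. The class
name HexSplitSketch, the method splitPoints, and the assumption of fixed-width, lowercase
hexadecimal rowkeys (8 characters in the example) are all illustrative.</para>

```java
import java.math.BigInteger;
import java.util.ArrayList;
import java.util.List;

/** Illustrative only: evenly spaced split points over a hex keyspace. */
class HexSplitSketch {

    /**
     * Returns numRegions - 1 split points that divide the keyspace of
     * keyWidth hexadecimal digits into numRegions roughly equal parts.
     */
    static List<String> splitPoints(int numRegions, int keyWidth) {
        // Size of the keyspace: 16^keyWidth.
        BigInteger keyspace = BigInteger.ONE.shiftLeft(4 * keyWidth);
        BigInteger step = keyspace.divide(BigInteger.valueOf(numRegions));
        List<String> points = new ArrayList<>();
        for (int i = 1; i < numRegions; i++) {
            String hex = step.multiply(BigInteger.valueOf(i)).toString(16);
            // Left-pad with zeros so every split point has the same width.
            StringBuilder padded = new StringBuilder();
            for (int p = hex.length(); p < keyWidth; p++) {
                padded.append('0');
            }
            points.add(padded.append(hex).toString());
        }
        return points;
    }

    public static void main(String[] args) {
        // Ten regions need nine split points.
        for (String point : splitPoints(10, 8)) {
            System.out.println(point);
        }
    }
}
```

<para>Padding every split point to the same width matters because rowkeys are compared
byte by byte, so unpadded values would not sort in numeric order.</para>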
</listitem>
</varlistentry>
<varlistentry>
<term>Using a Custom Algorithm</term>
<listitem>
<para>The RegionSplitter tool is provided with HBase, and uses a <firstterm><link
xlink:href="http://hbase.apache.org/apidocs/org/apache/hadoop/hbase/util/RegionSplitter.SplitAlgorithm.html"
>SplitAlgorithm</link></firstterm> to determine split points for you. As
parameters, you give it the algorithm, desired number of regions, and column
families. It includes two split algorithms. The first is the <code><link
xlink:href="http://hbase.apache.org/apidocs/org/apache/hadoop/hbase/util/RegionSplitter.HexStringSplit.html"
>HexStringSplit</link></code> algorithm, which assumes the row keys are
hexadecimal strings. The second, <link
xlink:href="http://hbase.apache.org/apidocs/org/apache/hadoop/hbase/util/RegionSplitter.UniformSplit.html"
>UniformSplit</link>, assumes the row keys are random byte arrays. You will
probably need to develop your own SplitAlgorithm, using the provided ones as
models.</para>
</listitem>
</varlistentry>
</variablelist>
</section>
</section>
<section>
<title>Online Region Merges</title>

@@ -1355,7 +1355,9 @@ index e70ebc6..96f8c27 100644
<varname>hbase.hregion.max.filesize</varname>,
<varname>hbase.regionserver.regionSplitLimit</varname>. A simplistic view of splitting
is that when a region grows to <varname>hbase.hregion.max.filesize</varname>, it is split.
For most use patterns, most of the time, you should use automatic splitting.</para>
For most use patterns, most of the time, you should use automatic splitting. See <xref
linkend="manual_region_splitting_decisions"/> for more information about manual region
splitting.</para>
<para>Instead of allowing HBase to split your regions automatically, you can choose to
manage the splitting yourself. This feature was added in HBase 0.90.0. Manually managing
splits works if you know your keyspace well; otherwise, let HBase figure out where to split for you.

@@ -1730,8 +1730,8 @@ hbase> restore_snapshot 'myTableSnapshot-122112'
pre-split 1 region per RS at most), especially if you don't know how much each table will
grow. If you split too much, you may end up with too many regions, with some tables having
too many small regions.</para>
<para>For pre-splitting howto, see <xref
linkend="precreate.regions" />.</para>
<para>For pre-splitting howto, see <xref linkend="manual_region_splitting_decisions"/> and
<xref linkend="precreate.regions"/>.</para>
</section>
<!-- ops.capacity.config.presplit -->
</section>

@@ -680,9 +680,9 @@ admin.createTable(table, startKey, endKey, numberOfRegions);
byte[][] splits = ...; // create your own splits
admin.createTable(table, splits);
</programlisting>
<para> See <xref
linkend="rowkey.regionsplits" /> for issues related to understanding your keyspace and
pre-creating regions. </para>
<para> See <xref linkend="rowkey.regionsplits"/> for issues related to understanding your
keyspace and pre-creating regions. See <xref linkend="manual_region_splitting_decisions"/>
for discussion on manually pre-splitting regions.</para>
</section>
<section
xml:id="def.log.flush">