diff --git a/src/main/docbkx/book.xml b/src/main/docbkx/book.xml index bfdefe445d0..fe0cd6c1ffc 100644 --- a/src/main/docbkx/book.xml +++ b/src/main/docbkx/book.xml @@ -3018,6 +3018,92 @@ myHtd.setValue(HTableDescriptor.SPLIT_POLICY, MyCustomSplitPolicy.class.getName( +
+ Manual Region Splitting + You can split a table manually, either at table creation (pre-splitting), + or later as an administrative action. You might choose to split a table manually for + one or more of the following reasons. There may be other valid reasons, but the need to + split manually might also point to problems with your schema design. + + Reasons to Manually Split Your Table + + Your rowkeys are a time series or otherwise increase monotonically, so new data always lands + at the end of the table. This means that the Region Server holding the last region is + always under load, and the other Region Servers are idle, or mostly idle. See also + . + + + Your table has developed an unexpected hotspot in one region. For + instance, an application which tracks web searches might be inundated with + searches for a celebrity when news about that celebrity breaks. See for more discussion about this particular + scenario. + + + You have significantly increased the number of Region Servers in your cluster and want to + spread the load quickly. + + + You are about to run a bulk load that is likely to cause unusual and uneven load across + regions. + + + See for a discussion about the dangers and + possible benefits of managing splitting completely manually. +
+ Determining Split Points + The goal of splitting your table manually is to improve the chances of balancing the + load across the cluster in situations where good rowkey design alone won't get you + there. With that in mind, how you split your regions depends heavily on the + characteristics of your data. You may already know the best way to split your + table. If not, the best approach depends on what your rowkeys look like. + + + Alphanumeric Rowkeys + + If your rowkeys start with a letter or number, you can split your table at + letter or number boundaries. For instance, the following command creates a table + with a split point at each lowercase vowel. Five split points produce six regions: the first + region holds rowkeys that sort before 'a', the next four regions hold a-d, e-h, i-n, and o-t, + and the last region holds 'u' and everything that sorts after it. + hbase> create 'test_table', 'f1', SPLITS => ['a', 'e', 'i', 'o', 'u'] + The following command splits an existing table at split point '2'. + hbase> split 'test_table', '2' + You can also split a specific region by referring to its ID. You can find the + region ID by looking at either the table or region in the Web UI. It will be a + long string such as + t2,1,1410227759524.829850c6eaba1acc689480acd8f081bd.. The + format is table_name,start_key,region_id. To split that + region into two, as close to equally as possible (at the nearest row boundary), + issue the following command. + hbase> split 't2,1,1410227759524.829850c6eaba1acc689480acd8f081bd.' + The split key is optional. If it is omitted, the table or region is split in + half. + The following example shows how to use the RegionSplitter to create 10 + regions, split at hexadecimal values. + hbase org.apache.hadoop.hbase.util.RegionSplitter test_table HexStringSplit -c 10 -f f1 + + + + Using a Custom Algorithm + + The RegionSplitter tool is provided with HBase, and uses a SplitAlgorithm to determine split points for you. 
As + parameters, you give it the algorithm, desired number of regions, and column + families. It includes two split algorithms. The first is the HexStringSplit algorithm, which assumes the row keys are + hexadecimal strings. The second, UniformSplit, assumes the row keys are random byte arrays. You will + probably need to develop your own SplitAlgorithm, using the provided ones as + models. + + + +
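Before writing a custom SplitAlgorithm, it can help to prototype the split-point arithmetic on its own. The following is a minimal sketch of evenly spaced split points over a hexadecimal keyspace. It is not HBase code and not the HexStringSplit implementation: the class name HexSplitSketch, the method splitPoints, and the assumption that rowkeys are 8-character lowercase hex strings are all hypothetical choices made for illustration.

```java
import java.math.BigInteger;
import java.util.ArrayList;
import java.util.List;

/** Hypothetical helper, not part of HBase: evenly spaced split points for hex rowkeys. */
class HexSplitSketch {
    // Assumption: rowkeys are 8-character lowercase hex strings (00000000 to ffffffff).
    static final int KEY_WIDTH = 8;

    /** Returns the numRegions - 1 split points that divide the keyspace into equal slices. */
    static List<String> splitPoints(int numRegions) {
        if (numRegions < 2) {
            throw new IllegalArgumentException("need at least 2 regions");
        }
        // Total number of possible keys: 16^KEY_WIDTH.
        BigInteger range = BigInteger.ONE.shiftLeft(4 * KEY_WIDTH);
        BigInteger n = BigInteger.valueOf(numRegions);
        List<String> points = new ArrayList<>();
        for (int i = 1; i < numRegions; i++) {
            // The i-th boundary sits at i/numRegions of the way through the keyspace.
            BigInteger p = range.multiply(BigInteger.valueOf(i)).divide(n);
            // Zero-pad so every split point has the same width as a rowkey.
            points.add(String.format("%0" + KEY_WIDTH + "x", p));
        }
        return points;
    }

    public static void main(String[] args) {
        // Print the 9 split points that would bound 10 regions.
        for (String p : splitPoints(10)) {
            System.out.println(p);
        }
    }
}
```

N regions always need N - 1 split points, which is why the loop starts at 1 and stops before numRegions. If your keys are not uniformly distributed over the keyspace, evenly spaced boundaries will not balance the load, and you would need boundaries derived from a sample of your real keys instead.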
+
Online Region Merges diff --git a/src/main/docbkx/configuration.xml b/src/main/docbkx/configuration.xml index 4a84164aee1..942822ce742 100644 --- a/src/main/docbkx/configuration.xml +++ b/src/main/docbkx/configuration.xml @@ -1355,7 +1355,9 @@ index e70ebc6..96f8c27 100644 hbase.hregion.max.filesize, hbase.regionserver.regionSplitLimit. A simplistic view of splitting is that when a region grows to hbase.hregion.max.filesize, it is split. - For most use patterns, most of the time, you should use automatic splitting. + For most use patterns, most of the time, you should use automatic splitting. See for more information about manual region + splitting. Instead of allowing HBase to split your regions automatically, you can choose to manage the splitting yourself. This feature was added in HBase 0.90.0. Manually managing splits works if you know your keyspace well, otherwise let HBase figure where to split for you. diff --git a/src/main/docbkx/ops_mgt.xml b/src/main/docbkx/ops_mgt.xml index 50ba37542ca..aafb422df56 100644 --- a/src/main/docbkx/ops_mgt.xml +++ b/src/main/docbkx/ops_mgt.xml @@ -1730,8 +1730,8 @@ hbase> restore_snapshot 'myTableSnapshot-122112' pre-split 1 region per RS at most), especially if you don't know how much each table will grow. If you split too much, you may end up with too many regions, with some tables having too many small regions. - For pre-splitting howto, see . + For pre-splitting howto, see and + .
diff --git a/src/main/docbkx/performance.xml b/src/main/docbkx/performance.xml index f562b4525d2..2cddacd1f4a 100644 --- a/src/main/docbkx/performance.xml +++ b/src/main/docbkx/performance.xml @@ -680,9 +680,9 @@ admin.createTable(table, startKey, endKey, numberOfRegions); byte[][] splits = ...; // create your own splits admin.createTable(table, splits); - See for issues related to understanding your keyspace and - pre-creating regions. + See for issues related to understanding your + keyspace and pre-creating regions. See + for discussion on manually pre-splitting regions.
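One way to build the byte[][] splits array passed to createTable is to start from human-readable boundaries. The sketch below assumes your split points can be written as strings. The class name SplitKeysSketch and the method toSplitKeys are hypothetical, and the code deliberately stops short of calling any HBase API so that it runs on its own.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

/** Hypothetical helper, not part of HBase: turns string boundaries into split keys. */
class SplitKeysSketch {
    /** Encodes each boundary as UTF-8 bytes and sorts the result in rowkey order. */
    static byte[][] toSplitKeys(String... boundaries) {
        byte[][] splits = new byte[boundaries.length][];
        for (int i = 0; i < boundaries.length; i++) {
            splits[i] = boundaries[i].getBytes(StandardCharsets.UTF_8);
        }
        // HBase orders rowkeys lexicographically by unsigned byte value,
        // so sort the split keys the same way.
        Arrays.sort(splits, (a, b) -> Arrays.compareUnsigned(a, b));
        return splits;
    }

    public static void main(String[] args) {
        // Boundaries given out of order on purpose; the helper sorts them.
        byte[][] splits = toSplitKeys("u", "a", "o", "e", "i");
        for (byte[] key : splits) {
            System.out.println(new String(key, StandardCharsets.UTF_8));
        }
    }
}
```

The resulting array could then be passed as the splits argument of admin.createTable(table, splits) shown above. Duplicate boundaries are not filtered here, so remove them before use.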