hbase-7241. refGuide. Perf/Schema design cleanup.

git-svn-id: https://svn.apache.org/repos/asf/hbase/trunk@1415422 13f79535-47bb-0310-9956-ffa450edef68
Doug Meil 2012-11-29 22:46:02 +00:00
parent 726f822774
commit cb8aca6e74
2 changed files with 47 additions and 27 deletions


@ -775,6 +775,34 @@ System.out.println("md5 digest as string length: " + sbDigest.length); // ret
<para>Lesson #2: While generally not advisable, using hex-keys (and more generally, displayable data) can still work with pre-split
tables as long as all the created regions are accessible in the keyspace.
</para>
<para>To conclude this example, the following shows how appropriate splits can be pre-created for hex-keys:
</para>
<programlisting>public static boolean createTable(HBaseAdmin admin, HTableDescriptor table, byte[][] splits)
    throws IOException {
  try {
    admin.createTable( table, splits );
    return true;
  } catch (TableExistsException e) {
    logger.info("table " + table.getNameAsString() + " already exists");
    // the table already exists...
    return false;
  }
}

public static byte[][] getHexSplits(String startKey, String endKey, int numRegions) {
  byte[][] splits = new byte[numRegions-1][];
  BigInteger lowestKey = new BigInteger(startKey, 16);
  BigInteger highestKey = new BigInteger(endKey, 16);
  BigInteger range = highestKey.subtract(lowestKey);
  BigInteger regionIncrement = range.divide(BigInteger.valueOf(numRegions));
  lowestKey = lowestKey.add(regionIncrement);
  for(int i=0; i &lt; numRegions-1;i++) {
    BigInteger key = lowestKey.add(regionIncrement.multiply(BigInteger.valueOf(i)));
    byte[] b = String.format("%016x", key).getBytes();
    splits[i] = b;
  }
  return splits;
}</programlisting>
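<para>To see what <code>getHexSplits</code> produces, the following is a minimal, self-contained sketch that repeats the same arithmetic without any HBase dependency and prints the resulting split keys. The class name <code>HexSplitSketch</code>, the 16-digit key range and the region count of 4 are assumptions chosen for illustration only; substitute the bounds of your own keyspace.
</para>

```java
import java.math.BigInteger;
import java.nio.charset.StandardCharsets;

// Hypothetical class name; the split arithmetic mirrors getHexSplits above.
final class HexSplitSketch {

  // Returns numRegions-1 split keys, each a 16-digit lowercase hex string as ASCII bytes.
  static byte[][] getHexSplits(String startKey, String endKey, int numRegions) {
    byte[][] splits = new byte[numRegions - 1][];
    BigInteger lowestKey = new BigInteger(startKey, 16);
    BigInteger highestKey = new BigInteger(endKey, 16);
    BigInteger regionIncrement =
        highestKey.subtract(lowestKey).divide(BigInteger.valueOf(numRegions));
    for (int i = 0; i != splits.length; i++) {
      // The (i+1)-th boundary sits (i+1) increments above the lowest key.
      BigInteger key = lowestKey.add(regionIncrement.multiply(BigInteger.valueOf(i + 1)));
      splits[i] = String.format("%016x", key).getBytes(StandardCharsets.US_ASCII);
    }
    return splits;
  }

  public static void main(String[] args) {
    // Assumed example range: the full 16-hex-digit keyspace, cut into 4 regions.
    byte[][] splits = getHexSplits("0000000000000000", "ffffffffffffffff", 4);
    for (byte[] split : splits) {
      System.out.println(new String(split, StandardCharsets.US_ASCII));
    }
  }
}
```

<para>For this assumed range the three split keys are <code>3fffffffffffffff</code>, <code>7ffffffffffffffe</code> and <code>bffffffffffffffd</code>. Because the increment is rounded down by integer division, the last region ends up slightly larger than the others.
</para>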
</section>
</section> <!-- rowkey design -->
<section xml:id="schema.versions">


@ -303,35 +303,27 @@
Table Creation: Pre-Creating Regions
</title>
<para>
Tables in HBase are initially created with one region by default. For bulk imports, this means that all clients will write to the same region until it is large enough to split and become distributed across the cluster. A useful pattern to speed up the bulk import process is to pre-create empty regions. Be somewhat conservative in this, because too many regions can actually degrade performance. An example of pre-creation using hex-keys is as follows (note: this example may need to be tweaked for the individual application's keys):
Tables in HBase are initially created with one region by default. For bulk imports, this means that all clients will write to the same region
until it is large enough to split and become distributed across the cluster. A useful pattern to speed up the bulk import process is to pre-create empty regions.
Be somewhat conservative in this, because too many regions can actually degrade performance.
</para>
<para>There are two different approaches to pre-creating splits. The first approach is to rely on the default <code>HBaseAdmin</code> strategy
(which is implemented in <code>Bytes.split</code>)...
</para>
<programlisting>
byte[] startKey = ...; // your lowest key
byte[] endKey = ...; // your highest key
int numberOfRegions = ...; // # of regions to create
admin.createTable(table, startKey, endKey, numberOfRegions);
</programlisting>
<para>And the other approach is to define the splits yourself...
</para>
<programlisting>
byte[][] splits = ...; // create your own splits
admin.createTable(table, splits);
</programlisting>
<para>
<programlisting>public static boolean createTable(HBaseAdmin admin, HTableDescriptor table, byte[][] splits)
    throws IOException {
  try {
    admin.createTable( table, splits );
    return true;
  } catch (TableExistsException e) {
    logger.info("table " + table.getNameAsString() + " already exists");
    // the table already exists...
    return false;
  }
}

public static byte[][] getHexSplits(String startKey, String endKey, int numRegions) {
  byte[][] splits = new byte[numRegions-1][];
  BigInteger lowestKey = new BigInteger(startKey, 16);
  BigInteger highestKey = new BigInteger(endKey, 16);
  BigInteger range = highestKey.subtract(lowestKey);
  BigInteger regionIncrement = range.divide(BigInteger.valueOf(numRegions));
  lowestKey = lowestKey.add(regionIncrement);
  for(int i=0; i &lt; numRegions-1;i++) {
    BigInteger key = lowestKey.add(regionIncrement.multiply(BigInteger.valueOf(i)));
    byte[] b = String.format("%016x", key).getBytes();
    splits[i] = b;
  }
  return splits;
}</programlisting>
See <xref linkend="rowkey.regionsplits"/> for issues related to understanding your keyspace and pre-creating regions.
</para>
</section>
<section xml:id="def.log.flush">