HBASE-4735 book.xml, schema design keysize code example
git-svn-id: https://svn.apache.org/repos/asf/hbase/trunk@1196860 13f79535-47bb-0310-9956-ffa450edef68
parent 88643f7220
commit c61f0f296f
@@ -567,8 +567,8 @@ admin.enableTable(table);
 </para>
 <section xml:id="number.of.cfs.card"><title>Cardinality of ColumnFamilies</title>
 <para>Where multiple ColumnFamilies exist in a single table, be aware of the cardinality (i.e., number of rows).
-If ColumnFamily-A has 1000,000 rows and ColumnFamily-B has 1 billion rows, ColumnFamily-A's data will likely be spread
-across many, many regions (and RegionServers). This makes mass scans for ColumnFamily-A less efficient.
+If ColumnFamilyA has 1000,000 rows and ColumnFamilyB has 1 billion rows, ColumnFamilyA's data will likely be spread
+across many, many regions (and RegionServers). This makes mass scans for ColumnFamilyA less efficient.
 </para>
 </section>
 </section>
@@ -631,11 +631,32 @@ admin.enableTable(table);
 when designing rowkeys.
 </para>
 </section>
-<section xml:id="keysize.example"><title>Numeric Example</title>
+<section xml:id="keysize.patterns"><title>Byte Patterns</title>
 <para>A long is 8 bytes. You can store an unsigned number up to 18,446,744,073,709,551,615 in those eight bytes.
 If you stored this number as a String -- presuming a byte per character -- you need nearly 3x the bytes.
-This is a perfect example of a small inefficiency that may not seem like much, but can add up in HBase when
-used as rowkeys.
+</para>
+<para>Not convinced? Below is some sample code that you can run on your own.
+<programlisting>
+// long
+//
+long l = 1234567890L;
+byte[] lb = Bytes.toBytes(l);
+System.out.println("long bytes length: " + lb.length); // returns 8
+
+String s = "" + l;
+byte[] sb = Bytes.toBytes(s);
+System.out.println("long as string length: " + sb.length); // returns 10
+
+// hash
+//
+MessageDigest md = MessageDigest.getInstance("MD5");
+byte[] digest = md.digest(Bytes.toBytes(s));
+System.out.println("md5 digest bytes length: " + digest.length); // returns 16
+
+String sDigest = new String(digest);
+byte[] sbDigest = Bytes.toBytes(sDigest);
+System.out.println("md5 digest as string length: " + sbDigest.length); // returns 26
+</programlisting>
 </para>
 </section>
 </section>
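
The listing added in this commit depends on HBase's Bytes utility and on a MessageDigest call that throws a checked exception, so it will not compile as pasted. The following is a minimal standalone sketch of the same size comparison, not the code from the commit. It assumes plain JDK stand-ins for Bytes.toBytes (ByteBuffer for the long, UTF-8 for the String); the class name KeySizeDemo and its helper methods are hypothetical. The new String(digest) step in the listing decodes raw digest bytes with the platform default charset, so the length it reports is likely to vary between environments; the sketch swaps in hex encoding as a deterministic text form instead.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

// Hypothetical demo class; uses JDK stand-ins rather than HBase's Bytes utility.
public class KeySizeDemo {

    // Fixed-width big-endian encoding of a long: always 8 bytes.
    static byte[] longBytes(long value) {
        return ByteBuffer.allocate(Long.BYTES).putLong(value).array();
    }

    // Decimal text encoding: one byte per character in UTF-8.
    static byte[] decimalStringBytes(long value) {
        return Long.toString(value).getBytes(StandardCharsets.UTF_8);
    }

    // Raw MD5 digest: always 16 bytes, whatever the input length.
    static byte[] md5(byte[] input) {
        try {
            return MessageDigest.getInstance("MD5").digest(input);
        } catch (NoSuchAlgorithmException e) {
            // MD5 is required of every Java platform implementation.
            throw new IllegalStateException(e);
        }
    }

    // Hex text form: two ASCII characters per input byte.
    static String toHex(byte[] bytes) {
        StringBuilder sb = new StringBuilder(bytes.length * 2);
        for (byte b : bytes) {
            sb.append(String.format("%02x", b & 0xff));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        long l = 1234567890L;
        byte[] digest = md5(decimalStringBytes(l));
        byte[] hexBytes = toHex(digest).getBytes(StandardCharsets.US_ASCII);

        System.out.println("long bytes length: " + longBytes(l).length);
        System.out.println("long as string length: " + decimalStringBytes(l).length);
        System.out.println("md5 digest bytes length: " + digest.length);
        System.out.println("md5 digest as hex string length: " + hexBytes.length);
    }
}
```

Under these assumptions the four lengths printed are 8, 10, 16 and 32: the text form of the long costs two extra bytes for this value, and the hex form of the digest is twice the size of the raw digest.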