From b08f746cd24a263fe14fe222caa146cee9099125 Mon Sep 17 00:00:00 2001 From: Doug Meil Date: Wed, 2 Nov 2011 21:09:47 +0000 Subject: [PATCH] HBASE-4731 book.xml, schema design - rowkey numeric example git-svn-id: https://svn.apache.org/repos/asf/hbase/trunk@1196801 13f79535-47bb-0310-9956-ffa450edef68 --- src/docbkx/book.xml | 10 +++++++++- 1 file changed, 9 insertions(+), 1 deletion(-) diff --git a/src/docbkx/book.xml b/src/docbkx/book.xml index 371a88e3cf1..7bb504980cb 100644 --- a/src/docbkx/book.xml +++ b/src/docbkx/book.xml @@ -614,7 +614,8 @@ admin.enableTable(table); Most of the time small inefficiencies don't matter all that much. Unfortunately, this is a case where they do. Whatever patterns are selected for ColumnFamilies, attributes, and rowkeys they could be repeated - several billion times in your data. See for more information on HBase stores data internally. + several billion times in your data. + See for more information on how HBase stores data internally.
Column Families Try to keep the ColumnFamily names as small as possible, preferably one character (e.g. "d" for data/default). @@ -630,6 +631,13 @@ admin.enableTable(table); when designing rowkeys.
+
Numeric Example + A Java long is 8 bytes. Read as an unsigned number, those eight bytes can hold values up to 18,446,744,073,709,551,615. + If you stored this number as a String -- presuming a byte per character and no separators -- you would need 20 bytes, 2.5x as many. + This is a perfect example of a small inefficiency that may not seem like much, but adds up in HBase when + such values are used as rowkeys. + +
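The size difference in the numeric example can be checked directly. The following is a minimal sketch using only the JDK. The class name `RowkeySize` and both helper methods are hypothetical names introduced for illustration and are not part of HBase. In HBase client code, `Bytes.toBytes(long)` from `org.apache.hadoop.hbase.util.Bytes` yields the same fixed-width 8-byte encoding.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;

public class RowkeySize {
    // Fixed-width binary form: always 8 bytes, big-endian.
    static byte[] longToBytes(long value) {
        return ByteBuffer.allocate(Long.BYTES).putLong(value).array();
    }

    // Unsigned decimal text form: one byte per digit, no separators.
    static byte[] longToStringBytes(long value) {
        return Long.toUnsignedString(value).getBytes(StandardCharsets.US_ASCII);
    }

    public static void main(String[] args) {
        // All 64 bits set: 18,446,744,073,709,551,615 when read as unsigned.
        long value = -1L;
        System.out.println(longToBytes(value).length);        // prints 8
        System.out.println(longToStringBytes(value).length);  // prints 20
    }
}
```

Because Java's `long` is signed, the sketch uses `-1L` for the all-ones bit pattern and `Long.toUnsignedString` to print it as the unsigned maximum. The largest signed value, `Long.MAX_VALUE`, still needs 19 bytes as text.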
Reverse Timestamps A common problem in database processing is quickly finding the most recent version of a value. A technique using reverse timestamps