diff --git a/src/docbkx/book.xml b/src/docbkx/book.xml index bdc0fb1515e..5fbe4513f2c 100644 --- a/src/docbkx/book.xml +++ b/src/docbkx/book.xml @@ -648,15 +648,17 @@ admin.enableTable(table); Most of the time small inefficiencies don't matter all that much. Unfortunately, this is a case where they do. Whatever patterns are selected for ColumnFamilies, attributes, and rowkeys they could be repeated several billion times in your data. - See for more information on HBase stores data internally. + See for more information on how HBase stores data internally and why this is important.
Column Families Try to keep the ColumnFamily names as small as possible, preferably one character (e.g. "d" for data/default). + See for more information on how HBase stores data internally and why this is important.
Attributes Although verbose attribute names (e.g., "myVeryImportantAttribute") are easier to read, prefer shorter attribute names (e.g., "via") to store in HBase. + See for more information on how HBase stores data internally and why this is important.
Rowkey Length Keep them as short as is reasonable such that they can still be useful for required data access (e.g., Get vs. Scan). @@ -692,6 +694,7 @@ System.out.println("md5 digest as string length: " + sbDigest.length); // ret
+
Reverse Timestamps A common problem in database processing is quickly finding the most recent version of a value. A technique using reverse timestamps @@ -888,7 +891,7 @@ System.out.println("md5 digest as string length: " + sbDigest.length); // ret
Operational and Performance Configuration Options See the Performance section for more information operational and performance - schema design options, such as Bloom Filters, Table-configured regionsizes, and blocksizes. + schema design options, such as Bloom Filters, Table-configured regionsizes, compression, and blocksizes.
diff --git a/src/docbkx/performance.xml b/src/docbkx/performance.xml index 39e1226f5e5..39c2121a7e1 100644 --- a/src/docbkx/performance.xml +++ b/src/docbkx/performance.xml @@ -198,7 +198,8 @@
Key and Attribute Lengths - See . + See . See also for + compression caveats.
Table RegionSize The regionsize can be set on a per-table basis via setFileSize on @@ -244,6 +245,15 @@ Compression Production systems should use compression with their ColumnFamily definitions. See for more information. +
However... + Compression deflates data on disk. When it's in-memory (e.g., in the + MemStore) or on the wire (e.g., transferring between RegionServer and Client) it's inflated. + So while using ColumnFamily compression is a best practice, it's not going to completely eliminate + the impact of over-sized Keys, over-sized ColumnFamily names, or over-sized Column names. + + See for schema design tips, and for more information on how HBase stores data internally. + +
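As a rough illustration of why the name-length advice in this patch matters even with compression enabled, the sketch below adds up the rowkey, ColumnFamily, and attribute name bytes that each cell carries while uncompressed (in the MemStore or on the wire). It is not part of the patch and does not use the HBase API: `KeyOverheadSketch`, its methods, the 16-byte rowkey, and the one-billion cell count are all hypothetical, and fixed per-cell fields such as the timestamp are deliberately left out.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical helper for illustration only; not part of the HBase API.
final class KeyOverheadSketch {

    // Bytes of rowkey + ColumnFamily name + attribute name carried by one cell.
    // Fixed per-cell fields (timestamp, length prefixes, type) are ignored here.
    static long keyBytesPerCell(String rowkey, String family, String qualifier) {
        return utf8Length(rowkey) + utf8Length(family) + utf8Length(qualifier);
    }

    // Total name/key bytes when the same pattern repeats across many cells.
    static long totalKeyBytes(String rowkey, String family, String qualifier, long cells) {
        return keyBytesPerCell(rowkey, family, qualifier) * cells;
    }

    private static int utf8Length(String s) {
        return s.getBytes(StandardCharsets.UTF_8).length;
    }

    public static void main(String[] args) {
        long cells = 1_000_000_000L;        // assumed cell count
        String rowkey = "0123456789abcdef"; // assumed 16-byte rowkey

        long verbose = totalKeyBytes(rowkey, "data", "myVeryImportantAttribute", cells);
        long compact = totalKeyBytes(rowkey, "d", "via", cells);

        System.out.println("verbose names: " + verbose + " bytes");
        System.out.println("compact names: " + compact + " bytes");
        System.out.println("difference:    " + (verbose - compact) + " bytes");
    }
}
```

The names "d", "via", and "myVeryImportantAttribute" are the examples used in the patched text. The figures are uncompressed sizes, which is the state the new "However..." paragraph describes for data in memory and in transit.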