From a82042205a0604004fad7fa5cb9eecf9b43426af Mon Sep 17 00:00:00 2001 From: Michael Stack Date: Thu, 10 Mar 2011 20:04:27 +0000 Subject: [PATCH] Added section on keeping row and column names small to schema section git-svn-id: https://svn.apache.org/repos/asf/hbase/trunk@1080332 13f79535-47bb-0310-9956-ffa450edef68 --- src/docbkx/book.xml | 19 +++++++++++++++++++ 1 file changed, 19 insertions(+) diff --git a/src/docbkx/book.xml b/src/docbkx/book.xml index 9c4a9038f14..ada9ae638b6 100644 --- a/src/docbkx/book.xml +++ b/src/docbkx/book.xml @@ -1384,6 +1384,25 @@ of all regions. successful example. It has a page describing the schema it uses in HBase. You might also consider just using OpenTSDB altogether. +
+ Try to minimize row and column sizes + In HBase, values are always freighted with their coordinates; as a + cell value passes through the system, it'll be accompanied by its + row, column name, and timestamp. Always. If your row and column names + are large, especially compared to the size of the cell value, then + you may run up against some interesting scenarios. One such is + the case described by Marc Limotte at the tail of + HBASE-3551 + (recommended!). + Therein, the indices that are kept on HBase storefiles (HFiles) + to facilitate random access may end up occupying large chunks of the HBase + allotted RAM because the cell value coordinates are large. + Marc, in the above-cited comment, suggests upping the block size so + entries in the store file index happen at a larger interval, or + modifying the table schema so it makes for smaller rows and column + names. + +
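The cost described in the added section can be estimated with a little arithmetic. The sketch below is illustrative only and is not part of this patch. It assumes the classic HBase KeyValue layout (4-byte key length, 4-byte value length, 2-byte row length, 1-byte family length, 8-byte timestamp, 1-byte key type) and ignores compression, block encoding, and any other index bookkeeping. The function names are hypothetical, and the index estimate simply assumes one index entry per block, each holding roughly one key.

```python
import math

# Fixed-width fields in the assumed KeyValue layout (bytes).
KEY_LEN_FIELD = 4      # length of the key portion
VALUE_LEN_FIELD = 4    # length of the value portion
ROW_LEN_FIELD = 2      # length of the row
FAMILY_LEN_FIELD = 1   # length of the column family
TIMESTAMP_FIELD = 8
KEY_TYPE_FIELD = 1     # Put, Delete, ...


def key_size(row: bytes, family: bytes, qualifier: bytes) -> int:
    """Bytes taken by the coordinates (the key) of a single cell."""
    return (ROW_LEN_FIELD + len(row)
            + FAMILY_LEN_FIELD + len(family)
            + len(qualifier)
            + TIMESTAMP_FIELD + KEY_TYPE_FIELD)


def cell_size(row: bytes, family: bytes, qualifier: bytes, value: bytes) -> int:
    """Bytes taken by the whole cell: both length fields, key, and value."""
    return (KEY_LEN_FIELD + VALUE_LEN_FIELD
            + key_size(row, family, qualifier) + len(value))


def coordinate_overhead(row: bytes, family: bytes, qualifier: bytes,
                        value: bytes) -> float:
    """Fraction of the cell that is not the value itself."""
    total = cell_size(row, family, qualifier, value)
    return (total - len(value)) / total


def estimated_index_bytes(store_file_bytes: int, block_bytes: int,
                          key_bytes: int) -> int:
    """Rough index footprint, assuming one entry per block and one key per entry."""
    entries = math.ceil(store_file_bytes / block_bytes)
    return entries * key_bytes


if __name__ == "__main__":
    # Hypothetical long and short names for the same one-byte value.
    long_names = (b"customer-account-0000000042", b"profile_information",
                  b"last_login_timestamp", b"1")
    short_names = (b"c42", b"p", b"ll", b"1")
    for label, cell in (("long", long_names), ("short", short_names)):
        print(label, cell_size(*cell), round(coordinate_overhead(*cell), 3))

    # Doubling the block size halves the number of index entries.
    key_bytes = key_size(*long_names[:3])
    gib = 1024 ** 3
    print(estimated_index_bytes(gib, 64 * 1024, key_bytes))
    print(estimated_index_bytes(gib, 128 * 1024, key_bytes))
```

Both remedies from the comment show up here: shorter row and column names shrink `key_size`, and a larger block size reduces the number of index entries.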