Added section on keeping row and column names small to schema section

git-svn-id: https://svn.apache.org/repos/asf/hbase/trunk@1080332 13f79535-47bb-0310-9956-ffa450edef68
2011-03-10 20:04:27 +00:00 · 2011-03-10 20:04:27 +00:00 · a82042205a
parent 3d4a190562
commit a82042205a
1 changed files with 19 additions and 0 deletions
--- a/src/docbkx/book.xml
+++ b/src/docbkx/book.xml
@@ -1384,6 +1384,25 @@ of all regions.
  successful example.  It has a page describing the schema it uses in
  HBase.  You might also consider just using OpenTSDB altogether.</para>
  </section>
+  <section xml:id="keysize">
+      <title>Try to minimize row and column sizes</title>
+      <para>In HBase, values are always freighted with their coordinates; as a
+          cell value passes through the system, it'll be accompanied by its
+          row, column name, and timestamp.  Always.  If your rows and column names
+          are large, especially compared to the size of the cell value, then
+          you may run up against some interesting scenarios.  One such is
+          the case described by Marc Limotte at the tail of
+          <link xlink:href="https://issues.apache.org/jira/browse/HBASE-3551?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&amp;focusedCommentId=13005272#comment-13005272">HBASE-3551</link>
+          (recommended!).
+          Therein, the indices that are kept on HBase storefiles (<link linkend="hfile">HFile</link>s)
+                  to facilitate random access may end up occupying large chunks of the RAM
+                  allotted to HBase because the cell value coordinates are large.
+                  Marc, in the above-cited comment, suggests upping the block size so
+                  entries in the store file index happen at a larger interval, or
+                  modifying the table schema so it makes for smaller row and column
+                  names.
+      </para>
+  </section>
  </chapter>

  <chapter xml:id="hbase_metrics">
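
To make the point of the added section concrete, here is a rough back-of-the-envelope sketch of how much of each stored cell is coordinates rather than value. It assumes the classic KeyValue layout (4-byte key length, 4-byte value length, 2-byte row length, 1-byte family length, 8-byte timestamp, 1-byte key type). The function names and the sample row keys are hypothetical and are not part of any HBase API; real on-disk sizes also depend on compression and encoding, which this ignores.

```python
# Illustrative only: estimates the uncompressed size of one cell under an
# ASSUMED classic KeyValue layout. Names are hypothetical, not an HBase API.

# key length (4) + value length (4) + row length (2) + family length (1)
# + timestamp (8) + key type (1)
FIXED_OVERHEAD = 4 + 4 + 2 + 1 + 8 + 1


def cell_size(row: bytes, family: bytes, qualifier: bytes, value: bytes) -> int:
    """Approximate bytes used by one cell, coordinates included."""
    return FIXED_OVERHEAD + len(row) + len(family) + len(qualifier) + len(value)


def coordinate_fraction(row: bytes, family: bytes, qualifier: bytes, value: bytes) -> float:
    """Share of the cell that is NOT the value itself."""
    total = cell_size(row, family, qualifier, value)
    return (total - len(value)) / total


value = (12345).to_bytes(8, "big")  # an 8-byte counter

# Verbose, human-readable names (hypothetical schema).
verbose = (b"customer-0000000042-2011-03-10", b"metrics", b"page_view_count", value)

# Compact names for the same data: 8-byte id + 4-byte date, short family/qualifier.
compact = ((42).to_bytes(8, "big") + (20110310).to_bytes(4, "big"), b"m", b"pv", value)

for label, cell in (("verbose", verbose), ("compact", compact)):
    print(label, cell_size(*cell), round(coordinate_fraction(*cell), 2))
```

Under these assumptions the verbose schema spends 80 bytes per cell and the compact one 43, for the same 8-byte value. Store file index entries carry keys too, so the same reasoning applies to index memory.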