HBASE-4189 small fixes in book.xml and performance.xml

git-svn-id: https://svn.apache.org/repos/asf/hbase/trunk@1156398 13f79535-47bb-0310-9956-ffa450edef68
2011-08-10 23:03:54 +00:00 · 2011-08-10 23:03:54 +00:00 · 0cfb97d014
parent 3ab49af40f
commit 0cfb97d014
2 changed files with 17 additions and 8 deletions
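
Note on the keysize paragraph added below: since every cell value travels with its row, column name, and timestamp, the cost of verbose names can be estimated with a rough per-cell byte count. The following is a minimal sketch under a simplified model, not HBase's actual on-disk layout. It counts only the row key, column family, qualifier, and an 8-byte timestamp, and the helper names and sample keys are hypothetical.

```python
# Simplified model of the "coordinates" that accompany every cell value.
# Assumption: only row key, column family, qualifier and an 8-byte timestamp
# are counted; real storage adds length prefixes and other bookkeeping.
TIMESTAMP_BYTES = 8


def coordinate_bytes(row: bytes, family: bytes, qualifier: bytes) -> int:
    """Bytes of coordinates carried alongside one cell value (simplified)."""
    return len(row) + len(family) + len(qualifier) + TIMESTAMP_BYTES


def overhead_ratio(row: bytes, family: bytes, qualifier: bytes, value: bytes) -> float:
    """Fraction of a cell's bytes spent on coordinates rather than the value."""
    coords = coordinate_bytes(row, family, qualifier)
    return coords / (coords + len(value))


# Hypothetical sample keys: a verbose schema versus a compact one.
verbose = coordinate_bytes(b"user-0000000001", b"attributes", b"myImportantAttribute")
compact = coordinate_bytes(b"u1", b"a", b"ia")

print("verbose coordinates:", verbose, "bytes per cell")
print("compact coordinates:", compact, "bytes per cell")
print("verbose overhead for a 4-byte value:",
      round(overhead_ratio(b"user-0000000001", b"attributes",
                           b"myImportantAttribute", b"1234"), 2))
```

Under this model the per-cell difference is multiplied by the number of cells stored, which is why short attribute names and row keys matter most when cell values are small.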
--- a/src/docbkx/book.xml
+++ b/src/docbkx/book.xml
@@ -179,7 +179,7 @@ admin.enableTable(table);
      On the number of column families
  </title>
  <para>
-      HBase currently does not do well with anything about two or three column families so keep the number
+      HBase currently does not do well with anything above two or three column families so keep the number
      of column families in your schema low.  Currently, flushing and compactions are done on a per Region basis so
      if one column family is carrying the bulk of the data bringing on flushes, the adjacent families
      will also be flushed though the amount of data they carry is small.  Compaction is currently triggered
@@ -187,7 +187,7 @@ admin.enableTable(table);
      flushing and compaction interaction can make for a bunch of needless i/o loading (To be addressed by
      changing flushing and compaction to work on a per column family basis).
    </para>
-    <para>Try to make do with one column famliy if you can in your schemas.  Only introduce a
+    <para>Try to make do with one column family if you can in your schemas.  Only introduce a
        second and third column family in the case where data access is usually column scoped;
        i.e. you query one column family or the other but usually not both at the one time.
    </para>
@@ -214,7 +214,7 @@ admin.enableTable(table);
      <subtitle>Or why are my storefile indices large?</subtitle>
      <para>In HBase, values are always freighted with their coordinates; as a
          cell value passes through the system, it'll be accompanied by its
-          row, column name, and timestamp.  Always.  If your rows and column names
+          row, column name, and timestamp - always.  If your rows and column names
          are large, especially compared to the size of the cell value, then
          you may run up against some interesting scenarios.  One such is
          the case described by Marc Limotte at the tail of
@@ -231,6 +231,8 @@ admin.enableTable(table);
                  the thread <link xlink:href="http://search-hadoop.com/m/hemBv1LiN4Q1/a+question+storefileIndexSize&amp;subj=a+question+storefileIndexSize">a question storefileIndexSize</link>
                  up on the user mailing list.
      `</para>
+      <para>In summary, although verbose attribute names (e.g., "myImportantAttribute") are easier to read, you pay for that clarity in storage and increased I/O, so prefer shorter attribute names and constants.
+      Also keep row keys as small as possible.</para>
  </section>
  <section xml:id="schema.versions">
  <title>
--- a/src/docbkx/performance.xml
+++ b/src/docbkx/performance.xml
@@ -128,12 +128,19 @@

  </section>

-  <section xml:id="perf.number.of.cfs">
-    <title>Number of Column Families</title>
-
-    <para>See <xref linkend="number.of.cfs" />.</para>
+  <section xml:id="perf.schema">
+      <title>Schema Design</title>
+  
+    <section xml:id="perf.number.of.cfs">
+      <title>Number of Column Families</title>
+      <para>See <xref linkend="number.of.cfs" />.</para>
+    </section>
+    <section xml:id="perf.schema.keys">
+      <title>Key and Attribute Lengths</title>
+      <para>See <xref linkend="keysize" />.</para>
+    </section>
  </section>
-
+  
  <section xml:id="perf.writing">
    <title>Writing to HBase</title>