HBASE-4208 updates to book.xml, performance.xml
git-svn-id: https://svn.apache.org/repos/asf/hbase/trunk@1158427 13f79535-47bb-0310-9956-ffa450edef68
parent 9686b3d4ba
commit f43cb3245c
@@ -108,10 +108,10 @@ TableMapReduceUtil.initTableMapperJob(
 job // job instance
 );</programlisting>
 ...and the mapper instance would extend <link xlink:href="http://hbase.apache.org/apidocs/org/apache/hadoop/hbase/mapreduce/TableMapper.html">TableMapper</link>...
-<programlisting>public class MyMapper extends TableMapper<Text, LongWritable> {
-public void map(ImmutableBytesWritable row, Result value, Context context)
-throws InterruptedException, IOException {
+<programlisting>
+public class MyMapper extends TableMapper<Text, LongWritable> {
+public void map(ImmutableBytesWritable row, Result value, Context context) throws InterruptedException, IOException {
 // process data for the row from the Result instance.</programlisting>
 </para>
 </section>
 <section xml:id="mapreduce.htable.access">
@@ -211,7 +211,7 @@ admin.enableTable(table);
 </section>
 <section xml:id="keysize">
 <title>Try to minimize row and column sizes</title>
-<subtitle>Or why are my storefile indices large?</subtitle>
+<subtitle>Or why are my StoreFile indices large?</subtitle>
 <para>In HBase, values are always freighted with their coordinates; as a
 cell value passes through the system, it'll be accompanied by its
 row, column name, and timestamp - always. If your rows and column names
@@ -230,9 +230,25 @@ admin.enableTable(table);
 Compression will also make for larger indices. See
 the thread <link xlink:href="http://search-hadoop.com/m/hemBv1LiN4Q1/a+question+storefileIndexSize&subj=a+question+storefileIndexSize">a question storefileIndexSize</link>
 up on the user mailing list.
-`</para>
-<para>In summary, although verbose attribute names (e.g., "myImportantAttribute") are easier to read, you pay for the clarity in storage and increased I/O - use shorter attribute names and constants.
-Also, try to keep the row-keys as small as possible too.</para>
+</para>
+<para>Most frequently, small inefficiencies don't matter all that much. Unfortunately,
+this is a case where they do. Whatever patterns are selected for ColumnFamilies, attributes, and rowkeys, they could be repeated
+several billion times in your data.</para>
+<section xml:id="keysize.cf"><title>Column Families</title>
+<para>Try to keep the ColumnFamily names as small as possible, preferably one character (e.g. "d" for data/default).
+</para>
+</section>
+<section xml:id="keysize.atttributes"><title>Attributes</title>
+<para>Although verbose attribute names (e.g., "myVeryImportantAttribute") are easier to read, prefer shorter attribute names (e.g., "via")
+to store in HBase.
+</para>
+</section>
+<section xml:id="keysize.row"><title>Row Key</title>
+<para>Keep them as short as is reasonable such that they can still be useful for required data access (e.g., Get vs. Scan).
+A short key that is useless for data access is not better than a longer key with better get/scan properties. Expect tradeoffs
+when designing rowkeys.
+</para>
+</section>
 </section>
 <section xml:id="schema.versions">
 <title>
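The savings described in the keysize sections above can be made concrete with a back-of-the-envelope estimate. The sketch below assumes the approximate KeyValue layout (4-byte key length, 4-byte value length, 2-byte row length, 1-byte family length, 8-byte timestamp, 1-byte key type); the layout constants, the class name, and the example sizes are illustrative assumptions, not part of the patch.

```java
public class CellSizeEstimate {
    // Approximate per-cell bytes, assuming the KeyValue layout:
    // 4B key length + 4B value length + 2B row length + row
    // + 1B family length + family + qualifier + 8B timestamp + 1B type + value.
    static long cellBytes(int rowLen, int familyLen, int qualifierLen, int valueLen) {
        return 4 + 4 + 2 + rowLen + 1 + familyLen + qualifierLen + 8 + 1 + valueLen;
    }

    public static void main(String[] args) {
        // 16-byte row key and 8-byte value in both cases; only the
        // family and qualifier names differ.
        long verbose = cellBytes(16, "myColumnFamily".length(),
                                 "myVeryImportantAttribute".length(), 8);
        long terse = cellBytes(16, "d".length(), "via".length(), 8);
        // Every cell carries its coordinates, so the difference is paid
        // once per cell - several billion times over in a large table.
        System.out.println("verbose=" + verbose + " terse=" + terse
                + " savedPerCell=" + (verbose - terse));
    }
}
```

At one cell the 34-byte difference is noise; at a billion cells it is roughly 34 GB of raw key data before compression, which is exactly why the indices grow.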
@@ -289,6 +305,14 @@ admin.enableTable(table);
 <para>See <link xlink:href="http://hbase.apache.org/apidocs/org/apache/hadoop/hbase/HColumnDescriptor.html">HColumnDescriptor</link> for more information.
 </para>
 </section>
+<section xml:id="ttl">
+<title>Time To Live (TTL)</title>
+<para>ColumnFamilies can set a TTL length in seconds, and HBase will automatically delete rows once the expiration time is reached.
+This applies to <emphasis>all</emphasis> versions of a row - even the current one. The TTL time encoded in HBase for the row is specified in UTC.
+</para>
+<para>See <link xlink:href="http://hbase.apache.org/apidocs/org/apache/hadoop/hbase/HColumnDescriptor.html">HColumnDescriptor</link> for more information.
+</para>
+</section>
 <section xml:id="secondary.indexes">
 <title>
 Secondary Indexes and Alternate Query Paths
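To make the TTL semantics above concrete, here is a minimal sketch of the expiry arithmetic: TTLs are configured in seconds, cell timestamps are UTC epoch milliseconds, and a cell is considered expired once its age reaches the family's TTL. The method name and the exact `>=` comparison are assumptions for illustration, not taken from HBase internals.

```java
public class TtlCheck {
    // Sketch of a TTL expiry test. ttlSeconds mirrors the per-ColumnFamily
    // TTL setting; timestamps are UTC milliseconds, so no timezone math
    // is involved - which is why the stored TTL time is "specified in UTC".
    static boolean isExpired(long cellTsMillis, int ttlSeconds, long nowMillis) {
        return nowMillis - cellTsMillis >= ttlSeconds * 1000L;
    }

    public static void main(String[] args) {
        long writtenAt = 1_700_000_000_000L; // UTC epoch millis (example value)
        int ttl = 86_400;                    // one day, in seconds
        System.out.println(isExpired(writtenAt, ttl, writtenAt + 86_399_000L)); // still live
        System.out.println(isExpired(writtenAt, ttl, writtenAt + 86_400_000L)); // expired
    }
}
```

Note the check depends only on the cell timestamp, not on which version a cell is, matching the statement that TTL applies to all versions of a row including the current one.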
@@ -336,4 +336,25 @@ htable.close();</programlisting></para>
 </section>
 
 </section> <!-- reading -->
+
+<section xml:id="perf.deleting">
+<title>Deleting from HBase</title>
+<section xml:id="perf.deleting.queue">
+<title>Using HBase Tables as Queues</title>
+<para>HBase tables are sometimes used as queues. In this case, special care must be taken to regularly perform major compactions on tables used in
+this manner. As is documented in <xref linkend="datamodel" />, marking rows as deleted creates additional StoreFiles which then need to be processed
+on reads. Tombstones only get cleaned up with major compactions.
+</para>
+<para>See also <xref linkend="compaction" /> and <link xlink:href="http://hbase.apache.org/apidocs/org/apache/hadoop/hbase/client/HBaseAdmin.html#majorCompact%28java.lang.String%29">HBaseAdmin.majorCompact</link>.
+</para>
+</section>
+<section xml:id="perf.deleting.rpc">
+<title>Delete RPC Behavior</title>
+<para>Be aware that <code>htable.delete(Delete)</code> doesn't use the writeBuffer. It will execute a RegionServer RPC with each invocation.
+For a large number of deletes, consider <code>htable.delete(List)</code>.
+</para>
+<para>See <link xlink:href="http://hbase.apache.org/apidocs/org/apache/hadoop/hbase/client/HTable.html#delete%28org.apache.hadoop.hbase.client.Delete%29">HTable.delete</link>.
+</para>
+</section>
+</section> <!-- deleting -->
 </chapter>
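The RPC behavior described in the perf.deleting.rpc section can be illustrated with a toy client. Everything here (`DeleteRpcSketch`, the counter, the two overloads) is a hypothetical stand-in for the real HTable API, and the real client groups a batched delete by region rather than issuing strictly one request, so the batch costs "one round trip" only in this simplified model.

```java
import java.util.ArrayList;
import java.util.List;

public class DeleteRpcSketch {
    // Hypothetical stand-in for the RegionServer round trip; just counts calls.
    static int rpcCount = 0;

    // Mirrors htable.delete(Delete): one RPC per invocation, no writeBuffer.
    static void delete(String row) {
        rpcCount++;
    }

    // Mirrors htable.delete(List): the whole batch travels together.
    // (Simplification: the real client issues one request per region
    // holding affected rows, not one overall.)
    static void delete(List<String> rows) {
        if (!rows.isEmpty()) {
            rpcCount++;
        }
    }

    public static void main(String[] args) {
        List<String> rows = new ArrayList<>();
        for (int i = 0; i < 1_000; i++) {
            rows.add("row-" + i);
        }
        for (String row : rows) {
            delete(row);              // 1,000 round trips, one per Delete
        }
        int oneByOne = rpcCount;
        rpcCount = 0;
        delete(rows);                 // a single (simplified) round trip
        System.out.println(oneByOne + " RPCs vs " + rpcCount);
    }
}
```

The per-invocation cost is why a delete-heavy "queue" workload generates both heavy RPC traffic and the tombstone buildup that only a major compaction clears.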