HBASE-4172 laundry list of changes (book, configuration, developer, performance)

git-svn-id: https://svn.apache.org/repos/asf/hbase/trunk@1154555 13f79535-47bb-0310-9956-ffa450edef68
This commit is contained in:
Doug Meil 2011-08-06 17:21:20 +00:00
parent 2961f7dcd4
commit dc8bf13685
4 changed files with 52 additions and 14 deletions


@ -260,6 +260,15 @@ admin.enableTable(table);
search the mailing list for conversations on this topic. All rows in HBase conform to the <xref linkend="datamodel">datamodel</xref>, and
that includes versioning. Take that into consideration when making your design, as well as block size for the ColumnFamily.
</para>
<section xml:id="counters">
<title>Counters</title>
<para>
One supported datatype that deserves special mention is the "counter" (i.e., the ability to do atomic increments of numbers). See
<link xlink:href="http://hbase.apache.org/apidocs/org/apache/hadoop/hbase/client/HTable.html#increment%28org.apache.hadoop.hbase.client.Increment%29">Increment</link> in HTable.
</para>
<para>Synchronization on counters is done on the RegionServer, not in the client.
</para>
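A minimal client sketch of an atomic increment (the table name "counters", the row "page1", and the column cf:hits are hypothetical, and a running cluster with an existing table is assumed):

```java
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.client.HTable;
import org.apache.hadoop.hbase.client.Increment;
import org.apache.hadoop.hbase.client.Result;
import org.apache.hadoop.hbase.util.Bytes;

public class CounterSketch {
  public static void main(String[] args) throws Exception {
    Configuration conf = HBaseConfiguration.create();
    HTable table = new HTable(conf, "counters");   // hypothetical table
    // Atomically add 1 to cf:hits on row "page1". The RegionServer performs
    // the synchronization, so concurrent clients cannot lose updates.
    Increment incr = new Increment(Bytes.toBytes("page1"));
    incr.addColumn(Bytes.toBytes("cf"), Bytes.toBytes("hits"), 1);
    Result result = table.increment(incr);
    // result holds the post-increment value of each incremented column
  }
}
```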
</section>
</section>
<section xml:id="cf.in.memory">
<title>
@ -811,7 +820,7 @@ admin.enableTable(table);
<link xlink:href="http://hbase.apache.org/apidocs/org/apache/hadoop/hbase/client/HTable.html">HTable</link>
is responsible for finding RegionServers that are serving the
particular row range of interest. It does this by querying
the <code>.META.</code> and <code>-ROOT-</code> catalog tables
(TODO: Explain). After locating the required
region(s), the client <emphasis>directly</emphasis> contacts
the RegionServer serving that region (i.e., it does not go
@ -842,6 +851,11 @@ HTable table2 = new HTable(conf2, "myTable");</programlisting>
For more information about how connections are handled in the HBase client,
see <link xlink:href="http://hbase.apache.org/apidocs/org/apache/hadoop/hbase/client/HConnectionManager.html">HConnectionManager</link>.
</para>
<section xml:id="client.connection.pooling"><title>Connection Pooling</title>
<para>For applications that require high-end multithreaded access (e.g., web servers or application servers that may serve many application threads
in a single JVM), see <link xlink:href="http://hbase.apache.org/apidocs/org/apache/hadoop/hbase/client/HTablePool.html">HTablePool</link>.
</para>
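A hedged sketch of the pooling pattern, assuming a running cluster; the table name "myTable", the column cf:qual, and the pool size of 10 are illustrative choices, not requirements:

```java
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.client.HTableInterface;
import org.apache.hadoop.hbase.client.HTablePool;
import org.apache.hadoop.hbase.client.Put;
import org.apache.hadoop.hbase.util.Bytes;

public class PoolSketch {
  // One pool per JVM, shared by all request threads; at most 10
  // HTable instances are retained per table name.
  private static final HTablePool POOL =
      new HTablePool(HBaseConfiguration.create(), 10);

  public void handleRequest(String row, String value) throws Exception {
    HTableInterface table = POOL.getTable("myTable");   // hypothetical table
    try {
      Put put = new Put(Bytes.toBytes(row));
      put.add(Bytes.toBytes("cf"), Bytes.toBytes("qual"), Bytes.toBytes(value));
      table.put(put);
    } finally {
      POOL.putTable(table);   // always return the table to the pool
    }
  }
}
```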
</section>
</section>
<section xml:id="client.writebuffer"><title>WriteBuffer and Batch Methods</title>
<para>If <xref linkend="perf.hbase.client.autoflush" /> is turned off on
@ -1055,7 +1069,7 @@ HTable table2 = new HTable(conf2, "myTable");</programlisting>
For a description of how a minor compaction picks files to compact, see the <link xlink:href="http://hbase.apache.org/xref/org/apache/hadoop/hbase/regionserver/Store.html#836">ascii diagram in the Store source code.</link>
</para>
<para>After a major compaction runs there will be a single storefile per store, and this usually improves performance. Caution: major compactions rewrite all of the store's data and on a loaded system, this may not be tenable;
major compactions will usually have to be done manually on large systems. See <xref linkend="managed.compactions" />.
</para>
</section>


@ -1076,6 +1076,17 @@ script to perform a network IO safe rolling split
of all regions.
</para>
</section>
<section xml:id="managed.compactions"><title>Managed Compactions</title>
<para>A common administrative technique is to manage major compactions manually, rather than letting
HBase do it. By default, <varname>HConstants.MAJOR_COMPACTION_PERIOD</varname> is one day and major compactions
may kick in when you least desire it - especially on a busy system. To "turn off" automatic major compactions set
the value to <varname>Long.MAX_VALUE</varname>.
</para>
<para>It is important to stress that major compactions are absolutely necessary for StoreFile cleanup; the only variable is when
they occur. They can be administered through the HBase shell, or via
<link xlink:href="http://hbase.apache.org/apidocs/org/apache/hadoop/hbase/client/HBaseAdmin.html#majorCompact%28java.lang.String%29">HBaseAdmin</link>.
</para>
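The period corresponds to the <varname>hbase.hregion.majorcompaction</varname> property, so automatic major compactions can be effectively disabled in hbase-site.xml (9223372036854775807 is Long.MAX_VALUE):

```xml
<!-- hbase-site.xml -->
<property>
  <name>hbase.hregion.majorcompaction</name>
  <!-- default is 86400000 ms (one day); Long.MAX_VALUE effectively disables
       time-based major compactions so they can be run manually instead -->
  <value>9223372036854775807</value>
</property>
```

Manual major compactions can then be issued from the HBase shell, e.g., <code>major_compact 'myTable'</code>.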
</section>
</section>


@ -35,6 +35,7 @@ git clone git://git.apache.org/hbase.git
<para>See <link xlink:href="https://issues.apache.org/jira/browse/HBASE-3678">HBASE-3678 Add Eclipse-based Apache Formatter to HBase Wiki</link>
for an Eclipse formatter to help ensure your code conforms to HBase's coding conventions.
The issue includes instructions for loading the attached formatter.</para>
<para>Also, no @author tags - that's a rule. Quality Javadoc comments are appreciated. And include the Apache license.</para>
</section>
<section xml:id="eclipse.svn">
<title>Subversive Plugin</title>
@ -129,13 +130,15 @@ mvn test -Dtest=TestXYZ
<section xml:id="getting.involved">
<title>Getting Involved</title>
<para>HBase gets better only when people contribute!
</para>
<section xml:id="mailing.list">
<title>Mailing Lists</title>
<para>Sign up for the dev-list and the user-list. See the
<link xlink:href="http://hbase.apache.org/mail-lists.html">mailing lists</link> page.
Posing questions - and helping to answer other people's questions - is encouraged!
There are varying levels of experience on both lists so patience and politeness are encouraged (and please
stay on topic.)
</para>
</section>
<section xml:id="jira">
@ -144,14 +147,19 @@ mvn test -Dtest=TestXYZ
Whether it's a new feature request, an enhancement, or a bug, file a ticket.
</para>
</section>
<section xml:id="codelines"><title>Codelines</title>
<para>Most development is done on TRUNK. However, there are branches for minor releases (e.g., 0.90.1, 0.90.2, and 0.90.3 are on the 0.90 branch).</para>
<para>If you have any questions on this just send an email to the dev dist-list.</para>
</section>
<section xml:id="submitting.patches">
<title>Submitting Patches</title>
<section xml:id="submitting.patches.create">
<title>Create Patch</title>
<para>Patch files can be easily generated from Eclipse, for example by selecting "Team -&gt; Create Patch".
</para>
<para>Please submit one patch file per Jira ticket. If multiple files are changed, make sure the
selected resource when generating the patch is a directory; a single patch file can reflect changes in multiple files. </para>
<para>Make sure you review <xref linkend="eclipse.code.formatting"/> for code style. </para>
</section>
<section xml:id="submitting.patches.naming">
<title>Patch File Naming</title>
@ -162,12 +170,14 @@ mvn test -Dtest=TestXYZ
<section xml:id="submitting.patches.tests">
<title>Unit Tests</title>
<para>Yes, please. Try to include unit tests with every code patch (and especially new classes and large changes).</para>
<para>Also, please make sure unit tests pass locally before submitting the patch.</para>
</section>
<section xml:id="submitting.patches.jira">
<title>Attach Patch to Jira</title>
<para>The patch should be attached to the associated Jira ticket via "More Actions -&gt; Attach Files". Make sure you select the
ASF license inclusion option, otherwise the patch can't be considered for inclusion.
</para>
<para>Once attached to the ticket, click "Submit Patch" and
the status of the ticket will change. Committers will review submitted patches for inclusion into the codebase. Please
understand that not every patch may get committed, and that feedback will likely be provided on the patch. Fear not, though,
because the HBase community is helpful!
@ -177,7 +187,7 @@ mvn test -Dtest=TestXYZ
<section xml:id="committing.patches">
<title>Committing Patches</title>
<para>
Committers do this. See <link xlink:href="http://wiki.apache.org/hadoop/Hbase/HowToCommit">How To Commit</link> in the HBase wiki.
</para>
</section>


@ -232,10 +232,13 @@ Deferred log flush can be configured on tables via <link
</section>
<section xml:id="perf.hbase.write.mr.reducer">
<title>MapReduce: Skip The Reducer</title>
<para>When writing a lot of data to an HBase table from a MR job (e.g., with <link
xlink:href="http://hbase.apache.org/apidocs/org/apache/hadoop/hbase/mapreduce/TableOutputFormat.html">TableOutputFormat</link>), and specifically where Puts are being emitted
from the Mapper, skip the Reducer step. When a Reducer step is used, all of the output (Puts) from the Mapper will get spooled to disk, then sorted/shuffled to other
Reducers that will most likely be off-node. It's far more efficient to just write directly to HBase.
</para>
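A hedged sketch of such a map-only job (the table name "myTable" and the <code>MyMapper</code> class, which is assumed to emit Puts, are hypothetical; input format and paths are omitted):

```java
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.mapreduce.TableMapReduceUtil;
import org.apache.hadoop.mapreduce.Job;

public class MapOnlyLoad {
  public static void main(String[] args) throws Exception {
    Configuration conf = HBaseConfiguration.create();
    Job job = new Job(conf, "load-myTable");
    job.setJarByClass(MyMapper.class);   // hypothetical Mapper that emits Puts
    job.setMapperClass(MyMapper.class);
    // input format and paths would be configured here
    // Wires up TableOutputFormat for table "myTable"; a null reducer class
    // means no Reducer logic is added to the job.
    TableMapReduceUtil.initTableReducerJob("myTable", null, job);
    job.setNumReduceTasks(0);   // map-only: Puts go straight from the Mapper to HBase
    System.exit(job.waitForCompletion(true) ? 0 : 1);
  }
}
```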
<para>For summary jobs where HBase is used as a source and a sink, writes will be coming from the Reducer step (e.g., summarize values then write out the result).
This is a different processing problem than the above case.
</para>
</section>