HBASE-3563 [site] Add one-page-only version of hbase doc

git-svn-id: https://svn.apache.org/repos/asf/hbase/trunk@1074349 13f79535-47bb-0310-9956-ffa450edef68
2011-02-24 23:12:21 +00:00 · 2011-02-24 23:12:21 +00:00 · fb502bee15
parent d3ec71e317
commit fb502bee15
3 changed files with 97 additions and 31 deletions
--- a/CHANGES.txt
+++ b/CHANGES.txt
@ -78,6 +78,7 @@ Release 0.91.0 - Unreleased
               coprocesssors -- instead version each individually
   HBASE-3520  Update our bundled hadoop from branch-0.20-append to latest
               (rpc version 43)
+   HBASE-3563  [site] Add one-page-only version of hbase doc


  NEW FEATURES
--- a/pom.xml
+++ b/pom.xml
@ -287,10 +287,38 @@
        <version>2.0.11</version>
        <executions>
          <execution>
+              <id>multipage</id>
+              <phase>pre-site</phase>
+        <configuration>
+          <xincludeSupported>true</xincludeSupported>
+          <navigShowtitles>true</navigShowtitles>
+          <chunkedOutput>true</chunkedOutput>
+          <useIdAsFilename>true</useIdAsFilename>
+          <sectionAutolabelMaxDepth>100</sectionAutolabelMaxDepth>
+          <sectionAutolabel>true</sectionAutolabel>
+          <sectionLabelIncludesComponentLabel>true</sectionLabelIncludesComponentLabel>
+          <targetDirectory>${basedir}/target/site/book/</targetDirectory>
+          <htmlStylesheet>../css/freebsd_docbook.css</htmlStylesheet>
+        </configuration>
            <goals>
              <goal>generate-html</goal>
            </goals>
+          </execution>
+          <execution>
+              <id>onepage</id>
            <phase>pre-site</phase>
+        <configuration>
+          <xincludeSupported>true</xincludeSupported>
+          <useIdAsFilename>true</useIdAsFilename>
+          <sectionAutolabelMaxDepth>100</sectionAutolabelMaxDepth>
+          <sectionAutolabel>true</sectionAutolabel>
+          <sectionLabelIncludesComponentLabel>true</sectionLabelIncludesComponentLabel>
+          <targetDirectory>${basedir}/target/site/</targetDirectory>
+          <htmlStylesheet>css/freebsd_docbook.css</htmlStylesheet>
+        </configuration>
+            <goals>
+              <goal>generate-html</goal>
+            </goals>
          </execution>
        </executions>
        <dependencies>
@ -301,16 +329,6 @@
            <scope>runtime</scope>
          </dependency>
        </dependencies>
-        <configuration>
-          <xincludeSupported>true</xincludeSupported>
-          <chunkedOutput>true</chunkedOutput>
-          <useIdAsFilename>true</useIdAsFilename>
-          <sectionAutolabelMaxDepth>100</sectionAutolabelMaxDepth>
-          <sectionAutolabel>true</sectionAutolabel>
-          <sectionLabelIncludesComponentLabel>true</sectionLabelIncludesComponentLabel>
-          <targetDirectory>${basedir}/target/site/</targetDirectory>
-          <htmlStylesheet>css/freebsd_docbook.css</htmlStylesheet>
-        </configuration>
      </plugin>
      <plugin>
        <artifactId>maven-assembly-plugin</artifactId>
--- a/src/docbkx/book.xml
+++ b/src/docbkx/book.xml
@ -251,7 +251,7 @@ hbase(main):013:0&gt; drop 'test'
            <para><programlisting>hbase(main):014:0&gt; exit</programlisting></para>
            </section>

-          <section>
+          <section xml:id="stopping">
          <title>Stopping HBase</title>
            <para>Stop your hbase instance by running the stop script.</para>

@ -456,7 +456,7 @@ guide.

      </section>

-      <section><title>HBase run modes: Standalone and Distributed</title>
+      <section xml:id="standalone_dist"><title>HBase run modes: Standalone and Distributed</title>
          <para>HBase has two run modes: <link linkend="standalone">standalone</link>
              and <link linkend="distributed">distributed</link>.
              Out of the box, HBase runs in standalone mode.  To set up a
@ -479,7 +479,7 @@ Set <varname>JAVA_HOME</varname> to point at the root of your
        talk to HBase.
      </para>
      </section>
-      <section><title>Distributed</title>
+      <section xml:id="distributed"><title>Distributed</title>
          <para>Distributed mode can be subdivided into distributed but all daemons run on a
          single node -- a.k.a <emphasis>pseudo-distributed</emphasis>-- and
          <emphasis>fully-distributed</emphasis> where the daemons 
@ -595,7 +595,7 @@ make the following configuration.</para>
 &lt;/configuration&gt;
 </programlisting>

-<section><title><filename>regionservers</filename></title>
+<section xml:id="regionserver"><title><filename>regionservers</filename></title>
 <para>In addition, a fully-distributed mode requires that you
 modify <filename>conf/regionservers</filename>.
 The <filename><link linkend="regionservrers">regionservers</link></filename> file lists all hosts
@ -820,7 +820,7 @@ before stopping the Hadoop daemons.</para>



-    <section><title>Example Configurations</title>
+    <section xml:id="example_config"><title>Example Configurations</title>
    <section><title>Basic Distributed HBase Install</title>
    <para>Here is an example basic configuration for a distributed ten node cluster.
    The nodes are named <varname>example0</varname>, <varname>example1</varname>, etc., through
@ -1002,7 +1002,7 @@ to ensure well-formedness of your document after an edit session.
    Use <command>rsync</command>.</para>


-    <section>
+    <section xml:id="hbase.site">
    <title><filename>hbase-site.xml</filename> and <filename>hbase-default.xml</filename></title>
    <para>Just as in Hadoop where you add site-specific HDFS configuration
    to the <filename>hdfs-site.xml</filename> file,
@ -1032,7 +1032,7 @@ to ensure well-formedness of your document after an edit session.
      href="../../target/site/hbase-default.xml" />
    </section>

-      <section>
+      <section xml:id="hbase.env.sh">
      <title><filename>hbase-env.sh</filename></title>
      <para>Set HBase environment variables in this file.
      Examples include options to pass the JVM on start of
@ -1280,7 +1280,7 @@ of all regions.
            <para>See <link linkend="shell_exercises">Shell Exercises</link>
            for example basic shell operation.</para>

-    <section><title>Scripting</title>
+    <section xml:id="scripting"><title>Scripting</title>
        <para>For examples scripting HBase, look in the
            HBase <filename>bin</filename> directory.  Look at the files
            that end in <filename>*.rb</filename>.  To run one of these
@ -1349,14 +1349,31 @@ of all regions.

  <chapter xml:id="schema">
  <title>HBase and Schema Design</title>
+  <section xml:id="number.of.cfs">
+  <title>
+      On the number of column families
+  </title>
+  <para>
+      HBase currently does not do well with anything above two or three column families, so keep the number
+      of column families in your schema low.  Currently, flushing and compactions are done on a per-Region basis, so
+      if one column family is carrying the bulk of the data and bringing on flushes, the adjacent families
+      will also be flushed even though the amount of data they carry is small.  Compaction is currently triggered
+      by the total number of files under a column family; it is not size-based.  With many column families, the
+      flushing and compaction interaction can make for a lot of needless I/O (to be addressed by
+      changing flushing and compaction to work on a per-column-family basis).
+    </para>
+    <para>Try to make do with one column family if you can in your schemas.  Only introduce a
+        second and third column family in the case where data access is usually column scoped;
+        i.e. you query one column family or the other but usually not both at the same time.
+    </para>
+  </section>
  <section>
  <title>
  Monotonically Increasing Row Keys/Timeseries Data
  </title>
  <para>See this comic by IKai Lan on why monotically increasing row keys are
  problematic in BigTable-like datastores:
-  <link xlink:href="http://ikaisays.com/2011/01/25/app-engine-datastore-tip-monotonically-increasing-values-are-bad/">monotonically increasing values are bad</link>
-  in BigTable-like stores.</para>
+  <link xlink:href="http://ikaisays.com/2011/01/25/app-engine-datastore-tip-monotonically-increasing-values-are-bad/">monotonically increasing values are bad</link>.</para>
  <para>If you need to upload time series data into HBase, you should
  study <link xlink:href="http://opentsdb.net/">OpenTSDB</link> as a
  successful example.  It has a page describing the schema it uses in
@ -1434,7 +1451,7 @@ of all regions.

      <para></para>
    </section>
-    <section>
+    <section xml:id="cells">
      <title>Cells<indexterm><primary>Cells</primary></indexterm></title>
      <para>A <emphasis>{row, column, version} </emphasis>tuple exactly
      specifies a <literal>cell</literal> in HBase. 
@ -1493,7 +1510,7 @@ of all regions.
          basically a synopsis of this article by Bruno Dumon.</para>
        </footnote>.</para>

-      <section>
+      <section xml:id="versions">
        <title>Versions and HBase Operations</title>

        <para>In this section we look at the behavior of the version dimension
@ -1635,15 +1652,15 @@ of all regions.

  <chapter xml:id="architecture">
    <title>Architecture</title>
-    <section>
+    <section xml:id="daemons">
     <title>Daemons</title>
-     <section><title>Master</title>
+     <section xml:id="master"><title>Master</title>
     </section>
-     <section><title>RegionServer</title>
+     <section xml:id="regionserver.arch"><title>RegionServer</title>
     </section>
    </section>

-    <section>
+    <section xml:id="regions.arch">
    <title>Regions</title>
    <para>This chapter is all about Regions.</para>
    <note>
@ -1758,7 +1775,7 @@ of all regions.
    <para>Each RegionServer adds updates to its Write-ahead Log (WAL)
    first, and then to memory.</para>

-    <section>
+    <section xml:id="purpose.wal">
      <title>What is the purpose of the HBase WAL</title>

      <para>
@ -1812,6 +1829,36 @@ of all regions.
    </section>

  </chapter>
+  <chapter xml:id="performance">
+    <title>Performance Tuning</title>
+    <para>Start with the <link xlink:href="http://wiki.apache.org/hadoop/PerformanceTuning">wiki Performance Tuning</link> page.
+        It has a general discussion of the main factors involved: RAM, compression, JVM settings, etc.
+        Afterward, come back here for more pointers.
+    </para>
+    <section xml:id="jvm">
+        <title>Java</title>
+    <section xml:id="gc">
+        <title>The Garbage Collector and HBase</title>
+        <section xml:id="gcpause">
+            <title>Long GC pauses</title>
+        <para>
+            In his presentation,
+            <link xlink:href="http://www.slideshare.net/cloudera/hbase-hug-presentation">Avoiding Full GCs with MemStore-Local Allocation Buffers</link>,
+            Todd Lipcon describes two cases of stop-the-world garbage collections common in HBase, especially during loading:
+            CMS failure modes and old generation heap fragmentation.  To address the first,
+            start the CMS earlier than default by adding <code>-XX:CMSInitiatingOccupancyFraction</code>
+            and setting it down from the default.  Start at 60 or 70 percent (the lower you bring down
+            the threshold, the more GCing is done and the more CPU is used).  To address the second,
+            the fragmentation issue, Todd added an experimental facility that must be
+            explicitly enabled in HBase 0.90.x (it is on by default in HBase 0.92.x).  Set
+            <code>hbase.hregion.memstore.mslab.enabled</code> to true in your
+            <classname>Configuration</classname>.  See the cited slides for background and
+            detail.
+        </para>
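+        <para>
+            As a minimal sketch of the second setting, and assuming your
+            <classname>Configuration</classname> is loaded from
+            <filename>conf/hbase-site.xml</filename> (an assumption; adapt this if you
+            build the configuration in code), the property could be set like this on 0.90.x:
+            <programlisting>
+&lt;!-- Illustrative fragment only: enables the experimental MSLAB facility. --&gt;
+&lt;property&gt;
+  &lt;name&gt;hbase.hregion.memstore.mslab.enabled&lt;/name&gt;
+  &lt;value&gt;true&lt;/value&gt;
+&lt;/property&gt;
+            </programlisting>
+            The CMS threshold is a JVM flag rather than an HBase property, so it would be
+            passed on the JVM command line instead, for example as
+            <code>-XX:CMSInitiatingOccupancyFraction=70</code>, where 70 is only a starting
+            point from the range suggested above and not a tuned value.
+        </para>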
+      </section>
+    </section>
+    </section>
+  </chapter>

  <chapter xml:id="blooms">
    <title>Bloom Filters</title>
@ -1839,7 +1886,7 @@ of all regions.
        work.</para>
      </footnote></para>

-    <section>
+    <section xml:id="bloom.config">
      <title>Configurations</title>

      <para>Blooms are enabled by specifying options on a column family in the
@ -1975,7 +2022,7 @@ of all regions.
        doing:<programlisting> $ ./<code>bin/hbase org.apache.hadoop.hbase.regionserver.wal.HLog --split hdfs://example.org:9000/hbase/.logs/example.org,60020,1283516293161/</code></programlisting></para>
      </section>
    </section>
-    <section><title>Compression Tool</title>
+    <section xml:id="compression.tool"><title>Compression Tool</title>
        <para>See <link linkend="compression.tool" >Compression Tool</link>.</para>
    </section>
  </appendix>
@ -1984,7 +2031,7 @@ of all regions.

    <title >Compression In HBase<indexterm><primary>Compression</primary></indexterm></title>

-    <section id="compression.test">
+    <section xml:id="compression.test">
    <title>CompressionTest Tool</title>
    <para>
    HBase includes a tool to test compression is set up properly.
@ -1993,7 +2040,7 @@ of all regions.
    </para>
    </section>

-    <section id="hbase.regionserver.codecs">
+    <section xml:id="hbase.regionserver.codecs">
    <title>
    <varname>
    hbase.regionserver.codecs