HBASE-11399 Improve Quickstart chapter and move Pseudo-distributed and distributed into it (Misty Stanley-Jones)

2014-07-02 11:24:30 -07:00 · 2014-07-02 11:24:30 -07:00 · 15831cefd5
parent 20cac213af
commit 15831cefd5
2 changed files with 1007 additions and 524 deletions
--- a/src/main/docbkx/configuration.xml
+++ b/src/main/docbkx/configuration.xml
@@ -29,228 +29,319 @@
 */
 -->
  <title>Apache HBase Configuration</title>
-  <para>This chapter is the Not-So-Quick start guide to Apache HBase configuration. It goes over
+  <para>This chapter expands upon the <xref linkend="getting_started" /> chapter to further explain
-    system requirements, Hadoop setup, the different Apache HBase run modes, and the various
+    the configuration of Apache HBase. Please read this chapter carefully, especially <xref
-    configurations in HBase. Please read this chapter carefully. At a minimum ensure that all <xref
+      linkend="basic.prerequisites" /> to ensure that your HBase testing and deployment goes
-      linkend="basic.prerequisites" /> have been satisfied. Failure to do so will cause you (and us)
+    smoothly and to prevent data loss.</para>
-    grief debugging strange errors and/or data loss.</para>
-  <para> Apache HBase uses the same configuration system as Apache Hadoop. To configure a deploy,
+  <para> Apache HBase uses the same configuration system as Apache Hadoop. All configuration files
-    edit a file of environment variables in <filename>conf/hbase-env.sh</filename> -- this
+    are located in the <filename>conf/</filename> directory, which needs to be kept in sync for each
-    configuration is used mostly by the launcher shell scripts getting the cluster off the ground --
+    node on your cluster.</para>
-    and then add configuration to an XML file to do things like override HBase defaults, tell HBase
+  
-    what Filesystem to use, and the location of the ZooKeeper ensemble. <footnote>
+  <variablelist>
-      <para> Be careful editing XML. Make sure you close all elements. Run your file through
+    <title>HBase Configuration Files</title>
-          <command>xmllint</command> or similar to ensure well-formedness of your document after an
+    <varlistentry>
-        edit session. </para>
+      <term><filename>backup-masters</filename></term>
-    </footnote></para>
+      <listitem>
        <para>Not present by default. A plain-text file which lists hosts on which the Master should
          start a backup Master process, one host per line.</para>
      </listitem>
    </varlistentry>
    <varlistentry>
      <term><filename>hadoop-metrics2-hbase.properties</filename></term>
      <listitem>
        <para>Used to connect HBase to Hadoop's Metrics2 framework. See the <link
            xlink:href="http://wiki.apache.org/hadoop/HADOOP-6728-MetricsV2">Hadoop Wiki
            entry</link> for more information on Metrics2. Contains only commented-out examples by
          default.</para>
      </listitem>
    </varlistentry>
    <varlistentry>
      <term><filename>hbase-env.cmd</filename> and <filename>hbase-env.sh</filename></term>
      <listitem>
        <para>Script for Windows and Linux / Unix environments to set up the working environment for
        HBase, including the location of Java, Java options, and other environment variables. The
        file contains many commented-out examples to provide guidance.</para>
      </listitem>
    </varlistentry>
    <varlistentry>
      <term><filename>hbase-policy.xml</filename></term>
      <listitem>
        <para>The default policy configuration file used by RPC servers to make authorization
          decisions on client requests. Only used if HBase security (<xref
            linkend="security" />) is enabled.</para>
      </listitem>
    </varlistentry>
    <varlistentry>
      <term><filename>hbase-site.xml</filename></term>
      <listitem>
        <para>The main HBase configuration file. This file specifies configuration options which
          override HBase's default configuration. You can view (but do not edit) the default
          configuration file at <filename>docs/hbase-default.xml</filename>. You can also view the
          entire effective configuration for your cluster (defaults and overrides) in the
            <guilabel>HBase Configuration</guilabel> tab of the HBase Web UI.</para>
      </listitem>
    </varlistentry>
    <varlistentry>
      <term><filename>log4j.properties</filename></term>
      <listitem>
        <para>Configuration file for HBase logging via <code>log4j</code>.</para>
      </listitem>
    </varlistentry>
    <varlistentry>
      <term><filename>regionservers</filename></term>
      <listitem>
        <para>A plain-text file containing a list of hosts which should run a RegionServer in your
          HBase cluster. By default this file contains the single entry
          <literal>localhost</literal>. It should contain a list of hostnames or IP addresses, one
          per line, and should only contain <literal>localhost</literal> if each node in your
          cluster will run a RegionServer on its <literal>localhost</literal> interface.</para>
      </listitem>
    </varlistentry>
  </variablelist>
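+  <para>The following is a minimal sketch of an <filename>hbase-site.xml</filename> that
+    overrides two defaults: where HBase stores its data, and which ZooKeeper ensemble it uses.
+    The property names <varname>hbase.rootdir</varname> and
+    <varname>hbase.zookeeper.quorum</varname> are standard HBase properties, but the hostnames,
+    port, and path are hypothetical placeholders; replace them with values for your own cluster.
+    This fragment is not a complete configuration for any particular run mode. Properties that
+    are not listed keep their default values.</para>
+  <programlisting><![CDATA[
+<configuration>
+  <!-- Hypothetical HDFS location for HBase data: replace host, port, and path. -->
+  <property>
+    <name>hbase.rootdir</name>
+    <value>hdfs://namenode.example.com:8020/hbase</value>
+  </property>
+  <!-- Hypothetical ZooKeeper ensemble: a comma-separated list of hosts. -->
+  <property>
+    <name>hbase.zookeeper.quorum</name>
+    <value>zk1.example.com,zk2.example.com,zk3.example.com</value>
+  </property>
+</configuration>
+]]></programlisting>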
  <tip>
    <title>Checking XML Validity</title>
    <para>When you edit XML, it is a good idea to use an XML-aware editor to be sure that your
      syntax is correct and your XML is well-formed. You can also use the <command>xmllint</command>
      utility to check that your XML is well-formed. By default, <command>xmllint</command> re-flows
      and prints the XML to standard output. To check for well-formedness and only print output if
      errors exist, use the command <command>xmllint --noout
        <replaceable>filename.xml</replaceable></command>.</para>
  </tip>
-  <para>When running in distributed mode, after you make an edit to an HBase configuration, make
+  <warning>
-    sure you copy the content of the <filename>conf</filename> directory to all nodes of the
+    <title>Keep Configuration In Sync Across the Cluster</title>
-    cluster. HBase will not do this for you. Use <command>rsync</command>. For most configuration, a
+    <para>When running in distributed mode, after you make an edit to an HBase configuration, make
-    restart is needed for servers to pick up changes (caveat dynamic config. to be described later
+      sure you copy the content of the <filename>conf/</filename> directory to all nodes of the
-    below).</para>
+      cluster. HBase will not do this for you. Use <command>rsync</command>, <command>scp</command>,
      or another secure mechanism for copying the configuration files to your nodes. For most
      configuration, a restart is needed for servers to pick up changes. An exception is dynamic
      configuration, which will be described later.</para>
  </warning>
  <section
    xml:id="basic.prerequisites">
    <title>Basic Prerequisites</title>
    <para>This section lists required services and some required system configuration. </para>
-    <section
+    <table
      xml:id="java">
      <title>Java</title>
-      <para>HBase requires at least Java 6 from <link
+      <textobject>
-          xlink:href="http://www.java.com/download/">Oracle</link>. The following table lists which JDK version are
+        <para>HBase requires at least Java 6 from <link
-        compatible with each version of HBase.</para>
+            xlink:href="http://www.java.com/download/">Oracle</link>. The following table lists
-      <informaltable>
+          which JDK versions are compatible with each version of HBase.</para>
-        <tgroup cols="4">
+      </textobject>
-          <thead>
+      <tgroup
-            <row>
+        cols="4">
-              <entry>HBase Version</entry>
+        <thead>
-              <entry>JDK 6</entry>
+          <row>
-              <entry>JDK 7</entry>
+            <entry>HBase Version</entry>
-              <entry>JDK 8</entry>
+            <entry>JDK 6</entry>
-            </row>
+            <entry>JDK 7</entry>
-          </thead>
+            <entry>JDK 8</entry>
-          <tbody>
+          </row>
-            <row>
+        </thead>
-              <entry>1.0</entry>
+        <tbody>
-              <entry><link xlink:href="http://search-hadoop.com/m/DHED4Zlz0R1">Not Supported</link></entry>
+          <row>
-              <entry>yes</entry>
+            <entry>1.0</entry>
-              <entry><para>Running with JDK 8 will work but is not well tested.</para></entry>
+            <entry><link
-            </row>
+                xlink:href="http://search-hadoop.com/m/DHED4Zlz0R1">Not Supported</link></entry>
-            <row>
+            <entry>yes</entry>
-              <entry>0.98</entry>
+            <entry><para>Running with JDK 8 will work but is not well tested.</para></entry>
-              <entry>yes</entry>
+          </row>
-              <entry>yes</entry>
+          <row>
-              <entry><para>Running with JDK 8 works but is not well tested. Building with JDK 8
+            <entry>0.98</entry>
-                would require removal of the deprecated remove() method of the PoolMap class and is
+            <entry>yes</entry>
-                under consideration. See ee <link
+            <entry>yes</entry>
-                xlink:href="https://issues.apache.org/jira/browse/HBASE-7608">HBASE-7608</link> for
+            <entry><para>Running with JDK 8 works but is not well tested. Building with JDK 8 would
-                more information about JDK 8 support.</para></entry>
+                require removal of the deprecated remove() method of the PoolMap class and is under
-            </row>
+                consideration. See <link
-            <row>
+                  xlink:href="https://issues.apache.org/jira/browse/HBASE-7608">HBASE-7608</link>
-              <entry>0.96</entry>
+                for more information about JDK 8 support.</para></entry>
-              <entry>yes</entry>
+          </row>
-              <entry>yes</entry>
+          <row>
-              <entry></entry>
+            <entry>0.96</entry>
-            </row>
+            <entry>yes</entry>
-            <row>
+            <entry>yes</entry>
-              <entry>0.94</entry>
+            <entry />
-              <entry>yes</entry>
+          </row>
-              <entry>yes</entry>
+          <row>
-              <entry></entry>
+            <entry>0.94</entry>
-            </row>
+            <entry>yes</entry>
-          </tbody>
+            <entry>yes</entry>
-        </tgroup>
+            <entry />
-      </informaltable>
+          </row>
-    </section>
+        </tbody>
      </tgroup>
    </table>
-    <section
+    <variablelist
      xml:id="os">
-      <title>Operating System</title>
+      <title>Operating System Utilities</title>
-      <section
+      <varlistentry
        xml:id="ssh">
-        <title>ssh</title>
+        <term>ssh</term>
-
+        <listitem>
-        <para><command>ssh</command> must be installed and <command>sshd</command> must be running
+          <para>HBase uses the Secure Shell (ssh) command and utilities extensively to communicate
-          to use Hadoop's scripts to manage remote Hadoop and HBase daemons. You must be able to ssh
+            between cluster nodes. Each server in the cluster must be running <command>sshd</command>
-          to all nodes, including your local node, using passwordless login (Google "ssh
+            so that the Hadoop and HBase daemons can be managed. You must be able to connect to all
-          passwordless login"). If on mac osx, see the section, <link
+            nodes via SSH, including the local node, from the Master as well as any backup Master,
-            xlink:href="http://wiki.apache.org/hadoop/Running_Hadoop_On_OS_X_10.5_64-bit_%28Single-Node_Cluster%29">SSH:
+            using a shared key rather than a password. You can see the basic methodology for such a
-            Setting up Remote Desktop and Enabling Self-Login</link> on the hadoop wiki.</para>
+            setup on Linux or Unix systems at <xref
-      </section>
+              linkend="passwordless.ssh.quickstart" />. If your cluster nodes use OS X, see the
-
+            section, <link
-      <section
+              xlink:href="http://wiki.apache.org/hadoop/Running_Hadoop_On_OS_X_10.5_64-bit_%28Single-Node_Cluster%29">SSH:
              Setting up Remote Desktop and Enabling Self-Login</link> on the Hadoop wiki.</para>
        </listitem>
      </varlistentry>
      <varlistentry
        xml:id="dns">
-        <title>DNS</title>
+        <term>DNS</term>
        <listitem>
          <para>HBase uses the local hostname to self-report its IP address. Both forward and
            reverse DNS resolving must work in versions of HBase previous to 0.92.0.<footnote>
              <para>The <link
                  xlink:href="https://github.com/sujee/hadoop-dns-checker">hadoop-dns-checker</link>
                tool can be used to verify DNS is working correctly on the cluster. The project
                README file provides detailed instructions on usage. </para>
            </footnote></para>
-        <para>HBase uses the local hostname to self-report its IP address. Both forward and reverse
+          <para>If your server has multiple network interfaces, HBase defaults to using the
-          DNS resolving must work in versions of HBase previous to 0.92.0 <footnote>
+            interface that the primary hostname resolves to. To override this behavior, set the
-            <para>The <link
+              <code>hbase.regionserver.dns.interface</code> property to a different interface. This
-                xlink:href="https://github.com/sujee/hadoop-dns-checker">hadoop-dns-checker</link>
+            will only work if each server in your cluster uses the same network interface
-              tool can be used to verify DNS is working correctly on the cluster. The project README
+            configuration.</para>
-              file provides detailed instructions on usage. </para>
-          </footnote>.</para>
-        <para>If your machine has multiple interfaces, HBase will use the interface that the primary
+          <para>To choose a different DNS nameserver than the system default, set the
-          hostname resolves to.</para>
+              <varname>hbase.regionserver.dns.nameserver</varname> property to the IP address of
-
+            that nameserver.</para>
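+          <para>A minimal <filename>hbase-site.xml</filename> sketch that sets both properties is
+            shown below. The interface name <literal>eth1</literal> and the nameserver address
+            <literal>192.0.2.53</literal> are hypothetical placeholders; substitute the interface
+            and nameserver actually used on your servers, and remember that the interface setting
+            only works if every server has the same network interface configuration.</para>
+          <programlisting><![CDATA[
+<configuration>
+  <!-- Hypothetical: report the IP address of the eth1 interface instead of the default. -->
+  <property>
+    <name>hbase.regionserver.dns.interface</name>
+    <value>eth1</value>
+  </property>
+  <!-- Hypothetical: use this nameserver instead of the system default. -->
+  <property>
+    <name>hbase.regionserver.dns.nameserver</name>
+    <value>192.0.2.53</value>
+  </property>
+</configuration>
+]]></programlisting>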
-        <para>If this is insufficient, you can set
+        </listitem>
-            <varname>hbase.regionserver.dns.interface</varname> to indicate the primary interface.
+      </varlistentry>
-          This only works if your cluster configuration is consistent and every host has the same
+      <varlistentry
-          network interface configuration.</para>
-        <para>Another alternative is setting <varname>hbase.regionserver.dns.nameserver</varname> to
-          choose a different nameserver than the system wide default.</para>
-      </section>
-      <section
        xml:id="loopback.ip">
-        <title>Loopback IP</title>
+        <term>Loopback IP</term>
-        <para>Previous to hbase-0.96.0, HBase expects the loopback IP address to be 127.0.0.1. See <xref
+        <listitem>
-            linkend="loopback.ip" /></para>
+          <para>Prior to hbase-0.96.0, HBase only used the IP address
-      </section>
+              <systemitem>127.0.0.1</systemitem> to refer to <code>localhost</code>, and this could
-
+            not be configured. See <xref
-      <section
+              linkend="loopback.ip" />.</para>
        </listitem>
      </varlistentry>
      <varlistentry
        xml:id="ntp">
-        <title>NTP</title>
+        <term>NTP</term>
        <listitem>
          <para>The clocks on cluster nodes should be synchronized. A small amount of variation is
            acceptable, but larger amounts of skew can cause erratic and unexpected behavior. Time
            synchronization is one of the first things to check if you see unexplained problems in
            your cluster. It is recommended that you run a Network Time Protocol (NTP) service, or
            another time-synchronization mechanism, on your cluster, and that all nodes look to the
            same service for time synchronization. See the <link
              xlink:href="http://www.tldp.org/LDP/sag/html/basic-ntp-config.html">Basic NTP
              Configuration</link> at <citetitle>The Linux Documentation Project (TLDP)</citetitle>
            to set up NTP.</para>
        </listitem>
      </varlistentry>
-        <para>The clocks on cluster members should be in basic alignments. Some skew is tolerable
+      <varlistentry
-          but wild skew could generate odd behaviors. Run <link
-            xlink:href="http://en.wikipedia.org/wiki/Network_Time_Protocol">NTP</link> on your
-          cluster, or an equivalent.</para>
-        <para>If you are having problems querying data, or "weird" cluster operations, check system
-          time!</para>
-      </section>
-      <section
        xml:id="ulimit">
-        <title>
+        <term>Limits on Number of Files and Processes (<command>ulimit</command>)
-          <varname>ulimit</varname><indexterm>
+          <indexterm>
            <primary>ulimit</primary>
-          </indexterm> and <varname>nproc</varname><indexterm>
+          </indexterm><indexterm>
            <primary>nproc</primary>
          </indexterm>
-        </title>
+        </term>
-        <para>Apache HBase is a database. It uses a lot of files all at the same time. The default
+        <listitem>
-          ulimit -n -- i.e. user file limit -- of 1024 on most *nix systems is insufficient (On mac
+          <para>Apache HBase is a database. It requires the ability to open a large number of files
-          os x its 256). Any significant amount of loading will lead you to <xref
+            at once. Many Linux distributions limit the number of files a single user is allowed to
-            linkend="trouble.rs.runtime.filehandles" />. You may also notice errors such as the
+            open to <literal>1024</literal> (or <literal>256</literal> on older versions of OS X).
-          following:</para>
+            You can check this limit on your servers by running the command <command>ulimit
-        <screen>
+              -n</command> when logged in as the user which runs HBase. See <xref
              linkend="trouble.rs.runtime.filehandles" /> for some of the problems you may
            experience if the limit is too low. You may also notice errors such as the
            following:</para>
          <screen>
 2010-04-06 03:04:37,542 INFO org.apache.hadoop.hdfs.DFSClient: Exception increateBlockOutputStream java.io.EOFException
 2010-04-06 03:04:37,542 INFO org.apache.hadoop.hdfs.DFSClient: Abandoning block blk_-6935524980745310745_1391901
-        </screen>
+          </screen>
-        <para> Do yourself a favor and change the upper bound on the number of file descriptors. Set
+          <para>It is recommended to raise the ulimit to at least 10,000, but more likely 10,240,
-          it to north of 10k. The math runs roughly as follows: per ColumnFamily there is at least
+            because the value is usually expressed in multiples of 1024. Each ColumnFamily has at
-          one StoreFile and possibly up to 5 or 6 if the region is under load. Multiply the average
+            least one StoreFile, and possibly more than 6 StoreFiles if the region is under load.
-          number of StoreFiles per ColumnFamily times the number of regions per RegionServer. For
+            The number of open files required depends upon the number of ColumnFamilies and the
-          example, assuming that a schema had 3 ColumnFamilies per region with an average of 3
+            number of regions. The following is a rough formula for calculating the potential number
-          StoreFiles per ColumnFamily, and there are 100 regions per RegionServer, the JVM will open
+            of open files on a RegionServer. </para>
-          3 * 3 * 100 = 900 file descriptors (not counting open jar files, config files, etc.) </para>
+          <example>
-        <para>You should also up the hbase users' <varname>nproc</varname> setting; under load, a
+            <title>Calculate the Potential Number of Open Files</title>
-          low-nproc setting could manifest as <classname>OutOfMemoryError</classname>. <footnote>
+            <screen>(StoreFiles per ColumnFamily) x (ColumnFamilies per region) x (regions per RegionServer)</screen>
-            <para>See Jack Levin's <link
+          </example>
-                xlink:href="">major hdfs issues</link> note up on the user list.</para>
+          <para>For example, assuming that a schema has 3 ColumnFamilies per region with an average
-          </footnote>
+            of 3 StoreFiles per ColumnFamily, and there are 100 regions per RegionServer, the JVM
-          <footnote>
+            will open 3 * 3 * 100 = 900 file descriptors, not counting open JAR files, configuration
-            <para>The requirement that a database requires upping of system limits is not peculiar
+            files, and others. Opening a file does not take many resources, and the risk of allowing
-              to Apache HBase. See for example the section <emphasis>Setting Shell Limits for the
+            a user to open too many files is minimal.</para>
-                Oracle User</emphasis> in <link
+          <para>Another related setting is the number of processes a user is allowed to run at once.
-                xlink:href="http://www.akadia.com/services/ora_linux_install_10g.html"> Short Guide
+            In Linux and Unix, the number of processes is set using the <command>ulimit -u</command>
-                to install Oracle 10 on Linux</link>.</para>
+            command. This should not be confused with the <command>nproc</command> command, which
-          </footnote></para>
+            prints the number of processing units available to the current process. Under load, a
              process limit that is too low can cause <classname>OutOfMemoryError</classname> failures. See
            Jack Levin's <link
              xlink:href="http://thread.gmane.org/gmane.comp.java.hadoop.hbase.user/16374">major
              hdfs issues</link> thread on the hbase-users mailing list, from 2011.</para>
          <para>Configuring the maximum number of file descriptors and processes for the user who is
            running the HBase process is an operating system configuration, rather than an HBase
            configuration. It is also important to be sure that the settings are changed for the
            user that actually runs HBase. To see which user started HBase, and that user's ulimit
            configuration, look at the first line of the HBase log for that instance.<footnote>
              <para>A useful read on setting configuration on your Hadoop cluster is Aaron Kimball's <link
                  xlink:href="http://www.cloudera.com/blog/2009/03/configuration-parameters-what-can-you-just-ignore/">Configuration
                  Parameters: What can you just ignore?</link></para>
            </footnote></para>
          <formalpara xml:id="ulimit_ubuntu">
            <title><command>ulimit</command> Settings on Ubuntu</title>
            <para>To configure <command>ulimit</command> settings on Ubuntu, edit
                <filename>/etc/security/limits.conf</filename>, which is a space-delimited file with
              four columns. Refer to the <link
                xlink:href="http://manpages.ubuntu.com/manpages/lucid/man5/limits.conf.5.html">man
                page for limits.conf</link> for details about the format of this file. In the
              following example, the first line sets both soft and hard limits for the number of
              open files (<literal>nofile</literal>) to <literal>32768</literal> for the operating
              system user with the username <literal>hadoop</literal>. The second line sets the
              number of processes to 32000 for the same user.</para>
          </formalpara>
          <screen>
 hadoop  -       nofile  32768
 hadoop  -       nproc   32000
          </screen>
          <para>The settings are only applied if the Pluggable Authentication Module (PAM)
            environment is directed to use them. To configure PAM to use these limits, be sure that
            the <filename>/etc/pam.d/common-session</filename> file contains the following line:</para>
          <screen>session required  pam_limits.so</screen>
          <para>Log out and log back in for these changes to take effect.</para>
        </listitem>
      </varlistentry>
-        <para>To be clear, upping the file descriptors and nproc for the user who is running the
+      <varlistentry
-          HBase process is an operating system configuration, not an HBase configuration. Also, a
-          common mistake is that administrators will up the file descriptors for a particular user
-          but for whatever reason, HBase will be running as some one else. HBase prints in its logs
-          as the first line the ulimit its seeing. Ensure its correct. <footnote>
-            <para>A useful read setting config on you hadoop cluster is Aaron Kimballs' <link
-                xlink:href="http://www.cloudera.com/blog/2009/03/configuration-parameters-what-can-you-just-ignore/">Configuration
-                Parameters: What can you just ignore?</link></para>
-          </footnote></para>
-        <section
-          xml:id="ulimit_ubuntu">
-          <title><varname>ulimit</varname> on Ubuntu</title>
-          <para>If you are on Ubuntu you will need to make the following changes:</para>
-          <para>In the file <filename>/etc/security/limits.conf</filename> add a line like:</para>
-          <programlisting>hadoop  -       nofile  32768</programlisting>
-          <para>Replace <varname>hadoop</varname> with whatever user is running Hadoop and HBase. If
-            you have separate users, you will need 2 entries, one for each user. In the same file
-            set nproc hard and soft limits. For example:</para>
-          <programlisting>hadoop soft/hard nproc 32000</programlisting>
-          <para>In the file <filename>/etc/pam.d/common-session</filename> add as the last line in
-            the file: <programlisting>session required  pam_limits.so</programlisting> Otherwise the
-            changes in <filename>/etc/security/limits.conf</filename> won't be applied.</para>
-          <para>Don't forget to log out and back in again for the changes to take effect!</para>
-        </section>
-      </section>
-      <section
        xml:id="windows">
-        <title>Windows</title>
+        <term>Windows</term>
-        <para>Previous to hbase-0.96.0, Apache HBase was little tested running on Windows. Running a
+        <listitem>
-          production install of HBase on top of Windows is not recommended.</para>
+          <para>Prior to HBase 0.96, testing for running HBase on Microsoft Windows was limited.
          Running HBase on Windows nodes is not recommended for production systems.</para>
-        <para>If you are running HBase on Windows pre-hbase-0.96.0, you must install <link
+        <para>To run versions of HBase prior to 0.96 on Microsoft Windows, you must install <link
-            xlink:href="http://cygwin.com/">Cygwin</link> to have a *nix-like environment for the
+            xlink:href="http://cygwin.com/">Cygwin</link> and run HBase within the Cygwin
-          shell scripts. The full details are explained in the <link
+          environment. This provides support for Linux/Unix commands and scripts. The full details are explained in the <link
            xlink:href="http://hbase.apache.org/cygwin.html">Windows Installation</link> guide. Also <link
            xlink:href="http://search-hadoop.com/?q=hbase+windows&amp;fc_project=HBase&amp;fc_type=mail+_hash_+dev">search
            our user mailing list</link> to pick up the latest fixes found by Windows users.</para>
        <para>Beginning with HBase 0.96.0, HBase runs natively on Windows with supporting
-            <command>*.cmd</command> scripts bundled. </para>
+            <command>*.cmd</command> scripts bundled. </para></listitem>
-      </section>
+      </varlistentry>
-    </section>
+    </variablelist>
    <!--  OS -->
    <section
@@ -259,17 +350,18 @@
          xlink:href="http://hadoop.apache.org">Hadoop</link><indexterm>
          <primary>Hadoop</primary>
        </indexterm></title>
-      <para>The below table shows some information about what versions of Hadoop are supported by
+      <para>The following table summarizes the versions of Hadoop supported with each version of
-        various HBase versions. Based on the version of HBase, you should select the most
+        HBase. Based on the version of HBase, you should select the most
-        appropriate version of Hadoop. We are not in the Hadoop distro selection business. You can
+        appropriate version of Hadoop. You can use Apache Hadoop, or a vendor's distribution of
-        use Hadoop distributions from Apache, or learn about vendor distributions of Hadoop at <link
+        Hadoop. No distinction is made here. See <link
-          xlink:href="http://wiki.apache.org/hadoop/Distributions%20and%20Commercial%20Support" /></para>
+          xlink:href="http://wiki.apache.org/hadoop/Distributions%20and%20Commercial%20Support" />
        for information about vendors of Hadoop.</para>
      <tip>
-        <title>Hadoop 2.x is better than Hadoop 1.x</title>
+        <title>Hadoop 2.x Is Recommended</title>
-        <para>Hadoop 2.x is faster, with more features such as short-circuit reads which will help
+        <para>Hadoop 2.x is faster and includes features, such as short-circuit reads, which will
-          improve your HBase random read profile as well important bug fixes that will improve your
+          help improve your HBase random read profile. Hadoop 2.x also includes important bug fixes
-          overall HBase experience. You should run Hadoop 2 rather than Hadoop 1. HBase 0.98
+          that will improve your overall HBase experience. HBase 0.98 deprecates use of Hadoop 1.x,
-          deprecates use of Hadoop1. HBase 1.0 will not support Hadoop1. </para>
+          and HBase 1.0 will not support Hadoop 1.x.</para>
      </tip>
      <para>Use the following legend to interpret this table:</para>
      <simplelist
@@ -618,7 +710,9 @@ Index: pom.xml
        instance of the <emphasis>Hadoop Distributed File System</emphasis> (HDFS).
        Fully-distributed mode can ONLY run on HDFS. See the Hadoop <link
          xlink:href="http://hadoop.apache.org/common/docs/r1.1.1/api/overview-summary.html#overview_description">
-          requirements and instructions</link> for how to set up HDFS.</para>
+          requirements and instructions</link> for how to set up HDFS for Hadoop 1.x. A good
        walk-through for setting up HDFS on Hadoop 2 is at <link
          xlink:href="http://www.alexjf.net/blog/distributed-systems/hadoop-yarn-installation-definitive-guide">http://www.alexjf.net/blog/distributed-systems/hadoop-yarn-installation-definitive-guide</link>.</para>
      <para>Below we describe the different distributed setups. Starting, verification and
        exploration of your install, whether a <emphasis>pseudo-distributed</emphasis> or
@@ -628,207 +722,139 @@ Index: pom.xml
      <section
        xml:id="pseudo">
        <title>Pseudo-distributed</title>
        <note>
          <title>Pseudo-Distributed Quickstart</title>
          <para>A quickstart has been added to the <xref
              linkend="quickstart" /> chapter. See <xref
              linkend="quickstart-pseudo" />. Some of the information that was originally in this
            section has been moved there.</para>
        </note>
        <para>A pseudo-distributed mode is simply a fully-distributed mode run on a single host. Use
          this configuration for testing and prototyping on HBase. Do not use this configuration for
          production or for evaluating HBase performance.</para>
        <para>First, if you want to run on HDFS rather than on the local filesystem, set up your
          HDFS. You can set up HDFS also in pseudo-distributed mode (TODO: Add pointer to HOWTO doc;
          the Hadoop site no longer has one). Ensure you have a working HDFS before
          proceeding. </para>
        <para>Next, configure HBase. Edit <filename>conf/hbase-site.xml</filename>. This is the file
          into which you add local customizations and overrides. At a minimum, you must tell HBase
          to run in (pseudo-)distributed mode rather than in default standalone mode. To do this,
          set the <varname>hbase.cluster.distributed</varname> property to true (its default is
            <varname>false</varname>). The absolute bare-minimum <filename>hbase-site.xml</filename>
          is therefore as follows:</para>
        <programlisting><![CDATA[
 <configuration>
  <property>
    <name>hbase.cluster.distributed</name>
    <value>true</value>
  </property>
 </configuration>
 ]]>
        </programlisting>
        <para>With this configuration, HBase will start up an HBase Master process, a ZooKeeper
          server, and a RegionServer process running against the local filesystem, writing to a
          directory named <filename>hbase-YOUR_USER_NAME</filename> in your operating system's
          temporary directory.</para>
        <para>Such a setup, using the local filesystem and writing to the operating system's
          temporary directory, is ephemeral. In versions of HBase before 0.98.4 and 1.0.0, the
          Hadoop local filesystem, which is what HBase uses when it writes to the local filesystem,
          would lose data unless the system was shut down properly (see
          <link xlink:href="https://issues.apache.org/jira/browse/HBASE-11218">HBASE-11218 Data
          loss in HBase standalone mode</link>). Writing to the operating
          system's temporary directory can also lead to data loss when the machine is restarted, as
          this directory is usually cleared on reboot. For a more permanent setup, see the next
          example, where we use an instance of HDFS; HBase data will be written to the Hadoop
          distributed filesystem rather than to the local filesystem's tmp directory.</para>
        <para>In this <filename>conf/hbase-site.xml</filename> example, the
            <varname>hbase.rootdir</varname> property points to the local HDFS instance homed on the
          node <varname>h-24-30.example.com</varname>.</para>
        <note>
          <title>Let HBase create <filename>${hbase.rootdir}</filename></title>
          <para>Let HBase create the <varname>hbase.rootdir</varname> directory. If you don't,
            you'll get a warning saying HBase needs a migration run because the directory is missing
            files expected by HBase (it'll create them if you let it).</para>
        </note>
        <programlisting>
 &lt;configuration&gt;
  &lt;property&gt;
    &lt;name&gt;hbase.rootdir&lt;/name&gt;
    &lt;value&gt;hdfs://h-24-30.example.com:8020/hbase&lt;/value&gt;
  &lt;/property&gt;
  &lt;property&gt;
    &lt;name&gt;hbase.cluster.distributed&lt;/name&gt;
    &lt;value&gt;true&lt;/value&gt;
  &lt;/property&gt;
 &lt;/configuration&gt;
        </programlisting>
        <para>Now skip to <xref
            linkend="confirm" /> for how to start and verify your pseudo-distributed install. <footnote>
            <para>See <xref
                linkend="pseudo.extras" /> for notes on how to start extra Masters and RegionServers
              when running pseudo-distributed.</para>
          </footnote></para>
        <section
          xml:id="pseudo.extras">
          <title>Pseudo-distributed Extras</title>
          <section
            xml:id="pseudo.extras.start">
            <title>Startup</title>
            <para>To start up the initial HBase cluster...</para>
            <screen>% bin/start-hbase.sh</screen>
            <para>To start up an extra backup master(s) on the same server run...</para>
            <screen>% bin/local-master-backup.sh start 1</screen>
            <para>... the '1' means use ports 16001 &amp; 16011, and this backup master's logfile
              will be at <filename>logs/hbase-${USER}-1-master-${HOSTNAME}.log</filename>. </para>
            <para>To start up multiple backup masters, run...</para>
            <screen>% bin/local-master-backup.sh start 2 3</screen>
            <para>You can start up to 9 backup masters (10 total). </para>
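<para>The port numbers follow a fixed offset. The shell sketch below assumes, based only on the
  "'1' means use ports 16001 and 16011" example above, that backup master offset N uses ports
  16000+N and 16010+N. The function <code>backup_master_ports</code> is hypothetical and is not
  part of HBase; it only computes the numbers so you can check for conflicts with other services.</para>

```shell
#!/bin/sh
# Hypothetical helper (not shipped with HBase): print the RPC and info ports
# that bin/local-master-backup.sh is assumed to use for backup master offset N.
# Assumption: ports are 16000+N and 16010+N, matching the "1 -> 16001, 16011"
# example in the text. N must be an integer.
backup_master_ports() {
  n=$1
  # Offsets 1-9 are supported (up to 10 masters in total).
  if [ "$n" -lt 1 ] || [ "$n" -gt 9 ]; then
    echo "offset must be between 1 and 9" >&2
    return 1
  fi
  echo "$((16000 + n)) $((16010 + n))"
}

# Example: list the ports for backup masters 1, 2 and 3.
for i in 1 2 3; do
  echo "backup master $i: $(backup_master_ports "$i")"
done
```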
            <para>To start up more regionservers...</para>
            <screen>% bin/local-regionservers.sh start 1</screen>
            <para>... where '1' means use ports 16201 &amp; 16301 and its logfile will be at
                <filename>logs/hbase-${USER}-1-regionserver-${HOSTNAME}.log</filename>. </para>
            <para>To add 4 more regionservers in addition to the one you just started by
              running...</para>
            <screen>% bin/local-regionservers.sh start 2 3 4 5</screen>
            <para>This supports up to 99 extra regionservers (100 total). </para>
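<para>RegionServer offsets work the same way. This sketch assumes, from the "'1' means use ports
  16201 and 16301" example above, that offset N uses ports 16200+N and 16300+N. The function
  <code>regionserver_ports</code> is hypothetical; the 99 limit keeps the RPC ports (16201 to 16299)
  below the first info port.</para>

```shell
#!/bin/sh
# Hypothetical helper (not shipped with HBase): print the RPC and info ports
# that bin/local-regionservers.sh is assumed to use for RegionServer offset N.
# Assumption: ports are 16200+N and 16300+N, matching the "1 -> 16201, 16301"
# example in the text. N must be an integer.
regionserver_ports() {
  n=$1
  # Offsets 1-99 are supported (up to 100 RegionServers in total).
  if [ "$n" -lt 1 ] || [ "$n" -gt 99 ]; then
    echo "offset must be between 1 and 99" >&2
    return 1
  fi
  echo "$((16200 + n)) $((16300 + n))"
}

# Example: ports for the RegionServers started with "start 2 3 4 5".
for i in 2 3 4 5; do
  echo "regionserver $i: $(regionserver_ports "$i")"
done
```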
          </section>
          <section
            xml:id="pseudo.options.stop">
            <title>Stop</title>
            <para>Assuming you want to stop backup master #1, run...</para>
            <screen>% cat /tmp/hbase-${USER}-1-master.pid |xargs kill -9</screen>
            <para>Note that bin/local-master-backup.sh stop 1 will try to stop the cluster along
              with the master. </para>
            <para>To stop an individual regionserver, run...</para>
            <screen>% bin/local-regionservers.sh stop 1</screen>
          </section>
        </section>
      </section>
    </section>
    <section
      xml:id="fully_dist">
      <title>Fully-distributed</title>
      <para>By default, HBase runs in standalone mode. Both standalone mode and pseudo-distributed
        mode are provided for the purposes of small-scale testing. For a production environment,
        distributed mode is appropriate. In distributed mode, multiple instances of HBase daemons
        run on multiple servers in the cluster.</para>
      <para>Just as in pseudo-distributed mode, a fully distributed configuration requires that you
        set the <code>hbase.cluster.distributed</code> property to <literal>true</literal>.
        Typically, the <code>hbase.rootdir</code> is configured to point to a highly-available HDFS
        filesystem. </para>
      <para>In addition, the cluster is configured so that multiple cluster nodes enlist as
        RegionServers, ZooKeeper QuorumPeers, and backup HMaster servers. These configuration basics
        are all demonstrated in <xref
          linkend="quickstart-fully-distributed" />.</para>
      <formalpara
        xml:id="regionserver">
        <title>Distributed RegionServers</title>
        <para>Typically, your cluster will contain multiple RegionServers all running on different
          servers, as well as primary and backup Master and ZooKeeper daemons. The
            <filename>conf/regionservers</filename> file on the master server contains a list of
          hosts whose RegionServers are associated with this cluster. Each host is on a separate
          line. All hosts listed in this file will have their RegionServer processes started and
          stopped when the master server starts or stops.</para>
      </formalpara>
      <formalpara
        xml:id="hbase.zookeeper">
        <title>ZooKeeper and HBase</title>
        <para>See section <xref
            linkend="zookeeper" /> for ZooKeeper setup for HBase.</para>
      </formalpara>
-      <section
+      <example>
-        xml:id="fully_dist">
+        <title>Example Distributed HBase Cluster</title>
-        <title>Fully-distributed</title>
+        <para>This is a bare-bones <filename>conf/hbase-site.xml</filename> for a distributed HBase
-
+          cluster. A cluster that is used for real-world work would contain more custom
-        <para>For running a fully-distributed operation on more than one host, make the following
+          configuration parameters. Most HBase configuration directives have default values, which
-          configurations. In <filename>hbase-site.xml</filename>, add the property
+          are used unless the value is overridden in the <filename>hbase-site.xml</filename>. See <xref
-            <varname>hbase.cluster.distributed</varname> and set it to <varname>true</varname> and
+            linkend="config.files" /> for more information.</para>
          point the HBase <varname>hbase.rootdir</varname> at the appropriate HDFS NameNode and
          location in HDFS where you would like HBase to write data. For example, if your namenode
          were running at namenode.example.org on port 8020 and you wanted to home your HBase in
          HDFS at <filename>/hbase</filename>, make the following configuration.</para>
        <programlisting><![CDATA[
 <configuration>
  ...
  <property>
    <name>hbase.rootdir</name>
    <value>hdfs://namenode.example.org:8020/hbase</value>
    <description>The directory shared by RegionServers.
    </description>
  </property>
  <property>
    <name>hbase.cluster.distributed</name>
    <value>true</value>
    <description>The mode the cluster will be in. Possible values are
      false: standalone and pseudo-distributed setups with managed Zookeeper
      true: fully-distributed with unmanaged Zookeeper Quorum (see hbase-env.sh)
    </description>
  </property>
-  ...
+  <property>
      <name>hbase.zookeeper.quorum</name>
      <value>node-a.example.com,node-b.example.com,node-c.example.com</value>
    </property>
 </configuration>
 ]]>
        </programlisting>
        <para>This is an example <filename>conf/regionservers</filename> file, which contains a list
          of each node that should run a RegionServer in the cluster. These nodes need HBase
          installed and they need to use the same contents of the <filename>conf/</filename>
          directory as the Master server.</para>
        <programlisting>
 node-a.example.com
 node-b.example.com
 node-c.example.com
        </programlisting>
        <para>This is an example <filename>conf/backup-masters</filename> file, which contains a
          list of each node that should run a backup Master instance. The backup Master instances
          will sit idle unless the main Master becomes unavailable.</para>
        <programlisting>
 node-b.example.com
 node-c.example.com
        </programlisting>
      </example>
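<para>Both <filename>conf/regionservers</filename> and <filename>conf/backup-masters</filename>
  are plain text with one hostname per line. A minimal sketch that writes them from a host list,
  assuming <varname>HBASE_CONF_DIR</varname> points at the <filename>conf/</filename> directory
  you will copy to every node (it defaults to <filename>./conf</filename> here). The function
  <code>write_host_file</code> is hypothetical.</para>

```shell
#!/bin/sh
# Hypothetical helper: write one hostname per line to a file, the format
# expected by conf/regionservers and conf/backup-masters.
write_host_file() {
  target=$1
  shift
  : > "$target"                  # create the file, or truncate an existing one
  for host in "$@"; do
    printf '%s\n' "$host" >> "$target"
  done
}

# Assumption: HBASE_CONF_DIR is the conf/ directory that is kept in sync
# on every node; default to ./conf when it is not set.
HBASE_CONF_DIR=${HBASE_CONF_DIR:-./conf}
mkdir -p "$HBASE_CONF_DIR"

write_host_file "$HBASE_CONF_DIR/regionservers" \
  node-a.example.com node-b.example.com node-c.example.com
write_host_file "$HBASE_CONF_DIR/backup-masters" \
  node-b.example.com node-c.example.com
```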
      <formalpara>
        <title>Distributed HBase Quickstart</title>
        <para>See <xref
            linkend="quickstart-fully-distributed" /> for a walk-through of a simple three-node
          cluster configuration with multiple ZooKeeper, backup HMaster, and RegionServer
          instances.</para>
      </formalpara>
-        <section
+      <procedure
-          xml:id="regionserver">
+        xml:id="hdfs_client_conf">
-          <title><filename>regionservers</filename></title>
+        <title>HDFS Client Configuration</title>
-
+        <step>
-          <para>In addition, a fully-distributed mode requires that you modify
+          <para>Of note, if you have made HDFS client configuration on your Hadoop cluster, such as
-              <filename>conf/regionservers</filename>. The <xref
+            configuration directives for HDFS clients, as opposed to server-side configurations, you
-              linkend="regionservers" /> file lists all hosts that you would have running
+            must use one of the following methods to enable HBase to see and use these configuration
-              <application>HRegionServer</application>s, one host per line (This file in HBase is
+            changes:</para>
-            like the Hadoop <filename>slaves</filename> file). All servers listed in this file will
+          <stepalternatives>
-            be started and stopped when HBase cluster start or stop is run.</para>
+            <step>
        </section>
        <section
          xml:id="hbase.zookeeper">
          <title>ZooKeeper and HBase</title>
          <para>See section <xref
              linkend="zookeeper" /> for ZooKeeper setup for HBase.</para>
        </section>
        <section
          xml:id="hdfs_client_conf">
          <title>HDFS Client Configuration</title>
          <para>Of note, if you have made <emphasis>HDFS client configuration</emphasis> on your
            Hadoop cluster -- i.e. configuration you want HDFS clients to use as opposed to
            server-side configurations -- HBase will not see this configuration unless you do one of
            the following:</para>
          <itemizedlist>
            <listitem>
              <para>Add a pointer to your <varname>HADOOP_CONF_DIR</varname> to the
                  <varname>HBASE_CLASSPATH</varname> environment variable in
                  <filename>hbase-env.sh</filename>.</para>
-            </listitem>
+            </step>
-            <listitem>
+            <step>
              <para>Add a copy of <filename>hdfs-site.xml</filename> (or
                  <filename>hadoop-site.xml</filename>) or, better, symlinks, under
                  <filename>${HBASE_HOME}/conf</filename>.</para>
-            </listitem>
+            </step>
-            <listitem>
+            <step>
              <para>If you need only a small set of HDFS client configurations, add them to
                  <filename>hbase-site.xml</filename>.</para>
-            </listitem>
+            </step>
-          </itemizedlist>
+          </stepalternatives>
-
+        </step>
-          <para>An example of such an HDFS client configuration is
+      </procedure>
-              <varname>dfs.replication</varname>. If for example, you want to run with a replication
+      <para>An example of such an HDFS client configuration is <varname>dfs.replication</varname>.
-            factor of 5, hbase will create files with the default of 3 unless you do the above to
+        For example, if you want to run with a replication factor of 5, HBase will create files with
-            make the configuration available to HBase.</para>
+        the default of 3 unless you do the above to make the configuration available to
-        </section>
+        HBase.</para>
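<para>A minimal sketch of the symlink option, assuming <varname>HADOOP_CONF_DIR</varname> and
  <varname>HBASE_HOME</varname> are set in the environment. The function
  <code>link_hdfs_client_conf</code> is hypothetical. Because the link points at the Hadoop copy,
  later edits to <filename>hdfs-site.xml</filename>, such as a new
  <varname>dfs.replication</varname> value, are seen the next time HBase starts without
  re-copying the file.</para>

```shell
#!/bin/sh
# Hypothetical helper: symlink the Hadoop client configuration into HBase's
# conf/ directory so HBase picks up client-side settings such as dfs.replication.
# Assumption: HADOOP_CONF_DIR and HBASE_HOME are set in the environment.
# Usage (example paths only):
#   HADOOP_CONF_DIR=/etc/hadoop/conf HBASE_HOME=/opt/hbase link_hdfs_client_conf
link_hdfs_client_conf() {
  src="$HADOOP_CONF_DIR/hdfs-site.xml"
  dest="$HBASE_HOME/conf/hdfs-site.xml"
  if [ ! -f "$src" ]; then
    echo "no hdfs-site.xml found in $HADOOP_CONF_DIR" >&2
    return 1
  fi
  # -s: create a symbolic link; -f: replace an existing file or link at dest.
  ln -sf "$src" "$dest"
}
```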
      </section>
    </section>
  </section>
    <section
      xml:id="confirm">
@ -871,7 +897,7 @@ stopping hbase...............</screen>
        of many machines. If you are running a distributed operation, be sure to wait until HBase
        has shut down completely before stopping the Hadoop daemons.</para>
    </section>
-  </section>
+
  <!--  run modes -->
--- a/src/main/docbkx/getting_started.xml
+++ b/src/main/docbkx/getting_started.xml
@ -40,46 +40,51 @@
  <section
    xml:id="quickstart">
-    <title>Quick Start</title>
+    <title>Quick Start - Standalone HBase</title>
-    <para>This guide describes setup of a standalone HBase instance. It will run against the local
+    <para>This guide describes setup of a standalone HBase instance running against the local
-      filesystem. In later sections we will take you through how to run HBase on Apache Hadoop's
+      filesystem. This is not an appropriate configuration for a production instance of HBase, but
-      HDFS, a distributed filesystem. This section shows you how to create a table in HBase,
+      will allow you to experiment with HBase. This section shows you how to create a table in
-      inserting rows into your new HBase table via the HBase <command>shell</command>, and then
+      HBase using the <command>hbase shell</command> CLI, insert rows into the table, perform scan
-      cleaning up and shutting down your standalone, local filesystem-based HBase instance. The
+      and get operations against the table, enable or disable the table, and start and stop HBase.
-      below exercise should take no more than ten minutes (not including download time). </para>
+      Apart from downloading HBase, this procedure should take less than 10 minutes.</para>
-    <note
+    <warning
      xml:id="local.fs.durability">
      <title>Local Filesystem and Durability</title>
-      <para>Using HBase with a LocalFileSystem does not currently guarantee durability. The HDFS
+      <para><emphasis>The below advice is for HBase 0.98.2 and earlier releases only. This is fixed
-        local filesystem implementation will lose edits if files are not properly closed -- which is
+        in HBase 0.98.3 and beyond. See <link
-        very likely to happen when experimenting with a new download. You need to run HBase on HDFS
+          xlink:href="https://issues.apache.org/jira/browse/HBASE-11272">HBASE-11272</link> and
-        to ensure all writes are preserved. Running against the local filesystem though will get you
+        <link
-        off the ground quickly and get you familiar with how the general system works so lets run
+            xlink:href="https://issues.apache.org/jira/browse/HBASE-11218">HBASE-11218</link>.</emphasis></para>
-        with it for now. See <link
+      <para>Using HBase with a local filesystem does not guarantee durability. The Hadoop
        local filesystem implementation will lose edits if files are not properly closed. This is
        very likely to happen when you are experimenting with new software, starting and stopping
        the daemons often and not always cleanly. You need to run HBase on HDFS
        to ensure all writes are preserved. Running against the local filesystem is intended as a
        shortcut to get you familiar with how the general system works, as the very first phase of
        evaluation. See <link
          xlink:href="https://issues.apache.org/jira/browse/HBASE-3696" /> and its associated issues
-        for more details.</para>
+        for more details about the issues of running on the local filesystem.</para>
-    </note>
+    </warning>
    <note
      xml:id="loopback.ip.getting.started">
-      <title>Loopback IP</title>
+      <title>Loopback IP - HBase 0.94.x and earlier</title>
-      <para><emphasis>The below advice is for hbase-0.94.x and older versions only. We believe this
+      <para><emphasis>The below advice is for hbase-0.94.x and older versions only. This is fixed in
-          fixed in hbase-0.96.0 and beyond (let us know if we have it wrong).</emphasis> There
+          hbase-0.96.0 and beyond.</emphasis></para>
        should be no need of the below modification to <filename>/etc/hosts</filename> in later
        versions of HBase.</para>
-      <para>HBase expects the loopback IP address to be 127.0.0.1. Ubuntu and some other
+      <para>In HBase 0.94.x and earlier, HBase expected the loopback IP address to be 127.0.0.1. Ubuntu
-        distributions, for example, will default to 127.0.1.1 and this will cause problems for you <footnote>
+        and some other distributions default to 127.0.1.1, and this will cause problems for you. See <link
-          <para>See <link
+          xlink:href="http://blog.devving.com/why-does-hbase-care-about-etchosts/">Why does HBase
-              xlink:href="http://blog.devving.com/why-does-hbase-care-about-etchosts/">Why does
+          care about /etc/hosts?</link> for detail.</para>
-              HBase care about /etc/hosts?</link> for detail.</para>
+      <example>
-        </footnote>. </para>
+        <title>Example /etc/hosts File for Ubuntu</title>
-      <para><filename>/etc/hosts</filename> should look something like this:</para>
+        <para>The following <filename>/etc/hosts</filename> file works correctly for HBase 0.94.x
-      <screen>
+          and earlier, on Ubuntu. Use this as a template if you run into trouble.</para>
        <screen>
 127.0.0.1 localhost
 127.0.0.1 ubuntu.ubuntu-domain ubuntu
-      </screen>
+        </screen>
-
+      </example>
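<para>To check whether a host is affected, look for a <literal>127.0.1.1</literal> entry. A
  small sketch, assuming the hosts file path is passed as an argument (defaulting to
  <filename>/etc/hosts</filename>); the function <code>check_loopback</code> is hypothetical and
  only reports the problem, it does not edit the file.</para>

```shell
#!/bin/sh
# Hypothetical helper: warn if a hosts file maps any name to 127.0.1.1,
# which HBase 0.94.x and earlier do not handle well.
check_loopback() {
  hosts_file=${1:-/etc/hosts}
  # Match 127.0.1.1 only at the start of an entry and followed by whitespace,
  # so commented-out lines and addresses such as 127.0.1.10 are ignored.
  if grep -Eq '^[[:space:]]*127\.0\.1\.1[[:space:]]' "$hosts_file"; then
    echo "WARNING: $hosts_file maps a hostname to 127.0.1.1" >&2
    return 1
  fi
  return 0
}
```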
    </note>
    <section>
@ -89,159 +94,611 @@
    </section>
    <section>
-      <title>Download and unpack the latest stable release.</title>
+      <title>Get Started with HBase</title>
-      <para>Choose a download site from this list of <link
+      <procedure>
        <title>Download, Configure, and Start HBase</title>
        <step>
          <para>Choose a download site from this list of <link
          xlink:href="http://www.apache.org/dyn/closer.cgi/hbase/">Apache Download Mirrors</link>.
        Click on the suggested top link. This will take you to a mirror of <emphasis>HBase
          Releases</emphasis>. Click on the folder named <filename>stable</filename> and then
-        download the file that ends in <filename>.tar.gz</filename> to your local filesystem; e.g.
+        download the binary file that ends in <filename>.tar.gz</filename> to your local filesystem. Be
-          <filename>hbase-0.94.2.tar.gz</filename>.</para>
+        sure to choose the version that corresponds with the version of Hadoop you are likely to use
-
+      later. In most cases, you should choose the file for Hadoop 2, which will be called something
-      <para>Decompress and untar your download and then change into the unpacked directory.</para>
+      like <filename>hbase-0.98.3-hadoop2-bin.tar.gz</filename>. Do not download the file ending in
-
+        <filename>src.tar.gz</filename> for now.</para>
-      <screen><![CDATA[$ tar xfz hbase-<?eval ${project.version}?>.tar.gz
+        </step>
-$ cd hbase-<?eval ${project.version}?>]]>
+        <step>
-      </screen>
+          <para>Extract the downloaded file, and change to the newly-created directory.</para>
-
+          <screen>
-      <para>At this point, you are ready to start HBase. But before starting it, edit
+$ tar xzvf hbase-<![CDATA[<?eval ${project.version}?>]]>-hadoop2-bin.tar.gz  
-          <filename>conf/hbase-site.xml</filename>, the file you write your site-specific
+$ cd hbase-<![CDATA[<?eval ${project.version}?>]]>-hadoop2/
-        configurations into. Set <varname>hbase.rootdir</varname>, the directory HBase writes data
+          </screen>
-        to, and <varname>hbase.zookeeper.property.dataDir</varname>, the directory ZooKeeper writes
+        </step>
-        its data too:</para>
+        <step>
-      <programlisting><![CDATA[<?xml version="1.0"?>
+          <para>Edit <filename>conf/hbase-site.xml</filename>, which is the main HBase configuration
-<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
+            file. At this time, you only need to specify the directory on the local filesystem where
            HBase and ZooKeeper write data. By default, a new directory is created under /tmp. Many
            servers are configured to delete the contents of /tmp upon reboot, so you should store
            the data elsewhere. The following configuration will store HBase's data in the
              <filename>hbase</filename> directory, in the home directory of the user called
              <systemitem>testuser</systemitem>. Paste the <markup>&lt;property&gt;</markup> tags beneath the
            <markup>&lt;configuration&gt;</markup> tags, which should be empty in a new HBase install.</para>
          <example>
            <title>Example <filename>hbase-site.xml</filename> for Standalone HBase</title>
            <programlisting><![CDATA[
 <configuration>
  <property>
    <name>hbase.rootdir</name>
-    <value>file:///DIRECTORY/hbase</value>
+    <value>file:///home/testuser/hbase</value>
  </property>
  <property>
    <name>hbase.zookeeper.property.dataDir</name>
-    <value>/DIRECTORY/zookeeper</value>
+    <value>/home/testuser/zookeeper</value>
  </property>
-</configuration>]]></programlisting>
+</configuration>              
-      <para> Replace <varname>DIRECTORY</varname> in the above with the path to the directory you
+              ]]>
-        would have HBase and ZooKeeper write their data. By default,
+            </programlisting>
-          <varname>hbase.rootdir</varname> is set to <filename>/tmp/hbase-${user.name}</filename>
+          </example>
-        and similarly so for the default ZooKeeper data location which means you'll lose all your
+          <para>You do not need to create the HBase data directory. HBase will do this for you. If
-        data whenever your server reboots unless you change it (Most operating systems clear
+            you create the directory, HBase will attempt to do a migration, which is not what you
-          <filename>/tmp</filename> on restart).</para>
+            want.</para>
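<para>The same file can be written from the command line. A minimal sketch, assuming it is run
  from the top of the unpacked HBase directory and that the data should live under the current
  user's <varname>$HOME</varname> rather than <filename>/home/testuser</filename>. The
  <varname>CONF_DIR</varname> variable is a hypothetical override for testing; the script
  overwrites any existing <filename>conf/hbase-site.xml</filename>.</para>

```shell
#!/bin/sh
# Write a standalone conf/hbase-site.xml that keeps HBase and ZooKeeper data
# under $HOME instead of /tmp. Assumption: run from the top of the unpacked
# HBase directory; CONF_DIR is a hypothetical override that defaults to conf.
# Any existing hbase-site.xml in that directory is overwritten.
# Do not create $HOME/hbase yourself; HBase creates it on first start.
CONF_DIR=${CONF_DIR:-conf}
mkdir -p "$CONF_DIR"
cat > "$CONF_DIR/hbase-site.xml" <<EOF
<configuration>
  <property>
    <name>hbase.rootdir</name>
    <value>file://$HOME/hbase</value>
  </property>
  <property>
    <name>hbase.zookeeper.property.dataDir</name>
    <value>$HOME/zookeeper</value>
  </property>
</configuration>
EOF
```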
-    </section>
+        </step>
        <step xml:id="start_hbase">
          <para>The <filename>bin/start-hbase.sh</filename> script is provided as a convenient way
            to start HBase. Issue the command, and if all goes well, a message is logged to standard
            output showing that HBase started successfully. You can use the <command>jps</command>
            command to verify that you have one running process called <literal>HMaster</literal>.
            In standalone mode, HBase runs all daemons within this single JVM: the HMaster, a
            single HRegionServer, and the ZooKeeper daemon.</para>
          <note><para>Java needs to be installed and available. If you get an error indicating that
            Java is not installed, but it is on your system, perhaps in a non-standard location,
            edit the <filename>conf/hbase-env.sh</filename> file and modify the
            <envar>JAVA_HOME</envar> setting to point to the directory that contains
            <filename>bin/java</filename> on your system.</para></note>
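<para><command>jps</command> prints one line per running JVM: the process ID followed by the
  main class name. A small sketch that checks that output for an <literal>HMaster</literal>
  process, assuming the JDK's <command>jps</command> is on your <varname>PATH</varname>; the
  function <code>has_hmaster</code> is hypothetical and reads from standard input so it can
  also be run against saved output.</para>

```shell
#!/bin/sh
# Hypothetical helper: succeed if jps output (on stdin) lists an HMaster process.
has_hmaster() {
  # jps prints "<pid> <main class>" per line; match the class name exactly.
  awk '$2 == "HMaster" { found = 1 } END { exit (found ? 0 : 1) }'
}

# Usage against the running JVMs (requires the JDK's jps on the PATH):
#   jps | has_hmaster && echo "HMaster is running"
```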
        </step>
      </procedure>
-    <section
+      <procedure xml:id="shell_exercises">
-      xml:id="start_hbase">
+        <title>Use HBase For the First Time</title>
-      <title>Start HBase</title>
+        <step>
-
+          <title>Connect to HBase.</title>
-      <para>Now start HBase:</para>
+          <para>Connect to your running instance of HBase using the <command>hbase shell</command>
-      <screen>$ ./bin/start-hbase.sh
+            command, located in the <filename>bin/</filename> directory of your HBase
-starting Master, logging to logs/hbase-user-master-example.org.out</screen>
+            install. In this example, some usage and version information that is printed when you
-
+            start HBase Shell has been omitted. The HBase Shell prompt ends with a
-      <para>You should now have a running standalone HBase instance. In standalone mode, HBase runs
+            <literal>&gt;</literal> character.</para>
-        all daemons in the the one JVM; i.e. both the HBase and ZooKeeper daemons. HBase logs can be
+          <screen>
-        found in the <filename>logs</filename> subdirectory. Check them out especially if it seems
+$ <userinput>./bin/hbase shell</userinput>
-        HBase had trouble starting.</para>
+hbase(main):001:0&gt; 
-
+          </screen>
-      <note>
+        </step>
-        <title>Is <application>java</application> installed?</title>
+        <step>
-
+          <title>Display HBase Shell Help Text.</title>
-        <para>All of the above presumes a 1.6 version of Oracle <application>java</application> is
+          <para>Type <literal>help</literal> and press Enter to display some basic usage
-          installed on your machine and available on your path (See <xref
+            information for HBase Shell, as well as several example commands. Notice that table
-            linkend="java" />); i.e. when you type <application>java</application>, you see output
+            names, rows, and columns must all be enclosed in quote characters.</para>
-          that describes the options the java program takes (HBase requires java 6). If this is not
+        </step>
-          the case, HBase will not start. Install java, edit <filename>conf/hbase-env.sh</filename>,
+        <step>
-          uncommenting the <envar>JAVA_HOME</envar> line pointing it to your java install, then,
+          <title>Create a table.</title>
-          retry the steps above.</para>
+          <para>Use the <code>create</code> command to create a new table. You must specify the
-      </note>
+            table name and the ColumnFamily name.</para>
-    </section>
+          <screen>
-
+hbase&gt; <userinput>create 'test', 'cf'</userinput>    
    <section
      xml:id="shell_exercises">
      <title>Shell Exercises</title>
      <para>Connect to your running HBase via the <command>shell</command>.</para>
      <screen><![CDATA[$ ./bin/hbase shell
 HBase Shell; enter 'help<RETURN>' for list of supported commands.
 Type "exit<RETURN>" to leave the HBase Shell
 Version: 0.90.0, r1001068, Fri Sep 24 13:55:42 PDT 2010
 hbase(main):001:0>]]> </screen>
      <para>Type <command>help</command> and then <command>&lt;RETURN&gt;</command> to see a listing
        of shell commands and options. Browse at least the paragraphs at the end of the help
        emission for the gist of how variables and command arguments are entered into the HBase
        shell; in particular note how table names, rows, and columns, etc., must be quoted.</para>
      <para>Create a table named <varname>test</varname> with a single column family named
          <varname>cf</varname>. Verify its creation by listing all tables and then insert some
        values.</para>
      <screen><![CDATA[hbase(main):003:0> create 'test', 'cf'
 0 row(s) in 1.2200 seconds
-hbase(main):003:0> list 'test'
+          </screen>
-..
+        </step>
-1 row(s) in 0.0550 seconds
+        <step>
-hbase(main):004:0> put 'test', 'row1', 'cf:a', 'value1'
+          <title>List Information About your Table</title>
-0 row(s) in 0.0560 seconds
+          <para>Use the <code>list</code> command to confirm that your table exists.</para>
-hbase(main):005:0> put 'test', 'row2', 'cf:b', 'value2'
+          <screen>
-0 row(s) in 0.0370 seconds
+hbase&gt; <userinput>list 'test'</userinput>
-hbase(main):006:0> put 'test', 'row3', 'cf:c', 'value3'
+TABLE
-0 row(s) in 0.0450 seconds]]></screen>
+test
 1 row(s) in 0.0350 seconds
-      <para>Above we inserted 3 values, one at a time. The first insert is at
+=> ["test"]
-          <varname>row1</varname>, column <varname>cf:a</varname> with a value of
+          </screen>
-          <varname>value1</varname>. Columns in HBase are comprised of a column family prefix --
+        </step>
-          <varname>cf</varname> in this example -- followed by a colon and then a column qualifier
+        <step>
-        suffix (<varname>a</varname> in this case).</para>
+          <title>Put data into your table.</title>
          <para>To put data into your table, use the <code>put</code> command.</para>
          <screen>
 hbase&gt; <userinput>put 'test', 'row1', 'cf:a', 'value1'</userinput>
 0 row(s) in 0.1770 seconds
-      <para>Verify the data insert by running a scan of the table as follows</para>
+hbase&gt; <userinput>put 'test', 'row2', 'cf:b', 'value2'</userinput>
 0 row(s) in 0.0160 seconds
-      <screen><![CDATA[hbase(main):007:0> scan 'test'
+hbase&gt; <userinput>put 'test', 'row3', 'cf:c', 'value3'</userinput>
-ROW        COLUMN+CELL
+0 row(s) in 0.0260 seconds          
-row1       column=cf:a, timestamp=1288380727188, value=value1
+          </screen>
-row2       column=cf:b, timestamp=1288380738440, value=value2
+          <para>Here, we insert three values, one at a time. The first insert is at
-row3       column=cf:c, timestamp=1288380747365, value=value3
+              <literal>row1</literal>, column <literal>cf:a</literal>, with a value of
-3 row(s) in 0.0590 seconds]]></screen>
+              <literal>value1</literal>. Columns in HBase are comprised of a column family prefix,
              <literal>cf</literal> in this example, followed by a colon and then a column qualifier
            suffix, <literal>a</literal> in this case.</para>
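<para>The family/qualifier split can be illustrated with shell parameter expansion. This sketch
  splits on the first colon, assuming, as HBase requires, that column family names contain no
  colon while qualifiers may. The function <code>split_column</code> is hypothetical and only
  manipulates strings; it does not talk to HBase.</para>

```shell
#!/bin/sh
# Hypothetical helper: split an HBase column name such as "cf:a" into its
# column family and qualifier. The family is everything before the first
# colon; the qualifier is everything after it (and may itself contain colons).
split_column() {
  col=$1
  case $col in
    *:*) ;;
    *) echo "no ':' in column name: $col" >&2; return 1 ;;
  esac
  family=${col%%:*}     # remove the longest ":..." suffix
  qualifier=${col#*:}   # remove the shortest "...:" prefix
  echo "family=$family qualifier=$qualifier"
}

split_column 'cf:a'   # prints: family=cf qualifier=a
```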
        </step>
        <step>
          <title>Scan the table for all data at once.</title>
          <para>One of the ways to get data from HBase is to scan. Use the <command>scan</command>
            command to scan the table for data. You can limit your scan, but for now, all data is
            fetched.</para>
          <screen>
 hbase&gt; <userinput>scan 'test'</userinput>
 ROW                   COLUMN+CELL
 row1                 column=cf:a, timestamp=1403759475114, value=value1
 row2                 column=cf:b, timestamp=1403759492807, value=value2
 row3                 column=cf:c, timestamp=1403759503155, value=value3
 3 row(s) in 0.0440 seconds
          </screen>
        </step>
        <step>
          <title>Get a single row of data.</title>
          <para>To get a single row of data at a time, use the <command>get</command> command.</para>
          <screen>
 hbase&gt; <userinput>get 'test', 'row1'</userinput>
 COLUMN                CELL
 cf:a                 timestamp=1403759475114, value=value1
 1 row(s) in 0.0230 seconds            
          </screen>
        </step>
        <step>
          <title>Disable a table.</title>
          <para>If you want to delete a table or change its settings, as well as in some other
            situations, you need to disable the table first, using the <code>disable</code>
            command. You can re-enable it using the <code>enable</code> command.</para>
          <screen>
 hbase&gt; <userinput>disable 'test'</userinput>
 0 row(s) in 1.6270 seconds
-      <para>Get a single row</para>
+hbase&gt; <userinput>enable 'test'</userinput>
-
+0 row(s) in 0.4500 seconds
-      <screen><![CDATA[hbase(main):008:0> get 'test', 'row1'
+          </screen>
-COLUMN      CELL
+          <para>Disable the table again if you tested the <command>enable</command> command above:</para>
-cf:a        timestamp=1288380727188, value=value1
+          <screen>
-1 row(s) in 0.0400 seconds]]></screen>
+hbase&gt; <userinput>disable 'test'</userinput>
-
+0 row(s) in 1.6270 seconds            
-      <para>Now, disable and drop your table. This will clean up all done above.</para>
+          </screen>
-
+        </step>
-      <screen>h<![CDATA[base(main):012:0> disable 'test'
+        <step>
-0 row(s) in 1.0930 seconds
+          <title>Drop the table.</title>
-hbase(main):013:0> drop 'test'
+          <para>To drop (delete) a table, use the <code>drop</code> command.</para>
-0 row(s) in 0.0770 seconds ]]></screen>
+          <screen>
-
+hbase&gt; <userinput>drop 'test'</userinput>
-      <para>Exit the shell by typing exit.</para>
+0 row(s) in 0.2900 seconds            
-
+          </screen>
-      <programlisting><![CDATA[hbase(main):014:0> exit]]></programlisting>
+        </step>
        <step>
          <title>Exit the HBase Shell.</title>
          <para>To exit the HBase Shell and disconnect from your cluster, use the
              <command>quit</command> command. HBase is still running in the background.</para>
        </step>
      </procedure>
      <procedure
        xml:id="stopping">
        <title>Stop HBase</title>
        <step>
          <para>In the same way that the <filename>bin/start-hbase.sh</filename> script is provided
            to conveniently start all HBase daemons, the <filename>bin/stop-hbase.sh</filename>
            script stops them.</para>
          <screen>
 $ ./bin/stop-hbase.sh
 stopping hbase....................
 $
        </screen>
        </step>
        <step>
          <para>After issuing the command, it can take several minutes for the processes to shut
            down. Use the <command>jps</command> command to confirm that the HMaster and
            HRegionServer processes have shut down.</para>
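          <para>The <command>jps</command> check can also be scripted. The sketch below is an
            illustration only: <code>hbase_still_running</code> is a hypothetical shell function, not
            part of HBase. It reads <command>jps</command> output on standard input and succeeds while
            any HMaster, HRegionServer, or HQuorumPeer process is still listed.</para>

```shell
# hbase_still_running: hypothetical helper, not shipped with HBase.
# Reads jps output on stdin (one "PID MainClass" pair per line) and exits 0
# if an HMaster, HRegionServer, or HQuorumPeer process is still listed.
hbase_still_running() {
  awk '$2 == "HMaster" || $2 == "HRegionServer" || $2 == "HQuorumPeer" { found = 1 }
       END { if (found) exit 0; exit 1 }'
}

# Example: poll every 5 seconds until all HBase daemons have exited.
# while jps | hbase_still_running; do sleep 5; done
```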
        </step>
      </procedure>
    </section>
-    <section
+    <section xml:id="quickstart-pseudo">
-      xml:id="stopping">
+      <title>Intermediate - Pseudo-Distributed Local Install</title>
-      <title>Stopping HBase</title>
+      <para>After working your way through <xref linkend="quickstart" />, you can re-configure HBase
-
+      to run in pseudo-distributed mode. Pseudo-distributed mode means
-      <para>Stop your hbase instance by running the stop script.</para>
+      that HBase still runs completely on a single host, but each HBase daemon (HMaster,
-
+      HRegionServer, and ZooKeeper) runs as a separate process. By default, unless you configure the
-      <screen>$ ./bin/stop-hbase.sh
+      <code>hbase.rootdir</code> property as described in <xref linkend="quickstart" />, your data
-stopping hbase...............</screen>
+        is still stored in <filename>/tmp/</filename>. In this walk-through, we store your data in
        HDFS instead, assuming you have HDFS available. You can skip the HDFS configuration to
        continue storing your data in the local filesystem.</para>
      <note>
        <title>Hadoop Configuration</title>
        <para>This procedure assumes that you have configured Hadoop and HDFS on your local system
          and/or a remote system, and that they are running and available. It also assumes you are
          using Hadoop 2. Currently, the documentation on the Hadoop website does not include a
          quick start for Hadoop 2, but the guide at <link
            xlink:href="http://www.alexjf.net/blog/distributed-systems/hadoop-yarn-installation-definitive-guide">http://www.alexjf.net/blog/distributed-systems/hadoop-yarn-installation-definitive-guide</link>
          is a good starting point.</para>
      </note>
      <procedure>
        <step>
          <title>Stop HBase if it is running.</title>
          <para>If you have just finished <xref linkend="quickstart" /> and HBase is still running,
            stop it. This procedure creates a new directory where HBase stores its data, so any
            tables you created before will be lost.</para>
        </step>
        <step>
          <title>Configure HBase.</title>
          <para>
            Edit the <filename>hbase-site.xml</filename> configuration. First, add the following
            property, which directs HBase to run in distributed mode, with one JVM instance per
            daemon.
          </para>
          <programlisting><![CDATA[
 <property>
  <name>hbase.cluster.distributed</name>
  <value>true</value>
 </property>            
            ]]></programlisting>
          <para>Next, change the <code>hbase.rootdir</code> from the local filesystem to the address
            of your HDFS instance, using the <code>hdfs://</code> URI syntax. In this example,
            HDFS is running on the localhost at port 8020.</para>
          <programlisting><![CDATA[
 <property>
  <name>hbase.rootdir</name>
  <value>hdfs://localhost:8020/hbase</value>
 </property>            
            ]]>
          </programlisting>
          <para>You do not need to create the directory in HDFS. HBase will do this for you. If you
            create the directory, HBase will attempt to do a migration, which is not what you
            want.</para>
        </step>
        <step>
          <title>Start HBase.</title>
          <para>Use the <filename>bin/start-hbase.sh</filename> command to start HBase. If your
            system is configured correctly, the <command>jps</command> command should show the
            HMaster and HRegionServer processes running.</para>
        </step>
        <step>
          <title>Check the HBase directory in HDFS.</title>
          <para>If everything worked correctly, HBase created its directory in HDFS. In the
            configuration above, it is stored in <filename>/hbase/</filename> on HDFS. You can use
            the <command>hadoop fs</command> command in Hadoop's <filename>bin/</filename> directory
            to list this directory.</para>
          <screen>
 $ <userinput>./bin/hadoop fs -ls /hbase</userinput>
 Found 7 items
 drwxr-xr-x   - hbase users          0 2014-06-25 18:58 /hbase/.tmp
 drwxr-xr-x   - hbase users          0 2014-06-25 21:49 /hbase/WALs
 drwxr-xr-x   - hbase users          0 2014-06-25 18:48 /hbase/corrupt
 drwxr-xr-x   - hbase users          0 2014-06-25 18:58 /hbase/data
 -rw-r--r--   3 hbase users         42 2014-06-25 18:41 /hbase/hbase.id
 -rw-r--r--   3 hbase users          7 2014-06-25 18:41 /hbase/hbase.version
 drwxr-xr-x   - hbase users          0 2014-06-25 21:49 /hbase/oldWALs
          </screen>
        </step>
        <step>
          <title>Create a table and populate it with data.</title>
          <para>You can use the HBase Shell to create a table, populate it with data, scan and get
            values from it, using the same procedure as in <xref linkend="shell_exercises" />.</para>
        </step>
        <step>
          <title>Start and stop a backup HBase Master (HMaster) server.</title>
          <note>
            <para>Running multiple HMaster instances on the same hardware does not make sense in a
              production environment, in the same way that running a pseudo-distributed cluster does
              not make sense for production. This step is offered for testing and learning purposes
              only.</para>
          </note>
          <para>The HMaster server controls the HBase cluster. You can start up to 9 backup HMaster
            servers, which makes 10 total HMasters, counting the primary. To start a backup HMaster,
            use the <command>local-master-backup.sh</command> command with the <literal>start</literal>
            parameter. For each backup master you want to start, add a parameter representing the port
            offset for that master. Each HMaster uses
            two ports (16000 and 16010 by default). The port offset is added to these ports, so
            using an offset of 2, the first backup HMaster would use ports 16002 and 16012. The
            following command starts 3 backup servers using ports 16002/16012, 16003/16013, and
            16005/16015.</para>
            <screen>
 $ ./bin/local-master-backup.sh start 2 3 5
            </screen>
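          <para>To double-check the port arithmetic, the following minimal shell sketch computes both
            ports for each offset. It assumes the default base ports of 16000 and 16010 described
            above; <code>backup_master_ports</code> is a hypothetical helper, not an HBase
            script.</para>

```shell
# backup_master_ports: hypothetical helper, not shipped with HBase.
# Prints the RPC port and web UI port a backup HMaster would use for a given
# offset, assuming the default base ports 16000 and 16010.
backup_master_ports() {
  echo "$((16000 + $1)) $((16010 + $1))"
}

# Same offsets as the local-master-backup.sh command above.
for offset in 2 3 5; do
  echo "offset $offset: $(backup_master_ports "$offset")"
done
```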
          <para>To kill a backup master without killing the entire cluster, you need to find its
            process ID (PID). The PID is stored in a file with a name like
            <filename>/tmp/hbase-<replaceable>USER</replaceable>-<replaceable>X</replaceable>-master.pid</filename>,
            where <replaceable>X</replaceable> is the port offset. The only contents of the file are
            the PID. You can use the <command>kill -9</command> command to kill that PID. The
            following command kills the backup master with port offset 2, but leaves the cluster
            running:</para>
          <screen>
 $ cat /tmp/hbase-testuser-2-master.pid |xargs kill -9
          </screen>
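          <para>Finding and killing the right process can be wrapped in a small shell function. This
            is a sketch under two assumptions: the PID files use the naming pattern shown above, and
            <code>HBASE_PID_DIR</code> has not been changed from its default of
            <filename>/tmp</filename>. Both function names are hypothetical.</para>

```shell
# backup_master_pidfile: hypothetical helper, not shipped with HBase.
# Builds /tmp/hbase-USER-X-master.pid for offset X. The user defaults to
# $USER; /tmp is assumed to be the PID directory (HBASE_PID_DIR unset).
backup_master_pidfile() {
  echo "/tmp/hbase-${2:-$USER}-$1-master.pid"
}

# kill_backup_master: hypothetical helper. Kills only the backup master
# started with the given offset; the rest of the cluster keeps running.
kill_backup_master() {
  pidfile=$(backup_master_pidfile "$1")
  if [ -f "$pidfile" ]; then
    kill -9 "$(cat "$pidfile")"
  else
    echo "no PID file at $pidfile"
    return 1
  fi
}

# Example: kill_backup_master 2
```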
        </step>
        <step>
          <title>Start and stop additional RegionServers</title>
          <para>The HRegionServer manages the data in its StoreFiles as directed by the HMaster.
            Generally, one HRegionServer runs per node in the cluster. Running multiple
            HRegionServers on the same system can be useful for testing in pseudo-distributed mode.
            The <command>local-regionservers.sh</command> command allows you to run multiple
            RegionServers. It works in a similar way to the
            <command>local-master-backup.sh</command> command, in that each parameter you provide
            represents the port offset for an instance. Each RegionServer requires two ports, and
            the default ports are 16200 and 16300. You can run 99 additional RegionServers, or 100
            total, on a server. The following command starts four additional
          RegionServers, running on sequential ports starting at 16202/16302.</para>
          <screen>
 $ ./bin/local-regionservers.sh start 2 3 4 5
          </screen>
          <para>To stop a RegionServer manually, use the <command>local-regionservers.sh</command>
            command with the <literal>stop</literal> parameter and the offset of the server to
            stop.</para>
          <screen>$ ./bin/local-regionservers.sh stop 3</screen>
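          <para>The RegionServer port offsets follow the same pattern. The minimal sketch below
            assumes the default bases of 16200 and 16300, and treats offsets 1 to 99 as valid to match
            the limit of 99 additional RegionServers; <code>regionserver_ports</code> is a
            hypothetical helper.</para>

```shell
# regionserver_ports: hypothetical helper, not shipped with HBase.
# Prints the two ports an extra RegionServer started by local-regionservers.sh
# would use, assuming default bases 16200 and 16300. Offsets outside 1-99
# are rejected with a non-zero status.
regionserver_ports() {
  case $1 in
    [1-9]|[1-9][0-9]) echo "$((16200 + $1)) $((16300 + $1))" ;;
    *) return 1 ;;
  esac
}

# Same offsets as the "start 2 3 4 5" command above.
for offset in 2 3 4 5; do
  echo "offset $offset: $(regionserver_ports "$offset")"
done
```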
        </step>
        <step>
          <title>Stop HBase.</title>
          <para>You can stop HBase the same way as in the <xref
              linkend="quickstart" /> procedure, using the
            <filename>bin/stop-hbase.sh</filename> command.</para>
        </step>
      </procedure>
    </section>
    <section xml:id="quickstart-fully-distributed">
      <title>Advanced - Fully Distributed</title>
      <para>In reality, you need a fully-distributed configuration to fully test HBase and to use it
        in real-world scenarios. In a distributed configuration, the cluster contains multiple
        nodes, each of which runs one or more HBase daemons. These include primary and backup Master
        instances, multiple ZooKeeper nodes, and multiple RegionServer nodes.</para>
      <para>This advanced quickstart adds two more nodes to your cluster. The architecture will be
        as follows:</para>
      <table>
        <title>Distributed Cluster Demo Architecture</title>
        <tgroup cols="4">
          <thead>
            <row>
              <entry>Node Name</entry>
              <entry>Master</entry>
              <entry>ZooKeeper</entry>
              <entry>RegionServer</entry>
            </row>
          </thead>
          <tbody>
            <row>
              <entry>node-a.example.com</entry>
              <entry>yes</entry>
              <entry>yes</entry>
              <entry>no</entry>
            </row>
            <row>
              <entry>node-b.example.com</entry>
              <entry>backup</entry>
              <entry>yes</entry>
              <entry>yes</entry>
            </row>
            <row>
              <entry>node-c.example.com</entry>
              <entry>no</entry>
              <entry>yes</entry>
              <entry>yes</entry>
            </row>
          </tbody>
        </tgroup>
      </table>
      <para>This quickstart assumes that each node is a virtual machine and that they are all on the
      same network. It builds upon the previous quickstart, <xref linkend="quickstart-pseudo" />,
        assuming that the system you configured in that procedure is now <code>node-a</code>. Stop HBase on <code>node-a</code>
        before continuing.</para>
      <note>
        <para>Be sure that all the nodes have full access to communicate, and that no firewall rules
        are in place which could prevent them from talking to each other. If you see any errors like
        <literal>no route to host</literal>, check your firewall.</para>
      </note>
      <procedure xml:id="passwordless.ssh.quickstart">
        <title>Configure Password-Less SSH Access</title>
        <para><code>node-a</code> needs to be able to log into <code>node-b</code> and
          <code>node-c</code> (and itself) in order to start the daemons. The easiest way to accomplish this is
          to use the same username on all hosts, and configure password-less SSH login from
          <code>node-a</code> to each of the others. </para>
        <step>
          <title>On <code>node-a</code>, generate a key pair.</title>
          <para>While logged in as the user who will run HBase, generate an SSH key pair, using the
            following command:
          </para>
          <screen>$ ssh-keygen -t rsa</screen>
          <para>If the command succeeds, the location of the key pair is printed to standard output.
          The default name of the public key is <filename>id_rsa.pub</filename>.</para>
        </step>
        <step>
          <title>Create the directory that will hold the shared keys on the other nodes.</title>
          <para>On <code>node-b</code> and <code>node-c</code>, log in as the HBase user and create
            a <filename>.ssh/</filename> directory in the user's home directory, if it does not
            already exist. If it already exists, be aware that it may already contain other keys.</para>
        </step>
        <step>
          <title>Copy the public key to the other nodes.</title>
          <para>Securely copy the public key from <code>node-a</code> to each of the nodes, by
            using the <command>scp</command> command or some other secure means. On each of the other nodes,
            create a new file called <filename>.ssh/authorized_keys</filename> <emphasis>if it does
              not already exist</emphasis>, and append the contents of the
            <filename>id_rsa.pub</filename> file to the end of it. Note that you also need to do
            this for <code>node-a</code> itself.</para>
          <screen>$ cat id_rsa.pub &gt;&gt; ~/.ssh/authorized_keys</screen>
        </step>
        <step>
          <title>Test password-less login.</title>
          <para>If you performed the procedure correctly, you should not be prompted for a password
            when you SSH from <code>node-a</code> to either of the other nodes using the same
            username.</para>
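          <para>To test all three hosts at once, you can run a loop like the following sketch from
            <code>node-a</code>. The <code>-o BatchMode=yes</code> option tells <command>ssh</command>
            to fail rather than prompt for a password, so a host that still requires one shows up as a
            failure instead of waiting for input. <code>check_ssh</code> is a hypothetical
            helper.</para>

```shell
# check_ssh: hypothetical helper, not shipped with HBase.
# BatchMode=yes makes ssh fail instead of prompting for a password, so any
# host that still needs a password is reported as FAILED.
check_ssh() {
  status=0
  for host in "$@"; do
    if ssh -o BatchMode=yes "$host" true; then
      echo "ok: $host"
    else
      echo "FAILED: $host"
      status=1
    fi
  done
  return $status
}

# Example, run on node-a as the HBase user:
# check_ssh node-a.example.com node-b.example.com node-c.example.com
```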
        </step>
        <step>
          <para>Since <code>node-b</code> will run a backup Master, repeat the procedure above,
            substituting <code>node-b</code> everywhere you see <code>node-a</code>. Be sure not to
            overwrite your existing <filename>.ssh/authorized_keys</filename> files, but concatenate
          the new key onto the existing file using the <code>&gt;&gt;</code> operator rather than
            the <code>&gt;</code> operator.</para>
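          <para>The following sketch shows one way to append a key safely.
            <code>add_authorized_key</code> is a hypothetical helper, not part of HBase or OpenSSH: it
            always appends with <code>&gt;&gt;</code> and skips the key if the identical line is
            already present, so running it twice does not create duplicates.</para>

```shell
# add_authorized_key: hypothetical helper, not shipped with HBase or OpenSSH.
# Appends the key in PUBKEY_FILE to AUTH_FILE (default ~/.ssh/authorized_keys)
# using >>, and only if that exact line is not already present, so existing
# keys are never overwritten or duplicated.
add_authorized_key() {
  pubkey_file=$1
  auth_file=${2:-$HOME/.ssh/authorized_keys}
  key=$(cat "$pubkey_file") || return 1
  mkdir -p "$(dirname "$auth_file")"
  touch "$auth_file"
  chmod 600 "$auth_file"
  if ! grep -qxF "$key" "$auth_file"; then
    echo "$key" >> "$auth_file"
  fi
}

# Example, after copying node-b's id_rsa.pub into the current directory:
# add_authorized_key id_rsa.pub
```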
        </step>
      </procedure>
      <procedure>
        <title>Prepare <code>node-a</code></title>
        <para><code>node-a</code> will run your primary master and ZooKeeper processes, but no
          RegionServers.</para>
        <step>
          <title>Stop the RegionServer from starting on <code>node-a</code>.</title>
          <para>Edit <filename>conf/regionservers</filename> and remove the line which contains
              <literal>localhost</literal>. Add lines with the hostnames or IP addresses for
              <code>node-b</code> and <code>node-c</code>. Even if you did want to run a
            RegionServer on <code>node-a</code>, you should refer to it by the hostname the other
            servers would use to communicate with it. In this case, that would be
              <literal>node-a.example.com</literal>. This enables you to distribute the
            configuration to each node of your cluster without any hostname conflicts. Save the file.</para>
        </step>
        <step>
          <title>Configure HBase to use <code>node-b</code> as a backup master.</title>
          <para>Create a new file in <filename>conf/</filename> called
            <filename>backup-masters</filename>, and add a new line to it with the hostname for
            <code>node-b</code>. In this demonstration, the hostname is
            <literal>node-b.example.com</literal>.</para>
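          <para>For this demo cluster, both host files can be written in one step. The sketch below
            is hypothetical and hard-codes the hostnames from the architecture table; it overwrites
            both files with <code>&gt;</code>, which is intended here because their full contents are
            being replaced.</para>

```shell
# write_demo_host_files: hypothetical helper for this demo cluster only.
# Overwrites regionservers and backup-masters in the given configuration
# directory (default: conf) with the hostnames from the architecture table.
write_demo_host_files() {
  conf_dir=${1:-conf}
  # RegionServers run on node-b and node-c; node-a is deliberately absent.
  printf '%s\n' node-b.example.com node-c.example.com > "$conf_dir/regionservers"
  # node-b is the only backup master.
  printf '%s\n' node-b.example.com > "$conf_dir/backup-masters"
}

# Example, from the HBase installation directory on node-a:
# write_demo_host_files conf
```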
        </step>
        <step>
          <title>Configure ZooKeeper</title>
          <para>In reality, you should carefully consider your ZooKeeper configuration. You can find
            out more about configuring ZooKeeper in <xref
              linkend="zookeeper" />. This configuration will direct HBase to start and manage a
            ZooKeeper instance on each node of the cluster.</para>
          <para>On <code>node-a</code>, edit <filename>conf/hbase-site.xml</filename> and add the
            following properties.</para>
          <programlisting><![CDATA[
 <property>
  <name>hbase.zookeeper.quorum</name>
  <value>node-a.example.com,node-b.example.com,node-c.example.com</value>
 </property>
 <property>
  <name>hbase.zookeeper.property.dataDir</name>
  <value>/usr/local/zookeeper</value>
 </property>            
            ]]></programlisting>
        </step>
        <step>
          <para>Everywhere in your configuration that you have referred to <code>node-a</code> as
            <literal>localhost</literal>, change the reference to point to the hostname that
            the other nodes will use to refer to <code>node-a</code>. In these examples, the
            hostname is <literal>node-a.example.com</literal>.</para>
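          <para>To find the references that need changing, you can search the configuration
            directory, as in this minimal sketch. <code>find_localhost_refs</code> is a hypothetical
            helper; it only lists matches rather than rewriting them, because each occurrence should be
            checked by hand.</para>

```shell
# find_localhost_refs: hypothetical helper, not shipped with HBase.
# Lists every line under the configuration directory (default: conf) that
# still mentions localhost, as file:line:text. Exits non-zero if none remain.
find_localhost_refs() {
  grep -rn 'localhost' "${1:-conf}"
}

# Example, from the HBase installation directory on node-a:
# find_localhost_refs conf
```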
        </step>
      </procedure>
      <procedure>
        <title>Prepare <code>node-b</code> and <code>node-c</code></title>
        <para><code>node-b</code> will run a backup master server, a ZooKeeper instance, and a
          RegionServer. <code>node-c</code> will run a ZooKeeper instance and a RegionServer.</para>
        <step>
          <title>Download and unpack HBase.</title>
          <para>Download and unpack HBase to <code>node-b</code> and <code>node-c</code>, just as you
          did for the standalone and pseudo-distributed quickstarts.</para>
        </step>
        <step>
          <title>Copy the configuration files from <code>node-a</code> to <code>node-b</code> and
            <code>node-c</code>.</title>
          <para>Each node of your cluster needs to have the same configuration information. Copy the
            contents of the <filename>conf/</filename> directory to the <filename>conf/</filename>
            directory on <code>node-b</code> and <code>node-c</code>.</para>
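          <para>One way to copy the files is with <command>scp</command>, as in the sketch below. It
            assumes that password-less SSH is already configured and that HBase is unpacked at the
            same path on <code>node-b</code> and <code>node-c</code>, given by
            <code>REMOTE_HBASE_HOME</code>, a hypothetical variable you set yourself.</para>

```shell
# sync_conf: hypothetical helper, not shipped with HBase.
# Copies the files in the local conf/ directory to conf/ under
# REMOTE_HBASE_HOME on each host given. REMOTE_HBASE_HOME is an assumption:
# set it to wherever HBase was unpacked on the other nodes.
sync_conf() {
  if [ -z "$REMOTE_HBASE_HOME" ]; then
    echo "set REMOTE_HBASE_HOME first"
    return 1
  fi
  for host in "$@"; do
    scp conf/* "$host:$REMOTE_HBASE_HOME/conf/" || return 1
  done
}

# Example, run from the HBase directory on node-a:
# REMOTE_HBASE_HOME=/home/hbuser/hbase-0.98.3-hadoop2 sync_conf node-b.example.com node-c.example.com
```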
        </step>
      </procedure>
      <procedure>
        <title>Start and Test Your Cluster</title>
        <step>
          <title>Be sure HBase is not running on any node.</title>
          <para>If you forgot to stop HBase from previous testing, you will have errors. Check to
            see whether HBase is running on any of your nodes by using the <command>jps</command>
            command. Look for the processes <literal>HMaster</literal>,
            <literal>HRegionServer</literal>, and <literal>HQuorumPeer</literal>. If they exist,
            kill them.</para>
        </step>
        <step>
          <title>Start the cluster.</title>
          <para>On <code>node-a</code>, issue the <command>start-hbase.sh</command> command. Your
            output will be similar to that below.</para>
          <screen>
 $ <userinput>bin/start-hbase.sh</userinput>
 node-c.example.com: starting zookeeper, logging to /home/hbuser/hbase-0.98.3-hadoop2/bin/../logs/hbase-hbuser-zookeeper-node-c.example.com.out
 node-a.example.com: starting zookeeper, logging to /home/hbuser/hbase-0.98.3-hadoop2/bin/../logs/hbase-hbuser-zookeeper-node-a.example.com.out
 node-b.example.com: starting zookeeper, logging to /home/hbuser/hbase-0.98.3-hadoop2/bin/../logs/hbase-hbuser-zookeeper-node-b.example.com.out
 starting master, logging to /home/hbuser/hbase-0.98.3-hadoop2/bin/../logs/hbase-hbuser-master-node-a.example.com.out
 node-c.example.com: starting regionserver, logging to /home/hbuser/hbase-0.98.3-hadoop2/bin/../logs/hbase-hbuser-regionserver-node-c.example.com.out
 node-b.example.com: starting regionserver, logging to /home/hbuser/hbase-0.98.3-hadoop2/bin/../logs/hbase-hbuser-regionserver-node-b.example.com.out            
 node-b.example.com: starting master, logging to /home/hbuser/hbase-0.98.3-hadoop2/bin/../logs/hbase-hbuser-master-node-b.example.com.out
          </screen>
          <para>ZooKeeper starts first, followed by the master, then the RegionServers, and finally
            the backup masters. </para>
        </step>
        <step>
          <title>Verify that the processes are running.</title>
          <para>On each node of the cluster, run the <command>jps</command> command and verify that
            the correct processes are running on each server. You may see additional Java processes
            running on your servers as well, if they are used for other purposes.</para>
          <example>
            <title><code>node-a</code> <command>jps</command> Output</title>
            <screen>
 $ <userinput>jps</userinput>
 20355 Jps
 20071 HQuorumPeer
 20137 HMaster    
            </screen>
          </example>
          <example>
            <title><code>node-b</code> <command>jps</command> Output</title>
            <screen>
 $ <userinput>jps</userinput>
 15930 HRegionServer
 16194 Jps
 15838 HQuorumPeer
 16010 HMaster            
            </screen>
          </example>
          <example>
            <title><code>node-c</code> <command>jps</command> Output</title>
            <screen>
 $ <userinput>jps</userinput>    
 13901 Jps
 13639 HQuorumPeer
 13737 HRegionServer
            </screen>
          </example>
          <note>
            <title>ZooKeeper Process Name</title>
            <para>The <code>HQuorumPeer</code> process is a ZooKeeper instance which is controlled
              and started by HBase. If you use ZooKeeper this way, it is limited to one instance per
              cluster node, and is appropriate for testing only. If ZooKeeper is run outside of
              HBase, the process is called <code>QuorumPeer</code>. For more about ZooKeeper
              configuration, including using an external ZooKeeper instance with HBase, see <xref
                linkend="zookeeper" />.</para>
          </note>
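          <para>Comparing <command>jps</command> output against the architecture table can be
            scripted too. The sketch below is hypothetical: <code>expect_daemons</code> reads
            <command>jps</command> output on standard input and reports each expected daemon that is
            missing. The process names match the sample outputs above.</para>

```shell
# expect_daemons: hypothetical helper, not shipped with HBase.
# Reads jps output on stdin, prints "missing: NAME" for each expected daemon
# that is not listed, and returns non-zero if any daemon is missing.
expect_daemons() {
  jps_out=$(cat)
  status=0
  for daemon in "$@"; do
    # Compare the class name in the second column exactly.
    if ! printf '%s\n' "$jps_out" | awk -v d="$daemon" '$2 == d { f = 1 } END { exit !f }'; then
      echo "missing: $daemon"
      status=1
    fi
  done
  return $status
}

# Example, on node-b (backup master, ZooKeeper, RegionServer):
# jps | expect_daemons HMaster HQuorumPeer HRegionServer
```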
        </step>
        <step>
          <title>Browse to the Web UI.</title>
          <note>
            <title>Web UI Port Changes</title>
            <para>In HBase newer than 0.98.x, the HTTP ports used by the HBase Web UI changed from
              60010 for the Master and 60030 for each RegionServer to 16010 for the Master and 16030
              for the RegionServer.</para>
          </note>
          <para>If everything is set up correctly, you should be able to connect to the UI for the
            Master at <literal>http://node-a.example.com:60010/</literal>, or for the backup master at
              <literal>http://node-b.example.com:60010/</literal>, using a web browser. In HBase newer
            than 0.98.x, use port 16010 instead. If you can connect via <code>localhost</code> but
            not from another host, check your firewall rules. You can see the web UI for each of the
            RegionServers at port 60030 (16030 in newer versions) of their IP addresses, or by
            clicking their links in the web UI for the Master.</para>
        </step>
        <step>
          <title>Test what happens when nodes or services disappear.</title>
          <para>With a three-node cluster like you have configured, things will not be very
            resilient. Still, you can test what happens when the primary Master or a RegionServer
            disappears, by killing the processes and watching the logs.</para>
        </step>
      </procedure>
    </section>
    <section>
      <title>Where to go next</title>
-      <para>The above described standalone setup is good for testing and experiments only. In the
+      <para>The next chapter, <xref
-        next chapter, <xref
+          linkend="configuration" />, gives more information about the different HBase run modes,
-          linkend="configuration" />, we'll go into depth on the different HBase run modes, system
+        system requirements for running HBase, and critical configuration areas for setting up a
-        requirements running HBase, and critical configurations setting up a distributed HBase
+        distributed HBase cluster.</para>
-        deploy.</para>
    </section>
  </section>
 </chapter>