hbase/src/main/docbkx/getting_started.xml

<?xml version="1.0" encoding="UTF-8"?>
<chapter
  version="5.0"
  xml:id="getting_started"
  xmlns="http://docbook.org/ns/docbook"
  xmlns:xlink="http://www.w3.org/1999/xlink"
  xmlns:xi="http://www.w3.org/2001/XInclude"
  xmlns:svg="http://www.w3.org/2000/svg"
  xmlns:m="http://www.w3.org/1998/Math/MathML"
  xmlns:html="http://www.w3.org/1999/xhtml"
  xmlns:db="http://docbook.org/ns/docbook">
  <!--
/**
 * Licensed to the Apache Software Foundation (ASF) under one
 * or more contributor license agreements.  See the NOTICE file
 * distributed with this work for additional information
 * regarding copyright ownership.  The ASF licenses this file
 * to you under the Apache License, Version 2.0 (the
 * "License"); you may not use this file except in compliance
 * with the License.  You may obtain a copy of the License at
 *
 *     http://www.apache.org/licenses/LICENSE-2.0
 *
 * Unless required by applicable law or agreed to in writing, software
 * distributed under the License is distributed on an "AS IS" BASIS,
 * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
 * See the License for the specific language governing permissions and
 * limitations under the License.
 */
-->
  <title>Getting Started</title>

  <section>
    <title>Introduction</title>

    <para><xref
        linkend="quickstart" /> will get you up and running on a single-node, standalone instance of
      HBase. </para>
  </section>

  <section
    xml:id="quickstart">
    <title>Quick Start - Standalone HBase</title>

    <para>This guide describes setup of a standalone HBase instance running against the local
      filesystem. This is not an appropriate configuration for a production instance of HBase, but
      will allow you to experiment with HBase. This section shows you how to create a table in
      HBase using the <command>hbase shell</command> CLI, insert rows into the table, perform put
      and scan operations against the table, enable or disable the table, and start and stop HBase.
      Apart from downloading HBase, this procedure should take less than 10 minutes.</para>
    <warning
      xml:id="local.fs.durability">
      <title>Local Filesystem and Durability</title>
      <para><emphasis>The below advice is for HBase 0.98.2 and earlier releases only. This is fixed
        in HBase 0.98.3 and beyond. See <link
          xlink:href="https://issues.apache.org/jira/browse/HBASE-11272">HBASE-11272</link> and
        <link
            xlink:href="https://issues.apache.org/jira/browse/HBASE-11218">HBASE-11218</link>.</emphasis></para>
      <para>Using HBase with a local filesystem does not guarantee durability. The HDFS
        local filesystem implementation will lose edits if files are not properly closed. This is
        very likely to happen when you are experimenting with new software, starting and stopping
        the daemons often and not always cleanly. You need to run HBase on HDFS
        to ensure all writes are preserved. Running against the local filesystem is intended as a
        shortcut to get you familiar with how the general system works, as the very first phase of
        evaluation. See <link
          xlink:href="https://issues.apache.org/jira/browse/HBASE-3696" /> and its associated issues
        for more details about the issues of running on the local filesystem.</para>
    </warning>
    <note
      xml:id="loopback.ip.getting.started">
      <title>Loopback IP - HBase 0.94.x and earlier</title>
      <para><emphasis>The below advice is for hbase-0.94.x and older versions only. This is fixed in
          hbase-0.96.0 and beyond.</emphasis></para>

      <para>Prior to HBase 0.94.x, HBase expected the loopback IP address to be 127.0.0.1. Ubuntu
        and some other distributions default to 127.0.1.1 and this will cause problems for you . See <link
          xlink:href="http://blog.devving.com/why-does-hbase-care-about-etchosts/">Why does HBase
          care about /etc/hosts?</link> for detail.</para>
      <example>
        <title>Example /etc/hosts File for Ubuntu</title>
        <para>The following <filename>/etc/hosts</filename> file works correctly for HBase 0.94.x
          and earlier, on Ubuntu. Use this as a template if you run into trouble.</para>
        <screen>
127.0.0.1 localhost
127.0.0.1 ubuntu.ubuntu-domain ubuntu
        </screen>
      </example>
    </note>

    <section>
      <title>JDK Version Requirements</title>
      <para>HBase requires that a JDK be installed. See <xref linkend="java" /> for information
        about supported JDK versions.</para>
    </section>

    <section>
      <title>Get Started with HBase</title>

      <procedure>
        <title>Download, Configure, and Start HBase</title>
        <step>
          <para>Choose a download site from this list of <link
          xlink:href="http://www.apache.org/dyn/closer.cgi/hbase/">Apache Download Mirrors</link>.
        Click on the suggested top link. This will take you to a mirror of <emphasis>HBase
          Releases</emphasis>. Click on the folder named <filename>stable</filename> and then
        download the binary file that ends in <filename>.tar.gz</filename> to your local filesystem. Be
        sure to choose the version that corresponds with the version of Hadoop you are likely to use
      later. In most cases, you should choose the file for Hadoop 2, which will be called something
      like <filename>hbase-0.98.3-hadoop2-bin.tar.gz</filename>. Do not download the file ending in
        <filename>src.tar.gz</filename> for now.</para>
        </step>
        <step>
          <para>Extract the downloaded file, and change to the newly-created directory.</para>
          <screen>
$ tar xzvf hbase-<![CDATA[<?eval ${project.version}?>]]>-hadoop2-bin.tar.gz
$ cd hbase-<![CDATA[<?eval ${project.version}?>]]>-hadoop2/
          </screen>
        </step>
        <step>
          <para>Edit <filename>conf/hbase-site.xml</filename>, which is the main HBase configuration
            file. At this time, you only need to specify the directory on the local filesystem where
            HBase and Zookeeper write data. By default, a new directory is created under /tmp. Many
            servers are configured to delete the contents of /tmp upon reboot, so you should store
            the data elsewhere. The following configuration will store HBase's data in the
              <filename>hbase</filename> directory, in the home directory of the user called
              <systemitem>testuser</systemitem>. Paste the <markup>&lt;property&gt;</markup> tags beneath the
            <markup>&lt;configuration&gt;</markup> tags, which should be empty in a new HBase install.</para>
          <example>
            <title>Example <filename>hbase-site.xml</filename> for Standalone HBase</title>
            <programlisting><![CDATA[
<configuration>
  <property>
    <name>hbase.rootdir</name>
    <value>file:///home/testuser/hbase</value>
  </property>
  <property>
    <name>hbase.zookeeper.property.dataDir</name>
    <value>/home/testuser/zookeeper</value>
  </property>
</configuration>
              ]]>
            </programlisting>
          </example>
          <para>You do not need to create the HBase data directory. HBase will do this for you. If
            you create the directory, HBase will attempt to do a migration, which is not what you
            want.</para>
        </step>
        <step xml:id="start_hbase">
          <para>The <filename>bin/start-hbase.sh</filename> script is provided as a convenient way
            to start HBase. Issue the command, and if all goes well, a message is logged to standard
            output showing that HBase started successfully. You can use the <command>jps</command>
            command to verify that you have one running process called <literal>HMaster</literal>
            and at least one called <literal>HRegionServer</literal>.</para>
          <note><para>Java needs to be installed and available. If you get an error indicating that
            Java is not installed, but it is on your system, perhaps in a non-standard location,
            edit the <filename>conf/hbase-env.sh</filename> file and modify the
            <envar>JAVA_HOME</envar> setting to point to the directory that contains
            <filename>bin/java</filename> your system.</para></note>
        </step>
      </procedure>

      <procedure xml:id="shell_exercises">
        <title>Use HBase For the First Time</title>
        <step>
          <title>Connect to HBase.</title>
          <para>Connect to your running instance of HBase using the <command>hbase shell</command>
            command, located in the <filename>bin/</filename> directory of your HBase
            install. In this example, some usage and version information that is printed when you
            start HBase Shell has been omitted. The HBase Shell prompt ends with a
            <literal>&gt;</literal> character.</para>
          <screen>
$ <userinput>./bin/hbase shell</userinput>
hbase(main):001:0&gt;
          </screen>
        </step>
        <step>
          <title>Display HBase Shell Help Text.</title>
          <para>Type <literal>help</literal> and press Enter, to display some basic usage
            information for HBase Shell, as well as several example commands. Notice that table
            names, rows, columns all must be enclosed in quote characters.</para>
        </step>
        <step>
          <title>Create a table.</title>
          <para>Use the <code>create</code> command to create a new table. You must specify the
            table name and the ColumnFamily name.</para>
          <screen>
hbase&gt; <userinput>create 'test', 'cf'</userinput>
0 row(s) in 1.2200 seconds
          </screen>
        </step>
        <step>
          <title>List Information About your Table</title>
          <para>Use the <code>list</code> command to </para>
          <screen>
hbase&gt; <userinput>list 'test'</userinput>
TABLE
test
1 row(s) in 0.0350 seconds

=> ["test"]
          </screen>
        </step>
        <step>
          <title>Put data into your table.</title>
          <para>To put data into your table, use the <code>put</code> command.</para>
          <screen>
hbase&gt; <userinput>put 'test', 'row1', 'cf:a', 'value1'</userinput>
0 row(s) in 0.1770 seconds

hbase&gt; <userinput>put 'test', 'row2', 'cf:b', 'value2'</userinput>
0 row(s) in 0.0160 seconds

hbase&gt; <userinput>put 'test', 'row3', 'cf:c', 'value3'</userinput>
0 row(s) in 0.0260 seconds
          </screen>
          <para>Here, we insert three values, one at a time. The first insert is at
              <literal>row1</literal>, column <literal>cf:a</literal>, with a value of
              <literal>value1</literal>. Columns in HBase are comprised of a column family prefix,
              <literal>cf</literal> in this example, followed by a colon and then a column qualifier
            suffix, <literal>a</literal> in this case.</para>
        </step>
        <step>
          <title>Scan the table for all data at once.</title>
          <para>One of the ways to get data from HBase is to scan. Use the <command>scan</command>
            command to scan the table for data. You can limit your scan, but for now, all data is
            fetched.</para>
          <screen>
hbase&gt; <userinput>scan 'test'</userinput>
ROW                   COLUMN+CELL
 row1                 column=cf:a, timestamp=1403759475114, value=value1
 row2                 column=cf:b, timestamp=1403759492807, value=value2
 row3                 column=cf:c, timestamp=1403759503155, value=value3
3 row(s) in 0.0440 seconds
          </screen>
        </step>
        <step>
          <title>Get a single row of data.</title>
          <para>To get a single row of data at a time, use the <command>get</command> command.</para>
          <screen>
hbase&gt; <userinput>get 'test', 'row1'</userinput>
COLUMN                CELL
 cf:a                 timestamp=1403759475114, value=value1
1 row(s) in 0.0230 seconds
          </screen>
        </step>
        <step>
          <title>Disable a table.</title>
          <para>If you want to delete a table or change its settings, as well as in some other
            situations, you need to disable the table first, using the <code>disable</code>
            command. You can re-enable it using the <code>enable</code> command.</para>
          <screen>
hbase&gt; disable 'test'
0 row(s) in 1.6270 seconds

hbase&gt; enable 'test'
0 row(s) in 0.4500 seconds
          </screen>
          <para>Disable the table again if you tested the <command>enable</command> command above:</para>
          <screen>
hbase&gt; disable 'test'
0 row(s) in 1.6270 seconds
          </screen>
        </step>
        <step>
          <title>Drop the table.</title>
          <para>To drop (delete) a table, use the <code>drop</code> command.</para>
          <screen>
hbase&gt; drop 'test'
0 row(s) in 0.2900 seconds
          </screen>
        </step>
        <step>
          <title>Exit the HBase Shell.</title>
          <para>To exit the HBase Shell and disconnect from your cluster, use the
              <command>quit</command> command. HBase is still running in the background.</para>
        </step>
      </procedure>

      <procedure
        xml:id="stopping">
        <title>Stop HBase</title>
        <step>
          <para>In the same way that the <filename>bin/start-hbase.sh</filename> script is provided
            to conveniently start all HBase daemons, the <filename>bin/stop-hbase.sh</filename>
            script stops them.</para>
          <screen>
$ ./bin/stop-hbase.sh
stopping hbase....................
$
        </screen>
        </step>
        <step>
          <para>After issuing the command, it can take several minutes for the processes to shut
            down. Use the <command>jps</command> to be sure that the HMaster and HRegionServer
            processes are shut down.</para>
        </step>
      </procedure>
    </section>

    <section xml:id="quickstart-pseudo">
      <title>Intermediate - Pseudo-Distributed Local Install</title>
      <para>After working your way through <xref linkend="quickstart" />, you can re-configure HBase
      to run in pseudo-distributed mode. Pseudo-distributed mode means
      that HBase still runs completely on a single host, but each HBase daemon (HMaster,
      HRegionServer, and Zookeeper) runs as a separate process. By default, unless you configure the
      <code>hbase.rootdir</code> property as described in <xref linkend="quickstart" />, your data
        is still stored in <filename>/tmp/</filename>. In this walk-through, we store your data in
        HDFS instead, assuming you have HDFS available. You can skip the HDFS configuration to
        continue storing your data in the local filesystem.</para>
      <note>
        <title>Hadoop Configuration</title>
        <para>This procedure assumes that you have configured Hadoop and HDFS on your local system
          and or a remote system, and that they are running and available. It also assumes you are
          using Hadoop 2. Currently, the documentation on the Hadoop website does not include a
          quick start for Hadoop 2, but the guide at <link
            xlink:href="http://www.alexjf.net/blog/distributed-systems/hadoop-yarn-installation-definitive-guide">http://www.alexjf.net/blog/distributed-systems/hadoop-yarn-installation-definitive-guide</link>
          is a good starting point.</para>
      </note>
      <procedure>
        <step>
          <title>Stop HBase if it is running.</title>
          <para>If you have just finished <xref linkend="quickstart" /> and HBase is still running,
            stop it. This procedure will create a totally new directory where HBase will store its
            data, so any databases you created before will be lost.</para>
        </step>
        <step>
          <title>Configure HBase.</title>
          <para>
            Edit the <filename>hbase-site.xml</filename> configuration. First, add the following
            property, which directs HBase to run in distributed mode, with one JVM instance per
            daemon.
          </para>
          <programlisting><![CDATA[
<property>
  <name>hbase.cluster.distributed</name>
  <value>true</value>
</property>
            ]]></programlisting>
          <para>Next, change the <code>hbase.rootdir</code> from the local filesystem to the address
            of your HDFS instance, using the <code>hdfs:////</code> URI syntax. In this example,
            HDFS is running on the localhost at port 8020.</para>
          <programlisting><![CDATA[
<property>
  <name>hbase.rootdir</name>
  <value>hdfs://localhost:8020/hbase</value>
</property>
            ]]>
          </programlisting>
          <para>You do not need to create the directory in HDFS. HBase will do this for you. If you
            create the directory, HBase will attempt to do a migration, which is not what you
            want.</para>
        </step>
        <step>
          <title>Start HBase.</title>
          <para>Use the <filename>bin/start-hbase.sh</filename> command to start HBase. If your
            system is configured correctly, the <command>jps</command> command should show the
            HMaster and HRegionServer processes running.</para>
        </step>
        <step>
          <title>Check the HBase directory in HDFS.</title>
          <para>If everything worked correctly, HBase created its directory in HDFS. In the
            configuration above, it is stored in <filename>/hbase/</filename> on HDFS. You can use
            the <command>hadoop fs</command> command in Hadoop's <filename>bin/</filename> directory
            to list this directory.</para>
          <screen>
$ <userinput>./bin/hadoop fs -ls /hbase</userinput>
Found 7 items
drwxr-xr-x   - hbase users          0 2014-06-25 18:58 /hbase/.tmp
drwxr-xr-x   - hbase users          0 2014-06-25 21:49 /hbase/WALs
drwxr-xr-x   - hbase users          0 2014-06-25 18:48 /hbase/corrupt
drwxr-xr-x   - hbase users          0 2014-06-25 18:58 /hbase/data
-rw-r--r--   3 hbase users         42 2014-06-25 18:41 /hbase/hbase.id
-rw-r--r--   3 hbase users          7 2014-06-25 18:41 /hbase/hbase.version
drwxr-xr-x   - hbase users          0 2014-06-25 21:49 /hbase/oldWALs
          </screen>
        </step>
        <step>
          <title>Create a table and populate it with data.</title>
          <para>You can use the HBase Shell to create a table, populate it with data, scan and get
            values from it, using the same procedure as in <xref linkend="shell_exercises" />.</para>
        </step>
        <step>
          <title>Start and stop a backup HBase Master (HMaster) server.</title>
          <note>
            <para>Running multiple HMaster instances on the same hardware does not make sense in a
              production environment, in the same way that running a pseudo-distributed cluster does
              not make sense for production. This step is offered for testing and learning purposes
              only.</para>
          </note>
          <para>The HMaster server controls the HBase cluster. You can start up to 9 backup HMaster
            servers, which makes 10 total HMasters, counting the primary. To start a backup HMaster,
            use the <command>local-master-backup.sh</command>. For each backup master you want to
            start, add a parameter representing the port offset for that master. Each HMaster uses
            three ports (16010, 16020, and 16030 by default). The port offset is added to these ports, so
            using an offset of 2, the backup HMaster would use ports 16012, 16022, and 16032. The
            following command starts 3 backup servers using ports 16012/16022/16032, 16013/16023/16033,
            and 16015/16025/16035.</para>
            <screen>
$ ./bin/local-master-backup.sh 2 3 5
            </screen>
          <para>To kill a backup master without killing the entire cluster, you need to find its
            process ID (PID). The PID is stored in a file with a name like
            <filename>/tmp/hbase-<replaceable>USER</replaceable>-<replaceable>X</replaceable>-master.pid</filename>.
          The only contents of the file are the PID. You can use the <command>kill -9</command>
            command to kill that PID. The following command will kill the master with port offset 1,
          but leave the cluster running:</para>
          <screen>
$ cat /tmp/hbase-testuser-1-master.pid |xargs kill -9
          </screen>
        </step>
        <step>
          <title>Start and stop additional RegionServers</title>
          <para>The HRegionServer manages the data in its StoreFiles as directed by the HMaster.
            Generally, one HRegionServer runs per node in the cluster. Running multiple
            HRegionServers on the same system can be useful for testing in pseudo-distributed mode.
            The <command>local-regionservers.sh</command> command allows you to run multiple
            RegionServers. It works in a similar way to the
            <command>local-master-backup.sh</command> command, in that each parameter you provide
            represents the port offset for an instance. Each RegionServer requires two ports, and
            the default ports are 16020 and 16030. However, the base ports for additional RegionServers
            are not the default ports since the default ports are used by the HMaster, which is also
            a RegionServer since HBase version 1.0.0. The base ports are 16200 and 16300 instead.
            You can run 99 additional RegionServers that are not a HMaster or backup HMaster,
            on a server. The following command starts four additional RegionServers, running on
            sequential ports starting at 16202/16302 (base ports 16200/16300 plus 2).</para>
          <screen>
$ .bin/local-regionservers.sh start 2 3 4 5
          </screen>
          <para>To stop a RegionServer manually, use the <command>local-regionservers.sh</command>
            command with the <literal>stop</literal> parameter and the offset of the server to
            stop.</para>
          <screen>$ .bin/local-regionservers.sh stop 3</screen>
        </step>
        <step>
          <title>Stop HBase.</title>
          <para>You can stop HBase the same way as in the <xref
              linkend="quickstart" /> procedure, using the
            <filename>bin/stop-hbase.sh</filename> command.</para>
        </step>
      </procedure>
    </section>

    <section xml:id="quickstart-fully-distributed">
      <title>Advanced - Fully Distributed</title>
      <para>In reality, you need a fully-distributed configuration to fully test HBase and to use it
        in real-world scenarios. In a distributed configuration, the cluster contains multiple
        nodes, each of which runs one or more HBase daemon. These include primary and backup Master
        instances, multiple Zookeeper nodes, and multiple RegionServer nodes.</para>
      <para>This advanced quickstart adds two more nodes to your cluster. The architecture will be
        as follows:</para>
      <table>
        <title>Distributed Cluster Demo Architecture</title>
        <tgroup cols="4">
          <thead>
            <row>
              <entry>Node Name</entry>
              <entry>Master</entry>
              <entry>ZooKeeper</entry>
              <entry>RegionServer</entry>
            </row>
          </thead>
          <tbody>
            <row>
              <entry>node-a.example.com</entry>
              <entry>yes</entry>
              <entry>yes</entry>
              <entry>no</entry>
            </row>
            <row>
              <entry>node-b.example.com</entry>
              <entry>backup</entry>
              <entry>yes</entry>
              <entry>yes</entry>
            </row>
            <row>
              <entry>node-c.example.com</entry>
              <entry>no</entry>
              <entry>yes</entry>
              <entry>yes</entry>
            </row>
          </tbody>
        </tgroup>
      </table>
      <para>This quickstart assumes that each node is a virtual machine and that they are all on the
      same network. It builds upon the previous quickstart, <xref linkend="quickstart-pseudo" />,
        assuming that the system you configured in that procedure is now <code>node-a</code>. Stop HBase on <code>node-a</code>
        before continuing.</para>
      <note>
        <para>Be sure that all the nodes have full access to communicate, and that no firewall rules
        are in place which could prevent them from talking to each other. If you see any errors like
        <literal>no route to host</literal>, check your firewall.</para>
      </note>
      <procedure xml:id="passwordless.ssh.quickstart">
        <title>Configure Password-Less SSH Access</title>
        <para><code>node-a</code> needs to be able to log into <code>node-b</code> and
          <code>node-c</code> (and to itself) in order to start the daemons. The easiest way to accomplish this is
          to use the same username on all hosts, and configure password-less SSH login from
          <code>node-a</code> to each of the others. </para>
        <step>
          <title>On <code>node-a</code>, generate a key pair.</title>
          <para>While logged in as the user who will run HBase, generate a SSH key pair, using the
            following command:
          </para>
          <screen>$ ssh-keygen -t rsa</screen>
          <para>If the command succeeds, the location of the key pair is printed to standard output.
          The default name of the public key is <filename>id_rsa.pub</filename>.</para>
        </step>
        <step>
          <title>Create the directory that will hold the shared keys on the other nodes.</title>
          <para>On <code>node-b</code> and <code>node-c</code>, log in as the HBase user and create
            a <filename>.ssh/</filename> directory in the user's home directory, if it does not
            already exist. If it already exists, be aware that it may already contain other keys.</para>
        </step>
        <step>
          <title>Copy the public key to the other nodes.</title>
          <para>Securely copy the public key from <code>node-a</code> to each of the nodes, by
            using the <command>scp</command> or some other secure means. On each of the other nodes,
            create a new file called <filename>.ssh/authorized_keys</filename> <emphasis>if it does
              not already exist</emphasis>, and append the contents of the
            <filename>id_rsa.pub</filename> file to the end of it. Note that you also need to do
            this for <code>node-a</code> itself.</para>
          <screen>$ cat id_rsa.pub &gt;&gt; ~/.ssh/authorized_keys</screen>
        </step>
        <step>
          <title>Test password-less login.</title>
          <para>If you performed the procedure correctly, if you SSH from <code>node-a</code> to
            either of the other nodes, using the same username, you should not be prompted for a password.
          </para>
        </step>
        <step>
          <para>Since <code>node-b</code> will run a backup Master, repeat the procedure above,
            substituting <code>node-b</code> everywhere you see <code>node-a</code>. Be sure not to
            overwrite your existing <filename>.ssh/authorized_keys</filename> files, but concatenate
          the new key onto the existing file using the <code>&gt;&gt;</code> operator rather than
            the <code>&gt;</code> operator.</para>
        </step>
      </procedure>

      <procedure>
        <title>Prepare <code>node-a</code></title>
        <para><code>node-a</code> will run your primary master and ZooKeeper processes, but no
          RegionServers.</para>
        <step>
          <title>Stop the RegionServer from starting on <code>node-a</code>.</title>
          <para>Edit <filename>conf/regionservers</filename> and remove the line which contains
              <literal>localhost</literal>. Add lines with the hostnames or IP addresses for
              <code>node-b</code> and <code>node-c</code>. Even if you did want to run a
            RegionServer on <code>node-a</code>, you should refer to it by the hostname the other
            servers would use to communicate with it. In this case, that would be
              <literal>node-a.example.com</literal>. This enables you to distribute the
            configuration to each node of your cluster any hostname conflicts. Save the file.</para>
        </step>
        <step>
          <title>Configure HBase to use <code>node-b</code> as a backup master.</title>
          <para>Create a new file in <filename>conf/</filename> called
            <filename>backup-masters</filename>, and add a new line to it with the hostname for
            <code>node-b</code>. In this demonstration, the hostname is
            <literal>node-b.example.com</literal>.</para>
        </step>
        <step>
          <title>Configure ZooKeeper</title>
          <para>In reality, you should carefully consider your ZooKeeper configuration. You can find
            out more about configuring ZooKeeper in <xref
              linkend="zookeeper" />. This configuration will direct HBase to start and manage a
            ZooKeeper instance on each node of the cluster.</para>
          <para>On <code>node-a</code>, edit <filename>conf/hbase-site.xml</filename> and add the
            following properties.</para>
          <programlisting><![CDATA[
<property>
  <name>hbase.zookeeper.quorum</name>
  <value>node-a.example.com,node-b.example.com,node-c.example.com</value>
</property>
<property>
  <name>hbase.zookeeper.property.dataDir</name>
  <value>/usr/local/zookeeper</value>
</property>
            ]]></programlisting>
        </step>
        <step>
          <para>Everywhere in your configuration that you have referred to <code>node-a</code> as
            <literal>localhost</literal>, change the reference to point to the hostname that
            the other nodes will use to refer to <code>node-a</code>. In these examples, the
            hostname is <literal>node-a.example.com</literal>.</para>
        </step>
      </procedure>
      <procedure>
        <title>Prepare <code>node-b</code> and <code>node-c</code></title>
        <para><code>node-b</code> will run a backup master server and a ZooKeeper instance.</para>
        <step>
          <title>Download and unpack HBase.</title>
          <para>Download and unpack HBase to <code>node-b</code>, just as you did for the standalone
          and pseudo-distributed quickstarts.</para>
        </step>
        <step>
          <title>Copy the configuration files from <code>node-a</code> to <code>node-b</code>.and
            <code>node-c</code>.</title>
          <para>Each node of your cluster needs to have the same configuration information. Copy the
            contents of the <filename>conf/</filename> directory to the <filename>conf/</filename>
            directory on <code>node-b</code> and <code>node-c</code>.</para>
        </step>
      </procedure>

      <procedure>
        <title>Start and Test Your Cluster</title>
        <step>
          <title>Be sure HBase is not running on any node.</title>
          <para>If you forgot to stop HBase from previous testing, you will have errors. Check to
            see whether HBase is running on any of your nodes by using the <command>jps</command>
            command. Look for the processes <literal>HMaster</literal>,
            <literal>HRegionServer</literal>, and <literal>HQuorumPeer</literal>. If they exist,
            kill them.</para>
        </step>
        <step>
          <title>Start the cluster.</title>
          <para>On <code>node-a</code>, issue the <command>start-hbase.sh</command> command. Your
            output will be similar to that below.</para>
          <screen>
$ <userinput>bin/start-hbase.sh</userinput>
node-c.example.com: starting zookeeper, logging to /home/hbuser/hbase-0.98.3-hadoop2/bin/../logs/hbase-hbuser-zookeeper-node-c.example.com.out
node-a.example.com: starting zookeeper, logging to /home/hbuser/hbase-0.98.3-hadoop2/bin/../logs/hbase-hbuser-zookeeper-node-a.example.com.out
node-b.example.com: starting zookeeper, logging to /home/hbuser/hbase-0.98.3-hadoop2/bin/../logs/hbase-hbuser-zookeeper-node-b.example.com.out
starting master, logging to /home/hbuser/hbase-0.98.3-hadoop2/bin/../logs/hbase-hbuser-master-node-a.example.com.out
node-c.example.com: starting regionserver, logging to /home/hbuser/hbase-0.98.3-hadoop2/bin/../logs/hbase-hbuser-regionserver-node-c.example.com.out
node-b.example.com: starting regionserver, logging to /home/hbuser/hbase-0.98.3-hadoop2/bin/../logs/hbase-hbuser-regionserver-node-b.example.com.out
node-b.example.com: starting master, logging to /home/hbuser/hbase-0.98.3-hadoop2/bin/../logs/hbase-hbuser-master-nodeb.example.com.out
          </screen>
          <para>ZooKeeper starts first, followed by the master, then the RegionServers, and finally
            the backup masters. </para>
        </step>
        <step>
          <title>Verify that the processes are running.</title>
          <para>On each node of the cluster, run the <command>jps</command> command and verify that
            the correct processes are running on each server. You may see additional Java processes
            running on your servers as well, if they are used for other purposes.</para>
          <example>
            <title><code>node-a</code> <command>jps</command> Output</title>
            <screen>
$ <userinput>jps</userinput>
20355 Jps
20071 HQuorumPeer
20137 HMaster
            </screen>
          </example>
          <example>
            <title><code>node-b</code> <command>jps</command> Output</title>
            <screen>
$ <userinput>jps</userinput>
15930 HRegionServer
16194 Jps
15838 HQuorumPeer
16010 HMaster
            </screen>
          </example>
          <example>
            <title><code>node-c</code> <command>jps</command> Output</title>
            <screen>
$ <userinput>jps</userinput>
13901 Jps
13639 HQuorumPeer
13737 HRegionServer
            </screen>
          </example>
          <note>
            <title>ZooKeeper Process Name</title>
            <para>The <code>HQuorumPeer</code> process is a ZooKeeper instance which is controlled
              and started by HBase. If you use ZooKeeper this way, it is limited to one instance per
              cluster node, , and is appropriate for testing only. If ZooKeeper is run outside of
              HBase, the process is called <code>QuorumPeer</code>. For more about ZooKeeper
              configuration, including using an external ZooKeeper instance with HBase, see <xref
                linkend="zookeeper" />.</para>
          </note>
        </step>
        <step>
          <title>Browse to the Web UI.</title>
          <note>
            <title>Web UI Port Changes</title>
            <para>In HBase newer than 0.98.x, the HTTP ports used by the HBase Web UI changed from
              60010 for the Master and 60030 for each RegionServer to 16610 for the Master and 16030
              for the RegionServer.</para>
          </note>
          <para>If everything is set up correctly, you should be able to connect to the UI for the
            Master <literal>http://node-a.example.com:60110/</literal> or the secondary master at
              <literal>http://node-b.example.com:60110/</literal> for the secondary master, using a
            web browser. If you can connect via <code>localhost</code> but not from another host,
            check your firewall rules. You can see the web UI for each of the RegionServers at port
            60130 of their IP addresses, or by clicking their links in the web UI for the
            Master.</para>
        </step>
        <step>
          <title>Test what happens when nodes or services disappear.</title>
          <para>With a three-node cluster like you have configured, things will not be very
            resilient. Still, you can test what happens when the primary Master or a RegionServer
            disappears, by killing the processes and watching the logs.</para>
        </step>
      </procedure>
    </section>

    <section>
      <title>Where to go next</title>

      <para>The next chapter, <xref
          linkend="configuration" />, gives more information about the different HBase run modes,
        system requirements for running HBase, and critical configuration areas for setting up a
        distributed HBase cluster.</para>
    </section>
  </section>
</chapter>