<?xml version="1.0" encoding="UTF-8"?>
|
|
<chapter
|
|
version="5.0"
|
|
xml:id="getting_started"
|
|
xmlns="http://docbook.org/ns/docbook"
|
|
xmlns:xlink="http://www.w3.org/1999/xlink"
|
|
xmlns:xi="http://www.w3.org/2001/XInclude"
|
|
xmlns:svg="http://www.w3.org/2000/svg"
|
|
xmlns:m="http://www.w3.org/1998/Math/MathML"
|
|
xmlns:html="http://www.w3.org/1999/xhtml"
|
|
xmlns:db="http://docbook.org/ns/docbook">
|
|
<!--
|
|
/**
|
|
* Licensed to the Apache Software Foundation (ASF) under one
|
|
* or more contributor license agreements. See the NOTICE file
|
|
* distributed with this work for additional information
|
|
* regarding copyright ownership. The ASF licenses this file
|
|
* to you under the Apache License, Version 2.0 (the
|
|
* "License"); you may not use this file except in compliance
|
|
* with the License. You may obtain a copy of the License at
|
|
*
|
|
* http://www.apache.org/licenses/LICENSE-2.0
|
|
*
|
|
* Unless required by applicable law or agreed to in writing, software
|
|
* distributed under the License is distributed on an "AS IS" BASIS,
|
|
* WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
|
|
* See the License for the specific language governing permissions and
|
|
* limitations under the License.
|
|
*/
|
|
-->
|
|
<title>Getting Started</title>
|
|
|
|
<section>
|
|
<title>Introduction</title>
|
|
|
|
<para><xref
|
|
linkend="quickstart" /> will get you up and running on a single-node, standalone instance of
|
|
HBase. </para>
|
|
</section>
|
|
|
|
<section
|
|
xml:id="quickstart">
|
|
<title>Quick Start - Standalone HBase</title>
|
|
|
|
<para>This guide describes setup of a standalone HBase instance running against the local
|
|
filesystem. This is not an appropriate configuration for a production instance of HBase, but
|
|
will allow you to experiment with HBase. This section shows you how to create a table in
|
|
HBase using the <command>hbase shell</command> CLI, insert rows into the table, perform put
|
|
and scan operations against the table, enable or disable the table, and start and stop HBase.
|
|
Apart from downloading HBase, this procedure should take less than 10 minutes.</para>
|
|
<warning
|
|
xml:id="local.fs.durability">
|
|
<title>Local Filesystem and Durability</title>
|
|
<para><emphasis>The below advice is for HBase 0.98.2 and earlier releases only. This is fixed
|
|
in HBase 0.98.3 and beyond. See <link
|
|
xlink:href="https://issues.apache.org/jira/browse/HBASE-11272">HBASE-11272</link> and
|
|
<link
|
|
xlink:href="https://issues.apache.org/jira/browse/HBASE-11218">HBASE-11218</link>.</emphasis></para>
|
|
<para>Using HBase with a local filesystem does not guarantee durability. The HDFS
|
|
local filesystem implementation will lose edits if files are not properly closed. This is
|
|
very likely to happen when you are experimenting with new software, starting and stopping
|
|
the daemons often and not always cleanly. You need to run HBase on HDFS
|
|
to ensure all writes are preserved. Running against the local filesystem is intended as a
|
|
shortcut to get you familiar with how the general system works, as the very first phase of
|
|
evaluation. See <link
|
|
xlink:href="https://issues.apache.org/jira/browse/HBASE-3696" /> and its associated issues
|
|
for more details about the issues of running on the local filesystem.</para>
|
|
</warning>
|
|
<note
|
|
xml:id="loopback.ip.getting.started">
|
|
<title>Loopback IP - HBase 0.94.x and earlier</title>
|
|
<para><emphasis>The below advice is for hbase-0.94.x and older versions only. This is fixed in
|
|
hbase-0.96.0 and beyond.</emphasis></para>
|
|
|
|
<para>In HBase 0.94.x and earlier, HBase expected the loopback IP address to be 127.0.0.1. Ubuntu
and some other distributions default to 127.0.1.1, which causes problems. See <link
|
|
xlink:href="http://blog.devving.com/why-does-hbase-care-about-etchosts/">Why does HBase
|
|
care about /etc/hosts?</link> for detail.</para>
|
|
<example>
|
|
<title>Example /etc/hosts File for Ubuntu</title>
|
|
<para>The following <filename>/etc/hosts</filename> file works correctly for HBase 0.94.x
|
|
and earlier, on Ubuntu. Use this as a template if you run into trouble.</para>
|
|
<screen>
|
|
127.0.0.1 localhost
|
|
127.0.0.1 ubuntu.ubuntu-domain ubuntu
|
|
</screen>
|
|
</example>
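<para>A quick way to check whether a host is affected is to look for a 127.0.1.1 entry in
its hosts file. The following is a minimal sketch, assuming a Bash shell. The function name
<code>uses_alt_loopback</code> is hypothetical and is not part of HBase.</para>
<programlisting language="bash"><![CDATA[
#!/usr/bin/env bash
# Succeed if the given hosts file has an entry whose address is 127.0.1.1.
# Hypothetical helper; defaults to /etc/hosts when no file is given.
uses_alt_loopback() {
  local hosts_file="${1:-/etc/hosts}"
  # The address is the first field, so commented-out lines never match it.
  awk '$1 == "127.0.1.1" { found = 1 } END { exit !found }' "$hosts_file"
}

# Example:
# if uses_alt_loopback; then echo "edit /etc/hosts as shown above"; fi
]]></programlisting>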
|
|
</note>
|
|
|
|
<section>
|
|
<title>JDK Version Requirements</title>
|
|
<para>HBase requires that a JDK be installed. See <xref linkend="java" /> for information
|
|
about supported JDK versions.</para>
|
|
</section>
|
|
|
|
<section>
|
|
<title>Get Started with HBase</title>
|
|
|
|
<procedure>
|
|
<title>Download, Configure, and Start HBase</title>
|
|
<step>
|
|
<para>Choose a download site from this list of <link
|
|
xlink:href="http://www.apache.org/dyn/closer.cgi/hbase/">Apache Download Mirrors</link>.
|
|
Click on the suggested top link. This will take you to a mirror of <emphasis>HBase
|
|
Releases</emphasis>. Click on the folder named <filename>stable</filename> and then
|
|
download the binary file that ends in <filename>.tar.gz</filename> to your local filesystem. Be
|
|
sure to choose the version that corresponds with the version of Hadoop you are likely to use
|
|
later. In most cases, you should choose the file for Hadoop 2, which will be called something
|
|
like <filename>hbase-0.98.3-hadoop2-bin.tar.gz</filename>. Do not download the file ending in
|
|
<filename>src.tar.gz</filename> for now.</para>
|
|
</step>
|
|
<step>
|
|
<para>Extract the downloaded file, and change to the newly-created directory.</para>
|
|
<screen>
|
|
$ tar xzvf hbase-<![CDATA[<?eval ${project.version}?>]]>-hadoop2-bin.tar.gz
|
|
$ cd hbase-<![CDATA[<?eval ${project.version}?>]]>-hadoop2/
|
|
</screen>
|
|
</step>
|
|
<step>
|
|
<para>Edit <filename>conf/hbase-site.xml</filename>, which is the main HBase configuration
|
|
file. At this time, you only need to specify the directory on the local filesystem where
|
|
HBase and ZooKeeper write data. By default, a new directory is created under <filename>/tmp</filename>. Many
servers are configured to delete the contents of <filename>/tmp</filename> upon reboot, so you should store
the data elsewhere. The following configuration will store HBase's data in the
<filename>hbase</filename> directory, in the home directory of the user called
<systemitem>testuser</systemitem>. Paste the <markup>&lt;property&gt;</markup> elements inside the
<markup>&lt;configuration&gt;</markup> element, which should be empty in a new HBase install.</para>
|
|
<example>
|
|
<title>Example <filename>hbase-site.xml</filename> for Standalone HBase</title>
|
|
<programlisting><![CDATA[
<configuration>
  <property>
    <name>hbase.rootdir</name>
    <value>file:///home/testuser/hbase</value>
  </property>
  <property>
    <name>hbase.zookeeper.property.dataDir</name>
    <value>/home/testuser/zookeeper</value>
  </property>
</configuration>
]]>
</programlisting>
|
|
</example>
|
|
<para>You do not need to create the HBase data directory. HBase will do this for you. If
|
|
you create the directory, HBase will attempt to do a migration, which is not what you
|
|
want.</para>
|
|
</step>
|
|
<step xml:id="start_hbase">
|
|
<para>The <filename>bin/start-hbase.sh</filename> script is provided as a convenient way
to start HBase. Issue the command, and if all goes well, a message is logged to standard
output showing that HBase started successfully. You can use the <command>jps</command>
command to verify that you have one running process called <literal>HMaster</literal>.
In standalone mode, HBase runs all of its daemons (the HMaster, a single HRegionServer,
and ZooKeeper) within this one JVM.</para>
|
|
<note><para>Java needs to be installed and available. If you get an error indicating that
|
|
Java is not installed, but it is on your system, perhaps in a non-standard location,
|
|
edit the <filename>conf/hbase-env.sh</filename> file and modify the
|
|
<envar>JAVA_HOME</envar> setting to point to the directory that contains
|
|
<filename>bin/java</filename> on your system.</para></note>
|
|
</step>
|
|
</procedure>
|
|
|
|
<procedure xml:id="shell_exercises">
|
|
<title>Use HBase For the First Time</title>
|
|
<step>
|
|
<title>Connect to HBase.</title>
|
|
<para>Connect to your running instance of HBase using the <command>hbase shell</command>
|
|
command, located in the <filename>bin/</filename> directory of your HBase
|
|
install. In this example, some usage and version information that is printed when you
|
|
start HBase Shell has been omitted. The HBase Shell prompt ends with a
|
|
<literal>></literal> character.</para>
|
|
<screen>
|
|
$ <userinput>./bin/hbase shell</userinput>
|
|
hbase(main):001:0>
|
|
</screen>
|
|
</step>
|
|
<step>
|
|
<title>Display HBase Shell Help Text.</title>
|
|
<para>Type <literal>help</literal> and press Enter, to display some basic usage
|
|
information for HBase Shell, as well as several example commands. Notice that table
|
|
names, rows, and columns must all be enclosed in quote characters.</para>
|
|
</step>
|
|
<step>
|
|
<title>Create a table.</title>
|
|
<para>Use the <code>create</code> command to create a new table. You must specify the
|
|
table name and the ColumnFamily name.</para>
|
|
<screen>
|
|
hbase> <userinput>create 'test', 'cf'</userinput>
|
|
0 row(s) in 1.2200 seconds
|
|
</screen>
|
|
</step>
|
|
<step>
|
|
<title>List information about your table.</title>
<para>Use the <code>list</code> command to confirm that your table exists.</para>
|
|
<screen>
|
|
hbase> <userinput>list 'test'</userinput>
|
|
TABLE
|
|
test
|
|
1 row(s) in 0.0350 seconds
|
|
|
|
=> ["test"]
|
|
</screen>
|
|
</step>
|
|
<step>
|
|
<title>Put data into your table.</title>
|
|
<para>To put data into your table, use the <code>put</code> command.</para>
|
|
<screen>
|
|
hbase> <userinput>put 'test', 'row1', 'cf:a', 'value1'</userinput>
|
|
0 row(s) in 0.1770 seconds
|
|
|
|
hbase> <userinput>put 'test', 'row2', 'cf:b', 'value2'</userinput>
|
|
0 row(s) in 0.0160 seconds
|
|
|
|
hbase> <userinput>put 'test', 'row3', 'cf:c', 'value3'</userinput>
|
|
0 row(s) in 0.0260 seconds
|
|
</screen>
|
|
<para>Here, we insert three values, one at a time. The first insert is at
|
|
<literal>row1</literal>, column <literal>cf:a</literal>, with a value of
|
|
<literal>value1</literal>. Columns in HBase are composed of a column family prefix,
|
|
<literal>cf</literal> in this example, followed by a colon and then a column qualifier
|
|
suffix, <literal>a</literal> in this case.</para>
|
|
</step>
|
|
<step>
|
|
<title>Scan the table for all data at once.</title>
|
|
<para>One of the ways to get data from HBase is to scan. Use the <command>scan</command>
|
|
command to scan the table for data. You can limit your scan, but for now, all data is
|
|
fetched.</para>
|
|
<screen>
|
|
hbase> <userinput>scan 'test'</userinput>
|
|
ROW COLUMN+CELL
|
|
row1 column=cf:a, timestamp=1403759475114, value=value1
|
|
row2 column=cf:b, timestamp=1403759492807, value=value2
|
|
row3 column=cf:c, timestamp=1403759503155, value=value3
|
|
3 row(s) in 0.0440 seconds
|
|
</screen>
|
|
</step>
|
|
<step>
|
|
<title>Get a single row of data.</title>
|
|
<para>To get a single row of data at a time, use the <command>get</command> command.</para>
|
|
<screen>
|
|
hbase> <userinput>get 'test', 'row1'</userinput>
|
|
COLUMN CELL
|
|
cf:a timestamp=1403759475114, value=value1
|
|
1 row(s) in 0.0230 seconds
|
|
</screen>
|
|
</step>
|
|
<step>
|
|
<title>Disable a table.</title>
|
|
<para>If you want to delete a table or change its settings, as well as in some other
|
|
situations, you need to disable the table first, using the <code>disable</code>
|
|
command. You can re-enable it using the <code>enable</code> command.</para>
|
|
<screen>
|
|
hbase> disable 'test'
|
|
0 row(s) in 1.6270 seconds
|
|
|
|
hbase> enable 'test'
|
|
0 row(s) in 0.4500 seconds
|
|
</screen>
|
|
<para>Disable the table again if you tested the <command>enable</command> command above:</para>
|
|
<screen>
|
|
hbase> disable 'test'
|
|
0 row(s) in 1.6270 seconds
|
|
</screen>
|
|
</step>
|
|
<step>
|
|
<title>Drop the table.</title>
|
|
<para>To drop (delete) a table, use the <code>drop</code> command.</para>
|
|
<screen>
|
|
hbase> drop 'test'
|
|
0 row(s) in 0.2900 seconds
|
|
</screen>
|
|
</step>
|
|
<step>
|
|
<title>Exit the HBase Shell.</title>
|
|
<para>To exit the HBase Shell and disconnect from your cluster, use the
|
|
<command>quit</command> command. HBase is still running in the background.</para>
|
|
</step>
|
|
</procedure>
|
|
|
|
<procedure
|
|
xml:id="stopping">
|
|
<title>Stop HBase</title>
|
|
<step>
|
|
<para>In the same way that the <filename>bin/start-hbase.sh</filename> script is provided
|
|
to conveniently start all HBase daemons, the <filename>bin/stop-hbase.sh</filename>
|
|
script stops them.</para>
|
|
<screen>
|
|
$ ./bin/stop-hbase.sh
|
|
stopping hbase....................
|
|
$
|
|
</screen>
|
|
</step>
|
|
<step>
|
|
<para>After issuing the command, it can take several minutes for the processes to shut
|
|
down. Use the <command>jps</command> command to be sure that the HMaster and HRegionServer
|
|
processes are shut down.</para>
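<para>Rather than re-running <command>jps</command> by hand, you can poll it. The following
is a minimal sketch, assuming a Bash shell with <command>jps</command> on the
<envar>PATH</envar>. The function name <code>wait_for_hbase_shutdown</code>, the default of
60 polls, and the 5-second interval are illustrative choices, not HBase settings.</para>
<programlisting language="bash"><![CDATA[
#!/usr/bin/env bash
# Poll jps until no HMaster or HRegionServer process is listed.
# Hypothetical helper: the name, retry count, and interval are assumptions.
wait_for_hbase_shutdown() {
  local tries="${1:-60}"      # maximum number of polls
  local interval="${2:-5}"    # seconds to sleep between polls
  while [ "$tries" -gt 0 ]; do
    # jps prints one "PID ClassName" line per running JVM.
    if ! jps | grep -Eqw 'HMaster|HRegionServer'; then
      return 0                # neither daemon is listed any more
    fi
    tries=$((tries - 1))
    sleep "$interval"
  done
  return 1                    # still running after all polls
}

# Example:
# wait_for_hbase_shutdown 60 5 || echo "HBase is still running"
]]></programlisting>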
|
|
</step>
|
|
</procedure>
|
|
</section>
|
|
|
|
<section xml:id="quickstart-pseudo">
|
|
<title>Intermediate - Pseudo-Distributed Local Install</title>
|
|
<para>After working your way through <xref linkend="quickstart" />, you can re-configure HBase
|
|
to run in pseudo-distributed mode. Pseudo-distributed mode means
|
|
that HBase still runs completely on a single host, but each HBase daemon (HMaster,
|
|
HRegionServer, and ZooKeeper) runs as a separate process. By default, unless you configure the
|
|
<code>hbase.rootdir</code> property as described in <xref linkend="quickstart" />, your data
|
|
is still stored in <filename>/tmp/</filename>. In this walk-through, we store your data in
|
|
HDFS instead, assuming you have HDFS available. You can skip the HDFS configuration to
|
|
continue storing your data in the local filesystem.</para>
|
|
<note>
|
|
<title>Hadoop Configuration</title>
|
|
<para>This procedure assumes that you have configured Hadoop and HDFS on your local system
|
|
or on a remote system, and that they are running and available. It also assumes you are
|
|
using Hadoop 2. Currently, the documentation on the Hadoop website does not include a
|
|
quick start for Hadoop 2, but the guide at <link
|
|
xlink:href="http://www.alexjf.net/blog/distributed-systems/hadoop-yarn-installation-definitive-guide">http://www.alexjf.net/blog/distributed-systems/hadoop-yarn-installation-definitive-guide</link>
|
|
is a good starting point.</para>
|
|
</note>
|
|
<procedure>
|
|
<step>
|
|
<title>Stop HBase if it is running.</title>
|
|
<para>If you have just finished <xref linkend="quickstart" /> and HBase is still running,
|
|
stop it. This procedure will create a totally new directory where HBase will store its
|
|
data, so any databases you created before will be lost.</para>
|
|
</step>
|
|
<step>
|
|
<title>Configure HBase.</title>
|
|
<para>
|
|
Edit the <filename>hbase-site.xml</filename> configuration. First, add the following
|
|
property, which directs HBase to run in distributed mode, with one JVM instance per
|
|
daemon.
|
|
</para>
|
|
<programlisting><![CDATA[
<property>
  <name>hbase.cluster.distributed</name>
  <value>true</value>
</property>
]]></programlisting>
|
|
<para>Next, change the <code>hbase.rootdir</code> from the local filesystem to the address
|
|
of your HDFS instance, using the <code>hdfs://</code> URI syntax. In this example,
|
|
HDFS is running on the localhost at port 8020.</para>
|
|
<programlisting><![CDATA[
<property>
  <name>hbase.rootdir</name>
  <value>hdfs://localhost:8020/hbase</value>
</property>
]]>
</programlisting>
|
|
<para>You do not need to create the directory in HDFS. HBase will do this for you. If you
|
|
create the directory, HBase will attempt to do a migration, which is not what you
|
|
want.</para>
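<para>Put together, the two properties sit inside a single
<markup>&lt;configuration&gt;</markup> element. The following sketch writes such a file from
the shell. The function name <code>write_pseudo_site</code> and the scratch output path are
illustrative assumptions, and the <literal>hdfs://localhost:8020</literal> address is the one
from the example above, which must match your own HDFS instance.</para>
<programlisting language="bash"><![CDATA[
#!/usr/bin/env bash
# Write a minimal pseudo-distributed hbase-site.xml to the given path.
# Hypothetical helper; the values mirror the examples in this procedure.
write_pseudo_site() {
  local target="$1"
  cat > "$target" <<'EOF'
<?xml version="1.0"?>
<configuration>
  <property>
    <name>hbase.cluster.distributed</name>
    <value>true</value>
  </property>
  <property>
    <name>hbase.rootdir</name>
    <value>hdfs://localhost:8020/hbase</value>
  </property>
</configuration>
EOF
}

# Example: write to a scratch file, review it, then copy it to conf/hbase-site.xml.
# write_pseudo_site /tmp/hbase-site.xml
]]></programlisting>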
|
|
</step>
|
|
<step>
|
|
<title>Start HBase.</title>
|
|
<para>Use the <filename>bin/start-hbase.sh</filename> command to start HBase. If your
|
|
system is configured correctly, the <command>jps</command> command should show the
|
|
HMaster and HRegionServer processes running.</para>
|
|
</step>
|
|
<step>
|
|
<title>Check the HBase directory in HDFS.</title>
|
|
<para>If everything worked correctly, HBase created its directory in HDFS. In the
|
|
configuration above, it is stored in <filename>/hbase/</filename> on HDFS. You can use
|
|
the <command>hadoop fs</command> command in Hadoop's <filename>bin/</filename> directory
|
|
to list this directory.</para>
|
|
<screen>
|
|
$ <userinput>./bin/hadoop fs -ls /hbase</userinput>
|
|
Found 7 items
|
|
drwxr-xr-x - hbase users 0 2014-06-25 18:58 /hbase/.tmp
|
|
drwxr-xr-x - hbase users 0 2014-06-25 21:49 /hbase/WALs
|
|
drwxr-xr-x - hbase users 0 2014-06-25 18:48 /hbase/corrupt
|
|
drwxr-xr-x - hbase users 0 2014-06-25 18:58 /hbase/data
|
|
-rw-r--r-- 3 hbase users 42 2014-06-25 18:41 /hbase/hbase.id
|
|
-rw-r--r-- 3 hbase users 7 2014-06-25 18:41 /hbase/hbase.version
|
|
drwxr-xr-x - hbase users 0 2014-06-25 21:49 /hbase/oldWALs
|
|
</screen>
|
|
</step>
|
|
<step>
|
|
<title>Create a table and populate it with data.</title>
|
|
<para>You can use the HBase Shell to create a table, populate it with data, scan and get
|
|
values from it, using the same procedure as in <xref linkend="shell_exercises" />.</para>
|
|
</step>
|
|
<step>
|
|
<title>Start and stop a backup HBase Master (HMaster) server.</title>
|
|
<note>
|
|
<para>Running multiple HMaster instances on the same hardware does not make sense in a
|
|
production environment, in the same way that running a pseudo-distributed cluster does
|
|
not make sense for production. This step is offered for testing and learning purposes
|
|
only.</para>
|
|
</note>
|
|
<para>The HMaster server controls the HBase cluster. You can start up to 9 backup HMaster
|
|
servers, which makes 10 total HMasters, counting the primary. To start a backup HMaster,
|
|
use the <command>local-master-backup.sh</command> command. For each backup master you want to
|
|
start, add a parameter representing the port offset for that master. Each HMaster uses
|
|
three ports (16010, 16020, and 16030 by default). The port offset is added to these ports, so
|
|
using an offset of 2, the backup HMaster would use ports 16012, 16022, and 16032. The
|
|
following command starts 3 backup servers using ports 16012/16022/16032, 16013/16023/16033,
|
|
and 16015/16025/16035.</para>
|
|
<screen>
|
|
$ ./bin/local-master-backup.sh start 2 3 5
|
|
</screen>
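<para>The port arithmetic can be sketched in a few lines of Bash. The function name
<code>backup_master_ports</code> is hypothetical; the base ports are the defaults quoted
above, so adjust them if you have changed the defaults.</para>
<programlisting language="bash"><![CDATA[
#!/usr/bin/env bash
# Print the three ports a backup HMaster would use for a given offset.
# Hypothetical helper; base ports are the defaults named in the text.
backup_master_ports() {
  local offset="$1"
  local base
  local ports=""
  for base in 16010 16020 16030; do
    ports="$ports $((base + offset))"
  done
  echo "${ports# }"   # strip the leading space
}

# Offsets 2, 3, and 5 match the local-master-backup.sh example.
for offset in 2 3 5; do
  echo "offset $offset: $(backup_master_ports "$offset")"
done
]]></programlisting>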
|
|
<para>To kill a backup master without killing the entire cluster, you need to find its
|
|
process ID (PID). The PID is stored in a file with a name like
|
|
<filename>/tmp/hbase-<replaceable>USER</replaceable>-<replaceable>X</replaceable>-master.pid</filename>.
|
|
The only contents of the file are the PID. You can use the <command>kill -9</command>
|
|
command to kill that PID. The following command will kill the master with port offset 1,
|
|
but leave the cluster running:</para>
|
|
<screen>
|
|
$ cat /tmp/hbase-testuser-1-master.pid |xargs kill -9
|
|
</screen>
|
|
</step>
|
|
<step>
|
|
<title>Start and stop additional RegionServers</title>
|
|
<para>The HRegionServer manages the data in its StoreFiles as directed by the HMaster.
|
|
Generally, one HRegionServer runs per node in the cluster. Running multiple
|
|
HRegionServers on the same system can be useful for testing in pseudo-distributed mode.
|
|
The <command>local-regionservers.sh</command> command allows you to run multiple
|
|
RegionServers. It works in a similar way to the
|
|
<command>local-master-backup.sh</command> command, in that each parameter you provide
|
|
represents the port offset for an instance. Each RegionServer requires two ports, and
|
|
the default ports are 16020 and 16030. However, the base ports for additional RegionServers
|
|
are not the default ports since the default ports are used by the HMaster, which is also
|
|
a RegionServer since HBase version 1.0.0. The base ports are 16200 and 16300 instead.
|
|
You can run 99 additional RegionServers, not counting the HMaster or any backup HMasters,
on a server. The following command starts four additional RegionServers, running on
|
|
sequential ports starting at 16202/16302 (base ports 16200/16300 plus 2).</para>
|
|
<screen>
|
|
$ ./bin/local-regionservers.sh start 2 3 4 5
|
|
</screen>
|
|
<para>To stop a RegionServer manually, use the <command>local-regionservers.sh</command>
|
|
command with the <literal>stop</literal> parameter and the offset of the server to
|
|
stop.</para>
|
|
<screen>$ ./bin/local-regionservers.sh stop 3</screen>
|
|
</step>
|
|
<step>
|
|
<title>Stop HBase.</title>
|
|
<para>You can stop HBase the same way as in the <xref
|
|
linkend="quickstart" /> procedure, using the
|
|
<filename>bin/stop-hbase.sh</filename> command.</para>
|
|
</step>
|
|
</procedure>
|
|
</section>
|
|
|
|
<section xml:id="quickstart-fully-distributed">
|
|
<title>Advanced - Fully Distributed</title>
|
|
<para>In reality, you need a fully-distributed configuration to fully test HBase and to use it
|
|
in real-world scenarios. In a distributed configuration, the cluster contains multiple
|
|
nodes, each of which runs one or more HBase daemons. These include primary and backup Master
instances, multiple ZooKeeper nodes, and multiple RegionServer nodes.</para>
|
|
<para>This advanced quickstart adds two more nodes to your cluster. The architecture will be
|
|
as follows:</para>
|
|
<table>
|
|
<title>Distributed Cluster Demo Architecture</title>
|
|
<tgroup cols="4">
|
|
<thead>
|
|
<row>
|
|
<entry>Node Name</entry>
|
|
<entry>Master</entry>
|
|
<entry>ZooKeeper</entry>
|
|
<entry>RegionServer</entry>
|
|
</row>
|
|
</thead>
|
|
<tbody>
|
|
<row>
|
|
<entry>node-a.example.com</entry>
|
|
<entry>yes</entry>
|
|
<entry>yes</entry>
|
|
<entry>no</entry>
|
|
</row>
|
|
<row>
|
|
<entry>node-b.example.com</entry>
|
|
<entry>backup</entry>
|
|
<entry>yes</entry>
|
|
<entry>yes</entry>
|
|
</row>
|
|
<row>
|
|
<entry>node-c.example.com</entry>
|
|
<entry>no</entry>
|
|
<entry>yes</entry>
|
|
<entry>yes</entry>
|
|
</row>
|
|
</tbody>
|
|
</tgroup>
|
|
</table>
|
|
<para>This quickstart assumes that each node is a virtual machine and that they are all on the
|
|
same network. It builds upon the previous quickstart, <xref linkend="quickstart-pseudo" />,
|
|
assuming that the system you configured in that procedure is now <code>node-a</code>. Stop HBase on <code>node-a</code>
|
|
before continuing.</para>
|
|
<note>
|
|
<para>Be sure that all the nodes have full access to communicate, and that no firewall rules
|
|
are in place which could prevent them from talking to each other. If you see any errors like
|
|
<literal>no route to host</literal>, check your firewall.</para>
|
|
</note>
|
|
<procedure xml:id="passwordless.ssh.quickstart">
|
|
<title>Configure Password-Less SSH Access</title>
|
|
<para><code>node-a</code> needs to be able to log into <code>node-b</code> and
|
|
<code>node-c</code> (and to itself) in order to start the daemons. The easiest way to accomplish this is
|
|
to use the same username on all hosts, and configure password-less SSH login from
|
|
<code>node-a</code> to each of the others. </para>
|
|
<step>
|
|
<title>On <code>node-a</code>, generate a key pair.</title>
|
|
<para>While logged in as the user who will run HBase, generate an SSH key pair, using the
|
|
following command:
|
|
</para>
|
|
<screen>$ ssh-keygen -t rsa</screen>
|
|
<para>If the command succeeds, the location of the key pair is printed to standard output.
|
|
The default name of the public key is <filename>id_rsa.pub</filename>.</para>
|
|
</step>
|
|
<step>
|
|
<title>Create the directory that will hold the shared keys on the other nodes.</title>
|
|
<para>On <code>node-b</code> and <code>node-c</code>, log in as the HBase user and create
|
|
a <filename>.ssh/</filename> directory in the user's home directory, if it does not
|
|
already exist. If it already exists, be aware that it may already contain other keys.</para>
|
|
</step>
|
|
<step>
|
|
<title>Copy the public key to the other nodes.</title>
|
|
<para>Securely copy the public key from <code>node-a</code> to each of the nodes, by
|
|
using the <command>scp</command> or some other secure means. On each of the other nodes,
|
|
create a new file called <filename>.ssh/authorized_keys</filename> <emphasis>if it does
|
|
not already exist</emphasis>, and append the contents of the
|
|
<filename>id_rsa.pub</filename> file to the end of it. Note that you also need to do
|
|
this for <code>node-a</code> itself.</para>
|
|
<screen>$ cat id_rsa.pub >> ~/.ssh/authorized_keys</screen>
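<para>The append step is easy to repeat by accident. The following sketch, which assumes a
Bash shell and the default OpenSSH file locations, adds the key only if it is not already
present and sets the restrictive permissions conventionally used for these files. The
function name <code>install_public_key</code> is illustrative.</para>
<programlisting language="bash"><![CDATA[
#!/usr/bin/env bash
# Append a public key to ~/.ssh/authorized_keys unless it is already there.
# Hypothetical helper; run it on each node as the user who will run HBase.
install_public_key() {
  local pubkey_file="$1"
  local ssh_dir="$HOME/.ssh"
  local auth_file="$ssh_dir/authorized_keys"

  mkdir -p "$ssh_dir"
  chmod 700 "$ssh_dir"
  touch "$auth_file"
  # -x matches whole lines, -F treats the key as a fixed string,
  # and -f reads the line to look for from the public key file.
  if ! grep -qxF -f "$pubkey_file" "$auth_file"; then
    cat "$pubkey_file" >> "$auth_file"
  fi
  chmod 600 "$auth_file"
}

# Example:
# install_public_key ./id_rsa.pub
]]></programlisting>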
|
|
</step>
|
|
<step>
|
|
<title>Test password-less login.</title>
|
|
<para>If you performed the procedure correctly, you should not be prompted for a password
when you SSH from <code>node-a</code> to either of the other nodes using the same username.
|
|
</para>
|
|
</step>
|
|
<step>
|
|
<para>Since <code>node-b</code> will run a backup Master, repeat the procedure above,
|
|
substituting <code>node-b</code> everywhere you see <code>node-a</code>. Be sure not to
|
|
overwrite your existing <filename>.ssh/authorized_keys</filename> files, but concatenate
|
|
the new key onto the existing file using the <code>>></code> operator rather than
|
|
the <code>></code> operator.</para>
|
|
</step>
|
|
</procedure>
|
|
|
|
<procedure>
|
|
<title>Prepare <code>node-a</code></title>
|
|
<para><code>node-a</code> will run your primary master and ZooKeeper processes, but no
|
|
RegionServers.</para>
|
|
<step>
|
|
<title>Stop the RegionServer from starting on <code>node-a</code>.</title>
|
|
<para>Edit <filename>conf/regionservers</filename> and remove the line which contains
|
|
<literal>localhost</literal>. Add lines with the hostnames or IP addresses for
|
|
<code>node-b</code> and <code>node-c</code>. Even if you did want to run a
|
|
RegionServer on <code>node-a</code>, you should refer to it by the hostname the other
|
|
servers would use to communicate with it. In this case, that would be
|
|
<literal>node-a.example.com</literal>. This enables you to distribute the
|
|
configuration to each node of your cluster without any hostname conflicts. Save the file.</para>
|
|
</step>
|
|
<step>
|
|
<title>Configure HBase to use <code>node-b</code> as a backup master.</title>
|
|
<para>Create a new file in <filename>conf/</filename> called
|
|
<filename>backup-masters</filename>, and add a new line to it with the hostname for
|
|
<code>node-b</code>. In this demonstration, the hostname is
|
|
<literal>node-b.example.com</literal>.</para>
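<para>The two host lists from this step and the previous one can also be written from the
shell. The following is a sketch only: the function name <code>write_host_lists</code> is
hypothetical, and the hostnames are the ones used in this demonstration.</para>
<programlisting language="bash"><![CDATA[
#!/usr/bin/env bash
# Write conf/regionservers and conf/backup-masters, one hostname per line.
# Hypothetical helper; pass the HBase conf directory as the first argument.
write_host_lists() {
  local conf_dir="$1"
  # RegionServers run on node-b and node-c only, so localhost is not listed.
  printf '%s\n' node-b.example.com node-c.example.com > "$conf_dir/regionservers"
  # node-b also runs the backup Master.
  printf '%s\n' node-b.example.com > "$conf_dir/backup-masters"
}

# Example, run from the HBase installation directory on node-a:
# write_host_lists ./conf
]]></programlisting>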
|
|
</step>
|
|
<step>
|
|
<title>Configure ZooKeeper</title>
|
|
<para>In reality, you should carefully consider your ZooKeeper configuration. You can find
|
|
out more about configuring ZooKeeper in <xref
|
|
linkend="zookeeper" />. This configuration will direct HBase to start and manage a
|
|
ZooKeeper instance on each node of the cluster.</para>
|
|
<para>On <code>node-a</code>, edit <filename>conf/hbase-site.xml</filename> and add the
|
|
following properties.</para>
|
|
<programlisting><![CDATA[
<property>
  <name>hbase.zookeeper.quorum</name>
  <value>node-a.example.com,node-b.example.com,node-c.example.com</value>
</property>
<property>
  <name>hbase.zookeeper.property.dataDir</name>
  <value>/usr/local/zookeeper</value>
</property>
]]></programlisting>
|
|
</step>
|
|
<step>
|
|
<para>Everywhere in your configuration that you have referred to <code>node-a</code> as
|
|
<literal>localhost</literal>, change the reference to point to the hostname that
|
|
the other nodes will use to refer to <code>node-a</code>. In these examples, the
|
|
hostname is <literal>node-a.example.com</literal>.</para>
|
|
</step>
|
|
</procedure>
|
|
<procedure>
|
|
<title>Prepare <code>node-b</code> and <code>node-c</code></title>
|
|
<para><code>node-b</code> will run a backup master server, a ZooKeeper instance, and a RegionServer. <code>node-c</code> will run a ZooKeeper instance and a RegionServer.</para>
|
|
<step>
|
|
<title>Download and unpack HBase.</title>
|
|
<para>Download and unpack HBase on <code>node-b</code> and <code>node-c</code>, just as you did for the standalone
|
|
and pseudo-distributed quickstarts.</para>
|
|
</step>
|
|
<step>
|
|
<title>Copy the configuration files from <code>node-a</code> to <code>node-b</code> and
<code>node-c</code>.</title>
|
|
<para>Each node of your cluster needs to have the same configuration information. Copy the
|
|
contents of the <filename>conf/</filename> directory to the <filename>conf/</filename>
|
|
directory on <code>node-b</code> and <code>node-c</code>.</para>
|
|
</step>
|
|
</procedure>
|
|
|
|
<procedure>
|
|
<title>Start and Test Your Cluster</title>
|
|
<step>
|
|
<title>Be sure HBase is not running on any node.</title>
|
|
<para>If you forgot to stop HBase from previous testing, you will have errors. Check to
|
|
see whether HBase is running on any of your nodes by using the <command>jps</command>
|
|
command. Look for the processes <literal>HMaster</literal>,
|
|
<literal>HRegionServer</literal>, and <literal>HQuorumPeer</literal>. If they exist,
|
|
kill them.</para>
|
|
</step>
|
|
<step>
|
|
<title>Start the cluster.</title>
|
|
<para>On <code>node-a</code>, issue the <command>start-hbase.sh</command> command. Your
|
|
output will be similar to that below.</para>
|
|
<screen>
|
|
$ <userinput>bin/start-hbase.sh</userinput>
|
|
node-c.example.com: starting zookeeper, logging to /home/hbuser/hbase-0.98.3-hadoop2/bin/../logs/hbase-hbuser-zookeeper-node-c.example.com.out
|
|
node-a.example.com: starting zookeeper, logging to /home/hbuser/hbase-0.98.3-hadoop2/bin/../logs/hbase-hbuser-zookeeper-node-a.example.com.out
|
|
node-b.example.com: starting zookeeper, logging to /home/hbuser/hbase-0.98.3-hadoop2/bin/../logs/hbase-hbuser-zookeeper-node-b.example.com.out
|
|
starting master, logging to /home/hbuser/hbase-0.98.3-hadoop2/bin/../logs/hbase-hbuser-master-node-a.example.com.out
|
|
node-c.example.com: starting regionserver, logging to /home/hbuser/hbase-0.98.3-hadoop2/bin/../logs/hbase-hbuser-regionserver-node-c.example.com.out
|
|
node-b.example.com: starting regionserver, logging to /home/hbuser/hbase-0.98.3-hadoop2/bin/../logs/hbase-hbuser-regionserver-node-b.example.com.out
|
|
node-b.example.com: starting master, logging to /home/hbuser/hbase-0.98.3-hadoop2/bin/../logs/hbase-hbuser-master-node-b.example.com.out
|
|
</screen>
|
|
<para>ZooKeeper starts first, followed by the master, then the RegionServers, and finally
|
|
the backup masters. </para>
|
|
</step>
|
|
<step>
|
|
<title>Verify that the processes are running.</title>
|
|
<para>On each node of the cluster, run the <command>jps</command> command and verify that
|
|
the correct processes are running on each server. You may see additional Java processes
|
|
running on your servers as well, if they are used for other purposes.</para>
|
|
<example>
|
|
<title><code>node-a</code> <command>jps</command> Output</title>
|
|
<screen>
|
|
$ <userinput>jps</userinput>
|
|
20355 Jps
|
|
20071 HQuorumPeer
|
|
20137 HMaster
|
|
</screen>
|
|
</example>
|
|
<example>
|
|
<title><code>node-b</code> <command>jps</command> Output</title>
|
|
<screen>
|
|
$ <userinput>jps</userinput>
|
|
15930 HRegionServer
|
|
16194 Jps
|
|
15838 HQuorumPeer
|
|
16010 HMaster
|
|
</screen>
|
|
</example>
|
|
<example>
|
|
<title><code>node-c</code> <command>jps</command> Output</title>
|
|
<screen>
|
|
$ <userinput>jps</userinput>
|
|
13901 Jps
|
|
13639 HQuorumPeer
|
|
13737 HRegionServer
|
|
</screen>
|
|
</example>
|
|
<note>
|
|
<title>ZooKeeper Process Name</title>
|
|
<para>The <code>HQuorumPeer</code> process is a ZooKeeper instance which is controlled
|
|
and started by HBase. If you use ZooKeeper this way, it is limited to one instance per
|
|
cluster node, and is appropriate for testing only. If ZooKeeper is run outside of
|
|
HBase, the process is called <code>QuorumPeer</code>. For more about ZooKeeper
|
|
configuration, including using an external ZooKeeper instance with HBase, see <xref
|
|
linkend="zookeeper" />.</para>
|
|
</note>
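<para>The per-node check can be scripted. The following sketch reads
<command>jps</command> output on standard input and reports any expected process that is
missing. The function name <code>missing_daemons</code> is hypothetical; the process names
are the ones shown in the example output above.</para>
<programlisting language="bash"><![CDATA[
#!/usr/bin/env bash
# Read jps output on stdin and print each expected process name that is absent.
# Hypothetical helper; returns 0 only when every expected process is present.
missing_daemons() {
  local listing
  listing="$(cat)"            # capture stdin once so it can be searched repeatedly
  local name
  local rc=0
  for name in "$@"; do
    # jps prints "PID ClassName"; compare against the second field exactly.
    if ! printf '%s\n' "$listing" |
        awk -v n="$name" '$2 == n { found = 1 } END { exit !found }'; then
      echo "missing: $name"
      rc=1
    fi
  done
  return "$rc"
}

# Example for node-b, which runs all three daemons:
# jps | missing_daemons HQuorumPeer HMaster HRegionServer
]]></programlisting>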
|
|
</step>
|
|
<step>
|
|
<title>Browse to the Web UI.</title>
|
|
<note>
|
|
<title>Web UI Port Changes</title>
|
|
<para>In HBase newer than 0.98.x, the HTTP ports used by the HBase Web UI changed from
|
|
60010 for the Master and 60030 for each RegionServer to 16010 for the Master and 16030
|
|
for the RegionServer.</para>
|
|
</note>
|
|
<para>If everything is set up correctly, you should be able to connect to the UI for the
Master at <literal>http://node-a.example.com:16010/</literal>, or to the UI for the backup Master at
<literal>http://node-b.example.com:16010/</literal>, using a
web browser. If you can connect via <code>localhost</code> but not from another host,
check your firewall rules. You can see the web UI for each of the RegionServers at port
16030 of their IP addresses, or by clicking their links in the web UI for the
Master. On HBase 0.98.x and earlier, use ports 60010 and 60030 instead.</para>
|
|
</step>
|
|
<step>
|
|
<title>Test what happens when nodes or services disappear.</title>
|
|
<para>With a three-node cluster like you have configured, things will not be very
|
|
resilient. Still, you can test what happens when the primary Master or a RegionServer
|
|
disappears, by killing the processes and watching the logs.</para>
|
|
</step>
|
|
</procedure>
|
|
</section>
|
|
|
|
<section>
|
|
<title>Where to go next</title>
|
|
|
|
<para>The next chapter, <xref
|
|
linkend="configuration" />, gives more information about the different HBase run modes,
|
|
system requirements for running HBase, and critical configuration areas for setting up a
|
|
distributed HBase cluster.</para>
|
|
</section>
|
|
</section>
|
|
</chapter>
|