Apache HBase Configuration
- This chapter is the Not-So-Quick start guide to Apache HBase configuration. It goes over
- system requirements, Hadoop setup, the different Apache HBase run modes, and the various
- configurations in HBase. Please read this chapter carefully. At a minimum ensure that all have been satisfied. Failure to do so will cause you (and us)
- grief debugging strange errors and/or data loss.
+ This chapter expands upon the chapter to further explain
+ configuration of Apache HBase. Please read this chapter carefully, especially to ensure that your HBase testing and deployment goes
+ smoothly, and prevent data loss.
- Apache HBase uses the same configuration system as Apache Hadoop. To configure a deploy,
- edit a file of environment variables in conf/hbase-env.sh -- this
- configuration is used mostly by the launcher shell scripts getting the cluster off the ground --
- and then add configuration to an XML file to do things like override HBase defaults, tell HBase
- what Filesystem to use, and the location of the ZooKeeper ensemble.
- Be careful editing XML. Make sure you close all elements. Run your file through
- xmllint or similar to ensure well-formedness of your document after an
- edit session.
-
+ Apache HBase uses the same configuration system as Apache Hadoop. All configuration files
+ are located in the conf/ directory, which needs to be kept in sync for each
+ node on your cluster.
+
+
+ HBase Configuration Files
+
+ backup-masters
+
+ Not present by default. A plain-text file which lists hosts on which the Master should
+ start a backup Master process, one host per line.
+
+
+
+ hadoop-metrics2-hbase.properties
+
+ Used to connect HBase to Hadoop's Metrics2 framework. See the Hadoop Wiki
+ entry for more information on Metrics2. Contains only commented-out examples by
+ default.
+
+
+
+ hbase-env.cmd and hbase-env.sh
+
+ Script for Windows and Linux / Unix environments to set up the working environment for
+ HBase, including the location of Java, Java options, and other environment variables. The
+ file contains many commented-out examples to provide guidance.
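For illustration, a couple of settings of the kind this file carries might look like the following. These are example values, not defaults; the Java path in particular is an assumption and varies by system.

```shell
# Illustrative hbase-env.sh entries; the JAVA_HOME path and heap size
# below are example values, not recommendations or defaults.
export JAVA_HOME=/usr/lib/jvm/java-7-oracle
export HBASE_HEAPSIZE=4096
```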
+
+
+
+ hbase-policy.xml
+
+ The default policy configuration file used by RPC servers to make authorization
+ decisions on client requests. Only used if HBase security () is enabled.
+
+
+
+ hbase-site.xml
+
+ The main HBase configuration file. This file specifies configuration options which
+ override HBase's default configuration. You can view (but do not edit) the default
+ configuration file at docs/hbase-default.xml. You can also view the
+ entire effective configuration for your cluster (defaults and overrides) in the
+ HBase Configuration tab of the HBase Web UI.
+
+
+
+ log4j.properties
+
+ Configuration file for HBase logging via log4j.
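As a sketch, raising the log level for HBase's own classes is a one-line change in this file, using standard log4j 1.x syntax (the logger name below assumes the usual org.apache.hadoop.hbase package):

```properties
# Illustrative override: turn on DEBUG logging for HBase classes
log4j.logger.org.apache.hadoop.hbase=DEBUG
```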
+
+
+
+ regionservers
+
+ A plain-text file containing a list of hosts which should run a RegionServer in your
+ HBase cluster. By default this file contains the single entry
+ localhost. It should contain a list of hostnames or IP addresses, one
+ per line, and should only contain localhost if each node in your
+ cluster will run a RegionServer on its localhost interface.
+
+
+
+
+
+ Checking XML Validity
+ When you edit XML, it is a good idea to use an XML-aware editor to be sure that your
+ syntax is correct and your XML is well-formed. You can also use the xmllint
+ utility to check that your XML is well-formed. By default, xmllint re-flows
+ and prints the XML to standard output. To check for well-formedness and only print output if
+ errors exist, use the command xmllint -noout
+ filename.xml.
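For example, the following sketch writes a deliberately malformed file and checks it. It assumes xmllint (from libxml2) is installed; on a bad file the command prints an error to stderr and exits non-zero.

```shell
# Create a malformed XML file, then verify it; -noout suppresses
# re-printing the document, so a well-formed file produces no output.
printf '<configuration><property></configuration>' > /tmp/bad-example.xml
xmllint -noout /tmp/bad-example.xml || echo "not well-formed"
```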
+
- When running in distributed mode, after you make an edit to an HBase configuration, make
- sure you copy the content of the conf directory to all nodes of the
- cluster. HBase will not do this for you. Use rsync. For most configuration, a
- restart is needed for servers to pick up changes (caveat dynamic config. to be described later
- below).
+
+ Keep Configuration In Sync Across the Cluster
+ When running in distributed mode, after you make an edit to an HBase configuration, make
+ sure you copy the content of the conf/ directory to all nodes of the
+ cluster. HBase will not do this for you. Use rsync, scp,
+ or another secure mechanism for copying the configuration files to your nodes. For most
+ configuration, a restart is needed for servers to pick up changes. An exception is dynamic
+ configuration, described later below.
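As a hedged sketch of that workflow, a loop like the following could push conf/ to each host named in a regionservers-style file. The hostnames, paths, and temporary file below are purely illustrative, and the rsync commands are only echoed here rather than run.

```shell
# Dry run: print the rsync command that would sync conf/ to each listed
# host. Hostnames and the /opt/hbase path are illustrative examples.
printf 'node-a.example.com\nnode-b.example.com\n' > /tmp/regionservers.example
while read -r host; do
  echo rsync -az conf/ "$host:/opt/hbase/conf/"
done < /tmp/regionservers.example
```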
+ Basic Prerequisites
+ This section lists required services and some required system configuration.
- Java
- HBase requires at least Java 6 from Oracle. The following table lists which JDK version are
- compatible with each version of HBase.
-
-
-
-
- HBase Version
- JDK 6
- JDK 7
- JDK 8
-
-
-
-
- 1.0
- Not Supported
- yes
- Running with JDK 8 will work but is not well tested.
-
-
- 0.98
- yes
- yes
- Running with JDK 8 works but is not well tested. Building with JDK 8
- would require removal of the deprecated remove() method of the PoolMap class and is
- under consideration. See ee HBASE-7608 for
- more information about JDK 8 support.
-
-
- 0.96
- yes
- yes
-
-
-
- 0.94
- yes
- yes
-
-
-
-
-
-
+
+ HBase requires at least Java 6 from Oracle. The following table lists
+ which JDK versions are compatible with each version of HBase.
+
+
+
+
+ HBase Version
+ JDK 6
+ JDK 7
+ JDK 8
+
+
+
+
+ 1.0
+ Not Supported
+ yes
+ Running with JDK 8 will work but is not well tested.
+
+
+ 0.98
+ yes
+ yes
+ Running with JDK 8 works but is not well tested. Building with JDK 8 would
+ require removal of the deprecated remove() method of the PoolMap class and is under
+ consideration. See HBASE-7608
+ for more information about JDK 8 support.
+
+
+ 0.96
+ yes
+ yes
+
+
+
+ 0.94
+ yes
+ yes
+
+
+
+
+
-
- Operating System
- Operating System Utilities
+
- ssh
-
- ssh must be installed and sshd must be running
- to use Hadoop's scripts to manage remote Hadoop and HBase daemons. You must be able to ssh
- to all nodes, including your local node, using passwordless login (Google "ssh
- passwordless login"). If on mac osx, see the section, SSH:
- Setting up Remote Desktop and Enabling Self-Login on the hadoop wiki.
-
-
- ssh
+
+ HBase uses the Secure Shell (ssh) command and utilities extensively to communicate
+ between cluster nodes. Each server in the cluster must be running ssh
+ so that the Hadoop and HBase daemons can be managed. You must be able to connect to all
+ nodes via SSH, including the local node, from the Master as well as any backup Master,
+ using a shared key rather than a password. You can see the basic methodology for such a
+ set-up in Linux or Unix systems at . If your cluster nodes use OS X, see the
+ section, SSH:
+ Setting up Remote Desktop and Enabling Self-Login on the Hadoop wiki.
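A minimal sketch of the key-based part of such a setup follows. The file locations are illustrative; in practice the public key is appended to ~/.ssh/authorized_keys on every node, for example via ssh-copy-id, which is not shown here.

```shell
# Generate a passphrase-less key pair at an example location, assuming
# OpenSSH's ssh-keygen is installed; distributing the public key to the
# cluster nodes is a separate step not shown here.
rm -f /tmp/hbase-demo-key /tmp/hbase-demo-key.pub
ssh-keygen -q -t rsa -N "" -f /tmp/hbase-demo-key
ls /tmp/hbase-demo-key.pub
```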
+
+
+
- DNS
+ DNS
+
+ HBase uses the local hostname to self-report its IP address. Both forward and
+ reverse DNS resolving must work in versions of HBase previous to 0.92.0.
+ The hadoop-dns-checker
+ tool can be used to verify DNS is working correctly on the cluster. The project
+ README file provides detailed instructions on usage.
+
- HBase uses the local hostname to self-report its IP address. Both forward and reverse
- DNS resolving must work in versions of HBase previous to 0.92.0
- The hadoop-dns-checker
- tool can be used to verify DNS is working correctly on the cluster. The project README
- file provides detailed instructions on usage.
- .
+ If your server has multiple network interfaces, HBase defaults to using the
+ interface that the primary hostname resolves to. To override this behavior, set the
+ hbase.regionserver.dns.interface property to a different interface. This
+ will only work if each server in your cluster uses the same network interface
+ configuration.
- If your machine has multiple interfaces, HBase will use the interface that the primary
- hostname resolves to.
-
- If this is insufficient, you can set
- hbase.regionserver.dns.interface to indicate the primary interface.
- This only works if your cluster configuration is consistent and every host has the same
- network interface configuration.
-
- Another alternative is setting hbase.regionserver.dns.nameserver to
- choose a different nameserver than the system wide default.
-
+ To choose a different DNS nameserver than the system default, set the
+ hbase.regionserver.dns.nameserver property to the IP address of
+ that nameserver.
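For illustration, either override is a standard property block in hbase-site.xml; the interface name and nameserver address below are example values, not defaults:

```xml
<property>
  <name>hbase.regionserver.dns.interface</name>
  <value>eth1</value>
</property>
<property>
  <name>hbase.regionserver.dns.nameserver</name>
  <value>10.0.0.2</value>
</property>
```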
+
+
+
- Loopback IP
- Previous to hbase-0.96.0, HBase expects the loopback IP address to be 127.0.0.1. See
-
-
- Loopback IP
+
+ Prior to hbase-0.96.0, HBase only used the IP address
+ 127.0.0.1 to refer to localhost, and this could
+ not be configured. See .
+
+
+
- NTP
+ NTP
+
+ The clocks on cluster nodes should be synchronized. A small amount of variation is
+ acceptable, but larger amounts of skew can cause erratic and unexpected behavior. Time
+ synchronization is one of the first things to check if you see unexplained problems in
+ your cluster. It is recommended that you run a Network Time Protocol (NTP) service, or
+ another time-synchronization mechanism, on your cluster, and that all nodes look to the
+ same service for time synchronization. See the Basic NTP
+ Configuration at The Linux Documentation Project (TLDP)
+ to set up NTP.
+
+
- The clocks on cluster members should be in basic alignments. Some skew is tolerable
- but wild skew could generate odd behaviors. Run NTP on your
- cluster, or an equivalent.
-
- If you are having problems querying data, or "weird" cluster operations, check system
- time!
-
-
-
-
- ulimit
+ Limits on Number of Files and Processes (ulimit)
+ ulimit
- and nproc
+ nproc
-
+
- Apache HBase is a database. It uses a lot of files all at the same time. The default
- ulimit -n -- i.e. user file limit -- of 1024 on most *nix systems is insufficient (On mac
- os x its 256). Any significant amount of loading will lead you to . You may also notice errors such as the
- following:
-
+
+ Apache HBase is a database. It requires the ability to open a large number of files
+ at once. Many Linux distributions limit the number of files a single user is allowed to
+ open to 1024 (or 256 on older versions of OS X).
+ You can check this limit on your servers by running the command ulimit
+ -n when logged in as the user which runs HBase. See for some of the problems you may
+ experience if the limit is too low. You may also notice errors such as the
+ following:
+
2010-04-06 03:04:37,542 INFO org.apache.hadoop.hdfs.DFSClient: Exception in createBlockOutputStream java.io.EOFException
2010-04-06 03:04:37,542 INFO org.apache.hadoop.hdfs.DFSClient: Abandoning block blk_-6935524980745310745_1391901
-
- Do yourself a favor and change the upper bound on the number of file descriptors. Set
- it to north of 10k. The math runs roughly as follows: per ColumnFamily there is at least
- one StoreFile and possibly up to 5 or 6 if the region is under load. Multiply the average
- number of StoreFiles per ColumnFamily times the number of regions per RegionServer. For
- example, assuming that a schema had 3 ColumnFamilies per region with an average of 3
- StoreFiles per ColumnFamily, and there are 100 regions per RegionServer, the JVM will open
- 3 * 3 * 100 = 900 file descriptors (not counting open jar files, config files, etc.)
- You should also up the hbase users' nproc setting; under load, a
- low-nproc setting could manifest as OutOfMemoryError.
- See Jack Levin's major hdfs issues note up on the user list.
-
-
- The requirement that a database requires upping of system limits is not peculiar
- to Apache HBase. See for example the section Setting Shell Limits for the
- Oracle User in Short Guide
- to install Oracle 10 on Linux.
-
+
+ It is recommended to raise the ulimit to at least 10,000, but more likely 10,240,
+ because the value is usually expressed in multiples of 1024. Each ColumnFamily has at
+ least one StoreFile, and possibly more than 6 StoreFiles if the region is under load.
+ The number of open files required depends upon the number of ColumnFamilies and the
+ number of regions. The following is a rough formula for calculating the potential number
+ of open files on a RegionServer.
+
+ Calculate the Potential Number of Open Files
+ (StoreFiles per ColumnFamily) x (regions per RegionServer)
+
+ For example, assuming that a schema had 3 ColumnFamilies per region with an average
+ of 3 StoreFiles per ColumnFamily, and there are 100 regions per RegionServer, the JVM
+ will open 3 * 3 * 100 = 900 file descriptors, not counting open JAR files, configuration
+ files, and others. Opening a file does not take many resources, and the risk of allowing
+ a user to open too many files is minimal.
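The arithmetic in the example above can be sketched in shell; the counts are the hypothetical values from the text, not measurements.

```shell
# Hypothetical values from the example: 3 ColumnFamilies per region,
# 3 StoreFiles per ColumnFamily, 100 regions per RegionServer.
STOREFILES_PER_CF=3
CFS_PER_REGION=3
REGIONS_PER_RS=100
echo $(( STOREFILES_PER_CF * CFS_PER_REGION * REGIONS_PER_RS ))   # 900
```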
+ Another related setting is the number of processes a user is allowed to run at once.
+ In Linux and Unix, the number of processes is set using the ulimit -u
+ command. This should not be confused with the nproc command,
+ which reports the number of processing units available. Under load, a
+ nproc that is too low can cause OutOfMemoryError exceptions. See
+ Jack Levin's major
+ hdfs issues thread on the hbase-users mailing list, from 2011.
+ Configuring the maximum number of file descriptors and processes for the user who is
+ running the HBase process is an operating system configuration, rather than an HBase
+ configuration. It is also important to be sure that the settings are changed for the
+ user that actually runs HBase. To see which user started HBase, and that user's ulimit
+ configuration, look at the first line of the HBase log for that instance.
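For example, the current limits for the logged-in user can be inspected as follows; the values reported are system-dependent.

```shell
# Show the current per-user limits; run these as the user that starts HBase.
ulimit -n   # maximum open file descriptors
ulimit -u   # maximum user processes
```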
+ A useful read on setting configuration for your Hadoop cluster is Aaron Kimball's Configuration
+ Parameters: What can you just ignore?
+
+
+ ulimit Settings on Ubuntu
+ To configure ulimit settings on Ubuntu, edit
+ /etc/security/limits.conf, which is a space-delimited file with
+ four columns. Refer to the man
+ page for limits.conf for details about the format of this file. In the
+ following example, the first line sets both soft and hard limits for the number of
+ open files (nofile) to 32768 for the operating
+ system user with the username hadoop. The second line sets the
+ number of processes to 32000 for the same user.
+
+
+hadoop - nofile 32768
+hadoop - nproc 32000
+
+ The settings are only applied if the Pluggable Authentication Module (PAM)
+ environment is directed to use them. To configure PAM to use these limits, be sure that
+ the /etc/pam.d/common-session file contains the following line:
+ session required pam_limits.so
+
+
- To be clear, upping the file descriptors and nproc for the user who is running the
- HBase process is an operating system configuration, not an HBase configuration. Also, a
- common mistake is that administrators will up the file descriptors for a particular user
- but for whatever reason, HBase will be running as some one else. HBase prints in its logs
- as the first line the ulimit its seeing. Ensure its correct.
- A useful read setting config on you hadoop cluster is Aaron Kimballs' Configuration
- Parameters: What can you just ignore?
-
-
-
- ulimit on Ubuntu
-
- If you are on Ubuntu you will need to make the following changes:
-
- In the file /etc/security/limits.conf add a line like:
- hadoop - nofile 32768
- Replace hadoop with whatever user is running Hadoop and HBase. If
- you have separate users, you will need 2 entries, one for each user. In the same file
- set nproc hard and soft limits. For example:
- hadoop soft/hard nproc 32000
- In the file /etc/pam.d/common-session add as the last line in
- the file: session required pam_limits.so Otherwise the
- changes in /etc/security/limits.conf won't be applied.
-
- Don't forget to log out and back in again for the changes to take effect!
-
-
-
-
- Windows
+ Windows
- Previous to hbase-0.96.0, Apache HBase was little tested running on Windows. Running a
- production install of HBase on top of Windows is not recommended.
+
+ Prior to HBase 0.96, testing for running HBase on Microsoft Windows was limited.
+ Running HBase on Windows nodes is not recommended for production systems.
- If you are running HBase on Windows pre-hbase-0.96.0, you must install Cygwin to have a *nix-like environment for the
- shell scripts. The full details are explained in the
+ To run versions of HBase prior to 0.96 on Microsoft Windows, you must install Cygwin and run HBase within the Cygwin
+ environment. This provides support for Linux/Unix commands and scripts. The full details are explained in the Windows Installation guide. Also search
our user mailing list to pick up the latest fixes figured out by Windows users. Post-hbase-0.96.0, HBase runs natively on Windows with supporting
- *.cmd scripts bundled.
-
+ *.cmd scripts bundled.
+
-
+
HadoopHadoop
- The below table shows some information about what versions of Hadoop are supported by
- various HBase versions. Based on the version of HBase, you should select the most
- appropriate version of Hadoop. We are not in the Hadoop distro selection business. You can
- use Hadoop distributions from Apache, or learn about vendor distributions of Hadoop at
+ The following table summarizes the versions of Hadoop supported with each version of
+ HBase. Based on the version of HBase, you should select the most
+ appropriate version of Hadoop. You can use Apache Hadoop, or a vendor's distribution of
+ Hadoop. No distinction is made here. See
+ for information about vendors of Hadoop.
- Hadoop 2.x is better than Hadoop 1.x
- Hadoop 2.x is faster, with more features such as short-circuit reads which will help
- improve your HBase random read profile as well important bug fixes that will improve your
- overall HBase experience. You should run Hadoop 2 rather than Hadoop 1. HBase 0.98
- deprecates use of Hadoop1. HBase 1.0 will not support Hadoop1.
+ Hadoop 2.x is recommended.
+ Hadoop 2.x is faster and includes features, such as short-circuit reads, which will
+ help improve your HBase random read profile. Hadoop 2.x also includes important bug fixes
+ that will improve your overall HBase experience. HBase 0.98 deprecates use of Hadoop 1.x,
+ and HBase 1.0 will not support Hadoop 1.x.
+ Use the following legend to interpret this table:
+ Hadoop Distributed File System (HDFS).
Fully-distributed mode can ONLY run on HDFS. See the Hadoop
- requirements and instructions for how to set up HDFS.
+ requirements and instructions for how to set up HDFS for Hadoop 1.x. A good
+ walk-through for setting up HDFS on Hadoop 2 is at http://www.alexjf.net/blog/distributed-systems/hadoop-yarn-installation-definitive-guide.
Below we describe the different distributed setups. Starting, verification and
exploration of your install, whether a pseudo-distributed or
@@ -628,207 +722,139 @@ Index: pom.xml
Pseudo-distributed
+
+ Pseudo-Distributed Quickstart
+ A quickstart has been added to the chapter. See . Some of the information that was originally in this
+ section has been moved there.
+ A pseudo-distributed mode is simply a fully-distributed mode run on a single host. Use
this configuration for testing and prototyping on HBase. Do not use this configuration for
production nor for evaluating HBase performance.
- First, if you want to run on HDFS rather than on the local filesystem, setup your
- HDFS. You can set up HDFS also in pseudo-distributed mode (TODO: Add pointer to HOWTO doc;
- the hadoop site doesn't have any any more). Ensure you have a working HDFS before
- proceeding.
-
- Next, configure HBase. Edit conf/hbase-site.xml. This is the file
- into which you add local customizations and overrides. At a minimum, you must tell HBase
- to run in (pseudo-)distributed mode rather than in default standalone mode. To do this,
- set the hbase.cluster.distributed property to true (Its default is
- false). The absolute bare-minimum hbase-site.xml
- is therefore as follows:
-
-
- hbase.cluster.distributed
- true
-
-
-]]>
-
- With this configuration, HBase will start up an HBase Master process, a ZooKeeper
- server, and a RegionServer process running against the local filesystem writing to
- wherever your operating system stores temporary files into a directory named
- hbase-YOUR_USER_NAME.
-
- Such a setup, using the local filesystem and writing to the operating systems's
- temporary directory is an ephemeral setup; the Hadoop local filesystem -- which is what
- HBase uses when it is writing the local filesytem -- would lose data unless the system
- was shutdown properly in versions of HBase before 0.98.4 and 1.0.0 (see
- HBASE-11218 Data
- loss in HBase standalone mode). Writing to the operating
- system's temporary directory can also make for data loss when the machine is restarted as
- this directory is usually cleared on reboot. For a more permanent setup, see the next
- example where we make use of an instance of HDFS; HBase data will be written to the Hadoop
- distributed filesystem rather than to the local filesystem's tmp directory.
- In this conf/hbase-site.xml example, the
- hbase.rootdir property points to the local HDFS instance homed on the
- node h-24-30.example.com.
-
- Let HBase create ${hbase.rootdir}
- Let HBase create the hbase.rootdir directory. If you don't,
- you'll get warning saying HBase needs a migration run because the directory is missing
- files expected by HBase (it'll create them if you let it).
-
-
-<configuration>
- <property>
- <name>hbase.rootdir</name>
- <value>hdfs://h-24-30.sfo.stumble.net:8020/hbase</value>
- </property>
- <property>
- <name>hbase.cluster.distributed</name>
- <value>true</value>
- </property>
-</configuration>
-
-
- Now skip to for how to start and verify your pseudo-distributed install.
- See for notes on how to start extra Masters and RegionServers
- when running pseudo-distributed.
-
-
-
- Pseudo-distributed Extras
-
-
- Startup
- To start up the initial HBase cluster...
- % bin/start-hbase.sh
- To start up an extra backup master(s) on the same server run...
- % bin/local-master-backup.sh start 1
- ... the '1' means use ports 16001 & 16011, and this backup master's logfile
- will be at logs/hbase-${USER}-1-master-${HOSTNAME}.log.
- To startup multiple backup masters run...
- % bin/local-master-backup.sh start 2 3
- You can start up to 9 backup masters (10 total).
- To start up more regionservers...
- % bin/local-regionservers.sh start 1
- ... where '1' means use ports 16201 & 16301 and its logfile will be at
- `logs/hbase-${USER}-1-regionserver-${HOSTNAME}.log.
- To add 4 more regionservers in addition to the one you just started by
- running...
- % bin/local-regionservers.sh start 2 3 4 5
- This supports up to 99 extra regionservers (100 total).
-
-
- Stop
- Assuming you want to stop master backup # 1, run...
- % cat /tmp/hbase-${USER}-1-master.pid |xargs kill -9
- Note that bin/local-master-backup.sh stop 1 will try to stop the cluster along
- with the master.
- To stop an individual regionserver, run...
- % bin/local-regionservers.sh stop 1
-
-
-
-
+
+
+ Fully-distributed
+ By default, HBase runs in standalone mode. Both standalone mode and pseudo-distributed
+ mode are provided for the purposes of small-scale testing. For a production environment,
+ distributed mode is appropriate. In distributed mode, multiple instances of HBase daemons
+ run on multiple servers in the cluster.
+ Just as in pseudo-distributed mode, a fully distributed configuration requires that you
+ set the hbase.cluster.distributed property to true.
+ Typically, the hbase.rootdir is configured to point to a highly-available HDFS
+ filesystem.
+ In addition, the cluster is configured so that multiple cluster nodes enlist as
+ RegionServers, ZooKeeper QuorumPeers, and backup HMaster servers. These configuration basics
+ are all demonstrated in .
+
+ Distributed RegionServers
+ Typically, your cluster will contain multiple RegionServers all running on different
+ servers, as well as primary and backup Master and ZooKeeper daemons. The
+ conf/regionservers file on the master server contains a list of
+ hosts whose RegionServers are associated with this cluster. Each host is on a separate
+ line. All hosts listed in this file will have their RegionServer processes started and
+ stopped when the master server starts or stops.
+
+
+ ZooKeeper and HBase
+ See section for ZooKeeper setup for HBase.
+
-
- Fully-distributed
-
- For running a fully-distributed operation on more than one host, make the following
- configurations. In hbase-site.xml, add the property
- hbase.cluster.distributed and set it to true and
- point the HBase hbase.rootdir at the appropriate HDFS NameNode and
- location in HDFS where you would like HBase to write data. For example, if you namenode
- were running at namenode.example.org on port 8020 and you wanted to home your HBase in
- HDFS at /hbase, make the following configuration.
-
+
+ Example Distributed HBase Cluster
+ This is a bare-bones conf/hbase-site.xml for a distributed HBase
+ cluster. A cluster that is used for real-world work would contain more custom
+ configuration parameters. Most HBase configuration directives have default values, which
+ are used unless the value is overridden in the hbase-site.xml. See for more information.
- ...
<property><name>hbase.rootdir</name><value>hdfs://namenode.example.org:8020/hbase</value></property>
- The directory shared by RegionServers.
- hbase.cluster.distributedtrue
- The mode the cluster will be in. Possible values are
- false: standalone and pseudo-distributed setups with managed Zookeeper
- true: fully-distributed with unmanaged Zookeeper Quorum (see hbase-env.sh)
-
- ...
+ <property>
+ <name>hbase.zookeeper.quorum</name>
+ <value>node-a.example.com,node-b.example.com,node-c.example.com</value>
+ </property>
]]>
+ This is an example conf/regionservers file, which contains a list
+ of each node that should run a RegionServer in the cluster. These nodes need HBase
+ installed and they need to use the same contents of the conf/
+ directory as the Master server.
+
+node-a.example.com
+node-b.example.com
+node-c.example.com
+
+ This is an example conf/backup-masters file, which contains a
+ list of each node that should run a backup Master instance. The backup Master instances
+ will sit idle unless the main Master becomes unavailable.
+
+node-b.example.com
+node-c.example.com
+
+
+
+ Distributed HBase Quickstart
+ See for a walk-through of a simple three-node
+ cluster configuration with multiple ZooKeeper, backup HMaster, and RegionServer
+ instances.
+
-
- regionservers
-
- In addition, a fully-distributed mode requires that you modify
- conf/regionservers. The file lists all hosts that you would have running
- HRegionServers, one host per line (This file in HBase is
- like the Hadoop slaves file). All servers listed in this file will
- be started and stopped when HBase cluster start or stop is run.
-
-
-
- ZooKeeper and HBase
- See section for ZooKeeper setup for HBase.
-
-
-
- HDFS Client Configuration
-
- Of note, if you have made HDFS client configuration on your
- Hadoop cluster -- i.e. configuration you want HDFS clients to use as opposed to
- server-side configurations -- HBase will not see this configuration unless you do one of
- the following:
-
-
-
+
+ HDFS Client Configuration
+
+ Of note, if you have made HDFS client configuration on your Hadoop cluster, such as
+ configuration directives for HDFS clients, as opposed to server-side configurations, you
+ must use one of the following methods to enable HBase to see and use these configuration
+ changes:
+
+ Add a pointer to your HADOOP_CONF_DIR to the
HBASE_CLASSPATH environment variable in
hbase-env.sh.
-
+
-
+ Add a copy of hdfs-site.xml (or
hadoop-site.xml) or, better, symlinks, under
${HBASE_HOME}/conf, or
-
+
-
+ if only a small set of HDFS client configurations is needed, add them to
hbase-site.xml.
-
-
-
- An example of such an HDFS client configuration is
- dfs.replication. If for example, you want to run with a replication
- factor of 5, hbase will create files with the default of 3 unless you do the above to
- make the configuration available to HBase.
-
-
+
+
+
+
+ An example of such an HDFS client configuration is dfs.replication.
+ If, for example, you want to run with a replication factor of 5, HBase will create files with
+ the default replication factor of 3 unless you take one of the steps above to make the configuration available to
+ HBase.
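As a sketch, the override from this example would be a standard property block in hbase-site.xml; the value 5 comes from the example above:

```xml
<property>
  <name>dfs.replication</name>
  <value>5</value>
</property>
```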
+
@@ -871,7 +897,7 @@ stopping hbase...............
of many machines. If you are running a distributed operation, be sure to wait until HBase
has shut down completely before stopping the Hadoop daemons.
-
+
diff --git a/src/main/docbkx/getting_started.xml b/src/main/docbkx/getting_started.xml
index 117e1ec3546..da5946c72fb 100644
--- a/src/main/docbkx/getting_started.xml
+++ b/src/main/docbkx/getting_started.xml
@@ -40,46 +40,51 @@
- Quick Start
+ Quick Start - Standalone HBase
- This guide describes setup of a standalone HBase instance. It will run against the local
- filesystem. In later sections we will take you through how to run HBase on Apache Hadoop's
- HDFS, a distributed filesystem. This section shows you how to create a table in HBase,
- inserting rows into your new HBase table via the HBase shell, and then
- cleaning up and shutting down your standalone, local filesystem-based HBase instance. The
- below exercise should take no more than ten minutes (not including download time).
- This guide describes setup of a standalone HBase instance running against the local
+ filesystem. This is not an appropriate configuration for a production instance of HBase, but
+ will allow you to experiment with HBase. This section shows you how to create a table in
+ HBase using the hbase shell CLI, insert rows into the table, perform put
+ and scan operations against the table, enable or disable the table, and start and stop HBase.
+ Apart from downloading HBase, this procedure should take less than 10 minutes.
+ Local Filesystem and Durability
- Using HBase with a LocalFileSystem does not currently guarantee durability. The HDFS
- local filesystem implementation will lose edits if files are not properly closed -- which is
- very likely to happen when experimenting with a new download. You need to run HBase on HDFS
- to ensure all writes are preserved. Running against the local filesystem though will get you
- off the ground quickly and get you familiar with how the general system works so lets run
- with it for now. See
+ The below advice is for HBase 0.98.2 and earlier releases only. This is fixed
+ in HBase 0.98.3 and beyond. See HBASE-11272 and
+ HBASE-11218.
+ Using HBase with a local filesystem does not guarantee durability. The HDFS
+ local filesystem implementation will lose edits if files are not properly closed. This is
+ very likely to happen when you are experimenting with new software, starting and stopping
+ the daemons often and not always cleanly. You need to run HBase on HDFS
+ to ensure all writes are preserved. Running against the local filesystem is intended as a
+ shortcut to get you familiar with how the general system works, as the very first phase of
+ evaluation. See and its associated issues
- for more details.
-
+ for more details about the issues of running on the local filesystem.
+
- Loopback IP
- The below advice is for hbase-0.94.x and older versions only. We believe this
- fixed in hbase-0.96.0 and beyond (let us know if we have it wrong). There
- should be no need of the below modification to /etc/hosts in later
- versions of HBase.
+ Loopback IP - HBase 0.94.x and earlier
+ The below advice is for hbase-0.94.x and older versions only. This is fixed in
+ hbase-0.96.0 and beyond.
- HBase expects the loopback IP address to be 127.0.0.1. Ubuntu and some other
- distributions, for example, will default to 127.0.1.1 and this will cause problems for you
- See Why does
- HBase care about /etc/hosts? for detail.
- .
- /etc/hosts should look something like this:
-
+ Prior to HBase 0.94.x, HBase expected the loopback IP address to be 127.0.0.1. Ubuntu
+ and some other distributions default to 127.0.1.1, and this will cause problems for you. See Why does HBase
+ care about /etc/hosts? for detail.
+
+ Example /etc/hosts File for Ubuntu
+ The following /etc/hosts file works correctly for HBase 0.94.x
+ and earlier, on Ubuntu. Use this as a template if you run into trouble.
+
127.0.0.1 localhost
127.0.0.1 ubuntu.ubuntu-domain ubuntu
-
-
+
+
@@ -89,159 +94,611 @@
- Download and unpack the latest stable release.
+ Get Started with HBase
- Choose a download site from this list of
+ Download, Configure, and Start HBase
+
+ Choose a download site from this list of Apache Download Mirrors.
Click on the suggested top link. This will take you to a mirror of HBase
Releases. Click on the folder named stable and then
- download the file that ends in .tar.gz to your local filesystem; e.g.
- hbase-0.94.2.tar.gz.
-
- Decompress and untar your download and then change into the unpacked directory.
-
- .tar.gz
-$ cd hbase-]]>
-
-
- At this point, you are ready to start HBase. But before starting it, edit
- conf/hbase-site.xml, the file you write your site-specific
- configurations into. Set hbase.rootdir, the directory HBase writes data
- to, and hbase.zookeeper.property.dataDir, the directory ZooKeeper writes
- its data too:
-
-
+ download the binary file that ends in .tar.gz to your local filesystem. Be
+ sure to choose the version that corresponds with the version of Hadoop you are likely to use
+ later. In most cases, you should choose the file for Hadoop 2, which will be called something
+ like hbase-0.98.3-hadoop2-bin.tar.gz. Do not download the file ending in
+ src.tar.gz for now.
+
+
+ Extract the downloaded file, and change to the newly-created directory.
+
+$ tar xzvf hbase-0.98.3-hadoop2-bin.tar.gz
+$ cd hbase-0.98.3-hadoop2/
+
+
+
+ Edit conf/hbase-site.xml, which is the main HBase configuration
+ file. At this time, you only need to specify the directory on the local filesystem where
+ HBase and Zookeeper write data. By default, a new directory is created under /tmp. Many
+ servers are configured to delete the contents of /tmp upon reboot, so you should store
+ the data elsewhere. The following configuration will store HBase's data in the
+ hbase directory, in the home directory of the user called
+ testuser. Paste the <property> tags beneath the
+ <configuration> tags, which should be empty in a new HBase install.
+
+ Example hbase-site.xml for Standalone HBase
+ hbase.rootdir
- file:///DIRECTORY/hbase
+ file:///home/testuser/hbase
+ hbase.zookeeper.property.dataDir
- /DIRECTORY/zookeeper
+ /home/testuser/zookeeper
-]]>
- Replace DIRECTORY in the above with the path to the directory you
- would have HBase and ZooKeeper write their data. By default,
- hbase.rootdir is set to /tmp/hbase-${user.name}
- and similarly so for the default ZooKeeper data location which means you'll lose all your
- data whenever your server reboots unless you change it (Most operating systems clear
- /tmp on restart).
-
+
+
+
+ You do not need to create the HBase data directory. HBase will do this for you. If
+ you create the directory, HBase will attempt to do a migration, which is not what you
+ want.
+
+
+ The bin/start-hbase.sh script is provided as a convenient way
+ to start HBase. Issue the command, and if all goes well, a message is logged to standard
+ output showing that HBase started successfully. You can use the jps
+ command to verify that you have one running process called HMaster
+ and at least one called HRegionServer.
+ Java needs to be installed and available. If you get an error indicating that
+ Java is not installed, but it is on your system, perhaps in a non-standard location,
+ edit the conf/hbase-env.sh file and modify the
+ JAVA_HOME setting to point to the directory that contains
+ bin/java on your system.
+
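As a sketch, the JAVA_HOME override in conf/hbase-env.sh is a single line; the JVM path below is only an example and must be replaced with the directory on your own system that contains bin/java:

```shell
# conf/hbase-env.sh -- example JVM location; substitute your own.
# JAVA_HOME must name the directory that contains bin/java.
export JAVA_HOME=/usr/lib/jvm/java-7-openjdk-amd64
```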
+
-
- Start HBase
-
- Now start HBase:
- $ ./bin/start-hbase.sh
-starting Master, logging to logs/hbase-user-master-example.org.out
-
- You should now have a running standalone HBase instance. In standalone mode, HBase runs
- all daemons in the the one JVM; i.e. both the HBase and ZooKeeper daemons. HBase logs can be
- found in the logs subdirectory. Check them out especially if it seems
- HBase had trouble starting.
-
-
- Is java installed?
-
- All of the above presumes a 1.6 version of Oracle java is
- installed on your machine and available on your path (See ); i.e. when you type java, you see output
- that describes the options the java program takes (HBase requires java 6). If this is not
- the case, HBase will not start. Install java, edit conf/hbase-env.sh,
- uncommenting the JAVA_HOME line pointing it to your java install, then,
- retry the steps above.
-
-
-
-
- Shell Exercises
-
- Connect to your running HBase via the shell.
-
- ' for list of supported commands.
-Type "exit" to leave the HBase Shell
-Version: 0.90.0, r1001068, Fri Sep 24 13:55:42 PDT 2010
-
-hbase(main):001:0>]]>
-
- Type help and then <RETURN> to see a listing
- of shell commands and options. Browse at least the paragraphs at the end of the help
- emission for the gist of how variables and command arguments are entered into the HBase
- shell; in particular note how table names, rows, and columns, etc., must be quoted.
-
- Create a table named test with a single column family named
- cf. Verify its creation by listing all tables and then insert some
- values.
-
- create 'test', 'cf'
+
+ Use HBase For the First Time
+
+ Connect to HBase.
+ Connect to your running instance of HBase using the hbase shell
+ command, located in the bin/ directory of your HBase
+ install. In this example, some usage and version information that is printed when you
+ start HBase Shell has been omitted. The HBase Shell prompt ends with a
+ > character.
+
+$ ./bin/hbase shell
+hbase(main):001:0>
+
+
+
+ Display HBase Shell Help Text.
+ Type help and press Enter, to display some basic usage
+ information for HBase Shell, as well as several example commands. Notice that table
+ names, rows, columns all must be enclosed in quote characters.
+
+
+ Create a table.
+ Use the create command to create a new table. You must specify the
+ table name and the ColumnFamily name.
+
+hbase> create 'test', 'cf'
0 row(s) in 1.2200 seconds
-hbase(main):003:0> list 'test'
-..
-1 row(s) in 0.0550 seconds
-hbase(main):004:0> put 'test', 'row1', 'cf:a', 'value1'
-0 row(s) in 0.0560 seconds
-hbase(main):005:0> put 'test', 'row2', 'cf:b', 'value2'
-0 row(s) in 0.0370 seconds
-hbase(main):006:0> put 'test', 'row3', 'cf:c', 'value3'
-0 row(s) in 0.0450 seconds]]>
+
+
+
+ List Information About your Table
+ Use the list command to confirm your table exists.
+
+hbase> list 'test'
+TABLE
+test
+1 row(s) in 0.0350 seconds
- Above we inserted 3 values, one at a time. The first insert is at
- row1, column cf:a with a value of
- value1. Columns in HBase are comprised of a column family prefix --
- cf in this example -- followed by a colon and then a column qualifier
- suffix (a in this case).
+=> ["test"]
+
+
+
+ Put data into your table.
+ To put data into your table, use the put command.
+
+hbase> put 'test', 'row1', 'cf:a', 'value1'
+0 row(s) in 0.1770 seconds
- Verify the data insert by running a scan of the table as follows
+hbase> put 'test', 'row2', 'cf:b', 'value2'
+0 row(s) in 0.0160 seconds
- scan 'test'
-ROW COLUMN+CELL
-row1 column=cf:a, timestamp=1288380727188, value=value1
-row2 column=cf:b, timestamp=1288380738440, value=value2
-row3 column=cf:c, timestamp=1288380747365, value=value3
-3 row(s) in 0.0590 seconds]]>
+hbase> put 'test', 'row3', 'cf:c', 'value3'
+0 row(s) in 0.0260 seconds
+
+ Here, we insert three values, one at a time. The first insert is at
+ row1, column cf:a, with a value of
+ value1. Columns in HBase are comprised of a column family prefix,
+ cf in this example, followed by a colon and then a column qualifier
+ suffix, a in this case.
+
+
+ Scan the table for all data at once.
+ One of the ways to get data from HBase is to scan. Use the scan
+ command to scan the table for data. You can limit your scan, but for now, all data is
+ fetched.
+
+hbase> scan 'test'
+ROW COLUMN+CELL
+ row1 column=cf:a, timestamp=1403759475114, value=value1
+ row2 column=cf:b, timestamp=1403759492807, value=value2
+ row3 column=cf:c, timestamp=1403759503155, value=value3
+3 row(s) in 0.0440 seconds
+
+
+
+ Get a single row of data.
+ To get a single row of data at a time, use the get command.
+
+hbase> get 'test', 'row1'
+COLUMN CELL
+ cf:a timestamp=1403759475114, value=value1
+1 row(s) in 0.0230 seconds
+
+
+
+ Disable a table.
+ If you want to delete a table or change its settings, as well as in some other
+ situations, you need to disable the table first, using the disable
+ command. You can re-enable it using the enable command.
+
+hbase> disable 'test'
+0 row(s) in 1.6270 seconds
- Get a single row
-
- get 'test', 'row1'
-COLUMN CELL
-cf:a timestamp=1288380727188, value=value1
-1 row(s) in 0.0400 seconds]]>
-
- Now, disable and drop your table. This will clean up all done above.
-
- h disable 'test'
-0 row(s) in 1.0930 seconds
-hbase(main):013:0> drop 'test'
-0 row(s) in 0.0770 seconds ]]>
-
- Exit the shell by typing exit.
-
- exit]]>
+hbase> enable 'test'
+0 row(s) in 0.4500 seconds
+
+ Disable the table again if you tested the enable command above:
+
+hbase> disable 'test'
+0 row(s) in 1.6270 seconds
+
+
+
+ Drop the table.
+ To drop (delete) a table, use the drop command.
+
+hbase> drop 'test'
+0 row(s) in 0.2900 seconds
+
+
+
+ Exit the HBase Shell.
+ To exit the HBase Shell and disconnect from your cluster, use the
+ quit command. HBase is still running in the background.
+
+
+
+
+ Stop HBase
+
+ In the same way that the bin/start-hbase.sh script is provided
+ to conveniently start all HBase daemons, the bin/stop-hbase.sh
+ script stops them.
+
+$ ./bin/stop-hbase.sh
+stopping hbase....................
+$
+
+
+
+ After issuing the command, it can take several minutes for the processes to shut
+ down. Use the jps command to be sure that the HMaster and HRegionServer
+ processes are shut down.
+
+
-
- Stopping HBase
-
- Stop your hbase instance by running the stop script.
-
- $ ./bin/stop-hbase.sh
-stopping hbase...............
+
+ Intermediate - Pseudo-Distributed Local Install
+ After working your way through , you can re-configure HBase
+ to run in pseudo-distributed mode. Pseudo-distributed mode means
+ that HBase still runs completely on a single host, but each HBase daemon (HMaster,
+ HRegionServer, and Zookeeper) runs as a separate process. By default, unless you configure the
+ hbase.rootdir property as described in , your data
+ is still stored in /tmp/. In this walk-through, we store your data in
+ HDFS instead, assuming you have HDFS available. You can skip the HDFS configuration to
+ continue storing your data in the local filesystem.
+
+ Hadoop Configuration
+ This procedure assumes that you have configured Hadoop and HDFS on your local system
+ or on a remote system, and that they are running and available. It also assumes you are
+ using Hadoop 2. Currently, the documentation on the Hadoop website does not include a
+ quick start for Hadoop 2, but the guide at http://www.alexjf.net/blog/distributed-systems/hadoop-yarn-installation-definitive-guide
+ is a good starting point.
+
+
+
+ Stop HBase if it is running.
+ If you have just finished and HBase is still running,
+ stop it. This procedure will create a totally new directory where HBase will store its
+ data, so any databases you created before will be lost.
+
+
+ Configure HBase.
+
+ Edit the hbase-site.xml configuration. First, add the following
+ property, which directs HBase to run in distributed mode, with one JVM instance per
+ daemon.
+
+
+ <property>
+   <name>hbase.cluster.distributed</name>
+   <value>true</value>
+ </property>
+ Next, change the hbase.rootdir from the local filesystem to the address
+ of your HDFS instance, using the hdfs://host:port/path URI syntax. In this example,
+ HDFS is running on the localhost at port 8020.
+
+ <property>
+   <name>hbase.rootdir</name>
+   <value>hdfs://localhost:8020/hbase</value>
+ </property>
+
+ You do not need to create the directory in HDFS. HBase will do this for you. If you
+ create the directory, HBase will attempt to do a migration, which is not what you
+ want.
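Taken together, the configuration block in a pseudo-distributed hbase-site.xml would look something like the following sketch (the localhost:8020 NameNode address is the example used above; match it to your own HDFS instance):

```xml
<configuration>
  <property>
    <name>hbase.cluster.distributed</name>
    <value>true</value>
  </property>
  <property>
    <name>hbase.rootdir</name>
    <value>hdfs://localhost:8020/hbase</value>
  </property>
</configuration>
```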
+
+
+ Start HBase.
+ Use the bin/start-hbase.sh command to start HBase. If your
+ system is configured correctly, the jps command should show the
+ HMaster and HRegionServer processes running.
+
+
+ Check the HBase directory in HDFS.
+ If everything worked correctly, HBase created its directory in HDFS. In the
+ configuration above, it is stored in /hbase/ on HDFS. You can use
+ the hadoop fs command in Hadoop's bin/ directory
+ to list this directory.
+
+$ ./bin/hadoop fs -ls /hbase
+Found 7 items
+drwxr-xr-x - hbase users 0 2014-06-25 18:58 /hbase/.tmp
+drwxr-xr-x - hbase users 0 2014-06-25 21:49 /hbase/WALs
+drwxr-xr-x - hbase users 0 2014-06-25 18:48 /hbase/corrupt
+drwxr-xr-x - hbase users 0 2014-06-25 18:58 /hbase/data
+-rw-r--r-- 3 hbase users 42 2014-06-25 18:41 /hbase/hbase.id
+-rw-r--r-- 3 hbase users 7 2014-06-25 18:41 /hbase/hbase.version
+drwxr-xr-x - hbase users 0 2014-06-25 21:49 /hbase/oldWALs
+
+
+
+ Create a table and populate it with data.
+ You can use the HBase Shell to create a table, populate it with data, scan and get
+ values from it, using the same procedure as in .
+
+
+ Start and stop a backup HBase Master (HMaster) server.
+
+ Running multiple HMaster instances on the same hardware does not make sense in a
+ production environment, in the same way that running a pseudo-distributed cluster does
+ not make sense for production. This step is offered for testing and learning purposes
+ only.
+
+ The HMaster server controls the HBase cluster. You can start up to 9 backup HMaster
+ servers, which makes 10 total HMasters, counting the primary. To start a backup HMaster,
+ use the local-master-backup.sh. For each backup master you want to
+ start, add a parameter representing the port offset for that master. Each HMaster uses
+ two ports (16000 and 16010 by default). The port offset is added to these ports, so
+ using an offset of 2, the first backup HMaster would use ports 16002 and 16012. The
+ following command starts 3 backup servers using ports 16002/16012, 16003/16013, and
+ 16005/16015.
+
+$ ./bin/local-master-backup.sh 2 3 5
+
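The port arithmetic can be checked with a quick shell sketch, assuming the default base ports of 16000 and 16010 described above:

```shell
# Each backup HMaster binds base+offset for both default ports:
# 16000 (RPC) and 16010 (web UI).
for offset in 2 3 5; do
  echo "offset $offset -> ports $((16000 + offset)) and $((16010 + offset))"
done
```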
+ To kill a backup master without killing the entire cluster, you need to find its
+ process ID (PID). The PID is stored in a file with a name like
+ /tmp/hbase-USER-X-master.pid.
+ The only contents of the file are the PID. You can use the kill -9
+ command to kill that PID. The following command will kill the master with port offset 1,
+ but leave the cluster running:
+
+$ cat /tmp/hbase-testuser-1-master.pid | xargs kill -9
+
+
+
+ Start and stop additional RegionServers
+ The HRegionServer manages the data in its StoreFiles as directed by the HMaster.
+ Generally, one HRegionServer runs per node in the cluster. Running multiple
+ HRegionServers on the same system can be useful for testing in pseudo-distributed mode.
+ The local-regionservers.sh command allows you to run multiple
+ RegionServers. It works in a similar way to the
+ local-master-backup.sh command, in that each parameter you provide
+ represents the port offset for an instance. Each RegionServer requires two ports, and
+ the default ports are 16200 and 16300. You can run 99 additional RegionServers, or 100
+ total, on a server. The following command starts four additional
+ RegionServers, running on sequential ports starting at 16202/16302.
+
+$ ./bin/local-regionservers.sh start 2 3 4 5
+
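As with the backup masters, the offsets map onto the base ports (16200 and 16300, as stated above); a quick sketch:

```shell
# Each additional RegionServer binds base+offset for both base ports.
for offset in 2 3 4 5; do
  echo "offset $offset -> ports $((16200 + offset)) and $((16300 + offset))"
done
```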
+ To stop a RegionServer manually, use the local-regionservers.sh
+ command with the stop parameter and the offset of the server to
+ stop.
+ $ ./bin/local-regionservers.sh stop 3
+
+
+ Stop HBase.
+ You can stop HBase the same way as in the procedure, using the
+ bin/stop-hbase.sh command.
+
+
+
+
+ Advanced - Fully Distributed
+ In reality, you need a fully-distributed configuration to fully test HBase and to use it
+ in real-world scenarios. In a distributed configuration, the cluster contains multiple
+ nodes, each of which runs one or more HBase daemon. These include primary and backup Master
+ instances, multiple Zookeeper nodes, and multiple RegionServer nodes.
+ This advanced quickstart adds two more nodes to your cluster. The architecture will be
+ as follows:
+
+ This quickstart assumes that each node is a virtual machine and that they are all on the
+ same network. It builds upon the previous quickstart, ,
+ assuming that the system you configured in that procedure is now node-a. Stop HBase on node-a
+ before continuing.
+
+ Be sure that all the nodes have full access to communicate, and that no firewall rules
+ are in place which could prevent them from talking to each other. If you see any errors like
+ no route to host, check your firewall.
+
+
+ Configure Password-Less SSH Access
+ node-a needs to be able to log into node-b and
+ node-c (and to itself) in order to start the daemons. The easiest way to accomplish this is
+ to use the same username on all hosts, and configure password-less SSH login from
+ node-a to each of the others.
+
+ On node-a, generate a key pair.
+ While logged in as the user who will run HBase, generate a SSH key pair, using the
+ following command:
+
+ $ ssh-keygen -t rsa
+ If the command succeeds, the location of the key pair is printed to standard output.
+ The default name of the public key is id_rsa.pub.
+
+
+ Create the directory that will hold the shared keys on the other nodes.
+ On node-b and node-c, log in as the HBase user and create
+ a .ssh/ directory in the user's home directory, if it does not
+ already exist. If it already exists, be aware that it may already contain other keys.
+
+
+ Copy the public key to the other nodes.
+ Securely copy the public key from node-a to each of the nodes, by
+ using scp or some other secure means. On each of the other nodes,
+ create a new file called .ssh/authorized_keys if it does
+ not already exist, and append the contents of the
+ id_rsa.pub file to the end of it. Note that you also need to do
+ this for node-a itself.
+ $ cat id_rsa.pub >> ~/.ssh/authorized_keys
+
+
+ Test password-less login.
+ If you performed the procedure correctly, if you SSH from node-a to
+ either of the other nodes, using the same username, you should not be prompted for a password.
+
+
+
+ Since node-b will run a backup Master, repeat the procedure above,
+ substituting node-b everywhere you see node-a. Be sure not to
+ overwrite your existing .ssh/authorized_keys files, but concatenate
+ the new key onto the existing file using the >> operator rather than
+ the > operator.
+
+
+
+
+ Prepare node-a
+ node-a will run your primary master and ZooKeeper processes, but no
+ RegionServers.
+
+ Stop the RegionServer from starting on node-a.
+ Edit conf/regionservers and remove the line which contains
+ localhost. Add lines with the hostnames or IP addresses for
+ node-b and node-c. Even if you did want to run a
+ RegionServer on node-a, you should refer to it by the hostname the other
+ servers would use to communicate with it. In this case, that would be
+ node-a.example.com. This enables you to distribute the
+ configuration to each node of your cluster without any hostname conflicts. Save the file.
+
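After those edits, conf/regionservers on node-a would contain only the two example hostnames used in this walk-through:

```
node-b.example.com
node-c.example.com
```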
+
+ Configure HBase to use node-b as a backup master.
+ Create a new file in conf/ called
+ backup-masters, and add a new line to it with the hostname for
+ node-b. In this demonstration, the hostname is
+ node-b.example.com.
+
+
+ Configure ZooKeeper
+ In reality, you should carefully consider your ZooKeeper configuration. You can find
+ out more about configuring ZooKeeper in . This configuration will direct HBase to start and manage a
+ ZooKeeper instance on each node of the cluster.
+ On node-a, edit conf/hbase-site.xml and add the
+ following properties.
+
+ <property>
+   <name>hbase.zookeeper.quorum</name>
+   <value>node-a.example.com,node-b.example.com,node-c.example.com</value>
+ </property>
+ <property>
+   <name>hbase.zookeeper.property.dataDir</name>
+   <value>/usr/local/zookeeper</value>
+ </property>
+
+
+ Everywhere in your configuration that you have referred to node-a as
+ localhost, change the reference to point to the hostname that
+ the other nodes will use to refer to node-a. In these examples, the
+ hostname is node-a.example.com.
+
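Putting the pieces together, node-a's conf/hbase-site.xml for this walk-through would resemble the following sketch (the hostnames, the 8020 NameNode port, and the ZooKeeper data path are the example values used above; adjust them for your environment):

```xml
<configuration>
  <property>
    <name>hbase.cluster.distributed</name>
    <value>true</value>
  </property>
  <property>
    <name>hbase.rootdir</name>
    <value>hdfs://node-a.example.com:8020/hbase</value>
  </property>
  <property>
    <name>hbase.zookeeper.quorum</name>
    <value>node-a.example.com,node-b.example.com,node-c.example.com</value>
  </property>
  <property>
    <name>hbase.zookeeper.property.dataDir</name>
    <value>/usr/local/zookeeper</value>
  </property>
</configuration>
```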
+
+
+ Prepare node-b and node-c
+ node-b will run a backup master server and a ZooKeeper instance.
+
+ Download and unpack HBase.
+ Download and unpack HBase to node-b, just as you did for the standalone
+ and pseudo-distributed quickstarts.
+
+
+ Copy the configuration files from node-a to node-b and
+ node-c.
+ Each node of your cluster needs to have the same configuration information. Copy the
+ contents of the conf/ directory to the conf/
+ directory on node-b and node-c.
+
+
+
+ Start and Test Your Cluster
+
+ Be sure HBase is not running on any node.
+ If you forgot to stop HBase from previous testing, you will have errors. Check to
+ see whether HBase is running on any of your nodes by using the jps
+ command. Look for the processes HMaster,
+ HRegionServer, and HQuorumPeer. If they exist,
+ kill them.
+
+
+ Start the cluster.
+ On node-a, issue the start-hbase.sh command. Your
+ output will be similar to that below.
+
+$ bin/start-hbase.sh
+node-c.example.com: starting zookeeper, logging to /home/hbuser/hbase-0.98.3-hadoop2/bin/../logs/hbase-hbuser-zookeeper-node-c.example.com.out
+node-a.example.com: starting zookeeper, logging to /home/hbuser/hbase-0.98.3-hadoop2/bin/../logs/hbase-hbuser-zookeeper-node-a.example.com.out
+node-b.example.com: starting zookeeper, logging to /home/hbuser/hbase-0.98.3-hadoop2/bin/../logs/hbase-hbuser-zookeeper-node-b.example.com.out
+starting master, logging to /home/hbuser/hbase-0.98.3-hadoop2/bin/../logs/hbase-hbuser-master-node-a.example.com.out
+node-c.example.com: starting regionserver, logging to /home/hbuser/hbase-0.98.3-hadoop2/bin/../logs/hbase-hbuser-regionserver-node-c.example.com.out
+node-b.example.com: starting regionserver, logging to /home/hbuser/hbase-0.98.3-hadoop2/bin/../logs/hbase-hbuser-regionserver-node-b.example.com.out
+node-b.example.com: starting master, logging to /home/hbuser/hbase-0.98.3-hadoop2/bin/../logs/hbase-hbuser-master-nodeb.example.com.out
+
+ ZooKeeper starts first, followed by the master, then the RegionServers, and finally
+ the backup masters.
+
+
+ Verify that the processes are running.
+ On each node of the cluster, run the jps command and verify that
+ the correct processes are running on each server. You may see additional Java processes
+ running on your servers as well, if they are used for other purposes.
+
+ node-ajps Output
+
+$ jps
+20355 Jps
+20071 HQuorumPeer
+20137 HMaster
+
+
+
+ node-bjps Output
+
+$ jps
+15930 HRegionServer
+16194 Jps
+15838 HQuorumPeer
+16010 HMaster
+
+
+
+ node-cjps Output
+
+$ jps
+13901 Jps
+13639 HQuorumPeer
+13737 HRegionServer
+
+
+
+ ZooKeeper Process Name
+ The HQuorumPeer process is a ZooKeeper instance which is controlled
+ and started by HBase. If you use ZooKeeper this way, it is limited to one instance per
+ cluster node, and is appropriate for testing only. If ZooKeeper is run outside of
+ HBase, the process is called QuorumPeer. For more about ZooKeeper
+ configuration, including using an external ZooKeeper instance with HBase, see .
+
+
+
+ Browse to the Web UI.
+
+ Web UI Port Changes
+ In HBase newer than 0.98.x, the HTTP ports used by the HBase Web UI changed from
+ 60010 for the Master and 60030 for each RegionServer to 16010 for the Master and 16030
+ for the RegionServer.
+
+ If everything is set up correctly, you should be able to connect to the UI for the
+ Master at http://node-a.example.com:60010/ or to the UI for the backup
+ master at http://node-b.example.com:60010/, using a
+ web browser. If you can connect via localhost but not from another host,
+ check your firewall rules. You can see the web UI for each of the RegionServers at port
+ 60030 of their IP addresses, or by clicking their links in the web UI for the
+ Master.
+
+
+ Test what happens when nodes or services disappear.
+ With a three-node cluster like you have configured, things will not be very
+ resilient. Still, you can test what happens when the primary Master or a RegionServer
+ disappears, by killing the processes and watching the logs.
+
+
+
+
Where to go next
- The above described standalone setup is good for testing and experiments only. In the
- next chapter, , we'll go into depth on the different HBase run modes, system
- requirements running HBase, and critical configurations setting up a distributed HBase
- deploy.
+ The next chapter, , gives more information about the different HBase run modes,
+ system requirements for running HBase, and critical configuration areas for setting up a
+ distributed HBase cluster.
-