From 15831cefd5dfc98dbee55741d442d29ea63097bc Mon Sep 17 00:00:00 2001 From: Jonathan M Hsieh Date: Wed, 2 Jul 2014 11:24:30 -0700 Subject: [PATCH] HBASE-11399 Improve Quickstart chapter and move Pseudo-distributed and distrbuted into it (Misty Stanley-Jones) --- src/main/docbkx/configuration.xml | 754 ++++++++++++++------------- src/main/docbkx/getting_started.xml | 777 ++++++++++++++++++++++------ 2 files changed, 1007 insertions(+), 524 deletions(-) diff --git a/src/main/docbkx/configuration.xml b/src/main/docbkx/configuration.xml index 8464b87842a..56a3dd712d1 100644 --- a/src/main/docbkx/configuration.xml +++ b/src/main/docbkx/configuration.xml @@ -29,228 +29,319 @@ */ --> Apache HBase Configuration - This chapter is the Not-So-Quick start guide to Apache HBase configuration. It goes over - system requirements, Hadoop setup, the different Apache HBase run modes, and the various - configurations in HBase. Please read this chapter carefully. At a minimum ensure that all have been satisfied. Failure to do so will cause you (and us) - grief debugging strange errors and/or data loss. + This chapter expands upon the chapter to further explain + configuration of Apache HBase. Please read this chapter carefully, especially to ensure that your HBase testing and deployment goes + smoothly and to prevent data loss. - Apache HBase uses the same configuration system as Apache Hadoop. To configure a deploy, - edit a file of environment variables in conf/hbase-env.sh -- this - configuration is used mostly by the launcher shell scripts getting the cluster off the ground -- - and then add configuration to an XML file to do things like override HBase defaults, tell HBase - what Filesystem to use, and the location of the ZooKeeper ensemble. - Be careful editing XML. Make sure you close all elements. Run your file through - xmllint or similar to ensure well-formedness of your document after an - edit session. - + Apache HBase uses the same configuration system as Apache Hadoop. All configuration files + are located in the conf/ directory, which needs to be kept in sync for each + node on your cluster. + + + HBase Configuration Files + + backup-masters + + Not present by default. A plain-text file which lists hosts on which the Master should + start a backup Master process, one host per line. + + + + hadoop-metrics2-hbase.properties + + Used to connect HBase to Hadoop's Metrics2 framework. See the Hadoop Wiki + entry for more information on Metrics2. Contains only commented-out examples by + default. + + + + hbase-env.cmd and hbase-env.sh + + Script for Windows and Linux / Unix environments to set up the working environment for + HBase, including the location of Java, Java options, and other environment variables. The + file contains many commented-out examples to provide guidance. + + + + hbase-policy.xml + + The default policy configuration file used by RPC servers to make authorization + decisions on client requests. Only used if HBase security () is enabled. + + + + hbase-site.xml + + The main HBase configuration file. This file specifies configuration options which + override HBase's default configuration. You can view (but do not edit) the default + configuration file at docs/hbase-default.xml. You can also view the + entire effective configuration for your cluster (defaults and overrides) in the + HBase Configuration tab of the HBase Web UI. + + + + log4j.properties + + Configuration file for HBase logging via log4j.
+ + + + regionservers + + A plain-text file containing a list of hosts which should run a RegionServer in your + HBase cluster. By default this file contains the single entry + localhost. It should contain a list of hostnames or IP addresses, one + per line, and should only contain localhost if each node in your + cluster will run a RegionServer on its localhost interface. + + + + + + Checking XML Validity + When you edit XML, it is a good idea to use an XML-aware editor to be sure that your + syntax is correct and your XML is well-formed. You can also use the xmllint + utility to check that your XML is well-formed. By default, xmllint re-flows + and prints the XML to standard output. To check for well-formedness and only print output if + errors exist, use the command xmllint -noout + filename.xml. + - When running in distributed mode, after you make an edit to an HBase configuration, make - sure you copy the content of the conf directory to all nodes of the - cluster. HBase will not do this for you. Use rsync. For most configuration, a - restart is needed for servers to pick up changes (caveat dynamic config. to be described later - below). + + Keep Configuration In Sync Across the Cluster + When running in distributed mode, after you make an edit to an HBase configuration, make + sure you copy the content of the conf/ directory to all nodes of the + cluster. HBase will not do this for you. Use rsync, scp, + or another secure mechanism for copying the configuration files to your nodes; a sketch of + one approach follows this note. For most configurations, a restart is needed for servers to + pick up changes. An exception is dynamic configuration, which is described later below. +
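As a concrete sketch of the synchronization step above (the hostnames and the install path here are examples only, not part of the original text), the conf/ directory can be pushed from the node where it was edited to every other node with rsync:

$ for host in node-b.example.com node-c.example.com; do
>   rsync -az /usr/local/hbase/conf/ $host:/usr/local/hbase/conf/   # trailing slashes copy directory contents
> done

After the copy completes, restart the affected daemons so that the new settings take effect.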
Basic Prerequisites This section lists required services and some required system configuration. -
Java - HBase requires at least Java 6 from Oracle. The following table lists which JDK versions are - compatible with each version of HBase. - - - - - HBase Version - JDK 6 - JDK 7 - JDK 8 - - - - - 1.0 - Not Supported - yes - Running with JDK 8 will work but is not well tested. - - - 0.98 - yes - yes - Running with JDK 8 works but is not well tested. Building with JDK 8 - would require removal of the deprecated remove() method of the PoolMap class and is - under consideration. See HBASE-7608 for - more information about JDK 8 support. - - - 0.96 - yes - yes - - - - 0.94 - yes - yes - - - - - -
+ + HBase requires at least Java 6 from Oracle. The following table lists + which JDK versions are compatible with each version of HBase. + + + + + HBase Version + JDK 6 + JDK 7 + JDK 8 + + + + + 1.0 + Not Supported + yes + Running with JDK 8 will work but is not well tested. + + + 0.98 + yes + yes + Running with JDK 8 works but is not well tested. Building with JDK 8 would + require removal of the deprecated remove() method of the PoolMap class and is under + consideration. See HBASE-7608 + for more information about JDK 8 support. + + + 0.96 + yes + yes + + + + 0.94 + yes + yes + + + + + -
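A quick way to confirm which JDK a node will actually use, before starting HBase, is sketched below; the commands are standard, but the output and any JDK path vary by system:

$ java -version       # shows the JDK on the current PATH
$ echo $JAVA_HOME     # the JDK HBase will use if this is set in hbase-env.sh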
- Operating System -
Operating System Utilities + - ssh - - ssh must be installed and sshd must be running - to use Hadoop's scripts to manage remote Hadoop and HBase daemons. You must be able to ssh - to all nodes, including your local node, using passwordless login (Google "ssh - passwordless login"). If on mac osx, see the section, SSH: - Setting up Remote Desktop and Enabling Self-Login on the hadoop wiki. -
- -
ssh + + HBase uses the Secure Shell (ssh) command and utilities extensively to communicate + between cluster nodes. Each server in the cluster must be running ssh + so that the Hadoop and HBase daemons can be managed. You must be able to connect to all + nodes via SSH, including the local node, from the Master as well as any backup Master, + using a shared key rather than a password. You can see the basic methodology for such a + set-up in Linux or Unix systems at . If your cluster nodes use OS X, see the + section, SSH: + Setting up Remote Desktop and Enabling Self-Login on the Hadoop wiki. + + + - DNS + DNS + + HBase uses the local hostname to self-report its IP address. Both forward and + reverse DNS resolving must work in versions of HBase previous to 0.92.0. + The hadoop-dns-checker + tool can be used to verify DNS is working correctly on the cluster. The project + README file provides detailed instructions on usage. + - HBase uses the local hostname to self-report its IP address. Both forward and reverse - DNS resolving must work in versions of HBase previous to 0.92.0 - The hadoop-dns-checker - tool can be used to verify DNS is working correctly on the cluster. The project README - file provides detailed instructions on usage. - . + If your server has multiple network interfaces, HBase defaults to using the + interface that the primary hostname resolves to. To override this behavior, set the + hbase.regionserver.dns.interface property to a different interface. This + will only work if each server in your cluster uses the same network interface + configuration. - If your machine has multiple interfaces, HBase will use the interface that the primary - hostname resolves to. - - If this is insufficient, you can set - hbase.regionserver.dns.interface to indicate the primary interface. - This only works if your cluster configuration is consistent and every host has the same - network interface configuration. - - Another alternative is setting hbase.regionserver.dns.nameserver to - choose a different nameserver than the system wide default. -
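A minimal sketch of setting up the required password-less login follows; the username and hostname are placeholders, and ssh-copy-id is assumed to be available (it ships with most OpenSSH installations):

$ ssh-keygen -t rsa                       # generate a key pair; an empty passphrase suits unattended use
$ ssh-copy-id hadoop@node-b.example.com   # repeat for every node in the cluster, including this one
$ ssh node-b.example.com hostname         # should print the remote hostname without asking for a password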
-
To choose a different DNS nameserver than the system default, set the + hbase.regionserver.dns.nameserver property to the IP address of + that nameserver. + + + - Loopback IP - Previous to hbase-0.96.0, HBase expects the loopback IP address to be 127.0.0.1. See -
- -
Loopback IP + + Prior to hbase-0.96.0, HBase only used the IP address + 127.0.0.1 to refer to localhost, and this could + not be configured. See . + + + - NTP + NTP + + The clocks on cluster nodes should be synchronized. A small amount of variation is + acceptable, but larger amounts of skew can cause erratic and unexpected behavior. Time + synchronization is one of the first things to check if you see unexplained problems in + your cluster. It is recommended that you run a Network Time Protocol (NTP) service, or + another time-synchronization mechanism, on your cluster, and that all nodes look to the + same service for time synchronization. See the Basic NTP + Configuration at The Linux Documentation Project (TLDP) + to set up NTP. A quick way to check whether a node is actually synchronized is sketched + below. + + - The clocks on cluster members should be in basic alignment. Some skew is tolerable - but wild skew could generate odd behaviors. Run NTP on your - cluster, or an equivalent. - - If you are having problems querying data, or "weird" cluster operations, check system - time!
- -
- - <varname>ulimit</varname><indexterm> + <term>Limits on Number of Files and Processes (<command>ulimit</command>) + <indexterm> <primary>ulimit</primary> - </indexterm> and <varname>nproc</varname><indexterm> + </indexterm><indexterm> <primary>nproc</primary> </indexterm> - + - Apache HBase is a database. It uses a lot of files all at the same time. The default - ulimit -n -- i.e. user file limit -- of 1024 on most *nix systems is insufficient (On mac - os x its 256). Any significant amount of loading will lead you to . You may also notice errors such as the - following: - + + Apache HBase is a database. It requires the ability to open a large number of files + at once. Many Linux distributions limit the number of files a single user is allowed to + open to 1024 (or 256 on older versions of OS X). + You can check this limit on your servers by running the command ulimit + -n when logged in as the user which runs HBase. See for some of the problems you may + experience if the limit is too low. You may also notice errors such as the + following: + 2010-04-06 03:04:37,542 INFO org.apache.hadoop.hdfs.DFSClient: Exception increateBlockOutputStream java.io.EOFException 2010-04-06 03:04:37,542 INFO org.apache.hadoop.hdfs.DFSClient: Abandoning block blk_-6935524980745310745_1391901 - - Do yourself a favor and change the upper bound on the number of file descriptors. Set - it to north of 10k. The math runs roughly as follows: per ColumnFamily there is at least - one StoreFile and possibly up to 5 or 6 if the region is under load. Multiply the average - number of StoreFiles per ColumnFamily times the number of regions per RegionServer. For - example, assuming that a schema had 3 ColumnFamilies per region with an average of 3 - StoreFiles per ColumnFamily, and there are 100 regions per RegionServer, the JVM will open - 3 * 3 * 100 = 900 file descriptors (not counting open jar files, config files, etc.) - You should also up the hbase users' nproc setting; under load, a - low-nproc setting could manifest as OutOfMemoryError. - See Jack Levin's major hdfs issues note up on the user list. - - - The requirement that a database requires upping of system limits is not peculiar - to Apache HBase. See for example the section Setting Shell Limits for the - Oracle User in Short Guide - to install Oracle 10 on Linux. - + + It is recommended to raise the ulimit to at least 10,000, but more likely 10,240, + because the value is usually expressed in multiples of 1024. Each ColumnFamily has at + least one StoreFile, and possibly more than 6 StoreFiles if the region is under load. + The number of open files required depends upon the number of ColumnFamilies and the + number of regions. The following is a rough formula for calculating the potential number + of open files on a RegionServer. + + Calculate the Potential Number of Open Files + (StoreFiles per ColumnFamily) x (regions per RegionServer) + + For example, assuming that a schema had 3 ColumnFamilies per region with an average + of 3 StoreFiles per ColumnFamily, and there are 100 regions per RegionServer, the JVM + will open 3 * 3 * 100 = 900 file descriptors, not counting open JAR files, configuration + files, and others. Opening a file does not take many resources, and the risk of allowing + a user to open too many files is minimal. + Another related setting is the number of processes a user is allowed to run at once. + In Linux and Unix, the number of processes is set using the ulimit -u + command. 
This should not be confused with the nproc command, which + reports the number of processing units available rather than setting any limit. Under load, a + nproc that is too low can cause OutOfMemoryError exceptions. See + Jack Levin's major + hdfs issues thread on the hbase-users mailing list, from 2011. + Configuring the maximum number of file descriptors and processes for the user who is + running the HBase process is an operating system configuration, rather than an HBase + configuration. It is also important to be sure that the settings are changed for the + user that actually runs HBase. To see which user started HBase, and that user's ulimit + configuration, look at the first line of the HBase log for that instance. A sketch of + commands for verifying the limits in effect appears below. + A useful read on setting configuration for your Hadoop cluster is Aaron Kimball's Configuration + Parameters: What can you just ignore? + + <command>ulimit</command> Settings on Ubuntu + To configure ulimit settings on Ubuntu, edit + /etc/security/limits.conf, which is a space-delimited file with + four columns. Refer to the man + page for limits.conf for details about the format of this file. In the + following example, the first line sets both soft and hard limits for the number of + open files (nofile) to 32768 for the operating + system user with the username hadoop. The second line sets the + number of processes to 32000 for the same user. + +hadoop - nofile 32768 +hadoop - nproc 32000 + + The settings are only applied if the Pluggable Authentication Module (PAM) + environment is directed to use them. To configure PAM to use these limits, be sure that + the /etc/pam.d/common-session file contains the following line: + session required pam_limits.so + Remember to log out and back in for the changes to take effect. + + - To be clear, upping the file descriptors and nproc for the user who is running the - HBase process is an operating system configuration, not an HBase configuration. Also, a - common mistake is that administrators will up the file descriptors for a particular user - but for whatever reason, HBase will be running as someone else. HBase prints in its logs - as the first line the ulimit it is seeing. Ensure it is correct. - A useful read on setting configuration for your Hadoop cluster is Aaron Kimball's Configuration - Parameters: What can you just ignore? - -
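As referenced above, here is a sketch of verifying the limits actually in effect, assuming a Linux system with an HMaster process running and visible to jps:

$ ulimit -n    # open-file limit for the current shell
$ ulimit -u    # process limit for the current shell
$ cat /proc/$(jps | awk '/HMaster/ {print $1}')/limits   # limits of the running HBase process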
- <varname>ulimit</varname> on Ubuntu - - If you are on Ubuntu you will need to make the following changes: - - In the file /etc/security/limits.conf add a line like: - hadoop - nofile 32768 - Replace hadoop with whatever user is running Hadoop and HBase. If - you have separate users, you will need 2 entries, one for each user. In the same file - set nproc hard and soft limits. For example: - hadoop soft/hard nproc 32000 - In the file /etc/pam.d/common-session add as the last line in - the file: session required pam_limits.so Otherwise the - changes in /etc/security/limits.conf won't be applied. - - Don't forget to log out and back in again for the changes to take effect! -
-
- -
- Windows - Previous to hbase-0.96.0, Apache HBase was little tested running on Windows. Running a - production install of HBase on top of Windows is not recommended. + + Prior to HBase 0.96, testing for running HBase on Microsoft Windows was limited. + Running HBase on Windows nodes is not recommended for production systems. - If you are running HBase on Windows pre-hbase-0.96.0, you must install Cygwin to have a *nix-like environment for the - shell scripts. The full details are explained in the To run versions of HBase prior to 0.96 on Microsoft Windows, you must install Cygwin and run HBase within the Cygwin + environment. This provides support for Linux/Unix commands and scripts. The full details are explained in the Windows Installation guide. Also search our user mailing list to pick up the latest fixes figured out by Windows users. Post-hbase-0.96.0, HBase runs natively on Windows with supporting -
+ *.cmd scripts bundled. + -
+
Hadoop Hadoop - The below table shows some information about what versions of Hadoop are supported by - various HBase versions. Based on the version of HBase, you should select the most - appropriate version of Hadoop. We are not in the Hadoop distro selection business. You can - use Hadoop distributions from Apache, or learn about vendor distributions of Hadoop at + The following table summarizes the versions of Hadoop supported with each version of + HBase. Based on the version of HBase, you should select the most + appropriate version of Hadoop. You can use Apache Hadoop, or a vendor's distribution of + Hadoop. No distinction is made here. See + for information about vendors of Hadoop. - Hadoop 2.x is better than Hadoop 1.x - Hadoop 2.x is faster, with more features such as short-circuit reads which will help - improve your HBase random read profile as well important bug fixes that will improve your - overall HBase experience. You should run Hadoop 2 rather than Hadoop 1. HBase 0.98 - deprecates use of Hadoop1. HBase 1.0 will not support Hadoop1. + Hadoop 2.x is recommended. + Hadoop 2.x is faster and includes features, such as short-circuit reads, which will + help improve your HBase random read profile. Hadoop 2.x also includes important bug fixes + that will improve your overall HBase experience. HBase 0.98 deprecates use of Hadoop 1.x, + and HBase 1.0 will not support Hadoop 1.x. Use the following legend to interpret this table: Hadoop Distributed File System (HDFS). Fully-distributed mode can ONLY run on HDFS. See the Hadoop - requirements and instructions for how to set up HDFS. + requirements and instructions for how to set up HDFS for Hadoop 1.x. A good + walk-through for setting up HDFS on Hadoop 2 is at http://www.alexjf.net/blog/distributed-systems/hadoop-yarn-installation-definitive-guide. Below we describe the different distributed setups. Starting, verification and exploration of your install, whether a pseudo-distributed or @@ -628,207 +722,139 @@ Index: pom.xml
Pseudo-distributed + + Pseudo-Distributed Quickstart + A quickstart has been added to the chapter. See . Some of the information that was originally in this + section has been moved there. + A pseudo-distributed mode is simply a fully-distributed mode run on a single host. Use this configuration for testing and prototyping on HBase. Do not use this configuration for production nor for evaluating HBase performance. - First, if you want to run on HDFS rather than on the local filesystem, set up your - HDFS. You can set up HDFS also in pseudo-distributed mode (TODO: Add pointer to HOWTO doc; - the hadoop site doesn't have any anymore). Ensure you have a working HDFS before - proceeding. - - Next, configure HBase. Edit conf/hbase-site.xml. This is the file - into which you add local customizations and overrides. At a minimum, you must tell HBase - to run in (pseudo-)distributed mode rather than in default standalone mode. To do this, - set the hbase.cluster.distributed property to true (its default is - false). The absolute bare-minimum hbase-site.xml - is therefore as follows: - - - hbase.cluster.distributed - true - - -]]> - - With this configuration, HBase will start up an HBase Master process, a ZooKeeper - server, and a RegionServer process running against the local filesystem writing to - wherever your operating system stores temporary files into a directory named - hbase-YOUR_USER_NAME. - - Such a setup, using the local filesystem and writing to the operating system's - temporary directory is an ephemeral setup; the Hadoop local filesystem -- which is what - HBase uses when it is writing to the local filesystem -- would lose data unless the system - was shut down properly in versions of HBase before 0.98.4 and 1.0.0 (see - HBASE-11218 Data - loss in HBase standalone mode). Writing to the operating - system's temporary directory can also make for data loss when the machine is restarted as - this directory is usually cleared on reboot. For a more permanent setup, see the next - example where we make use of an instance of HDFS; HBase data will be written to the Hadoop - distributed filesystem rather than to the local filesystem's tmp directory. - In this conf/hbase-site.xml example, the - hbase.rootdir property points to the local HDFS instance homed on the - node h-24-30.example.com. - - Let HBase create <filename>${hbase.rootdir}</filename> - Let HBase create the hbase.rootdir directory. If you don't, - you'll get a warning saying HBase needs a migration run because the directory is missing - files expected by HBase (it'll create them if you let it). - - -<configuration> - <property> - <name>hbase.rootdir</name> - <value>hdfs://h-24-30.example.com:8020/hbase</value> - </property> - <property> - <name>hbase.cluster.distributed</name> - <value>true</value> - </property> -</configuration> - - - Now skip to for how to start and verify your pseudo-distributed install. - See for notes on how to start extra Masters and RegionServers - when running pseudo-distributed. - -
- Pseudo-distributed Extras - -
- Startup - To start up the initial HBase cluster... - % bin/start-hbase.sh - To start up an extra backup master(s) on the same server run... - % bin/local-master-backup.sh start 1 - ... the '1' means use ports 16001 & 16011, and this backup master's logfile - will be at logs/hbase-${USER}-1-master-${HOSTNAME}.log. - To startup multiple backup masters run... - % bin/local-master-backup.sh start 2 3 - You can start up to 9 backup masters (10 total). - To start up more regionservers... - % bin/local-regionservers.sh start 1 - ... where '1' means use ports 16201 & 16301 and its logfile will be at - `logs/hbase-${USER}-1-regionserver-${HOSTNAME}.log. - To add 4 more regionservers in addition to the one you just started by - running... - % bin/local-regionservers.sh start 2 3 4 5 - This supports up to 99 extra regionservers (100 total). -
-
- Stop - Assuming you want to stop master backup # 1, run... - % cat /tmp/hbase-${USER}-1-master.pid |xargs kill -9 - Note that bin/local-master-backup.sh stop 1 will try to stop the cluster along - with the master. - To stop an individual regionserver, run... - % bin/local-regionservers.sh stop 1 -
- -
-
+
+
+ Fully-distributed + By default, HBase runs in standalone mode. Both standalone mode and pseudo-distributed + mode are provided for the purposes of small-scale testing. For a production environment, + distributed mode is appropriate. In distributed mode, multiple instances of HBase daemons + run on multiple servers in the cluster. + Just as in pseudo-distributed mode, a fully distributed configuration requires that you + set the hbase.cluster.distributed property to true. + Typically, the hbase.rootdir is configured to point to a highly-available HDFS + filesystem. + In addition, the cluster is configured so that multiple cluster nodes enlist as + RegionServers, ZooKeeper QuorumPeers, and backup HMaster servers. These configuration basics + are all demonstrated in . + + Distributed RegionServers + Typically, your cluster will contain multiple RegionServers all running on different + servers, as well as primary and backup Master and ZooKeeper daemons. The + conf/regionservers file on the master server contains a list of + hosts whose RegionServers are associated with this cluster. Each host is on a separate + line. All hosts listed in this file will have their RegionServer processes started and + stopped when the master server starts or stops. + + + ZooKeeper and HBase + See section for ZooKeeper setup for HBase. +
- Fully-distributed - - For running a fully-distributed operation on more than one host, make the following - configurations. In hbase-site.xml, add the property - hbase.cluster.distributed and set it to true and - point the HBase hbase.rootdir at the appropriate HDFS NameNode and - location in HDFS where you would like HBase to write data. For example, if your namenode - were running at namenode.example.org on port 8020 and you wanted to home your HBase in - HDFS at /hbase, make the following configuration. - + + Example Distributed HBase Cluster + This is a bare-bones conf/hbase-site.xml for a distributed HBase + cluster. A cluster that is used for real-world work would contain more custom + configuration parameters. Most HBase configuration directives have default values, which + are used unless the value is overridden in the hbase-site.xml. See for more information. - - ... hbase.rootdir hdfs://namenode.example.org:8020/hbase - The directory shared by RegionServers. - hbase.cluster.distributed true - The mode the cluster will be in. Possible values are - false: standalone and pseudo-distributed setups with managed ZooKeeper - true: fully-distributed with unmanaged ZooKeeper Quorum (see hbase-env.sh) - - ... + + hbase.zookeeper.quorum + node-a.example.com,node-b.example.com,node-c.example.com + ]]> + This is an example conf/regionservers file, which contains a list + of each node that should run a RegionServer in the cluster. These nodes need HBase + installed and they need to use the same contents of the conf/ + directory as the Master server. + +node-a.example.com +node-b.example.com +node-c.example.com + + This is an example conf/backup-masters file, which contains a + list of each node that should run a backup Master instance. The backup Master instances + will sit idle unless the main Master becomes unavailable. + +node-b.example.com +node-c.example.com + + + + Distributed HBase Quickstart + See for a walk-through of a simple three-node + cluster configuration with multiple ZooKeeper, backup HMaster, and RegionServer + instances. +
- <filename>regionservers</filename> - - In addition, a fully-distributed mode requires that you modify - conf/regionservers. The file lists all hosts that you would have running - HRegionServers, one host per line (This file in HBase is - like the Hadoop slaves file). All servers listed in this file will - be started and stopped when HBase cluster start or stop is run. -
- -
- ZooKeeper and HBase - See section for ZooKeeper setup for HBase. -
- -
- HDFS Client Configuration - - Of note, if you have made HDFS client configuration on your - Hadoop cluster -- i.e. configuration you want HDFS clients to use as opposed to - server-side configurations -- HBase will not see this configuration unless you do one of - the following: - - - + + HDFS Client Configuration + + Of note, if you have made HDFS client configuration on your Hadoop cluster, such as + configuration directives for HDFS clients, as opposed to server-side configurations, you + must use one of the following methods to enable HBase to see and use these configuration + changes: + + Add a pointer to your HADOOP_CONF_DIR to the HBASE_CLASSPATH environment variable in hbase-env.sh. - + - + Add a copy of hdfs-site.xml (or hadoop-site.xml) or, better, symlinks, under ${HBASE_HOME}/conf, or - + - + if only a small set of HDFS client configurations, add them to hbase-site.xml. - - - - An example of such an HDFS client configuration is - dfs.replication. If for example, you want to run with a replication - factor of 5, hbase will create files with the default of 3 unless you do the above to - make the configuration available to HBase. -
-
+ + + + + An example of such an HDFS client configuration is dfs.replication. + If for example, you want to run with a replication factor of 5, HBase will create files with + the default of 3 unless you do the above to make the configuration available to + HBase. A sketch of the hbase-site.xml approach follows.
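Following the third method listed above, a replication factor of 5 could be requested directly in hbase-site.xml with a property block like the following sketch (the value is illustrative only):

<property>
  <name>dfs.replication</name>
  <value>5</value>
  <description>Client-side HDFS replication factor for files HBase creates.</description>
</property>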
+
@@ -871,7 +897,7 @@ stopping hbase............... of many machines. If you are running a distributed operation, be sure to wait until HBase has shut down completely before stopping the Hadoop daemons.
- + diff --git a/src/main/docbkx/getting_started.xml b/src/main/docbkx/getting_started.xml index 117e1ec3546..da5946c72fb 100644 --- a/src/main/docbkx/getting_started.xml +++ b/src/main/docbkx/getting_started.xml @@ -40,46 +40,51 @@
- Quick Start + Quick Start - Standalone HBase - This guide describes setup of a standalone HBase instance. It will run against the local - filesystem. In later sections we will take you through how to run HBase on Apache Hadoop's - HDFS, a distributed filesystem. This section shows you how to create a table in HBase, - inserting rows into your new HBase table via the HBase shell, and then - cleaning up and shutting down your standalone, local filesystem-based HBase instance. The - below exercise should take no more than ten minutes (not including download time). - This guide describes setup of a standalone HBase instance running against the local + filesystem. This is not an appropriate configuration for a production instance of HBase, but + will allow you to experiment with HBase. This section shows you how to create a table in + HBase using the hbase shell CLI, insert rows into the table, perform put + and scan operations against the table, enable or disable the table, and start and stop HBase. + Apart from downloading HBase, this procedure should take less than 10 minutes. + Local Filesystem and Durability - Using HBase with a LocalFileSystem does not currently guarantee durability. The HDFS - local filesystem implementation will lose edits if files are not properly closed -- which is - very likely to happen when experimenting with a new download. You need to run HBase on HDFS - to ensure all writes are preserved. Running against the local filesystem though will get you - off the ground quickly and get you familiar with how the general system works so let's run - with it for now. See The below advice is for HBase 0.98.2 and earlier releases only. This is fixed + in HBase 0.98.3 and beyond. See HBASE-11272 and + HBASE-11218. + Using HBase with a local filesystem does not guarantee durability. The HDFS + local filesystem implementation will lose edits if files are not properly closed. This is + very likely to happen when you are experimenting with new software, starting and stopping + the daemons often and not always cleanly. You need to run HBase on HDFS + to ensure all writes are preserved. Running against the local filesystem is intended as a + shortcut to get you familiar with how the general system works, as the very first phase of + evaluation. See and its associated issues - for more details. + for more details about the issues of running on the local filesystem. + - Loopback IP - The below advice is for hbase-0.94.x and older versions only. We believe this is - fixed in hbase-0.96.0 and beyond (let us know if we have it wrong). There - should be no need of the below modification to /etc/hosts in later - versions of HBase. + Loopback IP - HBase 0.94.x and earlier + The below advice is for hbase-0.94.x and older versions only. This is fixed in + hbase-0.96.0 and beyond. - HBase expects the loopback IP address to be 127.0.0.1. Ubuntu and some other - distributions, for example, will default to 127.0.1.1 and this will cause problems for you. - See Why does - HBase care about /etc/hosts? for details. - + In HBase 0.94.x and earlier, HBase expected the loopback IP address to be 127.0.0.1. Ubuntu + and some other distributions default to 127.0.1.1 and this will cause problems for you. See Why does HBase + care about /etc/hosts? for details. + + Example /etc/hosts File for Ubuntu + The following /etc/hosts file works correctly for HBase 0.94.x + and earlier, on Ubuntu. Use this as a template if you run into trouble.
+ 127.0.0.1 localhost 127.0.0.1 ubuntu.ubuntu-domain ubuntu - - + +
@@ -89,159 +94,611 @@
- Download and unpack the latest stable release. + Get Started with HBase - Choose a download site from this list of + Download, Configure, and Start HBase + + Choose a download site from this list of Apache Download Mirrors. Click on the suggested top link. This will take you to a mirror of HBase Releases. Click on the folder named stable and then - download the file that ends in .tar.gz to your local filesystem; e.g. - hbase-0.94.2.tar.gz. - - Decompress and untar your download and then change into the unpacked directory. - - .tar.gz -$ cd hbase-]]> - - - At this point, you are ready to start HBase. But before starting it, edit - conf/hbase-site.xml, the file you write your site-specific - configurations into. Set hbase.rootdir, the directory HBase writes data - to, and hbase.zookeeper.property.dataDir, the directory ZooKeeper writes - its data too: - - + download the binary file that ends in .tar.gz to your local filesystem. Be + sure to choose the version that corresponds with the version of Hadoop you are likely to use + later. In most cases, you should choose the file for Hadoop 2, which will be called something + like hbase-0.98.3-hadoop2-bin.tar.gz. Do not download the file ending in + src.tar.gz for now. + + + Extract the downloaded file, and change to the newly-created directory. + +$ tar xzvf hbase-]]>-hadoop2-bin.tar.gz +$ cd hbase-]]>-hadoop2/ + + + + Edit conf/hbase-site.xml, which is the main HBase configuration + file. At this time, you only need to specify the directory on the local filesystem where + HBase and Zookeeper write data. By default, a new directory is created under /tmp. Many + servers are configured to delete the contents of /tmp upon reboot, so you should store + the data elsewhere. The following configuration will store HBase's data in the + hbase directory, in the home directory of the user called + testuser. Paste the <property> tags beneath the + <configuration> tags, which should be empty in a new HBase install. + + Example <filename>hbase-site.xml</filename> for Standalone HBase + hbase.rootdir - file:///DIRECTORY/hbase + file:///home/testuser/hbase hbase.zookeeper.property.dataDir - /DIRECTORY/zookeeper + /home/testuser/zookeeper -]]> - Replace DIRECTORY in the above with the path to the directory you - would have HBase and ZooKeeper write their data. By default, - hbase.rootdir is set to /tmp/hbase-${user.name} - and similarly so for the default ZooKeeper data location which means you'll lose all your - data whenever your server reboots unless you change it (Most operating systems clear - /tmp on restart). -
+ + ]]> + + + You do not need to create the HBase data directory. HBase will do this for you. If + you create the directory, HBase will attempt to do a migration, which is not what you + want. + + + The bin/start-hbase.sh script is provided as a convenient way + to start HBase. Issue the command, and if all goes well, a message is logged to standard + output showing that HBase started successfully. You can use the jps + command to verify that you have one running process called HMaster + and at least one called HRegionServer. + Java needs to be installed and available. If you get an error indicating that + Java is not installed, but it is on your system, perhaps in a non-standard location, + edit the conf/hbase-env.sh file and modify the + JAVA_HOME setting to point to the directory that contains + bin/java on your system (a sketch of such an edit follows). + +
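As a sketch, the relevant line in conf/hbase-env.sh might look like the following after editing; the JDK location shown is an example only, so substitute the path to your own installation:

# The java implementation to use.
export JAVA_HOME=/usr/lib/jvm/java-7-oracle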
- Start HBase - - Now start HBase: - $ ./bin/start-hbase.sh -starting Master, logging to logs/hbase-user-master-example.org.out - - You should now have a running standalone HBase instance. In standalone mode, HBase runs - all daemons in the one JVM; i.e. both the HBase and ZooKeeper daemons. HBase logs can be - found in the logs subdirectory. Check them out especially if it seems - HBase had trouble starting. - - - Is <application>java</application> installed? - - All of the above presumes a 1.6 version of Oracle java is - installed on your machine and available on your path (See ); i.e. when you type java, you see output - that describes the options the java program takes (HBase requires java 6). If this is not - the case, HBase will not start. Install java, edit conf/hbase-env.sh, - uncommenting the JAVA_HOME line pointing it to your java install, then - retry the steps above. -
- -
- Shell Exercises - - Connect to your running HBase via the shell. - - ' for list of supported commands. -Type "exit" to leave the HBase Shell -Version: 0.90.0, r1001068, Fri Sep 24 13:55:42 PDT 2010 - -hbase(main):001:0>]]> - - Type help and then <RETURN> to see a listing - of shell commands and options. Browse at least the paragraphs at the end of the help - emission for the gist of how variables and command arguments are entered into the HBase - shell; in particular note how table names, rows, and columns, etc., must be quoted. - - Create a table named test with a single column family named - cf. Verify its creation by listing all tables and then insert some - values. - - create 'test', 'cf' + + Use HBase For the First Time + + Connect to HBase. + Connect to your running instance of HBase using the hbase shell + command, located in the bin/ directory of your HBase + install. In this example, some usage and version information that is printed when you + start HBase Shell has been omitted. The HBase Shell prompt ends with a + > character. + +$ ./bin/hbase shell +hbase(main):001:0> + + + + Display HBase Shell Help Text. + Type help and press Enter to display some basic usage + information for HBase Shell, as well as several example commands. Notice that table + names, rows, and columns must all be enclosed in quote characters. + + + Create a table. + Use the create command to create a new table. You must specify the + table name and the ColumnFamily name. + +hbase> create 'test', 'cf' 0 row(s) in 1.2200 seconds -hbase(main):003:0> list 'test' -.. -1 row(s) in 0.0550 seconds -hbase(main):004:0> put 'test', 'row1', 'cf:a', 'value1' -0 row(s) in 0.0560 seconds -hbase(main):005:0> put 'test', 'row2', 'cf:b', 'value2' -0 row(s) in 0.0370 seconds -hbase(main):006:0> put 'test', 'row3', 'cf:c', 'value3' -0 row(s) in 0.0450 seconds]]> + + + + List Information About your Table + Use the list command to confirm your table exists. + +hbase> list 'test' +TABLE +test +1 row(s) in 0.0350 seconds - Above we inserted 3 values, one at a time. The first insert is at - row1, column cf:a with a value of - value1. Columns in HBase are comprised of a column family prefix -- - cf in this example -- followed by a colon and then a column qualifier - suffix (a in this case). +=> ["test"] + + + + Put data into your table. + To put data into your table, use the put command. + +hbase> put 'test', 'row1', 'cf:a', 'value1' +0 row(s) in 0.1770 seconds - Verify the data insert by running a scan of the table as follows + +hbase> put 'test', 'row2', 'cf:b', 'value2' +0 row(s) in 0.0160 seconds - scan 'test' -ROW COLUMN+CELL -row1 column=cf:a, timestamp=1288380727188, value=value1 -row2 column=cf:b, timestamp=1288380738440, value=value2 -row3 column=cf:c, timestamp=1288380747365, value=value3 -3 row(s) in 0.0590 seconds]]> +hbase> put 'test', 'row3', 'cf:c', 'value3' +0 row(s) in 0.0260 seconds + + Here, we insert three values, one at a time. The first insert is at + row1, column cf:a, with a value of + value1. Columns in HBase are comprised of a column family prefix, + cf in this example, followed by a colon and then a column qualifier + suffix, a in this case. + + + Scan the table for all data at once. + One of the ways to get data from HBase is to scan. Use the scan + command to scan the table for data. You can limit your scan, but for now, all data is + fetched.
+ +hbase> scan 'test' +ROW COLUMN+CELL + row1 column=cf:a, timestamp=1403759475114, value=value1 + row2 column=cf:b, timestamp=1403759492807, value=value2 + row3 column=cf:c, timestamp=1403759503155, value=value3 +3 row(s) in 0.0440 seconds + + + + Get a single row of data. + To get a single row of data at a time, use the get command. + +hbase> get 'test', 'row1' +COLUMN CELL + cf:a timestamp=1403759475114, value=value1 +1 row(s) in 0.0230 seconds + + + + Disable a table. + If you want to delete a table or change its settings, as well as in some other + situations, you need to disable the table first, using the disable + command. You can re-enable it using the enable command. + +hbase> disable 'test' +0 row(s) in 1.6270 seconds - Get a single row - - get 'test', 'row1' -COLUMN CELL -cf:a timestamp=1288380727188, value=value1 -1 row(s) in 0.0400 seconds]]> - - Now, disable and drop your table. This will clean up everything done above. - - h disable 'test' -0 row(s) in 1.0930 seconds -hbase(main):013:0> drop 'test' -0 row(s) in 0.0770 seconds ]]> - - Exit the shell by typing exit. - - exit]]> +hbase> enable 'test' +0 row(s) in 0.4500 seconds + + Disable the table again if you tested the enable command above: + +hbase> disable 'test' +0 row(s) in 1.6270 seconds + + + + Drop the table. + To drop (delete) a table, use the drop command. + +hbase> drop 'test' +0 row(s) in 0.2900 seconds + + + + Exit the HBase Shell. + To exit the HBase Shell and disconnect from your cluster, use the + quit command. HBase is still running in the background. + + + + + Stop HBase + + In the same way that the bin/start-hbase.sh script is provided + to conveniently start all HBase daemons, the bin/stop-hbase.sh + script stops them. + +$ ./bin/stop-hbase.sh +stopping hbase.................... +$ + + + + After issuing the command, it can take several minutes for the processes to shut + down. Use the jps command to be sure that the HMaster and HRegionServer + processes are shut down. + +
-
- Stopping HBase - - Stop your hbase instance by running the stop script. - - $ ./bin/stop-hbase.sh -stopping hbase............... +
+ Intermediate - Pseudo-Distributed Local Install + After working your way through , you can re-configure HBase + to run in pseudo-distributed mode. Pseudo-distributed mode means + that HBase still runs completely on a single host, but each HBase daemon (HMaster, + HRegionServer, and ZooKeeper) runs as a separate process. By default, unless you configure the + hbase.rootdir property as described in , your data + is still stored in /tmp/. In this walk-through, we store your data in + HDFS instead, assuming you have HDFS available. You can skip the HDFS configuration to + continue storing your data in the local filesystem. + Hadoop Configuration + This procedure assumes that you have configured Hadoop and HDFS on your local system + or on a remote system, and that they are running and available. It also assumes you are + using Hadoop 2. Currently, the documentation on the Hadoop website does not include a + quick start for Hadoop 2, but the guide at http://www.alexjf.net/blog/distributed-systems/hadoop-yarn-installation-definitive-guide + is a good starting point. + + + + Stop HBase if it is running. + If you have just finished and HBase is still running, + stop it. This procedure will create a totally new directory where HBase will store its + data, so any databases you created before will be lost. + + + Configure HBase. + + Edit the hbase-site.xml configuration. First, add the following + property, which directs HBase to run in distributed mode, with one JVM instance per + daemon. + + + hbase.cluster.distributed + true + + ]]> + Next, change the hbase.rootdir from the local filesystem to the address + of your HDFS instance, using the hdfs:// URI syntax. In this example, + HDFS is running on the localhost at port 8020. + + hbase.rootdir + hdfs://localhost:8020/hbase + + ]]> + + You do not need to create the directory in HDFS. HBase will do this for you. If you + create the directory, HBase will attempt to do a migration, which is not what you + want. + + + Start HBase. + Use the bin/start-hbase.sh command to start HBase. If your + system is configured correctly, the jps command should show the + HMaster and HRegionServer processes running. + + + Check the HBase directory in HDFS. + If everything worked correctly, HBase created its directory in HDFS. In the + configuration above, it is stored in /hbase/ on HDFS. You can use + the hadoop fs command in Hadoop's bin/ directory + to list this directory. + +$ ./bin/hadoop fs -ls /hbase +Found 7 items +drwxr-xr-x - hbase users 0 2014-06-25 18:58 /hbase/.tmp +drwxr-xr-x - hbase users 0 2014-06-25 21:49 /hbase/WALs +drwxr-xr-x - hbase users 0 2014-06-25 18:48 /hbase/corrupt +drwxr-xr-x - hbase users 0 2014-06-25 18:58 /hbase/data +-rw-r--r-- 3 hbase users 42 2014-06-25 18:41 /hbase/hbase.id +-rw-r--r-- 3 hbase users 7 2014-06-25 18:41 /hbase/hbase.version +drwxr-xr-x - hbase users 0 2014-06-25 21:49 /hbase/oldWALs + + + + Create a table and populate it with data. + You can use the HBase Shell to create a table, populate it with data, scan and get + values from it, using the same procedure as in . + + + Start and stop a backup HBase Master (HMaster) server. + + Running multiple HMaster instances on the same hardware does not make sense in a + production environment, in the same way that running a pseudo-distributed cluster does + not make sense for production. This step is offered for testing and learning purposes + only. + + The HMaster server controls the HBase cluster.
You can start up to 9 backup HMaster + servers, which makes 10 total HMasters, counting the primary. To start a backup HMaster, + use the local-master-backup.sh script. For each backup master you want to + start, add a parameter representing the port offset for that master. Each HMaster uses + two ports (16000 and 16010 by default). The port offset is added to these ports, so + using an offset of 2, the first backup HMaster would use ports 16002 and 16012. The + following command starts 3 backup servers using ports 16002/16012, 16003/16013, and + 16005/16015. + +$ ./bin/local-master-backup.sh 2 3 5 + + To kill a backup master without killing the entire cluster, you need to find its + process ID (PID). The PID is stored in a file with a name like + /tmp/hbase-USER-X-master.pid. + The only contents of the file are the PID. You can use the kill -9 + command to kill that PID. The following command will kill the master with port offset 1, + but leave the cluster running: + +$ cat /tmp/hbase-testuser-1-master.pid |xargs kill -9 + + + + Start and stop additional RegionServers + The HRegionServer manages the data in its StoreFiles as directed by the HMaster. + Generally, one HRegionServer runs per node in the cluster. Running multiple + HRegionServers on the same system can be useful for testing in pseudo-distributed mode. + The local-regionservers.sh command allows you to run multiple + RegionServers. It works in a similar way to the + local-master-backup.sh command, in that each parameter you provide + represents the port offset for an instance. Each RegionServer requires two ports, and + the default ports are 16200 and 16300. You can run 99 additional RegionServers, or 100 + total, on a server. The following command starts four additional + RegionServers, running on sequential ports starting at 16202/16302. + +$ ./bin/local-regionservers.sh start 2 3 4 5 + + To stop a RegionServer manually, use the local-regionservers.sh + command with the stop parameter and the offset of the server to + stop. + $ ./bin/local-regionservers.sh stop 3 + + + Stop HBase. + You can stop HBase the same way as in the procedure, using the + bin/stop-hbase.sh command. +
+ +
+ Advanced - Fully Distributed + In reality, you need a fully-distributed configuration to fully test HBase and to use it + in real-world scenarios. In a distributed configuration, the cluster contains multiple + nodes, each of which runs one or more HBase daemons. These include primary and backup Master + instances, multiple ZooKeeper nodes, and multiple RegionServer nodes. + This advanced quickstart adds two more nodes to your cluster. The architecture will be + as follows: + + Distributed Cluster Demo Architecture + + + + Node Name + Master + ZooKeeper + RegionServer + + + + + node-a.example.com + yes + yes + no + + + node-b.example.com + backup + yes + yes + + + node-c.example.com + no + yes + yes + + + +
This quickstart assumes that each node is a virtual machine and that they are all on the + same network. It builds upon the previous quickstart, , + assuming that the system you configured in that procedure is now node-a. Stop HBase on node-a + before continuing. + + Be sure that all the nodes have full access to communicate, and that no firewall rules + are in place which could prevent them from talking to each other. If you see any errors like + no route to host, check your firewall. + + + Configure Password-Less SSH Access + node-a needs to be able to log into node-b and + node-c (and to itself) in order to start the daemons. The easiest way to accomplish this is + to use the same username on all hosts, and configure password-less SSH login from + node-a to each of the others. + + On <code>node-a</code>, generate a key pair. + While logged in as the user who will run HBase, generate an SSH key pair, using the + following command: + + $ ssh-keygen -t rsa + If the command succeeds, the location of the key pair is printed to standard output. + The default name of the public key is id_rsa.pub. + + + Create the directory that will hold the shared keys on the other nodes. + On node-b and node-c, log in as the HBase user and create + a .ssh/ directory in the user's home directory, if it does not + already exist. If it already exists, be aware that it may already contain other keys. + + + Copy the public key to the other nodes. + Securely copy the public key from node-a to each of the nodes, by + using scp or some other secure means. On each of the other nodes, + create a new file called .ssh/authorized_keys if it does + not already exist, and append the contents of the + id_rsa.pub file to the end of it. Note that you also need to do + this for node-a itself. + $ cat id_rsa.pub >> ~/.ssh/authorized_keys + + + Test password-less login. + If you performed the procedure correctly, then when you SSH from node-a to + either of the other nodes using the same username, you should not be prompted for a password. + + + + Since node-b will run a backup Master, repeat the procedure above, + substituting node-b everywhere you see node-a. Be sure not to + overwrite your existing .ssh/authorized_keys files, but concatenate + the new key onto the existing file using the >> operator rather than + the > operator. + + + + + Prepare <code>node-a</code> + node-a will run your primary master and ZooKeeper processes, but no + RegionServers. + + Stop the RegionServer from starting on <code>node-a</code>. + Edit conf/regionservers and remove the line which contains + localhost. Add lines with the hostnames or IP addresses for + node-b and node-c. Even if you did want to run a + RegionServer on node-a, you should refer to it by the hostname the other + servers would use to communicate with it. In this case, that would be + node-a.example.com. This enables you to distribute the + configuration to each node of your cluster without any hostname conflicts. Save the file. + + + Configure HBase to use <code>node-b</code> as a backup master. + Create a new file in conf/ called + backup-masters, and add a new line to it with the hostname for + node-b. In this demonstration, the hostname is + node-b.example.com. + + + Configure ZooKeeper + In reality, you should carefully consider your ZooKeeper configuration. You can find + out more about configuring ZooKeeper in . This configuration will direct HBase to start and manage a + ZooKeeper instance on each node of the cluster.
On node-a, edit conf/hbase-site.xml and add the + following properties. + + hbase.zookeeper.quorum + node-a.example.com,node-b.example.com,node-c.example.com + + + hbase.zookeeper.property.dataDir + /usr/local/zookeeper + + ]]> + + + Everywhere in your configuration that you have referred to node-a as + localhost, change the reference to point to the hostname that + the other nodes will use to refer to node-a. In these examples, the + hostname is node-a.example.com. + + + + Prepare <code>node-b</code> and <code>node-c</code> + node-b will run a backup master server and a ZooKeeper instance. + + Download and unpack HBase. + Download and unpack HBase to node-b, just as you did for the standalone + and pseudo-distributed quickstarts. + + + Copy the configuration files from <code>node-a</code> to <code>node-b</code> and + <code>node-c</code>. + Each node of your cluster needs to have the same configuration information. Copy the + contents of the conf/ directory to the conf/ + directory on node-b and node-c. + + + + Start and Test Your Cluster + + Be sure HBase is not running on any node. + If you forgot to stop HBase from previous testing, you will have errors. Check to + see whether HBase is running on any of your nodes by using the jps + command. Look for the processes HMaster, + HRegionServer, and HQuorumPeer. If they exist, + kill them. + + + Start the cluster. + On node-a, issue the start-hbase.sh command. Your + output will be similar to that below. + +$ bin/start-hbase.sh +node-c.example.com: starting zookeeper, logging to /home/hbuser/hbase-0.98.3-hadoop2/bin/../logs/hbase-hbuser-zookeeper-node-c.example.com.out +node-a.example.com: starting zookeeper, logging to /home/hbuser/hbase-0.98.3-hadoop2/bin/../logs/hbase-hbuser-zookeeper-node-a.example.com.out +node-b.example.com: starting zookeeper, logging to /home/hbuser/hbase-0.98.3-hadoop2/bin/../logs/hbase-hbuser-zookeeper-node-b.example.com.out +starting master, logging to /home/hbuser/hbase-0.98.3-hadoop2/bin/../logs/hbase-hbuser-master-node-a.example.com.out +node-c.example.com: starting regionserver, logging to /home/hbuser/hbase-0.98.3-hadoop2/bin/../logs/hbase-hbuser-regionserver-node-c.example.com.out +node-b.example.com: starting regionserver, logging to /home/hbuser/hbase-0.98.3-hadoop2/bin/../logs/hbase-hbuser-regionserver-node-b.example.com.out +node-b.example.com: starting master, logging to /home/hbuser/hbase-0.98.3-hadoop2/bin/../logs/hbase-hbuser-master-nodeb.example.com.out + + ZooKeeper starts first, followed by the master, then the RegionServers, and finally + the backup masters. + + + Verify that the processes are running. + On each node of the cluster, run the jps command and verify that + the correct processes are running on each server. You may see additional Java processes + running on your servers as well, if they are used for other purposes. + + <code>node-a</code> <command>jps</command> Output + +$ jps +20355 Jps +20071 HQuorumPeer +20137 HMaster + + + + <code>node-b</code> <command>jps</command> Output + +$ jps +15930 HRegionServer +16194 Jps +15838 HQuorumPeer +16010 HMaster + + + + <code>node-c</code> <command>jps</command> Output + +$ jps +13901 Jps +13639 HQuorumPeer +13737 HRegionServer + + + + ZooKeeper Process Name + The HQuorumPeer process is a ZooKeeper instance which is controlled + and started by HBase. If you use ZooKeeper this way, it is limited to one instance per + cluster node and is appropriate for testing only. If ZooKeeper is run outside of + HBase, the process is called QuorumPeer.
For more about ZooKeeper + configuration, including using an external ZooKeeper instance with HBase, see . + + + + Browse to the Web UI. + + Web UI Port Changes + In HBase newer than 0.98.x, the HTTP ports used by the HBase Web UI changed from + 60010 for the Master and 60030 for each RegionServer to 16010 for the Master and 16030 + for the RegionServer. + + If everything is set up correctly, you should be able to connect to the UI for the + Master at http://node-a.example.com:60010/, or to the backup Master at + http://node-b.example.com:60010/, using a + web browser. If you can connect via localhost but not from another host, + check your firewall rules. You can see the web UI for each of the RegionServers at port + 60030 of their IP addresses, or by clicking their links in the web UI for the + Master. + + + Test what happens when nodes or services disappear. + With a three-node cluster like you have configured, things will not be very + resilient. Still, you can test what happens when the primary Master or a RegionServer + disappears, by killing the processes and watching the logs. One way to run such an + experiment is sketched below. + +
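As a sketch of one such experiment (hostnames and file names are the placeholders used throughout this quickstart, and the PID shown is hypothetical), kill the RegionServer process on node-c and watch the Master react in its log on node-a:

$ ssh node-c.example.com
$ jps                                  # note the PID printed next to HRegionServer
$ kill -9 <HRegionServer-PID>          # simulate an abrupt process failure
$ exit
$ tail -f logs/hbase-*-master-*.log    # on node-a, watch the Master log as it detects the lost RegionServer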
+
Where to go next - The above described standalone setup is good for testing and experiments only. In the - next chapter, , we'll go into depth on the different HBase run modes, system - requirements running HBase, and critical configurations setting up a distributed HBase - deploy. + The next chapter, , gives more information about the different HBase run modes, + system requirements for running HBase, and critical configuration areas for setting up a + distributed HBase cluster.
-