Apache HBase (TM) is a distributed, column-oriented store, modeled after Google's BigTable. Apache HBase is built on top of Hadoop for its MapReduce and distributed file system implementation. All these projects are open-source and part of the Apache Software Foundation.
As distributed, large-scale platforms, the Hadoop and HBase projects mainly focus on *nix environments for production installations. However, because both projects are developed in Java, they are portable across platforms and hence also run on the Windows operating system. For ease of development, the projects rely on Cygwin to provide a *nix-like environment on Windows in which to run the shell scripts.
This document explains the intricacies of running Apache HBase on Windows using Cygwin as an all-in-one, single-node installation for testing and development. The HBase Overview and QuickStart guides, on the other hand, go a long way in explaining how to set up HBase in more complex deployment scenarios.
For running Apache HBase on Windows, 3 technologies are required: Java, Cygwin and SSH. The following paragraphs detail the installation of each of the aforementioned technologies.
HBase depends on the Java Platform, Standard Edition, 6 Release. The target system therefore needs at least the Java Runtime Environment (JRE); however, if the system will also be used for development, the Java Development Kit (JDK) is preferred. You can download the latest versions of both from Sun's download page. Installation is a simple GUI wizard that guides you through the process.
Cygwin is probably the oddest technology in this solution stack. It provides a dynamic link library (DLL) that emulates most of a *nix environment on Windows. On top of that, a whole bunch of the most common *nix tools are supplied. Combined, the DLL and the tools form a very *nix-like environment on Windows.
For installation, Cygwin provides the setup.exe utility that tracks the versions of all installed components on the target system and provides the mechanism for installing or updating everything from the mirror sites of Cygwin.
To support installation, the setup.exe utility uses 2 directories on the target system: the Root directory for Cygwin (defaults to C:\cygwin), which will become / within the eventual Cygwin installation, and the Local Package directory (e.g. C:\cygsetup), which is the cache where setup.exe stores the packages before they are installed. The cache must not be the same folder as the Cygwin root.
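To make the relationship between the Root directory and / concrete, here is a minimal sketch of how a path inside Cygwin corresponds to a Windows path. The helper name cyg_to_win and the variable CYGWIN_ROOT_WIN are made up for this illustration and are not part of Cygwin; the default value simply mirrors the C:\cygwin default mentioned above, written with forward slashes. The sketch ignores special mounts such as /cygdrive.

```shell
#!/bin/sh
# Hypothetical helper: translate a Cygwin path into the Windows path it
# corresponds to, given the Windows location of the Cygwin Root directory.
# CYGWIN_ROOT_WIN is an assumed variable name, not something Cygwin defines.
CYGWIN_ROOT_WIN="${CYGWIN_ROOT_WIN:-C:/cygwin}"

cyg_to_win() {
    case "$1" in
        # "/" itself is the Root directory.
        /)  printf '%s\n' "$CYGWIN_ROOT_WIN" ;;
        # Anything below "/" is appended to the Root directory.
        /*) printf '%s%s\n' "$CYGWIN_ROOT_WIN" "$1" ;;
        # Relative paths cannot be mapped without knowing the current directory.
        *)  echo "cyg_to_win: expected an absolute path, got: $1" >&2
            return 1 ;;
    esac
}

cyg_to_win /
cyg_to_win /usr/local
```

On a real Cygwin installation the cygpath utility (for example cygpath -w /usr/local) performs this translation properly, including mount points, so the function above is only meant to show the idea.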
Perform the following steps to install Cygwin; they are described in detail in the 2nd chapter of the Cygwin User's Guide:
Make sure you have Administrator privileges on the target system.
Create the Root and Local Package folders, for example C:\cygwin\root and C:\cygwin\setup.
Download the setup.exe utility and save it to the Local Package directory.
Run the setup.exe utility, choosing the Install from Internet option.
[...] setup.exe utility in the Local Package folder.
Create a CYGWIN_HOME system-wide environment variable that points to your Root directory.
Add %CYGWIN_HOME%\bin to the end of your PATH environment variable.
Run the Cygwin.bat command in the Root folder. You should end up in a terminal window that is running a Bash shell. Test the shell by issuing the following commands:
cd / should take you to the Root directory in Cygwin;
ls should list all files and folders in the current directory.
Use the exit command to end the terminal.
HBase (and Hadoop) rely on SSH for inter-process and inter-node communication and for launching remote commands. SSH will be provisioned on the target system via Cygwin, which supports running Cygwin programs as Windows services!
Rerun the setup.exe utility.
Click the Next button until the Select Packages panel is shown.
Click the View button to toggle to the list view, which is ordered alphabetically on Package, making it easier to find the packages we'll need.
For each package you need, click its status word (normally Skip) so it's marked for installation. Use the Next button to download and install the packages.
Download the latest release of Apache HBase from the website. As the Apache HBase distributable is just a zipped archive, installation is as simple as unpacking the archive so it ends up in its final installation directory. Note that HBase has to be installed in Cygwin; a good suggestion is to use /usr/local/ (or [Root directory]\usr\local in Windows slang). You should end up with a /usr/local/hbase-<version> installation in Cygwin.
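As a rough sketch of the unpacking step, the function below extracts a release archive into a target directory. It assumes the archive is a gzip-compressed tarball (the text above only says "zipped archive"), and the function name install_hbase is hypothetical; both arguments are placeholders you supply yourself.

```shell
#!/bin/sh
# Unpack an HBase release archive into its installation directory.
# Assumption: the archive is a gzip-compressed tarball.
install_hbase() {
    archive="$1"               # path to the downloaded release archive
    target="${2:-/usr/local}"  # installation parent directory

    if [ ! -f "$archive" ]; then
        echo "install_hbase: archive not found: $archive" >&2
        return 1
    fi

    # Create the target if needed, then extract the archive into it.
    mkdir -p "$target" || return 1
    tar -xzf "$archive" -C "$target"
}
```

Called from a Cygwin shell as install_hbase <downloaded archive> /usr/local, this should leave the unpacked release directory under /usr/local, provided the archive contains a single top-level folder.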
There are 3 parts left to configure: Java, SSH and HBase itself. The following paragraphs explain each topic in detail.
One important thing to remember in shell scripting in general (i.e. on both *nix and Windows) is that managing, manipulating and assembling path names that contain spaces can be very hard, due to the need to escape and quote those characters and strings. So we try to stay away from spaces in path names. *nix environments can help us out here very easily through symbolic links.
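The effect can be shown in a scratch directory without touching a real Java installation. The sketch below is only an illustration: all directory names are made up, and it assumes the temporary directory path itself contains no whitespace. It counts how many words the shell sees when each path is used unquoted.

```shell
#!/bin/sh
# Demonstrate, in a scratch directory, how a symbolic link hides a path
# that contains a space. All directory names below are made up.
scratch=$(mktemp -d)
mkdir -p "$scratch/Program Files/Java/jre-example/bin"
mkdir -p "$scratch/usr/local"

# Link a space-free name to the awkward location.
ln -s "$scratch/Program Files/Java/jre-example" "$scratch/usr/local/jre-example"

JAVA_DIR_WITH_SPACE="$scratch/Program Files/Java/jre-example"
JAVA_DIR_LINKED="$scratch/usr/local/jre-example"

# Prints the number of arguments it receives.
count_words() { echo $#; }

# Deliberately unquoted: the shell splits the first path at the space,
# while the linked path stays a single word.
words_with_space=$(count_words $JAVA_DIR_WITH_SPACE)
words_linked=$(count_words $JAVA_DIR_LINKED)

echo "unquoted words: original=$words_with_space linked=$words_linked"
```

Scripts that forget to quote a variable keep working with the linked path, which is the reason for preferring it in the configuration that follows.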
Create a symbolic link in /usr/local to the Java home directory by using the following command and substituting the name of your chosen Java environment:
ln -s /cygdrive/c/Program\ Files/Java/<jre name> /usr/local/<jre name>
Test your Java installation by changing to the linked directory with cd /usr/local/<jre name> and issuing the command ./bin/java -version. This should output the version of your chosen JRE.
Configuring SSH is quite elaborate, but primarily a question of launching it by default as a Windows service.
Start the Cygwin shell using Run as Administrator.
Check the permissions on the files listed below with the ls -l command. Also, notice that the auto-completion feature in the shell using <TAB> is extremely handy in these situations.
chmod +r /etc/passwd to make the passwords file readable for all
chmod u+w /etc/passwd to make the passwords file writable for the owner
chmod +r /etc/group to make the groups file readable for all
chmod u+w /etc/group to make the groups file writable for the owner
chmod 755 /var to make the var folder writable to owner and readable and executable to all
Make sure the following two lines appear before the PARANOID line:
ALL : localhost 127.0.0.1/32 : allow
ALL : [::1]/128 : allow
Run ssh-host-config and answer its prompts as follows:
If asked to overwrite /etc/ssh_config, answer yes.
If asked to overwrite /etc/sshd_config, answer yes.
At the next prompt, answer yes.
When asked to install sshd as a service, answer yes. Make sure you started your shell as Administrator!
At the next prompt, press <enter>, as the default is ntsec.
When asked to create the sshd account, answer yes.
At the next prompt, answer no, as the default will suffice.
When asked to create the cyg_server account, answer yes. Enter a password for the account.
Start the SSH service using net start sshd or cygrunsrv --start sshd. Notice that cygrunsrv is the utility that makes the process run as a Windows service. Confirm that you see a message stating that the CYGWIN sshd service was started successfully.
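Regenerate the Cygwin user and group files from the Windows accounts:
mkpasswd -cl > /etc/passwd
mkgroup --local > /etc/group
Test the SSH installation:
Use whoami to verify your userID.
Use ssh localhost to connect to the system itself.
Answer yes when presented with the server's fingerprint.
The exit command should take you back to your first shell in Cygwin.
Another exit should terminate the Cygwin shell.
All files and paths referenced below assume the HBase [installation directory] as working directory.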
HBase uses ./conf/hbase-env.sh to configure its dependencies on the runtime environment. Copy and uncomment the following lines just underneath their originals and change them to fit your environment. They should read something like:
export JAVA_HOME=/usr/local/<jre name>
export HBASE_IDENT_STRING=$HOSTNAME
The value $HOSTNAME is used as it most likely does not include spaces.
HBase also uses the hbase-default.xml file for configuration. Some properties do not resolve to existing directories because the JVM runs on Windows. This is the major issue to keep in mind when working with Cygwin: within the shell all paths are *nix-like, hence relative to the root /. However, every parameter that is consumed within the Windows processes themselves needs to be a Windows setting, hence C:\-like. Change the following properties in the configuration file, adjusting paths where necessary to conform with your own installation:
hbase.rootdir must read e.g. file:///C:/cygwin/root/tmp/hbase/data
hbase.tmp.dir must read C:/cygwin/root/tmp/hbase/tmp
hbase.zookeeper.quorum must read 127.0.0.1 because for some reason localhost doesn't seem to resolve properly on Cygwin.
Make sure the configured hbase.rootdir and hbase.tmp.dir directories exist and have the proper rights set up, e.g. by issuing a chmod 777 on them.
This should conclude the installation and configuration of Apache HBase on Windows using Cygwin. So it's time to test it.
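Before testing, the directory preparation mentioned above can be sketched as a small helper run from the Cygwin shell. The function name prepare_hbase_dirs is hypothetical, and the default of /tmp/hbase assumes the example values shown for hbase.rootdir and hbase.tmp.dir (C:/cygwin/root/tmp/hbase/... on the Windows side); adjust it if your configuration uses other locations.

```shell
#!/bin/sh
# Create the directories that hbase.rootdir and hbase.tmp.dir point to and
# open up their permissions with chmod 777, as suggested in the text.
# Assumption: both directories live under one base directory, with the
# sub-directory names "data" and "tmp" taken from the example values.
prepare_hbase_dirs() {
    base="${1:-/tmp/hbase}"
    for dir in "$base/data" "$base/tmp"; do
        mkdir -p "$dir" || return 1
        chmod 777 "$dir" || return 1
    done
}
```

Running prepare_hbase_dirs with no argument inside Cygwin would create /tmp/hbase/data and /tmp/hbase/tmp; passing another base directory as the first argument overrides the default.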
Start a Cygwin shell and cd /usr/local/hbase-<version>, preferably using auto-completion.
Start HBase using ./bin/start-hbase.sh
If prompted, answer yes.
Check the ./logs directory for any exceptions.
Start the HBase shell using ./bin/hbase shell
Create a table using create 'test', 'data'
Verify that the table exists using list
Insert data using:
put 'test', 'row1', 'data:1', 'value1'
put 'test', 'row2', 'data:2', 'value2'
put 'test', 'row3', 'data:3', 'value3'
List the data using scan 'test', which should list all the rows previously inserted. Notice how 3 new columns were added without changing the schema!
Remove the table using disable 'test' followed by drop 'test', verified by list, which should give an empty listing.
Leave the shell using exit
Stop the HBase server using the ./bin/stop-hbase.sh command, and wait for it to complete! Killing the process might corrupt your data on disk.
In case of problems, check the ./logs directory. Help is also available on IRC (#hbase@freenode.net). People are very active and keen to help out!
Now that your HBase server is running, start coding and build that next killer app on this particular, but scalable, datastore!
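For repeated testing, the shell commands from the walkthrough above can be collected into a script file. The sketch below only writes that file; the function name write_smoke_test and the file location are made up. Feeding the file to the HBase shell as a script argument is an assumption about your HBase version, so check its documentation before relying on it.

```shell
#!/bin/sh
# Write the HBase shell commands used in the walkthrough into a file so
# the same smoke test can be replayed later.
write_smoke_test() {
    cat > "$1" <<'EOF'
create 'test', 'data'
list
put 'test', 'row1', 'data:1', 'value1'
put 'test', 'row2', 'data:2', 'value2'
put 'test', 'row3', 'data:3', 'value3'
scan 'test'
disable 'test'
drop 'test'
list
exit
EOF
}
```

A possible use, with the server running and the HBase installation directory as working directory, would be write_smoke_test /tmp/smoke-test.txt followed by ./bin/hbase shell /tmp/smoke-test.txt.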