diff --git a/src/java/org/apache/hadoop/hbase/client/package-info.java b/src/java/org/apache/hadoop/hbase/client/package-info.java new file mode 100644 index 00000000000..0ad66d94ef5 --- /dev/null +++ b/src/java/org/apache/hadoop/hbase/client/package-info.java @@ -0,0 +1,157 @@ +/* + * Copyright 2009 The Apache Software Foundation + * + * Licensed to the Apache Software Foundation (ASF) under one + * or more contributor license agreements. See the NOTICE file + * distributed with this work for additional information + * regarding copyright ownership. The ASF licenses this file + * to you under the Apache License, Version 2.0 (the + * "License"); you may not use this file except in compliance + * with the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ +/** +Provides HBase Client + +

Table of Contents


Example API Usage


Once you have a running HBase, you probably want a way to hook your application up to it. + If your application is in Java, then you should use the Java API. Here's an example of what + a simple client might look like. This example assumes that you've created a table called + "myTable" with a column family called "myColumnFamily". +

+
+import java.io.IOException;
+import org.apache.hadoop.hbase.HBaseConfiguration;
+import org.apache.hadoop.hbase.client.HTable;
+import org.apache.hadoop.hbase.client.Scanner;
+import org.apache.hadoop.hbase.io.BatchUpdate;
+import org.apache.hadoop.hbase.io.Cell;
+import org.apache.hadoop.hbase.io.RowResult;
+import org.apache.hadoop.hbase.util.Bytes;
+
+public class MyClient {
+
+  public static void main(String args[]) throws IOException {
+    // You need a configuration object to tell the client where to connect.
+    // But don't worry, the defaults are pulled from the local config file.
+    HBaseConfiguration config = new HBaseConfiguration();
+
+    // This instantiates an HTable object that connects you to the "myTable"
+    // table. 
+    HTable table = new HTable(config, "myTable");
+
+    // To do any sort of update on a row, you use an instance of the BatchUpdate
+    // class. A BatchUpdate takes a row and optionally a timestamp which your
+    // updates will affect.  If no timestamp, the server applies current time
+    // to the edits.
+    BatchUpdate batchUpdate = new BatchUpdate("myRow");
+
+    // The BatchUpdate#put method takes a byte [] (or String) that designates
+    // what cell you want to put a value into, and a byte array that is the
+    // value you want to store. Note that if you want to store Strings, you
+    // have to getBytes() from the String for HBase to store it since HBase is
+    // all about byte arrays. The same goes for primitives like ints and longs
+    // and user-defined classes - you must find a way to reduce it to bytes.
+    // The Bytes class from the hbase util package has utilities for going from
+    // String to UTF-8 bytes and back again, and helpers for other base types.
+    batchUpdate.put("myColumnFamily:columnQualifier1", 
+      Bytes.toBytes("columnQualifier1 value!"));
+
+    // Deletes are batch operations in HBase as well. 
+    batchUpdate.delete("myColumnFamily:cellIWantDeleted");
+
+    // Once you've done all the puts you want, you need to commit the results.
+    // The HTable#commit method takes the BatchUpdate instance you've been 
+    // building and pushes the batch of changes you made into HBase.
+    table.commit(batchUpdate);
+
+    // Now, to retrieve the data we just wrote. The values that come back are
+    // Cell instances. A Cell is a combination of the value as a byte array and
+    // the timestamp the value was stored with. If you happen to know that the 
+    // value contained is a string and want an actual string, then you must 
+    // convert it yourself.
+    Cell cell = table.get("myRow", "myColumnFamily:columnQualifier1");
+    // This could throw a NullPointerException if there was no value at the cell
+    // location.
+    String valueStr = Bytes.toString(cell.getValue());
+    
+    // Sometimes, you won't know the row you're looking for. In this case, you
+    // use a Scanner. This will give you cursor-like interface to the contents
+    // of the table.
+    Scanner scanner = 
+      // we want to get back only "myColumnFamily:columnQualifier1" when we iterate
+      table.getScanner(new String[]{"myColumnFamily:columnQualifier1"});
+    
+    
+    // Scanners return RowResult instances. A RowResult is like the
+    // row key and the columns all wrapped up in a single Object. 
+    // RowResult#getRow gives you the row key. RowResult also implements 
+    // Map, so you can get to your column results easily. 
+    
+    // Now, for the actual iteration. One way is to use a while loop like so:
+    RowResult rowResult = scanner.next();
+    
+    while (rowResult != null) {
+      // print out the row we found and the columns we were looking for
+      System.out.println("Found row: " + Bytes.toString(rowResult.getRow()) +
+        " with value: " + rowResult.get(Bytes.toBytes("myColumnFamily:columnQualifier1")));
+      rowResult = scanner.next();
+    }
+    
+    // The other approach is to use a foreach loop. Scanners are iterable!
+    // (Note that the while loop above already exhausted this scanner; open a
+    // fresh scanner before iterating a second time.)
+    for (RowResult result : scanner) {
+      // print out the row we found and the columns we were looking for
+      System.out.println("Found row: " + Bytes.toString(result.getRow()) +
+        " with value: " + result.get(Bytes.toBytes("myColumnFamily:columnQualifier1")));
+    }
+    
+    // Make sure you close your scanners when you are done!
+    // It's probably best to put the iteration into a try/finally with the below
+    // inside the finally clause.
+    scanner.close();
+  }
+}
+
+
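The comments above stress that HBase traffics only in byte arrays. The round trip that Bytes.toBytes/Bytes.toString perform for Strings can be illustrated with plain standard-library Java (a sketch that compiles without HBase on the classpath; Bytes uses UTF-8 encoding):

```java
import java.nio.charset.StandardCharsets;

public class ByteRoundTrip {
  public static void main(String[] args) {
    // HBase stores values as raw bytes; a String must be encoded on the way
    // in and decoded on the way out, analogous to Bytes.toBytes/Bytes.toString.
    String original = "columnQualifier1 value!";
    byte[] stored = original.getBytes(StandardCharsets.UTF_8);
    String retrieved = new String(stored, StandardCharsets.UTF_8);
    System.out.println(retrieved);
  }
}
```

The same reduce-to-bytes step applies to primitives and user-defined classes before they can be stored.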

There are many other methods for putting data into and getting data out of + HBase, but these examples should get you started. See the HTable javadoc for + more methods. Additionally, there are methods for managing tables in the + HBaseAdmin class.
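As a sketch of how the "myTable" table used above might be created programmatically with HBaseAdmin (this assumes the same era of the API and a running HBase; consult the HBaseAdmin javadoc for the exact signatures):

```java
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.HColumnDescriptor;
import org.apache.hadoop.hbase.HTableDescriptor;
import org.apache.hadoop.hbase.client.HBaseAdmin;

public class CreateMyTable {
  public static void main(String[] args) throws Exception {
    HBaseConfiguration config = new HBaseConfiguration();
    HBaseAdmin admin = new HBaseAdmin(config);
    // Describe the table and its single column family, then create it.
    HTableDescriptor desc = new HTableDescriptor("myTable");
    desc.addFamily(new HColumnDescriptor("myColumnFamily:"));
    admin.createTable(desc);
  }
}
```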


If your client is NOT Java, then you should consider the Thrift or REST + libraries.


Related Documentation



+ + + +*/ +package org.apache.hadoop.hbase.client; diff --git a/src/java/overview.html b/src/java/overview.html index ce6873bb4bc..db968a10077 100644 --- a/src/java/overview.html +++ b/src/java/overview.html @@ -27,9 +27,9 @@

Requirements

Windows

-If you are running HBase on Windows, you must install Cygwin. Additionally, it is strongly recommended that you add or append to the following environment variables. If you install Cygwin in a location that is not C:\cygwin you should modify the following appropriately. +If you are running HBase on Windows, you must install Cygwin. +Additionally, it is strongly recommended that you add or append to the following +environment variables. If you install Cygwin in a location that is not C:\cygwin you +should modify the following appropriately.

+

 HOME=c:\cygwin\home\jim
 ANT_HOME=(wherever you installed ant)
@@ -58,27 +76,33 @@ JAVA_HOME=(wherever you installed java)
 PATH=C:\cygwin\bin;%JAVA_HOME%\bin;%ANT_HOME%\bin; other windows stuff 
 SHELL=/bin/bash
 
-For additional information, see the Hadoop Quick Start Guide +
+For additional information, see the +Hadoop Quick Start Guide

Getting Started

-What follows presumes you have obtained a copy of HBase and are installing +What follows presumes you have obtained a copy of HBase (see Releases) and are installing for the first time. If you are upgrading your HBase instance, see Upgrading. +

Three modes are described: standalone, pseudo-distributed (where all servers run on a single host), and fully-distributed. If you are new to HBase, start by following the standalone instructions.

-Define ${HBASE_HOME} to be the location of the root of your HBase installation, e.g. +Whatever your mode, define ${HBASE_HOME} to be the location of the root of your HBase installation, e.g. /usr/local/hbase. Edit ${HBASE_HOME}/conf/hbase-env.sh. In this file you can set the heapsize for HBase, etc. At a minimum, set JAVA_HOME to point at the root of your Java installation.
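A minimal hbase-env.sh might contain the following (paths and heap size are illustrative; adjust them to your installation):

```
# ${HBASE_HOME}/conf/hbase-env.sh -- illustrative values only
export JAVA_HOME=/usr/lib/jvm/java-6-sun
# Optional: heap size for the HBase daemons, in megabytes.
export HBASE_HEAPSIZE=1000
```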

+

Standalone Mode

If you are running a standalone operation, there should be nothing further to configure; proceed to Running and Confirming Your Installation. If you are running a distributed operation, continue reading.

-

Distributed Operation

+

Distributed Operation: Pseudo- and Fully-Distributed Modes

Distributed mode requires an instance of the Hadoop Distributed File System (DFS). See the Hadoop requirements and instructions for how to set up a DFS. @@ -113,13 +137,12 @@ create them if you let it).

Fully-Distributed Operation

-For running a fully-distributed operation on more than one host, the following +

For running a fully-distributed operation on more than one host, the following configurations must be made in addition to those described in the pseudo-distributed operation section above. -A Zookeeper cluster is also required to ensure higher availability. -In hbase-site.xml, you must also configure -hbase.cluster.distributed to 'true'. -

+In this mode, a ZooKeeper cluster is required.

+

In hbase-site.xml, set hbase.cluster.distributed to 'true'. +

 <configuration>
   ...
@@ -134,43 +157,60 @@ In hbase-site.xml, you must also configure
   ...
 </configuration>
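Putting the pieces together, a fully-distributed hbase-site.xml might look like the following (the host name is illustrative; point hbase.rootdir at your HDFS namenode):

```xml
<configuration>
  <property>
    <name>hbase.rootdir</name>
    <value>hdfs://example.org:9000/hbase</value>
  </property>
  <property>
    <name>hbase.cluster.distributed</name>
    <value>true</value>
  </property>
</configuration>
```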
 
-

-Keep in mind that for a fully-distributed operation, you may not want your hbase.rootdir -to point to localhost (maybe, as in the configuration above, you will want to use -example.org). In addition to hbase-site.xml, a fully-distributed -operation requires that you also modify ${HBASE_HOME}/conf/regionservers. -regionserver lists all the hosts running HRegionServers, one host per line (This file -in HBase is like the hadoop slaves file at ${HADOOP_HOME}/conf/slaves). +

-Furthermore, you have to configure a distributed ZooKeeper cluster. -The ZooKeeper configuration file is stored at ${HBASE_HOME}/conf/zoo.cfg. -See the ZooKeeper Getting Started Guide for information about the format and options of that file. -Specifically, look at the Running Replicated ZooKeeper section. -In ${HBASE_HOME}/conf/hbase-env.sh, set the following to tell HBase not to manage its own single instance of ZooKeeper. +In fully-distributed operation, you probably want to change your hbase.rootdir +from localhost to the name of the node running the HDFS namenode. In addition +to hbase-site.xml changes, a fully-distributed operation requires that you +modify ${HBASE_HOME}/conf/regionservers. +The regionserver file lists all hosts running HRegionServers, one host per line +(This file in HBase is like the hadoop slaves file at ${HADOOP_HOME}/conf/slaves). +
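For example, the regionservers file for a three-node cluster would simply list one host per line (hypothetical host names):

```
rs1.example.org
rs2.example.org
rs3.example.org
```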

+

+A distributed HBase depends on a running ZooKeeper cluster. +The ZooKeeper configuration file for HBase is stored at ${HBASE_HOME}/conf/zoo.cfg. +See the ZooKeeper Getting Started Guide +for information about the format and options of that file. Specifically, look at the +Running Replicated ZooKeeper section. + + +After configuring zoo.cfg, in ${HBASE_HOME}/conf/hbase-env.sh, +set the following to tell HBase to STOP managing its instance of ZooKeeper. +

   ...
 # Tell HBase whether it should manage it's own instance of Zookeeper or not.
 export HBASE_MANAGES_ZK=false
 
+
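The replicated zoo.cfg referenced above might look like this (hosts, ports, and paths are illustrative; see the ZooKeeper Getting Started Guide for what each option means):

```
# ${HBASE_HOME}/conf/zoo.cfg -- illustrative three-node ensemble
tickTime=2000
initLimit=10
syncLimit=5
dataDir=/var/zookeeper
clientPort=2181
server.0=zk0.example.org:2888:3888
server.1=zk1.example.org:2888:3888
server.2=zk2.example.org:2888:3888
```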

-It's still possible to use HBase in order to start a single Zookeeper instance in fully-distributed operation. -The first thing to do is still to change ${HBASE_HOME}/conf/zoo.cfg and set a single node. -Note that leaving the value "localhost" will make it impossible to start HBase. +Though not recommended, it can be convenient to have HBase continue to manage ZooKeeper even in distributed mode (for example, when testing or taking HBase for a test drive). Change ${HBASE_HOME}/conf/zoo.cfg and set the server.0 property to the address of the node that will run ZooKeeper (leaving the default value of "localhost" will make it impossible to start HBase).

   ...
 server.0=example.org:2888:3888
+
Then on the example.org server do the following before running HBase.
 ${HBASE_HOME}/bin/hbase-daemon.sh start zookeeper
 

To stop ZooKeeper, after you've shut down HBase, do: +

+
+${HBASE_HOME}/bin/hbase-daemon.sh stop zookeeper
+
+
Be aware that this option is only recommended for testing purposes, as a failure on that node would render HBase unusable.

-

Of note, if you have made HDFS client configuration on your hadoop cluster, HBase will not see this configuration unless you do one of the following: