HBASE-738 overview.html in need of updating
git-svn-id: https://svn.apache.org/repos/asf/hadoop/hbase/trunk@676090 13f79535-47bb-0310-9956-ffa450edef68
parent 36f0d36de9
commit e6e11eae01
@@ -283,6 +283,7 @@ Trunk (unreleased changes)
               (Jean-Daniel Cryans via Stack)
    HBASE-730  On startup, rinse STARTCODE and SERVER from .META.
               (Jean-Daniel Cryans via Stack)
+   HBASE-738  overview.html in need of updating (Izaak Rubin via Stack)
 
   NEW FEATURES
    HBASE-47   Option to set TTL for columns in hbase
@@ -28,7 +28,8 @@
 <ul>
 <li>Java 1.5.x, preferably from <a href="http://www.java.com/en/download/">Sun</a>.
 </li>
-<li>Hadoop 0.17.x. This version of HBase will only run on this version of Hadoop.</a>.
+<li><a href="http://hadoop.apache.org/core/releases.html">Hadoop 0.17.x</a>. This version of HBase will
+only run on this version of Hadoop.
 </li>
 <li>
 ssh must be installed and sshd must be running to use Hadoop's
@@ -48,63 +49,75 @@ for the first time. If upgrading your
 HBase instance, see <a href="#upgrading">Upgrading</a>.
 </p>
 <p>
-<ul>
-<li><code>${HBASE_HOME}</code>: Set HBASE_HOME to the location of the HBase root: e.g. <code>/user/local/hbase</code>.
-</li>
-</ul>
-</p>
-<p>Edit <code>${HBASE_HOME}/conf/hbase-env.sh</code>. In this file you can
+Define <code>${HBASE_HOME}</code> to be the location of the root of your HBase installation, e.g.
+<code>/user/local/hbase</code>. Edit <code>${HBASE_HOME}/conf/hbase-env.sh</code>. In this file you can
 set the heapsize for HBase, etc. At a minimum, set <code>JAVA_HOME</code> to point at the root of
 your Java installation.
 </p>
 <p>
 If you are running a standalone operation, there should be nothing further to configure; proceed to
-<a href=#runandconfirm>Running and Confirming Your Installation</a>. If you are running a distributed
+<a href=#runandconfirm>Running and Confirming Your Installation</a>. If you are running a distributed
 operation, continue reading.
 </p>
 
-<h2><a name="distributed" >Distributed Operation</a></h2>
+<h2><a name="distributed">Distributed Operation</a></h2>
 <p>Distributed mode requires an instance of the Hadoop Distributed File System (DFS).
 See the Hadoop <a href="http://lucene.apache.org/hadoop/api/overview-summary.html#overview_description">
 requirements and instructions</a> for how to set up a DFS.</p>
-<p>Once you have confirmed your DFS setup, configuring HBase requires modification of the following two files:
-<code>${HBASE_HOME}/conf/hbase-site.xml</code> and <code>${HBASE_HOME}/conf/regionservers</code>.
-The former needs to be pointed at the running Hadoop DFS instance. The latter file lists
-all the members of the HBase cluster.
-</p>
-<p>Use <code>hbase-site.xml</code> to override the properties defined in
+
+<h3><a name="pseudo-distrib">Pseudo-Distributed Operation</a></h3>
+<p>A pseudo-distributed operation is simply a distributed operation run on a single host.
+Once you have confirmed your DFS setup, configuring HBase for use on one host requires modification of
+<code>${HBASE_HOME}/conf/hbase-site.xml</code>, which needs to be pointed at the running Hadoop DFS instance.
+Use <code>hbase-site.xml</code> to override the properties defined in
 <code>${HBASE_HOME}/conf/hbase-default.xml</code> (<code>hbase-default.xml</code> itself
-should never be modified). At a minimum the <code>hbase.master</code> and the
-<code>hbase.rootdir</code> properties should be redefined
-in <code>hbase-site.xml</code> to configure the <code>host:port</code> pair on which the
-HMaster runs (<a href="http://wiki.apache.org/lucene-hadoop/Hbase/HbaseArchitecture">read about the
-HBase master, regionservers, etc</a>) and to point HBase at the Hadoop filesystem to use. For
-example, adding the below to your hbase-site.xml says the master is up on port 60000 on the host
-example.org and that HBase should use the <code>/hbase</code> directory in the HDFS whose namenode
-is at port 9000, again on example.org:
+should never be modified). At a minimum the <code>hbase.rootdir</code> property should be redefined
+in <code>hbase-site.xml</code> to point HBase at the Hadoop filesystem to use. For example, adding the property
+below to your <code>hbase-site.xml</code> says that HBase should use the <code>/hbase</code> directory in the
+HDFS whose namenode is at port 9000 on your local machine:
+</p>
+<pre>
+<configuration>
+...
+<property>
+<name>hbase.rootdir</name>
+<value>hdfs://localhost:9000/hbase</value>
+<description>The directory shared by region servers.
+</description>
+</property>
+...
+</configuration>
+</pre>
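
As an aside on how the override is picked up: a client-side HBaseConfiguration layers hbase-site.xml over hbase-default.xml from the classpath, so code sees the redefined value. A minimal sketch, not part of the patch (the class name is made up):

import org.apache.hadoop.hbase.HBaseConfiguration;

public class ShowRootdir {
  public static void main(String[] args) {
    // hbase-site.xml is read after hbase-default.xml, so this prints
    // the override, e.g. hdfs://localhost:9000/hbase.
    HBaseConfiguration conf = new HBaseConfiguration();
    System.out.println(conf.get("hbase.rootdir"));
  }
}
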
+
+<h3><a name="fully-distrib">Fully-Distributed Operation</a></h3>
+For running a fully-distributed operation on more than one host, the following configurations
+must be made <i>in addition</i> to those described in the
+<a href="#pseudo-distrib">pseudo-distributed operation</a> section above. In
+<code>hbase-site.xml</code>, you must also configure <code>hbase.master</code> to the
+<code>host:port</code> pair on which the HMaster runs
+(<a href="http://wiki.apache.org/lucene-hadoop/Hbase/HbaseArchitecture">read about the HBase master,
+regionservers, etc</a>). For example, adding the below to your <code>hbase-site.xml</code> says the
+master is up on port 60000 on the host example.org:
 </p>
 <pre>
 <configuration>
 ...
 <property>
 <name>hbase.master</name>
 <value>example.org:60000</value>
 <description>The host and port that the HBase master runs at.
 </description>
 </property>
+
 <property>
 <name>hbase.rootdir</name>
 <value>hdfs://example.org:9000/hbase</value>
 <description>The directory shared by region servers.
 </description>
 </property>
+
 ...
 </configuration>
 </pre>
 <p>
-The <code>regionserver</code> file lists all the hosts running HRegionServers, one
-host per line (This file in HBase is like the hadoop slaves file at
-<code>${HADOOP_HOME}/conf/slaves</code>).
+Keep in mind that for a fully-distributed operation, you may not want your <code>hbase.rootdir</code>
+to point to localhost (maybe, as in the configuration above, you will want to use
+<code>example.org</code>). In addition to <code>hbase-site.xml</code>, a fully-distributed
+operation requires that you also modify <code>${HBASE_HOME}/conf/regionservers</code>.
+<code>regionserver</code> lists all the hosts running HRegionServers, one host per line (This file
+in HBase is like the hadoop slaves file at <code>${HADOOP_HOME}/conf/slaves</code>).
 </p>
 <p>Of note, if you have made <i>HDFS client configuration</i> on your hadoop cluster, hbase will not
 see this configuration unless you do one of the following:
@@ -114,8 +127,8 @@ see this configuration unless you do one of the following:
 <li>If only a small set of HDFS client configurations, add them to <code>hbase-site.xml</code></li>
 </ul>
 An example of such an HDFS client configuration is <code>dfs.replication</code>. If for example,
-you want to run with a replication factor of 5, hbase will make files will create files with
-the default of 3 unless you do the above to make the configuration available to hbase.
+you want to run with a replication factor of 5, hbase will create files with the default of 3 unless
+you do the above to make the configuration available to hbase.
 </p>
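
For illustration only (this snippet is not part of the patch; the factor of 5 is the example from the paragraph above), such an HDFS client setting copied into <code>hbase-site.xml</code> would follow the same shape as the other properties on this page:

<property>
<name>dfs.replication</name>
<value>5</value>
<description>HDFS replication factor for the files HBase creates.
</description>
</property>
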
 
 <h2><a name="runandconfirm">Running and Confirming Your Installation</a></h2>
@@ -125,7 +138,7 @@ the local filesystem.</p>
 <p>If you are running a distributed cluster you will need to start the Hadoop DFS daemons
 before starting HBase and stop the daemons after HBase has shut down. Start and
 stop the Hadoop DFS daemons by running <code>${HADOOP_HOME}/bin/start-dfs.sh</code>.
-Ensure it started properly by testing the put and get of files into the Hadoop filesystem.
+You can ensure it started properly by testing the put and get of files into the Hadoop filesystem.
 HBase does not normally use the mapreduce daemons. These do not need to be started.</p>
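
One way to script that put-and-get test is with Hadoop's FileSystem API; a minimal sketch, not part of the patch (the path and class name are made up, and bin/hadoop fs -put / -get from the command line works just as well):

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class DfsSmokeTest {
  public static void main(String[] args) throws Exception {
    // Connects to the filesystem named by fs.default.name in the
    // Hadoop configuration found on the classpath.
    FileSystem fs = FileSystem.get(new Configuration());
    Path p = new Path("/tmp/dfs-smoke-test");
    fs.create(p).close();             // the "put": write an empty file
    System.out.println(fs.exists(p)); // the "get": confirm it is visible
    fs.delete(p);                     // clean up
  }
}
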
 
 <p>Start HBase with the following command:
@@ -169,14 +182,12 @@ the HBase version. It does not change your install unless you explicitly ask it
 
 <div style="background-color: #cccccc; padding: 2px">
 <code><pre>
+import java.io.IOException;
 import org.apache.hadoop.hbase.client.HTable;
 import org.apache.hadoop.hbase.HBaseConfiguration;
-import org.apache.hadoop.hbase.HStoreKey;
-import org.apache.hadoop.hbase.HScannerInterface;
+import org.apache.hadoop.hbase.client.Scanner;
 import org.apache.hadoop.hbase.io.BatchUpdate;
+import org.apache.hadoop.hbase.io.Cell;
-import org.apache.hadoop.io.Text;
-import java.io.IOException;
+import org.apache.hadoop.hbase.io.RowResult;
 
 public class MyClient {
@@ -187,12 +198,12 @@ public class MyClient {
 
     // This instantiates an HTable object that connects you to the "myTable"
     // table.
-    HTable table = new HTable(config, new Text("myTable"));
+    HTable table = new HTable(config, "myTable");
 
     // To do any sort of update on a row, you use an instance of the BatchUpdate
     // class. A BatchUpdate takes a row and optionally a timestamp which your
     // updates will affect.
-    BatchUpdate batchUpdate = new BatchUpdate(new Text("myRow"));
+    BatchUpdate batchUpdate = new BatchUpdate("myRow");
 
     // The BatchUpdate#put method takes a Text that describes what cell you want
     // to put a value into, and a byte array that is the value you want to
@@ -200,11 +211,11 @@ public class MyClient {
     // from the string for HBase to understand how to store it. (The same goes
     // for primitives like ints and longs and user-defined classes - you must
     // find a way to reduce it to bytes.)
-    batchUpdate.put(new Text("myColumnFamily:columnQualifier1"),
+    batchUpdate.put("myColumnFamily:columnQualifier1",
       "columnQualifier1 value!".getBytes());
 
     // Deletes are batch operations in HBase as well.
-    batchUpdate.delete(new Text("myColumnFamily:cellIWantDeleted"));
+    batchUpdate.delete("myColumnFamily:cellIWantDeleted");
 
     // Once you've done all the puts you want, you need to commit the results.
     // The HTable#commit method takes the BatchUpdate instance you've been
@@ -216,44 +227,38 @@ public class MyClient {
     // the timestamp the value was stored with. If you happen to know that the
     // value contained is a string and want an actual string, then you must
     // convert it yourself.
-    Cell cell = table.get(new Text("myRow"),
-      new Text("myColumnFamily:columnQualifier1"));
-    String valueStr = new String(valueBytes.getValue());
+    Cell cell = table.get("myRow", "myColumnFamily:columnQualifier1");
+    String valueStr = new String(cell.getValue());
 
     // Sometimes, you won't know the row you're looking for. In this case, you
     // use a Scanner. This will give you cursor-like interface to the contents
     // of the table.
-    HStoreKey row = new HStoreKey();
-    SortedMap<Text, byte[]> columns = new TreeMap<Text, byte[]>();
-    HScannerInterface scanner =
+    Scanner scanner =
       // we want to get back only "myColumnFamily:columnQualifier1" when we iterate
-      table.obtainScanner(new Text[]{new Text("myColumnFamily:columnQualifier1")},
-      // we want to start scanning from an empty Text, meaning the beginning of
-      // the table
-      new Text(""));
+      table.getScanner(new String[]{"myColumnFamily:columnQualifier1"});
 
 
     // Scanners in HBase 0.2 return RowResult instances. A RowResult is like the
     // row key and the columns all wrapped up in a single interface.
     // RowResult#getRow gives you the row key. RowResult also implements
-    // Map<Text, Cell>, so you can get to your column results easily.
+    // Map, so you can get to your column results easily.
 
     // Now, for the actual iteration. One way is to use a while loop like so:
     RowResult rowResult = scanner.next();
 
     while(rowResult != null) {
       // print out the row we found and the columns we were looking for
-      System.out.println("Found row: " + rowResult.getRow() + " with value: " +
-        new String(rowResult.get("myColumnFamily:columnQualifier1")));
+      System.out.println("Found row: " + new String(rowResult.getRow()) + " with value: " +
+        rowResult.get("myColumnFamily:columnQualifier1".getBytes()));
 
       rowResult = scanner.next();
     }
 
     // The other approach is to use a foreach loop. Scanners are iterable!
-    for (RowResult rowResult : scanner) {
+    for (RowResult result : scanner) {
       // print out the row we found and the columns we were looking for
-      System.out.println("Found row: " + rowResult.getRow() + " with value: " +
-        new String(rowResult.get("myColumnFamily:columnQualifier1")));
+      System.out.println("Found row: " + new String(result.getRow()) + " with value: " +
+        result.get("myColumnFamily:columnQualifier1".getBytes()));
     }
 
     // Make sure you close your scanners when you are done!
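
To tie the 0.2-style API documented by this patch together, here is a condensed, self-contained sketch of the client above (illustration only; it reuses the same placeholder table, row, and column names, and assumes the table and column family already exist):

import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.client.HTable;
import org.apache.hadoop.hbase.client.Scanner;
import org.apache.hadoop.hbase.io.BatchUpdate;
import org.apache.hadoop.hbase.io.Cell;
import org.apache.hadoop.hbase.io.RowResult;

public class MyCondensedClient {
  public static void main(String[] args) throws Exception {
    HBaseConfiguration config = new HBaseConfiguration();
    HTable table = new HTable(config, "myTable");

    // Write one cell with a BatchUpdate, then read it back as a Cell.
    BatchUpdate batchUpdate = new BatchUpdate("myRow");
    batchUpdate.put("myColumnFamily:columnQualifier1", "some value".getBytes());
    table.commit(batchUpdate);
    Cell cell = table.get("myRow", "myColumnFamily:columnQualifier1");
    System.out.println(new String(cell.getValue()));

    // Scan the column, closing the scanner when done as advised above.
    Scanner scanner =
      table.getScanner(new String[] {"myColumnFamily:columnQualifier1"});
    try {
      for (RowResult result : scanner) {
        System.out.println("row: " + new String(result.getRow()));
      }
    } finally {
      scanner.close();
    }
  }
}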