HBASE-738 overview.html in need of updating

git-svn-id: https://svn.apache.org/repos/asf/hadoop/hbase/trunk@676090 13f79535-47bb-0310-9956-ffa450edef68
Author: Michael Stack
Date:   2008-07-11 21:43:23 +00:00
Commit: e6e11eae01 (parent 36f0d36de9)

2 changed files with 68 additions and 62 deletions

CHANGES.txt

@@ -283,6 +283,7 @@ Trunk (unreleased changes)
              (Jean-Daniel Cryans via Stack)
   HBASE-730  On startup, rinse STARTCODE and SERVER from .META.
              (Jean-Daniel Cryans via Stack)
+  HBASE-738  overview.html in need of updating (Izaak Rubin via Stack)
 
 NEW FEATURES
   HBASE-47   Option to set TTL for columns in hbase

overview.html

@@ -28,7 +28,8 @@
 <ul>
 <li>Java 1.5.x, preferably from <a href="http://www.java.com/en/download/">Sun</a>.
 </li>
-<li>Hadoop 0.17.x. This version of HBase will only run on this version of Hadoop.</a>.
+<li><a href="http://hadoop.apache.org/core/releases.html">Hadoop 0.17.x</a>. This version of HBase will
+only run on this version of Hadoop.
 </li>
 <li>
 ssh must be installed and sshd must be running to use Hadoop's
@@ -48,63 +49,75 @@ for the first time. If upgrading your
 HBase instance, see <a href="#upgrading">Upgrading</a>.
 </p>
 <p>
-<ul>
-<li><code>${HBASE_HOME}</code>: Set HBASE_HOME to the location of the HBase root: e.g. <code>/user/local/hbase</code>.
-</li>
-</ul>
-</p>
-<p>Edit <code>${HBASE_HOME}/conf/hbase-env.sh</code>. In this file you can
+Define <code>${HBASE_HOME}</code> to be the location of the root of your HBase installation, e.g.
+<code>/user/local/hbase</code>. Edit <code>${HBASE_HOME}/conf/hbase-env.sh</code>. In this file you can
 set the heapsize for HBase, etc. At a minimum, set <code>JAVA_HOME</code> to point at the root of
 your Java installation.
 </p>
 <p>
 If you are running a standalone operation, there should be nothing further to configure; proceed to
 <a href=#runandconfirm>Running and Confirming Your Installation</a>. If you are running a distributed
 operation, continue reading.
 </p>
-<h2><a name="distributed" >Distributed Operation</a></h2>
+<h2><a name="distributed">Distributed Operation</a></h2>
 <p>Distributed mode requires an instance of the Hadoop Distributed File System (DFS).
 See the Hadoop <a href="http://lucene.apache.org/hadoop/api/overview-summary.html#overview_description">
 requirements and instructions</a> for how to set up a DFS.</p>
-<p>Once you have confirmed your DFS setup, configuring HBase requires modification of the following two files:
-<code>${HBASE_HOME}/conf/hbase-site.xml</code> and <code>${HBASE_HOME}/conf/regionservers</code>.
-The former needs to be pointed at the running Hadoop DFS instance. The latter file lists
-all the members of the HBase cluster.
-</p>
-<p>Use <code>hbase-site.xml</code> to override the properties defined in
+
+<h3><a name="pseudo-distrib">Pseudo-Distributed Operation</a></h3>
+<p>A pseudo-distributed operation is simply a distributed operation run on a single host.
+Once you have confirmed your DFS setup, configuring HBase for use on one host requires modification of
+<code>${HBASE_HOME}/conf/hbase-site.xml</code>, which needs to be pointed at the running Hadoop DFS instance.
+Use <code>hbase-site.xml</code> to override the properties defined in
 <code>${HBASE_HOME}/conf/hbase-default.xml</code> (<code>hbase-default.xml</code> itself
-should never be modified). At a minimum the <code>hbase.master</code> and the
-<code>hbase.rootdir</code> properties should be redefined
-in <code>hbase-site.xml</code> to configure the <code>host:port</code> pair on which the
-HMaster runs (<a href="http://wiki.apache.org/lucene-hadoop/Hbase/HbaseArchitecture">read about the
-HBase master, regionservers, etc</a>) and to point HBase at the Hadoop filesystem to use. For
-example, adding the below to your hbase-site.xml says the master is up on port 60000 on the host
-example.org and that HBase should use the <code>/hbase</code> directory in the HDFS whose namenode
-is at port 9000, again on example.org:
+should never be modified). At a minimum the <code>hbase.rootdir</code> property should be redefined
+in <code>hbase-site.xml</code> to point HBase at the Hadoop filesystem to use. For example, adding the property
+below to your <code>hbase-site.xml</code> says that HBase should use the <code>/hbase</code> directory in the
+HDFS whose namenode is at port 9000 on your local machine:
 </p>
 <pre>
 &lt;configuration&gt;
+  ...
+  &lt;property&gt;
+    &lt;name&gt;hbase.rootdir&lt;/name&gt;
+    &lt;value&gt;hdfs://localhost:9000/hbase&lt;/value&gt;
+    &lt;description&gt;The directory shared by region servers.
+    &lt;/description&gt;
+  &lt;/property&gt;
+  ...
+&lt;/configuration&gt;
+</pre>
+
+<h3><a name="fully-distrib">Fully-Distributed Operation</a></h3>
+For running a fully-distributed operation on more than one host, the following configurations
+must be made <i>in addition</i> to those described in the
+<a href="#pseudo-distrib">pseudo-distributed operation</a> section above. In
+<code>hbase-site.xml</code>, you must also configure <code>hbase.master</code> to the
+<code>host:port</code> pair on which the HMaster runs
+(<a href="http://wiki.apache.org/lucene-hadoop/Hbase/HbaseArchitecture">read about the HBase master,
+regionservers, etc</a>). For example, adding the below to your <code>hbase-site.xml</code> says the
+master is up on port 60000 on the host example.org:
+</p>
+<pre>
+&lt;configuration&gt;
+  ...
   &lt;property&gt;
     &lt;name&gt;hbase.master&lt;/name&gt;
     &lt;value&gt;example.org:60000&lt;/value&gt;
     &lt;description&gt;The host and port that the HBase master runs at.
     &lt;/description&gt;
   &lt;/property&gt;
+  ...
   &lt;property&gt;
     &lt;name&gt;hbase.rootdir&lt;/name&gt;
     &lt;value&gt;hdfs://example.org:9000/hbase&lt;/value&gt;
     &lt;description&gt;The directory shared by region servers.
     &lt;/description&gt;
   &lt;/property&gt;
 &lt;/configuration&gt;
 </pre>
 <p>
-The <code>regionserver</code> file lists all the hosts running HRegionServers, one
-host per line  (This file in HBase is like the hadoop slaves file at
-<code>${HADOOP_HOME}/conf/slaves</code>).
+Keep in mind that for a fully-distributed operation, you may not want your <code>hbase.rootdir</code>
+to point to localhost (maybe, as in the configuration above, you will want to use
+<code>example.org</code>). In addition to <code>hbase-site.xml</code>, a fully-distributed
+operation requires that you also modify <code>${HBASE_HOME}/conf/regionservers</code>.
+<code>regionserver</code> lists all the hosts running HRegionServers, one host per line (This file
+in HBase is like the hadoop slaves file at <code>${HADOOP_HOME}/conf/slaves</code>).
 </p>
 <p>Of note, if you have made <i>HDFS client configuration</i> on your hadoop cluster, hbase will not
 see this configuration unless you do one of the following:
@@ -114,8 +127,8 @@ see this configuration unless you do one of the following:
 <li>If only a small set of HDFS client configurations, add them to <code>hbase-site.xml</code></li>
 </ul>
 An example of such an HDFS client configuration is <code>dfs.replication</code>. If for example,
-you want to run with a replication factor of 5, hbase will make files will create files with
-the default of 3 unless you do the above to make the configuration available to hbase.
+you want to run with a replication factor of 5, hbase will create files with the default of 3 unless
+you do the above to make the configuration available to hbase.
 </p>
 <h2><a name="runandconfirm">Running and Confirming Your Installation</a></h2>
@@ -125,7 +138,7 @@ the local filesystem.</p>
 <p>If you are running a distributed cluster you will need to start the Hadoop DFS daemons
 before starting HBase and stop the daemons after HBase has shut down. Start and
 stop the Hadoop DFS daemons by running <code>${HADOOP_HOME}/bin/start-dfs.sh</code>.
-Ensure it started properly by testing the put and get of files into the Hadoop filesystem.
+You can ensure it started properly by testing the put and get of files into the Hadoop filesystem.
 HBase does not normally use the mapreduce daemons. These do not need to be started.</p>
 <p>Start HBase with the following command:
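The put-and-get check of the Hadoop filesystem suggested above can be scripted in a few lines against Hadoop's FileSystem API. A minimal sketch, not part of this patch; the path and class name are illustrative:

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class DfsSmokeTest {
  public static void main(String[] args) throws Exception {
    // Connects to the DFS named by fs.default.name in the Hadoop config.
    FileSystem fs = FileSystem.get(new Configuration());
    Path p = new Path("/tmp/dfs-smoke-test");
    fs.create(p).close();                          // "put" an empty file
    System.out.println("exists: " + fs.exists(p)); // "get" it back
    fs.delete(p);                                  // clean up
  }
}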
@@ -169,14 +182,12 @@ the HBase version. It does not change your install unless you explicitly ask it
 <div style="background-color: #cccccc; padding: 2px">
 <code><pre>
+import java.io.IOException;
 import org.apache.hadoop.hbase.client.HTable;
-import org.apache.hadoop.hbase.HBaseConfiguration;
-import org.apache.hadoop.hbase.HStoreKey;
-import org.apache.hadoop.hbase.HScannerInterface;
+import org.apache.hadoop.hbase.client.Scanner;
 import org.apache.hadoop.hbase.io.BatchUpdate;
 import org.apache.hadoop.hbase.io.Cell;
-import org.apache.hadoop.io.Text;
-import java.io.IOException;
+import org.apache.hadoop.hbase.io.RowResult;
 
 public class MyClient {
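Note that the patched import list drops org.apache.hadoop.hbase.HBaseConfiguration even though the body of the example (next hunk) still passes a config to HTable. If compiling the example stand-alone, something like the following minimal skeleton is needed; this is a sketch, not part of the patch, and it assumes the elided middle of the listing still instantiates the configuration:

import java.io.IOException;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.client.HTable;

public class ImportCheck {
  public static void main(String[] args) throws IOException {
    // new HBaseConfiguration() picks up hbase-site.xml overrides.
    HBaseConfiguration config = new HBaseConfiguration();
    // Assumes the "myTable" table from the example already exists.
    HTable table = new HTable(config, "myTable");
    System.out.println("connected to myTable");
  }
}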
@@ -187,12 +198,12 @@ public class MyClient {
     // This instantiates an HTable object that connects you to the "myTable"
     // table.
-    HTable table = new HTable(config, new Text("myTable"));
+    HTable table = new HTable(config, "myTable");
 
     // To do any sort of update on a row, you use an instance of the BatchUpdate
     // class. A BatchUpdate takes a row and optionally a timestamp which your
     // updates will affect.
-    BatchUpdate batchUpdate = new BatchUpdate(new Text("myRow"));
+    BatchUpdate batchUpdate = new BatchUpdate("myRow");
 
     // The BatchUpdate#put method takes a Text that describes what cell you want
     // to put a value into, and a byte array that is the value you want to
@@ -200,11 +211,11 @@ public class MyClient {
     // from the string for HBase to understand how to store it. (The same goes
     // for primitives like ints and longs and user-defined classes - you must
     // find a way to reduce it to bytes.)
-    batchUpdate.put(new Text("myColumnFamily:columnQualifier1"),
+    batchUpdate.put("myColumnFamily:columnQualifier1",
         "columnQualifier1 value!".getBytes());
 
     // Deletes are batch operations in HBase as well.
-    batchUpdate.delete(new Text("myColumnFamily:cellIWantDeleted"));
+    batchUpdate.delete("myColumnFamily:cellIWantDeleted");
 
     // Once you've done all the puts you want, you need to commit the results.
     // The HTable#commit method takes the BatchUpdate instance you've been
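Taken together, the update path in the new listing is: build a BatchUpdate for one row, stage puts and deletes on it, then hand it to HTable#commit. A self-contained sketch of that round trip, not part of the patch, assuming the myTable/myColumnFamily schema from the example already exists:

import java.io.IOException;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.client.HTable;
import org.apache.hadoop.hbase.io.BatchUpdate;
import org.apache.hadoop.hbase.io.Cell;

public class PutGetRoundTrip {
  public static void main(String[] args) throws IOException {
    HTable table = new HTable(new HBaseConfiguration(), "myTable");

    // Stage a put and commit it; all edits in one BatchUpdate apply to
    // the single row named in its constructor.
    BatchUpdate update = new BatchUpdate("myRow");
    update.put("myColumnFamily:columnQualifier1", "some value".getBytes());
    table.commit(update);

    // Read the cell back; the value comes out as raw bytes, so the string
    // must be reconstructed by hand, as the listing explains.
    Cell cell = table.get("myRow", "myColumnFamily:columnQualifier1");
    System.out.println(new String(cell.getValue()));
  }
}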
@@ -216,44 +227,38 @@ public class MyClient {
     // the timestamp the value was stored with. If you happen to know that the
     // value contained is a string and want an actual string, then you must
     // convert it yourself.
-    Cell cell = table.get(new Text("myRow"),
-        new Text("myColumnFamily:columnQualifier1"));
-    String valueStr = new String(valueBytes.getValue());
+    Cell cell = table.get("myRow", "myColumnFamily:columnQualifier1");
+    String valueStr = new String(cell.getValue());
 
     // Sometimes, you won't know the row you're looking for. In this case, you
     // use a Scanner. This will give you cursor-like interface to the contents
     // of the table.
-    HStoreKey row = new HStoreKey();
-    SortedMap<Text, byte[]> columns = new TreeMap<Text, byte[]>();
-    HScannerInterface scanner =
+    Scanner scanner =
         // we want to get back only "myColumnFamily:columnQualifier1" when we iterate
-        table.obtainScanner(new Text[]{new Text("myColumnFamily:columnQualifier1")},
-        // we want to start scanning from an empty Text, meaning the beginning of
-        // the table
-        new Text(""));
+        table.getScanner(new String[]{"myColumnFamily:columnQualifier1"});
 
     // Scanners in HBase 0.2 return RowResult instances. A RowResult is like the
     // row key and the columns all wrapped up in a single interface.
     // RowResult#getRow gives you the row key. RowResult also implements
-    // Map<Text, Cell>, so you can get to your column results easily.
+    // Map, so you can get to your column results easily.
 
     // Now, for the actual iteration. One way is to use a while loop like so:
     RowResult rowResult = scanner.next();
     while(rowResult != null) {
       // print out the row we found and the columns we were looking for
-      System.out.println("Found row: " + rowResult.getRow() + " with value: " +
-          new String(rowResult.get("myColumnFamily:columnQualifier1")));
+      System.out.println("Found row: " + new String(rowResult.getRow()) + " with value: " +
+          rowResult.get("myColumnFamily:columnQualifier1".getBytes()));
       rowResult = scanner.next();
     }
 
     // The other approach is to use a foreach loop. Scanners are iterable!
-    for (RowResult rowResult : scanner) {
+    for (RowResult result : scanner) {
       // print out the row we found and the columns we were looking for
-      System.out.println("Found row: " + rowResult.getRow() + " with value: " +
-          new String(rowResult.get("myColumnFamily:columnQualifier1")));
+      System.out.println("Found row: " + new String(result.getRow()) + " with value: " +
+          result.get("myColumnFamily:columnQualifier1".getBytes()));
     }
 
     // Make sure you close your scanners when you are done!
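The listing cuts off at the reminder to close scanners. One way to make that close unconditional is a try/finally around the iteration. A sketch using the same 0.2 client calls as the example, not part of the patch; Scanner#close is assumed from the listing's final comment:

import java.io.IOException;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.client.HTable;
import org.apache.hadoop.hbase.client.Scanner;
import org.apache.hadoop.hbase.io.RowResult;

public class ScanAndClose {
  public static void main(String[] args) throws IOException {
    HTable table = new HTable(new HBaseConfiguration(), "myTable");
    Scanner scanner =
        table.getScanner(new String[] {"myColumnFamily:columnQualifier1"});
    try {
      // Scanners are iterable, as the listing notes.
      for (RowResult result : scanner) {
        System.out.println("Found row: " + new String(result.getRow()) +
            " with value: " +
            result.get("myColumnFamily:columnQualifier1".getBytes()));
      }
    } finally {
      scanner.close(); // release the scanner even if the loop throws
    }
  }
}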