Added a heads-up to the preface that the user is about to enter the realm of distributed computing, added how to enable RPC logging, added a note to the server decommissioning section that the balancer should be off, and converted links to xrefs when they were linkends
git-svn-id: https://svn.apache.org/repos/asf/hbase/trunk@1095988 13f79535-47bb-0310-9956-ffa450edef68
parent
baefac4c42
commit
537463ec95
|
@ -148,7 +148,7 @@ throws InterruptedException, IOException {
|
||||||
<title>
|
<title>
|
||||||
Schema Creation
|
Schema Creation
|
||||||
</title>
|
</title>
|
||||||
<para>HBase schemas can be created or updated through the <link linkend="shell">HBase shell</link>
|
<para>HBase schemas can be created or updated with <xref linkend="shell" />
|
||||||
or by using <link xlink:href="http://hbase.apache.org/apidocs/org/apache/hadoop/hbase/client/HBaseAdmin.html">HBaseAdmin</link> in the Java API.
|
or by using <link xlink:href="http://hbase.apache.org/apidocs/org/apache/hadoop/hbase/client/HBaseAdmin.html">HBaseAdmin</link> in the Java API.
|
||||||
</para>
|
</para>
|
||||||
</section>
|
</section>
|
||||||
|
@ -197,7 +197,7 @@ throws InterruptedException, IOException {
|
||||||
the case described by Marc Limotte at the tail of
|
the case described by Marc Limotte at the tail of
|
||||||
<link xlink:url="https://issues.apache.org/jira/browse/HBASE-3551?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13005272#comment-13005272">HBASE-3551</link>
|
<link xlink:url="https://issues.apache.org/jira/browse/HBASE-3551?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13005272#comment-13005272">HBASE-3551</link>
|
||||||
(recommended!).
|
(recommended!).
|
||||||
Therein, the indices that are kept on HBase storefiles (<link linkend="hfile">HFile</link>s)
|
Therein, the indices that are kept on HBase storefiles (<xref linkend="hfile" />)
|
||||||
to facilitate random access may end up occupying large chunks of the HBase
|
to facilitate random access may end up occupying large chunks of the HBase
|
||||||
allotted RAM because the cell value coordinates are large.
|
allotted RAM because the cell value coordinates are large.
|
||||||
Marc in the above-cited comment suggests upping the block size so
|
Marc in the above-cited comment suggests upping the block size so
|
||||||
|
@ -213,7 +213,7 @@ throws InterruptedException, IOException {
|
||||||
<para>The number of row versions to store is configured per column
|
<para>The number of row versions to store is configured per column
|
||||||
family via <link xlink:href="http://hbase.apache.org/apidocs/org/apache/hadoop/hbase/HColumnDescriptor.html">HColumnDescriptor</link>.
|
family via <link xlink:href="http://hbase.apache.org/apidocs/org/apache/hadoop/hbase/HColumnDescriptor.html">HColumnDescriptor</link>.
|
||||||
The default is 3.
|
The default is 3.
|
||||||
This is an important parameter because as described in the <link linkend="datamodel">Data Model</link>
|
This is an important parameter because as described in <xref linkend="datamodel" />
|
||||||
section HBase does <emphasis>not</emphasis> overwrite row values, but rather
|
section HBase does <emphasis>not</emphasis> overwrite row values, but rather
|
||||||
stores different values per row by time (and qualifier). Excess versions are removed during major
|
stores different values per row by time (and qualifier). Excess versions are removed during major
|
||||||
compactions. The number of versions may need to be increased or decreased depending on application needs.
|
compactions. The number of versions may need to be increased or decreased depending on application needs.
|
||||||
|
@ -248,7 +248,7 @@ throws InterruptedException, IOException {
|
||||||
<para>Size of the compaction queue. This is the number of stores in the region that have been targeted for compaction.</para>
|
<para>Size of the compaction queue. This is the number of stores in the region that have been targeted for compaction.</para>
|
||||||
</section>
|
</section>
|
||||||
<section xml:id="hbase.regionserver.fsReadLatency_avg_time"><title><varname>hbase.regionserver.fsReadLatency_avg_time</varname></title>
|
<section xml:id="hbase.regionserver.fsReadLatency_avg_time"><title><varname>hbase.regionserver.fsReadLatency_avg_time</varname></title>
|
||||||
<para>Filesystem read latency (ms)</para>
|
<para>Filesystem read latency (ms). This is the average time to read from HDFS.</para>
|
||||||
</section>
|
</section>
|
||||||
<section xml:id="hbase.regionserver.fsReadLatency_num_ops"><title><varname>hbase.regionserver.fsReadLatency_num_ops</varname></title>
|
<section xml:id="hbase.regionserver.fsReadLatency_num_ops"><title><varname>hbase.regionserver.fsReadLatency_num_ops</varname></title>
|
||||||
<para>TODO</para>
|
<para>TODO</para>
|
||||||
|
@ -294,11 +294,10 @@ throws InterruptedException, IOException {
|
||||||
|
|
||||||
<chapter xml:id="datamodel">
|
<chapter xml:id="datamodel">
|
||||||
<title>Data Model</title>
|
<title>Data Model</title>
|
||||||
<para>In short, applications store data into HBase <link linkend="table">tables</link>.
|
<para>In short, applications store data into an HBase table.
|
||||||
Tables are made of <link linkend="row">rows</link> and <emphasis>columns</emphasis>.
|
Tables are made of rows and columns.
|
||||||
All columns in HBase belong to a particular
|
All columns in HBase belong to a particular column family.
|
||||||
<link linkend="columnfamily">column family</link>.
|
Table cells -- the intersection of row and column
|
||||||
Table <link linkend="cell">cells</link> -- the intersection of row and column
|
|
||||||
coordinates -- are versioned.
|
coordinates -- are versioned.
|
||||||
A cell’s content is an uninterpreted array of bytes.
|
A cell’s content is an uninterpreted array of bytes.
|
||||||
</para>
|
</para>
|
||||||
|
@ -709,7 +708,7 @@ throws InterruptedException, IOException {
|
||||||
<para>Administrative functions are handled through <link xlink:href="http://hbase.apache.org/apidocs/org/apache/hadoop/hbase/client/HBaseAdmin.html">HBaseAdmin</link>
|
<para>Administrative functions are handled through <link xlink:href="http://hbase.apache.org/apidocs/org/apache/hadoop/hbase/client/HBaseAdmin.html">HBaseAdmin</link>
|
||||||
</para>
|
</para>
|
||||||
<section xml:id="client.connections"><title>Connections</title>
|
<section xml:id="client.connections"><title>Connections</title>
|
||||||
<para>For connection configuration information, see the <link linkend="client_dependencies">configuration</link> section.
|
<para>For connection configuration information, see <xref linkend="client_dependencies" />.
|
||||||
</para>
|
</para>
|
||||||
<para><link xlink:href="http://hbase.apache.org/apidocs/org/apache/hadoop/hbase/client/HTable.html">HTable</link>
|
<para><link xlink:href="http://hbase.apache.org/apidocs/org/apache/hadoop/hbase/client/HTable.html">HTable</link>
|
||||||
instances are not thread-safe. When creating HTable instances, it is advisable to use the same <link xlink:href="http://hbase.apache.org/apidocs/org/apache/hadoop/hbase/HBaseConfiguration">HBaseConfiguration</link>
|
instances are not thread-safe. When creating HTable instances, it is advisable to use the same <link xlink:href="http://hbase.apache.org/apidocs/org/apache/hadoop/hbase/HBaseConfiguration">HBaseConfiguration</link>
|
||||||
|
@ -728,7 +727,8 @@ HTable table2 = new HTable(conf2, "myTable");</programlisting>
|
||||||
</para>
|
</para>
|
||||||
</section>
|
</section>
|
||||||
<section xml:id="client.writebuffer"><title>WriteBuffer and Batch Methods</title>
|
<section xml:id="client.writebuffer"><title>WriteBuffer and Batch Methods</title>
|
||||||
<para>If <link linkend="perf.hbase.client.autoflush">autoflush</link> is turned off on <link xlink:href="http://hbase.apache.org/apidocs/org/apache/hadoop/hbase/client/HTable.html">HTable</link>,
|
<para>If <xref linkend="perf.hbase.client.autoflush" /> is turned off on
|
||||||
|
<link xlink:href="http://hbase.apache.org/apidocs/org/apache/hadoop/hbase/client/HTable.html">HTable</link>,
|
||||||
<classname>Put</classname>s are sent to region servers when the writebuffer
|
<classname>Put</classname>s are sent to region servers when the writebuffer
|
||||||
is filled. The writebuffer is 2MB by default. Before an HTable instance is
|
is filled. The writebuffer is 2MB by default. Before an HTable instance is
|
||||||
discarded, either <methodname>close()</methodname> or
|
discarded, either <methodname>close()</methodname> or
|
||||||
|
@ -813,7 +813,7 @@ HTable table2 = new HTable(conf2, "myTable");</programlisting>
|
||||||
participate. The RegionServer splits a region, offlines the split
|
participate. The RegionServer splits a region, offlines the split
|
||||||
region and then adds the daughter regions to META, opens daughters on
|
region and then adds the daughter regions to META, opens daughters on
|
||||||
the parent's hosting RegionServer and then reports the split to the
|
the parent's hosting RegionServer and then reports the split to the
|
||||||
Master. See <link linkend="disable.splitting">Managed Splitting</link> for how to manually manage
|
Master. See <xref linkend="disable.splitting" /> for how to manually manage
|
||||||
splits (and for why you might do this)</para>
|
splits (and for why you might do this)</para>
|
||||||
</section>
|
</section>
|
||||||
|
|
||||||
|
@ -872,7 +872,7 @@ HTable table2 = new HTable(conf2, "myTable");</programlisting>
|
||||||
For a description of how a minor compaction picks files to compact, see the <link xlink:href="http://hbase.apache.org/xref/org/apache/hadoop/hbase/regionserver/Store.html#836">ascii diagram in the Store source code.</link>
|
For a description of how a minor compaction picks files to compact, see the <link xlink:href="http://hbase.apache.org/xref/org/apache/hadoop/hbase/regionserver/Store.html#836">ascii diagram in the Store source code.</link>
|
||||||
</para>
|
</para>
|
||||||
<para>After a major compaction runs there will be a single storefile per store, which usually helps performance. Caution: major compactions rewrite all of the store's data and, on a loaded system, this may not be tenable;
|
<para>After a major compaction runs there will be a single storefile per store, which usually helps performance. Caution: major compactions rewrite all of the store's data and, on a loaded system, this may not be tenable;
|
||||||
major compactions will usually have to be <link linkend="disable.splitting">managed</link> on large systems.
|
major compactions will usually have to be <xref linkend="disable.splitting" /> on large systems.
|
||||||
</para>
|
</para>
|
||||||
</section>
|
</section>
|
||||||
|
|
||||||
|
@ -888,7 +888,7 @@ HTable table2 = new HTable(conf2, "myTable");</programlisting>
|
||||||
<title>Purpose</title>
|
<title>Purpose</title>
|
||||||
|
|
||||||
<para>Each RegionServer adds updates (Puts, Deletes) to its write-ahead log (WAL)
|
<para>Each RegionServer adds updates (Puts, Deletes) to its write-ahead log (WAL)
|
||||||
first, and then to the <link linkend="store.memstore">MemStore</link> for the affected <link linkend="store">Store</link>.
|
first, and then to the <xref linkend="store.memstore"/> for the affected <xref linkend="store" />.
|
||||||
This ensures that HBase has durable writes. Without WAL, there is the possibility of data loss in the case of a RegionServer failure
|
This ensures that HBase has durable writes. Without WAL, there is the possibility of data loss in the case of a RegionServer failure
|
||||||
before each MemStore is flushed and new StoreFiles are written. <link xlink:href="http://hbase.apache.org/apidocs/org/apache/hadoop/hbase/regionserver/wal/HLog.html">HLog</link>
|
before each MemStore is flushed and new StoreFiles are written. <link xlink:href="http://hbase.apache.org/apidocs/org/apache/hadoop/hbase/regionserver/wal/HLog.html">HLog</link>
|
||||||
is the HBase WAL implementation, and there is one HLog instance per RegionServer.
|
is the HBase WAL implementation, and there is one HLog instance per RegionServer.
|
||||||
|
@ -1090,7 +1090,7 @@ HTable table2 = new HTable(conf2, "myTable");</programlisting>
|
||||||
</para>
|
</para>
|
||||||
</section>
|
</section>
|
||||||
<section><title>HFile Tool</title>
|
<section><title>HFile Tool</title>
|
||||||
<para>See <link linkend="hfile_tool" >HFile Tool</link>.</para>
|
<para>See <xref linkend="hfile_tool" />.</para>
|
||||||
</section>
|
</section>
|
||||||
<section xml:id="wal_tools">
|
<section xml:id="wal_tools">
|
||||||
<title>WAL Tools</title>
|
<title>WAL Tools</title>
|
||||||
|
@ -1113,10 +1113,31 @@ HTable table2 = new HTable(conf2, "myTable");</programlisting>
|
||||||
</section>
|
</section>
|
||||||
</section>
|
</section>
|
||||||
<section xml:id="compression.tool"><title>Compression Tool</title>
|
<section xml:id="compression.tool"><title>Compression Tool</title>
|
||||||
<para>See <link linkend="compression.tool" >Compression Tool</link>.</para>
|
<para>See <xref linkend="compression.tool" />.</para>
|
||||||
</section>
|
</section>
|
||||||
<section xml:id="decommission"><title>Node Decommission</title>
|
<section xml:id="decommission"><title>Node Decommission</title>
|
||||||
<para>Since HBase 0.90.2, you can have a node gradually shed its load and then shutdown using the
|
<para>You can stop an individual regionserver by running the following
|
||||||
|
script in the HBase directory on the particular node:
|
||||||
|
<programlisting>$ ./bin/hbase-daemon.sh stop regionserver</programlisting>
|
||||||
|
The regionserver will first close all regions and then shut itself down.
|
||||||
|
On shutdown, the regionserver's ephemeral node in ZooKeeper will expire.
|
||||||
|
The master will notice the regionserver gone and will treat it as
|
||||||
|
a 'crashed' server; it will reassign the regions the regionserver was carrying.
|
||||||
|
<note><title>Disable the Load Balancer before Decommissioning a node</title>
|
||||||
|
<para>If the load balancer runs while a node is shutting down, then
|
||||||
|
there could be contention between the Load Balancer and the
|
||||||
|
Master's recovery of the just decommissioned regionserver.
|
||||||
|
Avoid any problems by disabling the balancer first.
|
||||||
|
See <xref linkend="lb" /> below.
|
||||||
|
</para>
|
||||||
|
</note>
|
||||||
|
</para>
|
||||||
|
<para>
|
||||||
|
A downside to the above stop of a regionserver is that regions could be offline for
|
||||||
|
a good period of time. Regions are closed in order. If there are many regions on the server, the
|
||||||
|
first region to close may not be back online until all regions close and after the master
|
||||||
|
notices the regionserver's znode gone. In HBase 0.90.2, we added facility for having
|
||||||
|
a node gradually shed its load and then shut itself down. HBase 0.90.2 added the
|
||||||
<filename>graceful_stop.sh</filename> script. Here is its usage:
|
<filename>graceful_stop.sh</filename> script. Here is its usage:
|
||||||
<programlisting>$ ./bin/graceful_stop.sh
|
<programlisting>$ ./bin/graceful_stop.sh
|
||||||
Usage: graceful_stop.sh [--config &lt;conf-dir>] [--restart] [--reload] [--thrift] [--rest] &lt;hostname>
|
Usage: graceful_stop.sh [--config &lt;conf-dir>] [--restart] [--reload] [--thrift] [--rest] &lt;hostname>
|
||||||
|
@ -1152,7 +1173,7 @@ Usage: graceful_stop.sh [--config &conf-dir>] [--restart] [--reload] [--thri
|
||||||
RegionServer gone but all regions will have already been redeployed
|
RegionServer gone but all regions will have already been redeployed
|
||||||
and because the RegionServer went down cleanly, there will be no
|
and because the RegionServer went down cleanly, there will be no
|
||||||
WAL logs to split.
|
WAL logs to split.
|
||||||
<note><title>Load Balancer</title>
|
<note xml:id="lb"><title>Load Balancer</title>
|
||||||
<para>
|
<para>
|
||||||
It is assumed that the Region Load Balancer is disabled while the
|
It is assumed that the Region Load Balancer is disabled while the
|
||||||
<command>graceful_stop</command> script runs (otherwise the balancer
|
<command>graceful_stop</command> script runs (otherwise the balancer
|
||||||
|
@ -1270,7 +1291,7 @@ false
|
||||||
LZO
|
LZO
|
||||||
</title>
|
</title>
|
||||||
<para>
|
<para>
|
||||||
See <link linkend="lzo">LZO Compression</link> above.
|
See <xref linkend="lzo" /> above.
|
||||||
</para>
|
</para>
|
||||||
</section>
|
</section>
|
||||||
|
|
||||||
|
@ -1285,7 +1306,7 @@ false
|
||||||
available on the CLASSPATH; in this case it will use native
|
available on the CLASSPATH; in this case it will use native
|
||||||
compressors instead (If the native libs are NOT present,
|
compressors instead (If the native libs are NOT present,
|
||||||
you will see lots of <emphasis>Got brand-new compressor</emphasis>
|
you will see lots of <emphasis>Got brand-new compressor</emphasis>
|
||||||
reports in your logs; see <link linkend="brand.new.compressor">FAQ</link>).
|
reports in your logs; see <xref linkend="brand.new.compressor" />).
|
||||||
</para>
|
</para>
|
||||||
</section>
|
</section>
|
||||||
</appendix>
|
</appendix>
|
||||||
|
@ -1309,7 +1330,7 @@ false
|
||||||
<answer>
|
<answer>
|
||||||
<para>
|
<para>
|
||||||
Not really. SQL-ish support for HBase via <link xlink:href="http://hive.apache.org/">Hive</link> is in development, however Hive is based on MapReduce which is not generally suitable for low-latency requests.
|
Not really. SQL-ish support for HBase via <link xlink:href="http://hive.apache.org/">Hive</link> is in development, however Hive is based on MapReduce which is not generally suitable for low-latency requests.
|
||||||
See the <link linkend="datamodel">Data Model</link> section for examples on the HBase client.
|
See the <xref linkend="datamodel" /> section for examples on the HBase client.
|
||||||
</para>
|
</para>
|
||||||
</answer>
|
</answer>
|
||||||
</qandaentry>
|
</qandaentry>
|
||||||
|
@ -1320,7 +1341,7 @@ false
|
||||||
<link xlink:href="http://hadoop.apache.org/hdfs/">HDFS</link> is a distributed file system that is well suited for the storage of large files. Its documentation
|
<link xlink:href="http://hadoop.apache.org/hdfs/">HDFS</link> is a distributed file system that is well suited for the storage of large files. Its documentation
|
||||||
states that it is not, however, a general purpose file system, and does not provide fast individual record lookups in files.
|
states that it is not, however, a general purpose file system, and does not provide fast individual record lookups in files.
|
||||||
HBase, on the other hand, is built on top of HDFS and provides fast record lookups (and updates) for large tables. This can sometimes be a point of conceptual confusion.
|
HBase, on the other hand, is built on top of HDFS and provides fast record lookups (and updates) for large tables. This can sometimes be a point of conceptual confusion.
|
||||||
See the <link linkend="datamodel">Data Model</link> and <link linkend="architecture">Architecture</link> sections for more information on how HBase achieves its goals.
|
See the <xref linkend="datamodel" /> and <xref linkend="architecture" /> sections for more information on how HBase achieves its goals.
|
||||||
</para>
|
</para>
|
||||||
</answer>
|
</answer>
|
||||||
</qandaentry>
|
</qandaentry>
|
||||||
|
@ -1406,6 +1427,7 @@ When I build, why do I always get <code>Unable to find resource 'VM_global_libra
|
||||||
|
|
||||||
<appendix>
|
<appendix>
|
||||||
<title xml:id="ycsb"><link xlink:href="https://github.com/brianfrankcooper/YCSB/">YCSB: The Yahoo! Cloud Serving Benchmark</link> and HBase</title>
|
<title xml:id="ycsb"><link xlink:href="https://github.com/brianfrankcooper/YCSB/">YCSB: The Yahoo! Cloud Serving Benchmark</link> and HBase</title>
|
||||||
|
<para>TODO: Describe how YCSB is poor for putting up a decent cluster load.</para>
|
||||||
<para>TODO: Describe setup of YCSB for HBase</para>
|
<para>TODO: Describe setup of YCSB for HBase</para>
|
||||||
<para>Ted Dunning redid YCSB so it's mavenized and added a facility for verifying workloads. See <link xlink:href="https://github.com/tdunning/YCSB">Ted Dunning's YCSB</link>.</para>
|
<para>Ted Dunning redid YCSB so it's mavenized and added a facility for verifying workloads. See <link xlink:href="https://github.com/tdunning/YCSB">Ted Dunning's YCSB</link>.</para>
|
||||||
|
|
||||||
|
|
|
@ -40,7 +40,7 @@ to ensure well-formedness of your document after an edit session.
|
||||||
for HBase, site specific customizations go into
|
for HBase, site specific customizations go into
|
||||||
the file <filename>conf/hbase-site.xml</filename>.
|
the file <filename>conf/hbase-site.xml</filename>.
|
||||||
For the list of configurable properties, see
|
For the list of configurable properties, see
|
||||||
<link linkend="hbase_default_configurations">Default HBase Configurations</link>
|
<xref linkend="hbase_default_configurations" />
|
||||||
below or view the raw <filename>hbase-default.xml</filename>
|
below or view the raw <filename>hbase-default.xml</filename>
|
||||||
source file in the HBase source code at
|
source file in the HBase source code at
|
||||||
<filename>src/main/resources</filename>.
|
<filename>src/main/resources</filename>.
|
||||||
|
@ -99,10 +99,10 @@ to ensure well-formedness of your document after an edit session.
|
||||||
|
|
||||||
|
|
||||||
<section xml:id="required_configuration"><title>Required Configurations</title>
|
<section xml:id="required_configuration"><title>Required Configurations</title>
|
||||||
<para>See the <link linkend="requirements">Requirements</link> section.
|
<para>See <xref linkend="requirements" />.
|
||||||
It lists at least two required configurations needed running HBase bearing
|
It lists at least two required configurations needed running HBase bearing
|
||||||
load: i.e. <link linkend="ulimit">file descriptors <varname>ulimit</varname></link> and
|
load: i.e. <xref linkend="ulimit" /> and
|
||||||
<link linkend="dfs.datanode.max.xcievers"><varname>dfs.datanode.max.xcievers</varname></link>.
|
<xref linkend="dfs.datanode.max.xcievers" />.
|
||||||
</para>
|
</para>
|
||||||
</section>
|
</section>
|
||||||
|
|
||||||
|
@ -185,10 +185,10 @@ to ensure well-formedness of your document after an edit session.
|
||||||
fixup on the new machine. In versions since HBase 0.90.0, we should
|
fixup on the new machine. In versions since HBase 0.90.0, we should
|
||||||
fail in a way that makes it plain what the problem is, but maybe not.
|
fail in a way that makes it plain what the problem is, but maybe not.
|
||||||
Remember you read this paragraph<footnote><para>See
|
Remember you read this paragraph<footnote><para>See
|
||||||
<link linkend="hbase.regionserver.codecs">hbase.regionserver.codecs</link>
|
<xref linkend="hbase.regionserver.codecs" />
|
||||||
for a feature to help protect against failed LZO install</para></footnote>.
|
for a feature to help protect against failed LZO install</para></footnote>.
|
||||||
</para>
|
</para>
|
||||||
<para>See also the <link linkend="compression">Compression Appendix</link>
|
<para>See also <xref linkend="compression" />
|
||||||
at the tail of this book.</para>
|
at the tail of this book.</para>
|
||||||
</section>
|
</section>
|
||||||
<section xml:id="bigger.regions">
|
<section xml:id="bigger.regions">
|
||||||
|
@ -303,11 +303,11 @@ of all regions.
|
||||||
(Invocation will also factor in any <filename>hbase-default.xml</filename> found;
|
(Invocation will also factor in any <filename>hbase-default.xml</filename> found;
|
||||||
an hbase-default.xml ships inside the <filename>hbase.X.X.X.jar</filename>).
|
an hbase-default.xml ships inside the <filename>hbase.X.X.X.jar</filename>).
|
||||||
It is also possible to specify configuration directly without having to read from a
|
It is also possible to specify configuration directly without having to read from a
|
||||||
<filename>hbase-site.xml</filename>. For example, to set the
|
<filename>hbase-site.xml</filename>. For example, to set the ZooKeeper
|
||||||
<link linkend="zookeeper">zookeeper</link> ensemble for the cluster programmatically do as follows:
|
ensemble for the cluster programmatically do as follows:
|
||||||
<programlisting>Configuration config = HBaseConfiguration.create();
|
<programlisting>Configuration config = HBaseConfiguration.create();
|
||||||
config.set("hbase.zookeeper.quorum", "localhost"); // Here we are running zookeeper locally</programlisting>
|
config.set("hbase.zookeeper.quorum", "localhost"); // Here we are running zookeeper locally</programlisting>
|
||||||
If multiple <link linkend="zookeeper">zookeeper</link> instances make up your zookeeper ensemble,
|
If multiple ZooKeeper instances make up your ZooKeeper ensemble,
|
||||||
they may be specified in a comma-separated list (just as in the <filename>hbase-site.xml</filename> file).
|
they may be specified in a comma-separated list (just as in the <filename>hbase-site.xml</filename> file).
|
||||||
This populated <classname>Configuration</classname> instance can then be passed to an
|
This populated <classname>Configuration</classname> instance can then be passed to an
|
||||||
<link xlink:href="http://hbase.apache.org/apidocs/org/apache/hadoop/hbase/client/HTable.html">HTable</link>,
|
<link xlink:href="http://hbase.apache.org/apidocs/org/apache/hadoop/hbase/client/HTable.html">HTable</link>,
|
||||||
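As a sketch of the file-based equivalent of the programmatic example in the hunk above, the same ZooKeeper ensemble could be listed in conf/hbase-site.xml as a comma-separated value. The three hostnames below are placeholders invented for illustration; they do not come from this commit.

```xml
<!-- conf/hbase-site.xml (fragment); the hostnames are hypothetical placeholders -->
<configuration>
  <property>
    <name>hbase.zookeeper.quorum</name>
    <!-- comma-separated list of the hosts that make up the ZooKeeper ensemble -->
    <value>zk1.example.org,zk2.example.org,zk3.example.org</value>
  </property>
</configuration>
```

The same string could be passed to config.set("hbase.zookeeper.quorum", ...) when configuring a client in code instead of through the file.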
|
|
|
@ -12,9 +12,9 @@
|
||||||
<section>
|
<section>
|
||||||
<title>Introduction</title>
|
<title>Introduction</title>
|
||||||
|
|
||||||
<para><link linkend="quickstart">Quick Start</link> will get you up and
|
<para><xref linkend="quickstart" /> will get you up and
|
||||||
running on a single-node instance of HBase using the local filesystem. The
|
running on a single-node instance of HBase using the local filesystem. The
|
||||||
<link linkend="notsoquick">Not-so-quick Start Guide</link> describes setup
|
<xref linkend="notsoquick" /> describes setup
|
||||||
of HBase in distributed mode running on top of HDFS.</para>
|
of HBase in distributed mode running on top of HDFS.</para>
|
||||||
</section>
|
</section>
|
||||||
|
|
||||||
|
@ -23,7 +23,7 @@
|
||||||
|
|
||||||
<para>This guide describes setup of a standalone HBase instance that uses
|
<para>This guide describes setup of a standalone HBase instance that uses
|
||||||
the local filesystem. It leads you through creating a table, inserting
|
the local filesystem. It leads you through creating a table, inserting
|
||||||
rows via the <link linkend="shell">HBase Shell</link>, and then cleaning
|
rows via the HBase <command>shell</command>, and then cleaning
|
||||||
up and shutting down your standalone HBase instance. The below exercise
|
up and shutting down your standalone HBase instance. The below exercise
|
||||||
should take no more than ten minutes (not including download time).</para>
|
should take no more than ten minutes (not including download time).</para>
|
||||||
|
|
||||||
|
@ -97,8 +97,7 @@ starting Master, logging to logs/hbase-user-master-example.org.out</programlisti
|
||||||
<section xml:id="shell_exercises">
|
<section xml:id="shell_exercises">
|
||||||
<title>Shell Exercises</title>
|
<title>Shell Exercises</title>
|
||||||
|
|
||||||
<para>Connect to your running HBase via the <link linkend="shell">HBase
|
<para>Connect to your running HBase via the <command>shell</command>.</para>
|
||||||
Shell</link>.</para>
|
|
||||||
|
|
||||||
<para><programlisting>$ ./bin/hbase shell
|
<para><programlisting>$ ./bin/hbase shell
|
||||||
HBase Shell; enter 'help<RETURN>' for list of supported commands.
|
HBase Shell; enter 'help<RETURN>' for list of supported commands.
|
||||||
|
@ -114,8 +113,7 @@ hbase(main):001:0> </programlisting></para>
|
||||||
HBase shell; in particular note how table names, rows, and columns,
|
HBase shell; in particular note how table names, rows, and columns,
|
||||||
etc., must be quoted.</para>
|
etc., must be quoted.</para>
|
||||||
|
|
||||||
<para>Create a table named <varname>test</varname> with a single <link
|
<para>Create a table named <varname>test</varname> with a single column family named <varname>cf</varname>.
|
||||||
linkend="columnfamily">column family</link> named <varname>cf</varname>.
|
|
||||||
Verify its creation by listing all tables and then insert some
|
Verify its creation by listing all tables and then insert some
|
||||||
values.</para>
|
values.</para>
|
||||||
|
|
||||||
|
@ -133,8 +131,7 @@ hbase(main):006:0> put 'test', 'row3', 'cf:c', 'value3'
|
||||||
|
|
||||||
<para>Above we inserted 3 values, one at a time. The first insert is at
|
<para>Above we inserted 3 values, one at a time. The first insert is at
|
||||||
<varname>row1</varname>, column <varname>cf:a</varname> with a value of
|
<varname>row1</varname>, column <varname>cf:a</varname> with a value of
|
||||||
<varname>value1</varname>. Columns in HBase are comprised of a <link
|
<varname>value1</varname>. Columns in HBase are comprised of a column family prefix --
|
||||||
linkend="columnfamily">column family</link> prefix --
|
|
||||||
<varname>cf</varname> in this example -- followed by a colon and then a
|
<varname>cf</varname> in this example -- followed by a colon and then a
|
||||||
column qualifier suffix (<varname>a</varname> in this case).</para>
|
column qualifier suffix (<varname>a</varname> in this case).</para>
|
||||||
|
|
||||||
|
@ -182,8 +179,7 @@ stopping hbase...............</programlisting></para>
|
||||||
<title>Where to go next</title>
|
<title>Where to go next</title>
|
||||||
|
|
||||||
<para>The above described standalone setup is good for testing and
|
<para>The above described standalone setup is good for testing and
|
||||||
experiments only. Move on to the next section, the <link
|
experiments only. Next move on to <xref linkend="notsoquick" /> where we'll go into
|
||||||
linkend="notsoquick">Not-so-quick Start Guide</link> where we'll go into
|
|
||||||
depth on the different HBase run modes, requirements and critical
|
depth on the different HBase run modes, requirements and critical
|
||||||
configurations needed setting up a distributed HBase deploy.</para>
|
configurations needed setting up a distributed HBase deploy.</para>
|
||||||
</section>
|
</section>
|
||||||
|
@ -437,9 +433,7 @@ stopping hbase...............</programlisting></para>
|
||||||
<section xml:id="standalone_dist">
|
<section xml:id="standalone_dist">
|
||||||
<title>HBase run modes: Standalone and Distributed</title>
|
<title>HBase run modes: Standalone and Distributed</title>
|
||||||
|
|
||||||
<para>HBase has two run modes: <link
|
<para>HBase has two run modes: <xref linkend="standalone" /> and <xref linkend="distributed" />. Out of the box, HBase runs in
|
||||||
linkend="standalone">standalone</link> and <link
|
|
||||||
linkend="distributed">distributed</link>. Out of the box, HBase runs in
|
|
||||||
standalone mode. To set up a distributed deploy, you will need to
|
standalone mode. To set up a distributed deploy, you will need to
|
||||||
configure HBase by editing files in the HBase <filename>conf</filename>
|
configure HBase by editing files in the HBase <filename>conf</filename>
|
||||||
directory.</para>
|
directory.</para>
|
||||||
|
@ -456,7 +450,7 @@ stopping hbase...............</programlisting></para>
|
||||||
<title>Standalone HBase</title>
|
<title>Standalone HBase</title>
|
||||||
|
|
||||||
<para>This is the default mode. Standalone mode is what is described
|
<para>This is the default mode. Standalone mode is what is described
|
||||||
in the <link linkend="quickstart">quickstart</link> section. In
|
in the <xref linkend="quickstart" /> section. In
|
||||||
standalone mode, HBase does not use HDFS -- it uses the local
|
standalone mode, HBase does not use HDFS -- it uses the local
|
||||||
filesystem instead -- and it runs all HBase daemons and a local
|
filesystem instead -- and it runs all HBase daemons and a local
|
||||||
zookeeper all up in the same JVM. Zookeeper binds to a well known port
|
zookeeper all up in the same JVM. Zookeeper binds to a well known port
|
||||||
|
@ -485,8 +479,7 @@ stopping hbase...............</programlisting></para>
|
||||||
verification and exploration of your install, whether a
|
verification and exploration of your install, whether a
|
||||||
<emphasis>pseudo-distributed</emphasis> or
|
<emphasis>pseudo-distributed</emphasis> or
|
||||||
<emphasis>fully-distributed</emphasis> configuration is described in a
|
<emphasis>fully-distributed</emphasis> configuration is described in a
|
||||||
section that follows, <link linkend="confirm">Running and Confirming
|
section that follows, <xref linkend="confirm" />. The same verification script applies to both
|
||||||
your Installation</link>. The same verification script applies to both
|
|
||||||
deploy types.</para>
|
deploy types.</para>
|
||||||
|
|
||||||
<section xml:id="pseudo">
|
<section xml:id="pseudo">
|
||||||
|
@ -499,10 +492,8 @@ stopping hbase...............</programlisting></para>
|
||||||
|
|
||||||
<para>Once you have confirmed your HDFS setup, edit
|
<para>Once you have confirmed your HDFS setup, edit
|
||||||
<filename>conf/hbase-site.xml</filename>. This is the file into
|
<filename>conf/hbase-site.xml</filename>. This is the file into
|
||||||
which you add local customizations and overrides for <link
|
which you add local customizations and overrides for
|
||||||
linkend="hbase_default_configurations">Default HBase
|
<xref linkend="hbase_default_configurations" /> and <xref linkend="hdfs_client_conf" />. Point HBase at the running Hadoop HDFS
|
||||||
Configurations</link> and <link linkend="hdfs_client_conf">HDFS
|
|
||||||
Client Configurations</link>. Point HBase at the running Hadoop HDFS
|
|
||||||
instance by setting the <varname>hbase.rootdir</varname> property.
|
instance by setting the <varname>hbase.rootdir</varname> property.
|
||||||
This property points HBase at the Hadoop filesystem instance to use.
|
This property points HBase at the Hadoop filesystem instance to use.
|
||||||
For example, adding the properties below to your
|
For example, adding the properties below to your
|
||||||
|
@ -543,8 +534,7 @@ stopping hbase...............</programlisting></para>
|
||||||
want to connect from a remote location.</para>
|
want to connect from a remote location.</para>
|
||||||
</note>
|
</note>
|
||||||
|
|
||||||
<para>Now skip to <link linkend="confirm">Running and Confirming
|
<para>Now skip to <xref linkend="confirm" /> for how to start and verify your
|
||||||
your Installation</link> for how to start and verify your
|
|
||||||
pseudo-distributed install. <footnote>
|
pseudo-distributed install. <footnote>
|
||||||
<para>See <link
|
<para>See <link
|
||||||
xlink:href="http://hbase.apache.org/pseudo-distributed.html">Pseudo-distributed
|
xlink:href="http://hbase.apache.org/pseudo-distributed.html">Pseudo-distributed
|
||||||
|
@ -594,8 +584,7 @@ stopping hbase...............</programlisting></para>
|
||||||
|
|
||||||
<para>In addition, a fully-distributed mode requires that you
|
<para>In addition, a fully-distributed mode requires that you
|
||||||
modify <filename>conf/regionservers</filename>. The
|
modify <filename>conf/regionservers</filename>. The
|
||||||
<filename><link
|
<xref linkend="regionservers" /> file
|
||||||
linkend="regionservrers">regionservers</link></filename> file
|
|
||||||
lists all hosts that you would have running
|
lists all hosts that you would have running
|
||||||
<application>HRegionServer</application>s, one host per line (This
|
<application>HRegionServer</application>s, one host per line (This
|
||||||
file in HBase is like the Hadoop <filename>slaves</filename>
|
file in HBase is like the Hadoop <filename>slaves</filename>
|
||||||
|
@ -634,9 +623,7 @@ stopping hbase...............</programlisting></para>
|
||||||
by setting the
|
by setting the
|
||||||
<varname>hbase.zookeeper.property.clientPort</varname> property.
|
<varname>hbase.zookeeper.property.clientPort</varname> property.
|
||||||
For all default values used by HBase, including ZooKeeper
|
For all default values used by HBase, including ZooKeeper
|
||||||
configuration, see the section <link
|
configuration, see <xref linkend="hbase_default_configurations" />. Look for the
|
||||||
linkend="hbase_default_configurations">Default HBase
|
|
||||||
Configurations</link>. Look for the
|
|
||||||
<varname>hbase.zookeeper.property</varname> prefix <footnote>
|
<varname>hbase.zookeeper.property</varname> prefix <footnote>
|
||||||
<para>For the full list of ZooKeeper configurations, see
|
<para>For the full list of ZooKeeper configurations, see
|
||||||
ZooKeeper's <filename>zoo.cfg</filename>. HBase does not ship
|
ZooKeeper's <filename>zoo.cfg</filename>. HBase does not ship
|
||||||
|
@ -835,8 +822,7 @@ ${HBASE_HOME}/bin/hbase-daemons.sh {start,stop} zookeeper
|
||||||
|
|
||||||
|
|
||||||
|
|
||||||
<para>Once HBase has started, see the <link
|
<para>Once HBase has started, see the <xref linkend="shell_exercises" /> for how to
|
||||||
linkend="shell_exercises">Shell Exercises</link> section for how to
|
|
||||||
create tables, add data, scan your insertions, and finally disable and
|
create tables, add data, scan your insertions, and finally disable and
|
||||||
drop your tables.</para>
|
drop your tables.</para>
|
||||||
|
|
||||||
|
|
|
@ -15,6 +15,20 @@
|
||||||
factors involved; RAM, compression, JVM settings, etc. Afterward, come back
|
factors involved; RAM, compression, JVM settings, etc. Afterward, come back
|
||||||
here for more pointers.</para>
|
here for more pointers.</para>
|
||||||
|
|
||||||
|
<note xml:id="rpc.logging"><title>Enabling RPC-level logging</title>
|
||||||
|
<para>Enabling RPC-level logging on a regionserver can often give
|
||||||
|
insight on timings at the server. Once enabled, the amount of log
|
||||||
|
spewed is voluminous. It is not recommended that you leave this
|
||||||
|
logging on for more than short bursts of time. To enable RPC-level
|
||||||
|
logging, browse to the regionserver UI and click on
|
||||||
|
<emphasis>Log Level</emphasis>. Set the log level to DEBUG for the package
|
||||||
|
<classname>org.apache.hadoop.ipc</classname> (That's right, for
|
||||||
|
hadoop.ipc, NOT hbase.ipc). Then tail the regionserver's log.
|
||||||
|
Analyze.</para>
|
||||||
|
<para>To disable, set the logging level back to WARN level.
|
||||||
|
</para>
|
||||||
|
</note>
|
||||||
|
|
||||||
<section xml:id="jvm">
|
<section xml:id="jvm">
|
||||||
<title>Java</title>
|
<title>Java</title>
|
||||||
|
|
||||||
|
@ -46,16 +60,14 @@
|
||||||
<section xml:id="perf.configurations">
|
<section xml:id="perf.configurations">
|
||||||
<title>Configurations</title>
|
<title>Configurations</title>
|
||||||
|
|
||||||
<para>See the section on <link
|
<para>See <xref linkend="recommended_configurations" />.</para>
|
||||||
linkend="recommended_configurations">recommended
|
|
||||||
configurations</link>.</para>
|
|
||||||
|
|
||||||
<section xml:id="perf.number.of.regions">
|
<section xml:id="perf.number.of.regions">
|
||||||
<title>Number of Regions</title>
|
<title>Number of Regions</title>
|
||||||
|
|
||||||
<para>The number of regions for an HBase table is driven by the <link
|
<para>The number of regions for an HBase table is driven by the <xref
|
||||||
linkend="bigger.regions">filesize</link>. Also, see the architecture
|
linkend="bigger.regions" />. Also, see the architecture
|
||||||
section on <link linkend="arch.regions.size">region size</link></para>
|
section on <xref linkend="arch.regions.size" /></para>
|
||||||
</section>
|
</section>
|
||||||
|
|
||||||
<section xml:id="perf.compactions.and.splits">
|
<section xml:id="perf.compactions.and.splits">
|
||||||
|
@ -68,18 +80,28 @@
|
||||||
|
|
||||||
<section xml:id="perf.compression">
|
<section xml:id="perf.compression">
|
||||||
<title>Compression</title>
|
<title>Compression</title>
|
||||||
|
<para>Production systems should use compression such as <xref linkend="lzo" /> compression with their column family
|
||||||
<para>Production systems should use compression such as <link
|
|
||||||
linkend="lzo">LZO</link> compression with their column family
|
|
||||||
definitions.</para>
|
definitions.</para>
|
||||||
</section>
|
</section>
|
||||||
|
|
||||||
|
<section xml:id="perf.handlers">
|
||||||
|
<title><varname>hbase.regionserver.handler.count</varname></title>
|
||||||
|
<para>This setting in essence determines how many requests are
|
||||||
|
concurrently being processed inside the regionserver at any
|
||||||
|
one time. If set too high, then throughput may suffer as
|
||||||
|
the concurrent requests contend; if set too low, requests will
|
||||||
|
be stuck waiting to get into the machine. You can get a
|
||||||
|
sense of whether you have too few or too many handlers by
|
||||||
|
<xref linkend="rpc.logging" />
|
||||||
|
on an individual regionserver then tailing its logs.</para>
|
||||||
|
</section>
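As a rough illustration of the handler-count setting added in this hunk, an override in <filename>conf/hbase-site.xml</filename> might look like the sketch below. The value 30 is a placeholder chosen only for illustration, not a figure taken from this commit; a suitable number depends on the workload and is best checked by watching the regionserver log with RPC-level logging enabled.

```xml
<!-- Sketch of a conf/hbase-site.xml override; the value is an assumed placeholder. -->
<configuration>
  <property>
    <name>hbase.regionserver.handler.count</name>
    <!-- Number of RPC handler threads per regionserver; tune per workload. -->
    <value>30</value>
  </property>
</configuration>
```

If throughput drops after raising the value, the concurrent requests are probably contending; if requests queue up, the value is likely too low.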
|
||||||
|
|
||||||
</section>
|
</section>
|
||||||
|
|
||||||
<section xml:id="perf.number.of.cfs">
|
<section xml:id="perf.number.of.cfs">
|
||||||
<title>Number of Column Families</title>
|
<title>Number of Column Families</title>
|
||||||
|
|
||||||
<para>See the section on <link linkend="number.of.cfs">Number of Column
|
<para>See <xref linkend="number.of.cfs" />.</para>
|
||||||
Families</link>.</para>
|
|
||||||
</section>
|
</section>
|
||||||
|
|
||||||
<section xml:id="perf.one.region">
|
<section xml:id="perf.one.region">
|
||||||
|
|
|
@ -23,4 +23,25 @@
|
||||||
hope to fill in the holes with time. Feel free to add to this book by adding
|
hope to fill in the holes with time. Feel free to add to this book by adding
|
||||||
a patch to an issue up in the HBase <link
|
a patch to an issue up in the HBase <link
|
||||||
xlink:href="https://issues.apache.org/jira/browse/HBASE">JIRA</link>.</para>
|
xlink:href="https://issues.apache.org/jira/browse/HBASE">JIRA</link>.</para>
|
||||||
|
|
||||||
|
<note xml:id="headsup">
|
||||||
|
<title>Heads-up</title>
|
||||||
|
<para>
|
||||||
|
If this is your first foray into the wonderful world of
|
||||||
|
Distributed Computing, then you are in for
|
||||||
|
some interesting times. First off, distributed systems are
|
||||||
|
hard; making a distributed system hum requires a disparate
|
||||||
|
skillset that needs to span systems (hardware and software) and
|
||||||
|
networking. Your cluster's operation can hiccup because of any
|
||||||
|
of a myriad set of reasons from bugs in HBase itself through misconfigurations
|
||||||
|
-- misconfiguration of HBase but also operating system misconfigurations --
|
||||||
|
through to hardware problems whether it be a bug in your network card
|
||||||
|
drivers or an underprovisioned RAM bus (to mention two recent
|
||||||
|
examples of hardware issues that manifested as "HBase is slow").
|
||||||
|
You will also need to do a recalibration if up to this point your
|
||||||
|
computing has been bound to a single box. Here is one good
|
||||||
|
starting point:
|
||||||
|
<link xlink:href="http://en.wikipedia.org/wiki/Fallacies_of_Distributed_Computing">Fallacies of Distributed Computing</link>.
|
||||||
|
</para>
|
||||||
|
</note>
|
||||||
</preface>
|
</preface>
|
||||||
|
|
|
@ -24,7 +24,7 @@
|
||||||
arguments are entered into the
|
arguments are entered into the
|
||||||
HBase shell; in particular note how table names, rows, and
|
HBase shell; in particular note how table names, rows, and
|
||||||
columns, etc., must be quoted.</para>
|
columns, etc., must be quoted.</para>
|
||||||
<para>See <link linkend="shell_exercises">Shell Exercises</link>
|
<para>See <xref linkend="shell_exercises" />
|
||||||
for example basic shell operation.</para>
|
for example basic shell operation.</para>
|
||||||
|
|
||||||
<section xml:id="scripting"><title>Scripting</title>
|
<section xml:id="scripting"><title>Scripting</title>
|
||||||
|
|
|
@ -9,8 +9,7 @@
|
||||||
xmlns:db="http://docbook.org/ns/docbook">
|
xmlns:db="http://docbook.org/ns/docbook">
|
||||||
<title>Upgrading</title>
|
<title>Upgrading</title>
|
||||||
<para>
|
<para>
|
||||||
Review the <link linkend="requirements">requirements</link>
|
Review <xref linkend="requirements" />, in particular the section on Hadoop version.
|
||||||
section above, in particular the section on Hadoop version.
|
|
||||||
</para>
|
</para>
|
||||||
<section xml:id="upgrade0.90">
|
<section xml:id="upgrade0.90">
|
||||||
<title>Upgrading to HBase 0.90.x from 0.20.x or 0.89.x</title>
|
<title>Upgrading to HBase 0.90.x from 0.20.x or 0.89.x</title>
|
||||||
|
@ -30,7 +29,7 @@
|
||||||
HBase jar and read from there. If you would like to review
|
HBase jar and read from there. If you would like to review
|
||||||
the content of this file, see it in the src tree at
|
the content of this file, see it in the src tree at
|
||||||
<filename>src/main/resources/hbase-default.xml</filename> or
|
<filename>src/main/resources/hbase-default.xml</filename> or
|
||||||
see <link linkend="hbase_default_configurations">Default HBase Configurations</link>.
|
see <xref linkend="hbase_default_configurations" />.
|
||||||
</para>
|
</para>
|
||||||
<para>
|
<para>
|
||||||
Finally, if upgrading from 0.20.x, check your
|
Finally, if upgrading from 0.20.x, check your
|
||||||
|
|