More edits: Moved ZK to its own chapter, put the bloom filter stuff together in one place, made the distributed setup more focused
git-svn-id: https://svn.apache.org/repos/asf/hbase/trunk@1389153 13f79535-47bb-0310-9956-ffa450edef68
parent 7d709c965a
commit 623a9be04d
@@ -2319,65 +2319,6 @@ myHtd.setValue(HTableDescriptor.SPLIT_POLICY, MyCustomSplitPolicy.class.getName(
       </section> <!-- store -->
 
-      <section xml:id="blooms">
-        <title>Bloom Filters</title>
-        <para><link xlink:href="http://en.wikipedia.org/wiki/Bloom_filter">Bloom filters</link> were developed over in <link
-          xlink:href="https://issues.apache.org/jira/browse/HBASE-1200">HBase-1200
-          Add bloomfilters</link>.<footnote>
-            <para>For description of the development process -- why static blooms
-            rather than dynamic -- and for an overview of the unique properties
-            that pertain to blooms in HBase, as well as possible future
-            directions, see the <emphasis>Development Process</emphasis> section
-            of the document <link
-            xlink:href="https://issues.apache.org/jira/secure/attachment/12444007/Bloom_Filters_in_HBase.pdf">BloomFilters
-            in HBase</link> attached to <link
-            xlink:href="https://issues.apache.org/jira/browse/HBASE-1200">HBase-1200</link>.</para>
-          </footnote><footnote>
-            <para>The bloom filters described here are actually version two of
-            blooms in HBase. In versions up to 0.19.x, HBase had a dynamic bloom
-            option based on work done by the <link
-            xlink:href="http://www.one-lab.org">European Commission One-Lab
-            Project 034819</link>. The core of the HBase bloom work was later
-            pulled up into Hadoop to implement org.apache.hadoop.io.BloomMapFile.
-            Version 1 of HBase blooms never worked that well. Version 2 is a
-            rewrite from scratch, though again it starts with the One-Lab
-            work.</para>
-          </footnote></para>
-        <para>See also <xref linkend="schema.bloom" /> and <xref linkend="config.bloom" />.
-        </para>
-
-        <section xml:id="bloom_footprint">
-          <title>Bloom StoreFile footprint</title>
-
-          <para>Bloom filters add an entry to the <classname>StoreFile</classname>
-          general <classname>FileInfo</classname> data structure and then two
-          extra entries to the <classname>StoreFile</classname> metadata
-          section.</para>
-
-          <section>
-            <title>BloomFilter in the <classname>StoreFile</classname>
-            <classname>FileInfo</classname> data structure</title>
-
-            <para><classname>FileInfo</classname> has a
-            <varname>BLOOM_FILTER_TYPE</varname> entry which is set to
-            <varname>NONE</varname>, <varname>ROW</varname> or
-            <varname>ROWCOL</varname>.</para>
-          </section>
-
-          <section>
-            <title>BloomFilter entries in <classname>StoreFile</classname>
-            metadata</title>
-
-            <para><varname>BLOOM_FILTER_META</varname> holds the Bloom size, the hash
-            function used, and so on. It is small in size and is cached on
-            <classname>StoreFile.Reader</classname> load.</para>
-            <para><varname>BLOOM_FILTER_DATA</varname> is the actual bloom filter
-            data. It is obtained on demand and stored in the LRU cache, if the cache
-            is enabled (it is enabled by default).</para>
-          </section>
-        </section>
-      </section> <!-- bloom -->
-
     </section> <!-- regions -->
 
   <section xml:id="arch.bulk.load"><title>Bulk Loading</title>
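<para>Editorial aside: the <varname>BLOOM_FILTER_TYPE</varname> values above (NONE, ROW, ROWCOL)
come from the per-column-family setting. A hedged Java sketch, using the pre-1.0 admin API and an
illustrative table name only (method and enum names shifted between releases), of reading that
setting back from an existing table:</para>
<programlisting>
import org.apache.hadoop.hbase.HColumnDescriptor;
import org.apache.hadoop.hbase.HTableDescriptor;
import org.apache.hadoop.hbase.client.HBaseAdmin;
import org.apache.hadoop.hbase.util.Bytes;

HBaseAdmin admin = new HBaseAdmin(conf);   // conf from HBaseConfiguration.create()
HTableDescriptor desc = admin.getTableDescriptor(Bytes.toBytes("mytable"));
for (HColumnDescriptor family : desc.getColumnFamilies()) {
  // Prints NONE, ROW or ROWCOL per family; this is what ends up in the
  // StoreFile FileInfo BLOOM_FILTER_TYPE entry described above.
  System.out.println(family.getNameAsString() + " -> " + family.getBloomFilterType());
}
</programlisting>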
|
||||||
|
@@ -2519,6 +2460,7 @@ myHtd.setValue(HTableDescriptor.SPLIT_POLICY, MyCustomSplitPolicy.class.getName(
   <xi:include xmlns:xi="http://www.w3.org/2001/XInclude" href="case_studies.xml" />
   <xi:include xmlns:xi="http://www.w3.org/2001/XInclude" href="ops_mgt.xml" />
   <xi:include xmlns:xi="http://www.w3.org/2001/XInclude" href="developer.xml" />
+  <xi:include xmlns:xi="http://www.w3.org/2001/XInclude" href="zookeeper.xml" />
   <xi:include xmlns:xi="http://www.w3.org/2001/XInclude" href="community.xml" />
 
   <appendix xml:id="faq">
|
||||||
|
|
|
@@ -27,8 +27,10 @@
  */
 -->
 <title>Configuration</title>
-<para>This chapter is the Not-So-Quick start guide to HBase configuration.</para>
-<para>Please read this chapter carefully and ensure that all requirements have
+<para>This chapter is the Not-So-Quick start guide to HBase configuration. It goes
+over system requirements, Hadoop setup, the different HBase run modes, and the
+various configurations in HBase. Please read this chapter carefully and ensure
+that all <xref linkend="basic.requirements" /> have
 been satisfied. Failure to do so will cause you (and us) grief debugging strange errors
 and/or data loss.</para>
|
||||||
|
|
||||||
|
@@ -56,6 +58,10 @@ to ensure well-formedness of your document after an edit session.
 all nodes of the cluster. HBase will not do this for you.
 Use <command>rsync</command>.</para>
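<para>Editorial aside: the reason the copies must agree is that every HBase daemon and client
builds its configuration from whatever <filename>hbase-site.xml</filename> it finds on its own
classpath. A minimal Java sketch (standard HBase client API; nothing here is specific to this
commit):</para>
<programlisting>
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;

public class ShowConfig {
  public static void main(String[] args) {
    // create() layers hbase-default.xml and then the hbase-site.xml found
    // on the classpath, which is why the file must be identical everywhere.
    Configuration conf = HBaseConfiguration.create();
    System.out.println(conf.get("hbase.zookeeper.quorum", "localhost"));
  }
}
</programlisting>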
|
||||||
|
|
||||||
|
+<section xml:id="basic.requirements">
+  <title>Basic Requirements</title>
+  <para>This section lists required services and some required system configuration.
+  </para>
 <section xml:id="java">
   <title>Java</title>
|
||||||
|
|
||||||
|
@@ -237,7 +243,6 @@ to ensure well-formedness of your document after an edit session.
 Currently only Hadoop versions 0.20.205.x or any release in excess of this
 version -- this includes hadoop 1.0.0 -- have a working, durable sync
 <footnote>
-<title>On Hadoop Versions</title>
 <para>The Cloudera blog post <link xlink:href="http://www.cloudera.com/blog/2012/01/an-update-on-apache-hadoop-1-0/">An update on Apache Hadoop 1.0</link>
 by Charles Zedlewski has a nice exposition on how all the Hadoop versions relate.
 It's worth checking out if you are having trouble making sense of the
|
||||||
|
@@ -352,6 +357,7 @@ to ensure well-formedness of your document after an edit session.
 </section>
 
 </section> <!-- hadoop -->
+</section>
 
 <section xml:id="standalone_dist">
 <title>HBase run modes: Standalone and Distributed</title>
|
||||||
|
@@ -686,565 +692,6 @@ stopping hbase...............</programlisting> Shutdown can take a moment to
 </section>
 </section> <!-- run modes -->
 
-  <section xml:id="zookeeper">
-    <title>ZooKeeper<indexterm>
-        <primary>ZooKeeper</primary>
-      </indexterm></title>
-
-    <para>A distributed HBase depends on a running ZooKeeper cluster.
-    All participating nodes and clients need to be able to access the
-    running ZooKeeper ensemble. HBase by default manages a ZooKeeper
-    "cluster" for you. It will start and stop the ZooKeeper ensemble
-    as part of the HBase start/stop process. You can also manage the
-    ZooKeeper ensemble independent of HBase and just point HBase at
-    the cluster it should use. To toggle HBase management of
-    ZooKeeper, use the <varname>HBASE_MANAGES_ZK</varname> variable in
-    <filename>conf/hbase-env.sh</filename>. This variable, which
-    defaults to <varname>true</varname>, tells HBase whether to
-    start/stop the ZooKeeper ensemble servers as part of HBase
-    start/stop.</para>
|
|
||||||
|
|
||||||
<para>When HBase manages the ZooKeeper ensemble, you can specify
|
|
||||||
ZooKeeper configuration using its native
|
|
||||||
<filename>zoo.cfg</filename> file, or, the easier option is to
|
|
||||||
just specify ZooKeeper options directly in
|
|
||||||
<filename>conf/hbase-site.xml</filename>. A ZooKeeper
|
|
||||||
configuration option can be set as a property in the HBase
|
|
||||||
<filename>hbase-site.xml</filename> XML configuration file by
|
|
||||||
prefacing the ZooKeeper option name with
|
|
||||||
<varname>hbase.zookeeper.property</varname>. For example, the
|
|
||||||
<varname>clientPort</varname> setting in ZooKeeper can be changed
|
|
||||||
by setting the
|
|
||||||
<varname>hbase.zookeeper.property.clientPort</varname> property.
|
|
||||||
For all default values used by HBase, including ZooKeeper
|
|
||||||
configuration, see <xref linkend="hbase_default_configurations" />. Look for the
|
|
||||||
<varname>hbase.zookeeper.property</varname> prefix <footnote>
|
|
||||||
<para>For the full list of ZooKeeper configurations, see
|
|
||||||
ZooKeeper's <filename>zoo.cfg</filename>. HBase does not ship
|
|
||||||
with a <filename>zoo.cfg</filename> so you will need to browse
|
|
||||||
the <filename>conf</filename> directory in an appropriate
|
|
||||||
ZooKeeper download.</para>
|
|
||||||
</footnote></para>
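<para>Editorial aside on the <varname>hbase.zookeeper.property</varname> prefix just described:
the same keys can also be set on a client-side <classname>Configuration</classname> in Java.
A hedged sketch only; the host names and port are placeholders:</para>
<programlisting>
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;

Configuration conf = HBaseConfiguration.create();
// Equivalent of the clientPort setting in ZooKeeper's zoo.cfg
conf.set("hbase.zookeeper.property.clientPort", "2222");
// Ensemble members this client should contact
conf.set("hbase.zookeeper.quorum",
    "rs1.example.com,rs2.example.com,rs3.example.com");
</programlisting>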
|
|
||||||
|
|
||||||
<para>You must at least list the ensemble servers in
|
|
||||||
<filename>hbase-site.xml</filename> using the
|
|
||||||
<varname>hbase.zookeeper.quorum</varname> property. This property
|
|
||||||
defaults to a single ensemble member at
|
|
||||||
<varname>localhost</varname> which is not suitable for a fully
|
|
||||||
distributed HBase. (It binds to the local machine only and remote
|
|
||||||
clients will not be able to connect). <note xml:id="how_many_zks">
|
|
||||||
<title>How many ZooKeepers should I run?</title>
|
|
||||||
|
|
||||||
<para>You can run a ZooKeeper ensemble that comprises 1 node
|
|
||||||
only but in production it is recommended that you run a
|
|
||||||
ZooKeeper ensemble of 3, 5 or 7 machines; the more members an
|
|
||||||
ensemble has, the more tolerant the ensemble is of host
|
|
||||||
failures. Also, run an odd number of machines. In ZooKeeper,
|
|
||||||
an even number of peers is supported, but it is normally not used
|
|
||||||
because an even sized ensemble requires, proportionally, more peers
|
|
||||||
to form a quorum than an odd sized ensemble requires. For example, an
|
|
||||||
ensemble with 4 peers requires 3 to form a quorum, while an ensemble with
|
|
||||||
5 also requires 3 to form a quorum. Thus, an ensemble of 5 allows 2 peers to
|
|
||||||
fail, and thus is more fault tolerant than the ensemble of 4, which allows
|
|
||||||
only 1 down peer.
|
|
||||||
</para>
|
|
||||||
<para>Give each ZooKeeper server around 1GB of RAM, and if possible, its own
|
|
||||||
dedicated disk (A dedicated disk is the best thing you can do
|
|
||||||
to ensure a performant ZooKeeper ensemble). For very heavily
|
|
||||||
loaded clusters, run ZooKeeper servers on separate machines
|
|
||||||
from RegionServers (DataNodes and TaskTrackers).</para>
|
|
||||||
</note></para>
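<para>The quorum arithmetic in the note above can be sketched in a few lines of plain Java
(an editorial illustration only, not part of the HBase API):</para>
<programlisting>
public class QuorumMath {
  public static void main(String[] args) {
    for (int peers = 3; peers < 8; peers++) {
      int quorum = peers / 2 + 1;      // majority needed for the ensemble to serve
      int tolerated = peers - quorum;  // peers that may fail
      System.out.println(peers + " peers: quorum=" + quorum
          + ", tolerated failures=" + tolerated);
    }
  }
}
</programlisting>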
|
|
||||||
|
|
||||||
<para>For example, to have HBase manage a ZooKeeper quorum on
|
|
||||||
nodes <emphasis>rs{1,2,3,4,5}.example.com</emphasis>, bound to
|
|
||||||
port 2222 (the default is 2181) ensure
|
|
||||||
<varname>HBASE_MANAGES_ZK</varname> is commented out or set to
|
|
||||||
<varname>true</varname> in <filename>conf/hbase-env.sh</filename>
|
|
||||||
and then edit <filename>conf/hbase-site.xml</filename> and set
|
|
||||||
<varname>hbase.zookeeper.property.clientPort</varname> and
|
|
||||||
<varname>hbase.zookeeper.quorum</varname>. You should also set
|
|
||||||
<varname>hbase.zookeeper.property.dataDir</varname> to other than
|
|
||||||
the default as the default has ZooKeeper persist data under
|
|
||||||
<filename>/tmp</filename> which is often cleared on system
|
|
||||||
restart. In the example below we have ZooKeeper persist to
|
|
||||||
<filename>/usr/local/zookeeper</filename>. <programlisting>
|
|
||||||
<configuration>
|
|
||||||
...
|
|
||||||
<property>
|
|
||||||
<name>hbase.zookeeper.property.clientPort</name>
|
|
||||||
<value>2222</value>
|
|
||||||
<description>Property from ZooKeeper's config zoo.cfg.
|
|
||||||
The port at which the clients will connect.
|
|
||||||
</description>
|
|
||||||
</property>
|
|
||||||
<property>
|
|
||||||
<name>hbase.zookeeper.quorum</name>
|
|
||||||
<value>rs1.example.com,rs2.example.com,rs3.example.com,rs4.example.com,rs5.example.com</value>
|
|
||||||
<description>Comma separated list of servers in the ZooKeeper Quorum.
|
|
||||||
For example, "host1.mydomain.com,host2.mydomain.com,host3.mydomain.com".
|
|
||||||
By default this is set to localhost for local and pseudo-distributed modes
|
|
||||||
of operation. For a fully-distributed setup, this should be set to a full
|
|
||||||
list of ZooKeeper quorum servers. If HBASE_MANAGES_ZK is set in hbase-env.sh
|
|
||||||
this is the list of servers which we will start/stop ZooKeeper on.
|
|
||||||
</description>
|
|
||||||
</property>
|
|
||||||
<property>
|
|
||||||
<name>hbase.zookeeper.property.dataDir</name>
|
|
||||||
<value>/usr/local/zookeeper</value>
|
|
||||||
<description>Property from ZooKeeper's config zoo.cfg.
|
|
||||||
The directory where the snapshot is stored.
|
|
||||||
</description>
|
|
||||||
</property>
|
|
||||||
...
|
|
||||||
</configuration></programlisting></para>
|
|
||||||
|
|
||||||
<section>
|
|
||||||
<title>Using existing ZooKeeper ensemble</title>
|
|
||||||
|
|
||||||
<para>To point HBase at an existing ZooKeeper cluster, one that
|
|
||||||
is not managed by HBase, set <varname>HBASE_MANAGES_ZK</varname>
|
|
||||||
in <filename>conf/hbase-env.sh</filename> to false
|
|
||||||
<programlisting>
|
|
||||||
...
|
|
||||||
# Tell HBase whether it should manage its own instance of Zookeeper or not.
|
|
||||||
export HBASE_MANAGES_ZK=false</programlisting> Next set ensemble locations
|
|
||||||
and client port, if non-standard, in
|
|
||||||
<filename>hbase-site.xml</filename>, or add a suitably
|
|
||||||
configured <filename>zoo.cfg</filename> to HBase's
|
|
||||||
<filename>CLASSPATH</filename>. HBase will prefer the
|
|
||||||
configuration found in <filename>zoo.cfg</filename> over any
|
|
||||||
settings in <filename>hbase-site.xml</filename>.</para>
|
|
||||||
|
|
||||||
<para>When HBase manages ZooKeeper, it will start/stop the
|
|
||||||
ZooKeeper servers as a part of the regular start/stop scripts.
|
|
||||||
If you would like to run ZooKeeper yourself, independent of
|
|
||||||
HBase start/stop, you would do the following</para>
|
|
||||||
|
|
||||||
<programlisting>
|
|
||||||
${HBASE_HOME}/bin/hbase-daemons.sh {start,stop} zookeeper
|
|
||||||
</programlisting>
|
|
||||||
|
|
||||||
<para>Note that you can use HBase in this manner to spin up a
|
|
||||||
ZooKeeper cluster, unrelated to HBase. Just make sure to set
|
|
||||||
<varname>HBASE_MANAGES_ZK</varname> to <varname>false</varname>
|
|
||||||
if you want it to stay up across HBase restarts so that when
|
|
||||||
HBase shuts down, it doesn't take ZooKeeper down with it.</para>
|
|
||||||
|
|
||||||
<para>For more information about running a distinct ZooKeeper
|
|
||||||
cluster, see the ZooKeeper <link
|
|
||||||
xlink:href="http://hadoop.apache.org/zookeeper/docs/current/zookeeperStarted.html">Getting
|
|
||||||
Started Guide</link>. Additionally, see the <link xlink:href="http://wiki.apache.org/hadoop/ZooKeeper/FAQ#A7">ZooKeeper Wiki</link> or the
|
|
||||||
<link xlink:href="http://zookeeper.apache.org/doc/r3.3.3/zookeeperAdmin.html#sc_zkMulitServerSetup">ZooKeeper documentation</link>
|
|
||||||
for more information on ZooKeeper sizing.
|
|
||||||
</para>
|
|
||||||
</section>
|
|
||||||
|
|
||||||
|
|
||||||
<section xml:id="zk.sasl.auth">
|
|
||||||
<title>SASL Authentication with ZooKeeper</title>
|
|
||||||
<para>Newer releases of HBase (>= 0.92) will
|
|
||||||
support connecting to a ZooKeeper Quorum that supports
|
|
||||||
SASL authentication (which is available in Zookeeper
|
|
||||||
versions 3.4.0 or later).</para>
|
|
||||||
|
|
||||||
<para>This describes how to set up HBase to mutually
|
|
||||||
authenticate with a ZooKeeper Quorum. ZooKeeper/HBase
|
|
||||||
mutual authentication (<link
|
|
||||||
xlink:href="https://issues.apache.org/jira/browse/HBASE-2418">HBASE-2418</link>)
|
|
||||||
is required as part of a complete secure HBase configuration
|
|
||||||
(<link
|
|
||||||
xlink:href="https://issues.apache.org/jira/browse/HBASE-3025">HBASE-3025</link>).
|
|
||||||
|
|
||||||
For simplicity of explication, this section ignores
|
|
||||||
additional configuration required (Secure HDFS and Coprocessor
|
|
||||||
configuration). It's recommended to begin with an
|
|
||||||
HBase-managed Zookeeper configuration (as opposed to a
|
|
||||||
standalone Zookeeper quorum) for ease of learning.
|
|
||||||
</para>
|
|
||||||
|
|
||||||
<section><title>Operating System Prerequisites</title></section>
|
|
||||||
|
|
||||||
<para>
|
|
||||||
You need to have a working Kerberos KDC setup. For
|
|
||||||
each <code>$HOST</code> that will run a ZooKeeper
|
|
||||||
server, you should have a principal
|
|
||||||
<code>zookeeper/$HOST</code>. For each such host,
|
|
||||||
add a service key (using the <code>kadmin</code> or
|
|
||||||
<code>kadmin.local</code> tool's <code>ktadd</code>
|
|
||||||
command) for <code>zookeeper/$HOST</code> and copy
|
|
||||||
this file to <code>$HOST</code>, and make it
|
|
||||||
readable only to the user that will run zookeeper on
|
|
||||||
<code>$HOST</code>. Note the location of this file,
|
|
||||||
which we will use below as
|
|
||||||
<filename>$PATH_TO_ZOOKEEPER_KEYTAB</filename>.
|
|
||||||
</para>
|
|
||||||
|
|
||||||
<para>
|
|
||||||
Similarly, for each <code>$HOST</code> that will run
|
|
||||||
an HBase server (master or regionserver), you should
|
|
||||||
have a principal: <code>hbase/$HOST</code>. For each
|
|
||||||
host, add a keytab file called
|
|
||||||
<filename>hbase.keytab</filename> containing a service
|
|
||||||
key for <code>hbase/$HOST</code>, copy this file to
|
|
||||||
<code>$HOST</code>, and make it readable only to the
|
|
||||||
user that will run an HBase service on
|
|
||||||
<code>$HOST</code>. Note the location of this file,
|
|
||||||
which we will use below as
|
|
||||||
<filename>$PATH_TO_HBASE_KEYTAB</filename>.
|
|
||||||
</para>
|
|
||||||
|
|
||||||
<para>
|
|
||||||
Each user who will be an HBase client should also be
|
|
||||||
given a Kerberos principal. This principal should
|
|
||||||
usually have a password assigned to it (as opposed to,
|
|
||||||
as with the HBase servers, a keytab file) which only
|
|
||||||
this user knows. The client's principal's
|
|
||||||
<code>maxrenewlife</code> should be set so that it can
|
|
||||||
be renewed enough so that the user can complete their
|
|
||||||
HBase client processes. For example, if a user runs a
|
|
||||||
long-running HBase client process that takes at most 3
|
|
||||||
days, we might create this user's principal within
|
|
||||||
<code>kadmin</code> with: <code>addprinc -maxrenewlife
|
|
||||||
3days</code>. The Zookeeper client and server
|
|
||||||
libraries manage their own ticket refreshment by
|
|
||||||
running threads that wake up periodically to do the
|
|
||||||
refreshment.
|
|
||||||
</para>
|
|
||||||
|
|
||||||
<para>On each host that will run an HBase client
|
|
||||||
(e.g. <code>hbase shell</code>), add the following
|
|
||||||
file to the HBase home directory's <filename>conf</filename>
|
|
||||||
directory:</para>
|
|
||||||
|
|
||||||
<programlisting>
|
|
||||||
Client {
|
|
||||||
com.sun.security.auth.module.Krb5LoginModule required
|
|
||||||
useKeyTab=false
|
|
||||||
useTicketCache=true;
|
|
||||||
};
|
|
||||||
</programlisting>
|
|
||||||
|
|
||||||
<para>We'll refer to this JAAS configuration file as
|
|
||||||
<filename>$CLIENT_CONF</filename> below.</para>
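<para>Editorial aside: for a client launched from your own Java code rather than the HBase shell
scripts, the same JAAS file can be pointed at programmatically. This is only an illustrative
sketch with a made-up path; the supported route remains the <varname>HBASE_OPTS</varname>
setting shown below:</para>
<programlisting>
// Must run before any ZooKeeper/HBase client classes are initialized.
// The ZooKeeper client reads the "Client" section of this file by default.
System.setProperty("java.security.auth.login.config",
    "/etc/hbase/conf/zk-jaas.conf");   // stand-in for $CLIENT_CONF
</programlisting>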
|
|
||||||
|
|
||||||
<section>
|
|
||||||
<title>HBase-managed Zookeeper Configuration</title>
|
|
||||||
|
|
||||||
<para>On each node that will run a zookeeper, a
|
|
||||||
master, or a regionserver, create a <link
|
|
||||||
xlink:href="http://docs.oracle.com/javase/1.4.2/docs/guide/security/jgss/tutorials/LoginConfigFile.html">JAAS</link>
|
|
||||||
configuration file in the conf directory of the node's
|
|
||||||
<filename>HBASE_HOME</filename> directory that looks like the
|
|
||||||
following:</para>
|
|
||||||
|
|
||||||
<programlisting>
|
|
||||||
Server {
|
|
||||||
com.sun.security.auth.module.Krb5LoginModule required
|
|
||||||
useKeyTab=true
|
|
||||||
keyTab="$PATH_TO_ZOOKEEPER_KEYTAB"
|
|
||||||
storeKey=true
|
|
||||||
useTicketCache=false
|
|
||||||
principal="zookeeper/$HOST";
|
|
||||||
};
|
|
||||||
Client {
|
|
||||||
com.sun.security.auth.module.Krb5LoginModule required
|
|
||||||
useKeyTab=true
|
|
||||||
useTicketCache=false
|
|
||||||
keyTab="$PATH_TO_HBASE_KEYTAB"
|
|
||||||
principal="hbase/$HOST";
|
|
||||||
};
|
|
||||||
</programlisting>
|
|
||||||
|
|
||||||
where the <filename>$PATH_TO_HBASE_KEYTAB</filename> and
|
|
||||||
<filename>$PATH_TO_ZOOKEEPER_KEYTAB</filename> files are what
|
|
||||||
you created above, and <code>$HOST</code> is the hostname for that
|
|
||||||
node.
|
|
||||||
|
|
||||||
<para>The <code>Server</code> section will be used by
|
|
||||||
the Zookeeper quorum server, while the
|
|
||||||
<code>Client</code> section will be used by the HBase
|
|
||||||
master and regionservers. The path to this file should
|
|
||||||
be substituted for the text <filename>$HBASE_SERVER_CONF</filename>
|
|
||||||
in the <filename>hbase-env.sh</filename>
|
|
||||||
listing below.</para>
|
|
||||||
|
|
||||||
<para>
|
|
||||||
The path to this file should be substituted for the
|
|
||||||
text <filename>$CLIENT_CONF</filename> in the
|
|
||||||
<filename>hbase-env.sh</filename> listing below.
|
|
||||||
</para>
|
|
||||||
|
|
||||||
<para>Modify your <filename>hbase-env.sh</filename> to include the
|
|
||||||
following:</para>
|
|
||||||
|
|
||||||
<programlisting>
|
|
||||||
export HBASE_OPTS="-Djava.security.auth.login.config=$CLIENT_CONF"
|
|
||||||
export HBASE_MANAGES_ZK=true
|
|
||||||
export HBASE_ZOOKEEPER_OPTS="-Djava.security.auth.login.config=$HBASE_SERVER_CONF"
|
|
||||||
export HBASE_MASTER_OPTS="-Djava.security.auth.login.config=$HBASE_SERVER_CONF"
|
|
||||||
export HBASE_REGIONSERVER_OPTS="-Djava.security.auth.login.config=$HBASE_SERVER_CONF"
|
|
||||||
</programlisting>
|
|
||||||
|
|
||||||
where <filename>$HBASE_SERVER_CONF</filename> and
|
|
||||||
<filename>$CLIENT_CONF</filename> are the full paths to the
|
|
||||||
JAAS configuration files created above.
|
|
||||||
|
|
||||||
<para>Modify your <filename>hbase-site.xml</filename> on each node
|
|
||||||
that will run zookeeper, master or regionserver to contain:</para>
|
|
||||||
|
|
||||||
<programlisting><![CDATA[
|
|
||||||
<configuration>
|
|
||||||
<property>
|
|
||||||
<name>hbase.zookeeper.quorum</name>
|
|
||||||
<value>$ZK_NODES</value>
|
|
||||||
</property>
|
|
||||||
<property>
|
|
||||||
<name>hbase.cluster.distributed</name>
|
|
||||||
<value>true</value>
|
|
||||||
</property>
|
|
||||||
<property>
|
|
||||||
<name>hbase.zookeeper.property.authProvider.1</name>
|
|
||||||
<value>org.apache.zookeeper.server.auth.SASLAuthenticationProvider</value>
|
|
||||||
</property>
|
|
||||||
<property>
|
|
||||||
<name>hbase.zookeeper.property.kerberos.removeHostFromPrincipal</name>
|
|
||||||
<value>true</value>
|
|
||||||
</property>
|
|
||||||
<property>
|
|
||||||
<name>hbase.zookeeper.property.kerberos.removeRealmFromPrincipal</name>
|
|
||||||
<value>true</value>
|
|
||||||
</property>
|
|
||||||
</configuration>
|
|
||||||
]]></programlisting>
|
|
||||||
|
|
||||||
<para>where <code>$ZK_NODES</code> is the
|
|
||||||
comma-separated list of hostnames of the Zookeeper
|
|
||||||
Quorum hosts.</para>
|
|
||||||
|
|
||||||
<para>Start your hbase cluster by running one or more
|
|
||||||
of the following set of commands on the appropriate
|
|
||||||
hosts:
|
|
||||||
</para>
|
|
||||||
|
|
||||||
<programlisting>
|
|
||||||
bin/hbase zookeeper start
|
|
||||||
bin/hbase master start
|
|
||||||
bin/hbase regionserver start
|
|
||||||
</programlisting>
|
|
||||||
|
|
||||||
</section>
|
|
||||||
|
|
||||||
<section><title>External Zookeeper Configuration</title>
|
|
||||||
<para>Add a JAAS configuration file that looks like:
|
|
||||||
|
|
||||||
<programlisting>
|
|
||||||
Client {
|
|
||||||
com.sun.security.auth.module.Krb5LoginModule required
|
|
||||||
useKeyTab=true
|
|
||||||
useTicketCache=false
|
|
||||||
keyTab="$PATH_TO_HBASE_KEYTAB"
|
|
||||||
principal="hbase/$HOST";
|
|
||||||
};
|
|
||||||
</programlisting>
|
|
||||||
|
|
||||||
where the <filename>$PATH_TO_HBASE_KEYTAB</filename> is the keytab
|
|
||||||
created above for HBase services to run on this host, and <code>$HOST</code> is the
|
|
||||||
hostname for that node. Put this in the HBase home's
|
|
||||||
configuration directory. We'll refer to this file's
|
|
||||||
full pathname as <filename>$HBASE_SERVER_CONF</filename> below.</para>
|
|
||||||
|
|
||||||
<para>Modify your hbase-env.sh to include the following:</para>
|
|
||||||
|
|
||||||
<programlisting>
|
|
||||||
export HBASE_OPTS="-Djava.security.auth.login.config=$CLIENT_CONF"
|
|
||||||
export HBASE_MANAGES_ZK=false
|
|
||||||
export HBASE_MASTER_OPTS="-Djava.security.auth.login.config=$HBASE_SERVER_CONF"
|
|
||||||
export HBASE_REGIONSERVER_OPTS="-Djava.security.auth.login.config=$HBASE_SERVER_CONF"
|
|
||||||
</programlisting>
|
|
||||||
|
|
||||||
|
|
||||||
<para>Modify your <filename>hbase-site.xml</filename> on each node
|
|
||||||
that will run a master or regionserver to contain:</para>
|
|
||||||
|
|
||||||
<programlisting><![CDATA[
|
|
||||||
<configuration>
|
|
||||||
<property>
|
|
||||||
<name>hbase.zookeeper.quorum</name>
|
|
||||||
<value>$ZK_NODES</value>
|
|
||||||
</property>
|
|
||||||
<property>
|
|
||||||
<name>hbase.cluster.distributed</name>
|
|
||||||
<value>true</value>
|
|
||||||
</property>
|
|
||||||
</configuration>
|
|
||||||
]]>
|
|
||||||
</programlisting>
|
|
||||||
|
|
||||||
<para>where <code>$ZK_NODES</code> is the
|
|
||||||
comma-separated list of hostnames of the Zookeeper
|
|
||||||
Quorum hosts.</para>
|
|
||||||
|
|
||||||
<para>
|
|
||||||
Add a <filename>zoo.cfg</filename> for each Zookeeper Quorum host containing:
|
|
||||||
<programlisting>
|
|
||||||
authProvider.1=org.apache.zookeeper.server.auth.SASLAuthenticationProvider
|
|
||||||
kerberos.removeHostFromPrincipal=true
|
|
||||||
kerberos.removeRealmFromPrincipal=true
|
|
||||||
</programlisting>
|
|
||||||
|
|
||||||
Also on each of these hosts, create a JAAS configuration file containing:
|
|
||||||
|
|
||||||
<programlisting>
|
|
||||||
Server {
|
|
||||||
com.sun.security.auth.module.Krb5LoginModule required
|
|
||||||
useKeyTab=true
|
|
||||||
keyTab="$PATH_TO_ZOOKEEPER_KEYTAB"
|
|
||||||
storeKey=true
|
|
||||||
useTicketCache=false
|
|
||||||
principal="zookeeper/$HOST";
|
|
||||||
};
|
|
||||||
</programlisting>
|
|
||||||
|
|
||||||
where <code>$HOST</code> is the hostname of each
|
|
||||||
Quorum host. We will refer to the full pathname of
|
|
||||||
this file as <filename>$ZK_SERVER_CONF</filename> below.
|
|
||||||
|
|
||||||
</para>
|
|
||||||
|
|
||||||
<para>
|
|
||||||
Start your Zookeepers on each Zookeeper Quorum host with:
|
|
||||||
|
|
||||||
<programlisting>
|
|
||||||
SERVER_JVMFLAGS="-Djava.security.auth.login.config=$ZK_SERVER_CONF" bin/zkServer start
|
|
||||||
</programlisting>
|
|
||||||
|
|
||||||
</para>
|
|
||||||
|
|
||||||
<para>
|
|
||||||
Start your HBase cluster by running one or more of the following set of commands on the appropriate nodes:
|
|
||||||
</para>
|
|
||||||
|
|
||||||
<programlisting>
|
|
||||||
bin/hbase master start
|
|
||||||
bin/hbase regionserver start
|
|
||||||
</programlisting>
|
|
||||||
|
|
||||||
|
|
||||||
</section>
|
|
||||||
|
|
||||||
<section>
|
|
||||||
<title>Zookeeper Server Authentication Log Output</title>
|
|
||||||
<para>If the configuration above is successful,
|
|
||||||
you should see something similar to the following in
|
|
||||||
your Zookeeper server logs:
|
|
||||||
<programlisting>
|
|
||||||
11/12/05 22:43:39 INFO zookeeper.Login: successfully logged in.
|
|
||||||
11/12/05 22:43:39 INFO server.NIOServerCnxnFactory: binding to port 0.0.0.0/0.0.0.0:2181
|
|
||||||
11/12/05 22:43:39 INFO zookeeper.Login: TGT refresh thread started.
|
|
||||||
11/12/05 22:43:39 INFO zookeeper.Login: TGT valid starting at: Mon Dec 05 22:43:39 UTC 2011
|
|
||||||
11/12/05 22:43:39 INFO zookeeper.Login: TGT expires: Tue Dec 06 22:43:39 UTC 2011
|
|
||||||
11/12/05 22:43:39 INFO zookeeper.Login: TGT refresh sleeping until: Tue Dec 06 18:36:42 UTC 2011
|
|
||||||
..
|
|
||||||
11/12/05 22:43:59 INFO auth.SaslServerCallbackHandler:
|
|
||||||
Successfully authenticated client: authenticationID=hbase/ip-10-166-175-249.us-west-1.compute.internal@HADOOP.LOCALDOMAIN;
|
|
||||||
authorizationID=hbase/ip-10-166-175-249.us-west-1.compute.internal@HADOOP.LOCALDOMAIN.
|
|
||||||
11/12/05 22:43:59 INFO auth.SaslServerCallbackHandler: Setting authorizedID: hbase
|
|
||||||
11/12/05 22:43:59 INFO server.ZooKeeperServer: adding SASL authorization for authorizationID: hbase
|
|
||||||
</programlisting>
|
|
||||||
|
|
||||||
</para>
|
|
||||||
|
|
||||||
</section>
|
|
||||||
|
|
||||||
<section>
|
|
||||||
<title>Zookeeper Client Authentication Log Output</title>
|
|
||||||
<para>On the Zookeeper client side (HBase master or regionserver),
|
|
||||||
you should see something similar to the following:
|
|
||||||
|
|
||||||
<programlisting>
|
|
||||||
11/12/05 22:43:59 INFO zookeeper.ZooKeeper: Initiating client connection, connectString=ip-10-166-175-249.us-west-1.compute.internal:2181 sessionTimeout=180000 watcher=master:60000
|
|
||||||
11/12/05 22:43:59 INFO zookeeper.ClientCnxn: Opening socket connection to server /10.166.175.249:2181
|
|
||||||
11/12/05 22:43:59 INFO zookeeper.RecoverableZooKeeper: The identifier of this process is 14851@ip-10-166-175-249
|
|
||||||
11/12/05 22:43:59 INFO zookeeper.Login: successfully logged in.
|
|
||||||
11/12/05 22:43:59 INFO client.ZooKeeperSaslClient: Client will use GSSAPI as SASL mechanism.
|
|
||||||
11/12/05 22:43:59 INFO zookeeper.Login: TGT refresh thread started.
|
|
||||||
11/12/05 22:43:59 INFO zookeeper.ClientCnxn: Socket connection established to ip-10-166-175-249.us-west-1.compute.internal/10.166.175.249:2181, initiating session
|
|
||||||
11/12/05 22:43:59 INFO zookeeper.Login: TGT valid starting at: Mon Dec 05 22:43:59 UTC 2011
|
|
||||||
11/12/05 22:43:59 INFO zookeeper.Login: TGT expires: Tue Dec 06 22:43:59 UTC 2011
|
|
||||||
11/12/05 22:43:59 INFO zookeeper.Login: TGT refresh sleeping until: Tue Dec 06 18:30:37 UTC 2011
|
|
||||||
11/12/05 22:43:59 INFO zookeeper.ClientCnxn: Session establishment complete on server ip-10-166-175-249.us-west-1.compute.internal/10.166.175.249:2181, sessionid = 0x134106594320000, negotiated timeout = 180000
|
|
||||||
</programlisting>
|
|
||||||
</para>
|
|
||||||
</section>
|
|
||||||
|
|
||||||
<section>
|
|
||||||
<title>Configuration from Scratch</title>
|
|
||||||
|
|
||||||
This has been tested on the current standard Amazon
|
|
||||||
Linux AMI. First setup KDC and principals as
|
|
||||||
described above. Next checkout code and run a sanity
|
|
||||||
check.
|
|
||||||
|
|
||||||
<programlisting>
|
|
||||||
git clone git://git.apache.org/hbase.git
|
|
||||||
cd hbase
|
|
||||||
mvn -PlocalTests clean test -Dtest=TestZooKeeperACL
|
|
||||||
</programlisting>
|
|
||||||
|
|
||||||
Then configure HBase as described above.
|
|
||||||
Manually edit target/cached_classpath.txt (see below)..
|
|
||||||
|
|
||||||
<programlisting>
|
|
||||||
bin/hbase zookeeper &
|
|
||||||
bin/hbase master &
|
|
||||||
bin/hbase regionserver &
|
|
||||||
</programlisting>
|
|
||||||
</section>
|
|
||||||
|
|
||||||
|
|
||||||
<section>
|
|
||||||
<title>Future improvements</title>
|
|
||||||
|
|
||||||
<section><title>Fix target/cached_classpath.txt</title>
|
|
||||||
<para>
|
|
||||||
You must override the standard hadoop-core jar file from the
|
|
||||||
<code>target/cached_classpath.txt</code>
|
|
||||||
file with the version containing the HADOOP-7070 fix. You can use the following script to do this:
|
|
||||||
|
|
||||||
<programlisting>
|
|
||||||
echo `find ~/.m2 -name "*hadoop-core*7070*SNAPSHOT.jar"` ':' `cat target/cached_classpath.txt` | sed 's/ //g' > target/tmp.txt
|
|
||||||
mv target/tmp.txt target/cached_classpath.txt
|
|
||||||
</programlisting>
|
|
||||||
|
|
||||||
</para>
|
|
||||||
|
|
||||||
</section>
|
|
||||||
|
|
||||||
<section>
|
|
||||||
<title>Set JAAS configuration
|
|
||||||
programmatically</title>
|
|
||||||
|
|
||||||
|
|
||||||
This would avoid the need for a separate Hadoop jar
|
|
||||||
that fixes <link xlink:href="https://issues.apache.org/jira/browse/HADOOP-7070">HADOOP-7070</link>.
|
|
||||||
</section>
|
|
||||||
|
|
||||||
<section>
|
|
||||||
<title>Elimination of
|
|
||||||
<code>kerberos.removeHostFromPrincipal</code> and
|
|
||||||
<code>kerberos.removeRealmFromPrincipal</code></title>
|
|
||||||
</section>
|
|
||||||
|
|
||||||
</section>
|
|
||||||
|
|
||||||
|
|
||||||
</section> <!-- SASL Authentication with ZooKeeper -->
|
|
||||||
|
|
||||||
|
|
||||||
|
|
||||||
|
|
||||||
|
|
||||||
</section> <!-- zookeeper -->
|
|
||||||
|
|
||||||
|
|
||||||
<section xml:id="config.files">
|
<section xml:id="config.files">
|
||||||
|
@@ -1704,34 +1151,4 @@ of all regions.
 
 </section> <!-- important config -->
 
-  <section xml:id="config.bloom">
-    <title>Bloom Filter Configuration</title>
-    <section>
-      <title><varname>io.hfile.bloom.enabled</varname> global kill
-      switch</title>
-
-      <para><code>io.hfile.bloom.enabled</code> in
-      <classname>Configuration</classname> serves as the kill switch in case
-      something goes wrong. Default = <varname>true</varname>.</para>
-    </section>
-
-    <section>
-      <title><varname>io.hfile.bloom.error.rate</varname></title>
-
-      <para><varname>io.hfile.bloom.error.rate</varname> = average false
-      positive rate. Default = 1%. Decrease rate by ½ (e.g. to .5%) == +1
-      bit per bloom entry.</para>
-    </section>
-
-    <section>
-      <title><varname>io.hfile.bloom.max.fold</varname></title>
-
-      <para><varname>io.hfile.bloom.max.fold</varname> = guaranteed minimum
-      fold rate. Most people should leave this alone. Default = 7, or can
-      collapse to at least 1/128th of original size. See the
-      <emphasis>Development Process</emphasis> section of the document <link
-      xlink:href="https://issues.apache.org/jira/secure/attachment/12444007/Bloom_Filters_in_HBase.pdf">BloomFilters
-      in HBase</link> for more on what this option means.</para>
-    </section>
-  </section>
 </chapter>
|
||||||
|
|
|
@@ -33,8 +33,9 @@
 
 <para><xref linkend="quickstart" /> will get you up and
 running on a single-node instance of HBase using the local filesystem.
-<xref linkend="configuration" /> describes setup
-of HBase in distributed mode running on top of HDFS.</para>
+<xref linkend="configuration" /> describes basic system
+requirements and configuration for running HBase in distributed mode
+on top of HDFS.</para>
 </section>
 
 <section xml:id="quickstart">
|
||||||
|
@@ -51,7 +52,7 @@
 
 <para>Choose a download site from this list of <link
 xlink:href="http://www.apache.org/dyn/closer.cgi/hbase/">Apache Download
-Mirrors</link>. Click on suggested top link. This will take you to a
+Mirrors</link>. Click on the suggested top link. This will take you to a
 mirror of <emphasis>HBase Releases</emphasis>. Click on the folder named
 <filename>stable</filename> and then download the file that ends in
 <filename>.tar.gz</filename> to your local filesystem; e.g.
|
||||||
|
@@ -65,24 +66,21 @@ $ cd hbase-<?eval ${project.version}?>
 </programlisting></para>
 
 <para>At this point, you are ready to start HBase. But before starting
-it, you might want to edit <filename>conf/hbase-site.xml</filename> and
-set the directory you want HBase to write to,
-<varname>hbase.rootdir</varname>. <programlisting>
-<?xml version="1.0"?>
+it, you might want to edit <filename>conf/hbase-site.xml</filename>, the
+file you write your site-specific configurations into, and
+set <varname>hbase.rootdir</varname>, the directory HBase writes data to,
+<programlisting><?xml version="1.0"?>
 <?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
 <configuration>
   <property>
     <name>hbase.rootdir</name>
     <value>file:///DIRECTORY/hbase</value>
   </property>
-</configuration>
-</programlisting> Replace <varname>DIRECTORY</varname> in the above with a
-path to a directory where you want HBase to store its data. By default,
+</configuration></programlisting> Replace <varname>DIRECTORY</varname> in the above with the
+path to the directory where you want HBase to store its data. By default,
 <varname>hbase.rootdir</varname> is set to
 <filename>/tmp/hbase-${user.name}</filename> which means you'll lose all
-your data whenever your server reboots (Most operating systems clear
+your data whenever your server reboots unless you change it (Most operating systems clear
 <filename>/tmp</filename> on restart).</para>
 </section>
|
||||||
|
|
||||||
|
@@ -96,7 +94,7 @@ starting Master, logging to logs/hbase-user-master-example.org.out</programlisti
 standalone mode, HBase runs all daemons in the one JVM; i.e. both
 the HBase and ZooKeeper daemons. HBase logs can be found in the
 <filename>logs</filename> subdirectory. Check them out especially if
-HBase had trouble starting.</para>
+it seems HBase had trouble starting.</para>
 
 <note>
 <title>Is <application>java</application> installed?</title>
@@ -108,7 +106,7 @@ starting Master, logging to logs/hbase-user-master-example.org.out</programlisti
 options the java program takes (HBase requires java 6). If this is not
 the case, HBase will not start. Install java, edit
 <filename>conf/hbase-env.sh</filename>, uncommenting the
-<envar>JAVA_HOME</envar> line pointing it to your java install. Then,
+<envar>JAVA_HOME</envar> line pointing it to your java install, then,
 retry the steps above.</para>
 </note>
 </section>
|
||||||
|
@@ -154,9 +152,7 @@ hbase(main):006:0> put 'test', 'row3', 'cf:c', 'value3'
 <varname>cf</varname> in this example -- followed by a colon and then a
 column qualifier suffix (<varname>a</varname> in this case).</para>
 
-<para>Verify the data insert.</para>
-
-<para>Run a scan of the table by doing the following</para>
+<para>Verify the data insert by running a scan of the table as follows</para>
 
 <para><programlisting>hbase(main):007:0> scan 'test'
 ROW        COLUMN+CELL
@@ -165,7 +161,7 @@ row2 column=cf:b, timestamp=1288380738440, value=value2
 row3 column=cf:c, timestamp=1288380747365, value=value3
 3 row(s) in 0.0590 seconds</programlisting></para>
 
-<para>Get a single row as follows</para>
+<para>Get a single row</para>
 
 <para><programlisting>hbase(main):008:0> get 'test', 'row1'
 COLUMN        CELL
|
||||||
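<para>Editorial aside: the same put/scan/get sequence, sketched with the Java client of that era.
Treat it as illustrative; methods such as <code>Put.add</code> were later deprecated in favour of
<code>addColumn</code>, so adjust to your release:</para>
<programlisting>
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.client.*;
import org.apache.hadoop.hbase.util.Bytes;

Configuration conf = HBaseConfiguration.create();
HTable table = new HTable(conf, "test");

// Equivalent of: put 'test', 'row1', 'cf:a', 'value1'
Put put = new Put(Bytes.toBytes("row1"));
put.add(Bytes.toBytes("cf"), Bytes.toBytes("a"), Bytes.toBytes("value1"));
table.put(put);

// Equivalent of: scan 'test'
ResultScanner scanner = table.getScanner(new Scan());
for (Result row : scanner) {
  System.out.println(row);
}
scanner.close();

// Equivalent of: get 'test', 'row1'
Result r = table.get(new Get(Bytes.toBytes("row1")));
System.out.println(r);
table.close();
</programlisting>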
|
@@ -198,9 +194,9 @@ stopping hbase...............</programlisting></para>
 <title>Where to go next</title>
 
 <para>The above described standalone setup is good for testing and
-experiments only. Next move on to <xref linkend="configuration" /> where we'll go into
-depth on the different HBase run modes, requirements and critical
-configurations needed setting up a distributed HBase deploy.</para>
+experiments only. In the next chapter, <xref linkend="configuration" />,
+we'll go into depth on the different HBase run modes, the system requirements
+for running HBase, and the critical configurations needed for setting up a
+distributed HBase deploy.</para>
 </section>
 </section>
|
||||||
|
|
||||||
|
|
|
@@ -526,6 +526,96 @@ htable.close();</programlisting></para>
 too few regions then the reads could likely be served from too few nodes. </para>
 <para>See <xref linkend="precreate.regions"/>, as well as <xref linkend="perf.configurations"/> </para>
 </section>
+  <section xml:id="blooms">
+    <title>Bloom Filters</title>
+    <para>Enabling Bloom Filters can save you having to go to disk and
+    can help improve read latencies.</para>
||||||
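<para>Editorial aside: a hedged Java sketch of turning blooms on for a column family at table
creation time. The table and family names are placeholders, and the enum moved between releases
(<classname>StoreFile.BloomType</classname> in 0.92/0.94, a standalone
<classname>BloomType</classname> later), so treat this as illustrative rather than canonical:</para>
<programlisting>
import org.apache.hadoop.hbase.HColumnDescriptor;
import org.apache.hadoop.hbase.HTableDescriptor;
import org.apache.hadoop.hbase.client.HBaseAdmin;
import org.apache.hadoop.hbase.regionserver.StoreFile;

HBaseAdmin admin = new HBaseAdmin(conf);          // conf from HBaseConfiguration.create()
HTableDescriptor desc = new HTableDescriptor("mytable");
HColumnDescriptor family = new HColumnDescriptor("cf");
family.setBloomFilterType(StoreFile.BloomType.ROW);   // or ROWCOL for row+column blooms
desc.addFamily(family);
admin.createTable(desc);
</programlisting>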
+    <para><link xlink:href="http://en.wikipedia.org/wiki/Bloom_filter">Bloom filters</link> were developed over in <link
+      xlink:href="https://issues.apache.org/jira/browse/HBASE-1200">HBase-1200
+      Add bloomfilters</link>.<footnote>
+        <para>For description of the development process -- why static blooms
+        rather than dynamic -- and for an overview of the unique properties
+        that pertain to blooms in HBase, as well as possible future
+        directions, see the <emphasis>Development Process</emphasis> section
+        of the document <link
+        xlink:href="https://issues.apache.org/jira/secure/attachment/12444007/Bloom_Filters_in_HBase.pdf">BloomFilters
+        in HBase</link> attached to <link
+        xlink:href="https://issues.apache.org/jira/browse/HBASE-1200">HBase-1200</link>.</para>
+      </footnote><footnote>
+        <para>The bloom filters described here are actually version two of
+        blooms in HBase. In versions up to 0.19.x, HBase had a dynamic bloom
+        option based on work done by the <link
+        xlink:href="http://www.one-lab.org">European Commission One-Lab
+        Project 034819</link>. The core of the HBase bloom work was later
+        pulled up into Hadoop to implement org.apache.hadoop.io.BloomMapFile.
+        Version 1 of HBase blooms never worked that well. Version 2 is a
+        rewrite from scratch, though again it starts with the One-Lab
+        work.</para>
+      </footnote></para>
+    <para>See also <xref linkend="schema.bloom" />.
+    </para>
+
+    <section xml:id="bloom_footprint">
+      <title>Bloom StoreFile footprint</title>
+
+      <para>Bloom filters add an entry to the <classname>StoreFile</classname>
+      general <classname>FileInfo</classname> data structure and then two
+      extra entries to the <classname>StoreFile</classname> metadata
+      section.</para>
+
+      <section>
+        <title>BloomFilter in the <classname>StoreFile</classname>
+        <classname>FileInfo</classname> data structure</title>
+
+        <para><classname>FileInfo</classname> has a
+        <varname>BLOOM_FILTER_TYPE</varname> entry which is set to
+        <varname>NONE</varname>, <varname>ROW</varname> or
+        <varname>ROWCOL</varname>.</para>
+      </section>
+
+      <section>
+        <title>BloomFilter entries in <classname>StoreFile</classname>
+        metadata</title>
+
+        <para><varname>BLOOM_FILTER_META</varname> holds the Bloom size, the hash
+        function used, and so on. It is small in size and is cached on
+        <classname>StoreFile.Reader</classname> load.</para>
+        <para><varname>BLOOM_FILTER_DATA</varname> is the actual bloom filter
+        data. It is obtained on demand and stored in the LRU cache, if the cache
+        is enabled (it is enabled by default).</para>
+      </section>
+    </section>
+    <section xml:id="config.bloom">
+      <title>Bloom Filter Configuration</title>
+      <section>
+        <title><varname>io.hfile.bloom.enabled</varname> global kill
+        switch</title>
+
+        <para><code>io.hfile.bloom.enabled</code> in
+        <classname>Configuration</classname> serves as the kill switch in case
+        something goes wrong. Default = <varname>true</varname>.</para>
+      </section>
+
+      <section>
+        <title><varname>io.hfile.bloom.error.rate</varname></title>
+
+        <para><varname>io.hfile.bloom.error.rate</varname> = average false
+        positive rate. Default = 1%. Decrease rate by ½ (e.g. to .5%) == +1
+        bit per bloom entry.</para>
+      </section>
+
+      <section>
+        <title><varname>io.hfile.bloom.max.fold</varname></title>
+
+        <para><varname>io.hfile.bloom.max.fold</varname> = guaranteed minimum
+        fold rate. Most people should leave this alone. Default = 7, or can
+        collapse to at least 1/128th of original size. See the
+        <emphasis>Development Process</emphasis> section of the document <link
+        xlink:href="https://issues.apache.org/jira/secure/attachment/12444007/Bloom_Filters_in_HBase.pdf">BloomFilters
+        in HBase</link> for more on what this option means.</para>
+      </section>
+    </section>
|
||||||
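<para>Editorial aside: these are server-side settings that normally live in
<filename>hbase-site.xml</filename>. Purely as a sketch, the same keys can be set on a
<classname>Configuration</classname> from Java (for example in a test); the values shown are
illustrative only:</para>
<programlisting>
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;

Configuration conf = HBaseConfiguration.create();
conf.setBoolean("io.hfile.bloom.enabled", true);     // global kill switch, default true
conf.setFloat("io.hfile.bloom.error.rate", 0.005f);  // 0.5% false positives = +1 bit per entry
conf.setInt("io.hfile.bloom.max.fold", 7);           // guaranteed minimum fold rate
</programlisting>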
+  </section> <!-- bloom -->
+
 </section> <!-- reading -->
||||||
|
|
||||||
|
|
|
@@ -0,0 +1,586 @@
+<?xml version="1.0"?>
+<chapter xml:id="zookeeper"
+  version="5.0" xmlns="http://docbook.org/ns/docbook"
+  xmlns:xlink="http://www.w3.org/1999/xlink"
+  xmlns:xi="http://www.w3.org/2001/XInclude"
+  xmlns:svg="http://www.w3.org/2000/svg"
+  xmlns:m="http://www.w3.org/1998/Math/MathML"
+  xmlns:html="http://www.w3.org/1999/xhtml"
+  xmlns:db="http://docbook.org/ns/docbook">
|
||||||
|
<!--
|
||||||
|
/**
|
||||||
|
* Licensed to the Apache Software Foundation (ASF) under one
|
||||||
|
* or more contributor license agreements. See the NOTICE file
|
||||||
|
* distributed with this work for additional information
|
||||||
|
* regarding copyright ownership. The ASF licenses this file
|
||||||
|
* to you under the Apache License, Version 2.0 (the
|
||||||
|
* "License"); you may not use this file except in compliance
|
||||||
|
* with the License. You may obtain a copy of the License at
|
||||||
|
*
|
||||||
|
* http://www.apache.org/licenses/LICENSE-2.0
|
||||||
|
*
|
||||||
|
* Unless required by applicable law or agreed to in writing, software
|
||||||
|
* distributed under the License is distributed on an "AS IS" BASIS,
|
||||||
|
* WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
|
||||||
|
* See the License for the specific language governing permissions and
|
||||||
|
* limitations under the License.
|
||||||
|
*/
|
||||||
|
-->
|
||||||
|
|
||||||
|
<title>ZooKeeper<indexterm>
|
||||||
|
<primary>ZooKeeper</primary>
|
||||||
|
</indexterm></title>
|
||||||
|
|
||||||
|
<para>A distributed HBase depends on a running ZooKeeper cluster.
|
||||||
|
All participating nodes and clients need to be able to access the
|
||||||
|
running ZooKeeper ensemble. HBase by default manages a ZooKeeper
|
||||||
|
"cluster" for you. It will start and stop the ZooKeeper ensemble
|
||||||
|
as part of the HBase start/stop process. You can also manage the
|
||||||
|
ZooKeeper ensemble independent of HBase and just point HBase at
|
||||||
|
the cluster it should use. To toggle HBase management of
|
||||||
|
ZooKeeper, use the <varname>HBASE_MANAGES_ZK</varname> variable in
|
||||||
|
<filename>conf/hbase-env.sh</filename>. This variable, which
|
||||||
|
defaults to <varname>true</varname>, tells HBase whether to
|
||||||
|
start/stop the ZooKeeper ensemble servers as part of HBase
|
||||||
|
start/stop.</para>
|
||||||
|
|
||||||
|
<para>When HBase manages the ZooKeeper ensemble, you can specify
|
||||||
|
ZooKeeper configuration using its native
|
||||||
|
<filename>zoo.cfg</filename> file, or, the easier option is to
|
||||||
|
just specify ZooKeeper options directly in
|
||||||
|
<filename>conf/hbase-site.xml</filename>. A ZooKeeper
|
||||||
|
configuration option can be set as a property in the HBase
|
||||||
|
<filename>hbase-site.xml</filename> XML configuration file by
|
||||||
|
prefacing the ZooKeeper option name with
|
||||||
|
<varname>hbase.zookeeper.property</varname>. For example, the
|
||||||
|
<varname>clientPort</varname> setting in ZooKeeper can be changed
|
||||||
|
by setting the
|
||||||
|
<varname>hbase.zookeeper.property.clientPort</varname> property.
|
||||||
|
For all default values used by HBase, including ZooKeeper
|
||||||
|
configuration, see <xref linkend="hbase_default_configurations" />. Look for the
|
||||||
|
<varname>hbase.zookeeper.property</varname> prefix <footnote>
|
||||||
|
<para>For the full list of ZooKeeper configurations, see
|
||||||
|
ZooKeeper's <filename>zoo.cfg</filename>. HBase does not ship
|
||||||
|
with a <filename>zoo.cfg</filename> so you will need to browse
|
||||||
|
the <filename>conf</filename> directory in an appropriate
|
||||||
|
ZooKeeper download.</para>
|
||||||
|
</footnote></para>
|
||||||
|
|
||||||
|
<para>You must at least list the ensemble servers in
|
||||||
|
<filename>hbase-site.xml</filename> using the
|
||||||
|
<varname>hbase.zookeeper.quorum</varname> property. This property
|
||||||
|
defaults to a single ensemble member at
|
||||||
|
<varname>localhost</varname> which is not suitable for a fully
|
||||||
|
distributed HBase. (It binds to the local machine only and remote
|
||||||
|
clients will not be able to connect). <note xml:id="how_many_zks">
|
||||||
|
<title>How many ZooKeepers should I run?</title>
|
||||||
|
|
||||||
|
<para>You can run a ZooKeeper ensemble that comprises 1 node
|
||||||
|
only but in production it is recommended that you run a
|
||||||
|
ZooKeeper ensemble of 3, 5 or 7 machines; the more members an
|
||||||
|
ensemble has, the more tolerant the ensemble is of host
|
||||||
|
failures. Also, run an odd number of machines. In ZooKeeper,
|
||||||
|
an even number of peers is supported, but it is normally not used
|
||||||
|
because an even sized ensemble requires, proportionally, more peers
|
||||||
|
to form a quorum than an odd sized ensemble requires. For example, an
|
||||||
|
ensemble with 4 peers requires 3 to form a quorum, while an ensemble with
|
||||||
|
5 also requires 3 to form a quorum. Thus, an ensemble of 5 allows 2 peers to
|
||||||
|
fail, and thus is more fault tolerant than the ensemble of 4, which allows
|
||||||
|
only 1 down peer.
|
||||||
|
</para>
|
||||||
|
<para>Give each ZooKeeper server around 1GB of RAM, and if possible, its own
|
||||||
|
dedicated disk (A dedicated disk is the best thing you can do
|
||||||
|
to ensure a performant ZooKeeper ensemble). For very heavily
|
||||||
|
loaded clusters, run ZooKeeper servers on separate machines
|
||||||
|
from RegionServers (DataNodes and TaskTrackers).</para>
|
||||||
|
</note></para>
|
||||||
|
|
||||||
|
<para>For example, to have HBase manage a ZooKeeper quorum on
|
||||||
|
nodes <emphasis>rs{1,2,3,4,5}.example.com</emphasis>, bound to
|
||||||
|
port 2222 (the default is 2181) ensure
|
||||||
|
<varname>HBASE_MANAGES_ZK</varname> is commented out or set to
|
||||||
|
<varname>true</varname> in <filename>conf/hbase-env.sh</filename>
|
||||||
|
and then edit <filename>conf/hbase-site.xml</filename> and set
|
||||||
|
<varname>hbase.zookeeper.property.clientPort</varname> and
|
||||||
|
<varname>hbase.zookeeper.quorum</varname>. You should also set
|
||||||
|
<varname>hbase.zookeeper.property.dataDir</varname> to other than
|
||||||
|
the default as the default has ZooKeeper persist data under
|
||||||
|
<filename>/tmp</filename> which is often cleared on system
|
||||||
|
restart. In the example below we have ZooKeeper persist to
|
||||||
|
<filename>/usr/local/zookeeper</filename>. <programlisting>
|
||||||
|
<configuration>
  ...
  <property>
    <name>hbase.zookeeper.property.clientPort</name>
    <value>2222</value>
    <description>Property from ZooKeeper's config zoo.cfg.
    The port at which the clients will connect.
    </description>
  </property>
  <property>
    <name>hbase.zookeeper.quorum</name>
    <value>rs1.example.com,rs2.example.com,rs3.example.com,rs4.example.com,rs5.example.com</value>
    <description>Comma separated list of servers in the ZooKeeper Quorum.
    For example, "host1.mydomain.com,host2.mydomain.com,host3.mydomain.com".
    By default this is set to localhost for local and pseudo-distributed modes
    of operation. For a fully-distributed setup, this should be set to a full
    list of ZooKeeper quorum servers. If HBASE_MANAGES_ZK is set in hbase-env.sh
    this is the list of servers which we will start/stop ZooKeeper on.
    </description>
  </property>
  <property>
    <name>hbase.zookeeper.property.dataDir</name>
    <value>/usr/local/zookeeper</value>
    <description>Property from ZooKeeper's config zoo.cfg.
    The directory where the snapshot is stored.
    </description>
  </property>
  ...
</configuration></programlisting></para>
      <section>
        <title>Using existing ZooKeeper ensemble</title>

        <para>To point HBase at an existing ZooKeeper cluster, one that
        is not managed by HBase, set <varname>HBASE_MANAGES_ZK</varname>
        in <filename>conf/hbase-env.sh</filename> to false:
        <programlisting>
  ...
  # Tell HBase whether it should manage its own instance of ZooKeeper or not.
  export HBASE_MANAGES_ZK=false</programlisting> Next set ensemble locations
        and client port, if non-standard, in
        <filename>hbase-site.xml</filename>, or add a suitably
        configured <filename>zoo.cfg</filename> to HBase's
        <filename>CLASSPATH</filename>. HBase will prefer the
        configuration found in <filename>zoo.cfg</filename> over any
        settings in <filename>hbase-site.xml</filename>.</para>
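        <para>As an illustration only (the host names, port, and data
        directory below are assumptions, not a recommended layout), a minimal
        <filename>zoo.cfg</filename> for a three-node ensemble placed on
        HBase's <filename>CLASSPATH</filename> might look like the
        following; adapt it to your own environment:</para>
        <programlisting>
# Illustrative zoo.cfg only -- adjust paths, port and hosts to your site
tickTime=2000
initLimit=10
syncLimit=5
dataDir=/usr/local/zookeeper
clientPort=2181
server.1=zk1.example.com:2888:3888
server.2=zk2.example.com:2888:3888
server.3=zk3.example.com:2888:3888
        </programlisting>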
        <para>When HBase manages ZooKeeper, it will start/stop the
        ZooKeeper servers as a part of the regular start/stop scripts.
        If you would like to run ZooKeeper yourself, independent of
        HBase start/stop, you would do the following:</para>

        <programlisting>
${HBASE_HOME}/bin/hbase-daemons.sh {start,stop} zookeeper
        </programlisting>

        <para>Note that you can use HBase in this manner to spin up a
        ZooKeeper cluster, unrelated to HBase. Just make sure to set
        <varname>HBASE_MANAGES_ZK</varname> to <varname>false</varname>
        if you want it to stay up across HBase restarts so that when
        HBase shuts down, it doesn't take ZooKeeper down with it.</para>

        <para>For more information about running a distinct ZooKeeper
        cluster, see the ZooKeeper <link
        xlink:href="http://hadoop.apache.org/zookeeper/docs/current/zookeeperStarted.html">Getting
        Started Guide</link>. Additionally, see the <link xlink:href="http://wiki.apache.org/hadoop/ZooKeeper/FAQ#A7">ZooKeeper Wiki</link> or the
        <link xlink:href="http://zookeeper.apache.org/doc/r3.3.3/zookeeperAdmin.html#sc_zkMulitServerSetup">ZooKeeper documentation</link>
        for more information on ZooKeeper sizing.
        </para>
      </section>
      <section xml:id="zk.sasl.auth">
        <title>SASL Authentication with ZooKeeper</title>

        <para>Newer releases of HBase (>= 0.92) will
        support connecting to a ZooKeeper Quorum that supports
        SASL authentication (which is available in ZooKeeper
        versions 3.4.0 or later).</para>

        <para>This describes how to set up HBase to mutually
        authenticate with a ZooKeeper Quorum. ZooKeeper/HBase
        mutual authentication (<link
        xlink:href="https://issues.apache.org/jira/browse/HBASE-2418">HBASE-2418</link>)
        is required as part of a complete secure HBase configuration
        (<link
        xlink:href="https://issues.apache.org/jira/browse/HBASE-3025">HBASE-3025</link>).
        For simplicity of explication, this section ignores
        additional configuration that is also required (Secure HDFS and Coprocessor
        configuration). It is recommended to begin with an
        HBase-managed ZooKeeper configuration (as opposed to a
        standalone ZooKeeper quorum) for ease of learning.
        </para>
        <section><title>Operating System Prerequisites</title>

        <para>
        You need to have a working Kerberos KDC setup. For
        each <code>$HOST</code> that will run a ZooKeeper
        server, you should have a principal
        <code>zookeeper/$HOST</code>. For each such host,
        add a service key (using the <code>kadmin</code> or
        <code>kadmin.local</code> tool's <code>ktadd</code>
        command) for <code>zookeeper/$HOST</code>, copy
        the resulting keytab file to <code>$HOST</code>, and make it
        readable only to the user that will run ZooKeeper on
        <code>$HOST</code>. Note the location of this file,
        which we will use below as
        <filename>$PATH_TO_ZOOKEEPER_KEYTAB</filename>.
        </para>
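        <para>Purely as a sketch of what this might look like (the host
        name and keytab path below are assumptions; your KDC, realm and
        layout will differ), a <code>kadmin.local</code> session for one
        ZooKeeper host could be:</para>
        <programlisting>
# Illustrative only -- adapt principal, realm and paths to your site
kadmin.local -q "addprinc -randkey zookeeper/zk1.example.com"
kadmin.local -q "ktadd -k /etc/zookeeper/conf/zookeeper.keytab zookeeper/zk1.example.com"
# then copy zookeeper.keytab to zk1.example.com and restrict its permissions, e.g.
#   chown zookeeper /etc/zookeeper/conf/zookeeper.keytab
#   chmod 400 /etc/zookeeper/conf/zookeeper.keytab
        </programlisting>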
        <para>
        Similarly, for each <code>$HOST</code> that will run
        an HBase server (master or regionserver), you should
        have a principal: <code>hbase/$HOST</code>. For each
        host, add a keytab file called
        <filename>hbase.keytab</filename> containing a service
        key for <code>hbase/$HOST</code>, copy this file to
        <code>$HOST</code>, and make it readable only to the
        user that will run an HBase service on
        <code>$HOST</code>. Note the location of this file,
        which we will use below as
        <filename>$PATH_TO_HBASE_KEYTAB</filename>.
        </para>
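        <para>Again as an illustration only (host name and keytab path are
        assumptions), the corresponding <code>kadmin.local</code> commands
        might be:</para>
        <programlisting>
# Illustrative only
kadmin.local -q "addprinc -randkey hbase/rs1.example.com"
kadmin.local -q "ktadd -k /etc/hbase/conf/hbase.keytab hbase/rs1.example.com"
        </programlisting>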
        <para>
        Each user who will be an HBase client should also be
        given a Kerberos principal. This principal should
        usually have a password assigned to it (as opposed to,
        as with the HBase servers, a keytab file) which only
        this user knows. The client's principal's
        <code>maxrenewlife</code> should be set so that the ticket can
        be renewed enough times for the user to complete their
        HBase client processes. For example, if a user runs a
        long-running HBase client process that takes at most 3
        days, we might create this user's principal within
        <code>kadmin</code> with: <code>addprinc -maxrenewlife
        3days</code>. The ZooKeeper client and server
        libraries manage their own ticket refreshment by
        running threads that wake up periodically to do the
        refreshment.
        </para>
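        <para>A minimal sketch for a hypothetical client user
        <code>alice</code> (the principal name and renewable lifetime are
        assumptions):</para>
        <programlisting>
# Illustrative only
kadmin.local -q "addprinc -maxrenewlife 3days alice"
# later, on the client host, obtain a renewable ticket before starting the HBase client
kinit -r 3d alice
        </programlisting>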
        <para>On each host that will run an HBase client
        (e.g. <code>hbase shell</code>), add the following
        file to the HBase home directory's <filename>conf</filename>
        directory:</para>

        <programlisting>
Client {
  com.sun.security.auth.module.Krb5LoginModule required
  useKeyTab=false
  useTicketCache=true;
};
        </programlisting>

        <para>We'll refer to this JAAS configuration file as
        <filename>$CLIENT_CONF</filename> below.</para>
        </section>
        <section>
        <title>HBase-managed ZooKeeper Configuration</title>

        <para>On each node that will run a ZooKeeper server, a
        master, or a regionserver, create a <link
        xlink:href="http://docs.oracle.com/javase/1.4.2/docs/guide/security/jgss/tutorials/LoginConfigFile.html">JAAS</link>
        configuration file in the conf directory of the node's
        <filename>HBASE_HOME</filename> directory that looks like the
        following:</para>

        <programlisting>
Server {
  com.sun.security.auth.module.Krb5LoginModule required
  useKeyTab=true
  keyTab="$PATH_TO_ZOOKEEPER_KEYTAB"
  storeKey=true
  useTicketCache=false
  principal="zookeeper/$HOST";
};
Client {
  com.sun.security.auth.module.Krb5LoginModule required
  useKeyTab=true
  useTicketCache=false
  keyTab="$PATH_TO_HBASE_KEYTAB"
  principal="hbase/$HOST";
};
        </programlisting>
        <para>where the <filename>$PATH_TO_HBASE_KEYTAB</filename> and
        <filename>$PATH_TO_ZOOKEEPER_KEYTAB</filename> files are what
        you created above, and <code>$HOST</code> is the hostname for that
        node.</para>

        <para>The <code>Server</code> section will be used by
        the ZooKeeper quorum server, while the
        <code>Client</code> section will be used by the HBase
        master and regionservers. The path to this file should
        be substituted for the text <filename>$HBASE_SERVER_CONF</filename>
        in the <filename>hbase-env.sh</filename>
        listing below.</para>
        <para>
        The path to the per-client JAAS configuration file described
        above should be substituted for the
        text <filename>$CLIENT_CONF</filename> in the
        <filename>hbase-env.sh</filename> listing below.
        </para>

        <para>Modify your <filename>hbase-env.sh</filename> to include the
        following:</para>

        <programlisting>
export HBASE_OPTS="-Djava.security.auth.login.config=$CLIENT_CONF"
export HBASE_MANAGES_ZK=true
export HBASE_ZOOKEEPER_OPTS="-Djava.security.auth.login.config=$HBASE_SERVER_CONF"
export HBASE_MASTER_OPTS="-Djava.security.auth.login.config=$HBASE_SERVER_CONF"
export HBASE_REGIONSERVER_OPTS="-Djava.security.auth.login.config=$HBASE_SERVER_CONF"
        </programlisting>
        <para>where <filename>$HBASE_SERVER_CONF</filename> and
        <filename>$CLIENT_CONF</filename> are the full paths to the
        JAAS configuration files created above.</para>

        <para>Modify your <filename>hbase-site.xml</filename> on each node
        that will run a ZooKeeper server, master, or regionserver to contain:</para>
        <programlisting><![CDATA[
<configuration>
  <property>
    <name>hbase.zookeeper.quorum</name>
    <value>$ZK_NODES</value>
  </property>
  <property>
    <name>hbase.cluster.distributed</name>
    <value>true</value>
  </property>
  <property>
    <name>hbase.zookeeper.property.authProvider.1</name>
    <value>org.apache.zookeeper.server.auth.SASLAuthenticationProvider</value>
  </property>
  <property>
    <name>hbase.zookeeper.property.kerberos.removeHostFromPrincipal</name>
    <value>true</value>
  </property>
  <property>
    <name>hbase.zookeeper.property.kerberos.removeRealmFromPrincipal</name>
    <value>true</value>
  </property>
</configuration>
]]></programlisting>
        <para>where <code>$ZK_NODES</code> is the
        comma-separated list of hostnames of the ZooKeeper
        Quorum hosts.</para>
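        <para>For instance (the host names are placeholders only),
        <code>$ZK_NODES</code> might be
        <code>zk1.example.com,zk2.example.com,zk3.example.com</code>.</para>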
        <para>Start your HBase cluster by running one or more
        of the following set of commands on the appropriate
        hosts:
        </para>

        <programlisting>
bin/hbase zookeeper start
bin/hbase master start
bin/hbase regionserver start
        </programlisting>

        </section>
        <section><title>External ZooKeeper Configuration</title>
        <para>Add a JAAS configuration file that looks like:
        <programlisting>
Client {
  com.sun.security.auth.module.Krb5LoginModule required
  useKeyTab=true
  useTicketCache=false
  keyTab="$PATH_TO_HBASE_KEYTAB"
  principal="hbase/$HOST";
};
        </programlisting>
        where <filename>$PATH_TO_HBASE_KEYTAB</filename> is the keytab
        created above for HBase services to run on this host, and <code>$HOST</code> is the
        hostname for that node. Put this in the HBase home's
        configuration directory. We'll refer to this file's
        full pathname as <filename>$HBASE_SERVER_CONF</filename> below.</para>
        <para>Modify your <filename>hbase-env.sh</filename> to include the following:</para>

        <programlisting>
export HBASE_OPTS="-Djava.security.auth.login.config=$CLIENT_CONF"
export HBASE_MANAGES_ZK=false
export HBASE_MASTER_OPTS="-Djava.security.auth.login.config=$HBASE_SERVER_CONF"
export HBASE_REGIONSERVER_OPTS="-Djava.security.auth.login.config=$HBASE_SERVER_CONF"
        </programlisting>
        <para>Modify your <filename>hbase-site.xml</filename> on each node
        that will run a master or regionserver to contain:</para>

        <programlisting><![CDATA[
<configuration>
  <property>
    <name>hbase.zookeeper.quorum</name>
    <value>$ZK_NODES</value>
  </property>
  <property>
    <name>hbase.cluster.distributed</name>
    <value>true</value>
  </property>
</configuration>
]]></programlisting>

        <para>where <code>$ZK_NODES</code> is the
        comma-separated list of hostnames of the ZooKeeper
        Quorum hosts.</para>
        <para>
        Add a <filename>zoo.cfg</filename> for each ZooKeeper Quorum host containing:
        <programlisting>
authProvider.1=org.apache.zookeeper.server.auth.SASLAuthenticationProvider
kerberos.removeHostFromPrincipal=true
kerberos.removeRealmFromPrincipal=true
        </programlisting>

        Also on each of these hosts, create a JAAS configuration file containing:

        <programlisting>
Server {
  com.sun.security.auth.module.Krb5LoginModule required
  useKeyTab=true
  keyTab="$PATH_TO_ZOOKEEPER_KEYTAB"
  storeKey=true
  useTicketCache=false
  principal="zookeeper/$HOST";
};
        </programlisting>

        where <code>$HOST</code> is the hostname of each
        Quorum host. We will refer to the full pathname of
        this file as <filename>$ZK_SERVER_CONF</filename> below.
        </para>
        <para>
        Start your ZooKeepers on each ZooKeeper Quorum host with:
        <programlisting>
SERVER_JVMFLAGS="-Djava.security.auth.login.config=$ZK_SERVER_CONF" bin/zkServer start
        </programlisting>
        </para>

        <para>
        Start your HBase cluster by running one or more of the following set of commands on the appropriate nodes:
        </para>

        <programlisting>
bin/hbase master start
bin/hbase regionserver start
        </programlisting>

        </section>
        <section>
        <title>ZooKeeper Server Authentication Log Output</title>
        <para>If the configuration above is successful,
        you should see something similar to the following in
        your ZooKeeper server logs:
        <programlisting>
11/12/05 22:43:39 INFO zookeeper.Login: successfully logged in.
11/12/05 22:43:39 INFO server.NIOServerCnxnFactory: binding to port 0.0.0.0/0.0.0.0:2181
11/12/05 22:43:39 INFO zookeeper.Login: TGT refresh thread started.
11/12/05 22:43:39 INFO zookeeper.Login: TGT valid starting at: Mon Dec 05 22:43:39 UTC 2011
11/12/05 22:43:39 INFO zookeeper.Login: TGT expires: Tue Dec 06 22:43:39 UTC 2011
11/12/05 22:43:39 INFO zookeeper.Login: TGT refresh sleeping until: Tue Dec 06 18:36:42 UTC 2011
..
11/12/05 22:43:59 INFO auth.SaslServerCallbackHandler:
Successfully authenticated client: authenticationID=hbase/ip-10-166-175-249.us-west-1.compute.internal@HADOOP.LOCALDOMAIN;
authorizationID=hbase/ip-10-166-175-249.us-west-1.compute.internal@HADOOP.LOCALDOMAIN.
11/12/05 22:43:59 INFO auth.SaslServerCallbackHandler: Setting authorizedID: hbase
11/12/05 22:43:59 INFO server.ZooKeeperServer: adding SASL authorization for authorizationID: hbase
        </programlisting>
        </para>
        </section>
        <section>
        <title>ZooKeeper Client Authentication Log Output</title>
        <para>On the ZooKeeper client side (HBase master or regionserver),
        you should see something similar to the following:
        <programlisting>
11/12/05 22:43:59 INFO zookeeper.ZooKeeper: Initiating client connection, connectString=ip-10-166-175-249.us-west-1.compute.internal:2181 sessionTimeout=180000 watcher=master:60000
11/12/05 22:43:59 INFO zookeeper.ClientCnxn: Opening socket connection to server /10.166.175.249:2181
11/12/05 22:43:59 INFO zookeeper.RecoverableZooKeeper: The identifier of this process is 14851@ip-10-166-175-249
11/12/05 22:43:59 INFO zookeeper.Login: successfully logged in.
11/12/05 22:43:59 INFO client.ZooKeeperSaslClient: Client will use GSSAPI as SASL mechanism.
11/12/05 22:43:59 INFO zookeeper.Login: TGT refresh thread started.
11/12/05 22:43:59 INFO zookeeper.ClientCnxn: Socket connection established to ip-10-166-175-249.us-west-1.compute.internal/10.166.175.249:2181, initiating session
11/12/05 22:43:59 INFO zookeeper.Login: TGT valid starting at: Mon Dec 05 22:43:59 UTC 2011
11/12/05 22:43:59 INFO zookeeper.Login: TGT expires: Tue Dec 06 22:43:59 UTC 2011
11/12/05 22:43:59 INFO zookeeper.Login: TGT refresh sleeping until: Tue Dec 06 18:30:37 UTC 2011
11/12/05 22:43:59 INFO zookeeper.ClientCnxn: Session establishment complete on server ip-10-166-175-249.us-west-1.compute.internal/10.166.175.249:2181, sessionid = 0x134106594320000, negotiated timeout = 180000
        </programlisting>
        </para>
        </section>
        <section>
        <title>Configuration from Scratch</title>

        <para>This has been tested on the current standard Amazon
        Linux AMI. First set up the KDC and principals as
        described above. Next check out the code and run a sanity
        check:</para>

        <programlisting>
git clone git://git.apache.org/hbase.git
cd hbase
mvn -PlocalTests clean test -Dtest=TestZooKeeperACL
        </programlisting>

        <para>Then configure HBase as described above.
        Manually edit <filename>target/cached_classpath.txt</filename> (see below).</para>

        <programlisting>
bin/hbase zookeeper &
bin/hbase master &
bin/hbase regionserver &
        </programlisting>
        </section>
        <section>
        <title>Future improvements</title>

        <section><title>Fix target/cached_classpath.txt</title>
        <para>
        You must override the standard hadoop-core jar file from the
        <code>target/cached_classpath.txt</code>
        file with the version containing the HADOOP-7070 fix. You can use the following script to do this:
        <programlisting>
echo `find ~/.m2 -name "*hadoop-core*7070*SNAPSHOT.jar"` ':' `cat target/cached_classpath.txt` | sed 's/ //g' > target/tmp.txt
mv target/tmp.txt target/cached_classpath.txt
        </programlisting>
        </para>
        </section>
        <section>
        <title>Set JAAS configuration
        programmatically</title>

        <para>This would avoid the need for a separate Hadoop jar
        that fixes <link xlink:href="https://issues.apache.org/jira/browse/HADOOP-7070">HADOOP-7070</link>.</para>
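        <para>A minimal sketch of what such a programmatic setup might look
        like, using the standard <code>javax.security.auth.login.Configuration</code>
        API. None of this is existing HBase code; the class name, keytab path
        and principal are assumptions for illustration only.</para>
        <programlisting><![CDATA[
import java.util.HashMap;
import java.util.Map;
import javax.security.auth.login.AppConfigurationEntry;
import javax.security.auth.login.Configuration;

// Illustrative only: install a JAAS "Client" login context without an external config file.
public class ProgrammaticJaas {
  public static void install(final String keytab, final String principal) {
    Configuration.setConfiguration(new Configuration() {
      @Override
      public AppConfigurationEntry[] getAppConfigurationEntry(String name) {
        if (!"Client".equals(name)) {
          return null; // only the ZooKeeper client login context is defined here
        }
        Map<String, String> options = new HashMap<String, String>();
        options.put("useKeyTab", "true");
        options.put("useTicketCache", "false");
        options.put("keyTab", keytab);       // e.g. the hbase.keytab created earlier
        options.put("principal", principal); // e.g. "hbase/rs1.example.com"
        return new AppConfigurationEntry[] {
          new AppConfigurationEntry(
            "com.sun.security.auth.module.Krb5LoginModule",
            AppConfigurationEntry.LoginModuleControlFlag.REQUIRED,
            options)
        };
      }
    });
  }
}
]]></programlisting>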
</section>

        <section>
        <title>Elimination of
        <code>kerberos.removeHostFromPrincipal</code> and
        <code>kerberos.removeRealmFromPrincipal</code></title>
        </section>

        </section>

      </section> <!-- SASL Authentication with ZooKeeper -->

</chapter>