More edits: Moved ZK to its own chapter, put the bloom filter stuff together in one place, made the distributed setup more focused
git-svn-id: https://svn.apache.org/repos/asf/hbase/trunk@1389153 13f79535-47bb-0310-9956-ffa450edef68
Parent: 7d709c965a
Commit: 623a9be04d
|
@ -2319,65 +2319,6 @@ myHtd.setValue(HTableDescriptor.SPLIT_POLICY, MyCustomSplitPolicy.class.getName(
|
|||
|
||||
</section> <!-- store -->
|
||||
|
||||
<section xml:id="blooms">
|
||||
<title>Bloom Filters</title>
|
||||
<para><link xlink:href="http://en.wikipedia.org/wiki/Bloom_filter">Bloom filters</link> were developed over in <link
|
||||
xlink:href="https://issues.apache.org/jira/browse/HBASE-1200">HBase-1200
|
||||
Add bloomfilters</link>.<footnote>
|
||||
<para>For description of the development process -- why static blooms
|
||||
rather than dynamic -- and for an overview of the unique properties
|
||||
that pertain to blooms in HBase, as well as possible future
|
||||
directions, see the <emphasis>Development Process</emphasis> section
|
||||
of the document <link
|
||||
xlink:href="https://issues.apache.org/jira/secure/attachment/12444007/Bloom_Filters_in_HBase.pdf">BloomFilters
|
||||
in HBase</link> attached to <link
|
||||
xlink:href="https://issues.apache.org/jira/browse/HBASE-1200">HBase-1200</link>.</para>
|
||||
</footnote><footnote>
|
||||
<para>The bloom filters described here are actually version two of
|
||||
blooms in HBase. In versions up to 0.19.x, HBase had a dynamic bloom
|
||||
option based on work done by the <link
|
||||
xlink:href="http://www.one-lab.org">European Commission One-Lab
|
||||
Project 034819</link>. The core of the HBase bloom work was later
|
||||
pulled up into Hadoop to implement org.apache.hadoop.io.BloomMapFile.
|
||||
Version 1 of HBase blooms never worked that well. Version 2 is a
|
||||
rewrite from scratch though again it starts with the one-lab
|
||||
work.</para>
|
||||
</footnote></para>
|
||||
<para>See also <xref linkend="schema.bloom" /> and <xref linkend="config.bloom" />.
|
||||
</para>
|
||||
|
||||
<section xml:id="bloom_footprint">
|
||||
<title>Bloom StoreFile footprint</title>
|
||||
|
||||
<para>Bloom filters add an entry to the <classname>StoreFile</classname>
|
||||
general <classname>FileInfo</classname> data structure and then two
|
||||
extra entries to the <classname>StoreFile</classname> metadata
|
||||
section.</para>
|
||||
|
||||
<section>
|
||||
<title>BloomFilter in the <classname>StoreFile</classname>
|
||||
<classname>FileInfo</classname> data structure</title>
|
||||
|
||||
<para><classname>FileInfo</classname> has a
|
||||
<varname>BLOOM_FILTER_TYPE</varname> entry which is set to
|
||||
<varname>NONE</varname>, <varname>ROW</varname> or
|
||||
<varname>ROWCOL.</varname></para>
|
||||
</section>
|
||||
|
||||
<section>
|
||||
<title>BloomFilter entries in <classname>StoreFile</classname>
|
||||
metadata</title>
|
||||
|
||||
<para><varname>BLOOM_FILTER_META</varname> holds Bloom Size, Hash
|
||||
Function used, etc. Its small in size and is cached on
|
||||
<classname>StoreFile.Reader</classname> load</para>
|
||||
<para><varname>BLOOM_FILTER_DATA</varname> is the actual bloomfilter
|
||||
data. Obtained on-demand. Stored in the LRU cache, if it is enabled
|
||||
(Its enabled by default).</para>
|
||||
</section>
|
||||
</section>
|
||||
</section> <!-- bloom -->
|
||||
|
||||
</section> <!-- regions -->
|
||||
|
||||
<section xml:id="arch.bulk.load"><title>Bulk Loading</title>
|
||||
|
@ -2519,6 +2460,7 @@ myHtd.setValue(HTableDescriptor.SPLIT_POLICY, MyCustomSplitPolicy.class.getName(
|
|||
<xi:include xmlns:xi="http://www.w3.org/2001/XInclude" href="case_studies.xml" />
|
||||
<xi:include xmlns:xi="http://www.w3.org/2001/XInclude" href="ops_mgt.xml" />
|
||||
<xi:include xmlns:xi="http://www.w3.org/2001/XInclude" href="developer.xml" />
|
||||
<xi:include xmlns:xi="http://www.w3.org/2001/XInclude" href="zookeeper.xml" />
|
||||
<xi:include xmlns:xi="http://www.w3.org/2001/XInclude" href="community.xml" />
|
||||
|
||||
<appendix xml:id="faq">
|
||||
|
|
|
@ -27,8 +27,10 @@
|
|||
*/
|
||||
-->
|
||||
<title>Configuration</title>
|
||||
<para>This chapter is the Not-So-Quick start guide to HBase configuration.</para>
|
||||
<para>Please read this chapter carefully and ensure that all requirements have
|
||||
<para>This chapter is the Not-So-Quick start guide to HBase configuration. It goes
|
||||
over system requirements, Hadoop setup, the different HBase run modes, and the
|
||||
various configurations in HBase. Please read this chapter carefully and ensure
|
||||
that all <xref linkend="basic.requirements" /> have
|
||||
been satisfied. Failure to do so will cause you (and us) grief debugging strange errors
|
||||
and/or data loss.</para>
|
||||
|
||||
|
@ -56,6 +58,10 @@ to ensure well-formedness of your document after an edit session.
|
|||
all nodes of the cluster. HBase will not do this for you.
|
||||
Use <command>rsync</command>.</para>
|
||||
|
||||
<section xml:id="basic.requirements">
|
||||
<title>Basic Requirements</title>
|
||||
<para>This section lists required services and some required system configuration.
|
||||
</para>
|
||||
<section xml:id="java">
|
||||
<title>Java</title>
|
||||
|
||||
|
@ -237,7 +243,6 @@ to ensure well-formedness of your document after an edit session.
|
|||
Currently only Hadoop versions 0.20.205.x or any release in excess of this
|
||||
version -- this includes hadoop 1.0.0 -- have a working, durable sync
|
||||
<footnote>
|
||||
<title>On Hadoop Versions</title>
|
||||
<para>The Cloudera blog post <link xlink:href="http://www.cloudera.com/blog/2012/01/an-update-on-apache-hadoop-1-0/">An update on Apache Hadoop 1.0</link>
|
||||
by Charles Zedlewski has a nice exposition on how all the Hadoop versions relate.
|
||||
It's worth checking out if you are having trouble making sense of the
|
||||
|
@ -352,6 +357,7 @@ to ensure well-formedness of your document after an edit session.
|
|||
</section>
|
||||
|
||||
</section> <!-- hadoop -->
|
||||
</section>
|
||||
|
||||
<section xml:id="standalone_dist">
|
||||
<title>HBase run modes: Standalone and Distributed</title>
|
||||
|
@ -686,565 +692,6 @@ stopping hbase...............</programlisting> Shutdown can take a moment to
|
|||
</section>
|
||||
</section> <!-- run modes -->
|
||||
|
||||
<section xml:id="zookeeper">
|
||||
<title>ZooKeeper<indexterm>
|
||||
<primary>ZooKeeper</primary>
|
||||
</indexterm></title>
|
||||
|
||||
<para>A distributed HBase depends on a running ZooKeeper cluster.
|
||||
All participating nodes and clients need to be able to access the
|
||||
running ZooKeeper ensemble. HBase by default manages a ZooKeeper
|
||||
"cluster" for you. It will start and stop the ZooKeeper ensemble
|
||||
as part of the HBase start/stop process. You can also manage the
|
||||
ZooKeeper ensemble independent of HBase and just point HBase at
|
||||
the cluster it should use. To toggle HBase management of
|
||||
ZooKeeper, use the <varname>HBASE_MANAGES_ZK</varname> variable in
|
||||
<filename>conf/hbase-env.sh</filename>. This variable, which
|
||||
defaults to <varname>true</varname>, tells HBase whether to
|
||||
start/stop the ZooKeeper ensemble servers as part of HBase
|
||||
start/stop.</para>
|
||||
|
||||
<para>When HBase manages the ZooKeeper ensemble, you can specify
|
||||
ZooKeeper configuration using its native
|
||||
<filename>zoo.cfg</filename> file, or, the easier option is to
|
||||
just specify ZooKeeper options directly in
|
||||
<filename>conf/hbase-site.xml</filename>. A ZooKeeper
|
||||
configuration option can be set as a property in the HBase
|
||||
<filename>hbase-site.xml</filename> XML configuration file by
|
||||
prefacing the ZooKeeper option name with
|
||||
<varname>hbase.zookeeper.property</varname>. For example, the
|
||||
<varname>clientPort</varname> setting in ZooKeeper can be changed
|
||||
by setting the
|
||||
<varname>hbase.zookeeper.property.clientPort</varname> property.
|
||||
For all default values used by HBase, including ZooKeeper
|
||||
configuration, see <xref linkend="hbase_default_configurations" />. Look for the
|
||||
<varname>hbase.zookeeper.property</varname> prefix <footnote>
|
||||
<para>For the full list of ZooKeeper configurations, see
|
||||
ZooKeeper's <filename>zoo.cfg</filename>. HBase does not ship
|
||||
with a <filename>zoo.cfg</filename> so you will need to browse
|
||||
the <filename>conf</filename> directory in an appropriate
|
||||
ZooKeeper download.</para>
|
||||
</footnote></para>
|
||||
|
||||
<para>You must at least list the ensemble servers in
|
||||
<filename>hbase-site.xml</filename> using the
|
||||
<varname>hbase.zookeeper.quorum</varname> property. This property
|
||||
defaults to a single ensemble member at
|
||||
<varname>localhost</varname> which is not suitable for a fully
|
||||
distributed HBase. (It binds to the local machine only and remote
|
||||
clients will not be able to connect). <note xml:id="how_many_zks">
|
||||
<title>How many ZooKeepers should I run?</title>
|
||||
|
||||
<para>You can run a ZooKeeper ensemble that comprises 1 node
|
||||
only but in production it is recommended that you run a
|
||||
ZooKeeper ensemble of 3, 5 or 7 machines; the more members an
|
||||
ensemble has, the more tolerant the ensemble is of host
|
||||
failures. Also, run an odd number of machines. In ZooKeeper,
|
||||
an even number of peers is supported, but it is normally not used
|
||||
because an even sized ensemble requires, proportionally, more peers
|
||||
to form a quorum than an odd sized ensemble requires. For example, an
|
||||
ensemble with 4 peers requires 3 to form a quorum, while an ensemble with
|
||||
5 also requires 3 to form a quorum. Thus, an ensemble of 5 allows 2 peers to
|
||||
fail, and thus is more fault tolerant than the ensemble of 4, which allows
|
||||
only 1 down peer.
|
||||
</para>
|
||||
<para>Give each ZooKeeper server around 1GB of RAM, and if possible, its own
|
||||
dedicated disk (A dedicated disk is the best thing you can do
|
||||
to ensure a performant ZooKeeper ensemble). For very heavily
|
||||
loaded clusters, run ZooKeeper servers on separate machines
|
||||
from RegionServers (DataNodes and TaskTrackers).</para>
|
||||
</note></para>
|
||||
|
||||
<para>For example, to have HBase manage a ZooKeeper quorum on
|
||||
nodes <emphasis>rs{1,2,3,4,5}.example.com</emphasis>, bound to
|
||||
port 2222 (the default is 2181) ensure
|
||||
<varname>HBASE_MANAGE_ZK</varname> is commented out or set to
|
||||
<varname>true</varname> in <filename>conf/hbase-env.sh</filename>
|
||||
and then edit <filename>conf/hbase-site.xml</filename> and set
|
||||
<varname>hbase.zookeeper.property.clientPort</varname> and
|
||||
<varname>hbase.zookeeper.quorum</varname>. You should also set
|
||||
<varname>hbase.zookeeper.property.dataDir</varname> to other than
|
||||
the default as the default has ZooKeeper persist data under
|
||||
<filename>/tmp</filename> which is often cleared on system
|
||||
restart. In the example below we have ZooKeeper persist to
|
||||
<filename>/user/local/zookeeper</filename>. <programlisting>
|
||||
<configuration>
|
||||
...
|
||||
<property>
|
||||
<name>hbase.zookeeper.property.clientPort</name>
|
||||
<value>2222</value>
|
||||
<description>Property from ZooKeeper's config zoo.cfg.
|
||||
The port at which the clients will connect.
|
||||
</description>
|
||||
</property>
|
||||
<property>
|
||||
<name>hbase.zookeeper.quorum</name>
|
||||
<value>rs1.example.com,rs2.example.com,rs3.example.com,rs4.example.com,rs5.example.com</value>
|
||||
<description>Comma separated list of servers in the ZooKeeper Quorum.
|
||||
For example, "host1.mydomain.com,host2.mydomain.com,host3.mydomain.com".
|
||||
By default this is set to localhost for local and pseudo-distributed modes
|
||||
of operation. For a fully-distributed setup, this should be set to a full
|
||||
list of ZooKeeper quorum servers. If HBASE_MANAGES_ZK is set in hbase-env.sh
|
||||
this is the list of servers which we will start/stop ZooKeeper on.
|
||||
</description>
|
||||
</property>
|
||||
<property>
|
||||
<name>hbase.zookeeper.property.dataDir</name>
|
||||
<value>/usr/local/zookeeper</value>
|
||||
<description>Property from ZooKeeper's config zoo.cfg.
|
||||
The directory where the snapshot is stored.
|
||||
</description>
|
||||
</property>
|
||||
...
|
||||
</configuration></programlisting></para>
|
||||
|
||||
<section>
|
||||
<title>Using existing ZooKeeper ensemble</title>
|
||||
|
||||
<para>To point HBase at an existing ZooKeeper cluster, one that
|
||||
is not managed by HBase, set <varname>HBASE_MANAGES_ZK</varname>
|
||||
in <filename>conf/hbase-env.sh</filename> to false
|
||||
<programlisting>
|
||||
...
|
||||
# Tell HBase whether it should manage its own instance of Zookeeper or not.
|
||||
export HBASE_MANAGES_ZK=false</programlisting> Next set ensemble locations
|
||||
and client port, if non-standard, in
|
||||
<filename>hbase-site.xml</filename>, or add a suitably
|
||||
configured <filename>zoo.cfg</filename> to HBase's
|
||||
<filename>CLASSPATH</filename>. HBase will prefer the
|
||||
configuration found in <filename>zoo.cfg</filename> over any
|
||||
settings in <filename>hbase-site.xml</filename>.</para>
|
||||
|
||||
<para>When HBase manages ZooKeeper, it will start/stop the
|
||||
ZooKeeper servers as a part of the regular start/stop scripts.
|
||||
If you would like to run ZooKeeper yourself, independent of
|
||||
HBase start/stop, you would do the following</para>
|
||||
|
||||
<programlisting>
|
||||
${HBASE_HOME}/bin/hbase-daemons.sh {start,stop} zookeeper
|
||||
</programlisting>
|
||||
|
||||
<para>Note that you can use HBase in this manner to spin up a
|
||||
ZooKeeper cluster, unrelated to HBase. Just make sure to set
|
||||
<varname>HBASE_MANAGES_ZK</varname> to <varname>false</varname>
|
||||
if you want it to stay up across HBase restarts so that when
|
||||
HBase shuts down, it doesn't take ZooKeeper down with it.</para>
|
||||
|
||||
<para>For more information about running a distinct ZooKeeper
|
||||
cluster, see the ZooKeeper <link
|
||||
xlink:href="http://hadoop.apache.org/zookeeper/docs/current/zookeeperStarted.html">Getting
|
||||
Started Guide</link>. Additionally, see the <link xlink:href="http://wiki.apache.org/hadoop/ZooKeeper/FAQ#A7">ZooKeeper Wiki</link> or the
|
||||
<link xlink:href="http://zookeeper.apache.org/doc/r3.3.3/zookeeperAdmin.html#sc_zkMulitServerSetup">ZooKeeper documentation</link>
|
||||
for more information on ZooKeeper sizing.
|
||||
</para>
|
||||
</section>
|
||||
|
||||
|
||||
<section xml:id="zk.sasl.auth">
|
||||
<title>SASL Authentication with ZooKeeper</title>
|
||||
<para>Newer releases of HBase (>= 0.92) will
|
||||
support connecting to a ZooKeeper Quorum that supports
|
||||
SASL authentication (which is available in Zookeeper
|
||||
versions 3.4.0 or later).</para>
|
||||
|
||||
<para>This describes how to set up HBase to mutually
|
||||
authenticate with a ZooKeeper Quorum. ZooKeeper/HBase
|
||||
mutual authentication (<link
|
||||
xlink:href="https://issues.apache.org/jira/browse/HBASE-2418">HBASE-2418</link>)
|
||||
is required as part of a complete secure HBase configuration
|
||||
(<link
|
||||
xlink:href="https://issues.apache.org/jira/browse/HBASE-3025">HBASE-3025</link>).
|
||||
|
||||
For simplicity of explication, this section ignores
|
||||
additional configuration required (Secure HDFS and Coprocessor
|
||||
configuration). It's recommended to begin with an
|
||||
HBase-managed Zookeeper configuration (as opposed to a
|
||||
standalone Zookeeper quorum) for ease of learning.
|
||||
</para>
|
||||
|
||||
<section><title>Operating System Prerequisites</title></section>
|
||||
|
||||
<para>
|
||||
You need to have a working Kerberos KDC setup. For
|
||||
each <code>$HOST</code> that will run a ZooKeeper
|
||||
server, you should have a principle
|
||||
<code>zookeeper/$HOST</code>. For each such host,
|
||||
add a service key (using the <code>kadmin</code> or
|
||||
<code>kadmin.local</code> tool's <code>ktadd</code>
|
||||
command) for <code>zookeeper/$HOST</code> and copy
|
||||
this file to <code>$HOST</code>, and make it
|
||||
readable only to the user that will run zookeeper on
|
||||
<code>$HOST</code>. Note the location of this file,
|
||||
which we will use below as
|
||||
<filename>$PATH_TO_ZOOKEEPER_KEYTAB</filename>.
|
||||
</para>
|
||||
|
||||
<para>
|
||||
Similarly, for each <code>$HOST</code> that will run
|
||||
an HBase server (master or regionserver), you should
|
||||
have a principle: <code>hbase/$HOST</code>. For each
|
||||
host, add a keytab file called
|
||||
<filename>hbase.keytab</filename> containing a service
|
||||
key for <code>hbase/$HOST</code>, copy this file to
|
||||
<code>$HOST</code>, and make it readable only to the
|
||||
user that will run an HBase service on
|
||||
<code>$HOST</code>. Note the location of this file,
|
||||
which we will use below as
|
||||
<filename>$PATH_TO_HBASE_KEYTAB</filename>.
|
||||
</para>
|
||||
|
||||
<para>
|
||||
Each user who will be an HBase client should also be
|
||||
given a Kerberos principal. This principal should
|
||||
usually have a password assigned to it (as opposed to,
|
||||
as with the HBase servers, a keytab file) which only
|
||||
this user knows. The client's principal's
|
||||
<code>maxrenewlife</code> should be set so that it can
|
||||
be renewed enough so that the user can complete their
|
||||
HBase client processes. For example, if a user runs a
|
||||
long-running HBase client process that takes at most 3
|
||||
days, we might create this user's principal within
|
||||
<code>kadmin</code> with: <code>addprinc -maxrenewlife
|
||||
3days</code>. The Zookeeper client and server
|
||||
libraries manage their own ticket refreshment by
|
||||
running threads that wake up periodically to do the
|
||||
refreshment.
|
||||
</para>
|
||||
|
||||
<para>On each host that will run an HBase client
|
||||
(e.g. <code>hbase shell</code>), add the following
|
||||
file to the HBase home directory's <filename>conf</filename>
|
||||
directory:</para>
|
||||
|
||||
<programlisting>
|
||||
Client {
|
||||
com.sun.security.auth.module.Krb5LoginModule required
|
||||
useKeyTab=false
|
||||
useTicketCache=true;
|
||||
};
|
||||
</programlisting>
|
||||
|
||||
<para>We'll refer to this JAAS configuration file as
|
||||
<filename>$CLIENT_CONF</filename> below.</para>
|
||||
|
||||
<section>
|
||||
<title>HBase-managed Zookeeper Configuration</title>
|
||||
|
||||
<para>On each node that will run a zookeeper, a
|
||||
master, or a regionserver, create a <link
|
||||
xlink:href="http://docs.oracle.com/javase/1.4.2/docs/guide/security/jgss/tutorials/LoginConfigFile.html">JAAS</link>
|
||||
configuration file in the conf directory of the node's
|
||||
<filename>HBASE_HOME</filename> directory that looks like the
|
||||
following:</para>
|
||||
|
||||
<programlisting>
|
||||
Server {
|
||||
com.sun.security.auth.module.Krb5LoginModule required
|
||||
useKeyTab=true
|
||||
keyTab="$PATH_TO_ZOOKEEPER_KEYTAB"
|
||||
storeKey=true
|
||||
useTicketCache=false
|
||||
principal="zookeeper/$HOST";
|
||||
};
|
||||
Client {
|
||||
com.sun.security.auth.module.Krb5LoginModule required
|
||||
useKeyTab=true
|
||||
useTicketCache=false
|
||||
keyTab="$PATH_TO_HBASE_KEYTAB"
|
||||
principal="hbase/$HOST";
|
||||
};
|
||||
</programlisting>
|
||||
|
||||
where the <filename>$PATH_TO_HBASE_KEYTAB</filename> and
|
||||
<filename>$PATH_TO_ZOOKEEPER_KEYTAB</filename> files are what
|
||||
you created above, and <code>$HOST</code> is the hostname for that
|
||||
node.
|
||||
|
||||
<para>The <code>Server</code> section will be used by
|
||||
the Zookeeper quorum server, while the
|
||||
<code>Client</code> section will be used by the HBase
|
||||
master and regionservers. The path to this file should
|
||||
be substituted for the text <filename>$HBASE_SERVER_CONF</filename>
|
||||
in the <filename>hbase-env.sh</filename>
|
||||
listing below.</para>
|
||||
|
||||
<para>
|
||||
The path to this file should be substituted for the
|
||||
text <filename>$CLIENT_CONF</filename> in the
|
||||
<filename>hbase-env.sh</filename> listing below.
|
||||
</para>
|
||||
|
||||
<para>Modify your <filename>hbase-env.sh</filename> to include the
|
||||
following:</para>
|
||||
|
||||
<programlisting>
|
||||
export HBASE_OPTS="-Djava.security.auth.login.config=$CLIENT_CONF"
|
||||
export HBASE_MANAGES_ZK=true
|
||||
export HBASE_ZOOKEEPER_OPTS="-Djava.security.auth.login.config=$HBASE_SERVER_CONF"
|
||||
export HBASE_MASTER_OPTS="-Djava.security.auth.login.config=$HBASE_SERVER_CONF"
|
||||
export HBASE_REGIONSERVER_OPTS="-Djava.security.auth.login.config=$HBASE_SERVER_CONF"
|
||||
</programlisting>
|
||||
|
||||
where <filename>$HBASE_SERVER_CONF</filename> and
|
||||
<filename>$CLIENT_CONF</filename> are the full paths to the
|
||||
JAAS configuration files created above.
|
||||
|
||||
<para>Modify your <filename>hbase-site.xml</filename> on each node
|
||||
that will run zookeeper, master or regionserver to contain:</para>
|
||||
|
||||
<programlisting><![CDATA[
|
||||
<configuration>
|
||||
<property>
|
||||
<name>hbase.zookeeper.quorum</name>
|
||||
<value>$ZK_NODES</value>
|
||||
</property>
|
||||
<property>
|
||||
<name>hbase.cluster.distributed</name>
|
||||
<value>true</value>
|
||||
</property>
|
||||
<property>
|
||||
<name>hbase.zookeeper.property.authProvider.1</name>
|
||||
<value>org.apache.zookeeper.server.auth.SASLAuthenticationProvider</value>
|
||||
</property>
|
||||
<property>
|
||||
<name>hbase.zookeeper.property.kerberos.removeHostFromPrincipal</name>
|
||||
<value>true</value>
|
||||
</property>
|
||||
<property>
|
||||
<name>hbase.zookeeper.property.kerberos.removeRealmFromPrincipal</name>
|
||||
<value>true</value>
|
||||
</property>
|
||||
</configuration>
|
||||
]]></programlisting>
|
||||
|
||||
<para>where <code>$ZK_NODES</code> is the
|
||||
comma-separated list of hostnames of the Zookeeper
|
||||
Quorum hosts.</para>
|
||||
|
||||
<para>Start your hbase cluster by running one or more
|
||||
of the following set of commands on the appropriate
|
||||
hosts:
|
||||
</para>
|
||||
|
||||
<programlisting>
|
||||
bin/hbase zookeeper start
|
||||
bin/hbase master start
|
||||
bin/hbase regionserver start
|
||||
</programlisting>
|
||||
|
||||
</section>
|
||||
|
||||
<section><title>External Zookeeper Configuration</title>
|
||||
<para>Add a JAAS configuration file that looks like:
|
||||
|
||||
<programlisting>
|
||||
Client {
|
||||
com.sun.security.auth.module.Krb5LoginModule required
|
||||
useKeyTab=true
|
||||
useTicketCache=false
|
||||
keyTab="$PATH_TO_HBASE_KEYTAB"
|
||||
principal="hbase/$HOST";
|
||||
};
|
||||
</programlisting>
|
||||
|
||||
where the <filename>$PATH_TO_HBASE_KEYTAB</filename> is the keytab
|
||||
created above for HBase services to run on this host, and <code>$HOST</code> is the
|
||||
hostname for that node. Put this in the HBase home's
|
||||
configuration directory. We'll refer to this file's
|
||||
full pathname as <filename>$HBASE_SERVER_CONF</filename> below.</para>
|
||||
|
||||
<para>Modify your hbase-env.sh to include the following:</para>
|
||||
|
||||
<programlisting>
|
||||
export HBASE_OPTS="-Djava.security.auth.login.config=$CLIENT_CONF"
|
||||
export HBASE_MANAGES_ZK=false
|
||||
export HBASE_MASTER_OPTS="-Djava.security.auth.login.config=$HBASE_SERVER_CONF"
|
||||
export HBASE_REGIONSERVER_OPTS="-Djava.security.auth.login.config=$HBASE_SERVER_CONF"
|
||||
</programlisting>
|
||||
|
||||
|
||||
<para>Modify your <filename>hbase-site.xml</filename> on each node
|
||||
that will run a master or regionserver to contain:</para>
|
||||
|
||||
<programlisting><![CDATA[
|
||||
<configuration>
|
||||
<property>
|
||||
<name>hbase.zookeeper.quorum</name>
|
||||
<value>$ZK_NODES</value>
|
||||
</property>
|
||||
<property>
|
||||
<name>hbase.cluster.distributed</name>
|
||||
<value>true</value>
|
||||
</property>
|
||||
</configuration>
|
||||
]]>
|
||||
</programlisting>
|
||||
|
||||
<para>where <code>$ZK_NODES</code> is the
|
||||
comma-separated list of hostnames of the Zookeeper
|
||||
Quorum hosts.</para>
|
||||
|
||||
<para>
|
||||
Add a <filename>zoo.cfg</filename> for each Zookeeper Quorum host containing:
|
||||
<programlisting>
|
||||
authProvider.1=org.apache.zookeeper.server.auth.SASLAuthenticationProvider
|
||||
kerberos.removeHostFromPrincipal=true
|
||||
kerberos.removeRealmFromPrincipal=true
|
||||
</programlisting>
|
||||
|
||||
Also on each of these hosts, create a JAAS configuration file containing:
|
||||
|
||||
<programlisting>
|
||||
Server {
|
||||
com.sun.security.auth.module.Krb5LoginModule required
|
||||
useKeyTab=true
|
||||
keyTab="$PATH_TO_ZOOKEEPER_KEYTAB"
|
||||
storeKey=true
|
||||
useTicketCache=false
|
||||
principal="zookeeper/$HOST";
|
||||
};
|
||||
</programlisting>
|
||||
|
||||
where <code>$HOST</code> is the hostname of each
|
||||
Quorum host. We will refer to the full pathname of
|
||||
this file as <filename>$ZK_SERVER_CONF</filename> below.
|
||||
|
||||
</para>
|
||||
|
||||
<para>
|
||||
Start your Zookeepers on each Zookeeper Quorum host with:
|
||||
|
||||
<programlisting>
|
||||
SERVER_JVMFLAGS="-Djava.security.auth.login.config=$ZK_SERVER_CONF" bin/zkServer start
|
||||
</programlisting>
|
||||
|
||||
</para>
|
||||
|
||||
<para>
|
||||
Start your HBase cluster by running one or more of the following set of commands on the appropriate nodes:
|
||||
</para>
|
||||
|
||||
<programlisting>
|
||||
bin/hbase master start
|
||||
bin/hbase regionserver start
|
||||
</programlisting>
|
||||
|
||||
|
||||
</section>
|
||||
|
||||
<section>
|
||||
<title>Zookeeper Server Authentication Log Output</title>
|
||||
<para>If the configuration above is successful,
|
||||
you should see something similar to the following in
|
||||
your Zookeeper server logs:
|
||||
<programlisting>
|
||||
11/12/05 22:43:39 INFO zookeeper.Login: successfully logged in.
|
||||
11/12/05 22:43:39 INFO server.NIOServerCnxnFactory: binding to port 0.0.0.0/0.0.0.0:2181
|
||||
11/12/05 22:43:39 INFO zookeeper.Login: TGT refresh thread started.
|
||||
11/12/05 22:43:39 INFO zookeeper.Login: TGT valid starting at: Mon Dec 05 22:43:39 UTC 2011
|
||||
11/12/05 22:43:39 INFO zookeeper.Login: TGT expires: Tue Dec 06 22:43:39 UTC 2011
|
||||
11/12/05 22:43:39 INFO zookeeper.Login: TGT refresh sleeping until: Tue Dec 06 18:36:42 UTC 2011
|
||||
..
|
||||
11/12/05 22:43:59 INFO auth.SaslServerCallbackHandler:
|
||||
Successfully authenticated client: authenticationID=hbase/ip-10-166-175-249.us-west-1.compute.internal@HADOOP.LOCALDOMAIN;
|
||||
authorizationID=hbase/ip-10-166-175-249.us-west-1.compute.internal@HADOOP.LOCALDOMAIN.
|
||||
11/12/05 22:43:59 INFO auth.SaslServerCallbackHandler: Setting authorizedID: hbase
|
||||
11/12/05 22:43:59 INFO server.ZooKeeperServer: adding SASL authorization for authorizationID: hbase
|
||||
</programlisting>
|
||||
|
||||
</para>
|
||||
|
||||
</section>
|
||||
|
||||
<section>
|
||||
<title>Zookeeper Client Authentication Log Output</title>
|
||||
<para>On the Zookeeper client side (HBase master or regionserver),
|
||||
you should see something similar to the following:
|
||||
|
||||
<programlisting>
|
||||
11/12/05 22:43:59 INFO zookeeper.ZooKeeper: Initiating client connection, connectString=ip-10-166-175-249.us-west-1.compute.internal:2181 sessionTimeout=180000 watcher=master:60000
|
||||
11/12/05 22:43:59 INFO zookeeper.ClientCnxn: Opening socket connection to server /10.166.175.249:2181
|
||||
11/12/05 22:43:59 INFO zookeeper.RecoverableZooKeeper: The identifier of this process is 14851@ip-10-166-175-249
|
||||
11/12/05 22:43:59 INFO zookeeper.Login: successfully logged in.
|
||||
11/12/05 22:43:59 INFO client.ZooKeeperSaslClient: Client will use GSSAPI as SASL mechanism.
|
||||
11/12/05 22:43:59 INFO zookeeper.Login: TGT refresh thread started.
|
||||
11/12/05 22:43:59 INFO zookeeper.ClientCnxn: Socket connection established to ip-10-166-175-249.us-west-1.compute.internal/10.166.175.249:2181, initiating session
|
||||
11/12/05 22:43:59 INFO zookeeper.Login: TGT valid starting at: Mon Dec 05 22:43:59 UTC 2011
|
||||
11/12/05 22:43:59 INFO zookeeper.Login: TGT expires: Tue Dec 06 22:43:59 UTC 2011
|
||||
11/12/05 22:43:59 INFO zookeeper.Login: TGT refresh sleeping until: Tue Dec 06 18:30:37 UTC 2011
|
||||
11/12/05 22:43:59 INFO zookeeper.ClientCnxn: Session establishment complete on server ip-10-166-175-249.us-west-1.compute.internal/10.166.175.249:2181, sessionid = 0x134106594320000, negotiated timeout = 180000
|
||||
</programlisting>
|
||||
</para>
|
||||
</section>
|
||||
|
||||
<section>
|
||||
<title>Configuration from Scratch</title>
|
||||
|
||||
This has been tested on the current standard Amazon
|
||||
Linux AMI. First setup KDC and principals as
|
||||
described above. Next checkout code and run a sanity
|
||||
check.
|
||||
|
||||
<programlisting>
|
||||
git clone git://git.apache.org/hbase.git
|
||||
cd hbase
|
||||
mvn -PlocalTests clean test -Dtest=TestZooKeeperACL
|
||||
</programlisting>
|
||||
|
||||
Then configure HBase as described above.
|
||||
Manually edit target/cached_classpath.txt (see below)..
|
||||
|
||||
<programlisting>
|
||||
bin/hbase zookeeper &
|
||||
bin/hbase master &
|
||||
bin/hbase regionserver &
|
||||
</programlisting>
|
||||
</section>
|
||||
|
||||
|
||||
<section>
|
||||
<title>Future improvements</title>
|
||||
|
||||
<section><title>Fix target/cached_classpath.txt</title>
|
||||
<para>
|
||||
You must override the standard hadoop-core jar file from the
|
||||
<code>target/cached_classpath.txt</code>
|
||||
file with the version containing the HADOOP-7070 fix. You can use the following script to do this:
|
||||
|
||||
<programlisting>
|
||||
echo `find ~/.m2 -name "*hadoop-core*7070*SNAPSHOT.jar"` ':' `cat target/cached_classpath.txt` | sed 's/ //g' > target/tmp.txt
|
||||
mv target/tmp.txt target/cached_classpath.txt
|
||||
</programlisting>
|
||||
|
||||
</para>
|
||||
|
||||
</section>
|
||||
|
||||
<section>
|
||||
<title>Set JAAS configuration
|
||||
programmatically</title>
|
||||
|
||||
|
||||
This would avoid the need for a separate Hadoop jar
|
||||
that fixes <link xlink:href="https://issues.apache.org/jira/browse/HADOOP-7070">HADOOP-7070</link>.
|
||||
</section>
|
||||
|
||||
<section>
|
||||
<title>Elimination of
|
||||
<code>kerberos.removeHostFromPrincipal</code> and
|
||||
<code>kerberos.removeRealmFromPrincipal</code></title>
|
||||
</section>
|
||||
|
||||
</section>
|
||||
|
||||
|
||||
</section> <!-- SASL Authentication with ZooKeeper -->
|
||||
|
||||
|
||||
|
||||
|
||||
|
||||
</section> <!-- zookeeper -->
|
||||
|
||||
|
||||
<section xml:id="config.files">
|
||||
|
@ -1704,34 +1151,4 @@ of all regions.
|
|||
|
||||
</section> <!-- important config -->
|
||||
|
||||
<section xml:id="config.bloom">
|
||||
<title>Bloom Filter Configuration</title>
|
||||
<section>
|
||||
<title><varname>io.hfile.bloom.enabled</varname> global kill
|
||||
switch</title>
|
||||
|
||||
<para><code>io.hfile.bloom.enabled</code> in
|
||||
<classname>Configuration</classname> serves as the kill switch in case
|
||||
something goes wrong. Default = <varname>true</varname>.</para>
|
||||
</section>
|
||||
|
||||
<section>
|
||||
<title><varname>io.hfile.bloom.error.rate</varname></title>
|
||||
|
||||
<para><varname>io.hfile.bloom.error.rate</varname> = average false
|
||||
positive rate. Default = 1%. Decrease rate by ½ (e.g. to .5%) == +1
|
||||
bit per bloom entry.</para>
|
||||
</section>
|
||||
|
||||
<section>
|
||||
<title><varname>io.hfile.bloom.max.fold</varname></title>
|
||||
|
||||
<para><varname>io.hfile.bloom.max.fold</varname> = guaranteed minimum
|
||||
fold rate. Most people should leave this alone. Default = 7, or can
|
||||
collapse to at least 1/128th of original size. See the
|
||||
<emphasis>Development Process</emphasis> section of the document <link
|
||||
xlink:href="https://issues.apache.org/jira/secure/attachment/12444007/Bloom_Filters_in_HBase.pdf">BloomFilters
|
||||
in HBase</link> for more on what this option means.</para>
|
||||
</section>
|
||||
</section>
|
||||
</chapter>
|
||||
|
|
|
@ -33,8 +33,9 @@
|
|||
|
||||
<para><xref linkend="quickstart" /> will get you up and
|
||||
running on a single-node instance of HBase using the local filesystem.
|
||||
<xref linkend="configuration" /> describes setup
|
||||
of HBase in distributed mode running on top of HDFS.</para>
|
||||
<xref linkend="configuration" /> describes basic system
|
||||
requirements and configuration for running HBase in distributed mode
|
||||
on top of HDFS.</para>
|
||||
</section>
|
||||
|
||||
<section xml:id="quickstart">
|
||||
|
@ -51,7 +52,7 @@
|
|||
|
||||
<para>Choose a download site from this list of <link
|
||||
xlink:href="http://www.apache.org/dyn/closer.cgi/hbase/">Apache Download
|
||||
Mirrors</link>. Click on suggested top link. This will take you to a
|
||||
Mirrors</link>. Click on the suggested top link. This will take you to a
|
||||
mirror of <emphasis>HBase Releases</emphasis>. Click on the folder named
|
||||
<filename>stable</filename> and then download the file that ends in
|
||||
<filename>.tar.gz</filename> to your local filesystem; e.g.
|
||||
|
@ -65,24 +66,21 @@ $ cd hbase-<?eval ${project.version}?>
|
|||
</programlisting></para>
|
||||
|
||||
<para>At this point, you are ready to start HBase. But before starting
|
||||
it, you might want to edit <filename>conf/hbase-site.xml</filename> and
|
||||
set the directory you want HBase to write to,
|
||||
<varname>hbase.rootdir</varname>. <programlisting>
|
||||
|
||||
<?xml version="1.0"?>
|
||||
it, you might want to edit <filename>conf/hbase-site.xml</filename>, the
|
||||
file you write your site-specific configurations into, and
|
||||
set <varname>hbase.rootdir</varname>, the directory HBase writes data to,
|
||||
<programlisting><?xml version="1.0"?>
|
||||
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
|
||||
<configuration>
|
||||
<property>
|
||||
<name>hbase.rootdir</name>
|
||||
<value>file:///DIRECTORY/hbase</value>
|
||||
</property>
|
||||
</configuration>
|
||||
|
||||
</programlisting> Replace <varname>DIRECTORY</varname> in the above with a
|
||||
path to a directory where you want HBase to store its data. By default,
|
||||
</configuration></programlisting> Replace <varname>DIRECTORY</varname> in the above with the
|
||||
path to the directory where you want HBase to store its data. By default,
|
||||
<varname>hbase.rootdir</varname> is set to
|
||||
<filename>/tmp/hbase-${user.name}</filename> which means you'll lose all
|
||||
your data whenever your server reboots (Most operating systems clear
|
||||
your data whenever your server reboots unless you change it (Most operating systems clear
|
||||
<filename>/tmp</filename> on restart).</para>
|
||||
</section>
|
||||
|
||||
|
@ -96,7 +94,7 @@ starting Master, logging to logs/hbase-user-master-example.org.out</programlisti
|
|||
standalone mode, HBase runs all daemons in the one JVM; i.e. both
|
||||
the HBase and ZooKeeper daemons. HBase logs can be found in the
|
||||
<filename>logs</filename> subdirectory. Check them out especially if
|
||||
HBase had trouble starting.</para>
|
||||
it seems HBase had trouble starting.</para>
|
||||
|
||||
<note>
|
||||
<title>Is <application>java</application> installed?</title>
|
||||
|
@ -108,7 +106,7 @@ starting Master, logging to logs/hbase-user-master-example.org.out</programlisti
|
|||
options the java program takes (HBase requires java 6). If this is not
|
||||
the case, HBase will not start. Install java, edit
|
||||
<filename>conf/hbase-env.sh</filename>, uncommenting the
|
||||
<envar>JAVA_HOME</envar> line pointing it to your java install. Then,
|
||||
<envar>JAVA_HOME</envar> line pointing it to your java install, then,
|
||||
retry the steps above.</para>
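<para>For example, an uncommented <envar>JAVA_HOME</envar> line in
<filename>conf/hbase-env.sh</filename> might look like the below (the path is
just an example; point it at wherever your java actually lives):</para>
<programlisting>
# The java implementation to use.  Java 1.6 required.
export JAVA_HOME=/usr/lib/jvm/java-6-sun
</programlisting>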
|
||||
</note>
|
||||
</section>
|
||||
|
@ -154,9 +152,7 @@ hbase(main):006:0> put 'test', 'row3', 'cf:c', 'value3'
|
|||
<varname>cf</varname> in this example -- followed by a colon and then a
|
||||
column qualifier suffix (<varname>a</varname> in this case).</para>
|
||||
|
||||
<para>Verify the data insert.</para>
|
||||
|
||||
<para>Run a scan of the table by doing the following</para>
|
||||
<para>Verify the data insert by running a scan of the table as follows</para>
|
||||
|
||||
<para><programlisting>hbase(main):007:0> scan 'test'
|
||||
ROW COLUMN+CELL
|
||||
|
@ -165,7 +161,7 @@ row2 column=cf:b, timestamp=1288380738440, value=value2
|
|||
row3 column=cf:c, timestamp=1288380747365, value=value3
|
||||
3 row(s) in 0.0590 seconds</programlisting></para>
|
||||
|
||||
<para>Get a single row as follows</para>
|
||||
<para>Get a single row</para>
|
||||
|
||||
<para><programlisting>hbase(main):008:0> get 'test', 'row1'
|
||||
COLUMN CELL
|
||||
|
@ -198,9 +194,9 @@ stopping hbase...............</programlisting></para>
|
|||
<title>Where to go next</title>
|
||||
|
||||
<para>The above described standalone setup is good for testing and
|
||||
experiments only. Next move on to <xref linkend="configuration" /> where we'll go into
|
||||
depth on the different HBase run modes, requirements and critical
|
||||
configurations needed setting up a distributed HBase deploy.</para>
|
||||
experiments only. In the next chapter, <xref linkend="configuration" />,
|
||||
we'll go into depth on the different HBase run modes, system requirements
|
||||
for running HBase, and the critical configurations for setting up a distributed HBase deploy.</para>
|
||||
</section>
|
||||
</section>
|
||||
|
||||
|
|
|
@ -526,6 +526,96 @@ htable.close();</programlisting></para>
|
|||
too few regions then the reads could likely be served from too few nodes. </para>
|
||||
<para>See <xref linkend="precreate.regions"/>, as well as <xref linkend="perf.configurations"/> </para>
|
||||
</section>
|
||||
<section xml:id="blooms">
|
||||
<title>Bloom Filters</title>
|
||||
<para>Enabling Bloom Filters can save you having to go to disk and
|
||||
can help improve read latencies.</para>
|
||||
<para><link xlink:href="http://en.wikipedia.org/wiki/Bloom_filter">Bloom filters</link> were developed over in <link
|
||||
xlink:href="https://issues.apache.org/jira/browse/HBASE-1200">HBase-1200
|
||||
Add bloomfilters</link>.<footnote>
|
||||
<para>For description of the development process -- why static blooms
|
||||
rather than dynamic -- and for an overview of the unique properties
|
||||
that pertain to blooms in HBase, as well as possible future
|
||||
directions, see the <emphasis>Development Process</emphasis> section
|
||||
of the document <link
|
||||
xlink:href="https://issues.apache.org/jira/secure/attachment/12444007/Bloom_Filters_in_HBase.pdf">BloomFilters
|
||||
in HBase</link> attached to <link
|
||||
xlink:href="https://issues.apache.org/jira/browse/HBASE-1200">HBase-1200</link>.</para>
|
||||
</footnote><footnote>
|
||||
<para>The bloom filters described here are actually version two of
|
||||
blooms in HBase. In versions up to 0.19.x, HBase had a dynamic bloom
|
||||
option based on work done by the <link
|
||||
xlink:href="http://www.one-lab.org">European Commission One-Lab
|
||||
Project 034819</link>. The core of the HBase bloom work was later
|
||||
pulled up into Hadoop to implement org.apache.hadoop.io.BloomMapFile.
|
||||
Version 1 of HBase blooms never worked that well. Version 2 is a
|
||||
rewrite from scratch though again it starts with the one-lab
|
||||
work.</para>
|
||||
</footnote></para>
|
||||
<para>See also <xref linkend="schema.bloom" />.
|
||||
</para>
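<para>As a quick sketch of how to turn blooms on (the table and column family names
below are made up for illustration), set the <varname>BLOOMFILTER</varname>
attribute on a column family from the HBase shell when creating or altering a
table:</para>
<programlisting>
hbase> create 'mytable', {NAME => 'cf', BLOOMFILTER => 'ROW'}
</programlisting>
<para>Use <varname>ROWCOL</varname> instead of <varname>ROW</varname> if your gets
usually name explicit columns; row+column blooms cost more space but can skip more
StoreFiles in that case.</para>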
|
||||
|
||||
<section xml:id="bloom_footprint">
|
||||
<title>Bloom StoreFile footprint</title>
|
||||
|
||||
<para>Bloom filters add an entry to the <classname>StoreFile</classname>
|
||||
general <classname>FileInfo</classname> data structure and then two
|
||||
extra entries to the <classname>StoreFile</classname> metadata
|
||||
section.</para>
|
||||
|
||||
<section>
|
||||
<title>BloomFilter in the <classname>StoreFile</classname>
|
||||
<classname>FileInfo</classname> data structure</title>
|
||||
|
||||
<para><classname>FileInfo</classname> has a
|
||||
<varname>BLOOM_FILTER_TYPE</varname> entry which is set to
|
||||
<varname>NONE</varname>, <varname>ROW</varname> or
|
||||
<varname>ROWCOL.</varname></para>
|
||||
</section>
|
||||
|
||||
<section>
|
||||
<title>BloomFilter entries in <classname>StoreFile</classname>
|
||||
metadata</title>
|
||||
|
||||
<para><varname>BLOOM_FILTER_META</varname> holds Bloom Size, Hash
|
||||
Function used, etc. It is small and is cached on
|
||||
<classname>StoreFile.Reader</classname> load.</para>
|
||||
<para><varname>BLOOM_FILTER_DATA</varname> is the actual bloomfilter
|
||||
data. Obtained on-demand. Stored in the LRU cache, if it is enabled
|
||||
(it is enabled by default).</para>
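<para>If you want to confirm what was written, one way to do it (a sketch; substitute
the path of a real StoreFile in your HDFS) is to dump a StoreFile's metadata with the
HFile tool and look for the bloom entries:</para>
<programlisting>
$ bin/hbase org.apache.hadoop.hbase.io.hfile.HFile -m -f hdfs:///hbase/TABLE/REGION/CF/STOREFILE
</programlisting>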
|
||||
</section>
|
||||
</section>
|
||||
<section xml:id="config.bloom">
|
||||
<title>Bloom Filter Configuration</title>
|
||||
<section>
|
||||
<title><varname>io.hfile.bloom.enabled</varname> global kill
|
||||
switch</title>
|
||||
|
||||
<para><code>io.hfile.bloom.enabled</code> in
|
||||
<classname>Configuration</classname> serves as the kill switch in case
|
||||
something goes wrong. Default = <varname>true</varname>.</para>
|
||||
</section>
|
||||
|
||||
<section>
|
||||
<title><varname>io.hfile.bloom.error.rate</varname></title>
|
||||
|
||||
<para><varname>io.hfile.bloom.error.rate</varname> = average false
|
||||
positive rate. Default = 1%. Halving the rate (e.g. to .5%) costs roughly one additional
|
||||
bit per bloom entry.</para>
|
||||
</section>
|
||||
|
||||
<section>
|
||||
<title><varname>io.hfile.bloom.max.fold</varname></title>
|
||||
|
||||
<para><varname>io.hfile.bloom.max.fold</varname> = guaranteed minimum
|
||||
fold rate. Most people should leave this alone. Default = 7, or can
|
||||
collapse to at least 1/128th of original size. See the
|
||||
<emphasis>Development Process</emphasis> section of the document <link
|
||||
xlink:href="https://issues.apache.org/jira/secure/attachment/12444007/Bloom_Filters_in_HBase.pdf">BloomFilters
|
||||
in HBase</link> for more on what this option means.</para>
|
||||
</section>
|
||||
</section>
|
||||
</section> <!-- bloom -->
|
||||
|
||||
</section> <!-- reading -->
|
||||
|
||||
|
|
|
@ -0,0 +1,586 @@
|
|||
<?xml version="1.0"?>
|
||||
<chapter xml:id="zookeeper"
|
||||
version="5.0" xmlns="http://docbook.org/ns/docbook"
|
||||
xmlns:xlink="http://www.w3.org/1999/xlink"
|
||||
xmlns:xi="http://www.w3.org/2001/XInclude"
|
||||
xmlns:svg="http://www.w3.org/2000/svg"
|
||||
xmlns:m="http://www.w3.org/1998/Math/MathML"
|
||||
xmlns:html="http://www.w3.org/1999/xhtml"
|
||||
xmlns:db="http://docbook.org/ns/docbook">
|
||||
<!--
|
||||
/**
|
||||
* Licensed to the Apache Software Foundation (ASF) under one
|
||||
* or more contributor license agreements. See the NOTICE file
|
||||
* distributed with this work for additional information
|
||||
* regarding copyright ownership. The ASF licenses this file
|
||||
* to you under the Apache License, Version 2.0 (the
|
||||
* "License"); you may not use this file except in compliance
|
||||
* with the License. You may obtain a copy of the License at
|
||||
*
|
||||
* http://www.apache.org/licenses/LICENSE-2.0
|
||||
*
|
||||
* Unless required by applicable law or agreed to in writing, software
|
||||
* distributed under the License is distributed on an "AS IS" BASIS,
|
||||
* WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
|
||||
* See the License for the specific language governing permissions and
|
||||
* limitations under the License.
|
||||
*/
|
||||
-->
|
||||
|
||||
<title>ZooKeeper<indexterm>
|
||||
<primary>ZooKeeper</primary>
|
||||
</indexterm></title>
|
||||
|
||||
<para>A distributed HBase depends on a running ZooKeeper cluster.
|
||||
All participating nodes and clients need to be able to access the
|
||||
running ZooKeeper ensemble. HBase by default manages a ZooKeeper
|
||||
"cluster" for you. It will start and stop the ZooKeeper ensemble
|
||||
as part of the HBase start/stop process. You can also manage the
|
||||
ZooKeeper ensemble independent of HBase and just point HBase at
|
||||
the cluster it should use. To toggle HBase management of
|
||||
ZooKeeper, use the <varname>HBASE_MANAGES_ZK</varname> variable in
|
||||
<filename>conf/hbase-env.sh</filename>. This variable, which
|
||||
defaults to <varname>true</varname>, tells HBase whether to
|
||||
start/stop the ZooKeeper ensemble servers as part of HBase
|
||||
start/stop.</para>
|
||||
|
||||
<para>When HBase manages the ZooKeeper ensemble, you can specify
|
||||
ZooKeeper configuration using its native
|
||||
<filename>zoo.cfg</filename> file, or, the easier option is to
|
||||
just specify ZooKeeper options directly in
|
||||
<filename>conf/hbase-site.xml</filename>. A ZooKeeper
|
||||
configuration option can be set as a property in the HBase
|
||||
<filename>hbase-site.xml</filename> XML configuration file by
|
||||
prefacing the ZooKeeper option name with
|
||||
<varname>hbase.zookeeper.property</varname>. For example, the
|
||||
<varname>clientPort</varname> setting in ZooKeeper can be changed
|
||||
by setting the
|
||||
<varname>hbase.zookeeper.property.clientPort</varname> property.
|
||||
For all default values used by HBase, including ZooKeeper
|
||||
configuration, see <xref linkend="hbase_default_configurations" />. Look for the
|
||||
<varname>hbase.zookeeper.property</varname> prefix <footnote>
|
||||
<para>For the full list of ZooKeeper configurations, see
|
||||
ZooKeeper's <filename>zoo.cfg</filename>. HBase does not ship
|
||||
with a <filename>zoo.cfg</filename> so you will need to browse
|
||||
the <filename>conf</filename> directory in an appropriate
|
||||
ZooKeeper download.</para>
|
||||
</footnote></para>
|
||||
|
||||
<para>You must at least list the ensemble servers in
|
||||
<filename>hbase-site.xml</filename> using the
|
||||
<varname>hbase.zookeeper.quorum</varname> property. This property
|
||||
defaults to a single ensemble member at
|
||||
<varname>localhost</varname> which is not suitable for a fully
|
||||
distributed HBase. (It binds to the local machine only and remote
|
||||
clients will not be able to connect). <note xml:id="how_many_zks">
|
||||
<title>How many ZooKeepers should I run?</title>
|
||||
|
||||
<para>You can run a ZooKeeper ensemble that comprises 1 node
|
||||
only but in production it is recommended that you run a
|
||||
ZooKeeper ensemble of 3, 5 or 7 machines; the more members an
|
||||
ensemble has, the more tolerant the ensemble is of host
|
||||
failures. Also, run an odd number of machines. In ZooKeeper,
|
||||
an even number of peers is supported, but it is normally not used
|
||||
because an even sized ensemble requires, proportionally, more peers
|
||||
to form a quorum than an odd sized ensemble requires. For example, an
|
||||
ensemble with 4 peers requires 3 to form a quorum, while an ensemble with
|
||||
5 also requires 3 to form a quorum. Thus, an ensemble of 5 allows 2 peers to
|
||||
fail, and thus is more fault tolerant than the ensemble of 4, which allows
|
||||
only 1 down peer.
|
||||
</para>
|
||||
<para>Give each ZooKeeper server around 1GB of RAM, and if possible, its own
|
||||
dedicated disk (A dedicated disk is the best thing you can do
|
||||
to ensure a performant ZooKeeper ensemble). For very heavily
|
||||
loaded clusters, run ZooKeeper servers on separate machines
|
||||
from RegionServers (DataNodes and TaskTrackers).</para>
|
||||
</note></para>
|
||||
|
||||
<para>For example, to have HBase manage a ZooKeeper quorum on
|
||||
nodes <emphasis>rs{1,2,3,4,5}.example.com</emphasis>, bound to
|
||||
port 2222 (the default is 2181), ensure
|
||||
<varname>HBASE_MANAGES_ZK</varname> is commented out or set to
|
||||
<varname>true</varname> in <filename>conf/hbase-env.sh</filename>
|
||||
and then edit <filename>conf/hbase-site.xml</filename> and set
|
||||
<varname>hbase.zookeeper.property.clientPort</varname> and
|
||||
<varname>hbase.zookeeper.quorum</varname>. You should also set
|
||||
<varname>hbase.zookeeper.property.dataDir</varname> to other than
|
||||
the default as the default has ZooKeeper persist data under
|
||||
<filename>/tmp</filename> which is often cleared on system
|
||||
restart. In the example below we have ZooKeeper persist to
|
||||
<filename>/usr/local/zookeeper</filename>. <programlisting>
|
||||
<configuration>
|
||||
...
|
||||
<property>
|
||||
<name>hbase.zookeeper.property.clientPort</name>
|
||||
<value>2222</value>
|
||||
<description>Property from ZooKeeper's config zoo.cfg.
|
||||
The port at which the clients will connect.
|
||||
</description>
|
||||
</property>
|
||||
<property>
|
||||
<name>hbase.zookeeper.quorum</name>
|
||||
<value>rs1.example.com,rs2.example.com,rs3.example.com,rs4.example.com,rs5.example.com</value>
|
||||
<description>Comma separated list of servers in the ZooKeeper Quorum.
|
||||
For example, "host1.mydomain.com,host2.mydomain.com,host3.mydomain.com".
|
||||
By default this is set to localhost for local and pseudo-distributed modes
|
||||
of operation. For a fully-distributed setup, this should be set to a full
|
||||
list of ZooKeeper quorum servers. If HBASE_MANAGES_ZK is set in hbase-env.sh
|
||||
this is the list of servers which we will start/stop ZooKeeper on.
|
||||
</description>
|
||||
</property>
|
||||
<property>
|
||||
<name>hbase.zookeeper.property.dataDir</name>
|
||||
<value>/usr/local/zookeeper</value>
|
||||
<description>Property from ZooKeeper's config zoo.cfg.
|
||||
The directory where the snapshot is stored.
|
||||
</description>
|
||||
</property>
|
||||
...
|
||||
</configuration></programlisting></para>
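<para>Once HBase (and the ZooKeeper ensemble it manages) is up, a quick sanity check
that a quorum member is answering on the non-default port -- this is just an
illustration, not something HBase requires -- is ZooKeeper's <command>ruok</command>
four-letter command:</para>
<programlisting>
$ echo ruok | nc rs1.example.com 2222
imok
</programlisting>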
|
||||
|
||||
<section>
|
||||
<title>Using existing ZooKeeper ensemble</title>
|
||||
|
||||
<para>To point HBase at an existing ZooKeeper cluster, one that
|
||||
is not managed by HBase, set <varname>HBASE_MANAGES_ZK</varname>
|
||||
in <filename>conf/hbase-env.sh</filename> to false
|
||||
<programlisting>
|
||||
...
|
||||
# Tell HBase whether it should manage its own instance of Zookeeper or not.
|
||||
export HBASE_MANAGES_ZK=false</programlisting> Next set ensemble locations
|
||||
and client port, if non-standard, in
|
||||
<filename>hbase-site.xml</filename>, or add a suitably
|
||||
configured <filename>zoo.cfg</filename> to HBase's
|
||||
<filename>CLASSPATH</filename>. HBase will prefer the
|
||||
configuration found in <filename>zoo.cfg</filename> over any
|
||||
settings in <filename>hbase-site.xml</filename>.</para>
|
||||
|
||||
<para>When HBase manages ZooKeeper, it will start/stop the
|
||||
ZooKeeper servers as a part of the regular start/stop scripts.
|
||||
If you would like to run ZooKeeper yourself, independent of
|
||||
HBase start/stop, you would do the following</para>
|
||||
|
||||
<programlisting>
|
||||
${HBASE_HOME}/bin/hbase-daemons.sh {start,stop} zookeeper
|
||||
</programlisting>
|
||||
|
||||
<para>Note that you can use HBase in this manner to spin up a
|
||||
ZooKeeper cluster, unrelated to HBase. Just make sure to set
|
||||
<varname>HBASE_MANAGES_ZK</varname> to <varname>false</varname>
|
||||
if you want it to stay up across HBase restarts so that when
|
||||
HBase shuts down, it doesn't take ZooKeeper down with it.</para>
|
||||
|
||||
<para>For more information about running a distinct ZooKeeper
|
||||
cluster, see the ZooKeeper <link
|
||||
xlink:href="http://hadoop.apache.org/zookeeper/docs/current/zookeeperStarted.html">Getting
|
||||
Started Guide</link>. Additionally, see the <link xlink:href="http://wiki.apache.org/hadoop/ZooKeeper/FAQ#A7">ZooKeeper Wiki</link> or the
|
||||
<link xlink:href="http://zookeeper.apache.org/doc/r3.3.3/zookeeperAdmin.html#sc_zkMulitServerSetup">ZooKeeper documentation</link>
|
||||
for more information on ZooKeeper sizing.
|
||||
</para>
|
||||
</section>
|
||||
|
||||
|
||||
<section xml:id="zk.sasl.auth">
|
||||
<title>SASL Authentication with ZooKeeper</title>
|
||||
<para>Newer releases of HBase (>= 0.92) will
|
||||
support connecting to a ZooKeeper Quorum that supports
|
||||
SASL authentication (which is available in Zookeeper
|
||||
versions 3.4.0 or later).</para>
|
||||
|
||||
<para>This describes how to set up HBase to mutually
|
||||
authenticate with a ZooKeeper Quorum. ZooKeeper/HBase
|
||||
mutual authentication (<link
|
||||
xlink:href="https://issues.apache.org/jira/browse/HBASE-2418">HBASE-2418</link>)
|
||||
is required as part of a complete secure HBase configuration
|
||||
(<link
|
||||
xlink:href="https://issues.apache.org/jira/browse/HBASE-3025">HBASE-3025</link>).
|
||||
|
||||
For simplicity of explication, this section ignores
|
||||
additional configuration required (Secure HDFS and Coprocessor
|
||||
configuration). It's recommended to begin with an
|
||||
HBase-managed Zookeeper configuration (as opposed to a
|
||||
standalone Zookeeper quorum) for ease of learning.
|
||||
</para>
|
||||
|
||||
<section><title>Operating System Prerequisites</title>
|
||||
|
||||
<para>
|
||||
You need to have a working Kerberos KDC setup. For
|
||||
each <code>$HOST</code> that will run a ZooKeeper
|
||||
server, you should have a principal
|
||||
<code>zookeeper/$HOST</code>. For each such host,
|
||||
add a service key (using the <code>kadmin</code> or
|
||||
<code>kadmin.local</code> tool's <code>ktadd</code>
|
||||
command) for <code>zookeeper/$HOST</code> and copy
|
||||
this file to <code>$HOST</code>, and make it
|
||||
readable only to the user that will run zookeeper on
|
||||
<code>$HOST</code>. Note the location of this file,
|
||||
which we will use below as
|
||||
<filename>$PATH_TO_ZOOKEEPER_KEYTAB</filename>.
|
||||
</para>
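<para>As a rough sketch of what that looks like with MIT Kerberos (the hostname and
keytab path below are placeholders), the <code>kadmin</code> session for one
ZooKeeper host might be:</para>
<programlisting>
kadmin: addprinc -randkey zookeeper/zk1.example.com
kadmin: ktadd -k /etc/zookeeper/conf/zookeeper.keytab zookeeper/zk1.example.com
</programlisting>
<para>Copy the resulting keytab to <code>zk1.example.com</code> and make it readable
only by the user that runs ZooKeeper there.</para>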
|
||||
|
||||
<para>
|
||||
Similarly, for each <code>$HOST</code> that will run
|
||||
an HBase server (master or regionserver), you should
|
||||
have a principal: <code>hbase/$HOST</code>. For each
|
||||
host, add a keytab file called
|
||||
<filename>hbase.keytab</filename> containing a service
|
||||
key for <code>hbase/$HOST</code>, copy this file to
|
||||
<code>$HOST</code>, and make it readable only to the
|
||||
user that will run an HBase service on
|
||||
<code>$HOST</code>. Note the location of this file,
|
||||
which we will use below as
|
||||
<filename>$PATH_TO_HBASE_KEYTAB</filename>.
|
||||
</para>
|
||||
|
||||
<para>
|
||||
Each user who will be an HBase client should also be
|
||||
given a Kerberos principal. This principal should
|
||||
usually have a password assigned to it (as opposed to,
|
||||
as with the HBase servers, a keytab file) which only
|
||||
this user knows. The client's principal's
|
||||
<code>maxrenewlife</code> should be set so that it can
|
||||
be renewed long enough that the user can complete their
|
||||
HBase client processes. For example, if a user runs a
|
||||
long-running HBase client process that takes at most 3
|
||||
days, we might create this user's principal within
|
||||
<code>kadmin</code> with: <code>addprinc -maxrenewlife
|
||||
3days</code>. The Zookeeper client and server
|
||||
libraries manage their own ticket refreshment by
|
||||
running threads that wake up periodically to do the
|
||||
refreshment.
|
||||
</para>
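<para>For example (the user name here is hypothetical), such a client principal
might be created within <code>kadmin</code> with:</para>
<programlisting>
kadmin: addprinc -maxrenewlife 3days hbaseclient
</programlisting>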
|
||||
|
||||
<para>On each host that will run an HBase client
|
||||
(e.g. <code>hbase shell</code>), add the following
|
||||
file to the HBase home directory's <filename>conf</filename>
|
||||
directory:</para>
|
||||
|
||||
<programlisting>
|
||||
Client {
|
||||
com.sun.security.auth.module.Krb5LoginModule required
|
||||
useKeyTab=false
|
||||
useTicketCache=true;
|
||||
};
|
||||
</programlisting>
|
||||
|
||||
<para>We'll refer to this JAAS configuration file as
|
||||
<filename>$CLIENT_CONF</filename> below.</para>
</section> <!-- Operating System Prerequisites -->
|
||||
|
||||
<section>
<title>HBase-managed Zookeeper Configuration</title>

<para>On each node that will run a zookeeper, a
master, or a regionserver, create a <link
xlink:href="http://docs.oracle.com/javase/1.4.2/docs/guide/security/jgss/tutorials/LoginConfigFile.html">JAAS</link>
configuration file in the conf directory of the node's
<filename>HBASE_HOME</filename> directory that looks like the
following:</para>

<programlisting>
Server {
com.sun.security.auth.module.Krb5LoginModule required
useKeyTab=true
keyTab="$PATH_TO_ZOOKEEPER_KEYTAB"
storeKey=true
useTicketCache=false
principal="zookeeper/$HOST";
};
Client {
com.sun.security.auth.module.Krb5LoginModule required
useKeyTab=true
useTicketCache=false
keyTab="$PATH_TO_HBASE_KEYTAB"
principal="hbase/$HOST";
};
</programlisting>
<para>where the <filename>$PATH_TO_HBASE_KEYTAB</filename> and
<filename>$PATH_TO_ZOOKEEPER_KEYTAB</filename> files are what
you created above, and <code>$HOST</code> is the hostname for that
node.</para>
<para>The <code>Server</code> section will be used by
the Zookeeper quorum server, while the
<code>Client</code> section will be used by the HBase
master and regionservers. The path to this file should
be substituted for the text <filename>$HBASE_SERVER_CONF</filename>
in the <filename>hbase-env.sh</filename>
listing below.</para>

<para>
The path to the client JAAS configuration file created
earlier should be substituted for the text
<filename>$CLIENT_CONF</filename> in the
<filename>hbase-env.sh</filename> listing below.
</para>
<para>Modify your <filename>hbase-env.sh</filename> to include the
following:</para>

<programlisting>
export HBASE_OPTS="-Djava.security.auth.login.config=$CLIENT_CONF"
export HBASE_MANAGES_ZK=true
export HBASE_ZOOKEEPER_OPTS="-Djava.security.auth.login.config=$HBASE_SERVER_CONF"
export HBASE_MASTER_OPTS="-Djava.security.auth.login.config=$HBASE_SERVER_CONF"
export HBASE_REGIONSERVER_OPTS="-Djava.security.auth.login.config=$HBASE_SERVER_CONF"
</programlisting>
<para>where <filename>$HBASE_SERVER_CONF</filename> and
<filename>$CLIENT_CONF</filename> are the full paths to the
JAAS configuration files created above.</para>
<para>Modify your <filename>hbase-site.xml</filename> on each node
that will run zookeeper, master or regionserver to contain:</para>

<programlisting><![CDATA[
<configuration>
<property>
<name>hbase.zookeeper.quorum</name>
<value>$ZK_NODES</value>
</property>
<property>
<name>hbase.cluster.distributed</name>
<value>true</value>
</property>
<property>
<name>hbase.zookeeper.property.authProvider.1</name>
<value>org.apache.zookeeper.server.auth.SASLAuthenticationProvider</value>
</property>
<property>
<name>hbase.zookeeper.property.kerberos.removeHostFromPrincipal</name>
<value>true</value>
</property>
<property>
<name>hbase.zookeeper.property.kerberos.removeRealmFromPrincipal</name>
<value>true</value>
</property>
</configuration>
]]></programlisting>

<para>where <code>$ZK_NODES</code> is the
comma-separated list of hostnames of the Zookeeper
Quorum hosts.</para>
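<para>Properties whose names start with
<code>hbase.zookeeper.property.</code> are passed through to the
ZooKeeper instance that HBase manages, so the last three properties
above correspond to the following <filename>zoo.cfg</filename>
entries (the same entries are set directly in
<filename>zoo.cfg</filename> in the external Zookeeper setup below):</para>

<programlisting>
authProvider.1=org.apache.zookeeper.server.auth.SASLAuthenticationProvider
kerberos.removeHostFromPrincipal=true
kerberos.removeRealmFromPrincipal=true
</programlisting>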
<para>Start your HBase cluster by running one or more
of the following commands on the appropriate
hosts:
</para>

<programlisting>
bin/hbase zookeeper start
bin/hbase master start
bin/hbase regionserver start
</programlisting>

</section>
<section><title>External Zookeeper Configuration</title>
<para>Add a JAAS configuration file that looks like:

<programlisting>
Client {
com.sun.security.auth.module.Krb5LoginModule required
useKeyTab=true
useTicketCache=false
keyTab="$PATH_TO_HBASE_KEYTAB"
principal="hbase/$HOST";
};
</programlisting>
where <filename>$PATH_TO_HBASE_KEYTAB</filename> is the keytab
created above for HBase services to run on this host, and <code>$HOST</code> is the
hostname for that node. Put this in the HBase home's
configuration directory. We'll refer to this file's
full pathname as <filename>$HBASE_SERVER_CONF</filename> below.</para>
<para>Modify your <filename>hbase-env.sh</filename> to include the following:</para>

<programlisting>
export HBASE_OPTS="-Djava.security.auth.login.config=$CLIENT_CONF"
export HBASE_MANAGES_ZK=false
export HBASE_MASTER_OPTS="-Djava.security.auth.login.config=$HBASE_SERVER_CONF"
export HBASE_REGIONSERVER_OPTS="-Djava.security.auth.login.config=$HBASE_SERVER_CONF"
</programlisting>
<para>Modify your <filename>hbase-site.xml</filename> on each node
that will run a master or regionserver to contain:</para>

<programlisting><![CDATA[
<configuration>
<property>
<name>hbase.zookeeper.quorum</name>
<value>$ZK_NODES</value>
</property>
<property>
<name>hbase.cluster.distributed</name>
<value>true</value>
</property>
</configuration>
]]>
</programlisting>

<para>where <code>$ZK_NODES</code> is the
comma-separated list of hostnames of the Zookeeper
Quorum hosts.</para>
<para>
Add a <filename>zoo.cfg</filename> for each Zookeeper Quorum host containing:
<programlisting>
authProvider.1=org.apache.zookeeper.server.auth.SASLAuthenticationProvider
kerberos.removeHostFromPrincipal=true
kerberos.removeRealmFromPrincipal=true
</programlisting>

Also on each of these hosts, create a JAAS configuration file containing:

<programlisting>
Server {
com.sun.security.auth.module.Krb5LoginModule required
useKeyTab=true
keyTab="$PATH_TO_ZOOKEEPER_KEYTAB"
storeKey=true
useTicketCache=false
principal="zookeeper/$HOST";
};
</programlisting>

where <code>$HOST</code> is the hostname of each
Quorum host. We will refer to the full pathname of
this file as <filename>$ZK_SERVER_CONF</filename> below.
</para>
<para>
Start your Zookeepers on each Zookeeper Quorum host with:

<programlisting>
SERVER_JVMFLAGS="-Djava.security.auth.login.config=$ZK_SERVER_CONF" bin/zkServer.sh start
</programlisting>
</para>
<para>
Start your HBase cluster by running one or more of the following commands on the appropriate nodes:
</para>

<programlisting>
bin/hbase master start
bin/hbase regionserver start
</programlisting>

</section>
<section>
<title>Zookeeper Server Authentication Log Output</title>
<para>If the configuration above is successful,
you should see something similar to the following in
your Zookeeper server logs:
<programlisting>
11/12/05 22:43:39 INFO zookeeper.Login: successfully logged in.
11/12/05 22:43:39 INFO server.NIOServerCnxnFactory: binding to port 0.0.0.0/0.0.0.0:2181
11/12/05 22:43:39 INFO zookeeper.Login: TGT refresh thread started.
11/12/05 22:43:39 INFO zookeeper.Login: TGT valid starting at: Mon Dec 05 22:43:39 UTC 2011
11/12/05 22:43:39 INFO zookeeper.Login: TGT expires: Tue Dec 06 22:43:39 UTC 2011
11/12/05 22:43:39 INFO zookeeper.Login: TGT refresh sleeping until: Tue Dec 06 18:36:42 UTC 2011
..
11/12/05 22:43:59 INFO auth.SaslServerCallbackHandler:
Successfully authenticated client: authenticationID=hbase/ip-10-166-175-249.us-west-1.compute.internal@HADOOP.LOCALDOMAIN;
authorizationID=hbase/ip-10-166-175-249.us-west-1.compute.internal@HADOOP.LOCALDOMAIN.
11/12/05 22:43:59 INFO auth.SaslServerCallbackHandler: Setting authorizedID: hbase
11/12/05 22:43:59 INFO server.ZooKeeperServer: adding SASL authorization for authorizationID: hbase
</programlisting>
</para>
</section>
<section>
<title>Zookeeper Client Authentication Log Output</title>
<para>On the Zookeeper client side (HBase master or regionserver),
you should see something similar to the following:

<programlisting>
11/12/05 22:43:59 INFO zookeeper.ZooKeeper: Initiating client connection, connectString=ip-10-166-175-249.us-west-1.compute.internal:2181 sessionTimeout=180000 watcher=master:60000
11/12/05 22:43:59 INFO zookeeper.ClientCnxn: Opening socket connection to server /10.166.175.249:2181
11/12/05 22:43:59 INFO zookeeper.RecoverableZooKeeper: The identifier of this process is 14851@ip-10-166-175-249
11/12/05 22:43:59 INFO zookeeper.Login: successfully logged in.
11/12/05 22:43:59 INFO client.ZooKeeperSaslClient: Client will use GSSAPI as SASL mechanism.
11/12/05 22:43:59 INFO zookeeper.Login: TGT refresh thread started.
11/12/05 22:43:59 INFO zookeeper.ClientCnxn: Socket connection established to ip-10-166-175-249.us-west-1.compute.internal/10.166.175.249:2181, initiating session
11/12/05 22:43:59 INFO zookeeper.Login: TGT valid starting at: Mon Dec 05 22:43:59 UTC 2011
11/12/05 22:43:59 INFO zookeeper.Login: TGT expires: Tue Dec 06 22:43:59 UTC 2011
11/12/05 22:43:59 INFO zookeeper.Login: TGT refresh sleeping until: Tue Dec 06 18:30:37 UTC 2011
11/12/05 22:43:59 INFO zookeeper.ClientCnxn: Session establishment complete on server ip-10-166-175-249.us-west-1.compute.internal/10.166.175.249:2181, sessionid = 0x134106594320000, negotiated timeout = 180000
</programlisting>
</para>
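<para>As an optional spot check (a sketch only; the exact output varies
by HBase and Zookeeper version), you can inspect the ACLs on the HBase
znodes from the ZooKeeper command line and confirm they are protected by
a SASL ACL:</para>

<programlisting>
bin/hbase zkcli
[zk: ...] getAcl /hbase
</programlisting>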
</section>
<section>
<title>Configuration from Scratch</title>

<para>
This has been tested on the current standard Amazon
Linux AMI. First set up the KDC and principals as
described above. Next check out the code and run a sanity
check:
</para>

<programlisting>
git clone git://git.apache.org/hbase.git
cd hbase
mvn -PlocalTests clean test -Dtest=TestZooKeeperACL
</programlisting>

<para>
Then configure HBase as described above.
Manually edit <filename>target/cached_classpath.txt</filename> (see below),
then start the daemons:
</para>

<programlisting>
bin/hbase zookeeper &amp;
bin/hbase master &amp;
bin/hbase regionserver &amp;
</programlisting>
</section>
<section>
<title>Future improvements</title>

<section><title>Fix target/cached_classpath.txt</title>
<para>
You must override the standard hadoop-core jar file in
<filename>target/cached_classpath.txt</filename>
with the version containing the HADOOP-7070 fix. You can use the following script to do this:

<programlisting>
echo `find ~/.m2 -name "*hadoop-core*7070*SNAPSHOT.jar"` ':' `cat target/cached_classpath.txt` | sed 's/ //g' > target/tmp.txt
mv target/tmp.txt target/cached_classpath.txt
</programlisting>
</para>
</section>
<section>
<title>Set JAAS configuration
programmatically</title>

<para>
This would avoid the need for a separate Hadoop jar
that fixes <link xlink:href="https://issues.apache.org/jira/browse/HADOOP-7070">HADOOP-7070</link>.
</para>
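<para>A minimal sketch of what this could look like (the class below is
illustrative only, not an existing HBase API): instead of reading the JAAS
entries from the file named by
<code>java.security.auth.login.config</code>, an equivalent configuration
could be built in memory and installed with
<code>javax.security.auth.login.Configuration.setConfiguration()</code>
before the Zookeeper client is created.</para>

<programlisting><![CDATA[
import java.util.HashMap;
import java.util.Map;
import javax.security.auth.login.AppConfigurationEntry;
import javax.security.auth.login.Configuration;

// Hypothetical in-memory equivalent of the "Client" section of the
// file-based JAAS configuration shown earlier in this chapter.
public class InMemoryJaasConfiguration extends Configuration {
  private final String keytab;     // e.g. $PATH_TO_HBASE_KEYTAB
  private final String principal;  // e.g. "hbase/$HOST"

  public InMemoryJaasConfiguration(String keytab, String principal) {
    this.keytab = keytab;
    this.principal = principal;
  }

  @Override
  public AppConfigurationEntry[] getAppConfigurationEntry(String name) {
    Map<String, String> options = new HashMap<String, String>();
    options.put("useKeyTab", "true");
    options.put("useTicketCache", "false");
    options.put("storeKey", "true");
    options.put("keyTab", keytab);
    options.put("principal", principal);
    return new AppConfigurationEntry[] {
        new AppConfigurationEntry(
            "com.sun.security.auth.module.Krb5LoginModule",
            AppConfigurationEntry.LoginModuleControlFlag.REQUIRED,
            options)
    };
  }
}

// Installed once at startup, for example:
// Configuration.setConfiguration(
//     new InMemoryJaasConfiguration(keytabPath, "hbase/" + hostname));
]]></programlisting>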
</section>

<section>
<title>Elimination of
<code>kerberos.removeHostFromPrincipal</code> and
<code>kerberos.removeRealmFromPrincipal</code></title>
</section>

</section>
</section> <!-- SASL Authentication with ZooKeeper -->
</chapter>