Edits and additions to the pseudo-distributed section after trying it and finding the existing text lacking

git-svn-id: https://svn.apache.org/repos/asf/hbase/trunk@1521252 13f79535-47bb-0310-9956-ffa450edef68
Michael Stack 2013-09-09 19:23:18 +00:00
parent 87b4bfefa0
commit 65c68a146e
2 changed files with 59 additions and 39 deletions


@ -39,7 +39,7 @@
</inlinemediaobject>
</link>
</subtitle>
<copyright><year>2012</year><holder>Apache Software Foundation.
<copyright><year>2013</year><holder>Apache Software Foundation.
All Rights Reserved. Apache Hadoop, Hadoop, MapReduce, HDFS, Zookeeper, HBase, and the HBase project logo are trademarks of the Apache Software Foundation.
</holder>
</copyright>


@ -351,12 +351,8 @@ to ensure well-formedness of your document after an edit session.
<title>HBase run modes: Standalone and Distributed</title>
<para>HBase has two run modes: <xref linkend="standalone" /> and <xref linkend="distributed" />. Out of the box, HBase runs in
standalone mode. To set up a distributed deploy, you will need to
configure HBase by editing files in the HBase <filename>conf</filename>
directory.</para>
<para>Whatever your mode, you will need to edit
<code>conf/hbase-env.sh</code> to tell HBase which
standalone mode. Whatever your mode, you will need to configure HBase by editing files in the HBase <filename>conf</filename>
directory. At a minimum, you must edit <code>conf/hbase-env.sh</code> to tell HBase which
<command>java</command> to use. In this file you set HBase environment
variables such as the heapsize and other options for the
<application>JVM</application>, the preferred location for log files,
@ -386,11 +382,12 @@ to ensure well-formedness of your document after an edit session.
comes from Hadoop.</para>
</footnote>.</para>
<para>Distributed modes require an instance of the <emphasis>Hadoop
Distributed File System</emphasis> (HDFS). See the Hadoop <link
<para>Pseudo-distributed mode can run against the local filesystem or
it can run against an instance of the <emphasis>Hadoop
Distributed File System</emphasis> (HDFS). Fully-distributed mode can
ONLY run on HDFS. See the Hadoop <link
xlink:href="http://hadoop.apache.org/common/docs/r1.1.1/api/overview-summary.html#overview_description">
requirements and instructions</link> for how to set up a HDFS. Before
proceeding, ensure you have an appropriate, working HDFS.</para>
requirements and instructions</link> for how to set up HDFS.</para>
<para>Below we describe the different distributed setups. Starting,
verification and exploration of your install, whether a
@ -399,45 +396,65 @@ to ensure well-formedness of your document after an edit session.
section that follows, <xref linkend="confirm" />. The same verification script applies to both
deploy types.</para>
<section xml:id="pseudo">
<title>Pseudo-distributed</title>
<para>A pseudo-distributed mode is simply a distributed mode run on
<para>A pseudo-distributed mode is simply a fully-distributed mode run on
a single host. Use this configuration for testing and prototyping on
HBase. Do not use this configuration for production nor for
evaluating HBase performance.</para>
<para>First, setup your HDFS in <link xlink:href="http://hadoop.apache.org/docs/r1.0.3/single_node_setup.html">pseudo-distributed mode</link>.
</para>
<para>Next, configure HBase. Below is an example <filename>conf/hbase-site.xml</filename>.
This is the file into
which you add local customizations and overrides for
<xref linkend="hbase_default_configurations" /> and <xref linkend="hdfs_client_conf" />.
Note that the <varname>hbase.rootdir</varname> property points to the
local HDFS instance.
<para>First, if you want to run on HDFS rather than on the local filesystem,
set up your HDFS. You can also set up HDFS in
<link xlink:href="http://hadoop.apache.org/docs/r1.0.3/single_node_setup.html">pseudo-distributed mode</link>.
Ensure you have a working HDFS before proceeding.
</para>
<para>Now skip to <xref linkend="confirm" /> for how to start and verify your
pseudo-distributed install. <footnote>
<para>See <xref linkend="pseudo.extras">Pseudo-distributed
mode extras</xref> for notes on how to start extra Masters and
RegionServers when running pseudo-distributed.</para>
</footnote></para>
<para>Next, configure HBase. Edit <filename>conf/hbase-site.xml</filename>.
This is the file into which you add local customizations and overrides.
At a minimum, you must tell HBase to run in (pseudo-)distributed mode rather than
in default standalone mode. To do this, set the <varname>hbase.cluster.distributed</varname>
property to <varname>true</varname> (its default is <varname>false</varname>). The absolute bare-minimum
<filename>hbase-site.xml</filename> is therefore as follows:
<programlisting>
&lt;configuration&gt;
&lt;property&gt;
&lt;name&gt;hbase.cluster.distributed&lt;/name&gt;
&lt;value&gt;true&lt;/value&gt;
&lt;/property&gt;
&lt;/configuration&gt;
</programlisting>
With this configuration, HBase will start up an HBase Master process, a ZooKeeper server,
and a RegionServer process, all running against the local filesystem. Data is written to a
directory named <filename>hbase-YOUR_USER_NAME</filename> under wherever your operating
system stores temporary files.</para>
<para>Such a setup, using the local filesystem and
writing to the operating system's temporary directory, is ephemeral. The Hadoop
local filesystem, which is what HBase uses when it writes to the local filesystem, does not
support <command>sync</command>, so unless the system is shut down properly, data will be lost. Writing to
the operating system's temporary directory can also cause data loss when the machine
is restarted, as this directory is usually cleared on reboot. For a more permanent
setup, see the next example where we make use of an instance of HDFS; HBase data will
be written to the Hadoop distributed filesystem rather than to the local filesystem's
tmp directory.</para>
<para>In this <filename>conf/hbase-site.xml</filename> example, the
<varname>hbase.rootdir</varname> property points to the local HDFS instance
homed on the node <varname>h-24-30.example.com</varname>.
<note>
<title>Let HBase create <filename>${hbase.rootdir}</filename></title>
<para>Let HBase create the <varname>hbase.rootdir</varname>
directory. If you don't, you'll get a warning saying HBase needs a
migration run because the directory is missing files expected by
HBase (it'll create them if you let it).</para>
</note>
<section xml:id="pseudo.config">
<title>Pseudo-distributed Configuration File</title>
<para>Below is a sample pseudo-distributed file for the node <varname>h-24-30.example.com</varname>.
<filename>hbase-site.xml</filename>
<programlisting>
&lt;configuration&gt;
...
&lt;property&gt;
&lt;name&gt;hbase.rootdir&lt;/name&gt;
&lt;value&gt;hdfs://h-24-30.example.com:8020/hbase&lt;/value&gt;
@ -446,16 +463,15 @@ to ensure well-formedness of your document after an edit session.
&lt;name&gt;hbase.cluster.distributed&lt;/name&gt;
&lt;value&gt;true&lt;/value&gt;
&lt;/property&gt;
&lt;property&gt;
&lt;name&gt;hbase.zookeeper.quorum&lt;/name&gt;
&lt;value&gt;h-24-30.example.com&lt;/value&gt;
&lt;/property&gt;
...
&lt;/configuration&gt;
</programlisting>
</para>
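<para>For comparison, here is a minimal sketch (not taken from the HBase distribution) of a
pseudo-distributed <filename>hbase-site.xml</filename> that stays on the local filesystem but
keeps its data out of the operating system's temporary directory. The path
<filename>/home/testuser/hbase</filename> is a hypothetical placeholder; substitute a directory
your user can write to. This only avoids the directory being cleared on reboot. The local
filesystem still does not support <command>sync</command>, so HDFS is likely the better choice
for any data you want to keep.
<programlisting>
&lt;configuration&gt;
  &lt;property&gt;
    &lt;!-- Assumption: a hypothetical local directory outside the temp area --&gt;
    &lt;name&gt;hbase.rootdir&lt;/name&gt;
    &lt;value&gt;file:///home/testuser/hbase&lt;/value&gt;
  &lt;/property&gt;
  &lt;property&gt;
    &lt;!-- Run as separate processes rather than in standalone mode --&gt;
    &lt;name&gt;hbase.cluster.distributed&lt;/name&gt;
    &lt;value&gt;true&lt;/value&gt;
  &lt;/property&gt;
&lt;/configuration&gt;
</programlisting>
</para>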
</section>
<para>Now skip to <xref linkend="confirm" /> for how to start and verify your
pseudo-distributed install. <footnote>
<para>See <xref linkend="pseudo.extras">Pseudo-distributed
mode extras</xref> for notes on how to start extra Masters and
RegionServers when running pseudo-distributed.</para>
</footnote></para>
<section xml:id="pseudo.extras">
<title>Pseudo-distributed Extras</title>
@ -495,6 +511,10 @@ to ensure well-formedness of your document after an edit session.
</section>
<section xml:id="fully_dist">
<title>Fully-distributed</title>