HBASE-10202 Documentation is lacking information about rolling-restart.sh script (Misty Stanley-Jones)

Jonathan M Hsieh 2014-08-18 15:28:04 -07:00
parent c55ecba071
commit f282ade9a7
1 changed files with 132 additions and 42 deletions


@ -793,48 +793,138 @@ false
<section <section
xml:id="rolling"> xml:id="rolling">
<title>Rolling Restart</title> <title>Rolling Restart</title>
<para> You can also ask this script to restart a RegionServer after the shutdown AND move its
old regions back into place. The latter you might do to retain data locality. A primitive <para>Some cluster configuration changes require either the entire cluster, or the
rolling restart might be effected by running something like the following:</para> RegionServers, to be restarted in order to pick up the changes. In addition, rolling
<screen language="bourne">$ for i in `cat conf/regionservers|sort`; do ./bin/graceful_stop.sh --restart --reload --debug $i; done &amp;> /tmp/log.txt &amp;</screen> restarts are supported for upgrading to a minor or maintenance release, and to a major
<para> Tail the output of <filename>/tmp/log.txt</filename> to follow the scripts progress. release if at all possible. See the release notes for the release you want to upgrade to, to
The above does RegionServers only. The script will also disable the load balancer before find out about limitations to the ability to perform a rolling upgrade.</para>
moving the regions. You'd need to do the master update separately. Do it before you run the <para>There are multiple ways to restart your cluster nodes, depending on your situation.
above script. Here is a pseudo-script for how you might craft a rolling restart script: </para> These methods are detailed below.</para>
<section>
<title>Using the <command>rolling-restart.sh</command> Script</title>
<para>HBase ships with a script, <filename>bin/rolling-restart.sh</filename>, that allows
you to perform rolling restarts on the entire cluster, the master only, or the
RegionServers only. The script is provided as a template for your own script, and is not
explicitly tested. It requires password-less SSH login to be configured and assumes that
you have deployed using a tarball. The script requires you to set some environment
variables before running it. Examine the script and modify it to suit your needs.</para>
<example>
<title><filename>rolling-restart.sh</filename> General Usage</title>
<screen language="bourne">
$ <userinput>./bin/rolling-restart.sh --help</userinput><![CDATA[
Usage: rolling-restart.sh [--config <hbase-confdir>] [--rs-only] [--master-only] [--graceful] [--maxthreads xx]
]]></screen>
</example>
<variablelist>
<varlistentry>
<term>Rolling Restart on RegionServers Only</term>
<listitem>
<para>To perform a rolling restart on the RegionServers only, use the
<code>--rs-only</code> option. This might be necessary if you need to reboot the
individual RegionServer or if you make a configuration change that only affects
RegionServers and not the other HBase processes.</para>
<para>If you need to restart only a single RegionServer, or if you need to do extra
actions during the restart, use the <filename>bin/graceful_stop.sh</filename>
command instead. See <xref linkend="rolling.restart.manual" />.</para>
</listitem>
</varlistentry>
<varlistentry>
<term>Rolling Restart on Masters Only</term>
<listitem>
<para>To perform a rolling restart on the active and backup Masters, use the
<code>--master-only</code> option. You might use this if you know that your
configuration change only affects the Master and not the RegionServers, or if you
need to restart the server where the active Master is running.</para>
<para>If you are not running backup Masters, the Master is simply restarted. If you
are running backup Masters, they are all stopped before any are restarted, to avoid
a race condition in ZooKeeper to determine which is the new Master. First the main
Master is restarted, then the backup Masters are restarted. Directly after restarting,
the Master checks for and cleans out any regions in transition before taking on its
normal workload.</para>
</listitem>
</varlistentry>
<varlistentry>
<term>Graceful Restart</term>
<listitem>
<para>If you specify the <code>--graceful</code> option, RegionServers are restarted
using the <filename>bin/graceful_stop.sh</filename> script, which moves regions off
a RegionServer before restarting it. This is safer, but can delay the
restart.</para>
</listitem>
</varlistentry>
<varlistentry>
<term>Limiting the Number of Threads</term>
<listitem>
<para>To limit the rolling restart to using only a specific number of threads, use the
<code>--maxthreads</code> option.</para>
</listitem>
</varlistentry>
</variablelist>
</section>
<section xml:id="rolling.restart.manual">
<title>Manual Rolling Restart</title>
<para>To retain more control over the process, you may wish to manually do a rolling restart
across your cluster. This uses the <command>graceful_stop.sh</command> command (see <xref
linkend="decommission" />). In this method, you can restart each RegionServer
individually and then move its old regions back into place, retaining locality. If you
also need to restart the Master, you need to do it separately, and restart the Master
before restarting the RegionServers using this method. The following is an example of such
a command. You may need to tailor it to your environment. This script does a rolling
restart of RegionServers only. It disables the load balancer before moving the
regions.</para>
<screen><![CDATA[
$ for i in `cat conf/regionservers|sort`; do ./bin/graceful_stop.sh --restart --reload --debug $i; done &> /tmp/log.txt &
]]></screen>
<para>Monitor the output of the <filename>/tmp/log.txt</filename> file to follow the
progress of the script. </para>
</section>
<section>
<title>Logic for Crafting Your Own Rolling Restart Script</title>
<para>Use the following guidelines if you want to create your own rolling restart script.</para>
<orderedlist> <orderedlist>
<listitem> <listitem>
<para>Untar your release, make sure of its configuration and then rsync it across the <para>Extract the new release, verify its configuration, and synchronize it to all nodes
cluster. If this is 0.90.2, patch it with HBASE-3744 and HBASE-3756. </para> of your cluster using <command>rsync</command>, <command>scp</command>, or another
secure synchronization mechanism.</para></listitem>
<listitem><para>Use the hbck utility to ensure that the cluster is consistent.</para>
<screen>
$ ./bin/hbase hbck
</screen>
<para>Perform repairs if required. See <xref linkend="hbck" /> for details.</para>
</listitem> </listitem>
<listitem> <listitem><para>Restart the master first. You may need to modify these commands if your
<para>Run hbck to ensure the cluster consistent new HBase directory is different from the old one, such as for an upgrade.</para>
<programlisting language="bourne">$ ./bin/hbase hbck</programlisting> Effect repairs if inconsistent. <screen>
</para> $ ./bin/hbase-daemon.sh stop master; ./bin/hbase-daemon.sh start master
</screen>
</listitem> </listitem>
<listitem> <listitem><para>Gracefully restart each RegionServer, using a script such as the
<para>Restart the Master: following, from the Master.</para>
<programlisting language="bourne">$ ./bin/hbase-daemon.sh stop master; ./bin/hbase-daemon.sh start master</programlisting> <screen><![CDATA[
</para> $ for i in `cat conf/regionservers|sort`; do ./bin/graceful_stop.sh --restart --reload --debug $i; done &> /tmp/log.txt &
</listitem> ]]></screen>
<listitem> <para>If you are running Thrift or REST servers, pass the --thrift or --rest options.
<para>Run the <filename>graceful_stop.sh</filename> script per RegionServer. For For other available options, run the <command>bin/graceful_stop.sh --help</command>
example:</para> command.</para>
<programlisting language="bourne">$ for i in `cat conf/regionservers|sort`; do ./bin/graceful_stop.sh --restart --reload --debug $i; done &amp;> /tmp/log.txt &amp; <para>It is important to drain HBase regions slowly when restarting multiple
</programlisting> RegionServers. Otherwise, multiple regions go offline simultaneously and must be
<para> If you are running thrift or rest servers on the RegionServer, pass --thrift or reassigned to other nodes, which may also go offline soon. This can negatively affect
--rest options (See usage for <filename>graceful_stop.sh</filename> script). </para> performance. You can inject delays into the script above, for instance, by adding a
</listitem> Shell command such as <command>sleep</command>. To wait for 5 minutes between each
<listitem> RegionServer restart, modify the above script to the following:</para>
<para>Restart the Master again. This will clear out dead servers list and reenable the <screen><![CDATA[
balancer. </para> $ for i in `cat conf/regionservers|sort`; do ./bin/graceful_stop.sh --restart --reload --debug $i; sleep 5m; done &> /tmp/log.txt &
</listitem> ]]></screen>
<listitem>
<para>Run hbck to ensure the cluster is consistent. </para>
</listitem> </listitem>
<listitem><para>Restart the Master again, to clear out the dead servers list and re-enable
the load balancer.</para></listitem>
<listitem><para>Run the <command>hbck</command> utility again, to be sure the cluster is
consistent.</para></listitem>
</orderedlist> </orderedlist>
<para>It is important to drain HBase regions slowly when restarting regionservers. Otherwise, </section>
multiple regions go offline simultaneously as they are re-assigned to other nodes. Depending
on your usage patterns, this might not be desirable. </para>
</section> </section>
<section <section
xml:id="adding.new.node"> xml:id="adding.new.node">