hbase-6082. porting hbck document to the RefGuide Appendix

git-svn-id: https://svn.apache.org/repos/asf/hbase/trunk@1342094 13f79535-47bb-0310-9956-ffa450edef68
Doug Meil 2012-05-23 23:41:02 +00:00
parent a51230651c
commit 49349a35b0
2 changed files with 191 additions and 0 deletions


@ -2711,6 +2711,195 @@ myHtd.setValue(HTableDescriptor.SPLIT_POLICY, MyCustomSplitPolicy.class.getName(
</qandaset>
</appendix>
<appendix xml:id="hbck.in.depth">
<title>hbck In Depth</title>
<para>HBaseFsck (hbck) is a tool for checking for region consistency and table integrity problems
and repairing a corrupted HBase. It works in two basic modes -- a read-only inconsistency
identifying mode and a multi-phase read-write repair mode.
</para>
<section>
<title>Running hbck to identify inconsistencies</title>
<para>To check whether your HBase cluster has corruptions, run hbck against it:</para>
<programlisting>
$ ./bin/hbase hbck
</programlisting>
<para>
At the end of the command's output, hbck prints OK or tells you the number of INCONSISTENCIES
present. You may also want to run hbck a few times because some inconsistencies can be
transient (e.g. the cluster is starting up or a region is splitting). Operationally, you may want to run
hbck regularly and set up an alert (e.g. via Nagios) if it repeatedly reports inconsistencies.
A run of hbck reports a list of inconsistencies along with a brief description of the regions and
tables affected. Using the <code>-details</code> option reports more details, including a representative
listing of all the splits present in all the tables.
</para>
<programlisting>
$ ./bin/hbase hbck -details
</programlisting>
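<para>
The suggestion above to run hbck regularly and alert only on repeated inconsistencies could be
sketched roughly as follows. This is a minimal, illustrative wrapper, not part of HBase: the names
<code>HBCK_CMD</code>, <code>RUNS</code> and <code>SLEEP_SECS</code> are hypothetical, and the check assumes that
a clean run ends with a line whose last word is OK, so adjust the pattern to what your version prints.
</para>

```shell
#!/bin/sh
# Sketch of a Nagios-style check built around hbck.
# HBCK_CMD, RUNS and SLEEP_SECS are hypothetical names for this example,
# not hbck options.
HBCK_CMD="${HBCK_CMD:-./bin/hbase hbck}"
RUNS="${RUNS:-3}"
SLEEP_SECS="${SLEEP_SECS:-60}"

# Succeeds when the last non-empty line read from stdin ends with "OK".
# Assumption: a clean hbck run ends with such a line.
last_line_ok() {
    grep -v '^[[:space:]]*$' | tail -n 1 | grep -q 'OK[[:space:]]*$'
}

# Runs hbck up to RUNS times and succeeds as soon as one run is clean,
# so that transient inconsistencies do not raise an alert.
check_hbck() {
    i=1
    while [ "$i" -le "$RUNS" ]; do
        # $HBCK_CMD is unquoted on purpose so it splits into command and arguments.
        if $HBCK_CMD | last_line_ok; then
            echo "OK: hbck run $i was clean"
            return 0
        fi
        if [ "$i" -lt "$RUNS" ]; then
            sleep "$SLEEP_SECS"
        fi
        i=$((i + 1))
    done
    echo "CRITICAL: hbck reported inconsistencies on $RUNS consecutive runs"
    return 2
}
```

<para>
A monitoring plugin would call <code>check_hbck</code> and exit with its return status. The value 2 follows the
usual Nagios convention for a critical state.
</para>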
</section>
<section><title>Inconsistencies</title>
<para>
If, after several runs, inconsistencies continue to be reported, you may have encountered a
corruption. These should be rare, but in the event they occur, newer versions of HBase include
an hbck tool with automatic repair options.
</para>
<para>
There are two invariants that, when violated, create inconsistencies in HBase:
</para>
<itemizedlist>
<listitem>HBase's region consistency invariant is satisfied if every region is assigned and
deployed on exactly one region server, and all places where this state is kept are in
agreement.
</listitem>
<listitem>HBase's table integrity invariant is satisfied if, for each table, every possible row key
resolves to exactly one region.
</listitem>
</itemizedlist>
<para>
Repairs generally work in three phases -- a read-only information gathering phase that identifies
inconsistencies, a table integrity repair phase that restores the table integrity invariant, and then
finally a region consistency repair phase that restores the region consistency invariant.
</para>
<para>
Starting from version 0.90.0, hbck could detect region consistency problems and report on a subset
of possible table integrity problems. It also included the ability to automatically fix the most
common inconsistency: region assignment and deployment consistency problems. This repair
could be done by using the <code>-fix</code> command line option. It closes regions if they are
open on the wrong server or on multiple region servers, and assigns regions to region
servers if they are not open.
</para>
<para>
Starting from HBase versions 0.90.7, 0.92.2 and 0.94.0, several new command line options were
introduced to aid in repairing a corrupted HBase. This hbck sometimes goes by the nickname
“uberhbck”. Each particular version of uberhbck is compatible with the HBase releases of the same
major version (for example, a 0.90.7 uberhbck can repair a 0.90.4 cluster). However, versions &lt;=0.90.6 and versions
&lt;=0.92.1 may require restarting the master or failing over to a backup master.
</para>
</section>
<section><title>Localized repairs</title>
<para>
When repairing a corrupted HBase, it is best to repair the lowest-risk inconsistencies first.
These are generally region consistency repairs -- localized single-region repairs that only modify
in-memory data, ephemeral ZooKeeper data, or patch holes in the META table.
Region consistency requires that three views of a region's state agree: the region's data in HDFS
(.regioninfo files), the region's row in the .META. table, and the region's deployment/assignment on
region servers and the master. Options for repairing region consistency include:
<itemizedlist>
<listitem><code>-fixAssignments</code> (equivalent to the 0.90 <code>-fix</code> option) repairs unassigned, incorrectly
assigned or multiply assigned regions.
</listitem>
<listitem><code>-fixMeta</code> removes meta rows when the corresponding regions are not present in
HDFS, and adds new meta rows if the regions are present in HDFS but not in META.
</listitem>
</itemizedlist>
To fix deployment and assignment problems you can run this command:
</para>
<programlisting>
$ ./bin/hbase hbck -fixAssignments
</programlisting>
<para>
To fix deployment and assignment problems as well as repair incorrect meta rows, you can
run this command:
</para>
<programlisting>
$ ./bin/hbase hbck -fixAssignments -fixMeta
</programlisting>
<para>
There are a few classes of table integrity problems that are low-risk repairs. The first two are
degenerate regions (startkey == endkey) and backwards regions (startkey > endkey). These are
automatically handled by sidelining the data to a temporary directory (/hbck/xxxx).
The third low-risk class is HDFS region holes, which can be repaired with the following option:
</para>
<itemizedlist>
<listitem><code>-fixHdfsHoles</code> option for fabricating new empty regions on the file system.
If holes are detected, you can use <code>-fixHdfsHoles</code> and should include <code>-fixMeta</code> and <code>-fixAssignments</code> to make the new region consistent.
</listitem>
</itemizedlist>
<programlisting>
$ ./bin/hbase hbck -fixAssignments -fixMeta -fixHdfsHoles
</programlisting>
<para>
Since this is a common operation, we've added the <code>-repairHoles</code> flag, which is equivalent to the
previous command:
</para>
<programlisting>
$ ./bin/hbase hbck -repairHoles
</programlisting>
<para>
If inconsistencies still remain after these steps, you most likely have table integrity problems
related to orphaned or overlapping regions.
</para>
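<para>
The low-risk repairs in this section escalate from assignment fixes to meta fixes to hole repairs.
The following is a rough sketch of that ordering, not a recommended unattended procedure: it
re-checks with a read-only run after each step and stops at the first clean result. <code>HBCK_CMD</code>
and the function names are hypothetical, and the clean-run test assumes the last non-empty output
line ends with OK.
</para>

```shell
#!/bin/sh
# Sketch: apply low-risk hbck repairs in order of increasing scope.
# HBCK_CMD is a hypothetical variable for this example, not an hbck option.
HBCK_CMD="${HBCK_CMD:-./bin/hbase hbck}"

# Succeeds when a read-only hbck run ends with a line whose last word is "OK".
# Assumption: a clean run ends with such a line.
hbck_clean() {
    $HBCK_CMD | grep -v '^[[:space:]]*$' | tail -n 1 | grep -q 'OK[[:space:]]*$'
}

# Tries each option set in turn and stops at the first one after which
# a read-only run is clean.
repair_low_risk() {
    if hbck_clean; then
        echo "clean: no repair needed"
        return 0
    fi
    for opts in "-fixAssignments" "-fixAssignments -fixMeta" "-repairHoles"; do
        echo "running: hbck $opts"
        # $opts is unquoted on purpose so it splits into separate flags.
        # The repair run's own exit status is ignored; the re-check decides.
        $HBCK_CMD $opts > /dev/null || true
        if hbck_clean; then
            echo "clean after: $opts"
            return 0
        fi
    done
    echo "still inconsistent: review hbck -details output before riskier repairs"
    return 1
}
```

<para>
Stopping at the first clean run keeps the repair as narrow as possible, in line with the advice to
fix the lowest-risk inconsistencies first.
</para>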
</section>
<section><title>Region Overlap Repairs</title>
<para>
Table integrity problems can require repairs that deal with overlaps. This is a riskier operation
because it requires modifications to the file system, requires some decision making, and may
require some manual steps. For these repairs it is best to analyze the output of an <code>hbck -details</code>
run so that you limit repair attempts to the problems the checks identify. Because this is
riskier, there are safeguards that should be used to limit the scope of the repairs.
</para>
<para>
WARNING: These repairs are relatively new and have only been tested on online but idle HBase instances
(no reads/writes). Use at your own risk in an active production environment!
</para>
<para>
The options for repairing table integrity violations include:
</para>
<itemizedlist>
<listitem><code>-fixHdfsOrphans</code> option for “adopting” a region directory that is missing a region
metadata file (the .regioninfo file).
</listitem>
<listitem><code>-fixHdfsOverlaps</code> option for fixing overlapping regions.
</listitem>
</itemizedlist>
<para>
When repairing overlapping regions, a region's data can be modified on the file system in two
ways: 1) by merging regions into a larger region or 2) by sidelining regions, that is, moving their data to a
“sideline” directory from which it can be restored later. Merging a large number of regions is
technically correct but could result in an extremely large region that requires a series of costly
compactions and splitting operations. In these cases, it is probably better to sideline the regions
that overlap with the most other regions (likely the largest ranges) so that merges can happen on
a more reasonable scale. Since these sidelined regions are already laid out in HBase's native
directory and HFile format, they can be restored by using HBase's bulk load mechanism.
</para>
<para>
The default safeguard thresholds are conservative. These options let you override the default
thresholds and enable the large region sidelining feature.
</para>
<itemizedlist>
<listitem><code>-maxMerge &lt;n&gt;</code> maximum number of overlapping regions to merge
</listitem>
<listitem><code>-sidelineBigOverlaps</code> if more than maxMerge regions are overlapping, attempt
to sideline the regions overlapping with the most other regions.
</listitem>
<listitem><code>-maxOverlapsToSideline &lt;n&gt;</code> if sidelining large overlapping regions, sideline at most n
regions.
</listitem>
</itemizedlist>
<para>
Often you just want to get the tables repaired. In that case you can use this option, which turns
on the lower-risk repair options together:
</para>
<itemizedlist>
<listitem><code>-repair</code> includes all the region consistency options but only the hole-repairing table
integrity options.
</listitem>
</itemizedlist>
<para>
Finally, there are safeguards to limit repairs to specific tables. For example, the following
command would only attempt to repair the tables TableFoo and TableBar.
</para>
<programlisting>
$ ./bin/hbase hbck -repair TableFoo TableBar
</programlisting>
<section><title>Special cases: Meta is not properly assigned</title>
<para>
There are a few special cases that hbck can handle as well.
Sometimes the meta table's only region is inconsistently assigned or deployed. In this case
there is a special <code>-fixMetaOnly</code> option that can try to fix meta assignments.
</para>
<programlisting>
$ ./bin/hbase hbck -fixMetaOnly -fixAssignments
</programlisting>
</section>
<section><title>Special cases: HBase version file is missing</title>
<para>
HBase's data on the file system requires a version file in order to start. If this file is missing, you
can use the <code>-fixVersionFile</code> option to fabricate a new HBase version file. This assumes that
the version of hbck you are running is the appropriate version for the HBase cluster.
</para>
</section>
<section><title>Special case: Root and META are corrupt.</title>
<para>
The most drastic corruption scenario is the case where the ROOT or META is corrupted and
HBase will not start. In this case you can use the OfflineMetaRepair tool to create new ROOT
and META regions and tables.
This tool assumes that HBase is offline. It marches through the existing HBase home
directory and loads as much information as possible from the region metadata files (.regioninfo files)
on the file system. If the region metadata has proper table integrity, it sidelines the original root
and meta table directories, and builds new ones with pointers to the region directories and their
data.
</para>
<programlisting>
$ ./bin/hbase org.apache.hadoop.hbase.util.OfflineMetaRepair
</programlisting>
<para>
NOTE: This tool is not as clever as uberhbck but can be used to bootstrap repairs that uberhbck
can complete.
If the tool succeeds you should be able to start HBase and run online repairs if necessary.
</para>
</section>
</section>
</appendix>
<appendix xml:id="compression">
<title >Compression In HBase<indexterm><primary>Compression</primary></indexterm></title>


@ -69,6 +69,8 @@ Valid program names are:
Passing <command>-fix</command> may correct the inconsistency (This latter
is an experimental feature).
</para>
<para>For more information, see <xref linkend="hbck.in.depth"/>.
</para>
</section>
<section xml:id="hfile_tool2"><title>HFile Tool</title>
<para>See <xref linkend="hfile_tool" />.</para>