HBASE-3499 Users upgrading to 0.90.0 need to have their .META. table updated with the right MEMSTORE_SIZE

git-svn-id: https://svn.apache.org/repos/asf/hbase/trunk@1068277 13f79535-47bb-0310-9956-ffa450edef68
Michael Stack 2011-02-08 07:01:13 +00:00
parent d02659378e
commit d336634bab
2 changed files with 177 additions and 32 deletions


@ -0,0 +1,91 @@
#
# Copyright 2011 The Apache Software Foundation
#
# Licensed to the Apache Software Foundation (ASF) under one
# or more contributor license agreements. See the NOTICE file
# distributed with this work for additional information
# regarding copyright ownership. The ASF licenses this file
# to you under the Apache License, Version 2.0 (the
# "License"); you may not use this file except in compliance
# with the License. You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
#
#
# This script must be used on a live cluster in order to fix .META.'s
# MEMSTORE_SIZE back to 64MB instead of the 16KB that was configured
# in the 0.20 era. This is only required if .META. was created at that time.
#
# After running this script, HBase needs to be restarted.
#
# To see usage for this script, run:
#
# ${HBASE_HOME}/bin/hbase org.jruby.Main set_meta_memstore_size.rb
#
include Java
import org.apache.hadoop.hbase.util.Bytes
import org.apache.hadoop.hbase.HConstants
import org.apache.hadoop.hbase.HRegionInfo
import org.apache.hadoop.hbase.client.HTable
import org.apache.hadoop.hbase.client.Delete
import org.apache.hadoop.hbase.client.Put
import org.apache.hadoop.hbase.client.Scan
import org.apache.hadoop.hbase.HTableDescriptor
import org.apache.hadoop.hbase.HBaseConfiguration
import org.apache.hadoop.hbase.util.FSUtils
import org.apache.hadoop.hbase.util.Writables
import org.apache.hadoop.fs.Path
import org.apache.hadoop.fs.FileSystem
import org.apache.commons.logging.LogFactory
# Name of this script
NAME = "set_meta_memstore_size.rb"
# Print usage for this script
def usage
  puts 'Usage: %s' % NAME
  exit!
end
# Get configuration to use.
c = HBaseConfiguration.create()
# Set hadoop filesystem configuration using the hbase.rootdir.
# Otherwise, we'll always use localhost even though the hbase.rootdir
# might be pointing at an hdfs location.
c.set("fs.default.name", c.get(HConstants::HBASE_DIR))
fs = FileSystem.get(c)
# Get a logger.
LOG = LogFactory.getLog(NAME)
# Check arguments
if ARGV.size > 0
  usage
end
# Reset .META.'s memstore flush size to 64MB.
# .META.'s HRegionInfo (which carries its HTableDescriptor) lives in -ROOT-,
# so scan -ROOT-, rewrite the descriptor, and put each row back.
metaTable = HTable.new(c, HConstants::ROOT_TABLE_NAME)
scan = Scan.new()
scan.addColumn(HConstants::CATALOG_FAMILY, HConstants::REGIONINFO_QUALIFIER);
scanner = metaTable.getScanner(scan)
while (result = scanner.next())
  rowid = Bytes.toString(result.getRow())
  LOG.info("Setting memstore to 64MB on: " + rowid)
  # Deserialize the HRegionInfo, bump the flush size on its table descriptor,
  # then write the updated info back over the same cell.
  hriValue = result.getValue(HConstants::CATALOG_FAMILY, HConstants::REGIONINFO_QUALIFIER)
  hri = Writables.getHRegionInfo(hriValue)
  htd = hri.getTableDesc()
  htd.setMemStoreFlushSize(64 * 1024 * 1024)
  p = Put.new(result.getRow())
  p.add(HConstants::CATALOG_FAMILY, HConstants::REGIONINFO_QUALIFIER, Writables.getBytes(hri))
  metaTable.put(p)
end
scanner.close()


@ -288,34 +288,56 @@ Usually you'll want to use the latest version available except the problematic u
<section xml:id="hadoop"><title><link xlink:href="http://hadoop.apache.org">hadoop</link><indexterm><primary>Hadoop</primary></indexterm></title>
<para>This version of HBase will only run on <link xlink:href="http://hadoop.apache.org/common/releases.html">Hadoop 0.20.x</link>.
It will not run on hadoop 0.21.x (nor 0.22.x) as of this writing.
HBase will lose data unless it is running on an HDFS that has a
durable <code>sync</code>. Currently only the
<link xlink:href="http://svn.apache.org/viewvc/hadoop/common/branches/branch-0.20-append/">branch-0.20-append</link>
branch has this attribute
<footnote>
<para>
See <link xlink:href="http://svn.apache.org/viewvc/hadoop/common/branches/branch-0.20-append/CHANGES.txt">CHANGES.txt</link>
in branch-0.20-append for the list of patches involved in adding append on the Hadoop 0.20 branch.
</para>
</footnote>.
No official releases have been made from this branch up to now
so you will have to build your own Hadoop from the tip of this branch.
Scroll down in the Hadoop <link xlink:href="http://wiki.apache.org/hadoop/HowToRelease">How To Release</link> to the section
<emphasis>Build Requirements</emphasis> for instructions on how to build Hadoop.
</para>
<para>
Or rather than build your own, you could use
Cloudera's <link xlink:href="http://archive.cloudera.com/docs/">CDH3</link>.
CDH has the 0.20-append patches needed to add a durable sync (CDH3 is still in beta.
Either CDH3b2 or CDH3b3 will suffice).
</para>
<para>Because HBase depends on Hadoop, it bundles an instance of
the Hadoop jar under its <filename>lib</filename> directory.
The bundled Hadoop was made from the Apache branch-0.20-append branch
at the time of this HBase's release.
It is <emphasis>critical</emphasis> that the version of Hadoop that is
out on your cluster matches the version bundled with HBase. Replace the hadoop
jar found in the HBase <filename>lib</filename> directory with the
hadoop jar you are running out on your cluster to avoid version mismatch issues.
Make sure you replace the jar on every node of your cluster.
For example, versions of CDH do not have HDFS-724 whereas
Hadoop's branch-0.20-append branch does have HDFS-724. This
patch changes the RPC version because the protocol was changed.
Version mismatch issues have various manifestations but often everything just looks hung up.
</para>
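<para>For illustration, swapping in your cluster's jar might look something like the
sketch below. The jar name patterns and the <varname>HADOOP_HOME</varname> location are
assumptions and will differ per install; the point is simply to remove the bundled jar
and copy in the one your cluster actually runs, on every node running HBase.
<programlisting>
$ rm ${HBASE_HOME}/lib/hadoop-*core*.jar
$ cp ${HADOOP_HOME}/hadoop-*core*.jar ${HBASE_HOME}/lib/
</programlisting>
</para>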
<note><title>Can I just replace the jar in Hadoop 0.20.2 tarball with the <emphasis>sync</emphasis>-supporting Hadoop jar found in HBase?</title>
<para>
You could do this. It works, going by a recent posting on the
<link xlink:href="http://www.apacheserver.net/Using-Hadoop-bundled-in-lib-directory-HBase-at1136240.htm">mailing list</link>.
</para>
</note>
<note><title>Hadoop Security</title>
<para>HBase will run on any Hadoop 0.20.x that incorporates Hadoop security features -- e.g. Y! 0.20S or CDH3B3 -- as long
as you do as suggested above and replace the Hadoop jar that ships with HBase with the secure version.
</para>
</note>
</section>
<section xml:id="ssh"> <title>ssh</title>
<para><command>ssh</command> must be installed and <command>sshd</command> must
@ -903,8 +925,56 @@ index e70ebc6..96f8c27 100644
</section>
</section>
</chapter>
<chapter xml:id="upgrading">
<title>Upgrading</title>
<para>
Review the <link linkend="requirements">requirements</link>
section above, in particular the section on Hadoop version.
</para>
<section xml:id="upgrade0.90">
<title>Upgrading to HBase 0.90.x from 0.20.x or 0.89.x</title>
<para>This version of 0.90.x HBase can be started on data written by
HBase 0.20.x or HBase 0.89.x. There is no need for a migration step.
HBase 0.89.x and 0.90.x write out the names of region directories
differently -- they name them with an md5 hash of the region name rather
than a jenkins hash -- which means that once started, there is no
going back to HBase 0.20.x.
</para>
<para>
Be sure to remove the <filename>hbase-default.xml</filename> from
your <filename>conf</filename>
directory on upgrade. A 0.20.x version of this file will have
sub-optimal configurations for 0.90.x HBase. The
<filename>hbase-default.xml</filename> file is now bundled into the
HBase jar and read from there. If you would like to review
the content of this file, see it in the src tree at
<filename>src/main/resources/hbase-default.xml</filename> or
see <link linkend="hbase_default_configurations">Default HBase Configurations</link>.
</para>
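<para>That cleanup is just a file removal, for example (assuming a default layout
under <varname>HBASE_HOME</varname>):
<programlisting>
$ rm ${HBASE_HOME}/conf/hbase-default.xml
</programlisting>
</para>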
<para>
Finally, if upgrading from 0.20.x, check your
<varname>.META.</varname> schema in the shell. In the past we would
recommend that users run with a 16KB
<varname>MEMSTORE_SIZE</varname>.
Run <code>hbase> scan '-ROOT-'</code> in the shell. This will output
the current <varname>.META.</varname> schema. Check the
<varname>MEMSTORE_SIZE</varname> value. Is it 16KB? If so, you will
need to change it. Run the script <filename>bin/set_meta_memstore_size.rb</filename>.
This will make the necessary edit to your <varname>.META.</varname> schema.
Failure to make this change will leave you with a slow cluster <footnote>
<para>
See <link xlink:href="https://issues.apache.org/jira/browse/HBASE-3499">HBASE-3499 Users upgrading to 0.90.0 need to have their .META. table updated with the right MEMSTORE_SIZE</link>
</para>
</footnote>
.
</para>
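<para>Put together, the check and the fix look roughly like the following sketch
(the scan output is abridged here; per the script's own header, run it against the
live cluster from <varname>HBASE_HOME</varname> and restart HBase afterwards):
<programlisting>
hbase> scan '-ROOT-'
  # inspect the printed .META. table descriptor for MEMSTORE_SIZE

$ cd ${HBASE_HOME}
$ bin/hbase org.jruby.Main bin/set_meta_memstore_size.rb
</programlisting>
</para>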
</section>
</chapter>
<chapter xml:id="configuration">
<title>Configuration</title>
<para>
@ -2021,22 +2091,6 @@ When I build, why do I always get <code>Unable to find resource 'VM_global_libra
</para>
</answer>
</qandaentry>
</qandadiv>
</qandaset>
</appendix>