HBASE-3499 Users upgrading to 0.90.0 need to have their .META. table updated with the right MEMSTORE_SIZE

git-svn-id: https://svn.apache.org/repos/asf/hbase/trunk@1068277 13f79535-47bb-0310-9956-ffa450edef68
Michael Stack 2011-02-08 07:01:13 +00:00
parent d02659378e
commit d336634bab
2 changed files with 177 additions and 32 deletions

View File: bin/set_meta_memstore_size.rb

@@ -0,0 +1,91 @@
#
# Copyright 2011 The Apache Software Foundation
#
# Licensed to the Apache Software Foundation (ASF) under one
# or more contributor license agreements. See the NOTICE file
# distributed with this work for additional information
# regarding copyright ownership. The ASF licenses this file
# to you under the Apache License, Version 2.0 (the
# "License"); you may not use this file except in compliance
# with the License. You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
#
#
# This script must be run on a live cluster in order to fix .META.'s
# MEMSTORE_SIZE back to 64MB instead of the 16KB that was configured
# in the 0.20 era. This is only required if .META. was created at that time.
#
# After running this script, HBase needs to be restarted.
#
# To see usage for this script, run:
#
# ${HBASE_HOME}/bin/hbase org.jruby.Main set_meta_memstore_size.rb
#
include Java
import org.apache.hadoop.hbase.util.Bytes
import org.apache.hadoop.hbase.HConstants
import org.apache.hadoop.hbase.client.HTable
import org.apache.hadoop.hbase.client.Put
import org.apache.hadoop.hbase.client.Scan
import org.apache.hadoop.hbase.HBaseConfiguration
import org.apache.hadoop.hbase.util.Writables
import org.apache.hadoop.fs.FileSystem
import org.apache.commons.logging.LogFactory
# Name of this script
NAME = "set_meta_memstore_size.rb"
# Print usage for this script
def usage
  puts 'Usage: %s' % NAME
  exit!
end
# Get configuration to use.
c = HBaseConfiguration.create()
# Set the hadoop filesystem configuration using the hbase.rootdir.
# Otherwise, we'll always use localhost even though the hbase.rootdir
# might be pointing at an hdfs location.
c.set("fs.default.name", c.get(HConstants::HBASE_DIR))
fs = FileSystem.get(c)
# Get a logger instance.
LOG = LogFactory.getLog(NAME)
# Check arguments
if ARGV.size > 0
  usage
end
# .META.'s schema (its HTableDescriptor, carried inside each .META. region's
# HRegionInfo) is stored in the -ROOT- catalog table, so scan -ROOT- and
# rewrite each regioninfo with a 64MB memstore flush size.
metaTable = HTable.new(c, HConstants::ROOT_TABLE_NAME)
scan = Scan.new()
scan.addColumn(HConstants::CATALOG_FAMILY, HConstants::REGIONINFO_QUALIFIER)
scanner = metaTable.getScanner(scan)
while (result = scanner.next())
  rowid = Bytes.toString(result.getRow())
  LOG.info("Setting memstore to 64MB on: " + rowid)
  hriValue = result.getValue(HConstants::CATALOG_FAMILY, HConstants::REGIONINFO_QUALIFIER)
  hri = Writables.getHRegionInfo(hriValue)
  htd = hri.getTableDesc()
  htd.setMemStoreFlushSize(64 * 1024 * 1024)
  p = Put.new(result.getRow())
  p.add(HConstants::CATALOG_FAMILY, HConstants::REGIONINFO_QUALIFIER, Writables.getBytes(hri))
  metaTable.put(p)
end
scanner.close()
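
# A verification sketch (not part of the original script; same APIs as above):
# re-scan -ROOT- and log each .META. region's memstore flush size.
# After the fix has run, expect 67108864 (64MB).
verify = metaTable.getScanner(Scan.new().addColumn(HConstants::CATALOG_FAMILY, HConstants::REGIONINFO_QUALIFIER))
while (r = verify.next())
  hri = Writables.getHRegionInfo(r.getValue(HConstants::CATALOG_FAMILY, HConstants::REGIONINFO_QUALIFIER))
  LOG.info(Bytes.toString(r.getRow()) + " flush size: " + hri.getTableDesc().getMemStoreFlushSize().to_s)
end
verify.close()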

View File

@@ -288,34 +288,56 @@ Usually you'll want to use the latest version available except the problematic u
<section xml:id="hadoop"><title><link xlink:href="http://hadoop.apache.org">hadoop</link><indexterm><primary>Hadoop</primary></indexterm></title> <section xml:id="hadoop"><title><link xlink:href="http://hadoop.apache.org">hadoop</link><indexterm><primary>Hadoop</primary></indexterm></title>
<para>This version of HBase will only run on <link xlink:href="http://hadoop.apache.org/common/releases.html">Hadoop 0.20.x</link>. <para>This version of HBase will only run on <link xlink:href="http://hadoop.apache.org/common/releases.html">Hadoop 0.20.x</link>.
It will not run on hadoop 0.21.x (nor 0.22.x) as of this writing. It will not run on hadoop 0.21.x (nor 0.22.x) as of this writing.
HBase will lose data unless it is running on an HDFS that has a durable <code>sync</code>. HBase will lose data unless it is running on an HDFS that has a
Currently only the <link xlink:href="http://svn.apache.org/viewvc/hadoop/common/branches/branch-0.20-append/">branch-0.20-append</link> durable <code>sync</code>. Currently only the
branch has this attribute. No official releases have been made from this branch as of this writing <link xlink:href="http://svn.apache.org/viewvc/hadoop/common/branches/branch-0.20-append/">branch-0.20-append</link>
so you will have to build your own Hadoop from the tip of this branch <footnote> branch has this attribute
<para>Scroll down in the Hadoop <link xlink:href="http://wiki.apache.org/hadoop/HowToRelease">How To Release</link> to the section <footnote>
Build Requirements for instruction on how to build Hadoop. <para>
</para>
</footnote> or you could use
Cloudera's <link xlink:href="http://archive.cloudera.com/docs/">CDH3</link>.
CDH has the 0.20-append patches needed to add a durable sync (As of this writing
CDH3 is still in beta. Either CDH3b2 or CDH3b3 will suffice).
See <link xlink:href="http://svn.apache.org/viewvc/hadoop/common/branches/branch-0.20-append/CHANGES.txt">CHANGES.txt</link> See <link xlink:href="http://svn.apache.org/viewvc/hadoop/common/branches/branch-0.20-append/CHANGES.txt">CHANGES.txt</link>
in branch-0.20-append to see list of patches involved.</para> in branch-0.20-append to see list of patches involved adding append on the Hadoop 0.20 branch.
<para>Because HBase depends on Hadoop, it bundles an Hadoop instance under its <filename>lib</filename> directory. </para>
The bundled Hadoop was made from the Apache branch-0.20-append branch. </footnote>.
If you want to run HBase on an Hadoop cluster that is other than a version made from branch-0.20.append, No official releases have been made from this branch up to now
you must replace the hadoop jar found in the HBase <filename>lib</filename> directory with the so you will have to build your own Hadoop from the tip of this branch.
Scroll down in the Hadoop <link xlink:href="http://wiki.apache.org/hadoop/HowToRelease">How To Release</link> to the section
<emphasis>Build Requirements</emphasis> for instruction on how to build Hadoop.
</para>
<para>
Or rather than build your own, you could use
Cloudera's <link xlink:href="http://archive.cloudera.com/docs/">CDH3</link>.
CDH has the 0.20-append patches needed to add a durable sync (CDH3 is still in beta.
Either CDH3b2 or CDH3b3 will suffice).
</para>
<para>Because HBase depends on Hadoop, it bundles an instance of
the Hadoop jar under its <filename>lib</filename> directory.
The bundled Hadoop was made from the Apache branch-0.20-append branch
at the time of this HBase's release.
It is <emphasis>critical</emphasis> that the version of Hadoop that is
out on your cluster matches what is Hbase match. Replace the hadoop
jar found in the HBase <filename>lib</filename> directory with the
hadoop jar you are running out on your cluster to avoid version mismatch issues. hadoop jar you are running out on your cluster to avoid version mismatch issues.
Make sure you replace the jar all over your cluster.
For example, versions of CDH do not have HDFS-724 whereas For example, versions of CDH do not have HDFS-724 whereas
Hadoops branch-0.20-append branch does have HDFS-724. This Hadoops branch-0.20-append branch does have HDFS-724. This
patch changes the RPC version because protocol was changed. patch changes the RPC version because protocol was changed.
Version mismatch issues have various manifestations but often all looks like its hung up. Version mismatch issues have various manifestations but often all looks like its hung up.
</para> </para>
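<para>A minimal sketch of the jar swap (illustrative paths, assuming tarball
installs of both HBase and Hadoop; exact jar names vary by version):
<programlisting>
# On EVERY node: swap the bundled Hadoop jar for the cluster's own, then restart HBase.
$ rm ${HBASE_HOME}/lib/hadoop-*.jar
$ cp ${HADOOP_HOME}/hadoop-*core*.jar ${HBASE_HOME}/lib/
</programlisting>
</para>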
<note><title>Can I just replace the jar in the Hadoop 0.20.2 tarball with the <emphasis>sync</emphasis>-supporting Hadoop jar found in HBase?</title>
<para>
You could do this. It works, going by a recent posting on the
<link xlink:href="http://www.apacheserver.net/Using-Hadoop-bundled-in-lib-directory-HBase-at1136240.htm">mailing list</link>.
</para>
</note>
<note><title>Hadoop Security</title>
<para>HBase will run on any Hadoop 0.20.x that incorporates Hadoop security features -- e.g. Y! 0.20S or CDH3B3 -- as long
as you do as suggested above and replace the Hadoop jar that ships with HBase with the secure version.
</para>
</note>
</section>
<section xml:id="ssh"> <title>ssh</title> <section xml:id="ssh"> <title>ssh</title>
<para><command>ssh</command> must be installed and <command>sshd</command> must <para><command>ssh</command> must be installed and <command>sshd</command> must
@@ -903,6 +925,54 @@ index e70ebc6..96f8c27 100644
</section>
</section>
</chapter>
<chapter xml:id="upgrading">
<title>Upgrading</title>
<para>
Review the <link linkend="requirements">requirements</link>
section above, in particular the section on Hadoop version.
</para>
<section xml:id="upgrade0.90">
<title>Upgrading to HBase 0.90.x from 0.20.x or 0.89.x</title>
<para>This version of 0.90.x HBase can be started on data written by
HBase 0.20.x or HBase 0.89.x. There is no need for a migration step.
HBase 0.89.x and 0.90.x write out the names of region directories
differently -- they name them with an MD5 hash of the region name rather
than a Jenkins hash -- which means that once started, there is no
going back to HBase 0.20.x.
</para>
<para>
Be sure to remove the <filename>hbase-default.xml</filename> from
your <filename>conf</filename>
directory on upgrade. A 0.20.x version of this file will have
sub-optimal configurations for 0.90.x HBase. The
<filename>hbase-default.xml</filename> file is now bundled into the
HBase jar and read from there. If you would like to review
the content of this file, see it in the src tree at
<filename>src/main/resources/hbase-default.xml</filename> or
see <link linkend="hbase_default_configurations">Default HBase Configurations</link>.
</para>
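<para>A minimal sketch of the cleanup (assuming a tarball install rooted at
<varname>HBASE_HOME</varname>):
<programlisting>
# Remove the stale 0.20.x-era default config; 0.90.x reads
# hbase-default.xml from inside the HBase jar instead.
$ rm ${HBASE_HOME}/conf/hbase-default.xml
</programlisting>
</para>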
<para>
Finally, if upgrading from 0.20.x, check your
<varname>.META.</varname> schema in the shell. In the past we would
recommend that users run with a 16KB
<varname>MEMSTORE_SIZE</varname>.
Run <code>hbase> scan '-ROOT-'</code> in the shell. This will output
the current <varname>.META.</varname> schema. Check the
<varname>MEMSTORE_SIZE</varname>. Is it 16KB? If so, you will
need to change this by running the script <filename>bin/set_meta_memstore_size.rb</filename>.
This will make the necessary edit to your <varname>.META.</varname> schema.
Failure to make this change will leave you with a slow cluster<footnote>
<para>
See <link xlink:href="https://issues.apache.org/jira/browse/HBASE-3499">HBASE-3499 Users upgrading to 0.90.0 need to have their .META. table updated with the right MEMSTORE_SIZE</link>.
</para>
</footnote>.
</para>
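<para>For example, the check-and-fix sequence might look like the following
(illustrative; the script invocation follows the usage given in the script's header):
<programlisting>
# Dump -ROOT- and eyeball MEMSTORE_SIZE in the .META. schema it prints
hbase> scan '-ROOT-'
# If it reads 16KB, run the fix against the live cluster, then restart HBase
$ ${HBASE_HOME}/bin/hbase org.jruby.Main set_meta_memstore_size.rb
</programlisting>
</para>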
</section>
</chapter>
<chapter xml:id="configuration"> <chapter xml:id="configuration">
@@ -2021,22 +2091,6 @@ When I build, why do I always get <code>Unable to find resource 'VM_global_libra
</para>
</answer>
</qandaentry>
</qandadiv>
</qandaset>
</appendix>