HBASE-11986 Addendum - Added hbase_mob.xml
This commit is contained in:
parent
7dbd828a90
commit
ac65b1fb75
|
@ -0,0 +1,227 @@
|
||||||
|
<?xml version="1.0" encoding="UTF-8"?>
|
||||||
|
<chapter version="5.0" xml:id="mob" xmlns="http://docbook.org/ns/docbook"
|
||||||
|
xmlns:xlink="http://www.w3.org/1999/xlink" xmlns:xi="http://www.w3.org/2001/XInclude"
|
||||||
|
xmlns:svg="http://www.w3.org/2000/svg" xmlns:m="http://www.w3.org/1998/Math/MathML"
|
||||||
|
xmlns:html="http://www.w3.org/1999/xhtml" xmlns:db="http://docbook.org/ns/docbook">
|
||||||
|
<!--
|
||||||
|
/**
|
||||||
|
* Licensed to the Apache Software Foundation (ASF) under one
|
||||||
|
* or more contributor license agreements. See the NOTICE file
|
||||||
|
* distributed with this work for additional information
|
||||||
|
* regarding copyright ownership. The ASF licenses this file
|
||||||
|
* to you under the Apache License, Version 2.0 (the
|
||||||
|
* "License"); you may not use this file except in compliance
|
||||||
|
* with the License. You may obtain a copy of the License at
|
||||||
|
*
|
||||||
|
* http://www.apache.org/licenses/LICENSE-2.0
|
||||||
|
*
|
||||||
|
* Unless required by applicable law or agreed to in writing, software
|
||||||
|
* distributed under the License is distributed on an "AS IS" BASIS,
|
||||||
|
* WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
|
||||||
|
* See the License for the specific language governing permissions and
|
||||||
|
* limitations under the License.
|
||||||
|
*/
|
||||||
|
-->
|
||||||
|
|
||||||
|
<title>HBase Medium Object (MOB) Storage</title>
|
||||||
|
<para>Data comes in many sizes, and saving all of your data in HBase, including binary data such
|
||||||
|
as images and documents, is ideal. HBase can technically handle binary objects with cells
|
||||||
|
that are up to 10MB in size. However, HBase's normal read and write paths are optimized for
|
||||||
|
values smaller than 100KB in size. When HBase deals with large numbers of values up to 10MB,
|
||||||
|
referred to here as <firstterm>medium objects</firstterm>, or <firstterm>MOBs</firstterm>,
|
||||||
|
performance is degraded due to write amplification caused by splits and compactions. HBase
|
||||||
|
2.0+ adds support for better managing large numbers of MOBs while maintaining performance,
|
||||||
|
consistency, and low operational overhead. MOB support is provided by the work done in <link
|
||||||
|
xlink:href="https://issues.apache.org/jira/browse/HBASE-11339"
|
||||||
|
>HBASE-11339</link>.</para>
|
||||||
|
|
||||||
|
<para>To take advantage of MOB, you need to use HFile version 3. Optionally, configure the MOB
|
||||||
|
file reader's cache settings for each RegionServer (see <xref linkend="mob.cache.configure"
|
||||||
|
/>), then configure specific columns to hold MOB data. Currently, you also need to configure
|
||||||
|
a periodic re-optimization of MOB data layout, but this requirement is expected to be
|
||||||
|
removed at a later date.</para>
|
||||||
|
<para>Client code does not need to change to take advantage of HBase MOB support. The feature is
|
||||||
|
transparent to the client.</para>
|
||||||
|
|
||||||
|
<section>
|
||||||
|
<title>Limitations of MOB Functionality</title>
|
||||||
|
<para>Work on HBase MOB is ongoing. Work is needed for support for snapshots (<link
|
||||||
|
xlink:href="https://issues.apache.org/jira/browse/HBASE-11645">HBASE-11645</link>),
|
||||||
|
metrics (<link xlink:href="https://issues.apache.org/jira/browse/HBASE-11683"
|
||||||
|
>HBASE-11683</link>), and a native compaction mechanism (<link
|
||||||
|
xlink:href="https://issues.apache.org/jira/browse/HBASE-11861"
|
||||||
|
>HBASE-11861)</link>.</para>
|
||||||
|
</section>
|
||||||
|
|
||||||
|
<section>
|
||||||
|
<title>Configure Columns for MOB</title>
|
||||||
|
<para>You can configure columns to support MOB during table creation or alteration, either
|
||||||
|
in HBase Shell or via the Java API. The two relevant properties are the boolean
|
||||||
|
<code>IS_MOB</code> and the <code>MOB_THRESHOLD</code>, which is the number of bytes
|
||||||
|
at which an object is considered to be a MOB. Only <code>IS_MOB</code> is required. If
|
||||||
|
you do not specify the <code>MOB_THRESHOLD</code>, the default threshold value of 100 kb
|
||||||
|
is used.</para>
|
||||||
|
<example>
|
||||||
|
<title>Configure a Column for MOB Using HBase Shell</title>
|
||||||
|
<screen>
|
||||||
|
hbase> create 't1', 'f1', {IS_MOB => true, MOB_THRESHOLD => 102400}
|
||||||
|
hbase> alter ‘t1′, {NAME => ‘f1', IS_MOB => true, MOB_THRESHOLD => 102400}
|
||||||
|
</screen>
|
||||||
|
</example>
|
||||||
|
<example>
|
||||||
|
<title>Configure a Column for MOB Using the API</title>
|
||||||
|
<programlisting language="java">
|
||||||
|
...
|
||||||
|
HColumnDescriptor hcd = new HColumnDescriptor(“f”);
|
||||||
|
hcd.setValue(MobConstants.IS_MOB, Bytes.toBytes(Boolean.TRUE));
|
||||||
|
...
|
||||||
|
HColumnDescriptor hcd;
|
||||||
|
hcd.setValue(MobConstants.MOB_THRESHOLD, Bytes.toBytes(102400L);
|
||||||
|
...
|
||||||
|
</programlisting>
|
||||||
|
</example>
|
||||||
|
</section>
|
||||||
|
|
||||||
|
<section>
|
||||||
|
<title>Testing MOB</title>
|
||||||
|
<para>The utility <command>org.apache.hadoop.hbase.IntegrationTestIngestMOB</command> is
|
||||||
|
provided to assist with testing the MOB feature. The utility is run as follows:</para>
|
||||||
|
<screen>$ <userinput>sudo -u hbase hbase org.apache.hadoop.hbase.IntegrationTestIngestMOB \
|
||||||
|
-threshold 100*1024 \
|
||||||
|
-minMobDataSize 100*1024*4/5 \
|
||||||
|
-maxMobDataSize 100*1024*50</userinput></screen>
|
||||||
|
<itemizedlist>
|
||||||
|
<listitem>
|
||||||
|
<para><literal>threshold</literal> is the threshold at which cells are considered to
|
||||||
|
be MOBs. The default is 100 kb.</para>
|
||||||
|
</listitem>
|
||||||
|
<listitem>
|
||||||
|
<para><literal>minMobDataSize</literal> is the minimum value for the size of MOB
|
||||||
|
data. The default is 80 kb.</para>
|
||||||
|
</listitem>
|
||||||
|
<listitem>
|
||||||
|
<para><literal>maxMobDataSize</literal> is the maximum value for the size of MOB
|
||||||
|
data. The default is 5 MB.</para>
|
||||||
|
</listitem>
|
||||||
|
</itemizedlist>
|
||||||
|
</section>
|
||||||
|
|
||||||
|
<section>
|
||||||
|
<title>Set Up MOB Re-Optimization Tasks</title>
|
||||||
|
<para>The MOB feature introduces a new read and write path to HBase and currently requires
|
||||||
|
an external tool, the <command>sweeper</command> tool, for housekeeping and
|
||||||
|
optimization. The <command>sweeper</command> tool uses MapReduce to coalesce small MOB
|
||||||
|
files or MOB files with many deletions or updates</para>
|
||||||
|
|
||||||
|
<procedure>
|
||||||
|
<title>Configure and Run the <command>sweeper</command> Tool</title>
|
||||||
|
<step>
|
||||||
|
<para>First, configure the <command>sweeper</command>'s properties in the
|
||||||
|
RegionServer's <filename>hbase-site.xml</filename> file. Adjust these properties
|
||||||
|
to suit your environment.</para>
|
||||||
|
<programlisting language="xml"><![CDATA[
|
||||||
|
<property>
|
||||||
|
<name>hbase.mob.sweep.tool.compaction.ratio</name>
|
||||||
|
<value>0.5f</value>
|
||||||
|
<description>
|
||||||
|
If there're too many cells deleted in a mob file, it's regarded
|
||||||
|
as an invalid file and needs to be merged.
|
||||||
|
If existingCellsSize/mobFileSize is less than ratio, it's regarded
|
||||||
|
as an invalid file. The default value is 0.5f.
|
||||||
|
</description>
|
||||||
|
</property>
|
||||||
|
<property>
|
||||||
|
<name>hbase.mob.sweep.tool.compaction.mergeable.size</name>
|
||||||
|
<value>134217728</value>
|
||||||
|
<description>
|
||||||
|
If the size of a mob file is less than this value, it's regarded as a small
|
||||||
|
file and needs to be merged. The default value is 128MB.
|
||||||
|
</description>
|
||||||
|
</property>
|
||||||
|
<property>
|
||||||
|
<name>hbase.mob.sweep.tool.compaction.memstore.flush.size</name>
|
||||||
|
<value>134217728</value>
|
||||||
|
<description>
|
||||||
|
The flush size for the memstore used by sweep job. Each sweep reducer owns such a memstore.
|
||||||
|
The default value is 128MB.
|
||||||
|
</description>
|
||||||
|
</property>
|
||||||
|
<property>
|
||||||
|
<name>hbase.mob.cleaner.interval</name>
|
||||||
|
<value>86400000</value>
|
||||||
|
<description>
|
||||||
|
The period that ExpiredMobFileCleaner runs. The unit is millisecond.
|
||||||
|
The default value is one day.
|
||||||
|
</description>
|
||||||
|
</property>]]>
|
||||||
|
</programlisting>
|
||||||
|
</step>
|
||||||
|
<step>
|
||||||
|
<para>Next, add the HBase install directory, <envar>$HBASE_HOME/*</envar>, and HBase
|
||||||
|
library directory to <filename>yarn-site.xml</filename> Adjust this example to
|
||||||
|
suit your environment.</para>
|
||||||
|
<programlisting language="xml"><![CDATA[
|
||||||
|
<property>
|
||||||
|
<description>Classpath for typical applications.</description>
|
||||||
|
<name>yarn.application.classpath</name>
|
||||||
|
<value>
|
||||||
|
$HADOOP_CONF_DIR
|
||||||
|
$HADOOP_COMMON_HOME/*,$HADOOP_COMMON_HOME/lib/*
|
||||||
|
$HADOOP_HDFS_HOME/*,$HADOOP_HDFS_HOME/lib/*,
|
||||||
|
$HADOOP_MAPRED_HOME/*,$HADOOP_MAPRED_HOME/lib/*
|
||||||
|
$HADOOP_YARN_HOME/*,$HADOOP_YARN_HOME/lib/*
|
||||||
|
$HBASE_HOME/*, $HBASE_HOME/lib/*
|
||||||
|
</value>
|
||||||
|
</property>]]>
|
||||||
|
</programlisting>
|
||||||
|
</step>
|
||||||
|
<step>
|
||||||
|
<para>Finally, run the <command>sweeper</command> tool for each column which is
|
||||||
|
configured for MOB..</para>
|
||||||
|
<screen>$ <userinput>org.apache.hadoop.hbase.mob.compactions.Sweeper \
|
||||||
|
<replaceable>tableName</replaceable> \
|
||||||
|
<replaceable>familyName</replaceable></userinput></screen>
|
||||||
|
</step>
|
||||||
|
</procedure>
|
||||||
|
</section>
|
||||||
|
<section xml:id="mob.cache.configure">
|
||||||
|
<title>Configure the MOB Cache</title>
|
||||||
|
<para>Because there can be a large number of MOB files at any time, as compared to the
|
||||||
|
number of HFiles, MOB files are not always kept open. The MOB file reader cache is a LRU
|
||||||
|
cache which keeps the most recently used MOB files open. To configure the MOB file
|
||||||
|
reader's cache on each RegionServer, add the following properties to the RegionServer's
|
||||||
|
<filename>hbase-site.xm</filename>l, customize the configuration to suit your
|
||||||
|
environment, and restart or rolling restart the RegionServer.</para>
|
||||||
|
<programlisting language="xml"><![CDATA[
|
||||||
|
<property>
|
||||||
|
<name>hbase.mob.file.cache.size</name>
|
||||||
|
<value>1000</value>
|
||||||
|
<description>
|
||||||
|
Number of opened file handlers to cache.
|
||||||
|
A larger value will benefit reads by provinding more file handlers per mob
|
||||||
|
file cache and would reduce frequent file opening and closing.
|
||||||
|
However, if this is set too high, this could lead to a "too many opened file handers"
|
||||||
|
The default value is 1000.
|
||||||
|
</description>
|
||||||
|
</property>
|
||||||
|
<property>
|
||||||
|
<name>hbase.mob.cache.evict.period</name>
|
||||||
|
<value>3600</value>
|
||||||
|
<description>
|
||||||
|
The amount of time in seconds before the mob cache evicts cached mob files.
|
||||||
|
The default value is 3600 seconds.
|
||||||
|
</description>
|
||||||
|
</property>
|
||||||
|
<property>
|
||||||
|
<name>hbase.mob.cache.evict.remain.ratio</name>
|
||||||
|
<value>0.5f</value>
|
||||||
|
<description>
|
||||||
|
The ratio (between 0.0 and 1.0) of files that remains cached after an eviction
|
||||||
|
is triggered when the number of cached mob files exceeds the hbase.mob.file.cache.size.
|
||||||
|
The default value is 0.5f.
|
||||||
|
</description>
|
||||||
|
</property>
|
||||||
|
]]>
|
||||||
|
</programlisting>
|
||||||
|
</section>
|
||||||
|
</chapter>
|
Loading…
Reference in New Issue