HBASE-11339 Converted hbase_mob.xml to Asciidoc and added it to the Asciidoc TOC

This commit is contained in:
Misty Stanley-Jones 2015-02-27 12:14:50 +10:00
parent 85bcec3995
commit a1e9ce3d87
3 changed files with 121 additions and 226 deletions

View File

@ -0,0 +1,120 @@
////
/**
*
* Licensed to the Apache Software Foundation (ASF) under one
* or more contributor license agreements. See the NOTICE file
* distributed with this work for additional information
* regarding copyright ownership. The ASF licenses this file
* to you under the Apache License, Version 2.0 (the
* "License"); you may not use this file except in compliance
* with the License. You may obtain a copy of the License at
*
* http://www.apache.org/licenses/LICENSE-2.0
*
* Unless required by applicable law or agreed to in writing, software
* distributed under the License is distributed on an "AS IS" BASIS,
* WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
* See the License for the specific language governing permissions and
* limitations under the License.
*/
////
[[hbase_mob]]
== Storing Medium-sized Objects (MOB)
:doctype: book
:numbered:
:toc: left
:icons: font
:experimental:
:toc: left
:source-language: java
Data comes in many sizes, and saving all of your data in HBase, including binary data such as images and documents, is ideal. HBase can technically handle binary objects with cells that are up to 10MB in size. However, HBase's normal read and write paths are optimized for values smaller than 100KB in size. When HBase deals with large numbers of values up to 10MB, referred to here as medium objects, or MOBs, performance is degraded due to write amplification caused by splits and compactions. HBase ***FIX_VERSION_NUMBER*** adds support for better managing large numbers of MOBs while maintaining performance, consistency, and low operational overhead. MOB support is provided by the work done in link:https://issues.apache.org/jira/browse/HBASE-11339[HBASE-11339].
To take advantage of MOB, you need to use <<hfilev3,HFile version 3>>. Optionally, configure the MOB file reader's cache settings for each RegionServer (see <<mob.cache.configure>>), then configure specific columns to hold MOB data.
Client code does not need to change to take advantage of HBase MOB support. The feature is transparent to the client.
=== Configuring Columns for MOB
You can configure columns to support MOB during table creation or alteration, either in HBase Shell or via the Java API. The two relevant properties are the boolean `IS_MOB` and the `MOB_THRESHOLD`, which is the number of bytes at which an object is considered to be a MOB. Only `IS_MOB` is required. If you do not specify the `MOB_THRESHOLD`, the default threshold value of 100 kb is used.
.Configure a Column for MOB Using HBase Shell
====
----
hbase> create 't1', {NAME => 'f1', IS_MOB => true, MOB_THRESHOLD => 102400}
hbase> alter 't1', {NAME => 'f1', IS_MOB => true, MOB_THRESHOLD => 102400}
----
====
.Configure a Column for MOB Using the Java API
====
[source,java]
----
...
HColumnDescriptor hcd = new HColumnDescriptor(“f”);
hcd.setMobEnabled(true);
...
hcd.setMobThreshold(102400L);
...
----
====
=== Testing MOB
The utility `org.apache.hadoop.hbase.IntegrationTestIngestMOB` is provided to assist with testing the MOB feature. The utility is run as follows:
[source,bash]
----
$ sudo -u hbase hbase org.apache.hadoop.hbase.IntegrationTestIngestMOB \
-threshold 102400 \
-minMobDataSize 512 \
-maxMobDataSize 5120
----
* `*threshold*` is the threshold at which cells are considered to be MOBs. The default is 1 kB, expressed in bytes.
* `*minMobDataSize*` is the minimum value for the size of MOB data. The default is 512 B, expressed in bytes.
* `*maxMobDataSize*` is the maximum value for the size of MOB data. The default is 5 kB, expressed in bytes.
[[mob.cache.configure]]
=== Configuring the MOB Cache
Because there can be a large number of MOB files at any time, as compared to the number of HFiles, MOB files are not always kept open. The MOB file reader cache is a LRU cache which keeps the most recently used MOB files open. To configure the MOB file reader's cache on each RegionServer, add the following properties to the RegionServer's `hbase-site.xml`, customize the configuration to suit your environment, and restart or rolling restart the RegionServer.
.Example MOB Cache Configuration
====
[source,xml]
----
<property>
<name>hbase.mob.file.cache.size</name>
<value>1000</value>
<description>
Number of opened file handlers to cache.
A larger value will benefit reads by provinding more file handlers per mob
file cache and would reduce frequent file opening and closing.
However, if this is set too high, this could lead to a "too many opened file handers"
The default value is 1000.
</description>
</property>
<property>
<name>hbase.mob.cache.evict.period</name>
<value>3600</value>
<description>
The amount of time in seconds before the mob cache evicts cached mob files.
The default value is 3600 seconds.
</description>
</property>
<property>
<name>hbase.mob.cache.evict.remain.ratio</name>
<value>0.5f</value>
<description>
The ratio (between 0.0 and 1.0) of files that remains cached after an eviction
is triggered when the number of cached mob files exceeds the hbase.mob.file.cache.size.
The default value is 0.5f.
</description>
</property>
----
====

View File

@ -51,6 +51,7 @@ include::_chapters/schema_design.adoc[]
include::_chapters/mapreduce.adoc[]
include::_chapters/security.adoc[]
include::_chapters/architecture.adoc[]
include::_chapters/hbase_mob.adoc[]
include::_chapters/hbase_apis.adoc[]
include::_chapters/external_apis.adoc[]
include::_chapters/thrift_filter_language.adoc[]

View File

@ -1,226 +0,0 @@
<?xml version="1.0" encoding="UTF-8"?>
<chapter version="5.0" xml:id="mob" xmlns="http://docbook.org/ns/docbook"
xmlns:xlink="http://www.w3.org/1999/xlink" xmlns:xi="http://www.w3.org/2001/XInclude"
xmlns:svg="http://www.w3.org/2000/svg" xmlns:m="http://www.w3.org/1998/Math/MathML"
xmlns:html="http://www.w3.org/1999/xhtml" xmlns:db="http://docbook.org/ns/docbook">
<!--
/**
* Licensed to the Apache Software Foundation (ASF) under one
* or more contributor license agreements. See the NOTICE file
* distributed with this work for additional information
* regarding copyright ownership. The ASF licenses this file
* to you under the Apache License, Version 2.0 (the
* "License"); you may not use this file except in compliance
* with the License. You may obtain a copy of the License at
*
* http://www.apache.org/licenses/LICENSE-2.0
*
* Unless required by applicable law or agreed to in writing, software
* distributed under the License is distributed on an "AS IS" BASIS,
* WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
* See the License for the specific language governing permissions and
* limitations under the License.
*/
-->
<title>HBase Medium Object (MOB) Storage</title>
<para>Data comes in many sizes, and saving all of your data in HBase, including binary data such
as images and documents, is ideal. HBase can technically handle binary objects with cells
that are up to 10MB in size. However, HBase's normal read and write paths are optimized for
values smaller than 100KB in size. When HBase deals with large numbers of values up to 10MB,
referred to here as <firstterm>medium objects</firstterm>, or <firstterm>MOBs</firstterm>,
performance is degraded due to write amplification caused by splits and compactions. HBase
2.0+ adds support for better managing large numbers of MOBs while maintaining performance,
consistency, and low operational overhead. MOB support is provided by the work done in <link
xlink:href="https://issues.apache.org/jira/browse/HBASE-11339"
>HBASE-11339</link>.</para>
<para>To take advantage of MOB, you need to use HFile version 3. Optionally, configure the MOB
file reader's cache settings for each RegionServer (see <xref linkend="mob.cache.configure"
/>), then configure specific columns to hold MOB data. Currently, you also need to configure
a periodic re-optimization of MOB data layout, but this requirement is expected to be
removed at a later date.</para>
<para>Client code does not need to change to take advantage of HBase MOB support. The feature is
transparent to the client.</para>
<section>
<title>Limitations of MOB Functionality</title>
<para>Work on HBase MOB is ongoing. Work is needed for support for snapshots (<link
xlink:href="https://issues.apache.org/jira/browse/HBASE-11645">HBASE-11645</link>),
metrics (<link xlink:href="https://issues.apache.org/jira/browse/HBASE-11683"
>HBASE-11683</link>), and a native compaction mechanism (<link
xlink:href="https://issues.apache.org/jira/browse/HBASE-11861"
>HBASE-11861)</link>.</para>
</section>
<section>
<title>Configure Columns for MOB</title>
<para>You can configure columns to support MOB during table creation or alteration, either
in HBase Shell or via the Java API. The two relevant properties are the boolean
<code>IS_MOB</code> and the <code>MOB_THRESHOLD</code>, which is the number of bytes
at which an object is considered to be a MOB. Only <code>IS_MOB</code> is required. If
you do not specify the <code>MOB_THRESHOLD</code>, the default threshold value of 100 kb
is used.</para>
<example>
<title>Configure a Column for MOB Using HBase Shell</title>
<screen>
hbase> create 't1', {NAME => 'f1', IS_MOB => true, MOB_THRESHOLD => 102400}
hbase> alter 't1', {NAME => 'f1', IS_MOB => true, MOB_THRESHOLD => 102400}
</screen>
</example>
<example>
<title>Configure a Column for MOB Using the API</title>
<programlisting language="java">
...
HColumnDescriptor hcd = new HColumnDescriptor(“f”);
hcd.setMobEnabled(true);
...
hcd.setMobThreshold(102400L);
...
</programlisting>
</example>
</section>
<section>
<title>Testing MOB</title>
<para>The utility <command>org.apache.hadoop.hbase.IntegrationTestIngestMOB</command> is
provided to assist with testing the MOB feature. The utility is run as follows:</para>
<screen>$ <userinput>sudo -u hbase hbase org.apache.hadoop.hbase.IntegrationTestIngestMOB \
-threshold 100*1024 \
-minMobDataSize 100*1024*4/5 \
-maxMobDataSize 100*1024*50</userinput></screen>
<itemizedlist>
<listitem>
<para><literal>threshold</literal> is the threshold at which cells are considered to
be MOBs. The default is 1 kB.</para>
</listitem>
<listitem>
<para><literal>minMobDataSize</literal> is the minimum value for the size of MOB
data. The default is 512 B.</para>
</listitem>
<listitem>
<para><literal>maxMobDataSize</literal> is the maximum value for the size of MOB
data. The default is 5 kB.</para>
</listitem>
</itemizedlist>
</section>
<section>
<title>Set Up MOB Re-Optimization Tasks</title>
<para>The MOB feature introduces a new read and write path to HBase and currently requires
an external tool, the <command>sweeper</command> tool, for housekeeping and
optimization. The <command>sweeper</command> tool uses MapReduce to coalesce small MOB
files or MOB files with many deletions or updates</para>
<procedure>
<title>Configure and Run the <command>sweeper</command> Tool</title>
<step>
<para>First, configure the <command>sweeper</command>'s properties in the
RegionServer's <filename>hbase-site.xml</filename> file. Adjust these properties
to suit your environment.</para>
<programlisting language="xml"><![CDATA[
<property>
<name>hbase.mob.sweep.tool.compaction.ratio</name>
<value>0.5f</value>
<description>
If there are too many cells deleted in a mob file, it's regarded
as an invalid file and needs to be merged.
If existingCellsSize/mobFileSize is less than ratio, it's regarded
as an invalid file. The default value is 0.5f.
</description>
</property>
<property>
<name>hbase.mob.sweep.tool.compaction.mergeable.size</name>
<value>134217728</value>
<description>
If the size of a mob file is less than this value, it's regarded as a small
file and needs to be merged. The default value is 128MB.
</description>
</property>
<property>
<name>hbase.mob.sweep.tool.compaction.memstore.flush.size</name>
<value>134217728</value>
<description>
The flush size for the memstore used by sweep job. Each sweep reducer owns such a memstore.
The default value is 128MB.
</description>
</property>
<property>
<name>hbase.mob.cleaner.interval</name>
<value>86400000</value>
<description>
The period that ExpiredMobFileCleaner runs. The unit is millisecond.
The default value is one day.
</description>
</property>]]>
</programlisting>
</step>
<step>
<para>Next, add the HBase install directory, <envar>$HBASE_HOME/*</envar>, and HBase
library directory to <filename>yarn-site.xml</filename> Adjust this example to
suit your environment.</para>
<programlisting language="xml"><![CDATA[
<property>
<description>Classpath for typical applications.</description>
<name>yarn.application.classpath</name>
<value>
$HADOOP_CONF_DIR,
$HADOOP_COMMON_HOME/*,$HADOOP_COMMON_HOME/lib/*,
$HADOOP_HDFS_HOME/*,$HADOOP_HDFS_HOME/lib/*,
$HADOOP_MAPRED_HOME/*,$HADOOP_MAPRED_HOME/lib/*,
$HADOOP_YARN_HOME/*,$HADOOP_YARN_HOME/lib/*,
$HBASE_HOME/*, $HBASE_HOME/lib/*
</value>
</property>]]>
</programlisting>
</step>
<step>
<para>Finally, run the <command>sweeper</command> tool for each column which is
configured for MOB..</para>
<screen>$ <userinput>org.apache.hadoop.hbase.mob.compactions.Sweeper \
<replaceable>tableName</replaceable> \
<replaceable>familyName</replaceable></userinput></screen>
</step>
</procedure>
</section>
<section xml:id="mob.cache.configure">
<title>Configure the MOB Cache</title>
<para>Because there can be a large number of MOB files at any time, as compared to the
number of HFiles, MOB files are not always kept open. The MOB file reader cache is a LRU
cache which keeps the most recently used MOB files open. To configure the MOB file
reader's cache on each RegionServer, add the following properties to the RegionServer's
<filename>hbase-site.xm</filename>l, customize the configuration to suit your
environment, and restart or rolling restart the RegionServer.</para>
<programlisting language="xml"><![CDATA[
<property>
<name>hbase.mob.file.cache.size</name>
<value>1000</value>
<description>
Number of opened file handlers to cache.
A larger value will benefit reads by provinding more file handlers per mob
file cache and would reduce frequent file opening and closing.
However, if this is set too high, this could lead to a "too many opened file handers"
The default value is 1000.
</description>
</property>
<property>
<name>hbase.mob.cache.evict.period</name>
<value>3600</value>
<description>
The amount of time in seconds before the mob cache evicts cached mob files.
The default value is 3600 seconds.
</description>
</property>
<property>
<name>hbase.mob.cache.evict.remain.ratio</name>
<value>0.5f</value>
<description>
The ratio (between 0.0 and 1.0) of files that remains cached after an eviction
is triggered when the number of cached mob files exceeds the hbase.mob.file.cache.size.
The default value is 0.5f.
</description>
</property>
]]>
</programlisting>
</section>
</chapter>