Apply feedback to MOB docs

Misty Stanley-Jones 2015-06-22 14:14:29 +10:00
parent b889339006
commit c4437e2516
1 changed files with 43 additions and 15 deletions


@ -29,15 +29,30 @@
:toc: left
:source-language: java
Data comes in many sizes, and saving all of your data in HBase, including binary data such as images and documents, is ideal. HBase can technically handle binary objects with cells that are up to 10MB in size. However, HBase's normal read and write paths are optimized for values smaller than 100KB in size. When HBase deals with large numbers of values up to 10MB, referred to here as medium objects, or MOBs, performance is degraded due to write amplification caused by splits and compactions. HBase ***FIX_VERSION_NUMBER*** adds support for better managing large numbers of MOBs while maintaining performance, consistency, and low operational overhead. MOB support is provided by the work done in link:https://issues.apache.org/jira/browse/HBASE-11339[HBASE-11339].
To take advantage of MOB, you need to use <<hfilev3,HFile version 3>>. Optionally, configure the MOB file reader's cache settings for each RegionServer (see <<mob.cache.configure>>), then configure specific columns to hold MOB data.
Client code does not need to change to take advantage of HBase MOB support. The feature is transparent to the client.
Data comes in many sizes, and saving all of your data in HBase, including binary
data such as images and documents, is ideal. While HBase can technically handle
binary objects with cells that are larger than 100 KB in size, HBase's normal
read and write paths are optimized for values smaller than 100 KB in size. When
HBase deals with large numbers of objects over this threshold, referred to here
as medium objects, or MOBs, performance is degraded due to write amplification
caused by splits and compactions. When using MOBs, ideally your objects will be
between 100 KB and 10 MB. HBase ***FIX_VERSION_NUMBER*** adds support
for better managing large numbers of MOBs while maintaining performance,
consistency, and low operational overhead. MOB support is provided by the work
done in link:https://issues.apache.org/jira/browse/HBASE-11339[HBASE-11339]. To
take advantage of MOB, you need to use <<hfilev3,HFile version 3>>. Optionally,
configure the MOB file reader's cache settings for each RegionServer (see
<<mob.cache.configure>>), then configure specific columns to hold MOB data.
Client code does not need to change to take advantage of HBase MOB support. The
feature is transparent to the client.
=== Configuring Columns for MOB
You can configure columns to support MOB during table creation or alteration, either in HBase Shell or via the Java API. The two relevant properties are the boolean `IS_MOB` and the `MOB_THRESHOLD`, which is the number of bytes at which an object is considered to be a MOB. Only `IS_MOB` is required. If you do not specify the `MOB_THRESHOLD`, the default threshold value of 100 kb is used.
You can configure columns to support MOB during table creation or alteration,
either in HBase Shell or via the Java API. The two relevant properties are the
boolean `IS_MOB` and the `MOB_THRESHOLD`, which is the number of bytes at which
an object is considered to be a MOB. Only `IS_MOB` is required. If you do not
specify the `MOB_THRESHOLD`, the default threshold value of 100 KB is used.
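Because `MOB_THRESHOLD` is expressed in bytes, it can help to see how the 100 KB default maps to a byte count. The following is only a minimal sketch: the class `MobThresholdSketch` and its methods are hypothetical helpers, not part of the HBase API, and the comparison assumes that a value counts as a MOB when it is strictly larger than the threshold.

```java
// Illustrative sketch only. MobThresholdSketch, kbToBytes and exceedsThreshold
// are hypothetical names and are not part of the HBase API.
final class MobThresholdSketch {

    /** The default threshold named in the text (100 KB), expressed in bytes. */
    static final long DEFAULT_MOB_THRESHOLD_BYTES = 100L * 1024L;

    /** Converts a size in KB (1 KB = 1024 bytes) to the byte count MOB_THRESHOLD expects. */
    static long kbToBytes(long kb) {
        return kb * 1024L;
    }

    /**
     * Assumption: a value is treated as a MOB when its length is strictly
     * greater than the threshold. The exact boundary behavior is not stated
     * in the text above.
     */
    static boolean exceedsThreshold(long valueLengthBytes, long thresholdBytes) {
        return valueLengthBytes > thresholdBytes;
    }

    public static void main(String[] args) {
        long threshold = kbToBytes(100); // 102400 bytes
        long smallValue = kbToBytes(50); // a 50 KB cell value
        long largeValue = kbToBytes(5 * 1024); // a 5 MB cell value

        System.out.println("threshold bytes: " + threshold);
        System.out.println("50 KB over threshold: " + exceedsThreshold(smallValue, threshold));
        System.out.println("5 MB over threshold: " + exceedsThreshold(largeValue, threshold));
    }
}
```

The value `102400` produced by `kbToBytes(100)` is the same number passed to `setMobThreshold` in the Java API example in this section.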
.Configure a Column for MOB Using HBase Shell
====
@ -56,7 +71,7 @@ HColumnDescriptor hcd = new HColumnDescriptor("f");
hcd.setMobEnabled(true);
...
hcd.setMobThreshold(102400L);
...
...
----
====
@ -102,17 +117,19 @@ Because there can be a large number of MOB files at any time, as compared to the
<name>hbase.mob.cache.evict.period</name>
<value>3600</value>
<description>
The amount of time in seconds before the mob cache evicts cached mob files.
The default value is 3600 seconds.
The amount of time in seconds after which an unused file is evicted from the
MOB cache. The default value is 3600 seconds.
</description>
</property>
<property>
<name>hbase.mob.cache.evict.remain.ratio</name>
<value>0.5f</value>
<description>
The ratio (between 0.0 and 1.0) of files that remains cached after an eviction
is triggered when the number of cached mob files exceeds the hbase.mob.file.cache.size.
The default value is 0.5f.
      A multiplier (between 0.0 and 1.0) that determines how many files remain cached
      after an eviction, which is triggered when the number of cached MOB files exceeds
      the `hbase.mob.file.cache.size` threshold.
      The default value is 0.5f, which means that half the files (the least-recently-used
      ones) are evicted.
</description>
</property>
----
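The arithmetic behind `hbase.mob.cache.evict.remain.ratio` can be sketched as follows. This is not the HBase implementation: the class `MobCacheEvictionSketch`, its method names, and the sample cache size are assumptions chosen for illustration, and the list is assumed to be ordered from most recently used to least recently used.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Illustrative sketch only. MobCacheEvictionSketch and its methods are
// hypothetical and do not mirror the actual HBase MOB file cache classes.
final class MobCacheEvictionSketch {

    /** Number of files that stay cached after an eviction: cacheSize * remainRatio, truncated. */
    static int filesToRetain(int cacheSize, float remainRatio) {
        if (remainRatio < 0.0f || remainRatio > 1.0f) {
            throw new IllegalArgumentException("remainRatio must be between 0.0 and 1.0");
        }
        return (int) (cacheSize * remainRatio);
    }

    /**
     * Keeps the most recently used files and drops the rest.
     * Assumption: the input list is ordered most-recently-used first.
     */
    static List<String> retainMostRecentlyUsed(List<String> filesMostRecentFirst,
                                               int cacheSize, float remainRatio) {
        int keep = Math.min(filesToRetain(cacheSize, remainRatio), filesMostRecentFirst.size());
        return new ArrayList<>(filesMostRecentFirst.subList(0, keep));
    }

    public static void main(String[] args) {
        // Sample values for illustration: a cache limit of 4 files and the 0.5f ratio.
        List<String> cached = Arrays.asList("file-a", "file-b", "file-c", "file-d", "file-e");
        List<String> kept = retainMostRecentlyUsed(cached, 4, 0.5f);
        System.out.println("retained after eviction: " + kept);
    }
}
```

With a cache limit of 4 and a ratio of 0.5f, two files remain cached and the least recently used ones are dropped, which matches the "half the files" description in the property text.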
@ -122,7 +139,12 @@ Because there can be a large number of MOB files at any time, as compared to the
==== Manually Compacting MOB Files
To manually compact MOB files, rather than waiting for the <<mob.cache.configure,configuration>> to trigger compaction, use the `compact_mob` or `major_compact_mob` HBase shell commands. These commands require the first argument to be the table name, and take an optional column family as the second argument. If the column family is omitted, all MOB-enabled column families are compacted.
To manually compact MOB files, rather than waiting for the
<<mob.cache.configure,configuration>> to trigger compaction, use the
`compact_mob` or `major_compact_mob` HBase shell commands. These commands
require the first argument to be the table name, and take an optional column
family as the second argument. If the column family is omitted, all MOB-enabled
column families are compacted.
----
hbase> compact_mob 't1', 'c1'
@ -131,11 +153,17 @@ hbase> major_compact_mob 't1', 'c1'
hbase> major_compact_mob 't1'
----
These commands are also available via `Admin.compactMob` and `Admin.majorCompactMob` methods.
These commands are also available via `Admin.compactMob` and
`Admin.majorCompactMob` methods.
==== MOB Sweeper
HBase MOB currently relies on a MapReduce job called the Sweeper tool for optimization. The Sweeper tool coalesces small MOB files or MOB files with many deletions or updates. A native MOB compaction tool is still in testing. To configure the Sweeper tool, set the following options:
HBase MOB includes a MapReduce job called the Sweeper tool for
optimization. The Sweeper tool coalesces small MOB files or MOB files with many
deletions or updates. The Sweeper tool is not required if you use native MOB
compaction, which does not rely on MapReduce.
To configure the Sweeper tool, set the following options:
[source,xml]
----