HBASE-21790 - Detail docs on ref guide for CompactionTool

Change-Id: I5d60d177d562d94296b278297dcbf2f5a9eba0ae
2019-01-26 10:48:27 -06:00 · 2019-01-26 10:48:27 -06:00 · 8018a7a46b
parent 274e4ccea8
commit 8018a7a46b
1 changed files with 74 additions and 4 deletions
--- a/src/main/asciidoc/_chapters/ops_mgt.adoc
+++ b/src/main/asciidoc/_chapters/ops_mgt.adoc
@ -959,15 +959,85 @@ See link:https://issues.apache.org/jira/browse/HBASE-4391[HBASE-4391 Add ability
 [[compaction.tool]]
 === Offline Compaction Tool
-See the usage for the
+*CompactionTool* provides a way of running compactions (either minor or major) as an independent
-link:https://hbase.apache.org/devapidocs/org/apache/hadoop/hbase/regionserver/CompactionTool.html[CompactionTool].
+process from the RegionServer. It reuses the same internal implementation classes executed by the RegionServer
-Run it like:
+compaction feature. However, since it runs as a completely separate, independent Java process, it
 releases RegionServers from the overhead involved in rewriting a set of hfiles, which can be critical
 for latency-sensitive use cases.
-[source, bash]
+Usage:
 ----
 $ ./bin/hbase org.apache.hadoop.hbase.regionserver.CompactionTool
 Usage: java org.apache.hadoop.hbase.regionserver.CompactionTool \
  [-compactOnce] [-major] [-mapred] [-D<property=value>]* files...
 Options:
 mapred         Use MapReduce to run compaction.
 compactOnce    Execute just one compaction step. (default: while needed)
 major          Trigger major compaction.
 Note: -D properties will be applied to the conf used.
 For example:
 To stop delete of compacted file, pass -Dhbase.compactiontool.delete=false
 To set tmp dir, pass -Dhbase.tmp.dir=ALTERNATE_DIR
 Examples:
 To compact the full 'TestTable' using MapReduce:
 $ hbase org.apache.hadoop.hbase.regionserver.CompactionTool -mapred hdfs://hbase/data/default/TestTable
 To compact column family 'x' of the table 'TestTable' region 'abc':
 $ hbase org.apache.hadoop.hbase.regionserver.CompactionTool hdfs://hbase/data/default/TestTable/abc/x
 ----
 As shown by the usage options above, *CompactionTool* can run as a standalone client or as a
 MapReduce job. When running as a MapReduce job, each family dir is handled as an input split and is
 processed by a separate map task.
 The *compactOnce* parameter controls how many compaction cycles are performed before
 *CompactionTool* finishes its work. If specified, only one compaction step is executed. If omitted,
 the tool keeps running compactions on each specified family for as long as the configured
 compaction policy requires. For more info on compaction policy, see <<compaction,compaction>>.
 If a major compaction is desired, the *major* flag can be specified. If omitted, *CompactionTool*
 runs a minor compaction by default.
 It also allows for configuration overrides with the `-D` flag. In the usage section above, for
 example, the `-Dhbase.compactiontool.delete=false` option instructs the compaction engine not to
 delete the original files from the temp folder.
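 As an illustration of how the flags described above combine, the following sketch assembles a
 *CompactionTool* command line and prints it instead of running it. The helper name
 `build_compaction_cmd` and its keyword arguments are hypothetical; only the flags and the
 `hbase.compactiontool.delete` property come from the usage output above. Placing the `-D` override
 before the tool-specific flags follows the general Hadoop convention that generic options come
 first, and is an assumption of this sketch.

 ```shell
 # Hypothetical helper: prints a CompactionTool command line (dry run, nothing is executed).
 # Usage: build_compaction_cmd <minor|major> <once|full> <keep|delete> <dir>...
 build_compaction_cmd() {
   kind=$1
   cycles=$2
   originals=$3
   shift 3
   cmd="hbase org.apache.hadoop.hbase.regionserver.CompactionTool"
   # -D overrides a configuration property for this run only.
   if [ "$originals" = "keep" ]; then
     cmd="$cmd -Dhbase.compactiontool.delete=false"
   fi
   # -compactOnce stops after a single compaction step.
   if [ "$cycles" = "once" ]; then
     cmd="$cmd -compactOnce"
   fi
   # -major requests a major compaction; minor is the default.
   if [ "$kind" = "major" ]; then
     cmd="$cmd -major"
   fi
   # Remaining arguments are the target dirs, passed through unchanged.
   echo "$cmd $*"
 }
 ```

 Calling `build_compaction_cmd major once keep hdfs://hbase/data/default/TestTable/abc/x` prints a
 single command that can be reviewed before it is run by hand.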
 Files targeted for compaction must be specified as parent hdfs dirs. Multiple dirs can be given,
 as long as each of them is either a *family*, a *region*, or a *table* dir. If a table or region
 dir is passed, the program recursively iterates through the related sub-folders, effectively
 running compaction for each family found below the table/region level.
 Since these dirs are nested under the *hbase* hdfs directory tree, *CompactionTool* requires hbase
 super user permissions in order to access the required hfiles.
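 To make the three accepted directory levels concrete, the sketch below classifies a path by
 counting the components that follow `/data/`. It assumes the layout seen in the usage examples
 (`.../data/<namespace>/<table>/<region>/<family>`). The function name `dir_level` is hypothetical,
 and the check is a heuristic on the path string only; it does not contact HDFS.

 ```shell
 # Hypothetical helper: guess whether a path is a table, region, or family dir.
 # Assumes the layout .../data/<namespace>/<table>/<region>/<family>.
 dir_level() {
   case $1 in
     */data/*) ;;
     *) echo unknown; return 0 ;;
   esac
   # Keep only the part after "/data/" and drop one trailing slash, if any.
   rel=${1##*/data/}
   rel=${rel%/}
   # Split the remainder on "/" and count the components.
   old_ifs=$IFS
   IFS=/
   set -- $rel
   IFS=$old_ifs
   case $# in
     2) echo table ;;
     3) echo region ;;
     4) echo family ;;
     *) echo unknown ;;
   esac
 }
 ```

 A wrapper script could use such a check to reject arguments that are not at one of the three
 accepted levels before launching the tool.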
 .Running in MapReduce mode
 [NOTE]
 ====
 MapReduce mode offers the ability to process each family dir in parallel, as a separate map task.
 Generally, it would make sense to run in this mode when specifying one or more table dirs as targets
 for compactions. The caveat, though, is that if the number of families to be compacted becomes too
 large, the related mapreduce job may have indirect impacts on *RegionServers* performance.
 Since *NodeManagers* are normally co-located with RegionServers, such large jobs could
 compete for IO/Bandwidth resources with the *RegionServers*.
 ====
 .MajorCompaction completely disabled on RegionServers due to performance impacts
 [NOTE]
 ====
 *Major compactions* can be a costly operation (see <<compaction,compaction>>), and can indeed
 impact performance on RegionServers, leading operators to completely disable them for critical
 low-latency applications. *CompactionTool* could be used as an alternative in such scenarios,
 although additional custom application logic would need to be implemented, such as deciding the
 scheduling and selecting the tables/regions/families targeted by a given compaction run.
 ====
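 The custom scheduling logic mentioned in the note above could start out as something like the
 following dry-run sketch. The function name `plan_major_compactions`, the off-peak window, and the
 idea of passing the current hour as an argument are all assumptions made for illustration. The
 sketch only prints the commands it would run.

 ```shell
 # Hypothetical dry-run scheduler: prints one major-compaction command per target dir,
 # but only when the given hour (0-23) falls inside an assumed off-peak window.
 # Usage: plan_major_compactions <hour> <dir>...
 plan_major_compactions() {
   hour=$1
   shift
   # Assumed off-peak window: 01:00 to 04:59. Adjust to the actual workload.
   if [ "$hour" -lt 1 ] || [ "$hour" -gt 4 ]; then
     return 0
   fi
   for dir in "$@"; do
     echo "hbase org.apache.hadoop.hbase.regionserver.CompactionTool -major $dir"
   done
 }
 ```

 Emitting one command per dir keeps each run small, so an operator can decide how many of them to
 execute in a given window.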
 For additional details about CompactionTool, see also
 link:https://hbase.apache.org/devapidocs/org/apache/hadoop/hbase/regionserver/CompactionTool.html[CompactionTool].
 === `hbase clean`
 The `hbase clean` command cleans HBase data from ZooKeeper, HDFS, or both.