diff --git a/CHANGES.txt b/CHANGES.txt index d7a9ecfdc64..7dceeecb127 100644 --- a/CHANGES.txt +++ b/CHANGES.txt @@ -135,6 +135,8 @@ Release 0.91.0 - Unreleased (Ted Yu via Stack) HBASE-3694 high multiput latency due to checking global mem store size in a synchronized function (Liyin Tang via Stack) + HBASE-3710 Book.xml - fill out descriptions of metrics + (Doug Meil via Stack) TASK HBASE-3559 Move report of split to master OFF the heartbeat channel diff --git a/src/docbkx/book.xml b/src/docbkx/book.xml index d8d85401466..0643176cda0 100644 --- a/src/docbkx/book.xml +++ b/src/docbkx/book.xml @@ -232,49 +232,55 @@ throws InterruptedException, IOException {
Region Server Metrics
<varname>hbase.regionserver.blockCacheCount</varname> - + Block cache item count in memory. This is the number of blocks of storefiles (HFiles) in the cache.
<varname>hbase.regionserver.blockCacheFree</varname> - + Block cache memory available (MB).
<varname>hbase.regionserver.blockCacheHitRatio</varname> - + Block cache hit ratio (0 to 100). TODO: describe the impact on the ratio of read requests that have cacheBlocks=false
<varname>hbase.regionserver.blockCacheSize</varname> - + Block cache size in memory (MB) +
+
<varname>hbase.regionserver.compactionQueueSize</varname> + Size of the compaction queue.
<varname>hbase.regionserver.fsReadLatency_avg_time</varname> - + Filesystem read latency (ms)
<varname>hbase.regionserver.fsReadLatency_num_ops</varname> - + TODO
<varname>hbase.regionserver.fsSyncLatency_avg_time</varname> - + Filesystem sync latency (ms)
<varname>hbase.regionserver.fsSyncLatency_num_ops</varname> - + TODO
<varname>hbase.regionserver.fsWriteLatency_avg_time</varname> - + Filesystem write latency (ms)
<varname>hbase.regionserver.fsWriteLatency_num_ops</varname> - + TODO
<varname>hbase.regionserver.memstoreSizeMB</varname> - + Sum of all the memstore sizes in this regionserver (MB)
<varname>hbase.regionserver.regions</varname> - + Number of regions served by the regionserver
<varname>hbase.regionserver.requests</varname> - + Total number of read and write requests. Requests correspond to regionserver RPC calls, thus a single Get will result in 1 request, but a Scan with caching set to 1000 will result in 1 request for each 'next' call (i.e., not each row). A bulk-load request will constitute 1 request per HFile.
<varname>hbase.regionserver.storeFileIndexSizeMB</varname> - + Sum of all the storefile index sizes in this regionserver (MB)
<varname>hbase.regionserver.stores</varname> - + Number of stores open on the regionserver. A store corresponds to a column family. For example, if a table (which contains the column family) has 3 regions on a regionserver, there will be 3 stores open for that column family. +
+
<varname>hbase.regionserver.storeFiles</varname> + Number of storefiles open on the regionserver. A store may have more than one storefile (HFile).
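The counting rules described for <varname>hbase.regionserver.requests</varname> and <varname>hbase.regionserver.stores</varname> can be sketched as simple arithmetic. The helper names below (scan_requests, open_stores) are hypothetical and not part of HBase. The scan estimate assumes each 'next' call returns up to "caching" rows and ignores any extra RPCs for opening or closing the scanner; the store estimate assumes every region hosted on the regionserver belongs to a table with the same number of column families.

```shell
#!/bin/sh
# Hypothetical helpers illustrating how the metrics above are counted.
# They are a sketch under the stated assumptions, not HBase code.

# scan_requests ROWS CACHING
# Estimated number of 'next' RPCs for a Scan: ceil(ROWS / CACHING).
scan_requests() {
  echo $(( ($1 + $2 - 1) / $2 ))
}

# open_stores REGIONS FAMILIES
# One store is open per column family per hosted region.
open_stores() {
  echo $(( $1 * $2 ))
}

# A Scan over 2500 rows with caching set to 1000.
scan_requests 2500 1000
# A table with 1 column family and 3 regions on this regionserver.
open_stores 3 1
```

Under these assumptions the scan adds far fewer requests than the number of rows it reads, which is why the requests metric tracks RPC calls rather than rows.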
@@ -1055,24 +1061,38 @@ throws InterruptedException, IOException {
Node Decommission You can have a node gradually shed its load and then shutdown using the - graceful_restart.sh script. Here is its usage: - $ ./bin/graceful_stop.sh -Usage: graceful_stop.sh [--config &conf-dir>] [--restart] [--reload] &hostname> - restart If we should restart after graceful stop - reload Move offloaded regions back on to the stopped server - debug Move offloaded regions back on to the stopped server - hostname Hostname of server we are to stop + graceful_stop.sh script. Here is its usage: + $ ./bin/graceful_stop.sh +Usage: graceful_stop.sh [--config &conf-dir>] [--restart] [--reload] [--thrift] [--rest] &hostname> + thrift If we should stop/start thrift before/after the hbase stop/start + rest If we should stop/start rest before/after the hbase stop/start + restart If we should restart after graceful stop + reload Move offloaded regions back on to the stopped server + debug Move offloaded regions back on to the stopped server + hostname Hostname of server we are to stop To decommission a loaded regionserver, run the following: - $ ./bin/graceful_stop.sh HOSTNAME + $ ./bin/graceful_stop.sh HOSTNAME where HOSTNAME is the host carrying the RegionServer - you would decommission. The script will move the regions off the + you would decommission. + On <varname>HOSTNAME</varname> + The HOSTNAME passed to graceful_stop.sh + must match the hostname that HBase is using to identify regionservers. + Check the list of regionservers in the master UI for how HBase is + referring to servers. It's usually the hostname but can also be the FQDN. + Whatever HBase is using, this is what you should pass to the + graceful_stop.sh decommission + script. If you pass IPs, the script is not yet smart enough to make + a hostname (or FQDN) of it and so it will fail when it checks if the server is + currently running; the graceful unloading of regions will not run. + + The graceful_stop.sh script will move the regions off the decommissioned regionserver one at a time to minimize region churn. 
It will verify the region deployed in the new location before it moves the next region and so on until the decommissioned server - is carrying zero regions. At this point, the graceful_stop - tells the RegionServer stop. The master will at this point notice the + is carrying zero regions. At this point, the graceful_stop.sh + tells the RegionServer to stop. The master will at this point notice the RegionServer gone but all regions will have already been redeployed and because the RegionServer went down cleanly, there will be no WAL logs to split.
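Because graceful_stop.sh does not resolve IPs to the name HBase uses for a regionserver, a small guard before calling it may help. The sketch below is an assumption, not part of the HBase scripts: is_ipv4 and check_decommission_target are hypothetical helpers, the check only recognizes dotted-quad IPv4 strings, and it cannot confirm that the name matches what the master UI shows. It prints the command instead of running it.

```shell
#!/bin/sh
# Hypothetical pre-flight check for a decommission target.
# Not part of HBase; graceful_stop.sh itself performs no such check.

# is_ipv4 NAME
# Succeeds if NAME looks like a dotted-quad IPv4 address.
is_ipv4() {
  printf '%s\n' "$1" | grep -Eq '^([0-9]{1,3}\.){3}[0-9]{1,3}$'
}

# check_decommission_target NAME
# Rejects IP-looking targets; otherwise prints the command to run.
check_decommission_target() {
  if is_ipv4 "$1"; then
    echo "refusing: '$1' looks like an IP; pass the name shown in the master UI" >&2
    return 1
  fi
  echo "./bin/graceful_stop.sh $1"
}

# rs1.example.com is a placeholder hostname.
check_decommission_target rs1.example.com
```

The name still has to be compared by hand against the regionserver list in the master UI, since only HBase knows whether it registered the server by short hostname or by FQDN.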