HBASE-17089 Add doc on experience running with hedged reads; ADDENDUM adding in Ashu Pachauri's experience

Michael Stack 2016-11-14 21:06:29 -08:00
parent c7bdb3017f
commit 86df89b016
1 changed file with 23 additions and 8 deletions


@@ -760,11 +760,30 @@ See the _Development Process_ section of the document link:https://issues.apache
Hedged reads are a feature of HDFS, introduced in Hadoop 2.4.0 with link:https://issues.apache.org/jira/browse/HDFS-5776[HDFS-5776].
Normally, a single thread is spawned for each read request.
However, if hedged reads are enabled, the client waits a configurable amount of time, and if the read has not returned by then, it spawns a second read request against a different block replica of the same data.
Whichever read returns first is used, and the other read request is discarded.
Hedged reads can help when a rare slow read is caused by a transient problem, such as a failing disk or a flaky network connection.
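
The wait-then-hedge mechanism described above can be sketched in plain Java with an `ExecutorCompletionService`. This is a simplified illustration, not the HDFS client's implementation. `HedgedReadSketch`, `replicaRead`, and `hedgedRead` are hypothetical names. Replica reads are simulated with `Thread.sleep`, and the latencies and the 50 ms threshold are made-up values.

```java
import java.util.concurrent.Callable;
import java.util.concurrent.CompletionService;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorCompletionService;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;

// Illustrative sketch of the hedged-read pattern; this is NOT the HDFS client's code.
class HedgedReadSketch {

    // Hypothetical stand-in for reading a block from one replica; latency is simulated.
    static Callable<String> replicaRead(String replica, long latencyMs) {
        return () -> {
            Thread.sleep(latencyMs);
            return replica;
        };
    }

    // Starts the primary read. If it has not completed within thresholdMs,
    // starts a second read against another replica, returns whichever result
    // arrives first, and cancels the other one.
    static String hedgedRead(ExecutorService pool, Callable<String> primary,
                             Callable<String> hedge, long thresholdMs)
            throws InterruptedException, ExecutionException {
        CompletionService<String> cs = new ExecutorCompletionService<>(pool);
        Future<String> first = cs.submit(primary);
        Future<String> done = cs.poll(thresholdMs, TimeUnit.MILLISECONDS);
        if (done != null) {
            return done.get(); // primary answered in time; no hedged read issued
        }
        Future<String> second = cs.submit(hedge);
        done = cs.take(); // blocks until either read completes
        (done == first ? second : first).cancel(true); // discard the slower read
        return done.get();
    }

    public static void main(String[] args) throws Exception {
        ExecutorService pool = Executors.newFixedThreadPool(2);
        try {
            // Primary finishes in 10 ms, under the 50 ms threshold, so replica-1 is used.
            System.out.println(hedgedRead(pool, replicaRead("replica-1", 10),
                    replicaRead("replica-2", 10), 50));
            // Primary stalls for 500 ms; the hedged read to replica-2 wins.
            System.out.println(hedgedRead(pool, replicaRead("replica-1", 500),
                    replicaRead("replica-2", 10), 50));
        } finally {
            pool.shutdownNow();
        }
    }
}
```

The second call exercises the hedged path. The primary read is still running when the threshold expires, so a second read is started and the slower one is cancelled. A real client would also have to handle a read that fails outright rather than stalling, which this sketch does not attempt.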

Hedged reads are "...very good at eliminating outlier datanodes, which
in turn makes them very good choice for latency sensitive setups.
But, if you are looking for maximizing throughput, hedged reads tend to
create load amplification as things get slower in general. In short,
the thing to watch out for is the non-graceful performance degradation
when you are running close a certain throughput threshold." (Quote from Ashu Pachauri in HBASE-17083).
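
The load-amplification point can be made concrete with a deliberately simplified, hypothetical model. In this model, every read whose primary latency exceeds the threshold triggers exactly one extra read. The latency values are invented for illustration, and `HedgeAmplification` and `amplification` are hypothetical names.

```java
// Simplified, hypothetical model: each read whose primary latency exceeds
// the hedging threshold triggers exactly one extra (hedged) read.
// Feedback effects (the extra reads slowing the cluster further) are ignored.
class HedgeAmplification {

    // Reads issued per application read: 1 + (fraction of reads that were hedged).
    static double amplification(long[] latenciesMs, long thresholdMs) {
        int hedged = 0;
        for (long latency : latenciesMs) {
            if (latency > thresholdMs) {
                hedged++;
            }
        }
        return (latenciesMs.length + hedged) / (double) latenciesMs.length;
    }

    public static void main(String[] args) {
        long thresholdMs = 10;
        // Healthy cluster: one outlier out of ten reads.
        long[] healthy = {2, 3, 3, 4, 4, 5, 5, 6, 7, 40};
        // Same reads when everything is three times slower, e.g. near saturation.
        long[] loaded = new long[healthy.length];
        for (int i = 0; i < healthy.length; i++) {
            loaded[i] = healthy[i] * 3;
        }
        System.out.println(amplification(healthy, thresholdMs)); // 1.1
        System.out.println(amplification(loaded, thresholdMs));  // 1.7
    }
}
```

With the same 10 ms threshold, a uniform three-fold slowdown raises the reads issued from 1.1 to 1.7 per application read. The extra load therefore arrives exactly when the cluster has the least headroom. The model ignores how those extra reads would themselves push latency up, so it likely understates the effect.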

Other concerns to keep in mind while running with hedged reads enabled include:

* They may lead to network congestion. See link:https://issues.apache.org/jira/browse/HBASE-17083[HBASE-17083].
* Make sure you set the thread pool large enough that blocking on the pool does not become a bottleneck. (Again, see link:https://issues.apache.org/jira/browse/HBASE-17083[HBASE-17083].)

(From Yu Li in HBASE-17083.)
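
The thread-pool concern can be illustrated with a standard `ThreadPoolExecutor` that has no queue and uses `CallerRunsPolicy`. With that setup, a hedged read that cannot get a pool thread runs on the reading thread itself and blocks it. We believe the HDFS client configures its hedged-read pool along these lines, but treat that as an assumption to check against your Hadoop version. `HedgePoolSaturation` and `callerRuns` are hypothetical names, and the 100 ms task time is arbitrary.

```java
import java.util.concurrent.SynchronousQueue;
import java.util.concurrent.ThreadPoolExecutor;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical model of a hedged-read pool: fixed maximum size, no queue, and
// rejected tasks run on the submitting (reading) thread.
class HedgePoolSaturation {

    // Submits `tasks` tasks of `taskMs` each and returns how many of them had
    // to run on the caller's own thread because the pool was full.
    static int callerRuns(int poolSize, int tasks, long taskMs) throws InterruptedException {
        Thread caller = Thread.currentThread();
        AtomicInteger ranOnCaller = new AtomicInteger();
        ThreadPoolExecutor pool = new ThreadPoolExecutor(
                poolSize, poolSize, 60, TimeUnit.SECONDS,
                new SynchronousQueue<>(),
                new ThreadPoolExecutor.CallerRunsPolicy());
        try {
            for (int i = 0; i < tasks; i++) {
                pool.execute(() -> {
                    if (Thread.currentThread() == caller) {
                        ranOnCaller.incrementAndGet(); // caller is blocked doing this work
                    }
                    try {
                        Thread.sleep(taskMs); // simulated slow read
                    } catch (InterruptedException e) {
                        Thread.currentThread().interrupt();
                    }
                });
            }
        } finally {
            pool.shutdown();
            pool.awaitTermination(10, TimeUnit.SECONDS);
        }
        return ranOnCaller.get();
    }

    public static void main(String[] args) throws InterruptedException {
        System.out.println("pool of 1: " + callerRuns(1, 4, 100) + " task(s) ran on the caller");
        System.out.println("pool of 4: " + callerRuns(4, 4, 100) + " task(s) ran on the caller");
    }
}
```

With a one-thread pool, at least one of the four simulated reads runs on the caller. With four threads, none do. The exact count for the small pool can vary with thread scheduling.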

Because an HBase RegionServer is an HDFS client, you can enable hedged reads in HBase by adding the following properties to the RegionServer's hbase-site.xml and tuning the values to suit your environment.

.Configuration for Hedged Reads
* `dfs.client.hedged.read.threadpool.size` - the number of threads dedicated to servicing hedged reads.
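
The following sketch shows what the properties look like in hbase-site.xml form and checks whether they would switch hedged reads on. It uses only the JDK's built-in XML parser. The fragment also sets the companion property `dfs.client.hedged.read.threshold.millis`, which controls how long the client waits before starting the second read. The values 20 and 10 are illustrative, not recommendations. `HedgedReadConfigCheck`, `parse`, and `hedgedReadsEnabled` are hypothetical names.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import java.util.HashMap;
import java.util.Map;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;

// Reads <property> name/value pairs from an hbase-site.xml-style document
// and reports whether hedged reads would be switched on.
class HedgedReadConfigCheck {

    // Example fragment; 20 threads and 10 ms are illustrative values only.
    static final String SITE_XML =
            "<configuration>"
          + "<property><name>dfs.client.hedged.read.threadpool.size</name><value>20</value></property>"
          + "<property><name>dfs.client.hedged.read.threshold.millis</name><value>10</value></property>"
          + "</configuration>";

    // Returns name -> value for every <property> element in the document.
    static Map<String, String> parse(String xml) throws Exception {
        Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
        Map<String, String> props = new HashMap<>();
        NodeList nodes = doc.getElementsByTagName("property");
        for (int i = 0; i < nodes.getLength(); i++) {
            Element p = (Element) nodes.item(i);
            String name = p.getElementsByTagName("name").item(0).getTextContent().trim();
            String value = p.getElementsByTagName("value").item(0).getTextContent().trim();
            props.put(name, value);
        }
        return props;
    }

    // A pool size of 0 (the HDFS default) leaves hedged reads disabled.
    static boolean hedgedReadsEnabled(Map<String, String> props) {
        return Integer.parseInt(
                props.getOrDefault("dfs.client.hedged.read.threadpool.size", "0")) > 0;
    }

    public static void main(String[] args) throws Exception {
        Map<String, String> props = parse(SITE_XML);
        System.out.println("hedged reads enabled: " + hedgedReadsEnabled(props));
        System.out.println("threshold (ms): " + props.get("dfs.client.hedged.read.threshold.millis"));
    }
}
```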
@@ -795,10 +814,6 @@ See <<hbase_metrics>> for more information.
* hedgeReadOpsWin - the number of times the hedged read thread was faster than the original thread.
  This could indicate that a given RegionServer is having trouble servicing requests.

[[perf.deleting]]
== Deleting from HBase