Client-side ZooKeeper metrics cause issues when launching MapReduce
jobs via 'yarn jar' on the command line. This stems from the ClassLoader
separation that YARN implements. The easiest solution was to remove
these ZooKeeper metrics entirely.
Revert "HBASE-17448 Export metrics from RecoverableZooKeeper"
This reverts commit defc25c6d1.
For a regionserver's view of a table (the regions
that belong to a table hosted on a regionserver),
this change tracks the latencies of operations that
affect the regions for this table.
Tracking at the per-table level avoids the memory bloat
and performance impact that accompanied the previous
per-region latency metrics while still providing important
details for operators to consume.
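As a rough illustration of why per-table tracking is cheaper than per-region
tracking, the sketch below keeps one running count and total per table and
operation, no matter how many regions of that table the regionserver hosts.
The class and method names (TableLatencySketch, update, meanNanos) are
hypothetical and are not the classes touched by this change; a plain running
mean also stands in for whatever histogram the real metrics use.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.LongAdder;

/** Hypothetical illustration only; not the HBase metrics classes. */
class TableLatencySketch {
  /** Running count and total latency for one (table, operation) pair. */
  static final class Stat {
    final LongAdder count = new LongAdder();
    final LongAdder totalNanos = new LongAdder();
  }

  // One entry per table and operation, independent of the number of regions.
  private final Map<String, Stat> stats = new ConcurrentHashMap<>();

  /** Records a latency observed on any region of the given table. */
  void update(String table, String op, long nanos) {
    Stat s = stats.computeIfAbsent(table + "." + op, k -> new Stat());
    s.count.increment();
    s.totalNanos.add(nanos);
  }

  /** Number of samples recorded for the table and operation. */
  long count(String table, String op) {
    Stat s = stats.get(table + "." + op);
    return s == null ? 0L : s.count.sum();
  }

  /** Mean latency in nanoseconds, or 0.0 when nothing was recorded. */
  double meanNanos(String table, String op) {
    Stat s = stats.get(table + "." + op);
    long n = s == null ? 0L : s.count.sum();
    return n == 0 ? 0.0 : (double) s.totalNanos.sum() / n;
  }

  /** Number of distinct (table, operation) series held in memory. */
  int trackedSeries() {
    return stats.size();
  }
}
```

With this shape, memory grows with the number of tables and operation types
rather than with the number of regions.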
Signed-Off-By: Andrew Purtell <apurtell@apache.org>
Added metrics for RecoverableZooKeeper related to specific exceptions,
total failed ZooKeeper API calls and latency histograms for read,
write and sync operations. Also added unit tests for the same. Added
service provider for the ZooKeeper metrics implementation inside the
hadoop compatibility module.
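A minimal sketch of the kind of bookkeeping described above, assuming a
wrapper that times each call and counts failures by exception type. All
names here (ZkCallMetricsSketch, timedRead) are hypothetical and do not come
from the patch, and simple sums stand in for the latency histograms.

```java
import java.util.Map;
import java.util.concurrent.Callable;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.LongAdder;

/** Hypothetical illustration only; names do not come from the HBase code base. */
class ZkCallMetricsSketch {
  final LongAdder totalFailedCalls = new LongAdder();
  final Map<String, LongAdder> failuresByException = new ConcurrentHashMap<>();
  final LongAdder readCalls = new LongAdder();
  final LongAdder readNanos = new LongAdder();

  /** Runs a read-style call, recording its latency and any exception it throws. */
  <T> T timedRead(Callable<T> call) throws Exception {
    long start = System.nanoTime();
    try {
      return call.call();
    } catch (Exception e) {
      // Count the failure both in total and under its exception type.
      totalFailedCalls.increment();
      failuresByException
          .computeIfAbsent(e.getClass().getSimpleName(), k -> new LongAdder())
          .increment();
      throw e;
    } finally {
      // Latency is recorded for successful and failed calls alike.
      readCalls.increment();
      readNanos.add(System.nanoTime() - start);
    }
  }
}
```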
Signed-off-by: Andrew Purtell <apurtell@apache.org>
Conflicts:
hbase-client/src/main/java/org/apache/hadoop/hbase/zookeeper/RecoverableZooKeeper.java
hbase-metrics-api/src/main/java/org/apache/hadoop/hbase/metrics/PackageMarker.java
In some particular deployments, the Replication code believes it has
reached EOF for a WAL prior to successfully parsing all bytes known to
exist in a cleanly closed file.
This failure consistently happens due to an InvalidProtocolBufferException
after some number of seeks during our attempts to tail the in-progress
RegionServer WAL. As a work-around, this patch treats cleanly closed
files differently than other execution paths. If an EOF is detected due
to parsing or other errors while there are still unparsed bytes before
the end-of-file trailer, we now reset the WAL to the very beginning and
attempt a clean read-through.
In current testing, a single such reset is sufficient to work around
observed data loss. However, this change will retry a given WAL file
indefinitely. On each such attempt, a log message like the following is
emitted at the WARN level:
Processing end of WAL file '{}'. At position {}, which is too far away
from reported file length {}. Restarting WAL reading (see HBASE-15983
for details).
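The end-of-file decision described above can be sketched roughly as follows.
This is an illustration under stated assumptions, not the code in this
patch: WalEofSketch, Action, onEndOfFile, and the trailerStart parameter are
hypothetical names, and the sketch assumes the reader knows the offset at
which the end-of-file trailer begins.

```java
/** Hypothetical illustration only; not the replication code in this patch. */
class WalEofSketch {
  enum Action {
    /** Seek back to the start of the WAL and attempt a clean read-through. */
    RESET_AND_REREAD,
    /** Fall through to whatever end-of-file handling already existed. */
    EXISTING_HANDLING
  }

  /**
   * Decides what to do when the reader reports EOF.
   *
   * @param cleanlyClosed whether the WAL file was closed cleanly
   * @param position      offset the reader had reached when it reported EOF
   * @param trailerStart  offset at which the end-of-file trailer begins
   */
  static Action onEndOfFile(boolean cleanlyClosed, long position, long trailerStart) {
    // Unparsed bytes remain before the trailer of a cleanly closed file,
    // so the reported EOF is not trusted and the file is read again.
    if (cleanlyClosed && position < trailerStart) {
      return Action.RESET_AND_REREAD;
    }
    return Action.EXISTING_HANDLING;
  }
}
```

Because the sketch keeps no attempt counter, a caller that loops on
RESET_AND_REREAD retries the same file indefinitely, which matches the
behavior noted above.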
In addition, this patch adds more log detail at the TRACE level about
file offsets seen while handling recoverable errors. It also adds
metrics that measure the use of this recovery mechanism.