`CallDroppedException` can be thrown with `CallRunner.drop()` by queue implementations that decide to drop calls to groom the RPC call backlog. The LifoCoDel queue does this I believe and with Pluggable queue it's possible for 3rd party queue implementations to be using `drop()` for similar reasons. It would be nice for the server to be tracking these exceptions in metrics since otherwise you might have to do some extra lifting on the client side.
Signed-off-by: Duo Zhang <zhangduo@apache.org>
Reviewed-by: Bryan Beaudreault <bbeaudreault@hubspot.com>
Introduces a new metric that tracks number of replication sources that are stuck in initialization.
Signed-off-by: Xu Cang <xucang@apache.org>
Signed-off-by: Bharath Vissapragada <bharathv@apache.org>
(cherry picked from commit ff3821814a)
Add new metric rpcFullScanRequestCount to track number of requests that are full region scans. Can be used to notify user to check if this is truly intended.
Signed-off-by Anoop Sam John <anoopsamjohn@apache.org>
Signed-off-by Ramkrishna S Vasudevan <ramkrishna@apache.org>
* Admin API getLogEntries() for ring buffer use-cases: so far, provides balancerDecision and slowLogResponse
* Refactor RPC call for similar use-cases
* Single RPC API getLogEntries() for both Master.proto and Admin.proto
Closes#2261
Signed-off-by: Andrew Purtell <apurtell@apache.org>
* HBASE-24205 - Create metric to know the number of reads that happens
from memstore (branch-2)
* Add the optimization as in master and fix whitestyle and checkstyle
* Fix compilation error that accidently crept in
Authored-by: Ramkrishna <ramkrishna@apache.org>
Signed-off by:Anoop Sam John<anoopsamjohn@gmail.com>
Signed-off by:Viraj Jasani<virajjasani@apache.org>
JMXCacheBuster resets the metrics state at various points in time. These
events can potentially race with a master shutdown. When the master is
tearing down, metrics initialization can touch a lot of unsafe state,
for example invalidated FS objects. To avoid this, this patch makes
the getMetrics() a no-op when the master is either stopped or in the
process of shutting down. Additionally, getClusterId() when the server
is shutting down is made a no-op.
Simulating a test for this is a bit tricky but with the patch I don't
locally see the long stacktraces from the jira.
Signed-off-by: Michael Stack <stack@apache.org>
(cherry picked from commit 6f213e9d5a)
Includes the following, incorporating HBASE-20439 and HBASE-20440, too.
1)
HBASE-18133 Decrease quota reaction latency by HBase
Certain operations in HBase are known to directly affect
the utilization of tables on HDFS. When these actions
occur, we can circumvent the normal path and notify the
Master directly. This results in a much faster response to
changes in HDFS usage.
This requires FS scanning by the RS to be decoupled from
the reporting of sizes to the Master. An API inside each
RS is made so that any operation can hook into this call
in the face of other operations (e.g. compaction, flush,
bulk load).
2)
HBASE-18135 Implement mechanism for RegionServers to report file archival for space quotas
This de-couples the snapshot size calculation from the
SpaceQuotaObserverChore into another API which both the periodically
invoked Master chore and the Master service endpoint can invoke. This
allows for multiple sources of snapshot size to reported (from the
multiple sources we have in HBase).
When a file is archived, snapshot sizes can be more quickly realized and
the Master can still perform periodical computations of the total
snapshot size to account for any delayed/missing/lost file archival RPCs.
3)
HBASE-20531 RS may throw NPE when close meta regions in shutdown procedure.