Commit Graph

153 Commits

Author SHA1 Message Date
Josh Elser d671a1dbc6 HBASE-17955 Various reviewboard improvements to space quota work
Most notable change is to cache SpaceViolationPolicyEnforcement objects
in the write path. When a table has no quota or there is no SpaceQuotaSnapshot
for that table (yet), we want to avoid creating lots of
SpaceViolationPolicyEnforcement instances, caching one instance
instead. This will help reduce GC pressure.
2017-05-22 13:41:36 -04:00
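The caching idea described in HBASE-17955 above can be illustrated with a minimal sketch. This is not the HBase implementation: `PolicyEnforcement` and `EnforcementCache` are hypothetical stand-ins for `SpaceViolationPolicyEnforcement` and whatever holds it in the write path. The sketch only shows the general technique of handing out one shared, stateless instance for every table without a quota or snapshot, instead of allocating a new object per request.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical stand-in for an enforcement object; not the HBase interface.
interface PolicyEnforcement {
    boolean allowsWrite();
}

// Hypothetical cache: tables with an active policy get their own entry,
// every other table shares a single permissive instance.
final class EnforcementCache {
    // Shared, stateless instance reused for all tables without a quota
    // or without a snapshot yet, so lookups allocate nothing.
    private static final PolicyEnforcement NO_QUOTA = () -> true;

    private final Map<String, PolicyEnforcement> active = new ConcurrentHashMap<>();

    // Registers an enforcement for a table that is subject to a policy.
    void enforce(String table, PolicyEnforcement enforcement) {
        active.put(table, enforcement);
    }

    // Returns the table's enforcement, or the shared instance if none is set.
    PolicyEnforcement get(String table) {
        return active.getOrDefault(table, NO_QUOTA);
    }
}
```

Because the shared instance holds no per-table state, returning the same object for many tables is safe, and the number of short-lived allocations on the write path drops accordingly.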
Josh Elser 13af7f8ac6 HBASE-17002 JMX metrics and some UI additions for space quotas 2017-05-22 13:41:35 -04:00
ckulkarni eb6ded4849 HBASE-17448 Export metrics from RecoverableZooKeeper
Added metrics for RecoverableZooKeeper related to specific exceptions,
total failed ZooKeeper API calls and latency histograms for read,
write and sync operations. Also added unit tests for the same. Added
service provider for the ZooKeeper metrics implementation inside the
hadoop compatibility module.

Signed-off-by: Andrew Purtell <apurtell@apache.org>
2017-04-26 18:30:13 -07:00
Umesh Agashe c8461456d0 HBASE-17888: Added generic methods for updating metrics on submit and finish of a procedure execution
Signed-off-by: Michael Stack <stack@apache.org>
2017-04-14 11:51:08 -07:00
Jan Hentschel b53f354763 HBASE-17532 Replaced explicit type with diamond operator
Signed-off-by: Michael Stack <stack@apache.org>
2017-03-07 11:22:51 -08:00
Ashu Pachauri d2c083d21c HBASE-17627 Active workers metric for thrift (Ashu Pachauri)
Signed-off-by: Gary Helmling <garyh@apache.org>
2017-02-15 14:50:19 -08:00
Gary Helmling 7a6518f05d HBASE-17578 Thrift metrics should handle exceptions 2017-02-06 11:39:49 -08:00
Jerry He bc168b419d HBASE-17581 mvn clean test -PskipXXXTests does not work properly for some modules (Yi Liang) 2017-02-02 11:05:17 -08:00
Michael Stack fb2c89b1b3 HBASE-16785 We are not running all tests
M TestStressWALProcedureStore.java
Disable a test that now runs but fails because of a difference in pb3.1.0.

Signed-off-by: Michael Stack <stack@apache.org>
2017-01-26 21:49:18 -08:00
Michael Stack 1ee8776273 HBASE-16785 We are not running all tests Do nothing patch just to get baseline of how many tests we run 2017-01-26 12:21:00 -08:00
Enis Soztutar c64a1d1994 HBASE-9774 HBase native metrics and metric collection for coprocessors 2017-01-25 11:47:35 -08:00
Guanghao Zhang b3d8d06703 HBASE-17205 Add a metric for the duration of region in transition 2016-12-01 10:13:37 -08:00
Guanghao Zhang cc03f7ad53 HBASE-16561 Add metrics about read/write/scan queue length and active read/write/scan handler count
Signed-off-by: zhangduo <zhangduo@apache.org>
2016-11-29 09:43:30 +08:00
Enis Soztutar 03bc884ea0 HBASE-17017 Remove the current per-region latency histogram metrics 2016-11-08 18:31:53 -08:00
Dustin Pho fcef2c02c9 HBASE-16661 Add last major compaction age to per-region metrics
Signed-off-by: Gary Helmling <garyh@apache.org>
2016-10-10 15:16:12 -07:00
Sean Busbey 76396714e1 HBASE-15984 Handle premature EOF treatment of WALs in replication.
In some particular deployments, the Replication code believes it has
reached EOF for a WAL prior to successfully parsing all bytes known to
exist in a cleanly closed file.

Consistently this failure happens due to an InvalidProtobufException
after some number of seeks during our attempts to tail the in-progress
RegionServer WAL. As a work-around, this patch treats cleanly closed
files differently than other execution paths. If an EOF is detected due
to parsing or other errors while there are still unparsed bytes before
the end-of-file trailer, we now reset the WAL to the very beginning and
attempt a clean read-through.

In current testing, a single such reset is sufficient to work around
observed dataloss. However, the above change will retry a given WAL file
indefinitely. On each such attempt, a log message like the below will
be emitted at the WARN level:

  Processing end of WAL file '{}'. At position {}, which is too far away
  from reported file length {}. Restarting WAL reading (see HBASE-15983
  for details).

Additionally, this patch adds log detail at the TRACE level about file
offsets seen while handling recoverable errors. It also adds metrics
that measure the use of this recovery mechanism.
2016-09-29 10:07:14 -05:00
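The reset-and-reread work-around described in HBASE-15984 above can be sketched as follows. This is an illustration under stated assumptions, not the replication code: `WalRereadSketch`, `readAll`, and the `IntFunction` reader are hypothetical, an apparent EOF is modeled as a `null` entry, and the known entry count stands in for the file length. Unlike the patch, which retries indefinitely, this sketch caps the number of resets with an assumed `maxResets` parameter.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.IntFunction;

// Hypothetical sketch of "reset to the beginning and attempt a clean read-through".
final class WalRereadSketch {

    // reader.apply(i) returns entry i, or null to signal an apparent EOF.
    // expectedEntries models the amount of data known to exist in the closed file.
    static List<String> readAll(IntFunction<String> reader, int expectedEntries, int maxResets) {
        int resets = 0;
        while (true) {
            List<String> entries = new ArrayList<>();
            for (int i = 0; i < expectedEntries; i++) {
                String entry = reader.apply(i);
                if (entry == null) {
                    break; // apparent EOF before all known data was parsed
                }
                entries.add(entry);
            }
            if (entries.size() == expectedEntries) {
                return entries; // clean read-through
            }
            if (resets >= maxResets) {
                throw new IllegalStateException("premature EOF persisted after " + resets + " resets");
            }
            resets++;
            // Discard the partial result and restart from the very beginning.
            System.err.println("WARN premature EOF at entry " + entries.size()
                + " of " + expectedEntries + "; restarting read");
        }
    }
}
```

Restarting from the first entry rather than resuming at the failure point mirrors the commit's reasoning that the failure follows a series of seeks, so a fresh sequential pass avoids the problematic state.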
Enis Soztutar eb112783ae HBASE-16604 Scanner retries on IOException can cause the scans to miss data - RECOMMIT after revert 2016-09-23 11:27:13 -07:00
Enis Soztutar 39db0cac78 Revert "HBASE-16604 Scanner retries on IOException can cause the scans to miss data"
This reverts commit 83cf44cd3f.

Reverting because accidental files are committed with this.
2016-09-23 11:25:23 -07:00
Enis Soztutar 83cf44cd3f HBASE-16604 Scanner retries on IOException can cause the scans to miss data 2016-09-22 12:06:11 -07:00
anoopsamjohn 1384c9a08d HBASE-16650 Wrong usage of BlockCache eviction stat for heap memory tuning. 2016-09-22 21:28:30 +05:30
Ashish Singhi 31f16d6aec HBASE-16471 Region Server metrics context will be wrong when machine hostname contain "master" word (Pankaj Kumar) 2016-08-24 18:59:44 +05:30
Geoffrey cb02be38ab HBASE-16448 Custom metrics for custom replication endpoints
Signed-off-by: Andrew Purtell <apurtell@apache.org>
2016-08-23 17:17:08 -07:00
stack 69d170063f HBASE-16181 Fix AssignmentManager MBean name (Reid Chan) 2016-07-29 16:40:39 -07:00
Reid Chan abfd584fe6 HBASE-14743 Add metrics around HeapMemoryManager. (Reid Chan)
Change-Id: I7305f7b7034b216930b5fb5c57de9ba5eabf96d8

Signed-off-by: Apekshit Sharma <appy@apache.org>
2016-07-25 14:32:38 -07:00
Apekshit Sharma eff38ccf8c Revert HBASE-14743 because of wrong attribution. Since I added the commit message to the raw patch, it recorded me as the author instead of Reid. I should have used the --author flag to set Reid as the author.
This reverts commit 064271da16.
2016-07-25 14:32:38 -07:00
Apekshit Sharma 064271da16 HBASE-14743 Add metrics around HeapMemoryManager. (Reid Chan)
Change-Id: I60b2435355b3e605e7d91cbf5aca5d2988f26f33
2016-07-25 13:45:50 -07:00
Jingcheng Du 518faa735b HBASE-15353 Add metric for number of CallQueueTooBigException's 2016-06-24 13:00:06 +08:00
Gary Helmling f4cec2e202 HBASE-16085 Add a metric for failed compactions 2016-06-23 15:38:58 -07:00
stack 3a95552cfe HBASE-15948 Port "HADOOP-9956 RPC listener inefficiently assigns connections to readers" Adds HADOOP-9955 RPC idle connection closing is extremely inefficient
Changes how we do accounting of Connections to match how it is done in Hadoop.
Adds a ConnectionManager class. Adds new configurations for this new class.

"hbase.ipc.client.idlethreshold" 4000
"hbase.ipc.client.connection.idle-scan-interval.ms" 10000
"hbase.ipc.client.connection.maxidletime" 10000
"hbase.ipc.client.kill.max", 10
"hbase.ipc.server.handler.queue.size", 100

The new scheme does away with synchronization that purportedly would freeze out
reads while we were cleaning up stale connections (according to HADOOP-9955)

Also adds a new mechanism for accepting Connections: pull in as many as we
can at a time and add them to a Queue instead of handling one at a time.
This can help with bursty traffic, according to HADOOP-9956. Removes the
blocking that occurred while a Reader was busy parsing a request. Adds
configuration "hbase.ipc.server.read.connection-queue.size" with a default
of 100 for the queue size.

Signed-off-by: stack <stack@apache.org>
2016-06-07 16:42:21 -07:00
stack e66ecd7db6 Revert "HBASE-15948 Port "HADOOP-9956 RPC listener inefficiently assigns connections to readers""
Revert mistaken commit...
This reverts commit e0b70c00e7.
2016-06-07 16:41:30 -07:00
stack 6d5a25935e Revert "HBASE-15967 Metric for active ipc Readers and make default fraction of cpu count"
Revert mistaken commit
This reverts commit 1125215aad.
2016-06-07 16:41:01 -07:00
stack 1125215aad HBASE-15967 Metric for active ipc Readers and make default fraction of cpu count
Add new metric hbase.regionserver.ipc.runningReaders
Also make it so Reader count is a factor of processor count
2016-06-07 13:10:14 -07:00
stack e0b70c00e7 HBASE-15948 Port "HADOOP-9956 RPC listener inefficiently assigns connections to readers"
Adds HADOOP-9955 RPC idle connection closing is extremely inefficient
Then removes the queue added by HADOOP-9956 at Enis's suggestion

    Changes how we do accounting of Connections to match how it is done in Hadoop.
    Adds a ConnectionManager class. Adds new configurations for this new class.

    "hbase.ipc.client.idlethreshold" 4000
    "hbase.ipc.client.connection.idle-scan-interval.ms" 10000
    "hbase.ipc.client.connection.maxidletime" 10000
    "hbase.ipc.client.kill.max", 10
    "hbase.ipc.server.handler.queue.size", 100

    The new scheme does away with synchronization that purportedly would freeze out
    reads while we were cleaning up stale connections (according to HADOOP-9955)

    Also adds a new mechanism for accepting Connections: pull in as many as we
    can at a time and add them to a Queue instead of handling one at a time.
    This can help with bursty traffic, according to HADOOP-9956. Removes the
    blocking that occurred while a Reader was busy parsing a request. Adds
    configuration "hbase.ipc.server.read.connection-queue.size" with a default
    of 100 for the queue size.
2016-06-07 13:10:14 -07:00
Enis Soztutar b75b226804 HBASE-15740 Replication source.shippedKBs metric is undercounting because it is in KB 2016-05-09 10:25:49 -07:00
Alex Moundalexis 0bf065a5d5 HBASE-15768 fix capitalization of ZooKeeper usage
Signed-off-by: Sean Busbey <busbey@apache.org>
2016-05-05 15:35:44 -05:00
Enis Soztutar 4c0587134a HBASE-15671 Add per-table metrics on memstore, storefile and regionsize (Alicia Ying Shu) 2016-04-21 13:33:26 -07:00
Andrew Purtell b6617b4eb9 HBASE-15663 Hook up JvmPauseMonitor to ThriftServer 2016-04-20 17:37:35 -07:00
Andrew Purtell a330a2b505 HBASE-15662 Hook up JvmPauseMonitor to REST server 2016-04-20 17:37:35 -07:00
Andrew Purtell 2c26fe37ac HBASE-15614 Report metrics from JvmPauseMonitor 2016-04-20 17:37:34 -07:00
Enis Soztutar 18d70bc680 HBASE-15518 Add Per-Table metrics back (Alicia Ying Shu) 2016-04-20 14:35:45 -07:00
tedyu 8541fe4ad1 HBASE-15093 Replication can report incorrect size of log queue for the global source when multiwal is enabled (Ashu Pachauri) 2016-04-11 08:17:20 -07:00
Elliott Clark a71ce6e738 HBASE-14983 Create metrics for per block type hit/miss ratios
Summary: Missing a root index block is worse than missing a data block. We should know the difference.

Test Plan: Tested on a local instance. All numbers looked reasonable.

Differential Revision: https://reviews.facebook.net/D55563
2016-03-30 11:41:11 -07:00
Enis Soztutar b3fe4ed16c HBASE-15412 Add average region size metric (Alicia Ying Shu) 2016-03-22 14:46:27 -07:00
Enis Soztutar 797562e6c3 HBASE-15464 Flush / Compaction metrics revisited 2016-03-21 17:50:02 -07:00
Enis Soztutar 51259fe4a5 HBASE-15377 Per-RS Get metric is time based, per-region metric is size-based (Heng Chen) 2016-03-15 11:22:18 -07:00
Enis Soztutar a979d85582 HBASE-15435 Add WAL (in bytes) written metric (Alicia Ying Shu) 2016-03-10 20:16:30 -08:00
chenheng f30afa05d9 HBASE-15376 ScanNext metric is size-based while every other per-operation metric is time based 2016-03-07 17:36:40 +08:00
Jonathan M Hsieh f658f3ef83 HBASE-15356 Remove unused imports (Youngjoon Kim) 2016-03-03 11:42:38 -08:00
Mikhail Antonov 43f99def67 HBASE-15136 Explore different queuing behaviors while busy 2016-02-24 20:41:30 -08:00
Elliott Clark 630a65825e HBASE-15222 Use less contended classes for metrics
Summary:
Use less contended classes for metrics.
For histograms, which were the largest culprit, we use FastLongHistogram.
For atomic longs, where possible, we now use Counter.

Test Plan: unit tests

Differential Revision: https://reviews.facebook.net/D54381
2016-02-24 14:34:05 -08:00