203 Commits

Author SHA1 Message Date
Josh Elser
d671a1dbc6 HBASE-17955 Various reviewboard improvements to space quota work
The most notable change is caching SpaceViolationPolicyEnforcement objects
in the write path. When a table has no quota, or there is no SpaceQuotaSnapshot
for that table (yet), we want to avoid creating lots of
SpaceViolationPolicyEnforcement instances, so we cache one instance
instead. This will help reduce GC pressure.
2017-05-22 13:41:36 -04:00
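
The caching idea in the HBASE-17955 message can be sketched roughly as follows. This is a minimal illustration and not the HBase implementation: `PolicyEnforcement`, `EnforcementCache`, and `NO_QUOTA` are hypothetical names, and tables are keyed by plain strings for brevity.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical stand-in for a per-table enforcement object.
interface PolicyEnforcement {
  boolean allowsWrite();
}

// Hypothetical cache: tables without a quota (or without a snapshot yet)
// all share one immutable instance instead of allocating a new one per write.
final class EnforcementCache {
  // Single shared instance used whenever no enforcement is registered.
  static final PolicyEnforcement NO_QUOTA = () -> true;

  private final Map<String, PolicyEnforcement> active = new ConcurrentHashMap<>();

  // Register an enforcement for a table that does have a quota snapshot.
  void setEnforcement(String table, PolicyEnforcement enforcement) {
    active.put(table, enforcement);
  }

  // Look up the enforcement on the write path; never allocates.
  PolicyEnforcement get(String table) {
    return active.getOrDefault(table, NO_QUOTA);
  }
}
```

Because the fallback is a single shared object, a write-heavy workload against tables without quotas creates no per-request garbage in this sketch.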
Josh Elser
13af7f8ac6 HBASE-17002 JMX metrics and some UI additions for space quotas 2017-05-22 13:41:35 -04:00
ckulkarni
eb6ded4849 HBASE-17448 Export metrics from RecoverableZooKeeper
Added metrics for RecoverableZooKeeper related to specific exceptions,
total failed ZooKeeper API calls and latency histograms for read,
write and sync operations. Also added unit tests for the same. Added
service provider for the ZooKeeper metrics implementation inside the
hadoop compatibility module.

Signed-off-by: Andrew Purtell <apurtell@apache.org>
2017-04-26 18:30:13 -07:00
Umesh Agashe
c8461456d0 HBASE-17888: Added generic methods for updating metrics on submit and finish of a procedure execution
Signed-off-by: Michael Stack <stack@apache.org>
2017-04-14 11:51:08 -07:00
Jan Hentschel
b53f354763 HBASE-17532 Replaced explicit type with diamond operator
Signed-off-by: Michael Stack <stack@apache.org>
2017-03-07 11:22:51 -08:00
Ashu Pachauri
d2c083d21c HBASE-17627 Active workers metric for thrift (Ashu Pachauri)
Signed-off-by: Gary Helmling <garyh@apache.org>
2017-02-15 14:50:19 -08:00
Gary Helmling
7a6518f05d HBASE-17578 Thrift metrics should handle exceptions 2017-02-06 11:39:49 -08:00
Jerry He
bc168b419d HBASE-17581 mvn clean test -PskipXXXTests does not work properly for some modules (Yi Liang) 2017-02-02 11:05:17 -08:00
Michael Stack
fb2c89b1b3 HBASE-16785 We are not running all tests
M TestStressWALProcedureStore.java
Disable a test that now runs but fails because of a difference in pb3.1.0.

Signed-off-by: Michael Stack <stack@apache.org>
2017-01-26 21:49:18 -08:00
Michael Stack
1ee8776273 HBASE-16785 We are not running all tests Do nothing patch just to get baseline of how many tests we run 2017-01-26 12:21:00 -08:00
Enis Soztutar
c64a1d1994 HBASE-9774 HBase native metrics and metric collection for coprocessors 2017-01-25 11:47:35 -08:00
Guanghao Zhang
b3d8d06703 HBASE-17205 Add a metric for the duration of region in transition 2016-12-01 10:13:37 -08:00
Guanghao Zhang
cc03f7ad53 HBASE-16561 Add metrics about read/write/scan queue length and active read/write/scan handler count
Signed-off-by: zhangduo <zhangduo@apache.org>
2016-11-29 09:43:30 +08:00
Enis Soztutar
03bc884ea0 HBASE-17017 Remove the current per-region latency histogram metrics 2016-11-08 18:31:53 -08:00
Dustin Pho
fcef2c02c9 HBASE-16661 Add last major compaction age to per-region metrics
Signed-off-by: Gary Helmling <garyh@apache.org>
2016-10-10 15:16:12 -07:00
Sean Busbey
76396714e1 HBASE-15984 Handle premature EOF treatment of WALs in replication.
In some particular deployments, the Replication code believes it has
reached EOF for a WAL prior to successfully parsing all bytes known to
exist in a cleanly closed file.

Consistently this failure happens due to an InvalidProtobufException
after some number of seeks during our attempts to tail the in-progress
RegionServer WAL. As a work-around, this patch treats cleanly closed
files differently than other execution paths. If an EOF is detected due
to parsing or other errors while there are still unparsed bytes before
the end-of-file trailer, we now reset the WAL to the very beginning and
attempt a clean read-through.

In current testing, a single such reset is sufficient to work around
observed dataloss. However, the above change will retry a given WAL file
indefinitely. On each such attempt, a log message like the below will
be emitted at the WARN level:

  Processing end of WAL file '{}'. At position {}, which is too far away
  from reported file length {}. Restarting WAL reading (see HBASE-15983
  for details).

Additionally, this patch adds log detail at the TRACE level about file
offsets seen while handling recoverable errors. It also adds metrics
that measure the use of this recovery mechanism.
2016-09-29 10:07:14 -05:00
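
The restart condition described in the HBASE-15984 message can be illustrated with a small check. This is a sketch under assumptions, not the patch's code: `WalEofCheck`, `shouldRestart`, and the `trailerStart` parameter are hypothetical names for "the offset where the end-of-file trailer begins".

```java
// Hypothetical helper mirroring the decision described in the commit message.
final class WalEofCheck {
  /**
   * Returns true when an apparent EOF should be treated as premature:
   * the file was cleanly closed, yet the reader stopped before the trailer.
   * In that case the reader would reset to the start of the WAL and retry.
   */
  static boolean shouldRestart(boolean cleanlyClosed, long position, long trailerStart) {
    // Only cleanly closed files get the special treatment.
    if (!cleanlyClosed) {
      return false;
    }
    // Unparsed bytes remain between the current position and the trailer.
    return position < trailerStart;
  }
}
```

For example, `WalEofCheck.shouldRestart(true, 100L, 500L)` reports that a restart is needed, while a reader positioned exactly at the trailer does not restart.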
Enis Soztutar
eb112783ae HBASE-16604 Scanner retries on IOException can cause the scans to miss data - RECOMMIT after revert 2016-09-23 11:27:13 -07:00
Enis Soztutar
39db0cac78 Revert "HBASE-16604 Scanner retries on IOException can cause the scans to miss data"
This reverts commit 83cf44cd3f19c841ac53889d09454ed5247ce591.

Reverting because files were accidentally committed with this.
2016-09-23 11:25:23 -07:00
Enis Soztutar
83cf44cd3f HBASE-16604 Scanner retries on IOException can cause the scans to miss data 2016-09-22 12:06:11 -07:00
anoopsamjohn
1384c9a08d HBASE-16650 Wrong usage of BlockCache eviction stat for heap memory tuning. 2016-09-22 21:28:30 +05:30
Ashish Singhi
31f16d6aec HBASE-16471 Region Server metrics context will be wrong when machine hostname contain "master" word (Pankaj Kumar) 2016-08-24 18:59:44 +05:30
Geoffrey
cb02be38ab HBASE-16448 Custom metrics for custom replication endpoints
Signed-off-by: Andrew Purtell <apurtell@apache.org>
2016-08-23 17:17:08 -07:00
stack
69d170063f HBASE-16181 Fix AssignmentManager MBean name (Reid Chan) 2016-07-29 16:40:39 -07:00
Reid Chan
abfd584fe6 HBASE-14743 Add metrics around HeapMemoryManager. (Reid Chan)
Change-Id: I7305f7b7034b216930b5fb5c57de9ba5eabf96d8

Signed-off-by: Apekshit Sharma <appy@apache.org>
2016-07-25 14:32:38 -07:00
Apekshit Sharma
eff38ccf8c Revert HBASE-14743 because of wrong attribution. Since I added commit message to the raw patch, it's making me as author instead of Reid. I should have used --author flag to set Reid as author.
This reverts commit 064271da16efd3e5d9d4787d778fa711b7f9f6ab.
2016-07-25 14:32:38 -07:00
Apekshit Sharma
064271da16 HBASE-14743 Add metrics around HeapMemoryManager. (Reid Chan)
Change-Id: I60b2435355b3e605e7d91cbf5aca5d2988f26f33
2016-07-25 13:45:50 -07:00
Jingcheng Du
518faa735b HBASE-15353 Add metric for number of CallQueueTooBigException's 2016-06-24 13:00:06 +08:00
Gary Helmling
f4cec2e202 HBASE-16085 Add a metric for failed compactions 2016-06-23 15:38:58 -07:00
stack
3a95552cfe HBASE-15948 Port "HADOOP-9956 RPC listener inefficiently assigns connections to readers" Adds HADOOP-9955 RPC idle connection closing is extremely inefficient
Changes how we do accounting of Connections to match how it is done in Hadoop.
Adds a ConnectionManager class. Adds new configurations for this new class.

"hbase.ipc.client.idlethreshold" 4000
"hbase.ipc.client.connection.idle-scan-interval.ms" 10000
"hbase.ipc.client.connection.maxidletime" 10000
"hbase.ipc.client.kill.max", 10
"hbase.ipc.server.handler.queue.size", 100

The new scheme does away with synchronization that purportedly would freeze out
reads while we were cleaning up stale connections (according to HADOOP-9955).

Also adds a new mechanism for accepting Connections: pull in as many
as we can at a time and add them to a Queue instead of doing one at a time.
This can help with bursty traffic, according to HADOOP-9956. Removes blocking
while a Reader is busy parsing a request. Adds configuration
"hbase.ipc.server.read.connection-queue.size" with default of 100 for
queue size.

Signed-off-by: stack <stack@apache.org>
2016-06-07 16:42:21 -07:00
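
As a rough illustration of how a setting such as the read connection queue size might be resolved with its default of 100, the sketch below uses `java.util.Properties` as a stand-in for the real configuration object. `IpcSettings` and `readQueueSize` are hypothetical names; only the key and the default value come from the commit message.

```java
import java.util.Properties;

// Hypothetical helper; java.util.Properties stands in for the real config class.
final class IpcSettings {
  // Key and default value as stated in the commit message.
  static final String READ_QUEUE_SIZE_KEY = "hbase.ipc.server.read.connection-queue.size";
  static final int READ_QUEUE_SIZE_DEFAULT = 100;

  // Returns the configured queue size, or the default when the key is absent.
  static int readQueueSize(Properties conf) {
    String raw = conf.getProperty(READ_QUEUE_SIZE_KEY,
        Integer.toString(READ_QUEUE_SIZE_DEFAULT));
    return Integer.parseInt(raw.trim());
  }
}
```

An empty `Properties` object yields 100 here, and setting the key to `"250"` yields 250.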
stack
e66ecd7db6 Revert "HBASE-15948 Port "HADOOP-9956 RPC listener inefficiently assigns connections to readers""
Revert mistaken commit...
This reverts commit e0b70c00e74aeaac33570508e3732a53daea839e.
2016-06-07 16:41:30 -07:00
stack
6d5a25935e Revert "HBASE-15967 Metric for active ipc Readers and make default fraction of cpu count"
Revert mistaken commit
This reverts commit 1125215aad3f5b149f3458ba7019c5920f6dca66.
2016-06-07 16:41:01 -07:00
stack
1125215aad HBASE-15967 Metric for active ipc Readers and make default fraction of cpu count
Add new metric hbase.regionserver.ipc.runningReaders
Also make it so Reader count is a factor of processor count
2016-06-07 13:10:14 -07:00
stack
e0b70c00e7 HBASE-15948 Port "HADOOP-9956 RPC listener inefficiently assigns connections to readers"
Adds HADOOP-9955 RPC idle connection closing is extremely inefficient
Then removes queue added by HADOOP-9956 at Enis's suggestion

    Changes how we do accounting of Connections to match how it is done in Hadoop.
    Adds a ConnectionManager class. Adds new configurations for this new class.

    "hbase.ipc.client.idlethreshold" 4000
    "hbase.ipc.client.connection.idle-scan-interval.ms" 10000
    "hbase.ipc.client.connection.maxidletime" 10000
    "hbase.ipc.client.kill.max", 10
    "hbase.ipc.server.handler.queue.size", 100

    The new scheme does away with synchronization that purportedly would freeze out
    reads while we were cleaning up stale connections (according to HADOOP-9955).

    Also adds a new mechanism for accepting Connections: pull in as many
    as we can at a time and add them to a Queue instead of doing one at a time.
    This can help with bursty traffic, according to HADOOP-9956. Removes blocking
    while a Reader is busy parsing a request. Adds configuration
    "hbase.ipc.server.read.connection-queue.size" with default of 100 for
    queue size.
2016-06-07 13:10:14 -07:00
Enis Soztutar
b75b226804 HBASE-15740 Replication source.shippedKBs metric is undercounting because it is in KB 2016-05-09 10:25:49 -07:00
Alex Moundalexis
0bf065a5d5 HBASE-15768 fix capitalization of ZooKeeper usage
Signed-off-by: Sean Busbey <busbey@apache.org>
2016-05-05 15:35:44 -05:00
Enis Soztutar
4c0587134a HBASE-15671 Add per-table metrics on memstore, storefile and regionsize (Alicia Ying Shu) 2016-04-21 13:33:26 -07:00
Andrew Purtell
b6617b4eb9 HBASE-15663 Hook up JvmPauseMonitor to ThriftServer 2016-04-20 17:37:35 -07:00
Andrew Purtell
a330a2b505 HBASE-15662 Hook up JvmPauseMonitor to REST server 2016-04-20 17:37:35 -07:00
Andrew Purtell
2c26fe37ac HBASE-15614 Report metrics from JvmPauseMonitor 2016-04-20 17:37:34 -07:00
Enis Soztutar
18d70bc680 HBASE-15518 Add Per-Table metrics back (Alicia Ying Shu) 2016-04-20 14:35:45 -07:00
tedyu
8541fe4ad1 HBASE-15093 Replication can report incorrect size of log queue for the global source when multiwal is enabled (Ashu Pachauri) 2016-04-11 08:17:20 -07:00
Elliott Clark
a71ce6e738 HBASE-14983 Create metrics for per block type hit/miss ratios
Summary: Missing a root index block is worse than missing a data block. We should know the difference.

Test Plan: Tested on a local instance. All numbers looked reasonable.

Differential Revision: https://reviews.facebook.net/D55563
2016-03-30 11:41:11 -07:00
Enis Soztutar
b3fe4ed16c HBASE-15412 Add average region size metric (Alicia Ying Shu) 2016-03-22 14:46:27 -07:00
Enis Soztutar
797562e6c3 HBASE-15464 Flush / Compaction metrics revisited 2016-03-21 17:50:02 -07:00
Enis Soztutar
51259fe4a5 HBASE-15377 Per-RS Get metric is time based, per-region metric is size-based (Heng Chen) 2016-03-15 11:22:18 -07:00
Enis Soztutar
a979d85582 HBASE-15435 Add WAL (in bytes) written metric (Alicia Ying Shu) 2016-03-10 20:16:30 -08:00
chenheng
f30afa05d9 HBASE-15376 ScanNext metric is size-based while every other per-operation metric is time based 2016-03-07 17:36:40 +08:00
Jonathan M Hsieh
f658f3ef83 HBASE-15356 Remove unused imports (Youngjoon Kim) 2016-03-03 11:42:38 -08:00
Mikhail Antonov
43f99def67 HBASE-15136 Explore different queuing behaviors while busy 2016-02-24 20:41:30 -08:00
Elliott Clark
630a65825e HBASE-15222 Use less contended classes for metrics
Summary:
Use less contended things for metrics.
For histograms, which were the largest culprit, we use FastLongHistogram.
For atomic longs, where possible, we now use counter.

Test Plan: unit tests

Differential Revision: https://reviews.facebook.net/D54381
2016-02-24 14:34:05 -08:00
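
To illustrate why a less contended counter helps under concurrent updates, the sketch below uses the JDK's `java.util.concurrent.atomic.LongAdder` as a stand-in. This is an assumption for illustration only: it is not the counter class the HBASE-15222 patch uses, and `RequestCounter` and `countConcurrently` are hypothetical names.

```java
import java.util.concurrent.atomic.LongAdder;

// Hypothetical metric counter backed by LongAdder, which spreads updates
// across internal cells so that threads rarely contend on one memory location.
final class RequestCounter {
  private final LongAdder count = new LongAdder();

  void increment() {
    count.increment();
  }

  // sum() folds the cells together; exact once all writers have finished.
  long value() {
    return count.sum();
  }

  // Small driver: several threads bump one counter, then the total is read.
  static long countConcurrently(int threads, int perThread) {
    RequestCounter counter = new RequestCounter();
    Thread[] workers = new Thread[threads];
    for (int i = 0; i < threads; i++) {
      workers[i] = new Thread(() -> {
        for (int j = 0; j < perThread; j++) {
          counter.increment();
        }
      });
      workers[i].start();
    }
    for (Thread worker : workers) {
      try {
        worker.join();
      } catch (InterruptedException e) {
        Thread.currentThread().interrupt();
        throw new IllegalStateException(e);
      }
    }
    return counter.value();
  }
}
```

The trade-off in this design is that reads are slightly more expensive than with a single atomic long, which suits metrics that are written often and read rarely.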