Commit Graph

138 Commits

Author SHA1 Message Date
Sean Busbey 76396714e1 HBASE-15984 Handle premature EOF treatment of WALs in replication.
In some particular deployments, the Replication code believes it has
reached EOF for a WAL prior to successfully parsing all bytes known to
exist in a cleanly closed file.

Consistently this failure happens due to an InvalidProtobufException
after some number of seeks during our attempts to tail the in-progress
RegionServer WAL. As a work-around, this patch treats cleanly closed
files differently than other execution paths. If an EOF is detected due
to parsing or other errors while there are still unparsed bytes before
the end-of-file trailer, we now reset the WAL to the very beginning and
attempt a clean read-through.

In current testing, a single such reset is sufficient to work around
observed data loss. However, the above change will retry a given WAL file
indefinitely. On each such attempt, a log message like the one below will
be emitted at the WARN level:

  Processing end of WAL file '{}'. At position {}, which is too far away
  from reported file length {}. Restarting WAL reading (see HBASE-15983
  for details).

Additionally, this patch adds log detail at the TRACE level about file
offsets seen while handling recoverable errors. It also adds metrics
that measure the use of this recovery mechanism.
2016-09-29 10:07:14 -05:00
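The reset rule described in the HBASE-15984 message can be sketched as follows. This is a minimal illustration, not the code from the patch: the class, method, and parameter names are hypothetical, java.util.logging stands in for the project's logger, and a plain static counter stands in for the new metric.

```java
import java.util.logging.Logger;

/** Hypothetical sketch; none of these names come from the HBase code base. */
final class WalEofResetSketch {
    private static final Logger LOG = Logger.getLogger("WalEofResetSketch");

    /** Counts how often the recovery path ran (stand-in for the new metric). */
    static long restartedReads = 0;

    /**
     * Decides where to continue reading after an EOF-like condition.
     *
     * @param cleanlyClosed whether the WAL ends with an end-of-file trailer
     * @param position      offset reached when parsing stopped
     * @param trailerStart  offset at which the trailer begins
     * @return 0 to force a clean read-through from the beginning,
     *         or the current position when no reset is warranted
     */
    static long nextReadPosition(String walName, boolean cleanlyClosed,
                                 long position, long trailerStart) {
        if (cleanlyClosed && position < trailerStart) {
            // Unparsed bytes remain before the trailer, so start over.
            restartedReads++;
            LOG.warning("Restarting read of WAL '" + walName + "': stopped at "
                + position + " but trailer starts at " + trailerStart);
            return 0L;
        }
        return position;
    }
}
```

Because nothing here bounds the number of resets, a caller that loops on this decision would retry the same file indefinitely, which matches the caveat in the message above.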
Enis Soztutar eb112783ae HBASE-16604 Scanner retries on IOException can cause the scans to miss data - RECOMMIT after revert 2016-09-23 11:27:13 -07:00
Enis Soztutar 39db0cac78 Revert "HBASE-16604 Scanner retries on IOException can cause the scans to miss data"
This reverts commit 83cf44cd3f.

Reverting because files were accidentally committed with this.
2016-09-23 11:25:23 -07:00
Enis Soztutar 83cf44cd3f HBASE-16604 Scanner retries on IOException can cause the scans to miss data 2016-09-22 12:06:11 -07:00
anoopsamjohn 1384c9a08d HBASE-16650 Wrong usage of BlockCache eviction stat for heap memory tuning. 2016-09-22 21:28:30 +05:30
Ashish Singhi 31f16d6aec HBASE-16471 Region Server metrics context will be wrong when machine hostname contain "master" word (Pankaj Kumar) 2016-08-24 18:59:44 +05:30
Geoffrey cb02be38ab HBASE-16448 Custom metrics for custom replication endpoints
Signed-off-by: Andrew Purtell <apurtell@apache.org>
2016-08-23 17:17:08 -07:00
stack 69d170063f HBASE-16181 Fix AssignmentManager MBean name (Reid Chan) 2016-07-29 16:40:39 -07:00
Reid Chan abfd584fe6 HBASE-14743 Add metrics around HeapMemoryManager. (Reid Chan)
Change-Id: I7305f7b7034b216930b5fb5c57de9ba5eabf96d8

Signed-off-by: Apekshit Sharma <appy@apache.org>
2016-07-25 14:32:38 -07:00
Apekshit Sharma eff38ccf8c Revert HBASE-14743 because of wrong attribution. Since I added the commit message to the raw patch, it made me the author instead of Reid. I should have used the --author flag to set Reid as author.
This reverts commit 064271da16.
2016-07-25 14:32:38 -07:00
Apekshit Sharma 064271da16 HBASE-14743 Add metrics around HeapMemoryManager. (Reid Chan)
Change-Id: I60b2435355b3e605e7d91cbf5aca5d2988f26f33
2016-07-25 13:45:50 -07:00
Jingcheng Du 518faa735b HBASE-15353 Add metric for number of CallQueueTooBigException's 2016-06-24 13:00:06 +08:00
Gary Helmling f4cec2e202 HBASE-16085 Add a metric for failed compactions 2016-06-23 15:38:58 -07:00
stack 3a95552cfe HBASE-15948 Port "HADOOP-9956 RPC listener inefficiently assigns connections to readers" Adds HADOOP-9955 RPC idle connection closing is extremely inefficient
Changes how we do accounting of Connections to match how it is done in Hadoop.
Adds a ConnectionManager class. Adds new configurations for this new class.

"hbase.ipc.client.idlethreshold" 4000
"hbase.ipc.client.connection.idle-scan-interval.ms" 10000
"hbase.ipc.client.connection.maxidletime" 10000
"hbase.ipc.client.kill.max" 10
"hbase.ipc.server.handler.queue.size" 100

The new scheme does away with synchronization that purportedly would freeze out
reads while we were cleaning up stale connections (according to HADOOP-9955).

Also adds a new mechanism for accepting Connections: pull in as many as we
can at a time and add them to a Queue instead of accepting one at a time.
This can help with bursty traffic, according to HADOOP-9956. Removes
blocking while a Reader is busy parsing a request. Adds configuration
"hbase.ipc.server.read.connection-queue.size" with default of 100 for
queue size.

Signed-off-by: stack <stack@apache.org>
2016-06-07 16:42:21 -07:00
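How the idle-connection settings listed in the HBASE-15948 message might interact can be sketched as below. This assumes they behave like their Hadoop counterparts: a sweep is skipped while few connections are open, and a partial sweep closes at most a fixed number of idle connections. The class, the Conn type, and the method signature are hypothetical, and the three limits are passed as parameters rather than read from configuration.

```java
import java.util.Iterator;
import java.util.List;

/** Hypothetical sketch of an idle-connection sweep; not the HBase implementation. */
final class IdleSweepSketch {

    /** Minimal stand-in for a server-side connection. */
    static final class Conn {
        final long lastContactMs;
        Conn(long lastContactMs) { this.lastContactMs = lastContactMs; }
    }

    /**
     * Removes idle connections from the list and returns how many were closed.
     *
     * @param idleScanThreshold cf. "hbase.ipc.client.idlethreshold"
     * @param maxIdleMs         cf. "hbase.ipc.client.connection.maxidletime"
     * @param maxIdleToClose    cf. "hbase.ipc.client.kill.max"
     */
    static int closeIdle(List<Conn> conns, long nowMs, boolean scanAll,
                         int idleScanThreshold, long maxIdleMs, int maxIdleToClose) {
        // A partial sweep is not worth doing while few connections are open.
        if (!scanAll && conns.size() < idleScanThreshold) {
            return 0;
        }
        int closed = 0;
        Iterator<Conn> it = conns.iterator();
        while (it.hasNext()) {
            Conn c = it.next();
            if (nowMs - c.lastContactMs > maxIdleMs) {
                it.remove(); // a real server would also close the socket here
                closed++;
                // A partial sweep stops early to bound the work per pass.
                if (!scanAll && closed == maxIdleToClose) {
                    break;
                }
            }
        }
        return closed;
    }
}
```

A periodic task would call closeIdle at the interval named by "hbase.ipc.client.connection.idle-scan-interval.ms"; that scheduling is left out of the sketch.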
stack e66ecd7db6 Revert "HBASE-15948 Port "HADOOP-9956 RPC listener inefficiently assigns connections to readers""
Revert mistaken commit...
This reverts commit e0b70c00e7.
2016-06-07 16:41:30 -07:00
stack 6d5a25935e Revert "HBASE-15967 Metric for active ipc Readers and make default fraction of cpu count"
Revert mistaken commit
This reverts commit 1125215aad.
2016-06-07 16:41:01 -07:00
stack 1125215aad HBASE-15967 Metric for active ipc Readers and make default fraction of cpu count
Add new metric hbase.regionserver.ipc.runningReaders
Also make it so Reader count is a factor of processor count
2016-06-07 13:10:14 -07:00
stack e0b70c00e7 HBASE-15948 Port "HADOOP-9956 RPC listener inefficiently assigns connections to readers"
Adds HADOOP-9955 RPC idle connection closing is extremely inefficient
Then removes the queue added by HADOOP-9956 at Enis's suggestion

    Changes how we do accounting of Connections to match how it is done in Hadoop.
    Adds a ConnectionManager class. Adds new configurations for this new class.

    "hbase.ipc.client.idlethreshold" 4000
    "hbase.ipc.client.connection.idle-scan-interval.ms" 10000
    "hbase.ipc.client.connection.maxidletime" 10000
    "hbase.ipc.client.kill.max" 10
    "hbase.ipc.server.handler.queue.size" 100

    The new scheme does away with synchronization that purportedly would freeze out
    reads while we were cleaning up stale connections (according to HADOOP-9955).

    Also adds a new mechanism for accepting Connections: pull in as many as we
    can at a time and add them to a Queue instead of accepting one at a time.
    This can help with bursty traffic, according to HADOOP-9956. Removes
    blocking while a Reader is busy parsing a request. Adds configuration
    "hbase.ipc.server.read.connection-queue.size" with default of 100 for
    queue size.
2016-06-07 13:10:14 -07:00
Enis Soztutar b75b226804 HBASE-15740 Replication source.shippedKBs metric is undercounting because it is in KB 2016-05-09 10:25:49 -07:00
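One way a KB-denominated counter can undercount, as the HBASE-15740 title suggests, is by truncating each shipped batch to whole kilobytes before adding it. The sketch below assumes that mechanism; the commit title does not spell it out, and the class and method names are hypothetical.

```java
/** Hypothetical sketch of per-batch truncation in a KB counter. */
final class ShippedSizeSketch {

    /** Adds each batch to a KB total, truncating every batch to whole KB. */
    static long sumAsKilobytes(long[] batchSizesInBytes) {
        long totalKb = 0;
        for (long bytes : batchSizesInBytes) {
            totalKb += bytes / 1024; // integer division drops the remainder
        }
        return totalKb;
    }

    /** Keeps the total in bytes; convert only when reporting. */
    static long sumAsBytes(long[] batchSizesInBytes) {
        long totalBytes = 0;
        for (long bytes : batchSizesInBytes) {
            totalBytes += bytes;
        }
        return totalBytes;
    }
}
```

For batches of 500, 500, and 1500 bytes, the KB counter reports 1 while the byte total is 2500, which is 2 whole KB. Batches smaller than 1024 bytes contribute nothing at all to the KB counter.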
Alex Moundalexis 0bf065a5d5 HBASE-15768 fix capitalization of ZooKeeper usage
Signed-off-by: Sean Busbey <busbey@apache.org>
2016-05-05 15:35:44 -05:00
Enis Soztutar 4c0587134a HBASE-15671 Add per-table metrics on memstore, storefile and regionsize (Alicia Ying Shu) 2016-04-21 13:33:26 -07:00
Andrew Purtell b6617b4eb9 HBASE-15663 Hook up JvmPauseMonitor to ThriftServer 2016-04-20 17:37:35 -07:00
Andrew Purtell a330a2b505 HBASE-15662 Hook up JvmPauseMonitor to REST server 2016-04-20 17:37:35 -07:00
Andrew Purtell 2c26fe37ac HBASE-15614 Report metrics from JvmPauseMonitor 2016-04-20 17:37:34 -07:00
Enis Soztutar 18d70bc680 HBASE-15518 Add Per-Table metrics back (Alicia Ying Shu) 2016-04-20 14:35:45 -07:00
tedyu 8541fe4ad1 HBASE-15093 Replication can report incorrect size of log queue for the global source when multiwal is enabled (Ashu Pachauri) 2016-04-11 08:17:20 -07:00
Elliott Clark a71ce6e738 HBASE-14983 Create metrics for per block type hit/miss ratios
Summary: Missing a root index block is worse than missing a data block. We should know the difference.

Test Plan: Tested on a local instance. All numbers looked reasonable.

Differential Revision: https://reviews.facebook.net/D55563
2016-03-30 11:41:11 -07:00
Enis Soztutar b3fe4ed16c HBASE-15412 Add average region size metric (Alicia Ying Shu) 2016-03-22 14:46:27 -07:00
Enis Soztutar 797562e6c3 HBASE-15464 Flush / Compaction metrics revisited 2016-03-21 17:50:02 -07:00
Enis Soztutar 51259fe4a5 HBASE-15377 Per-RS Get metric is time based, per-region metric is size-based (Heng Chen) 2016-03-15 11:22:18 -07:00
Enis Soztutar a979d85582 HBASE-15435 Add WAL (in bytes) written metric (Alicia Ying Shu) 2016-03-10 20:16:30 -08:00
chenheng f30afa05d9 HBASE-15376 ScanNext metric is size-based while every other per-operation metric is time based 2016-03-07 17:36:40 +08:00
Jonathan M Hsieh f658f3ef83 HBASE-15356 Remove unused imports (Youngjoon Kim) 2016-03-03 11:42:38 -08:00
Mikhail Antonov 43f99def67 HBASE-15136 Explore different queuing behaviors while busy 2016-02-24 20:41:30 -08:00
Elliott Clark 630a65825e HBASE-15222 Use less contended classes for metrics
Summary:
Use less contended things for metrics.
For histograms, which were the largest culprit, we use FastLongHistogram.
For atomic longs, where possible, we now use counter.

Test Plan: unit tests

Differential Revision: https://reviews.facebook.net/D54381
2016-02-24 14:34:05 -08:00
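The idea of swapping an atomic long for a less contended counter can be illustrated with the JDK's java.util.concurrent.atomic.LongAdder. This is a stand-in chosen for the sketch: the commit itself refers to HBase's own counter and FastLongHistogram classes, which are not shown here, and the class and method names below are hypothetical.

```java
import java.util.concurrent.atomic.LongAdder;

/** Hypothetical sketch: a striped counter updated from several threads. */
final class ContendedCounterSketch {

    /** Increments one LongAdder from many threads and returns the final sum. */
    static long countWithLongAdder(int threads, int incrementsPerThread) {
        // LongAdder spreads updates across internal cells, so writers
        // rarely contend on a single memory location the way they do
        // with compare-and-set on one AtomicLong.
        LongAdder adder = new LongAdder();
        Thread[] workers = new Thread[threads];
        for (int i = 0; i < threads; i++) {
            workers[i] = new Thread(() -> {
                for (int j = 0; j < incrementsPerThread; j++) {
                    adder.increment();
                }
            });
            workers[i].start();
        }
        try {
            for (Thread t : workers) {
                t.join();
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while waiting", e);
        }
        // sum() is accurate here because all writers have finished.
        return adder.sum();
    }
}
```

The trade-off is that reading the value requires summing the cells, which suits metrics that are written far more often than they are read.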
Mikhail Antonov e58c0385a7 HBASE-15135 Add metrics for storefile age 2016-02-22 02:21:02 -08:00
stack eacf7bcf97 HBASE-15163 Add sampling code and metrics for get/scan/multi/mutate count separately (Yu Li) 2016-02-06 06:30:56 -08:00
chenheng 8f20bc748d HBASE-15197 Expose filtered read requests metric to metrics framework and Web UI (Eungsop Yoo) 2016-02-05 10:57:14 +08:00
tedyu 5266b07708 HBASE-15068 Add metrics for region normalization plans 2016-01-07 03:13:16 -08:00
Elliott Clark 48e217a7db HBASE-14946 Don't allow multi's to over run the max result size.
Summary:
* Add VersionInfoUtil to determine if a client has a specified version or better
* Add an exception type to say that the response should be chunked
* Add on client knowledge of retry exceptions
* Add on metrics for how often this happens

Test Plan: Added a unit test

Differential Revision: https://reviews.facebook.net/D51771
2015-12-10 18:10:32 -08:00
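The size limit described in the HBASE-14946 summary can be sketched as a loop that stops serving results once the running total would exceed the maximum, leaving the remainder for the client to retry. The names are hypothetical, and the choice to always serve at least one result is an assumption of this sketch rather than something the commit message states.

```java
/** Hypothetical sketch of capping a multi response by size. */
final class MultiChunkSketch {

    /**
     * Returns how many leading results fit within maxResultSize.
     * Results past that index would be signalled to the client for retry.
     */
    static int resultsThatFit(long[] resultSizes, long maxResultSize) {
        long used = 0;
        int served = 0;
        for (long size : resultSizes) {
            // Assumption: serve at least one result so the client makes progress.
            if (served > 0 && used + size > maxResultSize) {
                break;
            }
            used += size;
            served++;
        }
        return served;
    }
}
```

With three results of 40 bytes each and a 100-byte limit, two results are served and the third is left for a follow-up request.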
ramkrishna 26ac60b03f HBASE-13153 Bulk Loaded HFile Replication (Ashish Singhi) 2015-12-10 13:07:46 +05:30
Lars Hofhansl 7bfbb6a3c9 HBASE-14869 Better request latency and size histograms. (Vikas Vishwakarma and Lars Hofhansl) 2015-12-08 17:02:27 -08:00
Vrishal Kulkarni 1f999c1e2b HBASE-14719 Add metrics for master WAL count (numMasterWALs). Metric numMasterWALs appears as follows in metrics dump:
{
    "name" : "Hadoop:service=HBase,name=Master,sub=Procedure",
    "modelerType" : "Master,sub=Procedure",
    "tag.Context" : "master",
    "tag.Hostname" : "vrishal-mbp",
    "numMasterWALs" : 1
},

Signed-off-by: Elliott Clark <eclark@apache.org>
2015-12-07 11:14:29 -08:00
stack 51503efcf0 HBASE-13857 Slow WAL Append count in ServerMetricsTmpl.jamon is hardcoded to zero (Vrishal Kulkarni) 2015-12-03 17:00:29 -08:00
Sanjeev Lakshmanan 6b11adbfa4 HBASE-14862 Add support for reporting p90 for histogram metrics
Signed-off-by: Andrew Purtell <apurtell@apache.org>
2015-11-23 15:55:45 -08:00
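What a p90 value means can be shown with a nearest-rank percentile over raw samples. This is only an illustration of the statistic: a metrics histogram typically estimates percentiles from a snapshot or from bins rather than sorting every sample, and the class and method names are hypothetical.

```java
import java.util.Arrays;

/** Hypothetical sketch: nearest-rank percentile over raw samples. */
final class PercentileSketch {

    /** Returns the smallest sample such that at least percent% of samples are <= it. */
    static long percentile(long[] samples, int percent) {
        if (samples.length == 0) {
            throw new IllegalArgumentException("no samples");
        }
        if (percent < 0 || percent > 100) {
            throw new IllegalArgumentException("percent must be in [0, 100]");
        }
        long[] sorted = samples.clone();
        Arrays.sort(sorted);
        // Integer ceiling of percent * n / 100, clamped to at least rank 1.
        int rank = Math.max(1, (percent * sorted.length + 99) / 100);
        return sorted[rank - 1];
    }
}
```

For the samples 1 through 10, percentile(samples, 90) selects rank 9 and returns 9.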
Elliott Clark a48d30984a HBASE-14793 Allow limiting size of block into L1 block cache. 2015-11-17 10:37:49 -08:00
Elliott Clark ea795213b2 HBASE-14778 Make block cache hit percentages not integer in the metrics system 2015-11-10 12:25:59 -08:00
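The loss of precision that an integer hit percentage causes can be seen in a small comparison. The names below are hypothetical, and returning zero for an empty cache is an assumption of the sketch.

```java
/** Hypothetical sketch: integer versus floating-point hit percentage. */
final class HitRatioSketch {

    /** Truncates toward zero, so 99.5% is reported as 99. */
    static long hitPercentAsInteger(long hits, long total) {
        return total == 0 ? 0 : hits * 100 / total;
    }

    /** Keeps the fractional part of the percentage. */
    static double hitPercentAsDouble(long hits, long total) {
        return total == 0 ? 0.0 : hits * 100.0 / total;
    }
}
```

With 995 hits out of 1000 requests, the integer form reports 99 and the floating-point form reports 99.5. At high hit rates that half point can represent a large relative change in misses.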
stack 9630fec2d5 Revert "HBASE-14725 Vet categorization of tests so they for sure go into the right small/medium/large buckets"
Revert. Seems to have destabilized the build

This reverts commit 6dbb5a8052.
2015-11-02 08:17:41 -08:00
stack 6dbb5a8052 HBASE-14725 Vet categorization of tests so they for sure go into the right small/medium/large buckets 2015-11-01 22:26:43 -08:00
Gary Helmling 683f84e6a2 HBASE-14700 Support a permissive mode for secure clusters to allow SIMPLE auth clients 2015-10-30 19:45:46 -07:00