Network identities should be bound late. Remote addresses should be
resolved at the last possible moment, just before connect(). Network
identity mappings can change, so our code should not inappropriately
cache them. Otherwise we might miss a change and fail to operate normally.
Revert "HBASE-14544 Allow HConnectionImpl to not refresh the dns on errors"
Removes hbase.resolve.hostnames.on.failure and related code. We always
resolve hostnames, as late as possible.
Preserve InetSocketAddress caching per RPC connection. Avoids potential
lookups per Call.
Replace InetSocketAddress with Address where used as a map key. If we want
to key by hostname and/or resolved address we should be explicit about it.
Using Address chooses mapping by hostname and port only.
Add metrics for potential nameservice resolution attempts, whenever an
InetSocketAddress is instantiated for connect; and metrics for failed
resolution, whenever InetSocketAddress#isUnresolved on the new instance
is true.
* Use ServerName directly to build a stub key
* Resolve and cache ISA on a RpcChannel as late as possible, at first call
* Remove now invalid unit test TestCIBadHostname
We resolve DNS at the latest possible time, at first call, and do not
resolve hostnames for creating stubs at all, so this unit test cannot
work now.
Reviewed-by: Mingliang Liu <liuml07@apache.org>
Signed-off-by: Duo Zhang <zhangduo@apache.org>
HBase UI is currently using in bootstrap 3.3.7. This version is vulnerable to 4
medium CVEs (CVE-2018-14040, CVE-2018-14041, CVE-2018-14042, and CVE-2019-8331).
Details on all the bootstrap versions and vulnerabilities is
here: https://snyk.io/vuln/npm:bootstrap
Upgrading to bootstrap 4 would be nice, but potentially more work to do. We
should at least upgrade to the latest bootstrap 3, which is 3.4.1 currently.
closes#2661
Signed-off-by: Wellington Chevreuil <wchevreuil@apache.org>
Signed-off-by: Peter Somogyi <psomogyi@apache.org>
If hbase.regionserver.close.wait.abort is set to true, interrupt RPC
handler threads holding the region close lock.
Until requests in progress can be aborted, wait on the region close lock for
a configurable interval (specified by hbase.regionserver.close.wait.time.ms,
default 60000 (1 minute)). If we have failed to acquire the close lock after
this interval elapses, if allowed (also specified by
hbase.regionserver.close.wait.abort), abort the regionserver.
We will attempt to interrupt any running handlers every
hbase.regionserver.close.wait.interval.ms (default 10000 (10 seconds)) until
either the close lock is acquired or we reach the maximum wait time.
Define a subset of region operations as interruptible. Track threads holding
the close lock transiting those operations. Set the thread interrupt status
of tracked threads when trying to close the region. Use the thread interrupt
status where safe to break out of request processing.
Signed-off-by: Bharath Vissapragada <bharathv@apache.org>
Signed-off-by: Duo Zhang <zhangduo@apache.org>
Signed-off-by: Reid Chan <reidchan@apache.org>
Signed-off-by: Viraj Jasani <vjasani@apache.org>
Patch fixes the single table input format case.
Signed-off-by: Michael Stack <stack@apache.org>
Signed-off-by: Reid Chan <reidchan@apache.org>
Signed-off-by: Bharath Vissapragada <bharathv@apache.org>
Change the test to wait for evidence that the active master has seen
that the backup master killed by the test has gone away. This is done
before proceeding to validate that the dead backup is correctly
omitted from the ClusterStatus report.
Also, minor fixup to several assertions, using `assertEquals` instead
of `assertTrue(...equals(...))` and correcting expected vs. actual
ordering of assertion arguments.
Signed-off-by: Michael Stack <stack@apache.org>
This patch adds the ability to discover newly added masters
dynamically on the master registry side. The trigger for the
re-fetch is either periodic (5 mins) or any registry RPC failure.
Master server information is cached in masters to avoid repeated
ZK lookups.
Updates the client side connection metrics to maintain a counter
per RPC type so that clients have visibility into counts grouped
by RPC method name.
I didn't add the method to ZK registry interface since there
is a design discussion going on in splittable meta doc. We can
add it later if needed.
Signed-off-by: Nick Dimiduk <ndimiduk@apache.org>
Signed-off-by: Viraj Jasani <vjasani@apache.org>
Signed-off-by: Duo Zhang <zhangduo@apache.org>
Signed-off-by: Andrew Purtell <apurtell@apache.org>
(cherry picked from commit 275a38e153)
(cherry picked from commit bb9121da77)
* HBASE-23304: RPCs needed for client meta information lookup
This patch implements the RPCs needed for the meta information
lookup during connection init. New tests added to cover the RPC
code paths. HBASE-23305 builds on this to implement the client
side logic.
Fixed a bunch of checkstyle nits around the places the patch
touches.
Signed-off-by: Andrew Purtell <apurtell@apache.org>
(cherry picked from commit 4f8fbba0c0)
(cherry picked from commit 488460e840)
* HBASE-23281: Track meta region changes on masters
This patch adds a simple cache that tracks the meta region replica
locations. It keeps an eye on the region movements so that the
cached locations are not stale.
This information is used for servicing client RPCs for connections
that use master based registry (HBASE-18095). The RPC end points
will be added in a separate patch.
Signed-off-by: Nick Dimiduk <ndimiduk@apache.org>
Signed-off-by: Andrew Purtell <apurtell@apache.org>
(cherry picked from commit 8571d389cf)
(cherry picked from commit 89581d9d21)
Currently we just track whether an active master exists.
It helps to also track the address of the active master in
all the masters to help serve the client RPC requests to
know which master is active.
Signed-off-by: Nick Dimiduk <ndimiduk@apache.org>
Signed-off-by: Andrew Purtell <apurtell@apache.org>
(cherry picked from commit efebb843af)
(cherry picked from commit 742949165f)
This patch implements a simple cache that all the masters
can lookup to serve cluster ID to clients. Active HMaster
is still responsible for creating it but all the masters
will read it from fs to serve clients.
RPCs exposing it will come in a separate patch as a part of
HBASE-18095.
Signed-off-by: Andrew Purtell <apurtell@apache.org>
Signed-off-by: Wellington Chevreuil <wchevreuil@apache.org>
Signed-off-by: Guangxu Cheng <guangxucheng@gmail.com>
(cherry picked from commit c2e01f2398)
(cherry picked from commit 9ab652982b)
Note: branch-1 has design difference compared to other branches in the replication sub-system. HMaster does not coordinate replication actions in branch-1 and hence each RS is responsible for initing peers and updating ZK states. As part of this change we are updating zk state of peers after reading from configuration, so if there is a divergence in configuration across RS the result can be can be non-deterministic and the last RS RPC will win.
Signed-off-by: Bharath Vissapragada <bharathv@apache.org>
Looped through the test 100 times and it passes. Without the patch it fails
every ~10 runs or so.
Signed-off-by: Viraj Jasani <vjasani@apache.org>
Signed-off-by: Michael Stack <stack@apache.org>
Looped through the test 100 times and it passes. Without the patch it fails
every ~10 runs or so.
Signed-off-by: Viraj Jasani <vjasani@apache.org>
Signed-off-by: Michael Stack <stack@apache.org>
* refactor how we use connection to rely on the access method
* refactor initialization and cleanup of the shared connection
* incompatibly change HCTU's Configuration member variable to be final so it can be safely accessed from multiple threads.
Closes#2188
adapted for jdk7
Signed-off-by: Viraj Jasani <vjasani@apache.org>
(cherry picked from commit 86ebbdd8a2)
(cherry picked from commit 0806349ada)
if hbase.rowlock.wait.duration is <=0 then log a message and treat it as a value of 1ms.
amended for branches-1
Signed-off-by: Viraj Jasani <vjasani@apache.org>
(cherry picked from commit 840a55761b)
Rewrote the patch for branch-1 since master has significanly diverged.
(cherry picked from commit dc5ef7af1f8b9e386495a73924c9442203f65a77)
Co-authored-by: Bharath Vissapragada <bharathv@apache.org>
Signed-off-by: Bharath Vissapragada <bharathv@apache.org>
Signed-off-by: Sandeep Pal <50725353+sandeepvinayak@users.noreply.github.com>
Co-authored-by: Sandeep Pal <50725353+sandeepvinayak@users.noreply.github.com>
We observed this delete call to be a bottleneck for table with lots of
regions. Patch attempts to parallelize them.
Signed-off-by: Andrew Purtell <apurtell@apache.org>
(cherry picked from commit f07f30ae24)
Writing a test for this is tricky. There is enough coverage for
functional tests. Only concern is performance, but there is enough
logging for it to detect timed out/badly performing sync calls.
Additionally, this patch decouples the ZK event processing into it's
own thread rather than doing it in the EventThread's context. That
avoids deadlocks and stalls of the event thread.
Signed-off-by: Andrew Purtell <apurtell@apache.org>
Signed-off-by: Viraj Jasani <vjasani@apache.org>
(cherry picked from commit 84e246f9b1)
(cherry picked from commit 2379a25f0c)
* Add chaos monkey action for suspend/resume region servers
* Add these to relevant chaos monkeys
branch-1-backport-note: Graceful regionserver restart action wasn't
backported due to a dependency of "RegionMover" script. Can be done
later if needed.
Signed-off-by: Balazs Meszaros <meszibalu@apache.org>
Signed-off-by: Peter Somogyi <psomogyi@apache.org>
While we never expect table descriptors to be missing, a corrupt meta
can result in the master crashing before regions get assigned. We can
guard against that happening with a simple null-check.
Signed-off-by: Viraj Jasani <vjasani@apache.org>
Closes#1908
This one surfaced as a flake test but turns out to be a legit bug
in FIFOCompaction code. FifoCompaction does not check if an empty
store file is already being compacted by an in-flight compaction
request and still enqueues. It can potentially race with a running
compaction (as in this test case, see jira for the exact exception).
Fixes the edge case and cleans up the test code a bit.
Signed-off-by: Andrew Purtell <apurtell@apache.org>
HBASE-24479: Deflake TestCompaction#testStopStartCompaction
Polling of active compaction count is racy. Tightened the asserts
to be more reliable.
Reid Chan <reidchan@apache.org>
This utility is useful for any module that wants to detect
dynamic config changes. Having it to hbase-common makes it
accessible to all the other modules.
Signed-off-by: Michael Stack <stack@apache.org>
Signed-off-by: Viraj Jasani <vjasani@apache.org>