Note: branch-1 has design difference compared to other branches in the replication sub-system. HMaster does not coordinate replication actions in branch-1 and hence each RS is responsible for initing peers and updating ZK states. As part of this change we are updating zk state of peers after reading from configuration, so if there is a divergence in configuration across RS the result can be can be non-deterministic and the last RS RPC will win.
Signed-off-by: Bharath Vissapragada <bharathv@apache.org>
Looped through the test 100 times and it passes. Without the patch it fails
every ~10 runs or so.
Signed-off-by: Viraj Jasani <vjasani@apache.org>
Signed-off-by: Michael Stack <stack@apache.org>
Looped through the test 100 times and it passes. Without the patch it fails
every ~10 runs or so.
Signed-off-by: Viraj Jasani <vjasani@apache.org>
Signed-off-by: Michael Stack <stack@apache.org>
* refactor how we use connection to rely on the access method
* refactor initialization and cleanup of the shared connection
* incompatibly change HCTU's Configuration member variable to be final so it can be safely accessed from multiple threads.
Closes#2188
adapted for jdk7
Signed-off-by: Viraj Jasani <vjasani@apache.org>
(cherry picked from commit 86ebbdd8a2df89de37c2c3bd50e64292eaf28b11)
(cherry picked from commit 0806349adab338330428c900588234d7f6fcfcc2)
if hbase.rowlock.wait.duration is <=0 then log a message and treat it as a value of 1ms.
amended for branches-1
Signed-off-by: Viraj Jasani <vjasani@apache.org>
(cherry picked from commit 840a55761b4b3db6bfcf8ca5b7ae67509fc21566)
Rewrote the patch for branch-1 since master has significanly diverged.
(cherry picked from commit dc5ef7af1f8b9e386495a73924c9442203f65a77)
Co-authored-by: Bharath Vissapragada <bharathv@apache.org>
Signed-off-by: Bharath Vissapragada <bharathv@apache.org>
Signed-off-by: Sandeep Pal <50725353+sandeepvinayak@users.noreply.github.com>
Co-authored-by: Sandeep Pal <50725353+sandeepvinayak@users.noreply.github.com>
We observed this delete call to be a bottleneck for table with lots of
regions. Patch attempts to parallelize them.
Signed-off-by: Andrew Purtell <apurtell@apache.org>
(cherry picked from commit f07f30ae24e6d0e9151bb76aebf78928ec9572e3)
Writing a test for this is tricky. There is enough coverage for
functional tests. Only concern is performance, but there is enough
logging for it to detect timed out/badly performing sync calls.
Additionally, this patch decouples the ZK event processing into it's
own thread rather than doing it in the EventThread's context. That
avoids deadlocks and stalls of the event thread.
Signed-off-by: Andrew Purtell <apurtell@apache.org>
Signed-off-by: Viraj Jasani <vjasani@apache.org>
(cherry picked from commit 84e246f9b197bfa4307172db5465214771b78d38)
(cherry picked from commit 2379a25f0c4f2bdd3ea91fa5e0ba63f034c8d21c)
* Add chaos monkey action for suspend/resume region servers
* Add these to relevant chaos monkeys
branch-1-backport-note: Graceful regionserver restart action wasn't
backported due to a dependency of "RegionMover" script. Can be done
later if needed.
Signed-off-by: Balazs Meszaros <meszibalu@apache.org>
Signed-off-by: Peter Somogyi <psomogyi@apache.org>
While we never expect table descriptors to be missing, a corrupt meta
can result in the master crashing before regions get assigned. We can
guard against that happening with a simple null-check.
Signed-off-by: Viraj Jasani <vjasani@apache.org>
Closes#1908