Add being able to configure netty thread counts. Enable socket reuse
(should not have any impact).
hbase-client/src/main/java/org/apache/hadoop/hbase/ipc/BlockingRpcConnection.java
Rename the threads we create in here so they are NOT named same was
threads created by Hadoop RPC.
hbase-client/src/main/java/org/apache/hadoop/hbase/ipc/DefaultNettyEventLoopConfig.java
hbase-client/src/main/java/org/apache/hadoop/hbase/ipc/NettyRpcClient.java
hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/wal/AsyncFSWAL.java
Allow configuring eventloopgroup thread count (so can override for
tests)
hbase-examples/src/main/java/org/apache/hadoop/hbase/client/example/HttpProxyExample.java
Enable socket resuse.
hbase-server/src/main/java/org/apache/hadoop/hbase/ipc/NettyRpcServer.java
Enable socket resuse and config for how many threads to use.
hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/HStore.java
hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/HRegion.java
hbase-server/src/main/java/org/apache/hadoop/hbase/util/ModifyRegionUtils.java
Thread name edit; drop the redundant 'Thread' suffix.
hbase-server/src/main/java/org/apache/hadoop/hbase/replication/regionserver/HFileReplicator.java
Make closeable and shutdown executor when called.
hbase-server/src/main/java/org/apache/hadoop/hbase/replication/regionserver/ReplicationSink.java
Call close on HFileReplicator
hbase-server/src/test/java/org/apache/hadoop/hbase/replication/TestReplicationBase.java
HDFS creates lots of threads. Use less of it so less threads overall.
hbase-server/src/test/resources/hbase-site.xml
hbase-server/src/test/resources/hdfs-site.xml
Constrain resources when running in test context.
hbase-server/src/test/resources/log4j.properties
Enable debug on netty to see netty configs in our log
pom.xml
Add system properties when we launch JVMs to constrain thread counts in
tests
Signed-off-by: Duo Zhang <zhangduo@apache.org>
A miscellaney. Add extra logging to help w/ debug to a bunch of tests.
Fix some issues particular where we ran into mismatched filesystem
complaint. Some modernizations, removal of unnecessary deletes
(especially after seeing tests fail in table delete), and cleanup.
Recategorized one tests because it starts four clusters in the one
JVM from medium to large. Finally, zk standalone server won't come
on occasion; added debug and thread dumping to help figure why (
manifests as test failing in startup saying master didn't launch).
hbase-mapreduce/src/test/java/org/apache/hadoop/hbase/snapshot/TestExportSnapshot.java
Fixes occasional mismatched filesystems where the difference is file:// vs file:///
or we pick up hdfs schema when it a local fs test. Had to do this
vetting of how we do make qualified on a Path in a few places, not
just here as a few tests failed with this same issue. Code in here is
used by a lot of tests that each in turn suffered this mismatch.
Refactor for clarity
hbase-mapreduce/src/test/java/org/apache/hadoop/hbase/snapshot/TestExportSnapshotV1NoCluster.java
Unused import.
hbase-procedure/src/test/java/org/apache/hadoop/hbase/procedure2/store/wal/TestWALProcedureStore.java
This test fails if tmp dir is not where it expects because tries to
make rootdir there. Give it a rootdir under test data dir.
hbase-server/src/test/java/org/apache/hadoop/hbase/TestZooKeeper.java
This change is probably useless. I think the issue is actually
a problem addressed later where our test for zk server being
up gets stuck and never times out.
hbase-server/src/test/java/org/apache/hadoop/hbase/client/TestSplitOrMergeStatus.java
Move off deprecated APIs.
hbase-server/src/test/java/org/apache/hadoop/hbase/master/balancer/BalancerTestBase.java
Log when we fail balance check for DEBUG Currently just says 'false'
hbase-server/src/test/java/org/apache/hadoop/hbase/master/procedure/TestSplitWALProcedure.java
NPEs on way out if setup failed.
hbase-server/src/test/java/org/apache/hadoop/hbase/regionserver/TestCompaction.java
Add logging when assert fails to help w/ DEBUG
hbase-server/src/test/java/org/apache/hadoop/hbase/regionserver/TestRegionServerAbortTimeout.java
Don't bother removing stuff on teardown. All gets thrown away anyways.
Saw a few hangs in here in the teardown where hdfs was down before
expected messing up shutdown.
hbase-zookeeper/src/main/java/org/apache/hadoop/hbase/zookeeper/MiniZooKeeperCluster.java
Add timeout on socket; was seeing check for zk server getting stuck
and never timing out (test time out in startup)
hbase-mapreduce/src/test/java/org/apache/hadoop/hbase/snapshot/TestExportSnapshotWithTemporaryDirectory.java
Write to test data dir instead.
Be careful about how we make qualified paths.
hbase-mapreduce/src/test/java/org/apache/hadoop/hbase/mapreduce/TestTableInputFormatScanBase.java
Remove snowflake configs.
hbase-server/src/test/java/org/apache/hadoop/hbase/replication/TestReplicationStatus.java
Add a hacky pause. Tried adding barriers but didn't work. Needs deep
dive.
hbase-zookeeper/src/main/java/org/apache/hadoop/hbase/zookeeper/MiniZooKeeperCluster.java
Remove code copied from zk and use zk methods directly instead.
A general problem is that zk cluster doesn't come up occasionally but
no clue why. Add thread dumping and state check.
These classifications come of running at various fork counts.. A test
may complete quick if low fork count but if it is accessing disk, it
will run much slower if fork count is high. This edit accommodates
some of this phenomenon.
Signed-off-by: Bharath Vissapragada <bharathv@apache.org>
Signed-off-by: Viraj Jasani <vjasani@apache.org>
Signed-off-by: Jan Hentschel <janh@apache.org>
HBASE-20981 - Rollback stateCount accounting thrown-off when exception out of rollbackState
Signed-off-by: Michael Stack <stack@apache.org>
Signed-off-by: Sakthi <sakthi@apache.org>
Signed-off-by: Peter Somogyi <psomogyi@apache.org>
(cherry picked from commit 8e8b9b698f)
* Add a bit of javadoc around SerialReplicationChecker.
* Miniscule edit to the profiler jsp page and then a bit of doc on how to make it work that might help.
* Add some detail if NPE getting BitSetNode to help w/ debug.
* Change HbckChore to log region names instead of encoded names; helps doing diagnostics; can take region name and query in shell to find out all about the region according to hbase:meta.
* Add some fix-it help inline in the HBCK Report page – how to fix.
* Add counts in procedures page so can see if making progress; move listing of WALs to end of the page.
The original fix is provided by Sergey Shelukhin, the UT is added by Duo Zhang
Amending-Author: Duo Zhang <zhangduo@apache.org>
Signed-off-by: Duo Zhang <zhangduo@apache.org>
This patch switches to the builder pattern by adding a helper method.
It also checks to ensure that the pattern is available (i.e. that
HBase is running on a hadoop version that supports it).
Amending-Author: Mike Drob <mdrob@apache.org>
Signed-off-by: tedyu <yuzhihong@gmail.com>
Signed-off-by: zhangduo <zhangduo@apache.org>
Provides an extra client descriptor to build a second
tarball with a reduced set of dependencies. Not of great
impact now, but will build the way for better in the future.
Signed-off-by: Sean Busbey <busbey@apache.org>
Conflicts:
hbase-assembly/pom.xml
Conflicts:
hbase-spark/pom.xml
A reattempt at fixing HBASE-20173 [AMv2] DisableTableProcedure concurrent to ServerCrashProcedure can deadlock
The scenario is a SCP after processing WALs, goes to assign regions that
were on the crashed server but a concurrent Procedure gets in there
first and tries to unassign a region that was on the crashed server
(could be part of a move procedure or a disable table, etc.). The
unassign happens to run AFTER SCP has released all RPCs that
were going against the crashed server. The unassign fails because the
server is crashed. The unassign used to suspend itself only it would
never be woken up because the server it was going against had already
been processed. Worse, the SCP could not make progress because the
unassign was suspended with the lock on a region that it wanted to
assign held making it so it could make no progress.
In here, we add to the unassign recognition of the state where it is
running post SCP cleanup of RPCs. If present, unassign moves to finish
instead of suspending itself.
Includes a nice unit test made by Duo Zhang that reproduces nicely the
hung scenario.
M hbase-server/src/main/java/org/apache/hadoop/hbase/master/assignment/FailedRemoteDispatchException.java
Moved this class back to hbase-procedure where it belongs.
M hbase-procedure/src/main/java/org/apache/hadoop/hbase/procedure2/NoNodeDispatchException.java
M hbase-procedure/src/main/java/org/apache/hadoop/hbase/procedure2/NoServerDispatchException.java
M hbase-procedure/src/main/java/org/apache/hadoop/hbase/procedure2/NullTargetServerDispatchException.java
Specializiations on FRDE so we can be more particular when we say there
was a problem.
M hbase-procedure/src/main/java/org/apache/hadoop/hbase/procedure2/RemoteProcedureDispatcher.java
Change addOperationToNode so we throw exceptions that give more detail
on issue rather than a mysterious true/false
M hbase-protocol-shaded/src/main/protobuf/MasterProcedure.proto
Undo SERVER_CRASH_HANDLE_RIT2. Bad idea (from HBASE-20173)
M hbase-server/src/main/java/org/apache/hadoop/hbase/master/ServerManager.java
Have expireServer return true if it actually queued an expiration. Used
later in this patch.
M hbase-server/src/main/java/org/apache/hadoop/hbase/master/assignment/AssignmentManager.java
Hide methods that shouldn't be public. Add a particular check used out
in unassign procedure failure processing.
M hbase-server/src/main/java/org/apache/hadoop/hbase/master/assignment/MoveRegionProcedure.java
Check that server we're to move from is actually online (might
catch a few silly move requests early).
M hbase-server/src/main/java/org/apache/hadoop/hbase/master/assignment/RegionStates.java
Add doc on ServerState. Wasn't being used really. Now we actually stamp
a Server OFFLINE after its WAL has been split. Means its safe to assign
since all WALs have been processed. Add methods to update SPLITTING
and to set it to OFFLINE after splitting done.
M hbase-server/src/main/java/org/apache/hadoop/hbase/master/assignment/RegionTransitionProcedure.java
Change logging to be new-style and less repetitive of info.
Cater to new way in which .addOperationToNode returns info (exceptions
rather than true/false).
M hbase-server/src/main/java/org/apache/hadoop/hbase/master/assignment/UnassignProcedure.java
Add looking for the case where we failed assign AND we should not
suspend because we will never be woken up because SCP is beyond
doing this for all stuck RPCs.
Some cleanup of the failure processing grouping where we can proceed.
TODOs have been handled in this refactor including the TODO that
wonders if it possible that there are concurrent fails coming in
(Yes).
M hbase-server/src/main/java/org/apache/hadoop/hbase/master/procedure/ServerCrashProcedure.java
Doc and removing the old HBASE-20173 'fix'.
Also updating ServerStateNode post WAL splitting so it gets marked
OFFLINE.
A hbase-server/src/test/java/org/apache/hadoop/hbase/master/TestServerCrashProcedureStuck.java
Nice test by Duo Zhang.
Signed-off-by: Umesh Agashe <uagashe@cloudera.com>
Signed-off-by: Duo Zhang <palomino219@gmail.com>
Signed-off-by: Mike Drob <mdrob@apache.org>
Add backoff when stuck in RegionTransitionProcedure, the subclass of
AssignProcedure and UnassignProcedure. Can happen when we go to
transition but the current Region state is not what we expect.
M hbase-procedure/src/main/java/org/apache/hadoop/hbase/procedure2/Procedure.java
Add doc on being able to suspend and wait on a timeout.
M hbase-protocol-shaded/src/main/protobuf/MasterProcedure.proto
Add 'attempt' counter so we can do backoff when we get stuck.
M hbase-server/src/main/java/org/apache/hadoop/hbase/master/assignment/AssignProcedure.java
M hbase-server/src/main/java/org/apache/hadoop/hbase/master/assignment/UnassignProcedure.java
Add persistence of new 'attempt' counter
M hbase-server/src/main/java/org/apache/hadoop/hbase/master/assignment/RegionTransitionProcedure.java
Doc data members that are persisted by subclasses given this is 'odd'.
Add a counter for 'attempts' used when 'stuck' to implement backoff.
Add suspend with timeout when 'stuck'. Add callback when timeout is
exhausted which does wakeup of this procedure.
A hbase-server/src/test/java/org/apache/hadoop/hbase/master/assignment/TestUnexpectedStateException.java
Test of backoff.
rollWriter() fails after creating the file and returns false. In next iteration of while loop in recoverLease() file list is refreshed.
Signed-off-by: Appy <appy@cloudera.com>
Remove commons-cli and commons-collections4 use. Account
for the newer internal protobuf version of 3.5.1.
Signed-off-by: Michael Stack <stack@apache.org>
Signed-off-by: Mike Drob <mdrob@apache.org>
Changes for HBASE-20027 seem to cause UI not showing up on default port in standalone mode. For concurrent
unit test execution, individual tests can set hbase.localcluster.assign.random.ports to true or modify
test/resources/hbase-site.xml.
Changes for HBASE-20027 seem to cause UI not showing up on default port in standalone mode. For concurrent
unit test execution, individual tests can set hbase.localcluster.assign.random.ports to true or modify
test/resources/hbase-site.xml.