Edit of log about archiving that shows in middle of a table create;
try to make it less disorientating.
Loosen assert. Compaction may have produced a single file only. Allow
for this.
Make this test less furious given it is inline w/ a bunch of unit
Add debug
Add wait on quota table to show up before moving forward; otherwise,
attempt at quota setting fails.
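The wait amounts to polling a condition until it holds or a deadline passes. A minimal sketch in plain Java; `WaitUtil.waitFor` and its parameters are hypothetical names for illustration, not what the patch uses (the real test presumably relies on HBase's own test utilities):

```java
import java.util.function.BooleanSupplier;

final class WaitUtil {
  private WaitUtil() {}

  /**
   * Polls the condition until it returns true or timeoutMs elapses.
   * Returns true if the condition held before the deadline, false otherwise
   * (including when the waiting thread is interrupted).
   */
  static boolean waitFor(long timeoutMs, long intervalMs, BooleanSupplier condition) {
    long deadline = System.nanoTime() + timeoutMs * 1_000_000L;
    while (true) {
      if (condition.getAsBoolean()) {
        return true;
      }
      if (System.nanoTime() - deadline >= 0) {
        return false;
      }
      try {
        Thread.sleep(intervalMs);
      } catch (InterruptedException e) {
        // Restore the interrupt flag and give up waiting.
        Thread.currentThread().interrupt();
        return false;
      }
    }
  }
}
```

A test would call something like `waitFor(60_000, 100, () -> tableExists())` before attempting the quota setting, where `tableExists()` stands in for whatever check the test has available.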
Remove asserts that expected regions to still have a presence in fs
after merge when a catalogjanitor may have cleaned up parent dirs.
Catch exception on way out and log it rather than let it fail test.
Wait on acl table before proceeding.
Remove spurious assert. Just before this it waits an arbitrary 10
seconds. Compactions could have completed inside this time. The spirit
of the test remains.
Get log cleaner to go down promptly; it's sticking around. See if this
helps with TestMasterShutdown
We get a rare NPE trying to sync. Make local copy of SyncFuture and see
if that helps.
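The message does not say where the NPE comes from. Assuming it is a shared reference being cleared by another thread between a null check and its use, the local-copy pattern looks roughly like this (class and field names are hypothetical, not the actual SyncFuture code):

```java
final class LocalCopyExample {
  // Hypothetical shared slot that another thread may clear at any time.
  private volatile Runnable pending;

  void set(Runnable r) {
    pending = r;
  }

  void clear() {
    pending = null;
  }

  /** Runs the pending task if one is present; returns whether it ran. */
  boolean runIfPresent() {
    // Read the field exactly once. Checking `pending != null` and then
    // calling `pending.run()` reads it twice and can NPE if it is cleared
    // in between.
    Runnable local = pending;
    if (local == null) {
      return false;
    }
    local.run();
    return true;
  }
}
```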
Compaction may have completed when not expected; allow for it.
Add wait before testing. Compaction may not have completed. Let
compaction complete before progressing and then test for empty cache.
Less resources.
Wait till online before we try and do compaction (else request is
Disable test that fails randomly w/ mockito complaint on some mac os
TestMasterShutdown... fix NPE in RSRpcDispatcher... catch it and convert
to false and have master check for successful startup.
Change parameter name and add javadoc to make it more clear what the
param actually is.
Move postOpenDeployTasks so if it fails to talk to the Master -- which
can happen on cluster shutdown -- then we will do cleanup of state;
without this the RS can get stuck and won't go down.
Add handleException so CRH looks more like UnassignRegionHandler and
AssignRegionHandler around exception handling. Add a bit of doc on
why CRH.
Right shift most of the body of process so can add in a finally
that cleans up rs.getRegionsInTransitionInRS on exception
(otherwise outstanding entries can stop a RS going down on cluster
(cherry picked from commit 6d9802fc2ea4d0da75164cabe58d620ade5c604a)
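The shape of the finally-based cleanup, as a rough sketch. The map below is a hypothetical stand-in for what rs.getRegionsInTransitionInRS returns, and the sketch assumes that on success some later step removes the entry:

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;

final class TransitionCleanupSketch {
  // Hypothetical stand-in for the regions-in-transition map held by the RS.
  final ConcurrentMap<String, Boolean> regionsInTransition = new ConcurrentHashMap<>();

  /** Runs the work for a region; if it throws, clears the in-transition entry. */
  void process(String regionName, Runnable work) {
    regionsInTransition.put(regionName, Boolean.TRUE);
    boolean succeeded = false;
    try {
      work.run();
      succeeded = true;
    } finally {
      if (!succeeded) {
        // A stale entry left behind here is the kind of thing that can keep
        // a server from going down.
        regionsInTransition.remove(regionName);
      }
    }
  }
}
```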
- consolidate checks made by master on behalf of balancer and
normalizer: deciding if the master is in a healthy state for
running any actions at all (skipRegionManagementAction). Normalizer
now does as balancer did previously.
- both balancer and normalizer make one final check on above
conditions between calculating an action plan and executing the
plan. should make the process more responsive to shutdown
- change normalizer to only consider acting on a region when it is in
the OPEN state. previously the normalizer would attempt to merge a
region that was already in a MERGING_NEW, MERGING, or MERGED state.
- fix some typos in variable names.
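The "check again between planning and executing" idea, sketched generically. `GuardedActionSketch` and its parameters are illustrative names only; `shouldSkip` plays the role that skipRegionManagementAction plays in the master:

```java
import java.util.List;
import java.util.function.BooleanSupplier;
import java.util.function.Consumer;
import java.util.function.Supplier;

final class GuardedActionSketch {
  private GuardedActionSketch() {}

  /**
   * Computes a plan and executes it, checking shouldSkip before planning and
   * again just before execution. Returns the number of steps executed.
   */
  static <T> int run(BooleanSupplier shouldSkip, Supplier<List<T>> planner,
      Consumer<T> executor) {
    if (shouldSkip.getAsBoolean()) {
      return 0;
    }
    List<T> plan = planner.get();
    // Planning can take a while; state may have changed (e.g. shutdown began).
    if (shouldSkip.getAsBoolean()) {
      return 0;
    }
    int executed = 0;
    for (T step : plan) {
      executor.accept(step);
      executed++;
    }
    return executed;
  }
}
```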
Add being able to configure netty thread counts. Enable socket reuse
(should not have any impact).
Rename the threads we create in here so they are NOT named the same as
threads created by Hadoop RPC.
Allow configuring eventloopgroup thread count (so can override for
Enable socket reuse.
Enable socket reuse and config for how many threads to use.
Thread name edit; drop the redundant 'Thread' suffix.
Make closeable and shutdown executor when called.
Call close on HFileReplicator
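Roughly what "make closeable and shutdown executor" means, as a standalone sketch. `PooledWorker` is a hypothetical class, not HFileReplicator itself, and the pool size is arbitrary:

```java
import java.io.Closeable;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

final class PooledWorker implements Closeable {
  private final ExecutorService pool = Executors.newFixedThreadPool(2);

  Future<?> submit(Runnable task) {
    return pool.submit(task);
  }

  boolean isShutdown() {
    return pool.isShutdown();
  }

  @Override
  public void close() {
    // Stop accepting work and interrupt running tasks so the pool's threads
    // do not linger after the owner is done.
    pool.shutdownNow();
  }
}
```

Callers can then use try-with-resources so the executor is shut down even when the body throws.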
HDFS creates lots of threads. Use fewer of them so fewer threads overall.
Constrain resources when running in test context.
Enable debug on netty to see netty configs in our log
Add system properties when we launch JVMs to constrain thread counts in
Restore behavior from before HBASE-21789 (hbase-2.2.0) where we convert
all exceptions to IOEs, even RuntimeExceptions. Actual fix is this change (in case
obscured by doc and lambda simplification):
} catch (Throwable e) {
- Throwables.propagateIfPossible(e, IOException.class);
+ // Throw if an IOE else wrap in an IOE EVEN IF IT IS a RuntimeException (e.g.
+ // a RejectedExecutionException because the hosting exception is shutting down.
+ // This is old behavior worth reexamining. Procedures doing merge or split
+ // currently don't handle RuntimeExceptions coming up out of meta table edits.
+ // Would have to work on this at least. See HBASE-23904.
+ Throwables.throwIfInstanceOf(e, IOException.class);
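For readers without Guava in their head, a plain-Java sketch of the same behavior. It assumes the line after throwIfInstanceOf wraps the throwable in a new IOException, which the excerpt above does not show; `IoeWrapSketch` is a hypothetical name:

```java
import java.io.IOException;
import java.util.concurrent.Callable;

final class IoeWrapSketch {
  private IoeWrapSketch() {}

  /** Calls the action; rethrows IOExceptions as-is and wraps anything else. */
  static <T> T callWrapping(Callable<T> action) throws IOException {
    try {
      return action.call();
    } catch (Throwable e) {
      if (e instanceof IOException) {
        throw (IOException) e;
      }
      // RuntimeExceptions (and everything else) are wrapped, keeping the cause.
      throw new IOException(e);
    }
  }
}
```

With this shape a RejectedExecutionException thrown during shutdown reaches callers as an IOException whose cause is the original exception.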
A miscellany. Add extra logging to help w/ debug to a bunch of tests.
Fix some issues, particularly where we ran into mismatched filesystem
complaint. Some modernizations, removal of unnecessary deletes
(especially after seeing tests fail in table delete), and cleanup.
Recategorized one test from medium to large because it starts four
clusters in the one JVM. Finally, zk standalone server won't come up
on occasion; added debug and thread dumping to help figure out why
(manifests as test failing in startup saying master didn't launch).
Fixes occasional mismatched filesystems where the difference is file:// vs file:///
or we pick up hdfs scheme when it is a local fs test. Had to do this
vetting of how we do make qualified on a Path in a few places, not
just here as a few tests failed with this same issue. Code in here is
used by a lot of tests that each in turn suffered this mismatch.
Refactor for clarity
Unused import.
This test fails if tmp dir is not where it expects because it tries to
make rootdir there. Give it a rootdir under test data dir.
This change is probably useless. I think the issue is actually
a problem addressed later where our test for zk server being
up gets stuck and never times out.
Move off deprecated APIs.
Log when we fail balance check for DEBUG. Currently just says 'false'.
NPEs on way out if setup failed.
Add logging when assert fails to help w/ DEBUG
Don't bother removing stuff on teardown. All gets thrown away anyways.
Saw a few hangs in here in the teardown where hdfs was down before
expected messing up shutdown.
Add timeout on socket; was seeing check for zk server getting stuck
and never timing out (test time out in startup)
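A minimal sketch of a bounded "is the server up" probe using plain java.net. `PortProbe` is a hypothetical helper, not the code touched by the patch; the point is only that both the connect and any later read get a timeout so the check cannot hang:

```java
import java.io.IOException;
import java.net.InetSocketAddress;
import java.net.Socket;

final class PortProbe {
  private PortProbe() {}

  /** Returns true if a TCP connection to host:port succeeds within timeoutMs. */
  static boolean isListening(String host, int port, int timeoutMs) {
    try (Socket socket = new Socket()) {
      // Bounds blocking reads, should the caller go on to read a reply.
      socket.setSoTimeout(timeoutMs);
      // Bounds the connect itself; the no-timeout connect can block far longer.
      socket.connect(new InetSocketAddress(host, port), timeoutMs);
      return true;
    } catch (IOException e) {
      // Covers connection refused and SocketTimeoutException alike.
      return false;
    }
  }
}
```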
Write to test data dir instead.
Be careful about how we make qualified paths.
Remove snowflake configs.
Add a hacky pause. Tried adding barriers but didn't work. Needs deep
Remove code copied from zk and use zk methods directly instead.
A general problem is that zk cluster doesn't come up occasionally but
no clue why. Add thread dumping and state check.
Master rpc server end point doesn't bind to localhost's
IP address by default. Instead, it looks up the hostname and
binds to the endpoint to which it resolves. MasterRegistry should
do the same when building the default server end point to talk to.
TestFromClientSideWithCoprocessor: Initialization bug causing parameterized
runs to fail.
TestCustomSaslAuthenticationProvider: Test config had to be fixed because
it was written pre-master registry implementation.
TestSnapshotScannerHDFSAclController: Cluster restart did not reset the
cached connection state.
There were a couple of issues.
- There was a leak of a file descriptor for hbck lock file. This
was contributing to all the "ConnectionRefused" stack traces since
it was trying to renew lease for an already expired mini dfs cluster.
This issue was there for a while, just that we noticed it now.
- After upgrade to JUnit 4.13, it looks like the behavior for test
timeouts has changed. Earlier the timeout seems to have applied for
each parameterized run, but now it looks like it is applied across
all the runs.
This patch fixes both the issues.
