Lower the JDK 8 forked JVM heap from 2800 to 2200 and the JDK 11 heap from
3200 to 2200. Lower the mvn size from 4G to 3.6G.
Reduce the number of puts done by TestMultiRespectsLimits because it
pushed the forked heap over 2.5G.
Signed-off-by: Sean Busbey <busbey@apache.org>
* Reorganize MOB compaction tests for more reuse.
* Add tests for mob compaction after snapshot clone operations
* Note the original table used to write a given mob hfile and use that to find it later.
Signed-off-by: Esteban Gutierrez <esteban@apache.org>
- At DEBUG, log messages about RegionCountSkewCostFunction region/server totals
- At DEBUG, log messages about the decision to balance or not, with total costs
- At TRACE, log messages about the region count on each server that RegionCountSkewCostFunction sees
- At TRACE, log a message with the individual cost functions used in the decision to balance or not
Signed-off-by: Viraj Jasani <vjasani@apache.org>
* Use Reflection to access shaded Hadoop protobuf classes.
(cherry picked from commit a321e536989083ca3620bf2c53f12c07740bf5b0)
* Update to improve the code:
1. Added license.
2. Added more comments.
3. Wrap byte array instead of copy to make a ByteString.
4. Moved all reflection instantiation to static class loading time.
* Use LiteralByteString to wrap byte array instead of copying it.
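The two ideas above (do the reflective lookup once at class-load time, and wrap the caller's array rather than copy it) can be sketched with JDK classes only. This is not the patch's code: `java.nio.ByteBuffer.wrap(byte[])` stands in for the shaded protobuf wrapping factory, and the class name `ReflectiveWrapper` is hypothetical.

```java
import java.lang.reflect.Method;

// Hypothetical holder class: resolves a factory method reflectively once, when
// the class is loaded, instead of on every call. ByteBuffer.wrap(byte[]) stands
// in for the shaded protobuf class, which is not reproduced here.
final class ReflectiveWrapper {
  private static final String TARGET_CLASS = "java.nio.ByteBuffer";
  private static final Method WRAP;

  static {
    try {
      Class<?> clazz = Class.forName(TARGET_CLASS);
      WRAP = clazz.getMethod("wrap", byte[].class);
    } catch (ReflectiveOperationException e) {
      // Fail fast at class-load time rather than on first use.
      throw new ExceptionInInitializerError(e);
    }
  }

  private ReflectiveWrapper() {}

  /** Wraps (does not copy) the array via the cached Method. */
  static Object wrap(byte[] bytes) {
    try {
      return WRAP.invoke(null, (Object) bytes);
    } catch (ReflectiveOperationException e) {
      throw new IllegalStateException("reflective wrap failed", e);
    }
  }
}
```

Resolving the `Method` in a static initializer means a missing class surfaces when the holder class loads, not on the first request that happens to need it.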
Signed-off-by: Duo Zhang <zhangduo@apache.org>
Signed-off-by: stack <stack@apache.org>
hbase-server/src/test/java/org/apache/hadoop/hbase/TestClusterPortAssignment.java
Saw a case where the Master failed startup but the failure surfaced as an
IOE, so we did not trip the retry logic.
hbase-server/src/test/java/org/apache/hadoop/hbase/TestInfoServers.java
Add some debug and raise timeouts. This test fails frequently for me
locally.
hbase-server/src/test/java/org/apache/hadoop/hbase/client/locking/TestEntityLocks.java
Raise the wait from 2x 200ms to 10x 200ms to ride out a hardware or GC pause.
This test fails locally and up on Jenkins.
hbase-server/src/test/java/org/apache/hadoop/hbase/regionserver/TestClearRegionBlockCache.java
Debug. Have the assert report the bad count.
hbase-server/src/test/java/org/apache/hadoop/hbase/regionserver/TestCompactingToCellFlatMapMemStore.java
Fails on occasion; the found count is off by a few. Tricky to debug. HBASE-24129 to re-enable.
hbase-server/src/test/java/org/apache/hadoop/hbase/regionserver/TestRegionMergeTransactionOnCluster.java
Debug. Add a wait and check before moving on to the assert.
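Many of the test fixes in this log replace a fixed sleep or an immediate assert with a bounded wait. A minimal, hypothetical poll-until-true helper is sketched below; HBase's own test utilities provide similar helpers, and this standalone version does not reproduce their API.

```java
import java.util.concurrent.TimeUnit;
import java.util.function.BooleanSupplier;

// Hypothetical helper: poll a condition until it holds or a deadline passes.
final class WaitUtil {
  private WaitUtil() {}

  /** Returns true as soon as condition holds, or false once timeoutMs has elapsed. */
  static boolean waitFor(long timeoutMs, long intervalMs, BooleanSupplier condition)
      throws InterruptedException {
    long deadline = System.nanoTime() + TimeUnit.MILLISECONDS.toNanos(timeoutMs);
    while (true) {
      if (condition.getAsBoolean()) {
        return true;
      }
      if (System.nanoTime() - deadline >= 0) { // overflow-safe comparison
        return false;
      }
      Thread.sleep(intervalMs);
    }
  }
}
```

Polling with a deadline keeps a slow machine from failing the test while still bounding how long a genuinely broken test can hang.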
hbase-thrift/src/test/java/org/apache/hadoop/hbase/thrift/TestThriftHttpServer.java
Check for null before shutting down; the server can be null if start failed.
hbase-thrift/src/test/java/org/apache/hadoop/hbase/thrift/TestThriftServerCmdLine.java
Add a retry in case the client's connection fails. Fails for me locally.
Addendum:
For the major compaction test, set hbase.hstore.compaction.min to a large
number so minor compactions do not kick in; they pollute the compaction
state and sometimes prevent the major compaction from happening.
Co-authored-by: Huaxiang Sun <huaxiangsun@apache.com>
Signed-off-by: stack <stack@apache.org>
hbase-server/src/test/java/org/apache/hadoop/hbase/replication/TestReplicationTrackerZKImpl.java
Add debug for when the assert fails (it fails on occasion locally).
hbase-server/src/test/java/org/apache/hadoop/hbase/security/access/TestHDFSAclHelper.java
Move this inner class out to stand alone since it is now used by two tests.
hbase-server/src/test/java/org/apache/hadoop/hbase/security/access/TestSnapshotScannerHDFSAclController.java
Moved out testRestoreSnapshot and made methods in here static so they can
be used by a new adjacent test. Also made table names unique per method,
thinking that was the root of the original issue (it wasn't, but there is
no harm in the change). Moved out the inner class TestHDFSAclHelper.
hbase-server/src/test/java/org/apache/hadoop/hbase/security/access/TestSnapshotScannerHDFSAclController2.java
New class that sets up the same context as
TestSnapshotScannerHDFSAclController but runs only the single
testRestoreSnapshot method.
hbase-server/src/test/java/org/apache/hadoop/hbase/security/token/TestZKSecretWatcher.java
Some debug.
Signed-off-by: Yi Mei
JMXCacheBuster resets the metrics state at various points in time. These
events can potentially race with a master shutdown. When the master is
tearing down, metrics initialization can touch a lot of unsafe state,
for example invalidated FS objects. To avoid this, this patch makes
getMetrics() a no-op when the master is either stopped or in the
process of shutting down. Additionally, getClusterId() is made a no-op
when the server is shutting down.
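A minimal sketch of the guard described above, assuming a simple shutdown flag; `GuardedMetricsSource`, `beginShutdown()` and the `clusterIdLookup` supplier are hypothetical names, not the HBase metrics API. The check narrows the race window rather than closing it entirely, since shutdown can still begin right after the flag is read.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.Map;
import java.util.concurrent.atomic.AtomicBoolean;
import java.util.function.Supplier;

// Hypothetical metrics source that stops touching shared state once shutdown starts.
final class GuardedMetricsSource {
  private final AtomicBoolean stopping = new AtomicBoolean(false);
  private final Supplier<String> clusterIdLookup; // e.g. a read from the filesystem

  GuardedMetricsSource(Supplier<String> clusterIdLookup) {
    this.clusterIdLookup = clusterIdLookup;
  }

  void beginShutdown() {
    stopping.set(true);
  }

  Map<String, String> getMetrics() {
    if (stopping.get()) {
      return Collections.emptyMap(); // no-op while stopping or stopped
    }
    Map<String, String> metrics = new HashMap<>();
    metrics.put("clusterId", getClusterId());
    return metrics;
  }

  String getClusterId() {
    return stopping.get() ? null : clusterIdLookup.get(); // no-op during shutdown
  }
}
```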
Writing a test that simulates this is tricky, but with the patch I no longer
see the long stack traces from the JIRA locally.
Signed-off-by: Michael Stack <stack@apache.org>
hbase-client/src/main/java/org/apache/hadoop/hbase/client/ClientAsyncPrefetchScanner.java
Refactor to avoid an NPE timing issue when referencing the lock during construction.
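One common way a field can be null during construction in Java is a superclass constructor calling an overridable method before the subclass's field initializers have run. The sketch below shows that failure and one possible fix using hypothetical classes; it is not the actual ClientAsyncPrefetchScanner change.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.concurrent.locks.Lock;
import java.util.concurrent.locks.ReentrantLock;

// Hypothetical base class whose constructor calls an overridable hook.
abstract class BaseScanner {
  BaseScanner() {
    initCache(); // runs BEFORE any subclass field initializers
  }

  protected abstract void initCache();
}

// Broken: 'lock' is still null when BaseScanner() calls initCache().
class BrokenPrefetchScanner extends BaseScanner {
  private final Lock lock = new ReentrantLock();
  private Deque<String> cache;

  @Override
  protected void initCache() {
    lock.lock(); // NullPointerException
    try {
      cache = new ArrayDeque<>();
    } finally {
      lock.unlock();
    }
  }
}

// One possible fix: create the state inside the hook and give the fields no
// initializers, since an initializer would run after BaseScanner() and
// overwrite what initCache() assigned.
class FixedPrefetchScanner extends BaseScanner {
  private Lock lock;
  private Deque<String> cache;

  @Override
  protected void initCache() {
    lock = new ReentrantLock();
    cache = new ArrayDeque<>();
  }

  int cacheSize() {
    lock.lock();
    try {
      return cache.size();
    } finally {
      lock.unlock();
    }
  }
}
```

Where the class hierarchy allows it, the cleaner cure is to stop calling overridable methods from a constructor altogether.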
hbase-server/src/main/java/org/apache/hadoop/hbase/io/hfile/bucket/BucketCache.java
Comment
hbase-server/src/main/java/org/apache/hadoop/hbase/master/procedure/RSProcedureDispatcher.java
Refactor. Catch NPE during startup and report it as a failed initialization instead.
hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/CompactSplit.java
Catch IndexOutOfBoundsException and convert it to a non-split request.
hbase-server/src/test/java/org/apache/hadoop/hbase/TestCachedClusterId.java
Make it less furious, and so less flaky.
hbase-server/src/test/java/org/apache/hadoop/hbase/TestServerSideScanMetricsFromClientSide.java
Debug. Catch exception to log, then rethrow.
hbase-server/src/test/java/org/apache/hadoop/hbase/client/TestAsyncRegionAdminApi.java
A guess: waiting longer for compaction to succeed may make this
less flaky.
hbase-server/src/test/java/org/apache/hadoop/hbase/client/TestFromClientSide3.java
Set timestamps explicitly so a concurrent edit landing server-side
cannot upset the test's expectations.
hbase-server/src/test/java/org/apache/hadoop/hbase/client/TestMasterRegistry.java
Add wait on meta before proceeding w/ test.
hbase-server/src/test/java/org/apache/hadoop/hbase/client/TestScannersFromClientSide.java
Be explicit that edits are distinct.
hbase-server/src/test/java/org/apache/hadoop/hbase/io/hfile/bucket/TestBucketCacheRefCnt.java
Add @Ignore on the RAM test; it fails sporadically.
hbase-server/src/test/java/org/apache/hadoop/hbase/master/assignment/TestRegionMoveAndAbandon.java
Wait for all RegionServers to go down before proceeding; otherwise
RS accounting gets messed up.
hbase-server/src/test/java/org/apache/hadoop/hbase/master/balancer/BalancerTestBase.java
Make the balancer test sloppier (less restrictive); it would fail on
occasion by landing just outside the test limits.
hbase-server/src/test/java/org/apache/hadoop/hbase/quotas/TestQuotaObserverChoreRegionReports.java
Add wait on quota table coming up; helps make this less flaky.
hbase-server/src/test/java/org/apache/hadoop/hbase/regionserver/TestHRegion.java
Be explicit about timestamps; see if this helps with the flaky failure.
hbase-server/src/test/java/org/apache/hadoop/hbase/regionserver/TestRegionReplicas.java
Catch and ignore problems during shutdown; we don't care once the test is done.
hbase-server/src/test/java/org/apache/hadoop/hbase/regionserver/TestRegionServerReportForDuty.java
Comment.
hbase-server/src/test/java/org/apache/hadoop/hbase/security/access/TestAccessController.java
Add a retry to see if it helps with an odd failure; perhaps the grant hasn't propagated yet?
hbase-server/src/test/java/org/apache/hadoop/hbase/security/access/TestCellACLWithMultipleVersions.java
Be explicit with timestamps so puts cannot accidentally overlap.
hbase-thrift/src/test/java/org/apache/hadoop/hbase/thrift/TestThriftHttpServer.java
hbase-thrift/src/test/java/org/apache/hadoop/hbase/thrift/TestThriftServerCmdLine.java
Hack to deal w/ BindException on startup.
hbase-thrift/src/test/java/org/apache/hadoop/hbase/thrift2/TestThrift2ServerCmdLine.java
Use loopback.
hbase-thrift/src/test/java/org/apache/hadoop/hbase/thrift2/TestThriftHBaseServiceHandler.java
Disable flaky test.
Signed-off-by: Bharath Vissapragada <bharathv@apache.org>
The PLAIN mechanism test added in the Shade authentication example has
different semantics than the GSSAPI mechanism: the client reports that the
handshake is done after computing its response to the initial challenge.
The javadoc on SaslClient, however, tells us that we need to wait for a
response from the server before proceeding.
As best I can see, the client does not receive any data from HBase;
however, as a result of this bug, the application semantics (e.g. throwing
an exception on an authentication error) do not work as we intend.
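The pitfall can be reproduced with the JDK's built-in PLAIN `SaslClient`, which reports `isComplete()` as soon as it has produced its initial response. The sketch below drives the exchange so the client only finishes after hearing back from the server. The `ServerReply` type and the `server` function are hypothetical stand-ins for the RPC round trip; `Sasl.createSaslClient` and the callback classes are standard `javax.security` APIs.

```java
import java.util.function.Function;
import javax.security.auth.callback.Callback;
import javax.security.auth.callback.NameCallback;
import javax.security.auth.callback.PasswordCallback;
import javax.security.auth.callback.UnsupportedCallbackException;
import javax.security.sasl.Sasl;
import javax.security.sasl.SaslClient;
import javax.security.sasl.SaslException;

final class SaslHandshake {
  /** Hypothetical server reply: an optional token plus the server's verdict. */
  static final class ServerReply {
    final byte[] token;
    final boolean done;
    final boolean failed;

    ServerReply(byte[] token, boolean done, boolean failed) {
      this.token = token;
      this.done = done;
      this.failed = failed;
    }
  }

  /**
   * Drives the client side of a SASL exchange. The loop ends only on a server
   * reply, never on client.isComplete() alone, because PLAIN reports complete
   * as soon as it has produced its initial response.
   */
  static void authenticate(SaslClient client, Function<byte[], ServerReply> server)
      throws SaslException {
    byte[] response =
        client.hasInitialResponse() ? client.evaluateChallenge(new byte[0]) : new byte[0];
    while (true) {
      ServerReply reply = server.apply(response); // wait for the server's verdict
      if (reply.failed) {
        throw new SaslException("Server rejected authentication");
      }
      if (reply.done) {
        if (!client.isComplete() && reply.token != null) {
          client.evaluateChallenge(reply.token); // consume final server data
        }
        return;
      }
      response = client.evaluateChallenge(reply.token);
    }
  }

  /** Creates the JDK's built-in PLAIN client; protocol and server name are placeholders. */
  static SaslClient plainClient(String user, char[] password) throws SaslException {
    return Sasl.createSaslClient(new String[] {"PLAIN"}, null, "hbase", "localhost", null,
        callbacks -> {
          for (Callback cb : callbacks) {
            if (cb instanceof NameCallback) {
              ((NameCallback) cb).setName(user);
            } else if (cb instanceof PasswordCallback) {
              ((PasswordCallback) cb).setPassword(password);
            } else {
              throw new UnsupportedCallbackException(cb);
            }
          }
        });
  }
}
```

With this shape, an authentication failure reported by the server turns into an exception on the client even for mechanisms that consider themselves finished locally.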
Extra trace logging was also added to debug this, should a similar error
ever happen again with some other mechanism.
Closes #1260
Signed-off-by: Duo Zhang <zhangduo@apache.org>
Signed-off-by: Bharath Vissapragada <bharathv@apache.org>
hbase-server/src/main/java/org/apache/hadoop/hbase/master/procedure/DeleteTableProcedure.java
Edit the log message about archiving that shows up in the middle of a table
create; try to make it less disorientating.
hbase-server/src/test/java/org/apache/hadoop/hbase/client/TestAsyncRegionAdminApi.java
Loosen assert. Compaction may have produced a single file only. Allow
for this.
hbase-server/src/test/java/org/apache/hadoop/hbase/client/TestAsyncTableGetMultiThreaded.java
Make this test less furious given it runs in line with a bunch of other unit
tests.
hbase-server/src/test/java/org/apache/hadoop/hbase/client/TestFromClientSide3.java
Add debug
hbase-server/src/test/java/org/apache/hadoop/hbase/quotas/TestQuotaObserverChoreRegionReports.java
Add wait on quota table to show up before moving forward; otherwise,
the attempt to set a quota fails.
hbase-server/src/test/java/org/apache/hadoop/hbase/regionserver/TestHRegion.java
Debug
hbase-server/src/test/java/org/apache/hadoop/hbase/regionserver/TestRegionMergeTransactionOnCluster.java
Remove asserts that expected regions to still be present in the fs
after the merge, when the CatalogJanitor may already have cleaned up the parent dirs.
hbase-server/src/test/java/org/apache/hadoop/hbase/regionserver/TestRegionReplicas.java
Catch exception on way out and log it rather than let it fail test.
hbase-server/src/test/java/org/apache/hadoop/hbase/security/access/TestSnapshotScannerHDFSAclController.java
Wait on acl table before proceeding.
hbase-rsgroup/src/test/java/org/apache/hadoop/hbase/rsgroup/TestRSGroupMajorCompactionTTL.java
Remove spurious assert. Just before this it waits an arbitrary 10
seconds. Compactions could have completed inside this time. The spirit
of the test remains.
hbase-server/src/main/java/org/apache/hadoop/hbase/master/cleaner/HFileCleaner.java
Get log cleaner to go down promptly; it's sticking around. See if this
helps with TestMasterShutdown.
hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/wal/FSHLog.java
We get a rare NPE trying to sync. Make a local copy of the SyncFuture and see
if that helps.
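Reading a field that another thread may null out, once for the check and again for the use, is a classic source of rare NPEs; copying it into a local first makes the check and the use see the same object. A sketch with hypothetical names, using `CompletableFuture` in place of HBase's `SyncFuture`:

```java
import java.util.concurrent.CompletableFuture;

// Hypothetical holder whose 'pending' field another thread may clear at any time.
final class SyncRunner {
  private volatile CompletableFuture<Long> pending;

  void offer(CompletableFuture<Long> future) {
    pending = future;
  }

  void clear() {
    pending = null;
  }

  /** Racy: the field is read twice, and may become null between the reads. */
  boolean completeRacy(long txid) {
    if (pending != null) {
      return pending.complete(txid); // second read can observe null -> NPE
    }
    return false;
  }

  /** Reads the field once into a local, so the check and the use see one object. */
  boolean completeSafely(long txid) {
    CompletableFuture<Long> local = pending;
    if (local != null) {
      return local.complete(txid);
    }
    return false;
  }
}
```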
hbase-server/src/test/java/org/apache/hadoop/hbase/client/TestAsyncRegionAdminApi.java
Compaction may have completed when not expected; allow for it.
hbase-server/src/test/java/org/apache/hadoop/hbase/client/TestBlockEvictionFromClient.java
Add wait before testing. Compaction may not have completed. Let
compaction complete before progressing and then test for empty cache.
hbase-server/src/test/java/org/apache/hadoop/hbase/master/TestMasterShutdown.java
Use fewer resources.
hbase-server/src/test/java/org/apache/hadoop/hbase/master/balancer/TestDefaultLoadBalancer.java
Use fewer resources.
hbase-server/src/test/java/org/apache/hadoop/hbase/regionserver/TestEndToEndSplitTransaction.java
Wait until online before we try to compact (otherwise the request is
ignored).
hbase-server/src/test/java/org/apache/hadoop/hbase/tool/TestCanaryTool.java
Disable a test that fails randomly with a Mockito complaint on some
macOS machines.
TestMasterShutdown: fix NPE in RSRpcDispatcher by catching it and converting
it to false, and have the master check for successful startup.
hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/HRegionServer.java
Change parameter name and add javadoc to make it more clear what the
param actually is.
hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/handler/AssignRegionHandler.java
Move postOpenDeployTasks so that if it fails to talk to the Master (which
can happen on cluster shutdown) we still clean up state; without this,
the RS can get stuck and won't go down.
hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/handler/CloseRegionHandler.java
Add handleException so CRH handles exceptions more like UnassignRegionHandler
and AssignRegionHandler. Add a bit of doc on why CRH exists.
hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/handler/UnassignRegionHandler.java
Right-shift most of the body of process so we can add a finally
that cleans up rs.getRegionsInTransitionInRS on exception
(otherwise outstanding entries can stop an RS from going down on cluster
shutdown).
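The try/finally shape described above, sketched with a hypothetical `UnassignSketch` class and a plain `ConcurrentMap` standing in for the region server's regions-in-transition map:

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;

// Hypothetical handler; the map stands in for rs.getRegionsInTransitionInRS().
final class UnassignSketch {
  final ConcurrentMap<String, Boolean> regionsInTransition = new ConcurrentHashMap<>();

  void process(String encodedRegionName, Runnable closeRegion) {
    regionsInTransition.put(encodedRegionName, Boolean.FALSE); // FALSE marks a close in this sketch
    try {
      closeRegion.run();
    } finally {
      // Runs on success and on exception, so a failed close cannot leave an
      // entry behind that keeps the region server from shutting down.
      regionsInTransition.remove(encodedRegionName);
    }
  }
}
```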
Signed-off-by: Nick Dimiduk <ndimiduk@apache.org>
Signed-off-by: Duo Zhang <zhangduo@apache.org>
hbase-zookeeper/src/main/java/org/apache/hadoop/hbase/zookeeper/MiniZooKeeperCluster.java
Have client and server use loopback instead of 'localhost'
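Resolving 'localhost' depends on the host's name-service configuration (it may, for example, yield `::1` on one side and `127.0.0.1` on the other), whereas `InetAddress.getLoopbackAddress()` needs no lookup. A minimal JDK-only sketch of binding a server to loopback; `LoopbackExample` is a hypothetical name, not MiniZooKeeperCluster's code.

```java
import java.io.IOException;
import java.net.InetAddress;
import java.net.InetSocketAddress;
import java.net.ServerSocket;

final class LoopbackExample {
  /** Binds an ephemeral port on the loopback address, with no hostname lookup. */
  static ServerSocket bindLoopback() throws IOException {
    ServerSocket server = new ServerSocket();
    try {
      // Port 0 asks the OS for any free port.
      server.bind(new InetSocketAddress(InetAddress.getLoopbackAddress(), 0));
    } catch (IOException e) {
      server.close(); // don't leak the unbound socket
      throw e;
    }
    return server;
  }
}
```

A client then connects to the same `InetAddress.getLoopbackAddress()` plus the server's local port, so both ends agree on the address by construction.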
Signed-off-by: Duo Zhang <zhangduo@apache.org>
Signed-off-by: Jan Hentschel <janh@apache.org>
Add the ability to configure Netty thread counts. Enable socket reuse
(should not have any impact).
hbase-client/src/main/java/org/apache/hadoop/hbase/ipc/BlockingRpcConnection.java
Rename the threads we create in here so they are NOT named the same as
threads created by Hadoop RPC.
hbase-client/src/main/java/org/apache/hadoop/hbase/ipc/DefaultNettyEventLoopConfig.java
hbase-client/src/main/java/org/apache/hadoop/hbase/ipc/NettyRpcClient.java
hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/wal/AsyncFSWAL.java
Allow configuring the event loop group thread count (so tests can override
it).
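A JDK-only sketch of an executor whose thread count can be overridden by a system property and whose threads carry a distinct name prefix. The property name `example.rpc.eventloop.threads`, the default of twice the processor count, and the class name are assumptions for illustration; the patch itself configures Netty event loop groups, which are not shown here.

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.ThreadFactory;
import java.util.concurrent.atomic.AtomicInteger;

final class ConfigurableEventLoop {
  // Hypothetical property name; not a real HBase or Netty setting.
  static final String THREADS_PROPERTY = "example.rpc.eventloop.threads";

  /** Thread count from the system property, defaulting to 2x the processor count. */
  static int threadCount() {
    int defaultThreads = Runtime.getRuntime().availableProcessors() * 2;
    return Integer.getInteger(THREADS_PROPERTY, defaultThreads);
  }

  /** Names threads "<prefix>-<n>" so they stand out in thread dumps. */
  static ThreadFactory namedFactory(String prefix) {
    AtomicInteger seq = new AtomicInteger();
    return runnable -> {
      Thread t = new Thread(runnable, prefix + "-" + seq.incrementAndGet());
      t.setDaemon(true);
      return t;
    };
  }

  static ExecutorService newPool(String prefix) {
    return Executors.newFixedThreadPool(threadCount(), namedFactory(prefix));
  }
}
```

A test JVM can then be launched with something like `-Dexample.rpc.eventloop.threads=3` to cap the thread count without code changes.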
hbase-examples/src/main/java/org/apache/hadoop/hbase/client/example/HttpProxyExample.java
Enable socket reuse.
hbase-server/src/main/java/org/apache/hadoop/hbase/ipc/NettyRpcServer.java
Enable socket reuse and add config for how many threads to use.
hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/HStore.java
hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/HRegion.java
hbase-server/src/main/java/org/apache/hadoop/hbase/util/ModifyRegionUtils.java
Thread name edit; drop the redundant 'Thread' suffix.
hbase-server/src/main/java/org/apache/hadoop/hbase/replication/regionserver/HFileReplicator.java
Make it Closeable and shut down the executor when close is called.
hbase-server/src/main/java/org/apache/hadoop/hbase/replication/regionserver/ReplicationSink.java
Call close on HFileReplicator
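Making the replicator `Closeable` lets its owner release the thread pool deterministically, for example with try-with-resources. A sketch with a hypothetical `BulkLoadReplicator` class rather than HFileReplicator's real fields:

```java
import java.io.Closeable;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;

/** Hypothetical replicator that owns a thread pool and releases it on close(). */
final class BulkLoadReplicator implements Closeable {
  private final ExecutorService exec = Executors.newFixedThreadPool(2);

  Future<String> copy(String file) {
    return exec.submit(() -> "copied " + file);
  }

  boolean isShutdown() {
    return exec.isShutdown();
  }

  @Override
  public void close() {
    exec.shutdown(); // stop accepting work; let in-flight tasks finish
    try {
      if (!exec.awaitTermination(10, TimeUnit.SECONDS)) {
        exec.shutdownNow();
      }
    } catch (InterruptedException e) {
      exec.shutdownNow();
      Thread.currentThread().interrupt(); // preserve the interrupt for callers
    }
  }
}
```

Without the `close()` call, each replicator instance would leave its pool threads alive, which adds up in tests that create many of them.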
hbase-server/src/test/java/org/apache/hadoop/hbase/replication/TestReplicationBase.java
HDFS creates lots of threads. Use less of it so there are fewer threads overall.
hbase-server/src/test/resources/hbase-site.xml
hbase-server/src/test/resources/hdfs-site.xml
Constrain resources when running in test context.
hbase-server/src/test/resources/log4j.properties
Enable DEBUG on Netty to see Netty configs in our log.
pom.xml
Add system properties when we launch JVMs to constrain thread counts in
tests
Signed-off-by: Duo Zhang <zhangduo@apache.org>
- consolidate checks made by master on behalf of balancer and
normalizer: deciding if the master is in a healthy state for
running any actions at all (skipRegionManagementAction). Normalizer
now does as balancer did previously.
- both balancer and normalizer make one final check on the above
  conditions between calculating an action plan and executing the
  plan. This should make the process more responsive to shutdown
  requests.
- change normalizer to only consider acting on a region when it is in
  the OPEN state. Previously the normalizer would attempt to merge a
  region that was already in a MERGING_NEW, MERGING, or MERGED state.
- fix some typos in variable names.
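The two normalizer rules above (consider only OPEN regions, and re-check master health between planning and executing) sketched with hypothetical types; the real `skipRegionManagementAction` check is modeled here as a plain `BooleanSupplier`.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.BooleanSupplier;

final class NormalizerSketch {
  // Hypothetical subset of region states.
  enum State { OPEN, MERGING, MERGING_NEW, MERGED, CLOSED }

  static final class Region {
    final String name;
    final State state;

    Region(String name, State state) {
      this.name = name;
      this.state = state;
    }
  }

  /** Only OPEN regions are candidates for a normalization plan. */
  static List<Region> candidates(List<Region> regions) {
    List<Region> open = new ArrayList<>();
    for (Region r : regions) {
      if (r.state == State.OPEN) {
        open.add(r);
      }
    }
    return open;
  }

  /**
   * Re-checks the master just before executing an already computed plan, so a
   * shutdown requested while planning stops execution.
   * @return number of plan steps executed
   */
  static int execute(List<Runnable> plan, BooleanSupplier skipRegionManagementAction) {
    if (skipRegionManagementAction.getAsBoolean()) {
      return 0;
    }
    for (Runnable step : plan) {
      step.run();
    }
    return plan.size();
  }
}
```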
Signed-off-by: Josh Elser <elserj@apache.org>
Signed-off-by: binlijin <binlijin@gmail.com>
A miscellany. Add extra logging to a bunch of tests to help with debugging.
Fix some issues, particularly where we ran into a mismatched-filesystem
complaint. Some modernizations, removal of unnecessary deletes
(especially after seeing tests fail in table delete), and cleanup.
Recategorized one test from medium to large because it starts four
clusters in the one JVM. Finally, the zk standalone server won't come up
on occasion; added debug and thread dumping to help figure out why (it
manifests as the test failing in startup saying the master didn't launch).
hbase-mapreduce/src/test/java/org/apache/hadoop/hbase/snapshot/TestExportSnapshot.java
Fixes occasional mismatched filesystems where the difference is file:// vs file:///
or where we pick up the hdfs scheme when it is a local fs test. Had to vet
how we make a Path qualified in a few places, not just here, as a few
tests failed with this same issue. Code in here is used by a lot of tests
that each in turn suffered this mismatch.
Refactor for clarity.
hbase-mapreduce/src/test/java/org/apache/hadoop/hbase/snapshot/TestExportSnapshotV1NoCluster.java
Unused import.
hbase-procedure/src/test/java/org/apache/hadoop/hbase/procedure2/store/wal/TestWALProcedureStore.java
This test fails if the tmp dir is not where it expects, because it tries to
make the rootdir there. Give it a rootdir under the test data dir.
hbase-server/src/test/java/org/apache/hadoop/hbase/TestZooKeeper.java
This change is probably useless. I think the issue is actually
a problem addressed later, where our check for the zk server being
up gets stuck and never times out.
hbase-server/src/test/java/org/apache/hadoop/hbase/client/TestSplitOrMergeStatus.java
Move off deprecated APIs.
hbase-server/src/test/java/org/apache/hadoop/hbase/master/balancer/BalancerTestBase.java
Log when we fail the balance check, for DEBUG. Currently it just says 'false'.
hbase-server/src/test/java/org/apache/hadoop/hbase/master/procedure/TestSplitWALProcedure.java
Fix NPEs on the way out if setup failed.
hbase-server/src/test/java/org/apache/hadoop/hbase/regionserver/TestCompaction.java
Add logging when assert fails to help w/ DEBUG
hbase-server/src/test/java/org/apache/hadoop/hbase/regionserver/TestRegionServerAbortTimeout.java
Don't bother removing stuff on teardown. It all gets thrown away anyway.
Saw a few hangs here in teardown where HDFS was down earlier than
expected, messing up shutdown.
hbase-zookeeper/src/main/java/org/apache/hadoop/hbase/zookeeper/MiniZooKeeperCluster.java
Add a timeout on the socket; the check for the zk server was getting stuck
and never timing out (the test timed out in startup).
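A socket-based liveness check can hang indefinitely without timeouts. The sketch below bounds both the connect and every read. It uses ZooKeeper's `ruok` four-letter command, which is an assumption about the probe (newer ZooKeeper releases answer four-letter commands only when they are allowed by `4lw.commands.whitelist`), and `ZkProbe` is a hypothetical name.

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.net.InetSocketAddress;
import java.net.Socket;
import java.nio.charset.StandardCharsets;

final class ZkProbe {
  /**
   * Sends "ruok" and expects "imok". Both connect and read are bounded by
   * timeoutMs, so an unresponsive server yields false instead of hanging.
   */
  static boolean isServerUp(String host, int port, int timeoutMs) {
    try (Socket sock = new Socket()) {
      sock.connect(new InetSocketAddress(host, port), timeoutMs);
      sock.setSoTimeout(timeoutMs); // bounds each blocking read
      OutputStream out = sock.getOutputStream();
      out.write("ruok".getBytes(StandardCharsets.US_ASCII));
      out.flush();
      InputStream in = sock.getInputStream();
      byte[] buf = new byte[4];
      int n = 0;
      while (n < buf.length) {
        int r = in.read(buf, n, buf.length - n);
        if (r < 0) {
          break; // server closed the connection early
        }
        n += r;
      }
      return n == 4 && "imok".equals(new String(buf, StandardCharsets.US_ASCII));
    } catch (IOException e) {
      return false; // includes SocketTimeoutException
    }
  }
}
```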
hbase-mapreduce/src/test/java/org/apache/hadoop/hbase/snapshot/TestExportSnapshotWithTemporaryDirectory.java
Write to test data dir instead.
Be careful about how we make qualified paths.
hbase-mapreduce/src/test/java/org/apache/hadoop/hbase/mapreduce/TestTableInputFormatScanBase.java
Remove snowflake configs.
hbase-server/src/test/java/org/apache/hadoop/hbase/replication/TestReplicationStatus.java
Add a hacky pause. Tried adding barriers but that didn't work. Needs a deep
dive.
hbase-zookeeper/src/main/java/org/apache/hadoop/hbase/zookeeper/MiniZooKeeperCluster.java
Remove code copied from zk and use zk methods directly instead.
A general problem is that the zk cluster occasionally doesn't come up, and
there is no clue why. Add thread dumping and a state check.
The master RPC server endpoint doesn't bind to localhost's
IP address by default. Instead, it looks up the hostname and
binds to the endpoint to which it resolves. MasterRegistry should
do the same when building the default server endpoint to talk to.
Signed-off-by: Nick Dimiduk <ndimiduk@apache.org>
TestFromClientSideWithCoprocessor: Initialization bug causing parameterized
runs to fail.
TestCustomSaslAuthenticationProvider: Test config had to be fixed because
it was written before the master registry implementation.
TestSnapshotScannerHDFSAclController: Cluster restart did not reset the
cached connection state.
Signed-off-by: Nick Dimiduk <ndimiduk@apache.org>
Signed-off-by: Josh Elser <elserj@apache.org>
There were a couple of issues.
- There was a leak of a file descriptor for the hbck lock file. This
was contributing to all the "ConnectionRefused" stack traces since
it was trying to renew the lease for an already expired mini DFS cluster.
This issue was there for a while; we just noticed it now.
- After the upgrade to JUnit 4.13, it looks like the behavior of test
timeouts has changed. Earlier the timeout seems to have applied to
each parameterized run, but now it looks like it is applied across
all the runs.
This patch fixes both the issues.
Signed-off-by: Stack <stack@apache.org>
Signed-off-by: Jan Hentschel <jan.hentschel@ultratendency.com>
Implements a master based registry for clients.
- Supports hedged RPCs (fan out configured via configs).
- Parameterized existing client tests to run with multiple registry combinations.
- Added unit-test coverage for the new registry implementation.
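A minimal sketch of a hedged call: send the same request to up to `fanOut` endpoints at once and take the first successful reply. The names are hypothetical, and this simplified version only tries the first `fanOut` endpoints; it does not claim to match how MasterRegistry batches its requests.

```java
import java.util.List;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.function.Function;

final class HedgedCall {
  /**
   * Completes with the first successful reply from up to fanOut endpoints;
   * fails only if every attempted endpoint fails.
   */
  static <E, R> CompletableFuture<R> hedged(List<E> endpoints, int fanOut,
      Function<E, CompletableFuture<R>> call) {
    CompletableFuture<R> result = new CompletableFuture<>();
    List<E> targets = endpoints.subList(0, Math.max(0, Math.min(fanOut, endpoints.size())));
    if (targets.isEmpty()) {
      result.completeExceptionally(new IllegalArgumentException("no endpoints"));
      return result;
    }
    AtomicInteger failures = new AtomicInteger();
    for (E endpoint : targets) {
      call.apply(endpoint).whenComplete((value, error) -> {
        if (error == null) {
          result.complete(value); // first success wins; later completions are no-ops
        } else if (failures.incrementAndGet() == targets.size()) {
          result.completeExceptionally(error); // every attempt failed
        }
      });
    }
    return result;
  }
}
```

Hedging trades extra load on the masters for lower tail latency, which is why the fan-out is a configurable knob rather than "all masters".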
Signed-off-by: Nick Dimiduk <ndimiduk@apache.org>
Signed-off-by: stack <stack@apache.org>
Signed-off-by: Andrew Purtell <apurtell@apache.org>
* HBASE-23604: Cleanup AsyncRegistry interface
- Cleans up the method names to make more sense and adds a little
  more javadoc for context. In future patches we can revisit
  the name of the actual class to make it more self-explanatory.
- Renames AsyncRegistry to ConnectionRegistry. The "async"-ness of
  the registry is implicit in the interface contents and need not
  be reflected in the name.
Signed-off-by: Nick Dimiduk <ndimiduk@apache.org>
Signed-off-by: stack <stack@apache.org>
Signed-off-by: Viraj Jasani <vjasani@apache.org>
* HBASE-23304: RPCs needed for client meta information lookup
This patch implements the RPCs needed for the meta information
lookup during connection init. New tests added to cover the RPC
code paths. HBASE-23305 builds on this to implement the client
side logic.
Fixed a bunch of checkstyle nits around the places the patch
touches.
Signed-off-by: Andrew Purtell <apurtell@apache.org>
* HBASE-23281: Track meta region changes on masters
This patch adds a simple cache that tracks the meta region replica
locations. It keeps an eye on the region movements so that the
cached locations are not stale.
This information is used for servicing client RPCs for connections
that use master based registry (HBASE-18095). The RPC end points
will be added in a separate patch.
Signed-off-by: Nick Dimiduk <ndimiduk@apache.org>
* HBASE-23275: Track active master's address in ActiveMasterManager
Currently we just track whether an active master exists.
It helps to also track the address of the active master on
all the masters, so they can serve client RPC requests asking
which master is active.
Signed-off-by: Nick Dimiduk <ndimiduk@apache.org>
Signed-off-by: Andrew Purtell <apurtell@apache.org>
- MOB compaction is now handled in-line with per-region compaction on region
servers
- regions with mob data store per-hfile metadata about which mob hfiles are
referenced
- admin-requested major compaction will also rewrite MOB files; periodic
  RS-initiated major compaction will not
- periodically a chore in the master will initiate a major compaction that
  will rewrite MOB values to ensure it happens. Controlled by
  'hbase.mob.compaction.chore.period'. Default is weekly.
- control how many RSes the chore requests major compaction on in parallel
  with 'hbase.mob.major.compaction.region.batch.size'. Default is as
  parallel as possible.
- a periodic chore in the master will scan backing hfiles from regions to get
  the set of referenced mob hfiles and archive those that are no longer
  referenced. Control the period with 'hbase.master.mob.cleaner.period'.
- Optionally, RSes that are compacting mob files can limit write
  amplification by not rewriting values from mob hfiles over a certain size
  limit. Opt in by setting 'hbase.mob.compaction.type' to 'optimized'.
  Control the threshold with 'hbase.mob.compactions.max.file.size'.
  Default is 1GiB.
- Should smoothly integrate with existing MOB users via rolling upgrade.
  It will delay old MOB file cleanup until per-region compaction has managed
  to compact each region at least once, so that used mob hfile metadata can
  be gathered.
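The cleaner chore and the rolling-upgrade delay described above reduce to a set difference with a guard. A sketch under the assumption that each region exposes the set of MOB files it references, with an empty `Optional` meaning the region has not yet been compacted with the new metadata; `MobCleanerSketch` is hypothetical and does not mirror the master chore's code.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Optional;
import java.util.Set;

final class MobCleanerSketch {
  /**
   * @param mobFiles names of all MOB hfiles currently present
   * @param refsByRegion MOB files referenced by each region; an empty Optional
   *     means the region has not been compacted with the new metadata yet
   * @return MOB files that no region references, or an empty list while any
   *     region's metadata is still missing
   */
  static List<String> filesToArchive(List<String> mobFiles,
      Map<String, Optional<Set<String>>> refsByRegion) {
    Set<String> referenced = new HashSet<>();
    for (Optional<Set<String>> refs : refsByRegion.values()) {
      if (!refs.isPresent()) {
        return new ArrayList<>(); // too early: archiving now could drop live data
      }
      referenced.addAll(refs.get());
    }
    List<String> unreferenced = new ArrayList<>();
    for (String file : mobFiles) {
      if (!referenced.contains(file)) {
        unreferenced.add(file);
      }
    }
    return unreferenced;
  }
}
```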