7842 Commits

Author SHA1 Message Date
Michael Stack ed54c35cf0
HBASE-24134 Down forked JVM heap size from 2800m to 2200m for jdk8 and jdk11 (#1451)
Down jdk8 forked jvm heap from 2800 to 2200 and the jdk11 heap from
3200 to 2200. Down the mvn size from 4G to 3.6G

Change how many puts are done by TestMultiRespectsLimits because it
pushed the forked heap over 2.5G in size.

Signed-off-by: Sean Busbey <busbey@apache.org>
2020-04-08 10:51:03 -07:00
Michael Stack 9da7f95fa7
HBASE-24128 [Flakey Tests] Add retry on thrift cmdline if client fails plus misc debug (#1442)
hbase-server/src/test/java/org/apache/hadoop/hbase/TestClusterPortAssignment.java
 Saw case where Master failed startup but it came out as an IOE so we
 did not trip the retry logic.

hbase-server/src/test/java/org/apache/hadoop/hbase/TestInfoServers.java
 Add some debug and up timeouts. This test fails frequently for me
 locally.

hbase-server/src/test/java/org/apache/hadoop/hbase/client/locking/TestEntityLocks.java
 Up the wait from 2x 200ms to 10x in case a pause on hardware or GC.
 This test fails locally and up on jenkins.

hbase-server/src/test/java/org/apache/hadoop/hbase/regionserver/TestClearRegionBlockCache.java
 Debug. Have assert say what bad count was.

hbase-server/src/test/java/org/apache/hadoop/hbase/regionserver/TestCompactingToCellFlatMapMemStore.java
 Fails on occasion. Found count is off by a few. Tricky to debug. HBASE-24129 to reenable.

hbase-server/src/test/java/org/apache/hadoop/hbase/regionserver/TestRegionMergeTransactionOnCluster.java
 Debug. Add wait and check before moving to assert.

hbase-thrift/src/test/java/org/apache/hadoop/hbase/thrift/TestThriftHttpServer.java
 Check for null before shutting; can be null if failed start.

hbase-thrift/src/test/java/org/apache/hadoop/hbase/thrift/TestThriftServerCmdLine.java
 Add retry if client messes up connection. Fails for me locally.
2020-04-07 09:28:05 -07:00
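The retry notes in HBASE-24128 (retry when the client messes up its connection, and make sure an IOException still trips the retry logic) can be illustrated with a bounded retry helper. This is a minimal sketch in plain Java, not the code from the commit; the class name `RetryingCall`, the method `callWithRetries`, and its parameters are hypothetical.

```java
import java.io.IOException;
import java.util.concurrent.Callable;

// Hypothetical helper for illustration; not part of HBase.
final class RetryingCall {
    /**
     * Runs the action up to maxAttempts times, sleeping pauseMillis between
     * failed attempts. Rethrows the last failure if every attempt fails.
     */
    static <T> T callWithRetries(Callable<T> action, int maxAttempts, long pauseMillis)
            throws Exception {
        if (maxAttempts < 1) {
            throw new IllegalArgumentException("maxAttempts must be >= 1");
        }
        Exception last = null;
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            try {
                return action.call();
            } catch (InterruptedException ie) {
                // Do not retry on interruption; restore the flag and give up.
                Thread.currentThread().interrupt();
                throw ie;
            } catch (Exception e) {
                // Any other failure, including IOException, triggers a retry.
                last = e;
                if (attempt < maxAttempts) {
                    Thread.sleep(pauseMillis);
                }
            }
        }
        throw last;
    }

    public static void main(String[] args) throws Exception {
        int[] calls = {0};
        String result = callWithRetries(() -> {
            if (++calls[0] < 3) {
                throw new IOException("simulated connection failure");
            }
            return "connected";
        }, 5, 10L);
        System.out.println(result + " after " + calls[0] + " attempts");
    }
}
```

Catching the broad `Exception` type is deliberate here: the commit notes a case where a startup failure surfaced as an IOE and bypassed the retry logic, so a sketch that only retried one narrow exception type would miss the point.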
Duo Zhang 74a85e26ee HBASE-24055 Make AsyncFSWAL can run on EC cluster (#1437)
Signed-off-by: Nick Dimiduk <ndimiduk@apache.org>
Signed-off-by: stack <stack@apache.org>
2020-04-07 23:46:06 +08:00
binlijin 4a85f06590 HBASE-24077 When encounter RowTooBigException, log the row info. (#1379)
Signed-off-by: Viraj Jasani <vjasani@apache.org>
Signed-off-by: Pankaj <pankajkumar@apache.org>
Signed-off-by: Jan Hentschel <jan.hentschel@ultratendency.com>
2020-04-07 10:36:40 +08:00
stack 389dfd2198 HBASE-24118 [Flakey Tests] TestCloseRegionWhileRSCrash
Reapply but as an @Ignore for the flakey test.
2020-04-06 12:40:25 -07:00
stack 82773a8c6e Revert "HBASE-24118 [Flakey Tests] TestCloseRegionWhileRSCrash"
Reverting in favor of adding an @Ignore on this test until
root cause of flakiness HBASE-24117 is addressed.

This reverts commit 9985c06647.
2020-04-06 12:40:13 -07:00
huaxiangsun 9d28f2d086 HBASE-24105 [Flakey Test] regionserver.TestRegionReplicas (#1425)
Co-authored-by: Huaxiang Sun <huaxiangsun@apache.com>
Signed-off-by: stack <stack@apache.org>
2020-04-05 13:23:33 -07:00
stack 9845f9e416 HBASE-24118 [Flakey Tests] TestCloseRegionWhileRSCrash 2020-04-04 17:45:22 -07:00
stack 51485db67c HBASE-24114 [Flakey Tests] TestSnapshotScannerHDFSAclController
Addendum, make it three seconds.
2020-04-04 17:38:54 -07:00
Huaxiang Sun 93c3653ecf HBASE-24114 [Flakey Tests] TestSnapshotScannerHDFSAclController 2020-04-04 13:14:30 -07:00
Viraj Jasani 73aded09ec
HBASE-24102 : Remove decommissioned RS from target servers while unlo… (#1417)
Signed-off-by: binlijin <binlijin@gmail.com>
Signed-off-by: Pankaj <pankajkumar@apache.org>
Signed-off-by: ramkrish86 <ramkrishna@apache.org>
Signed-off-by: stack <stack@apache.org>
Signed-off-by: Xu Cang <xucang@apache.org>
Signed-off-by: Reid Chan <reidchan@apache.org>
2020-04-03 18:36:02 +05:30
huaxiangsun 53299a6de2
HBASE-24080 [flakey test] TestRegionReplicaFailover.testSecondaryRegionKill fails. (#1421) (#1423)
Signed-off-by: stack <stack@apache.org>
2020-04-02 18:21:40 -07:00
meiyi c97c9e2eda HBASE-24103 [Flakey Tests] TestSnapshotScannerHDFSAclController (#1416)
Signed-off-by: stack <stack@apache.org>
2020-04-02 09:56:26 -07:00
niuyulin 75714a4a06 HBASE-24021 Fail fast when bulkLoadHFiles method catch some IOException (#1343)
Signed-off-by: Guanghao Zhang <zghao@apache.org>
2020-04-02 23:26:51 +08:00
huaxiangsun 231c2bca94 HBASE-24073 [flakey test] client.TestAsyncRegionAdminApi messed up compaction state. (#1414)
Addendum:
  For major compaction test, set hbase.hstore.compaction.min to a big number to
  avoid kicking in minor compactions, which will pollute compaction state and
  sometimes prevent major compaction from happening.

Co-authored-by: Huaxiang Sun <huaxiangsun@apache.com>
Signed-off-by: stack <stack@apache.org>
2020-04-02 08:18:17 -07:00
stack 09141681f6 Revert "HBASE-24051 Allows indirect inheritance to CanUnbuffer (#1406)"
This reverts commit 30f5852fc2.
2020-04-01 15:49:11 -07:00
申胜利 30f5852fc2 HBASE-24051 Allows indirect inheritance to CanUnbuffer (#1406)
Signed-off-by: stack <stack@apache.org>
2020-04-01 14:41:14 -07:00
Bharath Vissapragada 9384b84552 HBASE-24075: Fix a race between master shutdown and metrics (re)init
JMXCacheBuster resets the metrics state at various points in time. These
events can potentially race with a master shutdown. When the master is
tearing down, metrics initialization can touch a lot of unsafe state,
for example invalidated FS objects. To avoid this, this patch makes
the getMetrics() a no-op when the master is either stopped or in the
process of shutting down. Additionally, getClusterId() when the server
is shutting down is made a no-op.

Simulating a test for this is a bit tricky but with the patch I don't
locally see the long stacktraces from the jira.

Signed-off-by: Michael Stack <stack@apache.org>
(cherry picked from commit 6f213e9d5a)
2020-04-01 10:14:34 -07:00
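The guard described in HBASE-24075 (make getMetrics() a no-op once the master is stopped or shutting down) follows a common pattern, sketched below in plain Java. This is an illustration under assumptions, not the patch itself: `GuardedMetricsSource`, `beginShutdown`, and the `StringBuilder` sink are hypothetical stand-ins for the real master and metrics classes.

```java
import java.util.concurrent.atomic.AtomicBoolean;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical stand-in for a master-side metrics source; not HBase's class.
final class GuardedMetricsSource {
    private final AtomicBoolean stopping = new AtomicBoolean(false);
    private final AtomicInteger snapshotsTaken = new AtomicInteger();

    /** Marks the start of shutdown; later getMetrics() calls do nothing. */
    void beginShutdown() {
        stopping.set(true);
    }

    /** Returns true if a snapshot was written, false if the call was skipped. */
    boolean getMetrics(StringBuilder sink) {
        if (stopping.get()) {
            // Shutdown in progress: skip, so no torn-down state is touched.
            return false;
        }
        sink.append("snapshot-").append(snapshotsTaken.incrementAndGet()).append('\n');
        return true;
    }

    public static void main(String[] args) {
        GuardedMetricsSource source = new GuardedMetricsSource();
        StringBuilder sink = new StringBuilder();
        System.out.println(source.getMetrics(sink)); // collected
        source.beginShutdown();
        System.out.println(source.getMetrics(sink)); // skipped
    }
}
```

The check is not atomic with the collection that follows it, so a shutdown that begins mid-collection can still race; the sketch only narrows the window, which matches the commit's own hedged wording about the fix.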
Michael Stack 40caac9b61
HBASE-24097 [Flakey Tests] TestSnapshotScannerHDFSAclController#testRestoreSnapshot (#1405)
hbase-server/src/test/java/org/apache/hadoop/hbase/replication/TestReplicationTrackerZKImpl.java
 Add debug for when assert fails (it fails on occasion locally)

hbase-server/src/test/java/org/apache/hadoop/hbase/security/access/TestHDFSAclHelper.java
 Move this inner class out standalone since it is now used by two tests.

hbase-server/src/test/java/org/apache/hadoop/hbase/security/access/TestSnapshotScannerHDFSAclController.java
 Moved out testRestoreSnapshot and made methods in here static so could
 be used by a new adjacent test. Also made tablenames unique to methods
 thinking that was root of original issue (wasn't but no harm in doing
 this change) Moved out the inner class TestHDFSAclHelper.

hbase-server/src/test/java/org/apache/hadoop/hbase/security/access/TestSnapshotScannerHDFSAclController2.java
 New class that sets up same context as
 TestSnapshotScannerHDFSAclController but just to run single
 testRestoreSnapshot method.

hbase-server/src/test/java/org/apache/hadoop/hbase/security/token/TestZKSecretWatcher.java
 Some debug.

Signed-off-by: Yi Mei
2020-04-01 08:33:44 -07:00
Viraj Jasani 3433c7a2db
HBASE-23937 : Support Online LargeLogs similar to SlowLogs APIs (#1346)
Signed-off-by: Bharath Vissapragada <bharathv@apache.org>
2020-04-01 19:56:42 +05:30
Viraj Jasani 0b2b63ea84
HBASE-23678 : Builder API for version management - setVersionsWithTim… (#1381)
Signed-off-by: Xu Cang <xucang@apache.org>
2020-04-01 16:27:36 +05:30
stack b1eff98789 HBASE-24079 [Flakey Tests] Misc fixes and debug; fix BindException in Thrift tests; add waits on quota table to come online; etc.
hbase-client/src/main/java/org/apache/hadoop/hbase/client/ClientAsyncPrefetchScanner.java
 Refactor to avoid NPE timing issue referencing lock during Construction.

hbase-server/src/main/java/org/apache/hadoop/hbase/io/hfile/bucket/BucketCache.java
 Comment

hbase-server/src/main/java/org/apache/hadoop/hbase/master/procedure/RSProcedureDispatcher.java
 Refactor. Catch NPE during startup and return it instead as failed initialization.

hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/CompactSplit.java
 Catch IndexOutOfBounds exception and convert to non-split request.

hbase-server/src/test/java/org/apache/hadoop/hbase/TestCachedClusterId.java
 Make less furious. Make it less flakie.

hbase-server/src/test/java/org/apache/hadoop/hbase/TestServerSideScanMetricsFromClientSide.java
 Debug. Catch exception to log, then rethrow.

hbase-server/src/test/java/org/apache/hadoop/hbase/client/TestAsyncRegionAdminApi.java
 Guess that waiting longer on compaction to succeed may help make this
 less flakey.

hbase-server/src/test/java/org/apache/hadoop/hbase/client/TestFromClientSide3.java
 Be explicit about timestamping to avoid concurrent edit landing
 server-side and messing up test expectation.

hbase-server/src/test/java/org/apache/hadoop/hbase/client/TestMasterRegistry.java
 Add wait on meta before proceeding w/ test.

hbase-server/src/test/java/org/apache/hadoop/hbase/client/TestScannersFromClientSide.java
 Be explicit that edits are distinct.

hbase-server/src/test/java/org/apache/hadoop/hbase/io/hfile/bucket/TestBucketCacheRefCnt.java
 Add @Ignore on RAM test... Fails sporadically.

hbase-server/src/test/java/org/apache/hadoop/hbase/master/assignment/TestRegionMoveAndAbandon.java
 Add wait for all RegionServers going down before proceeding; was
 messing up RS accounting.

hbase-server/src/test/java/org/apache/hadoop/hbase/master/balancer/BalancerTestBase.java
 Make balancer test sloppier; less restrictive; would fail on occasion
 by being just outside test limits.

hbase-server/src/test/java/org/apache/hadoop/hbase/quotas/TestQuotaObserverChoreRegionReports.java
 Add wait on quota table coming up; helps make this less flakie.

hbase-server/src/test/java/org/apache/hadoop/hbase/regionserver/TestHRegion.java
 Be explicit about timestamps; see if helps w/ flakie failure.

hbase-server/src/test/java/org/apache/hadoop/hbase/regionserver/TestRegionReplicas.java
 Catch and ignore if issue in shutdown; don't care if after test.

hbase-server/src/test/java/org/apache/hadoop/hbase/regionserver/TestRegionServerReportForDuty.java
 Comment.

hbase-server/src/test/java/org/apache/hadoop/hbase/security/access/TestAccessController.java
 Add retry to see if helps w/ odd failure; grant hasn't propagated?

hbase-server/src/test/java/org/apache/hadoop/hbase/security/access/TestCellACLWithMultipleVersions.java
 Explicit w/ timestamps so no accidental overlap of puts.

hbase-thrift/src/test/java/org/apache/hadoop/hbase/thrift/TestThriftHttpServer.java
hbase-thrift/src/test/java/org/apache/hadoop/hbase/thrift/TestThriftServerCmdLine.java
 Hack to deal w/ BindException on startup.

hbase-thrift/src/test/java/org/apache/hadoop/hbase/thrift2/TestThrift2ServerCmdLine.java
 Use loopback.

hbase-thrift/src/test/java/org/apache/hadoop/hbase/thrift2/TestThriftHBaseServiceHandler.java
 Disable flakie test.

Signed-off-by: Bharath Vissapragada <bharathv@apache.org>
2020-03-30 16:46:48 -07:00
huaxiangsun 34ebdd6c9d
HBASE-24073 [flakey test] client.TestAsyncRegionAdminApi messed up compaction state. (#1387) (#1389)
Signed-off-by: Viraj Jasani <vjasani@apache.org>
2020-03-30 12:26:10 -07:00
WenFeiYi 0433713b35 HBASE-24040 WALFactory.Providers.multiwal causes StackOverflowError (#1338)
Signed-off-by: Duo Zhang <zhangduo@apache.org>
2020-03-27 10:11:50 +08:00
huaxiangsun 5d5d845fea
HBASE-23853 [Flakey Test] TestBlockEvictionFromClient#testBlockRefCountAfterSplits (#1363) (#1366)
Signed-off-by: Jan Hentschel <jan.hentschel@ultratendency.com>
Signed-off-by: Viraj Jasani <vjasani@apache.org>
Signed-off-by: <stack@apache.org>
2020-03-26 17:21:01 -07:00
stack 8a26a4e64f HBASE-24052 Add debug to TestMasterShutdown
Addendum 2: Refactor TestMasterShutdown
2020-03-26 15:42:56 -07:00
stack a18f5b1517 HBASE-24052 Add debug to TestMasterShutdown
Addendum
2020-03-26 12:22:22 -07:00
Duo Zhang 30eba2c24e HBASE-24000 Simplify CommonFSUtils after upgrading to hadoop 2.10.0 (#1335)
Signed-off-by: stack <stack@apache.org>
Signed-off-by: Nick Dimiduk <ndimiduk@apache.org>
2020-03-26 18:10:03 +08:00
Peter Somogyi 05023846f9 HBASE-22555 Re-enable TestMasterOperationsForRegionReplicas (#1345)
Signed-off-by: stack <stack@apache.org>
2020-03-26 11:02:16 +01:00
stack b71ef1e94a HBASE-24052 Add debug to TestMasterShutdown 2020-03-25 22:42:19 -07:00
stack dcd9a81528 HBASE-24047 [Flakey Tests] Disable TestCustomSaslAuthenticationProvider#testNegativeAuthentication 2020-03-25 15:44:19 -07:00
niuyulin 244b308a3e
HBASE-23949 refactor loadBalancer implements for rsgroup balance by table to achieve overallbalanced (#1324)
Signed-off-by: Guanghao Zhang <zghao@apache.org>
2020-03-25 11:27:32 +08:00
Guanghao Zhang 41baf711ec HBASE-24037 Add ut for root dir and wal root dir are different (#1336)
Signed-off-by: stack <stack@apache.org>
2020-03-25 10:55:58 +08:00
Wei-Chiu Chuang 8521207be4 HBASE-8868. add metric to report client shortcircuit reads. (#1334)
Signed-off-by: stack <stack@apache.net>
2020-03-24 15:31:34 -07:00
stack d7189127fb HBASE-24043 [Flakey Tests] TestAsyncRegionAdminApi, TestRegionMergeTransactionOnCluster fixes and debug
hbase-server/src/main/java/org/apache/hadoop/hbase/master/procedure/DeleteTableProcedure.java
 Edit of log about archiving that shows in middle of a table create;
 try to make it less disorientating.

hbase-server/src/test/java/org/apache/hadoop/hbase/client/TestAsyncRegionAdminApi.java
 Loosen assert. Compaction may have produced a single file only. Allow
 for this.

hbase-server/src/test/java/org/apache/hadoop/hbase/client/TestAsyncTableGetMultiThreaded.java
 Make this test less furious given it is inline w/ a bunch of unit
 tests.

hbase-server/src/test/java/org/apache/hadoop/hbase/client/TestFromClientSide3.java
 Add debug

hbase-server/src/test/java/org/apache/hadoop/hbase/quotas/TestQuotaObserverChoreRegionReports.java
 Add wait on quota table to show up before moving forward; otherwise,
 attempt at quota setting fails.

hbase-server/src/test/java/org/apache/hadoop/hbase/regionserver/TestHRegion.java
 Debug

hbase-server/src/test/java/org/apache/hadoop/hbase/regionserver/TestRegionMergeTransactionOnCluster.java
 Remove asserts that expected regions to still have a presence in fs
 after merge when a catalogjanitor may have cleaned up parent dirs.

hbase-server/src/test/java/org/apache/hadoop/hbase/regionserver/TestRegionReplicas.java
 Catch exception on way out and log it rather than let it fail test.

hbase-server/src/test/java/org/apache/hadoop/hbase/security/access/TestSnapshotScannerHDFSAclController.java
 Wait on acl table before proceeding.
2020-03-24 14:47:28 -07:00
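Several notes in HBASE-24043 add a wait on some condition (quota table showing up, acl table available) before the test proceeds. A generic polling wait of that kind might look like the sketch below. It is plain Java for illustration only; `ConditionWaiter` and `waitFor` are hypothetical names and this is not the utility the HBase tests actually call.

```java
import java.util.function.BooleanSupplier;

// Hypothetical polling helper for illustration; not part of HBase.
final class ConditionWaiter {
    /**
     * Polls the condition every intervalMillis until it holds or
     * timeoutMillis has elapsed. Returns true if the condition held.
     */
    static boolean waitFor(BooleanSupplier condition, long timeoutMillis, long intervalMillis)
            throws InterruptedException {
        // nanoTime is monotonic, so wall-clock adjustments do not affect the deadline.
        long deadline = System.nanoTime() + timeoutMillis * 1_000_000L;
        while (true) {
            if (condition.getAsBoolean()) {
                return true;
            }
            if (System.nanoTime() - deadline >= 0) {
                return false;
            }
            Thread.sleep(intervalMillis);
        }
    }

    public static void main(String[] args) throws InterruptedException {
        long start = System.nanoTime();
        // Stand-in for "table is online": becomes true after about 50 ms.
        boolean online = waitFor(() -> System.nanoTime() - start > 50_000_000L, 2000L, 10L);
        System.out.println("condition met: " + online);
    }
}
```

Polling with a deadline replaces a fixed sleep: the test moves on as soon as the condition holds and fails in bounded time otherwise.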
Reid Chan 1196e42362 Revert "[RSGroup] Forward-port HBASE-22658 to master branch and branch-2.x (#1326)"
Reason: Invalid, branch-2 and master are different in RSGroup module.

This reverts commit e869a20123.
2020-03-24 14:41:33 +08:00
Reid Chan e869a20123 [RSGroup] Forward-port HBASE-22658 to master branch and branch-2.x (#1326)
Signed-off-by: stack <stack@apache.org>
2020-03-24 13:17:25 +08:00
stack 50161f2de4 HBASE-24034 [Flakey Tests] A couple of fixes and cleanups
hbase-rsgroup/src/test/java/org/apache/hadoop/hbase/rsgroup/TestRSGroupMajorCompactionTTL.java
 Remove spurious assert. Just before this it waits an arbitrary 10
 seconds. Compactions could have completed inside this time. The spirit
 of the test remains.

hbase-server/src/main/java/org/apache/hadoop/hbase/master/cleaner/HFileCleaner.java
 Get log cleaner to go down promptly; it's sticking around. See if this
 helps with TestMasterShutdown

hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/wal/FSHLog.java
 We get a rare NPE trying to sync. Make local copy of SyncFuture and see
 if that helps.

hbase-server/src/test/java/org/apache/hadoop/hbase/client/TestAsyncRegionAdminApi.java
 Compaction may have completed when not expected; allow for it.

hbase-server/src/test/java/org/apache/hadoop/hbase/client/TestBlockEvictionFromClient.java
 Add wait before testing. Compaction may not have completed. Let
 compaction complete before progressing and then test for empty cache.

hbase-server/src/test/java/org/apache/hadoop/hbase/master/TestMasterShutdown.java
 Less resources.

hbase-server/src/test/java/org/apache/hadoop/hbase/master/balancer/TestDefaultLoadBalancer.java
 Less resources.

hbase-server/src/test/java/org/apache/hadoop/hbase/regionserver/TestEndToEndSplitTransaction.java
 Wait till online before we try and do compaction (else request is
 ignored)

hbase-server/src/test/java/org/apache/hadoop/hbase/tool/TestCanaryTool.java
 Disable test that fails randomly w/ mockito complaint on some mac os
 x's.

TestMasterShutdown... fix NPE in RSRpcDispatcher... catch it and convert
to false and have master check for successful startup.
2020-03-23 16:21:19 -07:00
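The FSHLog note in HBASE-24034 (make a local copy of SyncFuture to avoid a rare NPE) reads like the usual fix for a field that another thread can null out between a check and a use. The sketch below shows that pattern in plain Java. It is one reading of the note, not the actual change; `PendingTaskHolder` and its `Runnable` field are hypothetical.

```java
// Hypothetical illustration of the local-copy pattern; not the FSHLog code.
final class PendingTaskHolder {
    // Another thread may set this to null at any time.
    private volatile Runnable pending;

    void setPending(Runnable task) {
        pending = task;
    }

    /** Runs the pending task if present. Returns true if a task ran. */
    boolean runPending() {
        // Read the field once. Checking 'pending != null' and then calling
        // 'pending.run()' would read it twice, and the second read could
        // observe null if another thread cleared it in between.
        Runnable local = pending;
        if (local == null) {
            return false;
        }
        local.run();
        return true;
    }

    public static void main(String[] args) {
        PendingTaskHolder holder = new PendingTaskHolder();
        System.out.println(holder.runPending()); // nothing pending yet
        holder.setPending(() -> System.out.println("synced"));
        System.out.println(holder.runPending());
    }
}
```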
stack 1147c23627 HBASE-24035 [Flakey Tests] Disable TestClusterScopeQuotaThrottle#testUserNamespaceClusterScopeQuota 2020-03-23 13:44:00 -07:00
Huaxiang Sun ccc955a4d0
HBASE-23957 [flakey test] client.TestMultiParallel fails to read hbase-site.xml (#1310) (#1327)
Signed-off-by: Nick Dimiduk ndimiduk@apache.org
Signed-off-by: stack <stack@apache.org>
2020-03-23 12:55:59 -07:00
Guanghao Zhang f16cf1dd8d HBASE-23741 Data loss when WAL split to HFile enabled (#1254)
Signed-off-by: Duo Zhang <zhangduo@apache.org>
2020-03-23 14:42:08 +08:00
Guanghao Zhang 1cede85a53 HBASE-24033 Add ut for loading the corrupt recovered hfiles (#1322)
Signed-off-by: Duo Zhang <zhangduo@apache.org>
2020-03-22 22:58:35 +08:00
Pankaj 3e4444f4dd HBASE-23633 Find a way to handle the corrupt recovered hfiles (#1233)
Signed-off-by: Guanghao Zhang <zghao@apache.org>
2020-03-22 16:48:01 +08:00
Toshihiro Suzuki 5104aa80fa HBASE-24030 Add necessary validations to HRegion.checkAndMutate() and HRegion.checkAndRowMutate() (#1315)
Signed-off-by: Viraj Jasani <vjasani@apache.org>
Signed-off-by: Jan Hentschel <janh@apache.org>
2020-03-22 11:58:42 +09:00
Michael Stack 392bce03f6
HBASE-23984 [Flakey Tests] TestMasterAbortAndRSGotKilled fails in teardown (#1311)
hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/HRegionServer.java
 Change parameter name and add javadoc to make it more clear what the
 param actually is.

hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/handler/AssignRegionHandler.java
 Move postOpenDeployTasks so if it fails to talk to the Master -- which
 can happen on cluster shutdown -- then we will do cleanup of state;
 without this the RS can get stuck and won't go down.

hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/handler/CloseRegionHandler.java
 Add handleException so CRH looks more like UnassignRegionHandler and
 AssignRegionHandler around exception handling. Add a bit of doc on
 why CRH.

hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/handler/UnassignRegionHandler.java
 Right shift most of the body of process so can add in a finally
 that cleans up rs.getRegionsInTransitionInRS on exception
 (otherwise outstanding entries can stop a RS going down on cluster
 shutdown)

Signed-off-by: Nick Dimiduk <ndimiduk@apache.org>
Signed-off-by: Duo Zhang <zhangduo@apache.org>
2020-03-20 15:25:06 -07:00
Viraj Jasani 8320f73c8c
HBASE-23977 : Resolve flakes present in TestSlowLogRecorder (ADDENDUM) 2020-03-20 13:28:28 +05:30
Nick Dimiduk ffb2359146
HBASE-24013 Bump branch-2 version to 2.4.0-SNAPSHOT (#1309)
Increment version in poms with

```
$ mvn org.codehaus.mojo:versions-maven-plugin:2.7:set -DnewVersion=2.4.0-SNAPSHOT -DgenerateBackupPoms=false
```

Verified no dangling references with

```
$ find . -iname '*pom.xml' -exec grep -n '2.3.0-SNAPSHOT' {} +
```

Verified build with

```
$ JAVA_HOME=/Library/Java/JavaVirtualMachines/adoptopenjdk-8.jdk/Contents/Home mvn clean package -DskipTests
$ JAVA_HOME=/Library/Java/JavaVirtualMachines/adoptopenjdk-11.jdk/Contents/Home mvn clean package -DskipTests -Dhadoop.profile=3.0
```

Signed-off-by: Jan Hentschel <jan.hentschel@ultratendency.com>
2020-03-19 08:01:43 -07:00
Viraj Jasani 481338cc4b
HBASE-23977 : Resolve flakes present in TestSlowLogRecorder (#1286)
Signed-off-by: Nick Dimiduk <ndimiduk@apache.org>
2020-03-19 15:43:30 +05:30
Michael Stack ebd37a314c
HBASE-23993 Use loopback for zk standalone server in minizkcluster (#1291)
hbase-zookeeper/src/main/java/org/apache/hadoop/hbase/zookeeper/MiniZooKeeperCluster.java
 Have client and server use loopback instead of 'localhost'

Signed-off-by: Duo Zhang <zhangduo@apache.org>
Signed-off-by: Jan Hentschel <janh@apache.org>
2020-03-17 20:14:24 -07:00
Wei-Chiu Chuang 7b2fe82be3 HBASE-22103. HDFS-13209 in Hadoop 3.3.0 breaks asyncwal. (#1284)
Signed-off-by: Duo Zhang <zhangduo@apache.org>
2020-03-17 14:37:40 +08:00