Commit Graph

6789 Commits

Author SHA1 Message Date
Andrew Purtell d7b09de854 HBASE-20670 NPE in HMaster#isInMaintenanceMode 2018-06-04 15:19:45 -07:00
Michael Stack 063eefe3b0 HBASE-20634 Reopen region while server crash can cause the procedure to be stuck; ADDENDUM 2018-06-04 12:38:56 -07:00
Michael Stack 27e2c8c86b HBASE-20628 SegmentScanner does over-comparing when one flushing
Signed-off-by: eshcar <eshcar@oath.com>
Signed-off-by: anoopsjohn <anoopsamjohn@gmail.com>
2018-06-04 09:50:13 -07:00
zhangduo d834859404 HBASE-20634 Reopen region while server crash can cause the procedure to be stuck
A reattempt at fixing HBASE-20173 [AMv2] DisableTableProcedure concurrent to ServerCrashProcedure can deadlock

The scenario: an SCP, after processing WALs, goes to assign the regions
that were on the crashed server, but a concurrent Procedure gets in there
first and tries to unassign a region that was on the crashed server
(it could be part of a move procedure, a disable table, etc.). The
unassign happens to run AFTER the SCP has released all RPCs that
were going against the crashed server. The unassign fails because the
server is crashed. The unassign used to suspend itself, but it would
never be woken up because the server it was going against had already
been processed. Worse, the SCP could not make progress either: the
suspended unassign held the lock on a region that the SCP wanted to
assign.

Here we teach the unassign to recognize when it is running after the
SCP's cleanup of RPCs. In that case, the unassign moves to finish
instead of suspending itself.

Includes a nice unit test by Duo Zhang that reproduces the hung
scenario.

M hbase-server/src/main/java/org/apache/hadoop/hbase/master/assignment/FailedRemoteDispatchException.java
 Moved this class back to hbase-procedure where it belongs.

M hbase-procedure/src/main/java/org/apache/hadoop/hbase/procedure2/NoNodeDispatchException.java
M hbase-procedure/src/main/java/org/apache/hadoop/hbase/procedure2/NoServerDispatchException.java
M hbase-procedure/src/main/java/org/apache/hadoop/hbase/procedure2/NullTargetServerDispatchException.java
 Specializations on FRDE so we can be more particular when we say there
 was a problem.

M hbase-procedure/src/main/java/org/apache/hadoop/hbase/procedure2/RemoteProcedureDispatcher.java
 Change addOperationToNode so we throw exceptions that give more detail
 on the issue rather than a mysterious true/false.

M hbase-protocol-shaded/src/main/protobuf/MasterProcedure.proto
 Undo SERVER_CRASH_HANDLE_RIT2. Bad idea (from HBASE-20173)

M hbase-server/src/main/java/org/apache/hadoop/hbase/master/ServerManager.java
 Have expireServer return true if it actually queued an expiration. Used
 later in this patch.

M hbase-server/src/main/java/org/apache/hadoop/hbase/master/assignment/AssignmentManager.java
 Hide methods that shouldn't be public. Add a particular check used out
 in unassign procedure failure processing.

M hbase-server/src/main/java/org/apache/hadoop/hbase/master/assignment/MoveRegionProcedure.java
 Check that server we're to move from is actually online (might
 catch a few silly move requests early).

M hbase-server/src/main/java/org/apache/hadoop/hbase/master/assignment/RegionStates.java
 Add doc on ServerState. It wasn't really being used. Now we actually stamp
 a Server OFFLINE after its WAL has been split. This means it's safe to
 assign since all WALs have been processed. Add methods to update SPLITTING
 and to set it to OFFLINE after splitting is done.

M hbase-server/src/main/java/org/apache/hadoop/hbase/master/assignment/RegionTransitionProcedure.java
 Change logging to be new-style and less repetitive of info.
 Cater to new way in which .addOperationToNode returns info (exceptions
 rather than true/false).

M hbase-server/src/main/java/org/apache/hadoop/hbase/master/assignment/UnassignProcedure.java
 Look for the case where we failed assign AND we should not
 suspend because we would never be woken up: the SCP is already past
 the point where it does this for all stuck RPCs.

 Some cleanup of the failure processing grouping where we can proceed.

 TODOs have been handled in this refactor, including the TODO that
 wonders if it is possible that there are concurrent fails coming in
 (Yes).

M hbase-server/src/main/java/org/apache/hadoop/hbase/master/procedure/ServerCrashProcedure.java
 Doc and removing the old HBASE-20173 'fix'.
 Also updating ServerStateNode post WAL splitting so it gets marked
 OFFLINE.

A hbase-server/src/test/java/org/apache/hadoop/hbase/master/TestServerCrashProcedureStuck.java
 Nice test by Duo Zhang.

Signed-off-by: Umesh Agashe <uagashe@cloudera.com>
Signed-off-by: Duo Zhang <palomino219@gmail.com>
Signed-off-by: Mike Drob <mdrob@apache.org>
2018-06-04 09:26:36 -07:00
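The HBASE-20634 entry above describes two ideas that a small sketch may make easier to follow: a dispatch call that throws a specific exception instead of returning true/false, and an unassign that finishes rather than suspends when nothing would ever wake it. The code below is a toy illustration only. The exception classes are local re-declarations that borrow the names from the commit message, while `DispatcherSketch`, `ServerStateSketch`, the string return values, and the mapping of OFFLINE to "finish" are assumptions made for this example, not the HBase implementation or its signatures.

```java
import java.util.HashMap;
import java.util.Map;

// Toy re-declarations named after the classes in the commit message; not the HBase sources.
class FailedRemoteDispatchException extends Exception {
  FailedRemoteDispatchException(String msg) { super(msg); }
}

class NullTargetServerDispatchException extends FailedRemoteDispatchException {
  NullTargetServerDispatchException(String msg) { super(msg); }
}

class NoNodeDispatchException extends FailedRemoteDispatchException {
  NoNodeDispatchException(String msg) { super(msg); }
}

class NoServerDispatchException extends FailedRemoteDispatchException {
  NoServerDispatchException(String msg) { super(msg); }
}

// Hypothetical server states; the commit message mentions SPLITTING and OFFLINE.
enum ServerStateSketch { ONLINE, SPLITTING, OFFLINE }

class DispatcherSketch {
  // Server name -> state. A server missing from the map has no dispatcher node.
  final Map<String, ServerStateSketch> servers = new HashMap<>();

  // Throws a specific exception for each failure instead of returning false.
  void addOperationToNode(String server, String op) throws FailedRemoteDispatchException {
    if (server == null) {
      throw new NullTargetServerDispatchException(op + ": target server is null");
    }
    ServerStateSketch state = servers.get(server);
    if (state == null) {
      throw new NoNodeDispatchException(op + ": no node for " + server);
    }
    if (state != ServerStateSketch.ONLINE) {
      throw new NoServerDispatchException(op + ": " + server + " is " + state);
    }
    // A real dispatcher would queue the remote call here.
  }

  // Assumption for this sketch: OFFLINE means crash processing has finished
  // with this server, so nothing would ever wake a suspended unassign.
  String unassign(String server, String region) {
    try {
      addOperationToNode(server, "unassign " + region);
      return "DISPATCHED";
    } catch (FailedRemoteDispatchException e) {
      ServerStateSketch state = (server == null) ? null : servers.get(server);
      return state == ServerStateSketch.OFFLINE ? "FINISH" : "SUSPEND";
    }
  }
}
```

With the typed exceptions, a caller can tell "no such node" apart from "server not online" without parsing a boolean, which is the motivation the commit message gives for the change.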
maoling 4c95b82b61 HBASE-19761:Fix Checkstyle errors in hbase-zookeeper
Signed-off-by: Jan Hentschel <jan.hentschel@ultratendency.com>
2018-06-02 10:17:27 +02:00
Andrew Purtell f46569a742 HBASE-20667 Rename TestGlobalThrottler to TestReplicationGlobalThrottler 2018-06-01 17:01:14 -07:00
Xu Cang d3e2248f12 HBASE-18116 Replication source in-memory accounting should not include bulk transfer hfiles
Signed-off-by: Andrew Purtell <apurtell@apache.org>
2018-06-01 11:16:16 -07:00
Peter Somogyi 53d29d53c4 HBASE-20592 Create a tool to verify tables do not have prefix tree encoding
Signed-off-by: Mike Drob <mdrob@apache.org>
2018-06-01 19:22:49 +02:00
Andrew Purtell b22409d51d Revert "HBASE-18116 fix replication source in-memory calculation by excluding bulk load file"
This reverts commit 050fae501a.
2018-05-31 15:28:37 -07:00
Xu Cang 050fae501a HBASE-18116 fix replication source in-memory calculation by excluding bulk load file
Signed-off-by: Andrew Purtell <apurtell@apache.org>
2018-05-31 14:22:12 -07:00
Sean Busbey fc9743c17a HBASE-20444 Addendum keep folks from looking at raw version component array.
Signed-off-by: Andrew Purtell <apurtell@apache.org>
2018-05-31 14:17:41 -05:00
Andrew Purtell aaec02e0f5 HBASE-20646 TestWALProcedureStoreOnHDFS failing on branch-1 2018-05-30 14:44:54 -07:00
Andrew Purtell 15bb234d51 Revert "TestWALProcedureStoreOnHDFS failing on branch-1"
This reverts commit 694e79a67e.
2018-05-30 14:44:49 -07:00
Andrew Purtell 694e79a67e TestWALProcedureStoreOnHDFS failing on branch-1 2018-05-30 13:46:08 -07:00
zhangduo b785896cbd HBASE-20659 Implement a reopen table regions procedure 2018-05-30 20:03:35 +08:00
tedyu 856a3ac154 HBASE-20639 Implement permission checking through AccessController instead of RSGroupAdminEndpoint - revert due to pending discussion 2018-05-29 19:58:32 -07:00
Andrew Purtell 2dc51934f4 HBASE-20597 Serialize access to a shared reference to ZooKeeperWatcher in HBaseReplicationEndpoint 2018-05-29 11:29:12 -07:00
Andrew Purtell 7f154dc20e Revert "HBASE-20597 Use a lock to serialize access to a shared reference to ZooKeeperWatcher in HBaseReplicationEndpoint"
This reverts commit 60dcef289b.
2018-05-29 11:24:30 -07:00
Nihal Jain d36cce1574 HBASE-20633 Dropping a table containing a disable violation policy fails to remove the quota upon table delete
Signed-off-by: Josh Elser <elserj@apache.org>
Signed-off-by: Michael Stack <stack@apache.org>
2018-05-29 11:50:40 -04:00
eshcar aa00391140 HBASE-20390 ADDENDUM 2: fix TestHRegionWithInMemoryFlush OOME 2018-05-29 16:24:27 +03:00
eshcar cf1928aaca HBASE-20390-ADDENDUM: fix TestHRegionWithInMemoryFlush OOME 2018-05-29 13:01:07 +03:00
huzheng c8fd6e0fb6 HBASE-20533 Fix the flaky TestAssignmentManagerMetrics 2018-05-29 09:50:04 +08:00
Toshihiro Suzuki 0455e75edd HBASE-20648 HBASE-19364 "Truncate_preserve fails with table when replica region > 1" for master branch
Signed-off-by: tedyu <yuzhihong@gmail.com>
2018-05-28 08:27:41 -07:00
Nihal Jain bc72fcd8c5 HBASE-20639 Implement permission checking through AccessController instead of RSGroupAdminEndpoint
Signed-off-by: tedyu <yuzhihong@gmail.com>
2018-05-27 18:43:08 -07:00
meiyi f40c10a211 HBASE-20518 Need to serialize the enabled field for UpdatePeerConfigProcedure
Signed-off-by: zhangduo <zhangduo@apache.org>
2018-05-25 14:45:49 +08:00
Thiruvel Thirumoolan d1cbd561df HBASE-20548 Master fails to startup on large clusters, refreshing block distribution
Signed-off-by: Andrew Purtell <apurtell@apache.org>
2018-05-24 15:47:22 -07:00
Toshihiro Suzuki db8789ab22 HBASE-20616 TruncateTableProcedure is stuck in retry loop in TRUNCATE_TABLE_CREATE_FS_LAYOUT state
Signed-off-by: tedyu <yuzhihong@gmail.com>
2018-05-24 15:17:01 -07:00
Nihal Jain 55c4791a63 HBASE-20588 Space quota change after quota violation doesn't seem to take in effect
Signed-off-by: Josh Elser <elserj@apache.org>
2018-05-24 12:50:22 -04:00
eshcar 079e08d7c5 HBASE-20390: IMC Default Parameters for 2.0.0 2018-05-24 18:10:10 +03:00
Guanghao Zhang bfab1e2f92 HBASE-20589 Don't need to assign meta to a new RS when standby master become active 2018-05-24 11:45:59 +08:00
zhangduo a94c6dbadb HBASE-20624 Race in ReplicationSource which causes walEntryFilter being null when creating new shipper 2018-05-24 10:48:35 +08:00
Andrew Purtell 60dcef289b HBASE-20597 Use a lock to serialize access to a shared reference to ZooKeeperWatcher in HBaseReplicationEndpoint 2018-05-23 16:46:20 -07:00
Michael Stack afddf6b1c2 HBASE-20620 HBASE-20564 Tighter ByteBufferKeyValue Cell Comparator; part 2
Adds new stripped-down, faster ByteBufferKeyValue comparator
(BBKV is the base Cell-type in hbase2). Creates an instance
of new Comparator each time we create new memstore rather
than use the universal CellComparator.

Remove unused and unneeded Interfaces from Cell base type.
2018-05-23 13:20:29 -07:00
huzheng fbda502435 HBASE-20612 TestReplicationKillSlaveRSWithSeparateOldWALs sometimes fail because it uses an expired cluster conn 2018-05-23 12:07:54 +08:00
jingyuntian c3c9a4a595 HBASE-20579 Improve snapshot manifest copy in ExportSnapshot
Signed-off-by: tedyu <yuzhihong@gmail.com>
2018-05-18 06:43:53 -07:00
Guanghao Zhang d06673cf3e HBASE-20583 SplitLogWorker should handle FileNotFoundException when split a wal 2018-05-18 14:30:40 +08:00
Balazs Meszaros 39ea1efa88 HBASE-20571 JMXJsonServlet generates invalid JSON if it has NaN in metrics
- CacheStats won't generate NaN metrics.
- JSONBean class will serialize special floating point values as
  "NaN", "Infinity" or "-Infinity"

Signed-off-by: Andrew Purtell <apurtell@apache.org>
2018-05-16 12:20:37 -07:00
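The two bullet points in the HBASE-20571 entry above can be sketched in a few lines. This is a minimal illustration, assuming a hypothetical `JsonNumberSketch` class with helpers `safeRatio` and `toJsonValue`; it does not reproduce the actual CacheStats or JSONBean code.

```java
class JsonNumberSketch {
  // Hypothetical helper: a ratio that yields 0.0 instead of NaN when the denominator is zero.
  static double safeRatio(long hits, long requests) {
    return requests == 0 ? 0.0 : (double) hits / requests;
  }

  // Hypothetical helper: emit finite doubles as JSON numbers and
  // special floating point values as quoted strings.
  static String toJsonValue(double value) {
    if (Double.isNaN(value) || Double.isInfinite(value)) {
      // Double.toString yields NaN, Infinity or -Infinity for these values.
      return "\"" + Double.toString(value) + "\"";
    }
    return Double.toString(value);
  }
}
```

Quoting the special values keeps the output parseable, since JSON numbers have no representation for NaN or the infinities.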
Apekshit Sharma 61f2b5f071 HBASE-20567 Pass both old and new descriptors to pre/post hooks of modify operations for table and namespace.
Signed-off-by: Mike Drob <mdrob@apache.org>
2018-05-16 14:03:36 -05:00
Michael Stack 77eaff0e10 HBASE-20564 Tighter ByteBufferKeyValue Cell Comparator; ADDENDUM
Add method to the CellComparator Interface. Add implementation to
meta comparator so we don't fall back to the default comparator.

Includes a nothing change to hbase-server/pom.xml just to provoke
build.
2018-05-16 09:42:51 -07:00
Michael Stack 5c4685e56e HBASE-20520 Failed effort upping default HDFS blocksize, hbase.regionserver.hlog.blocksize 2018-05-16 09:18:06 -07:00
zhangduo 82e3011166 HBASE-20585 Need to clear peer map when clearing MasterProcedureScheduler 2018-05-16 08:46:34 +08:00
zhangduo 60b8344cf1 HBASE-20457 Return immediately for a scan rpc call when we want to switch from pread to stream 2018-05-15 21:09:04 +08:00
Zach York b7def9b690 HBASE-20447 Only fail cacheBlock if block collisions aren't related to next block metadata
When we pread, we don't force the read to read all of the next block header.
However, when we get into a race condition where two opener threads try to
cache the same block and one thread read all of the next block header and
the other one didn't, it will fail the open process. This is especially important
in a splitting case where it will potentially fail the split process.
Instead, in the caches, we should only fail if the required blocks are different.

Signed-off-by: Andrew Purtell <apurtell@apache.org>
2018-05-14 16:09:14 -07:00
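The rule in the HBASE-20447 entry above (fail only if the required blocks differ, not if one copy merely carries extra next-block header bytes) could look roughly like the following. This is a sketch under stated assumptions: `BlockCollisionSketch`, `sameBlock`, `checkCollision`, and the flat byte-array layout with an explicit `blockLen` are all invented for illustration and are not how the HBase caches represent blocks.

```java
import java.util.Arrays;

class BlockCollisionSketch {
  // Each buffer holds one block's bytes, optionally followed by bytes that were
  // read ahead from the next block's header. blockLen is the block's own length.
  static boolean sameBlock(byte[] cached, byte[] incoming, int blockLen) {
    if (cached.length < blockLen || incoming.length < blockLen) {
      return false;
    }
    // Compare only the first blockLen bytes; trailing read-ahead bytes are ignored.
    return Arrays.equals(Arrays.copyOf(cached, blockLen), Arrays.copyOf(incoming, blockLen));
  }

  // Fail only when the two copies differ in the block itself.
  static void checkCollision(byte[] cached, byte[] incoming, int blockLen) {
    if (!sameBlock(cached, incoming, blockLen)) {
      throw new IllegalStateException("cached block differs from incoming block");
    }
  }
}
```

Two opener threads that read the same block, one with and one without the next block's header, would pass this check, which is the race the commit message describes.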
huzheng 4b0ac73f51 HBASE-20560 Revisit the TestReplicationDroppedTables ut 2018-05-14 19:33:51 +08:00
huzheng be3df29cef HBASE-20128 Add new UTs which extends the old replication UTs but set replication scope to SERIAL 2018-05-14 19:32:39 +08:00
Michael Stack 5ac7740896 HBASE-20411 Ameliorate MutableSegment synchronize
Change the MemStore size accounting so we don't synchronize across three
volatiles applying deltas. Instead:

 + Make MemStoreSize, a data structure of our memstore size longs, immutable.
 + Undo MemStoreSizing being an instance of MemStoreSize; instead it has-a.
 + Make two MemStoreSizing implementations; one thread-safe, the other not.
 + Let all memory sizing longs run independently, untied by
   synchronize (Huaxiang and Anoop suggestion), using AtomicLongs.
 + Review all use of MemStoreSizing. Many are single-threaded and do
   not need to be synchronized; use the non-thread safe counter.

TODO: Use this technique accounting at the global level too.
2018-05-12 02:16:19 +01:00
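The design listed in the HBASE-20411 entry above (an immutable size value, plus a has-a "sizing" accumulator with one thread-safe and one non-thread-safe implementation) can be sketched as follows. All type and field names here (`SizeSnapshotSketch`, `SizingSketch`, and so on) are made up for the example, and the choice of three counters follows the "three volatiles" the message mentions; the real MemStoreSize and MemStoreSizing classes may differ.

```java
import java.util.concurrent.atomic.AtomicLong;

// Immutable snapshot of the three size counters (names are illustrative).
final class SizeSnapshotSketch {
  final long dataSize;
  final long heapSize;
  final long offHeapSize;

  SizeSnapshotSketch(long dataSize, long heapSize, long offHeapSize) {
    this.dataSize = dataSize;
    this.heapSize = heapSize;
    this.offHeapSize = offHeapSize;
  }
}

// The accumulator has-a set of counters and hands out immutable snapshots.
interface SizingSketch {
  void incr(long dataDelta, long heapDelta, long offHeapDelta);
  SizeSnapshotSketch snapshot();
}

// For single-threaded callers: plain longs, no synchronization cost.
class NonThreadSafeSizingSketch implements SizingSketch {
  private long data;
  private long heap;
  private long offHeap;

  public void incr(long dataDelta, long heapDelta, long offHeapDelta) {
    data += dataDelta;
    heap += heapDelta;
    offHeap += offHeapDelta;
  }

  public SizeSnapshotSketch snapshot() {
    return new SizeSnapshotSketch(data, heap, offHeap);
  }
}

// For concurrent callers: each counter is updated atomically but independently,
// so a snapshot taken mid-update may mix values from different increments.
class ThreadSafeSizingSketch implements SizingSketch {
  private final AtomicLong data = new AtomicLong();
  private final AtomicLong heap = new AtomicLong();
  private final AtomicLong offHeap = new AtomicLong();

  public void incr(long dataDelta, long heapDelta, long offHeapDelta) {
    data.addAndGet(dataDelta);
    heap.addAndGet(heapDelta);
    offHeap.addAndGet(offHeapDelta);
  }

  public SizeSnapshotSketch snapshot() {
    return new SizeSnapshotSketch(data.get(), heap.get(), offHeap.get());
  }
}
```

The trade-off in this sketch is that the thread-safe version gives up a consistent view across the three counters in exchange for not holding a lock on every update.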
Thiruvel Thirumoolan 1f10ef553e HBASE-20545 Improve performance of BaseLoadBalancer.retainAssignment
Signed-off-by: tedyu <yuzhihong@gmail.com>
2018-05-10 10:45:17 -07:00
Sean Busbey 61f96b6ffa HBASE-20544 Make HBTU default to random ports.
Signed-off-by: Umesh Agashe <uagashe@cloudera.com>
Signed-off-by: Josh Elser <elserj@apache.org>

 Conflicts:
	hbase-backup/src/test/resources/hbase-site.xml
	hbase-spark-it/src/test/resources/hbase-site.xml
	hbase-spark/src/test/resources/hbase-site.xml
2018-05-09 23:45:39 -07:00
Andrew Purtell c430016cf9 HBASE-20554 "WALs outstanding" message from CleanerChore is noisy 2018-05-09 19:11:50 -07:00
Zach York cba8d2fb8d HBASE-20204 Add locking to RefreshFileConnections in BucketCache
This is a follow-up to HBASE-20141 where Anoop suggested adding locking
for refreshing channels.
2018-05-09 14:23:27 -07:00