hbase

Commit Graph

Author	SHA1	Message	Date
zhangduo	a472f24d17	HBASE-20634 Reopen region while server crash can cause the procedure to be stuck A reattempt at fixing HBASE-20173 [AMv2] DisableTableProcedure concurrent to ServerCrashProcedure can deadlock The scenario is a SCP after processing WALs, goes to assign regions that were on the crashed server but a concurrent Procedure gets in there first and tries to unassign a region that was on the crashed server (could be part of a move procedure or a disable table, etc.). The unassign happens to run AFTER SCP has released all RPCs that were going against the crashed server. The unassign fails because the server is crashed. The unassign used to suspend itself only it would never be woken up because the server it was going against had already been processed. Worse, the SCP could not make progress because the unassign was suspended with the lock on a region that it wanted to assign held making it so it could make no progress. In here, we add to the unassign recognition of the state where it is running post SCP cleanup of RPCs. If present, unassign moves to finish instead of suspending itself. Includes a nice unit test made by Duo Zhang that reproduces nicely the hung scenario. M hbase-server/src/main/java/org/apache/hadoop/hbase/master/assignment/FailedRemoteDispatchException.java Moved this class back to hbase-procedure where it belongs. M hbase-procedure/src/main/java/org/apache/hadoop/hbase/procedure2/NoNodeDispatchException.java M hbase-procedure/src/main/java/org/apache/hadoop/hbase/procedure2/NoServerDispatchException.java M hbase-procedure/src/main/java/org/apache/hadoop/hbase/procedure2/NullTargetServerDispatchException.java Specializations on FRDE so we can be more particular when we say there was a problem. 
M hbase-procedure/src/main/java/org/apache/hadoop/hbase/procedure2/RemoteProcedureDispatcher.java Change addOperationToNode so we throw exceptions that give more detail on issue rather than a mysterious true/false M hbase-protocol-shaded/src/main/protobuf/MasterProcedure.proto Undo SERVER_CRASH_HANDLE_RIT2. Bad idea (from HBASE-20173) M hbase-server/src/main/java/org/apache/hadoop/hbase/master/ServerManager.java Have expireServer return true if it actually queued an expiration. Used later in this patch. M hbase-server/src/main/java/org/apache/hadoop/hbase/master/assignment/AssignmentManager.java Hide methods that shouldn't be public. Add a particular check used out in unassign procedure failure processing. M hbase-server/src/main/java/org/apache/hadoop/hbase/master/assignment/MoveRegionProcedure.java Check that server we're to move from is actually online (might catch a few silly move requests early). M hbase-server/src/main/java/org/apache/hadoop/hbase/master/assignment/RegionStates.java Add doc on ServerState. Wasn't being used really. Now we actually stamp a Server OFFLINE after its WAL has been split. Means its safe to assign since all WALs have been processed. Add methods to update SPLITTING and to set it to OFFLINE after splitting done. M hbase-server/src/main/java/org/apache/hadoop/hbase/master/assignment/RegionTransitionProcedure.java Change logging to be new-style and less repetitive of info. Cater to new way in which .addOperationToNode returns info (exceptions rather than true/false). M hbase-server/src/main/java/org/apache/hadoop/hbase/master/assignment/UnassignProcedure.java Add looking for the case where we failed assign AND we should not suspend because we will never be woken up because SCP is beyond doing this for all stuck RPCs. Some cleanup of the failure processing grouping where we can proceed. TODOs have been handled in this refactor including the TODO that wonders if it possible that there are concurrent fails coming in (Yes). 
M hbase-server/src/main/java/org/apache/hadoop/hbase/master/procedure/ServerCrashProcedure.java Doc and removing the old HBASE-20173 'fix'. Also updating ServerStateNode post WAL splitting so it gets marked OFFLINE. A hbase-server/src/test/java/org/apache/hadoop/hbase/master/TestServerCrashProcedureStuck.java Nice test by Duo Zhang. Signed-off-by: Umesh Agashe <uagashe@cloudera.com> Signed-off-by: Duo Zhang <palomino219@gmail.com> Signed-off-by: Mike Drob <mdrob@apache.org>	2018-06-04 09:26:56 -07:00
maoling	1b98a96caa	HBASE-19761:Fix Checkstyle errors in hbase-zookeeper Signed-off-by: Jan Hentschel <jan.hentschel@ultratendency.com>	2018-06-02 10:08:15 +02:00
Andrew Purtell	9d5004894c	HBASE-20667 Rename TestGlobalThrottler to TestReplicationGlobalThrottler	2018-06-01 17:01:16 -07:00
Xu Cang	a11701ecc5	HBASE-18116 Replication source in-memory accounting should not include bulk transfer hfiles Signed-off-by: Andrew Purtell <apurtell@apache.org>	2018-06-01 11:15:47 -07:00
Peter Somogyi	0968668283	HBASE-20592 Create a tool to verify tables do not have prefix tree encoding Signed-off-by: Mike Drob <mdrob@apache.org>	2018-06-01 19:17:49 +02:00
Andrew Purtell	da3ecf1f13	Revert "HBASE-18116 fix replication source in-memory calculation by excluding bulk load file" This reverts commit `6f3f34227e`.	2018-05-31 15:28:28 -07:00
Xu Cang	6f3f34227e	HBASE-18116 fix replication source in-memory calculation by excluding bulk load file Signed-off-by: Andrew Purtell <apurtell@apache.org>	2018-05-31 14:22:12 -07:00
Sean Busbey	d909ec55aa	HBASE-20444 Addendum keep folks from looking at raw version component array. Signed-off-by: Andrew Purtell <apurtell@apache.org>	2018-05-31 13:29:52 -05:00
Nihal Jain	40a73a5ca7	HBASE-20653 Add missing observer hooks for region server group to MasterObserver Signed-off-by: tedyu <yuzhihong@gmail.com>	2018-05-30 21:29:07 -07:00
Andrew Purtell	b889c8a221	HBASE-20646 TestWALProcedureStoreOnHDFS failing on branch-1	2018-05-30 14:44:30 -07:00
Andrew Purtell	31ae8dc7f3	Revert "TestWALProcedureStoreOnHDFS failing on branch-1" This reverts commit `dcfa01448c`.	2018-05-30 14:44:22 -07:00
Andrew Purtell	dcfa01448c	TestWALProcedureStoreOnHDFS failing on branch-1	2018-05-30 13:45:38 -07:00
zhangduo	997747076d	HBASE-20659 Implement a reopen table regions procedure	2018-05-30 20:03:25 +08:00
tedyu	266b251dfa	HBASE-20639 Implement permission checking through AccessController instead of RSGroupAdminEndpoint - revert due to pending discussion	2018-05-29 19:57:51 -07:00
tedyu	fe73fe8def	HBASE-20653 Add missing observer hooks for region server group to MasterObserver - revert due to pending discussion	2018-05-29 19:42:28 -07:00
Nihal Jain	8d19bbd347	HBASE-20653 Add missing observer hooks for region server group to MasterObserver Signed-off-by: tedyu <yuzhihong@gmail.com>	2018-05-29 16:37:19 -07:00
Andrew Purtell	06611256ee	HBASE-20597 Serialize access to a shared reference to ZooKeeperWatcher in HBaseReplicationEndpoint	2018-05-29 11:29:05 -07:00
Andrew Purtell	807c905f90	Revert "HBASE-20597 Use a lock to serialize access to a shared reference to ZooKeeperWatcher in HBaseReplicationEndpoint" This reverts commit `9fbce1668b`.	2018-05-29 11:24:11 -07:00
Nihal Jain	7ff29d8e00	HBASE-20633 Dropping a table containing a disable violation policy fails to remove the quota upon table delete Signed-off-by: Josh Elser <elserj@apache.org> Signed-off-by: Michael Stack <stack@apache.org>	2018-05-29 11:33:56 -04:00
Mike Drob	a110e1eff5	HBASE-20478 Update checkstyle to v8.2 Cannot go to latest (8.9) yet due to https://github.com/checkstyle/checkstyle/issues/5279 * move hbaseanti import checks to checkstyle * implement a few missing equals checks, and ignore one * fix lots of javadoc errors Signed-off-by: Sean Busbey <busbey@apache.org>	2018-05-29 10:12:31 -05:00
eshcar	42be553433	HBASE-20390 ADDENDUM 2: fix TestHRegionWithInMemoryFlush OOME	2018-05-29 16:27:20 +03:00
Apekshit Sharma	05f57f4c03	HBASE-20652 Remove internal uses of some deprecated MasterObserver hooks Remove internal uses of these hooks: preModifyNamespace postModifyNamespace preModifyTable postModifyTable preModifyTableAction postCompletedModifyTableAction Signed-off-by: tedyu <yuzhihong@gmail.com>	2018-05-28 21:10:52 -07:00
huzheng	81228f72d0	HBASE-20533 Fix the flaky TestAssignmentManagerMetrics	2018-05-29 09:38:47 +08:00
eshcar	1cd2b56802	HBASE-20390-ADDENDUM: fix TestHRegionWithInMemoryFlush OOME	2018-05-28 16:10:53 +03:00
Nihal Jain	9bd4b04ca8	HBASE-20639 Implement permission checking through AccessController instead of RSGroupAdminEndpoint Signed-off-by: tedyu <yuzhihong@gmail.com>	2018-05-27 11:29:26 -07:00
eshcar	1eabbb4295	HBASE-20390: IMC Default Parameters for 2.0.0	2018-05-26 22:57:28 +03:00
Toshihiro Suzuki	b1089e8310	HBASE-20648 HBASE-19364 "Truncate_preserve fails with table when replica region > 1" for master branch Signed-off-by: tedyu <yuzhihong@gmail.com>	2018-05-25 07:52:40 -07:00
meiyi	36f3d9432a	HBASE-20518 Need to serialize the enabled field for UpdatePeerConfigProcedure Signed-off-by: zhangduo <zhangduo@apache.org>	2018-05-25 14:36:16 +08:00
Thiruvel Thirumoolan	1fbce10ff4	HBASE-20548 Master fails to startup on large clusters, refreshing block distribution Signed-off-by: Andrew Purtell <apurtell@apache.org>	2018-05-24 15:47:24 -07:00
Toshihiro Suzuki	554d513f50	HBASE-20616 TruncateTableProcedure is stuck in retry loop in TRUNCATE_TABLE_CREATE_FS_LAYOUT state Signed-off-by: tedyu <yuzhihong@gmail.com>	2018-05-24 15:16:30 -07:00
Nihal Jain	09dac89908	HBASE-20588 Space quota change after quota violation doesn't seem to take in effect Signed-off-by: Josh Elser <elserj@apache.org>	2018-05-24 12:40:55 -04:00
Guanghao Zhang	320a3332e0	HBASE-20589 Don't need to assign meta to a new RS when standby master become active	2018-05-24 11:26:48 +08:00
zhangduo	ee540c9f9e	HBASE-20624 Race in ReplicationSource which causes walEntryFilter being null when creating new shipper	2018-05-24 10:48:29 +08:00
Andrew Purtell	9fbce1668b	HBASE-20597 Use a lock to serialize access to a shared reference to ZooKeeperWatcher in HBaseReplicationEndpoint	2018-05-23 16:46:22 -07:00
Michael Stack	079f168c5c	HBASE-20620 HBASE-20564 Tighter ByteBufferKeyValue Cell Comparator; part 2 Adds new stripped-down, faster ByteBufferKeyValue comparator (BBKV is the base Cell-type in hbase2). Creates an instance of new Comparator each time we create new memstore rather than use the universal CellComparator. Remove unused and unneeded Interfaces from Cell base type.	2018-05-23 13:20:47 -07:00
huzheng	5721150c6d	HBASE-20612 TestReplicationKillSlaveRSWithSeparateOldWALs sometimes fail because it uses an expired cluster conn	2018-05-23 12:07:01 +08:00
tedyu	6c1097e92f	HBASE-20609 SnapshotHFileCleaner#init should check that params is not null	2018-05-21 18:36:38 -07:00
jingyuntian	c9f8c3436f	HBASE-20579 Improve snapshot manifest copy in ExportSnapshot Signed-off-by: tedyu <yuzhihong@gmail.com>	2018-05-18 06:42:12 -07:00
Guanghao Zhang	0836b0719a	HBASE-20583 SplitLogWorker should handle FileNotFoundException when split a wal	2018-05-18 14:29:41 +08:00
Balazs Meszaros	6148b4785d	HBASE-20571 JMXJsonServlet generates invalid JSON if it has NaN in metrics - CacheStats won't generate NaN metrics. - JSONBean class will serialize special floating point values as "NaN", "Infinity" or "-Infinity" Signed-off-by: Andrew Purtell <apurtell@apache.org>	2018-05-16 12:20:39 -07:00
Apekshit Sharma	8c9825a030	HBASE-20567 Pass both old and new descriptors to pre/post hooks of modify operations for table and namespace. Signed-off-by: Mike Drob <mdrob@apache.org>	2018-05-16 14:03:18 -05:00
Michael Stack	438af9bf74	HBASE-20564 Tighter ByteBufferKeyValue Cell Comparator; ADDENDUM Add method the CellComparator Interface. Add implementation to meta comparator so we don't fall back to the default comparator. Includes a nothing change to hbase-server/pom.xml just to provoke build.	2018-05-16 09:43:16 -07:00
Michael Stack	060b8aca86	HBASE-20520 Failed effort upping default HDFS blocksize, hbase.regionserver.hlog.blocksize	2018-05-16 09:19:24 -07:00
zhangduo	ab53329cb3	HBASE-20585 Need to clear peer map when clearing MasterProcedureScheduler	2018-05-16 08:46:29 +08:00
zhangduo	26babcf013	HBASE-20457 Return immediately for a scan rpc call when we want to switch from pread to stream	2018-05-15 20:56:20 +08:00
Zach York	d2daada970	HBASE-20447 Only fail cacheBlock if block collisions aren't related to next block metadata When we pread, we don't force the read to read all of the next block header. However, when we get into a race condition where two opener threads try to cache the same block and one thread read all of the next block header and the other one didn't, it will fail the open process. This is especially important in a splitting case where it will potentially fail the split process. Instead, in the caches, we should only fail if the required blocks are different. Signed-off-by: Andrew Purtell <apurtell@apache.org>	2018-05-14 17:16:54 -07:00
huzheng	eabe672ebd	HBASE-20560 Revisit the TestReplicationDroppedTables ut	2018-05-14 19:12:43 +08:00
Michael Stack	021f66d11d	HBASE-20411 Ameliorate MutableSegment synchronize Change the MemStore size accounting so we don't synchronize across three volatiles applying deltas. Instead: + Make MemStoreSize, a datastructure of our memstore size longs, immutable. + Undo MemStoreSizing being an instance of MemStoreSize; instead it has-a. + Make two MemStoreSizing implementations; one thread-safe, the other not. + Let all memory sizing longs run independent, untied by synchronize (Huaxiang and Anoop suggestion) using atomiclongs. + Review all use of MemStoreSizing. Many are single-threaded and do not need to be synchronized; use the non-thread safe counter. TODO: Use this technique accounting at the global level too.	2018-05-12 02:17:50 +01:00
Sean Busbey	8ba2a7eeb9	HBASE-20544 Make HBTU default to random ports. Signed-off-by: Umesh Agashe <uagashe@cloudera.com> Signed-off-by: Josh Elser <elserj@apache.org>	2018-05-09 23:35:20 -07:00
Thiruvel Thirumoolan	a67909d3d6	HBASE-20545 Improve performance of BaseLoadBalancer.retainAssignment Signed-off-by: tedyu <yuzhihong@gmail.com>	2018-05-09 19:48:27 -07:00

6828 Commits