6945 Commits

Author SHA1 Message Date
zhangduo ae6c90b4ec HBASE-20426 Give up replicating anything in S state 2018-06-28 18:08:43 +08:00
huzheng 5b6c0d2777 HBASE-20432 Cleanup related resources when remove a sync replication peer 2018-06-28 18:08:43 +08:00
Guanghao Zhang 1bea678ef8 HBASE-20458 Support removing a WAL from LogRoller 2018-06-28 18:08:43 +08:00
zhangduo 2d203c4479 HBASE-20434 Also remove remote wals when peer is in DA state 2018-06-28 18:08:43 +08:00
zhangduo b281328228 HBASE-20456 Support removing a ReplicationSourceShipper for a special wal group 2018-06-28 18:08:43 +08:00
huzheng 66cced16dc HBASE-20425 Do not write the cluster id of the current active cluster when writing remote WAL 2018-06-28 18:08:43 +08:00
huzheng fe339860b5 HBASE-19782 Reject the replication request when peer is DA or A state 2018-06-28 18:08:43 +08:00
zhangduo d91784e666 HBASE-20370 Also remove the wal file in remote cluster when we finish replicating a file 2018-06-28 18:08:43 +08:00
Guanghao Zhang d57c80c415 HBASE-20163 Forbid major compaction when standby cluster replays the remote wals 2018-06-28 18:08:43 +08:00
zhangduo 2389c09d75 HBASE-19079 Support setting up two clusters with A and S state 2018-06-28 18:08:43 +08:00
Guanghao Zhang c7d1085fa2 HBASE-19999 Remove the SYNC_REPLICATION_ENABLED flag 2018-06-28 18:07:44 +08:00
Guanghao Zhang 183b8d0581 HBASE-19973 Implement a procedure to replay sync replication wal for standby cluster 2018-06-28 18:07:44 +08:00
huzheng 45794d4156 HBASE-19943 Only allow removing sync replication peer which is in DA state 2018-06-28 18:07:44 +08:00
zhangduo 0c97cda2a9 HBASE-19990 Create remote wal directory when transitioning to state S 2018-06-28 18:07:44 +08:00
zhangduo a41c549ca4 HBASE-19082 Reject read/write from client but accept write from replication in state S 2018-06-28 18:07:44 +08:00
zhangduo 39dd81a7c6 HBASE-19957 General framework to transit sync replication state 2018-06-28 18:07:44 +08:00
Guanghao Zhang 00e54aae24 HBASE-19935 Only allow table replication for sync replication for now 2018-06-28 18:07:44 +08:00
Guanghao Zhang 1481bd9481 HBASE-19864 Use protobuf instead of enum.ordinal to store SyncReplicationState
Signed-off-by: zhangduo <zhangduo@apache.org>
2018-06-28 18:07:44 +08:00
zhangduo d8842dc3d4 HBASE-19857 Complete the procedure for adding a sync replication peer 2018-06-28 18:07:44 +08:00
Guanghao Zhang 2acebac00e HBASE-19781 Add a new cluster state flag for synchronous replication 2018-06-28 18:07:44 +08:00
zhangduo 274b813e12 HBASE-19747 Introduce a special WALProvider for synchronous replication 2018-06-28 18:07:44 +08:00
Guanghao Zhang b4a1dbf768 HBASE-19078 Add a remote peer cluster wal directory config for synchronous replication
Signed-off-by: zhangduo <zhangduo@apache.org>
2018-06-28 18:07:44 +08:00
zhangduo b3dea0378e HBASE-19083 Introduce a new log writer which can write to two HDFSes 2018-06-28 18:07:44 +08:00
Michael Stack c23e61f20d HBASE-20781 Save recalculating families in a WALEdit batch of Cells
Pass the Set of families through to the WAL rather than recalculate
a Set already known.

Signed-off-by: zhangduo <zhangduo@apache.org>
2018-06-27 22:04:57 -07:00
Reid Chan 74e5c776b3 HBASE-20732 Shutdown scan pool when master is stopped
Signed-off-by: Chia-Ping Tsai <chia7712@gmail.com>
2018-06-28 11:42:18 +08:00
tedyu a8b16ac907 HBASE-20798 Duplicate thread names of StoreFileOpenerThread and StoreFileCloserThread (Zephyr Guo) 2018-06-27 17:21:07 -07:00
Sahil Aggarwal 952bb96c8a HBASE-19164: Remove UUID.randomUUID in tests.
Signed-off-by: Mike Drob <mdrob@apache.org>
2018-06-27 10:34:16 -05:00
jingyuntian 6a0c67344a HBASE-20194 Basic Replication WebUI - Master
Signed-off-by: zhangduo <zhangduo@apache.org>
2018-06-26 18:26:54 +08:00
Michael Stack 4ba6242a62 HBASE-20780 ServerRpcConnection logging cleanup
Get rid of one of the logging lines in ServerRpcConnection by amalgamating all into one new-style log line.
2018-06-25 16:43:11 -07:00
Michael Stack 0db2b628d6 HBASE-20770 WAL cleaner logs way too much; gets clogged when lots of work to do
General log cleanup; setting stuff that can flood the log to TRACE.
2018-06-25 12:13:04 -07:00
Todd Lipcon 025ddce868 HBASE-20403. Fix race between prefetch task and non-pread HFile reads
With prefetch-on-open enabled, the task doing the prefetching was using
non-positional (i.e. streaming) reads. If the main (non-prefetch) thread
was also using non-positional reads, these two would conflict, because
inputstreams are not thread-safe for non-positional reads.

In the case of an encrypted filesystem, this could cause JVM crashes,
etc, as underlying cipher buffers were freed underneath the racing
threads. In the case of a non-encrypted filesystem, less severe errors
would be thrown. The included unit test reproduces the latter case.
2018-06-25 11:54:52 -07:00
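The distinction between positional and non-positional reads described in HBASE-20403 can be illustrated with a minimal sketch in plain `java.nio`. This is not HBase or HDFS code, and it is not necessarily how the patch fixes the race; `PositionalReadSketch`, `readAt`, and `demo` are hypothetical names. It only shows that a positional read passes an explicit offset and leaves the shared stream position untouched, so two threads can read through one handle without conflicting.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.ByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

// Illustration only: plain java.nio, not HBase or HDFS classes.
class PositionalReadSketch {

  // Reads up to `len` bytes starting at `offset`. FileChannel.read(dst, position)
  // does not modify the channel's own position, so concurrent callers
  // sharing the channel cannot disturb each other.
  static String readAt(FileChannel ch, long offset, int len) throws IOException {
    ByteBuffer buf = ByteBuffer.allocate(len);
    long pos = offset;
    while (buf.hasRemaining()) {
      int n = ch.read(buf, pos); // positional read: explicit offset
      if (n < 0) {
        break; // end of file
      }
      pos += n;
    }
    return new String(buf.array(), 0, buf.position(), StandardCharsets.US_ASCII);
  }

  // Two threads (stand-ins for a prefetch task and a scanner) read
  // different regions of the same file through one shared channel.
  static String[] demo() throws IOException, InterruptedException {
    Path file = Files.createTempFile("pread-sketch", ".bin");
    try {
      Files.write(file, "0123456789abcdefghij".getBytes(StandardCharsets.US_ASCII));
      String[] out = new String[2];
      try (FileChannel ch = FileChannel.open(file, StandardOpenOption.READ)) {
        Thread prefetch = new Thread(() -> out[0] = quietRead(ch, 0, 10));
        Thread scanner = new Thread(() -> out[1] = quietRead(ch, 10, 10));
        prefetch.start();
        scanner.start();
        prefetch.join();
        scanner.join();
      }
      return out;
    } finally {
      Files.deleteIfExists(file);
    }
  }

  // Wraps the checked IOException so readAt can be called from a lambda.
  private static String quietRead(FileChannel ch, long offset, int len) {
    try {
      return readAt(ch, offset, len);
    } catch (IOException e) {
      throw new UncheckedIOException(e);
    }
  }
}
```

A streaming read, by contrast, advances a single shared position on every call, which is why two threads doing streaming reads on the same input stream can interfere with each other.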
zhangduo 9640ebacd4 HBASE-20777 RpcConnection could still remain opened after we shutdown the NettyRpcServer 2018-06-25 14:15:15 +08:00
Michael Stack daad14428d HBASE-20778 Make it so WALPE runs on DFS 2018-06-23 23:33:53 -07:00
zhangduo 55147c7eae HBASE-20775 Addendum disable REGIONS_ON_MASTER for TestMultiParallel 2018-06-23 17:38:50 +08:00
zhangduo 14087cc919 HBASE-20775 TestMultiParallel is flakey 2018-06-22 21:32:07 +08:00
zhangduo 177458d9d0 HBASE-18569 Add prefetch support for async region locator 2018-06-22 18:25:31 +08:00
tedyu 98245ca6e4 HBASE-20740 StochasticLoadBalancer should consider CoprocessorService request factor when computing cost (chenxu) 2018-06-22 00:26:14 -07:00
zhangduo 7b716c964b HBASE-20752 Make sure the regions are truly reopened after ReopenTableRegionsProcedure 2018-06-22 14:04:33 +08:00
zhangduo 0d784efc37 HBASE-20767 Always close hbaseAdmin along with connection in HBTU 2018-06-21 21:01:19 +08:00
Ankit Singhal 72784c2d83 HBASE-20642 Clients should re-use the same nonce across DDL operations
Also changes modify table operations to help the case where an MTP spans
two masters, avoiding the sanity-checks propagating back to the client
unnecessarily.

Signed-off-by: Josh Elser <elserj@apache.org>
Signed-off-by: Michael Stack <stack@apache.org>
2018-06-20 14:56:10 -07:00
Josh Elser e989a9927e HBASE-20706 Prevent MTP from trying to reopen non-OPEN regions
ModifyTableProcedure is using MoveRegionProcedure in a way
that was unintended from the original implementation. As such,
we have to guard against certain usages of it. We know we can
re-open OPEN regions, but regions in OPENING will similarly
soon be OPEN (thus, we want to reopen those regions too).

Signed-off-by: Michael Stack <stack@apache.org>
Signed-off-by: zhangduo <zhangduo@apache.org>
2018-06-20 14:19:28 -07:00
zhangduo 4cb70ea9f5 HBASE-20739 Add priority for SCP 2018-06-20 15:17:07 +08:00
zhangduo c08eff67af HBASE-20742 Always create WAL directory for region server 2018-06-20 14:21:23 +08:00
Michael Stack 21684a32fa HBASE-20745 Log when master proc wal rolls 2018-06-19 19:53:51 -07:00
zhangduo 6dbbd78aa0 HBASE-20708 Remove the usage of RecoverMetaProcedure in master startup 2018-06-19 15:02:10 +08:00
Allan Yang b336da925a HBASE-20727 Persist FlushedSequenceId to speed up WAL split after cluster restart 2018-06-19 09:45:47 +08:00
Sean Busbey f1b536bad4 HBASE-20332 shaded mapreduce module shouldn't include hadoop
* modify the jar checking script to take args; make hadoop stuff optional
* separate out checking the artifacts that have hadoop vs those that don't.
  * Unfortunately means we need two modules for checking things
  * put in a safety check that the support script for checking jar contents is maintained in both modules
  * have to carve out an exception for o.a.hadoop.metrics2. :(
* fix duplicated class warning
* clean up dependencies in hbase-server and some modules that depend on it.
* allow Hadoop to have its own htrace where it needs it
* add a precommit check to make sure we're not using old htrace imports
2018-06-18 11:31:04 -07:00
tedyu ac5bb8155b HBASE-20723 Custom hbase.wal.dir results in data loss because we write recovered edits into a different place than where the recovering region server looks for them 2018-06-15 19:40:48 -07:00
taiynlee 0e43abc78a HBASE-20737 put collection into ArrayList instead of addAll function
Signed-off-by: Chia-Ping Tsai <chia7712@gmail.com>
2018-06-16 03:25:42 +08:00
Xu Cang 86653c708f HBASE-20695 Implement table level RegionServer replication metrics
Signed-off-by: Guanghao Zhang <zghao@apache.org>
2018-06-15 10:38:49 +08:00
jingyuntian 0b28155d27 HBASE-20625 refactor some WALCellCodec related code
Signed-off-by: Guanghao Zhang <zghao@apache.org>
2018-06-14 19:37:01 +08:00
zhangduo 423a0ab71a HBASE-20722 Make RegionServerTracker only depend on children changed event 2018-06-14 08:36:37 +08:00
Guanghao Zhang ec66434380 HBASE-20561 The way we stop a ReplicationSource may bring the RS down 2018-06-13 17:58:59 +08:00
tedyu edf60b965b HBASE-20672 Adding new Metrics readRequestRate and writeRequestRate - revert pending discussion 2018-06-11 18:47:30 -07:00
Balazs Meszaros c323e7bfaa HBASE-20656 Validate pre-2.0 coprocessors against HBase 2.0+
Signed-off-by: Mike Drob <mdrob@apache.org>
2018-06-11 10:26:58 -05:00
Mike Drob eb13cdd7ed HBASE-20707 Move MissingSwitchDefault case check
Perform this check using error-prone instead of checkstyle because the
former can handle enum switches somewhat more intelligently.
2018-06-11 09:57:50 -05:00
zhangduo 573b57d437 HBASE-20700 Move meta region when server crash can cause the procedure to be stuck 2018-06-11 14:57:31 +08:00
Guanghao Zhang cc7aefe0bb HBASE-20698 (addendum) Master doesn't record the right server version until a newly started region server calls the regionServerReport method 2018-06-10 08:23:28 +08:00
Guanghao Zhang 5fd16f3853 HBASE-20698 Master doesn't record the right server version until a newly started region server calls the regionServerReport method 2018-06-09 14:40:43 +08:00
Ankit 519236b4af HBASE-20672 Adding new Metrics readRequestRate and writeRequestRate
Signed-off-by: tedyu <yuzhihong@gmail.com>
2018-06-08 13:48:33 -07:00
Nihal Jain 30a052b3e5 HBASE-20699 QuotaCache should cancel the QuotaRefresherChore service inside its stop()
Signed-off-by: tedyu <yuzhihong@gmail.com>
2018-06-08 04:30:52 -07:00
Michael Stack cfeb26d27a HBASE-20702 Processing crash, skip ONLINE'ing empty rows
Signed-off-by: Josh Elser <elserj@apache.org>
2018-06-07 09:54:57 -07:00
eric-maynard 9a80907760 HBASE-20665: Changed log level of HBASE-8547 warning to debug
Closes #77

Signed-off-by: Josh Elser <elserj@apache.org>
Signed-off-by: Sean Busbey <busbey@apache.org>
2018-06-07 11:34:33 -04:00
Peter Somogyi cfd4b7d564 HBASE-20683 Incorrect return value for PreUpgradeValidator
Signed-off-by: Ted Yu <yuzhihong@gmail.com>
Signed-off-by: Chia-Ping Tsai <chia7712@gmail.com>
2018-06-06 20:03:56 +02:00
Andrew Purtell a45763df55 HBASE-20670 NPE in HMaster#isInMaintenanceMode 2018-06-04 15:19:47 -07:00
Michael Stack d99ba62b12 HBASE-20634 Reopen region while server crash can cause the procedure to be stuck; ADDENDUM 2018-06-04 12:39:39 -07:00
Michael Stack 03c0f7fe13 HBASE-20628 SegmentScanner does over-comparing when one flushing
Signed-off-by: eshcar <eshcar@oath.com>
Signed-off-by: anoopsjohn <anoopsamjohn@gmail.com>
2018-06-04 09:50:47 -07:00
zhangduo a472f24d17 HBASE-20634 Reopen region while server crash can cause the procedure to be stuck
A reattempt at fixing HBASE-20173 [AMv2] DisableTableProcedure concurrent to ServerCrashProcedure can deadlock

The scenario: an SCP, after processing WALs, goes to assign the regions
that were on the crashed server, but a concurrent Procedure gets in
there first and tries to unassign one of those regions (it could be
part of a move procedure, a disable table, etc.). The unassign happens
to run AFTER the SCP has released all RPCs that were going against the
crashed server, and it fails because the server has crashed. The
unassign used to suspend itself, but it would never be woken up because
the server it was going against had already been processed. Worse, the
suspended unassign still held the lock on a region the SCP wanted to
assign, so the SCP could make no progress either.

Here, we teach the unassign to recognize when it is running after the
SCP's cleanup of RPCs. In that case, the unassign moves to finish
instead of suspending itself.

Includes a nice unit test made by Duo Zhang that reproduces nicely the
hung scenario.

M hbase-server/src/main/java/org/apache/hadoop/hbase/master/assignment/FailedRemoteDispatchException.java
 Moved this class back to hbase-procedure where it belongs.

M hbase-procedure/src/main/java/org/apache/hadoop/hbase/procedure2/NoNodeDispatchException.java
M hbase-procedure/src/main/java/org/apache/hadoop/hbase/procedure2/NoServerDispatchException.java
M hbase-procedure/src/main/java/org/apache/hadoop/hbase/procedure2/NullTargetServerDispatchException.java
 Specializations on FRDE so we can be more particular when we say there
 was a problem.

M hbase-procedure/src/main/java/org/apache/hadoop/hbase/procedure2/RemoteProcedureDispatcher.java
 Change addOperationToNode so we throw exceptions that give more detail
 on issue rather than a mysterious true/false

M hbase-protocol-shaded/src/main/protobuf/MasterProcedure.proto
 Undo SERVER_CRASH_HANDLE_RIT2. Bad idea (from HBASE-20173)

M hbase-server/src/main/java/org/apache/hadoop/hbase/master/ServerManager.java
 Have expireServer return true if it actually queued an expiration. Used
 later in this patch.

M hbase-server/src/main/java/org/apache/hadoop/hbase/master/assignment/AssignmentManager.java
 Hide methods that shouldn't be public. Add a particular check used out
 in unassign procedure failure processing.

M hbase-server/src/main/java/org/apache/hadoop/hbase/master/assignment/MoveRegionProcedure.java
 Check that server we're to move from is actually online (might
 catch a few silly move requests early).

M hbase-server/src/main/java/org/apache/hadoop/hbase/master/assignment/RegionStates.java
 Add doc on ServerState. Wasn't being used really. Now we actually stamp
 a Server OFFLINE after its WAL has been split. Means it's safe to assign
 since all WALs have been processed. Add methods to update SPLITTING
 and to set it to OFFLINE after splitting done.

M hbase-server/src/main/java/org/apache/hadoop/hbase/master/assignment/RegionTransitionProcedure.java
 Change logging to be new-style and less repetitive of info.
 Cater to new way in which .addOperationToNode returns info (exceptions
 rather than true/false).

M hbase-server/src/main/java/org/apache/hadoop/hbase/master/assignment/UnassignProcedure.java
 Add looking for the case where we failed assign AND we should not
 suspend because we will never be woken up because SCP is beyond
 doing this for all stuck RPCs.

 Some cleanup of the failure processing grouping where we can proceed.

 TODOs have been handled in this refactor including the TODO that
 wonders if it is possible that there are concurrent fails coming in
 (Yes).

M hbase-server/src/main/java/org/apache/hadoop/hbase/master/procedure/ServerCrashProcedure.java
 Doc and removing the old HBASE-20173 'fix'.
 Also updating ServerStateNode post WAL splitting so it gets marked
 OFFLINE.

A hbase-server/src/test/java/org/apache/hadoop/hbase/master/TestServerCrashProcedureStuck.java
 Nice test by Duo Zhang.

Signed-off-by: Umesh Agashe <uagashe@cloudera.com>
Signed-off-by: Duo Zhang <palomino219@gmail.com>
Signed-off-by: Mike Drob <mdrob@apache.org>
2018-06-04 09:26:56 -07:00
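The HBASE-20634 change to `addOperationToNode`, throwing exceptions that carry detail instead of returning "a mysterious true/false", can be sketched roughly as follows. All class and method names here (`DispatchSketch`, `FailedDispatchException`, `NoNodeException`, `NullTargetException`, `describe`) are hypothetical stand-ins chosen for illustration; they are not the HBase classes of similar names, and the handling shown is an assumption about how a caller might react.

```java
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.CopyOnWriteArrayList;

// Hypothetical stand-ins for illustration; not the HBase implementation.
class DispatchSketch {

  // Base failure type, with one subclass per distinct failure mode.
  static class FailedDispatchException extends Exception {
    FailedDispatchException(String msg) { super(msg); }
  }

  static class NullTargetException extends FailedDispatchException {
    NullTargetException(String msg) { super(msg); }
  }

  static class NoNodeException extends FailedDispatchException {
    NoNodeException(String msg) { super(msg); }
  }

  // Per-server queues of pending operations.
  private final Map<String, List<String>> nodes = new ConcurrentHashMap<>();

  void addNode(String server) {
    nodes.putIfAbsent(server, new CopyOnWriteArrayList<>());
  }

  // A boolean return would collapse every failure into `false`.
  // Throwing a specific type tells the caller which case it hit.
  void addOperation(String server, String op) throws FailedDispatchException {
    if (server == null) {
      throw new NullTargetException("no target server for " + op);
    }
    List<String> queue = nodes.get(server);
    if (queue == null) {
      throw new NoNodeException("no node for " + server + "; op=" + op);
    }
    queue.add(op);
  }

  // A caller can now branch on the failure mode, for example finishing
  // rather than suspending when the target node is already gone.
  static String describe(DispatchSketch d, String server, String op) {
    try {
      d.addOperation(server, op);
      return "queued";
    } catch (NoNodeException e) {
      return "finish: server already processed";
    } catch (FailedDispatchException e) {
      return "fail: " + e.getMessage();
    }
  }
}
```

With distinct exception types, the caller can tell a missing node apart from a missing target, which is the kind of distinction the unassign failure processing described in this commit relies on.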
maoling 1b98a96caa HBASE-19761: Fix Checkstyle errors in hbase-zookeeper
Signed-off-by: Jan Hentschel <jan.hentschel@ultratendency.com>
2018-06-02 10:08:15 +02:00
Andrew Purtell 9d5004894c HBASE-20667 Rename TestGlobalThrottler to TestReplicationGlobalThrottler 2018-06-01 17:01:16 -07:00
Xu Cang a11701ecc5 HBASE-18116 Replication source in-memory accounting should not include bulk transfer hfiles
Signed-off-by: Andrew Purtell <apurtell@apache.org>
2018-06-01 11:15:47 -07:00
Peter Somogyi 0968668283 HBASE-20592 Create a tool to verify tables do not have prefix tree encoding
Signed-off-by: Mike Drob <mdrob@apache.org>
2018-06-01 19:17:49 +02:00
Andrew Purtell da3ecf1f13 Revert "HBASE-18116 fix replication source in-memory calculation by excluding bulk load file"
This reverts commit 6f3f34227e.
2018-05-31 15:28:28 -07:00
Xu Cang 6f3f34227e HBASE-18116 fix replication source in-memory calculation by excluding bulk load file
Signed-off-by: Andrew Purtell <apurtell@apache.org>
2018-05-31 14:22:12 -07:00
Sean Busbey d909ec55aa HBASE-20444 Addendum keep folks from looking at raw version component array.
Signed-off-by: Andrew Purtell <apurtell@apache.org>
2018-05-31 13:29:52 -05:00
Nihal Jain 40a73a5ca7 HBASE-20653 Add missing observer hooks for region server group to MasterObserver
Signed-off-by: tedyu <yuzhihong@gmail.com>
2018-05-30 21:29:07 -07:00
Andrew Purtell b889c8a221 HBASE-20646 TestWALProcedureStoreOnHDFS failing on branch-1 2018-05-30 14:44:30 -07:00
Andrew Purtell 31ae8dc7f3 Revert "TestWALProcedureStoreOnHDFS failing on branch-1"
This reverts commit dcfa01448c.
2018-05-30 14:44:22 -07:00
Andrew Purtell dcfa01448c TestWALProcedureStoreOnHDFS failing on branch-1 2018-05-30 13:45:38 -07:00
zhangduo 997747076d HBASE-20659 Implement a reopen table regions procedure 2018-05-30 20:03:25 +08:00
tedyu 266b251dfa HBASE-20639 Implement permission checking through AccessController instead of RSGroupAdminEndpoint - revert due to pending discussion 2018-05-29 19:57:51 -07:00
tedyu fe73fe8def HBASE-20653 Add missing observer hooks for region server group to MasterObserver - revert due to pending discussion 2018-05-29 19:42:28 -07:00
Nihal Jain 8d19bbd347 HBASE-20653 Add missing observer hooks for region server group to MasterObserver
Signed-off-by: tedyu <yuzhihong@gmail.com>
2018-05-29 16:37:19 -07:00
Andrew Purtell 06611256ee HBASE-20597 Serialize access to a shared reference to ZooKeeperWatcher in HBaseReplicationEndpoint 2018-05-29 11:29:05 -07:00
Andrew Purtell 807c905f90 Revert "HBASE-20597 Use a lock to serialize access to a shared reference to ZooKeeperWatcher in HBaseReplicationEndpoint"
This reverts commit 9fbce1668b.
2018-05-29 11:24:11 -07:00
Nihal Jain 7ff29d8e00 HBASE-20633 Dropping a table containing a disable violation policy fails to remove the quota upon table delete
Signed-off-by: Josh Elser <elserj@apache.org>
Signed-off-by: Michael Stack <stack@apache.org>
2018-05-29 11:33:56 -04:00
Mike Drob a110e1eff5 HBASE-20478 Update checkstyle to v8.2
Cannot go to latest (8.9) yet due to
  https://github.com/checkstyle/checkstyle/issues/5279

* move hbaseanti import checks to checkstyle
* implement a few missing equals checks, and ignore one
* fix lots of javadoc errors

Signed-off-by: Sean Busbey <busbey@apache.org>
2018-05-29 10:12:31 -05:00
eshcar 42be553433 HBASE-20390 ADDENDUM 2: fix TestHRegionWithInMemoryFlush OOME 2018-05-29 16:27:20 +03:00
Apekshit Sharma 05f57f4c03 HBASE-20652 Remove internal uses of some deprecated MasterObserver hooks
Remove internal uses of these hooks:
preModifyNamespace
postModifyNamespace
preModifyTable
postModifyTable
preModifyTableAction
postCompletedModifyTableAction

Signed-off-by: tedyu <yuzhihong@gmail.com>
2018-05-28 21:10:52 -07:00
huzheng 81228f72d0 HBASE-20533 Fix the flaky TestAssignmentManagerMetrics 2018-05-29 09:38:47 +08:00
eshcar 1cd2b56802 HBASE-20390-ADDENDUM: fix TestHRegionWithInMemoryFlush OOME 2018-05-28 16:10:53 +03:00
Nihal Jain 9bd4b04ca8 HBASE-20639 Implement permission checking through AccessController instead of RSGroupAdminEndpoint
Signed-off-by: tedyu <yuzhihong@gmail.com>
2018-05-27 11:29:26 -07:00
eshcar 1eabbb4295 HBASE-20390: IMC Default Parameters for 2.0.0 2018-05-26 22:57:28 +03:00
Toshihiro Suzuki b1089e8310 HBASE-20648 HBASE-19364 "Truncate_preserve fails with table when replica region > 1" for master branch
Signed-off-by: tedyu <yuzhihong@gmail.com>
2018-05-25 07:52:40 -07:00
meiyi 36f3d9432a HBASE-20518 Need to serialize the enabled field for UpdatePeerConfigProcedure
Signed-off-by: zhangduo <zhangduo@apache.org>
2018-05-25 14:36:16 +08:00
Thiruvel Thirumoolan 1fbce10ff4 HBASE-20548 Master fails to startup on large clusters, refreshing block distribution
Signed-off-by: Andrew Purtell <apurtell@apache.org>
2018-05-24 15:47:24 -07:00
Toshihiro Suzuki 554d513f50 HBASE-20616 TruncateTableProcedure is stuck in retry loop in TRUNCATE_TABLE_CREATE_FS_LAYOUT state
Signed-off-by: tedyu <yuzhihong@gmail.com>
2018-05-24 15:16:30 -07:00
Nihal Jain 09dac89908 HBASE-20588 Space quota change after quota violation doesn't seem to take effect
Signed-off-by: Josh Elser <elserj@apache.org>
2018-05-24 12:40:55 -04:00
Guanghao Zhang 320a3332e0 HBASE-20589 Don't need to assign meta to a new RS when standby master becomes active 2018-05-24 11:26:48 +08:00
zhangduo ee540c9f9e HBASE-20624 Race in ReplicationSource which causes walEntryFilter being null when creating new shipper 2018-05-24 10:48:29 +08:00