265 Commits

Author SHA1 Message Date
stack
7c1f15bd2a HBASE-21558 Set version to 2.1.2 on branch-2.1 so can cut an RC 2018-12-05 21:24:04 -08:00
Duo Zhang
030d06141c HBASE-21490 WALProcedure may remove proc wal files still with active procedures
Signed-off-by: Allan Yang <allan163@apache.org>
2018-11-19 08:20:49 -08:00
Duo Zhang
dd1aa88ddd HBASE-21377 Add debug log for procedure stack id related operations 2018-11-19 18:55:50 +08:00
Ankit Singhal
d0c2e60e36 HBASE-21440 Assign procedure on the crashed server is not properly interrupted 2018-11-14 22:33:13 -08:00
Allan Yang
0f295de156 HBASE-21468 separate workers for meta table is not working 2018-11-14 11:43:41 +08:00
jingyuntian
c6090d4f04 HBASE-21437 Bypassed procedure throw IllegalArgumentException when its state is WAITING_TIMEOUT
Signed-off-by: Allan Yang <allan163@apache.org>
2018-11-09 22:52:14 +08:00
Sean Busbey
6f9084380b HBASE-21442 Update branch-2.1 for next development cycle
* update pom versions to 2.1.2-SNAPSHOT
* update CHANGES.md to mark release date (as of arriving in dist/release svn repo)
2018-11-06 14:19:47 -06:00
zhangduo
d19e6dff2c HBASE-21314 The implementation of BitSetNode is not efficient 2018-11-06 09:20:06 +08:00
tedyu
ed218554a9 HBASE-21438 TestAdmin2#testGetProcedures fails due to FailedProcedure inaccessible
Signed-off-by: zhangduo <zhangduo@apache.org>
2018-11-06 09:16:50 +08:00
Allan Yang
0b7c66642b HBASE-21423 Procedures for meta table/region should be able to execute in separate workers 2018-11-05 20:37:15 +08:00
zhangduo
46eb8f1d0d HBASE-21351 The force update thread may have race with PE worker when the procedure is rolling back 2018-11-03 08:25:43 +08:00
zhangduo
2466032fdd HBASE-21375 Revisit the lock and queue implementation in MasterProcedureScheduler 2018-10-29 20:18:10 +08:00
Michael Stack
066082dff4 HBASE-21397 Set version to 2.1.1 on branch-2.1 in prep for first RC 2018-10-26 12:56:24 -07:00
zhangduo
e0dfa4caf3 HBASE-20973 ArrayIndexOutOfBoundsException when rolling back procedure
Signed-off-by: Michael Stack <stack@apache.org>
2018-10-26 12:35:11 -07:00
zhangduo
9a151ec77b Revert "HBASE-20973 ArrayIndexOutOfBoundsException when rolling back procedure"
This reverts commit e29ce9f93753d79edfa4e8b864c31c34e33ea635.
2018-10-26 21:30:10 +08:00
Allan Yang
e71c05707e HBASE-21384 Procedure with holdlock=false should not be restored lock when restarts 2018-10-25 13:58:50 +08:00
Duo Zhang
040ec2227e HBASE-21363 Rewrite the buildingHoldCleanupTracker method in WALProcedureStore 2018-10-24 14:37:26 +08:00
Allan Yang
6c9e3d0670 HBASE-21364 Procedure holds the lock should put to front of the queue after restart 2018-10-24 10:52:52 +08:00
Allan Yang
e29ce9f937 HBASE-20973 ArrayIndexOutOfBoundsException when rolling back procedure 2018-10-23 16:13:24 +08:00
zhangduo
7c04a95f4a HBASE-21321 Backport HBASE-21278 to branch-2.1 and branch-2.0 ("Do not rollback successful sub procedures when rolling back a procedure")
Signed-off-by: Michael Stack <stack@apache.org>
2018-10-22 21:10:11 -07:00
Allan Yang
c141547f3b HBASE-21354 Procedure may be deleted improperly during master restarts resulting in 'Corrupt' 2018-10-23 10:27:48 +08:00
zhangduo
4ded75357b HBASE-21336 Simplify the implementation of WALProcedureMap 2018-10-22 18:36:39 +08:00
zhangduo
7e4cb7d7ec HBASE-21323 Should not skip force updating for a sub procedure even if it has been finished
Reapplication after fixing failing test.
2018-10-19 15:25:15 -07:00
Duo Zhang
63f718974b HBASE-21075 Confirm that we can (rolling) upgrade from 2.0.x and 2.1.x to 2.2.x after HBASE-20881
Signed-off-by: Michael Stack <stack@apache.org>
2018-10-19 12:34:36 -07:00
Michael Stack
0cd23c3dda Revert "HBASE-21323 Should not skip force updating for a sub procedure even if it has been finished"
This reverts commit fffd9b9b6dd42c52f4a30e956313cdb4129c66be.

Revert until we figure out why behavior differs between 2.1 and 2.2.
2018-10-18 20:04:24 -07:00
Michael Stack
8fd3fd0e9c Revert "HBASE-21323 Should not skip force updating for a sub procedure even if"
This reverts commit 30727764a3f9c30c41eaae4340ee7ea9723c1306.

Revert until we figure out why behavior differs between 2.1 and 2.2.
2018-10-18 20:03:57 -07:00
tianjingyun
915e87ecf7 HBASE-21291 Add a test for bypassing stuck state-machine procedures
Signed-off-by: Michael Stack <stack@apache.org>
2018-10-18 14:26:47 -07:00
Michael Stack
30727764a3 HBASE-21323 Should not skip force updating for a sub procedure even if
it has been finished; ADDENDUM

Fix broken unit test.
2018-10-18 13:48:02 -07:00
zhangduo
fffd9b9b6d HBASE-21323 Should not skip force updating for a sub procedure even if it has been finished 2018-10-18 14:44:31 +08:00
Duo Zhang
85c3ec3fb4 HBASE-21315 The getActiveMinProcId and getActiveMaxProcId of BitSetNode are incorrect if there are no active procedure 2018-10-16 15:42:10 +08:00
Duo Zhang
c3401d4327 HBASE-21254 Need to find a way to limit the number of proc wal files 2018-10-12 11:47:48 +08:00
zhangduo
5a300f3fc9 HBASE-21250 Refactor WALProcedureStore and add more comments for better understanding the implementation 2018-10-07 17:16:09 +08:00
Michael Stack
9d34b4581c HBASE-21242 [amv2] Miscellaneous minor log and assign procedure create improvements
For RIT Duration, do better than print ms/seconds. Remove redundant UI
column dedicated to duration when we log it in the status field too.

Make bypass log at INFO level.

Make it so on complete of subprocedure, we note count of outstanding
siblings so we have a clue how much further the parent has to go before
it is done (Helpful when hundreds of servers doing SCP).

Have the SCP run the AP preflight check before creating an AP; saves
creation of thousands of APs during fixup.

Don't log tablename three times when reporting remote call failed.

If lock is held already, note who has it. Also log after we get lock
or if we have to wait rather than log on entrance though we may
later have to wait (or we may have just picked up the lock).

Signed-off-by: Mike Drob <mdrob@apache.org>
2018-10-04 17:18:13 -07:00
Michael Stack
8fc90a23ae HBASE-21213 [hbck2] bypass leaves behind state in RegionStates when assign/unassign
Adds override to assigns and unassigns. Renames the bypass 'force'
param to 'override' to align.

Adds recursive to 'bypass', a means of calling bypass on
parent and its subprocedures (usually bypass works on
leaf nodes rippling the bypass up to parent -- recursive
has us work in the opposite direction): EXPERIMENTAL.

bypass on an assign/unassign leaves region in RIT and the
RegionStateNode loaded with the bypassed procedure. First
implementation had assign/unassign cleanup leftover state.
Second implementation, on feedback, keeps the state in place
as a fence against other Procedures assuming the region entity,
and instead adds an 'override' function that hbck2 can set on
assigns/unassigns to override the fencing.

Note that the below also converts ProcedureExceptions that
come out of the Pv2 system into DoNotRetryIOEs. It is a
little awkward because DNRIOE is in client-module, not
in procedure module. Previously, we'd just keep retrying
the bypass, etc.

M hbase-procedure/src/main/java/org/apache/hadoop/hbase/procedure2/Procedure.java
 Have bypass take an environment like all other methods so subclasses.
 Fix javadoc issues.

M hbase-procedure/src/main/java/org/apache/hadoop/hbase/procedure2/ProcedureExecutor.java
 Javadoc issues. Pass environment when we invoke bypass.

M hbase-server/src/main/java/org/apache/hadoop/hbase/master/HMaster.java
 Rename waitUntilNamespace... etc. to align with how these method types
 are named elsewhere, i.e. waitFor rather than waitUntil.

M hbase-server/src/main/java/org/apache/hadoop/hbase/master/assignment/RegionTransitionProcedure.java
 Clean up the message we emit when we find an existing procedure working
 against this entity.
 Add support for a force function which allows Assigns/Unassigns to force
 ownership of the Region entity.

A hbase-server/src/test/java/org/apache/hadoop/hbase/master/assignment/TestRegionBypass.java
 Test bypass and force.

M hbase-shell/src/main/ruby/shell/commands/list_procedures.rb
 Minor cleanup of the JSON output; use ISO 8601 timestamps.
2018-10-04 16:37:37 -07:00
Michael Stack
259d12f739 Revert "Revert "Revert "HBASE-21213 [hbck2] bypass leaves behind state in RegionStates when assign/unassign"""
This reverts commit 2174461cf729b61a844950278773ee0802ced158.

Revert because not ready to port to other branches.
2018-09-29 04:06:46 -07:00
Michael Stack
2174461cf7 Revert "Revert "HBASE-21213 [hbck2] bypass leaves behind state in RegionStates when assign/unassign""
This reverts commit b96905d1df93aea0bc5b0e1ab074954e57b0dcc4.

i.e. a revert of a revert so a reapplication!

Revert so I can add signed-off-by....

Signed-off-by: Allan Yang <allan163@apache.org>
2018-09-29 03:34:36 -07:00
Michael Stack
b96905d1df Revert "HBASE-21213 [hbck2] bypass leaves behind state in RegionStates when assign/unassign"
This reverts commit b42d7978cbb0d2b02eb5552a2f344cb128092b1e.
2018-09-29 03:34:10 -07:00
Michael Stack
b42d7978cb HBASE-21213 [hbck2] bypass leaves behind state in RegionStates when assign/unassign
bypass on an assign/unassign leaves region in RIT and the
RegionStateNode loaded with the bypassed procedure. First
implementation had assign/unassign cleanup leftover state.
Second implementation, on feedback, keeps the state in place
as a fence against other Procedures assuming the region entity,
and instead adds an 'override' function that hbck2 can set on
assigns/unassigns to override the fencing.

Note that the below also converts ProcedureExceptions that
come out of the Pv2 system into DoNotRetryIOEs. It is a
little awkward because DNRIOE is in client-module, not
in procedure module. Previously, we'd just keep retrying
the bypass, etc.

M hbase-procedure/src/main/java/org/apache/hadoop/hbase/procedure2/Procedure.java
 Have bypass take an environment like all other methods so subclasses.
 Fix javadoc issues.

M hbase-procedure/src/main/java/org/apache/hadoop/hbase/procedure2/ProcedureExecutor.java
 Javadoc issues. Pass environment when we invoke bypass.

M hbase-server/src/main/java/org/apache/hadoop/hbase/master/HMaster.java
 Rename waitUntilNamespace... etc. to align with how these method types
 are named elsewhere, i.e. waitFor rather than waitUntil.

M hbase-server/src/main/java/org/apache/hadoop/hbase/master/assignment/RegionTransitionProcedure.java
 Clean up the message we emit when we find an existing procedure working
 against this entity.
 Add support for a force function which allows Assigns/Unassigns to force
 ownership of the Region entity.

A hbase-server/src/test/java/org/apache/hadoop/hbase/master/assignment/TestRegionBypass.java
 Test bypass and force.

M hbase-shell/src/main/ruby/shell/commands/list_procedures.rb
 Minor cleanup of the JSON output; use ISO 8601 timestamps.
2018-09-29 03:33:07 -07:00
meiyi
8dea600795 HBASE-21249 Add jitter for ProcedureUtil.getBackoffTimeMs
Signed-off-by: zhangduo <zhangduo@apache.org>
2018-09-28 21:28:16 +08:00
zhangduo
4947e72f63 HBASE-21233 Allow the procedure implementation to skip persistence of the state after an execution 2018-09-28 11:14:49 +08:00
Umesh Agashe
e6c7ed34e0 HBASE-21023 Added bypassProcedure() API to HbckService 2018-09-19 15:01:29 -07:00
Michael Stack
487f713c63 HBASE-21190 Log files and count of entries in each as we load from the MasterProcWAL store 2018-09-12 10:19:46 -07:00
Duo Zhang
2da6dbe563 HBASE-21172 Reimplement the retry backoff logic for ReopenTableRegionsProcedure 2018-09-12 16:01:55 +08:00
TAK LON WU
2c19b04274 HBASE-21181 Use the same filesystem for wal archive directory and wal directory
Signed-off-by: Andrew Purtell <apurtell@apache.org>
2018-09-11 15:50:41 -07:00
Michael Stack
f755ded2d2 HBASE-21171 [amv2] Tool to parse a directory of MasterProcWALs standalone
Signed-off-by: Mike Drob <mdrob@apache.org>
2018-09-08 20:35:15 -07:00
Michael Stack
205783419c HBASE-21155 Save on a few log strings and some churn in wal splitter by skipping out early if no logs in dir 2018-09-06 16:36:59 -07:00
Allan Yang
e33591515c HBASE-21083 Introduce a mechanism to bypass the execution of a stuck procedure 2018-08-28 20:18:47 -07:00
Michael Stack
d954031d50 HBASE-21078 [amv2] CODE-BUG NPE in RTP doing Unassign 2018-08-24 13:22:16 -07:00
Allan Yang
ee3507d456 HBASE-21050 Exclusive lock may be held by a SUCCESS state procedure forever
Signed-off-by: Michael Stack <stack@apache.org>
Signed-off-by: zhangduo <zhangduo@apache.org>
2018-08-15 15:39:15 -07:00
Allan Yang
e1188d27f5 HBASE-20978 [amv2] Worker terminating UNNATURALLY during MoveRegionProcedure
Signed-off-by: Michael Stack <stack@apache.org>
Signed-off-by: Duo Zhang <zhangduo@apache.org>
2018-08-14 16:29:58 -07:00