zhangduo a472f24d17 HBASE-20634 Reopen region while server crash can cause the procedure to be stuck
A reattempt at fixing HBASE-20173 [AMv2] DisableTableProcedure concurrent to ServerCrashProcedure can deadlock

The scenario: an SCP, after processing WALs, goes to assign the regions
that were on the crashed server, but a concurrent Procedure gets in
there first and tries to unassign a region that was on the crashed
server (it could be part of a move procedure, a disable table, etc.).
The unassign happens to run AFTER SCP has released all RPCs that were
going against the crashed server. The unassign fails because the server
is crashed. The unassign used to suspend itself, but it would never be
woken up because the server it was going against had already been
processed. Worse, the SCP could not make progress either, because the
suspended unassign held the lock on a region that the SCP wanted to
assign.

This patch teaches the unassign to recognize that it is running after
SCP has cleaned up RPCs. In that case, the unassign moves to finish
instead of suspending itself.
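
As a rough illustration of the decision described above (not the actual
UnassignProcedure code), the sketch below picks between suspending and
finishing based on the state of the target server. The enum and method
names are hypothetical, and it assumes that an OFFLINE server state is
the signal that SCP has finished WAL splitting and RPC cleanup.

```java
public class UnassignFailureSketch {
    // Hypothetical server states; the names echo this commit message,
    // but this enum is illustrative and not the HBase ServerState class.
    enum ServerState { ONLINE, SPLITTING, OFFLINE }

    // What the unassign does after its remote call to the server fails.
    enum Action { SUSPEND_AND_WAIT, FINISH }

    static Action onDispatchFailure(ServerState state) {
        // Assumption: OFFLINE means SCP is already past RPC cleanup,
        // so nobody would ever wake a suspended unassign.
        return state == ServerState.OFFLINE
            ? Action.FINISH
            : Action.SUSPEND_AND_WAIT;
    }

    public static void main(String[] args) {
        for (ServerState s : ServerState.values()) {
            System.out.println(s + " -> " + onDispatchFailure(s));
        }
    }
}
```

Finishing releases the region lock, which is what lets the SCP go on to
assign the region.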

Includes a nice unit test by Duo Zhang that reproduces the hung
scenario.

M hbase-server/src/main/java/org/apache/hadoop/hbase/master/assignment/FailedRemoteDispatchException.java
 Moved this class back to hbase-procedure where it belongs.

M hbase-procedure/src/main/java/org/apache/hadoop/hbase/procedure2/NoNodeDispatchException.java
M hbase-procedure/src/main/java/org/apache/hadoop/hbase/procedure2/NoServerDispatchException.java
M hbase-procedure/src/main/java/org/apache/hadoop/hbase/procedure2/NullTargetServerDispatchException.java
 Specializations of FRDE so we can be more particular when we say there
 was a problem.

M hbase-procedure/src/main/java/org/apache/hadoop/hbase/procedure2/RemoteProcedureDispatcher.java
 Change addOperationToNode so we throw exceptions that give more detail
 on the issue rather than a mysterious true/false.
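
 To illustrate the idea of typed failures replacing a boolean result,
 here is a small self-contained sketch. It is not the
 RemoteProcedureDispatcher API: the class, the exception names
 (DispatchFailure, NullTarget, NoNode) and the String-based server and
 operation types are simplified stand-ins invented for this example.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class DispatchSketch {
    // Hypothetical base failure, standing in for a general dispatch exception.
    static class DispatchFailure extends Exception {
        DispatchFailure(String msg) { super(msg); }
    }
    // Hypothetical specializations that say *why* the dispatch failed.
    static class NullTarget extends DispatchFailure {
        NullTarget() { super("target server is null"); }
    }
    static class NoNode extends DispatchFailure {
        NoNode(String server) { super("no dispatch node for " + server); }
    }

    // One queue of pending operations per known server.
    private final Map<String, List<String>> queues = new HashMap<>();

    void addServer(String server) { queues.put(server, new ArrayList<>()); }

    // Throws a specific exception instead of returning false.
    void addOperation(String server, String op) throws DispatchFailure {
        if (server == null) throw new NullTarget();
        List<String> q = queues.get(server);
        if (q == null) throw new NoNode(server);
        q.add(op);
    }

    // A caller can now react differently to each failure cause.
    static String describe(DispatchSketch d, String server) {
        try {
            d.addOperation(server, "close-region");
            return "queued";
        } catch (NullTarget e) {
            return "null-target";
        } catch (NoNode e) {
            return "no-node";
        } catch (DispatchFailure e) {
            return "failed";
        }
    }

    public static void main(String[] args) {
        DispatchSketch d = new DispatchSketch();
        d.addServer("rs1");
        System.out.println(describe(d, "rs1"));
        System.out.println(describe(d, "rs2"));
        System.out.println(describe(d, null));
    }
}
```

 With a boolean return, all three calls in main that fail would look the
 same to the caller; with exceptions, the caller can tell them apart.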

M hbase-protocol-shaded/src/main/protobuf/MasterProcedure.proto
 Undo SERVER_CRASH_HANDLE_RIT2. Bad idea (from HBASE-20173)

M hbase-server/src/main/java/org/apache/hadoop/hbase/master/ServerManager.java
 Have expireServer return true if it actually queued an expiration. Used
 later in this patch.

M hbase-server/src/main/java/org/apache/hadoop/hbase/master/assignment/AssignmentManager.java
 Hide methods that shouldn't be public. Add a particular check used out
 in unassign procedure failure processing.

M hbase-server/src/main/java/org/apache/hadoop/hbase/master/assignment/MoveRegionProcedure.java
 Check that server we're to move from is actually online (might
 catch a few silly move requests early).

M hbase-server/src/main/java/org/apache/hadoop/hbase/master/assignment/RegionStates.java
 Add doc on ServerState. Wasn't being used really. Now we actually stamp
 a Server OFFLINE after its WAL has been split. This means it's safe to
 assign since all WALs have been processed. Add methods to update
 SPLITTING and to set it to OFFLINE after splitting is done.

M hbase-server/src/main/java/org/apache/hadoop/hbase/master/assignment/RegionTransitionProcedure.java
 Change logging to be new-style and less repetitive of info.
 Cater to new way in which .addOperationToNode returns info (exceptions
 rather than true/false).

M hbase-server/src/main/java/org/apache/hadoop/hbase/master/assignment/UnassignProcedure.java
 Look for the case where we failed assign AND we should not suspend,
 because we would never be woken up: SCP is already past the point
 where it does this for all stuck RPCs.

 Some cleanup of the failure processing grouping where we can proceed.

 TODOs have been handled in this refactor, including the TODO that
 wonders if it is possible that there are concurrent fails coming in
 (Yes).

M hbase-server/src/main/java/org/apache/hadoop/hbase/master/procedure/ServerCrashProcedure.java
 Doc and removing the old HBASE-20173 'fix'.
 Also updating ServerStateNode post WAL splitting so it gets marked
 OFFLINE.

A hbase-server/src/test/java/org/apache/hadoop/hbase/master/TestServerCrashProcedureStuck.java
 Nice test by Duo Zhang.

Signed-off-by: Umesh Agashe <uagashe@cloudera.com>
Signed-off-by: Duo Zhang <palomino219@gmail.com>
Signed-off-by: Mike Drob <mdrob@apache.org>
2018-06-04 09:26:56 -07:00
bin HBASE-20592 Create a tool to verify tables do not have prefix tree encoding 2018-06-01 19:17:49 +02:00
conf HBASE-12882 Log level now configurable from outside Log4j configuration 2018-06-02 11:21:15 +02:00
dev-support HBASE-20501 update minimum supported Hadoop version to 2.7.1. 2018-05-30 21:22:42 -05:00
hbase-annotations HBASE-20212 Make all Public classes have InterfaceAudience category 2018-03-22 18:10:23 +08:00
hbase-archetypes HBASE-20544 Make HBTU default to random ports. 2018-05-09 23:35:20 -07:00
hbase-assembly HBASE-20149 Purge dev javadoc from bin tarball (or make a separate tarball of javadoc) 2018-04-10 10:04:22 -07:00
hbase-backup HBASE-20478 Update checkstyle to v8.2 2018-05-29 10:12:31 -05:00
hbase-build-configuration HBASE-20180 Avoid Class::newInstance 2018-03-14 13:15:05 -05:00
hbase-build-support HBASE-19987 error-prone 2.2.0 2018-03-07 13:30:51 -06:00
hbase-checkstyle HBASE-20478 Update checkstyle to v8.2 2018-05-29 10:12:31 -05:00
hbase-client HBASE-20640 Add missing test category and class rule 2018-05-30 10:34:19 -04:00
hbase-common HBASE-20444 Addendum keep folks from looking at raw version component array. 2018-05-31 13:29:52 -05:00
hbase-endpoint HBASE-20478 Update checkstyle to v8.2 2018-05-29 10:12:31 -05:00
hbase-examples HBASE-20590 REST Java client is not able to negotiate with the server in the secure mode 2018-06-04 14:11:19 +05:30
hbase-external-blockcache HBASE-20447 Only fail cacheBlock if block collisions aren't related to next block metadata 2018-05-14 17:16:54 -07:00
hbase-hadoop-compat HBASE-20450 Provide metrics for number of total active, priority and replication rpc handlers 2018-04-20 16:24:32 -07:00
hbase-hadoop2-compat HBASE-19724 Fixed Checkstyle errors in hbase-hadoop2-compat and enabled Checkstyle to fail on violations 2018-06-01 10:59:47 +02:00
hbase-http HBASE-20478 Update checkstyle to v8.2 2018-05-29 10:12:31 -05:00
hbase-it HBASE-19761:Fix Checkstyle errors in hbase-zookeeper 2018-06-02 10:08:15 +02:00
hbase-mapreduce HBASE-20579 Include original exception in wrapped exception 2018-06-02 22:27:13 -04:00
hbase-metrics HBASE-20212 Make all Public classes have InterfaceAudience category 2018-03-22 18:10:23 +08:00
hbase-metrics-api HBASE-20212 Make all Public classes have InterfaceAudience category 2018-03-22 18:10:23 +08:00
hbase-native-client HBASE-14087 Ensure correct ASF headers for docs/code 2015-07-29 14:25:43 -05:00
hbase-procedure HBASE-20634 Reopen region while server crash can cause the procedure to be stuck 2018-06-04 09:26:56 -07:00
hbase-protocol HBASE-20356 Make skipping protoc possible 2018-04-12 13:31:54 -05:00
hbase-protocol-shaded HBASE-20659 Implement a reopen table regions procedure 2018-05-30 20:03:25 +08:00
hbase-replication HBASE-19761:Fix Checkstyle errors in hbase-zookeeper 2018-06-02 10:08:15 +02:00
hbase-resource-bundle HBASE-20070 refactor website generation 2018-03-02 09:25:10 -06:00
hbase-rest HBASE-20590 REST Java client is not able to negotiate with the server in the secure mode 2018-06-04 14:11:19 +05:30
hbase-rsgroup HBASE-19761:Fix Checkstyle errors in hbase-zookeeper 2018-06-02 10:08:15 +02:00
hbase-server HBASE-20634 Reopen region while server crash can cause the procedure to be stuck 2018-06-04 09:26:56 -07:00
hbase-shaded HBASE-20070 refactor website generation 2018-03-02 09:25:10 -06:00
hbase-shell HBASE-20645 Pass stringified table name to exists? method 2018-05-25 15:02:47 -04:00
hbase-spark HBASE-20544 Make HBTU default to random ports. 2018-05-09 23:35:20 -07:00
hbase-spark-it HBASE-20544 Make HBTU default to random ports. 2018-05-09 23:35:20 -07:00
hbase-testing-util HBASE-20544 Make HBTU default to random ports. 2018-05-09 23:35:20 -07:00
hbase-thrift HBASE-20664 Reduce the broad scope of outToken in ThriftHttpServlet 2018-05-31 20:02:25 -04:00
hbase-zookeeper HBASE-19761:Fix Checkstyle errors in hbase-zookeeper 2018-06-02 10:08:15 +02:00
src HBASE-18948: Added a note in the Tag implementation details in security.adoc 2018-06-04 11:03:58 -04:00
.gitattributes HBASE-6816. [WINDOWS] line endings on checkout for .sh files 2013-01-23 19:30:14 +00:00
.gitignore HBASE-19637 Add .checkstyle to gitignore 2017-12-27 11:24:35 +08:00
.pylintrc HBASE-18041 Add .pylintrc to HBase 2017-06-28 12:22:37 -05:00
CHANGES.txt HBASE-18548 Move sources of website gen and check jobs into source control 2017-08-10 14:48:14 -07:00
LICENSE.txt HBASE-18548 Move sources of website gen and check jobs into source control 2017-08-10 14:48:14 -07:00
NOTICE.txt HBASE-20088 Update NOTICE.txt year 2018-02-27 09:52:30 -05:00
README.txt HBASE-14348 Update download mirror link 2018-04-04 14:30:06 -07:00
pom.xml Add Guangxu Cheng to pom.xml 2018-06-04 14:54:39 +08:00

README.txt

Apache HBase [1] is an open-source, distributed, versioned, column-oriented
store modeled after Google's Bigtable: A Distributed Storage System for
Structured Data by Chang et al. [2]  Just as Bigtable leverages the distributed
data storage provided by the Google File System, HBase provides Bigtable-like
capabilities on top of Apache Hadoop [3].

To get started using HBase, the full documentation for this release can be
found under the docs/ directory that accompanies this README.  Using a browser,
open docs/index.html to view the project home page (or browse to [1]).
The hbase 'book' at http://hbase.apache.org/book.html has a 'quick start'
section and is where you should begin your exploration of the hbase project.

The latest HBase can be downloaded from an Apache Mirror [4].

The source code can be found at [5].

The HBase issue tracker is at [6].

Apache HBase is made available under the Apache License, version 2.0 [7].

The HBase mailing lists and archives are listed here [8].

The HBase distribution includes cryptographic software. See the export control
notice here [9].

1. http://hbase.apache.org
2. http://research.google.com/archive/bigtable.html
3. http://hadoop.apache.org
4. http://www.apache.org/dyn/closer.lua/hbase/
5. https://hbase.apache.org/source-repository.html
6. https://hbase.apache.org/issue-tracking.html
7. http://hbase.apache.org/license.html
8. http://hbase.apache.org/mail-lists.html
9. https://hbase.apache.org/export_control.html