Master force-closes unknown/incorrect Regions OPEN on RS
M hbase-client/src/main/java/org/apache/hadoop/hbase/MetaTableAccessor.java
Added a note and small refactor.
M hbase-server/src/main/java/org/apache/hadoop/hbase/master/CatalogJanitor.java
Fix an NPE when CJ ran.
M hbase-server/src/main/java/org/apache/hadoop/hbase/master/HMaster.java
Minor clean up of log message; make it clearer.
M hbase-server/src/main/java/org/apache/hadoop/hbase/master/ServerManager.java
Make it so closeRegionSilentlyAndWait can be used w/o timeout.
M hbase-server/src/main/java/org/apache/hadoop/hbase/master/assignment/AssignmentManager.java
If a RegionServer Report notes a Region is OPEN and the Master does not
know of said Region, close it (We used to crash out the RegionServer)
M hbase-server/src/main/java/org/apache/hadoop/hbase/master/assignment/RegionStateNode.java
Minor tweak of toString -- label should be state, not rit (confusing).
M hbase-server/src/main/java/org/apache/hadoop/hbase/master/assignment/RegionStates.java
Doc.
M hbase-server/src/main/java/org/apache/hadoop/hbase/master/assignment/TransitRegionStateProcedure.java
Add region name to exception.
M hbase-server/src/main/java/org/apache/hadoop/hbase/master/procedure/HBCKServerCrashProcedure.java
Be more careful about which Regions we queue up for reassign. This
procedure is run by the operator so could happen at any time. We
will likely be running this when Master has some accounting of
cluster members so check its answers for what Regions were on
server before running.
M hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/HRegionServer.java
Doc and we were misrepresenting the case where a Region as not in RIT
when we got CLOSE -- we were reporting it as though it was already
trying to CLOSE.
Signed-off-by: Duo Zhang <zhangduo@apache.org>
Signed-off-by: Lijin Bin <binlijin@apache.org>
Signed-off-by: Viraj Jasani <vjasani@apache.org>
Change its behavior so it will only look in hbase:meta
if the call to the super class turns up zero references.
Only then will it search hbase:meta for references to
'Unknown Servers'. Normal operation where we read Master
context is usual and sufficient. The scan of hbase:meta
is only for case where Master state has been corrupted
and we need to clear out 'Unknown Servers'.
Removes a bunch of dead code and fixes some checkstyle nits.
(cherry picked from commit efa4fe901a)
Signed-off-by: Jan Hentschel <janh@apache.org>
Signed-off-by: Xu Cang <xucang@apache.org>
Signed-off-by: Sean Busbey <busbey@apache.org>
Signed-off-by: Viraj Jasani <virajjasani007@gmail.com>
* Add a bit of javadoc around SerialReplicationChecker.
* Miniscule edit to the profiler jsp page and then a bit of doc on how to make it work that might help.
* Add some detail if NPE getting BitSetNode to help w/ debug.
* Change HbckChore to log region names instead of encoded names; helps doing diagnostics; can take region name and query in shell to find out all about the region according to hbase:meta.
* Add some fix-it help inline in the HBCK Report page – how to fix.
* Add counts in procedures page so can see if making progress; move listing of WALs to end of the page.
Have the existing scheduleRecoveries launch a new HBCKSCP
instead of SCP. It gets regions to recover from Master
in-memory context AND from a scan of hbase:meta. This
new HBCKSCP is For processing 'Unknown Servers', servers that
are 'dead' and purged but still have references in
hbase:meta. Rare occurance but needs tooling to address.
Later have catalogjanitor take care of these deviations
between Master in-memory and hbase:meta content (usually
because of overdriven cluster with failed RPCs to hbase:meta,
etc)
Changed expireServers in ServerManager so could pass in
custom reaction to expired server.... This is how we
run our custom HBCKSCP while keeping all other aspects
of expiring services (rather than try replicate it
externally).
Includes the following, incorporating HBASE-20439 and HBASE-20440, too.
1)
HBASE-18133 Decrease quota reaction latency by HBase
Certain operations in HBase are known to directly affect
the utilization of tables on HDFS. When these actions
occur, we can circumvent the normal path and notify the
Master directly. This results in a much faster response to
changes in HDFS usage.
This requires FS scanning by the RS to be decoupled from
the reporting of sizes to the Master. An API inside each
RS is made so that any operation can hook into this call
in the face of other operations (e.g. compaction, flush,
bulk load).
2)
HBASE-18135 Implement mechanism for RegionServers to report file archival for space quotas
This de-couples the snapshot size calculation from the
SpaceQuotaObserverChore into another API which both the periodically
invoked Master chore and the Master service endpoint can invoke. This
allows for multiple sources of snapshot size to reported (from the
multiple sources we have in HBase).
When a file is archived, snapshot sizes can be more quickly realized and
the Master can still perform periodical computations of the total
snapshot size to account for any delayed/missing/lost file archival RPCs.
3)
HBASE-20531 RS may throw NPE when close meta regions in shutdown procedure.
Make it so hbase:meta can be altered. TableState for hbase:meta
is kept in Master. State is in-memory transient so if Master
fails, hbase:meta is ENABLED again. hbase:meta schema will be
bootstrapped from the filesystem. Changes to filesystem schema
are atomic so we should be ok if Master fails mid-edit (TBD)
Undoes a bunch of guards that prevented our being able to edit
hbase:meta. At minimmum, need to add in a bunch of WARNING.
TODO: Tests, more clarity around hbase:meta table state, and undoing
references to hard-coded hbase:meta regioninfo.
M hbase-client/src/main/java/org/apache/hadoop/hbase/MetaTableAccessor.java
Throw illegal access exception if you try to use MetaTableAccessor
getting state of the hbase:meta table.
M hbase-client/src/main/java/org/apache/hadoop/hbase/client/ConnectionImplementation.java
For table state, go to master rather than go to meta direct. Going
to meta won't work for hbase;meta state. Puts load on Master.
M hbase-client/src/main/java/org/apache/hadoop/hbase/client/HBaseAdmin.java
Change isTableDisabled/Enabled implementation to ask the Master instead.
This will give the Master's TableStateManager's opinion rather than
client figuring it for themselves reading meta table direct.
M hbase-client/src/main/java/org/apache/hadoop/hbase/client/RawAsyncHBaseAdmin.java
TODO: Cleanup in here. Go to master for state, not to meta.
M hbase-client/src/main/java/org/apache/hadoop/hbase/client/ZKAsyncRegistry.java
Logging cleanup.
M hbase-client/src/main/java/org/apache/hadoop/hbase/zookeeper/ZNodePaths.java
Shutdown access.
M hbase-server/src/main/java/org/apache/hadoop/hbase/TableDescriptors.java
Just cleanup.
M hbase-server/src/main/java/org/apache/hadoop/hbase/master/TableStateManager.java
Add state holder for hbase:meta.
Removed unused methods.
M hbase-server/src/main/java/org/apache/hadoop/hbase/master/assignment/RegionStateStore.java
Shut down access.
M hbase-server/src/main/java/org/apache/hadoop/hbase/master/procedure/DisableTableProcedure.java
Allow hbase:meta to be disabled.
M hbase-server/src/main/java/org/apache/hadoop/hbase/master/procedure/EnableTableProcedure.java
Allow hbase:meta to be enabled.
Signed-off-by: Ramkrishna <ramkrishna.s.vasudevan@intel.com>
Check if overlap is split parent.
Cleaned up the HBCK Report page too with some notes that it is made of
two reports; have the two sections display the same.