New tool to dump existing replication peers, configurations and
queues when using HBase Replication. The tool provides two flags:
--distributed This flag will poll each RS for information about
the replication queues being processed on this RS.
By default this is not enabled and the information
about the replication queues and configuration will
be obtained from ZooKeeper.
--hdfs When --distributed is used, this flag will attempt
to calculate the total size of the WAL files used
by the replication queues. Since multiple peers can be
configured, and their queues may reference the same
WAL files, this value can be overestimated.
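The overestimate from --hdfs can be illustrated with a small sketch. It is not the tool's code. It assumes a hypothetical model in which each replication queue is a list of WAL file names and a map supplies each file's size. Summing per queue counts a WAL shared by several peers' queues once per queue. Summing over the set of distinct files counts it once.

```java
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

/** Hypothetical model: why summing WAL sizes per replication queue can overestimate. */
final class WalSizeSketch {

  /** Naive total: a WAL referenced by several queues is counted once per queue. */
  static long perQueueTotal(List<List<String>> queues, Map<String, Long> walSizes) {
    long total = 0;
    for (List<String> queue : queues) {
      for (String wal : queue) {
        total += walSizes.getOrDefault(wal, 0L);
      }
    }
    return total;
  }

  /** De-duplicated total: each distinct WAL file is counted exactly once. */
  static long distinctTotal(List<List<String>> queues, Map<String, Long> walSizes) {
    Set<String> distinct = new HashSet<>();
    for (List<String> queue : queues) {
      distinct.addAll(queue);
    }
    long total = 0;
    for (String wal : distinct) {
      total += walSizes.getOrDefault(wal, 0L);
    }
    return total;
  }
}
```

Consider two peers whose queues both hold wal.1 (100 bytes) and wal.2 (200 bytes). The per-queue total is 600, while the WALs actually occupy 300.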
Signed-off-by: Matteo Bertozzi <matteo.bertozzi@cloudera.com>
This is a revert of a revert, i.e. we are adding the change back, this time with
fixes for the broken unit test. The failure was a real issue, caught by a test that
went in at the same time as this commit: I was getting a new nonce on each
retry rather than one nonce for the mutation.
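The nonce fix is easiest to see in a retry loop. In this hypothetical sketch, `NonceSketch`, the `Rpc` interface and the `nextNonce` counter are illustrative and not HBase's API. The nonce is drawn once, before the loop, so every retry of the same mutation carries the same nonce and the server can recognise a retried mutation as a duplicate. Drawing it inside the loop would give each attempt a fresh nonce, which is the bug described above.

```java
import java.io.IOException;
import java.util.concurrent.atomic.AtomicLong;

/** Hypothetical sketch: one nonce per mutation, reused across retries. */
final class NonceSketch {
  /** Illustrative stand-in for a remote call that may fail transiently. */
  interface Rpc {
    void send(long nonce) throws IOException;
  }

  private static final AtomicLong NONCE_SOURCE = new AtomicLong();

  static long nextNonce() {
    return NONCE_SOURCE.incrementAndGet();
  }

  /** Returns the nonce used; the same value is sent on every attempt. */
  static long mutateWithRetries(Rpc rpc, int maxAttempts) throws IOException {
    if (maxAttempts < 1) {
      throw new IllegalArgumentException("maxAttempts must be >= 1");
    }
    final long nonce = nextNonce(); // drawn once per mutation, not once per attempt
    IOException last = null;
    for (int attempt = 0; attempt < maxAttempts; attempt++) {
      try {
        rpc.send(nonce);
        return nonce;
      } catch (IOException e) {
        last = e; // transient failure: retry with the same nonce
      }
    }
    throw last;
  }
}
```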
Other changes since the revert hide RpcController further: use an
accessor method rather than always passing in an RpcController.
Walked back retrying of operations that used to be single-shot (even though
a code comment said they needed a retry), because it opens a can of worms where
we retry errors such as a bad column family when we shouldn't (this needs
work adding DoNotRetryIOExceptions).
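The reason for walking back those retries can be sketched as a retry loop that gives up immediately on errors marked as not retriable. `NotRetriableException` below is a hypothetical stand-in for the role HBase's DoNotRetryIOException plays. Without such a marker on an error like a bad column family, the loop would keep retrying until attempts run out.

```java
import java.io.IOException;
import java.util.concurrent.Callable;

/** Hypothetical sketch: retry transient failures, but fail fast on non-retriable ones. */
final class RetrySketch {
  /** Stand-in for an exception type that marks an error as not worth retrying. */
  static final class NotRetriableException extends IOException {
    NotRetriableException(String message) {
      super(message);
    }
  }

  static <T> T callWithRetries(Callable<T> call, int maxAttempts) throws Exception {
    if (maxAttempts < 1) {
      throw new IllegalArgumentException("maxAttempts must be >= 1");
    }
    Exception last = null;
    for (int attempt = 0; attempt < maxAttempts; attempt++) {
      try {
        return call.call();
      } catch (NotRetriableException e) {
        throw e; // e.g. a bad column family: retrying cannot help
      } catch (IOException e) {
        last = e; // possibly transient: try again
      }
      // Non-IOExceptions are not caught above and propagate immediately.
    }
    throw last;
  }
}
```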
Changed name of class from PayloadCarryingServerCallable to
CancellableRegionServerCallable.
Fix javadoc and findbugs warnings.
Fix case of not initializing the ScannerCallable RpcController.
Below is original commit message:
Remove mention of ServiceException and other protobuf classes from all over the codebase.
Purge TimeLimitedRpcController. Let's just have one override of RpcController.
M hbase-client/src/main/java/org/apache/hadoop/hbase/client/AbstractRegionServerCallable.java
Cleanup. Make it clear this is an odd class for async hbase intro.
M hbase-client/src/main/java/org/apache/hadoop/hbase/client/HTable.java
Refactor of RegionServerCallable allows me to clean up a bunch of
boilerplate in here and remove protobuf references.
M hbase-client/src/main/java/org/apache/hadoop/hbase/client/HBaseAdmin.java
Purge protobuf references everywhere except a reference to a throw of a
ServiceException in method checkHBaseAvailable. I deprecated it in favor
of new available method (the SE is not actually needed)
M hbase-client/src/main/java/org/apache/hadoop/hbase/client/PayloadCarryingServerCallable.java
Move the RetryingTimeTracker instance in here from HTable.
Allows me to contain the tracker and remove repeated code in HTable.
M hbase-client/src/main/java/org/apache/hadoop/hbase/client/RegionServerCallable.java
Cleanup: move setup of the RPC in here rather than have it repeated in HTable.
Allows me to remove protobuf references from a bunch of places.
M hbase-client/src/main/java/org/apache/hadoop/hbase/client/FlushRegionCallable.java
Make use of the push of boilerplate up into RegionServerCallable
M hbase-client/src/main/java/org/apache/hadoop/hbase/client/MultiServerCallable.java
M hbase-client/src/main/java/org/apache/hadoop/hbase/client/PayloadCarryingServerCallable.java
M hbase-client/src/main/java/org/apache/hadoop/hbase/client/RegionAdminServiceCallable.java
M hbase-client/src/main/java/org/apache/hadoop/hbase/client/RpcRetryingCallerWithReadReplicas.java
M hbase-client/src/main/java/org/apache/hadoop/hbase/client/ScannerCallable.java
M hbase-client/src/main/java/org/apache/hadoop/hbase/client/SecureBulkLoadClient.java
M hbase-server/src/main/java/org/apache/hadoop/hbase/mapreduce/LoadIncrementalHFiles.java
Move boilerplate up into superclass.
M hbase-client/src/main/java/org/apache/hadoop/hbase/client/RetryingTimeTracker.java
Cleanup
M hbase-client/src/main/java/org/apache/hadoop/hbase/ipc/PayloadCarryingRpcController.java
M hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/RSRpcServices.java
M hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/wal/WALEditsReplaySink.java
M hbase-server/src/main/java/org/apache/hadoop/hbase/replication/regionserver/RegionReplicaReplicationEndpoint.java
Factor in TimeLimitedRpcController. Just have one RpcController override.
D hbase-client/src/main/java/org/apache/hadoop/hbase/ipc/TimeLimitedRpcController.java
Removed. Let's have only one override of the pb RpcController.
M hbase-client/src/main/java/org/apache/hadoop/hbase/protobuf/ProtobufUtil.java
(handleRemoteException) added
(toText) added
Purge ServiceException from Callable subclasses by pushing SE handling
up into the parent Callable class (this varies by context, but this is the basic
pattern). Allows us to remove a bunch of boilerplate.
Do this in the public-facing classes in particular (though where an
API has SE in its signature, as a few do, this patch leaves it
untouched for now). Make it so HBaseAdmin and HTable have no
direct pb imports (except for the endpoint processor API).
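The pattern of pushing exception handling into the parent Callable can be sketched as a template method. All names here are hypothetical: `StubServiceException` stands in for protobuf's ServiceException, and `BaseCallable`/`rpcCall` only illustrate the shape, not the real RegionServerCallable. Subclasses implement only the RPC. The parent unwraps the wrapper so callers see a plain IOException.

```java
import java.io.IOException;

/** Hypothetical sketch of pushing exception translation up into a parent callable. */
final class CallableSketch {
  /** Stand-in for protobuf's ServiceException, which wraps the real cause. */
  static final class StubServiceException extends Exception {
    StubServiceException(Throwable cause) {
      super(cause);
    }
  }

  /** Parent class: subclasses implement rpcCall() and never handle the wrapper type. */
  abstract static class BaseCallable<T> {
    public final T call() throws IOException {
      try {
        return rpcCall();
      } catch (Exception e) {
        throw translate(e);
      }
    }

    /** Subclass-specific RPC; may throw the wrapper or any other exception. */
    protected abstract T rpcCall() throws Exception;

    /** Unwrap the wrapper and surface IOExceptions directly; wrap anything else. */
    private static IOException translate(Exception e) {
      Throwable t =
          (e instanceof StubServiceException && e.getCause() != null) ? e.getCause() : e;
      return (t instanceof IOException) ? (IOException) t : new IOException(t);
    }
  }
}
```

The design choice is that each subclass shrinks to just its RPC body. The translation logic lives in one place instead of being repeated as try/catch boilerplate in every caller.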
Change a few of the HBaseAdmin calls to be retrying where comments
ask that we do retry rather than one time.
Signed-off-by: stack <stack@apache.org>
hbase-client has hbase-common test-jar as dependency in compile scope, while it should be test scope instead.
This patch fixes the bug.
closes #12
Signed-off-by: Sean Busbey <busbey@apache.org>
TimeRangeTracker as point of contention when many threads reading a StoreFile
Fixes HBASE-16074 (ITBLL fails, reports lost big or tiny families). Scanning was
broken because a side effect of a cleanup in HBASE-15650, which made
TimeRange construction consistent, exposed a latent issue in
TimeRange#compare. See HBASE-16074 for more detail.
Also change the HFile Writer constructor so we pass in the TimeRangeTracker, if any,
on construction rather than setting it later (the flag and reference were not volatile,
so this could have caused issues in the concurrent case). And make sure the construction
of a TimeRange from a TimeRangeTracker on open of an HFile Reader never produces a
bad minimum value, one that would preclude us from reading any values from a file
(set min to 0).
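A minimal sketch of the conversion, under stated assumptions. The tracker's "nothing seen yet" sentinels are min = Long.MAX_VALUE and max = -1. These are hypothetical values chosen for illustration, and `TrackerSketch` is not the real TimeRangeTracker. Building a range straight from such sentinels would give min > max and match nothing. Replacing an untouched minimum with 0, and an untouched maximum with Long.MAX_VALUE, keeps the range from excluding every cell in the file.

```java
/** Hypothetical, simplified tracker; the sentinel values are assumptions for illustration. */
final class TrackerSketch {
  static final long INITIAL_MIN = Long.MAX_VALUE; // "nothing seen yet" minimum
  static final long INITIAL_MAX = -1L;            // "nothing seen yet" maximum

  private long min = INITIAL_MIN;
  private long max = INITIAL_MAX;

  /** Widen the tracked range to include ts (synchronized for simplicity). */
  synchronized void include(long ts) {
    if (ts < 0) {
      throw new IllegalArgumentException("timestamp must be >= 0: " + ts);
    }
    min = Math.min(min, ts);
    max = Math.max(max, ts);
  }

  /** Returns {min, max}; untouched sentinels become the all-time range [0, Long.MAX_VALUE]. */
  synchronized long[] toRange() {
    long lo = (min == INITIAL_MIN) ? 0L : min;
    long hi = (max == INITIAL_MAX) ? Long.MAX_VALUE : max;
    return new long[] {lo, hi};
  }
}
```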
M hbase-common/src/main/java/org/apache/hadoop/hbase/io/TimeRange.java
Call through to the next constructor (if minStamp was 0, we'd skip setting
allTime=true). Add asserts that timestamps are not < 0, because it messes
us up if they are (we were already checking for < 0 on construction, but now
also assert that passed-in timestamps are not < 0).
M hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/HStore.java
Add constructor override that takes a TimeRangeTracker (set when flushing
but not when compacting)
M hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/Store.java
Add override creating an HFile in tmp that takes a TimeRangeTracker
M hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/StoreFile.java
Add override for HFile Writer that takes a TimeRangeTracker. Take it on
construction instead of having it passed by a setter later (the flags and
reference set by the setter were not volatile, which could have been a problem
in the concurrent case).
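The benefit of taking the tracker at construction can be shown in a few lines. In this hypothetical `WriterSketch`, the tracker is a final field assigned in the constructor. Under the Java Memory Model, a thread that obtains the writer after construction (without `this` escaping the constructor) is guaranteed to see the initialized final field. A non-volatile field assigned later by a setter has no such guarantee.

```java
/** Hypothetical writer: the tracker is fixed at construction instead of via a setter. */
final class WriterSketch {
  /** Minimal stand-in for a time range tracker. */
  static final class Tracker {
    final long min;
    final long max;

    Tracker(long min, long max) {
      this.min = min;
      this.max = max;
    }
  }

  // final: safely published to any thread that obtains a reference to this writer
  // after construction, unlike a non-volatile field assigned later by a setter.
  private final Tracker tracker;
  private final boolean trackerProvided;

  /** Compaction-style writer: no tracker supplied. */
  WriterSketch() {
    this(null);
  }

  /** Flush-style writer: tracker supplied up front. */
  WriterSketch(Tracker tracker) {
    this.tracker = tracker;
    this.trackerProvided = tracker != null;
  }

  boolean hasProvidedTracker() {
    return trackerProvided;
  }

  Tracker tracker() {
    return tracker;
  }
}
```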
M hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/TimeRangeTracker.java
Log WARN if bad initial TimeRange value (and then 'fix' it)
M hbase-server/src/test/java/org/apache/hadoop/hbase/regionserver/TestTimeRangeTracker.java
A few tests to prove serialization works as expected and that we'll get a bad min if not constructed properly.
M hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/ScanQueryMatcher.java
Handle OLDEST_TIMESTAMP explicitly. Don't expect TimeRange to do it.
M hbase-server/src/test/java/org/apache/hadoop/hbase/regionserver/TestQueryMatcher.java
Refactor from junit3 to junit4 and add test for this weird case.
All ReplicationTableBase methods that need to access the Replication Table will block until it is created, though.
Also refactored ReplicationSourceManager so that abandoned queue adoption is run in the background too so that it does not block HRegionServer initialization.
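The startup behaviour described above can be sketched with a latch and a background executor. This is a hypothetical model, not ReplicationTableBase: `createTable` and `adoptAbandonedQueues` are caller-supplied placeholders. `start` returns immediately, so initialization is not blocked. Table-accessing methods would call `awaitTable` first and block until creation completes.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

/** Hypothetical sketch: table creation and queue adoption run off the startup path. */
final class ReplicationTableSketch {
  private final CountDownLatch tableReady = new CountDownLatch(1);
  private final ExecutorService background = Executors.newSingleThreadExecutor();

  /** Called during startup; returns immediately instead of blocking initialization. */
  void start(Runnable createTable, Runnable adoptAbandonedQueues) {
    background.submit(() -> {
      createTable.run();
      tableReady.countDown();     // unblocks anything waiting on the table
      adoptAbandonedQueues.run(); // adoption also happens in the background
    });
  }

  /** Table-accessing methods would call this first; returns false on timeout. */
  boolean awaitTable(long timeout, TimeUnit unit) throws InterruptedException {
    return tableReady.await(timeout, unit);
  }

  /** Stops accepting work and waits for background tasks to finish. */
  boolean shutdownAndWait(long timeout, TimeUnit unit) throws InterruptedException {
    background.shutdown();
    return background.awaitTermination(timeout, unit);
  }
}
```

A real implementation would also need to handle a failed table creation, which in this sketch would leave waiters blocked until their timeout.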
Signed-off-by: Elliott Clark <eclark@apache.org>
Building on HBASE-15958.
Provided a ReplicationQueuesClientHBaseImpl that relies on the HBase Replication Table to track WAL queues.
Refactored out a large section of ReplicationQueuesHBaseImpl into a ReplicationTableClient class that handles Replication Table operations.
Signed-off-by: Elliott Clark <eclark@apache.org>
Building on HBASE-15883.
Now implementing the claim queues procedure within an HBase table.
Also added unit tests to test claimQueue.
Peer tracking will still be performed by ZooKeeper though.
Also modified the queueId tracking procedure so we no longer have to perform scans over the Replication Table.
This does make our queue naming schema slightly different from ReplicationQueuesZKImpl though.
Signed-off-by: Elliott Clark <eclark@apache.org>
- Testing by executing a command covers the exact path users will trigger, so it's better than directly calling library functions in tests. Changing the tests to use @shell.command(:<command>, args) to execute them as if the command came from the shell.
Norm change:
Commands should print the output the user would like to see, but in the end should also return the relevant value. This way:
- Tests can use the returned value to check that functionality works.
- Tests can capture stdout to assert the particular kind of output the user should see.
- We do not print the return value in interactive mode, which keeps the output clean. See the Shell.command() function.
Bugs found due to this change:
- Uncovered bug in major_compact.rb with this approach. It was calling admin.majorCompact() which doesn't exist but our tests didn't catch it since they directly tested admin.major_compact()
- Enabled TestReplicationShell. If it's bad, flaky infra will take care of it.
Change-Id: I5d8af16bf477a79a2f526a5bf11c245b02b7d276
Implemented ReplicationQueuesHBaseImpl that tracks WAL offsets and replication queues in an HBase table.
Only wrote the basic tracking methods; have not implemented claimQueue() or HFileRef methods yet.
Wrote a basic unit test for ReplicationQueuesHBaseImpl that tests the implemented functions on a single Region Server.
Signed-off-by: Elliott Clark <elliott@fb.com>
Signed-off-by: Elliott Clark <eclark@apache.org>
It added this in AsyncProcess#waitForMaximumCurrentTasks:
synchronized (this.tasksInProgress) {
+ if (tasksInProgress.get() != oldInProgress) break;
this.tasksInProgress.wait(100);
which added a break out of our waiting loop on any change in
the count of tasks; it seems that what was wanted instead was to
avoid the wait if there was movement in the count of completed
tasks.
Reformats waitForMaximumCurrentTasks so it is testable. Adds
test that we indeed wait on the specified parameter.
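A sketch of the intended behaviour, written from the description above rather than copied from the patched method. The loop exits only once the count is at or below `max`. Movement in the count merely skips the timed wait instead of breaking out early. The static signature that takes the counter as a parameter is what makes it testable in isolation. `WaitSketch` and its `taskDone` completion hook are hypothetical names for the example.

```java
import java.io.InterruptedIOException;
import java.util.concurrent.atomic.AtomicLong;

/** Sketch of a testable wait loop in the spirit of waitForMaximumCurrentTasks. */
final class WaitSketch {
  /**
   * Blocks until tasksInProgress drops to at most max. The count is re-read before
   * every wait, and the wait is skipped when the count has moved, so progress
   * shortens the wait instead of ending the loop early.
   */
  static void waitForMaximumCurrentTasks(int max, AtomicLong tasksInProgress)
      throws InterruptedIOException {
    long current;
    while ((current = tasksInProgress.get()) > max) {
      try {
        synchronized (tasksInProgress) {
          // Only wait if nothing changed since we last looked.
          if (tasksInProgress.get() == current) {
            tasksInProgress.wait(100);
          }
        }
      } catch (InterruptedException e) {
        Thread.currentThread().interrupt();
        throw new InterruptedIOException("interrupted; tasks still in progress: " + current);
      }
    }
  }

  /** Completion hook: decrement the count and wake any waiter. */
  static void taskDone(AtomicLong tasksInProgress) {
    synchronized (tasksInProgress) {
      tasksInProgress.decrementAndGet();
      tasksInProgress.notifyAll();
    }
  }
}
```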
Summary:
Allow TimestampsFilter to provide a seek next hint (exercised by TestTimestampFilterSeekHint).
This can be incorrect, as it might skip deletes. However, it can
make things much faster.
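The trade-off can be modelled with a small decision function. This is a hypothetical simplification, not HBase's filter code. It assumes cells of one column arrive newest first. When a cell's timestamp is not wanted but is newer than the oldest wanted one, the filter can either skip cells one at a time or seek directly to the next wanted timestamp (found here with `TreeSet.lower`). Seeking jumps over every cell in between, including any delete markers in that gap, which is why the hint can be incorrect when deletes matter.

```java
import java.util.TreeSet;

/** Hypothetical model of a timestamps filter with an optional seek hint. */
final class TimestampHintSketch {
  enum Action { INCLUDE, SKIP, NEXT_COLUMN, SEEK_TO_HINT }

  private final TreeSet<Long> wanted;
  private final boolean canHint;

  TimestampHintSketch(TreeSet<Long> wanted, boolean canHint) {
    if (wanted.isEmpty()) {
      throw new IllegalArgumentException("need at least one timestamp");
    }
    this.wanted = new TreeSet<>(wanted);
    this.canHint = canHint;
  }

  /** Cells of one column arrive newest first, i.e. in descending timestamp order. */
  Action decide(long cellTs) {
    if (wanted.contains(cellTs)) {
      return Action.INCLUDE;
    }
    if (cellTs < wanted.first()) {
      return Action.NEXT_COLUMN; // nothing older is wanted in this column
    }
    return canHint ? Action.SEEK_TO_HINT : Action.SKIP;
  }

  /** Timestamp to seek to: the newest wanted timestamp older than cellTs. */
  long hint(long cellTs) {
    Long next = wanted.lower(cellTs);
    if (next == null) {
      throw new IllegalStateException("no wanted timestamp below " + cellTs);
    }
    return next;
  }
}
```

With wanted timestamps {10, 20, 30} and hinting enabled, a cell at 25 yields SEEK_TO_HINT with hint 20, and a cell at 5 yields NEXT_COLUMN.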
Test Plan: Added a unit test.
Differential Revision: https://reviews.facebook.net/D55617
Summary:
Currently WAL splitting is broken when a region has been opened multiple times in recent minutes.
Region open and region close write event markers to the WAL. These markers should carry the region's sequence id; however, they currently get 1. That means that if a region has moved multiple times in the last few minutes, multiple split log workers will try to create the recovered edits file for sequence id 1. One of the workers will fail, and on failing it will delete the recovered edits, causing all split WAL attempts to fail.
We need to:
It appears that the close event with a sequence id of 1 is coming from region warmup.
This patch fixes that by making sure the close on warmup doesn't happen. Also, splitting will ignore any of these events that are already in the logs.
Test Plan: Unit tests pass
Differential Revision: https://reviews.facebook.net/D55557
Further investigation after HBASE-15221 led to the finding that
AsyncProcess should have been managing the contents of the region
location cache, appropriately clearing it when necessary (e.g. when an
RPC to a server fails because the server doesn't host that region).
For multi() RPCs, the tableName argument is null since there is no
single table that the updates are destined to. This inadvertently
caused the existing region location cache updates to fail on 1.x
branches. AsyncProcess needs to handle when tableName is null
and perform the necessary cache evictions.
As such, much of the new retry logic in HTableMultiplexer is
unnecessary and is removed with this commit. Getters which were
added as part of testing were left, since they are mostly
harmless and should have no negative impact.
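A minimal sketch of the null-tableName eviction described above. It is not AsyncProcess code. It assumes a hypothetical cache keyed by table and row, whereas a real region location cache maps key ranges to servers. When the call-level table is null, as for multi(), the table targeted by the failed action is used instead, so the stale entry is still evicted rather than silently kept.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical region-location cache: table -> (row -> server). */
final class LocationCacheSketch {
  private final Map<String, Map<String, String>> byTable = new HashMap<>();

  void cache(String table, String row, String server) {
    byTable.computeIfAbsent(table, t -> new HashMap<>()).put(row, server);
  }

  String lookup(String table, String row) {
    Map<String, String> rows = byTable.get(table);
    return rows == null ? null : rows.get(row);
  }

  /**
   * Evict the cached location for a failed action. For multi() the call-level
   * table is null, so fall back to the table the individual action targeted.
   */
  void evictOnFailure(String callTable, String actionTable, String row) {
    String table = (callTable != null) ? callTable : actionTable;
    Map<String, String> rows = byTable.get(table);
    if (rows != null) {
      rows.remove(row);
    }
  }
}
```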
Signed-off-by: stack <stack@apache.org>