HBASE-14558 Document ChaosMonkey enhancements from HBASE-14261
Signed-off-by: Elliott Clark <eclark@apache.org>
parent
e030c7a77b
commit
397bc555e3
|
@@ -1202,16 +1202,19 @@ _/etc/init.d/_ scripts are not supported for now, but it can be easily added.
|
||||||
For other deployment options, a ClusterManager can be implemented and plugged in.
|
For other deployment options, a ClusterManager can be implemented and plugged in.
|
||||||
|
|
||||||
[[maven.build.commands.integration.tests.destructive]]
|
[[maven.build.commands.integration.tests.destructive]]
|
||||||
==== Destructive integration / system tests
|
==== Destructive integration / system tests (ChaosMonkey)
|
||||||
|
|
||||||
In 0.96, a tool named `ChaosMonkey` has been introduced.
|
HBase 0.96 introduced a tool named `ChaosMonkey`, modeled after link:http://techblog.netflix.com/2012/07/chaos-monkey-released-into-wild.html
|
||||||
It is modeled after the link:http://techblog.netflix.com/2012/07/chaos-monkey-released-into-wild.html[same-named tool by Netflix].
|
[Netflix's Chaos Monkey tool]. ChaosMonkey simulates real-world
|
||||||
Some of the tests use ChaosMonkey to simulate faults in the running cluster in the way of killing random servers, disconnecting servers, etc.
|
faults in a running cluster by killing or disconnecting random servers, or injecting
|
||||||
ChaosMonkey can also be used as a stand-alone tool to run a (misbehaving) policy while you are running other tests.
|
other failures into the environment. You can use ChaosMonkey as a stand-alone tool
|
||||||
|
to run a policy while other tests are running. In some environments, ChaosMonkey is
|
||||||
|
always running, in order to constantly check that high availability and fault tolerance
|
||||||
|
are working as expected.
|
||||||
|
|
||||||
ChaosMonkey defines Action's and Policy's.
|
ChaosMonkey defines *Actions* and *Policies*.
|
||||||
Actions are sequences of events.
|
|
||||||
We have at least the following actions:
|
Actions:: Actions are predefined sequences of events, such as the following:
|
||||||
|
|
||||||
* Restart active master (sleep 5 sec)
|
* Restart active master (sleep 5 sec)
|
||||||
* Restart random regionserver (sleep 5 sec)
|
* Restart random regionserver (sleep 5 sec)
|
||||||
|
@@ -1221,23 +1224,17 @@ We have at least the following actions:
|
||||||
* Batch restart of 50% of regionservers (sleep 5 sec)
|
* Batch restart of 50% of regionservers (sleep 5 sec)
|
||||||
* Rolling restart of 100% of regionservers (sleep 5 sec)
|
* Rolling restart of 100% of regionservers (sleep 5 sec)
|
||||||
|
|
||||||
Policies on the other hand are responsible for executing the actions based on a strategy.
|
Policies:: A policy is a strategy for executing one or more actions. The default policy
|
||||||
The default policy is to execute a random action every minute based on predefined action weights.
|
executes a random action every minute based on predefined action weights.
|
||||||
ChaosMonkey executes predefined named policies until it is stopped.
|
A given policy will be executed until ChaosMonkey is interrupted.
|
||||||
More than one policy can be active at any time.
|
|
||||||
|
|
||||||
To run ChaosMonkey as a standalone tool deploy your HBase cluster as usual.
|
Most ChaosMonkey actions are configured to have reasonable defaults, so you can run
|
||||||
ChaosMonkey uses the configuration from the bin/hbase script, thus no extra configuration needs to be done.
|
ChaosMonkey against an existing cluster without any additional configuration. The
|
||||||
You can invoke the ChaosMonkey by running:
|
following example runs ChaosMonkey with the default configuration:
|
||||||
|
|
||||||
[source,bourne]
|
|
||||||
----
|
|
||||||
bin/hbase org.apache.hadoop.hbase.util.ChaosMonkey
|
|
||||||
----
|
|
||||||
|
|
||||||
This will output something like:
|
|
||||||
|
|
||||||
|
[source,bash]
|
||||||
----
|
----
|
||||||
|
$ bin/hbase org.apache.hadoop.hbase.util.ChaosMonkey
|
||||||
|
|
||||||
12/11/19 23:21:57 INFO util.ChaosMonkey: Using ChaosMonkey Policy: class org.apache.hadoop.hbase.util.ChaosMonkey$PeriodicRandomActionPolicy, period:60000
|
12/11/19 23:21:57 INFO util.ChaosMonkey: Using ChaosMonkey Policy: class org.apache.hadoop.hbase.util.ChaosMonkey$PeriodicRandomActionPolicy, period:60000
|
||||||
12/11/19 23:21:57 INFO util.ChaosMonkey: Sleeping for 26953 to add jitter
|
12/11/19 23:21:57 INFO util.ChaosMonkey: Sleeping for 26953 to add jitter
|
||||||
|
@@ -1276,31 +1273,38 @@ This will output something like:
|
||||||
12/11/19 23:24:27 INFO util.ChaosMonkey: Started region server:rs3.example.com,60020,1353367027826. Reported num of rs:6
|
12/11/19 23:24:27 INFO util.ChaosMonkey: Started region server:rs3.example.com,60020,1353367027826. Reported num of rs:6
|
||||||
----
|
----
|
||||||
|
|
||||||
As you can see from the log, ChaosMonkey started the default PeriodicRandomActionPolicy, which is configured with all the available actions, and ran RestartActiveMaster and RestartRandomRs actions.
|
The output indicates that ChaosMonkey started the default `PeriodicRandomActionPolicy`
|
||||||
ChaosMonkey tool, if run from command line, will keep on running until the process is killed.
|
policy, which is configured with all the available actions. It chose to run the `RestartActiveMaster` and `RestartRandomRs` actions.
|
||||||
|
|
||||||
|
==== Available Policies
|
||||||
|
HBase ships with several ChaosMonkey policies, available in the
|
||||||
|
`hbase/hbase-it/src/test/java/org/apache/hadoop/hbase/chaos/policies/` directory.
|
||||||
|
|
||||||
[[chaos.monkey.properties]]
|
[[chaos.monkey.properties]]
|
||||||
==== Passing individual Chaos Monkey per-test Settings/Properties
|
==== Configuring Individual ChaosMonkey Actions
|
||||||
|
|
||||||
Since HBase version 1.0.0 (link:https://issues.apache.org/jira/browse/HBASE-11348[HBASE-11348]), the chaos monkeys is used to run integration tests can be configured per test run.
|
Since HBase version 1.0.0 (link:https://issues.apache.org/jira/browse/HBASE-11348
|
||||||
Users can create a java properties file and and pass this to the chaos monkey with timing configurations.
|
[HBASE-11348]), ChaosMonkey integration tests can be configured per test run.
|
||||||
The properties file needs to be in the HBase classpath.
|
Create a Java properties file in the HBase classpath and pass it to ChaosMonkey using
|
||||||
The various properties that can be configured and their default values can be found listed in the `org.apache.hadoop.hbase.chaos.factories.MonkeyConstants` class.
|
the `-monkeyProps` configuration flag. Configurable properties, along with their default
|
||||||
If any chaos monkey configuration is missing from the property file, then the default values are assumed.
|
values if applicable, are listed in the `org.apache.hadoop.hbase.chaos.factories.MonkeyConstants`
|
||||||
For example:
|
class. For properties that have defaults, you can override them by including them
|
||||||
|
in your properties file.
|
||||||
|
|
||||||
|
The following example uses a properties file called <<monkey.properties,monkey.properties>>.
|
||||||
|
|
||||||
[source,bourne]
|
[source,bourne]
|
||||||
----
|
----
|
||||||
|
$ bin/hbase org.apache.hadoop.hbase.IntegrationTestIngest -m slowDeterministic -monkeyProps monkey.properties
|
||||||
$bin/hbase org.apache.hadoop.hbase.IntegrationTestIngest -m slowDeterministic -monkeyProps monkey.properties
|
|
||||||
----
|
----
|
||||||
|
|
||||||
The above command will start the integration tests and chaos monkey passing the properties file _monkey.properties_.
|
The above command will start the integration tests and chaos monkey passing the properties file _monkey.properties_.
|
||||||
Here is an example chaos monkey file:
|
Here is an example chaos monkey file:
|
||||||
|
|
||||||
|
[[monkey.properties]]
|
||||||
|
.Example ChaosMonkey Properties File
|
||||||
[source]
|
[source]
|
||||||
----
|
----
|
||||||
|
|
||||||
sdm.action1.period=120000
|
sdm.action1.period=120000
|
||||||
sdm.action2.period=40000
|
sdm.action2.period=40000
|
||||||
move.regions.sleep.time=80000
|
move.regions.sleep.time=80000
|
||||||
|
@@ -1309,6 +1313,35 @@ move.regions.sleep.time=80000
|
||||||
batch.restart.rs.ratio=0.4f
|
batch.restart.rs.ratio=0.4f
|
||||||
----
|
----
|
||||||
|
|
||||||
|
HBase 1.0.2 and newer adds the ability to restart HBase's underlying ZooKeeper quorum or
|
||||||
|
HDFS nodes. To use these actions, you need to configure some new properties, which
|
||||||
|
have no reasonable defaults because they are deployment-specific, in your ChaosMonkey
|
||||||
|
properties file, which may be `hbase-site.xml` or a different properties file.
|
||||||
|
|
||||||
|
[source,xml]
|
||||||
|
----
|
||||||
|
<property>
|
||||||
|
<name>hbase.it.clustermanager.hadoop.home</name>
|
||||||
|
<value>$HADOOP_HOME</value>
|
||||||
|
</property>
|
||||||
|
<property>
|
||||||
|
<name>hbase.it.clustermanager.zookeeper.home</name>
|
||||||
|
<value>$ZOOKEEPER_HOME</value>
|
||||||
|
</property>
|
||||||
|
<property>
|
||||||
|
<name>hbase.it.clustermanager.hbase.user</name>
|
||||||
|
<value>hbase</value>
|
||||||
|
</property>
|
||||||
|
<property>
|
||||||
|
<name>hbase.it.clustermanager.hadoop.hdfs.user</name>
|
||||||
|
<value>hdfs</value>
|
||||||
|
</property>
|
||||||
|
<property>
|
||||||
|
<name>hbase.it.clustermanager.zookeeper.user</name>
|
||||||
|
<value>zookeeper</value>
|
||||||
|
</property>
|
||||||
|
----
|
||||||
|
|
||||||
[[developing]]
|
[[developing]]
|
||||||
== Developer Guidelines
|
== Developer Guidelines
|
||||||
|
|
||||||
|
|