HBASE-14558 Document ChaosMonkey enhancements from HBASE-14261

Signed-off-by: Elliott Clark <eclark@apache.org>
Misty Stanley-Jones 2015-10-06 15:17:12 +10:00
parent e030c7a77b
commit 397bc555e3
1 changed files with 67 additions and 34 deletions


@ -1202,16 +1202,19 @@ _/etc/init.d/_ scripts are not supported for now, but it can be easily added.
For other deployment options, a ClusterManager can be implemented and plugged in. For other deployment options, a ClusterManager can be implemented and plugged in.
[[maven.build.commands.integration.tests.destructive]] [[maven.build.commands.integration.tests.destructive]]
==== Destructive integration / system tests ==== Destructive integration / system tests (ChaosMonkey)
In 0.96, a tool named `ChaosMonkey` has been introduced. HBase 0.96 introduced a tool named `ChaosMonkey`, modeled after link:http://techblog.netflix.com/2012/07/chaos-monkey-released-into-wild.html
It is modeled after the link:http://techblog.netflix.com/2012/07/chaos-monkey-released-into-wild.html[same-named tool by Netflix]. [Netflix's Chaos Monkey tool]. ChaosMonkey simulates real-world
Some of the tests use ChaosMonkey to simulate faults in the running cluster in the way of killing random servers, disconnecting servers, etc. faults in a running cluster by killing or disconnecting random servers, or injecting
ChaosMonkey can also be used as a stand-alone tool to run a (misbehaving) policy while you are running other tests. other failures into the environment. You can use ChaosMonkey as a stand-alone tool
to run a policy while other tests are running. In some environments, ChaosMonkey is
always running, in order to constantly check that high availability and fault tolerance
are working as expected.
ChaosMonkey defines Action's and Policy's. ChaosMonkey defines *Actions* and *Policies*.
Actions are sequences of events.
We have at least the following actions: Actions:: Actions are predefined sequences of events, such as the following:
* Restart active master (sleep 5 sec) * Restart active master (sleep 5 sec)
* Restart random regionserver (sleep 5 sec) * Restart random regionserver (sleep 5 sec)
@ -1221,23 +1224,17 @@ We have at least the following actions:
* Batch restart of 50% of regionservers (sleep 5 sec) * Batch restart of 50% of regionservers (sleep 5 sec)
* Rolling restart of 100% of regionservers (sleep 5 sec) * Rolling restart of 100% of regionservers (sleep 5 sec)
Policies on the other hand are responsible for executing the actions based on a strategy. Policies:: A policy is a strategy for executing one or more actions. The default policy
The default policy is to execute a random action every minute based on predefined action weights. executes a random action every minute based on predefined action weights.
ChaosMonkey executes predefined named policies until it is stopped. A given policy will be executed until ChaosMonkey is interrupted.
More than one policy can be active at any time.
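The default policy's weighted random choice can be pictured with a short shell sketch. This is an illustration only, not HBase's implementation: the `ACTION_WEIGHTS` variable, the `pick_action` function, the weights, and the third action name are all hypothetical.

```shell
# Hypothetical action names and weights, one "name weight" pair per line.
# A higher weight means the action is chosen more often.
ACTION_WEIGHTS='RestartActiveMaster 2
RestartRandomRs 2
BatchRestartRs 1'

# pick_action [SEED]: print one action name, chosen with probability
# proportional to its weight. SEED is optional and only makes the
# choice reproducible; without it awk seeds from the time of day.
pick_action() {
  printf '%s\n' "$ACTION_WEIGHTS" | awk -v seed="${1:-}" '
    BEGIN { if (seed != "") srand(seed); else srand() }
    { name[NR] = $1; weight[NR] = $2; total += $2 }
    END {
      r = rand() * total            # uniform in [0, total)
      for (i = 1; i <= NR; i++) {
        if (r < weight[i]) { print name[i]; exit }
        r -= weight[i]              # skip past this action and try the next
      }
    }'
}

pick_action
```

Calling `pick_action` once per period (for example, once a minute) approximates the behavior described above; the real policy also performs the selected action against the cluster, which this sketch does not attempt.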
To run ChaosMonkey as a standalone tool deploy your HBase cluster as usual. Most ChaosMonkey actions are configured to have reasonable defaults, so you can run
ChaosMonkey uses the configuration from the bin/hbase script, thus no extra configuration needs to be done. ChaosMonkey against an existing cluster without any additional configuration. The
You can invoke the ChaosMonkey by running: following example runs ChaosMonkey with the default configuration:
[source,bourne]
----
bin/hbase org.apache.hadoop.hbase.util.ChaosMonkey
----
This will output something like:
[source,bash]
---- ----
$ bin/hbase org.apache.hadoop.hbase.util.ChaosMonkey
12/11/19 23:21:57 INFO util.ChaosMonkey: Using ChaosMonkey Policy: class org.apache.hadoop.hbase.util.ChaosMonkey$PeriodicRandomActionPolicy, period:60000 12/11/19 23:21:57 INFO util.ChaosMonkey: Using ChaosMonkey Policy: class org.apache.hadoop.hbase.util.ChaosMonkey$PeriodicRandomActionPolicy, period:60000
12/11/19 23:21:57 INFO util.ChaosMonkey: Sleeping for 26953 to add jitter 12/11/19 23:21:57 INFO util.ChaosMonkey: Sleeping for 26953 to add jitter
@ -1276,31 +1273,38 @@ This will output something like:
12/11/19 23:24:27 INFO util.ChaosMonkey: Started region server:rs3.example.com,60020,1353367027826. Reported num of rs:6 12/11/19 23:24:27 INFO util.ChaosMonkey: Started region server:rs3.example.com,60020,1353367027826. Reported num of rs:6
---- ----
As you can see from the log, ChaosMonkey started the default PeriodicRandomActionPolicy, which is configured with all the available actions, and ran RestartActiveMaster and RestartRandomRs actions. The output indicates that ChaosMonkey started the default `PeriodicRandomActionPolicy`
ChaosMonkey tool, if run from command line, will keep on running until the process is killed. policy, which is configured with all the available actions. It chose to run `RestartActiveMaster` and `RestartRandomRs` actions.
==== Available Policies
HBase ships with several ChaosMonkey policies, available in the
`hbase/hbase-it/src/test/java/org/apache/hadoop/hbase/chaos/policies/` directory.
[[chaos.monkey.properties]] [[chaos.monkey.properties]]
==== Passing individual Chaos Monkey per-test Settings/Properties ==== Configuring Individual ChaosMonkey Actions
Since HBase version 1.0.0 (link:https://issues.apache.org/jira/browse/HBASE-11348[HBASE-11348]), the chaos monkeys is used to run integration tests can be configured per test run. Since HBase version 1.0.0 (link:https://issues.apache.org/jira/browse/HBASE-11348
Users can create a java properties file and and pass this to the chaos monkey with timing configurations. [HBASE-11348]), ChaosMonkey integration tests can be configured per test run.
The properties file needs to be in the HBase classpath. Create a Java properties file in the HBase classpath and pass it to ChaosMonkey using
The various properties that can be configured and their default values can be found listed in the `org.apache.hadoop.hbase.chaos.factories.MonkeyConstants` class. the `-monkeyProps` configuration flag. Configurable properties, along with their default
If any chaos monkey configuration is missing from the property file, then the default values are assumed. values if applicable, are listed in the `org.apache.hadoop.hbase.chaos.factories.MonkeyConstants`
For example: class. For properties that have defaults, you can override them by including them
in your properties file.
The following example uses a properties file called <<monkey.properties,monkey.properties>>.
[source,bourne] [source,bourne]
---- ----
$ bin/hbase org.apache.hadoop.hbase.IntegrationTestIngest -m slowDeterministic -monkeyProps monkey.properties
$bin/hbase org.apache.hadoop.hbase.IntegrationTestIngest -m slowDeterministic -monkeyProps monkey.properties
---- ----
The above command will start the integration tests and chaos monkey passing the properties file _monkey.properties_. The above command will start the integration tests and chaos monkey passing the properties file _monkey.properties_.
Here is an example chaos monkey file: Here is an example chaos monkey file:
[[monkey.properties]]
.Example ChaosMonkey Properties File
[source] [source]
---- ----
sdm.action1.period=120000 sdm.action1.period=120000
sdm.action2.period=40000 sdm.action2.period=40000
move.regions.sleep.time=80000 move.regions.sleep.time=80000
@ -1309,6 +1313,35 @@ move.regions.sleep.time=80000
batch.restart.rs.ratio=0.4f batch.restart.rs.ratio=0.4f
---- ----
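As a rough sketch of the workflow described above, the following shell snippet writes a properties file and prints the matching command line. It assumes that a local `conf` directory is on the HBase classpath; the `CONF_DIR` and `PROPS_FILE` variable names are hypothetical, and only the four properties shown in the example file are written.

```shell
# Directory assumed to be on the HBase classpath (hypothetical default: ./conf).
CONF_DIR="${CONF_DIR:-./conf}"
PROPS_FILE="monkey.properties"

mkdir -p "$CONF_DIR"

# Write only the settings to override; properties left out keep their defaults.
cat > "$CONF_DIR/$PROPS_FILE" <<'EOF'
sdm.action1.period=120000
sdm.action2.period=40000
move.regions.sleep.time=80000
batch.restart.rs.ratio=0.4f
EOF

# Print the command instead of running it, so it can be reviewed first.
echo "bin/hbase org.apache.hadoop.hbase.IntegrationTestIngest -m slowDeterministic -monkeyProps $PROPS_FILE"
```

Printing the command rather than executing it keeps the sketch safe to run on a machine that has no cluster attached.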
HBase 1.0.2 and newer add the ability to restart HBase's underlying ZooKeeper quorum or
HDFS nodes. These actions require some new properties, which have no reasonable
defaults because they are deployment-specific. Set them in your ChaosMonkey
properties file, which may be `hbase-site.xml` or a different properties file.
[source,xml]
----
<property>
  <name>hbase.it.clustermanager.hadoop.home</name>
  <value>$HADOOP_HOME</value>
</property>
<property>
  <name>hbase.it.clustermanager.zookeeper.home</name>
  <value>$ZOOKEEPER_HOME</value>
</property>
<property>
  <name>hbase.it.clustermanager.hbase.user</name>
  <value>hbase</value>
</property>
<property>
  <name>hbase.it.clustermanager.hadoop.hdfs.user</name>
  <value>hdfs</value>
</property>
<property>
  <name>hbase.it.clustermanager.zookeeper.user</name>
  <value>zookeeper</value>
</property>
----
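The text does not say whether `$HADOOP_HOME` and `$ZOOKEEPER_HOME` in the listing are expanded when the configuration is loaded, so this sketch treats them as placeholders for real paths. It prints the same five properties with the paths filled in from the environment. The `emit_property` helper and the `/opt/...` fallback paths are hypothetical.

```shell
# Fall back to hypothetical install locations when the variables are unset.
HADOOP_HOME="${HADOOP_HOME:-/opt/hadoop}"
ZOOKEEPER_HOME="${ZOOKEEPER_HOME:-/opt/zookeeper}"

# emit_property NAME VALUE: print one <property> element.
emit_property() {
  printf '<property>\n  <name>%s</name>\n  <value>%s</value>\n</property>\n' "$1" "$2"
}

# Deployment-specific locations, resolved from the environment.
emit_property hbase.it.clustermanager.hadoop.home "$HADOOP_HOME"
emit_property hbase.it.clustermanager.zookeeper.home "$ZOOKEEPER_HOME"

# Service users, using the same values as the listing.
emit_property hbase.it.clustermanager.hbase.user hbase
emit_property hbase.it.clustermanager.hadoop.hdfs.user hdfs
emit_property hbase.it.clustermanager.zookeeper.user zookeeper
```

The output can be pasted inside the `<configuration>` element of the chosen configuration file after checking that the paths and user names match the deployment.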
[[developing]] [[developing]]
== Developer Guidelines == Developer Guidelines