HDFS-14410. Make Dynamometer documentation properly compile onto the Hadoop site. Contributed by Erik Krogen.
parent 32925d04d9
commit 5043840b1d
@@ -215,6 +215,7 @@
     <item name="Resource Estimator Service" href="hadoop-resourceestimator/ResourceEstimator.html"/>
     <item name="Scheduler Load Simulator" href="hadoop-sls/SchedulerLoadSimulator.html"/>
     <item name="Hadoop Benchmarking" href="hadoop-project-dist/hadoop-common/Benchmarking.html"/>
+    <item name="Dynamometer" href="hadoop-dynamometer/Dynamometer.html"/>
   </menu>

   <menu name="Reference" inherit="top">
@@ -0,0 +1,299 @@
<!---
  Licensed under the Apache License, Version 2.0 (the "License");
  you may not use this file except in compliance with the License.
  You may obtain a copy of the License at

    http://www.apache.org/licenses/LICENSE-2.0

  Unless required by applicable law or agreed to in writing, software
  distributed under the License is distributed on an "AS IS" BASIS,
  WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
  See the License for the specific language governing permissions and
  limitations under the License. See accompanying LICENSE file.
-->

# Dynamometer Guide

<!-- MACRO{toc|fromDepth=0|toDepth=3} -->

## Overview

Dynamometer is a tool to performance test Hadoop's HDFS NameNode. The intent is to provide a
real-world environment by initializing the NameNode against a production file system image and replaying
a production workload collected via e.g. the NameNode's audit logs. This allows for replaying a workload
which is not only similar in characteristic to that experienced in production, but actually identical.

Dynamometer will launch a YARN application which starts a single NameNode and a configurable number of
DataNodes, simulating an entire HDFS cluster as a single application. There is an additional `workload`
job run as a MapReduce job which accepts audit logs as input and uses the information contained within to
submit matching requests to the NameNode, inducing load on the service.

Dynamometer can execute this same workload against different Hadoop versions or with different
configurations, allowing for the testing of configuration tweaks and code changes at scale without the
necessity of deploying to a real large-scale cluster.

Throughout this documentation, we will use "Dyno-HDFS", "Dyno-NN", and "Dyno-DN" to refer to the HDFS
cluster, NameNode, and DataNodes (respectively) which are started _inside of_ a Dynamometer application.
Terms like HDFS, YARN, and NameNode used without qualification refer to the existing infrastructure on
top of which Dynamometer is run.

For more details on how Dynamometer works, as opposed to how to use it, see the Architecture section
at the end of this page.

## Requirements

Dynamometer is based around YARN applications, so an existing YARN cluster will be required for execution.
It also requires an accompanying HDFS instance to store some temporary files for communication.

## Building

Dynamometer consists of three main components, each one in its own module:

* Infrastructure (`dynamometer-infra`): This is the YARN application which starts a Dyno-HDFS cluster.
* Workload (`dynamometer-workload`): This is the MapReduce job which replays audit logs.
* Block Generator (`dynamometer-blockgen`): This is a MapReduce job used to generate input files for each Dyno-DN;
  its execution is a prerequisite step to running the infrastructure application.

The compiled version of all of these components will be included in a standard Hadoop distribution.
You can find them in the packaged distribution within `share/hadoop/tools/dynamometer`.
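
For example (a sketch; the exact contents may vary by release):
```
ls share/hadoop/tools/dynamometer
# dynamometer-blockgen  dynamometer-infra  dynamometer-workload
```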

## Setup Steps

Before launching a Dynamometer application, there are a number of setup steps that must be completed,
instructing Dynamometer what configurations to use, what version to use, what fsimage to use when
loading, etc. These steps can be performed a single time to put everything in place, and then many
Dynamometer executions can be performed against them with minor tweaks to measure variations.

Scripts discussed below can be found in the `share/hadoop/tools/dynamometer/dynamometer-{infra,workload,blockgen}/bin`
directories of the distribution. The corresponding Java JAR files can be found in the `share/hadoop/tools/lib/` directory.
References to bin files below assume that the current working directory is `share/hadoop/tools/dynamometer`.

### Step 1: Preparing Requisite Files

A number of files must be prepared in advance of starting your first Dyno-HDFS cluster; these are
described in the steps that follow.

### Step 2: Prepare FsImage Files

Collect an fsimage and related files from your NameNode. This will include the `fsimage_TXID` file
which the NameNode creates as part of checkpointing, the `fsimage_TXID.md5` containing the md5 hash
of the image, the `VERSION` file containing some metadata, and the `fsimage_TXID.xml` file which can
be generated from the fsimage using the offline image viewer:
```
hdfs oiv -i fsimage_TXID -o fsimage_TXID.xml -p XML
```
It is recommended that you collect these files from your Secondary/Standby NameNode if you have one,
to avoid placing additional load on your Active NameNode.

All of these files must be placed somewhere on HDFS where the various jobs will be able to access them.
They should all be in the same folder, e.g. `hdfs:///dyno/fsimage`.
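
If staging the files by hand, a minimal sketch (assuming a transaction ID of 0001 and that the files
are in the current directory) might look like:
```
hdfs dfs -mkdir -p hdfs:///dyno/fsimage
hdfs dfs -put fsimage_0001 fsimage_0001.md5 VERSION fsimage_0001.xml hdfs:///dyno/fsimage/
```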

All these steps can be automated with the `upload-fsimage.sh` script, e.g.:
```
./dynamometer-infra/bin/upload-fsimage.sh 0001 hdfs:///dyno/fsimage
```
where 0001 is the transaction ID of the desired fsimage. See the usage info of the script for more detail.

### Step 3: Prepare a Hadoop Binary

Collect the Hadoop distribution tarball to use to start the Dyno-NN and -DNs. For example, if
testing against Hadoop 3.0.2, use
[hadoop-3.0.2.tar.gz](http://www.apache.org/dyn/closer.cgi/hadoop/common/hadoop-3.0.2/hadoop-3.0.2.tar.gz).
This distribution contains several components unnecessary for Dynamometer (e.g. YARN), so to reduce
its size, you can optionally use the `create-slim-hadoop-tar.sh` script:
```
./dynamometer-infra/bin/create-slim-hadoop-tar.sh hadoop-VERSION.tar.gz
```
The Hadoop tar can be present on HDFS or locally where the client will be run from. Its path will be
supplied to the client via the `-hadoop_binary_path` argument.

Alternatively, if you use the `-hadoop_version` argument, you can simply specify which version you would
like to run against (e.g. '3.0.2') and the client will attempt to download it automatically from an
Apache mirror. See the usage information of the client for more details.
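
For instance, a sketch of relying on the automatic download (all other required arguments, shown in the
launch examples below, are omitted here):
```
./dynamometer-infra/bin/start-dynamometer-cluster.sh -hadoop_version 3.0.2 ...
```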

### Step 4: Prepare Configurations

Prepare a configuration directory. You will need to specify a configuration directory with the standard
Hadoop configuration layout, e.g. it should contain `etc/hadoop/*-site.xml`. This determines with what
configuration the Dyno-NN and -DNs will be launched. Configurations that must be modified for
Dynamometer to work properly (e.g. `fs.defaultFS` or `dfs.namenode.name.dir`) will be overridden
at execution time. This can be a directory if it is available locally, else an archive file on local
or remote (HDFS) storage.
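
As an illustration, a minimal layout might look like the following (a hypothetical file set; include
whichever `*-site.xml` files you need):
```
my-hadoop-conf/
└── etc/hadoop/
    ├── core-site.xml
    ├── hdfs-site.xml
    └── log4j.properties
```
If it needs to live on remote storage, the directory can be archived first, e.g. with
`tar czf my-hadoop-conf.tar.gz my-hadoop-conf` (a gzipped tar is assumed here to be an accepted archive format).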

### Step 5: Execute the Block Generation Job

This will use the `fsimage_TXID.xml` file to generate the list of blocks that each Dyno-DN should
advertise to the Dyno-NN. It runs as a MapReduce job.
```
./dynamometer-blockgen/bin/generate-block-lists.sh
    -fsimage_input_path hdfs:///dyno/fsimage/fsimage_TXID.xml
    -block_image_output_dir hdfs:///dyno/blocks
    -num_reducers R
    -num_datanodes D
```
In this example, the XML file uploaded above is used to generate block listings into `hdfs:///dyno/blocks`.
`R` reducers are used for the job, and `D` block listings are generated; this determines how many
Dyno-DNs are started in the Dyno-HDFS cluster.
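
For instance, a concrete run (hypothetical sizing) that uses 10 reducers to generate listings for 100 Dyno-DNs:
```
./dynamometer-blockgen/bin/generate-block-lists.sh
    -fsimage_input_path hdfs:///dyno/fsimage/fsimage_0001.xml
    -block_image_output_dir hdfs:///dyno/blocks
    -num_reducers 10
    -num_datanodes 100
```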

### Step 6: Prepare Audit Traces (Optional)

This step is only necessary if you intend to use the audit trace replay capabilities of Dynamometer; if you
just intend to start a Dyno-HDFS cluster you can skip to the next section.

The audit trace replay accepts one input file per mapper, and currently supports two input formats, configurable
via the `auditreplay.command-parser.class` configuration. One mapper will automatically be created for every
audit log file within the audit log directory specified at launch time.

The default is a direct format,
`com.linkedin.dynamometer.workloadgenerator.audit.AuditLogDirectParser`. This accepts files in the format produced
by a standard configuration audit logger, e.g. lines like:
```
1970-01-01 00:00:42,000 INFO FSNamesystem.audit: allowed=true ugi=hdfs ip=/127.0.0.1 cmd=open src=/tmp/foo dst=null perm=null proto=rpc
```
When using this format you must also specify `auditreplay.log-start-time.ms`, which should be (in milliseconds since
the Unix epoch) the start time of the audit traces. This is needed for all mappers to agree on a single start time. For
example, if the above line was the first audit event, you would specify `auditreplay.log-start-time.ms=42000`. Within a
file, the audit logs must be in order of ascending timestamp.
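
These parser settings are passed like any other job configuration; a sketch using the same `-D` style as the
workload launch examples below (the parser class shown is the default, included here only for illustration):
```
./dynamometer-workload/bin/start-workload.sh
    -Dauditreplay.command-parser.class=com.linkedin.dynamometer.workloadgenerator.audit.AuditLogDirectParser
    -Dauditreplay.log-start-time.ms=42000
    ...
```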

The other supported format is `com.linkedin.dynamometer.workloadgenerator.audit.AuditLogHiveTableParser`. This accepts
files in the format produced by a Hive query with output fields, in order:

* `relativeTimestamp`: event time offset, in milliseconds, from the start of the trace
* `ugi`: user information of the submitting user
* `command`: name of the command, e.g. 'open'
* `source`: source path
* `dest`: destination path
* `sourceIP`: source IP of the event

Assuming your audit logs are available in Hive, this can be produced via a Hive query looking like:
```sql
INSERT OVERWRITE DIRECTORY '${outputPath}'
SELECT (timestamp - ${startTimestamp}) AS relativeTimestamp, ugi, command, source, dest, sourceIP
FROM '${auditLogTableLocation}'
WHERE timestamp >= ${startTimestamp} AND timestamp < ${endTimestamp}
DISTRIBUTE BY sourceIP
SORT BY relativeTimestamp ASC;
```

#### Partitioning the Audit Logs

You may notice that in the Hive query shown above, there is a `DISTRIBUTE BY sourceIP` clause which indicates that the
output files should be partitioned by the source IP of the caller. This is done to try to maintain closer ordering
of requests which originated from a single client. Dynamometer does not guarantee strict ordering of operations even
within a partition, but ordering will typically be maintained more closely within a partition than across partitions.

Whether you use Hive or raw audit logs, it will be necessary to partition the audit logs based on the number of
simultaneous clients you require to perform your workload replay. Using the source IP as a partition key is one
approach with the potential advantages discussed above, but any partition scheme should work reasonably well.
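
For raw audit logs, one rough way to produce such per-client partitions is a single pass that extracts the IP from
each line (a sketch assuming GNU awk, the standard audit log line format shown above, and hypothetical file names):
```
gawk 'match($0, /ip=\/([0-9.]+)/, m) { print > ("audit-part-" m[1] ".log") }' hdfs-audit.log
```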

## Running Dynamometer

After the setup steps above have been completed, you're ready to start up a Dyno-HDFS cluster and replay
some workload against it!

The client which launches the Dyno-HDFS YARN application can optionally launch the workload replay
job once the Dyno-HDFS cluster has fully started. This makes each replay into a single execution of the client,
enabling easy testing of various configurations. You can also launch the two separately to have more control.
Similarly, it is possible to launch Dyno-DNs for an external NameNode which is not controlled by Dynamometer/YARN.
This can be useful for testing NameNode configurations which are not yet supported (e.g. HA NameNodes). You can do
this by passing the `-namenode_servicerpc_addr` argument to the infrastructure application with a value that points
to an external NameNode's service RPC address.

### Manual Workload Launch

First launch the infrastructure application to begin the startup of the internal HDFS cluster, e.g.:
```
./dynamometer-infra/bin/start-dynamometer-cluster.sh
    -hadoop_binary_path hadoop-3.0.2.tar.gz
    -conf_path my-hadoop-conf
    -fs_image_dir hdfs:///dyno/fsimage
    -block_list_path hdfs:///dyno/blocks
```
This demonstrates the required arguments. You can run this with the `-help` flag to see further usage information.

The client will track the Dyno-NN's startup progress and how many Dyno-DNs it considers live. It will notify
via logging when the Dyno-NN has exited safemode and is ready for use.

At this point, a workload job (map-only MapReduce job) can be launched, e.g.:
```
./dynamometer-workload/bin/start-workload.sh
    -Dauditreplay.input-path=hdfs:///dyno/audit_logs/
    -Dauditreplay.output-path=hdfs:///dyno/results/
    -Dauditreplay.num-threads=50
    -nn_uri hdfs://namenode_address:port/
    -start_time_offset 5m
    -mapper_class_name AuditReplayMapper
```
The type of workload generation is configurable; AuditReplayMapper replays an audit log trace as discussed previously.
The AuditReplayMapper is configured via configurations; `auditreplay.input-path`, `auditreplay.output-path` and
`auditreplay.num-threads` are required to specify the input path for audit log files, the output path for the results,
and the number of threads per map task. A number of map tasks equal to the number of files in `input-path` will be
launched; each task will read in one of these input files and use `num-threads` threads to replay the events contained
within that file. A best effort is made to faithfully replay the audit log events at the same pace at which they
originally occurred (optionally, this can be adjusted by specifying `auditreplay.rate-factor`, a multiplicative
factor applied to the rate of replay, e.g. use 2.0 to replay the events at twice the original speed).
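
For instance, to replay at twice the original pace (a sketch; the remaining arguments are the same as above):
```
./dynamometer-workload/bin/start-workload.sh -Dauditreplay.rate-factor=2.0 ...
```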

### Integrated Workload Launch

To have the infrastructure application client launch the workload automatically, parameters for the workload job
are passed to the infrastructure script. Only the AuditReplayMapper is supported in this fashion at this time. To
launch an integrated application with the same parameters as were used above, the following can be used:
```
./dynamometer-infra/bin/start-dynamometer-cluster.sh
    -hadoop_binary_path hadoop-3.0.2.tar.gz
    -conf_path my-hadoop-conf
    -fs_image_dir hdfs:///dyno/fsimage
    -block_list_path hdfs:///dyno/blocks
    -workload_replay_enable
    -workload_input_path hdfs:///dyno/audit_logs/
    -workload_output_path hdfs:///dyno/results/
    -workload_threads_per_mapper 50
    -workload_start_delay 5m
```
When run in this way, the client will automatically handle tearing down the Dyno-HDFS cluster once the
workload has completed. To see the full list of supported parameters, run this with the `-help` flag.

## Architecture

Dynamometer is implemented as an application on top of YARN. There are three main actors in a Dynamometer application:

* Infrastructure is the simulated HDFS cluster.
* Workload simulates HDFS clients to generate load on the simulated NameNode.
* The driver coordinates the two other components.

The logic encapsulated in the driver enables a user to perform a full test execution of Dynamometer with a single command,
making it possible to do things like sweeping over different parameters to find optimal configurations.

![Dynamometer Infrastructure Application Architecture](./images/dynamometer-architecture-infra.png)

The infrastructure application is written as a native YARN application in which a single NameNode and numerous DataNodes
are launched and wired together to create a fully simulated HDFS cluster. For Dynamometer to provide an extremely realistic scenario,
it is necessary to have a cluster which contains, from the NameNode's perspective, the same information as a production cluster.
This is why the setup steps described above involve first collecting the FsImage file from a production NameNode and placing it onto
the host HDFS cluster. To avoid having to copy an entire cluster's worth of blocks, Dynamometer leverages the fact that the actual
data stored in blocks is irrelevant to the NameNode, which is only aware of the block metadata. Dynamometer's blockgen job first
uses the Offline Image Viewer to turn the FsImage into XML, then parses this to extract the metadata for each block, then partitions this
information before placing it on HDFS for the simulated DataNodes to consume. `SimulatedFSDataset` is used to bypass the DataNode storage
layer and store only the block metadata, loaded from the information extracted in the previous step. This scheme allows Dynamometer to
pack many simulated DataNodes onto each physical node, as the size of the metadata is many orders of magnitude smaller than the data itself.

To create a stress test that matches a production environment, Dynamometer needs a way to collect the information about the production workload.
For this the HDFS audit log is used, which contains a faithful record of all client-facing operations against the NameNode. By replaying this
audit log to recreate the client load, and running simulated DataNodes to recreate the cluster management load, Dynamometer is able to provide
a realistic simulation of the conditions of a production NameNode.

![Dynamometer Replay Architecture](./images/dynamometer-architecture-replay.png)

A heavily-loaded NameNode can service tens of thousands of operations per second; to induce such a load, Dynamometer needs numerous clients to submit
requests. In an effort to ensure that each request has the same effect and performance implications as its original submission, Dynamometer
attempts to make related requests (for example, a directory creation followed by a listing of that directory) in such a way as to preserve their original
ordering. It is for this reason that audit log files are suggested to be partitioned by source IP address, using the assumption that requests which
originated from the same host have more tightly coupled causal relationships than those which originated from different hosts. In the interest of
simplicity, the stress testing job is written as a map-only MapReduce job, in which each mapper consumes a partitioned audit log file and replays
the commands contained within against the simulated NameNode. During execution, statistics are collected about the replay, such as latency for
different types of requests.

## External Resources

For more information on Dynamometer, see the
[blog post announcing its initial release](https://engineering.linkedin.com/blog/2018/02/dynamometer--scale-testing-hdfs-on-minimal-hardware-with-maximum)
or [this presentation](https://www.slideshare.net/xkrogen/hadoop-meetup-jan-2019-dynamometer-and-a-case-study-in-namenode-gc).

@@ -0,0 +1,30 @@
/*
 * Licensed to the Apache Software Foundation (ASF) under one or more
 * contributor license agreements.  See the NOTICE file distributed with
 * this work for additional information regarding copyright ownership.
 * The ASF licenses this file to You under the Apache License, Version 2.0
 * (the "License"); you may not use this file except in compliance with
 * the License.  You may obtain a copy of the License at
 *
 *     http://www.apache.org/licenses/LICENSE-2.0
 *
 * Unless required by applicable law or agreed to in writing, software
 * distributed under the License is distributed on an "AS IS" BASIS,
 * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
 * See the License for the specific language governing permissions and
 * limitations under the License.
 */
#banner {
  height: 93px;
  background: none;
}

#bannerLeft img {
  margin-left: 30px;
  margin-top: 10px;
}

#bannerRight img {
  margin: 17px;
}
Binary file not shown: new image added (121 KiB).
Binary file not shown: new image added (156 KiB).