diff --git a/hadoop-project/src/site/site.xml b/hadoop-project/src/site/site.xml index bf6bd26ba9a..a79036907cf 100644 --- a/hadoop-project/src/site/site.xml +++ b/hadoop-project/src/site/site.xml @@ -215,6 +215,7 @@ + diff --git a/hadoop-tools/hadoop-dynamometer/src/site/markdown/Dynamometer.md b/hadoop-tools/hadoop-dynamometer/src/site/markdown/Dynamometer.md new file mode 100644 index 00000000000..39dd0dbbeef --- /dev/null +++ b/hadoop-tools/hadoop-dynamometer/src/site/markdown/Dynamometer.md @@ -0,0 +1,299 @@ + + +# Dynamometer Guide + + + +## Overview + +Dynamometer is a tool to performance test Hadoop's HDFS NameNode. The intent is to provide a +real-world environment by initializing the NameNode against a production file system image and replaying +a production workload collected via, e.g., the NameNode's audit logs. This allows for replaying a workload +which is not only similar in character to that experienced in production, but actually identical. + +Dynamometer will launch a YARN application which starts a single NameNode and a configurable number of +DataNodes, simulating an entire HDFS cluster as a single application. There is an additional `workload` +job, run as a MapReduce job, which accepts audit logs as input and uses the information contained within to +submit matching requests to the NameNode, inducing load on the service. + +Dynamometer can execute this same workload against different Hadoop versions or with different +configurations, allowing for the testing of configuration tweaks and code changes at scale without the +necessity of deploying to a real large-scale cluster. + +Throughout this documentation, we will use "Dyno-HDFS", "Dyno-NN", and "Dyno-DN" to refer to the HDFS +cluster, NameNode, and DataNodes (respectively) which are started _inside of_ a Dynamometer application. +Terms like HDFS, YARN, and NameNode used without qualification refer to the existing infrastructure on +top of which Dynamometer is run. + +For more details on how Dynamometer works, as opposed to how to use it, see the Architecture section +at the end of this page. + +## Requirements + +Dynamometer is based around YARN applications, so an existing YARN cluster will be required for execution. +It also requires an accompanying HDFS instance to store some temporary files for communication. + +## Building + +Dynamometer consists of three main components, each one in its own module: + +* Infrastructure (`dynamometer-infra`): This is the YARN application which starts a Dyno-HDFS cluster. +* Workload (`dynamometer-workload`): This is the MapReduce job which replays audit logs. +* Block Generator (`dynamometer-blockgen`): This is a MapReduce job used to generate input files for each Dyno-DN; its + execution is a prerequisite step to running the infrastructure application. + +The compiled versions of all of these components are included in a standard Hadoop distribution. +You can find them in the packaged distribution within `share/hadoop/tools/dynamometer`. + +## Setup Steps + +Before launching a Dynamometer application, there are a number of setup steps that must be completed, +instructing Dynamometer which configurations to use, which Hadoop version to run, which fsimage to load, +etc. These steps can be performed a single time to put everything in place, and then many +Dynamometer executions can be performed against them with minor tweaks to measure variations.
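
As a quick sanity check before starting, you can confirm that your distribution bundles the Dynamometer scripts and JARs. The following is only a minimal sketch; it assumes a `HADOOP_HOME` environment variable pointing at your unpacked Hadoop distribution, and that the bundled JAR file names contain "dynamometer":

```bash
# Assumption: HADOOP_HOME points at an unpacked Hadoop distribution that bundles Dynamometer.
# List the per-component script directories described in the next paragraph.
ls "$HADOOP_HOME"/share/hadoop/tools/dynamometer/dynamometer-{infra,workload,blockgen}/bin

# List the Dynamometer JARs shipped with the distribution
# (assumes the JAR file names contain "dynamometer").
ls "$HADOOP_HOME"/share/hadoop/tools/lib/ | grep -i dynamometer
```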
+ +Scripts discussed below can be found in the `share/hadoop/tools/dynamometer/dynamometer-{infra,workload,blockgen}/bin` +directories of the distribution. The corresponding Java JAR files can be found in the `share/hadoop/tools/lib/` directory. +References to bin files below assume that the current working directory is `share/hadoop/tools/dynamometer`. + +### Step 1: Preparing Requisite Files + +A number of files are required in advance of starting your first Dyno-HDFS cluster; the following steps describe how to prepare them. + +### Step 2: Prepare FsImage Files + +Collect an fsimage and related files from your NameNode. This will include the `fsimage_TXID` file +which the NameNode creates as part of checkpointing, the `fsimage_TXID.md5` containing the md5 hash +of the image, the `VERSION` file containing some metadata, and the `fsimage_TXID.xml` file which can +be generated from the fsimage using the offline image viewer: +``` +hdfs oiv -i fsimage_TXID -o fsimage_TXID.xml -p XML +``` +It is recommended that you collect these files from your Secondary/Standby NameNode, if you have one, +to avoid placing additional load on your Active NameNode. + +All of these files must be placed somewhere on HDFS where the various jobs will be able to access them. +They should all be in the same folder, e.g. `hdfs:///dyno/fsimage`. + +All of these steps can be automated with the `upload-fsimage.sh` script, e.g.: +``` +./dynamometer-infra/bin/upload-fsimage.sh 0001 hdfs:///dyno/fsimage +``` +where `0001` is the transaction ID of the desired fsimage. See the usage info of the script for more detail. + +### Step 3: Prepare a Hadoop Binary + +Collect the Hadoop distribution tarball to use to start the Dyno-NN and -DNs. For example, if +testing against Hadoop 3.0.2, use +[hadoop-3.0.2.tar.gz](http://www.apache.org/dyn/closer.cgi/hadoop/common/hadoop-3.0.2/hadoop-3.0.2.tar.gz). +This distribution contains several components unnecessary for Dynamometer (e.g. YARN), so to reduce +its size, you can optionally use the `create-slim-hadoop-tar.sh` script: +``` +./dynamometer-infra/bin/create-slim-hadoop-tar.sh hadoop-VERSION.tar.gz +``` +The Hadoop tarball can be present on HDFS, or local to the machine from which the client will be run. Its path will be +supplied to the client via the `-hadoop_binary_path` argument. + +Alternatively, if you use the `-hadoop_version` argument, you can simply specify which version you would +like to run against (e.g. '3.0.2') and the client will attempt to download it automatically from an +Apache mirror. See the usage information of the client for more details. + +### Step 4: Prepare Configurations + +Prepare a configuration directory. You will need to specify a configuration directory with the standard +Hadoop configuration layout, e.g. it should contain `etc/hadoop/*-site.xml`. This determines the +configuration with which the Dyno-NN and -DNs will be launched. Configurations that must be modified for +Dynamometer to work properly (e.g. `fs.defaultFS` or `dfs.namenode.name.dir`) will be overridden +at execution time. This can be a plain directory if it is available locally, or else an archive file on local +or remote (HDFS) storage. + +### Step 5: Execute the Block Generation Job + +This will use the `fsimage_TXID.xml` file to generate the list of blocks that each Dyno-DN should +advertise to the Dyno-NN. It runs as a MapReduce job.
+``` +./dynamometer-blockgen/bin/generate-block-lists.sh + -fsimage_input_path hdfs:///dyno/fsimage/fsimage_TXID.xml + -block_image_output_dir hdfs:///dyno/blocks + -num_reducers R + -num_datanodes D +``` +In this example, the XML file uploaded above is used to generate block listings into `hdfs:///dyno/blocks`. +`R` reducers are used for the job, and `D` block listings are generated; this determines how many +Dyno-DNs are started in the Dyno-HDFS cluster. + +### Step 6: Prepare Audit Traces (Optional) + +This step is only necessary if you intend to use the audit trace replay capabilities of Dynamometer; if you +just intend to start a Dyno-HDFS cluster, you can skip to the next section. + +The audit trace replay accepts one input file per mapper, and currently supports two input formats, configurable +via the `auditreplay.command-parser.class` configuration. One mapper will automatically be created for every +audit log file within the audit log directory specified at launch time. + +The default is a direct format, +`org.apache.hadoop.tools.dynamometer.workloadgenerator.audit.AuditLogDirectParser`. This accepts files in the format produced +by an audit logger with the standard configuration, e.g. lines like: +``` +1970-01-01 00:00:42,000 INFO FSNamesystem.audit: allowed=true ugi=hdfs ip=/127.0.0.1 cmd=open src=/tmp/foo dst=null perm=null proto=rpc +``` +When using this format you must also specify `auditreplay.log-start-time.ms`, which should be (in milliseconds since +the Unix epoch) the start time of the audit traces. This is needed for all mappers to agree on a single start time. For +example, if the above line was the first audit event, you would specify `auditreplay.log-start-time.ms=42000`. Within a +file, the audit logs must be in order of ascending timestamp. + +The other supported format is `org.apache.hadoop.tools.dynamometer.workloadgenerator.audit.AuditLogHiveTableParser`. This accepts +files in the format produced by a Hive query with output fields, in order: + +* `relativeTimestamp`: event time offset, in milliseconds, from the start of the trace +* `ugi`: user information of the submitting user +* `command`: name of the command, e.g. 'open' +* `source`: source path +* `dest`: destination path +* `sourceIP`: source IP of the event + +Assuming your audit logs are available in Hive, this can be produced via a Hive query looking like: +```sql +INSERT OVERWRITE DIRECTORY '${outputPath}' +SELECT (timestamp - ${startTimestamp}) AS relativeTimestamp, ugi, command, source, dest, sourceIP +FROM '${auditLogTableLocation}' +WHERE timestamp >= ${startTimestamp} AND timestamp < ${endTimestamp} +DISTRIBUTE BY sourceIP +SORT BY relativeTimestamp ASC; +``` + +#### Partitioning the Audit Logs + +You may notice that in the Hive query shown above, there is a `DISTRIBUTE BY sourceIP` clause which indicates that the +output files should be partitioned by the source IP of the caller. This is done to try to maintain closer ordering +of requests which originated from a single client. Dynamometer does not guarantee strict ordering of operations even +within a partition, but ordering will typically be maintained more closely within a partition than across partitions. + +Whether you use Hive or raw audit logs, it will be necessary to partition the audit logs based on the number of +simultaneous clients you require to perform your workload replay. Using the source IP as a partition key is one +approach with the potential advantages discussed above, but any partition scheme should work reasonably well.
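
If you are working with raw audit logs rather than Hive output, one possible way to partition them by source IP is a small script along the following lines. This is only a sketch, not part of Dynamometer itself; it assumes the standard audit log line format shown above (with an `ip=/x.x.x.x` field) and uses hypothetical names for the local input file (`hdfs-audit.log`) and output directory (`partitioned/`):

```bash
#!/usr/bin/env bash
# Sketch: split a raw NameNode audit log into one file per source IP.
# Assumes the standard audit log format shown above, e.g. fields like "ip=/127.0.0.1".
mkdir -p partitioned
awk '{
  for (i = 1; i <= NF; i++) {
    if ($i ~ /^ip=/) {
      ip = $i
      sub(/^ip=\//, "", ip)                        # strip the "ip=/" prefix
      print $0 > ("partitioned/audit_" ip ".log")  # append the line to that client'"'"'s file
      break
    }
  }
}' hdfs-audit.log

# The per-IP files can then be uploaded for use as replay input, e.g.:
hdfs dfs -mkdir -p hdfs:///dyno/audit_logs
hdfs dfs -put partitioned/* hdfs:///dyno/audit_logs/
```

Since one mapper is created per input file, the number of output files produced by any such partitioning controls the number of simultaneous replay clients; if your log covers very many distinct source IPs, you may wish to group several IPs into each file (and, with some awk implementations, close files as you go) rather than keeping one file per address.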
+ +## Running Dynamometer + +After the setup steps above have been completed, you're ready to start up a Dyno-HDFS cluster and replay +some workload against it! + +The client which launches the Dyno-HDFS YARN application can optionally launch the workload replay +job once the Dyno-HDFS cluster has fully started. This makes each replay into a single execution of the client, +enabling easy testing of various configurations. You can also launch the two separately to have more control. +Similarly, it is possible to launch Dyno-DNs for an external NameNode which is not controlled by Dynamometer/YARN. +This can be useful for testing NameNode configurations which Dynamometer cannot yet launch itself (e.g. HA NameNodes). You can do +this by passing the `-namenode_servicerpc_addr` argument to the infrastructure application with a value that points +to an external NameNode's service RPC address. + +### Manual Workload Launch + +First launch the infrastructure application to begin the startup of the internal HDFS cluster, e.g.: +``` +./dynamometer-infra/bin/start-dynamometer-cluster.sh + -hadoop_binary_path hadoop-3.0.2.tar.gz + -conf_path my-hadoop-conf + -fs_image_dir hdfs:///dyno/fsimage + -block_list_path hdfs:///dyno/blocks +``` +This demonstrates the required arguments. You can run this with the `-help` flag to see further usage information. + +The client will track the Dyno-NN's startup progress and how many Dyno-DNs it considers live. It will notify +via logging when the Dyno-NN has exited safemode and is ready for use. + +At this point, a workload job (a map-only MapReduce job) can be launched, e.g.: +``` +./dynamometer-workload/bin/start-workload.sh + -Dauditreplay.input-path=hdfs:///dyno/audit_logs/ + -Dauditreplay.output-path=hdfs:///dyno/results/ + -Dauditreplay.num-threads=50 + -nn_uri hdfs://namenode_address:port/ + -start_time_offset 5m + -mapper_class_name AuditReplayMapper +``` +The type of workload generation is configurable; AuditReplayMapper replays an audit log trace as discussed previously. +The AuditReplayMapper is configured via job configuration properties; `auditreplay.input-path`, `auditreplay.output-path` and +`auditreplay.num-threads` are required to specify the input path for audit log files, the output path for the results, +and the number of threads per map task. A number of map tasks equal to the number of files in `input-path` will be +launched; each task will read in one of these input files and use `num-threads` threads to replay the events contained +within that file. A best effort is made to faithfully replay the audit log events at the same pace at which they +originally occurred (optionally, this can be adjusted by specifying `auditreplay.rate-factor`, a multiplicative +factor applied to the replay rate, e.g. use 2.0 to replay the events at twice the original speed). + +### Integrated Workload Launch + +To have the infrastructure application client launch the workload automatically, parameters for the workload job +are passed to the infrastructure script. Only the AuditReplayMapper is supported in this fashion at this time.
To +launch an integrated application with the same parameters as were used above, the following can be used: +``` +./dynamometer-infra/bin/start-dynamometer-cluster.sh + -hadoop_binary_path hadoop-3.0.2.tar.gz + -conf_path my-hadoop-conf + -fs_image_dir hdfs:///dyno/fsimage + -block_list_path hdfs:///dyno/blocks + -workload_replay_enable + -workload_input_path hdfs:///dyno/audit_logs/ + -workload_output_path hdfs:///dyno/results/ + -workload_threads_per_mapper 50 + -workload_start_delay 5m +``` +When run in this way, the client will automatically handle tearing down the Dyno-HDFS cluster once the +workload has completed. To see the full list of supported parameters, run this with the `-help` flag. + +## Architecture + +Dynamometer is implemented as an application on top of YARN. There are three main actors in a Dynamometer application: + +* Infrastructure is the simulated HDFS cluster. +* Workload simulates HDFS clients to generate load on the simulated NameNode. +* The driver coordinates the two other components. + +The logic encapsulated in the driver enables a user to perform a full test execution of Dynamometer with a single command, +making it possible to do things like sweeping over different parameters to find optimal configurations. + +![Dynamometer Infrastructure Application Architecture](./images/dynamometer-architecture-infra.png) + +The infrastructure application is written as a native YARN application in which a single NameNode and numerous DataNodes +are launched and wired together to create a fully simulated HDFS cluster. For Dynamometer to provide an extremely realistic scenario, +it is necessary to have a cluster which contains, from the NameNode’s perspective, the same information as a production cluster. +This is why the setup steps described above involve first collecting the FsImage file from a production NameNode and placing it onto +the host HDFS cluster. To avoid having to copy an entire cluster’s worth of blocks, Dynamometer leverages the fact that the actual +data stored in blocks is irrelevant to the NameNode, which is only aware of the block metadata. Dynamometer's blockgen job first +uses the Offline Image Viewer to turn the FsImage into XML, then parses this to extract the metadata for each block, then partitions this +information before placing it on HDFS for the simulated DataNodes to consume. `SimulatedFSDataset` is used to bypass the DataNode storage +layer and store only the block metadata, loaded from the information extracted in the previous step. This scheme allows Dynamometer to +pack many simulated DataNodes onto each physical node, as the size of the metadata is many orders of magnitude smaller than the data itself. + +To create a stress test that matches a production environment, Dynamometer needs a way to collect information about the production workload. +For this, the HDFS audit log is used, which contains a faithful record of all client-facing operations against the NameNode. By replaying this +audit log to recreate the client load, and running simulated DataNodes to recreate the cluster management load, Dynamometer is able to provide +a realistic simulation of the conditions of a production NameNode. + +![Dynamometer Replay Architecture](./images/dynamometer-architecture-replay.png) + +A heavily-loaded NameNode can service tens of thousands of operations per second; to induce such a load, Dynamometer needs numerous clients to submit +requests.
In an effort to ensure that each request has the same effect and performance implications as its original submission, Dynamometer +attempts to make related requests (for example, a directory creation followed by a listing of that directory) in such a way as to preserve their original +ordering. This is why it is suggested that audit log files be partitioned by source IP address, on the assumption that requests which +originated from the same host have more tightly coupled causal relationships than those which originated from different hosts. In the interest of +simplicity, the stress testing job is written as a map-only MapReduce job, in which each mapper consumes a partitioned audit log file and replays +the commands contained within against the simulated NameNode. During execution, statistics are collected about the replay, such as the latency of +different types of requests. + +## External Resources + +For more information on Dynamometer, see the +[blog post announcing its initial release](https://engineering.linkedin.com/blog/2018/02/dynamometer--scale-testing-hdfs-on-minimal-hardware-with-maximum) +or [this presentation](https://www.slideshare.net/xkrogen/hadoop-meetup-jan-2019-dynamometer-and-a-case-study-in-namenode-gc). diff --git a/hadoop-tools/hadoop-dynamometer/src/site/resources/css/site.css b/hadoop-tools/hadoop-dynamometer/src/site/resources/css/site.css new file mode 100644 index 00000000000..f830baafa8c --- /dev/null +++ b/hadoop-tools/hadoop-dynamometer/src/site/resources/css/site.css @@ -0,0 +1,30 @@ +/* +* Licensed to the Apache Software Foundation (ASF) under one or more +* contributor license agreements. See the NOTICE file distributed with +* this work for additional information regarding copyright ownership. +* The ASF licenses this file to You under the Apache License, Version 2.0 +* (the "License"); you may not use this file except in compliance with +* the License. You may obtain a copy of the License at +* +* http://www.apache.org/licenses/LICENSE-2.0 +* +* Unless required by applicable law or agreed to in writing, software +* distributed under the License is distributed on an "AS IS" BASIS, +* WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +* See the License for the specific language governing permissions and +* limitations under the License. +*/ +#banner { + height: 93px; + background: none; +} + +#bannerLeft img { + margin-left: 30px; + margin-top: 10px; +} + +#bannerRight img { + margin: 17px; +} + diff --git a/hadoop-tools/hadoop-dynamometer/src/site/resources/images/dynamometer-architecture-infra.png b/hadoop-tools/hadoop-dynamometer/src/site/resources/images/dynamometer-architecture-infra.png new file mode 100644 index 00000000000..2f921f61241 Binary files /dev/null and b/hadoop-tools/hadoop-dynamometer/src/site/resources/images/dynamometer-architecture-infra.png differ diff --git a/hadoop-tools/hadoop-dynamometer/src/site/resources/images/dynamometer-architecture-replay.png b/hadoop-tools/hadoop-dynamometer/src/site/resources/images/dynamometer-architecture-replay.png new file mode 100644 index 00000000000..897828d49e2 Binary files /dev/null and b/hadoop-tools/hadoop-dynamometer/src/site/resources/images/dynamometer-architecture-replay.png differ