HADOOP-10954. Adding site documents of hadoop-tools (Masatake Iwasaki via aw)
parent fe8222998f
commit 27f9edceed

@@ -212,6 +212,9 @@ Release 2.6.0 - UNRELEASED

     HADOOP-8808. Update FsShell documentation to mention deprecation of some
     of the commands, and mention alternatives (Akira AJISAKA via aw)

+    HADOOP-10954. Adding site documents of hadoop-tools (Masatake Iwasaki
+    via aw)
+
   OPTIMIZATIONS

     HADOOP-10838. Byte array native checksumming. (James Thomas via todd)
@@ -0,0 +1,818 @@
<!---
|
||||||
|
Licensed under the Apache License, Version 2.0 (the "License");
|
||||||
|
you may not use this file except in compliance with the License.
|
||||||
|
You may obtain a copy of the License at
|
||||||
|
|
||||||
|
http://www.apache.org/licenses/LICENSE-2.0
|
||||||
|
|
||||||
|
Unless required by applicable law or agreed to in writing, software
|
||||||
|
distributed under the License is distributed on an "AS IS" BASIS,
|
||||||
|
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
|
||||||
|
See the License for the specific language governing permissions and
|
||||||
|
limitations under the License. See accompanying LICENSE file.
|
||||||
|
-->
|
||||||
|
|
||||||
|
Gridmix
|
||||||
|
=======
|
||||||
|
|
||||||
|
---
|
||||||
|
|
||||||
|
- [Overview](#Overview)
|
||||||
|
- [Usage](#Usage)
|
||||||
|
- [General Configuration Parameters](#General_Configuration_Parameters)
|
||||||
|
- [Job Types](#Job_Types)
|
||||||
|
- [Job Submission Policies](#Job_Submission_Policies)
|
||||||
|
- [Emulating Users and Queues](#Emulating_Users_and_Queues)
|
||||||
|
- [Emulating Distributed Cache Load](#Emulating_Distributed_Cache_Load)
|
||||||
|
- [Configuration of Simulated Jobs](#Configuration_of_Simulated_Jobs)
|
||||||
|
- [Emulating Compression/Decompression](#Emulating_CompressionDecompression)
|
||||||
|
- [Emulating High-Ram jobs](#Emulating_High-Ram_jobs)
|
||||||
|
- [Emulating resource usages](#Emulating_resource_usages)
|
||||||
|
- [Simplifying Assumptions](#Simplifying_Assumptions)
|
||||||
|
- [Appendix](#Appendix)
|
||||||
|
|
||||||
|
---
|
||||||
|
|
||||||
|
Overview
|
||||||
|
--------
|
||||||
|
|
||||||
|
GridMix is a benchmark for Hadoop clusters. It submits a mix of
|
||||||
|
synthetic jobs, modeling a profile mined from production loads.
|
||||||
|
|
||||||
|
There exist three versions of the GridMix tool. This document
|
||||||
|
discusses the third (checked into `src/contrib` ), distinct
|
||||||
|
from the two checked into the `src/benchmarks` sub-directory.
|
||||||
|
While the first two versions of the tool included stripped-down versions
|
||||||
|
of common jobs, both were principally saturation tools for stressing the
|
||||||
|
framework at scale. In support of a broader range of deployments and
|
||||||
|
finer-tuned job mixes, this version of the tool will attempt to model
|
||||||
|
the resource profiles of production jobs to identify bottlenecks, guide
|
||||||
|
development, and serve as a replacement for the existing GridMix
|
||||||
|
benchmarks.
|
||||||
|
|
||||||
|
To run GridMix, you need a MapReduce job trace describing the job mix
|
||||||
|
for a given cluster. Such traces are typically generated by Rumen (see
|
||||||
|
Rumen documentation). GridMix also requires input data from which the
|
||||||
|
synthetic jobs will be reading bytes. The input data need not be in any
|
||||||
|
particular format, as the synthetic jobs are currently binary readers.
|
||||||
|
If you are running on a new cluster, an optional step generating input
|
||||||
|
data may precede the run.
|
||||||
|
In order to emulate the load of production jobs from a given cluster
|
||||||
|
on the same or another cluster, follow these steps:
|
||||||
|
|
||||||
|
1. Locate the job history files on the production cluster. This
|
||||||
|
location is specified by the
|
||||||
|
`mapred.job.tracker.history.completed.location`
|
||||||
|
configuration property of the cluster.
|
||||||
|
|
||||||
|
2. Run Rumen to build a job trace in JSON format for all or select jobs.
|
||||||
|
|
||||||
|
3. Use GridMix with the job trace on the benchmark cluster.
|
||||||
|
|
||||||
|
Jobs submitted by GridMix have names of the form
|
||||||
|
"`GRIDMIXnnnnnn`", where
|
||||||
|
"`nnnnnn`" is a sequence number padded with leading zeroes.
|
||||||
|
|
||||||
|
|
||||||
|
Usage
|
||||||
|
-----
|
||||||
|
|
||||||
|
Basic command-line usage without configuration parameters:
|
||||||
|
|
||||||
|
org.apache.hadoop.mapred.gridmix.Gridmix [-generate <size>] [-users <users-list>] <iopath> <trace>
|
||||||
|
|
||||||
|
Basic command-line usage with configuration parameters:
|
||||||
|
|
||||||
|
org.apache.hadoop.mapred.gridmix.Gridmix \
|
||||||
|
-Dgridmix.client.submit.threads=10 -Dgridmix.output.directory=foo \
|
||||||
|
[-generate <size>] [-users <users-list>] <iopath> <trace>
|
||||||
|
|
||||||
|
> Configuration parameters like
|
||||||
|
> `-Dgridmix.client.submit.threads=10` and
|
||||||
|
> `-Dgridmix.output.directory=foo` as given above should
|
||||||
|
> be used *before* other GridMix parameters.
|
||||||
|
|
||||||
|
The `<iopath>` parameter is the working directory for
|
||||||
|
GridMix. Note that this can either be on the local file-system
|
||||||
|
or on HDFS, but it is highly recommended that it be the same as that for
|
||||||
|
the original job mix so that GridMix puts the same load on the local
|
||||||
|
file-system and HDFS respectively.
|
||||||
|
|
||||||
|
The `-generate` option is used to generate input data and
|
||||||
|
Distributed Cache files for the synthetic jobs. It accepts standard units
|
||||||
|
of size suffixes, e.g. `100g` will generate
|
||||||
|
100 * 2<sup>30</sup> bytes as input data.
|
||||||
|
`<iopath>/input` is the destination directory for
|
||||||
|
generated input data and/or the directory from which input data will be
|
||||||
|
read. HDFS-based Distributed Cache files are generated under the
|
||||||
|
distributed cache directory `<iopath>/distributedCache`.
|
||||||
|
If some of the needed Distributed Cache files already exist in the
distributed cache directory, then only the remaining missing
Distributed Cache files are generated when the `-generate` option
is specified.
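For example, a run that first generates 100 GiB of input data (plus any missing
Distributed Cache files) and then submits the trace might look like the sketch
below; the trace file name is an illustrative placeholder:

    hadoop jar <gridmix-jar> org.apache.hadoop.mapred.gridmix.Gridmix \
      -generate 100g <iopath> trace.json.gz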
|
||||||
|
|
||||||
|
The `-users` option is used to point to a users-list
|
||||||
|
file (see <a href="#usersqueues">Emulating Users and Queues</a>).
|
||||||
|
|
||||||
|
The `<trace>` parameter is a path to a job trace
|
||||||
|
generated by Rumen. This trace can be compressed (it must be readable
|
||||||
|
using one of the compression codecs supported by the cluster) or
|
||||||
|
uncompressed. Use "-" as the value of this parameter if you
|
||||||
|
want to pass an *uncompressed* trace via the standard
|
||||||
|
input-stream of GridMix.
|
||||||
|
|
||||||
|
The class `org.apache.hadoop.mapred.gridmix.Gridmix` can
|
||||||
|
be found in the JAR
|
||||||
|
`contrib/gridmix/hadoop-gridmix-$VERSION.jar` inside your
|
||||||
|
Hadoop installation, where `$VERSION` corresponds to the
|
||||||
|
version of Hadoop installed. A simple way of ensuring that this class
|
||||||
|
and all its dependencies are loaded correctly is to use the
|
||||||
|
`hadoop` wrapper script in Hadoop:
|
||||||
|
|
||||||
|
hadoop jar <gridmix-jar> org.apache.hadoop.mapred.gridmix.Gridmix \
|
||||||
|
[-generate <size>] [-users <users-list>] <iopath> <trace>
|
||||||
|
|
||||||
|
The supported configuration parameters are explained in the
|
||||||
|
following sections.
|
||||||
|
|
||||||
|
|
||||||
|
General Configuration Parameters
|
||||||
|
--------------------------------
|
||||||
|
|
||||||
|
<table>
|
||||||
|
<tr>
|
||||||
|
<th>Parameter</th>
|
||||||
|
<th>Description</th>
|
||||||
|
</tr>
|
||||||
|
<tr>
|
||||||
|
<td>
|
||||||
|
<code>gridmix.output.directory</code>
|
||||||
|
</td>
|
||||||
|
<td>The directory into which output will be written. If specified,
|
||||||
|
<code>iopath</code> will be relative to this parameter. The
|
||||||
|
submitting user must have read/write access to this directory. The
|
||||||
|
user should also be mindful of any quota issues that may arise
|
||||||
|
during a run. The default is "<code>gridmix</code>".</td>
|
||||||
|
</tr>
|
||||||
|
<tr>
|
||||||
|
<td>
|
||||||
|
<code>gridmix.client.submit.threads</code>
|
||||||
|
</td>
|
||||||
|
<td>The number of threads submitting jobs to the cluster. This
|
||||||
|
also controls how many splits will be loaded into memory at a given
|
||||||
|
time, pending the submit time in the trace. Splits are pre-generated
|
||||||
|
to hit submission deadlines, so particularly dense traces may want
|
||||||
|
more submitting threads. However, storing splits in memory is
|
||||||
|
reasonably expensive, so you should raise this cautiously. The
|
||||||
|
default is 1 for the SERIAL job-submission policy (see
|
||||||
|
<a href="#policies">Job Submission Policies</a>) and one more than
|
||||||
|
the number of processors on the client machine for the other
|
||||||
|
policies.</td>
|
||||||
|
</tr>
|
||||||
|
<tr>
|
||||||
|
<td>
|
||||||
|
<code>gridmix.submit.multiplier</code>
|
||||||
|
</td>
|
||||||
|
<td>The multiplier to accelerate or decelerate the submission of
|
||||||
|
jobs. The time separating two jobs is multiplied by this factor.
|
||||||
|
The default value is 1.0. This is a crude mechanism to size
|
||||||
|
a job trace to a cluster.</td>
|
||||||
|
</tr>
|
||||||
|
<tr>
|
||||||
|
<td>
|
||||||
|
<code>gridmix.client.pending.queue.depth</code>
|
||||||
|
</td>
|
||||||
|
<td>The depth of the queue of job descriptions awaiting split
|
||||||
|
generation. The jobs read from the trace occupy a queue of this
|
||||||
|
depth before being processed by the submission threads. It is
|
||||||
|
unusual to configure this. The default is 5.</td>
|
||||||
|
</tr>
|
||||||
|
<tr>
|
||||||
|
<td>
|
||||||
|
<code>gridmix.gen.blocksize</code>
|
||||||
|
</td>
|
||||||
|
<td>The block-size of generated data. The default value is 256
|
||||||
|
MiB.</td>
|
||||||
|
</tr>
|
||||||
|
<tr>
|
||||||
|
<td>
|
||||||
|
<code>gridmix.gen.bytes.per.file</code>
|
||||||
|
</td>
|
||||||
|
<td>The maximum bytes written per file. The default value is 1
|
||||||
|
GiB.</td>
|
||||||
|
</tr>
|
||||||
|
<tr>
|
||||||
|
<td>
|
||||||
|
<code>gridmix.min.file.size</code>
|
||||||
|
</td>
|
||||||
|
<td>The minimum size of the input files. The default limit is 128
|
||||||
|
MiB. Tweak this parameter if you see an error-message like
|
||||||
|
"Found no satisfactory file" while testing GridMix with
|
||||||
|
a relatively-small input data-set.</td>
|
||||||
|
</tr>
|
||||||
|
<tr>
|
||||||
|
<td>
|
||||||
|
<code>gridmix.max.total.scan</code>
|
||||||
|
</td>
|
||||||
|
<td>The maximum size of the input files. The default limit is 100
|
||||||
|
TiB.</td>
|
||||||
|
</tr>
|
||||||
|
<tr>
|
||||||
|
<td>
|
||||||
|
<code>gridmix.task.jvm-options.enable</code>
|
||||||
|
</td>
|
||||||
|
<td>Enables Gridmix to configure the simulated task's max heap
|
||||||
|
options using the values obtained from the original task (i.e. via the
|
||||||
|
trace).
|
||||||
|
</td>
|
||||||
|
</tr>
|
||||||
|
</table>
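As a hedged illustration, any of the parameters above can be overridden on the
command line with `-D`, placed before the GridMix-specific options; the values
below are arbitrary examples, not recommendations:

    hadoop jar <gridmix-jar> org.apache.hadoop.mapred.gridmix.Gridmix \
      -Dgridmix.client.submit.threads=4 \
      -Dgridmix.submit.multiplier=0.5 \
      [-generate <size>] [-users <users-list>] <iopath> <trace>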
|
||||||
|
|
||||||
|
|
||||||
|
Job Types
|
||||||
|
---------
|
||||||
|
|
||||||
|
GridMix takes as input a job trace, essentially a stream of
|
||||||
|
JSON-encoded job descriptions. For each job description, the submission
|
||||||
|
client obtains the original job submission time and for each task in
|
||||||
|
that job, the byte and record counts read and written. Given this data,
|
||||||
|
it constructs a synthetic job with the same byte and record patterns as
|
||||||
|
recorded in the trace. It constructs jobs of two types:
|
||||||
|
|
||||||
|
<table>
|
||||||
|
<tr>
|
||||||
|
<th>Job Type</th>
|
||||||
|
<th>Description</th>
|
||||||
|
</tr>
|
||||||
|
<tr>
|
||||||
|
<td>
|
||||||
|
<code>LOADJOB</code>
|
||||||
|
</td>
|
||||||
|
<td>A synthetic job that emulates the workload mentioned in Rumen
|
||||||
|
trace. In the current version we are supporting I/O. It reproduces
|
||||||
|
the I/O workload on the benchmark cluster. It does so by embedding
|
||||||
|
the detailed I/O information for every map and reduce task, such as
|
||||||
|
the number of bytes and records read and written, into each
|
||||||
|
job's input splits. The map tasks further relay the I/O patterns of
|
||||||
|
reduce tasks through the intermediate map output data.</td>
|
||||||
|
</tr>
|
||||||
|
<tr>
|
||||||
|
<td>
|
||||||
|
<code>SLEEPJOB</code>
|
||||||
|
</td>
|
||||||
|
<td>A synthetic job where each task does *nothing* but sleep
|
||||||
|
for a certain duration as observed in the production trace. The
|
||||||
|
scalability of the Job Tracker is often limited by how many
|
||||||
|
heartbeats it can handle every second. (Heartbeats are periodic
|
||||||
|
messages sent from Task Trackers to update their status and grab new
|
||||||
|
tasks from the Job Tracker.) Since a benchmark cluster is typically
|
||||||
|
a fraction in size of a production cluster, the heartbeat traffic
|
||||||
|
generated by the slave nodes is well below the level of the
|
||||||
|
production cluster. One possible solution is to run multiple Task
|
||||||
|
Trackers on each slave node. This leads to the obvious problem that
|
||||||
|
the I/O workload generated by the synthetic jobs would thrash the
|
||||||
|
slave nodes. Hence the need for such a job.</td>
|
||||||
|
</tr>
|
||||||
|
</table>
|
||||||
|
|
||||||
|
The following configuration parameters affect the job type:
|
||||||
|
|
||||||
|
<table>
|
||||||
|
<tr>
|
||||||
|
<th>Parameter</th>
|
||||||
|
<th>Description</th>
|
||||||
|
</tr>
|
||||||
|
<tr>
|
||||||
|
<td>
|
||||||
|
<code>gridmix.job.type</code>
|
||||||
|
</td>
|
||||||
|
<td>The value for this key can be one of LOADJOB or SLEEPJOB. The
|
||||||
|
default value is LOADJOB.</td>
|
||||||
|
</tr>
|
||||||
|
<tr>
|
||||||
|
<td>
|
||||||
|
<code>gridmix.key.fraction</code>
|
||||||
|
</td>
|
||||||
|
<td>For a LOADJOB type of job, the fraction of a record used for
|
||||||
|
the data for the key. The default value is 0.1.</td>
|
||||||
|
</tr>
|
||||||
|
<tr>
|
||||||
|
<td>
|
||||||
|
<code>gridmix.sleep.maptask-only</code>
|
||||||
|
</td>
|
||||||
|
<td>For a SLEEPJOB type of job, whether to ignore the reduce
|
||||||
|
tasks for the job. The default is <code>false</code>.</td>
|
||||||
|
</tr>
|
||||||
|
<tr>
|
||||||
|
<td>
|
||||||
|
<code>gridmix.sleep.fake-locations</code>
|
||||||
|
</td>
|
||||||
|
<td>For a SLEEPJOB type of job, the number of fake locations
|
||||||
|
for map tasks for the job. The default is 0.</td>
|
||||||
|
</tr>
|
||||||
|
<tr>
|
||||||
|
<td>
|
||||||
|
<code>gridmix.sleep.max-map-time</code>
|
||||||
|
</td>
|
||||||
|
<td>For a SLEEPJOB type of job, the maximum runtime for map
|
||||||
|
tasks for the job in milliseconds. The default is unlimited.</td>
|
||||||
|
</tr>
|
||||||
|
<tr>
|
||||||
|
<td>
|
||||||
|
<code>gridmix.sleep.max-reduce-time</code>
|
||||||
|
</td>
|
||||||
|
<td>For a SLEEPJOB type of job, the maximum runtime for reduce
|
||||||
|
tasks for the job in milliseconds. The default is unlimited.</td>
|
||||||
|
</tr>
|
||||||
|
</table>
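For instance, a sleep-job run that ignores reduce tasks and caps map runtimes
could be configured as in the sketch below (the 60000 ms cap is an arbitrary
example value):

    hadoop jar <gridmix-jar> org.apache.hadoop.mapred.gridmix.Gridmix \
      -Dgridmix.job.type=SLEEPJOB \
      -Dgridmix.sleep.maptask-only=true \
      -Dgridmix.sleep.max-map-time=60000 \
      <iopath> <trace>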
|
||||||
|
|
||||||
|
|
||||||
|
<a name="policies"></a>
|
||||||
|
|
||||||
|
Job Submission Policies
|
||||||
|
-----------------------
|
||||||
|
|
||||||
|
GridMix controls the rate of job submission. This control can be
|
||||||
|
based on the trace information or can be based on statistics it gathers
|
||||||
|
from the Job Tracker. Based on the submission policies users define,
|
||||||
|
GridMix uses the respective algorithm to control the job submission.
|
||||||
|
There are currently three types of policies:
|
||||||
|
|
||||||
|
<table>
|
||||||
|
<tr>
|
||||||
|
<th>Job Submission Policy</th>
|
||||||
|
<th>Description</th>
|
||||||
|
</tr>
|
||||||
|
<tr>
|
||||||
|
<td>
|
||||||
|
<code>STRESS</code>
|
||||||
|
</td>
|
||||||
|
<td>Keep submitting jobs so that the cluster remains under stress.
|
||||||
|
In this mode we control the rate of job submission by monitoring
|
||||||
|
the real-time load of the cluster so that we can maintain a stable
|
||||||
|
stress level of workload on the cluster. Based on the statistics we
|
||||||
|
gather we define if a cluster is *underloaded* or
|
||||||
|
*overloaded* . We consider a cluster *underloaded* if
|
||||||
|
and only if the following three conditions are true:
|
||||||
|
<ol>
|
||||||
|
<li>the number of pending and running jobs are under a threshold
|
||||||
|
TJ</li>
|
||||||
|
<li>the number of pending and running maps are under threshold
|
||||||
|
TM</li>
|
||||||
|
<li>the number of pending and running reduces are under threshold
|
||||||
|
TR</li>
|
||||||
|
</ol>
|
||||||
|
The thresholds TJ, TM and TR are proportional to the size of the
|
||||||
|
cluster and map, reduce slots capacities respectively. In case of a
|
||||||
|
cluster being *overloaded* , we throttle the job submission.
|
||||||
|
In the actual calculation we also weigh each running task with its
|
||||||
|
remaining work - namely, a 90% complete task is only counted as 0.1
|
||||||
|
in calculation. Finally, to avoid a very large job blocking other
|
||||||
|
jobs, we limit the number of pending/waiting tasks each job can
|
||||||
|
contribute.</td>
|
||||||
|
</tr>
|
||||||
|
<tr>
|
||||||
|
<td>
|
||||||
|
<code>REPLAY</code>
|
||||||
|
</td>
|
||||||
|
<td>In this mode we replay the job traces faithfully. This mode
|
||||||
|
exactly follows the time-intervals given in the actual job
|
||||||
|
trace.</td>
|
||||||
|
</tr>
|
||||||
|
<tr>
|
||||||
|
<td>
|
||||||
|
<code>SERIAL</code>
|
||||||
|
</td>
|
||||||
|
<td>In this mode we submit the next job only once the job submitted
|
||||||
|
earlier is completed.</td>
|
||||||
|
</tr>
|
||||||
|
</table>
|
||||||
|
|
||||||
|
The following configuration parameters affect the job submission policy:
|
||||||
|
|
||||||
|
<table>
|
||||||
|
<tr>
|
||||||
|
<th>Parameter</th>
|
||||||
|
<th>Description</th>
|
||||||
|
</tr>
|
||||||
|
<tr>
|
||||||
|
<td>
|
||||||
|
<code>gridmix.job-submission.policy</code>
|
||||||
|
</td>
|
||||||
|
<td>The value for this key would be one of the three: STRESS, REPLAY
|
||||||
|
or SERIAL. In most of the cases the value of key would be STRESS or
|
||||||
|
REPLAY. The default value is STRESS.</td>
|
||||||
|
</tr>
|
||||||
|
<tr>
|
||||||
|
<td>
|
||||||
|
<code>gridmix.throttle.jobs-to-tracker-ratio</code>
|
||||||
|
</td>
|
||||||
|
<td>In STRESS mode, the minimum ratio of running jobs to Task
|
||||||
|
Trackers in a cluster for the cluster to be considered
|
||||||
|
*overloaded* . This is the threshold TJ referred to earlier.
|
||||||
|
The default is 1.0.</td>
|
||||||
|
</tr>
|
||||||
|
<tr>
|
||||||
|
<td>
|
||||||
|
<code>gridmix.throttle.maps.task-to-slot-ratio</code>
|
||||||
|
</td>
|
||||||
|
<td>In STRESS mode, the minimum ratio of pending and running map
|
||||||
|
tasks (i.e. incomplete map tasks) to the number of map slots for
|
||||||
|
a cluster for the cluster to be considered *overloaded* .
|
||||||
|
This is the threshold TM referred to earlier. Running map tasks are
|
||||||
|
counted partially. For example, a 40% complete map task is counted
|
||||||
|
as 0.6 map tasks. The default is 2.0.</td>
|
||||||
|
</tr>
|
||||||
|
<tr>
|
||||||
|
<td>
|
||||||
|
<code>gridmix.throttle.reduces.task-to-slot-ratio</code>
|
||||||
|
</td>
|
||||||
|
<td>In STRESS mode, the minimum ratio of pending and running reduce
|
||||||
|
tasks (i.e. incomplete reduce tasks) to the number of reduce slots
|
||||||
|
for a cluster for the cluster to be considered *overloaded* .
|
||||||
|
This is the threshold TR referred to earlier. Running reduce tasks
|
||||||
|
are counted partially. For example, a 30% complete reduce task is
|
||||||
|
counted as 0.7 reduce tasks. The default is 2.5.</td>
|
||||||
|
</tr>
|
||||||
|
<tr>
|
||||||
|
<td>
|
||||||
|
<code>gridmix.throttle.maps.max-slot-share-per-job</code>
|
||||||
|
</td>
|
||||||
|
<td>In STRESS mode, the maximum share of a cluster's map-slots
|
||||||
|
capacity that can be counted toward a job's incomplete map tasks in
|
||||||
|
overload calculation. The default is 0.1.</td>
|
||||||
|
</tr>
|
||||||
|
<tr>
|
||||||
|
<td>
|
||||||
|
<code>gridmix.throttle.reducess.max-slot-share-per-job</code>
|
||||||
|
</td>
|
||||||
|
<td>In STRESS mode, the maximum share of a cluster's reduce-slots
|
||||||
|
capacity that can be counted toward a job's incomplete reduce tasks
|
||||||
|
in overload calculation. The default is 0.1.</td>
|
||||||
|
</tr>
|
||||||
|
</table>
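As an example, the sketch below replays the trace while halving the original
inter-job gaps (both property values are illustrative):

    hadoop jar <gridmix-jar> org.apache.hadoop.mapred.gridmix.Gridmix \
      -Dgridmix.job-submission.policy=REPLAY \
      -Dgridmix.submit.multiplier=0.5 \
      <iopath> <trace>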
|
||||||
|
|
||||||
|
|
||||||
|
<a name="usersqueues"></a>
|
||||||
|
|
||||||
|
Emulating Users and Queues
|
||||||
|
--------------------------
|
||||||
|
|
||||||
|
Typical production clusters are often shared with different users and
|
||||||
|
the cluster capacity is divided among different departments through job
|
||||||
|
queues. Ensuring fairness among jobs from all users, honoring queue
|
||||||
|
capacity allocation policies and avoiding an ill-behaving job from
|
||||||
|
taking over the cluster adds significant complexity in Hadoop software.
|
||||||
|
To be able to sufficiently test and discover bugs in these areas,
|
||||||
|
GridMix must emulate the contentions of jobs from different users and/or
|
||||||
|
submitted to different queues.
|
||||||
|
|
||||||
|
Emulating multiple queues is easy - we simply set up the benchmark
|
||||||
|
cluster with the same queue configuration as the production cluster and
|
||||||
|
we configure synthetic jobs so that they get submitted to the same queue
|
||||||
|
as recorded in the trace. However, not all users shown in the trace have
|
||||||
|
accounts on the benchmark cluster. Instead, we set up a number of testing
|
||||||
|
user accounts and associate each unique user in the trace to testing
|
||||||
|
users in a round-robin fashion.
|
||||||
|
|
||||||
|
The following configuration parameters affect the emulation of users
|
||||||
|
and queues:
|
||||||
|
|
||||||
|
<table>
|
||||||
|
<tr>
|
||||||
|
<th>Parameter</th>
|
||||||
|
<th>Description</th>
|
||||||
|
</tr>
|
||||||
|
<tr>
|
||||||
|
<td>
|
||||||
|
<code>gridmix.job-submission.use-queue-in-trace</code>
|
||||||
|
</td>
|
||||||
|
<td>When set to <code>true</code> it uses exactly the same set of
|
||||||
|
queues as those mentioned in the trace. The default value is
|
||||||
|
<code>false</code>.</td>
|
||||||
|
</tr>
|
||||||
|
<tr>
|
||||||
|
<td>
|
||||||
|
<code>gridmix.job-submission.default-queue</code>
|
||||||
|
</td>
|
||||||
|
<td>Specifies the default queue to which all the jobs would be
|
||||||
|
submitted. If this parameter is not specified, GridMix uses the
|
||||||
|
default queue defined for the submitting user on the cluster.</td>
|
||||||
|
</tr>
|
||||||
|
<tr>
|
||||||
|
<td>
|
||||||
|
<code>gridmix.user.resolve.class</code>
|
||||||
|
</td>
|
||||||
|
<td>Specifies which <code>UserResolver</code> implementation to use.
|
||||||
|
We currently have three implementations:
|
||||||
|
<ol>
|
||||||
|
<li><code>org.apache.hadoop.mapred.gridmix.EchoUserResolver</code>
|
||||||
|
- submits a job as the user who submitted the original job. All
|
||||||
|
the users of the production cluster identified in the job trace
|
||||||
|
must also have accounts on the benchmark cluster in this case.</li>
|
||||||
|
<li><code>org.apache.hadoop.mapred.gridmix.SubmitterUserResolver</code>
|
||||||
|
- submits all the jobs as current GridMix user. In this case we
|
||||||
|
simply map all the users in the trace to the current GridMix user
|
||||||
|
and submit the job.</li>
|
||||||
|
<li><code>org.apache.hadoop.mapred.gridmix.RoundRobinUserResolver</code>
|
||||||
|
- maps trace users to test users in a round-robin fashion. In
|
||||||
|
this case we set up a number of testing user accounts and
|
||||||
|
associate each unique user in the trace to testing users in a
|
||||||
|
round-robin fashion.</li>
|
||||||
|
</ol>
|
||||||
|
The default is
|
||||||
|
<code>org.apache.hadoop.mapred.gridmix.SubmitterUserResolver</code>.</td>
|
||||||
|
</tr>
|
||||||
|
</table>
|
||||||
|
|
||||||
|
If the parameter `gridmix.user.resolve.class` is set to
|
||||||
|
`org.apache.hadoop.mapred.gridmix.RoundRobinUserResolver`,
|
||||||
|
we need to define a users-list file with a list of test users.
|
||||||
|
This is specified using the `-users` option to GridMix.
|
||||||
|
|
||||||
|
> Specifying a users-list file using the `-users` option is
> mandatory when using the round-robin user-resolver. Other user-resolvers
> ignore this option.
|
||||||
|
|
||||||
|
A users-list file has one user per line, each line of the format:
|
||||||
|
|
||||||
|
<username>
|
||||||
|
|
||||||
|
For example:
|
||||||
|
|
||||||
|
user1
|
||||||
|
user2
|
||||||
|
user3
|
||||||
|
|
||||||
|
In the above example we have defined three users `user1`, `user2` and `user3`.
|
||||||
|
Now we would associate each unique user in the trace to the above users
|
||||||
|
defined in round-robin fashion. For example, if trace's users are
|
||||||
|
`tuser1`, `tuser2`, `tuser3`, `tuser4` and `tuser5`, then the mappings would be:
|
||||||
|
|
||||||
|
tuser1 -> user1
|
||||||
|
tuser2 -> user2
|
||||||
|
tuser3 -> user3
|
||||||
|
tuser4 -> user1
|
||||||
|
tuser5 -> user2
|
||||||
|
|
||||||
|
For backward compatibility reasons, each line of users-list file can
|
||||||
|
contain username followed by groupnames in the form username[,group]*.
|
||||||
|
The groupnames will be ignored by Gridmix.
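Putting this together, a run that maps trace users onto test accounts in
round-robin fashion and preserves the traced queues might look like the
following sketch (the users-list path is a placeholder):

    hadoop jar <gridmix-jar> org.apache.hadoop.mapred.gridmix.Gridmix \
      -Dgridmix.user.resolve.class=org.apache.hadoop.mapred.gridmix.RoundRobinUserResolver \
      -Dgridmix.job-submission.use-queue-in-trace=true \
      -users users.txt <iopath> <trace>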
|
||||||
|
|
||||||
|
|
||||||
|
Emulating Distributed Cache Load
|
||||||
|
--------------------------------
|
||||||
|
|
||||||
|
Gridmix emulates Distributed Cache load by default for LOADJOB type of
|
||||||
|
jobs. This is done by precreating the needed Distributed Cache files for all
|
||||||
|
the simulated jobs as part of a separate MapReduce job.
|
||||||
|
|
||||||
|
Emulation of Distributed Cache load in gridmix simulated jobs can be
|
||||||
|
disabled by configuring the property
|
||||||
|
`gridmix.distributed-cache-emulation.enable` to
|
||||||
|
`false`.
|
||||||
|
Generation of Distributed Cache data by GridMix is, however, driven by the
`-generate` option and is independent of this configuration
property.
|
||||||
|
|
||||||
|
Both generation of Distributed Cache files and emulation of
|
||||||
|
Distributed Cache load are disabled if:
|
||||||
|
|
||||||
|
* input trace comes from the standard input-stream instead of file, or
|
||||||
|
* `<iopath>` specified is on local file-system, or
|
||||||
|
* any of the ancestor directories of the distributed cache directory,
  i.e. `<iopath>/distributedCache` (including the distributed
  cache directory itself), does not have execute permission for others.
|
||||||
|
|
||||||
|
|
||||||
|
Configuration of Simulated Jobs
|
||||||
|
-------------------------------
|
||||||
|
|
||||||
|
Gridmix3 sets some configuration properties in the simulated Jobs
|
||||||
|
submitted by it so that they can be mapped back to the corresponding Job
|
||||||
|
in the input Job trace. These configuration parameters include:
|
||||||
|
|
||||||
|
<table>
|
||||||
|
<tr>
|
||||||
|
<th>Parameter</th>
|
||||||
|
<th>Description</th>
|
||||||
|
</tr>
|
||||||
|
<tr>
|
||||||
|
<td>
|
||||||
|
<code>gridmix.job.original-job-id</code>
|
||||||
|
</td>
|
||||||
|
<td> The job id of the original cluster's job corresponding to this
|
||||||
|
simulated job.
|
||||||
|
</td>
|
||||||
|
</tr>
|
||||||
|
<tr>
|
||||||
|
<td>
|
||||||
|
<code>gridmix.job.original-job-name</code>
|
||||||
|
</td>
|
||||||
|
<td> The job name of the original cluster's job corresponding to this
|
||||||
|
simulated job.
|
||||||
|
</td>
|
||||||
|
</tr>
|
||||||
|
</table>
|
||||||
|
|
||||||
|
|
||||||
|
Emulating Compression/Decompression
|
||||||
|
-----------------------------------
|
||||||
|
|
||||||
|
MapReduce supports data compression and decompression.
|
||||||
|
Input to a MapReduce job can be compressed. Similarly, output of Map
|
||||||
|
and Reduce tasks can also be compressed. Compression/Decompression
|
||||||
|
emulation in GridMix is important because emulating
|
||||||
|
compression/decompression will affect the CPU and memory usage of the
|
||||||
|
task. A task emulating compression/decompression will affect other
|
||||||
|
tasks and daemons running on the same node.
|
||||||
|
|
||||||
|
Compression emulation is enabled if
|
||||||
|
`gridmix.compression-emulation.enable` is set to
|
||||||
|
`true`. By default compression emulation is enabled for
|
||||||
|
jobs of type *LOADJOB* . With compression emulation enabled,
|
||||||
|
GridMix will now generate compressed text data with a constant
|
||||||
|
compression ratio. Hence a simulated GridMix job will now emulate
|
||||||
|
compression/decompression using compressible text data (having a
|
||||||
|
constant compression ratio), irrespective of the compression ratio
|
||||||
|
observed in the actual job.
|
||||||
|
|
||||||
|
A typical MapReduce Job deals with data compression/decompression in
|
||||||
|
the following phases
|
||||||
|
|
||||||
|
* `Job input data decompression: ` GridMix generates
|
||||||
|
compressible input data when compression emulation is enabled.
|
||||||
|
Based on the original job's configuration, a simulated GridMix job
|
||||||
|
will use a decompressor to read the compressed input data.
|
||||||
|
Currently, GridMix uses
|
||||||
|
`mapreduce.input.fileinputformat.inputdir` to determine
|
||||||
|
if the original job used compressed input data or
|
||||||
|
not. If the original job's input files are uncompressed then the
|
||||||
|
simulated job will read the compressed input file without using a
|
||||||
|
decompressor.
|
||||||
|
|
||||||
|
* `Intermediate data compression and decompression: `
|
||||||
|
If the original job has map output compression enabled then GridMix
|
||||||
|
too will enable map output compression for the simulated job.
|
||||||
|
Accordingly, the reducers will use a decompressor to read the map
|
||||||
|
output data.
|
||||||
|
|
||||||
|
* `Job output data compression: `
|
||||||
|
If the original job's output is compressed then GridMix
|
||||||
|
too will enable job output compression for the simulated job.
|
||||||
|
|
||||||
|
The following configuration parameters affect compression emulation
|
||||||
|
|
||||||
|
<table>
|
||||||
|
<tr>
|
||||||
|
<th>Parameter</th>
|
||||||
|
<th>Description</th>
|
||||||
|
</tr>
|
||||||
|
<tr>
|
||||||
|
<td>gridmix.compression-emulation.enable</td>
|
||||||
|
<td>Enables compression emulation in simulated GridMix jobs.
|
||||||
|
Default is true.</td>
|
||||||
|
</tr>
|
||||||
|
</table>
|
||||||
|
|
||||||
|
With compression emulation turned on, GridMix will generate compressed
|
||||||
|
input data. Hence the total size of the input
|
||||||
|
data will be less than the expected size. Set
|
||||||
|
`gridmix.min.file.size` to a smaller value (roughly 10% of
|
||||||
|
`gridmix.gen.bytes.per.file`) for enabling GridMix to
|
||||||
|
correctly emulate compression.
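For example, assuming `gridmix.min.file.size` is expressed in bytes (an
assumption here), lowering it to roughly 100 MiB for data generated with the
default 1 GiB `gridmix.gen.bytes.per.file` could look like:

    hadoop jar <gridmix-jar> org.apache.hadoop.mapred.gridmix.Gridmix \
      -Dgridmix.min.file.size=104857600 \
      -generate 100g <iopath> <trace>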
|
||||||
|
|
||||||
|
|
||||||
|
Emulating High-Ram jobs
|
||||||
|
-----------------------
|
||||||
|
|
||||||
|
MapReduce allows users to define a job as a High-Ram job. Tasks from a
|
||||||
|
High-Ram job can occupy multiple slots on the task-trackers.
|
||||||
|
Task-tracker assigns fixed virtual memory for each slot. Tasks from
|
||||||
|
High-Ram jobs can occupy multiple slots and thus can use up more
|
||||||
|
virtual memory as compared to a default task.
|
||||||
|
|
||||||
|
Emulating this behavior is important because of the following reasons
|
||||||
|
|
||||||
|
* Impact on scheduler: Scheduling of tasks from High-Ram jobs
|
||||||
|
impacts the scheduling behavior, as it might result in slot
|
||||||
|
reservation and slot/resource utilization.
|
||||||
|
|
||||||
|
* Impact on the node : Since High-Ram tasks occupy multiple slots,
|
||||||
|
trackers do some bookkeeping for allocating extra resources for
|
||||||
|
these tasks. Thus this becomes a precursor for memory emulation
|
||||||
|
where tasks with high memory requirements need to be considered
as High-Ram tasks.
|
||||||
|
|
||||||
|
High-Ram feature emulation can be disabled by setting
|
||||||
|
`gridmix.highram-emulation.enable` to `false`.
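A minimal sketch disabling High-Ram emulation for a run:

    hadoop jar <gridmix-jar> org.apache.hadoop.mapred.gridmix.Gridmix \
      -Dgridmix.highram-emulation.enable=false \
      <iopath> <trace>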
|
||||||
|
|
||||||
|
|
||||||
|
Emulating resource usages
|
||||||
|
-------------------------
|
||||||
|
|
||||||
|
Usages of resources like CPU, physical memory, virtual memory, JVM heap
|
||||||
|
etc. are recorded by MapReduce using its task counters. This information
|
||||||
|
is used by GridMix to emulate the resource usages in the simulated
|
||||||
|
tasks. Emulating resource usages will help GridMix exert similar load
|
||||||
|
on the test cluster as seen in the actual cluster.
|
||||||
|
|
||||||
|
MapReduce tasks use up resources during their entire lifetime. GridMix
|
||||||
|
also tries to mimic this behavior by spanning resource usage emulation
|
||||||
|
across the entire lifetime of the simulated task. Each resource to be
|
||||||
|
emulated should have an *emulator* associated with it.
|
||||||
|
Each such *emulator* should implement the
|
||||||
|
`org.apache.hadoop.mapred.gridmix.emulators.resourceusage
|
||||||
|
.ResourceUsageEmulatorPlugin` interface. Resource
|
||||||
|
*emulators* in GridMix are *plugins* that can be
|
||||||
|
configured (plugged in or out) before every run. GridMix users can
|
||||||
|
configure multiple emulator *plugins* by passing a comma
|
||||||
|
separated list of *emulators* as a value for the
|
||||||
|
`gridmix.emulators.resource-usage.plugins` parameter.
|
||||||
|
|
||||||
|
List of *emulators* shipped with GridMix:
|
||||||
|
|
||||||
|
* Cumulative CPU usage *emulator* :
|
||||||
|
GridMix uses the cumulative CPU usage value published by Rumen
|
||||||
|
and makes sure that the total cumulative CPU usage of the simulated
|
||||||
|
task is close to the value published by Rumen. GridMix can be
|
||||||
|
configured to emulate cumulative CPU usage by adding
|
||||||
|
`org.apache.hadoop.mapred.gridmix.emulators.resourceusage
|
||||||
|
.CumulativeCpuUsageEmulatorPlugin` to the list of emulator
|
||||||
|
*plugins* configured for the
|
||||||
|
`gridmix.emulators.resource-usage.plugins` parameter.
|
||||||
|
CPU usage emulator is designed in such a way that
|
||||||
|
it only emulates at specific progress boundaries of the task. This
|
||||||
|
interval can be configured using
|
||||||
|
`gridmix.emulators.resource-usage.cpu.emulation-interval`.
|
||||||
|
The default value for this parameter is `0.1` i.e
|
||||||
|
`10%`.
|
||||||
|
|
||||||
|
* Total heap usage *emulator* :
|
||||||
|
GridMix uses the total heap usage value published by Rumen
|
||||||
|
and makes sure that the total heap usage of the simulated
|
||||||
|
task is close to the value published by Rumen. GridMix can be
|
||||||
|
configured to emulate total heap usage by adding
|
||||||
|
`org.apache.hadoop.mapred.gridmix.emulators.resourceusage
|
||||||
|
.TotalHeapUsageEmulatorPlugin` to the list of emulator
|
||||||
|
*plugins* configured for the
|
||||||
|
`gridmix.emulators.resource-usage.plugins` parameter.
|
||||||
|
Heap usage emulator is designed in such a way that
|
||||||
|
it only emulates at specific progress boundaries of the task. This
|
||||||
|
interval can be configured using
|
||||||
|
`gridmix.emulators.resource-usage.heap.emulation-interval
|
||||||
|
`. The default value for this parameter is `0.1`
|
||||||
|
i.e `10%` progress interval.
|
||||||
|
|
||||||
|
Note that GridMix will emulate resource usages only for jobs of type *LOADJOB* .
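As a sketch, both shipped emulators can be enabled together; the property takes
a single comma-separated list of plugin class names, kept on one line here:

    hadoop jar <gridmix-jar> org.apache.hadoop.mapred.gridmix.Gridmix \
      -Dgridmix.emulators.resource-usage.plugins=org.apache.hadoop.mapred.gridmix.emulators.resourceusage.CumulativeCpuUsageEmulatorPlugin,org.apache.hadoop.mapred.gridmix.emulators.resourceusage.TotalHeapUsageEmulatorPlugin \
      <iopath> <trace>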
|
||||||
|
|
||||||
|
|
||||||
|
Simplifying Assumptions
|
||||||
|
-----------------------
|
||||||
|
|
||||||
|
GridMix will be developed in stages, incorporating feedback and
|
||||||
|
patches from the community. Currently its intent is to evaluate
|
||||||
|
MapReduce and HDFS performance and not the layers on top of them (i.e.
|
||||||
|
the extensive lib and sub-project space). Given these two limitations,
|
||||||
|
the following characteristics of job load are not currently captured in
|
||||||
|
job traces and cannot be accurately reproduced in GridMix:
|
||||||
|
|
||||||
|
* *Filesystem Properties* - No attempt is made to match block
|
||||||
|
sizes, namespace hierarchies, or any property of input, intermediate
|
||||||
|
or output data other than the bytes/records consumed and emitted from
|
||||||
|
a given task. This implies that some of the most heavily-used parts of
|
||||||
|
the system - text processing, streaming, etc. - cannot be meaningfully tested
|
||||||
|
with the current implementation.
|
||||||
|
|
||||||
|
* *I/O Rates* - The rate at which records are
|
||||||
|
consumed/emitted is assumed to be limited only by the speed of the
|
||||||
|
reader/writer and constant throughout the task.
|
||||||
|
|
||||||
|
* *Memory Profile* - No data on tasks' memory usage over time
|
||||||
|
is available, though the max heap-size is retained.
|
||||||
|
|
||||||
|
* *Skew* - The records consumed and emitted to/from a given
|
||||||
|
task are assumed to follow observed averages, i.e. records will be
|
||||||
|
more regular than may be seen in the wild. Each map also generates
|
||||||
|
a proportional percentage of data for each reduce, so a job with
|
||||||
|
unbalanced input will be flattened.
|
||||||
|
|
||||||
|
* *Job Failure* - User code is assumed to be correct.
|
||||||
|
|
||||||
|
* *Job Independence* - The output or outcome of one job does
|
||||||
|
not affect when or whether a subsequent job will run.
|
||||||
|
|
||||||
|
|
||||||
|
Appendix
|
||||||
|
--------
|
||||||
|
|
||||||
|
Issues tracking the original implementations of
|
||||||
|
<a href="https://issues.apache.org/jira/browse/HADOOP-2369">GridMix1</a>,
|
||||||
|
<a href="https://issues.apache.org/jira/browse/HADOOP-3770">GridMix2</a>,
|
||||||
|
and <a href="https://issues.apache.org/jira/browse/MAPREDUCE-776">GridMix3</a>
|
||||||
|
can be found on the Apache Hadoop MapReduce JIRA. Other issues tracking
|
||||||
|
the current development of GridMix can be found by searching
|
||||||
|
<a href="https://issues.apache.org/jira/browse/MAPREDUCE/component/12313086">
|
||||||
|
the Apache Hadoop MapReduce JIRA</a>
|
|
@@ -0,0 +1,397 @@
|
||||||
|
<!--
|
||||||
|
Licensed to the Apache Software Foundation (ASF) under one or more
|
||||||
|
contributor license agreements. See the NOTICE file distributed with
|
||||||
|
this work for additional information regarding copyright ownership.
|
||||||
|
The ASF licenses this file to You under the Apache License, Version 2.0
|
||||||
|
(the "License"); you may not use this file except in compliance with
|
||||||
|
the License. You may obtain a copy of the License at
|
||||||
|
|
||||||
|
http://www.apache.org/licenses/LICENSE-2.0
|
||||||
|
|
||||||
|
Unless required by applicable law or agreed to in writing, software
|
||||||
|
distributed under the License is distributed on an "AS IS" BASIS,
|
||||||
|
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
|
||||||
|
See the License for the specific language governing permissions and
|
||||||
|
limitations under the License.
|
||||||
|
-->
|
||||||
|
|
||||||
|
#set ( $H3 = '###' )
|
||||||
|
#set ( $H4 = '####' )
|
||||||
|
#set ( $H5 = '#####' )
|
||||||
|
|
||||||
|
Rumen
|
||||||
|
=====
|
||||||
|
|
||||||
|
---
|
||||||
|
|
||||||
|
- [Overview](#Overview)
|
||||||
|
- [Motivation](#Motivation)
|
||||||
|
- [Components](#Components)
|
||||||
|
- [How to use Rumen?](#How_to_use_Rumen)
|
||||||
|
- [Trace Builder](#Trace_Builder)
|
||||||
|
- [Example](#Example)
|
||||||
|
- [Folder](#Folder)
|
||||||
|
- [Examples](#Examples)
|
||||||
|
- [Appendix](#Appendix)
|
||||||
|
- [Resources](#Resources)
|
||||||
|
- [Dependencies](#Dependencies)
|
||||||
|
|
||||||
|
---
|
||||||
|
|
||||||
|
Overview
|
||||||
|
--------
|
||||||
|
|
||||||
|
*Rumen* is a data extraction and analysis tool built for
|
||||||
|
*Apache Hadoop*. *Rumen* mines *JobHistory* logs to
|
||||||
|
extract meaningful data and stores it in an easily-parsed, condensed
|
||||||
|
format or *digest*. The raw trace data from MapReduce logs are
|
||||||
|
often insufficient for simulation, emulation, and benchmarking, as these
|
||||||
|
tools often attempt to measure conditions that did not occur in the
|
||||||
|
source data. For example, if a task ran locally in the raw trace data
|
||||||
|
but a simulation of the scheduler elects to run that task on a remote
|
||||||
|
rack, the simulator requires a runtime its input cannot provide.
|
||||||
|
To fill in these gaps, Rumen performs a statistical analysis of the
|
||||||
|
digest to estimate the variables the trace doesn't supply. Rumen traces
|
||||||
|
drive both Gridmix (a benchmark of Hadoop MapReduce clusters) and Mumak
|
||||||
|
(a simulator for the JobTracker).
|
||||||
|
|
||||||
|
|
||||||
|
$H3 Motivation
|
||||||
|
|
||||||
|
* Extracting meaningful data from *JobHistory* logs is a common
|
||||||
|
task for any tool built to work on *MapReduce*. It
|
||||||
|
is tedious to write a custom tool which is so tightly coupled with
|
||||||
|
the *MapReduce* framework. Hence there is a need for a
|
||||||
|
built-in tool for performing framework level task of log parsing and
|
||||||
|
analysis. Such a tool would insulate external systems depending on
|
||||||
|
job history against the changes made to the job history format.
|
||||||
|
|
||||||
|
* Performing statistical analysis of various attributes of a
|
||||||
|
*MapReduce Job* such as *task runtimes, task failures
|
||||||
|
etc* is another common task that the benchmarking
|
||||||
|
and simulation tools might need. *Rumen* generates
|
||||||
|
<a href="http://en.wikipedia.org/wiki/Cumulative_distribution_function">
|
||||||
|
*Cumulative Distribution Functions (CDF)*
|
||||||
|
</a> for the Map/Reduce task runtimes.
|
||||||
|
Runtime CDF can be used for extrapolating the task runtime of
|
||||||
|
incomplete, missing and synthetic tasks. Similarly CDF is also
|
||||||
|
computed for the total number of successful tasks for every attempt.
|
||||||
|
|
||||||
|
|
||||||
|
$H3 Components
|
||||||
|
|
||||||
|
*Rumen* consists of 2 components
|
||||||
|
|
||||||
|
* *Trace Builder* :
|
||||||
|
Converts *JobHistory* logs into an easily-parsed format.
|
||||||
|
Currently `TraceBuilder` outputs the trace in
|
||||||
|
<a href="http://www.json.org/">*JSON*</a>
|
||||||
|
format.
|
||||||
|
|
||||||
|
* *Folder* :
|
||||||
|
A utility to scale the input trace. A trace obtained from
|
||||||
|
*TraceBuilder* simply summarizes the jobs in the
|
||||||
|
input folders and files. The time-span within which all the jobs in
|
||||||
|
a given trace finish can be considered as the trace runtime.
|
||||||
|
*Folder* can be used to scale the runtime of a trace.
|
||||||
|
Decreasing the trace runtime might involve dropping some jobs from
|
||||||
|
the input trace and scaling down the runtime of remaining jobs.
|
||||||
|
Increasing the trace runtime might involve adding some dummy jobs to
|
||||||
|
the resulting trace and scaling up the runtime of individual jobs.
|
||||||
|
|
||||||
|
|
||||||
|
How to use Rumen?
|
||||||
|
-----------------
|
||||||
|
|
||||||
|
Converting *JobHistory* logs into a desired job-trace consists of 2 steps
|
||||||
|
|
||||||
|
1. Extracting information into an intermediate format
|
||||||
|
|
||||||
|
2. Adjusting the job-trace obtained from the intermediate trace to
|
||||||
|
have the desired properties.
|
||||||
|
|
||||||
|
> Extracting information from *JobHistory* logs is a one time
|
||||||
|
> operation. This so-called *Gold Trace* can be reused to
|
||||||
|
> generate traces with desired values of properties such as
|
||||||
|
> `output-duration`, `concentration` etc.
|
||||||
|
|
||||||
|
*Rumen* provides 2 basic commands
|
||||||
|
|
||||||
|
* `TraceBuilder`
|
||||||
|
* `Folder`
|
||||||
|
|
||||||
|
Firstly, we need to generate the *Gold Trace*. Hence the first
|
||||||
|
step is to run `TraceBuilder` on a job-history folder.
|
||||||
|
The output of the `TraceBuilder` is a job-trace file (and an
|
||||||
|
optional cluster-topology file). In case we want to scale the output, we
|
||||||
|
can use the `Folder` utility to fold the current trace to the
|
||||||
|
desired length. The remaining part of this section explains these
|
||||||
|
utilities in detail.
|
||||||
|
|
||||||
|
> Examples in this section assume that certain libraries are present
> in the java CLASSPATH. See the *Dependencies* section in the Appendix for more details.
|
||||||
|
|
||||||
|
|
||||||
|
$H3 Trace Builder
|
||||||
|
|
||||||
|
`Command:`
|
||||||
|
|
||||||
|
java org.apache.hadoop.tools.rumen.TraceBuilder [options] <jobtrace-output> <topology-output> <inputs>
|
||||||
|
|
||||||
|
This command invokes the `TraceBuilder` utility of
|
||||||
|
*Rumen*. It converts the JobHistory files into a series of JSON
|
||||||
|
objects and writes them into the `<jobtrace-output>`
|
||||||
|
file. It also extracts the cluster layout (topology) and writes it in
|
||||||
|
the`<topology-output>` file.
|
||||||
|
`<inputs>` represents a space-separated list of
|
||||||
|
JobHistory files and folders.
|
||||||
|
|
||||||
|
> 1) Input and output to `TraceBuilder` is expected to
|
||||||
|
> be a fully qualified FileSystem path. So use file://
|
||||||
|
> to specify files on the `local` FileSystem and
|
||||||
|
> hdfs:// to specify files on HDFS. Since input files and
> folders are FileSystem paths, they can be globbed.
|
||||||
|
> This can be useful while specifying multiple file paths using
|
||||||
|
> regular expressions.
|
||||||
|
|
||||||
|
> 2) By default, TraceBuilder does not recursively scan the input
|
||||||
|
> folder for job history files. Only the files that are directly
|
||||||
|
> placed under the input folder will be considered for generating
|
||||||
|
> the trace. To add all the files under the input directory by
|
||||||
|
> recursively scanning the input directory, use the `-recursive`
> option.
|
||||||
|
|
||||||
|
Cluster topology is used as follows :
|
||||||
|
|
||||||
|
* To reconstruct the splits and make sure that the
|
||||||
|
distances/latencies seen in the actual run are modeled correctly.
|
||||||
|
|
||||||
|
* To extrapolate splits information for tasks with missing splits
|
||||||
|
details or synthetically generated tasks.
|
||||||
|
|
||||||
|
`Options :`
|
||||||
|
|
||||||
|
<table>
|
||||||
|
<tr>
|
||||||
|
<th> Parameter</th>
|
||||||
|
<th> Description</th>
|
||||||
|
<th> Notes </th>
|
||||||
|
</tr>
|
||||||
|
<tr>
|
||||||
|
<td><code>-demuxer</code></td>
|
||||||
|
<td>Used to read the jobhistory files. The default is
|
||||||
|
<code>DefaultInputDemuxer</code>.</td>
|
||||||
|
<td>Demuxer decides how the input file maps to jobhistory file(s).
|
||||||
|
Job history logs and job configuration files are typically small
|
||||||
|
files, and can be more effectively stored when embedded in some
|
||||||
|
container file format like SequenceFile or TFile. To support such
|
||||||
|
usage cases, one can specify a customized Demuxer class that can
|
||||||
|
extract individual job history logs and job configuration files
|
||||||
|
from the source files.
|
||||||
|
</td>
|
||||||
|
</tr>
|
||||||
|
<tr>
|
||||||
|
<td><code>-recursive</code></td>
|
||||||
|
<td>Recursively traverse input paths for job history logs.</td>
|
||||||
|
<td>This option should be used to inform the TraceBuilder to
|
||||||
|
recursively scan the input paths and process all the files under it.
|
||||||
|
Note that, by default, only the history logs that are directly under
|
||||||
|
the input folder are considered for generating the trace.
|
||||||
|
</td>
|
||||||
|
</tr>
|
||||||
|
</table>
|
||||||
|
|
||||||
|
|
||||||
|
$H4 Example
|
||||||
|
|
||||||
|
java org.apache.hadoop.tools.rumen.TraceBuilder file:///home/user/job-trace.json file:///home/user/topology.output file:///home/user/logs/history/done
|
||||||
|
|
||||||
|
This will analyze all the jobs in
|
||||||
|
|
||||||
|
`/home/user/logs/history/done` stored on the
|
||||||
|
`local` FileSystem and output the jobtraces in
|
||||||
|
`/home/user/job-trace.json` along with topology
|
||||||
|
information in `/home/user/topology.output`.
|
||||||
|
|
||||||
|
|
||||||
|
$H3 Folder
|
||||||
|
|
||||||
|
`Command`:
|
||||||
|
|
||||||
|
java org.apache.hadoop.tools.rumen.Folder [options] [input] [output]
|
||||||
|
|
||||||
|
> Input and output to `Folder` is expected to be a fully
|
||||||
|
> qualified FileSystem path. So use file:// to specify
|
||||||
|
> files on the `local` FileSystem and hdfs:// to
|
||||||
|
> specify files on HDFS.
|
||||||
|
|
||||||
|
This command invokes the `Folder` utility of
|
||||||
|
*Rumen*. Folding essentially means that the output duration of
|
||||||
|
the resulting trace is fixed and job timelines are adjusted
|
||||||
|
to respect the final output duration.
|
||||||
|
|
||||||
|
`Options :`
|
||||||
|
|
||||||
|
<table>
|
||||||
|
<tr>
|
||||||
|
<th> Parameter</th>
|
||||||
|
<th> Description</th>
|
||||||
|
<th> Notes </th>
|
||||||
|
</tr>
|
||||||
|
<tr>
|
||||||
|
<td><code>-input-cycle</code></td>
|
||||||
|
<td>Defines the basic unit of time for the folding operation. There is
|
||||||
|
no default value for <code>input-cycle</code>.
|
||||||
|
<strong>Input cycle must be provided</strong>.
|
||||||
|
</td>
|
||||||
|
<td>'<code>-input-cycle 10m</code>'
|
||||||
|
implies that the whole trace run will now be sliced at a 10-minute
|
||||||
|
interval. Basic operations will be done on the 10m chunks. Note
|
||||||
|
that *Rumen* understands various time units like
|
||||||
|
<em>m(min), h(hour), d(days) etc</em>.
|
||||||
|
</td>
|
||||||
|
</tr>
|
||||||
|
<tr>
|
||||||
|
<td><code>-output-duration</code></td>
|
||||||
|
<td>This parameter defines the final runtime of the trace.
|
||||||
|
The default value is <strong>1 hour</strong>.
|
||||||
|
</td>
|
||||||
|
<td>'<code>-output-duration 30m</code>'
|
||||||
|
implies that the resulting trace will have a max runtime of
|
||||||
|
30mins. All the jobs in the input trace file will be folded and
|
||||||
|
scaled to fit this window.
|
||||||
|
</td>
|
||||||
|
</tr>
|
||||||
|
<tr>
|
||||||
|
<td><code>-concentration</code></td>
|
||||||
|
<td>Set the concentration of the resulting trace. Default value is
|
||||||
|
<strong>1</strong>.
|
||||||
|
</td>
|
||||||
|
<td>If the total runtime of the resulting trace is less than the total
|
||||||
|
runtime of the input trace, then the resulting trace would contain
a smaller number of jobs than the input trace. This
|
||||||
|
essentially means that the output is diluted. To increase the
|
||||||
|
density of jobs, set the concentration to a higher value.</td>
|
||||||
|
</tr>
|
||||||
|
<tr>
|
||||||
|
<td><code>-debug</code></td>
|
||||||
|
<td>Run the Folder in debug mode. By default it is set to
|
||||||
|
<strong>false</strong>.</td>
|
||||||
|
<td>In debug mode, the Folder will print additional statements for
|
||||||
|
debugging. Also the intermediate files generated in the scratch
|
||||||
|
directory will not be cleaned up.
|
||||||
|
</td>
|
||||||
|
</tr>
|
||||||
|
<tr>
|
||||||
|
<td><code>-seed</code></td>
|
||||||
|
<td>Initial seed to the Random Number Generator. By default, a Random
|
||||||
|
Number Generator is used to generate a seed and the seed value is
|
||||||
|
reported back to the user for future use.
|
||||||
|
</td>
|
||||||
|
<td>If an initial seed is passed, then the <code>Random Number
|
||||||
|
Generator</code> will generate the random numbers in the same
|
||||||
|
sequence i.e the sequence of random numbers remains same if the
|
||||||
|
same seed is used. Folder uses Random Number Generator to decide
|
||||||
|
whether or not to emit the job.
|
||||||
|
</td>
|
||||||
|
</tr>
|
||||||
|
<tr>
|
||||||
|
<td><code>-temp-directory</code></td>
|
||||||
|
<td>Temporary directory for the Folder. By default the <strong>output
|
||||||
|
folder's parent directory</strong> is used as the scratch space.
|
||||||
|
</td>
|
||||||
|
<td>This is the scratch space used by Folder. All the
|
||||||
|
temporary files are cleaned up in the end unless the Folder is run
|
||||||
|
in <code>debug</code> mode.</td>
|
||||||
|
</tr>
|
||||||
|
<tr>
|
||||||
|
<td><code>-skew-buffer-length</code></td>
|
||||||
|
<td>Enables <em>Folder</em> to tolerate skewed jobs.
|
||||||
|
The default buffer length is <strong>0</strong>.</td>
|
||||||
|
<td>'<code>-skew-buffer-length 100</code>'
|
||||||
|
indicates that if the jobs appear out of order within a window
|
||||||
|
size of 100, then they will be emitted in-order by the folder.
|
||||||
|
If a job appears out-of-order outside this window, then the Folder
|
||||||
|
will bail out provided <code>-allow-missorting</code> is not set.
|
||||||
|
<em>Folder</em> reports the maximum skew size seen in the
|
||||||
|
input trace for future use.
|
||||||
|
</td>
|
||||||
|
</tr>
|
||||||
|
<tr>
|
||||||
|
<td><code>-allow-missorting</code></td>
|
||||||
|
<td>Enables <em>Folder</em> to tolerate out-of-order jobs. By default
|
||||||
|
mis-sorting is not allowed.
|
||||||
|
</td>
|
||||||
|
<td>If mis-sorting is allowed, then the <em>Folder</em> will ignore
|
||||||
|
out-of-order jobs that cannot be deskewed using a skew buffer of
|
||||||
|
size specified using <code>-skew-buffer-length</code>. If
|
||||||
|
mis-sorting is not allowed, then the Folder will bail out if the
|
||||||
|
skew buffer is incapable of tolerating the skew.
|
||||||
|
</td>
|
||||||
|
</tr>
|
||||||
|
</table>
|
||||||
|
|
||||||
|
|
||||||
|
$H4 Examples
|
||||||
|
|
||||||
|
$H5 Folding an input trace with 10 hours of total runtime to generate an output trace with 1 hour of total runtime
|
||||||
|
|
||||||
|
java org.apache.hadoop.tools.rumen.Folder -output-duration 1h -input-cycle 20m file:///home/user/job-trace.json file:///home/user/job-trace-1hr.json
|
||||||
|
|
||||||
|
If the folded jobs are out of order then the command will bail out.
|
||||||
|
|
||||||
|
$H5 Folding an input trace with 10 hours of total runtime to generate an output trace with 1 hour of total runtime and tolerate some skewness
|
||||||
|
|
||||||
|
java org.apache.hadoop.tools.rumen.Folder -output-duration 1h -input-cycle 20m -allow-missorting -skew-buffer-length 100 file:///home/user/job-trace.json file:///home/user/job-trace-1hr.json
|
||||||
|
|
||||||
|
If the folded jobs are out of order, then at most
|
||||||
|
100 jobs will be de-skewed. If the 101<sup>st</sup> job is
|
||||||
|
*out-of-order*, then the command will bail out.
|
||||||
|
|
||||||
|
$H5 Folding an input trace with 10 hours of total runtime to generate an output trace with 1 hour of total runtime in debug mode
|
||||||
|
|
||||||
|
java org.apache.hadoop.tools.rumen.Folder -output-duration 1h -input-cycle 20m -debug -temp-directory file:///tmp/debug file:///home/user/job-trace.json file:///home/user/job-trace-1hr.json
|
||||||
|
|
||||||
|
This will fold the 10hr job-trace file
|
||||||
|
`file:///home/user/job-trace.json` to finish within 1hr
|
||||||
|
and use `file:///tmp/debug` as the temporary directory.
|
||||||
|
The intermediate files in the temporary directory will not be cleaned
|
||||||
|
up.
|
||||||
|
|
||||||
|
$H5 Folding an input trace with 10 hours of total runtime to generate an output trace with 1 hour of total runtime with custom concentration.
|
||||||
|
|
||||||
|
java org.apache.hadoop.tools.rumen.Folder -output-duration 1h -input-cycle 20m -concentration 2 file:///home/user/job-trace.json file:///home/user/job-trace-1hr.json
|
||||||
|
|
||||||
|
This will fold the 10hr job-trace file
|
||||||
|
`file:///home/user/job-trace.json` to finish within 1hr
|
||||||
|
with a concentration of 2. Folding a 10-hour trace into a 1-hour trace
(as in the earlier examples) retains only 10% of the jobs. With *concentration* set to 2, 20% of the total input
jobs will be retained.
|
||||||
|
|
||||||
|
|
||||||
|
Appendix
|
||||||
|
--------
|
||||||
|
|
||||||
|
$H3 Resources
|
||||||
|
|
||||||
|
<a href="https://issues.apache.org/jira/browse/MAPREDUCE-751">MAPREDUCE-751</a>
|
||||||
|
is the main JIRA that introduced *Rumen* to *MapReduce*.
|
||||||
|
Look at the MapReduce
|
||||||
|
<a href="https://issues.apache.org/jira/browse/MAPREDUCE/component/12313617">
|
||||||
|
rumen-component</a> for further details.
|
||||||
|
|
||||||
|
|
||||||
|
$H3 Dependencies
|
||||||
|
|
||||||
|
*Rumen* expects certain library *JARs* to be present in
|
||||||
|
the *CLASSPATH*. The required libraries are
|
||||||
|
|
||||||
|
* `Hadoop MapReduce Tools` (`hadoop-mapred-tools-{hadoop-version}.jar`)
|
||||||
|
* `Hadoop Common` (`hadoop-common-{hadoop-version}.jar`)
|
||||||
|
* `Apache Commons Logging` (`commons-logging-1.1.1.jar`)
|
||||||
|
* `Apache Commons CLI` (`commons-cli-1.2.jar`)
|
||||||
|
* `Jackson Mapper` (`jackson-mapper-asl-1.4.2.jar`)
|
||||||
|
* `Jackson Core` (`jackson-core-asl-1.4.2.jar`)
|
||||||
|
|
||||||
|
> One simple way to run Rumen is to use '$HADOOP_HOME/bin/hadoop jar'
|
||||||
|
> option to run it.
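A hedged sketch of such an invocation, reusing the paths from the earlier
TraceBuilder example (the jar name is a placeholder for whichever jar provides
the Rumen classes in your installation):

    hadoop jar <rumen-or-tools-jar> org.apache.hadoop.tools.rumen.TraceBuilder \
      file:///home/user/job-trace.json file:///home/user/topology.output \
      file:///home/user/logs/history/done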
|