diff --git a/hadoop-common-project/hadoop-common/CHANGES.txt b/hadoop-common-project/hadoop-common/CHANGES.txt
index 6dc8e177cd8..7257ba1c2e6 100644
--- a/hadoop-common-project/hadoop-common/CHANGES.txt
+++ b/hadoop-common-project/hadoop-common/CHANGES.txt
@@ -548,6 +548,12 @@ Release 2.6.0 - UNRELEASED
HADOOP-11101. How about inputstream close statement from catch block to
finally block in FileContext#copy() ( skrho via vinayakumarb )
+ HADOOP-8808. Update FsShell documentation to mention deprecation of some of
+ the commands, and mention alternatives (Akira AJISAKA via aw)
+
+ HADOOP-10954. Adding site documents of hadoop-tools (Masatake Iwasaki
+ via aw)
+
OPTIMIZATIONS
HADOOP-10838. Byte array native checksumming. (James Thomas via todd)
@@ -606,6 +612,9 @@ Release 2.6.0 - UNRELEASED
HADOOP-11111 MiniKDC to use locale EN_US for case conversions. (stevel)
+ HADOOP-10731. Remove @date JavaDoc comment in ProgramDriver class (Henry
+ Saputra via aw)
+
BUG FIXES
HADOOP-10781. Unportable getgrouplist() usage breaks FreeBSD (Dmitry
@@ -867,6 +876,12 @@ Release 2.6.0 - UNRELEASED
HADOOP-11064. UnsatisifedLinkError with hadoop 2.4 JARs on hadoop-2.6 due to
NativeCRC32 method changes. (cnauroth)
+ HADOOP-11048. user/custom LogManager fails to load if the client
+ classloader is enabled (Sangjin Lee via jlowe)
+
+ HADOOP-10552. Fix usage and example at FileSystemShell.apt.vm (Kenji
+ Kikushima via aw)
+
Release 2.5.1 - 2014-09-05
INCOMPATIBLE CHANGES
diff --git a/hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/util/GenericOptionsParser.java b/hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/util/GenericOptionsParser.java
index 2a37dac460d..d0e765529c7 100644
--- a/hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/util/GenericOptionsParser.java
+++ b/hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/util/GenericOptionsParser.java
@@ -49,9 +49,9 @@ import org.apache.hadoop.security.UserGroupInformation;
 * GenericOptionsParser is a utility to parse command line
 * arguments generic to the Hadoop framework.
 *
- * GenericOptionsParser recognizes several standarad command
+ * GenericOptionsParser recognizes several standard command
 * line arguments, enabling applications to easily specify a namenode, a
- * jobtracker, additional configuration resources etc.
+ * ResourceManager, additional configuration resources etc.
 *
 *
+| Parameter | Description |
+|---|---|
+| `gridmix.output.directory` | The directory into which output will be written. If specified, `iopath` will be relative to this parameter. The submitting user must have read/write access to this directory. The user should also be mindful of any quota issues that may arise during a run. The default is "gridmix". |
+| `gridmix.client.submit.threads` | The number of threads submitting jobs to the cluster. This also controls how many splits will be loaded into memory at a given time, pending the submit time in the trace. Splits are pre-generated to hit submission deadlines, so particularly dense traces may want more submitting threads. However, storing splits in memory is reasonably expensive, so you should raise this cautiously. The default is 1 for the SERIAL job-submission policy (see Job Submission Policies) and one more than the number of processors on the client machine for the other policies. |
+| `gridmix.submit.multiplier` | The multiplier to accelerate or decelerate the submission of jobs. The time separating two jobs is multiplied by this factor. The default value is 1.0. This is a crude mechanism to size a job trace to a cluster. |
+| `gridmix.client.pending.queue.depth` | The depth of the queue of job descriptions awaiting split generation. The jobs read from the trace occupy a queue of this depth before being processed by the submission threads. It is unusual to configure this. The default is 5. |
+| `gridmix.gen.blocksize` | The block-size of generated data. The default value is 256 MiB. |
+| `gridmix.gen.bytes.per.file` | The maximum bytes written per file. The default value is 1 GiB. |
+| `gridmix.min.file.size` | The minimum size of the input files. The default limit is 128 MiB. Tweak this parameter if you see an error message like "Found no satisfactory file" while testing GridMix with a relatively small input data set. |
+| `gridmix.max.total.scan` | The maximum size of the input files. The default limit is 100 TiB. |
+| `gridmix.task.jvm-options.enable` | Enables GridMix to configure the simulated task's max heap options using the values obtained from the original task (i.e. via the trace). |
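The effect of `gridmix.submit.multiplier` can be pictured with a small sketch. This is illustrative Python, not GridMix's implementation; it only shows how multiplying the gaps between consecutive trace timestamps accelerates or decelerates submission.

```python
# Illustrative sketch: rescale the inter-job gaps recorded in a trace by a
# submit multiplier. job_times holds the original submission timestamps.

def rescale_submission_times(job_times, multiplier):
    """Multiply the time separating consecutive jobs by `multiplier`."""
    if not job_times:
        return []
    rescaled = [job_times[0]]
    for prev, curr in zip(job_times, job_times[1:]):
        gap = (curr - prev) * multiplier
        rescaled.append(rescaled[-1] + gap)
    return rescaled

# A multiplier below 1.0 accelerates submission; above 1.0 decelerates it.
print(rescale_submission_times([0, 10, 30], 0.5))  # [0, 5.0, 15.0]
```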
+| Job Type | Description |
+|---|---|
+| `LOADJOB` | A synthetic job that emulates the workload mentioned in the Rumen trace. The current version supports I/O: it reproduces the I/O workload on the benchmark cluster by embedding the detailed I/O information for every map and reduce task, such as the number of bytes and records read and written, into each job's input splits. The map tasks further relay the I/O patterns of reduce tasks through the intermediate map output data. |
+| `SLEEPJOB` | A synthetic job where each task does *nothing* but sleep for a certain duration as observed in the production trace. The scalability of the Job Tracker is often limited by how many heartbeats it can handle every second. (Heartbeats are periodic messages sent from Task Trackers to update their status and grab new tasks from the Job Tracker.) Since a benchmark cluster is typically a fraction of the size of a production cluster, the heartbeat traffic generated by the slave nodes is well below the level of the production cluster. One possible solution is to run multiple Task Trackers on each slave node, but this leads to the obvious problem that the I/O workload generated by the synthetic jobs would thrash the slave nodes. Hence the need for such a job. |
+| Parameter | Description |
+|---|---|
+| `gridmix.job.type` | The value for this key can be one of LOADJOB or SLEEPJOB. The default value is LOADJOB. |
+| `gridmix.key.fraction` | For a LOADJOB type of job, the fraction of a record used for the data for the key. The default value is 0.1. |
+| `gridmix.sleep.maptask-only` | For a SLEEPJOB type of job, whether to ignore the reduce tasks for the job. The default is `false`. |
+| `gridmix.sleep.fake-locations` | For a SLEEPJOB type of job, the number of fake locations for map tasks for the job. The default is 0. |
+| `gridmix.sleep.max-map-time` | For a SLEEPJOB type of job, the maximum runtime for map tasks for the job in milliseconds. The default is unlimited. |
+| `gridmix.sleep.max-reduce-time` | For a SLEEPJOB type of job, the maximum runtime for reduce tasks for the job in milliseconds. The default is unlimited. |
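A sketch of how the two runtime caps above would apply to trace-observed task durations. This is not GridMix code; the function name and its behavior for the "unlimited" case are invented for illustration.

```python
# Illustrative sketch: clamp a sleep task's observed runtime at a configured
# maximum, mirroring gridmix.sleep.max-map-time / max-reduce-time. A cap of
# None stands in for the "unlimited" default.

def sleep_duration_ms(observed_ms, max_ms=None):
    """Cap an observed task runtime at max_ms; None means unlimited."""
    return observed_ms if max_ms is None else min(observed_ms, max_ms)

# A 2-minute task under a 1-minute cap sleeps for only 1 minute.
print(sleep_duration_ms(120_000, max_ms=60_000))  # 60000
```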
+| Job Submission Policy | Description |
+|---|---|
+| `STRESS` | Keep submitting jobs so that the cluster remains under stress. In this mode we control the rate of job submission by monitoring the real-time load of the cluster, so that we can maintain a stable stress level of workload on the cluster. Based on the statistics we gather, we define whether a cluster is *underloaded* or *overloaded*. We consider a cluster *underloaded* if and only if the following three conditions are true: the ratio of running jobs to Task Trackers is below the threshold TJ; the ratio of incomplete map tasks to map slots is below the threshold TM; and the ratio of incomplete reduce tasks to reduce slots is below the threshold TR (the thresholds are described in the STRESS-mode parameter table below). |
+| `REPLAY` | In this mode we replay the job traces faithfully. This mode exactly follows the time intervals given in the actual job trace. |
+| `SERIAL` | In this mode we submit the next job only once the job submitted earlier is completed. |
+| Parameter | Description |
+|---|---|
+| `gridmix.job-submission.policy` | The value for this key would be one of the three: STRESS, REPLAY or SERIAL. In most cases the value of this key would be STRESS or REPLAY. The default value is STRESS. |
+| `gridmix.throttle.jobs-to-tracker-ratio` | In STRESS mode, the minimum ratio of running jobs to Task Trackers in a cluster for the cluster to be considered *overloaded*. This is the threshold TJ referred to earlier. The default is 1.0. |
+| `gridmix.throttle.maps.task-to-slot-ratio` | In STRESS mode, the minimum ratio of pending and running map tasks (i.e. incomplete map tasks) to the number of map slots for a cluster for the cluster to be considered *overloaded*. This is the threshold TM referred to earlier. Running map tasks are counted partially. For example, a 40% complete map task is counted as 0.6 map tasks. The default is 2.0. |
+| `gridmix.throttle.reduces.task-to-slot-ratio` | In STRESS mode, the minimum ratio of pending and running reduce tasks (i.e. incomplete reduce tasks) to the number of reduce slots for a cluster for the cluster to be considered *overloaded*. This is the threshold TR referred to earlier. Running reduce tasks are counted partially. For example, a 30% complete reduce task is counted as 0.7 reduce tasks. The default is 2.5. |
+| `gridmix.throttle.maps.max-slot-share-per-job` | In STRESS mode, the maximum share of a cluster's map-slots capacity that can be counted toward a job's incomplete map tasks in overload calculation. The default is 0.1. |
+| `gridmix.throttle.reducess.max-slot-share-per-job` | In STRESS mode, the maximum share of a cluster's reduce-slots capacity that can be counted toward a job's incomplete reduce tasks in overload calculation. The default is 0.1. |
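The underloaded/overloaded decision described for the STRESS policy can be sketched as follows. This is an illustrative Python rendering of the documented thresholds and partial-counting rule, not GridMix's actual code; the function names and argument shapes are invented.

```python
# Illustrative sketch of the STRESS-mode underload check. Running tasks count
# partially: a 40%-complete task contributes 0.6 of a task (its incomplete
# fraction), matching the examples in the table above.

TJ = 1.0   # default gridmix.throttle.jobs-to-tracker-ratio
TM = 2.0   # default gridmix.throttle.maps.task-to-slot-ratio
TR = 2.5   # default gridmix.throttle.reduces.task-to-slot-ratio

def incomplete_tasks(pending, running_progress):
    """Pending tasks count fully; running ones by their incomplete fraction."""
    return pending + sum(1.0 - p for p in running_progress)

def is_underloaded(running_jobs, trackers,
                   map_pending, map_progress, map_slots,
                   reduce_pending, reduce_progress, reduce_slots):
    """Underloaded iff all three ratios stay below their thresholds."""
    jobs_ratio = running_jobs / trackers
    map_ratio = incomplete_tasks(map_pending, map_progress) / map_slots
    reduce_ratio = incomplete_tasks(reduce_pending, reduce_progress) / reduce_slots
    return jobs_ratio < TJ and map_ratio < TM and reduce_ratio < TR
```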
+| Parameter | Description |
+|---|---|
+| `gridmix.job-submission.use-queue-in-trace` | When set to `true`, GridMix uses exactly the same set of queues as those mentioned in the trace. The default value is `false`. |
+| `gridmix.job-submission.default-queue` | Specifies the default queue to which all the jobs would be submitted. If this parameter is not specified, GridMix uses the default queue defined for the submitting user on the cluster. |
+| `gridmix.user.resolve.class` | Specifies which `UserResolver` implementation to use. Three implementations are currently available, including `org.apache.hadoop.mapred.gridmix.SubmitterUserResolver`. |
+| Parameter | Description |
+|---|---|
+| `gridmix.job.original-job-id` | The job id of the original cluster's job corresponding to this simulated job. |
+| `gridmix.job.original-job-name` | The job name of the original cluster's job corresponding to this simulated job. |
+| Parameter | Description |
+|---|---|
+| `gridmix.compression-emulation.enable` | Enables compression emulation in simulated GridMix jobs. Default is true. |
+| Parameter | Description | Notes |
+|---|---|---|
+| `-demuxer` | Used to read the job history files. The default is `DefaultInputDemuxer`. | The demuxer decides how the input file maps to job history file(s). Job history logs and job configuration files are typically small files, and can be stored more effectively when embedded in a container file format like SequenceFile or TFile. To support such use cases, one can specify a customized Demuxer class that can extract individual job history logs and job configuration files from the source files. |
+| `-recursive` | Recursively traverse input paths for job history logs. | This option should be used to inform the TraceBuilder to recursively scan the input paths and process all the files under them. Note that, by default, only the history logs that are directly under the input folder are considered for generating the trace. |
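The difference between the default scan and `-recursive` can be illustrated with a small sketch. This is a plain Python directory listing, not TraceBuilder's implementation; the function name is invented.

```python
# Illustrative sketch: the default scan looks only at files directly under
# the input folder, while a recursive scan also descends into subdirectories
# (mirroring the -recursive option above).
import os

def list_history_files(root, recursive=False):
    """Return file paths under root; descend into subdirectories only
    when recursive is True."""
    if recursive:
        return sorted(os.path.join(d, f)
                      for d, _, names in os.walk(root) for f in names)
    return sorted(os.path.join(root, f) for f in os.listdir(root)
                  if os.path.isfile(os.path.join(root, f)))
```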
+| Parameter | Description | Notes |
+|---|---|---|
+| `-input-cycle` | Defines the basic unit of time for the folding operation. There is no default value for `input-cycle`; it must be provided. | `-input-cycle 10m` implies that the whole trace run will be sliced at a 10-minute interval, and basic operations will be done on the 10m chunks. Note that *Rumen* understands various time units like m (min), h (hour), d (days) etc. |
+| `-output-duration` | Defines the final runtime of the trace. The default value is 1 hour. | `-output-duration 30m` implies that the resulting trace will have a maximum runtime of 30 minutes. All the jobs in the input trace file will be folded and scaled to fit this window. |
+| `-concentration` | Sets the concentration of the resulting trace. The default value is 1. | If the total runtime of the resulting trace is less than the total runtime of the input trace, then the resulting trace will contain fewer jobs than the input trace; essentially, the output is diluted. To increase the density of jobs, set the concentration to a higher value. |
+| `-debug` | Runs the Folder in debug mode. By default it is set to false. | In debug mode, the Folder will print additional statements for debugging, and the intermediate files generated in the scratch directory will not be cleaned up. |
+| `-seed` | Initial seed for the random number generator. By default, a random seed is generated and its value is reported back to the user for future use. | If an initial seed is passed, the random number generator will produce the random numbers in the same sequence, i.e. the sequence of random numbers remains the same if the same seed is used. The Folder uses the random number generator to decide whether or not to emit a job. |
+| `-temp-directory` | Temporary directory for the Folder. By default, the output folder's parent directory is used as the scratch space. | This is the scratch space used by the Folder. All the temporary files are cleaned up in the end unless the Folder is run in debug mode. |
+| `-skew-buffer-length` | Enables the Folder to tolerate skewed jobs. The default buffer length is 0. | `-skew-buffer-length 100` indicates that if jobs appear out of order within a window of size 100, they will be emitted in order by the Folder. If a job appears out of order outside this window, the Folder will bail out unless `-allow-missorting` is set. The Folder reports the maximum skew size seen in the input trace for future use. |
+| `-allow-missorting` | Enables the Folder to tolerate out-of-order jobs. By default, mis-sorting is not allowed. | If mis-sorting is allowed, the Folder will ignore out-of-order jobs that cannot be deskewed using a skew buffer of the size specified with `-skew-buffer-length`. If mis-sorting is not allowed, the Folder will bail out if the skew buffer is incapable of tolerating the skew. |
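A deliberately simplified sketch of the folding idea: a trace longer than `-output-duration` is overlaid into the output window, which is why job density rises and why `-concentration` exists to thin or thicken the result. This modulo-folding toy ignores input-cycle slicing and concentration, and is illustrative only, not Rumen's actual algorithm.

```python
# Illustrative sketch: map each submission time of a long trace into the
# [0, output_duration) window by folding, so jobs from different slices of
# the input trace land on top of each other in the output window.

def fold_submission_times(times, output_duration):
    """Fold each submission time into [0, output_duration)."""
    return sorted(t % output_duration for t in times)

# Three jobs spread over 125 time units fold into one dense 60-unit window.
print(fold_submission_times([5, 65, 125], 60))  # [5, 5, 5]
```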