diff --git a/hadoop-tools/hadoop-distcp/src/site/markdown/DistCp.md.vm b/hadoop-tools/hadoop-distcp/src/site/markdown/DistCp.md.vm
index 030f20c23b6..a351ec5a376 100644
--- a/hadoop-tools/hadoop-distcp/src/site/markdown/DistCp.md.vm
+++ b/hadoop-tools/hadoop-distcp/src/site/markdown/DistCp.md.vm
@@ -49,7 +49,7 @@ Overview
 
  [The erstwhile implementation of DistCp]
  (http://hadoop.apache.org/docs/r1.2.1/distcp.html) has its share of quirks
- and drawbacks, both in its usage, as well as its extensibility and
+ and drawbacks, both in its usage and its extensibility and
  performance. The purpose of the DistCp refactor was to fix these shortcomings,
  enabling it to be used and extended programmatically. New paradigms have
  been introduced to improve runtime and setup performance,
@@ -179,7 +179,7 @@ $H3 Update and Overwrite
  hdfs://nn2:8020/target/10 32
  hdfs://nn2:8020/target/20 64
 
- Will effect:
+ The result will be:
 
  hdfs://nn2:8020/target/1 32
  hdfs://nn2:8020/target/2 32
@@ -190,7 +190,7 @@ $H3 Update and Overwrite
  because it doesn't exist at the target. `10` and `20` are overwritten since
  the contents don't match the source.
 
- If `-update` is used, `1` is skipped because the file-length and contents match. `2` is copied because it doesn’t exist at the target. `10` and `20` are overwritten since the contents don’t match the source. However, if `-append` is additionally used, then only `10` is overwritten (source length less than destination) and `20` is appended with the change in file (if the files match up to the destination's original length).
+ If `-update` is used, `1` is skipped because the file-length and contents match. `2` is copied because it doesn't exist at the target. `10` and `20` are overwritten since the contents don’t match the source. However, if `-append` is additionally used, then only `10` is overwritten (source length less than destination) and `20` is appended with the change in file (if the files match up to the destination's original length).
 
  If `-overwrite` is used, `1` is overwritten as well.
 
@@ -269,7 +269,7 @@ $H4 Experiment 1: Syncing diff of two adjacent snapshots
 
 $H4 Experiment 2: syncing diff of two non-adjacent snapshots
 
- First do a clean up from Experiment 1.
+ First do a cleanup from Experiment 1.
 
  hdfs dfs -rm -skipTrash /dst/1.txt
 
@@ -514,7 +514,7 @@ $H3 InputFormats and MapReduce Components
  * A file with the same name exists at target, but `-overwrite` is
  specified.
  * A file with the same name exists at target, but differs in block-size
- (and block-size needs to be preserved.
+ and block-size needs to be preserved.
 
  * **CopyCommitter:** This class is responsible for the commit-phase of the
  DistCp job, including:
@@ -576,7 +576,7 @@ $H3 MapReduce and other side-effects
  map on a re-execution will be marked as "skipped".
  * If a map fails `mapreduce.map.maxattempts` times, the remaining map tasks
  will be killed (unless `-i` is set).
- * If `mapreduce.map.speculative` is set set final and true, the result of the
+ * If `mapreduce.map.speculative` is set to be true, the result of the
  copy is undefined.
 
 $H3 DistCp and Object Stores
@@ -691,7 +691,7 @@ Frequently Asked Questions
  directory is copied over, rather than the source-directory itself. This
  behaviour is consistent with the legacy DistCp implementation as well.
 
- 2. **How does the new DistCp differ in semantics from the Legacy DistCp?**
+ 2. **How does the new DistCp differs in semantics from the Legacy DistCp?**
 
  * Files that are skipped during copy used to also have their file-attributes
  (permissions, owner/group info, etc.) unchanged, when