e4331a73c9
Contributed by Steve Loughran. This strips out all the -p preservation options which have already been processed when uploading a file before deciding whether or not to query the far end for the status of the (existing/uploaded) file to see if any other attributes need changing. This will avoid 404 caching-related issues in S3, wherein a newly created file can have a 404 entry in the S3 load balancer's cache from the probes for the file's existence prior to the upload. It partially addresses a regression caused by HADOOP-8143, "Change distcp to have -pb on by default" that causes a resurfacing of HADOOP-13145, "In DistCp, prevent unnecessary getFileStatus call when not preserving metadata" Change-Id: Ibc25d19e92548e6165eb8397157ebf89446333f7 |
||
---|---|---|
.. | ||
src | ||
README | ||
pom.xml |
README
DistCp (distributed copy) is a tool used for large inter/intra-cluster copying. It uses Map/Reduce to effect its distribution, error handling and recovery, and reporting. It expands a list of files and directories into input to map tasks, each of which will copy a partition of the files specified in the source list.