hadoop/hadoop-tools/hadoop-distcp
Latest commit 3937abddbd by Mukund Thakur: HDFS-13660. DistCp job fails when new data is appended in the file while the DistCp copy job is running
This uses the length of the file known at the start of the copy to determine the amount of data to copy.

* If a file is appended to during the copy, the original bytes are copied.
* If a file is truncated during a copy, or the attempt to read the data fails with a truncated stream,
  DistCp will now fail the copy. Previously, such failures went undetected.

Contributed by Mukund Thakur.

Change-Id: I576a49d951fa48d37a45a7e4c82c47488aa8e884
(cherry picked from commit 51c64b357d)
2020-02-27 16:37:03 -08:00
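
The idea behind the fix, sketched below, is to record the source file's length before the copy starts, copy exactly that many bytes, and fail if the stream ends early. This is an illustrative sketch only, not DistCp's actual copy code; the class and method names are invented for the example.

    import java.io.EOFException;
    import java.io.IOException;
    import java.io.InputStream;
    import java.io.OutputStream;

    /**
     * Sketch: copy exactly the number of bytes the source reported when the
     * copy started. Bytes appended afterwards are never read; a stream that
     * ends early (the file was truncated) raises an error instead of
     * silently producing a short copy.
     */
    public class FixedLengthCopy {

      public static long copy(InputStream in, OutputStream out, long expectedLength)
          throws IOException {
        byte[] buffer = new byte[8192];
        long copied = 0;
        while (copied < expectedLength) {
          int toRead = (int) Math.min(buffer.length, expectedLength - copied);
          int read = in.read(buffer, 0, toRead);
          if (read < 0) {
            // The source ended before reaching the length captured at the
            // start of the copy: the file shrank while the copy was running.
            throw new EOFException("Source truncated during copy: expected "
                + expectedLength + " bytes, copied " + copied);
          }
          out.write(buffer, 0, read);
          copied += read;
        }
        return copied;
      }
    }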
src HDFS-13660. DistCp job fails when new data is appended in the file while the DistCp copy job is running 2020-02-27 16:37:03 -08:00
README HADOOP-11437. Remove the version and author information from distcp's README file (Brahma Reddy Battula via aw) 2015-02-11 15:47:36 -08:00
pom.xml HADOOP-16808. Use forkCount and reuseForks parameters instead of forkMode in the config of maven surefire plugin. Contributed by Xieming Li. 2020-01-21 18:03:56 +09:00

README

DistCp (distributed copy) is a tool used for large inter/intra-cluster copying. 
It uses Map/Reduce to effect its distribution, error handling and recovery, 
and reporting. It expands a list of files and directories into input to map tasks, 
each of which will copy a partition of the files specified in the source list.
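
For illustration (not part of the README above), the sketch below drives DistCp programmatically. It assumes the Hadoop 3.x DistCpOptions.Builder API; the cluster addresses and paths are placeholders.

    import java.util.Collections;

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.Path;
    import org.apache.hadoop.mapreduce.Job;
    import org.apache.hadoop.tools.DistCp;
    import org.apache.hadoop.tools.DistCpOptions;

    public class DistCpExample {
      public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();

        // Placeholder source and target URIs.
        Path source = new Path("hdfs://nn1:8020/data/source");
        Path target = new Path("hdfs://nn2:8020/data/target");

        // Copy the source tree into the target with -update (sync) semantics.
        DistCpOptions options = new DistCpOptions.Builder(
            Collections.singletonList(source), target)
            .withSyncFolder(true)
            .build();

        // Launches the MapReduce job; each map task copies a partition of
        // the expanded file list.
        Job job = new DistCp(conf, options).execute();
        System.out.println("DistCp successful: " + job.isSuccessful());
      }
    }

The command-line equivalent would be: hadoop distcp -update hdfs://nn1:8020/data/source hdfs://nn2:8020/data/target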