178 Commits

Author SHA1 Message Date
Ayush Saxena 8378ab9f92 HADOOP-17288. Use shaded guava from thirdparty. Contributed by Ayush Saxena. #2505 2020-12-10 05:50:55 +05:30
swamirishi ba4f7fb332
HADOOP-17122: Preserving Directory Attributes in DistCp with Atomic Copy (#2133)
Contributed by Swaminathan Balachandran

Change-Id: I86f956dd4ab0b278d923fe7b70037e6b929a8aa1
2020-08-22 18:51:10 +01:00
Steve Loughran f2f727359b Revert "HADOOP-14557. Document HADOOP-8143 (Change distcp to have -pb on by default)."
This reverts commit 44350fdf49.

It is related to the rollback of HADOOP-8143.

Change-Id: If48e3dd670c920ada702dc36461ff398fe9d35cc
2020-05-14 19:20:34 +01:00
Steve Loughran 7f2cf334a8
Revert "HADOOP-8143. Change distcp to have -pb on by default."
This reverts commit dd65eea74b.

Change-Id: I74180cf59d5bbad8c9f66cb331535addcbea863e
2020-05-14 19:14:39 +01:00
Steve Loughran e4331a73c9
HADOOP-16932. distcp copy calls getFileStatus() needlessly and can fail against S3 (#1936)
Contributed by Steve Loughran.

This strips out all the -p preservation options which have already been
processed when uploading a file before deciding whether or not to query
the far end for the status of the (existing/uploaded) file to see if any
other attributes need changing.

This will avoid 404 caching-related issues in S3, wherein a newly created
file can have a 404 entry in the S3 load balancer's cache from the
probes for the file's existence prior to the upload.

It partially addresses a regression caused by HADOOP-8143,
"Change distcp to have -pb on by default" that causes a resurfacing
of HADOOP-13145, "In DistCp, prevent unnecessary getFileStatus call when
not preserving metadata"

Change-Id: Ibc25d19e92548e6165eb8397157ebf89446333f7
2020-04-09 18:23:47 +01:00
Sebastian Nagel 18050bc583
HADOOP-16909 Typo in distcp counters.
Contributed by Sebastian Nagel.
2020-03-09 14:37:08 +00:00
Mukund Thakur 819159fa06
HDFS-14788. Use dynamic regex filter to ignore copy of source files in Distcp.
Contributed by Mukund Thakur.

Change-Id: I781387ddce95ee300c12a160dc9a0f7d602403c3
2020-01-06 19:10:39 +00:00
Steve Loughran b6dc00f481
HADOOP-16775. DistCp reuses the same temp file within the task for different files.
Contributed by Amir Shenavandeh.

This avoids overwrite consistency issues with S3 and other stores, though
given S3's copy operation is O(data), you are still best off using -direct
when distcp-ing to it.

Change-Id: I8dc9f048ad0cc57ff01543b849da1ce4eaadf8c3
2020-01-02 15:36:33 +00:00
aasha fccccc9703 HDFS-14869 Copy renamed files which are not excluded anymore by filter (#1530) 2019-12-06 17:41:25 +05:30
pingsutw 14cd969b6e
HADOOP-16512. [hadoop-tools] Fix order of actual and expected expression in assert statements
Signed-off-by: Akira Ajisaka <aajisaka@apache.org>
2019-10-07 16:38:08 +09:00
Mukund Thakur 51c64b357d
HDFS-13660. DistCp job fails when new data is appended in the file while the DistCp copy job is running
This uses the length of the file known at the start of the copy to determine the amount of data to copy.

* If a file is appended to during the copy, the original bytes are copied.
* If a file is truncated during a copy, or the attempt to read the data fails with a truncated stream,
  distcp will now fail. Until now these failures were not detected.

Contributed by Mukund Thakur.

Change-Id: I576a49d951fa48d37a45a7e4c82c47488aa8e884
2019-09-24 11:23:24 +01:00
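
The behaviour described in the HDFS-13660 message above can be illustrated with a minimal sketch. This is not the DistCp implementation: the function name `copy_fixed_length`, its parameters, and the use of Python file-like objects are all assumptions made for illustration. It copies only the number of bytes recorded before the copy began, so appended data is ignored, and it fails if the source ends early.

```python
import io


def copy_fixed_length(src, dst, expected_length, buffer_size=8192):
    """Copy exactly expected_length bytes from src to dst (hypothetical helper).

    Bytes appended to the source after expected_length was recorded are
    never read. If the source ends before expected_length bytes have been
    copied, EOFError is raised instead of silently producing a short copy.
    """
    remaining = expected_length
    while remaining > 0:
        # Never request more than is still owed, so appended data is ignored.
        chunk = src.read(min(buffer_size, remaining))
        if not chunk:
            raise EOFError(
                f"source truncated: {remaining} of {expected_length} bytes missing"
            )
        dst.write(chunk)
        remaining -= len(chunk)
    return expected_length - remaining
```

For example, calling it with a recorded length of 5 on a source that now holds `b"hello world"` writes only `b"hello"`, while a recorded length of 10 on a source holding `b"abc"` raises `EOFError`.
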
KAI XIE c765584eb2 HADOOP-16158. DistCp to support checksum validation when copy blocks in parallel (#919)
* DistCp to support checksum validation when copy blocks in parallel

* address review comments

* add checksums comparison test for combine mode
2019-08-18 18:46:31 -07:00
Ayush Saxena e60f5e2572 HADOOP-16440. Distcp can not preserve timestamp with -delete option. Contributed by ludun. 2019-07-20 13:11:14 +05:30
Steve Loughran 19a001826f
Revert "HDFS-9913. DistCp to add -useTrash to move deleted files to Trash."
Reverting due to test failures if ~/.Trash not present during test setup.

This reverts commit ee3115f488.

Change-Id: Icbeeb261570b9131ff99d765ac0945c335b26658
2019-07-17 13:13:24 +01:00
Shen Yinjie ee3115f488
HDFS-9913. DistCp to add -useTrash to move deleted files to Trash.
Contributed by Shen Yinjie.

Change-Id: I03ac7d22ab1054f8e5de4aa7552909c734438f4a
2019-07-17 11:50:46 +01:00
Takanobu Asanuma 98d2065643 HDFS-12564. Add the documents of swebhdfs configurations on the client side. Contributed by Takanobu Asanuma.
Signed-off-by: Wei-Chiu Chuang <weichiu@apache.org>
2019-06-20 20:17:24 -07:00
Andrew Olson c15b3bca86
HADOOP-16294: Enable access to input options by DistCp subclasses.
Adding a protected-scope getter for the DistCpOptions, so that a subclass does
not need to save its own copy of the inputOptions supplied to its constructor,
if it wishes to override the createInputFileListing method with logic similar
to the original implementation, i.e. calling CopyListing#buildListing with a path and input options.

Author:    Andrew Olson
2019-05-16 16:11:12 +02:00
Giovanni Matteo Fumarola 7a3188d054 HADOOP-16282. Avoid FileStream to improve performance. Contributed by Ayush Saxena. 2019-05-02 12:58:42 -07:00
Masatake Iwasaki bbdbc7a9a1 HADOOP-14544. DistCp documentation for command line options is misaligned. Contributed by Masatake Iwasaki. 2019-04-12 11:52:18 +09:00
Siyao Meng ce4bafdf44
HADOOP-16037. DistCp: Document usage of Sync (-diff option) in detail.
Contributed by Siyao Meng
2019-03-26 18:42:54 +00:00
Andrew Olson faba3591d3
HADOOP-16147. Allow CopyListing sequence file keys and values to be more easily customized.
Author:    Andrew Olson
2019-03-22 10:35:30 +00:00
Ranith Sardar 546c5d70ef
HADOOP-16032. DistCp should clear sub-directory ACLs before applying new ACLs. 2019-02-07 21:48:07 +00:00
Andrew Olson de804e53b9
HADOOP-15281. Distcp to add no-rename copy option.
Contributed by Andrew Olson.
2019-02-07 10:07:22 +00:00
Giovanni Matteo Fumarola fb8932a727 HADOOP-16029. Consecutive StringBuilder.append can be reused. Contributed by Ayush Saxena. 2019-01-11 10:54:49 -08:00
Kai Xie 188bebbe7e HADOOP-16018. DistCp won't reassemble chunks when blocks per chunk > 0.
Contributed by Kai Xie.
2019-01-08 11:57:57 +00:00
Akira Ajisaka 7f78397036
Revert "HADOOP-14556. S3A to support Delegation Tokens."
This reverts commit d7152332b3.
2019-01-08 14:51:30 +09:00
Steve Loughran d7152332b3
HADOOP-14556. S3A to support Delegation Tokens.
Contributed by Steve Loughran.
2019-01-07 13:18:03 +00:00
Arpit Agarwal 914b0cf15f HADOOP-12558. distcp documentation is woefully out of date. Contributed by Dinesh Chitlangia. 2018-11-15 13:58:13 -08:00
Ted Yu e2cecb681e HADOOP-15850. CopyCommitter#concatFileChunks should check that the blocks per chunk is not 0. Contributed by Ted Yu.
Signed-off-by: Wei-Chiu Chuang <weichiu@apache.org>
2018-10-19 13:21:06 -07:00
Steve Loughran e36ae9639f
HADOOP-15831. Include modificationTime in the toString method of CopyListingFileStatus.
Contributed by Ted Yu.
2018-10-12 09:59:19 +01:00
Surendra Singh Lilhore 96c4575d73 HDFS-13805. Journal Nodes should allow to format non-empty directories with -force option. Contributed by Surendra Singh Lilhore. 2018-08-24 08:14:57 +05:30
Akira Ajisaka 3e3963b035
HADOOP-15552. Move logging APIs over to slf4j in hadoop-tools - Part2. Contributed by Ian Pickering. 2018-08-16 00:31:59 +09:00
Steve Loughran ca8b80bf59
HADOOP-15384. distcp numListstatusThreads option doesn't get to -delete scan.
Contributed by Steve Loughran.
2018-07-10 10:43:59 +01:00
Akira Ajisaka 2b2399d623
HADOOP-15495. Upgrade commons-lang version to 3.7 in hadoop-common-project and hadoop-tools. Contributed by Takanobu Asanuma. 2018-06-28 14:37:22 +09:00
Xiao Chen 7c9cdad6d0 HDFS-13056. Expose file-level composite CRCs in HDFS which are comparable across different instances/layouts. Contributed by Dennis Huo. 2018-04-10 21:31:48 -07:00
Steve Loughran 1976e0066e HADOOP-15209. DistCp to eliminate needless deletion of files under already-deleted directories.
Contributed by Steve Loughran.
2018-03-15 18:05:14 +00:00
Chris Douglas 45cccadd2e HDFS-12780. Fix spelling mistake in DistCpUtils.java. Contributed by Jianfei Jiang 2018-03-13 11:08:11 -07:00
Steve Loughran 7ef4d942dd HADOOP-15273. distcp can't handle remote stores with different checksum algorithms.
Contributed by Steve Loughran.
2018-03-08 11:24:06 +00:00
Steve Loughran 3bd6b1fd85 HADOOP-15292. Distcp's use of pread is slowing it down.
Contributed by Virajith Jalaparti.
2018-03-08 11:15:46 +00:00
fang zhenyi 4d4dde5112
HADOOP-15223. Replace Collections.EMPTY* with empty* when available
Signed-off-by: Akira Ajisaka <aajisaka@apache.org>
2018-02-18 22:19:39 +09:00
Anu Engineer 4304fcd5bd HDFS-12990. Change default NameNode RPC port back to 8020. Contributed by Xiao Chen. 2018-02-06 13:43:45 -08:00
Arpit Agarwal d4e13a4647 HADOOP-15198. Correct the spelling in CopyFilter.java. Contributed by Mukul Kumar Singh. 2018-02-02 11:37:51 -08:00
Surendra Singh Lilhore 00129c5314 HDFS-12833. Distcp : Update the usage of delete option for dependency with update and overwrite option. Contributed by usharani. 2017-12-12 00:28:02 +05:30
Akira Ajisaka cc3f3eca40
MAPREDUCE-6999. Fix typo onf in DynamicInputChunk.java. Contributed by fang zhenyi. 2017-11-02 18:32:24 +09:00
Steve Loughran f36cbc8475 HADOOP-14942. DistCp#cleanup() should check whether jobFS is null.
Contributed by Andras Bokor.
2017-10-20 22:27:04 +01:00
ChenSammi e0b3c644e1 HDFS-12414. Ensure to use CLI command to enable/disable erasure coding policy. Contributed by Sammi Chen 2017-09-14 09:15:29 +08:00
Xiaoyu Yao 63720ef574 HADOOP-14839. DistCp log output should contain copied and deleted files and directories. Contributed by Yiqun Lin. 2017-09-05 23:34:55 -07:00
Andrew Wang f29a0fc288 HDFS-12303. Change default EC cell size to 1MB for better performance. Contributed by Wei Zhou. 2017-08-25 14:14:23 -07:00
Andrew Wang dd7916d3cd HDFS-12250. Reduce usage of FsPermissionExtension in unit tests. Contributed by Chris Douglas. 2017-08-17 09:35:36 -07:00
Sean Mackrory 1a1bf6b7d0 HADOOP-13595. Rework hadoop_usage to be broken up by clients/daemons/etc. Contributed by Allen Wittenauer. 2017-08-02 12:25:05 -06:00