234 Commits

Author SHA1 Message Date
Andrew Olson c15b3bca86
HADOOP-16294: Enable access to input options by DistCp subclasses.
Adding a protected-scope getter for the DistCpOptions, so that a subclass does
not need to save its own copy of the inputOptions supplied to its constructor,
if it wishes to override the createInputFileListing method with logic similar
to the original implementation, i.e. calling CopyListing#buildListing with a path and input options.

Author:    Andrew Olson
2019-05-16 16:11:12 +02:00
Giovanni Matteo Fumarola 7a3188d054 HADOOP-16282. Avoid FileStream to improve performance. Contributed by Ayush Saxena. 2019-05-02 12:58:42 -07:00
Masatake Iwasaki bbdbc7a9a1 HADOOP-14544. DistCp documentation for command line options is misaligned. Contributed by Masatake Iwasaki. 2019-04-12 11:52:18 +09:00
Siyao Meng ce4bafdf44
HADOOP-16037. DistCp: Document usage of Sync (-diff option) in detail.
Contributed by Siyao Meng
2019-03-26 18:42:54 +00:00
Andrew Olson faba3591d3
HADOOP-16147. Allow CopyListing sequence file keys and values to be more easily customized.
Author:    Andrew Olson
2019-03-22 10:35:30 +00:00
Ranith Sardar 546c5d70ef
HADOOP-16032. Distcp should clear sub directory ACL before applying new ACL on. 2019-02-07 21:48:07 +00:00
Andrew Olson de804e53b9
HADOOP-15281. Distcp to add no-rename copy option.
Contributed by Andrew Olson.
2019-02-07 10:07:22 +00:00
Akira Ajisaka 1129288cf5
HADOOP-14178. Move Mockito up to version 2.23.4. Contributed by Akira Ajisaka and Masatake Iwasaki. 2019-01-29 18:29:56 -08:00
Giovanni Matteo Fumarola fb8932a727 HADOOP-16029. Consecutive StringBuilder.append can be reused. Contributed by Ayush Saxena. 2019-01-11 10:54:49 -08:00
Kai Xie 188bebbe7e HADOOP-16018. DistCp won't reassemble chunks when blocks per chunk > 0.
Contributed by Kai Xie.
2019-01-08 11:57:57 +00:00
Akira Ajisaka 7f78397036
Revert "HADOOP-14556. S3A to support Delegation Tokens."
This reverts commit d7152332b3.
2019-01-08 14:51:30 +09:00
Steve Loughran d7152332b3
HADOOP-14556. S3A to support Delegation Tokens.
Contributed by Steve Loughran.
2019-01-07 13:18:03 +00:00
Arpit Agarwal 914b0cf15f HADOOP-12558. distcp documentation is woefully out of date. Contributed by Dinesh Chitlangia. 2018-11-15 13:58:13 -08:00
Ted Yu e2cecb681e HADOOP-15850. CopyCommitter#concatFileChunks should check that the blocks per chunk is not 0. Contributed by Ted Yu.
Signed-off-by: Wei-Chiu Chuang <weichiu@apache.org>
2018-10-19 13:21:06 -07:00
Steve Loughran e36ae9639f
HADOOP-15831. Include modificationTime in the toString method of CopyListingFileStatus.
Contributed by Ted Yu.
2018-10-12 09:59:19 +01:00
Sunil G 58fa96b697 Changed version in trunk to 3.3.0-SNAPSHOT. 2018-10-02 22:41:41 +05:30
Surendra Singh Lilhore 96c4575d73 HDFS-13805. Journal Nodes should allow to format non-empty directories with -force option. Contributed by Surendra Singh Lilhore. 2018-08-24 08:14:57 +05:30
Akira Ajisaka 3e3963b035
HADOOP-15552. Move logging APIs over to slf4j in hadoop-tools - Part2. Contributed by Ian Pickering. 2018-08-16 00:31:59 +09:00
Steve Loughran ca8b80bf59
HADOOP-15384. distcp numListstatusThreads option doesn't get to -delete scan.
Contributed by Steve Loughran.
2018-07-10 10:43:59 +01:00
Akira Ajisaka 2b2399d623
HADOOP-15495. Upgrade commons-lang version to 3.7 in hadoop-common-project and hadoop-tools. Contributed by Takanobu Asanuma. 2018-06-28 14:37:22 +09:00
Xiao Chen 7c9cdad6d0 HDFS-13056. Expose file-level composite CRCs in HDFS which are comparable across different instances/layouts. Contributed by Dennis Huo. 2018-04-10 21:31:48 -07:00
Steve Loughran 1976e0066e HADOOP-15209. DistCp to eliminate needless deletion of files under already-deleted directories.
Contributed by Steve Loughran.
2018-03-15 18:05:14 +00:00
Chris Douglas 45cccadd2e HDFS-12780. Fix spelling mistake in DistCpUtils.java. Contributed by Jianfei Jiang 2018-03-13 11:08:11 -07:00
Steve Loughran 7ef4d942dd HADOOP-15273. distcp can't handle remote stores with different checksum algorithms.
Contributed by Steve Loughran.
2018-03-08 11:24:06 +00:00
Steve Loughran 3bd6b1fd85 HADOOP-15292. Distcp's use of pread is slowing it down.
Contributed by Virajith Jalaparti.
2018-03-08 11:15:46 +00:00
fang zhenyi 4d4dde5112
HADOOP-15223. Replace Collections.EMPTY* with empty* when available
Signed-off-by: Akira Ajisaka <aajisaka@apache.org>
2018-02-18 22:19:39 +09:00
Wangda Tan 60f9e60b3b Preparing for 3.2.0 development
Change-Id: I6d0e01f3d665d26573ef2b957add1cf0cddf7938
2018-02-11 11:17:38 +08:00
Anu Engineer 4304fcd5bd HDFS-12990. Change default NameNode RPC port back to 8020. Contributed by Xiao Chen. 2018-02-06 13:43:45 -08:00
Arpit Agarwal d4e13a4647 HADOOP-15198. Correct the spelling in CopyFilter.java. Contributed by Mukul Kumar Singh. 2018-02-02 11:37:51 -08:00
Surendra Singh Lilhore 00129c5314 HDFS-12833. Distcp : Update the usage of delete option for dependency with update and overwrite option. Contributed by usharani. 2017-12-12 00:28:02 +05:30
Akira Ajisaka cc3f3eca40
MAPREDUCE-6999. Fix typo onf in DynamicInputChunk.java. Contributed by fang zhenyi. 2017-11-02 18:32:24 +09:00
Steve Loughran f36cbc8475 HADOOP-14942. DistCp#cleanup() should check whether jobFS is null.
Contributed by Andras Bokor.
2017-10-20 22:27:04 +01:00
ChenSammi e0b3c644e1 HDFS-12414. Ensure to use CLI command to enable/disable erasure coding policy. Contributed by Sammi Chen 2017-09-14 09:15:29 +08:00
Xiaoyu Yao 63720ef574 HADOOP-14839. DistCp log output should contain copied and deleted files and directories. Contributed by Yiqun Lin. 2017-09-05 23:34:55 -07:00
Andrew Wang 0d419c984f Preparing for 3.1.0 development 2017-09-01 11:53:48 -07:00
Andrew Wang f29a0fc288 HDFS-12303. Change default EC cell size to 1MB for better performance. Contributed by Wei Zhou. 2017-08-25 14:14:23 -07:00
Andrew Wang dd7916d3cd HDFS-12250. Reduce usage of FsPermissionExtension in unit tests. Contributed by Chris Douglas. 2017-08-17 09:35:36 -07:00
Sean Mackrory 1a1bf6b7d0 HADOOP-13595. Rework hadoop_usage to be broken up by clients/daemons/etc. Contributed by Allen Wittenauer. 2017-08-02 12:25:05 -06:00
Wei-Chiu Chuang 44350fdf49 HADOOP-14557. Document HADOOP-8143 (Change distcp to have -pb on by default). Contributed by Bharat Viswanadham. 2017-07-20 18:23:13 -07:00
Andrew Wang af2773f609 Updating version for 3.0.0-beta1 development 2017-06-29 17:57:40 -07:00
Jason Lowe dd65eea74b HADOOP-8143. Change distcp to have -pb on by default. Contributed by Mithun Radhakrishnan 2017-06-20 09:53:47 -05:00
Andrew Wang 16ad896d5c Update maven version for 3.0.0-alpha4 development 2017-05-26 14:09:44 -07:00
Sunil G b6f66b0da1 YARN-6584. Correct license headers in hadoop-common, hdfs, yarn and mapreduce. Contributed by Yeliang Cang. 2017-05-22 14:10:06 +05:30
Yongjun Zhang b4adc8392c HADOOP-14407. DistCp - Introduce a configurable copy buffer size. (Omkar Aradhya K S via Yongjun Zhang) 2017-05-18 15:35:22 -07:00
Mingliang Liu 26172a94d6 HADOOP-14267. Make DistCpOptions immutable. Contributed by Mingliang Liu 2017-03-31 20:04:26 -07:00
Yongjun Zhang bf3fb585aa HADOOP-11794. Enable distcp to copy blocks in parallel. Contributed by Yongjun Zhang, Wei-Chiu Chuang, Xiao Chen, Rosie Li. 2017-03-30 17:38:56 -07:00
Yongjun Zhang 144f1cf765 Revert "HADOOP-11794. Enable distcp to copy blocks in parallel. Contributed by Yongjun Zhang, Wei-Chiu Chuang, Xiao Chen."
This reverts commit 064c8b25ec.
2017-03-30 17:38:18 -07:00
Yongjun Zhang 064c8b25ec HADOOP-11794. Enable distcp to copy blocks in parallel. Contributed by Yongjun Zhang, Wei-Chiu Chuang, Xiao Chen. 2017-03-30 17:01:15 -07:00
Wei-Chiu Chuang 8c591b8d19 HDFS-10974. Document replication factor for EC files. Contributed by Yiqun Lin. 2017-03-30 11:16:05 -07:00
Andrew Wang 0e6f8e4bc6 HDFS-10971. Distcp should not copy replication factor if source file is erasure coded. Contributed by Manoj Govindassamy. 2017-03-28 22:14:03 -07:00
Yongjun Zhang d235dcdf0b HADOOP-14127. Add log4j configuration to enable logging in hadoop-distcp's tests. (Xiao Chen via Yongjun Zhang) 2017-02-27 20:42:13 -08:00
Andrew Wang 5d8b80ea9b Preparing for 3.0.0-alpha3 development 2017-01-19 15:50:07 -08:00
Steve Loughran ed33ce11dd HADOOP-13496. Include file lengths in Mismatch in length error for distcp. Contributed by Ted Yu
(cherry picked from commit 77401bd5fc)
2017-01-19 11:25:40 +00:00
Chris Nauroth 4c8f9e1302 HDFS-9483. Documentation does not cover use of "swebhdfs" as URL scheme for SSL-secured WebHDFS. Contributed by Surendra Singh Lilhore. 2017-01-05 15:04:47 -08:00
Akira Ajisaka 209e805430 HADOOP-13506. Redundant groupid warning in child projects. Contributed by Kai Sasaki. 2016-11-28 14:34:57 +09:00
Mingliang Liu beb70fed4f HADOOP-13655. document object store use with fs shell and distcp. Contributed by Steve Loughran
This closes #131
2016-11-22 13:12:23 -08:00
Mingliang Liu 5af572b644 HADOOP-13427. Eliminate needless uses of FileSystem#{exists(), isFile(), isDirectory()}. Contributed by Steve Loughran and Mingliang Liu 2016-11-15 10:57:00 -08:00
Masatake Iwasaki 0bdd263d82 HADOOP-13017. Implementations of InputStream.read(buffer, offset, bytes) to exit 0 if bytes==0. Contributed by Steve Loughran. 2016-10-27 15:46:59 +09:00
Yongjun Zhang 0f0c15f7a5 HDFS-11040. Add documentation for HDFS-9820 distcp improvement. Contributed by Yongjun Zhang. 2016-10-25 12:25:40 -07:00
Yongjun Zhang 3a60573039 Revert "Fix HDFS-11040"
This reverts commit 54c1815790.
2016-10-25 12:25:02 -07:00
Yongjun Zhang 54c1815790 Fix HDFS-11040 2016-10-25 12:19:34 -07:00
Chris Douglas a1a0281e12 HADOOP-13626. Remove distcp dependency on FileStatus serialization 2016-10-24 12:46:54 -07:00
Yongjun Zhang 8650cc84f2 HDFS-9820. Improve distcp to support efficient restore to an earlier snapshot. Contributed by Yongjun Zhang. 2016-10-19 17:37:54 -07:00
Xiao Chen efdf810cf9 HADOOP-7352. FileSystem#listStatus should throw IOE upon access error. Contributed by John Zhuge. 2016-10-18 18:18:43 -07:00
Yongjun Zhang 0bc6d37f3c Revert "HDFS-9820. Improve distcp to support efficient restore to an earlier snapshot. Contributed by Yongjun Zhang."
This reverts commit 412c4c9a34.
2016-10-17 22:47:37 -07:00
Yongjun Zhang 412c4c9a34 HDFS-9820. Improve distcp to support efficient restore to an earlier snapshot. Contributed by Yongjun Zhang. 2016-10-17 11:04:42 -07:00
Jing Zhao 0a85d07983 HADOOP-13024. Distcp with -delete feature on raw data not implemented. Contributed by Mavin Martin. 2016-10-13 13:24:54 -07:00
Brahma Reddy Battula e17a4970be HDFS-9885. Correct the distcp counters name while displaying counters. Contributed by Surendra Singh Lilhore 2016-09-27 10:45:12 +05:30
Steve Loughran e5ef51e717 HADOOP-13643. Math error in AbstractContractDistCpTest. Contributed by Aaron Fabbri. 2016-09-23 10:01:30 +01:00
Chris Nauroth 98bdb51397 HADOOP-13169. Randomize file list in SimpleCopyListing. Contributed by Rajesh Balamohan. 2016-09-19 15:16:47 -07:00
Allen Wittenauer 58ed4fa544 HADOOP-13341. Deprecate HADOOP_SERVERNAME_OPTS; replace with (command)_(subcommand)_OPTS
This commit includes the following changes:

	HADOOP-13356. Add a function to handle command_subcommand_OPTS
	HADOOP-13355. Handle HADOOP_CLIENT_OPTS in a function
	HADOOP-13554. Add an equivalent of hadoop_subcmd_opts for secure opts
	HADOOP-13562. Change hadoop_subcommand_opts to use only uppercase
	HADOOP-13358. Modify HDFS to use hadoop_subcommand_opts
	HADOOP-13357. Modify common to use hadoop_subcommand_opts
	HADOOP-13359. Modify YARN to use hadoop_subcommand_opts
	HADOOP-13361. Modify hadoop_verify_user to be consistent with hadoop_subcommand_opts (ie more granularity)
	HADOOP-13564. modify mapred to use hadoop_subcommand_opts
	HADOOP-13563. hadoop_subcommand_opts should print name not actual content during debug
	HADOOP-13360. Documentation for HADOOP_subcommand_OPTS

This closes apache/hadoop#126
2016-09-12 11:10:00 -07:00
Ravi Prakash 9faccd1046 HADOOP-13587. distcp.map.bandwidth.mb is overwritten even when -bandwidth flag isn't set. Contributed by Zoran Dimitrijevic 2016-09-12 08:26:08 -07:00
Andrew Wang da456ffd62 Preparing for 3.0.0-alpha2 development 2016-07-15 19:04:17 -07:00
Andrew Wang f292624bd8 HDFS-10300. TestDistCpSystem should share MiniDFSCluster. Contributed by John Zhuge. 2016-07-11 18:06:28 -07:00
Yongjun Zhang 8113855b3a HDFS-10396. Using -diff option with DistCp may get "Comparison method violates its general contract" exception. Contributed by Yongjun Zhang. 2016-06-28 23:15:13 -07:00
Allen Wittenauer 422c73a865 HADOOP-13034. Log message about input options in distcp lacks some items (Takashi Ohnishi via aw) 2016-06-28 07:21:04 -07:00
Yongjun Zhang cfb860dee7 HADOOP-13199. Add doc for distcp -filters. (John Zhuge via Yongjun Zhang) 2016-05-26 23:30:31 -07:00
Steve Loughran c918286b17 HADOOP-13145 In DistCp, prevent unnecessary getFileStatus call when not preserving metadata. Contributed by Chris Nauroth. 2016-05-20 12:21:59 +01:00
Jing Zhao 03788d3015 HDFS-10397. Distcp should ignore -delete option if -diff option is provided instead of exiting. Contributed by Mingliang Liu. 2016-05-17 15:46:30 -07:00
Steve Loughran c69a649257 HADOOP-13163 Reuse pre-computed filestatus in Distcp-CopyMapper (Rajesh Balamohan via stevel) 2016-05-17 13:00:18 +01:00
Allen Wittenauer 730bc746f9 HADOOP-12930. Dynamic subcommands for hadoop shell scripts (aw)
This commit contains the following JIRA issues:

    HADOOP-12931. bin/hadoop work for dynamic subcommands
    HADOOP-12932. bin/yarn work for dynamic subcommands
    HADOOP-12933. bin/hdfs work for dynamic subcommands
    HADOOP-12934. bin/mapred work for dynamic subcommands
    HADOOP-12935. API documentation for dynamic subcommands
    HADOOP-12936. modify hadoop-tools to take advantage of dynamic subcommands
    HADOOP-13086. enable daemonization of dynamic commands
    HADOOP-13087. env var doc update for dynamic commands
    HADOOP-13088. fix shellprofiles in hadoop-tools to allow replacement
    HADOOP-13089. hadoop distcp adds client opts twice when dynamic
    HADOOP-13094. hadoop-common unit tests for dynamic commands
    HADOOP-13095. hadoop-hdfs unit tests for dynamic commands
    HADOOP-13107. clean up how rumen is executed
    HADOOP-13108. dynamic subcommands need a way to manipulate arguments
    HADOOP-13110. add a streaming subcommand to mapred
    HADOOP-13111. convert hadoop gridmix to be dynamic
    HADOOP-13115. dynamic subcommand docs should talk about exit vs. continue program flow
    HADOOP-13117. clarify daemonization and security vars for dynamic commands
    HADOOP-13120. add a --debug message when dynamic commands have been used
    HADOOP-13121. rename sub-project shellprofiles to match the rest of Hadoop
    HADOOP-13129. fix typo in dynamic subcommand docs
    HADOOP-13151. Underscores should be escaped in dynamic subcommands document
    HADOOP-13153. fix typo in debug statement for dynamic subcommands
2016-05-16 17:54:45 -07:00
Chris Nauroth b9685e85d5 HADOOP-13148. TestDistCpViewFs to include IOExceptions in test error reports. Contributed by Steve Loughran. 2016-05-16 11:53:17 -07:00
Andrew Wang 3c5c57af28 HADOOP-13142. Change project version from 3.0.0 to 3.0.0-alpha1. 2016-05-12 18:27:28 -07:00
Andrew Wang ca5613af91 Revert "Update project version to 3.0.0-alpha1-SNAPSHOT."
This reverts commit 6b53802cba.
2016-05-12 15:32:45 -07:00
Andrew Wang 6b53802cba Update project version to 3.0.0-alpha1-SNAPSHOT. 2016-05-12 11:05:05 -07:00
Jing Zhao af942585a1 HADOOP-12469. distcp should not ignore the ignoreFailures option. Contributed by Mingliang Liu. 2016-05-04 10:23:04 -07:00
Yongjun Zhang 959a28dd12 HDFS-10313. Distcp need to enforce the order of snapshot names passed to -diff. (Lin Yiqun via Yongjun Zhang) 2016-04-26 16:08:03 -07:00
Akira Ajisaka 02c51c27d9 HDFS-10298. Document the usage of distcp -diff option. Contributed by Takashi Ohnishi. 2016-04-25 22:33:09 +09:00
Jing Zhao 63e5412f1a HDFS-9427. HDFS should not default to ephemeral ports. Contributed by Xiaobing Zhou. 2016-04-22 15:14:40 -07:00
Yongjun Zhang a749ba0cea HDFS-9670. DistCp throws NPE when source is root. (John Zhuge via Yongjun Zhang) 2016-04-21 12:17:17 -07:00
Jing Zhao 404f57f328 HDFS-10216. Distcp -diff throws exception when handling relative path. Contributed by Takashi Ohnishi. 2016-04-14 10:35:22 -07:00
Akira Ajisaka 18c7e58283 HDFS-9640. Remove hsftp from DistCp in trunk. Contributed by Wei-Chiu Chuang. 2016-03-28 15:32:38 +09:00
Allen Wittenauer 738155063e HADOOP-12857. rework hadoop-tools (aw) 2016-03-23 13:46:38 -07:00
Masatake Iwasaki 33a412e8a4 HDFS-9048. DistCp documentation is out-of-dated (Daisuke Kobayashi via iwasakims) 2016-03-03 18:57:23 +09:00
Yongjun Zhang ba1c9d484a HDFS-9764. DistCp doesn't print value for several arguments including -numListstatusThreads. (Wei-Chiu Chuang via Yongjun Zhang) 2016-02-19 10:17:37 -08:00
Yongjun Zhang eddd823cd6 HDFS-9638. Improve DistCp Help and documentation. (Wei-Chiu Chuang via Yongjun Zhang) 2016-01-29 12:11:55 -08:00
Yongjun Zhang a9c69ebeb7 HDFS-9612. DistCp worker threads are not terminated after jobs are done. (Wei-Chiu Chuang via Yongjun Zhang) 2016-01-15 10:03:09 -08:00
Colin Patrick Mccabe 8315582c4f HDFS-9517. Fix missing @Test annotation on TestDistCpUtils.testUnpackAttributes (Wei-Chiu Chuang via cmccabe) 2016-01-13 16:28:06 -08:00
Xiaoyu Yao c2e2e13455 HDFS-8584. NPE in distcp when ssl configuration file does not exist in class path. Contributed by Surendra Singh Lilhore. 2016-01-11 17:08:26 -08:00
Zhe Zhang 95f32015ad HDFS-9630. DistCp minor refactoring and clean up. Contributed by Kai Zheng.
Change-Id: I363c4ffcac32116ddcdc0a22fac3db92f14a0db0
2016-01-11 09:46:56 -08:00