These changes ensure that sequential files are opened with the
right read policy, and that the split start/end is passed in.
As well as offering opportunities for filesystem clients to
choose fetch/cache/seek policies, the settings ensure that
text files on an S3 bucket where the default policy
is "random" will still be processed efficiently.
This commit depends on the associated hadoop-common patch,
which must be committed first.
Contributed by Steve Loughran.
Change-Id: Ic6713fd752441cf42ebe8739d05c2293a5db9f94
Removed findbugs from the hadoop build images and added spotbugs instead.
Upgraded SpotBugs to 4.2.2 and spotbugs-maven-plugin to 4.2.0.
Reviewed-by: Masatake Iwasaki <iwasakims@apache.org>
(cherry picked from commit 23b343aed1)
Conflicts:
dev-support/docker/Dockerfile
hadoop-project/pom.xml
S3A to implement S3 Select through this API.
The new openFile() API is asynchronous, and implemented across FileSystem and FileContext.
The MapReduce V2 inputs are moved to this API, and jobs can now set must/may
options to pass in.
This is more useful for setting things like the S3A seek policy than for S3 Select,
as the existing input format/record readers can't handle S3 Select output where
the stream is shorter than the file length, and splitting plain text is suboptimal.
Future work is needed there.
In the meantime, any/all filesystem connectors are now free to add their own filesystem-specific
configuration parameters which can be set in jobs and used to set filesystem input stream
options (seek policy, retry, encryption secrets, etc.).
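The difference between must and may options can be sketched with a toy model.
This is not Hadoop's builder code: the class OpenOptionsSketch, its resolve()
method and the "example.*" keys are all hypothetical, and the sketch only
assumes that a "may" option is a hint a connector can ignore, while a "must"
option has to fail the open if the connector does not recognise it.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Set;

/** Toy model of must/may open options; hypothetical, not a Hadoop class. */
final class OpenOptionsSketch {
    private final Map<String, String> optional = new LinkedHashMap<>();
    private final Map<String, String> mandatory = new LinkedHashMap<>();

    /** "May" option: a hint the connector is free to ignore. */
    OpenOptionsSketch opt(String key, String value) {
        optional.put(key, value);
        return this;
    }

    /** "Must" option: the open has to fail if the connector does not know it. */
    OpenOptionsSketch must(String key, String value) {
        mandatory.put(key, value);
        return this;
    }

    /**
     * Resolve the options against the keys a (hypothetical) connector supports.
     * Unknown optional keys are dropped; unknown mandatory keys are rejected.
     */
    Map<String, String> resolve(Set<String> supportedKeys) {
        Map<String, String> effective = new LinkedHashMap<>();
        for (Map.Entry<String, String> e : optional.entrySet()) {
            if (supportedKeys.contains(e.getKey())) {
                effective.put(e.getKey(), e.getValue());
            }
        }
        for (Map.Entry<String, String> e : mandatory.entrySet()) {
            if (!supportedKeys.contains(e.getKey())) {
                throw new IllegalArgumentException(
                    "Unsupported mandatory option: " + e.getKey());
            }
            effective.put(e.getKey(), e.getValue());
        }
        return effective;
    }
}
```

With this model, a job-wide setting such as a seek policy can safely be passed
as a "may" option to every connector, while something the job cannot run
without would be passed as a "must" option.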
Contributed by Steve Loughran
This commit includes the following changes:
HADOOP-13356. Add a function to handle command_subcommand_OPTS
HADOOP-13355. Handle HADOOP_CLIENT_OPTS in a function
HADOOP-13554. Add an equivalent of hadoop_subcmd_opts for secure opts
HADOOP-13562. Change hadoop_subcommand_opts to use only uppercase
HADOOP-13358. Modify HDFS to use hadoop_subcommand_opts
HADOOP-13357. Modify common to use hadoop_subcommand_opts
HADOOP-13359. Modify YARN to use hadoop_subcommand_opts
HADOOP-13361. Modify hadoop_verify_user to be consistent with hadoop_subcommand_opts (i.e. more granularity)
HADOOP-13564. modify mapred to use hadoop_subcommand_opts
HADOOP-13563. hadoop_subcommand_opts should print name not actual content during debug
HADOOP-13360. Documentation for HADOOP_subcommand_OPTS
This closes apache/hadoop#126
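The command_subcommand_OPTS naming handled by the issues above can be
illustrated with a small sketch. The real logic lives in the Hadoop shell
functions; the Java class and method names here are hypothetical, and the
sketch assumes only that the variable name is the uppercased command and
subcommand joined with underscores, and that its value is appended to the
options already in effect.

```java
import java.util.Locale;
import java.util.Map;

/** Hypothetical illustration of command_subcommand_OPTS lookup. */
final class SubcommandOptsSketch {

    /** Build the variable name from command and subcommand, uppercase only. */
    static String optsVarName(String command, String subcommand) {
        return (command + "_" + subcommand + "_OPTS").toUpperCase(Locale.ROOT);
    }

    /**
     * Append the per-subcommand options, if set in env, to the base options.
     * Appending (rather than replacing) is an assumption of this sketch.
     */
    static String effectiveOpts(Map<String, String> env, String baseOpts,
                                String command, String subcommand) {
        String extra = env.get(optsVarName(command, subcommand));
        if (extra == null || extra.isEmpty()) {
            return baseOpts;
        }
        return baseOpts.isEmpty() ? extra : baseOpts + " " + extra;
    }
}
```

Under these assumptions, the pair ("hdfs", "namenode") maps to the variable
HDFS_NAMENODE_OPTS, so options can be set per subcommand instead of once for
the whole command.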
This commit contains the following JIRA issues:
HADOOP-12931. bin/hadoop work for dynamic subcommands
HADOOP-12932. bin/yarn work for dynamic subcommands
HADOOP-12933. bin/hdfs work for dynamic subcommands
HADOOP-12934. bin/mapred work for dynamic subcommands
HADOOP-12935. API documentation for dynamic subcommands
HADOOP-12936. modify hadoop-tools to take advantage of dynamic subcommands
HADOOP-13086. enable daemonization of dynamic commands
HADOOP-13087. env var doc update for dynamic commands
HADOOP-13088. fix shellprofiles in hadoop-tools to allow replacement
HADOOP-13089. hadoop distcp adds client opts twice when dynamic
HADOOP-13094. hadoop-common unit tests for dynamic commands
HADOOP-13095. hadoop-hdfs unit tests for dynamic commands
HADOOP-13107. clean up how rumen is executed
HADOOP-13108. dynamic subcommands need a way to manipulate arguments
HADOOP-13110. add a streaming subcommand to mapred
HADOOP-13111. convert hadoop gridmix to be dynamic
HADOOP-13115. dynamic subcommand docs should talk about exit vs. continue program flow
HADOOP-13117. clarify daemonization and security vars for dynamic commands
HADOOP-13120. add a --debug message when dynamic commands have been used
HADOOP-13121. rename sub-project shellprofiles to match the rest of Hadoop
HADOOP-13129. fix typo in dynamic subcommand docs
HADOOP-13151. Underscores should be escaped in dynamic subcommands document
HADOOP-13153. fix typo in debug statement for dynamic subcommands