HADOOP-15642. Update aws-sdk version to 1.11.375.
Contributed by Steve Loughran.

parent 8990eaf592
commit 0e832e7a74

@@ -142,7 +142,7 @@
     <make-maven-plugin.version>1.0-beta-1</make-maven-plugin.version>
     <native-maven-plugin.version>1.0-alpha-8</native-maven-plugin.version>
     <surefire.fork.timeout>900</surefire.fork.timeout>
-    <aws-java-sdk.version>1.11.271</aws-java-sdk.version>
+    <aws-java-sdk.version>1.11.375</aws-java-sdk.version>
     <hsqldb.version>2.3.4</hsqldb.version>
     <frontend-maven-plugin.version>1.5</frontend-maven-plugin.version>
     <!-- the version of Hadoop declared in the version resources; can be overridden

@@ -1081,3 +1081,160 @@ thorough test, by switching to the credentials provider.

The usual credentials needed to log in to the bucket will be used, but now
the credentials used to interact with S3 and DynamoDB will be temporary
role credentials, rather than the full credentials.

## <a name="qualifiying_sdk_updates"></a> Qualifying an AWS SDK Update

Updating the AWS SDK is something which does need to be done regularly,
but is rarely without complications, major or minor.

Assume that the version of the SDK will remain constant for an X.Y release,
excluding security fixes, so it's good to have an update before each release,
as long as that update doesn't trigger any regressions.

1. Don't make this a last-minute action.
1. The upgrade patch should focus purely on the SDK update, so it can be
   cherry-picked and reverted easily.
1. Do not mix in an SDK update with any other piece of work, for the same reason.
1. Plan for an afternoon's work, including before/after testing, log analysis
   and any manual tests.
1. Make sure all the integration tests are running (including s3guard, ARN, encryption, scale)
   *before you start the upgrade*.
1. Create a JIRA for updating the SDK. Don't include the version (yet),
   as it may take a couple of SDK updates before it is ready.
1. Identify the latest AWS SDK [available for download](https://aws.amazon.com/sdk-for-java/).
1. Create a private git branch of trunk for the JIRA, and in
   `hadoop-project/pom.xml` update the `aws-java-sdk.version` to the new SDK version.
1. Do a clean build and rerun all the `hadoop-aws` tests, with and without the
   `-Ds3guard -Ddynamodb` options. This includes the `-Pscale` set, with a role
   defined in `fs.s3a.assumed.role.arn` for the assumed role tests, and
   `fs.s3a.server-side-encryption.key` set for encryption, for full coverage.
   If you can, scale up the scale tests. (A rough sketch of the overall command
   sequence is shown after this list.)
1. Create the site with `mvn site -DskipTests`; look in `target/site` for the report.
1. Review *every single* `-output.txt` file in `hadoop-tools/hadoop-aws/target/failsafe-reports`,
   paying particular attention to
   `org.apache.hadoop.fs.s3a.scale.ITestS3AInputStreamPerformance-output.txt`,
   as that is where changes in stream close/abort logic will surface.
1. Run `mvn install` to install the artifacts, then in
   `hadoop-cloud-storage-project/hadoop-cloud-storage` run
   `mvn dependency:tree -Dverbose > target/dependencies.txt`.
   Examine the `target/dependencies.txt` file to verify that no new
   artifacts have unintentionally been declared as dependencies
   of the shaded `aws-java-sdk-bundle` artifact.

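The exact test flags depend on your own S3Guard, role and encryption setup, so
treat the following as a rough sketch of the sequence above, assuming the usual
`hadoop-aws` test configuration is already in place; the `grep` at the end is
just one way of skimming the output files, not a required step.

```bash
# Rough sketch only: assumes aws-java-sdk.version has already been bumped in
# hadoop-project/pom.xml and the hadoop-aws test configuration is in place.

# clean build of the whole project with the new SDK
mvn clean install -DskipTests

# rerun the hadoop-aws integration tests, without and then with S3Guard
cd hadoop-tools/hadoop-aws
mvn verify
mvn verify -Ds3guard -Ddynamodb

# skim the failsafe output files for anything suspicious
grep -l -E "WARN|Exception" target/failsafe-reports/*-output.txt

# site report, then the dependency tree of the shaded bundle
cd ../..
mvn site -DskipTests
cd hadoop-cloud-storage-project/hadoop-cloud-storage
mvn dependency:tree -Dverbose > target/dependencies.txt
```
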
### Basic command line regression testing

We need a run-through of the CLI to see if there have been changes there
which cause problems, especially whether new log messages have surfaced,
or whether some packaging change breaks that CLI.

From the root of the project, create a command line release:
`mvn package -Pdist -DskipTests -Dmaven.javadoc.skip=true -DskipShade`.

1. Change into the `hadoop-dist/target/hadoop-x.y.z-SNAPSHOT` dir.
1. Copy a `core-site.xml` file into `etc/hadoop`.
1. Set the `HADOOP_OPTIONAL_TOOLS` env var on the command line or in `~/.hadoop-env`.

```bash
export HADOOP_OPTIONAL_TOOLS="hadoop-aws"
```

Run some basic s3guard commands as well as file operations.

```bash
export BUCKET=s3a://example-bucket-name

# S3Guard metadata store operations: bucket status, capacity, uploads, diff, prune, import
bin/hadoop s3guard bucket-info $BUCKET
bin/hadoop s3guard set-capacity $BUCKET
bin/hadoop s3guard set-capacity -read 15 -write 15 $BUCKET
bin/hadoop s3guard uploads $BUCKET
bin/hadoop s3guard diff $BUCKET/
bin/hadoop s3guard prune -minutes 10 $BUCKET/
bin/hadoop s3guard import $BUCKET/

# basic filesystem operations
bin/hadoop fs -ls $BUCKET/
bin/hadoop fs -ls $BUCKET/file
bin/hadoop fs -rm -R -f $BUCKET/dir-no-trailing
bin/hadoop fs -rm -R -f $BUCKET/dir-trailing/
bin/hadoop fs -rm $BUCKET/
bin/hadoop fs -touchz $BUCKET/file
# expect I/O error as root dir is not empty
bin/hadoop fs -rm -r $BUCKET/
bin/hadoop fs -rm -r $BUCKET/\*
# now success
bin/hadoop fs -rm -r $BUCKET/

bin/hadoop fs -mkdir $BUCKET/dir-no-trailing
# fails with S3Guard
bin/hadoop fs -mkdir $BUCKET/dir-trailing/
bin/hadoop fs -touchz $BUCKET/file
bin/hadoop fs -ls $BUCKET/
bin/hadoop fs -mv $BUCKET/file $BUCKET/file2
# expect "No such file or directory"
bin/hadoop fs -stat $BUCKET/file
bin/hadoop fs -stat $BUCKET/file2
bin/hadoop fs -mv $BUCKET/file2 $BUCKET/dir-no-trailing
bin/hadoop fs -stat $BUCKET/dir-no-trailing/file2
# treated the same as the file stat
bin/hadoop fs -stat $BUCKET/dir-no-trailing/file2/
bin/hadoop fs -ls $BUCKET/dir-no-trailing/file2/
bin/hadoop fs -ls $BUCKET/dir-no-trailing
# expect a "0" here:
bin/hadoop fs -test -d $BUCKET/dir-no-trailing; echo $?
# expect a "1" here:
bin/hadoop fs -test -d $BUCKET/dir-no-trailing/file2; echo $?
# will return NONE unless bucket has checksums enabled
bin/hadoop fs -checksum $BUCKET/dir-no-trailing/file2
# expect "etag" + a long string
bin/hadoop fs -D fs.s3a.etag.checksum.enabled=true -checksum $BUCKET/dir-no-trailing/file2
```

### Other tests

* Whatever applications you have which use S3A: build and run them before the upgrade,
  then see if they complete successfully in roughly the same time once the upgrade is applied.
* Test any third-party endpoints you have access to.
* Try different regions (especially a v4-only region), and encryption settings.
* Any performance tests you have can identify slowdowns, which can be a sign
  of changed behavior in the SDK (especially on stream reads and writes).
* If you can, try to test in an environment where a proxy is needed to talk
  to AWS services. (A sketch of overriding the relevant settings per command
  is shown after this list.)
* Try and get other people, especially anyone with their own endpoints,
  apps or different deployment environments, to run their own tests.

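To cover the different-region and proxy cases from the command line, the
relevant S3A options can be overridden per command; the endpoint, proxy host
and port below are purely illustrative placeholders for whatever your
environment actually uses.

```bash
# Hypothetical values: substitute your own endpoint, proxy host and port.
# Assumes BUCKET is still set as in the regression-test block above.
bin/hadoop fs \
  -D fs.s3a.endpoint=s3.eu-central-1.amazonaws.com \
  -ls $BUCKET/

bin/hadoop fs \
  -D fs.s3a.proxy.host=proxy.example.com \
  -D fs.s3a.proxy.port=8080 \
  -ls $BUCKET/
```
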
### Dealing with Deprecated APIs and New Features

A Jenkins run should tell you if there are new deprecations.
If so, you should think about how to deal with them.

Moving to methods and APIs which weren't in the previous SDK release makes it
harder to roll back if there is a problem; but there may be good reasons
for the deprecation.

At the same time, there may be good reasons for staying with the old code.

* AWS have embraced the builder pattern for new operations; note that objects
  constructed this way often have their (existing) setter methods disabled; this
  may break existing code.
* New versions of S3 calls (list v2, bucket existence checks, bulk operations)
  may be better than the previous HTTP operations & APIs, but they may not work with
  third-party endpoints, so can only be adopted if made optional, which then
  adds a new configuration option (with docs, testing, ...). A change like that
  must be done in its own patch, with its new tests which compare the old
  vs new operations.

### Committing the patch

When the patch is committed: update the JIRA to the version number actually
used; use that title in the commit message.

Be prepared to roll back, reiterate or code your way out of a regression.

There may be some problem which surfaces with wider use, which can get
fixed in a new AWS release, by rolling back to an older one,
or just worked around ([HADOOP-14596](https://issues.apache.org/jira/browse/HADOOP-14596)).

Don't be surprised if this happens, don't worry too much, and,
while that rollback option is there to be used, ideally try to work forwards.

If the problem is with the SDK, file issues with the
[AWS SDK Bug tracker](https://github.com/aws/aws-sdk-java/issues).
If the problem can be fixed or worked around in the Hadoop code, do it there too.