From f1841c644408fe5d3bf801e2ab695a75dc100da9 Mon Sep 17 00:00:00 2001 From: Peter Marshall <42997954+petermarshallio@users.noreply.github.com> Date: Tue, 29 Mar 2022 17:13:05 +0100 Subject: [PATCH] Docs - S3 masking and nav update to S3 page (#11490) * Docs: Masking S3 creds and some rewording Knowledge transfer from https://groups.google.com/g/druid-user/c/FydcpFrA688 * Removed bold in one of the quote sections * Update s3.md * Update s3.md Quick grammar change * Update docs/development/extensions-core/s3.md Co-authored-by: Charles Smith * Update docs/development/extensions-core/s3.md Co-authored-by: Charles Smith * Update docs/development/extensions-core/s3.md Co-authored-by: Charles Smith * Update docs/development/extensions-core/s3.md Co-authored-by: Charles Smith * Update docs/development/extensions-core/s3.md Co-authored-by: Charles Smith * Update s3.md Typo * Update docs/development/extensions-core/s3.md Co-authored-by: Charles Smith * Update docs/ingestion/native-batch.md Co-authored-by: Charles Smith * Update docs/ingestion/native-batch.md Co-authored-by: Charles Smith * Update docs/ingestion/native-batch.md Co-authored-by: Charles Smith * Update docs/ingestion/native-batch.md Co-authored-by: Charles Smith * Update docs/ingestion/native-batch.md Co-authored-by: Charles Smith * Update docs/ingestion/native-batch.md Co-authored-by: Charles Smith * Update docs/ingestion/native-batch.md Co-authored-by: Charles Smith * Update docs/ingestion/native-batch.md Co-authored-by: Charles Smith * Update docs/ingestion/native-batch.md Co-authored-by: Charles Smith * Update docs/ingestion/native-batch.md Co-authored-by: Charles Smith * Update docs/ingestion/native-batch.md Co-authored-by: Charles Smith * Update docs/ingestion/native-batch.md Co-authored-by: Charles Smith * Update docs/ingestion/native-batch.md Co-authored-by: Charles Smith * Update docs/ingestion/native-batch.md Co-authored-by: Charles Smith * Update docs/ingestion/native-batch.md Co-authored-by: Charles 
Smith * Update docs/ingestion/native-batch.md Co-authored-by: Charles Smith * Update docs/ingestion/native-batch.md Co-authored-by: Charles Smith * Update docs/ingestion/native-batch.md Co-authored-by: Charles Smith * Update docs/ingestion/native-batch.md Co-authored-by: Charles Smith * Update docs/ingestion/native-batch.md Co-authored-by: Charles Smith * Update docs/ingestion/native-batch.md Co-authored-by: Charles Smith * Update docs/ingestion/native-batch.md Co-authored-by: Charles Smith * Update s3.md Active lang * Update s3.md LAng nit * Update native-batch.md LAng nit * Update docs/ingestion/native-batch.md Co-authored-by: Charles Smith * Grammar tidy-up and link fix Corrected 2 x links to old page H2s, resolved the question around precedence, and some other grammatical changes. * Update docs/development/extensions-core/s3.md * Update s3.md Removed an Erroneous E Co-authored-by: Charles Smith --- docs/development/extensions-core/s3.md | 33 ++++++++++++++------------ 1 file changed, 18 insertions(+), 15 deletions(-) diff --git a/docs/development/extensions-core/s3.md b/docs/development/extensions-core/s3.md index 8e01b8aa6e0..a8dbaef1a3b 100644 --- a/docs/development/extensions-core/s3.md +++ b/docs/development/extensions-core/s3.md @@ -32,11 +32,12 @@ To use this Apache Druid extension, [include](../../development/extensions.md#lo ### Reading data from S3 -The [S3 input source](../../ingestion/native-batch-input-source.md#s3-input-source) is supported by the [Parallel task](../../ingestion/native-batch.md) -to read objects directly from S3. If you use the [Hadoop task](../../ingestion/hadoop.md), -you can read data from S3 by specifying the S3 paths in your [`inputSpec`](../../ingestion/hadoop.md#inputspec). +Use a native batch [Parallel task](../../ingestion/native-batch.md) with an [S3 input source](../../ingestion/native-batch-input-sources.html#s3-input-source) to read objects directly from S3. 
-To configure the extension to read objects from S3 you need to configure how to [connect to S3](#configuration).
+Alternatively, use a [Hadoop task](../../ingestion/hadoop.md),
+and specify S3 paths in your [`inputSpec`](../../ingestion/hadoop.md#inputspec).
+
+To read objects from S3, you must supply [connection information](#configuration) in configuration.

 ### Deep Storage

@@ -44,8 +45,7 @@ S3-compatible deep storage means either AWS S3 or a compatible service like Goog

 S3 deep storage needs to be explicitly enabled by setting `druid.storage.type=s3`. **Only after setting the storage type to S3 will any of the settings below take effect.**

-To correctly configure this extension for deep storage in S3, first configure how to [connect to S3](#configuration).
-In addition to this you need to set additional configuration, specific for [deep storage](#deep-storage-specific-configuration)
+To use S3 for Deep Storage, you must supply [connection information](#configuration) in configuration *and* set additional configuration, specific for [Deep Storage](#deep-storage-specific-configuration).

 #### Deep storage specific configuration

@@ -63,8 +63,9 @@ In addition to this you need to set additional configuration, specific for [deep

 ### S3 authentication methods

-Druid uses the following credentials provider chain to connect to your S3 bucket (whether a deep storage bucket or source bucket).
-**Note :** *You can override the default credentials provider chain for connecting to source bucket by specifying an access key and secret key using [Properties Object](../../ingestion/native-batch-input-source.md#s3-input-source) parameters in the ingestionSpec.*
+You can provide credentials to connect to S3 in a number of ways, whether for [deep storage](#deep-storage) or as an [ingestion source](#reading-data-from-s3).
+
+The configuration options are listed in order of precedence. For example, if you would like to use profile information given in `~/.aws.credentials`, do not set `druid.s3.accessKey` and `druid.s3.secretKey` in your Druid config file because they would take precedence.

 |order|type|details|
 |--------|-----------|-------|
@@ -76,23 +77,25 @@ Druid uses the following credentials provider chain to connect to your S3 bucket
 |6|ECS container credentials|Based on environment variables available on AWS ECS (AWS_CONTAINER_CREDENTIALS_RELATIVE_URI or AWS_CONTAINER_CREDENTIALS_FULL_URI) as described in the [EC2ContainerCredentialsProviderWrapper documentation](https://docs.aws.amazon.com/AWSJavaSDK/latest/javadoc/com/amazonaws/auth/EC2ContainerCredentialsProviderWrapper.html)|
 |7|Instance profile information|Based on the instance profile you may have attached to your druid instance|

-You can find more information about authentication method [here](https://docs.aws.amazon.com/fr_fr/sdk-for-java/v1/developer-guide/credentials)
-**Note :** *Order is important here as it indicates the precedence of authentication methods.
-So if you are trying to use Instance profile information, you **must not** set `druid.s3.accessKey` and `druid.s3.secretKey` in your Druid runtime.properties*
+For more information, refer to the [Amazon Developer Guide](https://docs.aws.amazon.com/fr_fr/sdk-for-java/v1/developer-guide/credentials).
+Alternatively, you can bypass this chain by specifying an access key and secret key using a [Properties Object](../../ingestion/native-batch-input-sources.html#s3-input-source) inside your ingestion specification.
+
+Use the property [`druid.startup.logging.maskProperties`](../../configuration/index.html#startup-logging) to mask credentials information in Druid logs. For example, `["password", "secretKey", "awsSecretAccessKey"]`.

 ### S3 permissions settings

-`s3:GetObject` and `s3:PutObject` are basically required for pushing/loading segments to/from S3.
+`s3:GetObject` and `s3:PutObject` are required for pushing or pulling segments to or from S3.
+
 If `druid.storage.disableAcl` is set to `false`, then `s3:GetBucketAcl` and `s3:PutObjectAcl` are additionally required to set ACL for objects.

 ### AWS region

-The AWS SDK requires that the target region be specified. Two ways of doing this are by using the JVM system property `aws.region` or the environment variable `AWS_REGION`.
+The AWS SDK requires that a target region be specified. You can set these by using the JVM system property `aws.region` or by setting an environment variable `AWS_REGION`.

-As an example, to set the region to 'us-east-1' through system properties:
+For example, to set the region to 'us-east-1' through system properties:

-- Add `-Daws.region=us-east-1` to the jvm.config file for all Druid services.
+- Add `-Daws.region=us-east-1` to the `jvm.config` file for all Druid services.
 - Add `-Daws.region=us-east-1` to `druid.indexer.runner.javaOpts` in [Middle Manager configuration](../../configuration/index.md#middlemanager-configuration) so that the property will be passed to Peon (worker) processes.

 ### Connecting to S3 configuration