---
id: s3
title: "S3-compatible"
---

<!--
  ~ Licensed to the Apache Software Foundation (ASF) under one
  ~ or more contributor license agreements. See the NOTICE file
  ~ distributed with this work for additional information
  ~ regarding copyright ownership. The ASF licenses this file
  ~ to you under the Apache License, Version 2.0 (the
  ~ "License"); you may not use this file except in compliance
  ~ with the License. You may obtain a copy of the License at
  ~
  ~ http://www.apache.org/licenses/LICENSE-2.0
  ~
  ~ Unless required by applicable law or agreed to in writing,
  ~ software distributed under the License is distributed on an
  ~ "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
  ~ KIND, either express or implied. See the License for the
  ~ specific language governing permissions and limitations
  ~ under the License.
  -->

## S3 extension

This extension allows you to do two things:

* [Ingest data](#reading-data-from-s3) from files stored in S3.
* Write segments to [deep storage](#deep-storage) in S3.

To use this Apache Druid extension, make sure to [include](../../development/extensions.md#loading-extensions) `druid-s3-extensions` as an extension.

### Reading data from S3

The [S3 input source](../../ingestion/native-batch.md#s3-input-source) is supported by the [Parallel task](../../ingestion/native-batch.md#parallel-task)
to read objects directly from S3. If you use the [Hadoop task](../../ingestion/hadoop.md),
you can read data from S3 by specifying the S3 paths in your [`inputSpec`](../../ingestion/hadoop.md#inputspec).

To read objects from S3, you need to configure how to [connect to S3](#configuration).

### Deep Storage

S3-compatible deep storage means either AWS S3 or a compatible service, such as Google Storage, that exposes the same API as S3.

S3 deep storage needs to be explicitly enabled by setting `druid.storage.type=s3`. **Only after setting the storage type to S3 will any of the settings below take effect.**

To correctly configure this extension for deep storage in S3, first configure how to [connect to S3](#configuration).
In addition, you need to set the configuration specific to [deep storage](#deep-storage-specific-configuration).

#### Deep storage specific configuration

|Property|Description|Default|
|--------|-----------|-------|
|`druid.storage.bucket`|Bucket to store in.|Must be set.|
|`druid.storage.baseKey`|A prefix string that will be prepended to the object names for the segments published to S3 deep storage.|Must be set.|
|`druid.storage.type`|Global deep storage provider. Must be set to `s3` to make use of this extension.|Must be set (likely `s3`).|
|`druid.storage.archiveBucket`|S3 bucket name for archiving when running the *archive task*.|none|
|`druid.storage.archiveBaseKey`|S3 object key prefix for archiving.|none|
|`druid.storage.disableAcl`|Boolean flag to disable ACL. If this is set to `false`, full control is granted to the bucket owner. This may require setting additional permissions. See [S3 permissions settings](#s3-permissions-settings).|false|
|`druid.storage.useS3aSchema`|If true, use the "s3a" filesystem when using Hadoop-based ingestion. If false, the "s3n" filesystem will be used. Only affects Hadoop-based ingestion.|false|

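As a rough illustration of the settings above, a minimal `runtime.properties` fragment for S3 deep storage might look like the following. The bucket name and key prefix are hypothetical placeholders, and the `druid.extensions.loadList` line assumes you load extensions through your common runtime properties, as described on the linked extensions page.

```properties
# Load the S3 extension (add it to any extensions you already load).
druid.extensions.loadList=["druid-s3-extensions"]

# Enable S3 deep storage; the other deep storage settings take effect only once this is set.
druid.storage.type=s3

# Hypothetical bucket and key prefix; replace with your own values.
druid.storage.bucket=my-druid-bucket
druid.storage.baseKey=druid/segments
```
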
## Configuration

### S3 authentication methods

Druid uses the following credentials provider chain to connect to your S3 bucket (whether a deep storage bucket or source bucket).
**Note:** *You can override the default credentials provider chain for connecting to the source bucket by specifying an access key and secret key using the [Properties Object](../../ingestion/native-batch.md#s3-input-source) parameters in the ingestion spec.*

|Order|Type|Details|
|--------|-----------|-------|
|1|Druid config file|Based on your runtime.properties, if it contains the values `druid.s3.accessKey` and `druid.s3.secretKey`|
|2|Custom properties file|Based on a custom properties file where you can supply `sessionToken`, `accessKey` and `secretKey` values. This file is provided to Druid through the `druid.s3.fileSessionCredentials` property|
|3|Environment variables|Based on the environment variables `AWS_ACCESS_KEY_ID` and `AWS_SECRET_ACCESS_KEY`|
|4|Java system properties|Based on the JVM properties `aws.accessKeyId` and `aws.secretKey`|
|5|Profile information|Based on credentials you may have on your Druid instance (generally in `~/.aws/credentials`)|
|6|ECS container credentials|Based on environment variables available on AWS ECS (`AWS_CONTAINER_CREDENTIALS_RELATIVE_URI` or `AWS_CONTAINER_CREDENTIALS_FULL_URI`) as described in the [EC2ContainerCredentialsProviderWrapper documentation](https://docs.aws.amazon.com/AWSJavaSDK/latest/javadoc/com/amazonaws/auth/EC2ContainerCredentialsProviderWrapper.html)|
|7|Instance profile information|Based on the instance profile you may have attached to your Druid instance|

You can find more information about authentication methods [here](https://docs.aws.amazon.com/fr_fr/sdk-for-java/v1/developer-guide/credentials.html).<br/>
**Note:** *Order is important here as it indicates the precedence of authentication methods.<br/>
So if you are trying to use instance profile information, you **must not** set `druid.s3.accessKey` and `druid.s3.secretKey` in your Druid runtime.properties.*

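For instance, the second method in the chain (a custom properties file) could be set up roughly as follows. This is a sketch only: the file path and the placeholder values are assumptions, and the credentials file uses one `key=value` pair per line.

```properties
# --- runtime.properties ---
# Hypothetical path to the credentials file.
druid.s3.fileSessionCredentials=/path/to/s3-session-credentials.properties

# --- /path/to/s3-session-credentials.properties ---
# One key=value pair per line; the values below are placeholders.
sessionToken=<your-session-token>
accessKey=<your-access-key>
secretKey=<your-secret-key>
```

Since the Druid config file comes first in the chain, leave `druid.s3.accessKey` and `druid.s3.secretKey` unset if you want these file-based credentials to be used.
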
### S3 permissions settings

`s3:GetObject` and `s3:PutObject` are required for pushing segments to and loading segments from S3.
If `druid.storage.disableAcl` is set to `false`, then `s3:GetBucketAcl` and `s3:PutObjectAcl` are additionally required to set ACLs for objects.

### AWS region

The AWS SDK requires that the target region be specified. Two ways of doing this are by using the JVM system property `aws.region` or the environment variable `AWS_REGION`.

As an example, to set the region to 'us-east-1' through system properties:

- Add `-Daws.region=us-east-1` to the jvm.config file for all Druid services.
- Add `-Daws.region=us-east-1` to `druid.indexer.runner.javaOpts` in [Middle Manager configuration](../../configuration/index.md#middlemanager-configuration) so that the property will be passed to Peon (worker) processes.

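The second step might look like this in the Middle Manager's `runtime.properties`. This is a sketch that sets only the region flag; if you already define `druid.indexer.runner.javaOpts`, add the flag to your existing value rather than replacing it.

```properties
# Pass the region to Peon (worker) processes as a JVM system property.
druid.indexer.runner.javaOpts=-Daws.region=us-east-1
```
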
### Connecting to S3 configuration

|Property|Description|Default|
|--------|-----------|-------|
|`druid.s3.accessKey`|S3 access key. See [S3 authentication methods](#s3-authentication-methods) for more details.|Can be omitted depending on the authentication method chosen.|
|`druid.s3.secretKey`|S3 secret key. See [S3 authentication methods](#s3-authentication-methods) for more details.|Can be omitted depending on the authentication method chosen.|
|`druid.s3.fileSessionCredentials`|Path to a properties file containing `sessionToken`, `accessKey` and `secretKey` values. One key/value pair per line (format `key=value`). See [S3 authentication methods](#s3-authentication-methods) for more details.|Can be omitted depending on the authentication method chosen.|
|`druid.s3.protocol`|Communication protocol type to use when sending requests to AWS. `http` or `https` can be used. This setting is ignored if `druid.s3.endpoint.url` is set to a URL with a different protocol.|`https`|
|`druid.s3.disableChunkedEncoding`|Disables chunked encoding. See [AWS document](https://docs.aws.amazon.com/AWSJavaSDK/latest/javadoc/com/amazonaws/services/s3/AmazonS3Builder.html#disableChunkedEncoding--) for details.|false|
|`druid.s3.enablePathStyleAccess`|Enables path style access. See [AWS document](https://docs.aws.amazon.com/AWSJavaSDK/latest/javadoc/com/amazonaws/services/s3/AmazonS3Builder.html#enablePathStyleAccess--) for details.|false|
|`druid.s3.forceGlobalBucketAccessEnabled`|Enables global bucket access. See [AWS document](https://docs.aws.amazon.com/AWSJavaSDK/latest/javadoc/com/amazonaws/services/s3/AmazonS3Builder.html#setForceGlobalBucketAccessEnabled-java.lang.Boolean-) for details.|false|
|`druid.s3.endpoint.url`|Service endpoint, either with or without the protocol.|None|
|`druid.s3.endpoint.signingRegion`|Region to use for SigV4 signing of requests (e.g. us-west-1).|None|
|`druid.s3.proxy.host`|Proxy host to connect through.|None|
|`druid.s3.proxy.port`|Port on the proxy host to connect through.|None|
|`druid.s3.proxy.username`|User name to use when connecting through a proxy.|None|
|`druid.s3.proxy.password`|Password to use when connecting through a proxy.|None|
|`druid.storage.sse.type`|Server-side encryption type. Must be one of `s3`, `kms`, or `custom`. See the [Server-side encryption section](#server-side-encryption) below for more details.|None|
|`druid.storage.sse.kms.keyId`|AWS KMS key ID. This is used only when `druid.storage.sse.type` is `kms` and can be empty to use the default key ID.|None|
|`druid.storage.sse.custom.base64EncodedKey`|Base64-encoded key. Should be specified if `druid.storage.sse.type` is `custom`.|None|

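As an illustration, connecting to a self-hosted S3-compatible service might use a fragment like the one below. The endpoint URL and region are hypothetical, and whether path-style access is needed depends on the service, so treat this as a starting point rather than a recommended configuration.

```properties
# Hypothetical endpoint of an S3-compatible service.
druid.s3.endpoint.url=http://s3.example.internal:9000

# Region used for SigV4 request signing (placeholder value).
druid.s3.endpoint.signingRegion=us-east-1

# Assumption: the service addresses buckets by path rather than by subdomain.
druid.s3.enablePathStyleAccess=true
```
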
## Server-side encryption

You can enable [server-side encryption](https://docs.aws.amazon.com/AmazonS3/latest/dev/serv-side-encryption.html) by setting
`druid.storage.sse.type` to a supported type of server-side encryption. The currently supported types are:

- `s3`: [Server-side encryption with S3-managed encryption keys](https://docs.aws.amazon.com/AmazonS3/latest/dev/UsingServerSideEncryption.html)
- `kms`: [Server-side encryption with AWS KMS–Managed Keys](https://docs.aws.amazon.com/AmazonS3/latest/dev/UsingKMSEncryption.html)
- `custom`: [Server-side encryption with Customer-Provided Encryption Keys](https://docs.aws.amazon.com/AmazonS3/latest/dev/ServerSideEncryptionCustomerKeys.html)

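For example, a fragment enabling KMS-based encryption could look like this. The key ID is a placeholder; per the configuration table, `druid.storage.sse.kms.keyId` can be left empty to use the default key.

```properties
# Encrypt segments with AWS KMS-managed keys.
druid.storage.sse.type=kms

# Placeholder; omit this line to use the default KMS key.
druid.storage.sse.kms.keyId=<your-kms-key-id>
```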