---
layout: doc_page
title: S3-compatible
---

# S3-compatible

Make sure to include `druid-s3-extensions` as an extension.
## Deep Storage
S3-compatible deep storage means either AWS S3 or a compatible service like Google Storage which exposes the same API as S3.
## Configuration
The AWS SDK requires that the target region be specified. You can set it either with the JVM system property `aws.region` or with the environment variable `AWS_REGION`.

As an example, to set the region to 'us-east-1' through system properties:

- Add `-Daws.region=us-east-1` to the jvm.config file for all Druid services.
- Add `-Daws.region=us-east-1` to `druid.indexer.runner.javaOpts` in middleManager/runtime.properties so that the property is passed to the Peon (worker) processes, as sketched below.
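For example, the middleManager entry might end up looking like this. This is only a sketch: `-server -Xmx1g` stands in for whatever JVM options your deployment already passes, and the region flag is simply appended to them.

```properties
# middleManager/runtime.properties (illustrative values)
druid.indexer.runner.javaOpts=-server -Xmx1g -Daws.region=us-east-1
```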
|Property|Description|Default|
|--------|-----------|-------|
|`druid.s3.accessKey`|S3 access key.|Must be set.|
|`druid.s3.secretKey`|S3 secret key.|Must be set.|
|`druid.storage.bucket`|Bucket to store in.|Must be set.|
|`druid.storage.baseKey`|Base key prefix to use, i.e. what directory.|Must be set.|
|`druid.storage.sse.type`|Server-side encryption type. Should be one of `s3`, `kms`, or `custom`. See the Server-side encryption section below for more details.|None|
|`druid.storage.sse.kms.keyId`|AWS KMS key ID. Can be empty if `druid.storage.sse.type` is `kms`.|None|
|`druid.storage.sse.custom.base64EncodedKey`|Base64-encoded key. Should be specified if `druid.storage.sse.type` is `custom`.|None|
|`druid.s3.protocol`|Communication protocol type to use when sending requests to AWS. `http` or `https` can be used.|`https`|
|`druid.s3.disableChunkedEncoding`|Disables chunked encoding. See the AWS documentation for details.|false|
|`druid.s3.enablePathStyleAccess`|Enables path-style access. See the AWS documentation for details.|false|
|`druid.s3.forceGlobalBucketAccessEnabled`|Enables global bucket access. See the AWS documentation for details.|false|
|`druid.s3.endpoint.url`|Service endpoint, with or without the protocol.|None|
|`druid.s3.endpoint.signingRegion`|Region to use for SigV4 signing of requests (e.g. us-west-1).|None|
|`druid.s3.proxy.host`|Proxy host to connect through.|None|
|`druid.s3.proxy.port`|Port on the proxy host to connect through.|None|
|`druid.s3.proxy.username`|User name to use when connecting through a proxy.|None|
|`druid.s3.proxy.password`|Password to use when connecting through a proxy.|None|
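Putting these together, a minimal deep storage configuration might look like the following. This is only a sketch: the bucket, key prefix, and credentials are placeholders, and it assumes the extension is loaded via `druid.extensions.loadList` and that `druid.storage.type` is set to `s3` (standard Druid deep storage configuration, not listed in the table above).

```properties
# common.runtime.properties (illustrative values)
druid.extensions.loadList=["druid-s3-extensions"]

druid.storage.type=s3
druid.storage.bucket=your-druid-bucket
druid.storage.baseKey=druid/segments

druid.s3.accessKey=YOUR_ACCESS_KEY
druid.s3.secretKey=YOUR_SECRET_KEY
```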
### Server-side encryption

You can enable server-side encryption by setting `druid.storage.sse.type` to a supported type of server-side encryption. The currently supported types are listed below, followed by a configuration sketch:

- `s3`: Server-side encryption with S3-managed encryption keys
- `kms`: Server-side encryption with AWS KMS-managed keys
- `custom`: Server-side encryption with customer-provided encryption keys
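For example, to use SSE-KMS you might set the following. This is illustrative only; the key ID is a placeholder and, per the table above, it can be left empty when the type is `kms`.

```properties
# common.runtime.properties (illustrative)
druid.storage.sse.type=kms
druid.storage.sse.kms.keyId=<your KMS key ID>
```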
## StaticS3Firehose

This firehose ingests events from a predefined list of S3 objects. It is splittable and can be used by native parallel index tasks. Since each split represents an object in this firehose, each worker task of `index_parallel` will read one object (see the ioConfig sketch after the sample spec below).
Sample spec:

```json
"firehose" : {
  "type" : "static-s3",
  "uris": ["s3://foo/bar/file.gz", "s3://bar/foo/file2.gz"]
}
```
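To use the firehose with the parallel index task mentioned above, it is embedded in the task's `ioConfig`. The trimmed sketch below is illustrative only; the `dataSchema` and `tuningConfig` sections are omitted and the prefixes are placeholders.

```json
{
  "type": "index_parallel",
  "spec": {
    "ioConfig": {
      "type": "index_parallel",
      "firehose": {
        "type": "static-s3",
        "prefixes": ["s3://foo/bar/", "s3://bar/foo/"]
      }
    }
  }
}
```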
This firehose provides caching and prefetching features. In IndexTask, a firehose can be read twice if `intervals` or `shardSpecs` are not specified; in that case, caching can be useful. Prefetching is preferred when directly scanning objects is slow. The table below lists the available properties, followed by a tuning sketch.
|Property|Description|Default|Required?|
|--------|-----------|-------|---------|
|type|This should be `static-s3`.|N/A|yes|
|uris|JSON array of URIs where the S3 files to be ingested are located.|N/A|`uris` or `prefixes` must be set|
|prefixes|JSON array of URI prefixes for the locations of the S3 files to be ingested.|N/A|`uris` or `prefixes` must be set|
|maxCacheCapacityBytes|Maximum size of the cache space in bytes. 0 disables caching. Cached files are not removed until the ingestion task completes.|1073741824|no|
|maxFetchCapacityBytes|Maximum size of the fetch space in bytes. 0 disables prefetching. Prefetched files are removed immediately once they are read.|1073741824|no|
|prefetchTriggerBytes|Threshold to trigger prefetching of S3 objects.|maxFetchCapacityBytes / 2|no|
|fetchTimeout|Timeout for fetching an S3 object, in milliseconds.|60000|no|
|maxFetchRetry|Maximum number of retries for fetching an S3 object.|3|no|
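As a sketch of how the caching and prefetching settings fit together, the values below mirror the defaults from the table and are illustrative rather than recommendations:

```json
"firehose" : {
  "type" : "static-s3",
  "prefixes": ["s3://foo/bar/"],
  "maxCacheCapacityBytes": 1073741824,
  "maxFetchCapacityBytes": 1073741824,
  "prefetchTriggerBytes": 536870912,
  "fetchTimeout": 60000,
  "maxFetchRetry": 3
}
```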