druid/docs/content/development/extensions-core/s3.md

2.4 KiB

layout
doc_page

S3-compatible

Make sure to include druid-s3-extensions as an extension.

Deep Storage

S3-compatible deep storage is basically either S3 or something like Google Storage which exposes the same API as S3.

Configuration

Property Description Default
druid.s3.accessKey S3 access key. Must be set.
druid.s3.secretKey S3 secret key. Must be set.
druid.storage.bucket Bucket to store in. Must be set.
druid.storage.baseKey Base key prefix to use, i.e. what directory. Must be set.
druid.s3.endpoint.url Service endpoint either with or without the protocol. None
druid.s3.endpoint.signingRegion Region to use for SigV4 signing of requests (e.g. us-west-1). None
druid.s3.proxy.host Proxy host to connect through. None
druid.s3.proxy.port Port on the proxy host to connect through. None
druid.s3.proxy.username User name to use when connecting through a proxy. None
druid.s3.proxy.password Password to use when connecting through a proxy. None

StaticS3Firehose

This firehose ingests events from a predefined list of S3 objects.

Sample spec:

"firehose" : {
    "type" : "static-s3",
    "uris": ["s3://foo/bar/file.gz", "s3://bar/foo/file2.gz"]
}

This firehose provides caching and prefetching features. In IndexTask, a firehose can be read twice if intervals or shardSpecs are not specified, and, in this case, caching can be useful. Prefetching is preferred when direct scan of objects is slow.

property description default required?
type This should be static-s3. N/A yes
uris JSON array of URIs where s3 files to be ingested are located. N/A uris or prefixes must be set
prefixes JSON array of URI prefixes for the locations of s3 files to be ingested. N/A uris or prefixes must be set
maxCacheCapacityBytes Maximum size of the cache space in bytes. 0 means disabling cache. Cached files are not removed until the ingestion task completes. 1073741824 no
maxFetchCapacityBytes Maximum size of the fetch space in bytes. 0 means disabling prefetch. Prefetched files are removed immediately once they are read. 1073741824 no
prefetchTriggerBytes Threshold to trigger prefetching s3 objects. maxFetchCapacityBytes / 2 no
fetchTimeout Timeout for fetching an s3 object. 60000 no
maxFetchRetry Maximum retry for fetching an s3 object. 3 no