mirror of https://github.com/apache/druid.git
S3-compatible
Make sure to include druid-s3-extensions as an extension.
Deep Storage
S3-compatible deep storage means either Amazon S3 itself or a service, such as Google Cloud Storage, that exposes the same API as S3.
Configuration
Property | Description | Default |
---|---|---|
druid.s3.accessKey | S3 access key. | Must be set. |
druid.s3.secretKey | S3 secret key. | Must be set. |
druid.storage.bucket | Bucket to store in. | Must be set. |
druid.storage.baseKey | Base key prefix to use, i.e. what directory. | Must be set. |
druid.s3.endpoint.url | Service endpoint either with or without the protocol. | None |
druid.s3.endpoint.signingRegion | Region to use for SigV4 signing of requests (e.g. us-west-1). | None |
druid.s3.proxy.host | Proxy host to connect through. | None |
druid.s3.proxy.port | Port on the proxy host to connect through. | None |
druid.s3.proxy.username | User name to use when connecting through a proxy. | None |
druid.s3.proxy.password | Password to use when connecting through a proxy. | None |
StaticS3Firehose
This firehose ingests events from a predefined list of S3 objects.
Sample spec:
"firehose" : {
"type" : "static-s3",
"uris": ["s3://foo/bar/file.gz", "s3://bar/foo/file2.gz"]
}
This firehose supports caching and prefetching. In an IndexTask, the firehose may be read twice when intervals or shardSpecs are not specified, so caching is useful in that case. Prefetching is useful when scanning objects directly from S3 is slow.
property | description | default | required? |
---|---|---|---|
type | This should be static-s3. | N/A | yes |
uris | JSON array of URIs where the S3 files to be ingested are located. | N/A | uris or prefixes must be set |
prefixes | JSON array of URI prefixes for the locations of the S3 files to be ingested. | N/A | uris or prefixes must be set |
maxCacheCapacityBytes | Maximum size of the cache space in bytes. 0 disables the cache. Cached files are not removed until the ingestion task completes. | 1073741824 | no |
maxFetchCapacityBytes | Maximum size of the fetch space in bytes. 0 disables prefetching. Prefetched files are removed immediately once they are read. | 1073741824 | no |
prefetchTriggerBytes | Threshold to trigger prefetching of S3 objects. | maxFetchCapacityBytes / 2 | no |
fetchTimeout | Timeout for fetching an S3 object. | 60000 | no |
maxFetchRetry | Maximum number of retries for fetching an S3 object. | 3 | no |
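
As a rough illustration of the properties in the table, the following sketch uses prefixes instead of uris and overrides the tuning properties. The bucket name and prefix are hypothetical, and the values are only illustrative, not recommendations: the cache is disabled on the assumption that the firehose is read once, the fetch space is set to 512 MiB, and prefetchTriggerBytes is written out as half of maxFetchCapacityBytes to match its documented default.

```json
"firehose" : {
  "type" : "static-s3",
  "prefixes" : ["s3://example-bucket/events/"],
  "maxCacheCapacityBytes" : 0,
  "maxFetchCapacityBytes" : 536870912,
  "prefetchTriggerBytes" : 268435456,
  "fetchTimeout" : 60000,
  "maxFetchRetry" : 3
}
```

Like the sample spec earlier in this section, this is a fragment meant to sit inside a larger ingestion spec, not a standalone JSON document.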