Gian Merlino d007477742
Docusaurus build framework + ingestion doc refresh. (#8311)
2019-08-20 21:48:59 -07:00


| id | title |
| --- | --- |
| azure | Microsoft Azure |

To use this Apache Druid (incubating) extension, make sure to include the druid-azure-extensions extension.

Deep Storage

Microsoft Azure Storage is another option for deep storage. This requires some additional Druid configuration.

| Property | Possible Values | Description | Default |
| --- | --- | --- | --- |
| druid.storage.type | azure | | Must be set. |
| druid.azure.account | | Azure Storage account name. | Must be set. |
| druid.azure.key | | Azure Storage account key. | Must be set. |
| druid.azure.container | | Azure Storage container name. | Must be set. |
| druid.azure.protocol | http or https | | https |
| druid.azure.maxTries | | Number of tries before an Azure operation is cancelled. | 3 |

See Azure Services for more information.
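
As a rough illustration of how the properties in the table fit together, the following Python sketch renders them as lines for a runtime.properties file. The helper name render_azure_properties and the sample account, key, and container values are hypothetical and not part of Druid. druid.extensions.loadList is the property Druid normally uses to load extensions; in a real deployment that list usually names other extensions as well.

```python
# Hypothetical helper, not part of Druid: renders the Azure deep storage
# settings described above as runtime.properties lines.
REQUIRED = ("druid.azure.account", "druid.azure.key", "druid.azure.container")
DEFAULTS = {"druid.azure.protocol": "https", "druid.azure.maxTries": "3"}


def render_azure_properties(settings):
    """Validate the required properties, apply defaults, and return the file text."""
    missing = [key for key in REQUIRED if not settings.get(key)]
    if missing:
        raise ValueError("missing required properties: " + ", ".join(missing))
    merged = {
        # Assumption: the extension is loaded through the usual loadList property.
        "druid.extensions.loadList": '["druid-azure-extensions"]',
        "druid.storage.type": "azure",
        **DEFAULTS,
        **settings,
    }
    if merged["druid.azure.protocol"] not in ("http", "https"):
        raise ValueError("druid.azure.protocol must be http or https")
    return "\n".join(f"{key}={value}" for key, value in merged.items())


if __name__ == "__main__":
    # Placeholder values for illustration only.
    print(render_azure_properties({
        "druid.azure.account": "myaccount",
        "druid.azure.key": "mykey",
        "druid.azure.container": "druid-segments",
    }))
```

Only the three properties without a usable default have to be passed in; the protocol and retry count fall back to the defaults listed in the table.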

Firehose

StaticAzureBlobStoreFirehose

This firehose ingests events, similar to the StaticS3Firehose, but from an Azure Blob Store.

Data is newline-delimited, with one JSON object per line, and is parsed according to the InputRowParser configuration.

The storage account is shared with the one used for Azure deep storage functionality, but blobs can be in a different container.

As with the S3 blobstore, a blob is assumed to be gzipped if its name ends in .gz.
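
The input format and the .gz rule can be pictured with a small Python sketch. It only illustrates the conventions described here (newline-delimited JSON, gunzip when the name ends in .gz); it is not how Druid reads blobs, since real parsing is driven by the InputRowParser. The function name read_blob_rows and the sample rows are hypothetical.

```python
import gzip
import json


def read_blob_rows(path, raw_bytes):
    """Parse one blob's bytes as newline-delimited JSON.

    The content is gunzipped first when the blob name ends in .gz.
    """
    if path.endswith(".gz"):
        raw_bytes = gzip.decompress(raw_bytes)
    rows = []
    for line in raw_bytes.decode("utf-8").splitlines():
        if line.strip():  # skip blank lines
            rows.append(json.loads(line))
    return rows


if __name__ == "__main__":
    plain = b'{"page": "a", "added": 1}\n{"page": "b", "added": 2}\n'
    # The same rows come back whether or not the blob is gzipped.
    print(read_blob_rows("/path/to/your/file.json", plain))
    print(read_blob_rows("/path/to/your/file.json.gz", gzip.compress(plain)))
```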

This firehose is splittable and can be used by native parallel index tasks. Since each split represents one object in this firehose, each worker task of index_parallel reads one object.
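
To make the "one split per object" idea concrete, here is a minimal Python sketch that divides a firehose spec into single-blob specs, one per worker. It is an illustration of the concept only, not Druid's actual splitting code, and split_firehose is a hypothetical name.

```python
def split_firehose(firehose):
    """Return one firehose spec per blob, each keeping the other settings."""
    return [{**firehose, "blobs": [blob]} for blob in firehose["blobs"]]


if __name__ == "__main__":
    firehose = {
        "type": "static-azure-blobstore",
        "blobs": [
            {"container": "container", "path": "/path/to/your/file.json"},
            {"container": "anothercontainer", "path": "/another/path.json"},
        ],
    }
    for split in split_firehose(firehose):
        print(split["blobs"][0]["path"])
```

With two blobs listed, the sketch yields two splits, so at most two worker tasks would have something to read.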

Sample spec:

"firehose": {
    "type": "static-azure-blobstore",
    "blobs": [
        {
            "container": "container",
            "path": "/path/to/your/file.json"
        },
        {
            "container": "anothercontainer",
            "path": "/another/path.json"
        }
    ]
}

This firehose provides caching and prefetching features. In IndexTask, a firehose can be read twice if intervals or shardSpecs are not specified, and in that case caching can be useful. Prefetching is preferred when scanning objects directly is slow.

| property | description | default | required? |
| --- | --- | --- | --- |
| type | This should be static-azure-blobstore. | N/A | yes |
| blobs | JSON array of Azure blobs. | N/A | yes |
| maxCacheCapacityBytes | Maximum size of the cache space in bytes. 0 disables the cache. Cached files are not removed until the ingestion task completes. | 1073741824 | no |
| maxFetchCapacityBytes | Maximum size of the fetch space in bytes. 0 disables prefetching. Prefetched files are removed immediately once they are read. | 1073741824 | no |
| prefetchTriggerBytes | Threshold to trigger prefetching Azure objects. | maxFetchCapacityBytes / 2 | no |
| fetchTimeout | Timeout, in milliseconds, for fetching an Azure object. | 60000 | no |
| maxFetchRetry | Maximum number of retries for fetching an Azure object. | 3 | no |

Azure Blobs:

| property | description | default | required? |
| --- | --- | --- | --- |
| container | Name of the Azure container. | N/A | yes |
| path | The path where data is located. | N/A | yes |
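
The two property tables can be combined into a complete firehose spec. The Python sketch below builds such a spec as a dictionary, fills in the documented defaults (including prefetchTriggerBytes as half of maxFetchCapacityBytes), and checks that every blob names a container and a path. The function azure_firehose and its keyword arguments are hypothetical conveniences for this example; only the JSON keys come from the tables.

```python
import json

DEFAULT_CAPACITY_BYTES = 1073741824  # documented default for cache and fetch space


def azure_firehose(blobs,
                   max_cache=DEFAULT_CAPACITY_BYTES,
                   max_fetch=DEFAULT_CAPACITY_BYTES,
                   prefetch_trigger=None,
                   fetch_timeout=60000,
                   max_fetch_retry=3):
    """Build a static-azure-blobstore firehose spec with the documented defaults."""
    if not blobs:
        raise ValueError("blobs is required and must not be empty")
    for blob in blobs:
        if "container" not in blob or "path" not in blob:
            raise ValueError("each blob needs a container and a path")
    if prefetch_trigger is None:
        # Documented default: maxFetchCapacityBytes / 2
        prefetch_trigger = max_fetch // 2
    return {
        "type": "static-azure-blobstore",
        "blobs": list(blobs),
        "maxCacheCapacityBytes": max_cache,  # 0 disables the cache
        "maxFetchCapacityBytes": max_fetch,  # 0 disables prefetching
        "prefetchTriggerBytes": prefetch_trigger,
        "fetchTimeout": fetch_timeout,
        "maxFetchRetry": max_fetch_retry,
    }


if __name__ == "__main__":
    spec = azure_firehose(
        [{"container": "container", "path": "/path/to/your/file.json"}],
        max_cache=0,  # disable the cache, for example when intervals are specified
    )
    print(json.dumps({"firehose": spec}, indent=4))
```

Because all optional properties have defaults, a spec that lists only type and blobs, as in the sample spec above, behaves the same as one that spells the defaults out.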