mirror of https://github.com/apache/druid.git
Azure data lake input source (#11153)
* Mention Azure Data Lake
* Make consistent with other entries

Co-authored-by: Charles Smith <38529548+techdocsmith@users.noreply.github.com>
This commit is contained in:
parent
ed0a57e106
commit
fd0931d35e
@@ -58,7 +58,7 @@ The supported splittable input formats for now are:

 - [`s3`](#s3-input-source) reads data from AWS S3 storage.
 - [`gs`](#google-cloud-storage-input-source) reads data from Google Cloud Storage.
-- [`azure`](#azure-input-source) reads data from Azure Blob Storage.
+- [`azure`](#azure-input-source) reads data from Azure Blob Storage and Azure Data Lake.
 - [`hdfs`](#hdfs-input-source) reads data from HDFS storage.
 - [`http`](#http-input-source) reads data from HTTP servers.
 - [`local`](#local-input-source) reads data from local storage.
@@ -1046,10 +1046,8 @@ Google Cloud Storage object:

 > You need to include the [`druid-azure-extensions`](../development/extensions-core/azure.md) as an extension to use the Azure input source.

-The Azure input source is to support reading objects directly from Azure Blob store. Objects can be
-specified as list of Azure Blob store URI strings. The Azure input source is splittable and can be used
-by the [Parallel task](#parallel-task), where each worker task of `index_parallel` will read
-a single object.
+The Azure input source reads objects directly from Azure Blob store or Azure Data Lake sources. You can
+specify objects as a list of file URI strings or prefixes. You can split the Azure input source for use with [Parallel task](#parallel-task) indexing and each worker task reads one chunk of the split data.

 Sample specs:

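The sample specs themselves fall outside this hunk's context. A minimal sketch of an `ioConfig` using the `azure` input source with `uris`, assuming a hypothetical container name `my-container` and blob path, plus JSON input data:

```json
{
  "ioConfig": {
    "type": "index_parallel",
    "inputSource": {
      "type": "azure",
      "uris": ["azure://my-container/path/to/file.json"]
    },
    "inputFormat": {
      "type": "json"
    }
  }
}
```

The `prefixes` form is analogous: replace `uris` with `"prefixes": ["azure://my-container/path/"]` to ingest every non-empty object under that prefix.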
@@ -1108,17 +1106,17 @@ Sample specs:

 |property|description|default|required?|
 |--------|-----------|-------|---------|
 |type|This should be `azure`.|None|yes|
-|uris|JSON array of URIs where Azure Blob objects to be ingested are located. Should be in form "azure://\<container>/\<path-to-file\>"|None|`uris` or `prefixes` or `objects` must be set|
-|prefixes|JSON array of URI prefixes for the locations of Azure Blob objects to be ingested. Should be in the form "azure://\<container>/\<prefix\>". Empty objects starting with one of the given prefixes will be skipped.|None|`uris` or `prefixes` or `objects` must be set|
-|objects|JSON array of Azure Blob objects to be ingested.|None|`uris` or `prefixes` or `objects` must be set|
+|uris|JSON array of URIs where the Azure objects to be ingested are located, in the form "azure://\<container>/\<path-to-file\>"|None|`uris` or `prefixes` or `objects` must be set|
+|prefixes|JSON array of URI prefixes for the locations of Azure objects to ingest, in the form "azure://\<container>/\<prefix\>". Empty objects starting with one of the given prefixes are skipped.|None|`uris` or `prefixes` or `objects` must be set|
+|objects|JSON array of Azure objects to ingest.|None|`uris` or `prefixes` or `objects` must be set|

-Note that the Azure input source will skip all empty objects only when `prefixes` is specified.
+Note that the Azure input source skips all empty objects only when `prefixes` is specified.

-Azure Blob object:
+The `objects` property is:

 |property|description|default|required?|
 |--------|-----------|-------|---------|
-|bucket|Name of the Azure Blob Storage container|None|yes|
+|bucket|Name of the Azure Blob Storage or Azure Data Lake container|None|yes|
 |path|The path where data is located.|None|yes|

 ### HDFS Input Source
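Per the `objects` table above, the explicit-objects form can be sketched as follows; the `bucket` (container) and `path` values are hypothetical:

```json
{
  "inputSource": {
    "type": "azure",
    "objects": [
      { "bucket": "my-container", "path": "path/to/file1.json" },
      { "bucket": "my-container", "path": "path/to/file2.json" }
    ]
  }
}
```

Unlike `prefixes`, this form lists each object explicitly, so empty objects are not skipped.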