diff --git a/docs/ingestion/native-batch.md b/docs/ingestion/native-batch.md
index e445b4edb1a..df7a09770ea 100644
--- a/docs/ingestion/native-batch.md
+++ b/docs/ingestion/native-batch.md
@@ -58,7 +58,7 @@ The supported splittable input formats for now are:
 
 - [`s3`](#s3-input-source) reads data from AWS S3 storage.
 - [`gs`](#google-cloud-storage-input-source) reads data from Google Cloud Storage.
-- [`azure`](#azure-input-source) reads data from Azure Blob Storage.
+- [`azure`](#azure-input-source) reads data from Azure Blob Storage and Azure Data Lake.
 - [`hdfs`](#hdfs-input-source) reads data from HDFS storage.
 - [`http`](#http-input-source) reads data from HTTP servers.
 - [`local`](#local-input-source) reads data from local storage.
@@ -1046,10 +1046,8 @@ Google Cloud Storage object:
 
 > You need to include the [`druid-azure-extensions`](../development/extensions-core/azure.md) as an extension to use the Azure input source.
 
-The Azure input source is to support reading objects directly from Azure Blob store. Objects can be
-specified as list of Azure Blob store URI strings. The Azure input source is splittable and can be used
-by the [Parallel task](#parallel-task), where each worker task of `index_parallel` will read
-a single object.
+The Azure input source reads objects directly from Azure Blob store or Azure Data Lake sources. You can
+specify objects as a list of file URI strings or prefixes. You can split the Azure input source for use with [Parallel task](#parallel-task) indexing, where each worker task reads one chunk of the split data.
 
 Sample specs:
 
@@ -1108,17 +1106,17 @@ Sample specs:
 |property|description|default|required?|
 |--------|-----------|-------|---------|
 |type|This should be `azure`.|None|yes|
-|uris|JSON array of URIs where Azure Blob objects to be ingested are located. Should be in form "azure://\<container>/\<path-to-file>"|None|`uris` or `prefixes` or `objects` must be set|
-|prefixes|JSON array of URI prefixes for the locations of Azure Blob objects to be ingested. Should be in the form "azure://\<container>/\<prefix>". Empty objects starting with one of the given prefixes will be skipped.|None|`uris` or `prefixes` or `objects` must be set|
-|objects|JSON array of Azure Blob objects to be ingested.|None|`uris` or `prefixes` or `objects` must be set|
+|uris|JSON array of URIs where the Azure objects to be ingested are located, in the form "azure://\<container>/\<path-to-file>"|None|`uris` or `prefixes` or `objects` must be set|
+|prefixes|JSON array of URI prefixes for the locations of Azure objects to ingest, in the form "azure://\<container>/\<prefix>". Empty objects starting with one of the given prefixes are skipped.|None|`uris` or `prefixes` or `objects` must be set|
+|objects|JSON array of Azure objects to ingest.|None|`uris` or `prefixes` or `objects` must be set|
 
-Note that the Azure input source will skip all empty objects only when `prefixes` is specified.
+Note that the Azure input source skips all empty objects only when `prefixes` is specified.
 
-Azure Blob object:
+The `objects` property is:
 
 |property|description|default|required?|
 |--------|-----------|-------|---------|
-|bucket|Name of the Azure Blob Storage container|None|yes|
+|bucket|Name of the Azure Blob Storage or Azure Data Lake container|None|yes|
 |path|The path where data is located.|None|yes|
 
 ### HDFS Input Source
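For context, here is a minimal sketch of how the `objects` form described in this change plugs into the `ioConfig` of an `index_parallel` spec. The container and path values are placeholders, and the surrounding `dataSchema` and `tuningConfig` sections are elided, following the style of the docs' other sample specs.

```json
...
    "ioConfig": {
      "type": "index_parallel",
      "inputSource": {
        "type": "azure",
        "objects": [
          { "bucket": "container", "path": "prefix/file1.json" },
          { "bucket": "container", "path": "prefix/file2.json" }
        ]
      },
      "inputFormat": {
        "type": "json"
      },
      ...
    },
...
```

Because `objects` names each file explicitly, no empty-object skipping applies; that behavior is specific to `prefixes`, as noted in the property table above.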