diff --git a/docs/development/extensions-contrib/delta-lake.md b/docs/development/extensions-contrib/delta-lake.md
index 503fbfdc55d..88f3a2c77f3 100644
--- a/docs/development/extensions-contrib/delta-lake.md
+++ b/docs/development/extensions-contrib/delta-lake.md
@@ -51,9 +51,4 @@ java \
   -c "org.apache.druid.extensions.contrib:druid-deltalake-extensions:"
 ```
 
-See [Loading community extensions](../../configuration/extensions.md#loading-community-extensions) for more information.
-
-## Known limitations
-
-This extension relies on the Delta Kernel API and can only read from the latest Delta table snapshot. Ability to read from
-arbitrary snapshots is tracked [here](https://github.com/delta-io/delta/issues/2581).
\ No newline at end of file
+See [Loading community extensions](../../configuration/extensions.md#loading-community-extensions) for more information.
\ No newline at end of file
diff --git a/docs/ingestion/input-sources.md b/docs/ingestion/input-sources.md
index c6ba2e5b49e..71340abc2c0 100644
--- a/docs/ingestion/input-sources.md
+++ b/docs/ingestion/input-sources.md
@@ -1147,11 +1147,12 @@ To use the Delta Lake input source, load the extension [`druid-deltalake-extensi
 
 You can use the Delta input source to read data stored in a Delta Lake table. For a given table, the input source scans the latest snapshot from the configured table. Druid ingests the underlying delta files from the table.
 
-| Property|Description|Required|
-|---------|-----------|--------|
-| type|Set this value to `delta`.|yes|
-| tablePath|The location of the Delta table.|yes|
-| filter|The JSON Object that filters data files within a snapshot.|no|
+|Property|Description|Default|Required|
+|--------|-----------|-------|--------|
+|type|Set this value to `delta`.|None|yes|
+|tablePath|The location of the Delta table.|None|yes|
+|filter|The JSON object that filters data files within a snapshot.|None|no|
+|snapshotVersion|The version of the Delta table snapshot to read. Must be an integer.|Latest|no|
 
 ### Delta filter object
 
@@ -1224,7 +1225,7 @@ filters on partitioned columns.
 | column | The table column to apply the filter on. | yes |
 | value | The value to use in the filter. | yes |
 
-The following is a sample spec to read all records from the Delta table `/delta-table/foo`:
+The following is a sample spec to read all records from the latest snapshot of the Delta table `/delta-table/foo`:
 
 ```json
 ...
@@ -1237,7 +1238,8 @@ The following is a sample spec to read all records from the Delta table `/delta-
 }
 ```
 
-The following is a sample spec to read records from the Delta table `/delta-table/foo` to select records where `name = 'Employee4' and age >= 30`:
+The following is a sample spec to read records from snapshot version `3` of the Delta table `/delta-table/foo` to select records where
+`name = 'Employee4' and age >= 30`:
 
 ```json
 ...
@@ -1260,7 +1262,8 @@ The following is a sample spec to read records from the Delta table `/delta-tabl
           "value": "30"
         }
       ]
-    }
+    },
+    "snapshotVersion": 3
   },
 }
 ```
diff --git a/extensions-contrib/druid-deltalake-extensions/src/main/java/org/apache/druid/delta/input/DeltaInputSource.java b/extensions-contrib/druid-deltalake-extensions/src/main/java/org/apache/druid/delta/input/DeltaInputSource.java
index 01a18e9bc85..c4c2f2668b0 100644
--- a/extensions-contrib/druid-deltalake-extensions/src/main/java/org/apache/druid/delta/input/DeltaInputSource.java
+++ b/extensions-contrib/druid-deltalake-extensions/src/main/java/org/apache/druid/delta/input/DeltaInputSource.java
@@ -67,9 +67,9 @@ import java.util.stream.Collectors;
 import java.util.stream.Stream;
 
 /**
- * Input source to ingest data from a Delta Lake. This input source reads the latest snapshot from a Delta table
- * specified by {@code tablePath} parameter. If {@code filter} is specified, it's used at the Kernel level
- * for data pruning. The filtering behavior is as follows:
+ * Input source to ingest data from a Delta Lake. This input source reads the snapshot given by {@code snapshotVersion}
+ * from the Delta table specified by the {@code tablePath} parameter, or the latest snapshot if no version is given.
+ * If {@code filter} is specified, it's used at the Kernel level for data pruning. The filtering behavior is as follows:
  *