druid/extensions-contrib/druid-deltalake-extensions/pom.xml

<?xml version="1.0" encoding="UTF-8"?>
<!--
~ Licensed to the Apache Software Foundation (ASF) under one
~ or more contributor license agreements. See the NOTICE file
~ distributed with this work for additional information
~ regarding copyright ownership. The ASF licenses this file
~ to you under the Apache License, Version 2.0 (the
~ "License"); you may not use this file except in compliance
~ with the License. You may obtain a copy of the License at
~
~ http://www.apache.org/licenses/LICENSE-2.0
~
~ Unless required by applicable law or agreed to in writing,
~ software distributed under the License is distributed on an
~ "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
~ KIND, either express or implied. See the License for the
~ specific language governing permissions and limitations
~ under the License.
-->
<project xmlns="http://maven.apache.org/POM/4.0.0"
xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
xsi:schemaLocation="http://maven.apache.org/POM/4.0.0 http://maven.apache.org/xsd/maven-4.0.0.xsd">
<groupId>org.apache.druid.extensions.contrib</groupId>
<artifactId>druid-deltalake-extensions</artifactId>
<name>druid-deltalake-extensions</name>
<description>Delta Lake connector for Druid</description>
<parent>
<artifactId>druid</artifactId>
<groupId>org.apache.druid</groupId>
<version>30.0.0-SNAPSHOT</version>
<relativePath>../../pom.xml</relativePath>
</parent>
<modelVersion>4.0.0</modelVersion>
<properties>
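<!-- A single version property pins both Delta Kernel artifacts (delta-kernel-api and delta-kernel-defaults, declared below) to the same release. -->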
<delta-kernel.version>3.1.0</delta-kernel.version>
</properties>
<dependencies>
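<!--
  Delta Kernel libraries: delta-kernel-api exposes the table and scan APIs used to read Delta Lake
  tables, while delta-kernel-defaults supplies the default engine implementation (backed by Hadoop
  and Parquet readers) that the extension uses to process the Delta log and data files.
-->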
<dependency>
<groupId>io.delta</groupId>
<artifactId>delta-kernel-api</artifactId>
<version>${delta-kernel.version}</version>
</dependency>
<dependency>
<groupId>io.delta</groupId>
<artifactId>delta-kernel-defaults</artifactId>
<version>${delta-kernel.version}</version>
</dependency>
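<!--
  The default Delta Kernel engine relies on Hadoop classes (e.g. Configuration), so hadoop-client-api
  is declared at compile scope here rather than assumed to be provided by the Druid runtime.
-->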
<dependency>
<groupId>org.apache.hadoop</groupId>
<artifactId>hadoop-client-api</artifactId>
<version>${hadoop.compile.version}</version>
<scope>compile</scope>
</dependency>
<dependency>
<groupId>org.apache.druid</groupId>
<artifactId>druid-processing</artifactId>
<version>${project.parent.version}</version>
<scope>provided</scope>
</dependency>
<dependency>
<groupId>com.fasterxml.jackson.core</groupId>
<artifactId>jackson-annotations</artifactId>
<scope>provided</scope>
</dependency>
<dependency>
<groupId>com.google.code.findbugs</groupId>
<artifactId>jsr305</artifactId>
<scope>provided</scope>
</dependency>
<dependency>
<groupId>com.google.guava</groupId>
<artifactId>guava</artifactId>
<scope>provided</scope>
</dependency>
<dependency>
<groupId>com.google.inject</groupId>
<artifactId>guice</artifactId>
<scope>provided</scope>
</dependency>
<dependency>
<groupId>joda-time</groupId>
<artifactId>joda-time</artifactId>
<scope>provided</scope>
</dependency>
<dependency>
<groupId>com.fasterxml.jackson.core</groupId>
<artifactId>jackson-core</artifactId>
<scope>provided</scope>
</dependency>
<dependency>
<groupId>com.fasterxml.jackson.core</groupId>
<artifactId>jackson-databind</artifactId>
<version>2.12.7.1</version>
</dependency>
<dependency>
<groupId>it.unimi.dsi</groupId>
<artifactId>fastutil-core</artifactId>
<version>8.5.4</version>
<scope>provided</scope>
</dependency>
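<!--
  hadoop-aws provides the S3A filesystem at runtime so Delta tables stored on S3 can be read.
  The aws-java-sdk-bundle is excluded to trim the extension's footprint; this assumes the needed
  AWS SDK classes are supplied elsewhere on the classpath (for example by an AWS-related extension
  or an explicitly added SDK jar).
-->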
<dependency>
<groupId>org.apache.hadoop</groupId>
<artifactId>hadoop-aws</artifactId>
<version>${hadoop.compile.version}</version>
<scope>runtime</scope>
<exclusions>
<exclusion>
<groupId>com.amazonaws</groupId>
<artifactId>aws-java-sdk-bundle</artifactId>
</exclusion>
</exclusions>
</dependency>
<dependency>
<groupId>junit</groupId>
<artifactId>junit</artifactId>
<scope>test</scope>
</dependency>
<dependency>
<groupId>org.apache.druid</groupId>
<artifactId>druid-processing</artifactId>
<version>${project.parent.version}</version>
<type>test-jar</type>
<scope>test</scope>
</dependency>
<dependency>
<groupId>org.hamcrest</groupId>
<artifactId>hamcrest-all</artifactId>
<scope>test</scope>
</dependency>
<dependency>
<groupId>org.hamcrest</groupId>
<artifactId>hamcrest-core</artifactId>
<scope>test</scope>
</dependency>
</dependencies>
<build>
<plugins>
<plugin>
<groupId>org.owasp</groupId>
<artifactId>dependency-check-maven</artifactId>
<configuration>
<skip>true</skip>
</configuration>
</plugin>
</plugins>
</build>
</project>