druid

mirror of https://github.com/apache/druid.git synced 2025-02-10 03:55:02 +00:00

Abhishek Radhakrishnan 9f95a691f7

Extension to read and ingest Delta Lake tables (#15755 )

* something

* test commit

* compilation fix

* more compilation fixes (fixme placeholders)

* Comment out druid-kereberos build since it conflicts with newly added transitive deps from delta-lake

Will need to sort out the dependencies later.

* checkpoint

* remove snapshot schema since we can get schema from the row

* iterator bug fix

* json json json

* sampler flow

* empty impls for read(InputStats) and sample()

* conversion?

* conversion, without timestamp

* Web console changes to show Delta Lake

* Asset bug fix and tile load

* Add missing pieces to input source info, etc.

* fix stuff

* Use a different delta lake asset

* Delta lake extension dependencies

* Cleanup

* Add InputSource, module init and helper code to process delta files.

* Test init

* Checkpoint changes

* Test resources and updates

* some fixes

* move to the correct package

* More tests

* Test cleanup

* TODOs

* Test updates

* requirements and javadocs

* Adjust dependencies

* Update readme

* Bump up version

* fixup typo in deps

* forbidden api and checkstyle checks

* Trim down dependencies

* new lines

* Fixup Intellij inspections.

* Add equals() and hashCode()

* chain splits, intellij inspections

* review comments and todo placeholder

* fix up some docs

* null table path and test dependencies. Fixup broken link.

* run prettify

* Different test; fixes

* Upgrade pyspark and delta-spark to latest (3.5.0 and 3.0.0) and regenerate tests

* yank the old test resource.

* add a couple of sad path tests

* Updates to readme based on latest.

* Version support

* Extract Delta DateTime converstions to DeltaTimeUtils class and add test

* More comprehensive split tests.

* Some test renames.

* Cleanup and update instructions.

* add pruneSchema() optimization for table scans.

* Oops, missed the parquet files.

* Update default table and rename schema constants.

* Test setup and misc changes.

* Add class loader logic as the context class loader is unaware about extension classes

* change some table client creation logic.

* Add hadoop-aws, hadoop-common and related exclusions.

* Remove org.apache.hadoop:hadoop-common

* Apply suggestions from code review

Co-authored-by: Victoria Lim <vtlim@users.noreply.github.com>

* Add entry to .spelling to fix docs static check

---------

Co-authored-by: abhishekagarwal87 <1477457+abhishekagarwal87@users.noreply.github.com>
Co-authored-by: Laksh Singla <lakshsingla@gmail.com>
Co-authored-by: Victoria Lim <vtlim@users.noreply.github.com>

2024-01-30 21:53:50 -08:00

.github

Fix minor build issues and stabilize intellij-inspections runs (#15747 )

2024-01-24 15:17:33 +05:30

.idea

Ignore misc.xml (#14362 )

2023-06-02 12:00:52 +05:30

benchmarks

Prepare main branch for next 30.0.0 release. (#15707 )

2024-01-23 15:55:54 +05:30

cloud

Prepare main branch for next 30.0.0 release. (#15707 )

2024-01-23 15:55:54 +05:30

codestyle

fix rat and checkstyle issue (#15530 )

2023-12-14 09:33:01 +08:00

dev

Suggest adoption of Google Style guide (#14905 )

2023-11-01 13:31:03 -07:00

distribution

Extension to read and ingest Delta Lake tables (#15755 )

2024-01-30 21:53:50 -08:00

docs

Extension to read and ingest Delta Lake tables (#15755 )

2024-01-30 21:53:50 -08:00

examples

Disable eager initialization for non-query connection requests (#15751 )

2024-01-25 14:38:50 +05:30

extensions-contrib

Extension to read and ingest Delta Lake tables (#15755 )

2024-01-30 21:53:50 -08:00

extensions-core

Allow null values for account when injecting (#15777 )

2024-01-30 16:55:45 -05:00

helm/druid

helm: Add serviceAccounts, rbac, and small fixes (#13747 )

2023-02-23 11:42:03 +05:30

hooks

Git hooks should fail on errors; pass args to git hooks (#12322 )

2022-03-10 09:07:50 +09:00

indexing-hadoop

Prepare main branch for next 30.0.0 release. (#15707 )

2024-01-23 15:55:54 +05:30

indexing-service

Release unneeded append locks after acquiring a new superseding append lock (#15682 )

2024-01-30 16:51:56 +05:30

integration-tests

Temporarily bump up the delay in auth IT from 5s to 10s. (#15765 )

2024-01-26 11:52:27 -05:00

integration-tests-ex

Disable eager initialization for non-query connection requests (#15751 )

2024-01-25 14:38:50 +05:30

licenses

Web console: Update and prune dependancies (#15487 )

2023-12-05 14:25:07 -08:00

processing

add null value index wiring for nested column to speed up is null/is not null (#15687 )

2024-01-29 12:34:50 +05:30

publications

De-incubation cleanup in code, docs, packaging (#9108 )

2020-01-03 12:33:19 -05:00

server

Close open segments when a newer segment with higher version is allocated (#15727 )

2024-01-31 09:11:00 +05:30

services

Prepare main branch for next 30.0.0 release. (#15707 )

2024-01-23 15:55:54 +05:30

sql

Fix up value types when creating range filters. (#15778 )

2024-01-29 13:30:47 -08:00

web-console

Extension to read and ingest Delta Lake tables (#15755 )

2024-01-30 21:53:50 -08:00

website

Extension to read and ingest Delta Lake tables (#15755 )

2024-01-30 21:53:50 -08:00

.asf.yaml

.asf.yaml: Add required "repository" field. (#14499 )

2023-06-28 15:05:07 -07:00

.backportrc.json

Add 0.18.0 to .backportrc.json to facilitate backport. (#9661 )

2020-04-11 13:49:04 -07:00

.codecov.yml

Use Codecov (#8388 )

2019-08-28 08:49:30 -07:00

.dockerignore

Add docker container for druid (#6896 )

2019-02-08 12:12:28 +00:00

.gitignore

Docusaurus2 upgrade for master (#14411 )

2023-08-16 19:01:21 -07:00

.lgtm.yml

be consistent about referring to the web console by its name (#13118 )

2022-09-19 15:02:17 -07:00

check_test_suite_test.py

remove Travis CI (#13789 )

2023-02-10 01:46:56 -08:00

check_test_suite.py

Update Hadoop3 as default build version (#14005 )

2023-04-26 12:52:51 +05:30

CONTRIBUTING.md

Document our conventions for writing messages (#13916 )

2023-04-03 21:30:20 -07:00

doap_Druid.rdf

Fix the created property in DOAP RDF file (#14971 )

2023-09-13 06:12:35 -07:00

it.sh

Build reliablity fixes (#15048 )

2023-09-28 12:27:52 -07:00

LABELS

Fixing security vulnerability check errors (#13956 )

2023-03-23 11:10:06 +05:30

LICENSE

Adding the PropertyNamingStrategies from jackson for fixing hadoop ingestion (#14671 )

2023-08-01 20:02:43 +05:30

licenses.yaml

Azure client upgrade to allow identity options (#15287 )

2024-01-03 18:36:05 -05:00

NOTICE

Update notice file. (#15702 )

2024-01-23 15:56:22 +05:30

owasp-dependency-check-suppressions.xml

unpin snakeyaml, add suppressions and licenses (#15549 )

2023-12-15 10:33:14 -08:00

pom.xml

Extension to read and ingest Delta Lake tables (#15755 )

2024-01-30 21:53:50 -08:00

README.md

Docusaurus2 upgrade for master (#14411 )

2023-08-16 19:01:21 -07:00

README.template

De-incubation cleanup in code, docs, packaging (#9108 )

2020-01-03 12:33:19 -05:00

upload.sh

Adding licenses and enable apache-rat-plugin. (#6215 )

2018-09-18 08:39:26 -07:00

README.md

Workflow	Status
⚙️ CodeQL Config
🔍 CodeQL
🕒 Cron Job ITS
🏷️ Labeler
♻️ Reusable Revised ITS
♻️ Reusable Standard ITS
♻️ Reusable Unit Tests
🔄 Revised ITS
🔧 Standard ITS
🛠️ Static Checks
🧪 Unit and Integration Tests Unified
🔬 Unit Tests

Apache Druid

Druid is a high-performance, real-time analytics database. Druid's main value add is to reduce time to insight and action.

Druid is designed for workflows where fast queries and fast ingestion matter. Druid excels at powering UIs, running operational (ad-hoc) queries, and handling high concurrency. Consider Druid as an open-source alternative to data warehouses for a variety of use cases. The design documentation explains the key concepts.

Getting started

You can get started with Druid with our local or Docker quickstart.

Druid provides a rich set of APIs (via HTTP and JDBC) for loading, managing, and querying your data. You can also interact with Druid via the built-in web console (shown below).
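As a rough illustration of the HTTP API mentioned above, the following sketch builds and sends a Druid SQL query using only the Python standard library. It assumes a local quickstart cluster whose Router listens on localhost:8888; the helper names (build_sql_request, run_sql) are hypothetical and not part of Druid, and your deployment's host, port, and authentication may differ.

```python
import json
import urllib.request

# Assumption: a local quickstart cluster with the Router on localhost:8888.
DRUID_SQL_URL = "http://localhost:8888/druid/v2/sql"


def build_sql_request(query: str, url: str = DRUID_SQL_URL) -> urllib.request.Request:
    """Build (but do not send) a POST request carrying a Druid SQL query."""
    payload = json.dumps({"query": query}).encode("utf-8")
    return urllib.request.Request(
        url,
        data=payload,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def run_sql(query: str, url: str = DRUID_SQL_URL) -> list:
    """Send the query and return the parsed JSON result.

    Requires a running cluster; raises urllib.error.URLError otherwise.
    """
    with urllib.request.urlopen(build_sql_request(query, url)) as response:
        return json.loads(response.read().decode("utf-8"))
```

With a cluster running, a call such as run_sql('SELECT COUNT(*) AS "cnt" FROM "wikipedia"') would return the result rows as a list, where "wikipedia" stands in for whatever datasource you have loaded.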

Load data

Load streaming and batch data using a point-and-click wizard that guides you through ingestion setup. Monitor one-off tasks and ingestion supervisors.

Manage the cluster

Manage your cluster with ease. Get a view of your datasources, segments, ingestion tasks, and services from one convenient location. All views are powered by SQL system tables, so you can see the underlying query for each view.

Issue queries

Use the built-in query workbench to prototype Druid SQL and native queries, or connect one of the many tools that help you make the most of Druid.

Documentation

See the latest documentation for the current official release. If you need information on a previous release, you can browse the documentation for previous releases.

Make documentation and tutorial updates in /docs using Markdown or extended Markdown (MDX). Then, open a pull request.

To build the site locally, you need Node 16.14 or higher. Install Docusaurus 2 by running npm|yarn install in the website directory, then run npm|yarn start to launch a local build of the docs.

If you're looking to update non-doc pages like Use Cases, those files are in the druid-website-src repo.

Community

Visit the official project community page to read about getting involved in contributing to Apache Druid, and how we help one another use and operate Druid.

Druid users can find help in the druid-user mailing list on Google Groups, and have more technical conversations in #troubleshooting on Slack.
Druid development discussions take place in the druid-dev mailing list (dev@druid.apache.org). Subscribe by emailing dev-subscribe@druid.apache.org. For live conversations, join the #dev channel on Slack.

Check out the official community page for details of how to join the community Slack channels.

Find articles written by community members and a calendar of upcoming events on the project site. Contribute your own events and articles by submitting a PR in the apache/druid-website-src repository.

Building from source

Please note that JDK 8 or JDK 11 is required to build Druid.
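If you want to check a build environment against this requirement, a small helper along the following lines may be useful. It is only a sketch: the function names are hypothetical, the supported set is taken from the note above, and the parser assumes the two common Java version schemes (the legacy "1.8.0_292" style and the newer "11.0.21" style).

```python
import re

# Taken from the note above: JDK 8 or JDK 11 is required to build Druid.
SUPPORTED_JDK_MAJORS = {8, 11}


def jdk_major_version(version: str) -> int:
    """Return the major version from a string such as '1.8.0_292' or '11.0.21'."""
    parts = re.split(r"[._+-]", version.strip())
    first = int(parts[0])
    # Legacy scheme: Java 8 and earlier report themselves as 1.x.
    if first == 1 and len(parts) > 1:
        return int(parts[1])
    return first


def is_supported_jdk(version: str) -> bool:
    """Check whether the given JDK version string is in the supported set."""
    return jdk_major_version(version) in SUPPORTED_JDK_MAJORS
```

The version string itself would typically come from the java.version system property or from the output of java -version; extracting it is left out of the sketch.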

See the latest build guide for instructions on building Apache Druid from source.

Contributing

Please follow the community guidelines for contributing.

For instructions on setting up IntelliJ, see dev/intellij-setup.md.

License

Apache License, Version 2.0

Languages

Java 62.4%

ReScript 30.7%

TypeScript 3.1%

Euphoria 0.9%

Csound 0.8%

Other 1.9%