druid

mirror of https://github.com/apache/druid.git synced 2025-02-11 04:24:58 +00:00

Frame format for data transfer and short-term storage. (#12745 )

* Frame format for data transfer and short-term storage.

As we move towards query execution plans that involve more transfer
of data between servers, it's important to have a data format that
provides for doing this more efficiently than the options available to
us today.

This patch adds:

- Columnar frames, which support fast querying.
- Row-based frames, which support fast sorting via memory comparison
  and fast whole-row copies via memory copying.
- Frame files, a container format that can be stored on disk or
  transferred between servers.

The idea is we should use row-based frames when data is expected to
be sorted, and columnar frames when data is expected to be queried.
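To illustrate why a row-based layout can be sorted by plain memory comparison, here is a minimal sketch. It is not the frame format added by this patch: the class `ComparableRowSketch`, the 12-byte row layout, and the sign-bit-flip encoding are assumptions chosen only to show the general technique of making unsigned bytewise order match the logical sort order.

```java
import java.nio.ByteBuffer;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical illustration; not part of org.apache.druid.frame.
public class ComparableRowSketch {

    // Encode a (long key, int payload) row as 12 bytes. The key is written
    // big-endian with its sign bit flipped, so that comparing the bytes as
    // unsigned values gives the same order as comparing the keys as longs.
    static byte[] encodeRow(long key, int payload) {
        return ByteBuffer.allocate(Long.BYTES + Integer.BYTES)
            .putLong(key ^ Long.MIN_VALUE)
            .putInt(payload)
            .array();
    }

    // Reverse the key encoding.
    static long decodeKey(byte[] row) {
        return ByteBuffer.wrap(row).getLong() ^ Long.MIN_VALUE;
    }

    // Sort rows by comparing only the raw key bytes; no field is decoded
    // during the sort itself.
    static List<Long> sortedKeys(long... keys) {
        List<byte[]> rows = new ArrayList<>();
        for (int i = 0; i < keys.length; i++) {
            rows.add(encodeRow(keys[i], i));
        }
        rows.sort((a, b) ->
            Arrays.compareUnsigned(a, 0, Long.BYTES, b, 0, Long.BYTES));

        List<Long> result = new ArrayList<>();
        for (byte[] row : rows) {
            result.add(decodeKey(row));
        }
        return result;
    }

    public static void main(String[] args) {
        System.out.println(sortedKeys(42L, -7L, 0L, Long.MIN_VALUE, 3L));
    }
}
```

Because each row is one contiguous byte array, moving a row during the sort is a single reference swap here, and would be a single memory copy in an off-heap layout. `Arrays.compareUnsigned` requires Java 9 or later.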

The code in this patch is not used in production yet. Therefore, the
patch involves minimal changes outside of the org.apache.druid.frame
package.  The main ones are adjustments to SqlBenchmark to add benchmarks
for queries on frames, and the addition of a "forEach" method to Sequence.

* Fixes based on tests, static analysis.

* Additional fixes.

* Skip DS mapping tests on JDK 14+

* Better JDK checking in tests.

* Fix imports.

* Additional comment.

* Adjustments from code review.

* Update test case.

2022-07-08 20:42:06 -07:00

.github

Lock hadoop dependencies to 2.8.5 (#11583 )

2021-08-12 15:16:47 +05:30

.idea

Poison stupid pool (#12646 )

2022-07-03 14:36:22 -07:00

benchmarks

Frame format for data transfer and short-term storage. (#12745 )

2022-07-08 20:42:06 -07:00

cloud

add aws-java-sdk-sts to aws-common classpath (#12482 )

2022-05-03 12:25:51 -07:00

codestyle

ScanQuery: Fix JsonIgnore for isLegacy. (#12674 )

2022-06-18 15:55:54 -07:00

core

Frame format for data transfer and short-term storage. (#12745 )

2022-07-08 20:42:06 -07:00

dev

Add git hooks that can run multiple scripts (#12300 )

2022-03-09 07:16:47 +09:00

distribution

deps: upgrade mysql-connector-java to v5.1.49 (#12704 )

2022-06-29 23:15:46 +08:00

docs

IMPLY-12348: Update description of UNION ALL in SQL syntax doc (#12710 )

2022-07-05 13:08:01 -07:00

examples

Add TIME_IN_INTERVAL SQL operator. (#12662 )

2022-06-21 13:05:37 -07:00

extendedset

Bump up the versions (#12480 )

2022-04-27 14:28:20 +05:30

extensions-contrib

Able to filter Cloud objects with glob notation. (#12659 )

2022-06-24 11:40:08 +05:30

extensions-core

Mark specific nimbus.lang.tag.version. (#12751 )

2022-07-07 09:58:35 +05:30

helm/druid

Bump up the versions (#12480 )

2022-04-27 14:28:20 +05:30

hll

Free ByteBuffers in tests and fix some bugs. (#12521 )

2022-05-19 07:42:29 -07:00

hooks

Git hooks should fail on errors; pass args to git hooks (#12322 )

2022-03-10 09:07:50 +09:00

indexing-hadoop

Add authentication call before cleaning up intermediate files in hadoop ingestions (#12030 )

2022-05-02 08:40:44 -05:00

indexing-service

Add EIGHT_HOUR into possible list of Granularities. (#12717 )

2022-07-05 11:05:37 -07:00

integration-tests

fix DruidSchema issue where datasources with no segments can become stuck in tables list indefinitely (#12727 )

2022-07-01 18:54:01 -07:00

licenses

Blueprint 4 (#12391 )

2022-04-04 10:34:22 -07:00

processing

Frame format for data transfer and short-term storage. (#12745 )

2022-07-08 20:42:06 -07:00

publications

De-incubation cleanup in code, docs, packaging (#9108 )

2020-01-03 12:33:19 -05:00

server

Retain CSP configuration in ServerConfig constructor. (#12755 )

2022-07-08 19:19:14 +05:30

services

Mid-level service client and updated high-level clients. (#12696 )

2022-07-05 09:43:26 -07:00

sql

Frame format for data transfer and short-term storage. (#12745 )

2022-07-08 20:42:06 -07:00

web-console

Fix skipTests build flag (#12716 )

2022-06-29 21:59:26 -07:00

website

Update default value of inputSegmentSizeBytes in configuration docs (#12678 )

2022-06-22 09:05:03 +05:30

.asf.yaml

Add .asf.yaml. (#9083 )

2019-12-20 16:45:38 -08:00

.backportrc.json

Add 0.18.0 to .backportrc.json to facilitate backport. (#9661 )

2020-04-11 13:49:04 -07:00

.codecov.yml

Use Codecov (#8388 )

2019-08-28 08:49:30 -07:00

.dockerignore

Add docker container for druid (#6896 )

2019-02-08 12:12:28 +00:00

.gitignore

Cleanup changes pulled out of PR #12368 (#12672 )

2022-06-23 23:19:50 +05:30

.lgtm.yml

Suppress LGTM warnings about stack trace exposure (#9631 )

2020-04-09 17:31:03 -07:00

.travis.yml

Cleanup changes pulled out of PR #12368 (#12672 )

2022-06-23 23:19:50 +05:30

check_test_suite_test.py

suppress false positive cve (#11699 )

2021-09-13 20:45:38 -07:00

check_test_suite.py

suppress false positive cve (#11699 )

2021-09-13 20:45:38 -07:00

CONTRIBUTING.md

Fix numbered list formatting in markdown. (#9664 )

2020-04-21 20:18:12 -07:00

LABELS

Add plain text README.txt, use relative link from README.md to build.md (#7611 )

2019-05-09 21:29:26 -07:00

LICENSE

support Aliyun OSS service as deep storage (#9898 )

2020-07-01 22:20:53 -07:00

licenses.yaml

Mark specific nimbus.lang.tag.version. (#12751 )

2022-07-07 09:58:35 +05:30

NOTICE

license.yaml fixes for code introduced related to AWS RDS token based password provider in PR #9518 (#10885 )

2021-03-10 12:59:25 -08:00

owasp-dependency-check-suppressions.xml

Suppress CVE-2022-33915 (#12740 )

2022-07-04 22:48:08 +05:30

pom.xml

Frame format for data transfer and short-term storage. (#12745 )

2022-07-08 20:42:06 -07:00

README.md

Add JDK 11 (#12333 )

2022-03-16 15:03:04 -07:00

README.template

De-incubation cleanup in code, docs, packaging (#9108 )

2020-01-03 12:33:19 -05:00

upload.sh

Adding licenses and enable apache-rat-plugin. (#6215 )

2018-09-18 08:39:26 -07:00

README.md

Apache Druid

Druid is a high-performance, real-time analytics database. Its main value is reducing time to insight and action.

Druid is designed for workflows where fast queries and fast ingestion matter. Druid excels at powering UIs, running operational (ad-hoc) queries, and handling high concurrency. Consider Druid as an open-source alternative to data warehouses for a variety of use cases. The design documentation explains the key concepts.

Getting started

You can get started with Druid using our local or Docker quickstart.

Druid provides a rich set of APIs (via HTTP and JDBC) for loading, managing, and querying your data. You can also interact with Druid via the built-in console (shown below).
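As a rough sketch of querying over HTTP, the example below builds a POST request for Druid's SQL endpoint (`/druid/v2/sql`) using the JDK's built-in HTTP client. The class name `DruidSqlHttpSketch`, the base URL `http://localhost:8888`, and the `wikipedia` datasource are assumptions for illustration; adjust them to your cluster. The hand-rolled JSON escaping covers only quotes, backslashes, and newlines, so a real client should use a JSON library.

```java
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;

// Hypothetical illustration; requires Java 11 or later.
public class DruidSqlHttpSketch {

    // Wrap a SQL string in the JSON body {"query": "..."}.
    // Minimal escaping only: backslash, double quote, newline.
    static String toJsonBody(String sql) {
        String escaped = sql
            .replace("\\", "\\\\")
            .replace("\"", "\\\"")
            .replace("\n", "\\n");
        return "{\"query\":\"" + escaped + "\"}";
    }

    // Build (but do not send) a POST request to the SQL endpoint.
    static HttpRequest buildRequest(String baseUrl, String sql) {
        return HttpRequest.newBuilder(URI.create(baseUrl + "/druid/v2/sql"))
            .header("Content-Type", "application/json")
            .POST(HttpRequest.BodyPublishers.ofString(toJsonBody(sql)))
            .build();
    }

    public static void main(String[] args) throws Exception {
        // "wikipedia" is an assumed datasource name.
        String sql = "SELECT COUNT(*) AS n FROM \"wikipedia\"";
        if (args.length == 0) {
            // No base URL given: show the request body instead of sending it.
            System.out.println(toJsonBody(sql));
            return;
        }
        // Example argument (assumed address): http://localhost:8888
        HttpRequest request = buildRequest(args[0], sql);
        HttpResponse<String> response = HttpClient.newHttpClient()
            .send(request, HttpResponse.BodyHandlers.ofString());
        System.out.println(response.statusCode() + " " + response.body());
    }
}
```

Run without arguments, the program only prints the JSON body; pass a base URL as the first argument to send the query to a running cluster.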

Load data

Load streaming and batch data using a point-and-click wizard that guides you through ingestion setup. Monitor one-off tasks and ingestion supervisors.

Manage the cluster

Manage your cluster with ease. Get a view of your datasources, segments, ingestion tasks, and services from one convenient location. These views are all powered by SQL system tables, so you can see the underlying query for each view.

Issue queries

Use the built-in query workbench to prototype Druid SQL and native queries, or connect one of the many tools that help you make the most out of Druid.

Documentation

You can find the documentation for the latest Druid release on the project website.

If you would like to contribute documentation, please do so under /docs in this repository and submit a pull request.

Community

Community support is available on the druid-user mailing list, which is hosted at Google Groups.

Development discussions occur on dev@druid.apache.org, which you can subscribe to by emailing dev-subscribe@druid.apache.org.

Chat with Druid committers and users in real time on the Apache Druid Slack channel. Please use this invitation link to join and invite others.

Building from source

Please note that JDK 8 or JDK 11 is required to build Druid.

For instructions on building Druid from source, see docs/development/build.md

Contributing

Please follow the community guidelines for contributing.

For instructions on setting up IntelliJ, see dev/intellij-setup.md

License

Apache License, Version 2.0
