Commit Graph

13082 Commits

Author SHA1 Message Date
Abhishek Agarwal 7911a04064
Refactoring of multi-topic kafka ingestion docs (#14828)
In this PR, I have gotten rid of multiTopic parameter and instead added a topicPattern parameter. Kafka supervisor will pass topicPattern or topic as the stream name to the core ingestion engine. There is validation to ensure that only one of topic or topicPattern will be set. This new setting is easier to understand than overloading the topic field that earlier could be interpreted differently depending on the value of some other field.
2023-08-16 18:00:11 +05:30
Kashif Faraz d9221e46e4
Completely disable cachingCost balancer strategy (#14798)
`cachingCost` has been deprecated in #14484 and is not advised to be used in
production clusters as it may cause usage skew across historicals which the
coordinator is unable to rectify. This PR completely disables `cachingCost` strategy
as it has now been rendered redundant due to recent performance improvements
made to `cost` strategy.

Changes
- Disable `cachingCost` strategy
- Add `DisabledCachingCostBalancerStrategyFactory` for the time being so that we
can give a proper error message before falling back to `CostBalancerStrategy`. This
will be removed in subsequent releases.
- Retain `CachingCostBalancerStrategy` for testing/benchmarking purposes.
- Add javadocs to `DiskNormalizedCostBalancerStrategy`
2023-08-16 11:43:52 +05:30
dependabot[bot] 9be0f64f50
Bump org.apache.commons:commons-compress from 1.21 to 1.23.0 (#14820)
* Bump org.apache.commons:commons-compress from 1.21 to 1.23.0

Bumps org.apache.commons:commons-compress from 1.21 to 1.23.0.

---
updated-dependencies:
- dependency-name: org.apache.commons:commons-compress
  dependency-type: direct:production
  update-type: version-update:semver-minor
...

Signed-off-by: dependabot[bot] <support@github.com>

* update licenses.yaml

---------

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
Co-authored-by: Xavier Léauté <xvrl@apache.org>
2023-08-15 20:08:54 -04:00
Vadim Ogievetsky 0b2563fea3
Web console: adding format notice for CSV and TSV (#14783)
* adding format notice for CSV and TSV

* Update web-console/src/druid-models/ingestion-spec/ingestion-spec.tsx

Co-authored-by: 317brian <53799971+317brian@users.noreply.github.com>

* Update web-console/src/druid-models/ingestion-spec/ingestion-spec.tsx

Co-authored-by: 317brian <53799971+317brian@users.noreply.github.com>

* Update web-console/src/druid-models/ingestion-spec/ingestion-spec.tsx

Co-authored-by: 317brian <53799971+317brian@users.noreply.github.com>

* fix tests

---------

Co-authored-by: 317brian <53799971+317brian@users.noreply.github.com>
2023-08-15 15:35:50 -07:00
Nhi Pham 8fa78594ea
Druid SQL API documentation refactor (#14711) 2023-08-15 13:45:25 -07:00
Nhi Pham a38579ab3c
Retention rules API documentation refactor (#14623) 2023-08-15 13:44:44 -07:00
dependabot[bot] aeeeed3b35
Bump protobuf.version from 3.21.7 to 3.24.0 (#14823)
* Bump protobuf.version from 3.21.7 to 3.24.0

Bumps `protobuf.version` from 3.21.7 to 3.24.0.

Updates `com.google.protobuf:protobuf-java` from 3.21.7 to 3.24.0
- [Release notes](https://github.com/protocolbuffers/protobuf/releases)
- [Changelog](https://github.com/protocolbuffers/protobuf/blob/main/protobuf_release.bzl)
- [Commits](https://github.com/protocolbuffers/protobuf/compare/v3.21.7...v3.24.0)

Updates `com.google.protobuf:protobuf-java-util` from 3.21.7 to 3.24.0

---
updated-dependencies:
- dependency-name: com.google.protobuf:protobuf-java
  dependency-type: direct:production
  update-type: version-update:semver-minor
- dependency-name: com.google.protobuf:protobuf-java-util
  dependency-type: direct:production
  update-type: version-update:semver-minor
...

Signed-off-by: dependabot[bot] <support@github.com>

* update licenses.yaml

* fix licenses.yaml

---------

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
Co-authored-by: Xavier Léauté <xvrl@apache.org>
2023-08-15 12:15:55 -07:00
dependabot[bot] 8abdaa239b
Bump dropwizard.metrics.version from 4.0.0 to 4.2.19 (#14824)
* Bump dropwizard.metrics.version from 4.0.0 to 4.2.19

Bumps `dropwizard.metrics.version` from 4.0.0 to 4.2.19.

Updates `io.dropwizard.metrics:metrics-core` from 4.0.0 to 4.2.19
- [Release notes](https://github.com/dropwizard/metrics/releases)
- [Commits](https://github.com/dropwizard/metrics/compare/v4.0.0...v4.2.19)

Updates `io.dropwizard.metrics:metrics-jmx` from 4.0.0 to 4.2.19
- [Release notes](https://github.com/dropwizard/metrics/releases)
- [Commits](https://github.com/dropwizard/metrics/compare/v4.0.0...v4.2.19)

---
updated-dependencies:
- dependency-name: io.dropwizard.metrics:metrics-core
  dependency-type: direct:production
  update-type: version-update:semver-minor
- dependency-name: io.dropwizard.metrics:metrics-jmx
  dependency-type: direct:production
  update-type: version-update:semver-minor
...

Signed-off-by: dependabot[bot] <support@github.com>

* update licenses.yaml

---------

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
Co-authored-by: Xavier Léauté <xvrl@apache.org>
2023-08-15 12:15:29 -07:00
Peter Marshall e33d2db235
202307-notebooks Template amends (#14683)
Co-authored-by: writer-jill <jill.osborne@imply.io>
2023-08-15 11:25:56 -07:00
dependabot[bot] 2fdf5b195f
Bump org.assertj:assertj-core from 3.19.0 to 3.24.2 (#14815)
Bumps org.assertj:assertj-core from 3.19.0 to 3.24.2.

---
updated-dependencies:
- dependency-name: org.assertj:assertj-core
  dependency-type: direct:production
  update-type: version-update:semver-minor
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2023-08-15 07:39:48 -07:00
dependabot[bot] f0a79fa0e4
Bump org.apache.maven.plugins:maven-source-plugin from 2.2.1 to 3.3.0 (#14812)
Bumps [org.apache.maven.plugins:maven-source-plugin](https://github.com/apache/maven-source-plugin) from 2.2.1 to 3.3.0.
- [Commits](https://github.com/apache/maven-source-plugin/compare/maven-source-plugin-2.2.1...maven-source-plugin-3.3.0)

---
updated-dependencies:
- dependency-name: org.apache.maven.plugins:maven-source-plugin
  dependency-type: direct:production
  update-type: version-update:semver-major
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2023-08-15 07:39:19 -07:00
dependabot[bot] 0967048dca
Bump org.scala-lang:scala-library from 2.13.9 to 2.13.11 (#14826)
Bumps [org.scala-lang:scala-library](https://github.com/scala/scala) from 2.13.9 to 2.13.11.
- [Release notes](https://github.com/scala/scala/releases)
- [Commits](https://github.com/scala/scala/compare/v2.13.9...v2.13.11)

---
updated-dependencies:
- dependency-name: org.scala-lang:scala-library
  dependency-type: direct:production
  update-type: version-update:semver-patch
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2023-08-15 07:38:34 -07:00
dependabot[bot] 76c7963979
Bump com.github.oshi:oshi-core from 6.4.2 to 6.4.4 (#14814)
* Bump com.github.oshi:oshi-core from 6.4.2 to 6.4.4

Bumps [com.github.oshi:oshi-core](https://github.com/oshi/oshi) from 6.4.2 to 6.4.4.
- [Release notes](https://github.com/oshi/oshi/releases)
- [Changelog](https://github.com/oshi/oshi/blob/master/CHANGELOG.md)
- [Commits](https://github.com/oshi/oshi/compare/oshi-parent-6.4.2...oshi-parent-6.4.4)

---
updated-dependencies:
- dependency-name: com.github.oshi:oshi-core
  dependency-type: direct:production
  update-type: version-update:semver-patch
...

Signed-off-by: dependabot[bot] <support@github.com>

* Update licenses.yaml

---------

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
Co-authored-by: Abhishek Agarwal <1477457+abhishekagarwal87@users.noreply.github.com>
2023-08-15 16:14:19 +05:30
dependabot[bot] 5f54ae7d27
Bump org.apache.maven.plugins:maven-surefire-plugin (#14813)
Bumps [org.apache.maven.plugins:maven-surefire-plugin](https://github.com/apache/maven-surefire) from 3.0.0-M7 to 3.1.2.
- [Release notes](https://github.com/apache/maven-surefire/releases)
- [Commits](https://github.com/apache/maven-surefire/compare/surefire-3.0.0-M7...surefire-3.1.2)
2023-08-14 23:07:53 -07:00
dependabot[bot] d5d483fdc9
Bump org.apache.rat:apache-rat-plugin from 0.12 to 0.15 (#14817) 2023-08-14 23:07:03 -07:00
Xavier Léauté 50b3d96df5
increase dependabot PR limit for Java dependencies (#14804)
Many dependabot PRs are currently stuck due to API changes or
incompatibilities. Temporarily Increasing the limit so we can get
updates for other dependencies.
2023-08-14 19:51:59 -07:00
Abhishek Agarwal 30b5dd4ca7
Add support to read from multiple kafka topics in same supervisor (#14424)
This PR adds support to read from multiple Kafka topics in the same supervisor. A multi-topic ingestion can be useful in scenarios where a cluster admin has no control over input streams. Different teams in an org may create different input topics that they can write the data to. However, the cluster admin wants all this data to be queryable in one data source.
2023-08-14 22:24:49 +05:30
AmatyaAvadhanula e16096735b
Fix 404 when segment is used but not in the Coordinator snapshot (#14762)
* Fix 404 when used segment has not been updated in the Coordinator snapshot

* Add unit test
2023-08-14 13:20:43 +05:30
Kashif Faraz 786e772d26
Remove config `druid.coordinator.compaction.skipLockedIntervals` (#14807)
The value of `druid.coordinator.compaction.skipLockedIntervals` should always be `true`.
2023-08-14 12:31:15 +05:30
Rishabh Singh 0dc305f9e4
Upgrade hibernate validator version to fix CVE-2019-10219 (#14757) 2023-08-14 11:50:51 +05:30
dependabot[bot] e2d2afce46
Bump postgresql from 42.4.1 to 42.6.0 (#13959)
* Bump postgresql from 42.4.1 to 42.6.0

Bumps [postgresql](https://github.com/pgjdbc/pgjdbc) from 42.4.1 to 42.6.0.
- [Release notes](https://github.com/pgjdbc/pgjdbc/releases)
- [Changelog](https://github.com/pgjdbc/pgjdbc/blob/master/CHANGELOG.md)
- [Commits](https://github.com/pgjdbc/pgjdbc/compare/REL42.4.1...REL42.6.0)

---
updated-dependencies:
- dependency-name: org.postgresql:postgresql
  dependency-type: direct:production
  update-type: version-update:semver-minor
...

Signed-off-by: dependabot[bot] <support@github.com>

* update licenses.yaml

---------

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
Co-authored-by: Xavier Léauté <xvrl@apache.org>
2023-08-12 19:17:00 -04:00
Soumyava afe22907a5
Calcite upgrade 1.35 (#14510)
* Update to Calcite 1.35.0
* Update from.ftl for Calcite 1.35.0.
* Fixed tests in Calcite upgrade by doing the following:
1. Added a new rule, CoreRules.PROJECT_FILTER_TRANSPOSE_WHOLE_PROJECT_EXPRESSIONS, to Base rules
2. Refactored the CorrelateUnnestRule
3. Updated CorrelateUnnestRel accordingly
4. Fixed a case with selector filters on the left where Calcite was eliding the virtual column
5. Additional test cases for fixes in 2,3,4
6. Update to StringListAggregator to fail a query if separators are not propagated appropriately
* Refactored for testcases to pass after the upgrade, introduced 2 new data sources for handling filters and select projects
* Added a literalSqlAggregator as the upgraded Calcite involved changes to subquery remove rule. This corrected plans for 2 queries with joins and subqueries by replacing an useless literal dimension with a post agg. Additionally a test with COUNT DISTINCT and FILTER which was failing with Calcite 1.21 is added here which passes with 1.35
* Updated to latest avatica and updated code as SqlUnknownTimeStamp is now used in Calcite which needs to be resolved to a timestamp literal
* Added a wrapper segment ref to use for unnest and filter segment reference
2023-08-11 12:47:16 -07:00
George Shiqi Wu c8a11702db
Support broadcast segmetns (#14789) 2023-08-11 11:14:05 -07:00
Vadim Ogievetsky ec28672d07
Web console: allow format picking for download (#14794)
* allow format picking for download

* better popover

* ux review tweaks
2023-08-11 09:43:29 -07:00
Vadim Ogievetsky b0c78ff295
Web console: make retention dialog clearer (#14793)
* make retention dialog clearer

* tweak

* another tweak

* Update web-console/src/dialogs/retention-dialog/retention-dialog.tsx

Co-authored-by: Suneet Saldanha <suneet@apache.org>

* update snapshot for copy

---------

Co-authored-by: Suneet Saldanha <suneet@apache.org>
2023-08-11 09:43:00 -07:00
hqx871 a0234c4e13
Add sampling factor for DeterminePartitionsJob (#13840)
There are two type of DeterminePartitionsJob:
-  When the input data is not assume grouped, there may be duplicate rows.
In this case, two MR jobs are launched. The first one do group job to remove duplicate rows.
And a second one to perform global sorting to find lower and upper bound for target segments.
- When the input data is assume grouped, we only need to launch the global sorting
MR job to find lower and upper bound for segments.

Sampling strategy:
- If the input data is assume grouped, sample by random at the mapper side of the global sort mr job.
- If the input data is not assume grouped, sample at the mapper of the group job. Use hash on time
and all dimensions and mod by sampling factor to sample, don't use random method because there
may be duplicate rows.
2023-08-11 10:42:25 +05:30
Sergio Ferragut 353f7bed7f
Adding data generation pod to jupyter notebooks deployment (#14742)
Co-authored-by: Charles Smith <techdocsmith@gmail.com>
Co-authored-by: Victoria Lim <vtlim@users.noreply.github.com>
2023-08-10 15:43:05 -07:00
zachjsh 82d82dfbd6
Add stats to KillUnusedSegments coordinator duty (#14782)
### Description

Added the following metrics, which are calculated from the `KillUnusedSegments` coordinatorDuty

`"killTask/availableSlot/count"`: calculates the number remaining task slots available for auto kill
`"killTask/maxSlot/count"`: calculates the maximum number of tasks available for auto kill
`"killTask/task/count"`: calculates the number of tasks submitted by auto kill. 

#### Release note
NEW: metrics added for auto kill

`"killTask/availableSlot/count"`: calculates the number remaining task slots available for auto kill
`"killTask/maxSlot/count"`: calculates the maximum number of tasks available for auto kill
`"killTask/task/count"`: calculates the number of tasks submitted by auto kill.
2023-08-10 18:36:53 -04:00
zachjsh 23306c4d80
retry when killing s3 based segments (#14776)
### Description

s3 deleteObjects request sent when killing s3 based segments now being retried, if failure is retry-able.
2023-08-10 14:04:16 -04:00
Xavier Léauté 37ed0f4a17
Bump jclouds.version from 1.9.1 to 2.0.3 (#14746)
* Updates `org.apache.jclouds:*` from 1.9.1 to 2.0.3
* Pin jclouds to 2.0.x since 2.1.x requires Guava 18+
* replace easymock with mockito

Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2023-08-10 06:24:01 -07:00
Rishabh Singh 4b9846b90f
Improve exception message when DruidLeaderClient doesn't find leader node (#14775)
The existing exception message No known server thrown in DruidLeaderClient is unhelpful.
2023-08-10 16:37:37 +05:30
George Shiqi Wu c8537dbeaf
Add lifecycle hooks to KubernetesTaskRunner (#14790) 2023-08-09 21:16:44 -07:00
Vadim Ogievetsky b1988b2f93
Web console: fix result count (#14786)
* fix result count

* fixes
2023-08-09 20:33:01 +05:30
Laksh Singla 8f102f9031
Introduce StorageConnector for Azure (#14660)
The Azure connector is introduced and MSQ's fault tolerance and durable storage can now be used with Microsoft Azure's blob storage. Also, the results of newly introduced queries from deep storage can now store and fetch the results from Azure's blob storage.
2023-08-09 12:25:27 +00:00
Tejaswini Bandlamudi a45b25fa1d
Removes support for Hadoop 2 (#14763)
Removing Hadoop 2 support as discussed in https://lists.apache.org/list?dev@druid.apache.org:lte=1M:hadoop
2023-08-09 17:47:52 +05:30
Tejaswini Bandlamudi 550a66d71e
Upgrade jackson-databind to 2.12.7 (#14770)
The current version of jackson-databind is flagged for vulnerabilities CVE-2020-28491 (Although cbor format is not used in druid), CVE-2020-36518 (Seems genuine as deeply nested json in can cause resource exhaustion). Updating the dependency to the latest version 2.12.7 to fix these vulnerabilities.
2023-08-09 12:22:16 +05:30
Karan Kumar cd817fc469
Fixing typo in `resultsTruncated` (#14779) 2023-08-08 20:51:44 -07:00
Clint Wylie e57f880020
document new filters and stuff (#14760) 2023-08-08 16:01:06 -07:00
Clint Wylie 667e4dab5e
document expression aggregator (#14497) 2023-08-08 15:49:29 -07:00
317brian 8a4dabc431
docs: remove experimental from schema auto-discoery (#14759) 2023-08-08 12:45:44 -07:00
zachjsh 660e6cfa01
Allow for task limit on kill tasks spawned by auto kill coordinator duty (#14769)
### Description

Previously, the `KillUnusedSegments` coordinator duty, in charge of periodically deleting unused segments, could spawn an unlimited number of kill tasks for unused segments. This change adds 2 new coordinator dynamic configs that can be used to control the limit of tasks spawned by this coordinator duty

`killTaskSlotRatio`: Ratio of total available task slots, including autoscaling if applicable that will be allowed for kill tasks. This limit only applies for kill tasks that are spawned automatically by the coordinator's auto kill duty. Default is 1, which allows all available tasks to be used, which is the existing behavior

`maxKillTaskSlots`: Maximum number of tasks that will be allowed for kill tasks. This limit only applies for kill tasks that are spawned automatically by the coordinator's auto kill duty. Default is INT.MAX, which essentially allows for unbounded number of tasks, which is the existing behavior. 

Realize that we can effectively get away with just the one `killTaskSlotRatio`, but following similarly to the compaction config, which has similar properties; I thought it was good to have some control of the upper limit regardless of ratio provided.

#### Release note
NEW: `killTaskSlotRatio`  and `maxKillTaskSlots` coordinator dynamic config properties added that allow control of task resource usage spawned by `KillUnusedSegments` coordinator task (auto kill)
2023-08-08 08:40:55 -04:00
Clint Wylie 2845b6a424
add new filters to unnest filter pushdown (#14777) 2023-08-08 03:29:18 -07:00
Tejaswini Bandlamudi d0403f00fd
upgrade org.mozilla:rhino (#14765) 2023-08-08 12:17:59 +05:30
Suneet Saldanha 2af0ab2425
Metric to report time spent fetching and analyzing segments (#14752)
* Metric to report time spent fetching and analyzing segments

* fix test

* spell check

* fix tests

* checkstyle

* remove unused variable

* Update docs/operations/metrics.md

Co-authored-by: Katya Macedo  <38017980+ektravel@users.noreply.github.com>

* Update docs/operations/metrics.md

Co-authored-by: Katya Macedo  <38017980+ektravel@users.noreply.github.com>

* Update docs/operations/metrics.md

Co-authored-by: Katya Macedo  <38017980+ektravel@users.noreply.github.com>

---------

Co-authored-by: Katya Macedo <38017980+ektravel@users.noreply.github.com>
2023-08-07 18:32:48 -07:00
Abhishek Radhakrishnan bff8f9e12e
Update kinesis docs (#14768)
Co-authored-by: Katya Macedo  <38017980+ektravel@users.noreply.github.com>
Co-authored-by: Katya Macedo <38017980+ektravel@users.noreply.github.com>
2023-08-07 17:08:34 -07:00
Suneet Saldanha b624a4ec4a
Rolling Supervisor restarts at taskDuration (#14396)
* Rolling supervior task publishing

* add an option for number of task groups to roll over

* better

* remove docs

* oops

* checkstyle

* wip test

* undo partial test change

* remove incomplete test
2023-08-07 16:24:32 -07:00
George Shiqi Wu 14940dc3ed
Add pod name to TaskLocation for easier observability and debugging. (#14758)
* Add pod name to location

* Add log

* fix style

* Update extensions-contrib/kubernetes-overlord-extensions/src/main/java/org/apache/druid/k8s/overlord/KubernetesPeonLifecycle.java

Co-authored-by: Suneet Saldanha <suneet@apache.org>

* Fix unit tests

---------

Co-authored-by: Suneet Saldanha <suneet@apache.org>
2023-08-07 12:33:35 -07:00
Victoria Lim 7d7813372a
Docs: Include EARLIEST_BY and LATEST_BY as supported aggregation functions (#14280) 2023-08-07 09:59:12 -07:00
Adarsh Sanjeev 56ab81f381
Add support for different result formats to MSQ SqlStatementResource (#14571)
* Add support for different result format

* Add tests

* Add tests

* Fix checkstyle

* Remove changes to destination

* Removed some unwanted code

* Address review comments

* Rename parameter

* Fix tests
2023-08-07 20:48:59 +05:30
Kashif Faraz 2d8e0f28f3
Refactor: Cleanup coordinator duties for metadata cleanup (#14631)
Changes
- Add abstract class `MetadataCleanupDuty`
- Make `KillAuditLogs`, `KillCompactionConfig`, etc extend `MetadataCleanupDuty` 
- Improve log and error messages
- Cleanup tests
- No functional change
2023-08-05 13:08:23 +05:30