Commit Graph

14265 Commits

Author SHA1 Message Date
Nikhil Rao a805c5612e
Adds Druid SQL query examples for the Stats aggregator Native Queries (#16277)
* Adds Druid SQL query examples for the Timeseries and GroupBy Native queries in the stats aggregator docs page

* Updates intervals in Native Query to remove excess Time part in timestamp

* Moves Druid SQL section above Native query because sql used more often by users

* removes old Druid SQL sections

* Adds TopN Druid SQL query using ORDER BY and LIMIT

* Adds table for Druid SQL variance and standard deviation functions

* Update docs/development/extensions-core/stats.md

Co-authored-by: Abhishek Radhakrishnan <abhishek.rb19@gmail.com>

---------

Co-authored-by: Karan Kumar <karankumar1100@gmail.com>
Co-authored-by: Abhishek Radhakrishnan <abhishek.rb19@gmail.com>
2024-04-15 08:05:34 -07:00
Sree Charan Manamala 5247059d2f
Allow Double & null values in sql type array through dynamic params (#16274) 2024-04-15 10:44:42 +02:00
Adarsh Sanjeev 3df00aef9d
Add manifest file for MSQ export (#15953)
Currently, export creates the files at the provided destination. The addition of the manifest file will provide a list of files created as part of the manifest. This will allow easier consumption of the data exported from Druid, especially for automated data pipelines
2024-04-15 11:37:31 +05:30
Kashif Faraz 81d7b6ebe1
Fix OverlordClient to read reports as a concrete `ReportMap` (#16226)
Follow up to #16217 

Changes:
- Update `OverlordClient.getReportAsMap()` to return `TaskReport.ReportMap`
- Move the following classes to `org.apache.druid.indexer.report` in the `druid-processing` module
  - `TaskReport`
  - `KillTaskReport`
  - `IngestionStatsAndErrorsTaskReport`
  - `TaskContextReport`
  - `TaskReportFileWriter`
  - `SingleFileTaskReportFileWriter`
  - `TaskReportSerdeTest`
- Remove `MsqOverlordResourceTestClient` as it had only one method
which is already present in `OverlordResourceTestClient` itself
2024-04-15 08:00:59 +05:30
Abhishek Radhakrishnan 041d0bff5e
Set default `KillUnusedSegments` duty to coordinator's indexing period & `killTaskSlotRatio` to 0.1 (#16247)
The default value for druid.coordinator.kill.period (if unspecified) has changed from P1D to the value of druid.coordinator.period.indexingPeriod. Operators can choose to override druid.coordinator.kill.period and that will take precedence over the default behavior.
The default value for the coordinator dynamic config killTaskSlotRatio is updated from 1.0 to 0.1. This ensures that that kill tasks take up only 1 task slot right out-of-the-box instead of taking up all the task slots.

* Remove stale comment and inline canDutyRun()

* druid.coordinator.kill.period defaults to druid.coordinator.period.indexingPeriod if not set.

- Remove the default P1D value for druid.coordinator.kill.period. Instead default
  druid.coordinator.kill.period to whatever value druid.coordinator.period.indexingPeriod is set
  to if the former config isn't specified.
- If druid.coordinator.kill.period is set, the value will take precedence over
  druid.coordinator.period.indexingPeriod

* Update server/src/test/java/org/apache/druid/server/coordinator/DruidCoordinatorConfigTest.java

* Fix checkstyle error

* Clarify comment

* Update server/src/main/java/org/apache/druid/server/coordinator/DruidCoordinatorConfig.java

* Put back canDutyRun()

* Default killTaskSlotsRatio to 0.1 instead of 1.0 (all slots)

* Fix typo DEFAULT_MAX_COMPACTION_TASK_SLOTS

* Remove unused test method.

* Update default value of killTaskSlotsRatio in docs and web-console default mock

* Move initDuty() after params and config setup.
2024-04-14 18:56:17 -07:00
Gian Merlino b0c5184f9d
Fix ORDER BY on certain GROUPING SETS. (#16268)
* Fix ORDER BY on certain GROUPING SETS.

DefaultLimitSpec (part of native groupBy) had a bug where it would assume
that results are naturally ordered by dimensions even when subtotalsSpec
is present. However, this is not necessarily the case. For certain
combinations of ORDER BY and GROUPING SETS, this would cause the ORDER BY
to be ignored.

* Fix test testGroupByWithSubtotalsSpecWithOrderLimitForcePushdown. Resorting was necessary.
2024-04-12 12:06:47 -07:00
Katya Macedo 7f06a53cb1
[Docs] Fix API placeholder formatting (#16240) 2024-04-12 09:19:13 -07:00
Sree Charan Manamala 3340b200db
Fix window function drill tests failures falling under RESULT_MISMATCH & RESULT_COUNT_MISMATCH (#16264)
* Updated the drill test expected results which are failing due to druid's default sorting algorithm taking nulls first approach.
* Corrected the queries where date time values are directly provided
* marked 2 cases failing with resultset casting issues
2024-04-12 13:54:48 +02:00
Laksh Singla cce2d0f127
Upload openrewrite patch via GHA (#16270)
This patch adds a step to the openrewrite action, such that it uploads the correcting patch, in case it fails.
2024-04-12 15:31:07 +05:30
Sree Charan Manamala f65c166327
Windowed aggregates should update the aggregation value based on final compute (#16244) 2024-04-12 08:28:33 +02:00
YongGang da9feb4430
Introduce TaskContextReport for reporting task context (#16041)
Changes:
- Add `TaskContextEnricher` interface to improve task management and monitoring
- Invoke `enrichContext` in `TaskQueue.add()` whenever a new task is submitted to the Overlord
- Add `TaskContextReport` to write out task context information in reports
2024-04-12 08:57:49 +05:30
Pranav fc2600b8e2
Adding jvmVersion dimension in JVM Monitor (#16262) 2024-04-11 15:44:56 -07:00
Gian Merlino 9f358f5f4a
SQL tests: avoid mixing skip and cannot vectorize. (#16251)
* SQL tests: avoid mixing skip and cannot vectorize.

skipVectorize switches off vectorization tests completely, and
cannotVectorize turns vectorization tests into negative tests. It doesn't
make sense to use them together, so this patch makes it an error to do so,
and cleans up cases where both are mentioned.

This patch also has the effect of changing various tests from skipVectorize
to cannotVectorize, because in the past when both were mentioned,
skipVectorize would take priority.

* Fix bug with StringAnyAggregatorFactory attempting to vectorize when it cannt.

* Fix tests.
2024-04-11 15:06:11 -07:00
317brian df9e1bb97b
Docs: Fix typo in tutorial (#16254) 2024-04-10 08:59:52 +05:30
Katya Macedo cd69f145b7
docs: Add upgrade notes for Druid 29.0.1 (#16123) 2024-04-09 13:56:57 -07:00
Vishesh Garg 3d595cfab1
Add storeCompactionState flag support to msq (#15965)
Compaction in the native engine by default records the state of compaction for each segment in the lastCompactionState segment field. This PR adds support for doing the same in the MSQ engine, targeted for future cases such as REPLACE and compaction done via MSQ.

Note that this PR doesn't implicitly store the compaction state for MSQ replace tasks; it is stored with flag "storeCompactionState": true in the query context.
2024-04-09 16:47:47 +05:30
Vishesh Garg 9a4fb58543
Record column name for exceptions while writing frames in RowBasedFrameWriter (#16130)
Current Runtime Exceptions generated while writing frames only include the exception itself without including the name of the column they were encountered in. This patch introduces the further information in the error and makes it non-retryable.
2024-04-09 15:39:10 +05:30
Adarsh Sanjeev e2e0cb905c
Add reasoning for choosing shardSpec to the MSQ report (#16175)
This PR logs the segment type and reason chosen. It also adds it to the query report, to be displayed in the UI.

This PR adds a new section to the reports, segmentReport. This contains the segment type created, if the query is an ingestion, and null otherwise.
2024-04-09 11:32:02 +05:30
Gian Merlino 5e5cf9af99
Reduce upload buffer size in GoogleTaskLogs. (#16236)
* Reduce upload buffer size in GoogleTaskLogs.

Use a 1MB upload buffer, rather than the default of 15 MB in the API client. This is
mainly because MMs may upload logs in parallel, and typically have small heaps. The
default-sized 15 MB buffers add up quickly and can cause a MM to run out of memory.

* Make bufferSize a nullable Integer. Add tests.
2024-04-08 12:54:31 -07:00
Vadim Ogievetsky 4ff7e2c6c9
Web console: Better manual capabilities detection indication (#16191)
* Better forced mode indication

* more robust

* Update web-console/src/components/header-bar/header-bar.tsx

Co-authored-by: Charles Smith <techdocsmith@gmail.com>

* Update web-console/src/components/header-bar/header-bar.tsx

Co-authored-by: Charles Smith <techdocsmith@gmail.com>

* Update web-console/src/components/header-bar/restricted-mode/__snapshots__/restricted-mode.spec.tsx.snap

Co-authored-by: Charles Smith <techdocsmith@gmail.com>

* Update web-console/src/components/header-bar/restricted-mode/__snapshots__/restricted-mode.spec.tsx.snap

Co-authored-by: Charles Smith <techdocsmith@gmail.com>

* Update web-console/src/components/header-bar/restricted-mode/restricted-mode.tsx

Co-authored-by: Charles Smith <techdocsmith@gmail.com>

* Update web-console/src/components/header-bar/restricted-mode/restricted-mode.tsx

Co-authored-by: Charles Smith <techdocsmith@gmail.com>

* Update web-console/src/components/header-bar/restricted-mode/restricted-mode.tsx

Co-authored-by: Charles Smith <techdocsmith@gmail.com>

* Update web-console/src/components/header-bar/restricted-mode/restricted-mode.tsx

Co-authored-by: Charles Smith <techdocsmith@gmail.com>

* Update web-console/src/components/header-bar/restricted-mode/restricted-mode.tsx

Co-authored-by: Charles Smith <techdocsmith@gmail.com>

* Update web-console/src/components/header-bar/restricted-mode/restricted-mode.tsx

Co-authored-by: Charles Smith <techdocsmith@gmail.com>

* Update web-console/src/components/header-bar/restricted-mode/restricted-mode.tsx

Co-authored-by: Charles Smith <techdocsmith@gmail.com>

* reformat

* forced => manual capability detection

* typo

* typo2

---------

Co-authored-by: Charles Smith <techdocsmith@gmail.com>
2024-04-08 10:07:21 -07:00
sullis f4649fece9
Bump openrewrite plugin + recipes (#16238) 2024-04-08 15:13:57 +05:30
Vishesh Garg af24cc88ce
Fix CVE errors (#16147)
* Fix CVE errors

* Update pac4j

* Update nimbus.jose.jwt.version

* Change pac4j version to 5.7.3

* Change pac4j version to 5.3.1

* Revert pac4j version change

* Update pac4j comment
2024-04-05 17:53:09 +05:30
Parag Jain f55c9e58a8
add google as external storage for msq export (#16051)
Support for exporting msq results to gcs bucket. This is essentially copying the logic of s3 export for gs, originally done by @adarshsanjeev in this PR - #15689
2024-04-05 12:10:10 +05:30
Vadim Ogievetsky 3ba878f21b
don't send lookups to sampler (#16234) 2024-04-04 21:17:42 -07:00
Gian Merlino a319b44545
Allow typedIn to run in replace-with-default mode. (#16233)
* Allow typedIn to run in replace-with-default mode.

Useful when data servers, like Historicals, are running in replace-with-default
mode and the Broker is running in SQL-compatible mode, which can happen during
a rolling update that is applying a mode change.
2024-04-04 15:45:42 -07:00
Sergio Ferragut 64433eb2ff
Update kubernetes-overloard-extension extension name in docs (#16239) 2024-04-04 14:38:28 -07:00
Vadim Ogievetsky 9658e1ad7f
Web console: fix query timer issues (#16235)
* fix timer issues

* wording
2024-04-04 13:13:31 -07:00
Soumyava 7759f25095
Moving bitwise_or to use native calcite operator (#16237) 2024-04-04 12:49:29 -07:00
Soumyava 972937659d
Fixing return type for IPV4 (#15916)
* Fixing return type for IPV4

* Update ipv4match
2024-04-04 08:49:50 -07:00
Abhishek Radhakrishnan 75fb57ed6e
Update error messages when supervisor's checkpoint state is invalid (#16208)
* Update error message when topic messages.

Suggest resetting the supervisor when the topic changes instead of changing
the supervisor name which is actually making a new supervisor.

* Update server/src/main/java/org/apache/druid/metadata/IndexerSQLMetadataStorageCoordinator.java

Co-authored-by: Kashif Faraz <kashif.faraz@gmail.com>

* Cleanup

* Remove log and include oldCommitMetadataFromDb

* Fix test

---------

Co-authored-by: Kashif Faraz <kashif.faraz@gmail.com>
2024-04-03 10:34:17 -07:00
Zoltan Haindrich 1df41db46d
Migrate to use docker compose v2 (#16232)
https://github.com/actions/runner-images/issues/9557
2024-04-03 12:32:55 +02:00
Soumyava 4bea865697
Restore context flag for window functions (#16229) 2024-04-03 13:57:13 +05:30
AmatyaAvadhanula 218513ad55
Use created time from metadata store in list tasks (#16228) 2024-04-03 09:03:32 +05:30
Gian Merlino b0ca06f8cd
Fix name of combining filtered aggregator factory. (#16224)
The name of the combining filtered aggregator factory should be the same
as the name of the original factory. However, it wasn't the same in the
case where the original factory's name and the original delegate aggregator
were inconsistently named. In this scenario, we should use the name of
the original filtered aggregator, not the name of the original delegate
aggregator.
2024-04-02 12:59:48 -07:00
zachjsh 9b52c909e0
fix complex types returning UNKNOWN as their SQL type inference (#16216)
* * fix

* * fix

* * address review comments
2024-04-02 14:36:01 -04:00
Sree Charan Manamala 26f9b174de
Handling nil selector column in vector math processors (#16128) 2024-04-02 02:06:57 -07:00
Vadim Ogievetsky 06268bf060
only pick kafka input format by default when needed (#16180) 2024-04-01 13:47:49 -07:00
Aleksey Plekhanov a818b8acb6
Fix CalciteQueryTest#testCountStarWithTimeFilterUsingStringLiterals (#16221)
* Add cases to check handling equals between timestamp and string literal
2024-04-01 13:46:24 -07:00
Kashif Faraz 0de44d91f1
Cleanup serialiazation of TaskReportMap (#16217)
* Build task reports in AbstractBatchIndexTask

* Minor cleanup

* Apply suggestions from code review by @abhishekrb

Co-authored-by: Abhishek Radhakrishnan <abhishek.rb19@gmail.com>

* Cleanup IndexTaskTest

* Fix formatting

* Fix coverage

* Cleanup serialization of TaskReport map

* Replace occurrences of Map<String, TaskReport>

* Return TaskReport.ReportMap for live reports, fix test comparisons

* Address test failures

---------

Co-authored-by: Abhishek Radhakrishnan <abhishek.rb19@gmail.com>
2024-04-01 11:53:24 -07:00
Charles Smith 1aa6808b9a
docs: add tutorial with examples of sql null handling (#16185)
Co-authored-by: 317brian <53799971+317brian@users.noreply.github.com>
2024-04-01 11:03:42 -07:00
Tapajit Chandra Paul dbef348249
Include heartbeat and zk-connected metric in PrometheusEmitter (#16209) 2024-04-01 15:12:06 +05:30
Pranav 20de7fd95a
Geo spatial interfaces (#16029)
This PR creates an interface for ImmutableRTree and moved the existing implementation to new class which represent 32 bit implementation (stores coordinate as floats). This PR makes the ImmutableRTree extendable to create higher precision implementation as well (64 bit).
In all spatial bound filters, we accept float as input which might not be accurate in the case of high precision implementation of ImmutableRTree. This PR changed the bound filters to accepts the query bounds as double instead of float and it is backward compatible change as it compares double to existing float values in RTree. Previously it was comparing input float to RTree floats which can cause precision loss, now it is little better as it compares double to float which is still not 100% accurate.
There are no changes in the way that we query spatial dimension today except input bound parsing. There is little improvement in string filter predicate which now parse double strings instead of float and compares double to double which is 100% accurate but string predicate is only called when we dont have spatial index.
With allowing the interface to extend ImmutableRTree, we allow to create high precision (HP) implementation and defines new search strategies to perform HP search Iterable<ImmutableBitmap> search(ImmutableDoubleNode node, Bound bound);
With possible HP implementations, Radius bound filter can not really focus on accuracy, it is calculating Euclidean distance in comparing. As EARTH 🌍 is round and not flat, Euclidean distances are not accurate in geo system. This PR adds new param called 'radiusUnit' which allows you to specify units like meters, km, miles etc. It uses https://en.wikipedia.org/wiki/Haversine_formula to check if given geo point falls inside circle or not. Added a test that generates set of points inside and outside in RadiusBoundTest.
2024-04-01 14:58:03 +05:30
dependabot[bot] 27b4028782
Bump webpack-dev-middleware from 5.3.3 to 5.3.4 in /web-console (#16195)
Bumps [webpack-dev-middleware](https://github.com/webpack/webpack-dev-middleware) from 5.3.3 to 5.3.4.
- [Release notes](https://github.com/webpack/webpack-dev-middleware/releases)
- [Changelog](https://github.com/webpack/webpack-dev-middleware/blob/v5.3.4/CHANGELOG.md)
- [Commits](https://github.com/webpack/webpack-dev-middleware/compare/v5.3.3...v5.3.4)

---
updated-dependencies:
- dependency-name: webpack-dev-middleware
  dependency-type: indirect
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2024-03-29 09:51:51 -07:00
dependabot[bot] bf88ddb2d7
Bump express from 4.18.2 to 4.19.2 in /web-console (#16204)
Bumps [express](https://github.com/expressjs/express) from 4.18.2 to 4.19.2.
- [Release notes](https://github.com/expressjs/express/releases)
- [Changelog](https://github.com/expressjs/express/blob/master/History.md)
- [Commits](https://github.com/expressjs/express/compare/4.18.2...4.19.2)

---
updated-dependencies:
- dependency-name: express
  dependency-type: indirect
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2024-03-29 09:50:53 -07:00
Vadim Ogievetsky 195221ca59
Web console: update druid-toolkit to get bug fixes (#16213)
* update druid-toolkit to get bug fixes

* update

* fix test
2024-03-29 08:31:35 -07:00
Adithya Chakilam 463010bb29
Populate segment stats for non-parallel compaction jobs (#16171)
* Populate segment stats for non-parallel compaction jobs

* fix

* add-tests

* comments

* update-test

* comments
2024-03-29 09:40:55 -04:00
Kashif Faraz 4df4896674
Refactor: Add common method in AbstractBatchIndexTask to create ingestion stats report (#16202)
Changes
-  No functional changes
- Add method `AbstractBatchIndexTask.buildIngestionStatsReport()` used in several batch tasks
- Add utility method `AbstractBatchIndexTask.addBuildSegmentStatsToReport()`
- Use boolean argument to represent a full report instead of the String `full` 
in internal methods. (REST API remains unchanged.)
- Rename `IngestionStatsAndErrorsTaskReportData` to `IngestionStatsAndErrors`
- Clean up some of the methods
2024-03-28 23:07:00 +05:30
Rishabh Singh 3471352dac
Use DruidLeaderSelector in CliCoordinator.HearbeatSupplier (#16215) 2024-03-28 21:42:33 +05:30
Soumyava 524842a3bb
Window function on msq (#15470)
This PR aims to introduce Window functions on MSQ by doing the following:

    Introduce a Window querykit for handling window queries along with its factory and a processor for window queries
    If a window operator is present with a partition by clause, pushes the partition as a shuffle spec of the previous stage
    In presence of empty OVER() clause lets all operators loose on a single rac
    In presence of no empty OVER() clause, breaks down each window into individual stages
    Associated machinery to handle window functions in MSQ
    Introduced a separate hidden engine feature WINDOW_LEAF_OPERATOR which is set only for MSQ engine. In presence of this feature, the planner plans without the leaf operators by creating a window query over an inner scan query. In case of native this is set to false and the planner generates the leafOperators
    Guardrails around materialization
    Comprehensive UTs
2024-03-28 14:58:34 +05:30
Gian Merlino 7649957710
MSQ: Fix issue where AUTO assignment would not respect maxWorkerCount. (#16214)
WorkerAssignmentStrategy.AUTO was missing a check for maxWorkerCount
in the case where the inputs to a stage are not dynamically sliceable.
A common case here is when the inputs to a stage are other stages.
2024-03-28 14:40:31 +05:30