Commit Graph

13897 Commits

Author SHA1 Message Date
Gian Merlino 5e5cf9af99
Reduce upload buffer size in GoogleTaskLogs. (#16236)
* Reduce upload buffer size in GoogleTaskLogs.

Use a 1MB upload buffer, rather than the default of 15 MB in the API client. This is
mainly because MMs may upload logs in parallel, and typically have small heaps. The
default-sized 15 MB buffers add up quickly and can cause a MM to run out of memory.

* Make bufferSize a nullable Integer. Add tests.
2024-04-08 12:54:31 -07:00
Vadim Ogievetsky 4ff7e2c6c9
Web console: Better manual capabilities detection indication (#16191)
* Better forced mode indication

* more robust

* Update web-console/src/components/header-bar/header-bar.tsx

Co-authored-by: Charles Smith <techdocsmith@gmail.com>

* Update web-console/src/components/header-bar/header-bar.tsx

Co-authored-by: Charles Smith <techdocsmith@gmail.com>

* Update web-console/src/components/header-bar/restricted-mode/__snapshots__/restricted-mode.spec.tsx.snap

Co-authored-by: Charles Smith <techdocsmith@gmail.com>

* Update web-console/src/components/header-bar/restricted-mode/__snapshots__/restricted-mode.spec.tsx.snap

Co-authored-by: Charles Smith <techdocsmith@gmail.com>

* Update web-console/src/components/header-bar/restricted-mode/restricted-mode.tsx

Co-authored-by: Charles Smith <techdocsmith@gmail.com>

* Update web-console/src/components/header-bar/restricted-mode/restricted-mode.tsx

Co-authored-by: Charles Smith <techdocsmith@gmail.com>

* Update web-console/src/components/header-bar/restricted-mode/restricted-mode.tsx

Co-authored-by: Charles Smith <techdocsmith@gmail.com>

* Update web-console/src/components/header-bar/restricted-mode/restricted-mode.tsx

Co-authored-by: Charles Smith <techdocsmith@gmail.com>

* Update web-console/src/components/header-bar/restricted-mode/restricted-mode.tsx

Co-authored-by: Charles Smith <techdocsmith@gmail.com>

* Update web-console/src/components/header-bar/restricted-mode/restricted-mode.tsx

Co-authored-by: Charles Smith <techdocsmith@gmail.com>

* Update web-console/src/components/header-bar/restricted-mode/restricted-mode.tsx

Co-authored-by: Charles Smith <techdocsmith@gmail.com>

* reformat

* forced => manual capability detection

* typo

* typo2

---------

Co-authored-by: Charles Smith <techdocsmith@gmail.com>
2024-04-08 10:07:21 -07:00
sullis f4649fece9
Bump openrewrite plugin + recipes (#16238) 2024-04-08 15:13:57 +05:30
Vishesh Garg af24cc88ce
Fix CVE errors (#16147)
* Fix CVE errors

* Update pac4j

* Update nimbus.jose.jwt.version

* Change pac4j version to 5.7.3

* Change pac4j version to 5.3.1

* Revert pac4j version change

* Update pac4j comment
2024-04-05 17:53:09 +05:30
Parag Jain f55c9e58a8
add google as external storage for msq export (#16051)
Support for exporting msq results to gcs bucket. This is essentially copying the logic of s3 export for gs, originally done by @adarshsanjeev in this PR - #15689
2024-04-05 12:10:10 +05:30
Vadim Ogievetsky 3ba878f21b
don't send lookups to sampler (#16234) 2024-04-04 21:17:42 -07:00
Gian Merlino a319b44545
Allow typedIn to run in replace-with-default mode. (#16233)
* Allow typedIn to run in replace-with-default mode.

Useful when data servers, like Historicals, are running in replace-with-default
mode and the Broker is running in SQL-compatible mode, which can happen during
a rolling update that is applying a mode change.
2024-04-04 15:45:42 -07:00
Sergio Ferragut 64433eb2ff
Update kubernetes-overloard-extension extension name in docs (#16239) 2024-04-04 14:38:28 -07:00
Vadim Ogievetsky 9658e1ad7f
Web console: fix query timer issues (#16235)
* fix timer issues

* wording
2024-04-04 13:13:31 -07:00
Soumyava 7759f25095
Moving bitwise_or to use native calcite operator (#16237) 2024-04-04 12:49:29 -07:00
Soumyava 972937659d
Fixing return type for IPV4 (#15916)
* Fixing return type for IPV4

* Update ipv4match
2024-04-04 08:49:50 -07:00
Abhishek Radhakrishnan 75fb57ed6e
Update error messages when supervisor's checkpoint state is invalid (#16208)
* Update error message when topic messages.

Suggest resetting the supervisor when the topic changes instead of changing
the supervisor name which is actually making a new supervisor.

* Update server/src/main/java/org/apache/druid/metadata/IndexerSQLMetadataStorageCoordinator.java

Co-authored-by: Kashif Faraz <kashif.faraz@gmail.com>

* Cleanup

* Remove log and include oldCommitMetadataFromDb

* Fix test

---------

Co-authored-by: Kashif Faraz <kashif.faraz@gmail.com>
2024-04-03 10:34:17 -07:00
Zoltan Haindrich 1df41db46d
Migrate to use docker compose v2 (#16232)
https://github.com/actions/runner-images/issues/9557
2024-04-03 12:32:55 +02:00
Soumyava 4bea865697
Restore context flag for window functions (#16229) 2024-04-03 13:57:13 +05:30
AmatyaAvadhanula 218513ad55
Use created time from metadata store in list tasks (#16228) 2024-04-03 09:03:32 +05:30
Gian Merlino b0ca06f8cd
Fix name of combining filtered aggregator factory. (#16224)
The name of the combining filtered aggregator factory should be the same
as the name of the original factory. However, it wasn't the same in the
case where the original factory's name and the original delegate aggregator
were inconsistently named. In this scenario, we should use the name of
the original filtered aggregator, not the name of the original delegate
aggregator.
2024-04-02 12:59:48 -07:00
zachjsh 9b52c909e0
fix complex types returning UNKNOWN as their SQL type inference (#16216)
* * fix

* * fix

* * address review comments
2024-04-02 14:36:01 -04:00
Sree Charan Manamala 26f9b174de
Handling nil selector column in vector math processors (#16128) 2024-04-02 02:06:57 -07:00
Vadim Ogievetsky 06268bf060
only pick kafka input format by default when needed (#16180) 2024-04-01 13:47:49 -07:00
Aleksey Plekhanov a818b8acb6
Fix CalciteQueryTest#testCountStarWithTimeFilterUsingStringLiterals (#16221)
* Add cases to check handling equals between timestamp and string literal
2024-04-01 13:46:24 -07:00
Kashif Faraz 0de44d91f1
Cleanup serialiazation of TaskReportMap (#16217)
* Build task reports in AbstractBatchIndexTask

* Minor cleanup

* Apply suggestions from code review by @abhishekrb

Co-authored-by: Abhishek Radhakrishnan <abhishek.rb19@gmail.com>

* Cleanup IndexTaskTest

* Fix formatting

* Fix coverage

* Cleanup serialization of TaskReport map

* Replace occurrences of Map<String, TaskReport>

* Return TaskReport.ReportMap for live reports, fix test comparisons

* Address test failures

---------

Co-authored-by: Abhishek Radhakrishnan <abhishek.rb19@gmail.com>
2024-04-01 11:53:24 -07:00
Charles Smith 1aa6808b9a
docs: add tutorial with examples of sql null handling (#16185)
Co-authored-by: 317brian <53799971+317brian@users.noreply.github.com>
2024-04-01 11:03:42 -07:00
Tapajit Chandra Paul dbef348249
Include heartbeat and zk-connected metric in PrometheusEmitter (#16209) 2024-04-01 15:12:06 +05:30
Pranav 20de7fd95a
Geo spatial interfaces (#16029)
This PR creates an interface for ImmutableRTree and moved the existing implementation to new class which represent 32 bit implementation (stores coordinate as floats). This PR makes the ImmutableRTree extendable to create higher precision implementation as well (64 bit).
In all spatial bound filters, we accept float as input which might not be accurate in the case of high precision implementation of ImmutableRTree. This PR changed the bound filters to accepts the query bounds as double instead of float and it is backward compatible change as it compares double to existing float values in RTree. Previously it was comparing input float to RTree floats which can cause precision loss, now it is little better as it compares double to float which is still not 100% accurate.
There are no changes in the way that we query spatial dimension today except input bound parsing. There is little improvement in string filter predicate which now parse double strings instead of float and compares double to double which is 100% accurate but string predicate is only called when we dont have spatial index.
With allowing the interface to extend ImmutableRTree, we allow to create high precision (HP) implementation and defines new search strategies to perform HP search Iterable<ImmutableBitmap> search(ImmutableDoubleNode node, Bound bound);
With possible HP implementations, Radius bound filter can not really focus on accuracy, it is calculating Euclidean distance in comparing. As EARTH 🌍 is round and not flat, Euclidean distances are not accurate in geo system. This PR adds new param called 'radiusUnit' which allows you to specify units like meters, km, miles etc. It uses https://en.wikipedia.org/wiki/Haversine_formula to check if given geo point falls inside circle or not. Added a test that generates set of points inside and outside in RadiusBoundTest.
2024-04-01 14:58:03 +05:30
dependabot[bot] 27b4028782
Bump webpack-dev-middleware from 5.3.3 to 5.3.4 in /web-console (#16195)
Bumps [webpack-dev-middleware](https://github.com/webpack/webpack-dev-middleware) from 5.3.3 to 5.3.4.
- [Release notes](https://github.com/webpack/webpack-dev-middleware/releases)
- [Changelog](https://github.com/webpack/webpack-dev-middleware/blob/v5.3.4/CHANGELOG.md)
- [Commits](https://github.com/webpack/webpack-dev-middleware/compare/v5.3.3...v5.3.4)

---
updated-dependencies:
- dependency-name: webpack-dev-middleware
  dependency-type: indirect
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2024-03-29 09:51:51 -07:00
dependabot[bot] bf88ddb2d7
Bump express from 4.18.2 to 4.19.2 in /web-console (#16204)
Bumps [express](https://github.com/expressjs/express) from 4.18.2 to 4.19.2.
- [Release notes](https://github.com/expressjs/express/releases)
- [Changelog](https://github.com/expressjs/express/blob/master/History.md)
- [Commits](https://github.com/expressjs/express/compare/4.18.2...4.19.2)

---
updated-dependencies:
- dependency-name: express
  dependency-type: indirect
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2024-03-29 09:50:53 -07:00
Vadim Ogievetsky 195221ca59
Web console: update druid-toolkit to get bug fixes (#16213)
* update druid-toolkit to get bug fixes

* update

* fix test
2024-03-29 08:31:35 -07:00
Adithya Chakilam 463010bb29
Populate segment stats for non-parallel compaction jobs (#16171)
* Populate segment stats for non-parallel compaction jobs

* fix

* add-tests

* comments

* update-test

* comments
2024-03-29 09:40:55 -04:00
Kashif Faraz 4df4896674
Refactor: Add common method in AbstractBatchIndexTask to create ingestion stats report (#16202)
Changes
-  No functional changes
- Add method `AbstractBatchIndexTask.buildIngestionStatsReport()` used in several batch tasks
- Add utility method `AbstractBatchIndexTask.addBuildSegmentStatsToReport()`
- Use boolean argument to represent a full report instead of the String `full` 
in internal methods. (REST API remains unchanged.)
- Rename `IngestionStatsAndErrorsTaskReportData` to `IngestionStatsAndErrors`
- Clean up some of the methods
2024-03-28 23:07:00 +05:30
Rishabh Singh 3471352dac
Use DruidLeaderSelector in CliCoordinator.HearbeatSupplier (#16215) 2024-03-28 21:42:33 +05:30
Soumyava 524842a3bb
Window function on msq (#15470)
This PR aims to introduce Window functions on MSQ by doing the following:

    Introduce a Window querykit for handling window queries along with its factory and a processor for window queries
    If a window operator is present with a partition by clause, pushes the partition as a shuffle spec of the previous stage
    In presence of empty OVER() clause lets all operators loose on a single rac
    In presence of no empty OVER() clause, breaks down each window into individual stages
    Associated machinery to handle window functions in MSQ
    Introduced a separate hidden engine feature WINDOW_LEAF_OPERATOR which is set only for MSQ engine. In presence of this feature, the planner plans without the leaf operators by creating a window query over an inner scan query. In case of native this is set to false and the planner generates the leafOperators
    Guardrails around materialization
    Comprehensive UTs
2024-03-28 14:58:34 +05:30
Gian Merlino 7649957710
MSQ: Fix issue where AUTO assignment would not respect maxWorkerCount. (#16214)
WorkerAssignmentStrategy.AUTO was missing a check for maxWorkerCount
in the case where the inputs to a stage are not dynamically sliceable.
A common case here is when the inputs to a stage are other stages.
2024-03-28 14:40:31 +05:30
Sensor f99501179d
remove junit-jupiter-api version to prevent inconsistent version (#16210) 2024-03-27 17:48:14 -07:00
Adithya Chakilam a65b2d4f41
Visibility into LagBased AutoScaler desired task count (#16199)
* Visibility into skipped scale notices

* comments

* change to emit always instead of just skips

* fix failing test

* comments

* Add couple more tests
2024-03-27 13:08:00 -04:00
Abhishek Radhakrishnan cf9a3bdc14
Fix up error handling in unusedSegments API. (#16206)
Changes:
- Handle exceptions in the API and map them to a `Response` object with the appropriate error code.
- Replace `AuthorizationUtils.filterAuthorizedResources()` with `DatasourceResourceFilter`.
The endpoint is annotated consistent with other usages.
- Update `DatasourceResourceFilter` to remove the lambda and update javadocs.
The usages information is self-evident with an IDE.
- Adjust the invalid interval exception message.
- Break up the large unit test `testGetUnusedSegmentsInDataSource()` into smaller unit tests
for each test case. Also, validate the error codes.
2024-03-27 12:31:21 +05:30
Gian Merlino 58a8a23243
Avoid conversion to String in JsonReader, JsonNodeReader. (#15693)
* Avoid conversion to String in JsonReader, JsonNodeReader.

These readers were running UTF-8 decode on the provided entity to
convert it to a String, then parsing the String as JSON. The patch
changes them to parse the provided entity's input stream directly.

In order to preserve the nice error messages that include parse errors,
the readers now need to open the entity again on the error path, to
re-read the data. To make this possible, the InputEntity#open contract
is tightened to require the ability to re-open entities, and existing
InputEntity implementations are updated to allow re-opening.

This patch also renames JsonLineReaderBenchmark to JsonInputFormatBenchmark,
updates it to benchmark all three JSON readers, and adds a case that reads
fields out of the parsed row (not just creates it).

* Fixes for static analysis.

* Implement intermediateRowAsString in JsonReader.

* Enhanced JsonInputFormatBenchmark.

Renames JsonLineReaderBenchmark to JsonInputFormatBenchmark, and enhances it to
test various readers (JsonReader, JsonLineReader, JsonNodeReader) as well as
to test with/without field discovery.
2024-03-26 08:16:05 -07:00
Sree Charan Manamala f29c8ac368
Allow non literal rhs in MV_FILTER_ONLY and MV_FILTER_NONE (#16113)
This commit allows to use the MV_FILTER_ONLY & MV_FILTER_NONE functions
with a non literal argument. 
Currently `select mv_filter_only('mvd_dim', 'array_dim') from 'table'`
returns a `Unhandled Query Planning Failure`

This is being tackled and also considered for the cases where the `array_dim`
having null & empty values.

Changed classes:
 * `MultiValueStringOperatorConversions`
 * `ApplyFunction`
 * `CalciteMultiValueStringQueryTest`
2024-03-26 12:31:09 +05:30
Abhishek Radhakrishnan 95595ba4f5
Fix handling an empty list of versions (#16198)
* Differentiate null and empty lists of segment IDs and versions.

Treat them differently so the. Segment IDs and versions can be An empty list,
in which case, the queries should just not return anything. Versions are optional, so
they can be null, which just indicates nothing, so the queries should return segments with
all possible versions. Segment IDs cannot be null as indicated by the absence of @Nullable
annotation.

* Update javadocs and add empty versions test to kill task.

* Add test for RetrieveSegmentsActions as well.
2024-03-25 17:51:24 -07:00
Kashif Faraz e7dc00b86d
Refactor: Simplify creation input row filter predicate in various batch tasks (#16196)
Changes:
- Simplify method `AbstractBatchIndexTask.defaultRowFilter()` and rename
- Add method `allowNonNullWithinInputIntervalsOf()`
- Add javadocs
2024-03-26 04:54:07 +05:30
Zoltan Haindrich a16092b16a
Rewrite exotic LAST_VALUE/FIRST_VALUE to self-reference. (#16063)
* Rewrite exotic LAST_VALUE/FIRST_VALUE to self-reference.

* rewrite `LAST_VALUE(x) OVER (ORDER BY y)` to `LAG(x,0) OVER (ORDER BY y)`
* not directly to `x` because some queries get unplannable that way
* restrict `NTILE` from framing - as its not supported
* add test to ensure that all of the `KNOWN_WINDOW_FNS`'s framing is accounted for

* checkstyle/etc

* add test

* apidoc

* add assume to avoid MSQ fail
2024-03-25 11:03:47 -07:00
zachjsh 8370db106c
INSERT/REPLACE dimension target column types are validated against source input expressions (#15962)
* * address remaining comments from https://github.com/apache/druid/pull/15836

* *  address remaining comments from https://github.com/apache/druid/pull/15908

* * add test that exposes relational algebra issue

* * simplify test exposing issue

* * fix

* * add tests for sealed / non-sealed

* * update test descriptions

* * fix test failure when -Ddruid.generic.useDefaultValueForNull=true

* * check type assignment based on natice Druid types

* * add tests that cover missing jacoco coverage

* * add replace tests

* * add more tests and comments about column ordering

* * simplify tests

* * review comments

* * remove commented line

* * STRING family types should be validated as non-null
2024-03-25 12:34:07 -04:00
Kashif Faraz 82f443340d
Clean up TaskQueueTest (#16187)
Changes:
- Remove redundant code from `TaskQueueTest`
- Use lambdas in `TaskQueue`
- Simplify error message when `TaskQueue` is full
2024-03-25 09:40:01 +05:30
Kashif Faraz 323d67a0ac
Add errorCode to failure type `InternalServerError` (#16186)
Changes:
- Use error code `internalServerError` for failures of this type
- Remove the error code argument from `InternalServerError.exception()` methods
thus fixing a bug in the callers.
2024-03-24 04:24:09 +05:30
AmatyaAvadhanula cfa2a901b3
Redact passwords from tasks fetched from the TaskQueue (#16182)
* Redact passwords from tasks fetched from the TaskQueue
2024-03-23 14:22:11 +05:30
Aru Raghuwanshi 6e19ce5e69
Handle null values in `KafkaStringHeaderReader` (#16192) 2024-03-23 13:05:55 +05:30
Clint Wylie b0a9c318d6
add new typed in filter (#16039)
changes:
* adds TypedInFilter which preserves matching sets in the native match value type
* SQL planner uses new TypedInFilter when druid.generic.useDefaultValueForNull=false (the default)
2024-03-22 12:45:08 -07:00
Abhishek Radhakrishnan a70e28a3c2
Parameterize segment IDs (#16174)
* Add parameterized segment IDs.

* Refactor into one common method.

* Refactor getConditionForIntervalsAndMatchMode - pass in only what's needed.

* Minor cleanup.
2024-03-22 08:20:59 -07:00
Arun Ramani c72e69a8c8
MetricsModule: inject DataSourceTaskIdHolder early (#16140)
* Explicitly bind ServiceStatusMonitor

* Correct fix
2024-03-21 16:14:41 -07:00
Kashif Faraz 352902156a
Fix mark segment unused when overshadowed by zero replica segment (#16181)
Bug:
In the `MarkOvershadowedSegmentsAsUnused` duty, the coordinator marks a segment
as unused if it is overshadowed by a segment currently being served by a historical or broker.
But it is possible to have segments that are eligible for a load rule but require zero replicas
to be loaded. (Such segments can be queried only using the MSQ engine).
If such a zero-replica segment overshadows any other segment, the overshadowed segment will
never be marked as unused and will continue to exist in the metadata store as a dangling segment.

Fix:
- In a coordinator run, keep track of segments that are eligible for a load rule but require zero replicas
- Allow the zero-replicas segments to overshadow old segments and hence mark the latter as unused

Other changes:
- Add simulation test to verify new behaviour. This test fails with the current code.
- Clean up javadocs
2024-03-21 12:56:59 +05:30
Rushikesh Bankar 3d8b0ffae8
Add indexer level task metrics to provide more visibility in the task distribution (#15991)
Changes:

Add the following indexer level task metrics:
- `worker/task/running/count`
- `worker/task/assigned/count`
- `worker/task/completed/count`

These metrics will provide more visibility into the tasks distribution across indexers
(We often see a task skew issue across indexers and with this issue it would be easier
to catch the imbalance)
2024-03-21 11:08:01 +05:30