Commit Graph

13508 Commits

Author SHA1 Message Date
Kashif Faraz feeb4f0fb0
Allocate pending segments at latest committed version (#15459)
The segment allocation algorithm reuses an already allocated pending segment if the new allocation request is made for the same parameters:

datasource
sequence name
same interval
same value of skipSegmentLineageCheck (false for batch append, true for streaming append)
same previous segment id (used only when skipSegmentLineageCheck = false)
The above parameters can thus uniquely identify a pending segment (enforced by the UNIQUE constraint on the sequence_name_prev_id_sha1 column in druid_pendingSegments metadata table).

This reuse is done in order to

allow replica tasks (in case of streaming ingestion) to use the same set of segment IDs.
allow re-run of a failed batch task to use the same segment ID and prevent unnecessary allocations
2023-12-14 16:18:39 +05:30
Vishesh Garg e43bb74c3a
Add MSQ Durable Storage Connector for Google Cloud Storage and change current Google Cloud Storage client library (#15398)
The PR addresses 2 things:

    Add MSQ durable storage connector for GCS
    Change GCS client library from the old Google API Client Library to the recommended Google Cloud Client Library. Ref: https://cloud.google.com/apis/docs/client-libraries-explained
2023-12-14 07:34:49 +05:30
AlbericByte 0436edae0c
fix rat and checkstyle issue (#15530)
* fix rat and checkstyle issue

* remove all checks for generated-sources and generated-test-sources
2023-12-14 09:33:01 +08:00
Soumyava 3e15522d6b
Round works correctly on system metadata columns (#15554) 2023-12-13 17:23:14 -08:00
Pranav 81fe855b6f
Update com.github.eirslett to fix bad zip issue (#15556) 2023-12-13 17:22:54 -08:00
Clint Wylie e55f6b6202
remove search auto strategy, estimateSelectivity of BitmapColumnIndex (#15550)
* remove search auto strategy, estimateSelectivity of BitmapColumnIndex

* more cleanup
2023-12-13 16:30:01 -08:00
Vadim Ogievetsky f770eeb8be
Web console: Update webpack-dev-server v3 to v4 (#15555)
* init

* update usage

* revert licenses.yaml

* move the audience-annotations outside of the web console block
2023-12-13 16:16:54 -08:00
zachjsh 857693f5cf
Decorate sampling response with system fields if specified (#15536)
* * decorate sampling response with system fields if specified

* * add unit test
2023-12-13 12:16:59 -08:00
Keerthana Srikanth f32dbd4131
Upgrade pac4j-oidc to 4.5.7 to address CVE-2021-44878 (#15522)
* Upgrade org.pac4j:pac4j-oidc to 4.5.5 to address CVE-2021-44878
* add CVE suppression and notes, since vulnerability scan still shows this CVE
* Add tests to improve coverage
2023-12-13 10:44:05 -08:00
Bartosz Mikulski 4670a7650f
Optional removal of metrics from Prometheus PushGateway on shutdown (#14935)
* Optional removal of metrics from Prometheus PushGateway on shutdown

* Make pushGatewayDeleteOnShutdown property nullable

* Add waitForShutdownDelay property

* Fix unit test

* Address PR comments

* Address PR comments

* Add explanation on why it is useful to have deletePushGatewayMetricsOnShutdown

* Fix spelling error

* Fix spelling error
2023-12-13 11:58:53 -05:00
Zoltan Haindrich 8bc7a5f3ac
Move codeql-config.yml out of the workflows folder (#15553)
Move codeql config file out of the workflows folder so github doesn't try
to run it and fail the github workflow run every time a branch is updated.
2023-12-13 08:37:01 -08:00
AmatyaAvadhanula 48a96f5d06
Better automatic offset reset for Kinesis ingestion (#15338)
Better automatic offset reset for Kinesis ingestion
2023-12-13 12:03:17 +05:30
Parth Agrawal 4ec9a0a7f7
Update Druid version in Tag in pom.xml (#15545)
This PR updates the tag present in pom.xml to match the druid version in pom.xml
This was last updated in 0da8ffc
It seems to me like this was missed in further Druid version upgrades.
2023-12-12 20:18:30 -08:00
Jan Werner 3c7dec56ca
update kubernetes java client to 19.0.0 and docker-java to 3.3.4 (#15449)
Update of direct dependencies:
* kubernetes java-client to 19.0.0
* docker-java-bom to 3.3.4

In order to update transitive dependencies:
* okio to 3.6.0
* bcjava to 1.76

To address CVES:
- CVE-2023-3635 in okio
- CVE-2023-33201 in bcjava

---------

Co-authored-by: Xavier Léauté <xvrl@apache.org>
2023-12-12 14:27:57 -08:00
Xavier Léauté debb6b401c
update core Apache Kafka dependencies to 3.6.1 (#15539)
Release notes: https://downloads.apache.org/kafka/3.6.1/RELEASE_NOTES.html
2023-12-12 14:24:57 -08:00
Soumyava 38f3cf9e65
Fixing a case where datatype mismatch was happenning in join (#15541) 2023-12-12 12:50:32 -08:00
AmatyaAvadhanula 91ca8e73d6
Skip compaction for datasources with partial-eternity segments (#15542)
This PR builds on #13304 to skip compaction for datasources with segments that have their interval start or end coinciding with Eternity interval end-points.

This is needed in order to prevent an issue similar to #13208 as the Coordinator tries to iterate over a large number of intervals when trying to compact an interval with infinite start or end.
2023-12-12 15:06:45 +05:30
Ankit Kothari 8735d023a1
Add experimental support for first/last for double/float/long #10702 (#14462)
Add experimental support for doubleLast, doubleFirst, FloatLast, FloatFirst, longLast and longFirst.
2023-12-12 11:36:51 +05:30
TestBoost 85af2c8340
only create used and unused segments once to make the test faster (#15533) 2023-12-12 09:31:04 +05:30
Clint Wylie e8fcf2cac8
minor doc adjustments (#15531) 2023-12-11 18:22:44 -08:00
Xavier Léauté 6f78049760
remove references to non-existant website maven module (#15540)
The website pom was removed as part of
https://github.com/apache/druid/pull/14411 so we no longer need to
reference it as a module and the profile can be removed.

Dependabot is currently failing trying to look for this module, so
removing it should also fix that.
2023-12-11 16:58:35 -08:00
zachjsh ab7d9bc6ec
Add api for Retrieving unused segments (#15415)
### Description

This pr adds an api for retrieving unused segments for a particular datasource. The api supports pagination by the addition of `limit` and `lastSegmentId` parameters. The resulting unused segments are returned with optional `sortOrder`, `ASC` or `DESC` with respect to the matching segments `id`, `start time`, and `end time`, or not returned in any guarenteed order if `sortOrder` is not specified

`GET /druid/coordinator/v1/datasources/{dataSourceName}/unusedSegments?interval={interval}&limit={limit}&lastSegmentId={lastSegmentId}&sortOrder={sortOrder}`

Returns a list of unused segments for a datasource in the cluster contained within an optionally specified interval.
Optional parameters for limit and lastSegmentId can be given as well, to limit results and enable paginated results.
The results may be sorted in either ASC, or DESC order depending on specifying the sortOrder parameter.

`dataSourceName`: The name of the datasource
`interval`:                 the specific interval to search for unused segments for.
`limit`:                      the maximum number of unused segments to return information about. This property helps to
                                 support pagination
`lastSegmentId`:     the last segment id from which to search for results. All segments returned are > this segment 
                                 lexigraphically if sortOrder is null or ASC, or < this segment lexigraphically if sortOrder is DESC.
`sortOrder`:            Specifies the order with which to return the matching segments by start time, end time. A null
                                value indicates that order does not matter.

This PR has:

- [x] been self-reviewed.
   - [ ] using the [concurrency checklist](https://github.com/apache/druid/blob/master/dev/code-review/concurrency.md) (Remove this item if the PR doesn't have any relation to concurrency.)
- [x] added documentation for new or modified features or behaviors.
- [ ] a release note entry in the PR description.
- [x] added Javadocs for most classes and all non-trivial methods. Linked related entities via Javadoc links.
- [ ] added or updated version, license, or notice information in [licenses.yaml](https://github.com/apache/druid/blob/master/dev/license.md)
- [x] added comments explaining the "why" and the intent of the code wherever would not be obvious for an unfamiliar reader.
- [x] added unit tests or modified existing tests to cover new code paths, ensuring the threshold for [code coverage](https://github.com/apache/druid/blob/master/dev/code-review/code-coverage.md) is met.
- [ ] added integration tests.
- [x] been tested in a test Druid cluster.
2023-12-11 16:32:18 -05:00
George Shiqi Wu 4152f1d147
Fix empty logs and status messages for mmless ingestion (#15527)
* Fix empty logs and status messages for mmless ingestion

* Add tests
2023-12-11 13:20:45 -05:00
Katya Macedo fc222377ae
[Docs] Document decode_base64_complex and decode_base64_utf8 functions (#15444) 2023-12-11 09:12:06 -08:00
Abhishek Radhakrishnan 96be82a3e6
Clean up duty for non-overlapping eternity tombstones (#15281)
* Add initial draft of MarkDanglingTombstonesAsUnused duty.

* Use overshadowed segments instead of all used segments.

* Add unit test for MarkDanglingSegmentsAsUnused duty.

* Add mock call

* Simplify code.

* Docs

* shorter lines formatting

* metric doc

* More tests, refactor and fix up some logic.

* update javadocs; other review comments.

* Make numCorePartitions as 0 in the TombstoneShardSpec.

* fix up test

* Add tombstone core partition tests

* Update docs/design/coordinator.md

Co-authored-by: 317brian <53799971+317brian@users.noreply.github.com>

* review comment

* Minor cleanup

* Only consider tombstones with 0 core partitions

* Need to register the test shard type to make jackson happy

* test comments

* checkstyle

* fixup misc typos in comments

* Update logic to use overshadowed segments

* minor cleanup

* Rename duty to eternity tombstone instead of dangling. Add test for full eternity tombstone.

* Address review feedback.

---------

Co-authored-by: 317brian <53799971+317brian@users.noreply.github.com>
2023-12-11 08:57:15 -08:00
Katya Macedo 099a9825d1
[Docs] Add a release notes template (#15333)
* Add release notes template

* Update spellcheck
2023-12-11 11:35:16 +05:30
Clint Wylie 42f2496b7d
fix bug with nested empty array fields (#15532) 2023-12-09 12:20:21 -08:00
Rishabh Singh 54df235026
Lazily build Filter in FilteredAggregatorFactory to avoid parsing exceptions in Router (#15526)
Query with lookups in FilteredAggregator fails with this exception in router,

Cannot construct instance of `org.apache.druid.query.aggregation.FilteredAggregatorFactory`, problem: Lookup [campaigns_lookup[campaignId][is_sold][autodsp]] not found at [Source: (org.eclipse.jetty.server.HttpInputOverHTTP); line: 1, column: 913] (through reference chain: org.apache.druid.query.groupby.GroupByQuery["aggregations"]->java.util.ArrayList[1])
T
he problem is that constructor of FilteredAggregatorFactory is actually validating if the lookup exists in this statement dimFilter.toFilter().
This is failing on the router, which is to be expected, because, the router isn’t assigned any lookups.
The fix is to move to a lazy initialisation of the filter object in the constructor.
2023-12-09 12:18:37 +05:30
Clint Wylie e7c8f2e208
lift restriction of array_to_mv to only support direct column access (#15528) 2023-12-08 16:27:17 -08:00
Victoria Lim e68979e03b
Docs: update SQL API reference (#15515)
Co-authored-by: 317brian <53799971+317brian@users.noreply.github.com>
2023-12-08 11:53:19 -08:00
Katya Macedo 355c800108
Revamp design page (#15486)
Co-authored-by: Victoria Lim <vtlim@users.noreply.github.com>
2023-12-08 11:40:24 -08:00
Soumyava ca4ecdf7d0
Fixing NPE with virtual expression with unnest (#15513)
* Fixing NPE with virtual expression with unnest

* Fixing a comment
2023-12-08 10:51:56 -08:00
Clint Wylie e64b92eb35
add JSON_QUERY_ARRAY function to pluck ARRAY<COMPLEX<json>> out of COMPLEX<json> (#15521) 2023-12-08 05:28:46 -08:00
Adarsh Sanjeev 2e45eadc08
Add better error messages for using OVERWRITE with INSERT statments (#15517)
* Add better error messages for using OVERWRITE with INSERT statments
2023-12-08 15:33:46 +05:30
Zoltan Haindrich c353ccfdef
Windowed min aggregates null-s as 0 (#15371) 2023-12-08 01:41:16 -08:00
Clint Wylie 1eafe983ec
fix array presenting columns to not match single element arrays to scalars for equality (#15503)
* fix array presenting columns to not match single element arrays to scalars for equality
* update docs to clarify usage model of mixed type columns
2023-12-08 01:22:07 -08:00
sb89594 5fda8613ad
Feature: Add IPv6 Match Function (#15212) 2023-12-07 23:09:06 -08:00
Adarsh Sanjeev 254a8eb7e0
Add null checks for HllSketchHolder (#15502)
Fixes a potential NPE which could occur while folding the HllSketchAggregator. If the sketch is null, druid could return a null HllSketchHolder object. Adding a null check here could help here

Resolves a null pointer exception in HllSketchAggregatorFactory
2023-12-08 11:43:04 +05:30
AlbericByte 935aa187a0
add Assert function to verify in the DataGeneratorTest (#15504)
* add Assert function to verify in the DataGeneratorTest

* remove unused log in DataGeneratorTest

* add comment for DataGeneratorTest
2023-12-08 09:12:17 +08:00
Charles Smith db3a633250
update timeseries to reflect NULL filling (#15512)
Co-authored-by: Victoria Lim <vtlim@users.noreply.github.com>
2023-12-07 14:41:27 -08:00
Clint Wylie c241c6980c
store auto columns with only empty or null containing arrays as ARRAY<LONG> instead of COMPLEX<json> (#15505) 2023-12-07 03:31:43 -08:00
Vishesh Garg 801967b75f
Add test logs zipping and archival steps for failures in Static Checks Github Actions (#15506)
Add test logs zipping and archival steps for failures in Static Checks Github Actions
2023-12-07 15:34:23 +05:30
Abhishek Radhakrishnan b541000d43
Bump up max heap memory for unit tests from 1.5 GB to 2 GB. (#15507) 2023-12-07 15:34:04 +05:30
Clint Wylie 82ac48786b
document arrayContainsElement filter (#15455) 2023-12-07 00:14:00 -08:00
Clint Wylie 557f3f6f57
add array column type support to EXTEND operator (#15458) 2023-12-06 23:21:35 -08:00
Benjamin Hopp fea53c7084
Re-arranging sections for append and replace docs. (#15497) 2023-12-06 13:13:05 -08:00
Rishabh Singh 6a64f72c67
Lookup on incomplete partition set in SegmentMetadataQuerySegmentWalker (#15496)
Description
With CentralizedDatasourceSchema (#14989) feature enabled, metadata for appended segments was not being refreshed. This caused numRows to be 0 for the new segments and would probably cause the datasource schema to not include columns from the new segments.

Analysis
The problem turned out in the new QuerySegmentWalker implementation in the Coordinator. It first finds the segment to be queried in the Coordinator timeline. Then it creates a new timeline of the segments present in the timeline.
The problem was that it is looking up complete partition set in the new timeline. Since the appended segments by themselves do not make a complete partition set, no SegmentMetadataQuery were executed.
2023-12-06 15:25:28 +05:30
Gian Merlino 6f51155ccb
Fix NullFilter getDimensionRangeSet. (#15500)
It wasn't checking the column name, so it would return a domain regardless
of the input column. This means that null filters on data sources with range
partitioning would lead to excessive pruning of segments, and therefore
missing results.
2023-12-06 15:09:59 +05:30
Abhishek Radhakrishnan f4949afdd7
clarify and fixup typos related to unused segments in docs and javadocs. (#15498) 2023-12-05 22:30:32 -08:00
Xavier Léauté ae6893edc3
unpin guava related dependabot dependencies (#15494)
Several dependabot ignore directives are no longer relevant. Unpin them
to ensure we get again get timely updates via dependabot.

* support for Hadoop 2 was dropped as part of #14763
* Guava was upgraded to 31 as part of #14767
* Calcite was upgraded to 1.35 as part of #14510
2023-12-05 16:04:39 -08:00