This PR builds on #13304 to skip compaction for datasources with segments that have their interval start or end coinciding with Eternity interval end-points.
This is needed in order to prevent an issue similar to #13208 as the Coordinator tries to iterate over a large number of intervals when trying to compact an interval with infinite start or end.
The website pom was removed as part of
https://github.com/apache/druid/pull/14411 so we no longer need to
reference it as a module and the profile can be removed.
Dependabot is currently failing trying to look for this module, so
removing it should also fix that.
### Description
This pr adds an api for retrieving unused segments for a particular datasource. The api supports pagination by the addition of `limit` and `lastSegmentId` parameters. The resulting unused segments are returned with optional `sortOrder`, `ASC` or `DESC` with respect to the matching segments `id`, `start time`, and `end time`, or not returned in any guarenteed order if `sortOrder` is not specified
`GET /druid/coordinator/v1/datasources/{dataSourceName}/unusedSegments?interval={interval}&limit={limit}&lastSegmentId={lastSegmentId}&sortOrder={sortOrder}`
Returns a list of unused segments for a datasource in the cluster contained within an optionally specified interval.
Optional parameters for limit and lastSegmentId can be given as well, to limit results and enable paginated results.
The results may be sorted in either ASC, or DESC order depending on specifying the sortOrder parameter.
`dataSourceName`: The name of the datasource
`interval`: the specific interval to search for unused segments for.
`limit`: the maximum number of unused segments to return information about. This property helps to
support pagination
`lastSegmentId`: the last segment id from which to search for results. All segments returned are > this segment
lexigraphically if sortOrder is null or ASC, or < this segment lexigraphically if sortOrder is DESC.
`sortOrder`: Specifies the order with which to return the matching segments by start time, end time. A null
value indicates that order does not matter.
This PR has:
- [x] been self-reviewed.
- [ ] using the [concurrency checklist](https://github.com/apache/druid/blob/master/dev/code-review/concurrency.md) (Remove this item if the PR doesn't have any relation to concurrency.)
- [x] added documentation for new or modified features or behaviors.
- [ ] a release note entry in the PR description.
- [x] added Javadocs for most classes and all non-trivial methods. Linked related entities via Javadoc links.
- [ ] added or updated version, license, or notice information in [licenses.yaml](https://github.com/apache/druid/blob/master/dev/license.md)
- [x] added comments explaining the "why" and the intent of the code wherever would not be obvious for an unfamiliar reader.
- [x] added unit tests or modified existing tests to cover new code paths, ensuring the threshold for [code coverage](https://github.com/apache/druid/blob/master/dev/code-review/code-coverage.md) is met.
- [ ] added integration tests.
- [x] been tested in a test Druid cluster.
* Add initial draft of MarkDanglingTombstonesAsUnused duty.
* Use overshadowed segments instead of all used segments.
* Add unit test for MarkDanglingSegmentsAsUnused duty.
* Add mock call
* Simplify code.
* Docs
* shorter lines formatting
* metric doc
* More tests, refactor and fix up some logic.
* update javadocs; other review comments.
* Make numCorePartitions as 0 in the TombstoneShardSpec.
* fix up test
* Add tombstone core partition tests
* Update docs/design/coordinator.md
Co-authored-by: 317brian <53799971+317brian@users.noreply.github.com>
* review comment
* Minor cleanup
* Only consider tombstones with 0 core partitions
* Need to register the test shard type to make jackson happy
* test comments
* checkstyle
* fixup misc typos in comments
* Update logic to use overshadowed segments
* minor cleanup
* Rename duty to eternity tombstone instead of dangling. Add test for full eternity tombstone.
* Address review feedback.
---------
Co-authored-by: 317brian <53799971+317brian@users.noreply.github.com>
Query with lookups in FilteredAggregator fails with this exception in router,
Cannot construct instance of `org.apache.druid.query.aggregation.FilteredAggregatorFactory`, problem: Lookup [campaigns_lookup[campaignId][is_sold][autodsp]] not found at [Source: (org.eclipse.jetty.server.HttpInputOverHTTP); line: 1, column: 913] (through reference chain: org.apache.druid.query.groupby.GroupByQuery["aggregations"]->java.util.ArrayList[1])
T
he problem is that constructor of FilteredAggregatorFactory is actually validating if the lookup exists in this statement dimFilter.toFilter().
This is failing on the router, which is to be expected, because, the router isn’t assigned any lookups.
The fix is to move to a lazy initialisation of the filter object in the constructor.
Fixes a potential NPE which could occur while folding the HllSketchAggregator. If the sketch is null, druid could return a null HllSketchHolder object. Adding a null check here could help here
Resolves a null pointer exception in HllSketchAggregatorFactory
Description
With CentralizedDatasourceSchema (#14989) feature enabled, metadata for appended segments was not being refreshed. This caused numRows to be 0 for the new segments and would probably cause the datasource schema to not include columns from the new segments.
Analysis
The problem turned out in the new QuerySegmentWalker implementation in the Coordinator. It first finds the segment to be queried in the Coordinator timeline. Then it creates a new timeline of the segments present in the timeline.
The problem was that it is looking up complete partition set in the new timeline. Since the appended segments by themselves do not make a complete partition set, no SegmentMetadataQuery were executed.
It wasn't checking the column name, so it would return a domain regardless
of the input column. This means that null filters on data sources with range
partitioning would lead to excessive pruning of segments, and therefore
missing results.
Several dependabot ignore directives are no longer relevant. Unpin them
to ensure we get again get timely updates via dependabot.
* support for Hadoop 2 was dropped as part of #14763
* Guava was upgraded to 31 as part of #14767
* Calcite was upgraded to 1.35 as part of #14510
This change completes the change introduced in #15461
and unifies the version of gson dependency used between all the modules.
gson is used by kubernetes-extension, avro-extensions, ranger-security,
and as a test dependency in several core modules.
---------
Co-authored-by: Xavier Léauté <xl+github@xvrl.net>
* Excluding jackson-jaxrs dependency from ranger-plugin-common to address CVE regression introduced by ranger-upgrade: CVE-2019-10202, CVE-2019-10172
* remove the reference to outdated ranger 2.0 from the docs
---------
Co-authored-by: Xavier Léauté <xl+github@xvrl.net>
- Licenses file contains several licenses for outdated libraries. In this PR we remove licenses for no longer used components.
This change is purely cosmetic / cleans up the license database.
The candidates were designated by reviewing the output of the license check script and comparing it against the depdency tree.
- Minor fix to license check tool to fail more gracefully when the license of used dependency is not listed as known, as well as fix not to fail on multi licensed components when at least one of the licenses is accepted.
---------
Co-authored-by: Xavier Léauté <xl+github@xvrl.net>
Update guava to 32.0.1-jre to address two CVEs: CVE-2020-8908, CVE-2023-2976
This change requires a minor test change to remove assumptions about ordering.
---------
Co-authored-by: Xavier Léauté <xl+github@xvrl.net>
Update multiple dependencies to clear CVEs
Update dropwizard-metrics to 4.2.22 to address GHSA-mm8h-8587-p46h in com.rabbitmq:amqp-client
Update ant to 1.10.14 to resolve GHSA-f62v-xpxf-3v68 GHSA-4p6w-m9wc-c9c9 GHSA-q5r4-cfpx-h6fh GHSA-5v34-g2px-j4fw
Update comomons-compress to resolve GHSA-cgwf-w82q-5jrr
Update jose4j to 0.9.3 to resolve GHSA-7g24-qg88-p43q GHSA-jgvc-jfgh-rjvv
Update kotlin-stdlib to 1.6.0 to resolve GHSA-cqj8-47ch-rvvq and CVE-2022-24329