A new monitor SubqueryCountStatsMonitor which emits the metrics corresponding to the subqueries and their execution is now introduced. Moreover, the user can now also use the auto mode to automatically set the number of bytes available per query for the inlining of its subquery's results.
Currently, after an MSQ query, the web console is responsible for waiting for the segments to load. It does so by checking if there are any segments loading into the datasource ingested into, which can cause some issues, like in cases where the segments would never be loaded, or would end up waiting for other ingests as well.
This PR shifts this responsibility to the controller, which would have the list of segments created.
* Vectorizing earliest for numeric
* Vectorizing earliest string aggregator
* checkstyle fix
* Removing unnecessary exceptions
* Ignoring tests in MSQ as earliest is not supported for numeric there
* Fixing benchmarks
* Updating tests as MSQ does not support earliest for some cases
* Addressing review comments by adding the following:
1. Checking capabilities first before creating selectors
2. Removing mockito in tests for numeric first aggs
3. Removing unnecessary tests
* Addressing issues for dictionary encoded single string columns where we can use the dictionary ids instead of the entire string
* Adding a flag for multi value dimension selector
* Addressing comments
* 1 more change
* Handling review comments part 1
* Handling review comments and correctness fix for latest_by when the time expression need not be in sorted order
* Updating numeric first vector agg
* Revert "Updating numeric first vector agg"
This reverts commit 4291709901.
* Updating code for correctness issues
* fixing an issue with latest agg
* Adding more comments and removing an unnecessary check
* Addressing null checks for tie selector and only vectorize false for quantile sketches
Changes:
- Simplify static `create` methods for `NoopTask`
- Remove `FirehoseFactory`, `IsReadyResult`, `readyTime` from `NoopTask`
as these fields were not being used anywhere
- Update tests
Changes:
[A] Remove config `decommissioningMaxPercentOfMaxSegmentsToMove`
- It is a complicated config 😅 ,
- It is always desirable to prioritize move from decommissioning servers so that
they can be terminated quickly, so this should always be 100%
- It is already handled by `smartSegmentLoading` (enabled by default)
[B] Remove config `maxNonPrimaryReplicantsToLoad`
This was added in #11135 to address two requirements:
- Prevent coordinator runs from getting stuck assigning too many segments to historicals
- Prevent load of replicas from competing with load of unavailable segments
Both of these requirements are now already met thanks to:
- Round-robin segment assignment
- Prioritization in the new coordinator
- Modifications to `replicationThrottleLimit`
- `smartSegmentLoading` (enabled by default)
Changes:
- Make ServiceMetricEvent.Builder extend ServiceEventBuilder<ServiceMetricEvent>
and thus convert it to a plain builder rather than a builder of builder.
- Add methods setCreatedTime , setMetricAndValue to the builder
Changes:
- Reduce log level of some coordinator stats, which only denote normal coordinator operation.
These stats are still emitted and can be logged by setting debugDimensions in the coordinator
dynamic config.
- Initialize SegmentLoadingConfig only for historical management duties. This config is not
needed in other duties and initializing it creates logs which are misleading.
* prometheus-emitter: add extraLabels parameter
* prometheus-emitter: update readme to include the extraLabels parameter
* prometheus-emitter: remove nullable and surface label name issues
* remove import to make linter happy
Currently we have an error handler for https connection attempts, but
not for plaintext connection attempts. This leads to warnings like the
following for plaintext connection errors:
EXCEPTION, please implement org.jboss.netty.handler.codec.http.HttpContentDecompressor.exceptionCaught() for proper handling.
This happens because if we don't add our own error handler, the last
handler in the chain during a connection attempt is HttpContentDecompressor,
which doesn't handle errors.
The new error handler for plaintext doesn't do much: it just closes
the channel.
Changes
- Increase value of `replicationThrottleLimit` computed by `smartSegmentLoading` from
2% to 5% of total number of used segments.
- Assign replicas to a tier even when some replicas are already being loaded in that tier
- Limit the total number of replicas in load queue at start of run + replica assignments in
the run to the `replicationThrottleLimit`.
i.e. for every tier,
num loading replicas at start of run + num replicas assigned in run <= replicationThrottleLimit
* better dialog formatting
* use CSS to render triangle
* can flatten in kafka also
* better formatting
* better format
* fill in empty values in line chart
* more fp
* add show others
Changes:
- Fix capacity response in mm-less ingestion.
- Add field usedClusterCapacity to the GET /totalWorkerCapacity response.
This API should be used to get the total ingestion capacity on the overlord.
- Remove method `isK8sTaskRunner` from interface `TaskRunner`
Changes:
- Determine the default value of balancerComputeThreads based on number of
coordinator cpus rather than number of segments. Even if the number of segments
is low and we create more balancer threads, it doesn't hurt the system as threads
would mostly be idle.
- Remove unused field from SegmentLoadQueueManager
Expected values:
- Clusters with ~1M segments typically work with Coordinators having 16 cores or more.
This would give us 8 balancer threads, which is the same as the current maximum.
- On small clusters, even a single thread is enough to do the required balancing work.
Suppress CVEs from dependencies with no available fix or false positives
hadoop-annotations: CVE-2022-25168, CVE-2021-33036
hadoop-client-runtime: CVE-2023-1370, CVE-2023-37475
okio: CVE-2023-3635
Upgrade grpc version to fix CVE-2023-33953
This patch fixes a few issues toward #14858
1. some phony classes were added to enable maven to track the compilation of those classes
2. cyclonedx 2.7.9 seem to handle incremental compilation better; it had a PR relating to that
3. needed to update root pom to 25
4. update antlr to 4.5.3 older one didn't really worked incrementally; 4.5.3 works much better
### Description
This change enables the `KillUnusedSegments` coordinator duty to be scheduled continuously. Things that prevented this, or made this difficult before were the following:
1. If scheduled at fast enough rate, the duty would find the same intervals to kill for the same datasources, while kill tasks submitted for those same datasources and intervals were already underway, thus wasting task slots on duplicated work.
2. The task resources used by auto kill were previously unbounded. Each duty run period, if unused
segments were found for any datasource, a kill task would be submitted to kill them.
This pr solves for both of these issues:
1. The duty keeps track of the end time of the last interval found when killing unused segments for each datasource, in a in memory map. The end time for each datasource, if found, is used as the start time lower bound, when searching for unused intervals for that same datasource. Each duty run, we remove any datasource keys from this map that are no longer found to match datasources in the system, or in whitelist, and also remove a datasource entry, if there is found to be no unused segments for the datasource, which happens when we fail to find an interval which includes unused segments. Removing the datasource entry from the map, allows for searching for unusedSegments in the datasource from the beginning of time once again
2. The unbounded task resource usage can be mitigated with coordinator dynamic config added as part of ba957a9b97
Operators can configure continous auto kill by providing coordinator runtime properties similar to the following:
```
druid.coordinator.period.indexingPeriod=PT60S
druid.coordinator.kill.period=PT60S
```
And providing sensible limits to the killTask usage via coordinator dynamic properties.
There is a current issue due to inconsistent metadata between worker and controller in MSQ. A controller can receive one set of segments, which are then marked as unused by, say, a compaction job. The worker would be unable to get the segment information as MetadataResource.
Motivation:
- Clean up `DruidCoordinator` and move methods to classes where they are most relevant
Changes:
- No functional change
- Add duty `PrepareBalancerAndLoadQueues` to replace `UpdateCoordinatorState`
- Move map of `LoadQueuePeon` from `DruidCoordinator` to `LoadQueueTaskMaster`
- Make `BalancerStrategyFactory` an abstract class and keep the balancer executor here
- Move reporting of used segment stats and historical capacity stats from
`CollectSegmentAndServerStats` to `PrepareBalancerAndLoadQueues`
- Move reporting of unavailable and under-replicated segment stats from
`CollectSegmentAndServerStats` to `UpdateReplicationStatus` duty
Currently, Druid is using Guava 16.0.1 version. This upgrade to 31.1-jre fixes the following issues.
CVE-2018-10237 (Unbounded memory allocation in Google Guava 11.0 through 24.x before 24.1.1 allows remote attackers to conduct denial of service attacks against servers that depend on this library and deserialize attacker-provided data because the AtomicDoubleArray class (when serialized with Java serialization) and the CompoundOrdering class (when serialized with GWT serialization) perform eager allocation without appropriate checks on what a client has sent and whether the data size is reasonable). We don't use Java or GWT serializations. Despite being false positive they're causing red security scans on Druid distribution.
Latest version of google-client-api is incompatible with the existing Guava version. This PR unblocks Update google client apis to latest version #14414