Extracting a few miscellaneous non-functional changes from the batch supervisor branch:
- Replace anonymous inner classes with lambda expressions in the SQL supervisor manager layer
- Add explicit @Nullable annotations in DynamicConfigProviderUtils to make IDE happy
- Small variable renames (copy-paste error perhaps) and fix typos
- Add table name for this exception message: Delete the supervisor from the table[%s] in the database...
- Prefer CollectionUtils.isEmptyOrNull() over list == null || list.size() > 0. We can change the Precondition checks to throwing DruidException separately for a batch of APIs at a time.
* update docs for kafka lookup extension to specify correct extension ordering
* fix first line
* test with extension dependencies
* save work on dependency management
* working dependency graph
* working pull
* fix style
* fix style
* remove name
* load extension dependencies recursively
* generate depenencies on classloader creation
* add check for circular dependencies
* fix style
* revert style changes
* remove mutable class loader
* clean up class heirarchy
* extensions loader test working
* add unit tests
* pr comments
* fix unit tests
The current Supervisor interface is primarily focused on streaming use cases. However, as we introduce supervisors for non-streaming use cases, such as the recently added CompactionSupervisor (and the upcoming BatchSupervisor), certain operations like resetting offsets, checkpointing, task group handoff, etc., are not really applicable to non-streaming use cases.
So the methods are split between:
1. Supervisor: common methods that are applicable to both streaming and non-streaming use cases
2. SeekableStreamSupervisor: Supervisor + streaming-only operations. The existing streaming-only overrides exist along with the new abstract method public abstract LagStats computeLagStats(), for which custom implementations already exist in the concrete types
This PR is primarily a refactoring change with minimal functional adjustments (e.g., throwing an exception in a few places in SupervisorManager when the supervisor isn't the expected SeekableStreamSupervisor type).
Description
-----------
Coordinator logs are fairly noisy and don't give much useful information (see example below).
Even when the Coordinator misbehaves, these logs are not very useful.
Main changes
------------
- Add API `GET /druid/coordinator/v1/duties` that returns a status list of all duty groups currently running on the Coordinator
- Emit metrics `segment/poll/time`, `segment/pollWithSchema/time`, `segment/buildSnapshot/time`
- Remove redundant logs that indicate normal operation of well-tested aspects of the Coordinator
Refactors
---------
- Move some logic from `DutiesRunnable` to `CoordinatorDutyGroup`
- Move stats collection from `CollectSegmentAndServerStats` to `PrepareBalancerAndLoadQueues`
- Minor cleanup of class `DruidCoordinator`
- Clean up class `DruidCoordinatorRuntimeParams`
- Remove field `coordinatorStartTime`. Maintain start time in `MarkOvershadowedSegmentsAsUnused` instead.
- Remove field `MetadataRuleManager`. Pass supplier to constructor of applicable duties instead.
- Make `usedSegmentsNewestFirst` and `datasourcesSnapshot` as non-nullable as they are always required.
#16768 added the functionality to run compaction as a supervisor on the overlord.
This patch builds on top of that to restrict MSQ engine to compaction in the supervisor-mode only.
With these changes, users can no longer add MSQ engine as part of datasource compaction config,
or as the default cluster-level compaction engine, on the Coordinator.
The patch also adds an Overlord runtime property `druid.supervisor.compaction.engine=<msq/native>`
to specify the default engine for compaction supervisors.
Since these updates require major changes to existing MSQ compaction integration tests,
this patch disables MSQ-specific compaction integration tests -- they will be taken up in a follow-up PR.
Key changed/added classes in this patch:
* CompactionSupervisor
* CompactionSupervisorSpec
* CoordinatorCompactionConfigsResource
* OverlordCompactionScheduler
changes:
* add `ApplyFunction` support to vectorization fallback, allowing many of the remaining expressions to be vectorized
* add `CastToObjectVectorProcessor` so that vector engine can correctly cast any type
* add support for array and complex vector constants
* reduce number of cases which can block vectorization in expression planner to be unknown inputs (such as unknown multi-valuedness)
* fix array constructor expression, apply map expression to make actual evaluated type match the output type inference
* fix bug in array_contains where something like array_contains([null], 'hello') would return true if the array was a numeric array since the non-null string value would cast to a null numeric
* fix isNull/isNotNull to correctly handle any type of input argument
* SQL syntax error should target USER persona
* * revert change to queryHandler and related tests, based on review comments
* * add test
* Properly encode schema and table names when syncing catalog
Previously table names and schema names were not being properly url
encododed when requested. Now they are.
Introduced includeTrailerHeader to enable TrailerHeaders in response
If enabled, a header X-Error-Message will be added to indicate reasons for partial results.
* enables to use DruidHook for native plan logging
* qudiem tests doesn't necessarily need to run the query to get an explain - this helps during development as if there is a runtime issue it could still be explained in the test
Text-based input formats like csv and tsv currently parse inputs only as strings, following the RFC4180Parser spec).
To workaround this, the web-console and other tools need to further inspect the sample data returned to sample data returned by the Druid sampler API to parse them as numbers.
This patch introduces a new optional config, tryParseNumbers, for the csv and tsv input formats. If enabled, any numbers present in the input will be parsed in the following manner -- long data type for integer types and double for floating-point numbers, and if parsing fails for whatever reason, the input is treated as a string. By default, this configuration is set to false, so numeric strings will be treated as strings.
* Use the whole frame when writing rows.
This patch makes the following adjustments to enable writing larger
single rows to frames:
1) RowBasedFrameWriter: Max out allocation size on the final doubling.
i.e., if the final allocation "naturally" would be 1 MiB but the
max frame size is 900 KiB, use 900 KiB rather than failing the 1 MiB
allocation.
2) AppendableMemory: In reserveAdditional, release the last block if it
is empty. This eliminates waste when a frame writer uses a
successive-doubling approach to find the right allocation size.
3) ArenaMemoryAllocator: Reclaim memory from the last allocation when
the last allocation is closed.
Prior to these changes, a single row could be much smaller than the
frame size and still fail to be added to the frame.
* Style.
* Fix test.
Revives #14024 and additionally supports,
Native queries
gRPC health check endpoint
This PR doesn't have the shaded module for packaging gRPC and Guava libraries since grpc-query module uses the same Guava version as that of Druid.
The response is gRPC-specific. It provides the result schema along with the results as a binary "blob". Results can be in CSV, JSON array lines or as an array of Protobuf objects. If using Protobuf, the corresponding class must be installed along with the gRPC query extension so it is available to the Broker at runtime.
The "suggested server memory" figure needs to take into account
maxConcurrentStages. The fix here does not affect the main memory
calculations, but it does affect the accuracy of error messages.
Parent issue: #14989
It is possible for the order of columns to vary across segments especially during realtime ingestion.
Since, the schema fingerprint is sensitive to column order this leads to creation of a large number of segment schema in the metadata database for essentially the same set of columns.
This is wasteful, this patch fixes this problem by computing schema fingerprint on lexicographically sorted columns. This would result in creation of a single schema in the metadata database with the first observed column order for a given signature.
Currently, TaskDataSegmentProvider fetches the DataSegment from the Coordinator while loading the segment, but just discards it later. This PR refactors this to also return the DataSegment so that it can be used by workers without a separate fetch.
There were some problematic cases
join branches are run with finalize=false instead of finalize=true like normal subqueries
this inconsistency is not good - but fixing it is a bigger thing
ensure that right hand sides of joins are always subqueries - or accessible globally
To achieve the above:
operand indexes were needed for the upstream reltree nodes in the generator
source unwrapping now takes the join situation into account as well
* Reduce the number of ITs in CDS groups
The two CDS IT groups cds-task-schema-publish-disabled & cds-coordinator-metadata-query-disabled
runs a lot of ITs serially, leading to increased CI runtime.
Reducing the number of ITs running in the two groups to,
ITAppendBatchIndexTest (append-ingestion)
ITIndexerTest (batch-index)
ITOverwriteBatchIndexTest (batch-index)
ITCompactionSparseColumnIndexTest (compaction)
ITCompactionTaskTest (compaction)
ITKafkaIndexingServiceDataFormatTest (kafka-data-format)
Both these groups will be removed once CDS is enabled by default.
* MSQ: Add QueryKitSpec to encapsulate QueryKit params.
This patch introduces QueryKitSpec, an object that encapsulates the
parameters to makeQueryDefinition that are consistent from call to
call. This simplifies things because we avoid passing around all the
components individually.
This patch also splits "maxWorkerCount" into "maxLeafWorkerCount" and
"maxNonLeafWorkerCount", which apply to leaf stages (no other stages as
inputs) and nonleaf stages respectively.
Finally, this patch also rovides a way for ControllerContext to supply a
QueryKitSpec to its liking. It is expected that this will be used by
controllers of quick interactive queries to set maxNonLeafWorkerCount = 1,
which will generate fanning-in query plans.
* Fix javadoc.
Previously, the processor used "remainingChannels" to track the number of
non-null entries of currentFrame. Now, "remainingChannels" tracks the
number of channels that are unfinished.
The difference is subtle. In the previous code, when an input channel
was blocked upon exiting nextFrame(), the "currentFrames" entry would be
null, and therefore the "remainingChannels" variable would be decremented.
After the next await and call to populateCurrentFramesAndTournamentTree(),
"remainingChannels" would be incremented if the channel had become
unblocked after awaiting.
This means that finished(), which returned true if remainingChannels was
zero, would not be reliable if called between nextFrame() and the
next await + populateCurrentFramesAndTournamentTree().
This patch changes things such that finished() is always reliable. This
fixes a regression introduced in PR #16911, which added a call to
finished() that was, at that time, unsafe.
* MSQ: Properly report errors that occur when starting up RunWorkOrder.
In #17046, an exception thrown by RunWorkOrder#startAsync would be ignored
and replaced with a generic CanceledFault. This patch fixes it by retaining
the original error.
* TableInputSpecSlicer changes to support running on Brokers.
Changes:
1) Rename TableInputSpecSlicer to IndexerTableInputSpecSlicer, in anticipation
of a new implementation being added for controllers running on Brokers.
2) Allow the context to use the WorkerManager to build the TableInputSpecSlicer,
in anticipation of Brokers wanting to use this to assign segments to servers
that are already serving those segments.
3) Remove unused DataSegmentTimelineView interface.
4) Add additional javadoc to DataSegmentProvider.
* Style.
* MSQ: Include worker context maps in WorkOrders.
This provides a mechanism to send contexts to workers in long-lived,
shared JVMs that are not part of the task system.
* Style, coverage.
Register a Ser-De for RowsAndColumns so that the window operator query running on leaf operators would be transferred properly on the wire. Would fix the empty response given by window queries without group by on the native engine.