* more aggressive cancellation of broker parallel merge, more chill blocking queue timeouts
* wire parallel merge into query cancellation system
* oops
* style
* adjust metrics initialization
* fix timeout, fix cleanup to not block
* javadocs to clarify why cancellation future and gizmo are split
* cancelled -> canceled, simplify QueuePusher since it always takes a ResultBatch, non-static terminal marker to make stuff stop complaining about types, specialize tryOffer to be tryOfferTerminal so it wont be misused, add comments to clarify reason for non-blocking offers that might fail
For aggregators like StringFirst/Last, whose intermediate type isn't the same as the final type, using them in GroupBy, TopN or Timeseries subqueries causes a fallback when maxSubqueryBytes is set. This is because we assume that the finalization is not known, due to which the row signature cannot determine whether to use the intermediate or the final type, and it puts it as null. This PR figures out the finalization from the query context and uses the intermediate or the final type appropriately.
* MSQ window functions: Revamp logic to create separate window stages when empty over() clause is present
* Fix tests
* Revert changes of creating separate stages for empty over clause
* Address review comments
changes:
* moves value column serializer initialization, call to `writeValue` method to `GlobalDictionaryEncodedFieldColumnWriter.writeTo` instead of during `GlobalDictionaryEncodedFieldColumnWriter.addValue`. This shift means these numeric value columns are now done in the per field section that happens after serializing the nested column raw data, so only a single compression buffer and temp file will be needed at a time instead of the total number of nested literal fields present in the column. This should be especially helpful for complicated nested structures with thousands of columns as even those 64k compression buffers can add up pretty quickly to a sizeable chunk of direct memory.
changes:
* removes `druid.indexer.task.batchProcessingMode` in favor of always using `CLOSED_SEGMENT_SINKS` which uses `BatchAppenderator`. This was intended to become the default for native batch, but that was missed so `CLOSED_SEGMENTS` was the default (using `AppenderatorImpl`), however MSQ has been exclusively using `BatchAppenderator` with no problems so it seems safe to just roll it out as the only option for batch ingestion everywhere.
* with `batchProcessingMode` gone, there is no use for `AppenderatorImpl` so it has been removed
* implify `Appenderator` construction since there are only separate stream and batch versions now
* simplify tests since `batchProcessingMode` is gone
This PR aims to check if the complex column being queried aligns with the supported types in the aggregator and aggregator factories, and throws a user-friendly error message if they don't.
* Throw exception if DISTINCT used with window functions aggregate call
* Improve error message when unsupported aggregations are used with window functions
Fixes#16766
Change log level from INFO to DEBUG when processing an empty user map
during polling. An empty user map is a normal situation for some
authenticators (e.g. LDAP) and polling is frequent (1 minute by
default.)
changes:
* removed `Firehose` and `FirehoseFactory` and remaining implementations which were mostly no longer used after #16602
* Moved `IngestSegmentFirehose` which was still used internally by Hadoop ingestion to `DatasourceRecordReader.SegmentReader`
* Rename `SQLFirehoseFactoryDatabaseConnector` to `SQLInputSourceDatabaseConnector` and similar renames for sub-classes
* Moved anything remaining in a 'firehose' package somewhere else
* Clean up docs on firehose stuff
Description:
Overlord guice dependencies are currently a little difficult to plug into.
This was encountered while working on a separate PR where a class needed to depend
on `TaskMaster.getTaskQueue()` to query some task related info but this class itself
needs to be a dependency of `TaskMaster` so that it can be registered to the leader lifecycle.
The approach taken here is to simply decouple the leadership lifecycle of the overlord from
manipulation or querying of its state.
Changes:
- No functional change
- Add new class `DruidOverlord` to contain leadership logic after the model of `DruidCoordinator`
- The new class `DruidOverlord` should not be a dependency of any class with the exception of
REST endpoint `*Resource` classes.
- All classes that need to listen to leadership changes must be a dependency of `DruidOverlord`
so that they can be registered to the leadership lifecycle.
- Move all querying logic from `OverlordResource` to `TaskQueryTool` so that other classes can
leverage this logic too (required for follow up PR).
- Update tests
* pre upgrade
* did the upgrade
* update snapshots
* fix BP5 issues
* update licenses
* fix more depication warnings
* use segmented control
* updat snapshots
* convert to fake local time
* preload icons before tests
* update e2e tests
* Update web-console/src/components/segment-timeline/segment-timeline.tsx
Co-authored-by: John Gozde <john@gozde.ca>
* Update web-console/src/components/segment-timeline/segment-timeline.tsx
Co-authored-by: John Gozde <john@gozde.ca>
* update e2e test selector
* direct import date-fns
---------
Co-authored-by: John Gozde <john@gozde.ca>
* updating first batch of numeric functions
* First batch of functions
* addressing first few comments
* alphabetize list
* draft with suggestions applied
* minor discrepency expr -> <NUMERIC>
* changed raises to calculates
* Update docs/querying/sql-functions.md
* switch to underscore
* changed to exp(1) to match slack message
* adding html text for trademark symbol to .spelling
* fixed discrepancy between description and example
---------
Co-authored-by: Benedict Jin <asdf2014@apache.org>