* SQL syntax error should target USER persona
* revert change to queryHandler and related tests, based on review comments
* add test
* add `ingest/notices/queueSize` and `ingest/pause/time` to statsd emitter
* add taskStatus dimension to `service/heartbeat` metric
* Revert "* add taskStatus dimension to `service/heartbeat` metric"
This reverts commit cfb02a2813.
changes:
* fix an issue when merging projections from multiple incremental persists: the merge assumed that some 'dim conversion' buffers were still open, but they had already been closed (by the merging iterator). The fix selectively persists these conversion buffers to temp files in the segment write-out directory, maps them, and ties them to the segment-level closer so that they remain available after the lifetime of the parent merger
* modify auto column serializers to use the segment write-out directory for temp files instead of `java.io.tmpdir`
* fix the queryable index projection so that the time-like column is not added as a dimension; it is only added as `__time`
* use smoosh for temp files so that any Serializer can be safely written to a temp smoosh file
* Add 'ingest/notices/time' metric to statsd emitter
This metric reports the time, in milliseconds, that the supervisor takes to process a notice.
* Add documentation for druid-catalog extension
* fix error
* Apply suggestions from code review
Co-authored-by: Andreas Maechler <amaechler@gmail.com>
* fix spelling error
* fix spelling
---------
Co-authored-by: Andreas Maechler <amaechler@gmail.com>
* ScanQuery: equals/hashCode/toString
* DruidQuery: changes from "Align ScanQuery column order with its desired signature" (#17457)
* ScanQueryTest: add EqualsVerifier test
* Make start() wait for the task lifecycle to enter the running state
* handle exceptions
* Fix logging messages
* Don't pass in the settable future as an arg
* add some unit tests
* Run JDK 21 workflows with 21.0.4.
To work around #17429, run our JDK 21 workflows with
version 21.0.4. It does not appear to have this problem.
* Undo changes in standard-its.yml
* Add comments.
---------
Co-authored-by: Zoltan Haindrich <kirk@rxd.hu>
* WindowOperatorQueryKit: Pass QueryContext instead of WindowOperatorQuery to subsequent layers
* Add serializer for QueryContext class
* Revert changes of WindowOperatorQueryFrameProcessorFactory json param
* Fix checkstyle
* Address review comment: Remove older method in favor of calling new method inline
This patch re-uses timeBoundaryInspector for each cursor holder, which
enables caching of minDataTimestamp and maxDataTimestamp.
Fixes a performance regression introduced in #16533, where these fields
stopped being cached across cursors. Prior to that patch, they were
cached in the QueryableIndexStorageAdapter.
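The caching effect described above can be illustrated with a minimal sketch. The `CachingTimeBoundaryInspector` and `CursorHolderSketch` classes are hypothetical stand-ins, not Druid's actual types: one inspector instance is shared by every cursor holder, so the min/max timestamps are computed with a single scan and then served from the cache.

```java
/**
 * Hypothetical inspector that scans timestamps once and caches min/max.
 * Illustration only; not Druid's real TimeBoundaryInspector.
 */
final class CachingTimeBoundaryInspector {
  private final long[] timestamps;
  private boolean computed = false;
  private long minTime;
  private long maxTime;
  private int scanCount = 0; // number of full scans performed so far

  CachingTimeBoundaryInspector(long[] timestamps) {
    this.timestamps = timestamps.clone();
  }

  // Computes both bounds on first use; callers hold the monitor.
  private void computeIfNeeded() {
    if (computed) {
      return;
    }
    long min = Long.MAX_VALUE;
    long max = Long.MIN_VALUE;
    for (long t : timestamps) {
      min = Math.min(min, t);
      max = Math.max(max, t);
    }
    minTime = min;
    maxTime = max;
    scanCount++;
    computed = true;
  }

  synchronized long getMinTime() {
    computeIfNeeded();
    return minTime;
  }

  synchronized long getMaxTime() {
    computeIfNeeded();
    return maxTime;
  }

  synchronized int getScanCount() {
    return scanCount;
  }
}

/** Hypothetical cursor holder that reuses a shared inspector instead of building its own. */
final class CursorHolderSketch {
  private final CachingTimeBoundaryInspector inspector;

  CursorHolderSketch(CachingTimeBoundaryInspector inspector) {
    this.inspector = inspector;
  }

  long minDataTimestamp() {
    return inspector.getMinTime();
  }
}
```

Creating a new inspector per cursor holder would repeat the scan each time; sharing one instance is what lets the cached values survive across cursors.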
* introduces `UnionQuery`
* some changes to enable a `UnionQuery` to have multiple input datasources
* `UnionQuery` execution is driven by the `QueryLogic`, which could later help reduce some complexity in `ClientQuerySegmentWalker`
* to run the subqueries of `UnionQuery`, the `Runner` needed access to the `conglomerate`; some refactoring was done to enable that
* renamed `UnionQueryRunner` to `UnionDataSourceQueryRunner`
* `QueryRunnerFactoryConglomerate` has taken the place of `QueryToolChestWarehouse`, which shaves off some unnecessary code here and there
* small cleanup/refactors
Add build instructions for developers
Follow-up to issue #17375: add instructions specifically for the distribution profile. Note that this build command is mostly used by me; everyone is welcome to add further optimizations for a faster distribution build.
Co-authored-by: Abhishek Radhakrishnan <abhishek.rb19@gmail.com>
* Update docs/development/build.md
Co-authored-by: Abhishek Radhakrishnan <abhishek.rb19@gmail.com>
* Update docs/development/build.md
Co-authored-by: Abhishek Radhakrishnan <abhishek.rb19@gmail.com>
---------
Co-authored-by: Abhishek Radhakrishnan <abhishek.rb19@gmail.com>
Change the persona for errors within the planner from Admin to User. The ADMIN persona is meant to be "a persona who is interacting with admin APIs and understands Druid query concepts". This isn't an admin API; it's a query API. Returning low-quality error messages to the correct audience is better than hiding all error messages.
Some of the errors returned can be solved by the user; others require a Druid expert. Either way, the errors do not leak information that should only be seen by more expert or privileged personas.
The original choice of the ADMIN persona reflected some reluctance to tag low-quality error messages with the USER persona, but these errors really do seem user-directed, so USER makes sense to me.
* handling empty sets for dataSourceCondition and taskTypeCondition
* using new HashSet<>() to fix forbidden api error in testCheck
* fixing style issues
Map Lookup Introspection API endpoints /keys and /values no longer return an invalid JSON object.
Also, update documentation to clarify the version returned by the /version introspection endpoint.
---------
Co-authored-by: Ashwin Tumma <ashwin.tumma@salesforce.com>
Currently, durable storage and export both require configuring a temporary directory, using druid.export.storage.<connectorType>.tempLocalDir and druid.msq.intermediate.storage.tempDir.
Tasks on Middle Managers already have a configured temporary directory. This PR reduces the configuration required by using the task directory as the default when no temporary directory is explicitly configured, thus reducing the number of configs that a user has to set.
Note that on tasks, the user-configured druid.*.storage.temp*Dir takes precedence. If that is not configured, the task's temporary directory is used.
The Overlord and Brokers also require storage connector configurations (for the durableStorageCleanerOverlordDuty and to fetch results of async queries, respectively), but do not have a default temporary task directory, so the configuration is still required for these services.
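The precedence rule for the temporary directory can be sketched as a small resolver. `TempDirResolver` and its parameters are hypothetical names chosen for illustration; the sketch only encodes the order described above: an explicit setting wins, then the task directory, and services without a task directory must configure one.

```java
import java.io.File;

/**
 * Hypothetical helper illustrating the fallback order for storage temp dirs.
 * Not Druid's actual configuration code.
 */
final class TempDirResolver {
  private TempDirResolver() {}

  /**
   * @param configuredTempDir value of the relevant druid.*.storage.temp*Dir property, or null if unset
   * @param taskTempDir the task's temporary directory, or null on services that have none
   */
  static File resolve(File configuredTempDir, File taskTempDir) {
    if (configuredTempDir != null) {
      return configuredTempDir; // user configuration takes precedence
    }
    if (taskTempDir != null) {
      return taskTempDir; // default for tasks
    }
    // e.g. Overlord or Broker: no task directory to fall back on
    throw new IllegalStateException("No temporary directory configured and no task directory available");
  }
}
```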
* DruidOverlord: Move becomeLeader/stopBeingLeader earlier.
On becoming leader, it is helpful for the TaskRunner and TaskQueue to be
available when the SupervisorManager starts up, to aid the supervisors
in discovering their tasks.
On stopping leadership, it is helpful for the TaskRunner and TaskQueue
to be available until the SupervisorManager has finished shutting down.
They are only available when the TaskMaster is in "leader" mode, so to
achieve the above, this patch moves it earlier in the sequence.
* Adjust leadership into two phases.
* Update test.
* Adjustments for coverage.
* Stop mirrors start better.
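The intended ordering ("stop mirrors start") can be sketched as a start list paired with a stack for shutdown. `LeadershipLifecycle` is a hypothetical class, and the component names are plain strings standing in for the real TaskMaster and SupervisorManager; it only demonstrates that the component started first is stopped last.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;

/**
 * Hypothetical sketch: components start in the given order and stop in reverse,
 * so the TaskMaster is up before the SupervisorManager starts and stays up until
 * the SupervisorManager has shut down.
 */
final class LeadershipLifecycle {
  private final List<String> startOrder;
  private final Deque<String> running = new ArrayDeque<>();
  private final List<String> events = new ArrayList<>();

  LeadershipLifecycle(List<String> startOrder) {
    this.startOrder = new ArrayList<>(startOrder);
  }

  void becomeLeader() {
    for (String component : startOrder) {
      events.add("start " + component);
      running.push(component); // most recently started is on top
    }
  }

  void stopBeingLeader() {
    while (!running.isEmpty()) {
      events.add("stop " + running.pop()); // reverse of the start order
    }
  }

  List<String> events() {
    return new ArrayList<>(events);
  }
}
```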
* Update errorprone, mockito, jacoco, checkerframework.
This patch updates various build and test dependencies, to see if they
cause unit tests on JDK 21 to behave more reliably.
* Update licenses, tests.
* Remove assertEquals.
* Repair two tests.
* Update some more tests.
Following #17394, workerExec can get deadlocked with itself, because it
waits for task futures and is also used as the connectExec for the task
client. To fix this, we need to never await task futures in the workerExec.
There are two specific changes: in "verifyAndMergeCheckpoints" and
"checkpointTaskGroup", two "coalesceAndAwait" calls that formerly occurred
in workerExec are replaced with Futures.transform (using a callback in
workerExec).
Because this adjustment removes a source of blocking, it may also improve
supervisor responsiveness for high task counts. This is not the primary
goal, however. The primary goal is to fix the bug introduced by #17394.
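The change from awaiting futures to transforming them can be sketched with the JDK's `CompletableFuture`, swapped in here for Guava's `Futures.transform`. `NonBlockingMerge`, `mergeAsync`, and `demo` are hypothetical names. The point is that the merge runs as a callback on `workerExec` once all inputs are done, so no `workerExec` thread ever parks waiting on futures that `workerExec` itself must complete; `demo` uses a single-threaded executor that also completes the task futures, mimicking `workerExec` doubling as the client `connectExec`.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

/** Hypothetical sketch: merge task results via a callback instead of blocking workerExec. */
final class NonBlockingMerge {
  private NonBlockingMerge() {}

  static CompletableFuture<Integer> mergeAsync(
      List<CompletableFuture<Integer>> taskFutures,
      ExecutorService workerExec
  ) {
    CompletableFuture<Void> allDone =
        CompletableFuture.allOf(taskFutures.toArray(new CompletableFuture<?>[0]));
    // Runs on workerExec only after every input has completed, so join() never blocks here.
    return allDone.thenApplyAsync(
        ignored -> taskFutures.stream().mapToInt(CompletableFuture::join).sum(),
        workerExec
    );
  }

  /** Single-threaded workerExec that also completes the task futures. */
  static int demo() {
    ExecutorService workerExec = Executors.newSingleThreadExecutor();
    try {
      List<CompletableFuture<Integer>> taskFutures = new ArrayList<>();
      for (int i = 1; i <= 3; i++) {
        final int value = i;
        taskFutures.add(CompletableFuture.supplyAsync(() -> value, workerExec));
      }
      // Only the calling thread waits here; workerExec itself never does.
      return mergeAsync(taskFutures, workerExec).get(5, TimeUnit.SECONDS);
    } catch (InterruptedException e) {
      Thread.currentThread().interrupt();
      throw new IllegalStateException("interrupted while waiting for merge", e);
    } catch (ExecutionException | TimeoutException e) {
      throw new IllegalStateException("merge failed or timed out", e);
    } finally {
      workerExec.shutdownNow();
    }
  }
}
```

Had the merge instead been submitted to the single `workerExec` thread and called `join()` on futures still queued behind it, that thread would wait forever, which is the shape of the self-deadlock described above.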
Calling toString on newConfig is unnecessary, because it will be done
automatically by the logger. This saves some effort under log levels
higher than DEBUG.
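Why the explicit `toString()` is wasted work can be shown with a minimal lazily formatting logger. `LazyDebugLogger` and `CountingConfig` are hypothetical classes, not Druid's `Logger`; the sketch assumes a format-string style `debug(String, Object...)` method, where the `%s` conversion calls `toString()` only if the message is actually emitted.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical logger that renders arguments only when DEBUG is enabled. */
final class LazyDebugLogger {
  private final boolean debugEnabled;
  private final List<String> output = new ArrayList<>();

  LazyDebugLogger(boolean debugEnabled) {
    this.debugEnabled = debugEnabled;
  }

  void debug(String format, Object... args) {
    if (debugEnabled) {
      output.add(String.format(format, args)); // %s calls toString() here, and only here
    }
  }

  List<String> output() {
    return new ArrayList<>(output);
  }
}

/** Hypothetical config object that counts how often it is rendered. */
final class CountingConfig {
  private int toStringCalls = 0;

  int toStringCalls() {
    return toStringCalls;
  }

  @Override
  public String toString() {
    toStringCalls++;
    return "CountingConfig{}";
  }
}
```

Passing the object itself, as in `log.debug("New config: %s", cfg)`, defers rendering to the logger; calling `cfg.toString()` at the call site pays that cost even when DEBUG is off.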
* SeekableStreamSupervisor: Use workerExec as the client connectExec.
This patch uses the already-existing per-supervisor workerExec as the
connectExec for task clients, rather than using the process-wide default
ServiceClientFactory pool.
This helps prevent callbacks from backlogging on the process-wide pool.
It's especially useful for retries, where callbacks may need to establish
new TCP connections or perform TLS handshakes.
* Fix compilation, tests.
* Fix style.
changes:
* adds `SqlBenchmarkDatasets` which contains commonly used benchmark data generator schemas
* adds `SqlBaseBenchmark` which contains common benchmark segment generation methods for any benchmark using `SqlBenchmarkDatasets`
* adds `SqlBaseQueryBenchmark` and `SqlBasePlanBenchmark` for benchmarks measuring queries and planning respectively
* migrate all existing SQL JMH benchmarks to extend `SqlBaseQueryBenchmark`, dramatically reducing the boilerplate needed to create benchmarks and allowing the use of multiple datasources within a benchmark file
* adjust the data generator to allow passing in an ObjectMapper, so that the same mapper can be used for both benchmark queries and segment generation, avoiding the need to register things with both mappers for benchmarks
* adds `SqlProjectionsBenchmark` and `SqlComplexMetricsColumnsBenchmark` for measuring projections and measuring complex metric compression respectively