* Support complex variance object inputs for variance SQL agg function
* Add test
* Include complexTypeChecker, address PR comments
* Checkstyle, javadoc link
This PR aims to expose a new API, `/druid/v2/sql/statements/`, which takes the same payload as the current `/druid/v2/sql` endpoint and allows users to fetch results in an async manner.
Adds support for automatic cleaning of a "query-results" directory in durable storage. This directory will be cleaned up only if the task id is not known to the overlord. This will allow the storage of query results after the task has finished running.
Changes:
- Throw an `InsertCannotAllocateSegmentFault` if the allocated segment is not aligned with
the requested granularity.
- Tests to verify new behaviour
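A rough sketch of the alignment check, assuming Joda-Time intervals; the `bucketFn` parameter stands in for however the requested granularity maps a timestamp to its time chunk, and the real code throws the fault type named above rather than a plain runtime exception.

```java
import org.joda.time.DateTime;
import org.joda.time.Interval;
import java.util.function.Function;

public class SegmentAlignmentCheck
{
  /**
   * Throws if the allocated segment interval does not line up with the time chunk
   * that the requested granularity assigns to the row's timestamp.
   */
  static void validateAlignment(
      Interval allocatedInterval,
      DateTime rowTimestamp,
      Function<DateTime, Interval> bucketFn
  )
  {
    Interval requestedChunk = bucketFn.apply(rowTimestamp);
    if (!requestedChunk.equals(allocatedInterval)) {
      // In MSQ this surfaces as an InsertCannotAllocateSegmentFault (see above).
      throw new IllegalStateException(
          "Allocated segment interval [" + allocatedInterval
          + "] is not aligned with requested granularity chunk [" + requestedChunk + "]"
      );
    }
  }
}
```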
* MSQ: Change default clusterStatisticsMergeMode to SEQUENTIAL.
This is an undocumented parameter that controls how cluster-by statistics
are merged. In PARALLEL mode, statistics are gathered from workers all
at once. In SEQUENTIAL mode, statistics are gathered time chunk by time
chunk. This improves accuracy for jobs with many time chunks, and reduces
memory usage.
The main downside of SEQUENTIAL is that it can take longer, but in most
situations I've seen, PARALLEL is only really usable in cases where the
sketches are small enough that SEQUENTIAL would also run relatively
quickly. So it seems like SEQUENTIAL is a better default.
* Switch off-test from SEQUENTIAL to PARALLEL.
* Fix sequential merge for situations where there are no time chunks at all.
* Add a couple more tests.
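For reference, the merge mode can still be overridden per query through the context; only the parameter name below comes from this patch, the rest of the snippet is illustrative.

```java
import java.util.Map;

public class MergeModeContextExample
{
  public static void main(String[] args)
  {
    // Query context overriding the new SEQUENTIAL default back to PARALLEL for a single job.
    // Only "clusterStatisticsMergeMode" comes from this patch; the rest is illustrative.
    Map<String, Object> context = Map.of(
        "clusterStatisticsMergeMode", "PARALLEL",
        "finalizeAggregations", false
    );
    System.out.println(context);
  }
}
```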
Users can now add a guardrail to prevent a subquery's results from exceeding a set number of bytes by setting `druid.server.http.maxSubqueryBytes` in the Broker's config or `maxSubqueryBytes` in the query context. This feature is experimental for now and will default back to row-based limiting in case it fails to get the accurate size of the results consumed by the query.
* SQL OperatorConversions: Introduce aggregatorBuilder, allow CAST-as-literal.
Four main changes:
1) Provide aggregatorBuilder, a more consistent way of defining the
SqlAggFunction we need for all of our SQL aggregators. The mechanism
is analogous to the one we already use for SQL functions
(OperatorConversions.operatorBuilder).
2) Allow CASTs of constants to be considered as "literalOperands". This
fixes an issue where various of our operators are defined with
OperandTypes.LITERAL as part of their checkers, which doesn't allow
casts. However, in these cases we generally _do_ want to allow casts.
The important piece is that the value must be reducible to a constant,
not that the SQL text is literally a literal.
3) Update DataSketches SQL aggregators to use the new aggregatorBuilder
functionality. The main user-visible effect here is [2]: the aggregators
would now accept, for example, "CAST(0.99 AS DOUBLE)" as a literal
argument. Other aggregators could be updated in a future patch.
4) Rename "requiredOperands" to "requiredOperandCount", because the
old name was confusing. (It rhymes with "literalOperands" but the
arguments mean different things.)
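A hypothetical sketch of what defining an aggregator with the new builder might look like; the exact builder methods are assumed here (modeled on `OperatorConversions.operatorBuilder`) and may not match the real signatures.

```java
import org.apache.calcite.sql.SqlAggFunction;
import org.apache.calcite.sql.SqlFunctionCategory;
import org.apache.calcite.sql.type.SqlTypeFamily;
import org.apache.calcite.sql.type.SqlTypeName;
import org.apache.druid.sql.calcite.expression.OperatorConversions;

public class AggregatorBuilderSketch
{
  // Hypothetical usage; the builder method names are assumptions modeled on
  // operatorBuilder and may differ from the actual aggregatorBuilder API.
  static final SqlAggFunction APPROX_QUANTILE_EXAMPLE =
      OperatorConversions.aggregatorBuilder("APPROX_QUANTILE_DS")
          .operandTypes(SqlTypeFamily.ANY, SqlTypeFamily.NUMERIC)
          .requiredOperandCount(2)
          .literalOperands(1) // a literal, or (with this patch) CAST(0.99 AS DOUBLE), is accepted here
          .returnTypeNonNull(SqlTypeName.DOUBLE)
          .functionCategory(SqlFunctionCategory.USER_DEFINED_FUNCTION)
          .build();
}
```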
* Adjust method calls.
* S3: Attach SSE key to doesObjectExist calls.
We did not previously attach the SSE key to the doesObjectExist request,
leading to an inconsistency that may cause problems on "S3-compatible"
implementations. This patch implements doesObjectExist using similar
logic to the S3 client itself, but calls our implementation of
getObjectMetadata rather than the S3 client's, ensuring the request
is decorated with the SSE key.
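A sketch of the approach using AWS SDK v1 calls: route the existence check through `getObjectMetadata` with the SSE-C key attached and translate a 404 into "does not exist". How the key is obtained and wired in is simplified here.

```java
import com.amazonaws.AmazonServiceException;
import com.amazonaws.services.s3.AmazonS3;
import com.amazonaws.services.s3.model.GetObjectMetadataRequest;
import com.amazonaws.services.s3.model.SSECustomerKey;

public class SseAwareExistenceCheck
{
  /**
   * Existence check that goes through getObjectMetadata so the SSE-C key is attached,
   * mirroring what the S3 client's own doesObjectExist does internally.
   * The sseCustomerKey parameter stands in for however the key is configured in Druid.
   */
  static boolean doesObjectExist(AmazonS3 s3, String bucket, String key, SSECustomerKey sseCustomerKey)
  {
    try {
      GetObjectMetadataRequest request = new GetObjectMetadataRequest(bucket, key)
          .withSSECustomerKey(sseCustomerKey);
      s3.getObjectMetadata(request);
      return true;
    }
    catch (AmazonServiceException e) {
      if (e.getStatusCode() == 404) {
        return false;
      }
      throw e;
    }
  }
}
```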
* Fix tests.
* Fix compatibility issue with SqlTaskResource
The DruidException changes broke the response format
for errors coming back from the SqlTaskResource, so fix those
Introduce DruidException, an exception whose goal in life is to be delivered to a user.
DruidException itself has javadoc on it to describe how it should be used. This commit both introduces the Exception and adjusts some of the places that are generating exceptions to generate DruidException objects instead, as a way to show how the Exception should be used.
This work was a 3rd iteration on top of work that was started by Paul Rogers. I don't know if his name will survive the squash-and-merge, so I'm calling it out here and thanking him for starting on this.
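A minimal sketch of the intended usage, assuming a builder-style entry point (`forPersona` / `ofCategory` / `build`); the actual fluent API on DruidException may differ in its details.

```java
import org.apache.druid.error.DruidException;

public class DruidExceptionUsageSketch
{
  static int parsePositive(String input)
  {
    final int value = Integer.parseInt(input);
    if (value <= 0) {
      // Assumed builder-style entry points (forPersona / ofCategory / build);
      // the real class may expose a slightly different fluent API.
      throw DruidException.forPersona(DruidException.Persona.USER)
                          .ofCategory(DruidException.Category.INVALID_INPUT)
                          .build("Expected a positive number, got [%s]", input);
    }
    return value;
  }
}
```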
* Throw ValidationException if CLUSTERED BY column descending order is specified.
- Fails query planning
* Some more tests.
* fixup existing comment
* Update comment
* checkstyle fix: remove unused imports
* Remove InsertCannotOrderByDescendingFault and deprecate the fault in readme.
* move deprecated field to the bottom
They were not previously loaded because supportsQueries was false.
This patch sets supportsQueries to true, and clarifies in Task
javadocs that supportsQueries can be true for tasks that aren't
directly queryable over HTTP.
* Limit select results in MSQ
* reduce number of files in test
* add truncated flag
* avoid materializing select results to list, use iterable instead
* javadocs
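The limiting idea, sketched with illustrative names rather than the actual MSQ classes: wrap the result iterator, stop after a cap, and record whether anything was cut off.

```java
import java.util.Iterator;
import java.util.NoSuchElementException;

public class TruncatingIterator<T> implements Iterator<T>
{
  private final Iterator<T> delegate;
  private final long limit;
  private long returned = 0;
  private boolean truncated = false;

  public TruncatingIterator(Iterator<T> delegate, long limit)
  {
    this.delegate = delegate;
    this.limit = limit;
  }

  @Override
  public boolean hasNext()
  {
    if (returned >= limit) {
      // Rows were left behind: flag the result set as truncated.
      truncated = truncated || delegate.hasNext();
      return false;
    }
    return delegate.hasNext();
  }

  @Override
  public T next()
  {
    if (!hasNext()) {
      throw new NoSuchElementException();
    }
    T row = delegate.next();
    returned++;
    return row;
  }

  public boolean isTruncated()
  {
    return truncated;
  }
}
```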
* fix kafka input format reader schema discovery and partial schema discovery to actually work right, by re-using dimension filtering logic of MapInputRowParser
Changes
- Add a `DruidException` which contains a user-facing error message and HTTP response code
- Make `EntryExistsException` extend `DruidException`
- If metadata store max_allowed_packet limit is violated while inserting a new task, throw
`DruidException` with response code 400 (bad request) to prevent retries
- Add `SQLMetadataConnector.isRootCausePacketTooBigException` with impl for MySQL
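A sketch of the root-cause check: walk the cause chain and match MySQL's PacketTooBigException by class name, so no compile-time dependency on the MySQL driver is needed. The exact matching used in Druid may differ.

```java
public class PacketTooBigCheck
{
  /**
   * Returns true if any exception in the cause chain looks like MySQL's
   * PacketTooBigException. Matching by class name avoids a compile-time
   * dependency on the MySQL connector; the name checked here is illustrative.
   */
  static boolean isRootCausePacketTooBigException(Throwable t)
  {
    for (Throwable cause = t; cause != null; cause = cause.getCause()) {
      if (cause.getClass().getName().endsWith("PacketTooBigException")) {
        return true;
      }
    }
    return false;
  }
}
```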
* Fix EarliestLatestBySqlAggregator signature; Include function name for all signatures.
* Single quote function signatures, space between args and remove \n.
* fixup UT assertion
The same aggregator can have two output names for a SQL like:
INSERT INTO foo
SELECT x, COUNT(*) AS y, COUNT(*) AS z
FROM t
GROUP BY 1
PARTITIONED BY ALL
In this case, the SQL planner will create a query with a single "count"
aggregator mapped to output names "y" and "z". The prior MSQ code did
not properly handle this case, instead throwing an error like:
Expected single output for query column[a0] but got [[1, 2]]
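Conceptually, the fix is to let one native query column map to several output names instead of exactly one; a toy version of that mapping, with the names from the example above:

```java
import java.util.List;
import java.util.Map;

public class OutputNameMappingExample
{
  public static void main(String[] args)
  {
    // One native aggregator ("a0", the single "count") feeding two SQL output columns.
    // The old code expected exactly one output per query column and failed on this shape.
    Map<String, List<String>> queryColumnToOutputNames = Map.of(
        "a0", List.of("y", "z")
    );

    queryColumnToOutputNames.forEach(
        (queryColumn, outputNames) ->
            outputNames.forEach(output -> System.out.println(queryColumn + " -> " + output))
    );
  }
}
```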
It was found that several supported tasks / input sources did not have implementations for the methods used by the input source security feature, causing these tasks and input sources to fail when used with this feature. This PR adds the needed missing implementations. It also secures the sampling endpoint with input source security, when enabled.
### Description
This change allows for consideration of the input format and compression when computing how to split the input files among available tasks in MSQ ingestion, when considering the value of the `maxInputBytesPerWorker` query context parameter. This query context parameter allows users to control the maximum number of bytes, at the granularity of input file / object, that ingestion tasks will be assigned to ingest. With this change, the parameter now denotes the estimated weighted size in bytes of the input to split on, with consideration for input format and compression format, rather than the actual file size reported by the file system.

We assume uncompressed newline-delimited JSON as a baseline, with a scaling factor of `1`. This means that when computing the byte weight a file contributes towards input splitting, we take the file size as is if it is uncompressed JSON, 1:1. It was found during testing that gzip-compressed JSON and Parquet have scale factors of `4` and `8` respectively, meaning that each byte of data is weighted 4x and 8x respectively when computing input splits. This weighted byte scaling is only considered for MSQ ingestion that uses either LocalInputSource or CloudObjectInputSource at the moment. The default value of the `maxInputBytesPerWorker` query context parameter has been updated from 10 GiB to 512 MiB.
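The weighting in arithmetic form; the scale factors (1 for uncompressed JSON, 4 for gzipped JSON, 8 for Parquet) come from the description above, while the enum and method names are illustrative only.

```java
public class WeightedInputBytesExample
{
  // Scale factors relative to uncompressed newline-delimited JSON (the 1x baseline).
  enum InputWeight
  {
    UNCOMPRESSED_JSON(1),
    GZIPPED_JSON(4),
    PARQUET(8);

    final long scaleFactor;

    InputWeight(long scaleFactor)
    {
      this.scaleFactor = scaleFactor;
    }
  }

  /** Weighted size used when splitting input among workers, not the raw file size. */
  static long weightedSizeBytes(long fileSizeBytes, InputWeight weight)
  {
    return fileSizeBytes * weight.scaleFactor;
  }

  public static void main(String[] args)
  {
    long maxInputBytesPerWorker = 512L * 1024 * 1024; // new default: 512 MiB (was 10 GiB)
    // A 100 MiB gzipped JSON file is counted as 400 MiB toward the split budget.
    long gzJson = weightedSizeBytes(100L * 1024 * 1024, InputWeight.GZIPPED_JSON);
    System.out.println(gzJson <= maxInputBytesPerWorker); // still fits within one worker's budget
  }
}
```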
* Be able to load segments on Peons
This change introduces a new config on WorkerConfig
that indicates how many bytes of each storage
location to use for storage of a task. Said config
is divided up amongst the locations and slots
and then used to set TaskConfig.tmpStorageBytesPerTask.
The Peons use their local task dir and
tmpStorageBytesPerTask as their StorageLocations for
the SegmentManager such that they can accept broadcast
segments.
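A back-of-the-envelope version of the division described above; the exact allocation and rounding in Druid may differ, so treat this purely as the shape of the math.

```java
public class TmpStorageBytesExample
{
  /**
   * Shape of the division: the configured byte budget is spread across storage
   * locations and task slots to produce TaskConfig.tmpStorageBytesPerTask.
   * The real allocation in Druid may round or distribute differently.
   */
  static long tmpStorageBytesPerTask(long configuredBytes, int numLocations, int taskSlots)
  {
    return configuredBytes / ((long) numLocations * taskSlots);
  }

  public static void main(String[] args)
  {
    // e.g. 64 GiB configured, 2 locations, 4 slots -> 8 GiB per task
    System.out.println(tmpStorageBytesPerTask(64L << 30, 2, 4) >> 30);
  }
}
```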
Changes:
- Replace `OverlordHelper` with `OverlordDuty` to align with `CoordinatorDuty`
- Each duty has a `run()` method and defines a `Schedule` with an initial delay and period.
- Update existing duties `TaskLogAutoCleaner` and `DurableStorageCleaner`
- Add utility class `Configs`
- Update log, error messages and javadocs
- Other minor style improvements
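The duty contract described above might look roughly like this; the interface and `Schedule` shape are sketched from the bullet points rather than copied from the actual classes.

```java
import java.util.concurrent.TimeUnit;

/** Sketch of the OverlordDuty contract described above; names mirror the bullets, not the real code. */
public interface OverlordDuty
{
  /** Work performed on each invocation of the duty. */
  void run();

  /** Initial delay and period with which the duty is scheduled. */
  Schedule getSchedule();

  final class Schedule
  {
    final long initialDelayMillis;
    final long periodMillis;

    public Schedule(long initialDelay, long period, TimeUnit unit)
    {
      this.initialDelayMillis = unit.toMillis(initialDelay);
      this.periodMillis = unit.toMillis(period);
    }
  }
}
```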
In StreamChunkParser#parseWithInputFormat, we call byteEntityReader.read() without handling a potential ParseException, which is thrown during this call by the delegate AvroStreamReader#intermediateRowIterator.
A ParseException can be thrown if an Avro stream has corrupt data or data that doesn't conform to the specified schema, or for other decoding reasons. This exception, if uncaught, can cause ingestion to fail.
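The fix amounts to treating the ParseException as a recorded parse failure rather than letting it escape and fail the task; a sketch with stand-in types (none of these are the actual Druid classes):

```java
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;

public class ParseExceptionHandlingSketch
{
  /** Stand-in for an InputFormat-based reader; not a Druid interface. */
  interface RowReader<T>
  {
    Iterable<T> read() throws IOException;
  }

  /**
   * Read rows, converting a runtime parse failure thrown while iterating into a
   * recorded parse error rather than a task-killing exception.
   */
  static <T> List<T> parseSafely(RowReader<T> reader, List<String> parseErrors) throws IOException
  {
    List<T> rows = new ArrayList<>();
    try {
      for (T row : reader.read()) {
        rows.add(row);
      }
    }
    catch (RuntimeException e) {
      // In Druid this would specifically be a ParseException (e.g. corrupt Avro data);
      // record it so it counts against parse-error limits instead of failing ingestion.
      parseErrors.add(e.getMessage());
    }
    return rows;
  }
}
```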
This PR fixes an issue when using 'auto' encoded LONG typed columns and the 'vectorized' query engine. These columns use a delta based bit-packing mechanism, and errors in the vectorized reader would cause it to incorrectly read column values for some bit sizes (1 through 32 bits). This is a regression caused by #11004, which added the optimized readers to improve performance, so it impacts Druid versions 0.22.0+.
While writing the test I finally got sad enough about IndexSpec not having a "builder", so I made one and switched all the things to use it. Apologies for the noise in this bug-fix PR; the only real changes are in VSizeLongSerde and the tests that have been modified to cover the buggy behavior, VSizeLongSerdeTest and ExpressionVectorSelectorsTest. Everything else is just cleanup of IndexSpec usage.
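For illustration, building an IndexSpec with the new builder might look like this; the `with*` method names are assumed from IndexSpec's existing fields and may not match exactly.

```java
import org.apache.druid.segment.IndexSpec;
import org.apache.druid.segment.data.CompressionFactory.LongEncodingStrategy;
import org.apache.druid.segment.data.CompressionStrategy;

public class IndexSpecBuilderExample
{
  public static void main(String[] args)
  {
    // Builder-style construction introduced alongside this fix; the with* method names
    // are assumed from IndexSpec's existing fields and may differ.
    IndexSpec spec = IndexSpec.builder()
        .withDimensionCompression(CompressionStrategy.LZ4)
        .withMetricCompression(CompressionStrategy.LZ4)
        .withLongEncoding(LongEncodingStrategy.AUTO) // the 'auto' long encoding hit by this bug
        .build();
    System.out.println(spec);
  }
}
```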
* Make LoggingEmitter more useful
* Skip code coverage for facade classes
* fix spellcheck
* code review
* fix dependency
* logging.md
* fix checkstyle
* Add back jacoco version to main pom
* If a worker dies after it has finished generating results, MSQ decides not to retry it as it has no active work orders. However, since we don't keep track of it further, if it is required for a future stage, the controller hangs waiting for the worker to be ready. This PR keeps track of any workers the controller decides not to restart immediately and, while starting workers for the next stage, queues these workers for retry.
* fix typo in s3 docs. add readme to s3 module.
* Update extensions-core/s3-extensions/README.md
Co-authored-by: 317brian <53799971+317brian@users.noreply.github.com>
* cleanup readme for s3 extension and link to repo markdown doc instead of web docs
---------
Co-authored-by: 317brian <53799971+317brian@users.noreply.github.com>