One of the most requested features in druid is to have an ability to download big result sets.
As part of #14416 , we added an ability for MSQ to be queried via a query friendly endpoint. This PR builds upon that work and adds the ability for MSQ to write select results to durable storage.
We write the results to the durable storage location <prefix>/results/<queryId> in the druid frame format. This is exposed to users by
/v2/sql/statements/:queryId/results.
Apache Druid brings multiple direct and transitive dependencies that are affected by plethora of CVEs.
This PR attempts to update all the dependencies that did not require code refactoring.
This PR modifies pom files, license file and OWASP Dependency Check suppression file.
This commit borrows some test definitions from Drill's test suite
and tries to use them to flesh out the full validation of window
function capbilities.
In order to be able to run these tests, we also add the ability to
run a Scan operation against segments, which also meant an
implementation of RowsAndColumns for frames.
* Changes the get results API in SqlStatementResource to take a page number instead of row/offset.
* Adds "pages" containing information on each page to the results status.
* Update the "numRows" and "sizeInByes" to "numTotalRows" and "totalSizeInBytes" respectively, which are totalled across all pages.
* combine string column implementations
changes:
* generic indexed, front-coded, and auto string columns now all share the same column and index supplier implementations
* remove CachingIndexed implementation, which I think is largely no longer needed by the switch of many things to directly using ByteBuffer, avoiding the cost of creating Strings
* remove ColumnConfig.columnCacheSizeBytes since CachingIndexed was the only user
1) Fix a problem where the fault wasn't reported when the left-hand side
had too many buffered frames. (Instead, frames continued to be buffered,
eventually running the server out of memory.)
2) Always update the mark when rewinding isn't necessary. It fixes a problem where
frames would be needlessly buffered when there isn't a key match across
the two sides.
3) Memory reserved for building the trackers now change based on the heap sized
* Add "stringEncoding" parameter to DataSketches HLL.
Builds on the concept from #11172 and adds a way to feed HLL sketches
with UTF-8 bytes.
This must be an option rather than always-on, because prior to this
patch, HLL sketches used UTF-16LE encoding when hashing strings. To
remain compatible with sketch images created prior to this patch -- which
matters during rolling updates and when reading sketches that have been
written to segments -- we must keep UTF-16LE as the default.
Not currently documented, because I'm not yet sure how best to expose
this functionality to users. I think the first place would be in the SQL
layer: we could have it automatically select UTF-8 or UTF-16LE when
building sketches at query time. We need to be careful about this, though,
because UTF-8 isn't always faster. Sometimes, like for the results of
expressions, UTF-16LE is faster. I expect we will sort this out in
future patches.
* Fix benchmark.
* Fix style issues, improve test coverage.
* Put round back, to make IT updates easier.
* Fix test.
* Fix issue with filtered aggregators and add test.
* Use DS native update(ByteBuffer) method. Improve test coverage.
* Add another suppression.
* Fix ITAutoCompactionTest.
* Update benchmarks.
* Updates.
* Fix conflict.
* Adjustments.
In these other cases, stick to plain "filter". This simplifies lots of
logic downstream, and doesn't hurt since we don't have intervals-specific
optimizations outside of tables.
Fixes an issue where we couldn't properly filter on a column from an
external datasource if it was named __time.
Mocks generally have state and should not be static. In particular, the
"Yielder" included in one of the mocks can only be iterated once, which
made the test suite order-dependent.
* Support complex variance object inputs for variance SQL agg function
* Add test
* Include complexTypeChecker, address PR comments
* Checkstyle, javadoc link
This PR aims to expose a new API called
"@path("/druid/v2/sql/statements/")" which takes the same payload as the current "/druid/v2/sql" endpoint and allows users to fetch results in an async manner.
Adds support for automatic cleaning of a "query-results" directory in durable storage. This directory will be cleaned up only if the task id is not known to the overlord. This will allow the storage of query results after the task has finished running.
Changes:
- Throw an `InsertCannotAllocateSegmentFault` if the allocated segment is not aligned with
the requested granularity.
- Tests to verify new behaviour
* MSQ: Change default clusterStatisticsMergeMode to SEQUENTIAL.
This is an undocumented parameter that controls how cluster-by statistics
are merged. In PARALLEL mode, statistics are gathered from workers all
at once. In SEQUENTIAL mode, statistics are gathered time chunk by time
chunk. This improves accuracy for jobs with many time chunks, and reduces
memory usage.
The main downside of SEQUENTIAL is that it can take longer, but in most
situations I've seen, PARALLEL is only really usable in cases where the
sketches are small enough that SEQUENTIAL would also run relatively
quickly. So it seems like SEQUENTIAL is a better default.
* Switch off-test from SEQUENTIAL to PARALLEL.
* Fix sequential merge for situations where there are no time chunks at all.
* Add a couple more tests.
Users can now add a guardrail to prevent subquery’s results from exceeding the set number of bytes by setting druid.server.http.maxSubqueryRows in Broker's config or maxSubqueryRows in the query context. This feature is experimental for now and would default back to row-based limiting in case it fails to get the accurate size of the results consumed by the query.
* SQL OperatorConversions: Introduce.aggregatorBuilder, allow CAST-as-literal.
Four main changes:
1) Provide aggregatorBuilder, a more consistent way of defining the
SqlAggFunction we need for all of our SQL aggregators. The mechanism
is analogous to the one we already use for SQL functions
(OperatorConversions.operatorBuilder).
2) Allow CASTs of constants to be considered as "literalOperands". This
fixes an issue where various of our operators are defined with
OperandTypes.LITERAL as part of their checkers, which doesn't allow
casts. However, in these cases we generally _do_ want to allow casts.
The important piece is that the value must be reducible to a constant,
not that the SQL text is literally a literal.
3) Update DataSketches SQL aggregators to use the new aggregatorBuilder
functionality. The main user-visible effect here is [2]: the aggregators
would now accept, for example, "CAST(0.99 AS DOUBLE)" as a literal
argument. Other aggregators could be updated in a future patch.
4) Rename "requiredOperands" to "requiredOperandCount", because the
old name was confusing. (It rhymes with "literalOperands" but the
arguments mean different things.)
* Adjust method calls.
* S3: Attach SSE key to doesObjectExist calls.
We did not previously attach the SSE key to the doesObjectExist request,
leading to an inconsistency that may cause problems on "S3-compatible"
implementations. This patch implements doesObjectExist using similar
logic to the S3 client itself, but calls our implementation of
getObjectMetadata rather than the S3 client's, ensuring the request
is decorated with the SSE key.
* Fix tests.
* Fix compatibility issue with SqlTaskResource
The DruidException changes broke the response format
for errors coming back from the SqlTaskResource, so fix those
Introduce DruidException, an exception whose goal in life is to be delivered to a user.
DruidException itself has javadoc on it to describe how it should be used. This commit both introduces the Exception and adjusts some of the places that are generating exceptions to generate DruidException objects instead, as a way to show how the Exception should be used.
This work was a 3rd iteration on top of work that was started by Paul Rogers. I don't know if his name will survive the squash-and-merge, so I'm calling it out here and thanking him for starting on this.
* Throw ValidationException if CLUSTERED BY column descending order is specified.
- Fails query planning
* Some more tests.
* fixup existing comment
* Update comment
* checkstyle fix: remove unused imports
* Remove InsertCannotOrderByDescendingFault and deprecate the fault in readme.
* move deprecated field to the bottom
They were not previously loaded because supportsQueries was false.
This patch sets supportsQueries to true, and clarifies in Task
javadocs that supportsQueries can be true for tasks that aren't
directly queryable over HTTP.
* Limit select results in MSQ
* reduce number of files in test
* add truncated flag
* avoid materializing select results to list, use iterable instead
* javadocs
* fix kafka input format reader schema discovery and partial schema discovery to actually work right, by re-using dimension filtering logic of MapInputRowParser
Changes
- Add a `DruidException` which contains a user-facing error message, HTTP response code
- Make `EntryExistsException` extend `DruidException`
- If metadata store max_allowed_packet limit is violated while inserting a new task, throw
`DruidException` with response code 400 (bad request) to prevent retries
- Add `SQLMetadataConnector.isRootCausePacketTooBigException` with impl for MySQL
* Fix EarliestLatestBySqlAggregator signature; Include function name for all signatures.
* Single quote function signatures, space between args and remove \n.
* fixup UT assertion
The same aggregator can have two output names for a SQL like:
INSERT INTO foo
SELECT x, COUNT(*) AS y, COUNT(*) AS z
FROM t
GROUP BY 1
PARTITIONED BY ALL
In this case, the SQL planner will create a query with a single "count"
aggregator mapped to output names "y" and "z". The prior MSQ code did
not properly handle this case, instead throwing an error like:
Expected single output for query column[a0] but got [[1, 2]]
It was found that several supported tasks / input sources did not have implementations for the methods used by the input source security feature, causing these tasks and input sources to fail when used with this feature. This pr adds the needed missing implementations. Also securing the sampling endpoint with input source security, when enabled.
### Description
This change allows for consideration of the input format and compression when computing how to split the input files among available tasks, in MSQ ingestion, when considering the value of the `maxInputBytesPerWorker` query context parameter. This query parameter allows users to control the maximum number of bytes, with granularity of input file / object, that ingestion tasks will be assigned to ingest. With this change, this context parameter now denotes the estimated weighted size in bytes of the input to split on, with consideration for input format and compression format, rather than the actual file size, reported by the file system. We assume uncompressed newline delimited json as a baseline, with scaling factor of `1`. This means that when computing the byte weight that a file has towards the input splitting, we take the file size as is, if uncompressed json, 1:1. It was found during testing that gzip compressed json, and parquet, has scale factors of `4` and `8` respectively, meaning that each byte of data is weighted 4x and 8x respectively, when computing input splits. This weighted byte scaling is only considered for MSQ ingestion that uses either LocalInputSource or CloudObjectInputSource at the moment. The default value of the `maxInputBytesPerWorker` query context parameter has been updated from 10 GiB, to 512 MiB
* Be able to load segments on Peons
This change introduces a new config on WorkerConfig
that indicates how many bytes of each storage
location to use for storage of a task. Said config
is divided up amongst the locations and slots
and then used to set TaskConfig.tmpStorageBytesPerTask
The Peons use their local task dir and
tmpStorageBytesPerTask as their StorageLocations for
the SegmentManager such that they can accept broadcast
segments.
Changes:
- Replace `OverlordHelper` with `OverlordDuty` to align with `CoordinatorDuty`
- Each duty has a `run()` method and defines a `Schedule` with an initial delay and period.
- Update existing duties `TaskLogAutoCleaner` and `DurableStorageCleaner`
- Add utility class `Configs`
- Update log, error messages and javadocs
- Other minor style improvements