Changes
- Add a `DruidException` which contains a user-facing error message, HTTP response code
- Make `EntryExistsException` extend `DruidException`
- If metadata store max_allowed_packet limit is violated while inserting a new task, throw
`DruidException` with response code 400 (bad request) to prevent retries
- Add `SQLMetadataConnector.isRootCausePacketTooBigException` with impl for MySQL
The class apparently only exists to add a toString()
method to Indexes, which basically just crashes any debugger
on any meaningfully sized index. It's a pointless
abstract class that basically only causes pain.
Changes:
- Add a timeout of 1 minute to resultFuture.get() in `CostBalancerStrategy.chooseBestServer`.
1 minute is the typical time for a full coordinator run and is more than enough time for cost
computations of a single segment.
- Raise an alert if an exception is encountered while computing costs and if the executor has
not been shutdown. This is because a shutdown is intentional and does not require an alert.
* Fix EarliestLatestBySqlAggregator signature; Include function name for all signatures.
* Single quote function signatures, space between args and remove \n.
* fixup UT assertion
The same aggregator can have two output names for a SQL like:
INSERT INTO foo
SELECT x, COUNT(*) AS y, COUNT(*) AS z
FROM t
GROUP BY 1
PARTITIONED BY ALL
In this case, the SQL planner will create a query with a single "count"
aggregator mapped to output names "y" and "z". The prior MSQ code did
not properly handle this case, instead throwing an error like:
Expected single output for query column[a0] but got [[1, 2]]
* Fix read timed out failures and remove containers before test
* remove containers before loading images
* add labels to IT docker containers, download stable minio docker image release instead of latest
In this PR, we are enhancing KafkaEmitter, to emit metadata about published segments (SegmentMetadataEvent) into a Kafka topic. This segment metadata information that gets published into Kafka, can be used by any other downstream services to query Druid intelligently based on the segments published. The segment metadata gets published into kafka topic in json string format similar to other events.
The sampler API returns a `400 bad request` response if it encounters a `SamplerException`.
Otherwise, it returns a generic `500 Internal server error` response, with the message
"The RuntimeException could not be mapped to a response, re-throwing to the HTTP container".
This commit updates `RecordSupplierInputSource` to handle all types of exceptions instead of just
`InterruptedException`and wrap them in a `SamplerException` so that the actual error is
propagated back to the user.
It was found that several supported tasks / input sources did not have implementations for the methods used by the input source security feature, causing these tasks and input sources to fail when used with this feature. This pr adds the needed missing implementations. Also securing the sampling endpoint with input source security, when enabled.
### Description
This change allows for consideration of the input format and compression when computing how to split the input files among available tasks, in MSQ ingestion, when considering the value of the `maxInputBytesPerWorker` query context parameter. This query parameter allows users to control the maximum number of bytes, with granularity of input file / object, that ingestion tasks will be assigned to ingest. With this change, this context parameter now denotes the estimated weighted size in bytes of the input to split on, with consideration for input format and compression format, rather than the actual file size, reported by the file system. We assume uncompressed newline delimited json as a baseline, with scaling factor of `1`. This means that when computing the byte weight that a file has towards the input splitting, we take the file size as is, if uncompressed json, 1:1. It was found during testing that gzip compressed json, and parquet, has scale factors of `4` and `8` respectively, meaning that each byte of data is weighted 4x and 8x respectively, when computing input splits. This weighted byte scaling is only considered for MSQ ingestion that uses either LocalInputSource or CloudObjectInputSource at the moment. The default value of the `maxInputBytesPerWorker` query context parameter has been updated from 10 GiB, to 512 MiB
This PR changes how we skip java UT and ITs with changes in the examples folder. After this change, any Markdown files within the examples folder and jupyter-notebooks directory will be excluded. The rationale behind these more specific exclusions is that some ITs use json files checked in examples, so we want to trigger the full workflow for all other changes.
Changes:
- `CostBalancerStrategyTest`
- Focus on verification of cost computations rather than choosing servers in this test
- Add new tests `testComputeCost` and `testJointSegmentsCost`
- Add tests to demonstrate that with a long enough interval gap, all costs become negligible
- Retain `testIntervalCost` and `testIntervalCostAdditivity`
- Remove redundant tests such as `testStrategyMultiThreaded`, `testStrategySingleThreaded`as
verification of this behaviour is better suited to `BalancingStrategiesTest`.
- `CostBalancerStrategyBenchmark`
- Remove usage of static method from `CostBalancerStrategyTest`
- Explicitly setup cluster and segments to use for benchmarking
The defaults of the following config values in the `CoordinatorDynamicConfig` are being updated.
1. `maxSegmentsInNodeLoadingQueue = 500` (previous = 100)
2. `replicationThrottleLimit = 500` (previous = 10)
Rationale: With round-robin segment assignment now being the default assignment technique,
the Coordinator can assign a large number of under-replicated/unavailable segments very quickly,
without getting stuck in `RunRules` duty due to very slow strategy-based cost computations.
3. `maxSegmentsToMove = 100` (previous = 5)
Rationale: A very low value (say 5) is ineffective in balancing especially if there are many segments
to balance. A very large value can cause excessive moves, which has these disadvantages:
- Load of moving segments competing with load of unavailable/under-replicated segments
- Unnecessary network costs due to constant download and delete of segments
These defaults will be revisited after #13197 is merged.
This PR adds a new interface to control how SegmentMetadataCache chooses ColumnType when faced with differences between segments for SQL schemas which are computed, exposed as druid.sql.planner.metadataColumnTypeMergePolicy and adds a new 'least restrictive type' mode to allow choosing the type that data across all segments can best be coerced into and sets this as the default behavior.
This is a behavior change around when segment driven schema migrations take effect for the SQL schema. With latestInterval, the SQL schema will be updated as soon as the first job with the new schema has published segments, while using leastRestrictive, the schema will only be updated once all segments are reindexed to the new type. The benefit of leastRestrictive is that it eliminates a bunch of type coercion errors that can happen in SQL when types are varied across segments with latestInterval because the newest type is not able to correctly represent older data, such as if the segments have a mix of ARRAY and number types, or any other combinations that lead to odd query plans.
* Expr getCacheKey now delegates to children
* Removed the LOOKUP_EXPR_CACHE_KEY as we do not need it
* Adding an unit test
* Update processing/src/main/java/org/apache/druid/math/expr/Expr.java
Co-authored-by: Clint Wylie <cjwylie@gmail.com>
---------
Co-authored-by: Clint Wylie <cjwylie@gmail.com>
* Make resources an ordered collection so it's deterministic.
* test cleanup
* fixup docs.
* Replace deprecated ObjectNode#put() calls with ObjectNode#set().