* Remove unused constants
* Refactor getBlockBlobLength
* Better link
* Upper-case log
* Mark defaultStorageAccount nullable
This is the case if you do not use Azure for deep-storage but ingest from Azure blobs.
* Do not always create a new container if it doesn't exist
Specifically, only create a container if uploading a blob or writing a blob stream
* Add lots of comments, group methods
* Revert "Mark defaultStorageAccount nullable"
* Add mockito for junit
* Add extra test
* Add comment
Thanks George.
* Pass blockSize as Long
* Test more branches...
* Optimise S3 storage writing for MSQ durable storage
* Get rid of static ConcurrentHashMap
* Fix static checks
* Fix tests
* Remove unused constructor parameter chunkValidation + relevant cleanup
* Assert etags as String instead of Integer
* Fix flaky test
* Inject executor service
* Make threadpool size dynamic based on number of cores
* Fix S3StorageDruidModuleTest
* Fix S3StorageConnectorProviderTest
* Fix injection issues
* Add S3UploadConfig to manage maximum number of concurrent chunks dynamically based on chunk size
* Address the minor review comments
* Refactor S3UploadConfig + ExecutorService into S3UploadManager
* Address review comments
* Make updateChunkSizeIfGreater() synchronized instead of recomputeMaxConcurrentNumChunks()
* Address the minor review comments
* Fix intellij-inspections check
* Refactor code to use futures for maxNumConcurrentChunks. Also use executor service with blocking queue for backpressure semantics.
* Update javadoc
* Get rid of cyclic dependency injection between S3UploadManager and S3OutputConfig
* Fix RetryableS3OutputStreamTest
* Remove unnecessary synchronization parts from RetryableS3OutputStream
* Update javadoc
* Add S3UploadManagerTest
* Revert back to S3StorageConnectorProvider extends S3OutputConfig
* Address Karan's review comments
* Address Kashif's review comments
* Change a log message to debug
* Address review comments
* Fix intellij-inspections check
* Fix checkstyle
---------
Co-authored-by: asdf2014 <asdf2014@apache.org>
* Fallback vectorization for FunctionExpr and BaseMacroFunctionExpr.
This patch adds FallbackVectorProcessor, a processor that adapts non-vectorizable
operations into vectorizable ones. It is used in FunctionExpr and BaseMacroFunctionExpr.
In addition:
- Identifiers are updated to offer getObjectVector for ARRAY and COMPLEX in addition
to STRING. ExprEvalObjectVector is updated to offer ARRAY and COMPLEX as well.
- In SQL tests, cannotVectorize now fails tests if an exception is not thrown. This makes
it easier to identify tests that can now vectorize.
- Fix a null-matcher bug in StringObjectVectorValueMatcher.
* Fix tests.
* Fixes.
* Fix tests.
* Fix test.
* Fix test.
* Fix serde for ArrayOfDoublesSketchConstantPostAggregator.
The version originally added in #13819 was missing an annotation for
the "value" property. Fixes#16539.
Line endings for ArrayOfDoublesSketchConstantPostAggregator.java are changed
from \r\n to \n.
Adds a serde test, and improves various other datasketches post-aggregator
serde tests to deserialize into PostAggregator. This verifies that the type
information is set up correctly.
* Fix excessive imports.
* Fix equals, hashCode.
* Fix task bootstrap locations.
* Remove dependency of SegmentCacheManager from SegmentLoadDropHandler.
- The load drop handler code talks to the local cache manager via
SegmentManager.
* Clean up unused imports and stuff.
* Test fixes.
* Intellij inspections and test bind.
* Clean up dependencies some more
* Extract test load spec and factory to its own class.
* Cleanup test util
* Pull SegmentForTesting out to TestSegmentUtils.
* Fix up.
* Minor changes to infoDir
* Replace server announcer mock and verify that.
* Add tests.
* Update javadocs.
* Address review comments.
* Separate methods for download and bootstrap load
* Clean up return types and exception handling.
* No callback for loadSegment().
* Minor cleanup
* Pull out the test helpers into its own static class so it can have better state control.
* LocalCacheManager stuff
* Fix build.
* Fix build.
* Address some CI warnings.
* Minor updates to javadocs and test code.
* Address some CodeQL test warnings and checkstyle fix.
* Pass a Consumer<DataSegment> instead of boolean & rename variables.
* Small updates
* Remove one test constructor.
* Remove the other constructor that wasn't initializing fully and update usages.
* Cleanup withInfoDir() builder and unnecessary test hooks.
* Remove mocks and elaborate on comments.
* Commentary
* Fix a few Intellij inspection warnings.
* Suppress corePoolSize intellij-inspect warning.
The intellij-inspect tool doesn't seem to correctly inspect
lambda usages. See ScheduledExecutors.
* Update docs and add more tests.
* Use hamcrest for asserting order on expectation.
* Shutdown bootstrap exec.
* Fix checkstyle
Changes:
- Loosely followed the steps in the migration guide at
https://junit.org/junit5/docs/current/user-guide/#migrating-from-junit4
- Updated POM to add JUnit 5 dependencies
- Updated imports to JUnit 5 packages
- Updated annotations (Lifecycle annotations like `@BeforeEach`)
- Updated exception testing (`assertThrows`)
- Updated temporary path handling (use `@TempDir` annotation)
- Various other updates (replace other `Rule` usages, make sure to use JUnit 5 assertions)
This commit aims to enable the re-ordering of window operators in order to optimise
the sort and partition operators.
Example :
```
SELECT m1, m2,
SUM(m1) OVER(PARTITION BY m2) as sum1,
SUM(m2) OVER() as sum2
from numFoo
GROUP BY m1,m2
```
In order to compute this query, we can order the operators as to first compute the operators
corresponding to sum2 and then place the operators corresponding to sum1 which would
help us in reducing one sort operator if we order our operators by sum1 and then sum2.
There are a few issues with using Jackson serialization in sending datasketches between controller and worker in MSQ. This caused a blowup due to holding multiple copies of the sketch being stored.
This PR aims to resolve this by switching to deserializing the sketch payload without Jackson.
The PR adds a new query parameter used during communication between controller and worker while fetching sketches, "sketchEncoding".
If the value of this parameter is OCTET, the sketch is returned as a binary encoding, done by ClusterByStatisticsSnapshotSerde.
If the value is not the above, the sketch is encoded by Jackson as before.
This PR updates CompactionTask to not load any lookups by default, unless transformSpec is present.
If transformSpec is present, we will make the decision based on context values, loading all lookups by default. This is done to ensure backward compatibility since transformSpec can reference lookups.
If transform spec is not present and no context value is passed, we donot load any lookup.
This behavior can be overridden by supplying lookupLoadingMode and lookupsToLoad in the task context.
MSQ sorts the columns in a highly specialized manner by byte comparisons. As such the values are serialized differently. This works well for the primitive types and primitive arrays, however complex types cannot be serialized specially.
This PR adds the support for sorting the complex columns by deserializing the value from the field and comparing it via the type strategy. This is a lot slower than the byte comparisons, however, it's the only way to support sorting on complex columns that can have arbitrary serialization not optimized for MSQ.
The primitives and the arrays are still compared via the byte comparison, therefore this doesn't affect the performance of the queries supported before the patch. If there's a sorting key with mixed complex and primitive/primitive array types, for example: longCol1 ASC, longCol2 ASC, complexCol1 DESC, complexCol2 DESC, stringCol1 DESC, longCol3 DESC, longCol4 ASC, the comparison will happen like:
longCol1, longCol2 (ASC) - Compared together via byte-comparison, since both are byte comparable and need to be sorted in ascending order
complexCol1 (DESC) - Compared via deserialization, cannot be clubbed with any other field
complexCol2 (DESC) - Compared via deserialization, cannot be clubbed with any other field, even though the prior field was a complex column with the same order
stringCol1, longCol3 (DESC) - Compared together via byte-comparison, since both are byte comparable and need to be sorted in descending order
longCol4 (ASC) - Compared via byte-comparison, couldn't be coalesced with the previous fields as the direction was different
This way, we only deserialize the field wherever required
Changes:
- Remove deprecated `markAsUnused` parameter from `KillUnusedSegmentsTask`
- Allow `kill` task to use `REPLACE` lock when `useConcurrentLocks` is true
- Use `EXCLUSIVE` lock by default
* enable quidem uri support for `druidtest:///?ComponentSupplier=Nested` and similar
* changes the way `SqlTestFrameworkConfig` is being applied; all options will have their own annotation (its kinda impossible to detect that an annotation has a set value or its the default)
* enables hierarchical processing of config annotation (was needed to enable class level supplier annotation)
* moves uri processing related string2config stuff into `SqlTestFrameworkConfig`
With this PR changes, MSQ tasks (MSQControllerTask and MSQWorkerTask) only load the required lookups during querying and ingestion, based on the value of CTX_LOOKUPS_TO_LOAD key in the query context.
Add validation for reindex with realtime sources.
With the addition of concurrent compaction, it is possible to ingest data while querying from realtime sources with MSQ into the same datasource. This could potentially lead to issues if the interval that is ingested into is replaced by an MSQ job, which has queried only some of the data from the realtime task.
This PR adds validation to check that the datasource being ingested into is not being queried from, if the query includes realtime sources.
* * add another catalog clustering columns unit test
* * dissallow clusterKeys with descending order
* * make more clear that clustering is re-written into ingest node
whether a catalog table or not
* * when partitionedBy is stored in catalog, user shouldnt need to specify
it in order to specify clustering
* * fix intellij inspection failure
update dependencies to address new batch of CVEs:
- Azure POM from 1.2.19 to 1.2.23 to update transitive dependency nimbus-jose-jwt to address: CVE-2023-52428
- commons-configuration2 from 2.8.0 to 2.10.1 to address: CVE-2024-29131 CVE-2024-29133
- bcpkix-jdk18on from 1.76 to 1.78.1 to address: CVE-2024-30172 CVE-2024-30171 CVE-2024-29857
* MSQ controller: Support in-memory shuffles; towards JVM reuse.
This patch contains two controller changes that make progress towards a
lower-latency MSQ.
First, support for in-memory shuffles. The main feature of in-memory shuffles,
as far as the controller is concerned, is that they are not fully buffered. That
means that whenever a producer stage uses in-memory output, its consumer must run
concurrently. The controller determines which stages run concurrently, and when
they start and stop.
"Leapfrogging" allows any chain of sort-based stages to use in-memory shuffles
even if we can only run two stages at once. For example, in a linear chain of
stages 0 -> 1 -> 2 where all do sort-based shuffles, we can use in-memory shuffling
for each one while only running two at once. (When stage 1 is done reading input
and about to start writing its output, we can stop 0 and start 2.)
1) New OutputChannelMode enum attached to WorkOrders that tells workers
whether stage output should be in memory (MEMORY), or use local or durable
storage.
2) New logic in the ControllerQueryKernel to determine which stages can use
in-memory shuffling (ControllerUtils#computeStageGroups) and to launch them
at the appropriate time (ControllerQueryKernel#createNewKernels).
3) New "doneReadingInput" method on Controller (passed down to the stage kernels)
which allows stages to transition to POST_READING even if they are not
gathering statistics. This is important because it enables "leapfrogging"
for HASH_LOCAL_SORT shuffles, and for GLOBAL_SORT shuffles with 1 partition.
4) Moved result-reading from ControllerContext#writeReports to new QueryListener
interface, which ControllerImpl feeds results to row-by-row while the query
is still running. Important so we can read query results from the final
stage using an in-memory channel.
5) New class ControllerQueryKernelConfig holds configs that control kernel
behavior (such as whether to pipeline, maximum number of concurrent stages,
etc). Generated by the ControllerContext.
Second, a refactor towards running workers in persistent JVMs that are able to
cache data across queries. This is helpful because I believe we'll want to reuse
JVMs and cached data for latency reasons.
1) Move creation of WorkerManager and TableInputSpecSlicer to the
ControllerContext, rather than ControllerImpl. This allows managing workers and
work assignment differently when JVMs are reusable.
2) Lift the Controller Jersey resource out from ControllerChatHandler to a
reusable resource.
3) Move memory introspection to a MemoryIntrospector interface, and introduce
ControllerMemoryParameters that uses it. This makes it easier to run MSQ in
process types other than Indexer and Peon.
Both of these areas will have follow-ups that make similar changes on the
worker side.
* Address static checks.
* Address static checks.
* Fixes.
* Report writer tests.
* Adjustments.
* Fix reports.
* Review updates.
* Adjust name.
* Small changes.
Changes:
- Add new config `lagAggregate` to `LagBasedAutoScalerConfig`
- Add field `aggregateForScaling` to `LagStats`
- Use the new field/config to determine which aggregate to use to compute lag
- Remove method `Supervisor.computeLagForAutoScaler()`
Changes:
1) Check for handoff of upgraded realtime segments.
2) Drop sink only when all associated realtime segments have been abandoned.
3) Delete pending segments upon commit to prevent unnecessary upgrades and
partition space exhaustion when a concurrent replace happens. This also prevents
potential data duplication.
4) Register pending segment upgrade only on those tasks to which the segment is associated.
I'm adding OIDC context to the AuthenticationResult returned by pac4j extension. I wanted to use this context as input in OpenPolicyAgent authorization. Since AuthenticationResult already accepts context as a parameter it felt okay to pass the profile attributes there.
Fixes a bug introduced in #16296, where the sketch might not be
initialized if get() is called without calling aggregate(). Also adds
a test for this case.
Issue: #14989
The initial step in optimizing segment metadata was to centralize the construction of datasource schema in the Coordinator (#14985). Thereafter, we addressed the problem of publishing schema for realtime segments (#15475). Subsequently, our goal is to eliminate the requirement for regularly executing queries to obtain segment schema information.
This is the final change which involves publishing segment schema for finalized segments from task and periodically polling them in the Coordinator.
Buffer aggregators can contain some cached objects within them, such as
Memory references or HLL Unions. Prior to this patch, various Grouper
implementations were not releasing this state when resetting their own
internal state, which could lead to excessive memory use.
This patch renames AggregatorAdapater#close to "reset", and updates
Grouper implementations to call this reset method whenever they reset
their internal state.
The base method on BufferAggregator and VectorAggregator remains named
"close", for compatibility with existing extensions, but the contract
is adjusted to say that the aggregator may be reused after the method
is called. All existing implementations in core already adhere to this
new contract, except for the ArrayOfDoubles build flavors, which are
updated in this patch to adhere.
Additionally, this patch harmonizes buffer sketch helpers to call their
clear method "clear" rather than a mix of "clear" and "close". (Others
were already using "clear".)
Tries to address the comments made on #16284 after merged.
Changes:
- Remove method `Supervisor.getLagMetric()`
- Add method `Supervisor.computeLagForAutoScaler()`
- Remove classes `LagMetric` and `LagMetricTest`
Changes:
- Add column `task_allocator_id` to `pendingSegments` metadata table.
- Add column `upgraded_from_segment_id` to `pendingSegments` metadata table.
- Add interface `PendingSegmentAllocatingTask` and implement it by all tasks which
can allocate pending segments.
- Use `taskAllocatorId` to identify the task (and its sub-tasks or replicas) to which
a pending segment has been allocated.
- Perform active cleanup of pending segments in `TaskLockbox` once there are no
active tasks for the corresponding task allocator id.
- When committing APPEND segments, also commit all upgraded pending segments
corresponding to that task allocator id.
- When committing REPLACE segments, upgrade all overlapping pending segments in
the same transaction.
Bug:
#15724 introduced a bug where a rolling upgrade would cause all task locations
returned by the Overlord on an older version to be unknown.
Fix:
If the new API fails, fall back to single task status API which always returns a valid task location.