The PR addresses 2 things:
Add MSQ durable storage connector for GCS
Change GCS client library from the old Google API Client Library to the recommended Google Cloud Client Library. Ref: https://cloud.google.com/apis/docs/client-libraries-explained
Currently Druid is using google apis client 1.26.0 version and google-oauth-client-1.26.0.jar in particular is bringing following CVEs CVE-2020-7692, CVE-2021-22573. Despite the CVEs being false positives, they're causing red security scans on Druid distribution. Hence updating the version to latest version with these CVE fixes.
* Fixing an issue in sequential merge where workers without any partial key statistics would get stuck because controller did not change the worker state.
* Removing empty check
* Adding IT for MSQ sequential bug fix.
* Add ZooKeeper connection state alerts and metrics.
- New metric "zk/connected" is an indicator showing 1 when connected,
0 when disconnected.
- New metric "zk/disconnected/time" measures time spent disconnected.
- New alert when Curator connection state enters LOST or SUSPENDED.
* Use right GuardedBy.
* Test fixes, coverage.
* Adjustment.
* Fix tests.
* Fix ITs.
* Improved injection.
* Adjust metric name, add tests.
Apache Druid brings multiple direct and transitive dependencies that are affected by plethora of CVEs.
This PR attempts to update all the dependencies that did not require code refactoring.
This PR modifies pom files, license file and OWASP Dependency Check suppression file.
Our CI system has a lot of tests. And much of this testing is really unnecessary for most of the PRs. This PR adds some checks so we can skip these expensive tests when we know they are not necessary.
This commit does a complete revamp of the coordinator to address problem areas:
- Stability: Fix several bugs, add capabilities to prioritize and cancel load queue items
- Visibility: Add new metrics, improve logs, revamp `CoordinatorRunStats`
- Configuration: Add dynamic config `smartSegmentLoading` to automatically set
optimal values for all segment loading configs such as `maxSegmentsToMove`,
`replicationThrottleLimit` and `maxSegmentsInNodeLoadingQueue`.
Changed classes:
- Add `StrategicSegmentAssigner` to make assignment decisions for load, replicate and move
- Add `SegmentAction` to distinguish between load, replicate, drop and move operations
- Add `SegmentReplicationStatus` to capture current state of replication of all used segments
- Add `SegmentLoadingConfig` to contain recomputed dynamic config values
- Simplify classes `LoadRule`, `BroadcastRule`
- Simplify the `BalancerStrategy` and `CostBalancerStrategy`
- Add several new methods to `ServerHolder` to track loaded and queued segments
- Refactor `DruidCoordinator`
Impact:
- Enable `smartSegmentLoading` by default. With this enabled, none of the following
dynamic configs need to be set: `maxSegmentsToMove`, `replicationThrottleLimit`,
`maxSegmentsInNodeLoadingQueue`, `useRoundRobinSegmentAssignment`,
`emitBalancingStats` and `replicantLifetime`.
- Coordinator reports richer metrics and produces cleaner and more informative logs
- Coordinator uses an unlimited load queue for all serves, and makes better assignment decisions
* Fix read timed out failures and remove containers before test
* remove containers before loading images
* add labels to IT docker containers, download stable minio docker image release instead of latest
The "new" IT framework provides a convenient way to package and run integration tests (ITs), but only for core modules. We have a use case to run an IT for a contrib extension: the proposed gRPC query extension. This PR provides the IT framework functionality to allow non-core ITs.
* MSQ: Use the same result coercion routines as the regular SQL endpoint.
The main changes are to move NativeQueryMaker.coerce to SqlResults, and
to formally make the list of sqlTypeNames from the MSQ results reports
use SqlTypeNames.
- Change the default to MSQ-compatible rather than MSQ-incompatible.
The explicit marker function is now "notMsqCompatible()".
*
1. Handling deletion/creation of container created during the previously run test in AzureTestUtil.java.
2. Adding/updating log messages and comments in Azure and GCS deep storage tests.
The FiniteFirehoseFactory and InputRowParser classes were deprecated in 0.17.0 (#8823) in favor of InputSource & InputFormat. This PR removes the FiniteFirehoseFactory and all its implementations along with classes solely used by them like Fetcher (Used by PrefetchableTextFilesFirehoseFactory). Refactors classes including tests using FiniteFirehoseFactory to use InputSource instead.
Removing InputRowParser may not be as trivial as many classes that aren't deprecated depends on it (with no alternatives), like EventReceiverFirehoseFactory. Hence FirehoseFactory, EventReceiverFirehoseFactory, and Firehose are marked deprecated.
* merge druid-core, extendedset, and druid-hll into druid-processing to simplify everything
* fix poms and license stuff
* mockito is evil
* allow reset of JvmUtils RuntimeInfo if tests used static injection to override
* discover nested columns when using nested column indexer for schemaless
* move useNestedColumnIndexerForSchemaDiscovery from AppendableIndexSpec to DimensionsSpec
Much improved table functions
* Revises properties, definitions in the catalog
* Adds a "table function" abstraction to model such functions
* Specific functions for HTTP, inline, local and S3.
* Extended SQL types in the catalog
* Restructure external table definitions to use table functions
* EXTEND syntax for Druid's extern table function
* Support for array-valued table function parameters
* Support for array-valued SQL query parameters
* Much new documentation
Support both indexer and MM in ITs
Support for the DRUID_INTEGRATION_TEST_INDEXER variable
Conditional client cluster configuration
Cleanup of OVERRIDE_ENV file handling
Enforce setting of test-specific env vars
Cleanup of unused bits
* Validate response headers and fix exception logging
A class of QueryException were throwing away their
causes making it really hard to determine what's
going wrong when something goes wrong in the SQL
planner specifically. Fix that and adjust tests
to do more validation of response headers as well.
We allow 404s and 307s to be returned even without
authorization validated, but others get converted to 403
* New IT Framework - InputSource and InputFormat Tests
* Fixing checkstyle errors
* Updating InputSource setup
* Updating queries to use druid DB
* Making metadata setup queries to be idempotent
* Restore intellij files
Druid catalog basics
Catalog object model for tables, columns
Druid metadata DB storage (as an extension)
REST API to update the catalog (as an extension)
Integration tests
Model only: no planner integration yet
In clusters with a large number of segments, the duty `MarkAsUnusedOvershadowedSegments`
can take a long very long time to finish. This is because of the costly invocation of
`timeline.isOvershadowed` which is done for every used segment in every coordinator run.
Changes
- Use `DataSourceSnapshot.getOvershadowedSegments` to get all overshadowed segments
- Iterate over this set instead of all used segments to identify segments that can be marked as unused
- Mark segments as unused in the DB in batches rather than one at a time
- Refactor: Add class `SegmentTimeline` for ease of use and readability while using a
`VersionedIntervalTimeline` of segments.
* Add interpolation to JsonConfigurator
* Fix checkstyle
* Fix tests by removing common-text override
* Add back commons-text without version
* Remove unused hadoopDir configs
* Move some stuff to hopefully pass coverage
* Building druid-it-tools and running for travis in it.sh
* Addressing comments
* Updating druid-it-image pom to point to correct it-tools
* Updating all it-tools references to druid-it-tools
* Adding dist back to it.sh travis
* Trigger Build
* Disabling batchIndex tests and commenting out user specific code
* Fixing checkstyle and intellij inspection errors
* Replacing tabs with spaces in it.sh
* Enabling old batch index tests with indexer