Commit Graph

66 Commits

Author SHA1 Message Date
Adarsh Sanjeev 9a2d7c28bc
Prepare master branch for 31.0.0 release (#16333) 2024-04-26 09:22:43 +05:30
Kashif Faraz 81d7b6ebe1
Fix OverlordClient to read reports as a concrete `ReportMap` (#16226)
Follow up to #16217 

Changes:
- Update `OverlordClient.getReportAsMap()` to return `TaskReport.ReportMap`
- Move the following classes to `org.apache.druid.indexer.report` in the `druid-processing` module
  - `TaskReport`
  - `KillTaskReport`
  - `IngestionStatsAndErrorsTaskReport`
  - `TaskContextReport`
  - `TaskReportFileWriter`
  - `SingleFileTaskReportFileWriter`
  - `TaskReportSerdeTest`
- Remove `MsqOverlordResourceTestClient` as it had only one method
which is already present in `OverlordResourceTestClient` itself
2024-04-15 08:00:59 +05:30
YongGang da9feb4430
Introduce TaskContextReport for reporting task context (#16041)
Changes:
- Add `TaskContextEnricher` interface to improve task management and monitoring
- Invoke `enrichContext` in `TaskQueue.add()` whenever a new task is submitted to the Overlord
- Add `TaskContextReport` to write out task context information in reports
2024-04-12 08:57:49 +05:30
Gian Merlino 5e5cf9af99
Reduce upload buffer size in GoogleTaskLogs. (#16236)
* Reduce upload buffer size in GoogleTaskLogs.

Use a 1MB upload buffer, rather than the default of 15 MB in the API client. This is
mainly because MMs may upload logs in parallel, and typically have small heaps. The
default-sized 15 MB buffers add up quickly and can cause a MM to run out of memory.

* Make bufferSize a nullable Integer. Add tests.
2024-04-08 12:54:31 -07:00
Zoltan Haindrich 1df41db46d
Migrate to use docker compose v2 (#16232)
https://github.com/actions/runner-images/issues/9557
2024-04-03 12:32:55 +02:00
Kashif Faraz 4df4896674
Refactor: Add common method in AbstractBatchIndexTask to create ingestion stats report (#16202)
Changes
-  No functional changes
- Add method `AbstractBatchIndexTask.buildIngestionStatsReport()` used in several batch tasks
- Add utility method `AbstractBatchIndexTask.addBuildSegmentStatsToReport()`
- Use boolean argument to represent a full report instead of the String `full` 
in internal methods. (REST API remains unchanged.)
- Rename `IngestionStatsAndErrorsTaskReportData` to `IngestionStatsAndErrors`
- Clean up some of the methods
2024-03-28 23:07:00 +05:30
Adarsh Sanjeev 86a24012a6
Add security ITs for sending tasks to overlord (#16131)
* Add security ITs for sending tasks to overlord

* Add security ITs for sending tasks to overlord

* Resolve test flakiness
2024-03-18 09:33:40 +05:30
Vishesh Garg bed5d9c3b2
Remove exception on failure response from GCS delete API (#16047)
* Throw 404 Exception on failure response from GCS delete API

* Replace String.format

* Apply suggestions from code review

Co-authored-by: Abhishek Radhakrishnan <abhishek.rb19@gmail.com>

* Remove exception for file not found and fix tests

* Add warn log and fix intellij inspection errors

* More intellij inspection fixes

* * Change to debug log
* change runtime exception class for code coverage
* Add file paths for batch delete failures

* Move failedPaths computation to inside isDebugEnabled flag

* Correct handling of StorageException

* Address review comments

* Remove unused exceptions

* Address code coverage and review comments

* Minor corrections

---------

Co-authored-by: Abhishek Radhakrishnan <abhishek.rb19@gmail.com>
2024-03-07 17:57:17 +05:30
Adarsh Sanjeev 9eaaeb5c16
Add security ITs to the revised integration tests (#15885)
* Add IT for security

* Add admin client

* Clean up code

* Clean up code

* Address review comments
2024-02-20 11:32:08 +05:30
George Shiqi Wu d703b2c709
Add azure kill test (#15833)
* Add kill test

* Extra line

* Don't need toString

* Add back test

* Remove newline

* move kill verification into main test
2024-02-08 16:15:30 -05:00
Adarsh Sanjeev 514b3b4d01
Add export capabilities to MSQ with SQL syntax (#15689)
* Add test

* Parser changes to support export statements

* Fix builds

* Address comments

* Add frame processor

* Address review comments

* Fix builds

* Update syntax

* Webconsole workaround

* Refactor

* Refactor

* Change export file path

* Update docs

* Remove webconsole changes

* Fix spelling mistake

* Parser changes, add tests

* Parser changes, resolve build warnings

* Fix failing test

* Fix failing test

* Fix IT tests

* Add tests

* Cleanup

* Fix unparse

* Fix forbidden API

* Update docs

* Update docs

* Address review comments

* Address review comments

* Fix tests

* Address review comments

* Fix insert unparse

* Add external write resource action

* Fix tests

* Add resource check to overlord resource

* Fix tests

* Add IT

* Update syntax

* Update tests

* Update permission

* Address review comments

* Address review comments

* Address review comments

* Add tests

* Add check for runtime parameter for bucket and path

* Add check for runtime parameter for bucket and path

* Add tests

* Update docs

* Fix NPE

* Update docs, remove deadcode

* Fix formatting
2024-02-07 22:08:50 +05:30
George Shiqi Wu 50bae96e8b
Add azure integrationt ests (#15799) 2024-02-01 09:18:49 -05:00
Abhishek Agarwal 0ab2781a7f
Disable eager initialization for non-query connection requests (#15751) 2024-01-25 14:38:50 +05:30
Karan Kumar c4990f56d6
Prepare main branch for next 30.0.0 release. (#15707) 2024-01-23 15:55:54 +05:30
Vishesh Garg e43bb74c3a
Add MSQ Durable Storage Connector for Google Cloud Storage and change current Google Cloud Storage client library (#15398)
The PR addresses 2 things:

    Add MSQ durable storage connector for GCS
    Change GCS client library from the old Google API Client Library to the recommended Google Cloud Client Library. Ref: https://cloud.google.com/apis/docs/client-libraries-explained
2023-12-14 07:34:49 +05:30
Ankit Kothari 8735d023a1
Add experimental support for first/last for double/float/long #10702 (#14462)
Add experimental support for doubleLast, doubleFirst, FloatLast, FloatFirst, longLast and longFirst.
2023-12-12 11:36:51 +05:30
Laksh Singla 5f86072456
Prepare master for Druid 29 (#15121)
Prepare master for Druid 29
2023-10-11 10:33:45 +05:30
Tejaswini Bandlamudi dec6a0aa14
Update google client apis to latest version (#14414)
Currently Druid is using google apis client 1.26.0 version and google-oauth-client-1.26.0.jar in particular is bringing following CVEs CVE-2020-7692, CVE-2021-22573. Despite the CVEs being false positives, they're causing red security scans on Druid distribution. Hence updating the version to latest version with these CVE fixes.
2023-09-11 12:27:23 +05:30
Clint Wylie 5d1412949e
enable sql compatible null handling mode by default (#14792)
* enable sql compatible null handling mode by default
* fix bug with string first/last aggs when druid.generic.useDefaultValueForNull=false
2023-08-21 20:07:13 -07:00
Clint Wylie 6b14dde50e
deprecate config-magic in favor of json configuration stuff (#14695)
* json config based processing and broker merge configs to deprecate config-magic
2023-08-16 18:23:57 -07:00
dependabot[bot] e55fe67535
Bump apache.curator.version from 5.4.0 to 5.5.0 (#14843)
* Bump apache.curator.version from 5.4.0 to 5.5.0

Bumps `apache.curator.version` from 5.4.0 to 5.5.0.

Updates `org.apache.curator:curator-client` from 5.4.0 to 5.5.0
- [Commits](https://github.com/apache/curator/compare/apache-curator-5.4.0...apache-curator-5.5.0)

Updates `org.apache.curator:curator-framework` from 5.4.0 to 5.5.0
- [Commits](https://github.com/apache/curator/compare/apache-curator-5.4.0...apache-curator-5.5.0)

Updates `org.apache.curator:curator-recipes` from 5.4.0 to 5.5.0
- [Commits](https://github.com/apache/curator/compare/apache-curator-5.4.0...apache-curator-5.5.0)

Updates `org.apache.curator:curator-x-discovery` from 5.4.0 to 5.5.0
- [Commits](https://github.com/apache/curator/compare/apache-curator-5.4.0...apache-curator-5.5.0)

Updates `org.apache.curator:curator-test` from 5.4.0 to 5.5.0
- [Commits](https://github.com/apache/curator/compare/apache-curator-5.4.0...apache-curator-5.5.0)

---
updated-dependencies:
- dependency-name: org.apache.curator:curator-client
  dependency-type: direct:production
  update-type: version-update:semver-minor
- dependency-name: org.apache.curator:curator-framework
  dependency-type: direct:production
  update-type: version-update:semver-minor
- dependency-name: org.apache.curator:curator-recipes
  dependency-type: direct:production
  update-type: version-update:semver-minor
- dependency-name: org.apache.curator:curator-x-discovery
  dependency-type: direct:production
  update-type: version-update:semver-minor
- dependency-name: org.apache.curator:curator-test
  dependency-type: direct:production
  update-type: version-update:semver-minor
...

Signed-off-by: dependabot[bot] <support@github.com>

* update licenses.yaml

---------

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
Co-authored-by: Xavier Léauté <xvrl@apache.org>
2023-08-16 07:36:58 -07:00
AmatyaAvadhanula 0412f40d36
Prepare master branch for next release, 28.0.0 (#14595)
* Prepare master branch for next release, 28.0.0
2023-07-18 09:22:30 +05:30
Karan Kumar 89aee6caaa
Fixing an issue in sequential merge (#14574)
* Fixing an issue in sequential merge where workers without any partial key statistics would get stuck because controller did not change the worker state.

* Removing empty check

* Adding IT for MSQ sequential bug fix.
2023-07-12 22:05:30 +05:30
Gian Merlino 3ff51487b7
Add ZooKeeper connection state alerts and metrics. (#14333)
* Add ZooKeeper connection state alerts and metrics.

- New metric "zk/connected" is an indicator showing 1 when connected,
  0 when disconnected.
- New metric "zk/disconnected/time" measures time spent disconnected.
- New alert when Curator connection state enters LOST or SUSPENDED.

* Use right GuardedBy.

* Test fixes, coverage.

* Adjustment.

* Fix tests.

* Fix ITs.

* Improved injection.

* Adjust metric name, add tests.
2023-07-12 09:34:28 -07:00
Jan Werner 95115d722a
CVE fixes - update of multiple dependencies. (#14519)
Apache Druid brings multiple direct and transitive dependencies that are affected by plethora of CVEs.
This PR attempts to update all the dependencies that did not require code refactoring.
This PR modifies pom files, license file and OWASP Dependency Check suppression file.
2023-07-07 20:27:30 +05:30
Abhishek Agarwal f8f2fe8b7b
Skip tests based on files changed in the PR (#14445)
Our CI system has a lot of tests. And much of this testing is really unnecessary for most of the PRs. This PR adds some checks so we can skip these expensive tests when we know they are not necessary.
2023-06-22 12:27:23 +05:30
Kashif Faraz 50461c3bd5
Enable smartSegmentLoading on the Coordinator (#13197)
This commit does a complete revamp of the coordinator to address problem areas:
- Stability: Fix several bugs, add capabilities to prioritize and cancel load queue items
- Visibility: Add new metrics, improve logs, revamp `CoordinatorRunStats`
- Configuration: Add dynamic config `smartSegmentLoading` to automatically set
optimal values for all segment loading configs such as `maxSegmentsToMove`,
`replicationThrottleLimit` and `maxSegmentsInNodeLoadingQueue`.

Changed classes:
- Add `StrategicSegmentAssigner` to make assignment decisions for load, replicate and move
- Add `SegmentAction` to distinguish between load, replicate, drop and move operations
- Add `SegmentReplicationStatus` to capture current state of replication of all used segments
- Add `SegmentLoadingConfig` to contain recomputed dynamic config values
- Simplify classes `LoadRule`, `BroadcastRule`
- Simplify the `BalancerStrategy` and `CostBalancerStrategy`
- Add several new methods to `ServerHolder` to track loaded and queued segments
- Refactor `DruidCoordinator`

Impact:
- Enable `smartSegmentLoading` by default. With this enabled, none of the following
dynamic configs need to be set: `maxSegmentsToMove`, `replicationThrottleLimit`,
`maxSegmentsInNodeLoadingQueue`, `useRoundRobinSegmentAssignment`,
`emitBalancingStats` and `replicantLifetime`.
- Coordinator reports richer metrics and produces cleaner and more informative logs
- Coordinator uses an unlimited load queue for all serves, and makes better assignment decisions
2023-06-19 14:27:35 +05:30
Tejaswini Bandlamudi 8e4f003f02
Fix flaky Revised ITs failures on GHA runners (#14348)
* Fix read timed out failures and remove containers before test

* remove containers before loading images

* add labels to IT docker containers, download stable minio docker image release instead of latest
2023-06-05 18:58:54 +05:30
Paul Rogers 3c0983c8e9
Extend the IT framework to allow tests in extensions (#13877)
The "new" IT framework provides a convenient way to package and run integration tests (ITs), but only for core modules. We have a use case to run an IT for a contrib extension: the proposed gRPC query extension. This PR provides the IT framework functionality to allow non-core ITs.
2023-05-15 20:29:51 +05:30
Gian Merlino eeed5ed7e2
MSQ: Use the same result coercion routines as the regular SQL endpoint. (#14046)
* MSQ: Use the same result coercion routines as the regular SQL endpoint.

The main changes are to move NativeQueryMaker.coerce to SqlResults, and
to formally make the list of sqlTypeNames from the MSQ results reports
use SqlTypeNames.

- Change the default to MSQ-compatible rather than MSQ-incompatible.
  The explicit marker function is now "notMsqCompatible()".
2023-04-15 06:56:23 +05:30
Clint Wylie 1aef72aa7e
Bump up the version in pom to 27.0.0 in preparation of release (#14051) 2023-04-10 14:56:59 +05:30
Paul Rogers 030ed911d4
Temporarily revert extended table functions for Druid 26 (#14019) 2023-04-05 21:09:33 -07:00
abhagraw eb31207402
Using MinIO to run S3DeepStorage ITs (#13997)
* Using MinIO to S3DeepStorage ITs

* Adding S3DeepStorageTest to github actions revised ITs
2023-03-30 12:15:53 -07:00
abhagraw c7d864d3bc
Update container creation in AzureTestUtil.java (#13911)
*
1. Handling deletion/creation of container created during the previously run test in AzureTestUtil.java.
2. Adding/updating log messages and comments in Azure and GCS deep storage tests.
2023-03-16 11:04:43 +05:30
Tejaswini Bandlamudi 7103cb4b9d
Removes FiniteFirehoseFactory and its implementations (#12852)
The FiniteFirehoseFactory and InputRowParser classes were deprecated in 0.17.0 (#8823) in favor of InputSource & InputFormat. This PR removes the FiniteFirehoseFactory and all its implementations along with classes solely used by them like Fetcher (Used by PrefetchableTextFilesFirehoseFactory). Refactors classes including tests using FiniteFirehoseFactory to use InputSource instead.
Removing InputRowParser may not be as trivial as many classes that aren't deprecated depends on it (with no alternatives), like EventReceiverFirehoseFactory. Hence FirehoseFactory, EventReceiverFirehoseFactory, and Firehose are marked deprecated.
2023-03-02 18:07:17 +05:30
Clint Wylie 1d8fff4096
sampler + type detection = bff (#13711)
* sampler + type detection = bff
* split logical and physical dimensions, tidy up
2023-02-28 04:14:30 -08:00
Tejaswini Bandlamudi e2461c21c4
fix flaky BatchIndex IT failures. (#13855) 2023-02-27 17:23:14 -08:00
Adarsh Sanjeev aceeac91d4
Fix MSQ IT test (#13808) 2023-02-22 08:14:46 -08:00
Paul Rogers 5dadbdf4d0
Generate the IT docker-compose.yaml files (#13669)
Generate IT docker-compose.sh files

Generates test-specific docker-compose.sh files using a simple
Python template script.
2023-02-21 15:03:02 -08:00
Clint Wylie 08b5951cc5
merge druid-core, extendedset, and druid-hll into druid-processing to simplify everything (#13698)
* merge druid-core, extendedset, and druid-hll into druid-processing to simplify everything
* fix poms and license stuff
* mockito is evil
* allow reset of JvmUtils RuntimeInfo if tests used static injection to override
2023-02-17 14:27:41 -08:00
Paul Rogers 333196d207
Code cleanup & message improvements (#13778)
* Misc cleanup edits

Correct spacing
Add type parameters
Add toString() methods to formats so tests compare correctly
IT doc revisions
Error message edits
Display UT query results when tests fail

* Edit

* Build fix

* Build fixes
2023-02-15 15:22:54 +05:30
Tejaswini Bandlamudi 9ffaba9c7f
Fix MySQL drivers setup for Revised ITs (#13800)
* download both mysql drivers and use org.mariadb.jdbc.Driver for now

* use com.mysql.jdbc.Driver
2023-02-15 11:03:25 +05:30
Paul Rogers 842ee554de
Refinements to input-source specific table functions (#13780)
Refinements to table functions

Fixes various bugs
Improves the structure of the table function classes
Adds unit and integration tests
2023-02-13 16:21:27 -08:00
Paul Rogers f28c06515b
Auto-detect docker-compose (#13754) 2023-02-06 21:29:45 +05:30
Clint Wylie fb26a1093d
discover nested columns when using nested column indexer for schemaless ingestion (#13672)
* discover nested columns when using nested column indexer for schemaless
* move useNestedColumnIndexerForSchemaDiscovery from AppendableIndexSpec to DimensionsSpec
2023-01-18 12:57:28 -08:00
Paul Rogers fa493f1ebc
Convert from DRUID_INTEGRATION_TEST_INDEXER to USE_INDEXER (#13684)
The old ITs use DRUID_INTEGRATION_TEST_INDEXER. The new ones use the
USE_INDEXER env var passed in from the build environment.
2023-01-18 08:51:42 -08:00
Paul Rogers 22630b0aab
Much improved table functions (#13627)
Much improved table functions

* Revises properties, definitions in the catalog
* Adds a "table function" abstraction to model such functions
* Specific functions for HTTP, inline, local and S3.
* Extended SQL types in the catalog
* Restructure external table definitions to use table functions
* EXTEND syntax for Druid's extern table function
* Support for array-valued table function parameters
* Support for array-valued SQL query parameters
* Much new documentation
2023-01-17 08:41:57 -08:00
Paul Rogers ed623d626f
Support both Indexer and MiddleManager in ITs (#13660)
Support both indexer and MM in ITs

Support for the DRUID_INTEGRATION_TEST_INDEXER variable
Conditional client cluster configuration
Cleanup of OVERRIDE_ENV file handling
Enforce setting of test-specific env vars
Cleanup of unused bits
2023-01-14 14:34:06 -08:00
abhagraw 5ef689fc3f
Cloud deep storage tests in new IT framework (S3, GCS, Azure) (#13535)
* MSQ s3 deep storage tests

* Fix license check

* Getting config values from env variables

* Added  s3TestUtils

* Merged AbstractITSQLBasedIngestionTest with AbstractITBatchIndexTest

* Fixing license issues

* Fixing checkstyle errors

* Fix spotbug errors

* Update s3util name in other files

* GCS and Azure deep storage tests

* Fix license and checkstyle errors

* Fix dependency error

* fix intellij check errors

* Copy credentials file in all containers

* Refactor and gcs file upload fix

* Fixing dependency check errors and codeQL warnings

* Fixing checkstyle errors

* Fixing intellij inspection errors

* Removing unrequired exceptions

* Addressing comments
2023-01-11 09:43:44 +05:30
Karan Kumar 56076d33fb
Worker retry for MSQ task (#13353)
* Initial commit.

* Fixing error message in retry exceeded exception

* Cleaning up some code

* Adding some test cases.

* Adding java docs.

* Finishing up state test cases.

* Adding some more java docs and fixing spot bugs, intellij inspections

* Fixing intellij inspections and added tests

* Documenting error codes

* Migrate current integration batch tests to equivalent MSQ tests (#13374)

* Migrate current integration batch tests to equivalent MSQ tests using new IT framework

* Fix build issues

* Trigger Build

* Adding more tests and addressing comments

* fixBuildIssues

* fix dependency issues

* Parameterized the test and addressed comments

* Addressing comments

* fixing checkstyle errors

* Adressing comments

* Adding ITTest which kills the worker abruptly

* Review comments phase one

* Adding doc changes

* Adjusting for single threaded execution.

* Adding Sequential Merge PR state handling

* Merge things

* Fixing checkstyle.

* Adding new context param for fault tolerance.
Adding stale task handling in sketchFetcher.
Adding UT's.

* Merge things

* Merge things

* Adding parameterized tests
Created separate module for faultToleranceTests

* Adding missed files

* Review comments and fixing tests.

* Documentation things.

* Fixing IT

* Controller impl fix.

* Fixing racy WorkerSketchFetcherTest.java exception handling.

Co-authored-by: abhagraw <99210446+abhagraw@users.noreply.github.com>
Co-authored-by: Karan Kumar <cryptoe@karans-mbp.lan>
2023-01-11 07:38:29 +05:30