Commit Graph

86 Commits

Author SHA1 Message Date
George Shiqi Wu 76e70654ac
Fix issues when startup timeout is hit (#14425) 2023-06-14 11:49:55 -07:00
George Shiqi Wu cb65135b99
Fix log streaming (#14285)
* Fix log streaming

* Add watch log

* Add unit tests

* long running client

* singleton client

* Remove accidental close
2023-05-22 11:19:53 -07:00
George Shiqi Wu 51f722b7f1
Fix labels (#14282)
* Fix labels

* move to a util function

* style

* PR comments

* rename class
2023-05-18 11:51:58 -07:00
Nicholas Lippis 58dcbf9399
queue tasks in kubernetes task runner if capacity is fully utilized (#14156)
* queue tasks if all slots in use

* Declare hamcrest-core dependency

* Use AtomicBoolean for shutdown requested

* Use AtomicReference for peon lifecycle state

* fix uninitialized read error

* fix indentations

* Make tasks protected

* fix KubernetesTaskRunnerConfig deserialization

* ensure k8s task runner max capacity is Integer.MAX_VALUE

* set job duration as task status duration

* Address pr comments

---------

Co-authored-by: George Shiqi Wu <george.wu@imply.io>
2023-05-12 09:41:44 -06:00
George Shiqi Wu 161d12eb44
Fix unit tests for java 17 (#14207)
Fix a unit test that fails in java 17
2023-05-09 20:02:31 +05:30
George Shiqi Wu eed5f4f291
Add labels to k8s jobs for the PodTemplateTaskAdapter (#14205)
* Add labels

* Add prefix

* remove newline

* fix syntax

* Update prefix
2023-05-08 10:56:52 +08:00
Churro 123c4908c8
Ephemeral storage is respected from the overlod for peon tasks (#14201) 2023-05-05 16:27:29 -07:00
Clint Wylie 90ea192d9c
fix bugs with auto encoded long vector deserializers (#14186)
This PR fixes an issue when using 'auto' encoded LONG typed columns and the 'vectorized' query engine. These columns use a delta based bit-packing mechanism, and errors in the vectorized reader would cause it to incorrectly read column values for some bit sizes (1 through 32 bits). This is a regression caused by #11004, which added the optimized readers to improve performance, so impacts Druid versions 0.22.0+.

While writing the test I finally got sad enough about IndexSpec not having a "builder", so I made one, and switched all the things to use it. Apologies for the noise in this bug fix PR, the only real changes are in VSizeLongSerde, and the tests that have been modified to cover the buggy behavior, VSizeLongSerdeTest and ExpressionVectorSelectorsTest. Everything else is just cleanup of IndexSpec usage.
2023-05-01 11:49:27 +05:30
Nicholas Lippis 6579c1c5b6
remove unneeded TaskLogStreamer binding override (#14176) 2023-04-27 19:39:24 +05:30
Nicholas Lippis 9d4cc501f7
return task status reported by peon (#14040)
* return task status reported by peon

* Write TaskStatus to file in AbstractTask.cleanUp

* Get TaskStatus from task log

* Fix merge conflicts in AbstractTaskTest

* Add unit tests for TaskLogPusher, TaskLogStreamer, NoopTaskLogs to satisfy code coverage

* Add license headerss

* Fix style

* Remove unknown exception declarations
2023-04-24 12:05:39 -07:00
imply-cheddar aaa6cc1883
Make the tasks run with only a single directory (#14063)
* Make the tasks run with only a single directory

There was a change that tried to get indexing to run on multiple disks
It made a bunch of changes to how tasks run, effectively hiding the
"safe" directory for tasks to write files into from the task code itself
making it extremely difficult to do anything correctly inside of a task.

This change reverts those changes inside of the tasks and makes it so that
only the task runners are the ones that make decisions about which
mount points should be used for storing task-related files.

It adds the config druid.worker.baseTaskDirs which can be used by the
task runners to know which directories they should schedule tasks inside of.
The TaskConfig remains the authoritative source of configuration for where
and how an individual task should be operating.
2023-04-13 00:45:02 -07:00
George Shiqi Wu 00d777d848
Fix race condition in KubernetesTaskRunner between shutdown and getKnownTasks (#14030)
* Fix issues with null pointers on jobResponse

* fix unit tests

* Update extensions-contrib/kubernetes-overlord-extensions/src/main/java/org/apache/druid/k8s/overlord/common/DruidKubernetesPeonClient.java

Co-authored-by: Abhishek Agarwal <1477457+abhishekagarwal87@users.noreply.github.com>

* nullable

* fix error message

* Use jobs for known tasks instead of pods

* Remove log lines

* remove log lines

* PR change requests

* revert wait change

---------

Co-authored-by: Abhishek Agarwal <1477457+abhishekagarwal87@users.noreply.github.com>
2023-04-10 13:27:49 -07:00
Clint Wylie 1aef72aa7e
Bump up the version in pom to 27.0.0 in preparation of release (#14051) 2023-04-10 14:56:59 +05:30
Nicholas Lippis 5810e650d4
K8s mm less fixes (#14028)
Update Fabric8 version and allow metrics monitors to be overriden
2023-04-05 22:23:16 +05:30
George Shiqi Wu f60f377e5f
Fix issues with null pointers on jobResponse (#14010)
* Fix issues with null pointers on jobResponse

* fix unit tests

* Update extensions-contrib/kubernetes-overlord-extensions/src/main/java/org/apache/druid/k8s/overlord/common/DruidKubernetesPeonClient.java

Co-authored-by: Abhishek Agarwal <1477457+abhishekagarwal87@users.noreply.github.com>

* nullable

* fix error message

---------

Co-authored-by: Abhishek Agarwal <1477457+abhishekagarwal87@users.noreply.github.com>
2023-04-04 17:48:18 -07:00
George Shiqi Wu 4560b9d8aa
New error message for task deletion (#14008)
* New error message

* Add unit test
2023-04-03 14:26:09 -07:00
Nicholas Lippis 61a35262ec
Kubernetes task runner live reports (#13986)
Implement Live Reports for the KubernetesTaskRunner
2023-03-30 10:30:22 +05:30
George Shiqi Wu 44abe2b96f
Fix bug in k8s task runner in handling deleted jobs (#14001)
With the KubernetesTaskRunner, if a task is manually shutdown via the web console while running or the corresponding k8s job is manually deleted, the thread responsible for overseeing the task gets stuck in a loop because the fabric8 client sends one event to it that the job is null when the job is deleted, but this doesn't pass the condition.

This means that the thread is stuck waiting on a fabric8 event (the job being successful) that will never come up until maxTaskDuration (default 4 hours). If a user of the extension is trying to use a limited taskqueue maxSize, this can cause problems as the k8s executor pool is unable to pick up additional tasks (since threads are stuck waiting on the old tasks that have already been deleted).
2023-03-30 10:09:52 +05:30
Nicholas Lippis 488f1d8363
Do not print error message if pod not found when getting task location (#13971)
Do not print error message if pod not found when getting task location
2023-03-29 13:27:06 +05:30
Nicholas Lippis 8a72544bd2
Hook up pod template adapter (#13966)
* Hook up PodTemplateTaskAdapter

* Make task adapter TYPE parameters final

* Rename adapters types

* Include specified adapter name in exception message

* Documentation for sidecarSupport deprecation

* Fix order

* Set TASK_ID as environment variable in PodTemplateTaskAdapter (#13969)

* Update docs/development/extensions-contrib/k8s-jobs.md

Co-authored-by: Abhishek Agarwal <1477457+abhishekagarwal87@users.noreply.github.com>

* Hook up PodTemplateTaskAdapter

* Make task adapter TYPE parameters final

* Rename adapters types

* Include specified adapter name in exception message

* Documentation for sidecarSupport deprecation

* Fix order

* fix spelling errors

---------

Co-authored-by: Abhishek Agarwal <1477457+abhishekagarwal87@users.noreply.github.com>
2023-03-24 12:13:46 -06:00
Nicholas Lippis 36df2495e1
Set TASK_ID as environment variable in PodTemplateTaskAdapter (#13969) 2023-03-23 16:45:01 -06:00
Nicholas Lippis d81d13b9ba
Pod template task adapter (#13896)
* Pod template task adapter

* Use getBaseTaskDirPaths

* Remove unused task from getEnv

* Use Optional.ifPresent() instead of Optional.map()

* Pass absolute path

* Don't pass task to getEnv

* Assert the correct adapter is created

* Javadocs and Comments

* Add exception message to assertions
2023-03-22 14:20:24 -06:00
Nicholas Lippis faac43eabe
Use base task dir in kubernetes task runner (#13880)
* Use TaskConfig to get task dir in KubernetesTaskRunner

* Use the first path specified in baseTaskDirPaths instead of deprecated baseTaskDirPath

* Use getBaseTaskDirPaths in generate command
2023-03-07 15:30:42 -07:00
Nicholas Lippis cd4ad5123a
Stream Kubernetes Job Logs (#13869)
Streams Kubernetes job logs from the Kubernetes client to a file on the machine instead of reading the logs into memory and then writing to a file.
2023-03-06 19:52:42 +05:30
Nicholas Lippis 7123681ada
Allow druid-kubernetes-overlord-extensions to be loaded in any druid service (#13872)
Allow druid-kubernetes-overlord-extensions to be loaded in any druid service
2023-03-03 23:53:12 +05:30
Nicholas Lippis 1aae37f7d6
Fix expectedSingleiContainerOutput.yaml spelling (#13870) 2023-03-02 00:07:15 -08:00
Clint Wylie 38ac71ee56
one version of mockito is more than enough (#13871) 2023-03-01 23:27:18 -08:00
Nicholas Lippis d32dc1b0c9
Remove K8sOverlordConfig.java (#13866) 2023-03-02 09:43:48 +05:30
Clint Wylie 08b5951cc5
merge druid-core, extendedset, and druid-hll into druid-processing to simplify everything (#13698)
* merge druid-core, extendedset, and druid-hll into druid-processing to simplify everything
* fix poms and license stuff
* mockito is evil
* allow reset of JvmUtils RuntimeInfo if tests used static injection to override
2023-02-17 14:27:41 -08:00
Churro c1f283fd31
Better sidecar support (#13655)
* Better sidecar support

* remove un-thrown exception from test

* Druid you are such a stickler about spelling :)

* Only require the primaryContainerName, no need to exclude containers
2023-02-14 10:56:15 +05:30
AmatyaAvadhanula 0cf1fc3d55
Indexing on multiple disks (#13476)
* Initial commit

* Simple UTs

* Parameterize tests

* Parameterized tests for k8s task runner

* Fix restore bug

* Refactor TaskStorageDirTracker

* Change CliPeon args
2023-02-08 11:31:34 +05:30
Kashif Faraz 78ae0b7533
Upgrade to netty 4.1.86.Final to address CVEs (#13604)
This commit addresses the following CVEs:
- CVE-2021-43797
- CVE-2022-41881
2022-12-23 01:44:01 +05:30
Kashif Faraz 7cf761cee4
Prepare master branch for next release, 26.0.0 (#13401)
* Prepare master branch for next release, 26.0.0

* Use docker image for druid 24.0.1

* Fix version in druid-it-cases pom.xml
2022-11-22 15:31:01 +05:30
Churro 9a684af3c9
Fixing the K8s task runner to work with MSQ (#13305)
* Fixing the K8s task runner to work with MSQ

* Sorry incomplete PR

Co-authored-by: Rahul Gidwani <r_gidwani@apple.com>
2022-11-08 14:41:05 +05:30
dependabot[bot] 081508f1aa
Bump commons-text from 1.9 to 1.10.0 in /extensions-contrib/kubernetes-overlord-extensions (#13299)
* Bump commons-text in /extensions-contrib/kubernetes-overlord-extensions

Bumps commons-text from 1.9 to 1.10.0.

---
updated-dependencies:
- dependency-name: org.apache.commons:commons-text
  dependency-type: direct:production
...

Signed-off-by: dependabot[bot] <support@github.com>

* Cleanup pom

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
Co-authored-by: Frank Chen <frank.chen021@outlook.com>
2022-11-05 15:21:39 +08:00
Dr. Sizzles e5ad24ff9f
Support for middle manager less druid, tasks launch as k8s jobs (#13156)
* Support for middle manager less druid, tasks launch as k8s jobs

* Fixing forking task runner test

* Test cleanup, dependency cleanup, intellij inspections cleanup

* Changes per PR review

Add configuration option to disable http/https proxy for the k8s client
Update the docs to provide more detail about sidecar support

* Removing un-needed log lines

* Small changes per PR review

* Upon task completion we callback to the overlord to update the status / locaiton, for slower k8s clusters, this reduces locking time significantly

* Merge conflict fix

* Fixing tests and docs

* update tiny-cluster.yaml 

changed `enableTaskLevelLogPush` to `encapsulatedTask`

* Apply suggestions from code review

Co-authored-by: Abhishek Agarwal <1477457+abhishekagarwal87@users.noreply.github.com>

* Minor changes per PR request

* Cleanup, adding test to AbstractTask

* Add comment in peon.sh

* Bumping code coverage

* More tests to make code coverage happy

* Doh a duplicate dependnecy

* Integration test setup is weird for k8s, will do this in a different PR

* Reverting back all integration test changes, will do in anotbher PR

* use StringUtils.base64 instead of Base64

* Jdk is nasty, if i compress in jdk 11 in jdk 17 the decompressed result is different

Co-authored-by: Rahul Gidwani <r_gidwani@apple.com>
Co-authored-by: Abhishek Agarwal <1477457+abhishekagarwal87@users.noreply.github.com>
2022-11-02 19:44:47 -07:00