26 Commits

Author SHA1 Message Date
George Shiqi Wu eed5f4f291
Add labels to k8s jobs for the PodTemplateTaskAdapter (#14205)
* Add labels

* Add prefix

* remove newline

* fix syntax

* Update prefix
2023-05-08 10:56:52 +08:00
Churro 123c4908c8
Ephemeral storage is respected from the overlord for peon tasks (#14201) 2023-05-05 16:27:29 -07:00
Clint Wylie 90ea192d9c
fix bugs with auto encoded long vector deserializers (#14186)
This PR fixes an issue when using 'auto' encoded LONG typed columns and the 'vectorized' query engine. These columns use a delta based bit-packing mechanism, and errors in the vectorized reader would cause it to incorrectly read column values for some bit sizes (1 through 32 bits). This is a regression caused by #11004, which added the optimized readers to improve performance, so it impacts Druid versions 0.22.0+.
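
As a rough illustration of what "delta based bit-packing" means here, the sketch below subtracts a base value from each long and stores the remainder in a fixed number of bits. It is not the code in VSizeLongSerde: the class name, method names, and buffer layout are all hypothetical, and it only shows why a reader has to get the bit offset, the word-boundary handling, and the mask right for every bit width.

```java
final class DeltaBitPackingSketch {
    // Hypothetical layout: value i occupies bits [i * bits, (i + 1) * bits) of the
    // buffer, lowest bit first. Each stored value is (value - base).
    static long[] pack(long[] values, long base, int bits) {
        if (bits < 1 || bits > 32) {
            throw new IllegalArgumentException("bits must be in 1..32");
        }
        long[] out = new long[(values.length * bits + 63) / 64];
        for (int i = 0; i < values.length; i++) {
            long delta = values[i] - base;
            if (delta < 0 || (delta >>> bits) != 0) {
                throw new IllegalArgumentException("delta does not fit in " + bits + " bits");
            }
            int bitPos = i * bits;
            int word = bitPos >>> 6;   // index of the 64-bit word holding the first bit
            int shift = bitPos & 63;   // offset of the first bit inside that word
            out[word] |= delta << shift;
            if (shift + bits > 64) {
                // The value straddles two words: spill the high bits into the next one.
                out[word + 1] |= delta >>> (64 - shift);
            }
        }
        return out;
    }

    // Reads value `index` back. A wrong shift or mask here silently yields wrong values.
    static long get(long[] packed, long base, int bits, int index) {
        int bitPos = index * bits;
        int word = bitPos >>> 6;
        int shift = bitPos & 63;
        long raw = packed[word] >>> shift;
        if (shift + bits > 64) {
            raw |= packed[word + 1] << (64 - shift);
        }
        long mask = (1L << bits) - 1;
        return base + (raw & mask);
    }
}
```

With a base of 100 and 3 bits per value, the values 100, 103, 101, 107, 100 are stored as the deltas 0, 3, 1, 7, 0 and read back unchanged.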

While writing the test I finally got sad enough about IndexSpec not having a "builder", so I made one, and switched all the things to use it. Apologies for the noise in this bug fix PR, the only real changes are in VSizeLongSerde, and the tests that have been modified to cover the buggy behavior, VSizeLongSerdeTest and ExpressionVectorSelectorsTest. Everything else is just cleanup of IndexSpec usage.
2023-05-01 11:49:27 +05:30
Nicholas Lippis 6579c1c5b6
remove unneeded TaskLogStreamer binding override (#14176) 2023-04-27 19:39:24 +05:30
Nicholas Lippis 9d4cc501f7
return task status reported by peon (#14040)
* return task status reported by peon

* Write TaskStatus to file in AbstractTask.cleanUp

* Get TaskStatus from task log

* Fix merge conflicts in AbstractTaskTest

* Add unit tests for TaskLogPusher, TaskLogStreamer, NoopTaskLogs to satisfy code coverage

* Add license headers

* Fix style

* Remove unknown exception declarations
2023-04-24 12:05:39 -07:00
imply-cheddar aaa6cc1883
Make the tasks run with only a single directory (#14063)
* Make the tasks run with only a single directory

There was a change that tried to get indexing to run on multiple disks.
It made a bunch of changes to how tasks run, effectively hiding the
"safe" directory for tasks to write files into from the task code itself
making it extremely difficult to do anything correctly inside of a task.

This change reverts those changes inside of the tasks and makes it so that
only the task runners are the ones that make decisions about which
mount points should be used for storing task-related files.

It adds the config druid.worker.baseTaskDirs which can be used by the
task runners to know which directories they should schedule tasks inside of.
The TaskConfig remains the authoritative source of configuration for where
and how an individual task should be operating.
2023-04-13 00:45:02 -07:00
George Shiqi Wu 00d777d848
Fix race condition in KubernetesTaskRunner between shutdown and getKnownTasks (#14030)
* Fix issues with null pointers on jobResponse

* fix unit tests

* Update extensions-contrib/kubernetes-overlord-extensions/src/main/java/org/apache/druid/k8s/overlord/common/DruidKubernetesPeonClient.java

Co-authored-by: Abhishek Agarwal <1477457+abhishekagarwal87@users.noreply.github.com>

* nullable

* fix error message

* Use jobs for known tasks instead of pods

* Remove log lines

* remove log lines

* PR change requests

* revert wait change

---------

Co-authored-by: Abhishek Agarwal <1477457+abhishekagarwal87@users.noreply.github.com>
2023-04-10 13:27:49 -07:00
Nicholas Lippis 5810e650d4
K8s mm less fixes (#14028)
Update Fabric8 version and allow metrics monitors to be overridden
2023-04-05 22:23:16 +05:30
George Shiqi Wu f60f377e5f
Fix issues with null pointers on jobResponse (#14010)
* Fix issues with null pointers on jobResponse

* fix unit tests

* Update extensions-contrib/kubernetes-overlord-extensions/src/main/java/org/apache/druid/k8s/overlord/common/DruidKubernetesPeonClient.java

Co-authored-by: Abhishek Agarwal <1477457+abhishekagarwal87@users.noreply.github.com>

* nullable

* fix error message

---------

Co-authored-by: Abhishek Agarwal <1477457+abhishekagarwal87@users.noreply.github.com>
2023-04-04 17:48:18 -07:00
George Shiqi Wu 4560b9d8aa
New error message for task deletion (#14008)
* New error message

* Add unit test
2023-04-03 14:26:09 -07:00
Nicholas Lippis 61a35262ec
Kubernetes task runner live reports (#13986)
Implement Live Reports for the KubernetesTaskRunner
2023-03-30 10:30:22 +05:30
George Shiqi Wu 44abe2b96f
Fix bug in k8s task runner in handling deleted jobs (#14001)
With the KubernetesTaskRunner, if a task is manually shut down via the web console while running, or the corresponding k8s job is manually deleted, the thread responsible for overseeing the task gets stuck in a loop. When the job is deleted, the fabric8 client sends the thread one event in which the job is null, but this event doesn't pass the loop's exit condition.

This means that the thread is stuck waiting on a fabric8 event (the job being successful) that will never come, and it only gives up once maxTaskDuration (default 4 hours) has elapsed. If a user of the extension is trying to use a limited taskqueue maxSize, this can cause problems as the k8s executor pool is unable to pick up additional tasks (since threads are stuck waiting on the old tasks that have already been deleted).
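
The failure mode described above can be sketched with a small wait loop. This is not the fix made in the PR and it does not use the fabric8 API: the queue of events, the phase strings, and every name below are hypothetical stand-ins. An empty Optional plays the role of the "job is null" event, and the point is only that the loop must treat it as terminal instead of waiting for a success event that cannot arrive.

```java
import java.util.Optional;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.TimeUnit;

final class JobWaitSketch {
    enum Outcome { SUCCEEDED, FAILED, DELETED, TIMED_OUT, INTERRUPTED }

    // Hypothetical event stream: Optional.empty() means "the job no longer exists",
    // otherwise the Optional carries an assumed phase name for the job.
    static Outcome waitForCompletion(BlockingQueue<Optional<String>> events, long timeoutMillis) {
        long deadline = System.nanoTime() + TimeUnit.MILLISECONDS.toNanos(timeoutMillis);
        while (true) {
            long remaining = deadline - System.nanoTime();
            if (remaining <= 0) {
                return Outcome.TIMED_OUT;
            }
            Optional<String> event;
            try {
                event = events.poll(remaining, TimeUnit.NANOSECONDS);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt(); // preserve the interrupt for callers
                return Outcome.INTERRUPTED;
            }
            if (event == null) {
                return Outcome.TIMED_OUT; // nothing arrived before the deadline
            }
            if (!event.isPresent()) {
                return Outcome.DELETED;   // job is gone: stop waiting right away
            }
            String phase = event.get();
            if ("Succeeded".equals(phase)) {
                return Outcome.SUCCEEDED;
            }
            if ("Failed".equals(phase)) {
                return Outcome.FAILED;
            }
            // Any other phase means the job is still running; keep waiting.
        }
    }
}
```

Without the `DELETED` branch, a deleted job would hold its executor thread until the timeout, which is the behavior the commit message describes.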
2023-03-30 10:09:52 +05:30
Nicholas Lippis 488f1d8363
Do not print error message if pod not found when getting task location (#13971)
Do not print error message if pod not found when getting task location
2023-03-29 13:27:06 +05:30
Nicholas Lippis 8a72544bd2
Hook up pod template adapter (#13966)
* Hook up PodTemplateTaskAdapter

* Make task adapter TYPE parameters final

* Rename adapters types

* Include specified adapter name in exception message

* Documentation for sidecarSupport deprecation

* Fix order

* Set TASK_ID as environment variable in PodTemplateTaskAdapter (#13969)

* Update docs/development/extensions-contrib/k8s-jobs.md

Co-authored-by: Abhishek Agarwal <1477457+abhishekagarwal87@users.noreply.github.com>

* fix spelling errors

---------

Co-authored-by: Abhishek Agarwal <1477457+abhishekagarwal87@users.noreply.github.com>
2023-03-24 12:13:46 -06:00
Nicholas Lippis 36df2495e1
Set TASK_ID as environment variable in PodTemplateTaskAdapter (#13969) 2023-03-23 16:45:01 -06:00
Nicholas Lippis d81d13b9ba
Pod template task adapter (#13896)
* Pod template task adapter

* Use getBaseTaskDirPaths

* Remove unused task from getEnv

* Use Optional.ifPresent() instead of Optional.map()

* Pass absolute path

* Don't pass task to getEnv

* Assert the correct adapter is created

* Javadocs and Comments

* Add exception message to assertions
2023-03-22 14:20:24 -06:00
Nicholas Lippis faac43eabe
Use base task dir in kubernetes task runner (#13880)
* Use TaskConfig to get task dir in KubernetesTaskRunner

* Use the first path specified in baseTaskDirPaths instead of deprecated baseTaskDirPath

* Use getBaseTaskDirPaths in generate command
2023-03-07 15:30:42 -07:00
Nicholas Lippis cd4ad5123a
Stream Kubernetes Job Logs (#13869)
Streams Kubernetes job logs from the Kubernetes client to a file on the machine instead of reading the logs into memory and then writing to a file.
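
A minimal sketch of the difference, assuming the log is already available as a plain `InputStream` (how the Kubernetes client exposes it is not shown here, and the class and method names are hypothetical). `Files.copy` moves the bytes through a small internal buffer, so memory use stays flat regardless of log size.

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;

final class LogStreamingSketch {
    // Copies the log stream to `target` without materializing it as a String or byte[].
    // Returns the number of bytes written and closes the stream when done.
    static long streamToFile(InputStream logStream, Path target) {
        try (InputStream in = logStream) {
            return Files.copy(in, target, StandardCopyOption.REPLACE_EXISTING);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```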
2023-03-06 19:52:42 +05:30
Nicholas Lippis 7123681ada
Allow druid-kubernetes-overlord-extensions to be loaded in any druid service (#13872)
Allow druid-kubernetes-overlord-extensions to be loaded in any druid service
2023-03-03 23:53:12 +05:30
Nicholas Lippis 1aae37f7d6
Fix expectedSingleiContainerOutput.yaml spelling (#13870) 2023-03-02 00:07:15 -08:00
Clint Wylie 38ac71ee56
one version of mockito is more than enough (#13871) 2023-03-01 23:27:18 -08:00
Nicholas Lippis d32dc1b0c9
Remove K8sOverlordConfig.java (#13866) 2023-03-02 09:43:48 +05:30
Churro c1f283fd31
Better sidecar support (#13655)
* Better sidecar support

* remove un-thrown exception from test

* Druid you are such a stickler about spelling :)

* Only require the primaryContainerName, no need to exclude containers
2023-02-14 10:56:15 +05:30
AmatyaAvadhanula 0cf1fc3d55
Indexing on multiple disks (#13476)
* Initial commit

* Simple UTs

* Parameterize tests

* Parameterized tests for k8s task runner

* Fix restore bug

* Refactor TaskStorageDirTracker

* Change CliPeon args
2023-02-08 11:31:34 +05:30
Churro 9a684af3c9
Fixing the K8s task runner to work with MSQ (#13305)
* Fixing the K8s task runner to work with MSQ

* Sorry incomplete PR

Co-authored-by: Rahul Gidwani <r_gidwani@apple.com>
2022-11-08 14:41:05 +05:30
Dr. Sizzles e5ad24ff9f
Support for middle manager less druid, tasks launch as k8s jobs (#13156)
* Support for middle manager less druid, tasks launch as k8s jobs

* Fixing forking task runner test

* Test cleanup, dependency cleanup, intellij inspections cleanup

* Changes per PR review

Add configuration option to disable http/https proxy for the k8s client
Update the docs to provide more detail about sidecar support

* Removing un-needed log lines

* Small changes per PR review

* Upon task completion we call back to the overlord to update the status / location; for slower k8s clusters, this reduces locking time significantly

* Merge conflict fix

* Fixing tests and docs

* update tiny-cluster.yaml 

changed `enableTaskLevelLogPush` to `encapsulatedTask`

* Apply suggestions from code review

Co-authored-by: Abhishek Agarwal <1477457+abhishekagarwal87@users.noreply.github.com>

* Minor changes per PR request

* Cleanup, adding test to AbstractTask

* Add comment in peon.sh

* Bumping code coverage

* More tests to make code coverage happy

* Doh a duplicate dependency

* Integration test setup is weird for k8s, will do this in a different PR

* Reverting back all integration test changes, will do in another PR

* use StringUtils.base64 instead of Base64

* Jdk is nasty, if i compress in jdk 11 in jdk 17 the decompressed result is different

Co-authored-by: Rahul Gidwani <r_gidwani@apple.com>
Co-authored-by: Abhishek Agarwal <1477457+abhishekagarwal87@users.noreply.github.com>
2022-11-02 19:44:47 -07:00