druid

Commit Graph

Author	SHA1	Message	Date
George Shiqi Wu	dc0b163e19	Separate task lifecycle from kubernetes/location lifecycle (#15133 ) * Separate k8s and druid task lifecycles * Remove extra log lines * Fix unit tests * fix unit tests * Fix unit tests * notify listeners on task completion * Fix unit test * unused var * PR changes * Fix unit tests * Fix checkstyle * PR changes	2023-10-17 08:17:43 -07:00
Laksh Singla	5f86072456	Prepare master for Druid 29 (#15121 ) Prepare master for Druid 29	2023-10-11 10:33:45 +05:30
Xavier Léauté	adef2069b1	Make unit tests pass with Java 21 (#15014 ) This change updates dependencies as needed and fixes tests to remove code incompatible with Java 21. As a result all unit tests now pass with Java 21. * update maven-shade-plugin to 3.5.0 and follow-up to #15042 * explain why we need to override configuration when specifying outputFile * remove configuration from dependency management in favor of explicit overrides in each module. * update mockito to 5.5.0 for Java 21 support when running with Java 11+ * continue using latest mockito 4.x (4.11.0) when running with Java 8 * remove need to mock private fields * exclude incorrectly declared mockito dependency from pac4j-oidc * remove mocking of ByteBuffer, since sealed classes can no longer be mocked in Java 21 * add JVM options workaround for system-rules junit plugin not supporting Java 18+ * exclude older versions of byte-buddy from assertj-core * fix for Java 19 changes in floating point string representation * fix missing InitializedNullHandlingTest * update easymock to 5.2.0 for Java 21 compatibility * update animal-sniffer-plugin to 1.23 * update nl.jqno.equalsverifier to 3.15.1 * update exec-maven-plugin to 3.1.0	2023-10-03 22:41:21 -07:00
George Shiqi Wu	64754b6799	Allow users to pass task payload via deep storage instead of environment variable (#14887 ) This change is meant to fix an issue where passing too large a task payload to the mm-less task runner causes the peon to fail to start up, because the payload is passed (compressed) as an environment variable (TASK_JSON). On Linux systems the limit for an environment variable is commonly 128KB; on Windows systems it is less than this. Setting an env variable longer than this results in a bunch of "Argument list too long" errors.	2023-10-03 14:08:59 +05:30
George Shiqi Wu	8e22a178cc	Support getTaskLocation for mixed task runner (#15033 ) The KubernetesAndWorkerTaskRunner currently doesn't implement getTaskLocation, so tasks run by it will show an unknown TaskLocation in the Druid console after a task has completed. Fix bug in KubernetesAndWorkerTaskRunner that manifests as missing information in the Druid Web Console.	2023-09-27 08:57:36 +05:30
Tejaswini Bandlamudi	48b6d2abf9	skip org.owasp:dependency-check on extensions-contrib modules and suppress false-positive gRPC CVEs (#15026 )	2023-09-25 12:14:42 +05:30
YongGang	be3f93e3cf	Restore tasks when lifecycle start (#14909 ) * K8s tasks restore should be from lifecycle start * add test * add more tests * fix test * wait tasks restore finish when start * fix style * revert previous change and add comment	2023-09-22 12:03:34 -07:00
George Shiqi Wu	d459df8d6e	Fix log syntax (#15004 )	2023-09-18 10:40:02 -07:00
George Shiqi Wu	f773d83914	Mixed task runner for migration to mm-less ingestion (#14918 ) * save work * Working * Fix runner constructor * Working runner * extra log lines * try using lifecycle for everything * clean up configs * cleanup /workers call * Use a single config * Allow selecting runner * debug changes * Work on composite task runner * Unit tests running * Add documentation * Add some javadocs * Fix spelling * Use standard libraries * code review * fix * fix * use taskRunner as string * checkstyle --------- Co-authored-by: Suneet Saldanha <suneet@apache.org>	2023-09-11 18:09:46 -07:00
Kashif Faraz	289ee1e011	Refactor: Cleanup NoopTask (#14938 ) Changes: - Simplify static `create` methods for `NoopTask` - Remove `FirehoseFactory`, `IsReadyResult`, `readyTime` from `NoopTask` as these fields were not being used anywhere - Update tests	2023-09-05 09:15:41 +05:30
Kashif Faraz	7f26b80e21	Simplify ServiceMetricEvent.Builder (#14933 ) Changes: - Make ServiceMetricEvent.Builder extend ServiceEventBuilder<ServiceMetricEvent> and thus convert it to a plain builder rather than a builder of builder. - Add methods setCreatedTime , setMetricAndValue to the builder	2023-09-01 11:30:45 +05:30
George Shiqi Wu	95b0de61d1	Move some lifecycle management from doTask -> shutdown for the mm-less task runner (#14895 ) * save work * Add synchronized * Don't shutdown in run * Adding unit tests * Cleanup lifecycle * Fix tests * remove newline	2023-08-25 10:50:38 -06:00
George Shiqi Wu	ad32f84586	Fix capacity response in mm-less ingestion (#14888 ) Changes: - Fix capacity response in mm-less ingestion. - Add field usedClusterCapacity to the GET /totalWorkerCapacity response. This API should be used to get the total ingestion capacity on the overlord. - Remove method `isK8sTaskRunner` from interface `TaskRunner`	2023-08-25 08:17:38 +05:30
Tejaswini Bandlamudi	d87056e708	Upgrade guava version to 31.1-jre (#14767 ) Currently, Druid is using Guava 16.0.1 version. This upgrade to 31.1-jre fixes the following issues. CVE-2018-10237 (Unbounded memory allocation in Google Guava 11.0 through 24.x before 24.1.1 allows remote attackers to conduct denial of service attacks against servers that depend on this library and deserialize attacker-provided data because the AtomicDoubleArray class (when serialized with Java serialization) and the CompoundOrdering class (when serialized with GWT serialization) perform eager allocation without appropriate checks on what a client has sent and whether the data size is reasonable). We don't use Java or GWT serializations. Despite being false positive they're causing red security scans on Druid distribution. Latest version of google-client-api is incompatible with the existing Guava version. This PR unblocks Update google client apis to latest version #14414	2023-08-22 12:09:53 +05:30
YongGang	3954685aae	Report more metrics to monitor K8s task runner (#14771 ) * Report pod running metrics to monitor K8s task runner * refine method definition * fix checkstyle * implement task metrics * more comment * address comments * update doc for the new metrics reported * fix checkstyle * refine method definition * minor refine	2023-08-16 14:03:53 -04:00
Rishabh Singh	0dc305f9e4	Upgrade hibernate validator version to fix CVE-2019-10219 (#14757 )	2023-08-14 11:50:51 +05:30
George Shiqi Wu	c8a11702db	Support broadcast segments (#14789 )	2023-08-11 11:14:05 -07:00
George Shiqi Wu	c8537dbeaf	Add lifecycle hooks to KubernetesTaskRunner (#14790 )	2023-08-09 21:16:44 -07:00
George Shiqi Wu	14940dc3ed	Add pod name to TaskLocation for easier observability and debugging. (#14758 ) * Add pod name to location * Add log * fix style * Update extensions-contrib/kubernetes-overlord-extensions/src/main/java/org/apache/druid/k8s/overlord/KubernetesPeonLifecycle.java Co-authored-by: Suneet Saldanha <suneet@apache.org> * Fix unit tests --------- Co-authored-by: Suneet Saldanha <suneet@apache.org>	2023-08-07 12:33:35 -07:00
YongGang	3335040b22	Report task/pending/time metrics for k8s based ingestion (#14698 ) Changes: * Add and invoke `StateListener` when state changes in `KubernetesPeonLifecycle` * Report `task/pending/time` metric in `KubernetesTaskRunner` when state moves to RUNNING	2023-08-04 09:07:11 +05:30
George Shiqi Wu	174053f4fd	Add readme for kubernetes-overlord-extensions and update docs (#14674 ) * Add readme for kubernetes task scheduler * clean up unneeded stuff * Update extensions-contrib/kubernetes-overlord-extensions/README.md Co-authored-by: Abhishek Agarwal <1477457+abhishekagarwal87@users.noreply.github.com> * Move documentation into main page * indentation * cleanup spellcheck errors * Update docs/development/extensions-contrib/k8s-jobs.md Co-authored-by: Suneet Saldanha <suneet@apache.org> * Update extensions-contrib/kubernetes-overlord-extensions/README.md Co-authored-by: Suneet Saldanha <suneet@apache.org> * Update docs/development/extensions-contrib/k8s-jobs.md Co-authored-by: Suneet Saldanha <suneet@apache.org> * PR comments * Update docs/development/extensions-contrib/k8s-jobs.md Co-authored-by: Suneet Saldanha <suneet@apache.org> * Update docs/development/extensions-contrib/k8s-jobs.md Co-authored-by: Suneet Saldanha <suneet@apache.org> * Update docs/development/extensions-contrib/k8s-jobs.md Co-authored-by: Suneet Saldanha <suneet@apache.org> --------- Co-authored-by: Abhishek Agarwal <1477457+abhishekagarwal87@users.noreply.github.com> Co-authored-by: Suneet Saldanha <suneet@apache.org>	2023-08-01 13:29:44 -07:00
YongGang	9b88b78ba4	Fix race condition in KubernetesTaskRunner when task is added to the map (#14643 ) Changes: - Fix race condition in KubernetesTaskRunner introduced by #14435 - Perform addition and removal from map inside a synchronized block - Update tests	2023-07-27 12:34:36 +05:30
George Shiqi Wu	f742bb7376	Get task location should be stored on the lifecycle object (#14649 ) * Fix issue with long data source names * Use the regular library * Save location and tls enabled * Null out before running * add another comment	2023-07-24 18:36:19 -07:00
George Shiqi Wu	28914bbab8	Fix issue with long data source names (#14620 ) * Fix issue with long data source names * Use the regular library * fix overlord utils test	2023-07-24 08:45:10 -07:00
AmatyaAvadhanula	0412f40d36	Prepare master branch for next release, 28.0.0 (#14595 ) * Prepare master branch for next release, 28.0.0	2023-07-18 09:22:30 +05:30
Jan Werner	95115d722a	CVE fixes - update of multiple dependencies. (#14519 ) Apache Druid brings multiple direct and transitive dependencies that are affected by a plethora of CVEs. This PR attempts to update all the dependencies that did not require code refactoring. This PR modifies pom files, license file and OWASP Dependency Check suppression file.	2023-07-07 20:27:30 +05:30
George Shiqi Wu	bd07c3dd43	Don't need to double synchronize on simple map operations (#14435 ) * Don't need to double synchronize on simple map operations * remove lock	2023-06-17 17:30:37 -07:00
George Shiqi Wu	76e70654ac	Fix issues when startup timeout is hit (#14425 )	2023-06-14 11:49:55 -07:00
George Shiqi Wu	cb65135b99	Fix log streaming (#14285 ) * Fix log streaming * Add watch log * Add unit tests * long running client * singleton client * Remove accidental close	2023-05-22 11:19:53 -07:00
George Shiqi Wu	51f722b7f1	Fix labels (#14282 ) * Fix labels * move to a util function * style * PR comments * rename class	2023-05-18 11:51:58 -07:00
Nicholas Lippis	58dcbf9399	queue tasks in kubernetes task runner if capacity is fully utilized (#14156 ) * queue tasks if all slots in use * Declare hamcrest-core dependency * Use AtomicBoolean for shutdown requested * Use AtomicReference for peon lifecycle state * fix uninitialized read error * fix indentations * Make tasks protected * fix KubernetesTaskRunnerConfig deserialization * ensure k8s task runner max capacity is Integer.MAX_VALUE * set job duration as task status duration * Address pr comments --------- Co-authored-by: George Shiqi Wu <george.wu@imply.io>	2023-05-12 09:41:44 -06:00
George Shiqi Wu	161d12eb44	Fix unit tests for java 17 (#14207 ) Fix a unit test that fails in java 17	2023-05-09 20:02:31 +05:30
George Shiqi Wu	eed5f4f291	Add labels to k8s jobs for the PodTemplateTaskAdapter (#14205 ) * Add labels * Add prefix * remove newline * fix syntax * Update prefix	2023-05-08 10:56:52 +08:00
Churro	123c4908c8	Ephemeral storage is respected from the overlord for peon tasks (#14201 )	2023-05-05 16:27:29 -07:00
Clint Wylie	90ea192d9c	fix bugs with auto encoded long vector deserializers (#14186 ) This PR fixes an issue when using 'auto' encoded LONG typed columns and the 'vectorized' query engine. These columns use a delta based bit-packing mechanism, and errors in the vectorized reader would cause it to incorrectly read column values for some bit sizes (1 through 32 bits). This is a regression caused by #11004, which added the optimized readers to improve performance, so impacts Druid versions 0.22.0+. While writing the test I finally got sad enough about IndexSpec not having a "builder", so I made one, and switched all the things to use it. Apologies for the noise in this bug fix PR, the only real changes are in VSizeLongSerde, and the tests that have been modified to cover the buggy behavior, VSizeLongSerdeTest and ExpressionVectorSelectorsTest. Everything else is just cleanup of IndexSpec usage.	2023-05-01 11:49:27 +05:30
Nicholas Lippis	6579c1c5b6	remove unneeded TaskLogStreamer binding override (#14176 )	2023-04-27 19:39:24 +05:30
Nicholas Lippis	9d4cc501f7	return task status reported by peon (#14040 ) * return task status reported by peon * Write TaskStatus to file in AbstractTask.cleanUp * Get TaskStatus from task log * Fix merge conflicts in AbstractTaskTest * Add unit tests for TaskLogPusher, TaskLogStreamer, NoopTaskLogs to satisfy code coverage * Add license headers * Fix style * Remove unknown exception declarations	2023-04-24 12:05:39 -07:00
imply-cheddar	aaa6cc1883	Make the tasks run with only a single directory (#14063 ) * Make the tasks run with only a single directory There was a change that tried to get indexing to run on multiple disks. It made a bunch of changes to how tasks run, effectively hiding the "safe" directory for tasks to write files into from the task code itself, making it extremely difficult to do anything correctly inside of a task. This change reverts those changes inside of the tasks and makes it so that only the task runners are the ones that make decisions about which mount points should be used for storing task-related files. It adds the config druid.worker.baseTaskDirs which can be used by the task runners to know which directories they should schedule tasks inside of. The TaskConfig remains the authoritative source of configuration for where and how an individual task should be operating.	2023-04-13 00:45:02 -07:00
George Shiqi Wu	00d777d848	Fix race condition in KubernetesTaskRunner between shutdown and getKnownTasks (#14030 ) * Fix issues with null pointers on jobResponse * fix unit tests * Update extensions-contrib/kubernetes-overlord-extensions/src/main/java/org/apache/druid/k8s/overlord/common/DruidKubernetesPeonClient.java Co-authored-by: Abhishek Agarwal <1477457+abhishekagarwal87@users.noreply.github.com> * nullable * fix error message * Use jobs for known tasks instead of pods * Remove log lines * remove log lines * PR change requests * revert wait change --------- Co-authored-by: Abhishek Agarwal <1477457+abhishekagarwal87@users.noreply.github.com>	2023-04-10 13:27:49 -07:00
Clint Wylie	1aef72aa7e	Bump up the version in pom to 27.0.0 in preparation of release (#14051 )	2023-04-10 14:56:59 +05:30
Nicholas Lippis	5810e650d4	K8s mm less fixes (#14028 ) Update Fabric8 version and allow metrics monitors to be overridden	2023-04-05 22:23:16 +05:30
George Shiqi Wu	f60f377e5f	Fix issues with null pointers on jobResponse (#14010 ) * Fix issues with null pointers on jobResponse * fix unit tests * Update extensions-contrib/kubernetes-overlord-extensions/src/main/java/org/apache/druid/k8s/overlord/common/DruidKubernetesPeonClient.java Co-authored-by: Abhishek Agarwal <1477457+abhishekagarwal87@users.noreply.github.com> * nullable * fix error message --------- Co-authored-by: Abhishek Agarwal <1477457+abhishekagarwal87@users.noreply.github.com>	2023-04-04 17:48:18 -07:00
George Shiqi Wu	4560b9d8aa	New error message for task deletion (#14008 ) * New error message * Add unit test	2023-04-03 14:26:09 -07:00
Nicholas Lippis	61a35262ec	Kubernetes task runner live reports (#13986 ) Implement Live Reports for the KubernetesTaskRunner	2023-03-30 10:30:22 +05:30
George Shiqi Wu	44abe2b96f	Fix bug in k8s task runner in handling deleted jobs (#14001 ) With the KubernetesTaskRunner, if a task is manually shutdown via the web console while running or the corresponding k8s job is manually deleted, the thread responsible for overseeing the task gets stuck in a loop because the fabric8 client sends one event to it that the job is null when the job is deleted, but this doesn't pass the condition. This means that the thread is stuck waiting on a fabric8 event (the job being successful) that will never come up until maxTaskDuration (default 4 hours). If a user of the extension is trying to use a limited taskqueue maxSize, this can cause problems as the k8s executor pool is unable to pick up additional tasks (since threads are stuck waiting on the old tasks that have already been deleted).	2023-03-30 10:09:52 +05:30
Nicholas Lippis	488f1d8363	Do not print error message if pod not found when getting task location (#13971 ) Do not print error message if pod not found when getting task location	2023-03-29 13:27:06 +05:30
Nicholas Lippis	8a72544bd2	Hook up pod template adapter (#13966 ) * Hook up PodTemplateTaskAdapter * Make task adapter TYPE parameters final * Rename adapters types * Include specified adapter name in exception message * Documentation for sidecarSupport deprecation * Fix order * Set TASK_ID as environment variable in PodTemplateTaskAdapter (#13969) * Update docs/development/extensions-contrib/k8s-jobs.md Co-authored-by: Abhishek Agarwal <1477457+abhishekagarwal87@users.noreply.github.com> * Hook up PodTemplateTaskAdapter * Make task adapter TYPE parameters final * Rename adapters types * Include specified adapter name in exception message * Documentation for sidecarSupport deprecation * Fix order * fix spelling errors --------- Co-authored-by: Abhishek Agarwal <1477457+abhishekagarwal87@users.noreply.github.com>	2023-03-24 12:13:46 -06:00
Nicholas Lippis	36df2495e1	Set TASK_ID as environment variable in PodTemplateTaskAdapter (#13969 )	2023-03-23 16:45:01 -06:00
Nicholas Lippis	d81d13b9ba	Pod template task adapter (#13896 ) * Pod template task adapter * Use getBaseTaskDirPaths * Remove unused task from getEnv * Use Optional.ifPresent() instead of Optional.map() * Pass absolute path * Don't pass task to getEnv * Assert the correct adapter is created * Javadocs and Comments * Add exception message to assertions	2023-03-22 14:20:24 -06:00
Nicholas Lippis	faac43eabe	Use base task dir in kubernetes task runner (#13880 ) * Use TaskConfig to get task dir in KubernetesTaskRunner * Use the first path specified in baseTaskDirPaths instead of deprecated baseTaskDirPath * Use getBaseTaskDirPaths in generate command	2023-03-07 15:30:42 -07:00


63 Commits