2363 Commits

Author SHA1 Message Date
Akira Ajisaka
d830e84217 YARN-9875. Improve fair scheduler configuration store on HDFS. (#3262)
Contributed by Prabhu Joseph

(cherry picked from commit 155864da006346a500ff35c2f6b69281093195b1)

 Conflicts:
	hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/conf/TestFSSchedulerConfigurationStore.java

Co-authored-by: Eric Yang <eyang@apache.org>
2021-08-04 16:12:54 +09:00
Akira Ajisaka
e259d85a99 YARN-8992. Fair scheduler can delete a dynamic queue while an application attempt is being added to the queue. (Contributed by Wilfred Spiegelenburg) (#3257)
(cherry picked from commit a41b648e98b6a1c5a9cdb7393e73e576a97f56d4)

Co-authored-by: Haibo Chen <haibochen@apache.org>
2021-08-04 10:17:51 +09:00
Akira Ajisaka
786a43e729 YARN-8990. Fix fair scheduler race condition in app submit and queue cleanup. (Contributed by Wilfred Spiegelenburg) (#3254)
(cherry picked from commit 524a7523c427b55273133078898ae3535897bada)

 Conflicts:
	hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/fair/FairScheduler.java

Co-authored-by: Haibo Chen <haibochen@apache.org>
2021-08-03 13:12:01 +09:00
Szilard Nemeth
7a8b6265c6 YARN-10789. RM HA startup can fail due to race conditions in ZKConfigurationStore. Contributed by Tarun Parimi 2021-07-29 19:22:57 +02:00
Szilard Nemeth
6bf02a63e7 YARN-10813. Set default capacity of root for node labels. Contributed by Andras Gyori 2021-07-28 14:54:36 +02:00
zhuqi-lucas
cfe36ba762 YARN-10860. Make max container per heartbeat configs refreshable. Contributed by Eric Badger. 2021-07-22 10:13:55 +08:00
Jim Brennan
4a70248cb6 YARN-10456. RM PartitionQueueMetrics records are named QueueMetrics in Simon metrics registry. Contributed by Eric Payne.
(cherry picked from commit 632f64cadb1dfd8f0940e350b9314b4d4f8eda4b)
2021-07-15 14:49:22 +00:00
Jim Brennan
c3c86b18cb YARN-10834. Intra-queue preemption: apps that don't use defined custom resource won't be preempted. Contributed by Eric Payne.
(cherry picked from commit dc6f456e953e685370277d3d6bf3515b5001bca3)
2021-06-28 15:28:38 +00:00
Szilard Nemeth
c8b090d9a4 YARN-10828. Backport YARN-9789 to branch-3.2. Contributed by Tarun Parimi 2021-06-24 20:20:23 +02:00
lujiefsi
bc97dd0d26 YARN-10555. Missing access check before getAppAttempts (#2608)
Co-authored-by: lujie <lujie@foxmail.com>
Signed-off-by: Akira Ajisaka <aajisaka@apache.org>
(cherry picked from commit d92a25b790e5ad7d8e21fc3949cdd0f74d496b1b)
2021-05-17 14:05:26 +09:00
Eric Badger
951ce9b371 YARN-10479. Can't remove all node labels after add node label without
nodemanager port, broken by YARN-10647. Contributed by D M Murali Krishna Reddy

(cherry picked from commit 6857a05d6ac566a60336c0a28951f09ecda39f24)
2021-04-23 22:55:25 +00:00
Eric Badger
b36810ee3f YARN-10723. Change CS nodes page in UI to support custom resource. Contributed by Qi Zhu
(cherry picked from commit 6cb90005a7d0651474883ac4e1b6961ef74fe513)
2021-04-20 17:54:20 +00:00
Eric Badger
899cef53bd YARN-10702. Add cluster metric for amount of CPU used by RM Event Processor.
Contributed by Jim Brennan.
2021-04-08 18:33:42 +00:00
Jim Brennan
0fcb2f28ce YARN-10697. Resources are displayed in bytes in UI for schedulers other than capacity. Contributed by Bilwa S T.
(cherry picked from commit 174f3a96b10a0ab0fd8aed1b0f904ca5f0c3f268)
2021-03-23 18:37:32 +00:00
Eric Badger
8bfa4cc6d8 YARN-10688. ClusterMetrics should support GPU capacity related metrics. Contributed by Qi Zhu.
(cherry picked from commit 49f89f1d3de66f3bb4db5952e8873432ba62f71a)

Conflicts:
	hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/ClusterMetrics.java
	hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/TestCSAllocateCustomResource.java
2021-03-17 18:50:00 +00:00
Eric Payne
5617bfa0d4 YARN-10588. Percentage of queue and cluster is zero in WebUI. Contributed by Bilwa S T
(cherry picked from commit aa4c17b9d7af122163789a731ced05f740562e45)
2021-03-15 19:38:33 +00:00
Peter Bacsko
0b8bfc50c6 YARN-10672. All testcases in TestReservations are flaky. Contributed by Szilard Nemeth. 2021-03-08 14:32:13 +01:00
Jonathan Hung
6863a5bb8a YARN-10651. CapacityScheduler crashed with NPE in AbstractYarnScheduler.updateNodeResource(). Contributed by Haibo Chen
(cherry picked from commit f348ab3f2f468751af329a1ffce4917cb000fcbf)
(cherry picked from commit be6e99963ded94adf6f447ff53f2ba66b99120ca)
2021-02-25 15:26:57 -08:00
Jim Brennan
3795f66364 [YARN-10613] Config to allow Intra- and Inter-queue preemption to enable/disable conservativeDRF. Contributed by Eric Payne 2021-02-25 20:07:30 +00:00
Jim Brennan
2117ab6b71 YARN-10500. TestDelegationTokenRenewer fails intermittently. (#2619) Contributed by Masatake Iwasaki 2021-02-11 21:32:07 +00:00
Szilard Nemeth
59795ec3d6 YARN-10528. maxAMShare should only be accepted for leaf queues, not parent queues. Contributed by Siddharth Ahuja 2021-01-08 12:49:58 +01:00
Eric Payne
1184284baf YARN-10278: CapacityScheduler test framework ProportionalCapacityPreemptionPolicyMockFramework. Contributed by Szilard Nemeth (snemeth) 2020-12-02 17:22:49 +00:00
Peter Bacsko
c5ae78b793 YARN-10396. Max applications calculation per queue disregards queue level settings in absolute mode. Contributed by Benjamin Teke. 2020-11-16 11:48:50 +01:00
Eric E Payne
31154fdde5 YARN-10475: Scale RM-NM heartbeat interval based on node utilization. Contributed by Jim Brennan (Jim_Brennan). 2020-11-02 17:33:57 +00:00
Jonathan Hung
d0104e72c5 YARN-10467. ContainerIdPBImpl objects can be leaked in RMNodeImpl.completedContainers. Contributed by Haibo Chen
(cherry picked from commit bab5bf9743f54f48cc2f31b4e5c8b6d4e5a5cfb8)
(cherry picked from commit f95c0824b01175590fe98e2fba1e5988694a52da)
2020-10-28 10:38:58 -07:00
Eric Badger
4c61136616 YARN-10450. Add cpu and memory utilization per node and cluster-wide metrics.
Contributed by Jim Brennan.
2020-10-16 18:51:53 +00:00
He Xiaoqiao
3274fd139d Preparing for 3.2.3 development 2020-10-16 14:52:41 +08:00
Akira Ajisaka
a2c1fb7c8c YARN-9848. Revert YARN-4946. Contributed by Steven Rand. 2020-10-16 01:04:45 +09:00
Jim Brennan
ecf91638a8 YARN-10451. RM (v1) UI NodesPage can NPE when yarn.io/gpu resource type is defined. Contributed by Eric Payne 2020-10-06 18:36:51 +00:00
Eric E Payne
947b0a154a YARN-9809. Added node manager health status to resource manager registration call. Contributed by Eric Badger (ebadger). 2020-09-28 18:50:44 +00:00
Eric E Payne
5b14af6d09 YARN-10390: LeafQueue: retain user limits cache across assignContainers() calls. Contributed by Samir Khan (samkhan).
(cherry picked from commit 9afec2ed1721467aef7f2cd025d713273b12a6ca)
2020-09-11 16:46:28 +00:00
bibinchundatt
b5d24d646c YARN-10369. Make NMTokenSecretManagerInRM sending NMToken for nodeId DEBUG. Contributed by Jim Brennan.
(cherry picked from commit 5d8600e80ad7864b332b60d5a01585fdf00848ee)
2020-09-08 21:05:26 +00:00
Adam Antal
696494d663 YARN-10332. RESOURCE_UPDATE event was repeatedly registered in DECOMMISSIONING state. Contributed by yehuanhuan
(cherry picked from commit 34fe74da0e9c68173e1de196c496b9cfca029618)
2020-09-07 12:01:35 +02:00
Sunil G
94723bff64 Revert "YARN-10396. Max applications calculation per queue disregards queue level settings in absolute mode. Contributed by Benjamin Teke."
This reverts commit 2a40a33dfecb17eba42f67c0151be9b1e86740aa.
2020-08-20 19:15:10 +05:30
Sunil G
2a40a33dfe YARN-10396. Max applications calculation per queue disregards queue level settings in absolute mode. Contributed by Benjamin Teke.
(cherry picked from commit 82ec28f4421c162a505ba5e5b329e4be199878a7)
2020-08-19 12:00:33 +05:30
Jonathan Hung
17d18a2a3a YARN-10251. Show extended resources on legacy RM UI. Contributed by Eric Payne 2020-08-07 17:43:52 -07:00
Eric Badger
9a1db93b1b YARN-4575. ApplicationResourceUsageReport should return ALL reserved resource.
Contributed by Bibin Chundatt and Eric Payne.

(cherry picked from commit 5edd8b925ef22b83350a21abed6ecc551adb92ee)
2020-08-05 19:03:48 +00:00
Jonathan Hung
ffb920de2a YARN-10343. Legacy RM UI should include labeled metrics for allocated, total, and reserved resources. Contributed by Eric Payne 2020-07-28 13:44:17 -07:00
Ayush Saxena
27a97e4f28 HADOOP-17100. Replace Guava Supplier with Java8+ Supplier in Hadoop. Contributed by Ahmed Hussein. 2020-07-22 18:39:49 +05:30
Eric Badger
09f1547697 YARN-10348. Allow RM to always cancel tokens after app completes. Contributed by
Jim Brennan.
2020-07-14 18:26:15 +00:00
Eric E Payne
52f2303b5a YARN-10297. TestContinuousScheduling#testFairSchedulerContinuousSchedulingInitTime fails intermittently. Contributed by Jim Brennan (Jim_Brennan)
(cherry picked from commit 0427100b7543d412f4fafe631b7ace289662d28c)
2020-07-13 21:34:21 +00:00
Masatake Iwasaki
936dece92b YARN-10347. Fix double locking in CapacityScheduler#reinitialize in branch-3.1.
(cherry picked from commit 4fa8055aa4624b4073b95c89e4c3a58e8d8117a0)
2020-07-09 14:19:22 +09:00
Eric E Payne
e6794f2fc4 YARN-9903: Support reservations continue looking for Node Labels. Contributed by Jim Brennan (Jim_Brennan). 2020-06-29 19:21:04 +00:00
Szilard Nemeth
30d7a06686 YARN-10295. CapacityScheduler NPE can cause apps to get stuck without resources. Contributed by Benjamin Teke 2020-06-10 18:16:21 +02:00
Eric E Payne
034d458511 YARN-10300: appMasterHost not set in RM ApplicationSummary when AM fails before first heartbeat. Contributed by Eric Badger (ebadger).
(cherry picked from commit 56247db3022705635580c4d2f8b0abde109f954f)
2020-06-09 21:09:11 +00:00
Szilard Nemeth
54c89ffad4 YARN-10286. PendingContainers bugs in the scheduler outputs. Contributed by Andras Gyori 2020-06-05 09:49:54 +02:00
Jonathan Hung
f31146bc1f YARN-6492. Generate queue metrics for each partition. Contributed by Manikandan R
(cherry picked from commit c30c23cb665761e997bcfc1dc00908f70b069fa2)
(cherry picked from commit 7a323a45aad07eed532d684d6dbe8436ba39c31c)
2020-05-29 10:43:33 -07:00
Jonathan Hung
a7ea55e015 YARN-10260. Allow transitioning queue from DRAINING to RUNNING state. Contributed by Bilwa S T
(cherry picked from commit fff1d2c1226ec23841b04dd478e8b97f31abbeba)
(cherry picked from commit 564d3211f27c35bf3143a4bd1b3f8eeac2c6b01f)
2020-05-12 10:52:58 -07:00
Ahmed Hussein
7740b88ee9 YARN-8959. TestContainerResizing fails randomly (Ahmed Hussein via jeagles)
Signed-off-by: Jonathan Eagles <jeagles@gmail.com>
(cherry picked from commit 92e3ebb40199aec0890b868b8d6bf2d7fe90abbf)
2020-05-06 12:32:36 -05:00
Szilard Nemeth
1dabbd5006 YARN-10194. YARN RMWebServices /scheduler-conf/validate leaks ZK Connections. Contributed by Prabhu Joseph 2020-04-28 17:52:14 +02:00