1966 Commits

Author SHA1 Message Date
Eric Payne 9ee5265fb3 YARN-10178: Global Scheduler async thread crash caused by 'Comparison method violates its general contract'. Contributed by Andras Gyori (gandras) and Qi Zhu (zhuqi). 2021-12-21 19:48:06 +00:00
Sunil G 29f81c6121 YARN-9984. FSPreemptionThread can cause NullPointerException while app is unregistered with containers running on a node. Contributed by Wilfred Spiegelenburg.
(cherry picked from commit 215f2052fc)

 Conflicts:
	hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/fair/FSPreemptionThread.java
2021-11-29 14:35:47 +09:00
Shubham Gupta 484cac36fd YARN-10438. Handle null containerId in ClientRMService#getContainerReport() (#2313)
Co-authored-by: Shubham Gupta <gshubham@microsoft.com>
(cherry picked from commit e3cd627069)
2021-11-29 14:21:56 +09:00
Ahmed Hussein de120b16ad YARN-1115: Provide optional means for a scheduler to check real user ACLs. Contributed by Eric Payne (epayne) 2021-10-22 17:02:38 +00:00
Weiwei Yang 5f2047d491 YARN-8222. Fix potential NPE when gets RMApp from RM context. Contributed by Tao Yang.
(cherry picked from commit 251f528814)
2021-10-12 17:43:43 +00:00
Weiwei Yang bdd396b26d YARN-8546. Resource leak caused by a reserved container being released more than once under async scheduling. Contributed by Tao Yang.
(cherry picked from commit 5be9f4a5d0)
2021-10-08 16:08:45 +00:00
Weiwei Yang dc03afc7df YARN-8127. Resource leak when async scheduling is enabled. Contributed by Tao Yang.
(cherry picked from commit 7eb783e263)
2021-10-04 20:16:40 +00:00
Eric Badger 008bd8afc3 YARN-10935. AM Total Queue Limit goes below per-user AM Limit if parent is full. Contributed by Eric Payne. 2021-09-23 17:12:45 +00:00
Szilard Nemeth b196130c29 YARN-10428. Zombie applications in the YARN queue using FAIR + sizebasedweight. Contributed by Guang Yang, Andras Gyori
(cherry picked from commit 79a46599f7)

 Conflicts:
	hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/policy/TestFairOrderingPolicy.java

(cherry picked from commit 7aea2e1b5c)
2021-09-01 13:16:30 +09:00
zhuqi-lucas 34acf9d4c8 YARN-10860. Make max container per heartbeat configs refreshable. Contributed by Eric Badger. 2021-07-21 15:35:45 +08:00
Jim Brennan 577ed175f9 YARN-10456. RM PartitionQueueMetrics records are named QueueMetrics in Simon metrics registry. Contributed by Eric Payne. 2021-07-15 15:21:02 +00:00
Jim Brennan f7bcc58e0f YARN-10834. Intra-queue preemption: apps that don't use defined custom resource won't be preempted. Contributed by Eric Payne. 2021-06-29 14:22:39 +00:00
lujiefsi 13a2e751e0 YARN-10555. Missing access check before getAppAttempts (#2608)
Co-authored-by: lujie <lujie@foxmail.com>
Signed-off-by: Akira Ajisaka <aajisaka@apache.org>
(cherry picked from commit d92a25b790)
2021-05-17 19:47:03 +09:00
Eric Badger 7b3a6e96d9 YARN-10479. Can't remove all node labels after add node label without nodemanager port, broken by YARN-10647. Contributed by D M Murali Krishna Reddy

(cherry picked from commit 6857a05d6a)
2021-04-23 23:10:50 +00:00
Jim Brennan 33c4d4570d YARN-10697. Resources are displayed in bytes in UI for schedulers other than capacity. Contributed by Bilwa S T.
(cherry picked from commit 34e507cb8c)
2021-03-23 19:05:57 +00:00
Eric Payne d53ca0b887 YARN-10588. Percentage of queue and cluster is zero in WebUI. Contributed by Bilwa S T
(cherry picked from commit aa4c17b9d7)
2021-03-15 20:12:58 +00:00
Jonathan Hung 1d76a8e73f YARN-10651. CapacityScheduler crashed with NPE in AbstractYarnScheduler.updateNodeResource(). Contributed by Haibo Chen
(cherry picked from commit f348ab3f2f468751af329a1ffce4917cb000fcbf)
(cherry picked from commit be6e99963d)
(cherry picked from commit 6863a5bb8a)
(cherry picked from commit eb6c08e423)
2021-02-25 15:47:36 -08:00
Jim Brennan 4ed7b80b19 [YARN-10613] Config to allow Intra- and Inter-queue preemption to enable/disable conservativeDRF. Contributed by Eric Payne 2021-02-25 20:30:42 +00:00
Jim Brennan d0562d6cd0 YARN-10500. TestDelegationTokenRenewer fails intermittently. (#2619) Contributed by Masatake Iwasaki 2021-02-11 22:45:08 +00:00
Eric Badger 7b4034cd88 YARN-6977. Node information is not provided for non am containers in RM logs. (Suma Shivaprasad via wangda)
Change-Id: I0c44d09a560446dee2ba68c2b9ae69fce0ec1d3e
(cherry picked from commit 8a42e922fad613f3cf1cc6cb0f3fa72546a9cc56)
(cherry picked from commit 958e8c0e25)
2021-02-08 20:04:56 +00:00
Jonathan Hung 6f436a6776 YARN-10467. ContainerIdPBImpl objects can be leaked in RMNodeImpl.completedContainers. Contributed by Haibo Chen 2020-10-28 10:45:34 -07:00
Eric Badger c4b42fa1ae YARN-10450. Add cpu and memory utilization per node and cluster-wide metrics. Contributed by Jim Brennan.
2020-10-16 19:29:04 +00:00
Jim Brennan 0bf270d2ed YARN-10451. RM (v1) UI NodesPage can NPE when yarn.io/gpu resource type is defined. Contributed by Eric Payne
(cherry picked from commit ecf91638a8)
2020-10-06 18:46:08 +00:00
Masatake Iwasaki f4e0c14fe9 Preparing for 2.10.2 development 2020-09-13 14:33:36 +09:00
Eric E Payne e5bd8d2840 YARN-10177: Backport YARN-7307 to branch-2.10 Allow client/AM update supported resource types via YARN APIs 2020-09-04 18:23:08 +00:00
Eric E Payne 21788f9fd4 YARN-8459. Improve Capacity Scheduler logs to debug invalid states. Contributed by Wangda Tan and Jim Brennan. 2020-08-10 20:52:44 +00:00
Jonathan Hung 865828ae63 YARN-10251. Show extended resources on legacy RM UI. Contributed by Eric Payne 2020-08-07 17:45:04 -07:00
Eric Badger 9bf554deae YARN-4575. ApplicationResourceUsageReport should return ALL reserved resource. Contributed by Bibin Chundatt and Eric Payne.

(cherry picked from commit 647be0c0f6)
2020-08-05 23:20:52 +00:00
Jonathan Hung 50e68e67b6 YARN-10343. Legacy RM UI should include labeled metrics for allocated, total, and reserved resources. Contributed by Eric Payne 2020-07-28 13:45:14 -07:00
Eric Badger a4b419cdf5 YARN-10348. Allow RM to always cancel tokens after app completes. Contributed by Jim Brennan.

(cherry picked from commit 09f1547697)
2020-07-14 18:33:56 +00:00
Eric E Payne 7190507aa2 YARN-10297. TestContinuousScheduling#testFairSchedulerContinuousSchedulingInitTime fails intermittently. Contributed by Jim Brennan (Jim_Brennan)
(cherry picked from commit 0427100b75)
2020-07-13 21:51:32 +00:00
Eric E Payne 76fa956d3b YARN-9903: Support reservations continue looking for Node Labels. Contributed by Jim Brennan (Jim_Brennan).
(cherry picked from commit e6794f2fc4)
2020-06-29 19:55:18 +00:00
Tao Yang a91d4d612f YARN-8011. TestOpportunisticContainerAllocatorAMService#testContainerPromoteAndDemoteBeforeContainerStart fails intermittently. Contributed by Jim Brennan. 2020-06-12 10:56:24 +08:00
Eric E Payne af324e3153 YARN-10300: appMasterHost not set in RM ApplicationSummary when AM fails before first heartbeat. Contributed by Eric Badger (ebadger).
(cherry picked from commit 56247db302)
(cherry picked from commit 034d458511)
(cherry picked from commit 2e4892061a)
2020-06-09 22:16:16 +00:00
Jonathan Hung b9a0f99966 YARN-6492. Generate queue metrics for each partition. Contributed by Manikandan R 2020-06-01 10:48:41 -07:00
Jonathan Hung b8c88f6968 YARN-10260. Allow transitioning queue from DRAINING to RUNNING state. Contributed by Bilwa S T
(cherry picked from commit fff1d2c122)
(cherry picked from commit 564d3211f2)
(cherry picked from commit a7ea55e015)
(cherry picked from commit b3e9aff5f7)
2020-05-12 10:53:37 -07:00
Ahmed Hussein 0f0707fb0d YARN-8959. TestContainerResizing fails randomly (Ahmed Hussein via jeagles)
Signed-off-by: Jonathan Eagles <jeagles@gmail.com>
2020-05-06 12:48:12 -05:00
Jonathan Hung 27ad054696 YARN-8193. YARN RM hangs abruptly (stops allocating resources) when running successive applications. (Zian Chen via wangda) 2020-04-30 12:16:15 -07:00
Wangda Tan 34804679e3 YARN-8369. Javadoc build failed due to 'bad use of >'. (Takanobu Asanuma via wangda)
Change-Id: I79a42154e8f86ab1c3cc939b3745024b8eebe5f4
(cherry picked from commit 17aa40f669)
2020-04-19 12:51:56 +09:00
Jonathan Hung ebce5c74e6 YARN-9954. Configurable max application tags and max tag length. Contributed by Bilwa S T
(cherry picked from commit cd6c10de442fc3a53c9ed5521ac1d944a6ac95c6)
(cherry picked from commit 2c79865b951d0fdea7f576ce31e310b4074ecedd)
2020-04-17 10:35:39 -07:00
Jonathan Hung c0394c5434 YARN-10212. Create separate configuration for max global AM attempts. Contributed by Bilwa S T
(cherry picked from commit 57659422abbf6d9bf52e6e27fca775254bb77a56)
(cherry picked from commit e3a52804b03d646f15048c078f8c5292d5cbecfa)
(cherry picked from commit 54599b177c)
(cherry picked from commit 6271a2852e)
2020-04-09 11:07:59 -07:00
Jonathan Hung a7556f1ec2 YARN-8213. Add Capacity Scheduler performance metrics. (Weiwei Yang via wangda) 2020-03-27 16:10:39 -07:00
Jonathan Hung 1c8529f030 YARN-10200. Add number of containers to RMAppManager summary
(cherry picked from commit 2de0572cdc1c6fdbfaab108b169b2d5b0c077e86)
(cherry picked from commit 5d3fb0ebe9)
(cherry picked from commit 9c6dd8c83a)
2020-03-25 10:39:45 -07:00
Jonathan Hung c34c87b1a8 YARN-8292. Fix the dominant resource preemption cannot happen when some of the resource vector becomes negative. Contributed by Wangda Tan and Eric Payne 2020-02-07 17:29:32 -08:00
Jonathan Hung 4fce8c8023 YARN-10116. Expose diagnostics in RMAppManager summary
(cherry picked from commit 314e2f9d2e)
(cherry picked from commit 147897da4b420b4749f3c7b410f4c329632c3352)
(cherry picked from commit fa35b8370ce14c9b8ee911b73fda380817b964fd)
2020-02-05 11:16:09 -08:00
Eric Badger 21970f6f67 YARN-10084. Allow inheritance of max app lifetime / default app lifetime. Contributed by Eric Payne. 2020-01-30 21:29:33 +00:00
Abhishek Modi 296786a647 YARN-9790. Failed to set default-application-lifetime if maximum-application-lifetime is less than or equal to zero. Contributed by kyungwan nam.
(cherry picked from commit d2d963f3d4)
2020-01-23 17:12:25 +00:00
Eric E Payne 5cca5ca81b YARN-7387: org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.TestIncreaseAllocationExpirer fails intermittently. Contributed by Jim Brennan (Jim_Brennan)
(cherry picked from commit b1e07d27cc)
2020-01-08 19:59:13 +00:00
Eric E Payne 2ae1b3568b YARN-10072: TestCSAllocateCustomResource failures. Contributed by Jim Brennan (Jim_Brennan)
(cherry picked from commit 6899be5a17)
2020-01-08 18:04:12 +00:00
Eric Badger cb5b80d6cb YARN-10009. In Capacity Scheduler, DRC can treat minimum user limit percent as a max when custom resource is defined. Contributed by Eric Payne. 2019-12-20 19:40:55 +00:00