
5316 Commits

Author SHA1 Message Date
Jonathan Hung 6863a5bb8a YARN-10651. CapacityScheduler crashed with NPE in AbstractYarnScheduler.updateNodeResource(). Contributed by Haibo Chen
(cherry picked from commit f348ab3f2f468751af329a1ffce4917cb000fcbf)
(cherry picked from commit be6e99963d)
2021-02-25 15:26:57 -08:00
Jim Brennan 3795f66364 [YARN-10613] Config to allow Intra- and Inter-queue preemption to enable/disable conservativeDRF. Contributed by Eric Payne 2021-02-25 20:07:30 +00:00
Eric Badger 4ef5ed382b YARN-10647. Fix TestRMNodeLabelsManager failed after YARN-10501. Contributed by
Qi Zhu.

(cherry picked from commit 47420ae3ed)
2021-02-22 19:07:51 +00:00
Eric Badger 2c25f80a9c YARN-10501. Can't remove all node labels after add node label without
nodemanager port. Contributed by caozhiqiang.

(cherry picked from commit 4891e68c2b)
2021-02-19 23:37:13 +00:00
Jim Brennan e6f5dbbe7f [YARN-10626] Log resource allocation in NM log at container start time. Contributed by Eric Badger 2021-02-16 17:19:15 +00:00
Jim Brennan 2117ab6b71 YARN-10500. TestDelegationTokenRenewer fails intermittently. (#2619) Contributed by Masatake Iwasaki 2021-02-11 21:32:07 +00:00
Jim Brennan 6cc0eb3e30 [YARN-10607] User environment is unable to prepend PATH when mapreduce.admin.user.env also sets PATH. Contributed by Eric Badger.
(cherry picked from commit c22c77af43)
2021-02-05 17:52:16 +00:00
Eric Badger d79f705a30 YARN-10562. Follow up changes for YARN-9833. Contributed by Jim Brennan.
(cherry picked from commit 768e2f42ba)
2021-01-13 23:53:16 +00:00
Eric Payne a093bd859d YARN-4589: Diagnostics for localization timeouts is lacking. Contributed by Chang Li (lichangleo) and Jim Brennan (Jim_Brennan) 2021-01-13 19:44:26 +00:00
He Xiaoqiao a17793534c Make upstream aware of 3.2.2 release. 2021-01-09 18:07:46 +08:00
Szilard Nemeth 59795ec3d6 YARN-10528. maxAMShare should only be accepted for leaf queues, not parent queues. Contributed by Siddharth Ahuja 2021-01-08 12:49:58 +01:00
Eric Badger 264dd67018 YARN-10540. Node page is broken in YARN UI1 and UI2 including RMWebService api
for nodes. Contributed by Jim Brennan.

(cherry picked from commit 4c5d88e230)
2020-12-21 23:22:43 +00:00
Andrea Scarpino 66e8f1ae09 YARN-10511. Update yarn.nodemanager.env-whitelist value in docs (#2512)
Reviewed-by: Adam Antal <adamantal@apache.org>
Reviewed-by: Akira Ajisaka <aajisaka@apache.org>
(cherry picked from commit 9170eb566b)
2020-12-04 00:17:48 +09:00
Eric Payne 1184284baf YARN-10278: CapacityScheduler test framework ProportionalCapacityPreemptionPolicyMockFramework. Contributed by Szilard Nemeth (snemeth) 2020-12-02 17:22:49 +00:00
kevinzhao1661 7e89bd5e42 YARN-10498. Fix typo in CapacityScheduler Markdown document (#2484)
Reviewed-by: Akira Ajisaka <aajisaka@apache.org>
(cherry picked from commit 4d2ae5b398)
2020-11-30 11:18:41 +09:00
Akira Ajisaka 752c890f9d YARN-10470. When building new web ui with root user, the bower install should support it. Contributed by zhuqi.
(cherry picked from commit c4ba0ab7df)
2020-11-24 15:23:13 +09:00
Ahmed Hussein c107e75424 YARN-10485. TimelineConnector swallows InterruptedException (#2450). Contributed by Ahmed Hussein
(cherry picked from commit 0b2510ee1f)
2020-11-16 21:21:58 +00:00
Peter Bacsko c5ae78b793 YARN-10396. Max applications calculation per queue disregards queue level settings in absolute mode. Contributed by Benjamin Teke. 2020-11-16 11:48:50 +01:00
Eric E Payne d6a55caa9a YARN-10479. RMProxy should retry on SocketTimeout Exceptions. Contributed by Jim Brennan (Jim_Brennan)
(cherry picked from commit 55339c2bdd)
2020-11-05 22:23:24 +00:00
Eric E Payne 31154fdde5 YARN-10475: Scale RM-NM heartbeat interval based on node utilization. Contributed by Jim Brennan (Jim_Brennan). 2020-11-02 17:33:57 +00:00
Jim Brennan 63888afdd0 YARN-10471. Prevent logs for any container from becoming larger than a configurable size. Contributed by Eric Payne 2020-10-29 20:17:51 +00:00
Jonathan Hung d0104e72c5 YARN-10467. ContainerIdPBImpl objects can be leaked in RMNodeImpl.completedContainers. Contributed by Haibo Chen
(cherry picked from commit bab5bf9743)
(cherry picked from commit f95c0824b0)
2020-10-28 10:38:58 -07:00
Eric Badger 4c61136616 YARN-10450. Add cpu and memory utilization per node and cluster-wide metrics.
Contributed by Jim Brennan.
2020-10-16 18:51:53 +00:00
He Xiaoqiao 3274fd139d Preparing for 3.2.3 development 2020-10-16 14:52:41 +08:00
Akira Ajisaka a2c1fb7c8c YARN-9848. Revert YARN-4946. Contributed by Steven Rand. 2020-10-16 01:04:45 +09:00
Jim Brennan e1c6804ace YARN-9667. Container-executor.c duplicates messages to stdout. Contributed by Peter Bacsko 2020-10-08 21:09:30 +00:00
Jim Brennan 4ef9cf9d71 YARN-10455. TestNMProxy.testNMProxyRPCRetry is not consistent. Contributed by Ahmed Hussein
(cherry picked from commit deb35a32ba)
2020-10-08 19:01:38 +00:00
Jim Brennan ecf91638a8 YARN-10451. RM (v1) UI NodesPage can NPE when yarn.io/gpu resource type is defined. Contributed by Eric Payne 2020-10-06 18:36:51 +00:00
Adam Antal b7420eb4b0 YARN-10393. MR job live lock caused by completed state container leak in heartbeat between node manager and RM. Contributed by zhenzhao wang and Jim Brennan
(cherry picked from commit a1f7e760df)
2020-10-05 10:39:14 +02:00
Eric E Payne 947b0a154a YARN-9809. Added node manager health status to resource manager registration call. Contributed by Eric Badger (ebadger). 2020-09-28 18:50:44 +00:00
Jim Brennan 1efb54bd52 YARN-10430. Log improvements in NodeStatusUpdaterImpl. Contributed by Bilwa S T. 2020-09-15 16:27:08 +00:00
Eric E Payne 5b14af6d09 YARN-10390: LeafQueue: retain user limits cache across assignContainers() calls. Contributed by Samir Khan (samkhan).
(cherry picked from commit 9afec2ed17)
2020-09-11 16:46:28 +00:00
bibinchundatt b5d24d646c YARN-10369. Make NMTokenSecretManagerInRM sending NMToken for nodeId DEBUG. Contributed by Jim Brennan.
(cherry picked from commit 5d8600e80a)
2020-09-08 21:05:26 +00:00
Eric Badger 01ada576f3 [YARN-10353] Log vcores used and cumulative cpu in containers monitor.
Contributed by Jim Brennan

(cherry picked from commit 736bed6d6d)
2020-09-08 16:14:26 +00:00
Adam Antal 696494d663 YARN-10332. RESOURCE_UPDATE event was repeatedly registered in DECOMMISSIONING state. Contributed by yehuanhuan
(cherry picked from commit 34fe74da0e)
2020-09-07 12:01:35 +02:00
Sunil G 94723bff64 Revert "YARN-10396. Max applications calculation per queue disregards queue level settings in absolute mode. Contributed by Benjamin Teke."
This reverts commit 2a40a33dfe.
2020-08-20 19:15:10 +05:30
Sunil G 2a40a33dfe YARN-10396. Max applications calculation per queue disregards queue level settings in absolute mode. Contributed by Benjamin Teke.
(cherry picked from commit 82ec28f442)
2020-08-19 12:00:33 +05:30
Jonathan Hung 17d18a2a3a YARN-10251. Show extended resources on legacy RM UI. Contributed by Eric Payne 2020-08-07 17:43:52 -07:00
Eric Badger 9a1db93b1b YARN-4575. ApplicationResourceUsageReport should return ALL reserved resource.
Contributed by Bibin Chundatt and Eric Payne.

(cherry picked from commit 5edd8b925e)
2020-08-05 19:03:48 +00:00
Eric E Payne 863689ff9a YARN-1529: Add Localization overhead metrics to NM. Contributed by Jim_Brennan.
(cherry picked from commit e0c9653166)
2020-07-30 17:08:02 +00:00
Jonathan Hung ffb920de2a YARN-10343. Legacy RM UI should include labeled metrics for allocated, total, and reserved resources. Contributed by Eric Payne 2020-07-28 13:44:17 -07:00
Eric Badger 7350773b69 YARN-4771. Some containers can be skipped during log aggregation after NM
restart. Contributed by Jason Lowe and Jim Brennan.

(cherry picked from commit ac5f21dbef)
2020-07-24 22:55:08 +00:00
Ayush Saxena 27a97e4f28 HADOOP-17100. Replace Guava Supplier with Java8+ Supplier in Hadoop. Contributed by Ahmed Hussein. 2020-07-22 18:39:49 +05:30
Akira Ajisaka 0d949d375e HADOOP-16753. Refactor HAAdmin. Contributed by Xieming Li.
(cherry picked from commit 1defe3a65a)
2020-07-20 13:20:34 -07:00
Ahmed Hussein 8fd3dcc9ce HADOOP-17099. Replace Guava Predicate with Java8+ Predicate
Signed-off-by: Jonathan Eagles <jeagles@gmail.com>
(cherry picked from commit 1f71c4ae71)
2020-07-15 12:05:49 -05:00
Ahmed Hussein 43a865dc07 HADOOP-17101. Replace Guava Function with Java8+ Function
Signed-off-by: Jonathan Eagles <jeagles@gmail.com>
(cherry picked from commit 98fcffe93f)
2020-07-15 10:18:47 -05:00
Eric Badger 09f1547697 YARN-10348. Allow RM to always cancel tokens after app completes. Contributed by
Jim Brennan.
2020-07-14 18:26:15 +00:00
Eric E Payne 52f2303b5a YARN-10297. TestContinuousScheduling#testFairSchedulerContinuousSchedulingInitTime fails intermittently. Contributed by Jim Brennan (Jim_Brennan)
(cherry picked from commit 0427100b75)
2020-07-13 21:34:21 +00:00
Masatake Iwasaki 936dece92b YARN-10347. Fix double locking in CapacityScheduler#reinitialize in branch-3.1.
(cherry picked from commit 4fa8055aa4)
2020-07-09 14:19:22 +09:00
Eric E Payne e6794f2fc4 YARN-9903: Support reservations continue looking for Node Labels. Contributed by Jim Brennan (Jim_Brennan). 2020-06-29 19:21:04 +00:00
Eric Badger 7363931942 YARN-10312. Add support for yarn logs -logFile to retain backward compatibility.
Contributed by Jim Brennan
2020-06-12 19:08:36 +00:00
Szilard Nemeth 30d7a06686 YARN-10295. CapacityScheduler NPE can cause apps to get stuck without resources. Contributed by Benjamin Teke 2020-06-10 18:16:21 +02:00
Szilard Nemeth 8bef26a607 YARN-10296. Make ContainerPBImpl#getId/setId synchronized. Contributed by Benjamin Teke 2020-06-10 18:01:20 +02:00
Eric E Payne 034d458511 YARN-10300: appMasterHost not set in RM ApplicationSummary when AM fails before first heartbeat. Contributed by Eric Badger (ebadger).
(cherry picked from commit 56247db302)
2020-06-09 21:09:11 +00:00
Szilard Nemeth 54c89ffad4 YARN-10286. PendingContainers bugs in the scheduler outputs. Contributed by Andras Gyori 2020-06-05 09:49:54 +02:00
Jonathan Hung f31146bc1f YARN-6492. Generate queue metrics for each partition. Contributed by Manikandan R
(cherry picked from commit c30c23cb66)
(cherry picked from commit 7a323a45aa)
2020-05-29 10:43:33 -07:00
Jonathan Hung a7ea55e015 YARN-10260. Allow transitioning queue from DRAINING to RUNNING state. Contributed by Bilwa S T
(cherry picked from commit fff1d2c122)
(cherry picked from commit 564d3211f2)
2020-05-12 10:52:58 -07:00
Szilard Nemeth d345994468 YARN-9444. YARN API ResourceUtils's getRequestedResourcesFromConfig doesn't recognize yarn.io/gpu as a valid resource. Contributed by Gergely Pollak
(cherry picked from commit 52e9ee39a1)
2020-05-07 18:17:06 +00:00
Ahmed Hussein 7740b88ee9 YARN-8959. TestContainerResizing fails randomly (Ahmed Hussein via jeagles)
Signed-off-by: Jonathan Eagles <jeagles@gmail.com>
(cherry picked from commit 92e3ebb401)
2020-05-06 12:32:36 -05:00
Ahmed Hussein b23a585cb1 YARN-10256. Refactor TestContainerSchedulerQueuing.testContainerUpdateExecTypeGuaranteedToOpportunistic (Ahmed Hussein via jeagles)
Signed-off-by: Jonathan Eagles <jeagles@gmail.com>
(cherry picked from commit f5081a9a5d)
2020-05-04 10:49:45 -05:00
Szilard Nemeth 1dabbd5006 YARN-10194. YARN RMWebServices /scheduler-conf/validate leaks ZK Connections. Contributed by Prabhu Joseph 2020-04-28 17:52:14 +02:00
Szilard Nemeth f445487d50 YARN-10189. Code cleanup in LeveldbRMStateStore. Contributed by Benjamin Teke 2020-04-27 09:59:13 +02:00
Szilard Nemeth 0dbb02a76c YARN-9999. TestFSSchedulerConfigurationStore: Extend from ConfigurationStoreBaseTest, general code cleanup. Contributed by Benjamin Teke 2020-04-24 11:31:33 +02:00
Szilard Nemeth 3b67dc24aa YARN-9998. Code cleanup in LeveldbConfigurationStore. Contributed by Benjamin Teke 2020-04-24 11:15:53 +02:00
Akira Ajisaka 7b036c512f YARN-10223. Remove jersey-test-framework-core dependency from yarn-server-common. (#1939)
(cherry picked from commit 9827ff2961)
2020-04-24 10:47:36 +09:00
Wei-Chiu Chuang 48f1c8ffb6 Revert "YARN-10063. Add container-executor arguments --http/--https to usage. Contributed by Siddharth Ahuja"
This reverts commit a2067aafa9.
2020-04-23 12:37:21 -07:00
Szilard Nemeth c81844d8a5 YARN-9996. Code cleanup in QueueAdminConfigurationMutationACLPolicy. Contributed by Siddharth Ahuja 2020-04-23 14:57:18 +02:00
Szilard Nemeth 764fa92c9f YARN-9997. Code cleanup in ZKConfigurationStore. Contributed by Andras Gyori 2020-04-23 14:51:07 +02:00
Szilard Nemeth 73cb3d3cb3 YARN-10001. Add explanation of unimplemented methods in InMemoryConfigurationStore. Contributed by Siddharth Ahuja 2020-04-18 09:38:22 +02:00
Jonathan Hung d1af4e0fae YARN-9954. Configurable max application tags and max tag length. Contributed by Bilwa S T
(cherry picked from commit 49ae9b2137)
2020-04-17 10:36:16 -07:00
Szilard Nemeth 2f01a91428 YARN-10002. Code cleanup and improvements in ConfigurationStoreBaseTest. Contributed by Benjamin Teke 2020-04-15 08:24:15 +02:00
Szilard Nemeth 58e559b5ac YARN-9354. Resources should be created with ResourceTypesTestHelper instead of TestUtils. Contributed by Andras Gyori 2020-04-15 08:16:15 +02:00
Szilard Nemeth a7eb95f114 YARN-9995. Code cleanup in TestSchedConfCLI. Contributed by Bilwa S T. 2020-04-15 08:05:07 +02:00
Szilard Nemeth 9af7d905f9 YARN-5277. When localizers fail due to resource timestamps being out, provide more diagnostics. Contributed by Siddharth Ahuja 2020-04-15 07:55:49 +02:00
Jonathan Hung 54599b177c YARN-10212. Create separate configuration for max global AM attempts. Contributed by Bilwa S T
(cherry picked from commit 57659422abbf6d9bf52e6e27fca775254bb77a56)
(cherry picked from commit e3a52804b03d646f15048c078f8c5292d5cbecfa)
2020-04-09 10:37:36 -07:00
Szilard Nemeth d2853d1bb0 YARN-10003. YarnConfigurationStore#checkVersion throws exception that belongs to RMStateStore. Contributed by Benjamin Teke 2020-04-09 17:40:26 +02:00
Szilard Nemeth 9a66477e4d YARN-10207. CLOSE_WAIT socket connection leaks during rendering of (corrupted) aggregated logs on the JobHistoryServer Web UI. Contributed by Siddharth Ahuja 2020-04-09 12:22:12 +02:00
Akira Ajisaka 2a2c3a4094 HADOOP-14836. Upgrade maven-clean-plugin to 3.1.0 (#1933)
(cherry picked from commit e53d472bb0)
2020-04-09 01:49:28 +09:00
Wilfred Spiegelenburg a2067aafa9 YARN-10063. Add container-executor arguments --http/--https to usage. Contributed by Siddharth Ahuja
Conflicts:
	hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/native/container-executor/impl/main.c

(cherry picked from commit 2214005c0f)
2020-04-08 13:12:31 +10:00
Akira Ajisaka 6d5f87b228 YARN-10202. Fix documentation about NodeAttributes. Contributed by Sen Zhao.
(cherry picked from commit c162648aff)
2020-04-01 16:06:41 +09:00
Jonathan Hung 5d3fb0ebe9 YARN-10200. Add number of containers to RMAppManager summary
(cherry picked from commit 2de0572cdc1c6fdbfaab108b169b2d5b0c077e86)
2020-03-25 10:27:48 -07:00
Siyao Meng 29e1880d27 HADOOP-16935. Backport HADOOP-10848. Cleanup calling of sun.security.krb5.Config. (#1912)
(cherry picked from commit 0d47d283a6)

Co-authored-by: Akira Ajisaka <aajisaka@apache.org>
2020-03-24 16:01:33 -07:00
Szilard Nemeth 9e0d742025 YARN-9419. Log a warning if GPU isolation is enabled but LinuxContainerExecutor is disabled. Contributed by Andras Gyori 2020-03-10 16:39:03 +01:00
Eric E Payne 153eac1d21 YARN-942. TestContainerSchedulerQueuing.testKillOnlyRequiredOpportunisticContainers fails sporadically. Contributed by Ahmed Hussein (ahussein)
(cherry picked from commit ede05b19d1)
2020-03-10 14:28:13 +00:00
Inigo Goiri 733f9b76b6 YARN-10161. TestRouterWebServicesREST is corrupting STDOUT. Contributed by Jim Brennan.
(cherry picked from commit a43510e21d)
2020-02-27 13:19:43 -08:00
Elixir Kook 7d0ff2dc85 YARN-10156. Fix typo 'complaint' which means quite different in Federation.md (#1856)
(cherry picked from commit d608e94f92)
2020-02-26 17:32:09 +09:00
Szilard Nemeth 92ad3bd099 YARN-10143. YARN-10101 broke Yarn logs CLI. Contributed by Adam Antal 2020-02-24 21:27:31 +01:00
Sunil G de63115a2a YARN-10139. ValidateAndGetSchedulerConfiguration API fails when cluster max allocation > default 8GB. Contributed by Prabhu Joseph.
(cherry picked from commit 6526f95bd2)
2020-02-19 11:18:19 +05:30
Szilard Nemeth 6aec712c6c YARN-10101. Support listing of aggregated logs for containers belonging to an application attempt. Contributed by Adam Antal 2020-02-11 09:18:44 +01:00
Sunil G 95b1cbcbd4 YARN-10109. Allow stop and convert from leaf to parent queue in a single Mutation API call. Contributed by Prabhu Joseph
(cherry picked from commit 28f730b317)
2020-02-09 21:15:31 +05:30
Jonathan Hung aca930402c YARN-10116. Expose diagnostics in RMAppManager summary
(cherry picked from commit 314e2f9d2e)
2020-02-05 11:17:03 -08:00
Prabhu Joseph 7136ebbb7a YARN-10022. Add RM Rest API to validate a CapacityScheduler Config with delta change
Contributed by Kinga Marton.

(cherry-picked from commit 1ab9c692fa)
2020-02-04 14:06:23 +05:30
Eric Badger 5736ecd123 YARN-10084. Allow inheritance of max app lifetime / default app lifetime. Contributed by Eric Payne. 2020-01-29 04:07:28 +00:00
Abhishek Modi be412546be YARN-9790. Failed to set default-application-lifetime if maximum-application-lifetime is less than or equal to zero. Contributed by kyungwan nam.
(cherry picked from commit d2d963f3d4)
2020-01-23 15:23:45 +00:00
Szilard Nemeth dcc16b07fa YARN-10083. Addendum to fix compilation error due to missing import 2020-01-22 17:18:03 +01:00
Szilard Nemeth 1e7679035f YARN-7913. Improve error handling when application recovery fails with exception. Contributed by Wilfred Spiegelenburg 2020-01-22 16:50:15 +01:00
Szilard Nemeth da416c826f YARN-10083. Provide utility to ask whether an application is in final status. Contributed by Adam Antal 2020-01-22 16:18:35 +01:00
Szilard Nemeth bbdc39c13e YARN-9462. TestResourceTrackerService.testNodeRemovalGracefully fails sporadically. Contributed by Prabhu Joseph 2020-01-22 15:49:35 +01:00
Szilard Nemeth c411a1468f YARN-8148. Update decimal values for queue capacities shown on queue status CLI. Contributed by Prabhu Joseph 2020-01-20 09:29:52 +01:00
Szilard Nemeth 805589fc71 YARN-9970. Refactor TestUserGroupMappingPlacementRule#verifyQueueMapping. Contributed by Manikandan R 2020-01-16 18:51:56 +01:00