Commit Graph

4942 Commits

Author SHA1 Message Date
Jonathan Hung 95ec7050b5 Addendum to YARN-9730. Support forcing configured partitions to be exclusive based on app node label
(cherry picked from commit d86a1acc866cbda845fb3896dc824baf12217383)
(cherry picked from commit f4f210d2e5)
2019-09-25 17:49:52 -07:00
Jonathan Hung 783cbced1d YARN-9730. Support forcing configured partitions to be exclusive based on app node label
(cherry picked from commit 73a044a63822303f792183244e25432528ecfb1e)
(cherry picked from commit dd094d79023f6598e47146166aa8c213e03d41b7)
2019-09-24 13:52:14 -07:00
Jonathan Hung 6a1d2d56bd YARN-9762. Add submission context label to audit logs. Contributed by Manoj Kumar
(cherry picked from commit 3d78b1223d)
(cherry picked from commit a1fa9a8a7f)
2019-09-23 13:13:09 -07:00
Sunil G 45cf3de2e9 YARN-9833. Race condition when DirectoryCollection.checkDirs() runs during container launch. Contributed by Peter Bacsko.
(cherry picked from commit c474e24c0b)
2019-09-18 09:23:46 +05:30
Weiwei Yang 1978317068 YARN-2255. YARN Audit logging not added to log4j.properties. Contributed by Aihua Xu.
(cherry picked from commit f8c14326ee)
2019-09-18 09:20:37 +08:00
Eric Yang dae22c962d YARN-9837. Fixed reading YARN Service JSON spec file larger than 128k.
Contributed by Tarun Parimi
2019-09-17 13:17:13 -04:00
Jonathan Hung d75693bd6e YARN-9824. Fall back to configured queue ordering policy class name
(cherry picked from commit f8f8598ea5)
(cherry picked from commit 1dbf87c9ff)
2019-09-10 15:31:58 -07:00
Jonathan Hung 80735a15a5 YARN-8541 (branch-3.1 addendum): RM startup failure on recovery after user deletion 2019-09-09 20:15:42 -07:00
bibinchundatt aee8fb567b YARN-8948. PlacementRule interface should be for all YarnSchedulers. Contributed by Bibin A Chundatt.
(cherry picked from commit a68d766e87)
(cherry picked from commit e10050678d)
2019-09-09 20:00:46 -07:00
Wangda Tan 9e8ff94d16 YARN-8361. Change App Name Placement Rule to use App Name instead of App Id for configuration. (Zian Chen via wangda)
Change-Id: I17e5021f8f611a9c5e3bd4b38f25e08585afc6b1
(cherry picked from commit a2e49f41a8)
2019-09-09 20:00:33 -07:00
Wangda Tan 81d63d5ea1 YARN-8016. Refine PlacementRule interface and add a app-name queue mapping rule as an example. (Zian Chen via wangda)
Change-Id: I35caf1480e0f76f5f3a53528af09312e39414bbb
(cherry picked from commit a90471b3e6)
2019-09-09 19:59:50 -07:00
Jonathan Hung 0e88bcd8e6 YARN-9820. RM logs InvalidStateTransitionException when app is submitted. Contributed by Prabhu Joseph 2019-09-09 00:27:33 -07:00
Jonathan Hung 080fc6d943 YARN-9764. Print application submission context label in application summary. Contributed by Manoj Kumar
(cherry picked from commit 43e389b980)
(cherry picked from commit 45220d1157)
2019-09-08 19:15:12 -07:00
Wangda Tan 0ee7d09138 YARN-9813. RM does not start on JDK11 when UIv2 is enabled. (Adam Antal/Eric Yang via wangda)
Change-Id: I18b8edc930b2efa0652f59c246931ad0d46827f3
(cherry picked from commit 34b82e6da0)
2019-09-06 19:19:59 -07:00
Tao Yang 36c1caced5 YARN-8995. Log events info in AsyncDispatcher when event queue size cumulatively reaches a certain number every time(addendum). Contributed by Jonathan Hung. 2019-09-07 08:19:02 +08:00
Tao Yang 1f6f4a2457 YARN-9795. ClusterMetrics to include AM allocation delay. Contributed by Fengnan Li. 2019-09-07 07:54:30 +08:00
Jonathan Hung 980a922481 YARN-9763. Print application tags in application summary. Contributed by Manoj Kumar 2019-09-06 10:52:37 -07:00
Jonathan Hung 11f6e3bc41 YARN-9761. Allow overriding application submissions based on server side configs. Contributed by Pralabh Kumar 2019-09-06 10:02:20 -07:00
Billie Rinaldi 619bd1e876 YARN-9718. Fixed yarn.service.am.java.opts shell injection. Contributed by Eric Yang
(cherry picked from commit 2e2e5401f2)
2019-09-05 14:19:05 -07:00
Jonathan Hung 37d1f8c81e YARN-9810. Add queue capacity/maxcapacity percentage metrics. Contributed by Shubham Gupta
(cherry picked from commit 0ccf4b0fe1)
(cherry picked from commit cb806988d72bde1f9837c9e0fb82a3a6c032542c)
2019-09-05 14:06:09 -07:00
Tao Yang d74c069427 YARN-8995. Log events info in AsyncDispatcher when event queue size cumulatively reaches a certain number every time. Contributed by zhuqi. 2019-09-05 16:54:55 +08:00
Zhankun Tang ef79d98788 Preparing for 3.1.4 development 2019-09-04 16:11:36 +08:00
Zhankun Tang fff4fbc957 YARN-9785. Fix DominantResourceCalculator when one resource is zero. Contributed by Bibin A Chundatt, Sunil Govindan, Bilwa S T. 2019-09-04 12:05:29 +08:00
bibinchundatt 3210d1e993 YARN-9797. LeafQueue#activateApplications should use resourceCalculator#fitsIn. Contributed by Bilwa S T.
(cherry picked from commit 03489124ea)
2019-09-03 11:56:19 +05:30
Akira Ajisaka 3c9d2f5317 YARN-9162. Fix TestRMAdminCLI#testHelp. Contributed by Ayush Saxena.
(cherry picked from commit 5db7c49062)
(cherry picked from commit a453f38015)
2019-08-30 17:55:35 -07:00
Eric E Payne 51896ff7e6 YARN-9756: Create metric that sums total memory/vcores preempted per round. Contributed by Manikandan R (manirajv06).
(cherry picked from commit d562050cec)
2019-08-28 21:05:23 +00:00
Jonathan Hung f73842780e YARN-9438. launchTime not written to state store for running applications
(cherry picked from commit 9568656cd21d9c02168e18ce35c6726077bbf3a1)
(cherry picked from commit 0c498de6e87c6bdc959afa31deb03d0907e0e1a1)
2019-08-27 15:45:42 -07:00
Jonathan Hung 6baa0d1e4d YARN-9775. RMWebServices /scheduler-conf GET returns all hadoop configurations for ZKConfigurationStore. Contributed by Prabhu Joseph
(cherry picked from commit 8660e48ca1)
(cherry picked from commit e4249c3202)
2019-08-26 15:55:11 -07:00
bibinchundatt eb618e4f22 YARN-9642. Fix Memory Leak in AbstractYarnScheduler caused by timer. Contributed by Bibin A Chundatt.
(cherry picked from commit d3ce53e507)
2019-08-26 23:25:16 +05:30
Szilard Nemeth fd2e353236 YARN-9100. Add tests for GpuResourceAllocator and do minor code cleanup. Contributed by Peter Bacsko 2019-08-16 15:27:10 +02:00
Szilard Nemeth 0a379e94ba YARN-8586. Extract log aggregation related fields and methods from RMAppImpl. Contributed by Peter Bacsko 2019-08-16 12:15:27 +02:00
Szilard Nemeth 94114378ce YARN-9488. Skip YARNFeatureNotEnabledException from ClientRMService. Contributed by Prabhu Joseph
(cherry picked from commit 1845a83cec)
2019-08-15 17:16:32 +02:00
Szilard Nemeth aa0631a042 YARN-9140. Code cleanup in ResourcePluginManager.initialize and in TestResourcePluginManager. Contributed by Peter Bacsko 2019-08-14 19:04:09 +02:00
Eric Badger a995e6352f YARN-9442. container working directory has group read permissions. Contributed by Jim Brennan.
(cherry picked from commit 2ac029b949)

Conflicts:
	hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/native/container-executor/test/test-container-executor.c

(cherry picked from commit cec71691be)
2019-08-13 17:16:57 +00:00
Szilard Nemeth cb91ab73b0 YARN-9135. NM State store ResourceMappings serialization are tested with Strings instead of real Device objects. Contributed by Peter Bacsko
(cherry picked from commit 8b3c6791b1)
2019-08-13 15:47:57 +02:00
Szilard Nemeth a762a6be29 Revert "YARN-9135. NM State store ResourceMappings serialization are tested with Strings instead of real Device objects. Contributed by Peter Bacsko"
This reverts commit b20fd9e212.
Commit is reverted since unnecessary files were added, accidentally.
2019-08-13 15:47:57 +02:00
Szilard Nemeth 9da9b6d58e YARN-9723. ApplicationPlacementContext is not required for terminated jobs during recovery. Contributed by Prabhu Joseph
(cherry picked from commit e4b538bbda)
2019-08-12 15:16:49 +02:00
Szilard Nemeth 148121d889 YARN-9451. AggregatedLogsBlock shows wrong NM http port. Contributed by Prabhu Joseph
(cherry picked from commit b91099efd6)
2019-08-12 15:06:48 +02:00
Szilard Nemeth 6b4ded7647 YARN-9135. NM State store ResourceMappings serialization are tested with Strings instead of real Device objects. Contributed by Peter Bacsko 2019-08-12 14:03:50 +02:00
Sunil G 58ad5ad493 YARN-9729. [UI2] Fix error message for logs when ATSv2 is offline. Contributed by Zoltan Siegl.
(cherry picked from commit 1c5b28659f)
2019-08-11 11:49:57 +05:30
Szilard Nemeth be9ac8adf9 Logging fileSize of log files under NM Local Dir. Contributed by Prabhu Joseph
(cherry picked from commit 54ac80176e)
2019-08-09 13:23:49 +02:00
Sunil G 1c4364cb9a YARN-9715. [UI2] yarn-container-log URI need to be encoded to avoid potential misuses. Contributed by Akhil PB.
(cherry picked from commit acffec7a92)
2019-08-09 16:06:07 +05:30
Adam Antal 600a61f410 YARN-9124. Resolve contradiction in ResourceUtils: addMandatoryResources / checkMandatoryResources work differently (#1121)
(cherry picked from commit cbcada804d)
2019-08-09 11:44:22 +02:00
Szilard Nemeth 410f7a3069 YARN-9092. Create an object for cgroups mount enable and cgroups mount path as they belong together. Contributed by Gergely Pollak
(cherry picked from commit e0c21c6da9)
2019-08-09 10:25:12 +02:00
Szilard Nemeth b2f39f81fe YARN-9096: Some GpuResourcePlugin and ResourcePluginManager methods are synchronized unnecessarily. Contributed by Gergely Pollak
(cherry picked from commit 742e30b473)
2019-08-09 10:05:40 +02:00
Szilard Nemeth 943dfc78d1 YARN-9094: Remove unused interface method: NodeResourceUpdaterPlugin#handleUpdatedResourceFromRM. Contributed by Gergely Pollak
(cherry picked from commit 72d7e570a7)
2019-08-09 09:53:14 +02:00
Eric E Payne b131214685 YARN-9685: NPE when rendering the info table of leaf queue in non-accessible partitions. Contributed by Tao Yang.
(cherry picked from commit 3b38f2019e)
2019-08-08 13:08:05 +00:00
Haibo Chen f943bff254 YARN-9559. Create AbstractContainersLauncher for pluggable ContainersLauncher logic. (Contributed by Jonathan Hung)
(cherry picked from commit f51702d539)
(cherry picked from commit 8d357343c4)
2019-08-06 15:01:06 -07:00
Eric Badger 698e74d097 YARN-8045. Reduce log output from container status calls. Contributed by Craig Condit
(cherry picked from commit 144a55f0e3)
2019-08-02 20:41:26 +00:00
Eric E Payne 36af8845de YARN-9596: QueueMetrics has incorrect metrics when labelled partitions are involved. Contributed by Muhammad Samir Khan.
(cherry picked from commit 42683aef1a)
2019-07-30 19:45:00 +00:00
Jonathan Hung 3ff2148482 YARN-9668. UGI conf doesn't read user overridden configurations on RM and NM startup. (Contributed by Jonanthan Hung) 2019-07-22 10:54:08 -07:00
Weiwei Yang 48192531ad YARN-9682. Wrong log message when finalizing the upgrade. Contributed by kyungwan nam.
(cherry picked from commit 85d9111a88)
2019-07-17 11:08:21 +08:00
Szilard Nemeth 30c7b43227 YARN-9127. Create more tests to verify GpuDeviceInformationParser. Contributed by Peter Bacsko
(cherry picked from commit 18ee1092b4)
2019-07-15 12:15:36 +02:00
Szilard Nemeth bb37c6cb7f YARN-9337. Addendum to fix compilation error due to mockito spy call 2019-07-13 00:42:14 +02:00
Erik Krogen 07a6510e6a HDFS-13286. [SBN read] Add haadmin commands to transition between standby and observer. Contributed by Chao Sun. 2019-07-12 11:03:31 -07:00
Szilard Nemeth 773591ee42 YARN-9626. UI2 - Fair scheduler queue apps page issues. Contributed by Zoltan Siegl
(cherry picked from commit 557056e18e)
2019-07-12 17:41:23 +02:00
Szilard Nemeth 531e0c0bc1 YARN-9337. GPU auto-discovery script runs even when the resource is given by hand. Contributed by Adam Antal
(cherry picked from commit 61b0c2bb7c)
2019-07-12 17:30:50 +02:00
Szilard Nemeth 43c89d1e2b YARN-9235. If linux container executor is not set for a GPU cluster GpuResourceHandlerImpl is not initialized and NPE is thrown. Contributed by Antal Balint Steinbach, Adam Antal
(cherry picked from commit c416284bb7)
2019-07-12 17:07:25 +02:00
Szilard Nemeth 872a039bac YARN-9625. UI2 - No link to a queue on the Queues page for Fair Scheduler. Contributed by Zoltan Siegl
(cherry picked from commit 9cec023186)
2019-07-11 20:02:19 +02:00
Szilard Nemeth d590745046 YARN-9573. DistributedShell cannot specify LogAggregationContext. Contributed by Adam Antal. 2019-07-11 19:54:31 +02:00
bibinchundatt 5effeae1f3 YARN-9557. Application fails in diskchecker when ReadWriteDiskValidator is configured. Contributed by Bilwa S T.
(cherry picked from commit 5f8395f393)
2019-07-10 14:47:29 +05:30
Sunil G 9eb96b0fbf YARN-9644. First RMContext object is always leaked during switch over. Contributed by Bibin A Chundatt.
(cherry picked from commit d18986e4e8)
2019-07-04 11:06:41 +05:30
Szilard Nemeth 46177ade8b YARN-9629. Support configurable MIN_LOG_ROLLING_INTERVAL. Contributed by Adam Antal.
(cherry picked from commit a2a8be18cb)
2019-07-03 14:24:53 +02:00
Sunil G d2a5749482 YARN-9327. Improve synchronisation in ProtoUtils#convertToProtoFormat block. Contributed by Bibin A Chundatt.
(cherry picked from commit 0c8813f135)
2019-07-02 12:15:40 +05:30
Weiwei Yang 46b81a982b YARN-9655. AllocateResponse in FederationInterceptor lost applicationPriority. Contributed by hunshenshi.
(cherry picked from commit 570eee30e5)
2019-07-02 10:17:56 +08:00
bibinchundatt 4f622ecad8 YARN-9639. DecommissioningNodesWatcher cause memory leak. Contributed by Bilwa S T.
(cherry picked from commit be80334cdf)
2019-06-27 10:11:30 +05:30
Zhankun Tang 829202740a YARN-9584. Should put initializeProcessTrees method call before get pid. Contributed by Wanqiang Ji.
(cherry picked from commit 67414a1a80)
2019-06-18 13:20:07 +08:00
Weiwei Yang 56a4935048 YARN-9621. Fix TestDSWithMultipleNodeManager.testDistributedShellWithPlacementConstraint on branch-3.1. Contributed by Prabhu Joseph. 2019-06-17 17:22:49 +08:00
Sean Mackrory fee1e67453 HADOOP-16213. Update guava to 27.0-jre. Contributed by Gabor Bota. 2019-06-13 07:38:43 -06:00
Sunil G c343554e2a YARN-9543. [UI2] Handle ATSv2 server down or failures cases gracefully in YARN UI v2. Contributed by Zoltan Siegl and Akhil P B.
(cherry picked from commit 52128e352a)
2019-06-12 19:28:43 +05:30
Sunil G bc028d3ebb YARN-9545. Create healthcheck REST endpoint for ATSv2. Contributed by Zoltan Siegl.
(cherry picked from commit 72203f7a12)
2019-06-12 19:28:10 +05:30
Sunil G 1bb9e9a4f2 Revert "YARN-9545. Create healthcheck REST endpoint for ATSv2. Contributed by Zoltan Siegl."
This reverts commit d65371c4e8.
2019-06-12 19:27:21 +05:30
bibinchundatt f42e246f8a YARN-9547. ContainerStatusPBImpl default execution type is not returned. Contributed by Bilwa S T.
(cherry picked from commit 3303723f55)
2019-06-11 23:43:54 +05:30
bibinchundatt d386f595f9 YARN-9565. RMAppImpl#ranNodes not cleared on FinalTransition. Contributed by Bilwa S T.
(cherry picked from commit 60c95e9b6a)
2019-06-11 23:15:02 +05:30
bibinchundatt 4a39165b41 YARN-9594. Fix missing break statement in ContainerScheduler#handle. Contributed by lujie.
(cherry picked from commit 6d80b9bc3f)

 Conflicts:
	hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/containermanager/scheduler/ContainerScheduler.java
2019-06-11 23:05:06 +05:30
Sunil G d65371c4e8 YARN-9545. Create healthcheck REST endpoint for ATSv2. Contributed by Zoltan Siegl.
(cherry picked from commit f1d3a17d3e)
2019-06-06 06:25:02 +05:30
Sunil G 6be665cfc6 YARN-8906. [UI2] NM hostnames not displayed correctly in Node Heatmap Chart. Contributed by Akhil PB.
(cherry picked from commit 59719dc560)
2019-06-03 15:54:33 +05:30
Sunil G 894ef5c07b YARN-8947. [UI2] Active User info missing from UI2. Contributed by Akhil PB.
(cherry picked from commit 7f46dda513)
2019-06-03 12:25:40 +05:30
Weiwei Yang 23f9508a89 YARN-9507. Fix NPE in NodeManager#serviceStop on startup failure. Contributed by Bilwa S T.
(cherry picked from commit 4530f4500d)
2019-06-03 14:26:16 +08:00
Eric Yang 413a6b63bc YARN-9542. Fix LogsCLI guessAppOwner ignores custome file format suffix.
Contributed by Prabhu Joseph

(cherry picked from commit b2a39e8883)
2019-05-29 18:05:47 -04:00
Eric E Payne 9c3ab58aa7 YARN-8625. Aggregate Resource Allocation for each job is not present in ATS. Contributed by Prabhu Joseph.
(cherry picked from commit 3c63551101)
2019-05-29 19:08:27 +00:00
Ahmed Hussein f2202f7990 YARN-9563. Resource report REST API could return NaN or Inf (Ahmed Hussein via jeagles)
Signed-off-by: Jonathan Eagles <jeagles@gmail.com>
(cherry picked from commit abf76ac371)
2019-05-29 12:47:27 -05:00
Takanobu Asanuma 8098ddaf40 HADOOP-16331. Fix ASF License check in pom.xml. Contributed by Akira Ajisaka.
Signed-off-by: Takanobu Asanuma <tasanuma@apache.org>
2019-05-29 17:39:49 +09:00
Akira Ajisaka f8bd5deec1 HADOOP-16323. https everywhere in Maven settings. 2019-05-27 15:28:21 +09:00
Sunil G f09befd2ea YARN-9519. TFile log aggregation file format is not working for yarn.log-aggregation.TFile.remote-app-log-dir config. Contributed by Adam Antal.
(cherry picked from commit 7d831eca64)
2019-05-14 10:50:02 -07:00
Sunil G 8b306e34e0 YARN-9504. [UI2] Fair scheduler queue view page does not show actual capacity. Contributed by Zoltan Siegl.
(cherry picked from commit 64c7f36ab1)
2019-05-10 14:28:08 +05:30
Eric Yang bf013aa06e YARN-8622. Fixed container-executor compilation on MacOSX.
Contributed by Siyao Meng

(cherry picked from commit ef97a20831)
2019-05-09 14:55:38 -04:00
Haibo Chen ea1f0f282b YARN-9529. Log correct cpu controller path on error while initializing CGroups. (Contributed by Jonathan Hung)
(cherry picked from commit 597fa47ad1)
(cherry picked from commit c6573562cb)
2019-05-06 11:59:20 -07:00
Eric E Payne 41ffaea342 YARN-9285: RM UI progress column is of wrong type. Contributed by Ahmed Hussein.
(cherry picked from commit b094b94d43)
2019-05-02 19:57:44 +00:00
Weiwei Yang 94a895b94f YARN-9307. node_partitions constraint does not work. Contributed by kyungwan nam. 2019-04-26 13:16:43 +08:00
Weiwei Yang d242b166ed YARN-9325. TestQueueManagementDynamicEditPolicy fails intermittent. Contributed by Prabhu Joseph.
(cherry picked from commit 1c8046d67e)
2019-04-23 14:25:33 +08:00
Eric Yang 8b228a42e9 YARN-8587. Added retries for fetching docker exit code.
Contributed by Charo Zhang

(cherry picked from commit c16c49b8c3)
2019-04-19 15:40:56 -04:00
Eric Yang 68a98be8a2 YARN-6695. Fixed NPE in publishing appFinished events to ATSv2.
Contributed by Prabhu Joseph

(cherry picked from commit df76cdc895)
2019-04-18 12:31:34 -04:00
Weiwei Yang c37065eae9 YARN-9463. Add queueName info when failing with queue capacity sanity check. Contributed by Aihua Xu.
(cherry picked from commit 8c1bba375b)
2019-04-10 23:04:27 +08:00
Weiwei Yang bd0c9bc160 YARN-9413. Queue resource leak after app fail for CapacityScheduler. Contributed by Tao Yang.
(cherry picked from commit ec143cbf67)
2019-04-06 20:38:06 +08:00
Eric Yang dbc02bcda7 YARN-9391. Fixed node manager environment leaks into Docker containers.
Contributed by Jim Brennan

(cherry picked from commit 3c45762a0b)
2019-03-25 15:55:46 -04:00
Sunil G 6941033396 YARN-8803. [UI2] Show flow runs in the order of recently created time in graph widgets. Contributed by Akhil PB.
(cherry picked from commit c79f139519)
2019-03-06 16:50:19 +05:30
Sunil G 379a9bfd9a YARN-9138. Improve test coverage for nvidia-smi binary execution of GpuDiscoverer. Contributed by Szilard Nemeth.
(cherry picked from commit 46045c5cb3)
2019-03-06 16:02:39 +05:30
bibinchundatt e663a6af89 Revert "YARN-8132. Final Status of applications shown as UNDEFINED in ATS app queries. Contributed by Prabhu Joseph"
This reverts commit 7db50ffceb.
2019-03-04 17:03:45 +05:30
Sunil G 80d507d1a4 YARN-9139. Simplify initializer code of GpuDiscoverer. Contributed by Szilard Nemeth. 2019-03-01 19:28:33 +05:30