YARN-8394. Improve data locality documentation for Capacity Scheduler. Contributed by Weiwei Yang.
This commit is contained in:
parent
108da85320
commit
29024a6203
|
@ -149,6 +149,8 @@
|
||||||
attempts to schedule rack-local containers.
|
attempts to schedule rack-local containers.
|
||||||
When setting this parameter, the size of the cluster should be taken into account.
|
When setting this parameter, the size of the cluster should be taken into account.
|
||||||
We use 40 as the default value, which is approximately the number of nodes in one rack.
|
We use 40 as the default value, which is approximately the number of nodes in one rack.
|
||||||
|
Note, if this value is -1, the locality constraint in the container request
|
||||||
|
will be ignored, which disables the delay scheduling.
|
||||||
</description>
|
</description>
|
||||||
</property>
|
</property>
|
||||||
|
|
||||||
|
|
|
@ -400,9 +400,14 @@ list of current scheduling edit policies as a comma separated string in `yarn.re
|
||||||
|
|
||||||
* Data Locality
|
* Data Locality
|
||||||
|
|
||||||
|
Capacity Scheduler leverages `Delay Scheduling` to honor task locality constraints. There are 3 levels of locality constraint: node-local, rack-local and off-switch. The scheduler counts the number of missed opportunities when the locality cannot be satisfied, and waits this count to reach a threshold before relaxing the locality constraint to next level. The threshold can be configured in following properties:
|
||||||
|
|
||||||
| Property | Description |
|
| Property | Description |
|
||||||
|:---- |:---- |
|
|:---- |:---- |
|
||||||
| `yarn.scheduler.capacity.node-locality-delay` | Number of missed scheduling opportunities after which the CapacityScheduler attempts to schedule rack-local containers. Typically, this should be set to number of nodes in the cluster. By default is setting approximately number of nodes in one rack which is 40. Positive integer value is expected. |
|
| `yarn.scheduler.capacity.node-locality-delay` | Number of missed scheduling opportunities after which the CapacityScheduler attempts to schedule rack-local containers. Typically, this should be set to number of nodes in the cluster. By default is setting approximately number of nodes in one rack which is 40. Positive integer value is expected. |
|
||||||
|
| `yarn.scheduler.capacity.rack-locality-additional-delay` | Number of additional missed scheduling opportunities over the node-locality-delay ones, after which the CapacityScheduler attempts to schedule off-switch containers. By default this value is set to -1, in this case, the number of missed opportunities for assigning off-switch containers is calculated based on the formula `L * C / N`, where `L` is number of locations (nodes or racks) specified in the resource request, `C` is the number of requested containers, and `N` is the size of the cluster. |
|
||||||
|
|
||||||
|
Note, this feature should be disabled if YARN is deployed separately with the file system, as locality is meaningless. This can be done by setting `yarn.scheduler.capacity.node-locality-delay` to `-1`, in this case, request's locality constraint is ignored.
|
||||||
|
|
||||||
* Container Allocation per NodeManager Heartbeat
|
* Container Allocation per NodeManager Heartbeat
|
||||||
|
|
||||||
|
|
Loading…
Reference in New Issue