YARN-4468. Document the general ReservationSystem functionality, and the REST API. (subru and carlo via asuresh)

Arun Suresh 2016-04-15 16:58:49 -07:00
parent 69f3d428d5
commit cab9cbaa0a
5 changed files with 513 additions and 2 deletions


@@ -133,6 +133,7 @@
<item name="Using CGroups" href="hadoop-yarn/hadoop-yarn-site/NodeManagerCgroups.html"/>
<item name="Secure Containers" href="hadoop-yarn/hadoop-yarn-site/SecureContainer.html"/>
<item name="Registry" href="hadoop-yarn/hadoop-yarn-site/registry/index.html"/>
<item name="Reservation System" href="hadoop-yarn/hadoop-yarn-site/ReservationSystem.html"/>
</menu>
<menu name="YARN REST APIs" inherit="top">


@@ -0,0 +1,65 @@
<!---
Licensed under the Apache License, Version 2.0 (the "License");
you may not use this file except in compliance with the License.
You may obtain a copy of the License at
http://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing, software
distributed under the License is distributed on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
See the License for the specific language governing permissions and
limitations under the License. See accompanying LICENSE file.
-->
Reservation System
==================
* [Purpose](#Purpose)
* [Overview](#Overview)
* [Flow of a Reservation](#Flow_of_a_Reservation)
* [Configuring the Reservation System](#Configuring_the_Reservation_System)
Purpose
-------
This document provides a brief overview of the `YARN ReservationSystem`.
Overview
--------
The `ReservationSystem` of YARN provides the user the ability to reserve resources over (and ahead of) time, to ensure that important production jobs run predictably. The ReservationSystem performs careful admission control and provides guarantees over absolute amounts of resources (instead of a percentage of cluster size). Reservations can be either malleable or have gang semantics, and can have time-varying resource requirements. The ReservationSystem is a component of the YARN ResourceManager.
Flow of a Reservation
----------------------
![YARN Reservation System | width=600px](./images/yarn_reservation_system.png)
With reference to the figure above, a typical reservation proceeds as follows:
* **Step 1** The user (or an automated tool on the user's behalf) submits a reservation request specified in the Reservation Definition Language (RDL). This describes the user's need for resources over time (e.g., a skyline of resources) and temporal constraints (e.g., a deadline). This can be done either programmatically through the usual Client-to-RM protocols or via the REST API of the RM.
* **Step 2** The ReservationSystem leverages a ReservationAgent (GREE in the figure) to find a plausible allocation for the reservation in the Plan, a data structure tracking all reservations currently accepted and the available resources in the system.
* **Step 3** The SharingPolicy provides a way to enforce invariants on the reservations being accepted, potentially rejecting reservations. For example, the CapacityOvertimePolicy enforces both the instantaneous maximum capacity a user can request across all of his/her reservations and a limit on the integral of resources over a period of time; e.g., the user can reserve up to 50% of the cluster capacity instantaneously, but in any 24h period his/her average cannot exceed 10%.
* **Step 4** Upon a successful validation the ReservationSystem returns to the user a ReservationId (think of it as an airline ticket).
* **Step 5** When the time comes, a new component called the PlanFollower publishes the state of the plan to the scheduler, by dynamically creating/tweaking/destroying queues.
* **Step 6** The user can then submit one (or more) jobs to the reservable queue, by simply including the ReservationId as part of the ApplicationSubmissionContext.
* **Step 7** The Scheduler will then provide containers from a special queue created to ensure that the reservation is respected. Within the limits of the reservation, the user has guaranteed access to the resources; beyond that, resource sharing proceeds with standard Capacity/Fairness sharing.
* **Step 8** The system includes mechanisms to adapt to drops in cluster capacity. This consists of replanning by "moving" the reservation if possible, or rejecting the smallest amount of previously accepted reservations (to ensure that the other reservations will receive their full amount).
Configuring the Reservation System
----------------------------------
Configuring the `ReservationSystem` is simple. Currently we have added support for *reservations* in both `CapacityScheduler` and `FairScheduler`. You can mark any **leaf queue** in the **capacity-scheduler.xml** or **fair-scheduler.xml** as available for "reservations" (see [CapacityScheduler](http://hadoop.apache.org/docs/current/hadoop-yarn/hadoop-yarn-site/CapacityScheduler.html#Configuring_ReservationSystem_with_CapacityScheduler) and the [FairScheduler](http://hadoop.apache.org/docs/current/hadoop-yarn/hadoop-yarn-site/FairScheduler.html) for details). Then the capacity/fair share within that queue can be used for making reservations. Jobs can still be submitted to the *reservable queue* without a reservation, in which case they will be run in best-effort mode in whatever capacity is left over by the jobs running within active reservations.
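As a sketch of what marking a leaf queue as reservable amounts to for the `CapacityScheduler`, the Python snippet below renders the relevant entries in Hadoop's `<configuration>`/`<property>` XML format. It assumes the property names `yarn.resourcemanager.reservation-system.enable` (yarn-site.xml) and `yarn.scheduler.capacity.<queue-path>.reservable` (capacity-scheduler.xml), which we believe are the ones the CapacityScheduler documentation lists; `root.dedicated` is a hypothetical queue path. Verify the names against your Hadoop version before use; the FairScheduler case is not covered.

```python
import xml.etree.ElementTree as ET
from typing import Dict, Tuple


def reservation_config(queue_path: str) -> Tuple[Dict[str, str], Dict[str, str]]:
    """Return (yarn-site.xml, capacity-scheduler.xml) properties that enable
    reservations on the leaf queue at queue_path, e.g. 'root.dedicated'."""
    yarn_site = {
        # Assumed name: turns the ReservationSystem on in the ResourceManager.
        "yarn.resourcemanager.reservation-system.enable": "true",
    }
    capacity_scheduler = {
        # Assumed name: marks this leaf queue as available for reservations.
        f"yarn.scheduler.capacity.{queue_path}.reservable": "true",
    }
    return yarn_site, capacity_scheduler


def to_hadoop_xml(props: Dict[str, str]) -> str:
    """Render properties as a Hadoop <configuration> element."""
    root = ET.Element("configuration")
    for name, value in props.items():
        prop = ET.SubElement(root, "property")
        ET.SubElement(prop, "name").text = name
        ET.SubElement(prop, "value").text = value
    return ET.tostring(root, encoding="unicode")


if __name__ == "__main__":
    yarn_site, capacity_scheduler = reservation_config("root.dedicated")
    print(to_hadoop_xml(yarn_site))
    print(to_hadoop_xml(capacity_scheduler))
```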


@@ -34,6 +34,9 @@ ResourceManager REST API's.
* [Cluster Application Priority API](#Cluster_Application_Priority_API)
* [Cluster Delegation Tokens API](#Cluster_Delegation_Tokens_API)
* [Cluster Reservation API List](#Cluster_Reservation_API_List)
* [Cluster Reservation API Submit](#Cluster_Reservation_API_Submit)
* [Cluster Reservation API Update](#Cluster_Reservation_API_Update)
* [Cluster Reservation API Delete](#Cluster_Reservation_API_Delete)
Overview
--------
@@ -3223,8 +3226,8 @@ The Cluster Reservation API can be used to list reservations. When listing reser
| Item | Data Type | Description |
|:---- |:---- |:---- |
| arrival | long | The UTC time representation of the earliest time this reservation can be allocated from. |
| deadline | long | The UTC time representation of the latest time within which this reservation can be allocated. |
| reservation-name | string | A mnemonic name of the reservation (not a valid identifier). |
| reservation-requests | object | A list of "stages" or phases of this reservation, each describing resource requirements and duration |
### Elements of the *reservation-requests* object
@@ -3381,3 +3384,443 @@ Response Body:
</reservations>
</reservationListInfo>
```
Cluster Reservation API Submit
------------------------------
The Cluster Reservation API can be used to submit reservations. When submitting a reservation, the user specifies the constraints in terms of resources and time that are required; the resulting page returns a reservation-id that the user can use to get access to the resources by specifying it as part of the [Cluster Submit Applications API](#Cluster_Applications_APISubmit_Application).
### URI
* http://<rm http address:port>/ws/v1/cluster/reservation/submit
### HTTP Operations Supported
* POST
### POST Response Examples
POST requests can be used to submit reservations to the ResourceManager. As mentioned above, a reservation-id is returned upon success (in the body of the response). Successful submissions result in a 200 response. Please note that in order to submit a reservation, you must have an authentication filter set up for the HTTP interface. The functionality requires that a username is set in the HttpServletRequest. If no filter is set up, the response will be an "UNAUTHORIZED" response.
Please note that this feature is currently in the alpha stage and may change in the future.
#### Elements of the POST request object
| Item | Data Type | Description |
|:---- |:---- |:---- |
| queue | string | The (reservable) queue you are submitting to|
| reservation-definition | object | A set of constraints representing the need for resources over time of a user. |
#### Elements of the *reservation-definition* object
| Item | Data Type | Description |
|:---- |:---- |:---- |
|arrival | long | The UTC time representation of the earliest time this reservation can be allocated from. |
| deadline | long | The UTC time representation of the latest time within which this reservation can be allocated. |
| reservation-name | string | A mnemonic name of the reservation (not a valid identifier). |
| reservation-requests | object | A list of "stages" or phases of this reservation, each describing resource requirements and duration |
#### Elements of the *reservation-requests* object
| Item | Data Type | Description |
|:---- |:---- |:---- |
| reservation-request-interpreter | int | A numeric choice of how to interpret the set of ReservationRequest: 0 for ANY, 1 for ALL, 2 for ORDER, 3 for ORDER\_NO\_GAP |
| reservation-request | object | The description of the resource and time capabilities for a phase/stage of this reservation |
#### Elements of the *reservation-request* object
| Item | Data Type | Description |
|:---- |:---- |:---- |
| duration | long | The duration of a ReservationRequest in milliseconds (the number of consecutive milliseconds for which a satisfiable allocation for this portion of the reservation should exist). |
| num-containers | int | The number of containers required in this phase of the reservation (captures the maximum parallelism of the job(s) in this phase). |
| min-concurrency | int | The minimum number of containers that must be concurrently allocated to satisfy this allocation (captures min-parallelism, useful to express gang semantics). |
| capability | object | Specifies the size of each container (memory, vCores). |
#### Elements of the *capability* object
| Item | Data Type | Description |
|:---- |:---- |:---- |
| memory | int | the number of MB of memory for this container |
| vCores | int | the number of virtual cores for this container |
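The tables above map directly onto a nested request body. The Python sketch below (standard library only) builds the JSON body for the submit call and prepares the POST request without sending it. The `Interpreter` and `Stage` types, the helper names, and the sanity checks (arrival before deadline, `1 <= min-concurrency <= num-containers`) are our own assumptions, not rules stated by the ResourceManager.

```python
import json
import urllib.request
from dataclasses import dataclass
from enum import IntEnum
from typing import List


class Interpreter(IntEnum):
    """Codes for reservation-request-interpreter, as listed in the table above."""
    ANY = 0
    ALL = 1
    ORDER = 2
    ORDER_NO_GAP = 3


@dataclass
class Stage:
    """One reservation-request entry (a phase of the reservation)."""
    duration_ms: int
    num_containers: int
    min_concurrency: int
    memory_mb: int
    vcores: int


def submit_payload(queue: str, name: str, arrival_ms: int, deadline_ms: int,
                   interpreter: Interpreter, stages: List[Stage]) -> dict:
    """Build the JSON body for POST /ws/v1/cluster/reservation/submit."""
    if arrival_ms >= deadline_ms:
        raise ValueError("arrival must be before deadline")
    for s in stages:
        # Assumed sanity check: the gang size cannot exceed the stage size.
        if not 1 <= s.min_concurrency <= s.num_containers:
            raise ValueError("need 1 <= min-concurrency <= num-containers")
    return {
        "queue": queue,
        "reservation-definition": {
            "arrival": arrival_ms,
            "deadline": deadline_ms,
            "reservation-name": name,
            "reservation-requests": {
                "reservation-request-interpreter": int(interpreter),
                "reservation-request": [
                    {
                        "duration": s.duration_ms,
                        "num-containers": s.num_containers,
                        "min-concurrency": s.min_concurrency,
                        "capability": {"memory": s.memory_mb, "vCores": s.vcores},
                    }
                    for s in stages
                ],
            },
        },
    }


def build_submit_request(rm_address: str, payload: dict) -> urllib.request.Request:
    """Prepare (but do not send) the POST request for the submit endpoint."""
    return urllib.request.Request(
        f"http://{rm_address}/ws/v1/cluster/reservation/submit",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json", "Accept": "application/json"},
        method="POST",
    )


if __name__ == "__main__":
    # The two alternative (ANY) stages used in the JSON example of this section.
    payload = submit_payload(
        "dedicated", "res_1", 1765541532000, 1765542252000, Interpreter.ANY,
        [Stage(60000, 220, 220, 1024, 1), Stage(120000, 110, 1, 1024, 1)],
    )
    print(build_submit_request("rmdns:8088", payload).full_url)
```

To actually send the request, pass it to `urllib.request.urlopen`; the authentication-filter requirement described above still applies, and how the user name is supplied depends on how that filter is configured on your cluster.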
**JSON response**
This example contains a reservation composed of two stages that are alternatives to each other (the *reservation-request-interpreter* is set to 0, i.e., ANY): the first is shorter, "taller" and "gang", with exactly 220 containers for 60 seconds, while the second is longer (120 seconds) and less tall, with 110 containers and a min-concurrency of 1 container (thus no gang semantics).
HTTP Request:
```json
POST http://rmdns:8088/ws/v1/cluster/reservation/submit
Content-Type: application/json
{
"queue" : "dedicated",
"reservation-definition" : {
"arrival" : 1765541532000,
"deadline" : 1765542252000,
"reservation-name" : "res_1",
"reservation-requests" : {
"reservation-request-interpreter" : 0,
"reservation-request" : [
{
"duration" : 60000,
"num-containers" : 220,
"min-concurrency" : 220,
"capability" : {
"memory" : 1024,
"vCores" : 1
}
},
{
"duration" : 120000,
"num-containers" : 110,
"min-concurrency" : 1,
"capability" : {
"memory" : 1024,
"vCores" : 1
}
}
]
}
}
}
```
Response Header:
200 OK
Cache-Control: no-cache
Expires: Thu, 17 Dec 2015 23:36:34 GMT, Thu, 17 Dec 2015 23:36:34 GMT
Date: Thu, 17 Dec 2015 23:36:34 GMT, Thu, 17 Dec 2015 23:36:34 GMT
Pragma: no-cache, no-cache
Content-Type: application/json
Content-Encoding: gzip
Content-Length: 137
Server: Jetty(6.1.26)
Response Body:
```json
{"reservation-id":"reservation_1448064217915_0009"}
```
**XML response**
HTTP Request:
```xml
POST http://rmdns:8088/ws/v1/cluster/reservation/submit
Accept: application/xml
Content-Type: application/xml
<reservation-submission-context>
<queue>dedicated</queue>
<reservation-definition>
<arrival>1765541532000</arrival>
<deadline>1765542252000</deadline>
<reservation-name>res_1</reservation-name>
<reservation-requests>
<reservation-request-interpreter>0</reservation-request-interpreter>
<reservation-request>
<duration>60000</duration>
<num-containers>220</num-containers>
<min-concurrency>220</min-concurrency>
<capability>
<memory>1024</memory>
<vCores>1</vCores>
</capability>
</reservation-request>
<reservation-request>
<duration>120000</duration>
<num-containers>110</num-containers>
<min-concurrency>1</min-concurrency>
<capability>
<memory>1024</memory>
<vCores>1</vCores>
</capability>
</reservation-request>
</reservation-requests>
</reservation-definition>
</reservation-submission-context>
```
Response Header:
200 OK
Cache-Control: no-cache
Expires: Thu, 17 Dec 2015 23:49:21 GMT, Thu, 17 Dec 2015 23:49:21 GMT
Date: Thu, 17 Dec 2015 23:49:21 GMT, Thu, 17 Dec 2015 23:49:21 GMT
Pragma: no-cache, no-cache
Content-Type: application/xml
Content-Encoding: gzip
Content-Length: 137
Server: Jetty(6.1.26)
Response Body:
```xml
<?xml version="1.0" encoding="UTF-8" standalone="yes"?>
<reservation-submission-response>
<reservation-id>reservation_1448064217915_0010</reservation-id>
</reservation-submission-response>
```
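A client typically needs only the reservation-id from either response format. The Python sketch below extracts it by sniffing the body (a leading `<` means XML), so that it does not depend on the `Content-Type` header; the function name is hypothetical and error handling is minimal.

```python
import json
import xml.etree.ElementTree as ET


def parse_reservation_id(body: str) -> str:
    """Return the reservation-id from a submit response body (JSON or XML)."""
    text = body.strip()
    if text.startswith("<"):
        # XML: <reservation-submission-response><reservation-id>...</reservation-id>
        # Parse bytes so the encoding declaration in the prolog is accepted.
        root = ET.fromstring(text.encode("utf-8"))
        rid = root.findtext("reservation-id")
    else:
        # JSON: {"reservation-id": "..."}
        rid = json.loads(text).get("reservation-id")
    if not rid:
        raise ValueError("no reservation-id in response body")
    return rid.strip()
```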
Cluster Reservation API Update
------------------------------
The Cluster Reservation API Update can be used to update existing reservations. Updating a reservation works similarly to submitting one (described above), but the user submits the reservation-id of the existing reservation to be updated. The semantics are try-and-swap: a successful operation modifies the existing reservation based on the requested update parameters, while a failed operation leaves the existing reservation unchanged.
### URI
* http://<rm http address:port>/ws/v1/cluster/reservation/update
### HTTP Operations Supported
* POST
### POST Response Examples
POST requests can be used to update reservations in the ResourceManager. A successful update results in a 200 response, indicating an in-place update of the existing reservation (the reservation-id does not change). Please note that in order to update a reservation, you must have an authentication filter set up for the HTTP interface. The functionality requires that a username is set in the HttpServletRequest. If no filter is set up, the response will be an "UNAUTHORIZED" response.
Please note that this feature is currently in the alpha stage and may change in the future.
#### Elements of the POST request object
| Item | Data Type | Description |
|:---- |:---- |:---- |
| reservation-id | string | The id of the reservation to be updated (the system automatically looks up the right queue from this)|
| reservation-definition | object | A set of constraints representing the need for resources over time of a user. |
#### Elements of the *reservation-definition* object
| Item | Data Type | Description |
|:---- |:---- |:---- |
|arrival | long | The UTC time representation of the earliest time this reservation can be allocated from. |
| deadline | long | The UTC time representation of the latest time within which this reservation can be allocated. |
| reservation-name | string | A mnemonic name of the reservation (not a valid identifier). |
| reservation-requests | object | A list of "stages" or phases of this reservation, each describing resource requirements and duration |
#### Elements of the *reservation-requests* object
| Item | Data Type | Description |
|:---- |:---- |:---- |
| reservation-request-interpreter | int | A numeric choice of how to interpret the set of ReservationRequest: 0 for ANY, 1 for ALL, 2 for ORDER, 3 for ORDER\_NO\_GAP |
| reservation-request | object | The description of the resource and time capabilities for a phase/stage of this reservation |
#### Elements of the *reservation-request* object
| Item | Data Type | Description |
|:---- |:---- |:---- |
| duration | long | The duration of a ReservationRequest in milliseconds (the number of consecutive milliseconds for which a satisfiable allocation for this portion of the reservation should exist). |
| num-containers | int | The number of containers required in this phase of the reservation (captures the maximum parallelism of the job(s) in this phase). |
| min-concurrency | int | The minimum number of containers that must be concurrently allocated to satisfy this allocation (captures min-parallelism, useful to express gang semantics). |
| capability | object | Specifies the size of each container (memory, vCores). |
#### Elements of the *capability* object
| Item | Data Type | Description |
|:---- |:---- |:---- |
| memory | int | the number of MB of memory for this container |
| vCores | int | the number of virtual cores for this container |
**JSON response**
This example updates an existing reservation identified by *reservation_1449259268893_0005* with two stages that must run in order (the *reservation-request-interpreter* is set to 2, i.e., ORDER): the first stage is a "gang" of 10 containers for 5 minutes (min-concurrency of 10 containers), followed by 50 containers for 1 minute (min-concurrency of 1 container, thus no gang semantics).
HTTP Request:
```json
POST http://rmdns:8088/ws/v1/cluster/reservation/update
Accept: application/json
Content-Type: application/json
{
"reservation-id" : "reservation_1449259268893_0005",
"reservation-definition" : {
"arrival" : 1765541532000,
"deadline" : 1765542252000,
"reservation-name" : "res_1",
"reservation-requests" : {
"reservation-request-interpreter" : 2,
"reservation-request" : [
{
"duration" : 300000,
"num-containers" : 10,
"min-concurrency" : 10,
"capability" : {
"memory" : 1024,
"vCores" : 1
}
},
{
"duration" : 60000,
"num-containers" : 50,
"min-concurrency" : 1,
"capability" : {
"memory" : 1024,
"vCores" : 1
}
}
]
}
}
}
```
Response Header:
200 OK
Cache-Control: no-cache
Expires: Thu, 17 Dec 2015 23:36:34 GMT, Thu, 17 Dec 2015 23:36:34 GMT
Date: Thu, 17 Dec 2015 23:36:34 GMT, Thu, 17 Dec 2015 23:36:34 GMT
Pragma: no-cache, no-cache
Content-Type: application/json
Content-Encoding: gzip
Content-Length: 137
Server: Jetty(6.1.26)
Response Body:
No response body
**XML response**
HTTP Request:
```xml
POST http://rmdns:8088/ws/v1/cluster/reservation/update
Accept: application/xml
Content-Type: application/xml
<reservation-update-context>
<reservation-id>reservation_1449259268893_0005</reservation-id>
<reservation-definition>
<arrival>1765541532000</arrival>
<deadline>1765542252000</deadline>
<reservation-name>res_1</reservation-name>
<reservation-requests>
<reservation-request-interpreter>2</reservation-request-interpreter>
<reservation-request>
<duration>300000</duration>
<num-containers>10</num-containers>
<min-concurrency>10</min-concurrency>
<capability>
<memory>1024</memory>
<vCores>1</vCores>
</capability>
</reservation-request>
<reservation-request>
<duration>60000</duration>
<num-containers>50</num-containers>
<min-concurrency>1</min-concurrency>
<capability>
<memory>1024</memory>
<vCores>1</vCores>
</capability>
</reservation-request>
</reservation-requests>
</reservation-definition>
</reservation-update-context>
```
Response Header:
200 OK
Cache-Control: no-cache
Expires: Thu, 17 Dec 2015 23:49:21 GMT, Thu, 17 Dec 2015 23:49:21 GMT
Date: Thu, 17 Dec 2015 23:49:21 GMT, Thu, 17 Dec 2015 23:49:21 GMT
Pragma: no-cache, no-cache
Content-Type: application/xml
Content-Encoding: gzip
Content-Length: 137
Server: Jetty(6.1.26)
Response Body:
No response body
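Both update requests above only make sense if the ordered stages fit between `arrival` and `deadline`. As a quick arithmetic check: the window is 1765542252000 - 1765541532000 = 720000 ms (12 minutes), and running the 300000 ms stage followed by the 60000 ms stage needs at least 360000 ms (6 minutes). The Python sketch below generalizes this into a lower bound per interpreter. The bounds are our simplifying assumptions (ANY needs only its shortest stage, ALL can overlap its stages, ORDER and `ORDER_NO_GAP` run them back to back), and the check ignores cluster capacity, so passing it does not mean the ReservationSystem will accept the reservation.

```python
from typing import Sequence

# Codes for reservation-request-interpreter, as listed in the table above.
ANY, ALL, ORDER, ORDER_NO_GAP = 0, 1, 2, 3


def min_span_ms(interpreter: int, durations_ms: Sequence[int]) -> int:
    """Lower bound on the wall-clock time a set of stages needs (assumed model)."""
    if not durations_ms:
        raise ValueError("at least one stage is required")
    if interpreter == ANY:
        return min(durations_ms)  # any single alternative suffices
    if interpreter == ALL:
        return max(durations_ms)  # stages may run concurrently
    if interpreter in (ORDER, ORDER_NO_GAP):
        return sum(durations_ms)  # stages run one after another
    raise ValueError(f"unknown interpreter {interpreter}")


def fits_window(arrival_ms: int, deadline_ms: int, interpreter: int,
                durations_ms: Sequence[int]) -> bool:
    """True if the stages could fit between arrival and deadline at all."""
    return min_span_ms(interpreter, durations_ms) <= deadline_ms - arrival_ms
```

For instance, `fits_window(1765541532000, 1765542252000, ORDER, [300000, 600000])` returns `False`, because stages of 5 and 10 minutes in sequence need at least 15 minutes.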
Cluster Reservation API Delete
------------------------------
The Cluster Reservation API Delete can be used to delete existing reservations. Delete works similarly to update: the request contains the reservation-id, and if successful the reservation is cancelled; otherwise the reservation remains in the system.
### URI
* http://<rm http address:port>/ws/v1/cluster/reservation/delete
### HTTP Operations Supported
* POST
### POST Response Examples
POST requests can be used to delete reservations from the ResourceManager. A successful deletion results in a 200 response. Please note that in order to delete a reservation, you must have an authentication filter set up for the HTTP interface. The functionality requires that a username is set in the HttpServletRequest. If no filter is set up, the response will be an "UNAUTHORIZED" response.
Please note that this feature is currently in the alpha stage and may change in the future.
#### Elements of the POST request object
| Item | Data Type | Description |
|:---- |:---- |:---- |
| reservation-id | string | The id of the reservation to be deleted (the system automatically looks up the right queue from this)|
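The delete call needs only the reservation-id. A minimal Python sketch, assuming plain HTTP to the RM web address (the `rmdns:8088` host in the test is illustrative); the helper names are ours, and the same authentication-filter requirement applies when the request is actually sent.

```python
import json
import urllib.request


def build_delete_request(rm_address: str, reservation_id: str) -> urllib.request.Request:
    """Prepare (but do not send) POST .../reservation/delete with a JSON body."""
    body = json.dumps({"reservation-id": reservation_id}).encode("utf-8")
    return urllib.request.Request(
        f"http://{rm_address}/ws/v1/cluster/reservation/delete",
        data=body,
        headers={"Content-Type": "application/json", "Accept": "application/json"},
        method="POST",
    )


def delete_reservation(rm_address: str, reservation_id: str) -> bool:
    """Send the request; True on HTTP 200. urlopen raises HTTPError on 4xx/5xx."""
    request = build_delete_request(rm_address, reservation_id)
    with urllib.request.urlopen(request) as response:
        return response.status == 200
```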
**JSON response**
This example deletes an existing reservation identified by *reservation_1449259268893_0006*.
HTTP Request:
```json
POST http://10.200.91.98:8088/ws/v1/cluster/reservation/delete
Accept: application/json
Content-Type: application/json
{
"reservation-id" : "reservation_1449259268893_0006"
}
```
Response Header:
200 OK
Cache-Control: no-cache
Expires: Fri, 18 Dec 2015 01:31:05 GMT, Fri, 18 Dec 2015 01:31:05 GMT
Date: Fri, 18 Dec 2015 01:31:05 GMT, Fri, 18 Dec 2015 01:31:05 GMT
Pragma: no-cache, no-cache
Content-Type: application/json
Content-Encoding: gzip
Transfer-Encoding: chunked
Server: Jetty(6.1.26)
Response Body:
No response body
**XML response**
HTTP Request:
```xml
POST http://10.200.91.98:8088/ws/v1/cluster/reservation/delete
Accept: application/xml
Content-Type: application/xml
<reservation-delete-context>
<reservation-id>reservation_1449259268893_0006</reservation-id>
</reservation-delete-context>
```
Response Header:
200 OK
Cache-Control: no-cache
Expires: Fri, 18 Dec 2015 01:33:23 GMT, Fri, 18 Dec 2015 01:33:23 GMT
Date: Fri, 18 Dec 2015 01:33:23 GMT, Fri, 18 Dec 2015 01:33:23 GMT
Pragma: no-cache, no-cache
Content-Type: application/xml
Content-Encoding: gzip
Content-Length: 101
Server: Jetty(6.1.26)
Response Body:
No response body


@@ -32,3 +32,5 @@ The Scheduler has a pluggable policy which is responsible for partitioning the c
The ApplicationsManager is responsible for accepting job-submissions, negotiating the first container for executing the application specific ApplicationMaster and provides the service for restarting the ApplicationMaster container on failure. The per-application ApplicationMaster has the responsibility of negotiating appropriate resource containers from the Scheduler, tracking their status and monitoring for progress.
MapReduce in hadoop-2.x maintains **API compatibility** with previous stable release (hadoop-1.x). This means that all MapReduce jobs should still run unchanged on top of YARN with just a recompile.
YARN also supports the notion of **resource reservation** via the [ReservationSystem](http://hadoop.apache.org/docs/current/hadoop-yarn/hadoop-yarn-site/ReservationSystem.html), a component that allows users to specify a profile of resources over time and temporal constraints (e.g., deadlines), and to reserve resources to ensure the predictable execution of important jobs. The *ReservationSystem* tracks resources over time, performs admission control for reservations, and dynamically instructs the underlying scheduler to ensure that the reservation is fulfilled.

Binary file not shown.
