<p>Estimating job resource requirements remains an important and challenging problem for enterprise clusters. The problem is amplified by the ever-increasing complexity of workloads, which now range from traditional batch jobs to interactive queries, streaming and, more recently, machine learning jobs. As a result, jobs rely on multiple computation frameworks such as Tez, MapReduce and Spark, and the problem is further compounded by the shared nature of the clusters. The current state-of-the-art solution relies on user expertise to estimate the resource requirements of jobs (e.g., the number of reducers or the container memory size), which is both tedious and inefficient.</p>
<p>Based on an analysis of our cluster workloads, we observe that a large portion of jobs (more than 60%) are recurring jobs, which gives us the opportunity to estimate job resource requirements automatically from a job’s historical runs. It is worth noting that jobs usually come from different computation frameworks, and the framework version may change across runs as well. Therefore, we want a framework-agnostic, black-box solution that automatically estimates resource requirements for recurring jobs.</p></section><section>
<h3><a name="Goals"></a>Goals</h3>
<ul>
<li>For a periodic job, analyze its history logs and predict its resource requirement for the new run.</li>
<li>Support various types of job logs.</li>
<li>Scale to terabytes of job logs.</li>
</ul></section><section>
<h3><a name="Architecture"></a>Architecture</h3>
<p>The following figure illustrates the implementation architecture of the resource estimator.</p>
<p><img src="images/resourceestimator_arch.png" alt="The architecture of the resource estimator"/></p>
<p>Hadoop-resourceestimator mainly consists of three modules: Translator, SkylineStore and Estimator.</p>
<ol style="list-style-type: decimal">
<li><code>ResourceSkyline</code> characterizes a job’s resource utilization over its lifespan. More specifically, it uses <code>RLESparseResourceAllocation</code> (<a class="externalLink" href="https://github.com/apache/hadoop/blob/b6e7d1369690eaf50ce9ea7968f91a72ecb74de0/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/reservation/RLESparseResourceAllocation.java">https://github.com/apache/hadoop/blob/b6e7d1369690eaf50ce9ea7968f91a72ecb74de0/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/reservation/RLESparseResourceAllocation.java</a>) to record the container allocation information. <code>RecurrenceId</code> identifies a specific run of a recurring pipeline. A pipeline may consist of multiple jobs, each of which has a <code>ResourceSkyline</code> that characterizes its resource utilization.</li>
<li><code>Translator</code> parses the job logs, extracts their <code>ResourceSkylines</code> and stores them in the SkylineStore. <code>SingleLineParser</code> parses one line in the log stream and extracts the <code>ResourceSkyline</code>. <code>LogParser</code> parses the log stream line by line using <code>SingleLineParser</code>. Note that logs can have different storage formats, so <code>LogParser</code> takes a stream of strings as input instead of a File or another format. Because job logs may come in various formats and thus require different <code>SingleLineParser</code> implementations, <code>LogParser</code> instantiates the <code>SingleLineParser</code> based on user configuration. Currently, Hadoop-resourceestimator provides two implementations of <code>SingleLineParser</code>: <code>NativeSingleLineParser</code> supports an optimized native format, and <code>RMSingleLineParser</code> parses the YARN ResourceManager logs generated in Hadoop systems, since RM logs are widely available in production deployments.</li>
<li><code>SkylineStore</code> serves as the storage layer for Hadoop-resourceestimator and consists of two parts. <code>HistorySkylineStore</code> stores the <code>ResourceSkylines</code> extracted by the <code>Translator</code>. It supports four actions: addHistory, deleteHistory, updateHistory and getHistory. addHistory appends new <code>ResourceSkylines</code> to a recurring pipeline, while updateHistory deletes all the <code>ResourceSkylines</code> of a specific recurring pipeline and re-inserts new <code>ResourceSkylines</code>. <code>PredictionSkylineStore</code> stores the predicted <code>RLESparseResourceAllocation</code> generated by the Estimator. It supports two actions: addEstimation and getEstimation.
<p>Currently, Hadoop-resourceestimator provides an in-memory implementation of the SkylineStore.</p></li>
<li><code>Estimator</code> predicts a recurring pipeline’s resource requirements based on its historical runs, stores the prediction in the <code>SkylineStore</code> and makes recurring resource reservations in YARN (YARN-5326). <code>Solver</code> reads all the historical <code>ResourceSkylines</code> of a specific recurring pipeline and predicts its new resource requirements, wrapped in an <code>RLESparseResourceAllocation</code>. Currently, Hadoop-resourceestimator provides an <code>LPSOLVER</code> to make the prediction (the details of the Linear Programming model can be found in the paper). There is also a <code>BaseSolver</code>, which translates predicted resource requirements into a <code>ReservationSubmissionRequest</code> and is used by the different solver implementations to make recurring resource reservations on YARN.</li>
<li><code>ResourceEstimationService</code> wraps Hadoop-resourceestimator as a micro-service that can be easily deployed in clusters. It provides a set of REST APIs that allow users to parse specified job logs, query a pipeline’s historical <code>ResourceSkylines</code>, query a pipeline’s predicted resource requirements (running the <code>SOLVER</code> if the prediction does not exist yet), and delete the <code>ResourceSkylines</code> in the <code>SkylineStore</code>.</li>
</ol></section></section><section>
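<p>The division of labour between <code>LogParser</code>, <code>SingleLineParser</code> and the history store described above can be illustrated with a minimal Python sketch. It is not the Hadoop implementation: the log format, the function names and the store class are all hypothetical and chosen only to show a stream-based parser that delegates each line to a pluggable single-line parser.</p>

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, Iterable, Optional, Tuple

# Hypothetical log format, assumed only for this sketch:
#   "<pipelineId> <runId> <timestampSeconds> <allocatedContainers>"
Record = Tuple[str, str, int, int]


def parse_simple_line(line: str) -> Optional[Record]:
    """Parse one line of the assumed format; return None for unrelated lines."""
    parts = line.split()
    if len(parts) != 4:
        return None
    pipeline_id, run_id, timestamp, containers = parts
    try:
        return pipeline_id, run_id, int(timestamp), int(containers)
    except ValueError:
        return None


@dataclass
class InMemoryHistoryStore:
    """Hypothetical store: maps (pipelineId, runId) to {timestamp: containers}."""
    skylines: Dict[Tuple[str, str], Dict[int, int]] = field(default_factory=dict)

    def add_history(self, pipeline_id: str, run_id: str,
                    timestamp: int, containers: int) -> None:
        self.skylines.setdefault((pipeline_id, run_id), {})[timestamp] = containers


def parse_stream(lines: Iterable[str],
                 line_parser: Callable[[str], Optional[Record]],
                 store: InMemoryHistoryStore) -> int:
    """Feed every line to the configured single-line parser and store the result.

    The input is a plain stream of strings, so the caller decides whether the
    lines come from a file, a socket or an in-memory list.
    """
    parsed = 0
    for line in lines:
        record = line_parser(line)
        if record is None:
            continue  # skip lines the single-line parser does not recognise
        store.add_history(*record)
        parsed += 1
    return parsed


if __name__ == "__main__":
    store = InMemoryHistoryStore()
    log = [
        "tpch_q12 tpch_q12_0 0 2",
        "unrelated noise",
        "tpch_q12 tpch_q12_0 5 4",
        "tpch_q12 tpch_q12_1 0 3",
    ]
    print(parse_stream(log, parse_simple_line, store))
    print(store.skylines)
```

<p>Supporting a different log format in this sketch only requires passing another function in place of <code>parse_simple_line</code>, which mirrors the idea of selecting the single-line parser through configuration.</p>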
<h2><a name="Usage"></a>Usage</h2>
<p>This section walks through how to use the resource estimator service.</p>
<p>Here, let <code>$HADOOP_ROOT</code> represent the Hadoop install directory. If you build Hadoop yourself, <code>$HADOOP_ROOT</code> is <code>hadoop-dist/target/hadoop-$VERSION</code>. The location of the resource estimator service, <code>$ResourceEstimatorServiceHome</code>, is <code>$HADOOP_ROOT/share/hadoop/tools/resourceestimator</code>. It contains three folders: <code>bin</code>, <code>conf</code> and <code>data</code>. Note that the resource estimator service can be used with the default configuration.</p>
<ul>
<li>
<p><code>bin</code> contains the running scripts for the resource estimator service.</p>
</li>
<li>
<p><code>conf</code> contains the configuration files for the resource estimator service.</p>
</li>
<li>
<p><code>data</code> contains the sample log that is used to run the example of resource estimator service.</p>
</li>
</ul><section>
<h3><a name="Step_1:_Start_the_estimator"></a>Step 1: Start the estimator</h3>
<p>First of all, copy the configuration file (located in <code>$ResourceEstimatorServiceHome/conf/</code>) to <code>$HADOOP_ROOT/etc/hadoop</code>.</p>
<p>The script to start the estimator is <code>start-estimator.sh</code>.</p>
<div class="source">
<div class="source">
<pre>$ cd $ResourceEstimatorServiceHome
$ bin/start-estimator.sh
</pre></div></div>
<p>A web server is started, and users can use the resource estimation service through REST APIs.</p></section><section>
<h3><a name="Step_2:_Run_the_estimator"></a>Step 2: Run the estimator</h3>
<p>The URI for the resource estimator service is <code>http://0.0.0.0</code>, and the default service port is <code>9998</code> (configured in <code>$ResourceEstimatorServiceHome/conf/resourceestimator-config.xml</code>). In <code>$ResourceEstimatorServiceHome/data</code>, there is a sample log file <code>resourceEstimatorService.txt</code>, which contains the logs of two runs of the tpch_q12 query job.</p>
<ul>
<li><code>parse job logs: POST http://URI:port/resourceestimator/translator/LOG_FILE_DIRECTORY</code></li>
</ul>
<p>Send <code>POST http://0.0.0.0:9998/resourceestimator/translator/data/resourceEstimatorService.txt</code>. The underlying estimator will extract the ResourceSkylines from the log file and store them in the jobHistory SkylineStore.</p>
<ul>
<li><code>query job's history ResourceSkylines: GET http://URI:port/resourceestimator/skylinestore/history/{pipelineId}/{runId}</code></li>
</ul>
<p>Send <code>GET http://0.0.0.0:9998/resourceestimator/skylinestore/history/*/*</code>, and the underlying estimator will return all the records in history SkylineStore. You should be able to see ResourceSkylines for two runs of tpch_q12: tpch_q12_0 and tpch_q12_1. Note that both <code>pipelineId</code> and <code>runId</code> fields support wildcard operations.</p>
<ul>
<li><code>predict job's resource skyline requirement: GET http://URI:port/resourceestimator/estimator/{pipelineId}</code></li>
</ul>
<p>Send <code>GET http://0.0.0.0:9998/resourceestimator/estimator/tpch_q12</code>, and the underlying estimator will predict the job’s resource requirements for the new run based on its historical ResourceSkylines, and store the predicted resource requirements in the jobEstimation SkylineStore.</p>
<ul>
<li><code>query job's estimated resource skylines: GET http://URI:port/resourceestimator/skylinestore/estimation/{pipelineId}</code></li>
</ul>
<p>Send <code>GET http://0.0.0.0:9998/resourceestimator/skylinestore/estimation/tpch_q12</code>, and the underlying estimator will return the stored resource requirement estimation for the tpch_q12 job. Note that the jobEstimation SkylineStore does not support wildcard operations.</p>
<ul>
<li><code>delete job's history resource skylines: DELETE http://URI:port/resourceestimator/skylinestore/history/{pipelineId}/{runId}</code></li>
</ul>
<p>Send <code>DELETE http://0.0.0.0:9998/resourceestimator/skylinestore/history/tpch_q12/tpch_q12_0</code>, and the underlying estimator will delete the ResourceSkyline record for tpch_q12_0. Re-send <code>GET http://0.0.0.0:9998/resourceestimator/skylinestore/history/*/*</code>, and the underlying estimator will return only the ResourceSkyline for tpch_q12_1.</p></section><section>
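<p>The sequence of REST calls from Step 2 can be scripted. The following Python sketch only builds the requests with the standard library; the helper names <code>build_request</code> and <code>example_workflow</code> are illustrative and not part of the service, and the base URL assumes the default address and port quoted above.</p>

```python
import urllib.request
from typing import List

# Assumes the default URI and port from the text above.
BASE_URL = "http://0.0.0.0:9998/resourceestimator"


def build_request(method: str, path: str) -> urllib.request.Request:
    """Build (but do not send) a request against the estimator service."""
    return urllib.request.Request(f"{BASE_URL}/{path}", method=method)


def example_workflow() -> List[urllib.request.Request]:
    """The five calls from Step 2, in the order they are described."""
    return [
        # Parse the sample log into the history SkylineStore.
        build_request("POST", "translator/data/resourceEstimatorService.txt"),
        # List all history records (both path fields accept wildcards).
        build_request("GET", "skylinestore/history/*/*"),
        # Predict the requirements for the next run of tpch_q12.
        build_request("GET", "estimator/tpch_q12"),
        # Read back the stored estimation (no wildcards here).
        build_request("GET", "skylinestore/estimation/tpch_q12"),
        # Delete the history record of one run.
        build_request("DELETE", "skylinestore/history/tpch_q12/tpch_q12_0"),
    ]


if __name__ == "__main__":
    for request in example_workflow():
        print(request.get_method(), request.full_url)
```

<p>With the service running, each request can be sent with <code>urllib.request.urlopen(request)</code>. The response format is not covered here, so the sketch leaves the handling of the returned body to the caller.</p>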
<h3><a name="Step_3:_Run_the_estimator"></a>Step 3: Stop the estimator</h3>
<p>The script to stop the estimator is <code>stop-estimator.sh</code>.</p>
<div class="source">
<div class="source">
<pre>$ cd $ResourceEstimatorServiceHome
$ bin/stop-estimator.sh
</pre></div></div>
</section></section><section>
<h2><a name="Example"></a>Example</h2>
<p>Here we present an example of using the Resource Estimator Service.</p>
<p>First, we run a tpch_q12 job 9 times and collect the job’s resource skyline in each run (note that in this example, we only collect the “# of allocated containers” information).</p>
<p>Then, we run the log parser in Resource Estimator Service to extract the ResourceSkylines from logs and store them in the SkylineStore. The job’s ResourceSkylines are plotted below for demonstration.</p>
<p><img src="images/tpch_history.png" alt="Tpch job history runs"/></p>
<p>Finally, we run the estimator in Resource Estimator Service to predict the resource requirements for the new run, which is wrapped in RLESparseResourceAllocation (<a class="externalLink" href="https://github.com/apache/hadoop/blob/b6e7d1369690eaf50ce9ea7968f91a72ecb74de0/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/reservation/RLESparseResourceAllocation.java">https://github.com/apache/hadoop/blob/b6e7d1369690eaf50ce9ea7968f91a72ecb74de0/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/reservation/RLESparseResourceAllocation.java</a>). The predicted resource requirement is plotted below for demonstration.</p>
<p><img src="images/tpch_predict.png" alt="Tpch job history prediction"/></p></section><section>
<p>This section will guide you through the configuration for Resource Estimator Service. The configuration file is located at <code>$ResourceEstimatorServiceHome/conf/resourceestimator-config.xml</code>.</p>
<p>The resource estimator has an integrated Linear Programming solver to make the prediction (refer to <a class="externalLink" href="https://www.microsoft.com/en-us/research/wp-content/uploads/2016/10/osdi16-final107.pdf">https://www.microsoft.com/en-us/research/wp-content/uploads/2016/10/osdi16-final107.pdf</a> for more details). The alpha parameter tunes the tradeoff between resource over-allocation and under-allocation in the Linear Programming model. It ranges from 0 to 1, and a larger alpha value means the model puts more weight on minimizing over-allocation. The default value is 0.1.</p>
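<p>To build intuition for the alpha tradeoff, the following Python sketch scores a candidate allocation against one observed skyline. It is a simplified illustration and not the Linear Programming model from the paper: it assumes the cost is a plain weighted sum of per-interval over-allocation and under-allocation, and the function name <code>allocation_cost</code> is hypothetical.</p>

```python
from typing import Sequence


def allocation_cost(predicted: Sequence[int],
                    observed: Sequence[int],
                    alpha: float = 0.1) -> float:
    """Weighted cost of a candidate allocation (simplified, assumed form).

    over  = containers reserved but not used, summed over intervals
    under = containers used but not reserved, summed over intervals
    cost  = alpha * over + (1 - alpha) * under
    """
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must be between 0 and 1")
    if len(predicted) != len(observed):
        raise ValueError("skylines must cover the same intervals")
    over = sum(max(0, p - o) for p, o in zip(predicted, observed))
    under = sum(max(0, o - p) for p, o in zip(predicted, observed))
    return alpha * over + (1.0 - alpha) * under


if __name__ == "__main__":
    observed = [2, 4, 6]
    generous = [6, 6, 6]  # never under-allocates, wastes 6 container-intervals
    tight = [2, 2, 2]     # never over-allocates, misses 6 container-intervals
    for alpha in (0.1, 0.9):
        print(alpha,
              allocation_cost(generous, observed, alpha),
              allocation_cost(tight, observed, alpha))
```

<p>Under this assumed cost, a small alpha such as the default makes the generous allocation cheaper than the tight one, while a large alpha reverses the ranking, which matches the statement that larger values favour minimizing over-allocation.</p>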
<p>The length of time used to discretize job execution into intervals. Note that the estimator makes a resource allocation prediction for each interval. A smaller time interval gives a finer-grained prediction, but the prediction also takes more time and space. The default value is 5 (seconds).</p>
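<p>The effect of the time interval can be seen in a small Python sketch that turns container lifetimes into a per-interval skyline and then run-length encodes it. This is an illustration under stated assumptions, not the estimator's code: the input format (one <code>(start, end)</code> pair in seconds per container, end exclusive) and both function names are hypothetical, and the encoding only mimics the idea behind a run-length-encoded allocation.</p>

```python
from typing import List, Sequence, Tuple


def discretize(containers: Sequence[Tuple[int, int]],
               interval: int = 5) -> List[int]:
    """Count the containers that overlap each interval of the given length.

    Each container is a (start_second, end_second) pair with end exclusive.
    """
    if interval <= 0:
        raise ValueError("interval must be positive")
    if not containers:
        return []
    horizon = max(end for _, end in containers)
    counts = [0] * (-(-horizon // interval))  # ceiling division
    for start, end in containers:
        if end <= start:
            continue  # ignore empty lifetimes
        for bucket in range(start // interval, (end - 1) // interval + 1):
            counts[bucket] += 1
    return counts


def run_length_encode(counts: Sequence[int]) -> List[Tuple[int, int, int]]:
    """Collapse equal neighbours into (first_interval, end_interval, value) runs."""
    runs: List[Tuple[int, int, int]] = []
    for index, value in enumerate(counts):
        if runs and runs[-1][2] == value:
            first, _, _ = runs[-1]
            runs[-1] = (first, index + 1, value)  # extend the current run
        else:
            runs.append((index, index + 1, value))
    return runs


if __name__ == "__main__":
    lifetimes = [(0, 20), (3, 12), (10, 20)]
    fine = discretize(lifetimes, interval=5)
    coarse = discretize(lifetimes, interval=10)
    print(fine, run_length_encode(fine))
    print(coarse, run_length_encode(coarse))
```

<p>In this sketch, halving the interval doubles the number of values the solver would have to consider for the same job, which is the time and space cost mentioned above.</p>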
<p>The class name of the skylinestore provider. Default value is <code>org.apache.hadoop.resourceestimator.skylinestore.impl.InMemoryStore</code>, which is an in-memory implementation of skylinestore. If users want to use their own skylinestore implementation, they need to change this value accordingly.</p>
<p>The class name of the translator provider. Default value is <code>org.apache.hadoop.resourceestimator.translator.impl.BaseLogParser</code>, which extracts resourceskylines from log streams. If users want to use their own translator implementation, they need to change this value accordingly.</p>
<p>The class name of the translator single-line parser, which parses a single line in the log. Default value is <code>org.apache.hadoop.resourceestimator.translator.impl.NativeSingleLineParser</code>, which can parse one line in the sample log. Note that if users want to parse Hadoop Resource Manager (<a class="externalLink" href="https://hadoop.apache.org/docs/current/hadoop-yarn/hadoop-yarn-site/YARN.html">https://hadoop.apache.org/docs/current/hadoop-yarn/hadoop-yarn-site/YARN.html</a>) logs, they need to set the value to <code>org.apache.hadoop.resourceestimator.translator.impl.RmSingleLineParser</code>. If they want to implement their own single-line parser for a customized log file, they need to change this value accordingly.</p>
<p>The class name of the solver provider. Default value is <code>org.apache.hadoop.resourceestimator.solver.impl.LpSolver</code>, which incorporates a Linear Programming model to make the prediction. If users want to implement their own models, they need to change this value accordingly.</p>
<p>The port that ResourceEstimatorService listens on. The default value is 9998.</p></section><section>
<h2><a name="Future_work"></a>Future work</h2>
<ol style="list-style-type: decimal">
<li>
<p>For SkylineStore, we plan to provide a persistent store implementation. We are considering HBase to future-proof our scale requirements.</p>
</li>
<li>
<p>For the Translator module, we want to support Timeline Service v2 as the primary source, because we want to rely on a stable API and logs are flaky at best.</p>
</li>
<li>
<p>Job resource requirements can vary across runs due to skew, contention, and changes in input data or code. We want to design a Reprovisioner module that monitors job progress at runtime, identifies performance bottlenecks if progress is slower than expected, and dynamically adjusts the job’s resource allocations accordingly using ReservationUpdateRequest.</p>
</li>
<li>
<p>When the Estimator predicts a job’s resource requirements, we want to provide a confidence level for the prediction based on the estimation error (a combination of over-allocation and under-allocation).</p>
</li>
<li>
<p>For the Estimator module, we can integrate machine learning techniques such as reinforcement learning to make better predictions. We can also integrate with domain-specific solvers such as PerfOrator to improve prediction quality.</p>
</li>
<li>
<p>For the Estimator module, we want to design an incremental solver that updates a job’s resource requirements based only on the new logs.</p>