
Project Number 611411

D3.2 – Dynamic Power Management

Version 2.0

21 August 2015

Final

Public Distribution

University of Stuttgart

Project Partners: aicas, Bosch, CNRS, Rheon Media, The Open Group, University of Stuttgart, University of York

Every effort has been made to ensure that all statements and information contained herein are accurate; however, the DreamCloud Project Partners accept no liability for any error or omission in the same.

© 2015 Copyright in this document remains vested in the DreamCloud Project Partners.


Project Partner Contact Information

aicas
Fridtjof Siebert
Haid-und-Neue Strasse 18
76131 Karlsruhe
Germany
Tel: +49 721 66396823
E-mail: [email protected]

Bosch
Björn Saballus
Robert-Bosch-Campus 1
71272 Renningen
Germany
Tel: +49 711 811 35335
E-mail: [email protected]

CNRS
Gilles Sassatelli
Rue Ada 161
34392 Montpellier
France
Tel: +33 4 674 18690
E-mail: [email protected]

Rheon Media
Raj Patel
20 Leighton Avenue
Pinner, Middlesex HA5 3BW
United Kingdom
Tel: +44 7547 162920
E-mail: [email protected]

The Open Group
Scott Hansen
Avenue du Parc de Woluwe 56
1160 Brussels
Belgium
Tel: +32 2 675 1136
E-mail: [email protected]

University of Stuttgart
Bastian Koller
Nobelstrasse 19
70569 Stuttgart
Germany
Tel: +49 711 68565891
E-mail: [email protected]

University of York
Leandro Indrusiak
Deramore Lane
York YO10 5GH
United Kingdom
Tel: +44 1904 325 570
E-mail: [email protected]


Table of Contents

1 Introduction  2

2 Energy-Aware Scheduling of Dynamic Application Workflows  5
  2.1 Dynamic Application Workflows  5
  2.2 Static and Dynamic Power Management  5
  2.3 Energy-Aware Workflow Scheduling  9

3 Dynamic Scheduling Framework  11
  3.1 General Overview  11
  3.2 DREAMCLOUD's Dynamic Scheduling Architecture  13
    3.2.1 Key Components  13
    3.2.2 Interaction between Key Components  16
  3.3 Exploiting Application Profiles for Energy-Aware Scheduling  23
    3.3.1 Monitoring Framework Revisited  24
    3.3.2 Integration of Energy-Aware Heuristics  26

4 DreamCloud's Roadmap Towards Dynamic Power Management  31
  4.1 HPC Domain (HLRS)  31
  4.2 Automotive Domain (BOSCH)  34
  4.3 Video Processing Domain (RheonMedia)  37
  4.4 Evaluation Platform (CNRS)  39

5 Conclusions  43

6 Appendix - Monitoring API  45

References  69


Document Control

Version  Status                                             Date

0.1      Initial Draft                                      12.02.2015
0.2      Executive Summary; Introduction                    14.07.2015
0.3      Related Work; Scheduling Framework; HPC Roadmap    10.08.2015
0.4      BOSCH contributed to Section 4 (Roadmap)           12.08.2015
0.5      CNRS contributed to Section 4 (Roadmap)            12.08.2015
0.6      RheonMedia contributed to Section 4 (Roadmap)      12.08.2015
1.0      Revised version for internal review                14.08.2015
1.1      Included feedback given by UoY and CNRS            20.08.2015
1.2      Revision of Section 4 (Roadmap)                    21.08.2015
2.0      Final version for submission to EU                 21.08.2015


Executive Summary

This deliverable proposes and describes DREAMCLOUD's dynamic scheduling framework to tackle the challenge of optimizing dynamic resource allocation for modern HPC infrastructures and embedded systems. Our approach employs the history of an application's performance and energy profile to improve energy efficiency by reclaiming dynamic slack times. These slack times can then be exploited to apply energy-saving techniques dynamically at run-time. Our framework improves the resource allocation of tasks by feeding the dynamic resource allocation techniques presented in Work Package 2 with application data obtained through monitoring [36]. We successfully extended the EXCESS MONITORING FRAMEWORK, which was introduced with Deliverable D3.1 [32], with additional functionality to collect the required application data. By combining dynamic resource allocation with Dynamic Power Management (DPM) techniques such as Dynamic Voltage and Frequency Scaling (DVFS), our framework improves the execution of dynamic application workflows with respect to performance or energy; users can thus execute applications efficiently without needing in-depth knowledge of either the application itself or the underlying infrastructure.

Our proposed dynamic scheduling framework can be adapted to other DREAMCLOUD use cases such as the video platform to optimize resource allocation for streaming. Furthermore, we show that components of the proposed framework can be efficiently re-used in almost all use cases. Our work therefore actively contributes towards a common DREAMCLOUD software toolkit.


1 Introduction

Scalable applications are often composed of multiple tasks. These tasks can be either dependent on or independent of each other, meaning that some tasks can be executed simultaneously while others need to be executed consecutively. Modern HPC systems are, however, inefficient in their energy consumption due to a permanently changing distribution of computation, communication, and I/O tasks. To tackle this challenge, key components such as the central processing unit (CPU) can be slowed down or turned off to save energy while waiting for new tasks. Employing such energy-saving strategies is known as Dynamic Power Management (DPM). The most noted DPM technique is Dynamic Voltage and Frequency Scaling (DVFS); it decreases the voltage and frequency of CPUs to reduce their energy consumption. Frequency scaling, however, may lead to considerably longer execution times as well as longer CPU idle periods. The additional time may in turn yield a higher total energy consumption of the underlying system, and is likely to violate hard or soft real-time constraints. Because of these negative side effects, the effectiveness of DVFS depends tightly on the application's workflow and system behaviour. As a consequence, we cannot blindly apply frequency scaling to dynamic application workflows without having in-depth knowledge of the application's behaviour on a given infrastructure. Dynamic power management can only be effective when the application's workflow for a given infrastructure is thoroughly analysed and exploited to optimize task scheduling at run-time.

We motivated in Deliverable D3.1 that traditional static scheduling solutions cannot cope with this challenge [32]. Standard solutions do not take the dynamically changing characteristics of the underlying infrastructure, such as memory access rate or network bandwidth, into account. Therefore, we proposed a dynamic scheduling framework as illustrated in Figure 1. We already laid the foundations for dynamic scheduling with Deliverable D3.1 by introducing a monitoring framework to compile detailed application profiles.

Figure 1: Illustrating DREAMCLOUD’s application lifecycle.


Motivated by the fact that none of the existing monitoring solutions could cope with the requirements of DREAMCLOUD¹, we developed a novel solution in collaboration with the European project EXCESS [16]. The monitoring framework has been extended substantially with additional features such as support for profiling the thermodynamics of the underlying infrastructure.

¹ A key requirement for a monitoring framework used within the DREAMCLOUD project was to sample infrastructure data at high frequencies without causing a significant performance overhead. For more information on the requirements and details about DREAMCLOUD's monitoring framework itself, please refer to Deliverable D3.1 [32].

Having collected application profiles for various deployments on a given infrastructure, the application data is then fed to one of the dynamic resource allocation algorithms presented in Work Package 2 to generate an optimized deployment plan. Our scheduling framework uses these plans to exploit slack times² and to efficiently utilize well-established DPM techniques, aiming to reduce the total energy consumption. Furthermore, our dynamic scheduler includes a smart feedback control mechanism; the controller continuously monitors both the execution times and the energy consumption of running tasks. The collected data is then compared to previously estimated values. When these values differ significantly at run-time, our scheduler either re-schedules tasks itself or invokes the dynamic resource allocation algorithm again for an updated deployment plan. Details on smart scheduling are summarized in Deliverable D3.4 [34].

² A slack is the additional time a task can be executed without violating its deadline.

In summary, we address in this deliverable the challenge of dynamic power management in HPC and embedded systems by proposing a dynamic scheduling framework that incorporates detailed information about both the application's performance and the thermodynamics of the infrastructure to improve dynamic resource allocation. A better resource allocation then allows the framework to automatically exploit slack times of tasks by utilizing well-established DPM techniques. In this way, we enable users to optimize the execution of their applications towards better performance and lower energy consumption. The main contributions of this deliverable are:

Techniques for energy-aware scheduling of applications. Energy-aware scheduling first requires setting up a suitable infrastructure. In this regard, we successfully extended the EXCESS MONITORING FRAMEWORK developed in Deliverable D3.1 to meet additional requirements such as monitoring the thermodynamics of the underlying infrastructure [32]. Extensions include a novel monitoring API as well as the provisioning of new metrics. These extensions allow for better prediction of the run-time and energy consumption of dynamic application workflows. As a result, monitoring an application now yields both a detailed performance profile and an energy profile (cf. Section 3.3.1).

Exploiting application profiles to improve dynamic resource allocation. The research done in Work Package 2 shows that in-depth application profiles can improve algorithms for dynamic resource allocation. As a consequence, we have actively pushed the integration with the EXCESS MONITORING FRAMEWORK to support, among others, energy profiles. Historical application profiles as well as run-time progress information are then used as input for the profiling-based heuristics developed in Work Package 2 (cf. Section 3.3.2).

Proposing a dynamic scheduling framework including DPM. Dynamic power management plays a crucial role for both HPC and embedded systems. Since it is a challenging task, we propose a novel dynamic scheduling framework to optimize the dynamic resource allocation of tasks at run-time (cf. Section 3).




Roadmap for DREAMCLOUD's use cases towards DPM. We report on the applicability of DPM techniques for each DREAMCLOUD use case. Although our evaluations revealed that DPM is not feasible for the automotive domain, we fostered the collaboration and integration process with CNRS and RheonMedia to come up with a roadmap towards possible dynamic resource allocation solutions; these solutions re-use key components of our proposed dynamic scheduling framework such as monitoring (cf. Section 4).

The remainder of this deliverable is organized as follows. Section 2 gives an overview of existing state-of-the-art approaches to task scheduling and dynamic power management techniques. The overview focuses, in particular, on energy-aware task scheduling algorithms. Section 3 then details DREAMCLOUD's approach to dynamic task scheduling, which exploits the application's historical context as well as the monitored thermodynamics of the underlying infrastructure. In this context, Section 3.3.1 highlights improvements developed for the EXCESS MONITORING FRAMEWORK, whereas Section 3.3.2 provides an overview of the integration with the dynamic resource allocation algorithm proposed in Deliverable D2.3 [36]. Section 4 evaluates the different DREAMCLOUD use cases in terms of the applicability of dynamic power management techniques. Section 5 finally concludes this deliverable by summarizing the contributions made and giving an outlook on future work.


2 Energy-Aware Scheduling of Dynamic Application Workflows

Our proposed dynamic scheduling framework incorporates techniques and algorithms from domains such as dynamic power management and task scheduling. This section briefly reviews work relevant and related to both domains. For each domain, we highlight the approach adopted by DREAMCLOUD.

2.1 Dynamic Application Workflows

Dynamic application workflows are quite common in scientific computing. An application workflow is composed of multiple, often interdependent, tasks, where a task is halted until the tasks it depends on have finished. Application workflows are said to be dynamic if the execution order of tasks is not known a priori. Dynamic workflows are thus more flexible to configure and to scale on HPC systems.

Application workflows, in general, are modelled as directed acyclic graphs, as Figure 2 exemplifies. Tasks are represented by nodes, and the communication between tasks is represented by edges. Deliverable D3.1 on Cloud Communications Patterns Analysis already motivated that an application workflow corresponds to a software model that can be directly mapped to the AMALTHEA format [18], which is also utilized by the DREAMCLOUD applications developed within the embedded domain [32].
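
For illustration only, and not taken from the DREAMCLOUD code base, the workflow of Figure 2 can be written down as a small adjacency list in the C-style pseudocode used later in this deliverable; the type and field names are our own:

/* Illustrative only: the Figure 2 workflow as a directed acyclic graph.
 * Nodes are tasks, edges are data dependencies (successor lists). */
typedef struct WorkflowNode {
    const char *taskID;
    const char *successors[4];   /* tasks that must wait for this one */
} WorkflowNode;

static const WorkflowNode sample_workflow[] = {
    { "T1",   { "T2.1", "T2.2" } },  /* T2.1 and T2.2 need data from T1 */
    { "T2.1", { "T3" } },
    { "T2.2", { "T3" } },
    { "T3",   { 0 } },               /* exit task, no successors */
};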

DREAMCLOUD’s approach to dynamic application workflows

For DREAMCLOUD's HPC use case we carefully selected the workflow-based molecular dynamics simulation ms2 as a typical example of a dynamic application workflow [13]. The application is outlined in Deliverable D3.1 [32], whereas the workflow model is described in more detail in Deliverable D3.3 on Energy-Aware Allocation for Clouds [33].

Our proposed dynamic scheduling framework optimizes the allocation of individual tasks; for each task allocation, effective power management is applied to the target systems to considerably reduce energy consumption.

2.2 Static and Dynamic Power Management

The key objective of power management is to considerably reduce the overall energy consumption of a system. Turning off system components whenever they are idling is the most obvious technique to save energy. Figure 3 provides a high-level classification of power management techniques into static and dynamic. Static Power Management (SPM) solutions include low-energy hardware and cooling. For example, the Dynamical Exascale Entry Platform (DEEP) built an exascale-ready system based on a highly efficient direct liquid cooling system; the project pledges that its solution is more energy-efficient than standard air cooling by a factor of 200 [31]. However, low-energy hardware and cooling are expensive, and thus more cost-effective solutions are required. Furthermore, DREAMCLOUD's objective, to optimize the execution of dynamic application workflows, cannot be achieved with SPM techniques alone, because these techniques cannot react to changing system behaviour during operation.


Figure 2: Example of a dynamic application workflow modelled as a directed acyclic graph. The workflow has four different tasks: tasks T2.1 and T2.2 require data generated by task T1. Likewise, task T3 cannot start its execution before tasks T2.1 and T2.2 have finished. Tasks T2.1 and T2.2 themselves are independent of each other.

In contrast to SPM, Dynamic Power Management (DPM) can adjust the power consumption of capable hardware, such as modern processors, during operation. As a consequence, implementing DPM techniques in the dynamic scheduling framework allows DREAMCLOUD to react quickly to dynamically changing application behaviour at run-time.

Dynamic power management can be implemented in various ways. The focus is either on (a) reducing the average energy consumption, (b) reducing the peak power consumption [12], (c) limiting the peak power consumption (also known as power capping) [7], or (d) reducing the energy consumption while satisfying Quality-of-Service guarantees [17, 29].

Typical dynamic power management techniques include powering down or turning off hardware components such as memory, network devices, or hard disks. Although those are also of interest for DREAMCLOUD, we currently focus on two well-established techniques: DVFS and (energy-aware) task scheduling.

DREAMCLOUD’s approach to static and dynamic power management

Our proposed dynamic scheduling framework supports Quality-of-Service guarantees via multi-criteria fitness functions. These functions are integrated into the heuristics developed in Work Package 2. They guarantee user-defined constraints with respect to the economic costs of users, time, and energy consumption. We refer the interested reader to Deliverables D2.3 [36] and D3.4 [34] for a detailed overview of the heuristics.

2.2.1 Dynamic Voltage and Frequency Scaling (DVFS)

Nowadays, reducing energy consumption is a driving factor for data centers [41]. Aside from operational costs, another major challenge identified is degrading system reliability as temperature increases [42]. Patterson [20] shows that the CPU is particularly sensitive to an increase in temperature, which results in increasing thermal leakage and makes computation more error-prone.


Figure 3: Taxonomy of power management techniques.

Here, DVFS can be used to control both the energy consumption and the thermal leakage.

Usually, DVFS is implemented to reduce frequency and voltage and thereby save energy. Reducing the processor frequency, however, does not necessarily result in energy savings. A lower frequency is likely to result in longer execution times, which can in turn lead to an increased energy consumption [9]. In addition, the non-linear relationship between processor frequency and energy consumption has to be taken into account when adjusting the processor frequency. Sandoval et al. [26] analysed in this context the energy consumption of the Intel Xeon E5-2690v2 CPU for different frequencies; the experiments were performed on a widespread set of benchmarks. They show that the power consumption rises significantly when the CPU is in turbo mode, and that the relationship between performance and power consumption varies significantly between benchmarks. Their results suggest that a key objective in implementing DVFS is to estimate future workload. Such estimates help to select the minimum required frequency for different application phases while still satisfying QoS guarantees.
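
As general background, and not taken from this deliverable, the non-linearity can be made explicit with the standard first-order CMOS power model:

    P_{\mathrm{dyn}} \approx \alpha\, C\, V^{2} f, \qquad
    t_{\mathrm{cpu\text{-}bound}} \propto \frac{1}{f}, \qquad
    E_{\mathrm{dyn}} = P_{\mathrm{dyn}} \cdot t \;\propto\; V^{2}

Since the minimum stable supply voltage itself grows roughly with the frequency, lowering frequency and voltage together reduces the dynamic energy per task roughly quadratically, while the longer runtime increases the static (leakage) energy consumed; this is exactly the trade-off described above.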


Figure 4: Scheduling of the application workflow shown in Figure 2. Tasks T1, T2.2, and T3 are executed on CPU0, whereas task T2.1 is executed in parallel with T2.2 on CPU1.

DREAMCLOUD’s approach to dynamic voltage and frequency scaling

Traditional DVFS has its limits in minimizing the energy consumption of dynamic application workflows when used in isolation [21]. This is why we combine DVFS with task scheduling, as is often suggested in the literature [24, 23].

In addition to DVFS, our proposed dynamic scheduling framework exploits application profiles collected at run-time to cope with dynamic application characteristics (cf. Section 3). Moreover, we have developed a light-weight utility to set CPU frequencies (cf. Section 4.1).

2.2.2 Task Scheduling

Usually, the main objective of a scheduling algorithm is to reduce the overall execution time of an application. The algorithm analyses the application's workflow, and then allocates individual tasks to available compute resources, including server nodes and VMs, to name but a few. A simple resource allocation is presented in Figure 4 for the workflow shown in Figure 2. A static scheduling algorithm would estimate the worst-case execution times of all tasks in order to yield a resource allocation with the shortest execution time while guaranteeing task dependencies.

Figure 5: We observe a slack if the task T2.1 finishes before its expected worst-case execution time; task T3 cannot start before task T2.2 also finishes.


Task scheduling is an important technique to reduce the energy consumption of dynamic application workflows. As Figure 2 shows, tasks T2.1 and T2.2 are independent of each other. Both tasks can be executed either simultaneously or one after the other in arbitrary order; the latter usually results in a longer execution time, and is thus likely to consume more energy. Scheduling decisions can be made either before the workflow is executed (static) or at run-time (dynamic); the classification is similar to that of power management. Scheduling algorithms for dynamic application workflows (DAG model) are heuristic-based; the most prominent one is the Heterogeneous Earliest Finish Time (HEFT) algorithm [39]. The basic idea of the HEFT algorithm is the following: the algorithm integrates computation and communication costs into the standard DAG model by assigning weights to both nodes (computation costs) and edges (communication costs). The algorithm then ranks each task (i.e., node) using a user-defined ranking function, and tasks are scheduled in order of their rank value to achieve an early finish time. It has been shown that the HEFT algorithm generates schedules that finish earlier than those of comparable algorithms in most cases [44].
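
To make the ranking step concrete, the commonly used upward rank of HEFT can be computed recursively over the DAG; the sketch below uses illustrative types and is neither the reference HEFT implementation nor DREAMCLOUD code:

/* Illustrative sketch of HEFT's upward rank: tasks are later scheduled in
 * decreasing rank order onto the resource yielding the earliest finish time. */
#define MAX_SUCC 8

typedef struct TaskNode {
    double comp_cost;                 /* average computation cost (node weight) */
    double comm_cost[MAX_SUCC];       /* communication cost to each successor (edge weight) */
    struct TaskNode *succ[MAX_SUCC];  /* successor tasks, NULL-terminated */
} TaskNode;

double upward_rank(const TaskNode *t)
{
    double best = 0.0;
    for (int i = 0; i < MAX_SUCC && t->succ[i]; i++) {
        double r = t->comm_cost[i] + upward_rank(t->succ[i]);
        if (r > best)
            best = r;
    }
    /* rank_u(t) = w(t) + max over successors s of (c(t,s) + rank_u(s)) */
    return t->comp_cost + best;
}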

Coming back to the example workflow shown in Figure 4: if the independent tasks T2.1 and T2.2 have different execution times, we can observe a so-called slack. A slack is defined as the time by which a task, here T2.1, can be delayed without delaying the entire workflow. As a result, we can extend the execution time of T2.1 until T2.2 finishes. Naturally, slacks can be exploited to save energy: the lowest possible processor frequency is set such that the task still meets its deadline. Figure 5 illustrates such a scenario for the workflow in question. Exploiting slack times during scheduling to save energy is known as energy-aware task scheduling.
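
The frequency selection for such a slack can be sketched as follows; this is a minimal illustration that assumes execution time scales inversely with frequency (which ignores memory-bound phases), and the function and parameter names are our own, not part of the framework:

/* Illustrative sketch: pick the lowest discrete CPU frequency so that a task
 * with known remaining work still meets its deadline. */
#include <stddef.h>

double min_frequency_for_deadline(double exec_time_at_fmax,   /* seconds at f_max */
                                  double time_until_deadline, /* seconds, includes slack */
                                  const double *freqs,        /* available frequencies, ascending */
                                  size_t n_freqs,
                                  double f_max)
{
    for (size_t i = 0; i < n_freqs; i++) {
        double scaled_time = exec_time_at_fmax * (f_max / freqs[i]);
        if (scaled_time <= time_until_deadline)
            return freqs[i];   /* lowest frequency that still meets the deadline */
    }
    return f_max;              /* no exploitable slack: run at full speed */
}

For the scenario of Figure 5, exec_time_at_fmax would be the remaining execution time of T2.1 and time_until_deadline the expected finish time of T2.2.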

2.3 Energy-Aware Workflow Scheduling

Energy-aware task scheduling is a challenging research topic and is widely discussed in the literature [21, 24, 23]; it is also tackled in DREAMCLOUD's Work Package 2 on Dynamic Resource Allocation Techniques. Deliverable D2.3, for example, motivates that slack times should be exploited for energy savings by implementing DVFS [36]. Pietri et al. [21] discuss scheduling algorithms that incorporate DVFS. In this context, they distinguish strategies based on slack prediction and slack reclamation. Slack prediction is concerned with predicting the slack times prior to the execution of an application workflow. Prediction-based algorithms therefore need a-priori knowledge about tasks to determine slack times within the workflow. Pietri and Sakellariou [21] tackle the problem of lacking a-priori knowledge by executing new tasks at the highest possible frequency. At run-time, the task execution times are measured, and the data is used to predict execution times at lower frequencies for subsequent executions. The dynamic algorithm proposed in [21] is also able to re-schedule tasks if the monitoring detects deviations from previous predictions. A downside of their approach is that they assume each task to behave the same each time it gets executed. We claim that this limits the applicability of their approach to real-world scenarios: dynamic scheduling of tasks usually leads to different resource allocations each time a workflow is executed. Thus, it is likely that tasks of the same or other workflows interfere with each other.

Slack reclamation, by contrast, exploits additional slack times that occur at run-time due to the fact that some tasks finish before their estimated worst-case execution time (cf. Figure 5). Since Pietri and Sakellariou [21] included a similar feedback control mechanism in their algorithm, which can exploit dynamic slacks at run-time, their approach can also be categorized as a slack reclamation algorithm.


DREAMCLOUD’s approach to energy-aware workflow scheduling

We claim that the key to success is proper slack prediction and reclamation to save energy efficiently. Our approach focuses on exploiting application profiles to optimize the slack prediction for future schedules. Progress information is also monitored during the execution of applications. This information is used at run-time in the same manner as in [21], i.e., to re-schedule tasks if required. In contrast to [21], our heuristics are smart: they continuously learn by taking the application's profile history into account. We do not assume that a task behaves the same each time it gets executed.
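
To make the idea of learning from the profile history concrete, one simple scheme, given here purely as an illustration and not necessarily the heuristic developed in Work Package 2, is an exponentially weighted moving average over the runtimes recorded for a task across past experiments:

/* Illustrative only: predict a task's next runtime from its profile history
 * using an exponentially weighted moving average (EWMA). */
#include <stddef.h>

double predict_runtime(const double *history, size_t n, double alpha)
{
    if (n == 0)
        return -1.0;                 /* no history yet: caller falls back to WCET */
    double estimate = history[0];
    for (size_t i = 1; i < n; i++)
        estimate = alpha * history[i] + (1.0 - alpha) * estimate;
    return estimate;                 /* recent runs weigh more for alpha close to 1 */
}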


3 Dynamic Scheduling Framework

This section defines the specification of the proposed dynamic scheduling framework. It is structured around its three main phases: planning, profiling, and execution. It should be noted that this section provides a high-level overview of the dynamic scheduling framework; implementation details are given in Deliverables D3.3 (planning phase) and D3.4 (execution phase), respectively [33, 34].

3.1 General Overview

We already touched on DREAMCLOUD's approach to energy-aware scheduling in Section 2, which presented well-established techniques that combine dynamic power management with the scheduling of dynamic application workflows to save energy. Our approach can be classified as a combination of slack prediction and reclamation: dynamic slacks are exploited at run-time to effectively apply DVFS techniques and save energy. Since DREAMCLOUD is tackling both HPC and embedded systems, this section proposes DREAMCLOUD's generic framework for dynamic scheduling in both domains. Our solution takes historic and current application profiles into account. It improves slack prediction and quickly reacts to deviations in task characteristics. Since a task's characteristics can differ significantly even when it is executed on the same infrastructure (e.g., due to network congestion or defective hardware), solutions for energy-aware scheduling have to account for this dynamism.

Figure 6: High-level abstraction of DREAMCLOUD's dynamic scheduling framework; the three phases correspond to the phases illustrated in Figure 1 on page 2.


Figure 6 shows a high-level abstraction of our dynamic scheduling framework. We distinguish three phases: firstly, a user submits an application workflow together with an optimization criterion in the planning phase (cf. Deliverable D3.4 [34]). Secondly, the EXCESS MONITORING FRAMEWORK collects performance and energy data for the application in question at run-time during the profiling phase. The outcomes of this phase are so-called performance and energy profiles for tasks and workflows. Thirdly, in the execution phase, the WORKFLOW SCHEDULER requests a deployment plan from the HEURISTIC MANAGER based on the original user submission. The HEURISTIC MANAGER then pulls historic information about the application in question from the EXCESS MONITORING FRAMEWORK. Based on the given workflow description and profiles, the HEURISTIC MANAGER creates an optimized deployment plan. During execution, the WORKFLOW SCHEDULER monitors running applications, tracks their progress via the EXCESS MONITORING FRAMEWORK, and notifies the HEURISTIC MANAGER if the execution differs significantly from the provided plan. The corresponding feedback controller handling deviations is detailed in Section 3.3.2.

Figure 7: DREAMCLOUD's dynamic scheduling framework; the numbering indicates the order of the interactions between components.


3.2 DREAMCLOUD’s Dynamic Scheduling Architecture

This section details the novel dynamic scheduling framework and the interaction between its components and services. The focus lies on the components essential for energy saving: the HEURISTIC MANAGER, the EXCESS MONITORING FRAMEWORK, the SCHEDULING ADVISOR, and the PROGRESS TRACKER. If not stated otherwise, components offer a RESTful API for communication.

3.2.1 Key Components

As Figure 6 shows, the key components of DREAMCLOUD's dynamic scheduling framework are: the WORKFLOW MANAGER, the SCHEDULING ADVISOR, the RESOURCE MANAGER, the EXCESS MONITORING FRAMEWORK, and the HEURISTIC MANAGER.

WORKFLOW MANAGER. The WORKFLOW MANAGER is responsible for processing incoming job requests and for initiating and monitoring the job execution. To this end, the WORKFLOW MANAGER has the following components and services:

• Submission Interface. The Submission Interface accepts both the native DREAMCLOUD scheduling format as well as workflows described in the AMALTHEA format. The Submission Interface processes incoming job requests, validates the attached workflow graph, and forwards the processed job request to the Scheduling Interface.

• Scheduling Interface. The Scheduling Interface forwards new scheduling requests to the SCHEDULING ADVISOR. A scheduling request consists of the original job request, the selected optimization criterion, and current information on allocated and available resources of the infrastructure; the latter information is retrieved by querying the RESOURCE MANAGER. The Scheduling Interface expects from the SCHEDULING ADVISOR in response a deployment plan for the current workflow. It should be noted that the SCHEDULING ADVISOR acts as an oracle that returns a deployment plan for each scheduling request. However, it is not guaranteed that a deployment plan is optimal with respect to the given optimization criterion.³

• Deployment Manager. The Deployment Manager has two responsibilities. Firstly, it keeps track of running and scheduled job requests including their current allocation. That way, we can avoid conflicts in resource allocation on a given platform. Secondly, the Deployment Manager interfaces with the system's resource manager. In the HPC use case, the well-established PBS/TORQUE resource manager is utilized to control job executions on the cluster. It allows jobs to be executed on a per-core basis on different compute nodes using MPI. The Deployment Manager takes a scheduling request and passes it on to PBS/TORQUE to start the execution. This implies that the job execution description in Table 2 has to contain commands understood by the system's resource manager, i.e., a working PBS/TORQUE script.

• Progress Tracker. The PROGRESS TRACKER keeps track of the current progress of individual tasks and of the entire workflow. Tasks themselves have to report their progress as a percentage to the EXCESS MONITORING FRAMEWORK via a RESTful API. Thus, the PROGRESS TRACKER interacts directly with the monitoring framework. Every second, the current progress of running tasks is polled and compared with the expected execution time. If the progress differs from the estimated execution times, the SCHEDULING ADVISOR is notified with the current and expected status to provide an adapted deployment plan, if required (a minimal sketch of this deviation check is given after this list).

³ On the other hand, we expect that the HEURISTIC MANAGER always returns an optimal deployment plan.


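
To make the polling logic above concrete, the following fragment sketches the deviation check in the C-style pseudocode used throughout this deliverable; the function name, the tolerance parameter, and the data layout are our own illustration, not the actual PROGRESS TRACKER implementation:

/* Illustrative sketch: compare a task's reported progress against the progress
 * expected from the deployment plan and flag a deviation when it exceeds a
 * tolerance. The SCHEDULING ADVISOR would be notified for flagged tasks. */
#include <stdbool.h>

bool progress_deviates(double elapsed_s,         /* time since task start */
                       double planned_runtime_s, /* runtime estimated in the plan */
                       double reported_progress, /* 0.0 .. 1.0, from the monitoring DB */
                       double tolerance)         /* e.g. 0.10 for 10% */
{
    if (planned_runtime_s <= 0.0)
        return false;                            /* nothing to compare against */
    double expected_progress = elapsed_s / planned_runtime_s;
    if (expected_progress > 1.0)
        expected_progress = 1.0;
    return (reported_progress < expected_progress - tolerance) ||
           (reported_progress > expected_progress + tolerance);
}

A tolerance of, say, 10% would keep the SCHEDULING ADVISOR from being notified about insignificant fluctuations.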

RESOURCE MANAGER. The RESOURCE MANAGER maintains a table of allocated and available compute resources. The table is updated whenever a task finishes its execution. The RESOURCE MANAGER provides a RESTful API, so that other components such as the Scheduling Interface can query current information. The RESOURCE MANAGER itself has no further components. In the HPC use case, we interface with the existing PBS/TORQUE manager to retrieve information on allocated and available compute resources. However, our dynamic scheduling framework requires the availability of individual CPU cores, which cannot be provided by PBS/TORQUE. As a result, the RESOURCE MANAGER compiles a list of allocated and available cores per node from information provided by the Deployment Manager.

SCHEDULING ADVISOR. The SCHEDULING ADVISOR plays a key role in the entire scheduling framework. It takes on scheduling tasks to reduce the complexity of the WORKFLOW MANAGER. The responsibilities of the SCHEDULING ADVISOR are:

• provision of deployment plans using well-known scheduling algorithms such as HEFT;
• interfacing with the HEURISTIC MANAGER to retrieve optimized deployment plans;
• rescheduling when actual execution times differ significantly from estimations.

The term SCHEDULING ADVISOR has been adopted from the EU ICT project JUNIPER [38], which develops a platform for large-scale and real-time data analytics. There, the SCHEDULING ADVISOR is used as a middleware service between the native scheduling components of an HPC-cloud system and the supervising components of the platform; it allows their interoperation without adding any changes to the security-sensitive layers of the HPC software stack. A similar strategy is followed by our DREAMCLOUD platform, which aims to facilitate a better integration across all components of the DREAMCLOUD framework.

The SCHEDULING ADVISOR implements a feedback control mechanism to reduce the expensive communication with the HEURISTIC MANAGER. As a result, the SCHEDULING ADVISOR either creates deployment plans locally using the implemented scheduling algorithms, or delegates this task to the HEURISTIC MANAGER. The SCHEDULING ADVISOR sees the HEURISTIC MANAGER as an oracle, which immediately returns an optimal deployment plan on request.

EXCESS MONITORING FRAMEWORK. The EXCESS MONITORING FRAMEWORK is deployed on the system and monitors the infrastructure and the running application workflows. The framework processes and stores metrics relevant to evaluating the performance and energy consumption of individual tasks and of an entire workflow. The EXCESS MONITORING FRAMEWORK also makes progress information of all current and historic tasks and workflows available to clients such as the PROGRESS TRACKER. However, the key role of the EXCESS MONITORING FRAMEWORK within DREAMCLOUD's dynamic scheduling framework is to provide so-called performance and energy profiles to the HEURISTIC MANAGER. The HEURISTIC MANAGER requests these profiles whenever a new scheduling request arrives.


Figure 8: Sequence diagram for a sample application workflow scheduled by DREAMCLOUD's dynamic scheduling framework. Whereas the individual communication patterns are shown for the first execution phase (2-11), only the initial request (2) is shown for the second and third phase. However, phases two and three also run through all steps.

The EXCESS MONITORING FRAMEWORK is well suited for the HPC use case in order to provide essential information about the impact of applications on the underlying infrastructure. However, it should be noted that the EXCESS MONITORING FRAMEWORK cannot be deployed for every DREAMCLOUD use case. For example, the resources of BOSCH's automotive use case are very limited and do not allow additional software components to be installed for use at run-time. We have therefore developed light-weight user libraries in Python and C to overcome this obstacle: these clients allow metric data to be sent as a post-processing step without the need to install individual monitoring agents. Thus, in use cases with limited resources, relevant application profiles can be generated once executions have finished. Profiles can then be sent to the monitoring database for further analysis.

HEURISTIC MANAGER. The scheduling components see the HEURISTIC MANAGER as an oracle, too. The HEURISTIC MANAGER is queried with a scheduling request including an optimization criterion, and responds with an optimal deployment plan. The HEURISTIC MANAGER is provided by the University of York, and is developed in the course of Work Package 2. The communication between the scheduling components and the HEURISTIC MANAGER is implemented via a RESTful API. That way, the HEURISTIC MANAGER can retrieve relevant performance and energy profiles for given tasks and workflows from the EXCESS MONITORING FRAMEWORK.


We would like to highlight once more that only the proper interaction of all the components and services described above leads to improved performance and energy consumption. Thus, it is crucial that all components are implemented efficiently and play well together.

3.2.2 Interaction between Key Components

This section details the interaction between the key components of the dynamic scheduling framework, exemplified by the sample workflow illustrated in Figure 2 on page 6. A brief review: the workflow has four tasks, where tasks T1 and T3 are serial, and tasks T2.1 and T2.2 can be executed in parallel (i.e., these are MPI processes in the HPC use case). Figure 8 shows the sequence diagram highlighting the individual communication between components required to successfully schedule dynamic application workflows. A user submits the previously mentioned sample workflow, and the communication is detailed for each of the three phases (T1 = serial, T2.x = parallel, and T3 = serial). The remainder of this section is structured along Figure 8 and specifies for each step the shared data exchange format. It should be noted that the data exchange format is presented in C-style pseudocode.

First, let us specify the general workflow and define a common terminology used throughout the rest of this document: we define an experiment as the execution of a dynamic application workflow on a particular platform; multiple tasks form a workflow. Tasks are executed either in series or in parallel. Moreover, each experiment is associated with an optimization criterion and a specific deployment plan that details the allocation of resources for each task.

For the analysis of workflows, and in particular of their individual tasks, we associate each stage of an experiment with unique identifiers in order to enable end-users to analyse tasks across experiments. For a clearer understanding, we introduce a series of identifiers associated with each component of an experiment (cf. Table 1): experiment, workflow, task, and deployment IDs. All APIs apply this terminology for communication. Next, we describe the main steps of a simple workflow execution.

Term: Experiment ID (abbreviation: expID)
Description: An experiment refers to the execution of an application workflow under a given resource allocation. The experiment ID is unique, and each experiment is associated with exactly one deployment plan.

Term: Workflow ID (abbreviation: workflowID)
Description: Each application workflow is described by a unique identifier. Since an application workflow can be part of various experiments, there is a one-to-many relationship between workflows and experiments.

Term: Task ID (abbreviation: taskId)
Description: Workflows are composed of at least one task, where each task is identifiable via a unique task ID. To keep it simple, we do not consider the case where a task is re-used by different workflows; each task therefore belongs to exactly one workflow. A task ID allows us to track tasks across multiple experiments.

Term: Deployment Plan ID (abbreviation: deployID)
Description: Each experiment is also linked to a specific deployment plan. There is a one-to-many relationship between deployment plans and experiments, because various experiments can use the same deployment plan.

Table 1: Terminology that describes dynamic application workflows in the context of scheduling. This terminology is applied throughout to simplify communication between interfaces and to provide a reasonable monitoring API.


task T1 start
next T2.1 T2.2
file T1.job
energy 1.0

task T2.1
next T3
file T2.1.job
energy 0.5

task T2.2
next T3
file T2.2.job
energy 0.5

task T3 exit
file T3.job
energy 1.0

Listing 1: Sample workflow submission format based on the dynamic application workflow illustrated in Figure 2 on page 6.

(1) Submitting a new job request to the WORKFLOW MANAGER. Let us assume that a user submits a new job request to the WORKFLOW MANAGER in order to execute a dynamic application workflow. Listing 1 shows a simple workflow description in the native format of our scheduling framework, whereby individual tasks are separated by blank lines. Table 2 lists the common fields used to describe a workflow in the native format. Users can extend the list of fields by adding custom fields. As a custom field, for example, a user could choose type; it would tell the scheduler whether a task at hand is compute-, memory-, or communication-intensive. This information could then be exploited by the SCHEDULING ADVISOR and the HEURISTIC MANAGER to improve deployment plans. However, developers have to ensure that dependent components understand custom fields; otherwise, those fields are discarded when new jobs are submitted to the WORKFLOW MANAGER.

Field: task
Parameters: <string> ROUTING
Required: yes
Comment: A unique task ID (e.g., T2.1) is used to identify a task within a given workflow. We use an enumerated type ROUTING = {START, EXIT} to mark tasks that are the starting point or the end of a workflow, respectively.

Field: next
Parameters: <string>
Required: no
Comment: Reference to another task ID, which is executed after the given task finishes. Allows generation of a task graph.

Field: file
Parameters: <string>
Required: yes
Comment: File that includes the instructions required to execute a given task.

Field: energy
Parameters: <float>
Required: no
Comment: Optimization criterion in the range [0, 1], where energy = 0 optimizes the execution for performance, and energy = 1.0 optimizes towards the lowest energy consumption.

Table 2: List of default fields supported by the native workflow submission format.


typedef struct Parameter {
    char* name;
    float value;
} Parameter;

typedef enum ROUTING {
    START,
    EXIT
} ROUTING;

typedef struct Task {
    char* id;
    ROUTING route;
    char* next_ids[128];
    Parameter params[1024];
} Task;

void processTask(Task task) { /* do something with a single task */ }
void processTasks(Task tasks[]) { /* do something with parallel tasks */ }

/* Process task T1 */
Task task1 = (Task){ .id = "T1", .route = START,
    .next_ids = { "T2.1", "T2.2" },
    .params = {
        { .name = "energy", .value = 1.0 },
        { .name = "type", .value = 0 }   /* 0 = memory-bound */
    }
};
processTask(task1);

/* Process tasks T2.1 and T2.2 */
Task task21 = (Task){ .id = "T2.1", .next_ids = { "T3" },
    .params = {
        { .name = "energy", .value = 1.0 },
        { .name = "type", .value = 2 }   /* 2 = communication-intensive */
    }
};
Task task22 = (Task){ .id = "T2.2", .next_ids = { "T3" },
    .params = {
        { .name = "energy", .value = 1.0 },
        { .name = "type", .value = 2 }   /* 2 = communication-intensive */
    }
};
Task parallel_tasks[2] = { task21, task22 };
processTasks(parallel_tasks);

/* Process task T3 */
Task task3 = (Task){ .id = "T3", .route = EXIT,
    .params = {
        { .name = "energy", .value = 1.0 },
        { .name = "type", .value = 1 }   /* 1 = compute-intensive */
    }
};
processTask(task3);

Listing 2: Example of a scheduling request including energy and type parameters. First, task T1 is described, followed by the parallel tasks T2.1 and T2.2. Finally, task T3 is declared and processed. The original task dependencies are preserved through the procedural description.


node01
CPU0
CPU1 0..120 240..480
CPU2 0..120 240..480
CPU3 120..360 480..960
CPU4 120..360 480..960

node02
CPU0 0..960
CPU1
CPU2
CPU3 120..360 480..960
CPU4 120..360 480..960

Listing 3: Response from the RESOURCE MANAGER when queried for allocated and available resources. For each node, CPUs are listed in increasing order. Availability intervals are associated with a CPU if the CPU is (partly) allocated; if a CPU is fully available, no further timings are given. Blank lines distinguish different compute nodes, and spaces separate different availability intervals.

(2) Scheduling request to the SCHEDULING ADVISOR. The processed job request, i.e., a representation of the directed acyclic graph, is forwarded to the SCHEDULING ADVISOR. The data exchange format follows the example given in Listing 2. Additional parameters declared for each task, including the optimization criterion, are passed to the advisor via the methods processTask() and processTasks(), respectively; the latter method takes an array of tasks to be executed in parallel.

(3) The SCHEDULING ADVISOR sends a request to the RESOURCE MANAGER. When a new scheduling request arrives at the SCHEDULING ADVISOR, the SCHEDULING ADVISOR pulls from the RESOURCE MANAGER a current list of allocated and available compute resources including the availability of CPU cores. This request is independent of any given application workflow, and thus a parameterless function call suffices.

(4) The RESOURCE MANAGER responds to the SCHEDULING ADVISOR. The RESOURCE MANAGER stores all allocated and available compute resources of a system at all times. Resources are described on a per-node basis, and nodes are associated with a unique identifier. The response from the RESOURCE MANAGER then contains a list of all registered compute nodes including their compute resources (e.g., CPUs, GPUs, et cetera) and availability intervals. Availability intervals are given in seconds as exemplified by Listing 3: CPU1 of node01, for example, is available for the first 120 seconds, allocated between seconds 120 and 240, and then available again from second 240 to second 480.
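
Based on this representation, checking whether a CPU can host a task for a requested time window reduces to a simple interval test; the sketch below uses our own illustrative types, not the actual RESOURCE MANAGER data structures:

/* Illustrative sketch: check whether a CPU is free for the whole interval
 * [start, end), given its availability intervals as reported in Listing 3. */
#include <stdbool.h>
#include <stddef.h>

typedef struct Interval { double start, end; } Interval;

bool cpu_available(const Interval *avail, size_t n, double start, double end)
{
    for (size_t i = 0; i < n; i++) {
        if (start >= avail[i].start && end <= avail[i].end)
            return true;    /* requested window fits one availability interval */
    }
    return false;
}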

(5) The SCHEDULING ADVISOR sends a request to the HEURISTIC MANAGER. Following the incoming scheduling request and the retrieval of allocated and available resources, the SCHEDULING ADVISOR sends all aggregated data to the HEURISTIC MANAGER. The SCHEDULING ADVISOR can also include additional parameters in the request, such as declaring the scheduling algorithm the HEURISTIC MANAGER is supposed to use to create a deployment plan. Since the HEURISTIC MANAGER is an external component, communication is done via a RESTful Web service.


{"command": "generateDeploymentPlan""workflowID": "sample.workflow","tasks": [{

"taskID": "T1","energy": 1.0,"parents": [],"children": [ "T2.1", "T2.2" ]

},{

"taskID": "T2.1","energy": 0.5,"parents": [ "T1" ],"children": [ "T3" ]

},{

"taskID": "T2.2","energy": 0.5,"parents": [ "T1" ],"children": [ "T3" ]

},{

"taskID": "T3","energy": 1.0,"parents": [ "T2.1" ],"children": []

},]

}

Listing 4: SCHEDULING ADVISOR requests a deployment plan from the HEURISTIC MANAGER bysubmitting a workflow (i.e., DAG graph) and optimization parameters.

Listing 4 illustrates a request for the considered sample workflow. If the SCHEDULING ADVISOR does not get an answer from the HEURISTIC MANAGER within a defined time interval, the SCHEDULING ADVISOR continues by creating a deployment plan itself based on the given information. The SCHEDULING ADVISOR supports scheduling via the previously described HEFT algorithm (cf. Section 2.2.1).

(6) The HEURISTIC MANAGER sends a response to the SCHEDULING ADVISOR. The HEURISTIC MANAGER returns a deployment plan as exemplified in Listing 5. The response includes the allocation times of individual tasks to compute resources; an allocation is described by the CPU identifier (e.g., CPU0), the task identifier (e.g., T2.1), and the availability time of the resource. The deployment plan also includes for each CPU a recommended power mode, i.e., energy-saving, performance, or balanced. Depending on these recommendations, the SCHEDULING ADVISOR adapts the power consumption of each target node using DVFS (cf. Sections 2 and 4.1).


{"command": "receiveDeploymentPlan""errorcode": "0","workflowID": "sample.workflow" [

{"taskID": "T1","resource": "node02","startTime": 0.0,"endTime": 1.0,"cores": 1,"powerMode": 0,"dependencies": []

},{

"taskID": "T2.1","resource": "node01","startTime": 1.0,"endTime": 83.0,"cores": 16,"powerMode": 1,"dependencies":[ "T1" ]

},{

"taskID": "T2.2","resource": "node02","startTime": 1.0,"endTime": 83.0,"cores": 16,"powerMode": 1,"dependencies":[ "T1" ]

},{

"taskID": "T3","resource": "node01","startTime": 325.0,"endTime": 326.0,"cores": 1,"powerMode": 0,"dependencies":[ "T2.1", "T2.2" ]

}]

}

Listing 5: Deployment plan returned by the HEURISTIC MANAGER. The deployment plan includes for each task of the workflow an allocation to nodes and CPUs. For each CPU, the energy state is included as well as the timings planned for allocating the respective CPUs. Power modes are defined as follows: energy-saving = 0, balanced = 1, performance = 2. More information on the supported power modes is given in Table 4 on page 34.
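
As an illustration of how such a powerMode recommendation could be applied on a Linux compute node, the following sketch maps the modes of Listing 5 to cpufreq governors via the standard sysfs interface; the mapping and the helper names are our own assumptions, and the actual frequency-setting utility used by DREAMCLOUD is described in Section 4.1:

/* Illustrative sketch only: map powerMode values from Listing 5 to Linux
 * cpufreq governors and apply them via sysfs (requires root privileges). */
#include <stdio.h>

/* 0 = energy-saving, 1 = balanced, 2 = performance (cf. Listing 5) */
static const char *governor_for_mode(int powerMode)
{
    switch (powerMode) {
    case 0:  return "powersave";
    case 2:  return "performance";
    default: return "ondemand";      /* assumed mapping for "balanced" */
    }
}

int apply_power_mode(int cpu, int powerMode)
{
    char path[128];
    snprintf(path, sizeof(path),
             "/sys/devices/system/cpu/cpu%d/cpufreq/scaling_governor", cpu);

    FILE *f = fopen(path, "w");
    if (!f)
        return -1;
    fprintf(f, "%s\n", governor_for_mode(powerMode));
    fclose(f);
    return 0;
}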

(7) Generating architecture-specific deployment commands. The SCHEDULING ADVISOR has now either created the deployment plan itself or retrieved an optimized deployment plan from the HEURISTIC MANAGER.


typedef struct CPUAvailability {
    int CPUID;
    float range[2];
    char* taskID;
} CPUAvailability;

CPUAvailability avail = (CPUAvailability){
    .CPUID = 1,
    .range = { 1.0, 83.0 },
    .taskID = "T2.1"
};
ResourceManager.update(avail);   /* pseudocode: update the RESOURCE MANAGER's database */

Listing 6: Resource Update.

Next, the SCHEDULING ADVISOR extends the generic deployment plan with architecture-specific commands to enable deployment on the target system. Commands for the parallel processing of tasks could, for instance, be based on OpenMPI. The extended deployment plan is then passed back to the WORKFLOW MANAGER to prepare the actual execution of the workflow or task.

8. WORKFLOW MANAGER triggers RESOURCE MANAGER for status update. The WORKFLOW MANAGER triggers the RESOURCE MANAGER for an update on the allocation and availability states of compute resources before the actual execution of the application workflow is initiated. This procedure ensures that the resources are still free. Listing 6 shows the request to update the database of the RESOURCE MANAGER. Once a resource is marked as available for execution, a final check for already scheduled tasks on that resource is performed. If no tasks are scheduled, the execution of the first task of the workflow starts.

9. Start monitoring via EXCESS MONITORING FRAMEWORK. Simultaneously with the execution of a task, the EXCESS MONITORING FRAMEWORK is started on the relevant compute resources to monitor the progress and characteristics of that task. The WORKFLOW MANAGER initiates this process by passing the required parameters to the EXCESS MONITORING FRAMEWORK as exemplified by Listing 7. The request includes the URL to the monitoring server, the current experiment ID associated with the task execution, the task identifier itself, and a pointer to a configuration file. The configuration file declares which metrics are monitored at which frequency during the execution of the task (cf. Deliverable D3.1 [32]).

10. WORKFLOW MANAGER tracks task status at run-time. The WORKFLOW MANAGER tracks the execution status of each task at run-time. Tasks are supposed to send progress information to the EXCESS MONITORING FRAMEWORK. The data is then continuously polled by the WORKFLOW MANAGER to verify that the estimated execution time agrees with the actual running time of the task. A downside of this approach is that components of the scheduling framework cannot estimate


mf_agent \
  -id=${EXP_ID} \
  -config=${CONFIG_FILE} \
  -task=${TASK_ID} \
  -workflow=${WORKFLOW_ID}

# Example
mf_agent \
  -id=AU3TtOOaYHjgymAd2i5T \
  -config=/opt/mf/configs/default \
  -task=T2.1 \
  -workflow=sample.workflow

Listing 7: Passing parameters to a monitoring agent. Parameters are the experiment ID, the task ID, the workflow ID, and the path where the configuration file is located. The agents have to be started on each compute node where the task to be monitored is deployed. The parameters, however, are the same on each node.

the progress of each task externally. Therefore, the source code of each task has to be instrumented in order to include required API calls to the EXCESS MONITORING FRAMEWORK. For more information on the progress tracker, please refer to Section 3.3.2.

11. RESOURCE MANAGER stops monitoring upon task completion. When a task finishes, the RESOURCE MANAGER stops all agents previously started on the compute nodes.

3.3 Exploiting Application Profiles for Energy-Aware Scheduling

The run-time behaviour and execution time of an application are not just dependent on the given input parameters.4 Distributed applications, for example, are further influenced by a multitude of run-time events including network congestion, hardware failure, or contention for compute resources, to name but a few. As a consequence, executing the same application on the same infrastructure may result in significantly varying execution times. However, profiling applications allows us to gain better insights into their run-time characteristics. Exploiting profiling data enables developers to

• improve task scheduling in a way that increases slack times (i.e., slack reclamation) [24, 23], where longer slack times lead to a better implementation of energy-saving techniques;
• improve the prediction of execution times of individual tasks (i.e., slack prediction) [6, 21];
• perform cross-platform energy prediction [6]; and
• classify tasks and execution sections within tasks into categories such as compute-intensive, memory-intensive, and communication-intensive [6, 27]; identified categories allow developers to react to different phases with reasonable power management techniques at run-time.

4 For example, the execution time of a molecular dynamics simulation such as ms2 heavily depends on the number of molecules to be simulated.


In DREAMCLOUD, we exploit performance and energy profiles of dynamic application workflows to improve slack prediction and slack reclamation. The profiles serve as an additional input to improve the heuristics developed in the course of Work Package 2. Given an application workflow and an optimization criterion, the heuristics are then able to create optimized deployment plans while considering specific task characteristics obtained through the profiles. For more details on how these profiles flow into the process of creating deployment plans, please refer to Deliverable 2.3 [36]. We see the heuristics as an oracle that returns an optimized deployment plan for a given query. Deployment plans, as stated in the previous section, include recommendations on the best processor frequency and voltage level with respect to the optimization criterion. As a result, components of our proposed scheduling framework use this information to apply reasonable power states for each allocated CPU.

Which components of our dynamic scheduling framework are essential for better dynamic power management? We have already learned that, in order to improve the performance or to reduce the energy consumption of application workflows, we require an optimal deployment plan that improves slack reclamation, as well as reasonable power management techniques. Therefore, the interplay of the components of the proposed scheduling framework is crucial for success. First is the monitoring component. Without monitoring, application workflows are a black box and thus difficult to optimize. However, sampling relevant metrics such as performance indicators for different deployments helps to understand the characteristics of an application. Second are the heuristics that produce optimized deployment plans. These deployment plans are then used for scheduling by the SCHEDULING ADVISOR. At run-time, progress tracking as well as monitoring is essential to react quickly to unexpected changes during execution. This feedback mechanism is implemented by the SCHEDULING ADVISOR and detailed in Deliverable D3.4 [34]. The SCHEDULING ADVISOR also sets the CPU frequency for each task to be scheduled. That way, and via increased slack times, the scheduling framework improves energy efficiency.

3.3.1 Monitoring Framework Revisited

Deliverable D3.1 introduced the EXCESS MONITORING FRAMEWORK as a reasonable tool for profiling application communication patterns. Exploiting synergies between EXCESS and DREAMCLOUD allowed us to extend the original EXCESS MONITORING FRAMEWORK with additional functionality essential for current and future work in DREAMCLOUD. The outcome of deliverables D3.2 to D3.4 will therefore be merged into upcoming releases of the EXCESS MONITORING FRAMEWORK. This section reports on the additional features that were implemented to realize energy-aware task scheduling with DREAMCLOUD's dynamic scheduling framework. Features include

• additional support for energy metrics,
• application-specific progress tracking through code instrumentation, and
• implementation of RESTful Web services to retrieve and store performance profiles, energy profiles, and deployment plans, and for requesting basic statistics on tasks and workflows.

Metric Support. We implemented, in cooperation with EXCESS, the following list of essential plug-ins to collect relevant performance and energy metrics at run-time: PAPI-C [30], RAPL [15],


Likwid [40], /proc/meminfo, iostat, Infiniband, nvidia-smi, and hw_power [37]. We would like to highlight the new energy-related plug-ins RAPL, Likwid, and hw_power.

The Performance Application Programming Interface (PAPI) has compiled-in support for RAPL. RAPL is a feature specific to Intel Xeon CPUs to estimate the processor and memory power [8]; it uses a prediction-based power model based on hardware events.

Likwid, short for Like I Knew What I'm Doing, is a collection of command-line tools that facilitate access to hardware registers for measurement and management [40]. The current version of Likwid aggregates a subset of the data retrieved via RAPL.

During the course of the EXCESS project, a hardware-based power and energy measurement system was integrated with the EXCESS cluster. The power measurement system consists of one Addi-System (power measurement system) and four A/D converters of type APCIE-3021-16. The tools responsible for aggregating the counters are referred to as the EXCESS Power Tools [37]. It should be noted that the hardware-based power measurement of the computational nodes is switched on automatically at the beginning of each job. Thus, hw_power is exclusively developed for the EXCESS cluster and cannot be deployed as a plug-in on other platforms. hw_power sends the metric data directly to the database via its RESTful service.

Performance and Energy Profiles. Monitoring workflows on a per-task basis allows us to save both performance and energy profiles for further analysis. Individual metrics can be collected at a user-defined rate. Listing 8 outlines a response from the RESTful Web service. These profiles can then be used as an input to the HEURISTIC MANAGER.

We have implemented a Web service that can be queried to search for experiments, workflows, tasks, and deployment plans. Details on the API are given in Appendix 6 on page 45.

Progress Tracking. The SCHEDULING ADVISOR requires up-to-date information about running tasks and workflows in order to monitor the overall progress effectively. If the expected progress differs from the actual progress, then the SCHEDULING ADVISOR can react accordingly (i.e., by rescheduling). Progress information helps to implement a feedback-control mechanism using the well-known PID controller (cf. Deliverable D3.4 for more information).
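To illustrate the underlying principle, a generic, textbook PID step over the deviation between expected and actual progress is sketched below. The gains and the interpretation of the controller output are assumptions made for illustration; the project-specific controller is specified in Deliverable D3.4.

/* Generic PID controller over the progress deviation of a task. */
typedef struct {
    double kp, ki, kd;     /* controller gains (illustrative values have to be tuned) */
    double integral;       /* accumulated error */
    double prev_error;     /* error of the previous sample */
} PIDController;

double pid_step(PIDController *pid, double expected_progress,
                double actual_progress, double dt)
{
    double error = expected_progress - actual_progress;  /* positive: task is late */
    pid->integral += error * dt;
    double derivative = (error - pid->prev_error) / dt;
    pid->prev_error = error;
    /* the output could, e.g., be mapped to a rescheduling decision or a
     * frequency adjustment by the SCHEDULING ADVISOR */
    return pid->kp * error + pid->ki * pid->integral + pid->kd * derivative;
}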

Progress tracking of current tasks and workflows requires, however, that tasks themselves provide progress information at run-time; tasks can no longer be seen as a black box. As a result, we have implemented, in cooperation with EXCESS, light-weight libraries for source code instrumentation in C and Python. They have the following key features:

• sending application-specific data (e.g., progress information),
• profiling specific code fragments (e.g., monitoring execution times of functions), and
• retrieving historic metric data for code optimization at run-time and via post-analysis.

Listing 9 presents a minimal example of how to use the library in order to send the current progress. Progress information for a given task can then be retrieved by clients using a RESTful service; Listing 10 shows the corresponding server response in JSON. The EXCESS Monitoring API is detailed in Appendix 6 on page 45.


[{"@timestamp": "2015-07-14T14:52:19.694","host": "node02","task": "T2.1","type": "performance","CPU0::PAPI_FP_INS": 21748,"CPU0::PAPI_TOT_CYC": 383128746,"CPU1::PAPI_FP_INS": 29514,"CPU1::PAPI_TOT_CYC": 260883197,"CPU2::PAPI_FP_INS": 48749,"CPU2::PAPI_TOT_CYC": 133928286,"CPU3::PAPI_FP_INS": 52237,"CPU3::PAPI_TOT_CYC": 219913611

},{"@timestamp": "2015-07-14T14:52:50.931","host": "node02","task": "T2.1","type": "energy","PP0_ENERGY:PACKAGE1": 41.708,"PP0_ENERGY:PACKAGE0": 57.5284

}...

]

Listing 8: Example performance and energy profile for a given task. The first JSON object was created by the PAPI plug-in, collecting on a per-core basis information regarding floating point operations (FP_INS) and the total number of cycles (TOT_CYC). The second JSON object shows Joules per second for the first and second package of a CPU. Usually, the processor package is composed of one or more dies, a top shell protecting the die, and a bottom board with contacts matching motherboard sockets.

3.3.2 Integration of Energy-Aware Heuristics

We follow a smart allocation and scheduling of tasks, which is detailed in Deliverable D3.3 [33]; the main idea is the following: our dynamic scheduling framework performs a standard scheduling based on the HEFT algorithm, supported by the SCHEDULING ADVISOR. We will refer to this kind of scheduling, managed by the SCHEDULING ADVISOR, as low-level. In contrast to existing scheduling frameworks, the low-level scheduler is connected to our smart HEURISTIC MANAGER. The HEURISTIC MANAGER, as already motivated throughout this deliverable, is expected to compute an optimal task scheduling for a given application workflow. We refer to the HEURISTIC MANAGER as the high-level scheduler.
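HEFT-style list scheduling orders tasks by their upward rank before assigning each task to the resource with the earliest estimated finish time. The sketch below only shows the rank computation; the data layout (average computation costs and a matrix of average communication costs) is an illustrative assumption and not the SCHEDULING ADVISOR's actual implementation.

/* Minimal sketch of the upward-rank computation driving HEFT-style scheduling.
 * avg_comm[i][j] < 0 means there is no edge i -> j in the task DAG. */
#define MAX_TASKS 64

double upward_rank(int task, int num_tasks,
                   const double avg_comp[MAX_TASKS],
                   double avg_comm[MAX_TASKS][MAX_TASKS],
                   double rank[MAX_TASKS], int computed[MAX_TASKS])
{
    if (computed[task])
        return rank[task];

    double max_succ = 0.0;
    for (int s = 0; s < num_tasks; s++) {
        if (avg_comm[task][s] >= 0.0) {  /* successor s of task in the DAG */
            double r = avg_comm[task][s] +
                       upward_rank(s, num_tasks, avg_comp, avg_comm, rank, computed);
            if (r > max_succ)
                max_succ = r;
        }
    }
    rank[task] = avg_comp[task] + max_succ;  /* rank_u(t) = w(t) + max over successors */
    computed[task] = 1;
    return rank[task];
}
/* Tasks are then scheduled in decreasing rank order, each onto the resource
 * with the earliest estimated finish time. */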

Moreover, we distinguish two scheduling scenarios: firstly, an offline scenario, where the SCHEDULING ADVISOR receives an incoming scheduling request and immediately forwards the request to the HEURISTIC MANAGER—as described by point 5 of Section 3.2.2. Based on the returned deployment plan, the given application workflow is executed by the WORKFLOW MANAGER. Secondly, an online scenario, where an application workflow is currently running.


#include <atom_api.h>
#include <stdlib.h>

int main(int argc, char **argv) {
    /* initialize monitoring */
    char* experiment_id = "AU3TtOOaYHjgymAd2i5T";
    char* server = "localhost:3000/dreamcloud/mf";
    mf_init(server, experiment_id);

    /* define the progress metric */
    mf_metric* m = malloc(sizeof(mf_metric));
    m->timestamp = mf_get_time();
    m->type = "progress";
    m->name = "progress(%)";
    m->value = "20";

    /* send metric data */
    mf_update(m);

    return 0;
}

Listing 9: Code instrumentation via a C library. Firstly, the URL to the Web server as well as an experiment ID that associates the sampled data with the current execution has to be passed to mf_init. Secondly, progress information is sent by calling mf_update.

In the online case, the SCHEDULING ADVISOR monitors the progress of all current tasks by requesting task updates through the PROGRESS TRACKER. Whenever the current progress differs from the estimated schedule, a feedback control loop based on a PID controller decides if a rescheduling of the remaining tasks is beneficial. If yes, the feedback controller either decides to perform the rescheduling locally (i.e., low-level scheduling),

[{
  "@timestamp": "2015-06-12T13:54:52.354",
  "host": "node02",
  "task": "t2.1",
  "type": "progress",
  "progress(%)": "0"
},
{
  "@timestamp": "2015-06-12T13:54:57.224",
  "host": "node02",
  "task": "t2.1",
  "type": "progress",
  "progress(%)": "6"
},
...
]

Listing 10: Excerpt of a JSON response about progress information for the task t2.1.


Figure 9: High-level abstraction of the SCHEDULING ADVISOR’s workflow.

or to request an updated deployment plan from the HEURISTIC MANAGER (i.e., high-level scheduling). Naturally, the latter incurs higher communication costs and thus introduces latency. Hence, calls to the high-level scheduler should be minimized.

Because we provide the HEURISTIC MANAGER with in-depth performance and energy profiles of running and previously executed tasks, the HEURISTIC MANAGER is able to create deployment plans optimized with respect to energy consumption.

In summary, we reduce the energy consumption of dynamic application workflows by

• energy-optimized deployment plans,
• online progress tracking of running tasks,
• a smart feedback control mechanism that enables our framework to re-schedule remaining tasks,
• exploitation of energy profiles available for all tasks of a workflow, and
• setting reasonable CPU frequencies per node/core for individual tasks.

The remainder of this section details the communication between the SCHEDULING ADVISOR and the HEURISTIC MANAGER, as well as the interaction between the HEURISTIC MANAGER and the EXCESS MONITORING FRAMEWORK.

Communication between the SCHEDULING ADVISOR and the HEURISTIC MANAGER. Communication between the SCHEDULING ADVISOR and the HEURISTIC MANAGER is implemented via RESTful services. Each component runs a Web service (i.e., a RESTful server) to handle the communication efficiently.5 That way, both components are decoupled and can communicate not just locally, but also over the Internet.

5The specification of the relevant APIs is detailed in Deliverable 3.4 [34].


Figure 9 illustrates the communication between the SCHEDULING ADVISOR and the HEURISTIC MANAGER. Both components are implemented independently, although their functionality partially overlaps. The rationale for separating the functionality into two components lies in the original requirements for flexibility in scheduling: the original scheduling is based on platform-specific metrics and is provided by the HEURISTIC MANAGER. However, it is unlikely that actual execution times always match predicted execution times. We tackle this challenge by incorporating a smart feedback control loop, which handles the following two scenarios at run-time using progress information (a simplified sketch of the decision logic follows the list):

1. Task execution is delayed by up to a given threshold t. The threshold t, for example t = 10%, should be based on the predicted execution time or on an optimal SLA specification. Whenever the deviation of the execution time is within the given tolerance, we assume that rescheduling is unnecessary; the HEURISTIC MANAGER will not be notified. However, because the HEURISTIC MANAGER exploits historic performance and energy profiles while creating deployment plans, its heuristics will automatically take the deviation into account when a new deployment plan is created that includes the affected task.

2. Task execution is ahead of the predicted runtime by a given threshold. We distinguish two cases: firstly, the SCHEDULING ADVISOR verifies that the tasks on individual cores run at an operating speed of 100%. In addition, the energy-awareness parameter is assessed. We then adjust the CPU mode if the gain in execution time exceeds the time needed to switch the core frequencies. If the CPU mode is adjusted, then the HEURISTIC MANAGER is notified with an update. Otherwise, execution of the application workflow proceeds as planned.6 Secondly, the SCHEDULING ADVISOR decides that no optimization is possible. A new deployment plan is requested, and task delays are submitted to the HEURISTIC MANAGER.
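The sketch below condenses the two scenarios into a single decision function. The threshold, the switching-overhead estimate, and the helper signature are illustrative assumptions and do not represent the project's actual implementation.

/* Simplified sketch of the run-time decision logic described above. */
typedef enum { KEEP_SCHEDULE, ADJUST_CPU_MODE, REQUEST_NEW_PLAN } Decision;

Decision evaluate_progress(double predicted_s, double actual_s,
                           double threshold,        /* e.g., 0.10 for 10% */
                           double expected_gain_s,  /* gain from adjusting the CPU mode */
                           double switch_overhead_s /* time needed to switch frequencies */)
{
    double deviation = (actual_s - predicted_s) / predicted_s;

    if (deviation > 0.0 && deviation <= threshold)
        return KEEP_SCHEDULE;            /* scenario 1: delay within tolerance */

    if (deviation < 0.0) {               /* scenario 2: ahead of schedule */
        if (expected_gain_s > switch_overhead_s)
            return ADJUST_CPU_MODE;      /* local adjustment pays off */
        return REQUEST_NEW_PLAN;         /* no local optimization possible */
    }

    return REQUEST_NEW_PLAN;             /* delay beyond tolerance */
}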

It should be noted that the SCHEDULING ADVISOR always compares updated deployment plans with the existing deployment plan. If the new deployment plan does not improve execution time or energy consumption, respectively, then no rescheduling is performed. Otherwise, tasks that were reassigned to other nodes or cores will be executed according to the new deployment plan. The SCHEDULING ADVISOR notifies the HEURISTIC MANAGER about accepting or declining the new deployment plan. It is expected that the HEURISTIC MANAGER uses this information to optimize future deployment plans for the given workflow.

In summary, the feedback control mechanism allows us to reduce expensive communication between the SCHEDULING ADVISOR and the HEURISTIC MANAGER. Additionally, the SCHEDULING ADVISOR can benefit from improved deployment plans when the current scheduling cannot be optimized locally. For more information on the feedback control loop, please refer to Deliverable D3.4 [34].

Communication between the EXCESS MONITORING FRAMEWORK and the HEURISTIC MANAGER. The HEURISTIC MANAGER requires both performance and energy profiles for a set of tasks in order to optimize deployment plans using specific heuristics. These profiles are explicitly pulled by the HEURISTIC MANAGER upon receiving a new deployment plan request from the SCHEDULING ADVISOR (cf. point 5 in Figure 8). The SCHEDULING ADVISOR includes in its request all information required to query the EXCESS MONITORING FRAMEWORK for relevant

6 This technique allows the SCHEDULING ADVISOR to adjust the schedule locally without requesting another optimized deployment plan each time the actual execution time does not match expectations. In summary, we significantly cut communication costs and bandwidth usage.


profiles (cf. Figure 9). This information includes a workflow ID, a task ID, and the current experiment ID. The HEURISTIC MANAGER then retrieves either a performance or an energy profile for the task at hand. Listing 11 shows a request to retrieve a performance profile, whereas Listing 12 shows a request to retrieve an energy profile.

$ curl -XGET http://mf.excess-project.eu:3030/dreamcloud/mf/profiles/sample.workflow/t2.1/AU3TtOOaYHjgymAd2i5T

[{
  "@timestamp": "2015-06-08T17:06:10.740",
  "host": "node02",
  "task": "T2.1",
  "type": "memory",
  "mem_avail": 73.87,
  "mem_used": 26.13
},
{
  "@timestamp": "2015-06-08T17:06:11.932",
  "host": "node02",
  "task": "T2.1",
  "type": "memory",
  "mem_avail": 73.87,
  "mem_used": 26.13
},
...
]

Listing 11: Sample request to retrieve a performance profile for a given combination of workflow ID (sample.workflow), task ID (T2.1), and experiment ID (AU3TtOOaYHjgymAd2i5T).


4 DreamCloud's Roadmap Towards Dynamic Power Management

This section presents the roadmap towards implementing dynamic power management for each of DREAMCLOUD's use cases. The focus lies on specifying individual requirements, highlighting primary integration efforts, and outlining a general schedule for future research. Moreover, we present primary integration efforts with the evaluation platform developed by CNRS.

4.1 HPC Domain (HLRS)

Energy-awareness in HPC systems has come of age in recent years [41]. Although performance remains the topmost objective for companies, data centers have shown a high interest in reducing their energy consumption, which continues to increase for several reasons [2, 17, 19]: these reasons include the advancement of big data, fast data, the Internet of Things (IoT), and the recent trend of employing Graphics Processing Units (GPUs) to speed up the execution of compute-intensive scientific applications. Since increased energy consumption leads to extra operational costs, reducing energy is a driving factor for data centers [41]. Thus, data centers can significantly reduce their operational costs by saving energy and reducing cooling costs.

In HPC systems, both static and dynamic power play an important part. Energy efficiency can be achieved at the application level through sophisticated resource management7, or via energy-efficient hardware [42]. However, optimizing energy at the application level is difficult. Each application behaves differently depending on the underlying infrastructure, the set of allocated

7 Resource allocation can be performed either at the network level (task mapping) or at the core level (scheduling).

$ curl -XGET http://mf.excess-project.eu:3030/dreamcloud/mf/energy/sample.workflow/t2.1/AU3TtOOaYHjgymAd2i5T

[{
  "t2.1": [{
    "@timestamp": "2015-06-08T17:06:12.132",
    "type": "pwm",
    "CPU1_node02": 48,
    "ATX12V_node02": 193,
    "CPU2_node02": 48
  },
  ..
  ]
}
]

Listing 12: Sample request to retrieve an energy profile for a given combination of workflow ID (sample.workflow), task ID (T2.1), and experiment ID (AU3TtOOaYHjgymAd2i5T).


resources, and possible network congestion. In addition, current energy analyses report only on the overall consumption of specific nodes or the entire system, and do not take into account the actual application [28]. What can be done beyond conventional SPM and DPM techniques towards saving energy in HPC?

We believe that better slack prediction and reclamation, in combination with effective power management techniques such as DVFS on a per-core basis, are key to success. As a result, we have implemented and deployed a preliminary version of the presented dynamic scheduling framework on one of our clusters. The remainder of this section reports on requirements and outlines a roadmap towards more effective power management in HPC.

4.1.1 Requirements Analysis

We have assessed the following four requirements that a workflow manager has to implement on an HPC system to improve energy efficiency:

• provision of energy-aware workflow scheduling,
• capability of allocating tasks to individual CPU cores instead of compute nodes as a whole,
• detailed monitoring of individual tasks at run-time to react quickly to changing task characteristics or hardware failures, and
• switching CPU power states on a per-core basis with low latency.

The dynamic scheduling framework as proposed in Section 3 already offers energy-aware scheduling as well as a suitable monitoring component by means of the EXCESS MONITORING FRAMEWORK. Furthermore, the proposed SCHEDULING ADVISOR includes setting CPU frequencies as needed. That leaves us with the capability to allocate tasks to individual CPU cores. HPC systems, in general, employ the PBS/TORQUE resource manager8 for workflow management and execution. Although this resource manager excels in usability and efficient workflow management, it lacks support for allocating tasks to individual CPU cores [1]. Thus, we use PBS/TORQUE only to reserve the required compute nodes, but then perform (parallel) task allocation and execution via standard MPI. That way, we can allocate multiple tasks on the same compute node to distinct CPU cores.
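For completeness, one common way to bind a task's process to a single CPU core on Linux is the sched_setaffinity system call, sketched below. This is shown only to illustrate per-core allocation; the framework itself performs the allocation via MPI as described above.

/* Pin the calling process to a single CPU core on Linux. */
#define _GNU_SOURCE
#include <sched.h>
#include <stdio.h>

int pin_to_core(int core_id)
{
    cpu_set_t set;
    CPU_ZERO(&set);
    CPU_SET(core_id, &set);
    /* pid 0 refers to the calling process */
    if (sched_setaffinity(0, sizeof(set), &set) != 0) {
        perror("sched_setaffinity");
        return -1;
    }
    return 0;
}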

Since the EXCESS MONITORING FRAMEWORK offers a substantial list of plug-ins to measure performance and energy, we selected three plug-ins for the implementation of our prototype (cf. Section 3.3.1 on page 24): PAPI-C, RAPL, and hw_power. For each of these plug-ins, we further selected a subset of metrics to be collected at run-time. These metrics are listed in Table 3.

4.1.2 Selected Optimization Strategy

There are a number of ways to optimize the energy consumption of dynamic application workflows. Aside from modifications introduced to the source code of the application itself, most of these methods rely upon a well-designed scheduling of the individual tasks of the entire workflow. As introduced in Section 2, these application workflows are often modelled via directed acyclic graphs. Thus, the task

8Version 4.2.8


Plug-in           Metric                 Description
PAPI-C            PAPI_FP_INS            Floating point instructions
                  PAPI_TOT_CYC           Total cycles
RAPL via PAPI-C   PP0_ENERGY:PACKAGE0    Energy used by all cores in package 0 (nJ)
                  PP0_ENERGY:PACKAGE1    Energy used by all cores in package 1 (nJ)
hw_power          CPU1_node0X            Power usage of CPU1 on node X
                  CPU2_node0X            Power usage of CPU2 on node X
                  ATX12V_node0X          Power usage of the ATX12V power supply unit of node X

Table 3: Metrics monitored through the EXCESS MONITORING FRAMEWORK at run-time. It should be noted that the PAPI metrics are sampled on a per-core basis (e.g., PAPI_TOT_CYC is collected for each individual CPU core). Energy is measured by RAPL per processor package, whereby a package is composed of one or more dies, a top shell protecting the die, and a bottom board with contacts matching motherboard sockets.

of figuring out an optimal mapping of a set of tasks onto a set of available compute resources is NP-complete, with execution time and minimum energy consumption being the objectives. Both objectives are opposing by nature: the shortest execution time requires a higher energy consumption.
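This trade-off can be stated more formally. One common illustrative formulation, not necessarily the one used by the Work Package 2 heuristics, is a weighted scalarization of the two objectives:

\[
\min_{m \in \mathcal{M}} \; \alpha \, T(m) + (1 - \alpha) \, E(m), \qquad 0 \le \alpha \le 1,
\]

where \(\mathcal{M}\) is the set of feasible task-to-resource mappings, \(T(m)\) the resulting makespan, \(E(m)\) the estimated energy consumption, and \(\alpha\) a weight expressing the chosen optimization criterion (\(\alpha = 1\) optimizes purely for performance, \(\alpha = 0\) purely for energy).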

As a result, we have decided to integrate the heuristics developed in Work Package 2 to obtain deployment plans optimized either for performance or for energy consumption9. Still, we have to consider the recommendations coming from the deployment plans regarding the best CPU power states. The prototype implementation distinguishes five different kinds of power states that the HEURISTIC MANAGER can propose. These power states, as exemplified by Table 4, are set for each compute node using a command-line utility that we have implemented for this purpose: cpufreq. cpufreq adjusts the CPU frequency on our HPC test system. The utility uses the same-named Linux kernel module to change the underlying scaling governor (i.e., the power scheme used by the CPU) and to set frequencies (cf. Listing 13); three governors are supported [4]:

1. Ondemand: the frequency adjusts automatically based on the current CPU usage.
2. Performance: the CPU frequency is set to the maximum (equals Intel's turbo boost mode).
3. Userspace: enables users to set the frequency per core individually.

Selecting the Userspace governor allows end-users to set the frequency freely. cpufreq allows us to set CPU frequencies in steps of 0.1 GHz; the voltage is adjusted automatically. However, it should be noted that Intel Ivy Bridge processors only have access to an external voltage regulator that sets all CPU cores to the same frequency and voltage [25]. By contrast, newer Intel Haswell processors have additional internal voltage regulators that allow frequency and voltage to be adjusted for each CPU core individually [5]. It should be noted that Intel will remove the fully integrated voltage regulators (FIVR) for its next processor family, named Skylake, which may again prevent developers from setting frequencies and voltages on a per-core basis [22]. Our HPC cluster has two compute nodes based on Intel Ivy Bridge, and one Intel Haswell-based compute node. Thus, only the latter node can be used for setting CPU cores individually.

9 It should be noted that a hybrid scenario is also possible, which provides an optimal trade-off between performance and energy consumption.
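The following sketch shows how a per-core frequency could be set through the standard Linux cpufreq sysfs interface under the userspace governor. It is an illustration of the kind of access the HLRS cpufreq utility wraps; paths and error handling are simplified, and the sketch is not the utility's actual source code.

/* Set the scaling governor and target frequency of one core via sysfs. */
#include <stdio.h>

static int write_sysfs(const char *path, const char *value)
{
    FILE *f = fopen(path, "w");
    if (!f) return -1;
    int ok = (fputs(value, f) >= 0);
    fclose(f);
    return ok ? 0 : -1;
}

int set_core_frequency_khz(int core, long freq_khz)
{
    char path[128], value[32];

    /* select the userspace governor so the frequency can be set freely */
    snprintf(path, sizeof(path),
             "/sys/devices/system/cpu/cpu%d/cpufreq/scaling_governor", core);
    if (write_sysfs(path, "userspace") != 0) return -1;

    /* request the target frequency in kHz, e.g., 2100000 for 2.1 GHz */
    snprintf(path, sizeof(path),
             "/sys/devices/system/cpu/cpu%d/cpufreq/scaling_setspeed", core);
    snprintf(value, sizeof(value), "%ld", freq_khz);
    return write_sysfs(path, value);
}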


Power State             Frequency Range (GHz)   CPU Usage (%)
low-energy (range 1)    1.5-1.7                 50-55
balanced (range 1)      2.1                     66
balanced (range 2)      2.4-2.7                 80-90
power (range 1)         3.0                     100
power (range 2)         3.3                     110

Table 4: Power states defined for each compute node of our HPC cluster. Compute nodes are equipped with an Intel Xeon E5-2690v2 CPU, which has a minimum frequency of 1.2 GHz and a maximum frequency of 3.0 GHz; turbo mode is between 3.3 and 3.6 GHz. Hence, power range 2 equals a CPU load of 110%. These power states are defined in the deployment plans, and then set via the cpufreq utility.

4.1.3 Roadmap

The previous sections detailed our current approach towards dynamic power management for the HPC domain. We have implemented—in cooperation with partners from Work Package 2—a first prototype that analyses the molecular dynamics application ms2. The application is described in Deliverable D3.1 [32]. Implementation details concerning the dynamic scheduling framework are provided in deliverables D3.3 [33] and D3.4 [34]. We will continue improving the prototype, in particular focusing on selecting an optimal subset of metrics to be collected at run-time, and reasonable power states to be set via our cpufreq utility. We will also broaden our analysis to other scientific workflows. These experiments and their evaluation will be presented in Deliverable D6.5 in PM36.

4.2 Automotive Domain (BOSCH)

EU legislation demands an ongoing decrease of CO2 emissions for future cars, with the goal of an average of 95 g/km in 2020, down from an average CO2 emission of all new cars of 135.7 g/km in 2011 [11]. Of these 135.7 g/km, about 8% are contributed by the electrics and electronics within the car, together with air conditioning and cooling. In terms of electrical power consumption, the electric/electronic vehicle functions under real operating conditions sum up to 700–900 W for mid-class vehicles and more than 1 kW for premium vehicles, with a maximum of up to 2 kW, whereas it is only 250 W in the NEDC (New European Driving Cycle) with most consumers de-activated. Here, the rule of thumb derived from the NEDC to convert between electrical power, gasoline consumption, and CO2 emission is:

100W ∼ 0.1l/100km ∼ 2.32g/km CO2

In a premium car under real driving conditions, this equals a consumption of

1kW ∼ 1l/100km ∼ 23.2g/km CO2

up to a maximum of

2kW ∼ 2l/100km ∼ 46.4g/km CO2

for the electronic based vehicle functions.


[root@node01 ~]# use_cpufreq -h
#* HW-Threads Frequency Tool based on 'cpufreq' kernel module
#* High Performance Computing Center Stuttgart (HLRS)
#* University of Stuttgart
#* Mail bugs to: [email protected]

Options include:
  -p <int>: cpu id (default 0)
  -c <int>: command id (see below)
      0 - do nothing (nop)
      1 - print possible governors
      2 - print possible frequencies
      3 - set "ondemand" governor
      4 - set "userspace" governor
      5 - print current frequency
      6 - print current governor (use cat /sys/devices/system/cpu/<metric>)
  -f <int>: set frequency by its index
  -h      : show this help text and exit

Examples:
  -p 0 -c 4: set "userspace" governor on the first cpu
  -p 0 -f 0: set the lowest frequency on the first cpu
  -p 0 -c 3: set "ondemand" governor on the first cpu

Listing 13: Help text provided by the cpufreq utility.

Task 3.2 is about the analysis of the energy consumption of Embedded Clouds to investigate the potential for optimization. This analysis shall help the dynamic resource allocation algorithms from Work Package 2 to find an optimal resource allocation for the applications.

4.2.1 Requirements Analysis

Analysis of Energy Consumption of ECUs. Today, an upper-class car contains 100 or more electronic control units (ECUs) that are interconnected by different bus systems. These ECUs are distributed across multiple functional domains, e.g. windows, doors, mirror adjustment, engine control, airbags, cruise control, steering, lighting, opening system, safety, et cetera. Of all these ECUs, the one with the biggest micro-controller is the ECU for the engine control. Table 5 gives an overview of common ECUs and their respective power consumption. As stated above, a gasoline direct injection requires 24.6 W, and a multi-fuel system 29.5 W. There is even an engine control unit that consumes 120 W.

Smaller ECUs, e.g. the seat or sunroof adjuster or the window lifter control, require on average up to 2 W. Today, all these ECUs are "always on" and participate in the communication on the buses, even though they are only needed during short time intervals. Thus, an alternative to save energy is either to switch off single ECUs or a whole bus segment. Here, the shutdown of a single ECU seems to be more promising, at least if many ECUs with different functionalities are attached to the bus and these ECUs have to be woken up at different (and not overlapping) times. Audi, for example, confirmed by measurements that the ECU responsible for the trunk lid is only needed in 3% of the total time10.

10ATZ elektronik, “Titelthema Energieeffizienz”, p. 10-15, vol. 01/2012


ECU                              Power [W]
Engine Control                   30
Steering                         14
Anti-Blocking System (ABS)       7
Adaptive Cruise Control (ACC)    30
Parking Helper                   15
Lane Change                      15
Climate                          7
Lighting Control                 10
Body Computer                    10
Door                             24
Tempomat                         30
Speed Measurement                6
Combi Instrument                 30

Average                          17.5

Table 5: Typical power consumption for some larger ECUs.

However, the current hardware technology does not sufficiently support the shutdown of a single ECU, since it is currently not possible to send a wake-up packet to the transceiver that afterwards wakes up the whole ECU. Nevertheless, BOSCH would be interested in profiling the metrics defined in Table 6 for further analysis.

Analysis of Energy Consumption Within a Network of ECUs. To be able to switch off a whole bus segment, it must be ensured that the bus segment contains only ECUs with related functionality: the whole infotainment sub-system, for example. Otherwise, the different functional requirements might prevent the shutdown or allow it only very seldom.

The AUTOSAR consortium discusses two different concepts to switch off (parts of) a network: partial networking and pretended networking. In partial networking, an ECU shall be put to sleep and woken up by network messages sent by other nodes in the network. Thus, the hardware, i.e. the network transceiver, has to provide the feature to send and receive such messages. Currently, this concept is only supported for the CAN bus and the new Partial Networking-enabled CAN transceivers that support the necessary wake-up detection feature, e.g. the NXP stand-alone TJA1145 CAN transceiver. However, these transceivers are not portable to other bus systems, e.g. FlexRay. In pretended networking, the ECU itself decides when it goes to sleep and becomes inactive. Nevertheless, the node still reacts to network messages and thus hides its own sleep state, i.e., the node pretends to be fully awake. With this feature, pretended networking is compatible with the current networking hardware. Only the micro-controller and the peripherals are switched to a low-power mode, while the communication controller and the transceiver stay awake.

4.2.2 Conclusions

BOSCH performed an analysis of the energy consumption of a TriCore microprocessor that is used today to run the engine control software. As such, this ECU is typically the most powerful ECU within a car.


General       Maximum Core Load of ECU Life Time

ISR / Task    The number of calls since last reset of the ECU.
              The maximal net runtime of the task since last reset of the ECU.
              The minimal net runtime of the task since last reset of the ECU.
              The net runtime from the last cycle of the task.
              The maximal response time of the task since last reset of the ECU.
              The minimal response time of the task since last reset of the ECU.
              The response time from the last cycle of the task.
              The load caused by the task in the last elapsed time frame.
              The number of calls in the last time frame.
              The number of memory read/write accesses.
              The number of stall cycles for data and instruction memory accesses.
              Memory consumption RAM/Flash.
              The average net runtime of the task or ISR in the last elapsed time frame.

Stack         Maximum fill rate of Stack since last reset of the ECU.
              Maximum fill rate of CSA during current Driving Cycle.
              Maximum fill rate of CSA during ECU Life Time.

Error         The number of errors detected by the OS.

Table 6: List of desired metrics to be measured for the engine control software use case.

The analysis showed that the TriCore processor requires at most about 540 mW, but a whole engine control ECU demands about 30 W in total. Furthermore, there is no possibility to switch off the whole ECU while the engine is running. Therefore, and since the use case provided by BOSCH is an engine control software, BOSCH concludes that there is currently no potential for energy saving by dynamic resource allocation, at least in the engine control system.

Nevertheless, a large number of other ECUs have lower power demands. Furthermore, ECUs do not have to be fully functional during the entire operating time: this includes ECUs for the window lifter, or the seat or mirror adjustment. For these ECUs, the AUTOSAR standard already discusses different possibilities to save energy by switching off single ECUs or a whole sub-system of the network. Here, BOSCH sees a larger potential for dynamic resource allocation and will accompany the development in this direction during the project.

4.3 Video Processing Domain (RheonMedia)

The class of product developed to utilise DREAMCLOUD technology governs whether it is practical to employ dynamic power management. There are many associated variables such as costs, customer use cases, and the placement of the end product in the home. For example, is the device class a gateway product that is always on, like a Wi-Fi/ADSL router with TV streaming services, or a satellite-only product? In either case, according to EU Regulation 1275/2008/EC, consumer electronics and household IT equipment have had to fulfil a maximum power consumption of 0.5 Watt in OFF mode and 1 Watt in passive standby mode since January 2013. Today's consumer electronics products meet these criteria; however, the EU is looking to further reduce this requirement towards the level of mobile devices, which reach 0.03 Watts in standby mode, like the iPhone 4.


Metric            Description
SessionID         A unique session value from the Host
HostTypeID        Relating to the general categories of the Host, Server, Embedded Device, Raspberry PI etc.
UserID            The individual User identifier based on the client making the video stream request
DeviceState       An array of running states and specific stream information as required; specific stream
                  metrics are detailed in Deliverable D3.4 [34]
  State           The state of the device, Standby, WakeOnLan, Running, Passive
  Voltage         The amount of volts (V)
  Power           The amount of power being consumed (W)
  Current         The amount of amps being consumed (A)
NoActiveStreams   The number of concurrent streams provided at a given time

Table 7: Key fields to describe how video is consumed.

During execution, feeding these metrics back into the DPM will assist in understanding the different scenarios exhibited by users and in identifying where power savings can be made.

4.3.1 Requirements Analysis

Under controlled conditions, metrics shall be collected that allow a better understanding of usage patterns. Usage patterns can be derived from two key sources: the end user and the broadcaster.

The patterns allow profiling different behaviours across time, devices, and the content the viewer chooses to watch. The broadcaster source is the type of content being watched. The type of content that has been encoded by the broadcaster, i.e. sports versus movie content, affects the transcoding resource requirements and hence the amount of processing required and, subsequently, the power consumption.

Table 7 highlights the key metrics that provide insight into how video is being viewed within the home. Some, if not all, can be used to alter the profiling characteristics of DREAMCLOUD, thereby enabling potential savings in power consumption by lowering the resource requirements or by migration.

In order to utilise the DPM, besides APIs to submit metrics into the system, analysis APIs are required to provide averaging of key metrics over a period of time. In this context, the EXCESS MONITORING FRAMEWORK API was extended to support deployment plans and to compute extended statistics for fields including the power consumption value (cf. Appendix 6 on page 45).

4.3.2 Selected Optimization Strategy

One of the key migration strategies is to utilise the captured metrics, run comparative split tests on modified application workflow profiles, and review whether any users have been impacted, either through objective measurements or through a subjective assessment. So although run-time dynamic updating of heuristics models based on embedded workflows is infeasible to support at this stage of the product


Metric            Description
AppID             A unique identifier for an application (e.g., CNRS_1)
DeploymentID      A unique identifier for a deployment plan (e.g., CNRS_D1)
ExecutionTime     Declares the execution time of a task
Costs             Costs linked to the execution of a task
Energy            Energy consumed by the simulation
DeployFindTime    Time taken to find the deployment plan given by DeploymentID
ArrivalTime       Time of task arrival

Table 8: Metric data available after simulation and sent to the monitoring database.

development, a software-upgraded workflow model would be supported for specific profiles in the future.

4.3.3 Roadmap

Initial Monitoring API integration on a standard transcoder has been started to load metrics into the DPM. During the integration of the heuristics and different hardware platforms, ongoing analysis and refinements shall be carried out towards Deliverable 6.4 in PM36.

4.4 Evaluation Platform (CNRS)

As a fast, modular, and flexible evaluation framework, DREAMCLOUD's evaluation platform allows designers to build and evaluate different resource allocation heuristics for embedded multi-core systems and cloud/HPC systems (cf. D5.2 [35]). The framework has been validated for scalability, and it performs well for the system sizes that are expected to be available in the market until 2020 [10].

BOSCH has already shown that dynamic power management is not feasible for the automotive use case. However, DREAMCLOUD's evaluation platform could be employed to verify that optimized deployments yield better performance and lead to energy savings. Whereas HLRS follows an online scenario, where deployment plans are created at run-time, the evaluation framework would rely on creating optimized deployment plans offline, i.e., subsequent to the simulation.

The remainder of this section reports on first integration efforts already carried out by re-using a key component of the proposed dynamic scheduling framework: the EXCESS MONITORING FRAMEWORK.

4.4.1 Requirements Analysis

Simulation reports are generated for completion time, energy consumption, average packet latency (AVL11), and inter-core communication volume. The data is produced by the framework in the form of common text files. In addition, detailed traces of Network-on-Chip (NoC12) traffic as well as

11 The average packet latency is an estimation of the current network throughput.
12 The term NoC refers to a distributed network of compute resources that is configured as an on-chip network. We would like to refer the interested reader to an overview of the advantages and disadvantages of NoCs [3].


runnable execution are reported via CSV files. Additionally, the simulation waveforms are produced in GTKWave format to enable a detailed analysis of scheduling and mapping techniques. GTKWave is a common waveform viewer for UNIX [43]; it allows analysing the signal transitions over time. We kindly ask the interested reader to refer to Deliverable D5.2 to read more about these metrics [35].

The mentioned performance and energy metrics could be exploited for dynamic power management to minimize energy consumption. Table 8 lists a subset of these metrics as supported by the prototype implementation, whereas Table 9 includes the keys used to describe deployment plans.

HLRS provided CSV parsers to transform and submit the data to the monitoring database. In this respect, the EXCESS MONITORING FRAMEWORK was extended with deployment plan support as well as substantial statistics. Statistics are computed for individual tasks or entire workflows across all experiments. That way, clients can request, for instance, the deployment plan of the experiment having the lowest energy consumption. Listing 14 shows a server response including statistics for the metric ExecutionTime. The additional functionality concerning deployment plans and statistics is detailed in Appendix 6.
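The submission step itself is a plain HTTP POST of JSON documents to the monitoring service. The sketch below illustrates this using libcurl; it is not the parsers' actual source code, and the endpoint URL and payload shape are assumptions based on the metrics API in Appendix 6 and the fields of Table 8.

/* Submit one transformed simulation record as JSON via HTTP POST. */
#include <curl/curl.h>

int submit_metric(const char *url, const char *json_payload)
{
    CURL *curl = curl_easy_init();
    if (!curl) return -1;

    struct curl_slist *headers = curl_slist_append(NULL, "Content-Type: application/json");
    curl_easy_setopt(curl, CURLOPT_URL, url);
    curl_easy_setopt(curl, CURLOPT_HTTPHEADER, headers);
    curl_easy_setopt(curl, CURLOPT_POSTFIELDS, json_payload);  /* implies POST */

    CURLcode res = curl_easy_perform(curl);

    curl_slist_free_all(headers);
    curl_easy_cleanup(curl);
    return (res == CURLE_OK) ? 0 : -1;
}

/* Example payload (field names taken from Table 8):
 * {"AppID":"CNRS_1","DeploymentID":"CNRS_D1","ExecutionTime":35,"Energy":84}
 */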

4.4.2 Selected Optimization Strategy

It is planned to implement the following scenario: a simulation creates a set of output files containing both performance and energy counters as described in the previous section. Next, these files are parsed and sent to the monitoring database of the EXCESS MONITORING FRAMEWORK. The EXCESS MONITORING FRAMEWORK then holds this data for post-analysis; this could be performed by the existing HEURISTIC MANAGER. Since we have already implemented interfaces between the EXCESS MONITORING FRAMEWORK and the HEURISTIC MANAGER, we can simply re-use the APIs for this use case. Eventually, the HEURISTIC MANAGER processes the performance and energy profiles together with the previous deployment plans to create an optimized deployment plan for the next simulation of a given workflow. That way, we could verify whether deployment plans created by the HEURISTIC MANAGER lead to optimal allocations in terms of performance and energy consumption.

Metric          Description
DeploymentID    A unique identifier for a deployment plan (e.g., CNRS_D1)
NumUsedCores    Number of used cores
TaskID          A unique identifier to describe a task
Allocation      List of Task IDs to be allocated on the same core of a given node
PModes          List of P-Modes linked to Task IDs

Table 9: Key fields to describe a deployment plan. It should be noted that the field Allocation maintains a list of Task IDs, and that each P-Mode in PModes references the Task ID in Allocation having the same index.


4.4.3 Roadmap

This section outlined a possible approach to integrate existing components into DREAMCLOUD's evaluation platform. We will continue the integration, and if this approach proves beneficial, we will report on results in Deliverable D5.4 in PM30.


$ curl -XGET http://mf.excess-project.eu:3030/dreamcloud/mf/statistics/CNRS_1?metric=ExecutionTime

{"workflow": {"href": "http://mf.excess-project.eu:3030/dreamcloud/mf/workflows/cnrs_1"

},"metric": "ExecutionTime","statistics": {"count": 4,"min": 35,"max": 50,"avg": 42.5,"sum": 170,"sum_of_squares": 7350,"variance": 31.25,"std_deviation": 5.5901699437494745,"std_deviation_bounds": {

"upper": 53.680339887498945,"lower": 31.31966011250105

}},"min": {"plugin": "statistics","ExecutionTime": 35,"Energy": 84,"DeployFindTime": 0.044,"Value": 270,"DeploymentID": "CNRS_D1","ArrivalTime": 2,"@timestamp": "2015-07-27T11:27:59.241"

},"max": {"plugin": "statistics","ExecutionTime": 50,"Energy": 84,"DeployFindTime": 0.024,"Value": 150,"DeploymentID": "CNRS_D4","ArrivalTime": 2,"@timestamp": "2015-07-27T11:27:59.288"

}}

Listing 14: Statistics computed for the metric ExecutionTime as returned by the EXCESS MONITORING FRAMEWORK for the workflow CNRS_1.


5 Conclusions

This deliverable has proposed DREAMCLOUD's dynamic scheduling framework. We give a brief summary of our main contributions, followed by the next steps planned.

The proposed framework reduces the energy consumption of dynamic application workflows; it focuses in particular on HPC and embedded systems. We have motivated the need for a novel scheduling framework to cope with the requirements imposed by DREAMCLOUD. An overview of well-established energy-aware scheduling frameworks showed that they assume, for simplicity, that task characteristics do not change on the same infrastructure. However, we have demonstrated that this is not the case. Network congestion, resource contention, as well as allocating tasks in each execution to different compute resources, likely lead to different application characteristics at run-time on the same infrastructure. As a consequence, an energy-aware scheduling framework has to account for these events at run-time by continuously monitoring tasks. Moreover, it has to react quickly to deviations from the expected run-time behaviour. Our novel dynamic scheduling framework does exactly that. This is achieved through

• continuously monitoring tasks at run-time, and collecting both performance and energy metrics. The data is stored for further analysis in a database. As a result, varying task behaviour is recorded in so-called performance and energy profiles for each task (cf. Section 3.3.1). This information can then be exploited by the HEURISTIC MANAGER to improve and adapt upcoming deployment plans for the tasks at hand.
• a feedback control service to react quickly to deviations from estimated execution times (cf. Section 3.3.2). In order to be effective, a progress monitoring service provides up-to-date information on the current progress of individual tasks.

Moreover, optimized deployment plans allow for better slack reclamation. In addition, dynamic slack reclamation can be exploited through the mentioned feedback control mechanism at run-time. Both mechanisms combined allow us to reduce the energy consumption of dynamic application workflows by setting reasonable CPU frequencies on a per-core basis (cf. Section 4.1).

The EXCESS MONITORING FRAMEWORK introduced in Deliverable D3.1 was further developed in cooperation with the EXCESS project, and now includes the following additional features that go beyond what we promised in D3.1:

• Profiling of individual tasks of a dynamic application workflow.

• Additional plug-in support, including energy-related plug-ins such as RAPL.

• An extensive monitoring API based on a RESTful service to provide experimenters and clients with a rich interface to access data on workflows, experiments, deployment plans, performance profiles, energy profiles, and statistics.

• A light-weight user library, available in C and Python, to send data to and retrieve data from the monitoring database (a usage sketch of the underlying API follows below). The libraries build upon the monitoring API, which is also used by the HEURISTIC MANAGER for communication.
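As an illustration of how clients can interact with the RESTful interface directly, the following minimal Python sketch (using the generic requests package rather than the project's own C/Python user library, whose API is not reproduced here) retrieves the registered workflows and the execution-time statistics of one workflow through two of the endpoints documented in the appendix; host and workflow names are taken from the examples there.

# Minimal sketch using the generic 'requests' package (an assumption; the
# project ships its own C and Python libraries): query two documented
# endpoints of the monitoring API.
import requests

BASE = "http://mf.excess-project.eu:3030/dreamcloud/mf"

# List all registered workflows (GET /dreamcloud/mf/workflows).
workflows = requests.get(BASE + "/workflows").json()
print(sorted(workflows.keys()))

# Retrieve execution-time statistics for one workflow
# (GET /dreamcloud/mf/statistics/:id?metric=...).
stats = requests.get(BASE + "/statistics/rm_stream_1",
                     params={"metric": "execution_time"}).json()
print(stats["statistics"]["avg"], stats["statistics"]["std_deviation"])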

Finally, we have presented roadmaps for each of DREAMCLOUD’s use cases to adopt the proposed dynamic scheduling framework. Although each use case has specific requirements, a substantial subset of the components and services of the generic framework can often be re-used.


Outlook and Future Work. The next steps include the refinement of the current architecture. Moreover, each component has to be evaluated to ensure that it is minimally intrusive at run-time and does not interfere with workflows during execution. In particular, the communication between the SCHEDULING ADVISOR and the HEURISTIC MANAGER, as well as switching CPU frequencies on a per-core basis, must be guaranteed to have low latency [14]; we will not gain any energy savings if both interactions consume too much time.

We will report on the further development of the proposed dynamic scheduling framework in Deliverable D6.5. Furthermore, we will present experiments that evaluate the framework within the HPC domain against individual optimization criteria, including performance and energy.

Moreover, we will actively continue the integration process with the other partners in order to present results across all use cases in the deliverables of Work Package 6 on integration and validation.


6 Appendix - Monitoring API

This section presents the monitoring API, which was developed during the course of Work Package 3 as an extension to the existing EXCESS MONITORING FRAMEWORK.

Workflows
GET  /dreamcloud/mf/workflows
PUT  /dreamcloud/mf/workflows/:id
GET  /dreamcloud/mf/workflows/:id

Experiments
GET  /dreamcloud/mf/experiments
POST /dreamcloud/mf/experiments/:id
GET  /dreamcloud/mf/experiments/:id

Performance Profiles
GET  /dreamcloud/mf/profiles/:id
GET  /dreamcloud/mf/profiles/:id/:id
GET  /dreamcloud/mf/profiles/:id/:id/:id

Energy Profiles
GET  /dreamcloud/mf/energy/:id/:id
GET  /dreamcloud/mf/energy/:id/:id/:id

Deployment Plans
GET  /dreamcloud/mf/deployments
POST /dreamcloud/mf/deployments
GET  /dreamcloud/mf/deployments/:id
PUT  /dreamcloud/mf/deployments/:id

Metrics
POST /dreamcloud/mf/metrics
POST /dreamcloud/mf/metrics/:id/:id

Statistics
GET  /dreamcloud/mf/statistics/:id
GET  /dreamcloud/mf/statistics/:id/:id


GET /dreamcloud/mf/workflows

Gets a list of workflows.

URL Parameters

Parameter   Required   Description
details     no         Retrieve more detailed information

Example Requests

$ curl -XGET mf.excess-project.eu:3030/dreamcloud/mf/workflows
$ curl -XGET mf.excess-project.eu:3030/dreamcloud/mf/workflows?details

Example Responses

{"ms2_v2": {

"href": "/dreamcloud/mf/workflows/ms2_v2"},"power_stream": {

"href": "/dreamcloud/mf/workflows/power_stream"}

}

GET /dreamcloud/mf/workflows

{"ms2_v2": {

"application": "MS2","version": "v2","author": "Thomas Baumann","tasks": [

"T1.V1.job","T2.1.V1.job","T2.2.V1.job","T2.3.V1.job","T3.V1.job"

]},"power_stream": {

"application": "POWER_STREAM","version": "v1","author": "Dmitry Khabi"

}}

GET /dreamcloud/mf/workflows?details


PUT /dreamcloud/mf/workflows/:id

Registers a new workflow under the given workflow ID.

Remarks

It should be noted that each workflow needs to be registered only once. Registering a workflow under the same workflow ID results in updating the original description.

Keys including version, author, and tasks are optional. If present, they allow for better search and presentation of workflows within a graphical front-end. If tasks is present in the case of a workflow-based application, it enables the server to associate tasks with the corresponding workflow. Please ensure that each task's name equals its task ID.

Example Requests

$ curl -XPUT "mf.excess-project.eu:3030/dreamcloud/mf/workflows/ms2_v2" -d ’{"application": "Molecular Dynamics Simulation (MS2)","author": "Thomas Baumann""tasks": [

{"name": "T1.V1.job","next": "T2.1.V1.job"

},{

"name": "T2.1.V1.job","next": "T3.V1.job"

},{

"name": "T2.2.V1.job","next": "T3.V1.job"

},{

"name": "T2.3.V1.job","next": "T3.V1.job"

},{

"name": "T3.V1.job"},

]}’

Example Responses

{"href": "/dreamcloud/mf/workflows/ms2_v2"

}

PUT /dreamcloud/mf/workflows/ms2_v2


GET /dreamcloud/mf/workflows/:id

Gets information on the given workflow ID.

Example Requests

$ curl -XGET mf.excess-project.eu:3030/dreamcloud/mf/workflows/ms2_v2
$ curl -XGET mf.excess-project.eu:3030/dreamcloud/mf/workflows/power_stream

Example Responses

{"application": "MS2","version": "v2","author": "Thomas Baumann","tasks": [

{"name": "T1.V1.job","type": "enter","next": "T2.1.V1.job"

},{

"name": "T2.1.V1.job","next": "T3.V1.job"

},{

"name": "T2.2.V1.job","next": "T3.V1.job"

},{

"name": "T2.3.V1.job","next": "T3.V1.job"

},{

"name": "T3.V1.job","type": "exit"

}]

}

GET /dreamcloud/mf/workflows/ms2_v2

{"application": "POWER_STREAM","version": "v1","author": "Dmitry Khabi"

}

GET /dreamcloud/mf/workflows/power_stream


GET /dreamcloud/mf/experiments

Gets a list of experiment IDs.

URL Parameters

Parameter   Required   Description
details     no         Retrieve more detailed information
workflows   no         Comma-separated list of workflow IDs to filter results by the given workflows

Example Requests

$ curl -XGET mf.excess-project.eu:3030/dreamcloud/mf/experiments
$ curl -XGET mf.excess-project.eu:3030/dreamcloud/mf/experiments?details
$ curl -XGET mf.excess-project.eu:3030/dreamcloud/mf/experiments?workflows=ms2_v2
$ curl -XGET "mf.excess-project.eu:3030/dreamcloud/mf/experiments?workflows=ms2_v2&extends=tasks"

Example Responses

{"AUv-XWfndvv5tJC7PpCF": {

"href": "/dreamcloud/mf/experiments/AUv-XWfndvv5tJC7PpCF?workflow=ms2_v2"},"AUv-E2iydvv5tJC7Pnfb": {

"href": "/dreamcloud/mf/experiments/AUv-E2iydvv5tJC7Pnfb?workflow=ms2_v2"},"AUvqSatCdvv5tJC7CGwB": {

"href": "/dreamcloud/mf/experiments/AUvqSatCdvv5tJC7CGwB?workflow=ms2_v2"}

}

GET /dreamcloud/mf/experiments


{"AUv-XWfndvv5tJC7PpCF": {

"user": "hpcdhopp","description": "Profiling MS2 (run_workflow.sh)","date": "2015-03-09","started": "2015.03.09-12.48.29","workflow": "ms2_v2"

},"AUv-E2iydvv5tJC7Pnfb": {

"user": "hpcdhopp","description": "Profiling MS2 (run_workflow.sh)","date": "2015-03-09","started": "2015.03.09-11.27.39","workflow": "ms2_v2"

},"AUvqSatCdvv5tJC7CGwB": {

"user": "hpcdhopp","description": "Profiling MS2 (run_workflow.sh)","date": "2015-03-05","started": "2015.03.05-15.14.31","workflow": "ms2_v2"

}}

GET /dreamcloud/mf/experiments?details

{"AU1wOF2dULel9aBS0dX4": {

"href": "/dreamcloud/mf/experiments/AU1wOF2dULel9aBS0dX4?workflow=power_stream"

},"AU1wrurmULel9aBS0dX7": {

"href": "/dreamcloud/mf/experiments/AU1wrurmULel9aBS0dX7?workflow=power_stream"

},"AU1wr3SZULel9aBS0dX9": {

"href": "/dreamcloud/mf/experiments/AU1wr3SZULel9aBS0dX9?workflow=power_stream"

},"AU1sYGU1ULel9aBS0dX3": {

"href": "/dreamcloud/mf/experiments/AU1sYGU1ULel9aBS0dX3?workflow=power_stream"

}}

GET /dreamcloud/mf/experiments?workflows=power_stream


POST /dreamcloud/mf/experiments/:id

Creates a new experiment for a given workflow.

Example Requests

$ curl -XPOST "mf.excess-project.eu:3030/dreamcloud/mf/experiments/ms2_v2" -d ’{"user": "hpcdhopp","description": "Testing the MS2 application"

}’

Example Responses

{"AU2QVWvtULel9aBS1b8G": {

"href": "/dreamcloud/mf/experiments/AU2QVWvtULel9aBS1b8G?workflow=ms2_v2"}

POST /dreamcloud/mf/experiments/ms2_v2


GET /dreamcloud/mf/experiments/:id

Gets information about a particular experiment ID.

URL Parameters

Parameter   Required   Description
workflow    yes        Valid workflow ID associated with the experiment
extends     no         Comma-separated list of additional keys to retrieve. Available keys are: tasks.

Example Requests

$ curl -XGET mf.excess-project.eu:3030/dreamcloud/mf/experiments/AUv-XWfndvv5tJC7PpCF?workflow=ms2_v2
$ curl -XGET "mf.excess-project.eu:3030/dreamcloud/mf/experiments/AUv-XWfndvv5tJC7PpCF?workflow=ms2_v2&extends=tasks"

Example Responses

{"user": "hpcdhopp","description": "Profiling MS2 (run_workflow.sh)","timestamp": "2015.03.09-12.48.29"

}

GET /dreamcloud/mf/experiments/AUv-XWfndvv5tJC7PpCF?workflow=ms2_v2

{"user": "hpcdhopp","description": "Profiling MS2 (run_workflow.sh)","timestamp": "2015.03.09-12.48.29","tasks": [

"T1.V1.job","T2.1.V1.job","T2.2.V1.job","T2.3.V1.job","T2.4.V1.job","T2.5.V1.job","T3.V1.job"

]}

GET /dreamcloud/mf/experiments/AUv-XWfndvv5tJC7PpCF?workflow=ms2_v2&extends=tasks


GET /dreamcloud/mf/profiles/:id

Returns a list of references to available experiments for a given workflow ID.

Example Requests

$ curl -XGET mf.excess-project.eu:3030/dreamcloud/mf/profiles/ms2_v2

Example Responses

{"t2.1.v1.job": {

"AU1wwhTZULel9aBS0dYB": {"href": "/dreamcloud/mf/profiles/ms2_v2/t2.1.v1.job/

AU1wwhTZULel9aBS0dYB"}

},"t2.2.v1.job": {

"AU1wwhTZULel9aBS0dYB": {"href": "/dreamcloud/mf/profiles/ms2_v2/t2.2.v1.job/

AU1wwhTZULel9aBS0dYB"}

}}

GET /dreamcloud/mf/profiles/ms2_v2


GET /dreamcloud/mf/profiles/:id/:id

Returns a list of references to available experiments for a given workflow ID and task ID. Results are sorted by the date the experiments started.

Example Requests

$ curl -XGET mf.excess-project.eu:3030/dreamcloud/mf/profiles/ms2_v2/t2.2.v1.job

Example Responses

{"2015.05.20": [

"AU1wwhTZULel9aBS0dYB": {"href": "/dreamcloud/mf/profiles/ms2_v2/t2.1.v1.job/

AU1wwhTZULel9aBS0dYB"}

]}

GET /dreamcloud/mf/profiles/ms2_v2/t2.2.v1.job


GET /dreamcloud/mf/profiles/:id/:id/:id

Returns a performance profile for a specific task execution described by three elements: workflow ID, task ID, and experiment ID.

Example Requests

$ curl -XGET mf.excess-project.eu:3030/dreamcloud/mf/profiles/ms2_v2/t2.2.v1.job/AU3TtOOaYHjgymAd2i5T

Example Responses

[
  {
    "@timestamp": "2015-06-08T17:06:10.740",
    "host": "node02.excess-project.eu",
    "task": "T2.1.V1.job",
    "type": "memory",
    "mem_avail": 73.87,
    "mem_used": 26.13
  },
  {
    "@timestamp": "2015-06-08T17:06:11.932",
    "host": "node02.excess-project.eu",
    "task": "T2.1.V1.job",
    "type": "memory",
    "mem_avail": 73.87,
    "mem_used": 26.13
  },
  ...
]

GET /dreamcloud/mf/profiles/ms2_v2/t2.2.v1.job/AU3TtOOaYHjgymAd2i5T


GET /dreamcloud/mf/energy/:id/:id

Returns the collected energy profiles for each task of a given workflow ID and experiment ID. Energy metrics are collected through the hw_power plug-in.

Example Requests

$ curl -XGET mf.excess-project.eu:3030/dreamcloud/mf/energy/ms2_v2/AU1wwhTZULel9aBS0dYB

Example Responses

{"t2.1.v1.job": [{

"@timestamp": "1435586890.28","type": "pwm","CPU1_node02": 48,"ATX12V_node02": 193,"CPU2_node02": 48

},..

]"t2.2.v1.job": [{

"@timestamp": "1435586890.28","type": "pwm","CPU1_node02": 48,"ATX12V_node02": 193,"CPU2_node02": 48

},..

]}

GET /dreamcloud/mf/energy/ms2_v2/AU1wwhTZULel9aBS0dYB


GET /dreamcloud/mf/energy/:id/:id/:id

Returns an energy profile for a specific task execution described by three elements: workflow ID, task ID, and experiment ID. Energy metrics are collected through the hw_power plug-in.

Example Requests

$ curl -XGET mf.excess-project.eu:3030/dreamcloud/mf/energy/ms2_v2/t2.1.v1.job/AU3TtOOaYHjgymAd2i5T

Example Responses

[
  {
    "@timestamp": "1435586890.28",
    "type": "pwm",
    "CPU1_node02": 48,
    "ATX12V_node02": 193,
    "CPU2_node02": 48
  },
  {
    "@timestamp": "1435586891.28",
    "type": "pwm",
    "CPU1_node02": 52,
    "ATX12V_node02": 195,
    "CPU2_node02": 52
  },
  ...
]

GET /dreamcloud/mf/energy/ms2_v2/t2.1.v1.job/AU3TtOOaYHjgymAd2i5T


GET /dreamcloud/mf/deployments

Gets a list of deployment IDs.

Example Requests

$ curl -XGET mf.excess-project.eu:3030/dreamcloud/mf/deployments

Example Responses

{"cnrs_d1": {

"href": "http://mf.excess-project.eu:3030/dreamcloud/mf/deployments/cnrs_d1"

},"rm_stream_d1": {

"href": "http://mf.excess-project.eu:3030/dreamcloud/mf/deployments/rm_stream_d1"

}}

GET /dreamcloud/mf/deployments


POST /dreamcloud/mf/deployments

Inserts a batch of deployment plans.

Example Requests

$ curl -XPOST "mf.excess-project.eu:3030/dreamcloud/mf/deployments" -d ’{[

{"numUsedCores":1.0,"DeploymentID":"CNRS_D4","@timestamp":"2015-08-12T10:38:48.540","allocations":{

"t4":{"node":"node01","core":0,"pmode":"P3"

},"t2":{

"node":"node01","core":0,"pmode":"P3"

},"t3":{

"node":"node01","core":0,"pmode":"P3"

},"t1":{

"node":"node01","core":0,"pmode":"P3"

}}

},..

]}’

Example Responses

[
  http://mf.excess-project.eu:3030/dreamcloud/mf/deployments/cnrs_d1,
  http://mf.excess-project.eu:3030/dreamcloud/mf/deployments/cnrs_d2,
  ..
]

POST /dreamcloud/mf/deployments


GET /dreamcloud/mf/deployments/:id

Gets information about a particular deployment plan given the deployment ID.

Example Requests

$ curl -XGET mf.excess-project.eu:3030/dreamcloud/mf/deployments/rm_stream_d1

Example Responses

{"@timestamp": "2015-07-31T19:38:13.608","allocations": {

"t4": {"node": "winterfell","cores": [ 1, 2 ],"pmode": "P1"

},"t2": {

"node": "winterfell","cores": [ 1, 2, 3, 4 ],"pmode": "P0"

},"t3": {

"node": "winterfell","cores": [ 5, 6, 7, 8 ],"pmode": "P2"

},"t1": {

"node": "winterfell","cores": [ 1, 2 ],"pmode": "P4"

}}

}

GET /dreamcloud/mf/deployments/rm_stream_d1


PUT /dreamcloud/mf/deployments/:id

Inserts a new deployment plan for the given ID.

Example Requests

$ curl -XPUT "mf.excess-project.eu:3030/dreamcloud/mf/deployments/rm_stream_d1"-d ’{

{"@timestamp": "2015-07-31T19:38:13.608","allocations": {

"t4": {"node": "winterfell","cores": [ 1, 2 ],"pmode": "P1"

},"t2": {

"node": "winterfell","cores": [ 1, 2, 3, 4 ],"pmode": "P0"

},"t3": {

"node": "winterfell","cores": [ 5, 6, 7, 8 ],"pmode": "P2"

},"t1": {

"node": "winterfell","cores": [ 1, 2 ],"pmode": "P4"

}}

}}’

Example Responses

{"rm_stream_d1": {

"href": "http://mf.excess-project.eu:3030/dreamcloud/mf/deployments/rm_stream_d1"

}}

PUT /dreamcloud/mf/deployments/rm_stream_d1


POST /dreamcloud/mf/metrics

Inserts a set of metrics to the database; individual plug-ins are represented as JSON objects within an array. It should be noted that both a workflow and an experiment have to be registered in the database first.

Example Requests

$ curl -XPOST "mf.excess-project.eu:3030/dreamcloud/mf/metrics" -d ’{[{

"WorkflowID": "CNRS_1","plugin": "statistics","ExecutionTime": 35.0,"Energy": 84.0,"DeployFindTime": 0.044,"Value": 270.0,"DeploymentID": "CNRS_D1","ExperimentID": "AU8dFoYU7ZJPcdLgl2B-","ArrivalTime": 2.0,"@timestamp": "2015-08-11T16:08:40.630"

},{

"WorkflowID": "CNRS_1","plugin": "statistics","ExecutionTime": 40.0,"Energy": 83.0,"DeployFindTime": 0.034,"Value": 250.0,"DeploymentID": "CNRS_D2","ExperimentID": "AU8dFoYn7ZJPcdLgl2B_","ArrivalTime": 2.0,"@timestamp": "2015-08-11T16:08:40.650"

},..

]}’

Example Responses

[
  http://mf.excess-project.eu:3030/dreamcloud/mf/profiles/cnrs_1/all/AU7Pz6nNJZJIh9Rd1UVr,
  http://mf.excess-project.eu:3030/dreamcloud/mf/profiles/cnrs_1/all/AU7Pz6nmJZJIh9Rd1UVs,
  ..
]

POST /dreamcloud/mf/metrics


POST /dreamcloud/mf/metrics/:id/:id

Inserts a new metric to the database for a given workflow ID and experiment ID.

URL Parameters

Parameter   Required   Description
task        no         A valid task ID

Example Requests

$ curl -XPOST "mf.excess-project.eu:3030/dreamcloud/mf/metrics/cse/AU8dPM6x7ZJPcdLgl3rZ" -d ’{

"plugin": "simulation","@timestamp": "2015-08-11T16:08:40.650","Packet ID": 11876,"Priority": 5,"Info": 0,"Source": [ 10, 0 ],"Destination": [ 0, 0 ],"Injection Time(NS)": 62597,"Delivery Time(NS)": 62865,"Packet Latency(NS)": 268

}’

$ curl -XPOST "mf.excess-project.eu:3030/dreamcloud/mf/metrics/cse/AU8dPM6x7ZJPcdLgl3rZ?task=t1" -d ’{

"plugin": "simulation","@timestamp": "2015-08-11T16:08:41.650","Packet ID": 11898,"Injection Time(NS)": 62617,"Delivery Time(NS)": 62816,"Packet Latency(NS)": 275

}

Example Responses

{"AU8dPM-m7ZJPcdLgl3rb": {

"href": "http://mf.excess-project.eu:3030/dreamcloud/mf/profiles/cse/all/AU8dPM6x7ZJPcdLgl3rZ"

}}

POST /dreamcloud/mf/metrics/cse/AU8dPM6x7ZJPcdLgl3rZ


GET /dreamcloud/mf/statistics/:id

Returns a set of statistics computed for the given metric. The calculation is based on all experiments performed for the given workflow ID.

URL Parameters

Parameter   Required   Description
metric      yes        A valid metric name the computation is based on
filter      no         Filter results by a valid field name, e.g., host_type
from        no         Start date for the computation (Elasticsearch time format)
to          no         End date for the computation (Elasticsearch time format)

Available Statistics

Name of statistic       Comment
count                   Number of experiments stored in the database
min                     Minimum value
max                     Maximum value
avg                     Mean value
sum                     Sum of all values
sum_of_squares          Sum of squares
variance                Variance
std_deviation           Standard deviation
std_deviation_bounds    Standard deviation bounds (upper and lower)

Example Requests

$ curl -XGET http://mf.excess-project.eu:3030/dreamcloud/mf/statistics/rm_stream_1?metric=execution_time

$ curl -XGET "http://mf.excess-project.eu:3030/dreamcloud/mf/statistics/rm_stream_1?metric=execution_time&filter=host_type%3D%3DVM&from=2015-07-31T13:00&to=2015-07-31T20:00"


Example Responses

{"workflow": {

"href": "http://mf.excess-project.eu:3030/dreamcloud/mf/workflows/rm_stream_1"

},"metric": "execution_time","statistics": {

"count": 4,"min": 16,"max": 84,"avg": 54.25,"sum": 217,"sum_of_squares": 14169,"variance": 599.1875,"std_deviation": 24.47830672248389,"std_deviation_bounds": {

"upper": 103.20661344496779,"lower": 5.293386555032221

}},"min": {

"task": "t1","execution_time": 16,"@timestamp": "2015-07-31T19:38:13.766","host_type": "VM","agent": "agent_t1","host": "winterfell","energy": 76

},"max": {

"task": "t3","execution_time": 84,"@timestamp": "2015-07-31T19:38:15.677","host_type": "PI","agent": "agent_t3","host": "winterfell","energy": 64

}}

GET /dreamcloud/mf/statistics/rm_stream_1
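The statistics above can be reproduced from the underlying raw metric values. The following minimal Python sketch (not the server-side implementation) mirrors the computation; it assumes that std_deviation_bounds denote the mean plus and minus two standard deviations, which is consistent with the values in the example response.

# Minimal sketch (not the server implementation): recompute the statistics
# fields from raw metric values; bounds are assumed to be mean +/- 2 sigma.
def extended_stats(values):
    count = len(values)
    total = float(sum(values))
    avg = total / count
    sum_of_squares = sum(v * v for v in values)
    variance = sum_of_squares / float(count) - avg * avg
    std_deviation = variance ** 0.5
    return {
        "count": count, "min": min(values), "max": max(values),
        "avg": avg, "sum": total, "sum_of_squares": sum_of_squares,
        "variance": variance, "std_deviation": std_deviation,
        "std_deviation_bounds": {"upper": avg + 2 * std_deviation,
                                 "lower": avg - 2 * std_deviation},
    }

# Four execution_time values consistent with the response above (16, 56, 61,
# and 84) yield avg 54.25, variance 599.1875, and bounds of about 103.21 and 5.29.
print(extended_stats([16, 56, 61, 84]))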


GET /dreamcloud/mf/statistics/:id/:id

Returns a set of statistics computed for the given metric. The calculation is based on all experiments performed for the given workflow ID and task ID.

URL Parameters

Parameter   Required   Description
metric      yes        A valid metric name the computation is based on
filter      no         Filter results by a valid field name, e.g., host_type
from        no         Start date for the computation (Elasticsearch time format)
to          no         End date for the computation (Elasticsearch time format)

Available Statistics

Name of statistic       Comment
count                   Number of experiments stored in the database
min                     Minimum value
max                     Maximum value
avg                     Mean value
sum                     Sum of all values
sum_of_squares          Sum of squares
variance                Variance
std_deviation           Standard deviation
std_deviation_bounds    Standard deviation bounds (upper and lower)

Example Requests

$ curl -XGET http://mf.excess-project.eu:3030/dreamcloud/mf/statistics/rm_stream_1/t1?metric=execution_time

$ curl -XGET "http://mf.excess-project.eu:3030/dreamcloud/mf/statistics/rm_stream_1/t1?metric=execution_time&filter=host_type%3D%3DVM&from=2015-07-31T13:00&to=2015-07-31T20:00"


Example Responses

{"workflow": {

"href": "http://mf.excess-project.eu:3030/dreamcloud/mf/workflows/rm_stream_1"

},"metric": "execution_time","statistics": {

"count": 1,"min": 16,"max": 16,"avg": 16,"sum": 16,"sum_of_squares": 256,"variance": 0,"std_deviation": 0,"std_deviation_bounds": {

"upper": 16,"lower": 16

}},"min": {

"task": "t1","execution_time": 16,"@timestamp": "2015-07-31T19:38:13.766","host_type": "VM","agent": "agent_t1","host": "winterfell","energy": 76

},"max": {

"task": "t1","execution_time": 16,"@timestamp": "2015-07-31T19:38:13.766","host_type": "VM","agent": "agent_t1","host": "winterfell","energy": 76

}}

GET /dreamcloud/mf/statistics/rm_stream_1/t1?metric=execution_time


References

[1] Adaptive Computing Enterprises, Inc. TORQUE Resource Manager—Administrator’s Guide. https://docs.adaptivecomputing.com/torque/5-1-1/torqueAdminGuide-5.1.1.pdf, 2015. Accessed on 2015-07-20.

[2] Basmadjian, R., de Meer, H., Lent, R., and Giuliani, G. Cloud Computing and its Interest in Saving Energy: The Use Case of a Private Cloud. Journal of Cloud Computing, 1(1):1–25, 2012.

[3] Bjerregaard, T. and Mahadevan, S. A Survey of Research and Practices of Network-on-chip. ACM Computing Surveys (CSUR), 38(1), June 2006. ISSN 0360-0300. doi: 10.1145/1132952.1132953.

[4] Brodowski, D. and Golde, N. Linux CPUFreq Governors—Information for Users and Developers. https://www.kernel.org/doc/Documentation/cpu-freq/governors.txt, 2015. Accessed on 2015-07-30.

[5] Burton, E., Schrom, G., Paillet, F., Douglas, J., Lambert, W. J., Radhakrishnan, K., Hill, M. J., et al. FIVR—Fully integrated voltage regulators on 4th generation Intel® Core SoCs. In Applied Power Electronics Conference and Exposition (APEC), 2014 Twenty-Ninth Annual IEEE, pages 432–439. IEEE, 2014.

[6] Chetsa, G. T., Lefèvre, L., Pierson, J., Stolf, P., and Costa, G. D. Exploiting Performance Counters to predict and improve Energy Performance of HPC Systems. Future Generation Computer Systems, 36:287–298, 2014. ISSN 0167-739X. doi: 10.1016/j.future.2013.07.010.

[7] Choi, J., Govindan, S., Urgaonka, B., and Anand, S. Profiling, Prediction, and Capping of Power Consumption in Consolidated Environments. In Modeling, Analysis and Simulation of Computers and Telecommunication Systems (2008), pages 1–10, Washington, DC, USA, Sept 2008. IEEE Computer Society. doi: 10.1109/MASCOT.2008.4770558.

[8] David, H., Gorbatov, E., Hanebutte, U. R., Khanna, R., and Le, C. RAPL: Memory Power Estimation and Capping. In Proceedings of the 16th ACM/IEEE International Symposium on Low Power Electronics and Design, ISLPED ’10, pages 189–194, New York, NY, USA, 2010. ACM. ISBN 978-1-4503-0146-6. doi: 10.1145/1840845.1840883.

[9] Etinski, M., Corbalán, J., Labarta, J., and Valero, M. Understanding the future of energy-performance trade-off via DVFS in HPC environments. Journal of Parallel and Distributed Computing, 72(4):579–590, 2012. doi: 10.1016/j.jpdc.2012.01.006.

[10] European Commission. Monitoring CO2 Emissions from new Passenger Cars in the EU: Summary of Data for 2011. http://ec.europa.eu/clima/policies/transport/vehicles/cars/index_en.htm, 2015. Accessed on 2015-08-20.

[11] European Environment Agency. Monitoring CO2 Emissions from new Passenger Cars in the EU: Summary of Data for 2011. http://www.eea.europa.eu/publications/monitoring-co2-emissions-from-new/at_download/file, 2012. Accessed on 2015-07-20.


[12] Felter, W., Rajamani, K., Keller, T., and Rusu, C. A Performance-conserving Approach for Reducing Peak Power Consumption in Server Systems. In Proceedings of the 19th Annual International Conference on Supercomputing, ICS ’05, pages 293–302, New York, NY, USA, 2005. ACM. ISBN 1-59593-167-8. doi: 10.1145/1088149.1088188.

[13] Glass, C. W., Reiser, S., Rutkai, G., Deublein, S., Köster, A., Guevara-Carrion, G., Wafai, A., Horsch, M., Bernreuther, M., Windmann, T., Hasse, H., and Vrabec, J. A Molecular Simulation Tool for Thermodynamic Properties. Computer Physics Communications, 185(12):3302–3306, 2014. ISSN 0010-4655. doi: 10.1016/j.cpc.2014.07.012.

[14] Hackenberg, D., Schöne, R., Ilsche, T., Molka, D., Schuchart, J., and Geyer, R. An Energy Efficiency Feature Survey of the Intel Haswell Processor. In The 11th Workshop on High-Performance, Power-Aware Computing in conjunction with the 29th International Parallel & Distributed Processing Symposium (IPDPS 2015), HPPAC ’15, pages 1–9, 2015.

[15] Hähnel, M., Döbel, B., Völp, M., and Härtig, H. Measuring Energy Consumption for Short Code Paths Using RAPL. SIGMETRICS Perform. Eval. Rev., 40(3):13–17, January 2012. ISSN 0163-5999. doi: 10.1145/2425248.2425252.

[16] Hoppe, D., Sandoval, Y., and Gienger, M. ATOM: A Near-Real Time Monitoring Framework for HPC and Embedded Systems. In fEEDBACk Workshop on Energy Efficient Distributed and Parallel Computing at the ACM Symposium on Principles of Distributed Computing, PODC ’15, pages 1–8, 2015.

[17] Liu, L., Wang, H., Liu, X., Jin, X., He, W. B., Wang, Q. B., and Chen, Y. GreenCloud: A New Architecture for Green Data Center. In Proceedings of the 6th International Conference Industry Session on Autonomic Computing and Communications Industry Session, ICAC-INDST ’09, pages 29–38, New York, NY, USA, 2009. ACM. ISBN 978-1-60558-612-0. doi: 10.1145/1555312.1555319.

[18] Mackamul, H. AMALTHEA—An Open Tool Platform for Embedded Multicore Systems. EclipseCon Europe ’13, October 2013.

[19] Obaidat, M. S., Anpalagan, A., and Woungang, I. Handbook of Green Inform. and Comm. Systems. Academic Press, 1st edition, 2012. ISBN 0124158447, 9780124158443.

[20] Patterson, M. K. The Effect of Data Center Temperature on Energy Efficiency. In 2008 11th IEEE Intersociety Conference on Thermal and Thermomechanical Phenomena in Electronic Systems, I-THERM, pages 1167–1174, 2008. ISBN 9781424417018. doi: 10.1109/ITHERM.2008.4544393.

[21] Pietri, I. and Sakellariou, R. Energy-Aware Workflow Scheduling Using Frequency Scaling. In 43rd International Conference on Parallel Processing Workshops, ICPPW 2014, pages 104–113. IEEE Computer Society, 2014. ISBN 978-1-4799-5615-9. doi: 10.1109/ICPPW.2014.26.

[22] Pirzada, S. M. U. Intel to Abandon the Internal Voltage Regulator (IVR) with Skylake Microarchitecture. http://wccftech.com/intel-abandon-internal-voltage-regulator-skylake-microarchitecture/, 2014. Accessed on 2015-08-19.


[23] Qiu, M., Ming, Z., Li, J., Liu, S., Wang, B., and Lu, Z. Three-phase time-aware energy minimization with DVFS and unrolling for Chip Multiprocessors. Journal of Systems Architecture, 58(10):439–445, 2012.

[24] Rountree, B., Lowenthal, D. K., de Supinski, B. R., Schulz, M., Freeh, V. W., and Bletsch, T. Adagio: Making DVS practical for complex HPC Applications. In Proceedings of the 23rd International Conference on Supercomputing, pages 460–469. ACM, 2009.

[25] Rusu, S., Muljono, H., Ayers, D., Tam, S., Chen, W., Martin, A., Li, S., Vora, S., Varada, R., and Wang, E. 5.4 Ivytown: A 22nm 15-core enterprise Xeon® processor family. In Solid-State Circuits Conference Digest of Technical Papers (ISSCC), 2014 IEEE International, pages 102–103. IEEE, 2014.

[26] Sandoval, Y., Hoppe, D., Khabi, D., Gienger, M., Kessler, C., Li, L., Dastgeer, U., Tran, V., Umar, I., Ha, P., Tsigas, P., et al. EXCESS: Execution Models for Energy-Efficient Computing Systems. In fEEDBACk Workshop on Energy Efficient Distributed and Parallel Computing at the ACM Symposium on Principles of Distributed Computing, PODC ’15, pages 1–18, 2015.

[27] Song, S., Ge, R., Feng, X., and Cameron, K. Energy Profiling and Analysis of the HPC Challenge Benchmarks. International Journal of High Performance Computing Applications (IJHPCA), 23(3):265–276, 2009. doi: 10.1177/1094342009106193.

[28] Subramaniam, B., Saunders, W., Scogland, T., and Feng, W. Trends in Energy-Efficient Computing: A Perspective from the Green500. In Green Computing Conference (IGCC), pages 1–8, June 2013. doi: 10.1109/IGCC.2013.6604520.

[29] Sudan, K., Srinivasan, S., Balasubramonian, R., and Iyer, R. Optimizing Datacenter Power with Memory System Levers for Guaranteed Quality-of-Service. In Proceedings of the 21st International Conference on Parallel Architectures and Compilation Techniques, PACT ’12, pages 117–126, New York, NY, USA, 2012. ACM. ISBN 978-1-4503-1182-3. doi: 10.1145/2370816.2370834.

[30] Terpstra, D., Jagode, H., You, H., and Dongarra, J. Collecting Performance Data with PAPI-C. In Müller, M. S., Resch, M. M., Nagel, W. E., and Schulz, A., editors, Tools for High Performance Computing 2009, pages 157–173. Springer, 3rd Parallel Tools Workshop, Dresden, Germany, 2009.

[31] The DEEP Project (Dynamical Exascale Entry Platform). Energy Efficiency in DEEP. http://www.deep-project.eu/deep-project/EN/Hardware/Energy-Efficiency/_node.html, 2015. Accessed on 2015-07-30.

[32] The DreamCloud Project Consortium. D3.1—Cloud Communications Patterns Analysis. Public deliverable, DreamCloud Project (FP7/2011-2014 grant agreement no 611411), 2014.

[33] The DreamCloud Project Consortium. D3.3—Energy-Aware Allocation for Clouds. Public deliverable, DreamCloud Project (FP7/2011-2014 grant agreement no 611411), 2015.


[34] The DreamCloud Project Consortium. D3.4—Specification of the smart scheduling heuristics for DreamCloud. Public deliverable, DreamCloud Project (FP7/2011-2014 grant agreement no 611411), 2015.

[35] The DreamCloud Project Consortium. D5.2—Abstract Transactional Simulation Platform. Public deliverable, DreamCloud Project (FP7/2011-2014 grant agreement no 611411), 2015.

[36] The DreamCloud Project Consortium. D2.3—Multi-criteria Resource Management. Public deliverable, DreamCloud Project (FP7/2011-2014 grant agreement no 611411), 2015.

[37] The EXCESS Project Consortium. D5.2: Prototype of an Energy-aware System based on Conventional HPC Technology. Public deliverable, The EXCESS Project (FP7/2013-2016 grant agreement no 611183), 2014.

[38] The JUNIPER Project Consortium. Java Platform for High-Performance and Real-time Large Scale Data. http://www.juniper-project.org, 2015. Accessed on 2015-08-12.

[39] Topcuoglu, H., Hariri, S., and Wu, M.-Y. Performance-Effective and Low-Complexity Task Scheduling for Heterogeneous Computing. IEEE Trans. Parallel Distrib. Syst., 13(3):260–274, March 2002. ISSN 1045-9219. doi: 10.1109/71.993206.

[40] Treibig, J., Hager, G., and Wellein, G. LIKWID: A Lightweight Performance-oriented Tool Suite for x86 Multicore Environments. In Proceedings of the 1st International Workshop on Parallel Software Tools and Tool Infrastructures, San Diego CA, 2010.

[41] Valentini, G. L., Lassonde, W., Khan, S. U., Min-Allah, N., Madani, S., Li, J., Zhang, L., Wang, L., Ghani, N., Kolodziej, J., and Li, H. An Overview of Energy Efficiency Techniques in Cluster Computing Systems. Cluster Computing, 16(1):3–15, 2013. ISSN 13867857. doi: 10.1007/s10586-011-0171-x.

[42] Vasic, N., Barisits, M., Salzgeber, V., and Kostic, D. Making Cluster Applications Energy-aware. In Proceedings of the 1st Workshop on Automated Control for Datacenters and Clouds (ACDC ’09), page 37. ACM, 2009. ISBN 9781605585857. doi: 10.1145/1555271.1555281.

[43] Wheeler, J. and Bybell, T. GTKWave 3.3 Wave Analyzer User’s Guide. Technical report, GTKWave, 2014.

[44] Wieczorek, M., Prodan, R., and Fahringer, T. Scheduling of scientific workflows in the ASKALON grid environment. ACM SIGMOD Record, 34(3):56–62, 2005.
