This project and the research leading to these results have received funding from the European Community's Seventh Framework Programme [FP7/2007-2013] under grant agreement n° 318693

Santhosh Kumar Rethinagiri, Oscar Palomar, Osman Unsal, Adrian Cristal

Barcelona Supercomputing Center

Energy-Aware Computing Workshop 2014

Power and Energy Modelling of Multi-core Processors for System-Level Design Space Exploration

EACO Workshop, Sept. 10th 2014, Bristol

ParaDIME Consortium

TU Dresden (GERMANY)

IMEC, Leuven (BELGIUM)

BSC, Barcelona (SPAIN)

University of Neuchatel (SWITZERLAND)

Cloud & Heat Technologies, Dresden (GERMANY)

Why ParaDIME ?

Parallel Distributed Infrastructure for Minimization of Energy

Rising cost

Hardware cost

Programming efficiency

Runtime optimization

Energy aware data center computing

The ParaDIME Stack

[Figure: the ParaDIME infrastructure. Labelled blocks: Application/BM, API, Scala, AKKA, Actor Sched, JVM, OS, VM, Hypervisor and Real HW in each Computing Node/Stack; Simulated HW (Cores, Accelerators, Interconnect, Future Devices); Data Center with Intra Data Center Scheduler; Multi Data Center Scheduler. Partner labels: BSC, IMEC, UNINE, TUD, Cloud & Heat Technologies.]

Challenges of modelling power of heterogeneous systems

Estimating power and energy is critical when designing electronic devices.

Designers must estimate power as early as possible in the design flow.

Design changes are easiest during the design phase, and changes made at system level have the greatest impact on application power.

This calls for a platform that can use different processors and components.

Functional-level estimation is accurate but coarse-grained, and it is restricted by the need to measure power on a real board.

Fine-grained estimates can be obtained from gate-level simulation, but this requires tools and RTL sources that we do not have, and simulation is very slow.

Another challenge: the power law holds for a simple processor, but whether it holds for a complex processor system remains debatable.

Power estimation methodology and tools

McPAT

Hybrid design space exploration methodology

First step: FLPA (Functional Level Power Analysis)

Functional block (ARM Cortex-A9)

Generic Power Model Parameters

The parameters that influence power in a system.
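As an illustration of what such a parametric model might look like, the sketch below evaluates a linear power law over named parameters. The parameter names (`freq_mhz`, `ipc`, `l2_miss_rate`), the coefficients and the linear form are all assumptions made for this example; they are not the models from the talk.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PowerModel:
    """Hypothetical linear power law: P = base + sum(coeff * parameter)."""

    base_mw: float   # power drawn regardless of the parameters (mW)
    coeffs_mw: dict  # parameter name -> mW per unit of that parameter

    def estimate_mw(self, params: dict) -> float:
        """Return the estimated power in mW for one set of parameter values."""
        missing = set(self.coeffs_mw) - set(params)
        if missing:
            raise KeyError(f"missing parameters: {sorted(missing)}")
        return self.base_mw + sum(
            coeff * params[name] for name, coeff in self.coeffs_mw.items()
        )


# Made-up coefficients, for illustration only.
model = PowerModel(
    base_mw=10.0,
    coeffs_mw={"freq_mhz": 0.05, "ipc": 8.0, "l2_miss_rate": 20.0},
)
estimate = model.estimate_mw({"freq_mhz": 500, "ipc": 1.0, "l2_miss_rate": 0.1})
```

Keeping the coefficients in a dictionary means a model for a different functional block only needs a different set of names and values.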

Power measurement environment

Courtesy: Open-People project

Variation of power with instructions per cycle (IPC) for ARM Cortex-A8

[Plot: power (mW), axis 0 to 50, against instructions per cycle (IPC), axis 0 to 2]

Power consumption models generated with FLPA
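One common way to obtain such a model is to fit a regression to power measurements. The sketch below fits a straight line of power against IPC by ordinary least squares. The sample points are made up for illustration and are not the measurements behind the slides; `fit_line` is a hypothetical helper.

```python
def fit_line(xs, ys):
    """Ordinary least-squares fit of y = slope * x + intercept."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need at least two (x, y) pairs of equal length")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("all x values are identical")
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


# Made-up sample points (IPC, power in mW), not measured data.
ipc_samples = [0.5, 1.0, 1.5, 2.0]
power_samples_mw = [15.0, 25.0, 35.0, 45.0]

slope_mw_per_ipc, intercept_mw = fit_line(ipc_samples, power_samples_mw)
```

With real measurements the points would not lie exactly on a line, and the residuals would indicate how well a linear law describes the processor.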

Second Step: System Level Power Analysis
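At system level, a power estimate per time window can be turned into energy by summing power times duration. A minimal sketch, assuming the simulator reports a list of (duration, estimated power) windows; the function names and the window format are assumptions, not the tool's interface.

```python
def energy_joules(windows):
    """Sum power * time over windows of (duration_s, power_mw); result in joules."""
    return sum(duration_s * power_mw for duration_s, power_mw in windows) / 1000.0


def average_power_mw(windows):
    """Average power over the whole run, in mW."""
    total_s = sum(duration_s for duration_s, _ in windows)
    if total_s <= 0:
        raise ValueError("total duration must be positive")
    return 1000.0 * energy_joules(windows) / total_s


# Hypothetical run: 0.5 s at 40 mW followed by 1.5 s at 20 mW.
run = [(0.5, 40.0), (1.5, 20.0)]
run_energy_j = energy_joules(run)
run_average_mw = average_power_mw(run)
```

Reporting both figures matters because a configuration with lower average power can still use more energy if it runs for longer.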

Result Interface

Results and Comparison (Power estimation)

Results and Comparison (Energy)

Third step: Auto optimization

DVFS
  Runtime: inter-task DVFS
  Programmer-annotation-based DVFS

Workload balancing based on tasks
  Runtime
  Programmer-based request

Task scheduling

Optimization based on work load balancing
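A minimal sketch of one way to balance work across cores: the greedy longest-task-first heuristic, which hands each task to the currently least-loaded core. This is a standard heuristic chosen for illustration, not necessarily the policy used in the talk; the task names and costs are hypothetical.

```python
import heapq


def balance(task_costs, n_cores):
    """Assign tasks to cores, longest task first, always to the least-loaded core.

    task_costs: dict of task name -> estimated cost (for example, cycles).
    Returns a dict of core index -> list of task names.
    """
    if n_cores < 1:
        raise ValueError("need at least one core")
    heap = [(0.0, core) for core in range(n_cores)]  # (current load, core index)
    assignment = {core: [] for core in range(n_cores)}
    # Sort by descending cost; ties are broken by name so the result is deterministic.
    for name, cost in sorted(task_costs.items(), key=lambda kv: (-kv[1], kv[0])):
        load, core = heapq.heappop(heap)
        assignment[core].append(name)
        heapq.heappush(heap, (load + cost, core))
    return assignment


# Hypothetical task costs.
tasks = {"a": 4, "b": 3, "c": 2, "d": 1}
plan = balance(tasks, n_cores=2)
```

The heuristic does not guarantee an optimal split, but it is cheap enough to rerun whenever the task estimates change.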

Optimization (Inter task DVFS)
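To illustrate why lowering frequency and voltage between tasks can save energy, the sketch below uses the usual dynamic power approximation P = C * V^2 * f, so the dynamic energy of a task of N cycles is C * V^2 * N. It picks the lowest-energy operating point that still meets a deadline. Static power is ignored, and the operating points, the capacitance value and `pick_operating_point` are assumptions made for this example.

```python
def pick_operating_point(cycles, deadline_s, points, c_eff_farads=1e-9):
    """Return (freq_hz, vdd_v, energy_j) for the cheapest point meeting the deadline.

    points: list of (freq_hz, vdd_v) pairs the platform is assumed to support.
    Energy is dynamic only: c_eff * vdd^2 * cycles.
    """
    best = None
    for freq_hz, vdd_v in points:
        runtime_s = cycles / freq_hz
        if runtime_s > deadline_s:
            continue  # too slow at this frequency
        energy_j = c_eff_farads * vdd_v ** 2 * cycles
        if best is None or energy_j < best[2]:
            best = (freq_hz, vdd_v, energy_j)
    if best is None:
        raise ValueError("no operating point meets the deadline")
    return best


# Hypothetical operating points: (frequency in Hz, supply voltage in V).
operating_points = [(300e6, 0.9), (600e6, 1.1), (1000e6, 1.3)]
choice = pick_operating_point(cycles=6e8, deadline_s=1.5, points=operating_points)
```

In this example the slowest point misses the deadline, so the middle point is chosen over the fastest one because its lower voltage costs less energy for the same number of cycles.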

Conclusion

Our tool produces accurate power and energy estimates.

The approach is adaptable to any kind of complex processor system.

As an added advantage, components can be prototyped rapidly and applications ported easily.

Estimating power and designing applications become easy and time-efficient.

Hardware Architecture

Energy-Efficient Message Passing
  Message passing microarchitecture
  Message passing accelerator
  Task passing

Operation Below Safe Vdd
  Automatic HW lowering of Vdd
  SW-guided (low-power annotation)
  Errors?

Heterogeneous Computing
  Architectural level
  Device level

Heterogeneous system-level environment

[Figure: virtual platform. Labelled blocks: Task 1, Task 2, Task 3, Data, Memory, Bus, ARM Cortex-A9 quad-core ISS, ARM Cortex-A8 quad-core ISS, DSP C64x ISS, FPGA hardware accelerator, GPU accelerator, and the PETS tool activity counter interface.]

Heterogeneous computing results and comparison

[Bar chart: energy (J), axis 0 to 1600, for K-means on ARM Cortex-A9 (quad core), ARM Cortex-A8 (quad core), FPGA (ZynQ), DSP C64x and GPU Tegra3]

Programming model

Message passing programming model
  Actor model (Akka+Scala)

Annotations to provide information to the hardware
  Operation below safe Vdd
  Approximate computing

@Storage(Array("precise=false", "VF_relax=true"))
var x = 5

@Calculation(Array("VF_relax=true", "VF_det=DMR", "VF_corr=TM"))
def calc(first: Array[Double])

Rewrite/expand annotated code with Scala macro annotations

Oscar Palomar (BSC)

Energy-Aware Computing Workshop 2014

Power and Energy Modeling of Multi-core Processors for System-Level Design Space Exploration

Computation of power and measurement of voltage for OMAP

Power measurement environment