new rules: scaling performance for extreme scale computing

28
SCHOOL OF ELECTRICAL AND COMPUTER ENGINEERING | GEORGIA INSTITUTE OF TECHNOLOGY New Rules: Scaling Performance for Extreme Scale Computing S. Yalamanchili School of Electrical and Computer Engineering Georgia Institute of Technology Atlanta, GA. 30332 Sponsors : National Science Foundation, Sandia National Laboratories, Semiconductor Research Corporation, and IBM

Upload: laird

Post on 14-Feb-2016

53 views

Category:

Documents


0 download

DESCRIPTION

New Rules: Scaling Performance for Extreme Scale Computing. S. Yalamanchili. School of Electrical and Computer Engineering Georgia Institute of Technology Atlanta, GA. 30332. Sponsors : National Science Foundation, Sandia National Laboratories, Semiconductor Research Corporation, and IBM. - PowerPoint PPT Presentation

TRANSCRIPT

Page 1: New Rules: Scaling Performance for Extreme Scale Computing

SCHOOL OF ELECTRICAL AND COMPUTER ENGINEERING | GEORGIA INSTITUTE OF TECHNOLOGY

New Rules: Scaling Performance for Extreme Scale Computing

S. YalamanchiliSchool of Electrical and Computer

EngineeringGeorgia Institute of Technology

Atlanta, GA. 30332

Sponsors: National Science Foundation, Sandia National Laboratories, Semiconductor Research Corporation, and IBM

Page 2: New Rules: Scaling Performance for Extreme Scale Computing

SCHOOL OF ELECTRICAL AND COMPUTER ENGINEERING | GEORGIA INSTITUTE OF TECHNOLOGY 2

Scaling Computing

Cray Titan

Hybrid Memory Cube

Page 3: New Rules: Scaling Performance for Extreme Scale Computing

SCHOOL OF ELECTRICAL AND COMPUTER ENGINEERING | GEORGIA INSTITUTE OF TECHNOLOGY 3

Outline

Performance Scaling Rules & Constraints

Case Study: CPU-GPU Coordinated Power Management

Some New Projects MIDAS – Cooling-Power-Microarchitecture Co-Design

Hardware/Software Reliability Enhancement

Adaptive Architectures

Page 4: New Rules: Scaling Performance for Extreme Scale Computing

SCHOOL OF ELECTRICAL AND COMPUTER ENGINEERING | GEORGIA INSTITUTE OF TECHNOLOGY 4

Moore’s Law

From wikipedia.org

• Performance scaled with number of transistors

• Dennard scaling: power scaled with feature size

Goal: Sustain Performance

Scaling

Page 5: New Rules: Scaling Performance for Extreme Scale Computing

SCHOOL OF ELECTRICAL AND COMPUTER ENGINEERING | GEORGIA INSTITUTE OF TECHNOLOGY

Post Dennard Architecture Performance Scaling

W. J. Dally, Keynote IITC 2012

Data_movement_cost

Three operands x 64 bits/operand

5

Power Delivery Cooling

Page 6: New Rules: Scaling Performance for Extreme Scale Computing

SCHOOL OF ELECTRICAL AND COMPUTER ENGINEERING | GEORGIA INSTITUTE OF TECHNOLOGY

Scaling Performance: Cost of Data Movement

6

Embedded Platforms

Goal: 1-100 GOps/w Goal: 20MW/Exaflop

Big Science: To Exascale

• Sustain performance scaling through massive concurrency

• Data movement becomes more expensive than computation

Courtesy: Sandia National Labs :R. Murphy.

Cost of Data Movement

Page 7: New Rules: Scaling Performance for Extreme Scale Computing

SCHOOL OF ELECTRICAL AND COMPUTER ENGINEERING | GEORGIA INSTITUTE OF TECHNOLOGY

Post Dennard Architecture Performance Scaling

W. J. Dally, Keynote IITC 2012

Operator_cost + Data_movement_cost

Three operands x 64 bits/operandSpecialization heterogeneity

and asymmetry

7

Page 8: New Rules: Scaling Performance for Extreme Scale Computing

SCHOOL OF ELECTRICAL AND COMPUTER ENGINEERING | GEORGIA INSTITUTE OF TECHNOLOGY

Scaling Performance: Simplify and MultiplyAMD Bulldozer Core

ARM A7 Core (arm.com)

Extracting single thread performance costs energy

Out-of-order execution Branch prediction Scheduling etc.

Multithread performance exploits parallelism

Simpler pipelines Core scaling

Still important!

8

NVIDIA Fermi

Page 9: New Rules: Scaling Performance for Extreme Scale Computing

SCHOOL OF ELECTRICAL AND COMPUTER ENGINEERING | GEORGIA INSTITUTE OF TECHNOLOGY

Asymmetry vs. Heterogeneity

Multiple voltage and frequency islands

Different memory technologies

STT-RAM, PCM, Flash

9

Tile Tile

Tile Tile

Tile Tile

Tile Tile

Tile Tile

Tile Tile

Tile Tile

Tile Tile

MC

MC

MC

MC

Tile

Tile

Tile

Tile

Tile

Tile

Tile

Tile

MC

MC

MC

MC

Performance Asymmetry

Functional Asymmetry

Heterogeneous

Complex cores and simple cores

Shared instruction set architecture (ISA)

Subset ISA Distinct microarchitecture Fault and migrate model of

operation1

Uniform ISA Multi-ISA

1Li., T., et.al., “Operating system support for shared ISA asymmetric multi-core architectures,” in WIOSCA, 2008.

Multi-ISA Microarchitecture

Memory & Interconnect hierarchy

Page 10: New Rules: Scaling Performance for Extreme Scale Computing

SCHOOL OF ELECTRICAL AND COMPUTER ENGINEERING | GEORGIA INSTITUTE OF TECHNOLOGY 10

The Challenge: The Memory System

Xeon Phi

Hybrid Memory Cube

What should the memory hierarchy look like?Parallelism vs. locality tradeoffsMinimize data movement Processor in Memory?

H. Kim (SCS) and S. Yalamanchili (ECE)Sponsor: Sandia National Labs

Page 11: New Rules: Scaling Performance for Extreme Scale Computing

SCHOOL OF ELECTRICAL AND COMPUTER ENGINEERING | GEORGIA INSTITUTE OF TECHNOLOGY

Thermal Capacity

Exploit package physics Temperature changes on the

order of milliseconds Workload behaviors change on

the order of microsecondsImpact on device behavior?

Inst

ruct

ions

/cyc

le

Time

Time Varying Workload

Thermal Capacity

Power-Performance

Management!Figures: psdgraphics.com and wikipedia.org

11

Page 12: New Rules: Scaling Performance for Extreme Scale Computing

SCHOOL OF ELECTRICAL AND COMPUTER ENGINEERING | GEORGIA INSTITUTE OF TECHNOLOGY

Adapting to Power + Thermal ConstraintsExploit package physics

Temperature changes on the order of milliseconds

Use the thermal headroom

Max Power

TDP Power

Low power – build up thermal credits

Turbo boost region

10s of seconds

Intel Sandy Bridge

Time12

E. Rotem, et.al., Power-management Aarchitecture of the Intel Microarchitecture code-named Sandy Bridge, IEEE Micro March 2012

Page 13: New Rules: Scaling Performance for Extreme Scale Computing

SCHOOL OF ELECTRICAL AND COMPUTER ENGINEERING | GEORGIA INSTITUTE OF TECHNOLOGY

Summary: New Performance Scaling Rules

Energy efficiency: Scale performance by scaling energy efficiency diversify programming models?

Parallelism: Scale number of cores rather than performance of a single core multiply

Data Movement: Energy cost of data movement is more expensive than the energy cost of computation communication-centric

Physics Capacity: Scaling limited by thermal/power capacity power management

13

Page 14: New Rules: Scaling Performance for Extreme Scale Computing

SCHOOL OF ELECTRICAL AND COMPUTER ENGINEERING | GEORGIA INSTITUTE OF TECHNOLOGY 14

Physics of Computation

Physical phenomena interact with architecture to affect system level metrics such as energy, power, performance and reliability

These interactions are modulated by workloadsNeed management techniques that can improve energy efficiency

Page 15: New Rules: Scaling Performance for Extreme Scale Computing

SCHOOL OF ELECTRICAL AND COMPUTER ENGINEERING | GEORGIA INSTITUTE OF TECHNOLOGY 15

Thermal Coupling

CU1CU0GPU

Thermal coupling between CPU and GPU leads to inefficient operation

Interaction with power management, e.g., turbo core/boost reduces efficiency

AMD Trinity APU

I. Paul et.al., (ISCA 2013)

Page 16: New Rules: Scaling Performance for Extreme Scale Computing

SCHOOL OF ELECTRICAL AND COMPUTER ENGINEERING | GEORGIA INSTITUTE OF TECHNOLOGY 16

Thermal Signatures and Fields

CPUs are more thermally dense Thermal gradient from CPUs to GPU

GPU temperature rise due thermal pollution Premature throttling and hence performance loss

Measurements on an AMD Trinity A8-4555M APU

Page 17: New Rules: Scaling Performance for Extreme Scale Computing

SCHOOL OF ELECTRICAL AND COMPUTER ENGINEERING | GEORGIA INSTITUTE OF TECHNOLOGY 17

Thermal Coupling

1 19 37 55 73 91 1091271451631811992172352532712893073253430.5

1

1.5

2

2.5

3

3.5

0

0.1

0.2

0.3

0.4

0.5

0.6

0.7

0.8

0.9

1

GPU Pow CPU CU0 Pow

CPU CU1 Pow PeakDieTemp

Time (seconds) ->

CPU

& G

PU

Rel

ativ

e P

ower

->

Pea

k D

ie T

emp

erat

ure

->

CPU power is limited, GPU running at max DVFS state

Thermal cou-pling

Temp throttling

CU1CU0GPU

Thermal coupling between CPU and GPU accelerates temperature rise

Solution Cooperative Boosting

AMD Trinity APU

Page 18: New Rules: Scaling Performance for Extreme Scale Computing

SCHOOL OF ELECTRICAL AND COMPUTER ENGINEERING | GEORGIA INSTITUTE OF TECHNOLOGY 18

CPU Core Performance States: AMD Trinity APU

Page 19: New Rules: Scaling Performance for Extreme Scale Computing

SCHOOL OF ELECTRICAL AND COMPUTER ENGINEERING | GEORGIA INSTITUTE OF TECHNOLOGY 19

Performance Coupling Effects

Need to balance performance coupling and thermal coupling effects

Page 20: New Rules: Scaling Performance for Extreme Scale Computing

SCHOOL OF ELECTRICAL AND COMPUTER ENGINEERING | GEORGIA INSTITUTE OF TECHNOLOGY 20

Power Savings Using CB

Page 21: New Rules: Scaling Performance for Extreme Scale Computing

SCHOOL OF ELECTRICAL AND COMPUTER ENGINEERING | GEORGIA INSTITUTE OF TECHNOLOGY 21

Energy-Delay2

Page 22: New Rules: Scaling Performance for Extreme Scale Computing

SCHOOL OF ELECTRICAL AND COMPUTER ENGINEERING | GEORGIA INSTITUTE OF TECHNOLOGY 22

Related Projects

Page 23: New Rules: Scaling Performance for Extreme Scale Computing

SCHOOL OF ELECTRICAL AND COMPUTER ENGINEERING | GEORGIA INSTITUTE OF TECHNOLOGY 23

MIDAS

Probe Stations: Chip and Board Level Probe, Test, & Power

Measurement

FPPE

On-Demand Cooling

Advanced micro-fabrication for microfluidics

FPGA Implementation

Multi-scale Transient Models for Coupled Multi-physics

Emulating the Physics

Microarchitecture Optimization

Locally transient Adaptations

GPU

CPU core

CPU core

CPU core

CPU core

CPU core

CPU core

cache cache cache

Globally Transient Adaptations

Power Model

Functional simulator (Apps, OS)

Cycle-Level Multicore Timing Simulator

PDN Model

Thermal Model

MIDAS Tasks

Indicates existing infrastructures/models

 

System

Arch. Impa

ct

Power

Distr.

Network

 Novel

Cooling Technology

Co-Design

M. Bakir (ECE), H. Kim (SCS), Y. Joshi (ME), S. Mukhopadhyay (ECE),

S. Yalamanchili (ECE)

Page 24: New Rules: Scaling Performance for Extreme Scale Computing

SCHOOL OF ELECTRICAL AND COMPUTER ENGINEERING | GEORGIA INSTITUTE OF TECHNOLOGY 24

Reliability Enhancement - Microarchitecture

Peak temperature and MTTF analysis from J. Srinivasan et al.,“Lifetime Reliability: Toward An Architectural Solution,” Micro 2005.

64-core asymmetric chip multiprocessor layout and failure

probability distribution

x10-10In-order core Out-of-order core

25% peak-to-peak difference of failure distribution across the processor die; induced by architectural asymmetry, thermal coupling, power management, and workload characteristics

Single-core processor lifetime reliability Multicore processor lifetime reliability

S. Yalamanchili (ECE) and S. Mukhopadhyay (ECE)Sponsor: SRC (IBM)

Page 25: New Rules: Scaling Performance for Extreme Scale Computing

SCHOOL OF ELECTRICAL AND COMPUTER ENGINEERING | GEORGIA INSTITUTE OF TECHNOLOGY 25

Reliability Enhancement - Software

Key Idea: low level code injectionFramework:

On-demand Customizable Transparent

Examples: Alignment check Bounds check Control flow check

S. Yalamanchili (ECE) and K. Schwan (SCS)Sponsor: NSF

Page 26: New Rules: Scaling Performance for Extreme Scale Computing

SCHOOL OF ELECTRICAL AND COMPUTER ENGINEERING | GEORGIA INSTITUTE OF TECHNOLOGY 26

Adaptive 3D Architecture

Network

Many Core Processor

Memory

Memory

Memory

Memory

GPUCPU core

CPU core

CPU core

CPU core

CPU core

CPU core

cache cache cache

Reliability

Functional simulator (Apps, OS)

Cycle-Level Multicore Timing Simulator

PDN Model

Power & Thermal Model

Applications and Threads

workload (w(t))Manifold

1. Workload Characterization

2. Characterize Interacting Physical Effects

3. Explore Architecture Design Space

4. Microarchitectural Adaptation Mechanisms

S. Yalamanchili (ECE) and S. Mukhopadhyay (ECE)

Sponsor: SRC

Page 27: New Rules: Scaling Performance for Extreme Scale Computing

SCHOOL OF ELECTRICAL AND COMPUTER ENGINEERING | GEORGIA INSTITUTE OF TECHNOLOGY 27

Summary

Architecture

Applications

The physical phenomena should not be masked, but rather integrated into the design

We need tightly coupled integrated models of software, algorithms, architecture and technology

Technology

Page 28: New Rules: Scaling Performance for Extreme Scale Computing

SCHOOL OF ELECTRICAL AND COMPUTER ENGINEERING | GEORGIA INSTITUTE OF TECHNOLOGY

Thank You

Questions?

28