System Simulation Of 1000-cores Heterogeneous SoCs

Shivani Raghav
Embedded System Laboratory (ESL), Ecole Polytechnique Federale de Lausanne (EPFL)


Post on 22-Feb-2016



Page 1: System Simulation Of 1000-cores Heterogeneous  SoCs

System Simulation Of 1000-cores Heterogeneous SoCs 

Shivani Raghav

Embedded System Laboratory (ESL)

Ecole Polytechnique Federale de Lausanne (EPFL)

Page 2: System Simulation Of 1000-cores Heterogeneous  SoCs

ESL Work on Energy-Aware Datacenter Design

[Figure: overview diagram. Labels: System Simulation for many-core (SIMinG-1k); Datacenter infrastructure; PMSM: Power/Therm. Manager; New server cooling tech.; communic. network; Internet; Grid; IPS; Load profile 1, 2 and 3; Load profile 1 to Load profile N; Price profile 1 to Price profile N.]

Page 3: System Simulation Of 1000-cores Heterogeneous  SoCs

Emerging Data-Intensive Workloads

Cloud Servers

Molecular Dynamics

Monte Carlo Simulations

Gene Sequencing

Online Gaming Services

Financial Simulations

Medical Imaging

Page 4: System Simulation Of 1000-cores Heterogeneous  SoCs

Demand for Hardware Acceleration

Tile-based Manycores: Intel SCC, Tile 64 (integrated)

GPU Clusters (off-chip accelerators)

Hybrid Cores: AMD Fusion (on-chip)

Page 5: System Simulation Of 1000-cores Heterogeneous  SoCs

Urgent Need for Simulation of Heterogeneous SoCs

Simulation is needed for:

• Thermal & Power Evaluations
• Benchmarking
• Profiling
• Debugging
• Design Space Exploration
• Early Software Development

Page 6: System Simulation Of 1000-cores Heterogeneous  SoCs

How to Design a Fast and Scalable Many-Core Simulator?

Parallel Target

Parallel Simulator

Parallel Host

Page 7: System Simulation Of 1000-cores Heterogeneous  SoCs

Simulating Parallel Target on Parallel Host is an Old Technology…

[Figure: Labels: FPGA; GPGPU; Flexus; RAMP; Opportunity; WWT II; Graphite; Cotson; OVPSim; Large Parallel Systems.]

Page 8: System Simulation Of 1000-cores Heterogeneous  SoCs

Target Architecture

Data-Parallel Coprocessors

• Simple in-order cores
• 1000s of cores in a tile network
• Fine-grain parallelism

[Figure: tile diagram. Labels: Core; Caches; Memory; Switch.]

Page 9: System Simulation Of 1000-cores Heterogeneous  SoCs

Solution – Accelerating Simulation using GPGPUs

Target Architecture

Host Platform

A Perfect Match

Page 10: System Simulation Of 1000-cores Heterogeneous  SoCs

Outline

• Problem Overview
  Simulation of Heterogeneous SoCs
• Solution
  SIMinG-1k: A GPU-accelerated simulator
• Evaluation
• Summary

Page 11: System Simulation Of 1000-cores Heterogeneous  SoCs

Overall Simulation Framework

[Figure: overall simulation framework. Labels: Application; Sequential Code; Data Parallel Code; Target Architecture; General Purpose CPU; Many-Core Accelerator; Simulator SIMinG-1k; Host Platform.]

Page 12: System Simulation Of 1000-cores Heterogeneous  SoCs

SIMinG-1k - Features

• Instruction Accurate

• Inexpensive and Easily Available

• Fast Development Cycle

• Equation Performance Model 

• Portability (Target Independent)

• Interpretation based core-simulation
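The "instruction accurate" and "interpretation based" bullets can be illustrated with a minimal fetch-decode-execute loop. This is only a sketch under stated assumptions: the three-opcode ISA, the `run_core` function, and the `PROGRAM` listing are hypothetical and are not the ARM ISA or the SIMinG-1k implementation described on the slides.

```python
# Toy, hypothetical 3-opcode ISA used only for illustration.
def run_core(program, max_steps=10_000):
    """Interpret `program` on one simulated core.

    Returns (registers, retired), where `retired` is the number of
    instructions executed, the quantity an instruction-accurate
    simulator tracks.
    """
    regs = [0] * 4
    pc = 0
    retired = 0
    while pc < len(program) and retired < max_steps:
        op, a, b = program[pc]          # fetch and decode
        if op == "li":                  # regs[a] = immediate b
            regs[a] = b
            pc += 1
        elif op == "add":               # regs[a] += regs[b]
            regs[a] += regs[b]
            pc += 1
        elif op == "bnez":              # jump to index b if regs[a] != 0
            pc = b if regs[a] != 0 else pc + 1
        else:
            raise ValueError(f"unknown opcode: {op}")
        retired += 1
    return regs, retired


# Countdown loop that accumulates 3 + 2 + 1 into r0.
PROGRAM = [
    ("li", 0, 0),     # r0 = 0 (accumulator)
    ("li", 1, 3),     # r1 = 3 (counter)
    ("li", 2, -1),    # r2 = -1 (decrement)
    ("add", 0, 1),    # r0 += r1
    ("add", 1, 2),    # r1 += r2
    ("bnez", 1, 3),   # repeat while r1 != 0
]

regs, retired = run_core(PROGRAM)
```

An interpreter of this shape keeps no translated code cache, which is one plausible reason it ports easily between targets; the trade-off is a dispatch cost on every simulated instruction.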

Page 13: System Simulation Of 1000-cores Heterogeneous  SoCs

Challenges of using GPU as a host

• SIMT (single instruction, multiple threads)
• Divergent code is a problem
• Synchronization outside thread block
• Slow CPU-GPU communication
• Global memory is slow and limited
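Why divergent code hurts a SIMT host can be sketched with a simple cost model. The model below is an assumption made for illustration, not a measurement from the slides: threads in one lockstep group that interpret the same opcode share a single pass, and every additional distinct opcode adds one serialized pass. The group size of 32 and the names `simt_passes` and `lane_utilization` are likewise hypothetical.

```python
def simt_passes(opcodes_per_thread):
    """Serialized passes for one lockstep step under the simple model:
    one pass per distinct opcode present in the group."""
    return len(set(opcodes_per_thread))


def lane_utilization(opcodes_per_thread):
    """Fraction of lane-slots doing useful work across all passes."""
    passes = simt_passes(opcodes_per_thread)
    lanes = len(opcodes_per_thread)
    return lanes / (passes * lanes)


# All 32 simulated cores execute the same opcode: a single pass.
uniform = ["add"] * 32

# The simulated cores disagree: three code paths are serialized.
divergent = ["add"] * 16 + ["load"] * 8 + ["branch"] * 8

uniform_passes = simt_passes(uniform)
divergent_passes = simt_passes(divergent)
divergent_utilization = lane_utilization(divergent)
```

Under this model a group whose simulated cores hit three different opcodes keeps only a third of its lane-slots busy, which is why an interpreter loop with a large opcode switch is a hard fit for a GPU host.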

Page 14: System Simulation Of 1000-cores Heterogeneous  SoCs

Outline

• Problem Overview
  Simulation of Heterogeneous SoCs
• Solution
  SIMinG-1k (GPU accelerated simulator)
• Evaluation
• Summary

Page 15: System Simulation Of 1000-cores Heterogeneous  SoCs

Results – Architecture 1

MIPS: millions of simulated instructions per second of host wall-clock time

Single tile of target accelerator. Labels: ARM ISA; Inst Scratchpad; Data Scratchpad.

[Figure: S-MIPS (y-axis, 0 to 700) versus Number of Simulated Cores (x-axis, 128 to 4096) for benchmarks MM, NCC, IDCT, EPCC, DQ, FFT, SYNC1.]
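The simulated-MIPS metric plotted on this slide can be computed as in the sketch below. It assumes the metric aggregates instructions over all simulated cores, which the slide does not state explicitly; the function name `simulated_mips` and the example numbers are hypothetical and are not results from the talk.

```python
def simulated_mips(instructions_per_core, num_cores, wall_seconds):
    """Millions of simulated instructions per second of host wall-clock time.

    Assumption: instructions are summed over all simulated cores.
    """
    if wall_seconds <= 0:
        raise ValueError("wall_seconds must be positive")
    total_instructions = instructions_per_core * num_cores
    return total_instructions / wall_seconds / 1e6


# Hypothetical run: 1024 simulated cores, 2 million instructions each,
# finished in 4 seconds of host wall-clock time.
example_mips = simulated_mips(2_000_000, 1024, 4.0)
```

Because the numerator grows with the number of simulated cores, the metric rises with core count as long as host wall-clock time grows more slowly than the simulated work.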

Page 16: System Simulation Of 1000-cores Heterogeneous  SoCs

Speed Up – Architecture 1

[Figure: Matrix Multiply. MIPS (y-axis, 0 to 1000) versus # Simulated Cores (x-axis, 32 to 4096) for SIMinG-1k and OVP.]

Speedup compared to simulation on OVPSim (thousands of ARM cores)

Page 17: System Simulation Of 1000-cores Heterogeneous  SoCs

Results – Architecture 2

Single tile of data-parallel accelerator (cores, caches, on-chip interconnect). Labels: Core; Caches; Memory; Switch.

[Figure: S-MIPS (y-axis, 0 to 50) versus Number of Simulated Cores (x-axis, 128 to 4096) for benchmarks NCC, MM, IDCT, DQ, FFT, EPCC, SYNC1. Values printed on the plot: 0.180, 0.077, 0.026, 0.006, 0.002, 0.001.]

Page 18: System Simulation Of 1000-cores Heterogeneous  SoCs

Speed Up – Architecture 2

Speedup compared to serial simulation on QEMU

Page 19: System Simulation Of 1000-cores Heterogeneous  SoCs

Outline

• Problem Overview
  Simulation of Heterogeneous SoCs
• Solution
  SIMinG-1k (GPU accelerated simulator)
• Evaluation
• Summary

Page 20: System Simulation Of 1000-cores Heterogeneous  SoCs

Conclusion

• Challenge: Fast and parallel simulator for heterogeneous SoCs
• Solution: Parallelize 1000-core simulation using GPUs
• Design: Full system simulation using QEMU and SIMinG-1k
• Results: High scalability and speedup up to 4096 cores

Future Work

• Extend the simulator for thermal and power evaluations
• Complete simulation of cloud data centers

Page 21: System Simulation Of 1000-cores Heterogeneous  SoCs

Thanks!

Questions?