Scalability-Based Manycore Partitioning Hiroshi Sasaki Kyushu University Koji Inoue Kyushu University Teruo Tanimoto The University of Tokyo Hiroshi Nakamura The University of Tokyo PACT 2012 Presented by Kim, Jong-yul 2013. 7. 31



Contents

• Motivation
• SBMP Scheduler
  • Scalability Prediction
  • Core Partition
  • Core Donation
  • Phase Change Detection
• Evaluation Results
• Conclusions

Prospects

• Limits on increasing clock frequency (F)
  • ILP, power wall, transistor scaling
• Multi-core, many-core system

[Figure: a system running APP1, APP2, and APP3 together (multi-threaded multiprogramming)]

Problem

• Traditional OS: assigns equal CPU to all running apps
• Programs have different scalability

Normalized Turnaround Time = (clock cycles when multiprogrammed with others) / (clock cycles when solo-run)

[Figure: performance per workload and on average. Average: Linux 2.04, Best Partitioning 1.38]
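The normalized turnaround time ratio defined on the Problem slide can be computed directly from cycle counts. The following is a minimal sketch; the cycle counts are made-up numbers, and the use of an arithmetic mean for the workload average is an assumption, since the slides do not say how the average is taken.

```python
def normalized_turnaround_time(multiprogrammed_cycles, solo_cycles):
    """Ratio of cycles when co-run with other programs to cycles when run alone."""
    return multiprogrammed_cycles / solo_cycles


def average_ntt(pairs):
    """Arithmetic mean of per-program ratios (assumed averaging method)."""
    ratios = [normalized_turnaround_time(m, s) for m, s in pairs]
    return sum(ratios) / len(ratios)


# Hypothetical (multiprogrammed, solo) cycle counts for two programs.
workload = [(3.0e9, 2.0e9), (5.0e9, 2.5e9)]
print(average_ntt(workload))
```

A value of 1.0 would mean a program runs as fast as it does alone; larger values mean a longer turnaround.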

Experimental System

[Figure: experimental system diagram with the allocation unit marked]

SBMP Scheduler

• Scalability Prediction
• Core Partitioning
• Core Donation
• Phase Change Detection

Overview

• Assign cores considering scalability of applications

• SBMP: Scalability-Based Manycore Partitioning scheduler

[Figure: SBMP scheduler flow, with stages labeled Partitioning (Scalability Prediction, Core Partitioning) and Steady (Core Donation, Detect)]


Scalability Prediction (1/2)

• Performance metric: cumulative retired instructions per second (IPS)
  • Total # of instructions: little effect from # of cores

[Figure: total # of instructions per workload; annotated with 8%]

Scalability Prediction (2/2)

• If obtained directly…
  • Warm up branch prediction & cache system
  • Need 8 allocations (6, 12, 18, …, 48)
• Simple model
  • 3 coefficients (α, β, γ)
  • 3 samplings: 1 single core + 2 different configurations

[Equation: model with terms labeled "Performance", "Amdahl's law", and "Overhead caused by additional core"]

[Annotation: "Over 3 seconds"]
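To illustrate how three samples can pin down a three-coefficient model, here is a minimal sketch. The functional form is an assumption and is not taken from the slides (the transcript does not include the formula): time per unit of work is modeled as T(n) = a + b/n + c·(n − 1), an Amdahl-style serial and parallel part plus a per-core overhead, with performance = 1/T(n). The names `fit_time_model`, `predict_performance`, and the coefficients a, b, c are hypothetical stand-ins for the slide's α, β, γ.

```python
def fit_time_model(samples):
    """Fit T(n) = a + b/n + c*(n-1) from three (cores, performance) samples.

    Assumed model form, not the paper's. T is taken as 1/performance, which
    makes the model linear in (a, b, c), so three samples give a 3x3 system.
    """
    # Augmented matrix: one row [1, 1/n, n-1 | T] per sample.
    rows = [[1.0, 1.0 / n, float(n - 1), 1.0 / perf] for n, perf in samples]
    # Gauss-Jordan elimination with partial pivoting.
    for col in range(3):
        pivot = max(range(col, 3), key=lambda r: abs(rows[r][col]))
        rows[col], rows[pivot] = rows[pivot], rows[col]
        p = rows[col][col]
        rows[col] = [v / p for v in rows[col]]
        for r in range(3):
            if r != col:
                f = rows[r][col]
                rows[r] = [v - f * w for v, w in zip(rows[r], rows[col])]
    return tuple(row[3] for row in rows)


def predict_performance(coeffs, n):
    """Predicted performance with n cores under the assumed model."""
    a, b, c = coeffs
    return 1.0 / (a + b / n + c * (n - 1))


# Hypothetical samples: one single-core run plus two other configurations.
samples = [(1, 1.0), (6, 1 / 0.255), (12, 1 / 0.186)]
coeffs = fit_time_model(samples)
# Fill in predictions for the allocations that were not measured.
table = {n: predict_performance(coeffs, n) for n in range(6, 49, 6)}
```

The point of the sketch is only the sampling cost: three measurements replace the eight that direct measurement of every allocation would need.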


Core Partitioning (1/2)

[Figure: relative performance vs. # of cores, with curves labeled High, Medium, and Low]

Core Partitioning (2/2)

• Scalability-table for each program
  • Key-value
    • Key: # of cores
    • Value: performance with [key] cores
• Goal
  [Formula: terms labeled "Single-run" and "Multiprogrammed"]
• Hill climbing algorithm: near optimal assignment
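A hill-climbing search over per-program scalability tables could look like the sketch below. It is not the authors' implementation. The objective (mean slowdown relative to owning every allocation unit) is an assumption, because the goal formula is not in the transcript, and `tables`, `total_units`, and `hill_climb` are hypothetical names. Each table maps a number of allocation units to predicted performance.

```python
def hill_climb(tables, total_units):
    """Search for an allocation by moving one unit at a time between programs.

    tables: one dict per program, mapping units (1..total_units) to performance.
    Returns (allocation, cost). The cost function is an assumed objective.
    """
    n = len(tables)
    # Start from an arbitrary solution: a near-equal split.
    alloc = [total_units // n] * n
    for i in range(total_units % n):
        alloc[i] += 1

    def cost(a):
        # Mean slowdown of each program versus holding all units (assumption).
        return sum(t[total_units] / t[k] for t, k in zip(tables, a)) / n

    best = cost(alloc)
    improved = True
    while improved:
        improved = False
        for src in range(n):
            for dst in range(n):
                # Every program keeps at least one unit.
                if src == dst or alloc[src] <= 1:
                    continue
                cand = alloc[:]
                cand[src] -= 1
                cand[dst] += 1
                c = cost(cand)
                if c < best - 1e-12:  # accept only strict improvements
                    alloc, best, improved = cand, c, True
    return alloc, best


# Hypothetical tables: one program scales linearly, the other not at all.
linear = {k: float(k) for k in range(1, 5)}
flat = {k: 1.0 for k in range(1, 5)}
allocation, score = hill_climb([linear, flat], total_units=4)
```

Because only strict improvements are accepted, the search stops at a local optimum, which is why the result is near optimal rather than guaranteed optimal.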


Core Donation

• 1 program for each processor die
• CPU utilization
• Donor: CPU utilization ratio < threshold (70%)
• Donee: most beneficial one
  • Utilization, scalability
• Priority: Donee < Donor
• Finer granularity
  • Processor die (6 cores)

[Figure: timeline for Core1 and Core2 showing Program1 as donor and Program2 as donee]
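The donor and donee selection on the Core Donation slide can be sketched as follows. Only the 70% threshold comes from the slide. The benefit score (utilization times a predicted gain) is an assumption standing in for "most beneficial one", and `Program`, `marginal_gain`, and `plan_donations` are hypothetical names.

```python
from dataclasses import dataclass

THRESHOLD = 0.70  # donor threshold stated on the slide


@dataclass
class Program:
    name: str
    utilization: float    # CPU utilization ratio in [0, 1]
    marginal_gain: float  # hypothetical: predicted speedup from extra cores


def plan_donations(programs):
    """Pair each under-utilizing program (donor) with the chosen donee."""
    donors = [p for p in programs if p.utilization < THRESHOLD]
    candidates = [p for p in programs if p.utilization >= THRESHOLD]
    if not donors or not candidates:
        return []
    # Assumed benefit score combining utilization and scalability.
    donee = max(candidates, key=lambda p: p.utilization * p.marginal_gain)
    return [(d.name, donee.name) for d in donors]


programs = [
    Program("A", 0.40, 1.1),
    Program("B", 0.95, 1.5),
    Program("C", 0.90, 1.2),
    Program("D", 0.50, 2.0),
]
pairs = plan_donations(programs)
```

The sketch only chooses the pairs. Running the donee's threads at lower priority than the donor's, as the slide specifies, would be handled by the scheduler itself.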


Detection (1/2)

1. Creation or termination of program
2. Phase transition detected in any of the programs

[Figure: performance plot]

Detection (2/2) – Phase Prediction

• SBMP scheduler monitors performance every epoch (2.5s)

• Threshold ( > or <
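Per-epoch monitoring with an upper and a lower threshold might be sketched like this. The threshold values are cut off in the transcript, so `rel_threshold` is a hypothetical parameter, and comparing each epoch with the previous one is an assumption about how the change is measured.

```python
def detect_phase_changes(ips_per_epoch, rel_threshold):
    """Return indices of epochs whose performance jumps or drops sharply.

    ips_per_epoch: performance sampled once per epoch (2.5 s on the slide).
    rel_threshold: hypothetical relative change that counts as a phase change.
    """
    changes = []
    for i in range(1, len(ips_per_epoch)):
        prev, cur = ips_per_epoch[i - 1], ips_per_epoch[i]
        if prev <= 0:
            continue  # no meaningful ratio without a positive baseline
        ratio = cur / prev
        if ratio > 1 + rel_threshold or ratio < 1 - rel_threshold:
            changes.append(i)
    return changes


# Hypothetical samples: a jump at epoch 3 and a drop at epoch 5.
epochs = [100, 102, 99, 150, 148, 90]
flagged = detect_phase_changes(epochs, rel_threshold=0.2)
```

Each flagged epoch would be a point where the scheduler goes back to scalability prediction and core partitioning.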


Evaluation

• Core Partitioning
• Phase Prediction
• Core Donation
• Overall Performance

Experimental System

• PARSEC benchmark suite 2.1

Processor              4 × AMD Opteron 6172
# of dies / processor  2
# of cores / die       6
Total # of cores       48
L3 cache size          12 MB / socket
Main memory            96 GB DDR3 PC3-10600
Linux kernel           2.6.37.6

Core Partitioning

• SBMP-base
  • Scalability Prediction + Core Partitioning
• Single-phase application (2 Medium + 2 Low)

[Figure: performance per workload and average. Average: Linux 1.88, SBMP-base 1.54]

Phase Prediction

• SBMP-PP (Phase Prediction)
  • SBMP-base + Phase Prediction
• Multiple-phase application

[Figure: performance per workload. Average: Linux 1.89, SBMP-base 2.09, SBMP-PP 1.77]

Core Donation

• SBMP-CD (Core Donation)
  • SBMP-PP + Core Donation
• 2 low CPU utilization + 2 normal

[Figure: performance per workload. Average: Linux 2.06, SBMP-PP 1.68, SBMP-CD 1.60]

Overall Results

• All programs

[Figure: 72 workloads. Average: Linux 1.83, SBMP-base 1.99, SBMP-PP 1.70 (8%), SBMP-CD 1.65 (11%)]
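The percentages quoted with the overall results appear to match the ratio of the Linux average to each SBMP average. The slides do not state the formula, so the check below assumes that reading, and `improvement_percent` is a hypothetical helper name.

```python
def improvement_percent(baseline, value):
    """Improvement of `value` over `baseline`, where lower is better."""
    return 100.0 * (baseline / value - 1.0)


linux, sbmp_pp, sbmp_cd = 1.83, 1.70, 1.65  # averages from the slide
print(round(improvement_percent(linux, sbmp_pp)))
print(round(improvement_percent(linux, sbmp_cd)))
```

Because the slide's averages are already rounded to two decimals, the recomputed figures are only expected to agree after rounding to whole percentages.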

Conclusions

• OS scheduling on many-core system
  • Multiple multi-threaded applications
• SBMP Scheduler
  • Dynamic scalability prediction + Core partitioning
  • Phase recognition
  • Core Donation
• 11% over Linux

Q&A

Hill Climbing Algorithm

• Find near optimal solution
• Start with arbitrary solution
• Incrementally changing a single element