Amphisbaena: Modeling Two Orthogonal Ways to Hunt on Heterogeneous Many-cores
An analytical performance model for boosting performance

Jun Ma, Guihai Yan, Yinhe Han and Xiaowei Li

State Key Laboratory of Computer Architecture, Institute of Computing Technology, C.A.S.
Univ. of Chinese Academy of Sciences


Trends in Cloud Computing

The increasing computing demands
• More massive
• More diverse
• High service level agreement (response time, throughput)

The computing platform to meet these demands
• Multicore to manycore
• Homogeneous to heterogeneous


Two Orthogonal Ways to Boost Performance

• Scale-out speedup: exploit many cores for higher thread-level parallelism
• Scale-up speedup: exploit heterogeneous cores for optimal application-core mapping


Quantifying Scale-out and Scale-up Speedup

The overall performance

Type     Issue Width   ROB Size
Core-A   4             64
Core-B   6             96
Core-C   8             128

The two speedups indicate how to improve the overall performance of each application.

How to figure out the application-specific scale-out and scale-up speedup?


Amphisbaena: an Analytical Approach to Model Performance

Amphisbaena, or Φ for short, models the overall performance speedup coming from two orthogonal ways: Φ = α × β.

β (scale-up speedup): the ratio of performance on the target cores to the current cores under the same multithreading configuration.

α (scale-out speedup): the ratio of performance on the target multithreading configuration to the current configuration on the same type of cores.
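
The two ratios can be illustrated with a small numeric sketch. Here α denotes the scale-out ratio, β the scale-up ratio and Φ the overall speedup, taken as their product under the orthogonality assumption. The function names and throughput numbers are hypothetical and serve only as an illustration.

```python
def scale_out_speedup(perf_target_threads, perf_current_threads):
    """alpha: same core type, target vs. current multithreading configuration."""
    return perf_target_threads / perf_current_threads


def scale_up_speedup(perf_target_cores, perf_current_cores):
    """beta: same multithreading configuration, target vs. current core type."""
    return perf_target_cores / perf_current_cores


def overall_speedup(alpha, beta):
    """Phi: product of the two speedups, assuming they are orthogonal."""
    return alpha * beta


# Hypothetical throughput values (made up for this example).
perf_4t_core_a = 2.0  # current configuration: 4 threads on Core-A
perf_8t_core_a = 3.0  # more threads, same core type
perf_4t_core_c = 2.5  # same thread count, bigger cores

alpha = scale_out_speedup(perf_8t_core_a, perf_4t_core_a)
beta = scale_up_speedup(perf_4t_core_c, perf_4t_core_a)
phi = overall_speedup(alpha, beta)
print(alpha, beta, phi)
```

If the two effects were not orthogonal, the measured speedup of "8 threads on Core-C" would deviate from this product.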


Experimental Setup

• cluster-based layout
• distributed, banked LLC
• directory-based MOESI protocol


Scale-out Speedup

Terms of the model:
– the serial part.
– the parallelizable part.
– the multithreading penalty.


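Observation

Penalty term that grows with the thread number:
– modulating constant.
– synchronization waiting cycles per kilo-instructions (SPKI).
– thread number.

Penalty term that grows with the thread number squared:
– modulating constant.
– misses waiting cycles per kilo-instructions (MPKI).
– thread number squared.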


The Details of Multithreading Penalty

Coefficient   Value         Implementation
a0            1.837e-003    constant
a1            0.05312       constant
a2            -2.025e-005   constant
k0            bias          redundant computations
k1            SPKI          bottleneck-identifying instructions
k2            MPKI          built-in performance counters

a0, a1, a2: offline
k0, k1, k2: online
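
One way to read the coefficient table is as a polynomial in the thread number n. The sketch below assumes the penalty is a0·k0 + a1·k1·n + a2·k2·n² and that it is added to an Amdahl-style execution time (serial + parallel/n). Both forms are assumptions inferred from the term descriptions on the slides, not formulas given in the transcript. The k values in the usage lines are made up.

```python
A0, A1, A2 = 1.837e-3, 0.05312, -2.025e-5  # constants from the table


def multithreading_penalty(n, k0, k1, k2):
    """Assumed form: bias term + SPKI term linear in n + MPKI term quadratic in n."""
    return A0 * k0 + A1 * k1 * n + A2 * k2 * n ** 2


def amdahl_speedup(n, serial, parallel):
    """Classic Amdahl's Law speedup, with no multithreading penalty."""
    return (serial + parallel) / (serial + parallel / n)


def scale_out_speedup(n, serial, parallel, k0, k1, k2):
    """Assumed extension: the penalty is added to the n-thread execution time."""
    t1 = serial + parallel
    tn = serial + parallel / n + multithreading_penalty(n, k0, k1, k2)
    return t1 / tn


# Hypothetical application: 10% serial work, made-up k0, SPKI and MPKI values.
ideal = amdahl_speedup(8, serial=0.1, parallel=0.9)
with_penalty = scale_out_speedup(8, 0.1, 0.9, k0=1.0, k1=0.1, k2=0.5)
print(ideal, with_penalty)
```

With all k values set to zero the sketch reduces to Amdahl's Law, which is the baseline the slides compare against.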


Alpha Model Accuracy

benchmarks    12
phases        50
threads       33 (1,2,4,6…64)
total space   633600
samples       600

Our error is under 5% on average, compared with an error of 11.4% for Amdahl's Law.


Scale-up Speedup

The frontend: issue width
• W ∈ {Big, Small}

The backend: ROB size
• R ∈ {Big, Small}

How to predict the CPI on various types of cores?

[Figure: four core types, C0 to C3, one per combination of big (B) or small (S) issue width and ROB size]


Observation

• This trend is well approximated by a power law.
• This trend fits an exponential function well.


The Details of CPI Model

Coefficient   Value                    Implementation
b0            0.2837                   constant
b1            1.1675                   constant
b2            1.8427                   constant
r             bias                     b0×CPIbase
s             memory intensity         CPImem/CPI
t             computing intensity      CPIbase/CPI
CPImem        penalty with stalls      CPI stack calculation
CPIbase       penalty without stalls   CPI stack calculation

s – memory intensity.
t – computing intensity.
r – bias.

b0, b1, b2: offline
r, s, t: online
CPImem, CPIbase: online
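
The online inputs in the table follow directly from a CPI stack. The sketch below computes r, s and t from CPIbase and CPImem using the definitions in the table. It assumes the total CPI is the sum of the two components, which the transcript does not state. The function name and sample values are hypothetical. How these inputs combine with b1 and b2 is not recoverable from the transcript, so that step is left out.

```python
B0 = 0.2837  # constant b0 from the table


def cpi_model_inputs(cpi_base, cpi_mem):
    """Derive r, s and t from a two-component CPI stack.

    Assumes total CPI = cpi_base + cpi_mem (an assumption of this sketch).
    """
    cpi = cpi_base + cpi_mem
    return {
        "r": B0 * cpi_base,   # bias
        "s": cpi_mem / cpi,   # memory intensity
        "t": cpi_base / cpi,  # computing intensity
    }


# Hypothetical CPI stack measured on the current core.
inputs = cpi_model_inputs(cpi_base=0.8, cpi_mem=1.2)
print(inputs)
```

Under the summation assumption, s and t always add up to 1, so an application is either more memory-intensive or more compute-intensive.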


Beta Model Accuracy

benchmarks    12
phases        50
core types    6
total space   18000
samples       600

Our error is kept below 8% on average, compared with an error of 12.2% for PIE.


Phi Model Accuracy

benchmarks    12
phases        50
threads       33 (1,2,4,6…64)
core types    6
total space   633600×18000
samples       1080

The prediction error of overall performance is kept below 12% on average.


Orthogonality Validation

Orthogonality: Φ_m − α_m × β_m ≈ 0

benchmarks    12
phases        50
threads       33 (1,2,4,6…64)
core types    6
total space   633600×18000
measured      2268

Φ_m, α_m, β_m: three measured values.

For most applications, the orthogonality error is below 5% on average.
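
A sketch of how such an error could be computed from the three measured values follows. It writes phi_m for the measured overall speedup, alpha_m for the measured scale-out speedup and beta_m for the measured scale-up speedup. The relative-error definition is an assumption, since the transcript does not give the exact metric, and the sample values are made up.

```python
def orthogonality_error(phi_m, alpha_m, beta_m):
    """Relative gap between the measured overall speedup and alpha_m * beta_m."""
    return abs(phi_m - alpha_m * beta_m) / phi_m


# Hypothetical measurements for one application phase.
err = orthogonality_error(phi_m=1.8, alpha_m=1.5, beta_m=1.25)
print(f"{err:.1%}")
```

An error of zero means the two speedups compose exactly as a product.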


Application of Phi Model

Using Phi for runtime management
• Predict online the performance speedup coming from scale-out and scale-up on any other target configuration.
• Invoke the scheduling algorithm to figure out the optimal configuration in terms of maximizing performance.
• The operating system enables the specified multithreading and application-core mapping.
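
For a single application, the first two steps can be sketched as a search over predicted speedups. The code below uses an exhaustive search as a stand-in for the scheduling algorithm and takes the overall speedup as the product of the scale-out and scale-up predictions. The function name, the dictionaries and their values are hypothetical.

```python
from itertools import product


def best_configuration(alpha_by_threads, beta_by_core):
    """Return (threads, core_type, phi) maximizing predicted phi = alpha * beta."""
    best = None
    for (threads, alpha), (core, beta) in product(
        alpha_by_threads.items(), beta_by_core.items()
    ):
        phi = alpha * beta
        if best is None or phi > best[2]:
            best = (threads, core, phi)
    return best


# Hypothetical predictions relative to the current configuration.
alpha_by_threads = {4: 1.0, 8: 1.6, 16: 1.4}
beta_by_core = {"Core-A": 1.0, "Core-B": 1.2, "Core-C": 1.3}
choice = best_configuration(alpha_by_threads, beta_by_core)
print(choice)
```

Because the two predictions are treated as independent, only |threads| + |core types| predictions are needed to rank |threads| × |core types| configurations.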


Phi Scheduling

Dout
• Function: "decide the number of threads to spawn for each application."
• Policy: "an application with a higher scale-out speedup should spawn more threads."

Dup
• Function: "decide the cores to map for each application."
• Policy: "the application with the largest scale-up speedup is allocated the fastest type of cores."

Phi
• Algorithm: "Phi scheduling uses a heuristic algorithm to maximize performance."
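
The two policies can be sketched for several applications at once. The proportional thread split and the one-core-type-per-application mapping below are assumptions made for illustration; the transcript does not describe the actual heuristic. The function names mirror the slide labels, and the application names and speedups are hypothetical.

```python
def dout(alpha, total_threads):
    """Split a thread budget in proportion to each app's scale-out speedup.

    Every app gets at least one thread, so the budget can be slightly exceeded.
    """
    total = sum(alpha.values())
    return {app: max(1, int(total_threads * a / total)) for app, a in alpha.items()}


def dup(beta, core_types_fast_to_slow):
    """Give faster core types to apps with a larger scale-up speedup.

    Assumes one core type per app; apps beyond the list length get no entry.
    """
    ranked = sorted(beta, key=beta.get, reverse=True)
    return dict(zip(ranked, core_types_fast_to_slow))


# Hypothetical predicted speedups for two applications.
threads = dout({"app1": 3.0, "app2": 1.0}, total_threads=16)
cores = dup({"app1": 1.1, "app2": 1.4}, ["Core-C", "Core-A"])
print(threads, cores)
```

In this example app1 scales out better and receives more threads, while app2 benefits more from bigger cores and is mapped to Core-C.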


Performance Comparison

Baseline   Scale-out             Scale-up
Bias       Dout                  memory-related samples
PIE        Dout                  PIE model
Static     fixed thread number   Dup
Phi        Dout                  Dup

On average, Phi outperforms the other three baselines by 12.2% (Static), 13.3% (Bias) and 12.9% (PIE).


Related Works

Periodic performance prediction and optimization

Decide only the number of threads/active cores
• CPR: Composable Performance Regression for Scalable Multiprocessor Models
  – [Benjamin C. Lee et al., MICRO 2008]
• FDT: Feedback-Driven Threading: Power-Efficient and High-Performance Execution of Multi-threaded Workloads on CMPs
  – [M. Aater Suleman et al., ASPLOS 2008]

Decide only the type of heterogeneous cores
• Single-ISA Heterogeneous Multi-core Architectures for Multithreaded Workload Performance
  – [Rakesh Kumar et al., ISCA 2004]
• Scheduling Heterogeneous Multi-cores Through Performance Impact Estimation (PIE)
  – [Kenzo Van Craeynest et al., ISCA 2012]


Conclusion

Analytical model for performance prediction
• Scale-out speedup
• Scale-up speedup
• Overall performance

Phi scheduling
• Applies to runtime management
• Returns optimal performance


Thanks for Your Attention

Q&A