Computation Spreading: Employing Hardware Migration to Specialize
CMP Cores On-the-fly
Koushik Chakraborty, Philip Wells, Gurindar Sohi
{kchak,pwells,sohi}@cs.wisc.edu
Chakraborty, Wells, and Sohi ASPLOS 2006 2
Paper Overview
Multiprocessor Code Reuse
• Poor resource utilization
Computation Spreading
• New model for assigning computation within a program on CMP cores in H/W
• Case Study: OS and User computation
Investigate performance characteristics
Talk Outline
Motivation
Computation Spreading (CSP)
Case study: OS and User computation
Implementation
Results
Related Work and Summary
Homogeneous CMP
Many existing systems are homogeneous
• Sun Niagara, IBM Power 5, Intel Xeon MP
Multithreaded server application
• Composed of server threads
• Typically each thread handles a client request
• OS assigns software threads to cores
• The entire computation from one thread executes on a single core (barring migration)
Code Reuse
Many client requests are similar
• Similar service across multiple threads
• Same code path traversed in multiple cores
Instruction footprint classification
• Exclusive – single core access
• Common – many cores access
• Universal – all cores access
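The three footprint classes can be sketched in code. This is a minimal illustration, not the authors' measurement tooling: the function name `classify_footprint`, the trace format of `(core_id, block_address)` pairs, and the reading of "many cores" as "more than one but not all" are all assumptions made for the example.

```python
from collections import defaultdict


def classify_footprint(accesses, num_cores):
    """Classify instruction blocks by how many cores fetched them.

    accesses: iterable of (core_id, block_address) pairs (hypothetical trace format).
    num_cores: total number of cores in the CMP.
    """
    cores_per_block = defaultdict(set)
    for core, block in accesses:
        cores_per_block[block].add(core)

    classes = {}
    for block, cores in cores_per_block.items():
        if len(cores) == 1:
            classes[block] = "exclusive"   # a single core accessed the block
        elif len(cores) == num_cores:
            classes[block] = "universal"   # every core accessed the block
        else:
            # Assumption: "common" means more than one core but not all of them.
            classes[block] = "common"
    return classes


trace = [(0, 0x100), (1, 0x100), (2, 0x100),   # fetched by all three cores
         (0, 0x140), (1, 0x140),               # fetched by two of three cores
         (2, 0x180)]                           # fetched by one core only
print(classify_footprint(trace, num_cores=3))
```

With a real trace, the share of blocks in each class would give a code-reuse breakdown of the kind the next slide plots.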
Multiprocessor Code Reuse
Implications
Lack of instruction stream specialization
• Redundancy in predictive structures
  • Poor capacity utilization
• Destructive interference
No synergy among multiple cores
• Lost opportunity for co-operation
• Exploit core proximity in CMP
Computation Spreading (CSP)
Computation fragment = dynamic instruction stream portion
Collocate similar computation fragments from multiple threads
• Enhance constructive interference
Distribute dissimilar computation fragments from a single thread
• Reduce destructive interference
Reassignment is the key
Example
[Figure: computation fragments A1, B1, C1, B2, C2, A2, C3, A3, B3 from threads T1, T2, T3, placed over time on cores P1, P2, P3 under CANONICAL assignment and under CSP assignment]
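The contrast between canonical and CSP assignment can be sketched as follows. This is an illustrative model only: the per-thread fragment order (T1: A1 B1 C1, T2: B2 C2 A2, T3: C3 A3 B3) is an assumption read off the figure labels, the functions `canonical` and `spread` are hypothetical names, and the letter of each fragment stands in for "similar computation".

```python
# Assumed fragment order per thread (read off the figure labels, not confirmed).
THREADS = {
    "T1": ["A1", "B1", "C1"],
    "T2": ["B2", "C2", "A2"],
    "T3": ["C3", "A3", "B3"],
}
CORES = ["P1", "P2", "P3"]


def canonical(threads, cores):
    """Canonical assignment: each thread runs entirely on one core."""
    schedule = {core: [] for core in cores}
    for core, fragments in zip(cores, threads.values()):
        schedule[core].extend(fragments)
    return schedule


def spread(threads, cores):
    """CSP-style assignment: fragments of the same kind share a core.

    Assumes there are at least as many cores as fragment kinds.
    """
    kinds = sorted({frag[0] for frags in threads.values() for frag in frags})
    core_of = dict(zip(kinds, cores))
    schedule = {core: [] for core in cores}
    steps = max(len(frags) for frags in threads.values())
    for step in range(steps):              # walk the threads in lock step
        for frags in threads.values():
            if step < len(frags):
                frag = frags[step]
                schedule[core_of[frag[0]]].append(frag)
    return schedule


print("canonical:", canonical(THREADS, CORES))
print("CSP:      ", spread(THREADS, CORES))
```

Under the canonical schedule every core sees A, B, and C code; under the spread schedule each core sees only one kind, which is the specialization the talk is after. Every fragment boundary then becomes a potential migration between cores.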
Key Aspects
Dynamic Specialization
• Homogeneous multicore acquires specialization by retaining mutually exclusive predictive state
Data Locality
• Data dependencies between different computation fragments
• Careful fragment selection to avoid loss of data locality
Selecting Fragments
Server workload characteristics
• Large data and instruction footprint
• Significant OS computation
User Computation and OS Computation
• A natural separation
• Exclusive instruction footprints
• Relatively independent data footprint
Data Communication
[Figure: threads T1 and T2 split into T1-User, T1-OS, T2-User, and T2-OS computation, shown on Core 1 and Core 2]
Relative Inter-core Data Communication
[Figure: panels for Apache and OLTP]
OS-User Communication is limited
Implementation
Migrating Computation
• Transfer state through the memory subsystem
  • ~2KB of register state in SPARC V9
  • Memory state through coherence
Lightweight Virtual Machine Monitor
• Migrates computation as dictated by the CSP Policy
• Implemented in hardware/firmware
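To get a feel for the size of the register-state transfer, a back-of-the-envelope sketch helps. Only the ~2KB figure comes from the slide; the 64-byte cache line size and the helper name `lines_to_transfer` are assumptions for illustration.

```python
import math

REGISTER_STATE_BYTES = 2 * 1024  # "~2KB of register state" from the slide
CACHE_LINE_BYTES = 64            # assumption: line size is not given in the talk


def lines_to_transfer(state_bytes=REGISTER_STATE_BYTES,
                      line_bytes=CACHE_LINE_BYTES):
    """Number of cache lines needed to hold the register state."""
    return math.ceil(state_bytes / line_bytes)


print(lines_to_transfer())  # 32 lines under the 64-byte assumption
```

Under that assumption each migration writes and then reads about 32 cache lines of register state, in addition to whatever memory state moves on demand through coherence.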
Implementation (cont.)
[Figure labels: Baseline; User Cores; OS Cores; User Comp; OS Comp; Threads; Software Stack; Virtual CPUs; Physical Cores]
Implementation (cont.)
[Figure labels: User Cores; OS Cores; Threads; Software Stack; Virtual CPUs; Physical Cores]
CSP Policy
Policy dictates computation assignment
Thread Assignment Policy (TAP)
• Maintains affinity between VCPUs and physical cores
Syscall Assignment Policy (SAP)
• OS computation assigned based on system calls
TAP and SAP use identical assignment for user computation
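The difference between the two policies can be sketched as a core-selection function. This is a hypothetical model, not the hardware/firmware implementation: the `CorePartition` type, the `assign_core` function, the 4/4 split of cores, and the modulo mapping are all assumptions. Only the policy rules themselves (user computation placed identically under both, TAP keyed on the VCPU, SAP keyed on the system call) come from the slide.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CorePartition:
    """Hypothetical split of physical cores into user and OS cores."""
    user_cores: tuple
    os_cores: tuple


def assign_core(partition, policy, vcpu, in_os, syscall=None):
    """Pick the physical core for the next computation fragment.

    policy:  "TAP" or "SAP"
    vcpu:    virtual CPU id of the running thread
    in_os:   True if the fragment is OS computation
    syscall: system call number (used only by SAP for OS computation)
    """
    if not in_os:
        # Both policies place user computation the same way: by VCPU affinity.
        return partition.user_cores[vcpu % len(partition.user_cores)]
    if policy == "TAP":
        # TAP: OS computation also follows the VCPU's affinity.
        return partition.os_cores[vcpu % len(partition.os_cores)]
    if policy == "SAP":
        # SAP: OS computation is placed by system call, regardless of VCPU.
        if syscall is None:
            raise ValueError("SAP needs a system call number for OS computation")
        return partition.os_cores[syscall % len(partition.os_cores)]
    raise ValueError(f"unknown policy: {policy}")


cores = CorePartition(user_cores=(0, 1, 2, 3), os_cores=(4, 5, 6, 7))
print(assign_core(cores, "TAP", vcpu=5, in_os=True))             # core 5
print(assign_core(cores, "SAP", vcpu=5, in_os=True, syscall=3))  # core 7
print(assign_core(cores, "SAP", vcpu=2, in_os=True, syscall=3))  # core 7
```

In this sketch SAP sends the same system call from different VCPUs to the same OS core, which is what would let that core's predictive state specialize for that call.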
Simulation Methodology
Virtutech SIMICS MAI running Solaris 9
CMP system: 8 out-of-order processors
• 2 wide, 8 stages, 128 entry ROB, 3GHz
3 level memory hierarchy
• Private L1 and L2
• Directory-based MOSI
• L3: Shared, Exclusive 8MB (16w) (75 cycle load-to-use)
• Point to point ordered interconnect (25 cycle latency)
• Main Memory 255 cycle load-to-use, 40GB/s
Measure impact on predictive structures
L2 Instruction Reference
Result Summary
Branch predictors
• 9-25% reduction in mis-predictions
L2 data references
• 0-19% reduction in load misses
• Moderate increase in store misses
Interconnect messages
• Moderate reduction (after accounting for the extra messages needed for migration)
Performance Potential
Migration Overhead
Related Work
Software re-design: staged execution
• Cohort Scheduling [Larus and Parkes 01], STEPS [Ailamaki 04], SEDA [Welsh 01], LARD [Pai 98]
• CSP: similar execution in hardware
OS and User Interference [several]
• Structural separation to avoid interference
• CSP avoids interference and exploits synergy
Summary
Extensive code reuse in CMPs
• 45-66% instruction blocks universally accessed in server workloads
Computation Spreading
• Localize similar computation and separate dissimilar computation
• Exploits core proximity in CMPs
Case Study: OS and User computation
• Demonstrate substantial performance potential
Thank You!
Backup Slides
L2 Data Reference
L2 load misses comparable; slight to moderate increase in L2 store misses
Multiprocessor Code Reuse
Performance Potential