![Page 1: Multiple Clock and Voltage Domains for Chip Multi Processorspeople.cs.pitt.edu/~kirk/cs3150spring2010/Multiple_Clock... · 2010. 4. 12. · Multiple Clock and Voltage Domains for](https://reader034.vdocuments.mx/reader034/viewer/2022051603/5fede31f9ed5273d9a27ea5d/html5/thumbnails/1.jpg)
Multiple Clock and Voltage Domains for Chip Multi Processors
Efraim Rotem - Intel Corporation Israel
Avi Mendelson - Microsoft R&D Israel
Ran Ginosar - Technion Israel Institute of Technology
Uri Weiser - Technion Israel Institute of Technology
Presented by: Michael Moeng - University of Pittsburgh
![Page 2: Multiple Clock and Voltage Domains for Chip Multi Processorspeople.cs.pitt.edu/~kirk/cs3150spring2010/Multiple_Clock... · 2010. 4. 12. · Multiple Clock and Voltage Domains for](https://reader034.vdocuments.mx/reader034/viewer/2022051603/5fede31f9ed5273d9a27ea5d/html5/thumbnails/2.jpg)
Outline
Multiple Voltage Domains
Power Model
Performance Model
Power Management Policies
Results
![Page 6: Multiple Clock and Voltage Domains for Chip Multi Processorspeople.cs.pitt.edu/~kirk/cs3150spring2010/Multiple_Clock... · 2010. 4. 12. · Multiple Clock and Voltage Domains for](https://reader034.vdocuments.mx/reader034/viewer/2022051603/5fede31f9ed5273d9a27ea5d/html5/thumbnails/6.jpg)
Multiple Voltage Domains
Multiprocessors can distribute power in several ways:
Single clock domain (also implies single voltage domain)
All cores operate at same frequency and voltage
Multiple clock domains -- communicate through FIFO buffers with minor overhead
Multiple voltage domains: Cores independently scale frequency and voltage
Single voltage domain
Individual cores use only frequency scaling
Single voltage for all cores determined by highest frequency
Clustered topologies: Hybrid approach between two extremes
![Page 8: Multiple Clock and Voltage Domains for Chip Multi Processorspeople.cs.pitt.edu/~kirk/cs3150spring2010/Multiple_Clock... · 2010. 4. 12. · Multiple Clock and Voltage Domains for](https://reader034.vdocuments.mx/reader034/viewer/2022051603/5fede31f9ed5273d9a27ea5d/html5/thumbnails/8.jpg)
Multiple Voltage Domains - Power Delivery
Previous works assume no overhead for extra voltage regulators.
A voltage regulator must be designed for a nominal current.
Additional voltage regulators have consequences for:
Current Sharing
Power Delivery Network Resistance
![Page 10: Multiple Clock and Voltage Domains for Chip Multi Processorspeople.cs.pitt.edu/~kirk/cs3150spring2010/Multiple_Clock... · 2010. 4. 12. · Multiple Clock and Voltage Domains for](https://reader034.vdocuments.mx/reader034/viewer/2022051603/5fede31f9ed5273d9a27ea5d/html5/thumbnails/10.jpg)
Current Sharing
A regulator will realistically be designed for a maximum current of 130% to 250% of its nominal current.
Compare chip power delivery systems:
Single voltage regulator: X~2.5X amps
Two voltage regulators: 0.5X~1.25X amps each
N voltage regulators: X/N~2.5X/N amps each
Maximum power to a single core can be much higher with fewer regulators.
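The comparison above can be reproduced with a few lines of arithmetic. The sketch below assumes the chip's nominal current X is split evenly across regulators and uses the slide's upper headroom figure of 250%; the function name and the 100 A value are illustrative only, not taken from the paper.

```python
def regulator_limits(total_nominal_current, n_regulators, headroom=2.5):
    """Return (nominal, maximum) current per regulator.

    Assumes the chip's nominal current is split evenly across regulators
    and that each regulator tolerates `headroom` times its nominal current.
    """
    if n_regulators < 1:
        raise ValueError("need at least one regulator")
    nominal = total_nominal_current / n_regulators
    return nominal, headroom * nominal


# Illustrative chip with nominal current X = 100 A.
for n in (1, 2, 16):
    nominal, maximum = regulator_limits(100.0, n)
    print(f"{n:2d} regulator(s): nominal {nominal:.3f} A, max {maximum:.3f} A each")
```

With one regulator per core, the most any single core can draw is 2.5X/N, whereas a core behind a shared regulator can borrow the headroom left unused by its neighbours.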
![Page 11: Multiple Clock and Voltage Domains for Chip Multi Processorspeople.cs.pitt.edu/~kirk/cs3150spring2010/Multiple_Clock... · 2010. 4. 12. · Multiple Clock and Voltage Domains for](https://reader034.vdocuments.mx/reader034/viewer/2022051603/5fede31f9ed5273d9a27ea5d/html5/thumbnails/11.jpg)
Resistance in Power Delivery Network
Splitting the Power Delivery Network N ways results in N times higher resistance
For symmetric workloads, each regulator also supplies N times less current -- no penalty
When assigning power asymmetrically, higher resistance results in a voltage drop -- wasted power
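A small numeric sketch of the argument above, assuming a purely resistive network where each of the N branches has N times the unsplit resistance and loss is I²R per branch. The function name and the current and resistance values are made up for illustration.

```python
def pdn_loss(domain_currents, unsplit_resistance):
    """Resistive loss (watts) in a power delivery network split into
    len(domain_currents) branches, each with N times the unsplit resistance."""
    n = len(domain_currents)
    branch_resistance = n * unsplit_resistance
    return sum(i * i * branch_resistance for i in domain_currents)


R = 0.001  # ohms, illustrative unsplit resistance

unsplit = pdn_loss([16.0], R)                      # one domain carrying all 16 A
symmetric = pdn_loss([4.0, 4.0, 4.0, 4.0], R)      # four domains, equal load
asymmetric = pdn_loss([10.0, 2.0, 2.0, 2.0], R)    # same total current, skewed

print(unsplit, symmetric, asymmetric)
```

The symmetric split loses the same power as the unsplit network, while the skewed split with the same total current loses more, which is the penalty the slide describes.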
![Page 12: Multiple Clock and Voltage Domains for Chip Multi Processorspeople.cs.pitt.edu/~kirk/cs3150spring2010/Multiple_Clock... · 2010. 4. 12. · Multiple Clock and Voltage Domains for](https://reader034.vdocuments.mx/reader034/viewer/2022051603/5fede31f9ed5273d9a27ea5d/html5/thumbnails/12.jpg)
Power Model
![Page 13: Multiple Clock and Voltage Domains for Chip Multi Processorspeople.cs.pitt.edu/~kirk/cs3150spring2010/Multiple_Clock... · 2010. 4. 12. · Multiple Clock and Voltage Domains for](https://reader034.vdocuments.mx/reader034/viewer/2022051603/5fede31f9ed5273d9a27ea5d/html5/thumbnails/13.jpg)
Power Model
Assumption: Future high-power CMPs will be designed with nominal frequency and power at the minimum operating voltage allowed by a process.
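One consequence of this assumption can be sketched with the textbook dynamic-power relation P ∝ C·V²·f. This is not the paper's power model: it ignores leakage, and it assumes that voltage must rise in proportion to frequency above nominal. Below nominal the voltage is already at the process minimum, so only frequency can be reduced.

```python
def relative_power(f, f_nominal=1.0):
    """Dynamic power relative to nominal under P ~ C * V^2 * f.

    Assumptions (not from the slides): leakage is ignored, and above
    nominal frequency the voltage scales linearly with frequency.
    """
    ratio = f / f_nominal
    if ratio <= 1.0:
        # Voltage is pinned at the process minimum, so power falls linearly.
        return ratio
    # Voltage rises with frequency, so power grows roughly as f^3.
    return ratio ** 3


for f in (0.5, 1.0, 1.5, 2.0):
    print(f, relative_power(f))
```

Under these assumptions, slowing a core below nominal saves power only linearly, while speeding one up above nominal costs power cubically.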
![Page 14: Multiple Clock and Voltage Domains for Chip Multi Processorspeople.cs.pitt.edu/~kirk/cs3150spring2010/Multiple_Clock... · 2010. 4. 12. · Multiple Clock and Voltage Domains for](https://reader034.vdocuments.mx/reader034/viewer/2022051603/5fede31f9ed5273d9a27ea5d/html5/thumbnails/14.jpg)
Benchmarks
![Page 15: Multiple Clock and Voltage Domains for Chip Multi Processorspeople.cs.pitt.edu/~kirk/cs3150spring2010/Multiple_Clock... · 2010. 4. 12. · Multiple Clock and Voltage Domains for](https://reader034.vdocuments.mx/reader034/viewer/2022051603/5fede31f9ed5273d9a27ea5d/html5/thumbnails/15.jpg)
Quick Check
If we run 16 copies of ammp at nominal frequency, how much power do we have left?
![Page 16: Multiple Clock and Voltage Domains for Chip Multi Processorspeople.cs.pitt.edu/~kirk/cs3150spring2010/Multiple_Clock... · 2010. 4. 12. · Multiple Clock and Voltage Domains for](https://reader034.vdocuments.mx/reader034/viewer/2022051603/5fede31f9ed5273d9a27ea5d/html5/thumbnails/16.jpg)
Performance Model
![Page 17: Multiple Clock and Voltage Domains for Chip Multi Processorspeople.cs.pitt.edu/~kirk/cs3150spring2010/Multiple_Clock... · 2010. 4. 12. · Multiple Clock and Voltage Domains for](https://reader034.vdocuments.mx/reader034/viewer/2022051603/5fede31f9ed5273d9a27ea5d/html5/thumbnails/17.jpg)
Performance Model
Frequency
![Page 18: Multiple Clock and Voltage Domains for Chip Multi Processorspeople.cs.pitt.edu/~kirk/cs3150spring2010/Multiple_Clock... · 2010. 4. 12. · Multiple Clock and Voltage Domains for](https://reader034.vdocuments.mx/reader034/viewer/2022051603/5fede31f9ed5273d9a27ea5d/html5/thumbnails/18.jpg)
Performance Model
Minimum Operating Frequency
![Page 19: Multiple Clock and Voltage Domains for Chip Multi Processorspeople.cs.pitt.edu/~kirk/cs3150spring2010/Multiple_Clock... · 2010. 4. 12. · Multiple Clock and Voltage Domains for](https://reader034.vdocuments.mx/reader034/viewer/2022051603/5fede31f9ed5273d9a27ea5d/html5/thumbnails/19.jpg)
Performance Model
Minimum Operating Frequency
![Page 20: Multiple Clock and Voltage Domains for Chip Multi Processorspeople.cs.pitt.edu/~kirk/cs3150spring2010/Multiple_Clock... · 2010. 4. 12. · Multiple Clock and Voltage Domains for](https://reader034.vdocuments.mx/reader034/viewer/2022051603/5fede31f9ed5273d9a27ea5d/html5/thumbnails/20.jpg)
Benchmarks
![Page 22: Multiple Clock and Voltage Domains for Chip Multi Processorspeople.cs.pitt.edu/~kirk/cs3150spring2010/Multiple_Clock... · 2010. 4. 12. · Multiple Clock and Voltage Domains for](https://reader034.vdocuments.mx/reader034/viewer/2022051603/5fede31f9ed5273d9a27ea5d/html5/thumbnails/22.jpg)
Power Management Policies
Goal: Maximize performance given a power constraint
Assume benchmarks have already been profiled (we know the frequency scaling)
Policies assume it's better to give a core with better scalability a higher frequency, and provide a function of frequency given scalability.
![Page 23: Multiple Clock and Voltage Domains for Chip Multi Processorspeople.cs.pitt.edu/~kirk/cs3150spring2010/Multiple_Clock... · 2010. 4. 12. · Multiple Clock and Voltage Domains for](https://reader034.vdocuments.mx/reader034/viewer/2022051603/5fede31f9ed5273d9a27ea5d/html5/thumbnails/23.jpg)
Quick Check 2
The polynomial policy scales frequency inversely with the freq-power dependency.
What is this function?
![Page 24: Multiple Clock and Voltage Domains for Chip Multi Processorspeople.cs.pitt.edu/~kirk/cs3150spring2010/Multiple_Clock... · 2010. 4. 12. · Multiple Clock and Voltage Domains for](https://reader034.vdocuments.mx/reader034/viewer/2022051603/5fede31f9ed5273d9a27ea5d/html5/thumbnails/24.jpg)
Power Management: following constraints
After each core's desired power level is determined:
If desired current exceeds current capacity, scale frequency down to maximum allowed
All values are normalized so total power meets power constraints
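The two steps above might be sketched as follows. This is a simplification under stated assumptions, not the paper's procedure: the regulator's current capacity is modeled as a per-core power cap, and normalization is a uniform scale-down applied only when the capped total exceeds the budget. All names and numbers are hypothetical.

```python
def apply_constraints(desired_power, per_core_cap, power_budget):
    """Clamp each core to its cap, then scale all cores to fit the budget."""
    # Step 1: a core cannot draw more than its regulator can supply.
    capped = [min(p, per_core_cap) for p in desired_power]
    # Step 2: normalize so that the total does not exceed the chip budget.
    total = sum(capped)
    if total > power_budget:
        scale = power_budget / total
        capped = [p * scale for p in capped]
    return capped


# One core asks for more than its cap; the capped total is still over budget.
allocation = apply_constraints([40.0, 10.0, 10.0, 10.0],
                               per_core_cap=25.0, power_budget=50.0)
print(allocation, sum(allocation))
```

Scaling down after clamping keeps every core within its cap, since a scale factor below one can only reduce each value.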
![Page 28: Multiple Clock and Voltage Domains for Chip Multi Processorspeople.cs.pitt.edu/~kirk/cs3150spring2010/Multiple_Clock... · 2010. 4. 12. · Multiple Clock and Voltage Domains for](https://reader034.vdocuments.mx/reader034/viewer/2022051603/5fede31f9ed5273d9a27ea5d/html5/thumbnails/28.jpg)
Evaluation
Simulation and real machine execution used to determine parameters for each benchmark
"Oracle" simulated using a gradient descent algorithm
Monte Carlo modeling for workload generation
Evaluates workloads with 2, 4, 8, 12, 14, 16 threads to show performance with idle cores
Baseline is single-clock domain, single-voltage domain
10-30% improvement over no-DVFS
Quick Check 3: How does this improve performance?
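The Monte Carlo workload generation described on this slide could look roughly like the sketch below. The 16-core chip size is inferred from the largest thread count, the benchmark names other than ammp are placeholders, and the sample count of 100 is arbitrary; none of these come from the paper.

```python
import random

N_CORES = 16                        # assumed chip size
THREAD_COUNTS = (2, 4, 8, 12, 14, 16)


def sample_workload(benchmarks, n_threads, rng):
    """Pick n_threads benchmarks at random; remaining cores stay idle (None)."""
    threads = [rng.choice(benchmarks) for _ in range(n_threads)]
    return threads + [None] * (N_CORES - n_threads)


rng = random.Random(0)              # fixed seed for repeatable samples
benchmarks = ["ammp", "bench_b", "bench_c"]   # placeholders except ammp

workloads = {
    n: [sample_workload(benchmarks, n, rng) for _ in range(100)]
    for n in THREAD_COUNTS
}
print(workloads[2][0])
```

Each sampled workload would then be evaluated under every topology and policy, with the idle cores freeing power for the active ones.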
![Page 29: Multiple Clock and Voltage Domains for Chip Multi Processorspeople.cs.pitt.edu/~kirk/cs3150spring2010/Multiple_Clock... · 2010. 4. 12. · Multiple Clock and Voltage Domains for](https://reader034.vdocuments.mx/reader034/viewer/2022051603/5fede31f9ed5273d9a27ea5d/html5/thumbnails/29.jpg)
Oracle policy
For about half the workloads, it's best to use the same frequency for all cores
Loss comes from asynchronous FIFO buffers
![Page 30: Multiple Clock and Voltage Domains for Chip Multi Processorspeople.cs.pitt.edu/~kirk/cs3150spring2010/Multiple_Clock... · 2010. 4. 12. · Multiple Clock and Voltage Domains for](https://reader034.vdocuments.mx/reader034/viewer/2022051603/5fede31f9ed5273d9a27ea5d/html5/thumbnails/30.jpg)
Best policies for each configuration
Shows loss vs oracle
Lower is better
Knowledge of frequency scalability is crucial
![Page 31: Multiple Clock and Voltage Domains for Chip Multi Processorspeople.cs.pitt.edu/~kirk/cs3150spring2010/Multiple_Clock... · 2010. 4. 12. · Multiple Clock and Voltage Domains for](https://reader034.vdocuments.mx/reader034/viewer/2022051603/5fede31f9ed5273d9a27ea5d/html5/thumbnails/31.jpg)
Limiting threads
Multiple voltage domains are heavily dependent on high headroom for voltage regulators
![Page 32: Multiple Clock and Voltage Domains for Chip Multi Processorspeople.cs.pitt.edu/~kirk/cs3150spring2010/Multiple_Clock... · 2010. 4. 12. · Multiple Clock and Voltage Domains for](https://reader034.vdocuments.mx/reader034/viewer/2022051603/5fede31f9ed5273d9a27ea5d/html5/thumbnails/32.jpg)
Clustered Topologies
Matches performance of single voltage domain with few threads
Matches performance of multiple voltage domains with many threads