pipelining - facultystaff.richmond.edu · performance measurements • cycle time: time in between...
Pipelining
Performance Measurements
• Cycle Time: Time in between clock ticks
• Latency: Time to finish a complete job, start to finish
• Throughput: Average jobs completed per unit time
• CyclesPerJob: Number of cycles between finishing jobs.
Goals
• Faster clock rate
• Use machine more efficiently
• No longer execute only one instruction at a time
Laundry
• Laundry-o-matic washes, dries & folds
• Wash: 30 min
• Dry: 40 min
• Fold: 20 min
• It switches them internally with no delay
• How long to complete 1 load? 90 min
Laundry-o-Matic - SingleCycle
[Figure: timeline in minutes (0 to 270) for loads 1, 2, and 3. Each load is washed (W), dried (D), and folded (F) back to back, and the next load starts only when the previous one has finished.]
Laundry-o-Matic
• Cycle Time: Clothing is switched every 90 minutes
• Latency: A single load takes a total of 90 minutes
• Throughput: A load completes every 90 minutes
• CyclesPerLoad: Every 1 cycle, a load completes
Pipelined Laundry
• Split the laundry-o-matic into a washer, dryer, and folder (what a concept)
• Moving the laundry from one to another takes 6 minutes
Pipelined Laundry
[Figure: timeline in minutes (0 to 270) for loads 1, 2, and 3. The wash (W), dry (D), and fold (F) stages of different loads overlap: a new load is washed while the previous one is dried.]
• We have to include time to switch stages
• Two loads cannot be in the dryer at the same time
• Switch all loads at the same time
Pipelined Laundry
• Cycle Time: Clothing is switched every 46 minutes
• Latency: A single load takes a total of 138 minutes
• Throughput: A load completes every 46 minutes
• CyclesPerLoad: Every 1 cycle, a load completes
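The laundry numbers can be reproduced with a small sketch. It assumes, as the slides do, that the single machine switches internally with no delay, and that in the pipelined version every stage (including the last) is followed by a 6-minute move and all loads switch together. The function names and dictionary keys are made up for this illustration.

```python
def single_cycle_metrics(stage_minutes):
    """One machine runs every stage back to back, with no switching delay."""
    cycle = sum(stage_minutes)
    return {
        "cycle_time": cycle,
        "latency": cycle,
        "minutes_per_load": cycle,
        "cycles_per_load": 1,
    }


def pipelined_metrics(stage_minutes, switch_minutes):
    """All loads switch together, so the slowest stage plus the move sets the cycle."""
    cycle = max(stage_minutes) + switch_minutes
    return {
        "cycle_time": cycle,
        "latency": cycle * len(stage_minutes),  # one cycle per stage
        "minutes_per_load": cycle,              # steady state: one load per cycle
        "cycles_per_load": 1,
    }


laundry = [30, 40, 20]  # wash, dry, fold (minutes)
single = single_cycle_metrics(laundry)
pipelined = pipelined_metrics(laundry, switch_minutes=6)
print("single-cycle:", single)
print("pipelined:   ", pipelined)
```

The pipelined cycle is 40 + 6 = 46 minutes because the dryer is the slowest stage, and the latency is 3 cycles of 46 minutes each.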
Single-Cycle vs Pipelined
• Single has the higher cycle time
• Pipelined has the higher clock rate
• Pipelined has the higher single-load latency
• Pipelined has the higher throughput
• Neither has the higher CPL (Cycles per Load)
• More stages make for a higher clock rate
Obstacles to speedup in Pipelining
• 1. Uneven Stages
• 2. Pipeline Register Delay
• Ideal cycle time without the above limitations with an n-stage pipeline:
  - OldCycleTime / n
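As a rough check on the two obstacles, the sketch below splits the gap between the ideal and the actual pipelined cycle time for the laundry example. It treats the 6-minute switching time as the laundry counterpart of pipeline register delay, which is an assumption of this illustration; the helper names are hypothetical.

```python
def ideal_cycle_time(old_cycle_time, n_stages):
    """Cycle time if the work split evenly and stage switching were free."""
    return old_cycle_time / n_stages


def actual_cycle_time(stage_times, register_delay):
    """The slowest stage sets the pace, and every cycle also pays the switch."""
    return max(stage_times) + register_delay


stages = [30, 40, 20]   # wash, dry, fold (minutes)
switch = 6              # stands in for pipeline register delay

old_cycle = sum(stages)
ideal = ideal_cycle_time(old_cycle, len(stages))
actual = actual_cycle_time(stages, switch)

uneven_penalty = max(stages) - ideal   # cost of obstacle 1 (uneven stages)
register_penalty = switch              # cost of obstacle 2 (register delay)

print(f"ideal {ideal} min, actual {actual} min")
print(f"uneven stages add {uneven_penalty} min, switching adds {register_penalty} min")
```

With these numbers the ideal cycle is 90 / 3 = 30 minutes; the dryer being 10 minutes slower than the average stage, plus the 6-minute switch, brings it to 46.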
Example
• Washing = 45
• Drying = 120
• Folding = 15
• Switching = 5
• What is the latency for one load of laundry?
• What is the latency for three loads?
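The slide leaves both answers open. One way to work them out, assuming the questions refer to the pipelined machine and follow the same conventions as the earlier pipelined laundry slides (cycle time = slowest stage + switching time, one cycle per stage, one new load entering per cycle), is sketched below. The function name is hypothetical.

```python
def pipelined_latency(stage_times, switch_time, loads):
    """Time from the first load entering until the last load leaves the pipeline."""
    cycle = max(stage_times) + switch_time
    # The first load needs one cycle per stage; each later load finishes one cycle after it.
    cycles_needed = len(stage_times) + loads - 1
    return cycle * cycles_needed


example = [45, 120, 15]  # washing, drying, folding
one_load = pipelined_latency(example, switch_time=5, loads=1)
three_loads = pipelined_latency(example, switch_time=5, loads=3)
print(one_load, three_loads)
```

Under those assumptions the cycle time is 120 + 5 = 125, so one load takes 3 cycles (375) and three loads take 5 cycles (625), in the same time units as the slide.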
Creating Stages
• Fetch – get instruction
• Decode – read registers
• Execute – use ALU
• Memory – access memory
• WriteBack – write registers
[Figure: the stage symbols used in later diagrams: Fetch (IF), Decode (ID), Execute, Memory (MEM), WriteBack (WB).]
Pipelined Machine
[Figure: pipelined datapath. Labeled parts: PC, the constant 4, Instruction Memory (Read Addr, Out Data), instruction fields (op/fun, rs, rt, rd, imm), Register File (src1, src2, destreg, destdata, src1data, src2data), 16-to-32-bit Sign Ext, two << 2 shifters, and Data Memory (Addr, In Data, Out Data). Pipeline registers separate the Fetch, Decode, Execute, Memory, and (Writeback) stages.]
Pipeline diagram (Time ->, in cycles)

| Instruction | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 |
|---|---|---|---|---|---|---|---|---|
| add $s0, $0, $0 | IF | ID | EX | MEM | WB | | | |
| lw $s1, 0($t0) | | IF | ID | EX | MEM | WB | | |
| sw $s2, 0($t1) | | | IF | ID | EX | MEM | WB | |
| or $s3, $s4, $t3 | | | | IF | ID | EX | MEM | WB |

The machine in cycle 4: add is in MEM, lw in EX, sw in ID, or in IF.
The machine in cycle 5: add is in WB, lw in MEM, sw in EX, or in ID.
• In what cycle was $s1 written? 6
• In what cycle was $s4 read? 5
• In what cycle was the Add executed? 3
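The three answers follow from a simple rule: with one instruction entering per cycle and no stalls, instruction i (counting from 0) is in stage s (counting from 0) during cycle i + s + 1. The sketch below applies that rule to the four instructions from the slides. The no-stall assumption and the helper names are part of this illustration, not of the slides.

```python
STAGES = ["IF", "ID", "EX", "MEM", "WB"]

program = [
    "add $s0, $0, $0",
    "lw $s1, 0($t0)",
    "sw $s2, 0($t1)",
    "or $s3, $s4, $t3",
]


def stage_cycle(instr_index, stage):
    """1-based cycle in which the instruction at instr_index (0-based) is in `stage`."""
    return instr_index + STAGES.index(stage) + 1


def machine_in_cycle(cycle):
    """Map each occupied stage to the instruction sitting in it during `cycle`."""
    snapshot = {}
    for i, instr in enumerate(program):
        offset = cycle - 1 - i  # how many stages this instruction has already passed
        if 0 <= offset < len(STAGES):
            snapshot[STAGES[offset]] = instr
    return snapshot


print("$s1 written in cycle", stage_cycle(1, "WB"))   # lw writes back
print("$s4 read in cycle", stage_cycle(3, "ID"))      # or reads its registers
print("add executed in cycle", stage_cycle(0, "EX"))  # add uses the ALU
print("cycle 4:", machine_in_cycle(4))
```

Registers are written in WB, read in ID, and the ALU is used in EX, which is why those three stages are the ones queried.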
Performance Analysis
• Measurements related to our machine
• Job = single instruction
• Latency: Time to finish a complete instruction, start to finish.
• Throughput: Average number of instructions completed per unit time.
• Which is more important for reducing program execution time?
Pipelined Machine
[Figure: the pipelined datapath shown again, with pipeline registers separating the Fetch, Decode, Execute, Memory, and (Writeback) stages.]
Pipeline Registers
• IF/ID
  - 32b instruction
  - 32b nPC
• ID/EX
  - 32b register
  - 32b register
  - 32b immediate field
  - 32b nPC
• EX/MEM
  - Zero
  - 32b ALU result
  - 32b nPC
  - 32b register value
• MEM/WB
  - 32b ALU result
  - 32b memory value
• Named for the two stages they separate
• Store all data corresponding to lines that go through them
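To get a feel for how much state each pipeline register holds, the listed fields can be tallied. This is only a sketch: it assumes Zero is a 1-bit flag, the field names are made up for illustration, and it counts only the data fields listed on the slide (a real design would also carry control signals and the destination register number).

```python
# Field widths in bits, taken from the slide; the 1-bit Zero flag is an assumption.
PIPELINE_REGISTERS = {
    "IF/ID": {"instruction": 32, "nPC": 32},
    "ID/EX": {"register 1": 32, "register 2": 32, "immediate": 32, "nPC": 32},
    "EX/MEM": {"Zero": 1, "ALU result": 32, "nPC": 32, "register value": 32},
    "MEM/WB": {"ALU result": 32, "memory value": 32},
}

# Total data bits stored in each pipeline register.
widths = {name: sum(fields.values()) for name, fields in PIPELINE_REGISTERS.items()}

for name, bits in widths.items():
    print(f"{name}: {bits} bits")
```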
Register File
• Only takes half of a cycle to read or write to register file
• Convention:
  - Read 2nd half of cycle
  - Write 1st half of cycle
Machine Comparison Fetch Decode Execute Memory WriteBack 2ns 1ns 2ns 2ns 1ns 0.1 ns pipeline register delay
Single-Cycle Implementation Clock cycle time: 8 ns Latency of a single instruction: 8 ns Throughput for machine: 1/8 inst/ns
Pipelined Implementation Clock cycle time: 2.1 ns Latency of a single instruction: 2.1*5=10.5 ns Throughput for machine: 1 / 2.1 inst/ns
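The numbers in the comparison can be reproduced with a few lines of Python. This is only a sketch: the function names `single_cycle_metrics` and `pipelined_metrics` are made up for illustration, and it assumes every pipeline stage occupies exactly one clock cycle.

```python
def single_cycle_metrics(stage_ns):
    """Return (cycle time, latency, throughput) for a single-cycle machine."""
    cycle = sum(stage_ns)            # one long cycle has to cover every stage
    return cycle, cycle, 1 / cycle   # latency equals the cycle time


def pipelined_metrics(stage_ns, register_delay_ns):
    """Return (cycle time, latency, throughput) for a pipelined machine."""
    cycle = max(stage_ns) + register_delay_ns   # slowest stage sets the clock
    latency = cycle * len(stage_ns)             # one cycle spent in each stage
    return cycle, latency, 1 / cycle            # one instruction finishes per cycle


stages = [2, 1, 2, 2, 1]  # Fetch, Decode, Execute, Memory, WriteBack in ns

print(single_cycle_metrics(stages))     # cycle 8 ns, latency 8 ns, 1/8 inst/ns
print(pipelined_metrics(stages, 0.1))   # cycle 2.1 ns, latency 10.5 ns, 1/2.1 inst/ns
```

Pipelining makes the latency of one instruction worse (10.5 ns instead of 8 ns) while improving throughput by almost a factor of four.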

Example 2 – How do we speed up the pipelined machine?
Fetch   Decode   Execute   Memory   Writeback
6 ns    4 ns     8 ns      10 ns    4 ns
0.1 ns pipeline register delay
• Single cycle: 1/32 inst/ns
• Pipelined: 1/10.1 inst/ns

Example 2 – Split more stages
Fetch   Decode   Execute   Memory   Writeback
6 ns    4 ns     8 ns      10 ns    4 ns
0.1 ns pipeline register delay
Which stage(s) should we split? Memory and Execute

Example 2 – After Split
F      D      X1     X2     M1     M2     WB
6 ns   4 ns   4 ns   4 ns   5 ns   5 ns   4 ns
0.1 ns pipeline register delay
• Single cycle: 1/32 inst/ns
• Pipelined: 1/6.1 inst/ns (Fetch, at 6 ns, is now the slowest stage)
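The effect of splitting can be checked with a short sketch. The helper names `pipelined_cycle_ns` and `split_in_two` are hypothetical, and the sketch assumes a stage can be cut into two perfectly equal halves, which real hardware rarely allows.

```python
def pipelined_cycle_ns(stage_ns, register_delay_ns=0.1):
    """Clock cycle of a pipeline: slowest stage plus pipeline register delay."""
    return max(stage_ns.values()) + register_delay_ns


def split_in_two(stage_ns, name):
    """Return a new stage table with stage `name` replaced by two equal halves."""
    out = {}
    for stage, ns in stage_ns.items():
        if stage == name:
            out[stage + "1"] = ns / 2   # assumption: the work divides evenly
            out[stage + "2"] = ns / 2
        else:
            out[stage] = ns
    return out


before = {"F": 6, "D": 4, "X": 8, "M": 10, "WB": 4}   # stage delays in ns
after = split_in_two(split_in_two(before, "M"), "X")

print(list(after))                   # stage names: F, D, X1, X2, M1, M2, WB
print(pipelined_cycle_ns(before))    # about 10.1 ns
print(pipelined_cycle_ns(after))     # about 6.1 ns
print(pipelined_cycle_ns(before) / pipelined_cycle_ns(after))  # speedup, roughly 1.66
```

Splitting Memory alone would leave Execute (8 ns) as the bottleneck, which is why both stages are split. After that, Fetch limits the clock.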

Easy Right? Not so fast.
add $s0, $0, $0
or $s3, $s0, $t3
sw $s2, 0($t1)
and $s6, $s4, $t3
[Timing diagram: cycles 1 to 8 across the top, one row of pipeline stages per instruction]
Incorrect Execution
• In what cycle does the add write $s0? 1st half of cycle 5
• In what cycle does the or read $s0? 2nd half of cycle 3
Ahhhh! Values cannot pass backwards in time.
Correct, Slow Execution
• In what cycle does the add write $s0? 1st half of cycle 5
• In what cycle does the or read $s0? 2nd half of cycle 5
• Stall - wasted cycles
Only the Register File reads and writes in half a cycle. All other stages take a full cycle – this is because of shared hardware.
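The stall count in the slow but correct execution follows from simple arithmetic. The sketch below assumes the five-stage pipeline used in these slides (IF, ID, EX, MEM, WB), no forwarding, and a register file that writes in the first half of a cycle and reads in the second half. The names `stall_cycles` and `decode_cycle` are illustrative only.

```python
ID_OFFSET = 1   # an instruction decodes 1 cycle after it is fetched
WB_OFFSET = 4   # and writes back 4 cycles after it is fetched


def stall_cycles(distance):
    """Stalls for a consumer `distance` instructions behind its producer.

    The consumer may read in the same cycle the producer writes, because the
    write happens in the first half of the cycle and the read in the second.
    """
    return max(0, WB_OFFSET - ID_OFFSET - distance)


def decode_cycle(producer_fetch_cycle, distance):
    """Cycle in which the consumer actually reads its registers."""
    unstalled = producer_fetch_cycle + distance + ID_OFFSET
    return unstalled + stall_cycles(distance)


# add is fetched in cycle 1; or is the very next instruction (distance 1).
print(stall_cycles(1))      # 2 wasted cycles
print(decode_cycle(1, 1))   # or reads $s0 in cycle 5, as on the slide
print(stall_cycles(3))      # 0: three instructions later there is no stall
```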

Pipelined Machine
[Datapath diagram: PC, Instruction Memory, Register File (src1, src2, destreg, destdata), Sign Ext (16 to 32 bits), << 2 shifters and Data Memory, separated by Pipeline Registers into the Fetch, Decode, Execute, Memory and (Writeback) stages]

Incorrect Execution caused by Data Hazard
lw $s0, 0($t4)
or $s3, $s0, $t3
sw $s2, 0($t1)
and $s6, $s4, $t3
[Timing diagram: cycles 1 to 8 across the top, one row of pipeline stages per instruction. An arrow to the left is information passed backwards in time.]
• In what cycle does the lw write $s0? 1st half of cycle 5
• In what cycle does the or read $s0? 2nd half of cycle 3
Data Hazard
Remember!!! Only WB and ID take ½ cycle. All other stages take a full cycle!!!!!!!!!

Barriers to pipelined performance
• Uneven stages
• Pipeline register delays
• Data Hazards
  ◦ An instruction depends on the result of a previous instruction still in the pipeline
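A data hazard of this kind can be spotted mechanically by comparing each instruction's destination register with the source registers of the instructions right behind it. The following is a rough sketch, not a real hazard-detection unit: the `(name, dest, sources)` tuple encoding and the function `raw_hazards` are assumptions made for the example, and the default window of 2 assumes a five-stage pipeline without forwarding whose register file writes in the first half of a cycle and reads in the second.

```python
def raw_hazards(program, window=2):
    """Return (producer index, consumer index, register) for each hazard.

    Only consumers within `window` instructions of the producer are reported,
    since later ones read the register after it has been written back.
    """
    hazards = []
    for i, (_, dest, _) in enumerate(program):
        if dest is None or dest == "$0":   # stores write no register; $0 is constant
            continue
        for j in range(i + 1, min(i + 1 + window, len(program))):
            _, later_dest, later_srcs = program[j]
            if dest in later_srcs:
                hazards.append((i, j, dest))
            if later_dest == dest:         # value overwritten: later readers depend on j
                break
    return hazards


program = [
    ("lw",  "$s0", ["$t4"]),
    ("or",  "$s3", ["$s0", "$t3"]),
    ("sw",  None,  ["$s2", "$t1"]),
    ("and", "$s6", ["$s4", "$t3"]),
]

print(raw_hazards(program))   # [(0, 1, '$s0')]: the or needs the lw result
```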

Default: Stall
lw $s0, 0($t4)
or $s3, $s0, $t3
sw $s2, 0($t1)
and $s6, $s4, $t3
[Timing diagram: cycles 1 to 8 across the top, one row of pipeline stages per instruction]
• In what cycle does the lw write $s0? 1st half of cycle 5
• In what cycle does the or read $s0? 2nd half of cycle 5
• Stall - wasted cycles
Remember!!! Only WB and ID take ½ cycle. All other stages take a full cycle!!!!!!!!!

Solution 1: Data Forwarding
lw $s0, 0($t4)
or $s3, $s0, $t3
sw $s2, 0($t1)
and $s6, $s4, $t3
[Timing diagram: cycles 1 to 8 across the top, one row of pipeline stages per instruction]
• In what cycle is $s0 calculated in the machine? End of cycle 4
• In what cycle is $s0 used in the machine? Beginning of cycle 4, before the value exists
• With the or delayed by one cycle: $s0 is calculated at the end of cycle 4 and used at the beginning of cycle 5
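With forwarding, what matters is when the value is produced and when the next instruction needs it in Execute, not when the register file is written. The sketch below is a simplified model under stated assumptions: an ALU result exists at the end of EX, a loaded value at the end of MEM, and every consumer needs its operand at the start of its own EX stage. The names `READY_OFFSET`, `forwarding_stalls` and `use_cycle` are hypothetical.

```python
# Cycles after fetch at whose END the result exists (IF=0, ID=1, EX=2, MEM=3).
READY_OFFSET = {"alu": 2, "load": 3}
EX_OFFSET = 2   # a consumer needs its operands at the START of its EX stage


def forwarding_stalls(producer_kind, distance):
    """Stalls still needed with forwarding, for a consumer `distance` behind."""
    first_usable = READY_OFFSET[producer_kind] + 1   # cycle after the value exists
    return max(0, first_usable - (distance + EX_OFFSET))


def use_cycle(producer_fetch_cycle, producer_kind, distance):
    """Cycle in which the consumer's EX stage uses the forwarded value."""
    stalls = forwarding_stalls(producer_kind, distance)
    return producer_fetch_cycle + distance + EX_OFFSET + stalls


print(forwarding_stalls("load", 1))   # 1: a load followed at once by its user
print(use_cycle(1, "load", 1))        # 5: value from end of cycle 4 used in cycle 5
print(forwarding_stalls("alu", 1))    # 0: ALU results forward with no stall
```

In this model, forwarding removes the stall completely for ALU producers and reduces the load case from two wasted cycles to one.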

Data-Forwarding
Where are those wires?
[Datapath diagram: PC, Instruction Memory, Register File (src1, src2, destreg, destdata), Sign Ext (16 to 32 bits), << 2 shifters and Data Memory, separated by Pipeline Registers into the Fetch, Decode, Execute, Memory and (Writeback) stages]

Data Forwarding Example 2
lw $t0, 0($s0)
add $s2, $s2, $t0
sw $s2, 0($s0)
addi $t0, $t0, 1
• Draw the timing diagram with data forwarding
• Draw arrows to indicate data passing through forwarding
[Blank timing diagram: cycles 1 to 12 across the top, stages labeled F, D, M, W]

Solution 2: Instruction Reordering (Before reordering)
lw $s0, 0($t4)
or $s3, $s0, $t3
sw $s2, 0($t1)
and $s6, $s4, $t3
[Timing diagram: cycles 1 to 8 across the top, one row of pipeline stages per instruction. Stall - wasted cycles]

Solution 2: Instruction Reordering (After Reordering)
lw $s0, 0($t4)
sw $s2, 0($t1)
and $s6, $s4, $t3
or $s3, $s0, $t3
[Timing diagram: cycles 1 to 8 across the top, one row of pipeline stages per instruction]
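The benefit of reordering can be estimated by counting cycles for both instruction orders. The sketch below models an in-order five-stage pipeline without forwarding, where an instruction waits in decode until every source register has been written back (a write in the first half of a cycle is visible to a read in the second half). The `(dest, sources)` encoding and the function `last_writeback_cycle` are assumptions for illustration, and memory dependences are ignored.

```python
def last_writeback_cycle(program):
    """Cycle in which the final instruction reaches WB, counting stalls."""
    ready = {}     # register -> WB cycle of its most recent writer
    prev_id = 1    # so that the first instruction decodes in cycle 2
    for dest, srcs in program:
        id_cycle = prev_id + 1                  # in order, one decode per cycle
        for reg in srcs:
            # Reading in the producer's WB cycle is early enough.
            id_cycle = max(id_cycle, ready.get(reg, 0))
        if dest is not None:
            ready[dest] = id_cycle + 3          # ID -> EX -> MEM -> WB
        prev_id = id_cycle
    return prev_id + 3


lw = ("$s0", ["$t4"])
or_ = ("$s3", ["$s0", "$t3"])
sw = (None, ["$s2", "$t1"])
and_ = ("$s6", ["$s4", "$t3"])

print(last_writeback_cycle([lw, or_, sw, and_]))   # 10: two stall cycles
print(last_writeback_cycle([lw, sw, and_, or_]))   # 8: the stalls disappear
```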

Who reorders instructions?
• Static scheduling
  ◦ Compiler
  ◦ Simpler, but does not know when caches miss or loads/stores are to the same locations
• Dynamic scheduling
  ◦ Hardware
  ◦ More complicated, but has all knowledge

Solution 2: Instruction Reordering
lw $s0, 0($t4)
or $s3, $s0, $t3
sw $s3, 0($t1)
and $s0, $s4, $t3
[Timing diagram: cycles 1 to 8 across the top, one row of pipeline stages per instruction]
Is this the same execution?!?
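One way to approach the question is to check whether a proposed order keeps every pair of dependent instructions in their original relative order. The sketch below considers register dependences only (read-after-write, write-after-read, write-after-write) and ignores whether loads and stores touch the same memory location, which a real scheduler must also handle. The `(dest, sources)` encoding and the names `depends` and `is_safe_reordering` are hypothetical.

```python
def depends(earlier, later):
    """True if `later` must stay after `earlier` because of a register."""
    e_dest, e_srcs = earlier
    l_dest, l_srcs = later
    raw = e_dest is not None and e_dest in l_srcs    # later reads what earlier wrote
    war = l_dest is not None and l_dest in e_srcs    # later overwrites what earlier reads
    waw = e_dest is not None and e_dest == l_dest    # both write the same register
    return raw or war or waw


def is_safe_reordering(program, new_order):
    """`new_order` lists indices into `program` in their new sequence."""
    position = {idx: pos for pos, idx in enumerate(new_order)}
    for i in range(len(program)):
        for j in range(i + 1, len(program)):
            if depends(program[i], program[j]) and position[i] > position[j]:
                return False
    return True


lw = ("$s0", ["$t4"])
or_ = ("$s3", ["$s0", "$t3"])

first = [lw, or_, (None, ["$s2", "$t1"]), ("$s6", ["$s4", "$t3"])]    # sw $s2 / and $s6
second = [lw, or_, (None, ["$s3", "$t1"]), ("$s0", ["$s4", "$t3"])]   # sw $s3 / and $s0

move_or_last = [0, 2, 3, 1]   # lw, sw, and, or
print(is_safe_reordering(first, move_or_last))    # True
print(is_safe_reordering(second, move_or_last))   # False
```

In the second program the sw stores the $s3 that the or produces, and the and overwrites the $s0 that the or still has to read, so moving them ahead of the or changes the result.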

Pipelined Machine
[Datapath diagram: PC, Instruction Memory, Register File (src1, src2, destreg, destdata), Sign Ext (16 to 32 bits), << 2 shifters and Data Memory, separated by Pipeline Registers into the Fetch, Decode, Execute, Memory and (Writeback) stages]