Adapted from Computer Organization and Design, Patterson & Hennessy, UCB ECE232: Hardware Organization and Design Lecture 13: Pipelining


Overview

§  Single-cycle MIPS datapath presented so far

§  Not very efficient: the components of the datapath can be used more efficiently

§  Idea!
•  Put registers between stages of the datapath
•  Clock used to update register values
•  All stages perform an operation on every clock cycle

§  Pipelined datapath: the basis for almost all modern microprocessors!


Speeding up through pipelining

§  Ann, Brian, Cathy, and Dave each have one load of clothes to wash, dry, and fold
•  Washer takes 30 minutes
•  Dryer takes 30 minutes
•  “Folder” takes 30 minutes
•  “Stasher” takes 30 minutes to put clothes into drawers

[Figure: the four loads, labeled A, B, C, D]


Sequential Laundry

§  Sequential laundry takes 8 hours for 4 loads

§  If they learned pipelining, how long would laundry take?

[Figure: timeline from 6 PM to 2 AM; task order A, B, C, D on the vertical axis; each load occupies four consecutive 30-minute slots, one load after another]


Pipelined Laundry: Start work ASAP

§  Pipelined laundry takes 3.5 hours for 4 loads!

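[Figure: timeline starting at 6 PM; task order A, B, C, D on the vertical axis; each load starts 30 minutes after the previous one, so the four loads finish in seven 30-minute slots]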
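The 8-hour and 3.5-hour totals can be checked with a short calculation. The sketch below assumes, as the laundry example does, four 30-minute stages and four loads; the function names are illustrative and not part of the lecture.

```python
# Sketch of the laundry arithmetic. Assumes every stage takes the same
# time and each stage can hold only one load at a time.
STAGE_MINUTES = 30   # washer, dryer, "folder", "stasher"
NUM_STAGES = 4
NUM_LOADS = 4        # A, B, C, D


def sequential_minutes(loads: int, stages: int, stage_minutes: int) -> int:
    """Each load finishes all stages before the next load starts."""
    return loads * stages * stage_minutes


def pipelined_minutes(loads: int, stages: int, stage_minutes: int) -> int:
    """The first load needs `stages` slots; each later load adds one slot."""
    return (stages + loads - 1) * stage_minutes


print(sequential_minutes(NUM_LOADS, NUM_STAGES, STAGE_MINUTES) / 60)  # 8.0
print(pipelined_minutes(NUM_LOADS, NUM_STAGES, STAGE_MINUTES) / 60)   # 3.5
```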


Pipelining Lessons

§  Pipelining doesn’t help the latency of a single task; it helps the throughput of the entire workload

§  Multiple tasks operate simultaneously using different resources

§  Potential speedup = number of pipeline stages

§  Pipeline rate is limited by the slowest pipeline stage

§  Unbalanced lengths of pipeline stages reduce speedup

§  Time to “fill” the pipeline and time to “drain” it reduce speedup

[Figure: pipelined laundry timeline starting at 6 PM; task order A, B, C, D on the vertical axis; seven 30-minute slots]
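The effect of fill/drain time and of unbalanced stages on speedup can be illustrated numerically. This is a minimal sketch that assumes the pipeline advances at the pace of its slowest stage and ignores any overhead between stages; the 60-minute stage is a hypothetical value chosen only to show the trend.

```python
# Sketch: speedup of a pipeline over doing the same tasks one after another.
# Assumes the pipeline advances at the pace of its slowest stage and that
# there is no overhead for moving work between stages.

def pipeline_speedup(num_tasks: int, stage_times: list) -> float:
    unpipelined = num_tasks * sum(stage_times)
    slot = max(stage_times)                                 # slowest stage sets the rate
    pipelined = (len(stage_times) + num_tasks - 1) * slot   # includes fill and drain
    return unpipelined / pipelined


balanced = [30, 30, 30, 30]
unbalanced = [30, 60, 30, 30]   # hypothetical: a slower dryer

print(round(pipeline_speedup(4, balanced), 2))       # 2.29
print(round(pipeline_speedup(1000, balanced), 2))    # 3.99
print(round(pipeline_speedup(1000, unbalanced), 2))  # 2.49
```

With only four loads, fill and drain time keeps the speedup well below the number of stages; with many loads it approaches 4, and the unbalanced case stays well below that.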


MIPS Datapath

§  Datapath contains 5 stages

§  Instruction fetch (IF), Decode (ID), Execute (EX), Memory (Mem), Writeback (W)

[Figure: datapath with PC, Instruction Memory, Registers, ALU, and Data Memory, divided into Stage 1 (IF), Stage 2 (ID), Stage 3 (EX), Stage 4 (Mem), and Stage 5 (W)]

§  Can I pipeline the MIPS stages?


Pipelining Instructions

[Figure: six instructions, each passing through IF, ID, EX, M, W and each starting one cycle after the previous one; horizontal axis: time (in cycles); vertical axis: instruction]

Fetch = 200 ps, Decode = 100 ps, Execute = 200 ps, Memory = 200 ps, Write back = 100 ps

What is the latency for this pipeline?
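One way to work through this question is to compute the clock period and the latency from the stage delays listed above. The sketch assumes that the slowest stage sets the pipelined clock and that pipeline registers add no delay of their own; the variable and function names are illustrative.

```python
# Sketch: clock period and single-instruction latency from the stage delays
# on the slide. Assumes pipeline registers add no delay of their own.
STAGE_PS = {"IF": 200, "ID": 100, "EX": 200, "M": 200, "W": 100}

single_cycle_tc = sum(STAGE_PS.values())           # one long cycle covers all stages
pipelined_tc = max(STAGE_PS.values())              # slowest stage sets the clock
pipelined_latency = len(STAGE_PS) * pipelined_tc   # five cycles per instruction


def pipelined_total_ps(num_instructions: int) -> int:
    """Time to finish n instructions with no stalls (fill time included)."""
    return (len(STAGE_PS) + num_instructions - 1) * pipelined_tc


print(single_cycle_tc, pipelined_tc, pipelined_latency)  # 800 200 1000
print(pipelined_total_ps(3), 3 * single_cycle_tc)        # 1400 2400
```

Under these assumptions one instruction takes 1000 ps in the pipeline, longer than the 800 ps single-cycle time, yet three instructions finish sooner. This matches the earlier lesson that pipelining helps throughput rather than the latency of a single task.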


Pipeline Performance

[Figure: timing diagrams comparing single-cycle execution (Tc = 800 ps) with pipelined execution (Tc = 200 ps)]


Why Pipeline? Because the resources are there!

[Figure: Inst 1 through Inst 5 in instruction order against time (clock cycles); each instruction uses Im, Reg, ALU, Dm, Reg in successive cycles]


MIPS Pipelined Datapath

§  State registers between pipeline stages to isolate them

[Figure: pipelined datapath with PC, Instruction Memory, Register File, Sign Extend (16 to 32 bits), Shift left 2, two adders, ALU, and Data Memory; pipeline registers IFetch/Dec, Dec/Exec, Exec/Mem, and Mem/WB driven by the system clock; stages IF: IFetch, ID: Dec, EX: Execute, MEM: MemAccess, WB: WriteBack; Inst 1 through Inst 5 shown in the stages]


Pipeline Hazards

§  Data hazards: an instruction uses the result of a previous instruction (RAW)

    ADD R1, R2, R3
    SUB R4, R1, R5

  or

    SW R1, 4(R2)
    LW R3, 4(R2)

§  Control hazards: the address of the next instruction to be executed depends on a previous instruction

    BEQ R1,R2,CONT
    SUB R6,R7,R8
    …
    CONT: ADD R3,R4,R5

§  Structural hazards: two instructions need access to the same resource
•  e.g., single memory shared for instruction fetch and load/store
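The register data hazard in the ADD/SUB pair can be found mechanically. The sketch below is one possible way to do it: the tuple format (op, destination, sources) and the `window` parameter are made up for this example, and memory dependences such as the SW/LW pair are not modeled.

```python
# Sketch: find read-after-write (RAW) register dependences in a short
# instruction list. The tuple format (op, dest, sources) is made up for
# this example; memory dependences such as the SW/LW pair are not modeled.

def raw_dependences(program, window=2):
    """Return (producer_index, consumer_index, register) triples.

    `window` is how many following instructions to check; 2 is an assumed
    value, not a number taken from the lecture.
    """
    found = []
    for i, (_, dest, _) in enumerate(program):
        if dest is None:
            continue
        for j in range(i + 1, min(i + 1 + window, len(program))):
            if dest in program[j][2]:
                found.append((i, j, dest))
            if program[j][1] == dest:
                break   # register overwritten: later reads see the new value
    return found


program = [
    ("ADD", "R1", ("R2", "R3")),
    ("SUB", "R4", ("R1", "R5")),
]
print(raw_dependences(program))  # [(0, 1, 'R1')]
```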


Structural Hazard

[Figure: lw followed by Inst 1 through Inst 4, in instruction order against time (clock cycles); each instruction uses Mem, Reg, ALU, Mem, Reg in successive cycles. In the same cycle, one instruction is reading data from memory while another is reading its instruction from memory]

§  Fix with separate instruction and data memories (I$ and D$)
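The cycle in which the conflict occurs can be located with a small calculation. This sketch assumes a single shared memory, the five-stage pipeline with no stalls, and one instruction entering per cycle; the function name is illustrative.

```python
# Sketch: with one shared memory, a load/store in its memory stage collides
# with the fetch of the instruction three places behind it. Assumes the
# five-stage pipeline with no stalls, one instruction entering per cycle.
MEM_STAGE_OFFSET = 3   # IF=0, ID=1, EX=2, MEM=3 cycles after fetch


def memory_conflicts(ops):
    """Return (cycle, op_using_data_memory, op_being_fetched) triples."""
    conflicts = []
    for i, op in enumerate(ops):
        if op not in ("lw", "sw"):
            continue
        fetched = i + MEM_STAGE_OFFSET     # index of the instruction in IF
        if fetched < len(ops):
            cycle = fetched + 1            # cycles numbered from 1
            conflicts.append((cycle, op, ops[fetched]))
    return conflicts


print(memory_conflicts(["lw", "Inst 1", "Inst 2", "Inst 3", "Inst 4"]))
# [(4, 'lw', 'Inst 3')]
```

With separate instruction and data memories the two accesses go to different resources, so the list of conflicts would be empty by construction.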


Data Hazards

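    ADD R1, R2, R3
    SUB R4, R1, R5

[Figure: both instructions pass through F, D, EX, M, W, with SUB one cycle behind ADD; horizontal axis: time (in cycles); vertical axis: instruction. Labels mark "Write Data to R1 Here" for ADD and "Get data from R1 Here" for SUB]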
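The gap between the write and the read can be turned into a stall count. The sketch below assumes no forwarding, that the producer writes the register in W, and that the consumer reads it in D; whether a value written in a cycle can be read in that same cycle is left as a parameter, since the transcript does not say.

```python
# Sketch: stall cycles needed when there is no forwarding. Assumes the
# producer writes the register in W and the consumer reads it in D.
# `same_cycle_ok` is an assumption about the register file: True means a
# value written in a cycle can be read in that same cycle.
STAGES = ["F", "D", "EX", "M", "W"]


def stalls_without_forwarding(distance: int, same_cycle_ok: bool = True) -> int:
    """`distance` is how many instructions later the consumer is (1 = next)."""
    write_cycle = STAGES.index("W")              # producer, relative to its F
    read_cycle = distance + STAGES.index("D")    # consumer, same time base
    earliest_read = write_cycle if same_cycle_ok else write_cycle + 1
    return max(0, earliest_read - read_cycle)


print(stalls_without_forwarding(1))                       # 2
print(stalls_without_forwarding(1, same_cycle_ok=False))  # 3
print(stalls_without_forwarding(3))                       # 0
```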


Additional Way to “Fix” a Data Hazard: by forwarding

[Figure: instruction order against time for add $1,…; sub $4,$1,$5; and $6,$1,$7; xor $4,$1,$5; or $8,$1,$9; each instruction uses IM, Reg, ALU, DM, Reg in successive cycles]


Internal data forwarding

[Figure: instruction order against time for add $1,…; sub $4,$1,$5; and $6,$1,$7; xor $4,$1,$5; or $8,$1,$9; each instruction uses IM, Reg, ALU, DM, Reg in successive cycles]

Fix data hazards by forwarding results to where they are needed

ALU-to-ALU forwarding vs. full forwarding
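Which path supplies $1 to each consumer depends on how far it is behind the add. The sketch below is a simplified model, not the lecture's forwarding logic: it uses the pipeline register names Exec/Mem and Mem/WB, assumes the register file can be written and read in the same cycle, and takes the instruction order as it appears in this transcript.

```python
# Sketch: where each consumer of $1 gets its value when the producer is an
# ALU instruction. Assumes the register file can be written and read in the
# same cycle, so a consumer three instructions later needs no forwarding.

def operand_source(distance: int) -> str:
    """`distance` is how many instructions after the producer (1 = next)."""
    if distance == 1:
        return "forward from Exec/Mem"
    if distance == 2:
        return "forward from Mem/WB"
    return "register file"


# Order as listed in the transcript; treat it as an assumption.
consumers = ["sub $4,$1,$5", "and $6,$1,$7", "xor $4,$1,$5", "or $8,$1,$9"]
for distance, instr in enumerate(consumers, start=1):
    print(f"{instr}: {operand_source(distance)}")
```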


Forwarding with Load-use Data Hazards

[Figure: instruction order against time for lw $1,4($2); sub $4,$1,$5; and $6,$1,$7; xor $4,$1,$5; or $8,$1,$9; each instruction uses IM, Reg, ALU, DM, Reg in successive cycles]

§  sub needs to stall

§  Will still need one stall cycle even with forwarding
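The single stall cycle follows from when the loaded value exists and when sub needs it. This sketch assumes full forwarding, that an ALU result is ready at the end of EX, that a loaded value is ready at the end of the memory stage, and that the consumer needs its operand at the start of its own EX; the function name is illustrative.

```python
# Sketch: stall cycles needed with full forwarding. Assumes an ALU result
# is ready at the end of EX, a loaded value at the end of the memory stage,
# and the consumer needs its operand at the start of its own EX.
STAGES = ["IF", "ID", "EX", "MEM", "WB"]


def stalls_with_forwarding(producer_op: str, distance: int) -> int:
    """`distance` is how many instructions later the consumer is (1 = next)."""
    ready_stage = "MEM" if producer_op == "lw" else "EX"
    ready_after = STAGES.index(ready_stage)      # cycle in which the value is produced
    needed_in = distance + STAGES.index("EX")    # consumer's EX cycle, same time base
    return max(0, ready_after + 1 - needed_in)


print(stalls_with_forwarding("lw", 1))   # 1  (lw followed directly by sub)
print(stalls_with_forwarding("lw", 2))   # 0  (one instruction in between)
print(stalls_with_forwarding("add", 1))  # 0  (ALU result forwarded in time)
```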


Control Hazard

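    JR R25
    ...
    XX: ADD ...

[Figure: both instructions pass through F, D, EX, M, W, with the second one cycle behind the first; horizontal axis: time (in cycles); vertical axis: instruction. Labels mark "Destination Available Here" and "Need Destination Here"]

Simple solution: flush instruction fetch until the branch is resolved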
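The cost of waiting depends on the stage in which the destination becomes known. The sketch below leaves that stage as a parameter because the transcript does not state it; the function name is illustrative.

```python
# Sketch: cycles lost per jump or branch when fetch waits for the target.
# The stage in which the target becomes known is a parameter because the
# slide text does not say which stage it is.
STAGES = ["F", "D", "EX", "M", "W"]


def fetch_bubbles(resolve_stage: str) -> int:
    """Cycles in which no useful instruction can be fetched.

    Without waiting, the next fetch happens one cycle after the jump's own
    fetch; with waiting, it happens one cycle after `resolve_stage`.
    """
    return STAGES.index(resolve_stage)


print(fetch_bubbles("D"), fetch_bubbles("EX"), fetch_bubbles("M"))  # 1 2 3
```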


Summary

§  Pipelined processors are fundamental.
•  Spend the time to understand why pipelining is important

§  The use of pipelining greatly improves microprocessor performance
•  The “clock” for microprocessors is about 3 GHz today

§  Hazards can be a difficult concept
•  Convince yourself with examples
•  Next time: Control hazards!