Anne Bracy, CS 3410 • Add pipeline registers (flip-flops) for isolation • Each stage begins by...

Post on 30-Jun-2020


Anne Bracy, CS 3410

Computer Science, Cornell University

The slides are the product of many rounds of teaching CS 3410 by Professors Weatherspoon, Bala, Bracy, McKee, and Sirer.

Single-cycle:
insn0.fetch, dec, exec    insn1.fetch, dec, exec

Pipelined:
insn0.fetch    insn0.dec      insn0.exec
               insn1.fetch    insn1.dec      insn1.exec

5-stage Pipeline
• Implementation
• Working Example

Hazards
• Structural
• Data Hazards
• Control Hazards

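[Figure: five-stage pipelined datapath. Stages: Instruction Fetch, Instruction Decode, Execute, Memory, Write-Back. Pipeline registers: IF/ID, ID/EX, EX/MEM, MEM/WB. Components: PC with +4 adder (new pc), instruction memory (inst), control (ctrl), register file (A, B), extend (imm), ALU, compute jump/branch targets, data memory (addr, din, dout).]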

Cycle   1    2    3    4    5    6    7    8    9
add     IF   ID   EX   MEM  WB
nand         IF   ID   EX   MEM  WB
lw                IF   ID   EX   MEM  WB
add                    IF   ID   EX   MEM  WB
sw                          IF   ID   EX   MEM  WB

Latency: 5 cycles
Throughput: 1 insn/cycle
CPI = 1
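The cycle counts in the timing diagram above can be checked with a small calculation. This is a minimal sketch, assuming an ideal pipeline with no stalls; the function names are illustrative and do not come from the slides.

```python
def pipelined_cycles(num_insns: int, num_stages: int = 5) -> int:
    """Cycles to finish num_insns in an ideal pipeline (assumes no stalls)."""
    if num_insns == 0:
        return 0
    # The first instruction needs num_stages cycles; each later one
    # finishes one cycle after its predecessor.
    return num_stages + (num_insns - 1)


def unpipelined_cycles(num_insns: int, num_stages: int = 5) -> int:
    """Cycles if each instruction runs every stage before the next starts."""
    return num_insns * num_stages


if __name__ == "__main__":
    print(pipelined_cycles(5))    # 9, the nine cycles in the diagram
    print(unpipelined_cycles(5))  # 25
```

For long instruction streams the pipelined count approaches one cycle per instruction, which is where CPI = 1 comes from.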

• Break datapath into multiple cycles (here 5)
  • Parallel execution increases throughput
  • Balanced pipeline very important
    • Slowest stage determines clock rate
    • Imbalance kills performance
• Add pipeline registers (flip-flops) for isolation
  • Each stage begins by reading values from latch
  • Each stage ends by writing values to latch
• Resolve hazards

Stage: Fetch
  Perform functionality: Use PC to index Program Memory, increment PC
  Latch values of interest: Instruction bits (to be decoded), PC+4 (to compute branch targets)

Stage: Decode
  Perform functionality: Decode instruction, generate control signals, read register file
  Latch values of interest: Control information, Rd index, immediates, offsets, register values (Ra, Rb), PC+4 (to compute branch targets)

Stage: Execute
  Perform functionality: Perform ALU operation. Compute targets (PC+4+offset, etc.) in case this is a branch, decide if branch taken
  Latch values of interest: Control information, Rd index, etc. Result of ALU operation, value in case this is a store instruction

Stage: Memory
  Perform functionality: Perform load/store if needed, address is ALU result
  Latch values of interest: Control information, Rd index, etc. Result of load, pass result from execute

Stage: Writeback
  Perform functionality: Select value, write to register file

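Stage 1: Instruction Fetch
[Figure: PC addresses instruction memory (mc = 00, read word); inst and PC+4 are latched into IF/ID; a pc-sel mux chooses the next PC.]
Next-PC choices:
• PC+4
• pc-reg (PC from register: JR)
• pc-rel (PC-relative: BEQ, BNE)
• pc-abs (PC absolute: J and JAL)

Stage 2: Instruction Decode
[Figure: IF/ID supplies inst and PC+4; decode produces ctrl; register file (read ports Ra, Rb; write port WE, Rd, D) produces A and B; extend produces imm; ID/EX latches ctrl, PC+4, A, B, imm. result and dest return from write-back; pc-rel and pc-abs leave this stage.]

Stage 3: Execute
[Figure: ID/EX supplies ctrl, PC+4, A, B, imm; the ALU and a target adder compute D and target; branch? drives pc-sel; pc-reg leaves this stage; EX/MEM latches ctrl, D, B.]

Stage 4: Memory
[Figure: EX/MEM supplies ctrl, D, B, target; data memory (addr, din, dout, mc); branch? and pc-sel select among pc-rel, pc-abs, pc-reg; MEM/WB latches ctrl, D, M.]

Stage 5: Write-Back
[Figure: MEM/WB supplies ctrl, M, D; the selected result and dest go back to the register file.]

[Figure: complete five-stage datapath with IF/ID, ID/EX, EX/MEM, MEM/WB, showing OP, Rd, Rt, PC+4, imm, target and ctrl carried through the pipeline registers.]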

Consider a non-pipelined processor with clock period C (e.g., 50 ns). If you divide the processor into N stages (e.g., 5), your new clock period will be:

A. C
B. N
C. less than C/N
D. C/N
E. greater than C/N
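One way to reason about this question is to model the clock period as the slowest stage plus the delay of the pipeline register. The sketch below is only an illustration: the stage latencies and the 1 ns latch overhead are made-up numbers, not values from the slides.

```python
def pipelined_clock_period(stage_latencies_ns, latch_overhead_ns):
    """Clock period is set by the slowest stage plus the latch overhead."""
    return max(stage_latencies_ns) + latch_overhead_ns


# Hypothetical splits of a 50 ns datapath into 5 stages.
BALANCED = [10.0, 10.0, 10.0, 10.0, 10.0]
UNBALANCED = [8.0, 9.0, 14.0, 12.0, 7.0]   # same 50 ns total, uneven

if __name__ == "__main__":
    print(pipelined_clock_period(BALANCED, 1.0))    # 11.0
    print(pipelined_clock_period(UNBALANCED, 1.0))  # 15.0
    print(50.0 / 5)                                 # 10.0, the ideal C/N
```

Under these assumptions, any latch overhead or stage imbalance pushes the period above C/N.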

• Instructions same length
  • 32 bits, easy to fetch and then decode
• 3 types of instruction formats
  • Easy to route bits between stages
  • Can read a register source before even knowing what the instruction is
• Memory access through lw and sw only
  • Access memory after ALU


add  r3 ← r1, r2
nand r6 ← r4, r5
lw   r4 ← 20(r2)
add  r5 ← r2, r5
sw   r7 → 12(r3)

Assume 8-register machine

Working example: pipeline snapshots at Time 0 through Time 9. Slides thanks to Sally McKee.

[Figure, repeated for each time step: datapath with PC, +4 adder, register file, extend, MUXes, ALU, data memory, and pipeline registers IF/ID, ID/EX, EX/MEM, MEM/WB. Instruction fields Bits 26-31, Bits 16-20 and Bits 11-15 feed decode; data and dest return to the register file.]

Initial register file: R0 = 0, R1 = 36, R2 = 9, R3 = 12, R4 = 18, R5 = 7, R6 = 41, R7 = 22

Time   IF            ID            EX            MEM           WB
0      nop           nop           nop           nop           nop
1      add 3 1 2     nop           nop           nop           nop
2      nand 6 4 5    add 3 1 2     nop           nop           nop
3      lw 4 20(2)    nand 6 4 5    add 3 1 2     nop           nop
4      add 5 2 5     lw 4 20(2)    nand 6 4 5    add 3 1 2     nop
5      sw 7 12(3)    add 5 2 5     lw 4 20(2)    nand 6 4 5    add 3 1 2
6      nop           sw 7 12(3)    add 5 2 5     lw 4 20(2)    nand 6 4 5
7      nop           nop           sw 7 12(3)    add 5 2 5     lw 4 20(2)
8      nop           nop           nop           sw 7 12(3)    add 5 2 5
9      nop           nop           nop           nop           sw 7 12(3)

No more instructions are fetched from Time 6 onward.

Values computed along the way:
• add 3 1 2: 36 + 9 = 45, written to R3
• nand 6 4 5: 18 = 010010, 7 = 000111, nand = 111101 = -3, written to R6
• lw 4 20(2): address 20 + 9 = 29, loaded value 99, written to R4
• add 5 2 5: 9 + 7 = 16, written to R5
• sw 7 12(3): address 12 + 45 = 57, stored value 22 (R7)

Final register file: R0 = 0, R1 = 36, R2 = 9, R3 = 45, R4 = 99, R5 = 16, R6 = -3, R7 = 22
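The register values in the working example can be reproduced by running the five instructions one after another. This is a minimal functional sketch, not a model of the pipeline itself. The initial register values and the memory word at address 29 are taken from the snapshot slides; the helper names are made up for illustration.

```python
def to_signed(value: int, bits: int = 32) -> int:
    """Interpret the low `bits` bits of value as two's complement."""
    value &= (1 << bits) - 1
    return value - (1 << bits) if value >> (bits - 1) else value


def run_example():
    # Initial register values R0..R7 as shown on the Time 0 slide.
    r = [0, 36, 9, 12, 18, 7, 41, 22]
    mem = {29: 99}  # the only memory word the example loads

    r[3] = r[1] + r[2]                  # add  r3 <- r1, r2
    r[6] = to_signed(~(r[4] & r[5]))    # nand r6 <- r4, r5
    r[4] = mem[20 + r[2]]               # lw   r4 <- 20(r2)
    r[5] = r[2] + r[5]                  # add  r5 <- r2, r5
    mem[12 + r[3]] = r[7]               # sw   r7 -> 12(r3)
    return r, mem


if __name__ == "__main__":
    regs, mem = run_example()
    print(regs)     # [0, 36, 9, 45, 99, 16, -3, 22]
    print(mem[57])  # 22
```

Because no instruction here reads a register that one of the three instructions just ahead of it writes, the pipelined run gives the same result as this sequential one.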

Pipelining is great because:

A. You can fetch and decode the same instruction at the same time.
B. You can fetch two instructions at the same time.
C. You can fetch one instruction while decoding another.
D. Instructions only need to visit the pipeline stages that they require.
E. C and D


Correctness problems associated w/ processor design

1. Structural hazards
   Same resource needed for different purposes at the same time (Possible: ALU, Register File, Memory)

2. Data hazards
   Instruction output needed before it's available

3. Control hazards
   Next instruction PC unknown at time of Fetch

[Figure: datapath with inst mem, register file (A, B, D) and data mem.]

add r3, r2, r1   IF   ID   Ex   M    W
nop                   IF   ID   Ex   M    W
nop                        IF   ID   Ex   M    W
add r6, r5, r4                  IF   ID   Ex   M    W

Problem: Need to read from and write to Register File at the same time
Solution: negate RF clock: write first half, read second half

Dependence: relationship between two insns
• Data: two insns use same storage location
• Control: 1 insn affects whether another executes at all
• Not a bad thing, programs would be boring otherwise
• Enforced by making older insn go before younger one
  – Happens naturally in single-/multi-cycle designs
  – But not in a pipeline

Hazard: dependence & possibility of wrong insn order
• Effects of wrong insn order cannot be externally visible
• Hazards are a bad thing: most solutions either complicate the hardware or reduce performance

Data Hazards
• register file (RF) reads occur in stage 2 (ID)
• RF writes occur in stage 5 (WB)
• RF written in first ½ of cycle, read in second ½ of cycle

x10: add r3 ← r1, r2
x14: sub r5 ← r3, r4

1. Is there a dependence?
2. Is there a hazard?

A) Yes
B) No
C) Cannot tell with the information given.

Which of the following statements is true?

A. Whether there is a data dependence between two instructions depends on the machine the program is running on.
B. Whether there is a data hazard between two instructions depends on the machine the program is running on.
C. Both A & B
D. Neither A nor B

[Pipeline diagram, shown on four consecutive slides with dependence arrows marked on it.]

Clock cycle      1    2    3    4    5    6    7    8    9
add r3, r1, r2   IF   ID   EX   MEM  WB
sub r5, r3, r4        IF   ID   EX   MEM  WB
lw  r6, 4(r3)              IF   ID   EX   MEM  WB
or  r5, r3, r5                  IF   ID   EX   MEM  WB
sw  r6, 12(r3)                       IF   ID   EX   MEM  WB

backwards arrows require time travel

Detecting Data Hazards

[Figure: five-stage datapath (IF/ID, ID/EX, EX/MEM, MEM/WB) with comparators checking IF/ID.Ra against the Rd fields of later stages, plus an IF/ID.Ra ≠ 0 check.]

add r3, r1, r2
sub r5, r3, r4

Stall = (IF/ID.Ra != 0 && (IF/ID.Ra == ID/EX.Rd || IF/ID.Ra == EX/M.Rd))
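The stall condition above translates directly into a boolean function. A minimal sketch, assuming register fields are plain integers and that a bubble (nop) carries Rd = 0; like the slide, it checks only the Ra source, and the function name is illustrative.

```python
def must_stall(if_id_ra: int, id_ex_rd: int, ex_m_rd: int) -> bool:
    """Stall condition from the slide, for source register Ra only.

    Register 0 never causes a stall, and a nop is assumed to have Rd = 0.
    """
    return if_id_ra != 0 and (if_id_ra == id_ex_rd or if_id_ra == ex_m_rd)


if __name__ == "__main__":
    # sub r5, r3, r4 sits in ID while add r3, r1, r2 moves down the pipe.
    print(must_stall(3, 3, 0))  # add in EX           -> True
    print(must_stall(3, 0, 3))  # add in MEM, nop in EX -> True
    print(must_stall(3, 0, 0))  # add in WB           -> False
```

The last case relies on the register file being written in the first half of the cycle and read in the second half, so no check against MEM/WB is needed.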

1. Do Nothing
   • Change the ISA to match implementation
   • "Hey compiler: don't create code w/ data hazards!"
   (We can do better than this)

2. Stall
   • Pause current and subsequent instructions till safe

3. Forward/bypass
   • Forward data value to where it is needed
   (Only works if value actually exists already)

How to stall an instruction in ID stage
• prevent IF/ID pipeline register update
  – stalls the ID stage instruction
• convert ID stage insn into nop for later stages
  – innocuous "bubble" passes through pipeline
• prevent PC update
  – stalls the next (IF stage) instruction

[Figure: five-stage datapath with a detect hazard unit in the decode stage.]

add r3, r1, r2
sub r5, r3, r5
or  r6, r3, r4
add r6, r3, r8

If hazard: WE = 0, MemWr = 0, RegWr = 0

NOP = If (IF/ID.rA ≠ 0 && (IF/ID.rA == ID/Ex.Rd || IF/ID.rA == Ex/M.Rd))

[Figure, three snapshots: datapath with a /stall signal on the PC and IF/ID write enables (WE) and a nop injected into ID/Ex (MemWr = 0, RegWr = 0).]

Snapshot 1: add r3, r1, r2 is ahead of sub r5, r3, r5, which is in decode; or r6, r3, r4 is held in fetch (WE = 0). STALL CONDITION MET
Snapshot 2: a nop (MemWr = 0, RegWr = 0) follows add r3, r1, r2; sub r5, r3, r5 is still in decode; or r6, r3, r4 is still held (WE = 0). STALL CONDITION MET
Snapshot 3: two nops follow add r3, r1, r2; or r6, r3, r4 is released (WE = 1). NO STALL CONDITION MET: sub allowed to leave decode stage

Clock cycle      1    2    3    4    5    6    7    8
add r3, r1, r2   IF   ID   Ex   M    W
sub r5, r3, r5        IF   ID*  ID*  ID   Ex   M    W
or  r6, r3, r4             IF*  IF*  IF   ID   Ex   M
add r6, r3, r8                            IF   ID   Ex

[Annotations: r3 = 10 and r3 = 20, the value of r3 before and after add writes back.]

2 Stall Cycles
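The two stall cycles can be derived mechanically. The sketch below is a simplified scheduling model under stated assumptions: no forwarding, in-order issue, and a register file written in the first half of a cycle and read in the second half, so a consumer may finish ID in the same cycle its producer is in W. The function name and the tuple encoding are made up for illustration.

```python
def schedule_with_stalls(program):
    """Return (id_done, wb): the cycle each instruction finishes ID and W.

    program: list of (dest, sources) register-number tuples, in order.
    Assumes no forwarding and a write-first-half, read-second-half
    register file.
    """
    id_done, wb = [], []
    for i, (_dest, sources) in enumerate(program):
        # Without hazards, ID completes one cycle after the previous ID.
        earliest = 2 if i == 0 else id_done[i - 1] + 1
        for j in range(i):
            producer_dest = program[j][0]
            if producer_dest != 0 and producer_dest in sources:
                earliest = max(earliest, wb[j])  # wait for the write-back
        id_done.append(earliest)
        wb.append(earliest + 3)  # Ex, M, W follow ID back to back
    return id_done, wb


# add r3,r1,r2 ; sub r5,r3,r5 ; or r6,r3,r4 ; add r6,r3,r8
PROGRAM = [(3, (1, 2)), (5, (3, 5)), (6, (3, 4)), (6, (3, 8))]

if __name__ == "__main__":
    print(schedule_with_stalls(PROGRAM))  # ([2, 5, 6, 7], [5, 8, 9, 10])
```

In this model sub finishes ID in cycle 5 instead of cycle 3, which is the two-cycle stall in the diagram; the later instructions are pushed back by the same two cycles.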

1. Do Nothing
   • Change the ISA to match implementation
   • "Compiler: don't create code with data hazards!"
   (Nice try, we can do better than this)

2. Stall
   • Pause current and subsequent instructions till safe

3. Forward/bypass
   • Forward data value to where it is needed
   (Only works if value actually exists already)

[Figure: five-stage datapath (IF/ID, ID/Ex, Ex/Mem, Mem/WB) with a forward unit and a detect hazard unit; Ra, Rb, Rd, WE and MC are carried through the pipeline registers.]

Two types of forwarding/bypass
• Forwarding from Ex/Mem registers to Ex stage (M → Ex)
• Forwarding from Mem/WB register to Ex stage (W → Ex)

[Figure: datapath with a bypass from the Ex/Mem register back to the ALU inputs.]

add r3, r1, r2   IF   ID   Ex   M    W
sub r5, r3, r1        IF   ID   Ex   M    W

Problem: EX needs ALU result that is in MEM stage
Solution: add a bypass from EX/MEM.D to start of EX

Detection Logic in Ex Stage:
forward = (Ex/M.WE && Ex/M.Rd != 0 &&
           ID/Ex.Ra == Ex/M.Rd)
          || (same for Rb)

[Figure: datapath with a bypass from the Mem/WB register back to the ALU inputs.]

add r3, r1, r2   IF   ID   Ex   M    W
sub r5, r3, r1        IF   ID   Ex   M    W
or  r6, r3, r4             IF   ID   Ex   M    W

Problem: EX needs value being written by WB
Solution: Add bypass from WB final value to start of EX

Detection Logic:
forward = (M/WB.WE && M/WB.Rd != 0 &&
           ID/Ex.Ra == M/WB.Rd &&
           not (Ex/M.WE && Ex/M.Rd != 0 &&
                ID/Ex.Ra == Ex/M.Rd))
          || (same for Rb)
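The two detection conditions can be combined into one priority choice per ALU operand: take the Ex/Mem value if it matches, otherwise the Mem/WB value, otherwise the value read from the register file. A minimal sketch under that reading; the function name and the string labels are illustrative, not signals from the slides.

```python
def forward_select(src, ex_m_we, ex_m_rd, m_wb_we, m_wb_rd):
    """Choose the source of one ALU operand in the Ex stage.

    src is ID/Ex.Ra (or ID/Ex.Rb). Returns "EX/M", "M/WB", or "RF"
    (the value read from the register file in decode).
    """
    if ex_m_we and ex_m_rd != 0 and src == ex_m_rd:
        return "EX/M"   # most recent producer takes priority
    if m_wb_we and m_wb_rd != 0 and src == m_wb_rd:
        return "M/WB"
    return "RF"


if __name__ == "__main__":
    # sub r5, r3, r1 in Ex while add r3, r1, r2 is in M:
    print(forward_select(3, True, 3, False, 0))  # EX/M
    # or r6, r3, r4 in Ex, sub r5 in M, add r3 in W:
    print(forward_select(3, True, 5, True, 3))   # M/WB
```

Checking Ex/Mem first is what the `not (...)` term in the second detection condition expresses: when both later stages hold a value for the same register, the younger one is the correct one.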

Clock cycle      1    2    3    4    5    6    7    8    9
add r3, r1, r2   IF   ID   Ex   M    W
sub r5, r3, r4        IF   ID   Ex   M    W
lw  r6, 4(r3)              IF   ID   Ex   M    W
or  r5, r3, r5                  IF   ID   Ex   M    W
sw  r6, 12(r3)                       IF   ID   Ex   M    W

Data dependency after a load instruction:
• Value not available until after the M stage
→ Next instruction cannot proceed if dependent

THE KILLER HAZARD

[Figure, several snapshots: datapath with inst mem, register file (A, B, D) and data mem, showing lw r4, 20(r8) followed by a dependent or. The slides label the dependent instruction both as or r6, r3, r4 and as or r6, r4, r1; in both forms it reads r4.]

lw r4, 20(r8)
or r6, r4, r1

lw r4, 20(r8)   IF   ID   Ex   M    W
or r6, r3, r4        IF   ID*  ID   Ex   M    W
                          Stall

During the stall a NOP sits between or r6, r4, r1 and lw r4, 20(r8) in the pipeline.

[Figure: datapath (IF/ID, ID/Ex, Ex/Mem, Mem/WB) with forward unit and detect hazard unit; the detect hazard unit reads Rd and MC from ID/Ex.]

Stall = If (ID/Ex.MemRead && IF/ID.Ra == ID/Ex.Rd)
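The load-use stall check can be sketched as a small function. The slide's condition tests only Ra; the version below also tests Rb and ignores register 0, which are assumptions added here for illustration, as is the function name.

```python
def load_use_stall(id_ex_mem_read, id_ex_rd, if_id_ra, if_id_rb):
    """True when the instruction in ID needs the result of a load in Ex.

    Assumptions beyond the slide: Rb is checked as well as Ra, and a
    load targeting register 0 never causes a stall.
    """
    return bool(id_ex_mem_read
                and id_ex_rd != 0
                and id_ex_rd in (if_id_ra, if_id_rb))


if __name__ == "__main__":
    # lw r4, 20(r8) in Ex, or r6, r4, r1 in ID:
    print(load_use_stall(True, 4, 4, 1))   # True
    # Same load, but the next instruction reads r1 and r2 only:
    print(load_use_stall(True, 4, 1, 2))   # False
```

After the one-cycle stall the loaded value is in Mem/WB, so the W → Ex bypass can deliver it.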

Most frequent 3410 non-solution to load-use hazards
Why is this "solution" so so so so so so awful?

[Figure: datapath (IF/ID, ID/Ex, Ex/Mem, Mem/WB) with forward unit and detect hazard unit.]

Forwarding values directly from Memory to the Execute stage without storing them in a register first:

A. Does not remove the need to stall.
B. Adds one too many possible inputs to the ALU.
C. Will cause the pipeline register to have the wrong value.
D. Halves the frequency of the processor.
E. Both A & D

Two MIPS Solutions:
• MIPS 2000/3000: delay slot
  – ISA says results of loads are not available until one cycle later
  – Assembler inserts nop, or reorders to fill delay slot
• MIPS 4000 onwards: stall
  – But really, programmer/compiler reorders to avoid stalling in the load delay slot


for (i = 0; i < max; i++) {
    n += 2;
}
i = 7;
n--;

Assume: i → r1, n → r2, max → r3

x10        addi r1, r0, 0     # i = 0
x14 Loop:  addi r2, r2, 2     # n += 2
x18        addi r1, r1, 1     # i++
x1C        blt  r1, r3, Loop  # i < max?
x20        addi r1, r0, 7     # i = 7
x24        subi r2, r2, 1     # n--

Control Hazards
• instructions are fetched in stage 1 (IF)
• branch and jump decisions occur in stage 3 (EX)
→ next PC not known until 2 cycles after branch/jump

x1C        blt  r1, r3, Loop
x20        addi r1, r0, 7
x24        subi r2, r2, 1

Branch not taken? No Problem!

Branch taken? Just fetched 2 addi's → Zap & Flush

[Figure, shown on two slides: datapath with PC, +4, inst mem, register file and data mem; branch calc and decide branch logic; annotation New PC = 1C.]

1C  blt  r1, r3, L      IF   ID   Ex   M    W
20  addi r1, r0, 7           IF   ID   NOP  NOP  NOP
24  subi r2, r2, 1                IF   NOP  NOP  NOP  NOP
14  L: addi r2, r2, 2                  IF   ID   Ex   M    W

If branch Taken → Zap
• prevent PC update
• clear IF/ID latch
• branch continues

For every taken branch? OUCH!!!
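To get a feel for the cost, the flushed cycles for the example loop can be counted. A minimal sketch, assuming 2 zapped instructions per taken branch when the branch resolves in EX; the function names are illustrative. The loop follows the assembly version, which tests i < max after the body.

```python
def taken_branches(max_iters: int) -> int:
    """Count taken executions of `blt r1, r3, Loop` in the example loop."""
    i, taken = 0, 0
    while True:
        i += 1                # body runs (n += 2), then i++
        if i < max_iters:     # blt r1, r3, Loop
            taken += 1
        else:
            return taken


def flush_penalty(max_iters: int, cycles_per_taken: int = 2) -> int:
    """Cycles lost to zapped instructions over the whole loop."""
    return taken_branches(max_iters) * cycles_per_taken


if __name__ == "__main__":
    print(taken_branches(10))    # 9
    print(flush_penalty(10))     # 18 cycles lost with EX-stage resolution
    print(flush_penalty(10, 1))  # 9 if only one instruction is zapped
```

Each iteration is only three instructions long, so losing two cycles on almost every iteration is a large fraction of the loop's run time.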

1. Delay Slot
   • You MUST do this
   • MIPS ISA: 1 insn after ctrl insn always executed
     • Whether branch taken or not

2. Resolve Branch at Decode
   • Some groups do this for Project 2, your choice
   • Move branch calc from EX to ID
   • Alternative: just zap 2nd instruction when branch taken

3. Branch Prediction
   • Not in 3410, but every processor worth anything does this
   (no offense!)

[Figure: datapath with PC, +4, inst mem, register file and data mem; branch calc and decide branch logic; annotation New PC = 1C.]

1C  blt  r1, r3, Loop   F   D   X
20  addi r1, r0, 7          F   D
24  subi r2, r2, 1              F

for (i = 0; i < max; i++) {
    n += 2;
}
i = 7;
n--;

Assume: i → r1, n → r2, max → r3

x10        addi r1, r0, 0     # i = 0
x14 Loop:  addi r2, r2, 2     # n += 2
x18        addi r1, r1, 1     # i++
x1C        blt  r1, r3, Loop  # i < max?
x20        nop
x24        addi r1, r0, 7     # i = 7
x28        subi r2, r2, 1     # n--

[Figure, two slides: datapath with PC, +4, inst mem, register file and data mem; branch calc and decide branch logic; annotation New PC = 1C.]

Branch not taken:
1C  blt  r1, r3, Loop   F   D   X
20  nop                     F   D
24  addi r1, r0, 7              F

Branch taken:
1C  blt  r1, r3, Loop          F   D   X
20  nop                            F   D
14  Loop: addi r2, r2, 2               F

x10        addi r1, r0, 0     # i = 0
x14 Loop:  addi r2, r2, 2     # n += 2
x18        addi r1, r1, 1     # i++
x1C        blt  r1, r3, Loop  # i < max?
x20        nop

Compiler transforms code

x10        addi r1, r0, 0     # i = 0
x14 Loop:  addi r1, r1, 1     # i++
x18        blt  r1, r3, Loop  # i < max?
x1C        addi r2, r2, 2     # n += 2
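That the reordered loop computes the same values can be checked with a toy interpreter that implements delay-slot semantics: the instruction after a branch always executes, and a taken branch redirects the PC only afterwards. This is a minimal sketch for illustration. The tuple encoding, the use of list indices instead of addresses, and the names are all assumptions, and it does not handle a branch placed in a delay slot.

```python
def run(program, regs, max_steps=10_000):
    """Execute (op, ...) tuples with a one-instruction branch delay slot.

    ops: ("addi", rd, rs, imm), ("blt", rs, rt, target_index), ("nop",)
    """
    regs = list(regs)
    pc, pending = 0, None           # pending: taken-branch target
    for _ in range(max_steps):
        if pc >= len(program):
            return regs
        ins = program[pc]
        next_pc = pc + 1
        if pending is not None:     # this instruction is the delay slot
            next_pc, pending = pending, None
        if ins[0] == "addi":
            regs[ins[1]] = regs[ins[2]] + ins[3]
        elif ins[0] == "blt" and regs[ins[1]] < regs[ins[2]]:
            pending = ins[3]        # redirect after the delay slot
        pc = next_pc
    raise RuntimeError("step limit reached")


# Loop with a nop in the delay slot (index 1 is the Loop label).
WITH_NOP = [("addi", 1, 0, 0), ("addi", 2, 2, 2), ("addi", 1, 1, 1),
            ("blt", 1, 3, 1), ("nop",)]
# Compiler-reordered loop: n += 2 fills the delay slot.
REORDERED = [("addi", 1, 0, 0), ("addi", 1, 1, 1),
             ("blt", 1, 3, 1), ("addi", 2, 2, 2)]

if __name__ == "__main__":
    start = [0, 0, 0, 3, 0, 0, 0, 0]   # max = 3 in r3, n = 0 in r2
    print(run(WITH_NOP, start))
    print(run(REORDERED, start))
```

With max = 3 both versions should leave r1 = 3 and r2 = 6, while the reordered one executes one fewer instruction per iteration.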

[Figure: datapath with PC, +4, inst mem, register file and data mem; branch calc and decide branch logic; annotation New PC = 1C.]

1C  blt  r1, r3, Loop          F   D   X
20  addi r2, r2, 2                 F   D
14  Loop: addi r1, r1, 1               F

Most processors support Speculative Execution
• Guess direction of the branch
  – Allow instructions to move through pipeline
  – Zap them later if guess turns out to be wrong
• A must for long pipelines

Data hazards occur when an operand (register) depends on the result of a previous instruction that may not be computed yet. Pipelined processors need to detect data hazards.

Stalling, preventing a dependent instruction from advancing, is one way to resolve data hazards. Stalling introduces NOPs ("bubbles") into a pipeline. Introduce NOPs by (1) preventing the PC from updating, (2) preventing the IF/ID registers from changing, and (3) preventing writes to memory and register file. NOPs significantly decrease performance.

Forwarding bypasses some pipeline stages, forwarding a result to a dependent instruction operand (register). Better performance than stalling.

Control hazards occur because the PC following a control instruction is not known until the control instruction is executed. If branch is taken → need to zap instructions. 1 cycle performance penalty when the branch is resolved in ID (2 cycles when it is resolved in Ex).

Delay slots can potentially recover performance lost to control hazards. The instruction in the delay slot will always be executed. Requires software (compiler) to make use of delay slot. Put nop in delay slot if not able to put useful instruction in delay slot.

We can reduce cost of a control hazard by moving branch decision and calculation from Ex stage to ID stage. With a delay slot, this removes the need to flush instructions on taken branches.
