MIPS 5-stage Pipeline (Lecture 2.2, August 22nd, 2017)
HEIG-VD/SNU Summer University 2017: How Modern Processors Work?
Jae W. Lee ([email protected]), Computer Science and Engineering, Seoul National University
Download the lecture slides at https://goo.gl/rJPMQU
Slide credits: [CS:APP3e] slides from CMU; [COD5e] slides from Elsevier Inc.

Upload: lamphuc

Post on 06-Sep-2018


TRANSCRIPT


Review: MIPS ISA
- 32-bit fixed-format instructions (3 formats)
- 32 32-bit general-purpose registers (GPRs)
  - Registers R0-R31
  - R0 contains zero
- 32 single-precision floating-point registers (FPRs)
  - Registers F0-F31
  - Paired to form 16 double-precision FPRs (F0/F1, F2/F3, and so on)
- 3-operand, reg-reg arithmetic instructions
  - e.g., add R1, R1, R2   # R1 = R1 + R2
- Single addressing mode for load/store: base + displacement
  - No indirection
- Simple branch conditions
  - Use a register (R1~R31) to store the flag

Review: 3 MIPS Instruction Formats

R-Format: Register-Register (e.g., add Rd, Rs, Rt)
  bits    31-26   25-21   20-16   15-11   10-6    5-0
  field   Op      Rs      Rt      Rd      Shamt   funct

I-Format: Register-Immediate (e.g., lw Rt, imm(Rs)) and Branch (e.g., beq Rs, Rt, imm)
  bits    31-26   25-21   20-16   15-0
  field   Op      Rs      Rt      immediate

J-Format: Jump/Call (e.g., j target)
  bits    31-26   25-0
  field   Op      target
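The bit positions above can be turned into a small field extractor. This is a minimal sketch, not part of the lecture: the function names `decode` and `sign_extend16` are made up for illustration, and the caller is assumed to know which format applies and to read only the fields that format defines.

```python
def sign_extend16(value):
    """Interpret a 16-bit field as a signed two's-complement number."""
    return value - 0x10000 if value & 0x8000 else value


def decode(word):
    """Split a 32-bit MIPS instruction word into every field of the three formats.

    All fields are extracted unconditionally; which ones are meaningful
    depends on the format (R, I or J) selected by the opcode.
    """
    return {
        "op":     (word >> 26) & 0x3F,          # bits 31-26 (all formats)
        "rs":     (word >> 21) & 0x1F,          # bits 25-21 (R, I)
        "rt":     (word >> 16) & 0x1F,          # bits 20-16 (R, I)
        "rd":     (word >> 11) & 0x1F,          # bits 15-11 (R)
        "shamt":  (word >> 6) & 0x1F,           # bits 10-6  (R)
        "funct":  word & 0x3F,                  # bits 5-0   (R)
        "imm":    sign_extend16(word & 0xFFFF), # bits 15-0  (I), sign-extended
        "target": word & 0x03FFFFFF,            # bits 25-0  (J)
    }


# add r1, r2, r3 encodes as 0x00430820 (op = 0, funct = 0x20)
fields = decode(0x00430820)
print(fields["rs"], fields["rt"], fields["rd"])
```

Extracting everything up front mirrors the fixed-format idea: the register fields sit in the same place in every format, so hardware can read them before it knows the instruction type.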

Outline
Reference: [COD5e] Ch. 4.5-4.8
- ISA implementation basics
- Concept of pipelining
- Classic 5-stage Pipeline for MIPS: A First Shot
- Pipeline Hazards
  - Structural Hazards
  - Data Hazards
  - Control Hazards
- Exception/Interrupt Handling
- Handling Multi-Cycle Operations

ISA Implementation Basics
- Finite State Machines = Combinational logic + Flip-flops

[Figure: state diagram with states Alpha/0, Delta/2, Beta/1 and edges labeled 0/0, 1/0, 1/1, 0/1, 0/0, 1/1; block diagram of combinational logic feeding flip-flops; labels "Mealy Machine" and "Moore Machine"]

Input   State(old)   State(new)   Div
0       00           00           0
0       01           10           0
0       10           01           1
1       00           01           0
1       01           00           1
1       10           10           1
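The state table can be read as a lookup from (input, old state) to (new state, Div output), which is exactly the job of the combinational logic, while the flip-flops hold the state between steps. A minimal sketch, assuming the table rows pair up column by column as reconstructed here and that the machine starts in state 00 (the slide does not name a start state):

```python
# (input bit, old state) -> (new state, Div output), one entry per table row
TRANSITIONS = {
    (0, 0b00): (0b00, 0),
    (0, 0b01): (0b10, 0),
    (0, 0b10): (0b01, 1),
    (1, 0b00): (0b01, 0),
    (1, 0b01): (0b00, 1),
    (1, 0b10): (0b10, 1),
}


def run(inputs, state=0b00):
    """Clock the machine once per input bit; return final state and Div outputs."""
    outputs = []
    for bit in inputs:
        # Combinational logic: compute next state and output from current values.
        state, div = TRANSITIONS[(bit, state)]
        # Flip-flops: `state` now holds the new value for the next clock step.
        outputs.append(div)
    return state, outputs


print(run([1, 1, 0]))
```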

ISA Implementation Basics
- Fundamental Execution Cycle
  - Instruction Fetch: obtain instruction from program storage
  - Instruction Decode: determine required actions and instruction size
  - Operand Fetch: locate and obtain operand data
  - Execute: compute result value or status
  - Result Store: deposit results in storage for later use
  - Next Instruction: determine successor instruction

[Figure: processor (regs, F.U.s) connected to memory (program, data); the connection between them is the von Neumann bottleneck]

ISA Implementation Basics
- Datapath vs. Control
  - Datapath: storage, functional units (FUs), and interconnect sufficient to perform the desired functions
    - Inputs are control points
    - Outputs are signals
  - Controller: state machine to orchestrate operation on the datapath
    - Based on desired function and signals

[Figure: the controller drives the datapath through control points; the datapath returns signals to the controller]

Concept of Pipelining

You Already Know Pipelining: Laundry Example
- Sequential processing: Wash-Dry-Fold-Store
- Pipelined processing

Pipelining for Computation
- System
  - Computation requires a total of 300 picoseconds
  - Additional 20 picoseconds to save the result in a register
  - Must have a clock cycle of at least 320 ps

[Figure: combinational logic (300 ps) followed by a register (20 ps), driven by the clock]

Delay = 320 ps, Throughput = 3.12 GIPS

Pipelining for Computation
- 3-Way Pipelined Version
  - Divide combinational logic into 3 blocks of 100 ps each
  - Can begin a new operation as soon as the previous one passes through stage A
    - Begin a new operation every 120 ps
  - Overall latency increases
    - 360 ps from start to finish

[Figure: three blocks of combinational logic A, B, C (100 ps each), each followed by a register (20 ps), driven by the clock]

Delay = 360 ps, Throughput = 8.33 GIPS
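The delay and throughput figures on these two slides follow from the cycle time. A minimal sketch of the arithmetic, assuming the logic splits evenly across stages and every stage pays the same 20 ps register overhead (the function name `pipeline_timing` is illustrative):

```python
def pipeline_timing(logic_ps, reg_ps, stages):
    """Return (cycle time in ps, latency in ps, throughput in GIPS)."""
    cycle_ps = logic_ps / stages + reg_ps   # slowest stage plus register delay
    latency_ps = cycle_ps * stages          # one operation, start to finish
    throughput_gips = 1000.0 / cycle_ps     # one operation completes per cycle
    return cycle_ps, latency_ps, throughput_gips


for stages in (1, 3):
    cycle, latency, gips = pipeline_timing(300, 20, stages)
    print(f"{stages} stage(s): cycle {cycle:.0f} ps, "
          f"delay {latency:.0f} ps, {gips:.2f} GIPS")
```

Throughput grows by less than 3x because the register overhead is paid once per stage, and that same overhead is why the latency rises from 320 ps to 360 ps.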

Pipelining for Computation: Pipeline Diagrams
- Unpipelined
  - Cannot start a new operation until the previous one completes

      OP1 | OP2 | OP3            (time ->)

- 3-Way Pipelined
  - Up to 3 operations in process simultaneously

      OP1: A B C
      OP2:   A B C
      OP3:     A B C             (time ->)

Classic 5-stage Pipeline for MIPS: A First Shot

[Figure: pipelined datapath with five stages (Instruction Fetch; Instr. Decode + Reg. Fetch; Execute + Addr. Calc; Memory Access; Write Back) separated by the pipeline registers IF/ID, ID/EX, EX/MEM, and MEM/WB. Labeled components: PC, adder (+4), Next SEQ PC, Next PC, instruction memory, register file (Rs, Rt, Rd), sign extend (Imm), MUXes, ALU, Zero? test, data memory, WB data]

Register transfers shown under the datapath:
  IR <= mem[PC]; PC <= PC + 4
  A <= Reg[IR.rs]; B <= Reg[IR.rt]
  rslt <= A op(IR.op) B
  WB <= rslt
  Reg[IR.rd] <= WB
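The register transfers can be followed step by step for one register-register instruction. This is a minimal sketch, not a pipeline model: it runs the five steps in sequence for a single instruction, keeps the instruction memory as a dictionary of already-decoded instructions (an assumption made for brevity), and supports only two illustrative operations.

```python
OPS = {
    "add": lambda a, b: a + b,
    "sub": lambda a, b: a - b,
}


def execute_one(pc, imem, regs):
    """Run one R-format instruction through the five steps; return the next PC."""
    # IF: IR <= mem[PC]; PC <= PC + 4
    ir = imem[pc]
    next_pc = pc + 4
    # ID: A <= Reg[IR.rs]; B <= Reg[IR.rt]
    a = regs[ir["rs"]]
    b = regs[ir["rt"]]
    # EX: rslt <= A op(IR.op) B, truncated to 32 bits
    rslt = OPS[ir["op"]](a, b) & 0xFFFFFFFF
    # MEM: nothing to do for an ALU instruction; WB <= rslt
    wb = rslt
    # WB: Reg[IR.rd] <= WB (R0 always stays zero)
    if ir["rd"] != 0:
        regs[ir["rd"]] = wb
    return next_pc


regs = [0] * 32
regs[2], regs[3] = 5, 7
imem = {0: {"op": "add", "rs": 2, "rt": 3, "rd": 1}}   # add r1, r2, r3
pc = execute_one(0, imem, regs)
print(pc, regs[1])
```

In the real pipeline each commented step happens in a different clock cycle, with the intermediate values (`ir`, `a`, `b`, `rslt`, `wb`) held in the pipeline registers between stages.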

Classic 5-stage Pipeline for MIPS: A First Shot
- Example: executing a load (lw) instruction, one figure per stage
  - IF stage
  - ID stage
  - EX stage
  - MEM stage
  - WB stage: wrong register number (the write register number comes from the instruction currently in IF/ID, not from the load)
  - WB stage with corrected datapath (the destination register number travels with the load through the pipeline registers)
- Visualizing pipelining: showing resource usage

Pipeline Hazards
- Pipelining is not quite that easy!
- Limits to pipelining: hazards prevent the next instruction from executing during its designated clock cycle
  - Structural hazards: HW cannot support this combination of instructions
  - Data hazards: instruction depends on the result of a prior instruction still in the pipeline
  - Control hazards: caused by the delay between the fetching of instructions and decisions about changes in control flow (branches and jumps)

Pipeline Hazards: Structural Hazards
- Structural hazard with only one memory port

Time (clock cycles) runs left to right; instruction order runs top to bottom.

          Cycle 1 Cycle 2 Cycle 3 Cycle 4 Cycle 5 Cycle 6 Cycle 7 Cycle 8 Cycle 9
Load      Ifetch  Reg     ALU     DMem    Reg
Instr 1           Ifetch  Reg     ALU     DMem    Reg
Instr 2                   Ifetch  Reg     ALU     DMem    Reg
Instr 3                           Ifetch  Reg     ALU     DMem    Reg
Instr 4                                   Ifetch  Reg     ALU     DMem    Reg

In cycle 4 the Load's data access (DMem) and the fetch of Instr 3 (Ifetch) both need the single memory port.

Pipeline Hazards: Structural Hazards
- Structural hazard with only one memory port

Time (clock cycles) runs left to right; instruction order runs top to bottom.

          Cycle 1 Cycle 2 Cycle 3 Cycle 4 Cycle 5 Cycle 6 Cycle 7 Cycle 8 Cycle 9
Load      Ifetch  Reg     ALU     DMem    Reg
Instr 1           Ifetch  Reg     ALU     DMem    Reg
Instr 2                   Ifetch  Reg     ALU     DMem    Reg
Stall                             Bubble  Bubble  Bubble  Bubble  Bubble
Instr 3                                   Ifetch  Reg     ALU     DMem    Reg

How do you "bubble" the pipe?
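The stall in the diagram can be reproduced by delaying a fetch whenever an earlier load or store occupies the single memory port in that cycle. A minimal sketch under simplifying assumptions: fetch is the only stage that can stall, a load or store uses the port exactly three cycles after its own fetch (its MEM stage), and no other hazards exist. The function name `schedule_fetches` is illustrative.

```python
def schedule_fetches(uses_data_memory):
    """Return the fetch cycle of each instruction given one shared memory port.

    uses_data_memory[i] is True when instruction i is a load or store.
    """
    busy = set()            # cycles in which some load/store is in its MEM stage
    fetch_cycles = []
    cycle = 1
    for uses_mem in uses_data_memory:
        while cycle in busy:        # port taken: insert a bubble, retry next cycle
            cycle += 1
        fetch_cycles.append(cycle)
        if uses_mem:
            busy.add(cycle + 3)     # IF, ID, EX, then MEM three cycles later
        cycle += 1
    return fetch_cycles


# Load followed by four instructions that do not touch data memory
print(schedule_fetches([True, False, False, False, False]))
```

The third instruction after the load is fetched one cycle late, matching the Stall row in the diagram.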

Pipeline Hazards: Data Hazards
- Data hazard on r1

Time (clock cycles) runs left to right; instruction order runs top to bottom.

                IF      ID/RF   EX      MEM     WB
add r1,r2,r3    Ifetch  Reg     ALU     DMem    Reg
sub r4,r1,r3            Ifetch  Reg     ALU     DMem    Reg
and r6,r1,r7                    Ifetch  Reg     ALU     DMem    Reg
or r8,r1,r9                             Ifetch  Reg     ALU     DMem    Reg
xor r10,r1,r11                                  Ifetch  Reg     ALU     DMem    Reg

Pipeline Hazards: Data Hazards
- Generic data hazards 1: Read After Write (RAW)
  - Instr J tries to read operand before Instr I writes it

      I: add r1,r2,r3
      J: sub r4,r1,r3

  - Caused by a "dependence" (in compiler nomenclature). This hazard results from an actual need for communication.

Pipeline Hazards: Data Hazards
- Generic data hazards 2: Write After Read (WAR)
  - Instr J writes operand before Instr I reads it

      I: sub r4,r1,r3
      J: add r1,r2,r3
      K: mul r6,r1,r7

  - Called an "anti-dependence" by compiler writers. This results from reuse of the name "r1".
  - Can't happen in the MIPS 5-stage pipeline because:
    - All instructions take 5 stages, and
    - Reads are always in stage 2, and
    - Writes are always in stage 5

Pipeline Hazards: Data Hazards
- Generic data hazards 3: Write After Write (WAW)
  - Instr J writes operand before Instr I writes it

      I: sub r1,r4,r3
      J: add r1,r2,r3
      K: mul r6,r1,r7

  - Called an "output dependence" by compiler writers. This also results from the reuse of the name "r1".
  - Can't happen in the MIPS 5-stage pipeline because:
    - All instructions take 5 stages, and
    - Writes are always in stage 5
  - Will see WAR and WAW in more complicated pipes
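The three hazard types differ only in which register sets overlap between an earlier instruction I and a later instruction J. A minimal sketch using the I/J pairs from the three slides; the dictionary layout and the name `classify` are illustrative, and the special case of R0 (always zero, so never a real dependence) is ignored for brevity.

```python
def classify(first, second):
    """Name the register dependences from `first` to the later `second`."""
    kinds = set()
    if first["dest"] in second["srcs"]:
        kinds.add("RAW")    # second reads what first writes (true dependence)
    if second["dest"] in first["srcs"]:
        kinds.add("WAR")    # second overwrites what first reads (anti-dependence)
    if first["dest"] == second["dest"]:
        kinds.add("WAW")    # both write the same register (output dependence)
    return kinds


add_r1 = {"dest": "r1", "srcs": ("r2", "r3")}        # add r1,r2,r3
sub_r4 = {"dest": "r4", "srcs": ("r1", "r3")}        # sub r4,r1,r3
sub_r1 = {"dest": "r1", "srcs": ("r4", "r3")}        # sub r1,r4,r3

print(classify(add_r1, sub_r4))   # pair from the RAW slide
print(classify(sub_r4, add_r1))   # pair from the WAR slide
print(classify(sub_r1, add_r1))   # pair from the WAW slide
```

The function reports dependences between names. Whether one becomes a hazard depends on the pipeline, which is why WAR and WAW show up here but cannot cause trouble in the 5-stage design.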

Pipeline Hazards: Data Hazards
- Forwarding to avoid data hazard

Time (clock cycles) runs left to right; instruction order runs top to bottom.

add r1,r2,r3    Ifetch  Reg     ALU     DMem    Reg
sub r4,r1,r3            Ifetch  Reg     ALU     DMem    Reg
and r6,r1,r7                    Ifetch  Reg     ALU     DMem    Reg
or r8,r1,r9                             Ifetch  Reg     ALU     DMem    Reg
xor r10,r1,r11                                  Ifetch  Reg     ALU     DMem    Reg

Pipeline Hazards: Data Hazards
- HW change for forwarding

[Figure: forwarding datapath. Labeled components: Registers, Next PC, Immediate, ID/EX, muxes on the ALU inputs, ALU, EX/MEM, Data Memory, a mux after it, MEM/WR]

What circuit detects and resolves this hazard?
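One common answer to the question is a forwarding unit: comparators that match the source register of the instruction in EX against the destination registers held in the later pipeline registers, and then steer the ALU input muxes. The sketch below follows the textbook-style conditions as an assumption about what the figure implements. The field names `reg_write` and `rd` and the returned labels are illustrative.

```python
def forward_select(src, ex_mem, mem_wb):
    """Choose where one ALU operand should come from for the instruction in EX.

    src     -- source register number of the instruction in EX
    ex_mem  -- state of the instruction one ahead (in the EX/MEM register)
    mem_wb  -- state of the instruction two ahead (in the MEM/WB register)
    """
    # The most recent producer wins, so EX/MEM is checked first.
    if ex_mem["reg_write"] and ex_mem["rd"] != 0 and ex_mem["rd"] == src:
        return "EX/MEM"
    if mem_wb["reg_write"] and mem_wb["rd"] != 0 and mem_wb["rd"] == src:
        return "MEM/WB"
    return "REGFILE"    # no hazard: use the value read in the ID stage


# sub r4,r1,r3 in EX right behind add r1,r2,r3
add_ahead = {"reg_write": True, "rd": 1}
nothing = {"reg_write": False, "rd": 0}
print(forward_select(1, add_ahead, nothing), forward_select(3, add_ahead, nothing))
```

The `rd != 0` test keeps writes to R0 from being forwarded, since R0 must always read as zero.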

Pipeline Hazards: Data Hazards
- Forwarding to avoid LW-SW data hazard

Time (clock cycles) runs left to right; instruction order runs top to bottom.

add r1,r2,r3    Ifetch  Reg     ALU     DMem    Reg
lw r4,0(r1)             Ifetch  Reg     ALU     DMem    Reg
sw r4,12(r1)                    Ifetch  Reg     ALU     DMem    Reg
or r8,r6,r9                             Ifetch  Reg     ALU     DMem    Reg
xor r10,r9,r11                                  Ifetch  Reg     ALU     DMem    Reg

Pipeline Hazards: Data Hazards
- Data hazard even with forwarding (aka load-use hazard)

Time (clock cycles) runs left to right; instruction order runs top to bottom.

lw r1,0(r2)     Ifetch  Reg     ALU     DMem    Reg
sub r4,r1,r6            Ifetch  Reg     ALU     DMem    Reg
and r6,r1,r7                    Ifetch  Reg     ALU     DMem    Reg
or r8,r1,r9                             Ifetch  Reg     ALU     DMem    Reg

Pipeline Hazards: Data Hazards
- Data hazard even with forwarding (aka load-use hazard)

Time (clock cycles) runs left to right; instruction order runs top to bottom.

lw r1,0(r2)     Ifetch  Reg     ALU     DMem    Reg
sub r4,r1,r6            Ifetch  Reg     Bubble  ALU     DMem    Reg
and r6,r1,r7                    Ifetch  Bubble  Reg     ALU     DMem    Reg
or r8,r1,r9                             Bubble  Ifetch  Reg     ALU     DMem    Reg
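The bubble is needed because a load's data only exists after its MEM stage, one cycle too late to forward into the next instruction's EX stage. A minimal sketch of the check a hazard detection unit could make while the dependent instruction is in ID; the dictionary fields and the name `needs_load_use_stall` are illustrative assumptions.

```python
def needs_load_use_stall(in_ex, in_id):
    """True when the instruction in ID must wait one cycle for a load in EX.

    in_ex -- the instruction currently in EX (one ahead in program order)
    in_id -- the instruction currently being decoded
    """
    return (
        in_ex["is_load"]
        and in_ex["dest"] != 0              # a load into R0 produces nothing to wait for
        and in_ex["dest"] in in_id["srcs"]
    )


lw_r1 = {"is_load": True, "dest": 1, "srcs": (2,)}       # lw r1,0(r2)
sub_r4 = {"is_load": False, "dest": 4, "srcs": (1, 6)}   # sub r4,r1,r6
print(needs_load_use_stall(lw_r1, sub_r4))
```

Only the instruction directly after the load needs the stall. Instructions further behind are simply delayed along with it, as the and/or rows in the diagram show.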

Pipeline Hazards: Control Hazards
- Control hazard on branches: three-stage stall

Time (clock cycles) runs left to right; instruction order runs top to bottom.

10: beq r1,r3,36    Ifetch  Reg     ALU     DMem    Reg
14: and r2,r3,r5            Ifetch  Reg     ALU     DMem    Reg
18: or r6,r1,r7                     Ifetch  Reg     ALU     DMem    Reg
22: add r8,r1,r9                            Ifetch  Reg     ALU     DMem    Reg
36: xor r10,r1,r11                                  Ifetch  Reg     ALU     DMem    Reg

What do you do with the 3 instructions in between? How do you do it? Where is the "commit"?

Pipeline Hazards: Control Hazards
- Performance impact of branch stalls
  - Two-part solution:
    - Determine branch taken or not sooner, AND
    - Compute taken branch address earlier
  - MIPS branch tests if register = 0 or ≠ 0
  - MIPS solution:
    - Move Zero test to ID/RF stage
    - Adder to calculate new PC in ID/RF stage
    - 1 clock cycle penalty for branch (versus 3)
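The gain from cutting the penalty from 3 cycles to 1 can be estimated with the usual stall model: effective CPI = base CPI + branch fraction x penalty. A minimal sketch; the 20% branch fraction is a made-up illustrative value, not a figure from the lecture, and the base CPI of 1 assumes no other stalls.

```python
def effective_cpi(base_cpi, branch_fraction, branch_penalty):
    """Average cycles per instruction when every branch costs a fixed penalty."""
    return base_cpi + branch_fraction * branch_penalty


BRANCH_FRACTION = 0.20   # hypothetical workload: 1 in 5 instructions is a branch

for penalty in (3, 1):
    cpi = effective_cpi(1.0, BRANCH_FRACTION, penalty)
    print(f"penalty {penalty} cycle(s): CPI = {cpi:.2f}")
```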

Pipeline Hazards: Control Hazards
- Pipelined MIPS datapath (new)

[Figure: revised pipelined datapath with stages Instruction Fetch; Instr. Decode / Reg. Fetch; Execute / Addr. Calc; Memory Access; Write Back, separated by IF/ID, ID/EX, EX/MEM, and MEM/WB. Labeled components: Address, memory, adder (+4), Next SEQ PC, Next PC, a second adder, register file (Rs, Rt, Rd), sign extend (Imm), Zero? test, MUXes, ALU, data memory, WB data]

* Interplay of ISA design and cycle time

Pipeline Hazards: Control Hazards
- Four branch hazard alternatives
  #1: Stall until branch direction is clear
  #2: Predict Branch Not Taken
    - Execute successor instructions in sequence
    - "Squash" instructions in pipeline if branch actually taken
    - Advantage of late pipeline state update
    - 47% MIPS branches not taken on average
    - PC+4 already calculated, so use it to get next instruction
  #3: Predict Branch Taken
    - 53% MIPS branches taken on average
    - But haven't calculated branch target address in MIPS
      - MIPS still incurs 1 cycle branch penalty, so no performance benefit
      - Other machines: branch target known before outcome
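The taken/not-taken percentages give a quick estimate of the average cost per branch under the first three alternatives. A minimal sketch, assuming the 1-cycle penalty of the revised datapath and treating "predict taken" on MIPS as always paying that cycle, as the slide states; the scheme names are illustrative labels.

```python
TAKEN_FRACTION = 0.53   # from the slide: 53% of MIPS branches taken on average


def avg_branch_penalty(scheme, taken=TAKEN_FRACTION, penalty=1):
    """Average stall cycles per branch for a given handling scheme."""
    if scheme == "stall":
        return float(penalty)       # every branch waits for the outcome
    if scheme == "predict_not_taken":
        return taken * penalty      # only taken branches squash an instruction
    if scheme == "predict_taken":
        return float(penalty)       # MIPS: target unknown until ID, so no gain
    raise ValueError(f"unknown scheme: {scheme}")


for scheme in ("stall", "predict_not_taken", "predict_taken"):
    print(f"{scheme}: {avg_branch_penalty(scheme):.2f} cycles per branch")
```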

Exception/Interrupt Handling
- Exceptions vs. interrupts
  - Exception: an unusual event happens to an instruction during its execution
    - Examples: divide by zero, undefined opcode
  - Interrupt: hardware signal to switch the processor to a new instruction stream
    - Example: a sound card interrupts when it needs more audio output samples (an audio "click" happens if it is left waiting)
- "Precise" exceptions
  - Problem: it must appear that the exception or interrupt occurs between 2 instructions (Ii and Ii+1)
    - The effect of all instructions up to and including Ii is complete
    - No effect of any instruction after Ii can take place
  - The interrupt (exception) handler either aborts the program or restarts at instruction Ii+1

Exception/Interrupt Handling
- Exception handling in MIPS
  - Exceptions managed by a System Control Coprocessor (CP0)
  - Save PC of offending (or interrupted) instruction
    - In MIPS: Exception Program Counter (EPC)
  - Save indication of the problem
    - In MIPS: Cause register
  - Jump to handler at a predetermined address (e.g., 0x80000180)
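The three steps (save the PC, record the cause, jump to the handler) can be sketched as a small state update. This is a simplified model under stated assumptions: the `CP0` class and `raise_exception` function are illustrative names, and the cause is stored as a bare number, whereas the real Cause register packs the code into a bit field alongside other status bits.

```python
HANDLER_ADDRESS = 0x80000180   # example handler address from the slide


class CP0:
    """Just the two coprocessor-0 registers named on the slide."""

    def __init__(self):
        self.epc = 0     # Exception Program Counter
        self.cause = 0   # simplified: holds a bare cause code


def raise_exception(cp0, pc, cause_code):
    """Record the exception and return the PC at which fetching resumes."""
    cp0.epc = pc              # save PC of the offending (or interrupted) instruction
    cp0.cause = cause_code    # save indication of the problem
    return HANDLER_ADDRESS    # jump to the handler


cp0 = CP0()
next_pc = raise_exception(cp0, pc=0x00400010, cause_code=12)   # 12: hypothetical code
print(hex(next_pc), hex(cp0.epc), cp0.cause)
```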

Exception/Interrupt Handling
- Precise exceptions in static pipelines

Key observation: architected state changes only in the memory and register write stages.

Summary
- Just overlap tasks; easy if tasks are independent
- Speedup ≤ pipeline depth
- Hazards limit performance on computers:
  - Structural: need more HW resources
  - Data (RAW, WAR, WAW): need forwarding, compiler scheduling
  - Control: delayed branch, prediction
- Exceptions, interrupts add complexity
- Next time: let's talk about out-of-order (OOO) scheduling