mips 5-stage pipeline - introduction | csap · mips 5-stage pipeline lecture 2.2 august 22nd ... §...
TRANSCRIPT
Seoul National University
HEIG-VD / SNU Summer University 2017: How Modern Processors Work?
MIPS 5-Stage Pipeline
Lecture 2.2, August 22nd, 2017
Jae W. Lee ([email protected])
Computer Science and Engineering, Seoul National University
Download these lecture slides at https://goo.gl/rJPMQU
Slide credits: [CS:APP3e] slides from CMU; [COD5e] slides from Elsevier Inc.

Review: MIPS ISA
- 32-bit fixed-format instructions (3 formats)
- 32 32-bit general-purpose registers (GPRs)
  - Registers R0-R31
  - R0 contains zero
- 32 single-precision floating-point registers (FPRs)
  - Registers F0-F31
  - Paired to form 16 double-precision FPRs (F0/F1, F2/F3, and so on)
- 3-operand, reg-reg arithmetic instructions
  - e.g., add R1, R1, R2   # R1 = R1 + R2
- Single addressing mode for load/store: base + displacement
  - No indirection
- Simple branch conditions
  - Use a register (R1~R31) to store the flag

Review: 3 MIPS Instruction Formats
R-Format: Register-Register (e.g., add Rd, Rs, Rt)
  Op [31:26] | Rs [25:21] | Rt [20:16] | Rd [15:11] | Shamt [10:6] | funct [5:0]
I-Format: Register-Immediate (e.g., lw Rt, imm(Rs))
  Op [31:26] | Rs [25:21] | Rt [20:16] | immediate [15:0]
I-Format: Branch (e.g., beq Rs, Rt, imm)
  Op [31:26] | Rs [25:21] | Rt [20:16] | immediate [15:0]
J-Format: Jump/Call (e.g., j target)
  Op [31:26] | target [25:0]
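
The field boundaries above can be illustrated with a small decoder. This is a minimal sketch, not part of the lecture: the function name `decode` and the two example words are chosen here for illustration, and the decoder simply extracts every field of all three formats without checking which format the opcode actually uses.

```python
from typing import Dict


def decode(word: int) -> Dict[str, int]:
    """Split a 32-bit MIPS instruction word into the fields of all three formats."""
    return {
        "op": (word >> 26) & 0x3F,      # bits 31..26 (all formats)
        "rs": (word >> 21) & 0x1F,      # bits 25..21 (R- and I-format)
        "rt": (word >> 16) & 0x1F,      # bits 20..16 (R- and I-format)
        "rd": (word >> 11) & 0x1F,      # bits 15..11 (R-format)
        "shamt": (word >> 6) & 0x1F,    # bits 10..6  (R-format)
        "funct": word & 0x3F,           # bits 5..0   (R-format)
        "imm": word & 0xFFFF,           # bits 15..0  (I-format), not sign-extended
        "target": word & 0x03FFFFFF,    # bits 25..0  (J-format)
    }


ADD_WORD = 0x00221820  # encoding of: add r3, r1, r2
LW_WORD = 0x8C220004   # encoding of: lw r2, 4(r1)

print(decode(ADD_WORD))
print(decode(LW_WORD))
```

Because the format is fixed, the register fields sit at the same bit positions in R- and I-format, so a real decoder can read the register file before it knows which format it has.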

Outline
Reference: [COD5e] Ch. 4.5-4.8
- ISA Implementation Basics
- Concept of Pipelining
- Classic 5-stage Pipeline for MIPS: A First Shot
- Pipeline Hazards
  - Structural Hazards
  - Data Hazards
  - Control Hazards
- Exception/Interrupt Handling
- Handling Multi-Cycle Operations

ISA Implementation Basics
- Finite State Machines = Combinational logic + Flip-Flops
[Figure: state diagram with states Alpha/0, Beta/1, Delta/2 (labeled "Moore Machine") and edges labeled input/output such as 0/0, 1/0, 1/1, 0/1 (labeled "Mealy Machine"); block diagram of combinational logic feeding flip-flops]

Input | State_old | State_new | Div
0     | 00        | 00        | 0
0     | 01        | 10        | 0
0     | 10        | 01        | 1
1     | 00        | 01        | 0
1     | 01        | 00        | 1
1     | 10        | 10        | 1
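
The state table can be exercised directly in software. The sketch below is an illustration under assumptions, not the lecture's own code: it reads the table column by column (Input, State_old, State_new, Div), assumes the encoding 00, 01, 10 for the three states, and assumes the bits are fed most-significant first. Under that reading the machine appears to divide a binary number by three, with the Div outputs forming the quotient and the final state holding the remainder.

```python
# (input bit, old state) -> (new state, Div output), as read from the slide's table
TRANSITIONS = {
    (0, 0b00): (0b00, 0),
    (0, 0b01): (0b10, 0),
    (0, 0b10): (0b01, 1),
    (1, 0b00): (0b01, 0),
    (1, 0b01): (0b00, 1),
    (1, 0b10): (0b10, 1),
}


def run_fsm(bits):
    """Feed bits one per clock; return the Div outputs and the final state."""
    state = 0b00  # the flip-flops are assumed to start in state 00
    outputs = []
    for bit in bits:
        # Combinational logic: next state and output depend on input and state
        state, div = TRANSITIONS[(bit, state)]
        outputs.append(div)
    return outputs, state


def divide_by_three(n, width=8):
    """Interpret the FSM as a serial divider (assumption: MSB-first input)."""
    bits = [(n >> i) & 1 for i in reversed(range(width))]
    outputs, state = run_fsm(bits)
    quotient = int("".join(str(b) for b in outputs), 2)
    return quotient, state


print(divide_by_three(13))
```

The dictionary plays the role of the combinational logic and the `state` variable plays the role of the flip-flops, which is the decomposition the slide names.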

ISA Implementation Basics
- Fundamental Execution Cycle
  1. Instruction Fetch: obtain instruction from program storage
  2. Instruction Decode: determine required actions and instruction size
  3. Operand Fetch: locate and obtain operand data
  4. Execute: compute result value or status
  5. Result Store: deposit results in storage for later use
  6. Next Instruction: determine successor instruction
[Figure: processor (regs, F.U.s) connected to memory (program, data); the connection is labeled "von Neumann bottleneck"]

ISA Implementation Basics
- Datapath vs. Control
  - Datapath: storage, FUs, and interconnect sufficient to perform the desired functions
    – Inputs are control points
    – Outputs are signals
  - Controller: state machine to orchestrate operation on the datapath
    – Based on desired function and signals
[Figure: the Controller drives control points into the Datapath; the Datapath returns signals to the Controller]

Concept of Pipelining

You Already Know Pipelining: Laundry Example
- Sequential processing: Wash-Dry-Fold-Store

You Already Know Pipelining: Laundry Example
- Pipelined processing

Pipelining for Computation
- System
  - Computation requires a total of 300 picoseconds
  - Additional 20 picoseconds to save the result in a register
  - Must have a clock cycle of at least 320 ps
[Figure: combinational logic (300 ps) followed by a register (20 ps), driven by Clock. Delay = 320 ps, Throughput = 3.12 GIPS]

Pipelining for Computation
- 3-Way Pipelined Version
  - Divide combinational logic into 3 blocks of 100 ps each
  - Can begin a new operation as soon as the previous one passes through stage A
    – Begin a new operation every 120 ps
  - Overall latency increases
    – 360 ps from start to finish
[Figure: three stages, each with comb. logic (A, B, C; 100 ps) followed by a register (20 ps), driven by Clock. Delay = 360 ps, Throughput = 8.33 GIPS]
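
The delay and throughput figures on these two slides follow from simple arithmetic, sketched below. The function name `pipeline_timing` is introduced here for illustration, and the sketch assumes the logic divides evenly across stages, as in the slide's 3 x 100 ps split.

```python
def pipeline_timing(logic_ps: float, reg_ps: float, stages: int):
    """Return (cycle time in ps, latency in ps, throughput in GIPS).

    Assumes the combinational logic splits evenly across the stages and
    that every stage ends in a register costing reg_ps.
    """
    cycle_ps = logic_ps / stages + reg_ps   # slowest stage sets the clock
    latency_ps = cycle_ps * stages          # one operation, start to finish
    throughput_gips = 1000.0 / cycle_ps     # one operation per cycle; 1/ps = 1000 G/s
    return cycle_ps, latency_ps, throughput_gips


for n in (1, 3):
    cycle, latency, gips = pipeline_timing(300, 20, n)
    print(f"{n} stage(s): cycle {cycle:.0f} ps, latency {latency:.0f} ps, {gips:.2f} GIPS")
```

Note that throughput improves by 320/120, roughly 2.67x, rather than 3x, because the 20 ps register overhead is paid in every stage.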

Pipelining for Computation: Pipeline Diagrams
- Unpipelined
  - Cannot start a new operation until the previous one completes
  [Figure: OP1, OP2, OP3 executed back to back along the time axis]
- 3-Way Pipelined
  - Up to 3 operations in process simultaneously
  [Figure: OP1, OP2, OP3 each pass through stages A, B, C, staggered by one stage along the time axis]

Classic 5-stage Pipeline for MIPS: A First Shot
[Figure: pipelined datapath with five stages (Instruction Fetch; Instr. Decode + Reg. Fetch; Execute + Addr. Calc; Memory Access; Write Back) separated by the pipeline registers IF/ID, ID/EX, EX/MEM, MEM/WB. Components: PC, +4 adder, instruction memory, register file, sign extend, ALU, Zero? test, data memory, MUXes. Signals: Next PC, Next SEQ PC, Rs, Rt, Rd, Imm, WB Data]
Register transfers shown under the datapath:
  IR <= mem[PC]; PC <= PC + 4
  A <= Reg[IR_rs]; B <= Reg[IR_rt]
  rslt <= A op(IR_op) B
  WB <= rslt; Reg[IR_rd] <= WB

Classic 5-stage Pipeline for MIPS: A First Shot
- Example: executing a load (lw) instruction
  - IF stage
  - ID stage
  - EX stage
  - MEM stage
  - WB stage
    – Wrong register number: by the time lw reaches WB, the write-register field in IF/ID belongs to a later instruction
  - WB stage, with corrected datapath
    – The destination register number travels with the instruction through the pipeline registers

Classic 5-stage Pipeline for MIPS: A First Shot
- Visualizing pipelining: showing resource usage
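
A resource-usage diagram of this kind can be generated mechanically for an ideal pipeline. The sketch below is illustrative only: the names `pipeline_diagram` and `render` and the three example instructions are made up here, and it assumes no hazards, so each instruction simply enters IF one cycle after its predecessor.

```python
STAGES = ["IF", "ID", "EX", "MEM", "WB"]


def pipeline_diagram(instructions):
    """Return one row per instruction; row[c] is the stage occupied in cycle c+1."""
    n_cycles = len(instructions) + len(STAGES) - 1
    rows = []
    for i in range(len(instructions)):
        row = [""] * n_cycles
        for s, stage in enumerate(STAGES):
            row[i + s] = stage  # instruction i enters IF in cycle i+1
        rows.append(row)
    return rows


def render(instructions):
    """Format the diagram as text, one instruction per line."""
    rows = pipeline_diagram(instructions)
    width = max(len(name) for name in instructions)
    lines = []
    for name, row in zip(instructions, rows):
        cells = " ".join(f"{cell:<3}" for cell in row)
        lines.append(f"{name:<{width}}  {cells}".rstrip())
    return "\n".join(lines)


print(render(["lw r1,0(r2)", "add r3,r4,r5", "sub r6,r7,r8"]))
```

Reading a column of the output shows which stage each instruction occupies in that cycle, which is exactly the view needed to spot two instructions competing for one resource.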

Pipeline Hazards
- Pipelining is not quite that easy!
- Limits to pipelining: hazards prevent the next instruction from executing during its designated clock cycle
  - Structural hazards: HW cannot support this combination of instructions
  - Data hazards: instruction depends on the result of a prior instruction still in the pipeline
  - Control hazards: caused by the delay between the fetching of instructions and decisions about changes in control flow (branches and jumps)

Pipeline Hazards: Structural Hazards
- Structural hazard with only one memory port
[Figure: pipeline diagram over clock cycles 1-7. Instruction order: Load, Instr 1, Instr 2, Instr 3, Instr 4; each passes through Ifetch, Reg, ALU, DMem, Reg, starting one cycle after the previous one. In cycle 4 the Load's DMem access coincides with the Ifetch of Instr 3]

Pipeline Hazards: Structural Hazards
- Structural hazard with only one memory port
[Figure: pipeline diagram over clock cycles 1-7. Instruction order: Load, Instr 1, Instr 2, Stall, Instr 3; the stall slot is a row of five bubbles, so Instr 3 is fetched one cycle later]
How do you "bubble" the pipe?
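
The conflict in these diagrams can be found by comparing stage timings. The following is a minimal sketch under stated assumptions: a single memory port shared by instruction fetch and data access, an otherwise ideal 5-stage pipeline, and a hypothetical helper name `memory_port_conflicts`.

```python
STAGES = ["IF", "ID", "EX", "MEM", "WB"]


def memory_port_conflicts(program):
    """Find cycles in which a data access and an instruction fetch collide.

    program is a list of (name, accesses_data_memory) pairs in issue order.
    Instruction i (0-based) is assumed to be in IF during cycle i+1 and in
    MEM during cycle i+4, with no other stalls.
    Returns (cycle, memory_instruction, fetching_instruction) tuples.
    """
    conflicts = []
    mem_offset = STAGES.index("MEM")
    for i, (name, uses_mem) in enumerate(program):
        if not uses_mem:
            continue
        mem_cycle = i + 1 + mem_offset
        fetch_index = mem_cycle - 1  # the instruction whose IF falls in mem_cycle
        if fetch_index < len(program):
            conflicts.append((mem_cycle, name, program[fetch_index][0]))
    return conflicts


program = [
    ("Load", True),
    ("Instr 1", False),
    ("Instr 2", False),
    ("Instr 3", False),
    ("Instr 4", False),
]
print(memory_port_conflicts(program))
```

For the slide's sequence this reports the Load and Instr 3 colliding in cycle 4, which is the slot where the stall is inserted.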

Pipeline Hazards: Data Hazards
- Data hazard on R1
    add r1,r2,r3
    sub r4,r1,r3
    and r6,r1,r7
    or  r8,r1,r9
    xor r10,r1,r11
[Figure: pipeline diagram over clock cycles with stages IF (Ifetch), ID/RF (Reg), EX (ALU), MEM (DMem), WB (Reg); each instruction starts one cycle after the previous one. add does not write r1 until WB, but the following instructions read r1 in ID/RF]

Pipeline Hazards: Data Hazards
- Generic data hazards 1: Read After Write (RAW)
  - Instr J tries to read an operand before Instr I writes it
      I: add r1,r2,r3
      J: sub r4,r1,r3
  - Caused by a "dependence" (in compiler nomenclature). This hazard results from an actual need for communication.

Pipeline Hazards: Data Hazards
- Generic data hazards 2: Write After Read (WAR)
  - Instr J writes an operand before Instr I reads it
      I: sub r4,r1,r3
      J: add r1,r2,r3
      K: mul r6,r1,r7
  - Called an "anti-dependence" by compiler writers. This results from reuse of the name "r1".
  - Can't happen in the MIPS 5-stage pipeline because:
    – All instructions take 5 stages, and
    – Reads are always in stage 2, and
    – Writes are always in stage 5

Pipeline Hazards: Data Hazards
- Generic data hazards 3: Write After Write (WAW)
  - Instr J writes an operand before Instr I writes it
      I: sub r1,r4,r3
      J: add r1,r2,r3
      K: mul r6,r1,r7
  - Called an "output dependence" by compiler writers. This also results from the reuse of the name "r1".
  - Can't happen in the MIPS 5-stage pipeline because:
    – All instructions take 5 stages, and
    – Writes are always in stage 5
  - Will see WAR and WAW in more complicated pipes
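
The three definitions above reduce to comparing the destination and source registers of an earlier instruction I and a later instruction J. The sketch below is an illustration, with the `Instr` tuple and the `dependences` function invented here. It reports register-name dependences only; whether a dependence becomes an actual hazard depends on the pipeline, as the slides note for WAR and WAW.

```python
from typing import NamedTuple, Set, Tuple


class Instr(NamedTuple):
    op: str
    dest: str
    srcs: Tuple[str, ...]


def dependences(i: Instr, j: Instr) -> Set[str]:
    """Classify the register dependences of later instruction j on earlier i."""
    found = set()
    if i.dest in j.srcs:
        found.add("RAW")  # j reads what i writes (true dependence)
    if j.dest in i.srcs:
        found.add("WAR")  # j overwrites what i reads (anti-dependence)
    if i.dest == j.dest:
        found.add("WAW")  # both write the same register (output dependence)
    return found


# The I/J pairs from the three slides
raw = dependences(Instr("add", "r1", ("r2", "r3")), Instr("sub", "r4", ("r1", "r3")))
war = dependences(Instr("sub", "r4", ("r1", "r3")), Instr("add", "r1", ("r2", "r3")))
waw = dependences(Instr("sub", "r1", ("r4", "r3")), Instr("add", "r1", ("r2", "r3")))
print(raw, war, waw)
```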

Pipeline Hazards: Data Hazards
- Forwarding to avoid data hazard
    add r1,r2,r3
    sub r4,r1,r3
    and r6,r1,r7
    or  r8,r1,r9
    xor r10,r1,r11
[Figure: pipeline diagram over clock cycles for the five instructions, each passing through Ifetch, Reg, ALU, DMem, Reg; the result of add is forwarded to the instructions that use r1]

Pipeline Hazards: Data Hazards
- HW change for forwarding
[Figure: datapath from the ID/EX register through the ALU, the EX/MEM register, data memory, and the MEM/WR register; muxes in front of the ALU select among the register file outputs, Next PC, the immediate, and values fed back from later pipeline registers]
What circuit detects and resolves this hazard?
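
One common answer to the slide's question is a forwarding unit that compares register numbers held in the pipeline registers. The sketch below follows the textbook-style conditions described in [COD5e] Ch. 4.7 as best I can state them. The function name `forward_select`, its parameter names, and the string return values are assumptions made for illustration, and the code writes MEM/WB for the register the figure labels MEM/WR.

```python
def forward_select(id_ex_src, ex_mem_regwrite, ex_mem_rd, mem_wb_regwrite, mem_wb_rd):
    """Choose the source for one ALU operand of the instruction in EX.

    id_ex_src:        register number the EX-stage instruction wants to read
    ex_mem_regwrite:  True if the instruction now in MEM writes a register
    ex_mem_rd:        its destination register number
    mem_wb_regwrite:  True if the instruction now in WB writes a register
    mem_wb_rd:        its destination register number
    Returns "EX/MEM", "MEM/WB", or "REG" (use the value read in ID).
    """
    # The most recent producer wins, so EX/MEM is checked first.
    if ex_mem_regwrite and ex_mem_rd != 0 and ex_mem_rd == id_ex_src:
        return "EX/MEM"
    if mem_wb_regwrite and mem_wb_rd != 0 and mem_wb_rd == id_ex_src:
        return "MEM/WB"
    return "REG"  # R0 is never forwarded because it always reads as zero


# sub r4,r1,r3 in EX while add r1,r2,r3 is in MEM
print(forward_select(1, True, 1, False, 0))
# and r6,r1,r7 in EX while sub (writes r4) is in MEM and add (writes r1) is in WB
print(forward_select(1, True, 4, True, 1))
```

In hardware this function corresponds to a pair of comparators per ALU operand whose result drives the select input of the mux in front of the ALU.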

Pipeline Hazards: Data Hazards
- Forwarding to avoid LW-SW data hazard
    add r1,r2,r3
    lw  r4,0(r1)
    sw  r4,12(r1)
    or  r8,r6,r9
    xor r10,r9,r11
[Figure: pipeline diagram over clock cycles for the five instructions, each passing through Ifetch, Reg, ALU, DMem, Reg]

Pipeline Hazards: Data Hazards
- Data hazard even with forwarding (aka load-use hazard)
    lw  r1,0(r2)
    sub r4,r1,r6
    and r6,r1,r7
    or  r8,r1,r9
[Figure: pipeline diagram over clock cycles for the four instructions, each passing through Ifetch, Reg, ALU, DMem, Reg]

Pipeline Hazards: Data Hazards
- Data hazard even with forwarding (aka load-use hazard)
[Figure: pipeline diagram over clock cycles, instructions in program order, with a one-cycle bubble inserted]
    lw  r1,0(r2)   Ifetch  Reg     ALU     DMem    Reg
    sub r4,r1,r6           Ifetch  Reg     Bubble  ALU     DMem    Reg
    and r6,r1,r7                   Ifetch  Bubble  Reg     ALU     DMem    Reg
    or  r8,r1,r9                           Bubble  Ifetch  Reg     ALU     DMem
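
The load-use case can be detected by checking whether an instruction reads the register that the immediately preceding load writes. The sketch below is illustrative and simplified: `Instr`, `BUBBLE`, and `insert_load_use_bubbles` are names invented here, full forwarding is assumed, and every source register is treated alike, so it would also stall a store that only needs the loaded value as its store data, which the LW-SW slide shows can be forwarded instead.

```python
from typing import List, NamedTuple, Optional, Tuple


class Instr(NamedTuple):
    op: str
    dest: Optional[str]
    srcs: Tuple[str, ...]


BUBBLE = Instr("nop", None, ())


def insert_load_use_bubbles(program: List[Instr]) -> List[Instr]:
    """Insert one bubble after each load whose result the next instruction reads."""
    out: List[Instr] = []
    for instr in program:
        prev = out[-1] if out else None
        # The loaded value is available only after MEM, one cycle too late
        # for the ALU stage of the instruction right behind the load.
        if prev is not None and prev.op == "lw" and prev.dest in instr.srcs:
            out.append(BUBBLE)
        out.append(instr)
    return out


program = [
    Instr("lw", "r1", ("r2",)),
    Instr("sub", "r4", ("r1", "r6")),
    Instr("and", "r6", ("r1", "r7")),
    Instr("or", "r8", ("r1", "r9")),
]
print([i.op for i in insert_load_use_bubbles(program)])
```

Only the instruction directly after the load needs the bubble; the later users of r1 can receive the value by forwarding or from the register file.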

Pipeline Hazards: Control Hazards
- Control hazard on branches: three-stage stall
    10: beq r1,r3,36
    14: and r2,r3,r5
    18: or  r6,r1,r7
    22: add r8,r1,r9
    36: xor r10,r1,r11
[Figure: pipeline diagram for the five instructions, each passing through Ifetch, Reg, ALU, DMem, Reg]
What do you do with the 3 instructions in between? How do you do it? Where is the "commit"?

Pipeline Hazards: Control Hazards
- Performance impact of branch stalls
  - Two-part solution:
    – Determine branch taken or not sooner, AND
    – Compute taken-branch address earlier
  - MIPS branch tests if register = 0 or ≠ 0
  - MIPS solution:
    – Move Zero test to ID/RF stage
    – Adder to calculate new PC in ID/RF stage
    – 1 clock cycle penalty for branch (versus 3)

Pipeline Hazards: Control Hazards
- Pipelined MIPS datapath (new)
[Figure: five-stage datapath (Instruction Fetch; Instr. Decode / Reg. Fetch; Execute / Addr. Calc; Memory Access; Write Back) with pipeline registers IF/ID, ID/EX, EX/MEM, MEM/WB. Components: +4 adder, instruction memory, register file, sign extend, Zero? test, a second adder, ALU, data memory, MUXes. Signals: Next PC, Next SEQ PC, Address, Rs, Rt, Rd, Imm, WB Data]
* Interplay of ISA design and cycle time

Pipeline Hazards: Control Hazards
- Four branch hazard alternatives
  #1: Stall until branch direction is clear
  #2: Predict Branch Not Taken
    - Execute successor instructions in sequence
    - "Squash" instructions in pipeline if branch actually taken
    - Advantage of late pipeline state update
    - 47% MIPS branches not taken on average
    - PC+4 already calculated, so use it to get next instruction
  #3: Predict Branch Taken
    - 53% MIPS branches taken on average
    - But haven't calculated branch target address in MIPS
      – MIPS still incurs 1 cycle branch penalty, so no performance benefit
      – Other machines: branch target known before outcome
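
The cost of the first two alternatives can be compared with a simple CPI estimate. This is a rough sketch under assumptions that are not in the slides: an ideal CPI of 1 with no other stalls, and a branch frequency of 20%, which is a placeholder value chosen only for illustration. The 53% taken rate and the 3-cycle and 1-cycle penalties come from the slides. The name `branch_cpi` is made up here.

```python
def branch_cpi(branch_fraction, taken_fraction, scheme, penalty=1):
    """Estimate CPI for an otherwise ideal pipeline (base CPI of 1).

    branch_fraction: fraction of instructions that are branches (assumed)
    taken_fraction:  fraction of branches that are taken
    scheme:          "stall" or "predict_not_taken"
    penalty:         cycles lost when the pipeline must wait or squash
    """
    if scheme == "stall":
        stall_per_branch = penalty                   # every branch waits
    elif scheme == "predict_not_taken":
        stall_per_branch = taken_fraction * penalty  # only taken branches squash
    else:
        raise ValueError(f"unknown scheme: {scheme}")
    return 1.0 + branch_fraction * stall_per_branch


BRANCH_FRACTION = 0.20  # hypothetical value, not from the lecture
TAKEN_FRACTION = 0.53   # from the slide: 53% of MIPS branches taken

print(branch_cpi(BRANCH_FRACTION, TAKEN_FRACTION, "stall", penalty=3))
print(branch_cpi(BRANCH_FRACTION, TAKEN_FRACTION, "stall", penalty=1))
print(branch_cpi(BRANCH_FRACTION, TAKEN_FRACTION, "predict_not_taken", penalty=1))
```

Under these assumptions, moving the branch decision into ID/RF removes most of the loss, and predicting not taken recovers roughly half of what remains.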

Exception/Interrupt Handling
- Exceptions vs. interrupts
  - Exception: an unusual event happens to an instruction during its execution
    – Examples: divide by zero, undefined opcode
  - Interrupt: hardware signal to switch the processor to a new instruction stream
    – Example: a sound card interrupts when it needs more audio output samples (an audio "click" happens if it is left waiting)
- "Precise" exceptions
  - Problem: it must appear that the exception or interrupt occurs between 2 instructions (I_i and I_i+1)
    – The effect of all instructions up to and including I_i is complete
    – No effect of any instruction after I_i can take place
  - The interrupt (exception) handler either aborts the program or restarts at instruction I_i+1

Exception/Interrupt Handling
- Exception handling in MIPS
  - Exceptions managed by a System Control Coprocessor (CP0)
  - Save PC of offending (or interrupted) instruction
    – In MIPS: Exception Program Counter (EPC)
  - Save indication of the problem
    – In MIPS: Cause register
  - Jump to handler at a predetermined address (e.g., 0x80000180)
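
The three steps listed above can be modeled in a few lines. This is a simplified sketch, not a description of real hardware: the `CP0` class and `raise_exception` function are invented for illustration, and the placement of the exception code in bits 6..2 of the Cause register and the code values 10 (reserved instruction) and 12 (arithmetic overflow) are recalled from the MIPS32 architecture and should be treated as assumptions to verify against the manual.

```python
HANDLER_ADDRESS = 0x80000180  # predetermined handler address from the slide

# Exception codes: assumed MIPS32 values, verify before relying on them
EXC_RESERVED_INSTRUCTION = 10
EXC_OVERFLOW = 12


class CP0:
    """Minimal model of the two coprocessor-0 registers named on the slide."""

    def __init__(self):
        self.epc = 0    # Exception Program Counter
        self.cause = 0  # Cause register


def raise_exception(cp0: CP0, pc: int, exc_code: int) -> int:
    """Record the exception and return the next PC (the handler address)."""
    cp0.epc = pc                          # save PC of the offending instruction
    cp0.cause = (exc_code & 0x1F) << 2    # save indication of the problem
    return HANDLER_ADDRESS                # jump to the handler


cp0 = CP0()
next_pc = raise_exception(cp0, 0x00400024, EXC_RESERVED_INSTRUCTION)
print(hex(next_pc), hex(cp0.epc), (cp0.cause >> 2) & 0x1F)
```

A handler would read Cause to decide what to do and use EPC to decide where, if anywhere, to resume.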

Exception/Interrupt Handling
- Precise exceptions in static pipelines
Key observation: architected state changes only in the memory and register write stages.

Summary
- Just overlap tasks; easy if tasks are independent
- Speedup ≤ pipeline depth
- Hazards limit performance on computers:
  - Structural: need more HW resources
  - Data (RAW, WAR, WAW): need forwarding, compiler scheduling
  - Control: delayed branch, prediction
- Exceptions, interrupts add complexity
- Next time: Let's talk about out-of-order (OOO) scheduling