isas and y86-64cr4bd/3330/s2018/slides/20180201--slides-1up.pdf•isa •agreed upon interface...
TRANSCRIPT
SamiraKhan
ISAsandY86-64
Agenda• ISAvsMicroarchitecture• ISATradeoffs• Y86-64ISA• Y86-64Format• Y86-64Encoding/Decoding
LEVELSOFTRANSFORMATION• ISA• Agreeduponinterfacebetweensoftwareandhardware• SW/compilerassumes,HWpromises
• Whatthesoftwarewriterneedstoknowtowritesystem/userprograms
• Microarchitecture• SpecificimplementationofanISA• Notvisibletothesoftware
• Microprocessor• ISA,uarch,circuits• “Architecture” =ISA+microarchitecture
MicroarchitectureISAProgram/Language
AlgorithmProblem
Logic
Circuits
3
ISAVS.MICROARCHITECTURE• WhatispartofISAvs.Uarch?
• Gaspedal:interfacefor“acceleration”• Internalsoftheengine:implements“acceleration”• Addinstructionvs.Adderimplementation
• Implementation(uarch)canbevariousaslongasitsatisfiesthespecification(ISA)• Bitserial,ripplecarry,carrylookahead adders• x86ISAhasmanyimplementations:286,386,486,Pentium,PentiumPro,…
• Uarch usuallychangesfasterthanISA• FewISAs(x86,SPARC,MIPS,Alpha)butmanyuarchs• Why?
4
ISA• Instructions• Opcodes,AddressingModesDataTypes• InstructionTypesandFormats• Registers,ConditionCodes
• Memory• Addressspace,Addressability,Alignment• Virtualmemorymanagement
• Call,Interrupt/ExceptionHandling
• AccessControl,Priority/Privilege
• I/O
• TaskManagement
• PowerandThermalManagement
• Multi-threadingsupport,Multiprocessorsupport
5
ExampleISAs
• x86— dominantindesktops,servers• ARM— dominantinmobiledevices• POWER—WiiU,IBMsupercomputersandsomeservers• MIPS— commoninconsumerwifi accesspoints• SPARC— someOracleservers,Fujitsusupercomputers• z/Architecture— IBMmainframes• Z80— TIcalculators• SHARC— somedigitalsignalprocessors• Itanium— someHPservers(beingretired)• RISCV— someembedded• …
Agenda• ISAvsMicroarchitecture• ISATradeoffs• Y86-64ISA• Y86-64Format• Y86-64encoding/decoding
ISA: INSTRUCTIONLENGTH
• Fixedlength:Lengthofallinstructionsthesame+Easiertodecodesingleinstructioninhardware+Easiertodecodemultipleinstructionsconcurrently-- Wastedbitsininstructions(Whyisthisbad?)-- Harder-to-extendISA(howtoaddnewinstructions?)
• Variablelength:Lengthofinstructionsdifferent(determinedbyopcode andsub-opcode)
+Compactencoding(Whyisthisgood?)Intel432:Huffmanencoding(sortof).6to321bitinstructions.How?
-- Morelogictodecodeasingleinstruction-- Hardertodecodemultipleinstructionsconcurrently
8
ISA:ADDRESSINGMODES
• Addressingmodespecifieshowtoobtainanoperandofaninstruction• Register• Immediate• Memory(displacement,registerindirect,indexed,absolute,memoryindirect,autoincrement,autodecrement,…)
• x86-64:10(%r11,%r12,4)• ARM:%r11 << 3 (shiftregistervaluebyconstant)• VAX:((%r11)) (registervalueispointertopointer)
9
ISA:ConditionCodes
cmpq %r11, %r12
je somewhere
• coulddo:/* _Branch if _EQual */
beq %r11, %r12, somewhere
ISA-LEVELTRADEOFFS:SEMANTICGAP
• WheretoplacetheISA?Semanticgap• Closertohigh-levellanguage(HLL)orclosertohardwarecontrolsignals?à Complexvs.simpleinstructions• RISCvs.CISCvs.HLLmachines• FFT,QUICKSORT,POLY,FPinstructions?• VAXINDEXinstruction(arrayaccesswithboundschecking)• e.g.,A[i][j][k]oneinstructionwithboundcheck
11
SEMANTICGAP
12
High-LevelLanguage
ControlSignals
ISA
SemanticGap
Software
Hardware
SEMANTICGAP
13
High-LevelLanguage
ControlSignals
ISASemanticGap
Software
Hardware
CISC
RISC
ISA-LEVELTRADEOFFS:SEMANTICGAP• WheretoplacetheISA?Semanticgap• Closertohigh-levellanguage(HLL)orclosertohardwarecontrolsignals?à Complexvs.simpleinstructions• RISCvs.CISCvs.HLLmachines
• FFT,QUICKSORT,POLY,FPinstructions?• VAXINDEXinstruction(arrayaccesswithboundschecking)
• Tradeoffs:• Simplecompiler,complexhardwarevs.complexcompiler,simplehardware
• Burdenofbackwardcompatibility• Performance?
• Optimizationopportunity:ExampleofVAXINDEXinstruction:who(compilervs.hardware)putsmoreeffortintooptimization?
• Instructionsize,codesize
14
SMALLSEMANTICGAPEXAMPLESINVAX• FINDFIRST
• Findthefirstsetbitinabitfield• HelpsOSresourceallocationoperations
• SAVECONTEXT,LOADCONTEXT• Specialcontextswitchinginstructions
• INSQUEUE,REMQUEUE• Operationsondoublylinkedlist
• INDEX• Arrayaccesswithboundschecking
• STRINGOperations• Comparestrings,findsubstrings,…
• CyclicRedundancyCheckInstruction• EDITPC
• Implementseditingfunctionstodisplayfixedformatoutput
• DigitalEquipmentCorp.,“VAX11780ArchitectureHandbook,” 1977-78.
15
CISCvs.RISC
16
REPMOVSX:MOVADDCOMPMOVADDJMPX
Whichoneiseasytooptimize?
x86: REP MOVS DEST SRC
SMALLVERSUSLARGESEMANTICGAP• CISCvs.RISC• Complexinstructionsetcomputerà complexinstructions
• Initiallymotivatedby“notgoodenough” codegeneration• Reducedinstructionsetcomputerà simpleinstructions
• JohnCocke,mid1970s,IBM801• Goal:enablebettercompilercontrolandoptimization
• RISCmotivatedby• Memorystalls(noworkdoneinacomplexinstructionwhenthereisamemorystall?)• Whenisthiscorrect?
• Simplifyingthehardwareà lowercost,higherfrequency• Enablingthecompilertooptimizethecodebetter
• Findfine-grainedparallelismtoreducestalls
17
TypicalRISCISAproperties
• fewer,simplerinstructions• separateinstructionstoaccessmemory• fixed-lengthinstructions• moreregisters• noinstructionswithtwomemoryoperands• fewaddressingmodes
Agenda• ISAvsMicroarchitecture• ISATradeoffs• Y86-64ISA• Y86-64Format• Y86-64encoding/decoding
Y86-64instructionset
• basedonx86• omitsmostofthe1000+instructionsaddq jmp pushqsubq jCC popqandq cmovCC movq (renamed)xorq call hlt (renamed)nop ret• much,muchsimplerencoding
Y86-64:movq
• irmovq immovq iimovq• rrmovq rmmovq rimovq• mrmovq mmmovq mimovq
Y86-64:cmovCC• conditionalmove• (Conditionally)copyvaluefromsourcetodestinationregister• Y86-64:register-to-registeronly• insteadof:jle skip_moverrmovq %rax, %rbxskip_move:• //...• cando:cmovg %rax, %rbx
Y86-64:halt
• (x86-64instructioncalledhlt)• Y86-64instructionhalt• stopstheprocessor• otherwise— something’sinmemory“after”program!• realprocessors:reservedforOS
Y86-64:specifyingaddresses
• rmmovq %r11, 10(%r12)• memory[10+r12]ß r11
• r12ßmemory[10+r11]+r12mrmovq 10(%r11), %r11/* overwrites %r11 */addq %r11, %r12
Y86-64:accessingmemory
• r12ßmemory[10+8*r11]+r12/* replace %r11 with 8*%r11 */addq %r11, %r11addq %r11, %r11addq %r11, %r11mrmovq 10(%r11), %r11addq %r11, %r12
Y86-64constants
• irmovq $100, %r11• onlyinstructionwithnon-addressconstantoperand
• r12 ß r12 + 1• Invalid:addq $1, %r12• Instead,needanextraregister:irmovq $1, %r11addq %r11, %r12
Y86-64:conditioncodes
• ZF— valuewaszero?• SF— signbitwasset?i.e.valuewasnegative?• thiscourse:noOF,CF(tosimplifyassignments)• setbyaddq, subq, andq, xorq• notsetbyanythingelse
Y86-64:usingconditioncodes
j__or cmov__ conditioncodebittest valuetest
le SF=1orZF=1 value<= 0
l SF=1 value<0
e ZF=1 value=0
ne ZF=0 value!=0
ge SF=0 value>= 0
g SF=0andZF=0 value>0
subq SECOND,FIRST(value=FIRST- SECOND)
push/pop
pushq %rbx%rsp ß %rsp − 8memory[%rsp] ß %rbx
popq %rbx%rbx ß memory[%rsp]%rsp ß %rsp + 8
Agenda• ISAvsMicroarchitecture• ISATradeoffs• Y86-64ISA• Y86-64Format• Y86-64encoding/decoding
Y86-64InstructionSet#1Byte
pushq rA A 0 rA F
jXX Dest 7 fn Dest
popq rA B 0 rA F
call Dest 8 0 Dest
cmovXX rA, rB 2 fn rA rB
irmovq V, rB 3 0 F rB V
rmmovq rA, D(rB) 4 0 rA rB D
mrmovq D(rB), rA 5 0 rA rB D
OPq rA, rB 6 fn rA rB
ret 9 0
nop 1 0
halt 0 0
0 1 2 3 4 5 6 7 8 9
0 1 2 3 4 5 6 7 8 9
V
D
D
Y86-64InstructionSet#2Byte
pushq rA A 0 rA F
jXX Dest 7 fn Dest
popq rA B 0 rA F
call Dest 8 0 Dest
cmovXX rA, rB 2 fn rA rB
irmovq V, rB 3 0 F rB
rmmovq rA, D(rB) 4 0 rA rB
mrmovq D(rB), rA 5 0 rA rB
OPq rA, rB 6 fn rA rB
ret 9 0
nop 1 0
halt 0 0
rrmovq 2 0
cmovle 2 1
cmovl 2 2
cmove 2 3
cmovne 2 4
cmovge 2 5
cmovg 2 6
Y86-64InstructionSet#3Byte
pushq rA A 0 rA F
jXX Dest 7 fn Dest
popq rA B 0 rA F
call Dest 8 0 Dest
cmovXX rA, rB 2 fn rA rB
irmovq V, rB 3 0 F rB V
rmmovq rA, D(rB) 4 0 rA rB D
mrmovq D(rB), rA 5 0 rA rB D
OPq rA, rB 6 fn rA rB
ret 9 0
nop 1 0
halt 0 0
0 1 2 3 4 5 6 7 8 9
addq 6 0
subq 6 1
andq 6 2
xorq 6 3
Y86-64InstructionSet#4Byte
pushq rA A 0 rA F
jXX Dest 7 fn Dest
popq rA B 0 rA F
call Dest 8 0 Dest
cmovXX rA, rB 2 fn rA rB
irmovq V, rB 3 0 F rB V
rmmovq rA, D(rB) 4 0 rA rB D
mrmovq D(rB), rA 5 0 rA rB D
OPq rA, rB 6 fn rA rB
ret 9 0
nop 1 0
halt 0 0
0 1 2 3 4 5 6 7 8 9jmp 7 0
jle 7 1
jl 7 2
je 7 3
jne 7 4
jge 7 5
jg 7 6
EncodingRegisters• Eachregisterhas4-bitID
• Sameencodingasinx86-64
• RegisterID15(0xF)indicates“noregister”• Willusethisinourhardwaredesigninmultipleplaces
%rax%rcx%rdx%rbx
0123
%rsp%rbp%rsi%rdi
4567
%r8%r9%r10%r11
89AB
%r12%r13%r14
NoRegister
CDEF
InstructionExample
• AdditionInstruction
• AddvalueinregisterrA tothatinregisterrB• StoreresultinregisterrB• NotethatY86-64onlyallowsadditiontobeappliedtoregisterdata
• Setconditioncodesbasedonresult• e.g.,addq %rax,%rsi Encoding: 60 06• Two-byteencoding
• Firstindicatesinstructiontype• Secondgivessourceanddestinationregisters
addq rA, rB 6 0 rA rB
EncodedRepresentation
GenericForm
ArithmeticandLogicalOperations• Refertogenericallyas“OPq”• Encodingsdifferonlyby“functioncode”• Low-order4bytesinfirstinstructionword
• Setconditioncodesassideeffect
addq rA, rB 6 0 rA rB
subq rA, rB 6 1 rA rB
andq rA, rB 6 2 rA rB
xorq rA, rB 6 3 rA rB
Add
Subtract(rAfromrB)
And
Exclusive-Or
InstructionCode FunctionCode
MoveOperations
• Likethex86-64movq instruction• Simplerformatformemoryaddresses• Givedifferentnamestokeepthemdistinct
rrmovq rA, rB 2 0
Registerè Register
Immediateè Register
irmovq V, rB F rB3 0 V
RegisterèMemory
rmmovq rA, D(rB) 4 0 rA rB D
Memoryè Register
mrmovq D(rB),rA 5 0 rA rB D
rA rB
ConditionalMoveInstructions• Refertogenericallyas“cmovXX”• Encodingsdifferonlyby“functioncode”• Basedonvaluesofconditioncodes• Variantsofrrmovqinstruction• (Conditionally)copyvaluefromsourcetodestinationregister
rrmovq rA,rB
MoveUnconditionally
cmovle rA,rB
MoveWhenLessorEqual
cmovl rA,rB
MoveWhenLess
cmove rA,rB
MoveWhenEqual
cmovne rA,rB
MoveWhenNotEqual
cmovge rA,rB
MoveWhenGreaterorEqual
cmovg rA,rB
MoveWhenGreater
2 0 rA rB
2 1 rA rB
2 2 rA rB
2 3 rA rB
2 4 rA rB
2 5 rA rB
2 6 rA rB
JumpInstructions
• Refertogenericallyas“jXX”• Encodingsdifferonlyby“functioncode”fn• Basedonvaluesofconditioncodes• Sameasx86-64counterparts• Encodefulldestinationaddress
• UnlikePC-relativeaddressingseeninx86-64
jXX Dest 7 fn
Jump(Conditionally)
Dest
JumpInstructions
jmp Dest 7 0
JumpUnconditionally
Dest
jle Dest 7 1
JumpWhenLessorEqual
Dest
jl Dest 7 2
JumpWhenLess
Dest
je Dest 7 3
JumpWhenEqual
Dest
jne Dest 7 4
JumpWhenNotEqual
Dest
jge Dest 7 5
JumpWhenGreaterorEqual
Dest
jg Dest 7 6
JumpWhenGreater
Dest
StackOperations
• Decrement%rsp by8• StorewordfromrA tomemoryat%rsp• Likex86-64
• Readwordfrommemoryat%rsp• SaveinrA• Increment%rsp by8• Likex86-64
pushq rA A 0 rA F
popq rA B 0 rA F
SubroutineCallandReturn
• Pushaddressofnextinstructionontostack• StartexecutinginstructionsatDest• Likex86-64
• Popvaluefromstack• Useasaddressfornextinstruction• Likex86-64
call Dest 8 0 Dest
ret 9 0
MiscellaneousInstructions
• Don’tdoanything
• Stopexecutinginstructions• x86-64hascomparableinstruction,butcan’texecuteitinusermode• Wewilluseittostopthesimulator• Encodingensuresthatprogramhittingmemoryinitializedtozerowillhalt
nop 1 0
halt 0 0
Agenda• ISAvsMicroarchitecture• ISATradeoffs• Y86-64ISA• Y86-64Format• Y86-64Encoding/Decoding
Y86-64encodinglong addOne(long x) {return x + 1;}• x86-64:movq %rdi, %raxaddq $1, %raxret• Y86-64:irmovq $1, %raxaddq %rdi, %raxret
Y86-64encoding
addOne:irmovq $1, %raxaddq %rdi, %raxret
Byte
pushq rA A 0 rA F
jXX Dest 7 fn Dest
popq rA B 0 rA F
call Dest 8 0 Dest
cmovXX rA, rB 2 fn rA rB
irmovq V, rB 3 0 F rB V
rmmovq rA, D(rB)4 0 rA rB D
mrmovq D(rB), rA5 0 rA rB D
OPq rA, rB 6 fn rA rB
ret 9 0
nop 1 0
halt 0 0
0 1 2 3 4 5 6 7 8 9
Y86-64encoding
doubleTillNegative:/* suppose at address 0x123 */
addq %rax, %raxjge doubleTillNegative
Byte
pushq rA A 0 rA F
jXX Dest 7 fn Dest
popq rA B 0 rA F
call Dest 8 0 Dest
cmovXX rA, rB 2 fn rA rB
irmovq V, rB 3 0 F rB V
rmmovq rA, D(rB)4 0 rA rB D
mrmovq D(rB), rA5 0 rA rB D
OPq rA, rB 6 fn rA rB
ret 9 0
nop 1 0
halt 0 0
0 1 2 3 4 5 6 7 8 9
Y86-64decoding2010602061377284000000000000002012200170680000000000000020106020613772840000000000000020122001706800000000000000rrmovq %rcx,%rax• 0ascc:always• 1asreg:%rcx• 0asreg:%raxaddq %rdx,%raxsubq %rbx,%rdi• 0asfn:add• 1asfn:subjl 0x84• 2ascc:l(lessthan)• hex8400…aslittleendianDest:0x84rrmovq %rcx,%rdxrrmovq %rax,%rcxjmp 0x68
Byte
pushq rA A 0 rA F
jXX Dest 7 fn Dest
popq rA B 0 rA F
call Dest 8 0 Dest
cmovXX rA, rB 2 fn rA rB
irmovq V, rB 3 0 F rB V
rmmovq rA, D(rB)4 0 rA rB D
mrmovq D(rB), rA5 0 rA rB D
OPq rA, rB 6 fn rA rB
ret 9 0
nop 1 0
halt 0 0
0 1 2 3 4 5 6 7 8 9