Post on 07-Nov-2020
These PowerPoint slides are modified from the original version available at http://www.cs.cmu.edu/afs/cs/academic/class/15213-s09/www/lectures/ppt-sources/
- History of Intel processors and architectures
- C, assembly, machine code
- Assembly Basics: Registers, operands, move
- Arithmetic & logical operations
- Intel x86 processors totally dominate the computer market
- Evolutionary design
  § Backwards compatible all the way back to the 8086, introduced in 1978
  § Added more features as time goes on
- Complex instruction set computer (CISC)
  § Many different instructions with many different formats
    • But only a small subset is encountered in Linux programs
  § Hard to match the performance of Reduced Instruction Set Computers (RISC)
  § But Intel has done just that!
Name          Date   Transistors   MHz
8086          1978   29K           5-10
  § First 16-bit Intel processor. Basis for IBM PC & DOS
  § 1 MB address space
386           1985   275K          16-33
  § First 32-bit Intel processor, referred to as IA32
  § Added "flat addressing", capable of running Unix
Pentium 4E    2004   125M          2800-3800
  § First 64-bit Intel x86 processor, referred to as x86-64
Core 2        2006   291M          1060-3500
  § First multi-core Intel processor
Core i7       2008   731M          1700-3900
  § Four cores
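[Figure: Intel architectures and processors over time. Architectures: X86-16, X86-32/IA32, X86-64/EM64T. Processors: 8086, 286, 386, 486, Pentium, Pentium MMX, Pentium III, Pentium 4, Pentium 4E, Pentium 4F, Core 2 Duo, Core i7. Instruction-set extensions: MMX, SSE, SSE2, SSE3, SSE4.]
IA: often redefined as the latest Intel architecture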
- Machine Evolution
    Processor      Year   Transistors
    386            1985   0.3M
    Pentium        1993   3.1M
    Pentium/MMX    1997   4.5M
    Pentium Pro    1995   6.5M
    Pentium III    1999   8.2M
    Pentium 4      2001   42M
    Core 2 Duo     2006   291M
    Core i7        2008   731M
- Added Features
  § Instructions to support multimedia operations
  § Instructions to enable more efficient conditional operations
  § Transition from 32 bits to 64 bits
  § More cores
- Historically
  § AMD has followed just behind Intel
  § A little bit slower, a lot cheaper
- Then
  § Recruited top circuit designers from Digital Equipment Corp. and other downward-trending companies
  § Built Opteron: tough competitor to Pentium 4
  § Developed x86-64, their own extension to 64 bits
- Recent Years
  § Intel got its act together
    • Leads the world in semiconductor technology
  § AMD has fallen behind
    • Relies on external semiconductor manufacturer
- 2001: Intel Attempts Radical Shift from IA32 to IA64
  § Totally different architecture (Itanium)
  § Executes IA32 code only as legacy
  § Performance disappointing
- 2003: AMD Steps in with Evolutionary Solution
  § x86-64 (now called "AMD64")
- Intel Felt Obligated to Focus on IA64
  § Hard to admit mistake or that AMD is better
- 2004: Intel Announces EM64T extension to IA32
  § Extended Memory 64-bit Technology
  § Almost identical to x86-64!
- All but low-end x86 processors support x86-64
  § But lots of code still runs in 32-bit mode
- IA32
  § The traditional x86
  § RIP, Spring 2015
- x86-64
  § The standard
  § shark> gcc hello.c
  § shark> gcc -m64 hello.c
- Presentation
  § Book covers x86-64
  § We will only cover x86-64
- History of Intel processors and architectures
- C, assembly, machine code
- Assembly Basics: Registers, operands, move
- Arithmetic & logical operations
- Architecture (also ISA: instruction set architecture): The parts of a processor design that one needs to understand in order to write assembly/machine code
  § Examples: instruction set specification, registers
- Microarchitecture: Implementation of the architecture
  § Examples: cache sizes and core frequency
- Code Forms:
  § Machine Code: The byte-level programs that a processor executes
  § Assembly Code: A text representation of machine code
- Example ISAs:
  § Intel: x86, IA32, Itanium, x86-64
  § ARM: Used in almost all mobile phones
- Programmer-Visible State
  § PC: Program counter
    • Address of next instruction
    • Called "RIP" (x86-64)
  § Register file
    • Heavily used program data
  § Condition codes
    • Store status information about most recent arithmetic operation
    • Used for conditional branching
  § Memory
    • Byte addressable array
    • Code, user data, (some) OS data
    • Includes stack used to support procedures
[Figure: CPU (PC, Registers, Condition Codes) connected to Memory (Object Code, Program Data, OS Data, Stack) by paths labeled Addresses, Data, and Instructions]
- Code in files p1.c p2.c
- Compile with command: gcc -Og p1.c p2.c -o p
  • Use basic optimizations (-Og)
  • Put resulting binary in file p

    text     C program (p1.c p2.c)
               |  Compiler (gcc -Og -S)
    text     Asm program (p1.s p2.s)
               |  Assembler (gcc or as)
    binary   Object program (p1.o p2.o)   +   Static libraries (.a)
               |  Linker (gcc or ld)
    binary   Executable program (p)
C Code (sum.c)

    long plus(long x, long y);

    void sumstore(long x, long y, long *dest)
    {
        long t = plus(x, y);
        *dest = t;
    }

Generated x86-64 Assembly

    sumstore:
        pushq   %rbx
        movq    %rdx, %rbx
        call    plus
        movq    %rax, (%rbx)
        popq    %rbx
        ret

Obtain with command

    gcc -Og -S sum.c

Produces file sum.s
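To try sum.c on its own, plus needs a definition. The slides only declare it, so the body below (a plain addition) is an assumption made for illustration.

```c
/* Assumed definition: the slides declare plus() but never show its body. */
long plus(long x, long y)
{
    return x + y;
}

/* Same function as in sum.c: compute plus(x, y) and store it through dest. */
void sumstore(long x, long y, long *dest)
{
    long t = plus(x, y);
    *dest = t;
}
```

With this definition, calling sumstore(2, 3, &d) leaves 5 in d. The exact assembly the compiler produces for this file can differ from the listing above depending on compiler version and flags.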
- "Integer" data of 1, 2, 4, or 8 bytes
  § Data values
  § Addresses (untyped pointers)
- Floating point data of 4, 8, or 10 bytes
- Code: Byte sequences encoding series of instructions
- No aggregate types such as arrays or structures
  § Just contiguously allocated bytes in memory
- Perform arithmetic function on register or memory data
- Transfer data between memory and register
  § Load data from memory into register
  § Store register data into memory
- Transfer control
  § Unconditional jumps to/from procedures
  § Conditional branches
- Assembler
  § Translates .s into .o
  § Binary encoding of each instruction
  § Nearly-complete image of executable code
  § Missing linkages between code in different files
- Linker
  § Resolves references between files
  § Combines with static run-time libraries
    • E.g., code for malloc, printf
  § Some libraries are dynamically linked
    • Linking occurs when program begins execution

Code for sumstore

    0x0400595:
        0x53
        0x48 0x89 0xd3
        0xe8 0xf2 0xff 0xff 0xff
        0x48 0x89 0x03
        0x5b
        0xc3

    • Total of 14 bytes
    • Each instruction 1, 3, or 5 bytes
    • Starts at address 0x0400595
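As a worked example of reading these bytes, the 5-byte instruction e8 f2 ff ff ff is the call. On x86-64 a call with opcode 0xe8 carries a 32-bit little-endian displacement that is added to the address of the following instruction. The sketch below decodes it; the function name call_target is made up for this example, and the call's own address (0x400599) is derived from the start address plus the 1-byte and 3-byte instructions before it.

```c
#include <stdint.h>

/* Hypothetical helper: given the address of a 5-byte "call rel32"
 * instruction and its bytes, return the address being called. */
uint64_t call_target(uint64_t call_addr, const uint8_t *insn)
{
    /* Bytes 1..4 hold the displacement, least significant byte first. */
    uint32_t u = (uint32_t)insn[1]
               | (uint32_t)insn[2] << 8
               | (uint32_t)insn[3] << 16
               | (uint32_t)insn[4] << 24;

    /* Reinterpret as a signed 32-bit value without relying on
     * implementation-defined conversions. */
    int64_t disp = (int64_t)u;
    if (u & 0x80000000u)
        disp -= (int64_t)1 << 32;

    /* The displacement is relative to the next instruction (call_addr + 5). */
    return call_addr + 5 + (uint64_t)disp;
}
```

For the bytes above, the displacement is 0xfffffff2, i.e. -14, and the next instruction sits at 0x40059e, so the target works out to 0x400590. That this is where plus lives is an inference from the arithmetic, not something the slides state.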
- C Code
    *dest = t;
  § Store value t where designated by dest
- Assembly
    movq %rax, (%rbx)
  § Move 8-byte value to memory
    • Quad words in x86-64 parlance
  § Operands:
    • t:     Register %rax
    • dest:  Register %rbx
    • *dest: Memory M[%rbx]
- Object Code
    0x40059e: 48 89 03
  § 3-byte instruction
  § Stored at address 0x40059e
- Disassembler
  § objdump -d p
  § Useful tool for examining object code
  § Analyzes bit pattern of series of instructions
  § Produces approximate rendition of assembly code
  § Can be run on either a.out (complete executable) or .o file
Disassembled (objdump)

    00401040 <_sum>:
       0:  55          push   %ebp
       1:  89 e5       mov    %esp,%ebp
       3:  8b 45 0c    mov    0xc(%ebp),%eax
       6:  03 45 08    add    0x8(%ebp),%eax
       9:  89 ec       mov    %ebp,%esp
       b:  5d          pop    %ebp
       c:  c3          ret
       d:  8d 76 00    lea    0x0(%esi),%esi

Object

    0x401040: 0x55 0x89 0xe5 0x8b 0x45 0x0c 0x03 0x45 0x08 0x89 0xec 0x5d 0xc3

Disassembled (gdb)

    0x401040 <sum>:     push   %ebp
    0x401041 <sum+1>:   mov    %esp,%ebp
    0x401043 <sum+3>:   mov    0xc(%ebp),%eax
    0x401046 <sum+6>:   add    0x8(%ebp),%eax
    0x401049 <sum+9>:   mov    %ebp,%esp
    0x40104b <sum+11>:  pop    %ebp
    0x40104c <sum+12>:  ret
    0x40104d <sum+13>:  lea    0x0(%esi),%esi

- Within gdb Debugger
  § gdb p
  § disassemble sum
    • Disassemble procedure
  § x/13b sum
    • Examine the 13 bytes starting at sum
- Anything that can be interpreted as executable code
- Disassembler examines bytes and reconstructs assembly source

    % objdump -d WINWORD.EXE

    WINWORD.EXE: file format pei-i386

    No symbols in "WINWORD.EXE".
    Disassembly of section .text:

    30001000 <.text>:
    30001000:  55                push   %ebp
    30001001:  8b ec             mov    %esp,%ebp
    30001003:  6a ff             push   $0xffffffff
    30001005:  68 90 10 00 30    push   $0x30001090
    3000100a:  68 91 dc 4c 30    push   $0x304cdc91
- History of Intel processors and architectures
- C, assembly, machine code
- Assembly Basics: Registers, operands, move
- Arithmetic & logical operations
- Intel uses "word" to refer to a 16-bit data type
  § 32-bit quantities are double words and 64-bit quantities are quad words
- mov: movb, movw, movq, movl

    C declaration    Intel data type       GAS suffix   Size (B)
    char             Byte                  b            1
    short            Word                  w            2
    int              Double word           l            4
    unsigned         Double word           l            4
    long int         Quad word             q            8
    unsigned long    Quad word             q            8
    char *           Quad word             q            8
    float            Single precision      s            4
    double           Double precision      l            8
    long double      Extended precision    t            10/12
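The sizes in the table can be checked against a particular compiler with sizeof. The sketch below is one way to do that; the TypeInfo struct and the two accessor functions are names invented for this example, and the sizes it reports depend on the platform (the table assumes x86-64 Linux).

```c
#include <stddef.h>

/* One row per C declaration in the table above. */
typedef struct {
    const char *c_type;      /* C declaration */
    const char *gas_suffix;  /* GAS suffix from the table */
    size_t size;             /* what this compiler reports */
} TypeInfo;

static const TypeInfo type_rows[] = {
    {"char",          "b", sizeof(char)},
    {"short",         "w", sizeof(short)},
    {"int",           "l", sizeof(int)},
    {"unsigned",      "l", sizeof(unsigned)},
    {"long int",      "q", sizeof(long int)},
    {"unsigned long", "q", sizeof(unsigned long)},
    {"char *",        "q", sizeof(char *)},
    {"float",         "s", sizeof(float)},
    {"double",        "l", sizeof(double)},
    {"long double",   "t", sizeof(long double)},
};

/* Accessors so a caller can loop over the rows and print them. */
const TypeInfo *type_table(void) { return type_rows; }
size_t type_count(void) { return sizeof type_rows / sizeof type_rows[0]; }
```

Note that sizeof(long double) may come out larger than the 10 bytes of data listed in the table, because compilers usually pad the type for alignment.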
x86-64 integer registers

    64-bit   Low 4 bytes      64-bit   Low 4 bytes
    %rax     %eax             %r8      %r8d
    %rbx     %ebx             %r9      %r9d
    %rcx     %ecx             %r10     %r10d
    %rdx     %edx             %r11     %r11d
    %rsi     %esi             %r12     %r12d
    %rdi     %edi             %r13     %r13d
    %rsp     %esp             %r14     %r14d
    %rbp     %ebp             %r15     %r15d

  § Can reference low-order 4 bytes (also low-order 1 & 2 bytes)

IA32 registers

    32-bit   16-bit   High byte   Low byte   Origin (mostly obsolete)
    %eax     %ax      %ah         %al        accumulate
    %ecx     %cx      %ch         %cl        counter
    %edx     %dx      %dh         %dl        data
    %ebx     %bx      %bh         %bl        base
    %esi     %si                             source index
    %edi     %di                             destination index
    %esp     %sp                             stack pointer
    %ebp     %bp                             base pointer

  § %eax through %edi are general purpose
  § 16-bit virtual registers (backwards compatibility)

[Figure: %ax is the low-order 16 bits of %eax; %ah and %al are the high and low bytes of %ax]
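The nesting of register names can be mimicked in C by truncating and shifting a 64-bit value. This is only a sketch of which bits each name refers to; the function names are invented for the example.

```c
#include <stdint.h>

/* Each function returns the bits that the named sub-register would hold
 * if the full 64-bit register (%rax) contained the value rax. */
uint32_t reg_eax(uint64_t rax) { return (uint32_t)rax; }        /* low 4 bytes */
uint16_t reg_ax(uint64_t rax)  { return (uint16_t)rax; }        /* low 2 bytes */
uint8_t  reg_al(uint64_t rax)  { return (uint8_t)rax; }         /* low byte */
uint8_t  reg_ah(uint64_t rax)  { return (uint8_t)(rax >> 8); }  /* second byte */
```

For example, if %rax held 0x1122334455667788, then %eax would be 0x55667788, %ax 0x7788, %ah 0x77, and %al 0x88.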
- Moving Data
    movq Source, Dest
- Operand Types
  § Immediate: Constant integer data
    • Example: $0x400, $-533
    • Like C constant, but prefixed with '$'
    • Encoded with 1, 2, or 4 bytes
  § Register: One of 16 integer registers
    • Example: %rax, %r13
    • But %rsp reserved for special use
    • Others have special uses for particular instructions
  § Memory: 8 consecutive bytes of memory at address given by register
    • Simplest example: (%rax)
    • Various other "address modes"
[Figure: the integer registers %rax, %rcx, %rdx, %rbx, %rsi, %rdi, %rsp, %rbp, %rN]
    Source   Dest   Example                C analog
    Imm      Reg    movq $0x4,%rax         temp = 0x4;
    Imm      Mem    movq $-147,(%rax)      *p = -147;
    Reg      Reg    movq %rax,%rdx         temp2 = temp1;
    Reg      Mem    movq %rax,(%rdx)       *p = temp;
    Mem      Reg    movq (%rax),%rdx       temp = *p;

- Cannot do memory-memory transfer with a single instruction
- Normal    (R)    Mem[Reg[R]]
  § Register R specifies memory address
  § Aha! Pointer dereferencing in C
      movq (%rcx),%rax
- Displacement    D(R)    Mem[Reg[R]+D]
  § Register R specifies start of memory region
  § Constant displacement D specifies offset
      movq 8(%rbp),%rdx
    void swap(long *xp, long *yp)
    {
        long t0 = *xp;
        long t1 = *yp;
        *xp = t1;
        *yp = t0;
    }

    swap:
        movq    (%rdi), %rax    # t0 = *xp
        movq    (%rsi), %rdx    # t1 = *yp
        movq    %rdx, (%rdi)    # *xp = t1
        movq    %rax, (%rsi)    # *yp = t0
        ret

    Register   Value
    %rdi       xp
    %rsi       yp
    %rax       t0
    %rdx       t1

Step-by-step state of registers and memory (memory shown at addresses 0x100 through 0x120; only 0x120 and 0x100 hold values):

    After instruction      %rdi    %rsi    %rax   %rdx   M[0x120]   M[0x100]
    (initial state)        0x120   0x100                 123        456
    movq (%rdi), %rax      0x120   0x100   123           123        456
    movq (%rsi), %rdx      0x120   0x100   123    456    123        456
    movq %rdx, (%rdi)      0x120   0x100   123    456    456        456
    movq %rax, (%rsi)      0x120   0x100   123    456    456        123
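The swap trace can be replayed in C by modelling the registers as variables and the five 8-byte memory cells at 0x100 through 0x120 as an array. This is a rough simulation for illustration only; the State struct and helper names are invented here.

```c
#define MEM_BASE 0x100L   /* lowest address shown in the trace */

/* Registers used by swap, plus the memory cells 0x100, 0x108, ..., 0x120. */
typedef struct {
    long rdi, rsi, rax, rdx;
    long mem[5];
} State;

/* Map a simulated address to its 8-byte cell (no bounds checking). */
long *cell(State *s, long addr)
{
    return &s->mem[(addr - MEM_BASE) / 8];
}

/* Replay the four movq instructions of swap in order. */
void swap_trace(State *s)
{
    s->rax = *cell(s, s->rdi);   /* movq (%rdi), %rax   t0 = *xp */
    s->rdx = *cell(s, s->rsi);   /* movq (%rsi), %rdx   t1 = *yp */
    *cell(s, s->rdi) = s->rdx;   /* movq %rdx, (%rdi)   *xp = t1 */
    *cell(s, s->rsi) = s->rax;   /* movq %rax, (%rsi)   *yp = t0 */
}
```

Starting with rdi = 0x120, rsi = 0x100, 123 stored at 0x120 and 456 at 0x100, the final state has 456 at 0x120 and 123 at 0x100, with %rax = 123 and %rdx = 456.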
- Most General Form
    D(Rb,Ri,S)    Mem[Reg[Rb]+S*Reg[Ri]+D]
  § D:  Constant "displacement" of 1, 2, or 4 bytes
  § Rb: Base register: Any of 16 integer registers
  § Ri: Index register: Any, except for %rsp
  § S:  Scale: 1, 2, 4, or 8 (why these numbers?)
- Special Cases
    (Rb,Ri)      Mem[Reg[Rb]+Reg[Ri]]
    D(Rb,Ri)     Mem[Reg[Rb]+Reg[Ri]+D]
    (Rb,Ri,S)    Mem[Reg[Rb]+S*Reg[Ri]]
Carnegie Mellon
    %rdx   0xf000
    %rcx   0x0100

    Expression       Address Computation    Address
    0x8(%rdx)        0xf000 + 0x8           0xf008
    (%rdx,%rcx)      0xf000 + 0x100         0xf100
    (%rdx,%rcx,4)    0xf000 + 4*0x100       0xf400
    0x80(,%rdx,2)    2*0xf000 + 0x80        0x1e080
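The address computation D(Rb,Ri,S) = Reg[Rb] + S*Reg[Ri] + D can be written as a small C function to check the table. The function name is invented for this sketch, and an omitted base or index register is modelled by passing 0.

```c
#include <assert.h>
#include <stdint.h>

/* Effective address for D(Rb,Ri,S). Pass 0 for a register that is
 * omitted in the assembly operand, and scale 1 when no scale is given. */
uint64_t effective_address(int64_t d, uint64_t rb, uint64_t ri, unsigned scale)
{
    /* The hardware only encodes these four scale factors. */
    assert(scale == 1 || scale == 2 || scale == 4 || scale == 8);
    return rb + (uint64_t)scale * ri + (uint64_t)d;
}
```

With %rdx = 0xf000 and %rcx = 0x100, effective_address(0x8, 0xf000, 0, 1) gives 0xf008 and effective_address(0x80, 0, 0xf000, 2) gives 0x1e080, matching the first and last rows.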
- Exercise: %dh = 0x8d, %eax = 0x98765432
    movb   %dh,%al      %eax = 0x9876548D
    movsbl %dh,%eax     %eax = 0xFFFFFF8D
    movzbl %dh,%eax     %eax = 0x0000008D
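The three results in the exercise can be reproduced with bit operations in C. The functions below are named after the instructions for readability, but they are only a model of the effect on a 32-bit register, not the instructions themselves.

```c
#include <stdint.h>

/* Byte move into the low byte: the upper three bytes are left unchanged. */
uint32_t movb_low(uint32_t dest, uint8_t src)
{
    return (dest & 0xFFFFFF00u) | src;
}

/* Sign-extending move: copy bit 7 of src into bits 8..31. */
uint32_t movsbl(uint8_t src)
{
    return (src & 0x80u) ? (0xFFFFFF00u | src) : src;
}

/* Zero-extending move: bits 8..31 become 0. */
uint32_t movzbl(uint8_t src)
{
    return src;
}
```

With src = 0x8d and dest = 0x98765432, these give 0x9876548D, 0xFFFFFF8D, and 0x0000008D respectively, the same values as the exercise.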
    Instruction    Effect                       Description
    movl S,D       D ← S                        Move double word
    movw S,D       D ← S                        Move word
    movb S,D       D ← S                        Move byte
    movsbl S,D     D ← SignExtend(S)            Move sign-extended byte
    movzbq S,D     D ← ZeroExtend(S)            Move zero-extended byte
    pushl S        R[%esp] ← R[%esp] - 4;       Push
                   M[R[%esp]] ← S
    popl D         D ← M[R[%esp]];              Pop
                   R[%esp] ← R[%esp] + 4
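The push and pop rows describe two steps each: adjust the stack pointer and access memory at the address it holds. A small simulation makes the ordering concrete. The Machine struct, the 64-byte memory size, and the function names are all assumptions made for this sketch, and it does no bounds checking.

```c
#include <stdint.h>
#include <string.h>

#define STACK_BYTES 64   /* arbitrary size for the simulated memory */

typedef struct {
    uint8_t  mem[STACK_BYTES];
    uint32_t esp;            /* simulated %esp, an index into mem */
} Machine;

/* Start with an empty stack: %esp points just past the end of memory. */
void machine_init(Machine *m)
{
    memset(m->mem, 0, sizeof m->mem);
    m->esp = STACK_BYTES;
}

/* pushl S:  R[%esp] <- R[%esp] - 4;  M[R[%esp]] <- S */
void pushl(Machine *m, uint32_t s)
{
    m->esp -= 4;
    memcpy(&m->mem[m->esp], &s, 4);
}

/* popl D:   D <- M[R[%esp]];  R[%esp] <- R[%esp] + 4 */
uint32_t popl(Machine *m)
{
    uint32_t d;
    memcpy(&d, &m->mem[m->esp], 4);
    m->esp += 4;
    return d;
}
```

Pushing 0x123 and then 7 moves esp from 64 to 56; popping twice returns 7 and then 0x123 and restores esp to 64, which shows the stack growing toward lower addresses.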
- History of Intel processors and architectures
- C, assembly, machine code
- Assembly Basics: Registers, operands, move
- Arithmetic & logical operations
- leaq Src, Dst
  § Src is address mode expression
  § Set Dst to address denoted by expression
- Uses
  § Computing addresses without a memory reference
    • E.g., translation of p = &x[i];
  § Computing arithmetic expressions of the form x + k*y
    • k = 1, 2, 4, or 8
- Example

    long m12(long x)
    {
        return x*12;
    }

  Converted to ASM by compiler:

    leaq (%rdi,%rdi,2), %rax    # t <- x+x*2
    salq $2, %rax               # return t<<2
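The two-instruction sequence can be mirrored in C to see why it equals x*12: the leaq computes 3x and the shift multiplies that by 4. The function m12_lea is a name made up for this sketch. The shift is done on an unsigned value because left-shifting a negative signed value is undefined in C, and converting the result back to long assumes the usual two's-complement behaviour of gcc and clang.

```c
/* Reference version from the slide. */
long m12(long x)
{
    return x * 12;
}

/* Step-by-step version mirroring the generated assembly. */
long m12_lea(long x)
{
    long t = x + x * 2;                    /* leaq (%rdi,%rdi,2), %rax */
    return (long)((unsigned long)t << 2);  /* salq $2, %rax            */
}
```

For inputs where x*12 does not overflow, both functions return the same value, for example 60 for x = 5.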
- Two Operand Instructions:

    Format              Computation
    addq  Src,Dest      Dest = Dest + Src
    subq  Src,Dest      Dest = Dest - Src
    imulq Src,Dest      Dest = Dest * Src
    salq  Src,Dest      Dest = Dest << Src     Also called shlq
    sarq  Src,Dest      Dest = Dest >> Src     Arithmetic
    shrq  Src,Dest      Dest = Dest >> Src     Logical
    xorq  Src,Dest      Dest = Dest ^ Src
    andq  Src,Dest      Dest = Dest & Src
    orq   Src,Dest      Dest = Dest | Src

- Watch out for argument order!
- No distinction between signed and unsigned int (why?)
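Two points in this table can be checked in C: the difference between the arithmetic and logical right shifts, and the fact that addition produces the same bit pattern whether the operands are viewed as signed or unsigned (one likely answer to the "why?" above, assuming two's-complement representation). The function names are invented for this sketch, and the arithmetic shift is written out explicitly because C leaves >> on negative signed values implementation-defined.

```c
#include <assert.h>
#include <stdint.h>

/* Logical right shift (like shrq): vacated high bits are filled with 0. */
uint64_t shr64(uint64_t x, unsigned n)
{
    assert(n < 64);
    return x >> n;
}

/* Arithmetic right shift (like sarq): vacated high bits copy the sign bit.
 * For negative x, invert, shift in zeros, and invert again so ones enter. */
int64_t sar64(int64_t x, unsigned n)
{
    assert(n < 64);
    if (x >= 0)
        return (int64_t)((uint64_t)x >> n);
    return (int64_t)~(~(uint64_t)x >> n);
}

/* Addition on the raw bits: one operation serves signed and unsigned. */
uint64_t add_bits(uint64_t a, uint64_t b)
{
    return a + b;
}
```

For example, sar64(-16, 2) is -4, while shr64 applied to the same bit pattern gives 0x3FFFFFFFFFFFFFFC. Adding the bit pattern of -1 to 2 with add_bits yields 1, the same bits a signed addition would produce.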
- One Operand Instructions

    incq Dest      Dest = Dest + 1
    decq Dest      Dest = Dest - 1
    negq Dest      Dest = -Dest
    notq Dest      Dest = ~Dest

- See book for more instructions
    long arith(long x, long y, long z)
    {
        long t1 = x+y;
        long t2 = z+t1;
        long t3 = x+4;
        long t4 = y * 48;
        long t5 = t3 + t4;
        long rval = t2 * t5;
        return rval;
    }

    arith:
        leaq    (%rdi,%rsi), %rax      # t1
        addq    %rdx, %rax             # t2
        leaq    (%rsi,%rsi,2), %rdx
        salq    $4, %rdx               # t4
        leaq    4(%rdi,%rdx), %rcx     # t5
        imulq   %rcx, %rax             # rval
        ret

Interesting Instructions
  § leaq: address computation
  § salq: shift
  § imulq: multiplication
    • But only used once

    Register   Use(s)
    %rdi       Argument x
    %rsi       Argument y
    %rdx       Argument z
    %rax       t1, t2, rval
    %rdx       t4
    %rcx       t5
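To see how the assembly maps back to the C source, the instructions can be transcribed one per line with variables named after the registers. This transcription (arith_steps) is an illustration written for this purpose, not compiler output. The salq is written as a multiplication by 16 to avoid shifting a negative signed value in C.

```c
/* Original function from the slide. */
long arith(long x, long y, long z)
{
    long t1 = x + y;
    long t2 = z + t1;
    long t3 = x + 4;
    long t4 = y * 48;
    long t5 = t3 + t4;
    long rval = t2 * t5;
    return rval;
}

/* One C statement per assembly instruction; x, y, z arrive in
 * %rdi, %rsi, %rdx, and %rdx is reused for t4 once z has been consumed. */
long arith_steps(long x, long y, long z)
{
    long rax = x + y;        /* leaq (%rdi,%rsi), %rax     t1          */
    rax += z;                /* addq %rdx, %rax            t2          */
    long rdx = y + y * 2;    /* leaq (%rsi,%rsi,2), %rdx   3*y         */
    rdx *= 16;               /* salq $4, %rdx              t4 = 48*y   */
    long rcx = x + rdx + 4;  /* leaq 4(%rdi,%rdx), %rcx    t5 = t3+t4  */
    rax *= rcx;              /* imulq %rcx, %rax           rval        */
    return rax;
}
```

Note that t3 never gets its own register: the constant 4 is folded into the displacement of the last leaq. For x = 1, y = 2, z = 3 both functions return 606.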
- History of Intel processors and architectures
  § Evolutionary design leads to many quirks and artifacts
- C, assembly, machine code
  § New forms of visible state: program counter, registers, ...
  § Compiler must transform statements, expressions, procedures into low-level instruction sequences
- Assembly Basics: Registers, operands, move
  § The x86-64 move instructions cover a wide range of data movement forms
- Arithmetic
  § C compiler will figure out different instruction combinations to carry out computation