lec2 machinelevelprogramming basics

Introduction to Computer Systems 15-213/18-243, spring 2009

Machine-Level Programming I: Basics

15-213/18-243: Introduction to Computer Systems
4th Lecture, Sep. 2, 2010
Instructors: Randy Bryant and Dave O'Hallaron
Carnegie Mellon

Today: Machine Programming I: Basics
- History of Intel processors and architectures
- C, assembly, machine code
- Assembly Basics: Registers, operands, move
- Intro to x86-64

Intel x86 Processors
- Totally dominate laptop/desktop/server market
- Evolutionary design
  - Backwards compatible all the way back to the 8086, introduced in 1978
  - Added more features as time goes on
- Complex instruction set computer (CISC)
  - Many different instructions with many different formats
  - But, only small subset encountered with Linux programs
  - Hard to match performance of Reduced Instruction Set Computers (RISC)
  - But, Intel has done just that!
    - In terms of speed. Less so for low power.

Intel x86 Evolution: Milestones

    Name         Date   Transistors   MHz
    8086         1978   29K           5-10
    386          1985   275K          16-33
    Pentium 4F   2004   125M          2800-3800
    Core i7      2008   731M          2667-3333

- 8086
  - First 16-bit processor. Basis for IBM PC & DOS
  - 1MB address space
- 386
  - First 32-bit processor, referred to as IA32
  - Added "flat addressing"
  - Capable of running Unix
  - 32-bit Linux/gcc uses no instructions introduced in later models
- Pentium 4F
  - First 64-bit Intel x86 processor, referred to as x86-64
- Core i7
  - Our shark machines

Intel x86 Processors: Overview
[Diagram: processors in time order against the architectures they introduced]
- Architectures: X86-16, X86-32/IA32, MMX, SSE, SSE2, SSE3, X86-64/EM64t, SSE4
- Processors (in time order): 8086, 286, 386, 486, Pentium, Pentium MMX, Pentium III, Pentium 4, Pentium 4E, Pentium 4F, Core 2 Duo, Core i7
- IA: often redefined as latest Intel architecture

Intel x86 Processors, contd.
- Machine Evolution

    Name          Date   Transistors
    386           1985   0.3M
    Pentium       1993   3.1M
    Pentium/MMX   1997   4.5M
    PentiumPro    1995   6.5M
    Pentium III   1999   8.2M
    Pentium 4     2001   42M
    Core 2 Duo    2006   291M
    Core i7       2008   731M

- Added Features
  - Instructions to support multimedia operations
    - Parallel operations on 1, 2, and 4-byte data, both integer & FP
  - Instructions to enable more efficient conditional operations
- Linux/GCC Evolution
  - Two major steps: 1) support 32-bit 386. 2) support 64-bit x86-64

More Information
- Intel processors (Wikipedia)
- Intel microarchitectures

New Species: ia64, then IPF, then Itanium, ...

    Name                  Date   Transistors
    Itanium               2001   10M
    Itanium 2             2002   221M
    Itanium 2 Dual-Core   2006   1.7B

- Itanium
  - First shot at 64-bit architecture: first called IA64
  - Radically new instruction set designed for high performance
  - Can run existing IA32 programs
    - On-board "x86 engine"
  - Joint project with Hewlett-Packard
- Itanium 2
  - Big performance boost
- Itanium has not taken off in marketplace
  - Lack of backward compatibility, no good compiler support, Pentium 4 got too good

x86 Clones: Advanced Micro Devices (AMD)
- Historically
  - AMD has followed just behind Intel
  - A little bit slower, a lot cheaper
- Then
  - Recruited top circuit designers from Digital Equipment Corp. and other downward trending companies
  - Built Opteron: tough competitor to Pentium 4
  - Developed x86-64, their own extension to 64 bits

Intel's 64-Bit
- Intel Attempted Radical Shift from IA32 to IA64
  - Totally different architecture (Itanium)
  - Executes IA32 code only as legacy
  - Performance disappointing
- AMD Stepped in with Evolutionary Solution
  - x86-64 (now called "AMD64")
- Intel Felt Obligated to Focus on IA64
  - Hard to admit mistake or that AMD is better
- 2004: Intel Announces EM64T extension to IA32
  - Extended Memory 64-bit Technology
  - Almost identical to x86-64!
- All but low-end x86 processors support x86-64
  - But, lots of code still runs in 32-bit mode

Our Coverage
- IA32
  - The traditional x86
- x86-64/EM64T
  - The emerging standard
- Presentation
  - Book presents IA32 in Sections 3.1-3.12
  - Covers x86-64 in 3.13
  - We will cover both simultaneously
  - Some labs will be based on x86-64, others on IA32

Definitions
- Architecture (also instruction set architecture: ISA): The parts of a processor design that one needs to understand to write assembly code.
  - Examples: instruction set specification, registers.
- Microarchitecture: Implementation of the architecture.
  - Examples: cache sizes and core frequency.
- Example ISAs (Intel): x86, IA, IPF

Assembly Programmer's View
[Diagram: CPU (PC, registers, condition codes) connected to memory (object code, program data, OS data, stack); addresses flow from CPU to memory, data flows both ways, instructions flow from memory to CPU]
- Programmer-Visible State
  - PC: Program counter
    - Address of next instruction
    - Called "EIP" (IA32) or "RIP" (x86-64)
  - Register file
    - Heavily used program data
  - Condition codes
    - Store status information about most recent arithmetic operation
    - Used for conditional branching
- Memory
  - Byte addressable array
  - Code, user data, (some) OS data
  - Includes stack used to support procedures

Turning C into Object Code
- Code in files p1.c p2.c
- Compile with command: gcc -O1 p1.c p2.c -o p
  - Use basic optimizations (-O1)
  - Put resulting binary in file p
- Pipeline:
    text     C program (p1.c p2.c)
               -> Compiler (gcc -S)
    text     Asm program (p1.s p2.s)
               -> Assembler (gcc or as)
    binary   Object program (p1.o p2.o)
               -> Linker (gcc or ld), together with static libraries (.a)
    binary   Executable program (p)

Compiling Into Assembly
- C Code

    int sum(int x, int y)
    {
      int t = x+y;
      return t;
    }

- Generated IA32 Assembly

    sum:
       pushl %ebp
       movl %esp,%ebp
       movl 12(%ebp),%eax
       addl 8(%ebp),%eax
       popl %ebp
       ret

- Obtain with command: /usr/local/bin/gcc -O1 -S code.c
  - Produces file code.s
- Some compilers use instruction "leave"

Assembly Characteristics: Data Types
- "Integer" data of 1, 2, or 4 bytes
  - Data values
  - Addresses (untyped pointers)
- Floating point data of 4, 8, or 10 bytes
- No aggregate types such as arrays or structures
  - Just contiguously allocated bytes in memory

Assembly Characteristics: Operations
- Perform arithmetic function on register or memory data
- Transfer data between memory and register
  - Load data from memory into register
  - Store register data into memory
- Transfer control
  - Unconditional jumps to/from procedures
  - Conditional branches

Object Code
- Code for sum

    0x401040 <sum>:
        0x55 0x89 0xe5 0x8b 0x45 0x0c 0x03 0x45 0x08 0x5d 0xc3

  - Total of 11 bytes
  - Each instruction 1, 2, or 3 bytes
  - Starts at address 0x401040
- Assembler
  - Translates .s into .o
  - Binary encoding of each instruction
  - Nearly-complete image of executable code
  - Missing linkages between code in different files
- Linker
  - Resolves references between files
  - Combines with static run-time libraries
    - E.g., code for malloc, printf
  - Some libraries are dynamically linked
    - Linking occurs when program begins execution

Machine Instruction Example
- C Code

    int t = x+y;

  - Add two signed integers
- Assembly

    addl 8(%ebp),%eax

  - Add 2 4-byte integers
    - "Long" words in GCC parlance
    - Same instruction whether signed or unsigned
  - Operands:
      x:  Register  %eax
      y:  Memory    M[%ebp+8]
      t:  Register  %eax
    - Return function value in %eax
  - Similar to expression: x += y
  - More precisely:
      int eax;
      int *ebp;
      eax += ebp[2]
- Object Code

    0x80483ca:  03 45 08

  - 3-byte instruction
  - Stored at address 0x80483ca
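The point that addl is the "same instruction whether signed or unsigned" can be checked in C. The sketch below is only an illustration: the function names are made up for this example, and addl is modeled as addition modulo 2^32 on unsigned values so that wraparound is well defined in C.

```c
#include <assert.h>
#include <stdint.h>

/* Model of addl on 32-bit operands: addition modulo 2^32.
   Unsigned arithmetic is used so that wraparound is well defined in C. */
uint32_t addl_bits(uint32_t x, uint32_t y) {
    return x + y;
}

/* Returns 1 if adding the two's-complement bit patterns of x and y gives
   the bit pattern of the exact signed sum, reduced modulo 2^32. */
int same_bits_for_signed(int32_t x, int32_t y) {
    int64_t exact = (int64_t)x + (int64_t)y;   /* cannot overflow in 64 bits */
    return addl_bits((uint32_t)x, (uint32_t)y) == (uint32_t)exact;
}

/* A few spot checks; returns 1 if all of them hold. */
int add_examples_hold(void) {
    return addl_bits(0xFFFFFFFFu, 5u) == 4u    /* unsigned view: wraps around */
        && same_bits_for_signed(-1, 5)         /* signed view: -1 + 5 = 4     */
        && same_bits_for_signed(INT32_MIN, -1) /* signed overflow wraps, too  */
        && same_bits_for_signed(123, 456);
}
```

Calling add_examples_hold() from a test or from main should return 1. The hardware does not need to know which interpretation the programmer has in mind; only later instructions that read the condition codes distinguish the two.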

Disassembling Object Code
- Disassembled

    080483c4 <sum>:
     80483c4:  55          push   %ebp
     80483c5:  89 e5       mov    %esp,%ebp
     80483c7:  8b 45 0c    mov    0xc(%ebp),%eax
     80483ca:  03 45 08    add    0x8(%ebp),%eax
     80483cd:  5d          pop    %ebp
     80483ce:  c3          ret

- Disassembler

    objdump -d p

  - Useful tool for examining object code
  - Analyzes bit pattern of series of instructions
  - Produces approximate rendition of assembly code
  - Can be run on either a.out (complete executable) or .o file
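To make "analyzes bit pattern of series of instructions" concrete, here is a deliberately tiny sketch of the first thing a disassembler must do: find where each instruction ends. It is not how objdump is implemented. It recognizes only the six encodings that appear in sum, and the function names are invented for this example.

```c
#include <assert.h>
#include <stddef.h>

/* Length in bytes of the instruction at p, given that avail bytes remain.
   Returns 0 for anything outside the handful of encodings used by sum. */
size_t insn_length(const unsigned char *p, size_t avail) {
    if (avail == 0) return 0;
    switch (p[0]) {
    case 0x55:   /* push %ebp */
    case 0x5d:   /* pop %ebp  */
    case 0xc3:   /* ret       */
        return 1;
    case 0x89:   /* mov register -> register/memory */
    case 0x8b:   /* mov register/memory -> register */
    case 0x03: { /* add register/memory -> register */
        unsigned mod, rm;
        if (avail < 2) return 0;
        mod = p[1] >> 6;   /* top two bits of the ModRM byte */
        rm  = p[1] & 7;    /* low three bits of the ModRM byte */
        if (mod == 3) return 2;                          /* register operand */
        if (mod == 1 && rm != 4 && avail >= 3) return 3; /* 8-bit displacement */
        return 0;                                        /* not handled here */
    }
    default:
        return 0;
    }
}

/* Number of instructions in code[0..n), or -1 if decoding gets stuck. */
int count_instructions(const unsigned char *code, size_t n) {
    size_t pos = 0;
    int count = 0;
    while (pos < n) {
        size_t len = insn_length(code + pos, n - pos);
        if (len == 0) return -1;
        pos += len;
        count++;
    }
    return count;
}

/* The 11 bytes of sum from the listing. */
const unsigned char sum_code[11] = {
    0x55, 0x89, 0xe5, 0x8b, 0x45, 0x0c, 0x03, 0x45, 0x08, 0x5d, 0xc3
};
```

Walking sum_code with count_instructions splits the 11 bytes into the six instructions of the listing, with lengths 1, 2, 3, 3, 1, 1. A real disassembler handles prefixes, SIB bytes, 32-bit displacements and hundreds of opcodes, which is why its output is only an approximate rendition of the original source.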

Alternate Disassembly
- Object

    0x401040:
        0x55 0x89 0xe5 0x8b 0x45 0x0c 0x03 0x45 0x08 0x5d 0xc3

- Disassembled

    Dump of assembler code for function sum:
    0x080483c4 <sum+0>:   push   %ebp
    0x080483c5 <sum+1>:   mov    %esp,%ebp
    0x080483c7 <sum+3>:   mov    0xc(%ebp),%eax
    0x080483ca <sum+6>:   add    0x8(%ebp),%eax
    0x080483cd <sum+9>:   pop    %ebp
    0x080483ce <sum+10>:  ret

- Within gdb Debugger

    gdb p
    disassemble sum

  - Disassemble procedure

    x/11xb sum

  - Examine the 11 bytes starting at sum

What Can be Disassembled?
- Anything that can be interpreted as executable code
- Disassembler examines bytes and reconstructs assembly source

    % objdump -d WINWORD.EXE

    WINWORD.EXE:     file format pei-i386

    No symbols in "WINWORD.EXE".
    Disassembly of section .text:

    30001000 <.text>:
    30001000:  55               push   %ebp
    30001001:  8b ec            mov    %esp,%ebp
    30001003:  6a ff            push   $0xffffffff
    30001005:  68 90 10 00 30   push   $0x30001090
    3000100a:  68 91 dc 4c 30   push   $0x304cdc91

Integer Registers (IA32)

    32-bit   16-bit   8-bit (high/low)   Origin (mostly obsolete)
    %eax     %ax      %ah / %al          accumulate
    %ecx     %cx      %ch / %cl          counter
    %edx     %dx      %dh / %dl          data
    %ebx     %bx      %bh / %bl          base
    %esi     %si                         source index
    %edi     %di                         destination index
    %esp     %sp                         stack pointer
    %ebp     %bp                         base pointer

- %eax through %edi: general purpose
- %ax through %bp: 16-bit virtual registers (backwards compatibility)

Moving Data: IA32
- Moving Data
    movl Source, Dest
- Operand Types
  - Immediate: Constant integer data
    - Example: $0x400, $-533
    - Like C constant, but prefixed with '$'
    - Encoded with 1, 2, or 4 bytes
  - Register: One of 8 integer registers
    - Example: %eax, %edx
    - But %esp and %ebp reserved for special use
    - Others have special uses for particular instructions
  - Memory: 4 consecutive bytes of memory at address given by register
    - Simplest example: (%eax)
    - Various other "address modes"

movl Operand Combinations

    Source   Dest   Src,Dest             C Analog
    Imm      Reg    movl $0x4,%eax       temp = 0x4;
    Imm      Mem    movl $-147,(%eax)    *p = -147;
    Reg      Reg    movl %eax,%edx       temp2 = temp1;
    Reg      Mem    movl %eax,(%edx)     *p = temp;
    Mem      Reg    movl (%eax),%edx     temp = *p;

- Cannot do memory-memory transfer with a single instruction

Simple Memory Addressing Modes
- Normal    (R)    Mem[Reg[R]]
  - Register R specifies memory address

    movl (%ecx),%eax
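The five movl operand combinations, and the rule that there is no memory-to-memory form, can be mimicked with a toy machine in C. Everything below (the Machine type, the mov_* helpers, the 64-byte memory) is a hypothetical model made up for illustration, not part of any real toolchain; memory operands use the normal mode (R), so the address comes straight from a register.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Register names in the order the slides list them (not the hardware encoding). */
enum { EAX, ECX, EDX, EBX, ESI, EDI, ESP, EBP, NREGS };

/* Hypothetical toy machine: 8 registers plus 64 bytes of memory. */
typedef struct {
    uint32_t reg[NREGS];
    uint8_t  mem[64];
} Machine;

/* Read the 4 consecutive bytes at addr. */
uint32_t load32(const Machine *m, uint32_t addr) {
    uint32_t v;
    assert(addr <= sizeof m->mem - 4);
    memcpy(&v, &m->mem[addr], 4);
    return v;
}

/* Write 4 consecutive bytes at addr. */
void store32(Machine *m, uint32_t addr, uint32_t v) {
    assert(addr <= sizeof m->mem - 4);
    memcpy(&m->mem[addr], &v, 4);
}

/* movl $imm,%dst */
void mov_imm_reg(Machine *m, uint32_t imm, int dst) { m->reg[dst] = imm; }
/* movl $imm,(%dst) */
void mov_imm_mem(Machine *m, uint32_t imm, int dst) { store32(m, m->reg[dst], imm); }
/* movl %src,%dst */
void mov_reg_reg(Machine *m, int src, int dst) { m->reg[dst] = m->reg[src]; }
/* movl %src,(%dst) */
void mov_reg_mem(Machine *m, int src, int dst) { store32(m, m->reg[dst], m->reg[src]); }
/* movl (%src),%dst */
void mov_mem_reg(Machine *m, int src, int dst) { m->reg[dst] = load32(m, m->reg[src]); }

/* *q = *p with p in %eax and q in %edx. There is no Mem -> Mem form,
   so the word travels through a scratch register (%ecx here). */
void copy_word(Machine *m) {
    mov_mem_reg(m, EAX, ECX);   /* movl (%eax),%ecx */
    mov_reg_mem(m, ECX, EDX);   /* movl %ecx,(%edx) */
}

/* Returns 1 if the two-step copy moved -147 from address 8 to address 16. */
int demo_copy(void) {
    Machine m;
    memset(&m, 0, sizeof m);
    mov_imm_reg(&m, 8, EAX);                /* p = 8     */
    mov_imm_reg(&m, 16, EDX);               /* q = 16    */
    mov_imm_mem(&m, (uint32_t)-147, EAX);   /* *p = -147 */
    copy_word(&m);                          /* *q = *p   */
    return load32(&m, 16) == (uint32_t)-147;
}
```

The C statement *q = *p looks like one step, but in this model, as on IA32, it costs a load followed by a store.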

- Displacement    D(R)    Mem[Reg[R]+D]
  - Register R specifies start of memory region
  - Constant displacement D specifies offset

    movl 8(%ebp),%edx

Using Simple Addressing Modes

    void swap(int *xp, int *yp)
    {
      int t0 = *xp;
      int t1 = *yp;
      *xp = t1;
      *yp = t0;
    }

    swap:
       pushl %ebp            # Set Up
       movl %esp,%ebp
       pushl %ebx

       movl 8(%ebp), %edx    # Body
       movl 12(%ebp), %ecx
       movl (%edx), %ebx
       movl (%ecx), %eax
       movl %eax, (%edx)
       movl %ebx, (%ecx)

       popl %ebx             # Finish
       popl %ebp
       ret

Understanding Swap
- Register allocation

    Register   Value
    %edx       xp
    %ecx       yp
    %ebx       t0
    %eax       t1

- Stack (in memory), with the values used in the example

    Offset   Address   Contents
      12     0x110     yp (0x120)
       8     0x10c     xp (0x124)
       4     0x108     Rtn adr
       0     0x104     Old %ebp    <- %ebp
      -4     0x100     Old %ebx    <- %esp

- Body, annotated

    movl 8(%ebp), %edx     # edx = xp
    movl 12(%ebp), %ecx    # ecx = yp
    movl (%edx), %ebx      # ebx = *xp (t0)
    movl (%ecx), %eax      # eax = *yp (t1)
    movl %eax, (%edx)      # *xp = t1
    movl %ebx, (%ecx)      # *yp = t0

- Step-by-step trace, starting with 123 at address 0x124 and 456 at address 0x120

    After instruction        %edx    %ecx    %ebx   %eax   M[0x124]   M[0x120]
    (on entry to body)                                      123        456
    movl 8(%ebp), %edx       0x124                          123        456
    movl 12(%ebp), %ecx      0x124   0x120                  123        456
    movl (%edx), %ebx        0x124   0x120   123            123        456
    movl (%ecx), %eax        0x124   0x120   123    456     123        456
    movl %eax, (%edx)        0x124   0x120   123    456     456        456
    movl %ebx, (%ecx)        0x124   0x120   123    456     456        123

Complete Memory Addressing Modes
- Most General Form

    D(Rb,Ri,S)    Mem[Reg[Rb]+S*Reg[Ri]+D]

  - D: Constant "displacement" 1, 2, or 4 bytes
  - Rb: Base register: Any of 8 integer registers
  - Ri: Index register: Any, except for %esp
    - Unlikely you'd use %ebp, either
  - S: Scale: 1, 2, 4, or 8 (why these numbers?)
- Special Cases

    (Rb,Ri)      Mem[Reg[Rb]+Reg[Ri]]
    D(Rb,Ri)     Mem[Reg[Rb]+Reg[Ri]+D]
    (Rb,Ri,S)    Mem[Reg[Rb]+S*Reg[Ri]]
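The address arithmetic behind D(Rb,Ri,S) is small enough to write out. The sketch below is a model, not real hardware behavior in every detail: effective_address, Memory, load, store and swap_body are names invented here, register values are passed in as plain integers, and memory is a word array covering only the addresses of the swap example (%ebp = 0x104, xp = 0x124, yp = 0x120).

```c
#include <assert.h>
#include <stdint.h>

/* Address named by D(Rb,Ri,S): Reg[Rb] + S*Reg[Ri] + D.
   rb and ri are register VALUES; pass ri = 0 when no index register is used. */
uint32_t effective_address(int32_t d, uint32_t rb, uint32_t ri, uint32_t s) {
    assert(s == 1 || s == 2 || s == 4 || s == 8);
    return rb + s * ri + (uint32_t)d;
}

/* Word-granular stand-in for memory covering addresses 0x000..0x127. */
typedef struct { uint32_t word[0x128 / 4]; } Memory;

uint32_t load(const Memory *m, uint32_t addr) {
    assert(addr % 4 == 0 && addr < 0x128);
    return m->word[addr / 4];
}

void store(Memory *m, uint32_t addr, uint32_t v) {
    assert(addr % 4 == 0 && addr < 0x128);
    m->word[addr / 4] = v;
}

/* Body of the IA32 swap, one statement per movl. */
void swap_body(Memory *m, uint32_t ebp) {
    uint32_t edx = load(m, effective_address(8, ebp, 0, 1));    /* movl 8(%ebp), %edx  */
    uint32_t ecx = load(m, effective_address(12, ebp, 0, 1));   /* movl 12(%ebp), %ecx */
    uint32_t ebx = load(m, effective_address(0, edx, 0, 1));    /* movl (%edx), %ebx   */
    uint32_t eax = load(m, effective_address(0, ecx, 0, 1));    /* movl (%ecx), %eax   */
    store(m, effective_address(0, edx, 0, 1), eax);             /* movl %eax, (%edx)   */
    store(m, effective_address(0, ecx, 0, 1), ebx);             /* movl %ebx, (%ecx)   */
}

/* Sets up the example's stack frame; returns 1 if 123 and 456 end up swapped. */
int swap_example(void) {
    Memory m = {{0}};
    store(&m, 0x124, 123);
    store(&m, 0x120, 456);
    store(&m, 0x10c, 0x124);   /* xp at 8(%ebp)  */
    store(&m, 0x110, 0x120);   /* yp at 12(%ebp) */
    swap_body(&m, 0x104);
    return load(&m, 0x124) == 456 && load(&m, 0x120) == 123;
}
```

One plausible answer to "why these numbers?": with S equal to the element size, (Rb,Ri,S) is exactly the address of a[i] for an array starting at Reg[Rb], and 1, 2, 4 and 8 are the sizes of the common primitive types. For example, effective_address(0, 0x1000, 3, 4) gives 0x100c, the address of element 3 of an array of 4-byte ints at 0x1000.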

Data Representations: IA32 + x86-64
- Sizes of C Objects (in Bytes)

    C Data Type    Generic 32-bit   Intel IA32   x86-64
    unsigned       4                4            4
    int            4                4            4
    long int       4                4            8
    char           1                1            1
    short          2                2            2
    float          4                4            4
    double         8                8            8
    long double    8                10/12        16
    char *         4                4            8
      (or any other pointer)

x86-64 Integer Registers
- Extend existing registers. Add 8 new ones.
- Make %ebp/%rbp general purpose

    64-bit   Low 32 bits      64-bit   Low 32 bits
    %rax     %eax             %r8      %r8d
    %rbx     %ebx             %r9      %r9d
    %rcx     %ecx             %r10     %r10d
    %rdx     %edx             %r11     %r11d
    %rsi     %esi             %r12     %r12d
    %rdi     %edi             %r13     %r13d
    %rsp     %esp             %r14     %r14d
    %rbp     %ebp             %r15     %r15d

Instructions
- Long word l (4 Bytes) <-> Quad word q (8 Bytes)
- New instructions:
  - movl -> movq
  - addl -> addq
  - sall -> salq
  - etc.
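The size table can be checked on whatever machine is at hand with sizeof. This is a small sketch: print_sizes and sizes_are_sane are names chosen for this example, and the numbers printed depend on the compiler and target, so no particular output is claimed.

```c
#include <assert.h>
#include <limits.h>
#include <stdio.h>

/* Print the size in bytes of each C type from the table. */
void print_sizes(void) {
    printf("%-12s %zu\n", "unsigned",    sizeof(unsigned));
    printf("%-12s %zu\n", "int",         sizeof(int));
    printf("%-12s %zu\n", "long int",    sizeof(long int));
    printf("%-12s %zu\n", "char",        sizeof(char));
    printf("%-12s %zu\n", "short",       sizeof(short));
    printf("%-12s %zu\n", "float",       sizeof(float));
    printf("%-12s %zu\n", "double",      sizeof(double));
    printf("%-12s %zu\n", "long double", sizeof(long double));
    printf("%-12s %zu\n", "char *",      sizeof(char *));
}

/* Minimum widths the C standard requires on every platform;
   returns 1 if they hold. The table's exact values are NOT guaranteed. */
int sizes_are_sane(void) {
    return sizeof(char) == 1
        && sizeof(short) * CHAR_BIT >= 16
        && sizeof(int)   * CHAR_BIT >= 16
        && sizeof(long)  * CHAR_BIT >= 32;
}

/* Returns 1 when long and pointers are both 8 bytes, as in the x86-64 column. */
int matches_x86_64_column(void) {
    return sizeof(long) == 8 && sizeof(char *) == 8;
}
```

On a typical x86-64 Linux build, print_sizes should reproduce the x86-64 column; compiling the same file with gcc -m32, where that option is available, should reproduce the IA32 column. The types whose size changes, long and pointers, are the ones that move from l-suffixed to q-suffixed instructions.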

32-bit instructions that generate 32-bit results set the higher-order bits of the destination register to 0. Example: addl.

32-bit code for swap

    void swap(int *xp, int *yp)
    {
      int t0 = *xp;
      int t1 = *yp;
      *xp = t1;
      *yp = t0;
    }

    swap:
       pushl %ebp            # Set Up
       movl %esp,%ebp
       pushl %ebx

       movl 8(%ebp), %edx    # Body
       movl 12(%ebp), %ecx
       movl (%edx), %ebx
       movl (%ecx), %eax
       movl %eax, (%edx)
       movl %ebx, (%ecx)

       popl %ebx             # Finish
       popl %ebp
       ret

64-bit code for swap

    void swap(int *xp, int *yp)
    {
      int t0 = *xp;
      int t1 = *yp;
      *xp = t1;
      *yp = t0;
    }

    swap:                    # Set Up (nothing to do)
       movl (%rdi), %edx     # Body
       movl (%rsi), %eax
       movl %eax, (%rdi)
       movl %edx, (%rsi)
       ret                   # Finish

- Operands passed in registers (why useful?)
  - First (xp) in %rdi, second (yp) in %rsi
  - 64-bit pointers
- No stack operations required
- 32-bit data
  - Data held in registers %eax and %edx
  - movl operation

64-bit code for long int swap

    void swap(long *xp, long *yp)
    {
      long t0 = *xp;
      long t1 = *yp;
      *xp = t1;
      *yp = t0;
    }

    swap_l:                  # Set Up (nothing to do)
       movq (%rdi), %rdx     # Body
       movq (%rsi), %rax
       movq %rax, (%rdi)
       movq %rdx, (%rsi)
       ret                   # Finish

- 64-bit data
  - Data held in registers %rax and %rdx
  - movq operation
    - "q" stands for quad-word

Machine Programming I: Summary
- History of Intel processors and architectures
  - Evolutionary design leads to many quirks and artifacts
- C, assembly, machine code
  - Compiler must transform statements, expressions, procedures into low-level instruction sequences
- Assembly Basics: Registers, operands, move
  - The x86 move instructions cover wide range of data movement forms
- Intro to x86-64
  - A major departure from the style of code seen in IA32
