Post on 17-Dec-2015

Introduction to Assembly

• Here we have a brief introduction to IBM PC Assembly Language
  – CISC instruction set
  – Special purpose register set
  – 8 and 16 bit operations initially (expanded to 32 and 64 bit operations for Pentium)
  – Memory-register and register-register operations available
  – Several addressing modes including many implied addresses

Instruction Format

• [name/label] [mnemonic] [operands] [;comments]
  – Operands are either literals, variables/constants, or registers
• Number of operands depends on the type of instruction and ranges from 0 to 2
• Examples:
  – mov ax, bx : 2 operands, destination first and source second
  – mov ax, 5 : one operand is a literal
  – mov y, ax : register to memory movement (y is a variable in memory)
  – add ax, 5 : 2 operands for add
  – mul value : 1 operand for mul, other operand implied to be ax
  – nop : no operands for the no-op instruction
  – je location : 1 operand, the branch target; the condition is implied to be a status flag

Literals and Variables

• Literals require that the type of value be specified by following the value with one of the following:
  – D, d (or nothing) for decimal
  – H, h for hexadecimal (a hexadecimal literal must begin with a digit, hence the leading 0 in 0Ah)
  – Q, q for octal
  – B, b for binary
  – Strings are placed in ' ' or " " marks
  – Examples:
    • 10101011b
    • 0Ah
    • 35
    • 'hello'
    • "goodbye"
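The suffix rules above can be sketched as a small C routine. This is only an illustration: parse_asm_literal is a hypothetical helper name, it handles numeric literals only (not quoted strings), and it assumes values are limited to 32 bits.

```c
#include <ctype.h>
#include <string.h>

/* Hypothetical helper: convert an assembler-style numeric literal such as
   "10101011b", "0Ah", "17q" or "35" to its value.
   Returns -1 for malformed input or values wider than 32 bits. */
long long parse_asm_literal(const char *text)
{
    size_t len = strlen(text);
    if (len == 0 || !isdigit((unsigned char)text[0]))
        return -1;                      /* literals must start with a digit */

    int base = 10;
    size_t digits = len;
    switch (tolower((unsigned char)text[len - 1])) {
    case 'h': base = 16; digits = len - 1; break;
    case 'q': base = 8;  digits = len - 1; break;
    case 'b': base = 2;  digits = len - 1; break;
    case 'd': base = 10; digits = len - 1; break;
    default:  break;                    /* no suffix means decimal */
    }

    long long value = 0;
    for (size_t i = 0; i < digits; i++) {
        int c = tolower((unsigned char)text[i]);
        int d;
        if (c >= '0' && c <= '9')      d = c - '0';
        else if (c >= 'a' && c <= 'f') d = c - 'a' + 10;
        else return -1;
        if (d >= base) return -1;       /* digit not valid in this base */
        value = value * base + d;
        if (value > 0xFFFFFFFFLL) return -1;
    }
    return value;
}
```

For instance, parse_asm_literal("10101011b") and parse_asm_literal("0ABh") both yield 171, while "ABh" is rejected because it does not begin with a digit.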

• As we will define all assembly code within C/C++ programs, we will declare all variables in the C/C++ code itself
  – We only have to worry about types
    • int is 32 bit
    • short is 16 bit
    • char is 8 bit
• We must ensure that we place the datum into the right sized register, and that if we reference a literal, it is specified to be of the proper size to fit in the associated variable

Registers

• There are 14 registers in the Intel-based architecture
  – They are all special purpose
    • you can only use a given register as it was intended to be used
    • but there are some exceptions to this rule as described below
  – There are 4 data registers:
    • AX: accumulator
      – the only register used purely for data
      – AX is an implied register in the Mul and Div instructions
    • BX: base register, used for addressing, particularly when dealing with arrays and strings
      – BX can be used as a data register when not used for addressing
    • CX: counter, implicitly used in loop instructions
      – in non-looping instructions, can be used as a data register
    • DX: data register, primarily used for In and Out instructions, but also used to store partial results of Mul and Div operations
      – in other cases, can be used as a data register
• In the Pentium architecture, each of these is expanded to 32 bits (EAX, EBX, ECX, EDX), and each 16-bit register can also be accessed as two 8-bit halves (AL, AH, BL, BH, CL, CH, DL, DH)
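How the 8-bit, 16-bit and 32-bit names overlap can be modeled with masks and shifts in C. The function names below are illustrative only; the layout they assume is the standard one, where AX is the low 16 bits of EAX, AL is bits 0-7 and AH is bits 8-15.

```c
#include <stdint.h>

/* Read the narrower views of a 32-bit value that stands in for EAX. */
uint16_t read_ax(uint32_t eax) { return (uint16_t)(eax & 0xFFFFu); }
uint8_t  read_al(uint32_t eax) { return (uint8_t)(eax & 0xFFu); }
uint8_t  read_ah(uint32_t eax) { return (uint8_t)((eax >> 8) & 0xFFu); }

/* Writing AL or AH changes only that byte; the other bits are preserved. */
uint32_t write_al(uint32_t eax, uint8_t al)
{
    return (eax & 0xFFFFFF00u) | al;
}

uint32_t write_ah(uint32_t eax, uint8_t ah)
{
    return (eax & 0xFFFF00FFu) | ((uint32_t)ah << 8);
}
```

With eax holding 0x12345678, read_ax gives 0x5678, read_ah gives 0x56 and read_al gives 0x78.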

Other Registers

• Other registers cannot be used for data but have specific uses:
  – Segment registers
    • point to different segments in memory
    • used in implied addressing
      – SS: stack
      – CS: code
      – DS: data
      – ES: extra (used as a base pointer for variables)
  – Indexing registers
    • used for offsets to the current procedure, stack, or string depending on the instruction
      – BP: base pointer, used with SS to address subroutine local variables on the stack
      – SP: stack pointer, used with SS for top of stack
      – SI and DI: source and destination for string transfers
  – IP: program counter
  – Status flags

Operations: Data Movement

• mov and xchg instructions
  – mov allows for register-register, memory-register, register-memory, register-immediate and memory-immediate
    • first item is destination, second is source
    • memory-memory moves must be done with 2 instructions using a register as temporary storage
    • memory references can use direct, direct+offset, or register-indirect modes
    • if datum is 8-bit, register only uses high or low side, 16-bit uses entire register, 32-bit uses extended register (e.g., EAX, EDX) and 64-bit combines two registers
  – xchg instruction allows only register-register, memory-register and register-memory and exchanges two values rather than moving one value as mov does

Operations: Arithmetic/Conditional

• inc/dec dest
• add/sub dest, source
  – dest is register or memory reference, source for add/sub is register, memory reference, or literal
    • as long as dest and source are not both memory references
• mul/div source
  – one datum is source, the other is implied to be eax (or ax or al)
  – destination is implied as edx:eax combined (or dx:ax, or ah:al, depending on size)
    • div places quotient in eax, ax or al, remainder in edx, dx or ah
    • mul places low half of result in eax, ax or al and high half in edx, dx or ah
• shl, shr, sal, sar, shld, shrd: shift, shift arithmetic, shift double
• rol, ror, rcl, rcr: rotate, rotate with carry
• Logic operations: AND, OR, XOR, NOT
  – form is OP dest, source (NOT takes only dest)
• NEG dest
  – convert two's complement value to its opposite
• CMP first, second
  – compare first and second and set proper flag(s) (ZF, SF, PF, and also CF and OF)
  – the results of cmp operations are then used for branch instructions
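The implied operands of 32-bit mul and div can be modeled in C with a 64-bit intermediate. This is a sketch of the unsigned forms only; RegPair, model_mul32 and model_div32 are names invented for the illustration.

```c
#include <stdint.h>

/* Holds the two registers that mul and div write implicitly. */
typedef struct {
    uint32_t eax;
    uint32_t edx;
} RegPair;

/* mul source: EDX:EAX = EAX * source (unsigned). */
RegPair model_mul32(uint32_t eax, uint32_t source)
{
    uint64_t product = (uint64_t)eax * source;
    RegPair r = { (uint32_t)product, (uint32_t)(product >> 32) };
    return r;
}

/* div source: divides the 64-bit value EDX:EAX by source,
   leaving the quotient in EAX and the remainder in EDX.
   The real instruction raises an error when source is 0 or when the
   quotient does not fit in 32 bits; this sketch assumes the caller
   avoids both cases. */
RegPair model_div32(uint32_t edx, uint32_t eax, uint32_t source)
{
    uint64_t dividend = ((uint64_t)edx << 32) | eax;
    RegPair r = { (uint32_t)(dividend / source),
                  (uint32_t)(dividend % source) };
    return r;
}
```

For example, multiplying 0x80000000 by 4 leaves 0 in eax and 2 in edx, which shows why edx should not hold live data across a mul.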

Operations: Branches

• Conditional branches:
  – these must be preceded by an instruction which sets at least one status flag (this includes cmp operations)
    • the flag tested is based on which branch is used
  – je/jne location
    • branch if zero flag set/clear
  – jg/jge/jl/jle location
    • jump on >, >=, <, <= for signed values, decided from the sign, overflow and zero flags (for example, jl is taken when the sign flag differs from the overflow flag)
  – jc/jnc/jz/jnz/jp/jnp location
    • jump on carry, no carry, zero, not zero, even parity, not even parity (odd parity)
• Unconditional branches do not use a previous comparison or flag, just branch to given location
  – jmp location
    • jmp instructions are used to implement goto statements (procedure calls normally use the separate call and ret instructions)
• loop location
  – decrement cx (or ecx)
  – if cx (ecx) != 0 then branch to label location
    • used for for-loops
    • since cx (or ecx) is used implicitly here, inside such a loop structure, we cannot use cx/ecx as a data register!
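The way cmp feeds the conditional branches, and the way loop counts down, can be modeled in C. This sketch assumes the standard x86 flag rules: je tests the zero flag, jc tests the carry flag, jl is taken when the sign flag differs from the overflow flag, and jg is taken when the zero flag is clear and the sign flag equals the overflow flag. All names are invented for the illustration.

```c
#include <stdbool.h>
#include <stdint.h>

typedef struct {
    bool zf;  /* zero flag */
    bool sf;  /* sign flag */
    bool cf;  /* carry flag (unsigned borrow) */
    bool of;  /* overflow flag (signed overflow) */
} Flags;

/* cmp first, second: computes first - second and keeps only the flags. */
Flags model_cmp(int32_t first, int32_t second)
{
    uint32_t a = (uint32_t)first, b = (uint32_t)second;
    uint32_t diff = a - b;
    Flags f;
    f.zf = (diff == 0);
    f.sf = (diff >> 31) != 0;
    f.cf = (a < b);
    /* Signed overflow: operands differ in sign and the result's sign
       differs from the first operand's sign. */
    f.of = (((a ^ b) & (a ^ diff)) >> 31) != 0;
    return f;
}

bool takes_je(Flags f) { return f.zf; }
bool takes_jc(Flags f) { return f.cf; }
bool takes_jl(Flags f) { return f.sf != f.of; }
bool takes_jg(Flags f) { return !f.zf && f.sf == f.of; }

/* loop label: decrement ECX, then branch back while ECX != 0.
   Here the loop body adds ECX to EAX, so the result is 1 + 2 + ... + count.
   Because the decrement happens before the test, the body always runs at
   least once; a starting count of 0 would wrap around, so pass count >= 1. */
uint32_t model_loop_sum(uint32_t count)
{
    uint32_t ecx = count, eax = 0;
    do {
        eax += ecx;        /* loop body */
    } while (--ecx != 0);  /* the loop instruction */
    return eax;
}
```

Here takes_jl(model_cmp(3, 5)) is true and model_loop_sum(5) returns 15.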

Addressing Modes

• Immediate: place datum in instruction as a literal
  – add ax, 10
    • use this mode when datum is known at program implementation time
• Direct: place variable in instruction
  – mov ax, x ; moves x into register ax
  – add y, ax ; sets y = y + ax
    • use this mode to access a variable in memory
• Direct + Offset
  – mov ax, x+2 ; the offset is in bytes, so if x is word size, this moves x[1]
  – mov ax, x[bx] ; offsets into x by the number of bytes stored in bx
    • Note: mov ax, x[y] is illegal as it has 2 memory references, x and y!
    • use this mode when dealing with strings, arrays and structs
• Register Indirect: use base and/or index registers
  – mov ax, [bx + si] ; base-indexed
  – mov ax, [si - 4] ; base with displacement
  – mov ax, [bx + si - 6] ; base-indexed with displacement
    • we will not use these modes

Addressing Examples

• Imagine that we have declared in C:
  – int a[ ] = {0, 11, 15, 21, 99};
• Then, the following accesses give us the values of a as shown:
  – mov eax, a ; eax = 0
  – mov eax, a+4 ; eax = 11
  – mov eax, a+8 ; eax = 15
  – mov eax, a[ebx] ; eax = 99 if ebx = 16
• If ebx and ecx both = 0 and size is the number of items in the array, then we can iterate through the array as follows:

    top: mov eax, a[ebx]
         … do something with the array value …
         add ebx, 4
         add ecx, 1
         cmp ecx, size
         jl top    // use jl since we stop once ecx == size
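For comparison, the same traversal can be written in plain C with the byte offset kept explicit. This is a sketch under two assumptions: the "do something" step is taken to be a running sum, and the function names are invented for the illustration.

```c
#include <stdint.h>
#include <string.h>

/* Model of "mov eax, a[ebx]": load 4 bytes starting ebx bytes into the array. */
int32_t load_at_byte_offset(const int32_t *a, uint32_t ebx)
{
    int32_t eax;
    memcpy(&eax, (const unsigned char *)a + ebx, sizeof eax);
    return eax;
}

/* Walks the array as the assembly loop does: ebx advances by 4 bytes per
   element while ecx counts elements. Assumes size >= 1, because the
   assembly version tests the count only after the body has run once. */
int32_t sum_by_byte_offset(const int32_t *a, int32_t size)
{
    uint32_t ebx = 0;
    int32_t ecx = 0, total = 0;
    if (size < 1)
        return 0;                                /* guard the assumption */
    do {
        total += load_at_byte_offset(a, ebx);    /* mov eax, a[ebx] */
        ebx += 4;                                /* add ebx, 4 */
        ecx += 1;                                /* add ecx, 1 */
    } while (ecx < size);                        /* cmp ecx, size / jl top */
    return total;
}
```

With the array {0, 11, 15, 21, 99}, load_at_byte_offset(a, 16) returns 99 and sum_by_byte_offset(a, 5) returns 146.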

Writing Assembly in a C Program

• For simplicity, we will write our code inside of C (or C++) programs
• This allows us to
  – declare variables in C/C++ thus avoiding the .data section
  – do I/O in C/C++ thus avoiding difficulties dealing with assembly input and assembly output
  – compile our programs rather than dealing with assembling them using MASM or TASM
• To include assembly code in your C/C++ program, add the following compiler directive

    __asm { }

• And place all of your assembly code between the { }

Data Types

• One problem that might arise in using C/C++ to run our assembly code is that we might mix up data types
  – if you declare a variable to be of type int, then this is a 4-byte variable
    • moving it into a register means that you must move it into a 4-byte register (such as eax) and not a 2-byte or 1-byte register!
    • if you try to move a variable into the wrong sized register, or a register value into the wrong sized variable, you will get an "operand size conflict" syntax error message when compiling your program
  – to use ax, bx, cx, dx, declare variables to be of type short
  – to use eax, ebx, ecx, edx, declare variables to be of type int
  – also notice that char variables are 1 byte, so they should use either the upper or lower half of a register (al, ah, dl, dh)
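The size matching rule can be summarized as a small lookup. The helper below is hypothetical and only names the accumulator variant of each width; the same pattern applies to the b, c and d registers.

```c
#include <stddef.h>

/* Hypothetical helper: name the accumulator register whose width matches
   an operand of the given size in bytes, or NULL if none of the three
   widths discussed here fits. */
const char *accumulator_for_size(size_t bytes)
{
    switch (bytes) {
    case 1:  return "al";   /* char  */
    case 2:  return "ax";   /* short */
    case 4:  return "eax";  /* int   */
    default: return NULL;
    }
}
```

On a compiler where short is 2 bytes and int is 4 bytes, accumulator_for_size(sizeof(short)) gives "ax" and accumulator_for_size(sizeof(int)) gives "eax".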
