Introduction to Assembly
• Here we have a brief introduction to IBM PC Assembly Language
  – CISC instruction set
  – Special purpose register set
  – 8 and 16 bit operations initially (expanded to 32 bit operations, with 64 bit values held in register pairs, on the 80386 and later processors such as the Pentium)
  – Memory-register and register-register operations available
  – Several addressing modes including many implied addresses
Instruction Format
• [name/label] [mnemonic] [operands] [;comments]
  – Operands are either literals, variables/constants, or registers
• Number of operands depends on the type of instruction and generally ranges from 0 to 2
• Examples:
  – mov ax, bx – 2 operands, destination and source
  – mov ax, 5 – one operand is a literal
  – mov y, ax – register to memory movement (ax is stored into y)
  – add ax, 5 – 2 operands for add
  – mul value – 1 operand for mul, other operand implied to be ax
  – nop – no operands for the no-op instruction
  – je location – 1 operand, the branch target; the condition tested is implied to be a flag
Literals and Variables
• Literals require that the base of the value be specified by following the value with one of the following:
  – D, d (or nothing) for decimal
  – H, h for hexadecimal (the literal must begin with a digit, hence 0Ah rather than Ah)
  – Q, q for octal
  – B, b for binary
  – Strings are placed in ' ' or " " marks
  – Examples:
    • 10101011b
    • 0Ah
    • 35
    • 'hello'
    • "goodbye"
• As we will define all assembly code within C/C++ programs, we will declare all variables in the C/C++ code itself
  – We only have to worry about types
    • int is 32 bit
    • short is 16 bit
    • char is 8 bit
• We must ensure that we place the datum into the right sized register, and that if we reference a literal, it is specified to be of the proper size to fit in the associated variable
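The suffix rules for numeric literals can be sketched as a small C routine. This is only an illustration of the notation, not how an assembler is actually implemented. The name parse_asm_literal is hypothetical, and overflow of very long literals is not checked.

```c
#include <assert.h>
#include <ctype.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: converts an assembler-style numeric literal such as
   "0Ah" or "10101011b" to its value. Returns 0 on success, -1 on error.
   Overflow is not checked. */
int parse_asm_literal(const char *text, unsigned long *value)
{
    size_t len, i;
    char digits[64];
    char *end;
    int base = 10;

    assert(text != NULL && value != NULL);
    len = strlen(text);
    if (len == 0 || len >= sizeof digits)
        return -1;

    /* The last character selects the base; no suffix means decimal. */
    switch (tolower((unsigned char)text[len - 1])) {
    case 'h': base = 16; len--; break;
    case 'q': base = 8;  len--; break;
    case 'b': base = 2;  len--; break;
    case 'd': base = 10; len--; break;
    default:  break;
    }

    /* A literal must start with a decimal digit (0Ah, not Ah). */
    if (len == 0 || !isdigit((unsigned char)text[0]))
        return -1;

    /* Reject anything that is not a digit in some supported base. */
    for (i = 0; i < len; i++)
        if (!isxdigit((unsigned char)text[i]))
            return -1;

    memcpy(digits, text, len);
    digits[len] = '\0';

    /* strtoul stops at the first digit that is invalid for this base. */
    *value = strtoul(digits, &end, base);
    return (*end == '\0') ? 0 : -1;
}
```

With this sketch, "0Ah" yields 10, "10101011b" yields 171, and "Ah" is rejected because it does not begin with a digit.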
Registers
• There are 14 registers in the Intel-based architecture
  – They are all special purpose
    • you can only use a given register as it was intended to be used
    • but there are some exceptions to this rule as described below
  – There are 4 data registers:
    • AX – accumulator
      – the primary data register
      – AX is an implied register in the Mul and Div instructions
    • BX – base register – used for addressing, particularly when dealing with arrays and strings
      – BX can be used as a data register when not used for addressing
    • CX – counter – implicitly used in loop instructions
      – in non-looping instructions, can be used as a data register
    • DX – data register – primarily used for In and Out instructions, but also used to store partial results of Mul and Div operations
      – in other cases, can be used as a data register
• In 32-bit processors such as the Pentium, each of these is expanded to 32 bits (EAX, EBX, ECX, EDX), and the 8-bit halves of the 16-bit registers remain available (AL, AH, BL, BH, CL, CH, DL, DH)
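How the 8-bit, 16-bit and 32-bit names overlap can be shown with a small C model. The function names here are hypothetical and exist only for illustration. The real registers are hardware, not C values.

```c
#include <stdint.h>

/* Each function extracts one named view of a 32-bit EAX value. */
uint16_t view_ax(uint32_t eax) { return (uint16_t)(eax & 0xFFFFu); }      /* bits 0-15 */
uint8_t  view_al(uint32_t eax) { return (uint8_t)(eax & 0xFFu); }         /* bits 0-7  */
uint8_t  view_ah(uint32_t eax) { return (uint8_t)((eax >> 8) & 0xFFu); }  /* bits 8-15 */

/* Models "mov al, value": only the low byte of EAX changes. */
uint32_t write_al(uint32_t eax, uint8_t value)
{
    return (eax & 0xFFFFFF00u) | value;
}
```

For an EAX value of 12345678h, this model gives AX = 5678h, AH = 56h and AL = 78h.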
Other Registers
• Other registers cannot be used for data but have specific uses:
  – Segment registers
    • point to different segments in memory
    • used for implied addressing
      – SS – stack
      – CS – code
      – DS – data
      – ES – extra (used as a base pointer for variables)
  – Indexing registers
    • used for offsets to the current procedure, stack, or string depending on the instruction
      – BP – base pointer used with SS to address subroutine local variables on the stack
      – SP – stack pointer used with SS for top of stack
      – SI and DI – source and destination for string transfers
  – IP – instruction pointer (program counter)
  – Status flags
Operations: Data Movement
• mov and xchg instructions
  – mov allows for register-register, memory-register, register-memory, register-immediate and memory-immediate moves
    • first item is destination, second is source
    • memory-memory moves must be done with 2 instructions using a register as temporary storage
    • memory references can use direct, direct+offset, or register-indirect modes
    • if the datum is 8-bit, the register only uses its high or low half (e.g., AH or AL), 16-bit uses the entire register (e.g., AX), 32-bit uses the extended register (e.g., EAX, EDX) and 64-bit combines two registers
  – xchg allows only register-register, memory-register and register-memory forms, and exchanges two values rather than moving one value as mov does
Operations: Arithmetic/Conditional
• inc/dec dest
• add/sub dest, source
  – dest is a register or memory reference; source for add/sub is a register, memory reference, or literal
    • as long as dest and source are not both memory references
• mul/div source
  – one datum is source, the other is implied to be eax (or ax or al); for div, the implied dividend is the combined pair edx:eax (or dx:ax, or ax)
  – destination is implied as eax/edx combined (or ax/dx, al/ah depending on size)
    • div places quotient in eax, ax or al, remainder in edx, dx or ah
    • mul places low half of result in eax, ax or al and high half in edx, dx or ah
• shl, shr, sal, sar, shld, shrd – shift, shift arithmetic, shift double
• rol, ror, rcl, rcr – rotate, rotate with carry
• Logic operations: AND, OR, XOR, NOT
  – form is OP dest, source (NOT takes only dest)
• NEG dest
  – negate a two's complement value
• CMP first, second
  – compare first and second and set the proper flag(s) (ZF, SF, CF, OF, PF)
  – the results of cmp operations are then used by branch instructions
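The implied operands of mul and div are easier to follow in a C model. The sketch below covers only the unsigned 32-bit forms. RegPair, model_mul32 and model_div32 are hypothetical names chosen for illustration.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical model of the eax/edx register pair. */
typedef struct {
    uint32_t eax;
    uint32_t edx;
} RegPair;

/* Models "mul source" (unsigned, 32-bit): edx:eax = eax * source. */
RegPair model_mul32(uint32_t eax, uint32_t source)
{
    uint64_t product = (uint64_t)eax * source;
    RegPair r;
    r.eax = (uint32_t)product;          /* low half of the result  */
    r.edx = (uint32_t)(product >> 32);  /* high half of the result */
    return r;
}

/* Models "div source" (unsigned, 32-bit): divides edx:eax by source. */
RegPair model_div32(uint32_t edx, uint32_t eax, uint32_t source)
{
    uint64_t dividend = ((uint64_t)edx << 32) | eax;
    RegPair r;
    /* The real instruction raises a divide error in both of these cases. */
    assert(source != 0);
    assert(dividend / source <= UINT32_MAX);
    r.eax = (uint32_t)(dividend / source);  /* quotient  */
    r.edx = (uint32_t)(dividend % source);  /* remainder */
    return r;
}
```

For example, model_mul32(0x80000000, 4) leaves 0 in eax and 2 in edx, which shows why edx must not hold live data across a mul.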
Operations: Branches
• Conditional branches:
  – these must be preceded by an instruction which sets at least one status flag (this includes cmp operations)
    • the flag tested is based on which branch is used
  – je/jne location
    • branch if zero flag set/clear
  – jg/jge/jl/jle location
    • jump on >, >=, <, <= for signed values, decided from the sign, overflow and zero flags (e.g., jl branches when the sign and overflow flags differ)
  – jc/jnc/jz/jnz/jp/jnp location
    • jump on carry, no carry, zero, not zero, even parity, not even parity (odd parity)
• Unconditional branches do not use a previous comparison or flag, just branch to the given location
  – jmp location
    • jmp instructions are used to implement goto statements (procedure calls use the related call and ret instructions, which also save and restore the return address)
• loop location
  – decrement cx (or ecx)
  – if cx (ecx) != 0 then branch to label location
    • used for for-loops
    • since cx (or ecx) is used implicitly here, inside such a loop structure, we cannot use cx/ecx as a data register!
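The link between cmp and the conditional jumps can be sketched in C by computing the flags explicitly. This is a simplified model under stated assumptions: the parity flag is omitted, and the Flags type and function names are hypothetical.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical container for the flags that cmp sets and the jumps read. */
typedef struct {
    bool zf;  /* zero flag: result was 0             */
    bool sf;  /* sign flag: top bit of the result    */
    bool cf;  /* carry flag: unsigned borrow occurred */
    bool of;  /* overflow flag: signed overflow       */
} Flags;

/* Models "cmp first, second": computes first - second and keeps only the flags. */
Flags model_cmp32(uint32_t first, uint32_t second)
{
    uint32_t result = first - second;
    Flags f;
    f.zf = (result == 0);
    f.sf = (result >> 31) != 0;
    f.cf = (first < second);
    /* Signed overflow: the operands differ in sign and the sign of the
       result differs from the sign of first. */
    f.of = (((first ^ second) & (first ^ result)) >> 31) != 0;
    return f;
}

/* Each predicate says whether the named jump would be taken. */
bool takes_je(Flags f) { return f.zf; }
bool takes_jl(Flags f) { return f.sf != f.of; }
bool takes_jg(Flags f) { return !f.zf && f.sf == f.of; }
bool takes_jc(Flags f) { return f.cf; }
```

Comparing -1 with 1 takes jl but not jc in this model, because jl treats the operands as signed while the carry flag reflects the unsigned comparison.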
Addressing Modes
• Immediate – place datum in instruction as a literal
  – add ax, 10
    • use this mode when the datum is known at program implementation time
• Direct – place variable in instruction
  – mov ax, x ; moves x into register ax
  – add y, ax ; sets y = y + ax
    • use this mode to access a variable in memory
• Direct + Offset
  – mov ax, x+2 ; the offset is in bytes, so if x is word size, this moves x[1]
  – mov ax, x[bx] ; offsets into x by the number of bytes stored in bx
    • Note: mov ax, x[y] is illegal as it has 2 memory references, x and y!
    • use this mode when dealing with strings, arrays and structs
• Register Indirect – use base and/or index registers
  – mov ax, [bx + si] ; base-indexed
  – mov ax, [si - 4] ; base with displacement
  – mov ax, [bx + si - 6] ; base-indexed with displacement
    • we will not use these modes
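Offsets in the direct + offset mode count bytes, not elements, which a short C model makes concrete. The helper name load_word is hypothetical, and the caller is assumed to keep the offset inside the array.

```c
#include <stdint.h>
#include <string.h>

/* Models "mov ax, x+offset" or "mov ax, x[bx]": loads the 16-bit word that
   starts offset BYTES past the start of x. The caller must keep
   offset + 2 within the bounds of the array. */
uint16_t load_word(const uint16_t *x, uint32_t offset)
{
    uint16_t ax;
    memcpy(&ax, (const unsigned char *)x + offset, sizeof ax);
    return ax;
}
```

With a word-size array holding 10, 20, 30, an offset of 2 returns 20 (the second element) and an offset of 4 returns 30.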
Addressing Examples
• Imagine that we have declared in C:
  – int a[ ] = {0, 11, 15, 21, 99};
• Then, the following accesses give us the values of a as shown:
  – mov eax, a ; eax = 0
  – mov eax, a+4 ; eax = 11
  – mov eax, a+8 ; eax = 15
  – mov eax, a[ebx] ; eax = 99 if ebx = 16
• If ebx and ecx both = 0 and size is the number of items in the array, then we can iterate through the array as follows:
  top: mov eax, a[ebx]
       … do something with the array value …
       add ebx, 4
       add ecx, 1
       cmp ecx, size
       jl top // use jl since we stop once ecx == size
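The same iteration can be traced in C, keeping ebx as a byte offset and ecx as an element counter. This is a sketch with two assumptions: summing stands in for "do something with the array value", and the function name sum_by_byte_offset is hypothetical.

```c
#include <stdint.h>
#include <string.h>

/* Walks an int array the way the assembly loop does: ebx is a byte offset
   that grows by 4, ecx counts elements. Assumes size >= 1, because the
   assembly loop tests its condition only after the first pass. */
int32_t sum_by_byte_offset(const int32_t *a, int32_t size)
{
    uint32_t ebx = 0;   /* byte offset into a */
    int32_t ecx = 0;    /* element counter    */
    int32_t eax;
    int32_t total = 0;

    do {
        /* mov eax, a[ebx] */
        memcpy(&eax, (const unsigned char *)a + ebx, sizeof eax);
        total += eax;       /* stand-in for the loop body */
        ebx += 4;           /* add ebx, 4 */
        ecx += 1;           /* add ecx, 1 */
    } while (ecx < size);   /* cmp ecx, size then jl top */

    return total;
}
```

For the array {0, 11, 15, 21, 99} with size 5, the model returns 146 and ebx ends at 20, one element past the last valid offset of 16.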
Writing Assembly in a C Program
• For simplicity, we will write our code inside of C (or C++) programs
• This allows us to
  – declare variables in C/C++ thus avoiding the .data section
  – do I/O in C/C++ thus avoiding difficulties dealing with assembly input and assembly output
  – compile our programs rather than dealing with assembling them using MASM or TASM
• To include assembly code in your C/C++ program, add the following compiler directive
  __asm { }
• And place all of your assembly code between the { }
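A minimal sketch of such a block follows. It assumes the Microsoft compiler targeting 32-bit x86, where the __asm { } form is available. The plain C fallback is an addition here so that the sketch still builds with other compilers, and the function name add_ten is hypothetical.

```c
/* Adds 10 to x. The __asm path assumes the Microsoft compiler targeting
   32-bit x86; other compilers use the plain C fallback. */
int add_ten(int x)
{
    int y;
#if defined(_MSC_VER) && defined(_M_IX86)
    __asm {
        mov eax, x      // load the C variable into a 32-bit register
        add eax, 10     // add an immediate operand
        mov y, eax      // store the result back into a C variable
    }
#else
    y = x + 10;
#endif
    return y;
}
```

Calling add_ten(5) returns 15 on either path. The variables x and y are ordinary C variables, so no .data section is needed.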
Data Types
• One problem that might arise in using C/C++ to run our assembly code is that we might mix up data types
  – if you declare a variable to be of type int, then this is a 4-byte variable
    • moving it into a register means that you must move it into a 4-byte register (such as eax) and not a 2-byte or 1-byte register!
    • if you try to move a variable into the wrong sized register, or a register value into the wrong sized variable, you will get an "operand size conflict" syntax error message when compiling your program
  – to use ax, bx, cx, dx, declare variables to be of type short
  – to use eax, ebx, ecx, edx, declare variables to be of type int
  – also notice that chars are 1 byte, so they should use either the upper or lower half of a register (al, ah, dl, dh)
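The size rules above can be checked in C before any assembly is written. The sketch below assumes a C11 compiler for _Static_assert. The helper accumulator_for is a hypothetical name used only to illustrate the mapping from operand size to register name.

```c
#include <stddef.h>

/* Compile-time checks (C11) that the C types have the widths assumed here. */
_Static_assert(sizeof(char) == 1, "char must be 1 byte");
_Static_assert(sizeof(short) == 2, "short must be 2 bytes");
_Static_assert(sizeof(int) == 4, "int must be 4 bytes");

/* Hypothetical helper: names the accumulator view that matches an operand
   size in bytes, or returns NULL when no single view fits. */
const char *accumulator_for(size_t bytes)
{
    switch (bytes) {
    case 1:  return "al";
    case 2:  return "ax";
    case 4:  return "eax";
    default: return NULL;
    }
}
```

For instance, accumulator_for(sizeof(short)) gives "ax", matching the rule that short variables pair with the 16-bit registers.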