cs/ece 552: instruction setssinclair/courses/cs552/spring...cs/ece 552: instruction sets prof....

Post on 30-Mar-2021

2 Views

Category:

Documents

0 Downloads

Preview:

Click to see full reader

TRANSCRIPT

CS/ECE 552: Instruction Sets

Prof. Matthew D. Sinclair

Lecture notes based in part on slides created by MikkoLipasti, Mark Hill, David Wood, Guri Sohi, Josh San

Miguel, John Shen, and Jim Smith

This Class

• Instruction set architectures (ISAs)

• MIPS

• ALU ops, loads/stores, branches/jumps

2

Instruction Sets

MIPS/MIPS-like ISA used in 552Simple, sensible, regular, easy to design CPU

Most common: x86 (IA-32) and ARMx86: Intel Pentium/Core i7, AMD Athlon, etc.

ARM: cell phones, embedded systems, IOT

Others:PowerPC (IBM servers)

SPARC (Sun)

Alpha (Alpha)

RISC-V (SiFive, some use internally elsewhere, e.g. NVIDIA)

3

Basics

• C statement

f = (g + h) – (i + j)

• MIPS instructions

add t0, g, h

add t1, i, j

sub f, t0, t1

• Multiple instructions for one C statement

4

Opcode/Mnemonic: Specifies operation

Operands: Input and output data

Source

Destination

Why not bigger instructions?

• Why not “f = (g + h) – (i + j)” as one instruction?• Church’s thesis: A very primitive computer can

compute anything that a fancy computer can compute – you need only logical functions, read and write memory, and data-dependent decisions

• Therefore, ISA selected for practical reasons:– Performance and cost, not computability

• Regularity tends to improve both– E.g. H/W to handle arbitrary number of operands is

complex and slow and UNNECESSARY

• Tradeoff between regularity and fewer instructors– Discussed in greater detail in CS/ECE 752

5

Arithmetic-Logic Unit Ops

• Some ALU ops:– add, addi, addu, addiu (immediate, unsigned)

– sub …

– mul, div – wider result • 32b x 32b = 64b product

• 32b / 32b = 32b quotient and 32b remainder

– and, andi

– or, ori

– sll, srl

• Why registers?– Short name fits in instruction word: log2(32) = 5 bits

• But are registers enough?6

Memory and Load/Store

• Need more than 32 words of storage

• An array of locations M[j] indexed by j

• Data movement (on words or integers)

– Load word for register <= memory

lw $t1, 0($s1) # where [$s1]=4008; get input g

– Store word for register => memory

sw $t1, 0($s0) # where [$s0]=4004; save output f

7

Memory and Load/Store

$0

$31

Processor

Re

gis

ters

ALU

Memory0

maxmem

4004

4008

f

g

8000

8004

A[0]

A[1]

8008 A[2]

8

Branches and Jumps

while ( i != j) {

j= j + i;

i= i + 1;

}

9

Branches and Jumps

while ( i != j) {

j= j + i;

i= i + 1;

}

# [$s1] is i, [$s2] is j

Loop:

Exit:

10

Branches and Jumps

while ( i != j) {

j= j + i;

i= i + 1;

}

# [$s1] is i, [$s2] is j

Loop: beq $s1, $s2, Exit

add $s2, $s2, $s1

addi $s1, $s1 , 1

j Loop

Exit:

11

Branches and Jumps

• MIPS branchesbeq $s1, $s2, imm # if ($s1==$s2) PC = PC + imm<< 2 else PC += 4;

bne …

slt, sle, sgt, sge

• Unconditional jumpsj addr # PC = addr

jr $s3 # PC = $s3

jal addr # $31 = PC + 4; PC = addr;

(used for function calls)

12

Exercise – MIPS Assembly

What does this assembly code do?

13

lw $t0, 0($s0)nor $t0, $t0, $zeroaddi $t0, $t0, 1sw $t0, 0($s0)

Exercise – MIPS Assembly

What does this assembly code do?

14

lw $t0, 0($s0)nor $t0, $t0, $zeroaddi $t0, $t0, 1sw $t0, 0($s0)

Computes two’s complement of integer at ($s0).

Exercise – MIPS Assembly

What does this assembly code do?

15

lw $t0, 0($s0)nor $t0, $t0, $zeroaddi $t0, $t0, 1sw $t0, 0($s0)

Computes two’s complement of integer at ($s0).

Reduce the number of instructions executed?

Exercise – MIPS Assembly

What does this assembly code do?

16

lw $t0, 0($s0)sub $t0, $zero, $t0sw $t0, 0($s0)

Computes two’s complement of integer at ($s0).

Reduce the number of instructions executed?

Exercise – MIPS Assembly

What does this assembly code do?

17

addi $t0, $zero, 4096lw $t1, 0($s0)slt $t2, $t1, $zeronor $t2, $zero, $t2slt $t3, $t1, $t0and $t4, $t2, $t3bne $t4, $zero, TARGET

Exercise – MIPS Assembly

What does this assembly code do?

18

addi $t0, $zero, 4096lw $t1, 0($s0)slt $t2, $t1, $zeronor $t2, $zero, $t2slt $t3, $t1, $t0and $t4, $t2, $t3bne $t4, $zero, TARGET

Branches to TARGET if 0 ≤ ($s0) < 4096.

Exercise – MIPS Assembly

What does this assembly code do?

19

addi $t0, $zero, 4096lw $t1, 0($s0)slt $t2, $t1, $zeronor $t2, $zero, $t2slt $t3, $t1, $t0and $t4, $t2, $t3bne $t4, $zero, TARGET

Branches to TARGET if 0 ≤ ($s0) < 4096.

Reduce the number of instructions executed?

Exercise – MIPS Assembly

What does this assembly code do?

20

addi $t0, $zero, -4096lw $t1, 0($s0)slt $t2, $t1, $zeronor $t2, $zero, $t2slt $t3, $t1, $t0and $t4, $t0, $t1beq $t4, $zero, TARGET

Branches to TARGET if 0 ≤ ($s0) < 4096.

Reduce the number of instructions executed?

Exercise – MIPS Assembly

What does this assembly code do?

21

addi $t0, $zero, -4096lw $t1, 0($s0)and $t4, $t0, $t1beq $t4, $zero, TARGET

Branches to TARGET if 0 ≤ ($s0) < 4096.

Reduce the number of instructions executed?

Exercise – MIPS Assembly

What does this assembly code do?

22

addi $t0, $zero, -4096lw $t1, 0($s0)and $t4, $t0, $t1beq $t4, $zero, TARGET

Branches to TARGET if 0 ≤ ($s0) < 4096.

Can reduce even more?

Exercise – MIPS Assembly

What does this assembly code do?

23

addi $t0, $zero, -4096lw $t1, 0($s0)sltiu $t4, $t1, 4096bne $t4, $zero, TARGET

Branches to TARGET if 0 ≤ ($s0) < 4096.

Can reduce even more?

Exercise – MIPS Assembly

What does this assembly code do?

24

lw $t1, 0($s0)sltiu $t4, $t1, 4096bne $t4, $zero, TARGET

Branches to TARGET if 0 ≤ ($s0) < 4096.

Can reduce even more?

CS/ECE 552: Instruction Sets (Part 2)

Prof. Matthew D. Sinclair

Lecture notes based in part on slides created by MikkoLipasti, Mark Hill, David Wood, Guri Sohi, Josh San

Miguel, John Shen, and Jim Smith

This Class

• Machine code

• Stack and procedures

• Endianness

26

MIPS Machine Language

• All instructions are 32 bits wide

• Assembly: add $1, $2, $3

• Machine language:33222222222211111111110000000000

10987654321098765432109876543210

00000000010000110000100000010000

000000 00010 00011 00001 00000 010000

alu-rr 2 3 1 zero add (signed)

27

Instruction Format

• R-format

– opc rs rt rd shamt function

– 6 5 5 5 5 6

• Digression:

– How do you store the number 4,392,976?• Same as add $1, $2, $3

• Stored program: instructions are represented as numbers

– Programs can be read/written in memory like numbers

• Other R-format: addu, sub, …

28

Instruction Format

• Assembly: lw $1, 72($2)

• Machine: 100011 00010 00001 0000000001001000

lw 2 1 72

• I-format

– opc rs rt address/immediate

– 6 5 5 16

29

Summary: Instruction Formats

R: opcode rs rt rd shamt function

6 5 5 5 5 6

I: opcode rs rt address/immediate

6 5 5 16

J: opcode addr

6 26

30

Stack

• Stack: parameters, return values, return address• Stack grows from higher to lower addresses

(indefinitely)• $sp ($29) is stack pointer: points to top (recent) word• Push $t2:

addi $sp, $sp, -4

sw $t2, 0($sp)

• Pop $t2:lw $t2, 0($sp)

addi $sp, $sp, 4

31

Procedure Calls

• Calling convention is part of ABI– Caller

• Save registers ($t regs are caller-saved)– Push current values of registers that you will read after the procedure returns

• Set up arguments

• Call procedure (jal)

• Get results

• Restore (pop) registers

– Callee• Save registers ($s regs are callee-saved)

– Push current values of registers that you will (over)write within the procedure

• Perform procedure work, set up result

• Restore (pop) registers

• Return (jr $ra)32

ProcedureExample

swap(int v[], int k) {int temp = v[k];v[k] = v[k+1];v[k+1] = temp;

}

# $a0 is &v[0] and $a1 is k (i.e., 1st and 2nd incoming arguments)# $t0, $t1 and $t2 are caller-saved temporaries

swap: sll $t1, $a1, 2add $t1, $a0, $t1 lw $t0, 0($t1) lw $t2, 4($t1) sw $t2, 0($t1) sw $t0, 4($t1) jr $ra

33

ProcedureExample

swap(int v[], int k) {int temp = v[k];v[k] = v[k+1];v[k+1] = temp;

}

# $a0 is &v[0] and $a1 is k (i.e., 1st and 2nd incoming arguments)# $t0, $t1 and $t2 are caller-saved temporaries

swap: sll $t1, $a1, 2 # [$t1] = k*4add $t1, $a0, $t1 lw $t0, 0($t1) lw $t2, 4($t1) sw $t2, 0($t1) sw $t0, 4($t1) jr $ra

34

ProcedureExample

swap(int v[], int k) {int temp = v[k];v[k] = v[k+1];v[k+1] = temp;

}

# $a0 is &v[0] and $a1 is k (i.e., 1st and 2nd incoming arguments)# $t0, $t1 and $t2 are caller-saved temporaries

swap: sll $t1, $a1, 2 # [$t1] = k*4add $t1, $a0, $t1 # [$t1] = v + k*4 = &(v[k])lw $t0, 0($t1) lw $t2, 4($t1) sw $t2, 0($t1) sw $t0, 4($t1) jr $ra

35

ProcedureExample

swap(int v[], int k) {int temp = v[k];v[k] = v[k+1];v[k+1] = temp;

}

# $a0 is &v[0] and $a1 is k (i.e., 1st and 2nd incoming arguments)# $t0, $t1 and $t2 are caller-saved temporaries

swap: sll $t1, $a1, 2 # [$t1] = k*4add $t1, $a0, $t1 # [$t1] = v + k*4 = &(v[k])lw $t0, 0($t1) # [$t0] = temp = v[k]lw $t2, 4($t1) sw $t2, 0($t1) sw $t0, 4($t1) jr $ra

36

ProcedureExample

swap(int v[], int k) {int temp = v[k];v[k] = v[k+1];v[k+1] = temp;

}

# $a0 is &v[0] and $a1 is k (i.e., 1st and 2nd incoming arguments)# $t0, $t1 and $t2 are caller-saved temporaries

swap: sll $t1, $a1, 2 # [$t1] = k*4add $t1, $a0, $t1 # [$t1] = v + k*4 = &(v[k])lw $t0, 0($t1) # [$t0] = temp = v[k]lw $t2, 4($t1) # [$t2] = v[k+1]sw $t2, 0($t1) sw $t0, 4($t1) jr $ra

37

ProcedureExample

swap(int v[], int k) {int temp = v[k];v[k] = v[k+1];v[k+1] = temp;

}

# $a0 is &v[0] and $a1 is k (i.e., 1st and 2nd incoming arguments)# $t0, $t1 and $t2 are caller-saved temporaries

swap: sll $t1, $a1, 2 # [$t1] = k*4add $t1, $a0, $t1 # [$t1] = v + k*4 = &(v[k])lw $t0, 0($t1) # [$t0] = temp = v[k]lw $t2, 4($t1) # [$t2] = v[k+1]sw $t2, 0($t1) # v[k] = v[k+1]sw $t0, 4($t1) jr $ra

38

ProcedureExample

swap(int v[], int k) {int temp = v[k];v[k] = v[k+1];v[k+1] = temp;

}

# $a0 is &v[0] and $a1 is k (i.e., 1st and 2nd incoming arguments)# $t0, $t1 and $t2 are caller-saved temporaries

swap: sll $t1, $a1, 2 # [$t1] = k*4add $t1, $a0, $t1 # [$t1] = v + k*4 = &(v[k])lw $t0, 0($t1) # [$t0] = temp = v[k]lw $t2, 4($t1) # [$t2] = v[k+1]sw $t2, 0($t1) # v[k] = v[k+1]sw $t0, 4($t1) # v[k+1] = tempjr $ra

39

ProcedureExample

swap(int v[], int k) {int temp = v[k];v[k] = v[k+1];v[k+1] = temp;

}

# $a0 is &v[0] and $a1 is k (i.e., 1st and 2nd incoming arguments)# $t0, $t1 and $t2 are caller-saved temporaries

swap: sll $t1, $a1, 2 # [$t1] = k*4add $t1, $a0, $t1 # [$t1] = v + k*4 = &(v[k])lw $t0, 0($t1) # [$t0] = temp = v[k]lw $t2, 4($t1) # [$t2] = v[k+1]sw $t2, 0($t1) # v[k] = v[k+1]sw $t0, 4($t1) # v[k+1] = tempjr $ra # return

40

ProcedureExample

swap(int v[], int k) {int temp = v[k];v[k] = v[k+1];v[k+1] = temp;

}

# $a0 is &v[0] and $a1 is k (i.e., 1st and 2nd incoming arguments)# $t0, $t1 and $t2 are caller-saved temporaries

swap: sll $t1, $a1, 2 # [$t1] = k*4add $t1, $a0, $t1 # [$t1] = v + k*4 = &(v[k])lw $t0, 0($t1) # [$t0] = temp = v[k]lw $t2, 4($t1) # [$t2] = v[k+1]sw $t2, 0($t1) # v[k] = v[k+1]sw $t0, 4($t1) # v[k+1] = tempjr $ra # return

What can go wrong if we use $s1 instead of $t1?41

ProcedureExample

swap(int v[], int k) {int temp = v[k];v[k] = v[k+1];v[k+1] = temp;

}

# $a0 is &v[0] and $a1 is k (i.e., 1st and 2nd incoming arguments)# $t0, $t1 and $t2 are caller-saved temporaries

swap: sll $t1, $a1, 2 # [$t1] = k*4add $t1, $a0, $t1 # [$t1] = v + k*4 = &(v[k])lw $t0, 0($t1) # [$t0] = temp = v[k]lw $t2, 4($t1) # [$t2] = v[k+1]sw $t2, 0($t1) # v[k] = v[k+1]sw $t0, 4($t1) # v[k+1] = tempjr $ra # return

What can go wrong if we use $s1 instead of $t1?42

lw $s1, 0($t2)jal swapadd $t3, $s4, $s1

Endianness• How bytes within a word are addressed

• Big endian: LS byte at address xxxxxx11 (binary)

– E.g. IBM, SPARC

• Little endian: LS byte at address xxxxxx00 (binary)

– E.g. Intel x86

• Mode selectable

– E.g. PowerPC, MIPS

• Causes headaches for

– Ugly pointer arithmetic

– Multibyte datatype transfers from one machine to another

43

Endianness

• All instructions are 32 bits wide

• Assembly: add $1, $2, $3

• Machine language:33222222222211111111110000000000

10987654321098765432109876543210

00000000010000110000100000010000

000000 00010 00011 00001 00000 010000

alu-rr 2 3 1 zero add (signed)

44

Little endian

addr 00 0x10

addr 01 0x08

addr 10 0x43

addr 11 0x00

Big endian

addr 00 0x00

addr 01 0x43

addr 10 0x08

addr 11 0x10

Exercise – MIPS Assembly

Fill in the blanks:

45

// N set to 64int A[N]; // $A[0] = 0x100int sum = 0;int i = 0;

do {sum += A[i];i++;

} while (i < N);

addi $s0, $zero, ___add $t0, $zero, $zeroadd $t1, $zero, $zero___ $s1, $zero, 64

LOOP: sll $t2, $t1, ___add $t2, $t2, $s0lw $t3, 0($t2)add $t0, ___, $t3addi $t1, $t1, 1slt $t4, ___, ___bne $t4, $zero, LOOP

Exercise – MIPS Assembly

Fill in the blanks:

46

// N set to 64int A[N]; // $A[0] = 0x100int sum = 0;int i = 0;

do {sum += A[i];i++;

} while (i < N);

addi $s0, $zero, 256add $t0, $zero, $zeroadd $t1, $zero, $zeroaddi $s1, $zero, 64

LOOP: sll $t2, $t1, 2add $t2, $t2, $s0lw $t3, 0($t2)add $t0, $t0, $t3addi $t1, $t1, 1slt $t4, $t1, $s1bne $t4, $zero, LOOP

Exercise – MIPS Assembly

Does the assembly correspond to the C code?

47

// N set to 64int A[N]; // $A[0] = 0x100int sum = 0;

for (int i = 0; i < N; i++) {sum += A[i];

}

addi $s0, $zero, 256add $t0, $zero, $zeroadd $t1, $zero, $zeroaddi $s1, $zero, 64

LOOP: sll $t2, $t1, 2add $t2, $t2, $s0lw $t3, 0($t2)add $t0, $t0, $t3addi $t1, $t1, 1slt $t4, $t1, $s1bne $t4, $zero, LOOP

Exercise – MIPS Assembly

Fill in the blanks:

48

// N set to 64int A[N]; // $A[0] = 0x100int sum = 0;

for (int i = 0; i < N; i++) {sum += A[i];

}

addi $s0, $zero, 256add $t0, $zero, $zeroadd $t1, $zero, $zeroaddi $s1, $zero, 64

LOOP: slt $t4, $t1, $s1___ $t4, $zero, ___sll $t2, $t1, 2add $t2, $t2, $s0lw $t3, 0($t2)add $t0, $t0, $t3addi $t1, $t1, 1___ LOOP

EXIT:

Exercise – MIPS Assembly

Fill in the blanks:

49

// N set to 64int A[N]; // $A[0] = 0x100int sum = 0;

for (int i = 0; i < N; i++) {sum += A[i];

}

addi $s0, $zero, 256add $t0, $zero, $zeroadd $t1, $zero, $zeroaddi $s1, $zero, 64

LOOP: slt $t4, $t1, $s1beq $t4, $zero, EXITsll $t2, $t1, 2add $t2, $t2, $s0lw $t3, 0($t2)add $t0, $t0, $t3addi $t1, $t1, 1j LOOP

EXIT:

Exercise – MIPS Assembly

Reduce the number of instructions executed?

50

// N set to 64int A[N]; // $A[0] = 0x100int sum = 0;

for (int i = 0; i < N; i++) {sum += A[i];

}

addi $s0, $zero, 256add $t0, $zero, $zeroadd $t1, $zero, $zeroaddi $s1, $zero, 64

LOOP: slt $t4, $t1, $s1beq $t4, $zero, EXITsll $t2, $t1, 2add $t2, $t2, $s0lw $t3, 0($t2)add $t0, $t0, $t3addi $t1, $t1, 1j LOOP

EXIT:

Exercise – MIPS Assembly

Reduce the number of instructions executed?

51

// N set to 64int A[N]; // $A[0] = 0x100int sum = 0;

for (int i = 0; i < N; i++) {sum += A[i];

}

addi $s0, $zero, 256add $t0, $zero, $zeroadd $t1, $zero, $zeroaddi $s1, $zero, 64

LOOP: slt $t4, $s0, $s1beq $t4, $zero, EXITsll $t2, $t1, 2add $t2, $t2, $s0lw $t3, 0($s0)add $t0, $t0, $t3addi $s0, $s0, 4j LOOP

EXIT:

Exercise – MIPS Assembly

Reduce the number of instructions executed?

52

// N set to 64int A[N]; // $A[0] = 0x100int sum = 0;

for (int i = 0; i < N; i++) {sum += A[i];

}

addi $s0, $zero, 256add $t0, $zero, $zeroadd $t1, $zero, $zeroaddi $s1, $zero, 64

LOOP: slt $t4, $s0, $s1beq $t4, $zero, EXITsll $t2, $t1, 2add $t2, $t2, $s0lw $t3, 0($s0)add $t0, $t0, $t3addi $s0, $s0, 4j LOOP

EXIT:

Exercise – MIPS Assembly

Reduce the number of instructions executed?

53

// N set to 64int A[N]; // $A[0] = 0x100int sum = 0;

for (int i = 0; i < N; i++) {sum += A[i];

}

addi $s0, $zero, 256add $t0, $zero, $zeroaddi $s1, $zero, 64

LOOP: slt $t4, $s0, $s1beq $t4, $zero, EXITlw $t3, 0($s0)add $t0, $t0, $t3addi $s0, $s0, 4j LOOP

EXIT:

Exercise – MIPS Assembly

Reduce the number of instructions executed?

54

// N set to 64int A[N]; // $A[0] = 0x100int sum = 0;

for (int i = 0; i < N; i++) {sum += A[i];

}

addi $s0, $zero, 256add $t0, $zero, $zeroaddi $s1, $zero, ___

LOOP: slt $t4, $s0, $s1beq $t4, $zero, EXITlw $t3, 0($s0)add $t0, $t0, $t3addi $s0, $s0, 4j LOOP

EXIT:

Exercise – MIPS Assembly

Reduce the number of instructions executed?

55

// N set to 64int A[N]; // $A[0] = 0x100int sum = 0;

for (int i = 0; i < N; i++) {sum += A[i];

}

addi $s0, $zero, 256add $t0, $zero, $zeroaddi $s1, $zero, 512

LOOP: slt $t4, $s0, $s1beq $t4, $zero, EXITlw $t3, 0($s0)add $t0, $t0, $t3addi $s0, $s0, 4j LOOP

EXIT:

top related