Midterm Solution

CS 3339

Apan Qasem

Texas State University

Spring 2015

Announcements

Exam 1 Grades

• Average: 73
• Median: 82
• 5 As, 10 Bs, 2 Cs, 5 Ds, 4 Fs
• Top score: 98
• Low score: 37

MCQ 1

(3 pts) A word is equal to

a) 2 bytes
b) 4 bytes
c) 8 bytes
d) 16 bytes
e) depends on the architecture

MCQ 2

(3 pts) A store instruction

a) takes a data value from a register and puts it in memory
b) takes a data value from memory and puts it into a register
c) moves a data value from one register to another
d) moves a data value from one location in memory to another location
e) all of the above

MCQ 3

(3 pts) The metric used to measure performance in terms of completed tasks per unit time is called

a) clock rate
b) clock cycle time
c) CPU execution time
d) instruction count
e) throughput

MCQ 4

(3 pts) Moore’s law states that

a) memory speed doubles every eighteen months
b) number of cores per chip doubles every eighteen months
c) processor speed doubles every eighteen months
d) processor power consumption doubles every eighteen months
e) all of the above

MCQ 5

(3 pts) The best way to summarize speedup numbers is to use

a) arithmetic mean
b) geometric mean
c) harmonic mean
d) median
e) mode

MCQ 6

(3 pts) If we take the 4-bit, 2’s complement value 1110 and sign-extend it to 8 bits, we get

a) 0000 1110
b) 1110 1110
c) 1111 1110
d) 1110 0000
e) 1110 1111
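The sign-extension rule in MCQ 6 can be checked with a short sketch. The helper names `sign_extend` and `to_signed` are hypothetical and exist only for this illustration; the idea is that copying the sign bit into the new high-order positions leaves the two's complement value unchanged.

```python
def sign_extend(bits: str, width: int) -> str:
    """Sign-extend a two's complement bit string to `width` bits."""
    if width < len(bits):
        raise ValueError("target width is narrower than the input")
    # Replicate the most significant (sign) bit into the new high-order positions.
    return bits[0] * (width - len(bits)) + bits


def to_signed(bits: str) -> int:
    """Interpret a bit string as a two's complement integer."""
    value = int(bits, 2)
    return value - (1 << len(bits)) if bits[0] == "1" else value


extended = sign_extend("1110", 8)
print(extended)                                 # 11111110
print(to_signed("1110"), to_signed(extended))   # -2 -2
```

Both the 4-bit and the 8-bit pattern decode to the same value, which is the property sign extension is meant to preserve.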

MCQ 7

(3 pts) Underflow for floating-point values may occur when

a) A positive exponent is too large
b) A negative exponent is too large
c) The fraction is too large
d) The fraction is too small
e) None of the above

Problem 1

Performance Equation:

CPU Time = (Instruction Count x CPI) / Clock Rate

• Problem mentions nothing about the number of instructions executed on each platform; assume IC = IC_A = IC_B = 100
• Clock Rate is given
• Need to compute Effective CPI

Problem 1

Assuming 100 instructions:

Total cycles for system A = 60 x 1 + 20 x 10 + 10 x 10 + 10 x 3 = 60 + 200 + 100 + 30 = 390

Total cycles for system B = 60 x 1 + 20 x 10 + 10 x 10 x 3 = 130

Effective CPI_A = total cycles/IC = 390/100 = 3.9

Effective CPI_B = total cycles/IC = 130/100 = 1.3

CPU_A = (3.9 x 100)/3 = 390/3 = 130

CPU_B = (1.3 x 100)/2 = 130/2 = 65

B is faster. Speedup = CPU_A/CPU_B = 130/65 = 2
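The arithmetic above can be reproduced with a minimal sketch. The instruction mix for system A is copied from the slide; the 130-cycle total for system B is taken as given rather than recomputed, and the names `MIX_A`, `total_cycles`, and `cpu_time` are hypothetical. Clock rates are kept in the slide's unstated units, since only the ratio matters.

```python
from fractions import Fraction

# System A: (instructions out of 100, cycles per instruction), from the slide.
MIX_A = [(60, 1), (20, 10), (10, 10), (10, 3)]

IC = 100          # assumed instruction count, identical on both systems
CYCLES_B = 130    # system B total cycles, taken as given from the slide
CLOCK_A = 3       # clock rates in the slide's units
CLOCK_B = 2


def total_cycles(mix):
    """Sum of (instruction count x cycles per instruction) over all classes."""
    return sum(count * cycles for count, cycles in mix)


def cpu_time(ic, cpi, clock_rate):
    """CPU Time = (Instruction Count x CPI) / Clock Rate."""
    return ic * cpi / clock_rate


# Fractions keep the intermediate values exact (3.9 and 1.3 are not exact floats).
cpi_a = Fraction(total_cycles(MIX_A), IC)
cpi_b = Fraction(CYCLES_B, IC)

time_a = cpu_time(IC, cpi_a, CLOCK_A)
time_b = cpu_time(IC, cpi_b, CLOCK_B)
speedup = time_a / time_b

print(float(cpi_a), float(cpi_b))
print(time_a, time_b, speedup)
```

Because the instruction count is the same on both systems, it cancels in the speedup ratio; any assumed value of `IC` gives the same result.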

Problem 1

• Speedup = old/new OR slow/fast
• higher value is better
• don’t say “achieved a 100% speedup”
• say “achieved a factor of 2 speedup”

• If you put clock rate in your equation early on, the computation got more cumbersome

• also don’t need to convert clock rate to seconds, just computing ratios

Problem 2

• (10 pts) [Binary Arithmetic] $s1 is a 4-bit register and holds the value 1011. Show how the contents of the register change as we apply each of the following operations. (Show contents after each operation)

(initial contents)     1 0 1 1
sll $s1, $s1, 1        0 1 1 0
add $s1, $s1, $s1      1 1 0 0
srl $s1, $s1, 2        0 0 1 1
beq $s1, $s1, 100      0 0 1 1
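The register trace can be reproduced with a small sketch that models a 4-bit register by masking every result to 4 bits. The function names are hypothetical stand-ins for the MIPS operations, not a real simulator API.

```python
WIDTH = 4
MASK = (1 << WIDTH) - 1  # 0b1111: discards bits that fall outside the register


def sll(value, shamt):
    """Shift left logical; bits shifted past the top of the register are lost."""
    return (value << shamt) & MASK


def add(a, b):
    """Add with wraparound; the carry out of the top bit is discarded."""
    return (a + b) & MASK


def srl(value, shamt):
    """Shift right logical; zeros are shifted in from the left."""
    return (value & MASK) >> shamt


def trace(start):
    """Return the contents of $s1 initially and after each of the four instructions."""
    s1 = start
    states = [s1]
    s1 = sll(s1, 1)
    states.append(s1)
    s1 = add(s1, s1)
    states.append(s1)
    s1 = srl(s1, 2)
    states.append(s1)
    # beq only compares registers and may redirect the PC; it never writes $s1.
    states.append(s1)
    return [format(s, "04b") for s in states]


print(trace(0b1011))  # ['1011', '0110', '1100', '0011', '0011']
```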

Problem 3

• (15 pts) [FP Representation] Show how the bit string for +20 (base 10) would be stored in memory if represented in IEEE floating point standard. Assume you have an 8-bit machine and all floating point values are stored using 8 bits: 1 bit for the sign, 3 bits for the exponent (with a bias of 3) and 4 bits for the fraction.

Problem 3

+20 (base 10)

1. Express number as some multiple of a power of 2: 20 = 5 x 4 = 5 x 2^2
2. Convert to binary: 5 x 2^2 = 101 x 2^2
3. Determine place for decimal point: nothing to do
4. Normalize: 101. x 2^2 = 1.01 x 2^4

• Adjust for IEEE
• Fraction = 0100 [4 bits]
• Exponent = bias + 4 = 3 + 4 = 7 = 111 in binary

S E   F
0 111 0100
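The steps above can be sketched in code for the exam's 8-bit format (1 sign bit, 3 exponent bits with bias 3, 4 fraction bits). This is an illustration only: `encode_fp` is a hypothetical helper, it handles only nonzero values that are exactly representable, and like the slide it ignores special cases such as zero, denormals, and reserved exponent patterns.

```python
import math

EXP_BITS, FRAC_BITS, BIAS = 3, 4, 3


def encode_fp(value: float) -> str:
    """Return 'S EEE FFFF' for a nonzero value in the exam's 8-bit format."""
    sign = "1" if value < 0 else "0"
    # frexp gives abs(value) == mantissa * 2**exp with 0.5 <= mantissa < 1.
    mantissa, exp = math.frexp(abs(value))
    significand = mantissa * 2   # normalize to the form 1.xxxx
    exponent = exp - 1           # compensate for the doubling above
    biased = exponent + BIAS
    if not 0 <= biased < (1 << EXP_BITS):
        raise ValueError("exponent does not fit in the exponent field")
    # Drop the hidden leading 1 and scale what is left to FRAC_BITS bits.
    fraction = (significand - 1) * (1 << FRAC_BITS)
    if fraction != int(fraction):
        raise ValueError("value is not exactly representable")
    return f"{sign} {biased:0{EXP_BITS}b} {int(fraction):0{FRAC_BITS}b}"


print(encode_fp(20.0))  # 0 111 0100
```

For 20.0, `frexp` returns (0.625, 5), which normalizes to 1.25 x 2^4; the fraction field stores 0.25 as 0100 and the exponent field stores 4 + 3 = 7.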

Problem 3

• If you got the formatting right, you got 6 points

• People mostly lost points in calculating the bias (up to 5 points)

• Lost 2 points if you didn’t account for the hidden bit

Problem 4

(15 pts) [Instruction Encoding and Implementation] The IBM PowerPC supports several additional addressing modes beyond the ones we have discussed in class. One, called indexed addressing, adds the values stored in two registers (whose register file addresses are contained in the instruction) to form the memory address of the operand.

Problem 4(a)

Show one possible encoding of a load instruction with indexed addressing. Assume instructions are 32 bits.

• Need place for 2 source register addresses
• Need place for opcode
• Need place for address of destination register (load instruction)

opcode   src reg 1   src reg 2   dest reg   ignored
6 bits   5 bits      5 bits      5 bits     11 bits

Assuming 32 registers
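The field layout above can be sketched as a pack/unpack pair. The field names, the helper functions, and the opcode value used in the example are all hypothetical; only the field widths come from the slide.

```python
# (field name, width in bits), most significant field first; widths from the slide.
FIELDS = [("opcode", 6), ("src1", 5), ("src2", 5), ("dest", 5), ("ignored", 11)]


def encode_load(**values):
    """Pack the named fields into one 32-bit instruction word."""
    word = 0
    for name, width in FIELDS:
        value = values.get(name, 0)
        if not 0 <= value < (1 << width):
            raise ValueError(f"{name} does not fit in {width} bits")
        word = (word << width) | value
    return word


def decode_load(word):
    """Split a 32-bit instruction word back into its fields."""
    fields = {}
    shift = 32
    for name, width in FIELDS:
        shift -= width
        fields[name] = (word >> shift) & ((1 << width) - 1)
    return fields


# Hypothetical opcode value; registers 8 and 9 form the address, register 10 is loaded.
word = encode_load(opcode=0b100011, src1=8, src2=9, dest=10)
print(format(word, "032b"))
print(decode_load(word))
```

With 5-bit register fields, each field can name any of the 32 registers, and the field widths add up to exactly 32 bits.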

Problem 4(b)

Complete the diagram below to show the implementation of a load instruction

[Figure: datapath for the indexed load. Components: Instruction Memory (Read Addr, Instr[31-0]), control unit, Register File (Read Addr 1, Read Addr 2, Write Addr, Write Data, Read Data 1, Read Data 2), ALU with ALU control, Data Memory (Addr, Write Data, Read Data). Instruction fields: [31-26], [25-21], [20-16], [15-11], [10-0] (ignored). Control settings: ALU control = ADD, RegWrite = 1, MemRead = 1.]

Problem 5

• (12 pts) [Datapath Design] Consider an assembly instruction, increment-and-skip-on-zero (isz) that increments (by 1) the contents of a register, stores the incremented value back in the register, and skips the next instruction if the result of the increment is zero. All other instructions remain unchanged.

• Show one possible encoding of this instruction. Assume, instructions are 32 bits and you have 64 registers in the system. Explain the use of each field.

• Need 6 bits for opcode
• Need 6 bits for source and destination register
• Can ignore rest

opcode   src/dest   unused
6 bits   6 bits     20 bits
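The behaviour of isz can be illustrated with a minimal sketch. The problem does not state the register width or how the PC advances, so the 32-bit registers and the byte-addressed PC that steps by 4 are assumptions made for this example; the function name `isz` and its signature are likewise hypothetical.

```python
REG_BITS = 32      # assumed register width (not stated in the problem)
INSTR_BYTES = 4    # 32-bit instructions, assuming a byte-addressed PC
NUM_REGS = 64      # from the problem statement


def isz(regs, r, pc):
    """Increment register r in place and return the next PC."""
    # Wrap around at the register width, so all ones + 1 becomes zero.
    regs[r] = (regs[r] + 1) & ((1 << REG_BITS) - 1)
    if regs[r] == 0:
        return pc + 2 * INSTR_BYTES   # skip the next instruction
    return pc + INSTR_BYTES           # fall through as usual


regs = [0] * NUM_REGS
regs[5] = (1 << REG_BITS) - 1   # all ones: the increment wraps to zero
print(isz(regs, 5, 100), regs[5])
regs[6] = 7
print(isz(regs, 6, 100), regs[6])
```

A single 6-bit register field is enough because the same register is both the source and the destination of the increment.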

Problem 5 (b)

(5 pts) How does the inclusion of this instruction affect CPU execution time? (Discuss all three components.)

• Will increase CPI (assuming clock cycle length does not change)
• Will decrease instruction count (assuming the instruction is emitted)
• Will not affect clock rate unless an architectural revision is implemented

Problem 5 (c)

• (2 pts) Would you expect this instruction to be included in a RISC or CISC architecture? Justify your answer

• Expect to see this in CISC because it is a complex instruction (performing several different tasks)

Problem 6

(10 pts) Consider the following information: “Spintel, a leading processor company, reports that a major architectural breakthrough allows systems equipped with their processors to achieve 6.5 GFLOPS on average on the PECS benchmark suite. A competing company, LAMDA, says their current processors achieve 80% of the performance of the Spintel systems on the same benchmark suite. However, LAMDA processors are about 20% less expensive than Spintel processors.”

Say, you work for TX State and you need to make a recommendation to the Vice President of IT about what machines should be purchased to equip the labs in Derrick Hall.

What would be your recommendation? Assume the VP has some time to listen to you, so you may want to discuss the pros and cons of both systems. Mention any other type information that may be helpful in making the recommendation. Note any misleading or ambiguous information.

[Hint: this is not just a performance vs. cost issue ]

Problem 6

• Issues
  • Performance vs. cost
  • What kind of benchmark suite is PECS?
  • What kind of applications are going to be run in the labs?
  • Compatibility with existing machines (OS?)
  • What is the budget?
  • What does average performance mean?
  • …

• Up to 1 point extra credit
