Midterm Solution
CS 3339
Apan Qasem
Texas State University
Spring 2015
Announcements
Exam 1 Grades
• Average: 73
• Median: 82
• 5 As, 10 Bs, 2 Cs, 5 Ds, 4 Fs
• Top score: 98
• Low score: 37
MCQ 1
(3 pts) A word is equal to
a) 2 bytes
b) 4 bytes
c) 8 bytes
d) 16 bytes
e) depends on the architecture
MCQ 2
(3 pts) A store instruction
a) takes a data value from a register and puts it in memory
b) takes a data value from memory and puts it into a register
c) moves a data value from one register to another
d) moves a data value from one location in memory to another location
e) all of the above
MCQ 3
(3 pts) The metric used to measure performance in terms of completed tasks per unit time is called
a) clock rate
b) clock cycle time
c) CPU execution time
d) instruction count
e) throughput
MCQ 4
(3 pts) Moore’s law states that
a) memory speed doubles every eighteen months
b) number of cores per chip doubles every eighteen months
c) processor speed doubles every eighteen months
d) processor power consumption doubles every eighteen months
e) all of the above
MCQ 5
(3 pts) The best way to summarize speedup numbers is to use
a) arithmetic mean
b) geometric mean
c) harmonic mean
d) median
e) mode
MCQ 6
(3 pts) If we take the 4-bit, 2’s complement value 1110 and sign-extend it to 8 bits, we get
a) 0000 1110
b) 1110 1110
c) 1111 1110
d) 1110 0000
e) 1110 1111
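Sign extension can be checked mechanically. The sketch below is illustrative only; the helper names `sign_extend` and `to_int` are made up for this example and are not part of the exam. It replicates the sign bit on the left and confirms that the signed value is unchanged.

```python
def sign_extend(bits: str, width: int) -> str:
    """Sign-extend a two's complement bit string to `width` bits."""
    if width < len(bits):
        raise ValueError("target width is narrower than the input")
    # Replicate the most significant (sign) bit on the left.
    return bits[0] * (width - len(bits)) + bits


def to_int(bits: str) -> int:
    """Interpret a bit string as a two's complement signed integer."""
    value = int(bits, 2)
    if bits[0] == "1":
        value -= 1 << len(bits)
    return value


extended = sign_extend("1110", 8)
print(extended)                          # 11111110
print(to_int("1110"), to_int(extended))  # -2 -2
```

Both the 4-bit and the 8-bit strings decode to the same signed value, which is the point of sign extension.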
MCQ 7
(3 pts) Underflow for floating-point values may occur when
a) A positive exponent is too large
b) A negative exponent is too large
c) The fraction is too large
d) The fraction is too small
e) None of the above
Problem 1
Performance Equation:
CPU Time = (Instruction Count x CPI) / Clock Rate
• Problem mentions nothing about the number of instructions executed on each platform; assume IC = IC_A = IC_B = 100
• Clock Rate is given
• Need to compute Effective CPI
Problem 1
Assuming, 100 instructions:
Total cycles for system A = 60 x 1 + 20 x 10 + 10 x 10 + 10 x 3 = 60 + 200 + 100 + 30 = 390
Total cycles for system B = 60 x 1 + 20 x 10 + 10 x 10 x 3 = 130
Effective CPI_A = total cycles/IC = 390/100 = 3.9
Effective CPI_B = total cycles/IC = 130/100 = 1.3
CPU_A = (3.9 x 100)/3 = 390/3 = 130
CPU_B = (1.3 x 100)/2 = 130/2 = 65
B is faster. Speedup = CPU_A/CPU_B = 130/65 = 2
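The arithmetic above can be reproduced with a short script. This is a sketch, not the official solution: the instruction mix for system A is read off the worked line above, system B's total of 130 cycles is taken as given rather than recomputed, and the clock rates 3 and 2 are used as plain numbers because only their ratio matters.

```python
# Instruction mix for system A as (instruction count, cycles per instruction),
# read off the worked solution with IC = 100.
MIX_A = [(60, 1), (20, 10), (10, 10), (10, 3)]

IC = 100           # assumed instruction count on both systems
CYCLES_B = 130     # total cycles for system B, taken as given
CLOCK_RATE_A = 3   # clock rates as plain numbers; only the ratio matters
CLOCK_RATE_B = 2


def total_cycles(mix):
    """Sum count x CPI over every instruction class."""
    return sum(count * cpi for count, cpi in mix)


def cpu_time(cycles, clock_rate):
    """CPU time = (IC x CPI) / clock rate = total cycles / clock rate."""
    return cycles / clock_rate


cycles_a = total_cycles(MIX_A)
cpi_a = cycles_a / IC
cpi_b = CYCLES_B / IC
time_a = cpu_time(cycles_a, CLOCK_RATE_A)
time_b = cpu_time(CYCLES_B, CLOCK_RATE_B)
speedup = time_a / time_b   # slow / fast

print(cycles_a, cpi_a, cpi_b)   # 390 3.9 1.3
print(time_a, time_b, speedup)  # 130.0 65.0 2.0
```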
Problem 1
• Speedup = old/new OR slow/fast
• higher value is better
• don’t say, achieved a 100% speedup
• say, achieved a factor of 2 speedup
• If you put clock rate in your equation early on, the computation got more cumbersome
• also don’t need to convert clock rate to seconds, just computing ratios
Problem 2
• (10 pts) [Binary Arithmetic] $s1 is a 4-bit register and holds the value 1011. Show how the contents of the register change as we apply each of the following operations. (Show contents after each operation)
Operation             Contents of $s1
(initial value)       1 0 1 1
sll $s1, $s1, 1       0 1 1 0
add $s1, $s1, $s1     1 1 0 0
srl $s1, $s1, 2       0 0 1 1
beq $s1, $s1, 100     0 0 1 1
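The register contents can be traced with a small simulation. This is a minimal sketch that assumes a 4-bit register modeled by masking with `0b1111`; the helper names are hypothetical. The `beq` instruction only compares and branches, so it leaves the register unchanged.

```python
MASK = 0b1111  # keep only the low 4 bits, modeling a 4-bit register


def sll(x, n):
    """Shift left logical; bits shifted past bit 3 are lost."""
    return (x << n) & MASK


def add(a, b):
    """Add with wraparound at 4 bits (carry out is discarded)."""
    return (a + b) & MASK


def srl(x, n):
    """Shift right logical; zeros are shifted in from the left."""
    return (x & MASK) >> n


s1 = 0b1011
trace = [s1]
s1 = sll(s1, 1)
trace.append(s1)   # sll $s1, $s1, 1
s1 = add(s1, s1)
trace.append(s1)   # add $s1, $s1, $s1
s1 = srl(s1, 2)
trace.append(s1)   # srl $s1, $s1, 2
trace.append(s1)   # beq $s1, $s1, 100 does not write $s1

for value in trace:
    print(f"{value:04b}")
# 1011
# 0110
# 1100
# 0011
# 0011
```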
Problem 3
• (15 pts) [FP Representation] Show how the bit string for +20 (base 10) would be stored in memory if represented in IEEE floating point standard. Assume you have an 8-bit machine and all floating point values are stored using 8 bits: 3 bits for the exponent (with a bias of 3) and 4 bits for the fraction.
Problem 3
+20 (base 10)
1. Express number as some multiple of a power of 2: 20 = 5 x 4 = 5 x 2^2
2. Convert to binary: 5 x 2^2 = 101 x 2^2
3. Determine place for decimal point: nothing to do
4. Normalize: 101. x 2^2 = 1.01 x 2^4
• Adjust for IEEE
• Fraction = 0100 [4 bits] (the leading 1 is the hidden bit and is not stored)
• Exponent = bias + 4 = 3 + 4 = 7 = 111 in binary
S E   F
0 111 0100
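The steps above can be sketched in code. The function below is a hypothetical helper for the exam's 8-bit toy format only (1 sign bit, 3 exponent bits with bias 3, 4 fraction bits). It ignores special cases such as zero, denormals, infinities and rounding, and simply truncates the fraction.

```python
import math


def encode_toy_float(value, exp_bits=3, frac_bits=4, bias=3):
    """Encode a nonzero number as 'S EEE FFFF' in the exam's toy format."""
    if value == 0:
        raise ValueError("zero is not handled in this sketch")
    sign = "1" if value < 0 else "0"
    # frexp returns m, e with abs(value) == m * 2**e and 0.5 <= m < 1.
    mantissa, exponent = math.frexp(abs(value))
    # Normalize to the form 1.xxx * 2**exponent.
    significand = mantissa * 2
    exponent -= 1
    # Drop the hidden leading 1 and keep frac_bits bits (truncating).
    fraction = int((significand - 1) * (1 << frac_bits))
    biased = exponent + bias
    if not 0 <= biased < (1 << exp_bits):
        raise ValueError("exponent does not fit in the exponent field")
    return f"{sign} {biased:0{exp_bits}b} {fraction:0{frac_bits}b}"


print(encode_toy_float(20))  # 0 111 0100
```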
Problem 3
• If you got the formatting right, you got 6 points
• People mostly lost points in calculating the bias (up to 5 points)
• Lost 2 points if you didn’t account for the hidden bit
Problem 4
(15 pts) [Instruction Encoding and Implementation] The IBM PowerPC supports several additional addressing modes beyond the ones we have discussed in class. One, called indexed addressing, adds the values stored in two registers (whose register file addresses are contained in the instruction) to form the memory address of the operand.
Problem 4(a)
Show one possible encoding of a load instruction with indexed
addressing. Assume, instructions are 32 bits.
• Need place for 2 source register addresses
• Need place for opcode
• Need place for address of destination register (load instruction)

opcode   src reg 1   src reg 2   dest reg   ignored
6 bits   5 bits      5 bits      5 bits     11 bits

Assuming 32 registers
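One way to sanity-check the field widths is to pack them into a 32-bit word. The sketch below assumes the fields are laid out left to right in the order of the table, with the opcode in the most significant bits. The function name and the opcode value `0b111111` are placeholders chosen for illustration, not real PowerPC values.

```python
def encode_indexed_load(opcode, src1, src2, dest):
    """Pack opcode(6) | src1(5) | src2(5) | dest(5) | ignored(11) into 32 bits."""
    fields = (("opcode", opcode, 6), ("src1", src1, 5),
              ("src2", src2, 5), ("dest", dest, 5))
    for name, value, width in fields:
        if not 0 <= value < (1 << width):
            raise ValueError(f"{name} does not fit in {width} bits")
    # The low 11 bits are ignored and left as zero.
    return (opcode << 26) | (src1 << 21) | (src2 << 16) | (dest << 11)


word = encode_indexed_load(0b111111, src1=1, src2=2, dest=3)
print(f"{word:032b}")  # 11111100001000100001100000000000
```

With 32 registers, each register address needs 5 bits, which is why the three register fields are 5 bits wide.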
Problem 4(b)
Complete the diagram below to show the implementation of a load instruction
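[Figure: datapath diagram for the indexed load. Components: Instruction Memory (Read Addr, Instr[31-0]), control unit, Register File (Read Addr 1, Read Addr 2, Write Addr, Write Data, Read Data 1, Read Data 2), ALU, Data Memory (Addr, Write Data, Read Data). Instruction fields shown: [31-26] to the control unit, [25-21], [20-16], [15-11], and [10-0] ignored. Control signals: ALU control = ADD, RegWrite = 1, MemRead = 1.]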
Problem 5
• (12 pts) [Datapath Design] Consider an assembly instruction, increment-and-skip-on-zero (isz) that increments (by 1) the contents of a register, stores the incremented value back in the register, and skips the next instruction if the result of the increment is zero. All other instructions remain unchanged.
• Show one possible encoding of this instruction. Assume, instructions are 32 bits and you have 64 registers in the system. Explain the use of each field.
• Need 6 bits for opcode
• Need 6 bits for source and destination register
• Can ignore rest

opcode   src/dest   unused
6 bits   6 bits     20 bits
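The behavior of isz can be sketched as a single step of a toy interpreter. Everything below is an assumption made for illustration: registers are taken to be 32 bits wide so that the increment wraps around to zero, instructions are 4 bytes long, and the function name `isz` and its signature are hypothetical.

```python
REG_BITS = 32      # assumed register width
INSTR_BYTES = 4    # 32-bit instructions
NUM_REGS = 64      # from the problem statement


def isz(regs, r, pc):
    """Increment regs[r]; return the next PC, skipping one instruction on zero."""
    regs[r] = (regs[r] + 1) % (1 << REG_BITS)   # increment with wraparound
    next_pc = pc + INSTR_BYTES                  # normal fall-through
    if regs[r] == 0:
        next_pc += INSTR_BYTES                  # skip the next instruction
    return next_pc


regs = [0] * NUM_REGS
regs[5] = (1 << REG_BITS) - 1   # all ones, so the increment wraps to zero
regs[6] = 7
print(isz(regs, 5, 100), regs[5])   # 108 0
print(isz(regs, 6, 100), regs[6])   # 104 8
```

The single instruction reads a register, writes it back, and conditionally changes the PC, which is the combination of tasks the remaining parts of the problem ask about.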
Problem 5 (b)
(5 pts) How does the inclusion of this instruction affect CPU execution time? (Discuss all three components.)
• Will increase CPI (assuming clock cycle length does not change)
• Will decrease instruction count (assuming the instruction is emitted)
• Will not affect clock rate unless an architectural revision is implemented
Problem 5 (c)
• (2 pts) Would you expect this instruction to be included in a RISC or CISC architecture? Justify your answer
• Expect to see this in CISC because it is a complex instruction (performing several different tasks)
Problem 6
(10 pts) Consider the following information: “Spintel, a leading processor company, reports that a major architectural breakthrough allows systems equipped with their processors to achieve 6.5 GFLOPS on average on the PECS benchmark suite. A competing company, LAMDA, says their current processors achieve 80% of the performance of the Spintel systems on the same benchmark suite. However, LAMDA processors are about 20% less expensive than Spintel processors.”
Say, you work for TX State and you need to make a recommendation to the Vice President of IT about what machines should be purchased to equip the labs in Derrick Hall.
What would be your recommendation? Assume the VP has some time to listen to you, so you may want to discuss the pros and cons of both systems. Mention any other type of information that may be helpful in making the recommendation. Note any misleading or ambiguous information.
[Hint: this is not just a performance vs. cost issue]
Problem 6
• Issues
• Performance vs. cost
• What kind of benchmark suite is PECS?
• What kind of applications are going to be run in the labs?
• Compatibility with existing machines (OS?)
• What is the budget?
• What does average performance mean?
• …
• Up to 1 point extra credit
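The performance vs. cost item can be made concrete with the numbers quoted in the problem. The sketch below treats "about 20% less expensive" as exactly 0.80 of the Spintel price, which is an assumption, and normalizes the Spintel price to 1.0 because no actual prices are given.

```python
SPINTEL_GFLOPS = 6.5         # reported average on the PECS suite
LAMDA_RELATIVE_PERF = 0.80   # 80% of Spintel performance
LAMDA_RELATIVE_COST = 0.80   # "about 20% less expensive", taken as exactly 0.80

lamda_gflops = SPINTEL_GFLOPS * LAMDA_RELATIVE_PERF

# Performance per unit cost, with the Spintel price normalized to 1.0.
spintel_perf_per_cost = SPINTEL_GFLOPS / 1.0
lamda_perf_per_cost = lamda_gflops / LAMDA_RELATIVE_COST
ratio = lamda_perf_per_cost / spintel_perf_per_cost

print(round(lamda_gflops, 2))   # 5.2
print(round(ratio, 2))          # 1.0
```

Under these assumptions the two systems deliver roughly the same performance per dollar, so the quoted figures alone do not separate them and the other issues in the list carry the decision.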