Data Representation and Architecture Modelling Revision


  • Slide 1
  • Data Representation and Architecture Modelling Revision
  • Slide 2
  • Binary system
    1. Conversion
       1. Convert decimal to binary
       2. Convert binary to decimal and hexadecimal
    2. Integer representation
       1. Unsigned notation
       2. Signed notation
       3. Excess notation
       4. Two's complement
    3. Advantages of using two's complement
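    For quick revision, a minimal Python sketch of the conversions listed above; the helper names are illustrative, not from the slides:

        # Decimal to binary and back, plus 8-bit two's complement.
        def dec_to_bin(n, width=8):
            return format(n, '0{}b'.format(width))

        def bin_to_dec(bits):
            return int(bits, 2)

        def bin_to_hex(bits):
            return format(int(bits, 2), 'X')

        def twos_complement(n, width=8):
            # Two's complement pattern of a (possibly negative) integer in 'width' bits.
            return format(n & ((1 << width) - 1), '0{}b'.format(width))

        print(dec_to_bin(13))           # 00001101
        print(bin_to_dec('1101'))       # 13
        print(bin_to_hex('11111010'))   # FA
        print(twos_complement(-5))      # 11111011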
  • Slide 3
  • Floating point representation
    1. What decimal floating-point number is represented by the following 32 bits (single-precision format)? Show your working.
       1 | 1000 0111 | 000 1010 0000 0000 0000 0000 (sign | exponent | mantissa)
    2. What is the range of negative numbers in this representation?
    3. Define negative overflow and negative underflow in this representation.
  • Slide 4
  • Solution
    1. Method: the sign bit is 1, so the number is negative.
       Biased exponent = 1000 0111 = 128 + 4 + 2 + 1 = 135; real exponent = 135 - 127 = 8.
       Normalised mantissa = 000 1010 0000 0000 0000 0000, so the real mantissa = 1.000101.
       Final value = -(1.000101)_2 x 2^8 = -(100010100)_2 = -(256 + 16 + 4) = -276.
    2. Negative range: -(2 - 2^-23) x 2^127 to -2^-127.
    3. Negative overflow: value less than -(2 - 2^-23) x 2^127. Negative underflow: -2^-127 < value < 0.
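    The working above can be checked with a short Python sketch, assuming the layout used on the slide (1 sign bit, 8-bit biased exponent with bias 127, 23-bit mantissa):

        sign_bit = '1'
        exponent_bits = '10000111'
        mantissa_bits = '0001010' + '0' * 16          # 000 1010 0000 0000 0000 0000

        sign = -1 if sign_bit == '1' else 1
        exponent = int(exponent_bits, 2) - 127        # 135 - 127 = 8
        mantissa = 1 + int(mantissa_bits, 2) / 2**23  # 1.000101 in binary

        print(sign * mantissa * 2**exponent)          # -276.0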
  • Slide 5
  • CPU
    CPU registers: PC, IR, AC, MAR, MBR
    System bus: data bus, address bus, and control bus
    Pipelining: role of pipelining; pipelining hazards (control hazards, data hazards, and structural hazards)
    What is the disadvantage of using a very long pipeline (many stages)?
  • Slide 6
  • Exercise
    Suppose you have designed a processor implementation whose five pipeline stages take the following amounts of time: IF (instruction fetch) = 20 ns, ID (instruction decode) = 10 ns, EX (execution) = 20 ns, MEM (memory operation) = 35 ns and WB (write back) = 10 ns.
    (a) What is the minimum clock period for which your processor functions properly?
    (b) What should be redesigned first to improve this processor's performance?
    (c) Assume this processor is redesigned with 50 pipeline stages. Is it true to say that the new processor is 10 times faster than the previous design with 5 pipeline stages?
  • Slide 7
  • Solution
    (a) The minimum clock period is the time of the longest stage: MEM, which takes 35 ns.
    (b) The MEM stage should be redesigned first, to reduce the clock period.
    (c) Probably not. Longer pipelines can be faster because they allow higher clock rates, but it is unlikely that the clock rate would be 10 times higher, because of uneven pipeline stages and register overheads. Furthermore, longer pipelines tend to make data and control hazards require longer stalls, and a higher clock-rate processor is likely to be more power-hungry, roughly in proportion to the increase in clock speed.
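    The reasoning for (a) and (b) amounts to picking the slowest stage; a small Python check using the stage times from the exercise:

        # Stage latencies from the exercise, in nanoseconds.
        stages = {'IF': 20, 'ID': 10, 'EX': 20, 'MEM': 35, 'WB': 10}

        print(max(stages.values()))            # 35 ns -> minimum clock period
        print(max(stages, key=stages.get))     # 'MEM' -> stage to redesign first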
  • Slide 8
  • Question 2
    An instruction requires four stages to execute: stage 1 (instruction fetch) requires 30 ns, stage 2 (instruction decode) = 9 ns, stage 3 (instruction execute) = 20 ns and stage 4 (store results) = 10 ns. An instruction must proceed through the stages in sequence.
    1) What is the minimum asynchronous time for any single instruction to complete?
    2) We want to set this up as a pipelined operation. How many stages should we have, and at what rate should we clock the pipeline?
  • Slide 9
  • Hints
    1) The minimum time is the time it takes to execute all 4 stages of an instruction in sequence.
    2) We have 4 natural stages given and no information on how we might be able to further subdivide them, so we use 4 stages in our pipeline. Clock rate? Either use the longest stage, or use a time that closely matches the shortest stage but is integrally divisible into the other stages. Discuss each case.
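    An illustrative Python sketch of the two clocking choices mentioned in the hint; the 10 ns alternative and the resulting stage subdivision are an assumption for discussion, not given in the question:

        stage_times = [30, 9, 20, 10]    # ns, from Question 2

        # 1) Minimum asynchronous (unpipelined) time: the stages run back to back.
        print(sum(stage_times))          # 69 ns

        # 2a) Keep the 4 natural stages and clock at the longest stage.
        print(max(stage_times))          # 30 ns per cycle, 4-stage pipeline

        # 2b) Alternative: a 10 ns clock (close to the 9 ns stage and an integral
        #     divisor of 30, 20 and 10 ns), subdividing the longer stages.
        clock = 10
        print(sum(-(-t // clock) for t in stage_times))   # 7 pipeline stages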
  • Slide 10
  • Question 3 The pipeline for these instructions runs with a 100 MHz clock with the following stages: instruction fetch = 2 clocks, instruction decode = 1 clock, fetch operands = 1 clock, execute = 2 clocks, and store result = 1 clock.
  • Slide 11
  • Hints for Question 3
    1) The longest stage takes two cycles, so we can complete at most one instruction every 2 cycles. What is the rate then?
    2) The operand fetch unit must wait until the prior instruction stores its result before it can retrieve one of its operands (e.g. operand fetch for instruction 2 must wait until the store result for instruction 1 completes). As a result, things begin backing up in the pipeline, and we produce one instruction output only every 4 cycles.
  • Slide 12
  • No dependencies: execute one instruction every 2 cycles. Clock rate?
  • Slide 13
  • Dependency: from the table we still begin fetching instructions every two cycles. However, the operand fetch for instruction 2 must wait until the store result for instruction 1 completes (a wait of another 2 cycles). Hence, what is the rate?
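    An illustrative calculation of the instruction rates discussed on the last three slides, assuming the 100 MHz clock from Question 3:

        clock_hz = 100e6                       # 100 MHz pipeline clock

        # No dependencies: one instruction completes every 2 cycles (the longest stage).
        print(clock_hz / 2 / 1e6, 'MIPS')      # 50.0 MIPS

        # With the operand-fetch / store-result dependency: one instruction every 4 cycles.
        print(clock_hz / 4 / 1e6, 'MIPS')      # 25.0 MIPS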
  • Slide 14
  • Memories
    CPU registers
    Cache memory
    Main memory (electronic memory)
    Magnetic memory (hard drive)
    Optical memory
    Magnetic tape
  • Slide 15
  • Cache memory
    Cache memory enhances computer performance using:
    Temporal locality principle
    Spatial locality principle
    Cache mapping:
    Associative Mapped Cache
    Direct-Mapped Cache
    Set-Associative Mapped Cache
  • Slide 16
  • Why is cache memory needed?
    The CPU is slowed down by the main memory.
    When a program references a memory location, it is likely to reference that same memory location again soon.
    A memory location that is near a recently referenced location is more likely to be referenced than a memory location that is far away.
  • Slide 17
  • Cache memory
    Resides between the CPU and the main memory.
    Operates at a speed near to that of the CPU.
    Data is exchanged between the CPU and main memory through the cache memory.
    Cache memory uses locality principles to enhance computer performance:
    Temporal locality principle
    Spatial locality principle
  • Slide 18
  • Temporal locality principle
    When a program references a memory location, it is likely to reference that same memory location again soon. Cache memory keeps a copy of recently used data.
  • Slide 19
  • Spatial locality principle
    A memory location that is near a recently referenced location is more likely to be referenced than a memory location that is far away. Cache memory copies not only the recently referenced memory location but also its neighbouring locations.
  • Slide 20
  • Cache mapping
    Commonly used methods:
    Associative Mapped Cache
    Direct-Mapped Cache
    Set-Associative Mapped Cache
  • Slide 21
  • Associative Mapped Cache
    Any main memory block can be mapped into any cache slot. To keep track of which of the 2^27 possible blocks is in each slot, a 27-bit tag field is added to each slot.
  • Slide 22
  • Associative Mapped Cache
    A valid bit is needed to indicate whether or not the slot holds a line that belongs to the program being executed. A dirty bit keeps track of whether or not a line has been modified while it is in the cache.
  • Slide 23
  • Associative Mapped Cache The mapping from main memory blocks to cache slots is performed by partitioning an address into fields. For each slot, if the valid bit is 1, then the tag field of the referenced address is compared with the tag field of the slot.
  • Slide 24
  • Associative Mapped Cache
    How an access to the memory location (A035F014)₁₆ is mapped to the cache: if the addressed word is in the cache, it will be found in word (14)₁₆ of a slot that has a tag of (501AF80)₁₆, which is made up of the 27 most significant bits of the address.
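    The field values in this example can be reproduced with a small Python check, splitting the 32-bit address into the 27-bit tag and 5-bit word fields:

        address = 0xA035F014

        word = address & 0x1F     # low 5 bits select the word within the block
        tag  = address >> 5       # remaining 27 bits form the tag

        print(hex(word))          # 0x14
        print(hex(tag))           # 0x501af80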
  • Slide 25
  • Associative Mapped Cache Advantages Any main memory block can be placed into any cache slot. Regardless of how irregular the data and program references are, if a slot is available for the block, it can be stored in the cache.
  • Slide 26
  • Associative Mapped Cache Disadvantages Considerable hardware overhead needed for cache bookkeeping. There must be a mechanism for searching the tag memory in parallel.
  • Slide 27
  • Direct-Mapped Cache
    Each cache slot corresponds to a specific set of main memory blocks. In our example we have 2^27 memory blocks and 2^14 cache slots, so a total of 2^27 / 2^14 = 2^13 main memory blocks can be mapped onto each cache slot.
  • Slide 28
  • Direct-Mapped Cache The 32-bit main memory address is partitioned into a 13-bit tag field, followed by a 14-bit slot field, followed by a five-bit word field.
  • Slide 29
  • Direct-Mapped Cache
    When a reference is made to the main memory address, the slot field identifies in which of the 2^14 slots the block will be found. If the valid bit is 1, then the tag field of the referenced address is compared with the tag field of the slot.
  • Slide 30
  • Direct-Mapped Cache
    How an access to memory location (A035F014)₁₆ is mapped to the cache: if the addressed word is in the cache, it will be found in word (14)₁₆ of slot (2F80)₁₆, which will have a tag of (1406)₁₆.
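    The same address split for the direct-mapped organisation (13-bit tag, 14-bit slot, 5-bit word), again as a small Python check:

        address = 0xA035F014

        word = address & 0x1F                   # 5-bit word field
        slot = (address >> 5) & 0x3FFF          # 14-bit slot field
        tag  = address >> (5 + 14)              # 13-bit tag field

        print(hex(word), hex(slot), hex(tag))   # 0x14 0x2f80 0x1406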
  • Slide 31
  • Direct-Mapped Cache Advantages Simple and inexpensive The tag memory is much smaller than in associative mapped cache. No need for an associative search, since the slot field is used to direct the comparison to a single field.
  • Slide 32
  • Direct-Mapped Cache
    Disadvantages
    Fixed location for a given memory block. If a program repeatedly accesses 2 blocks that map to the same line, cache misses are very high.
  • Slide 33
  • Set-Associative Mapped Cache
    Combines the simplicity of direct mapping with the flexibility of associative mapping. For this example, two slots make up a set. Since there are 2^14 slots in the cache, there are 2^14 / 2 = 2^13 sets.
  • Slide 34
  • Set-Associative Mapped Cache When an address is mapped to a set, the direct mapping scheme is used, and then associative mapping is used within a set.
  • Slide 35
  • Set-Associative Mapped Cache
    The format for an address has 13 bits in the set field, which identifies the set in which the addressed word will be found. Five bits are used for the word field, and 14 bits for the tag field.
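    Following the same pattern, a sketch of the set-associative address split (14-bit tag, 13-bit set, 5-bit word); the field values shown for the example address are illustrative rather than taken from the slides:

        address = 0xA035F014

        word = address & 0x1F                        # 5-bit word field
        set_index = (address >> 5) & 0x1FFF          # 13-bit set field
        tag = address >> (5 + 13)                    # 14-bit tag field

        print(hex(word), hex(set_index), hex(tag))   # 0x14 0xf80 0x280d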
  • Slide 36
  • Typical exam question
    Explain the difference between direct-mapped cache and associative mapped cache. Explain how cache memory uses the temporal and spatial locality principles to enhance computer performance.
  • Slide 37
  • Web languages (HTML, XML, XHTML)
    Difference between these languages
    Disadvantages of using HTML
    How does XHTML solve these problems?
    Advantages of CSS
    Difference between HTML selectors, class selectors and ID selectors
  • Slide 38
  • HTML (element) selector:
    h1 {
      background-color: green;
      color: red;
      font-weight: bold;
    }
    Class selector:
    .section {
      color: red;
      font-weight: bold;
    }
    ID selector:
    #section {
      color: red;
      font-weight: bold;
    }
    An ID selector applies styles to an element in the same way as a class. The main difference between an ID selector and a class is that an ID can be used only once on each page, whereas a class can be used many times.
  • Slide 39
  • Computer networks
    Network classes and default masks
    TCP/IP model (Internet model)
    The role of each layer
    Examples of protocols at each layer and their roles
    TCP vs UDP
    How is error and flow control achieved? Which layer is responsible for this?
    Subnetting
    Role of subnetting
    Subnet address, host address, broadcast address
    Range of addresses in a subnet
  • Slide 40
  • Exercise
    Given a host configuration with an IP address 192.168.10.33 and a subnet mask 255.255.255.248:
    What is the subnet address?
    What is the host address?
    What is the broadcast address?
    What is the number of possible hosts and the range of host addresses in this subnet?
  • Slide 41
  • Solution
    Subnet address: 192.168.10.32
    Host address: 0.0.0.1
    Broadcast address: 192.168.10.39
    The number of bits for the host is 3, and therefore the number of hosts allowed in this subnet is 2^3 - 2 = 6. The range of host addresses is 192.168.10.33 - 192.168.10.38.
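    The solution can be verified with the Python standard library's ipaddress module:

        import ipaddress

        iface = ipaddress.ip_interface('192.168.10.33/255.255.255.248')
        net = iface.network

        print(net.network_address)                       # 192.168.10.32 (subnet address)
        print(int(iface.ip) - int(net.network_address))  # 1 -> host part 0.0.0.1
        print(net.broadcast_address)                     # 192.168.10.39 (broadcast address)
        print(net.num_addresses - 2)                     # 6 usable host addresses
        hosts = list(net.hosts())
        print(hosts[0], '-', hosts[-1])                  # 192.168.10.33 - 192.168.10.38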
  • Slide 42
  • Exam
    Duration: 1:30 hours; 3 questions, 30 minutes each
    Time: May
    Preparation:
    Past exam papers
    Revise all the questions given in the two assignments
    Consult the revision slides
    Concentrate on the preparation list
    Attempt the mock exam on my website
    Next week: mock exam
  • Slide 43
  • Fin. Good luck!