Post on 13-Jan-2016

Chapter 8: CPU and Memory: Design, Implementation, and Enhancement

The Architecture of Computer Hardware and Systems Software: An Information Technology Approach

3rd Edition, Irv Englander

John Wiley and Sons 2003

CPU Architecture Overview

- CISC – Complex Instruction Set Computer
- RISC – Reduced Instruction Set Computer
- CISC vs. RISC comparisons
- VLIW – Very Long Instruction Word
- EPIC – Explicitly Parallel Instruction Computer

CISC Architecture

- Examples
  - Intel x86, IBM Z-Series mainframes, older CPU architectures
- Characteristics
  - Few general-purpose registers
  - Many addressing modes
  - Large number of specialized, complex instructions
  - Instructions are of varying sizes

Limitations of CISC Architecture

- Complex instructions are infrequently used by programmers and compilers
- Memory references (loads and stores) are slow and account for a significant fraction of all instructions
- Procedure and function calls are a major bottleneck
  - Passing arguments
  - Storing and retrieving values in registers

RISC Features

- Examples
  - PowerPC, Sun SPARC (the Motorola 68000, sometimes listed alongside these, is generally classified as CISC)
- Limited and simple instruction set
- Fixed-length, fixed-format instruction words
  - Enable pipelining, parallel fetches and executions
- Limited addressing modes
  - Reduce complicated hardware
- Register-oriented instruction set
  - Reduce memory accesses
- Large bank of registers
  - Reduce memory accesses
  - Efficient procedure calls

CISC vs. RISC Processing

Circular Register Buffer

Circular Register Buffer: After Procedure Call
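
The circular register buffer can be sketched as a set of overlapping register windows: a procedure call advances a window pointer, so the caller's output registers become the callee's input registers without any copying through memory. This is a minimal sketch only. The class name, the window count, and the register counts are assumptions chosen for illustration, and it ignores what happens when the calls nest deeper than the number of windows.

```python
class RegisterWindowFile:
    """Toy circular register buffer with overlapping windows (all sizes assumed)."""

    def __init__(self, windows=8, shared=8, local=8):
        self.windows = windows
        self.shared = shared          # registers shared between caller and callee
        self.local = local            # registers private to each procedure
        self.stride = shared + local  # distance between consecutive windows
        self.regs = [0] * (windows * self.stride)
        self.cwp = 0                  # current window pointer

    def _index(self, kind, n):
        base = self.cwp * self.stride
        # "out" registers of this window are the "in" registers of the next one.
        offset = {"in": 0, "local": self.shared, "out": self.stride}[kind]
        return (base + offset + n) % len(self.regs)  # wrap around: circular buffer

    def write(self, kind, n, value):
        self.regs[self._index(kind, n)] = value

    def read(self, kind, n):
        return self.regs[self._index(kind, n)]

    def call(self):
        """Procedure call: slide the window forward by one."""
        self.cwp = (self.cwp + 1) % self.windows

    def ret(self):
        """Procedure return: slide the window back by one."""
        self.cwp = (self.cwp - 1) % self.windows


rf = RegisterWindowFile()
rf.write("out", 0, 42)     # caller places an argument in an output register
rf.call()
print(rf.read("in", 0))    # callee sees the same value as an input register
```

The argument is passed by moving the pointer, not the data, which is why the slides list efficient procedure calls as a benefit of a large register bank.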

CISC vs. RISC Performance Comparison

- RISC: simpler instructions mean more instructions, which mean more memory accesses
- RISC: more bus traffic and increased cache memory misses
- More registers would improve CISC performance, but no space is available for them
- Modern CISC and RISC architectures are becoming similar

VLIW Architecture

- Transmeta Crusoe CPU
- 128-bit instruction bundle = molecule
  - Four 32-bit atoms (atom = instruction)
  - Parallel processing of 4 instructions
- 64 general-purpose registers
- Code morphing layer
  - Translates instructions written for other CPUs into molecules
  - Instructions are not written directly for the Crusoe CPU
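
To make the molecule/atom arithmetic concrete, here is a minimal sketch that packs four 32-bit atoms into one 128-bit value and unpacks them again. The function names and the ordering of atoms within the molecule (atom 0 in the low bits) are assumptions for illustration; the slide gives only the widths.

```python
ATOM_BITS = 32
ATOMS_PER_MOLECULE = 4


def pack_molecule(atoms):
    """Pack four 32-bit atoms into one 128-bit integer (atom 0 in the low bits; assumed order)."""
    if len(atoms) != ATOMS_PER_MOLECULE:
        raise ValueError("a molecule holds exactly 4 atoms")
    molecule = 0
    for i, atom in enumerate(atoms):
        if not 0 <= atom < (1 << ATOM_BITS):
            raise ValueError("atom does not fit in 32 bits")
        molecule |= atom << (i * ATOM_BITS)
    return molecule


def unpack_molecule(molecule):
    """Split a 128-bit molecule back into its four 32-bit atoms."""
    mask = (1 << ATOM_BITS) - 1
    return [(molecule >> (i * ATOM_BITS)) & mask for i in range(ATOMS_PER_MOLECULE)]
```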

EPIC Architecture

- Intel Itanium CPU
- 128-bit instruction bundle
  - Three 41-bit instructions
  - 5 bits to identify type of instructions in bundle
- 128 64-bit general-purpose registers
- 128 82-bit floating-point registers
- Intel x86 instruction set included
- Programmers and compilers follow guidelines to ensure parallel execution of instructions
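
The bundle widths add up exactly: 3 × 41 + 5 = 128 bits. The sketch below checks that sum and splits a 128-bit value into its fields. The slide states only the field widths, so the placement used here (the 5-bit type field in the low bits, followed by the three instruction slots) and the names `TEMPLATE_BITS`, `SLOT_BITS`, and `unpack_bundle` are assumptions for illustration.

```python
TEMPLATE_BITS = 5   # bits that identify the types of instructions in the bundle
SLOT_BITS = 41      # bits per instruction
SLOTS = 3           # instructions per bundle

BUNDLE_BITS = TEMPLATE_BITS + SLOTS * SLOT_BITS  # 5 + 3 * 41 = 128


def unpack_bundle(bundle):
    """Return (template, [slot0, slot1, slot2]) from a 128-bit bundle (assumed layout)."""
    template = bundle & ((1 << TEMPLATE_BITS) - 1)
    slot_mask = (1 << SLOT_BITS) - 1
    slots = [
        (bundle >> (TEMPLATE_BITS + i * SLOT_BITS)) & slot_mask
        for i in range(SLOTS)
    ]
    return template, slots
```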

Paging

- Managed by the operating system
- Built into the hardware
- Independent of application

Logical vs. Physical Addresses

Logical addresses are relative locations of data, instructions, and branch targets, and are separate from physical addresses

Logical addresses are mapped to physical addresses

Physical addresses do not need to be consecutive

Logical vs. Physical Address

Page Address Layout

Page Translation Process
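
The translation step can be sketched in a few lines: split the logical address into a page number and an offset, look the page number up in the page table, and combine the resulting frame number with the unchanged offset. The 4 KB page size, the dictionary-based page table, and the function name `translate` are assumptions made for this sketch, not details from the slides.

```python
PAGE_SIZE = 4096  # assumed page size in bytes


def translate(logical_address, page_table):
    """Map a logical address to a physical address.

    page_table is a hypothetical dict of page number -> frame number.
    """
    page_number, offset = divmod(logical_address, PAGE_SIZE)
    if page_number not in page_table:
        raise LookupError(f"page fault: page {page_number} is not in memory")
    frame_number = page_table[page_number]
    return frame_number * PAGE_SIZE + offset


# Consecutive logical pages can live in non-consecutive physical frames.
page_table = {0: 5, 1: 2}
print(translate(4100, page_table))  # page 1, offset 4 -> frame 2, offset 4
```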

Memory Enhancements

- Memory is slow compared to CPU processing speeds!
  - 2 GHz CPU = 1 cycle in ½ of a billionth of a second
  - 70 ns DRAM = 1 access in 70 billionths of a second
- Methods to improve memory access
  - Wide path memory access
    - Retrieve multiple bytes instead of 1 byte at a time
  - Memory interleaving
    - Partition memory into subsections, each with its own address register and data register
  - Cache memory
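
The speed gap is easier to appreciate as a count of clock cycles lost per access. This short sketch uses the 2 GHz clock and 70 ns DRAM figures from the slide; the helper name `cycles_waited` is an assumption.

```python
def cycles_waited(clock_hz, access_time_s):
    """Clock cycles that elapse while the CPU waits for one access."""
    return round(clock_hz * access_time_s)


cpu_hz = 2e9                                  # 2 GHz clock
dram_cycles = cycles_waited(cpu_hz, 70e-9)    # one 70 ns DRAM access
print(dram_cycles)                            # 140 cycles per DRAM access
```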

Memory Interleaving
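
One common way to partition memory is low-order interleaving, where consecutive addresses fall in consecutive subsections so that sequential accesses can overlap. The slides do not say which scheme the figure uses, so treat this as an assumed example; the bank count and function name are also illustrative.

```python
NUM_BANKS = 4  # assumed number of memory subsections (banks)


def bank_and_offset(address, num_banks=NUM_BANKS):
    """Low-order interleaving: return (bank, location within that bank)."""
    return address % num_banks, address // num_banks


# Eight sequential addresses spread evenly across the four banks.
print([bank_and_offset(a)[0] for a in range(8)])  # [0, 1, 2, 3, 0, 1, 2, 3]
```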

Why Cache?

Even the fastest hard disk has an access time of about 10 milliseconds

A 2 GHz CPU waiting 10 milliseconds wastes 20 million clock cycles!

Cache Memory

- Blocks: 8 or 16 bytes
- Tags: location in main memory
- Cache controller
  - Hardware that checks tags
- Cache line
  - Unit of transfer between storage and cache memory
- Hit ratio: ratio of hits out of total requests
- Synchronizing cache and memory
  - Write through
  - Write back
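
The roles of blocks, tags, and the hit ratio can be illustrated with a toy simulation. The sketch below assumes a direct-mapped organization (each memory block maps to exactly one cache line), which the slides do not specify; the class name and the line count are likewise assumptions. It tracks tags only, not data, and does not model write through or write back.

```python
class DirectMappedCache:
    """Toy direct-mapped cache that stores tags only, to measure the hit ratio."""

    def __init__(self, num_lines=8, block_size=16):
        self.num_lines = num_lines
        self.block_size = block_size      # bytes per block (the slide cites 8 or 16)
        self.tags = [None] * num_lines    # one tag per cache line
        self.hits = 0
        self.requests = 0

    def access(self, address):
        """Return True on a hit; on a miss, load the block and return False."""
        block = address // self.block_size
        line = block % self.num_lines     # which cache line the block maps to
        tag = block // self.num_lines     # identifies which block occupies the line
        self.requests += 1
        if self.tags[line] == tag:
            self.hits += 1
            return True
        self.tags[line] = tag             # replace whatever was in the line
        return False

    def hit_ratio(self):
        """Hits divided by total requests."""
        return self.hits / self.requests if self.requests else 0.0


cache = DirectMappedCache()
for address in range(64):                 # read 64 consecutive bytes
    cache.access(address)
print(cache.hit_ratio())                  # 4 misses (one per block), 60 hits
```

Sequential access misses once per 16-byte block and then hits on the remaining 15 bytes, which is a small instance of locality of reference.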

Step-by-Step Use of Cache

Step-by-Step Use of Cache

Performance Advantages

- Hit ratios of 90% common
- 50%+ improved execution speed
- Locality of reference is why caching works
  - Most memory references confined to small region of memory at any given time
  - Well-written program in small loop, procedure or function
  - Data likely in array
  - Variables stored together
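
A simple way to see why a high hit ratio matters is to compute the average access time as a weighted mix of cache and main-memory times. The model below is a common simplification (hits cost the cache time, misses cost the memory time), and the 10 ns cache time is an assumed figure; only the 70 ns DRAM time and the 90% hit ratio come from these slides.

```python
def effective_access_time(hit_ratio, cache_ns, memory_ns):
    """Average access time when hits cost cache_ns and misses cost memory_ns."""
    return hit_ratio * cache_ns + (1.0 - hit_ratio) * memory_ns


# 90% hit ratio, assumed 10 ns cache, 70 ns DRAM: about 16 ns on average.
average_ns = effective_access_time(0.90, 10, 70)
print(round(average_ns, 2))
```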

Two-level Caches

Why do the sizes of the caches have to be different?

Cache vs. Virtual Memory

- Cache speeds up memory access
- Virtual memory increases amount of perceived storage
  - Independence from the configuration and capacity of the memory system
  - Low cost per bit

Modern CPU Processing Methods

- Timing issues
- Separate fetch/execute units
- Pipelining
- Scalar processing
- Superscalar processing

Timing Issues

- Computer clock used for timing purposes
  - MHz – million steps per second
  - GHz – billion steps per second
- Instructions can (and often do) take more than one step
- Data word width can require multiple steps

Separate Fetch-Execute Units

- Fetch unit
  - Instruction fetch unit
  - Instruction decode unit
    - Determine opcode
    - Identify type of instruction and operands
  - Several instructions are fetched in parallel and held in a buffer until decoded and executed
  - IP – Instruction Pointer register
- Execute unit
  - Receives instructions from the decode unit
  - Appropriate execution unit services the instruction

Alternative CPU Organization

Instruction Pipelining

- Assembly-line technique to allow overlapping between fetch-execute cycles of sequences of instructions
- Only one instruction is being executed to completion at a time
- Scalar processing
  - Average instruction execution rate is approximately equal to the clock speed of the CPU (about one instruction per clock cycle)
- Problems from stalling
  - Instructions have different numbers of steps
- Problems from branching
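
The benefit of overlapping can be estimated with a cycle count. In an idealized pipeline with one step per stage, the first instruction takes as many cycles as there are stages, and each later instruction completes one cycle after the previous one; stalls add to that total. The function names and the example figures below are assumptions for illustration.

```python
def unpipelined_cycles(num_instructions, num_stages):
    """Each instruction runs all of its stages before the next one starts."""
    return num_instructions * num_stages


def pipelined_cycles(num_instructions, num_stages, stall_cycles=0):
    """Ideal pipeline: fill once, then finish one instruction per cycle, plus stalls."""
    if num_instructions == 0:
        return 0
    return num_stages + (num_instructions - 1) + stall_cycles


print(unpipelined_cycles(100, 4))   # 400 cycles
print(pipelined_cycles(100, 4))     # 103 cycles
```

With 100 instructions and 4 stages the pipelined count is close to one cycle per instruction, which is what "approximately equal to the clock speed" means in practice.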

Branch Problem Solutions

- Separate pipelines for both possibilities
- Probabilistic approach
- Requiring the following instruction to not be dependent on the branch
- Instruction reordering (superscalar processing)

Pipelining Example

Superscalar Processing

- Process more than one instruction per clock cycle
- Separate fetch and execute cycles as much as possible
- Buffers for fetch and decode phases
- Parallel execution units

Superscalar CPU Block Diagram

Scalar vs. Superscalar Processing

Superscalar Issues

- Out-of-order processing – dependencies (hazards)
- Data dependencies
- Branch (flow) dependencies and speculative execution
  - Parallel speculative execution or branch prediction
  - Branch History Table
- Register access conflicts
  - Logical registers
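
A branch history table records how each branch behaved recently and uses that record to guess its next outcome. The slides do not describe the table's contents, so the sketch below assumes one widely used scheme, a 2-bit saturating counter per entry; the class name, table size, and indexing by address modulo table size are all illustrative assumptions.

```python
class BranchHistoryTable:
    """Branch predictor with 2-bit saturating counters (an assumed, common scheme)."""

    def __init__(self, entries=16):
        self.entries = entries
        self.counters = [1] * entries  # 0-1 predict not taken, 2-3 predict taken

    def _slot(self, branch_address):
        return branch_address % self.entries

    def predict(self, branch_address):
        """Return True if the branch is predicted taken."""
        return self.counters[self._slot(branch_address)] >= 2

    def update(self, branch_address, taken):
        """Move the counter toward the actual outcome, saturating at 0 and 3."""
        i = self._slot(branch_address)
        if taken:
            self.counters[i] = min(3, self.counters[i] + 1)
        else:
            self.counters[i] = max(0, self.counters[i] - 1)


# A loop branch that is taken nine times and then falls through.
bht = BranchHistoryTable()
outcomes = [True] * 9 + [False]
correct = 0
for taken in outcomes:
    if bht.predict(0x40) == taken:
        correct += 1
    bht.update(0x40, taken)
print(correct, "of", len(outcomes), "predicted correctly")
```

Only the first iteration and the final loop exit are mispredicted here, which shows why history-based prediction suits the small loops that dominate well-written programs.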

Hardware Implementation

- Hardware – operations are implemented by logic gates
- Advantages
  - Speed
  - RISC designs are simple and typically implemented in hardware

Microprogrammed Implementation

- Microcode consists of tiny programs stored in ROM that implement CPU instructions
- Advantages
  - More flexible
  - Easier to implement complex instructions
  - Can emulate other CPUs
- Disadvantage
  - Requires more clock cycles
