Post on 21-Dec-2015

TK2123: COMPUTER ORGANISATION & ARCHITECTURE

Prepared By: Associate Prof. Dr Masri Ayob

Lecture 5: Computer Performance

Prepared by: Dr Masri Ayob - TK2123

Contents

This lecture will discuss:
• Speeding up computer operation.
• Improvements in Chip Organisation and Architecture.
• Multilevel Machines

Speeding up computer operation

Pipelining

On board cache

On board L1 & L2 cache

Branch prediction

Data flow analysis

Speculative execution

Performance Balance

Processor speed increased.

Memory capacity increased.

Memory speed lags behind processor speed.

Logic and Memory Performance Gap

Solutions

Increase number of bits retrieved at one time
• Make DRAM “wider” rather than “deeper”

Change DRAM interface
• Cache

Reduce frequency of memory access
• More complex cache and cache on chip

Increase interconnection bandwidth
• High speed buses
• Hierarchy of buses
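As a rough illustration of why reducing the frequency of memory access helps, the sketch below uses a common textbook model of average access time with a single cache level. The model, the function name `average_access_time` and the timings are assumptions made for illustration; they do not come from the lecture.

```python
def average_access_time(hit_rate: float, cache_ns: float, memory_ns: float) -> float:
    """Average time per access in a simple one-level cache model.

    Assumption: every access pays the cache lookup, and only misses
    additionally pay the main-memory access time.
    """
    if not 0.0 <= hit_rate <= 1.0:
        raise ValueError("hit_rate must be between 0 and 1")
    miss_rate = 1.0 - hit_rate
    return cache_ns + miss_rate * memory_ns


if __name__ == "__main__":
    # Hypothetical timings (1 ns cache, 100 ns DRAM), chosen only for illustration.
    for rate in (0.0, 0.90, 0.99):
        print(rate, average_access_time(rate, cache_ns=1.0, memory_ns=100.0))
```

Under this model, the higher the hit rate, the less the slow main memory contributes to the average, which is the motivation for larger and more complex on-chip caches.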

I/O Devices

Peripherals with intensive I/O demands

Large data throughput demands

Processors can handle this

Problem moving data

Solutions:
• Caching
• Buffering
• Higher-speed interconnection buses
• More elaborate bus structures
• Multiple-processor configurations

Typical I/O Device Data Rates

Key is Balance

Processor components

Main memory

I/O devices

Interconnection structures

Improvements in Chip Organization and Architecture

Increase hardware speed of processor
• Fundamentally due to shrinking logic gate size
• More gates, packed more tightly, increasing clock rate
• Propagation time for signals reduced

Increase size and speed of caches
• Dedicating part of processor chip
• Cache access times drop significantly

Change processor organization and architecture
• Increase effective speed of execution
• Parallelism

Problems with Clock Speed and Logic Density

Power
• Power density increases with density of logic and clock speed.
• Dissipating heat.

RC delay
• Speed at which electrons flow limited by resistance and capacitance of metal wires connecting them.
• Delay increases as RC product increases.
• Wire interconnects thinner, increasing resistance.
• Wires closer together, increasing capacitance.
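The RC-delay points above can be made concrete with a small sketch. It treats the delay as simply proportional to the product R × C (the time constant of a single resistor-capacitor stage). The function name `rc_time_constant` and the component values are hypothetical, and a real wire is a distributed network rather than a single RC stage.

```python
def rc_time_constant(resistance_ohm: float, capacitance_farad: float) -> float:
    """Time constant (in seconds) of a single RC stage: tau = R * C."""
    return resistance_ohm * capacitance_farad


if __name__ == "__main__":
    # Hypothetical baseline wire: 100 ohm, 1 pF.
    base = rc_time_constant(100.0, 1e-12)
    # Thinner wire (assumed double resistance) placed closer to its
    # neighbours (assumed double capacitance).
    scaled = rc_time_constant(200.0, 2e-12)
    print(base, scaled, scaled / base)
```

In this simplified model, doubling both resistance and capacitance makes the time constant four times larger, which is why thinner and more closely packed interconnects work against higher clock rates.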

Problems with Clock Speed and Logic Density

Memory latency
• Memory speeds lag processor speeds.

Solution:
• More emphasis on organisational and architectural approaches

Intel Microprocessor Performance

Increased Cache Capacity

Typically two or three levels of cache between processor and main memory.

Chip density increased
• More cache memory on chip
• Faster cache access

Pentium chip devoted about 10% of chip area to cache.

Pentium 4 devotes about 50%

More Complex Execution Logic

Enable parallel execution of instructions

Pipeline works like assembly line
• Different stages of execution of different instructions at same time along pipeline

Superscalar allows multiple pipelines within single processor
• Instructions that do not depend on one another can be executed in parallel
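The assembly-line picture of pipelining can be sketched with simple cycle counts. This assumes an idealised pipeline: every stage takes one cycle and there are no stalls, hazards or branches. The function names and the 5-stage, 100-instruction example are illustrative assumptions, not figures from the lecture.

```python
def unpipelined_cycles(stages: int, instructions: int) -> int:
    """Each instruction passes through every stage before the next one starts."""
    return stages * instructions


def pipelined_cycles(stages: int, instructions: int) -> int:
    """Ideal pipeline: the first instruction needs `stages` cycles to fill
    the pipeline, then one instruction completes per cycle."""
    if instructions == 0:
        return 0
    return stages + instructions - 1


def pipeline_speedup(stages: int, instructions: int) -> float:
    """Ratio of unpipelined to pipelined cycle counts."""
    return unpipelined_cycles(stages, instructions) / pipelined_cycles(stages, instructions)


if __name__ == "__main__":
    # Hypothetical 5-stage pipeline running 100 instructions.
    print(unpipelined_cycles(5, 100), pipelined_cycles(5, 100))
    print(pipeline_speedup(5, 100))
```

For long instruction streams the ideal speedup approaches the number of stages. Real pipelines fall short of this because dependent instructions and branches force stalls.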

Diminishing Returns

Internal organisation of processors complex
• Can get a great deal of parallelism
• Further significant increases likely to be relatively modest.

Benefits from cache are reaching limit.

Increasing clock rate runs into power dissipation problem.
• Some fundamental physical limits are being reached.

New Approach – Multiple Cores

Multiple processors on single chip
• Large shared cache

Within a processor, increase in performance proportional to square root of increase in complexity

If software can use multiple processors, doubling number of processors almost doubles performance

So, use two simpler processors on the chip rather than one more complex processor

With two processors, larger caches are justified
• Power consumption of memory logic less than processing logic

Example: IBM POWER4
• Two cores based on PowerPC
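The trade-off stated above can be compared numerically. The sketch below takes the square-root relationship from the slide at face value and models "almost doubles" with a hypothetical `efficiency` parameter (the fraction of each extra core's capacity the software actually uses). Both function names and the value 0.9 are assumptions for illustration.

```python
import math


def complex_core_speedup(complexity_factor: float) -> float:
    """Speedup of one core whose complexity grows by `complexity_factor`,
    using the square-root relationship stated on the slide."""
    return math.sqrt(complexity_factor)


def multi_core_speedup(cores: int, efficiency: float = 0.9) -> float:
    """Speedup from `cores` simple cores, where each extra core contributes
    `efficiency` of a full core (hypothetical parameter)."""
    return 1.0 + (cores - 1) * efficiency


if __name__ == "__main__":
    # Spend twice the resources either on one bigger core or on two simple cores.
    print(complex_core_speedup(2.0))  # about 1.41
    print(multi_core_speedup(2))      # 1.9 with the assumed efficiency
```

With the same doubling of resources, the single complex core gains roughly 41% while two simpler cores approach a doubling, provided the software can actually use both.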

POWER4 Chip Organization

Pentium Evolution (1)

8080
• first general purpose microprocessor
• 8 bit data path
• Used in first personal computer, the Altair

8086
• much more powerful
• 16 bit
• instruction cache, prefetch few instructions
• 8088 (8 bit external bus) used in first IBM PC

80286
• 16 Mbyte memory addressable
• up from 1 Mbyte

80386
• 32 bit
• Support for multitasking
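The jump from 1 Mbyte to 16 Mbyte of addressable memory follows from the width of the address. The slide does not give address widths, so the sketch assumes the commonly documented figures: 20 address bits for the 8086 and 24 for the 80286. The function name `addressable_bytes` is illustrative.

```python
def addressable_bytes(address_bits: int) -> int:
    """Number of distinct byte addresses reachable with `address_bits` bits."""
    return 2 ** address_bits


if __name__ == "__main__":
    MBYTE = 2 ** 20  # 1 Mbyte in the binary sense used for memory sizes
    # Assumed address widths: 20 bits (8086) and 24 bits (80286).
    print(addressable_bytes(20) // MBYTE)  # 1
    print(addressable_bytes(24) // MBYTE)  # 16
```

Each additional address bit doubles the addressable memory, so four extra bits give the sixteen-fold increase.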

Pentium Evolution (2)

80486
• sophisticated powerful cache and instruction pipelining
• built in maths co-processor

Pentium
• Superscalar
• Multiple instructions executed in parallel

Pentium Pro
• Increased superscalar organization
• Aggressive register renaming
• branch prediction
• data flow analysis
• speculative execution

Pentium Evolution (3)

Pentium II
• MMX technology
• graphics, video & audio processing

Pentium III
• Additional floating point instructions for 3D graphics

Pentium 4
• Note Arabic rather than Roman numerals
• Further floating point and multimedia enhancements

Itanium
• 64 bit

Itanium 2
• Hardware enhancements to increase speed

Intel Computer Family (3)

Moore’s law for (Intel) CPU chips.

Intel Computer Family (1)

The Intel CPU family. Clock speeds are measured in MHz (megahertz), where 1 MHz is 1 million cycles/sec.
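Since 1 MHz is 1 million cycles per second, the duration of one clock cycle follows directly from the clock speed. The helper below is a small illustrative sketch. The name `cycle_time_ns` and the example frequencies are assumptions, not values taken from the figure.

```python
def cycle_time_ns(clock_mhz: float) -> float:
    """Duration of one clock cycle in nanoseconds for a clock given in MHz.

    1 MHz = 1e6 cycles/sec, so one cycle lasts 1 / (clock_mhz * 1e6) seconds,
    which is 1000 / clock_mhz nanoseconds.
    """
    if clock_mhz <= 0:
        raise ValueError("clock_mhz must be positive")
    return 1000.0 / clock_mhz


if __name__ == "__main__":
    # Hypothetical clock speeds, for illustration only.
    for mhz in (1, 100, 1000):
        print(mhz, "MHz ->", cycle_time_ns(mhz), "ns per cycle")
```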

PowerPC

1975, 801 minicomputer project (IBM) RISC

Berkeley RISC I processor

1986, IBM commercial RISC workstation product, RT PC.
• Not commercial success
• Many rivals with comparable or better performance

1990, IBM RISC System/6000
• RISC-like superscalar machine
• POWER architecture

IBM alliance with Motorola (68000 microprocessors) and Apple (used 68000 in Macintosh)

Result is PowerPC architecture
• Derived from the POWER architecture
• Superscalar RISC
• Apple Macintosh
• Embedded chip applications

PowerPC Family (1)

601:
• Quickly to market. 32-bit machine

603:
• Low-end desktop and portable
• 32-bit
• Comparable performance with 601
• Lower cost and more efficient implementation

604:
• Desktop and low-end servers
• 32-bit machine
• Much more advanced superscalar design
• Greater performance

620:
• High-end servers
• 64-bit architecture

PowerPC Family (2)

740/750:
• Also known as G3
• Two levels of cache on chip

G4:
• Increases parallelism and internal speed

G5:
• Improvements in parallelism and internal speed
• 64-bit organization

Internet Resources

http://www.intel.com/
• Search for the Intel Museum

http://www.ibm.com

http://www.dec.com

Charles Babbage Institute

PowerPC

Intel Developer Home

Languages, Levels, Virtual Machines

A multilevel machine

Contemporary Multilevel Machines

Evolution of Multilevel Machines

• Invention of microprogramming
• Invention of operating system
• Migration of functionality to microcode
• Elimination of microprogramming

The Computer Spectrum

The current spectrum of computers available.

Metric Units

The principal metric prefixes.
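The table from this slide did not survive in the transcript. As a stand-in, the sketch below lists a few standard SI prefixes with their powers of ten and converts a prefixed value to base units. The selection of prefixes and the names `PREFIX_EXPONENTS` and `to_base_units` are assumptions for illustration, not a reconstruction of the original table.

```python
# A few standard SI prefixes and their powers of ten (not the full table).
PREFIX_EXPONENTS = {
    "pico": -12,
    "nano": -9,
    "micro": -6,
    "milli": -3,
    "kilo": 3,
    "mega": 6,
    "giga": 9,
    "tera": 12,
}


def to_base_units(value: float, prefix: str) -> float:
    """Convert a prefixed quantity (e.g. 5 nano) to base units (5e-9)."""
    return value * 10.0 ** PREFIX_EXPONENTS[prefix]


if __name__ == "__main__":
    print(to_base_units(1, "mega"))  # cycles/sec in 1 MHz
    print(to_base_units(5, "nano"))  # seconds in 5 ns
```

These are the decimal meanings. For memory sizes the same prefix names are often used for powers of two instead (for example 2^10 for kilo).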

Thank you

Q & A