CPS 101 Introduction to Computational Science

Wensheng Shen

Department of Computational Science, SUNY Brockport



Chapter 3 Computer Memory

http://computer.howstuffworks.com/

http://en.wikipedia.org/wiki/

www.cs.utexas.edu/users/hunt/class/2003-fall/cs352/lectures/

Storage devices

• Registers

• Cache memory

• Main memory

• Hard disk

• External devices

• Network storage

Registers

• Register: A register is a small amount of very fast computer memory located inside the CPU.

• Most modern computer architectures operate on the principle of moving data from main memory into registers, operating on it there, and then moving the result back into main memory. This is called a load-store architecture.

Main memory

• The main memory of the computer is also known as RAM (Random Access Memory).

• It is constructed from integrated circuits and needs electrical power to maintain its information. When power is lost, the information is lost too!

• It can be directly accessed by the CPU. The access time to read or write any particular byte is independent of where that byte is in the memory, and is currently approximately 50 to 100 nanoseconds.

Memory Read Operation

• Step 1: The CPU places the memory address Addr on the memory bus.

• Step 2: Main memory reads Addr from the memory bus, retrieves word x, and places it on the bus.

• Step 3: The CPU reads word x from the bus and copies it into a register.

[Figure: the CPU (registers including %eax, ALU, bus interface) connected through the I/O bridge to main memory, which holds word x at address Addr.]

Memory Write Operation

• Step 1: The CPU places address Addr on the bus. Main memory reads it and waits for the corresponding data word to arrive.

• Step 2: The CPU places data word x on the bus.

• Step 3: Main memory reads data word x from the bus and stores it at address Addr.

[Figure: the CPU (registers including %eax holding x, ALU, bus interface) connected through the I/O bridge to main memory; after the write, word x is stored at address Addr.]
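The read and write steps above can be mimicked with a small toy model. This is only a sketch under simplifying assumptions: the names `Bus`, `cpu_load`, `cpu_store`, and `MEM_WORDS` are hypothetical, memory is addressed by word rather than by byte, and the bus is reduced to a single shared value that carries first the address and then the data word.

```c
#include <assert.h>
#include <stdint.h>

#define MEM_WORDS 16                      /* hypothetical, tiny main memory */

static uint32_t main_memory[MEM_WORDS];   /* zero-initialized by C rules */

/* The bus carries one value at a time: an address, then a data word. */
typedef struct {
    uint32_t value;
} Bus;

/* Read: returns the word that ends up in the CPU register. */
uint32_t cpu_load(Bus *bus, uint32_t addr) {
    assert(addr < MEM_WORDS);
    bus->value = addr;                    /* 1. CPU places Addr on the bus      */
    uint32_t latched = bus->value;        /* 2. memory reads Addr from the bus, */
    bus->value = main_memory[latched];    /*    retrieves x, places it on bus   */
    return bus->value;                    /* 3. CPU copies x into a register    */
}

/* Write: stores the register value reg at address addr. */
void cpu_store(Bus *bus, uint32_t addr, uint32_t reg) {
    assert(addr < MEM_WORDS);
    bus->value = addr;                    /* 1. CPU places Addr on the bus      */
    uint32_t latched = bus->value;        /*    memory reads it and waits       */
    bus->value = reg;                     /* 2. CPU places data word x on bus   */
    main_memory[latched] = bus->value;    /* 3. memory stores x at Addr         */
}
```

A load-store sequence then looks like: load two words with `cpu_load`, add them in ordinary C variables (standing in for registers), and write the sum back with `cpu_store`.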

Hard Disk

• A hard disk is a magnetic disk on which you can store computer data.

• The term hard is used to distinguish it from a soft, or floppy, disk.

• Hard disks hold more data and are faster than floppy disks. A hard disk, for example, can store anywhere from 10 to more than 100 gigabytes, whereas most floppies have a maximum storage capacity of 1.4 megabytes.

Disk Geometry

• Disks consist of platters, each with two surfaces.

• Each surface consists of concentric rings called tracks.

• Each track consists of sectors separated by gaps.

[Figure: one disk surface around the spindle, with labels for the tracks, track k, sectors, and gaps.]

Disk Geometry (Multiple-Platter)

• Aligned tracks form a cylinder.

[Figure: three platters (platter 0 to platter 2) on one spindle, giving surfaces 0 to 5, with cylinder k marked across the aligned tracks.]

Disk Capacity

• Capacity: maximum number of bits that can be stored.

• Vendors express capacity in units of gigabytes (GB), where 1 GB = 10^9 bytes.

Computing Disk Capacity

• Capacity = (# bytes/sector) x (avg. # sectors/track) x (# tracks/surface) x (# surfaces/platter) x (# platters/disk)

• Example:

    • 512 bytes/sector

    • 300 sectors/track (on average)

    • 20,000 tracks/surface

    • 2 surfaces/platter

    • 5 platters/disk

• Capacity = 512 x 300 x 20,000 x 2 x 5 = 30,720,000,000 bytes = 30.72 GB
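The capacity formula translates directly into a few lines of C. The sketch below is illustrative only: the function names `disk_capacity_bytes` and `bytes_to_vendor_gb` are made up for this example, and 64-bit integers are used because the result does not fit in 32 bits.

```c
#include <assert.h>
#include <stdint.h>

/* Capacity in bytes, following the product formula from the slide. */
uint64_t disk_capacity_bytes(uint64_t bytes_per_sector,
                             uint64_t avg_sectors_per_track,
                             uint64_t tracks_per_surface,
                             uint64_t surfaces_per_platter,
                             uint64_t platters_per_disk) {
    /* A platter has two surfaces, so only 1 or 2 can be in use. */
    assert(surfaces_per_platter >= 1 && surfaces_per_platter <= 2);
    return bytes_per_sector * avg_sectors_per_track * tracks_per_surface *
           surfaces_per_platter * platters_per_disk;
}

/* Vendor gigabytes: 1 GB = 10^9 bytes. */
double bytes_to_vendor_gb(uint64_t bytes) {
    return (double)bytes / 1e9;
}
```

With the example values (512, 300, 20000, 2, 5) the first function returns 30,720,000,000 bytes, which the second converts to 30.72 GB.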

Disk Operation (Single-Platter View)

• The disk surface spins at a fixed rotational rate.

• The arm can position the read/write head over any track.

• The read/write head is attached to the end of the arm and flies over the disk surface on a thin cushion of air.

Disk Operation (Multi-Platter View)

• The read/write heads on the arm move together from cylinder to cylinder.

[Figure: an arm carrying the read/write heads over platters stacked on one spindle.]

Disk Access Time

• The average time to access a target sector is approximated by:

    • Taccess = Tseek + Trotation + Ttransfer

• Seek time (Tseek)

    • Time to position the heads over the cylinder containing the target sector.

    • Typical Tseek = 9 ms

• Rotational latency (Trotation)

    • Time spent waiting for the first bit of the target sector to pass under the read/write head.

    • Trotation = 1/2 x (1/RPM) min, that is, half a rotation on average.

• Transfer time (Ttransfer)

    • Time to read the bits in the target sector.

    • Ttransfer = (1/RPM) x 1/(avg # sectors/track) min.

Disk Access Time Example

• Given:

    • Rotational rate = 7,200 RPM

    • Average seek time = 9 ms

    • Avg # sectors/track = 400

• Derived:

    • Trotation = 1/2 x (60 secs / 7,200 rotations) x 1000 ms/sec ≈ 4 ms

    • Ttransfer = (60 secs / 7,200 rotations) x (1/400 rotations per sector) x 1000 ms/sec ≈ 0.02 ms

    • Taccess = 9 ms + 4 ms + 0.02 ms ≈ 13 ms

• Important points:

    • Access time is dominated by seek time and rotational latency.

    • The first bit in a sector is the most expensive; the rest are essentially free.

    • SRAM access time is about 4 ns per 8 bytes, DRAM about 60 ns.

    • Disk is about 40,000 times slower than SRAM and 2,500 times slower than DRAM.
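The access-time estimate can be checked with a short C function. This is a minimal sketch, not a model of any real drive: the struct `DiskAccessTime` and the function `disk_access_time` are hypothetical names, and the only inputs are the three quantities given in the example.

```c
#include <assert.h>

typedef struct {
    double seek_ms;      /* Tseek     */
    double rotation_ms;  /* Trotation */
    double transfer_ms;  /* Ttransfer */
    double total_ms;     /* Taccess   */
} DiskAccessTime;

DiskAccessTime disk_access_time(double avg_seek_ms, double rpm,
                                double avg_sectors_per_track) {
    assert(rpm > 0.0 && avg_sectors_per_track > 0.0);

    /* One full rotation, converted from minutes to milliseconds. */
    double ms_per_rotation = 60.0 * 1000.0 / rpm;

    DiskAccessTime t;
    t.seek_ms = avg_seek_ms;
    t.rotation_ms = 0.5 * ms_per_rotation;                    /* half a turn on average */
    t.transfer_ms = ms_per_rotation / avg_sectors_per_track;  /* one sector passes by   */
    t.total_ms = t.seek_ms + t.rotation_ms + t.transfer_ms;
    return t;
}
```

Calling `disk_access_time(9.0, 7200.0, 400.0)` gives a rotational latency of about 4.17 ms and a total of about 13.19 ms without intermediate rounding; the slide rounds the latency to 4 ms. The transfer time is several hundred times smaller than the other two terms, which is the point of the first two "important points".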

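I/O Bus

[Figure: the CPU chip (register file, ALU, bus interface) connects over the system bus to the I/O bridge, which connects over the memory bus to main memory. The I/O bus attaches the USB controller (mouse, keyboard), the graphics adapter (monitor), and the disk controller (disk), and has expansion slots for other devices such as network adapters.]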

Reading a Disk Sector

• Step 1: The CPU initiates a disk read by sending a request to the disk controller.

• Step 2: The disk controller reads the sector and performs a direct memory access (DMA) transfer of the data into main memory.

• Step 3: When the DMA transfer completes, the disk controller notifies the CPU with an interrupt.

[Figure: the CPU chip (register file, ALU, bus interface), main memory, and the I/O bus with the USB controller (mouse, keyboard), graphics adapter (monitor), and disk controller (disk).]

The CPU-Memory Gap

• The gap between DRAM, disk, and CPU speeds keeps increasing.

[Figure: plot of time in ns (axis from 1 to 100,000,000) against year (1980 to 2000), with curves for disk seek time, DRAM access time, SRAM access time, and CPU cycle time.]

Caches

• Cache: A smaller, faster storage device that acts as a staging area for a subset of the data in a larger, slower device.

• Fundamental idea of a memory hierarchy:

    • For each k, the faster, smaller device at level k serves as a cache for the larger, slower device at level k+1.

Caching in a Memory Hierarchy

• Level k: the smaller, faster, more expensive device at level k caches a subset of the blocks from level k+1.

• Level k+1: the larger, slower, cheaper storage device at level k+1 is partitioned into blocks.

• Data is copied between levels in block-sized transfer units.

[Figure: level k+1 holds blocks 0 to 15; level k holds blocks 8, 9, 14, and 3; blocks 4 and 10 are shown being copied between the levels; requests for blocks 14 and 12 are marked.]

General Caching Concepts

• Program needs object d, which is stored in some block b.

• Cache hit

    • Program finds b in the cache at level k. E.g., block 14.

• Cache miss

    • b is not at level k, so the level k cache must fetch it from level k+1. E.g., block 12.

    • If the level k cache is full, then some current block must be replaced (evicted). Which one is the "victim"?

        • Placement policy: where can the new block go? E.g., b mod 4

        • Replacement policy: which block should be evicted? E.g., LRU

[Figure: level k holds blocks 4*, 9, 14, and 3, and level k+1 holds blocks 0 to 15. A request for block 14 hits at level k. A request for block 12 misses, so block 12 is fetched from level k+1 and takes the place of block 4*.]
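The hit and miss cases above can be reproduced with a toy level k cache that has four slots and uses the placement rule b mod 4. This is a sketch for illustration, with hypothetical names (`LevelKCache`, `cache_init`, `cache_access`). Because each block can go in exactly one slot under this placement rule, the victim is always the current occupant of that slot; a replacement policy such as LRU only matters when a block may be placed in more than one slot.

```c
#include <assert.h>
#include <stdbool.h>

#define NUM_SLOTS 4
#define EMPTY (-1)

typedef struct {
    int slot[NUM_SLOTS];   /* block number cached in each slot, or EMPTY */
} LevelKCache;

void cache_init(LevelKCache *c) {
    for (int i = 0; i < NUM_SLOTS; i++) {
        c->slot[i] = EMPTY;
    }
}

/* Returns true on a hit. On a miss, the block is "fetched" from level k+1
 * by recording its number, and *evicted receives the victim (or EMPTY). */
bool cache_access(LevelKCache *c, int block, int *evicted) {
    assert(block >= 0);
    int i = block % NUM_SLOTS;        /* placement policy: b mod 4 */
    *evicted = EMPTY;
    if (c->slot[i] == block) {
        return true;                  /* cache hit */
    }
    *evicted = c->slot[i];            /* current occupant is the victim */
    c->slot[i] = block;               /* fetch from level k+1 */
    return false;                     /* cache miss */
}
```

After loading blocks 4, 9, 14, and 3, a request for block 14 is a hit, while a request for block 12 is a miss that evicts block 4, since 12 mod 4 = 4 mod 4 = 0.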

• Consider a simple example: a 4-kilobyte cache with a line size of 32 bytes, direct-mapped on virtual addresses. Each load/store to the cache therefore moves 32 bytes. If one variable of type float takes 4 bytes on our system, each cache line holds eight (32/4 = 8) such variables.

• The following loop calculates the inner product of two arrays, a and b. Each array element is assumed to be 4 bytes long, and the data has not been cached yet.

float a[1024], b[1024];
float sum = 0.0f;
int i;
for (i = 0; i < 1024; i++)
    sum += a[i] * b[i];

i | operations | status | in cache | comment
0 | Load a[0] | miss | a[0..7] | a[] was not cached yet
  | Load b[0] | miss | b[0..7] | b[] was not cached yet
  | t=a[0]*b[0] | | |
  | sum+=t | | |
1 | Load a[1] | hit | a[0..7] | a[] was cached
  | Load b[1] | hit | b[0..7] | b[] was cached
  | t=a[1]*b[1] | | |
  | sum+=t | | |
... etc. ...
7 | Load a[7] | hit | a[0..7] | a[] was cached
  | Load b[7] | hit | b[0..7] | b[] was cached
  | t=a[7]*b[7] | | |
  | sum+=t | | |
8 | Load a[8] | miss | a[8..15] | This line was not cached yet
  | Load b[8] | miss | b[8..15] | This line was not cached yet
  | t=a[8]*b[8] | | |
  | sum+=t | | |
9 | Load a[9] | hit | a[8..15] | This line was cached
  | Load b[9] | hit | b[8..15] | This line was cached
  | t=a[9]*b[9] | | |
  | sum+=t | | |

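• In this example a[0..7] denotes elements a[0], ..., a[7]; a similar notation is used for vector b and other sequences of array elements.

• For each array, every group of eight consecutive loads has one miss followed by seven hits, so the cache hit rate is 7/8, which equals 87.5 percent. However, this is the best case: it assumes that the cache lines holding a[] and b[] never evict each other. In a direct-mapped cache, two arrays whose addresses map to the same cache lines would conflict and the hit rate would drop.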
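The 7/8 figure, and the reason it is only the best case, can be explored with a small simulation of the cache described above (4 KB, 32-byte lines, direct-mapped). The sketch below is an illustration under stated assumptions: the names `Cache`, `cache_load`, and `inner_product_hit_rate` are hypothetical, only loads are modeled, and the base addresses of a[] and b[] are passed in as plain byte addresses chosen by the caller.

```c
#include <assert.h>
#include <stdint.h>

#define CACHE_BYTES 4096
#define LINE_BYTES  32
#define NUM_LINES   (CACHE_BYTES / LINE_BYTES)   /* 128 lines */
#define N           1024                         /* elements per array */
#define ELEM_BYTES  4                            /* one float */

typedef struct {
    int64_t block[NUM_LINES];  /* memory block held by each line, -1 if empty */
    long hits;
    long misses;
} Cache;

static void cache_reset(Cache *c) {
    for (int i = 0; i < NUM_LINES; i++) {
        c->block[i] = -1;
    }
    c->hits = 0;
    c->misses = 0;
}

/* Direct-mapped lookup: each memory block can live in exactly one line. */
static void cache_load(Cache *c, uint64_t addr) {
    int64_t block = (int64_t)(addr / LINE_BYTES);
    int line = (int)(block % NUM_LINES);
    if (c->block[line] == block) {
        c->hits++;
    } else {
        c->misses++;
        c->block[line] = block;   /* evict whatever was there */
    }
}

/* Hit rate of the inner-product loop for given array base addresses. */
double inner_product_hit_rate(uint64_t a_base, uint64_t b_base) {
    assert(a_base % ELEM_BYTES == 0 && b_base % ELEM_BYTES == 0);
    Cache c;
    cache_reset(&c);
    for (int i = 0; i < N; i++) {
        cache_load(&c, a_base + (uint64_t)i * ELEM_BYTES);   /* load a[i] */
        cache_load(&c, b_base + (uint64_t)i * ELEM_BYTES);   /* load b[i] */
    }
    return (double)c.hits / (double)(c.hits + c.misses);
}
```

In this model, placing b[] one cache line past a multiple of the cache size, for example `inner_product_hit_rate(0, 4096 + 32)`, yields the 87.5 percent from the slide. Placing b[] exactly 4096 bytes after a[], as in `inner_product_hit_rate(0, 4096)`, makes a[i] and b[i] compete for the same line, and every load misses.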

Virtual Memory

• With virtual memory, the computer can look for areas of RAM that have not been used recently and copy them onto the hard disk. This frees up space in RAM to load the new application. Because it does this automatically, you don't even know it is happening, and it makes your computer feel like it has unlimited RAM space.

• The area of the hard disk that stores the RAM image is called a page file. The page size is normally a few kilobytes, for example, 4096 bytes/page. The operating system moves data back and forth between the page file and RAM.

• Of course, the read/write speed of a hard drive is much slower than that of RAM. If your system has to rely too heavily on virtual memory, you will notice a significant performance drop.

• When you don't have enough memory, the operating system has to constantly swap information back and forth between RAM and the hard disk. This is called thrashing, and it can make your computer feel incredibly slow.
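Thrashing can be illustrated with a toy page-fault counter. The sketch below assumes a strict least-recently-used (LRU) rule as one reading of "not been used recently"; real operating systems typically use approximations, and the names `count_page_faults`, `MAX_FRAMES`, and `PAGE_BYTES` are hypothetical. Each fault stands for one slow transfer between the page file and RAM.

```c
#include <assert.h>
#include <stddef.h>

#define MAX_FRAMES 16      /* upper bound on RAM frames in this toy model */
#define PAGE_BYTES 4096    /* page size used as the example in the text */

/* Count page faults for a sequence of byte addresses when RAM can hold
 * num_frames pages and the least recently used page is evicted. */
long count_page_faults(const unsigned long *addrs, size_t n, int num_frames) {
    assert(num_frames > 0 && num_frames <= MAX_FRAMES);

    long page_in_frame[MAX_FRAMES];
    long last_used[MAX_FRAMES];
    int used = 0;
    long faults = 0;

    for (size_t t = 0; t < n; t++) {
        long page = (long)(addrs[t] / PAGE_BYTES);
        int frame = -1;

        for (int f = 0; f < used; f++) {
            if (page_in_frame[f] == page) {
                frame = f;                       /* page is already in RAM */
                break;
            }
        }
        if (frame < 0) {
            faults++;                            /* page must come from disk */
            if (used < num_frames) {
                frame = used++;                  /* a free frame is available */
            } else {
                frame = 0;                       /* pick the LRU frame as victim */
                for (int f = 1; f < used; f++) {
                    if (last_used[f] < last_used[frame]) {
                        frame = f;
                    }
                }
                /* the victim page would be copied to the page file here */
            }
            page_in_frame[frame] = page;
        }
        last_used[frame] = (long)t;
    }
    return faults;
}
```

For a program that cycles through four pages three times (12 accesses), four frames give only the 4 initial faults, while three frames give a fault on every one of the 12 accesses. That constant swapping is the behavior the last bullet calls thrashing.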