21SCS147 L17: Mid3 Revision 4-14
Uploaded by abhishekanand0107, posted on 14-Apr-2018

TRANSCRIPT

  • 7/27/2019 21SCS147L17Mid3 Revision4-14[1]

    1/72

    Mid3 Revision: VM and Instruction Set Architecture

    Prof. Sin-Min Lee

Classification of Digital Circuits

Combinational: the output depends only on current input values.

Sequential: the output depends on current input values and the present state of the circuit, where the present state is the current value of the device's memory. Sequential circuits are also called finite state machines.

Characteristic Tables

The tables we have made so far are called characteristic tables. They show the next state Q(t+1) in terms of the current state Q(t) and the inputs. For simplicity, the control input C is not usually listed.

D | Q(t+1) | Operation
0 | 0      | Reset
1 | 1      | Set

T | Q(t+1) | Operation
0 | Q(t)   | No change
1 | Q'(t)  | Complement

J K | Q(t+1) | Operation
0 0 | Q(t)   | No change
0 1 | 0      | Reset
1 0 | 1      | Set
1 1 | Q'(t)  | Complement

Characteristic Equations

We can also write characteristic equations, where the next state Q(t+1) is defined in terms of the current state Q(t) and the inputs:

D flip-flop:  Q(t+1) = D
JK flip-flop: Q(t+1) = JQ'(t) + K'Q(t)
T flip-flop:  Q(t+1) = TQ'(t) + T'Q(t) = T XOR Q(t)

These agree with the characteristic tables: for D, 0 resets and 1 sets; for T, 0 leaves Q unchanged and 1 complements it; for JK, 00 holds, 01 resets, 10 sets, and 11 complements.
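The equations above are easy to sanity-check in code. Below is a minimal Python sketch (the helper names are ours, not from the lecture) that encodes each characteristic equation and verifies it against every row of the characteristic tables:

```python
# Hypothetical helpers (not from the slides) evaluating the
# characteristic equations for the D, T, and JK flip-flops.

def d_next(d, q):
    """D flip-flop: Q(t+1) = D."""
    return d

def t_next(t, q):
    """T flip-flop: Q(t+1) = TQ'(t) + T'Q(t) = T XOR Q(t)."""
    return t ^ q

def jk_next(j, k, q):
    """JK flip-flop: Q(t+1) = JQ'(t) + K'Q(t)."""
    return (j & (1 - q)) | ((1 - k) & q)

# Reproduce the characteristic tables row by row.
for q in (0, 1):
    assert d_next(0, q) == 0          # D=0: reset
    assert d_next(1, q) == 1          # D=1: set
    assert t_next(0, q) == q          # T=0: no change
    assert t_next(1, q) == 1 - q      # T=1: complement
    assert jk_next(0, 0, q) == q      # J=K=0: no change
    assert jk_next(0, 1, q) == 0      # J=0, K=1: reset
    assert jk_next(1, 0, q) == 1      # J=1, K=0: set
    assert jk_next(1, 1, q) == 1 - q  # J=K=1: complement
```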

Memory Allocation

- Compile for overlays
- Compile for fixed partitions
  - separate queue per partition
  - single queue
- Relocation and variable partitions
- Dynamic contiguous allocation (bitmaps versus linked lists)
- Fragmentation issues
- Swapping
- Paging

Overlays

[Figure: main memory holds the Main Program, an Overlay Manager, and an Overlay Area, with boundaries at 0K, 5K, 7K, and 12K; Overlays 1-3 reside in Secondary Storage and are loaded into the Overlay Area one at a time.]

Multiprogramming with Fixed Partitions

Divide memory into n (possibly unequal) partitions.

Problem: fragmentation.

[Figure: memory with partition boundaries at 0K, 4K, 16K, 64K, and 128K, showing free space within partitions.]

Fixed Partitions

[Figure: the same partition layout (0K, 4K, 16K, 64K, 128K) with a legend marking free space; unused space inside an allocated partition is internal fragmentation and cannot be reallocated.]

Fixed Partition Allocation: Implementation Issues

Separate input queue for each partition:
- requires sorting the incoming jobs and putting them into separate queues
- inefficient utilization of memory: when the queue for a large partition is empty but the queue for a small partition is full, small jobs have to wait to get into memory even though plenty of memory is free

One single input queue for all partitions:
- allocate a partition where the job fits, using Best Fit, Worst Fit, or First Fit

Relocation

- Programs need the correct starting address when they start in memory; different jobs will run at different addresses.
- When a program is linked, the linker must know at what address the program will begin in memory.
- Logical (virtual) addresses: the logical address space has range 0 to max.
- Physical addresses: the physical address space has range R+0 to R+max for base value R.
- The user program never sees the real physical addresses.
- The memory-management unit (MMU) maps virtual to physical addresses.
- Relocation register: the mapping requires hardware (the MMU) with a base register.

Relocation Register

[Figure: the CPU issues a logical address MA; an adder combines it with the base register contents BA to form the physical address MA+BA, which is sent to memory.]
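The figure's mapping is just an addition. A minimal Python sketch of it (the base and limit values below are made-up illustrations, and the range check is an assumption on our part; the slide shows only the adder):

```python
# Sketch of the base-register (relocation-register) mapping:
# physical address = logical address + base.

def translate(logical_addr, base, limit):
    """Map a logical address in 0..limit to a physical address base..base+limit."""
    if not (0 <= logical_addr <= limit):
        raise ValueError("logical address out of range")
    return base + logical_addr

# Hypothetical program loaded at base 14000 with logical space 0..2999:
assert translate(0, 14000, 2999) == 14000
assert translate(346, 14000, 2999) == 14346
```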

Storage Placement Strategies

Best fit: use the hole whose size equals the need, or if none is equal, the hole that is larger but closest in size. Rationale?

First fit: use the first available hole whose size is sufficient to meet the need. Rationale?

Worst fit: use the largest available hole. Rationale?
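The three strategies can be sketched over a free list of (start, size) holes. This is an illustrative Python sketch with a made-up hole list, not code from the lecture:

```python
# Each hole is (start address, size); each function returns the start
# of the chosen hole, or None if nothing fits.

def first_fit(holes, need):
    for start, size in holes:
        if size >= need:
            return start          # first hole big enough wins
    return None

def best_fit(holes, need):
    fits = [(size, start) for start, size in holes if size >= need]
    return min(fits)[1] if fits else None   # smallest adequate hole

def worst_fit(holes, need):
    fits = [(size, start) for start, size in holes if size >= need]
    return max(fits)[1] if fits else None   # largest hole

holes = [(0, 100), (200, 30), (300, 60)]
assert first_fit(holes, 50) == 0     # first hole that fits (size 100)
assert best_fit(holes, 50) == 300    # closest fit (size 60)
assert worst_fit(holes, 50) == 0     # largest hole (size 100)
```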

Storage Placement Strategies

Every placement strategy has its own problem:
- Best fit creates small holes that can't be used.
- Worst fit gets rid of large holes, making it difficult to run large programs.
- First fit creates average-size holes.

Locality of Reference

- Most memory references are confined to a small region.
- A well-written program spends most of its time in a small loop, procedure, or function.
- Data are likely in arrays, with related variables stored together.
- Working set: the number of pages sufficient to run the program normally, i.e., to satisfy the locality of a particular program.
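The working-set idea can be made concrete as "the distinct pages touched in the last few references". A small Python sketch (the window size and reference string are illustrative, not from the slides):

```python
# Working set W(t, window): distinct pages referenced in the last
# `window` references up to and including time t.

def working_set(refs, t, window):
    return set(refs[max(0, t - window + 1): t + 1])

refs = [1, 2, 1, 1, 3, 2, 2, 2, 1]
assert working_set(refs, 3, 3) == {1, 2}     # refs 2,1,1
assert working_set(refs, 8, 4) == {1, 2}     # refs 2,2,2,1
assert working_set(refs, 4, 5) == {1, 2, 3}  # refs 1,2,1,1,3
```

A program is running "normally" when its allocated frames cover this set; locality keeps the set small relative to the whole program.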

Page Replacement Algorithms

Page fault: the page is not in memory and must be loaded from disk.

Algorithms to manage swapping:
- First-In, First-Out (FIFO), which is subject to Belady's anomaly
- Least Recently Used (LRU)
- Least Frequently Used (LFU)
- Not Used Recently (NUR), based on a referenced bit and a modified (dirty) bit
- Second-chance replacement

Thrashing: too many page faults degrade system performance.

Virtual Memory Tradeoffs

Disadvantages:
- The swap file takes up space on disk.
- Paging takes up CPU resources.

Advantages:
- Programs share memory space.
- More programs run at the same time.
- Programs run even if they cannot fit into memory all at once.
- Process separation.

Virtual Memory vs. Caching

- A cache speeds up memory access.
- Virtual memory increases the amount of perceived storage:
  - independence from the configuration and capacity of the memory system
  - low cost per bit compared to main memory

How Bad Is Fragmentation?

Statistical arguments with random request sizes: for first fit, given N allocated blocks, about 0.5N blocks will be lost to fragmentation. This is known as the 50% rule.

Solve Fragmentation with Compaction

[Figure: snapshots 5-9 of memory (Monitor, Job 3, free space, Jobs 5-8) as compaction moves jobs together so the free space coalesces.]

Storage Management Problems

- Fixed partitions suffer from internal fragmentation.
- Variable partitions suffer from external fragmentation.
- Compaction suffers from overhead.

Placement Policy

- Determines where in real memory a process piece is to reside.
- Important in a segmentation system.
- With paging, or combined paging and segmentation, the hardware performs address translation.

Replacement Policy

- Which page is replaced?
- The page removed should be the page least likely to be referenced in the near future.
- Most policies predict future behavior on the basis of past behavior.

Replacement Policy: Frame Locking

If a frame is locked, it may not be replaced. Locked frames include:
- the kernel of the operating system
- control structures
- I/O buffers

Associate a lock bit with each frame.

Basic Replacement Algorithms: Optimal Policy

- Selects for replacement the page for which the time to the next reference is the longest.
- Impossible to implement: it requires perfect knowledge of future events.

Basic Replacement Algorithms: Least Recently Used (LRU)

- Replaces the page that has not been referenced for the longest time.
- By the principle of locality, this should be the page least likely to be referenced in the near future.
- Each page could be tagged with the time of its last reference, but this would require a great deal of overhead.

Basic Replacement Algorithms: First-In, First-Out (FIFO)

- Treats the page frames allocated to a process as a circular buffer; pages are removed in round-robin style.
- The simplest replacement policy to implement.
- The page that has been in memory the longest is replaced, but such pages may be needed again very soon.

Basic Replacement Algorithms: Clock Policy

- Each frame carries an additional bit called the use bit.
- When a page is first loaded into memory, its use bit is set to 1; whenever the page is referenced, the use bit is set to 1 again.
- When it is time to replace a page, the first frame encountered with its use bit set to 0 is replaced.
- During the search for a replacement, each use bit that is set to 1 is changed to 0.
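The victim-selection step just described can be sketched in a few lines of Python (the frame labels are made up for illustration):

```python
# Clock policy victim selection: frames form a circular list, each with a
# use bit; the hand clears use bits it passes until it finds one that is 0.

def clock_replace(frames, use_bits, hand):
    """Return (victim index, new hand position), clearing use bits on the way."""
    while True:
        if use_bits[hand] == 0:
            return hand, (hand + 1) % len(frames)
        use_bits[hand] = 0                 # give the page a second chance
        hand = (hand + 1) % len(frames)

frames   = ['A', 'B', 'C', 'D']
use_bits = [1, 0, 1, 1]
victim, hand = clock_replace(frames, use_bits, 0)
assert victim == 1                  # frame B had use bit 0
assert use_bits == [0, 0, 1, 1]     # A's use bit was cleared in passing
assert hand == 2
```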


FIFO Replacement Policy

Reference string: 2 1 3 4 2 5 4 1 2 3 1 4 5 4 6 (three frames; each column lists the resident pages, most recently loaded on top)

Reference: 2  1  3  4  2  5  4  1  2  3  1  4  5  4  6
           2  1  3  4  2  5  5  1  1  3  3  4  5  5  6
              2  1  3  4  2  2  5  5  1  1  3  4  4  5
                 2  1  3  4  4  2  2  5  5  1  3  3  4

Hits occur at references 7, 9, 11, and 14. Hit ratio: 4/15.

LRU Replacement Policy

Reference string: 2 1 3 4 2 5 4 1 2 3 1 4 5 4 6 (three frames; each column lists the resident pages, most recently used on top)

Reference: 2  1  3  4  2  5  4  1  2  3  1  4  5  4  6
           2  1  3  4  2  5  4  1  2  3  1  4  5  4  6
              2  1  3  4  2  5  4  1  2  3  1  4  5  4
                 2  1  3  4  2  5  4  1  2  3  1  1  5

Hits occur at references 7, 11, and 14. Hit ratio: 3/15.

Optimal Replacement Policy

Reference string: 2 1 3 4 2 5 4 1 2 3 1 4 5 4 6 (three frames; each column shows the contents of the three frames)

Reference: 2  1  3  4  2  5  4  1  2  3  1  4  5  4  6
Frame 1:   2  2  2  2  2  5  5  5  2  3  3  3  3  3  6
Frame 2:      1  1  1  1  1  1  1  1  1  1  1  5  5  5
Frame 3:         3  4  4  4  4  4  4  4  4  4  4  4  4

Hits occur at references 5, 7, 8, 11, 12, and 14. Hit ratio: 6/15.
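The three traces above can be replayed programmatically. A Python sketch (assuming 3 page frames, as in the slides) reproduces the hit counts 4/15, 3/15, and 6/15:

```python
from collections import OrderedDict

REFS = [2, 1, 3, 4, 2, 5, 4, 1, 2, 3, 1, 4, 5, 4, 6]

def fifo_hits(refs, nframes):
    frames, hits = [], 0
    for p in refs:
        if p in frames:
            hits += 1
        else:
            if len(frames) == nframes:
                frames.pop(0)               # evict oldest-loaded page
            frames.append(p)
    return hits

def lru_hits(refs, nframes):
    frames, hits = OrderedDict(), 0
    for p in refs:
        if p in frames:
            hits += 1
            frames.move_to_end(p)           # mark as most recently used
        else:
            if len(frames) == nframes:
                frames.popitem(last=False)  # evict least recently used
            frames[p] = True
    return hits

def opt_hits(refs, nframes):
    frames, hits = set(), 0
    for i, p in enumerate(refs):
        if p in frames:
            hits += 1
            continue
        if len(frames) == nframes:
            # evict the resident page whose next use is farthest (or never)
            def next_use(q):
                rest = refs[i + 1:]
                return rest.index(q) if q in rest else float('inf')
            frames.remove(max(frames, key=next_use))
        frames.add(p)
    return hits

assert fifo_hits(REFS, 3) == 4   # hit ratio 4/15
assert lru_hits(REFS, 3) == 3    # hit ratio 3/15
assert opt_hits(REFS, 3) == 6    # hit ratio 6/15
```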

Early Memory Management Schemes

Originally the computer was devoted to a single user: the user has all of memory (addresses 0 through 65535).

Limitations of the Single-User Contiguous Scheme

- Only one person uses the machine, so lots of computer time goes to waste (why?).
- The largest job is limited by the size of machine memory.

Next: Fixed Partitions

Memory is divided into chunks, one for each job (Job 1, Job 2, Job 3, over addresses 0 through 65535).

Limitations of Fixed Partitions

- The operator had to correctly guess the size of programs.
- Programs were limited to the partitions they were given.
- Memory fragmentation resulted; the kind illustrated here is called internal memory fragmentation.

Dynamic Partitions

[Figure: memory snapshots with dynamically sized partitions allocated to jobs 1-7 as they arrive and depart.]

Internal versus External Memory Fragmentation

[Figure: Job 8 occupies part of the space previously allocated to Job 1; the leftover gap illustrates the fragmentation.]

Dynamic Partitions

- Contiguous memory is still required for processes.
- How do we decide the size of the partitions?
- Once the machine is running, how do old jobs get replaced by new ones?

Dynamic Partitions: First Fit

In this scheme, we search forward in the free list for a partition large enough to accommodate the next job. Fast, but the gaps left behind can be large.

Dynamic Partitions: Best Fit

In this scheme, we try to find the smallest partition large enough to hold the next job. This tends to minimize the size of the gaps, but it also requires that we keep a list of free spaces.

Deallocating Memory

If the block we are deallocating is adjacent to one or two free blocks, it needs to be merged with them. So either we return a pointer to the free block, or we change the size of a block, or both.
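The merge step just described can be sketched in Python. Blocks here are (start, size) tuples and the block layout is a made-up illustration, not the lecture's data structure:

```python
# Free a block and coalesce it with any adjacent free blocks.

def free_block(free_list, start, size):
    """Return a new sorted free list with (start, size) inserted and
    adjacent blocks merged."""
    blocks = sorted(free_list + [(start, size)])
    merged = []
    for s, sz in blocks:
        if merged and merged[-1][0] + merged[-1][1] == s:
            # previous block ends exactly where this one starts: merge
            merged[-1] = (merged[-1][0], merged[-1][1] + sz)
        else:
            merged.append((s, sz))
    return merged

free = [(0, 10), (30, 10)]
# Freeing [10, 30) bridges both neighbors into one 40-unit block:
assert free_block(free, 10, 20) == [(0, 40)]
```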

Relocatable Dynamic Partitions

In some cases, a job could fit into the combined spaces within or between partitions of the earlier schemes. So how do we take advantage of that space? One way is to move programs while they are in the machine, compacting them down into the lower end of memory, above the operating system.

Several Names for This

- Garbage collection
- Defragmentation
- Compaction

All share a problem: relative addressing!

Page Replacement Algorithms

- Optimal page replacement is simply not possible in practice.
- Instead, keep referenced (R) and modified (M) bits to track past usage:
  - a page is referenced by any read from or write to it
  - a page is modified by any change (write) made to it

Page Replacement Algorithms, Continued

- FIFO = first in, first out
- LRU = least recently used
- LFU = least frequently used

The latter two rely on a page-request call to the operating system. A failure to find a page is a page interrupt. We might measure quality by the failure rate = page interrupts / page requests.

Page Replacement Algorithms, Continued

Clock page replacement:
- The hand of the clock points to the oldest page.
- If a page fault occurs, check the R bits in clockwise order.
- A variant called the two-handed clock is used in some UNIX systems.

For FIFO, the Solution Is Not More Memory

This is called Belady's anomaly: the page request order is an important factor, not just the size of memory.

LRU

- Doesn't suffer from Belady's anomaly.
- Presumes locality of reference.
- But while it works well, it is a little more complex to implement in software.
- Consequently, aging and various clock algorithms are the most common in practice; aging can yield a good approximation of LRU.

Segmented Memory Allocation

- Instead of equal divisions, try to break code into its natural modules.
- The compiler is now asked to help the operating system.
- No page frames: different sizes are required, meaning we get external fragmentation again.

Segmented/Demand Paging

- Subdivide the natural program segments into equal-sized parts to load into page frames.
- Eliminates external fragmentation.
- Allows for a large virtual memory, so it is often used in more modern OSs.

Tradeoffs

- There is a tradeoff between external fragmentation and page faults in paging systems.
- We probably want slightly smaller page frames in a segmented/demand-paging framework.

Instruction Set Architectures, Part 1

[Figure: layered view of a computer system: Application, Compiler, Operating System, Instruction Set Architecture, Instr. Set Proc. and I/O system, Digital Design, Circuit Design.]

Some Ancient History

- The earliest (1940s) computers were one-of-a-kind.
- Among early commercial computers (1950s), each new model had an entirely different instruction set, programmed at the machine-code or assembler level.
- In 1957, IBM introduced FORTRAN:
  - much easier to write programs
  - remarkably, the code wasn't much slower than hand-written assembly
  - possible to use a new machine without reprogramming

Impact of High-Level Languages

Customers were delighted. Computer makers weren't so happy:
- they needed to write new compilers (and OSs) for each new model
- compilers were written in assembly code
- portable compilers didn't exist

IBM 360 Architecture

- The first ISA used for multiple models; IBM invested $5 billion.
- Six models were introduced in 1964, with performance varying by a factor of 50.
- 24-bit addresses (huge for 1964), though the largest model had only 512 KB of memory.
- A huge success! The architecture is still in use today, having evolved into the 370 (which added virtual addressing) and the 390 (32-bit addresses).

Let's Learn from Our Successes ...

In the early 70s, IBM took another big gamble: FS, a new layer between the ISA and high-level languages that put a lot of the OS function into hardware. It was a huge failure.

Moral: getting the right abstraction is hard!

The Instruction Set Architecture

The agreed-upon interface between the software that runs on a computer and the hardware that executes it.

[Figure: the layered diagram again, with the Instruction Set Architecture sitting between the software layers (Application, Compiler, Operating System) and the hardware layers (Instr. Set Proc. and I/O system, Digital Design, Circuit Design).]

The Instruction Set Architecture

The part of the architecture that is visible to the programmer:
- instruction formats
- opcodes (available instructions)
- number and types of registers
- storage access and addressing modes
- exceptional conditions

Overall Goals of an ISA

- Can be implemented by simple hardware.
- Can be implemented by fast hardware.
- Instructions do useful things.
- It is easy to write (or generate) machine code.

Key ISA Decisions

- Instruction length: are all instructions the same length?
- How many registers?
- Where do operands reside? E.g., can you add the contents of memory to a register?
- Instruction format: which bits designate what?
- Operands: how many? how big? how are memory addresses computed?
- Operations: what operations are provided?

Running Examples

We'll look at four example ISAs:
- Digital's VAX (1977): elegant
- Intel's x86 (1978): ugly, but successful (IBM PC)
- MIPS: the focus of the text, used in assorted machines
- PowerPC: used in Macs, IBM supercomputers, ...

VAX and x86 are CISC (Complex Instruction Set Computers); MIPS and PowerPC are RISC (Reduced Instruction Set Computers). Almost all machines of the 80s and 90s are RISC, including the VAX's successor, the DEC Alpha.

Instruction Length

Variable:
- x86: instructions vary from 1 to 17 bytes long
- VAX: from 1 to 54 bytes

Fixed:
- MIPS, PowerPC, and most other RISCs: all instructions are 4 bytes long

Instruction Length

Variable-length instructions (x86, VAX):
- (-) require multi-step fetch and decode
- (+) allow for a more flexible and compact instruction set

Fixed-length instructions (RISCs):
- (+) allow easy fetch and decode
- (+) simplify pipelining and parallelism
- (-) instruction bits are scarce

What's Going On?

- How is it possible that the ISAs of the 70s were much more complex than those of the 90s? Doesn't everything get more complex?
- Today, transistors are much smaller and cheaper, and design tools are better, so building a complex computer should be easier.
- How could IBM make two models of the 370 ISA in the same year that differed by 50x in performance?

Microcode

- Another layer, between the ISA and the hardware.
- One instruction expands into a sequence of microinstructions; a microinstruction specifies the values of individual wires.
- Each model can have a different micro-language: the low-end (cheapest) model uses simple hardware and long microprograms.
- We'll look at the rise and fall of microcode later. Meanwhile, back to ISAs ...

How Many Registers?

All computers have a small set of registers: memory to hold values that will be used soon. A typical instruction uses 2 or 3 register values.

Advantages of a small number of registers:
- fewer bits to specify which one
- less hardware
- faster access (shorter wires, fewer gates)
- faster context switch (when all registers need saving)

Advantages of a larger number:
- fewer loads and stores needed
- easier to do several operations at once

(In 141, "load" means moving data from memory to a register; "store" is the reverse.)

How Many Registers?

- VAX: 16 registers. R15 is the program counter (PC). Elegant! Loading R15 is a jump instruction.
- x86: 8 general-purpose registers (fine print: some restrictions apply), plus floating-point and special-purpose registers.
- Most RISCs have 32 integer and 32 floating-point registers, plus some special-purpose ones. The PowerPC has 8 four-bit condition registers, a count register (to hold a loop index), and others.
- The Itanium has 128 fixed, 128 float, and 64 predicate registers.

Where Do Operands Reside?

- Stack machine: Push loads memory into the first register (the top of the stack) and moves the other registers down; Pop does the reverse. Add combines the contents of the first two registers and moves the rest up.
- Accumulator machine: only one register (called the accumulator). Instructions include store and acc <- acc + mem.
- Register-memory machine: arithmetic instructions can use data in registers and/or memory.
- Load-store machine (aka register-register machine): arithmetic instructions can only use data in registers.

Load-Store Architectures

Can do:
  add r1 = r2 + r3
  load r3, M(address)
  store r1, M(address)

Can't do:
  add r1 = r2 + M(address)

- (-) more instructions
- (+) fast implementation (e.g., easy pipelining)

This forces heavy dependence on registers, which is exactly what you want in today's CPUs.

Where Do Operands Reside?

- VAX: register-memory. Very general: 0, 1, 2, or 3 operands can be in registers.
- x86: register-memory ... but the floating-point registers are a stack, so it is not as general as the VAX instructions.
- RISC machines: always load-store machines.
- I'm not aware of any accumulator machines in the last 20 years, but they may be used by embedded processors, and might conceivably be appropriate for the 141L project.

Comparing the Number of Instructions

Code sequences for C = A + B:

Stack     Accumulator   Register-Memory   Load-Store
Push A    Load A        Add C, A, B       Load R1,A
Push B    Add B                           Load R2,B
Add       Store C                         Add R3,R1,R2
Pop C                                     Store C,R3

    Alternate ISAs

Alternate ISAs

A = X*Y + X*Z

Stack | Accumulator | Reg-Mem | Load-Store