21SCS147 L17: Mid3 Revision 4-14
Uploaded by abhishekanand0107, posted on 14-Apr-2018

TRANSCRIPT

  • 7/27/2019 21SCS147L17Mid3 Revision4-14[1]

    1/72

    Mid3 Revision: VM and Instruction Set Architecture

    Prof. Sin-Min Lee

Classification of Digital Circuits

Combinational: the output depends only on current input values.

Sequential: the output depends on current input values and the present state of the circuit, where the present state is the current value of the device's memory. Sequential circuits are also called finite state machines.

Characteristic Tables

The tables we have made so far are called characteristic tables. They show the next state Q(t+1) in terms of the current state Q(t) and the inputs. For simplicity, the control input C is not usually listed.

D | Q(t+1) | Operation
0 | 0      | Reset
1 | 1      | Set

T | Q(t+1) | Operation
0 | Q(t)   | No change
1 | Q'(t)  | Complement

J K | Q(t+1) | Operation
0 0 | Q(t)   | No change
0 1 | 0      | Reset
1 0 | 1      | Set
1 1 | Q'(t)  | Complement

Characteristic Equations

We can also write characteristic equations, where the next state Q(t+1) is defined in terms of the current state Q(t) and the inputs:

D flip-flop:  Q(t+1) = D
JK flip-flop: Q(t+1) = JQ'(t) + K'Q(t)
T flip-flop:  Q(t+1) = TQ'(t) + T'Q(t) = T XOR Q(t)

These agree with the characteristic tables: for D, 0 resets and 1 sets; for T, 0 leaves Q unchanged and 1 complements it; for JK, 00 holds, 01 resets, 10 sets, and 11 complements.
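The equations above are easy to sanity-check in code. Below is a minimal Python sketch (the helper names are ours, not from the lecture) that encodes each characteristic equation and verifies it against every row of the characteristic tables:

```python
# Hypothetical helpers (not from the slides) evaluating the
# characteristic equations for the D, T, and JK flip-flops.

def d_next(d, q):
    """D flip-flop: Q(t+1) = D."""
    return d

def t_next(t, q):
    """T flip-flop: Q(t+1) = TQ'(t) + T'Q(t) = T XOR Q(t)."""
    return t ^ q

def jk_next(j, k, q):
    """JK flip-flop: Q(t+1) = JQ'(t) + K'Q(t)."""
    return (j & (1 - q)) | ((1 - k) & q)

# Reproduce the characteristic tables row by row.
for q in (0, 1):
    assert d_next(0, q) == 0          # D=0: reset
    assert d_next(1, q) == 1          # D=1: set
    assert t_next(0, q) == q          # T=0: no change
    assert t_next(1, q) == 1 - q      # T=1: complement
    assert jk_next(0, 0, q) == q      # J=K=0: no change
    assert jk_next(0, 1, q) == 0      # J=0, K=1: reset
    assert jk_next(1, 0, q) == 1      # J=1, K=0: set
    assert jk_next(1, 1, q) == 1 - q  # J=K=1: complement
```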

Memory Allocation

- Compile for overlays
- Compile for fixed partitions
  - separate queue per partition
  - single queue
- Relocation and variable partitions
- Dynamic contiguous allocation (bitmaps versus linked lists)
- Fragmentation issues
- Swapping
- Paging

Overlays

[Figure: main memory holds the Main Program, an Overlay Manager, and an Overlay Area, with boundaries at 0K, 5K, 7K, and 12K; Overlays 1-3 reside in Secondary Storage and are loaded into the Overlay Area one at a time.]

Multiprogramming with Fixed Partitions

Divide memory into n (possibly unequal) partitions.

Problem: fragmentation.

[Figure: memory with partition boundaries at 0K, 4K, 16K, 64K, and 128K, showing free space within partitions.]

Fixed Partitions

[Figure: the same partition layout (0K, 4K, 16K, 64K, 128K) with a legend marking free space; unused space inside an allocated partition is internal fragmentation and cannot be reallocated.]

Fixed Partition Allocation: Implementation Issues

Separate input queue for each partition:
- requires sorting the incoming jobs and putting them into separate queues
- inefficient utilization of memory: when the queue for a large partition is empty but the queue for a small partition is full, small jobs have to wait to get into memory even though plenty of memory is free

One single input queue for all partitions:
- allocate a partition where the job fits, using Best Fit, Worst Fit, or First Fit

Relocation

- Programs need the correct starting address when they start in memory; different jobs will run at different addresses.
- When a program is linked, the linker must know at what address the program will begin in memory.
- Logical (virtual) addresses: the logical address space has range 0 to max.
- Physical addresses: the physical address space has range R+0 to R+max for base value R.
- The user program never sees the real physical addresses.
- The memory-management unit (MMU) maps virtual to physical addresses.
- Relocation register: the mapping requires hardware (the MMU) with a base register.

Relocation Register

[Figure: the CPU issues a logical address MA; an adder combines it with the base register contents BA to form the physical address MA+BA, which is sent to memory.]
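The figure's mapping is just an addition. A minimal Python sketch of it (the base and limit values below are made-up illustrations, and the range check is an assumption on our part; the slide shows only the adder):

```python
# Sketch of the base-register (relocation-register) mapping:
# physical address = logical address + base.

def translate(logical_addr, base, limit):
    """Map a logical address in 0..limit to a physical address base..base+limit."""
    if not (0 <= logical_addr <= limit):
        raise ValueError("logical address out of range")
    return base + logical_addr

# Hypothetical program loaded at base 14000 with logical space 0..2999:
assert translate(0, 14000, 2999) == 14000
assert translate(346, 14000, 2999) == 14346
```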

Storage Placement Strategies

Best fit: use the hole whose size equals the need, or if none is equal, the hole that is larger but closest in size. Rationale?

First fit: use the first available hole whose size is sufficient to meet the need. Rationale?

Worst fit: use the largest available hole. Rationale?
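The three strategies can be sketched over a free list of (start, size) holes. This is an illustrative Python sketch with a made-up hole list, not code from the lecture:

```python
# Each hole is (start address, size); each function returns the start
# of the chosen hole, or None if nothing fits.

def first_fit(holes, need):
    for start, size in holes:
        if size >= need:
            return start          # first hole big enough wins
    return None

def best_fit(holes, need):
    fits = [(size, start) for start, size in holes if size >= need]
    return min(fits)[1] if fits else None   # smallest adequate hole

def worst_fit(holes, need):
    fits = [(size, start) for start, size in holes if size >= need]
    return max(fits)[1] if fits else None   # largest hole

holes = [(0, 100), (200, 30), (300, 60)]
assert first_fit(holes, 50) == 0     # first hole that fits (size 100)
assert best_fit(holes, 50) == 300    # closest fit (size 60)
assert worst_fit(holes, 50) == 0     # largest hole (size 100)
```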

Storage Placement Strategies

Every placement strategy has its own problem:
- Best fit creates small holes that can't be used.
- Worst fit gets rid of large holes, making it difficult to run large programs.
- First fit creates average-size holes.

Locality of Reference

- Most memory references are confined to a small region.
- A well-written program spends most of its time in a small loop, procedure, or function.
- Data are likely in arrays, with related variables stored together.
- Working set: the number of pages sufficient to run the program normally, i.e., to satisfy the locality of a particular program.
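The working-set idea can be made concrete as "the distinct pages touched in the last few references". A small Python sketch (the window size and reference string are illustrative, not from the slides):

```python
# Working set W(t, window): distinct pages referenced in the last
# `window` references up to and including time t.

def working_set(refs, t, window):
    return set(refs[max(0, t - window + 1): t + 1])

refs = [1, 2, 1, 1, 3, 2, 2, 2, 1]
assert working_set(refs, 3, 3) == {1, 2}     # refs 2,1,1
assert working_set(refs, 8, 4) == {1, 2}     # refs 2,2,2,1
assert working_set(refs, 4, 5) == {1, 2, 3}  # refs 1,2,1,1,3
```

A program is running "normally" when its allocated frames cover this set; locality keeps the set small relative to the whole program.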

Page Replacement Algorithms

Page fault: the page is not in memory and must be loaded from disk.

Algorithms to manage swapping:
- First-In, First-Out (FIFO), which is subject to Belady's anomaly
- Least Recently Used (LRU)
- Least Frequently Used (LFU)
- Not Used Recently (NUR), based on a referenced bit and a modified (dirty) bit
- Second-chance replacement

Thrashing: too many page faults degrade system performance.

Virtual Memory Tradeoffs

Disadvantages:
- The swap file takes up space on disk.
- Paging takes up CPU resources.

Advantages:
- Programs share memory space.
- More programs run at the same time.
- Programs run even if they cannot fit into memory all at once.
- Process separation.

Virtual Memory vs. Caching

- A cache speeds up memory access.
- Virtual memory increases the amount of perceived storage:
  - independence from the configuration and capacity of the memory system
  - low cost per bit compared to main memory

How Bad Is Fragmentation?

Statistical arguments with random request sizes: for first fit, given N allocated blocks, about 0.5N blocks will be lost to fragmentation. This is known as the 50% rule.

Solve Fragmentation with Compaction

[Figure: snapshots 5-9 of memory (Monitor, Job 3, free space, Jobs 5-8) as compaction moves jobs together so the free space coalesces.]

Storage Management Problems

- Fixed partitions suffer from internal fragmentation.
- Variable partitions suffer from external fragmentation.
- Compaction suffers from overhead.

Placement Policy

- Determines where in real memory a process piece is to reside.
- Important in a segmentation system.
- With paging, or combined paging and segmentation, the hardware performs address translation.

Replacement Policy

- Which page is replaced?
- The page removed should be the page least likely to be referenced in the near future.
- Most policies predict future behavior on the basis of past behavior.

Replacement Policy: Frame Locking

If a frame is locked, it may not be replaced. Locked frames include:
- the kernel of the operating system
- control structures
- I/O buffers

Associate a lock bit with each frame.

Basic Replacement Algorithms: Optimal Policy

- Selects for replacement the page for which the time to the next reference is the longest.
- Impossible to implement: it requires perfect knowledge of future events.

Basic Replacement Algorithms: Least Recently Used (LRU)

- Replaces the page that has not been referenced for the longest time.
- By the principle of locality, this should be the page least likely to be referenced in the near future.
- Each page could be tagged with the time of its last reference, but this would require a great deal of overhead.

Basic Replacement Algorithms: First-In, First-Out (FIFO)

- Treats the page frames allocated to a process as a circular buffer; pages are removed in round-robin style.
- The simplest replacement policy to implement.
- The page that has been in memory the longest is replaced, but such pages may be needed again very soon.

Basic Replacement Algorithms: Clock Policy

- Each frame carries an additional bit called the use bit.
- When a page is first loaded into memory, its use bit is set to 1; whenever the page is referenced, the use bit is set to 1 again.
- When it is time to replace a page, the first frame encountered with its use bit set to 0 is replaced.
- During the search for a replacement, each use bit that is set to 1 is changed to 0.
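The victim-selection step just described can be sketched in a few lines of Python (the frame labels are made up for illustration):

```python
# Clock policy victim selection: frames form a circular list, each with a
# use bit; the hand clears use bits it passes until it finds one that is 0.

def clock_replace(frames, use_bits, hand):
    """Return (victim index, new hand position), clearing use bits on the way."""
    while True:
        if use_bits[hand] == 0:
            return hand, (hand + 1) % len(frames)
        use_bits[hand] = 0                 # give the page a second chance
        hand = (hand + 1) % len(frames)

frames   = ['A', 'B', 'C', 'D']
use_bits = [1, 0, 1, 1]
victim, hand = clock_replace(frames, use_bits, 0)
assert victim == 1                  # frame B had use bit 0
assert use_bits == [0, 0, 1, 1]     # A's use bit was cleared in passing
assert hand == 2
```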


FIFO Replacement Policy

Reference string: 2 1 3 4 2 5 4 1 2 3 1 4 5 4 6 (three frames; each column lists the resident pages, most recently loaded on top)

Reference: 2  1  3  4  2  5  4  1  2  3  1  4  5  4  6
           2  1  3  4  2  5  5  1  1  3  3  4  5  5  6
              2  1  3  4  2  2  5  5  1  1  3  4  4  5
                 2  1  3  4  4  2  2  5  5  1  3  3  4

Hits occur at references 7, 9, 11, and 14. Hit ratio: 4/15.

LRU Replacement Policy

Reference string: 2 1 3 4 2 5 4 1 2 3 1 4 5 4 6 (three frames; each column lists the resident pages, most recently used on top)

Reference: 2  1  3  4  2  5  4  1  2  3  1  4  5  4  6
           2  1  3  4  2  5  4  1  2  3  1  4  5  4  6
              2  1  3  4  2  5  4  1  2  3  1  4  5  4
                 2  1  3  4  2  5  4  1  2  3  1  1  5

Hits occur at references 7, 11, and 14. Hit ratio: 3/15.

Optimal Replacement Policy

Reference string: 2 1 3 4 2 5 4 1 2 3 1 4 5 4 6 (three frames; each column shows the contents of the three frames)

Reference: 2  1  3  4  2  5  4  1  2  3  1  4  5  4  6
Frame 1:   2  2  2  2  2  5  5  5  2  3  3  3  3  3  6
Frame 2:      1  1  1  1  1  1  1  1  1  1  1  5  5  5
Frame 3:         3  4  4  4  4  4  4  4  4  4  4  4  4

Hits occur at references 5, 7, 8, 11, 12, and 14. Hit ratio: 6/15.
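The three traces above can be replayed programmatically. A Python sketch (assuming 3 page frames, as in the slides) reproduces the hit counts 4/15, 3/15, and 6/15:

```python
from collections import OrderedDict

REFS = [2, 1, 3, 4, 2, 5, 4, 1, 2, 3, 1, 4, 5, 4, 6]

def fifo_hits(refs, nframes):
    frames, hits = [], 0
    for p in refs:
        if p in frames:
            hits += 1
        else:
            if len(frames) == nframes:
                frames.pop(0)               # evict oldest-loaded page
            frames.append(p)
    return hits

def lru_hits(refs, nframes):
    frames, hits = OrderedDict(), 0
    for p in refs:
        if p in frames:
            hits += 1
            frames.move_to_end(p)           # mark as most recently used
        else:
            if len(frames) == nframes:
                frames.popitem(last=False)  # evict least recently used
            frames[p] = True
    return hits

def opt_hits(refs, nframes):
    frames, hits = set(), 0
    for i, p in enumerate(refs):
        if p in frames:
            hits += 1
            continue
        if len(frames) == nframes:
            # evict the resident page whose next use is farthest (or never)
            def next_use(q):
                rest = refs[i + 1:]
                return rest.index(q) if q in rest else float('inf')
            frames.remove(max(frames, key=next_use))
        frames.add(p)
    return hits

assert fifo_hits(REFS, 3) == 4   # hit ratio 4/15
assert lru_hits(REFS, 3) == 3    # hit ratio 3/15
assert opt_hits(REFS, 3) == 6    # hit ratio 6/15
```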

Early Memory Management Schemes

Originally the computer was devoted to a single user: the user has all of memory (addresses 0 through 65535).

Limitations of the Single-User Contiguous Scheme

- Only one person uses the machine, so lots of computer time goes to waste (why?).
- The largest job is limited by the size of machine memory.

Next: Fixed Partitions

Memory is divided into chunks, one for each job (Job 1, Job 2, Job 3, over addresses 0 through 65535).

Limitations of Fixed Partitions

- The operator had to correctly guess the size of programs.
- Programs were limited to the partitions they were given.
- Memory fragmentation resulted; the kind illustrated here is called internal memory fragmentation.

Dynamic Partitions

[Figure: memory snapshots with dynamically sized partitions allocated to jobs 1-7 as they arrive and depart.]

Internal versus External Memory Fragmentation

[Figure: Job 8 occupies part of the space previously allocated to Job 1; the leftover gap illustrates the fragmentation.]

Dynamic Partitions

- Contiguous memory is still required for processes.
- How do we decide the size of the partitions?
- Once the machine is running, how do old jobs get replaced by new ones?

Dynamic Partitions: First Fit

In this scheme, we search forward in the free list for a partition large enough to accommodate the next job. Fast, but the gaps left behind can be large.

Dynamic Partitions: Best Fit

In this scheme, we try to find the smallest partition large enough to hold the next job. This tends to minimize the size of the gaps, but it also requires that we keep a list of free spaces.

Deallocating Memory

If the block we are deallocating is adjacent to one or two free blocks, it needs to be merged with them. So either we return a pointer to the free block, or we change the size of a block, or both.
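The merge step just described can be sketched in Python. Blocks here are (start, size) tuples and the block layout is a made-up illustration, not the lecture's data structure:

```python
# Free a block and coalesce it with any adjacent free blocks.

def free_block(free_list, start, size):
    """Return a new sorted free list with (start, size) inserted and
    adjacent blocks merged."""
    blocks = sorted(free_list + [(start, size)])
    merged = []
    for s, sz in blocks:
        if merged and merged[-1][0] + merged[-1][1] == s:
            # previous block ends exactly where this one starts: merge
            merged[-1] = (merged[-1][0], merged[-1][1] + sz)
        else:
            merged.append((s, sz))
    return merged

free = [(0, 10), (30, 10)]
# Freeing [10, 30) bridges both neighbors into one 40-unit block:
assert free_block(free, 10, 20) == [(0, 40)]
```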

Relocatable Dynamic Partitions

In some cases, a job could fit into the combined spaces within or between partitions of the earlier schemes. So how do we take advantage of that space? One way is to move programs while they are in the machine, compacting them down into the lower end of memory, above the operating system.

Several Names for This

- Garbage collection
- Defragmentation
- Compaction

All share a problem: relative addressing!

Page Replacement Algorithms

- Optimal page replacement is simply not possible in practice.
- Instead, keep referenced (R) and modified (M) bits to track past usage:
  - a page is referenced by any read from or write to it
  - a page is modified by any change (write) made to it

Page Replacement Algorithms, Continued

- FIFO = first in, first out
- LRU = least recently used
- LFU = least frequently used

The latter two rely on a page-request call to the operating system. A failure to find a page is a page interrupt. We might measure quality by the failure rate = page interrupts / page requests.

Page Replacement Algorithms, Continued

Clock page replacement:
- The hand of the clock points to the oldest page.
- If a page fault occurs, check the R bits in clockwise order.
- A variant called the two-handed clock is used in some UNIX systems.

For FIFO, the Solution Is Not More Memory

This is called Belady's anomaly: the page request order is an important factor, not just the size of memory.

LRU

- Doesn't suffer from Belady's anomaly.
- Presumes locality of reference.
- But while it works well, it is a little more complex to implement in software.
- Consequently, aging and various clock algorithms are the most common in practice; aging can yield a good approximation of LRU.

Segmented Memory Allocation

- Instead of equal divisions, try to break code into its natural modules.
- The compiler is now asked to help the operating system.
- No page frames: different sizes are required, meaning we get external fragmentation again.

Segmented/Demand Paging

- Subdivide the natural program segments into equal-sized parts to load into page frames.
- Eliminates external fragmentation.
- Allows for a large virtual memory, so it is often used in more modern OSs.

Tradeoffs

- There is a tradeoff between external fragmentation and page faults in paging systems.
- We probably want slightly smaller page frames in a segmented/demand-paging framework.

Instruction Set Architectures, Part 1

[Figure: layered view of a computer system: Application, Compiler, Operating System, Instruction Set Architecture, Instr. Set Proc. and I/O system, Digital Design, Circuit Design.]

Some Ancient History

- The earliest (1940s) computers were one-of-a-kind.
- Among early commercial computers (1950s), each new model had an entirely different instruction set, programmed at the machine-code or assembler level.
- In 1957, IBM introduced FORTRAN:
  - much easier to write programs
  - remarkably, the code wasn't much slower than hand-written assembly
  - possible to use a new machine without reprogramming

Impact of High-Level Languages

Customers were delighted. Computer makers weren't so happy:
- they needed to write new compilers (and OSs) for each new model
- compilers were written in assembly code
- portable compilers didn't exist

IBM 360 Architecture

- The first ISA used for multiple models; IBM invested $5 billion.
- Six models were introduced in 1964, with performance varying by a factor of 50.
- 24-bit addresses (huge for 1964), though the largest model had only 512 KB of memory.
- A huge success! The architecture is still in use today, having evolved into the 370 (which added virtual addressing) and the 390 (32-bit addresses).

Let's Learn from Our Successes ...

In the early 70s, IBM took another big gamble: FS, a new layer between the ISA and high-level languages that put a lot of the OS function into hardware. It was a huge failure.

Moral: getting the right abstraction is hard!

The Instruction Set Architecture

The agreed-upon interface between the software that runs on a computer and the hardware that executes it.

[Figure: the layered diagram again, with the Instruction Set Architecture sitting between the software layers (Application, Compiler, Operating System) and the hardware layers (Instr. Set Proc. and I/O system, Digital Design, Circuit Design).]

The Instruction Set Architecture

The part of the architecture that is visible to the programmer:
- instruction formats
- opcodes (available instructions)
- number and types of registers
- storage access and addressing modes
- exceptional conditions

Overall Goals of an ISA

- Can be implemented by simple hardware.
- Can be implemented by fast hardware.
- Instructions do useful things.
- It is easy to write (or generate) machine code.

Key ISA Decisions

- Instruction length: are all instructions the same length?
- How many registers?
- Where do operands reside? E.g., can you add the contents of memory to a register?
- Instruction format: which bits designate what?
- Operands: how many? how big? how are memory addresses computed?
- Operations: what operations are provided?

Running Examples

We'll look at four example ISAs:
- Digital's VAX (1977): elegant
- Intel's x86 (1978): ugly, but successful (IBM PC)
- MIPS: the focus of the text, used in assorted machines
- PowerPC: used in Macs, IBM supercomputers, ...

VAX and x86 are CISC (Complex Instruction Set Computers); MIPS and PowerPC are RISC (Reduced Instruction Set Computers). Almost all machines of the 80s and 90s are RISC, including the VAX's successor, the DEC Alpha.

Instruction Length

Variable:
- x86: instructions vary from 1 to 17 bytes long
- VAX: from 1 to 54 bytes

Fixed:
- MIPS, PowerPC, and most other RISCs: all instructions are 4 bytes long

Instruction Length

Variable-length instructions (x86, VAX):
- (-) require multi-step fetch and decode
- (+) allow for a more flexible and compact instruction set

Fixed-length instructions (RISCs):
- (+) allow easy fetch and decode
- (+) simplify pipelining and parallelism
- (-) instruction bits are scarce

What's Going On?

- How is it possible that the ISAs of the 70s were much more complex than those of the 90s? Doesn't everything get more complex?
- Today, transistors are much smaller and cheaper, and design tools are better, so building a complex computer should be easier.
- How could IBM make two models of the 370 ISA in the same year that differed by 50x in performance?

Microcode

- Another layer, between the ISA and the hardware.
- One instruction expands into a sequence of microinstructions; a microinstruction specifies the values of individual wires.
- Each model can have a different micro-language: the low-end (cheapest) model uses simple hardware and long microprograms.
- We'll look at the rise and fall of microcode later. Meanwhile, back to ISAs ...

How Many Registers?

All computers have a small set of registers: memory to hold values that will be used soon. A typical instruction uses 2 or 3 register values.

Advantages of a small number of registers:
- fewer bits to specify which one
- less hardware
- faster access (shorter wires, fewer gates)
- faster context switch (when all registers need saving)

Advantages of a larger number:
- fewer loads and stores needed
- easier to do several operations at once

(In 141, "load" means moving data from memory to a register; "store" is the reverse.)

How Many Registers?

- VAX: 16 registers. R15 is the program counter (PC). Elegant! Loading R15 is a jump instruction.
- x86: 8 general-purpose registers (fine print: some restrictions apply), plus floating-point and special-purpose registers.
- Most RISCs have 32 integer and 32 floating-point registers, plus some special-purpose ones. The PowerPC has 8 four-bit condition registers, a count register (to hold a loop index), and others.
- The Itanium has 128 fixed, 128 float, and 64 predicate registers.

Where Do Operands Reside?

- Stack machine: Push loads memory into the first register (the top of the stack) and moves the other registers down; Pop does the reverse. Add combines the contents of the first two registers and moves the rest up.
- Accumulator machine: only one register (called the accumulator). Instructions include store and acc <- acc + mem.
- Register-memory machine: arithmetic instructions can use data in registers and/or memory.
- Load-store machine (aka register-register machine): arithmetic instructions can only use data in registers.

Load-Store Architectures

Can do:
  add r1 = r2 + r3
  load r3, M(address)
  store r1, M(address)

Can't do:
  add r1 = r2 + M(address)

- (-) more instructions
- (+) fast implementation (e.g., easy pipelining)

This forces heavy dependence on registers, which is exactly what you want in today's CPUs.

Where Do Operands Reside?

- VAX: register-memory. Very general: 0, 1, 2, or 3 operands can be in registers.
- x86: register-memory ... but the floating-point registers are a stack, so it is not as general as the VAX instructions.
- RISC machines: always load-store machines.
- I'm not aware of any accumulator machines in the last 20 years, but they may be used by embedded processors, and might conceivably be appropriate for the 141L project.

Comparing the Number of Instructions

Code sequences for C = A + B:

Stack     Accumulator   Register-Memory   Load-Store
Push A    Load A        Add C, A, B       Load R1,A
Push B    Add B                           Load R2,B
Add       Store C                         Add R3,R1,R2
Pop C                                     Store C,R3

    Alternate ISAs

Alternate ISAs

A = X*Y + X*Z

Stack | Accumulator | Reg-Mem | Load-Store