lect14.lecmar09 2006.cpu cache main memory performance

Upload: jnturaj

Post on 07-Apr-2018

  • 8/4/2019 Lect14.LecMar09 2006.CPU Cache Main Memory Performance

    Anshul Kumar, CSE IITD

    CSL718 : Main Memory

    CPU-Cache-Main Memory Performance

    9th Mar, 2006

    A Simple Model

    $t_{av} = t_c + p_m \cdot t_{c.miss}$, where

    $t_{av}$ = average memory access time as seen by the CPU

    $t_c$ = cache access time

    $p_m$ = miss probability (consider only read misses, if write penalties are hidden by buffers)

    $t_{c.miss}$ = cache miss penalty

    [Figure: block diagram of CPU, Cache, Memory]
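    As a quick numerical illustration of the simple model, the sketch below evaluates the average access time for one set of inputs. The function name and the sample values (cache access time 1, miss probability 5%, miss penalty 20) are assumptions chosen only for illustration.

```python
def average_access_time(t_c, p_m, t_c_miss):
    """t_av = t_c + p_m * t_c.miss, with all times in the same unit."""
    if not 0.0 <= p_m <= 1.0:
        raise ValueError("p_m must be a probability")
    return t_c + p_m * t_c_miss


# Hypothetical values: cache access time 1, 5% misses, miss penalty 20.
print(average_access_time(t_c=1, p_m=0.05, t_c_miss=20))
```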

    Cache miss penalty

    Depends on

    Various cache policies
        Read policy
        Load policy
        Write policy
        Write buffers etc.

    Main memory organization
        Interleaving
        Page mode

    Read Policies

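    Sequential Simple: $T_{eff} = (1 - p_m) \cdot 1 + p_m \cdot (T + 2)$

    Concurrent Simple: $T_{eff} = (1 - p_m) \cdot 1 + p_m \cdot (T + 1)$

    Sequential Forward: $T_{eff} = (1 - p_m) \cdot 1 + p_m \cdot (T + 1)$

    Concurrent Forward: $T_{eff} = (1 - p_m) \cdot 1 + p_m \cdot T$

    [Figure: timing diagrams of cache and memory activity for the four policies, with cache accesses of duration 1 and memory accesses of duration T]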
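    The four read policies differ only in the cost charged on a miss, so they can be compared with one small function. This is a sketch: the function name, the policy keys, and the sample values of p_m and T are assumptions for illustration.

```python
def t_eff(p_m, T, policy):
    """Effective access time: a hit costs 1, a miss costs T plus 0, 1 or 2."""
    miss_cost = {
        "sequential simple": T + 2,
        "concurrent simple": T + 1,
        "sequential forward": T + 1,
        "concurrent forward": T,
    }
    return (1 - p_m) * 1 + p_m * miss_cost[policy]


# Hypothetical values: 10% misses, memory access time T = 10.
for name in ("sequential simple", "concurrent simple",
             "sequential forward", "concurrent forward"):
    print(f"{name:20s} {t_eff(0.1, 10, name):.2f}")
```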

    Load policies

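    Example: a block of 4 AUs (numbered 0 1 2 3), with a cache miss on AU 1

    Block Load

    Load Forward

    Fetch Bypass (wrap-around load)

    [Figure: which AUs of the block are loaded, and in what order, under each policy]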

    Analyzing Write Policies: CPU time

    Policy                    Read hit   Read miss   Write hit   Write miss
    Hit: WB, Miss: WB         1          Tb + i      1           1
    Hit: WB, Miss: WTWA       1          Tb + i      1           1
    Hit: WB, Miss: WTNWA      1          Tb + i      1           1
    Hit: WT, Miss: WB         1          Tb + i      1           1
    Hit: WT, Miss: WTWA       1          Tb + i      1           1
    Hit: WT, Miss: WTNWA      1          Tb + i      1           1

    i depends on read policy

    Analyzing Write Policies: Bus time

    Policy                    Read hit   Read miss    Write hit   Write miss
    Hit: WB, Miss: WB         0          Tb(2 - Pc)   0           Tb(2 - Pc)
    Hit: WB, Miss: WTWA       0          Tb(2 - Pc)   0           Tb(2 - Pc) + Tw
    Hit: WB, Miss: WTNWA      0          Tb(2 - Pc)   0           Tw
    Hit: WT, Miss: WB         0          Tb(2 - Pc)   Tw          Tb(2 - Pc)
    Hit: WT, Miss: WTWA       0          Tb           Tw          Tb + Tw
    Hit: WT, Miss: WTNWA      0          Tb           Tw          Tw
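    One way to use the bus-time table is to weight its four columns by how often each case occurs. The sketch below does this under assumptions that are not in the slides: a single hit ratio shared by reads and writes, a given fraction of reads, and hypothetical values for Tb, Tw and Pc. Function names are illustrative.

```python
def bus_time_table(Tb, Tw, Pc):
    """Bus time per access as (read hit, read miss, write hit, write miss)."""
    wb = Tb * (2 - Pc)
    return {
        ("WB", "WB"): (0, wb, 0, wb),
        ("WB", "WTWA"): (0, wb, 0, wb + Tw),
        ("WB", "WTNWA"): (0, wb, 0, Tw),
        ("WT", "WB"): (0, wb, Tw, wb),
        ("WT", "WTWA"): (0, Tb, Tw, Tb + Tw),
        ("WT", "WTNWA"): (0, Tb, Tw, Tw),
    }


def average_bus_time(row, read_fraction, hit_ratio):
    """Weighted average of one table row (assumes one hit ratio for all accesses)."""
    read_hit, read_miss, write_hit, write_miss = row
    miss_ratio = 1 - hit_ratio
    reads = hit_ratio * read_hit + miss_ratio * read_miss
    writes = hit_ratio * write_hit + miss_ratio * write_miss
    return read_fraction * reads + (1 - read_fraction) * writes


# Hypothetical values: Tb = 8, Tw = 1, Pc = 0.5, 75% reads, 90% hits.
table = bus_time_table(Tb=8, Tw=1, Pc=0.5)
for policy, row in table.items():
    print(policy, round(average_bus_time(row, 0.75, 0.9), 3))
```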

    Interleaving with Fast Page Mode

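    [Equation: line access time $T_{line\ access}$ expressed in terms of $T_a$, $T_c$, $T_{bus}$, $L$ and $m$; the expression is not recoverable from the transcript]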

    A Refined Model

    $t_{av} = t_c + p_m \cdot (t_{c.miss} + t_{\text{interference}} + t_{\text{w-interference}} + t_{\text{IO-interference}})$, where

    $t_{\text{interference}}$ = interference among line transfers

    $t_{\text{w-interference}}$ = interference between word writes and line transfers

    $t_{\text{IO-interference}}$ = interference between I/O and line transfers

    Interference among line transfers

    What happens when another miss occurs in the interval $t_{busy} = t_{m.miss} - t_{c.miss}$?

    $t_{\text{interference}}$ = additional delay due to this

    = expected number of misses during $t_{busy}$ × delay per miss

    = $(\lambda \cdot t_{busy} \cdot p_m) \cdot (t_{busy} / 2)$

    where $\lambda$ = memory request rate of processor

    [Figure: timeline marking $t_c$, $t_{c.miss}$ and $t_{m.miss}$; CPU blocked, CPU executing; memory busy]
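    The interference estimate is a product of two factors, which the following sketch computes step by step. The function name and the sample numbers are assumptions for illustration only.

```python
def line_transfer_interference(lam, p_m, t_m_miss, t_c_miss):
    """Expected misses during t_busy times the mean residual delay t_busy / 2."""
    t_busy = t_m_miss - t_c_miss
    expected_misses = lam * t_busy * p_m
    return expected_misses * (t_busy / 2)


# Hypothetical values: 0.5 requests per cycle, 5% misses,
# t_m.miss = 30 cycles and t_c.miss = 20 cycles, so t_busy = 10 cycles.
print(line_transfer_interference(lam=0.5, p_m=0.05, t_m_miss=30, t_c_miss=20))
```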

    Interference: I/Os and writes

    delay = probability that memory is busy when a request arrives × average waiting period

    What happens when memory is found to be busy serving one request and some other requests are waiting?

    [Figure: timeline of request arrivals against memory-busy periods, with requests marked served, waiting, served]

    I/O Interference

    $t_{\text{IO-interference}}$ = delay due to I/O contention

    = probability that memory is occupied with I/O × average time taken to complete ongoing I/O

    = $(\rho) \cdot (t_{service} + t_{\text{IO-wait}}) / 2$

    $t_{service}$ = time to service (block read/write time)

    $t_{\text{IO-wait}}$ = waiting time
        = 0, if CPU has a higher priority
        ≠ 0, otherwise (estimate using queuing model)

    Write Interference Delay

    $t_{\text{w-interference}}$ = probability that a write-through is occupying the memory when a read miss occurs × average time taken to complete the ongoing write

    Memory performance using queuing model

    [Figure: arrival of requests (from processor/cache) → requests queued for service → servicing of requests (by memory)]

    Statistical behaviour of arrivals?

    Statistical behaviour of service?

    Model nomenclature: arrival / service / number of servers

    M / G / 1      G: General
    M / M / 1      M: Poisson/Exponential
    M / D / 1      D: Constant
    MB / D / 1     MB: Binomial

    Modeling memory requests

    $T$: interval (memory cycle time)

    $\tau$: processor cycle

    prob of a request in one cycle = $p$

    prob of no request in one cycle = $1 - p$

    prob of no request in $T/\tau$ cycles = $(1 - p)^{T/\tau}$

    prob of at least one request in $T/\tau$ cycles = $1 - (1 - p)^{T/\tau}$

    prob of $k$ requests in $n$ ($= T/\tau$) cycles = $\binom{n}{k} p^k (1 - p)^{n-k}$ (Binomial distribution)

    expected no. of requests in $n$ cycles = $n p$
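    The binomial request model can be checked numerically. The sketch below uses `n` for the number of processor cycles in one memory cycle and `p` for the per-cycle request probability; the function names and the sample values n = 8, p = 0.25 are assumptions for illustration.

```python
from math import comb


def prob_k_requests(k, n, p):
    """Binomial probability of exactly k requests in n processor cycles."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)


def prob_at_least_one(n, p):
    """Probability that at least one request arrives in n cycles."""
    return 1 - (1 - p) ** n


n, p = 8, 0.25  # hypothetical: 8 processor cycles per memory cycle
expected = sum(k * prob_k_requests(k, n, p) for k in range(n + 1))
print(prob_at_least_one(n, p), expected)  # expected equals n * p
```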

    Poisson Approximation

    If processor cycles are small

    (i.e., $\tau \to 0$, $p \to 0$, $n \to \infty$, $n p \to \lambda T$),

    Binomial distribution → Poisson distribution, request rate = $\lambda$

    prob of $k$ requests in interval $T$ = $\frac{(\lambda T)^k e^{-\lambda T}}{k!}$

    expected no. of requests in interval $T$ = $\lambda T$

    Interval between two consecutive requests has an exponential distribution, prob(inter-arrival interval ≤ $t$) = $1 - e^{-\lambda t}$
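    To see how close the two distributions are when the processor cycle is short, the sketch below compares a binomial with many cycles and small p against a Poisson with the same mean. The function names and the choice of mean 2 with n = 10000 are assumptions for illustration.

```python
from math import comb, exp, factorial


def binomial_pmf(k, n, p):
    """Probability of k requests in n cycles with per-cycle probability p."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)


def poisson_pmf(k, mean):
    """Probability of k requests when the expected count (lambda * T) is `mean`."""
    return mean**k * exp(-mean) / factorial(k)


mean = 2.0      # hypothetical expected number of requests per interval T
n = 10_000      # many short processor cycles per interval
p = mean / n    # keeps n * p equal to the mean
for k in range(6):
    print(k, round(binomial_pmf(k, n, p), 5), round(poisson_pmf(k, mean), 5))
```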

    Modeling Service

    Each request is served in constant time
        e.g. cache write-through requests, cache block transfer requests

    or

    Service time has an exponential distribution
        e.g. I/O requests with varying block sizes, where small blocks are more common than large blocks

    M / G / 1 Model

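    Average waiting time = $T_w = \frac{1}{\lambda} \cdot \frac{\rho^2 (1 + c^2)}{2 (1 - \rho)}$

    Average queue length = $Q = \frac{\rho^2 (1 + c^2)}{2 (1 - \rho)}$

    where

    $\rho$ = occupancy of server = $\lambda / \mu$

    $\mu$ = average service rate

    $c^2 = \mu^2 \sigma^2$

    $\sigma^2$ = variance of service time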
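    A minimal sketch of the M/G/1 results, assuming the standard Pollaczek-Khinchine form $T_w = \frac{1}{\lambda} \cdot \frac{\rho^2 (1 + c^2)}{2(1 - \rho)}$ and $Q = \lambda T_w$, where $c^2$ is the squared coefficient of variation of the service time. Function and parameter names are illustrative; setting `c2` to 1 or 0 gives the exponential-service and constant-service cases.

```python
def mg1_queue_length(lam, mu, c2):
    """Average number of waiting requests Q for arrival rate lam, service rate mu."""
    rho = lam / mu
    if not 0 <= rho < 1:
        raise ValueError("occupancy lam/mu must be in [0, 1)")
    return rho**2 * (1 + c2) / (2 * (1 - rho))


def mg1_waiting_time(lam, mu, c2):
    """Average waiting time T_w = Q / lam (Little's law)."""
    return mg1_queue_length(lam, mu, c2) / lam


# Hypothetical rates: lam = 0.5, mu = 1.0, so rho = 0.5.
print(mg1_waiting_time(0.5, 1.0, c2=1), mg1_queue_length(0.5, 1.0, c2=1))
print(mg1_waiting_time(0.5, 1.0, c2=0), mg1_queue_length(0.5, 1.0, c2=0))
```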

    Special cases: M/M/1, M/D/1

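    M/M/1: $c = 1$

    Average waiting time = $T_w = \frac{1}{\lambda} \cdot \frac{\rho^2}{1 - \rho}$

    Average queue length = $Q = \frac{\rho^2}{1 - \rho}$

    M/D/1: $c = 0$

    Average waiting time = $T_w = \frac{1}{\lambda} \cdot \frac{\rho^2}{2 (1 - \rho)}$

    Average queue length = $Q = \frac{\rho^2}{2 (1 - \rho)}$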

    M/D/1 with low server occupancy

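    Average waiting time = $T_w = \frac{1}{\lambda} \cdot \frac{\rho^2}{2 (1 - \rho)}$

    Average queue length = $Q = \frac{\rho^2}{2 (1 - \rho)}$

    When $\rho$ is small, $T_w \approx \frac{1}{\lambda} \cdot \frac{\rho^2}{2} = \frac{1}{2} \lambda \left(\frac{1}{\mu}\right)^2$

    Compare this with $\frac{1}{2} \lambda \, p_m \, t_{busy}^2$, the delay due to interference among line transfers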
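    The comparison can be made concrete by treating memory as an M/D/1 server that sees miss requests at rate $\lambda p_m$ and serves each in constant time $t_{busy}$ (so $\mu = 1/t_{busy}$). That mapping and the sample numbers are assumptions made here for illustration; with them, the low-occupancy approximation equals $\frac{1}{2} \lambda p_m t_{busy}^2$.

```python
def md1_wait_exact(lam, mu):
    """M/D/1 waiting time (1/lam) * rho^2 / (2 * (1 - rho))."""
    rho = lam / mu
    return rho**2 / (2 * lam * (1 - rho))


def md1_wait_low_occupancy(lam, mu):
    """Approximation for small rho: (1/lam) * rho^2 / 2 = lam / (2 * mu^2)."""
    return lam / (2 * mu**2)


# Hypothetical values: processor rate 0.5, p_m = 0.05, t_busy = 10.
lam_mem = 0.5 * 0.05   # miss requests reaching memory per cycle
mu_mem = 1 / 10        # one request served every t_busy cycles
print(md1_wait_low_occupancy(lam_mem, mu_mem))  # same as 0.5 * 0.5 * 0.05 * 10**2
print(md1_wait_exact(lam_mem, mu_mem))          # larger by the factor 1 / (1 - rho)
```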

    Designing buffer to hold the queue

    How to design a buffer so that buffer overflow, or stalling due to a full buffer, stays within a certain limit?

    For the M/M/1 model,

    prob(queue size > buffer size BF) = $\rho^{BF+1}$

    Choose BF so that this probability is below a desired value.
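    Choosing BF amounts to finding the smallest buffer size for which $\rho^{BF+1}$ drops to the target probability. The sketch below does this by direct search; the function name and the sample occupancy and target are assumptions for illustration.

```python
def min_buffer_size(rho, max_overflow_prob):
    """Smallest BF such that rho**(BF + 1) <= max_overflow_prob (M/M/1 model)."""
    if not 0 < rho < 1:
        raise ValueError("rho must be in (0, 1)")
    if not 0 < max_overflow_prob < 1:
        raise ValueError("max_overflow_prob must be in (0, 1)")
    bf = 0
    while rho ** (bf + 1) > max_overflow_prob:
        bf += 1
    return bf


# Hypothetical target: overflow probability of at most 1% at 50% occupancy.
print(min_buffer_size(rho=0.5, max_overflow_prob=0.01))
```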

    Open and Closed Queues

    [Figure: arrival of requests (from processor/cache) → requests queued for service → servicing of requests (by memory)]

    Open queue: the processor is not blocked by queuing delays and the request rate remains unaffected

    Closed queue: the processor is blocked due to queuing delays and the request rate drops

    Open and Closed Queues

    [Figure: arrival of requests (from processor/cache) → requests queued for service → servicing of requests (by memory)]

                         Queue                 Server
    Time                 $T_w$                 $1/\mu$
    Number (open)        $Q = \lambda T_w$     $\rho = \lambda/\mu$
    Number (closed)      $Q_a$                 $\rho_a$

    occupancy (open queue) = $\rho$ = occupancy (closed queue) + waiting (closed queue) = $\rho_a + Q_a$

    M/D/1 Closed Queue

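    Reduced request rate = $\lambda_a$

    Reduced occupancy = $\rho_a = \lambda_a / \mu$

    Requests being served = $\rho_a$

    Requests waiting = $Q_a = \frac{\rho_a^2}{2 (1 - \rho_a)}$

    $\rho = \rho_a + \frac{\rho_a^2}{2 (1 - \rho_a)}$

    $\rho_a^2 - 2 (1 + \rho) \rho_a + 2 \rho = 0$

    $\rho_a = (1 + \rho) - \sqrt{(1 + \rho)^2 - 2 \rho} = 1 + \rho - \sqrt{1 + \rho^2}$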
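    A short numerical check of the closed-queue occupancy, assuming the balance $\rho = \rho_a + Q_a$ with $Q_a = \rho_a^2 / (2(1 - \rho_a))$, whose root in $(0, 1)$ is $\rho_a = 1 + \rho - \sqrt{1 + \rho^2}$. Function names and the sample occupancies are illustrative.

```python
from math import sqrt


def closed_queue_occupancy(rho):
    """Achieved occupancy rho_a for an offered (open-queue) occupancy rho."""
    return 1 + rho - sqrt(1 + rho**2)


def closed_queue_waiting(rho_a):
    """Requests waiting in the M/D/1 closed queue, Q_a."""
    return rho_a**2 / (2 * (1 - rho_a))


for rho in (0.2, 0.5, 0.9):  # hypothetical offered occupancies
    rho_a = closed_queue_occupancy(rho)
    # rho_a + Q_a should reproduce the offered occupancy rho.
    print(rho, round(rho_a, 4), round(rho_a + closed_queue_waiting(rho_a), 4))
```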

    Deriving queue length, wait time

    Let $t_i$ = time when request $i$ is being served

    $r_i$ = no. of arrivals during $t_i$

    $n_i$ = queue length at the end of $t_i$, including item in service

    Assume occupancy of server = $\rho = \lambda / \mu < 1$, so that the process reaches a steady state

    Expected values:

    $E(t_i) = E(t) = T = 1/\mu$

    $E(r_i) = E(r) = \lambda E(t) = \lambda / \mu = \rho$

    $E(n_i) = E(n) = N$

    Relating $n_{i+1}$ to $n_i$

    $n_{i+1} = n_i$ + arrivals − departures

    two cases need to be considered:

    i) $n_i > 0$

    ii) $n_i = 0$

    [Figure: queue of requests $C_{i+3}$, $C_{i+2}$, $C_{i+1}$, $C_i$ with length $n_i$]

    When $n_i > 0$

    $C_{i+1}$ arrived before $C_i$ left

    $n_{i+1} = n_i + r_{i+1} - 1$

    [Figure: timeline with $C_i$ served during $t_i$ and $C_{i+1}$ served during $t_{i+1}$; $C_i$ leaves, then $C_{i+1}$ leaves]

    When $n_i = 0$

    $C_{i+1}$ arrived after $C_i$ left

    $n_{i+1} = n_i + 1 + r_{i+1} - 1 = n_i + r_{i+1}$

    [Figure: timeline with $C_i$ served during $t_i$; $C_i$ leaves, $C_{i+1}$ arrives later and is served during $t_{i+1}$, then leaves]

    Combining the two cases

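    $n_{i+1} = n_i + r_{i+1} - 1 + \delta_i$

    where $\delta_i = 0$ when $n_i > 0$, and $\delta_i = 1$ when $n_i = 0$

    note that $n_i \delta_i = 0$ and $\delta_i^2 = \delta_i$

    $E(n_{i+1}) = E(n_i) + E(r_{i+1}) - 1 + E(\delta_i)$

    in steady state, $E(n) = E(n) + E(r) - 1 + E(\delta)$

    that is, $E(\delta) = 1 - E(r) = 1 - \rho$, so prob($n > 0$) = $\rho$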

    Combining the two cases

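    $n_{i+1} = n_i + r_{i+1} - 1 + \delta_i$

    $n_{i+1}^2 = n_i^2 + (r_{i+1} - 1)^2 + \delta_i^2 + 2 n_i (r_{i+1} - 1) + 2 (r_{i+1} - 1) \delta_i + 2 n_i \delta_i$

    $n_{i+1}^2 = n_i^2 + (r_{i+1} - 1)^2 + \delta_i + 2 n_i (r_{i+1} - 1) + 2 (r_{i+1} - 1) \delta_i$

    $E(n_{i+1}^2) = E(n_i^2) + E[(r_{i+1} - 1)^2] + E(\delta_i) + 2 E[n_i (r_{i+1} - 1)] + 2 E[(r_{i+1} - 1) \delta_i]$

    $0 = E[(r - 1)^2] + E(\delta) + 2 E[n (r - 1)] + 2 E[(r - 1) \delta]$

    $0 = E(r^2) - 2\rho + 1 + (1 - \rho) + 2 E(n) (\rho - 1) + 2 (\rho - 1)(1 - \rho)$

    $2 E(n) (1 - \rho) = E(r^2) - 2\rho^2 + \rho$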

    continued

    $2 E(n) (1 - \rho) = E(r^2) - 2\rho^2 + \rho$

    $N = E(n) = \frac{E(r^2) - 2\rho^2 + \rho}{2 (1 - \rho)} = \rho + \frac{E(r^2) - \rho}{2 (1 - \rho)}$

    This is valid for G/G/1

    Consider Poisson arrival

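    $P(r_i) = \frac{(\lambda t_i)^{r_i} e^{-\lambda t_i}}{r_i!}$

    mean $E(r_i) = \lambda t_i$

    variance $\sigma_{r_i}^2 = \lambda t_i$

    $\sigma_{r_i}^2 = E(r_i^2) - [E(r_i)]^2$

    $E(r_i^2) = \sigma_{r_i}^2 + [E(r_i)]^2 = \lambda t_i + \lambda^2 t_i^2$

    Take expectation over $i$:

    $E(r^2) = \lambda E(t) + \lambda^2 E(t^2)$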

    continued

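    mean $E(t) = 1/\mu$, variance $\sigma_t^2$

    $E(t^2) = \sigma_t^2 + [E(t)]^2 = \sigma_t^2 + 1/\mu^2$

    Recall $E(r^2) = \lambda E(t) + \lambda^2 E(t^2)$

    Therefore, $E(r^2) = \lambda/\mu + \lambda^2 (\sigma_t^2 + 1/\mu^2) = \rho + \lambda^2 \sigma_t^2 + \rho^2$

    $N = E(n) = \frac{E(r^2) - 2\rho^2 + \rho}{2 (1 - \rho)} = \rho + \frac{\rho^2 + \lambda^2 \sigma_t^2}{2 (1 - \rho)} = \rho + \frac{\rho^2 (1 + c^2)}{2 (1 - \rho)}$

    where $c^2 = \mu^2 \sigma_t^2$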
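    The result $N = \rho + \frac{\rho^2 (1 + c^2)}{2(1 - \rho)}$ can be sanity-checked by simulating the recurrence $n_{i+1} = n_i + r_{i+1} - 1 + \delta_i$ directly. The sketch below does so for constant service time ($c^2 = 0$), where the arrivals during one service are Poisson with mean $\rho$. The function names, the product-of-uniforms Poisson sampler, the seed and the step count are all choices made here for illustration.

```python
import math
import random


def poisson_sample(mean, rng):
    """Draw a Poisson variate by multiplying uniforms until below exp(-mean)."""
    limit = math.exp(-mean)
    count, product = 0, rng.random()
    while product > limit:
        count += 1
        product *= rng.random()
    return count


def simulate_md1_mean_n(rho, steps, seed=0):
    """Average n_i over `steps` departures of an M/D/1 queue with occupancy rho."""
    rng = random.Random(seed)
    n, total = 0, 0
    for _ in range(steps):
        delta = 1 if n == 0 else 0            # delta_i = 1 only when the queue was empty
        n = n + poisson_sample(rho, rng) - 1 + delta
        total += n
    return total / steps


def pk_mean_n(rho, c2):
    """N = rho + rho^2 (1 + c^2) / (2 (1 - rho))."""
    return rho + rho**2 * (1 + c2) / (2 * (1 - rho))


print(simulate_md1_mean_n(0.5, 200_000, seed=1), pk_mean_n(0.5, 0.0))
```

    The simulated average should land close to the formula value, with the gap shrinking as the number of steps grows.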

    Direct Derivation for M/M/1

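    $P(n; t)$ = prob that there are $n$ requests in the system at time $t$ (in queue + in service)

    $P(n; t + \Delta t) = P(n; t)(1 - \lambda \Delta t - \mu \Delta t) + P(n-1; t) \lambda \Delta t + P(n+1; t) \mu \Delta t$

    $P(0; t + \Delta t) = P(0; t)(1 - \lambda \Delta t) + P(1; t) \mu \Delta t$

    Prob of more than one event in $\Delta t$ is neglected ($\Delta t^2$ term)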

    Direct Derivation for M/M/1

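    $dP(n; t)/dt = P(n; t)(-\lambda - \mu) + P(n-1; t) \lambda + P(n+1; t) \mu$

    $dP(0; t)/dt = P(0; t)(-\lambda) + P(1; t) \mu$

    In steady state, we can drop "; t" and the derivatives tend to 0:

    $0 = P(n)(-\lambda - \mu) + P(n-1) \lambda + P(n+1) \mu$

    $0 = P(0)(-\lambda) + P(1) \mu$

    $\lambda P(n) - \mu P(n+1) = \lambda P(n-1) - \mu P(n)$

    $\lambda P(0) - \mu P(1) = 0$

    $\lambda P(n-1) - \mu P(n) = 0 \Rightarrow P(n) = \rho P(n-1)$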
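    The steady-state claim can be checked by stepping the difference equations forward in time until the probabilities stop changing. The sketch below does this with a fixed time step on a queue truncated at `max_n` requests; the truncation, the step size, the number of steps and the function name are all assumptions made for the illustration.

```python
def mm1_steady_state(lam, mu, max_n=40, dt=0.05, steps=8000):
    """Iterate P(n; t + dt) from P(n; t), starting with an empty system."""
    p = [1.0] + [0.0] * max_n
    for _ in range(steps):
        new = p[:]
        for n in range(max_n + 1):
            inflow = 0.0
            if n > 0:
                inflow += lam * p[n - 1]      # an arrival moves n-1 to n
            if n < max_n:
                inflow += mu * p[n + 1]       # a departure moves n+1 to n
            out_rate = (lam if n < max_n else 0.0) + (mu if n > 0 else 0.0)
            new[n] = p[n] + dt * (inflow - out_rate * p[n])
        p = new
    return p


# Hypothetical rates: lam = 0.5, mu = 1.0, so rho = 0.5.
p = mm1_steady_state(0.5, 1.0)
for n in range(1, 5):
    print(n, round(p[n] / p[n - 1], 6))  # successive ratios settle near rho
```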

    Direct Derivation for M/M/1

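    $P(n) = \rho P(n-1)$

    $P(n) = \rho^n P(0)$

    $\sum_{i=0}^{\infty} P(i) = 1 \Rightarrow P(0) \sum_{i=0}^{\infty} \rho^i = \frac{P(0)}{1 - \rho} = 1$

    $P(0) = 1 - \rho$ and $P(n) = (1 - \rho) \rho^n$

    $E(n) = \sum_{i=0}^{\infty} i P(i) = (1 - \rho) \sum_{i=0}^{\infty} i \rho^i = (1 - \rho) \frac{\rho}{(1 - \rho)^2} = \frac{\rho}{1 - \rho}$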
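    The closed forms for the M/M/1 state probabilities and mean can be verified by summing the series numerically. The function names, the truncation at 500 terms and the sample occupancies are assumptions for illustration.

```python
def mm1_state_prob(n, rho):
    """P(n) = (1 - rho) * rho**n for an M/M/1 queue with occupancy rho."""
    return (1 - rho) * rho**n


def mm1_mean_by_sum(rho, terms=500):
    """E(n) computed as a truncated sum of i * P(i)."""
    return sum(i * mm1_state_prob(i, rho) for i in range(terms))


for rho in (0.5, 0.8):  # hypothetical occupancies
    total = sum(mm1_state_prob(i, rho) for i in range(500))
    print(rho, round(total, 9), round(mm1_mean_by_sum(rho), 9), rho / (1 - rho))
```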

    Direct Derivation for M/M/1

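    Prob(items in queue + server ≤ $k$) = $\sum_{i=0}^{k} P(i) = (1 - \rho) \sum_{i=0}^{k} \rho^i = \frac{(1 - \rho)(1 - \rho^{k+1})}{1 - \rho} = 1 - \rho^{k+1}$

    Prob(items in queue + server > $k$) = $\rho^{k+1}$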