performance evaluation high parallel systems architecture


Upload: hassan-ahmed-khan

Post on 10-May-2017


Page 1: Performance Evaluation High Parallel Systems Architecture

Computer Systems Modeling (CS-417) 1


CS-417: COMPUTER SYSTEMS MODELING

Performance Evaluation of High Parallel Systems Architecture

S. Zaffar Qasim

Assistant Professor (CIS)

BE (CIS)

Spring Semester 2014


Evaluation of High Parallel Systems Architecture

• The system chosen here is more representative of realistic systems, in which multiple servers are interconnected to serve several users.

• Here we essentially have the problem of allocating memory to processors.

• A processor can have all of the memory, none of it, or anything in between.

Fig 1: Multibank shared memory model


• Memory is allocated in units of whole memory modules.

• That is, a CPU cannot share a memory module with another CPU during a cycle.

• On each CPU cycle, each processor makes a memory request.

• If the requested memory module is free, the request is filled; otherwise, the CPU must wait until the next cycle.

• When several processors request the same memory module, only one is served (chosen at random from those requesting).

• New memory requests for each processor are chosen randomly from the M memory modules using a uniform distribution.

• Let the system state be the number of memory requests for each memory module:

$K = (k_1, k_2, k_3, \ldots, k_M)$, where $k_i$ is the number of processor requests for memory module i.
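The cycle rules above can be exercised directly. The sketch below is a minimal Monte Carlo simulation, assuming that a served processor issues its next request at the end of the same cycle and that an unserved processor simply keeps its request; the function name `simulate_ep` and its parameters are illustrative, not from the slides. Because an N = 1, M = 1 system serves exactly one request per cycle, the average number of requests served per cycle estimates EP(N, M).

```python
import random


def simulate_ep(n_procs: int, n_mods: int, cycles: int = 200_000, seed: int = 0) -> float:
    """Monte Carlo estimate of EP(N, M): mean number of requests served per cycle."""
    rng = random.Random(seed)
    # Every processor starts with one request to a uniformly chosen module.
    requests = [rng.randrange(n_mods) for _ in range(n_procs)]
    served_total = 0
    for _ in range(cycles):
        # Group pending requests by module; the group sizes form the state K.
        by_module: dict[int, list[int]] = {}
        for proc, mod in enumerate(requests):
            by_module.setdefault(mod, []).append(proc)
        # Each requested module serves exactly one processor, chosen at random;
        # that processor issues a new uniform request, the others keep waiting.
        for waiting in by_module.values():
            winner = rng.choice(waiting)
            requests[winner] = rng.randrange(n_mods)
        served_total += len(by_module)
    return served_total / cycles


if __name__ == "__main__":
    print(simulate_ep(2, 2))  # a noisy estimate; the analytic EP(2,2) is 1.5
```

With one module, or one processor, exactly one request is served every cycle, so the estimate is exactly 1.0 in those cases.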


• At the start of a cycle, each processor has exactly one outstanding request, so the sum of all requests equals the number of processors in the system, N:

$k_1 + k_2 + k_3 + \cdots + k_M = N$

• The total number of possible states is the number of ways N processor requests can be distributed among M memory modules or, in other terms, the number of ways to allocate N balls to M cells.

• For N = 2 and M = 4 (see Fig 2), the possible ways to distribute the two requests among the four memory modules (with the processors indistinguishable from each other) are shown in Table 1.

Fig 2: Multiprocessor system with N = 2 and M= 4.


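• The number of states is found by:

$\binom{N+M-1}{N} = \frac{(N+M-1)!}{N!\,(M-1)!}$

which gives $\binom{5}{2} = 10$ states for N = 2 and M = 4.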

Table 1
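The states behind Table 1 can be listed mechanically. The sketch below enumerates every state K for given N and M and checks the count against the binomial coefficient C(N + M - 1, N); `enumerate_states` is a hypothetical helper, and the order in which it lists states may differ from Table 1.

```python
from itertools import combinations_with_replacement
from math import comb


def enumerate_states(n_procs: int, n_mods: int) -> list[tuple[int, ...]]:
    """List every state K = (k_1, ..., k_M) with k_1 + ... + k_M = N."""
    states = []
    # Each multiset of N module indices is one placement of N indistinguishable requests.
    for combo in combinations_with_replacement(range(n_mods), n_procs):
        k = [0] * n_mods
        for mod in combo:
            k[mod] += 1
        states.append(tuple(k))
    return states


if __name__ == "__main__":
    table = enumerate_states(2, 4)
    for k in table:
        print(k)
    # The enumeration and the closed form agree: both give 10 for N = 2, M = 4.
    print(len(table), comb(2 + 4 - 1, 2))
```

Using multisets of module indices is the "balls into cells" view: each processor request is a ball, and only how many land in each cell matters.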


• As the number of processors requesting memory modules and the number of memory modules increase,

o the number of possible states grows very quickly,

o making this analysis difficult even for relatively small problems, as shown in Table 2.

Table 2
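The growth that Table 2 describes follows from the state count C(N + M - 1, N). The short sketch below evaluates it for a few square configurations; `state_count` is an illustrative name, and the sizes chosen are ours, not the rows of Table 2.

```python
from math import comb


def state_count(n_procs: int, n_mods: int) -> int:
    """Ways to place N indistinguishable requests into M modules: C(N + M - 1, N)."""
    return comb(n_procs + n_mods - 1, n_procs)


if __name__ == "__main__":
    for size in (2, 4, 8, 16):
        print(f"N = M = {size:2d}: {state_count(size, size):,} states")
```

Going from N = M = 8 (6,435 states) to N = M = 16 (300,540,195 states) shows why building and solving the full transition matrix quickly becomes impractical.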


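• Let $H = (h_1, h_2, \ldots, h_M)$ represent the intermediate state, after the memory accesses requested on a cycle have been filled and before the new requests have been made. Since each module with at least one pending request serves exactly one of them, $h_i = \max(k_i - 1, 0)$.

• Let G represent a new (feasible) system state:

$G = (g_1, g_2, g_3, \ldots, g_M)$

• First, note that G is reachable from K in one cycle only if $g_i \ge h_i$ for every i, and that the number of new requests is $x = \sum_{i=1}^{M} (g_i - h_i)$, one for each module that served a request.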


Properties

1. If G is reachable from K in one cycle, the probability that it will in fact be the next state is given by:

$P(K \to G) = \frac{x!}{\prod_{i=1}^{M} (g_i - h_i)!} \left(\frac{1}{M}\right)^{x}$

where x represents the number of new requests.

2. The system can be described by a Markov chain, since the next-state probabilities at any time depend only on the current state.

3. The system is aperiodic, since a one-step transition from a state to itself is possible at any time.

4. The system is irreducible, since any state can reach any other state in a finite number of steps.
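Property 1 can be checked numerically. The sketch below assumes that the x processors served on a cycle choose their next modules independently and uniformly, so that the one-step probability is the multinomial x! / ∏(g_i - h_i)! · (1/M)^x with h_i = max(k_i - 1, 0); the function name `transition_prob` is hypothetical.

```python
from math import factorial


def transition_prob(k: tuple[int, ...], g: tuple[int, ...], n_mods: int) -> float:
    """One-step probability P(K -> G) for the synchronous multibank model."""
    h = [max(ki - 1, 0) for ki in k]          # intermediate state H: one request served per busy module
    new = [gi - hi for gi, hi in zip(g, h)]   # new requests that must land on each module
    x = sum(1 for ki in k if ki > 0)          # number of new requests this cycle
    if any(d < 0 for d in new) or sum(new) != x:
        return 0.0                            # G is not reachable from K in one cycle
    ways = factorial(x)
    for d in new:
        ways //= factorial(d)                 # multinomial coefficient x! / prod (g_i - h_i)!
    return ways / n_mods ** x                 # each new request picks a module with prob. 1/M
```

For N = M = 2 this gives P((2,0) → (1,1)) = 1/2 and P((1,1) → (2,0)) = 1/4, and each row of probabilities sums to 1.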


Performance Assessment

• Since these conditions hold (and the number of states is finite), there is an equilibrium state probability distribution Π such that:

$\Pi = \Pi P$

where P is the state transition matrix and $\Pi = (\Pi_1, \Pi_2, \Pi_3, \Pi_4, \ldots, \Pi_j)$, with the probabilities summing to 1.

• A performance measure typically computed for such system configurations is the effective processor power of the N-processor, M-memory-module system:

o EP(N, M) = the expected number of instructions executed per second, relative to an N = 1, M = 1 system.

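• Let Proc(i) represent the number of memory requests serviced (instructions executed) when the system is in state i, that is, the number of memory modules with at least one pending request. Then:

$EP(N, M) = \sum_i \mathrm{Proc}(i)\,\Pi_i$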
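Putting the pieces together, EP(N, M) can be computed for small systems by solving Π = ΠP and weighting each state by Proc(i). The sketch below is one possible approach, not the slides' method: it builds each row of P by brute-force enumeration of the M^x equally likely choices of new requests (so it only suits small N and M), and it finds Π by repeated multiplication (power iteration), which converges because the chain is aperiodic and irreducible. All function names are hypothetical.

```python
from itertools import combinations_with_replacement, product


def enumerate_states(n: int, m: int) -> list[tuple[int, ...]]:
    """All request vectors K with k_1 + ... + k_M = N."""
    states = []
    for combo in combinations_with_replacement(range(m), n):
        k = [0] * m
        for mod in combo:
            k[mod] += 1
        states.append(tuple(k))
    return states


def next_state_dist(k: tuple[int, ...], m: int) -> dict[tuple[int, ...], float]:
    """Exact next-state distribution, found by enumerating every choice of new requests."""
    h = [max(ki - 1, 0) for ki in k]            # one request served per busy module
    x = sum(1 for ki in k if ki > 0)            # served processors issue x new requests
    dist: dict[tuple[int, ...], float] = {}
    for choice in product(range(m), repeat=x):  # each of the m**x outcomes is equally likely
        g = list(h)
        for mod in choice:
            g[mod] += 1
        key = tuple(g)
        dist[key] = dist.get(key, 0.0) + 1.0 / m ** x
    return dist


def effective_power(n: int, m: int, tol: float = 1e-12, max_iter: int = 100_000):
    """Return (EP(N, M), equilibrium distribution) by iterating Pi <- Pi P."""
    states = enumerate_states(n, m)
    rows = {k: next_state_dist(k, m) for k in states}
    pi = {k: 1.0 / len(states) for k in states}
    for _ in range(max_iter):
        nxt = {k: 0.0 for k in states}
        for k, row in rows.items():
            for g, p in row.items():
                nxt[g] += pi[k] * p
        converged = max(abs(nxt[k] - pi[k]) for k in states) < tol
        pi = nxt
        if converged:
            break
    # Proc(i): number of modules with at least one pending request in state i.
    ep = sum(pi[k] * sum(1 for ki in k if ki > 0) for k in states)
    return ep, pi


if __name__ == "__main__":
    ep, pi = effective_power(2, 2)
    print(ep, pi)
```

For N = M = 2 this reproduces Π = (0.25, 0.5, 0.25) over the states (2,0), (1,1), (0,2) and EP(2,2) = 1.5.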


• For the simple case where N = 2 and M = 2, we have the system illustrated in Fig 3.

Fig 3: Multiprocessor system with N = 2 and M= 2.

Fig 4: Probability state transition diagram.


• The possible states of this model, representing the memory modules requested by the two processors, are (2,0), (1,1), and (0,2) (see Fig 4). For example, with H = (1,0) and x = 1:

$P((2,0) \to (1,1)) = \frac{1!}{0!\,1!}\left(\frac{1}{2}\right)^{1} = \frac{1}{2}$

which is the probability of being in state (2,0) and transitioning to state (1,1).


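• Similarly, the probability of being in state (1,1) and transitioning to state (2,0) is found with H = (0,0) and x = 2:

$P((1,1) \to (2,0)) = \frac{2!}{2!\,0!}\left(\frac{1}{2}\right)^{2} = \frac{1}{4}$

and so on.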

• The balance equations for this Markov chain can be found using the relationship:

Flow In = Flow Out
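Applying Flow In = Flow Out to the N = 2, M = 2 chain gives the equilibrium probabilities used in EP(2,2). Here Π1, Π2, Π3 are taken to be the probabilities of states (2,0), (1,1), and (0,2) (our labeling, chosen to match the coefficients 1, 2, 1 in the EP(2,2) expression). From (2,0) the single served processor moves the chain to (1,1) with probability 1/2; from (1,1) the two served processors choose independently, reaching (2,0) or (0,2) with probability 1/4 each; (2,0) and (0,2) cannot reach each other in one step.

```latex
\begin{aligned}
\text{State }(2,0):&\quad \Pi_1 \cdot \tfrac{1}{2} = \Pi_2 \cdot \tfrac{1}{4}
  && \text{(flow out = flow in)} \\
\text{State }(0,2):&\quad \Pi_3 \cdot \tfrac{1}{2} = \Pi_2 \cdot \tfrac{1}{4} \\
\text{Normalization}:&\quad \Pi_1 + \Pi_2 + \Pi_3 = 1 \\
\Rightarrow&\quad \Pi_1 = \Pi_3 = \tfrac{1}{4}, \qquad \Pi_2 = \tfrac{1}{2}
\end{aligned}
```

The balance equation for state (1,1) is then satisfied automatically (flow out Π2 · 1/2 = 1/4 equals flow in Π1 · 1/2 + Π3 · 1/2 = 1/4), and substituting into EP(2,2) = 1Π1 + 2Π2 + 1Π3 gives 0.25 + 1.0 + 0.25 = 1.5.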


• The resulting effective processor power is computed using the relationship:

$EP(2,2) = 1\,\Pi_1 + 2\,\Pi_2 + 1\,\Pi_3 = 0.25 + 1.0 + 0.25 = 1.5$

• Limitations: The model does not take into account memory interference caused by I/O operations.

o It also assumes that the processors and memory are synchronized, with memory accesses aligned to processor cycles.


Evaluation of Parallel Systems Architecture Petri net Perspective

• Assumptions: There are

o np processors,

o nm shared memory modules, and

o nb data buses.

• Each processor has local memory,

o which it uses until a page miss occurs,

o at which point a new page is loaded into local memory from an external memory module.

• Page misses occur at rate λ; the time between misses is exponentially distributed.

• The access time to shared memory (mean 1/µ) is also assumed to be exponentially distributed.


• The model depicted contains, for each memory module,

o two places, one for processor tokens and one for bus tokens, and

o one timed transition (for memory allocation and use).

• There are also two immediate transitions per memory module, associated with synchronizing and controlling the memory access.

• In total there are nine places, four timed transitions, and six immediate transitions (the shared places P1, P2, and P3 plus two places per module; timed transition t1 plus one per module; two immediate transitions per module).

Fig 5: Petri net model for multiprocessor system (np = 5, nm = 3, and nb = 2)


• Tokens in place P1 represent processors executing from their local memory.

• Tokens in place P2 represent data buses available for use.

• An important assumption: every processor and memory module acts in an identical manner.

• When a processor completes its local memory access (has a page miss, which fires transition t1) and requires shared memory resources, a token is moved from place P1 to place P3.


• A processor selects the memory module it needs through a probabilistic branch, by firing the immediate transition t2 of the chosen module.

• Once t2 fires, a token is moved from place P3 to place P4.

• Once a token is in place P4, the processor is requesting access to a data bus.

• The processor acquires the desired memory and then acquires a data bus to retrieve the needed information.

• Once a processor has the bus (signaled by the firing of transition t3) and has acquired the memory (indicated by a token in place P5), the model represents its use of the memory module by starting the timer on transition t4.


• When the timed transition t4 completes and the processor is done with the bus, the tokens representing the processor and the bus are routed back to their initial places, P1 and P2, respectively.

• If we run this model with inputs similar to those applied to the queuing model, we find results that very closely match the queuing model case.

• That is, the effective processor power comes out at approximately 2.05 for the configuration specified.
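The token flow described above can be explored with a plain discrete-event simulation rather than a Petri net solver. The sketch below rests on several assumptions of ours: the slides give no values for λ and µ, so `lam` and `mu` are placeholder parameters and no attempt is made to reproduce the 2.05 figure; waiting processors are served first-come first-served, which the net itself does not specify; a processor holds its memory module while it waits for a bus, following the "memory first, then bus" sequence above; and effective processor power is measured as the time-averaged number of tokens in P1. The name `simulate_multiprocessor` is hypothetical.

```python
import heapq
import random
from collections import deque


def simulate_multiprocessor(n_p: int = 5, n_m: int = 3, n_b: int = 2,
                            lam: float = 1.0, mu: float = 1.0,
                            horizon: float = 10_000.0, seed: int = 1) -> float:
    """Time-averaged number of processors executing from local memory (tokens in P1)."""
    rng = random.Random(seed)
    mem_queue = [deque() for _ in range(n_m)]  # processors waiting for module m (after t2)
    mem_busy = [False] * n_m                   # module m currently held by a processor
    bus_queue: deque = deque()                 # (proc, module): holds memory, needs a bus
    free_buses = n_b                           # tokens in P2
    local = n_p                                # tokens in P1
    # Pending timed firings: (time, kind, processor, module).
    events = [(rng.expovariate(lam), "miss", p, -1) for p in range(n_p)]
    heapq.heapify(events)
    now, area = 0.0, 0.0

    def dispatch() -> None:
        """Fire enabled immediate transitions: grant free modules, then free buses."""
        nonlocal free_buses
        for m in range(n_m):
            if not mem_busy[m] and mem_queue[m]:
                mem_busy[m] = True
                bus_queue.append((mem_queue[m].popleft(), m))
        while free_buses > 0 and bus_queue:
            p, m = bus_queue.popleft()
            free_buses -= 1
            heapq.heappush(events, (now + rng.expovariate(mu), "done", p, m))

    while events:
        t, kind, p, m = heapq.heappop(events)
        if t > horizon:
            break
        area += local * (t - now)              # integrate tokens in P1 over time
        now = t
        if kind == "miss":                     # t1: page miss, token P1 -> P3
            local -= 1
            mem_queue[rng.randrange(n_m)].append(p)  # probabilistic branch (t2)
        else:                                  # t4: memory use done, release module and bus
            mem_busy[m] = False
            free_buses += 1
            local += 1
            heapq.heappush(events, (now + rng.expovariate(lam), "miss", p, -1))
        dispatch()
    area += local * (horizon - now)
    return area / horizon


if __name__ == "__main__":
    print(simulate_multiprocessor())
```

Because every delay is exponential, the simulated system is a continuous-time Markov chain; sweeping `lam` and `mu` is one way to see how bus and memory contention pull the result below np.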