Performance-Driven Processor Allocation

Julita Corbalan, Xavier Martorell, Jesus Labarta {juli,xavim,jesus}@ac.upc.es, DAC-UPC


Page 1: Performance-Driven Processor Allocation

Performance-Driven Processor Allocation

Julita Corbalan, Xavier Martorell, Jesus Labarta {juli,xavim,jesus}@ac.upc.es

DAC-UPC

Page 2: Performance-Driven Processor Allocation

Objective

Scheduling parallel applications on shared-memory multiprogrammed systems

Allocate processors to applications that “can take advantage of them”

Implemented on an SGI Origin2000 with 64 processors

Page 3: Performance-Driven Processor Allocation

Outline

- Introduction & Related Work
- NANOS Execution Environment
- Performance-Driven Processor Allocation: PDPA
- Evaluation
- Conclusions & Future Work

Page 4: Performance-Driven Processor Allocation

Introduction

- Scheduling problem: allocate processors to applications
- Space-sharing / time-sharing
  - Number of processes = number of processors: Process Control [Tucker89]
- Space-sharing approaches
  - P fixed at submission time: FCFS, SJF, SCDF [Majumdar88,...]
  - P defined at execution time (adaptive / dynamic)
    - Equal allocation of the resources: Equipartition [McCann93]
    - Processor allocation proportional to the application performance
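The following sketch illustrates equipartition by splitting the processors evenly among the running applications. It goes beyond what the slide states in two respects, and both are assumptions: no application receives more processors than it requests, and processors that one application cannot use are redistributed among the others. The function name `equipartition` is hypothetical.

```python
from __future__ import annotations

from typing import Dict


def equipartition(total: int, requests: Dict[str, int]) -> Dict[str, int]:
    """Split `total` processors as evenly as possible among applications.

    Assumption: allocations are capped at each application's request, and
    processors left over by capped applications are shared among the rest.
    """
    alloc = {app: 0 for app in requests}
    free = total
    active = [app for app, req in requests.items() if req > 0]
    while free > 0 and active:
        share, extra = divmod(free, len(active))
        if share == 0:
            # Fewer free processors than applications: one extra each to the first few.
            for app in active[:extra]:
                alloc[app] += 1
            break
        for app in active:
            grant = min(share, requests[app] - alloc[app])
            alloc[app] += grant
            free -= grant
        # Applications that reached their request drop out of the next round.
        active = [app for app in active if alloc[app] < requests[app]]
    return alloc
```

For example, with 64 processors and requests of 32, 32 and 8, the third application is capped at 8 and the remaining processors go to the other two (28 each).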

Page 5: Performance-Driven Processor Allocation

Introduction (2)

Processor allocation proportional to application performance

- Drawback: application performance is not known before its execution
- Solution: calculate it a priori
  - executing several times with different P and input data
  - extrapolating the values from a few samples
- These approaches may not be valid: application performance depends on run-time parameters
  - initial data placement, process migrations, distance between processors and memory, …
- They can be impractical, e.g., with infinite input data sets

Page 6: Performance-Driven Processor Allocation

Related Work

- Dynamic performance analysis
  - Self-Tuning [Nguyen96]: efficiency calculated at run time as a function of idleness, system overhead, and communication overhead
- Adaptive/dynamic processor allocation policies
  - Equal_efficiency [Nguyen96]: tries to achieve the same efficiency on all processors
  - Dynamic Allocation, based on idleness [McCann93]
  - Allocating the knee of the efficiency/execution time curve [Eager89]
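The knee criterion can be made concrete with a short function. The sketch assumes a common formalization in which the knee is the processor count p that maximizes E(p)/T(p). Because T(p) = T(1)/S(p) and E(p) = S(p)/p, that ratio is proportional to S(p)²/p, so only a speedup profile is needed. The function name and the sample profiles are hypothetical.

```python
from __future__ import annotations

from typing import Dict


def knee(speedups: Dict[int, float]) -> int:
    """Processor count at the knee of the efficiency/execution-time curve.

    Assumed definition: the p maximizing E(p) / T(p). With T(p) = T(1) / S(p)
    and E(p) = S(p) / p, that ratio equals S(p)**2 / (p * T(1)), so T(1)
    cancels and only the speedup profile S(p) is needed.
    """
    return max(speedups, key=lambda p: speedups[p] ** 2 / p)
```

With the illustrative profile {1: 1.0, 2: 1.9, 4: 3.5, 8: 5.5, 16: 7.0}, S(p)²/p peaks at p = 8. Beyond that point, the extra processors cost more efficiency than they save in execution time.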

Page 7: Performance-Driven Processor Allocation

Our proposal

We propose:

- Dynamic performance analysis: real speedup, calculated at run time
- Allocating processors to applications that “can take advantage of them”
- Dynamic partitioning: cost-conscious re-allocations (memory locality); truly efficient use of processors
- Dynamic multiprogramming level: coordination between the medium- and long-term schedulers

Page 8: Performance-Driven Processor Allocation

Outline

- Introduction & Related Work
- NANOS Execution Environment
- Performance-Driven Processor Allocation: PDPA
- Evaluation
- Conclusions & Future Work

Page 9: Performance-Driven Processor Allocation

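NANOS Execution Environment

[Figure: OpenMP parallel applications (malleable), each linked with the SelfAnalyzer, run on a shared-memory multiprocessor under the operating system, together with the CPU Manager and the Queueing System. Applications send the CPU Manager processor requests and speedups. The CPU Manager returns the processors allocated and enforces its decisions (resume, bind, ...). The CPU Manager and the Queueing System coordinate on whether a new application can start ("New application?"). The Queueing System keeps the queued applications in FCFS order and starts new applications.]

- SelfAnalyzer (in each application): requests processors; informs about the application's performance
- CPU Manager: implements the scheduling policy; informs the applications about its decisions; enforces the processor allocation
- Queueing System: controls application arrival; coordinated with the CPU Manager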
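The sketch below makes the division of work concrete by modeling the exchange between an application and the CPU Manager. Several assumptions apply. The class and method names (`CpuManager`, `report`, `finish`, `AppInfo`) are hypothetical, the scheduling policy is passed in as a plain function, and enforcement of the allocation (resume, bind, ...) is omitted. The slides do not show the NANOS interface, so this is not it.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from typing import Callable, Dict, Optional


@dataclass
class AppInfo:
    requested: int                    # processors requested by the application
    speedup: Optional[float] = None   # last speedup reported by its SelfAnalyzer
    allocated: int = 0                # processors currently granted


# Hypothetical policy signature: (total processors, applications) -> allocation.
Policy = Callable[[int, Dict[str, AppInfo]], Dict[str, int]]


@dataclass
class CpuManager:
    total: int
    policy: Policy
    apps: Dict[str, AppInfo] = field(default_factory=dict)

    def report(self, app: str, requested: int, speedup: Optional[float] = None) -> int:
        """Application -> CPU Manager: processor request plus latest speedup.

        Re-runs the policy and returns the processors now allocated to `app`.
        """
        info = self.apps.setdefault(app, AppInfo(requested))
        info.requested, info.speedup = requested, speedup
        self._reallocate()
        return info.allocated

    def finish(self, app: str) -> None:
        """Application end: release its processors and re-run the policy."""
        self.apps.pop(app, None)
        self._reallocate()

    def _reallocate(self) -> None:
        if not self.apps:
            return
        for name, procs in self.policy(self.total, self.apps).items():
            # A real CPU Manager would also enforce this decision (resume, bind, ...).
            self.apps[name].allocated = procs
```

Keeping the policy as a separate function mirrors the split on the slide. The CPU Manager owns the bookkeeping and the informing, and policies such as Equipartition or PDPA can then be plugged in.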

Page 10: Performance-Driven Processor Allocation

Outline

- Introduction & Related Work
- NANOS Execution Environment
- Performance-Driven Processor Allocation: PDPA
  - Dynamic Performance Analysis: SelfAnalyzer
  - Performance-Driven Processor Allocation policy
  - Dynamic Multiprogramming Level
- Evaluation
- Conclusions & Future Work

Page 11: Performance-Driven Processor Allocation

Dynamic Performance Analysis: SelfAnalyzer

Tool to estimate the application speedup and execution time

- Based on iterative parallel applications
- Source code available: SelfAnalyzer calls inserted by the user or the compiler
- Source code not available: Dynamic Periodicity Detection; SelfAnalyzer dynamically loaded

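do iter = 1, niter        ! outer iterative loop; bounds and indices are illustrative
!$OMP PARALLEL DO
  do i = 1, n
    ! ... first parallel loop body ...
  end do
!$OMP END PARALLEL DO
!$OMP PARALLEL DO
  do j = 1, m
    ! ... second parallel loop body ...
  end do
!$OMP END PARALLEL DO
end do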
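The slides do not show the SelfAnalyzer interface. This sketch assumes a hypothetical hook, `begin_iteration()`, called at the top of each iteration of the outer loop. It times consecutive iterations and averages the most recent ones to estimate T(P). The class name, the hook, and the window size are all illustrative.

```python
from __future__ import annotations

import time
from typing import Callable, List, Optional


class IterationTimer:
    """Hypothetical SelfAnalyzer-style hook for an iterative parallel region."""

    def __init__(self, window: int = 3, clock: Callable[[], float] = time.perf_counter) -> None:
        self.window = window          # number of recent iterations averaged
        self.clock = clock            # injectable so the sketch can be tested
        self.samples: List[float] = []
        self._last: Optional[float] = None

    def begin_iteration(self) -> None:
        """Call at the top of each outer-loop iteration."""
        now = self.clock()
        if self._last is not None:
            self.samples.append(now - self._last)   # duration of the previous iteration
        self._last = now

    def mean_iteration_time(self) -> Optional[float]:
        """Average time of the last `window` completed iterations, or None if none yet."""
        recent = self.samples[-self.window:]
        return sum(recent) / len(recent) if recent else None
```

The hook times the gap between consecutive calls, so the final iteration is measured only if the hook is called once more after the loop.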

Page 12: Performance-Driven Processor Allocation

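Dynamic Performance Analysis: SelfAnalyzer (2)

Speedup calculated as the ratio between T(1) and T(P):

$$\mathrm{Speedup}(P) = \frac{T(1)}{T(P)}$$

[Figure: the same iterations timed on 1 processor, T(1), and on P processors, T(P).]

Measuring T(1) requires running the application on one processor: serialization!

Instead, the iterations are timed on a baseline of B processors, and the ratio is corrected by the factor AF(B):

$$\mathrm{Speedup}(P) = \frac{T(B)}{T(P)} \cdot AF(B)$$

[Figure: the same iterations timed on B processors, T(B), and on P processors, T(P).]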
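The baseline formula needs only the two measured times and the factor AF(B). The slide does not define AF(B). Here it is assumed to be an Amdahl's-law estimate of the speedup of B processors over one processor, given a parallel fraction. Both function names are hypothetical.

```python
def amdahl_factor(b: int, parallel_fraction: float) -> float:
    """Assumed model for AF(B): Amdahl's-law speedup of B processors over one."""
    if not 0.0 <= parallel_fraction <= 1.0:
        raise ValueError("parallel_fraction must be in [0, 1]")
    return 1.0 / ((1.0 - parallel_fraction) + parallel_fraction / b)


def speedup(t_base: float, t_p: float, af_base: float) -> float:
    """Speedup(P) = T(B) / T(P) * AF(B), with T(B) measured on B processors."""
    return t_base / t_p * af_base
```

Under this reading, the application is never run on a single processor. Only the B-processor and P-processor runs are timed, and the one-processor reference is supplied by the model.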

Page 13: Performance-Driven Processor Allocation

Performance-Driven Processor Allocation

- Space sharing: allocation for acceptable efficiency, Eff(p) = S(p)/p
  - in the range [low_eff, high_eff] = [50%, 70%]
- Run-to-completion, with a minimum allocation of one processor
- Dynamic partitioning, with re-allocations when
  - applications inform about their speedups
  - an application arrives or ends
- Remembers the application state: allocation and performance

Page 14: Performance-Driven Processor Allocation

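Performance-Driven Processor Allocation (2)

Per-application states: NO_REF, DEC, STABLE, INC

- New application (NO_REF): P = min(free processors, processors requested)
- low_eff < Eff(p) < high_eff: → STABLE
- Eff(p) < low_eff: → DEC, P = P − step
- DEC, once Eff(p) > low_eff: → STABLE
- Eff(p) > high_eff: → INC, P = P + min(free, step)
- INC, once Eff(p) < high_eff OR the added processors bring no proportional benefit: → STABLE
- STABLE: left when the system changes

Policy parameters: step, low_eff, and high_eff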
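The transitions can be sketched as a small per-application state machine. This illustrates the policy under stated assumptions and is not the authors' implementation:

- low_eff and high_eff use the 50%-70% range given for PDPA, and STEP = 4 is arbitrary.
- DEC and INC fall back to STABLE once their condition stops holding.
- "Proportional benefit" is assumed to mean a speedup gain of at least low_eff per added processor.
- System changes are modeled by the caller resetting the state to NO_REF.

```python
from __future__ import annotations

from dataclasses import dataclass

# Illustrative parameter values: the 50%-70% efficiency range and an arbitrary step.
LOW_EFF, HIGH_EFF, STEP = 0.5, 0.7, 4


@dataclass
class App:
    procs: int                  # current allocation P
    state: str = "NO_REF"       # NO_REF, DEC, STABLE or INC
    last_procs: int = 0         # allocation before the most recent increase
    last_speedup: float = 0.0   # speedup measured at last_procs


def _grow(app: App, speedup: float, free: int) -> None:
    """Enter (or stay in) INC: P = P + min(free, step)."""
    app.last_procs, app.last_speedup = app.procs, speedup
    app.state, app.procs = "INC", app.procs + min(free, STEP)


def pdpa_step(app: App, speedup: float, free: int) -> None:
    """Apply one transition after `app` reports `speedup` measured on app.procs.

    `free` is the number of currently unallocated processors.
    """
    eff = speedup / app.procs                    # Eff(p) = S(p) / p
    if app.state == "STABLE":
        return                                   # left only when the system changes
    if app.state in ("NO_REF", "DEC"):
        if eff < LOW_EFF and app.procs > 1:      # never go below one processor
            app.state, app.procs = "DEC", max(1, app.procs - STEP)
        elif app.state == "NO_REF" and eff > HIGH_EFF and free > 0:
            _grow(app, speedup, free)
        else:
            app.state = "STABLE"
        return
    # INC: grow again only while efficiency stays high and the gain is proportional.
    gained = speedup - app.last_speedup
    added = app.procs - app.last_procs
    if eff > HIGH_EFF and gained >= LOW_EFF * added and free > 0:
        _grow(app, speedup, free)
    else:
        app.state = "STABLE"
```

Here is an example. An application that starts with 32 processors and reports a speedup of 12.8 has an efficiency of 0.4, so it moves to DEC with 28 processors. If its next report gives an efficiency of 0.6, it settles in STABLE.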

Page 15: Performance-Driven Processor Allocation

Dynamic Multiprogramming Level

- Multiprogramming level (ML): number of applications running concurrently; static or dynamic ML
- Coordination between the medium- and long-term schedulers:

  if (new_appl_fits()) start_new_appl()

  - new_appl_fits() is defined by the scheduling policy (PDPA: processors free during several quanta)
  - start_new_appl() is implemented by the queueing system
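Here is a minimal sketch of the new_appl_fits() / start_new_appl() coordination. It assumes that "several quanta" means three consecutive quanta with at least one free processor. The constant `QUANTA`, the class `DynamicML`, and the method `end_of_quantum` are all hypothetical. The queueing side starts applications in FCFS order. The multiprogramming level is simply the number of running applications, so it grows whenever the policy leaves processors unused.

```python
from __future__ import annotations

from collections import deque
from typing import Deque, List, Optional


class DynamicML:
    QUANTA = 3  # hypothetical reading of "several quanta"

    def __init__(self, running: List[str], queued: List[str]) -> None:
        self.running: List[str] = list(running)
        self.queue: Deque[str] = deque(queued)                  # FCFS queue of waiting applications
        self.free_history: Deque[int] = deque(maxlen=self.QUANTA)

    @property
    def ml(self) -> int:
        """Current multiprogramming level: applications running concurrently."""
        return len(self.running)

    def new_appl_fits(self) -> bool:
        """Policy side: processors have been free during the last QUANTA quanta."""
        return len(self.free_history) == self.QUANTA and all(f > 0 for f in self.free_history)

    def start_new_appl(self) -> str:
        """Queueing-system side: start the next application in FCFS order."""
        app = self.queue.popleft()
        self.running.append(app)
        self.free_history.clear()   # require fresh evidence before starting another one
        return app

    def end_of_quantum(self, free_procs: int) -> Optional[str]:
        """Called once per quantum with the number of unallocated processors."""
        self.free_history.append(free_procs)
        if self.queue and self.new_appl_fits():
            return self.start_new_appl()
        return None
```

Clearing the history after each start keeps the level from jumping by several applications at once before the new arrival has been given processors.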

Page 16: Performance-Driven Processor Allocation

Outline

- Introduction & Related Work
- NANOS Execution Environment
- Performance-Driven Processor Allocation: PDPA
- Evaluation
  - Processor Allocation Policies
  - Applications & Workloads
  - Execution Time & Processor Allocation
- Conclusions & Future Work

Page 17: Performance-Driven Processor Allocation

Processor Allocation Policies

Equip: equal CPUs to each running application

PDPA + DML: our proposal

Equal_eff: equal efficiency on all processors

SGI-MP: native IRIX scheduler (MP_BLOCKTIME=200000, OMP_DYNAMIC=TRUE)

Page 18: Performance-Driven Processor Allocation

Applications & Workloads

Architecture & system: SGI Origin2000 with 64 processors, IRIX 6.5.8

Applications (OpenMP): swim (44.2), bt (20.85), hydro2d (6.3), apsi (1)

Workloads: multiprogramming level set to 4; each application requests 32 processors

- W1: 6 swim + 6 bt
- W2: 6 bt + 6 apsi
- W3: 6 + 6
- W4: 12

Page 19: Performance-Driven Processor Allocation

Exec. Time & Proc. Allocation

[Figure: W1. Execution time (sec) of swim, bt, and the total workload, and processor allocation of swim and bt, under EQUIP, PDPA, EQUAL_EFF, and SGI-MP.]

ML=4, DML=5

- Limited processor allocation
- Application execution time slightly increased
- Total execution time reduced

Page 20: Performance-Driven Processor Allocation

Exec. Time & Proc. Allocation

[Figure: W2. Execution time (sec) of bt, apsi, and the total workload, and processor allocation of bt and apsi, under EQUIP, PDPA, EQUAL_EFF, and SGI-MP.]

- Performance affected by the multiprogrammed execution; total execution time improved
- DML=10
- Processors are used efficiently
- Allocation proportional to performance

Page 21: Performance-Driven Processor Allocation

SGI vs. PDPA

Processor affinity + process control

4476 vs. 4 process migrations!

Page 22: Performance-Driven Processor Allocation

PDPA behavior (zoom)

Tuning algorithm

Page 23: Performance-Driven Processor Allocation

Outline

- Introduction & Related Work
- NANOS Execution Environment
- Performance-Driven Processor Allocation: PDPA
- Evaluation
- Conclusions & Future Work

Page 24: Performance-Driven Processor Allocation

Conclusions

It is important to provide accurate performance information

SelfAnalyzer: dynamic, accurate, easy to use

PDPA allocates processors to applications that “can take advantage of them”

The dynamic multiprogramming level improves system performance by coordinating the medium- and long-term schedulers

Page 25: Performance-Driven Processor Allocation

Future Work

Dynamic performance analysis: non-iterative applications

PDPA:
- Space sharing + time sharing
- Evaluation in an open environment
- step, low_eff, and high_eff need further research
- Limited number of reallocations

Coordination of the medium- and long-term schedulers: new policies

Page 26: Performance-Driven Processor Allocation

More contact info...

http://www.ac.upc.es/NANOS

http://www.ac.upc.es/homes/juli

Page 27: Performance-Driven Processor Allocation

Related Work

- Dynamic performance analysis
  - Self-Tuning [Nguyen96]: efficiency calculated at run time as a function of idleness, system overhead, and communication overhead
- Dynamic processor allocation policies
  - Equal_efficiency [Nguyen96]: tries to achieve the same efficiency on all processors
  - Dynamic Allocation, based on idleness [McCann93]
  - Allocating the knee of the efficiency/execution time curve [Eager89]

Limitations of these approaches:

- It does not calculate the real speedup
- It does not ensure efficient use of processors
- Excessive number of reallocations
- Uses a priori information

Page 28: Performance-Driven Processor Allocation

Performance-Driven Processor Allocation (3)

Advantages
- PDPA works with run-time information
- Ensures that processors are always used efficiently

Drawbacks
- The tuning algorithm can introduce overhead inside the application → dynamic step
- Some processors can remain unallocated → Dynamic Multiprogramming Level