Static Process Scheduling, Section 5.2

CSc 8320

Alex De Ruiter


The Book: Static Process Scheduling

What is it (classical definition)?

The scheduling of a set of partially ordered tasks on a non-preemptive multiprocessor system of identical processors to minimize the overall finishing time (a.k.a. makespan) [1].

Implications:

The mapping of processes to processors is determined before any process executes.

Once started, a process stays on its processor until it completes (no preemption).

Process behavior, process execution time, precedence relationships, and communication patterns need to be known before execution.


Scheduling to optimize makespan has been shown to be NP-complete. So, research is directed toward approximate/heuristic methods.

How does static scheduling for distributed systems differ from the classical definition?

Interprocessor communication is considered negligible in the classical definition. That is definitely not the case in a distributed system.


Goal?

A scheduling algorithm that can best balance and overlap computation and communication.

Two types proposed by the book:

Precedence Process Model: generally appropriate for user applications where process precedence is explicitly specified by the user.

Communication Process Model: generally appropriate for system applications where the scheduling goal is to maximize resource utilization and minimize interprocess communication.

The Book: Precedence Process Model

General Goal: minimize overall makespan.

Represented by directed acyclic graph (DAG).

Critical path: the longest execution path in the precedence process DAG.

A possible scheduling strategy is to map all critical path processes onto the same processor.
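As a rough illustration of the critical path idea, the sketch below finds the longest execution path in a small DAG. The task names, execution times, and the `critical_path` helper are hypothetical and not taken from the book; communication weights on the edges are ignored here for simplicity.

```python
# Hypothetical example DAG: execution time per task and successor lists.
exec_time = {"A": 6, "B": 5, "C": 4, "D": 5, "E": 2}
succ = {"A": ["B", "C"], "B": ["D"], "C": ["D", "E"], "D": [], "E": []}


def critical_path(exec_time, succ):
    """Return (length, path) of the longest execution path in a DAG.

    Only execution times are summed; edge (communication) weights are ignored.
    """
    memo = {}

    def longest_from(task):
        # Longest path that starts at `task`, memoized per task.
        if task not in memo:
            best_len, best_path = 0, []
            for nxt in succ.get(task, []):
                length, path = longest_from(nxt)
                if length > best_len:
                    best_len, best_path = length, path
            memo[task] = (exec_time[task] + best_len, [task] + best_path)
        return memo[task]

    return max((longest_from(t) for t in exec_time), key=lambda r: r[0])


length, path = critical_path(exec_time, succ)
print(length, path)
```

Under the strategy mentioned above, the tasks on the returned path would be candidates for placement on a single processor, since no communication delay is then paid between them.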


Three forms:

List Scheduling (LS): no processor remains idle if tasks remain.

Extended List Scheduling (ELS): use LS to distribute tasks without concern for communication delays, then add in the communication delays (no anticipation).

Earliest Task First (ETF): the earliest schedulable task is scheduled first.
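The plain list scheduling form can be sketched as a small greedy simulation. This is a minimal sketch under stated assumptions: the task data, the priority list, and the `list_schedule` function are hypothetical, communication delays are ignored entirely (as in plain LS), and ties between processors are broken by lowest index.

```python
# Hypothetical example: execution times, predecessor lists, and a priority list.
exec_time = {"A": 6, "B": 5, "C": 4, "D": 5, "E": 2}
preds = {"B": ["A"], "C": ["A"], "D": ["B", "C"], "E": ["C"]}
priority = ["A", "B", "C", "D", "E"]


def list_schedule(exec_time, preds, priority, num_procs):
    """Greedy list scheduling without communication delays.

    Returns {task: (processor, start, finish)}.
    """
    schedule = {}
    finish = {}                    # task -> finish time
    proc_free = [0] * num_procs    # time at which each processor becomes idle
    remaining = set(exec_time)
    while remaining:
        # Take the processor that becomes idle first.
        p = min(range(num_procs), key=lambda i: proc_free[i])
        now = proc_free[p]
        # A task is ready once all of its predecessors have finished by `now`.
        ready = [t for t in remaining
                 if all(finish.get(q, float("inf")) <= now
                        for q in preds.get(t, []))]
        if not ready:
            # Nothing can start yet: wait until the next running task finishes.
            proc_free[p] = min(f for f in finish.values() if f > now)
            continue
        task = min(ready, key=priority.index)  # highest-priority ready task
        finish[task] = now + exec_time[task]
        schedule[task] = (p, now, finish[task])
        proc_free[p] = finish[task]
        remaining.remove(task)
    return schedule


schedule = list_schedule(exec_time, preds, priority, num_procs=2)
makespan = max(f for _, _, f in schedule.values())
print(schedule, makespan)
```

An ELS-style variant would take the mapping produced here and then shift start times to account for message delays between tasks placed on different processors.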


Figure [1]: precedence process DAG.

Each node represents a task/execution time combination.

Each edge represents a precedence relationship and also notes the message unit weight.

The Book: Communication Process Model

Why a Communication Process Model?

Processes don't always have an explicit completion time.

Processes don't always have precedence constraints.

The scheduling goal is to maximize resource utilization, minimize interprocess communication, and minimize total execution time.

The Book: Communication Process Model

Module Allocation Problem: used to define “cost” in the Communication Process Model.

G: an undirected graph with nodes V (processes) and edges E

P: some number of processors

e_j(p_i): execution cost of process j on its assigned processor p_i

c_{i,j}(p_i, p_j): communication cost between processes i and j when they are assigned to processors p_i and p_j

The module allocation problem is also NP-complete.

$$\mathrm{Cost}(G, P) = \sum_{j \in V(G)} e_j(p_i) + \sum_{(i,j) \in E(G)} c_{i,j}(p_i, p_j)$$
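The cost function can be evaluated directly for a given assignment. The sketch below is illustrative only: the cost tables and the `allocation_cost` helper are hypothetical, and it assumes (as a simplification) that the communication cost is zero when two processes share a processor and equals the edge weight otherwise.

```python
# Hypothetical data: exec_cost[process][processor] and weighted process edges.
exec_cost = {
    1: {"A": 5, "B": 10},
    2: {"A": 2, "B": 3},
    3: {"A": 9, "B": 4},
}
comm_cost = {(1, 2): 6, (2, 3): 4}


def allocation_cost(exec_cost, comm_cost, assign):
    """Evaluate Cost(G, P) for an assignment {process: processor}.

    Assumption: communication is free on the same processor and costs the
    edge weight when the two processes are on different processors.
    """
    execution = sum(exec_cost[j][assign[j]] for j in assign)
    communication = sum(w for (i, j), w in comm_cost.items()
                        if assign[i] != assign[j])
    return execution + communication


split = allocation_cost(exec_cost, comm_cost, {1: "A", 2: "A", 3: "B"})
all_on_a = allocation_cost(exec_cost, comm_cost, {1: "A", 2: "A", 3: "A"})
print(split, all_on_a)
```

In this made-up instance, paying the communication cost of one edge is cheaper than running process 3 on its slower processor.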

The Book: Communication Process Model

Approaches:

Minimize communication cost by selecting the “cut set” with the least weight.

The cut set represents the total cost of interprocessor communication.

Selecting only for communication cost potentially reduces concurrency.

Taken to its logical conclusion, this would schedule all processes on one processor.

Including execution costs leads to the Maximum Flow / Minimum Cut formulation, which gives an optimized assignment for the two-processor case.
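One commonly described way to set up the two-processor case as a minimum cut is sketched below. It is an assumption-laden illustration, not necessarily the book's exact presentation: each processor becomes an extra node, each process is linked to processor A with the cost of running it on B (and vice versa), and an Edmonds-Karp max-flow search finds the cut. The data and the `min_cut_assignment` function are hypothetical.

```python
from collections import defaultdict, deque

# Hypothetical data: exec_cost[process][processor] and weighted process edges.
exec_cost = {
    1: {"A": 5, "B": 10},
    2: {"A": 2, "B": 3},
    3: {"A": 9, "B": 4},
}
comm_cost = {(1, 2): 6, (2, 3): 4}


def min_cut_assignment(exec_cost, comm_cost, proc_a="A", proc_b="B"):
    """Return (total cost, {process: processor}) via max flow / min cut."""
    cap = defaultdict(lambda: defaultdict(int))

    def add_undirected(u, v, w):
        cap[u][v] += w
        cap[v][u] += w

    s, t = ("cpu", proc_a), ("cpu", proc_b)
    for (i, j), w in comm_cost.items():
        add_undirected(i, j, w)
    for j, costs in exec_cost.items():
        # Cutting the edge to B leaves j on A's side, so it carries the cost on A.
        add_undirected(s, j, costs[proc_b])
        add_undirected(j, t, costs[proc_a])

    flow = 0
    while True:
        # Breadth-first search for an augmenting path in the residual graph.
        parent = {s: None}
        queue = deque([s])
        while queue and t not in parent:
            u = queue.popleft()
            for v, c in cap[u].items():
                if c > 0 and v not in parent:
                    parent[v] = u
                    queue.append(v)
        if t not in parent:
            break
        bottleneck = float("inf")
        v = t
        while parent[v] is not None:
            bottleneck = min(bottleneck, cap[parent[v]][v])
            v = parent[v]
        v = t
        while parent[v] is not None:
            u = parent[v]
            cap[u][v] -= bottleneck
            cap[v][u] += bottleneck
            v = u
        flow += bottleneck

    # Processes still reachable from A's node are on A's side of the cut.
    assign = {j: (proc_a if j in parent else proc_b) for j in exec_cost}
    return flow, assign


total, assign = min_cut_assignment(exec_cost, comm_cost)
print(total, assign)
```

The weight of the minimum cut equals the sum of execution and communication costs for the assignment it induces, which is why the cut can be read as a schedule.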


Figure [1]: communication graph.

Each node represents a processor.

Each edge represents a weighted communication cost.

The Book: Communication Process Model

Heuristic approach for more than two processors:

Define a super group S containing all proposed processes.

Define a communication cost threshold: if the communication cost between two processes exceeds the threshold, both processes are assigned to the same processor.

Using Cost(G,P), iteratively combine processes from super group S into subgroups. Optimize for computation and communication cost as each subgroup is produced.

Proceed until all processes are removed from S.
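The threshold step of the heuristic can be sketched with a small union-find grouping. This covers only the grouping of heavily communicating processes; the placement of each subgroup and the Cost(G,P) optimization are left out. The data, the threshold value, and the `cluster_by_threshold` function are hypothetical.

```python
# Hypothetical data: processes in the super group S and weighted process edges.
processes = [1, 2, 3, 4]
comm_cost = {(1, 2): 12, (2, 3): 3, (3, 4): 8, (1, 4): 1}


def cluster_by_threshold(processes, comm_cost, threshold):
    """Group processes whose pairwise communication cost exceeds `threshold`.

    Returns a sorted list of sorted subgroups; each subgroup is meant to be
    placed on a single processor.
    """
    parent = {p: p for p in processes}

    def find(p):
        # Find the subgroup representative, with path halving.
        while parent[p] != p:
            parent[p] = parent[parent[p]]
            p = parent[p]
        return p

    for (i, j), w in comm_cost.items():
        if w > threshold:
            parent[find(i)] = find(j)  # merge the two subgroups

    groups = {}
    for p in processes:
        groups.setdefault(find(p), []).append(p)
    return sorted(sorted(g) for g in groups.values())


groups = cluster_by_threshold(processes, comm_cost, threshold=5)
print(groups)
```

Lowering the threshold merges more processes into fewer subgroups, trading concurrency for reduced interprocessor communication.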

The Book: Wrap-up

Static process scheduling is necessarily approximate: the problem complexity grows with the number of processors and processes.

Because subsequent load balancing means the system need not maintain the static process allocation, best-effort approximations made before process initiation become less significant to overall system performance.

Today

Real-time grid computing scheduling schemes:

Earliest Deadline First (EDF): highest priority goes to the processes with the earliest required deadline [3].

Least Laxity First (LLF): processes are scheduled in non-decreasing order of slack time, where slack time is the time remaining until the process's deadline minus its remaining computation time. So processes that are closest to exceeding their deadlines go first [3].
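The two orderings can be compared with a short sketch. The `Job` class, the job values, and the helper functions are hypothetical; the sketch assumes absolute deadlines and computes slack as deadline minus current time minus remaining computation time.

```python
from dataclasses import dataclass


@dataclass
class Job:
    name: str
    deadline: float   # absolute deadline (assumed)
    remaining: float  # remaining computation time


def edf_order(jobs):
    """Earliest Deadline First: sort by deadline, earliest first."""
    return sorted(jobs, key=lambda j: j.deadline)


def laxity(job, now):
    """Slack time: time left until the deadline minus remaining work."""
    return job.deadline - now - job.remaining


def llf_order(jobs, now=0.0):
    """Least Laxity First: sort by non-decreasing slack time."""
    return sorted(jobs, key=lambda j: laxity(j, now))


# Hypothetical job set.
jobs = [Job("J1", 10, 2), Job("J2", 12, 9), Job("J3", 8, 1)]
edf_names = [j.name for j in edf_order(jobs)]
llf_names = [j.name for j in llf_order(jobs, now=0.0)]
print(edf_names, llf_names)
```

Here J2 has the latest deadline, so EDF runs it last, but it also has the least slack, so LLF runs it first.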

Today

Random Brokering:

Specific process arrival times and process durations are unknown, but in general they conform to some statistical distribution.

Resource assignment is guided by known properties of the arrival time and process duration distributions (e.g. duration may conform to a power law and arrival time may be represented by a Poisson distribution [2][3]).
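To make the statistical assumptions concrete, the sketch below samples a synthetic workload: Poisson arrivals are generated through exponential inter-arrival gaps, and durations are drawn from a Pareto (power-law) distribution. The function name, parameters, and chosen values are hypothetical, and this models only the workload, not the brokering policy itself.

```python
import random


def sample_workload(n_jobs, arrival_rate, pareto_shape, seed=0):
    """Return a list of (arrival_time, duration) pairs.

    Assumptions: arrivals form a Poisson process with the given rate, and
    durations follow a Pareto distribution with the given shape (minimum 1).
    """
    rng = random.Random(seed)  # seeded so runs are repeatable
    t = 0.0
    jobs = []
    for _ in range(n_jobs):
        t += rng.expovariate(arrival_rate)          # exponential gap between arrivals
        duration = rng.paretovariate(pareto_shape)  # heavy-tailed duration
        jobs.append((t, duration))
    return jobs


workload = sample_workload(n_jobs=5, arrival_rate=2.0, pareto_shape=1.5)
for arrival, duration in workload:
    print(f"arrival={arrival:.3f} duration={duration:.3f}")
```

A broker that knows only the rate and shape parameters could use them to estimate expected load per resource without knowing any individual job in advance.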

References

1) Randy Chow, Theodore Johnson, “Distributed Operating Systems & Algorithms”, Addison Wesley, pp. 156-163.

2) Vandy Berten, Joel Goossens, Emmanuel Jeannot, “On the Distribution of Sequential Jobs in Random Brokering for Heterogeneous Computational Grids”, IEEE Transactions on Parallel and Distributed Systems, Vol. 17, No. 2, February 2006, p. 113.

3) “Poisson Distribution”, http://en.wikipedia.org/wiki/Poisson_distribution

4) Nikolaos D. Doulamis, Anastasios D. Doulamis, Emmanouel A. Varvarigos, Theodora A. Varvarigou, “Fair Scheduling Algorithms in Grids”, IEEE Transactions on Parallel and Distributed Systems, Vol. 18, No. 11, November 2007.
