cilk - an efficient multithreaded runtime system

Cilk: An Efficient Multithreaded Runtime System Mohanadarshan - 148241N Shareek Ahamed - 148201T Authors: Robert D. Blumofe, Christopher F. Joerg, Bradley C. Kuszmaul, Charles E. Leiserson, Keith H. Randall and Yuli Zhou MIT Laboratory for Computer Science, Cambridge

Upload: shareek-ahamed

Post on 20-Jul-2015


Cilk: An Efficient Multithreaded Runtime System

Mohanadarshan - 148241N

Shareek Ahamed - 148201T

Authors: Robert D. Blumofe, Christopher F. Joerg, Bradley C. Kuszmaul,

Charles E. Leiserson, Keith H. Randall and Yuli Zhou

MIT Laboratory for Computer Science, Cambridge

Agenda

● What is Cilk?

● Why Cilk?

● Introduction

● Scheduling & Work Stealing

● How it Works?

● Fibonacci Calculation

● Performance in Cilk Applications

● Current Usage

● Related Works

● Cilk Plus

● Conclusion

What is Cilk?

● Cilk is a C-based runtime system for multithreaded parallel programming.

● Cilk guarantees efficient and predictable performance.

● Lightweight fork and join

○ Own scheduler (work-stealing scheduler)

● Provable bounds on performance and space

● World-class chess programs such as StarTech, *Socrates, and Cilkchess were developed with Cilk.

Why Cilk?

Multithreading is widely used to implement dynamic, asynchronous, concurrent programs.

● A multithreaded system provides the programmer with a means to create, synchronize, and schedule threads.

● Cilk reduces the complexity of implementing multithreaded programs.

● Programmers don't have to worry about this complexity; they only need to identify the regions that can run in parallel.

● Cilk optimizes:

➔ Total work

➔ Critical path

Introduction

Introduction (contd..)

● A Cilk program is a set of procedures.

● A procedure is a sequence of threads.

● Cilk threads are:

○ represented by nodes in the DAG

○ non-blocking: each thread runs to completion with no waiting or suspension, so threads are atomic units of execution

● Threads can spawn child threads.

○ Downward edges connect a parent to its children.

Introduction (contd..)

● A child and its parent can run concurrently.

○ Because threads are non-blocking, a child cannot return a value to its parent.

○ Instead, the parent spawns a successor thread that receives the values from its children.

● A thread and its successor are parts of the same Cilk procedure.

○ They are connected by horizontal arcs.

● The children's returned values must be received before their successor begins:

○ These constitute data dependencies.

○ They are connected by curved arcs.

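How it Works?

● spawn T(k, ?x)

- Spawns a child thread T.

● spawn_next T(k, ?x)

- Spawns a successor thread. A successor is spawned the same way as a child, except that the keyword spawn_next is used.

- An argument written as ?x is missing: the successor waits for it, and x becomes a continuation that names the empty slot.

● send_argument(k, value)

- Sends value to the argument slot of a waiting closure, as specified by continuation k.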
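The three primitives can be pictured with a small data structure. Below is a minimal C sketch, assuming a simplified closure that holds the thread's function, its argument slots, and a join counter of missing arguments. The names Closure, Cont, init_waiting, join_counter and the fixed MAX_ARGS are hypothetical illustrations, not the actual Cilk runtime's types.

```c
#include <assert.h>
#include <stdbool.h>

#define MAX_ARGS 4  /* hypothetical fixed number of argument slots */

typedef struct Closure Closure;
typedef void (*ThreadFn)(Closure *self);

struct Closure {
    ThreadFn fn;         /* C function for the thread body */
    int args[MAX_ARGS];  /* argument slots */
    int join_counter;    /* number of still-missing arguments */
    bool ready;          /* set once join_counter reaches zero */
};

/* A continuation names one empty argument slot of one waiting closure. */
typedef struct {
    Closure *closure;
    int slot;
} Cont;

/* spawn_next-style setup (sketch): a closure waiting for `missing` arguments. */
static void init_waiting(Closure *c, ThreadFn fn, int missing)
{
    c->fn = fn;
    c->join_counter = missing;
    c->ready = (missing == 0);
}

/* send_argument(k, value): fill the slot named by k. When the last missing
   argument arrives the closure becomes ready; a real scheduler would post
   it to a ready queue at this point. */
static void send_argument(Cont k, int value)
{
    assert(k.slot >= 0 && k.slot < MAX_ARGS);
    assert(k.closure->join_counter > 0);
    k.closure->args[k.slot] = value;
    if (--k.closure->join_counter == 0)
        k.closure->ready = true;
}
```

In this sketch a continuation is just a (closure, slot) pair, and a closure becomes runnable exactly when its last missing argument arrives.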

(Figure: a parent thread spawns a child with spawn and a successor with spawn_next; the child passes its result to the successor with send_argument.)

Scheduling

Every processor has its own:

- scheduler

- ready queue

The scheduler is invoked when a thread ends:

- it schedules another local thread or steals one from another processor.

Work Stealing

● Cilk uses a runtime scheduling technique called work stealing.

● It works well on dynamic, asynchronous, MIMD-style programs.

● Work stealing:

○ a processor that runs out of work selects a victim from which to take work.

○ it steals the shallowest ready thread in the victim's spawn tree.

● In Cilk, thieves choose their victims at random.
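A common way to realize "the owner works at the deep end, thieves take from the shallow end" is a per-worker double-ended queue. The sketch below is sequential (no locking, no real threads) and is only an illustration under that assumption: Deque, push_bottom, pop_bottom, steal_top and try_steal are hypothetical names, and the runtime described in the paper may organize its ready queue differently.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>

#define DEQUE_CAP 64  /* hypothetical capacity */

/* Older (shallower) threads sit at the top end, newer (deeper) at the bottom. */
typedef struct {
    int items[DEQUE_CAP];  /* ready thread ids */
    int top;               /* index of the shallowest entry */
    int bottom;            /* one past the deepest entry */
} Deque;

static void deque_init(Deque *d) { d->top = d->bottom = 0; }
static bool deque_empty(const Deque *d) { return d->top == d->bottom; }

/* Owner: a newly spawned thread goes on the deep end. */
static void push_bottom(Deque *d, int t)
{
    assert(d->bottom < DEQUE_CAP);
    d->items[d->bottom++] = t;
}

/* Owner: take the deepest ready thread. */
static bool pop_bottom(Deque *d, int *t)
{
    if (deque_empty(d)) return false;
    *t = d->items[--d->bottom];
    return true;
}

/* Thief: take the shallowest ready thread. */
static bool steal_top(Deque *d, int *t)
{
    if (deque_empty(d)) return false;
    *t = d->items[d->top++];
    return true;
}

/* Idle worker `self` picks a victim uniformly at random among the other
   p - 1 workers and tries to steal from it. */
static bool try_steal(Deque *deques, int p, int self, int *t)
{
    if (p < 2) return false;
    int victim = rand() % (p - 1);
    if (victim >= self) victim++;  /* skip ourselves */
    return steal_top(&deques[victim], t);
}
```

Stealing the shallowest thread tends to hand the thief a large piece of the spawn tree, so steals stay rare relative to local work.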

Work Stealing (contd..)

void f()
{
    work;
    spawn g();
    work;
    work;
    work;
    ...
    work;
}

thread void g()
{
    work;
    work;
    work;
}

(Figure: Worker1 and Worker2)


How it Works? (Example: Fibonacci)

thread fib (cont int k, int n)
{
    if (n < 2)
        send_argument(k, n);          /* base case: send n to continuation k */
    else {
        cont int x, y;
        spawn_next sum(k, ?x, ?y);    /* successor waits for x and y */
        spawn fib(x, n - 1);          /* children send their results to x and y */
        spawn fib(y, n - 2);
    }
}

thread sum (cont int k, int x, int y)
{
    send_argument(k, x + y);          /* pass the total on to k */
}
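To see how continuations, successors and join counters fit together, the C program below simulates the fib and sum threads above on a single worker. It is a sequential sketch under stated assumptions: the closure layout, the LIFO ready stack (so the most recently posted, deepest closure runs first) and the names post, new_closure, run_fib and final_result are hypothetical. No parallelism or stealing is modeled.

```c
#include <assert.h>
#include <stdlib.h>

typedef struct Closure Closure;
typedef void (*ThreadFn)(Closure *c);

/* Continuation: (closure, slot); closure == NULL means "deliver the final result". */
typedef struct { Closure *closure; int slot; } Cont;

struct Closure {
    ThreadFn fn;   /* thread body */
    Cont k;        /* where this thread sends its result */
    int args[2];   /* fib: args[0] = n; sum: args[0] = x, args[1] = y */
    int join;      /* number of missing arguments */
};

#define POOL_CAP 4096            /* hypothetical ready-stack capacity */
static Closure *ready[POOL_CAP]; /* LIFO: deepest closure on top */
static int nready;
static int final_result;

static void post(Closure *c) { assert(nready < POOL_CAP); ready[nready++] = c; }

static void send_argument(Cont k, int v)
{
    if (k.closure == NULL) { final_result = v; return; }
    k.closure->args[k.slot] = v;
    if (--k.closure->join == 0)  /* last missing argument: closure is ready */
        post(k.closure);
}

static Closure *new_closure(ThreadFn fn, Cont k, int join)
{
    Closure *c = malloc(sizeof *c);
    assert(c != NULL);
    c->fn = fn; c->k = k; c->join = join;
    return c;
}

static void sum_thread(Closure *c) { send_argument(c->k, c->args[0] + c->args[1]); }

static void fib_thread(Closure *c)
{
    int n = c->args[0];
    if (n < 2) { send_argument(c->k, n); return; }
    /* spawn_next sum(k, ?x, ?y): successor waiting for two arguments */
    Closure *succ = new_closure(sum_thread, c->k, 2);
    /* spawn fib(x, n-1) and spawn fib(y, n-2): children are ready at once */
    Closure *a = new_closure(fib_thread, (Cont){succ, 0}, 0);
    a->args[0] = n - 1;
    Closure *b = new_closure(fib_thread, (Cont){succ, 1}, 0);
    b->args[0] = n - 2;
    post(a);
    post(b);
}

/* Single-worker scheduler loop: run ready closures until none remain. */
static int run_fib(int n)
{
    assert(n >= 0);
    nready = 0;
    Closure *root = new_closure(fib_thread, (Cont){NULL, -1}, 0);
    root->args[0] = n;
    post(root);
    while (nready > 0) {
        Closure *c = ready[--nready];
        c->fn(c);
        free(c);  /* a thread runs exactly once, then its closure is freed */
    }
    return final_result;
}
```

For example, run_fib(10) returns 55. Note that no thread ever waits: each sum closure simply is not runnable until both children have called send_argument on it.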

Fibonacci Calculation

Ready Queue

if (!readyDeque.isEmpty())
    take deepest thread
else
    steal shallowest thread from readyDeque of a randomly selected victim
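The rule above can be sketched with a ready pool in which ready threads are grouped by their level (depth) in the spawn tree. This is a hypothetical, sequential C illustration: ReadyPool, pool_add, next_thread and the fixed MAX_LEVELS and LEVEL_CAP limits are assumptions. The owner takes from the deepest nonempty level; an idle worker steals from the shallowest nonempty level of a randomly chosen victim.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>

#define MAX_LEVELS 32  /* hypothetical maximum spawn depth */
#define LEVEL_CAP 32   /* hypothetical ready threads per level */

typedef struct {
    int ids[MAX_LEVELS][LEVEL_CAP];  /* ready thread ids, grouped by level */
    int count[MAX_LEVELS];
} ReadyPool;

static void pool_init(ReadyPool *p)
{
    for (int i = 0; i < MAX_LEVELS; i++)
        p->count[i] = 0;
}

/* Post a ready thread at its spawn-tree level (root = level 0). */
static void pool_add(ReadyPool *p, int level, int id)
{
    assert(level >= 0 && level < MAX_LEVELS && p->count[level] < LEVEL_CAP);
    p->ids[level][p->count[level]++] = id;
}

/* Local rule: take a thread from the deepest nonempty level. */
static bool pool_take_deepest(ReadyPool *p, int *id)
{
    for (int lvl = MAX_LEVELS - 1; lvl >= 0; lvl--)
        if (p->count[lvl] > 0) { *id = p->ids[lvl][--p->count[lvl]]; return true; }
    return false;
}

/* Steal rule: take a thread from the shallowest nonempty level. */
static bool pool_steal_shallowest(ReadyPool *p, int *id)
{
    for (int lvl = 0; lvl < MAX_LEVELS; lvl++)
        if (p->count[lvl] > 0) { *id = p->ids[lvl][--p->count[lvl]]; return true; }
    return false;
}

/* One scheduling decision for worker `self` out of p workers:
   local deepest thread if any, else steal from a random victim. */
static bool next_thread(ReadyPool *pools, int p, int self, int *id)
{
    if (pool_take_deepest(&pools[self], id))
        return true;
    if (p < 2)
        return false;
    int victim = rand() % (p - 1);
    if (victim >= self) victim++;  /* never pick ourselves */
    return pool_steal_shallowest(&pools[victim], id);
}
```

A failed steal simply returns false here; a real scheduler would retry with another random victim.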

Performance in Cilk Applications

Experiments were run on a CM-5 supercomputer to document the efficiency of the work-stealing scheduler.

Tested Applications

1. fib (Fibonacci)

2. queens (placing N queens on an N x N chessboard)

3. pfold (protein folding)

4. ray (ray-tracing algorithm for graphics rendering)

5. knary (at each node, runs an empty "for" loop)

6. *Socrates (parallel chess program; uses the Jamboree search algorithm)

Performance in Cilk Applications (contd..)

$T_{\text{serial}}$ ⇒ time taken to run the C program (compiled with gcc)

$T_1$ ⇒ time taken to run the Cilk program on 1 processor

$T_\infty$ ⇒ critical-path length of the Cilk computation, measured by timestamping each thread

$T_P$ ⇒ execution time of the Cilk program on $P$ processors

$T_{\text{serial}} / T_1$ ⇒ efficiency of the Cilk program

⇒ Efficiency is close to 1 for programs with moderately long threads: Cilk overhead is small.
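These measures combine into a few simple ratios. The helper below is a sketch: the Timings struct and the function names are hypothetical, and any values passed to it are inputs you supply, not measurements from the paper. Besides efficiency $T_{\text{serial}}/T_1$, it computes speedup $T_1/T_P$, average parallelism $T_1/T_\infty$, and the standard lower bound $T_P \ge \max(T_1/P, T_\infty)$.

```c
#include <assert.h>

/* Hypothetical container for one program's timings (any consistent unit). */
typedef struct {
    double t_serial;  /* serial C program */
    double t1;        /* Cilk program on 1 processor */
    double tp;        /* Cilk program on p processors */
    double tinf;      /* critical-path length */
    int p;            /* number of processors */
} Timings;

/* Efficiency: how close the 1-processor Cilk run is to plain C. */
static double efficiency(const Timings *t) { assert(t->t1 > 0); return t->t_serial / t->t1; }

/* Speedup on p processors relative to the 1-processor Cilk run. */
static double speedup(const Timings *t) { assert(t->tp > 0); return t->t1 / t->tp; }

/* Average parallelism: work divided by critical-path length. */
static double parallelism(const Timings *t) { assert(t->tinf > 0); return t->t1 / t->tinf; }

/* No schedule can beat the work bound t1/p or the critical path tinf. */
static double tp_lower_bound(const Timings *t)
{
    assert(t->p > 0);
    double work_bound = t->t1 / t->p;
    return work_bound > t->tinf ? work_bound : t->tinf;
}
```

With made-up inputs $T_{\text{serial}} = 3$, $T_1 = 4$, $T_P = 0.5$, $T_\infty = 0.25$, $P = 16$, the helpers give efficiency 0.75, speedup 8, parallelism 16, and a lower bound of 0.25 on $T_P$.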

Performance of Cilk on various applications

Performance in Cilk Applications

Finding 33rd Fibonacci Number

Example applications

Virus shell assembly

Graphics rendering

n-body simulation

Heuristic search

Dense and sparse matrix computations

Friction-stir welding simulation

Artificial evolution

Related Works

EARTH (An Efficient Architecture for Running THreads)

EARTH supports an adaptive, event-driven multithreaded execution model with two thread levels:

● threaded procedures

● fibers

A threaded procedure is invoked asynchronously, forking a parallel thread of execution.

A threaded procedure is statically divided into fibers: fine-grain threads that communicate through dataflow-like synchronization operations.

EARTH vs. Cilk

(Figure: EARTH model vs. Cilk model)

Note:

- EARTH has its origin in the static dataflow model.

- In comparison, the features of the Cilk model are similar to those of the EARTH model.

Cilk Plus

● Maintained by Intel

● Only 3 keywords

– _Cilk_spawn (usually written cilk_spawn via <cilk/cilk.h>)

– _Cilk_sync (cilk_sync)

– _Cilk_for (cilk_for)

● Available in Intel compilers and in a GCC branch.

More info - http://www.cilkplus.org/

https://www.youtube.com/watch?v=mv5i3MEvX98

Cilk Plus

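#include <cilk/cilk.h>

int fib(int n)
{
    if (n < 2)
        return n;
    int x = cilk_spawn fib(n - 1);  /* may run in parallel with the caller */
    int y = fib(n - 2);             /* the caller continues with the second call */
    cilk_sync;                      /* wait for the spawned call to finish */
    return x + y;
}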

- Easier to implement than the original Cilk (no explicit continuations)!

- Less complex than the original Cilk!
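One consequence of having only three keywords is that deleting them leaves an ordinary serial C program with the same result. The sketch below assumes a plain C compiler with no Cilk Plus support and defines the keywords away with macros; these macro definitions and the fib_table helper are hypothetical stand-ins for illustration, not part of Intel's headers.

```c
#include <assert.h>

/* Hypothetical serial stand-ins: with these, the code compiles as plain C
   and runs serially. A Cilk Plus compiler would run it in parallel. */
#define cilk_spawn
#define cilk_sync
#define cilk_for for

static int fib(int n)
{
    if (n < 2)
        return n;
    int x = cilk_spawn fib(n - 1);  /* becomes an ordinary call */
    int y = fib(n - 2);
    cilk_sync;                      /* becomes an empty statement */
    return x + y;
}

/* Fill out[i] = fib(i); with cilk_for the iterations may run in parallel. */
static void fib_table(int *out, int n)
{
    assert(n >= 0);
    cilk_for (int i = 0; i < n; i++)
        out[i] = fib(i);
}
```

Because the serial version computes the same values, it is a convenient reference when checking a parallel build.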

Conclusion

● Pros

➔ Guaranteed bounds on running time and space usage

➔ Good performance

➔ Critical path is short compared to total work

➔ Low overhead

➔ Very simple to use

● Cons

➔ Only suitable for tree-like computations

➔ Continuations are confusing

➔ No shared memory

Thank You ...