ParallelFx
Bringing Mono applications into the multicore era
Jérémie Laval
http://blog.neteril.org
http://twitter.com/jeremie_laval
IRC Garuma on #mono @ GIMPNet.org
Outline
Why bother?
  The free lunch
  An awesome idea
  Remaining performant
ParallelFx
  The big picture
  Tasks and co
  PLinq
  State of things
A note for the future
Why bother? ParallelFx A note for the future Questions
Why are we bothering with parallelization
Everyone loves his single thread
“The ideal number of threads you should use is 1”
– Alan McGovern, Mono hacker
Because the free lunch is over!
Free lunch?
1970s to 2005: Moore’s law in action
[Figure: Intel processor clock speed (MHz, logarithmic scale from 0.1 to 10000) by year, 1968 to 2006. Processors plotted: 8080, 80286, 80386, 80486, Pentium, Celeron, Pentium III, Pentium 4 (Prescott), Core 2 Extreme]
The multicore crisis is here!
Source: Smoothspan blog
The awesome idea
Can’t scale vertically? Scale horizontally!
Trend of things
Number of cores
  Pentium: 1
  Core Duo: 2
  Core 2 Quad: 4
  Core i7-980X: 12ish
  Intel prototype: 80
Problem
No more magical free lunch
Solution
Let’s break up work...
...and share it among cores
Errr...
Parallelization is difficult
• Hard: is that stuff really thread-safe?
• Tedious: whose turn is it to debug deadlocks?
• Inefficient: how many threads to use?
  • Too few: it doesn’t scale
  • Too many: context switching hurts performance
  • What if the number of cores changes?
KISS time (Keep it simple, stupid)
We need something different:
• Automagically regulate thread usage at runtime
• As straightforward as possible
  • Simulate familiar constructs
  • Reuse existing code with only slight modification
Enter ParallelFx
ParallelFx at a glance
At the heart
Work-stealing scheduler
[Figure: work-stealing scheduler. Components: Mono application; shared work pool; ParallelFx library (scheduler); two thread workers, each with a local work pool and an OS thread. Arrows labeled: Steal, Retrieve, Manage]
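The work-stealing idea can be sketched in a few lines. This is a single-threaded illustration under simplifying assumptions, not the ParallelFx scheduler itself: the `LocalWorkPool` class and its method names are hypothetical, work items are plain strings, and a simple lock stands in for whatever synchronization a real scheduler would use. The owning worker takes its newest item from one end of its pool while an idle worker steals the oldest item from the other end.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical local work pool (not a ParallelFx type).
// The owning worker pushes and pops at the tail; thieves steal from the head.
class LocalWorkPool
{
    readonly LinkedList<string> items = new LinkedList<string> ();

    public void Push (string work)
    {
        lock (items)
            items.AddLast (work);
    }

    // Called by the owning worker: newest item first
    public bool TryPopLocal (out string work)
    {
        lock (items) {
            if (items.Count == 0) {
                work = null;
                return false;
            }
            work = items.Last.Value;
            items.RemoveLast ();
            return true;
        }
    }

    // Called by an idle worker: oldest item first
    public bool TrySteal (out string work)
    {
        lock (items) {
            if (items.Count == 0) {
                work = null;
                return false;
            }
            work = items.First.Value;
            items.RemoveFirst ();
            return true;
        }
    }
}

static class WorkStealingDemo
{
    // Returns the order in which a thief and the owner pick up three items
    public static List<string> Run ()
    {
        var busy = new LocalWorkPool ();
        var log = new List<string> ();
        busy.Push ("a");
        busy.Push ("b");
        busy.Push ("c");

        string work;
        if (busy.TrySteal (out work))
            log.Add ("thief:" + work);
        while (busy.TryPopLocal (out work))
            log.Add ("owner:" + work);
        return log;
    }

    static void Main ()
    {
        // prints "thief:a, owner:c, owner:b"
        Console.WriteLine (string.Join (", ", Run ()));
    }
}
```

Taking from opposite ends keeps the owner and the thieves out of each other's way most of the time, which is the usual motivation for this layout.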
Tasks
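    using System;
    using System.Threading;
    using System.Threading.Tasks;

    CancellationTokenSource source = new CancellationTokenSource ();
    Task task = Task.Factory.StartNew (() => DoSomeStuff (), source.Token);
    Task continuation = task.ContinueWith ((t) => Console.WriteLine ("task finished"));
    source.Cancel ();
    // The continuation runs once the task completes, even if it was canceled
    continuation.Wait ();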
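A complete program makes the cancellation flow easier to follow. The sketch below is only an illustration: the `CancellationDemo` class and the polling loop are assumptions standing in for `DoSomeStuff`, and the event is there purely to make the outcome deterministic. It shows that cancellation is cooperative (the task body has to observe the token) and that a default continuation still runs after the task was canceled.

```csharp
using System;
using System.Threading;
using System.Threading.Tasks;

static class CancellationDemo
{
    // Starts a task, cancels it once it is running, and returns the status
    // observed by its continuation.
    public static TaskStatus Run ()
    {
        var source = new CancellationTokenSource ();
        CancellationToken token = source.Token;
        var started = new ManualResetEventSlim ();

        Task task = Task.Factory.StartNew (() => {
            started.Set ();
            // Cooperative cancellation: the body polls the token itself
            while (!token.IsCancellationRequested)
                Thread.Sleep (1);
            token.ThrowIfCancellationRequested ();
        }, token);

        // By default a continuation runs whatever the final state of the task
        Task<TaskStatus> continuation = task.ContinueWith ((t) => t.Status);

        started.Wait ();   // make sure the body is running before cancelling
        source.Cancel ();
        return continuation.Result;
    }

    static void Main ()
    {
        // prints "Canceled"
        Console.WriteLine (Run ());
    }
}
```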
Future (Task<T>)
    static int SumParallel (Tree<int> tree, int curDepth)
    {
        const int SequentialThreshold = 3;
        if (tree == null)
            return 0;
        if (curDepth > SequentialThreshold)
            return SumSequentialInternal (tree);
        // Start the left subtree first so it runs while we sum the right one
        Task<int> left = Task.Factory.StartNew (() => SumParallel (tree.Left, curDepth + 1));
        int right = SumParallel (tree.Right, curDepth + 1);
        // Result blocks until the task has finished
        return tree.Data + left.Result + right;
    }
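To try the tree sum end to end, the missing pieces have to be filled in. In the sketch below, the `Tree<T>` class, the `Build` helper and the body of the sequential fallback are assumptions made for illustration (the slide does not show them). Only the shape of `SumParallel` follows the slide.

```csharp
using System;
using System.Threading.Tasks;

// Hypothetical binary tree; the Tree<T> used on the slide is not shown
class Tree<T>
{
    public T Data;
    public Tree<T> Left, Right;
}

static class TreeSum
{
    const int SequentialThreshold = 3;

    // Plain recursive sum, used once the tree is deep enough
    static int SumSequential (Tree<int> tree)
    {
        if (tree == null)
            return 0;
        return tree.Data + SumSequential (tree.Left) + SumSequential (tree.Right);
    }

    public static int SumParallel (Tree<int> tree, int curDepth)
    {
        if (tree == null)
            return 0;
        if (curDepth > SequentialThreshold)
            return SumSequential (tree);
        // The left subtree runs as a task while this thread sums the right one
        Task<int> left = Task.Factory.StartNew (() => SumParallel (tree.Left, curDepth + 1));
        int right = SumParallel (tree.Right, curDepth + 1);
        return tree.Data + left.Result + right;
    }

    // Builds a complete tree of the given height where every node holds 1
    public static Tree<int> Build (int height)
    {
        if (height == 0)
            return null;
        return new Tree<int> {
            Data = 1,
            Left = Build (height - 1),
            Right = Build (height - 1)
        };
    }

    static void Main ()
    {
        // A complete tree of height 10 has 2^10 - 1 nodes: prints "1023"
        Console.WriteLine (SumParallel (Build (10), 0));
    }
}
```

The depth threshold keeps the number of tasks small: below it, creating a task would likely cost more than the work it carries.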
Parallel.For
    Fractal fractal = new Fractal (width, height);
    ColorChooser colorChooser = new ColorChooser ();

    Parallel.For (0, width, (i) => {
        for (int j = 0; j < height; j++) {
            ProcessPixel (i, j, fractal, colorChooser);
        }
    });
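A self-contained variant of the same loop, as a rough sketch: the `Fractal`, `ColorChooser` and `ProcessPixel` names from the slide are not shown there, so a hypothetical Mandelbrot iteration count stands in for the per-pixel work. Each iteration of the parallel loop writes only to its own column, so no locking is needed, and the result can be compared with a sequential run.

```csharp
using System;
using System.Threading.Tasks;

static class ParallelForDemo
{
    // Hypothetical per-pixel work: Mandelbrot escape-time iteration count
    static int Iterations (int i, int j, int width, int height)
    {
        double cr = -2.0 + 3.0 * i / width;
        double ci = -1.0 + 2.0 * j / height;
        double zr = 0, zi = 0;
        int n = 0;
        while (n < 100 && zr * zr + zi * zi <= 4.0) {
            double tmp = zr * zr - zi * zi + cr;
            zi = 2 * zr * zi + ci;
            zr = tmp;
            n++;
        }
        return n;
    }

    public static int[,] Render (int width, int height, bool parallel)
    {
        var pixels = new int[width, height];
        // Each call fills one column, so columns never share a cell
        Action<int> column = (i) => {
            for (int j = 0; j < height; j++)
                pixels[i, j] = Iterations (i, j, width, height);
        };

        if (parallel) {
            Parallel.For (0, width, column);
        } else {
            for (int i = 0; i < width; i++)
                column (i);
        }
        return pixels;
    }

    public static bool SameImage (int[,] a, int[,] b)
    {
        if (a.GetLength (0) != b.GetLength (0) || a.GetLength (1) != b.GetLength (1))
            return false;
        for (int i = 0; i < a.GetLength (0); i++)
            for (int j = 0; j < a.GetLength (1); j++)
                if (a[i, j] != b[i, j])
                    return false;
        return true;
    }

    static void Main ()
    {
        int[,] sequential = Render (120, 80, false);
        int[,] parallel = Render (120, 80, true);
        Console.WriteLine (SameImage (sequential, parallel));
    }
}
```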
Demo: Parallel For
Demo: image processing with parallel loops
PLinq
    ParallelEnumerable.Range (1, 1000)./* ... */
    ParallelEnumerable.Repeat ("Rupert", 1000)./* ... */
    enumerable.AsParallel ()./* ... */

    var query = from x in Directory.GetFiles ("/etc/").AsParallel ()
                where x.EndsWith (".conf")
                select x;

    query.ForAll ((e) => {
        Console.WriteLine ("{0} from {1}", e, Thread.CurrentThread.ManagedThreadId);
    });
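The same operators can be tried without touching the file system. In this sketch the `PLinqDemo` class, its method names and the sample file names are made up for illustration. It runs one query both as plain Linq and as PLinq, and shows that a `ForAll` delegate may run on several threads at once, so anything it writes to should be thread-safe.

```csharp
using System;
using System.Collections.Concurrent;
using System.Linq;

static class PLinqDemo
{
    // Same query twice: the only difference is where the range comes from
    public static long SumEvenSquares (bool parallel)
    {
        if (parallel)
            return ParallelEnumerable.Range (1, 1000)
                .Where ((x) => x % 2 == 0)
                .Select ((x) => (long) x * x)
                .Sum ();

        return Enumerable.Range (1, 1000)
            .Where ((x) => x % 2 == 0)
            .Select ((x) => (long) x * x)
            .Sum ();
    }

    // ForAll can call the delegate from several threads concurrently,
    // hence the thread-safe ConcurrentBag
    public static int CountConfNames (string[] names)
    {
        var matches = new ConcurrentBag<string> ();
        names.AsParallel ()
             .Where ((x) => x.EndsWith (".conf"))
             .ForAll ((x) => matches.Add (x));
        return matches.Count;
    }

    static void Main ()
    {
        Console.WriteLine (SumEvenSquares (true) == SumEvenSquares (false));
        Console.WriteLine (CountConfNames (new[] { "a.conf", "b.txt", "c.conf" }));
    }
}
```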
Demo: PLinq
Demo: raytracing the PLinq way
State of things
• Mono 2.6 (December 2009): .NET 4 beta 1
  • Task, Future, Parallel loops
  • Concurrent collections
  • Coordination data structures
• Mono 2.8 (or git master): .NET 4 compliant
  • Enabled by default
  • Tons of improvements
  • With PLinq!
A note for the future
GLinq: GPGPU-powered Linq
• Re-use PLinq pipeline
• Same design guidelines as PLinq: simplicity, transparency
• Rewrite C# code in OpenCL (Expression trees FTW!)
• Transparent mapping of OpenCL idioms to .NET
Closing the curtains