parallelfx, bringing mono applications in the multicore era
DESCRIPTION
ulticore computer are now part of our everyday life. Most desktop and laptop machines out there bundle a dual-core processor, quad-core processor or even 8-core processor by default. This multiplication of the number of core on the same chip is destined to become the way for manufacturers to remain competitive. However, developers were a bit left out in this process, having written sequential programs for ages whereas they were now required to parallelize their program to make them efficient which isn't an easy step to take. That's why we now see the apparition of framework designed to help programmers to take advantage of this new architecture of processor by hiding away the parallel difficulty under primitives that they are used to. ParallelFx is one of such framework for the Mono and .NET world. By providing several new parallel constructs and concurrent data structures, it allows Mono applications to enter painlessly in this new multicore era.TRANSCRIPT
ParallelFxBringing Mono applications in the multicore era
Jérémie Laval
http://blog.neteril.org
http://twitter.com/jeremie_laval
IRC Garuma on #mono @ GIMPNet.org
Outline
Why bother?The free lunchAn awesome ideaRemaining performant
ParallelFxThe big pictureTasks and coPLinqState of things
A note for the future
Why bother? ParallelFx A note for the future Questions
Why are we bothering with parallelization
Everyone loves his single thread
“The ideal number of thread you should use is 1”
– Alan Mc GovernMono hacker
3 / 23
Why bother? ParallelFx A note for the future Questions
Why are we bothering with parallelization
Everyone loves his single thread
“The ideal number of thread you should use is 1”
– Alan Mc GovernMono hacker
3 / 23
Why bother? ParallelFx A note for the future Questions
Why are we bothering with parallelization
Everyone loves his single thread
“The ideal number of thread you should use is 1”
– Alan Mc GovernMono hacker
3 / 23
Why bother? ParallelFx A note for the future Questions
Why are we bothering with parallelization
Everyone loves his single thread
“The ideal number of thread you should use is 1”
– Alan Mc GovernMono hacker
3 / 23
Why bother? ParallelFx A note for the future Questions
Why are we bothering with parallelization
Because the free lunch is over!
4 / 23
Why bother? ParallelFx A note for the future Questions
Why are we bothering with parallelization
Because the free lunch is over!
4 / 23
Why bother? ParallelFx A note for the future Questions
Free lunch?
70’s - 2005: Moore’s law in action
Intel Processor Clock Speed (MHz)10000
1000
100
10
1
0.1
1968 1973 1979 1984 1990 1995 2001 2006
808080286
80386
80486
Pentium
Celeron
Pentium III
Pentium 4 (Prescott)
Core 2 Extreme
Multicore crisis is here !
Source: Smoothspan blog
5 / 23
Why bother? ParallelFx A note for the future Questions
The awesome idea
Can’t scale vertically? Scale horizontally!
6 / 23
Why bother? ParallelFx A note for the future Questions
The awesome idea
Can’t scale vertically? Scale horizontally!
6 / 23
Why bother? ParallelFx A note for the future Questions
Trend of things
Number of cores
1 2 4 12ish 80
Pentium Core DuoCore 2 Quad
Core i7-980X Intel prototype
7 / 23
Why bother? ParallelFx A note for the future Questions
Problem
No more magical free-lunch
8 / 23
Why bother? ParallelFx A note for the future Questions
Problem
No more magical free-lunch
8 / 23
Why bother? ParallelFx A note for the future Questions
Solution
Let’s break up work...
....and share it among cores9 / 23
Why bother? ParallelFx A note for the future Questions
Errr...
10 / 23
Why bother? ParallelFx A note for the future Questions
Parallelization is difficult
� Hard: is that stuff really thread safe?
� Tedious: whose turn is it to debug deadlocks?
� Inefficient: how many thread to use?
� Too few: it’s not scaling� Too much: context-switching hurts performance� What if the number of core changes?
11 / 23
Why bother? ParallelFx A note for the future Questions
Parallelization is difficult
� Hard: is that stuff really thread safe?
� Tedious: whose turn is it to debug deadlocks?
� Inefficient: how many thread to use?
� Too few: it’s not scaling� Too much: context-switching hurts performance� What if the number of core changes?
11 / 23
Why bother? ParallelFx A note for the future Questions
Parallelization is difficult
� Hard: is that stuff really thread safe?
� Tedious: whose turn is it to debug deadlocks?
� Inefficient: how many thread to use?
� Too few: it’s not scaling� Too much: context-switching hurts performance� What if the number of core changes?
11 / 23
Why bother? ParallelFx A note for the future Questions
Parallelization is difficult
� Hard: is that stuff really thread safe?
� Tedious: whose turn is it to debug deadlocks?
� Inefficient: how many thread to use?
� Too few: it’s not scaling� Too much: context-switching hurts performance� What if the number of core changes?
11 / 23
Why bother? ParallelFx A note for the future Questions
Parallelization is difficult
� Hard: is that stuff really thread safe?
� Tedious: whose turn is it to debug deadlocks?
� Inefficient: how many thread to use?� Too few: it’s not scaling
� Too much: context-switching hurts performance� What if the number of core changes?
11 / 23
Why bother? ParallelFx A note for the future Questions
Parallelization is difficult
� Hard: is that stuff really thread safe?
� Tedious: whose turn is it to debug deadlocks?
� Inefficient: how many thread to use?� Too few: it’s not scaling� Too much: context-switching hurts performance
� What if the number of core changes?
11 / 23
Why bother? ParallelFx A note for the future Questions
Parallelization is difficult
� Hard: is that stuff really thread safe?
� Tedious: whose turn is it to debug deadlocks?
� Inefficient: how many thread to use?� Too few: it’s not scaling� Too much: context-switching hurts performance� What if the number of core changes?
11 / 23
Why bother? ParallelFx A note for the future Questions
KISS time (Keep it simple, stupid)
We need something different :
� Automagically regulate thread usage at runtime
� As straightforward as possible
� Simulate familiar constructs� Reuse existing code with only slight modification
12 / 23
Why bother? ParallelFx A note for the future Questions
KISS time (Keep it simple, stupid)
We need something different :
� Automagically regulate thread usage at runtime
� As straightforward as possible
� Simulate familiar constructs� Reuse existing code with only slight modification
12 / 23
Why bother? ParallelFx A note for the future Questions
KISS time (Keep it simple, stupid)
We need something different :
� Automagically regulate thread usage at runtime
� As straightforward as possible
� Simulate familiar constructs� Reuse existing code with only slight modification
12 / 23
Why bother? ParallelFx A note for the future Questions
KISS time (Keep it simple, stupid)
We need something different :
� Automagically regulate thread usage at runtime
� As straightforward as possible� Simulate familiar constructs
� Reuse existing code with only slight modification
12 / 23
Why bother? ParallelFx A note for the future Questions
KISS time (Keep it simple, stupid)
We need something different :
� Automagically regulate thread usage at runtime
� As straightforward as possible� Simulate familiar constructs� Reuse existing code with only slight modification
12 / 23
Why bother? ParallelFx A note for the future Questions
Enter ParallelFx
ParallelFx at a glance
13 / 23
Why bother? ParallelFx A note for the future Questions
At the heart
Work-stealing scheduler
Mono Application
Sharedworkpool
ParallelFX library(Scheduler)
Thread Worker
Local workpool
OSthread
Thread Worker
Local workpool
OSthread
Steal
Retrieve
Manage
14 / 23
Why bother? ParallelFx A note for the future Questions
Tasks
CancellationTokenSource source = new CancellationTokenSource ();
Task task = Task.Factory.StartNew (() => DoSomeStuff (), source.Token);
Task continuation = task.ContinueWith ((t) => Console.WriteLine ("task finished");
source.Cancel ();
t.Wait ();
15 / 23
Why bother? ParallelFx A note for the future Questions
Future (Task<T>)
static int SumParallel (Tree<int> tree, int curDepth){
const int SequentialThreshold = 3;
if (tree == null) return 0;
if (curDepth > SequentialThreshold)return SumSequentialInternal (tree);
int right = SumParallel (tree.Right, curDepth + 1);
Task<int> left =Task.Factory.StartNew (() => SumParallel (tree.Left, curDepth + 1));
return tree.Data + left.Value + right;}
16 / 23
Why bother? ParallelFx A note for the future Questions
Parallel.For
Fractal fractal = new Fractal (width, height);
ColorChooser colorChooser = new ColorChooser ();
Parallel.For (0, width, (i) => {for (int j = 0; j < height; j++) {
ProcessPixel (i, j, fractal, colorChooser);}
});
17 / 23
Why bother? ParallelFx A note for the future Questions
Demo: Parallel For
Demo: image processing with parallelloops
18 / 23
Why bother? ParallelFx A note for the future Questions
PLinq
ParallelEnumerable.Range (1, 1000)./* ... */ParallelEnumerable.Repeat (‘‘Rupert’’, 1000)./* ... */enumerable.AsParallel ()./* ... */
var query = from x in Directory.GetFiles ("/etc/").AsParallel ()where x.EndsWith (".conf")select x;
query.ForAll ((e) => {Console.WriteLine ("{0} from {1}", e,
Thread.CurrentThread.ManagedThreadId);});
19 / 23
Why bother? ParallelFx A note for the future Questions
Demo: PLinq
Demo: raytracing the PLinq way
20 / 23
Why bother? ParallelFx A note for the future Questions
State of things
� Mono 2.6 (December 2009): .NET 4 beta 1� Task, Future, Parallel loops� Concurrent collections� Coordination data structures
� Mono 2.8 (or git master): .NET 4 compliant� Enabled by default� Tons of improvement� With PLinq!
21 / 23
Why bother? ParallelFx A note for the future Questions
A note for the future
GLinq: GPGPU-powered Linq
� Re-use PLinq pipeline� Same design guidelines than PLinq: simplicity, transparency� Rewrite C# code in OpenCL (Expression trees FTW!)
� Transparent mapping of OpenCL idioms to .NET
22 / 23
Why bother? ParallelFx A note for the future Questions
A note for the future
GLinq: GPGPU-powered Linq
� Re-use PLinq pipeline� Same design guidelines than PLinq: simplicity, transparency� Rewrite C# code in OpenCL (Expression trees FTW!)
� Transparent mapping of OpenCL idioms to .NET
22 / 23
Why bother? ParallelFx A note for the future Questions
A note for the future
GLinq: GPGPU-powered Linq
� Re-use PLinq pipeline
� Same design guidelines than PLinq: simplicity, transparency� Rewrite C# code in OpenCL (Expression trees FTW!)
� Transparent mapping of OpenCL idioms to .NET
22 / 23
Why bother? ParallelFx A note for the future Questions
A note for the future
GLinq: GPGPU-powered Linq
� Re-use PLinq pipeline� Same design guidelines than PLinq: simplicity, transparency
� Rewrite C# code in OpenCL (Expression trees FTW!)
� Transparent mapping of OpenCL idioms to .NET
22 / 23
Why bother? ParallelFx A note for the future Questions
A note for the future
GLinq: GPGPU-powered Linq
� Re-use PLinq pipeline� Same design guidelines than PLinq: simplicity, transparency� Rewrite C# code in OpenCL (Expression trees FTW!)
� Transparent mapping of OpenCL idioms to .NET
22 / 23
Why bother? ParallelFx A note for the future Questions
A note for the future
GLinq: GPGPU-powered Linq
� Re-use PLinq pipeline� Same design guidelines than PLinq: simplicity, transparency� Rewrite C# code in OpenCL (Expression trees FTW!)
� Transparent mapping of OpenCL idioms to .NET
22 / 23
Why bother? ParallelFx A note for the future Questions
Closing the curtains
23 / 23