Overview of Parallel Development - ericnel
VBUG Newcastle, delivered 24th February 2009 by Eric Nelson
Eric Nelson
http://geekswithblogs.net/iupdateable
http://blogs.msdn.com/goto100
http://twitter.com/ericnel
Agenda
• Overview of what we are up to
• Drill down into parallel programming for managed developers
Things I learnt...
• We have a very large investment in parallel computing
• We have "something for everyone"
  - It is not all in sync; some of it overlaps
• It is a big topic
  - Managed vs native, client vs server, task vs data...
• Even with the investment, designing, coding and testing for parallel is far harder
  - Locking, deadlocks, livelocks
• It is about getting ready for the future
  - Code today, run better tomorrow?
• The VS2010 CTP is not a great place to try parallel
  - Single core in the guest VM
  - Using Hyper-V is an unsupported route
• Easiest route to dabble: Microsoft Parallel Extensions June CTP for VS2008
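To make the deadlock risk concrete: two threads that lock the same pair of mutexes in opposite orders can each end up holding one lock while waiting forever for the other. The slides show no code for this, so the following is a minimal standard C++17 sketch; `Account`, `transfer` and `run_opposing_transfers` are hypothetical names. It avoids the deadlock by acquiring both locks together with `std::scoped_lock`, which uses the same deadlock-avoidance algorithm as `std::lock`.

```cpp
#include <cassert>
#include <mutex>
#include <thread>

// Hypothetical account type, used only for illustration.
struct Account {
    std::mutex m;
    long balance = 0;
};

// Moves `amount` between accounts. Locking from.m and then to.m by hand
// could deadlock when another thread transfers in the opposite direction;
// std::scoped_lock acquires both mutexes without that risk.
void transfer(Account& from, Account& to, long amount) {
    assert(&from != &to);  // locking the same mutex twice is undefined behaviour
    std::scoped_lock lock(from.m, to.m);
    from.balance -= amount;
    to.balance += amount;
}

// Runs transfers in opposite directions concurrently and returns the total.
long run_opposing_transfers(int iterations) {
    Account a, b;
    a.balance = 1000;
    b.balance = 1000;
    std::thread t1([&] { for (int i = 0; i < iterations; ++i) transfer(a, b, 1); });
    std::thread t2([&] { for (int i = 0; i < iterations; ++i) transfer(b, a, 1); });
    t1.join();
    t2.join();
    return a.balance + b.balance;  // money is conserved
}
```

Calling `run_opposing_transfers(10000)` completes and returns 2000; the same code with two hand-ordered `std::lock_guard`s in `transfer` could hang.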
Buying a new Processor
• £100 - £300
• 2-3GHz
• 2 cores or 4
• 64-bit
Buying a new Processor
• £200 - £500
• 2-3GHz
• 4 cores with HT (Hyper-Threading), i.e. 8 hardware threads
• 64-bit
• QuickPath Interconnect
• Memory controller
Where will it all end?
[Image: Unisys ES7000 (7600R), used with kind permission of Mr Henk van der Valk, Unisys, NL]
Was it a wise purchase?
[Diagram: the Windows OS runs App 1, App 2, ...; inside App 1 sit the .NET CLR, the .NET Framework and My Code]
• Some environments scale to take advantage of additional CPU cores (mostly server-side)
  - For example ASP.NET Web Forms/Services, WCF services and the WF engine, running on the .NET ThreadPool or a custom threading strategy
• A lot of code does not (mostly client-side)
  - This code will see little benefit from future hardware advances
What happened to "The Free Lunch"?
• Bad sequential code will run faster on a faster processor
• Just using parallel code is not enough: bad parallel code WILL NOT run faster on more cores
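One common way parallel code fails to scale is heavy synchronization on shared state. The sketch below (standard C++17, not from the deck; both function names are hypothetical) computes the same sum two ways. The first takes one shared lock for every addition, so threads mostly queue on the mutex and extra cores are unlikely to help; the second gives each thread a private accumulator and touches shared state once per thread. No timings are claimed here, only that both give the same answer.

```cpp
#include <cassert>
#include <cstdint>
#include <mutex>
#include <thread>
#include <vector>

// "Bad" parallel sum of 1..n: every addition takes the same lock.
std::int64_t sum_with_shared_lock(std::int64_t n, unsigned threads) {
    assert(threads > 0);
    std::int64_t total = 0;
    std::mutex m;
    std::vector<std::thread> pool;
    for (unsigned t = 0; t < threads; ++t) {
        pool.emplace_back([&, t] {
            // Thread t handles t+1, t+1+threads, t+1+2*threads, ...
            for (std::int64_t i = t + 1; i <= n; i += threads) {
                std::lock_guard<std::mutex> guard(m);
                total += i;
            }
        });
    }
    for (auto& th : pool) th.join();
    return total;
}

// Better: each thread sums privately and writes its result exactly once.
std::int64_t sum_with_local_accumulators(std::int64_t n, unsigned threads) {
    assert(threads > 0);
    std::vector<std::int64_t> partial(threads, 0);
    std::vector<std::thread> pool;
    for (unsigned t = 0; t < threads; ++t) {
        pool.emplace_back([&, t] {
            std::int64_t local = 0;
            for (std::int64_t i = t + 1; i <= n; i += threads) local += i;
            partial[t] = local;  // one write per thread, no lock needed
        });
    }
    for (auto& th : pool) th.join();
    std::int64_t total = 0;
    for (auto p : partial) total += p;
    return total;
}
```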
Applications Can Scale Well
[Chart: parallel speedup (0 to 64) against number of cores (0 to 64) for workloads spanning graphics rendering, physical simulation, vision, data mining and analytics: production fluid, production face, production cloth, game fluid, game rigid body, game cloth, marching cubes, sports video analysis, video cast indexing, home video editing, text indexing, ray tracing, foreground estimation, human body tracker and portfolio management, plus their geometric mean]
What's The Problem?
• Multithreaded programming is "hard" today
  - Doable by only a subgroup of senior specialists
  - Parallel patterns are not prevalent, well known, or easy to implement
  - So many potential problems: races, deadlocks, livelocks, lock convoys, cache coherency overheads, lost event notifications, broken serializability, priority inversion, and so on...
• Businesses have little desire to "go deep"
  - The best developers should focus on business value, not concurrency
  - We need simple ways to allow all developers to write concurrent code
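Races head the list of problems above. A small standard C++ sketch (the function name is hypothetical): if several threads run `++counter` on a plain `int`, their read-modify-write steps can interleave and updates can be lost, and in C++ such a data race is undefined behaviour. Making the counter a `std::atomic<int>` turns each increment into a single indivisible operation.

```cpp
#include <atomic>
#include <cassert>
#include <thread>
#include <vector>

// Increments one shared counter from several threads without losing updates.
int count_with_atomic(int threads, int increments_per_thread) {
    assert(threads > 0 && increments_per_thread >= 0);
    std::atomic<int> counter{0};
    std::vector<std::thread> pool;
    for (int t = 0; t < threads; ++t) {
        pool.emplace_back([&] {
            for (int i = 0; i < increments_per_thread; ++i) {
                // Relaxed ordering is enough for a pure counter; join() below
                // makes the final value visible to this thread.
                counter.fetch_add(1, std::memory_order_relaxed);
            }
        });
    }
    for (auto& th : pool) th.join();
    return counter.load();
}
```

`count_with_atomic(4, 100000)` always returns 400000; with a plain `int` the result is not guaranteed.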
// Sequential version
void MatrixMult(int size, double** m1, double** m2, double** result)
{
    for (int i = 0; i < size; i++) {
        for (int j = 0; j < size; j++) {
            result[i][j] = 0;
            for (int k = 0; k < size; k++) {
                result[i][j] += m1[i][k] * m2[k][j];
            }
        }
    }
}
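// Hand-threaded version (Win32 events + std::thread)
#include <windows.h>
#include <thread>

// NUMPROCS: number of processors, assumed to be defined elsewhere
void MatrixMult(int size, double** m1, double** m2, double** result)
{
    int N = size;
    int P = 2 * NUMPROCS;            // static partitioning: 2 chunks per processor
    int Chunk = N / P;
    HANDLE hEvent = CreateEvent(NULL, TRUE, FALSE, NULL);
    long counter = P;                // chunks still running
    for (int c = 0; c < P; c++) {
        std::thread t([&, c] {
            // the last chunk also takes any remainder rows
            for (int i = c * Chunk; i < (c + 1 == P ? N : (c + 1) * Chunk); i++) {
                for (int j = 0; j < size; j++) {
                    result[i][j] = 0;
                    for (int k = 0; k < size; k++) {
                        result[i][j] += m1[i][k] * m2[k][j];
                    }
                }
            }
            // InterlockedDecrement takes the address of the shared counter
            if (InterlockedDecrement(&counter) == 0)
                SetEvent(hEvent);    // last chunk done: wake the caller
        });
        t.detach();                  // completion is signalled via hEvent, not join()
    }
    WaitForSingleObject(hEvent, INFINITE);
    CloseHandle(hEvent);
}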
Issues called out in the threaded version:
• Synchronization knowledge required
• Error prone
• Heavy synchronization
• Static partitioning
• Lack of thread reuse
• Tricks
• Lots of boilerplate
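For comparison, a sketch in plain standard C++17 (not the Parallel Pattern Library, and not code from the deck; `Matrix` and `matrix_mult` are hypothetical names) that addresses two of those issues. Rows are handed out one at a time through an atomic counter, so partitioning is dynamic rather than static, and `join()` replaces the hand-rolled event and interlocked counter. Thread reuse is still missing; that is what a scheduler such as the TPL or PPL adds.

```cpp
#include <algorithm>
#include <atomic>
#include <cassert>
#include <cstddef>
#include <thread>
#include <vector>

using Matrix = std::vector<std::vector<double>>;

// Multiplies two square matrices using one worker per hardware thread.
Matrix matrix_mult(const Matrix& m1, const Matrix& m2) {
    const std::size_t size = m1.size();
    assert(m2.size() == size);
    Matrix result(size, std::vector<double>(size, 0.0));
    std::atomic<std::size_t> next_row{0};  // dynamic partitioning: grab the next free row
    const unsigned workers = std::max(1u, std::thread::hardware_concurrency());
    std::vector<std::thread> pool;
    for (unsigned w = 0; w < workers; ++w) {
        pool.emplace_back([&] {
            for (std::size_t i = next_row++; i < size; i = next_row++) {
                for (std::size_t j = 0; j < size; ++j) {
                    double sum = 0.0;
                    for (std::size_t k = 0; k < size; ++k) sum += m1[i][k] * m2[k][j];
                    result[i][j] = sum;  // each row is written by exactly one worker
                }
            }
        });
    }
    for (auto& t : pool) t.join();  // no events or interlocked counters needed
    return result;
}
```

A worker that finishes its rows early simply takes more, which helps when rows or cores run at uneven speeds.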
Microsoft Parallel Computing Technologies
[Figure: quadrant chart with Task Concurrency vs Data Parallelism on one axis and Local Computing vs Distributed/Cloud Computing on the other, placing example scenarios and technologies]
Example scenarios:
• Robotics-based manufacturing assembly line; Silverlight Olympics viewer
• Enterprise search, OLTP, collaboration; animation / CGI rendering; weather forecasting; seismic monitoring; oil exploration
• Automotive control system; Internet-based photo services
• Ultrasound imaging equipment; media encode/decode; image processing/enhancement; data visualization
Technologies shown: CCR, Maestro, TPL / PPL, Cluster TPL, Cluster PLINQ, MPI / MPI.Net, WCF, Cluster SOA, WF, PLINQ, CDS, OpenMP, Compute Shader
Visual Studio 2010: Tools / Programming Models / Runtimes
[Diagram: the Visual Studio 2010 parallel computing stack]
• Integrated tooling: Parallel Debugger, Profiler concurrency analysis
• Programming models
  - Managed library: Task Parallel Library, PLINQ
  - Native library: Parallel Pattern Library, Agents Library
• Data structures for both the managed and native libraries
• Concurrency runtime
  - Managed: ThreadPool (task scheduler, resource manager)
  - Native: Concurrency Runtime (task scheduler, resource manager)
• Operating system: threads
Explicit Tasking Support
.NET 4.0 Task Parallel Library
• Task, TaskFactory
• Parallel.For
• Parallel.ForEach
• Parallel.Invoke
• Concurrent data structures
Visual Studio 2010 C++ Parallel Pattern Library
• task, task_group
• parallel_for
• parallel_for_each
• parallel_invoke
• Concurrent data structures
• Primitives for message passing
• User-mode locks
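Both libraries let you create tasks explicitly and then wait on them. As a rough illustration of that fork/join shape only, here is a sketch using standard C++ `std::async` and `std::future` as a stand-in (this is neither the TPL `Task` nor the PPL `task_group` API, and `parallel_sum` is a hypothetical name). Note that `std::launch::async` may start a new thread for each task, which is the kind of cost a work-stealing task scheduler is designed to avoid.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <future>
#include <numeric>
#include <vector>

// Fork/join sum of v[begin, end): run the left half as a separate task,
// compute the right half on the current thread, then wait for the left.
// Below `cutoff` elements the work is done sequentially, because creating
// a task costs more than summing a small range.
long long parallel_sum(const std::vector<int>& v, std::size_t begin, std::size_t end,
                       std::size_t cutoff = 10000) {
    assert(begin <= end && end <= v.size());
    if (end - begin <= cutoff) {
        return std::accumulate(v.begin() + begin, v.begin() + end, 0LL);
    }
    const std::size_t mid = begin + (end - begin) / 2;
    std::future<long long> left =
        std::async(std::launch::async, parallel_sum, std::cref(v), begin, mid, cutoff);
    const long long right = parallel_sum(v, mid, end, cutoff);
    return left.get() + right;  // join: wait for the forked task
}
```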
Task Parallel Library (TPL)
Task
From no threading, to threading, to tasks
CLR Thread Pool
[Diagram: program threads queue work to a single global queue in the user-mode scheduler; worker threads 1 to p take work from it]
CLR Thread Pool: Work-Stealing
[Diagram: user-mode scheduler for tasks; program threads add tasks to the global queue, and each worker thread (1 to p) also has its own local queue holding tasks (Tasks 1 to 6), so an idle worker can steal tasks from another worker's local queue]
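The work-stealing idea in the diagram can be sketched in standard C++17. This is a deliberately simple illustration, not how the CLR thread pool is actually implemented: `WorkStealingPool` is a hypothetical class, each queue is a `std::deque` guarded by a mutex (real schedulers use lock-free deques), and idle workers spin with `yield()`. Tasks submitted from program threads go to the global queue; tasks submitted from inside a worker go to that worker's local queue. A worker takes the newest task from its own queue first, then the oldest from the global queue, and otherwise steals the oldest task from another worker.

```cpp
#include <atomic>
#include <cassert>
#include <cstddef>
#include <deque>
#include <functional>
#include <mutex>
#include <optional>
#include <thread>
#include <utility>
#include <vector>

class WorkStealingPool;

namespace detail {
// Which pool and worker (if any) the current thread belongs to.
thread_local WorkStealingPool* current_pool = nullptr;
thread_local unsigned current_worker = 0;
}

class WorkStealingPool {
public:
    using Task = std::function<void()>;

    explicit WorkStealingPool(unsigned workers) : locals_(workers) {
        assert(workers > 0);
        for (unsigned i = 0; i < workers; ++i)
            threads_.emplace_back([this, i] { run(i); });
    }

    // Call wait_idle() first: tasks still queued at destruction are dropped.
    ~WorkStealingPool() {
        done_ = true;
        for (auto& t : threads_) t.join();
    }

    // Program threads use the global queue; this pool's workers use their own.
    void submit(Task task) {
        ++pending_;
        Queue& target = (detail::current_pool == this)
                            ? locals_[detail::current_worker] : global_;
        std::lock_guard<std::mutex> lock(target.m);
        target.q.push_back(std::move(task));
    }

    // Spins until every submitted task, including nested ones, has finished.
    void wait_idle() const {
        while (pending_ > 0) std::this_thread::yield();
    }

private:
    struct Queue {
        std::mutex m;
        std::deque<Task> q;
    };

    // Removes the newest (back) or oldest (front) task, if any.
    static std::optional<Task> take(Queue& queue, bool newest) {
        std::lock_guard<std::mutex> lock(queue.m);
        if (queue.q.empty()) return std::nullopt;
        Task t = newest ? std::move(queue.q.back()) : std::move(queue.q.front());
        if (newest) queue.q.pop_back(); else queue.q.pop_front();
        return t;
    }

    std::optional<Task> find_work(unsigned me) {
        if (auto t = take(locals_[me], true)) return t;   // own queue, LIFO
        if (auto t = take(global_, false)) return t;      // global queue, FIFO
        for (std::size_t k = 1; k < locals_.size(); ++k)  // steal the oldest task
            if (auto t = take(locals_[(me + k) % locals_.size()], false)) return t;
        return std::nullopt;
    }

    void run(unsigned me) {
        detail::current_pool = this;
        detail::current_worker = me;
        while (!done_) {
            if (auto task = find_work(me)) {
                (*task)();
                --pending_;  // nested submits were counted before this decrement
            } else {
                std::this_thread::yield();
            }
        }
    }

    std::vector<Queue> locals_;
    Queue global_;
    std::atomic<int> pending_{0};
    std::atomic<bool> done_{false};
    std::vector<std::thread> threads_;
};
```

Taking one's own newest task keeps recently touched data in cache, while stealing the oldest task tends to move larger chunks of remaining work to the idle thread.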
Debugger Support
Supports both managed and native code:
1. Parallel Tasks window
2. Parallel Stacks window
Higher Level Constructs
• Even with Task, there are common patterns that build into higher-level abstractions
• The Parallel class: Invoke, For, For<T>, ForEach
• Care needs to be taken with state and ordering
  - "This is not your Father's for loop"
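The warning about state and ordering is easier to see in code. The .NET Parallel.For overloads with thread-local state give each worker its own accumulator and merge it once at the end. The following is a hypothetical standard C++ helper loosely modelled on that idea (`parallel_for_local` and `sum_of_squares` are illustrative names, not a real library API). Iterations run in no guaranteed order across workers, so the loop body must not depend on iteration i-1 having finished, and shared state is only touched in the synchronized merge step.

```cpp
#include <cassert>
#include <mutex>
#include <thread>
#include <vector>

// Runs body(i, local) for i in [from, to), split into contiguous chunks,
// one per worker. Each worker starts from its own copy of `init` and merges
// it into shared state exactly once via finally(local), under a mutex.
template <typename Local, typename Body, typename Finally>
void parallel_for_local(int from, int to, unsigned workers, Local init,
                        Body body, Finally finally) {
    assert(workers > 0 && from <= to);
    std::mutex merge_mutex;
    std::vector<std::thread> pool;
    const long long count = to - from;
    for (unsigned w = 0; w < workers; ++w) {
        const int lo = from + static_cast<int>(count * w / workers);
        const int hi = from + static_cast<int>(count * (w + 1) / workers);
        pool.emplace_back([=, &merge_mutex] {
            Local local = init;  // per-worker state: no locking inside the loop
            for (int i = lo; i < hi; ++i) body(i, local);
            std::lock_guard<std::mutex> lock(merge_mutex);
            finally(local);      // one synchronized merge per worker
        });
    }
    for (auto& t : pool) t.join();
}

// Example: sum of i*i for i in [0, n).
long long sum_of_squares(int n, unsigned workers) {
    long long total = 0;
    parallel_for_local(0, n, workers, 0LL,
        [](int i, long long& acc) { acc += static_cast<long long>(i) * i; },
        [&total](long long acc) { total += acc; });
    return total;
}
```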
Parallel
• Parallel.ForEach
• Parallel.Invoke
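Parallel.Invoke runs a set of independent actions, potentially in parallel, and returns only when all of them have completed. A minimal sketch of that idea in standard C++17 (not the .NET API; `invoke_all` is a hypothetical name, and it assumes every action returns `void`): each action is started as its own task and the caller then waits for all of them. Using `get()` rather than `wait()` also rethrows an exception thrown by any action.

```cpp
#include <cassert>
#include <future>
#include <vector>

// Starts each action as a separate task, then blocks until all have finished.
template <typename... Actions>
void invoke_all(Actions... actions) {
    std::vector<std::future<void>> tasks;
    (tasks.push_back(std::async(std::launch::async, actions)), ...);  // fork
    for (auto& t : tasks) t.get();                                     // join
}
```

For example, `invoke_all([&] { a = 1; }, [&] { b = 2; }, [&] { c = 3; });` returns with all three variables set, because completing each future makes its writes visible to the caller.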
Declarative Data Parallelism
Parallel LINQ-to-Objects (PLINQ)
• Enables LINQ developers to leverage multiple cores
• Fully supports all .NET standard query operators
• Minimal impact to the existing LINQ model
// With PLINQ, the query source becomes people.AsParallel()
var q = from p in people
        where p.Name == queryInfo.Name &&
              p.State == queryInfo.State &&
              p.Year >= yearStart &&
              p.Year <= yearEnd
        orderby p.Year ascending
        select p;
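Conceptually, a parallel query like the one above partitions the source, applies the `where` filter to each partition concurrently, and then merges the partial results for the `orderby` step, which has to see every match. The following standard C++17 sketch shows that shape only; it is not how PLINQ is implemented, and `Person`, `QueryInfo` and `parallel_query` are hypothetical names that mirror the fields in the query.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <future>
#include <string>
#include <vector>

struct Person { std::string name; std::string state; int year; };
struct QueryInfo { std::string name; std::string state; };

// Filters `chunks` slices of `people` concurrently, merges the matches,
// then sorts them by year (stable, so equal years keep their source order).
std::vector<Person> parallel_query(const std::vector<Person>& people, const QueryInfo& q,
                                   int year_start, int year_end, std::size_t chunks = 4) {
    assert(chunks > 0);
    auto matches = [&](const Person& p) {
        return p.name == q.name && p.state == q.state &&
               p.year >= year_start && p.year <= year_end;
    };
    std::vector<std::future<std::vector<Person>>> parts;
    const std::size_t n = people.size();
    for (std::size_t c = 0; c < chunks; ++c) {
        const std::size_t lo = n * c / chunks, hi = n * (c + 1) / chunks;
        parts.push_back(std::async(std::launch::async, [&, lo, hi] {
            std::vector<Person> out;  // this partition's matches
            for (std::size_t i = lo; i < hi; ++i)
                if (matches(people[i])) out.push_back(people[i]);
            return out;
        }));
    }
    std::vector<Person> result;
    for (auto& f : parts) {  // merge partitions in source order
        std::vector<Person> part = f.get();
        result.insert(result.end(), part.begin(), part.end());
    }
    std::stable_sort(result.begin(), result.end(),
                     [](const Person& a, const Person& b) { return a.year < b.year; });
    return result;
}
```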
Parallel LINQ
What Next?
• Download the VS 2010 CTP
  - Remember to set the clock back
• Or download the Parallel Extensions June CTP for VS2008
  - Experiment with the runtime and API
• The team is working on the Visual Studio 2010 beta
  - Very open to feedback
  - Join in the discussion forums: http://blogs.msdn.com/pfxteam/
Parallel Computing Resources
Downloads, binaries, code, forums, blogs, videos, screencasts, podcasts, articles, samples
http://msdn.com/concurrency
http://blogs.msdn.com/pfxteam/