TRANSCRIPT
Technical Computing from Domain Analysis to Performance Profiling
Phil Pennington, Sr. Developer Evangelist, Microsoft Corporation
SESSION CODE: WSV325
AGENDA
Technical_Computing = Parallel (Platform + Tools + Solvers + Analysis);
Technical Computing @ Microsoft
Parallel Tools in Visual Studio
Thinking Parallel
Using TPL and C#
An Example: Monte Carlo Approximation of Pi
S = Area of square = (2*r) * (2*r) = 4*r*r
C = Area of circle = Pi*r*r
C/S = (Pi*r*r) / (4*r*r) = Pi/4, so Pi = 4 * (Area of circle / Area of square) = 4 * (C/S)
For each random point P = (x, y):
d(P) = SQRT((x * x) + (y * y))
if (d(P) < r) then (x, y) is in C
DEMO
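The slide's estimator can be sketched with the TPL constructs covered later in this session. The class name, sample count, and the thread-local `Random` seeding are illustrative choices, not from the deck; the method counts points landing inside the unit quarter-circle and scales by 4.

```csharp
using System;
using System.Threading;
using System.Threading.Tasks;

class MonteCarloPi
{
    public static double Estimate(int samples)
    {
        long inCircle = 0;
        // One Random per worker thread; seeding from a Guid avoids
        // identical sequences when threads start at the same tick.
        ThreadLocal<Random> rng =
            new ThreadLocal<Random>(() => new Random(Guid.NewGuid().GetHashCode()));

        Parallel.For(0, samples,
            () => 0L,                              // per-thread partial count
            (i, state, local) =>
            {
                double x = rng.Value.NextDouble(); // point in the unit square
                double y = rng.Value.NextDouble();
                // d(P) < r with r = 1; comparing squares avoids the SQRT.
                return (x * x + y * y) < 1.0 ? local + 1 : local;
            },
            local => Interlocked.Add(ref inCircle, local)); // merge partials

        return 4.0 * inCircle / samples;           // Pi = 4 * (C / S)
    }

    static void Main()
    {
        Console.WriteLine(Estimate(2000000));
    }
}
```

The per-thread partial count plus a single `Interlocked.Add` per thread keeps contention low; incrementing a shared counter inside the loop body would serialize the workers.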
Why Parallel?
Windows and Logical Processors
Before Win7/R2, the maximum number of Logical Processors (LPs) was dictated by the processor's integral word size.
LP state (e.g. idle, affinity) is represented in a word-sized bitmask:
32-bit Windows: 32 LPs
64-bit Windows: 64 LPs
[Figure: 32-bit idle processor mask, one bit per LP (bits 0-31); each bit marks its LP as Idle or Busy]
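The word-sized mask idea can be made concrete with a small sketch. The mask value and the `IsIdle` helper are hypothetical illustrations, not a Windows API: bit n of the word describes logical processor n, which is exactly why a 32-bit word caps out at 32 LPs.

```csharp
using System;

class IdleMaskDemo
{
    // Hypothetical helper: bit n of the word-sized mask describes LP n.
    public static bool IsIdle(uint idleMask, int lp)
    {
        return (idleMask & (1u << lp)) != 0;
    }

    static void Main()
    {
        // Hypothetical 32-bit idle mask: LPs 0-15 idle, LPs 16-31 busy.
        uint idleMask = 0x0000FFFF;
        Console.WriteLine(IsIdle(idleMask, 3));   // True
        Console.WriteLine(IsIdle(idleMask, 20));  // False
        // The word runs out of bits at 32 LPs - hence the pre-Win7/R2
        // limits of 32 LPs (32-bit) and 64 LPs (64-bit Windows).
    }
}
```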
Windows Organizes Many Cores via Groups
New with Windows 7 and R2
[Diagram: hierarchy - a Group contains NUMA Nodes; each Node contains Sockets; each Socket contains Cores; each Core contains LPs]
NUMA = Non-Uniform Memory Access
LP = Logical Processor
Processor Groups
Example: 2 Groups, 4 Nodes, 8 Sockets, 32 Cores, 4 LPs/Core = 128 LPs
[Diagram: two Groups; each Group holds two NUMA Nodes; each Node holds two Sockets; each Socket holds four Cores; each Core holds four LPs]
Load-Balancing Task Scheduler
[Diagram: the same work items spread across CPU0-CPU3, first under Static Scheduling, then under Dynamic Scheduling]
Dynamic scheduling improves performance by distributing work efficiently at runtime.
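The static-vs-dynamic contrast on this slide maps onto a single flag in the TPL's partitioning API. This sketch uses `Partitioner.Create(list, loadBalance)`; the class name and workload are illustrative. The total is the same either way - only how work is handed to CPUs differs.

```csharp
using System;
using System.Collections.Concurrent;
using System.Linq;
using System.Threading;
using System.Threading.Tasks;

class SchedulingDemo
{
    public static long Sum(int[] work, bool loadBalance)
    {
        long total = 0;
        // loadBalance: false = static-style chunks decided up front, one
        // block per partition; true = chunks handed out on demand, so a
        // worker that finishes early picks up remaining work (dynamic).
        OrderablePartitioner<int> part = Partitioner.Create(work, loadBalance);
        Parallel.ForEach(part, item => Interlocked.Add(ref total, item));
        return total;
    }

    static void Main()
    {
        int[] work = Enumerable.Range(0, 1000).ToArray();
        Console.WriteLine(Sum(work, false)); // 499500
        Console.WriteLine(Sum(work, true));  // 499500 - same answer,
                                             // different work distribution
    }
}
```

Dynamic hand-out helps when item costs are uneven; static chunking wins when items are uniform, because it avoids the per-chunk synchronization.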
User Mode Scheduling - Architectural Perspective
[Diagram: the application hosts your scheduler logic as scheduler threads S1 and S2 running on CPU 1 and CPU 2; worker threads W1-W4 wait on the UMS Scheduler's Ready List; workers that block in the kernel surface on the UMS Completion List, each tagged with a transition reason such as Created, Blocked, or Yield]
The Platform
Topology
DEMO
AGENDA
Technical_Computing = Parallel (Platform + Tools + Solvers + Analysis);
Technical Computing @ Microsoft
Parallel Tools in Visual Studio
Thinking Parallel
Using TPL and C#
Tasks in .NET and C++
.NET 4.0:
Parallel.For(x, y, λ)
Parallel.ForEach(IEnum, λ)
Parallel.Invoke(λ, λ)
Task; Task.Factory.StartNew(λ)
ThreadPool-based
Visual C++ 10:
parallel_for(x, y, step, λ)
parallel_for_each(it, λ)
parallel_invoke(λ, λ)
task_group / task_handle; task_group::run(λ)
Native concurrency runtime
(and many overloads for the above)
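The managed side of that list can be shown in a few lines. This is a minimal sketch using the .NET 4.0 APIs named on the slide; the class name and the arithmetic are illustrative.

```csharp
using System;
using System.Threading.Tasks;

class TaskBasics
{
    static void Main()
    {
        int a = 0, b = 0;

        // Parallel.Invoke forks each lambda as a task and joins them all
        // before returning, so a and b are both set afterward.
        Parallel.Invoke(
            () => { a = 6 * 7; },
            () => { b = 10 * 10; });

        // Task.Factory.StartNew queues work to the ThreadPool and returns
        // a handle; reading Result blocks until the task completes.
        Task<int> sum = Task.Factory.StartNew(() => a + b);
        Console.WriteLine(sum.Result); // 142
    }
}
```

The C++ equivalents (`parallel_invoke`, `task_group::run`) follow the same fork/join shape on the native Concurrency Runtime.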
Tools, Programming Models - Structured Parallelism
Visual Studio 2010, .NET Developer Tools, Programming Models, Runtimes
[Diagram, layered:]
Tools: Parallel Debugger, Parallel Profiler
.NET Parallel Extensions (Managed Library): Parallel LINQ (PLINQ), Task Parallel Library, Data Structures
.NET Runtime: Task Scheduler, Resource Manager, Thread Pools
Tools, Programming Models - Structured Parallelism
Visual Studio 2010, C++ Developer Tools, Programming Models, Runtimes
[Diagram, layered:]
Tools: Parallel Debugger, Parallel Profiler
Native Library: Parallel Pattern Library, Agents Library, Data Structures
C++ Concurrency Runtime: Task Scheduler, Resource Manager
Operating System: Threads (Win7/R2: UMS Threads)
Capabilities Comparison (1) - VS2010 PCP Technologies

Capability                       | .NET4 TPL | C++ ConcRT | OpenMP | PLINQ | MS MPI | Threads/Thread-Pools
Task Parallelism                 | Y         | Y          | N+     | Y-    | N      | N
Data Parallelism                 | Y         | Y          | N+     | Y     | Y      | N
Parallel Patterns                | Y         | Y          | N      | Y     | N+     | N
Fine-grained Parallelism (loops) | Y         | Y          | Y      | Y-    | N      | N
Work-Item Partitioning           | Y         | Y          | Y-     | Y     | N+     | N
Dynamic Scheduling               | Y         | Y          | N      | Y     | N+     | N
Capabilities Comparison (2) - VS2010 PCP Technologies

Capability                     | .NET4 TPL | C++ ConcRT | OpenMP | PLINQ | MS MPI | Threads/Thread-Pools
Affinity                       | N         | N          | Y      | N     | Y-     | Y
Concurrent Data Structures     | Y         | Y          | N+     | Y     | N      | N
Scalable Memory Allocator      | Y         | Y          | N      | Y     | N      | N
Optimized I/O Capability       | N         | N          | N      | N     | Y      | Y
User-Mode Sync Primitives      | Y         | Y          | Y      | Y     | Y      | N
Automatically Collates Results | N         | N          | N      | Y     | N      | N
The Tools
Libraries, Languages, Debuggers, Profilers
DEMO
AGENDA
Technical_Computing = Parallel (Platform + Tools + Solvers + Analysis);
Technical Computing @ Microsoft
Parallel Tools in Visual Studio
Thinking Parallel
Using TPL and C#
Thinking Parallel - "Control" vs. "Data" Parallelism

Control Parallelism:
Parallel.For(0, size, (i) =>
{
    Console.WriteLine(i);
});

Data Parallelism:
IEnumerable<int> numbers = Enumerable.Range(2, 100 - 3);
var parallelQuery =
    from n in numbers.AsParallel()
    where Enumerable.Range(2, (int)Math.Sqrt(n)).All(i => n % i > 0)
    select n;
int[] primes = parallelQuery.ToArray();
Thinking Parallel – How to Schedule my Tasks?
Thinking Parallel - How to Partition my Data?
Several partitioning schemes built in:
Chunk: Works with any IEnumerable<T>. A single enumerator is shared; chunks are handed out on demand.
Range: Works only with IList<T>. Input is divided into contiguous regions, one per partition.
Stripe: Works only with IList<T>. Elements are handed out round-robin to each partition.
Hash: Works with any IEnumerable<T>. Elements are assigned to a partition based on hash code.
Custom partitioning is available through Partitioner<T>; Partitioner.Create gives tighter control over the built-in partitioning schemes.
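The Range scheme above can be requested explicitly via `Partitioner.Create(fromInclusive, toExclusive, rangeSize)`. This sketch is illustrative (the class name and range size are arbitrary choices): each worker receives a whole contiguous block instead of a single index, which cuts per-item scheduling overhead.

```csharp
using System;
using System.Collections.Concurrent;
using System.Threading;
using System.Threading.Tasks;

class RangePartitioning
{
    public static long Sum()
    {
        long total = 0;

        // Contiguous index ranges of up to 100 elements each; range.Item1
        // is the inclusive start, range.Item2 the exclusive end.
        OrderablePartitioner<Tuple<int, int>> ranges =
            Partitioner.Create(0, 1000, 100);

        Parallel.ForEach(ranges, range =>
        {
            long local = 0;
            for (int i = range.Item1; i < range.Item2; i++)
                local += i;                    // sum this contiguous block
            Interlocked.Add(ref total, local); // merge into the shared total
        });

        return total;
    }

    static void Main()
    {
        Console.WriteLine(Sum()); // 499500
    }
}
```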
Thinking Parallel – How to Collate my Results?
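PLINQ is the one technology in the earlier comparison table that collates results automatically. A brief sketch of two collation styles, with illustrative data: merging per-partition output back into one ordered sequence, and folding partial results into a single value.

```csharp
using System;
using System.Linq;

class Collation
{
    static void Main()
    {
        // ToArray() merges each partition's output back into one array;
        // AsOrdered() preserves the source ordering across partitions.
        int[] squares = ParallelEnumerable.Range(0, 10)
                                          .AsOrdered()
                                          .Select(n => n * n)
                                          .ToArray();
        Console.WriteLine(string.Join(",", squares)); // 0,1,4,9,16,25,36,49,64,81

        // Aggregate collates by folding; the combining function must be
        // associative and commutative (addition is) because partitions
        // are reduced independently and then merged.
        int sum = ParallelEnumerable.Range(1, 100).Aggregate(0, (acc, n) => acc + n);
        Console.WriteLine(sum); // 5050
    }
}
```

With raw tasks or Parallel.For, collation is your job (thread-local state plus a merge step, as in the earlier Monte Carlo pattern); PLINQ folds that step into the query.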
Using TPL and C#
Partition, Execute (i.e. Schedule), Collate
DEMOs
Track Resources
Managed APIs/runtimes (.NET 4): Tasks, loops, collections, and PLINQ
http://msdn.microsoft.com/en-us/library/dd460693(VS.100).aspx
Native APIs/runtimes (Visual C++ 10): Tasks, loops, collections, and Agents
http://msdn.microsoft.com/en-us/library/dd504870(VS.100).aspx
Tools (in the VS2010 IDE): Debugger and profiler
http://msdn.microsoft.com/en-us/library/dd460685(VS.100).aspx
General VS2010 Parallel Computing Developer Center:
http://msdn.microsoft.com/en-us/concurrency/default.aspx
Related Content
DEV314, ManyCore and .NET4 with VS2010, Mon, 14:45, Rm 288
ARC205, Patterns of Parallel Programming, Tues, 17:00, Rm 276
ARC02-INT, Patterns for Parallel Programming, Wed, 08:00, Rm 346
DEV408, TPL: Design Principles and Best Practices, Wed, 11:45, Rm 283
DEV317, Profiling and Debugging Parallel Code with VS2010, Thurs, 08:00, Rm 293
DEV307, F# in VS2010, Thurs, 09:45, Rm 276
WSV325, TC from Domain Analysis to Performance Profiling, Thurs, 17:00, Rm 388
Resources
www.microsoft.com/teched - Sessions On-Demand & Community
www.microsoft.com/learning - Microsoft Certification & Training Resources
http://microsoft.com/technet - Resources for IT Professionals
http://microsoft.com/msdn - Resources for Developers
Complete an evaluation on CommNet and enter to win!
Sign up for Tech·Ed 2011 and save $500 starting June 8 – June 31st
http://northamerica.msteched.com/registration
You can also register at the North America 2011 kiosk located at registration. Join us in Atlanta next year!
© 2010 Microsoft Corporation. All rights reserved. Microsoft, Windows, Windows Vista and other product names are or may be registered trademarks and/or trademarks in the U.S. and/or other countries.The information herein is for informational purposes only and represents the current view of Microsoft Corporation as of the date of this presentation. Because Microsoft must respond to changing market conditions, it should not be interpreted to
be a commitment on the part of Microsoft, and Microsoft cannot guarantee the accuracy of any information provided after the date of this presentation. MICROSOFT MAKES NO WARRANTIES, EXPRESS, IMPLIED OR STATUTORY, AS TO THE INFORMATION IN THIS PRESENTATION.
JUNE 7-10, 2010 | NEW ORLEANS, LA