Using Parallel Computing Platform - NHDNUG
DESCRIPTION
Slides from Phil Pennington's talk on Using Parallel Computing with Visual Studio 2010 and .NET 4.0, originally presented at the North Houston .NET Users Group (facebook.com/nhdnug).
TRANSCRIPT
![Page 2: Using Parallel Computing Platform - NHDNUG](https://reader034.vdocuments.mx/reader034/viewer/2022052618/554f4189b4c90572088b535f/html5/thumbnails/2.jpg)
Agenda
• What’s new with Windows?
• Parallel Computing Tools in Visual Studio
• Using .NET Parallel Extensions
![Page 3: Using Parallel Computing Platform - NHDNUG](https://reader034.vdocuments.mx/reader034/viewer/2022052618/554f4189b4c90572088b535f/html5/thumbnails/3.jpg)
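First, an Example: Monte Carlo Approximation of Pi
Inscribe a circle of radius r, centered at the origin, in a square of side 2r:
S = 4*r*r (area of the square)   C = Pi*r*r (area of the circle)
Pi = 4*(C/S)
For each random point P(x, y) in the square, d(P) = SQRT((x * x) + (y * y))
If d(P) < r, then P(x, y) is in C; the fraction of points that land in C approximates C/S.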
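A minimal C# sketch of the estimate above, assuming the Task Parallel Library covered later in the talk: `Parallel.For` with thread-local state gives each worker its own `Random` and hit counter, and the per-worker counters are merged once at the end. The helper name `EstimatePi` and the sample count are illustrative choices, not the talk's demo code.

```csharp
using System;
using System.Threading;
using System.Threading.Tasks;

// Hypothetical helper: estimates Pi from `samples` random points in the square [-r, r] x [-r, r].
double EstimatePi(int samples, double r = 1.0)
{
    long inside = 0;
    int seed = Environment.TickCount;

    Parallel.For(0, samples,
        // localInit: one Random per worker, because Random is not thread-safe.
        () => (rng: new Random(Interlocked.Increment(ref seed)), count: 0L),
        // body: draw P(x, y) uniformly in the square and test d(P) < r.
        (i, loopState, local) =>
        {
            double x = (local.rng.NextDouble() * 2 - 1) * r;
            double y = (local.rng.NextDouble() * 2 - 1) * r;
            if (Math.Sqrt(x * x + y * y) < r) local.count++;
            return local;
        },
        // localFinally: merge each worker's tally once instead of once per sample.
        local => Interlocked.Add(ref inside, local.count));

    // inside / samples approximates C/S, and Pi = 4 * (C/S).
    return 4.0 * inside / samples;
}

double pi = EstimatePi(1_000_000);
Console.WriteLine($"Pi is approximately {pi:F4}");
```

Keeping the counter thread-local avoids contending on a shared variable for every sample; only one `Interlocked.Add` per worker touches shared state.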
![Page 4: Using Parallel Computing Platform - NHDNUG](https://reader034.vdocuments.mx/reader034/viewer/2022052618/554f4189b4c90572088b535f/html5/thumbnails/4.jpg)
Windows and Maximum Processors
• Before Windows 7 / Windows Server 2008 R2, the maximum number of logical processors (LPs) was dictated by the processor's native word size
– LP state (e.g. idle, affinity) was represented in a word-sized bitmask
– 32-bit Windows: 32 LPs
– 64-bit Windows: 64 LPs
[Figure: 32-bit idle processor mask, bits 0 to 31, one bit per LP marking it idle or busy]
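To make the limit concrete, here is an illustrative C# model of a word-sized idle mask. It is not a Windows API, and the helper names (`Bit`, `MarkIdle`, `MarkBusy`, `IsIdle`) are hypothetical; the point is that an LP numbered 64 or higher has no bit to live in.

```csharp
using System;
using System.Numerics;

// Illustrative model only: one bit per logical processor in a single 64-bit word,
// 1 = idle and 0 = busy. Only LPs 0..63 can be described.
static ulong Bit(int lp)
{
    if (lp < 0 || lp >= 64)
        throw new ArgumentOutOfRangeException(nameof(lp), "A 64-bit mask can only describe LPs 0..63.");
    return 1UL << lp;
}

static ulong MarkIdle(ulong mask, int lp) => mask | Bit(lp);
static ulong MarkBusy(ulong mask, int lp) => mask & ~Bit(lp);
static bool IsIdle(ulong mask, int lp) => (mask & Bit(lp)) != 0;

ulong idleMask = 0;
idleMask = MarkIdle(idleMask, 0);
idleMask = MarkIdle(idleMask, 5);
idleMask = MarkIdle(idleMask, 63);
idleMask = MarkBusy(idleMask, 5);

Console.WriteLine($"Idle LPs: {BitOperations.PopCount(idleMask)}"); // Idle LPs: 2
```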
![Page 5: Using Parallel Computing Platform - NHDNUG](https://reader034.vdocuments.mx/reader034/viewer/2022052618/554f4189b4c90572088b535f/html5/thumbnails/5.jpg)
Processor Groups: New with Windows 7 and Windows Server 2008 R2
[Figure: topology hierarchy. A group contains NUMA nodes; a NUMA node contains sockets; a socket contains cores; a core contains logical processors (LPs).]
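A hedged sketch of group-relative addressing: inside a group, an LP can still be described by a 64-bit mask, paired with a group number (the idea behind Windows' `GROUP_AFFINITY`). The helpers `ToGroupRelative` and `AffinityFor` are hypothetical and assume every group is filled with exactly 64 LPs in order; real systems can form smaller groups, so production code should query the OS for the actual layout.

```csharp
using System;

// Assumption for illustration: groups are packed with 64 LPs each, in order.
const int LpsPerGroup = 64;

// Map a system-wide LP index to (group number, LP number within the group).
(ushort Group, byte Number) ToGroupRelative(int globalLp) =>
    ((ushort)(globalLp / LpsPerGroup), (byte)(globalLp % LpsPerGroup));

// A group number plus a 64-bit mask that is only meaningful inside that group.
(ushort Group, ulong Mask) AffinityFor(int globalLp)
{
    var (group, number) = ToGroupRelative(globalLp);
    return (group, 1UL << number);
}

int totalLps = 128;
int groups = (totalLps + LpsPerGroup - 1) / LpsPerGroup;
Console.WriteLine($"{totalLps} LPs need {groups} groups"); // 128 LPs need 2 groups

var a = AffinityFor(100);
Console.WriteLine($"LP 100 -> group {a.Group}, mask 0x{a.Mask:X16}");
```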
![Page 6: Using Parallel Computing Platform - NHDNUG](https://reader034.vdocuments.mx/reader034/viewer/2022052618/554f4189b4c90572088b535f/html5/thumbnails/6.jpg)
Processor Groups Example: 2 groups, 4 NUMA nodes, 8 sockets, 32 cores, 128 LPs
[Figure: each of the 2 groups holds 2 NUMA nodes; each NUMA node holds 2 sockets; each socket holds 4 cores; each core holds 4 LPs.]
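The example's numbers can be checked with a little arithmetic. This C# sketch assumes LPs are numbered contiguously within a core, cores within a socket, and so on; that numbering and the `Locate` helper are illustrative assumptions, not necessarily how Windows enumerates processors.

```csharp
using System;

// Hypothetical topology matching the example: 2 groups x 2 NUMA nodes x 2 sockets x 4 cores x 4 LPs.
const int Groups = 2, NodesPerGroup = 2, SocketsPerNode = 2, CoresPerSocket = 4, LpsPerCore = 4;

int lpsPerGroup = NodesPerGroup * SocketsPerNode * CoresPerSocket * LpsPerCore; // 64
int totalLps = Groups * lpsPerGroup;                                              // 128

// Peel off one level at a time, innermost (LP within core) first.
(int Group, int Node, int Socket, int Core, int Lp) Locate(int globalLp)
{
    int lp = globalLp % LpsPerCore;
    int rest = globalLp / LpsPerCore;
    int core = rest % CoresPerSocket;
    rest /= CoresPerSocket;
    int socket = rest % SocketsPerNode;
    rest /= SocketsPerNode;
    int node = rest % NodesPerGroup;
    int group = rest / NodesPerGroup;
    return (group, node, socket, core, lp);
}

Console.WriteLine($"{totalLps} LPs, {lpsPerGroup} per group");
Console.WriteLine(Locate(70));  // (1, 0, 0, 1, 2)
Console.WriteLine(Locate(127)); // (1, 1, 1, 3, 3)
```

Each group ends up with 64 LPs, which is exactly what one 64-bit group-relative mask can describe.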
![Page 7: Using Parallel Computing Platform - NHDNUG](https://reader034.vdocuments.mx/reader034/viewer/2022052618/554f4189b4c90572088b535f/html5/thumbnails/7.jpg)
Many-Core Topology APIs: Discovery
![Page 8: Using Parallel Computing Platform - NHDNUG](https://reader034.vdocuments.mx/reader034/viewer/2022052618/554f4189b4c90572088b535f/html5/thumbnails/8.jpg)
Many-Core Topology APIs: Resource Localization
![Page 9: Using Parallel Computing Platform - NHDNUG](https://reader034.vdocuments.mx/reader034/viewer/2022052618/554f4189b4c90572088b535f/html5/thumbnails/9.jpg)
Many-Core Topology APIs: Memory Management
![Page 10: Using Parallel Computing Platform - NHDNUG](https://reader034.vdocuments.mx/reader034/viewer/2022052618/554f4189b4c90572088b535f/html5/thumbnails/10.jpg)
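User Mode Scheduling: Architectural Perspective
[Figure: in the application, UMS scheduler threads S1 and S2 run on CPU 1 and CPU 2 and execute your scheduler logic, which dispatches worker threads W1 to W4 from the UMS scheduler’s ready list. Worker threads blocked in the kernel come back through the UMS completion list. Transitions are labeled with a reason (Created, Blocked, Yield), and a Wait transition is also shown.]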
![Page 11: Using Parallel Computing Platform - NHDNUG](https://reader034.vdocuments.mx/reader034/viewer/2022052618/554f4189b4c90572088b535f/html5/thumbnails/11.jpg)
Task Scheduling with a UMS Scheduler: Maximize Quantum, Minimize Blocking Effects
• Tasks are run by worker threads, which the scheduler controls
[Figure: timelines for worker threads WT0 to WT3 without UMS (signal-and-wait), showing a dead zone, compared with UMS (UMS yield).]
![Page 12: Using Parallel Computing Platform - NHDNUG](https://reader034.vdocuments.mx/reader034/viewer/2022052618/554f4189b4c90572088b535f/html5/thumbnails/12.jpg)
Load-Balancing, Work-Stealing Scheduler
[Figure: static scheduling vs. dynamic scheduling of work across CPU0 to CPU3]
Dynamic scheduling improves performance by distributing work at runtime: a CPU that finishes its share early picks up remaining work instead of sitting idle.
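A small C# sketch contrasting the two policies, using hypothetical helpers `RunStatic` and `RunDynamic`. The static version fixes each worker's block up front; the dynamic version lets workers claim items from a shared counter, the simplest form of runtime load balancing (a real work-stealing scheduler uses per-thread queues rather than one counter). The uneven workload is artificial.

```csharp
using System;
using System.Threading;
using System.Threading.Tasks;

// Static: each worker owns a contiguous block of items fixed before running.
static int[] RunStatic(int n, int workers, Action<int> work)
{
    var owner = new int[n];
    int block = (n + workers - 1) / workers;
    var tasks = new Task[workers];
    for (int w = 0; w < workers; w++)
    {
        int id = w;
        tasks[w] = Task.Factory.StartNew(() =>
        {
            int end = Math.Min(n, (id + 1) * block);
            for (int i = id * block; i < end; i++) { work(i); owner[i] = id; }
        }, TaskCreationOptions.LongRunning);
    }
    Task.WaitAll(tasks);
    return owner;
}

// Dynamic: workers claim the next unprocessed item from a shared counter,
// so a worker that finishes early keeps pulling work instead of idling.
static int[] RunDynamic(int n, int workers, Action<int> work)
{
    var owner = new int[n];
    int next = -1;
    var tasks = new Task[workers];
    for (int w = 0; w < workers; w++)
    {
        int id = w;
        tasks[w] = Task.Factory.StartNew(() =>
        {
            int i;
            while ((i = Interlocked.Increment(ref next)) < n) { work(i); owner[i] = id; }
        }, TaskCreationOptions.LongRunning);
    }
    Task.WaitAll(tasks);
    return owner;
}

// Uneven work: the first quarter of the items is much more expensive.
Action<int> uneven = i => Thread.SpinWait(i < 25 ? 200_000 : 2_000);
int[] staticOwners = RunStatic(100, 4, uneven);
int[] dynamicOwners = RunDynamic(100, 4, uneven);
for (int w = 0; w < 4; w++)
    Console.WriteLine($"worker {w}: static {Array.FindAll(staticOwners, o => o == w).Length} items, " +
                      $"dynamic {Array.FindAll(dynamicOwners, o => o == w).Length} items");
```

The static split always gives each worker 25 items, so worker 0 receives every expensive item and finishes last. In the dynamic run the expensive items are shared among all workers, so the per-worker counts vary between runs and the slowest worker finishes sooner.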
![Page 13: Using Parallel Computing Platform - NHDNUG](https://reader034.vdocuments.mx/reader034/viewer/2022052618/554f4189b4c90572088b535f/html5/thumbnails/13.jpg)
Demos
The Platform
– Topology
– Schedulers
![Page 14: Using Parallel Computing Platform - NHDNUG](https://reader034.vdocuments.mx/reader034/viewer/2022052618/554f4189b4c90572088b535f/html5/thumbnails/14.jpg)
Agenda
• What’s new with Windows?
• Parallel Computing Tools in Visual Studio
• Using .NET Parallel Extensions
![Page 15: Using Parallel Computing Platform - NHDNUG](https://reader034.vdocuments.mx/reader034/viewer/2022052618/554f4189b4c90572088b535f/html5/thumbnails/15.jpg)
Tools and Programming Models – Structured Parallelism
[Figure: Visual Studio 2010 .NET developer tools, programming models, and runtimes. Programming models: .NET Parallel Extensions, a managed library containing the Task Parallel Library (TPL), Parallel LINQ (PLINQ), and data structures. Runtime: the .NET runtime’s task scheduler, resource manager, threads, and thread pool. Tools: debugger and profiler.]
![Page 16: Using Parallel Computing Platform - NHDNUG](https://reader034.vdocuments.mx/reader034/viewer/2022052618/554f4189b4c90572088b535f/html5/thumbnails/16.jpg)
Thread-Pool Scheduler in .NET 4.0
• The global queue is shared by the legacy ThreadPool API and TPL
• Local work queues and the work-stealing scheduler are used by TPL only
[Figure: tasks T1 to T8 flow through a global queue (FIFO) and per-thread local queues (LIFO) for threads 1 to N. Each thread's dispatch loop dequeues work and enqueues new tasks to its own local queue; idle threads steal from other threads' local queues.]
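The queueing policy can be modeled in a few lines of C#. This is a single-threaded illustration, not the .NET thread pool's code (which uses lock-free per-thread deques); `PopTail`, `PopHead`, and `NextTask` are hypothetical helpers. The lookup order (own local queue, then the global queue, then stealing) follows the order described for the .NET 4 thread pool.

```csharp
using System;
using System.Collections.Generic;

// Owner side: take the newest task (LIFO end of the local queue).
static string? PopTail(LinkedList<string> q)
{
    if (q.Count == 0) return null;
    string t = q.Last!.Value;
    q.RemoveLast();
    return t;
}

// Thief side: take the oldest task (FIFO end of someone else's queue).
static string? PopHead(LinkedList<string> q)
{
    if (q.Count == 0) return null;
    string t = q.First!.Value;
    q.RemoveFirst();
    return t;
}

// Dispatch order: own local queue, then the shared global queue, then steal.
static string? NextTask(LinkedList<string> own, Queue<string> global, LinkedList<string> victim) =>
    PopTail(own) ?? (global.Count > 0 ? global.Dequeue() : null) ?? PopHead(victim);

var globalQueue = new Queue<string>(new[] { "G1" });                 // queued by non-worker code
var workerA = new LinkedList<string>(new[] { "T1", "T2", "T3" });    // spawned by A, oldest first
var workerB = new LinkedList<string>();

var taken = new List<string?>
{
    NextTask(workerA, globalQueue, workerB), // A pops its newest task: T3
    NextTask(workerB, globalQueue, workerA), // B has nothing local, takes G1
    NextTask(workerB, globalQueue, workerA), // B steals A's oldest task: T1
    NextTask(workerA, globalQueue, workerB), // A pops T2
};
Console.WriteLine(string.Join(", ", taken)); // T3, G1, T1, T2
```

Popping the newest task locally favors cache-warm work the thread just created, while stealing the oldest task tends to grab an earlier, often larger piece of work and keeps the thief away from the end the owner is using.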
![Page 17: Using Parallel Computing Platform - NHDNUG](https://reader034.vdocuments.mx/reader034/viewer/2022052618/554f4189b4c90572088b535f/html5/thumbnails/17.jpg)
Task Parallel Library (TPL): Task Concepts
• Task: an asynchronous operation
• Task<TResult>: a Task that returns a result
• Continuation: a Task that starts when another completes
• FromAsync: a Task that wraps an existing APM (Asynchronous Programming Model) implementation
• TaskCompletionSource: creates a Task that represents another operation
• TaskScheduler: an extensible scheduler that executes Tasks
Common functionality: waiting, cancellation, continuations, parent/child relationships
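A short C# sketch exercising these concepts with APIs available since .NET 4.0 (`Task.Factory.StartNew`, `ContinueWith`, `TaskCompletionSource<T>`, `TaskCreationOptions.AttachedToParent`), written in modern C# syntax. The timer standing in for an external event is an illustrative assumption.

```csharp
using System;
using System.Threading;
using System.Threading.Tasks;

// Task<TResult>: an asynchronous operation that produces a value.
Task<int> sum = Task.Factory.StartNew(() =>
{
    int total = 0;
    for (int i = 1; i <= 100; i++) total += i;
    return total;
});

// Continuation: runs only after `sum` completes and receives the finished task.
Task<string> report = sum.ContinueWith(t => $"Sum = {t.Result}");

// TaskCompletionSource: a Task whose completion is driven by other code,
// here a timer callback standing in for an external event.
var tcs = new TaskCompletionSource<bool>();
using var timer = new Timer(_ => tcs.TrySetResult(true), null, 50, Timeout.Infinite);

// Parent/child: the parent does not complete until its attached child does.
Task parent = Task.Factory.StartNew(() =>
{
    Task.Factory.StartNew(() => Thread.Sleep(20), TaskCreationOptions.AttachedToParent);
});

// Waiting: block until all of them are done.
Task.WaitAll(report, tcs.Task, parent);
Console.WriteLine(report.Result);   // Sum = 5050
Console.WriteLine(tcs.Task.Result); // True
```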
![Page 18: Using Parallel Computing Platform - NHDNUG](https://reader034.vdocuments.mx/reader034/viewer/2022052618/554f4189b4c90572088b535f/html5/thumbnails/18.jpg)
Primitives and Structures
• Thread-safe, scalable collections
– IProducerConsumerCollection<T>: ConcurrentQueue<T>, ConcurrentStack<T>, ConcurrentBag<T>
– ConcurrentDictionary<TKey,TValue>
• Phases and work exchange
– Barrier
– BlockingCollection<T>
– CountdownEvent
• Partitioning
– {Orderable}Partitioner<T>
– Partitioner.Create
• Exception handling
– AggregateException
• Initialization
– Lazy<T>
– LazyInitializer.EnsureInitialized<T>
– ThreadLocal<T>
• Locks
– ManualResetEventSlim
– SemaphoreSlim
– SpinLock
– SpinWait
• Cancellation
– CancellationToken{Source}
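A hedged C# sketch combining a few of these types: a bounded `BlockingCollection<T>` for producer/consumer hand-off, `ConcurrentDictionary<TKey,TValue>` for shared counts, and `CancellationTokenSource` with `AggregateException` for cancellation. The workload (counting even and odd numbers) is purely illustrative.

```csharp
using System;
using System.Collections.Concurrent;
using System.Threading;
using System.Threading.Tasks;

// Bounded hand-off: with a capacity of 10, Add blocks whenever consumers fall 10 items behind.
var queue = new BlockingCollection<int>(boundedCapacity: 10);
var parityCounts = new ConcurrentDictionary<string, int>();

Task producer = Task.Factory.StartNew(() =>
{
    for (int i = 1; i <= 1000; i++) queue.Add(i);
    queue.CompleteAdding(); // lets GetConsumingEnumerable() end once drained
});

var consumers = new Task[3];
for (int c = 0; c < consumers.Length; c++)
{
    consumers[c] = Task.Factory.StartNew(() =>
    {
        foreach (int n in queue.GetConsumingEnumerable())
        {
            // AddOrUpdate retries on contention, so the counts stay exact without a lock.
            parityCounts.AddOrUpdate(n % 2 == 0 ? "even" : "odd", 1, (_, old) => old + 1);
        }
    });
}
Task.WaitAll(consumers);
producer.Wait();
Console.WriteLine($"even={parityCounts["even"]}, odd={parityCounts["odd"]}"); // even=500, odd=500

// Cancellation: waiting on a canceled task throws an AggregateException wrapping the cancellation.
var cts = new CancellationTokenSource();
Task loop = Task.Factory.StartNew(() =>
{
    for (int i = 0; i < 1000; i++)
    {
        cts.Token.ThrowIfCancellationRequested();
        Thread.Sleep(5);
    }
}, cts.Token);
cts.Cancel();
try { loop.Wait(); }
catch (AggregateException ex) { Console.WriteLine($"Caught {ex.InnerExceptions.Count} exception(s); canceled: {loop.IsCanceled}"); }
```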
![Page 19: Using Parallel Computing Platform - NHDNUG](https://reader034.vdocuments.mx/reader034/viewer/2022052618/554f4189b4c90572088b535f/html5/thumbnails/19.jpg)
Parallel Debugging
• Two new debugger tool windows, supporting both native and managed code
– “Parallel Tasks”
– “Parallel Stacks”
![Page 20: Using Parallel Computing Platform - NHDNUG](https://reader034.vdocuments.mx/reader034/viewer/2022052618/554f4189b4c90572088b535f/html5/thumbnails/20.jpg)
Parallel Tasks
− What threads are executing my tasks?
− Where are my tasks running (location, call stack)?
− Which tasks are blocked?
− How many tasks are waiting to run?
![Page 21: Using Parallel Computing Platform - NHDNUG](https://reader034.vdocuments.mx/reader034/viewer/2022052618/554f4189b4c90572088b535f/html5/thumbnails/21.jpg)
Parallel Stacks
− Multiple call stacks in a single view
− Task-specific view (task status)
− Easy navigation to any executing method
− Rich UI (zooming, panning, bird’s eye view, flagging, tooltips)
[Screenshot callouts: zoom control, bird’s eye view]
![Page 22: Using Parallel Computing Platform - NHDNUG](https://reader034.vdocuments.mx/reader034/viewer/2022052618/554f4189b4c90572088b535f/html5/thumbnails/22.jpg)
Parallel Profiling
![Page 23: Using Parallel Computing Platform - NHDNUG](https://reader034.vdocuments.mx/reader034/viewer/2022052618/554f4189b4c90572088b535f/html5/thumbnails/23.jpg)
CPU Utilization
[Figure: CPU utilization over time, with the number of cores on the vertical axis, showing your process, other processes, and idle time.]
![Page 24: Using Parallel Computing Platform - NHDNUG](https://reader034.vdocuments.mx/reader034/viewer/2022052618/554f4189b4c90572088b535f/html5/thumbnails/24.jpg)
Threads
• Detailed thread analysis (one channel per thread)
• Active legend and usage hints
• Call stacks
• Hide uninteresting threads, measure time for interesting segments, zoom in and out
![Page 25: Using Parallel Computing Platform - NHDNUG](https://reader034.vdocuments.mx/reader034/viewer/2022052618/554f4189b4c90572088b535f/html5/thumbnails/25.jpg)
Cores
• Each logical core in a swim lane
• One color per thread
• Migration visualization with cross-core migration details
![Page 26: Using Parallel Computing Platform - NHDNUG](https://reader034.vdocuments.mx/reader034/viewer/2022052618/554f4189b4c90572088b535f/html5/thumbnails/26.jpg)
Demo
Libraries, Languages, Debuggers, Profilers
![Page 27: Using Parallel Computing Platform - NHDNUG](https://reader034.vdocuments.mx/reader034/viewer/2022052618/554f4189b4c90572088b535f/html5/thumbnails/27.jpg)
Agenda
• What’s new with Windows?
• Parallel Computing Tools in Visual Studio
• Using .NET Parallel Extensions
![Page 28: Using Parallel Computing Platform - NHDNUG](https://reader034.vdocuments.mx/reader034/viewer/2022052618/554f4189b4c90572088b535f/html5/thumbnails/28.jpg)
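Thinking Parallel – “Task” vs. “Data” Parallelism
Task Parallelism
```csharp
using System;
using System.Threading.Tasks;

// Runs the three actions, potentially in parallel, and returns when all finish.
Parallel.Invoke(
    () => { Console.WriteLine("Begin first task..."); },
    () => { Console.WriteLine("Begin second task..."); },
    () => { Console.WriteLine("Begin third task..."); });
```
Data Parallelism
```csharp
using System;
using System.Collections.Generic;
using System.Linq;

IEnumerable<int> numbers = Enumerable.Range(2, 99); // candidates 2..100
// n is prime if no i in 2..floor(sqrt(n)) divides it; AsOrdered() keeps source order.
var myQuery =
    from n in numbers.AsParallel().AsOrdered()
    where Enumerable.Range(2, (int)Math.Sqrt(n) - 1).All(i => n % i > 0)
    select n;
int[] primes = myQuery.ToArray();
```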
![Page 29: Using Parallel Computing Platform - NHDNUG](https://reader034.vdocuments.mx/reader034/viewer/2022052618/554f4189b4c90572088b535f/html5/thumbnails/29.jpg)
Thinking Parallel – How to Partition Work?
Several partitioning schemes are built in:
– Chunk
• Works with any IEnumerable<T>
• Single enumerator shared; chunks handed out on demand
– Range
• Works only with IList<T>
• Input divided into contiguous regions, one per partition
– Stripe
• Works only with IList<T>
• Elements handed out round-robin to each partition
– Hash
• Works with any IEnumerable<T>
• Elements assigned to partitions based on hash code
Custom partitioning is available through Partitioner<T>
– Partitioner.Create is available for tighter control over the built-in partitioning schemes
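A brief C# sketch of `Partitioner.Create` in use: range partitioning over an index space for `Parallel.ForEach`, and the `loadBalance` flag that chooses static (range) or dynamic (chunk) partitioning for an array. The data and the range size of 10,000 are arbitrary choices for illustration.

```csharp
using System;
using System.Collections.Concurrent;
using System.Linq;
using System.Threading;
using System.Threading.Tasks;

int[] data = Enumerable.Range(1, 1_000_000).ToArray();

// Range partitioning: Partitioner.Create(from, to, rangeSize) yields Tuple<int,int>
// ranges [Item1, Item2), so each loop body handles a whole contiguous block and the
// delegate is invoked once per range instead of once per element.
long total = 0;
Parallel.ForEach(Partitioner.Create(0, data.Length, 10_000), range =>
{
    long local = 0;
    for (int i = range.Item1; i < range.Item2; i++) local += data[i];
    Interlocked.Add(ref total, local);
});

// Partitioner.Create(array, loadBalance): false requests static partitioning,
// true requests dynamic, load-balancing partitioning.
var staticParts = Partitioner.Create(data, loadBalance: false);
var dynamicParts = Partitioner.Create(data, loadBalance: true);
long sumStatic = staticParts.AsParallel().Sum(x => (long)x);
long sumDynamic = dynamicParts.AsParallel().Sum(x => (long)x);

Console.WriteLine(total); // 500000500000
```

Static partitioning has the lowest overhead when every element costs about the same; dynamic partitioning pays a little synchronization to keep cores busy when costs vary.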
![Page 30: Using Parallel Computing Platform - NHDNUG](https://reader034.vdocuments.mx/reader034/viewer/2022052618/554f4189b4c90572088b535f/html5/thumbnails/30.jpg)
Thinking Parallel – How to Execute Tasks?
![Page 31: Using Parallel Computing Platform - NHDNUG](https://reader034.vdocuments.mx/reader034/viewer/2022052618/554f4189b4c90572088b535f/html5/thumbnails/31.jpg)
Thinking Parallel – How to Collate Results?
![Page 32: Using Parallel Computing Platform - NHDNUG](https://reader034.vdocuments.mx/reader034/viewer/2022052618/554f4189b4c90572088b535f/html5/thumbnails/32.jpg)
Demos
Partition, Execute, Collate
![Page 33: Using Parallel Computing Platform - NHDNUG](https://reader034.vdocuments.mx/reader034/viewer/2022052618/554f4189b4c90572088b535f/html5/thumbnails/33.jpg)
Resources
• Native APIs/runtimes (Visual C++ 10)
– Tasks, loops, collections, and Agents
– http://msdn.microsoft.com/en-us/library/dd504870(VS.100).aspx
• Tools (in the VS2010 IDE)
– Debugger and profiler
– http://msdn.microsoft.com/en-us/library/dd460685(VS.100).aspx
• Managed APIs/runtimes (.NET 4)
– Tasks, loops, collections, and PLINQ
– http://msdn.microsoft.com/en-us/library/dd460693(VS.100).aspx
General VS2010 Parallel Computing Developer Center: http://msdn.microsoft.com/en-us/concurrency/default.aspx