Multicore Programming and TPL


Posted on 20-Jan-2015









  • 1. Multicore Programming

  • 2. Agenda: Part 1 - Current state of affairs; Part 2 - Multithreaded algorithms; Part 3 - Task Parallel Library
  • 3. Multicore Programming, Part 1: Current state of affairs
  • 4. Why Moore's law is not working anymore: power consumption; wire delays; DRAM access latency; diminishing returns of more instruction-level parallelism
  • 5. Power consumption [chart: power density (W/cm²) of Intel processors from the 8080, 386 and 486 through the Pentium line, climbing toward hot-plate, nuclear-reactor, rocket-nozzle and Sun's-surface levels; source: Intel Developer Forum, Spring 2004, Pat Gelsinger]
  • 6. Wire delays
  • 7. DRAM access latency
  • 8. Diminishing returns: '80s, from 10 CPI down to 1 CPI; '90s, from 1 CPI down to 0.5 CPI; '00s, multicore
  • 9. "The Free Lunch Is Over: A Fundamental Turn Toward Concurrency in Software" (Herb Sutter)
  • 10. Survival: to scale performance, put many processing cores on the microprocessor chip. The new edition of Moore's law is about the doubling of cores.
  • 11. Quotations: "No matter how fast processors get, software consistently finds new ways to eat up the extra speed. If you haven't done so already, now is the time to take a hard look at the design of your application, determine what operations are CPU-sensitive now or are likely to become so soon, and identify how those places could benefit from concurrency." (Herb Sutter, C++ Architect at Microsoft, March 2005) "After decades of single core processors, the high volume processor industry has gone from single to dual to quad-core in just the last two years. Moore's Law scaling should easily let us hit the 80-core mark in mainstream processors within the next ten years and quite possibly even less." (Justin Rattner, CTO, Intel, February ...)
  • 12. What keeps us away from multicore: a sequential way of thinking; the belief that parallel programming is difficult and error-prone; unwillingness to accept that the sequential era is over; neglecting performance
  • 13. What has been done: many frameworks have been created that bring parallelism to the application level.
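A minimal sketch of what such application-level parallelism looks like in one of these frameworks, .NET's Task Parallel Library (covered in Part 3). `FrameworkDemo` and `Capitalize` are names of my own choosing; `Capitalize` stands in for arbitrary per-item work:

```csharp
using System;
using System.Threading.Tasks;

static class FrameworkDemo
{
    public static string Capitalize(string s) => s.ToUpperInvariant();

    static void Main()
    {
        var letters = new[] { "alpha", "bravo", "charlie", "delta" };

        // The framework, not the programmer, decides how to partition the
        // collection and how many worker threads to run it on.
        Parallel.ForEach(letters, word => Console.WriteLine(Capitalize(word)));
    }
}
```

Note that the loop body runs concurrently, so output order is not guaranteed; that is exactly the sequential assumption these frameworks ask you to give up.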
Vendors are trying hard to teach the programming community how to write parallel programs, and MIT and other education centers have done a lot of research in this area.
  • 14. Multicore Programming, Part 2: Multithreaded algorithms
  • 15. Chapter 27, "Multithreaded Algorithms"
  • 16. Multithreaded algorithms: there is no single parallel computer architecture and no single, widely accepted model of parallel computing; here we rely on a parallel shared-memory computer.
  • 17. Dynamic multithreaded model (DMM): lets the programmer express logical parallelism without worrying about the issues of static thread programming. Its two main features are nested parallelism (a parent can proceed while a spawned child computes its result) and parallel loops (iterations of the loop can execute concurrently).
  • 18. DMM advantages: a simple extension of the serial model, with only 3 new keywords (parallel, spawn and sync); a theoretically clean way of quantifying parallelism based on the notions of work and span; many multithreaded algorithms based on nested parallelism follow naturally from the divide-and-conquer approach.
  • 19. Multithreaded execution model
  • 20. Work (T1: the total time to execute the whole computation on one processor)
  • 21. Span (T∞: the time to execute the computation on an unlimited number of processors, i.e. the longest path through the computation DAG)
  • 22. Speedup (T1/TP when running on P processors)
  • 23. Parallelism (T1/T∞)
  • 24. Performance summary
  • 25. Example: fib(4)
  • 26. Scheduler role
  • 27. Analyzing MT algorithms: matrix multiplication

    P-Square-Matrix-Multiply(A, B)
    1  n = A.rows
    2  let C be a new n x n matrix
    3  parallel for i = 1 to n
    4      parallel for j = 1 to n
    5          C[i,j] = 0
    6          for k = 1 to n
    7              C[i,j] = C[i,j] + A[i,k] * B[k,j]
    8  return C

  • 28. Analyzing MT algorithms: matrix multiplication (analysis)
  • 29. Chess lesson
  • 30. Multicore Programming, Part 3: Task Parallel Library
  • 31. TPL building blocks, consisting of: tasks; thread-safe scalable collections; phases and work exchange; partitioning; looping; control; breaking; exceptions; results
  • 32. Data parallelism: Parallel.ForEach(letters, ch => Capitalize(ch));
  • 33. Task parallelism: Parallel.Invoke(() => Average(), () => Minimum());
  • 34. Thread Pool in .NET 3.5
  • 35. Thread Pool in .NET 4.0
  • 36.
Task Scheduler & the thread pool. ThreadPool.QueueUserWorkItem in .NET 3.5 has disadvantages: zero information about each work item, and fairness (a FIFO queue is maintained). Improvements in .NET 4.0: a more efficient FIFO queue (ConcurrentQueue); an API enhanced to get more information from the user (Task); work stealing; thread injection; waiting for completion, handling exceptions and getting the computation result.
  • 37. New primitives:
      Thread-safe, scalable collections: IProducerConsumerCollection, ConcurrentQueue, ConcurrentStack, ConcurrentBag, ConcurrentDictionary
      Phases and work exchange: Barrier, BlockingCollection, CountdownEvent
      Partitioning: {Orderable}Partitioner, Partitioner.Create
      Initialization: Lazy, LazyInitializer.EnsureInitialized, ThreadLocal
      Locks: ManualResetEventSlim, SemaphoreSlim, SpinLock, SpinWait
      Cancellation: CancellationToken{Source}
      Exception handling: AggregateException
  • 38. References:
      "The Free Lunch Is Over: A Fundamental Turn Toward Concurrency in Software"
      MIT Introduction to Algorithms video lectures
      Chapter 27, "Multithreaded Algorithms", from Introduction to Algorithms, 3rd edition
      "CLR 4.0 ThreadPool Improvements: Part 1"
      Multicore Programming Primer
      ThreadPool on Channel 9
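The spawn/sync model from slides 17-18 and the fib example from slide 25 translate almost one-to-one into TPL tasks. A rough sketch; the serial cutoff of 20 is my own assumption, added because spawning a task for every recursive call would swamp the scheduler:

```csharp
using System.Threading.Tasks;

static class FibDemo
{
    const int Cutoff = 20; // assumed threshold below which spawning isn't worth it

    static long SerialFib(int n) => n < 2 ? n : SerialFib(n - 1) + SerialFib(n - 2);

    public static long Fib(int n)
    {
        if (n < Cutoff) return SerialFib(n);

        // "spawn": a child computes fib(n-1) while the parent proceeds...
        Task<long> child = Task.Factory.StartNew(() => Fib(n - 1));
        long y = Fib(n - 2);        // ...with fib(n-2) on the current thread
        return child.Result + y;    // "sync": block until the spawned child is done
    }
}
```

The work-stealing scheduler mentioned on slide 36 is what makes this nested-parallelism style cheap enough to use.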
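Similarly, the P-Square-Matrix-Multiply pseudocode from slide 27 can be sketched with TPL's Parallel.For. Here only the outer i-loop is parallelized, a common simplification of the pseudocode's nested parallel for; `MatrixDemo` is a name of my own choosing:

```csharp
using System.Threading.Tasks;

static class MatrixDemo
{
    // C = A * B for n x n matrices. The outer loop runs in parallel; the
    // inner k-loop stays serial because all its iterations update C[i,j].
    public static double[,] Multiply(double[,] a, double[,] b)
    {
        int n = a.GetLength(0);
        var c = new double[n, n];
        Parallel.For(0, n, i =>
        {
            for (int j = 0; j < n; j++)
            {
                double sum = 0.0;   // accumulate locally, write c[i,j] once
                for (int k = 0; k < n; k++)
                    sum += a[i, k] * b[k, j];
                c[i, j] = sum;
            }
        });
        return c;
    }
}
```

Each (i, j) cell is written by exactly one iteration, so no locking is needed: the parallel loops share only read-only data.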
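And the .NET 4.0 Task improvements listed on slide 36 (waiting for completion, handling exceptions, getting the computation result) might look like this in use; `SafeDivide` is a made-up name for illustration:

```csharp
using System;
using System.Threading.Tasks;

static class TaskDemo
{
    // Queues work as a Task (instead of ThreadPool.QueueUserWorkItem),
    // waits for completion, reads the result, and handles a failure that
    // crossed threads, wrapped in an AggregateException (slide 37).
    public static int SafeDivide(int x, int y)
    {
        Task<int> task = Task.Factory.StartNew(() => x / y);
        try
        {
            return task.Result;   // blocks until the task completes, rethrowing failures
        }
        catch (AggregateException ae) when (ae.InnerException is DivideByZeroException)
        {
            return 0;             // the worker's exception arrived intact
        }
    }
}
```

None of this bookkeeping was available with bare QueueUserWorkItem, which is exactly the "zero information about each work item" complaint above.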