Orion Granatir Omar Rodriguez
GDC 3/12/10
Don’t Dread Threads
Agenda
• Threading is worthwhile
• Data decomposition is a good place to start
• Think tasks!!
• Intel tools help make things easy
Threading is important!!
Multi-core Needs Parallel Applications
Threading is required to maximize performance
[Figure: performance over time across the GHz era and the multi-core era, comparing platform potential with app performance for serial and parallel code. Demo annotations: 33 FPS and 104 FPS.]
Follow these steps to add threading…
1. Use data decomposition
2. Use tasks
Functional decomposition is limited
[Figure: four cores]
• Potential latency with pipelining
• Poor load balancing
• Doesn’t scale on varying core counts
Data decomposition can scale to n-cores
[Figure: four cores]
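As a rough illustration of how data decomposition scales with core count, here is a minimal sketch that splits one array into a contiguous slice per worker thread. The Particle type, UpdateAll, and DefaultWorkerCount are hypothetical names invented for this example, not code from the talk, and plain std::thread is used only to keep the sketch self-contained.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <thread>
#include <vector>

// Hypothetical stand-in for per-item game work.
struct Particle {
    float position = 0.0f;
    float velocity = 1.0f;
    void Update(float dt) { position += velocity * dt; }
};

// Split [0, items.size()) into one contiguous slice per worker.
// The same code serves 2, 4, or n cores: only workerCount changes.
void UpdateAll(std::vector<Particle>& items, float dt, unsigned workerCount)
{
    assert(workerCount > 0);
    const std::size_t count = items.size();
    const std::size_t sliceSize = (count + workerCount - 1) / workerCount; // round up

    std::vector<std::thread> workers;
    for (unsigned w = 0; w < workerCount; ++w) {
        const std::size_t begin = std::min(count, w * sliceSize);
        const std::size_t end = std::min(count, begin + sliceSize);
        if (begin == end) break; // fewer items than workers
        // Each worker touches only its own slice, so no locking is needed.
        workers.emplace_back([&items, dt, begin, end] {
            for (std::size_t i = begin; i < end; ++i) items[i].Update(dt);
        });
    }
    for (std::thread& t : workers) t.join();
}

// Use as many workers as the hardware reports (at least one).
unsigned DefaultWorkerCount()
{
    return std::max(1u, std::thread::hardware_concurrency());
}
```

Calling UpdateAll(items, dt, DefaultWorkerCount()) gives every reported hardware thread a slice. Creating threads on every call is wasteful in a real frame loop; that cost is one reason to move on to tasks running on a thread pool.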
Big loops are ideal cases for data decomposition

    // Loop through each AI
    for( int Index = 0; Index < g_NumAI; Index++ )
    {
        // Update each AI for this frame
        g_AI[ Index ].Update();
    }
Minimize interactions

    // Loop through each AI
    for( int Index = 0; Index < g_NumAI; Index++ )
    {
        // Update each AI for this frame
        g_AI[ Index ].Update();
    }

[Figure: AI 0 and AI 1, annotated "Set m_HP to 10"]
Avoid locking

    // Loop through each AI
    for( int Index = 0; Index < g_NumAI; Index++ )
    {
        // Update each AI for this frame
        g_AI[ Index ].Update();
    }

[Figure: AI 0 and AI 1, annotated "Set m_HP to 10"]
Read global data, don’t write

    // Loop through each AI
    for( int Index = 0; Index < g_NumAI; Index++ )
    {
        // Update each AI for this frame
        g_AI[ Index ].Update();
    }
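One way to follow all three rules at once (minimize interactions, avoid locking, read global data without writing it) is to treat last frame's state as a read-only snapshot and let each AI write only its own slot for the next frame. The sketch below assumes that approach; AIState, m_Target, m_Damage, UpdateOne, and UpdateFrame are hypothetical names for this example, and only m_HP comes from the slides.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical agent state; m_HP mirrors the member named on the slides.
struct AIState {
    int m_HP = 100;
    int m_Target = -1;   // index of the agent this one attacks, or -1 for none
    int m_Damage = 0;    // damage dealt to m_Target this frame
};

// Compute the next state of agent `self`. `previous` is shared but read-only;
// the only write goes to the value returned for the caller's own slot.
AIState UpdateOne(const std::vector<AIState>& previous, std::size_t self)
{
    assert(self < previous.size());
    AIState next = previous[self];
    // Gather damage aimed at us instead of letting attackers write into us.
    for (const AIState& other : previous) {
        if (other.m_Target == static_cast<int>(self)) next.m_HP -= other.m_Damage;
    }
    return next;
}

void UpdateFrame(const std::vector<AIState>& previous, std::vector<AIState>& next)
{
    next.resize(previous.size());
    // Iterations are independent: each reads `previous` and writes only next[i],
    // so this loop could be split across threads without any locks.
    for (std::size_t i = 0; i < previous.size(); ++i) {
        next[i] = UpdateOne(previous, i);
    }
}
```

The gather loop is deliberately naive (every agent scans every other agent); the point is the access pattern, not the cost. AI 0 never sets AI 1's m_HP directly; AI 1 works out its own new value from the snapshot.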
OpenMP is a great way to get started

    // Loop through each AI
    #pragma omp parallel for
    for( int Index = 0; Index < g_NumAI; Index++ )
    {
        // Update each AI for this frame
        g_AI[ Index ].Update();
    }

Serial: 1.00x
6 Core: 2.31x
Algorithm: ~12.0x
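For anyone who wants to try the pragma outside the demo, here is a small self-contained version. The AI struct and its members are hypothetical placeholders, not the demo's classes. It assumes the compiler's OpenMP switch is enabled (for example -fopenmp on GCC and Clang, /openmp on MSVC); without it the pragma is ignored and the loop simply runs serially.

```cpp
#include <cassert>
#include <climits>
#include <cstddef>
#include <vector>

// Hypothetical AI with purely local state.
struct AI {
    float m_X = 0.0f;
    float m_Speed = 2.0f;
    void Update(float dt) { m_X += m_Speed * dt; }
};

void UpdateAI(std::vector<AI>& ai, float dt)
{
    // A signed int index keeps the loop valid for older OpenMP versions,
    // which require a signed loop variable.
    assert(ai.size() <= static_cast<std::size_t>(INT_MAX));
    const int count = static_cast<int>(ai.size());

    // Each iteration touches only ai[index], so iterations may run in any
    // order and on any thread.
    #pragma omp parallel for
    for (int index = 0; index < count; ++index) {
        ai[index].Update(dt);
    }
}
```

The pragma is only safe here because Update() writes nothing but its own object. If Update() wrote to shared state, the loop would need the restructuring discussed on the previous slides first.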
The next step is to use tasks
[Figure: four cores]
• Needed for load balancing (avoid oversubscription)
• Support large chunks of work
• Better utilization of cache
Tasks can be used to parallelize complex problems
[Figure labels: Setup, Texture Lookup, Data Parallelism, Processing]
Tasks can be arranged in a dependency graph
[Figure labels: Setup, Texture Lookup, Data Parallelism, Processing]
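To make the idea of a dependency graph concrete, here is a minimal sketch using std::async and futures. The shape of the graph is an assumption, since the slide figure did not survive: Setup runs first, Texture Lookup and the data-parallel step both depend on Setup but not on each other, and Processing needs both results. All function bodies are made-up placeholder work, and std::async stands in for a real task scheduler.

```cpp
#include <cassert>
#include <future>
#include <numeric>
#include <vector>

// Root task: produce the data the other tasks read (placeholder work).
std::vector<int> Setup()
{
    std::vector<int> data(8);
    std::iota(data.begin(), data.end(), 1);   // 1, 2, ..., 8
    return data;
}

int TextureLookup(const std::vector<int>& data)
{
    assert(!data.empty());
    return data.front() + data.back();
}

int DataParallelWork(const std::vector<int>& data)
{
    return std::accumulate(data.begin(), data.end(), 0);
}

int Processing(int lookupResult, int parallelResult)
{
    return lookupResult * parallelResult;
}

int RunGraph()
{
    // Setup has no dependencies, so it runs first.
    const std::vector<int> data = Setup();

    // These two tasks depend only on Setup, so they may run concurrently.
    std::future<int> lookup =
        std::async(std::launch::async, [&data] { return TextureLookup(data); });
    std::future<int> parallel =
        std::async(std::launch::async, [&data] { return DataParallelWork(data); });

    // Processing depends on both, so it waits for both results.
    const int lookupResult = lookup.get();
    const int parallelResult = parallel.get();
    return Processing(lookupResult, parallelResult);
}
```

The edges of the graph show up as the points where one task needs another's result. Everything without an edge between it is free to run in parallel.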
Dependency graph can be mapped to a thread pool
[Figure: four cores]
Think of a task as a unit of work
A task is a unit of work
• It’s run on a thread pool
• It runs to completion
• It has heavy penalties for blocking
• It’s an efficient way to avoid oversubscription
• It adapts to any number of threads/cores, regardless of CPU topology
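The bullets above can be sketched as a very small thread pool: a fixed set of worker threads pulls tasks from a shared queue and runs each one to completion. This is a teaching sketch under simple assumptions (one mutex-protected queue, no work stealing, no priorities), and ThreadPool, Submit, and WaitAll are hypothetical names rather than any library's API.

```cpp
#include <cassert>
#include <condition_variable>
#include <functional>
#include <mutex>
#include <queue>
#include <thread>
#include <utility>
#include <vector>

class ThreadPool {
public:
    explicit ThreadPool(unsigned threadCount)
    {
        assert(threadCount > 0);
        for (unsigned i = 0; i < threadCount; ++i)
            m_Workers.emplace_back([this] { WorkerLoop(); });
    }

    ~ThreadPool()
    {
        {
            std::lock_guard<std::mutex> lock(m_Mutex);
            m_Stopping = true;
        }
        m_WorkAvailable.notify_all();
        for (std::thread& t : m_Workers) t.join();
    }

    // Queue one unit of work; any idle worker may pick it up.
    void Submit(std::function<void()> task)
    {
        {
            std::lock_guard<std::mutex> lock(m_Mutex);
            m_Tasks.push(std::move(task));
            ++m_Pending;
        }
        m_WorkAvailable.notify_one();
    }

    // Block the caller until every submitted task has finished.
    void WaitAll()
    {
        std::unique_lock<std::mutex> lock(m_Mutex);
        m_AllDone.wait(lock, [this] { return m_Pending == 0; });
    }

private:
    void WorkerLoop()
    {
        for (;;) {
            std::function<void()> task;
            {
                std::unique_lock<std::mutex> lock(m_Mutex);
                m_WorkAvailable.wait(lock, [this] { return m_Stopping || !m_Tasks.empty(); });
                if (m_Tasks.empty()) return;   // stopping and queue drained
                task = std::move(m_Tasks.front());
                m_Tasks.pop();
            }
            task();   // runs to completion, outside the lock
            {
                std::lock_guard<std::mutex> lock(m_Mutex);
                if (--m_Pending == 0) m_AllDone.notify_all();
            }
        }
    }

    std::mutex m_Mutex;
    std::condition_variable m_WorkAvailable;
    std::condition_variable m_AllDone;
    std::queue<std::function<void()>> m_Tasks;
    unsigned m_Pending = 0;
    bool m_Stopping = false;
    std::vector<std::thread> m_Workers;
};
```

Because the number of threads is fixed when the pool is created, submitting a thousand tasks never creates a thousand threads, which is how a pool avoids oversubscription. It also shows why blocking inside a task is expensive: a blocked task ties up one of the few workers.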
Data decomposition makes defining tasks easy

    // Update all AI
    void UpdateAI( float DeltaTime )
    {
        for( int Index = 0; Index < g_NumAI; Index++ )
        {
            // Update each AI for this frame
            g_AI[ Index ].Update();
        }
    }
Data decomposition makes defining tasks easy

    // Update all AI
    void UpdateAI( float DeltaTime )
    {
        // Determine the number of AI tasks we want to create,
        // rounding up so a final partial group is not skipped
        unsigned int AIGroups = ( g_NumAI + MAX_AI_PER_GROUP - 1 ) / MAX_AI_PER_GROUP;

        for( unsigned int Index = 0; Index < AIGroups; Index++ )
        {
            // Build the task specific data
            AITaskData* pData = new AITaskData();
            pData->m_Start = Index * MAX_AI_PER_GROUP;
            pData->m_DeltaTime = DeltaTime;

            // Submit task
            SubmitTask( Task_UpdateAI, (void*)pData );
        }
    }
Individual tasks are run by the thread pool

    void Task_UpdateAI( void* pTaskData )
    {
        // Read data
        AITaskData* pData = (AITaskData*)pTaskData;
        unsigned int Start = pData->m_Start;
        unsigned int End = pData->m_Start + MAX_AI_PER_GROUP;

        // Cap End with max number of AI
        End = ( End > g_NumAI ) ? g_NumAI : End;

        // Loop through all of our AI and update
        for( unsigned int Index = Start; Index < End; Index++ )
        {
            g_AI[ Index ].Update();
        }

        // Cleanup
        delete pData;
    }
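The index arithmetic is the part that is easy to get wrong, so here it is in isolation as a small sketch. When the item count is not a multiple of the group size, the group count has to round up and the last range has to be capped, otherwise the trailing items are either skipped or overrun. BuildTaskRanges is a hypothetical helper written for this example; it is not part of the demo code.

```cpp
#include <algorithm>
#include <cassert>
#include <utility>
#include <vector>

// Returns half-open [start, end) index ranges, one per task, covering
// 0..itemCount with at most maxPerGroup items in each range.
std::vector<std::pair<unsigned, unsigned>> BuildTaskRanges(unsigned itemCount,
                                                           unsigned maxPerGroup)
{
    assert(maxPerGroup > 0);

    // Round up so a final partial group still gets its own task.
    const unsigned groupCount = (itemCount + maxPerGroup - 1) / maxPerGroup;

    std::vector<std::pair<unsigned, unsigned>> ranges;
    ranges.reserve(groupCount);
    for (unsigned group = 0; group < groupCount; ++group) {
        const unsigned start = group * maxPerGroup;
        // Cap the end so the last range never runs past the data.
        const unsigned end = std::min(start + maxPerGroup, itemCount);
        ranges.emplace_back(start, end);
    }
    return ranges;
}
```

For example, 10 items in groups of 4 give three ranges: [0, 4), [4, 8), and [8, 10). Each range maps onto one task's Start and End.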
Intel Threading Building Blocks is good for tasks
Intel® Threading Building Blocks (Intel® TBB) has a low-level API to create and process trees of work; each node is a task.
[Figure: task trees with nodes labeled Root, Task, and More, and edges labeled Spawn, Wait, Spawn & Wait, and Callback. Captions: "Blocking calls go down" and "Continuations go up".]
Learn more about tasking…
Task-based Multithreading – How to Program for 100 Cores
Presented by Ron Fosner
Friday, March 12 @ 4:30PM, South 300
… or get Game Engine Gems 1* and read Brad Werth’s article.
* Other names and brands may be claimed as the property of others.
Time to look at our example…
Hotspots are good candidates for threading
• Use tools like Intel® VTune™ and Intel® Parallel Studio to locate hotspots.
• Intel® Parallel Studio inspector shows that Flock() is the main bottleneck. This is a good place to investigate threading.
Validate threading results with Parallel Amplifier
Use Parallel Amplifier to validate concurrency
• We have “ideal” CPU utilization for Flocking.
• Now we can start looking for other hotspots to optimize.
• There is still a lot of serial code…
Use Parallel Inspector to find threading errors
• Have a lot of system memory
• Use a reduced data set
• Workload should be repeatable
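As an example of the kind of threading error a checker is meant to flag, consider several threads incrementing one shared counter. The sketch below shows the corrected form and marks where the race would be. It is a generic illustration rather than output from any Intel tool, and CountAbove is a hypothetical function. It also follows the advice above: the data set is small and the workload is repeatable, so the correct answer is known in advance.

```cpp
#include <atomic>
#include <cassert>
#include <cstddef>
#include <thread>
#include <vector>

// Counts how many values exceed `threshold`, using `threadCount` threads.
int CountAbove(const std::vector<int>& values, int threshold, unsigned threadCount)
{
    assert(threadCount > 0);

    // With a plain `int hits`, the `++hits` below would be an unsynchronized
    // read-modify-write from several threads: a data race with lost updates.
    // std::atomic makes each increment indivisible.
    std::atomic<int> hits{0};

    std::vector<std::thread> workers;
    for (unsigned w = 0; w < threadCount; ++w) {
        workers.emplace_back([&values, &hits, threshold, threadCount, w] {
            // Interleaved split: worker w handles indices w, w + threadCount, ...
            for (std::size_t i = w; i < values.size(); i += threadCount) {
                if (values[i] > threshold) ++hits;
            }
        });
    }
    for (std::thread& t : workers) t.join();
    return hits.load();
}
```

An atomic counter fixes the race but still makes every thread contend for one variable. Giving each thread a local count and summing after the join is usually faster and is closer to the "avoid locking" advice earlier in the talk.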
Use other tools as needed… I like Intel® GPA
• Intel® Graphics Performance Analyzer is designed for games.
• System Analyzer gives a complete view of system resources (CPU, GPU, Bus)
• Frame Analyzer allows you to dive into a DX frame
• Platform View allows you to instrument code to analyze workload balance and execution time.
Conclusion
• Threading is required to maximize your game’s performance
• Use data decomposition to scale to n-cores
• Use tasks for load balancing and to be platform independent
• Use Intel tools to make your life easier
• Attend: “Task-based Multithreading – How to Program for 100 Cores” this Friday.
Contact Information
Email: [email protected] [email protected]
http://www.intel.com/software/gdc
See Intel at GDC:
Intel Booth at Expo, North Hall
Intel Interactive Lounge
Other Sessions
A Visual Guide to Game and Task Performance on Mass-market PC Game Platforms
Thursday, March 11 @ 4:30PM, North 122
Building Games for Netbooks
Friday, March 12 @ 9AM, South 310
Simpler Better Faster Vector
Friday, March 12 @ 1:30PM, North 122
Tuning Your Game for Next Generation Intel Graphics
Friday, March 12 @ 1:30PM, South 302
Task-based Multithreading – How to Program for 100 Cores
Friday, March 12 @ 4:30PM, South 300
Please fill out an evaluation form… it’ll help us win a bet
Thank you
Legal Disclaimer
INFORMATION IN THIS DOCUMENT IS PROVIDED IN CONNECTION WITH INTEL® PRODUCTS. NO LICENSE, EXPRESS OR IMPLIED, BY ESTOPPEL OR OTHERWISE, TO ANY INTELLECTUAL PROPERTY RIGHTS IS GRANTED BY THIS DOCUMENT. EXCEPT AS PROVIDED IN INTEL’S TERMS AND CONDITIONS OF SALE FOR SUCH PRODUCTS, INTEL ASSUMES NO LIABILITY WHATSOEVER, AND INTEL DISCLAIMS ANY EXPRESS OR IMPLIED WARRANTY, RELATING TO SALE AND/OR USE OF INTEL® PRODUCTS INCLUDING LIABILITY OR WARRANTIES RELATING TO FITNESS FOR A PARTICULAR PURPOSE, MERCHANTABILITY, OR INFRINGEMENT OF ANY PATENT, COPYRIGHT OR OTHER INTELLECTUAL PROPERTY RIGHT. INTEL PRODUCTS ARE NOT INTENDED FOR USE IN MEDICAL, LIFE SAVING, OR LIFE SUSTAINING APPLICATIONS.
Intel may make changes to specifications and product descriptions at any time, without notice. All products, dates, and figures specified are preliminary based on current expectations, and are subject to change without notice.
Intel processors, chipsets, and desktop boards may contain design defects or errors known as errata, which may cause the product to deviate from published specifications. Current characterized errata are available on request.
Performance tests and ratings are measured using specific computer systems and/or components and reflect the approximate performance of Intel products as measured by those tests. Any difference in system hardware or software design or configuration may affect actual performance.
Intel, Intel Inside, and the Intel logo are trademarks of Intel Corporation in the United States and other countries.
Any software source code reprinted in this document is furnished under a software license and may only be used or copied in accordance with the terms of that license.
*Other names and brands may be claimed as the property of others. Copyright © 2010 Intel Corporation.