![Page 1: John Feo Architect Microsoft Corporation SYMP02. Bad sequential code will run faster on a faster processor Bad parallel code WILL NOT run faster on more](https://reader036.vdocuments.mx/reader036/viewer/2022062716/56649dc55503460f94ab8ecd/html5/thumbnails/1.jpg)
Architecting Scalable and Responsive Applications
John Feo, Architect, Microsoft Corporation
SYMP02
The Free Lunch Is Over
Bad sequential code will run faster on a faster processor.
Bad parallel code WILL NOT run faster on more cores.
(Chart: speedup, 0 to 3, for 1 to 32 cores)
Just using parallel code is not enough.
Agenda
How to think about different levels of parallelism
How to architect parallel algorithms
Optimization techniques
Do I Need Parallelism?
No – I can compute my problem, today and tomorrow, in a reasonable time.
Yes – I need to compute N instances of my problem (throughput): animated film, portfolio simulations.
Yes – I need to compute one instance of my problem in time T (capability): director frames, intelligent avatars, speech recognition.
Throughput Computing
Master sends out inputs and gathers results.
Increase the number of processors to decrease time to solution.
Sequential worker code is okay, but only if:
problem instances are balanced; otherwise, …
each instance fits on a single computing element; otherwise, …
(Diagram: four independent CPU + memory nodes)
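A minimal C# sketch of the master/worker pattern described above, assuming the .NET Task Parallel Library (`System.Threading.Tasks`). `SolveInstance` is a hypothetical stand-in for the sequential worker code, and `Parallel.For` plays the master's role of handing out instances and gathering results.

```csharp
using System;
using System.Threading.Tasks;

// Sketch of throughput computing: the master hands out independent problem
// instances and gathers the results; each worker runs ordinary sequential code.
public static class ThroughputFarm
{
    // Hypothetical sequential worker: stands in for any self-contained computation on one instance.
    public static long SolveInstance(int seed)
    {
        long acc = 0;
        for (int k = 1; k <= 1000; k++)
            acc += (seed * 31L + k) % 97;
        return acc;
    }

    // Master: distribute the instances across the available cores and gather the
    // results in input order. No locking is needed: each slot is written by one iteration.
    public static long[] RunAll(int[] inputs)
    {
        var results = new long[inputs.Length];
        Parallel.For(0, inputs.Length, idx => results[idx] = SolveInstance(inputs[idx]));
        return results;
    }
}
```

Because instances are independent, the only coordination is the final join; if instance costs vary widely, the balance caveat above is what limits the speedup.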
Capability Computing
Render a short sequence of movie frames quickly and accurately: don’t keep the expensive director waiting.
Get enough accuracy to enable the right decision.
Create more challenging game adversaries.
Faster, more skills, better accuracy, more intelligence
Challenges Of Capability Computing
The problem is always as big as the machine.
Write parallel code that effectively uses all resources:
decompose by task or data;
communicate shared values;
synchronize access to shared values.
Optimize for the critical resource.
Enough, But Not Too Much…
You want enough parallelism to keep the machine busy:
decompose to scale with problem size and processor count;
over-decompose to improve load balance;
rely on the Microsoft runtime to schedule and manage threads.
Too much parallelism may increase communication and synchronization costs; watch out for resources consumed by waiting tasks.
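One way to control the granularity of a decomposition in .NET is to hand `Parallel.ForEach` explicit index ranges from `Partitioner.Create`. In this sketch, `Granularity.Scale` and its `chunkSize` parameter are illustrative assumptions; in practice a plain `Parallel.For` already partitions the range itself, in line with relying on the runtime.

```csharp
using System;
using System.Collections.Concurrent;
using System.Threading.Tasks;

public static class Granularity
{
    // Scales every element, processing the index space in chunks of chunkSize.
    // Many small chunks (over-decomposition) give the scheduler room to balance load;
    // very small chunks add scheduling overhead, very large ones can leave cores idle.
    public static double[] Scale(double[] data, double factor, int chunkSize)
    {
        var result = new double[data.Length];
        if (data.Length == 0) return result;   // Partitioner.Create rejects an empty range

        // Partitioner.Create(from, to, rangeSize) yields Tuple<int, int> ranges [from, to).
        var ranges = Partitioner.Create(0, data.Length, chunkSize);
        Parallel.ForEach(ranges, range =>
        {
            for (int i = range.Item1; i < range.Item2; i++)   // sequential loop inside one chunk
                result[i] = data[i] * factor;
        });
        return result;
    }
}
```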
Task Parallelism
Decompose the program by operations.
(Diagram: a game decomposed into Audio, Video, UI, Network, Avatars, Weapons and Vehicles tasks)
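A sketch of task parallelism for the game example, assuming `Parallel.Invoke` from the .NET Task Parallel Library. The `Update` method and the subsystem strings are hypothetical placeholders; the point is that this decomposition yields a fixed, small number of coarse tasks.

```csharp
using System;
using System.Collections.Concurrent;
using System.Threading.Tasks;

public static class GameFrame
{
    // Hypothetical subsystem update; a real engine would mix audio, step physics, etc.
    private static void Update(string subsystem, ConcurrentQueue<string> completed)
    {
        completed.Enqueue(subsystem);
    }

    // One frame decomposed by operation: each delegate is one coarse-grained task.
    // Parallel.Invoke returns once every task has finished.
    public static string[] RunFrame()
    {
        var completed = new ConcurrentQueue<string>();
        Parallel.Invoke(
            () => Update("Audio", completed),
            () => Update("Video", completed),
            () => Update("UI", completed),
            () => Update("Network", completed),
            () => Update("Avatars", completed),
            () => Update("Weapons", completed),
            () => Update("Vehicles", completed));
        return completed.ToArray();
    }
}
```

With seven tasks, at most seven cores can be busy no matter how many are available, which is why task parallelism on its own may not scale.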
The Good And Bad Of Task Parallelism
Good: high level, big chunks; communication is coarse-grained; synchronization is minimal; the number of tasks is small.
Bad: may not scale; may not load balance.
Take advantage of data parallelism within tasks.
Data Parallelism
Think of the application as a set of elements: for all pixels, for all triangles.
Lots of parallelism (easy to load balance).
Scales with problem size and number of processors.
Just for loops, so similar to sequential code.
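A minimal "for all pixels" loop in C#, assuming a hypothetical grayscale image stored row by row in a `byte[]`. The parallel loop runs over rows and each row is an ordinary sequential loop, so the code reads much like its sequential counterpart.

```csharp
using System;
using System.Threading.Tasks;

public static class PixelOps
{
    // Inverts a width x height grayscale image stored row-major in 'pixels'.
    public static byte[] Invert(byte[] pixels, int width, int height)
    {
        var output = new byte[pixels.Length];
        // Data parallelism over rows: each iteration owns one row, so no locking is needed.
        Parallel.For(0, height, y =>
        {
            int rowStart = y * width;
            for (int x = 0; x < width; x++)
                output[rowStart + x] = (byte)(255 - pixels[rowStart + x]);
        });
        return output;
    }
}
```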
Levels Of Data Parallelism
Most data-parallel applications can be parallelized at different levels:
cubes, planes, columns, cells;
graphs, nodes, edges;
volumes, objects, rays, pixels.
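The same cube update written at two of the levels listed above, as a sketch assuming a flat `double[]` of n³ cells. `ScaleByPlane` exposes only n-way parallelism, which may be too little for many cores when n is small; `ScaleByCell` exposes n³ iterations and relies on `Parallel.For`'s internal range partitioning to keep per-cell overhead down. The class and method names are hypothetical.

```csharp
using System;
using System.Threading.Tasks;

// One update of an n x n x n cube stored as a flat array
// (cell index = (z * n + y) * n + x), expressed at two levels of granularity.
public static class CubeLevels
{
    // Plane level: n parallel iterations, each sweeping one n x n plane sequentially.
    public static void ScaleByPlane(double[] cube, int n, double factor)
    {
        Parallel.For(0, n, z =>
        {
            int start = z * n * n;
            for (int c = start; c < start + n * n; c++)
                cube[c] *= factor;
        });
    }

    // Cell level: n * n * n parallel iterations, one per cell.
    public static void ScaleByCell(double[] cube, int n, double factor)
    {
        Parallel.For(0, n * n * n, c => cube[c] *= factor);
    }
}
```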
Sequential Matrix Multiply
// A: N x N; B: M x N (the right-hand operand stored transposed); C: N x M, zero-initialized
for (int i = 0; i < N; i++)
    for (int j = 0; j < M; j++)
        for (int k = 0; k < N; k++)
            C[i][j] += A[i][k] * B[j][k];
Parallel Matrix Multiply
// Only the outer loop changes: rows of C are computed in parallel, each by one iteration
Parallel.For(0, N, i =>
{
    for (int j = 0; j < M; j++)
        for (int k = 0; k < N; k++)
            C[i][j] += A[i][k] * B[j][k];
});
More Parallel Matrix Multiply
// Collapse the i and j loops into one space of N * M independent iterations
Parallel.For(0, N * M, ij =>
{
    int i = ij / M;   // j runs over M columns, so divide and take the remainder by M
    int j = ij % M;
    for (int k = 0; k < N; k++)
        C[i][j] += A[i][k] * B[j][k];
});
Reduce Loads And Stores
Parallel.For(0, N * M, ij =>
{
    int i = ij / M;
    int j = ij % M;
    double sum = 0.0;   // accumulate in a local instead of loading and storing C[i][j] each time
    for (int k = 0; k < N; k++)
        sum += A[i][k] * B[j][k];
    C[i][j] = sum;      // one store per element of C
});
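A self-contained version of the collapsed, locally accumulating multiply, sketched under the assumption that callers pass an ordinary A (n × p) and B (p × m) as jagged arrays. It transposes B once up front so the inner loop can use the `B[j][k]`-style row access from the slides, and it pre-allocates C before entering the parallel region. `MatMul.Multiply` is a hypothetical name.

```csharp
using System;
using System.Threading.Tasks;

public static class MatMul
{
    // C = A * B for jagged arrays; A is n x p, B is p x m.
    public static double[][] Multiply(double[][] A, double[][] B)
    {
        int n = A.Length, p = B.Length, m = B[0].Length;

        // Transpose B once so the inner loop walks rows of both operands.
        var Bt = new double[m][];
        for (int col = 0; col < m; col++)
        {
            Bt[col] = new double[p];
            for (int row = 0; row < p; row++) Bt[col][row] = B[row][col];
        }

        // Pre-allocate the result outside the parallel region.
        var C = new double[n][];
        for (int row = 0; row < n; row++) C[row] = new double[m];

        Parallel.For(0, n * m, ij =>
        {
            int i = ij / m, j = ij % m;   // collapsed (i, j) index
            double sum = 0.0;             // local accumulator: one store per element
            for (int k = 0; k < p; k++)
                sum += A[i][k] * Bt[j][k];
            C[i][j] = sum;
        });
        return C;
    }
}
```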
Permute An Array Of Integers
for (int k = 0; k < NumberOfSwaps; k++)
{
    int i = Rnd.Next(N);   // random index in [0, N)
    int j = Rnd.Next(N);
    int temp = X[i];
    X[i] = X[j];
    X[j] = temp;
}
Parallel Permutation
// One lock object per array element (requires System.Linq)
object[] locks = Enumerable.Range(0, N).Select(_ => new object()).ToArray();

Parallel.For(0, NumberOfSwaps, k =>
{
    // System.Random is not thread-safe; Random.Shared (.NET 6+) is
    int i = Random.Shared.Next(N);
    int j = Random.Shared.Next(N);
    // lock elements i and j, and swap
    if (i == j) return;              // return ends this iteration of the loop body
    if (i > j) (i, j) = (j, i);      // always lock the lower index first to avoid deadlock
    lock (locks[i])
    {
        lock (locks[j])
        {
            int temp = X[i];
            X[i] = X[j];
            X[j] = temp;
        }
    }
});
Maybe We Can Make It Easy…
// "transaction { ... }" is a hypothetical atomic block (transactional memory), not a C# construct
Parallel.For(0, NumberOfSwaps, k =>
{
    int i = Random.Shared.Next(N);
    int j = Random.Shared.Next(N);
    transaction
    {
        int temp = X[i];
        X[i] = X[j];
        X[j] = temp;
    }
});
Optimize, Optimize, Optimize
Use the best parallel algorithm.
Use hardware accelerators.
Use special instructions (SSE).
Push parallelism as far out as possible.
Cut off recursion.
Accumulate locally.
Pre-allocate memory.
Collapse and fuse loops.
Use parallel data structures.
Remove I/O from inside parallel regions.
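Two of the techniques above sketched in C#: "accumulate locally" with the thread-local overload of `Parallel.For` (`localInit`, `body`, `localFinally`), and "cut off recursion" with `Parallel.Invoke` above a size threshold. The class and method names and the default cutoff of 4096 are illustrative assumptions, not tuned values.

```csharp
using System;
using System.Threading;
using System.Threading.Tasks;

public static class Optimizations
{
    // Accumulate locally: each task keeps a private partial sum, adds into it without
    // synchronization, and merges it into the shared total once at the end.
    public static long SumOfSquares(int[] data)
    {
        long total = 0;
        Parallel.For<long>(0, data.Length,
            () => 0L,                                              // localInit: per-task partial sum
            (i, state, local) => local + (long)data[i] * data[i],  // body: no shared writes
            local => Interlocked.Add(ref total, local));           // localFinally: one atomic merge
        return total;
    }

    // Cut off recursion: split in parallel only while the range is large;
    // below the cutoff, fall back to a plain sequential loop.
    public static long RecursiveSum(int[] data, int lo, int hi, int cutoff = 4096)
    {
        if (hi - lo <= cutoff)
        {
            long s = 0;
            for (int i = lo; i < hi; i++) s += data[i];
            return s;
        }
        int mid = lo + (hi - lo) / 2;
        long left = 0, right = 0;
        Parallel.Invoke(
            () => left = RecursiveSum(data, lo, mid, cutoff),
            () => right = RecursiveSum(data, mid, hi, cutoff));
        return left + right;
    }
}
```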
Breadth-first Search
Given a graph G(V, E) and nodes s and t, does there exist a path from s to t? (s is the source, t is the sink.)
Two common solutions:
Horizon: iterative outer loop, data-parallel inner loop, user-managed “list of nodes”.
Recursion: data-parallel loop with a recursive call; rely on the runtime system to manage the program.
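A level-synchronous sketch of the horizon method: an iterative outer loop over horizons, a data-parallel inner loop over the current horizon, and a `ConcurrentBag` as the user-managed list of nodes. `GraphNode`, `TryVisit` and `HorizonSearch` are hypothetical names; the atomic visited flag plays the role of `NotVisited()`.

```csharp
using System;
using System.Collections.Concurrent;
using System.Collections.Generic;
using System.Threading;
using System.Threading.Tasks;

// Hypothetical node type: an adjacency list plus a visited flag claimed atomically.
public sealed class GraphNode
{
    private int _visited;   // 0 = unvisited, 1 = visited
    public List<GraphNode> Neighbors { get; } = new List<GraphNode>();

    // True for exactly one caller: the one that flips the flag from 0 to 1.
    public bool TryVisit() => Interlocked.CompareExchange(ref _visited, 1, 0) == 0;
}

public static class HorizonSearch
{
    // Visited flags are never reset, so each graph instance supports a single search.
    public static bool PathExists(GraphNode s, GraphNode t)
    {
        if (s == t) return true;
        s.TryVisit();
        var horizon = new List<GraphNode> { s };
        while (horizon.Count > 0)                       // iterative outer loop
        {
            var next = new ConcurrentBag<GraphNode>();  // next horizon, filled in parallel
            bool found = false;
            Parallel.ForEach(horizon, (n, state) =>     // data-parallel inner loop
            {
                foreach (GraphNode nn in n.Neighbors)
                {
                    if (!nn.TryVisit()) continue;       // another thread already claimed it
                    if (nn == t) { found = true; state.Stop(); return; }
                    next.Add(nn);
                }
            });
            if (found) return true;
            horizon = new List<GraphNode>(next);
        }
        return false;
    }
}
```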
Horizon Method
(Diagram: an example graph with source s, sink t and nodes 1 to 9, searched one horizon at a time)
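Our Code For Horizon Method
private BlockingCollection<node> _horizon =
    new BlockingCollection<node>();
private int _pending;           // nodes added to the horizon but not yet expanded
private volatile bool _found;   // set once the sink has been reached

_pending = 1;
source.NotVisited();            // mark the source as visited
_horizon.Add(source);
// NoBuffering hands out one node at a time, so no worker waits while holding unexpanded nodes
Parallel.ForEach(
    Partitioner.Create(_horizon.GetConsumingEnumerable(), EnumerablePartitionerOptions.NoBuffering),
    n =>
{
    foreach (node nn in n.Neighbors())
    {
        if (nn.NotVisited())
        {
            if (nn == sink)
                { _found = true; _horizon.CompleteAdding(); break; }
            Interlocked.Increment(ref _pending);
            try { _horizon.Add(nn); }
            catch (InvalidOperationException) { break; }   // adding has already been completed
        }
    }
    // Last pending node expanded without reaching the sink: end the consuming loop
    if (Interlocked.Decrement(ref _pending) == 0) _horizon.CompleteAdding();
});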
Not Visited
private int visited;                              // 0 = not visited, 1 = visited
private readonly object _visitedLock = new object();

public bool NotVisited() {
    bool flag = false;
    if (Volatile.Read(ref visited) == 0) {        // cheap test without taking the lock
        lock (_visitedLock) {
            if (visited == 0) { flag = true; visited = 1; }   // re-test under the lock
        }
    }
    return flag;
}
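A lock-free alternative to the double-checked `NotVisited()` above, sketched with `Interlocked.CompareExchange`: one atomic compare-and-swap both tests and sets the flag, so exactly one caller wins. The `Vertex` class name is hypothetical.

```csharp
using System;
using System.Threading;

public sealed class Vertex
{
    private int visited;   // 0 = not visited, 1 = visited

    public bool NotVisited()
    {
        // Cheap read first, so already-visited nodes skip the atomic operation.
        if (Volatile.Read(ref visited) != 0) return false;
        // Only the thread that observes 0 and writes 1 gets true.
        return Interlocked.CompareExchange(ref visited, 1, 0) == 0;
    }
}
```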
Recursive Method
(Diagram: the same example graph from s to t, searched by recursive parallel calls)
Our Code For Recursive Method
public static bool BFS(node source) {
    var found = new CancellationTokenSource();   // cancelled when the sink is reached
    _BFS(source, found);
    return found.IsCancellationRequested;
}
Our Code For Recursive Method
private static void _BFS(node n, CancellationTokenSource found) {
    if (found.IsCancellationRequested) return;   // another branch already reached the sink
    Parallel.ForEach(n.Neighbors(), nn => {
        if (nn.NotVisited()) {
            if (nn == sink) found.Cancel();      // signal every branch to stop descending
            else _BFS(nn, found);                // nested parallel loop, scheduled by the runtime
        }
    });
}
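A runnable sketch of the recursive method, assuming a hypothetical `SearchNode` type with a list of neighbors and an atomic visited flag, and a modern C# compiler (for `using var`). As on the slides, a `CancellationTokenSource` serves only as a thread-safe "found" flag, and each recursive call starts a nested `Parallel.ForEach` that the runtime schedules. Recursion depth grows with path length, so very deep graphs could exhaust the stack.

```csharp
using System;
using System.Collections.Generic;
using System.Threading;
using System.Threading.Tasks;

// Hypothetical node type for this sketch.
public sealed class SearchNode
{
    private int _visited;
    public List<SearchNode> Neighbors { get; } = new List<SearchNode>();
    public bool NotVisited() => Interlocked.CompareExchange(ref _visited, 1, 0) == 0;
}

public static class RecursiveSearch
{
    public static bool PathExists(SearchNode source, SearchNode sink)
    {
        if (source == sink) return true;
        using var found = new CancellationTokenSource();   // cancelled once the sink is reached
        source.NotVisited();                               // mark the source
        Search(source, sink, found);
        return found.IsCancellationRequested;
    }

    private static void Search(SearchNode n, SearchNode sink, CancellationTokenSource found)
    {
        if (found.IsCancellationRequested) return;         // another branch already succeeded
        Parallel.ForEach(n.Neighbors, nn =>
        {
            if (!nn.NotVisited()) return;                  // claimed by another branch
            if (nn == sink) found.Cancel();
            else Search(nn, sink, found);                  // nested parallel loop
        });
    }
}
```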
Summary
Find the best parallel algorithm.
Find the right level of parallelism.
Think data parallel first.
Use parallel data structures.
Minimize shared data and synchronization.
Optimize, optimize, optimize.
Bad sequential code will run faster on a faster processor.
Bad parallel code WILL NOT run faster on more cores.
Learn more about Parallel Computing at MSDN.com/concurrency, and download Parallel Extensions to the .NET Framework!
Connected Visual Computing
Jerry Bautista, PhD, Director, Microprocessor Technology Management, Intel Corporation
Internet Trends Converging
Social networking, user-generated content, visual computing, broadband connectivity, mobile computing
Visual Computing – 3D and More
Applications that use every available FLOP: look real, act real, feel real.
Examples: physics-based animation, expressive faces, video search, computer vision, ray-traced graphics.
Next: Connected Visual Computing
Bringing visual computing (VC) to connected usage models
(Diagram: the evolution from the static Web to Web 2.0 to CVC, from limited to rich visual interfaces connected to Internet data, people everywhere and the actual world)
3D digital entertainment, creating new digital worlds: virtual worlds, multiplayer games
Real-world data visualization (Earth mapping); enhancing the actual world (augmented reality)
Better content quality and social interaction: a better user experience
Social networking, collaboration, online gaming, online retail, and more.
Virtual Worlds
Simulated environments: virtual worlds, multiplayer online games, 3D cinema
All company and/or product names may be trade names, trademarks and/or registered trademarks of the respective owners with which they are associated.
• Realistic, representative visuals, both professional and user-generated
• Socialization, education, entertainment, collaboration
Data Visualization
Sharing data and representing data in richer, more intuitive ways.
Examples: West Nile virus visualization; visualizing real-world information (dust storm in Morocco); virtual colonoscopy; OpenSim N-body simulation
Collaboration Environments
Virtual team rooms: enterprise-class environments that allow virtual teams to have realistic, natural interactions
Virtual information environments: an information space for documents, app sharing, and visualizations
Augmented Reality
Combines real-world information with data overlays; mobile augmented reality (MAR) is particularly compelling.
(Timeline from today through 2010, 2012 and 2014: text overlays, 2D/3D visual overlays, visual search, map hybrids, location information, identification and hyperlinking, translation, virtual instruction)
Meeting The Challenges Of CVC
Platform optimization: server and client demands, network performance, energy efficiency
Distributed computing: scaling, client diversity, programmability
Visual content: interoperability, user creation
Mobile experience: better connectivity and bandwidth, sensor integration
Rich Interaction Versus Complexity
Interactions: growing number of users
Scene complexity: growing number of objects
Realism: better object behavior
(Chart: axes labelled richness and interaction complexity)
Platform Performance Demands
SERVERS: 10x more work; 75%+ of time is compute-intensive work

| Type | Software | Max clients per server |
|---|---|---|
| MMORPGs | WoW | 2500 |
| VWs | Second Life | 160 |

CLIENTS: 3x CPU, 20x GPU; 65%+ of time is compute-intensive work

| Application | % CPU utilization | % GPU utilization |
|---|---|---|
| 2D websites | 20 | 0-1 |
| Google Earth | 50 | 10-15 |
| Second Life | 70 | 35-75 |

NETWORK: 100x bandwidth; maximum bandwidth limited by server to client
(Chart: bandwidth in KB/s, 0 to 100, over time in seconds, 25 to 150, for cached and uncached content)
Sources: WoW data (source www.warcraftrealms.com), Second Life data (source Intel Linden Labs CTO-CTO meeting and www.secondlife.com), and Intel measurements.
Scaling Performance: Parallelism
Applications Scale Well
(Chart: parallel speedup versus number of cores, 0 to 64, for production fluid, production face, production cloth, game fluid, game rigid body, game cloth, marching cubes, sports video analysis, video cast indexing, home video editing, text indexing, ray tracing, foreground estimation, human body tracker, portfolio management, and their geometric mean)
Graphics rendering, physical simulation, vision, data mining, analytics
Current Parallel Programming Products
Intel® Thread Checker: an analysis tool that pinpoints hard-to-find threading errors, such as data races and deadlocks, in 32-bit and 64-bit applications.
Intel® Threading Building Blocks (Intel TBB): a C++ runtime library that abstracts the low-level threading details necessary for optimal multicore performance.
Smoke: A Game Framework to Maximize Core Utilization
Framework built for Nehalem and future processors, targeting N threads
Uses real game technologies (Havok, FMOD, Ogre3D, DX9, etc.)
Well partitioned and configurable
*Other names and brands may be claimed as the property of others.
The Framework: How Is Smoke Highly Threaded?
(Diagram: engine framework with scheduler, parser, environment, service, platform and task managers, scene and object change controls (CC), universal scene and object components (UScene, UObject), systems, definition files and interfaces)
1. Scheduler manages system jobs
2. Change Control (CC) Manager minimizes thread synchronization
3. Data structured to support independent processing
4. System modularity (through interfaces)
5. Systems are specific to the demo (e.g., AI, physics)
Ct Research: A Throughput Programming Model
User writes core-independent C++ code; the programmer thinks serially, and Ct exploits the parallelism.
TVEC<F32> a(src1), b(src2);
TVEC<F32> c = a + b;
c.copyOut(dest);
(Diagram: the two 16-element input vectors are split into four 4-element slices, one per thread, and each thread adds its slice on its core's SIMD unit)
Ct JIT compiler: auto-vectorization for SSE, AVX, Larrabee
Ct parallel runtime: auto-scales to increasing cores
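Ct itself is not a .NET API, but the idea behind `c = a + b` can be roughly approximated in C#: split the element-wise add across cores, and within each chunk use SIMD lanes through `System.Numerics.Vector<float>`. This is a sketch under those assumptions, not Ct's implementation; `VectorAdd` is a hypothetical name and the 4096-element chunk size is an arbitrary choice.

```csharp
using System;
using System.Collections.Concurrent;
using System.Numerics;
using System.Threading.Tasks;

public static class VectorAdd
{
    // dest[i] = a[i] + b[i], parallel across cores and SIMD within each core.
    public static void Add(float[] a, float[] b, float[] dest)
    {
        int n = dest.Length;
        if (n == 0) return;                       // Partitioner.Create rejects an empty range
        int lanes = Vector<float>.Count;          // SIMD width on this machine
        Parallel.ForEach(Partitioner.Create(0, n, Math.Max(lanes, 4096)), range =>
        {
            int i = range.Item1;
            // SIMD part: whole vectors that fit inside this chunk.
            for (; i + lanes <= range.Item2; i += lanes)
                (new Vector<float>(a, i) + new Vector<float>(b, i)).CopyTo(dest, i);
            // Scalar tail of the chunk.
            for (; i < range.Item2; i++)
                dest[i] = a[i] + b[i];
        });
    }
}
```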
Distributed Computing: CVC Environment
(Diagram: the CVC processing loop, in which users act, the world reacts and displays refresh, spans global user services, agents and data; regional simulation, data assets and services; a rendering and reasoning services cloud; “light” clients; visual computing clients; sensors; and other CVC environments)
Data pipes: potential bottlenecks
Ecosystem Building Blocks
H/W platforms (server/client): device manufacturers, sales, OEMs, GPU vendors, CPU vendors
S/W platforms (engines): content tools, development tools, S/W infrastructure, game engines, O/S
Service providers: world operators, infrastructure, marketing (ads, promotion…), digital asset marketplace, enterprise integration
A broad effort is required to fully enable CVC.
Open Standards Accelerated The Internet
(Diagram: from proprietary walled gardens to open standards, 1993-1995: browser, HTML, HTTP, server)
CVC Future Architecture: Common Building Blocks
A world simulator surrounded by common building blocks:
Presentation: rendering services, A/V effects services, user feedback
Behavior: user input/control, scripted behavior, game physics
Support services: asset & inventory, transactions, identity
Communication; sensors/context
CVC Future Architecture: From Monolithic to Building Blocks
Today: vertical, proprietary CVC apps, each with its own view (client: user input & control, rendering, audio/visual effects) and world (server: simulation, synchronization, transactions, communication, identities & assets)
Tomorrow: more horizontal, open building blocks: presentation (rendering services, A/V effects services, user feedback), behavioral functions (user input/control, scripted behavior, physics), support services (asset & inventory, transactions, identity), communication, sensors/context
Example: OpenSim
Platform for “Creating and Deploying 3D Environments”
Diverse dev community: virtual world service providers; IBM™, Microsoft™, Intel™
Highly modular architecture: protocols, physics, script engines
Visual Content
Easy user generation: professional, end-user
Interoperability: own, share “my” content
Scalable delivery: pre-distribution; just-in-time distribution, caching
Example: Simplifying Content Creation (Parameterized Content Research)
(Diagram: a 3D face database is used to create a face model; simple control parameters such as fullness (full to narrow), flatness (flat to round), shape (square to triangle) and chin (sharp to round) produce customized faces; expression modeling)
Intel’s CVC Research Agenda

| Challenge | Research |
|---|---|
| Platform optimization | Workload characterization; understanding platform demands; optimizations for future platforms |
| Distributed computation | Scalable system and app architectures; dynamic repartitioning of workloads; execution on diverse clients |
| Mobile experience | Data-enhanced real-world interaction; mirror-world creation and navigation |
| Visual content | Parameterized content; easy user-generated 3D content; standards enabling content reuse |
Summary
CVC apps offer a compelling user experience… however, there are some real challenges at the platform HW and SW model level:
Scaling through parallelism (many threads, many cores)
Several platform challenges: power, memory bandwidth, heterogeneous HW integration, scalable compute resources, etc.
Distributed computing, from cloud to handheld and everything in between
Programming models must make content creation, integration, and context-aware delivery/interaction “seamless and easy”
Simultaneously, legacy usages cannot be ignored
Promising research results: early implementations address many of these challenges… a tremendous opportunity for HW/SW architecture innovation for substantive end-user benefit
Evals & Recordings
Please fill out your evaluation for this session at:
This session will be available as a recording at:
www.microsoftpdc.com
Q&A
Please use the microphones provided.
© 2008 Microsoft Corporation. All rights reserved. Microsoft, Windows, Windows Vista and other product names are or may be registered trademarks and/or trademarks in the U.S. and/or other countries. The information herein is for informational purposes only and represents the current view of Microsoft Corporation as of the date of this presentation. Because Microsoft must respond to changing market conditions, it should not be interpreted to be a commitment on the part of Microsoft, and Microsoft cannot guarantee the accuracy of any information provided after the date of this presentation. MICROSOFT MAKES NO WARRANTIES, EXPRESS, IMPLIED OR STATUTORY, AS TO THE INFORMATION IN THIS PRESENTATION.
Processing Will Span Client/Server
Client (moving to TS): gets input as the user takes action on the “world”; renders, with audio/visual effects, animation, spatial audio, smoke, crowds and fluids; displays updates.
Tera-scale server or compute cloud: collects changes from all users; resolves all object behaviors and interactions (world simulation, collision, physics, NPC, script execution); generates a new per-user model and sends the requested update.
(Diagram: an always-connected loop along a spectrum from more server compute to more client compute)
Observations:
• Significant client/server compute every cycle
• Many aspects best computed on client
• Extensive use of MIPS, FLOPS, threads
• Partitioning depends on client capability, connectivity
Mirroring the Real World
• Combining location data, a camera, online satellite maps and social networking
• Provides an enhanced view of the real world
OpenSim Architecture: At the Center of Interoperability Innovation
Core infrastructure: identity, inventory, assets, presence, world map, voice
Decentralized simulators (S) laid out on the world map
Each simulator: collision detection, script engine, object model, game engine
Virtual Worlds: “Connected” Visual Computing
Users collaborate & play: scenario play, virtual teamroom
Users create: World of Warcraft avatar, Eiffel Tower in Google Earth
Users explore and learn: Qwaq Treefort virtual room, Machinima interactive movies
Users enhance the actual world: West Nile virus visualization; visualizing real-world information (dust storm in Morocco)
CVC apps will transform the Internet from 2D to 3D… but require LOTS of compute horsepower