1
Design Patterns for Parallel Programming
Research work by:Kurt Keutzer, EECS, UC Berkeley
Tim Mattson, Intel Corporation
Presented to faculty at Tsinghua Multicore Workshop
Michael Wrinn, Intel Corporation
2
Outline
A Programming Pattern Language
• Motivation – patterns: why and what?
• Structural patterns
• Computational patterns
• Composing patterns: examples
• Concurrency patterns and PLPP
• Examples: pattern language case studies
• Examples: pattern language in Smoke
• Demo
Outline
• Motivation – what is the problem we are trying to solve?
• How do programmers think? Psychology meets computer science.
• Pattern Language for Parallel Programming (PLPP)
  – Toy problems showing some key PLPP patterns
  – A PLPP example (molecular dynamics)
• Expanding the effort: patterns for engineering parallel software
  – A survey of some key patterns
  – Case studies
3
Microprocessor trends
• IBM Cell – 1 CPU + 6 cores
• Intel Terascale research chip – 80 cores
• NVIDIA Tesla C1060 – 240 cores
• ATI RV770 – 160 cores

Many-core processors are the "new normal".

3rd party names are the property of their owners.
5
The state of the field: a harsh assessment …
We have turned to multi-core chips not because of the success of our parallel software but because of our failure to continually increase CPU frequency.
Result: a fundamental and dangerous (for the computer industry) mismatch Parallel hardware is ubiquitous. Parallel software is rare
The multi-core Challenge: How do we make Parallel software as ubiquitous as parallel hardware?
This could be the greatest challenge ever faced by the computer industry
We need to create a new generation of parallel programmers
Parallel programming is difficult, error prone, and only accessible to a small cadre of experts … clearly we haven't been doing a very good job at educating programmers.
Consider the following: all known programmers are human beings. Hence, human psychology, not hardware design, should guide our work on this problem.
6
We need to focus on the needs of the programmer, not the needs of the computer
Outline
• Motivation – what is the problem we are trying to solve?
• How do programmers think? Psychology meets computer science.
• Pattern Language for Parallel Programming (PLPP)
  – Toy problems showing some key PLPP patterns
  – A PLPP example (molecular dynamics)
• Expanding the effort: patterns for engineering parallel software
  – A survey of some key patterns
  – Case studies
7
Cognitive Psychology and human reasoning
Human beings are model builders:
• We build hierarchical complexes of mental models.
• We understand sensory input in terms of these models.
• When input conflicts with the models, we tend to believe the models.
Consider the following slide …
Why did most of you see motion?
Your brain’s visual system contains a model that says this combination of geometric shapes and colors implies motion.
Your brain believes the model and not what your eyes see.
To understand people and how we work, you need to understand the models we work with.
Programming and models
Programming is a process of successive refinement of a problem over a hierarchy of models. [Brooks83]
The models represent the problem at different levels of abstraction. The top levels express the problem in the original problem domain; the lower levels represent it in the computer's domain. The models are informal, but detailed enough to support simulation.
Models and their domains:
• Specification – problem specific: polygons, rays, molecules, etc.
• Programming – OpenMP's fork/join, Actors
• Computation – threads (shared memory), processes (shared nothing)
• Machine (AKA cost model) – registers, ALUs, caches, interconnects, etc.
Model based reasoning in programming
Programming process: getting started.
The programmer starts by constructing a high level model from the specification.
Successive refinement starts by using some combination of the following techniques:
• The problem's state is defined in terms of objects belonging to abstract data types with meaning in the original problem domain.
• Key features of the solution, sometimes referred to as "beacons" [Wiedenbeck89], are identified and emphasized.
The programming process
Programmers use an informal, internal notation based on the problem, mathematics, programmer experience, etc. Within a class of programming languages, the program generated is only weakly dependent on the language. [Robertson90] [Petre88]
Programmers think about code in chunks or "plans". [Rist86] Low-level plans code a specific operation, e.g. summing an array. High-level or global plans relate to the overall program structure.
Programming Process: Strategy + Opportunistic Refinement
Common strategies are:
• Backwards goal chaining – start at the result and work backwards to generate subgoals; continue recursively until plans emerge.
• Forward chaining – leap directly to plans when a problem is familiar. [Rist86]
• Opportunistic refinement [Petre90] – progress is made at multiple levels of abstraction, and effort is focused on the most productive level.
Programming Process: the role of testing
Programmers test the emerging solution throughout the programming process.
Testing consists of two parts:
• Hypothesis generation: the programmer forms an idea of how a portion of the solution should behave.
• Simulation: the programmer runs a mental simulation of the solution within the problem models at the appropriate level of abstraction. [Guindon90]
From psychology to software Hypothesis:
A Design Pattern language provides the roadmap to apply results from the psychology of programming to software engineering:
Design patterns capture the essence of plans The structure of patterns in a pattern language should mirror
the types of models programmers use. Connections between patterns must fit well with goal-chaining
and opportunistic refinement.
References
[Brooks83] R. Brooks, "Towards a theory of the comprehension of computer programs", International Journal of Man-Machine Studies, vol. 18, pp. 543-554, 1983.
[Guindon90] R. Guindon, "Knowledge exploited by experts during software system design", International Journal of Man-Machine Studies, vol. 33, pp. 279-304, 1990.
[Hoc90] J.-M. Hoc, T.R.G. Green, R. Samurcay and D.J. Gilmore (eds.), Psychology of Programming, Academic Press Ltd., 1990.
[Petre88] M. Petre and R.L. Winder, "Issues governing the suitability of programming languages for programming tasks", People and Computers IV: Proceedings of HCI-88, Cambridge University Press, 1988.
[Petre90] M. Petre, "Expert Programmers and Programming Languages", in [Hoc90], p. 103, 1990.
[Rist86] R.S. Rist, "Plans in programming: definition, demonstration and development", in E. Soloway and S. Iyengar (eds.), Empirical Studies of Programmers, Norwood, NJ: Ablex, 1986.
[Rist90] R.S. Rist, "Variability in program design: the interaction of process with knowledge", International Journal of Man-Machine Studies, vol. 33, pp. 305-322, 1990.
[Robertson90] S.P. Robertson and C. Yu, "Common cognitive representations of program code across tasks and languages", International Journal of Man-Machine Studies, vol. 33, pp. 343-360, 1990.
[Wiedenbeck89] S. Wiedenbeck and J. Scholtz, "Beacons: a knowledge structure in program comprehension", in G. Salvendy and M.J. Smith (eds.), Designing and Using Human-Computer Interfaces and Knowledge-Based Systems, Amsterdam: Elsevier, 1989.
18
People, Patterns, and Frameworks
• Application developer: uses application design patterns (e.g. feature extraction) to design the application; uses application frameworks (e.g. CBIR) to develop the application.
• Application-framework developer: uses programming design patterns (e.g. MapReduce) to design the application framework; uses programming frameworks (e.g. MapReduce) to develop the application framework.
Eventually
(The layered vision, bottom to top:)
1. Parallel programming gurus (1-10% of programmers) build parallel programming frameworks.
2. Domain-literate programming gurus (1% of the population) use parallel patterns & programming frameworks to build application frameworks.
3. Domain experts write end-user application programs on top of application patterns & frameworks.
The hope is for domain experts to create parallel code with little or no understanding of parallel programming.
Leave hardcore "bare metal" efficiency layer programming to the parallel programming experts.
Today
(The same layered stack: domain experts writing end-user application programs, application frameworks built by domain-literate programming gurus (1% of the population), and parallel programming frameworks built by parallel programming gurus (1-10% of programmers).)
• For the foreseeable future, domain experts, application framework builders, and parallel programming gurus will all need to learn the entire stack.
• That’s why you all need to be here today!
Definitions - 1
Design Patterns: “Each design pattern describes a problem which occurs over and over again in our environment, and then describes the core of the solution to that problem, in such a way that you can use this solution a million times over, without ever doing it the same way twice.“ Page x, A Pattern Language, Christopher Alexander
Structural patterns: design patterns that provide solutions to problems associated with the development of program structure
Computational patterns: design patterns that provide solutions to recurrent computational problems
Definitions - 2
Library: The software implementation of a computational pattern (e.g. BLAS) or a particular sub-problem (e.g. matrix multiply)
Framework: An extensible software environment (e.g. Ruby on Rails) organized around a structural pattern (e.g. model-view-controller) that allows for programmer customization only in harmony with the structural pattern
Domain specific language: A programming language (e.g. Matlab) that provides language constructs that specifically support a particular application domain. The language may also supply library support for common computations in that domain (e.g. BLAS). If the language is restricted to maintain fidelity to a structure and provides library support for common computations, then it encompasses a framework (e.g. NPClick).
13 dwarfs
Getting our software act together: first step … define a conceptual roadmap to guide our work.
25
Alexander’s Pattern Language
Christopher Alexander's approach to (civil) architecture: "Each pattern describes a problem which occurs over and over again in our environment, and then describes the core of the solution to that problem, in such a way that you can use this solution a million times over, without ever doing it the same way twice." Page x, A Pattern Language, Christopher Alexander
Alexander’s 253 (civil) architectural patterns range from the creation of cities (2. distribution of towns) to particular building problems (232. roof cap)
A pattern language is an organized way of tackling an architectural problem using patterns
Main limitation: it's about civil, not software, architecture!
26
Alexander’s Pattern Language (95-103)
Layout the overall arrangement of a group of buildings: the height and number of these buildings, the entrances to the site, main parking areas, and lines of movement through the complex.
95. Building Complex
96. Number of Stories
97. Shielded Parking
98. Circulation Realms
99. Main Building
100. Pedestrian Street
101. Building Thoroughfare
102. Family of Entrances
103. Small Parking Lots
27
Family of Entrances (102) – may be part of Circulation Realms (98).
Conflict:
When a person arrives in a complex of offices or services or workshops, or in a group of related houses, there is a good chance he will experience confusion unless the whole collection is laid out before him, so that he can see the entrance of the place where he is going.
Resolution: Lay out the entrances to form a family. This means:
1) They form a group, are visible together, and each is visible from all the others.
2) They are all broadly similar, for instance all porches, or all gates in a wall, or all marked by a similar kind of doorway.
May contain Main Entrance (110), Entrance Transition (112), Entrance Room (130), Reception Welcomes You (149).
29
Computational Patterns
The dwarfs from "The Berkeley View" (Asanovic et al.). The dwarfs form our key computational patterns.
30
Patterns for Parallel Programming
• PLPP is the first attempt to develop a complete pattern language for parallel software development.
• PLPP is a great model for a pattern language for parallel software.
• PLPP mined scientific applications that utilize a monolithic application style.
• PLPP doesn't help us much with horizontal composition.
• Much more useful to us than: Design Patterns: Elements of Reusable Object-Oriented Software, Gamma, Helm, Johnson & Vlissides, Addison-Wesley, 1995.
31
Structural programming patterns
In order to create more complex software it is necessary to compose programming patterns. For this purpose, it has been useful to induct a set of patterns known as "architectural styles".
Examples: pipe-and-filter, event-based/event-driven, layered, agent-and-repository/blackboard, process control, model-view-controller.
Elements of a Pattern Description
• Name
• Problem: classes of problems this pattern addresses
• Context: context in which this problem occurs
• Forces: trade-offs that crop up in this situation
• Solution: the solution the pattern embodies
• Invariants: properties that need to always be true for this pattern to work
• Examples
• Known uses
• Related Patterns
33
34
Programming Pattern Language 1.0, Keutzer & Mattson – the full pattern-language stack, flattened from the slide:

Applications

Structural patterns ("Choose your high level structure – what is the structure of my application?" Guided expansion): pipe-and-filter, agent and repository, process control, event based/implicit invocation, model-view controller, iterator, map reduce, layered systems, arbitrary static task graph.

Computational patterns ("Identify the key computational patterns – what are my key computations?" Guided instantiation): graph algorithms, dynamic programming, dense linear algebra, sparse linear algebra, unstructured grids, structured grids, graphical models, finite state machines, backtrack branch and bound, N-body methods, circuits, spectral methods.

(The layers above form the productivity layer; the layers below form the efficiency layer.)

Concurrency strategy ("Choose your high level architecture" – guided decomposition; "Refine the structure – what concurrent approach do I use?" Guided re-organization): task decomposition ↔ data decomposition; group tasks, order groups, data sharing, data access; pipeline, discrete event, event based, divide and conquer, data parallelism, geometric decomposition, task parallelism, graph algorithms, digital circuits.

Implementation strategy ("Utilize supporting structures – how do I implement my concurrency?" Guided mapping): fork/join, CSP, master/worker, loop parallelism, BSP; distributed array, shared data, shared queue, shared hash table.

Parallel execution and building blocks ("Implementation methods – what are the building blocks of parallel programming?" Guided implementation): thread creation/destruction, process creation/destruction, message passing, collective communication, mutex, barriers, semaphores, speculation, transactional memory.
35
Architecting Parallel Software
• Identify the software structure
• Identify the key computations
• Decompose tasks: group tasks, order tasks
• Decompose data: identify data sharing, identify data access
36
• Pipe-and-Filter
• Agent-and-Repository
• Event-based coordination
• Iterator
• MapReduce
• Process Control
• Layered Systems
Identify the SW Structure
Structural Patterns
These define the structure of our software but they do not describe what is computed
38
Identify Key Computations
Computational patterns describe the key computations but not how they are implemented
Computational Patterns
40
• Pipe-and-Filter
• Agent-and-Repository
• Event-based
• Bulk Synchronous
• MapReduce
• Layered Systems
• Arbitrary Task Graphs
Decompose tasks/data
• Order tasks
• Identify data sharing and access
• Graph Algorithms
• Dynamic Programming
• Dense/Sparse Linear Algebra
• (Un)Structured Grids
• Graphical Models
• Finite State Machines
• Backtrack Branch-and-Bound
• N-Body Methods
• Circuits
• Spectral Methods
Architecting Parallel Software
Identify the Software Structure
Identify the Key Computations
41
Analogy: Architected Factory
Raises appropriate issues like scheduling, latency, throughput, workflow, resource management, capacity etc.
42
Outline
A Programming Pattern Language
• Motivation – patterns: why and what?
• Structural patterns
• Computational patterns
• Composing patterns: examples
• Concurrency patterns and PLPP
• Examples: pattern language case studies
• Examples: pattern language in Smoke
• Demo
Inventory of Structural Patterns
1. pipe and filter
2. iterator
3. MapReduce
4. blackboard/agent and repository
5. process control
6. model-view controller
7. layered
8. event-based coordination
44
Elements of a structural pattern
Components are where the computation happens
Connectors are where the communication happens
A configuration is a graph of components (vertices) and connectors (edges)
A structural pattern may be described as a family of graphs.
45
Pattern 1: Pipe and Filter
• Filters embody computation; they see only their inputs and produce outputs.
• Pipes embody communication.
• May have feedback.
(Figure: filters 1-7 connected by pipes.)
Examples?
46
Examples of pipe and filter
Almost every large software program has a pipe and filter structure at the highest level
Logic optimizer, image retrieval system, compiler.
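The pipe-and-filter idea can be sketched in a few lines. This is a minimal, illustrative example (not from the slides): each filter sees only its input stream and produces an output stream, and the "pipes" are chained Python generators.

```python
# Minimal pipe-and-filter sketch: each filter is a generator that only sees
# its input and yields its output; chaining the generators forms the pipes.

def tokenize(lines):          # filter 1: split raw lines into tokens
    for line in lines:
        yield from line.split()

def lowercase(tokens):        # filter 2: normalize tokens
    for tok in tokens:
        yield tok.lower()

def drop_short(tokens):       # filter 3: keep only tokens longer than 2 chars
    for tok in tokens:
        if len(tok) > 2:
            yield tok

# Configure the pipeline: a linear graph of filters joined by pipes.
source = ["Design Patterns for Parallel Programming", "by UC Berkeley"]
pipeline = drop_short(lowercase(tokenize(source)))
print(list(pipeline))
```

Because each filter touches only its own input and output, filters can in principle run concurrently on different pipeline stages.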
47
Pattern 2: Iterator Pattern
(Figure: an initialization condition starts the loop; a variety of functions are performed asynchronously; the results of the iteration are synchronized; if the exit condition is met, the loop ends, otherwise it iterates again.)
Examples?
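The loop structure above can be sketched directly. This is an illustrative example with hypothetical names (`iterate`, `update`, `converged`): initialize, perform the loop body, then test the exit condition.

```python
# Iterator pattern sketch: apply an update step until an exit condition holds.

def iterate(update, state, converged, max_iters=1000):
    """Drive the iterator pattern: apply `update` until `converged`."""
    for _ in range(max_iters):
        if converged(state):      # exit condition met?
            break
        state = update(state)     # body of one iteration
    return state

# Example: Newton's method for sqrt(2) as the iterated update.
update    = lambda x: 0.5 * (x + 2.0 / x)
converged = lambda x: abs(x * x - 2.0) < 1e-12
root = iterate(update, 1.0, converged)
print(root)   # close to 1.41421356...
```

In a parallel setting the body of one iteration is where the asynchronous work happens, and the convergence test is the synchronization point.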
48
Example of Iterator Pattern: Training a Classifier – SVM Training
(Iterator structural pattern: iterate { update surface; identify outlier } until all points are within acceptable error.)
49
Pattern 3: MapReduce
To us, it means:
• A map stage, where data is mapped onto independent computations.
• A reduce stage, where the results of the map stage are summarized (i.e. reduced).
(Figure: two map tasks feeding two reduce tasks.)
Examples?
50
Examples of Map Reduce
General structure: map a computation across distributed data sets; reduce the results to find the best/worst, maxima/minima.
• Speech recognition: map HMM computation to evaluate word match; reduce to find the most-likely word sequences.
• Support-vector machines (ML): map to evaluate distance from the frontier; reduce to find the greatest outlier from the frontier.
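The general structure above can be written down in a few lines. A minimal sketch, with made-up data and hypothetical names (`map_reduce`, `score`): map a scoring function across independent shards, then reduce to a single summary.

```python
# MapReduce sketch: map over independent data shards, then reduce the results.
from functools import reduce

def map_reduce(map_fn, reduce_fn, shards):
    mapped = [map_fn(s) for s in shards]        # map: independent computations
    return reduce(reduce_fn, mapped)            # reduce: summarize the results

# Example echoing the SVM case above: find the greatest "distance from the
# frontier" over shards (the distances here are invented for illustration).
shards = [[0.1, 0.7, 0.3], [0.9, 0.2], [0.4, 0.8]]
score = lambda shard: max(shard)                # per-shard map
best  = map_reduce(score, max, shards)
print(best)  # 0.9
```

The map calls are independent, which is exactly what makes the stage parallelizable.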
51
Pattern 4: Agent and Repository
Agent and repository (blackboard) structural pattern: agents cooperate on a shared medium to produce a result. Key elements:
• Blackboard: repository of the resulting creation that is shared by all agents (e.g. a circuit database).
• Agents: intelligent agents that act on the blackboard (e.g. optimizations).
• Manager: orchestrates the agents' access to the blackboard and the creation of the aggregate result (e.g. a scheduler).
(Figure: agents 1-4 around a shared repository/blackboard, i.e. a database.)
Examples?
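The three roles can be sketched as follows. This is illustrative only: the repository is a shared dict, the agents are functions that act on it, and a manager serializes their access (the role a scheduler plays in a real system).

```python
# Agent-and-repository (blackboard) sketch with toy "optimization" agents.

def manager(repository, agents, rounds=3):
    for _ in range(rounds):
        for agent in agents:          # manager orchestrates access
            agent(repository)
    return repository

# Two toy agents acting on a shared operation count (numbers are invented).
def fold_constants(repo):
    repo["ops"] = max(repo["ops"] - 2, 0)   # pretend folding removed 2 ops

def eliminate_dead_code(repo):
    repo["ops"] = max(repo["ops"] - 1, 0)   # pretend DCE removed 1 op

repo = {"ops": 10}
result = manager(repo, [fold_constants, eliminate_dead_code])
print(result["ops"])  # 1
```

Parallel versions let agents run concurrently; the manager then also has to arbitrate conflicting updates to the blackboard.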
52
Example: Compiler Optimization
Optimization of a software program:
• The intermediate representation of the program is stored in the repository.
• Individual agents (constant folding, loop fusion, software pipelining, common-subexpression elimination, strength reduction, dead-code elimination) have heuristics to optimize the program.
• The manager orchestrates the access of the optimization agents to the program in the repository.
• The resulting program is left in the repository.
53
Example: Logic Optimization
Optimization of integrated circuits:
• The integrated circuit is stored in the repository (a circuit database).
• Individual agents (timing-optimization agents 1 … N) have heuristics to optimize the circuitry of the integrated circuit.
• The manager orchestrates the access of the optimization agents to the circuit repository.
• The resulting optimized circuit is left in the repository.
54
Pattern 5: Process Control
• Process: the underlying phenomenon to be controlled/computed.
• Actuator: task(s) affecting the process.
• Sensor: task(s) which analyze the state of the process.
• Controller: task which determines which actuators should be effected.
(Figure: input variables and control parameters enter the controller; the controller drives actuators that set manipulated variables; sensors read the controlled variables of the process back to the controller.)
Source: adapted from Shaw & Garlan 1996, pp. 27-31.
Examples?
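The sensor/controller/actuator loop can be sketched as a toy proportional controller. Illustrative only; the names and the gain are assumptions, not from the slides.

```python
# Process-control sketch: a sensor reads the process state, a controller
# computes a correction, and an actuator applies it to the process.

def control_loop(state, setpoint, gain=0.5, steps=20):
    for _ in range(steps):
        measured = state["value"]            # sensor reads the process
        error = setpoint - measured          # controller decides the correction
        state["value"] += gain * error       # actuator affects the process
    return state["value"]

final = control_loop({"value": 0.0}, setpoint=10.0)
print(final)   # converges toward the setpoint 10.0
```

The structural point is the closed loop: sensing, deciding, and actuating are distinct tasks that can be distributed.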
55
Examples of Process Control
(Process control structural pattern applied to circuit optimization: user timing constraints feed a circuit controller, which launches transformations; sensors check speed and power against the timing constraints.)
56
Pattern 6: Model-View-Controller
• Model: embodies the data and "intelligence" (aka business logic) of the system.
• Controller: captures all user input and translates it into actions on the model.
• View: renders the current state of the model for the user.
Examples?
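The division of labor can be sketched in a few lines. Illustrative and hypothetical (class and function names are invented): the controller translates user input into model updates, and views render the model's state without mutating it.

```python
# Model-view-controller sketch: input flows through the controller to the
# model; views are read-only renderings of the model's current state.

class Model:
    def __init__(self):
        self.values = {"a": 50, "b": 30, "c": 20}   # the system's data

class Controller:
    def __init__(self, model):
        self.model = model
    def handle_input(self, key, value):   # all user input goes through here
        self.model.values[key] = value

def bar_view(model):                      # one of possibly many views
    return " ".join(f"{k}:{v}%" for k, v in sorted(model.values.items()))

model = Model()
ctrl = Controller(model)
ctrl.handle_input("a", 40)
print(bar_view(model))   # a:40% b:30% c:20%
```

Keeping views read-only is what lets several views of the same model coexist, as in the slide's example.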
57
Example of Model-View Controller
(Figure: the user updates the values in a control form (50%, 30%, 20%) and presses the 'View 1' button; the controller (a = 50%, b = 30%, c = 20%) applies the state change to the model and selects the view; the view determines the model state and renders it; View 1 and View 2 each offer a 'Back' button to the control form.)
Extended from: Design Patterns: Elements of Reusable Object-Oriented Software.
58
Pattern 7: Layered Systems
• Individual layers are big, but the interface between two adjacent layers is narrow.
• Non-adjacent layers cannot communicate directly.
Examples?
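The two rules above can be sketched with three hypothetical layers (the names and framing format are invented, loosely echoing a network stack): each layer calls only the layer directly beneath it.

```python
# Layered-system sketch: each layer exposes a narrow interface and calls only
# the layer directly beneath it; non-adjacent layers never talk directly.

def physical_send(bits):                 # bottom layer
    return f"wire[{bits}]"

def transport_send(payload):             # middle layer: uses only the layer below
    framed = f"seq0|{payload}"
    return physical_send(framed)

def application_send(message):           # top layer: uses only the transport layer
    return transport_send(message.upper())

print(application_send("hello"))   # wire[seq0|HELLO]
```

The narrow interfaces are what make layers independently replaceable.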
60
Pattern 8: Event-based Systems
• Agents interact via events/signals in a medium.
• An event manager manages the events.
• Interaction among agents is dynamic – there are no fixed connections.
(Figure: many agents around a shared medium, coordinated by an event manager.)
Examples?
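A minimal event-manager sketch, with hypothetical names: agents subscribe dynamically, and the manager routes each published event to whoever is listening, so no agent holds a fixed connection to another.

```python
# Event-based coordination sketch: an event manager routes signals through a
# shared medium; subscriptions are dynamic, so connections are not fixed.

class EventManager:
    def __init__(self):
        self.subscribers = {}                     # event name -> agents
    def subscribe(self, event, agent):
        self.subscribers.setdefault(event, []).append(agent)
    def publish(self, event, payload):
        for agent in self.subscribers.get(event, []):
            agent(payload)                        # deliver to each subscriber

log = []
manager = EventManager()
manager.subscribe("packet", lambda p: log.append(f"router saw {p}"))
manager.subscribe("packet", lambda p: log.append(f"host saw {p}"))
manager.publish("packet", "128.0.0.56")
print(log)
```

Because agents only ever name events, not each other, agents can join and leave at runtime.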
61
Example: The Internet
• The Internet is the medium; computers (with addresses such as 128.0.0.56) are the agents.
• Signals are IP packets.
• The control plane of the router is the event manager.
62
Remember the Analogy: Layout of Factory Plant
We have only talked about structure. We haven’t described computation.
63
Architecting Parallel Software
• Identify the software structure
• Identify the key computations
• Decompose tasks: group tasks, order tasks
• Decompose data: identify data sharing, identify data access
64
Outline
A Programming Pattern Language
• Motivation – patterns: why and what?
• Structural patterns
• Computational patterns
• Composing patterns: examples
• Concurrency patterns and PLPP
• Examples: design patterns in Smoke
• Demo
66
Problem and Context
Problem: In many situations the proper behavior of a system can naturally be described by a language of finite, or perhaps infinite, strings. The problem is to define a piece of software that distinguishes between valid input strings (associated with proper behavior) and invalid input strings (improper behavior). The system may have a set of pre-defined responses to proper input and another set of responses to improper input.
Context: As inputs arrive, the system must respond depending on the input. Alternatively, depending on inputs, changes in a process may be actuated. The proper response of the system may be to idle after receiving a sequence of inputs, or the input stream may be presumed to be infinite.
67
Solution: Finite State Machines
Transducer FSM: Huffman decoding

Symbol | Codeword
A | 0
B | 10
C | 1100
D | 1101
E | 1110
F | 1111

Input alphabet: {0, 1}. Output alphabet: {A-F}. States: {a, b, c, d, e}. Transitions: ((a, 0), (a, A)), ((a, 1), (b, -)), … Initial state: a.
(Figure: the transition diagram over states a-e, with edges labeled input/output, e.g. 0/A, 1/-, 0/B.)
After: Multimedia Image and Video Processing, by Ling Guan, Jan Larsen.
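The transducer FSM above can be written out as a transition table. The state names follow the slide; the full table is my reconstruction from the codebook (A=0, B=10, C=1100, D=1101, E=1110, F=1111), mapping (state, input bit) to (next state, output symbol or none).

```python
# Transducer-FSM sketch for the Huffman code above: each step consumes one
# input bit and optionally emits one output symbol.

transitions = {
    ("a", "0"): ("a", "A"),   # codeword 0 complete -> emit A
    ("a", "1"): ("b", None),
    ("b", "0"): ("a", "B"),   # 10 complete -> emit B
    ("b", "1"): ("c", None),
    ("c", "0"): ("d", None),
    ("c", "1"): ("e", None),
    ("d", "0"): ("a", "C"),   # 1100 -> C
    ("d", "1"): ("a", "D"),   # 1101 -> D
    ("e", "0"): ("a", "E"),   # 1110 -> E
    ("e", "1"): ("a", "F"),   # 1111 -> F
}

def decode(bits):
    state, out = "a", []
    for bit in bits:
        state, symbol = transitions[(state, bit)]
        if symbol is not None:
            out.append(symbol)
    return "".join(out)

print(decode("0101100"))   # 0 | 10 | 1100  ->  "ABC"
```

Because the code is prefix-free, the machine always returns to the initial state after emitting a symbol.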
68
Example: Traffic Light State Machine
(Figure: a traffic-light state machine driven by car sensors A and B. Transitions fire when A=1 and B=0 or when A=0 and B=1; some are taken unconditionally ("always"), and "otherwise" the machine stays put. Note: the clock beats every 4 seconds, so the light is yellow for 4 seconds.)
69
Problem and Context
Problem: In many problems the output is a simple logical function, or bit-wise permutation, of the input.
Context: A vector of Boolean values is applied as input, and another set, defined by combinational operators or "wiring", is given as output.
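The circuits pattern can be sketched as a pure function of a bit-vector. This is illustrative only: a toy mix of bitwise logic and a fixed permutation, not any real cipher round.

```python
# "Circuits" computational-pattern sketch: the output is a combinational
# function (bitwise logic plus fixed rewiring) of the input bit-vector.

def permute(bits, order):
    """Rewire a bit-vector according to a fixed permutation."""
    return [bits[i] for i in order]

def combinational(bits):
    # XOR each bit with its right neighbor (wrap-around), then permute.
    xored = [b ^ bits[(i + 1) % len(bits)] for i, b in enumerate(bits)]
    return permute(xored, [3, 0, 2, 1])

print(combinational([1, 0, 1, 1]))
```

Since every output bit depends only on a few input bits with no internal state, all output bits can be computed in parallel, which is exactly what makes this pattern attractive on parallel hardware.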
71
Example: Data Encryption Standard (DES)
(Figure: DES applied to a chosen plaintext with round keys K1 … Kp; the remaining labels – new SP, MASK, test DP?, EP, '1' – are residue of the circuit diagram.)
72
Outline
A Programming Pattern Language
• Motivation – patterns: why and what?
• Structural patterns
• Computational patterns
• Composing patterns: examples
• Concurrency patterns and PLPP
• Examples: pattern language case studies
• Examples: pattern language in Smoke
• Demo
73
Architecture of Logic Optimization
(Figure: the logic-optimization architecture built from the graph-algorithm pattern; tasks are grouped and ordered, and the data is decomposed.)
74
Architecting Speech Recognition
(Figure: voice input flows through signal processing into an inference engine, which consults a recognition network and emits the most likely word sequence. Structurally: pipe-and-filter at the top level; the inference engine is an iterator over beam-search iterations, whose active-state computation steps use MapReduce; the recognition network combines dynamic-programming and graphical-model patterns with pipe-and-filter.)
Large Vocabulary Continuous Speech Recognition – poster: Chong, Yi. Work also to appear at Emerging Applications for Manycore Architecture.
75
CBIR Application Framework
(Figure: new images flow through feature extraction; a classifier is learned from chosen examples and then exercised; results are shown to the user, whose feedback drives the choice of new examples.)
Catanzaro, Sundaram, Keutzer, “Fast SVM Training and Classification on Graphics Processors”, ICML 2008
76
Feature Extraction
Image histograms are common to many feature-extraction procedures, and are an important feature in their own right.
• Agent and repository: each agent computes a local transform of the image, plus a local histogram.
• Results are combined in the repository, which contains the global histogram.
The data-dependent access patterns found when constructing histograms make them a natural fit for the agent-and-repository pattern.
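The local-histogram-then-combine structure can be sketched directly. Illustrative only: the "image" is a list of tiny tiles, each agent builds a local histogram of its tile, and the repository owns the combine step.

```python
# Agent-and-repository sketch for histogramming: agents produce local
# histograms; the repository merges them into the global histogram.
from collections import Counter

def agent(tile):
    return Counter(tile)            # local histogram of one tile

def repository_combine(local_histograms):
    total = Counter()
    for h in local_histograms:      # combine step owned by the repository
        total.update(h)
    return total

image_tiles = [[0, 1, 1], [2, 1, 0], [0, 0, 2]]
global_hist = repository_combine(agent(t) for t in image_tiles)
print(dict(global_hist))   # {0: 4, 1: 3, 2: 2}
```

The agents are independent and embarrassingly parallel; only the combine into the shared repository needs coordination.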
77
Learn Classifier: SVM Training
(Figure: inside a bulk-synchronous structural pattern, the trainer iterates two MapReduce steps – "select working set, solve QP" and "update optimality conditions" – until convergence.)
78
Exercise Classifier: SVM Classification
(Figure: test data and support vectors (SV) feed a "compute dot products" stage (dense linear algebra), then "compute kernel values, sum & scale" (MapReduce), producing the output.)
79
Outline
A Programming Pattern Language
• Motivation – patterns: why and what?
• Structural patterns
• Computational patterns
• Composing patterns: examples
• Concurrency patterns and PLPP
• Examples: pattern language case studies
• Examples: pattern language in Smoke
• Demo
80
Recap: what we're trying to accomplish
Ultimately, we want domain-expert programmers to routinely create parallel programs. Domain-expert programmers:
• Are driven by the need to ship robust solutions on schedule.
• Value new features more than performance.
• Have little or no background in computer science or concurrency.
• Are finished with formal education … "on the job" learning in burst mode.
Note: we are not trying to tell concurrency experts how to write parallel programs.
• Expert parallel programmers … they already have what they need.
• HPC programmers … they want performance at all costs and will work as close to the hardware as needed to hit performance goals.
81
How do we support domain experts?
(The same layered stack as before: domain experts writing end-user application programs on application frameworks, built by domain-literate programming gurus (1% of the population) on parallel programming frameworks, built by parallel programming gurus (1-10% of programmers).)
The hope is for domain experts to create parallel code with little or no understanding of parallel programming.
Leave hardcore "bare metal" efficiency layer programming to the parallel programming experts.
82
How will we make this happen?
Software architecture systematically described in terms of a design pattern language: everyone … even the "gurus" … needs direction.
• The "parallel programming gurus" need to know the variety of parallel programming patterns in use.
• The "domain literate programming gurus" need to know which application patterns to support.
• … and we need to document these so everyone can work with them at the appropriate level of detail.
By expressing parallel programming in terms of design pattern languages:
• We provide this direction to the gurus.
• We document the frameworks and how they work.
• We lay out a roadmap to solving the parallel programming problem.
83
(This slide repeats the Programming Pattern Language 1.0 diagram from slide 34.)
84
(The Programming Pattern Language 1.0 diagram from slide 34 is repeated once more, with the following annotations.)
• Top levels – structural and computational patterns. Note: in many cases these pertain to good software practices … both serial and parallel.
• The lower layers are all about parallel algorithms and how to turn them into code.
• Many books talk about how to use a particular programming language … we instead focus on how to use these languages and "think parallel".
85
PLPP's structure: Four design spaces in parallel software development

Original problem → (decomposition) → tasks, shared and local data → (concurrency strategy) → units of execution + new shared data for extracted dependencies → (implementation and building blocks) → corresponding source code.

The same SPMD program is replicated across the units of execution:

    Program SPMD_Emb_Par () {
       TYPE *tmp, *func();
       global_array Data(TYPE);
       global_array Res(TYPE);
       int N;                          // problem size
       int Num = get_num_procs();
       int id  = get_proc_id();
       if (id == 0) setup_problem(N, Data);
       for (int I = id; I < N; I = I + Num) {
          tmp = func(I, Data);
          Res.accumulate(tmp);
       }
    }
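To make the SPMD idiom above concrete, here is a minimal sketch (illustrative, not from the slides) in Python: every unit of execution runs the same body, and the cyclic loop `for i in range(uid, N, NUM)` gives each UE its slice of the iterations. The function `i * i` stands in for `func(I, Data)`, and the per-UE slots stand in for `Res.accumulate`.

```python
from threading import Thread

N = 100                 # problem size (assumed for illustration)
NUM = 4                 # number of units of execution
partial = [0] * NUM     # one accumulator slot per UE: no sharing, no locks

def spmd_body(uid):
    # Same code on every UE; behavior differs only through the UE id.
    acc = 0
    for i in range((uid), N, NUM):   # cyclic distribution of iterations
        acc += i * i                 # stand-in for func(I, Data)
    partial[uid] = acc               # stand-in for Res.accumulate(tmp)

threads = [Thread(target=spmd_body, args=(u,)) for u in range(NUM)]
for t in threads:
    t.start()
for t in threads:
    t.join()

result = sum(partial)
assert result == sum(i * i for i in range(N))
```

Giving each UE its own accumulator and reducing at the end is the usual way to extract the dependency on the shared result.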
86
Decomposition (Finding Concurrency)

Start with a specification that solves the original problem; finish with the problem decomposed into tasks, shared data, and a partial ordering.

Start → Decomposition Analysis (Task Decomposition, Data Decomposition) → Dependency Analysis (Group Tasks, Order Groups, Data Sharing) → Design Evaluation

Design spaces: structural/computational decomposition → concurrency strategy → implementation strategy → parallel programming building blocks.
87
Concurrency Strategy (Algorithm Structure)

Start by choosing how to organize the concurrency:
- Organize by tasks: linear? → Task Parallelism; recursive? → Divide and Conquer
- Organize by data: linear? → Geometric Decomposition; recursive? → Recursive Data
- Organize by flow of data: regular? → Pipeline; irregular? → Event-Based Coordination
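To make the Pipeline branch of this decision tree concrete, here is a minimal Python sketch (illustrative, not from the slides): each stage runs in its own thread, items flow downstream through queues, and a sentinel object shuts the pipeline down in order.

```python
import queue
import threading

SENTINEL = object()   # end-of-stream marker

def stage(fn, q_in, q_out):
    # One pipeline stage: apply fn to each item, forward the sentinel.
    while True:
        item = q_in.get()
        if item is SENTINEL:
            q_out.put(SENTINEL)
            return
        q_out.put(fn(item))

q1, q2, q3 = queue.Queue(), queue.Queue(), queue.Queue()
t1 = threading.Thread(target=stage, args=(lambda x: x + 1, q1, q2))
t2 = threading.Thread(target=stage, args=(lambda x: x * 2, q2, q3))
t1.start()
t2.start()

for x in range(5):
    q1.put(x)
q1.put(SENTINEL)

out = []
while True:
    item = q3.get()
    if item is SENTINEL:
        break
    out.append(item)
t1.join()
t2.join()
assert out == [2, 4, 6, 8, 10]
```

Once the pipeline is full, all stages work on different items at the same time; throughput is limited by the slowest stage.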
88
Implementation Strategy (Supporting Structures)

High-level constructs impacting the large-scale organization of the source code.

- Program structure: SPMD, Master/Worker, Loop Parallelism, Fork/Join
- Data structures: Shared Data, Shared Queue, Distributed Array
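The Master/Worker and Shared Queue patterns are usually combined. A minimal Python sketch (illustrative, not from the slides): the master fills a shared queue, a fixed set of workers drain it, and a "poison pill" per worker ends the computation.

```python
import queue
import threading

tasks = queue.Queue()      # the Shared Queue pattern
results = queue.Queue()

def worker():
    while True:
        item = tasks.get()
        if item is None:               # poison pill: shut this worker down
            return
        results.put(item * item)       # stand-in for real work

workers = [threading.Thread(target=worker) for _ in range(3)]
for w in workers:
    w.start()
for n in range(10):                    # master hands out work
    tasks.put(n)
for _ in workers:
    tasks.put(None)                    # one pill per worker
for w in workers:
    w.join()

total = sum(results.get() for _ in range(10))
assert total == 285                    # 0^2 + 1^2 + ... + 9^2
```

Because workers pull from the queue whenever they are free, this structure load-balances automatically when tasks have uneven cost.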
89
Parallel Programming Building Blocks

Low-level constructs implementing specific mechanisms used in parallel computing, with examples in Java, OpenMP, and MPI. These are not properly design patterns, but they are included to make the pattern language self-contained.

- UE* management: process control, thread control
- Synchronization: memory sync/fences, barriers, mutual exclusion
- Communications: message passing, collective communication, other communication

* UE = unit of execution
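As an illustration of two of these building blocks (a sketch in Python, not tied to any one slide): a mutex protects a shared counter, and a barrier guarantees that no UE reads the counter until every UE has finished updating it.

```python
import threading

NUM = 4
counter = 0
lock = threading.Lock()            # mutual exclusion building block
barrier = threading.Barrier(NUM)   # barrier building block

snapshots = []

def body():
    global counter
    with lock:                     # protect the shared update
        counter += 1
    barrier.wait()                 # no UE proceeds until all have arrived
    with lock:
        snapshots.append(counter)  # every UE now sees all NUM updates

threads = [threading.Thread(target=body) for _ in range(NUM)]
for t in threads:
    t.start()
for t in threads:
    t.join()
assert snapshots == [NUM] * NUM
```

Without the barrier, a fast thread could record the counter before the other threads had incremented it; without the lock, increments could be lost.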
90
Case Studies

- Simple examples: numerical integration
- Linear algebra: abstracting complexity in data structures
- Spectral methods: multi-level parallelism
- Branch and bound: managing recursive parallelism
- Molecular dynamics: a complete example from design to code
- Game engines: a complex example showing that there are multiple ways to parallelize a problem
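The numerical integration case study is usually the classic pi program: approximate pi by integrating 4/(1 + x^2) over [0, 1] with the midpoint rule, parallelized with the Loop Parallelism pattern. A minimal Python sketch (the step count and thread count are illustrative assumptions):

```python
from concurrent.futures import ThreadPoolExecutor

STEPS = 1_000_000
WIDTH = 1.0 / STEPS
NUM = 4

def partial_pi(uid):
    # Midpoint-rule slice of the integral of 4/(1+x^2) on [0, 1];
    # each UE takes a cyclic share of the steps (loop parallelism).
    acc = 0.0
    for i in range(uid, STEPS, NUM):
        x = (i + 0.5) * WIDTH
        acc += 4.0 / (1.0 + x * x)
    return acc * WIDTH

with ThreadPoolExecutor(max_workers=NUM) as pool:
    pi = sum(pool.map(partial_pi, range(NUM)))

assert abs(pi - 3.141592653589793) < 1e-8
```

Each UE keeps a private accumulator and the partial sums are reduced at the end, so no synchronization is needed inside the loop.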
91
The heart of a game is the "Game Engine"

[Figure: the engine's modules (Input (control), Sim, Audio, Render, Front End, Animation) exchange data/state; assets come from media (disk, optical media), and data arrives over the network. Sim owns the game state in its internally managed data structures.]

Source: conversations with developers at Electronic Arts
92
The heart of a game is the "Game Engine"

[Figure: as above, with Sim expanded into Physics, Particle System, AI, and Collision Detection modules driven by a central time loop.]

Sim is the integrator: it integrates from one time step to the next, calling update methods for the other modules inside a central time loop.

Source: conversations with developers at Electronic Arts
93
Finding concurrency: functional parallelism

- Combine modules into groups; assign one group to a thread.
- Asynchronous execution; groups interact through events.
- Coarse-grained parallelism dominated by the flow of data between groups: the Event-Based Coordination pattern.
94
The Finding Concurrency Design Space

Start with a specification that solves the original problem; finish with the problem decomposed into tasks, shared data, and a partial ordering.

Start → Decomposition Analysis (Task Decomposition, Data Decomposition) → Dependency Analysis (Group Tasks, Order Groups, Data Sharing) → Design Evaluation

Many cores need many concurrent tasks; functional parallelism just doesn't expose enough concurrency in this case. We need to go back and rethink the design.
95
A more sophisticated parallelization strategy

- Decompose the computation into a pool of tasks; finer granularity exposes more concurrency.
- Work stealing is critical to support good load balancing.
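A minimal sketch of the work-stealing idea in Python (illustrative only; a real work-stealing runtime uses lock-free per-worker deques): each worker takes from the tail of its own deque and, when empty, steals from the head of another worker's deque. Here all the work starts on one worker to force stealing.

```python
import collections
import threading

NUM_WORKERS = 3
deques = [collections.deque() for _ in range(NUM_WORKERS)]
lock = threading.Lock()   # one global lock keeps the sketch simple
done = []

def worker(uid):
    while True:
        task = None
        with lock:
            if deques[uid]:
                task = deques[uid].pop()          # own work: take from the tail
            else:
                for victim in range(NUM_WORKERS): # otherwise steal from a head
                    if deques[victim]:
                        task = deques[victim].popleft()
                        break
        if task is None:
            return                                # nothing left anywhere
        result = task()
        with lock:
            done.append(result)

# All tasks start on worker 0's deque; workers 1 and 2 must steal to help.
for n in range(9):
    deques[0].append(lambda n=n: n * n)

threads = [threading.Thread(target=worker, args=(u,)) for u in range(NUM_WORKERS)]
for t in threads:
    t.start()
for t in threads:
    t.join()
assert sorted(done) == [n * n for n in range(9)]
```

Stealing from the opposite end of the deque is the standard trick: it minimizes contention between a worker and its thieves.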
96
Parallel execution strategy: the Task Parallelism pattern

A pool of threads executes tasks based on:
• when their dependencies are met
• task priority

Dependencies create an acyclic graph of tasks. Tasks are assigned priorities, with some labeled as optional. The challenge is scheduling based on bus usage, load balancing, and managing concurrency for correctness.
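The scheduling policy above (run the highest-priority task whose dependencies are met) can be sketched serially in Python. The task names, priorities, and dependencies below are purely hypothetical, and this models only the policy, not the threaded execution:

```python
# Hypothetical tasks: name -> (priority, dependencies);
# a lower number means a higher priority.
tasks = {
    "input":   (0, []),
    "ai":      (1, ["input"]),
    "physics": (1, ["ai"]),
    "audio":   (2, ["input"]),
    "render":  (0, ["physics"]),
}

def schedule(tasks):
    # Repeatedly pick the highest-priority task whose dependencies
    # have all finished; the dependency graph must be acyclic.
    finished, order = set(), []
    pending = dict(tasks)
    while pending:
        ready = [(prio, name) for name, (prio, deps) in pending.items()
                 if all(d in finished for d in deps)]
        prio, name = min(ready)       # best (lowest) priority among ready tasks
        order.append(name)
        finished.add(name)
        del pending[name]
    return order

order = schedule(tasks)
assert order == ["input", "ai", "physics", "render", "audio"]
```

Note that "render" has top priority but still runs late: priority only breaks ties among tasks whose dependencies are already satisfied.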
97
The Algorithm Structure Design Space

- Organize by tasks: linear? → Task Parallelism; recursive? → Divide and Conquer
- Organize by data: linear? → Geometric Decomposition; recursive? → Recursive Data
- Organize by flow of data: regular? → Pipeline; irregular? → Event-Based Coordination

Solution: a composition of two of these patterns
- High-level architecture: functional decomposition (Event-Based Coordination)
- Collections of tasks generated by modules (Task Parallelism)
98
The Supporting Structures Design Space

- Program structure: SPMD, Master/Worker, Loop Parallelism, Fork/Join
- Data structures: Shared Data, Shared Queue, Distributed Array

Applied to the game engine:
- Top-level functional decomposition with MPI.
- Manage task queues in a global pool (to support work stealing).
- A standard queue won't work; we will need to build more general data structures to support a prioritized queue and interrupt capabilities.
99
Architecture/framework implications

Top-level structural pattern: event-based, implicit invocation.

Inside each module: fine-grained task decomposition; the Master/Worker pattern extended with work stealing, task priorities, and soft real-time constraints (interrupt, update state, flush queues).

This would fit nicely into a framework; in fact, it needs a framework, since the task queues would be very complicated to get right.
100
Outline

The parallel computing landscape

A Programming Pattern Language
- Overview
- Structural patterns
- Computational patterns
- Composing patterns: examples
- Concurrency patterns and PLPP
- Examples: simple case studies

Applications in CAD
- The CAD algorithm landscape
- Parallel patterns in CAD: overview
- Detailed case studies: architecting parallel CAD software
How does architecting SW help?

A good software architecture should provide:
- Composability: allows you to build something complicated by composing simple pieces
- Modularity: elements of the software are firewalled from each other
- Locality: easy to identify what's happening where
- It may also provide a basis for building a more general framework (e.g., Model-View-Controller and Ruby on Rails)

Patterns help by:
- Clearly identifying the problem
- Offering solutions that capture and embody a lot of shared experience with solving the problem
- Offering a common vocabulary for this problem/solution process
- Also capturing common pitfalls of the solutions in the pattern (e.g., the repository will always become the bottleneck in data-and-repository)
101
Overall Summary

- Manycore microprocessors are on the horizon, and computationally intensive applications, like computer-aided design, will want to exploit them.
- Incremental methods will not scale to exploit 32 or more processors: use of threads (OpenMP, Win32, others), incremental parallelization of serial codes, and use of threading tools and library support.
- We need to re-architect software.
- The key to re-architecting software for manycore is patterns and frameworks.
- We identified two key computational patterns for CAD: graph algorithms and backtracking/branch-and-bound.
- There is some software support, in the form of a framework, Cilk, for backtracking/branch-and-bound.
- The key to parallelizing graph algorithms is netlist/graph partitioning.
- We presented a number of approaches for effective graph partitioning.
- We aim to use this to produce a framework for graph algorithms.
Copyright © 2006, Intel Corporation. All rights reserved. Prices and availability subject to change without notice.*Other brands and names are the property of their respective owners
103
Design Patterns in education: lessons from history?

Early OO days (1994):
• Perception: "Object oriented? Isn't that just an academic thing?"
• Usage: specialists only. Mainstream avoids with indifference, anxiety, or worse.
• Performance: not so good.

Now:
• Perception: OO = programming. "Isn't this how it was always done?"
• Usage: cosmetically widespread, some key concepts actually deployed.
• Performance: so-so, masked by CPU advances until now.
104
Design Patterns in education: lessons from history?

Early PP days (now):
• Perception: "Parallel programming? Isn't that just an HPC thing?"
• Usage: specialists only. Mainstream avoids with indifference, anxiety, or worse.
• Performance: very good, for the specialists.

Future (2010?):
• Perception: PP = programming. "Isn't this how it was always done?"
• Usage: cosmetically widespread, some key concepts actually deployed.
• Performance: broadly sufficient. Application domains greatly expanded.
Copyright © 2009, Intel Corporation. All rights reserved. *Intel and the Intel logo are registered trademarks of Intel Corporation. Other brands and names are the property of their respective owners
Smoke Pattern Decomposition
Michael Anderson, Mark Murphy, Kurt Keutzer
106
Agenda
Characterize the architecture and computation of Smoke in terms of Our Pattern Language
• What is an effective multi-threaded game architecture?
• Can the architecture and computation in Smoke be described in Our Pattern Language?
• Which components of Smoke can be easily sped up on manycore devices?
Games as a potential ParLab app
• Where can we as grad students have the most impact?
• What else can manycore enable?
107
Our Pattern Language
A hierarchy of patterns that describe reusable components of software.
108
Agenda
Characterize the architecture and computation of Smoke in terms of Our Pattern Language
• What is an effective multi-threaded game architecture?
• Can the architecture and computation in Smoke be described in Our Pattern Language?
• Which components of Smoke can be easily sped up on manycore devices?
Games as a potential ParLab app
• Where can we as grad students have the most impact?
• What else can manycore enable?
109
Structural and Computational Patterns

First, let's talk about the structural and computational patterns we found in Smoke.
110
Top-Level: Architecture

Intro: Smoke is composed of several subsystems. Some (Physics, Graphics, ...) are large independent engines.

[Figure: subsystems and the data they exchange: Input (input data), Physics (position, velocity, collision), AI (position, velocity, goal), Graphics (position, vertices, textures), Audio (position, SFX). Brad Werth, GDC 2009]
111
Top-Level v1: Task Graph Pattern

There are fixed dependencies between subsystems (Input, Physics, Graphics, AI, Effects), which can be modeled as an arbitrary task graph.

Example: moving the zombie
• Keyboard -> AI -> Physics -> Graphics
112
Top-Level v1: Iterator Pattern

The engine iterates over consecutive frames; data from the previous frame is used in the next.
113
Top-Level v2: Puppeteer

- Subsystems have private data but need access to other subsystems' data.
- We don't want to write N*(N-1) interfaces.
- A common use is to manage communication between multiple simulators, each with different data structures.
114
Task Graph (v1) vs. Puppeteer (v2)

- Subsystem modularity is important (subsystems must be swappable).
- We want to reduce the complexity of communication for a scalable multi-threaded design.
- This motivates using the Puppeteer pattern instead of an arbitrary task graph for subsystem communication.

[Figure: Input, Physics, Graphics, Effects, and AI connected through a framework of interfaces and a change control manager. Brad Werth, GDC 2009]
115
Agenda
Characterize the architecture and computation of Smoke in terms of Our Pattern Language
• What is an effective multi-threaded game architecture?
• Can the architecture and computation in Smoke be described in Our Pattern Language?
• Which components of Smoke can be easily sped up on manycore devices?
Games as a potential ParLab app
• Where can we as grad students have the most impact?
• What else can manycore enable?
116
Puppeteer Subsystem Patterns

Now that we've described the top-level structural patterns, let's look at the subsystems (Input, Physics, Graphics, Effects, AI).
117
Physics Subsystem

Pipe-and-filter structure (from the Havok User Guide).
118
Physics Subsystem

Constraint solver: minimize penalties and forces in a system subject to constraints. This is the Dense Linear Algebra computational pattern.
119
Physics Subsystem

Collision detection: interpolate paths and test whether overlap occurs during the current timestep. This is the N-Body computational pattern.
120
AI Subsystem

Agent-and-Repository structural pattern:
• Multiple agents access and modify shared data.
• Examples: version control systems, logic optimization, parallel SAT solvers.

[Figure: Agents 1..N accessing a shared repository (i.e., a database).]
121
AI Subsystem

- Zombies and chickens update their locations in the point-of-interest (POI) repository.
- Having one writer (the zombie) and many readers (the chickens) reduces the burden on the repository controller.
- Agents (animals) are independent state machines.

(Brad Werth, GDC 2009)
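A minimal sketch of such a one-writer/many-readers repository in Python (the class and method names are hypothetical, not Smoke's API): the single writer replaces the whole mapping atomically, so readers holding an older snapshot never see a half-updated state.

```python
import threading

class POIRepository:
    # Hypothetical point-of-interest store: one writer, many readers.
    def __init__(self):
        self._data = {}
        self._lock = threading.Lock()   # serializes the single writer

    def publish(self, updates):
        # Called by the writer (the zombie): build a fresh dict and
        # swap the reference in one step.
        with self._lock:
            new = dict(self._data)
            new.update(updates)
            self._data = new            # atomic reference swap

    def snapshot(self):
        # Called by readers (the chickens): a read-only snapshot,
        # no lock needed since published dicts are never mutated.
        return self._data

repo = POIRepository()
repo.publish({"zombie": (3, 4)})
view = repo.snapshot()
repo.publish({"zombie": (5, 6)})
assert view["zombie"] == (3, 4)            # old snapshot is unchanged
assert repo.snapshot()["zombie"] == (5, 6)
```

This is why a single writer lightens the load: readers never contend for a lock, and only the rare publish pays a copy.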
122
Effects Subsystem

Procedural fire: two particle simulators
• visible fire
• invisible "heat" particles

This is the N-Body computational pattern. (Hugh Smith, Intel)
123
Parallel Implementation Patterns

We've described the structural and computational patterns. What about the lower layers?
124
Parallel Algorithm Strategy Patterns (1)

At the top level, subsystems (Graphics, Physics, AI, ...) can run concurrently.
• Communication within a single frame is eliminated by double-buffering: subsystems read frame 1 data while writing frame 2 data.

Smoke exploits this task parallelism (the subject of Lab 2).
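The double-buffering idea can be sketched in a few lines of Python (illustrative only, not Smoke's implementation): every subsystem reads last frame's buffer and writes this frame's buffer, and the two are swapped once, between frames.

```python
class DoubleBuffer:
    # Sketch: readers see last frame's data while writers produce
    # this frame's, so no intra-frame locking is needed.
    def __init__(self, initial):
        self.read_buf = dict(initial)    # state produced last frame
        self.write_buf = dict(initial)   # state being produced this frame

    def swap(self):
        # Called exactly once, between frames.
        self.read_buf, self.write_buf = self.write_buf, self.read_buf

buf = DoubleBuffer({"pos": 0})
buf.write_buf["pos"] = buf.read_buf["pos"] + 1   # physics writes frame N+1
assert buf.read_buf["pos"] == 0                  # AI still reads frame N
buf.swap()
assert buf.read_buf["pos"] == 1                  # frame N+1 is now visible
```

The cost is one frame of latency and twice the state memory; the payoff is that subsystems within a frame never need to synchronize on the shared data.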
125
Parallel Algorithm Strategy Patterns (2)

Procedural fire is highly data parallel; Smoke spreads the computation over 8 threads (ideal hardware utilization). (Hugh Smith, Intel; Brad Werth, GDC 2009)
126
Parallel Algorithm Strategy Patterns (3)

The AI subsystem exploits task parallelism among independent agents (chickens).

[Figure: thread profiles before and after: the Render and AI phases overlap once the agents run in parallel. Brad Werth, GDC 2009]
127
Implementation Strategy and Concurrent Execution Patterns

Task queue -> thread pool, implemented with Intel Threading Building Blocks. (Brad Werth, GDC 2009)
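The task-queue-feeding-a-thread-pool structure has a direct analogue in Python's standard library (a sketch using `ThreadPoolExecutor`, not TBB; `update_agent` is a hypothetical stand-in for one agent's per-frame work):

```python
from concurrent.futures import ThreadPoolExecutor

def update_agent(agent_id):
    # Hypothetical stand-in for one agent's per-frame update.
    return agent_id * 2

# The pool owns the worker threads; each map item becomes a queued task
# that an idle worker picks up.
with ThreadPoolExecutor(max_workers=8) as pool:
    results = list(pool.map(update_agent, range(16)))

assert results == [i * 2 for i in range(16)]
```

As with TBB, the point is that the application submits tasks rather than managing threads: the pool amortizes thread creation and keeps all cores busy.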
128
Agenda
Characterize the architecture and computation of Smoke in terms of Our Pattern Language
• What is an effective multi-threaded game architecture?
• Can the architecture and computation in Smoke be described in Our Pattern Language?
• Which components of Smoke can be easily sped up on manycore devices?
Games as a potential ParLab app
• Where can we as grad students have the most impact?
• What else can manycore enable?
129
What was done in the demo?

- Top level: task parallelism among subsystems
- Procedural fire: data-parallel particle simulator
- AI subsystem: task parallelism among agents
130
What else can be done? (1)

Add data-parallel effects: rain, ragdoll corpses, more things breaking apart
• Similar to procedural fire (a particle simulator)
• Can easily be added without changing gameplay
• Objects may or may not be registered with the scene graph

(Image: nvidia.com)
131
What else can be done? (2)

Speed up and enhance the physics subsystem
• Independent "simulation islands" can be executed in parallel
• Parallel constraint solver and collision detection
• Speedup allows for more detailed and realistic simulation: more active independent objects, and new algorithms brought in from scientific computing
• Must ensure the worst-case computation fits in a single frame
• Nvidia PhysX is working on this
132
What else can be done? (3)

Enable more complex AI
• Smoke's AI is very simple. What is AI like in larger games?
• If AI is just state machines, we can add more of them

Add new interfaces
• Computer-vision pose detection as input
• Overlaps well with our current research
• Microsoft Xbox – Project Natal (xbox.com/projectnatal)
133
Agenda
Characterize the architecture and computation of Smoke in terms of Our Pattern Language
• What is an effective multi-threaded game architecture?
• Can the architecture and computation in Smoke be described in Our Pattern Language?
• Which components of Smoke can be easily sped up on manycore devices?
134
Summary and conclusions

Characterize the architecture and computation of Smoke in terms of Our Pattern Language:
• What is an effective multi-threaded game architecture? The Puppeteer pattern allows for modularity and ease of development and evolution.
• Can the architecture and computation in Smoke be described in Our Pattern Language? Yes, the description is natural.
• Which components of Smoke can be easily sped up on manycore devices? Data-parallel effects, physics, AI, and new interfaces all show potential benefit.