Parallel Computing 2007: Science Applications


Page 1: Parallel Computing 2007: Science Applications

PC07ScienceApps [email protected] 1

Parallel Computing 2007: Science Applications

February 26 - March 1, 2007
Geoffrey Fox

Community Grids Laboratory, Indiana University

505 N Morton, Suite 224, Bloomington IN

[email protected]

Page 2: Parallel Computing 2007: Science Applications


Four Descriptions of Matter -- Quantum, Particle, Statistical, Continuum

• Quantum Physics
• Particle Dynamics
• Statistical Physics
• Continuum Physics

– These give rise to different algorithms, and in some cases one mixes the different descriptions. We will briefly describe each, with pointers to the types of algorithms used.

– These descriptions underlie several different fields such as physics, chemistry, environmental modeling, and climatology – indeed, any field that studies the physical world from a reasonably fundamental point of view.

– For instance, they directly underlie weather prediction as this is phrased in terms of properties of atmosphere.

– However, if you simulate a chemical plant, you would not phrase this directly in terms of atomic properties but rather in terms of phenomenological macroscopic artifacts - "pipes", "valves", "machines", "people" etc. (today several biology simulations are of this phenomenological type)

• General Relativity and Quantum Gravity
– These describe space-time at the ultimate level but are not needed in practical real-world calculations. There are important academic computations studying these descriptions of matter.

Page 3: Parallel Computing 2007: Science Applications


Quantum Physics and Examples of Use of Computation

• This is a fundamental description of the microscopic world. You would in principle use it to describe everything, but this is both unnecessary and too difficult, computationally and analytically.

• Quantum Physics problems are typified by Quantum Chromodynamics (QCD) calculations, and these end up looking identical to statistical physics problems numerically. There are also some chemistry problems where quantum effects are important. These give rise to several types of algorithms:

– Solution of Schrödinger's equation (a partial differential equation). This can only be done exactly for simple 2- to 4-particle systems.

– Formulation of a large matrix whose rows and columns are the distinct states of the system, followed by typical matrix operations (diagonalization, multiplication, inversion).

– Statistical methods, which can be thought of as Monte Carlo evaluation of the integrals obtained in an integral-equation formulation of the problem.

• These are Grid (QCD) or Matrix problems.

Page 4: Parallel Computing 2007: Science Applications


Particle Dynamics and Examples of Use of Computation

• Quantum effects are only important at small distances (10^-13 cm for the so-called strong or nuclear forces, 10^-8 cm for electromagnetically interacting particles).

• Often these short distance effects are unimportant and it is sufficient to treat the physics classically. Then all matter is made up of particles, which are selected from the set of atoms (electrons etc.).

• The most well known problems of this type come from biochemistry. Here we study biologically interesting proteins which are made up of some 10,000 to 100,000 atoms. We hope to understand the chemical basis of life or more practically find which proteins are potentially interesting drugs.

• Each particle obeys Newton's laws, and the study of proteins generalizes the numerical formulation of the study of the solar system, where the Sun and planets are evolved in time as defined by gravity's force law.
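The numerical formulation above can be sketched as a direct-sum integrator of Newton's laws. This is a minimal 2-D toy, not code from any package; the softening parameter `eps`, the function names, and the kick-drift-kick (leapfrog) scheme are illustrative choices.

```python
import math

def gravity_accelerations(pos, mass, G=1.0, eps=1e-3):
    """Direct-sum pairwise gravitational accelerations (O(N^2) per step)."""
    n = len(pos)
    acc = [[0.0, 0.0] for _ in range(n)]
    for i in range(n):
        for j in range(n):
            if i == j:
                continue
            dx = pos[j][0] - pos[i][0]
            dy = pos[j][1] - pos[i][1]
            r2 = dx * dx + dy * dy + eps * eps  # softening avoids the r=0 singularity
            inv_r3 = 1.0 / (r2 * math.sqrt(r2))
            acc[i][0] += G * mass[j] * dx * inv_r3
            acc[i][1] += G * mass[j] * dy * inv_r3
    return acc

def leapfrog_step(pos, vel, mass, dt):
    """One kick-drift-kick leapfrog step of Newton's equations of motion."""
    acc = gravity_accelerations(pos, mass)
    for i in range(len(pos)):
        vel[i][0] += 0.5 * dt * acc[i][0]   # half kick
        vel[i][1] += 0.5 * dt * acc[i][1]
        pos[i][0] += dt * vel[i][0]         # drift
        pos[i][1] += dt * vel[i][1]
    acc = gravity_accelerations(pos, mass)  # recompute at new positions
    for i in range(len(pos)):
        vel[i][0] += 0.5 * dt * acc[i][0]   # second half kick
        vel[i][1] += 0.5 * dt * acc[i][1]
    return pos, vel
```

For a two-body "Sun plus planet" setup with unit gravitational constant, the orbit stays close to circular over many steps; this good long-term energy behavior is why leapfrog-style schemes are popular for such problems.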

Page 5: Parallel Computing 2007: Science Applications


Particle Dynamics and Example of Astrophysics

• Astrophysics has several important particle dynamics problems where the particles are not atoms but rather stars, clusters of stars, galaxies, or clusters of galaxies.

• The numerical algorithm is similar, but an important new approach is needed because we have a lot of particles (currently over N = 10^7) and all particles interact with each other.

• This naively has a computational complexity of O(N^2) at each time step, but clever numerical methods reduce it to O(N) or O(N log N).

• Physics problems addressed include:
– Evolution of the early universe into the structure of today
– Why are galaxies spiral?
– What happens when galaxies collide?
– What makes globular clusters (with O(10^6) stars) like they are?

Page 6: Parallel Computing 2007: Science Applications


Statistical Physics and Comparison of Monte Carlo and Particle Dynamics

• Large systems reach equilibrium and ensemble properties (temperature, pressure, specific heat, ...) can be found statistically. This is essentially law of large numbers (central limit theorem).

• The resultant approach moves particles "randomly" according to some probability, and NOT deterministically as in Newton's laws.

• Many properties of particle systems can be calculated either by Monte Carlo or by Particle Dynamics. Monte Carlo is harder, as one cannot evolve the particles independently.

• This can lead to (soluble!) difficulties in parallel algorithms, as the lack of independence implies synchronization issues.

• Many quantum systems are treated just like statistical physics, as quantum theory is built on probability densities.
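The "move particles randomly according to some probability" idea can be sketched with a Metropolis sampler for a single coordinate in a harmonic potential. This is an illustrative toy; the potential E(x) = x²/2, the step size, and the β value are choices made for the example, not anything from the slides.

```python
import math
import random

def metropolis_harmonic(n_steps, beta=1.0, step=0.5, seed=0):
    """Metropolis sampling of a coordinate x with energy E(x) = x^2 / 2.

    Moves are proposed at random and accepted with probability
    min(1, exp(-beta * dE)) -- random moves, not Newton's deterministic laws.
    """
    rng = random.Random(seed)
    x = 0.0
    samples = []
    for _ in range(n_steps):
        x_new = x + rng.uniform(-step, step)
        dE = 0.5 * (x_new * x_new - x * x)
        if dE <= 0 or rng.random() < math.exp(-beta * dE):
            x = x_new  # accept the random move
        samples.append(x)
    return samples
```

Ensemble properties then come out statistically: for this potential the sample mean of x² approaches 1/β, illustrating how equilibrium averages replace deterministic trajectories.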

Page 7: Parallel Computing 2007: Science Applications


Continuum Physics as an approximation to Particle Dynamics

• Replace the particle description by averages. 10^23 molecules in a molar volume is too many to handle numerically, so divide the full system into a large number of "small" volumes dV such that:
– Macroscopic properties (temperature, velocity, pressure) are essentially constant within each volume.

• In principle, use statistical physics (or Particle Dynamics averaged as "Transport Equations") to describe the volume dV in terms of macroscopic (ensemble) properties for the volume.

• The volume dV must be small enough that macroscopic properties are indeed constant, and large enough that one can average over molecular motion to define those properties.
– As a typical molecule is 10^-8 cm in linear dimension, these constraints are not hard to satisfy.
– The approximation sometimes breaks down, e.g. at leading edges during shuttle reentry; then one augments the continuum approach (computational fluid dynamics) with an explicit particle method.

Page 8: Parallel Computing 2007: Science Applications


Computational Fluid Dynamics

• Computational Fluid Dynamics is the dominant numerical field for Continuum Physics.
• There is a set of partial differential equations which cover:
– liquids, including blood, oil, etc.
– gases, including airflow over wings and weather
• We apply computational "fluid" dynamics most often to the gas air. Gases are really particles.
• With a small number (< 10^6) of particles, use "molecular dynamics"; with a large number (~10^23), use computational fluid dynamics.

Page 9: Parallel Computing 2007: Science Applications


Computational Sweet Spots

• A given application needs a certain computer performance to do a certain style of computation.
• In 1980 we had a few megaflops (10^6 floating point operations/sec), and this allowed simple two-dimensional continuum physics simulations.
• Now in 2005, we "routinely" have a few teraflops of peak performance, and this allows three-dimensional continuum physics simulations.
• However, some areas need much larger computational power and haven't reached "their sweet spot":
– Some computations in Nuclear and Particle Physics are like this.
– One can study the properties of particles with today's computers, but the scattering of two particles appears to require complexity 10^9 × 10^9.
• In some areas there are two sweet spots:
– a low-performance sweet spot for a "phenomenological model";
– if you go to a "fundamental description", one needs far more computer power than is available today.
– Biology is of this type.

Page 10: Parallel Computing 2007: Science Applications


What Needs to be Solved?

• A set of particles or things (cells in biology, transistors in circuit simulation)
– Solve coupled ordinary differential equations
– There are lots of "things" to decompose over for parallelism
• One or more fields which are functions of space and time (continuum physics)
– Discretize space and time and define fields on Grid points spread over the domain
– Parallelize over Grid points
• Matrices which may need to be diagonalized to find eigenvectors and eigenvalues
– Quantum physics
– Mode analysis – principal components
– Parallelize over matrix elements

Page 11: Parallel Computing 2007: Science Applications


Classes of Physical Simulations

• Mathematical (numerical) formulations of simulations fall into a few key classes which have their own distinctive algorithmic and parallelism issues.
• The most common formalism is that of a field theory, where quantities of interest are represented by densities defined over a 1, 2, 3 or 4 dimensional space.
– Such a description could be "fundamental", as in electromagnetism or relativity for the gravitational field, or "approximate", as in CFD where a fluid density averages over a particle description.
– Our Laplace example is of this form, where the field could either be fundamental (as in electrostatics) or approximate if it comes from the Euler equations for CFD.

Page 12: Parallel Computing 2007: Science Applications


Applications Reducing to a Coupled Set of Ordinary Differential Equations

• Another set of models of physical systems represent them as a coupled set of discrete entities evolving over time.
– Instead of a field φ(x, t) one gets values φ_i(t) labeled by an index i.
– Discretizing x in the continuous case leads to the discrete case, but in many cases the discrete formulation is fundamental.
• Within the coupled discrete system class, one has two important approaches:
– Classic time-stepped simulations – loop over all i at fixed t, updating φ_i(t) to φ_i(t + δt).
– Discrete event simulations – loop over all events representing changes of state of the φ_i(t).
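The second approach can be contrasted with time stepping in a few lines. Below is a minimal discrete event loop built on a priority queue; the handler protocol and the (time, entity) event tuples are assumptions made for illustration, not a formulation from the talk.

```python
import heapq

def run_discrete_event(initial_events, handler, t_end):
    """Minimal discrete-event loop: repeatedly pop the earliest pending event,
    let the handler update state and possibly schedule new future events."""
    queue = list(initial_events)          # events are (time, entity) pairs
    heapq.heapify(queue)
    processed = []
    while queue:
        t, entity = heapq.heappop(queue)  # always the globally earliest event
        if t > t_end:
            break
        processed.append((t, entity))
        for new_event in handler(t, entity):
            heapq.heappush(queue, new_event)
    return processed
```

A time-stepped loop would visit every entity at every δt; this loop only touches entities when an event actually changes their state, which is the advantage claimed for DES.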

Page 13: Parallel Computing 2007: Science Applications


Particle Dynamics or Equivalent Problems

• Particles are sets of entities – sometimes fixed (atoms in a crystal) or sometimes moving (galaxies in a universe).
• They are characterized by a force F_ij on particle i due to particle j.
• Forces are characterized by their range r: F_ij(x_i, x_j) is zero if the distance |x_i - x_j| is greater than r.
• Examples:
– The universe
– A globular star cluster
– The atoms in a crystal vibrating under interatomic forces
– Molecules in a protein rotating and flexing under interatomic forces
• Laws of motion are typically ordinary differential equations.
– "Ordinary" means differentiation with respect to one variable – typically time.

Page 14: Parallel Computing 2007: Science Applications


Classes of Particle Problems

• If the range r is small (as in a crystal), then one gets numerical formulations and parallel computing considerations similar to those in the Laplace example, with local communication.
– We showed in the Laplace module that efficiency increases as the range of the force increases.
• If r is infinite (no cut-off for the force), as in the gravitational problem, one finds rather different issues, which we will discuss in this module.
• There are several "non-particle" problems discussed later that reduce to the long range force problem, characterized by every entity interacting with every other entity.
– Characterized by a calculation where updating entity i involves all other entities j.

Page 15: Parallel Computing 2007: Science Applications


Circuit Simulations I

• An electrical or electronic network has the same structure as a particle problem, where the "particles" are components (transistor, resistance, inductance, etc.) and the "force" between components i and j is nonzero if and only if i and j are linked in the circuit.
– For simulations of electrical transmission networks (the electrical grid), one would naturally use classic time-stepped simulation, updating each component i from its state at time t to its state at time t + δt.
• If one is simulating something like a chip, then the time-stepped approach is very wasteful, as 99.99% of the components are doing nothing (i.e. remain in the same state) at any given time step!
– Here is where discrete event simulations (DES) are useful, as one only computes where the action is.
• Biological simulations are often formulated as networks where each component (say a neuron or a cell) is described by an ODE and the network couples the components.

Page 16: Parallel Computing 2007: Science Applications


Circuit Simulations II

• Discrete event simulations are clearly preferable on sequential machines, but parallel algorithms are hard due to the need for dynamic load balancing (events are dynamic and not uniform throughout the system) and synchronization (which events can be executed in parallel?).
• There are several important approaches to DES, of which the best known is the Time Warp method originally proposed by David Jefferson – here one optimistically executes events in parallel and rolls back to an earlier state if this is found to be inconsistent.
• Conservative methods (only execute those events you are certain cannot be impacted by earlier events) have little parallelism.
– e.g. there is only one event with the lowest global time.
• DES do not exhibit the classic loosely synchronous compute-communicate structure, as there is no uniform global time.
– Typically, even with Time Warp, there is no scalable parallelism.

Page 17: Parallel Computing 2007: Science Applications


Discrete Event Simulations

• Suppose we try to execute in parallel events E1 and E2 at times t1 and t2 with t1 < t2.
• We show the timelines of several (4) objects in the system and our two events E1 and E2.
• If E1 generates no interfering events, or one E*12 at a time greater than t2, then our parallel execution of E2 is consistent.
• However, if E1 generates E12 before t2, then the execution of E2 has to be rolled back and E12 should be executed first.

[Figure: timelines of the objects in the system along a time axis, showing events E1 and E2 and the generated events E11, E12, E*12, E21, E22]

Page 18: Parallel Computing 2007: Science Applications


Matrices and Graphs I

• Especially in cases where the "force" is linear in the φ_i(t), it is convenient to think of the force as being specified by a matrix M whose elements m_ij are nonzero if and only if the force between i and j is nonzero. A typical force law is: F_i = Σ_j m_ij φ_j(t)
• In the Laplace equation example, the matrix M is sparse (most elements are zero), and this is an especially common case where one can and needs to develop efficient algorithms.
• We discuss in another talk the matrix formulation in the case of partial differential equation solvers.
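The force law F_i = Σ_j m_ij φ_j(t) with a sparse M can be sketched directly; storing only the nonzero m_ij is exactly what makes the sparse case efficient. The row-dictionary layout below is just one simple storage choice made for illustration.

```python
def sparse_force(m_rows, phi):
    """F_i = sum over j of m_ij * phi_j, with the sparse matrix M stored
    row-wise as {i: [(j, m_ij), ...]}, keeping only the nonzero couplings."""
    return {i: sum(m_ij * phi[j] for j, m_ij in row) for i, row in m_rows.items()}
```

For a three-point chain coupling only nearest neighbours, just four of the nine matrix elements need to be stored, and the work per row is proportional to its number of nonzeros rather than to the full system size.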

Page 19: Parallel Computing 2007: Science Applications


Matrices and Graphs II

• Another way of looking at these problems is as graphs G, where the nodes of the graph are labeled by the particles i, and one has an edge linking i to j if and only if the force F_ij is nonzero.
• In this language, long range force problems correspond to a dense matrix M (all elements nonzero) and a fully connected graph G.

[Figure: example graph with 12 numbered nodes linked by edges]

Page 20: Parallel Computing 2007: Science Applications


Other N-Body Like Problems – I

• The characteristic structure of an N-body problem is an observable that depends on all pairs of entities from a set of N entities.
• This structure is seen in diverse applications:
• 1) Look at a database of items and calculate some form of correlation between all pairs of database entries.
• 2) This was first used in studies of measurements of a "chaotic dynamical system" with points x_i which are vectors of length m. Put r_ij = distance between x_i and x_j in m-dimensional space. Then the probability p(r_ij = r) is proportional to r^(d-1):
– where d (not equal to m) is the dynamical dimension of the system
– calculate by forming all the r_ij (for i and j running over observable points from our system – usually a time series) and accumulating them in a histogram of bins in r
– Parallel algorithm in a nutshell: store replicated histograms in all processors, distribute the vectors equally among the processors, and pipeline the x_j through the processors, accumulating the r_ij as they pass through; add the histograms together at the end.
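The sequential core of the correlation calculation in 2) is just a double loop over pairs accumulating r_ij into bins; the parallel version described above distributes the points and sums the per-processor histograms at the end. A minimal sketch, with the bin count and r_max as arbitrary example parameters:

```python
import math

def distance_histogram(points, n_bins, r_max):
    """Accumulate all pairwise distances r_ij into a histogram of bins in r.

    O(N^2) pairs in total, the defining cost of N-body-like problems.
    """
    hist = [0] * n_bins
    n = len(points)
    for i in range(n):
        for j in range(i + 1, n):             # each unordered pair once
            r = math.dist(points[i], points[j])
            b = int(n_bins * r / r_max)       # linear binning in r
            if b < n_bins:
                hist[b] += 1
    return hist
```

Because the histogram is a pure sum over pairs, per-processor partial histograms can simply be added together at the end, which is what makes the pipelined parallel algorithm in the bullet above work.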

Page 21: Parallel Computing 2007: Science Applications


Other N-Body Like Problems – II

• 3) The Green's function approach to simple partial differential equations gives solutions as integrals of known Green's functions times "source" or "boundary" terms.
– For the simulation of earthquakes in the GEM project, the source terms are strains in faults, and the stresses in any fault segment are the integral over the strains in all other segments.
– Compared to particle dynamics, the force law is replaced by a Green's function, but in each case the total stress/force is a sum over contributions associated with the other entities in the formulation.
• 4) In the so-called vortex method in CFD (Computational Fluid Dynamics), one models the Navier-Stokes equations as long range interactions between entities which are the vortices.
• 5) Chemistry uses molecular dynamics, so the particles are molecules, but the force is usually not Newtonian gravity but rather Van der Waals forces, which are long range but fall off faster than 1/r^2.

Page 22: Parallel Computing 2007: Science Applications


Chapters 5-8 of Sourcebook

• Chapters 5-8 are the main application section of this book!

• The Sourcebook of Parallel Computing, edited by Jack Dongarra, Ian Foster, Geoffrey Fox, William Gropp, Ken Kennedy, Linda Torczon, and Andy White, October 2002, 760 pages, ISBN 1-55860-871-0, Morgan Kaufmann Publishers. http://www.mkp.com/books_catalog/catalog.asp?ISBN=1-55860-871-0

Page 23: Parallel Computing 2007: Science Applications


Computational Fluid Dynamics (CFD) in Chapter 5 I

• This chapter provides a thorough formulation of CFD with a general discussion of the importance of non-linear terms and most importantly viscosity.

• Difficult features like shockwaves and turbulence can be traced to the small coefficient of the highest order derivatives.

• Incompressible flow is approached using the spectral element method, which combines the features of finite elements (coping with complex geometries) with highly accurate approximations within each element.

• These problems need fast solvers for elliptic equations and there is a detailed discussion of data and matrix structure and the use of iterative conjugate gradient methods.

• This is compared with direct solvers using the static condensation method for calculating the solution (stiffness) matrix.

Page 24: Parallel Computing 2007: Science Applications


Computational Fluid Dynamics (CFD) in Chapter 5 II

• The generally important problem of adaptive meshes is described using the successive refinement quad/oct-tree (in two/three dimensions) method.

• Compressible flow methods are reviewed and the key problem of coping with the rapid change in field variables at shockwaves is identified.

• One uses a lower order approximation near a shock but preserves the most powerful high order spectral methods in the areas where the flow is smooth.

• Parallel computing (using space filling curves for decomposition) and adaptive meshes are covered.
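A space filling curve decomposition can be sketched with the Morton (Z-order) curve. The chapter does not specify which curve is used; the Morton choice and the simple equal-chunk cut below are assumptions made for illustration.

```python
def morton_index(x, y, bits=8):
    """Interleave the bits of integer grid coordinates (x, y) to get the
    cell's position along a Z-order space-filling curve."""
    z = 0
    for b in range(bits):
        z |= ((x >> b) & 1) << (2 * b)       # x bits go to even positions
        z |= ((y >> b) & 1) << (2 * b + 1)   # y bits go to odd positions
    return z

def decompose(cells, n_procs, bits=8):
    """Sort grid cells along the curve, then cut the curve into contiguous
    chunks so each processor gets a spatially compact piece of the mesh."""
    ordered = sorted(cells, key=lambda c: morton_index(c[0], c[1], bits))
    chunk = (len(ordered) + n_procs - 1) // n_procs
    return [ordered[k:k + chunk] for k in range(0, len(ordered), chunk)]
```

Because cells that are adjacent along the curve are usually adjacent in space, equal-length cuts of the curve give balanced loads with reasonably low communication, which is why space filling curves work well for adaptive meshes.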

Page 25: Parallel Computing 2007: Science Applications


Space filling curve

Page 26: Parallel Computing 2007: Science Applications


Environment and Energy in Chapter 6 I

• This article describes three distinct problem areas – each illustrating important general approaches.

• Subsurface flow in porous media is needed in both oil reservoir simulations and environmental pollution studies.
– The nearly hyperbolic or parabolic flow equations are characterized by multiple constituents and by very heterogeneous media with possible abrupt discontinuities in the physical domain.
– This motivates the use of domain decomposition methods, where the full region is divided into blocks which can use different solution methods if necessary.
– The blocks must be iteratively reconciled at their boundaries (mortar spaces).
– The IPARS code described has been successfully integrated into two powerful problem solving environments: NetSolve, described in chapter 14, and DISCOVER (aimed especially at interactive steering) from Rutgers University.


Page 28: Parallel Computing 2007: Science Applications


Environment and Energy in Chapter 6 II

• The discussion of the shallow water problem uses time-marching that is implicit in the vertical direction and explicit in the horizontal plane.

• It is instructive to see that good parallel performance is obtained by only decomposing in the horizontal directions and keeping the hard to parallelize implicit algorithm sequentially implemented.

• The irregular mesh was tackled using space filling curves as also described in chapter 5.

• Finally, important code coupling (meta-problem in chapter 4 notation) issues are discussed for oil spill simulations, where water and chemical transport need to be modeled in a linked fashion.

• ADR (Active Data Repository) technology from Maryland is used to link the computations between the water and chemical simulations. Sophisticated filtering is needed to match the output and input needs of the two subsystems.

Page 29: Parallel Computing 2007: Science Applications


Molecular Quantum Chemistry in Chapter 7 I

• This article surveys in detail two capabilities of the NWChem package from Pacific Northwest Laboratory, and also surveys other aspects of computational chemistry.

• This field makes extensive use of particle dynamics algorithms and some use of partial differential equation solvers.

• However characteristic of computational chemistry is the importance of matrix-based methods and these are the focus of this chapter. The matrix is the Hamiltonian (energy) and is typically symmetric positive definite.

• In a quantum approach, the eigensystems of this matrix are the equilibrium states of the molecule being studied. This type of problem is characteristic of quantum theoretical methods in physics and chemistry; particle dynamics is used in classical non-quantum regimes.

Page 30: Parallel Computing 2007: Science Applications


Molecular Quantum Chemistry in Chapter 7 II

• NWChem uses a software approach – the Global Array (GA) toolkit, whose programming model lies in between those of HPF and message passing and has been highly successful.

• GA exposes locality to the programmer but has a shared memory programming model for accessing data stored in remote processors.

• Interestingly, in many cases calculating the matrix elements dominates (over solving for eigenfunctions), and this is a pleasingly parallel task.

• This task requires very careful blocking and staging of the components used to calculate the integrals forming the matrix elements.

• In some approaches, parallel matrix multiplication is important in generating the matrices.

• The matrices typically are taken as full and very powerful parallel eigensolvers were developed for this problem.

• This area of science clearly shows the benefit of linear algebra libraries (see chapter 20) and general performance enhancements like blocking.

Page 31: Parallel Computing 2007: Science Applications


General Relativity

• This field evolves in time complex partial differential equations which have some similarities with the simpler Maxwell equations used in electromagnetics (Sec. 8.6).

• Key difficulties are the boundary conditions, which are outgoing waves at infinity, and the difficult and unique multiple black hole surface conditions internally.

• Finite difference and adaptive meshes are the usual approach.

Page 32: Parallel Computing 2007: Science Applications


Lattice Quantum Chromodynamics (QCD) and Monte Carlo Methods I

• Monte Carlo Methods are central to the numerical approaches to many fields (especially in physics and chemistry) and by their nature can take substantial computing resources.

• Note that the error in the computation decreases only like the inverse square root of the computer time used, compared to the power-law convergence of most differential equation and particle dynamics based methods.

• One finds Monte Carlo methods when problems are posed as integral equations and the often-high dimension integrals are solved by Monte Carlo methods using a randomly distributed set of integration points.

• Quantum Chromodynamics (QCD) simulations described in this subsection are a classic example of large-scale Monte Carlo simulations which perform excellently on most parallel machines due to modest communication costs and regular structure leading to good node performance.

Page 33: Parallel Computing 2007: Science Applications


Errors in Numerical Integration

• For an integral with N points:
• Monte Carlo has error 1/N^0.5
• Iterated trapezoidal has error 1/N^2
• Iterated Simpson has error 1/N^4
• Iterated Gaussian has error 1/N^2m for a basic integration scheme with m points
• But in d dimensions, all methods except Monte Carlo must set up a grid of N^(1/d) points on a side; that hardly works above d = 3.
– Monte Carlo error is still 1/N^0.5
– Simpson error becomes 1/N^(4/d), etc.
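The one-dimensional scalings can be checked numerically for ∫₀¹ eˣ dx = e − 1; the integrand is just an arbitrary smooth example chosen for illustration.

```python
import math
import random

def mc_integrate(f, n, seed=0):
    """Monte Carlo estimate of the integral of f on [0, 1]; error ~ 1/N^0.5."""
    rng = random.Random(seed)
    return sum(f(rng.random()) for _ in range(n)) / n

def simpson(f, n):
    """Composite Simpson's rule on [0, 1] with n intervals (n even); error ~ 1/N^4."""
    h = 1.0 / n
    s = f(0.0) + f(1.0)
    for k in range(1, n):
        s += (4 if k % 2 else 2) * f(k * h)  # weights 4, 2, 4, 2, ..., 4
    return s * h / 3.0
```

With only N = 100 points Simpson is already accurate to around 10^-10, while Monte Carlo with N = 10,000 points is still only good to a few times 10^-3 – the 1/N^4 versus 1/N^0.5 scalings in action.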

Page 34: Parallel Computing 2007: Science Applications


Monte Carlo Convergence

• In the homework, for N = 10,000,000 one finds errors in π of around 10^-6 using Simpson's rule.
• This is a combination of rounding error (when a computer does floating point arithmetic, it is inevitably approximate) and the error from the formula, which is proportional to N^-4.
• For Monte Carlo, the error will be about 1.0/N^0.5.
• So an error of 10^-6 requires N = 10^12, or N = 1,000,000,000,000 (100,000 times more than Simpson's rule).
• One doesn't use Monte Carlo to get such precise results!

Page 35: Parallel Computing 2007: Science Applications


Lattice Quantum Chromodynamics (QCD) and Monte Carlo Methods II

• This application is straightforward to parallelize and very suitable for HPF as the basic data structure is an array. However the work described here uses a portable MPI code.

• Section 8.9 describes some new Monte Carlo algorithms but QCD advances typically come from new physics insights allowing more efficient numerical formulations.

• This field has generated many special purpose facilities, as the lack of significant I/O and the CPU-intensive nature of QCD allow optimized node designs. The work at Columbia and Tsukuba universities is well known.

• There are other important irregular geometry Monte Carlo problems and they see many of the same issues such as adaptive load balancing seen in irregular finite element problems.

Page 36: Parallel Computing 2007: Science Applications


Ocean Modeling

• This describes the issues encountered in optimizing a whole-earth ocean simulation, including realistic geography and proper ocean-atmosphere boundaries.

• Conjugate gradient solvers and MPI message passing with Fortran 90 are used for the parallel implicit solver for the vertically averaged flow.

Page 37: Parallel Computing 2007: Science Applications


Tsunami Simulations

• These are still very preliminary; an area where much more work could be done.

Page 38: Parallel Computing 2007: Science Applications


Multidisciplinary Simulations

• Oceans naturally couple to the atmosphere, and the atmosphere couples to the environment, including:
– Deforestation
– Emissions from using gasoline (fossil fuels)
– Conversely, the atmosphere makes lakes acid, etc.

• These couplings are not trivial, as the components have very different timescales.

Page 39: Parallel Computing 2007: Science Applications


Earthquake Simulations

• Earthquake simulations are a relatively young field, and it is not known how far they can go in forecasting large earthquakes.
• The field has an increasing amount of real-time sensor data, which needs data assimilation techniques and automatic differentiation tools such as those of chapter 24.
• Study of earthquake faults can use finite element techniques or, with some approximation, Green's function approaches, which can use fast multipole methods.
• Analysis of observational and simulation data needs the data mining methods described in subsections 8.7 and 8.8.
• The principal component and hidden Markov classification algorithms currently used in the earthquake field illustrate the diversity in data mining methods when compared to the decision tree methods of section 8.7.
• Most uses of parallel computing are still pleasingly parallel.

Page 40: Parallel Computing 2007: Science Applications


Published February 19, 2002 in: Proceedings of the National Academy of Sciences, USA

Decision Threshold = 10^-4

Page 41: Parallel Computing 2007: Science Applications


Status of the Real Time Earthquake Forecast Experiment (Original Version) (J.B. Rundle et al., PNAS, v99, Suppl. 1, 2514-2521, Feb 19, 2002; K.F. Tiampo et al., Europhys. Lett., 60, 481-487, 2002; J.B. Rundle et al., Rev. Geophys. Space Phys., 41(4), DOI 10.1029/2003RG000135, 2003. http://quakesim.jpl.nasa.gov)

Decision Threshold = 10^-3

(Composite N-S Catalog)

CL# 03-2015

Plot of Log10(Seismic Potential): increase in potential for significant earthquakes, ~2000 to 2010

Eighteen significant earthquakes (blue circles) have occurred in Central or Southern California. The margin of error of the anomalies is +/- 11 km. Data are from the S. CA and N. CA catalogs:

After the work was completed:
1. Big Bear I, M = 5.1, Feb 10, 2001
2. Coso, M = 5.1, July 17, 2001
After the paper was in press (September 1, 2001):
3. Anza I, M = 5.1, Oct 31, 2001
After the paper was published (February 19, 2002):
4. Baja, M = 5.7, Feb 22, 2002
5. Gilroy, M = 4.9-5.1, May 13, 2002
6. Big Bear II, M = 5.4, Feb 22, 2003
7. San Simeon, M = 6.5, Dec 22, 2003
8. San Clemente Island, M = 5.2, June 15, 2004
9. Bodie I, M = 5.5, Sept. 18, 2004
10. Bodie II, M = 5.4, Sept. 18, 2004
11. Parkfield I, M = 6.0, Sept. 28, 2004
12. Parkfield II, M = 5.2, Sept. 29, 2004
13. Arvin, M = 5.0, Sept. 29, 2004
14. Parkfield III, M = 5.0, Sept. 30, 2004
15. Wheeler Ridge, M = 5.2, April 16, 2005
16. Anza II, M = 5.2, June 12, 2005
17. Yucaipa, M = 4.9-5.2, June 16, 2005
18. Obsidian Butte, M = 5.1, Sept. 2, 2005

Note: This original forecast was made using both the full Southern California catalog and the full Northern California catalog. The S. Calif. catalog was used south of latitude 36°, and the N. Calif. catalog was used north of 36°. No corrections were applied for the different event statistics in the two catalogs. Green triangles mark locations of large earthquakes (M ≥ 5.0) between Jan 1, 1990 – Dec 31, 1999.

(Legend: 5 ≤ M < 6; M ≥ 6)

Page 42: Parallel Computing 2007: Science Applications


World-Wide Earthquakes, M > 5, 1965-2000. World-Wide Seismicity, ANSS Catalog 1970-2000, Magnitude m ≥ 5

Forecasting m ≥ 7 Earthquakes: January 1, 2000 - 2010. Circles represent earthquakes of m ≥ 7 from January 1, 2000 to the present

UC Davis Group led by John Rundle

Page 43: Parallel Computing 2007: Science Applications


Cosmological Structure Formation (CSF)

• CSF is an example of a coupled particle-field problem.
• Here the universe is viewed as a set of particles which generate a gravitational field obeying Poisson's equation.
• The field then determines the force needed to evolve each particle in time. This structure is also seen in plasma physics, where electrons create an electromagnetic field.
• It is hard to generate compatible particle and field decompositions. CSF exhibits large ranges in distance and temporal scale, characteristic of the attractive gravitational forces.
• Poisson's equation is solved by fast Fourier transforms, and deeply adaptive meshes are generated.
• The article describes both MPI and CMFortran (HPF-like) implementations.
• Further, it made use of object-oriented techniques (chapter 13) with kernels in F77. Some approaches to this problem class use fast multipole methods.

Page 44: Parallel Computing 2007: Science Applications


Cosmological Structure Formation (CSF)

• There is a lot of structure in the universe

Page 45: Parallel Computing 2007: Science Applications


Page 46: Parallel Computing 2007: Science Applications


Page 47: Parallel Computing 2007: Science Applications


Page 48: Parallel Computing 2007: Science Applications


Page 49: Parallel Computing 2007: Science Applications


Computational Electromagnetics (CEM)
• This overview summarizes several different approaches to electromagnetic simulations and notes the growing importance of coupling electromagnetics with other disciplines such as aerodynamics and chemical physics.
• Parallel computing has been successfully applied to the three major approaches to CEM.
• Asymptotic methods use ray tracing, as seen in visualization. Frequency-domain methods use moment (spectral) expansions; these were among the earliest uses of large parallel full-matrix solvers 10 to 15 years ago, and have now switched to the fast multipole approach.
• Finally, time-domain methods use finite volume (element) methods with an unstructured mesh. As in general relativity, special attention is needed to get accurate wave solutions at infinity in the time-domain approach.
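To make the time-domain approach concrete, here is a minimal 1D leapfrog (Yee-style) field update in normalized units. This is a toy sketch, not any production CEM code; the grid size, step count, and Courant number are invented for illustration:

```python
import numpy as np

def fdtd_1d(steps=200, n=400):
    """Minimal 1D FDTD: E and H live on staggered grids and are updated
    alternately from each other's spatial differences (normalized units)."""
    E = np.zeros(n)
    H = np.zeros(n - 1)
    c = 0.5                              # Courant number (< 1 for stability)
    E[n // 4] = 1.0                      # initial pulse
    for _ in range(steps):
        H += c * (E[1:] - E[:-1])        # update H from the curl of E
        E[1:-1] += c * (H[1:] - H[:-1])  # update E from the curl of H
    return E

E = fdtd_1d()
print(np.max(np.abs(E)))
```

The boundary values E[0] and E[-1] stay zero, a perfectly conducting wall; real time-domain codes replace this with the absorbing far-field treatment mentioned above.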

Page 50: Parallel Computing 2007: Science Applications


Page 51: Parallel Computing 2007: Science Applications


Data mining
• Data mining is a broad field with many different applications and algorithms (see also sections 8.4 and 8.8).
• This article describes important algorithms used, for example, in discovering associations between items that were likely to be purchased by the same customer; these associations could occur either in time or because the purchases tended to be in the same shopping basket.
• Other data-mining problems discussed include the classification problem, tackled by decision trees.
• These tree-based approaches are parallelized effectively (as they are based on huge transaction databases), with load balance being a difficult issue.
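The association idea can be sketched minimally as pair counting over shopping baskets (the data and support threshold here are invented for illustration). Because each basket is counted independently, the transaction database can be partitioned across processors and the counters merged, which is the source of the parallelism noted above:

```python
from itertools import combinations
from collections import Counter

def frequent_pairs(baskets, min_support):
    """Count co-occurring item pairs over all baskets and keep those
    reaching min_support. Each basket is processed independently, so
    partitions of the database can be counted in parallel and merged."""
    counts = Counter()
    for basket in baskets:
        counts.update(combinations(sorted(set(basket)), 2))
    return {pair: c for pair, c in counts.items() if c >= min_support}

baskets = [{"bread", "milk"}, {"bread", "milk", "eggs"}, {"milk", "eggs"}]
print(frequent_pairs(baskets, 2))
```

Real association-rule miners (e.g. Apriori-style algorithms) extend this to larger itemsets by pruning candidates whose subsets are infrequent.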

Page 52: Parallel Computing 2007: Science Applications


Signal and Image Processing

• This samples some of the issues from this field, which currently makes surprisingly little use of parallel computing even though good parallel algorithms often exist.

• The field has preferred the convenient programming model and interactive feedback of systems like MATLAB and Khoros.

• These are problem solving environments as described in chapter 14 of SOURCEBOOK.

Page 53: Parallel Computing 2007: Science Applications


Monte Carlo Methods and Financial Modeling I

• Subsection 8.2 introduces Monte Carlo methods and this subsection describes some very important developments in the generation of “random” numbers.

• Quasirandom numbers (QRNs) are more uniformly distributed than standard pseudorandom numbers, and for certain integrals lead to more rapid convergence.

• In particular these methods have been applied to financial modeling where one needs to calculate one or more functions (stock prices, their derivatives or other financial instruments) at some future time by integrating over the possible future values of the underlying variables.

• These future values are given by models based on the past behavior of the stock.
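As a generic illustration of quasirandom numbers (not the specific generators discussed in the article), here is the Halton construction: the van der Corput radical inverse in bases 2 and 3, used to estimate the integral of xy over the unit square, whose exact value is 1/4:

```python
def halton(i, base):
    """i-th element (1-indexed) of the van der Corput sequence in the
    given base: reflect the base-b digits of i about the radix point."""
    f, r = 1.0, 0.0
    while i > 0:
        f /= base
        r += f * (i % base)
        i //= base
    return r

n = 2000
pts = [(halton(i, 2), halton(i, 3)) for i in range(1, n + 1)]
est = sum(x * y for x, y in pts) / n    # integral of x*y over [0,1]^2 is 0.25
print(abs(est - 0.25))
```

Using coprime bases for the two coordinates keeps the points well spread in the square; the quasi-Monte Carlo error decays roughly like (log n)^2 / n rather than the 1/sqrt(n) of pseudorandom sampling.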

Page 54: Parallel Computing 2007: Science Applications


Monte Carlo Methods and Financial Modeling II

• This can be captured in some cases by the volatility or standard deviation of the stock.

• The simplest model is perhaps the Black-Scholes equation, which can be derived from a Gaussian stock distribution, combined with an underlying "no-arbitrage" assumption. This asserts that the stock market is always in equilibrium instantaneously and there is no opportunity to make money by exploiting mismatches between buy and sell prices.

• In a physics language, the different players in the stock market form a heat bath, which keeps the market in adiabatic equilibrium.

• There is a binomial method, straightforward to parallelize and implement, for predicting the probability distributions of financial instruments. However, Monte Carlo methods with QRNs are the most powerful approach.
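Both pricing approaches mentioned above can be sketched side by side: the closed-form Black-Scholes price of a European call, and a Cox-Ross-Rubinstein binomial tree that converges to it as the number of steps grows. The parameter values are arbitrary illustrative choices:

```python
from math import log, sqrt, exp, erf

def norm_cdf(x):
    """Standard normal CDF via the error function."""
    return 0.5 * (1.0 + erf(x / sqrt(2.0)))

def bs_call(S, K, T, r, sigma):
    """Closed-form Black-Scholes price of a European call."""
    d1 = (log(S / K) + (r + 0.5 * sigma**2) * T) / (sigma * sqrt(T))
    d2 = d1 - sigma * sqrt(T)
    return S * norm_cdf(d1) - K * exp(-r * T) * norm_cdf(d2)

def binomial_call(S, K, T, r, sigma, n):
    """European call on a CRR binomial tree: terminal payoffs discounted
    back step by step under the risk-neutral up-probability p."""
    dt = T / n
    u = exp(sigma * sqrt(dt))
    d = 1.0 / u
    p = (exp(r * dt) - d) / (u - d)
    disc = exp(-r * dt)
    v = [max(S * u**j * d**(n - j) - K, 0.0) for j in range(n + 1)]
    for _ in range(n):                   # roll the payoffs back to t = 0
        v = [disc * (p * v[j + 1] + (1 - p) * v[j]) for j in range(len(v) - 1)]
    return v[0]

print(bs_call(100, 100, 1.0, 0.05, 0.2))
print(binomial_call(100, 100, 1.0, 0.05, 0.2, 500))
```

The tree rollback is naturally parallel across nodes at each time level, and pricing many independent instruments or strikes is pleasingly parallel.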

Page 55: Parallel Computing 2007: Science Applications


Quasi Real-time Data analysis of Photon Source Experiments

• This subsection describes a successful application of computational grids to accelerate the data analysis of an accelerator experiment. It is an example that can be generalized to other cases.

• The accelerator (here a photon source at Argonne) data is passed in real-time to a supercomputer where the analysis is performed. Multiple visualization and control stations are also connected to the Grid.

Page 56: Parallel Computing 2007: Science Applications


Diagram: the Advanced Photon Source linked over the Next Generation Internet to remote visualization workstations, an ImmersaDesk, and a Virtual Reality Cave (scientist represented by an avatar).

Page 57: Parallel Computing 2007: Science Applications


Forces Modeling and Simulation
• This subsection describes event-driven simulations, which as discussed in chapter 4 are very common in military applications.
• A distributed object approach called HLA (see chapter 13) is being used for modern problems of this class.
• Some run in "real-time", with synchronization provided by the wall clock and humans and machines in the loop.
• Other cases are run in "virtual time" in a more traditional standalone fashion.
• This article describes integration of these military standards with Object Web ideas such as CORBA and Microsoft's .NET.
• One application simulated the interaction of vehicles with a million mines on a distributed Grid of computers.
  – This work also parallelized the minefield simulator using threads (chapter 10).

Page 58: Parallel Computing 2007: Science Applications


Event Driven Simulations
• This is a graph-based model where independent objects issue events that travel as messages to other objects.
• It is hard to parallelize, as there is no guarantee that an event will not arrive from the past in simulation time.
• Often run in "real-time".

Diagram: three objects (1, 2, 3) exchanging event messages timestamped t1 and t2.
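The sequential form of this model is simply a priority queue ordered by simulation time; a minimal sketch (event names and handler rules invented for illustration) makes the structure concrete. The parallelization difficulty noted above is exactly that, once this loop is distributed, a processor may receive an event stamped earlier than events it has already handled:

```python
import heapq

def run(events, handlers, t_end):
    """Minimal sequential event-driven loop. Events are (time, name)
    tuples; popping always yields the earliest pending event, and a
    handler may schedule further events at later times."""
    heapq.heapify(events)
    log = []
    while events and events[0][0] <= t_end:
        t, name = heapq.heappop(events)
        log.append((t, name))
        for dt, new in handlers.get(name, []):
            heapq.heappush(events, (t + dt, new))
    return log

handlers = {"fire": [(2.0, "echo")]}     # each "fire" schedules an "echo" 2s later
log = run([(0.0, "fire"), (1.0, "fire")], handlers, t_end=4.0)
print(log)   # [(0.0, 'fire'), (1.0, 'fire'), (2.0, 'echo'), (3.0, 'echo')]
```

Parallel discrete-event schemes (conservative lookahead, or optimistic Time Warp with rollback) exist precisely to relax this strict global ordering.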

Page 59: Parallel Computing 2007: Science Applications


Industrial Strength Parallel Computing I
• Morgan Kaufmann publishes a book "Industrial Strength Parallel Computing" (ISPC), edited by Alice E. Koniges, which is complementary to our book as it has a major emphasis on application experience. As a guide to readers interested in further insight as to which technologies are useful in which application areas, we give a brief summary of the application chapters of ISPC. We will use CRPC to designate work in the Sourcebook and ISPC to denote work in "Industrial Strength Parallel Computing".
• Chapter 7 - Ocean Modeling and Visualization (Yi Chao, P. Peggy Li, Ping Wang, Daniel S. Katz, Benny N. Cheng, Scott Whitman) of ISPC
• This uses a variant of the same ocean code described in section 8.4 of CRPC and describes both basic parallel strategies and the integration of the simulation with a parallel 3D volume renderer.
• Chapter 8 - Impact of Aircraft on Global Atmospheric Chemistry (Douglas A. Rotman, John R. Tannahill, Steven L. Baughcum) of ISPC
• This discusses issues related to those in chapter 6 of CRPC in the context of estimating the impact on atmospheric chemistry of supersonic aircraft emissions. Task decomposition (code coupling) for different physics packages is combined with domain decomposition and parallel block data decomposition. Again one keeps the vertical direction in each processor and decomposes in the horizontal plane. Nontrivial technical problems are found in the polar regions due to the decomposition singularities.
• Chapter 9 - Petroleum Reservoir Management (Michael DeLong, Allyson Gajraj, Wayne Joubert, Olaf Lubeck, James Sanderson, Robert E. Stephenson, Gautam S. Shiralkar, Bart van Bloemen Waanders) of ISPC
• This addresses an application covered in chapter 6 of CRPC but focuses on a different code, Falcon, developed as a collaboration between Amoco and Los Alamos. As in other chapters of ISPC, detailed performance results are given, but particularly interesting is the discussion of the sparse matrix solver (chapter 21 of CRPC). A very efficient parallel preconditioner for a fully implicit solver was developed based on the ILU (Incomplete LU) approach. This rearranged the order of computation but faithfully preserved the sequential algorithm.
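The preconditioned-solver idea above can be made concrete with a generic preconditioned conjugate gradient sketch. For brevity a simple Jacobi (diagonal) preconditioner stands in for the ILU factorization used in Falcon, which is not reproduced here; the test matrix is an invented 1D Laplacian:

```python
import numpy as np

def pcg(A, b, M_inv, tol=1e-10, maxit=200):
    """Preconditioned conjugate gradient; M_inv applies the preconditioner
    (here a cheap stand-in for an ILU solve) to a residual vector."""
    x = np.zeros_like(b)
    r = b - A @ x
    z = M_inv(r)
    p = z.copy()
    rz = r @ z
    for _ in range(maxit):
        Ap = A @ p
        alpha = rz / (p @ Ap)
        x += alpha * p
        r -= alpha * Ap
        if np.linalg.norm(r) < tol:
            break
        z = M_inv(r)
        rz_new = r @ z
        p = z + (rz_new / rz) * p
        rz = rz_new
    return x

n = 50
A = np.diag(2.0 * np.ones(n)) - np.diag(np.ones(n - 1), 1) - np.diag(np.ones(n - 1), -1)
b = np.ones(n)
d = np.diag(A)
x = pcg(A, b, lambda r: r / d)           # Jacobi preconditioner: divide by diagonal
print(np.linalg.norm(A @ x - b))
```

The ILU approach mentioned in the text replaces the diagonal solve with forward/back substitution against incomplete triangular factors, which is far more effective for implicit reservoir systems but harder to parallelize without reordering.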

Page 60: Parallel Computing 2007: Science Applications


Industrial Strength Parallel Computing II
• Chapter 10 - An Architecture-Independent Navier-Stokes Code (Johnson C. T. Wang, Stephen Taylor) of ISPC
• This describes parallelization of a commercial code ALSINS (from the Aerospace Corporation) which solves the Navier-Stokes equations (chapter 5 of CRPC) using finite difference methods in the Reynolds averaging approximation for turbulence. Domain decomposition (chapters 6 and 20 of CRPC) and MPI are used for the parallelism. The application studied involved flow over Delta and Titan launch rockets.
• Chapter 11 - Gaining Insights into the Flow in a Static Mixer (Olivier Byrde, Mark L. Sawley) of ISPC
• This studies flow in commercial chemical mixers using the Reynolds-averaged Navier-Stokes equations with finite volume methods, as in ISPC chapter 10. Domain decomposition (chapters 6 and 20 of CRPC) of a block-structured code and PVM are used for the parallelism. The mixing study required parallel study of particle trajectories in the calculated flow field.
• Chapter 12 - Modeling Groundwater Flow and Contaminant Transport (William J. Bosl, Steven F. Ashby, Chuck Baldwin, Robert D. Falgout, Steven G. Smith, Andrew F. B. Tompson) of ISPC
• This presents a groundwater flow (chapter 6 of CRPC) code, ParFlow, that uses finite volume methods to generate the finite difference equations. A highlight is the detailed discussion of parallel multigrid (chapters 8.6, 12 and 21 of CRPC), which is used not as a complete solver but as a preconditioner for a conjugate gradient algorithm.
• Chapter 13 - Simulation of Plasma Reactors (Stephen Taylor, Marc Rieffel, Jerrell Watts, Sadasivan Shankar) of ISPC
• This simulates plasma reactors used in semiconductor manufacturing plants. The Direct Simulation Monte Carlo method is used to model the system in terms of locally interacting particles. Adaptive three-dimensional meshes (chapter 19 of CRPC) are used with a novel diffusive algorithm to control dynamic load balancing (chapter 18 of CRPC).

Page 61: Parallel Computing 2007: Science Applications


Industrial Strength Parallel Computing III
• Chapter 14 - Electron-Molecule Collisions for Plasma Modeling (Carl Winstead, Chuo-Han Lee, Vincent McKoy) of ISPC
• This complements chapter 13 of ISPC by studying the fundamental particle interactions in plasma reactors. It is instructive to compare the discussion of the algorithm in this chapter with that of chapter 7 of CRPC. They lead to similar conclusions, with chapter 7 naturally describing the issues more generally. Two steps dominate the computation: the calculation of matrix elements, and then a horde of matrix multiplications to transform basis sets. In this problem class, the matrix solver is not a computationally significant step.
• Chapter 15 - Three-Dimensional Plasma Particle-in-Cell Calculations of Ion Thruster Backflow Contamination (Robie I. Samanta Roy, Daniel E. Hastings, Stephen Taylor) of ISPC
• This chapter studies contamination from spacecraft thruster exhaust using a three-dimensional particle-in-cell code. This involves a mix of solving Poisson's equation for the electrostatic field and evolving ions under the forces calculated from this field. There are algorithmic similarities to the astrophysics problems in CRPC section 8.6, but electromagnetic problems produce less extreme density concentrations than the purely attractive (and hence clumping) gravitational force found in astrophysics.
• Chapter 16 - Advanced Atomic-Level Materials Design (Lin H. Yang) of ISPC
• This describes a Quantum Molecular Dynamics package implementing the well-known Car-Parrinello method. This is part of the NWChem package featured in chapter 7 of CRPC but not described in detail there. The computation is mostly dominated by 3D FFTs and basic BLAS (complex vector arithmetic) calls, but has significant I/O.
• Chapter 17 - Solving Symmetric Eigenvalue Problems (David C. O'Neal, Raghurama Reddy) of ISPC
• This describes parallel eigenvalue determination, which is covered in section 7.4.3 and chapter 20 of CRPC.
• Chapter 18 - Nuclear Magnetic Resonance Simulations (Alan J. Benesi, Kenneth M. Merz, James J. Vincent, Ravi Subramanya) of ISPC
• This is a pleasing parallel computation of NMR spectra obtained by averaging over crystal orientation.

Page 62: Parallel Computing 2007: Science Applications


Industrial Strength Parallel Computing IV
• Chapter 19 - Molecular Dynamics Simulations Using Particle-Mesh Ewald Methods (Michael F. Crowley, David W. Deerfield II, Tom A. Darden, Thomas E. Cheatham III) of ISPC
• This chapter discusses parallelization of the widely used molecular dynamics code AMBER and its application to computational biology. Much of the discussion is devoted to implementing a particle-mesh method aimed at fast calculation of the long-range forces. Chapter 8.6 discusses this problem for astrophysical cases. The ISPC discussion focuses on the needed 3D FFT.
• Chapter 20 - Radar Scattering and Antenna Modeling (Tom Cwik, Cinzia Zuffada, Daniel S. Katz, Jay Parker) of ISPC
• This article discusses a finite element formulation of computational electromagnetics (see section 8.7 of CRPC), which leads to a sparse matrix problem with multiple right-hand sides. A minimum residual iterative solver was used; this is similar to the conjugate gradient approach described extensively in the CRPC book (chapters 20 and 21, and many applications, especially chapter 5). The complex geometries of realistic antenna and scattering problems demanded sophisticated mesh generation (chapter 19 of CRPC).
• Chapter 21 - Functional Magnetic Resonance Imaging Dataset Analysis (Nigel H. Goddard, Greg Hood, Jonathan D. Cohen, Leigh E. Nystrom, William F. Eddy, Christopher R. Genovese, Douglas C. Noll) of ISPC
• This describes a commonly important type of data analysis where raw images (MRI scans in neuroscience) need basic processing before they can be interpreted. This processing for MRI involves a pipeline of 5-15 steps, of which the computationally intense Fourier transforms, interpolation, and head motion corrections were parallelized. Sections 8.9 and 8.11 of CRPC describe related applications.
• Chapter 22 - Selective and Sensitive Comparison of Genetic Sequence Data (Alexander J. Ropelewski, Hugh B. Nicholas, Jr., David W. Deerfield II) of ISPC
• This describes the very important genome database search problem implemented in a program called Msearch. The basic sequential algorithm involves very sophisticated pattern matching, but parallelism is straightforward because one can use pleasingly parallel approaches, decomposing the computation over parts of the searched database.
• Chapter 23 - Interactive Optimization of Video Compression Algorithms (Henri Nicolas, Fred Jordan) of ISPC
• This chapter describes parallel compression algorithms for video streams. The parallelism involves dividing images into blocks and independently compressing each block. The goal is an interactive system to support the design of new compression methods.
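The pleasingly parallel database-search decomposition described for the genome problem can be sketched generically: partition the database into chunks, score each chunk independently (one chunk per processor), and merge the best hits. Everything here is invented for illustration; the shared-k-mer "score" is a toy stand-in for real alignment scoring such as that in Msearch:

```python
def best_match(query, chunk, k=3):
    """Score each sequence in a chunk by the number of k-mers it shares
    with the query (a toy similarity measure) and return the best hit."""
    qk = {query[i:i + k] for i in range(len(query) - k + 1)}

    def score(seq):
        return sum(seq[i:i + k] in qk for i in range(len(seq) - k + 1))

    best = max(chunk, key=score)
    return best, score(best)

db = ["ACGTACGT", "TTTTGGGG", "ACGTTTTT"]
chunks = [db[:2], db[2:]]                # each chunk could go to one processor
hits = [best_match("ACGTACG", c) for c in chunks]
print(max(hits, key=lambda h: h[1]))     # overall best hit after the merge
```

Because chunks share no state, the only communication is the final merge of per-chunk winners, which is why this class of search scales so easily.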

Page 63: Parallel Computing 2007: Science Applications


Parallel Computing Works Applications
• Parallel Computing Works, G. C. Fox, P. Messina, and R. Williams; Morgan Kaufmann, San Mateo CA (1994). http://www.old-npac.org/copywrite/pcw/
• These applications are not as sophisticated as those discussed above, as they come from a time when few scientists addressed three-dimensional problems; 2D computations were typically the best you could do in the partial differential equation arena. To make a stark contrast: the early 1983 QCD (section 8.3) computations in PCW were done on the Caltech hypercube, whose 64 nodes combined could only reach a total of 3 megaflops! Today teraflop performance is available, almost a million times better.
• Nevertheless, in many applications the parallel approaches described in this book are still sound and state of the art.
• The book develops the Complex Systems formalism used here.

Page 64: Parallel Computing 2007: Science Applications


Parallel Computing Works I
• PCW Chapter 3, "A Methodology for Computation", describes more formally the approach taken in chapter 4 of CRPC.
• PCW Chapter 4, "Synchronous Applications I", describes QCD (section 8.3) and other similar statistical physics Monte Carlo simulations on a regular lattice. It also presents a cellular automata model for granular materials (such as sand dunes), which has a simple regular lattice structure, as mentioned in section 4.5 of CRPC.
• PCW Chapter 6, "Synchronous Applications II", describes other regular problems, including convectively-dominated flows and the flux-corrected transport differential equations. High-statistics studies of two-dimensional statistical physics problems are used to study phase transitions (cf. sections 8.3 and 8.10 of CRPC). Parallel multiscale methods are also described for various image processing algorithms, including surface reconstruction, character recognition, real-time motion field estimation, and collective stereopsis (cf. section 8.9 of CRPC).

Page 65: Parallel Computing 2007: Science Applications


Parallel Computing Works II
• PCW Chapter 7, "Independent Parallelism", describes what is termed "pleasingly parallel" applications in chapter 4. This PCW chapter includes a physics computation of quantum string theory surfaces, parallel random number generation, and ray tracing to study a statistical approach to the gravitational lensing of quasars by galaxies. A high-temperature superconductor study used the Quantum Monte Carlo method; here one uses Monte Carlo methods to generate a set of random independent paths, a different problem structure from that of section 8.3 but a method of general importance in chemistry, condensed matter, and nuclear physics. GENESIS was one of the first general-purpose biological neural network simulators.
• PCW Chapter 8, "Full Matrix Algorithms and Their Applications", first discusses some parallel matrix algorithms (chapter 20) and applies the Gauss-Jordan matrix solver to a chemical reaction computation. This directly solves Schrödinger's equation for a small number of particles and is different in structure from the problems in CRPC chapter 7; it reduces to a multi-channel ordinary differential equation and leads to full matrix solvers. A section on "electron-molecule collisions" describes a similar structure to the much more sophisticated simulation engines of CRPC chapter 7. Further work by this group can be found in chapter 14 of ISPC.

Page 66: Parallel Computing 2007: Science Applications


Parallel Computing Works III
• PCW Chapter 9, "Loosely Synchronous Problems". The above chapters described synchronous or pleasingly parallel systems in the language of chapter 4 of CRPC. This chapter describes several loosely synchronous cases. Geomorphology by micro-mechanical simulations was a different approach to granular systems (from the cellular automata in chapter 4 of PCW), using direct modeling of particles "bouncing off each other". Particle-in-cell simulation of an electron beam plasma instability used particle-in-cell methods, which have of course grown tremendously in sophistication, as seen in the astrophysics simulation of CRPC section 8.6 and the ion thruster simulations in chapter 15 of ISPC (which uses the same approach as described in this PCW chapter). Computational electromagnetics (see section 8.7 of CRPC) used finite element methods and is followed up in chapter 20 of ISPC. Concurrent DASSL applied to dynamic distillation column simulation uses a parallel sparse solver (chapter 21 of CRPC) to tackle coupled ordinary differential-algebraic equations arising in chemical engineering. This chapter also discusses parallel adaptive multigrid for solving differential equations, an area with similarities to the mesh refinement discussed in CRPC chapters 5, 8.6, 12 and 19; see also chapter 9 of ISPC. Munkres's assignment algorithm was parallelized for a multi-target Kalman filter problem (cf. section 8.8 of CRPC). This PCW chapter also discusses parallel implementations of learning methods for neural networks.
• PCW Chapter 10, "DIME Programming Environment", discusses one of the earliest parallel unstructured mesh generators and applies it to model finite element problems. Chapter 19 of CRPC is an up-to-date survey of this field.

Page 67: Parallel Computing 2007: Science Applications


Parallel Computing Works IV
• PCW Chapter 11, "Load Balancing and Optimization", describes approaches to optimization based on physical analogies, including approaches to the well-known traveling salesman problem. These physical optimization methods complement those discussed in CRPC chapters 8.8 and 22.
• PCW Chapter 12, "Irregular Loosely Synchronous Problems", features some of the harder parallel scientific codes in PCW. This chapter includes two adaptive unstructured mesh problems that used the DIME package described in PCW chapter 10. One was a simulation of the electrosensory system of the fish Gnathonemus petersii, and the other transonic flow in CFD (chapter 5 of CRPC). There is a full discussion of fast multipole methods and their parallelism; these were mentioned in chapters 4 and 8.7 of CRPC, and in PCW are applied to astrophysical problems similar to those in section 8.6 of CRPC and to the vortex approach to CFD. Fast multipole methods are applied to the same problem class as particle-in-cell codes, as they again involve interacting particles and fields. Chapter 19 of ISPC discusses another biochemistry problem of this class. Parallel sorting is an interesting area, and this PCW chapter describes several good algorithms and compares them. The discussion of cluster algorithms for statistical physics is interesting, as these are the best sequential algorithms but the method is very hard to parallelize. The same difficult structure occurs in some approaches to region finding in image processing, and also in some models of the formation of earthquakes using cellular automata-like models. The clusters are the aligned strains that form the earthquake.

Page 68: Parallel Computing 2007: Science Applications


Parallel Computing Works V
• PCW Chapter 14, "Asynchronous Applications", describes examples of the temporally asynchronous algorithms described in chapter 4, where scaling parallelism is not easy. Melting in two dimensions illustrates a subtle point that distinguishes Monte Carlo and PDE algorithms: in Monte Carlo one cannot simultaneously update sites with overlapping neighbors. This complicates the loosely synchronous structure and can make the problem architecture look like that of asynchronous event-driven simulations; here the events are individual Monte Carlo updates. "Detailed balance" requires that such events be sequentially (if arbitrarily) ordered, which is not easy in a parallel environment. Nevertheless, using the equivalent of multiple threads (chapter 10 of CRPC), one finds an algorithm that gives good parallel performance.
• Computer chess is the major focus of this chapter, where parallelism comes from sophisticated parallelism of the game tree. Statistical methods are used to balance the processing of the different branches of the dynamically pruned game tree. There is a shared database containing previous evaluations of positions, but otherwise the processing of the different possible moves is independent. One does need a clever ordering of the work (evaluation of the different final positions) to avoid a significant number of calculations being wasted because they would "later" be pruned away by a parallel calculation on a different processor. Branch-and-bound applications have similar parallelization characteristics to computer chess. Note this is not the only, and in fact not the easiest, form of parallelism in computer chess; rather, fine-grain parallelism in evaluating each position is used in all recent computer chess championship systems. The work described in PCW is complementary to this mainstream activity.
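The constraint that sites with overlapping neighbors cannot be updated simultaneously has a standard lattice remedy (not necessarily the scheme PCW used): color the lattice like a checkerboard, so that sites of one color share no neighbors and each color class can be updated fully in parallel. A minimal 2D Ising Metropolis sweep, with an invented lattice size and temperature:

```python
import numpy as np

rng = np.random.default_rng(0)

def checkerboard_sweep(spins, beta):
    """One Metropolis sweep of a 2D Ising model in two half-sweeps:
    all 'black' sites, then all 'white' sites. Within one color no two
    sites are neighbors, so the whole half-sweep is safe to do at once."""
    ii, jj = np.indices(spins.shape)
    for color in (0, 1):
        mask = (ii + jj) % 2 == color
        nbr = (np.roll(spins, 1, 0) + np.roll(spins, -1, 0) +
               np.roll(spins, 1, 1) + np.roll(spins, -1, 1))
        dE = 2.0 * spins * nbr           # energy change of flipping each site
        accept = mask & (rng.random(spins.shape) <
                         np.exp(-beta * np.clip(dE, 0.0, None)))
        spins[accept] *= -1
    return spins

s = rng.choice([-1, 1], size=(16, 16))
s = checkerboard_sweep(s, beta=1.0)
print(s.shape)                           # spins remain ±1 on a 16x16 lattice
```

Alternating the two half-sweeps preserves detailed balance within each color class while exposing massive data parallelism, which is why this coloring trick is the usual route to parallel lattice Monte Carlo.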

Page 69: Parallel Computing 2007: Science Applications


Parallel Computing Works VI
• PCW Chapter 18, "Complex System Simulation and Analysis", describes a few metaproblems using the syntax of section 4.9 of CRPC. ISIS was an Interactive Seismic Imaging System, and there is a long discussion of one of the first large-scale parallel military simulations mixing data and task parallelism. This involved generation of a scenario, tracking multiple ballistic missiles, and a simulation of the hoped-for identification and destruction. A very sophisticated parallel Kalman filter was generated in this project.
• Workflow technology would be used in these applications today.