Computational issues in Carbon nanotube simulation Ashok Srinivasan Department of Computer Science Florida State University



Outline

• Background
  – Sequential computation
  – Performance analysis

• Parallelization
  – Shared memory
  – Message passing
  – Load balancing
  – Communication reduction

• Research issues

Background

• Uses of Carbon nanotubes
  – Materials
  – NEMS
  – Transistors
  – Displays
  – Etc.

• www.ipt.arc.nasa.gov

Sequential computation

• Molecular dynamics, using Brenner’s potential
  – Short-range interactions
  – Neighbors can change dynamically during the course of the simulation
  – Computational scheme
    • Find force on each particle due to interactions with “close” neighbors
    • Update position and velocity of each atom
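The scheme above can be sketched in a few lines. This is a minimal illustration under simplifying assumptions, not the actual simulation code: a Lennard-Jones pair force stands in for Brenner’s many-body potential, and a velocity-Verlet step stands in for the predictor-corrector integrator profiled later.

```python
import numpy as np

def pair_forces(pos, cutoff=2.5):
    """Short-range pairwise forces (Lennard-Jones stands in for Brenner's
    potential, which has bond-angle, dihedral, and multibody terms)."""
    n = len(pos)
    f = np.zeros_like(pos)
    for i in range(n):
        for j in range(i + 1, n):
            d = pos[j] - pos[i]
            r2 = d @ d
            if r2 < cutoff ** 2:            # only "close" neighbors contribute
                inv6 = r2 ** -3
                mag = 24 * inv6 * (2 * inv6 - 1) / r2
                f[i] -= mag * d
                f[j] += mag * d             # Newton's third law
    return f

def step(pos, vel, dt=1e-3):
    """One velocity-Verlet step: update position and velocity of each atom."""
    f0 = pair_forces(pos)
    pos = pos + vel * dt + 0.5 * f0 * dt ** 2
    f1 = pair_forces(pos)
    vel = vel + 0.5 * (f0 + f1) * dt
    return pos, vel
```

The O(N²) double loop over all pairs is exactly the “crude algorithm” revisited under neighbor search below; a production code would restrict the inner loop to a neighbor list.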

Force computations

• Bond angles

• Pair interactions

• Dihedral

• Multibody

Performance analysis

Profile of execution time

• 1: Force

• 2: Neighbor list

• 3: Predictor/corrector

• 4: Thermostating

• 5: Miscellaneous

Profile for force computations

Parallelization

• Shared memory

• Message passing

• Load balancing

• Communication reduction

Shared memory parallelization

• Do each of the following loops in parallel
  – For each atom i
    • Update forces due to atom i
    • If neighboring atoms are owned by other threads, update an auxiliary array
  – For each thread
    • Collect force terms for the atoms it owns

• Srivastava et al., SC-97 and CSE 2001
  – Simulated 10^5 to 10^7 atoms
  – Speedup around 16 on 32 processors
  – Included long-range forces too
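The auxiliary-array idea can be sketched as follows. This is an illustrative sketch, not the cited implementation: the block ownership rule, the `parallel_forces` name, and the half neighbor lists (each pair stored once, under the lower-indexed atom) are all assumptions made here.

```python
import threading
import numpy as np

def parallel_forces(pos, neighbors, force_of, n_threads=4):
    """Each thread owns a block of atoms. Forces on its own atoms are
    written directly; contributions to atoms owned by other threads go
    into a per-thread auxiliary array, collected in a second pass.
    `neighbors[i]` is assumed to list only j > i (half lists), so each
    pair is computed exactly once."""
    n = len(pos)
    owner = lambda i: i * n_threads // n        # contiguous block ownership
    main = np.zeros((n, 3))
    aux = np.zeros((n_threads, n, 3))           # one auxiliary array per thread

    def work(t):
        for i in range(n):
            if owner(i) != t:
                continue
            for j in neighbors[i]:
                f = force_of(pos[i], pos[j])
                main[i] += f                    # atom i is ours: write directly
                if owner(j) == t:
                    main[j] -= f
                else:
                    aux[t, j] -= f              # j belongs to another thread: defer

    threads = [threading.Thread(target=work, args=(t,)) for t in range(n_threads)]
    for th in threads:
        th.start()
    for th in threads:
        th.join()
    return main + aux.sum(axis=0)               # collect deferred force terms
```

The auxiliary arrays avoid locks or atomic updates on shared force entries, at the cost of extra memory and the final reduction pass.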

Lexical decomposition

Message passing parallelization

• Decompose domain into cells
  – Each cell contains its atoms

• Assign a set of adjacent cells to each processor

• Each processor computes values for its cells, communicating with neighbors when their data is needed

• Caglar and Griebel, World Scientific, 1999
  – Simulated 10^8 atoms on up to 512 processors
  – Linear speedup for 160,000 atoms on 64 processors
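The cell decomposition can be sketched without any actual message passing. All function names here are hypothetical, and a 1-D decomposition along the tube axis is assumed, since a nanotube is essentially one-dimensional; the ghost cells are the neighbor cells whose atom data would have to be received before computing forces on boundary cells.

```python
import numpy as np

def build_cells(pos, cell_size, n_cells):
    """Bin atoms into a 1-D chain of cells along the tube axis (x)."""
    idx = np.clip((pos[:, 0] // cell_size).astype(int), 0, n_cells - 1)
    return [np.where(idx == c)[0] for c in range(n_cells)]

def assign_cells(n_cells, n_procs):
    """Give each processor a contiguous block of adjacent cells."""
    bounds = np.linspace(0, n_cells, n_procs + 1).astype(int)
    return [list(range(bounds[p], bounds[p + 1])) for p in range(n_procs)]

def ghost_cells(my_cells, n_cells):
    """Cells owned by neighboring processors whose atom data this
    processor must receive before computing its boundary forces."""
    lo, hi = my_cells[0], my_cells[-1]
    ghosts = []
    if lo > 0:
        ghosts.append(lo - 1)
    if hi < n_cells - 1:
        ghosts.append(hi + 1)
    return ghosts
```

Because interactions are short-range, each processor only ever exchanges one layer of boundary cells with its two neighbors, which is what makes the near-linear speedups reported above possible.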

Load balancing

• Atom based decomposition
  – For each atom, compute forces due to each bond, angle, and dihedral
  – Load not balanced

Load balancing ... 2

• Bond based decomposition
  – For each bond, compute forces due to that bond, angles, and dihedrals
  – Finer grained
  – Load still not balanced!

Load balancing ... 3

• Load imbalance was not caused by granularity
  – Symmetry is used to reduce calculations:
    • If i > j, don’t compute for bond (i,j)
    • So threads get unequal load
  – Change the condition to:
    • If i+j is even, don’t compute bond (i,j) if i > j
    • If i+j is odd, don’t compute bond (i,j) if i < j
    • Does not work, due to the regular structure of the nanotube
  – Use a different condition to balance the load
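The parity rule above can be written as a single ownership predicate. This sketch only illustrates the mechanism: each bond gets exactly one owner, and which endpoint owns it alternates with index parity. As the slide notes, this particular rule still fails to balance load on the nanotube’s regular lattice, so a different condition was ultimately used.

```python
def owns_bond(i, j):
    """Return True if atom i is responsible for computing bond (i, j).
    Exactly one of owns_bond(i, j) / owns_bond(j, i) is True, so each
    bond is computed once, and ownership alternates with parity so the
    work is spread across both endpoints rather than always falling on
    the lower-indexed atom."""
    if (i + j) % 2 == 0:
        return i < j        # even sum: the lower-indexed atom computes it
    return i > j            # odd sum: the higher-indexed atom computes it
```

With the original `i < j` rule, atoms with small indices own almost all of their bonds; alternating by parity gives each atom roughly half of its incident bonds, which is the intended effect.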

Load balancing ... 4

• Load is much better balanced now
  – ... at least for this simple configuration

Locality

• Locality important to reduce cache misses

• Current scheme based on lexical ordering

• Alternative: Decompose based on a breadth-first search traversal of the atom-interaction graph
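A BFS-based ordering can be sketched as follows, assuming the atom-interaction graph is given as an adjacency list; the function name is illustrative. Atoms are renumbered in the order the traversal visits them, so atoms that interact end up close together in memory, improving cache reuse over a purely lexical ordering.

```python
from collections import deque

def bfs_order(adj, start=0):
    """Order atoms by a breadth-first traversal of the atom-interaction
    graph; neighbors are visited soon after their source atom, so
    interacting atoms land near each other in the new ordering."""
    seen = {start}
    order = []
    queue = deque([start])
    while queue:
        u = queue.popleft()
        order.append(u)
        for v in adj[u]:
            if v not in seen:
                seen.add(v)
                queue.append(v)
    return order
```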

Locality ... 2

Research issues

• Neighbor search

• Parallelization– Dynamic load balancing– Communication reduction

• Multi-scale simulation of nano-composites

Neighbor search

• Neighbor lists
  – Crude algorithm
    • Compare each pair, and determine if they are close enough
    • O(N^2) for N atoms
  – Cell based algorithm
    • Divide space into cells
    • Place atoms in their respective cells
    • Compare atoms only in neighboring cells
    • Problem
      – Many empty cells
      – Inefficient use of memory
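The cell based algorithm can be sketched as follows; the function name is illustrative. Using a dictionary keyed by cell coordinates sidesteps the empty-cell memory problem noted above for this sketch, though it trades it for hashing overhead; a dense 3-D array of cells would exhibit exactly the inefficiency the slide describes.

```python
import numpy as np
from collections import defaultdict

def neighbor_list(pos, cutoff):
    """Cell-based neighbor search: bin atoms into cubic cells of side
    `cutoff`, then compare each atom only against atoms in its own and
    the 26 adjacent cells, instead of against all N atoms."""
    cells = defaultdict(list)
    for a, p in enumerate(pos):
        cells[tuple((p // cutoff).astype(int))].append(a)

    pairs = set()                       # set: a pair can be seen from both cells
    for (cx, cy, cz), atoms in cells.items():
        for dx in (-1, 0, 1):
            for dy in (-1, 0, 1):
                for dz in (-1, 0, 1):
                    for a in atoms:
                        for b in cells.get((cx + dx, cy + dy, cz + dz), ()):
                            if a < b and np.sum((pos[a] - pos[b]) ** 2) < cutoff ** 2:
                                pairs.add((a, b))
    return sorted(pairs)
```

For roughly uniform density this costs O(N) per rebuild, versus O(N^2) for the all-pairs check, which is why cell lists are the standard starting point for short-range molecular dynamics.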

Computational geometry techniques

• Orthogonal search data structures
  – K-d tree
    • Tree construction time: O(N log N)
    • Worst case search overhead: O(N^(2/3))
    • Memory: O(N)
  – Range tree
    • Tree construction time: O(N log^2 N)
    • Worst case search overhead: O(log^2 N)
    • Memory: O(N log^2 N)

Desired properties of search techniques

• Update should be efficient
  – But the number of atoms does not change
  – Positions change only slightly
  – The queries are known too
  – May be able to use knowledge of the structure of the nanotube
  – Parallelization

Parallelization

• Load balancing and locality
  – Better graph based techniques
  – Geometric partitioning
  – Dynamic schemes
    • Use the structure of the tube
    • Stochastic versions of
      – Spectral partitioning
      – Diffusive schemes

Multi-scale simulation of nano-composites

• Molecular dynamics at nano-scale

• Finite element method at larger scale
  – New models to link the scales
  – New algorithms to dynamically balance the loads, while minimizing communication
  – Latency tolerance and scalability to large numbers of processors