computational issues in carbon nanotube simulation ashok srinivasan department of computer science...
TRANSCRIPT
Computational issues in Carbon nanotube simulation
Ashok SrinivasanDepartment of Computer Science
Florida State University
Outline
• Background– Sequential computation– Performance analysis
• Parallelization– Shared memory – Message passing– Load balancing– Communication reduction
• Research issues
Background
• Uses of Carbon nanotubes– Materials– NEMS– Transistors– Displays– Etc
• www.ipt.arc.nasa.gov
Sequential computation
• Molecular dynamics, using Brenner’s potential– Short-range interactions– Neighbors can change dynamically during
the course of the simulation– Computational scheme
• Find force on each particle due to interactions with “close” neighbors
• Update position and velocity of each atom
Profile of execution time• 1: Force
• 2: Neighbor list
• 3: Predictor/corrector
• 4: Thermostating
• 5: Miscellaneous
Shared memory parallelization
• Do each of the following loops in parallel– For each atom
• Update forces due to atom i
• If neighboring atoms are owned by other threads, update an auxiliary array
– For each thread• Collect force terms for atoms it owns
– Srivastava, et al, SC-97 and CSE 2001• Simulated 105 to 107 atoms
• Speedup around 16 on 32 processors
• Include long-range forces too
Lexical decomposition
Message passing parallelization
• Decompose domain into cells– Each cell contains its atoms
• Assign a set of adjacent cells to each processor• Each processor computes values for its cells,
communicating with neighbors when their data is needed
• Caglar and Griebel, World scientific, 1999– Simulated 108 atoms on up to 512 processors– Linear speedup for 160,000 atoms on 64 processors
Load balancing
• Atom based decomposition– For each atom, compute
forces due to each bond, angle, and dihedral
– Load not balanced
Load balancing ... 2• Bond based decomposition
– For each bond, compute forces due to that bond, angles, and dihedrals
– Finer grained– Load still not
balanced!
Load balancing ... 3
• Load imbalance was not caused by granularity– Symmetry is used to reduce calculations through
– If i > j, don’t compute for bond (i,j)
• So threads get unequal load
– Change condition to• If i+j is even, don’t compute bond (i,j) if i > j• If i+j is odd, don’t compute bond (i,j) if i < j• Does not work, due to regular structure of nanotube
– Use a different condition to balance load
Locality
• Locality important to reduce cache misses
• Current scheme based on lexical ordering
• Alternate: Decompose based on a breadth first search traversal of the atom-interaction graph
Research issues
• Neighbor search
• Parallelization– Dynamic load balancing– Communication reduction
• Multi-scale simulation of nano-composites
Neighbor search
• Neighbor lists– Crude algorithm
• Compare each pair, and determine if they are close enough• O(N2) for N atoms
– Cell based algorithm• Divide space into cells• Place atoms in their respective cells• Compare atoms only in neighboring cells• Problem
– Many empty cells
– Inefficient use of memory
Computational geometry techniques
• Orthogonal search data structures– K-d tree
• Tree construction time: O(N log N)• Worst case search overhead: O(N2/3)• Memory: O(N)
– Range tree• Tree construction time: O(N log2N)• Worst case search overhead: O(log2N)• Memory: O(N log2N)
Desired properties of search techniques
• Update should be efficient– But the number of atoms does not change– Position changes only slightly– The queries are known too– May be able to use knowledge of the
structure of the nanotube– Parallelization
Parallelization
• Load balancing and locality– Better graph based techniques– Geometric partitioning– Dynamic schemes
• Use structure of the tube• Stochastic versions of
– Spectral partitioning– Diffusive schemes