The Distributed Data Interface in GAMESS
Brett M. Bode, Michael W. Schmidt, Graham D. Fletcher, and Mark S. Gordon. Ames Laboratory-USDOE, Iowa State University. 10/7/99.
![Page 1: The Distributed Data Interface in GAMESS](https://reader036.vdocuments.mx/reader036/viewer/2022082819/56813c0e550346895da58154/html5/thumbnails/1.jpg)
The Distributed Data Interface in GAMESS
Brett M. Bode, Michael W. Schmidt, Graham D. Fletcher, and Mark S. Gordon
Ames Laboratory-USDOE, Iowa State University
10/7/99
![Page 2: The Distributed Data Interface in GAMESS](https://reader036.vdocuments.mx/reader036/viewer/2022082819/56813c0e550346895da58154/html5/thumbnails/2.jpg)
What is GAMESS?
General Atomic and Molecular Electronic Structure System
• First principles - fully quantum mechanical
• Created from other programs in ~1980
• Developed by Dr. Mark Gordon's research group since 1982 with Dr. Michael Schmidt as the principal developer.
• Parallelization began in 1991
• Emphasis on distributed memory systems
• Currently includes methods for treating 1 atom to several hundred atoms
![Page 3: The Distributed Data Interface in GAMESS](https://reader036.vdocuments.mx/reader036/viewer/2022082819/56813c0e550346895da58154/html5/thumbnails/3.jpg)
Partial list of capabilities

| SCF type | RHF | ROHF | UHF | GVB | MCSCF |
|---|---|---|---|---|---|
| Energy | CDP | CDP | CDP | CDP | CDP |
| Gradient | CDP | CDP | CDP | CDP | CDP |
| Hessian | CDP | CDP | - | CDP | - |
| MP2 Energy | CDP | CDP | CDP | - | C |
| MP2 Gradient | CDP | - | - | - | - |
| CI Energy | CDP | CDP | - | CDP | CDP |

C = Uses disk storage, D = Minimal disk usage, P = Parallel execution
![Page 4: The Distributed Data Interface in GAMESS](https://reader036.vdocuments.mx/reader036/viewer/2022082819/56813c0e550346895da58154/html5/thumbnails/4.jpg)
First Generation Parallel Code
Parallel communications were performed using either:
• TCGMSG
• Vendor supplied MPI
Parallel version was usually a slightly modified version of the sequential code
![Page 5: The Distributed Data Interface in GAMESS](https://reader036.vdocuments.mx/reader036/viewer/2022082819/56813c0e550346895da58154/html5/thumbnails/5.jpg)
IBM-SUR cluster
22 IBM RS/6000 43P-260:
– Dual 200 MHz Power3 CPUs
– 4 MB of Level 2 cache
– 1 GByte of RAM
– 18 GBytes fast local disks
– Jumbo Frames Gig Ethernet
– Integrated Fast-Ethernet
Fast Ethernet Switch to all
3x9 port Gigabit Switches
![Page 6: The Distributed Data Interface in GAMESS](https://reader036.vdocuments.mx/reader036/viewer/2022082819/56813c0e550346895da58154/html5/thumbnails/6.jpg)
Gigabit Performance on the IBM 43P-260 Cluster
[Figure: Network Performance Comparison on the IBM RS/6000 43P-m260 running AIX v4.3.2. Horizontal axis: Message Size, 1x10^0 to 1x10^8. Vertical axis: 0 to 900 (axis label not recoverable). Series: Fast Ethernet, MPI; Fast Ethernet, TCP; Gigabit Ethernet, MPI, Jumbo; Gigabit Ethernet, TCP, Jumbo; Gigabit Ethernet, TCP, Normal; Gigabit Ethernet, MPI, Normal. Additional label: Alteon 180.]
![Page 7: The Distributed Data Interface in GAMESS](https://reader036.vdocuments.mx/reader036/viewer/2022082819/56813c0e550346895da58154/html5/thumbnails/7.jpg)
Test Molecule
Ti(C5H5)2 C2H4SiHCl3
Basis Set
• 6-31G(d,p) on C and H.
• SBKJC ECP on Si, Ti, and Cl extended with 1 d-type polarization function on Si and Cl.
• 345 total basis functions
![Page 8: The Distributed Data Interface in GAMESS](https://reader036.vdocuments.mx/reader036/viewer/2022082819/56813c0e550346895da58154/html5/thumbnails/8.jpg)
Parallel SCF
Very good scaling, dependent on the size of the molecule.
Large systems show nearly linear scaling through 256 nodes
[Figure: SCF Energy Speedup Curve. Horizontal axis: Number of CPUs (1 to 16). Vertical axis: Speedup over 1 CPU wall timing (1 to 16). Series: Ideal; Direct SCF, Gig. Ethernet; Direct SCF, Gig. Ethernet, 1 CPU/box; Conv. SCF, Gig. Ethernet; Conv. SCF, Gig. Ethernet, 1 CPU/box; Direct SCF, Fast Ethernet; Direct SCF, Fast Ethernet, 1 CPU/box; Conv. SCF, Fast Ethernet; Conv. SCF, Fast Ethernet, 1 CPU/box.]
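The vertical axis of the speedup curve is the 1-CPU wall time divided by the wall time on N CPUs, and the "Ideal" line is speedup equal to N. A minimal sketch of that arithmetic follows; the function names and the timings are made up for illustration and are not taken from the slides.

```python
def speedup(t_one_cpu, t_parallel):
    """Speedup relative to the 1-CPU wall-clock time."""
    return t_one_cpu / t_parallel


def efficiency(t_one_cpu, t_parallel, n_cpus):
    """Fraction of ideal (linear) speedup achieved on n_cpus."""
    return speedup(t_one_cpu, t_parallel) / n_cpus


# Hypothetical wall times in seconds, keyed by CPU count (illustration only).
timings = {1: 1600.0, 4: 400.0, 16: 125.0}

for n_cpus, wall in timings.items():
    s = speedup(timings[1], wall)
    e = efficiency(timings[1], wall, n_cpus)
    print(f"{n_cpus:2d} CPUs: speedup {s:.2f}, efficiency {e:.2f}")
```

A point on the ideal line has efficiency 1.0; points below the line have efficiency less than 1.0.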
![Page 9: The Distributed Data Interface in GAMESS](https://reader036.vdocuments.mx/reader036/viewer/2022082819/56813c0e550346895da58154/html5/thumbnails/9.jpg)
Successes and Limitations
SCF methods scale very well
Most methods run in parallel
Good use is made of aggregate CPU and disk resources.
MP2 and MCSCF methods scale to only a few (8-32) nodes
The aggregate memory is not utilized so jobs are still limited by the memory size of one node.
![Page 10: The Distributed Data Interface in GAMESS](https://reader036.vdocuments.mx/reader036/viewer/2022082819/56813c0e550346895da58154/html5/thumbnails/10.jpg)
Second Generation Methods
New methods should take advantage of the aggregate memory of a parallel system
• Implies higher communication demands
• Many to many messaging profile
Methods should scale to hundreds of nodes (at least)
Demanding local storage needs
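To make the aggregate-memory point concrete, the sketch below compares the per-node storage of one large array when every node holds a full copy (replicated) against splitting it evenly across nodes (distributed). The node count and RAM size come from the cluster slide; the array size and the function name are hypothetical, and the estimate ignores everything else a real job keeps in memory.

```python
def per_node_bytes(n_elements, n_nodes, distributed, bytes_per_element=8):
    """Bytes one node must hold for an array of n_elements values."""
    if not distributed:
        # Replicated: every node stores the whole array.
        return n_elements * bytes_per_element
    # Distributed: ceiling division gives the largest share any node holds.
    largest_share = -(-n_elements // n_nodes)
    return largest_share * bytes_per_element


GIB = 1024 ** 3
n_nodes = 22                # nodes in the cluster described earlier
node_ram = 1 * GIB          # 1 GByte of RAM per node
n_elements = 500_000_000    # hypothetical array of 8-byte values (4 GB total)

replicated = per_node_bytes(n_elements, n_nodes, distributed=False)
distributed = per_node_bytes(n_elements, n_nodes, distributed=True)

print("replicated fits on one node: ", replicated <= node_ram)
print("distributed fits on one node:", distributed <= node_ram)
```

The price of the distributed layout is that most elements now live on another node, which is where the higher communication demand comes from.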
![Page 11: The Distributed Data Interface in GAMESS](https://reader036.vdocuments.mx/reader036/viewer/2022082819/56813c0e550346895da58154/html5/thumbnails/11.jpg)
The Distributed Data Interface (DDI)
DDI provides the core functions needed to treat a portion of the memory on each node as part of a global shared array.
[Diagram labels: node 0 / process 0 and node 1 / process 1, each showing GAMESS, DDI, replicated data, and distributed data; "Interrupt"; "patch".]
Figure 1. Memory model if using a full function one-sided messaging library. DDI_GET's interrupt of process 0 results in the data transfer to the requesting node.
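The idea of a global shared array whose pieces live on different nodes can be illustrated with a toy, single-process simulation. This is not DDI's API: the class, its methods, and the column-wise layout are assumptions made for the sketch, and the "nodes" are plain Python lists rather than separate processes exchanging messages.

```python
class ToyDistributedArray:
    """Columns of a 2-D array split across simulated nodes (hypothetical layout)."""

    def __init__(self, n_rows, n_cols, n_nodes):
        self.n_rows = n_rows
        self.n_nodes = n_nodes
        base, extra = divmod(n_cols, n_nodes)
        self.first_col = []   # first global column stored on each node
        self.local = []       # per-node storage: a list of columns
        start = 0
        for node in range(n_nodes):
            width = base + (1 if node < extra else 0)
            self.first_col.append(start)
            self.local.append([[0.0] * n_rows for _ in range(width)])
            start += width

    def owner(self, col):
        """Return the node that stores global column col."""
        for node in range(self.n_nodes):
            lo = self.first_col[node]
            if lo <= col < lo + len(self.local[node]):
                return node
        raise IndexError(col)

    def put(self, col, values):
        """Store one column on whichever node owns it."""
        if len(values) != self.n_rows:
            raise ValueError("column has the wrong length")
        node = self.owner(col)
        self.local[node][col - self.first_col[node]] = [float(v) for v in values]

    def get(self, col_lo, col_hi):
        """Return the patch of columns [col_lo, col_hi), wherever they live."""
        patch = []
        for col in range(col_lo, col_hi):
            node = self.owner(col)
            patch.append(list(self.local[node][col - self.first_col[node]]))
        return patch


arr = ToyDistributedArray(n_rows=2, n_cols=5, n_nodes=2)
for c in range(5):
    arr.put(c, [c, 10 * c])

# This patch spans both simulated nodes: column 2 is on node 0, column 3 on node 1.
print(arr.owner(2), arr.owner(3), arr.get(2, 4))
```

In a real one-sided library the caller of `get` would not need the owning process to take part explicitly; here the lookup through `owner` stands in for that remote fetch.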
![Page 12: The Distributed Data Interface in GAMESS](https://reader036.vdocuments.mx/reader036/viewer/2022082819/56813c0e550346895da58154/html5/thumbnails/12.jpg)
DDI
Runs on top of:
• MPI (MPI-2 preferred)
• TCP/IP sockets
Lightweight - Provides only the functionality needed by GAMESS
Is not intended as a general purpose library.
Does optimize for mixed SMP and distributed memory systems
![Page 13: The Distributed Data Interface in GAMESS](https://reader036.vdocuments.mx/reader036/viewer/2022082819/56813c0e550346895da58154/html5/thumbnails/13.jpg)
New MP2 implementation
Uses DDI to utilize the aggregate memory of the parallel machine at the expense of communications
Trades some symmetry in the MP2 equations for better parallel scalability
• Requires more memory than the sequential version
• Is slower than the sequential version on 1 CPU
![Page 14: The Distributed Data Interface in GAMESS](https://reader036.vdocuments.mx/reader036/viewer/2022082819/56813c0e550346895da58154/html5/thumbnails/14.jpg)
MP2 Scalability
[Figure: MP2 Energy and Gradient Speedup Curves. Horizontal axis: Number of CPUs (1 to 16). Vertical axis: Speedup over a scaled 1 node timing (1 to 16). Series: Ideal; Gigabit Ethernet; Gigabit Ethernet, 1 CPU/box; Fast Ethernet; Fast Ethernet, 1 CPU/box.]
![Page 15: The Distributed Data Interface in GAMESS](https://reader036.vdocuments.mx/reader036/viewer/2022082819/56813c0e550346895da58154/html5/thumbnails/15.jpg)
Conclusions
DDI provides a scalable way of taking advantage of the global memory of a parallel system
The new MP2 code demonstrates code written specifically for parallel execution without replacing the sequential version.
![Page 16: The Distributed Data Interface in GAMESS](https://reader036.vdocuments.mx/reader036/viewer/2022082819/56813c0e550346895da58154/html5/thumbnails/16.jpg)
Future Work
DDI needs further work to enhance the features and increase robustness, or possibly needs to be replaced with a more general library such as the GA tools from PNNL.
The global shared memory approach is being applied to many other parts of GAMESS to increase scalability.
![Page 17: The Distributed Data Interface in GAMESS](https://reader036.vdocuments.mx/reader036/viewer/2022082819/56813c0e550346895da58154/html5/thumbnails/17.jpg)
Thanks!
David Halstead
Guy Helmer
For $:
• IBM Corp. for an SUR grant (of 15 workstations)
• DOE MICS program (interconnects and 7 workstations)
• Air Force OSR (long term dev. funding)
• DOD CHSSI program (improved parallelization)