June 30, 2010
Member of the Helmholtz Association
Protein Folding with Python on Supercomputers
Jan H. Meinke
June 30, 2010 Slide 2
Research Centre Jülich
Austin
Jülich
June 30, 2010 Slide 3
Jülich Supercomputing Centre
June 30, 2010 Slide 4
JUGENE
IBM BlueGene/P
72 racks
32-bit PowerPC 450 SMP processor @ 850 MHz
294,912 cores
144 TB RAM
1 Petaflop/s peak
0.826 Petaflop/s Linpack
3D Torus network
Number 5 worldwide, number 1 in Europe (Top500, June 2010)
June 30, 2010 Slide 5
JUGENE Compute Card
Source: IBM
June 30, 2010 Slide 6
Blue Gene/P Node Card
Source: IBM
June 30, 2010 Slide 7
Blue Gene/P Design
June 30, 2010 Slide 9
JuRoPA
Intel Nehalem Cluster
Dual-socket, quad-core Intel Nehalem @ 2.93 GHz
3288 nodes, 26,304 cores
79 TB RAM
308 Teraflop/s peak
275 Teraflop/s Linpack
Infiniband with a Fat Tree topology
Number 14 worldwide, number 3 in Europe (Top500, June 2010)
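The peak figures quoted for JUGENE and JuRoPA follow from core count, clock rate, and floating-point operations per cycle. The sketch below redoes that arithmetic. The value of 4 double-precision flops per cycle per core is an assumption that the slides do not state; the core counts, clock rates, and Linpack results are taken from the slides.

```python
def peak_flops(cores, clock_hz, flops_per_cycle=4):
    """Theoretical peak in flop/s. flops_per_cycle=4 is an assumption."""
    return cores * clock_hz * flops_per_cycle


machines = {
    # name: (cores, clock in Hz, Linpack result in flop/s), from the slides
    "JUGENE": (294912, 850e6, 0.826e15),
    "JuRoPA": (26304, 2.93e9, 275e12),
}

report = {}
for name, (cores, clock_hz, linpack) in machines.items():
    peak = peak_flops(cores, clock_hz)
    # Store peak and the fraction of peak reached by Linpack
    report[name] = (peak, linpack / peak)
    print("%s: peak %.1f Tflop/s, Linpack efficiency %.0f%%"
          % (name, peak / 1e12, 100 * linpack / peak))
```

Under that assumption the computed peaks agree with the quoted 1 Petaflop/s and 308 Teraflop/s.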
June 30, 2010 Slide 10
Non-blocking full “fat tree” (Infiniband)
JuRoPA Interconnect
June 30, 2010 Slide 11
Simulation Laboratory Biology
Olav Zimmermann Jan H. Meinke Sandipan Mohanty
June 30, 2010 Slide 12
Simulation Laboratory Biology
SL BIO: 3 Ph.D. scientists, 1 M.S. student
Research: structure prediction, protein folding and aggregation, parallel algorithms
Service: projects w/ SL Bio, scientific support, workshops
Community: databases, software, benchmarks
June 30, 2010 Slide 13
Python at the SimLab Biology
Workflow
Prototyping
Production
Analysis and visualization
Irbäck, A., Mitternacht, S. & Mohanty, S. PMC Biophysics 2, 2 (2009).
Zimmermann, O. & Hansmann, U.H.E. Journal of Chemical Information and Modeling 48, 1903-1908 (2008).
June 30, 2010 Slide 14
Proteins
ERVRISITARTKKEAEKFAAILIKVFAELGYNDINVTWDGDTVTEGQL
α-helix
β-sheet
June 30, 2010 Slide 15
Simple Molecular Mechanics for Proteins (SMMP)
Protein simulations with Monte Carlo
Standard geometry (bond length and angle fixed)
Dihedrals are degrees of freedom
Force field: ECEPP/3
dihedral angles
http://apple.sysbio.info/~mjhsieh/sstour/
ω
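With bond lengths and bond angles fixed, a conformation is described entirely by its dihedral angles. As a minimal sketch (not SMMP code; the function names are made up for illustration), a dihedral angle can be computed from four consecutive atom positions like this:

```python
import math


def _sub(a, b):
    return tuple(x - y for x, y in zip(a, b))


def _dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def _cross(a, b):
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])


def dihedral(p0, p1, p2, p3):
    """Signed dihedral angle in degrees for rotation about the p1-p2 bond."""
    b0 = _sub(p1, p0)
    b1 = _sub(p2, p1)
    b2 = _sub(p3, p2)
    n1 = _cross(b0, b1)   # normal of the plane through p0, p1, p2
    n2 = _cross(b1, b2)   # normal of the plane through p1, p2, p3
    x = _dot(n1, n2)
    y = math.sqrt(_dot(b1, b1)) * _dot(b0, n2)
    return math.degrees(math.atan2(y, x))
```

A cis arrangement gives 0 degrees and a trans arrangement gives 180 degrees, which is the usual value of ω for a planar peptide bond.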
June 30, 2010 Slide 16
PySMMP
Python modules: universe.py, protein.py
Wrapper around SMMP's internal data structure and property functions
Compiled Fortran code with binding: smmp.so
Built with f2py
Python modules: ParallelTempering.py, algorithms.py
Algorithms implemented on top of PySMMP
June 30, 2010 Slide 17
import universe, protein
from ParallelTempering import ParallelTempering  # assumes the class is named after its module

seq = "EXAMPLES/1LQ7.seq"; var = ' '

myUniverse = universe.Universe()
myProtein = protein.Protein(seq, var)
myUniverse.add(myProtein)

# Temperature ladder: n temperatures spaced linearly between Tmin and Tmax
Tmin = 250; Tmax = 1000; n = 32
nequi = 10; sweeps = 60; nup = 1
try:
    dT = (Tmax - Tmin) / (n - 1.0)
except ZeroDivisionError:  # n == 1: only one temperature
    dT = 0

T = [int(Tmin + i * dT) for i in range(n)]
myPT = ParallelTempering(myUniverse, nequi, sweeps, nup, T, seed=314)
myPT.run()
June 30, 2010 Slide 18
Compiling PySMMP on JUGENE
Set environment variables
    export BGPGNU=/bgsys/drivers/ppcfloor/gnu-linux
    export F90=$BGPGNU/powerpc-bgp-linux/bin/gfortran
Use correct Python binary
    export LD_LIBRARY_PATH=$LD_LIBRARY_PATH:$BGPGNU/lib
    $BGPGNU/bin/python /bgsys/local/numpy/1.2.1/bin/f2py
June 30, 2010 Slide 19
from mpi4py import MPI
import sys

size = MPI.COMM_WORLD.Get_size()
rank = MPI.COMM_WORLD.Get_rank()
name = MPI.Get_processor_name()

sys.stdout.write("Hello, World! I am process %d of %d on %s.\n" % (rank, size, name))
helloworld.py
June 30, 2010 Slide 20
Launching Python
June 30, 2010 Slide 21
Parallel Tempering Monte Carlo
$T_0 < T_1 < T_2$
$P_{\mathrm{PT}} = e^{\Delta\beta \, \Delta E}$
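In parallel tempering, replicas at neighboring temperatures periodically attempt to exchange conformations, and the exchange is accepted with the Metropolis probability min(1, exp(Δβ ΔE)), where Δβ is the difference of inverse temperatures and ΔE the difference of energies. The following is a minimal sketch of that test, not the SMMP or PySMMP implementation. The function names are hypothetical, and the Boltzmann constant in kcal/(mol K) is an assumption about the energy units.

```python
import math
import random

K_B = 1.987204e-3  # kcal/(mol K); the unit choice is an assumption


def swap_probability(T_i, E_i, T_j, E_j):
    """Probability of accepting an exchange between replicas i and j."""
    delta_beta = 1.0 / (K_B * T_i) - 1.0 / (K_B * T_j)
    delta_E = E_i - E_j
    exponent = delta_beta * delta_E
    if exponent >= 0.0:
        return 1.0  # always accept when the colder replica has the higher energy
    return math.exp(exponent)


def attempt_swap(T_i, E_i, T_j, E_j, rng=random):
    """Metropolis test; returns True if the exchange is accepted."""
    return rng.random() < swap_probability(T_i, E_i, T_j, E_j)
```

An exchange that moves the lower-energy conformation to the lower temperature is always accepted; the reverse is accepted only with exponentially small probability.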
June 30, 2010 Slide 22
Parallel Tempering with SMMP and PySMMP
Calculation of Cartesian coordinates and energy.
SMMP PySMMP
Python modules: universe.py, protein.py
algorithms.py
partem_p
common blocks
metropolis
ParallelTempering.py
June 30, 2010 Slide 23
GS-α3W (1LQ7)
Designed 3-helix bundle
67 amino acids
1110 atoms
June 30, 2010 Slide 25
Scaling of the Energy Function
[Figure: scaling of the energy function on JuRoPA and JUGENE, SMMP vs. PySMMP]
June 30, 2010 Slide 26
Weak scaling of Parallel Tempering
June 30, 2010 Slide 27
Scaling of Parallel Tempering
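For reference, the two notions of scaling can be written down as parallel efficiencies. This is a generic sketch with hypothetical function names, not code from the talk: weak scaling keeps the work per core fixed as the core count grows, strong scaling keeps the total problem size fixed.

```python
def weak_scaling_efficiency(t_ref, t_n):
    """Fixed work per core: the ideal runtime stays constant."""
    return t_ref / t_n


def strong_scaling_efficiency(t_ref, n_ref, t_n, n):
    """Fixed total problem size: the ideal runtime shrinks as n_ref / n."""
    return (t_ref * n_ref) / (t_n * n)
```

Both functions return 1.0 for ideal scaling and smaller values as overhead grows.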
June 30, 2010 Slide 28
Protein Clusters
Meinke, J.H. & Hansmann, U.H.E. J. Comp. Chem. 30, 1642-1648 (2009).
June 30, 2010 Slide 29
Clustering
Fully connected
Connected components with minimum number of links
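One possible reading of "connected components with minimum number of links" is sketched below: two conformations are linked if their distance is at most a cutoff, nodes with fewer than a given number of links are dropped, and the remaining connected components form the clusters. This interpretation, the cutoff rule, and all names are assumptions for illustration.

```python
from collections import deque


def link_graph(dist, cutoff):
    """Adjacency sets: i and j are linked if dist[i][j] <= cutoff."""
    n = len(dist)
    adj = [set() for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            if dist[i][j] <= cutoff:
                adj[i].add(j)
                adj[j].add(i)
    return adj


def connected_components(adj, min_links=1):
    """Components among nodes that have at least min_links links.

    Link counts are taken once from the full graph, not updated iteratively.
    """
    keep = {i for i, nb in enumerate(adj) if len(nb) >= min_links}
    seen, clusters = set(), []
    for start in sorted(keep):
        if start in seen:
            continue
        seen.add(start)
        queue, members = deque([start]), []
        while queue:  # breadth-first search over kept nodes
            i = queue.popleft()
            members.append(i)
            for j in adj[i]:
                if j in keep and j not in seen:
                    seen.add(j)
                    queue.append(j)
        clusters.append(sorted(members))
    return clusters
```

Raising `min_links` removes sparsely connected conformations, so loosely chained structures no longer merge separate clusters.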
June 30, 2010 Slide 30
Distance Between Two Protein Conformations
Root-mean square deviation (rmsd)
Dihedral rmsd
Overlap of contacts
Scores
→ n² operations
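As an illustration of one of these measures and of the quadratic cost, the sketch below computes a dihedral rmsd with periodic wrapping and fills the full pairwise distance matrix. It is a minimal example with hypothetical names, assuming each conformation is given as a list of dihedral angles in degrees.

```python
import math


def dihedral_rmsd(a, b):
    """RMSD between two equal-length lists of dihedral angles in degrees."""
    if len(a) != len(b):
        raise ValueError("conformations must have the same number of dihedrals")
    total = 0.0
    for x, y in zip(a, b):
        d = (x - y + 180.0) % 360.0 - 180.0  # wrap difference into [-180, 180)
        total += d * d
    return math.sqrt(total / len(a))


def pairwise_distances(conformations, metric=dihedral_rmsd):
    """Symmetric distance matrix; needs n * (n - 1) / 2 metric evaluations."""
    n = len(conformations)
    dist = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            dist[i][j] = dist[j][i] = metric(conformations[i], conformations[j])
    return dist
```

The wrapping step matters because angles such as 179 and -179 degrees differ by only 2 degrees.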
June 30, 2010 Slide 31
Density-Based Clustering
June 30, 2010 Slide 32
MAFIA
Nagesh, H., Goil, S. & Choudhary, A. Data mining for scientific and engineering applications (2001).
June 30, 2010 Slide 33
PyMAFIA
def determineClusters(self):
    self.buildAdaptiveGrid()
    # Start with all one-dimensional bins as candidate dense units (CDUs)
    self.CDU = []
    for A in range(0, self.d):
        for i in range(len(self.thresholds[A])):
            self.CDU.append(((A, i), ))
    A = 0
    while self.CDU:
        if A > 0:
            # Build higher-dimensional candidates from the current dense units
            self.findCandidateDenseUnits()
            self.eliminateDuplicateCandidates()
        self.getDensityOfCDU()
        self.identifyDenseUnits()
        A += 1
    self.buildGraphOfDenseUnits()
    self.findClustersOfDenseUnits()
def buildAdaptiveGrid(self):
    # Per-dimension extrema of the local data
    minimum = np.array([self.data[:, A].min() for A in range(self.d)])
    maximum = np.array([self.data[:, A].max() for A in range(self.d)])
    # Get collective extrema
    self.globalMinimum = np.zeros(self.d)
    self.globalMaximum = np.zeros(self.d)
    self.comm.Allreduce([minimum, self.dataType], [self.globalMinimum, self.dataType], op = MPI.MIN)
    self.comm.Allreduce(maximum, self.globalMaximum, MPI.MAX)
    …
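The loop in determineClusters grows dense units one dimension at a time. The sketch below shows how the candidate step might look; it is not PyMAFIA's code. It assumes, following the `((A, i), )` tuples above, that a unit is a tuple of (dimension, bin) pairs, and that two k-dimensional dense units are joined when they share k-1 pairs. The function names are hypothetical.

```python
from itertools import combinations


def join_dense_units(dense_units):
    """Build (k+1)-dimensional candidates from k-dimensional dense units.

    All units are assumed to have the same number k of (dimension, bin) pairs.
    """
    candidates = []
    for u, v in combinations(dense_units, 2):
        merged = sorted(set(u) | set(v))
        dims = [dim for dim, _ in merged]
        # Valid join: exactly one new pair and no dimension used twice
        if len(merged) == len(u) + 1 and len(set(dims)) == len(dims):
            candidates.append(tuple(merged))
    return candidates


def eliminate_duplicates(candidates):
    """The same candidate can arise from several pairs; keep one copy."""
    return sorted(set(candidates))
```

Because any two units sharing k-1 pairs may be joined, the same candidate is typically generated more than once, which is why a separate duplicate-elimination step is needed.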
June 30, 2010 Slide 34
PyMAFIA in Action
June 30, 2010 Slide 35
Conclusion
Python is ready for developing HPC algorithms.
Python is ready for production runs in HPC.
Python scales to 100k cores on BG/P.