Protein Folding with Python on Supercomputers · Jan H. Meinke · June 30, 2010 · Member of the Helmholtz Association

Page 1: Protein Folding with Python on Supercomputers · conference.scipy.org/scipy2010/slides/jan_meinke_protein... · 2014-02-02 · Python modules: universe.py protein.py Compiled Fortran

June 30, 2010

Member of the Helmholtz Association

Protein Folding with Python on Supercomputers

Jan H. Meinke

June 30, 2010 Slide 2

Research Centre Jülich

(Map showing Austin and Jülich)

June 30, 2010 Slide 3

Jülich Supercomputing Centre

June 30, 2010 Slide 4

JUGENE

IBM Blue Gene/P
- 72 racks
- 32-bit PowerPC 450 SMP processor @ 850 MHz
- 294,912 cores
- 144 TB RAM
- 1 Petaflop/s peak, 0.826 Petaflop/s Linpack
- 3D torus network

Number 5 worldwide, number 1 in Europe (Top500, June 2010)
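The peak figure follows from cores × clock rate × floating-point operations per cycle. A minimal Python check, assuming 4 double-precision flops per cycle per PowerPC 450 core (its dual FPU can issue a fused multiply-add on each of two pipes); `peak_flops` is a hypothetical helper, not part of any library:

```python
def peak_flops(cores, clock_hz, flops_per_cycle):
    """Theoretical peak: cores x clock rate x floating-point ops per cycle."""
    return cores * clock_hz * flops_per_cycle


# Core count, clock rate and Linpack result are taken from the JUGENE slide;
# 4 flops/cycle per core is an assumption (double FPU with fused multiply-add).
jugene_peak = peak_flops(294_912, 850e6, 4)
jugene_linpack = 0.826e15

print(f"JUGENE peak: {jugene_peak / 1e15:.3f} Pflop/s")
print(f"Linpack efficiency: {jugene_linpack / jugene_peak:.1%}")
```

Under that assumption the product is about 1.003 Pflop/s, matching the quoted 1 Petaflop/s peak, and Linpack reaches roughly 82% of it.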

June 30, 2010 Slide 5

JUGENE Compute Card

Source: IBM

June 30, 2010 Slide 6

Blue Gene/P Node Card

Source: IBM

June 30, 2010 Slide 7

Blue Gene/P Design

June 30, 2010 Slide 8

JUGENE

IBM Blue Gene/P
- 72 racks
- 32-bit PowerPC 450 SMP processor @ 850 MHz
- 294,912 cores
- 144 TB RAM
- 1 Petaflop/s peak, 0.826 Petaflop/s Linpack
- 3D torus network

Number 5 worldwide, number 1 in Europe (Top500, June 2010)

June 30, 2010 Slide 9

JuRoPA

Intel Nehalem cluster
- Dual-socket, quad-core Intel Nehalem @ 2.93 GHz
- 3288 nodes, 26,304 cores
- 79 TB RAM
- 308 Teraflop/s peak, 275 Teraflop/s Linpack
- InfiniBand with a fat-tree topology

Number 14 worldwide, number 3 in Europe (Top500, June 2010)
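The two machines differ sharply in resources per core. A quick comparison from the slide figures, as a sketch; it treats 1 TB as 1024 GiB, which is an assumption about the units on the slides, and the helper name is hypothetical:

```python
def memory_per_core_gib(total_tb, cores):
    """Average memory per core, assuming 1 TB on the slide means 1024 GiB."""
    return total_tb * 1024 / cores


# Node and core counts are from the JuRoPA slide (dual-socket, quad-core nodes).
cores_per_node = 26_304 // 3288

print(f"JuRoPA cores per node: {cores_per_node}")
print(f"JuRoPA memory per core: {memory_per_core_gib(79, 26_304):.2f} GiB")
print(f"JUGENE memory per core: {memory_per_core_gib(144, 294_912):.2f} GiB")
```

JuRoPA offers about 3 GiB per core and JUGENE about 0.5 GiB, so a code that fits comfortably on the cluster may need a leaner memory footprint on Blue Gene/P.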

June 30, 2010 Slide 10

JuRoPA Interconnect

Non-blocking full “fat tree” (InfiniBand)

June 30, 2010 Slide 11

Simulation Laboratory Biology

Olav Zimmermann, Jan H. Meinke, Sandipan Mohanty

June 30, 2010 Slide 12

Simulation Laboratory Biology: Service, Research, Community

SL Bio: 3 Ph.D. scientists, 1 M.S. student

Structure prediction; protein folding and aggregation; parallel algorithms

Projects with SL Bio; scientific support; workshops

Databases; software; benchmarks

June 30, 2010 Slide 13

Python at the SimLab Biology

Workflow: prototyping, production, analysis and visualization

Irbäck, A., Mitternacht, S. & Mohanty, S. PMC Biophysics 2, 2 (2009).

Zimmermann, O. & Hansmann, U.H.E. Journal of Chemical Information and Modeling 48, 1903-1908 (2008).

June 30, 2010 Slide 14

Proteins

ERVRISITARTKKEAEKFAAILIKVFAELGYNDINVTWDGDTVTEGQL

(Figure: the folded structure, with an α-helix and a β-sheet labeled)

June 30, 2010 Slide 15

Simple Molecular Mechanics for Proteins (SMMP)

- Protein simulations with Monte Carlo
- Standard geometry (bond lengths and bond angles fixed)
- Dihedral angles are the degrees of freedom
- Force field: ECEPP/3

(Figure: backbone dihedral angles, including ω; source: http://apple.sysbio.info/~mjhsieh/sstour/)
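Because bond lengths and angles are fixed, a conformation is fully described by its dihedral angles. As an illustration (standard vector geometry, not SMMP's own routine), the torsion angle defined by four consecutive atom positions can be computed with NumPy; the function name is hypothetical:

```python
import numpy as np


def dihedral(p0, p1, p2, p3):
    """Torsion angle in degrees for atoms p0-p1-p2-p3, in [-180, 180].

    Positive values follow the IUPAC convention: looking along p1 -> p2,
    the p0 bond turns clockwise to eclipse the p3 bond.
    """
    p0, p1, p2, p3 = (np.asarray(p, dtype=float) for p in (p0, p1, p2, p3))
    b0 = p0 - p1
    b1 = p2 - p1
    b2 = p3 - p2
    b1 /= np.linalg.norm(b1)
    # Project the outer bonds onto the plane perpendicular to the central bond.
    v = b0 - np.dot(b0, b1) * b1
    w = b2 - np.dot(b2, b1) * b1
    x = np.dot(v, w)
    y = np.dot(np.cross(b1, v), w)
    return float(np.degrees(np.arctan2(y, x)))


# Central bond along z; only the position of the fourth atom changes.
print(dihedral([1, 0, 0], [0, 0, 0], [0, 0, 1], [1, 0, 1]))   # cis
print(dihedral([1, 0, 0], [0, 0, 0], [0, 0, 1], [-1, 0, 1]))  # trans
```

SMMP works in the opposite direction, building Cartesian coordinates from the dihedrals, but the geometry is the same.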

June 30, 2010 Slide 16

PySMMP

- smmp.so: compiled Fortran code with bindings, built with f2py
- Python modules universe.py and protein.py: wrapper around SMMP's internal data structure and property functions
- Python modules ParallelTempering.py and algorithms.py: algorithms implemented on top of PySMMP

June 30, 2010 Slide 17

import universe, protein
# Assumes ParallelTempering.py defines a class named ParallelTempering
from ParallelTempering import ParallelTempering

seq = "EXAMPLES/1LQ7.seq"; var = ' '

myUniverse = universe.Universe()
myProtein = protein.Protein(seq, var)
myUniverse.add(myProtein)

Tmin = 250; Tmax = 1000; n = 32
nequi = 10; sweeps = 60; nup = 1
try:
    dT = (Tmax - Tmin) / (n - 1.0)
except ZeroDivisionError:  # n == 1: a single temperature
    dT = 0

T = [int(Tmin + i * dT) for i in range(0, n)]
myPT = ParallelTempering(myUniverse, nequi, sweeps, nup, T, seed=314)
myPT.run()
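The script spaces the temperatures linearly between Tmin and Tmax and truncates them to whole kelvin with int(). The same ladder can be reproduced without PySMMP. The geometric ladder is shown only as a commonly used alternative, not what the script uses; both helper names are hypothetical:

```python
def linear_ladder(t_min, t_max, n):
    """Evenly spaced temperatures, truncated to int as in the script above."""
    if n == 1:
        return [int(t_min)]
    dT = (t_max - t_min) / (n - 1.0)
    return [int(t_min + i * dT) for i in range(n)]


def geometric_ladder(t_min, t_max, n):
    """Alternative: constant ratio between neighbouring temperatures."""
    if n == 1:
        return [float(t_min)]
    ratio = (t_max / t_min) ** (1.0 / (n - 1))
    return [t_min * ratio ** i for i in range(n)]


lin = linear_ladder(250, 1000, 32)
geo = geometric_ladder(250, 1000, 32)
print("linear:   ", lin[:4])
print("geometric:", [round(t) for t in geo[:4]])
```

A geometric ladder places more replicas at low temperature; when the heat capacity is roughly constant it tends to give more uniform exchange rates between neighbours.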

June 30, 2010 Slide 18

Compiling PySMMP on JUGENE

Set environment variables:
export BGPGNU=/bgsys/drivers/ppcfloor/gnu-linux
export F90=$BGPGNU/powerpc-bgp-linux/bin/gfortran

Use the correct Python binary:
export LD_LIBRARY_PATH=$LD_LIBRARY_PATH:$BGPGNU/lib
$BGPGNU/bin/python /bgsys/local/numpy/1.2.1/bin/f2py

June 30, 2010 Slide 19

helloworld.py:

from mpi4py import MPI
import sys

size = MPI.COMM_WORLD.Get_size()
rank = MPI.COMM_WORLD.Get_rank()
name = MPI.Get_processor_name()

sys.stdout.write(
    "Hello, World! I am process %d of %d on %s.\n" % (rank, size, name))

June 30, 2010 Slide 20

Launching Python

June 30, 2010 Slide 21

Parallel Tempering Monte Carlo

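Replicas run at temperatures $T_0 < T_1 < T_2 < \dots$; conformations at neighboring temperatures $T_i$ and $T_j$ are exchanged with probability

$P_{PT} = \min\left(1, e^{\Delta\beta\,\Delta E}\right)$, where $\Delta\beta = \beta_i - \beta_j$, $\beta = 1/(k_B T)$, and $\Delta E = E_i - E_j$.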
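A minimal sketch of one exchange step between neighbouring temperatures, accepting a swap with probability min(1, exp[(β_i − β_j)(E_i − E_j)]). It assumes energies in kcal/mol (the unit ECEPP/3 uses) and hence k_B = 0.0019872 kcal/(mol K); the function names are hypothetical and not PySMMP's API:

```python
import math
import random

K_B = 0.0019872  # Boltzmann constant in kcal/(mol K); assumes energies in kcal/mol


def swap_probability(E_i, E_j, T_i, T_j):
    """Acceptance probability min(1, exp[(beta_i - beta_j)(E_i - E_j)])."""
    delta = (1.0 / (K_B * T_i) - 1.0 / (K_B * T_j)) * (E_i - E_j)
    return 1.0 if delta >= 0 else math.exp(delta)


def exchange_step(energies, temps, rng, start=0):
    """Attempt swaps between slots (start, start+1), (start+2, start+3), ...

    energies[k] is the energy of the conformation currently at temps[k].
    Returns perm, where perm[k] is the old slot whose conformation moves to k.
    Alternating start between 0 and 1 lets every neighbouring pair be tried.
    """
    perm = list(range(len(temps)))
    for k in range(start, len(temps) - 1, 2):
        p = swap_probability(energies[k], energies[k + 1], temps[k], temps[k + 1])
        if rng.random() < p:
            perm[k], perm[k + 1] = perm[k + 1], perm[k]
    return perm


rng = random.Random(314)
print(exchange_step([-120.0, -95.0, -80.0, -60.0], [250, 274, 298, 322], rng))
```

Because the attempted pairs are disjoint, each pair can be evaluated independently, which is what makes the exchange step cheap to parallelize.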

June 30, 2010 Slide 22

Parallel Tempering with SMMP and PySMMP

(Figure: SMMP and PySMMP side by side. SMMP components: partem_p, metropolis, common blocks. PySMMP components: ParallelTempering.py, algorithms.py, and the Python modules universe.py and protein.py. Label: calculation of Cartesian coordinates and energy.)
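The diagram lists a metropolis routine next to partem_p. For orientation, a minimal Metropolis sweep in dihedral space: each angle in turn receives a random change, accepted with probability min(1, exp(−ΔE/(k_B T))). The torsional `toy_energy` is a hypothetical stand-in for ECEPP/3, energies are assumed to be in kcal/mol, and none of these names belong to SMMP or PySMMP:

```python
import math
import random

K_B = 0.0019872  # kcal/(mol K); assumes energies in kcal/mol


def toy_energy(angles):
    """Hypothetical stand-in for ECEPP/3: a simple threefold torsional term."""
    return sum(1.0 + math.cos(math.radians(3 * a)) for a in angles)


def metropolis_sweep(angles, energy, T, rng, step=30.0):
    """Propose a random change to each dihedral in turn (Metropolis criterion)."""
    angles = list(angles)
    E = energy(angles)
    accepted = 0
    for k in range(len(angles)):
        old = angles[k]
        # New angle, wrapped back into [-180, 180)
        angles[k] = (old + rng.uniform(-step, step) + 180.0) % 360.0 - 180.0
        E_new = energy(angles)
        dE = E_new - E
        if dE <= 0 or rng.random() < math.exp(-dE / (K_B * T)):
            E = E_new
            accepted += 1
        else:
            angles[k] = old  # reject: restore the previous angle
    return angles, E, accepted


rng = random.Random(1)
angles, E, accepted = metropolis_sweep([0.0] * 5, toy_energy, 300.0, rng)
print(accepted, round(E, 3))
```

The full energy is recomputed after every move here for clarity; a production code would typically update only the affected terms.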

June 30, 2010 Slide 23

GS-α3W (1LQ7)

- Designed 3-helix bundle
- 67 amino acids
- 1110 atoms

June 30, 2010 Slide 24

Parallel Tempering with SMMP and PySMMP

(Figure: SMMP and PySMMP side by side. SMMP components: partem_p, metropolis, common blocks. PySMMP components: ParallelTempering.py, algorithms.py, and the Python modules universe.py and protein.py. Label: calculation of Cartesian coordinates and energy.)

June 30, 2010 Slide 25

Scaling of the Energy Function

(Figure: scaling of the energy function for SMMP and PySMMP on JuRoPA and JUGENE)

June 30, 2010 Slide 26

Weak scaling of Parallel Tempering

June 30, 2010 Slide 27

Scaling of Parallel Tempering
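Scaling plots like these are usually read through two standard metrics. For strong scaling (fixed total problem on p processes) the speedup is S(p) = t_1/t_p and the efficiency S(p)/p. For weak scaling the work per process is held fixed (for parallel tempering, for example, by adding replicas as processes are added, which is an assumption about the setup here), so the ideal runtime stays constant and the efficiency is t_1/t_p. As hypothetical Python helpers:

```python
def strong_scaling(t1, tp, p):
    """Fixed total work: return (speedup, parallel efficiency) on p processes."""
    speedup = t1 / tp
    return speedup, speedup / p


def weak_scaling_efficiency(t1, tp):
    """Fixed work per process: ideal tp equals t1, so efficiency is t1 / tp."""
    return t1 / tp


# Illustrative timings only, not measurements from the talk.
print(strong_scaling(100.0, 25.0, 4))
print(weak_scaling_efficiency(10.0, 12.5))
```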

June 30, 2010 Slide 28

Protein Clusters

Meinke, J.H. & Hansmann, U.H.E. J. Comp. Chem. 30, 1642-1648 (2009).

June 30, 2010 Slide 29

Clustering

- Fully connected
- Connected components with minimum number of links
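A sketch of the second criterion: link two conformations when their distance is below a cutoff and take connected components of the resulting graph. The min_links rule below (conformations with fewer links are left unassigned) is our reading of "minimum number of links", not necessarily the talk's definition; a fully connected criterion would instead require every pair within a cluster to be linked. The function name is hypothetical:

```python
import numpy as np


def threshold_clusters(dist, cutoff, min_links=1):
    """Connected components of the graph linking conformations closer than cutoff.

    dist is a symmetric (n, n) distance matrix. Conformations with fewer than
    min_links links are labelled -1 (an assumed reading of the slide).
    """
    n = dist.shape[0]
    adj = (dist < cutoff) & ~np.eye(n, dtype=bool)
    keep = adj.sum(axis=1) >= min_links
    labels = np.full(n, -1)
    current = 0
    for start in range(n):
        if not keep[start] or labels[start] != -1:
            continue
        labels[start] = current
        stack = [start]
        while stack:  # depth-first search over linked, kept conformations
            i = stack.pop()
            for j in np.flatnonzero(adj[i] & keep):
                if labels[j] == -1:
                    labels[j] = current
                    stack.append(j)
        current += 1
    return labels


x = np.array([0.0, 0.5, 1.0, 5.0, 5.4, 20.0])
dist = np.abs(x[:, None] - x[None, :])
print(threshold_clusters(dist, cutoff=0.6))
```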

June 30, 2010 Slide 30

Distance Between Two Protein Conformations

- Root-mean-square deviation (rmsd)
- Dihedral rmsd
- Overlap of contacts
- Scores

→ n² operations for all pairs of n conformations
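Of the listed measures, dihedral rmsd is the simplest to sketch, provided angle differences are wrapped around the circle. Filling the full distance matrix for n conformations takes n(n−1)/2 evaluations, which is where the n² cost comes from. A NumPy sketch with hypothetical helper names and illustrative angles:

```python
import numpy as np


def dihedral_rmsd(phi_a, phi_b):
    """RMSD between two conformations given as dihedral angles in degrees.

    Differences are wrapped into [-180, 180), so 179 and -179 are 2 degrees apart.
    """
    d = (np.asarray(phi_a, dtype=float) - np.asarray(phi_b, dtype=float)
         + 180.0) % 360.0 - 180.0
    return float(np.sqrt(np.mean(d ** 2)))


def distance_matrix(conformations, metric):
    """Symmetric matrix of pairwise distances: n(n-1)/2 calls to metric."""
    n = len(conformations)
    D = np.zeros((n, n))
    for i in range(n):
        for j in range(i + 1, n):
            D[i, j] = D[j, i] = metric(conformations[i], conformations[j])
    return D


confs = [[-60.0, -45.0], [-57.0, -47.0], [-120.0, 130.0]]
print(distance_matrix(confs, dihedral_rmsd).round(1))
```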

June 30, 2010 Slide 31

Density-Based Clustering

June 30, 2010 Slide 32

MAFIA

Nagesh, H., Goil, S. & Choudhary, A. Data Mining for Scientific and Engineering Applications (2001).

June 30, 2010 Slide 33

PyMAFIA

from mpi4py import MPI
import numpy as np

def determineClusters(self):
    self.buildAdaptiveGrid()
    # One-dimensional candidate dense units: (dimension, bin) pairs
    self.CDU = []
    for A in range(0, self.d):
        for i in range(len(self.thresholds[A])):
            self.CDU.append(((A, i), ))
    # Bottom-up: build and test higher-dimensional candidates until none remain
    A = 0
    while self.CDU:
        if A > 0:
            self.findCandidateDenseUnits()
            self.eliminateDuplicateCandidates()
        self.getDensityOfCDU()
        self.identifyDenseUnits()
        A += 1
    self.buildGraphOfDenseUnits()
    self.findClustersOfDenseUnits()

def buildAdaptiveGrid(self):
    # Local per-dimension extrema of this rank's data
    minimum = np.array([self.data[:, A].min() for A in range(self.d)])
    maximum = np.array([self.data[:, A].max() for A in range(self.d)])

    # Get collective extrema
    self.globalMinimum = np.zeros(self.d)
    self.globalMaximum = np.zeros(self.d)
    self.comm.Allreduce([minimum, self.dataType],
                        [self.globalMinimum, self.dataType], op=MPI.MIN)
    self.comm.Allreduce(maximum, self.globalMaximum, MPI.MAX)
    …
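buildAdaptiveGrid first finds the global per-dimension minimum and maximum, so that every rank can bin its local data on the same range. The reduction pattern can be emulated serially with NumPy to see what Allreduce with MPI.MIN and MPI.MAX computes. Summing per-rank histograms (the MPI.SUM analogue) is an assumption about the elided next step, and all names here are hypothetical:

```python
import numpy as np


def emulate_allreduce_extrema(shards):
    """Serial stand-in for Allreduce(op=MIN) and Allreduce(op=MAX):
    each emulated rank reduces its own shard, then results combine elementwise."""
    local_min = [s.min(axis=0) for s in shards]
    local_max = [s.max(axis=0) for s in shards]
    return np.minimum.reduce(local_min), np.maximum.reduce(local_max)


def emulate_global_histogram(shards, lo, hi, bins):
    """Per-dimension fine histograms on the common range, summed over shards."""
    d = shards[0].shape[1]
    counts = np.zeros((d, bins), dtype=np.int64)
    for s in shards:
        for A in range(d):
            h, _ = np.histogram(s[:, A], bins=bins, range=(lo[A], hi[A]))
            counts[A] += h
    return counts


rng = np.random.default_rng(0)
data = rng.normal(size=(1000, 3))
shards = np.array_split(data, 4)  # one shard per emulated rank
lo, hi = emulate_allreduce_extrema(shards)
counts = emulate_global_histogram(shards, lo, hi, bins=20)
print(counts.sum(axis=1))
```

With real MPI each rank would hold only its own shard, and the elementwise combination would be done by the Allreduce calls shown in buildAdaptiveGrid.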

June 30, 2010 Slide 34

PyMAFIA in Action

June 30, 2010 Slide 35

Conclusion

Python
- is ready for developing HPC algorithms
- is ready for production runs in HPC
- scales to 100k cores on Blue Gene/P