TRANSCRIPT
1
Integrative Biology exploiting e-Science to combat fatal diseases
Damian Mac Randal, CCLRC
2
Overview of Talk
• Project background
• The scientific challenge
• The e-Scientific challenge
• Proposed system
3
Scientific background
• Breakthroughs in biotechnology and IT have provided a wealth (mountain) of biological data
• Key post-genomic challenge is to transform this data into information that can be used to determine biological function
• Biological function arises from complex non-linear interactions between biological processes occurring over multiple spatial and temporal scales
• Gaining an understanding of these processes is only possible via an iterative interplay between experimental data (in vivo and in vitro), mathematical modelling, and HPC-enabled simulation
4
e-Scientific background
• The majority of the first round of UK e-Science projects focused primarily on data-intensive applications (data storage, aggregation, and synthesis)
• Life Sciences projects focused on supporting the data generation work of laboratory-based scientists
• In other scientific domains, projects such as RealityGrid, GEODISE, and gViz began to consider compute-intensive applications.
5
The Science and e-Science Challenge
• To build an Integrative Biology Grid to support applications scientists addressing the key post-genomic aim of determining biological function
• To use this Grid to begin to tackle the two chosen Grand Challenge problems: the in-silico modelling of heart failure and of cancer
– Why these two? Together they cause 61% of all deaths in the UK
6
Courtesy of Peter Kohl (Physiology, Oxford)
Normal beating Fibrillation
7
Multiscale modelling of the heart
Current flow through ion channels
Fibre orientation ensures correct spread of
excitation
Contraction of individual cells
MRI image of a beating heart
8
Heart modelling
• Typically solving coupled systems of PDEs (tissue level) and non-linear ODEs (cellular level) for the electrical potential
• Complex three-dimensional geometries
• Anisotropic
• Up to 60 variables
• FEM and FD approaches
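The coupled PDE/ODE structure described above is typified by the monodomain formulation, a standard reduction of the bidomain equations; this is an illustrative sketch of the general form, not necessarily the exact formulation the project uses:

```latex
\chi \left( C_m \frac{\partial V}{\partial t} + I_{\mathrm{ion}}(V,\mathbf{u}) \right)
  = \nabla \cdot \left( \boldsymbol{\sigma} \, \nabla V \right),
\qquad
\frac{d\mathbf{u}}{dt} = \mathbf{f}(V,\mathbf{u})
```

Here $V$ is the transmembrane potential (the tissue-level PDE), $\mathbf{u}$ is the vector of cell-state variables (up to ~60 in Noble-type models, solved as ODEs at each point), $\boldsymbol{\sigma}$ is the anisotropic conductivity tensor that encodes fibre orientation, and $\chi$, $C_m$ are surface-to-volume ratio and membrane capacitance.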
9
Details of test-run of Auckland heart simulation code on HPCx
• Modelled 2 ms of electrophysiological excitation of a 5700 mm³ volume of tissue from the left ventricular free wall
• Noble 98 cell model used
• Mesh contained 20,886 bilinear elements (spatial resolution 0.6 mm)
• 0.05 ms timestep (40 timesteps in total)
• Required 978 s of CPU time on 8 processors and 2.5 GB of memory
• A complete simulation of the ventricular myocardium would require up to 30 times the volume and at least 100 times the duration
• Estimated maximum compute time to investigate arrhythmia ~10⁷ s (~100 days), requiring ~100 GB of memory (compute time scales with problem size to the power ~5/3)
• At high parallel efficiency this scales down to approximately 1 day on HPCx
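The extrapolation above can be reproduced as back-of-the-envelope arithmetic. This sketch assumes compute time grows linearly with simulated duration and as volume^(5/3), which is our reading of the quoted scaling law; it lands in the same order of magnitude as the slide's ~10⁷ s estimate:

```python
# Back-of-the-envelope extrapolation of the HPCx test run.
# Assumptions (ours, not the slide's exact arithmetic): compute time
# grows linearly with simulated duration and as volume**(5/3).

base_cpu_s = 978.0       # CPU time for the 2 ms test run (8 processors)
volume_factor = 30.0     # full ventricular myocardium: up to 30x the volume
duration_factor = 100.0  # at least 100x the simulated duration

estimate_s = base_cpu_s * volume_factor ** (5.0 / 3.0) * duration_factor
print(f"estimated compute time: {estimate_s:.2e} s "
      f"(at test-run efficiency; order of magnitude ~10^7 s)")
```

Spread across many more processors at high efficiency, a total of this order is what compresses to roughly a day of wall-clock time on HPCx.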
10
Multiscale modelling of cancer
11
Cancer modelling
• Focusing on avascular tumours
• Current models range from discrete population-based models and cellular automata, to non-linear ODE systems and complex systems of non-linear PDEs
• Key goal is the coupling (where necessary) of these models into an integrated system which can be used to gain insight into experimental findings, to help design new experiments, and ultimately to test novel approaches to cancer detection, and new drugs and treatment regimes
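As a minimal illustration of the ODE end of that modelling spectrum, here is a toy logistic model of avascular tumour radius, integrated with forward Euler using only the standard library. It is a hypothetical stand-in, not one of the project's actual cancer models:

```python
# Toy avascular tumour growth: logistic ODE dR/dt = k*R*(1 - R/R_max),
# capturing growth that saturates once diffusion-limited nutrient supply
# stalls expansion. Parameters are illustrative, not fitted to data.

def simulate(r0=0.1, r_max=2.0, k=0.5, dt=0.01, t_end=40.0):
    """Forward-Euler integration; returns a list of (t, radius) samples."""
    r, t, samples = r0, 0.0, []
    while t <= t_end:
        samples.append((t, r))
        r += dt * k * r * (1.0 - r / r_max)
        t += dt
    return samples

samples = simulate()
print(f"final radius: {samples[-1][1]:.3f} (carrying capacity 2.0)")
```

The project's real models couple equations like this to spatial PDEs for nutrient diffusion, which is where the integration challenge arises.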
12
Summary of the scientific challenge
Modelling and coupling phenomena which occur on many different length and time scales
• 1 m: person
• 1 mm: tissue morphology
• 1 µm: cell function
• 1 nm: pore diameter of a membrane protein
Spatial range = 10⁹
• 10⁹ s (years): human lifetime
• 10⁷ s (months): cancer development
• 10⁶ s (days): protein turnover
• 10³ s (hours): digesting food
• 1 s: heart beat
• 1 ms: ion channel gating
• 1 µs: Brownian motion
Temporal range = 10¹⁵
13
The e-Science Challenge
• To leverage the first round of e-Science projects and the global Grid infrastructure to build an international “collaboratory” which places the applications scientist “within” the Grid allowing fully integrated and collaborative use of:
– HPC resources (capacity and capability)
– Computational steering, performance control and visualisation
– Storage and data-mining of very large data sets
– Easy incorporation of experimental data
– User- and science-friendly access
=> Predictive in-silico models to guide experiment and, ultimately, design of novel drugs and treatment regimes
14
Key e-Science Deliverables
• A robust and fault-tolerant infrastructure, user- and application-driven, to support post-genomic research in integrative biology
• 2nd Generation Grid bringing together components across range of current EPSRC pilot projects
15
e-Science/Grid Research Issues
– Ability to carry out large-scale distributed coupled HPC simulations reliably and resiliently
– Ability to co-schedule Grid resources based on a GGF-agreed standard
– Secure data management and access control in a Grid environment
– Grid services for computational steering conforming to an agreed GGF standard
– Development of powerful visualisation and computational steering capabilities for complex models
• Contributing projects:
– RealityGrid, gViz, Geodise, myGrid, BioSimGrid, eDiaMoND, GOLD, various CCLRC projects, ….
16
Service oriented Architecture
• The user-accessible services will initially be grouped into four main categories:
– Job management
• including deployment, co-scheduling and workflow management across heterogeneous resources
– Computational steering
• both interactive, for simulation monitoring/control, and pre-defined, for parameter-space searching
– Data management
• from straightforward data handling and storage of results to location and assimilation of experimental data for model development and validation
– Analysis and visualization
• final results, interim state, parameter spaces, etc., for steering purposes
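A minimal sketch of how those four categories might be expressed as service interfaces. All class and method names here are hypothetical illustrations, not the project's actual API (which is exposed as web services):

```python
# Illustrative service-oriented grouping of the four categories above.
# Names and signatures are invented for illustration only.
from abc import ABC, abstractmethod

class JobManagement(ABC):
    @abstractmethod
    def submit(self, code: str, resources: dict) -> str:
        """Deploy and co-schedule a job across resources; return a job id."""

class ComputationalSteering(ABC):
    @abstractmethod
    def set_parameter(self, job_id: str, name: str, value: float) -> None:
        """Adjust a steering parameter of a running simulation."""

class DataManagement(ABC):
    @abstractmethod
    def store(self, job_id: str, dataset: bytes) -> str:
        """Persist results; return a dataset identifier for later lookup."""

class AnalysisAndVisualization(ABC):
    @abstractmethod
    def render(self, dataset_id: str) -> bytes:
        """Produce a visualization of final or interim results."""
```

Grouping the operations this way keeps each category independently replaceable, which matters when the underlying components come from different contributing projects.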
17
“Strawman Architecture”
[Architecture diagram: an IB Server connects four groups – a Simulation Engine (solver, coupled solver, simulation control, computational steering), a Visualization pipeline (filter, visualization control, local and remote displays), Data Management (data storage, databases, SRB?, data “mining”), and External Resources – through a browser/portlet server portal backed by job submission, job composition, resource/job directories and security services. Model (CellML?) and simulation libraries are looked up and deployed; steering parameters, metadata, interim results and commands/feedback flow between the components.]
18
Software architecture
• Underpinning development of the architecture are three fundamental considerations: standardization, scalability and security
• Initially, Web service technology is being used for interactions between the system components
• Many of the underlying components are being adopted from previous projects, and adapted if necessary, in collaboration with their original developers
• Portal/portlet technology, integrated with the user's desktop environment, will provide users with a lightweight interface to the operational services
• The data management facilities are being built using Storage Resource Broker technology to provide a robust and scalable data infrastructure
• Security is being organized around “Virtual Organizations” to mirror existing collaborations
• A “rapid prototyping” development methodology is being adopted
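As an illustration of the kind of message-based interaction that using web services between components implies, here is a hypothetical steering-command payload built and parsed with the Python standard library. The element names and schema are invented for illustration; the project's actual message formats are not shown in this talk:

```python
# Hypothetical component-to-component steering message. The real system
# uses Web service technology; this XML schema is invented for illustration.
import xml.etree.ElementTree as ET

def make_steering_command(job_id: str, name: str, value: float) -> str:
    """Serialize a single parameter change as a small XML document."""
    cmd = ET.Element("steeringCommand", attrib={"jobId": job_id})
    param = ET.SubElement(cmd, "parameter", attrib={"name": name})
    param.text = repr(value)
    return ET.tostring(cmd, encoding="unicode")

def parse_steering_command(xml_text: str) -> tuple:
    """Recover (job_id, parameter_name, value) from the message."""
    root = ET.fromstring(xml_text)
    param = root.find("parameter")
    return root.get("jobId"), param.get("name"), float(param.text)

msg = make_steering_command("hpcx-42", "timestep_ms", 0.05)
print(parse_steering_command(msg))
```

Self-describing messages like this are what let heterogeneous components, adopted from different projects, interoperate without sharing code.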
19
Demonstrators
• Objectives:
– Immediate boost in the size/complexity of problems scientists can tackle
– Validation of the architecture
– Learning exercise, exploring new technology
– Introduce scientists to the potential of advanced IT, so they can better specify requirements
• 4 Demonstrators, chosen for diversity
20
Demonstrators
• Implementation of GEODISE job submission middleware (via MATLAB) using the Oxford JISC cluster on the NGS. (A simple cellular model of nerve excitation)
• MPI implementation of Jon Whiteley and Prasanna Pathmanathan’s soft tissue deformation code (for use in image analysis for breast disease). (FEM code, non-linear elasticity)
• MPI implementation of Alan Garny’s 3D model of the SAN incorporating the ReG Steering Library (FD code for non-linear reaction-diffusion (anisotropic) plus an XML-based parser for cellular model definition)
• CMISS modelling environment for complex bioengineering problems – Peter Hunter, Auckland, NZ (production-quality FE/BE library plus front/back ends)
21
Resources
• Project manager, project architect, 7.5 post-docs and 6 PhD students broken down into three main teams
• Heart modelling and HPC: 1.5 post-docs, 2 PhD students in Oxford, 0.5 post-doc at UCL. Led by Denis Noble and myself.
• Cancer Modelling: 1 senior post-doc and 2 PhD students in Nottingham, 1 post-doc and 1 PhD student in Oxford, 1 PhD student in Birmingham. Led by Helen Byrne in Nottingham. (Several further PhD students have also been funded from other sources)
• Interactive services and Grid team: Project architect plus 2 post-docs at CCLRC, 1 post-doc in Leeds, 0.5 post-doc at UCL
• Note: well over half of the effort is dedicated to the science
22
Current Status
• Official project start date 1/2/04, recruitment of staff now complete
• Initial project structure defined and agreed, initial requirements gathering and security policy exercises completed, initial architecture agreed
• Heart-modelling and cancer modelling workshops held in Oxford in June, with talks by user communities
• Cancer modelling meeting with all users in Oxford in July
• Full IB workshop with all stakeholders in Oxford, 29th September
• Survey of capabilities of existing middleware under way (thanks to everyone who has given us lots of their time)
• Four demonstrators identified and development commenced
23
Summary
• Science-driven project that aims to build on existing middleware to (begin to) prove the benefits of Grid computing for complex systems biology – i.e. to do some novel science
• Huge and increasing (initial) buy-in from the user community
• Challenge is to develop sufficiently robust and usable tools to maintain that interest.
24