falk schreiber - from big data to smart knowledge ‐ integrating multimodal biological data and...

Post on 10-May-2015

Category:

Science

DESCRIPTION

Modern data acquisition methods in the life sciences yield different types of data in increasing quantity, facilitating a comprehensive view of biological systems. Because data is usually gathered and interpreted by separate domain scientists, multi-domain properties and structures are hard to grasp. Consequently, there is a need for the integration, analysis, modelling, simulation, and visualisation of life science data from different sources and of different types. This talk focuses on two aspects. Firstly, methods for the integration and visualisation of multimodal biological data are presented. These are based on two graphs representing the meta-relations between biological data and the measurement combinations, respectively. Both graphs are linked and serve as different views of the integrated data with navigation and exploration possibilities. Data can be combined and visualised in many ways, resulting in views of the integrated biological data. Secondly, methods to reconstruct, simulate, and analyse detailed metabolic models are presented. We will focus on stoichiometric models and see how different types of data are used to gain new insights into metabolic processes, illustrated with an example of metabolism in plants.

First presented at the 2014 Winter School in Mathematical and Computational Biology: http://bioinformatics.org.au/ws14/program/

TRANSCRIPT

Martin Luther University Halle-Wittenberg

Falk Schreiber

From Big Data to Smart Knowledge

Integrating Multimodal Biological Data and Modelling Metabolism

14/07/2014

Leibniz Institute IPK Gatersleben

Observations

1.  A tidal wave of scientific data

Year     Time            Costs (million US$)
2003     13 years        2700
2007     a few months    1
2009     a few weeks     0.05
2014     a few days      0.001
~2017    cheaper to reproduce data than to store it

Observations

1. A tidal wave of scientific data
2. From building blocks to complex systems

[Figure: genes, transcripts, proteins, metabolites; labels: reductionist approach, integrative approach]

Observations

1. A tidal wave of scientific data
2. From building blocks to complex systems
3. Multi-domain data

From Data to Knowledge – Outline of the Talk

Understanding metabolism via modelling

Integrating and exploring multimodal biological data

Metabolism

- Network of thousands of biochemical reactions
  - Enzyme-catalysed
  - Transporter-mediated
- Supports all biological activity
- Metabolic model = list of reactions + associated information

Source: http://www.genome.jp/kegg/
Source: Michael 1993

Metabolic Models

[Figure: modelling approaches arranged by size of model and level of detail.
Approaches: topological analysis, Petri net (P/T) analysis, flux balance analysis (FBA), kinetic modelling, Petri net (SPN) analysis.
Information labelled on the figure: network structure; + stoichiometry; + thermodynamics; + mass balance; + capacity constraints; + kinetic rate laws; + kinetic parameters; + metabolite concentrations; + stochastic rate laws.]

Flux Balance Analysis

- Constraint-based stoichiometric modelling approach to predict and analyse the metabolic steady-state conversion rates (fluxes)
- Advantages
  - No kinetic parameters required
  - Quantitative predictions
  - Applicable to large systems
- Applications
  - Prediction of optimal metabolic yields and flux distributions
  - Prediction of phenotype/viability of knockout mutants
  - Prediction of pathway redundancies
  - And more
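The constraint-based idea can be illustrated with a toy sketch. The four-reaction network, its bounds, and all names below are hypothetical and not taken from the talk. Real FBA tools solve a linear programme; here an exhaustive search over an integer flux grid stands in for the solver so that the example needs only the standard library. It shows the steady-state (mass balance) constraint, capacity constraints, an objective, and a knockout prediction.

```python
from itertools import product

# Hypothetical toy network with metabolites A and B.
# Columns are reactions:
#   R1: -> A (uptake), R2: A -> B, R3: A -> (secretion), R4: B -> (biomass)
S = [
    [1, -1, -1, 0],   # mass balance of A
    [0, 1, 0, -1],    # mass balance of B
]
BOUNDS = [(0, 10), (0, 10), (0, 10), (0, 10)]  # capacity constraints per reaction
OBJECTIVE = [0, 0, 0, 1]                       # maximise the flux through R4


def is_steady_state(stoich, fluxes):
    """True if production equals consumption for every metabolite (S v = 0)."""
    return all(sum(s * f for s, f in zip(row, fluxes)) == 0 for row in stoich)


def toy_fba(stoich, bounds, objective):
    """Search all integer flux vectors within bounds and keep the best steady state.

    This brute-force search replaces the linear-programming step of real FBA.
    """
    best_value, best_fluxes = None, None
    for fluxes in product(*(range(lo, hi + 1) for lo, hi in bounds)):
        if not is_steady_state(stoich, fluxes):
            continue
        value = sum(c * f for c, f in zip(objective, fluxes))
        if best_value is None or value > best_value:
            best_value, best_fluxes = value, fluxes
    return best_value, best_fluxes


wild_type = toy_fba(S, BOUNDS, OBJECTIVE)

# Simulated knockout of R2: its flux is forced to zero.
knockout_bounds = list(BOUNDS)
knockout_bounds[1] = (0, 0)
knockout = toy_fba(S, knockout_bounds, OBJECTIVE)

print("wild type:", wild_type)
print("R2 knockout:", knockout)
```

In this toy case the knockout removes the only route from A to B, so the best achievable biomass flux drops to zero, which is the kind of viability prediction listed under the applications.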

Principles of Flux Balance Analysis

[Figure labels: Simulation; Oxygen level]

Objective Function

How to identify plausible physiological states?

Question: What are the biochemical production capabilities?
Objective: Maximise metabolite product

Question: What is the maximal growth rate and biomass yield?
Objective: Maximise growth rate

Question: How efficiently can metabolism channel metabolites through the network?
Objective: Minimise the Euclidean norm

Question: What is the tradeoff between biomass production and metabolite overproduction?
Objective: Maximise biomass production for a given metabolite production

Question: How energetically efficient can metabolism operate?
Objective: Minimise ATP production or minimise nutrient uptake
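The objectives listed for these questions can be written as optimisation problems over the flux vector. The notation below is an assumption for illustration and does not come from the slides: S is the stoichiometric matrix, v the flux vector with lower and upper bounds, and the subscripted fluxes (product, biomass, ATP, uptake) denote the corresponding reactions. All objectives share the same steady-state and capacity constraints.

```latex
\begin{align*}
\text{Shared constraints:}\quad
  & S\,v = 0, \qquad v^{\min} \le v \le v^{\max} \\[4pt]
\text{Production capability:}\quad
  & \max_{v}\; v_{\text{product}} \\
\text{Growth rate:}\quad
  & \max_{v}\; v_{\text{biomass}} \\
\text{Efficient channelling:}\quad
  & \min_{v}\; \lVert v \rVert_2 = \min_{v}\; \sqrt{\textstyle\sum_i v_i^2} \\
\text{Tradeoff:}\quad
  & \max_{v}\; v_{\text{biomass}}
    \quad \text{with } v_{\text{product}} = p \ \text{(fixed)} \\
\text{Energetic efficiency:}\quad
  & \min_{v}\; v_{\text{ATP}}
    \quad \text{or} \quad \min_{v}\; v_{\text{uptake}}
\end{align*}
```

Every objective except the Euclidean norm is linear in v, so those cases are linear programmes; minimising the norm gives a quadratic programme.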

History of FBA

Software Tools and Pipelines for FBA

- CellNetAnalyzer (CNA): http://www.mpi-magdeburg.mpg.de/projects/cna/cna.html
- COBRA Toolbox: http://gcrg.ucsd.edu/downloads/COBRAToolbox
- FBA-SimVis: http://fbasimvis.ipk-gatersleben.de
- Thiele et al. A protocol for generating a high-quality genome-scale metabolic reconstruction. Nature Protocols, 5(1): 93–121, 2010.
- Grafahrend-Belau et al. Plant metabolic pathways: databases and pipeline for stoichiometric analysis. In Agrawal and Rakwal (Eds.), Seed development: omics technologies toward improvement of seed quality and crop yield, Springer, 345-366, 2012.

FBA Model of Seed Metabolism in Hordeum vulgare

Grafahrend-Belau et al. Plant Physiology, 2009

Size: 257 reactions, 234 metabolites

Pathways: Glyc, TCA, PPP, oxP, Ferm, Rubisco, AA, Starch, CW, and others

Example of Model Application

- Imaging uncovers metabolic compartmentation
- Alanine synthesis mainly in central endosperm; alanine gradient reflects the local oxygen state
- Modelling purpose: elucidate the role of alanine metabolism

Source of images: L. Borisjuk and H. Rolletschek, IPK

Melkus et al. Plant Biotechnology Journal, 2011
Rolletschek et al. Plant Cell, 2011

Simulation of Region-specific Metabolism

[Figure panels: A, central endosperm (hypoxic); B, peripheral endosperm (aerobic)]

Melkus et al. Plant Biotechnology Journal, 2011
Rolletschek et al. Plant Cell, 2011

Obtaining Parameters

- Influx
  - Quantification from video data
- Relation of substances in the same area
  - Multimodal alignment (Scharfe et al. BMC Bioinformatics, 2010; Fester et al. GCB, 2009)
- Biomass accumulation
  - Quantification from image series (Hartmann et al. BMC Bioinformatics, 2011)

Scaling up - Multi* and High Throughput Modelling

Coupling of Organ-specific FBA Models

Coupling of FBA and FSA Models

Müller et al. IEEE PMA, 2012 Grafahrend-Belau et al. Plant Physiology, 2013

High Throughput Modelling

- Path2Models: a pipeline to compute draft models
- >140,000 kinetic, logical and constraint-based models

Le Novère et al. BMC Systems Biology, 2013

From Data to Knowledge – Outline of the Talk

Understanding metabolism via modelling

Integrating and exploring multimodal biological data

Multi-domain Biological Data

Data Domains

Available Tools

Data Integration – A Major Problem (Example: Networks)

- Bridge the abyss!

IDMappers

- Many information resources can be utilized as IDMappers:
  - Web services, web sites (e.g. PICR, CRONOS, …)
  - Relational databases (e.g. STRING, PDD, …)
  - Flat files (e.g. KEGG, UniProt, …)
- Unified using the BridgeDb framework

Overview: Mehlhorn et al. TransID – the flexible identifier mapping service. Internat. Symp. Integrative Bioinformatics, 112-121, 2013.
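As a rough illustration of how heterogeneous resources might sit behind one mapping interface, here is a minimal sketch. It is not the BridgeDb API: the class names, the tab-separated file layout, and the identifiers are all hypothetical. An in-memory table stands in for a web service or database, and a small text buffer stands in for a flat file.

```python
import csv
import io
from typing import Dict, Iterable, Protocol, Set


class IDMapper(Protocol):
    """Hypothetical common interface: map one identifier to related identifiers."""

    def map_id(self, source_id: str) -> Set[str]: ...


class TableMapper:
    """Stands in for a web service or relational database lookup."""

    def __init__(self, table: Dict[str, Set[str]]) -> None:
        self._table = table

    def map_id(self, source_id: str) -> Set[str]:
        return set(self._table.get(source_id, set()))


class FlatFileMapper:
    """Reads tab-separated 'source<TAB>target' pairs from a text handle."""

    def __init__(self, handle: Iterable[str]) -> None:
        self._table: Dict[str, Set[str]] = {}
        for source, target in csv.reader(handle, delimiter="\t"):
            self._table.setdefault(source, set()).add(target)

    def map_id(self, source_id: str) -> Set[str]:
        return set(self._table.get(source_id, set()))


class MapperStack:
    """Queries several mappers and returns the union of their answers."""

    def __init__(self, mappers: Iterable[IDMapper]) -> None:
        self._mappers = list(mappers)

    def map_id(self, source_id: str) -> Set[str]:
        result: Set[str] = set()
        for mapper in self._mappers:
            result |= mapper.map_id(source_id)
        return result


# Made-up identifiers, for illustration only.
service = TableMapper({"TAIR:geneX": {"UniProt:protX"}})
flat_file = FlatFileMapper(io.StringIO("TAIR:geneX\tUniProt:protY\n"))
stack = MapperStack([service, flat_file])
mapped = stack.map_id("TAIR:geneX")
print(sorted(mapped))
```

The point of the stack is that callers see a single `map_id` call regardless of where each mapping came from.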

The Data Linkage Graph

- Comprises a set of identifiers (nodes) and a set of identifier mappings (edges)
- Used to explore identifier interconnections
- Basis of the integration of biological networks
- Example: (TAIR) (UniProt) (EC number)
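A data linkage graph of this kind can be sketched as an adjacency structure over identifiers. The identifiers below are made up, and treating mappings as undirected edges is an assumption of this sketch. A breadth-first search then lists every identifier connected to a starting identifier, which is one way to explore identifier interconnections.

```python
from collections import deque

# Hypothetical identifier mappings (edges between identifier nodes).
MAPPINGS = [
    ("TAIR:geneX", "UniProt:protX"),
    ("UniProt:protX", "EC:x.x.x.x"),
    ("TAIR:geneY", "UniProt:protY"),
]


def build_linkage_graph(mappings):
    """Build an undirected adjacency map: identifier -> directly mapped identifiers."""
    graph = {}
    for left, right in mappings:
        graph.setdefault(left, set()).add(right)
        graph.setdefault(right, set()).add(left)
    return graph


def linked_identifiers(graph, start):
    """Return all identifiers reachable from `start` through mapping edges."""
    seen = {start}
    queue = deque([start])
    while queue:
        current = queue.popleft()
        for neighbour in graph.get(current, ()):
            if neighbour not in seen:
                seen.add(neighbour)
                queue.append(neighbour)
    seen.discard(start)  # report only the other identifiers
    return seen


graph = build_linkage_graph(MAPPINGS)
linked = linked_identifiers(graph, "TAIR:geneX")
print(sorted(linked))
```

Here the gene identifier reaches the EC number only indirectly, through the protein identifier, which mirrors the TAIR, UniProt, EC number chain in the example.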

The Integrated Graph

- Composed of biological networks and the inferred identifier mappings as mapping edges
- Mapping edges represent identifier connections in the data linkage graph
- Example

[Figure panels: Data linkage graph; Integrated graph]
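One way to picture the integrated graph is as the union of the input networks plus mapping edges between nodes whose identifiers are linked. The sketch below makes that concrete under stated assumptions: the networks, identifiers, and the `integrate` helper are hypothetical, and the mapping pairs are assumed to have been inferred beforehand from a data linkage graph.

```python
def integrate(networks, mappings):
    """Combine networks into one edge list and add mapping edges.

    networks: dict of network name -> list of (node, node) edges
    mappings: list of (identifier, identifier) pairs inferred from identifier linkage
    Returns the node set and a list of (node, node, edge_type) triples.
    """
    nodes = set()
    edges = []
    for name, network_edges in networks.items():
        for source, target in network_edges:
            nodes.update((source, target))
            edges.append((source, target, name))
    for left, right in mappings:
        # A mapping edge is only added when both identifiers occur in some network.
        if left in nodes and right in nodes:
            edges.append((left, right, "mapping"))
    return nodes, edges


# Made-up example data.
networks = {
    "metabolic": [("cpd:M1", "EC:x.x.x.x"), ("EC:x.x.x.x", "cpd:M2")],
    "regulatory": [("TAIR:geneA", "TAIR:geneX")],
}
mappings = [("TAIR:geneX", "EC:x.x.x.x"), ("TAIR:geneZ", "EC:y.y.y.y")]

nodes, edges = integrate(networks, mappings)
for edge in edges:
    print(edge)
```

Keeping the edge type on every triple lets a viewer distinguish original network edges from mapping edges when drawing or filtering the integrated graph.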

Example

- Metabolic pathways: Glycolysis, Pyruvate metabolism from KEGG
- Gene regulatory network: Arabidopsis thaliana from Regulogs

Available Tools

Data, Mappings and Mapping Function

- Set of measurements
- Mappings with the object path functions, which derive the relevant metadata and any set of graph element attributes
- Basis: ID Mappers

[Figure label: mapping function m]

Rohn et al. Bioinformatics, 2011
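The idea of mapping measurements onto graph elements through identifier mapping can be sketched as follows. This is not the method of Rohn et al.; the data layout, the `map_measurements` helper, and the identifiers are hypothetical, and the object path functions for deriving metadata are left out.

```python
def map_measurements(measurements, id_mapper, node_attributes):
    """Attach each measurement to every graph node its identifier maps to.

    measurements: list of dicts with keys "id", "kind" and "value"
    id_mapper: function returning the set of node identifiers for a measurement id
    node_attributes: dict of node identifier -> attribute dict (updated in place)
    Returns the identifiers of measurements that matched no node.
    """
    unmapped = []
    for measurement in measurements:
        targets = [t for t in id_mapper(measurement["id"]) if t in node_attributes]
        if not targets:
            unmapped.append(measurement["id"])
        for target in targets:
            values = node_attributes[target].setdefault(measurement["kind"], [])
            values.append(measurement["value"])
    return unmapped


# Made-up mapping table: measurement identifier -> graph node identifiers.
ID_TABLE = {
    "TAIR:geneX": {"TAIR:geneX"},
    "UniProt:protX": {"TAIR:geneX"},
}


def id_mapper(identifier):
    return ID_TABLE.get(identifier, set())


node_attributes = {"TAIR:geneX": {}, "TAIR:geneY": {}}
measurements = [
    {"id": "TAIR:geneX", "kind": "transcript_level", "value": 2.5},
    {"id": "UniProt:protX", "kind": "protein_abundance", "value": 0.8},
    {"id": "UniProt:protZ", "kind": "protein_abundance", "value": 1.1},
]

unmapped = map_measurements(measurements, id_mapper, node_attributes)
print(node_attributes)
print("unmapped:", unmapped)
```

Reporting unmapped measurements separately makes gaps in the identifier mapping visible instead of silently dropping data.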

Example of Integrated Data
http://www.vanted.org

- The ABC(DE)-model of Arabidopsis thaliana floral organ specification
- Determination of floral organ identity depends on the combinatorial expression of floral homeotic genes from different classes
- Integration of color-coded images, representing floral homeotic gene expression patterns, into the context of a regulatory network

Junker et al. Frontiers in Plant Science, 2012.

Standards for Modelling and Simulation in SysBio

Can You Understand This?

- Stimulates? But ... what exactly?
- Associates into?
- Translocates?
- Reciprocal stimulation?
- Is degraded?
- Stimulates gene transcription?

Ambiguity in Conventional Representation

Standardised Symbols are Important

[Figure labels: Most English-speaking countries; Quebec; Iran; China; Israel; Singapore; Norway; Poland; USA and Canada]

What is SBGN?

- A way to unambiguously describe biochemical and cellular events in graphs
- Limited number of symbols (~30) → smooth learning curve
- Can graphically represent quantitative models and biochemical pathways at different levels of granularity
- Developed since 2006 by an interdisciplinary community, part of COMBINE
- Three languages
  - Process Descriptions → one state = one glyph
  - Entity Relationships → one entity = one glyph
  - Activity Flow → conceptual level

Graph Trinity: Three Languages in One
http://sbgn.org

- Process Description maps: unambiguous, mechanistic, sequential, combinatorial explosion
- Entity Relationships maps: unambiguous, mechanistic, non-sequential
- Activity Flow maps: ambiguous, conceptual, sequential

Le Novère et al. Nature Biotechnology, 2009

[Figure panels: Process Description; Entity Relationships; Activity Flow]

Systems Biology Graphical Notation (SBGN)

Working with SBGN
http://www.sbgn-ed.org

- Verification (Czauderna et al. Bioinformatics, 2010)
- Synthesis / bricks (Junker et al. Trends in Biotechnology, 2012)
- Translation (Czauderna et al. BMC Bioinformatics, 2013)
- Layout (Schreiber et al. BMC Bioinformatics, 2009; Dwyer et al. IEEE Transactions Visualization & Computer Graphics, 2008)
- Data integration (Junker et al. Nature Protocols, 2012)

Modelling, Visual Analytics, Standards, Network Analysis

Optimise

Predict

visualise, explore, integrate, analyse, model, present, understand, simulate, predict

Thank You

“We now have unprecedented ability to collect data about nature but there is now a crisis developing in biology, in that completely unstructured information does not enhance understanding. We need a framework to put all of this knowledge and data into - that is going to be the problem in biology. […] Driving toward that framework is really the big challenge.” Sydney Brenner
