building dbnp the nutritional phenotype database · 2010-02-08 · the european nutrigenomics...

NuGO, the European Nutrigenomics Organisation

Building dbNP, the nutritional phenotype database

A real data structure for systems biology

Chris Evelo

Department of Bioinformatics - BiGCaT

Maastricht University

The NBX network

1. Small server network (nbx)

2. Data-Grid

3. Nutritional Phenotype Database (dbNP)

4. Genomics pipelines

5. Statistical tools

6. Pathway analysis

7. Pathway development

8. Systems Biology

NBX network

BioLinux 5.0 repository at NEBC

NuGO NBX repositories, Potsdam, Maastricht

Ubuntu 8.04

Included in BioLinux

e.g. http://nbx1.nugo.org

1. Web access through GenePattern, the Broad Institute's interface for biologists to bioinformatics.

2. A NuGO desktop for interactive analysis.

The NuGO Desktop

Also for

• Yet another nbx (small server)

• A workstation (PC)

• Memory stick (when needed)

The Data-Grid

Transparent data sharing

Allows users to:

• Access shared data on any nbx without knowing where it is.

• Share data with other NuGO members

• Use NuGOnet to give access (LDAP)

Contains PPS and focus team areas

The overall structure

[Figure: the overall structure of dbNP. Components shown: research question; study design; samples; analysis (technology X); technology X raw data db; technology X data processing; technology X clean data db; identifier mapping (BridgeDb); study metadata db; query interface; study subset selection; data selection for bioinformatics; statistical toolbox; bioinformatics toolbox; pathway & GO technology X profiles; research answer.]

The basic structure

[Figure: study design, sample analysis, data analysis, the study metadata db and the clean data db, linked by steps 1 to 5.]

1) Protocols are stored in the study metadata database.

2) Analytical procedures are carried out on study samples.

3) The measurements are processed to ‘clean data’.

4) By interrogating the study metadata database, data subsets of multiple studies can be selected

5) and analyzed by statistical and bioinformatics tools.
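
Steps 4 and 5 can be illustrated with a minimal Python sketch. The record layout (the `study`, `sample`, `diet` and `tissue` fields), the example values and the helper `select_samples` are all hypothetical and chosen for illustration only; they are not the dbNP schema.

```python
from statistics import mean

# Hypothetical study metadata records (step 1): one entry per sample.
STUDY_METADATA = [
    {"study": "S1", "sample": "S1-01", "diet": "high fat", "tissue": "liver"},
    {"study": "S1", "sample": "S1-02", "diet": "low fat", "tissue": "liver"},
    {"study": "S2", "sample": "S2-01", "diet": "high fat", "tissue": "liver"},
    {"study": "S2", "sample": "S2-02", "diet": "high fat", "tissue": "muscle"},
]

# Hypothetical clean data (step 3): one measurement per sample.
CLEAN_DATA = {"S1-01": 2.0, "S1-02": 1.0, "S2-01": 4.0, "S2-02": 3.0}


def select_samples(metadata, **criteria):
    """Step 4: return the sample ids whose metadata match all criteria."""
    return [
        record["sample"]
        for record in metadata
        if all(record.get(key) == value for key, value in criteria.items())
    ]


# Select a subset that spans both studies, then analyse it (step 5).
selected = select_samples(STUDY_METADATA, diet="high fat", tissue="liver")
subset_mean = mean(CLEAN_DATA[sample] for sample in selected)
print(selected, subset_mean)
```

The point of the sketch is that selection runs on the metadata only, so the same query works across studies; the clean data are looked up afterwards by sample id.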

The food intake module

[Figure: the basic structure (steps 1 to 5) extended with food consumption and food analysis (steps 6 to 9).]

6) Food consumption is captured

7) and stored in the clean data database.

8) Foods are analysed

9) and food composition data are stored in the clean data database.

2) The food metabolome is analysed in biofluid samples.

5) Statistical and bioinformatic analyses of the food metabolome data are used to assess food intake or compliance with the dietary intervention.

Why Pathway Analysis?

Intuitive to Biologists

• Provide biological context for results

• More efficient than searching databases gene-by-gene

Computation on Pathway Content

• Analyze over-representation of changed genes or metabolites

NBX Pathway tools

• PathVisio: open source, Maastricht/San Francisco; general pathway tool, statistics, metabolomics, plugin architecture

• Eu.Gene: Florence; pathway statistics, large pathway collection, uses PathVisio.

• MetaCore: commercial tool, interesting content

• WikiPathways: pathway creation, webservices

• Cytoscape: Open source, network analysis, plugins for PathVisio/WikiPathways.

Biological Pathways

PathVisio

• Visualize data on biological pathways

• It can use gene expression, proteomics and metabolomics data

• Identify significantly changed processes

www.pathvisio.org

Martijn P van Iersel, Thomas Kelder, Alexander R Pico, Kristina Hanspers, Susan Coort, Bruce R Conklin, Chris Evelo (2008) Presenting and exploring biological pathways with PathVisio. BMC Bioinformatics 9: 399

Pathway Analysis: Z-score

2-dimensional pathway profile vector

PID1, zscore 1

PID2, zscore 2
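
The slides do not give the z-score formula. A minimal Python sketch, assuming the usual over-representation z-score based on the hypergeometric distribution (observed minus expected number of changed genes in a pathway, divided by the standard deviation), could look as follows. The counts and the function name `pathway_z_score` are hypothetical; `PID1` and `PID2` are the placeholder pathway ids from the slide.

```python
import math


def pathway_z_score(r, n, R, N):
    """Over-representation z-score for one pathway.

    r: changed genes in the pathway, n: measured genes in the pathway,
    R: changed genes in the whole data set, N: measured genes in total.
    """
    expected = n * R / N
    # Variance of the hypergeometric distribution
    # (sampling without replacement).
    variance = n * (R / N) * (1 - R / N) * (1 - (n - 1) / (N - 1))
    if variance <= 0:
        return 0.0  # degenerate case: nothing to test
    return (r - expected) / math.sqrt(variance)


# Hypothetical counts for two pathways: pathway id -> (r, n).
counts = {"PID1": (10, 20), "PID2": (2, 20)}
R, N = 100, 1000

# The pathway profile: one (pathway id, z-score) pair per pathway.
profile = [(pid, pathway_z_score(r, n, R, N)) for pid, (r, n) in counts.items()]
for pid, z in profile:
    print(pid, round(z, 2))
```

With these made-up counts, PID1 contains many more changed genes than expected by chance and gets a high z-score, while PID2 contains exactly the expected number and scores zero.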

WikiPathways

• Public resource for biological pathways

• Anyone can contribute and curate

• More up-to-date representation of biological knowledge

WikiPathways: Pathway Editing for the People. Alexander R. Pico, Thomas Kelder, Martijn P. van Iersel, Kristina Hanspers, Bruce R. Conklin, Chris Evelo. PLoS Biology 2008: 6: 7. e184

Commentaries: Big data: Wikiomics. Mitch Waldrop. Nature 2008: 455, 22-25

We the curators. Allison Doerr. Nature Methods 2008: 5, 754 - 755

Download

Tutorials are in Help

Revision history

Select two versions

Click

Diff tool

Species

• We can add any species that is in ENSEMBL

• Or anything where you create an ENSEMBL-like DB

• Just ask!

Metabolomics

Visualization and statistics fully supported

HMDB

Chebi

NuGOwiki

Automatic content generation

GO, KEGG and BioPAX: converters available

- GO: analysis in R (through web services), visualisation in tables.

- KEGG: full downloadable set

- BioPAX: interesting for Reactome round trip

Assisted content generation

Suggestions from:

• HMDB

• KEGG

• BIND/IntAct

• Text mining (Phasar, EBI-tools)

• WikiPathways itself

• Extendible…

Portals

Need one?

Web service

Mining biological pathways using WikiPathways web services. Thomas Kelder, Alexander R Pico, Kristina Hanspers, Martijn P van Iersel, Chris Evelo, Bruce R Conklin. PLoS One 2009: 4: 7

Web service example

Integrates data from ArrayExpress Atlas with WikiPathways pathways

Cytoscape plugin

Search and open pathways from WikiPathways directly in Cytoscape.

R Example

GSEA in R on WikiPathways pathways using web services

• SSOAP: http://www.omegahat.org/SSOAP/

Credit system

A real life example

Using Pathways in Networks

The NuGO PPS1 for Dummies

• Mice

• High fat vs low fat diet

• Samples taken before and after treatment

• Three tissues: liver, muscle and white adipose

Distribution genes vs. pathways

Genes with q <= 0.01; pathways with z >= 2

Cytoscape visualization

• Network visualization

• Data visualization

– Z score for pathways

– Expression profile for reporter nodes

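[Figure: example network. Legend: pathway nodes (A, B), with higher z-score marked; significant reporter nodes (G1, G2, G3), showing the expression profile Log2(tn/t0) for [t1, t6, t9, t12].]

G1, G3: present in pathway A

G2: present in pathway A and B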
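
The pathway and reporter network of this example can be sketched in a few lines of Python. The pathway membership follows the example (G1 and G3 in pathway A, G2 in A and B); the expression values and the helper names `build_edges` and `log2_profile` are hypothetical and only serve to show how the Log2(tn/t0) profile of a reporter node is computed.

```python
import math

# Pathway membership as given in the example.
PATHWAY_MEMBERS = {"A": {"G1", "G2", "G3"}, "B": {"G2"}}

# Hypothetical expression values at t0 and at the later time points.
EXPRESSION = {
    "G1": {"t0": 1.0, "t1": 2.0, "t6": 4.0, "t9": 4.0, "t12": 8.0},
    "G2": {"t0": 2.0, "t1": 2.0, "t6": 1.0, "t9": 1.0, "t12": 0.5},
    "G3": {"t0": 4.0, "t1": 4.0, "t6": 8.0, "t9": 8.0, "t12": 8.0},
}
TIME_POINTS = ["t1", "t6", "t9", "t12"]


def build_edges(pathway_members):
    """Return the sorted (pathway, reporter) edges of the bipartite network."""
    return sorted(
        (pathway, gene)
        for pathway, genes in pathway_members.items()
        for gene in genes
    )


def log2_profile(values, time_points=TIME_POINTS):
    """Return log2(tn / t0) for each time point tn."""
    return [math.log2(values[tp] / values["t0"]) for tp in time_points]


edges = build_edges(PATHWAY_MEMBERS)
# Reporters that connect more than one pathway link those pathways together.
shared = sorted(
    gene
    for gene in EXPRESSION
    if sum(gene in members for members in PATHWAY_MEMBERS.values()) > 1
)
profiles = {gene: log2_profile(values) for gene, values in EXPRESSION.items()}
print(edges)
print(shared)
print(profiles["G1"])
```

Reporters such as G2 that belong to several pathways are what pull related pathways together in the network view.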

Liver

All pathways

Pathways with high z-score are grouped together. This explains why there are relatively few significant genes, but many pathways with high z-score.

Can we also?

We can show metabolites and do pathway statistics.

Can we also work with fluxes and integrate dynamic modelling?

Not yet! SBML integration started (hard)

Flux representation planned (easier)

Can we also?

We can show gene products.

Can we also combine with sequence-based regulatory information (e.g. TF binding)?

Yes! Direct connection to Cytoscape, many data combinations possible (but needs coding).

Can we also?

We can show gene products.

Can we also show SNPs?

Yes! Well… almost ;-) BridgeDb connections allow this.

But… What do you want to see?

Integrated Pipelines

• QC

• Normalisation

• Data treatment

• Statistics

• …

Most often done in R/Bioconductor

Using NuGO R-servers or Grid

BridgeDb

Problem: Identifier Mapping

How do we get from Affymetrix probeset 100234_at to Entrez Gene 3643?

Solution: Conversion tools

Problem: Usability

• Check for double IDs

• Check for missing IDs

• Only 1000 at once

• Check alignment of Excel columns

• Manual

• Error-prone
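
The manual checks listed above are easy to get wrong by hand. As a rough illustration, they can be written as a small Python sketch. The function names and the second probeset id (`100235_at`) are hypothetical; the batch size of 1000 mirrors the "only 1000 at once" limit mentioned above.

```python
from collections import Counter


def check_identifiers(ids):
    """Return (duplicates, missing_rows) for a column of identifiers."""
    cleaned = [(value or "").strip() for value in ids]
    counts = Counter(value for value in cleaned if value)
    duplicates = sorted(value for value, n in counts.items() if n > 1)
    missing_rows = [row for row, value in enumerate(cleaned) if not value]
    return duplicates, missing_rows


def batches(ids, size=1000):
    """Split ids into chunks of at most `size` items."""
    return [ids[start:start + size] for start in range(0, len(ids), size)]


# A column with one duplicate and two missing entries (rows 2 and 4).
column = ["100234_at", "100235_at", "", "100234_at", None]
duplicates, missing_rows = check_identifiers(column)
print(duplicates, missing_rows)
print([len(batch) for batch in batches(list(range(2500)))])
```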

Solution: Built-in Mapping

• Generic bioinformatics platforms should have identifier mapping built in.

Bioconductor

PathVisio

Cytoscape

... Batteries included

[Figure: a mapping service connects Entrez Gene 3643 and Affymetrix probeset 100234_at.]

Problem: Which mapping service?

• Synergizer

• EnsMart

• DAVID

• CRONOS

• AliasServer

• MatchMiner

• OntoTranslate

Solution: Abstraction Layer
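
The idea of an abstraction layer can be sketched in Python as one common interface with interchangeable backends. This is an illustration only: the class names `IdMapper`, `TableMapper` and `MapperStack` are hypothetical and this is not the BridgeDb API. The probeset and gene ids are the ones used on the slides.

```python
from abc import ABC, abstractmethod


class IdMapper(ABC):
    """Common interface that every mapping backend implements."""

    @abstractmethod
    def map_id(self, identifier, source, target):
        """Return the set of target identifiers for one source identifier."""


class TableMapper(IdMapper):
    """Backend using an in-memory table (stand-in for a local database)."""

    def __init__(self, table):
        self._table = table

    def map_id(self, identifier, source, target):
        return set(self._table.get((source, identifier, target), ()))


class MapperStack(IdMapper):
    """Queries several backends and merges their answers."""

    def __init__(self, *mappers):
        self._mappers = mappers

    def map_id(self, identifier, source, target):
        results = set()
        for mapper in self._mappers:
            results |= mapper.map_id(identifier, source, target)
        return results


# Treating the slide's example pair as a mapping table entry is an assumption.
local = TableMapper({("Affymetrix", "100234_at", "Entrez Gene"): ["3643"]})
other = TableMapper({})  # a second backend that knows nothing
mapper = MapperStack(local, other)
print(mapper.map_id("100234_at", "Affymetrix", "Entrez Gene"))
```

A tool written against `IdMapper` does not need to know which mapping service answers the query, so backends can be swapped or combined without changing the tool.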

The overall structure

[Figure: the overall structure diagram, shown again.]

Or, from a user perspective:

[Figure: dbNP from a user perspective. Labels shown, grouped by box:]

• Biological information: genome info, proteome info, metabolite info

• Nutrition information: food composition, food metabolome, intake

• Bioinformatics toolbox: GenePattern, R, PathVisio, Cytoscape

• Generic data storage: GEO, ArrayExpress, proteome db, metabolite db

• Specific db's: SelenoDB, HuGE-net, Polyphenol db

• Knowledge sources: WikiPathways, Semantic Web, PubMed

Semantic web integration

[Figure labels: literature, text mining; external databases, data mining; concept store; triple store; dbNP; analytical tools.]

Knowledge integration allows systems biology approaches

www.dbNP.org

Available

• NBXs

• DataGrid

• Toolbox (R, Cytoscape, PathVisio, Eu.Gene, etc.)

• BridgeDb

• Data processing pipelines

– Affymetrix

– Two colour

– ChIP and Methylation arrays

– (miRNA analysis)

• Environment (Grails) chosen

Being done

• Specifications (on wiki)

• Data format (ISA-Tab +)

• Capturing tool

• Data structure

• Importing SNP data and representing

Many people involved

Challenges of molecular nutrition research 6: The Nutritional Phenotype database to store, share and evaluate nutritional systems biology studies. Ben van Ommen, Jildau Bouwman, Lars Dragsted, Christian A. Drevon, Ruan Elliott, Philip de Groot, Jim Kaput, John C. Mathers, Michael Müller, Fre Pepping, Jahn Saito, Augustin Scalbert, Marijana Radonjic, Philippe Rocca-Serra, Tony Travis, Suzan Wopereis and Chris Evelo. Genes and Nutrition, in press

Developers on dbNP.org

Many more to come…
