
Page 1: Torik.Ayoubi@vib - telemedicina.med.muni.cztelemedicina.med.muni.cz/bioinformatics2009/res/file/Ayoubi_Brno2009.pdfThe state-of-the-art in genomics and proteomics data analysis and

The state-of-the-art in genomics and proteomics data analysis and interpretation: challenges and opportunities

Torik Ayoubi, VIB

[email protected]


Historical perspective on genomic challenges

• James Watson
• Leroy Hood
• Craig Venter


Historical perspective on genomic challenges

• James Watson: Cavendish lab, Cambridge; data of Franklin and Wilkins


Historical perspective on genomic challenges

• Leroy Hood


Historical perspective on genomic challenges

• Craig Venter


Take home message:

• Do not let anyone come between you and your ambition.
• There ain't no mountain high enough.
• Ain't no valley low enough.
• Ain't no river wide enough.
• If the computational or statistical tool you need to realize your ambition does not exist, make it: develop it, program it, or improve current technology.


The Basic Concept of Microarray Technology

"Microspot assays that rely on the immobilization of interacting elements on a few square microns should, in principle, be capable of detecting analytes with a higher sensitivity than conventional macroscopic assays"

Ambient analyte model: Ekins et al. (1990) Ann. Biol. Clin. 48, 655-666


Microarray experiments generate a huge amount of data

We Can Hardly Understand What It Means ... but We Can Measure Everything



Having one ball to play with is fun

Being in a Ball Pit is REAL FUN

Large data sets are FUN


Problems Associated with large data sets


Common types of objectives

• Class comparison: identify genes differentially expressed among predefined classes
• Class discovery: discover clusters among specimens or among genes
• Class prediction: develop a multi-gene predictor of class label for a sample using its gene expression profile


Concordance Issues

Fold change 1.9x versus fold change 2.1x: Let's Publish!!

Fold change 1.9x versus fold change 2.1x, with a fold-change threshold of 2.0: Microarrays SUCK
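The concordance problem on this slide can be illustrated with a few lines of Python. This is only a minimal sketch: the function name `call_differential` and the lab labels are hypothetical, and only the values 1.9, 2.1 and 2.0 come from the slide. It shows how a hard cut-off turns two nearly identical measurements into opposite calls.

```python
def call_differential(fold_change, threshold=2.0):
    """Return True if the fold change meets a hard cut-off."""
    return fold_change >= threshold

# Fold changes from the slide; the lab names are hypothetical.
measurements = {"lab_A": 1.9, "lab_B": 2.1}

# Apply the same threshold to both measurements.
calls = {lab: call_differential(fc) for lab, fc in measurements.items()}

# Difference between the two measurements, relative to the threshold value.
relative_gap = abs(measurements["lab_A"] - measurements["lab_B"]) / 2.0

print(calls)
print(f"gap relative to threshold: {relative_gap:.0%}")
```

The two measurements differ by about a tenth of the threshold value, yet one passes and the other fails, which is one reason gene lists built on a fixed fold-change cut-off can look discordant between experiments.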


Tlr4 Gene KO Study in Mouse Heart


OXPHOS genes coordinately decreased in diabetic muscle

Mootha et al. (2003) Nature Genetics 34: 267-273


Why so many FATAL motorcycle accidents in the Caribbean?


NOBODY Wears a Helmet!!


Class Discovery Among Cancers


Class Discovery: using Cluster Analysis
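The slide does not say which clustering algorithm was used. As a rough illustration of class discovery, the sketch below assumes average-linkage agglomerative clustering on a correlation distance (1 minus Pearson r). The specimen names and expression values are made up, and the function names are hypothetical.

```python
from itertools import combinations
from statistics import mean

def pearson(a, b):
    """Pearson correlation between two equally long profiles."""
    ma, mb = mean(a), mean(b)
    num = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    den = (sum((x - ma) ** 2 for x in a) * sum((y - mb) ** 2 for y in b)) ** 0.5
    return num / den

def cluster(profiles, n_clusters):
    """Merge the two closest clusters until n_clusters remain."""
    clusters = [[name] for name in profiles]

    def distance(c1, c2):
        # Average linkage: mean correlation distance over all cross pairs.
        return mean(1 - pearson(profiles[a], profiles[b]) for a in c1 for b in c2)

    while len(clusters) > n_clusters:
        i, j = min(combinations(range(len(clusters)), 2),
                   key=lambda ij: distance(clusters[ij[0]], clusters[ij[1]]))
        clusters[i] = clusters[i] + clusters[j]
        del clusters[j]  # safe because i < j
    return clusters

# Toy, made-up expression profiles (4 genes per specimen).
profiles = {
    "tumor_1": [8.0, 7.5, 1.0, 1.2],
    "tumor_2": [7.8, 7.9, 0.9, 1.5],
    "tumor_3": [1.1, 0.8, 6.9, 7.4],
    "tumor_4": [0.9, 1.3, 7.2, 7.0],
}
groups = cluster(profiles, n_clusters=2)
print(groups)
```

No class labels are supplied: the grouping emerges from the profiles alone, which is what distinguishes class discovery from class comparison and class prediction.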


About 1000 Mitochondrial Genes, of which 250 are Enriched in Heart Tissue


Skypaint of ~1000 Mitochondrial Genes


• apoptosis
• lipid and lipoprotein metabolism
• metabolism of amino acids
• nucleotide metabolism
• pyruvate metabolism and TCA cycle
• translation


Biological Processes: Heart-Enriched Mitochondrial Genes

Term                                                  Count   %        P-value
oxidative phosphorylation                             47      20.61%   7.39E-56
electron transport                                    56      24.56%   4.85E-38
coenzyme metabolism                                   40      17.54%   2.53E-33
ATP synthesis coupled electron transport              20      8.77%    1.75E-23
cofactor catabolism                                   17      7.46%    8.56E-22
phosphate metabolism                                  50      21.93%   4.63E-17
cofactor biosynthesis                                 23      10.09%   1.23E-15
ATP metabolism                                        16      7.02%    9.66E-15
proton transport                                      15      6.58%    1.94E-11
hydrogen transport                                    15      6.58%    2.26E-11
cellular carbohydrate metabolism                      25      10.96%   7.07E-11
carboxylic acid metabolism                            26      11.40%   1.18E-08
nucleotide biosynthesis                               15      6.58%    4.81E-08
nucleotide metabolism                                 16      7.02%    4.29E-07
fatty acid metabolism                                 12      5.26%    1.28E-05
mitochondrion organization and biogenesis             5       2.19%    6.35E-05
mitochondrial membrane organization and biogenesis    4       1.75%    2.61E-04
cation transport                                      19      8.33%    4.11E-04
protein biosynthesis                                  24      10.53%   5.46E-04
macromolecule biosynthesis                            25      10.96%   9.52E-04
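The slide does not state how the P-values were computed. A common way to score term enrichment is a one-sided hypergeometric test, sketched below under that assumption. The percentages in the table imply a gene list of 228 (47/228 = 20.61%). The background figures (`K`, `N`) are hypothetical placeholders, so the resulting number is not expected to match the table.

```python
from math import comb

def enrichment_p(k, n, K, N):
    """P(X >= k) when n genes are drawn from N, of which K carry the term."""
    upper = min(n, K)
    favourable = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, upper + 1))
    return favourable / comb(N, n)

# Share of the gene list carrying the term, as shown in the "%" column.
share = round(47 / 228 * 100, 2)

# k and n follow from the table; K and N are made-up background numbers.
p = enrichment_p(k=47, n=228, K=100, N=15000)

print(f"{share}% of the list, p = {p:.3g}")
```

The test asks how surprising it is to see at least `k` annotated genes in the list if the list were a random draw from the background, so the result depends strongly on the background chosen.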


Assigning Function to Genes

siRNA

• mRNA: Q-PCR
• ATP levels: Luminescence

Tissue culture:

• uncoupler: FCCP
• ATPase inhibitor: oligomycin
• no glucose


National Center for Health Statistics:

http://www.cdc.gov/nchs/fastats/lifexpec.htm


What is CUP?

• Cancer of Unknown Primary (CUP) is one of the 10 most frequently occurring cancers worldwide (3-4% of all cancer cases).
• Patients with CUP present at the clinic with metastatic disease for which the primary tumor site remains unknown even after extensive attempts to determine the site of tumor origin.


CUP is Difficult to Treat

• In the current state of knowledge in medical oncology, the primary site appears important for the correct choice of treatment.


CUP Puts Extra Psychological Burden on the Patient

• The fact that the physician does not know the origin of the tumor puts an extra psychological burden on the patient.
• The patient has the impression that the diagnosis has not been thorough enough.


CUP has one of the Worst Prognoses of all Human Malignancies

In unselected populations:

• poor prognosis of 3-5 months
• fewer than 20% of patients alive one year after diagnosis


A Medical Oncologist's Point of View:

"Few diagnoses engender as much uncertainty, pessimism, and therapeutic nihilism."

Barry C. Lemberski, MD, Pathology Case Reviews, July/August 2001


Need to Improve Poor Prognosis

In order to improve the poor prognosis of this tumor type, we need to:

- Diagnose it better: get rid of the "unknown"
- Better stratification
    - prognosis
    - response to treatment
- Individualize treatment


Current approach:

Current diagnostic standards rely largely on morphological and immunocytochemical analysis. However, this is a laborious and time-consuming approach.

The ESMO Minimum Clinical Recommendations for diagnosis, treatment and follow-up of cancers of unknown primary site (CUP) include:

• thorough physical examination
• histological evaluation, combined with immunohistochemistry
• basic blood and biochemistry survey, PSA determination in male patients
• urine analysis
• fecal occult blood test
• breast mammography in female patients
• chest X-ray
• CT scan of the abdomen and pelvis


Proposed Alternative:

Development of a diagnostic method to determine the site of origin of CUPs based on gene expression profiling.

Gene expression profiling using high-density microarray technology is an extremely powerful tool to determine gene expression patterns that are characteristic of a particular tissue or physiological condition.
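How a tissue-specific expression pattern could be turned into a site-of-origin call can be sketched with a nearest-centroid classifier. This is an assumption chosen for illustration, not a description of the method behind the talk. The training values, class names and function names below are all made up.

```python
from math import dist
from statistics import mean

def centroids(training):
    """Average the expression vectors of each class, gene by gene."""
    return {label: [mean(gene) for gene in zip(*samples)]
            for label, samples in training.items()}

def predict(sample, class_centroids):
    """Return the class whose centroid is closest in Euclidean distance."""
    return min(class_centroids, key=lambda label: dist(sample, class_centroids[label]))

# Hypothetical reference tumors of known origin (3 genes each).
training = {
    "colon":    [[9.1, 2.0, 1.5], [8.7, 2.4, 1.1]],
    "lung":     [[1.2, 8.7, 2.1], [1.6, 9.0, 1.8]],
    "melanoma": [[0.8, 1.1, 9.4], [1.0, 1.4, 8.9]],
}

# Hypothetical tumor of unknown primary.
unknown = [1.1, 1.3, 9.0]
predicted = predict(unknown, centroids(training))
print(predicted)
```

In practice the reference set would come from many tumors of known origin and many more genes; the toy numbers only show the shape of the computation.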


TCLASS®

[Diagram labels: Public Array Data, CUP data, TCLASS®]


TCLASS® Identifies Misclassified NCI-60 Cell Lines

Cell line                                     Breast ER-_MDA_MB435   Melanoma_M14
TCLASS                                        Melanoma               Melanoma

0025_sarcoma                                  1.11                   1.18
0040_melanoma                                 18.12                  22.31
0045_lymphoma                                 0.86                   0.74
0117_repr_breast_tumor                        0.54                   0.82
0152_repr_ovary_tumor                         1.21                   1.28
0160_repr_endometrium_tumor                   0.41                   0.37
0170_cervix                                   1.06                   0.86
0185_repr_prostate_tumor                      0.26                   0.34
0195_repr_testis                              1.37                   1.51
0200_skin                                     6.32                   8.47
0260_digestive_colon_tumor                    0.59                   0.54
0270_digestive_small_intestine                1.07                   0.65
0280_digestive_tumor_gastric                  0.52                   0.33
0317_digestive_liver_tumor                    0.99                   0.87
0322_digestive_pancreas_tumor                 0.63                   0.41
0332_renal_kidney_tumor                       0.43                   0.37
0337_renal_bladder_tclas                      1.01                   0.62
0357_respiratory_lung_tumor                   1.02                   0.88
0357_respiratory_lung_tumor_adeno             0.25                   0.30
0357_respiratory_lung_tumor_neuroendocrine    1.06                   0.75
0360_respiratory_lung_tumor_sq                2.01                   1.36
0385_CNS_glioblastoma                         0.86                   0.62
0410_endocrine_thyroid_tumor                  0.35                   0.33

Pearson correlation: 0.9976
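The table can be checked directly. The sketch below uses the scores exactly as printed on the slide, picks the highest-scoring class for each cell line, and computes the Pearson correlation between the two score columns. It assumes that the reported correlation refers to these two columns; how TCLASS derives the scores themselves is not described in the slides. Variable and function names are illustrative.

```python
from statistics import mean

# (class, score for Breast ER-_MDA_MB435, score for Melanoma_M14), from the slide.
rows = [
    ("0025_sarcoma", 1.11, 1.18),
    ("0040_melanoma", 18.12, 22.31),
    ("0045_lymphoma", 0.86, 0.74),
    ("0117_repr_breast_tumor", 0.54, 0.82),
    ("0152_repr_ovary_tumor", 1.21, 1.28),
    ("0160_repr_endometrium_tumor", 0.41, 0.37),
    ("0170_cervix", 1.06, 0.86),
    ("0185_repr_prostate_tumor", 0.26, 0.34),
    ("0195_repr_testis", 1.37, 1.51),
    ("0200_skin", 6.32, 8.47),
    ("0260_digestive_colon_tumor", 0.59, 0.54),
    ("0270_digestive_small_intestine", 1.07, 0.65),
    ("0280_digestive_tumor_gastric", 0.52, 0.33),
    ("0317_digestive_liver_tumor", 0.99, 0.87),
    ("0322_digestive_pancreas_tumor", 0.63, 0.41),
    ("0332_renal_kidney_tumor", 0.43, 0.37),
    ("0337_renal_bladder_tclas", 1.01, 0.62),
    ("0357_respiratory_lung_tumor", 1.02, 0.88),
    ("0357_respiratory_lung_tumor_adeno", 0.25, 0.30),
    ("0357_respiratory_lung_tumor_neuroendocrine", 1.06, 0.75),
    ("0360_respiratory_lung_tumor_sq", 2.01, 1.36),
    ("0385_CNS_glioblastoma", 0.86, 0.62),
    ("0410_endocrine_thyroid_tumor", 0.35, 0.33),
]

def pearson(a, b):
    """Pearson correlation between two equally long score lists."""
    ma, mb = mean(a), mean(b)
    num = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    den = (sum((x - ma) ** 2 for x in a) * sum((y - mb) ** 2 for y in b)) ** 0.5
    return num / den

mda_mb435 = [r[1] for r in rows]
m14 = [r[2] for r in rows]

# Highest-scoring class per cell line.
top_mda_mb435 = max(rows, key=lambda r: r[1])[0]
top_m14 = max(rows, key=lambda r: r[2])[0]

r = pearson(mda_mb435, m14)
print(top_mda_mb435, top_m14, round(r, 4))
```

Both cell lines score highest on the melanoma class, and the correlation should come out close to the value reported on the slide, which is the basis for reading MDA-MB-435 as misclassified.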


http://www.bio.davidson.edu/people/macampbell/CSU/CaseStudies.html

>600 Publications Mentioning MDA-MB-435 as Mammary Carcinoma in PubMed

Breast Cancer Res Treat. 2007 Jul;104(1):13-9.


www.mastergenix.com


Genomics Revolution

Major gains in knowledge have always been the driving force for innovation, increased productivity and wealth.

- Agricultural revolution
- Industrial revolution
- Digital revolution
- Knowledge-based economy
- Genomic revolution

Digital world: 1, 0
Genetic information is digital: A, C, T, G
The code of life is DIGITAL!


Genomics Revolution

The genomic revolution is producing by far the largest amount of knowledge ever generated by mankind, and this knowledge is about LIFE itself.

Genome centers worldwide produce more than the printed collection of The Library of Congress (140 million books!) every month! Most of this information is freely available.

It is inevitable that this knowledge will be a major driving force for future innovation and for increases in productivity and wealth.


Who is using this knowledge?

Mining gold, oil or minerals is expensive

Mining genomic and proteomic data is not

Most data/knowledge is accessible to everyone

But very few use it!

Even in the US most info is used in a couple of states

Bio-literates - Bio-illiterates


The Past

• 19th century: India and China represented 40% of global trade

Agriculture - Industrialization

• Today about 5-10%
• However, Silicon Valley runs on ICs


Biggest Challenge

The realization that data mining and exploiting this genomic knowledge will be crucial to guarantee future productivity and wealth of western societies.

It is most of all: a Change in Mindset

In addition:

- develop new data analysis and computational tools
- educate a new generation of scientists to become familiar with the extraction of knowledge from such genomics and proteomics data sets


Acknowledgements

• UM: Ruben Martherus, Fons Stassen, Bert Smeets, Patrick Lindsey
• Mastergenix: Hans Bloemendal
