bioinformatics course

44
BIOINFORMATICS. Dr. Etienne Z. GNIMPIEBA [email protected] 1 May 2012

Upload: usd-bioinformatics

Post on 10-May-2015


TRANSCRIPT

Page 1: Bioinformatics Course

BIOINFORMATICS.

Dr. Etienne Z. GNIMPIEBA [email protected]

1 May 2012

Page 2: Bioinformatics Course

Plan

• PART I: Fundamentals
• PART II: Career
• PART III: Applications

Page 3: Bioinformatics Course

BIOINFORMATICS

PART I: FUNDAMENTALS.

Page 4: Bioinformatics Course

Bioinformatics Fundamentals: Plan

• From biotechnology to bioinformatics
• Bioinformatics world
• FOCUS: main areas
• 2 Key concepts between biology and computer science

Page 5: Bioinformatics Course

Bioinformatics Fundamentals: From Biotechnology to Bioinformatics 1

"Any technological application that uses biological systems, living organisms, or derivatives thereof, to make or modify products or processes for specific use." 1

1 The United Nations Convention on Biological Diversity, 2008

Page 6: Bioinformatics Course

Bioinformatics Fundamentals: From Biotechnology to Bioinformatics: application areas 2

[Figure: application areas of biotechnology around Bioinformatics. Labels: Agriculture; Education; Medicine; Bioengineering; Environment contamination; Pharmacogenomics; Gene therapy; DNA Vaccines; Clinical trials; Genetic test (DNA); Biotechnology Training Programs (BTPs); Reduce vulnerability; Increase nutritional quality; Good yield; Reduce dependence on fertilizer, pesticide, agrochemical; Novel substance in crop plant; Bio-process; Biochemical; Biosystems; Organism adapt.; Biodegradation & Bioremediation; Cloning.]

2 Spellex BioScientific, v.2011

Page 7: Bioinformatics Course

Bioinformatics Fundamentals: Bio World 3

BIO => life

• Biology
  – Organism
  – Organ
  – Tissue
  – Cell
  – Metabolites
  – Proteins (enzymes)
  – RNA (TF)
  – Gene
• Clinics (health)
  – Pharmacy
    • Drug
    • Material
  – Hospital
  – Pathology / Organ / specialist
    • Cardio
    • Onco (Cancer)
    • Neuro
    • Pneumo
    • Dermato
• Ecology
  – Ecosystem
  – Adaptation
  – Growth
• Nutrition
  – Food
  – Nutrient
  – Micronutrient
  – Macromolecule
  – Vitamin
  – Molecule
  – Proteins
• Chemistry
  – Reaction
  – Kinetic
  – Compound
  – Compartment
  – Inhibition
  – Activation
• Pharmaceutics
  – Molecule screening & modeling
  – Pharmacogenomics
  – Pharmacokinetics
  – Pharmacodynamics
  – Clinical trial (data management, e-CRF)
• Epidemiology
  – Population
  – Pandemic
  – Epidemic
  – Mortality
  – Morbidity
• Environment
  – Contaminants
  – Factors

3 Etienne Gnimpieba, 2012

Page 8: Bioinformatics Course

Bioinformatics Fundamentals: Challenges 4

[Figure: timeline with ticks at 1950, 1960, 1970, 1980, 1990, 2000 and 2010.
Sequencing milestones: H. sapiens genome; D. melanogaster genome; C. elegans genome; S. cerevisiae genome; HT DNA sequencing; H. influenzae genome; Auto DNA sequencing; Insulin; Ribonuclease; Dayhoff Atlas; Auto protein sequencers; DNA sequencing.
Counts shown along the timeline: 65; 13.5 M; 105,000; 3,900; 859; 568.
Networks: ARPAnet; Email; Internet.
Databases: EMBL, GenBank; PUB; CSD; PIR; Swiss-Prot; FlyBase; PROSITE; PRINTS; TrEMBL; pfam; InterPro; UniProt.
Institutions: EMBnet; NCBI; SIB; EBI.]

• Accumulating mass of data

• Biological systems complexity

• Development of new research interest on DNA

4 Attwood T. K., 2012

Page 9: Bioinformatics Course

• Accumulating mass of data

• Biological systems complexity

• Development of new research interest on DNA

Bioinformatics Fundamentals: Challenges 5

5 MiPPI, 2007

Page 10: Bioinformatics Course

Bioinformatics Fundamentals: Informatics world 6

• Math
  – Calculus
  – Representation tools
  – Modeling & predicting tools
  – Formalisms
  – Exploration tools
  – Optimization tools
  – Theories
  – Inference tools
  – Statistics
  – Graphics (Surfaces, Volumes)
  – Comparison and 3D Matching (Vision, recognition)
• Software
  – Data manipulation tools
  – Programming tools
  – Artificial intelligence tools
  – High computing tools
  – Singling tools
  – Web
• Art & music
  – Design (Human machine interaction)
  – Usefulness (beauty, attractiveness)
  – Philosophy
  – Signal
• Physics
  – Quantum computing
  – Signal treatment tools
  – Biomedical material interaction (electric, optic fiber, Wi-Fi, radio wave)
  – Electrostatics
  – Robotics
• Data Manipulation / Management
  – Creation (Learning, interpreting, deducing, simulation, ...)
  – Acquire / Collect
  – Organize
  – Store
  – Secure
  – Validate (standard, norms, safety)
  – Analyze (statistics, mining)
  – Visualize
  – Share (security, import, export, clean, ...)
  – Archiving
• Process
  – Experiment process design
  – Algorithm
  – Process
  – Workflow
• Material
  – Server
  – Network
  – Storage supports
  – Processor
• Cloud computing

6 Etienne Gnimpieba, 2012

Page 11: Bioinformatics Course

Bioinformatics Fundamentals: Bioinformatics World: some topics 7

Genome Sequence
• Finding Genes in Genomic DNA
• Characterizing Repeats in Genomic DNA
• Duplications in the Genome
• Secondary Structure "Prediction"

Protein Sequence
• Sequence Alignment: Dynamic Programming for Local vs Global Alignment
• Multiple Alignment and Consensus Patterns
• Scoring schemes and Matching statistics (how to tell if a given alignment or match is statistically significant)

Structures
• Basic Protein Geometry and Least-Squares Fitting
• Calculating a helix axis in 3D via fitting a line
• Calculation of Volume and Surface
• Structural Alignment

Databases
• Relational Database Concepts
• Natural Join as "where" selection on cross product
• Array Referencing (perl/dbm)
• Protein Units?
• sequence, structure
• motifs, modules, domains
• Clustering and Trees
• UPGMA
• single-linkage
• multiple linkage
• Parsimony, Maximum likelihood
• The Bias Problem

Genomics
• Expression Analysis
• Large scale cross referencing of information
• Function Classification and Orthologs
• The Genomic vs. Single molecule Perspective
• Genome Comparisons
• Structural Genomics
• Genome Trees

Modeling & Simulation
• Molecular Simulation
• How to measure the change in a vector (gradient)
• Parameter Sets
• Number Density
• Poisson-Boltzmann Equation
• Lattice Models and Simplification

7 Etienne Gnimpieba, 2012
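To make the "Dynamic Programming for Local vs Global Alignment" topic concrete, here is a minimal sketch that computes only the best alignment score (no traceback). The scoring values (match +1, mismatch -1, linear gap -2) and the function name `alignment_score` are illustrative assumptions, not taken from the slides.

```python
def alignment_score(a, b, match=1, mismatch=-1, gap=-2, local=False):
    """Best pairwise alignment score by dynamic programming.

    local=False: global alignment (Needleman-Wunsch style recurrence).
    local=True:  local alignment (Smith-Waterman style recurrence).
    Scoring parameters are illustrative assumptions.
    """
    rows, cols = len(a) + 1, len(b) + 1
    score = [[0] * cols for _ in range(rows)]
    if not local:
        # Global alignment pays for leading gaps; local alignment does not.
        for i in range(1, rows):
            score[i][0] = i * gap
        for j in range(1, cols):
            score[0][j] = j * gap
    best = 0
    for i in range(1, rows):
        for j in range(1, cols):
            diag = score[i - 1][j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            up = score[i - 1][j] + gap      # gap in b
            left = score[i][j - 1] + gap    # gap in a
            cell = max(diag, up, left)
            if local:
                cell = max(cell, 0)         # a local alignment may restart anywhere
                best = max(best, cell)
            score[i][j] = cell
    # Global: score of aligning both full strings. Local: best cell anywhere.
    return best if local else score[-1][-1]


print(alignment_score("ACGT", "AGT"))                          # global score
print(alignment_score("TTTACGTTT", "GGACGGG", local=True))     # local score
```

The only differences between the two modes are the boundary initialization, the floor at zero, and where the final score is read, which is why a single function can cover both.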

Page 12: Bioinformatics Course

Bioinformatics Fundamentals: Bioinformatics World: some topics 8

[Figure: research areas arranged along a biological scale running from Sequence to Physiology (and beyond).]

Biological scale
• DNA Sequence
• Gene & Genome Organization
• Molecular Evolution
• Protein Structure, Folding, Function, & Interaction
• Metabolic Pathways
• Regulation, Signaling Networks
• Physiology & Cell Biology
• Interspecies Interaction
• Ecology & Environment

Methodology & Expertise
• Experiment
• Computation
• Information Technology
• Hardware & Instrumentation
• Mathematical & Physical Models
• Data standards, data representations, and analytical tools for complex biological data

Areas
• Genome sequencing
• Genomic data analysis
• Statistical genetics
• Proteomics
• Protein structure prediction, protein dynamics, protein folding and design
• Functional genomics (microarrays, 2D-PAGE, etc.)
• Dynamical systems modeling
• High-tech field ecology
• Computational ecology

8 SABU M. THAMPI, Dept. of CSE, LBS College of Engineering, Kasaragod, Kerala-671542, 2011

Page 13: Bioinformatics Course

Bioinformatics Fundamentals: Key concept: central dogma of Molecular Biology 9,10

[Figure: DNA → mRNA (Transcription); mRNA → E (Translation); S → P (Catalyse). Other labels: Degradation (shown twice); Gene Repression.]

9 Barbeillini, 2003
10 Etienne Gnimpieba, 2012
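The transcription and translation steps of the central dogma can be sketched in a few lines. This is only an illustration under stated assumptions: the input is taken to be the coding (sense) DNA strand, so transcription reduces to replacing T with U; the standard genetic code is used; and the function names are hypothetical.

```python
from itertools import product

# Standard genetic code, codons enumerated in U, C, A, G order ('*' marks a stop codon).
BASES = "UCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def transcribe(coding_dna):
    """DNA -> mRNA, assuming the coding strand is given (T becomes U)."""
    return coding_dna.upper().replace("T", "U")


def translate(mrna):
    """mRNA -> protein, reading codons from position 0 until a stop codon."""
    protein = []
    for i in range(0, len(mrna) - 2, 3):
        amino_acid = CODON_TABLE[mrna[i:i + 3]]
        if amino_acid == "*":
            break
        protein.append(amino_acid)
    return "".join(protein)


mrna = transcribe("ATGGCCTAA")
print(mrna, translate(mrna))
```

Degradation and gene repression, which the slide also shows, are not modeled here.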

Page 14: Bioinformatics Course

Bioinformatics Fundamentals: Key concept: Lactose Operon (Lac) 11

Genes and their binding sites

In the "induced" state, the lac repressor is NOT bound to the operator site.

In the "repressed" state, the repressor IS bound to the operator.

11 blc.arizona.edu
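The two states above can be written as a tiny Boolean model. This is a deliberately simplified sketch: it assumes the repressor leaves the operator whenever lactose is present (through its inducer) and ignores glucose and any other regulation. The function names are hypothetical.

```python
def repressor_bound(lactose_present: bool) -> bool:
    # Assumption: lactose (via its inducer) releases the repressor from the operator.
    return not lactose_present


def operon_state(repressor_is_bound: bool) -> str:
    # Repressor bound to the operator -> "repressed"; not bound -> "induced".
    return "repressed" if repressor_is_bound else "induced"


for lactose in (False, True):
    print(lactose, operon_state(repressor_bound(lactose)))
```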

Page 15: Bioinformatics Course

Bioinformatics Fundamentals: Summary Part I

Page 16: Bioinformatics Course

BIOINFORMATICS

PART II: CAREER.

Page 17: Bioinformatics Course

Bioinformatics Career

WHO?

WHAT?

Doing by

Page 18: Bioinformatics Course

Bioinformatics Career: Where can you be a bioinformatician? 12

Fundamental research

Development research (product)

Applied research

Use, commercialization, market

• Public institution
  – University (research project, training)
  – Research center (research project)
  – State & Federal agency (FDA, …)
• Companies
  – Pharmaceuticals
  – Biotech
  – Agricultural & food
  – Health
  – Information systems
• International institutions
  – WHO
  – UN
• Owner (your own boss)
  – Contractor (entrepreneur)
  – Consultant

12 Etienne Gnimpieba, 2012

Page 19: Bioinformatics Course

Bioinformatics Career: What do you do in Bioinformatics?

As an informatician, you have a lot of tasks:

• Algorithms
• Databases and information systems
• Web technologies
• Artificial intelligence and soft computing
• Information and computation theory
• Software engineering
• Data mining
• Image processing
• Modeling and simulation
• Signal processing
• Discrete mathematics
• Control and system theory
• Statistics
• Integrative computing
• Database Administration
• DNA computing
• Neural computing
• Evolutionary computing
• Immuno-computing
• Swarm-computing
• Cellular-computing
• Visualization
• Decision making
• Sequence Assembly
• Genomic Sequence Analysis
• Functional genomics
• Genotyping
• Proteomics
• Pharmacogenomics

Page 20: Bioinformatics Course

Bioinformatics Career: How to become a bioinformatician?

Skills Needed
• Database administration and programming skills (SQL Server, Oracle, Sybase, MySQL, CORBA, PERL, Java, C, C++, web scripting)
• Genomic sequence analysis
• Molecular modeling programs
• Biologists and computer scientists
• Skills for data analysis, storage and retrieval
• Skills to filter information and form possible relationships between datasets

Training
• Bachelor
• Master
• MD
• PhD
• High school diploma

Eligibility (biopharmaceutical):
• Life Sciences Graduates
• Computer Sciences Graduates
• Databases Specialists
• Engineering Graduates
• Marketing and Management Graduates
• MDs, RNs and Medical Professionals

Page 21: Bioinformatics Course

Bioinformatics Career: Who does bioinformatics?

More than 100 profile denominations according to: country, company, domain, experience, education profile, competence

From BIO based profile to Informatics based profile

• Bioinformatician
  – Cheminformatician
  – Computational Biologist
  – Gene Analyst
  – Genomic Scientist
  – Molecular Modeler
  – Phylogeneticist
  – Protein Analyst
  – Scientific Curator
  – Structural Analyst
• Biomedical Computer Scientist
• Geneticist
• Computational Biologist
• Biostatistician
• Scientist
• Biomedical Chemist
• Clinical Data Manager
• Molecular Microbiologist
• Software/Database Programmer
• Medical Writer/Technical Writer
• Research Associates and Research Scientists
• Data analyst
• Data designer

Page 22: Bioinformatics Course

Bioinformatics Career: Career profile: an example

An example of a bioinformatician work profile

Page 23: Bioinformatics Course

Bioinformatics Career: Summary Part II 13

[Figure: career map with two axes, Informatics and Bio/life, and two activity areas, Data manipulation and Modeling & learning SB.]

• Cloud
• Databank
• Database
• Data designer
• Information manipulation
• Create/collect information
• Statistic analysis
• Data inference, learning
• Model from data
• Model from SB
• Large scale model

Job titles placed on the map:
• Sr. data manager
• Sr. Bioinformatics data scientist
• Data analyst
• Data programmer
• Sr. computational biologist
• Bioinformatics data Eng.
• Bioinformatics manager
• Bioinformatics scientist
• Bioinformatics analyst
• System biology Eng.

13 Etienne Gnimpieba, 2012

Page 24: Bioinformatics Course

BIOINFORMATICS

PART III: APPLICATIONS.

Page 25: Bioinformatics Course

Bioinformatics Applications: Overview 14

[Figure labels: CORE; Tools (repeated around the core); Ad Hoc Interface (repeated, one per field); Biology; Computer Science; Molecular Nutrition; Pharmacology; Medicine; Ecology.]

14 COSBI Report, 2010

Page 26: Bioinformatics Course

Bioinformatics Applications: Small synopsis view of bioinformatics 15

15 Korean Bioinformation Center, 2010

Page 27: Bioinformatics Course

Bioinformatics Applications: Informatician's view of bioinformatics

• Data manipulation
  – Data analysis
  – Designing database and databank
  – Management (collect, store, explore, secure)
  – Inference / mining
  – Statistics
• Model design
  – From biological process to mathematical formalism
  – Model checking and validation
• Program building
  – Data analysis tools (implement algorithm)
  – Integration tools (data, program, model)
  – Modeling & Simulink tools
  – Data protection tools
  – …

Page 28: Bioinformatics Course

Bioinformatics Applications: Example 1

Data Manipulation

Molecular online tools and Bioextract Server.

Page 29: Bioinformatics Course

Lab #1: Molecular online tools and server 16

Context

0. Specification & aims

Statement of problem / Case study: The FXN gene provides instructions for making a protein called frataxin. This protein is found in cells throughout the body, with the highest levels in the heart, spinal cord, liver, pancreas, and muscles used for voluntary movement (skeletal muscles). Within cells, frataxin is found in energy-producing structures called mitochondria. Although its function is not fully understood, frataxin appears to help assemble clusters of iron and sulfur molecules that are critical for the function of many proteins, including those needed for energy production. Mutations in the FXN gene cause Friedreich ataxia, a genetic condition that affects the nervous system and causes movement problems. Most people with Friedreich ataxia begin to experience the signs and symptoms of the disorder around puberty.

Keywords:
• Bio: FXN, Frataxin, pancreatic cancer, CDKN4
• Math: HMM
• Informatics: programming, bioinformatics tools, getting and exporting data

Biological Hypothesis: Reduced expression of frataxin is the cause of Friedreich's ataxia (FRDA), a lethal neurodegenerative disease. What about liver cancer?

Aim: The purpose of this experiment is to introduce online tools for the biological exploration of the human genome, using a simulated application (the FXN gene and pancreatic cancer), in order to understand how a researcher can identify and cross-reference the biological knowledge available in data banks.

Resolution process

T1. Genome exploration. Objective: use Ensembl online tools to locate FXN on the human genome and identify the genes implicated in pancreatic cancer, then retrieve the appropriate data (sequences) in FASTA and BLAST formats.
• T1.1. Locate a given gene on the human genome
• T1.2. Get a genomic sequence from NCBI
• T1.3. Get the protein information and sequence from EBI
• T1.4. Save the exported sequence data in the data folder

T2. Sequence manipulation. Objective: find similar sequences using BLAST tools and align the given sequences.
• T2.1. Find similar sequences using the BLAST tool
• T2.2. Align the generated sequences with the ClustalW tool
• T2.3. Visualize the result as a phylogenetic tree in Jalview

T3. Bioextract server. Objective: use a server tool to optimize the data manipulation process, applied on the Bioextract server.
• T3.1. Server initialization
• T3.2. Pancreatic cancer & Frataxin (FXN)
• T3.3. Mapping, alignment
• T3.4. Workflow save & reuse

Acquired skills
Online and server tools:
- Query biological DB (FASTA, HTML, txt, figure formats)
- Sequence tools (protein and gene): mapping (tmap), alignment (clustalw2)
- Manage data results (select, keep, map, export)
- Build and reuse workflows

[Figure labels: FXN on chromosome 9; Frataxin molecule structure (pymol); Pancreas anatomy; Pancreatic cancer; Biological DB; Tools.]

Conclusion: ?

16 Korean Bioinformation Center, 2010
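Several lab tasks end with sequences exported in FASTA format (for example T1.4). As a minimal sketch of what handling such a file involves, the following standard-library code parses FASTA text into a dictionary and writes it back. The function names and the sample records are hypothetical; real work would more likely rely on an established library.

```python
def parse_fasta(text):
    """Parse FASTA text into {record_id: sequence}.

    The record id is the first word after '>'; sequence lines are concatenated.
    """
    records = {}
    name = None
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            header = line[1:].split()
            name = header[0] if header else ""
            records[name] = []
        elif name is not None:
            records[name].append(line)
    return {record_id: "".join(parts) for record_id, parts in records.items()}


def format_fasta(records, width=60):
    """Write {record_id: sequence} as FASTA text, wrapping lines at `width`."""
    lines = []
    for record_id, seq in records.items():
        lines.append(f">{record_id}")
        lines.extend(seq[i:i + width] for i in range(0, len(seq), width))
    return "\n".join(lines) + "\n"


# Made-up example records, not real FXN sequence data.
sample = ">seq1 demo record\nACGT\nACGT\n>seq2\nTTTT\n"
print(parse_fasta(sample))
```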

Page 30: Bioinformatics Course

Bioinformatics Applications: Example 2

Biostatistics: gene expression data analysis

Gene expression data: Microarray, NGS & qRT-PCR

[1] Saffroy & al., 2004
[2] Chango & al., 2008

Page 31: Bioinformatics Course

Bioinformatics Applications: Biostatistics: gene expression data analysis

Gene expression data (microarray, NGS) analysis process

• Biological question: differentially expressed genes, sample class prediction, etc.
• Experimental design
• Microarray experiment
• Image analysis
• Normalization
• Estimation, Testing, Clustering, Discrimination
• Biological verification and interpretation
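Two of the steps in this process, Normalization and Testing, can be illustrated with standard-library Python. The sketch below assumes one common choice for each (log2 transform with median centering, and a Welch t-statistic per gene); the slides do not say which methods the course used, and the numbers are made up.

```python
import math
from statistics import mean, median, variance


def log2_median_normalize(intensities):
    """Normalization step: log2-transform one sample and center it on its median."""
    logged = [math.log2(v) for v in intensities]
    center = median(logged)
    return [v - center for v in logged]


def welch_t(group_a, group_b):
    """Testing step: Welch t-statistic for one gene measured in two sample groups."""
    standard_error = math.sqrt(
        variance(group_a) / len(group_a) + variance(group_b) / len(group_b)
    )
    return (mean(group_a) - mean(group_b)) / standard_error


# Illustrative values only.
print(log2_median_normalize([1, 2, 4]))
print(welch_t([2.0, 4.0, 6.0], [1.0, 2.0, 3.0]))
```

In a real analysis the statistic would be computed for every gene and followed by a multiple-testing correction, which this sketch leaves out.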

Page 32: Bioinformatics Course

Bioinformatics Applications: Example 3

Model design

Mathematical modeling of molecular nutrition

From food to molecule: folate absorption, metabolism, and distribution

Page 33: Bioinformatics Course

Bioinformatics Applications: Model design: Molecular nutrition and nutrigenomics 17

17 Achuthsankar S. Nair, 2007

Page 34: Bioinformatics Course

Bioinformatics Applications: Example 2

Model design

Mathematical modeling of biological systems

Folate-mediated one-carbon metabolism: MTHFR (gene) mutation and carcinogenesis

Page 35: Bioinformatics Course

Bioinformatics Applications: Mathematical modeling of Biological systems 18

Folate metabolism (folic acid or vitamin B9) and pathogenesis

18 J. M. Scott, 1994

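Formalization of the model of metabolic networks

[Figure: network S with metabolites m_i and m_j linked by reactions r_ij(E_ij, V_ij), r_ji(E_ji, V_ji) and r_ii(E_ii, V_ii).]

\[ v_{ij} = f_{ij}(t, m, P_{ij}) \]

\[ \frac{dm(t,P)}{dt} = V\big(t, m(t,P), P\big), \qquad m(t_0, P) = m_0(P) \]

Example: \( \text{Homocysteine} \xrightarrow{k_c} \text{Methionine} \)

\[ \frac{d[\text{Methionine}]}{dt} = k_c \cdot [\text{Homocysteine}] \]

\[ \frac{d[\text{Homocysteine}]}{dt} = -k_c \cdot [\text{Homocysteine}] \]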
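The homocysteine to methionine example can be simulated numerically. The sketch below assumes first-order (mass-action) kinetics with rate constant k_c, so homocysteine is consumed at the same rate methionine is produced, and it uses a simple forward Euler scheme. The initial concentrations, k_c and the step size are illustrative assumptions, not values from the slides.

```python
def simulate(hcy0, met0, k_c, t_end, dt=0.001):
    """Forward Euler integration of
        d[Met]/dt = k_c * [Hcy]
        d[Hcy]/dt = -k_c * [Hcy]
    Returns the final (homocysteine, methionine) concentrations.
    """
    hcy, met = hcy0, met0
    steps = int(round(t_end / dt))
    for _ in range(steps):
        flux = k_c * hcy      # reaction rate at the current time step
        hcy -= flux * dt      # homocysteine is consumed
        met += flux * dt      # methionine is produced
    return hcy, met


# Illustrative parameters only.
hcy, met = simulate(hcy0=10.0, met0=0.0, k_c=0.5, t_end=2.0, dt=0.0001)
print(hcy, met)
```

Because this toy system has the closed-form solution [Hcy](t) = [Hcy](0)·exp(-k_c·t), the numerical result can be checked directly, which is not possible for the full metabolic network.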

[Fig. 6: Uracil methylation. Panels: dUMP (µM), dTMP (µM), dUMP/dTMP and dTMP/dUMP, each plotted against Time (Hours) from 0 to 15.]

Page 36: Bioinformatics Course

Bioinformatics Applications: Example 4

Model design

Drug-DNA interaction

Page 37: Bioinformatics Course

Bioinformatics Applications: Model design: drug-DNA interaction 19

Inputs: Ligand (drug molecule); Protein/DNA

• Evaluate the uploaded molecule through Lipinski's Rule of Five
• Predict the possible target protein allosteric site
• Target protein ready for docking
• Docking & Scoring

19 B. Jayaram, 2011
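The Lipinski's Rule of Five screening step can be sketched as a simple filter over precomputed molecular descriptors. The thresholds are the commonly cited ones (molecular weight at most 500 Da, logP at most 5, at most 5 hydrogen-bond donors, at most 10 hydrogen-bond acceptors), and tolerating one violation is a common convention assumed here. The `Molecule` class and the descriptor values are hypothetical; computing descriptors from a structure would need a cheminformatics toolkit.

```python
from dataclasses import dataclass


@dataclass
class Molecule:
    """Precomputed descriptors for one candidate ligand (hypothetical container)."""
    mol_weight: float        # molecular weight in daltons
    log_p: float             # octanol-water partition coefficient
    h_bond_donors: int
    h_bond_acceptors: int


def lipinski_violations(m):
    """Return the names of the Rule of Five criteria that the molecule violates."""
    checks = {
        "molecular weight > 500 Da": m.mol_weight > 500,
        "logP > 5": m.log_p > 5,
        "H-bond donors > 5": m.h_bond_donors > 5,
        "H-bond acceptors > 10": m.h_bond_acceptors > 10,
    }
    return [name for name, violated in checks.items() if violated]


def passes_rule_of_five(m, max_violations=1):
    # Assumed convention: at most one violation is tolerated.
    return len(lipinski_violations(m)) <= max_violations


# Illustrative descriptor values only.
small = Molecule(mol_weight=180.2, log_p=1.2, h_bond_donors=1, h_bond_acceptors=4)
large = Molecule(mol_weight=1202.6, log_p=3.0, h_bond_donors=5, h_bond_acceptors=12)
print(passes_rule_of_five(small), lipinski_violations(large))
```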

Page 38: Bioinformatics Course

Bioinformatics Applications: Example 5

Model design

3D Modeling / simulation in biology

Page 39: Bioinformatics Course

Bioinformatics Applications: Model design: 3D Modeling 20, 21

Google Body browser 20
E-cell project 21

20 Google, 2011
21 E-Cell.org, 2011

Page 40: Bioinformatics Course

Bioinformatics Applications: Example 6

Model design

Cancer tumor model

Page 42: Bioinformatics Course

Bioinformatics Applications: Example 7

Model design

Epidemiology: HIV spread

Page 44: Bioinformatics Course

THANKS.
