bioinformatics course
TRANSCRIPT
Plan
• PART I: Fundamentals
• PART II: Career
• PART III: Applications
BIOINFORMATICS
PART I
FUNDAMENTALS.
• From biotechnology to bioinformatics
• Bioinformatics world
• FOCUS: main areas
• 2 Key concepts between biology and computer science
Bioinformatics Fundamentals
Plan
Biotechnology: "Any technological application that uses biological systems, living organisms, or derivatives thereof, to make or modify products or processes for specific use." 1
Bioinformatics Fundamentals
From Biotechnology to Bioinformatics 1
1 The United Nations Convention on Biological Diversity, 2008
[Figure: application areas of biotechnology. Labels recovered from the diagram:]
• Agriculture
• Education
• Medicine
• Bioinformatics
• Pharmacogenomics
• Gene therapy
• DNA Vaccines
• Clinical trials
• Reduce vulnerability
• Increase nutritional quality
• Biotechnology Training Programs (BTPs)
• Genetic test (DNA)
• Good yield
• Reduce dependence on fertilizer, pesticide, agrochemical
• Novel substance in crop plant
• Bio-process
• Biochemical
• Biosystems
• Organism adapt.
• Environment contamination
• Bioengineering
• Biodegradation & Bioremediation
• Cloning
Bioinformatics Fundamentals
From Biotechnology to Bioinformatics: application areas 2
2 Spellex BioScientific, v.2011
BIO => life
• Biology
  – Organism
  – Organ
  – Tissue
  – Cell
  – Metabolites
  – Proteins (enzymes)
  – RNA (TF)
  – Gene
• Clinics (health)
  – Pharmacy
    • Drug
    • Material
  – Hospital
  – Pathology / Organ / specialist
    • Cardio
    • Onco (Cancer)
    • Neuro
    • Pneumo
    • Dermato
• Ecology
  – Ecosystem
  – Adaptation
  – Growth
• Nutrition
  – Food
  – Nutrient
  – Micronutrient
  – Macromolecule
  – Vitamin
  – Molecule
  – Proteins
• Chemistry
  – Reaction
  – Kinetics
  – Compound
  – Compartment
  – Inhibition
  – Activation
• Pharmaceutics
  – Molecule screening & modeling
  – Pharmacogenomics
  – Pharmacokinetics
  – Pharmacodynamics
  – Clinical trial (data management, e-CRF)
• Epidemiology
  – Population
  – Pandemic
  – Epidemic
  – Mortality
  – Morbidity
• Environment
  – Contaminants
  – Factors
Bioinformatics Fundamentals
Bio World 3
3 Etienne Gnimpieba, 2012
[Figure: timeline of bioinformatics milestones. Labels recovered from the diagram:]
• Time axis: 1950, 1960, 1970, 1980, 1990, 2000, 2010
• Sequencing: Insulin, Ribonuclease, Dayhoff Atlas, Auto protein sequencers, DNA sequencing, Auto DNA sequencing, HT DNA sequencing
• Genomes: H. influenzae genome, S. cerevisiae genome, C. elegans genome, D. melanogaster genome, H. sapiens genome
• Numeric labels (as extracted): 65, 13.5 M, 105,000, 3,900, 859, 568
• Networks: ARPAnet, Email, Internet
• Databases: EMBL, GenBank, PUB, CSD, PIR, Swiss-Prot, FlyBase, PROSITE, PRINTS, TrEMBL, Pfam, InterPro, UniProt
• Organizations: EMBnet, NCBI, SIB, EBI
• Accumulating mass of data
• Biological systems complexity
• Development of new research interest on DNA
Bioinformatics Fundamentals
Challenges 4
4 Attwood T. K., 2012
• Accumulating mass of data
• Biological systems complexity
• Development of new research interest on DNA
Bioinformatics Fundamentals
Challenges 5
5 MiPPI, 2007
• Math
  – Calculus
  – Representation tools
  – Modeling & predicting tools
  – Formalisms
  – Exploration tools
  – Optimization tools
  – Theories
  – Inference tools
  – Statistics
  – Graphics (Surfaces, Volumes)
  – Comparison and 3D Matching (Vision, recognition)
• Software
  – Data manipulation tools
  – Programming tools
  – Artificial intelligence tools
  – High computing tools
  – Singling tools
  – Web
• Art & music
  – Design (Human machine interaction)
  – Usefulness (beauty, attractiveness)
  – Philosophy
  – Signal
• Physics
  – Quantum computing
  – Signal treatment tools
  – Biomedical material interaction (electric, optic fiber, Wi-Fi, radio wave)
  – Electrostatics
  – Robotics
• Data Manipulation / Management
  – Creation (Learning, interpreting, deducing, simulation, ..)
  – Acquire / Collect
  – Organize
  – Store
  – Secure
  – Validate (standard, norms, safety)
  – Analyze (statistics, mining)
  – Visualize
  – Share (security, import, export, clean, …)
  – Archiving
• Process
  – Experiment process design
  – Algorithm
  – Process
  – Workflow
• Material
  – Server
  – Network
  – Storage supports
  – Processor
• Cloud computing
Bioinformatics Fundamentals
Informatics world 6
6 Etienne Gnimpieba, 2012
Genome Sequence
• Finding Genes in Genomic DNA
• Characterizing Repeats in Genomic DNA
• Duplications in the Genome
• Secondary Structure "Prediction"
Protein Sequence
• Sequence Alignment
• Dynamic Programming for Local vs Global Alignment
• Multiple Alignment and Consensus Patterns
• Scoring schemes and Matching statistics (how to tell if a given alignment or match is statistically significant)
Structures
• Basic Protein Geometry and Least-Squares Fitting
• Calculating a helix axis in 3D via fitting a line
• Calculation of Volume and Surface
• Structural Alignment
Databases
• Relational Database Concepts
• Natural Join as "where" selection on cross product
• Array Referencing (perl/dbm)
• Protein Units?
• sequence, structure
• motifs, modules, domains
• Clustering and Trees
• UPGMA
• single-linkage
• multiple linkage
• Parsimony, Maximum likelihood
• The Bias Problem
Genomics
• Expression Analysis
• Large scale cross referencing of information
• Function Classification and Orthologs
• The Genomic vs. Single molecule Perspective
• Genome Comparisons
• Structural Genomics
• Genome Trees
Modeling & Simulation
• Molecular Simulation
• How to measure the change in a vector (gradient)
• Parameter Sets
• Number Density
• Poisson-Boltzmann Equation
• Lattice Models and Simplification
Bioinformatics Fundamentals
Bioinformatics World: some topics 7
7 Etienne Gnimpieba, 2012
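The topic "Dynamic Programming for Local vs Global Alignment" can be illustrated with a small sketch. The code below is only an illustration, not material from the course: it computes a Needleman-Wunsch style global score and a Smith-Waterman style local score with a linear gap penalty. The function names and the scoring values (match +1, mismatch -1, gap -2) are assumptions chosen for the example.

```python
def global_alignment_score(a: str, b: str, match: int = 1,
                           mismatch: int = -1, gap: int = -2) -> int:
    """Global (Needleman-Wunsch style) score; scoring values are illustrative."""
    rows, cols = len(a) + 1, len(b) + 1
    score = [[0] * cols for _ in range(rows)]
    # Aligning a prefix against the empty string costs one gap per character.
    for i in range(1, rows):
        score[i][0] = i * gap
    for j in range(1, cols):
        score[0][j] = j * gap
    for i in range(1, rows):
        for j in range(1, cols):
            diag = score[i - 1][j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            up = score[i - 1][j] + gap      # gap in b
            left = score[i][j - 1] + gap    # gap in a
            score[i][j] = max(diag, up, left)
    return score[-1][-1]


def local_alignment_score(a: str, b: str, match: int = 1,
                          mismatch: int = -1, gap: int = -2) -> int:
    """Local (Smith-Waterman style) score: cells never drop below zero."""
    rows, cols = len(a) + 1, len(b) + 1
    score = [[0] * cols for _ in range(rows)]
    best = 0
    for i in range(1, rows):
        for j in range(1, cols):
            diag = score[i - 1][j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            up = score[i - 1][j] + gap
            left = score[i][j - 1] + gap
            score[i][j] = max(0, diag, up, left)
            best = max(best, score[i][j])
    return best
```

The only differences between the two functions are the boundary initialization and the floor at zero, which is what lets a local alignment ignore poorly matching flanking regions.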
Bioinformatics Fundamentals
Bioinformatics World: some topics 8
8 SABU M. THAMPI, Dept. of CSE, LBS College of Engineering, Kasaragod, Kerala-671542, 2011
[Figure: bioinformatics topics. Labels recovered from the diagram:]
Biological scale, from Sequence to Physiology (and beyond)
• DNA Sequence
• Gene & Genome Organization
• Molecular Evolution
• Protein Structure, Folding, Function, & Interaction
• Metabolic Pathways
• Regulation Signaling Networks
• Physiology & Cell Biology
• Interspecies Interaction
• Ecology & Environment
Methodology & Expertise
• Experiment
• Computation
• Information Technology
• Hardware & Instrumentation
• Mathematical & Physical Models
• Data standards, data representations, and analytical tools for complex biological data
Areas shown in the figure
• Genome sequencing
• Genomic data analysis
• Statistical genetics
• Proteomics
• Protein structure prediction, protein dynamics, protein folding and design
• Functional genomics (microarrays, 2D-PAGE, etc.)
• Dynamical systems modeling
• High-tech field ecology
• Computational ecology
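[Figure: central dogma of molecular biology. Labels recovered from the diagram:]
• DNA → mRNA (Transcription)
• mRNA → E (Translation)
• Degradation (shown for mRNA and for E)
• Gene Repression
• E catalyses S → P
Bioinformatics Fundamentals
Key concept: central dogma of Molecular Biology 9, 10
9 Barbeillini, 2003; 10 Etienne Gnimpieba, 2012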
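The transcription and translation steps of the central dogma can be mimicked on strings. The following is a minimal sketch, not part of the original slides: it assumes the input is the coding (sense) DNA strand, uses the standard genetic code, and stops at the first stop codon. The function names `transcribe` and `translate` are hypothetical.

```python
# Standard genetic code, built from the conventional T, C, A, G codon ordering.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    b1 + b2 + b3: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, b1 in enumerate(BASES)
    for j, b2 in enumerate(BASES)
    for k, b3 in enumerate(BASES)
}


def transcribe(coding_dna: str) -> str:
    """DNA -> mRNA, assuming the coding strand is given (T becomes U)."""
    return coding_dna.upper().replace("T", "U")


def translate(mrna: str) -> str:
    """mRNA -> protein (one-letter codes), reading frame 0, until a stop codon."""
    dna = mrna.upper().replace("U", "T")
    protein = []
    for pos in range(0, len(dna) - 2, 3):
        amino_acid = CODON_TABLE[dna[pos:pos + 3]]
        if amino_acid == "*":  # stop codon ends translation
            break
        protein.append(amino_acid)
    return "".join(protein)


mrna = transcribe("ATGGCCTAA")   # made-up sequence for illustration
protein = translate(mrna)
```

Degradation and repression, also shown in the figure, are rate processes and are not represented in this string-level sketch.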
Genes and their binding sites
In the "induced" state, the lac repressor is NOT bound to the operator site.
In the "repressed" state, the repressor IS bound to the operator.
Bioinformatics Fundamentals
Key concept: Lactose Operon (Lac) 11
11 blc.arizona.edu
Bioinformatics Fundamentals
Summary Part I
BIOINFORMATICS
PART II
Career.
Bioinformatics Career
WHO?
WHAT? Doing by
Fundamental research
Development research (product)
Applied research
Use, commercialization, market
• Public institution
  – University (research project, training)
  – Research center (research project)
  – State & Federal agency (e.g. FDA)
• Companies
  – Pharmaceuticals
  – Biotech
  – Agricultural & food
  – Health
  – Information systems
• International institutions
  – WHO
  – UN
• Owner (your own boss)
  – Contractor (entrepreneur)
  – Consultant
Bioinformatics Career
Where can you be a bioinformatician? 12
12 Etienne Gnimpieba, 2012
• Algorithms
• Databases and information systems
• Web technologies
• Artificial intelligence and soft computing
• Information and computation theory
• Software engineering
• Data mining
• Image processing
• Modeling and simulation
• Signal processing
• Discrete mathematics
• Control and system theory
• Statistics
• Integrative computing
• Database Administration
• DNA computing
• Neural computing
• Evolutionary computing
• Immuno-computing
• Swarm-computing
• Cellular-computing
• Visualization
• Decision making
• Sequence Assembly
• Genomic Sequence Analysis
• Functional genomics
• Genotyping
• Proteomics
• Pharmacogenomics
As an informatician, you have many tasks
Bioinformatics Career
What do you do in Bioinformatics?
Skills Needed
• Database administration and programming skills (SQL Server, Oracle, Sybase, MySQL, CORBA, Perl, Java, C, C++, web scripting)
• Genomic sequence analysis
• Molecular modeling programs
• Working with biologists and computer scientists
• Skills for data analysis, storage, and retrieval
• Skills to filter information and form possible relationships between datasets
Training
• Bachelor
• Master
• MD
• PhD
• High school diploma
Eligibility (biopharmaceutical):
• Life Sciences Graduates
• Computer Sciences Graduates
• Databases Specialists
• Engineering Graduates
• Marketing and Management Graduates
• MDs, RNs, and Medical Professionals
Bioinformatics Career
How to become a bioinformatician?
• Bioinformatician
  – Cheminformatician
  – Computational Biologist
  – Gene Analyst
  – Genomic Scientist
  – Molecular Modeler
  – Phylogeneticist
  – Protein Analyst
  – Scientific Curator
  – Structural Analyst
• Biomedical Computer Scientist
• Geneticist
• Computational Biologist
More than 100 profile names, depending on country, company, domain, experience, education profile, and competence
From BIO-based profile to Informatics-based profile
• Biostatistician
• Scientist
• Biomedical Chemist
• Clinical Data Manager
• Molecular Microbiologist
• Software/Database Programmer
• Medical Writer/Technical Writer
• Research Associates and Research Scientists
• Data analyst
• Data designer
Bioinformatics Career
Who does bioinformatics?
An example of a bioinformatician work profile
Bioinformatics Career
Career profile: an example
• Cloud
• Databank
• Database
• Data designer
• Information manipulation
• Create/collect information
• Statistical analysis
• Data inference, learning
• Model from data
• Model from SB
• Large scale model
[Figure: career profiles placed between Informatics, Data manipulation, Modeling & learning SB, and Bio/life. Labels recovered from the diagram:]
• Sr. data manager
• Sr. Bioinformatics data scientist
• Data analyst
• Data programmer
• Sr. computational biologist
• Bioinformatics data Eng.
• Bioinformatics manager
• Bioinformatics scientist
• Bioinformatics analyst
• System biology Eng.
Bioinformatics Career
Summary Part II 13
13 Etienne Gnimpieba, 2012
BIOINFORMATICS
PART III
Applications.
Bioinformatics Applications
[Figure: overview. Labels recovered from the diagram: CORE; Tools (repeated around the core); Ad Hoc Interface (repeated, one per domain); domains: Biology, Computer Science, Molecular Nutrition, Pharmacology, Medicine, Ecology]
Overview 14
14 COSBI Report, 2010
Bioinformatics Applications
Small synopsis view of bioinformatics 15
15 Korean Bioinformation Center, 2010
• Data manipulation
  – Data analysis
  – Designing database and databank
  – Management (collect, store, explore, secure)
  – Inference / mining
  – Statistics
• Model design
  – From biological process to mathematical formalism
  – Model checking and validation
• Program building
  – Data analyzing tools (implement algorithm)
  – Integration tools (data, program, model)
  – Modeling & Simulink tools
  – Data protection tools
  – …
Bioinformatics Applications
Informatician's view of bioinformatics
Molecular online tools and Bioextract Server.
Data Manipulation
Bioinformatics Applications
Example 1
Resolution process
Context
0. Specification & aims
Lab #1
Statement of problem / Case study: The FXN gene provides instructions for making a protein called frataxin. This protein is found in cells throughout the body, with the highest levels in the heart, spinal cord, liver, pancreas, and muscles used for voluntary movement (skeletal muscles). Within cells, frataxin is found in energy-producing structures called mitochondria. Although its function is not fully understood, frataxin appears to help assemble clusters of iron and sulfur molecules that are critical for the function of many proteins, including those needed for energy production. Mutations in the FXN gene cause Friedreich ataxia, a genetic condition that affects the nervous system and causes movement problems. Most people with Friedreich ataxia begin to experience the signs and symptoms of the disorder around puberty.
Molecular online tools and server 16
Keywords:
Bio: FXN, Frataxin, pancreatic cancer, CDKN4
Math: HMM
Informatics: programming, bioinformatics tools, getting and exporting data
16 Korean Bioinformation Center, 2010
Conclusion: ?
Reduced expression of frataxin is the cause of Friedreich's ataxia (FRDA), a lethal neurodegenerative disease. How about liver cancer?
Aim: The purpose of this experiment is to introduce online tools for exploring the human genome. We simulate an application (the FXN gene and pancreatic cancer) to understand how a researcher can identify and cross-reference the biological knowledge available in data banks.
T1. Genome exploration. Objective: use Ensembl online tools to locate FXN on the human genome and identify the genes implicated in pancreatic cancer, then retrieve the appropriate data (sequences) in FASTA and BLAST formats.
T2. Sequence manipulation. Objective: find similar sequences using BLAST tools and align the given sequences.
Acquired skills
Online and server tools:
- Query biological DB (FASTA, HTML, txt, figure formats)
- Sequence tools (protein and gene): Mapping (tmap), Alignment (clustalw2)
- Manage data results (select, keep, map, export)
- Build and reuse workflows
Biological Hypothesis
[Figure labels: FXN on chromosome 9; Frataxin molecule structure (pymol); Pancreatic cancer; Pancreas anatomy; ?; Biological DB; Tools]
T1.1. Locate a given gene on the human genome
T1.2. Get a genomic sequence from NCBI
T1.3. Get the protein information and sequence from EBI
T1.4. Save the exported sequence data in the data folder
T2.1. Find similar sequences using the BLAST tool
T2.2. Align the generated sequences with the ClustalW tool
T2.3. Visualize the result as a phylogenetic tree in Jalview
T3. Bioextract server. Objective: use a server tool to optimize the data manipulation process, applied on the Bioextract server.
T3.1. Server initialization
T3.2. Pancreatic cancer & Frataxin (FXN)
T3.3. Mapping, alignment
T3.4. Workflow save & reuse
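The lab tasks export sequences in FASTA format and keep them in a data folder. As a minimal sketch of what handling such an export might look like, the code below parses FASTA text into a dictionary and computes GC content. The records are made-up placeholders, not real FXN data, and the function names are assumptions for illustration.

```python
from typing import Dict, Iterable


def parse_fasta(lines: Iterable[str]) -> Dict[str, str]:
    """Map each record identifier (first word after '>') to its sequence."""
    parts_by_id: Dict[str, list] = {}
    current = None
    for raw in lines:
        line = raw.strip()
        if not line:
            continue
        if line.startswith(">"):
            words = line[1:].split()
            current = words[0] if words else ""
            parts_by_id[current] = []
        elif current is not None:
            # Sequence lines may be wrapped; collect and join them later.
            parts_by_id[current].append(line)
    return {name: "".join(parts) for name, parts in parts_by_id.items()}


def gc_content(sequence: str) -> float:
    """Fraction of G and C bases; 0.0 for an empty sequence."""
    sequence = sequence.upper()
    if not sequence:
        return 0.0
    return (sequence.count("G") + sequence.count("C")) / len(sequence)


# Hypothetical FASTA text standing in for an exported file.
example = """>seq1 made-up record, not real FXN data
ATGCGC
GGTA
>seq2 another made-up record
TTTT
""".splitlines()

records = parse_fasta(example)
```

For a real export, the same function could be given an open file handle, since it only needs an iterable of lines.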
Biostatistics: gene expression data analysis
Gene expression data: Microarray, NGS & qRT-PCR
[1] Saffroy et al., 2004 [2] Chango et al., 2008
Bioinformatics Applications
Example 2
[Figure: gene expression data (microarray, NGS) analysis process. Labels recovered from the diagram:]
• Biological question: differentially expressed genes, sample class prediction, etc.
• Experimental design
• Microarray experiment
• Image analysis
• Normalization
• Estimation, Testing, Clustering, Discrimination
• Biological verification and interpretation
Bioinformatics Applications
Biostatistics: gene expression data analysis
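The Estimation and Testing steps of the gene expression analysis process can be made concrete with a tiny example. The sketch below is an assumption-laden illustration, not the course's method: it estimates a log2 fold change for one gene and computes a Welch t statistic between two groups. The expression values are invented, and a real analysis would also need normalization and multiple-testing correction.

```python
import math
from statistics import mean, variance


def log2_fold_change(control: list, treated: list) -> float:
    """Estimation: log2 ratio of the group means."""
    return math.log2(mean(treated) / mean(control))


def welch_t_statistic(control: list, treated: list) -> float:
    """Testing: Welch's t statistic (unequal variances allowed)."""
    standard_error = math.sqrt(
        variance(control) / len(control) + variance(treated) / len(treated)
    )
    return (mean(treated) - mean(control)) / standard_error


# Invented, already-normalized expression values for a single gene.
control_values = [100.0, 110.0, 90.0]
treated_values = [200.0, 220.0, 180.0]

lfc = log2_fold_change(control_values, treated_values)
t_stat = welch_t_statistic(control_values, treated_values)
```

Converting the t statistic to a p-value requires the t distribution, which is left out here to keep the sketch dependency-free.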
Mathematical modeling of molecular nutrition
Model design
Bioinformatics Applications
Example 3
From food to molecule: folate absorption, metabolism, and distribution
Bioinformatics Applications
Model design: Molecular nutrition and nutrigenomics 17
17 Achuthsankar S. Nair, 2007
Mathematical modeling of biological systems
Model design
Bioinformatics Applications
Example 2
Folate-mediated one-carbon metabolism: MTHFR (gene) mutation and cancer genesis
Folate metabolism (folic acid or vitamin B9) and pathogenesis
Bioinformatics Applications
Mathematical modeling of biological systems 18
18 J. M. Scott, 1994
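Formalization of the model of metabolic networks
[Figure: metabolic network S with metabolites m_i and m_j linked by reactions r_ij(E_ij, V_ij), r_ji(E_ji, V_ji), and r_ii(E_ii, V_ii)]
$$v_{ij} = f_{ij}(t, m, P_{ij})$$
$$\frac{dm(t, P)}{dt} = V\big(t, m(t, P), P\big), \qquad m(t_0, P) = m_0(P)$$
Reaction r_c, with rate constant k_c:
$$\text{Homocysteine} \xrightarrow{k_c} \text{Methionine}$$
$$\frac{d[\text{Methionine}]}{dt} = k_c\,[\text{Homocysteine}], \qquad \frac{d[\text{Homocysteine}]}{dt} = -k_c\,[\text{Homocysteine}]$$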
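The homocysteine to methionine rate equations on this slide describe a first-order conversion with rate constant k_c. As a minimal sketch of how such a model can be simulated, the code below integrates the two equations with the forward Euler method. The initial concentrations, the value of k_c, and the step size are arbitrary assumptions, and the minus sign on the homocysteine equation is taken from mass conservation.

```python
import math


def simulate_conversion(hcy0: float, met0: float, k_c: float,
                        t_end: float, dt: float) -> tuple:
    """Forward Euler for d[Hcy]/dt = -k_c*[Hcy], d[Met]/dt = +k_c*[Hcy]."""
    steps = int(round(t_end / dt))
    hcy, met = hcy0, met0
    for _ in range(steps):
        flux = k_c * hcy          # reaction rate at the current state
        hcy -= flux * dt          # homocysteine is consumed
        met += flux * dt          # methionine is produced
    return hcy, met


# Illustrative values only: 10 µM homocysteine, k_c = 0.1 per hour, 15 hours.
hcy_end, met_end = simulate_conversion(hcy0=10.0, met0=0.0, k_c=0.1,
                                       t_end=15.0, dt=0.001)
# Closed-form solution of the homocysteine equation, for comparison.
hcy_exact = 10.0 * math.exp(-0.1 * 15.0)
```

Because this system has a closed-form solution, it is a convenient check on the integrator before moving to larger networks where no closed form exists.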
Uracil methylation
[Fig. 6: simulated time courses over 0 to 15 hours. Panels: dUMP (µM), dTMP (µM), dUMP/dTMP, and dTMP/dUMP, each plotted against Time (Hours).]
Drug-DNA interaction
Model design
Bioinformatics Applications
Example 4
Ligand (drug molecule)
Protein/DNA
Evaluate the uploaded molecule through Lipinski's Rule of Five
Predict the possible target protein allosteric site
Target protein ready for docking
Docking & Scoring
Bioinformatics Applications
Model design: drug-DNA interaction 19
19 B. Jayaram, 2011
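The Lipinski's Rule of Five check in this pipeline can be expressed in a few lines once the molecular descriptors are known. The sketch below assumes the descriptors have already been computed by some cheminformatics tool; the `Descriptors` class, the function names, and the example values are hypothetical. It uses the common form of the rule: molecular weight at most 500 Da, logP at most 5, at most 5 hydrogen-bond donors, at most 10 hydrogen-bond acceptors, with no more than one violation tolerated.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Descriptors:
    """Precomputed descriptors for one candidate molecule (hypothetical input)."""
    molecular_weight: float   # daltons
    log_p: float              # octanol-water partition coefficient
    h_bond_donors: int
    h_bond_acceptors: int


def lipinski_violations(d: Descriptors) -> List[str]:
    """Return the names of the Rule of Five criteria that the molecule breaks."""
    violations = []
    if d.molecular_weight > 500:
        violations.append("molecular_weight")
    if d.log_p > 5:
        violations.append("log_p")
    if d.h_bond_donors > 5:
        violations.append("h_bond_donors")
    if d.h_bond_acceptors > 10:
        violations.append("h_bond_acceptors")
    return violations


def passes_rule_of_five(d: Descriptors, max_violations: int = 1) -> bool:
    """A molecule passes if it breaks no more than max_violations criteria."""
    return len(lipinski_violations(d)) <= max_violations


# Made-up descriptor values for illustration.
small_molecule = Descriptors(300.0, 2.0, 2, 4)
large_molecule = Descriptors(800.0, 6.5, 7, 12)
```

Molecules that pass this filter would then move on to the site prediction and docking steps listed on the slide.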
3D Modeling / simulation in biology
Model design
Bioinformatics Applications
Example 5
Bioinformatics Applications
Model design: 3D Modeling 20, 21
20 Google, 2011; 21 E-Cell.org, 2011
Google Body browser; E-cell project
Cancer tumor model
Model design
Bioinformatics Applications
Example 6
Bioinformatics Applications
Model design: cancer tumor development 22
22 Northwestern, 2010
Epidemiology: HIV spread
Model design
Bioinformatics Applications
Example 7
Bioinformatics Applications
Model design: HIV spread 23
23 Northwestern, 2010
THANKS.