Post on 18-Dec-2015
Computational Problems in Molecular Biology
Dong Xu
Computer Science Department
109 Engineering Building West
E-mail: xudong@missouri.edu
573-882-7064
http://digbio.missouri.edu
Lecture Outline
From DNA to gene
Protein sequence and structure
Gene expression
Protein interaction and pathway
Provide a roadmap for the entire course
Biology from system level (computational perspective)
About Life
Life is wonderful: amazing mechanisms
Life is not perfect: errors and diseases
Life is a result of evolution
Cells
Basic unit of life
Prokaryotes/eukaryotes
Different types of cell: skin, brain, red/white blood
Different biological functions
Cells are produced by cells
Cell division (mitosis): 2 daughter cells
DNA
Double Helix (Watson & Crick)
Nitrogenous base pairs:
  Adenine-Thymine [A,T]
  Cytosine-Guanine [C,G]
Weak bonds (can be broken)
Form long chains
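The pairing rule above (A with T, C with G) means one strand determines the other. A minimal sketch of that idea follows; the function names are illustrative choices, not part of the lecture.

```python
# Watson-Crick pairing from the slide: A pairs with T, C pairs with G.
COMPLEMENT = {"A": "T", "T": "A", "C": "G", "G": "C"}


def complement(strand: str) -> str:
    """Return the base-by-base complement of a DNA strand."""
    return "".join(COMPLEMENT[base] for base in strand.upper())


def reverse_complement(strand: str) -> str:
    """Return the opposite strand, read in its own 5'-to-3' direction."""
    return complement(strand)[::-1]


if __name__ == "__main__":
    print(reverse_complement("ACGTTGACTC"))
```

Applying `reverse_complement` twice returns the original strand, which is a quick sanity check on the pairing table.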
Genome
Each cell contains a full genome (DNA)
The size varies:
Small for viruses and prokaryotes (10 kbp-20 Mbp)
Medium for lower eukaryotes
  Yeast, unicellular eukaryote: 13 Mbp
  Worm (Caenorhabditis elegans): 100 Mbp
  Fly, invertebrate (Drosophila melanogaster): 170 Mbp
Larger for higher eukaryotes
  Mouse and man: 3000 Mbp
Very variable for plants (many are polyploid)
  Mouse-ear cress (Arabidopsis thaliana): 120 Mbp
  Lilies: 60,000 Mbp
Differences in DNA
[Figure: differences in DNA between the pictured organisms, labeled ~2%, ~4%, and ~0.2%]
Genes
Segments of DNA sequence that are expressed as functional biomolecules (protein, RNA)
About 2% of the human DNA sequence codes for genes
An estimated 32,000 human genes; 100,000 genes in tulips
Gene Structure
General structure of a eukaryotic gene
Unlike eukaryotic genes, a prokaryotic gene typically consists of only one contiguous coding region
Informational Classes in Genomic DNA
Transcribed sequences (exons and introns)
Messenger sequences (mRNA, exons only)
Coding sequences (CDS, part of the exons only)
Heads and tails: untranslated parts (UTR)
Regulatory sequences
... and all the rest
Identify them: gene-finding
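As a rough illustration of gene-finding in its simplest form, the sketch below scans the forward strand for open reading frames (an ATG followed in-frame by a stop codon). This is an assumption-laden toy: it ignores introns, the reverse strand, and regulatory signals, so it is closer to the contiguous prokaryotic case than to eukaryotic genes. The function name and the `min_codons` parameter are hypothetical.

```python
# Stop codons of the standard genetic code, written as DNA.
STOP_CODONS = {"TAA", "TAG", "TGA"}


def find_orfs(dna: str, min_codons: int = 2) -> list[tuple[int, int]]:
    """Return (start, end) slices of ATG...stop stretches on the forward strand.

    `end` is exclusive and includes the stop codon. `min_codons` counts the
    codons before the stop codon, including the initial ATG.
    """
    dna = dna.upper()
    orfs = []
    for frame in range(3):
        start = None  # index of the first ATG seen in this frame, if any
        for i in range(frame, len(dna) - 2, 3):
            codon = dna[i:i + 3]
            if start is None and codon == "ATG":
                start = i
            elif start is not None and codon in STOP_CODONS:
                if (i - start) // 3 >= min_codons:
                    orfs.append((start, i + 3))
                start = None
    return sorted(orfs)


if __name__ == "__main__":
    sequence = "CCATGAAACCCTAAGG"  # made-up example sequence
    for begin, end in find_orfs(sequence):
        print(begin, end, sequence[begin:end])
```

Real gene finders add statistical models of coding sequence and splice sites on top of this kind of scan.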
Genetic Code
A=Ala=Alanine
C=Cys=Cysteine
D=Asp=Aspartic acid
E=Glu=Glutamic acid
F=Phe=Phenylalanine
G=Gly=Glycine
H=His=Histidine
I=Ile=Isoleucine
K=Lys=Lysine
L=Leu=Leucine
M=Met=Methionine
N=Asn=Asparagine
P=Pro=Proline
Q=Gln=Glutamine
R=Arg=Arginine
S=Ser=Serine
T=Thr=Threonine
V=Val=Valine
W=Trp=Tryptophan
Y=Tyr=Tyrosine
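The one-letter and three-letter codes listed above map naturally onto a lookup table. A small sketch, with a hypothetical helper name, that converts a one-letter protein string into three-letter notation:

```python
# One-letter to three-letter amino acid codes, copied from the list above.
THREE_LETTER = {
    "A": "Ala", "C": "Cys", "D": "Asp", "E": "Glu", "F": "Phe",
    "G": "Gly", "H": "His", "I": "Ile", "K": "Lys", "L": "Leu",
    "M": "Met", "N": "Asn", "P": "Pro", "Q": "Gln", "R": "Arg",
    "S": "Ser", "T": "Thr", "V": "Val", "W": "Trp", "Y": "Tyr",
}


def to_three_letter(protein: str) -> str:
    """Convert a one-letter protein sequence to hyphen-separated three-letter codes."""
    return "-".join(THREE_LETTER[residue] for residue in protein.upper())


if __name__ == "__main__":
    print(to_three_letter("SHLDKL"))
```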
Protein Synthesis
AGCCACTTAGACAAACTA (DNA)
Transcribed to:
AGCCACUUAGACAAACUA (mRNA)
Translated to:
SHLDKL (Protein)
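The two steps above can be sketched in a few lines. The sketch assumes, as the slide does, that the DNA shown is the coding strand, so transcription amounts to replacing T with U. The codon table is the standard genetic code in the usual U, C, A, G ordering; the function names are illustrative.

```python
from itertools import product

# Standard genetic code, codons enumerated in U, C, A, G order; "*" marks a stop.
BASES = "UCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def transcribe(coding_dna: str) -> str:
    """Return the mRNA for a coding-strand DNA sequence (T becomes U)."""
    return coding_dna.upper().replace("T", "U")


def translate(mrna: str) -> str:
    """Translate mRNA codon by codon, stopping at the first stop codon."""
    protein = []
    for i in range(0, len(mrna) - 2, 3):
        amino_acid = CODON_TABLE[mrna[i:i + 3]]
        if amino_acid == "*":
            break
        protein.append(amino_acid)
    return "".join(protein)


if __name__ == "__main__":
    mrna = transcribe("AGCCACTTAGACAAACTA")
    print(mrna, translate(mrna))
```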
About Protein
Tens to thousands of amino acids (average 300)
Lysozyme sequence (129 amino acids):
KVFGRCELAA AMKRHGLDNY RGYSLGNWVC AAKFESNFNT QATNRNTDGS
TDYGILQINS RWWCNDGRTP GSRNLCNIPC SALLSSDITA SVNCAKKIVS
DGNGMNAWVA WRNRCKGTDV QAWIRGCRL
[Figure: protein backbone and side chain]
Evolution of Genes: Mutation
Genes alter (slightly) during reproduction
Caused by copying errors, radiation, or toxic chemicals
3 possibilities: deletion, insertion, substitution
Deletion: ACGTTGACTC → ACGTGACTC
Insertion: ACGTTGACTC → AGCGTTGACTC
Substitution: ACGTTGACTC → ACGATGACTC
Mutations are mostly deleterious
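The three mutation types can be written as simple string operations. The sketch below reproduces the slide's examples; the function names and the zero-based positions are illustrative assumptions.

```python
def delete(seq: str, pos: int) -> str:
    """Remove the base at zero-based position `pos`."""
    return seq[:pos] + seq[pos + 1:]


def insert(seq: str, pos: int, base: str) -> str:
    """Insert `base` so that it ends up at zero-based position `pos`."""
    return seq[:pos] + base + seq[pos:]


def substitute(seq: str, pos: int, base: str) -> str:
    """Replace the base at zero-based position `pos` with `base`."""
    return seq[:pos] + base + seq[pos + 1:]


if __name__ == "__main__":
    original = "ACGTTGACTC"
    print(delete(original, 3))           # one T removed
    print(insert(original, 1, "G"))      # G inserted after the first base
    print(substitute(original, 3, "A"))  # T replaced by A
```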
Evolution and Homology
[Figure: an ancestor gene undergoes gene duplication into X and Y. Labels: paralogs (related functions), orthologs (similar function), and recombination producing a 75% X / 25% Y gene (mixed homology).]
Twilight zone: undetectable homology (<20% sequence identity)
Sequence Comparison
o Pairwise sequence comparison
o Multiple alignment
SAANLEYLKNVLLQFIFLKPG--SERERLLPVINTMLQLSPEEKGKLAAV O15045
NEKNMEYLKNVFVQFLKPESVP-AERDQLVIVLQRVLHLSPKEVEILKAA P34562
KNEKIAYIKNVLLGFLEHKE----QRNQLLPVISMLLQLDSTDEKRLVMS Q06704
REINFEYLKHVVLKFMSCRES---EAFHLIKAVSVLLNFSQEEENMLKET Q92805
MLIDKEYTRNILFQFLEQRD----RRPEIVNLLSILLDLSEEQKQKLLSV O42657
EPTEFEYLRKVMFEYMMGR-----ETKTMAKVITTVLKFPDDQAQKILER O70365
DPAEAEYLRNVLYRYMTNRESLGKESVTLARVIGTVARFDESQMKNVISS Q21071
STSEIDYLRNIFTQFLHSMGSPNAASKAILKAMGSVLKVPMAEMKIIDKK Q18013
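Two small sketches related to sequence comparison follow. The first scores a global pairwise alignment with the classic Needleman-Wunsch dynamic program; the scoring values (match +1, mismatch -1, gap -1) are arbitrary assumptions, not taken from the lecture. The second computes percent identity between two rows of an existing alignment such as the one above, under the stated convention that columns containing a gap in either row are skipped.

```python
def nw_score(a: str, b: str, match: int = 1, mismatch: int = -1, gap: int = -1) -> int:
    """Return the optimal global alignment score of `a` and `b` (score only, no traceback)."""
    prev = [j * gap for j in range(len(b) + 1)]  # row for the empty prefix of `a`
    for i in range(1, len(a) + 1):
        curr = [i * gap]
        for j in range(1, len(b) + 1):
            diagonal = prev[j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            curr.append(max(diagonal, prev[j] + gap, curr[j - 1] + gap))
        prev = curr
    return prev[-1]


def percent_identity(row_a: str, row_b: str) -> float:
    """Percent of identical residues over columns where neither row has a gap."""
    if len(row_a) != len(row_b):
        raise ValueError("aligned rows must have equal length")
    compared = identical = 0
    for x, y in zip(row_a, row_b):
        if x == "-" or y == "-":
            continue  # assumed convention: gapped columns are not counted
        compared += 1
        identical += x == y
    return 100.0 * identical / compared if compared else 0.0


if __name__ == "__main__":
    # First two rows of the multiple alignment shown above.
    row1 = "SAANLEYLKNVLLQFIFLKPG--SERERLLPVINTMLQLSPEEKGKLAAV"
    row2 = "NEKNMEYLKNVFVQFLKPESVP-AERDQLVIVLQRVLHLSPKEVEILKAA"
    print(round(percent_identity(row1, row2), 1))
    print(nw_score("ACGTTGACTC", "ACGTGACTC"))
```

Other identity conventions exist (for example dividing by the full alignment length), so reported percentages depend on the choice made here.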
Phylogenetic Trees
Understand evolution
Protein Structure
Lysozyme structure:
[Figure: ball & stick, strand, and surface representations]
Structure Features of Folded Proteins
Compact
Secondary structures: loop, α-helix, β-sheet
Protein cores mostly consist of α-helices and β-sheets
Protein Structure Comparison
Structure is better conserved than sequence
A structure can accommodate a wide range of mutations.
Physical forces favor certain structures.
The number of folds is limited. Currently known: ~700. Estimated total: 1,000 to 10,000.
[Figure: TIM barrel]
Protein Folding Problem
A protein folds into a unique 3D structure under physiological conditions
Lysozyme sequence: KVFGRCELAA AMKRHGLDNY
RGYSLGNWVC AAKFESNFNT
QATNRNTDGS TDYGILQINS
RWWCNDGRTP GSRNLCNIPC
SALLSSDITA SVNCAKKIVS
DGNGMNAWVA WRNRCKGTDV
QAWIRGCRL
Structure-Function Relationship
Some level of function can be inferred without a structure, but a structure is key to understanding the detailed mechanism.
A predicted structure is a powerful tool for function inference.
[Figure: Trp repressor as a function switch]
Structure-Based Drug Design
HIV protease inhibitor
Structure-based rational drug design is still a major method for drug discovery.
Gene Expression
All cells carry the same DNA, but only a few percent of genes are expressed in common across cells (house-keeping genes).
A few examples:
(1) Specialized cells: hemoglobin is over-represented in blood cells.
(2) Different stages of the life cycle: hemoglobins before and after birth; caterpillar and butterfly.
(3) Different environments: microbes in nutrient-poor or nutrient-rich environments.
(4) Special treatment: response to a wound.
Eukaryotic Gene Expression Control
[Figure: flow from DNA to primary RNA transcript to mRNA in the nucleus, across the nuclear membrane to mRNA in the cytosol, then to protein. Labeled control points: transcriptional control, RNA processing control, RNA transport control, translation control, mRNA degradation control (yielding inactive mRNA), and protein activity control (yielding inactive protein).]
Methods: mass spectrometry, microarray
Gene Regulation
[Figure: DNA sequence showing the promoter, the operator, and the start of transcription]
Microarray Experiments
Microarray data
Regulation/function/pathway/cellular state/phenotype
Disease: diagnosis/gene identification/sub-typing
Microarray chip
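One common first step with microarray data is to compare expression between two conditions as a log2 ratio per gene. The sketch below is only an illustration of that idea: the gene names, intensity values, and the `pseudocount` parameter are all hypothetical, and real pipelines add background correction and normalization first.

```python
import math


def log2_ratios(treated: dict[str, float], control: dict[str, float],
                pseudocount: float = 1.0) -> dict[str, float]:
    """Return log2(treated / control) for genes measured in both conditions.

    `pseudocount` is added to both intensities to avoid division by zero.
    """
    return {
        gene: math.log2((treated[gene] + pseudocount) / (control[gene] + pseudocount))
        for gene in treated
        if gene in control
    }


if __name__ == "__main__":
    # Hypothetical spot intensities for three made-up genes.
    treated = {"geneA": 200.0, "geneB": 50.0, "geneC": 120.0}
    control = {"geneA": 100.0, "geneB": 200.0, "geneC": 120.0}
    for gene, ratio in log2_ratios(treated, control).items():
        print(gene, round(ratio, 2))
```

A positive value suggests higher expression in the treated sample, a negative value lower expression, and values near zero little change.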
Genetic vs. Physical Interaction
Regulatory network
Genetic interaction
Complex system
Physical interaction
Gene/protein interaction
Expressed gene
Transcription factor
Biological Pathway
Studying Pathways through a Systems Biology Approach
RGYSLGNWVC AAKFESNFNT QATNRNTDGS TDYGILQINS RWWCNDGRTP GSRNLCNIPC
sequence
structure
function
protein interaction
gene regulation
pathway (cross-talk)
Discussion
Possible impacts of biotechnology on our lives
Assignments
Required reading:
* Chapter 13 in Pavel Pevzner, "Computational Molecular Biology: An Algorithmic Approach", MIT Press, 2000.
* Larry Hunter, "Molecular Biology for Computer Scientists"
Optional reading:
* http://www.ncbi.nih.gov/About/primer/bioinformatics.html
* http://www.bentham.org/cpps1-1/Dong%20Xu/xu_cpps.htm