genomics and bioinformatics - computational services and

Genomics and Bioinformatics Doug Brutlag Professor Emeritus Biochemistry & Medicine (by courtesy) Computational Molecular Biology Biochem 218 – BioMedical Informatics 231 http://biochem218.stanford.edu/


Post on 12-Sep-2021


Page 1: Genomics and Bioinformatics - Computational Services and

Genomics and Bioinformatics

Doug Brutlag
Professor Emeritus
Biochemistry & Medicine (by courtesy)

Computational Molecular Biology
Biochem 218 – BioMedical Informatics 231

http://biochem218.stanford.edu/

Page 3: Genomics and Bioinformatics - Computational Services and

Course and Video Availability

• Alway M114 – Tuesdays & Thursdays 2:15-3:30 PM
• Course Web Site
– http://biochem218.stanford.edu/
• Stanford Center for Professional Development
– http://scpd.stanford.edu/
• Videos available 24 hours/day, 7 days/week
• Course offered Autumn, Winter and Spring quarters

Page 4: Genomics and Bioinformatics - Computational Services and

Course Requirements

• Lectures
– Theoretical background of current methods
– Strengths and weaknesses of current approaches
– Future directions for improvements
• Demonstrations
– Applications (Mac, PC, Unix, Web)
– Web applications
– Illustrate homework
• All homework and questions must be submitted by email to [email protected]
• Several homework assignments (35%)
– Due one week after assigned
• Final project (Due March 12th)
– A critical or comparative review of computational approaches to any problem in computational molecular biology
– Propose new approach
– Implement a new approach
– Examples of previous projects for the class can be found at http://biochem218.stanford.edu/Projects.html

Page 12: Genomics and Bioinformatics - Computational Services and

NCBI Handbook
http://www.ncbi.nlm.nih.gov/bookshelf/br.fcgi?book=handbook

Page 14: Genomics and Bioinformatics - Computational Services and

EMBL-EBI Home Page
http://www.ebi.ac.uk/

Page 17: Genomics and Bioinformatics - Computational Services and

Genomics, Bioinformatics & Computational Biology

Computational Biology
Computational Molecular Biology
Bioinformatics
Genomics
Proteomics
Structural Genomics

Page 18: Genomics and Bioinformatics - Computational Services and

Genomics, Bioinformatics & Computational Biology

Computational Biology
Computational Molecular Biology
Bioinformatics
Genomics
Proteomics
Structural Genomics
Systems Biology

Page 19: Genomics and Bioinformatics - Computational Services and

Genomics, Bioinformatics & Computational Biology

Databases
Machine Learning
Robotics
Statistics & Probability
Artificial Intelligence
Graph Theory
Information Theory
Algorithms

Computational Biology
Computational Molecular Biology
Bioinformatics
Genomics
Proteomics
Structural Genomics

Page 20: Genomics and Bioinformatics - Computational Services and

What is Bioinformatics?

RNA Protein

DNA Phenotype

Selection Evolution

Individuals

Populations

Biological Information

Page 21: Genomics and Bioinformatics - Computational Services and

Computational Goals of Bioinformatics

• Learn & Generalize: Discover conserved patterns (models) of sequences, structures, interactions, metabolism & chemistries from well-studied examples.
• Prediction: Infer function or structure of newly sequenced genes, genomes, proteins or proteomes from these generalizations.
• Organize & Integrate: Develop a systematic and genomic approach to molecular interactions, metabolism, cell signaling, gene expression…
• Simulate: Model gene expression, gene regulation, protein folding, protein-protein interaction, protein-ligand binding, catalytic function, metabolism…
• Engineer: Construct novel organisms or novel functions or novel regulation of genes and proteins.
• Gene Therapy: Target specific genes or mutations, or use RNAi, to change a disease phenotype.

Page 22: Genomics and Bioinformatics - Computational Services and

Central Paradigm of Molecular Biology

DNA → RNA → Protein → Phenotype (Symptoms)

Page 23: Genomics and Bioinformatics - Computational Services and

Molecular Biology of the Gene 1965

Page 24: Genomics and Bioinformatics - Computational Services and

Central Paradigm of Bioinformatics

Genetic Information → Molecular Structure → Biochemical Function → Phenotype (Symptoms)

MVHLTPEEKTAVNALWGKVNVDAVGGEALGRLLVVYPWTQRFFESFGDLSSPDAVMGNPKVKAHGKKVLGAFSDGLAHLDNLKGTFSQLSELHCDKLHVDPENFRLLGNVLVCVLARNFGKEFTPQMQAAYQKVVAGVANALAHKYH

Page 26: Genomics and Bioinformatics - Computational Services and

Challenges Understanding Genetic Information

Genetic Information → Molecular Structure → Biochemical Function → Phenotype

• Genetic information is redundant
• Structural information is redundant
• Genes and proteins are meta-stable
• Single genes have multiple functions
• Genes are one dimensional but function depends on three-dimensional structure

Page 27: Genomics and Bioinformatics - Computational Services and

Redundancy in Genomic & Protein Sequences

• DNA is double-stranded
• Genetic code
• Acceptable amino-acid replacements
• Intron-exon variation
• Alternative splicing
• Strain variations (SNPs)
• Sequencing errors
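The first two sources of redundancy can be made concrete with a short sketch. It assumes the standard genetic code (written in the conventional T, C, A, G codon order); the helper names `reverse_complement` and `CODON_TABLE` are illustrative and do not come from the slides.

```python
from collections import Counter
from itertools import product

# Standard genetic code, codons enumerated in T, C, A, G order
# (TTT, TTC, TTA, TTG, TCT, ...). "*" marks a stop codon.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}

# Redundancy 1: the genetic code is degenerate, so several codons
# encode the same amino acid.
degeneracy = Counter(CODON_TABLE.values())
leucine_codons = sorted(c for c, aa in CODON_TABLE.items() if aa == "L")

# Redundancy 2: DNA is double-stranded, so the same locus can be read
# from either strand.
COMPLEMENT = str.maketrans("ACGT", "TGCA")

def reverse_complement(dna: str) -> str:
    """Return the sequence of the opposite strand, read 5' to 3'."""
    return dna.translate(COMPLEMENT)[::-1]

print("codons per amino acid:", dict(degeneracy))
print("leucine codons:", leucine_codons)
print("reverse complement of ATGC:", reverse_complement("ATGC"))
```

A search tool has to account for both effects: a protein-level match may be invisible at the DNA level because of synonymous codons, and a DNA-level match may lie on the opposite strand.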

Page 28: Genomics and Bioinformatics - Computational Services and

Using A Controlled Vocabulary for Literature Search
http://www.ncbi.nlm.nih.gov/sites/entrez?db=mesh

Page 29: Genomics and Bioinformatics - Computational Services and

Gene Ontology Database
http://www.geneontology.org/

Page 31: Genomics and Bioinformatics - Computational Services and

ExPASy Proteomics Server
http://www.expasy.ch/doc.html

Page 32: Genomics and Bioinformatics - Computational Services and

Inferring Biological Function from Protein Sequence

Sequences of Common Structure or Function

Consensus Sequences or Sequence Motifs
Zinc Finger (C2H2 type)
C x{2,4} C x{12} H x{3,5} H

Sequence Similarity
              10        20        30        40              50
Query VLSPADKTNVKAAWGKVGAHAGEVGAEALERMFLSFPTTKTYFPHF------DLSHGS
       |:| :|: | |:||||     | |:||| |: : :|:|  :|  |      |:  |
Match HLTPEEKSAVTALWGKV--NVDEYGGEALGRLLVVYPWTQRFFESFGDLSTPDAVMGN
              10          20        30        40        50
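As a rough illustration of how the Query/Match alignment above can be summarized, the sketch below counts identical residues over the gap-free columns. The two strings are copied from the slide; the function name `identity_stats` and the choice to ignore gapped columns are assumptions made for this example, not a definition taken from the course.

```python
# Aligned sequences as shown on the slide ("-" marks a gap).
query = "VLSPADKTNVKAAWGKVGAHAGEVGAEALERMFLSFPTTKTYFPHF------DLSHGS"
match = "HLTPEEKSAVTALWGKV--NVDEYGGEALGRLLVVYPWTQRFFESFGDLSTPDAVMGN"

def identity_stats(a: str, b: str, gap: str = "-") -> tuple[int, int]:
    """Return (identical columns, gap-free columns) for two aligned strings."""
    if len(a) != len(b):
        raise ValueError("aligned sequences must have the same length")
    identical = 0
    aligned = 0
    for x, y in zip(a, b):
        if x == gap or y == gap:
            continue  # skip columns where either sequence has a gap
        aligned += 1
        if x == y:
            identical += 1
    return identical, aligned

identical, aligned = identity_stats(query, match)
print(f"{identical} identities over {aligned} gap-free columns "
      f"({100 * identical / aligned:.0f}% identity)")
```

The identity count corresponds to the "|" marks in the alignment; the ":" marks denote conservative replacements, which would need a substitution matrix to reproduce and are not modeled here.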

Page 33: Genomics and Bioinformatics - Computational Services and

A Typical Motif: Zinc Finger DNA Binding Motif

C..C............H....H
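A motif of this kind maps naturally onto a regular expression. The sketch below assumes the C2H2 pattern C x{2,4} C x{12} H x{3,5} H means "x = any residue, repeated within the given range"; the function name and the test peptide are illustrative and not taken from the slides.

```python
import re

# C x{2,4} C x{12} H x{3,5} H, with "x" read as any single residue.
ZINC_FINGER_C2H2 = re.compile(r"C.{2,4}C.{12}H.{3,5}H")

def find_zinc_fingers(protein: str) -> list[tuple[int, str]]:
    """Return (0-based start, matched text) for each non-overlapping hit."""
    return [(m.start(), m.group()) for m in ZINC_FINGER_C2H2.finditer(protein)]

# Hypothetical peptide built to fit the pattern: C-x2-C-x12-H-x3-H.
peptide = "YKCPECGKSFSQKSNLQKHQRTH"
print(find_zinc_fingers(peptide))

# The dotted form shown on the slide fits the same pattern
# (each dot standing for one arbitrary residue, here written as "A").
dotted = "C..C............H....H".replace(".", "A")
print(bool(ZINC_FINGER_C2H2.fullmatch(dotted)))
```

A pattern like this is strict: one substitution at a cysteine or histidine position removes the hit entirely, which is one motivation for the weight matrices and hidden Markov models covered next.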

Page 34: Genomics and Bioinformatics - Computational Services and

Inferring Biological Function from Protein Sequence

Profiles, PSI-BLAST
Hidden Markov Models

AA1 AA2 AA3 AA4 AA5 AA6
I1 I2 I3 I4 I5
D2 D3 D4 D5

Weight Matrices or Position-Specific Scoring Matrices

     1   2   3   4   5   6   7   8   9  10  11  12
A    2   1   3  13  10  12  67   4  13   9   1   2
R    7   5   8   9   4   0   1  16   7   0   1   0
N    0   8   0   1   0   0   0   2   1   1  10   0
D    0   1   0   1  13   0   0  12   1   0   4   0
C    0   0   1   0   0   0   0   0   0   2   2   1
Q    1   1  21   8  10   0   0   7   6   0   0   2
E    2   0   0   9  21   0   0  15   7   3   3   0
G    9   7   1   4   0   0   8   0   0   0  46   0
H    4   3   1   1   2   0   0   2   2   0   5   0
I   10   0  11   1   2  10   0   4   9   3   0  16
L   16   1  17   0   1  31   0   3  11  24   0  14
K    3   4   5  10  11   1   1  13  10   0   5   2
M    7   1   1   0   0   0   0   0   5   7   1   8
F    4   0   3   0   0   4   0   0   0  10   0   0
P    0   6   0   1   0   0   0   0   0   0   0   0
S    1  17   0   8   3   1   3   0   2   2   2   0
T    5  22   3  11   1   5   0   2   2   2   0   5
W    2   0   0   0   0   0   0   0   0   1   0   1
Y    1   0   4   2   0   1   0   0   2   4   0   1
V    6   3   1   1   2  15   0   0   2  12   0  28
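One common way to turn a table of residue counts into a scoring matrix is to convert each column to log-odds scores. The sketch below uses the counts from the slide; the pseudocount of 1, the uniform background frequency of 0.05, and all function names are assumptions chosen for illustration, not the method used to build the original matrix.

```python
import math

# Residue counts per alignment column (12 columns), copied from the slide.
COUNTS = {
    "A": [2, 1, 3, 13, 10, 12, 67, 4, 13, 9, 1, 2],
    "R": [7, 5, 8, 9, 4, 0, 1, 16, 7, 0, 1, 0],
    "N": [0, 8, 0, 1, 0, 0, 0, 2, 1, 1, 10, 0],
    "D": [0, 1, 0, 1, 13, 0, 0, 12, 1, 0, 4, 0],
    "C": [0, 0, 1, 0, 0, 0, 0, 0, 0, 2, 2, 1],
    "Q": [1, 1, 21, 8, 10, 0, 0, 7, 6, 0, 0, 2],
    "E": [2, 0, 0, 9, 21, 0, 0, 15, 7, 3, 3, 0],
    "G": [9, 7, 1, 4, 0, 0, 8, 0, 0, 0, 46, 0],
    "H": [4, 3, 1, 1, 2, 0, 0, 2, 2, 0, 5, 0],
    "I": [10, 0, 11, 1, 2, 10, 0, 4, 9, 3, 0, 16],
    "L": [16, 1, 17, 0, 1, 31, 0, 3, 11, 24, 0, 14],
    "K": [3, 4, 5, 10, 11, 1, 1, 13, 10, 0, 5, 2],
    "M": [7, 1, 1, 0, 0, 0, 0, 0, 5, 7, 1, 8],
    "F": [4, 0, 3, 0, 0, 4, 0, 0, 0, 10, 0, 0],
    "P": [0, 6, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0],
    "S": [1, 17, 0, 8, 3, 1, 3, 0, 2, 2, 2, 0],
    "T": [5, 22, 3, 11, 1, 5, 0, 2, 2, 2, 0, 5],
    "W": [2, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 1],
    "Y": [1, 0, 4, 2, 0, 1, 0, 0, 2, 4, 0, 1],
    "V": [6, 3, 1, 1, 2, 15, 0, 0, 2, 12, 0, 28],
}
N_COLS = 12

def log_odds_matrix(counts, pseudocount=1.0, background=0.05):
    """Convert counts to log2(observed frequency / background frequency)."""
    pssm = {aa: [] for aa in counts}
    for col in range(N_COLS):
        total = sum(counts[aa][col] for aa in counts) + pseudocount * len(counts)
        for aa in counts:
            freq = (counts[aa][col] + pseudocount) / total
            pssm[aa].append(math.log2(freq / background))
    return pssm

def score(pssm, peptide):
    """Sum the per-position scores of a peptide as long as the matrix."""
    if len(peptide) != N_COLS:
        raise ValueError("peptide length must equal the number of columns")
    return sum(pssm[aa][i] for i, aa in enumerate(peptide))

def consensus(counts):
    """Most frequent residue in each column."""
    return "".join(
        max(counts, key=lambda aa: counts[aa][col]) for col in range(N_COLS)
    )

pssm = log_odds_matrix(COUNTS)
best = consensus(COUNTS)
print(best, round(score(pssm, best), 2))
```

Unlike a consensus string or a regular expression, the matrix scores every candidate peptide on a continuous scale, so near-matches still receive partial credit.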

Page 38: Genomics and Bioinformatics - Computational Services and

Clustal Globin Alignment

Page 39: Genomics and Bioinformatics - Computational Services and

Consensus Sequence From a Multiple Sequence Alignment

ClustalW Insulin Alignments

              10        20        30
IPGP                           FVSRH
IPDK                           AANQH
IPDG   MALWMRLLPLLALLALWAPAPTRAFVNQH
IPCH   MALWIRSLPLLALLVFSGPG-TSYAANQH
IPCA   MAVWIQAGALLFLLAVSSVNANAGAP-QH
IPBO                           FVNQH
IPAF  MAALWLQSFSLLVLLVVSWPGSQAVAPAQH
Consensus: A . W . . L L L L A N Q H

              40        50        60
IPGP  LCGSNLVETLYSVCQDDGFFYIPKDXXELE
IPDK  LCGSHLVEALYLVCGERGFFYSPKTXXDVE
IPDG  LCGSHLVEALYLVCGERGFFYTPKARREVE
IPCH  LCGSHLVEALYLVCGERGFFYSPKARRDVE
IPCA  LCGSHLVDALYLVCGPTGFFYNPKRDVDPP
IPBO  LCGSHLVEALYLVCGERGFFYTPKARREVE
IPAF  LCGSHLVDALYLVCGDRGFFYNPKRDVDQL
Consensus: L C G S H L V E A L Y L V C G E R G F F Y . P K . D V E

              70        80        90
IPGP  DPQVEQTELGMG-----LGAGGLQP--LQG
IPDK  QP-LVNGPLHGE-----VGELPFQ----HE
IPDG  DLQVRDVELAGA-----PGEGGLQPLALEG
IPCH  QP-LVSSPLRGE-----AGVLPFQ----QE
IPCA  LGFLPPKS------AQETEVADFAFKDHAE
IPBO  GPQVGALELAGG-----PGAGGLE-----G
IPAF  LGFLPPKSGGAAAAGADNEVAEFAFKDQME
Consensus: P L L G G F Q E

             100       110       120
IPGP  ALQXX--GIVDQCCTGTCTRHQLQSYCN
IPDK  EYQXX--GIVEQCCENPCSLYQLENYCN
IPDG  ALQKR--GIVEQCCTSICSLYQLENYCN
IPCH  EYEKVKRGIVEQCCHNTCSLYQLENYCN
IPCA  VIRKR--GIVEQCCHKPCSIFELQNYCN
IPBO  PPQKR--GIVEQCCASVCSLYQLENYCN
IPAF  MMVKR--GIVEQCCHRPCNIFDLQNYCN
Consensus: . Q K R G I V E Q C C C S L Y Q L E N Y C N
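A consensus line like the ones above can be derived column by column. The sketch below uses the seven rows of the second alignment block (columns 31 to 60) and a strict-majority rule, writing "." where no residue appears in more than half of the sequences. That threshold and the function name are assumptions for illustration; the slide does not state the rule ClustalW or the author applied.

```python
from collections import Counter

# Second block of the insulin alignment (columns 31-60), one row per sequence.
block = {
    "IPGP": "LCGSNLVETLYSVCQDDGFFYIPKDXXELE",
    "IPDK": "LCGSHLVEALYLVCGERGFFYSPKTXXDVE",
    "IPDG": "LCGSHLVEALYLVCGERGFFYTPKARREVE",
    "IPCH": "LCGSHLVEALYLVCGERGFFYSPKARRDVE",
    "IPCA": "LCGSHLVDALYLVCGPTGFFYNPKRDVDPP",
    "IPBO": "LCGSHLVEALYLVCGERGFFYTPKARREVE",
    "IPAF": "LCGSHLVDALYLVCGDRGFFYNPKRDVDQL",
}

def majority_consensus(rows: list[str], unknown: str = ".") -> str:
    """Per column, emit the residue found in more than half of the rows."""
    if len({len(r) for r in rows}) != 1:
        raise ValueError("all aligned rows must have the same length")
    out = []
    for column in zip(*rows):
        residue, count = Counter(column).most_common(1)[0]
        out.append(residue if count > len(rows) / 2 else unknown)
    return "".join(out)

print(majority_consensus(list(block.values())))
```

Under this rule the conserved cysteines and the surrounding B-chain residues survive into the consensus, while the variable positions near the chain junction collapse to ".".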

Page 40: Genomics and Bioinformatics - Computational Services and

HMM Model of Hemoglobins
http://decypher.stanford.edu/

Page 41: Genomics and Bioinformatics - Computational Services and

GrowTree VegF Neighbor Joining Tree

Page 42: Genomics and Bioinformatics - Computational Services and

Human Gene Expression Signatures

T Cells Signaling
DNA Damage
Fibroblast Stimulation
B Cells Signaling
CMV Infection
Anoxia
Polio Infection
Monocytes Signaling
IL4
Hormone

Page 43: Genomics and Bioinformatics - Computational Services and

Clustering Gene Expression Profiles: Comparison of Methods

D'haeseleer P (2005). Nat Biotechnol. 23,1499-501.

Page 45: Genomics and Bioinformatics - Computational Services and

Finding Transcription Factor Binding Sites

Upstream Regions                        Co-expressed Genes
GATGGCTGCACCACGTGTATGC...ACGATGTCTCGC   Pho 5
CACATCGCATCACGTGACCAGT...GACATGGACGGC   Pho 8
GCCTCGCACGTGGTGGTACAGT...AACATGACTAAA   Pho 81
TCTCGTTAGGACCATCACGTGA...ACAATGAGAGCG   Pho 84
CGCTAGCCCACGTGGATCTTGA...AGAATGACTGGC   Pho …

Transcription Start
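A very simple way to look for a shared binding site is to list the short words (k-mers) that occur in every upstream region. The sketch below does this for the five sequences above with k = 6; the word length, the forward-strand-only search, and the function names are assumptions for illustration. Practical motif finders also tolerate mismatches and check both strands.

```python
from collections import Counter

# Upstream regions as shown on the slide. "..." marks elided sequence,
# so each entry is split into fragments and never read across the gap.
upstream = [
    "GATGGCTGCACCACGTGTATGC...ACGATGTCTCGC",
    "CACATCGCATCACGTGACCAGT...GACATGGACGGC",
    "GCCTCGCACGTGGTGGTACAGT...AACATGACTAAA",
    "TCTCGTTAGGACCATCACGTGA...ACAATGAGAGCG",
    "CGCTAGCCCACGTGGATCTTGA...AGAATGACTGGC",
]

def kmers(sequence: str, k: int) -> set[str]:
    """All distinct k-mers in a sequence, treating '...' as a break."""
    found = set()
    for fragment in sequence.split("..."):
        for i in range(len(fragment) - k + 1):
            found.add(fragment[i:i + k])
    return found

def shared_kmers(sequences: list[str], k: int) -> set[str]:
    """k-mers present at least once in every sequence."""
    counts = Counter()
    for seq in sequences:
        counts.update(kmers(seq, k))  # each sequence votes once per k-mer
    return {kmer for kmer, n in counts.items() if n == len(sequences)}

shared = shared_kmers(upstream, 6)
print(shared)
```

An exact-match search like this breaks down as soon as one site carries a substitution, which is why weight matrices are usually preferred for describing binding sites.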

Page 46: Genomics and Bioinformatics - Computational Services and

Finding Transcription Factor Binding Sites

Upstream Regions of Co-expressed Genes

GATGGCTGCACCACGTGTATGC...ACGATGTCTCGC
CACATCGCATCACGTGACCAGT...GACATGGACGGC
GCCTCGCACGTGGTGGTACAGT...AACATGACTAAA
TCTCGTTAGGACCATCACGTGA...ACAATGAGAGCG
CGCTAGCCCACGTGGATCTTGT...AGAATGGCCTAT

Page 47: Genomics and Bioinformatics - Computational Services and

Finding Transcription Factor Binding Sites

Upstream Regions of Co-expressed Genes

ATGGCTGCACCACGTTTATGC...ACGATGTCTCGC
CACATCGCATCACGTGACCAGT...GACATGGACGGC
GCCTCGCACGTGGTGGTACAGT...AACATGACTAAA
TTAGGACCATCACGTGA...ACAATGAGAGCG
CGCTAGCCCACGTTGATCTTGT...AGAATGGCCTAT

Pho4 binding

Page 48: Genomics and Bioinformatics - Computational Services and

Metabolic Networks: BioCyc
http://biocyc.org/

Page 49: Genomics and Bioinformatics - Computational Services and

C. crescentus Cell Cycle Gene Expression

Page 50: Genomics and Bioinformatics - Computational Services and

Genome Wide Associations in Rheumatoid Arthritis

Pearson, T. A. et al. JAMA 2008;299:1335-1344

Page 51: Genomics and Bioinformatics - Computational Services and

Leveraging Genomic Information in Medicine

• Novel Diagnostics
– Microchips & Microarrays - DNA
– Gene Expression - RNA
– Proteomics - Protein
• Understanding Metabolism
• Understanding Disease
– Inherited Diseases - OMIM
– Infectious Diseases
  – Pathogenic Bacteria
  – Viruses
• Novel Therapeutics
– Drug Target Discovery
– Rational Drug Design
– Molecular Docking
– Gene Therapy
– Stem Cell Therapy