
COMS 4761 --2007

Prof. Yechiam Yemini (YY)

Computer Science Department, Columbia University

Chapter 1: Bio Primer

1.1 Cell Structure; DNA; RNA; transcription; translation; proteins

Overview

Cell structure and mechanisms
DNA; RNA; Transcription; Regulation
Translation; protein; sequence & structure

References:
B. Alberts et al., “Molecular Biology of The Cell”, 4th edition, Garland Science.
R. Horton et al., “Principles of Biochemistry”, 3rd edition, Prentice Hall.
J.D. Watson et al., “Molecular Biology of The Gene”, 5th edition, Pearson Benjamin Cummings.
NCBI introductory overview: http://www.ncbi.nih.gov/About/primer/index.html
Animation sites:
  http://www.johnkyrk.com/
  http://vcell.ndsu.nodak.edu/~christjo/vcell/animationSite


Organisms Are Made of Cells


Prokaryotes & Eukaryotes Have Different Cells

Prokaryotes: single-cell organisms without a nucleus
  E.g., bacteria: E. coli, H. pylori
Eukaryotes: single- or multi-cell organisms with a nucleus
  E.g., yeast, plants, Drosophila, humans

Timeline (figure):
  -4.5B yrs: Earth formed
  -3.5B yrs: Prokaryotic bacteria
  -1.5B yrs: Nucleated cells
  -0.5B yrs: Multi-cellular eukaryotes

© Pearson; Benjamin Cummings

Feature | Eukaryotes | Prokaryotes
Structure | Nucleus | No nucleus
Structure | Single or multi cell; cell size 10-100 µm | Single cell; size 0.2-2 µm
Structure | Cytoskeleton | No cytoskeleton
Structure | Mitosis | Cell division through fission
Structure | Multiple membranes/compartments | One membrane at cell boundary
Structure | Organelles: mitochondria, Golgi, chloroplasts | No organelles
DNA | Two or more chromosomes | Single circular DNA
DNA | Genes have large non-coding regions (introns) | Genes code proteins
DNA | 95-97% non-coding DNA | 90% of DNA encodes proteins
DNA | ~10^7-10^9 base pairs | ~10^5-10^6 base pairs
DNA | DNA is tightly packed (chromatin + histones) | DNA is loosely organized
Proteins | 5-20k protein species | 1-2k protein species
Proteins | ~10^9 proteins per cell | ~10^6 proteins per cell

Cells Are Made of Macromolecules

Building blocks -> macromolecules:
  Sugars -> Polysaccharides
  Fatty acids -> Fats, lipids, membranes
  Amino acids -> Proteins
  Nucleotides -> Nucleic acids (DNA, RNA)

Molecules | % weight
Water | 70%
Inorganic ions | 1%
Sugars | 1%
Amino acids | 0.4%
Nucleotides | 0.4%
Fatty acids | 1%
Other small molecules | 0.2%
Macromolecules (proteins, DNA, RNA, polysaccharides) | 26%

Small molecules: 3%; Macromolecules: 26%


DNA Structure

The Central Dogma of Biology

DNA stores hereditary information
DNA is transcribed into RNA
RNA is translated into proteins
Proteins perform the key functions of cells

DNA -> Transcription -> RNA -> Translation -> Protein

DNA Consists of Sequences of Nucleotides

DNA strands are sequences of nucleotides
  Bases: Adenine, Guanine, Thymine, Cytosine
DNA is organized in complementary double strands
  Hydrogen bonds pair complementary bases: A-T, C-G

[Figure: a nucleotide is a sugar + phosphate + base; nucleotides are chained along a backbone from the 5’-end to the 3’-end; hydrogen bonds join the complementary bases of the two strands.]
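
The pairing rule (A with T, C with G) is enough to derive one strand from the other. The following is a minimal Python sketch, not part of the original slides; the names `complement` and `reverse_complement` are illustrative, and the input is assumed to be an uppercase DNA string written 5’ to 3’.

```python
# Watson-Crick pairing for DNA bases.
COMPLEMENT = {"A": "T", "T": "A", "C": "G", "G": "C"}


def complement(strand: str) -> str:
    """Return the paired bases position by position (reads 3'->5')."""
    return "".join(COMPLEMENT[base] for base in strand)


def reverse_complement(strand_5_to_3: str) -> str:
    """Return the opposite strand written in its own 5'->3' direction."""
    return "".join(COMPLEMENT[base] for base in reversed(strand_5_to_3))


if __name__ == "__main__":
    top = "ACTTACGC"
    print(complement(top))          # bases facing the top strand
    print(reverse_complement(top))  # same strand, read 5'->3'
```

Because the two strands run in opposite directions, the reverse complement is the form usually used when comparing or searching sequences.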

DNA Forms A Double Helix

Helix full turn: 10.5 bp
Vertical hydrogen bonds support the structure
Major and minor grooves provide access by proteins (e.g., transcription factors)

DNA Is Tightly Packed

DNA is 2 m long; needs to fold into a 10^-6 m nucleus
Chromatin beads: DNA folds around cores of 4 histone types
Transcription needs to unpack the DNA to copy it
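
A rough back-of-the-envelope check of the packing numbers may help. The sketch below uses the figures quoted on the slides (2 m of DNA, a 10^-6 m nucleus, 10.5 bp per helix turn) plus one assumption that the slides do not state: a rise of about 0.34 nm per base pair, the usual textbook value for B-DNA. The variable names are illustrative.

```python
# Figures quoted on the slides.
DNA_LENGTH_M = 2.0      # total DNA length per cell
NUCLEUS_SIZE_M = 1e-6   # nucleus scale as quoted on the slide
BP_PER_TURN = 10.5      # base pairs per full helix turn

# Assumption (not from the slides): B-DNA rises ~0.34 nm per base pair.
RISE_PER_BP_M = 0.34e-9

base_pairs = DNA_LENGTH_M / RISE_PER_BP_M
helix_turns = base_pairs / BP_PER_TURN
linear_compaction = DNA_LENGTH_M / NUCLEUS_SIZE_M

print(f"base pairs        ~ {base_pairs:.1e}")
print(f"helix turns       ~ {helix_turns:.1e}")
print(f"linear compaction ~ {linear_compaction:.0e}x")
```

Under these assumptions, 2 m of DNA corresponds to several billion base pairs, and the length has to shrink by a factor on the order of a million to fit the quoted nucleus size.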


Sample Bioinformatics Challenges

Sequencing the genome
Discovering sequence similarity
Discovering genes
Analyzing evolutionary relationships
Discovering other important structures
  Distinguishing exons from introns
  Regulatory structures (promoters & transcription factors)
  Regions expressing micro RNA
  ...

Transcription


Schematics

DNA -> Transcription -> mRNA -> Translation -> Protein

Overview

A. Assembling transcription complex

B. Transcribing DNA to mRNA

C. Removing introns
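
Steps B and C can be illustrated with a toy string model. This is a minimal Python sketch under stated assumptions, not the course's own code: the DNA is given as the coding strand (so the transcript is the same sequence with T replaced by U), intron positions are supplied by hand as hypothetical half-open index pairs, and step A (assembling the transcription complex) is not modeled.

```python
def transcribe(coding_strand: str) -> str:
    """Step B: pre-mRNA matches the coding strand, with U in place of T."""
    return coding_strand.replace("T", "U")


def splice(pre_mrna: str, introns: list[tuple[int, int]]) -> str:
    """Step C: drop each intron, given as a (start, end) half-open range."""
    exons = []
    position = 0
    for start, end in sorted(introns):
        exons.append(pre_mrna[position:start])  # keep the exon before it
        position = end                          # skip over the intron
    exons.append(pre_mrna[position:])           # keep the final exon
    return "".join(exons)


if __name__ == "__main__":
    gene = "ATGGTAAGTCAGGCT"          # made-up sequence for illustration
    pre_mrna = transcribe(gene)
    mrna = splice(pre_mrna, [(3, 12)])  # hypothetical intron coordinates
    print(pre_mrna, "->", mrna)
```

In a real cell the intron boundaries are recognized by the splicing machinery from sequence signals; here they are simply passed in.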

Animation: The Transcription Process

Transcription Details

http://cwx.prenhall.com/horton/medialib/

From PDB


Transcription Factors

TFs bind to promoter regions and to RNA polymerases
TFs regulate the rate of transcription (up/down)
Regulation is yet to be well understood

Transcription Is Regulated

http://cwx.prenhall.com/horton/medialib/

Example: The Lac Operon

Lac consists of 3 genes that are transcribed together
Used by bacteria to transport and metabolize lactose
cAMP activates transcription to initiate transport & metabolism of lactose

Lac Activation

A low sugar (glucose) level generates cAMP
cAMP binds with CRP; adjusts its alpha helix to fit the DNA grooves and binds with it
CRP-cAMP accelerates polymerase binding

http://cwx.prenhall.com/horton/medialib/
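
The activation logic can be summarized as a small truth table. The following Python sketch is a deliberately coarse Boolean model with hypothetical function names. It encodes the slide's chain (low sugar, then cAMP, then CRP-cAMP helping polymerase bind) and adds one piece of standard textbook background that the slides do not show: a repressor that blocks transcription unless lactose is present.

```python
def camp_is_high(glucose_present: bool) -> bool:
    """From the slide: a low sugar level leads to cAMP production."""
    return not glucose_present


def lac_expression(glucose_present: bool, lactose_present: bool) -> str:
    """Return a coarse expression level: 'off', 'low', or 'high'."""
    crp_camp_bound = camp_is_high(glucose_present)
    # Assumption (textbook background, not on the slides):
    # the lac repressor sits on the DNA unless lactose is present.
    repressor_bound = not lactose_present
    if repressor_bound:
        return "off"
    return "high" if crp_camp_bound else "low"


if __name__ == "__main__":
    for glucose in (False, True):
        for lactose in (False, True):
            level = lac_expression(glucose, lactose)
            print(f"glucose={glucose!s:5} lactose={lactose!s:5} -> {level}")
```

Real regulation is graded rather than Boolean; the three levels only indicate the direction of the effect.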


Splicing The Introns

http://cwx.prenhall.com/horton/medialib/

From Genes To Networks

Regulation is organized in networks
Top: gene network regulating the body development of sea urchin
Middle: a promoter region
Bottom: interaction of two modules


Regulatory Networks Can Be Complex

Genetic regulatory network controlling the development of the body plan of the sea urchin embryo
Davidson et al., Science, 295(5560):1669-1678.

Sample Bioinformatics Challenges

Discovering and analyzing transcription factors
  Evolutionary analysis; motif finding
Discovering the structure of regulatory networks
Analyzing the operations of regulatory networks
Designing synthetic regulatory networks

Translation


RNA Encodes Protein Sequences

Proteins are sequences of amino acids (AA)
Translation uses the RNA sequence as a template to construct the AA sequence

The coding problem:
  Code sequences of 20 amino acids using 4 nucleotides
  2 nucleotides can code only 4^2 = 16 amino acids
  Codon: sequence of 3 nucleotides; encodes an amino acid

Translation: translate mRNA codons to amino acids
  Start/Stop codons define an open reading frame (ORF)
  Translation requires reading/identifying codons and forming a respective protein sequence

DNA -> Transcription -> RNA -> Translation -> Protein
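
The counting argument behind the coding problem can be checked directly. This is a small illustrative Python sketch; the function name `min_codon_length` is made up for the example.

```python
def min_codon_length(n_symbols: int, alphabet_size: int = 4) -> int:
    """Smallest word length whose number of distinct words covers n_symbols."""
    length = 1
    while alphabet_size ** length < n_symbols:
        length += 1
    return length


if __name__ == "__main__":
    for length in (1, 2, 3):
        print(f"words of length {length}: {4 ** length}")
    print("minimum codon length for 20 amino acids:", min_codon_length(20))
```

Length 3 gives 64 codons for 20 amino acids plus Stop, which is why several codons map to the same amino acid.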


The Genetic Code

Codons are grouped by first base (blocks) and second base (rows, in the order G, A, C, U).

First base G:
  GGU Glycine   GGC Gly   GGA Gly   GGG Gly
  GAU Aspartate   GAC Asp   GAA Glutamate   GAG Glu
  GCU Alanine   GCC Ala   GCA Ala   GCG Ala
  GUU Valine   GUC Val   GUA Val   GUG Val

First base A:
  AGU Serine   AGC Ser   AGA Arg   AGG Arg
  AAU Asparagine   AAC Asn   AAA Lysine   AAG Lys
  ACU Threonine   ACC Thr   ACA Thr   ACG Thr
  AUU Isoleucine   AUC Ile   AUA Ile   AUG Methionine

First base C:
  CGU Arginine   CGC Arg   CGA Arg   CGG Arg
  CAU Histidine   CAC His   CAA Glutamine   CAG Gln
  CCU Proline   CCC Pro   CCA Pro   CCG Pro
  CUU Leu   CUC Leu   CUA Leu   CUG Leu

First base U:
  UGU Cysteine   UGC Cys   UGA Stop   UGG Tryptophan
  UAU Tyrosine   UAC Tyr   UAA Stop   UAG Stop
  UCU Serine   UCC Ser   UCA Ser   UCG Ser
  UUU Phenylalanine   UUC Phe   UUA Leucine   UUG Leu
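
The genetic code is a lookup table, so it maps naturally onto a dictionary. The sketch below is an illustrative Python version, not part of the original slides. It uses single-letter amino acid codes (with `*` for Stop) instead of the names on the slide, and the compact 64-letter string follows the common convention of listing codons with bases in the order U, C, A, G.

```python
from itertools import product

BASES = "UCAG"
# One letter per codon, in UUU, UUC, UUA, UUG, UCU, ... order; '*' marks Stop.
AMINO_ACIDS = (
    "FFLLSSSSYY**CC*W"  # first base U
    "LLLLPPPPHHQQRRRR"  # first base C
    "IIIMTTTTNNKKSSRR"  # first base A
    "VVVVAAAADDEEGGGG"  # first base G
)
CODON_TABLE = {
    "".join(codon): amino_acid
    for codon, amino_acid in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def translate(mrna: str) -> str:
    """Translate codon by codon from position 0 until a Stop codon."""
    protein = []
    for i in range(0, len(mrna) - 2, 3):
        amino_acid = CODON_TABLE[mrna[i:i + 3]]
        if amino_acid == "*":
            break
        protein.append(amino_acid)
    return "".join(protein)


if __name__ == "__main__":
    print(translate("AUGGCUUGGUAA"))  # Met-Ala-Trp, then Stop
```

The function assumes the reading frame starts at the first character; locating the Start codon is a separate step.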


tRNA Provides Translation Units

Anticodon 3’ CGA 5’ binds to codon 5’ GCU 3’ of mRNA

It translates GCU to Alanine

http://users.rcn.com/jkimball.ma.ultranet/BiologyPages/T/Translation.html
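
The codon/anticodon relation is base-by-base RNA complementarity between antiparallel strands. A minimal Python sketch, with illustrative function names and ignoring wobble pairing at the third position:

```python
RNA_COMPLEMENT = {"A": "U", "U": "A", "G": "C", "C": "G"}


def anticodon_3_to_5(codon_5_to_3: str) -> str:
    """Anticodon written 3'->5', aligned base by base under the codon."""
    return "".join(RNA_COMPLEMENT[base] for base in codon_5_to_3)


def anticodon_5_to_3(codon_5_to_3: str) -> str:
    """The same anticodon written in the conventional 5'->3' direction."""
    return anticodon_3_to_5(codon_5_to_3)[::-1]


if __name__ == "__main__":
    print(anticodon_3_to_5("GCU"))  # matches the slide's 3' CGA 5'
    print(anticodon_5_to_3("GCU"))
```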

Translation Basics

Initiation:
  Ribosome binds to mRNA; moves in the 5’ to 3’ direction until it finds the Start codon AUG
Elongation:
  Ribosome recruits tRNA to match next codon
  tRNA binds its AA into peptide bond with protein
  Ribosome releases tRNA and moves to next codon
Termination:
  Until a Stop codon is reached
  Release factor releases polypeptide from ribosome

http://users.rcn.com/jkimball.ma.ultranet/BiologyPages/T/Translation.html
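
The initiation, elongation and termination steps amount to a simple scan over the mRNA. The following Python sketch is a simplified illustration with a made-up function name: it takes the first AUG as the Start codon, steps three bases at a time, and stops at the first in-frame Stop codon. It does not model tRNA recruitment or release factors.

```python
START_CODON = "AUG"
STOP_CODONS = {"UAA", "UAG", "UGA"}


def find_orf(mrna: str):
    """Return the codons from the first AUG up to (not including) a Stop.

    Returns None if there is no Start codon or no in-frame Stop codon.
    """
    start = mrna.find(START_CODON)       # initiation: scan 5'->3' for AUG
    if start == -1:
        return None
    codons = []
    for i in range(start, len(mrna) - 2, 3):
        codon = mrna[i:i + 3]            # elongation: read the next codon
        if codon in STOP_CODONS:         # termination: Stop codon reached
            return codons
        codons.append(codon)
    return None


if __name__ == "__main__":
    print(find_orf("GGAUGGCUUGGUAAGG"))
```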

Animation: Translation of RNA into proteins

Proteins Are Sequences of Amino Acids

Proteins are constructed through peptide bonds
Proteins are folded into complex conformations
Proteins perform functions by binding
  Transcription factors and polymerase bind to DNA
  Enzymes bind to molecules to accelerate their reactions
  Globins bind to oxygen to transport it
  Antibodies bind to pathogens

Example: Hemoglobin


Sickle-Cell Anemia: A Single Nucleotide Change

Sickle structure

Codon 6 in β-globin
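
A single-nucleotide change can be located by comparing two sequences position by position. The Python sketch below is illustrative only. The mRNA fragment (codons 5 to 7 of β-globin, CCU GAG GAG) and the sickle-cell substitution GAG to GUG (glutamate to valine) come from standard textbook accounts, not from the slide text, and the helper name `point_mutations` is made up.

```python
def point_mutations(reference: str, variant: str):
    """List (index, reference_base, variant_base) where the sequences differ."""
    if len(reference) != len(variant):
        raise ValueError("sequences must have the same length")
    return [
        (i, ref_base, var_base)
        for i, (ref_base, var_base) in enumerate(zip(reference, variant))
        if ref_base != var_base
    ]


# Only the two codons needed for this example.
AMINO_ACID = {"GAG": "Glu", "GUG": "Val"}

if __name__ == "__main__":
    normal = "CCUGAGGAG"  # codons 5-7: Pro, Glu, Glu (textbook fragment)
    sickle = "CCUGUGGAG"  # codon 6 changed from GAG to GUG
    print(point_mutations(normal, sickle))
    print(AMINO_ACID[normal[3:6]], "->", AMINO_ACID[sickle[3:6]])
```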


Evolution of β-Globin

(α-globin cluster is coded by chromosome 16)

The Evolution of α-Globin Across Species


Protein Structures

Protein Structure Is Of Central Importance

Structure is found through complex experimental methods
  X-ray crystallography (diffraction); NMR
The holy grail: compute structure from sequence
  Ab initio: compute structure directly from sequence
  Homology techniques: use similarity to known proteins
Structure is conserved across wide variations
  Small number of fold families (α-helix, β-sheets…)
  There are rules (e.g., hydrophobic AA are packed inside)
  Nature folds proteins very fast
So why is it so difficult to predict structure?

SwissProt vs. PDB Statistics

PDB ~30k structures


Proteins Interact Via Active Sites

Protein interactions are defined by active sites
  E.g., antibody with pathogen
  E.g., drug design
Proteins use geometry: ligands latch with holes
Proteins use physics: electrical fields
How can protein-protein interactions be computed?

Sample Bioinformatics Challenges

Analyzing protein sequence similarity
  Evolutionary conservation/changes
Computing structure from sequences
Analyzing structure homologies
Analyzing protein-to-protein interactions
Inferring function from structure

The Cell Cycle

Cells Operate In Cycles

G0 Phase
  Cell is at rest
G1 Phase (4 hrs)
  Cell either progresses into synthesis or leaves cell cycle to differentiate
S Phase (10 hrs)
  DNA synthesis
  Checkpoint determines integrity of DNA
G2 Phase (4 hrs)
  Cell prepares for mitosis
  Checkpoint determines integrity of DNA
  DNA is repaired or cell dies (apoptosis)
Mitosis (2 hrs)
  Chromosomes are separated
  Cell divides
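
The phase durations quoted above add up to a 20-hour cycle, which can be modeled as a simple timetable. The Python sketch below is illustrative and makes loud simplifications: the cell cycles continuously, never enters G0, and is never held at a checkpoint. The names `PHASES` and `phase_at` are made up for the example.

```python
# (phase, duration in hours), using the durations quoted on the slide.
PHASES = [("G1", 4), ("S", 10), ("G2", 4), ("M", 2)]
CYCLE_HOURS = sum(duration for _, duration in PHASES)


def phase_at(hours_since_g1_start: float) -> str:
    """Return the phase of a continuously cycling cell at a given time."""
    t = hours_since_g1_start % CYCLE_HOURS
    for name, duration in PHASES:
        if t < duration:
            return name
        t -= duration
    raise AssertionError("unreachable: t is always inside one phase")


if __name__ == "__main__":
    print("cycle length (hours):", CYCLE_HOURS)
    for hour in (0, 5, 15, 19, 25):
        print(hour, phase_at(hour))
```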

The Cell Cycle is Regulated

Transition among phases is controlled by a regulatory network
Checkpoints are used to assure quality

Evolution


Optimizing Functionality

DNA is substantially conserved through evolution
Evolution = mutation + selection
  Mutation = single nucleotide polymorphism (SNP); duplication of entire DNA segments; mating; recombination
  Selection = optimize fitness of species
Examples
  Metabolic nets learn to optimize energy budget (Alon 05)

Functional similarity Sequence similarity
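
The "mutation + selection" loop can be shown with a toy simulation. Everything in the Python sketch below is an assumption made for illustration: fitness is defined as the number of positions matching an arbitrary target sequence, mutation changes one random base, and selection keeps a mutant only if it is at least as fit. Real fitness is not similarity to a fixed target, and the function names are made up.

```python
import random

ALPHABET = "ACGT"


def fitness(sequence: str, target: str) -> int:
    """Toy fitness: number of positions that match the target."""
    return sum(a == b for a, b in zip(sequence, target))


def mutate(sequence: str, rng: random.Random) -> str:
    """Single-nucleotide change at a random position."""
    i = rng.randrange(len(sequence))
    return sequence[:i] + rng.choice(ALPHABET) + sequence[i + 1:]


def evolve(start: str, target: str, generations: int, seed: int = 0) -> str:
    """Repeat mutation, then selection (keep mutants that are no worse)."""
    rng = random.Random(seed)
    current = start
    for _ in range(generations):
        candidate = mutate(current, rng)
        if fitness(candidate, target) >= fitness(current, target):
            current = candidate
    return current


if __name__ == "__main__":
    start, target = "AAAAAAAA", "ACGTACGT"
    result = evolve(start, target, generations=200)
    print(result, fitness(result, target))
```

Because selection never accepts a less fit mutant, fitness in this toy model can only stay the same or increase over generations.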