Translating the Cell’s “Instruction Manual” A Biophysicist’s Approach to Understanding Gene Regulation Rachel Patton McCord Bulyk Lab Harvard University Biophysics Program 3/20/08


Page 1: Rachel Patton McCord Bulyk Lab Harvard University Biophysics Program 3/20/08

Translating the Cell’s “Instruction Manual”

A Biophysicist’s Approach to Understanding Gene Regulation

Rachel Patton McCord
Bulyk Lab
Harvard University Biophysics Program
3/20/08

Page 2

“Knobloch lives?” What are characteristics of “life”?

Response to environment
Take in nutrients and produce waste
Reproduction
…

Page 3

Biological Signal Processing

oxygen ethanol

Page 4

Biological Signal Processing

[Figure labels: Inputs; Nucleus; Transcription Factor; mRNA; protein; Outputs]

Page 5

Regulation of Gene Expression

Transcription Factor (TF) recognizes DNA bases (ACGT)

Promotes gene expression: transcription of mRNA (output)

[Figure labels: RNA; Sequence-Specific TFs; RNA Polymerase]

Page 6

Organisms

Ideal: understand gene regulation in humans
Problems: large genome size, diverse cell types, likely complicated gene regulation “rules”

Begin with a model system: the single-celled organism Saccharomyces cerevisiae (yeast)

[Figure labels: “A few hundred kb”; “A few hundred bp”]

Page 7

Goals: Find DNA sequences bound by TFs

Predict how TFs function in the cell

Look for biophysical links between TF structure and function

Use quantitative approaches to maintain a physically realistic view of biology.

Page 8

Protein Binding Microarray (PBM) Technology

TF-DNA Sequence Recognition

[Figure labels: TF; Microarray slide; Fluorophore-labeled antibody; dsDNA]

Mukherjee, Berger, et al., Nature Genetics (2004), 36:1331-1339.

Page 9

Protein Binding Microarray (PBM) Technology

TF-DNA Sequence Recognition

[Figure labels: Detector; Laser (488 nm)]

Mukherjee, Berger, et al., Nature Genetics (2004), 36:1331-1339.

Page 10

Universal Array Design

Interested in sequences of 8-10 bases

Example probe (ends labeled 5’ and 3’ on the slide):
CTATCTACACACAACTATGCGGTCGCCATGGAAATGGTCTGTGTTCCGTTGTCCGTGCTG

Overlapping windows along the probe:
CTATCTACACA
TATCTACACAC
ATCTACACACA
TCTACACACAA

36 nt variable sequence
24 nt fixed sequence
27 10-mers per spot

4^10 ≈ 1,000,000 total 10-mers
4^10 / 27 ≈ 40,000 total spots

Philippakis, Qureshi et al., RECOMB (2007).
Berger, Philippakis et al., Nature Biotechnology (2006), 24:1429-1435.
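The spot arithmetic on this slide can be checked with a short sketch. It assumes the first 36 nt of the example probe are the variable region and the trailing 24 nt are the fixed sequence; the helper name `kmers` is illustrative and not from the talk.

```python
def kmers(seq, k):
    """Return every length-k window of seq, left to right."""
    return [seq[i:i + k] for i in range(len(seq) - k + 1)]

# Example probe from the slide: 36 nt variable region + 24 nt fixed region (assumed layout).
probe = "CTATCTACACACAACTATGCGGTCGCCATGGAAATGGTCTGTGTTCCGTTGTCCGTGCTG"
variable = probe[:36]

ten_mers = kmers(variable, 10)            # 36 - 10 + 1 windows
total_10mers = 4 ** 10                    # every possible 10-mer over A, C, G, T
spots_needed = total_10mers / len(ten_mers)

print(len(ten_mers), total_10mers, round(spots_needed))
```

Covering each 10-mer exactly once therefore needs a little under 40,000 spots, which is where the slide's estimate comes from.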

Page 11

Universal Array Design

Use an idea from cryptography: a “de Bruijn” sequence contains all sequence variants of length k in the shortest sequence possible

Anthony Philippakis, Mike Berger

All possible 3-mers:
AAA AAC AAG AAT ACA ACC ACG ACT
AGA AGC AGG AGT ATA ATC ATG ATT
CAA CAC CAG CAT CCA CCC CCG CCT
CGA CGC CGG CGT CTA CTC CTG CTT
GAA GAC GAG GAT GCA GCC GCG GCT
GGA GGC GGG GGT GTA GTC GTG GTT
TAA TAC TAG TAT TCA TCC TCG TCT
TGA TGC TGG TGT TTA TTC TTG TTT

de Bruijn sequence length = 4^3 = 64 bp

[Figure: de Bruijn sequence laid out across array probes; each probe has a test sequence (36 bp) and a fixed sequence (24 bp)]
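As a rough illustration of the de Bruijn idea, the sketch below builds a cyclic sequence of length 4^3 = 64 that contains every 3-mer exactly once. It uses the textbook Lyndon-word construction, which is an assumption for illustration and not necessarily how the array sequences in the talk were generated.

```python
from itertools import product

def de_bruijn(alphabet, k):
    """Cyclic de Bruijn sequence of order k (standard Lyndon-word construction)."""
    n = len(alphabet)
    a = [0] * (n * k)
    out = []

    def db(t, p):
        if t > k:
            if k % p == 0:
                out.extend(a[1:p + 1])
        else:
            a[t] = a[t - p]
            db(t + 1, p)
            for j in range(a[t - p] + 1, n):
                a[t] = j
                db(t + 1, t)

    db(1, 1)
    return "".join(alphabet[i] for i in out)

seq = de_bruijn("ACGT", 3)

# The sequence is cyclic, so append the first k-1 bases before reading windows.
linear = seq + seq[:2]
found = {linear[i:i + 3] for i in range(len(seq))}
all_3mers = {"".join(p) for p in product("ACGT", repeat=3)}

print(len(seq), found == all_3mers)
```

Cutting such a sequence into overlapping 36 nt pieces gives probes in which every k-mer appears, which is the property the universal array design relies on.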

Page 12

Deriving Binding Strength at each Sequence

CCGTCAGCAGTCATGGAAAGCTGGTAGAAGTTCTGGGTCTGTGTTCCGTTGTCCGTGCTG
TTATACCATGGAAAGACAAACGTAGCATGTTGGAGTGTCTGTGTTCCGTTGTCCGTGCTG
CCATGGAAATGTGTCCCTAAGGGTGGTAACAAAATAGTCTGTGTTCCGTTGTCCGTGCTG
CACTACGCAAGTGCGGTGCATGGAAAGGGTTCTGGAGTCTGTGTTCCGTTGTCCGTGCTG
ATCTCATGGAAAAGACTCATAACGATCAACAGTCGGGTCTGTGTTCCGTTGTCCGTGCTG
ACAACAGAGCACCGATGGCATGGAAACTTGCGTAGAGTCTGTGTTCCGTTGTCCGTGCTG
GTGGAGAAAGGGGTCAAACATGGAAACGCATCGACAGTCTGTGTTCCGTTGTCCGTGCTG
GCCCGGGATCCCATCCATGGAAAATGTCGCTTACATGTCTGTGTTCCGTTGTCCGTGCTG
CAGAAGTGTCCTACGTAACATCCACATGGAAAGTACGTCTGTGTTCCGTTGTCCGTGCTG
GTTGCATACACGCATGGAAATAACAATCGAACTCCAGTCTGTGTTCCGTTGTCCGTGCTG
TCATGTGCTGGGCTTGATTCAGCATGGAAAACCAGTGTCTGTGTTCCGTTGTCCGTGCTG
TATTCTTCTCTTCATGGAAACAGTAAAAAATCGGACGTCTGTGTTCCGTTGTCCGTGCTG
CTATCTACACACAACTATGCGGTCGCCATGGAAATGGTCTGTGTTCCGTTGTCCGTGCTG
CCTGGGGACATGGAAAAATGAAGTCACCCATGGTGCGTCTGTGTTCCGTTGTCCGTGCTG
ATCATCCTTACATTACATGGAAATCGTGTGCCAATAGTCTGTGTTCCGTTGTCCGTGCTG
AAGGCCCATGGAAACCACGTCATATTCACAACTAACGTCTGTGTTCCGTTGTCCGTGCTG

Example: CATGGAAA

Every 8-mer is represented 16 times
Take median over intensities of all spots containing this 8-mer
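A minimal sketch of the median step follows. It assumes spot data arrive as (variable sequence, intensity) pairs and that an 8-mer and its reverse complement are scored together, as the paired columns on the next slide suggest. The intensities are made up for illustration, and the function names are hypothetical.

```python
from collections import defaultdict
from statistics import median

COMPLEMENT = str.maketrans("ACGT", "TGCA")

def reverse_complement(seq):
    """Reverse complement of a DNA string."""
    return seq.translate(COMPLEMENT)[::-1]

def canonical(kmer):
    """Strand-independent key: the alphabetically smaller of a k-mer and its reverse complement."""
    return min(kmer, reverse_complement(kmer))

def median_kmer_signal(spots, k=8):
    """Median intensity over all spots whose sequence contains each k-mer."""
    per_kmer = defaultdict(list)
    for sequence, intensity in spots:
        # A set counts each spot once per k-mer, even if the k-mer occurs twice in it.
        keys = {canonical(sequence[i:i + k]) for i in range(len(sequence) - k + 1)}
        for key in keys:
            per_kmer[key].append(intensity)
    return {key: median(values) for key, values in per_kmer.items()}

# Three variable regions from the slide, with hypothetical intensities.
spots = [
    ("CCGTCAGCAGTCATGGAAAGCTGGTAGAAGTTCTGG", 100.0),
    ("TTATACCATGGAAAGACAAACGTAGCATGTTGGAGT", 300.0),
    ("CCATGGAAATGTGTCCCTAAGGGTGGTAACAAAATA", 200.0),
]
signal = median_kmer_signal(spots)
print(signal["CATGGAAA"])
```

Taking the median rather than the mean keeps one unusually bright or dim spot from dominating the estimate for an 8-mer.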

Page 13

Deriving Binding Strength at each Sequence

8-mer      Rev. Comp.  Median Signal
GTCACGTG   CACGTGAC    108178
GCACGTGC   GCACGTGC    95854
CACGTGCC   GGCACGTG    89203
GCACGTGA   TCACGTGC    74295
TCACGTGA   TCACGTGA    69377
ACACGTGA   TCACGTGT    68733
ATCACGTG   CACGTGAT    58874
CACGTGTA   TACACGTG    58656
CCACGTGA   TCACGTGG    47900
ACACGTGG   CCACGTGT    47240
CACGTGAG   CTCACGTG    42887
AGCACGTG   CACGTGCT    41755
ACACGTGC   GCACGTGT    36764
CACGTGTC   GACACGTG    36463
ACCACGTG   CACGTGGT    36380
CACGTGCG   CGCACGTG    35515
CACGTGCA   TGCACGTG    32370
AACACGTG   CACGTGTT    28948
CCACGTGC   GCACGTGG    22983
CACGTGGC   GCCACGTG    19315
...        ...         ...

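$$[\mathrm{TF}] + [\mathrm{DNA}] \underset{k_d}{\overset{k_a}{\rightleftharpoons}} [\mathrm{TF\text{-}DNA}]$$

[Figure: Affinity vs. PBM Signal (Cbf1); x-axis: log (PBM Median Signal); y-axis: log (K_D^-1)]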

Maerkl and Quake. Science (2007); 315:233-237.

Page 14

Goals: Find DNA sequences bound by TFs

PBMs

Predict how TFs function in the cell

Look for biophysical links between TF structure and function

Use quantitative approaches to maintain a physically realistic view of biology.

Page 15

Predicting TF Cellular Functions

Use known/measurable inputs and outputs:

Heat shock

Gene Deletion

Gene expression

mRNA

Page 16

Gene Expression Data

1327 Publicly Available Microarray Datasets

[Figure labels: Condition 1; Condition 2; mRNA]

Page 17

Predicting Cellular Functions of Components

Basic model/assumptions:
TF binding near genes causes change in expression
Similar TF binding probability + similar expression = active regulation

[Figure labels: TF1 (×4); Gene 1; Gene 2; Gene 3; Gene 4; Gene 5; PBM data; Expression data]

Page 18

Physically Realistic Binding Probability

Simple (and often used) view:

Promoter region is BOUND: Gene is ON
GGCACGTGGCTGCATGAGCGGAGTCACGTGGGAAAATACAACAGTCACCCACGTG
CCGTGCACCGACGTACTCGCCTCAGTGCACCCTTTTATGTTGTCAGTGGGTGCAC
[Figure labels: Cbf1; Gene]

Promoter region is NOT BOUND: Gene is OFF
GGCACGTGGCTGCATGAGCGGAGGCTCGCGGGAAAATACAACAGTCACCCACGTG
CCGTGCACCGACGTACTCGCCTCCGTGCGCCCTTTTATGTTGTCAGTGGGTGCAC
[Figure label: Gene]

Page 19

Physically Realistic Binding Probability

Physical reality: energy landscape of potential TF binding

TF occupancy probability = integration of binding potential across sequence near gene
Dictates likelihood of recruiting RNA polymerase and thus level of mRNA transcription

GGCACGTGGCTGCATGAGCGGAGTCACGTGGGAAAATACAACAGTCACCCACGTG
CCGTGCACCGACGTACTCGCCTCAGTGCACCCTTTTATGTTGTCAGTGGGTGCAC
[Figure labels: Cbf1; Gene]

Page 20

Physically Realistic Binding Probability

Physical reality: energy landscape of potential binding

Sum median intensity data across all possible 8-mers in sequence near gene

GGCACGTGGCTGCATGAGCGGAGTCACGTGGGAAAATACAACAGTCACCCACGTG
CCGTGCACCGACGTACTCGCCTCAGTGCACCCTTTTATGTTGTCAGTGGGTGCAC
[Figure labels: Cbf1; Gene]

GGCACGTGGCTGCATGAGCGGAGTCACGTGGGAAAATACAACAGTCACCCACGTG
CCGTGCACCGACGTACTCGCCTCAGTGCACCCTTTTATGTTGTCAGTGGGTGCAC
[Figure label: Gene]

Intensity = 117651    Intensity = 215352
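The summation described on this slide might look like the sketch below. It assumes a lookup table of median 8-mer signals keyed strand-independently, and it treats 8-mers missing from the table as contributing zero, which is a simplification because a real PBM table covers every 8-mer. The three table entries are taken from the Cbf1 table on Page 13; the short promoter string is invented for illustration.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")

def reverse_complement(seq):
    """Reverse complement of a DNA string."""
    return seq.translate(COMPLEMENT)[::-1]

def canonical(kmer):
    """Strand-independent key for a k-mer."""
    return min(kmer, reverse_complement(kmer))

def summed_binding_signal(promoter, kmer_signal, k=8):
    """Sum median PBM signal over every k-mer window in the promoter."""
    total = 0.0
    for i in range(len(promoter) - k + 1):
        key = canonical(promoter[i:i + k])
        total += kmer_signal.get(key, 0.0)  # assumption: unlisted 8-mers count as 0
    return total

# A few rows of the Cbf1 table, re-keyed to the canonical strand.
raw = {"GTCACGTG": 108178, "GCACGTGA": 74295, "CACGTGCA": 32370}
kmer_signal = {canonical(kmer): value for kmer, value in raw.items()}

promoter = "AAGTCACGTGCAA"  # hypothetical promoter fragment
score = summed_binding_signal(promoter, kmer_signal)
print(score)
```

Because every window contributes, weak sites and partial overlaps with a strong site both raise the score, rather than being discarded by a bound/not-bound cutoff.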

Page 21

Goals of New Analysis Method

Combine binding probability with expression data to predict TF function and condition-specific binding site usage

[Figure labels: PBM data; Gene expression; Condition A; Condition B; Condition C; Condition D; Target Gene: 1, 2, 3, 4, 5, 6; TF Function]

Page 22

Goals of New Analysis Method

Consider all data rather than drawing arbitrary cutoffs
Low-affinity binding as well as minor expression changes may be biologically relevant (Tanay, 2006; Foat et al., 2006)

[Figure label: Binding probability?]

Page 23

CRACR

“Combination Rank-order Analysis of Condition-specific Regulation”

Page 24

Basics of CRACR Approach

Order genes by expression in condition of interest
Assign ranks based on PBM-derived binding probability for TF

Genes (most induced to most repressed):
YGR043C YAR014C YAR029W YGR087C YAR018W YAL003C YAR003W YGR088W YAR044W YER130C YPL054W

TF binding rank:
2 3 6 9 1 8 5 10 4 7 11

Page 25

Basics of Analysis Approach

Select: similarly expressed foreground genes and a background set

Genes (most induced to most repressed):
YGR043C YAR014C YAR029W YGR087C YAR018W YAL003C YAR003W YGR088W YAR044W YER130C YPL054W

PBM p-value rank:
2 3 6 9 1 8 5 10 4 7 11

[Figure labels: foreground; background]

Page 26

Basics of Analysis Approach

Slide window along ordered expression
Calculate an area statistic for enrichment of PBM targets within each window vs. background

Genes (most induced to most repressed):
YGR043C YAR014C YAR029W YGR087C YAR018W YAL003C YAR003W YGR088W YAR044W YER130C YPL054W

PBM p-value rank:
2 3 6 9 1 8 5 10 4 7 11

$$\text{area} = \frac{1}{B + F}\left[\frac{\rho_B}{B} - \frac{\rho_F}{F}\right]$$

ρ = rank sum; F = foreground; B = background (F and B also count the genes in each set)
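The area statistic can be tried on the eleven ranks shown on the slide. This is a sketch under two assumptions that the slide does not spell out: the background is every gene outside the current window, and the window is three genes wide. The function names are illustrative.

```python
def area_statistic(fg_ranks, bg_ranks):
    """(1 / (B + F)) * (rho_B / B - rho_F / F), where rho is a rank sum."""
    F, B = len(fg_ranks), len(bg_ranks)
    return (sum(bg_ranks) / B - sum(fg_ranks) / F) / (B + F)

def sliding_area(ranks, window):
    """Area statistic for each window along genes ordered by expression."""
    areas = []
    for start in range(len(ranks) - window + 1):
        foreground = ranks[start:start + window]
        background = ranks[:start] + ranks[start + window:]  # assumed: all other genes
        areas.append(area_statistic(foreground, background))
    return areas

# Binding ranks in expression order (most induced first), as listed on the slide.
ranks = [2, 3, 6, 9, 1, 8, 5, 10, 4, 7, 11]
areas = sliding_area(ranks, window=3)  # window size of 3 is a hypothetical choice

# Best case: the three top-ranked binders all fall in the foreground.
best = area_statistic([1, 2, 3], list(range(4, 12)))
print(round(areas[0], 4), best)
```

The best-case value of 0.5 matches the range of -0.5 to 0.5 quoted for the statistic, and a positive value means the window is enriched for strongly bound genes.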

Page 27

Predicting TF Function

Plot area statistic (ranges -0.5 to 0.5) at each window
Determine condition significance by permutation test-derived threshold (gray line: p < 0.001)

[Figure: “Glucose added: Mig1 targets repressed”; y-axis: area statistic; x-axis: expression fold change, induced to repressed (>8.0, 5.0, 3.4, 2.3, 1.5, 0, -1.5, -2.3, -3.4, -5, <-8)]

[Diagram labels: Glucose; Mig1; mRNA; metabolism enzyme; metabolism switch]
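One way a permutation-derived threshold could be computed is sketched here. The slide does not describe the permutation scheme, so shuffling the binding ranks across genes, the window size, the number of permutations, and the use of the absolute area are all assumptions for illustration.

```python
import random

def area_statistic(fg_ranks, bg_ranks):
    """(1 / (B + F)) * (rho_B / B - rho_F / F), where rho is a rank sum."""
    F, B = len(fg_ranks), len(bg_ranks)
    return (sum(bg_ranks) / B - sum(fg_ranks) / F) / (B + F)

def permutation_threshold(ranks, window, n_perm=2000, alpha=0.001, seed=0):
    """Absolute area value exceeded by roughly a fraction alpha of random windows."""
    rng = random.Random(seed)
    shuffled = list(ranks)
    null = []
    for _ in range(n_perm):
        rng.shuffle(shuffled)  # break any link between binding rank and expression
        null.append(abs(area_statistic(shuffled[:window], shuffled[window:])))
    null.sort()
    index = min(len(null) - 1, int((1 - alpha) * len(null)))
    return null[index]

# Hypothetical genome of 200 genes with binding ranks 1..200 and a 20-gene window.
threshold = permutation_threshold(list(range(1, 201)), window=20)
print(threshold)
```

A window whose area statistic lies beyond this threshold would then be called significant, in the spirit of the gray line on the plot.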

Page 28

Predicting TF Function

Determine which individual genes are repressed by Mig1

[Figure: “Glucose added: Mig1 targets repressed”; y-axis: area statistic; x-axis: expression fold change, induced to repressed (>8.0, 5.0, 3.4, 2.3, 1.5, 0, -1.5, -2.3, -3.4, -5, <-8)]

Group of genes repressed by Mig1

[Figure labels: Mig1 with YHR005C; Mig1 with YER130C; Mig1 with YBL054W]

Page 29

Prediction of General TF Function

Find all (of 1327) expression conditions where a TF is predicted to be active
Look for enrichment of general biological functions in this set

Selected Mcm1 significant conditions

Conditions for which there is significant enrichment of PBM targets, by effect:

Induced:
Cell Cycle: Expression in response to Clb2p (set 1, 40 min)
Expression during the cell cycle (alpha factor arrest and release)(16)
Expression during the cell cycle (cdc15 arrest and release)(8)
Expression during the cell cycle (cdc28)(7)
Expression in response to 50 nM alpha-factor: 120 min
Expression in ckb2 deletion mutant
Expression in dig1, dig2 deletion mutant
Expression in swi6 (haploid) deletion mutant
Expression in tec1 (haploid) deletion mutant
Expression in yel044w deletion mutant

Repressed:
Expression in sir2 deletion mutant
Expression in snf2 mutant cells in minimal medium
Expression in response to 50 nM alpha-factor in bni1 mutant: 60 min

Page 31

Prediction of General TF Function

Selected Mcm1 significant conditions

Find all (of 1327) expression conditions where a TF is predicted to be active

Look for enrichment of general biological functions in this set

Prediction: Mcm1 involved in cell cycle and mating

[Figure labels: “a” cell; “alpha” cell; alpha factor]

Page 32

Prediction of TF function

After PBM experiments, CRACR has been used to predict functions of 90 yeast TFs (paper in process)

Page 33

Binding Site Affinity Effects

$$[\mathrm{TF}] + [\mathrm{DNA}] \underset{k_d}{\overset{k_a}{\rightleftharpoons}} [\mathrm{TF\text{-}DNA}]$$

[Figure: TF at Gene 1, Gene 2, Gene 3; axis: binding affinity (High affinity, Medium affinity, Low affinity); panels: TF concentration low, TF concentration medium, TF concentration high]
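For the equilibrium above, a simple two-state model gives fractional occupancy = [TF] / ([TF] + K_D), with K_D = k_d / k_a. The sketch below applies it to three sites at three TF concentrations. The K_D values and concentrations are hypothetical numbers chosen only to show the trend, and the model assumes free TF is in excess over DNA.

```python
def occupancy(tf_conc, kd):
    """Fraction of time a site is bound at equilibrium: [TF] / ([TF] + K_D)."""
    return tf_conc / (tf_conc + kd)

# Hypothetical dissociation constants (nM) for three binding sites.
sites = {"High affinity": 1.0, "Medium affinity": 10.0, "Low affinity": 100.0}
# Hypothetical TF concentrations (nM): low, medium, high.
concentrations = [1.0, 10.0, 100.0]

table = {
    name: [round(occupancy(conc, kd), 3) for conc in concentrations]
    for name, kd in sites.items()
}
for name, row in table.items():
    print(name, row)
```

In this toy model the high-affinity site is already half occupied at the lowest concentration, while the low-affinity site only reaches half occupancy at the highest one, which is the concentration dependence the slide illustrates.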

Page 34

Demonstrating Effects of Binding Site Affinity

Low vs. high affinity binding sites may have different biological functions

[Figure: Expression after oxidative stress vs. Rap1 binding affinity; genes ordered from highest to lowest binding affinity]

[Figure: ALD4 - Predicted Conditional Target; y-axis: occupancy units (0-20); x-axis: time after diamide treatment (0, 20, 30 min)]

[Figure: MCR1 - Predicted Conditional Target; y-axis: occupancy units (0-10); x-axis: time after diamide treatment (0, 20, 30 min)]

Experimentally Validated

Page 35

Goals: Find DNA sequences bound by TFs

PBMs

Predict how TFs function in the cell

CRACR

Look for biophysical links between TF structure and function

Use quantitative approaches to maintain a physically realistic view of biology.

Page 36

Reasons for Different Functions: TF structure?

Goal: Consider biophysical TF structure instead of cartoon “TF blob”

[Figure labels: Mig1; cyc8; tup1]

Page 37

TF Structure and Function

Are certain TFs structurally suited for certain types of biological processes?

Case Study:

Lower Information Content Motif

GAL4 (Zn2Cys6)

CST6 (bZIP)

Regulatory hub; many target genes

Higher Information Content Motif

More specific, fewer target genes

metabolism of specific nutrients

cell fate, cell cycle

Page 38

Goals: Find DNA sequences bound by TFs

PBMs

Predict how TFs function in the cell

CRACR

Look for biophysical links between TF structure and function

Use quantitative approaches to maintain a physically realistic view of biology.

Page 39

Future Directions

Completion of functional predictions and study of yeast gene regulation

Toward predictive model in humans
Experiments for understanding gene regulation rules

Page 40

Acknowledgements

Martha Bulyk
Mike Berger
Anthony Philippakis
Cong Zhu
Kelsey Byers
Trevor Siggers
Vicky Zhou
Cherelle Walls
Jason Warner
Jaime Chapoy
Other Bulyk Lab Members

NSF graduate research fellowship
NIH/NHGRI R01

Page 41

GO CATS!!

Page 42

Advantages and Challenges of Interdisciplinary Work

Insight gained by quantitative reasoning in biology, combining of different perspectives

“Physicists and mathematicians choose projects in biology that are fun, but not necessarily important”

Important not to get caught up in what “counts” as “true biology” or “true physics”