A simple method for incorporating sequence information into directed evolution experiments


Uploaded by kyle-jensen on 26-May-2015


Page 1

A simple method for incorporating sequence information into directed evolution experiments

Kyle L. Jensen*, Hal Alper*, Curt Fischer, Gregory Stephanopoulos

Department of Chemical Engineering, Massachusetts Institute of Technology

[Figure: linking sequence to phenotype]

Page 2

When screening throughput is limited, linking sequence to phenotype can help direct downstream searches

[Figure panels: screen-based (selectable trait) vs. assay-based (no selectable trait)]

Page 3

Here, a PLtet promoter was mutated to create a library of promoter variants.

Alper, H., C. Fischer, E. Nevoigt, and G. Stephanopoulos. 2005. Tuning genetic control through promoter engineering. Proc. Natl. Acad. Sci. USA 102:12678-83.

Page 4

69 promoter variants were created using error-prone PCR

Page 5

The 69 promoter variants spanned an 800-fold range of activity

- How different are the underlying, mutagenized sequences?

- What, on a sequence level, causes the variation?

[Figure: log relative fluorescence for each mutant number, spanning an 800-fold range and split into the top 50% and bottom 50% classes]
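The split into "top 50%" and "bottom 50%" classes and the fold range can be sketched as follows. This is a minimal illustration, assuming relative fluorescence values are stored in a dictionary keyed by a mutant name; the function names and the example values are hypothetical and not taken from the study. A variant sitting exactly on the median is placed in the bottom class here, which is an arbitrary tie-breaking choice (with 69 variants, the median variant itself must go to one side).

```python
import statistics


def split_by_median(fluorescence):
    """Label each variant 'top' or 'bottom' relative to the median fluorescence.

    Variants strictly above the median are 'top'; the rest (including a variant
    sitting exactly on the median) are 'bottom'.
    """
    cutoff = statistics.median(fluorescence.values())
    return {name: ("top" if value > cutoff else "bottom")
            for name, value in fluorescence.items()}


def fold_range(fluorescence):
    """Ratio of the strongest to the weakest variant."""
    values = list(fluorescence.values())
    return max(values) / min(values)


if __name__ == "__main__":
    # Hypothetical relative fluorescence values, not data from the study
    example = {"m1": 0.01, "m2": 0.2, "m3": 1.0, "m4": 8.0}
    print(split_by_median(example))  # m3 and m4 fall in the top class
    print(fold_range(example))
```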

Page 6

Each of the 69 mutants had a unique sequence and incorporated multiple transition SNPs

[Figure: mutations across the promoter region, shown as position [nt] versus mutant number, alongside log relative fluorescence for each mutant]
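Calling SNPs against the parent sequence and checking whether each is a transition (A/G or C/T) can be sketched as below, assuming the mutant and parent promoter sequences are already aligned without indels. The helper names and the toy sequences are hypothetical; mapping the 0-based index to the promoter coordinates used on later slides (for example -8, -10) would need an offset that is not given here. The Hamming distance addresses the question of how different the mutagenized sequences are.

```python
TRANSITIONS = {frozenset("AG"), frozenset("CT")}


def call_snps(reference, variant):
    """Return (index, ref_base, alt_base) for every mismatch between two aligned sequences."""
    if len(reference) != len(variant):
        raise ValueError("sequences must be aligned to the same length")
    return [(i, r, v) for i, (r, v) in enumerate(zip(reference, variant)) if r != v]


def is_transition(ref_base, alt_base):
    """Purine<->purine (A/G) or pyrimidine<->pyrimidine (C/T) substitution."""
    return frozenset((ref_base, alt_base)) in TRANSITIONS


def hamming(a, b):
    """Number of differing positions between two aligned sequences."""
    return len(call_snps(a, b))


if __name__ == "__main__":
    ref = "ACGTACGT"  # hypothetical toy sequences, not the PLtet promoter
    mut = "GCGTATGT"
    snps = call_snps(ref, mut)
    print(snps, all(is_transition(r, a) for _, r, a in snps))
```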

Page 7

The effects of individual mutations were “masked” by the presence of other mutations

If a mutation occurs more frequently in one class, is it actually correlated with that class?

Does the top/bottom ratio matter?

What is the statistical significance of a mutation that is distributed between the two classes?

Some mutations have obvious effects... most do not

[Figure: mutations by position [nt] and mutant number, with the class distribution at each position]

Page 8

Each individual position can be evaluated using a simple binomial distribution

This is the same as asking: what is the probability of getting 14 heads in 20 coin tosses?

P-value: the probability of 14 or more heads out of 20

Assumption: the positions are independent

[Figure: class distribution by position [nt]]
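The coin-toss p-value above is an upper binomial tail, P(X >= k) for X ~ Binomial(n, p). A minimal sketch, assuming a fair coin (p = 1/2, i.e. equally sized top and bottom classes); the function name is hypothetical. Exact fractions avoid rounding, and for 14 or more heads in 20 tosses the result is 60460/2^20, about 0.058.

```python
from fractions import Fraction
from math import comb


def upper_tail_pvalue(k, n, p=Fraction(1, 2)):
    """P(X >= k) for X ~ Binomial(n, p): the chance of k or more 'top' mutants out of n."""
    return sum(comb(n, i) * p**i * (1 - p)**(n - i) for i in range(k, n + 1))


if __name__ == "__main__":
    p = upper_tail_pvalue(14, 20)  # 14 or more heads in 20 fair tosses
    print(p, float(p))
```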

Page 9

A similar analysis across the promoter region revealed 7 positions significantly correlated with activity

[Figure: class distribution by position [nt]]
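Repeating the single-position test at every position can be sketched as a scan that flags positions skewed toward either class. This assumes, hypothetically, that for each position we already know how many of its mutants fall in the top class and how many mutants carry a mutation there; the 0.05 threshold, the function names, and the example counts are illustrative choices, not values from the slides. Both tails are checked because a position can be associated with low activity as well as high.

```python
from math import comb


def _pmf(i, n, p):
    """Binomial probability of exactly i successes in n trials."""
    return comb(n, i) * p**i * (1 - p)**(n - i)


def scan_positions(counts, alpha=0.05, p_top=0.5):
    """Flag positions whose mutants are skewed toward the top or bottom class.

    counts: dict position -> (mutants in the top class, total mutants at that position)
    Returns dict position -> ("High" | "Low", one-sided binomial p-value).
    """
    hits = {}
    for position, (n_top, n_total) in counts.items():
        p_high = sum(_pmf(i, n_total, p_top) for i in range(n_top, n_total + 1))
        p_low = sum(_pmf(i, n_total, p_top) for i in range(0, n_top + 1))
        if p_high < alpha:
            hits[position] = ("High", p_high)
        elif p_low < alpha:
            hits[position] = ("Low", p_low)
    return hits


if __name__ == "__main__":
    # Hypothetical counts, not the study's data
    example = {"pos_a": (9, 10), "pos_b": (3, 6), "pos_c": (1, 12)}
    print(scan_positions(example))
```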

Page 10

[Figure: mutations by position [nt] and mutant number, the class distribution at each position, and log relative fluorescence for each mutant]

Page 11

A similar analysis can be applied to an arbitrary number of mutants and phenotypic classes

[Figure: a set of mutants spread across several phenotypic classes, and the subset of mutants with mutations at “position 35”]
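With more than two phenotypic classes, the per-position input becomes a vector of class counts. One way to organize this, sketched under the assumption that each mutant is stored with its set of mutated positions and a single class label, is shown below. The data layout, function names, and example mutants are hypothetical; estimating each class prior as the fraction of all mutants in that class is also an assumption about how the "prior probability" on the next slide is obtained.

```python
from collections import Counter
from fractions import Fraction


def class_distribution(mutations, phenotype, position):
    """Count phenotypic classes among the mutants that carry a mutation at `position`.

    mutations: dict mutant_id -> set of mutated positions
    phenotype: dict mutant_id -> phenotypic class label
    """
    return Counter(phenotype[m] for m, sites in mutations.items() if position in sites)


def class_priors(phenotype):
    """Fraction of all mutants in each class, used here as the prior class probabilities."""
    totals = Counter(phenotype.values())
    n = sum(totals.values())
    return {label: Fraction(count, n) for label, count in totals.items()}


if __name__ == "__main__":
    # Hypothetical mutants, positions, and color-coded classes
    mutations = {"m1": {35, 40}, "m2": {35}, "m3": {12}, "m4": {12, 35}}
    phenotype = {"m1": "green", "m2": "green", "m3": "red", "m4": "blue"}
    print(class_distribution(mutations, phenotype, 35))
    print(class_priors(phenotype))
```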

Page 12

The generalized probability of the phenotype distribution can be used to find mutation-phenotype correlations

[Equations: probability of a particular vector color distribution; significance of a correlation between mutations at “position 35” and the green phenotypic class; prior probability of the green phenotype]
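The slide's equations did not survive extraction. A plausible reading, assuming a standard multinomial model, is that the probability of a class-count vector (n_1, ..., n_M) with N = n_1 + ... + n_M mutants is N! / (n_1! ... n_M!) * p_1^n_1 * ... * p_M^n_M, where p_i is the prior probability of class i. The significance of an excess of "green" mutants would then be the total probability of all vectors with at least the observed green count. The sketch below computes that sum by brute-force enumeration and checks it against the binomial tail (green versus all other classes), which gives the same number. All function names and example values are hypothetical.

```python
from fractions import Fraction
from math import comb, factorial, prod


def multinomial_pmf(counts, priors):
    """Probability of observing exactly `counts` mutants in each class, given class priors."""
    coeff = factorial(sum(counts))
    for c in counts:
        coeff //= factorial(c)  # exact: each partial quotient is an integer
    return coeff * prod(p**c for p, c in zip(priors, counts))


def compositions(n, parts):
    """Every tuple of `parts` non-negative integers that sums to n."""
    if parts == 1:
        yield (n,)
        return
    for first in range(n + 1):
        for rest in compositions(n - first, parts - 1):
            yield (first,) + rest


def class_tail_pvalue(counts, priors, j):
    """P(class j receives at least counts[j] of the mutants), summed over all class vectors."""
    n, k = sum(counts), counts[j]
    return sum(multinomial_pmf(c, priors)
               for c in compositions(n, len(counts)) if c[j] >= k)


def binomial_tail(k, n, p):
    """Closed form of the same quantity: class j versus everything else."""
    return sum(comb(n, i) * p**i * (1 - p)**(n - i) for i in range(k, n + 1))


if __name__ == "__main__":
    priors = (Fraction(1, 2), Fraction(1, 4), Fraction(1, 4))  # green, red, blue (hypothetical)
    counts = (3, 1, 0)  # mutants at "position 35" in each class (hypothetical)
    print(multinomial_pmf(counts, priors), class_tail_pvalue(counts, priors, 0))
```

Enumeration grows quickly with the number of mutants and classes; the binomial form is the practical choice when only one class is being tested.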

Page 13

In our case, we tested 8 locations spanning a range of predicted functions and confidence levels

[Figure: class distribution by position [nt]]

Page 14

7 of the 8 single-position mutants agreed with the predicted function

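| Site | Predicted activity | P-value | Observations | Confidence | Relative fluorescence | Log relative fluorescence | Agreement? |
|------|--------------------|---------|--------------|------------|-----------------------|---------------------------|------------|
| -8 | Low | <0.0001 | 22 | High | 0.036 | -3.32 | Yes |
| -10 | Low | 0.1094 | 6 | Med | 0.011 | -4.52 | Yes |
| -14 | High | 0.0625 | 4 | High | 1.428 | 0.35 | Yes |
| -21 | High | 0.0625 | 4 | High | 1.585 | 0.46 | Yes |
| -28 | Low | 0.3770 | 10 | Low | 0.756 | -2.58 | Yes |
| -82 | No effect | 0.5000 | 2 | Low | 0.926 | -0.08 | Yes |
| -96 | No effect | 0.5000 | 5 | Med | 0.046 | -3.08 | No |
| -123 | Low | 0.1938 | 12 | Med | 0.087 | -2.45 | Yes |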
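The Log Relative Fluorescence column appears to be the natural logarithm of the Relative Fluorescence column. A quick consistency check, assuming natural log and a tolerance of 0.05 (both assumptions), is sketched below using the values exactly as listed in the table. Only site -28 fails: ln(0.756) is about -0.28, whereas -2.58 corresponds to ln(0.0756), so one of those two entries may be a transcription slip.

```python
import math

# (site, relative fluorescence, log relative fluorescence) as listed in the table above
ROWS = [
    (-8, 0.036, -3.32), (-10, 0.011, -4.52), (-14, 1.428, 0.35),
    (-21, 1.585, 0.46), (-28, 0.756, -2.58), (-82, 0.926, -0.08),
    (-96, 0.046, -3.08), (-123, 0.087, -2.45),
]


def mismatched_rows(rows, tol=0.05):
    """Sites whose listed log value differs from ln(relative fluorescence) by more than tol."""
    return [site for site, rel, logged in rows if abs(math.log(rel) - logged) > tol]


if __name__ == "__main__":
    for site, rel, logged in ROWS:
        print(site, round(math.log(rel), 2), logged)
    print(mismatched_rows(ROWS))
```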

Page 15

Rationally designed promoters with combinations of mutations showed the predicted activity but also signs of site interaction

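| Sites | Predicted activity | Log relative fluorescence | Agreement? |
|-------|--------------------|---------------------------|------------|
| -14, -21 | High | 0.65 | Yes |
| -14, -82 | High | -0.04 | No |
| -21, -82 | High | 0.36 | Yes |
| -96, -123 | Low | -1.43 | Yes |
| -82, -14, -21 | High | -1.97 | No |
| -8, -10, -28 | Low | -4.03 | Yes |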

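One way to quantify the "signs of site interaction" is to compare each combination against a purely additive expectation in log space, i.e. the sum of the single-site log relative fluorescence values. The slides do not state which interaction model was used; additivity of log effects is an assumption made here only to illustrate the bookkeeping, and the names below are hypothetical. The values are copied from the two tables as listed. Large deviations, such as -96/-123 (+4.10) or -82/-14/-21 (-2.70), indicate that the sites do not act independently.

```python
# Single-site log relative fluorescence, as listed in the single-mutant table
SINGLE = {-8: -3.32, -10: -4.52, -14: 0.35, -21: 0.46,
          -28: -2.58, -82: -0.08, -96: -3.08, -123: -2.45}

# Observed log relative fluorescence for the combination promoters
COMBOS = {(-14, -21): 0.65, (-14, -82): -0.04, (-21, -82): 0.36,
          (-96, -123): -1.43, (-82, -14, -21): -1.97, (-8, -10, -28): -4.03}


def additive_prediction(sites, single=SINGLE):
    """Expected log activity if each site's effect simply adds (hypothetical null model)."""
    return sum(single[s] for s in sites)


def interaction(sites, observed, single=SINGLE):
    """Observed minus additive expectation; values far from 0 suggest site interaction."""
    return observed - additive_prediction(sites, single)


if __name__ == "__main__":
    for sites, observed in COMBOS.items():
        print(sites, round(additive_prediction(sites), 2), observed,
              round(interaction(sites, observed), 2))
```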

Page 16

In summary, this simple method, based on multinomial statistics, can be used to link sequence variations to particular phenotypes