A simple method for incorporating sequence information into directed evolution experiments

Kyle L. Jensen*, Hal Alper*, Curt Fischer, Gregory Stephanopoulos

Department of Chemical Engineering, Massachusetts Institute of Technology

sequence phenotype

When screening throughput is limited, linking sequence to phenotype can help direct downstream searches

Screen based (selectable trait)

Assay based (no selectable trait)

Here, a PLtet promoter was mutated to create a library of promoter variants

Alper, H., C. Fischer, E. Nevoigt, and G. Stephanopoulos. 2005. Tuning genetic control through promoter engineering. Proc. Natl. Acad. Sci. U.S.A. 102:12678-83.

69 promoter variants were created using error-prone PCR

The 69 promoter variants spanned an 800-fold range of activity

- How different are the underlying, mutagenized sequences?

- What, on a sequence level, causes the variation?

[Figure: log relative fluorescence by mutant number, spanning an 800-fold range; mutants are divided into Top 50% and Bottom 50% classes.]

Each of the 69 mutants had a unique sequence and incorporated multiple transition SNPs

[Figure: mutations mapped across the promoter region, by position [nt] and mutant number, alongside log relative fluorescence by mutant number.]

The effects of individual mutations were “masked” by the presence of other mutations

Just because a mutation occurs more frequently in one class, is it correlated?

Is the ratio of top/bottom important?

What is the statistical significance of a mutation that is distributed between the two classes?

Some mutations have obvious effects; most do not

[Figure: mutations by position [nt] and mutant number, with the class distribution at each position.]

Each individual position can be evaluated using a simple binomial distribution

This is the same as asking: what is the probability of getting heads in 14 of 20 coin tosses?

P-value: the probability of 14 or more heads out of 20

Assuming the positions are independent
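The coin-toss analogy can be sketched in a few lines of Python. This is a minimal illustration, assuming a null probability of 0.5 that a mutant carrying the mutation falls in the top class (which matches a 50/50 class split) and a one-sided test. The function name `binomial_tail` is hypothetical and not taken from the original work.

```python
from math import comb


def binomial_tail(k, n, p=0.5):
    """Return P(X >= k) for X ~ Binomial(n, p), i.e. a one-sided P-value."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k, n + 1))


# Example from the slide: 14 or more heads out of 20 fair coin tosses.
p_value = binomial_tail(14, 20)
print(f"P(X >= 14 | n = 20, p = 0.5) = {p_value:.4f}")
```

For a single position, `k` would be the number of mutants with a mutation there that fall in one class, and `n` the total number of mutants with a mutation at that position.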

[Figure: class distribution by position [nt].]

Similar analysis over the promoter region revealed 7 positions significantly correlated with activity

[Figure: class distribution by position [nt], shown with the mutation map (position [nt] by mutant number) and log relative fluorescence by mutant number.]
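A scan over every position in the promoter region could look like the sketch below. It assumes two equally likely classes (top and bottom 50%), treats positions as independent, and tests each position one-sidedly toward whichever class holds the majority of its mutations. The data, the 0.05 threshold, and the names `scan_positions` and `toy_mutants` are hypothetical; no multiple-testing correction is applied.

```python
from math import comb


def binomial_tail(k, n, p=0.5):
    """Return P(X >= k) for X ~ Binomial(n, p)."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k, n + 1))


def scan_positions(mutants, alpha=0.05):
    """Find positions whose mutations are skewed toward one class.

    mutants: list of (set of mutated positions, is_top_class) pairs.
    Returns {position: (n_top, n_total, p_value)} where p_value < alpha.
    """
    counts = {}
    for positions, is_top in mutants:
        for pos in positions:
            top, total = counts.get(pos, (0, 0))
            counts[pos] = (top + int(is_top), total + 1)

    hits = {}
    for pos, (top, total) in counts.items():
        # One-sided test toward the class that holds the majority.
        majority = max(top, total - top)
        p_value = binomial_tail(majority, total)
        if p_value < alpha:
            hits[pos] = (top, total, p_value)
    return hits


# Hypothetical toy library, not the 69 mutants from the study.
toy_mutants = (
    [({-8, -82}, False)] * 6
    + [({-8}, False)] * 2
    + [({-82}, True)] * 3
    + [({-14}, True)] * 5
    + [({-40}, True), ({-40}, False)]
)

for pos, (top, total, p) in sorted(scan_positions(toy_mutants).items()):
    print(f"position {pos}: {top}/{total} in top class, P = {p:.4f}")
```

In this toy library only positions -8 and -14 pass the threshold; position -82 is split 3 to 6 between the classes and is not significant.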

A similar analysis can be applied to an arbitrary number of mutants and phenotypic classes

[Diagram: a set of mutants, each assigned to one of M phenotypes (1, 2, ..., M), with the mutants carrying mutations at “position 35” singled out.]

The generalized probability of the phenotype distribution can be used to find mutation-phenotype correlations

Probability of a particular vector color distribution
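The equation from the original slide did not survive extraction. A standard form consistent with this label is the multinomial probability mass function, offered here as an assumption rather than a transcription. The symbols are ours: n is the number of mutants with a mutation at the position, y_i is the number of those mutants in phenotypic class i, and p_i is the prior probability of class i.

```latex
P\bigl(Y = (y_1, \dots, y_M)\bigr)
  = \frac{n!}{y_1! \, y_2! \cdots y_M!} \prod_{i=1}^{M} p_i^{\,y_i},
  \qquad \sum_{i=1}^{M} y_i = n
```

With M = 2 and p_1 = p_2 = 0.5 this reduces to the binomial case used for the top and bottom 50% classes.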

Significance of a correlation between mutations at “position 35” and the green phenotypic class

Prior probability of green phenotype
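One way to turn these quantities into a significance value is sketched below. It assumes that, under the null hypothesis, the number of mutants in a given class follows a binomial distribution with success probability equal to that class's prior (the marginal of a multinomial), and it reports the one-sided probability of seeing at least the observed count. The class names, counts, priors, and the function name `class_enrichment_p_value` are hypothetical.

```python
from math import comb


def class_enrichment_p_value(counts, priors, cls):
    """One-sided P-value that class `cls` is over-represented.

    counts: {class: number of mutants with the mutation in that class}
    priors: {class: prior probability of that class}
    """
    n = sum(counts.values())  # mutants with a mutation at this position
    k = counts[cls]           # of those, how many fall in `cls`
    p = priors[cls]           # prior probability of `cls`
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k, n + 1))


# Hypothetical example: three equally likely phenotypic classes, and
# 10 mutants with a mutation at "position 35", 7 of them green.
counts = {"green": 7, "red": 2, "blue": 1}
priors = {"green": 1 / 3, "red": 1 / 3, "blue": 1 / 3}

p_green = class_enrichment_p_value(counts, priors, "green")
print(f"P(at least 7 of 10 green) = {p_green:.4f}")
```

Unequal priors, for example when classes are not split evenly, only change the values in `priors`.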

In our case, we tested 8 locations, spanning a range of functions & confidences

[Figure: class distribution by position [nt].]

7/8 of the single position mutants were in agreement with the predicted function

Site   Predicted activity   P-value   Observations   Confidence   Relative fluorescence   Log relative fluorescence   Agreement?
-8     Low                  <0.0001   22             High         0.036                   -3.32                       Yes
-10    Low                  0.1094    6              Med          0.011                   -4.52                       Yes
-14    High                 0.0625    4              High         1.428                   0.35                        Yes
-21    High                 0.0625    4              High         1.585                   0.46                        Yes
-28    Low                  0.3770    10             Low          0.756                   -2.58                       Yes
-82    No effect            0.5000    2              Low          0.926                   -0.08                       Yes
-96    No effect            0.5000    5              Med          0.046                   -3.08                       No
-123   Low                  0.1938    12             Med          0.087                   -2.45                       Yes

Rationally designed promoters with combinations of mutations showed predicted activity but also signs of site interaction

Sites           Predicted activity   Log relative fluorescence   Agreement?
-14, -21        High                 0.65                        Yes
-14, -82        High                 -0.04                       No
-21, -82        High                 0.36                        Yes
-96, -123       Low                  -1.43                       Yes
-82, -14, -21   High                 -1.97                       No
-8, -10, -28    Low                  -4.03                       Yes

In summary, this simple method, based on multinomial statistics, can be used to link sequence variations to particular phenotypes
