A simple method for incorporating sequence information into directed evolution experiments
Kyle L. Jensen*, Hal Alper*, Curt Fischer, Gregory Stephanopoulos
Department of Chemical Engineering, Massachusetts Institute of Technology
When screening throughput is limited, linking sequence to phenotype can help direct
downstream searches
Screen based (selectable trait)
Assay based (no selectable trait)
Here, a PLtet promoter was mutated to create a
library of promoter variants
Alper H., C. Fischer, E. Nevoigt, and G. Stephanopoulos, 2005. Tuning genetic control through promoter engineering. Proc. Natl. Acad. Sci. U S A 102:12678-83.
69 promoter variants were created using error prone PCR
The 69 promoter variants spanned an 800-fold range of activity
- How different are the underlying, mutagenized sequences?
- What, on a sequence level, causes the variation?
[Figure: log relative fluorescence vs. mutant number, spanning an 800-fold range; mutants are split into a top 50% class and a bottom 50% class]
Each of the 69 mutants had a unique sequence and incorporated multiple transition SNPs
[Figure: mutations across the promoter region; panels show log relative fluorescence vs. mutant number and mutant number vs. position [nt]]
The effects of individual mutations were “masked” by the presence of other mutations
Just because a mutation occurs more frequently in one class, is it correlated?
Is the ratio of top/bottom important?
What is the statistical significance of a mutation that is distributed between the two classes?
Some mutations have obvious effects... most do not
[Figure: mutant number and class distribution vs. position [nt]]
Each individual position can be evaluated using a simple binomial distribution
This is the same question as: what is the probability of getting 14 heads in 20 coin tosses?
P-value: the probability of 14 or more heads out of 20 tosses
This assumes the positions are independent
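The coin-toss analogy above can be worked through numerically. The following is a minimal sketch, assuming a one-sided test with a 0.5 chance of landing in either class (the top 50% / bottom 50% split); the function name `binomial_tail` is hypothetical and not taken from the slides.

```python
from math import comb

def binomial_tail(k: int, n: int, p: float = 0.5) -> float:
    """Return P(X >= k) for X ~ Binomial(n, p)."""
    return sum(comb(n, j) * p**j * (1 - p)**(n - j) for j in range(k, n + 1))

# Coin-toss example: 14 or more heads out of 20 fair tosses.
p_value = binomial_tail(14, 20)
print(f"{p_value:.4f}")  # 0.0577
```

Under the same assumption, a position mutated in 4 mutants that all fall in one class gives `binomial_tail(4, 4)`, which is 0.0625.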
[Figure: class distribution vs. position [nt]]
Similar analysis over the promoter region revealed 7 positions significantly correlated with activity
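A scan of this kind over many positions might look like the sketch below. The counts are invented for illustration and are not the promoter data; the 0.05 cutoff, the choice to test the majority class at each position, and the names `scan_positions` and `example_counts` are all assumptions.

```python
from math import comb

def binomial_tail(k, n, p=0.5):
    """Return P(X >= k) for X ~ Binomial(n, p)."""
    return sum(comb(n, j) * p**j * (1 - p)**(n - j) for j in range(k, n + 1))

def scan_positions(class_counts, alpha=0.05):
    """Return {position: (favored_class, p_value)} for positions with p_value < alpha.

    class_counts maps a position to (n_top, n_bottom): how many mutants carrying
    a mutation at that position fall in the top and bottom classes.
    """
    hits = {}
    for pos, (n_top, n_bottom) in class_counts.items():
        n = n_top + n_bottom
        k = max(n_top, n_bottom)          # test the majority class (assumption)
        p_value = binomial_tail(k, n)
        if p_value < alpha:
            favored = "top" if n_top > n_bottom else "bottom"
            hits[pos] = (favored, p_value)
    return hits

# Hypothetical counts, NOT the measured promoter library.
example_counts = {-5: (9, 1), -40: (3, 2), -70: (0, 6)}
hits = scan_positions(example_counts)
for pos, (favored, p) in hits.items():
    print(pos, favored, round(p, 4))
```

With these made-up counts, positions -5 and -70 pass the cutoff and position -40 does not.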
[Figure: class distribution vs. position [nt], shown alongside mutant number vs. position [nt] and log relative fluorescence vs. mutant number]
A similar analysis can be applied to an arbitrary number of mutants and phenotypic classes
[Figure: mutants 1, 2, ... sorted into M phenotypic classes, with the mutants carrying mutations at “position 35” highlighted]
The generalized probability of the phenotype distribution can be used to find
mutation-phenotype correlations
Probability of a particular color (phenotypic class) distribution vector
Significance of a correlation between mutations at “position 35” and the green phenotypic class
Prior probability of green phenotype
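The slide formulas did not survive extraction, so the following is only a sketch of one standard way to compute these quantities: the multinomial probability of a vector of class counts, and a one-sided tail probability for enrichment in a single class given its prior. The function names, the three equally likely classes, and the example counts are assumptions for illustration.

```python
from math import comb, factorial, prod

def multinomial_pmf(counts, priors):
    """Probability of observing exactly `counts` mutants per class,
    given the prior probability of each class."""
    coeff = factorial(sum(counts))
    for c in counts:
        coeff //= factorial(c)            # multinomial coefficient, exact integer
    return coeff * prod(p**c for p, c in zip(priors, counts))

def class_enrichment_pvalue(k, n, prior):
    """P(at least k of n mutants fall in one class) when each mutant lands
    in that class independently with probability `prior`."""
    return sum(comb(n, j) * prior**j * (1 - prior)**(n - j) for j in range(k, n + 1))

# Hypothetical example: 6 mutants share a mutation at one position;
# 5 are in the "green" class, 1 in a second class, 0 in a third.
priors = [1 / 3, 1 / 3, 1 / 3]            # assumed equal class priors
print(multinomial_pmf([5, 1, 0], priors))
print(class_enrichment_pvalue(5, 6, priors[0]))
```

With two classes and a prior of 0.5, `class_enrichment_pvalue` reduces to the coin-toss calculation used for the top 50% / bottom 50% split.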
In our case, we tested 8 locations spanning a range of functions and confidences
[Figure: class distribution vs. position [nt]]
7/8 of the single position mutants were in agreement with the predicted function
Site  Predicted activity  P-value  Observations  Confidence  Relative fluorescence  Log relative fluorescence  Agreement?
-8    Low                 <0.0001  22            High        0.036                  -3.32                      Yes
-10   Low                 0.1094   6             Med         0.011                  -4.52                      Yes
-14   High                0.0625   4             High        1.428                  0.35                       Yes
-21   High                0.0625   4             High        1.585                  0.46                       Yes
-28   Low                 0.3770   10            Low         0.756                  -2.58                      Yes
-82   No effect           0.5000   2             Low         0.926                  -0.08                      Yes
-96   No effect           0.5000   5             Med         0.046                  -3.08                      No
-123  Low                 0.1938   12            Med         0.087                  -2.45                      Yes
Rationally designed promoters with combinations of mutations showed predicted activity but also
signs of site interaction
Sites          Predicted activity  Log relative fluorescence  Agreement?
-14, -21       High                0.65                       Yes
-14, -82       High                -0.04                      No
-21, -82       High                0.36                       Yes
-96, -123      Low                 -1.43                      Yes
-82, -14, -21  High                -1.97                      No
-8, -10, -28   Low                 -4.03                      Yes
In summary, this simple method, based on multinomial statistics, can be used to link sequence variations
to particular phenotypes