proteomics informatics protein characterization: post...
TRANSCRIPT
Proteomics Informatics – Protein characterization: post-translational
modifications and protein-protein interactions (Week 10)
Top down / bottom up
Top down
Bottom up
mass/charge
inte
nsit
y
Top down Bottom up
Charge distribution
mass/charge in
ten
sit
y
mass/charge
inte
nsit
y
1+
2+
3+
4+
27+
31+
Top down Bottom up
m = 1035 Da m = 1878 Da m = 2234 Da
Isotope distribution
mass/charge in
ten
sit
y
mass/charge
inte
nsit
y
Fragmentation
Top down Bottom up
Fragmentation
Correlations between modifications
Top down
Bottom up
Alternative Splicing
Top down
Bottom up
Exon 1 2 3
Top down
Kellie et al., Molecular BioSystems 2010
Protein mass spectra
Fragment mass spectra
Protein Complexes
A
B
A
C D
Digestion
Mass spectrometry
Sowa et al., Cell 2009
Protein Complexes – specific/non-specific binding
Protein Complexes – specific/non-specific binding
Choi et al., Nature Methods 2010
Tackett et al. JPR 2005
Protein Complexes – specific/non-specific binding
Analysis of Non-Covalent Protein Complexes
Taverner et al., Acc Chem Res 2008
Non-Covalent Protein Complexes
Schreiber et al., Nature 2011
More / better quality
interactions
Affinity Capture Optimization Screen
+
Cell extraction
Lysate clearance/
Batch Binding Binding/Washing/Eluting
SDS-PAGE
Filtration
LaCava, Hakhverdyan, Domanski, Rout
Over 20 different extraction and washing
conditions ~ 10 years or art.
(41 pullouts are shown)
Molecular Architecture of the NPC
Actual model
Alber F. et al. Nature (450) 683-694. 2007
Alber F. et al. Nature (450) 695-700. 2007
Cloning nanobodies for GFP pullouts
• Atypical heavy chain-only IgG antibody produced in camelid family – retain high affinity for antigen without light chain
• Aimed to clone individual single-domain VHH antibodies against GFP – only ~15 kDa, can be recombinantly expressed, used as bait for pullouts, etc.
• To identify full repertoire, will identify GFP binders through combination of high-throughput DNA sequencing and mass spectrometry
VHH clone for
recombinant
expression
Cloning llamabodies for GFP pullouts
Llama GFP
immunization
Lymphocyte
total RNA Crude serum
VHH amplicon
454 DNA
sequencing
RT / Nested PCR IgG fractionation &
GFP affinity purification
VHH DNA
sequence library
GFP-specific
VHH fraction
LC-MS/MS
GFP-specific
VHH clones
Bone marrow
aspiration Serum bleed
500 400 300
1000 bp
0
100,000
200,000
300,000
400,000
500,000
No
. o
f R
ea
ds
Read length (bp)
V
H
VHH
Fridy, Li, Keegan, Chait, Rout
CDR3: 100.0% (14/14); combined CDR: 100.0% (33/33); DNA count: 10
MAQVQLVESGGGLVQAGGSLRLSCVASGRTFSGYAMGWFRQTPGREREAVAAITWSAHSTYYSDSVKDRFTISIDNTRNTGYLQMNSLKPEDTAVYYCTVRHGTWFTTSRYWTDWGQGTQVTVS
CDR3: 100.0% (14/14); combined CDR: 72.7% (24/33; DNA count: 1
MADVQLVESGGGLVQSGGSRTLSCAASGRVLATYHLGWFRQSPGREREAVAAITWSAHSTYYSDSVKGRFTISIDNARNTGYLQMNSLKPEDTAVYYCTVRHGTWFTVSRYWTDWGQGTQVTVS
CDR3: 100.0% (14/14); combined CDR: 72.7% (24/33); DNA count: 1
MAQVQLVESGGALVQAGASLSVSCAASGGTISKYNMAWFRRAPGREREAVAAITWSAHSTYYSDSVKDRFTISIDNTRNTGYLQMNSLKPEDTAVYYCTVRHGTWFTTSRYWTDWGQGTQVTVS
CDR3: 100.0% (14/14); combined CDR: 42.4% (14/33); DNA count: 1
MAQVQLEESGGGLVQAGDSLTLSCSASGRTFTNYAMAWSRQAPGKERELLAAIDAAGGATYYSDSVKGRFTISIDNTRNTGYLQMNSLKPEDTAVYYCTVRHGTWFTTSRYWTDWGQGTQVTVS
CDR3: 100.0% (14/14); combined CDR: 42.4% (14/33); DNA count: 1
MAQVQLVESGGGRVQAGGSLTLSCVGSEGIFWNHVMGWFRQSPGKDREFVARISKIGGTTNYADSVKGRFTISIDNTRNTGYLQMNSLKPEDTAVYYCTVRHGTWFTTSRYWTDWGQGTQVTVS
CDR1 CDR2 CDR3
Underlined regions are covered by MS
Rank sequences according to:
CDR3 coverage; Overall coverage;
Combined CDR coverage; DNA counts;
Identifying full-length sequences from peptides
Sequence diversity of 26 verified anti-GFP nanobodies
• Of ~200 positive sequence hits, 44 high confidence clones were synthesized
and tested for expression and GFP binding: 26 were confirmed GFP binders.
• Sequences have characteristic conserved VHH residues, but significant
diversity in CDR regions.
FR1 CDR1 FR2 CDR2 CDR3 FR3 FR4
HIV-1
gp120
Lipid Bilayer gp41
MA
CA
NC
PR
IN
RT
RNA
Particle
Genome
env
rev
vpu
tat
nef
3’ LTR 5’ LTR
vif gag
pol vpr
CA MA NC p6
PR RT IN
gp41 gp120
9,200 nucleotides
Genetic-Proteomic Approach
Tagged Viral Protein
Tag
Protein Complex SDS-PAGE
*
Mass
Spectrometry
I-Dirt for Specific Interaction
3xFLAG Tagged HIV-1 WT HIV-1
Infection
Light Heavy
(13C labeled Lys, Arg)
1:1 Mix
Immunoisolation
MS
I-DIRT = Isotopic Differentiation of Interactions as Random or Targeted
Lys Arg (+6 daltons) (+6 daltons)
Modified from Tackett AJ et al., J
Proteome Res. (2005) 4, 1752-6.
IDIRT and Reverse IDIRT
0.40
0.50
0.60
0.70
0.80
0.90
1.00
0.40 0.50 0.60 0.70 0.80 0.90 1.00
Sp
ec
ific
ity,
Re
rve
rse
Specificity, Forward
gp160 IDIRT: Forward-Reverse Ratio Comparison
Env-3xFLAG Vif-3xFLAG
0.30
0.40
0.50
0.60
0.70
0.80
0.90
1.00
0.30 0.40 0.50 0.60 0.70 0.80 0.90 1.00
Re
vers
e R
ati
o
Forward Ratio
Forward and Reverse Ratio Comparison N = 273, ≥ 3 peptides quantified, S/N = 10.0
Luo, Jacobs, Greco, Cristae, Muesing, Chait, Rout
Protein Exchange
Vif-3F
Heavy labeled Vif-3F lysate
IP in heavy labeled Vif-3F
lysate
Vif-3F
Light labeled wt lysate
Incubation with light labeled wt lysate
Vif-3F
15min
Vif-3F
5min
Stable Interactor
Vif-3F
Interactor with fast
exchange
60min
Env Time Course SILAC
• Differentially labeled
infection harvested
at early or late
stage of infection
• Distinguish proteins
that interact with
Env at early or late
stage during
infection
Early during infection Late during infection
Light Heavy
(13C labeled
Lys, Arg) 1:1 Mix
Immunoisolation
MS
Early interactor Late interactor
M/Z
Peptides
Fragments
Fragmentation
Proteolytic
Peptides
Enzymatic Digestion
Protein
Complex
Chemical Cross-Linking
MS
MS/MS
Isolation
Cross-Linked
Protein Complex
Interaction Partners by Chemical Cross-Linking
M/Z
Peptides
Fragments
Fragmentation
Proteolytic
Peptides
Enzymatic Digestion
Protein
Complex
Chemical Cross-Linking
MS
MS/MS
Isolation
Cross-Linked
Protein Complex
Interaction Sites by Chemical Cross-Linking
Cross-linking
protein
n peptides with reactive groups
(n-1)n/2 potential ways to cross-link peptides pairwise
+ many additional uninformative forms
Protein A + IgG heavy chain 990 possible peptide pairs
Yeast NPC ˜106 possible peptide pairs
Protein Crosslinking by Formaldehyde
~1% w/v Fal
20 – 60 min
~0.3% w/v Fal
5 – 20 min
1/100 the volume
LaCava
Protein Crosslinking by Formaldehyde
RED: triplicate experiments, FAl treated grindate
BLACK: duplicated experiments, FAl treated cells (then ground)
SCORE: Log Ion Current / Log protein abundance Akgöl, LaCava, Rout
Cross-linking
Mass spectrometers have a limited dynamic range and it therefore important to limit the number of possible reactions not to dilute the cross-linked peptides. For identification of a cross-linked peptide pair, both peptides have to be sufficiently long and required to give informative fragmentation. High mass accuracy MS/MS is recommended because the spectrum will be a mixture of fragment ions from two peptides. Because the cross-linked peptides are often large, CAD is not ideal, but instead ETD is recommended.
0
0.2
0.4
0.6
0.8
1
1.2
0 5 10 15 20 25
Number of fragment ions
Pro
bab
ilit
y o
f L
ocali
zati
on
Phosphopeptide
identification
mprecursor = 2000 Da
Dmprecursor = 1 Da
Dmfragment = 0.5 Da
Phosphorylation
Localization of modifications
0
0.2
0.4
0.6
0.8
1
1.2
0 5 10 15 20 25
Pro
bab
ilit
y o
f L
ocali
zati
on
Number of fragment ions
ID
3
Localization (dmin=3)
mprecursor = 2000 Da
Dmprecursor = 1 Da
Dmfragment = 0.5 Da
Phosphorylation
dmin>=3 for 47%
of human tryptic
peptides
Localization of modifications
0
0.2
0.4
0.6
0.8
1
1.2
0 5 10 15 20 25
Pro
bab
ilit
y o
f L
ocali
zati
on
Number of fragment ions
ID
3
2
Localization (dmin=2)
mprecursor = 2000 Da
Dmprecursor = 1 Da
Dmfragment = 0.5 Da
Phosphorylation
dmin=2 for 33% of
human tryptic
peptides
Localization of modifications
0
0.2
0.4
0.6
0.8
1
1.2
0 5 10 15 20 25
Pro
bab
ilit
y o
f L
ocali
zati
on
Number of fragment ions
ID
3
2
1
Localization (dmin=1)
mprecursor = 2000 Da
Dmprecursor = 1 Da
Dmfragment = 0.5 Da
Phosphorylation
dmin=1 for 20% of
human tryptic
peptides
Localization of modifications
0
0.2
0.4
0.6
0.8
1
1.2
0 5 10 15 20 25
Pro
bab
ilit
y o
f L
ocali
zati
on
Number of fragment ions
ID3211*
Localization
(d=1*)
mprecursor = 2000 Da
Dmprecursor = 1 Da
Dmfragment = 0.5 Da
Phosphorylation
Localization of modifications
Peptide with two possible modification sites
Localization of modifications
Peptide with two possible modification sites
MS/MS spectrum
m/z
Inte
nsit
y
Localization of modifications
Peptide with two possible modification sites
MS/MS spectrum
m/z
Inte
nsit
y
Matching
Localization of modifications
Peptide with two possible modification sites
MS/MS spectrum
m/z
Inte
nsit
y
Matching
Which assignment does
the data support?
1, 1 or 2, or 1 and 2?
Localization of modifications
AAYYQK
Visualization of evidence for localization
AAYYQK
Visualization of evidence for localization
AAYYQK
AAYYQK
Visualization of evidence for localization
3
2
1
3
2
1
Estimation of global false localization rate using decoy sites
By counting how many times the phosphorylation is localized to
amino acids that can not be phosphorylated we can estimate the
false localization rate as a function of amino acid frequency.
0
0.005
0.01
0.015
0.02
0 0.05 0.1 0.15
0
0.005
0.01
0.015
0.02
0 0.05 0.1 0.15
Amino acid frequency
Fa
lse l
oc
ali
zati
on
fre
qu
en
cy
Y
S2
1
Sm
1
How much can we trust a single localization assignment?
If we can generate the distribution of scores for
assignment 1 when 2 is the correct assignment, it is
possible to estimate the probability of obtaining a certain
score by chance for a given peptide sequence and
MS/MS spectrum assignment.
SSmm
21
0
2
1
2
1
2
0
2
1
2
1
2
2
1
1
dSSF
dSSFp
S m
)(
)(
1.
2.
Is it a mixture or not?
If we can generate the distribution of scores for
assignment 2 when 1 is the correct assignment, it is
possible to estimate the probability of obtaining a certain
score by chance for a given peptide sequence and
MS/MS spectrum assignment.
S1
2
Sm
2
SSmm
21
0
1
2
1
2
1
0
1
2
1
2
1
1
2)(
)(2
dSSF
dSSFp
Sm
1.
2.
ppppthth
and1
2
2
11 and 2
ppppthth
and1
2
2
11
ppppthth
and1
2
2
1
ppppthth
and1
2
2
11 or 2
Ø )( ppSS mm 1
2
2
121
Peptide with two possible modification sites
MS/MS spectrum
m/z
Inte
nsit
y
Matching
Which assignment does
the data support?
1, 1 or 2, or 1 and 2?
Localization of modifications
Proteomics Informatics – Protein characterization: post-translational
modifications and protein-protein interactions (Week 10)