Thanks to:
George Church (Harvard GTL & CEGS Centers) 5-Jan-2006 HPCGG Landsdowne 2 PM
Personal Genomics meets Quantitative Proteomics
NHGRI Seq Tech 2004: Agencourt, 454, Microchip, 2005: Nanofluidics, Network, VisiGen Affymetrix, Helicos, Solexa-Lynx
"Open-source" Personal Genome Project (PGP)
• Harvard Medical School IRB Human Subjects protocol submitted 16-Sep-2004, approved 31-Aug-2005.
• Gradual plan. Start with "highly-informed" individuals consenting to non-anonymous genomes & extensive phenotypes (medical records, imaging, omics).
• Cell lines in Coriell NIGMS Repository
• Diploid genome subsets at $0.1/kb, <3E-7 FP Errors How? Polony bead Sequencing-by-Ligation (SbL)
Analyses of single chromosomes (single cells, RNAs, particles)
(1) When we only have one cell as in Preimplantation Genetic Diagnosis (PGD) or environmental samples
(2) Candidate chromosome region sequencing
(3) Prioritizing or pooling (rare) species based on an initial DNA screen.
(4) Multiple chromosomes in a cell or virus
(5) RNA splicing
(6) Cell-cell interactions (predator-prey, symbionts, commensals, parasites)
CD44 Exon Combinatorics (Zhu & Shendure)
• Alternatively spliced cell adhesion molecule
• Specific variable exons are up- or down-regulated in various cancers (>2000 papers)
• v6 & v7 enable direct binding to chondroitin sulfate, heparin…
Zhu J, Shendure J, Mitra RD, Church GM. Science 301:836-8. Single molecule profiling of alternative pre-mRNA splicing.
EXON PATTERN            Eph4  Eph4bDD  TOTAL  Eph4 FRATIO  LSTP-PV
------------7-8-9-10     609      764   1373         1.17     1E-4
--------------8-9-10     320      390    710         1.13     3E-2
----------6-7-8-9-10     431      251    682        -1.85    4E-18
------4-5-6-7-8-9-10     218      216    434        -1.08     2E-1
----------------9-10      68      143    211         1.96     7E-7
--------5-6-7-8-9-10      86       39    125        -2.37     2E-6
----3-4-5-6-7-8-9-10      40       56     96         1.30     9E-2
------4-5---7-8-9-10      16       74     90         4.30     2E-9
--2-3-4-5-6-7-8-9-10      44       28     72        -1.69     1E-2
1-2-3-4-5-6-7-8-9-10      22        5     27        -4.73     3E-4
--------5---7-8-9-10       5       19     24         3.53     3E-3
----3-4-5---7-8-9-10       1       15     16        13.95     4E-4
--2-3-4-5---7-8-9-10       1       10     11         9.30     5E-3
Eph4 = murine mammary epithelial cell line
Eph4bDD = stable transfection of Eph4 with MEK-1 (tumorigenic)
CD44 RNA isoforms
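The slide does not say how the ratio column is computed. One reading that fits the numbers is a fold change between the two cell lines after scaling each count by its library total, with decreases shown as negative reciprocals. The sketch below works under that assumption. The function name and the use of only the 13 listed isoforms for the totals are illustrative choices, so the results land close to the slide's values (for example about 1.16 against 1.17 for the first row) rather than exactly on them.

```python
def signed_fold_change(count_a, count_b, total_a, total_b):
    """Fold change of sample B over sample A after scaling by library totals.

    Ratios >= 1 are returned as-is; ratios < 1 are returned as the negative
    reciprocal, so a two-fold drop reads as -2.0. (Assumed convention.)
    """
    freq_a = count_a / total_a
    freq_b = count_b / total_b
    ratio = freq_b / freq_a
    return ratio if ratio >= 1 else -1 / ratio


# (Eph4, Eph4bDD) molecule counts for the 13 isoforms listed on the slide.
counts = [
    (609, 764), (320, 390), (431, 251), (218, 216), (68, 143),
    (86, 39), (40, 56), (16, 74), (44, 28), (22, 5),
    (5, 19), (1, 15), (1, 10),
]

# Totals over the listed rows only; the original analysis may have used more.
total_eph4 = sum(a for a, _ in counts)
total_bdd = sum(b for _, b in counts)

ratios = [signed_fold_change(a, b, total_eph4, total_bdd) for a, b in counts]
for (a, b), r in zip(counts, ratios):
    print(f"{a:5d} {b:5d} {r:7.2f}")
```

The p-value column is not reproduced here because the slide does not name the test that produced it.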
Molecular Weight Assessment of Proteins in Total Proteome Profiles Using 1D-PAGE and LC/MS/MS.
Proteome Sci. 3:6 (2005) Ahmad R, Nguyen DH, Wingerd MA, Church GM, Steffen MA.
Candidates for alternative splicing (AS), endoproteolytic processing (EPP), & post-translational modifications (PTMs) in Lymphoblastoid cells
Protein Name                                Predicted MW  Observed MW  Difference before & after leader cleavage
Cytochrome c oxidase subunit IV isoform 1          19577         2582  205
NADH dehydrogenase                                 21750         5084  334
Coproporphyrinogen oxidase                         50175        13632  357
MHC II, DQ 1                                       29733        25896  404
NADH (ubiquinone) Fe-S protein 2                   52545        48185  815
Mito short-chain enoyl-coA hydratase 1             31371        27499  901
Peptidylprolyl isomerase B (cyclophilin)           23742        19360  940
Glycogen metabolism
Figure: pathway diagram with labels α-Glc-1P, ADP-Glc, α-1,4-glucosyl-glucan, glycogen, and Central Carbon Metabol.; genes glgC, glgX, glgA, glgB, glgP.
Figure: normalized expression (axis ticks 0.1, 1, 10) vs. time (hours, 0 to 48) for glgA, glgB, glgC, glgX, glgP.
Zinser et al. unpubl.
Light regulated Circadian metabolism
Viral Photosynthetic Proteins
Figure: genome maps of three viruses: Podovirus P-SSP7 (46 kb), Myovirus P-SSM4 (181 kb), Myovirus P-SSM2 (255 kb). Gene labels: PC, HLIPs, Fd, D1, D2. Scale labels: 12 kb, 24 kb, 6.4 kb, 2.8 kb, ~500 bp.
Lindell, Sullivan, Chisholm et al. 2004
Photosynthesis genes in marine viruses yield proteins during host infection.
Nature 2005 438:86-9. Lindell D, Jaffe JD, Johnson ZI, Church GM, Chisholm SW.
15N 13C synthetic standards
host
phage
Improving MS Peptide Coverage
? Ionization efficiency
X Ions outside the mass range of the analyzer
? Chromatographic behavior
? Sample preparation bias
X Instrument duty cycle
• Improve spectra interpretation over current algorithms
– Details of fragmentation patterns
– Dipeptide P, DE/KR, V.G intensity effects
– B & Y ions unequal & co-dependent
– More intense ions in middle of peptides
MDQuest: Mike Chou, Dan Schwartz, Steve Gygi, Josh Elias http://gygi.med.harvard.edu/dpsp/
SEQUEST vs MDQUEST Performance: ROC Curves
Figure: sensitivity (TP rate) vs. 1 - specificity (FP rate), both axes 0 to 1; series: sequest, mdquest.
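For readers unfamiliar with ROC curves, here is a minimal sketch of how one is traced from scored peptide identifications: sweep a score threshold and record the true-positive and false-positive rates at each step. The function names and the toy scores are illustrative assumptions, not SEQUEST or MDQUEST output. The code assumes at least one correct and one incorrect identification.

```python
def roc_points(scores, labels):
    """Return (fp_rate, tp_rate) points from sweeping a score threshold.

    scores: higher means more confident.
    labels: True where the identification is known to be correct.
    """
    positives = sum(labels)
    negatives = len(labels) - positives
    ranked = sorted(zip(scores, labels), key=lambda pair: -pair[0])
    points = [(0.0, 0.0)]
    tp = fp = 0
    for i, (score, is_correct) in enumerate(ranked):
        if is_correct:
            tp += 1
        else:
            fp += 1
        # Emit a point only when the next score differs, so ties share a threshold.
        if i + 1 == len(ranked) or ranked[i + 1][0] != score:
            points.append((fp / negatives, tp / positives))
    return points


def area_under_curve(points):
    """Area under the ROC curve by the trapezoid rule."""
    return sum(
        (x1 - x0) * (y0 + y1) / 2
        for (x0, y0), (x1, y1) in zip(points, points[1:])
    )


# Toy data: two correct hits scored above two incorrect ones.
toy_points = roc_points([0.9, 0.8, 0.2, 0.1], [True, True, False, False])
toy_auc = area_under_curve(toy_points)
```

A curve that rises faster toward the top-left corner (larger area) separates correct from incorrect identifications better at every threshold.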
MapQuant is a program designed to isolate unique organic species and quantify their relative abundances from an LC/MS experiment.
Scheme: Data from an LC/MS experiment are analyzed after being formatted into a data structure called a 2-D map, analogous to a gray-scale image.
Figure: successive scans (scan number N, N+1, N+2, N+3; intensity vs. m/z units) stacked into a 2-D peptide map (time or scans vs. m/z units), then shown as a 2-D map (retention time vs. m/z units).
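The 2-D map idea in the scheme above can be sketched as a grid with one row per scan and one column per m/z bin, where each cell holds summed intensity, much like pixel brightness in a gray-scale image. This is a minimal illustration under assumed inputs (scans as lists of (m/z, intensity) pairs, fixed-width bins). It is not MapQuant's actual data structure or API.

```python
def build_2d_map(scans, mz_min, mz_max, mz_bin_width):
    """Bin scans into a (scans x m/z bins) intensity grid.

    scans: list of scans in acquisition order; each scan is a list of
           (mz, intensity) pairs.
    Returns a list of rows, one per scan.
    """
    n_bins = int(round((mz_max - mz_min) / mz_bin_width))
    grid = []
    for scan in scans:
        row = [0.0] * n_bins
        for mz, intensity in scan:
            if mz_min <= mz < mz_max:
                # Clamp guards against floating-point edge cases at the top bin.
                bin_index = min(int((mz - mz_min) / mz_bin_width), n_bins - 1)
                row[bin_index] += intensity
        grid.append(row)
    return grid


# Two toy scans over an m/z window of 700 to 702 with 0.5-unit bins.
toy_scans = [
    [(700.1, 10.0), (701.6, 5.0)],
    [(700.2, 20.0)],
]
toy_map = build_2d_map(toy_scans, 700.0, 702.0, 0.5)
```

Once the data are in this form, image-style operations such as smoothing and peak finding can be applied along both the time and m/z axes.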
MapQuant Gives a List of All Organic Species In the Sample
MapQuant output:
Abundance Volume  Retention Time  RT     m/z      MZ      Charge  Carbons
60123             27.30           0.118  828.938  0.0117  2       75
30227             42.67           0.162  772.432  0.0102  2       76
19363             48.01           0.150  913.449  0.0143  3       135
13838             34.52           0.131  736.060  0.0092  3       108
9726              28.17           0.129  797.385  0.0108  2       74
5370              34.19           0.131  762.360  0.0099  2       74
4729              52.25           0.153  906.988  0.0141  2       87
1612              47.22           0.136  786.402  0.0105  4       165
151               24.65           0.116  883.525  0.0132  1       33
Leptos et al. Proteomics 2006
MapQuant is publicly available at http://arep.med.harvard.edu/mapquant.html
Figure: 2-D map, retention time (in min) vs. m/z units; feature labels: EKLAVSAR, QEPERSEK, DAFLSGER, ?, ??
MapQuant gives me a list of all organic species in the sample, BUT WHAT ARE THEIR IDENTITIES?
MapQuant identifies approx. 2 x 10^4 organic species per LC/MS experiment.
ONLY ~500 (3%) organic species have fragmentation (CID) spectra and hence sequence IDs.
Dealing With Many Peptides (Organic Species)
Figure: 2-D map, retention time (in min) vs. m/z units; feature labels: EKLAVSAR, QEPERSEK, DAFLSGER, ?, ??; legend: marker = CID spectrum or MS/MS event.
Database of 11845 peptides, with (rt, m/z) coordinates, from ALL LC/MS experiments carried out on Prochlorococcus samples
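One way to use such a database is to give an unsequenced feature the identity of the nearest stored peptide whose retention time and m/z both fall within tolerance. The sketch below is an assumption about how that lookup could work, not the authors' procedure. The tolerances, the scaled-distance tie-break, and the placeholder sequences PEPTIDEA and PEPTIDEB are all hypothetical.

```python
def match_features(features, database, rt_tol, mz_tol):
    """Assign each feature the closest database peptide within both tolerances.

    features: list of (rt, mz) for species lacking a CID spectrum.
    database: list of (rt, mz, sequence) from earlier identified runs.
    Returns one entry per feature: a sequence, or None if nothing matches.
    """
    matches = []
    for rt, mz in features:
        best_seq, best_dist = None, None
        for db_rt, db_mz, seq in database:
            d_rt = abs(rt - db_rt)
            d_mz = abs(mz - db_mz)
            if d_rt <= rt_tol and d_mz <= mz_tol:
                # Scale each axis by its tolerance so the two are comparable.
                dist = (d_rt / rt_tol) ** 2 + (d_mz / mz_tol) ** 2
                if best_dist is None or dist < best_dist:
                    best_seq, best_dist = seq, dist
        matches.append(best_seq)
    return matches


# Hypothetical database entries and features (placeholder values only).
toy_database = [
    (27.35, 828.938, "PEPTIDEA"),
    (42.67, 772.432, "PEPTIDEB"),
]
toy_features = [(27.30, 828.940), (40.00, 500.000)]
toy_matches = match_features(toy_features, toy_database, rt_tol=0.5, mz_tol=0.01)
```

In practice retention times drift between runs, so some alignment of the time axis would likely be needed before a fixed tolerance is meaningful.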
Protein Distribution Among Experiments
Figure: Venn diagram of proteins observed in the diel experiment vs. proteins observed in experiments prior to diel; values shown: 1314, 539, 522, 792, 17.
TOTAL NUMBER OF ORFS: 1742
Sequence Coverage of the Protein groES
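Sequence coverage is commonly defined as the fraction of a protein's residues that fall inside at least one identified peptide. A minimal sketch of that calculation follows. The protein string and peptides are made-up placeholders, not the groES sequence or the peptides observed in this work.

```python
def sequence_coverage(protein, peptides):
    """Fraction of protein residues covered by at least one peptide."""
    covered = [False] * len(protein)
    for peptide in peptides:
        start = protein.find(peptide)
        while start != -1:
            for i in range(start, start + len(peptide)):
                covered[i] = True
            # Keep searching in case the peptide occurs more than once.
            start = protein.find(peptide, start + 1)
    return sum(covered) / len(protein)


# Hypothetical 20-residue protein with two identified peptides.
toy_protein = "MKLAVSARGGDAFLSGERTT"
toy_peptides = ["LAVSAR", "DAFLSGER"]
toy_coverage = sequence_coverage(toy_protein, toy_peptides)
print(f"coverage: {toy_coverage:.0%}")
```

Overlapping peptides are counted once per residue, so coverage never exceeds 100%.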
Summary
Proteome Sci. 3:6 (2005) Ahmad R, Nguyen DH, Wingerd MA, Church GM, Steffen MA.
• Open Personal Genome Project (PGP) including Proteomics
• Single molecule RNAs for alternative splicing (AS)
• Gel-MS methods for endoproteolytic processing
• MapQuant for MS quantitation without isotopic labeling
http://arep.med.harvard.edu