1 RESULTS FROM THE HUPO PLASMA PROTEOME PROJECT: CORE DATASET OF 3020 PROTEINS US HUPO Symposium 14 March, 2005 Gilbert S. Omenn, David States, Dan Chan, Richard Simpson, Henning Hermjakob, and Sam Hanash, on behalf of the HUPO PPP Investigators

RESULTS FROM THE HUPO PLASMA PROTEOME PROJECT: CORE DATASET OF 3020 PROTEINS

US HUPO Symposium

14 March, 2005

Gilbert S. Omenn, David States, Dan Chan, Richard Simpson, Henning Hermjakob, and

Sam Hanash, on behalf of the HUPO PPP Investigators

Protein DNA

LONG-TERM SCIENTIFIC GOALS OF HUPO PPP

1. Comprehensive analysis of plasma and serum protein constituents in people

2. Identification of biological sources of variation within individuals over time, with validation of biomarkers
   o Physiological: age, sex/menstrual cycle, exercise
   o Pathological: selected diseases/special cohorts
   o Pharmacological: common medications

3. Determination of the extent of variation across populations and within populations

HUPO HUMAN PLASMA PROTEOME PROJECT (PPP)

[Figure: Scheme Showing Aims and Linkages of the HUPO Plasma Proteome Project. Elements shown:]

• HUPO PPP Participating Labs
• Technology Vendors
• Development & Validation of Biomarkers
• Liver and Brain Proteome Projects
• Reference Specimens
• Technology Platforms: Separation and Identification
• Serum vs Plasma

Omenn GS. The Human Proteome Organization Plasma Proteome Project Pilot Phase: Reference Specimens, Technology Platform Comparisons, and Standardized Data Submissions and Analyses. Proteomics 2004;4:1235-1240.

PPP TECHNICAL COMMITTEE STRUCTURE

• Reference Specimens and Specimen Handling Issues (Dan Chan, chair)

• Technology Platforms & Protocols (Richard Simpson)

• Database Development and Links with EBI (HUPO/PSI) (David States, Henning Hermjakob)

• Population Cohorts/Specimen Banks (Gerard Siest)

• Education & Training Committee (Peipei Ping)

• Executive Committee (including Partnerships) (Omenn)

SERUM AND PLASMA REFERENCE SPECIMENS

1. BD: specially prepared male/female pooled samples, divided into EDTA-, Heparin-, and Citrate-anti-coagulated Plasma and Serum (250 µl x 4 of each). BD clot activator. No protease inhibitors. Three separate ethnic pools prepared. Shipped frozen.

2. Chinese Academy of Medical Sciences: Sets of three plasmas + serum, similar to BD protocol.

3. National Institute for Biological Standards & Control, UK: citrate-anti-coagulated, freeze-dried plasma, from 25 donors, prepared for Intl Soc Thrombosis & Hemostasis, 1 ml aliquots/ampoules.

UPDATED SUMMARY OF PPP LABS

31 Total Participating Labs (18 US, 13 International):

9 – US Academic, 3 – US Federal, 6 – US Corporate

4 – Europe, 1 – Israel, 6 – Asia, 2 – Australia

LC-MS/MS datasets from 18; MALDI-MS from 5; SELDI-MS from 8; antibody arrays/immunoassays from 4

Number that analyzed various reference specimens:

9 – UK NIBSC, 26 – BD b1, Caucasian-American

9 – BD b2/b3, African- and Asian-American, 5 – CAMS

Arie Admon, Technion, Haifa, Israel
Ruedi Aebersold, Institute for Systems Biology, Seattle
William Hancock, Barnett Institute, Northeastern Univ
Stan Hefta, Bristol-Myers Squibb, NJ
Helmut Meyer, Ruhr University Bochum
Gil Omenn/Sam Hanash/Phil Andrews/Mike Pisano, MI
Young Ki Paik, Yonsei Research Center, Korea
John Peltier, Myriad Proteomics Inc.
Peipei Ping, UCLA
Joel Pounds, Pacific Northwest Natl Lab
Xiaohong Qian, Beijing Institute of Radiation Medicine
Richard Simpson, Ludwig Institute for Cancer Research
David Speicher, Wistar Institute
Rong Wang, Mass Spec Proteomics Lab, Mount Sinai
Valerie Wasinger, Univ of New South Wales
Chi Yue Wu, Institute of Biol Chem, Acad Sinica, Taiwan
Xiaohang Zhao, Natl Lab of Molecular Oncology, CAMS
Robert Gerszten, Harvard/Erik Forsberg, Amersham-GE

SELDI Labs
Bao-Ling Adam, Univ of Georgia
Alexander Archakov, Institute of Biomedical Chemistry, Moscow
Dan Chan/Alex Rai, Johns Hopkins
Kenneth Greis, Procter & Gamble
Eastwood Leung, Genome Institute of Singapore
Sandra McCutcheon-Maloney/Brett Chromy, Lawrence Livermore Lab
William Morgan, Univ of Missouri-KC
Jean-Charles Sanchez, Geneva Proteomics Research Center
Paul Stemmer, Wayne State University

MALDI-MS Labs
Alexander Archakov, Institute of Biomedical Chemistry, Moscow
Erik Forsberg, Amersham Biosciences, Uppsala, Sweden
Young-ki Paik, Yonsei Proteome Research Center
Akira Tsugita, Proteomics Research Lab, Tsukuba, Japan

Immunoassay Labs
Brian Haab, Van Andel Research Institute
Frank Vitzthum, Dade Behring, Marburg GMBH, Germany
Mark Driscoll, Molecular Staging Inc
Bernhard Geierstanger, Genomics Inst of Novartis Research Fdn

SPECIFICATIONS FOR DATA SUBMISSION

Each lab was instructed (July, 2003) to provide

a) a detailed experimental protocol, to “push the limits” to detect low-abundance proteins

b) peptide sequences, rated as “high” or “lower” confidence, based on MS/MS criteria

c) protein IDs from IPI 2.21 (July 2003) and search engine used to align peptide sequences with proteins in human database

Later, we requested m/z peak lists, raw spectra (by CD or DVD), and search parameters.

CRITERIA FOR HIGH CONFIDENCE IDENTIFICATION OF PEPTIDES, ILLUSTRATED WITH SEQUEST

Xcorr: singly-charged ion, >=1.9

doubly-charged ion, >=2.2

triply-charged ion, >=3.75

Delta Cn >= 0.1; Rsp <= 4

Fully tryptic
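The thresholds above can be applied mechanically to each peptide-spectrum match. The following is a minimal sketch, not the PPP pipeline itself: the `PeptideHit` record and its field names are hypothetical, and treating charge states above 3 as not high-confidence is an assumption, since the slide lists only charges 1 to 3.

```python
from dataclasses import dataclass

# Minimum Xcorr per precursor charge state, as listed on the slide.
XCORR_MIN = {1: 1.9, 2: 2.2, 3: 3.75}
DELTA_CN_MIN = 0.1
RSP_MAX = 4


@dataclass
class PeptideHit:
    """One SEQUEST peptide-spectrum match (field names are hypothetical)."""
    sequence: str
    charge: int
    xcorr: float
    delta_cn: float
    rsp: int
    fully_tryptic: bool


def is_high_confidence(hit: PeptideHit) -> bool:
    """Return True only if the hit meets every high-confidence criterion."""
    xcorr_min = XCORR_MIN.get(hit.charge)
    if xcorr_min is None:
        # Assumption: charge states not listed on the slide are not rated high.
        return False
    return (
        hit.xcorr >= xcorr_min
        and hit.delta_cn >= DELTA_CN_MIN
        and hit.rsp <= RSP_MAX
        and hit.fully_tryptic
    )
```

A hit failing any single criterion would fall into the “lower” confidence class under this reading.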

Database Design and Implementation

RDBMS
– Stable, proven technology
– Data validation

Commercial package
– Microsoft SQL Server
– Stable and supported
– Full RDBMS functionality
   • Transactions
   • Referential integrity checks
– Effective development tools
   • GUI
   • Cross-tab extension

[Schema diagram: an Identifications table with fields Laboratory, Sample, Method, Database ID, and Peptides, alongside Laboratories, Samples, and Methods tables]
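The referential integrity checks mentioned above can be illustrated with a small schema. This is a sketch under stated assumptions: it uses SQLite (via Python's standard `sqlite3` module) rather than the Microsoft SQL Server named on the slide, and all table and column names are hypothetical, modeled loosely on the entities in the schema diagram.

```python
import sqlite3

SCHEMA = """
CREATE TABLE laboratories (lab_id INTEGER PRIMARY KEY, name TEXT NOT NULL);
CREATE TABLE samples (sample_id INTEGER PRIMARY KEY, name TEXT NOT NULL);
CREATE TABLE methods (method_id INTEGER PRIMARY KEY, name TEXT NOT NULL);
CREATE TABLE identifications (
    ident_id INTEGER PRIMARY KEY,
    lab_id INTEGER NOT NULL REFERENCES laboratories(lab_id),
    sample_id INTEGER NOT NULL REFERENCES samples(sample_id),
    method_id INTEGER NOT NULL REFERENCES methods(method_id),
    database_id TEXT NOT NULL,  -- protein accession, e.g. an IPI identifier
    peptides TEXT NOT NULL      -- identified peptide sequence(s)
);
"""


def make_db() -> sqlite3.Connection:
    """Create an in-memory database with foreign-key enforcement enabled."""
    conn = sqlite3.connect(":memory:")
    # SQLite enforces foreign keys only when this pragma is switched on.
    conn.execute("PRAGMA foreign_keys = ON")
    conn.executescript(SCHEMA)
    return conn
```

With this setup, inserting an identification that points at a laboratory, sample, or method that does not exist raises `sqlite3.IntegrityError`, which is the kind of data validation the slide attributes to a full RDBMS.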

Bioinformatics Acknowledgements

University of Michigan
David States
Marcin Adamski
Thomas Blackwell
Rajasree Menon
Yin Xu

EBI – England
Rolf Apweiler
Henning Hermjakob
Chris Taylor
Nicky Mulder
Sandra Orchard

Ludwig Institute - Australia
Richard Simpson
Eugene Kapp
James Eddes

Institute for Systems Biology
Jimmy Eng
Alexey Nesvizhskii

Technion/IBM
Ilan Beer

Integration Algorithm (Adamski et al)

Objectives:
 o Integrate results from disparate instruments, search engines, and specimens
 o Evaluate concordance between results from different laboratories
 o Reduce ambiguity and redundancy of the identifications
 o Select the accession number of the most representative protein among those matching equally

We designed a workflow that uses sequences of identified peptides, rather than submitted protein accession numbers.
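The idea of working from peptide sequences rather than submitted accession numbers can be sketched as follows. This is a deliberately simplified illustration, not the Adamski et al. algorithm: it only merges proteins that match exactly the same set of peptides, and the tie-break (lexicographically smallest accession) is an assumption made for the example.

```python
from collections import defaultdict


def integrate(peptides, database):
    """Group proteins by the set of identified peptides they contain.

    peptides: iterable of identified peptide sequences (pooled from all labs).
    database: dict mapping accession -> protein sequence.
    Returns {representative_accession: sorted list of matching peptides}.
    """
    peptide_set = set(peptides)

    # Re-derive protein matches from peptide sequences, ignoring submitted IDs.
    matches = {}
    for accession, sequence in database.items():
        found = frozenset(p for p in peptide_set if p in sequence)
        if found:
            matches[accession] = found

    # Proteins matching exactly the same peptides are indistinguishable.
    clusters = defaultdict(list)
    for accession, found in matches.items():
        clusters[found].append(accession)

    # Assumed tie-break: keep the lexicographically smallest accession.
    return {min(accs): sorted(found) for found, accs in clusters.items()}
```

A fuller treatment would also need to handle proteins whose peptide sets are subsets of another protein's, which this sketch leaves as separate entries.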

Numbers of Proteins Identified (LC-MS/MS or FTICR-MS)

From 15,519 reported distinct protein IDs in IPI 2.21, chose representative protein for clusters:

(a) all protein IDs (high and lower conf)

9504 = 1 or more peptide matches (>=6 aa)

3020 = 2 or more peptide matches [1274 = 3+]

[2580 in plasma x3; 2353 in serum; 1913 in both]

(b) all protein IDs (high conf peptides, only)

2852 = 1 or more peptide matches

http://www.bioinformatics.med.umich.edu/app1/

MsSqlAccess [UM] and www.ebi.ac.uk/pride [EBI]
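The tallies by number of peptide matches can be reproduced from a list of protein-peptide pairs. The sketch below is an illustration only: the input format and function name are hypothetical, and it assumes that “peptide matches” means distinct peptide sequences of at least 6 amino acids.

```python
from collections import defaultdict

MIN_PEPTIDE_LENGTH = 6  # the slide counts peptide matches of >= 6 amino acids


def proteins_by_min_peptides(matches, thresholds=(1, 2, 3)):
    """Count proteins supported by at least k distinct peptides.

    matches: iterable of (protein_id, peptide_sequence) pairs.
    Returns {k: number of proteins with >= k distinct qualifying peptides}.
    """
    peptides = defaultdict(set)
    for protein_id, peptide in matches:
        if len(peptide) >= MIN_PEPTIDE_LENGTH:
            peptides[protein_id].add(peptide)
    return {k: sum(len(p) >= k for p in peptides.values()) for k in thresholds}
```

Applied to the integrated PPP data, the counts for k = 1, 2, and 3 would correspond to the 9504, 3020, and 1274 figures reported above.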

Distribution of protein identifications in function of peptides detected per protein

[Figure: number of identified proteins (left axis, 0 to 10,000) versus number of peptides per protein detected across experiments and laboratories (≥ 1, ≥ 2, ≥ 3, ≥ 4, ≥ 5, ... ≥ 10). Series: all identifications (left axis) and confirmed identifications. Percentage labels on the plot: 97%, 92%, 91%, 25%, 75%, 86%.]

ESTIMATION OF ERROR RATE

Poisson Model (States/Blackwell)

Ndb, total non-redundant protein entries in IPI v2.21 (49,924)

Lambda, proportion of matches false-positives
 o Upper bound: all 9504 FP, = 0.211
 o Lower bound: accept 1920 high confidence single-peptide-based protein IDs, reject 4864 lower confidence, = 0.146

Pr (true positives):
 o 4 peptides, 0.99
 o 3 peptides, 0.95-0.98
 o 2 peptides, 0.70-0.85

Use 2+ peptides to obtain more representative dataset.
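The slide does not state how Lambda is computed. One reading that reproduces both quoted values, offered here as an assumption rather than the authors' stated method, is that a database entry receives at least one random match with probability 1 - exp(-Lambda), so Lambda = -ln(1 - N/Ndb). The lower-bound figure is reproduced when N is taken as the 1920 + 4864 = 6784 single-peptide identifications; that choice of N is inferred from the numbers, not given on the slide.

```python
import math

N_DB = 49_924  # non-redundant protein entries in IPI v2.21 (from the slide)


def poisson_rate(n_hit: int, n_db: int = N_DB) -> float:
    """Solve 1 - exp(-lam) = n_hit / n_db for lam.

    Under a Poisson model of random matches, a database entry receives at
    least one match with probability 1 - exp(-lam).
    """
    return -math.log(1.0 - n_hit / n_db)


# Upper bound: all 9504 identifications treated as false positives.
lambda_upper = poisson_rate(9504)
# Lower bound (assumed reading): the 1920 + 4864 single-peptide identifications.
lambda_lower = poisson_rate(1920 + 4864)

print(round(lambda_upper, 3), round(lambda_lower, 3))  # prints: 0.211 0.146
```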

Virtual 2D gel

[Figure panels: Proteins detected with at least 2 peptides; All Detected Proteins; All proteins in IPI 2.21]

INDEPENDENT ANALYSES FROM RAW SPECTRA (#IDs with 2+ peptides)

Core Dataset (18 datasets, 3020)

(a) Mascot/Digger (Kapp, Australia, 18 datasets, 3178)

(b) PepMiner (Beer, Israel, 8 large datasets, 2902)

(c) PeptideProphet/ProteinProphet (Eng, USA, 5 datasets, 508)

Plus alternative integration scheme with Sequest (Eddes, Australia, 18 datasets, 2344)

GREATEST RESOLUTION AND SENSITIVITY

The most extensive high-confidence yield came from the combined use of immunoaffinity (“top-6”) depletion, 2-D or 3-D high-resolution fractionation, and then ESI-MS/MS with an ion-trap LTQ instrument.

LTQ gave several-fold more IDs than LCQ in the same hands (B1-serum vs B1-heparin).

SPECIFIC OBSERVATIONS: DEPLETION

• Many investigators depleted albumin and/or immunoglobulins

• Several obtained Agilent immunoaffinity column to remove Top-6 proteins.

• Much higher numbers of identifications after depletion.

• Inadvertent removal of other proteins remains an open issue: LC-MS/MS vs gels; “sponge” effect of albumin.

• Feasible to assay both flow-through & bound fractions.

Example of Depletion Analysis (Echan/Speicher)

Immunoaffinity/Top-6 polyclonal (Agilent)
 o Column for HPLC
 o Spin column

Two-antibody spin column (Proteoprep); IgY

Cibacron Blue (for albumin)

Protein A or G (for immunoglobulins)

Top-6 best: 85% of protein removed; least non-target removal (lots of fragments of top 6); few “new” proteins on 2D gel despite 10-20X loading

Suggest depleting 12-20 proteins OR using multi-dimensional (microSol IEF) fractionation

Glycoprotein-Enriched Subproteomes[Hancock presentation this afternoon]

Methods         Lab 2                Lab 11
Enrichment      hydrazide chem       lectin chrom’y
Peptide Fxn     SCX + RP             RP
Mass Spec       qtof                 deca-xp
Search engine   Seq/ProteinProphet   Sequest
Protein IDs     222                  83
in B1-serum     [51 in common]

Of total 254, 164 found among data from 11 other labs without glycoprotein enrichment.
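The total of 254 is the size of the union of the two labs' protein lists, which follows from inclusion-exclusion on the counts above. A small worked check (the function name is hypothetical):

```python
def union_size(n_a: int, n_b: int, n_common: int) -> int:
    """Size of the union of two sets given their sizes and their overlap."""
    return n_a + n_b - n_common


# Lab 2 (222 IDs), Lab 11 (83 IDs), 51 in common, as reported on the slide.
total = union_size(222, 83, 51)

# Share of the glycoprotein-enriched IDs also seen by labs without enrichment.
fraction_seen_elsewhere = 164 / total
```

Here `total` evaluates to 254, and `fraction_seen_elsewhere` is a little under two thirds.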

[Figure: panels A and B]

First dimension fraction numbers (relative pI) and estimated MW of identified proteins. Left (A): 39 locations with complement component 3 precursor (C3); right (B): 14 locations with clusterin (CLU).

INFLUENCE OF ABUNDANCE

Using quantitative immunoassays and microarrays (generally unknown epitopes), we have found very high rates of detection of the more abundant proteins, less in the mid-range, and occasional detection of very low abundance proteins, as expected.

High correlation (r=0.9) between # peptides and measured concentrations.

[Figure: number of peptide identifications (log scale, 1 to 1,000) versus concentration (pg/ml; log scale, 100 to 1e10)]

Fitted line: log10(N) = 0.365 * log10(conc) - 0.711; r2 = 0.92
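The fitted line relating peptide identifications to concentration can be evaluated directly. The sketch below uses only the slope and intercept quoted for the figure; the function names are hypothetical, and the inverse is simply the same line solved for concentration.

```python
import math

SLOPE = 0.365       # from the fitted line: log10(N) = 0.365 * log10(conc) - 0.711
INTERCEPT = -0.711


def expected_peptide_ids(conc_pg_per_ml: float) -> float:
    """Expected number of peptide identifications at a given concentration."""
    return 10 ** (SLOPE * math.log10(conc_pg_per_ml) + INTERCEPT)


def concentration_for(n_peptides: float) -> float:
    """Invert the fit: concentration (pg/ml) at which n_peptides are expected."""
    return 10 ** ((math.log10(n_peptides) - INTERCEPT) / SLOPE)
```

For example, at 1e6 pg/ml the fit gives log10(N) = 0.365 * 6 - 0.711 = 1.479, or about 30 peptide identifications. Because the slope is well below 1, the number of identifications grows much more slowly than concentration.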

Least Abundant Proteins Identified with two distinct peptides

(pg/ml: range 200 pg/ml to 20 ng/ml)

Alpha fetoprotein                        2.9E+02
TNF-R-8                                  3.3E+02
TNF-ligand-6                             1.5E+03
PDGF-R alpha                             4.6E+03
Leukemia inhibitory factor receptor      5.0E+03
MMP-2/gelatinase                         8.8E+03
EGFR                                     1.1E+04
TIMP-1                                   1.4E+04
IGFBP-2                                  1.5E+04
Activated leukocyte adhesion mol         1.6E+04
Selectin L [five labs; 10 peptides]      1.7E+04

SPECIMEN VARIABLES

What evidence have we developed for choice of specimens for analysis?

Plasma preferred over serum

Citrate or EDTA preferred over Heparin for plasma

Protease inhibitors desirable, but complicated

Clot activator unnecessary (serum only)

Minimize freeze/thaw cycles (archives)

Avoid 4°C step

SPECIMENS

The sets of four specimens from a given donor pool yielded rather similar numbers of proteins when analyzed identically. More fragmentation of serum. Little evidence of platelet in vitro contamination.

Quantitative immunoassays show generally 15-20% lower values for citrate-plasma, due to dilution and osmosis; no interference with or loss of identifiable proteins.

PROTEASES

Should anti-protease cocktails be used in specimen collection, or in a later step?

Advantages: reduce proteolytic degradation ex vivo; reduce complexity of peptides after tryptic digestion.

Disadvantages: adds peptides, as well as small molecules, to the mix to be analyzed; may covalently modify the proteins (AEBSF does so).

BIOLOGICAL INSIGHTS

The proteins identified can be annotated by many methods. We have searched multiple databases, including Gene Ontology, Novartis Atlas, Online Mendelian Inheritance in Man (OMIM), incomplete or unidentified sequences in the human genome, microbial genomes, and protein domains.

Some examples follow.

Shown in the figure are the rates of occurrence of Gene Ontology terms in the HUPO PPP 3020 set relative to the frequency of occurrence of the same terms in the human genome. The solid line shows a linear regression estimate for the frequency that would be expected if the 3020 uniformly sampled the genome. The parallel dotted lines show 2 fold over and under representation relative to uniform sampling. The curved dashed lines show over and under representation by 3 standard deviations.

[Figure: GO term usage in the PPP 3020 vs. Human Genome. Log-log plot of occurrences in PPP (1 to 200) versus occurrences in genome (1 to 1000).]

Over- and Under-Represented GO Terms

Over: extracellular, immune response, blood coagulation, lipid transport, complement activation, regulation of blood pressure; also, cytoskeletal proteins, receptors and transporters

Under: perception of smell (1 vs 25 expected); cation transporters, ribosomal proteins, G-protein coupled receptors, and nucleic acid binding proteins
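The over- and under-representation calls can be sketched as a comparison of an observed term count with its expected count under uniform sampling. This is an illustration under stated assumptions: the standard deviation uses a Poisson approximation (sd = sqrt(expected)), and labeling a term when either the 2-fold or the 3-standard-deviation line is crossed is a choice made for the example; the slides do not specify either detail.

```python
import math


def representation(observed: int, expected: float) -> dict:
    """Compare an observed term count with its expected count.

    The standard deviation uses a Poisson approximation (sd = sqrt(expected)),
    which is an assumption of this sketch.
    """
    fold = observed / expected
    z = (observed - expected) / math.sqrt(expected)
    if fold >= 2 or z >= 3:
        label = "over"
    elif fold <= 0.5 or z <= -3:
        label = "under"
    else:
        label = "as expected"
    return {"fold": fold, "z": z, "label": label}
```

For “perception of smell” (1 observed vs 25 expected), this gives a fold of 0.04 and z = (1 - 25) / 5 = -4.8, so the term is labeled under-represented by both criteria.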

[Figure: InterPro domain usage in the PPP 3020 vs. Human Genome. Log-log plot of occurrences in PPP (1 to 50) versus occurrences in genome (1 to 1000).]

OVER- AND UNDER-REPRESENTED DOMAINS IN INTERPRO FOR PPP vs FULL IPI DATASET

Over: EGF, intermediate filament protein, sushi, thrombospondin, complement C1q, and cysteine protease inhibitor

Under: Zinc finger (C2H2, B-box, RING), tyrosine protein phosphatase, tyrosine and serine/threonine protein kinases, helix-turn-helix motif, and IQ calmodulin binding region

GENE ONTOLOGY SPECIFIC TERMS

Over-represented in PPP 3020 (vs whole genome): “extracellular”, “immune response”, “blood coagulation”, “lipid transport”, “complement activation”, “regulation of blood pressure”, as expected; also: cytoskeletal proteins, receptors and transporters.

Proteins from most cellular locations and molecular processes are recognized.

Under-represented: “perception of smell” (1 vs 25 exp); cation transporters, ribosomal proteins, G-protein coupled receptors, and nucleic acid binding proteins.

InterPro Protein Domain Analysis

Compared with the whole human genome, the 3020 PPP proteins are:

Over-represented for EGF, intermediate filament protein, sushi, thrombospondin, complement C1q, and cysteine protease inhibitor, and

Under-represented for zinc finger (C2H2, B-box, RING), tyrosine protein phosphatase, tyrosine and serine/threonine protein kinases, helix-turn-helix motif, and IQ calmodulin binding region domains.

TRANSMEMBRANE AND SECRETED PROTEIN FEATURES

1297 of 3020:        SwissProt Annotated   ProFun   Both
Transmembrane        230                   151      104
Secretion signal     373                   420      358

1723 of 3020:        ProFun
Predicted TM domain(s)   137
Secretion signal         255

Cardiovascular-Related Proteins Biomarker Candidates in the PPP Database

(Vondriska-Ping: presentation today)

Proteins characterized in eight groups:

Inflammation

Vascular

Signaling

Growth and differentiation

Cytoskeletal

Transcription factors

Channels

Receptors

PROTEINS FROM INHERITED CANCER DISORDERS

Linking IPI IDs and Mendelian Inheritance in Man (OMIM)

IPI            Cancer Types                      Protein                                    Labs  No of Peptides
IPI00012391.1  Colorectal                        APC                                        2     2
IPI00017303.1  Colorectal, HNPCC; Ovarian        DNA mismatch repair protein Msh2           2     2
IPI00020732.2  Medullary or papillary thyroid    Tyrosine kinase ret receptor precursor     2     3
IPI00025087.1  Colorectal                        Cellular tumor antigen p53                 1     3
IPI00031036.1  Colon                             Chloride anion exchanger                   2     4
IPI00164713.1  Breast, Endometrial, Gastric, Ov  Epithelial-cadherin precursor              2     4
IPI00181932.1  Prostate                          Zinc phosphodiesterase                     2     5
IPI00185027.1  Pancreatic                        Arg-Glu dipeptide (RE) repeats             2     2
IPI00218982.1  Breast, Ovarian                   BRCA1                                      3     5
IPI00257731.1  Prostate                          N33 protein                                2     2
IPI00289819.1  Hepatocellular                    Cation-ind mannose-6-P receptor precursor  2     3
IPI00293471.1  Breast 2, Pancreatic              BRCA2                                      4     8
IPI00294982.1  Breast                            Estrogen receptor                          2     2
IPI00329643.1  Endometrial                       DNA mismatch repair protein Msh3           3     3

IDENTIFICATION OF 94 NOVEL PEPTIDES USING WHOLE GENOME ORF SEARCH

States has enhanced the annotation of the Human Genome by identifying novel and cryptic genes not previously known to have protein products. Mass spectra peaklists from a subset of PPP labs were searched against all ORFs in NCBI Build 33 in all three reading frames and both strands, using X!Tandem.

A bonus of the PPP: protein to DNA mapping of the human genome!
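Searching in all three reading frames on both strands amounts to a six-frame translation of the DNA. The sketch below illustrates only that step with the standard genetic code; it is not X!Tandem, it does no spectrum matching, and the function names are hypothetical.

```python
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
# Standard genetic code, built by enumerating codons in TCAG order.
CODON_TABLE = {
    a + b + c: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(dna: str) -> str:
    """Return the reverse complement of an upper-case DNA string."""
    return dna.translate(COMPLEMENT)[::-1]


def translate(dna: str) -> str:
    """Translate from position 0; unknown codons (e.g. containing N) become X."""
    return "".join(
        CODON_TABLE.get(dna[i:i + 3], "X") for i in range(0, len(dna) - 2, 3)
    )


def six_frame_translations(dna: str) -> dict:
    """Translate three frames on the forward strand and three on the reverse."""
    dna = dna.upper()
    frames = {}
    for strand, seq in (("+", dna), ("-", reverse_complement(dna))):
        for offset in range(3):
            frames[f"{strand}{offset + 1}"] = translate(seq[offset:])
    return frames


def frames_containing(peptide: str, dna: str) -> list:
    """Names of the frames whose translation contains the peptide."""
    return [name for name, protein in six_frame_translations(dna).items()
            if peptide in protein]
```

A peptide identified from a spectrum but absent from the annotated protein database could then be located in whichever frame and strand contains it, which is what makes the protein-to-DNA mapping possible.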

COMPARISON WITH LITERATURE

Report     #IDs     #IPI   in 3020   in 9504
Anderson   1175     990    316       471
Shen       [1682]   1842   213       526
Chan       1444     1019   257       402
Zhou       210      107    51        68

NEXT STEPS

1. We are in the homestretch on manuscripts from the Pilot Phase of the PPP for Special Issue of PROTEOMICS August 2005 & for Nature Biotech.

2. Plan potential future phases of PPP

a) Identify and perform critical experiments to support development of standardized procedures for specimens, fractionation, analysis.

b) Provide high-quality bioinformatics and database for plasma proteome datasets from all sources, assuring linkage with organ-proteomes.

c) Organize strategies, labs, and bioinformatics for large-scale studies, or play facilitation role.

DEFINE HIGH THROUGHPUT OPTIONS FOR LARGE-SCALE PROTEOMICS STUDIES (1)

Admon/Dongre: LC-MS with highly accurate mass and elution time parameters for peptide IDs

Combine with depletion; rely on very slow flow (2 hr) LC and accurate mass and elution characteristics for mass fingerprints, after building a high-quality mass x elution database.

DEFINE HIGH THROUGHPUT OPTIONS FOR LARGE-SCALE PROTEOMICS STUDIES (2)

Mann, Beijing Congress (2004)

Use MS (3) with FTICR for much greater precision of mass determination and for detection and localization of post-translational modifications.

Probably convert to microarrays for high throughput of clinical and epidemiological specimens.

Genome-Wide Studies of Proteome (3) Humphery-Smith (Proteomics 2004;4:2519-21)

Design and produce affinity ligands against conserved regions in each ORF for signal enrichment: antibodies, receptins, aptamers; sequence strings unencumbered by PTMs, uncleaved, near 5’ end, exposed at surface

Use ECL, rolling circles, isotopic labeling, and/or light scattering as readout technologies.

Large-Scale Proteomics Studies (4)

Aebersold (Nature 2003;422:115-116)

Go from discovery using MS to “browsing” using unique chemically-synthesized peptides tagged with heavy isotope for each gene and even each protein isoform.

Combine this standard peptide mixture with specimen fractions on sample plate for MS, examine double peaks (with the precise differential mass) in the ordered peptide array.

Try the method first on yeast.

HUPO PPP SUPPORT FROM NIH

Trans-NIH Consortium

Natl Cancer Inst: Div Cancer Prevention; Div Cancer Treatment

Natl Institute on Aging

Natl Inst on Alcoholism & Alcohol Abuse

Natl Inst on Diabetes, Digestive, & Kidney Diseases

Natl Inst for Environmental Health Science

Natl Inst for Neurologic Diseases & Stroke

CORPORATE SPONSORS

Johnson & Johnson Abbott Labs

Pfizer Novartis

Invitrogen Amersham

Procter & Gamble

BD Biosciences BioVision

Ciphergen Molecular Staging

Bristol Myers Squibb Sigma-Aldrich

Agilent Dade Behring

OUR GENETIC FUTURE

“Mapping the human genetic terrain may rank with the great expeditions of Lewis and Clark, Sir Edmund Hillary, and the Apollo Program.”

--Francis Collins, Director

National Human Genome Research Institute, 1999

Next: Functional Genomics/Systems Biology

Understand the dynamic proteomic compartments.
