F.Magni - Siena 2014
Proteomics and Mass Spectrometry: biochemical and clinical applications
Fulvio Magni
Department of Experimental Medicine
Faculty of Medicine and Surgery
Università degli Studi di Milano-Bicocca
Is it true that all the information lies in the genome?
Genome: 1 Gene, Single Protein, Disease
Differences in the genome ≈ 1%
Differences in the proteome > 1%
Identical genome, different proteome
Why do we analyse proteins?
Proteins are the molecules responsible for correct cellular function
Only 2% of human diseases result from a single gene defect
Single gene defect => Which protein is altered?
DNA => mRNA (alternative splicing) => Protein: 5,000-10,000 activated genes, 15,000-30,000 proteins
The final form of a protein (3D structure and function) cannot be predicted with certainty from the linear code of its gene
Most proteins are modified after they are synthesised
Genome, Proteome
PROTEOME
The proteome is the set of PROTEins expressed by a genOME or tissue
PROTEOMICS
Proteomics is the large-scale study of gene expression at the protein level.
Why do we analyse proteins?
DNA => mRNA (alternative splicing; 5,000-10,000 activated genes) => Protein (15,000-30,000 proteins) => PTMs
PROTEOME
The simultaneous study of the whole range of proteins expressed in a cell at any given time
The proteome, unlike the genome, is not a fixed feature of an organism
A single genome can give rise to an essentially infinite number of qualitatively and quantitatively different proteomes depending on: ...
Aim
Which components of the proteomic profile:
- are relevant for human disease (Diagnosis)
- are excellent therapeutic targets (Therapy)
Expression Proteomics
1st: Normal sample and disease sample are separated and visualised by 2D-gel electrophoresis
2nd: Gel images are compared with dedicated software
Not present in normal sample: Prostate cancer
Low levels in normal sample: Alzheimer
High levels in normal sample: Parkinson
Proteomics has now branched into two specific disciplines:
Expression proteomics (classical):
- qualitative: display on 2D-PAGE or an alternative technique, identification by mass spectrometry
- quantitative: evaluation of differential expression
Functional proteomics: localization or identification studies of proteins with specific biological activities, and interaction studies.
General strategy for protein studies:
Steps                                   Methodology
1- Purification – Isolation             2D-PAGE
2- Identification – Characterization    Mass spectrometry
3- Database searching                   Bioinformatics
Expression Proteomics
3rd: Proteins that differ in abundance between the gels are identified by MS
2D-PAGE => Protein band => Cut out
1- Enzymatic digestion (e.g. trypsin)
2- Peptide extraction
3- Peptide mass fingerprint by MS analysis: evaluation of the Mr of each tryptic peptide
4- Database search: identification of the protein by database searching
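The digestion and mass-evaluation steps above can be sketched in silico. This is a minimal illustration, not the workflow of any particular search engine: it assumes trypsin cleaves after Lys/Arg except when the next residue is Pro (a common convention that the slides do not state), ignores missed cleavages and cysteine alkylation, and uses the hypothetical helper names `tryptic_digest` and `mh_plus`.

```python
# Monoisotopic residue masses (Da) of the 20 standard amino acids.
RESIDUE_MASS = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276,
    "V": 99.06841, "T": 101.04768, "C": 103.00919, "L": 113.08406,
    "I": 113.08406, "N": 114.04293, "D": 115.02694, "Q": 128.05858,
    "K": 128.09496, "E": 129.04259, "M": 131.04049, "H": 137.05891,
    "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}
WATER = 18.01056   # added once per peptide (free N- and C-terminus)
PROTON = 1.00728   # singly protonated ion, as typically seen in MALDI


def tryptic_digest(sequence):
    """Cleave after K or R, unless the next residue is P (assumed rule)."""
    peptides, start = [], 0
    for i, aa in enumerate(sequence):
        next_aa = sequence[i + 1] if i + 1 < len(sequence) else ""
        if aa in "KR" and next_aa != "P":
            peptides.append(sequence[start:i + 1])
            start = i + 1
    if start < len(sequence):
        peptides.append(sequence[start:])  # C-terminal peptide
    return peptides


def mh_plus(peptide):
    """Monoisotopic m/z of the singly charged [M+H]+ ion."""
    return sum(RESIDUE_MASS[aa] for aa in peptide) + WATER + PROTON


if __name__ == "__main__":
    for pep in tryptic_digest("AAKGGRPLKW"):
        print(pep, round(mh_plus(pep), 3))
```

The resulting list of [M+H]+ values is the theoretical fingerprint that a database search compares against the measured peak list.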
ANALYTICAL PART: Trypsin digestion protocol
1. Excision of the protein bands (spots) from the gel
   Gel washing, to remove stain and SDS
2. Reduction and alkylation
   Addition of dithiothreitol (DTT), to reduce S-S
   Addition of iodoacetamide, to alkylate S-H
   Gel washing, to remove excess reagents
3. In-gel digestion
   Addition of trypsin
   Incubation at 37°C overnight
4. Extraction
Proteomics: MALDI-TOF? Peptide mass fingerprint:
A set of peptide molecular weights from an enzymatic digest of a protein is evaluated by mass spectrometry.
MALDI-TOF:
Analyzes high masses (>100 kDa)
Measures the entire mass range
Compatible with many buffers
Applicable to a variety of compound types
High sensitivity (low fmol)
Molecular weight and structure information (PSD, TOF/TOF)
EASY TO DO
MALDI-TOF: How Does It Work?
1. Sample is mixed with matrix and dried on the sample plate.
2. Target is introduced into the high vacuum of the MS.
3. Sample spot is irradiated with a pulsed laser, desorbing ions into the gas phase and starting the clock measuring the time of flight.
4. Ions are accelerated by an electric field (high voltage, 20 - 30 kV) to the same kinetic energy, and they drift (or fly) down a field-free flight tube, where they are separated in space.
5. Ions strike the detector at different times, depending on the mass-to-charge ratio of the ion.
6. A data system controls all instrument parameters, acquires the signal vs. time, and permits data processing.
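The relation between flight time and mass-to-charge ratio follows from the equal kinetic energy given to all ions: z·e·V = ½·m·v², so the drift time over a tube of length L is t = L·sqrt(m / (2·z·e·V)). The sketch below is an idealised linear-TOF calculation only; the 1 m tube length is an assumed value, the 20 kV default is the lower end of the range quoted on the slide, and effects such as initial ion velocity, delayed extraction, or a reflectron are ignored.

```python
import math

E_CHARGE = 1.602176634e-19   # elementary charge, C
DALTON = 1.66053906660e-27   # atomic mass unit, kg


def flight_time_us(mz, accel_voltage_v=20_000.0, tube_length_m=1.0):
    """Idealised drift time (microseconds) of an ion with the given m/z.

    mz is in Da per elementary charge; tube_length_m is an assumed value.
    """
    # z*e*V = 1/2*m*v^2  =>  v = sqrt(2*e*V / (m/z))
    velocity = math.sqrt(2.0 * E_CHARGE * accel_voltage_v / (mz * DALTON))
    return tube_length_m / velocity * 1e6


if __name__ == "__main__":
    for mz in (1000.0, 2000.0, 4000.0):
        print(mz, round(flight_time_us(mz), 2), "us")
```

Because t scales with the square root of m/z, an ion four times heavier takes twice as long to reach the detector.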
Trypsin: Lys, Arg
m/z         Intens.
824.478     2122.39
842.504     1293.31
940.328     26750.83
947.487     1837.60
1293.649    1323.95
1376.528    877.15
1448.721    2550.85
1544.626    1001.62
1790.921    580.69
1877.937    418.26
1907.895    742.16
2211.220    630.93
2225.172    465.53
2355.091    835.77
2528.241    3874.78
Peptide Mass Fingerprint (PMF): protein identification by database search
Protein identification
How can we arrive at the CERTAIN identity of a protein?
1- Experimentally determine ALL the information about the protein(s): INTEGRAL APPROACH
2- Experimentally determine PART of the information about the protein(s) and use it to derive the missing information: APPROXIMATION APPROACH
Protein identification
Experimental information (complete or partial)
Archived information: databases
Search programs: bioinformatics
Identity
Protein identification
1 – Databases
2 - Protein identification and characterization: analytical methods
3 - Protein identification and characterization: software
Molecular & Cellular Proteomics 2009 Vol 8 : 2827 - 2842
Bell, A. W., et al., Nature Methods 6, 423–430 (2009).
What do we learn from these articles?
1_ It is important to separate the proteins
2_ It is important to obtain good/excellent mass spectrometry (MS) data.
3_ All the effort spent on points 1 and 2 IS WASTED if we cannot use the data correctly for lack of knowledge of:
3_1 The identification algorithms
3_2 The databases
3_3 All the possibilities offered by MS and its advances
Problems:
Data-processing algorithms
Databases
Instrument type: low- or high-resolution analyzer
Post-translational modifications
Databases
CONTINUOUSLY UPDATED
Examples:
http://www.roseindia.net/bioinformatics/biologicaldatabases.shtml
http://molbio.info.nih.gov/molbio/Index.htm
Swiss-Prot: Release 50.9 of 17-Oct-06
Release 2011_12 of 14-Dec-11 of UniProtKB/Swiss-Prot contains 533,657 sequence entries
Release 2013_01 of 09-Jan-13 of UniProtKB/Swiss-Prot contains 538,849 sequence entries, comprising 191,337,357 amino acids abstracted from 215,706 references
14-Dec-2011: Swiss-Prot contains 533,657 sequence entries
14-Dec-2011: TrEMBL contains 18,510,272 sequence entries
Release 2013_01 of 09-Jan-2013 of TrEMBL
Release 2013_01 of 09-Jan-13 of Swiss-Prot
Factors relevant to the utility of a database
1. Number of entries
2. Frequency of errors
3. Redundancy of the entries
4. Presence of ancillary information
5. Frequency at which the database is updated
Description Elimination
• gi|2947219|gb|AAC39645.1| UDP-galactose 4' epimerase [Homo sapiens]
• gi|1119217|gb|AAB86498.1| UDP-galactose-4-epimerase [Homo sapiens]
• gi|14277913|pdb|1HZJ|B Chain B, Human Udp-Galactose 4-Epimerase: Accommodation Of Udp-N- Acetylglucosamine Within The Active Site
• gi|14277912|pdb|1HZJ|A Chain A, Human Udp-Galactose 4-Epimerase: Accommodation Of Udp-N- Acetylglucosamine Within The Active Site
• gi|2494659|sp|Q14376|GALE_HUMAN UDP-glucose 4-epimerase (Galactowaldenase) (UDP-galactose 4-epimerase)
• gi|1585500|prf||2201313A UDP galactose 4'-epimerase
B- The comprehensive protein sequence databases, derived by translation of all of the entries in the NSDs
GenPept: translated from the GenBank database (NCBI)
Protein DataBank: translated from the DNA DataBank of Japan
TrEMBL: translated from the EMBL Nucleotide Sequence Database.
C- The curated protein sequence databases
NCBInr
UniProtKB (www.expasy.ch)
• Protein knowledgebase, consisting of two sections:
• Swiss-Prot, which is manually annotated and reviewed.
• TrEMBL, which is automatically annotated and is not reviewed.
Swiss-Prot
SWISS-PROT is a curated protein sequence database which strives to provide:
1. a high level of annotation (such as the description of the function of a protein, its domain structure, post-translational modifications, variants, etc.),
2. a minimal level of redundancy,
3. a high level of integration with other databases.
It was established in 1986 and has been maintained collaboratively, since 1987, by the Department of Medical Biochemistry of the University of Geneva and the EMBL Data Library (now the EMBL Outstation of The European Bioinformatics Institute - EBI).
Swiss-Prot
SwissProt is a high quality, curated protein database. On this server, the database has been expanded using the Swissknife VARSPLIC utility. This parses the annotation text and creates new entries for any splice variants, sequence variants, or sequence conflicts. Original entries have a standard Swiss-Prot accession string, such as P13813. New entries, created by varsplic, have accession numbers in the form P13813-00-00-01. The title line describes the nature of the differences between the new entry and the parent entry.
Swiss-Prot VarSplic Output
P13746-00-01-00 SSQPTIPIVGIIAGLVLLGAVITGAVVAAVMWRRKSS------DRKGGSYTQAASSDSAQ
P13746-01-01-00 SSQPTIPIVGIIAGLVLLGAVITGAVVAAVMWRRKSSGGEGVKDRKGGSYTQAASSDSAQ
P13746-00-00-00 SSQPTIPIVGIIAGLVLLGAVITGAVVAAVMWRRKSS------DRKGGSYTQAASSDSAQ
P13746-00-03-00 SSQPTIPIVGIIAGLVLLGAVITGAVVAAVMWRRKSS------DRKGGSYTQAASSDSAQ
P13746-01-03-00 SSQPTIPIVGIIAGLVLLGAVITGAVVAAVMWRRKSSGGEGVKDRKGGSYTQAASSDSAQ
P13746-00-04-00 SSQPTIPIVGIIAGLVLLGAVITGAVVAAVMWRRKSS------DRKGGSYTQAASSDSAQ
P13746-01-04-00 SSQPTIPIVGIIAGLVLLGAVITGAVVAAVMWRRKSSGGEGVKDRKGGSYTQAASSDSAQ
P13746-00-05-00 SSQPTIPIVGIIAGLVLLGAVITGAVVAAVMWRRKSS------DRKGGSYTQAASSDSAQ
P13746-01-05-00 SSQPTIPIVGIIAGLVLLGAVITGAVVAAVMWRRKSSGGEGVKDRKGGSYTQAASSDSAQ
P13746-01-00-00 SSQPTIPIVGIIAGLVLLGAVITGAVVAAVMWRRKSSGGEGVKDRKGGSYTQAASSDSAQ
P13746-00-02-00 SSQPTIPIVGIIAGLVLLGAVITGAVVAAVMWRRKSS------DRKGGSYSQAASSDSAQ
P13746-01-02-00 SSQPTIPIVGIIAGLVLLGAVITGAVVAAVMWRRKSSGGEGVKDRKGGSYSQAASSDSAQ
************************************* *******:*********
TrEMBL
EMBL: The EMBL Nucleotide Sequence Database is a comprehensive database of DNA and RNA sequences collected from the scientific literature and patent applications and directly submitted from researchers and sequencing groups. Data collection is done in collaboration with GenBank (USA) and the DNA Databank of Japan (DDBJ).
TrEMBL is a computer-annotated supplement of SWISS-PROT that contains all the translations of EMBL nucleotide sequence entries not yet integrated in SWISS-PROT. TrEMBL_New files are identical in format and contain very recent, unannotated sequences.
TrEMBL is developed by the SWISS-PROT groups at SIB and EBI.
Mixed databases
NCBInr
NCBI (National Center for Biotechnology Information) maintains composite, non-identical protein and nucleic acid databases for their search tools BLAST and Entrez.
The nr database is compiled by the NCBI as a protein database for BLAST searches. It contains non-identical sequences from GenBank CDS translations, PDB, Swiss-Prot, PIR, and PRF. One of the main advantages of nr is that it is updated very frequently. NCBI has made strong efforts to cross-reference the sequences in these databases in order to avoid duplication.
IPI
IPI (International Protein Index) is compiled by the EBI (European Bioinformatics Institute) to provide a top-level guide to the main databases that describe the human and mouse proteomes: SWISS-PROT, TrEMBL, NCBI RefSeq and Ensembl.
The aim is to:
1. effectively maintain a database of cross references between the primary data sources
2. provide a minimally redundant yet maximally complete set of proteins (one sequence per transcript)
3. maintain stable identifiers (with incremental versioning) to allow the tracking of sequences between IPI releases.
IPI is updated monthly in accordance with the latest data released by the primary data sources. There are currently two IPI databases, Human and Mouse.
dbEST
The EST database represents a final type of sequence database:
- dbEST is composed of a large number of entries
- each entry is a short piece of nucleotide sequence, typically about 300 bases in length.
- this type of nucleotide sequence is produced by highly automated sequencing of randomly selected portions of the expressed DNA of a given tissue.
- the advantage of this approach to genomic sequencing is that a large amount of sequence data is produced at a relatively low cost.
dbEST
This is a nucleic acid database which is translated by Mascot in all six reading frames. This generates a very large database, so that dbEST searches take far longer than a search of one of the non-redundant protein databases.
You should only search dbEST if a search of a protein database has failed to find a match.
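To see why a six-frame translation inflates the search space, here is a minimal sketch of the operation itself. It is an illustration using the standard genetic code, not Mascot's implementation; the function names are hypothetical, stop codons are written as `*`, and any codon containing a non-ACGT character is written as `X`.

```python
from itertools import product

# Standard genetic code, codons enumerated in TCAG order.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def translate(dna):
    """Translate complete codons starting at position 0."""
    codons = (dna[i:i + 3] for i in range(0, len(dna) - 2, 3))
    return "".join(CODON_TABLE.get(codon, "X") for codon in codons)


def six_frame_translation(dna):
    """Three forward frames followed by three reverse-complement frames."""
    dna = dna.upper()
    reverse = dna.translate(COMPLEMENT)[::-1]
    return [translate(strand[offset:])
            for strand in (dna, reverse)
            for offset in range(3)]


if __name__ == "__main__":
    print(six_frame_translation("ATGGCC"))
```

Every nucleotide entry yields six candidate protein sequences, most of them meaningless, which is consistent with the slide's advice to try a protein database first.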
Decoy database
For large scale experiments:
"provide the results of any additional statistical analyses that indicate or establish a measure of identification certainty, or allow a determination of the false-positive rate, e.g., the results of randomized database searches or other computational approaches."
This is a recommendation to repeat the search, using identical search parameters, against a database in which the sequences have been reversed or randomised.
You do not expect to get any true matches from the "decoy" database. So, the number of matches that are found is an excellent estimate of the number of false positives that are present in the results from the real or "target" database.
Elias, J. E., et al., Nature Methods 2 667-675 (2005).
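The decoy idea can be sketched in a few lines. This is only an illustration under stated assumptions: the decoy database is built by simple sequence reversal, the target and decoy searches are run separately with the same parameters, and the false discovery rate is estimated as decoy hits divided by target hits above a score threshold. The `REV_` accession prefix and the function names are hypothetical.

```python
def make_decoy_database(sequences):
    """Return a reversed-sequence decoy for each target entry.

    sequences maps accession -> protein sequence; 'REV_' is an
    arbitrary prefix chosen here to mark decoy entries.
    """
    return {f"REV_{acc}": seq[::-1] for acc, seq in sequences.items()}


def estimate_fdr(target_scores, decoy_scores, threshold):
    """Estimate the false discovery rate at a score threshold.

    Matches in the decoy search are assumed to be false by construction,
    so their count approximates the false positives in the target search.
    """
    target_hits = sum(score >= threshold for score in target_scores)
    decoy_hits = sum(score >= threshold for score in decoy_scores)
    return decoy_hits / target_hits if target_hits else 0.0


if __name__ == "__main__":
    print(make_decoy_database({"P1": "MKTAYIAK"}))
    print(estimate_fdr([80, 70, 66, 40], [67, 30, 20], threshold=65))
```

Raising the threshold until the estimate falls below a chosen level (for example 1%) is a common way to use this number.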
Database search
MASCOT: http://www.matrixscience.com
PROTEIN PROSPECTOR: http://prospector.ucsf.edu/
PEPTIDE SEARCH: http://www.mann.embl-heidelberg.de/
MOWSE: http://www.hgmp.mrc.ac.uk/Bioinformatics/
ProFound: http://prowl.rockefeller.edu/cgi-bin/ProFound
SEQUEST: http://thompson.mbt.washington.edu/sequest/
Proteome: strategy 2
Monoisotopic mass = the mass of the molecular ion calculated using the exact mass of the most abundant isotope of each element (e.g. 1H = 1.007825, 12C = 12.000000)
[Figure: isotopic peak cluster at m/z 948.5, 949.5, 950.5, 951.5]
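A small worked example of the definition above: the monoisotopic mass is summed from the exact masses of the most abundant isotopes, whereas the average mass uses standard atomic weights. The sketch also infers the charge state from the spacing of isotopic peaks, since neighbouring peaks differ by about 1.00335 Da divided by the charge. The element tables are limited to C, H, N, O, S and the function names are hypothetical.

```python
# Exact masses of the most abundant isotopes (Da).
MONOISOTOPIC = {"C": 12.000000, "H": 1.007825, "N": 14.003074,
                "O": 15.994915, "S": 31.972071}
# Standard (isotope-averaged) atomic weights, rounded.
AVERAGE = {"C": 12.011, "H": 1.008, "N": 14.007, "O": 15.999, "S": 32.06}
C13_C12_DIFF = 1.003355  # mass difference between 13C and 12C (Da)


def formula_mass(composition, table):
    """Sum element masses for a composition such as {'C': 2, 'H': 5}."""
    return sum(table[element] * count for element, count in composition.items())


def charge_from_isotope_spacing(mz_first, mz_second):
    """Infer the charge state from two adjacent isotopic peaks."""
    return round(C13_C12_DIFF / abs(mz_second - mz_first))


if __name__ == "__main__":
    glycine = {"C": 2, "H": 5, "N": 1, "O": 2}
    print("monoisotopic:", round(formula_mass(glycine, MONOISOTOPIC), 5))
    print("average:     ", round(formula_mass(glycine, AVERAGE), 3))
    print("charge:", charge_from_isotope_spacing(948.5, 949.5))
```

Peaks spaced 1.0 m/z apart, as in the 948.5 to 951.5 cluster, indicate a singly charged ion; the first peak of the cluster is the monoisotopic one.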
Adenylate kinase (Mr 23634), tryptic digest ==> 17 peptides ==> MALDI-TOF ==> Database search
MASS TOLERANCE in ppm No of peptide matched
1000 700 400 200 100 75 50 30
5 429 136 51 39 29 19 3 1 6 163 54 9 10 7 7 82 16 6 6 8 36 2 9 9 1 10 8 1 1 1
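The effect of the mass tolerance can be made concrete with a short sketch. The error in ppm is (observed − theoretical) / theoretical × 10⁶, and a peak counts as matched when the absolute error is within the tolerance. The peak and mass values in the example are invented for illustration, and the function names are hypothetical.

```python
def ppm_error(observed, theoretical):
    """Signed mass error in parts per million."""
    return (observed - theoretical) / theoretical * 1e6


def count_matches(observed_peaks, theoretical_masses, tolerance_ppm):
    """Count observed peaks that match any theoretical mass within tolerance."""
    return sum(
        any(abs(ppm_error(obs, theo)) <= tolerance_ppm for theo in theoretical_masses)
        for obs in observed_peaks
    )


if __name__ == "__main__":
    observed = [1000.05, 1500.30]      # illustrative values only
    theoretical = [1000.00, 1500.00]   # illustrative values only
    for tol in (1000, 100, 30):
        print(tol, "ppm:", count_matches(observed, theoretical, tol))
```

A wide tolerance lets many unrelated database peptides match by chance, which is why the number of candidate proteins drops sharply as the tolerance is tightened.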
MASCOT
Fingerprint Search Results
[Figure: MALDI-TOF mass spectrum, intensity (a.i.) vs. m/z, range 1300-3800]
Search result
Code and name of the top-scoring protein
Identification details
Protein list
New search with the unused data
IDENTIFICATION PARAMETERS
• Score > 65 (www.matrixscience.com)
• MS match for at least 4-5 peptides
• Mass error below 150 ppm (external calibration)
• Mass error below 50 ppm (internal calibration)
• Sequence coverage: at least 20%
• Mr and pI should match the estimates or published values
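As a rough sketch, the numeric criteria in this list can be expressed as a filter, together with a sequence-coverage calculation. The thresholds are the ones quoted on the slide; reading "4-5 peptides" as "at least 4" is an assumption, the Mr/pI check is left out because it needs reference values, and the function names are hypothetical.

```python
def sequence_coverage(protein, matched_peptides):
    """Percentage of protein residues covered by the matched peptides."""
    if not protein:
        return 0.0
    covered = [False] * len(protein)
    for peptide in matched_peptides:
        start = protein.find(peptide)
        while start != -1:                      # mark every occurrence
            for i in range(start, start + len(peptide)):
                covered[i] = True
            start = protein.find(peptide, start + 1)
    return 100.0 * sum(covered) / len(protein)


def passes_criteria(score, n_peptides, max_ppm_error, coverage_pct,
                    internal_calibration=False):
    """Apply the slide's thresholds (score > 65, >= 4 peptides, etc.)."""
    ppm_limit = 50 if internal_calibration else 150
    return (score > 65
            and n_peptides >= 4
            and max_ppm_error < ppm_limit
            and coverage_pct >= 20)


if __name__ == "__main__":
    cov = sequence_coverage("MKTAYIAKQR", ["MK", "AYIAK"])
    print(cov, passes_criteria(80, 5, 100, cov))
```

The same hit can pass with external calibration and fail with internal calibration, because the tighter 50 ppm limit then applies.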
Ion trap: MS/MS scan mode
1. Inject
2. Isolate
3. Fragment
4. Detect
Trypsin: Lys, Arg
Product Ion Spectrum from T7 of Human Growth Hormone
[Figure: product ion spectrum in three panels (m/z 100-500, 600-900, 1000-1800), relative intensity (%) vs. m/z]
Peptide T7: Ile-Ser-Leu-Leu-Leu-Ile-Gln-Ser-Trp-Leu-Glu-Pro-Val-Gln-Phe-Leu-Arg
B ions: B1 114, B2 201, B3 314, B4 427, B5 540, B6 653, B7 781, B8 868, B9 1054, B10 1168, B11 1297, B12 1394, B13 1493, B14 1621, B15 1768, B16 1881
Y ions: Y16 1942, Y15 1855, Y14 1742, Y13 1629, Y12 1516, Y11 1403, Y10 1274, Y9 1187, Y8 1001, Y7 888, Y6 759, Y5 662, Y4 563, Y3 435, Y2 288, Y1 175
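The B and Y values listed above for the T7 peptide of human growth hormone can be reproduced from residue masses. This is a minimal sketch assuming singly charged fragments and monoisotopic masses: each b ion is the sum of the N-terminal residues plus a proton, and each y ion is the sum of the C-terminal residues plus water plus a proton. The function name is hypothetical, and small differences from the slide's integer labels are expected.

```python
# Monoisotopic residue masses (Da) of the 20 standard amino acids.
RESIDUE_MASS = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276,
    "V": 99.06841, "T": 101.04768, "C": 103.00919, "L": 113.08406,
    "I": 113.08406, "N": 114.04293, "D": 115.02694, "Q": 128.05858,
    "K": 128.09496, "E": 129.04259, "M": 131.04049, "H": 137.05891,
    "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}
WATER = 18.01056
PROTON = 1.00728


def fragment_ions(peptide):
    """Return (b_ions, y_ions) as lists of singly charged m/z values.

    b_ions[0] is b1, y_ions[0] is y1; both lists have len(peptide) - 1 items.
    """
    masses = [RESIDUE_MASS[aa] for aa in peptide]
    b_ions, y_ions = [], []
    running = 0.0
    for mass in masses[:-1]:            # N-terminal fragments
        running += mass
        b_ions.append(running + PROTON)
    running = 0.0
    for mass in reversed(masses[1:]):   # C-terminal fragments
        running += mass
        y_ions.append(running + WATER + PROTON)
    return b_ions, y_ions


if __name__ == "__main__":
    b, y = fragment_ions("ISLLLIQSWLEPVQFLR")
    for n, (b_mz, y_mz) in enumerate(zip(b, y), start=1):
        print(f"b{n} {b_mz:9.3f}   y{n} {y_mz:9.3f}")
```

Because Leu and Ile have identical masses, the fragment series alone cannot distinguish them, which is why sequence tags are often written with I/L.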
Protein ID by Sequence Tags: 1 Tag uses 5 components
– peptide molecular weight
– partial sequence (region 2)
– molecular weight before partial sequence (region 1)
– molecular weight after partial sequence (region 3)
Peptide measured molecular wt = 1927.2
Partial sequence: -A-V-I/L-T-
1108.13 | Partial Sequence -A-V-I/L-T- | 1546.11 Da | 381.1
region 1 | region 2 | region 3
Mass accuracy in database searching (2 AA sequence tag)
Number of hits at each mass tolerance:
Peptide mass    1000 ppm    100 ppm    10 ppm    5 ppm
1488.754        202         29         1 hit
1237.661        39          9          1 hit
1925.837        738         15         1 hit
1981.035        412         38         2         1 hit
1171.591        573         71         5 hits
1213.207        314         65         1 hit
Three ways to use mass spectrometry data for protein ID:
1. Peptide Mass Fingerprint (MALDI-TOF)
A set of peptide molecular weights from an enzyme digest of a protein
2. Sequence Query
Mass values combined with amino acid sequence or composition data
3. MS/MS Ions Search (HPLC-ESI-MS/MS)
MS/MS data from a single peptide or from a complete LC-MS/MS run: complete or partial amino acid sequence
Database search
Proteomics: clinical studies.
- Identification of disease-specific proteins for dilated cardiomyopathy. Electrophoresis 1999
- Anti-endothelial cell antibodies as a potential predictive test for chronic heart transplantation rejection. Hum. Immunol. 1999
- Identification of several disease-specific proteins for cell carcinoma of bladder. Cancer Res. 1999
- Potential marker for prostate and ovarian cancer. Mol. Med. Today 1999
- Identification of 18 proteins with abnormal expression in schizophrenics. Mol. Psychiatry 2000
- Defining urinary proteome… Proteomics 2001
- Proteome of human cerebrospinal fluid. Proteomics 2001
- Clinical proteomics for cancer biomarker discovery and therapeutic targeting. Technol. Cancer Res Treat. 2002
- The human plasma proteome: history, character, and diagnostic prospects. Mol Cell Proteomics. 2003
Healthy population Breast cancer population
PROTEOMICS IN CLINICAL LABORATORY
BIOCHEMICAL AND CLINICAL APPLICATIONS: SELDI and ClinProt techniques
Fulvio Magni
Department of Experimental Medicine (DIMS)
Università degli Studi di Milano-Bicocca
ATB 2003
BIOMARKERS
Most diseases originate from changes in protein metabolism, so proteins that act as markers of the disease can be identified
Most diseases are polygenic, so a single antigen is insufficient for reliable detection of the disease (CA 125, PSA)
Changes in the proteome of an organ can give rise to a characteristic protein pattern in biological fluids
There is early evidence that multiple markers can be identified, consisting of a set of proteins over- or under-expressed in diseased subjects compared with healthy subjects
MASS SPECTROMETRY BASED PROTEOMICS: CURRENT STATUS AND POTENTIAL USE IN CLINICAL CHEMISTRY
P-A Binz, DF Hochstrasser and RD Appel, Clin. Chem. Lab. Med. 2003, 41, 1540
Classical proteomics
Molecular scanner
Multidimensional identification (MudPIT)
ICAT labelling
SELDI - ClinProt
STRATEGY
Build the protein profiles of a "normal" sample and of a "pathological" one and find the differences
Provide an "image" that is simple to interpret (gel view)
Distinguish the normal protein profile from an altered one using dedicated algorithms
Identify the differentially expressed proteins with tandem MS
PROTEIN-CHIP / SELDI
Ciphergen Biosystems, Inc., California
BASED ON THE COMBINATION OF TWO TECHNIQUES:
chromatographic separation with different phases
mass spectrometry
USES:
"ProteinChip Array" strips, which separate groups of proteins with similar characteristics
the "ProteinChip Reader", which uses SELDI-TOF mass spectrometry
PROTEIN-CHIP ARRAY
CHEMICAL INTERACTION: chemical surfaces (ionic, hydrophobic, hydrophilic, etc.) that retain classes of proteins
Reverse Phase, Cation Exchange, Anion Exchange, Metal Ions, Normal Phase
BIOLOGICAL INTERACTION: biochemical surfaces (antibodies, receptors, DNA, etc.) that retain a single protein
PS-1 or PS-2, Antibody-Antigen, Receptor-Ligand, DNA-Protein
Presented by B. Reed, Ciphergen Biosystems. Meeting/Conference: Swiss Proteomics, 2001
http://www.ciphergen.com/pub/showPubInfo.asp?id=117
Search for protein markers
CHOICE OF THE PROTEIN-CHIP
ION-EXCHANGE PROTEIN-CHIP
Washing with different buffers
Analysis with SELDI (molecular mass / charge)
Different protein profiles
http://www.ciphergen.com/techapps/pc/tech/arrays.asp
CLINPROT™ MAGNETIC BEADS
Workflow: Bind, Wash, Elute, Profile
MIXTURE OF PEPTIDES OR PROTEINS
SPECIFIC BINDING
MAGNETIC SEPARATION
ELUTION AND TARGET PREPARATION
MALDI-TOF MS
CLINPROT™
Bruker Daltonics
ClinProt magnetic beads
Automation
AnchorChip™ target
autoflex MALDI-TOF, ultraflex TOF/TOF
Protein profiles
ClinProTools: data analysis, clustering and classification
Cluster analysis: Disease vs. Normal (diseased vs. healthy)
RESULTS WITH PROTEIN-CHIP
Bruker Daltonics
Green: diseased; Red: healthy
OVEREXPRESSED PROTEINS
Box-and-whisker plots: controls vs. patients
UNDEREXPRESSED PROTEINS
Box-and-whisker plots: controls vs. patients
BREAST CANCER RESULTS
Jinong Li et al., Clinical Chemistry, 48, 1296-1304 (2002)
[Figure: spectra and gel views of samples 1-3 from DISEASED and 1-3 from HEALTHY subjects]
OVARIAN CANCER: IDENTIFICATION OF MARKERS
(Petricoin, E.F. et al., The Lancet, 359, 572-577 (2002))
Phase I: pattern discovery
- Samples from unaffected subjects and samples from cancer patients
- Generate protein mass spectra (15200 m/z values)
- Genetic algorithm + self-organising cluster analysis
- Discriminatory pattern: plot of relative abundance of 5-20 key proteins (m/z values) that best distinguish cancer from non-cancer
Phase II: pattern matching
- Obtain mass spectrum from masked serum test sample
- Generate signature pattern from test sample: plot relative abundance of 5-20 specific key discriminatory proteins identified in phase I
- Pattern matching: compare unknown test sample signature pattern for likeness to previously found discriminatory pattern
- Outcome: Unaffected, Cancer, or New cluster (no match)
VALIDATION OF THE BIOMARKER SET BY CLASSIFICATION OF SERA ANALYSED BLIND
                                     CANCER    NO CANCER    NEW CLUSTER
WOMEN WITHOUT OVARIAN CANCER
NO OVARIAN CYST                      2/24      22/24        0/24
BENIGN OVARIAN CYST <2cm             1/19      18/19        0/19
BENIGN OVARIAN CYST >2cm             0/6       6/6          0/6
BENIGN GYNAECOLOGICAL PATHOLOGY      0/10      1/10         9/10
NO GYNAECOLOGICAL DISORDER           0/7       0/7          7/7
WOMEN WITH OVARIAN CANCER
STAGE I                              18/18     0/18         0/18
STAGE II, III, IV                    32/32     0/32         0/32
The problem of statistical analysis:
Step 1: Discovery
Training data set => pattern discovery (Profile 1 vs. Profile 2; Disease vs. Normal)
Use biomarker pattern for step 2.
Step 2: Evaluation
Test data set => cluster analysis (Profile 1 vs. Profile 2; Disease vs. Normal)
Determination of:
• Sensitivity
• Specificity
• Positive predictive value
• Negative predictive value
Step 3: Class prediction
Unknown data set => cluster analysis (Profile 1 vs. Profile 2; Disease vs. Normal)
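The four quantities determined in the evaluation step follow directly from the counts of true and false positives and negatives. The sketch below uses illustrative counts (50 true positives, 0 false negatives, 3 false positives, 63 true negatives); they loosely follow the blinded ovarian-cancer classification reported earlier in the deck, under the assumption that a "new cluster" result is counted as not cancer. The function name is hypothetical.

```python
def diagnostic_metrics(tp, fp, tn, fn):
    """Return sensitivity, specificity, PPV and NPV as fractions."""
    return {
        "sensitivity": tp / (tp + fn),   # diseased subjects correctly flagged
        "specificity": tn / (tn + fp),   # healthy subjects correctly cleared
        "ppv": tp / (tp + fp),           # positive calls that are correct
        "npv": tn / (tn + fn),           # negative calls that are correct
    }


if __name__ == "__main__":
    # Illustrative counts only; see the assumptions stated above.
    for name, value in diagnostic_metrics(tp=50, fp=3, tn=63, fn=0).items():
        print(f"{name}: {value:.1%}")
```

Predictive values depend on how common the disease is in the tested group, so figures obtained on a balanced test set do not transfer directly to population screening.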