
F.Magni - Siena 2014

Proteomics and Mass Spectrometry: Biochemical and Clinical Applications

Fulvio Magni

Department of Experimental Medicine

Faculty of Medicine and Surgery

University of Milano-Bicocca


Genome → 1 Gene → Single Protein → Disease

Is it true that all the information is in the genome?


Differences in the genome ≈1%

Differences in the proteome >1%


Identical genome, different proteome


Only 2% of human disease results from a single gene defect

Alternative splicing: DNA → mRNA → Protein (5,000-10,000 activated genes give 15,000-30,000 proteins)

Final form of proteins (3D and function) cannot be predicted with certainty from the linear codes of genes

Most proteins are modified after they are synthesised

Why do we analyse proteins ?

Proteins are the molecules of the correct cellular function

Single gene defect => Which protein is altered?


Genome Proteome

Proteome indicates the PROTEins expressed by a genOME or tissue

PROTEOME

Proteomics is the large-scale study of gene expression at the protein level.

PROTEOMICS


Why do we analyse proteins?

Alternative splicing: DNA → mRNA (5,000-10,000 activated genes) → Protein (15,000-30,000 proteins) → PTMs


Proteome, unlike genome, is not a fixed feature of an organ

A single genome can give rise to an essentially infinite number of qualitatively and quantitatively different proteomes depending on: …

The simultaneous study of the whole range of proteins expressed in a cell at any given time

PROTEOME

Aim

Which components of the proteomic profile:

- are relevant for human disease: Diagnosis

- are excellent therapeutic targets: Therapy


Expression Proteomics

1st: proteins from the normal sample and the disease sample are separated and visualised by 2D-gel electrophoresis.

2nd: gel images of the normal and disease samples are compared with dedicated software.

Not present in normal sample: Prostate cancer

Low levels in normal sample: Alzheimer

High levels in normal sample: Parkinson


Proteomics has now branched into two specific disciplines:

Expression proteomics (classical):

- qualitative: display on 2D-PAGE or an alternative technique, identification by mass spectrometry

- quantitative: evaluation of differential expression

Functional proteomics: localization or identification studies of proteins with specific biological activities, and interaction studies.

General strategy for protein study:

Step                                   Methodology
1. Purification / isolation            2D-PAGE
2. Identification / characterization   Mass spectrometry
3. Database searching                  Bioinformatics


Expression Proteomics: 3rd, proteins that differ in abundance between the gels are identified by MS

2D-PAGE → protein band → cut out

1. Enzymatic digestion (e.g. trypsin)

2. Peptide extraction

3. Peptide mass fingerprint by MS analysis: evaluation of the Mr of each tryptic peptide

4. Database search: identification of the protein by database searching


ANALYTICAL PART: trypsin digestion protocol

1. Excision of the protein bands (spots) from the gel

   Gel washing: to remove stain and SDS

2. Reduction and alkylation

   Addition of dithiothreitol (DTT): to reduce S-S bonds

   Addition of iodoacetamide: to alkylate S-H groups

   Gel washing: to remove excess reagents

3. In-gel digestion

   Addition of trypsin

   Incubation at 37°C overnight

4. Extraction


Proteomics: MALDI-TOF? Peptide mass fingerprint:

A set of peptide molecular weights from an enzymatic digest of a protein is measured by mass spectrometry.
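The peptide mass fingerprint idea can be sketched in a few lines of Python. This is only an illustrative sketch under stated assumptions: cleavage after Lys/Arg but not before Pro (a common simplification), no missed cleavages, no modifications (for example, the alkylation of Cys by iodoacetamide is ignored), and a made-up toy sequence. The function names `tryptic_digest` and `mh_plus` are hypothetical helpers, not part of any search engine.

```python
import re

# Monoisotopic residue masses (Da) for the 20 standard amino acids.
RESIDUE_MASS = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276,
    "V": 99.06841, "T": 101.04768, "C": 103.00919, "L": 113.08406,
    "I": 113.08406, "N": 114.04293, "D": 115.02694, "Q": 128.05858,
    "K": 128.09496, "E": 129.04259, "M": 131.04049, "H": 137.05891,
    "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}
WATER = 18.01056   # H2O added to the residue sum of a free peptide
PROTON = 1.00728   # charge carrier for the singly protonated ion


def tryptic_digest(sequence):
    """Cleave after K or R, except when the next residue is P (assumed rule)."""
    return [p for p in re.split(r"(?<=[KR])(?!P)", sequence) if p]


def mh_plus(peptide):
    """Monoisotopic m/z of the singly protonated peptide [M+H]+."""
    return sum(RESIDUE_MASS[aa] for aa in peptide) + WATER + PROTON


if __name__ == "__main__":
    toy_protein = "MKWVTFISLLLLFSSAYSRGVFRR"  # hypothetical sequence
    for pep in tryptic_digest(toy_protein):
        print(f"{pep:20s} {mh_plus(pep):10.4f}")
```

The list of [M+H]+ values printed for a real sequence is what a search engine would compare with the measured peak list.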

MALDI-TOF:

Analyze high masses >100kDa

Measure entire mass range

Compatible with many buffers

Applicable to variety of compound types

High sensitivity (low fmole)

Molecular weight and structure info (PSD, TOF/TOF)

EASY TO DO



MALDI-TOF: How Does It Work?

1. Sample is mixed with matrix and dried on the sample plate.

2. Target is introduced into the high vacuum of the MS.

3. Sample spot is irradiated with the laser, desorbing ions into the gas phase and starting the clock that measures the time of flight.

4. Ions are accelerated by an electrical field to the same kinetic energy, and they drift (or fly) down a field-free flight tube where they are separated in space.

5. Ions strike the detector at different times, depending on the mass-to-charge ratio of the ion.

6. A data system controls all instrument parameters, acquires the signal vs. time, and permits data processing.

[Figure labels: pulsed laser, high voltage (20 - 30 kV), high vacuum, flight tube, time]
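Steps 4 and 5 follow from energy conservation: an ion of charge z accelerated through a potential U gains kinetic energy zeU = ½mv², so the drift time over a tube of length L is t = L·sqrt(m / (2zeU)). The sketch below evaluates this for an idealised linear instrument. The 1.0 m tube length is a hypothetical value, 20 kV is taken from the lower end of the 20 - 30 kV range on the slide, and effects such as delayed extraction, reflectrons and initial ion velocity are ignored.

```python
import math

E_CHARGE = 1.602176634e-19   # elementary charge, C
DALTON = 1.66053906660e-27   # unified atomic mass unit, kg


def flight_time_us(mz, voltage=20_000.0, tube_length=1.0):
    """Idealised drift time in microseconds for an ion of given m/z.

    mz          -- mass-to-charge ratio in Da per elementary charge
    voltage     -- accelerating potential in volts (assumed 20 kV)
    tube_length -- field-free drift length in metres (hypothetical 1.0 m)
    """
    # z*e*U = 1/2 * m * v^2  ->  v = sqrt(2*e*U / (m/z))
    velocity = math.sqrt(2.0 * E_CHARGE * voltage / (mz * DALTON))
    return tube_length / velocity * 1e6


if __name__ == "__main__":
    for mz in (824.478, 1293.649, 2528.241):  # m/z values from the peak list
        print(f"m/z {mz:9.3f} -> {flight_time_us(mz):6.2f} us")
```

Because the time scales with the square root of m/z, an ion four times heavier takes twice as long to reach the detector.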


Trypsin: cleaves at Lys, Arg

[Figure: protein with S-S bridge digested by trypsin; resulting MALDI-TOF peak list]

m/z        Intens.
824.478    2122.39
842.504    1293.31
940.328    26750.83
947.487    1837.60
1293.649   1323.95
1376.528   877.15
1448.721   2550.85
1544.626   1001.62
1790.921   580.69
1877.937   418.26
1907.895   742.16
2211.220   630.93
2225.172   465.53
2355.091   835.77
2528.241   3874.78


Peptide Mass Fingerprint (PMF): protein identification by database search


Protein identification

How can one arrive at the CERTAIN identity of a protein?

1- Experimentally determine ALL the information about the protein(s): COMPLETE APPROACH

2- Experimentally determine PART of the information about the protein(s) and use it to derive the missing information: APPROXIMATION APPROACH


Protein identification

Experimental information (complete or partial) → Identity

Archived information: databases

Search programs: bioinformatics



1 - Databases

2 - Protein identification and characterization: analytical methods

3 - Protein identification and characterization: software

Molecular & Cellular Proteomics 2009 Vol 8 : 2827 - 2842



Bell, A. W., Deutsch, E. W., Au, C. E., et al., Nature Methods 6, 423-430 (1 June 2009)

What lesson do we draw from these articles?

1. It is important to separate the proteins.

2. It is important to obtain good or excellent mass spectrometry (MS) data.

3. All the effort spent on points 1 and 2 IS USELESS if we cannot use the data correctly for lack of knowledge of:

3.1 The identification algorithms

3.2 The databases

3.3 All the possibilities offered by MS and its advances

Problems:

Data-processing algorithms

Databases

Type of instrument: low- or high-resolution analyzer

Post-translational modifications



Databases

CONTINUOUSLY UPDATED

Examples:

http://www.roseindia.net/bioinformatics/biologicaldatabases.shtml

http://molbio.info.nih.gov/molbio/Index.htm



Swiss-Prot: Release 50.9 of 17-Oct-06

Release 2011_12 of 14-Dec-11 of UniProtKB/Swiss-Prot contains 533657 sequence entries.

Release 2013_01 of 09-Jan-13 of UniProtKB/Swiss-Prot contains 538849 sequence entries, comprising 191337357 amino acids abstracted from 215706 references.

14-Dec-2011: Swiss-Prot contains 533,657 sequence entries; TrEMBL contains 18,510,272 sequence entries.

[Figure captions: Release 2013_01 of 09-Jan-2013 of TrEMBL; Release 2013_01 of 09-Jan-13 of Swiss-Prot]


Factors relevant to the utility of a database

1. Number of entries

2. Frequency of errors

3. Redundancy of the entries

4. Presence of ancillary information

5. Frequency at which the database is updated

Description Elimination

• gi|2947219|gb|AAC39645.1| UDP-galactose 4' epimerase [Homo sapiens]

• gi|1119217|gb|AAB86498.1| UDP-galactose-4-epimerase [Homo sapiens]

• gi|14277913|pdb|1HZJ|B Chain B, Human Udp-Galactose 4-Epimerase: Accommodation Of Udp-N-Acetylglucosamine Within The Active Site

• gi|14277912|pdb|1HZJ|A Chain A, Human Udp-Galactose 4-Epimerase: Accommodation Of Udp-N-Acetylglucosamine Within The Active Site

• gi|2494659|sp|Q14376|GALE_HUMAN UDP-glucose 4-epimerase (Galactowaldenase) (UDP-galactose 4-epimerase)

• gi|1585500|prf||2201313A UDP galactose 4'-epimerase



B- The comprehensive protein sequence databases, derived by translation of all of the entries in the nucleotide sequence databases (NSDs)

GenPept: translated from the GenBank database (NCBI)

Protein DataBank: translated from the DNA DataBank of Japan

TrEMBL: translated from the EMBL Nucleotide Sequence Database


C- The curated protein sequence databases

NCBInr

UniProtKB (www.expasy.ch)

Protein knowledgebase, consisting of two sections:

• Swiss-Prot, which is manually annotated and reviewed.

• TrEMBL, which is automatically annotated and is not reviewed.


Swiss-Prot

SWISS-PROT is a curated protein sequence database which strives to provide:

1. a high level of annotation (such as the description of the function of a protein, its domain structure, post-translational modifications, variants, etc.),

2. a minimal level of redundancy,

3. a high level of integration with other databases.

It was established in 1986 and has been maintained collaboratively, since 1987, by the Department of Medical Biochemistry of the University of Geneva and the EMBL Data Library (now the EMBL Outstation of The European Bioinformatics Institute - EBI).


Swiss-Prot

SwissProt is a high quality, curated protein database. On this server, the database has been expanded using the Swissknife VARSPLIC utility. This parses the annotation text and creates new entries for any splice variants, sequence variants, or sequence conflicts. Original entries have a standard Swiss-Prot accession string, such as P13813. New entries, created by varsplic, have accession numbers in the form P13813-00-00-01. The title line describes the nature of the differences between the new entry and the parent entry.

Swiss-Prot VarSplic Output

P13746-00-01-00 SSQPTIPIVGIIAGLVLLGAVITGAVVAAVMWRRKSS------DRKGGSYTQAASSDSAQ
P13746-01-01-00 SSQPTIPIVGIIAGLVLLGAVITGAVVAAVMWRRKSSGGEGVKDRKGGSYTQAASSDSAQ
P13746-00-00-00 SSQPTIPIVGIIAGLVLLGAVITGAVVAAVMWRRKSS------DRKGGSYTQAASSDSAQ
P13746-00-02-00 SSQPTIPIVGIIAGLVLLGAVITGAVVAAVMWRRKSS------DRKGGSYSQAASSDSAQ
P13746-01-02-00 SSQPTIPIVGIIAGLVLLGAVITGAVVAAVMWRRKSSGGEGVKDRKGGSYSQAASSDSAQ
                *************************************      *******:*********




TrEMBL

EMBL: The EMBL Nucleotide Sequence Database is a comprehensive database of DNA and RNA sequences collected from the scientific literature and patent applications and directly submitted from researchers and sequencing groups. Data collection is done in collaboration with GenBank (USA) and the DNA Databank of Japan (DDBJ).

TrEMBL is a computer-annotated supplement of SWISS-PROT that contains all the translations of EMBL nucleotide sequence entries not yet integrated in SWISS-PROT. TrEMBL_New files are identical in format and contain very recent, unannotated sequences. TrEMBL is developed by the SWISS-PROT groups at SIB and EBI.


NCBInr

NCBI (National Center for Biotechnology Information) maintains composite, non-identical protein and nucleic acid databases for their search tools BLAST and Entrez.

The nr database is compiled by the NCBI as a protein database for BLAST searches. It contains non-identical sequences from GenBank CDS translations, PDB, Swiss-Prot, PIR, and PRF. One of the main advantages of nr is that it is updated very frequently. NCBI has made strong efforts to cross-reference the sequences in these databases in order to avoid duplication.

Databases



Mixed databases


IPI

IPI (International Protein Index) is compiled by the EBI (European Bioinformatics Institute) to provide a top-level guide to the main databases that describe the human and mouse proteomes: SWISS-PROT, TrEMBL, NCBI RefSeq and Ensembl. The aim is to:

1. effectively maintain a database of cross references between the primary data sources

2. provide a minimally redundant yet maximally complete set of proteins (one sequence per transcript)

3. maintain stable identifiers (with incremental versioning) to allow the tracking of sequences in IPI between IPI releases.

IPI is updated monthly in accordance with the latest data released by the primary data sources. There are currently two IPI databases, Human and Mouse.


dbEST

The EST database represents a final type of sequence database:

- dbEST is composed of a large number of entries

- each entry is a short piece of nucleotide sequence, typically about 300 bases in length.

- this type of nucleotide sequence is produced by highly automated sequencing of randomly selected portions of the expressed DNA (cDNA) of a given tissue.

- the advantage of this approach over genomic sequencing is that a large amount of sequence data is produced at a relatively low cost.


dbEST

This is a nucleic acid database which is translated by Mascot in all six reading frames. This generates a very large database, so that dbEST searches take far longer than a search of one of the non-redundant protein databases.

You should only search dbEST if a search of a protein database has failed to find a match.
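To show why a six-frame translation inflates the search space, the sketch below translates a nucleotide sequence in the three forward frames and the three frames of the reverse complement. It is a generic illustration using the standard genetic code, not Mascot's own implementation; the input sequence is a made-up example and unknown codons are written as "X" by assumption.

```python
from itertools import product

# Standard genetic code, codons enumerated in TCAG order.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def translate(dna):
    """Translate one reading frame; '*' marks stop codons, 'X' unknown codons."""
    codons = (dna[i:i + 3] for i in range(0, len(dna) - 2, 3))
    return "".join(CODON_TABLE.get(codon, "X") for codon in codons)


def six_frames(dna):
    """Return translations of frames +1, +2, +3, then -1, -2, -3."""
    dna = dna.upper()
    reverse_complement = dna.translate(COMPLEMENT)[::-1]
    return [translate(strand[offset:])
            for strand in (dna, reverse_complement)
            for offset in range(3)]


if __name__ == "__main__":
    for frame in six_frames("ATGGCCAAGTAA"):  # hypothetical EST fragment
        print(frame)
```

Every EST entry therefore contributes six candidate protein sequences, most of them meaningless, which is one reason such searches are slow and best kept as a fallback.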


Decoy database


"For large scale experiments: provide the results of any additional statistical analyses that indicate or establish a measure of identification certainty, or allow a determination of the false-positive rate, e.g., the results of randomized database searches or other computational approaches."

This is a recommendation to repeat the search, using identical search parameters, against a database in which the sequences have been reversed or randomised.

You do not expect to get any true matches from the "decoy" database. So, the number of matches that are found is an excellent estimate of the number of false positives that are present in the results from the real or "target" database.

Elias, J. E., et al., Nature Methods 2 667-675 (2005).
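The decoy idea can be sketched as follows. The helper names (`make_decoys`, `false_positive_estimate`), the "DECOY_" prefix and the example scores are hypothetical; a real search engine produces the scores, and this sketch only shows how reversed sequences are built and how decoy hits above a threshold estimate the false positives among target hits.

```python
def make_decoys(targets):
    """Build a decoy database by reversing every target sequence."""
    return {"DECOY_" + acc: seq[::-1] for acc, seq in targets.items()}


def false_positive_estimate(target_scores, decoy_scores, threshold):
    """Count hits at or above the threshold in both searches.

    Returns (target hits, decoy hits, decoy/target ratio). The decoy count
    is taken as the estimate of false positives among the target hits.
    """
    n_target = sum(score >= threshold for score in target_scores)
    n_decoy = sum(score >= threshold for score in decoy_scores)
    rate = n_decoy / n_target if n_target else 0.0
    return n_target, n_decoy, rate


if __name__ == "__main__":
    print(make_decoys({"P1": "MKWVTFISLLLLFSSAYSR"}))  # hypothetical entry
    # Hypothetical scores from a target search and an identical decoy search.
    print(false_positive_estimate([80, 70, 66, 40], [67, 30, 20, 10], 65))
```

Both searches must use identical parameters, as the slide stresses; otherwise the decoy count is not comparable with the target count.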


Database search

MASCOT: http://www.matrixscience.com

PROTEIN PROSPECTOR: http://prospector.ucsf.edu/

PEPTIDE SEARCH: http://www.mann.embl-heidelberg.de/

MOWSE: http://www.hgmp.mrc.ac.uk/Bioinformatics/

ProFound: http://prowl.rockefeller.edu/cgi-bin/ProFound

SEQUEST: http://thompson.mbt.washington.edu/sequest/


Peptide Mass Fingerprint (PMF): protein identification by database search


Proteome: strategy 2



Monoisotopic mass = the mass of the molecular ion calculated using the exact mass of the most abundant isotope of each element (e.g. 1H = 1.007825, 12C = 12.000000)

[Figure: isotope cluster with peaks at m/z 948.5, 949.5, 950.5 and 951.5]



Adenylate kinase (Mr 23634), tryptic digest ==> 17 peptides ==> MALDI-TOF ==> Database search

Columns: MASS TOLERANCE in ppm. Rows: No. of peptides matched.

1000 700 400 200 100 75 50 30

5 429 136 51 39 29 19 3 1 6 163 54 9 10 7 7 82 16 6 6 8 36 2 9 9 1 10 8 1 1 1
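The effect of the mass tolerance window can be sketched as below: a measured peak counts as a match when its relative deviation from a theoretical peptide mass, expressed in parts per million, is within the chosen tolerance. The function names and the example masses are hypothetical and only illustrate the arithmetic; real search engines also score the matches.

```python
def ppm_error(observed, theoretical):
    """Relative mass deviation in parts per million."""
    return (observed - theoretical) / theoretical * 1e6


def count_matches(observed_masses, theoretical_masses, tolerance_ppm):
    """Number of observed masses lying within the ppm window of any theoretical mass."""
    return sum(
        any(abs(ppm_error(obs, theo)) <= tolerance_ppm for theo in theoretical_masses)
        for obs in observed_masses
    )


if __name__ == "__main__":
    observed = [1000.1, 1500.0]      # hypothetical measured masses
    theoretical = [1000.0, 1500.3]   # hypothetical digest masses of one candidate
    for tol in (1000, 150, 50):
        print(tol, "ppm ->", count_matches(observed, theoretical, tol))
```

Narrowing the window removes chance matches, which is why the number of candidate hits falls so steeply as the tolerance goes from 1000 ppm towards 30 ppm.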

MASCOT: Fingerprint Search Results

[Figure: MALDI-TOF mass spectrum; x-axis m/z (1300 to 3800), y-axis absolute intensity (a.i., 5000 to 45000); acquisition label /S=/010427Italien/Sample1/0_L14_1SRef/pdata/1 Hufnagel Fri May 4 13:25:28 2001]

Search result page:

- Accession code and name of the top-scoring protein

- Details of the identification

- List of proteins

- New search with the unused data


IDENTIFICATION PARAMETERS

• Score > 65 (www.matrixscience.com)

• MS match for at least 4-5 peptides

• Mass error below 150 ppm (external calibration)

• Mass error below 50 ppm (internal calibration)

• Sequence coverage: at least 20%

• Mr and pI should match the estimates or published values


Ion trap: MS/MS scan mode

1. Inject

2. Isolate

3. Fragment

4. Detect

Trypsin: cleaves at Lys, Arg

[Figure: protein with S-S bridge digested by trypsin]


Product Ion Spectrum from T7 of Human Growth Hormone

[Figure: product ion spectrum in three panels (m/z 100-500, 600-900, 1000-1800); y-axis Rel. Int. (%), 0.0 to 19.3; intensity label 21189]

Labelled peaks, panel m/z 100-500: 86.2, 175.4, 201.1, 227.4, 288.2, 296.8, 314.4, 341.0, 408.6, 427.4, 435.6, 512.6, 540.2, 563.4

Labelled peaks, panel m/z 600-900: 653.3, 662.4, 734.0, 759.4, 781.8, 868.4, 888.6, 929.0

Labelled peaks, panel m/z 1000-1800: 1001.8, 1054.4, 1167.4, 1187.4, 1274.8, 1296.2, 1385.8, 1402.9, 1515.9, 1628.7, 1742.2

Sequence: Ile - Ser - Leu - Leu - Leu - Ile - Gln - Ser - Trp - Leu - Glu - Pro - Val - Gln - Phe - Leu - Arg

B ions: B1 114, B2 201, B3 314, B4 427, B5 540, B6 653, B7 781, B8 868, B9 1054, B10 1168, B11 1297, B12 1394, B13 1493, B14 1621, B15 1768, B16 1881

Y ions: Y1 175, Y2 288, Y3 435, Y4 563, Y5 662, Y6 759, Y7 888, Y8 1001, Y9 1187, Y10 1274, Y11 1403, Y12 1516, Y13 1629, Y14 1742, Y15 1855, Y16 1942
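The b and y fragment masses for this peptide can be reproduced with a short calculation: each b ion is the running sum of residue masses from the N-terminus plus a proton, and each y ion is the running sum from the C-terminus plus water and a proton. The sketch below uses monoisotopic residue masses and singly charged fragments, which is an assumption; the slide values look like rounded nominal or average masses, so the larger fragments can differ from them by about 1 Da.

```python
# Monoisotopic residue masses (Da) for the 20 standard amino acids.
RESIDUE_MASS = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276,
    "V": 99.06841, "T": 101.04768, "C": 103.00919, "L": 113.08406,
    "I": 113.08406, "N": 114.04293, "D": 115.02694, "Q": 128.05858,
    "K": 128.09496, "E": 129.04259, "M": 131.04049, "H": 137.05891,
    "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}
WATER = 18.01056
PROTON = 1.00728


def b_ions(peptide):
    """Singly charged b1..b(n-1): N-terminal fragments."""
    total, series = PROTON, []
    for aa in peptide[:-1]:
        total += RESIDUE_MASS[aa]
        series.append(total)
    return series


def y_ions(peptide):
    """Singly charged y1..y(n-1): C-terminal fragments."""
    total, series = WATER + PROTON, []
    for aa in reversed(peptide[1:]):
        total += RESIDUE_MASS[aa]
        series.append(total)
    return series


if __name__ == "__main__":
    t7 = "ISLLLIQSWLEPVQFLR"  # T7 peptide sequence from the slide, one-letter code
    for i, (b, y) in enumerate(zip(b_ions(t7), y_ions(t7)), start=1):
        print(f"b{i:<2d} {b:9.3f}   y{i:<2d} {y:9.3f}")
```

Because Leu and Ile have the same mass, the two cannot be told apart from these fragment masses alone, which is why sequence tags write them as I/L.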





Protein ID by Sequence Tags: 1 tag uses 5 components

- peptide molecular weight

- partial sequence (region 2)

- molecular wt before partial sequence (region 1)

- molecular wt after partial sequence (region 3)

Example: peptide measured molecular wt = 1927.2

region 1: 1108.13 | region 2: partial sequence -A-V-I/L-T-, up to 1546.11 | region 3: 381.1 Da


Mass accuracy in database searching (2 AA sequence tag)

Mass       1000 ppm   100 ppm   10 ppm    5 ppm
1488.754   202        29        1 hit
1237.661   39         9         1 hit
1925.837   738        15        1 hit
1981.035   412        38        2         1 hit
1171.591   573        71        5 hits
1213.207   314        65        1 hit


Three ways to use mass spectrometry data for protein ID by database search:

1. Peptide Mass Fingerprint (MALDI-TOF)

A set of peptide molecular weights from an enzyme digest of a protein

2. Sequence Query

Mass values combined with amino acid sequence or composition data

3. MS/MS Ions Search (HPLC-ESI-MS/MS)

MS/MS data from a single peptide or from a complete LC-MS/MS run: complete or partial amino acid sequence


Proteomics: clinical studies

- Identification of disease-specific proteins for dilated cardiomyopathy. Electrophoresis 1999

- Anti-endothelial cell antibodies as a potential predictive test for chronic heart transplantation rejection. Hum. Immunol. 1999

- Identification of several disease-specific proteins for cell carcinoma of bladder. Cancer Res. 1999

- Potential marker for prostate and ovarian cancer. Mol. Med. Today 1999

- Identification of 18 proteins with abnormal expression in schizophrenics. Mol. Psychiatry 2000

- Defining urinary proteome… Proteomics 2001

- Proteome of human cerebrospinal fluid. Proteomics 2001

- Clinical proteomics for cancer biomarker discovery and therapeutic targeting. Technol. Cancer Res. Treat. 2002

- The human plasma proteome: history, character, and diagnostic prospects. Mol. Cell. Proteomics 2003


[Figure: healthy population vs breast cancer population]

Fulvio Magni

Department of Experimental Medicine (DIMS)

University of Milano-Bicocca

ATB 2003

PROTEOMICS IN CLINICAL LABORATORY

BIOCHEMICAL AND CLINICAL APPLICATIONS: SELDI and ClinProt techniques


BIOMARKERS

Most diseases originate from changes in protein metabolism, so proteins that act as markers of the disease can be identified.

Most diseases are polygenic, so a single antigen is insufficient for reliable detection of the disease (CA 125, PSA).

Changes in the proteome of an organ can give rise to a characteristic protein pattern in biological fluids.

There is early evidence that multiple markers can be identified, consisting of a set of proteins over- or under-expressed in the diseased subject compared with the healthy subject.

MASS SPECTROMETRY BASED PROTEOMICS: CURRENT STATUS AND POTENTIAL USE IN CLINICAL CHEMISTRY

P-A Binz, DF Hochstrasser and RD Appel, Clin. Chem. Lab. Med., 2003, 41, 1540

Classical proteomics

Molecular scanner

Multidimensional protein identification (MudPIT)

ICAT labelling

SELDI - ClinProt


STRATEGY

- build the protein profiles of a "normal" sample and of a "pathological" one and identify the differences

- provide an "image" that is simple to interpret (gel view)

- distinguish the normal protein profile from an altered one by means of dedicated algorithms

- identify the differentially expressed proteins by tandem MS

PROTEIN-CHIP / SELDI (Ciphergen Biosystems, Inc., California)

BASED ON THE COMBINATION OF TWO TECHNIQUES:

- chromatographic separation on different phases

- mass spectrometry

USES:

- "ProteinChip Array" strips, which separate groups of proteins with similar characteristics

- the "ProteinChip Reader", which uses SELDI-TOF mass spectrometry


PROTEIN-CHIP ARRAY

CHEMICAL INTERACTION: chemical surfaces (ionic, hydrophobic, hydrophilic, etc.) that retain classes of proteins

Reverse phase, Cation exchange, Anion exchange, Metal ions, Normal phase

BIOLOGICAL INTERACTION: biochemical surfaces (antibodies, receptors, DNA, etc.) that retain a single protein

PS-1 or PS-2, Antibody-Antigen, Receptor-Ligand, DNA-Protein

Presented by B. Reed, Ciphergen Biosystems. Meeting/Conference: Swiss Proteomics, 2001. http://www.ciphergen.com/pub/showPubInfo.asp?id=117

Search for protein markers

CHOICE OF PROTEIN-CHIP

http://www.ciphergen.com/techapps/pc/tech/arrays.asp

ION-EXCHANGE PROTEIN-CHIP

Washing with different buffers → SELDI analysis → different protein profiles

[Figure axis: molecular mass / charge]

CLINPROT™ MAGNETIC BEADS

Mixture of peptides or proteins

1. Bind: specific binding

2. Wash: magnetic separation

3. Elute: elution and target preparation

4. Profile: MALDI-TOF MS


CLINPROT™ (Bruker Daltonics)

Magnetic beads (ClinProt) → Automation → MALDI-TOF MS (autoflex MALDI-TOF or ultraflex TOF/TOF, AnchorChip™ target) → Protein profiles → Data analysis (ClinProTools): clustering and classification

[Figure: cluster analysis separating Disease from Normal (diseased vs healthy)]

RESULTS WITH PROTEIN-CHIP

Bruker Daltonics

[Figure: green = diseased, red = healthy]

OVEREXPRESSED PROTEINS

[Figure: box-and-whisker plots, controls vs patients]

UNDEREXPRESSED PROTEINS

[Figure: box-and-whisker plots, controls vs patients]

BREAST CANCER RESULTS: Jinong Li et al., Clinical Chemistry, 48, 1296-1304 (2002)

[Figure: spectra and gel view of diseased and healthy samples 1, 2, 3]

OVARIAN CANCER: IDENTIFICATION OF MARKERS (Petricoin, E.F. et al., The Lancet, 359, 572-577 (2002))

Phase I: pattern discovery

1. Samples from unaffected subjects and samples from cancer patients

2. Generate protein mass spectra (15200 m/z values)

3. Genetic algorithm + self-organising cluster analysis

4. Discriminatory pattern: plot of relative abundance of 5-20 key proteins (m/z values) that best distinguish cancer from non-cancer

Phase II: pattern matching

1. Obtain mass spectrum from masked serum test sample

2. Generate signature pattern from test sample: plot relative abundance of the 5-20 specific key discriminatory proteins identified in phase I

3. Pattern matching: compare the unknown test sample signature pattern for likeness to the previously found discriminatory pattern

4. Result: Unaffected, Cancer, or New cluster (no match)
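The pattern-matching phase can be illustrated with a deliberately simplified stand-in. The sketch below is not the genetic algorithm plus self-organising cluster analysis used in the cited study: it assumes each sample has already been reduced to the relative abundances at a few key m/z values, represents each class by the centroid of its training patterns, and reports "new cluster" when a test pattern is farther than a hypothetical distance threshold from every centroid. All numbers are invented.

```python
import math


def centroid(patterns):
    """Mean pattern of a class (one value per key m/z)."""
    n = len(patterns)
    return [sum(column) / n for column in zip(*patterns)]


def classify(pattern, centroids, max_distance):
    """Assign the nearest class, or 'new cluster' if nothing is close enough."""
    label, distance = min(
        ((name, math.dist(pattern, centre)) for name, centre in centroids.items()),
        key=lambda item: item[1],
    )
    return label if distance <= max_distance else "new cluster"


if __name__ == "__main__":
    # Hypothetical training patterns: relative abundance at two key m/z values.
    centroids = {
        "cancer": centroid([[0.9, 0.1], [0.7, 0.3]]),
        "unaffected": centroid([[0.1, 0.9], [0.3, 0.7]]),
    }
    for test_pattern in ([0.75, 0.25], [0.2, 0.85], [0.5, 3.0]):
        print(test_pattern, "->", classify(test_pattern, centroids, max_distance=0.5))
```

The "new cluster" outcome matters in practice: a sample that resembles neither training class is flagged instead of being forced into one of them.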


VALIDATION OF THE BIOMARKER SET BY CLASSIFICATION OF SERA ANALYSED BLIND

                                     CANCER   NO CANCER   NEW CLUSTER
WOMEN WITHOUT OVARIAN CANCER
  No ovarian cyst                    2/24     22/24       0/24
  Benign ovarian cyst <2 cm          1/19     18/19       0/19
  Benign ovarian cyst >2 cm          0/6      6/6         0/6
  Benign gynaecological pathology    0/10     1/10        9/10
  No gynaecological disorder         0/7      0/7         7/7
WOMEN WITH OVARIAN CANCER
  Stage I                            18/18    0/18        0/18
  Stage II, III, IV                  32/32    0/32        0/32

The problem of statistical analysis:

Step 1: Discovery

Training data set (Disease, Normal) → pattern discovery → Profile 1 vs Profile 2

Use biomarker pattern for step 2.

Step 2: Evaluation

Test data set (Disease, Normal) → cluster analysis → Profile 1 vs Profile 2

Determination of:

• Sensitivity

• Specificity

• Positive predictive value

• Negative predictive value

Step 3: Class prediction

Unknown data set → cluster analysis → Profile 1 vs Profile 2 → Disease or Normal
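The four figures of merit determined in the evaluation step follow directly from the counts of true and false positives and negatives. As a worked sketch, the counts below are read from the blind validation table of the ovarian cancer study in this deck, under the assumption that a "new cluster" result is counted as not cancer: 50 of 50 cancer sera were classified as cancer, and 3 of 66 sera from women without ovarian cancer were classified as cancer. The function name is a hypothetical helper.

```python
def diagnostic_metrics(tp, fn, fp, tn):
    """Sensitivity, specificity and predictive values from a 2x2 table."""
    return {
        "sensitivity": tp / (tp + fn),   # diseased correctly called diseased
        "specificity": tn / (tn + fp),   # healthy correctly called healthy
        "ppv": tp / (tp + fp),           # positive predictive value
        "npv": tn / (tn + fn),           # negative predictive value
    }


if __name__ == "__main__":
    # Counts read from the validation table (assumption: new cluster = not cancer).
    metrics = diagnostic_metrics(tp=50, fn=0, fp=3, tn=63)
    for name, value in metrics.items():
        print(f"{name:12s} {value:.3f}")
```

Predictive values depend on how common the disease is in the tested group, so figures from a validation set like this one do not transfer directly to population screening.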

