Analysis of Proteomics Data using MALDIquant
Sebastian Gibb
Institute for Medical Informatics, Statistics and Epidemiology (IMISE)
University of Leipzig
17 August 2011
Sebastian Gibb, MALDIquant, 2011-08-17 1
Table of Contents
1 Introduction
  Proteomics
  Mass Spectrometry
2 Analysis of Proteomics Data using MALDIquant
  Design
  Single Spectrum Workflow
  Preview
Predicting disease
Proteomics
• Study of the entirety of proteins produced by an organism.
• Foci: identification, structure determination, biomarkers, pathways, expression.
Mass Spectrometry
Ion Source: MALDI
Matrix-Assisted Laser Desorption/Ionization
Mass Analyzer: TOF
Time of Flight ($t \propto \sqrt{m/q}$)
Detector
Quantity Measurement
Fig. 3.14, p. 67; “Biochemie & Pathobiochemie”, Löffler G., 8th edition (2007), Springer Medizin Verlag
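The time-of-flight proportionality on this slide follows from energy conservation in an idealized linear TOF analyzer. As a sketch, assume an ion of mass $m$ and charge $q$ is accelerated through a potential difference $U$ and then drifts over a field-free length $L$ (the symbols $U$ and $L$ are introduced here for illustration and do not appear in the slides):

```latex
\begin{align}
qU &= \tfrac{1}{2} m v^2, \qquad v = \frac{L}{t} \\
\Rightarrow\quad t &= L \sqrt{\frac{m}{2\,q\,U}} \;\propto\; \sqrt{\frac{m}{q}}
\end{align}
```

For fixed $U$ and $L$, heavier ions with the same charge therefore arrive later, which is what lets the detector time stamps be converted to a mass axis.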
MALDI-TOF Example Spectrum
[Figure: example MALDI-TOF spectrum; x-axis: mass (2000 to 10000), y-axis: intensity (0 to 30000)]
MALDIquant: Motivation
• Only relatively few open source software solutions are available, and very few for the R platform.
• No MALDI-TOF package fitting our needs for clinical diagnostics.
• Necessity of handling both technical and biological replicates.
• Unsatisfactory quantification of relative intensities (total-ion-current, 0/1).
• Investigation of the impact of spectra calibration on clinical prognosis.
• Modular and easy-to-customize analysis routines.
MALDIquant
The UNIX philosophy: do one thing and do it well (McIlroy, 1978)
[Diagram: analysis pipeline]
• raw data import: readBrukerFlexData, readMzXmlData, ...
• MALDIquant, single spectrum: smoothing, baseline correction, peak detection
• MALDIquant, multiple spectra: peak alignment, calibration/normalization
• classification: sda, randomForest, pamr, ...
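MALDIquant
The UNIX philosophy: do one thing and do it well (McIlroy, 1978)
[Diagram: spectra ⇒ an m × n matrix]
$$
\begin{pmatrix}
f_{1,1} & f_{1,2} & f_{1,3} & \dots & f_{1,n} \\
f_{2,1} & f_{2,2} & f_{2,3} & \dots & f_{2,n} \\
\vdots & \vdots & \vdots & \ddots & \vdots \\
f_{m,1} & f_{m,2} & f_{m,3} & \dots & f_{m,n}
\end{pmatrix}
$$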
MALDIquant: Structure
object-oriented, S4
AbstractMassObject
  fields: mass: vector, intensity: vector, metaData: list
  methods: as.matrix(), intensity(), intensityMatrix(), isEmpty(), length(), lines(), mass(), metaData(), plot(), points(), transformIntensity()
MassSpectrum (inherits from AbstractMassObject)
  methods: detectPeaks(), estimateBaseline(), estimateNoise(), findLocalMaxima(), removeBaseline()
MassPeaks (inherits from AbstractMassObject)
  methods: labelPeaks()
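The class layout on this slide can be illustrated with a few lines of S4 code. This is a minimal sketch of the pattern only, not MALDIquant's source: the class and slot names mirror the diagram, while the `isEmpty` definition and the example objects are assumptions made for illustration.

```r
library(methods)

# Virtual base class holding the slots shared by spectra and peak lists.
setClass("AbstractMassObject",
         contains = "VIRTUAL",
         slots = c(mass = "numeric", intensity = "numeric", metaData = "list"))

# Concrete classes inherit all slots from the base class.
setClass("MassSpectrum", contains = "AbstractMassObject")
setClass("MassPeaks", contains = "AbstractMassObject")

# A method defined once on the base class is available to both subclasses.
# (Hypothetical definition: an object is empty if it has no intensities.)
setGeneric("isEmpty", function(object) standardGeneric("isEmpty"))
setMethod("isEmpty", "AbstractMassObject",
          function(object) length(object@intensity) == 0)

s <- new("MassSpectrum", mass = c(1, 2, 3, 4, 5),
         intensity = c(1, 3, 2, 5, 4), metaData = list(name = "example"))
p <- new("MassPeaks", mass = numeric(0), intensity = numeric(0),
         metaData = list())

print(isEmpty(s))
print(isEmpty(p))
```

Putting the shared accessors and plotting methods on the virtual base class keeps `MassSpectrum` and `MassPeaks` small: each only adds the operations that make sense for it.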
Example Data
Serum Peptidome Profiling Revealed Platelet Factor 4 as a Potential Discriminating Peptide Associated with Pancreatic Cancer
G.M. Fiedler, A.B. Leichtle, J. Kase et al., Clin Cancer Res, June 1, 2009, 15:3812-3819
“Two significant peaks (m/z 3884; 5959) achieved a sensitivity of 86.3% and a specificity of 97.6% for the discrimination of patients and healthy controls . . . ”
“MALDI-TOF MS-based serum peptidome profiling allowed the discovery and validation of platelet factor 4 [m/z 3884, 7767; S.G.] as a new discriminating marker in pancreatic cancer.”
MALDIquant hands-on: File Import
> library("MALDIquant")
> library("readBrukerFlexData")
> spectra <- mqReadBrukerFlex("/data/fiedler2009/")
> length(spectra)
[1] 480
> spectra[[1]]
S4 class type : MassSpectrum
Number of m/z values : 42388
Range of m/z values : 1000.015 - 9999.734
Range of intensity values: 5 - 101840
File : /data/fiedler2009/[...]/Pankreas_HB_L_061019_G10/0_m19/1/1SLin/fid
Other possibilities:
library("MALDIquant")
library("readMzXmlData")
s <- mqReadMzXml("/data/exampleMS/spectrum.mzXML")

library("MALDIquant")
s <- createMassSpectrum(mass=1:5,
                        intensity=runif(5),
                        metaData=list(name="example"))
MALDIquant hands-on: plot
> plot(spectra[[1]])
> abline(v=3884, col="blue")
[Figure: spectrum 1 with a blue vertical line at m/z 3884; x-axis: mass (2000 to 10000), y-axis: intensity (0 to 20000)]
MALDIquant hands-on: plot m/z 3884
> plot(spectra[[1]], xlim=c(3800, 4500))
> abline(v=3884, col="blue")
[Figure: spectrum 1 zoomed in, with a blue vertical line at m/z 3884; x-axis: mass (3800 to 4500), y-axis: intensity (0 to 20000)]
MALDIquant hands-on: Variance Stabilization and Smoothing
> spectra <- lapply(spectra, transformIntensity, fun=sqrt)
> movAvg <- function(y) {return(filter(y, rep(1, 5)/5, sides=2));}
> spectra <- lapply(spectra, transformIntensity, fun=movAvg)
[Figure: transformed and smoothed spectrum; x-axis: mass (3800 to 4500), y-axis: intensity (0 to 150)]
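The two transformations on this slide can be tried without any spectra at hand. The following is a small sketch in base R that applies the same square root and 5-point moving average to a synthetic intensity vector; the Poisson test data and the `as.numeric` conversion are assumptions added for illustration.

```r
# Synthetic count-like intensities standing in for one spectrum (hypothetical data).
set.seed(1)
intensity <- rpois(20, lambda = 100)

# Variance stabilization: for count-like data the square root makes the
# variance roughly independent of the mean.
stabilized <- sqrt(intensity)

# Centered 5-point moving average, the same filter as movAvg on the slide.
# stats::filter returns a time series; as.numeric turns it back into a vector.
movAvg <- function(y) {
  as.numeric(stats::filter(y, rep(1, 5) / 5, sides = 2))
}
smoothed <- movAvg(stabilized)

# Positions without a full 5-point window are returned as NA.
print(which(is.na(smoothed)))
```

Note that the centered filter leaves the first and last two positions undefined, so downstream steps need to tolerate or trim those `NA` values.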
MALDIquant hands-on: Baseline Correction – don’t harm the data
> bl <- estimateBaseline(spectra[[1]], method="Median")
> plot(spectra[[1]], xlim=c(3800, 4500));
> lines(bl, col="red");
[Figure: spectrum 1 with the median baseline estimate in red; x-axis: mass (3800 to 4500), y-axis: intensity (0 to 150)]
MALDIquant hands-on: Baseline Correction – SNIP
> bl <- estimateBaseline(spectra[[1]], method="SNIP")
> plot(spectra[[1]], xlim=c(3800, 4500)); lines(bl, col="red");
[Figure: spectrum 1 with the SNIP baseline estimate in red; x-axis: mass (3800 to 4500), y-axis: intensity (0 to 150)]
C. G. Ryan, E. Clayton, W. L. Griffin, S. H. Sie, and D. R. Cousens. SNIP, a statistics-sensitive background treatment for the quantitative analysis of PIXE spectra in geoscience applications. Nucl. Instrument. Meth. B, 34:396–402, 1988
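The core idea behind SNIP-style baseline estimation is iterative clipping: each point is replaced by the mean of its two neighbors at distance i whenever that mean is lower, for a growing distance i. The sketch below is a simplified base R version of that idea, not MALDIquant's implementation; it omits the intensity transform described by Ryan et al., and the function name `snipBaseline` and its `iterations` parameter are hypothetical.

```r
# Simplified SNIP-style clipping (illustrative sketch only).
snipBaseline <- function(y, iterations = 20) {
  n <- length(y)
  for (i in seq_len(iterations)) {
    if (2 * i >= n) break                 # window no longer fits the data
    j <- (i + 1):(n - i)                  # points with neighbors on both sides
    # Clip each point to the mean of its neighbors at distance i.
    # The right-hand side is evaluated before y is updated.
    y[j] <- pmin(y[j], (y[j - i] + y[j + i]) / 2)
  }
  y
}

# Synthetic spectrum: sloping background plus one Gaussian peak (hypothetical data).
x <- 1:200
background <- 50 - 0.1 * x
peak <- 100 * exp(-((x - 100)^2) / (2 * 3^2))
bl <- snipBaseline(background + peak, iterations = 20)
corrected <- background + peak - bl

print(round(corrected[100], 3))
```

Because clipping only ever lowers values, the estimate never exceeds the measured signal, which is in the spirit of the "don't harm the data" remark: the peak is preserved while the slowly varying background is removed.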
MALDIquant hands-on: Baseline Correction – SNIP
> spectra <- lapply(spectra, removeBaseline)
> lines(spectra[[1]], col="blue")
[Figure: spectrum 1 with the baseline-corrected spectrum added in blue; x-axis: mass (3800 to 4500), y-axis: intensity (0 to 150)]
MALDIquant hands-on: Baseline Correction – SNIP
[Figure: spectrum over the full mass range; x-axis: mass (2000 to 10000), y-axis: intensity (0 to 150)]
MALDIquant hands-on: Peak Detection
> spectra[[1]]
S4 class type : MassSpectrum
Number of m/z values : 42388
Range of m/z values : 1000.015 - 9999.734
Range of intensity values: 0 - 709.207
File : /data/fiedler2009/[...]/Pankreas_HB_L_061019_G10/0_m19/1/1SLin/fid
> peaks <- lapply(spectra, detectPeaks)
> peaks[[1]]
S4 class type : MassPeaks
Number of m/z values : 198
Range of m/z values : 1011.059 - 9423.422
Range of intensity values: 19.273 - 709.207
File : /data/fiedler2009/[...]/Pankreas_HB_L_061019_G10/0_m19/1/1SLin/fid
MALDIquant hands-on: Peak Detection
> plot(spectra[[1]], xlim=c(3800, 4500)); points(peaks[[1]], col="green")
> p <- peaks[[1]]
> top5 <- intensity(p) %in% sort(intensity(p)[mass(p)>3800 & mass(p)<4500], decreasing=TRUE)[1:5]
> labelPeaks(p, index=top5); labelPeaks(p, mass=3884, col="blue")
[Figure: spectrum 1 with detected peaks marked; x-axis: mass (3800 to 4500), y-axis: intensity (0 to 100); labeled peaks at m/z 3882.052, 4052.94, 4090.2, 4193.806, 4209.072 and 4265.564; legend: accepted maxima, rejected maxima, noise threshold]
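The legend of this figure (accepted maxima, rejected maxima, noise threshold) suggests a two-step rule: find local maxima, then keep only those above a noise-based threshold. The following base R sketch illustrates that rule under stated assumptions; it is not the code behind `detectPeaks`, and the function name `detectPeaksSketch`, the parameters `halfWindowSize` and `SNR`, and the use of the median absolute deviation as noise estimate are choices made here for illustration.

```r
# Illustrative peak picking: local maxima above SNR times a noise estimate.
detectPeaksSketch <- function(mass, intensity, halfWindowSize = 5, SNR = 2) {
  n <- length(intensity)
  # A point is a local maximum if it is the largest value in its window.
  isLocalMax <- vapply(seq_len(n), function(i) {
    lo <- max(1, i - halfWindowSize)
    hi <- min(n, i + halfWindowSize)
    intensity[i] == max(intensity[lo:hi])
  }, logical(1))
  # Robust noise estimate: median absolute deviation of all intensities.
  noise <- stats::mad(intensity)
  accepted <- isLocalMax & intensity > SNR * noise
  data.frame(mass = mass[accepted], intensity = intensity[accepted])
}

# Synthetic data: two Gaussian peaks on a small deterministic ripple (hypothetical).
mass <- seq(3800, 4500, by = 1)
signal <- 80 * exp(-((mass - 3884)^2) / (2 * 4^2)) +
          50 * exp(-((mass - 4210)^2) / (2 * 4^2))
intensity <- signal + 1 + sin(mass / 3)

peaks <- detectPeaksSketch(mass, intensity, halfWindowSize = 10, SNR = 5)
print(peaks)
```

The ripple produces many local maxima, but they stay below the threshold and are rejected, which mirrors the accepted/rejected distinction in the figure.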
MALDIquant: Single Spectrum Workflow
> library("MALDIquant")
> library("readBrukerFlexData")
> spectra <- mqReadBrukerFlex("/data/fiedler2009/")
> movAvg <- function(y) {return(filter(y, rep(1, 5)/5, sides=2));}
> spectra <- lapply(spectra, transformIntensity, fun=sqrt)
> spectra <- lapply(spectra, transformIntensity, fun=movAvg)
> spectra <- lapply(spectra, removeBaseline)
> peaks <- lapply(spectra, detectPeaks)
MALDIquant: Comparison of Multiple Spectra
[Figure: overlay of two spectra; x-axis: mass (3850 to 4100), y-axis: intensity (0 to 40); labeled peaks of spectrum 1 (control): 3882.052, 4052.94, 4069.799, 4090.2; labeled peaks of spectrum 401 (cancer): 3880.425, 4050.885, 4068.133, 4088.301]
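The peak labels in this comparison show that corresponding peaks sit at slightly different masses in the two spectra, which is why alignment is needed before spectra can be compared. A short sketch that quantifies the shift from the labeled values; the pairing of peaks and their assignment to the two spectra are read off the figure labels and are assumptions.

```r
# Peak labels from the comparison figure (pairing and assignment assumed).
control <- c(3882.052, 4052.94, 4069.799, 4090.2)     # spectrum 1
cancer  <- c(3880.425, 4050.885, 4068.133, 4088.301)  # spectrum 401

# Absolute and relative (parts per million) mass differences per peak pair.
shift <- control - cancer
shiftPpm <- shift / control * 1e6

print(data.frame(control, cancer, shift, shiftPpm = round(shiftPpm)))
```

All four pairs differ in the same direction by a similar amount, which is consistent with a systematic offset between the spectra rather than four different peptides.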
MALDIquant: Conclusion
MALDIquant is free software (GPLv3).
Currently available:
• Easy-to-use and fully established single spectrum workflow.
• Easy to customize.
• Easy data exchange.
Available soon:
• Peak alignment.
• Calibration/Normalization routines.
MALDIquant: Thanks
Alexander B. Leichtle, helpful discussions (Institute for Clinical Chemistry, Bern University Hospital)
Korbinian Strimmer, supervision (IMISE, University of Leipzig)
Thanks for your attention!
Download of MALDIquant software: http://strimmerlab.org/software/maldiquant/