Advancing Statistical Analysis of Multiplexed MS/MS Quantitative Data with Scaffold Q+

Brian C. Searle and Mark Turner

Proteome Software Inc.

Vancouver, Canada, ASMS 2012

Creative Commons Attribution

[Figure: reporter channels 114, 115, 116, and 117 under a reference-channel design ("Reference", "Ref") and under an ANOVA design spanning channels 114 to 117 and 114 (2) to 117 (2); axis ticks from 0 to 2.]

Oberg et al 2008 (doi:10.1021/pr700734f)

“High Quality” Data

• Virtually no missing data

• Symmetric distribution

• High kurtosis

“Normal Quality” Data

• High skew due to truncation

• >20% of intensities are missing in this channel!

• Either ignore channels with any missing data (0.8^4 = 41% of spectra remain) …

“Normal Quality” Data

… or deal with very non-Gaussian data!
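The 41% figure for complete cases can be checked with a short calculation. This is a minimal sketch that assumes each of the four channels is missing independently at a 20% rate; the function name is hypothetical.

```python
def complete_case_fraction(missing_rate, channels):
    """Fraction of spectra with every channel observed.

    Assumes missingness is independent across channels (an assumption,
    not something the slides state).
    """
    return (1.0 - missing_rate) ** channels


fraction = complete_case_fraction(0.20, 4)
print(f"{fraction:.0%}")  # 41%
```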

Contents

• A Simple, Non-parametric Normalization Model

• Refinement 1: Intelligent Intensity Weighting

• Refinement 2: Standard Deviation Estimation

• Refinement 3: Kernel Density Estimation

• Refinement 4: Permutation Testing

Simple, Non-parametric Normalization Model

Additive Effects on Log Scale

• Experiment: sample handling effects across MS acquisitions (LC and MS variation, calibration, etc.)

• Sample: sample handling effects between channels (pipetting errors, etc.)

• Peptide: ionization effects

• Error: variation due to imprecise measurements

$$ \log_2(\text{intensity}) = \text{experiment} + \text{sample} + \text{peptide} + \text{error} $$

Oberg et al 2008 (doi:10.1021/pr700734f)

Additive Effects on Log Scale

Effect | Subtract | Add Back | Across
Experiment | median for all intensities in MS/MS | median for all intensities | entire experiment
Sample | median for each channel | median of all channels | each MS/MS
Peptide | summed intensity for each peptide | median summed intensity | each protein

Median Polish

[Schematic: Remove Inter-Experiment Effects, Remove Intra-Sample Effects, Remove Peptide Effects; the cycle is repeated 3x.]

“Non-Parametric ANOVA”
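The three removal steps can be sketched as repeated re-centering on log2 intensities. This is a simplified illustration, not the Scaffold Q+ implementation: the row fields (`run`, `channel`, `protein`, `peptide`, `log2`) are hypothetical, and medians stand in for the summed peptide intensities.

```python
from collections import defaultdict
from statistics import median


def recenter(rows, group, scope):
    """Within each scope, subtract each group's median and add back the scope median."""
    by_scope = defaultdict(list)
    for row in rows:
        by_scope[scope(row)].append(row)
    for members in by_scope.values():
        scope_median = median(r["log2"] for r in members)
        by_group = defaultdict(list)
        for row in members:
            by_group[group(row)].append(row)
        for grouped in by_group.values():
            shift = scope_median - median(r["log2"] for r in grouped)
            for row in grouped:
                row["log2"] += shift


def median_polish(rows, passes=3):
    """Apply the three effect removals in sequence, repeated `passes` times (in place)."""
    for _ in range(passes):
        # Experiment effect: each MS/MS run against the entire experiment.
        recenter(rows, group=lambda r: r["run"], scope=lambda r: None)
        # Sample effect: each channel against all channels of the same run.
        recenter(rows, group=lambda r: (r["run"], r["channel"]),
                 scope=lambda r: r["run"])
        # Peptide effect: each peptide against its protein.
        recenter(rows, group=lambda r: (r["protein"], r["peptide"]),
                 scope=lambda r: r["protein"])
    return rows
```

Each pass only subtracts and adds medians, so no distributional assumption is needed.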

Refinement 1: Intensity Weighting

Linear Intensity Weighting

[Figure: weight versus intensity. Low intensity, low weight; high intensity, high weight.]

Desired Intensity Weighting

[Figure: weight versus intensity. Low intensity, low weight; most data, high weight; saturated data, decreased weight.]

Variance At Different Intensities

Estimate Confidence from Protein Deviation

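• P_ij = 2 * cumulative t-distribution(t_ij), where

i = raw intensity bin

j = each spectrum in bin i

x̄ = protein median for spectrum j

$$ t_{ij} = \frac{x_{ij} - \bar{x}}{d(s, n)} $$

where d(s, n) is the slide's scaling term in s and n, whose exact form did not survive extraction.

• $$ P_i = \frac{\sum_j P_{ij}}{n_i} $$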
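The bin-level confidence P_i can be sketched as an average of two-sided t-distribution p-values. This sketch makes several assumptions: the denominator of t_ij is passed in as `scale` because its exact form is not legible in the slide text, the degrees of freedom are supplied by the caller, and the p-value is computed by numerical integration to avoid external libraries. All function names are hypothetical.

```python
import math
from statistics import mean


def two_sided_t_p(t, dof, steps=2000):
    """Two-sided Student t p-value, integrating the density with Simpson's rule."""
    a = abs(t)
    if a == 0.0:
        return 1.0
    log_norm = (math.lgamma((dof + 1) / 2) - math.lgamma(dof / 2)
                - 0.5 * math.log(dof * math.pi))

    def pdf(x):
        return math.exp(log_norm - (dof + 1) / 2 * math.log1p(x * x / dof))

    h = a / steps  # steps must be even for Simpson's rule
    total = pdf(0.0) + pdf(a)
    for k in range(1, steps):
        total += (4 if k % 2 else 2) * pdf(k * h)
    central_mass = 2 * total * h / 3  # probability that |T| < a
    return max(0.0, 1.0 - central_mass)


def bin_weights(deviations_by_bin, scale, dof):
    """Map each intensity bin i to P_i, the mean of P_ij over its spectra.

    deviations_by_bin: {bin: [x_ij - protein median, ...]}
    scale: assumed denominator of t_ij (hypothetical parameter)
    """
    return {
        bin_id: mean(two_sided_t_p(dev / scale, dof) for dev in devs)
        for bin_id, devs in deviations_by_bin.items()
    }


weights = bin_weights(
    {"low": [1.5, -2.0, 1.0], "mid": [0.1, -0.2, 0.05]}, scale=0.5, dof=10
)
```

Bins whose spectra sit far from their protein medians receive small P_i values and therefore low weight.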

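Data Dependent Intensity Weighting

[Figure: data-dependent weight versus intensity. Most data, high weight; saturated data, decreased weight.]

Desired Intensity Weighting

[Figure: desired weight versus intensity. Saturated data, decreased weight.]

Algorithm Schematic

[Schematic: Remove Inter-Experiment Effects, Remove Intra-Sample Effects, Remove Peptide Effects (repeated 3x), with Data Dependent Intensity Weighting.]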

Refinement 2: Standard Deviation Estimation

Standard Deviation Estimation

i = intensity bin

j = each spectrum in bin i

x̄ = protein median for spectrum j

$$ \mathrm{Stdev}_i = \sqrt{\frac{\sum_j (x_{ij} - \bar{x})^2}{n_i}} $$
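A minimal sketch of the per-bin standard deviation, assuming each spectrum is supplied as a (bin, log2 intensity, protein median) record. The record layout and function name are hypothetical.

```python
import math
from collections import defaultdict


def stdev_by_bin(records):
    """Root-mean-square deviation from the protein median within each intensity bin.

    records: iterable of (bin, log2 intensity, protein median) tuples.
    """
    squared = defaultdict(list)
    for bin_id, value, protein_median in records:
        squared[bin_id].append((value - protein_median) ** 2)
    return {b: math.sqrt(sum(v) / len(v)) for b, v in squared.items()}


stdevs = stdev_by_bin([
    ("low", 1.0, 0.0), ("low", -1.0, 0.0),
    ("high", 2.0, 1.0), ("high", 1.0, 1.0),
])
```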

Data Dependent Standard Deviation Estimation

Algorithm Schematic

[Schematic: Remove Inter-Experiment Effects, Remove Intra-Sample Effects, Remove Peptide Effects (repeated 3x), with Data Dependent Intensity Weighting and Data Dependent Standard Deviation Estimation.]

Refinement 3: Kernel Density Estimation

Protein Variance Estimation

Kernels

$$ \mathrm{Stdev}_i = \frac{\max - \min}{n}, \qquad P_i = 1.0 $$

Kernel Density Estimation

[Figure: kernel density estimate, annotated "Deviation that shifts distribution" and "0.3 shift on Log2 Scale".]

Improved Kernels

• We have a better estimate for P_i: the intensity-based weight!

• We have a better estimate for Stdev_i: the intensity-based standard deviation!
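The improved kernels can be sketched as a weighted sum of Gaussians, one per spectrum, with the intensity-based weight as the kernel weight and the intensity-based standard deviation as the bandwidth. The Gaussian kernel shape and the function name are assumptions; the slides do not name the kernel.

```python
import math


def weighted_kde(points, weights, stdevs):
    """Return a density function built from one Gaussian kernel per point.

    points:  log2 values, one per spectrum
    weights: P_i for each spectrum's intensity bin
    stdevs:  Stdev_i for each spectrum's intensity bin
    """
    total_weight = sum(weights)

    def density(x):
        acc = 0.0
        for mu, w, sd in zip(points, weights, stdevs):
            z = (x - mu) / sd
            acc += w * math.exp(-0.5 * z * z) / (sd * math.sqrt(2 * math.pi))
        return acc / total_weight

    return density


# Two well-measured spectra near 0 and one low-weight outlier at 1.0.
density = weighted_kde([0.0, 0.1, 1.0], [1.0, 1.0, 0.1], [0.2, 0.2, 0.5])
```

With these inputs the outlier contributes a small, wide kernel, so it barely moves the mode of the estimate.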

Improved Kernel Density Estimation

[Figure: improved kernel density estimate, annotated "Significant Deviation Worth Investigating", "Unimportant Deviation", and "1.0 shift on Log2 Scale = 2 Fold Change".]

Refinement 4: Permutation Testing

Why Use Permutation Testing?

• Why go through all this work to just use a t-test or ANOVA?

• Rank-based Mann-Whitney and Kruskal-Wallis tests “work”, but lack power

Basic Permutation Test

Observed | Permutation 1 | Permutation 2 | Permutation 3
1.1 | 0.5 | 0.5 | 0.5
1.1 | 1.1 | 0.9 | 0.9
0.8 | 1.1 | 1.0 | 1.4
1.1 | 0.0 | 0.7 | 1.0
1.4 | 1.0 | 0.7 | 0.7
1.0 | 0.8 | 1.1 | 1.1
1.0 | 1.0 | 1.2 | 1.1
0.9 | 1.0 | 1.0 | 0.3
1.2 | 1.1 | 1.1 | 1.2
1.0 | 0.3 | 1.1 | 1.0
0.7 | 1.0 | 1.0 | 1.1
1.0 | 0.7 | 0.9 | 0.7
0.7 | 0.7 | 0.3 | 0.8
0.9 | 1.0 | 1.0 | 0.9
0.9 | 0.7 | 0.8 | 1.0
0.0 | 1.4 | 1.0 | 1.0
0.5 | 0.9 | 0.7 | 0.0
0.3 | 0.9 | 0.0 | 1.0
0.7 | 1.2 | 1.4 | 0.7
1.0 | 0.9 | 0.9 | 0.9
T=4.84 | T=1.49 | T=1.34 | T=1.14

Permutations repeated x1000: 950 below, 50 above

p-value = 50/1000 = 0.05
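The counting procedure above can be sketched as follows. Two things here are assumptions: the observed values are split into the first ten and the last ten, and the statistic is a two-sided Welch t. The slides do not state either, so the statistic computed here need not match the slide's T values.

```python
import math
import random
from statistics import mean, variance


def welch_t(a, b):
    """Welch t statistic for two groups (assumed statistic, not confirmed by the slides)."""
    return (mean(a) - mean(b)) / math.sqrt(variance(a) / len(a) + variance(b) / len(b))


def permutation_p_value(a, b, n_perm=1000, seed=0):
    """Fraction of label shuffles whose |T| is at least the observed |T|."""
    rng = random.Random(seed)
    observed = abs(welch_t(a, b))
    pooled = list(a) + list(b)
    above = 0
    for _ in range(n_perm):
        rng.shuffle(pooled)
        if abs(welch_t(pooled[:len(a)], pooled[len(a):])) >= observed:
            above += 1
    return above / n_perm


# Observed column from the slide; the ten/ten split is an assumption.
group_a = [1.1, 1.1, 0.8, 1.1, 1.4, 1.0, 1.0, 0.9, 1.2, 1.0]
group_b = [0.7, 1.0, 0.7, 0.9, 0.9, 0.0, 0.5, 0.3, 0.7, 1.0]
p_value = permutation_p_value(group_a, group_b)
```

With 1000 permutations the smallest non-zero p-value this can report is 0.001.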

Improvements…

• N is frequently very small

• Instead of randomizing N points, randomly select N points from Kernel Densities

• Expensive! What if you want more precision?
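Drawing N points from kernel densities rather than reshuffling the N observed values can be sketched as sampling from a weighted mixture of Gaussians. This shows only the sampling step; the Gaussian kernel shape and how the draws feed the null distribution are assumptions, and the function name is hypothetical.

```python
import random


def sample_from_kernels(points, weights, stdevs, n, rng):
    """Draw n values: pick a kernel in proportion to its weight, then sample from it."""
    chosen = rng.choices(range(len(points)), weights=weights, k=n)
    return [rng.gauss(points[i], stdevs[i]) for i in chosen]


rng = random.Random(1)
draws = sample_from_kernels(
    points=[0.0, 0.1, 1.0], weights=[1.0, 1.0, 0.1], stdevs=[0.2, 0.2, 0.5],
    n=5, rng=rng,
)
```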

Extrapolating Precision

[Figure: permutation distribution with the "Last Usable Permutation" marked and the question "Actual T-Statistic of 6.6?"]

1000 below, 0 above

p-value = 0/1000 = ?

Knijnenburg, et al 2011 (doi:10.1186/1471-2105-12-411)

[Figure: tail extrapolated beyond the "Last Usable Permutation" to the actual T-statistic of 6.6.]

p-value = 0.0000018
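One way to extrapolate beyond the last usable permutation is to fit a generalized Pareto distribution (GPD) to the largest permuted statistics and read the p-value off the fitted tail. Whether the slides use exactly this procedure is an assumption. The method-of-moments fit, the tail size, and the exponential stand-in null distribution below are simplifications chosen for illustration.

```python
import math
import random
from statistics import mean, variance


def gpd_tail_p_value(null_stats, observed, n_tail=250):
    """Extrapolated P(T >= observed) from a GPD fitted to the top n_tail null values."""
    ordered = sorted(null_stats)
    threshold = ordered[-n_tail - 1]  # largest null value below the tail
    excess = [t - threshold for t in ordered[-n_tail:]]
    m, v = mean(excess), variance(excess)
    shape = 0.5 * (1.0 - m * m / v)       # method-of-moments shape estimate
    scale = 0.5 * m * (m * m / v + 1.0)   # method-of-moments scale estimate
    y = observed - threshold
    if y <= 0:
        raise ValueError("observed statistic is not in the tail; count permutations instead")
    if abs(shape) < 1e-9:
        survival = math.exp(-y / scale)
    else:
        base = 1.0 + shape * y / scale
        survival = base ** (-1.0 / shape) if base > 0 else 0.0
    return (n_tail / len(ordered)) * survival


rng = random.Random(0)
null = [rng.expovariate(1.0) for _ in range(5000)]  # stand-in null distribution
p_extrapolated = gpd_tail_p_value(null, observed=6.0)
```

The estimate is the tail fraction times the fitted survival probability, so it can fall below 1/N, which plain counting cannot.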

Conclusions


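[Schematic: Normalization (Remove Inter-Experiment Effects, Remove Intra-Sample Effects, Remove Peptide Effects, repeated 3x, with Data Dependent Intensity Weighting and Data Dependent Standard Deviation Estimation), followed by Interpretation (Fold Changes; Permutation Testing for P-Values).]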

• All of these ideas work for SILAC/ICAT as well!

Acknowledgements

Proteome Software Team
– Bryan Head
– Jana Lee
– Audrey Lester
– Susan Ludwigsen
– Jimar Millar
– De’Mel Mojica
– Mark Turner
– Nick Vincent-Maloney
– Luisa Zini

Institute of Molecular Pathology
– Karl Mechtler

Colorado State University
– Jessica Prenni
– Karen Dobos

Mayo Clinic, MN
– Ann Oberg
