WORKSHOP SPOTTED 2-channel ARRAYS DATA PROCESSING AND QUALITY CONTROL Eugenia Migliavacca and Mauro Delorenzi, ISREC, December 11, 2003

Page 1:

WORKSHOP

SPOTTED 2-channel ARRAYS

DATA PROCESSING AND QUALITY CONTROL

Eugenia Migliavacca and Mauro Delorenzi,

ISREC, December 11, 2003

Page 2:

AIMS

Discussion

Information

Introduction to the use of the webpage for automated normalization

interface between experimentalists and analysts

feedback

resource allocation

Page 3:

Acknowledgments

some slides originally provided by:

Terry Speed (Berkeley / WEHI)
Sandrine Dudoit (Berkeley)
Yee Hwa Yang (Berkeley)
Natalie Thorne (WEHI)

Otto Hagenbuechle

Eugenia Migliavacca
Darlene Goldstein
and others

Page 4:

RNA ISOLATION

(AMPLIFICATION) AND LABELING WITH FLUORO-DYES

Preparation

Page 5:

Hybridisation

Binding labelled samples (targets) to complementary probes on a slide

[Figure labels: mix; cover slip; hybridise for 5-12 hours; wash]

Page 6:

Scanning

Adjust scanner parameters; frequently one can adapt:

1. excitation wave (laser) intensity
2. "gain" (amplification) of the photon detection system

Page 7:

Human 10K cDNA Array

How to extract data ?
How to recognize problems ?

Page 8:

Part of the image of one channel, false-coloured on a scale running from white (very high) and red (high) through yellow and green (medium) to blue (low) and black.

Scanner's Spots

Page 9:

Steps of a Microarray Experiment

RNA preparation and Labeling
Hybridisation
Slide scanning
Image analysis
Normalization
Data for further analysis

Why perform an experiment ?
What is the aim ?
Which conclusions do you want to reach ?

first: DESIGN !

Page 10:

mRNA abundance

What do you want to measure ?

RNA mass different in different cells

rRNA 80%
tRNA
mRNA 1%

[Figure labels: 1-50; 50-500; 500+]

approx. 300'000 mRNA molecules/cell
approx. 10-20'000 different genes

Page 11:

Relative vs Absolute changes

200'000 mRNA molecules/cell
200 for gene X (0.1%)

400'000 mRNA molecules/cell
400 for gene X (0.1%)

Is gene X differentially expressed ?
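The question on this slide can be made concrete with a few lines of arithmetic. The sketch below uses the two cells from the slide; the variable and function names are illustrative only.

```python
# Numbers taken from the slide; the names are illustrative.
cells = {
    "cell 1": {"total_mrna": 200_000, "gene_x": 200},
    "cell 2": {"total_mrna": 400_000, "gene_x": 400},
}


def relative_abundance(cell):
    """Fraction of all mRNA molecules in the cell that belong to gene X."""
    return cell["gene_x"] / cell["total_mrna"]


absolute_fold = cells["cell 2"]["gene_x"] / cells["cell 1"]["gene_x"]
relative_fold = (relative_abundance(cells["cell 2"])
                 / relative_abundance(cells["cell 1"]))

print(f"absolute fold change: {absolute_fold:.1f}")  # copies per cell
print(f"relative fold change: {relative_fold:.1f}")  # share of the mRNA pool
```

The number of copies per cell doubles while the share of the mRNA pool stays at 0.1%, so the answer depends on which of the two quantities the experiment is meant to measure.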

Page 12:

Steps of a Microarray Experiment

RNA preparation and Labeling
Hybridisation
Slide scanning: 16-bit TIFF files
Image analysis: (Rfg, Rbg), (Gfg, Gbg), etc
Normalization: R, G, M, A, etc
Data for further analysis

What is needed for high quality data ?

Which are the critical steps ?

Page 13:

Steps of a Microarray Experiment

RNA preparation and Labeling: spike-in RNA in known concentrations and ratios
Hybridisation
Slide scanning: adjust / balance channels approximately; avoid saturation
Image analysis
Normalization: check normalized and unnormalized data of experimental RNA and of spiked RNA
Data for further analysis

Page 14:

Why avoid saturation ?
Why balance channels ?

Why perform "normalization" ?
What to check before and after normalization ?

Why calculate ratios ?
Why calculate log ratios ?
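One way to see why saturation matters: the scanner writes 16-bit TIFF files, so no pixel can exceed 65535. The sketch below assumes a hypothetical spot whose true red signal is above that limit and shows how clipping compresses the log ratio. The intensities and function names are made up for illustration.

```python
import math

SATURATION = 2**16 - 1  # largest value a 16-bit pixel can hold (65535)


def scanned(true_signal):
    """Clip a signal to the 16-bit range, as a saturated detector would."""
    return min(true_signal, SATURATION)


def log_ratio(red, green):
    """M = log2(R / G) for positive intensities."""
    return math.log2(red / green)


# Hypothetical spot: the true red signal is twice the green signal.
true_red, true_green = 100_000, 50_000

true_m = log_ratio(true_red, true_green)
observed_m = log_ratio(scanned(true_red), scanned(true_green))

print(f"true M     = {true_m:.2f}")
print(f"observed M = {observed_m:.2f}")
```

In this example the true two-fold difference (M = 1) is recorded as roughly M = 0.39, because only the red channel was clipped.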

Page 15:

Aim: Gene Expression Data

Gene expression data on p genes for n samples (rows: genes; columns: slides)

M = log2(Red intensity / Green intensity)

Gene   slide 1   slide 2   slide 3   slide 4   slide 5   …
1        0.46      0.30      0.80      1.51      0.90    ...
2       -0.10      0.49      0.24      0.06      0.46    ...
3        0.15      0.74      0.04      0.10      0.20    ...
4       -0.45     -1.03     -0.79     -0.56     -0.32    ...
5       -0.06      1.06      1.35      1.09     -1.09    ...

Gene expression level of gene 5 in slide 4: 1.09

These values are conventionally displayed on a red (>0) yellow (0) green (<0) scale.
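As a small illustration of how such a matrix is read, the sketch below stores the five genes by five slides shown on the slide and looks up one entry. The helper names are hypothetical.

```python
# Log-ratios M = log2(R/G) from the slide: rows are genes, columns are slides.
M = [
    [0.46, 0.30, 0.80, 1.51, 0.90],
    [-0.10, 0.49, 0.24, 0.06, 0.46],
    [0.15, 0.74, 0.04, 0.10, 0.20],
    [-0.45, -1.03, -0.79, -0.56, -0.32],
    [-0.06, 1.06, 1.35, 1.09, -1.09],
]


def expression(gene, slide):
    """Return M for a 1-based gene number and a 1-based slide number."""
    return M[gene - 1][slide - 1]


def display_colour(m):
    """Conventional display colour: red for M > 0, green for M < 0, else yellow."""
    if m > 0:
        return "red"
    if m < 0:
        return "green"
    return "yellow"


m = expression(gene=5, slide=4)
print(m, display_colour(m))
```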

Page 16:

Objectives for high quality

Important aspects include:

• Tentatively separating
  • systematic sources of variation ("artefacts"), that bias the results,
  • from random sources of variation ("noise"), that hide the truth.
• Removing the former as well as possible and quantifying the latter

Only if this is done can we hope to reach good quality and make valid statements about the confidence in the results

Page 17:

Typical Statistical Approach

Measured value = real value + systematic errors + noise

Corrected value = real value + noise

• Analysis of Corrected value => (unbiased) CONCLUSIONS

• Estimation of Noise => quality of CONCLUSIONS, statistical significance (level of confidence) of the conclusions

Page 18:

Step 1: a) Background Correction b) Calculation of (log) ratios

Image Analysis => Rfg ; Rbg ; Gfg ; Gbg (fg = foreground, bg = background)

For each spot on the slide calculate:

Red intensity = R = Rfg - Rbg
Green intensity = G = Gfg - Gbg

M = log2(Red intensity / Green intensity)

Subtraction of background values (additive background model, with the background assumed to be locally constant …)

Sources of background: probe sticking unspecifically to the slide, irregular / dirty slide surface, dust, and noise / errors in the scanner measurement

Not included: real cross-hybridisation and unspecific hybridisation to the probe
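A minimal sketch of Step 1 for a single spot, assuming the four image-analysis values are already available as numbers. The function name and the choice to return None for non-positive corrected intensities are assumptions of this sketch, not something the slides prescribe.

```python
import math


def log_ratio(r_fg, r_bg, g_fg, g_bg):
    """Background-correct both channels and return M = log2(R / G).

    Returns None when a corrected intensity is not positive, because the
    logarithm is then undefined (how to treat such spots is a design choice).
    """
    red = r_fg - r_bg
    green = g_fg - g_bg
    if red <= 0 or green <= 0:
        return None
    return math.log2(red / green)


# Hypothetical spot: R = 1100 - 100 = 1000, G = 600 - 100 = 500.
m = log_ratio(r_fg=1100, r_bg=100, g_fg=600, g_bg=100)
print(m)

# A weak spot whose red foreground is below its local background.
print(log_ratio(r_fg=90, r_bg=100, g_fg=600, g_bg=100))
```

The second call shows one practical cost of subtraction: weak spots can end up with negative corrected intensities and drop out of the log-ratio calculation.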

Page 19:

Comment to Background Correction

Subtraction of background has frequently been shown not to improve performance: while it brings the average of many measurements closer to the true values (reduced bias or systematic error), it causes higher variability (lower reproducibility).

A. High variance - Unbiased Estimator

B. Low variance - Biased Estimator

[Figure labels: average; single meas.]

Page 20:

Which is better ?

A. High variance - Unbiased Estimator

when you take many measurements: the average will be closer to the true value more frequently

B. Low variance - slightly biased Estimator

when you take one or a few measurements: the average will be closer to the true value more frequently

DAF Microarrays 2002: we preferred no subtraction; this should be re-evaluated with the Agilent scanner (and GenePix IAS)
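The trade-off between A and B can be sketched with the mean squared error of an average of n independent measurements, which is bias squared plus variance divided by n. Using mean squared error as the yardstick is an assumption of this sketch, and the bias and variance values are invented for illustration.

```python
def mse_of_average(bias, variance, n):
    """Mean squared error of the average of n independent measurements.

    Averaging shrinks the variance by 1/n but leaves the bias untouched.
    """
    return bias**2 + variance / n


# Hypothetical estimators, in log-ratio units.
A = {"bias": 0.0, "variance": 1.0}    # unbiased, high variance
B = {"bias": 0.3, "variance": 0.25}   # slightly biased, low variance

for n in (1, 4, 100):
    mse_a = mse_of_average(A["bias"], A["variance"], n)
    mse_b = mse_of_average(B["bias"], B["variance"], n)
    better = "A" if mse_a < mse_b else "B"
    print(f"n={n:3d}  MSE(A)={mse_a:.4f}  MSE(B)={mse_b:.4f}  better: {better}")
```

With these made-up numbers B wins for one or four replicates and A wins for a hundred, which matches the statement on the slide: the preferable estimator depends on how many measurements are averaged.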

Page 21:

A reminder on logarithms

Page 22:

A numerical example
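A short sketch of why log ratios are convenient, using made-up intensity pairs (these are not the numbers of the original slide): equal fold changes up and down give log ratios of equal size and opposite sign, whereas the plain ratios are squeezed between 0 and 1 on one side.

```python
import math

# Hypothetical (red, green) intensity pairs.
pairs = [(1000, 1000), (2000, 1000), (1000, 2000), (4000, 1000), (1000, 4000)]

for red, green in pairs:
    ratio = red / green
    m = math.log2(ratio)
    print(f"R={red:5d}  G={green:5d}  ratio={ratio:5.2f}  log2 ratio={m:+.1f}")
```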

Page 23:

Step 2: An M vs A (MVA) Plot

M = log R/G = log R - log G

A = (log R + log G) / 2

[Figure labels: positive controls (spotted in varying concentrations); negative controls; blanks; lowess curve]
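The two coordinates of the plot can be computed per spot as follows. This is a sketch with hypothetical intensities; base-2 logarithms are assumed here because the other slides define M with Log2.

```python
import math


def ma_values(red, green):
    """Return (M, A) for one spot from positive red and green intensities."""
    log_r = math.log2(red)
    log_g = math.log2(green)
    m = log_r - log_g          # log ratio
    a = (log_r + log_g) / 2    # average log intensity
    return m, a


# Hypothetical (R, G) pairs.
spots = [(1024, 256), (512, 512), (64, 256)]
for red, green in spots:
    m, a = ma_values(red, green)
    print(f"R={red:5d}  G={green:5d}  M={m:+.1f}  A={a:.1f}")
```

M measures how different the two channels are, and A measures how bright the spot is overall, so plotting one against the other shows whether the log ratio depends on intensity.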

Page 24:

Why use an M vs A plot ?

1. Logs stretch out the region we are most interested in.
2. Can more clearly see features of the data such as intensity-dependent variation and dye-bias.
3. Differentially expressed genes more easily identified.
4. Intuitive interpretation

Page 25:

S1.n. Control Slide: Dye Effect, Spread.

MVA plot: looking at data

Lowess curve

Spot identifier

Page 26:

Step 3: Normalisation - global median centering

• Assumption: changes roughly symmetric

• First panel: smooth density of log2G and log2R.

• Second panel: M vs A plot with median put to zero

[Figure label: common median]
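Global median centering amounts to subtracting one number from every log ratio on the slide. A minimal sketch, with hypothetical M values and an illustrative function name:

```python
from statistics import median


def median_center(m_values):
    """Subtract the slide-wide median so that the median log-ratio is zero."""
    shift = median(m_values)
    return [m - shift for m in m_values]


# Hypothetical raw log-ratios from one slide, shifted towards red.
m_raw = [0.9, 0.4, 0.5, 1.6, -0.2, 0.5, 0.7]
m_norm = median_center(m_raw)

print("median before:", median(m_raw))
print("median after: ", median(m_norm))
```

The correction is the same for all spots, so it cannot remove a bias that changes with intensity; that is what the lowess step addresses.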

Page 27:

Step 4: Normalisation - lowess - local median centering

• Assumption: changes roughly symmetric at all intensities.

Page 28:

What is this normalization doing?

Page 29:

Local regression

• Classical (global) regression: fits a single line to the entire set of points

• Local regression: draws a curve through noisy data by smoothing

• Lowess (LOcally WEighted Scatterplot Smoothing) is a type of local regression

• Can correct for both print-tip and intensity-dependent bias with lowess fits to the data within print-tip groups
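The idea of local regression can be sketched in plain Python. The code below is a simplified stand-in for lowess, not a full implementation: it fits a tricube-weighted straight line around each point and omits the robustness iterations of the real algorithm. The function names, the frac parameter and the test data are assumptions of this sketch.

```python
def tricube(u):
    """Tricube kernel: 1 at distance 0, falling smoothly to 0 at distance 1."""
    u = abs(u)
    return (1 - u**3) ** 3 if u < 1 else 0.0


def local_fit(x0, xs, ys, frac=0.5):
    """Fit a weighted straight line to the points near x0; evaluate it at x0."""
    k = max(2, round(frac * len(xs)))          # number of neighbours per fit
    dists = sorted(abs(x - x0) for x in xs)
    # Bandwidth: distance to the k-th neighbour, widened slightly so that
    # this neighbour keeps a small positive weight.
    h = max(dists[k - 1], 1e-12) * 1.001
    w = [tricube((x - x0) / h) for x in xs]
    sw = sum(w)
    mx = sum(wi * x for wi, x in zip(w, xs)) / sw
    my = sum(wi * y for wi, y in zip(w, ys)) / sw
    sxx = sum(wi * (x - mx) ** 2 for wi, x in zip(w, xs))
    sxy = sum(wi * (x - mx) * (y - my) for wi, x, y in zip(w, xs, ys))
    slope = sxy / sxx if sxx > 0 else 0.0
    return my + slope * (x0 - mx)


def lowess_normalise(a_values, m_values, frac=0.5):
    """Subtract the fitted curve c(A) from every M value."""
    return [m - local_fit(a, a_values, m_values, frac)
            for a, m in zip(a_values, m_values)]


# Hypothetical spots whose log-ratio drifts with intensity (dye bias only).
a_values = [6.0, 7.0, 8.0, 9.0, 10.0, 11.0, 12.0, 13.0]
m_values = [0.1 * (a - 6.0) for a in a_values]
m_norm = lowess_normalise(a_values, m_values)
print([abs(round(m, 6)) for m in m_norm])
```

Applying the same function separately to the spots of each print-tip group would give the print-tip variant mentioned in the last bullet. In practice an established lowess routine from a statistics package would normally be used instead of hand-written code.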

Page 30:

Local regression illustrated

Page 31:

Lowess line

Page 32:

Step 5: Normalisation - spatial corrections

• After within-slide global lowess normalization.
• Likely to be a spatial effect.

[Figure: log-ratios by print-tip group]

Page 33:

Normalization between groups (ctd)

• After print-tip location- and scale-normalization.

[Figure: log-ratios by print-tip group]

normalized values look nice, but .....

Page 34:

Effects of Location Normalisation (example)

[Figure panels: Before; After]

Page 35:

Boxplots of log ratios by pin group

Lowess lines through points from each pin group

Identifying sub-array effects

Page 36:

Step 6: Rescaling (Spread-Normalisation)

Assumption:

All (print-tip-)groups should have the same spread in M

True ratio is $\mu_{ij}$, where i represents different (print-tip-)groups and j represents different spots. Observed is $M_{ij}$, where $M_{ij} = a_i \log(\mu_{ij})$

Robust estimate of $a_i$ is

$\hat{a}_i = \mathrm{MAD}_i / \left( \prod_{k=1}^{I} \mathrm{MAD}_k \right)^{1/I}$, where $\mathrm{MAD}_i = \mathrm{median}_j \{ | M_{ij} - \mathrm{median}_j(M_{ij}) | \}$ and I is the number of groups

Corrected values are calculated as:

$M_{ij} / \hat{a}_i$
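A minimal sketch of the rescaling step, assuming each group's scale factor is its MAD divided by the geometric mean of the MADs of all groups, and that the log ratios are already centred near zero within each group. The group names and values are hypothetical.

```python
import math
from statistics import median


def mad(values):
    """Median absolute deviation from the median."""
    centre = median(values)
    return median(abs(v - centre) for v in values)


def rescale_groups(groups):
    """Divide each group's M values by a_i = MAD_i / geometric mean of all MADs.

    Assumes every group has a MAD greater than zero.
    """
    mads = {name: mad(ms) for name, ms in groups.items()}
    geo_mean = math.exp(sum(math.log(v) for v in mads.values()) / len(mads))
    return {
        name: [m / (mads[name] / geo_mean) for m in ms]
        for name, ms in groups.items()
    }


# Hypothetical print-tip groups with clearly different spreads.
groups = {
    "tip 1": [-0.4, -0.2, 0.0, 0.2, 0.4],
    "tip 2": [-1.6, -0.8, 0.0, 0.8, 1.6],
}
rescaled = rescale_groups(groups)

for name in groups:
    print(name, "MAD before:", round(mad(groups[name]), 3),
          "MAD after:", round(mad(rescaled[name]), 3))
```

Dividing by a ratio of MADs, rather than by the MAD itself, keeps the overall spread of the slide unchanged while making the groups comparable.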

Page 37:

Illustration: print-tip-group Normalisation

Assumption: for every print-tip group, changes roughly symmetric at all intensities.

[Figure labels: glass slide; array of bound cDNA probes; 4x4 blocks = 16 pin groups]

Page 38:

Which normalization to use?

Case 1: A few genes that are likely to change and / or a random large collection of genes (expect as many up as down):

Each slide per se:
– Location: print-tip-group lowess normalization.
– Scale: for all print-tip-groups, adjust MAD to equal the geometric mean of the MADs of all print-tip-groups.

Case 2: Non-random gene collection and / or many genes do change appreciably:
– USE DYE-SWAP APPROACH
– Self-normalization: take the difference of the two log-ratios.
– Check using controls or known information.
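The self-normalization in Case 2 can be sketched for a single gene. The slide only says to take the difference of the two log-ratios; dividing that difference by two, so that the result stays on the scale of one log-ratio, is an assumption of this sketch, as are the numbers and names.

```python
def dye_swap_log_ratio(m_forward, m_swapped):
    """Combine a dye-swap pair into one self-normalised log-ratio.

    Halving the difference (an assumption of this sketch) keeps the result
    on the scale of a single log-ratio.
    """
    return (m_forward - m_swapped) / 2


# Hypothetical gene: true log-ratio 1.0, dye bias 0.3 on both slides.
true_m, dye_bias = 1.0, 0.3
m_forward = true_m + dye_bias     # sample in red, reference in green
m_swapped = -true_m + dye_bias    # dyes exchanged, so the true effect flips sign

print(round(dye_swap_log_ratio(m_forward, m_swapped), 6))
```

The dye bias enters both slides with the same sign and cancels in the difference, whereas the true effect enters with opposite signs and is retained. This only works to the extent that the bias really is the same on both slides, hence the advice to check with controls.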

Page 39:

MVA plots: what to look at ?

How to use the spikes ?

Points:
signal intensity
background
saturation
homogeneity, normalizability
problem diagnosis

Page 40:

Webpage

How to use the plots ?

Use of the different options

Page 41:

Quality control before normalization (?)

Choice of normalization

Page 53:

END

questions