microarray data analysis system (version 2.19 )

MIcroarray Data Analysis System (version 2.19) Wei Liang October 2004


Page 1: MIcroarray Data Analysis System (version  2.19 )

MIcroarray Data Analysis System (version 2.19)

Wei Liang

October 2004

Page 2: MIcroarray Data Analysis System (version  2.19 )

Microarray Data Flow

[Figure: data-flow diagram. Labels: Printer, Scanner, .tiff Image File, Image Analysis, Raw Gene Expression Data, Normalization / Filtering, Normalized Data with Gene Annotation, Expression Analysis, Interpretation of Analysis Results, Data Entry / Management, Gene Annotation, and three databases (MAD, AGED, Others…).]


Page 4: MIcroarray Data Analysis System (version  2.19 )

MIDAS is a Normalization and Filtering tool for microarray data analysis!

Serves as a data pre-processor for clustering analysis (MeV).

Page 5: MIcroarray Data Analysis System (version  2.19 )

Why Normalization and Filtering?

[Figure: two-color labeling and scanning workflow. Labels: Sample1 mRNA, Sample2 mRNA, RT, Cy3, Cy5, Cy3-cDNA, Cy5-cDNA, cDNA array, Cy3 intensity, Cy5 intensity, .tiff Image Files, Raw Data File.]

Sources of systematic experimental error:

• Wavelength dependent
• Intensity dependent
• Uneven hybridization gel
• Print-tip variations
• Background variations
• Image processing algorithm-dependent

Page 6: MIcroarray Data Analysis System (version  2.19 )

Why Normalization and Filtering?

• The hypothesis underlying microarray analysis is that the measured intensities for each arrayed gene represent its relative expression level.

• We use these intensities to identify biologically relevant patterns of expression by comparing measured levels between states on a gene-by-gene basis.

• However, before the levels can be appropriately compared, one generally performs a number of transformations on the data to eliminate questionable or low-quality data, to adjust the measured intensities to facilitate comparisons, and to select those genes that are significantly differentially expressed.

Page 7: MIcroarray Data Analysis System (version  2.19 )

MIDAS data analysis methods

• 8 normalization/transformation methods
• 10 quality control filtering methods
• 3 significant gene identification methods

Methods named on the slide:

Total Intensity normalization
Invalid-intensity checking
LOWESS (Locfit) normalization
Iterative linear regression normalization
Iterative log mean centering normalization
Ratio Statistics normalization
Low intensity filter
Standard deviation regularization
Slice analysis (non-statistical)
In-slide replicates analysis
Flip-dye consistency checking
Ratio Statistics confidence interval checking
Signal/Noise checking
Cross-file-trim
Spot QC flag checking
MA-ANOVA
Cross-slide replicates t-test (statistical)
Cross-slide one-class SAM (statistical)

Page 8: MIcroarray Data Analysis System (version  2.19 )

Graphical scripting language

Page 9: MIcroarray Data Analysis System (version  2.19 )

Graphical scripting language

• Read input files
• Define analysis pipeline and set parameters for each analysis module
• Write output files
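The read, process, write structure above can be illustrated with a minimal Python sketch. Everything in it is hypothetical: the three-column tab-delimited layout, the two toy modules, and the function names are assumptions made for illustration and do not describe MIDAS's own file formats or scripting interface.

```python
import csv
import io

def read_spots(handle):
    """Read tab-delimited rows of (gene, cy3, cy5); this column layout is hypothetical."""
    return [(g, float(a), float(b)) for g, a, b in csv.reader(handle, delimiter="\t")]

def low_intensity_filter(spots, min_intensity):
    """Toy module: drop spots where either channel is below min_intensity."""
    return [s for s in spots if s[1] >= min_intensity and s[2] >= min_intensity]

def scale_cy3(spots, factor):
    """Toy module: multiply every Cy3 intensity by a constant factor."""
    return [(g, a * factor, b) for g, a, b in spots]

def run_pipeline(spots, steps):
    """Apply each (module, parameters) step in order."""
    for module, params in steps:
        spots = module(spots, **params)
    return spots

def write_spots(spots, handle):
    """Write the processed spots back out as tab-delimited text."""
    csv.writer(handle, delimiter="\t", lineterminator="\n").writerows(spots)

# In-memory stand-ins for the input and output files.
source = io.StringIO("geneA\t1200\t900\ngeneB\t40\t55\n")
steps = [
    (low_intensity_filter, {"min_intensity": 100.0}),
    (scale_cy3, {"factor": 0.5}),
]
result = run_pipeline(read_spots(source), steps)
out = io.StringIO()
write_spots(result, out)
```

Keeping each module as a plain function with its own parameter dictionary mirrors the idea of wiring modules together and setting parameters per module.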


Page 11: MIcroarray Data Analysis System (version  2.19 )

Sample data

Pair #   1st file name       2nd file name
1        NFE005d0001.mev     NFE005d00020.mev
2        NFE005d0002.mev     NFE005d00021.mev
3        NFE005d0003.mev     NFE005d00022.mev
4        NFE005d0004.mev     NFE005d00023.mev
5        NFE005d0005.mev     NFE005d00024.mev
6        NFE005d0006.mev     NFE005d00025.mev
7        NFE005d0007.mev     NFE005d00026.mev
9        NFE005d0008.mev     NFE005d00027.mev
10       NFE005d0009.mev     NFE005d00028.mev
11       NFE005d00010.mev    NFE005d00029.mev
12       NFE005d00011.mev    NFE005d00030.mev
13       NFE005d00012.mev    NFE005d00031.mev
14       NFE005d00013.mev    NFE005d00032.mev
15       NFE005d00014.mev    NFE005d00033.mev
16       NFE005d00015.mev    NFE005d00034.mev
17       NFE005d00016.mev    NFE005d00035.mev
18       NFE005d00017.mev    NFE005d00036.mev
19       NFE005d00018.mev    NFE005d00037.mev
20       NFE005d00019.mev    NFE005d00038.mev

Page 12: MIcroarray Data Analysis System (version  2.19 )

LOWESS (Locfit) normalization

[Figure A: R-I plot, logRatio vs. logIntensityProduct. x-axis: log(Cy3*Cy5), 7 to 14; y-axis: -3 to 3; SD = 0.346.]

• Observations

1. Tilted tails at the low-intensity and high-intensity ends
2. Mean not centered at 0 (intensity dependent)
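As a small illustration of how each spot is placed on an R-I plot, the sketch below computes the two coordinates from a (Cy3, Cy5) intensity pair. The log bases (base 10 for the intensity product, base 2 for the ratio) follow the expressions used elsewhere in this deck; the function name and the sample intensities are made up.

```python
import math

def ri_coordinates(cy3, cy5):
    """Return (x, y) = (log10(Cy3*Cy5), log2(Cy5/Cy3)) for one spot."""
    if cy3 <= 0 or cy5 <= 0:
        raise ValueError("intensities must be positive")
    x = math.log10(cy3 * cy5)   # logIntensityProduct
    y = math.log2(cy5 / cy3)    # logRatio
    return x, y

# Made-up (Cy3, Cy5) pairs, for illustration only.
spots = [(1000.0, 1000.0), (500.0, 2000.0), (4000.0, 1000.0)]
points = [ri_coordinates(cy3, cy5) for cy3, cy5 in spots]
```

A spot with equal intensities in both channels lands on y = 0; a fourfold excess of Cy5 lands on y = 2.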

Page 13: MIcroarray Data Analysis System (version  2.19 )

LOWESS (Locfit) normalization

[Figure A: R-I plot with Gene X marked and annotated with an Exp factor and a Bio factor. x-axis: log(Cy3*Cy5), 7 to 14; y-axis: -3 to 3; SD = 0.346.]

• If Cy3 and Cy5 are equally expressed, log2(Cy5/Cy3) = 0

• Two factors contribute to the apparent up-regulation of gene X:
1. Biological factors (the ones we are interested in)
2. Experimental factors, e.g. different sensitivity to the red and green lasers (the ones we are NOT interested in and want to remove)

Page 14: MIcroarray Data Analysis System (version  2.19 )

LOWESS (Locfit) normalization

[Figure A: R-I plot with Gene X, Exp factor, and Bio factor annotated. x-axis: log(Cy3*Cy5), 7 to 14; y-axis: -3 to 3; SD = 0.346.]

We need a way to extract the experimental factors.

Approach: Assume that similar experimental factors apply to genes that lie close to each other in the logProd-logRatio plot. Predict the Exp factor from a group of locally neighboring data points, which is equivalent to a curve-fitting problem.

Page 15: MIcroarray Data Analysis System (version  2.19 )

LOWESS (Locfit) normalization

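Estimated values of log2(Cy5/Cy3) as a function of log10(Cy3*Cy5), with $x_i = \log_{10}(\mathrm{Cy3}_i \cdot \mathrm{Cy5}_i)$ and $y_i = \log_2(\mathrm{Cy5}_i / \mathrm{Cy3}_i)$:

• Local linear regression model

$$y_i = \beta_0 + \beta_1 x_i + \varepsilon_i$$

• Tri-cube weight function $w(x_i)$

• Least squares

$$\sum_i w(x_i)\,\bigl(y_i - y(x_i)\bigr)^2 \to \min, \qquad \hat{\beta} = (X'WX)^{-1} X'WY$$

[Figure A: R-I plot. x-axis: log(Cy3*Cy5), 7 to 14; y-axis: -3 to 3; SD = 0.346.]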
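To make the three ingredients concrete (local linear model, tri-cube weights, weighted least squares), here is a minimal pure-Python sketch of a LOWESS-style fit. It is not the Locfit implementation that MIDAS names: the `span` parameter, the nearest-neighbour bandwidth rule, and the absence of robustness iterations are all simplifying assumptions.

```python
def tricube(u):
    """Tri-cube weight: (1 - |u|^3)^3 for |u| < 1, otherwise 0."""
    u = abs(u)
    return (1.0 - u ** 3) ** 3 if u < 1.0 else 0.0

def lowess_fit(xs, ys, span=0.4):
    """Fitted y at every x from a tri-cube weighted local straight line."""
    n = len(xs)
    k = max(2, int(round(span * n)))  # neighbours considered per local fit
    fitted = []
    for x0 in xs:
        # Bandwidth: distance from x0 to its k-th nearest point (x0 itself included).
        dists = sorted(abs(x - x0) for x in xs)
        h = dists[k - 1] or 1e-12
        w = [tricube((x - x0) / h) for x in xs]
        # Closed-form weighted least squares for a straight line.
        sw = sum(w)
        mx = sum(wi * x for wi, x in zip(w, xs)) / sw
        my = sum(wi * y for wi, y in zip(w, ys)) / sw
        sxx = sum(wi * (x - mx) ** 2 for wi, x in zip(w, xs))
        sxy = sum(wi * (x - mx) * (y - my) for wi, x, y in zip(w, xs, ys))
        slope = sxy / sxx if sxx > 0 else 0.0
        fitted.append(my + slope * (x0 - mx))
    return fitted

# Made-up data lying exactly on a line, so the local fits reproduce it.
xs = [float(i) for i in range(10)]
ys = [2.0 * x + 1.0 for x in xs]
curve = lowess_fit(xs, ys, span=0.5)
```

In the normalization setting, `xs` would hold log10(Cy3*Cy5) and `ys` would hold log2(Cy5/Cy3), so `curve` plays the role of the estimated y(xi).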

Page 16: MIcroarray Data Analysis System (version  2.19 )

LOWESS (Locfit) normalization

Use the estimated curve y(xi) to correct raw data

[Figure A: R-I plot with Gene X marked; annotations: y(xi) = Exp factor, Bio factor. x-axis: log(Cy3*Cy5), 7 to 14; y-axis: -3 to 3; SD = 0.346.]

$$\log_2(R_i'/G_i') = \log_2(R_i/G_i) - y(x_i)$$

$$\log_2(R_i'/G_i') = \log_2(R_i/G_i) - \log_2 2^{y(x_i)}$$

$$\log_2(R_i'/G_i') = \log_2\!\left(\frac{R_i}{G_i} \cdot \frac{1}{2^{y(x_i)}}\right)$$

$$R_i' = R_i$$

$$G_i' = G_i \cdot 2^{y(x_i)}$$
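The correction above amounts to leaving R unchanged and multiplying G by 2 raised to the fitted value. A minimal sketch, with a hypothetical function name and made-up numbers:

```python
import math

def lowess_correct(r, g, y_fit):
    """Return (R', G') with R' = R and G' = G * 2**y_fit."""
    return r, g * 2.0 ** y_fit

# Made-up spot: raw log2 ratio is 2.0 and the fitted curve value is 1.0.
r_new, g_new = lowess_correct(800.0, 200.0, 1.0)
corrected_log_ratio = math.log2(r_new / g_new)  # raw log ratio minus y_fit
```

Here the raw log2 ratio of 2.0 drops to 1.0 after the correction, which is log2(R/G) - y(xi).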

Page 17: MIcroarray Data Analysis System (version  2.19 )

LOWESS (Locfit) normalization

[Figure B: LOWESS-corrected R-I plot. x-axis: log(Cy5*Cy3), 7 to 14; y-axis: -3 to 3; annotations: SD = 0.346, SD = 0.338.]

Page 18: MIcroarray Data Analysis System (version  2.19 )

Standard deviation regularization

Assumption: Spots within each block (or each slide) should have the same spread of log2(Cy5/Cy3) values as spots in the other blocks (or slides).

SD-Reg scales the (Cy3, Cy5) intensity pair for each spot so that the spot sets within each block or each slide will have the same standard deviation as other blocks or slides.

Page 19: MIcroarray Data Analysis System (version  2.19 )

Standard deviation regularization

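• Let $a_{ij}$ be the raw log ratio for the jth spot in the ith block (or slide):

$$a_{ij} = \log_2 \frac{\mathrm{Cy5}_{ij}}{\mathrm{Cy3}_{ij}}$$

• Let $a'_{ij}$ be the scaled log ratio for the jth spot in the ith block (or slide):

$$a'_{ij} = \frac{a_{ij}}{\sqrt{\dfrac{\sum_{j=1}^{N_i} (a_{ij} - \bar{a}_i)^2}{\sqrt[M]{\prod_{k=1}^{M} \sum_{j=1}^{N_k} (a_{kj} - \bar{a}_k)^2}}}}$$

where $N_i$ denotes the number of genes in the ith block (or ith slide), $M$ denotes the number of blocks (or slides), and $\bar{a}_i$ denotes the log ratio mean of the ith block (or ith slide).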
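A minimal sketch of the scaling idea, assuming the common target spread is the geometric mean of the per-block standard deviations. That choice of target, the use of the population standard deviation, and the function name are assumptions made for illustration; every block must contain at least two distinct values.

```python
import math

def sd_regularize(blocks):
    """Scale each block's log ratios so all blocks share one standard deviation."""
    sds = []
    for block in blocks:
        mean = sum(block) / len(block)
        sds.append(math.sqrt(sum((a - mean) ** 2 for a in block) / len(block)))
    # Common target spread: geometric mean of the block SDs (assumed here).
    target = math.exp(sum(math.log(s) for s in sds) / len(sds))
    return [[a * target / s for a in block] for block, s in zip(blocks, sds)]

# Made-up log ratios: the second block is twice as spread out as the first.
blocks = [[-1.0, 0.0, 1.0], [-2.0, 0.0, 2.0]]
scaled = sd_regularize(blocks)
```

After scaling, both blocks have the same spread, sitting between the two original spreads.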

Page 20: MIcroarray Data Analysis System (version  2.19 )

Standard deviation regularization

Page 21: MIcroarray Data Analysis System (version  2.19 )

Flip dye replicates consistency filter

• The intensities in the file pair are flipped, i.e. R1/G1 ~ G2/R2, or R1 ~ G2 and G1 ~ R2

[Figure: paired tables of (R1, G1) and (R2, G2) intensities for Gene1 through Gene8.]

• Flip dye experiments help reduce random error

Page 22: MIcroarray Data Analysis System (version  2.19 )

Flip dye replicates consistency filter

• Calculate expression levels for all genes in the flip-dye pair

• Filter out genes with inconsistent expression levels between flip-dye replicates

• For genes that pass the consistency check, take the geometric mean of the corresponding intensities from the replicate pair

How is consistency measured between replicates?

Page 23: MIcroarray Data Analysis System (version  2.19 )

Flip dye replicates consistency filter

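[Table: one gene measured in File 1 (R1, G1) and File 2 (R2, G2).]

$$\frac{R_1}{G_1} = \frac{G_2}{R_2} \quad\Longleftrightarrow\quad \frac{R_1 R_2}{G_1 G_2} = 1$$

100% consistency:

$$\log_2 \frac{R_1 R_2}{G_1 G_2} = 0$$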

Page 24: MIcroarray Data Analysis System (version  2.19 )

Flip dye replicates consistency Filter

• SD cut vs. Threshold cut

SD cut: Regardless of the dataset, always cuts the same percentage for the same SD cut.

Threshold cut: The percentage cut depends on the specified log-ratio consistency range, for example:

$$-1 < \log_2 \frac{R_1 R_2}{G_1 G_2} < 1 \quad\Longleftrightarrow\quad \frac{1}{2} < \frac{R_1 R_2}{G_1 G_2} < 2$$
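The two cuts can be sketched as follows. The consistency value is log2(R1*R2 / (G1*G2)), which is 0 for a perfectly consistent flip-dye pair. The function names, the default limits, the SD cut being taken around the mean consistency value, and the pairing used for the geometric mean (R1 with G2, G1 with R2) are assumptions made for illustration.

```python
import math

def consistency(r1, g1, r2, g2):
    """log2(R1*R2 / (G1*G2)); 0 means the flipped pair agrees perfectly."""
    return math.log2((r1 * r2) / (g1 * g2))

def threshold_cut(pairs, low=-1.0, high=1.0):
    """Keep pairs whose consistency value lies strictly inside (low, high)."""
    return [p for p in pairs if low < consistency(*p) < high]

def sd_cut(pairs, n_sd=2.0):
    """Keep pairs within n_sd standard deviations of the mean consistency value."""
    vals = [consistency(*p) for p in pairs]
    mean = sum(vals) / len(vals)
    sd = math.sqrt(sum((v - mean) ** 2 for v in vals) / len(vals))
    return [p for p, v in zip(pairs, vals) if abs(v - mean) <= n_sd * sd]

def merge_pair(r1, g1, r2, g2):
    """Geometric means of corresponding channels: (R1, G2) and (G1, R2)."""
    return math.sqrt(r1 * g2), math.sqrt(g1 * r2)

# Made-up (R1, G1, R2, G2) tuples: the first is consistent, the second is not.
pairs = [(1000.0, 500.0, 500.0, 1000.0), (1000.0, 500.0, 2000.0, 500.0)]
kept = threshold_cut(pairs)
merged = [merge_pair(*p) for p in kept]
```

With the default limits, the second pair has a consistency value of 3 and is dropped, while the first is kept and merged.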


Page 26: MIcroarray Data Analysis System (version  2.19 )

Slice Analysis filter

• Remove genes with z-scores outside a range of interest


Page 28: MIcroarray Data Analysis System (version  2.19 )

Slice Analysis filter

[Figure B: R-I plot. x-axis: log(Cy5*Cy3), 7 to 14; y-axis: -3 to 3; annotations: SD = 0.346, SD = 0.338.]

• Define a slice window
• Slide the window along the log(IntensityProduct) axis
• Calculate logRatioMean and logRatioSD of the data points within each slice window
• Calculate the Z-score of each data point: Z-score = (logRatio - logRatioMean) / logRatioSD
• Trim data with Z-scores outside the range of interest
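The steps above can be sketched in a few lines of Python. As a simplification, the window here is centered on each data point in turn rather than stepped along the axis, and which Z-score range counts as "of interest" is left to a caller-supplied predicate; the function names and data are made up.

```python
import math

def slice_z_scores(points, window=1.0):
    """points: list of (logIntensityProduct, logRatio). One Z-score per point."""
    zs = []
    for x0, y0 in points:
        # Log ratios of all points inside the window centered on x0.
        ys = [y for x, y in points if abs(x - x0) <= window / 2.0]
        mean = sum(ys) / len(ys)
        sd = math.sqrt(sum((y - mean) ** 2 for y in ys) / len(ys))
        zs.append((y0 - mean) / sd if sd > 0 else 0.0)
    return zs

def select_by_z(points, zs, keep):
    """Keep the points whose Z-score satisfies the caller's predicate."""
    return [p for p, z in zip(points, zs) if keep(z)]

# Made-up points that all fall inside one window; the last is an outlier.
points = [(10.0, 0.0), (10.1, 0.1), (10.2, -0.1), (10.3, 3.0)]
zs = slice_z_scores(points, window=1.0)
selected = select_by_z(points, zs, keep=lambda z: abs(z) > 1.5)
```

Because the mean and SD are local to each window, a log ratio that is unremarkable at low intensity can still stand out at high intensity, where the spread is typically smaller.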

Page 29: MIcroarray Data Analysis System (version  2.19 )

Slice Analysis filter

[Figure: two plots of log2(Cy5/Cy3) vs. log(Cy3*Cy5). x-axis: 7 to 14 in both; y-axis: -4 to 4 in the first and -8 to 8 in the second.]

Page 30: MIcroarray Data Analysis System (version  2.19 )

Analysis packaging

myAnalysis.prj

Page 31: MIcroarray Data Analysis System (version  2.19 )

MIDAS graphing

Page 32: MIcroarray Data Analysis System (version  2.19 )

MIDAS graphing

R-I plot (.prc)

Box plot (.box)

FlipDye Diagnostic plot (.rrc)

Intensity plot (.ity, .lty)

Z-score Distribution plot (.his)

SAM plot (.sam)

Page 33: MIcroarray Data Analysis System (version  2.19 )

MIDAS data viewer

Page 34: MIcroarray Data Analysis System (version  2.19 )

Statistically significant gene identification methods

Two methods implemented in this release of MIDAS:

• Cross-slide replicates one-class t-test

• Cross-slide replicates one-class SAM

Page 35: MIcroarray Data Analysis System (version  2.19 )

SAM (Significance Analysis of Microarrays)

A statistical technique for finding significant genes in a set of microarray experiments.

Reference:

Tusher, V.G., R. Tibshirani and G. Chu. 2001. Significance analysis of microarrays applied to the ionizing radiation response. Proceedings of the National Academy of Sciences USA 98: 5116-5121.

Designs:

• two-class unpaired
• two-class paired
• multi-class unpaired
• censored survival
• one-class (available in this release)

Page 36: MIcroarray Data Analysis System (version  2.19 )

SAM (Significance Analysis of Microarrays)

One-class SAM:

Identify genes whose mean expression across experiments differs from a user-specified mean.

• Assign a score (d) to each gene based on its change in expression relative to the standard deviation of repeated measurements for that gene

• Genes with scores greater than a threshold (Δ) are deemed potentially significant

• For these potentially significant genes, estimate the proportion likely to have been identified by chance, i.e. the False Discovery Rate (FDR)

• The goal is to pick a set of differentially expressed genes with an FDR the user finds acceptable
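A heavily simplified sketch of the one-class idea follows. It scores each gene as d = (mean - mu0) / (s + s0), where s is the standard error across replicates, and estimates an FDR for a fixed cutoff on |d| by randomly flipping the sign of whole experiments. The full SAM procedure compares ordered observed scores with their expected values and tunes Δ on that difference; that part is not reproduced here. The fixed `s0`, the cutoff-based calling rule, and all names are assumptions.

```python
import math
import random

def d_scores(data, mu0=0.0, s0=0.1):
    """data: one list of replicate log ratios per gene (same length for all genes)."""
    scores = []
    for reps in data:
        n = len(reps)
        mean = sum(reps) / n
        # Standard error of the mean across the replicates.
        s = math.sqrt(sum((v - mean) ** 2 for v in reps) / (n * (n - 1)))
        scores.append((mean - mu0) / (s + s0))
    return scores

def estimate_fdr(data, cutoff, mu0=0.0, s0=0.1, n_perm=200, seed=0):
    """Median number of calls under sign-flipped data, divided by observed calls."""
    rng = random.Random(seed)
    observed = sum(1 for d in d_scores(data, mu0, s0) if abs(d) > cutoff)
    if observed == 0:
        return 0.0
    n_reps = len(data[0])
    false_calls = []
    for _ in range(n_perm):
        # Flip each experiment (column) around mu0 with probability 1/2.
        signs = [rng.choice((-1, 1)) for _ in range(n_reps)]
        flipped = [[mu0 + s * (v - mu0) for s, v in zip(signs, reps)] for reps in data]
        false_calls.append(sum(1 for d in d_scores(flipped, mu0, s0) if abs(d) > cutoff))
    false_calls.sort()
    median_false = false_calls[len(false_calls) // 2]
    return min(1.0, median_false / observed)

# Made-up replicate log ratios for three genes across four slides.
data = [
    [1.0, 1.2, 0.9, 1.1],
    [0.1, -0.2, 0.05, -0.1],
    [-0.05, 0.1, 0.0, -0.1],
]
scores = d_scores(data)
fdr = estimate_fdr(data, cutoff=3.0)
```

Raising the cutoff calls fewer genes and usually lowers the estimated FDR, which is the trade-off the user adjusts.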

Page 37: MIcroarray Data Analysis System (version  2.19 )

SAM (Significance Analysis of Microarrays)

[Figure labels: Δ adjustment, FDR, positively significant genes.]

Page 38: MIcroarray Data Analysis System (version  2.19 )

Automated report generation

Page 39: MIcroarray Data Analysis System (version  2.19 )

Automated report generation

Page 40: MIcroarray Data Analysis System (version  2.19 )

TM4 MIDAS web page

http://www.tigr.org/software/tm4/midas.html

http://www.tm4.org/midas.html