Application of available statistical tools


Upload: hisano

Post on 24-Feb-2016


TRANSCRIPT

Page 1: Application of available statistical tools

Obstacles that thwart a successful analysis of microarray data

Application of available statistical tools

Development of specific, more appropriate statistical tools for use with microarrays

Functional annotation of results

Inadequate computer skills to handle large datasets

Intimacy with nature (strengths and deficiencies) of the raw data

Facile use of computer operating system is absent

Biological interpretation

Biology experiment complete

Thorough mining of the data for useful information

Page 2: Application of available statistical tools

Benefits of the Gene Array Approach

1. Interrogates thousands of genes (12,000; 55,000; 28,869).

2. Versatile with respect to tissues.

3. Recently expanded beyond major biomedical research models.

4. Asks which genes are affected by a treatment.

5. Equivalent to 35,000 northern blots overnight.

6. Time course experiments gain immense value.

Page 3: Application of available statistical tools

GeneChip

Page 4: Application of available statistical tools

What is covered in this course?

Total RNA

Hyb. cRNA cocktail

Hybridize to Affy arrays

Generate Affy .dat file

Output as Affy .chp file

Export as text file

Data mining

Pattern mining

Pathway analysis

Illumina platform at CCF facility

Case/UH Core Facility

Page 5: Application of available statistical tools

WT Expression Array Design

5’ and 3’ ends

Perfect Match (PM): 25-mer DNA oligo

Perfect Match / Mismatch

Only PM used

Probe set (<= 26 probes)

PM probe cell: 11 µm x 11 µm

Validate using BLAST and Tm

Page 6: Application of available statistical tools

cRNA preparation

Total RNA (1-5 µg), polyadenylated (AAAAAAAAA)

cDNA strand 1 synthesis: SS II reverse transcriptase, primed by oligo(dT) (TTTTTTTTT) carrying the T7 RNA pol. promoter

cDNA strand 2 synthesis: E. coli DNA pol. I

IVT cRNA synthesis: T7 RNA pol. amplifies and labels transcripts with biotin

Fragmented cRNA

cRNA is now ready for hybridization to test chip

1. Conversion to cRNA

2. Amplification (linear)

3. Labelling (biotin)

Page 7: Application of available statistical tools

Hybridization cocktail

Affymetrix Array Chip

Sample is added to a hybridization cocktail along with spiked control transcripts and is loaded onto an array chip.

Chip is placed in a hybridization oven and incubated overnight.

Chips are placed in the fluidics station, where they are washed, stained and washed again (2.5 hours).

After staining, the signal intensities are measured with a laser scanner (15 min).

Data is acquired by the computer as soon as the scan has been completed.

Page 8: Application of available statistical tools
Page 9: Application of available statistical tools

The first image is “sample1.dat”. Note the pixel-to-pixel variation within a probe cell.

A “*.cel” file is automatically generated when the “*.dat” image first appears on the screen. Note that this derivative file has homogeneous signal intensity within its probe cells.
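
The step from a pixel-level image to one value per probe cell can be sketched as follows. This is a minimal illustration, not the scanner software's actual algorithm: the function names `percentile` and `summarize_cell`, the choice to ignore a one-pixel border, the use of the 75th percentile, and the 5 x 5 pixel block are all assumptions made for the example.

```python
def percentile(values, q):
    """Return the q-th percentile (0-100) using linear interpolation."""
    ordered = sorted(values)
    if not ordered:
        raise ValueError("no values")
    pos = (len(ordered) - 1) * q / 100.0
    lo = int(pos)
    hi = min(lo + 1, len(ordered) - 1)
    frac = pos - lo
    return ordered[lo] + (ordered[hi] - ordered[lo]) * frac


def summarize_cell(pixels, border=1, q=75):
    """Collapse a 2-D grid of pixel intensities into one cell intensity.

    pixels: list of rows (each a list of numbers) for one probe cell.
    border: number of edge pixels ignored on every side (assumed value).
    q: percentile used as the summary statistic (assumed value).
    """
    inner_rows = pixels[border:len(pixels) - border]
    inner = [v for row in inner_rows for v in row[border:len(row) - border]]
    return percentile(inner, q)


# Hypothetical 5 x 5 pixel block for a single probe cell:
# a dim border around a bright, but not uniform, interior.
cell = [
    [10, 12, 11, 13, 10],
    [11, 40, 44, 42, 12],
    [12, 43, 48, 41, 11],
    [10, 45, 46, 47, 13],
    [11, 12, 10, 12, 11],
]
cell_intensity = summarize_cell(cell)
```

Every pixel in the cell is then represented by the single value `cell_intensity`, which is why the derived file looks homogeneous within each probe cell.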

Page 10: Application of available statistical tools

How do we get the individual gene signals using RMA in EC?

[Figure: three genes (Gene 1 to Gene 3), each measured by four probes (g1p1 to g3p4), in Sample 1, Sample 2 and Sample 3. The probes are shown first grouped by gene and then reordered within each sample, followed by an "Average" step.]
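
RMA is generally described as background correction, quantile normalization across arrays, and summarization of each probe set on the log2 scale. The sketch below illustrates only the quantile normalization step, which is one plausible reading of the "Average" label in the figure: probes are ranked within each sample and the values at each rank are averaged across samples. The function name `quantile_normalize` and the toy intensities are assumptions, and this is not Expression Console's own code.

```python
def quantile_normalize(columns):
    """Quantile-normalize equal-length columns of probe intensities.

    columns: list of lists, one list per sample, same probe order in each.
    Returns new columns in which every sample holds the same set of values.
    Ties are broken by probe position, a simplification for this sketch.
    """
    n = len(columns[0])
    if any(len(col) != n for col in columns):
        raise ValueError("all samples need the same number of probes")

    # For each sample, probe indices ordered from lowest to highest intensity.
    orders = [sorted(range(n), key=lambda i, c=col: c[i]) for col in columns]

    # Average across samples at each rank.
    rank_means = [
        sum(col[order[r]] for col, order in zip(columns, orders)) / len(columns)
        for r in range(n)
    ]

    # Write each rank's mean back to the probe that held that rank.
    normalized = [[0.0] * n for _ in columns]
    for s, order in enumerate(orders):
        for r, probe_index in enumerate(order):
            normalized[s][probe_index] = rank_means[r]
    return normalized


# Hypothetical intensities for four probes in three samples.
sample1 = [5.0, 2.0, 3.0, 4.0]
sample2 = [4.0, 1.0, 4.5, 2.0]
sample3 = [3.0, 4.0, 6.0, 8.0]
norm = quantile_normalize([sample1, sample2, sample3])
```

After normalization each sample has an identical distribution of values; only the assignment of those values to probes differs between samples.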

Page 11: Application of available statistical tools

[Figure: probes g1p1 to g3p4 for Gene 1 to Gene 3 in Sample 1, Sample 2 and Sample 3, shown grouped by gene and then reordered within each sample.]

Page 12: Application of available statistical tools

[Figure: probes g1p1 to g3p4 for Gene 1 to Gene 3 in Sample 1, Sample 2 and Sample 3, shown grouped by gene and then reordered within each sample, with the resulting signal values below.]

          Sample 1   Sample 2   Sample 3
Gene 1    216        50         150
Gene 2    150        300        120
Gene 3    95         112        110
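
A table with one signal per gene and sample is the end product of probe-set summarization. The sketch below shows median polish, the summarization step usually associated with RMA, applied to a hypothetical probe set. The function names, the iteration limits and the toy intensities are assumptions for illustration, and the software used in the course may differ in detail.

```python
from math import log2
from statistics import median


def median_polish(matrix, max_iter=10, tol=1e-6):
    """Tukey median polish on a rows x columns matrix (list of row lists).

    Returns (overall, row_effects, col_effects, residuals).
    """
    n_rows, n_cols = len(matrix), len(matrix[0])
    resid = [row[:] for row in matrix]
    overall = 0.0
    row_eff = [0.0] * n_rows
    col_eff = [0.0] * n_cols

    for _ in range(max_iter):
        # Sweep row medians out of the residuals.
        row_med = [median(r) for r in resid]
        for i in range(n_rows):
            resid[i] = [v - row_med[i] for v in resid[i]]
            row_eff[i] += row_med[i]
        shift = median(col_eff)
        col_eff = [c - shift for c in col_eff]
        overall += shift

        # Sweep column medians out of the residuals.
        col_med = [median(resid[i][j] for i in range(n_rows))
                   for j in range(n_cols)]
        for i in range(n_rows):
            resid[i] = [resid[i][j] - col_med[j] for j in range(n_cols)]
        col_eff = [c + m for c, m in zip(col_eff, col_med)]
        shift = median(row_eff)
        row_eff = [r - shift for r in row_eff]
        overall += shift

        # Stop once a full sweep no longer changes anything noticeable.
        if max(abs(m) for m in row_med + col_med) < tol:
            break
    return overall, row_eff, col_eff, resid


def summarize_probe_set(intensities):
    """One log2 signal per sample from a probes x samples intensity matrix."""
    logged = [[log2(v) for v in row] for row in intensities]
    overall, _, col_eff, _ = median_polish(logged)
    return [overall + c for c in col_eff]


# Hypothetical probe set: three probes (rows) in three samples (columns).
probe_set = [
    [64, 128, 256],
    [128, 256, 512],
    [256, 512, 1024],
]
signals = summarize_probe_set(probe_set)
```

The row effects absorb differences between probes, so the per-sample signal reflects the sample rather than whichever probe happens to be brightest.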

Page 13: Application of available statistical tools

Data manipulation is essential prior to submission of results to third-party clustering and analytical programs

Third-party programs: SOMs, hierarchical clustering, plaid clustering

Probe set   Pairs   Pairs used   Pos   Neg   Ave Diff   Abs. Call
YDL200C     20      18           16    2     2378       P
YDL200D     20      19           16    3     237        M
YDM167A     20      14           7     7     5003       A

Diff Call: NC, I, MI, MD, D

Fold Change: 10.5, 4.9, 15, -11.8, -3.7
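
As an illustration of the kind of manipulation meant here, the sketch below filters probe sets on their absolute call, floors low signals and log2-transforms the rest before they would be passed to a clustering program. The three rows reuse values from this slide, but the pairing of the M and A calls with the second and third probe sets is assumed from their order on the slide. The filter rule, the floor of 20 and the function names are hypothetical choices, and the signed fold-change helper shows one common convention that is consistent with the negative values on the slide, not necessarily the one the software applies.

```python
from math import log2

# (probe set, Ave Diff, Abs. Call); call pairing for rows 2 and 3 is assumed.
rows = [
    ("YDL200C", 2378, "P"),
    ("YDL200D", 237, "M"),
    ("YDM167A", 5003, "A"),
]


def prepare_for_clustering(rows, keep_calls=("P",), floor=20.0):
    """Keep probe sets with an accepted call, floor the signal, take log2."""
    prepared = {}
    for probe_set, signal, call in rows:
        if call not in keep_calls:
            continue  # drop probe sets whose call is not in keep_calls
        prepared[probe_set] = log2(max(float(signal), floor))
    return prepared


def signed_fold_change(baseline, experiment):
    """Ratio >= 1 for increases, negative reciprocal for decreases."""
    ratio = experiment / baseline
    return ratio if ratio >= 1 else -1.0 / ratio


prepared = prepare_for_clustering(rows)
lenient = prepare_for_clustering(rows, keep_calls=("P", "M"))
```

Which calls to keep is a judgment call: a strict filter such as `keep_calls=("P",)` removes noise but can also discard genes that are absent in one condition and induced in another.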

Page 14: Application of available statistical tools
Page 15: Application of available statistical tools

SOMs

Self-organizing maps (SOMs) are a popular method for detecting patterns in large data sets.
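
To make the idea concrete, here is a minimal sketch of a one-dimensional SOM in plain Python. It is not the implementation of any particular clustering package: the function names, the number of nodes, the learning-rate and radius schedules, and the six expression profiles are all assumptions chosen for illustration.

```python
import math
import random


def best_matching_unit(nodes, vec):
    """Index of the node whose weights are closest (squared Euclidean) to vec."""
    dists = [sum((w - v) ** 2 for w, v in zip(node, vec)) for node in nodes]
    return dists.index(min(dists))


def train_som(data, n_nodes=3, epochs=200, lr0=0.5, radius0=1.0, seed=0):
    """Train a 1-D self-organizing map on a list of equal-length vectors."""
    rng = random.Random(seed)
    # Initialise each node's weight vector from a randomly chosen data point.
    nodes = [list(rng.choice(data)) for _ in range(n_nodes)]
    for epoch in range(epochs):
        # Learning rate and neighbourhood radius both shrink over time.
        decay = 1.0 - epoch / epochs
        lr = lr0 * decay
        radius = max(radius0 * decay, 1e-3)
        for vec in rng.sample(data, len(data)):
            bmu = best_matching_unit(nodes, vec)
            for j, node in enumerate(nodes):
                # Gaussian neighbourhood on the 1-D node grid.
                influence = math.exp(-((j - bmu) ** 2) / (2 * radius ** 2))
                for k in range(len(node)):
                    node[k] += lr * influence * (vec[k] - node[k])
    return nodes


# Hypothetical expression profiles (three time points per gene).
profiles = {
    "geneA": [0.1, 1.0, 2.0],
    "geneB": [0.0, 1.1, 2.1],
    "geneC": [2.0, 1.0, 0.1],
    "geneD": [2.1, 0.9, 0.0],
    "geneE": [1.0, 1.0, 1.0],
    "geneF": [1.1, 0.9, 1.0],
}
nodes = train_som(list(profiles.values()))
clusters = {name: best_matching_unit(nodes, vec) for name, vec in profiles.items()}
```

Each gene ends up assigned to the node it most resembles, and neighbouring nodes tend to hold similar profiles because every update also pulls the winner's neighbours toward the input.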

Page 16: Application of available statistical tools

How do we get the individual gene signals using RMA in EC?

[Figure: three genes (Gene 1 to Gene 3), each measured by four probes (g1p1 to g3p4), in Sample 1, Sample 2 and Sample 3. The probes are shown first grouped by gene and then reordered within each sample, followed by an "Average" step.]

Page 17: Application of available statistical tools

[Figure: probes g1p1 to g3p4 for Gene 1 to Gene 3 in Sample 1, Sample 2 and Sample 3, shown grouped by gene and then reordered within each sample.]

Page 18: Application of available statistical tools

[Figure: probes g1p1 to g3p4 for Gene 1 to Gene 3 in Sample 1, Sample 2 and Sample 3, shown grouped by gene and then reordered within each sample, with the resulting signal values below.]

          Sample 1   Sample 2   Sample 3
Gene 1    216        50         150
Gene 2    150        300        120
Gene 3    95         112        110

Page 19: Application of available statistical tools

The information content of the human genome

ENCODE Consortium (Nature 2007 Vol 447: 799-816)

The Human Genome

7% not transcribed

Protein-coding genes: 1% ORF, 1% UTR, 35-40% intron

Non-protein-coding RNAs: small RNAs, functional long ncRNAs

~10%

Page 20: Application of available statistical tools

The increase in complexity among eukaryotes is concomitant with an increase in the ratio of non-coding to coding DNA

Mattick, 2007

Page 21: Application of available statistical tools

Obstacles that thwart a successful analysis of microarray data

Application of available statistical tools

Development of specific, more appropriate statistical tools for use with microarrays

Functional annotation of results

Inadequate computer skills to handle large datasets

Intimacy with nature (strengths and deficiencies) of the raw data

Facile use of computer operating system is absent

Biological interpretation

Biology experiment complete

Thorough mining of the data for useful information