
Page 1: Data Processing Technologies for DNA Microarray Nini Rao School of Life Science And Technology UESTC14/11/2004

Data Processing Technologies for DNA Microarray

Nini Rao

School of Life Science and Technology

UESTC

14/11/2004


• Introduction

• The Applications of SVD Technology

• The Applications of NMF Technology

• Summary


Introduction

• 1. Genes and Genomes

Gene ---- the basic unit of genetic function

Gene Expression ---- the process by which genetic information at the DNA level is converted into functional proteins.


Introduction

Genome Structure ---- each organism contains a unique genomic sequence with a unique structure.


Gene structure


Genome data with unknown biological meaning are increasing exponentially.

These data need to be mined.


Analysis of these new data requires mathematical tools that are adaptable to the large quantities of data, while reducing the complexity of the data to make them comprehensible.


2. A Microarray

A small analytical device that allows genomic exploration with speed and precision unprecedented in the history of biology.

This technology was introduced in the 1990s.


3. Microarray Analysis

The process of using microarrays for scientific exploration.

A large number of technologies for microarray analysis have been adopted since the early 1990s.


4. Types of Microarray


5. The Roles of Microarrays

To monitor gene expression levels on a genomic scale

To enhance fundamental understanding of life on the molecular level:

• regulation of gene expression
• gene function
• cellular mechanisms
• medical diagnosis, treatment, drug design


The microarray data form a matrix


Applications of SVD

Mathematical definition of the SVD of an $m \times n$ data matrix $X$: $X = U S V^T$

$U$ is an $m \times n$ matrix

$S$ is an $n \times n$ diagonal matrix

$V^T$ is also an $n \times n$ matrix
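
The shapes listed above can be checked numerically. The following is a minimal sketch, assuming NumPy and a small random matrix standing in for real expression data (6 genes by 4 arrays is a hypothetical size chosen only for illustration).

```python
import numpy as np

# Hypothetical toy expression matrix: m = 6 genes (rows), n = 4 arrays (columns).
rng = np.random.default_rng(0)
X = rng.normal(size=(6, 4))

# full_matrices=False requests the "thin" SVD, whose factors have
# the m x n, n x n and n x n shapes given in the definition.
U, s, Vt = np.linalg.svd(X, full_matrices=False)
S = np.diag(s)  # singular values placed on the diagonal of an n x n matrix

print(U.shape, S.shape, Vt.shape)  # (6, 4) (4, 4) (4, 4)
print(np.allclose(X, U @ S @ Vt))  # True: the three factors reproduce X
```

NumPy returns the singular values as a vector sorted in decreasing order, so the diagonal matrix is rebuilt explicitly with `np.diag`.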


One important result of the SVD of $X$


• $X^{(l)}$ is the closest rank-$l$ matrix to $X$.

• The term "closest" means that $X^{(l)}$ minimizes the sum of the squares of the differences between the elements of $X$ and $X^{(l)}$:

$$\sum_{ij} \left| x_{ij} - x^{(l)}_{ij} \right|^2 = \min$$
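
The closest rank-l property can be illustrated with a short sketch. It assumes that the rank-l matrix is built by keeping the l largest singular values and their vectors (the standard truncated SVD); the function name `rank_l_approximation` and the random test matrix are hypothetical.

```python
import numpy as np

def rank_l_approximation(X, l):
    """Keep only the l largest singular values and their singular vectors."""
    U, s, Vt = np.linalg.svd(X, full_matrices=False)
    return (U[:, :l] * s[:l]) @ Vt[:l, :]

# Hypothetical random matrix standing in for expression data.
rng = np.random.default_rng(1)
X = rng.normal(size=(8, 5))
l = 2
X_l = rank_l_approximation(X, l)

# Sum of squared element-wise differences between X and its rank-l version.
err = np.sum((X - X_l) ** 2)

# For the truncated SVD this equals the sum of the squared discarded singular values.
s = np.linalg.svd(X, compute_uv=False)
print(np.isclose(err, np.sum(s[l:] ** 2)))  # True
print(np.linalg.matrix_rank(X_l))           # 2
```

Any other rank-2 matrix of the same shape gives a sum of squared differences at least as large as `err`.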


SVD analysis of gene expression data


The results for the Elutriation Dataset


Pattern Inference


Analysis of the results for Pattern Inference

• (a) Raster display of v', the expression of 14 eigengenes in 14 arrays.

• (b) Bar chart of the fractions of eigenexpression.

• (c) Line-joined graphs of the expression levels of r1 (red) and r2 (blue) in the 14 arrays fit dashed graphs of normalized sine (red) and cosine (blue) of period T = 390 min and phase = 2*3.14/13, respectively.
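
The quantities in panels (a) and (b) can be sketched as follows. This assumes the common definitions that the eigengenes are the rows of V^T and that the fraction of eigenexpression of the k-th eigengene is its squared singular value divided by the sum of all squared singular values. The data are synthetic (200 hypothetical genes oscillating with random phases over 14 arrays), not the elutriation measurements.

```python
import numpy as np

# Synthetic stand-in data: each gene is a noisy sinusoid with its own phase.
rng = np.random.default_rng(2)
n_genes, n_arrays = 200, 14
t = np.arange(n_arrays)
phases = rng.uniform(0.0, 2.0 * np.pi, size=n_genes)
X = np.sin(2.0 * np.pi * t / 13 + phases[:, None])
X += 0.1 * rng.normal(size=(n_genes, n_arrays))

U, s, Vt = np.linalg.svd(X, full_matrices=False)

# Row k of Vt is the expression of eigengene k across the 14 arrays (panel a).
eigengenes = Vt

# Assumed definition of the fraction of eigenexpression (panel b).
fractions = s ** 2 / np.sum(s ** 2)

print(eigengenes.shape)     # (14, 14)
print(fractions[:2].sum())  # share of the overall expression captured by r1 and r2
```

In this synthetic case the first two eigengenes carry most of the expression, because a sinusoid with an arbitrary phase is a combination of one sine and one cosine.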


Data Sorting


Analysis of the results for data sorting

Fig. 3. Genes sorted by relative correlation with r1 and r2 of normalized elutriation.

(a) Normalized elutriation expression of the sorted 5,981 genes in the 14 arrays, showing a traveling wave of expression.

(b) Eigenarrays expression; the expression of a1 and a2, the eigenarrays corresponding to r1 and r2, displays the sorting.

(c) Expression levels of a1 (red) and a2 (green) fit normalized sine and cosine functions of period Z = N-1 = 5,980 and phase Q = 2*3.14/13 (blue), respectively.
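
One way to sort genes by their relative correlation with r1 and r2 is sketched below. It assumes that each gene is placed by the angle of its projection onto the two leading eigengenes; this is an illustrative choice, not necessarily the exact procedure behind Fig. 3. The data are synthetic and all variable names are hypothetical.

```python
import numpy as np

# Synthetic stand-in data: 300 genes with random phases over 14 arrays.
rng = np.random.default_rng(3)
n_genes, n_arrays = 300, 14
t = np.arange(n_arrays)
true_phase = rng.uniform(-np.pi, np.pi, size=n_genes)
X = np.cos(2.0 * np.pi * t / 13 - true_phase[:, None])
X += 0.05 * rng.normal(size=(n_genes, n_arrays))

U, s, Vt = np.linalg.svd(X, full_matrices=False)
r1, r2 = Vt[0], Vt[1]  # the two leading eigengenes

# Projection of every gene onto r1 and r2. These coordinates equal the
# eigenarrays a1 = U[:, 0] and a2 = U[:, 1] scaled by their singular values.
c1 = X @ r1
c2 = X @ r2

# Assumed sorting key: the angle of each gene in the (r1, r2) plane.
angle = np.arctan2(c2, c1)
order = np.argsort(angle)
X_sorted = X[order]  # rows reordered so that the phase advances gene by gene
```

Plotting `X_sorted` as an image would show the peak of expression moving across the arrays from the first row to the last, which is the traveling wave described in panel (a).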


Other Applications for SVD

• Missing data

• Comparison between two genomic sequences


The Applications of NMF

Mathematical definition of the NMF: $V_{n \times m} \approx W_{n \times r} \cdot H_{r \times m}$, where all elements of $V$, $W$ and $H$ are nonnegative.

In general, $r$ is chosen so that $(n+m)r < nm$.

It can be used to extract the features that are hidden in the dataset.
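
A factorization of this form can be computed in several ways. The sketch below uses the multiplicative update rules of Lee and Seung for the squared-error objective, which is an assumed choice because the slides do not say which algorithm was used. The function name `nmf`, its parameters and the test matrix are hypothetical.

```python
import numpy as np

def nmf(V, r, n_iter=200, eps=1e-9, seed=0):
    """Approximate a nonnegative V (n x m) by W (n x r) times H (r x m)."""
    rng = np.random.default_rng(seed)
    n, m = V.shape
    W = rng.uniform(0.1, 1.0, size=(n, r))
    H = rng.uniform(0.1, 1.0, size=(r, m))
    for _ in range(n_iter):
        # Multiplicative updates keep every entry of W and H nonnegative.
        H *= (W.T @ V) / (W.T @ W @ H + eps)
        W *= (V @ H.T) / (W @ H @ H.T + eps)
    return W, H

# Hypothetical nonnegative data: 30 genes by 14 arrays built from 3 hidden features.
rng = np.random.default_rng(4)
n, m, r = 30, 14, 3
V = rng.uniform(size=(n, r)) @ rng.uniform(size=(r, m))

W, H = nmf(V, r)
rel_err = np.linalg.norm(V - W @ H) / np.linalg.norm(V)

print(W.shape, H.shape)     # (30, 3) (3, 14)
print((n + m) * r < n * m)  # True: 132 stored numbers instead of 420
print(rel_err)              # relative reconstruction error
```

The rows of `H` play the role of the extracted features, and each row of `W` gives the nonnegative weights with which one gene combines them.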


Comparison with SVD


The results for the Elutriation Dataset


The results for the α-factor Dataset


Summary

1. SVD: requires normalization; no limitation on the data
   NMF: no normalization needed; requires positive data

2. SVD: missing data, clustering, pattern inference, weak pattern extraction, comparison
   NMF: pattern inference, clustering, finding similarity

3. ICA is used for mining DNA microarray data.


Thanks a lot!