data processing technologies for dna microarray nini rao school of life science and technology...
TRANSCRIPT
Data Processing Technologies for DNA Microarray
Nini Rao
School of Life Science and Technology
UESTC
14/11/2004
• Introduction
• The Applications of SVD Technology
• The Applications of NMF Technology
• Summarization
Introduction
• 1. Gene and Genomes
Gene ---- The basic unit of genetic function
Gene Expression ---- The process by which genetic information at the DNA level is converted into functional proteins.
Introduction
Genome Structure ---- Each organism contains a unique genomic sequence with a unique structure.
Gene structure
Genome data with unknown biological meaning are increasing exponentially.
These data need to be mined.
Analysis of these new data requires mathematical tools that are adaptable to the large quantities of data, while reducing the complexity of the data to make them comprehensible.
2. A Microarray
A small analytical device that allows genomic exploration with speed and precision unprecedented in the history of biology.
This technology was introduced in the 1990s.
3. Microarray Analysis
The process of using microarrays for scientific exploration.
Many technologies for microarray analysis have been adopted since the early 1990s.
4. Type of Microarray
5. The Roles of Microarray
To monitor gene expression levels on a genomic scale
To enhance fundamental understanding of life on the molecular level
• regulation of gene expression
• gene function
• cellular mechanisms
• medical diagnosis, treatment, drug design
The microarray data form a matrix
Applications of SVD
Mathematical definition of the SVD
$X = U S V^T$
U is an m x n matrix
S is an n x n diagonal matrix
V^T is also an n x n matrix
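As a minimal sketch of the definition above, the following NumPy snippet factors a small random matrix, standing in for a genes x arrays expression matrix, and checks the shapes stated on the slide. The synthetic data, the sizes m = 6 and n = 4, and all variable names are illustrative assumptions, not part of the original talk.

```python
import numpy as np

rng = np.random.default_rng(0)
m, n = 6, 4                      # hypothetical sizes: 6 genes, 4 arrays
X = rng.normal(size=(m, n))      # synthetic stand-in for expression data

# Reduced SVD: X = U @ diag(s) @ Vt, with singular values in descending order
U, s, Vt = np.linalg.svd(X, full_matrices=False)
S = np.diag(s)                   # n x n diagonal matrix of singular values

svd_shapes = (U.shape, S.shape, Vt.shape)
reconstruction_ok = np.allclose(X, U @ S @ Vt)
```

With `full_matrices=False` and m >= n, NumPy returns exactly the m x n, n x n, n x n shapes listed above.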
One important result of the SVD of X
• $X^{(l)}$ is the closest rank-l matrix to X.
• The term "closest" means that $X^{(l)}$ minimizes the sum of the squares of the differences between the elements of X and $X^{(l)}$:
$\sum_{ij} |x_{ij} - x^{(l)}_{ij}|^2 = \min$
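The closest-rank-l property can be checked numerically. The sketch below assumes that the rank-l matrix is built by keeping the l largest singular values and their singular vectors (the standard truncated SVD; the transcript does not show the construction). The helper name `rank_l_approximation` and the random test matrix are hypothetical.

```python
import numpy as np

def rank_l_approximation(X, l):
    """Keep the l largest singular triplets of X (truncated SVD)."""
    U, s, Vt = np.linalg.svd(X, full_matrices=False)
    # Scale the first l columns of U by their singular values, then project back
    return (U[:, :l] * s[:l]) @ Vt[:l, :]

rng = np.random.default_rng(1)
X = rng.normal(size=(8, 5))      # synthetic data, for illustration only
l = 2
X_l = rank_l_approximation(X, l)

# Sum of squared element-wise differences between X and its rank-l approximation
squared_error = np.sum((X - X_l) ** 2)

# For the truncated SVD this equals the sum of the squared discarded singular values
s = np.linalg.svd(X, compute_uv=False)
discarded = np.sum(s[l:] ** 2)
```

Any other rank-l matrix, for example a random one, gives a squared error at least as large as `squared_error`.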
SVD analysis of gene expression data
The results for Elutriation Dataset
Pattern Inference
The result analysis for Pattern Inference
• (a) Raster display of v', the expression of 14 eigengenes in 14 arrays.
• (b) Bar chart of the fractions of eigenexpression
• (c) Line-joined graphs of the expression levels of r1 (red) and r2 (blue) in the 14 arrays fit dashed graphs of normalized sine (red) and cosine (blue) of period T = 390 min and phase = 2*3.14/13, respectively.
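The bar chart in (b) can be illustrated with a short sketch. It assumes the fraction of eigenexpression of the k-th eigengene is p_k = s_k^2 / sum_j s_j^2, the usual convention in SVD analyses of expression data; the transcript itself does not give the formula. The data are synthetic: periodic profiles with the period T = 390 min quoted above, an assumed 30-minute sampling interval, and added noise.

```python
import numpy as np

rng = np.random.default_rng(2)
n_genes, n_arrays = 200, 14
t = np.arange(n_arrays) * 30.0   # assumed sampling times in minutes
T = 390.0                        # period quoted on the slide

# Synthetic genes: one sinusoid per gene with a random phase, plus small noise
phases = rng.uniform(0, 2 * np.pi, size=n_genes)
X = np.sin(2 * np.pi * t[None, :] / T + phases[:, None])
X += 0.1 * rng.normal(size=X.shape)

U, s, Vt = np.linalg.svd(X, full_matrices=False)

# Rows of Vt play the role of eigengenes (patterns across the 14 arrays)
eigengenes = Vt
# Assumed definition: share of the total squared singular values per eigengene
fractions = s ** 2 / np.sum(s ** 2)
```

Because every synthetic profile is a mix of one sine and one cosine, the first two entries of `fractions` carry nearly all of the expression, which mirrors the role of r1 and r2 in (c).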
Data Sorting
The results analysis for data sorting
Fig. 3. Genes sorted by relative correlation with r1 and r2 of normalized elutriation.
(a) Normalized elutriation expression of the sorted 5,981 genes in the 14 arrays, showing traveling wave of expression.
(b) Eigenarrays expression; the expression of a1 and a2, the eigenarrays corresponding to r1 and r2, displays the sorting.
(c) Expression levels of a1 (red) and a2 (green) fit normalized sine and cosine functions of period Z = N-1 = 5,980 and phase Q = 2*3.14/13 (blue), respectively.
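One plausible reading of "sorted by relative correlation with r1 and r2" is to project each gene onto the two leading eigengenes and order the genes by the resulting angle. The sketch below follows that assumption on synthetic periodic data; it is not taken from the talk, and the names `c1`, `c2`, `angle` and `order` are illustrative.

```python
import numpy as np

rng = np.random.default_rng(3)
n_genes, n_arrays = 300, 14
t = np.arange(n_arrays) * 30.0   # assumed sampling times in minutes
T = 390.0                        # period quoted in the pattern-inference slide

# Synthetic genes peaking at random times within one period, plus small noise
true_phase = rng.uniform(0, 2 * np.pi, size=n_genes)
X = np.cos(2 * np.pi * t[None, :] / T - true_phase[:, None])
X += 0.05 * rng.normal(size=X.shape)

U, s, Vt = np.linalg.svd(X, full_matrices=False)
r1, r2 = Vt[0], Vt[1]            # the two leading eigengenes

# Projection of every gene onto r1 and r2, then the angle in the (r1, r2) plane
c1 = X @ r1
c2 = X @ r2
angle = np.arctan2(c2, c1)

# Sorting by angle places genes with similar timing next to each other
order = np.argsort(angle)
X_sorted = X[order]
```

Displayed as a raster, `X_sorted` shows a band of high expression moving across the arrays, the traveling wave described in (a).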
Other Applications for SVD
• Missing data
• Comparison between two genomic sequences
The Applications of NMF
Mathematical definition of the NMF
$V_{n \times m} = W_{n \times r} \cdot H_{r \times m}$
In general, (n+m)r < nm.
It can be used to extract the features that are hidden in a dataset.
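The factorization can be sketched with the multiplicative update rules of Lee and Seung for the squared-error objective. The talk does not say which NMF algorithm was used, so this choice is an assumption, as are the function name `nmf`, the iteration count, and the synthetic nonnegative matrix. In practice the product W H approximates V rather than matching it exactly.

```python
import numpy as np

def nmf(V, r, n_iter=200, eps=1e-9, seed=0):
    """Approximate nonnegative V (n x m) by W (n x r) @ H (r x m)."""
    rng = np.random.default_rng(seed)
    n, m = V.shape
    W = rng.uniform(0.1, 1.0, size=(n, r))   # positive random start
    H = rng.uniform(0.1, 1.0, size=(r, m))
    for _ in range(n_iter):
        # Multiplicative updates keep every entry of W and H nonnegative
        H *= (W.T @ V) / (W.T @ W @ H + eps)
        W *= (V @ H.T) / (W @ H @ H.T + eps)
    return W, H

rng = np.random.default_rng(4)
n, m, r = 20, 12, 3
# Synthetic nonnegative data built from r hidden nonnegative features
V = rng.uniform(size=(n, r)) @ rng.uniform(size=(r, m))

W, H = nmf(V, r)
relative_error = np.linalg.norm(V - W @ H) / np.linalg.norm(V)

# Storage comparison from the slide: (n + m) * r values instead of n * m
compressed, full = (n + m) * r, n * m
```

Here `compressed` is 96 and `full` is 240, so the factors store fewer numbers than the original matrix, as the inequality (n+m)r < nm states.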
Comparison with SVD
The results for Elutriation Dataset
The results for α-factor Dataset
Summarization
1. SVD: Normalization; no data limitation
   NMF: No normalization; positive data
2. SVD: Missing data, cluster, pattern inference, weak pattern extraction, comparison
   NMF: Pattern inference, cluster, finding similarity
3. ICA is used to mine DNA microarray data.
Thanks a lot!