recent research and development on microarray data mining shin-mu tseng 曾新穆...
TRANSCRIPT
![Page 1: Recent Research and Development on Microarray Data Mining Shin-Mu Tseng 曾新穆 tsengsm@mail.ncku.edu.tw Dept. Computer Science and Information Engineering](https://reader035.vdocuments.mx/reader035/viewer/2022062409/5697bfa31a28abf838c9681e/html5/thumbnails/1.jpg)
Recent Research and Development on Microarray Data Mining
Shin-Mu Tseng 曾新穆[email protected]
Dept. Computer Science and Information EngineeringNational Cheng Kung University
Taiwan, R.O.C.
August 13, 2001
![Page 2: Recent Research and Development on Microarray Data Mining Shin-Mu Tseng 曾新穆 tsengsm@mail.ncku.edu.tw Dept. Computer Science and Information Engineering](https://reader035.vdocuments.mx/reader035/viewer/2022062409/5697bfa31a28abf838c9681e/html5/thumbnails/2.jpg)
2
Outline
Microarray Techniques Goal of Microarray Data Mining Clustering Methods Efficient Microarray Data Mining Conclusions
![Page 3: Recent Research and Development on Microarray Data Mining Shin-Mu Tseng 曾新穆 tsengsm@mail.ncku.edu.tw Dept. Computer Science and Information Engineering](https://reader035.vdocuments.mx/reader035/viewer/2022062409/5697bfa31a28abf838c9681e/html5/thumbnails/3.jpg)
3
Current Status
Human genome project is at finishing stage, revealing that there are about 30,000 functional genes in a human cell
For more than 90% of the genes, we know little about their real functions
![Page 4: Recent Research and Development on Microarray Data Mining Shin-Mu Tseng 曾新穆 tsengsm@mail.ncku.edu.tw Dept. Computer Science and Information Engineering](https://reader035.vdocuments.mx/reader035/viewer/2022062409/5697bfa31a28abf838c9681e/html5/thumbnails/4.jpg)
4
Microarray Techniques
Main Advantage of Microarray Techniques allow simultaneous studies of the expression of thousands of
genes in a single experiment
Microarray Process Arrayer Experiments: Hybridization Image Capturing of Results Analysis
![Page 5: Recent Research and Development on Microarray Data Mining Shin-Mu Tseng 曾新穆 tsengsm@mail.ncku.edu.tw Dept. Computer Science and Information Engineering](https://reader035.vdocuments.mx/reader035/viewer/2022062409/5697bfa31a28abf838c9681e/html5/thumbnails/5.jpg)
5
Goal of Microarray Mining
genetest
1234....1000
A
0.60.2 00.7 .. ..0.3
B
0.40.9 00.5 .. ..0.8
C
0.20.80.30.2 .. ..0.7
… … ….
… … … … … … …
Multi-Conditions Expression Analysis
![Page 6: Recent Research and Development on Microarray Data Mining Shin-Mu Tseng 曾新穆 tsengsm@mail.ncku.edu.tw Dept. Computer Science and Information Engineering](https://reader035.vdocuments.mx/reader035/viewer/2022062409/5697bfa31a28abf838c9681e/html5/thumbnails/6.jpg)
6
Goal of Microarray Mining
genetest
1234....1000
A
0.60.2 00.7 .. ..0.3
B
0.40.9 00.5 .. ..0.8
C
0.20.80.30.2 .. ..0.7
… … ….
… … … … … … …
Multi-Conditions Expression Analysis
![Page 7: Recent Research and Development on Microarray Data Mining Shin-Mu Tseng 曾新穆 tsengsm@mail.ncku.edu.tw Dept. Computer Science and Information Engineering](https://reader035.vdocuments.mx/reader035/viewer/2022062409/5697bfa31a28abf838c9681e/html5/thumbnails/7.jpg)
7
Sample Clustering Results
![Page 8: Recent Research and Development on Microarray Data Mining Shin-Mu Tseng 曾新穆 tsengsm@mail.ncku.edu.tw Dept. Computer Science and Information Engineering](https://reader035.vdocuments.mx/reader035/viewer/2022062409/5697bfa31a28abf838c9681e/html5/thumbnails/8.jpg)
8
Clustering Methods
Types of Clustering Methods Partitioning : K-Means, K-Medoids, PAM, CLARA … Hierarchical : HAC 、 BIRCH 、 CURE 、 ROCK Density-based : CAST, DBSCAN 、 OPTICS 、 CLIQUE
… Grid-based : STING 、 CLIQUE 、 WaveCluster… Model-based : COBWEB 、 SOM 、 CLASSIT 、 AutoCl
ass…
![Page 9: Recent Research and Development on Microarray Data Mining Shin-Mu Tseng 曾新穆 tsengsm@mail.ncku.edu.tw Dept. Computer Science and Information Engineering](https://reader035.vdocuments.mx/reader035/viewer/2022062409/5697bfa31a28abf838c9681e/html5/thumbnails/9.jpg)
9
Clustering Methods (cont.)
a,b,c,d,e
c,d,e
a,b
d,ed
a
b
c
e
0 1 2 3 4agglomerative
divisive4 3 2 1 0
Partitioning Hierarchical
![Page 10: Recent Research and Development on Microarray Data Mining Shin-Mu Tseng 曾新穆 tsengsm@mail.ncku.edu.tw Dept. Computer Science and Information Engineering](https://reader035.vdocuments.mx/reader035/viewer/2022062409/5697bfa31a28abf838c9681e/html5/thumbnails/10.jpg)
10
Clustering Methods (cont.)
Density-based Grid-based
![Page 11: Recent Research and Development on Microarray Data Mining Shin-Mu Tseng 曾新穆 tsengsm@mail.ncku.edu.tw Dept. Computer Science and Information Engineering](https://reader035.vdocuments.mx/reader035/viewer/2022062409/5697bfa31a28abf838c9681e/html5/thumbnails/11.jpg)
11
CAST Clustering
Input S : a symmetic n × n Similarity Matrix , S(i, j) [0, 1]∈ t : Affinity Threshold (0 < t < 1)
Method1. Choose a seed for generating a new cluster
2. ADD: add qualified items to the cluster
3. REMOVE: remove unqualified items from the stable cluster
4. Repeat Steps 1-3 till no more clusters can be generated
![Page 12: Recent Research and Development on Microarray Data Mining Shin-Mu Tseng 曾新穆 tsengsm@mail.ncku.edu.tw Dept. Computer Science and Information Engineering](https://reader035.vdocuments.mx/reader035/viewer/2022062409/5697bfa31a28abf838c9681e/html5/thumbnails/12.jpg)
12
Similarity Measurements:Correlation Coefficients
The most popular correlation coefficient is Pearson correlation coefficient (1892)
correlation between X={X1, X2, …, Xn} and Y={Y1, Y2,
…, Yn} :
where
n
k
kk
YX
YYXX
nr
1
1
n
k
k
n
GGG
1
2
![Page 13: Recent Research and Development on Microarray Data Mining Shin-Mu Tseng 曾新穆 tsengsm@mail.ncku.edu.tw Dept. Computer Science and Information Engineering](https://reader035.vdocuments.mx/reader035/viewer/2022062409/5697bfa31a28abf838c9681e/html5/thumbnails/13.jpg)
13
Similarity Measurements:Correlation Coefficients (cont.)
It captures the similarity of the ‘‘shapes’’ of two expression profiles, and ignores differences between their magnitudes.
![Page 14: Recent Research and Development on Microarray Data Mining Shin-Mu Tseng 曾新穆 tsengsm@mail.ncku.edu.tw Dept. Computer Science and Information Engineering](https://reader035.vdocuments.mx/reader035/viewer/2022062409/5697bfa31a28abf838c9681e/html5/thumbnails/14.jpg)
14
Problems in Microarray Mining
How to cluster microarray data with the following requirements met simultaneously ? Efficiency Accuracy Automation
![Page 15: Recent Research and Development on Microarray Data Mining Shin-Mu Tseng 曾新穆 tsengsm@mail.ncku.edu.tw Dept. Computer Science and Information Engineering](https://reader035.vdocuments.mx/reader035/viewer/2022062409/5697bfa31a28abf838c9681e/html5/thumbnails/15.jpg)
15
Problems in Microarray Mining (cont.)
How to cluster microarray data with the following requirements met simultaneously ? Efficiency Accuracy Automation
Good Clustering Methods +
Validation Techniques
![Page 16: Recent Research and Development on Microarray Data Mining Shin-Mu Tseng 曾新穆 tsengsm@mail.ncku.edu.tw Dept. Computer Science and Information Engineering](https://reader035.vdocuments.mx/reader035/viewer/2022062409/5697bfa31a28abf838c9681e/html5/thumbnails/16.jpg)
16
Efficient Microarray Mining
Improved CAST algorithm for clustering Hubert’s Γ statistic for validation Iterative sampled computation for automatic clustering
![Page 17: Recent Research and Development on Microarray Data Mining Shin-Mu Tseng 曾新穆 tsengsm@mail.ncku.edu.tw Dept. Computer Science and Information Engineering](https://reader035.vdocuments.mx/reader035/viewer/2022062409/5697bfa31a28abf838c9681e/html5/thumbnails/17.jpg)
17
Reduce the Computation
1. Narrow down the threshold range
2. Split and Conquer: find “nearly-best” result
<Example> m = 4
threshold0 100%
LM RM
LM: Left MarginRM: Right Margin
![Page 18: Recent Research and Development on Microarray Data Mining Shin-Mu Tseng 曾新穆 tsengsm@mail.ncku.edu.tw Dept. Computer Science and Information Engineering](https://reader035.vdocuments.mx/reader035/viewer/2022062409/5697bfa31a28abf838c9681e/html5/thumbnails/18.jpg)
18
Experimental Results
Dataset Source : Lawrence Berkeley National Lab (LBNL)
Michael Eisen's Lab (http://rana.lbl.gov/EisenData.htm )
Microarray expression data of yeast saccharomyces cerevisiae, containing 6221 genes with 80 conditions
Similarity matrix was obtained in advance
![Page 19: Recent Research and Development on Microarray Data Mining Shin-Mu Tseng 曾新穆 tsengsm@mail.ncku.edu.tw Dept. Computer Science and Information Engineering](https://reader035.vdocuments.mx/reader035/viewer/2022062409/5697bfa31a28abf838c9681e/html5/thumbnails/19.jpg)
19
Experimental Results (cont.)
Without Range Narrow down Executions: 19 Execution Time: 246
sec Γ statistic: 0.5138
With Range Narrow down Executions: 13 Execution Time: 27 sec Γ statistic: 0.5137
![Page 20: Recent Research and Development on Microarray Data Mining Shin-Mu Tseng 曾新穆 tsengsm@mail.ncku.edu.tw Dept. Computer Science and Information Engineering](https://reader035.vdocuments.mx/reader035/viewer/2022062409/5697bfa31a28abf838c9681e/html5/thumbnails/20.jpg)
20
Experimental Results (cont.)
Comparison
Method
Execution
Time (Sec)
Cluster Number
Best
Γ Statistic
Our Method 27 57 0.51
K-means(k= 3 ~ 21)
404 5 0.45
K-means(k= 3 ~ 39)
1092 5 0.45
![Page 21: Recent Research and Development on Microarray Data Mining Shin-Mu Tseng 曾新穆 tsengsm@mail.ncku.edu.tw Dept. Computer Science and Information Engineering](https://reader035.vdocuments.mx/reader035/viewer/2022062409/5697bfa31a28abf838c9681e/html5/thumbnails/21.jpg)
21
Conclusions
Microarray data analysis is an emerging field needing support of data mining techniques Accuracy Efficiency Automation