Automatic Transformation of Raw Clinical Data into Clean Data Using Decision Tree Learning
Jian Zhang
Supervised by: Karen Petrie
Background
Cancer research has become an extremely data-rich environment.
Many analysis packages are available for analyzing these data.
Before analysis, however, the raw data must be preprocessed.
Rich data environment
• Breast cancer data, for example, cover many different factors
Raw clinical data sample
Yes-No data:
yes: yes, Yes, Ye, yed, yef …
no: No, n, not …
null: don’t know, no data, waiting for lab
Positive-Negative data:
Positive: +, ++, p, p++…
Negative: -, n, neg, n---…
Null: no data, ruined sample, waiting for lab
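One straightforward way to clean entries like these is a hand-written lookup table per field type. The Python below is a minimal sketch assuming such an approach, not the author's implementation; the names `YES_NO`, `POS_NEG` and `normalize` are hypothetical, and the tables hold only the variants listed above. Note that `n` must map to "no" in a Yes-No field but to "negative" in a Positive-Negative field, so the field type has to be known before a value can be cleaned.

```python
from typing import Dict, Optional

# Hypothetical hand-written lookup tables, built only from the variants listed above.
# "n" is ambiguous: it means "no" in Yes-No fields but "negative" in
# Positive-Negative fields, so each field type needs its own table.
YES_NO: Dict[str, str] = {
    "yes": "yes", "ye": "yes", "yed": "yes", "yef": "yes",
    "no": "no", "n": "no", "not": "no",
    "don't know": "null", "no data": "null", "waiting for lab": "null",
}
POS_NEG: Dict[str, str] = {
    "+": "positive", "++": "positive", "p": "positive", "p++": "positive",
    "-": "negative", "n": "negative", "neg": "negative", "n---": "negative",
    "no data": "null", "ruined sample": "null", "waiting for lab": "null",
}


def normalize(raw: str, table: Dict[str, str]) -> Optional[str]:
    """Return the clean value for a raw entry, or None if no rule covers it."""
    # Trim whitespace, fold case and turn curly apostrophes into straight ones.
    key = raw.strip().lower().replace("\u2019", "'")
    return table.get(key)


print(normalize("Yes", YES_NO))   # yes
print(normalize("n", POS_NEG))    # negative
print(normalize("yess", YES_NO))  # None
```

An unlisted spelling such as `yess` falls through to `None`, so every new variant needs a manual edit to the table, which is one reason to ask whether the process can be automated.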
Basic version
Question
Could this process be automated?
Introduction
Decision tree learning
Weka
Decision Tree Learning
Decision tree learning, one of the most popular inductive inference algorithms, is a method for approximating discrete-valued target functions.
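To make the idea concrete, here is a minimal ID3-style learner in Python that picks, at each node, the attribute with the highest information gain. It is a textbook sketch, not one of the Weka learners used in the experiments, and the features computed from each raw entry (`first`, `has_space`, `length`) are hypothetical; the source does not say which features were used.

```python
import math
from collections import Counter

def entropy(labels):
    """Shannon entropy, in bits, of a list of class labels."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())

def info_gain(rows, labels, attr):
    """Expected reduction in entropy from splitting the rows on `attr`."""
    remainder = 0.0
    for value in {r[attr] for r in rows}:
        subset = [lab for r, lab in zip(rows, labels) if r[attr] == value]
        remainder += len(subset) / len(rows) * entropy(subset)
    return entropy(labels) - remainder

def id3(rows, labels, attrs):
    """Grow a tree: a leaf is a label; a node is (attr, {value: subtree}, majority)."""
    majority = Counter(labels).most_common(1)[0][0]
    if len(set(labels)) == 1 or not attrs:
        return majority
    best = max(attrs, key=lambda a: info_gain(rows, labels, a))
    rest = [a for a in attrs if a != best]
    branches = {}
    for value in {r[best] for r in rows}:
        idx = [i for i, r in enumerate(rows) if r[best] == value]
        branches[value] = id3([rows[i] for i in idx], [labels[i] for i in idx], rest)
    return (best, branches, majority)

def classify(tree, row):
    """Walk the tree; an unseen attribute value falls back to the node's majority."""
    while isinstance(tree, tuple):
        attr, branches, majority = tree
        if row[attr] not in branches:
            return majority
        tree = branches[row[attr]]
    return tree

def features(raw):
    """Hypothetical categorical features of a raw Yes-No entry."""
    s = raw.strip().lower()
    return {"first": s[:1], "has_space": " " in s, "length": min(len(s), 4)}

# Training entries taken from the Yes-No sample, with their clean labels.
TRAIN = ["yes", "Yes", "Ye", "yed", "yef", "No", "n", "not", "no data", "don't know"]
LABELS = ["yes"] * 5 + ["no"] * 3 + ["null"] * 2
TREE = id3([features(s) for s in TRAIN], LABELS, ["first", "has_space", "length"])

print(classify(TREE, features("nope")))             # no
print(classify(TREE, features("waiting for lab")))  # yes (misclassified)
```

Because `waiting for lab` starts with a character never seen in training, the tree falls back to the root's majority label and returns `yes`, a small illustration of how entries absent from the training data can be misclassified.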
Decision tree sample
Weka
Weka (Waikato Environment for Knowledge Analysis) is a popular suite of machine learning software written in Java, which contains a collection of algorithms for data analysis and predictive modeling.
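Weka's usual input format is ARFF: a plain-text header of `@attribute` declarations followed by an `@data` section. The Python below is a minimal sketch that writes hypothetical nominal features and class labels as ARFF; the helper names `arff_quote`, `nominal` and `to_arff` are our own, and single-quoting every value (escaping backslashes and quotes) is a conservative assumption rather than something every value requires.

```python
def arff_quote(value) -> str:
    """Wrap a value in single quotes, escaping backslashes and embedded quotes."""
    s = str(value).replace("\\", "\\\\").replace("'", "\\'")
    return f"'{s}'"

def nominal(values) -> str:
    """Render a set of values as an ARFF nominal specification such as {'a','b'}."""
    return "{" + ",".join(arff_quote(v) for v in sorted(values)) + "}"

def to_arff(relation, rows, labels, class_name="class") -> str:
    """Write feature dicts plus class labels as ARFF text with nominal attributes."""
    attrs = sorted(rows[0])
    lines = [f"@relation {arff_quote(relation)}", ""]
    for a in attrs:
        spec = nominal({str(r[a]) for r in rows})
        lines.append(f"@attribute {arff_quote(a)} {spec}")
    # The class attribute goes last, which is where Weka looks for it by default.
    lines.append(f"@attribute {arff_quote(class_name)} {nominal(set(labels))}")
    lines += ["", "@data"]
    for r, lab in zip(rows, labels):
        lines.append(",".join([arff_quote(r[a]) for a in attrs] + [arff_quote(lab)]))
    return "\n".join(lines) + "\n"

print(to_arff("yes_no", [{"first": "y"}, {"first": "n"}], ["yes", "no"]))
```

The resulting text can be saved as a `.arff` file and opened in the Weka Explorer. Weka expects training and test files to have compatible headers, so in practice the nominal value lists would need to cover the values of both sets.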
Experiment
Data: training dataset with 100 instances
Test dataset with 100 instances, containing 17 values not found in the training dataset
Tool: Weka
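Counting how many test values are absent from the training data is a simple set difference. The sketch below assumes entries are compared after trimming whitespace and lowercasing; the source does not say how the 17 values were counted, and `unseen_values` is a hypothetical helper.

```python
def unseen_values(train, test):
    """Distinct test entries (trimmed, lowercased) that never occur in the training entries."""
    seen = {s.strip().lower() for s in train}
    return sorted({s.strip().lower() for s in test} - seen)

missing = unseen_values(["yes", "No", "n"], ["YES", "no", "yess", "nope", "yess"])
print(len(missing), missing)  # 2 ['nope', 'yess']
```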
Experiment
Experiment 1: training dataset
Experiment 2: training dataset, test dataset
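Both experiments report the percentage of correctly classified instances and a root mean squared error. For a nominal class, our understanding is that Weka computes RMSE from each instance's predicted class probability distribution against a 0/1 indicator of the true class, averaging the squared error over the classes. The Python below reproduces that convention under this assumption; the function names are hypothetical.

```python
import math

def percent_correct(predicted, actual):
    """Percentage of instances whose predicted label equals the true label."""
    hits = sum(p == a for p, a in zip(predicted, actual))
    return 100.0 * hits / len(actual)

def rmse(distributions, actual, classes):
    """Root mean squared error of predicted class distributions vs. one-hot truth.

    Assumed convention: each instance's squared error is summed over the
    classes and divided by the number of classes before averaging.
    """
    k = len(classes)
    total = 0.0
    for dist, label in zip(distributions, actual):
        total += sum((dist[c] - (1.0 if c == label else 0.0)) ** 2 for c in classes) / k
    return math.sqrt(total / len(actual))

classes = ["yes", "no", "null"]
dists = [{"yes": 0.5, "no": 0.5, "null": 0.0}, {"yes": 0.0, "no": 1.0, "null": 0.0}]
print(percent_correct(["yes", "no"], ["yes", "no"]))  # 100.0
print(rmse(dists, ["yes", "no"], classes))            # about 0.2887, i.e. sqrt(1/12)
```

Because RMSE is computed from probabilities rather than from the final labels, a learner can classify every instance correctly and still report a nonzero error, as RandomForest and RandomTree do in Experiment 1.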
Experiment 1

| Name of tree | Correctly classified instances (%) | Testing (%) | Root mean squared error |
|---|---|---|---|
| BFTree | 89 | 99 | 0.0588 |
| DecisionStump | 47 | 55 | 0.422 |
| FT | 87 | 98 | 0.1698 |
| J48 | 82 | 98 | 0.0976 |
| J48graft | 82 | 98 | 0.0976 |
| LADTree | 81 | 90 | 0.2317 |
| LMT | 84 | 91 | 0.2344 |
| NBTree | 80 | 98 | 0.2326 |
| RandomForest | 83 | 100 | 0.0781 |
| RandomTree | 83 | 100 | 0.0447 |
| REPTree | 82 | 98 | 0.0985 |
| SimpleCart | 89 | 96 | 0.1511 |
Experiment 2

| Name of tree | Correctly classified instances (%) | Testing (%) | Root mean squared error |
|---|---|---|---|
| BFTree | 89 | 88 | 0.2813 |
| DecisionStump | 47 | 49 | 0.4318 |
| FT | 87 | 90 | 0.2194 |
| J48 | 82 | 88 | 0.2098 |
| J48graft | 82 | 88 | 0.2098 |
| LADTree | 81 | 89 | 0.2494 |
| LMT | 84 | 89 | 0.234 |
| NBTree | 80 | 88 | 0.2569 |
| RandomForest | 83 | 88 | 0.2095 |
| RandomTree | 83 | 88 | 0.209 |
| REPTree | 82 | 88 | 0.2098 |
| SimpleCart | 89 | 87 | 0.2848 |
Result
The results show that the decision trees classify entries already present in the training data well, but predict unknown entries less accurately than expected: excluding DecisionStump, testing accuracy falls from 90–100% in Experiment 1 to 87–90% in Experiment 2.
Future work
Detect and correct incorrect predictions during the transformation process
Automate the transformation of unknown entries
Thank you!