The Art and Technology of Data Mining
University of Toronto, 04/11/23
Data Mining
The Art and Science of Obtaining Knowledge from Data
Dr. Saed Sayad
Agenda
• Explosion of data
• Introduction to data mining
• Examples of data mining in science and engineering
• Challenges and opportunities
Explosion of Data
Data in the world doubles every 20 months!
NASA’s Earth Observing System:
46 megabytes of data per second
4,000,000,000,000 bytes (about 4 terabytes) a day
FBI fingerprint image library:
200,000,000,000,000 bytes (200 terabytes)
In-line image analysis for particle detection:
1 megabyte per second
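As a quick check of these figures, the sketch below recomputes the daily volumes and turns the 20-month doubling into a yearly growth factor. It assumes decimal units (1 megabyte = 10^6 bytes) and a constant, uninterrupted data rate; neither assumption is stated on the slide.

```python
# Sanity-check the data-volume figures, assuming decimal units
# (1 MB = 10**6 bytes) and a constant, uninterrupted data rate.
MB = 10**6
SECONDS_PER_DAY = 24 * 60 * 60

eos_bytes_per_day = 46 * MB * SECONDS_PER_DAY      # NASA EOS at 46 MB/s
inline_bytes_per_day = 1 * MB * SECONDS_PER_DAY    # in-line imaging at 1 MB/s

# "Doubles every 20 months" expressed as a growth factor per 12 months.
annual_growth = 2 ** (12 / 20)

print(f"EOS: {eos_bytes_per_day:,} bytes/day")
print(f"In-line imaging: {inline_bytes_per_day:,} bytes/day")
print(f"Yearly growth factor: {annual_growth:.2f}")
```

Under these assumptions, 46 MB/s works out to 3,974,400,000,000 bytes a day, consistent with the rounded 4,000,000,000,000 on the slide, and doubling every 20 months means roughly 1.52 times more data each year.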
What Do We Need?
Fast, accurate, and scalable data analysis techniques to extract useful knowledge.
The answer is Data Mining.
What is Data Mining?
“Data Mining is the exploration and analysis of large or small quantities of data in order to discover meaningful patterns, trends and rules.”
Data → Data Mining → Knowledge
Data Mining draws on:
• Data Analysis: Statistics; AI, Machine Learning
• Database: Data Warehouse; OLAP
Database
           | Text Files  | Relational Database            | Multi-dimensional Database
Entities   | File        | Table                          | Cube
Attributes | Row and Col | Record, Field, Index           | Dimension, Level, Measurement
Methods    | Read, Write | Select, Insert, Update, Delete | Drill down, Drill up, Drill through
Language   | -           | SQL                            | MDX
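To make the relational column of this table concrete, here is a minimal sketch using Python's built-in sqlite3 module. The sales table, its columns, and its rows are hypothetical; the final GROUP BY query only approximates an OLAP "drill up" (roll-up) in SQL and is not MDX.

```python
import sqlite3

# Hypothetical sales table, used only to illustrate the relational methods
# (SELECT, INSERT, UPDATE, DELETE) listed in the table above.
conn = sqlite3.connect(":memory:")
cur = conn.cursor()
cur.execute("CREATE TABLE sales (region TEXT, city TEXT, amount INTEGER)")

# INSERT: add records (rows); each column is a field.
cur.executemany(
    "INSERT INTO sales (region, city, amount) VALUES (?, ?, ?)",
    [("East", "Toronto", 100), ("East", "Ottawa", 50),
     ("West", "Vancouver", 80), ("West", "Calgary", 40)],
)
# UPDATE and DELETE: modify or remove records that match a condition.
cur.execute("UPDATE sales SET amount = ? WHERE city = ?", (60, "Ottawa"))
cur.execute("DELETE FROM sales WHERE city = ?", ("Calgary",))

# SELECT: read records back at the city level.
detail = cur.execute(
    "SELECT region, city, amount FROM sales ORDER BY region, city"
).fetchall()

# GROUP BY aggregates city-level rows to the region level, which is
# roughly what a "drill up" (roll-up) does along one cube dimension.
rollup = cur.execute(
    "SELECT region, SUM(amount) FROM sales GROUP BY region ORDER BY region"
).fetchall()
conn.close()

print(detail)
print(rollup)
```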
Data Analysis
• Classification
• Regression
• Clustering
• Association
• Sequence Analysis
Data Analysis
[Figure: input variables (attributes) → model → output variables (targets)]
Input variables: X1 numeric (age, income, …); X2 categorical (gender, occupation, …)
Model: e.g. linear models or decision trees, with weights W1, W2
Output variables: Y1 numeric → Regression; Y2 categorical → Classification (good, bad)
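The modeling picture above can be sketched as a linear model with two weights and an intercept, fitted by ordinary least squares; the same score is then either reported directly (regression) or compared with a cut-off (classification). The toy data, the 0/1 encoding of the categorical input, and the cut-off value are assumptions for illustration only.

```python
# Minimal sketch of the "inputs -> model -> targets" picture above:
# a linear model y = w1*x1 + w2*x2 + b fitted by ordinary least squares.
# Toy data and the cut-off of 10.0 below are hypothetical, not from the talk.

def det3(m):
    """Determinant of a 3x3 matrix given as nested lists."""
    return (m[0][0] * (m[1][1] * m[2][2] - m[1][2] * m[2][1])
            - m[0][1] * (m[1][0] * m[2][2] - m[1][2] * m[2][0])
            + m[0][2] * (m[1][0] * m[2][1] - m[1][1] * m[2][0]))


def fit_linear(x1, x2, y):
    """Solve the normal equations (A^T A) w = A^T y by Cramer's rule.

    Returns (w1, w2, b). A categorical input such as gender must first be
    encoded numerically, e.g. as a 0/1 indicator.
    """
    rows = [(a, c, 1.0) for a, c in zip(x1, x2)]
    ata = [[sum(r[i] * r[j] for r in rows) for j in range(3)] for i in range(3)]
    aty = [sum(r[i] * t for r, t in zip(rows, y)) for i in range(3)]
    d = det3(ata)
    weights = []
    for k in range(3):
        mk = [row[:] for row in ata]
        for i in range(3):
            mk[i][k] = aty[i]          # replace column k with A^T y
        weights.append(det3(mk) / d)
    return tuple(weights)


def predict(w, x1, x2):
    """Regression: the model output is the numeric score itself."""
    w1, w2, b = w
    return w1 * x1 + w2 * x2 + b


def classify(w, x1, x2, cutoff):
    """Classification: threshold the same score into a category."""
    return "good" if predict(w, x1, x2) >= cutoff else "bad"


# Toy data generated from y = 2*x1 + 3*x2 + 1 (x2 is a 0/1 indicator).
x1 = [1, 2, 3, 4, 5]
x2 = [0, 1, 0, 1, 1]
y = [2 * a + 3 * c + 1 for a, c in zip(x1, x2)]

w = fit_linear(x1, x2, y)
print(w)                                 # close to (2.0, 3.0, 1.0)
print(predict(w, 6, 0))                  # close to 13.0
print(classify(w, 6, 0, cutoff=10.0))
```

Solving the 3x3 system by hand keeps the sketch dependency-free; with more inputs a linear-algebra library would be the natural choice.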
Data Analysis (cont.)
Clustering
[Figure: records grouped into clusters in a scatter plot of Age vs. Income]
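A minimal k-means sketch of the Age/Income clustering idea follows. k-means is one common clustering algorithm, chosen here for illustration; the slide does not name an algorithm. The points (age, income in thousands) and the starting centroids are hypothetical.

```python
import math

# Minimal k-means sketch for the Age/Income clustering picture.
# The points (age, income in thousands) are hypothetical toy data, and the
# initial centroids are fixed so the result is deterministic.

def kmeans(points, centroids, iterations=10):
    """Alternate between assigning points to the nearest centroid and
    moving each centroid to the mean of its assigned points."""
    for _ in range(iterations):
        clusters = [[] for _ in centroids]
        for p in points:
            nearest = min(range(len(centroids)),
                          key=lambda k: math.dist(p, centroids[k]))
            clusters[nearest].append(p)
        centroids = [
            tuple(sum(coord) / len(members) for coord in zip(*members))
            if members else centroids[k]       # keep an empty cluster's centroid
            for k, members in enumerate(clusters)
        ]
    return centroids, clusters


points = [(25, 30), (27, 35), (30, 32), (50, 90), (55, 95), (60, 100)]
centroids, clusters = kmeans(points, centroids=[(25, 30), (60, 100)])
print(centroids)
print(clusters)
```

In practice, features on different scales (years vs. dollars) are usually standardized first, and initial centroids are chosen randomly or with k-means++ rather than fixed by hand.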
Association
Transactions:
1: chips, coke, chocolate
2: gum, chips
3: chips, coke
4: …
Probability (chips, coke)?
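The question "Probability (chips, coke)?" can be answered with support and confidence, the usual measures in association-rule mining. This sketch uses only the three transactions shown; transaction 4 onward is elided on the slide, so the numbers are illustrative rather than the talk's. The 0.5 minimum-support cut-off is an assumption.

```python
from itertools import combinations

# Support and confidence for the market-basket example above, using only the
# three transactions shown (transaction 4 onward is elided in the slide).
transactions = [
    {"chips", "coke", "chocolate"},
    {"gum", "chips"},
    {"chips", "coke"},
]


def support(itemset, transactions):
    """Fraction of transactions containing every item in itemset,
    i.e. an estimate of P(itemset)."""
    itemset = set(itemset)
    return sum(itemset <= t for t in transactions) / len(transactions)


def confidence(antecedent, consequent, transactions):
    """Estimate of P(consequent | antecedent)."""
    both = set(antecedent) | set(consequent)
    return support(both, transactions) / support(antecedent, transactions)


items = sorted(set().union(*transactions))
frequent_pairs = [
    pair for pair in combinations(items, 2)
    if support(pair, transactions) >= 0.5     # hypothetical minimum support
]

print(support({"chips", "coke"}, transactions))        # P(chips, coke)
print(confidence({"coke"}, {"chips"}, transactions))   # P(chips | coke)
print(frequent_pairs)
```

On these three baskets, chips and coke appear together in 2 of 3 transactions (support about 0.67), and every basket with coke also contains chips (confidence 1.0).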
Sequence Analysis
…ATCTTTAAGGGACTAAAATGCCATAAAAATCCATGGGAGAGACCCAAAAAA…
Predict Xt from Xt-1
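For the sequence example, one simple approach is a first-order Markov model that estimates P(Xt | Xt-1) by counting adjacent pairs. This is a swapped-in illustrative technique, not necessarily what the talk used, and it runs on the visible portion of the DNA fragment only.

```python
from collections import Counter, defaultdict

# First-order Markov sketch: estimate P(X_t | X_{t-1}) from the visible part
# of the DNA fragment on the slide (the flanking "…" is left out).
sequence = "ATCTTTAAGGGACTAAAATGCCATAAAAATCCATGGGAGAGACCCAAAAAA"


def transition_probabilities(seq):
    """Count each (previous, current) pair and normalise per previous base."""
    counts = defaultdict(Counter)
    for prev, curr in zip(seq, seq[1:]):
        counts[prev][curr] += 1
    return {
        prev: {curr: n / sum(nexts.values()) for curr, n in nexts.items()}
        for prev, nexts in counts.items()
    }


def most_likely_next(probs, prev):
    """Predict X_t as the most probable successor of X_{t-1}."""
    return max(probs[prev], key=probs[prev].get)


probs = transition_probabilities(sequence)
for base in sorted(probs):
    print(base, probs[base])
print(most_likely_next(probs, "A"))
```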
Data Mining in Research Life Cycle
[Figure: research cycle linking questions/needs, search, research, experiment, modeling, and report, supported by library, data, database, and data analysis]
Data Mining – Modeling Steps
1. Problem Definition
2. Data Preparation
3. Exploration
4. Modeling
5. Evaluation
6. Deployment
Examples of data mining in science & engineering
1. Data mining in Biomedical Engineering
“Robotic Arm Control Using Data Mining Techniques”
2. Data mining in Chemical Engineering
“Data Mining for In-line Image Monitoring of Extrusion Processing”
1. Problem Definition
“Control a robotic arm by means of EMG signals from biceps and triceps muscles.”
Muscle contraction by movement (H = high, L = low):
Movement   | Biceps | Triceps
Supination | H      | H
Pronation  | L      | L
Flexion    | H      | L
Extension  | L      | H
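Read as rules, the table maps each high/low combination of biceps and triceps activity to one movement. This sketch assumes normalized signals in [0, 1] and a single hypothetical cut-off of 0.5; the talk does not give a threshold.

```python
# Minimal sketch of the contraction table as a rule-based classifier.
# Assumptions (not from the talk): EMG signals are normalised to [0, 1] and a
# single hypothetical cut-off of 0.5 separates Low (L) from High (H).
RULES = {
    ("H", "H"): "Supination",
    ("L", "L"): "Pronation",
    ("H", "L"): "Flexion",
    ("L", "H"): "Extension",
}


def level(signal, cutoff=0.5):
    """Map a normalised EMG amplitude to 'H' or 'L'."""
    return "H" if signal >= cutoff else "L"


def movement(biceps, triceps, cutoff=0.5):
    """Look up the arm movement from the biceps/triceps contraction levels."""
    return RULES[(level(biceps, cutoff), level(triceps, cutoff))]


print(movement(0.9, 0.2))   # biceps high, triceps low
```

A fixed cut-off like this is brittle for noisy EMG signals, which is one reason to learn the decision boundary from data with the classifiers listed under Modeling.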
2. Data Preparation
The dataset includes 80 records.
There are two input variables: the biceps signal and the triceps signal.
There is one output variable with four possible values: Supination, Pronation, Flexion, and Extension.
3. Exploration
[Figure: scatter plots of the triceps signal and the biceps signal against record number, by class: Flexion, Extension, Supination, Pronation]
4. Modeling
Classification:
• OneR
• Decision Tree
• Naïve Bayesian
• K-Nearest Neighbors
• Neural Networks
• Linear Discriminant Analysis
• Support Vector Machines
• …
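As one example of the listed classifiers, here is a k-nearest-neighbors sketch for the two-input problem. The twelve training points are hypothetical values chosen to follow the high/low pattern of the contraction table; they are not the study's 80 records.

```python
import math
from collections import Counter

# k-nearest-neighbours sketch for the two-input EMG problem. The training
# points (biceps, triceps) are hypothetical, chosen to follow the H/L table;
# they are not the 80 records from the study.
TRAIN = [
    ((0.90, 0.90), "Supination"), ((0.80, 0.85), "Supination"), ((0.85, 0.80), "Supination"),
    ((0.10, 0.10), "Pronation"),  ((0.15, 0.20), "Pronation"),  ((0.20, 0.10), "Pronation"),
    ((0.90, 0.10), "Flexion"),    ((0.80, 0.20), "Flexion"),    ((0.85, 0.15), "Flexion"),
    ((0.10, 0.90), "Extension"),  ((0.20, 0.80), "Extension"),  ((0.15, 0.85), "Extension"),
]


def knn_predict(train, query, k=3):
    """Majority vote among the k training points closest to query."""
    nearest = sorted(train, key=lambda item: math.dist(item[0], query))[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]


print(knn_predict(TRAIN, (0.85, 0.20)))
print(knn_predict(TRAIN, (0.10, 0.85)))
```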
6. Model Deployment
A neural network model was successfully implemented inside the robotic arm.
2. Data mining in Chemical Engineering
“Data Mining for In-line Image Monitoring of Extrusion Processing”
Plastics Extrusion
[Figure: plastic pellets → plastic melt]
Film Extrusion
[Figure: extruder producing plastic film, with a defect due to a particle contaminant]
In-Line Monitoring
[Figure: transition piece with window ports]
[Figure: light source, extruder and interface, optical assembly, and imaging computer]
Melt Without Contaminant Particles (WO)
Melt With Contaminant Particles (WP)
1. Problem Definition
Classify images into those with particles (WP) and those without particles (WO).
2. Data Preparation
2000 images
54 input variables, all numeric
One output variable with two possible values: With Particle (WP) and Without Particle (WO)
2. Data Preparation (cont.)
Images were pre-processed to remove noise.
Dataset 1 (sharp images): 1350 images, including 1257 without particles and 91 with particles
Dataset 2 (sharp and blurry images): 2000 images, including 1909 without particles or with blurry particles, and 91 with particles
3. Exploration
Demo!
4. Modeling
Classification:
• OneR
• Decision Tree
• 3-Nearest Neighbors
• Naïve Bayesian
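OneR builds its rule from a single attribute, the same form as the rule reported under Evaluation ("If pixel_density_max < 142 then WP"). This simplified sketch considers one threshold per numeric attribute, whereas Holte's original OneR splits numeric attributes into several intervals. The attribute names and data are hypothetical.

```python
# Simplified OneR sketch: for every numeric attribute, find the single
# threshold rule "if value < t then class_a else class_b" with the highest
# training accuracy, then keep the best attribute. Holte's original OneR
# discretises numeric attributes into several intervals; this two-interval
# version is a deliberate simplification. The data below are hypothetical.

def best_threshold(values, labels):
    """Return (threshold, label_below, label_above, accuracy) for one attribute."""
    classes = sorted(set(labels))
    best = None
    for t in sorted(set(values))[1:]:          # skip min so both sides are non-empty
        below = [lab for v, lab in zip(values, labels) if v < t]
        above = [lab for v, lab in zip(values, labels) if v >= t]
        lab_below = max(classes, key=below.count)
        lab_above = max(classes, key=above.count)
        acc = (below.count(lab_below) + above.count(lab_above)) / len(labels)
        if best is None or acc > best[3]:
            best = (t, lab_below, lab_above, acc)
    return best


def one_r(rows, labels):
    """Return (attribute_index, threshold, label_below, label_above, accuracy)."""
    best = None
    for j in range(len(rows[0])):
        rule = best_threshold([r[j] for r in rows], labels)
        if rule is not None and (best is None or rule[3] > best[4]):
            best = (j,) + rule
    return best


# Toy rows: (pixel_density_max, some_other_feature); names are illustrative.
rows = [(100, 5), (120, 3), (130, 8), (150, 2), (160, 9), (170, 4)]
labels = ["WP", "WP", "WP", "WO", "WO", "WO"]
print(one_r(rows, labels))
```

Single-attribute rules are easy to read and to deploy, which helps explain why such a simple learner can be competitive when one feature separates the classes well.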
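5. Evaluation
Classification accuracy (%), estimated with 10-fold cross-validation:
Dataset               | Attributes | Classes | OneR | Decision Tree (C4.5) | 3-NN | Naïve Bayes
Sharp images          | 54         | 2       | 99.9 | 99.8                 | 99.8 | 95.8
Sharp + blurry images | 54         | 2       | 98.5 | 97.8                 | 97.8 | 93.3
Sharp + blurry images | 54         | 3       | 87   | 87                   | 84   | 79
Example rule: If pixel_density_max < 142 then WP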
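The accuracies above were estimated with 10-fold cross-validation: the data are split into 10 parts, each part is held out once for testing while the model is trained on the other 9, and the 10 accuracies are averaged. The sketch below wraps a 3-nearest-neighbor classifier in that loop; the one-dimensional toy data are hypothetical, and folds are assigned round-robin without shuffling or stratification, both of which real evaluations would normally use.

```python
from collections import Counter

# Sketch of k-fold cross-validation (10 folds, as on the slide) wrapped around
# a 3-nearest-neighbour classifier. The 1-D toy data are hypothetical, and the
# folds are formed by taking every 10th record, without shuffling or
# stratification, to keep the example deterministic.

def knn_predict(train, query, k=3):
    """Majority vote among the k training values closest to query."""
    nearest = sorted(train, key=lambda item: abs(item[0] - query))[:k]
    return Counter(label for _, label in nearest).most_common(1)[0][0]


def k_fold_accuracy(data, folds=10, k=3):
    """Average test accuracy over `folds` train/test splits."""
    scores = []
    for f in range(folds):
        test = [d for i, d in enumerate(data) if i % folds == f]
        train = [d for i, d in enumerate(data) if i % folds != f]
        if not test:                  # more folds than records
            continue
        correct = sum(knn_predict(train, x, k) == y for x, y in test)
        scores.append(correct / len(test))
    return sum(scores) / len(scores)


data = [(float(x), "WO") for x in range(10)] + [(float(x), "WP") for x in range(100, 110)]
print(k_fold_accuracy(data))
```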
6. Deploy Model
A Visual Basic program will be developed to implement the model.
Challenges and Opportunities
• Data mining is a ‘top ten’ emerging technology.
• High-paying jobs in the financial, medical, and engineering sectors.
• Faster, more accurate, and more scalable techniques.
• Incremental, online, and real-time learning algorithms.
• Parallel and distributed data processing techniques.
Data mining is an exciting and challenging field with the ability to solve many complex scientific and business problems.
You can be part of the solution!