
Page 1: Data Mining: The Art and Science of Obtaining Knowledge from Data

University of Toronto, 04/22/23

Dr. Saed Sayad

Page 2: Agenda

• Explosion of data
• Introduction to data mining
• Examples of data mining in science and engineering
• Challenges and opportunities

Page 3: Explosion of Data

The amount of data in the world doubles every 20 months!

• NASA’s Earth Observing System: 46 megabytes of data per second (roughly 4,000,000,000,000 bytes a day)
• FBI fingerprint image library: 200,000,000,000,000 bytes
• In-line image analysis for particle detection: 1 megabyte per second
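The daily figure follows from the per-second rate with a little arithmetic. The sketch below assumes decimal units (1 megabyte = 10^6 bytes) and treats the 20-month doubling claim as a simple exponential model; the function names are illustrative only.

```python
# Sketch: sanity-check the data-volume figures quoted on the slide.
# Assumption: "megabyte" means 10**6 bytes (decimal SI units).
SECONDS_PER_DAY = 24 * 60 * 60  # 86,400 seconds


def bytes_per_day(megabytes_per_second):
    """Convert a sustained rate in MB/s into bytes per day."""
    return megabytes_per_second * 10**6 * SECONDS_PER_DAY


def growth_factor(months, doubling_months=20.0):
    """Growth of a quantity that doubles every `doubling_months` months."""
    return 2 ** (months / doubling_months)


eos = bytes_per_day(46)
print(f"46 MB/s -> {eos:,.0f} bytes/day")  # 3,974,400,000,000, roughly 4 x 10**12
print(f"Growth over 5 years: {growth_factor(60):.0f}x")  # 60 months = 3 doublings
```

The same conversion puts the 1 MB/s particle-detection stream at 86.4 GB per day.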

Pages 4-7: Explosion of Data (cont.) [slide figures not included in the transcript]

Page 8: What Do We Need?

Fast, accurate, and scalable data analysis techniques to extract useful knowledge.

The answer is data mining.

Page 9: What Is Data Mining?

“Data Mining is the exploration and analysis of large or small quantities of data in order to discover meaningful patterns, trends and rules.”

[Diagram: Data → Data Mining → Knowledge]

Page 10

[Diagram relating Data Mining to AI/Machine Learning, Statistics, Database, Data Analysis, and Data Warehouse/OLAP]

Page 11: Data Mining

[Diagram: Data Mining draws on Data Analysis and Database.]
• Data Analysis: Statistics, Machine Learning
• Database: Data Warehouse, OLAP

Page 12: Database

| | Text Files | Relational Database | Multi-dimensional Database |
|---|---|---|---|
| Entities | File | Table | Cube |
| Attributes | Row and column | Record, Field, Index | Dimension, Level, Measurement |
| Methods | Read, Write | Select, Insert, Update, Delete | Drill down, Drill up, Drill through |
| Language | - | SQL | MDX |
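A minimal sketch of the relational column, using Python's built-in sqlite3 module with a hypothetical sales table. SQLite has no MDX or cube engine, so the multidimensional "drill up" is only approximated here by re-aggregating with GROUP BY at a coarser level of a dimension (city to region); the table and column names are assumptions for illustration.

```python
import sqlite3

# Hypothetical sales table, used only for illustration.
conn = sqlite3.connect(":memory:")
cur = conn.cursor()
cur.execute("CREATE TABLE sales (region TEXT, city TEXT, year INTEGER, amount REAL)")

# Insert: add records (rows) to the table.
cur.executemany(
    "INSERT INTO sales VALUES (?, ?, ?, ?)",
    [
        ("East", "Toronto", 2021, 120.0),
        ("East", "Ottawa", 2021, 80.0),
        ("West", "Vancouver", 2021, 150.0),
        ("West", "Calgary", 2022, 90.0),
    ],
)

# Update and Delete: modify or remove existing records.
cur.execute("UPDATE sales SET amount = 100.0 WHERE city = 'Ottawa'")
cur.execute("DELETE FROM sales WHERE year = 2022")

# Select at the detailed (city) level of the location dimension...
by_city = cur.execute(
    "SELECT region, city, SUM(amount) FROM sales "
    "GROUP BY region, city ORDER BY region, city"
).fetchall()

# ...then "drill up" (roll up) to the coarser region level.
by_region = cur.execute(
    "SELECT region, SUM(amount) FROM sales GROUP BY region ORDER BY region"
).fetchall()

print(by_city)
print(by_region)
conn.close()
```

In a real multidimensional database the same roll-up would be expressed against a cube in MDX rather than emulated with SQL aggregates.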

Page 13: Data Analysis

• Classification
• Regression
• Clustering
• Association
• Sequence Analysis

Page 14: Data Analysis

[Diagram: input variables (attributes) X1, X2 feed a model with weights W1, W2, which predicts output variables (targets) Y1, Y2.]
• Input variables may be numeric (age, income, …) or categorical (gender, occupation, …).
• Model: linear models or decision trees.
• Numeric target: Regression (0,1)
• Categorical target: Classification (good, bad)
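To make the regression/classification split concrete, here is a small pure-Python sketch: a least-squares linear model for a numeric target and a one-split decision tree (a "decision stump") for a categorical target. The data, the names fit_linear and fit_stump, and the good/bad credit framing are hypothetical; real projects would normally use a library such as scikit-learn.

```python
from collections import Counter
from statistics import mean


def fit_linear(xs, ys):
    """Least-squares fit of y = w0 + w1 * x for a single numeric input."""
    x_bar, y_bar = mean(xs), mean(ys)
    w1 = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys)) / sum(
        (x - x_bar) ** 2 for x in xs
    )
    w0 = y_bar - w1 * x_bar
    return w0, w1


def fit_stump(xs, labels):
    """One-split decision tree: keep the threshold with the fewest training errors."""
    best = None
    for t in sorted(set(xs)):
        left = [lab for x, lab in zip(xs, labels) if x < t]
        right = [lab for x, lab in zip(xs, labels) if x >= t]
        if not left or not right:
            continue  # a split must put examples on both sides
        left_lab = Counter(left).most_common(1)[0][0]
        right_lab = Counter(right).most_common(1)[0][0]
        errors = sum(l != left_lab for l in left) + sum(l != right_lab for l in right)
        if best is None or errors < best[0]:
            best = (errors, t, left_lab, right_lab)
    _, t, left_lab, right_lab = best
    return lambda x: left_lab if x < t else right_lab


# Regression: numeric target (hypothetical age -> income in $1000s).
w0, w1 = fit_linear([25, 35, 45, 55], [30, 40, 50, 60])
print(f"income = {w0} + {w1} * age")

# Classification: categorical target (hypothetical income -> credit class).
predict = fit_stump([20, 30, 60, 80], ["bad", "bad", "good", "good"])
print(predict(50), predict(70))
```

The stump is the smallest possible decision tree; a full tree applies the same split search recursively to each side.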

Page 15: Data Analysis (cont.)

Clustering: [scatter plot of Age against Income]
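Clustering groups similar records without a target variable. A minimal k-means sketch on hypothetical (age, income) points follows; the deterministic "first k points" initialization and the data are assumptions chosen to keep the example reproducible, not part of the talk.

```python
import math


def kmeans(points, k, iterations=20):
    """Plain k-means: assign points to the nearest centroid, then recompute centroids."""
    centroids = [tuple(p) for p in points[:k]]  # deterministic init: first k points
    clusters = []
    for _ in range(iterations):
        clusters = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda i: math.dist(p, centroids[i]))
            clusters[nearest].append(p)
        # New centroid = coordinate-wise mean of its members (keep old one if empty).
        updated = [
            tuple(sum(coord) / len(members) for coord in zip(*members)) if members else centroids[i]
            for i, members in enumerate(clusters)
        ]
        if updated == centroids:  # assignments are stable, so stop early
            break
        centroids = updated
    return centroids, clusters


# Hypothetical (age, income in $1000s) records, not data from the talk.
people = [(25, 30), (60, 90), (27, 32), (62, 95), (24, 28), (58, 88)]
centroids, clusters = kmeans(people, k=2)
for c, members in zip(centroids, clusters):
    print(c, members)
```

Age and income are on different scales here; in practice features are usually standardized before computing distances.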

Association: transactions 1: chips, coke, chocolate; 2: gum, chips; 3: chips, coke; 4: …

Probability (chips, coke)?
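The question "Probability (chips, coke)?" is usually answered with support and confidence. The sketch below uses only transactions 1 to 3 from the slide, since transaction 4 is elided; the helper names are illustrative.

```python
# Transactions 1-3 from the slide; transaction 4 is elided ("…") and left out.
transactions = [
    {"chips", "coke", "chocolate"},
    {"gum", "chips"},
    {"chips", "coke"},
]


def support(itemset, baskets):
    """Fraction of baskets that contain every item in `itemset`."""
    itemset = set(itemset)
    return sum(itemset <= basket for basket in baskets) / len(baskets)


def confidence(antecedent, consequent, baskets):
    """Estimated P(consequent | antecedent) = support(A and C) / support(A)."""
    both = set(antecedent) | set(consequent)
    return support(both, baskets) / support(antecedent, baskets)


print(support({"chips", "coke"}, transactions))       # P(chips and coke) = 2/3
print(confidence({"chips"}, {"coke"}, transactions))  # P(coke | chips) = 2/3
print(confidence({"coke"}, {"chips"}, transactions))  # P(chips | coke) = 1.0
```

On these three baskets, every basket with coke also contains chips, which is the kind of rule (coke → chips) association mining looks for.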

Sequence Analysis: …ATCTTTAAGGGACTAAAATGCCATAAAAATCCATGGGAGAGACCCAAAAAA…

[Diagram: X_{t-1} → X_t along the sequence (time T)]
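One simple way to model how X_t depends on X_{t-1} is a first-order Markov chain estimated from adjacent symbol pairs. This is a hedged sketch on the visible fragment of the slide's DNA string (the "…" parts are omitted); it is not presented as the method used in the talk.

```python
from collections import Counter, defaultdict

# Visible DNA fragment from the slide (the leading/trailing "…" are omitted).
seq = "ATCTTTAAGGGACTAAAATGCCATAAAAATCCATGGGAGAGACCCAAAAAA"


def transition_probs(s):
    """First-order Markov estimate of P(X_t | X_{t-1}) from adjacent pairs."""
    counts = defaultdict(Counter)
    for prev, cur in zip(s, s[1:]):
        counts[prev][cur] += 1
    return {
        prev: {cur: n / sum(nexts.values()) for cur, n in nexts.items()}
        for prev, nexts in counts.items()
    }


probs = transition_probs(seq)
print(probs["A"])  # estimated distribution of the base that follows an A
```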

Page 16: Data Mining in the Research Life Cycle

[Diagram of the research life cycle. Stages: Questions/Needs, Search, Research, Experiment, Modeling, Report. Supporting resources: Library, Data, Database, Data Analysis.]

Page 17: Data Mining – Modeling Steps

1. Problem Definition
2. Data Preparation
3. Exploration
4. Modeling
5. Evaluation
6. Deployment

Page 18: Agenda

• Explosion of data
• Introduction to data mining
• Examples of data mining in science and engineering
• Challenges and opportunities

Page 19: Examples of Data Mining in Science & Engineering

1. Data mining in biomedical engineering: “Robotic Arm Control Using Data Mining Techniques”
2. Data mining in chemical engineering: “Data Mining for In-line Image Monitoring of Extrusion Processing”

Page 20: 1. Problem Definition

“Control a robotic arm by means of EMG signals from biceps and triceps muscles.”

[Images of the four movements: Supination, Pronation, Flexion, Extension]

Muscle contraction for each movement (H = high, L = low):

| Movement | Biceps | Triceps |
|---|---|---|
| Supination | H | H |
| Pronation | L | L |
| Flexion | H | L |
| Extension | L | H |
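Read as a lookup rule, the contraction table reduces control to thresholding each EMG signal and looking up the movement. This is a hedged sketch: the normalized signal scale and the 0.5 threshold are assumptions, and the talk itself reports deploying a neural network model, not this rule.

```python
# Contraction pattern -> movement, from the table above (H = high, L = low).
MOVEMENTS = {
    ("H", "H"): "Supination",
    ("L", "L"): "Pronation",
    ("H", "L"): "Flexion",
    ("L", "H"): "Extension",
}


def level(signal, threshold=0.5):
    """Map a normalized EMG amplitude to 'H' or 'L' (threshold is hypothetical)."""
    return "H" if signal >= threshold else "L"


def movement(biceps, triceps, threshold=0.5):
    """Look up the movement for a (biceps, triceps) reading."""
    return MOVEMENTS[(level(biceps, threshold), level(triceps, threshold))]


print(movement(0.8, 0.2))  # biceps high, triceps low
```

A fixed threshold is brittle with noisy EMG signals, which is one reason to learn the mapping from labeled records instead.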

Page 21: 2. Data Preparation

• The dataset includes 80 records.
• There are two input variables: biceps signal and triceps signal.
• There is one output variable with four possible values: Supination, Pronation, Flexion, and Extension.

Page 22: 3. Exploration

[Scatter plot: triceps signal versus record number, for the classes Flexion, Extension, Supination, and Pronation]

Page 23: 3. Exploration (cont.)

[Scatter plot: biceps signal versus record number, for the classes Flexion, Extension, Supination, and Pronation]

Page 24: 4. Modeling

Classification:
• OneR
• Decision Tree
• Naïve Bayesian
• K-Nearest Neighbors
• Neural Networks
• Linear Discriminant Analysis
• Support Vector Machines
• …
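Of the classifiers listed, k-nearest neighbors is the easiest to sketch with two inputs. The readings below are hypothetical normalized (biceps, triceps) values, not the talk's 80-record dataset, and k = 3 mirrors the 3-NN used in the second example.

```python
import math
from collections import Counter


def knn_predict(train, query, k=3):
    """Classify `query` by majority vote among its k nearest training points.

    `train` is a list of ((biceps, triceps), label) pairs.
    """
    nearest = sorted(train, key=lambda item: math.dist(item[0], query))[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]


# Hypothetical normalized EMG readings, NOT the 80-record dataset from the talk.
train = [
    ((0.9, 0.8), "Supination"), ((0.8, 0.9), "Supination"),
    ((0.1, 0.2), "Pronation"), ((0.2, 0.1), "Pronation"),
    ((0.9, 0.1), "Flexion"), ((0.8, 0.2), "Flexion"),
    ((0.1, 0.9), "Extension"), ((0.2, 0.8), "Extension"),
]
print(knn_predict(train, (0.85, 0.15)))
```

k-NN stores the training records and defers all work to prediction time, which keeps training trivial but makes each prediction scan the data.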

Page 25: 6. Model Deployment

A neural network model was successfully implemented inside the robotic arm.

Page 26: Examples of Data Mining in Science & Engineering

1. Data mining in biomedical engineering: “Robotic Arm Control Using Data Mining Techniques”
2. Data mining in chemical engineering: “Data Mining for In-line Image Monitoring of Extrusion Processing”

Page 27: Plastics Extrusion

[Diagram: plastic pellets → plastic melt]

Page 28: Film Extrusion

[Diagram: extruder producing plastic film, with a defect due to a particle contaminant]

Page 29: In-Line Monitoring

[Diagram labels: Transition Piece, Window Ports]

Page 30: In-Line Monitoring

[Diagram of the monitoring setup: light source, extruder and interface, optical assembly, imaging computer]

Page 31: Melt Without Contaminant Particles (WO)

Page 32: Melt With Contaminant Particles (WP)

Page 33: 1. Problem Definition

Classify images into those with particles (WP) and those without particles (WO).

[Example images: WO, WP]

Page 34: 2. Data Preparation

• 2000 images
• 54 input variables, all numeric
• One output variable with two possible values: With Particle (WP) and Without Particle (WO)

Page 35: 2. Data Preparation (cont.)

Images were pre-processed to remove noise.
• Dataset 1 (sharp images): 1350 images, including 1257 without particles and 91 with particles
• Dataset 2 (sharp and blurry images): 2000 images, including 1909 that are either without particles or with blurry particles, and 91 with particles
• 54 input variables, all numeric
• One output variable with two possible values (WP and WO)

Page 36: 3. Exploration

Demo!

Page 37: 4. Modeling

Classification:
• OneR
• Decision Tree
• 3-Nearest Neighbors
• Naïve Bayesian

Page 38: 5. Evaluation

| Dataset | Attributes | Classes | OneR | C4.5 (decision tree) | 3-NN | Naïve Bayes |
|---|---|---|---|---|---|---|
| Sharp images | 54 | 2 | 99.9 | 99.8 | 99.8 | 95.8 |
| Sharp + blurry images | 54 | 2 | 98.5 | 97.8 | 97.8 | 93.3 |
| Sharp + blurry images | 54 | 3 | 87 | 87 | 84 | 79 |

Accuracy (%) estimated with 10-fold cross-validation.
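A sketch of the 10-fold cross-validation procedure, assuming plain (non-stratified) shuffled folds; the talk does not say how its folds were formed. As a usage example, a majority-class baseline on the Dataset 2 label counts (1909 WO, 91 WP) gives a reference point for reading the accuracy column, assuming the two-class "Sharp + blurry images" row corresponds to Dataset 2.

```python
import random
from collections import Counter


def k_fold_accuracy(X, y, fit, k=10, seed=0):
    """Estimate accuracy with k-fold cross-validation.

    `fit(X_train, y_train)` must return a function mapping one example to a label.
    Folds come from a shuffled index list and are not stratified.
    """
    idx = list(range(len(X)))
    random.Random(seed).shuffle(idx)
    folds = [idx[i::k] for i in range(k)]  # k disjoint folds covering every index
    correct = 0
    for fold in folds:
        held_out = set(fold)
        train = [i for i in idx if i not in held_out]
        model = fit([X[i] for i in train], [y[i] for i in train])
        correct += sum(model(X[i]) == y[i] for i in fold)
    return correct / len(X)


def majority_fit(X_train, y_train):
    """Baseline 'model' that always predicts the most frequent training label."""
    label = Counter(y_train).most_common(1)[0][0]
    return lambda x: label


# Usage: majority-class baseline on the Dataset 2 label counts (1909 WO, 91 WP).
y = ["WO"] * 1909 + ["WP"] * 91
X = [None] * len(y)  # the baseline ignores the 54 input variables
baseline = k_fold_accuracy(X, y, majority_fit)
print(f"Majority-class baseline: {baseline:.2%}")
```

Always predicting WO already scores 1909/2000 = 95.45% on this class mix, so per-class measures such as recall on WP are worth reporting alongside overall accuracy.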

If pixel_density_max < 142 then WP
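A single-attribute threshold rule like the one above is what OneR-style learners produce. The sketch below is a simplified OneR: for each numeric attribute it finds the best single cut point (real OneR discretizes into several buckets with a minimum bucket size), then keeps the attribute with the fewest training errors. The records and the second attribute blob_area are hypothetical, so the learned threshold differs from the slide's 142.

```python
from collections import Counter


def best_split(values, labels):
    """Best single threshold on one numeric attribute: (errors, t, below, at_or_above)."""
    best = None
    for t in sorted(set(values))[1:]:  # skip the minimum so both sides are non-empty
        below = [lab for v, lab in zip(values, labels) if v < t]
        above = [lab for v, lab in zip(values, labels) if v >= t]
        lo = Counter(below).most_common(1)[0][0]
        hi = Counter(above).most_common(1)[0][0]
        errors = sum(l != lo for l in below) + sum(l != hi for l in above)
        if best is None or errors < best[0]:
            best = (errors, t, lo, hi)
    return best


def one_r(rows, labels, attributes):
    """Simplified OneR: pick the attribute whose best split makes the fewest errors."""
    scored = []
    for name in attributes:
        split = best_split([r[name] for r in rows], labels)
        if split is not None:
            scored.append((split[0], name, split[1], split[2], split[3]))
    _, name, t, lo, hi = min(scored)
    return f"If {name} < {t} then {lo} else {hi}"


# Hypothetical image features; blob_area is an invented attribute name.
rows = [
    {"pixel_density_max": 120, "blob_area": 5},
    {"pixel_density_max": 130, "blob_area": 1},
    {"pixel_density_max": 150, "blob_area": 4},
    {"pixel_density_max": 160, "blob_area": 2},
    {"pixel_density_max": 170, "blob_area": 3},
]
labels = ["WP", "WP", "WO", "WO", "WO"]
print(one_r(rows, labels, ["pixel_density_max", "blob_area"]))
```

Rules of this shape are easy to deploy, which fits a plan to hand-code the model in a separate program.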

Page 39: 6. Deploy Model

A Visual Basic program will be developed to implement the model.

Page 40: Agenda

• Explosion of data
• Introduction to data mining
• Examples of data mining in science and engineering
• Challenges and opportunities

Page 41: Challenges and Opportunities

• Data mining is a ‘top ten’ emerging technology.
• High-paying jobs in the financial, medical, and engineering sectors.
• Faster, more accurate, and more scalable techniques.
• Incremental, online, and real-time learning algorithms.
• Parallel and distributed data processing techniques.
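Incremental and online algorithms update a model one record at a time instead of re-reading the whole dataset. A small building block is Welford's online mean/variance update, sketched below; per-class running statistics of this kind are, for example, what an incremental Gaussian Naive Bayes model needs. The class name is illustrative.

```python
class RunningStats:
    """Welford's online algorithm: update mean and variance one value at a time."""

    def __init__(self):
        self.n = 0
        self.mean = 0.0
        self._m2 = 0.0  # sum of squared deviations from the current mean

    def update(self, x):
        """Fold one new observation into the running statistics."""
        self.n += 1
        delta = x - self.mean
        self.mean += delta / self.n
        self._m2 += delta * (x - self.mean)

    @property
    def variance(self):
        """Sample variance (n - 1 denominator); 0.0 until two values are seen."""
        return self._m2 / (self.n - 1) if self.n > 1 else 0.0


stats = RunningStats()
for value in [2, 4, 4, 4, 5, 5, 7, 9]:  # stands in for a data stream
    stats.update(value)
print(stats.n, stats.mean, stats.variance)
```

Memory use stays constant no matter how many values arrive, which is the property real-time learning depends on.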

Page 42

Data mining is an exciting and challenging field with the ability to solve many complex scientific and business problems.

You can be part of the solution!