Data Mining Chapter 1 Introduction -- Basic Data Mining Tasks -- Related Concepts -- Data Mining Techniques

Upload: carol-wendy-hamilton

Post on 16-Jan-2016

Data Mining

Chapter 1

Introduction

-- Basic Data Mining Tasks

-- Related Concepts

-- Data Mining Techniques

Definition:

Data mining is defined as finding hidden information in a database.

A general database is accessed as follows:

[Figure: a PC sends an SQL query to the DBMS, which accesses the database and returns the results.]
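As a rough illustration of this conventional access path, the sketch below runs one SQL query through Python's built-in sqlite3 module. The table name, columns, and rows are made up for the example; the point is only that an ordinary query returns exactly what was asked for, with nothing hidden being discovered.

```python
import sqlite3

# Hypothetical in-memory database; table name and rows are illustrative only.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE purchases (customer TEXT, amount REAL)")
conn.executemany(
    "INSERT INTO purchases VALUES (?, ?)",
    [("ann", 20.0), ("bob", 35.5), ("ann", 12.5)],
)

# A conventional SQL query: the result is fully specified by the statement,
# here the total amount spent per customer.
totals = conn.execute(
    "SELECT customer, SUM(amount) FROM purchases "
    "GROUP BY customer ORDER BY customer"
).fetchall()
conn.close()
```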

Data mining involves a number of algorithms to accomplish its tasks.

The algorithms examine the data and determine a model that is closest to the characteristics of the data being examined.

Data mining algorithms can be characterized as consisting of three parts:

1) Model: The purpose of the algorithm is to fit a model to the data.

2) Preference: Some criteria must be used to prefer one model over another.

3) Search: All algorithms require some technique to search the data.
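The three parts can be seen in a very small sketch. Everything here is an assumption made for illustration: the toy data, the choice of a line through the origin as the model, squared error as the preference criterion, and a grid of candidate slopes as the search.

```python
# Toy data: (x, y) pairs that roughly follow y = 2x. Values are illustrative.
data = [(1, 2.1), (2, 3.9), (3, 6.2), (4, 7.8)]

def predict(slope, x):
    # Model: a line through the origin, y = slope * x.
    return slope * x

def squared_error(slope):
    # Preference: a lower total squared error means a better fit.
    return sum((predict(slope, x) - y) ** 2 for x, y in data)

# Search: try a grid of candidate slopes from 0.0 to 4.0 in steps of 0.1.
candidates = [i / 10 for i in range(41)]
best_slope = min(candidates, key=squared_error)
```

Real algorithms differ mainly in how rich the model is and how cleverly the search is done; the division into model, preference, and search stays the same.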

Data Mining Models and Tasks

Data mining

- Predictive: Classification, Regression, Time series analysis, Prediction

- Descriptive: Clustering, Summarization, Association rules, Sequence discovery

A predictive model makes predictions about data values using known results found from other data; it is often based on historical data.

E.g. a credit card purchase might be refused not because of the user’s own credit history, but because the current purchase is similar to earlier purchases that were subsequently found to have been made with stolen cards.

Here the predictive model is used to predict the credit risk.

A descriptive model identifies patterns or relationships in data.

Classification:

- Maps data into predefined groups or classes.

- It is also referred to as supervised learning because the classes are defined before examining the data.

- E.g. deciding whether to make a bank loan, or identifying credit risks.

- Pattern recognition is a type of classification.

In pattern recognition, an input pattern is classified into one of several classes based on its similarity to these predefined classes.

Example:

An airport security screening station is used to determine whether a passenger is a potential terrorist or criminal.
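Classification by similarity to predefined classes can be sketched as a nearest-prototype rule. The class labels, the two-dimensional feature values, and the use of Euclidean distance as the similarity measure are all assumptions for this example, not something the slides prescribe.

```python
import math

# Hypothetical class prototypes in a 2-D feature space (values are made up).
prototypes = {
    "low_risk": (1.0, 1.0),
    "high_risk": (5.0, 5.0),
}

def classify(pattern):
    # Assign the input pattern to the class whose prototype is closest,
    # using Euclidean distance as the (assumed) similarity measure.
    return min(prototypes, key=lambda label: math.dist(pattern, prototypes[label]))

label_a = classify((1.5, 0.5))
label_b = classify((4.0, 6.0))
```

The classes exist before any input is seen, which is what makes this supervised.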

Regression:

Regression is used to map a data item to a real-valued prediction variable.

Regression involves learning the function that does this mapping.

Regression assumes that the target data fit some known type of function (e.g. linear, logistic, etc.) and then determines the function of that type that best models the data.

E.g. a professor wants to reach a certain level of savings and uses regression on past savings values to predict future ones.
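A minimal sketch of the savings example, assuming a linear function and ordinary least squares as the fitting method. The yearly balances are invented so that the arithmetic is easy to follow.

```python
# Hypothetical yearly savings balances; the numbers are illustrative only.
years = [1, 2, 3, 4, 5]
savings = [10.0, 12.0, 14.0, 16.0, 18.0]

def fit_line(xs, ys):
    # Ordinary least squares for y = a + b * x.
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    b = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / sum(
        (x - mean_x) ** 2 for x in xs
    )
    a = mean_y - b * mean_x
    return a, b

intercept, slope = fit_line(years, savings)

# Use the learned function to predict the balance in year 10.
predicted_year_10 = intercept + slope * 10
```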

Time Series Analysis:

The value of an attribute is examined as it varies over time. The values are usually obtained at evenly spaced time points (hourly, daily, weekly, etc.).

A time series plot is used to visualize the time series.

Prediction:

Prediction can be viewed as a type of classification.

The difference is that prediction predicts a future state rather than a current state.

E.g. predicting flooding.

Clustering:

Clustering is alternatively referred to as unsupervised learning or segmentation.

Clustering is usually accomplished by determining the similarity among the data on predefined attributes. Unlike classification, the groups themselves are not predefined.

E.g. catalogs targeted at different demographic groups.
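To make grouping by similarity concrete, here is a small sketch that clusters customers on a single attribute. It uses k-means, which is only one of many clustering methods; the income values and the two starting centers are assumptions for the example.

```python
def kmeans_1d(values, centers, iterations=10):
    # Plain k-means on one attribute.
    for _ in range(iterations):
        clusters = [[] for _ in centers]
        for v in values:
            # Assign each value to the nearest current center.
            nearest = min(range(len(centers)), key=lambda i: abs(v - centers[i]))
            clusters[nearest].append(v)
        # Move each center to the mean of its cluster
        # (an empty cluster keeps its old center).
        centers = [
            sum(c) / len(c) if c else centers[i] for i, c in enumerate(clusters)
        ]
    return centers, clusters

# Hypothetical customer incomes in thousands; starting centers are assumed.
incomes = [20, 22, 25, 70, 75, 80]
centers, clusters = kmeans_1d(incomes, centers=[20.0, 80.0])
```

No class labels are supplied; the two groups emerge from the data alone.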

Summarization:

It maps data into subsets with associated simple descriptions.

Summarization is also called characterization or generalization.

It extracts or derives representative information about the database.

E.g. one of the many criteria used by U.S. News & World Report to compare universities is the average SAT or ACT score.

Association Rules:

An association rule is a model that identifies specific types of data associations.
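A common way to quantify such an association is through support and confidence. These two measures are not defined on the slide and are introduced here as standard assumptions; the market-basket transactions are invented.

```python
# Hypothetical market-basket transactions; items are illustrative.
transactions = [
    {"bread", "butter"},
    {"bread", "butter", "milk"},
    {"bread", "milk"},
    {"milk"},
]

def support(itemset):
    # Fraction of transactions that contain every item in the itemset.
    return sum(itemset <= t for t in transactions) / len(transactions)

def confidence(antecedent, consequent):
    # Of the transactions containing the antecedent, the fraction that
    # also contain the consequent.
    return support(antecedent | consequent) / support(antecedent)

# Rule "bread => butter": 2 of 4 transactions contain both items,
# and 2 of the 3 bread transactions also contain butter.
rule_support = support({"bread", "butter"})
rule_confidence = confidence({"bread"}, {"butter"})
```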

Sequence Discovery:

Sequential analysis, or sequence discovery, is used to determine sequential patterns in data. These patterns are based on a time sequence of actions.

They are similar to associations in that data are found to be related, but the relationship is based on time.

Data Mining versus Knowledge Discovery in Databases:

Knowledge discovery in databases (KDD) is the process of finding useful information and patterns in data.

Data mining is the use of algorithms to extract the information and patterns derived by the KDD process.

KDD is a process that takes data as input and produces useful information as output.

[Figure: SQL statement -> Database -> Result]

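The KDD process consists of the following five steps:

1) Selection: initial data -> target data

2) Preprocessing: target data -> preprocessed data

3) Transformation: preprocessed data -> transformed data

4) Data mining: applied to the transformed data

5) Interpretation: the results are interpreted to yield knowledge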
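The five steps can be sketched as a chain of functions, each consuming the output of the previous one. The records, the cleaning rule, the rescaling, and the stand-in "mining" step (a simple mean) are all hypothetical choices made to keep the example short.

```python
# Hypothetical raw records; None marks a missing value.
initial_data = [
    {"age": 25, "income": 30000, "name": "a"},
    {"age": None, "income": 52000, "name": "b"},
    {"age": 40, "income": 61000, "name": "c"},
]

def selection(rows):
    # Keep only the attributes relevant to the task (the target data).
    return [{"age": r["age"], "income": r["income"]} for r in rows]

def preprocessing(rows):
    # Drop records with missing values (one simple cleaning strategy).
    return [r for r in rows if None not in r.values()]

def transformation(rows):
    # Convert income to thousands so it is easier to work with.
    return [(r["age"], r["income"] / 1000) for r in rows]

def data_mining(points):
    # Stand-in for a real algorithm: the mean income of the clean records.
    return sum(income for _, income in points) / len(points)

def interpretation(mean_income):
    # Present the result in a form a person can use.
    return f"mean income is {mean_income:.1f}k"

knowledge = interpretation(
    data_mining(transformation(preprocessing(selection(initial_data))))
)
```

Note that data mining is only one of the five stages; most of the code deals with getting the data into shape.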

Some Related Concepts

-Database / OLTP

-Fuzzy Sets and Fuzzy Logic

-Information Retrieval

-Decision Support System

-Dimensional Modeling

-Data Warehousing

-OLAP

Some Related Concepts

-Web Search Engine

-Statistics

-Machine Learning

-Pattern Matching

Database/OLTP Systems

-A database contains the data of an organization or enterprise.

-A database system manages all of this data according to its data model and the relationships among its entities.

-A data model is designed to describe the data.

ER Model Example

[Figure: ER diagram. Entity Employee (ID, Name, Address, Salary) is linked through the relationship HasJob to entity Job (Job No, Job Desc, Basic).]

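Fuzzy Sets

Fuzzy logic means reasoning with uncertainty.

A fuzzy set is a set of fuzzy values.

-Fuzzy values are approximate values: each element belongs to the set to a degree between 0 and 1.

By contrast, consider an ordinary (crisp) set F,

F = { x | x ∈ Z+ and x <= 5 }

Here every x is either in F or not in F.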
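As a rough illustration, the sketch below contrasts yes-or-no membership in F = { x | x ∈ Z+ and x <= 5 } with a graded membership function. The fuzzy set "small positive number" and its cut-off points are hypothetical and chosen only to show membership degrees between 0 and 1.

```python
def crisp_member(x):
    # Classical set F = {x | x in Z+ and x <= 5}: membership is 0 or 1.
    return 1.0 if isinstance(x, int) and 1 <= x <= 5 else 0.0

def fuzzy_small(x, full=3, zero=8):
    # Hypothetical fuzzy set "small positive number" (inputs assumed positive):
    # full membership up to `full`, none at or beyond `zero`,
    # and a linear drop in between.
    if x <= full:
        return 1.0
    if x >= zero:
        return 0.0
    return (zero - x) / (zero - full)

# Each tuple is (x, crisp membership in F, fuzzy membership in "small").
memberships = [(x, crisp_member(x), fuzzy_small(x)) for x in (2, 5, 6, 9)]
```

With the crisp set, 5 is in and 6 is out; with the fuzzy set, both are "somewhat small" to different degrees.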

Information Retrieval

[Figure: users at a computer submit keywords to the information retrieval system (IRS).]

IR query result measures

An IR system consists of a set of documents,

D = { D1, D2, ..., Dn }.

The input to the system is a query q (which contains the keywords).

The similarity between the query and each document is then calculated: sim(q, Di).

The effectiveness of the system in processing the query is measured by precision and recall.
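The slides do not say how sim(q, Di) is computed. One simple possibility, assumed here purely for illustration, is the Jaccard similarity between the query keywords and each document's keywords; the documents themselves are made up.

```python
# Hypothetical document collection D = {D1, D2, D3}, each reduced to keywords.
documents = {
    "D1": {"data", "mining", "algorithms"},
    "D2": {"database", "sql", "query"},
    "D3": {"data", "warehouse", "olap"},
}

def sim(q, d):
    # Jaccard similarity: shared keywords over all distinct keywords.
    # This is only one possible choice of similarity measure.
    return len(q & d) / len(q | d)

query = {"data", "mining"}

# Rank the documents from most to least similar to the query.
ranked = sorted(documents, key=lambda name: sim(query, documents[name]), reverse=True)
```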

IR query result measures

Precision = |Relevant and Retrieved| / |Retrieved|

Recall = |Relevant and Retrieved| / |Relevant|

Precision answers: “Are all retrieved documents relevant ones?”

Recall answers: “Have all relevant documents been retrieved?”
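Both measures follow directly from set sizes. In the sketch below the document identifiers and the relevance judgements are hypothetical; returning 0.0 for an empty set is a convention chosen here to avoid division by zero.

```python
def precision_recall(relevant, retrieved):
    # Both measures share one numerator: relevant documents that were retrieved.
    hits = len(relevant & retrieved)
    precision = hits / len(retrieved) if retrieved else 0.0
    recall = hits / len(relevant) if relevant else 0.0
    return precision, recall

# Hypothetical judgement: 4 relevant documents, 5 retrieved, 3 in common.
relevant = {"D1", "D2", "D3", "D4"}
retrieved = {"D1", "D2", "D3", "D7", "D9"}
precision, recall = precision_recall(relevant, retrieved)
```

Here 3 of the 5 retrieved documents are relevant, and 3 of the 4 relevant documents were retrieved.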

Decision Support System

-Dimensional Modeling

A dimension is a collection of logically related attributes and is viewed as an axis for modeling the data.

E.g. the time dimension: time, month, year, decade, century, etc.

Web Search Engine

Web search engines are treated as IR systems.

[Figure: keywords are passed to the search process, which contacts multiple servers.]

Search Engine Limitations

Search engines face a number of problems:

-Abundance: a single query cannot retrieve all of the data on the Web.

-Limited Coverage: although search engines are available, each one searches only a limited portion of the data.

-Limited Query: the kinds of queries a search engine accepts are limited.

-Limited Customization: the search engine has little knowledge of the individual user.

Machine Learning

Machine learning is the area of AI that examines how to write programs that can learn.

In data mining, machine learning is used for prediction or classification.

In data mining applications, machine learning uses a model to represent the data.

The two types of machine learning are:

- Supervised learning

- Unsupervised learning

Pattern Matching