DATA MINING: AN OVERVIEW BY: JOSEPH CASABONA

Upload: randall-mccormick

Post on 26-Dec-2015


TRANSCRIPT

Page 1

DATA MINING: AN OVERVIEW

BY: JOSEPH CASABONA

Data Warehouse-->

Page 2

OVERVIEW

• What is Data Mining?
• Introduction to KDD
• Types of Data Found Using Data Mining
• The 4 Goals of Data Mining
• Case Study: MetLife

Page 3

WHAT IS DATA MINING?

• Definition: The mining or discovery of new information in terms of patterns or rules from vast amounts of data

• Adds more functionality than a DBMS
• Creates relationships within the data
• One step in the KDD Process

Page 4

KDD

• Stands for "Knowledge Discovery in Databases"
• Six-step process that helps us organize existing data and extract new information from it
• The six steps are: data selection, cleansing, enrichment, transformation, mining, and report generation.

Page 5

KDD CONT.

• Selection and cleansing gather the data and validate it to make sure it is accurate, complete, and properly formatted.

• Enrichment adds more to the data from other sources.

• Transformation then reduces the data in some way, for example by aggregating or encoding it.
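The first four KDD steps can be sketched as a small pipeline. This is only an illustration under stated assumptions: the `Customer` record, the sample data, the `ZIP_TO_STATE` lookup, and every function name are hypothetical and not taken from the slides, and "transformation" is read here as aggregation, which is one possible way to reduce the data.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Customer:
    name: str
    zip_code: Optional[str]
    purchases: int


# Hypothetical raw records, including two that should fail cleansing.
RAW = [
    Customer("Ann", "10001", 12),
    Customer("Bob", None, 3),      # incomplete: missing ZIP code
    Customer("Cy", "02139", 7),
    Customer("Dee", "1000", 5),    # malformed: ZIP code too short
]

# Hypothetical external source used for enrichment.
ZIP_TO_STATE = {"10001": "NY", "02139": "MA"}


def select(records):
    """Selection: keep only the records relevant to the question."""
    return [r for r in records if r.purchases > 0]


def cleanse(records):
    """Cleansing: drop records whose ZIP code is missing or malformed."""
    return [
        r for r in records
        if r.zip_code and len(r.zip_code) == 5 and r.zip_code.isdigit()
    ]


def enrich(records):
    """Enrichment: attach a state looked up from another source."""
    return [(r, ZIP_TO_STATE.get(r.zip_code, "unknown")) for r in records]


def transform(enriched):
    """Transformation: reduce the data to total purchases per state."""
    totals = {}
    for record, state in enriched:
        totals[state] = totals.get(state, 0) + record.purchases
    return totals


def run_pipeline(records):
    """Run selection, cleansing, enrichment, and transformation in order."""
    return transform(enrich(cleanse(select(records))))


if __name__ == "__main__":
    print(run_pipeline(RAW))
```

The output of `run_pipeline` is what the mining step would then work on; mining and report generation are left out of this sketch.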

Page 6

DATA MINING

• Result is new information the user would not know just by standard querying.

• Can be in the form of:
  o Association Rules
  o Sequential Patterns
  o Classification Trees
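As a concrete illustration of the first form, an association rule such as "milk => bread" is usually judged by its support (how often both items occur together) and its confidence (how often bread appears when milk does). The sketch below computes both on a tiny made-up basket data set; the transactions and function names are assumptions for illustration only, not data from the slides.

```python
# Hypothetical market-basket transactions.
TRANSACTIONS = [
    {"milk", "bread"},
    {"milk", "bread", "eggs"},
    {"bread", "eggs"},
    {"milk", "eggs"},
    {"milk", "bread", "butter"},
]


def count(itemset, transactions):
    """Number of transactions that contain every item in itemset."""
    items = set(itemset)
    return sum(1 for t in transactions if items <= t)


def support(itemset, transactions):
    """Fraction of all transactions that contain the itemset."""
    return count(itemset, transactions) / len(transactions)


def confidence(antecedent, consequent, transactions):
    """Of the transactions containing the antecedent, the fraction that
    also contain the consequent."""
    base = count(antecedent, transactions)
    both = count(set(antecedent) | set(consequent), transactions)
    return both / base if base else 0.0


if __name__ == "__main__":
    print(support({"milk", "bread"}, TRANSACTIONS))
    print(confidence({"milk"}, {"bread"}, TRANSACTIONS))
```

Here three of the five baskets contain both milk and bread, and three of the four baskets with milk also contain bread, so the rule has support 3/5 and confidence 3/4.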

Page 7

THE FOUR GOALS OF DATA MINING

• Prediction: Using current data to make predictions about future activities

• Identification: "Data patterns can be used to identify the existence of an item, an event, or an activity"

Page 8

THE FOUR GOALS CONT.

• Classification: Breaking the data down into categories based on certain attributes.

• Optimization: Using the mined data to optimize the use of limited resources, such as time and money.

Page 9

DATA MINING EXAMPLES

• Most have been consumer based
• Applicable in most industries
• Next: Case Study on MetLife

Page 10

CASE STUDY: METLIFE

Company Profile

MetLife, Inc. is a leading provider of insurance and other financial services to millions of individual and institutional customers throughout the United States.

Established in 1863, MetLife now has offices all over the US and the world, and offers ten different types of insurance and financial services.

Page 11

CASE STUDY: METLIFE

Industry: Insurance and Financial Services
How they use Data Mining: Fraud Detection

Page 12

CASE STUDY: METLIFE

• Project first started in 2001
• MetLife set out to build a $50 million relational database
• This project would consolidate data from 30 businesses worldwide.

Page 13

CASE STUDY: METLIFE

• Around the same time, it was reported that $30 million of insurance money went to fraudulent claims.

• MetLife teamed up with Computer Sciences Corporation (CSC) to:
  o License their data mining tool (called Fraud Investigator)
  o Develop @First, "an early fraud detection system"

Page 14

CASE STUDY: METLIFE

• By 2003, MetLife's data mining operation was in full swing.

• They were able to detect fraud in a fraction of the time manual review would take.

• One example is detecting rate evasion.

Page 15

CASE STUDY: METLIFE

• Rate evasion is lying about where you live to pay lower premiums.

• MetLife used data mining to detect rate evasion by matching ZIP codes with phone numbers to see if the cities matched.

• In 2.5 hours, MetLife found 107 fraudulent claims, all linked to a rate-evasion ring in New York and Massachusetts.
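The ZIP-versus-phone check described above can be sketched in a few lines. This is not the Fraud Investigator or @First implementation, whose internals the slides do not describe; the lookup tables, policy records, and function name below are hypothetical and exist only to show the idea of flagging records where the two cities disagree.

```python
# Hypothetical lookup tables; a real system would use complete reference data.
ZIP_TO_CITY = {"10001": "New York", "02108": "Boston"}
AREA_CODE_TO_CITY = {"212": "New York", "617": "Boston"}

# Hypothetical policy records with fictional phone numbers.
POLICIES = [
    {"policy_id": "P1", "zip": "10001", "phone": "212-555-0101"},
    {"policy_id": "P2", "zip": "02108", "phone": "212-555-0102"},
    {"policy_id": "P3", "zip": "02108", "phone": "617-555-0103"},
    {"policy_id": "P4", "zip": "99999", "phone": "617-555-0104"},  # unknown ZIP
]


def flag_mismatches(policies):
    """Return the IDs of policies whose ZIP city and phone city differ."""
    flagged = []
    for policy in policies:
        zip_city = ZIP_TO_CITY.get(policy["zip"])
        phone_city = AREA_CODE_TO_CITY.get(policy["phone"][:3])
        if zip_city is None or phone_city is None:
            continue  # not enough information to compare
        if zip_city != phone_city:
            flagged.append(policy["policy_id"])
    return flagged


if __name__ == "__main__":
    print(flag_mismatches(POLICIES))
```

A mismatch is only a signal for further review, since people legitimately keep phone numbers after moving; the sketch skips records it cannot look up rather than flagging them.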

Page 16

QUESTIONS/COMMENTS?