1 data mining books: 1.data mining, 1996 pieter adriaans and dolf zantinge addison-wesley...


Upload: meredith-reeves

Post on 27-Dec-2015


Data Mining

Books:
1. Data Mining, 1996
    Pieter Adriaans and Dolf Zantinge
    Addison-Wesley
2. Discovering Data Mining, 1997
    From Concept to Implementation
    Cabena et al.
    Prentice Hall
3. Data Mining, 2000
    Concepts and Techniques
    Jiawei Han and Micheline Kamber
    Morgan Kaufmann


Proceedings
1. Proceedings of the International Conference on Data Mining (ICDM)
2. Proceedings of the International Conference on Data Engineering (ICDE)
3. Proceedings of ACM SIGKDD International Conference on Knowledge Discovery and Data Mining
4. Proceedings of the International Conference on Very Large Data Bases (VLDB)
5. Proceedings of ACM SIGMOD International Conference on Management of Data
6. Proceedings of the International Conference on Database Systems for Advanced Applications (DASFAA)
7. Proceedings of the International Conference on Database and Expert Systems Applications (DEXA)


8. Proceedings of the International Conference on Data Warehousing and Knowledge Discovery (DaWak)

9. Pacific-Asia Conference on Knowledge Discovery and Data Mining (PAKDD)

10. European Conference on Principles of Data Mining and Knowledge Discovery (PKDD)

Journals
1. IEEE Transactions on Knowledge and Data Engineering (TKDE)
2. Journal of Intelligent Information Systems
3. Data Mining and Knowledge Discovery
4. ACM SIGMOD Record
5. The International Journal on Very Large Data Bases
6. Knowledge and Information Systems
7. Data & Knowledge Engineering
8. International Journal of Cooperative Information Systems


Outline
• Introduction
• Knowledge Discovery in Databases (KDD)
• Data Mining and Query Tools
• Basic Data Mining Techniques
• Data Mining and Data Warehouse
• Association Rules


A short story
• The Library of Babel (infinite)
    Books must be somewhere in the library
    People wander round this library until they die
    The library contains an infinite amount of data but no information
• Today’s environment
    Too much data but too little information

Challenge
• Find the required information from huge amounts of data
• The amount of data keeps growing, so it is increasingly difficult to find the meaningful information


Knowledge Discovery in Databases (KDD)
• The whole process of extracting implicit, previously unknown, and potentially useful knowledge, as a production factor, from large data sets
• Includes data selection, cleaning, coding, data mining, and reporting

Data Mining
• The key stage of Knowledge Discovery in Databases (KDD)
• The process of finding the desired information in large databases
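
The KDD stages listed above (selection, cleaning, coding, data mining, reporting) can be read as a chain of functions. The following is only a minimal sketch under stated assumptions: the records, the function names, and the price threshold are hypothetical, and the "mining" step is a toy frequency count standing in for a real technique.

```python
from collections import Counter

# Hypothetical raw records: (customer, product, price as text).
RAW = [
    ("ann", "book", "12.50"),
    ("bob", "book", ""),       # missing price, dropped during cleaning
    ("ann", "game", "30.00"),
    ("cid", "book", "11.00"),
    ("cid", "pen", "1.20"),
]

def select(records, wanted_products):
    """Selection: keep only the records relevant to the question."""
    return [r for r in records if r[1] in wanted_products]

def clean(records):
    """Cleaning: drop records whose price is missing."""
    return [r for r in records if r[2] != ""]

def code(records):
    """Coding: replace the text price with a coarse band (threshold is an assumption)."""
    return [(c, p, "high" if float(price) >= 12.0 else "low")
            for c, p, price in records]

def mine(records):
    """Data mining (toy stand-in): count each (product, price band) pair."""
    return Counter((p, band) for _, p, band in records)

def report(counts):
    """Reporting: render the counts as readable lines."""
    return [f"{p} / {band}: {n}" for (p, band), n in sorted(counts.items())]

def kdd(records):
    """Run the five stages in order."""
    return report(mine(code(clean(select(records, {"book", "game"})))))

for line in kdd(RAW):
    print(line)
```

Each stage takes and returns plain lists, so any one of them can be swapped out without touching the others.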


KDD is not a new technique but rather a multi-disciplinary field of research


AI, machine learning (1950)
    It is extremely difficult to create a computer with intelligence close to that of human beings
    • Lack of creativity and self-learning

1960: research on learning stopped
    • Neural networks failed (XOR)

1980 ~: neural networks change architecture, new machine learning algorithms (decision trees, genetic algorithms, etc.), more powerful computers, focus on simple and practical problems
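
The XOR failure mentioned above refers to the limit of a single-layer network: one threshold unit can only separate its inputs with a straight line, and XOR is not linearly separable. The sketch below illustrates this by brute force. The weight grid and the function names are assumptions made for illustration, and a grid search only demonstrates the limit on that grid, it does not prove it.

```python
from itertools import product

INPUTS = [(0, 0), (0, 1), (1, 0), (1, 1)]
AND = [0, 0, 0, 1]
XOR = [0, 1, 1, 0]

def threshold_unit(w1, w2, b, x):
    """Single threshold unit: outputs 1 when the weighted sum exceeds zero."""
    return 1 if w1 * x[0] + w2 * x[1] + b > 0 else 0

def separable(targets, grid):
    """Return True if some (w1, w2, b) taken from the grid reproduces the targets."""
    for w1, w2, b in product(grid, repeat=3):
        if [threshold_unit(w1, w2, b, x) for x in INPUTS] == targets:
            return True
    return False

# Hypothetical search grid: weights and bias from -4.0 to 4.0 in steps of 0.5.
GRID = [i / 2 for i in range(-8, 9)]

print("AND representable by one unit:", separable(AND, GRID))
print("XOR representable by one unit:", separable(XOR, GRID))
```

Adding a hidden layer removes the limit, which is one of the architecture changes the 1980s item refers to.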


Why learning
• Even simple problems, such as timetable planning, are extremely hard to solve with a computer but easily solved by an experienced human

Using expert systems to solve problems
• Even for simple systems, a great many rules exist. It is difficult to find the right rules.
• Need to interview relevant experts many times and integrate their answers to obtain the expert knowledge
    Knowledge acquisition: using learning algorithms to generate rules automatically


Why interest in data mining
• In the 1980s, organizations began to build databases. By now, these contain gigabytes of data with much ‘hidden’ information that cannot easily be traced using SQL
    SQL is just a query language; it retrieves data only under constraints that you already know
• With the growing use of networks, it will become increasingly easy to connect databases
    Discover more information
• Machine learning techniques have been improved
    Easier to find interesting information
• Client/server environment
    Electronic commerce


Data mining tool & Query tool
• Suppose a large database containing millions of records that describe customers’ purchases
    Who bought which product on what date?
    What is the average turnover in July?
    What is an optimal segmentation of clients?
    What are the most important trends in customer behavior?

• If you know exactly what you are looking for, use a query tool

• If you know only vaguely what you are looking for, use a data mining tool
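
The contrast between the two kinds of question can be sketched in a few lines. This is an illustration under stated assumptions, not a recommended toolchain: the purchase records, table layout, and function names are hypothetical, and the segmentation uses a tiny one-dimensional two-means clustering as a stand-in for a real data mining tool.

```python
import sqlite3

# Hypothetical purchase records: (customer, product, day, amount).
PURCHASES = [
    ("ann", "book", "2015-07-03", 10.0),
    ("ann", "pen", "2015-07-20", 12.0),
    ("bob", "game", "2015-07-11", 95.0),
    ("bob", "game", "2015-08-02", 105.0),
    ("cid", "book", "2015-07-25", 11.0),
]

def average_july_turnover(rows):
    """Query-tool question: the analyst already knows exactly what to ask."""
    con = sqlite3.connect(":memory:")
    con.execute(
        "CREATE TABLE purchases (customer TEXT, product TEXT, day TEXT, amount REAL)"
    )
    con.executemany("INSERT INTO purchases VALUES (?, ?, ?, ?)", rows)
    (avg,) = con.execute(
        "SELECT AVG(amount) FROM purchases WHERE day LIKE '2015-07-%'"
    ).fetchone()
    con.close()
    return avg

def segment_customers(rows, rounds=10):
    """Mining-style question: split customers into two spending groups.
    The boundary between the groups is found from the data, not given in advance."""
    totals = {}
    for customer, _, _, amount in rows:
        totals[customer] = totals.get(customer, 0.0) + amount
    low, high = min(totals.values()), max(totals.values())  # initial centres
    groups = {"low": [], "high": []}
    for _ in range(rounds):
        groups = {"low": [], "high": []}
        for customer, total in totals.items():
            key = "low" if abs(total - low) <= abs(total - high) else "high"
            groups[key].append(customer)
        # Move each centre to the mean of its group; keep it if the group is empty.
        if groups["low"]:
            low = sum(totals[c] for c in groups["low"]) / len(groups["low"])
        if groups["high"]:
            high = sum(totals[c] for c in groups["high"]) / len(groups["high"])
    return {name: sorted(members) for name, members in groups.items()}

print(average_july_turnover(PURCHASES))
print(segment_customers(PURCHASES))
```

The SQL query needs the month spelled out in its `WHERE` clause, whereas the segmentation is handed only the data and the number of groups.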


Data mining in electronic commerce
• The success of KDD comes primarily from marketing
• Prediction
    A customer buying baby clothes today may buy computer games in ten years, and fifteen years later a motorcycle


• Suppose a company keeps data about which products its customers bought
    Mail to everyone: only 3% ~ 4% are interested
    Analyzing user behavior and clustering customers according to their interests can save 50% of mailing costs
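
The arithmetic behind the mailing example can be worked through with made-up figures. Everything numeric below is a hypothetical assumption chosen to match the slide's 3% ~ 4% interest rate and 50% saving: 100,000 customers, 50 cents per mailing, and a clustering step that selects half the customers while still reaching 3,000 of the 3,500 interested ones.

```python
def mailing_summary(mailed, responders, cost_per_mail_cents):
    """Return (total cost, cost per responder), both in whole cents."""
    total = mailed * cost_per_mail_cents
    return total, total // responders

# Hypothetical: mail all 100,000 customers; 3.5% (3,500) are interested.
everyone = mailing_summary(100_000, 3_500, 50)

# Hypothetical: clustering picks 50,000 customers that include 3,000 of the interested ones.
targeted = mailing_summary(50_000, 3_000, 50)

saving = 1 - targeted[0] / everyone[0]

print("mail everyone :", everyone)
print("mail clusters :", targeted)
print("cost saving   :", saving)
```

How many interested customers the targeted mailing still reaches depends entirely on the quality of the clustering, so the 3,000 figure is the assumption that matters most here.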


The problems of data mining
• Lack of long-term vision
    What do we want to get from the database in the future?
• Not all files are up to date
    Example: the price of computers
• Struggle between departments
• Poor cooperation between users and the EDP dept.
• Legal and privacy restrictions
• Data models need to be transformed for different data mining techniques
• Timing problems: integrating data from different sources
• Interpretation problems