1
Data Mining
Books:
1. Data Mining, 1996
   Pieter Adriaans and Dolf Zantinge
   Addison-Wesley
2. Discovering Data Mining: From Concept to Implementation, 1997
   Cabena et al.
   Prentice Hall
3. Data Mining: Concepts and Techniques, 2000
   Jiawei Han and Micheline Kamber
   Morgan Kaufmann
2
Proceedings
1. Proceedings of the International Conference on Data Mining (ICDM)
2. Proceedings of the International Conference on Data Engineering (ICDE)
3. Proceedings of ACM SIGKDD International Conference on Knowledge Discovery and Data Mining
4. Proceedings of the International Conference on Very Large Data Bases (VLDB)
5. Proceedings of ACM SIGMOD International Conference on Management of Data
6. Proceedings of the International Conference on Database Systems for Advanced Applications (DASFAA)
7. Proceedings of the International Conference on Database and Expert Systems Applications (DEXA)
3
8. Proceedings of the International Conference on Data Warehousing and Knowledge Discovery (DaWaK)
9. Pacific-Asia Conference on Knowledge Discovery and Data Mining (PAKDD)
10. European Conference on Principles of Data Mining and Knowledge Discovery (PKDD)
Journals
1. IEEE Transactions on Knowledge and Data Engineering (TKDE)
2. Journal of Intelligent Information Systems
3. Data Mining and Knowledge Discovery
4. ACM SIGMOD Record
5. The International Journal on Very Large Data Bases
6. Knowledge and Information Systems
7. Data & Knowledge Engineering
8. International Journal of Cooperative Information Systems
4
Outline
• Introduction
• Knowledge Discovery in Databases (KDD)
• Data Mining and Query Tools
• Basic Data Mining Techniques
• Data Mining and Data Warehouse
• Association Rules
5
A short story
• The library of Babel (infinite)
  - Books must be somewhere in the library
  - People wander round this library until they die
  - The library contains an infinite amount of data but no information
• Today’s environment
  - Too much data but too little information
Challenge
• Find the required information from huge amounts of data
• The amount of data keeps growing, so it is increasingly difficult to find the meaningful information
6
Knowledge Discovery in Databases (KDD)
• The whole process of extracting implicit, previously unknown, and potentially useful knowledge from large data sets, for use as a production factor
• Includes data selection, cleaning, coding, data mining, and reporting
Data Mining
• The key stage of Knowledge Discovery in Databases (KDD)
• The process of finding the desired information in large databases
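The KDD stages named above (selection, cleaning, coding, data mining, reporting) can be sketched as a chain of small functions. This is only an illustration under stated assumptions: the records, field names, and age bands are hypothetical, and the "data mining" step is a plain frequency count standing in for a real mining algorithm.

```python
from collections import Counter

# Hypothetical raw customer records; None marks a missing value.
raw = [
    {"id": 1, "age": 23, "product": "game", "phone": "555-0101"},
    {"id": 2, "age": None, "product": "book", "phone": "555-0102"},
    {"id": 3, "age": 41, "product": "book", "phone": "555-0103"},
    {"id": 3, "age": 41, "product": "book", "phone": "555-0103"},  # duplicate
    {"id": 4, "age": 19, "product": "game", "phone": "555-0104"},
    {"id": 5, "age": 45, "product": "book", "phone": "555-0105"},
]

def select(records):
    """Selection: keep only the fields relevant to the question."""
    return [{k: r[k] for k in ("id", "age", "product")} for r in records]

def clean(records):
    """Cleaning: drop records with missing values and duplicate ids."""
    seen, out = set(), []
    for r in records:
        if None in r.values() or r["id"] in seen:
            continue
        seen.add(r["id"])
        out.append(r)
    return out

def code(records):
    """Coding: replace the exact age with a coarse age band (assumed bands)."""
    return [
        {"age_band": "under30" if r["age"] < 30 else "30plus",
         "product": r["product"]}
        for r in records
    ]

def mine(records):
    """Data mining stand-in: count (age band, product) co-occurrences."""
    return Counter((r["age_band"], r["product"]) for r in records)

def report(counts):
    """Reporting: turn the counts into readable lines."""
    return [f"{band} buys {product}: {n}"
            for (band, product), n in counts.most_common()]

counts = mine(code(clean(select(raw))))
for line in report(counts):
    print(line)
```

Each stage takes the output of the previous one, so a stage can be swapped out (for example, a real clustering or rule-induction step in place of `mine`) without touching the rest.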
7
KDD is not a new technique but rather a multi-disciplinary field of research
8
AI, machine learning (1950)
  - It is extremely difficult to build a computer with intelligence close to that of human beings
• Lack of creativity and self-learning
1960: research on learning stalled
• Neural networks failed (XOR)
1980 ~: new neural network architectures, new machine learning algorithms (decision trees, genetic algorithms, etc.), more powerful computers, and a focus on simple and practical problems
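The XOR failure mentioned above can be reproduced in a few lines. The sketch below assumes the network in question is a single-layer perceptron trained with the classic perceptron rule (the slide does not say which model it means). Such a model can only draw one straight line through the input space, so it learns AND but can never classify all four XOR cases correctly.

```python
def train_perceptron(samples, epochs=50):
    """Single-layer perceptron: step activation, weights start at zero."""
    w1 = w2 = b = 0
    for _ in range(epochs):
        for (x1, x2), target in samples:
            pred = 1 if w1 * x1 + w2 * x2 + b > 0 else 0
            err = target - pred          # -1, 0, or +1
            w1 += err * x1
            w2 += err * x2
            b += err
    return w1, w2, b

def accuracy(samples, weights):
    """Fraction of samples the trained perceptron classifies correctly."""
    w1, w2, b = weights
    hits = sum(
        (1 if w1 * x1 + w2 * x2 + b > 0 else 0) == target
        for (x1, x2), target in samples
    )
    return hits / len(samples)

AND = [((0, 0), 0), ((0, 1), 0), ((1, 0), 0), ((1, 1), 1)]
XOR = [((0, 0), 0), ((0, 1), 1), ((1, 0), 1), ((1, 1), 0)]

and_acc = accuracy(AND, train_perceptron(AND))
xor_acc = accuracy(XOR, train_perceptron(XOR))
print("AND accuracy:", and_acc)   # linearly separable, so it is learned
print("XOR accuracy:", xor_acc)   # stays below 1.0 however long it trains
```

Adding a hidden layer removes the limitation, which is one of the architecture changes behind the later revival of neural networks.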
9
Why learning
• Even a simple problem such as timetable planning is extremely hard to solve with a computer but easily solved by an experienced human
Using expert systems to solve problems
• Even for simple systems, a great many rules exist, and it is difficult to find the right ones
• Relevant experts must be interviewed many times, and their answers integrated, to obtain the expert knowledge
  - Knowledge acquisition: using learning algorithms to generate rules automatically
10
Why interest in data mining
• In the 1980s, organizations began to build databases. By now, these contain gigabytes of data with much ‘hidden’ information that cannot easily be traced using SQL
  - SQL is just a query language; it retrieves data only under constraints that you already know
• As networks come into wider use, it will become increasingly easy to connect databases
  - Discover more information
• Machine learning techniques have improved
  - Easier to find interesting information
• Client/server environment
  - Electronic commerce
11
Data mining tool & Query tool
• Suppose a large database contains millions of records that describe customers’ purchases
  - Who bought which product on what date?
  - What is the average turnover in July?
  - What is an optimal segmentation of clients?
  - What are the most important trends in customer behavior?
• If you know exactly what you are looking for, use a query tool
• If you know only vaguely what you are looking for, use a data mining tool
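The first two questions above are query-tool questions: the constraints are known in advance, so plain SQL answers them. A minimal sketch using Python's built-in sqlite3 module follows. The table name, columns, and rows are hypothetical, and "turnover" is assumed here to mean the purchase amount.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE purchases ("
    "customer TEXT, product TEXT, purchase_date TEXT, amount REAL)"
)
# Hypothetical purchase records (dates in ISO format).
rows = [
    ("alice", "book", "2000-07-03", 20.0),
    ("bob",   "game", "2000-07-15", 40.0),
    ("carol", "book", "2000-07-20", 30.0),
    ("alice", "game", "2000-08-01", 30.0),
]
conn.executemany("INSERT INTO purchases VALUES (?, ?, ?, ?)", rows)

# Who bought which product on what date?
who_bought = conn.execute(
    "SELECT customer, product, purchase_date "
    "FROM purchases ORDER BY purchase_date"
).fetchall()

# What is the average turnover in July?
july_avg = conn.execute(
    "SELECT AVG(amount) FROM purchases "
    "WHERE strftime('%m', purchase_date) = '07'"
).fetchone()[0]

print(who_bought)
print(july_avg)
conn.close()
```

The last two questions (optimal segmentation, most important trends) have no such WHERE clause to write, because the pattern being sought is not known beforehand. That is where a data mining tool takes over.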
12
Data mining in electronic commerce
• The success of KDD comes primarily from marketing
• Prediction
  - A customer buying baby clothes today may buy computer games in ten years, and a motorcycle fifteen years later
13
• Suppose a company keeps data about which products its customers bought
  - Mailing to everyone: only 3% ~ 4% are interested
  - Analyzing user behavior and clustering customers according to their interests can save 50% of mailing costs
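The mailing-cost argument above can be worked through on toy data. The sketch below is an assumption-laden illustration: the customers, categories, and unit cost are hypothetical, and customers are grouped by their most frequently bought category, which is a simple stand-in for a real clustering algorithm. The toy data is chosen so that the saving matches the 50% figure on the slide.

```python
from collections import Counter, defaultdict

# Hypothetical purchase histories: customer -> product categories bought.
purchases = {
    "c1": ["games", "games", "books"],
    "c2": ["baby", "baby", "books"],
    "c3": ["games", "games", "music"],
    "c4": ["books", "books", "games"],
    "c5": ["games", "games", "games"],
    "c6": ["music", "music", "books"],
    "c7": ["games", "games", "baby"],
    "c8": ["baby", "baby", "baby"],
}

def dominant_interest(categories):
    """Most frequently bought category (the data above contains no ties)."""
    return Counter(categories).most_common(1)[0][0]

# Group customers by dominant interest.
segments = defaultdict(list)
for customer, categories in purchases.items():
    segments[dominant_interest(categories)].append(customer)

COST_PER_MAIL = 1.0  # hypothetical unit cost

# Blanket mailing goes to everyone; targeted mailing for a games
# promotion goes only to the "games" segment.
blanket_cost = len(purchases) * COST_PER_MAIL
targeted_cost = len(segments["games"]) * COST_PER_MAIL
saving = 1 - targeted_cost / blanket_cost

print(dict(segments))
print(f"blanket: {blanket_cost}, targeted: {targeted_cost}, saving: {saving:.0%}")
```

The saving depends entirely on how large the targeted segment is relative to the whole customer base, so real figures will vary from campaign to campaign.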
14
The problems of data mining
• Lack of long-term vision
  - What do we want to get from the database in the future?
• Not all files are up to date
  - Example: the price of computers
• Struggles between departments
• Poor cooperation between users and the EDP dept.
• Legal and privacy restrictions
• Data models need to be transformed for different data mining techniques
• Timing problems: integrating data from different sources
• Interpretation problems