Page 1: Data Mining IIavellido/teaching/09-10/... · What is DATA MINING?(1) “Data Mining is the process of discovering actionable and meaningful patterns, profiles, and trends by sifting

Lluis Belanche + Alfredo Vellido

Data Mining II

DM2 2009/2010. Alfredo Vellido

An Introduction to Mining (1)

www.lsi.upc.edu/~avellido/teaching/data_mining.html

Office 319, Edificio Omega, BCN

Office 107, Edif. TR-2, Terrassa

[email protected] avellido, gtalk

Tels. 934137796, 937398090

Contents of the course

Disclaimer: (but who knows)

1. Introduction to DM and its methodologies
2. Visual DM: Exploratory DM through visualization
3. Pattern recognition 1
4. Pattern recognition 2
5. Feature extraction
6. Feature selection
7. Error estimation
8. Linear classifiers, kernels and SVMs
9. Probability in Data Mining
10. Nonlinear Dimensionality Reduction (NLDR)
11. Applications of NLDR: from medicine to ecology
12. DM Case studies

Sorry guys! … no fuzzy systems …

Feature Selection & Extraction

SML & Kernel Methods

What is DATA MINING? (1)

“Data Mining is the process of discovering actionable and meaningful patterns, profiles, and trends by sifting through your data using pattern recognition technologies (…) is a hot new technology about one of the oldest processes of human endeavour: pattern recognition (…) It is an iterative process of extracting knowledge from business transactions (…) DM is the automatic discovery of usable knowledge from your stored data.”

Jesús Mena: Data Mining your Website (Digital Press, 1999, available @ books.google)

What is DATA MINING? (2)

“Data Mining, by its simplest definition, automates the detection of relevant patterns in a database (…) For many years, statisticians have manually “mined” databases (…) DM uses well-established statistical and machine learning techniques to build models that predict customer behaviour. Today, technology automates the mining process, integrates it with commercial data warehouses, and presents it in a relevant way for business users (…) the leading DM products address the broader business and technical issues, such as their integration into complex IT environments.”

Berson, Smith, & Thearling: Building Data Mining Applications for CRM (McGraw-Hill, 2000)

What is DATA MINING? (3)

WIKIPEDIA DIXIT: “Data mining has been defined as "The nontrivial extraction of implicit, previously unknown, and potentially useful information from data" (1) and "The science of extracting useful information from large data sets or databases" (2). Although it is usually used in relation to analysis of data, data mining, like artificial intelligence, is an umbrella term and is used with varied meaning in a wide range of contexts.”

(1) W. Frawley and G. Piatetsky-Shapiro and C. Matheus, Knowledge Discovery in Databases: An Overview. AI Magazine, 1992, 213-228.

(2) D. Hand, H. Mannila, P. Smyth: Principles of Data Mining. MIT Press, 2001.

wikipedia 2005: en.wikipedia.org/wiki/Data_mining

What is DATA MINING? (4)

WIKIPEDIA’06 DIXIT: “Data mining (DM), also called Knowledge-Discovery in Databases (KDD) or Knowledge-Discovery and Data Mining, is the process of automatically searching large volumes of data for patterns such as association rules. It is a fairly recent topic in computer science but applies many older computational techniques from statistics, information retrieval, machine learning and pattern recognition.”

wikipedia 2006: en.wikipedia.org/wiki/Data_mining

What is DATA MINING? (5)

In 1996, in the proceedings of the 1st International Conference on KDD, Fayyad gave one of the best-known definitions of Knowledge Discovery from Data:

“The non-trivial process of identifying valid, novel, potentially useful, and ultimately understandable patterns in data.”

KDD quickly gathered strength as an interdisciplinary research field where a combination of advanced techniques from Statistics, Artificial Intelligence, Information Systems, and Visualization is used to tackle knowledge acquisition from large databases. The term Knowledge Discovery from Data appeared in 1989, referring to the:

“[...] overall process of finding and interpreting patterns from data, typically interactive and iterative, involving repeated application of specific data mining methods or algorithms and the interpretation of the patterns generated by these algorithms.”

What is DATA MINING? (6)

WIKIPEDIA’08 DIXIT: “Data mining is the process of sorting through large amounts of data and picking out relevant information. It is usually used by business intelligence organizations, and financial analysts, but is increasingly being used in the sciences to extract information from the enormous data sets generated by modern experimental and observational methods. It has been described as "the nontrivial extraction of implicit, previously unknown, and potentially useful information from data" and "the science of extracting useful information from large data sets or databases." Data mining in relation to enterprise resource planning is the statistical and logical analysis of large sets of transaction data, looking for patterns that can aid decision making.”

wikipedia 2008: en.wikipedia.org/wiki/Data_mining

What is DATA MINING? (7)

WIKTIONARY’08 summarizes:

“a technique for searching large-scale databases for patterns; used mainly to find previously unknown correlations between variables that may be commercially useful”

wiktionary 2008: http://en.wiktionary.org/wiki/data_mining

Homework:

1. Look for DM in Wikipedia’09

2. Write your own WIKTIONARY entry for the term “Data Mining”

What to expect from a DM conference…

15-17 September’04: Wessex Institute of Technology (W.I.T.)

What to find in a DM conference…

Sessions 1 & 2: Text Mining

Session 3: Web Mining

Session 4: Clustering Techniques

Session 5: Data Preparation Techniques

Sessions 6 & 7: Applications in Business, Industry and Government

Session 8: Customer Relationship Management (CRM)

Sessions 9 & 10: Applications in Science and Engineering

What to find in a DM conference (three years later)…

Session 1: Categorisation Methods

Session 2: Data Preparation

Session 3: Enterprise Information Systems

Session 4: Clustering Techniques

Session 5: National Security

Session 6: Data and Text Mining

Session 7: Mining Environmental and Geospatial Data

Session 8: Applications in Business, Industry and Government

What to find these days …

Investigative Data Mining For Security And Criminal Detection, First Edition

Jesus Mena

Butterworth-Heinemann 2003

A different conference, a different take …

IEEE CIDM 2009

2009 Symposium on Computational Intelligence and Data Mining

• CI/probabilistic/statistical and other methods
• Data understanding, rule extraction, logical models
• Feature extraction, selection, aggregation, construction
• Multimedia data mining, recognition and interpretation of image and video sequences
• Mining of signals and data streams
• Mining spatial and spatio-temporal data
• Mining of very large datasets, scalability
• Text, graph and web mining
• Meta-learning, predictive data mining
• Visual data mining
• Case studies
• Applications to biometrics, biomedicine, chemistry, drug design, e-commerce, engineering, finance and marketing research, intelligence, industry, remote sensing, scientific data mining, security, sensory networks and others

Starved for ca$h?: ask your TIA

The T.I.A.

“The Total Information Awareness (TIA) program may have been killed by congressional decree, but key elements of the program have survived at other intelligence agencies, according to congressional, federal, and research officials. TIA's goal was to employ data-mining to sift through public and private databases to track terrorists, which stirred up fears that the program would be used to spy on millions of innocent Americans.”

“Congressional officials have not disclosed which TIA programs were eliminated and which were retained, but insiders report that TIA's Evidence Extraction and Link Discovery projects, collectively encompassing 18 data-mining initiatives, are among the surviving components.”

“Despite the death of TIA, Capitol Hill is still paying for the development of software designed to collect foreign intelligence on terrorists: a $64 million research program run by the Advanced Research and Development Activity (ARDA), which has employed some of the same researchers as TIA, was left untouched by Congress.”

www.darpa.mil/ipto/programs/programs.htm

What’s DATA MINING?: A historicist viewpoint
