Introduction to Data Mining
Jiang Li
Department of Computer Science & Information Technology
Austin Peay State University
Outline
Data Collected
Knowledge Discovery – An Iterative Process
Data Mining Examples
Data Mining Functions and Algorithms
Data Collected
Business
  Wal-Mart: 20 million transactions a day
  Mobil Oil Corporation: a 100-terabyte data warehouse
Science
  The human genome database project: gigabytes of data
  NASA Earth Observing System (EOS): 50 gigabytes of data per hour
Radio, Television, and Film Studios
  Multimedia databases
WWW – the infinite resources
Email – huge digital libraries
Data vs. Knowledge
Technology is available to help us collect data: bar codes, cameras, scanners, radars, satellites, etc.
Technology is available to help us store data: databases, data warehouses, and a variety of repositories.
We are swamped by data that pours on us. We need to interpret this data in search of new knowledge.
Our need is to extract interesting knowledge (rules, regularities, patterns, constraints) from data in large collections.
“We are drowning in information, but starving for knowledge.” John Naisbitt
Evolution of Database Technology
1960s: Data collection, database creation (hierarchical and network models)
1970s: Relational data model, relational DBMS implementation
1980s: Ubiquitous RDBMS, advanced data models (extended-relational, object-oriented, deductive, etc.) and application-oriented DBMS (spatial, scientific, engineering, etc.)
1990s: Data mining and data warehousing, multimedia databases, and Web-based database technology
Knowledge Discovery
Data Mining
In theory, data mining is a step in the knowledge discovery process. It is the extraction of implicit information from a large dataset.
In practice, data mining and knowledge discovery are becoming synonyms.
KDD – Knowledge Discovery in Databases (the name is also expanded as Knowledge Discovery and Data Mining)
Notice the misnomer for data mining. Shouldn’t it be knowledge mining?
Steps of a KDD Process
Learning the application domain: relevant prior knowledge and goals of application
Gathering and integrating data
Cleaning and preprocessing data (may take 60% of effort!)
Reducing and projecting data
  Find useful features, dimensionality/variable reduction, …
Choosing mining functions and algorithms
  Summarization, classification, regression, association, …
Data mining: search for patterns of interest
Evaluating results
Interpretation: analysis of results
  Visualization, alteration, removing redundant patterns, …
Use of discovered knowledge
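The steps above can be sketched as a small pipeline. This is only a minimal illustration: the records, field names, helper functions, and the age cutoff are all hypothetical, and "mining" here is reduced to a simple summarization.

```python
from statistics import mean

# Hypothetical raw records from two sources (illustrative data only).
store_a = [
    {"customer": "c1", "age": 34, "spend": 120.0},
    {"customer": "c2", "age": None, "spend": 80.0},   # missing age
    {"customer": "c3", "age": 51, "spend": -5.0},     # invalid spend
]
store_b = [
    {"customer": "c4", "age": 29, "spend": 200.0},
    {"customer": "c5", "age": 62, "spend": 40.0},
]

def integrate(*sources):
    """Gather and integrate: merge all sources into one list."""
    return [row for source in sources for row in source]

def clean(rows):
    """Clean: drop rows with missing or invalid values."""
    return [r for r in rows if r["age"] is not None and r["spend"] >= 0]

def project(rows):
    """Reduce and project: keep only the features used for mining."""
    return [(r["age"], r["spend"]) for r in rows]

def mine(points, age_cutoff=40):
    """Mine (summarization): average spend per age group."""
    groups = {"under_40": [], "40_and_over": []}
    for age, spend in points:
        key = "under_40" if age < age_cutoff else "40_and_over"
        groups[key].append(spend)
    # Skip empty groups so mean() is never called on an empty list.
    return {k: mean(v) for k, v in groups.items() if v}

rows = clean(integrate(store_a, store_b))
summary = mine(project(rows))
print(summary)
```

In a real project the evaluation and interpretation steps would follow, and the whole sequence is usually repeated several times as the results suggest better cleaning or feature choices.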
Data Mining – On What Kind of Data?
Flat Files
Generic Data
  Relational & Object-Relational Databases
  Object-Oriented Databases
Multimedia Data
  Text Databases
  Audio, Image, and Video Databases
Business Data
  Transactional Databases
Engineering Data
  Spatial databases
  Temporal and Time-series databases
WWW Data
Data Mining Examples
Data mining is primarily used today by companies with a strong consumer focus: retail, financial, communication, and marketing organizations.
It enables these companies to determine relationships among "internal" factors such as price, product positioning, or staff skills, and "external" factors such as economic indicators, competition, and customer demographics.
It also enables them to determine the impact on sales, customer satisfaction, and corporate profits.
Finally, it enables them to "drill down" into summary information to view detailed transactional data.
Data Mining Examples
With data mining, a retailer could use point-of-sale records of customer purchases to send targeted promotions based on an individual's purchase history.
By mining demographic data from comment or warranty cards, the retailer could develop products and promotions to appeal to specific customer segments.
Blockbuster Entertainment mines its video rental history database to recommend rentals to individual customers.
American Express can suggest products to its cardholders based on analysis of their monthly expenditures.
Data Mining Examples
WalMart is pioneering massive data mining to transform its supplier relationships.
WalMart captures point-of-sale transactions from over 2,900 stores in 6 countries and continuously transmits this data to its massive 7.5 terabyte Teradata data warehouse.
WalMart allows more than 3,500 suppliers to access data on their products and perform data analyses.
These suppliers use this data to identify customer buying patterns at the store display level.
They use this information to manage local store inventory and identify new merchandising opportunities.
Business Data Mining Examples
The NBA is exploring a data mining application that can be used in conjunction with image recordings of basketball games.
The Advanced Scout software analyzes the movements of players to help coaches orchestrate plays and strategies.
For example, an analysis of the play-by-play sheet of the game played between the New York Knicks and the Cleveland Cavaliers on January 6, 1995 reveals that when Mark Price played the Guard position, John Williams attempted four jump shots and made each one!
A coach can automatically bring up the video clips showing each of the jump shots attempted by Williams with Price on the floor, without needing to comb through hours of video footage.
Those clips show a very successful pick-and-roll play in which Price draws the Knicks' defense and then finds Williams for an open jump shot.
Data Mining Functions and Algorithms
Association Rules
  Data can be mined to identify associations.
  The butter -> bread rule is an example of association mining.
  To find rules like “inside(x, city) -> near(x, highway)”.
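As a rough illustration of how a rule such as butter -> bread might be scored, the sketch below computes the two usual measures, support and confidence, over a handful of made-up baskets. The function names and the data are assumptions for illustration, not part of the original slides.

```python
def support(transactions, items):
    """Fraction of transactions that contain every item in `items`."""
    items = set(items)
    return sum(items <= t for t in transactions) / len(transactions)

def confidence(transactions, antecedent, consequent):
    """Estimated P(consequent | antecedent) over the transactions."""
    base = support(transactions, antecedent)
    if base == 0:
        return 0.0  # antecedent never occurs, so the rule cannot be scored
    both = support(transactions, set(antecedent) | set(consequent))
    return both / base

# Hypothetical market baskets (illustrative data only).
baskets = [
    {"butter", "bread", "milk"},
    {"butter", "bread"},
    {"bread", "jam"},
    {"butter", "eggs"},
    {"milk", "eggs"},
]

rule_support = support(baskets, {"butter", "bread"})
rule_confidence = confidence(baskets, {"butter"}, {"bread"})
print(round(rule_support, 2), round(rule_confidence, 2))
```

Here butter and bread appear together in 2 of 5 baskets, and 2 of the 3 baskets with butter also contain bread. A mining algorithm would search for all rules whose support and confidence exceed chosen thresholds.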
Classification and Prediction
  Classify data based on the values in a classifying attribute, e.g., classify countries based on climate, or classify cars based on gas mileage.
  Stored data is used to locate data in predetermined groups.
  A restaurant chain could mine customer purchase data to determine when customers visit and what they typically order. This information could be used to increase traffic by having daily specials.
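The "classify cars based on gas mileage" example can be sketched with about the simplest possible classifier: a single threshold learned from labeled samples. The class names, the training data, and the helper functions are hypothetical, and the sketch assumes at least two distinct mileage values in the training set.

```python
def fit_threshold(samples):
    """Pick the mileage threshold that misclassifies the fewest samples.

    The rule predicts "economy" when mpg >= threshold, else "standard".
    """
    values = sorted({mpg for mpg, _ in samples})
    # Candidate thresholds: midpoints between consecutive distinct values.
    candidates = [(a + b) / 2 for a, b in zip(values, values[1:])]

    def errors(threshold):
        return sum(
            (label == "economy") != (mpg >= threshold)
            for mpg, label in samples
        )

    return min(candidates, key=errors)

def classify(mpg, threshold):
    """Assign a car to one of the two predetermined groups."""
    return "economy" if mpg >= threshold else "standard"

# Hypothetical labeled cars: (miles per gallon, class).
training = [
    (18, "standard"), (22, "standard"), (25, "standard"),
    (31, "economy"), (35, "economy"), (40, "economy"),
]

threshold = fit_threshold(training)
print(threshold, classify(30, threshold), classify(20, threshold))
```

Prediction then means applying the learned rule to cars whose class is not yet known.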
Data Mining Functions and Algorithms
Clustering
  Data items are grouped according to logical relationships or consumer preferences.
  Data can be mined to identify market segments or consumer affinities.
  To cluster houses to find distribution patterns.
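One common way to cluster houses by location is k-means, sketched below in plain Python. The slides do not name a specific algorithm, so k-means is an assumption here, as are the coordinates and the choice of starting centroids.

```python
from math import dist

def kmeans(points, centroids, iterations=10):
    """Plain k-means: alternate between assignment and centroid update."""
    clusters = []
    for _ in range(iterations):
        # Assignment step: each point joins its nearest centroid.
        clusters = [[] for _ in centroids]
        for p in points:
            nearest = min(
                range(len(centroids)),
                key=lambda i: dist(p, centroids[i]),
            )
            clusters[nearest].append(p)
        # Update step: move each centroid to the mean of its cluster.
        # An empty cluster keeps its previous centroid.
        centroids = [
            tuple(sum(axis) / len(cluster) for axis in zip(*cluster))
            if cluster else old
            for cluster, old in zip(clusters, centroids)
        ]
    return centroids, clusters

# Hypothetical house locations as (x, y) map coordinates.
houses = [
    (1.0, 1.0), (1.5, 2.0), (1.0, 1.5),
    (8.0, 8.0), (9.0, 8.5), (8.5, 9.0),
]

# Deterministic start: one house from each visible group.
centroids, clusters = kmeans(houses, [houses[0], houses[3]])
print(centroids)
```

With this data the two groups separate after the first pass. Real data usually needs several random restarts because k-means can settle on a poor local solution.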
Sequential patterns
  Data is mined to anticipate behavior patterns and trends.
An outdoor equipment retailer could predict the likelihood of a backpack being purchased based on a consumer's purchase of sleeping bags and hiking shoes.
To find and characterize similar sequences and deviation data, e.g., stock analysis.
To find segment-wise or total cycles or periodic behaviors in time-related data.
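The backpack example can be approximated by counting ordered purchase histories. The sketch below assumes each customer's purchases are stored as a time-ordered list and estimates how often a backpack follows a sleeping bag and then hiking shoes. The function names and histories are made up for illustration.

```python
def prefix_end(history, prefix):
    """Index just past the earliest in-order match of `prefix`, or None."""
    position = 0
    for item in prefix:
        if item not in history[position:]:
            return None
        position = history.index(item, position) + 1
    return position

def likelihood(histories, prefix, target):
    """Share of histories matching `prefix` in which `target` comes later."""
    matched = [
        (h, end)
        for h in histories
        if (end := prefix_end(h, prefix)) is not None
    ]
    if not matched:
        return 0.0
    hits = sum(target in h[end:] for h, end in matched)
    return hits / len(matched)

# Hypothetical time-ordered purchase histories, one list per customer.
histories = [
    ["sleeping bag", "hiking shoes", "backpack"],
    ["sleeping bag", "tent", "hiking shoes", "backpack"],
    ["sleeping bag", "hiking shoes", "stove"],
    ["hiking shoes", "sleeping bag"],
    ["backpack", "sleeping bag", "hiking shoes"],
]

p = likelihood(histories, ["sleeping bag", "hiking shoes"], "backpack")
print(p)
```

Four histories contain the two purchases in order, and in two of them a backpack follows, so the estimate is 0.5. Treating the order of the first two purchases as significant is a modeling choice. The slide's wording would also fit an unordered reading.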
Data Mining – Linear Classification
[Figure: loan data set with axes Debt and Income and regions labeled Loan and No Loan]
A simple linear classification boundary for the loan data set: shaded region denotes class “no loan”
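A linear boundary like the one in the figure can be written as a weighted sum of income and debt compared against zero. The weights, bias, and units below are hypothetical values chosen only to show the idea. The figure's actual boundary is not recoverable from the transcript.

```python
from dataclasses import dataclass

@dataclass
class Applicant:
    income: float  # in thousands of dollars (assumed unit)
    debt: float    # in thousands of dollars (assumed unit)

# Hypothetical boundary parameters, chosen only for illustration.
W_INCOME = 1.0
W_DEBT = -2.0
BIAS = -10.0

def score(a):
    """Signed side of the line W_INCOME*income + W_DEBT*debt + BIAS = 0."""
    return W_INCOME * a.income + W_DEBT * a.debt + BIAS

def classify(a):
    """Positive side gets "loan"; the boundary and negative side get "no loan"."""
    return "loan" if score(a) > 0 else "no loan"

applicants = [Applicant(60, 10), Applicant(30, 20), Applicant(45, 15)]
labels = [classify(a) for a in applicants]
print(labels)
```

In practice the weights would be fitted to historical loan outcomes rather than set by hand, for example with a perceptron or logistic regression.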
Data Mining - Confluence of Multiple Disciplines