Introduction to Data Mining
Principles of Database Systems
November 13, 2019
CS5423 Introduction to Data Mining November 13, 2019 1 / 21
Outline
1 What is data?
2 What is data science/mining?
3 Graph Mining
What is data?
Data is Big!
▶ Big Data
▶ Data is everywhere.
Data Growth
▶ An article by Forbes states that
  • Data is growing faster than ever before
  • By the year 2020, about 1.7 megabytes of new information will be created every second for every human being on the planet.
Types of Data
How many?
▶ Time-series data
▶ Sequence data
▶ Graphs, social networks
▶ Multimedia, WWW data
▶ Text data
What is data science/mining?
What do data scientists do?
▶ Make discoveries while swimming in data
▶ The statistics reflect a significant and growing demand for data scientists.
  • Data mining tops LinkedIn’s list of the “hottest skills of 2016”
  • Best job in USA for 2016
  • 3,433: Number of job openings in 2016
  • #16 highest paying job in demand in 2016
  • Average base salary: $105,395
What is Data science/mining?
▶ Non-trivial extraction of implicit, previously unknown, and potentially useful information from data
▶ Storing, organizing and integrating huge amounts of unstructured data
▶ a.k.a. KDD (knowledge discovery in databases)
Application of Data Science
▶ Internet search
▶ Recommender systems
▶ Biological Classification
▶ ...
Data Mining Tasks
1 Prediction Methods (Supervised Learning): Predict unknown or future values of the data using other known data
  • Classification: Is this A or B?
  • Anomaly detection: Is this weird? (Fraud detection)
  • Regression: How much? How many?
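The three prediction tasks above can be sketched on toy data with only the Python standard library. The function names, the 1-nearest-neighbor rule, the z-score threshold, and the least-squares line are illustrative assumptions, not methods named in the slides, and all numbers are made up.

```python
from statistics import mean, pstdev

def classify_1nn(train, x):
    """Classification: return the label of the closest training point."""
    # train is a list of (value, label) pairs (hypothetical toy format)
    nearest = min(train, key=lambda pair: abs(pair[0] - x))
    return nearest[1]

def find_anomalies(values, threshold=2.0):
    """Anomaly detection: flag values more than `threshold` std devs from the mean."""
    mu, sigma = mean(values), pstdev(values)
    return [v for v in values if sigma > 0 and abs(v - mu) / sigma > threshold]

def fit_line(xs, ys):
    """Regression: least-squares slope and intercept for y = a*x + b."""
    mx, my = mean(xs), mean(ys)
    a = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sum((x - mx) ** 2 for x in xs)
    return a, my - a * mx

# Is this A or B?
train = [(1.0, "A"), (1.5, "A"), (8.0, "B"), (9.0, "B")]
label = classify_1nn(train, 7.2)

# Is this weird? (invented transaction amounts, one of them unusually large)
amounts = [20, 22, 19, 21, 23, 20, 500]
outliers = find_anomalies(amounts)

# How much? How many?
slope, intercept = fit_line([1, 2, 3, 4], [3, 5, 7, 9])
```

Here `label` is the class of the nearest labeled point, `outliers` holds the flagged amounts, and `slope` and `intercept` can be used to predict a numeric value for a new input.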
Data Mining Tasks - Continued
2 Description Methods (Unsupervised Learning): Find human-interpretable (previously unknown) patterns that describe the (unlabeled) data
  • Clustering: How is data organized?
  • Association rule mining: Are these related?
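A minimal sketch of the two descriptive tasks above, assuming a tiny 1-D k-means for clustering and the usual support/confidence measures for association rules. The slides do not name specific algorithms, so these choices, the function names, and the toy data are assumptions.

```python
def kmeans_1d(points, centers, iterations=10):
    """Clustering: a tiny 1-D k-means that alternates assignment and update."""
    for _ in range(iterations):
        clusters = [[] for _ in centers]
        for p in points:
            idx = min(range(len(centers)), key=lambda i: abs(p - centers[i]))
            clusters[idx].append(p)
        # Recompute each center; keep the old one if a cluster is empty
        centers = [sum(c) / len(c) if c else centers[i] for i, c in enumerate(clusters)]
    # clusters reflects the last assignment step
    return centers, clusters

def support(transactions, itemset):
    """Fraction of transactions that contain every item in `itemset`."""
    return sum(itemset <= t for t in transactions) / len(transactions)

def confidence(transactions, antecedent, consequent):
    """Association rule strength: support(A and C) / support(A)."""
    return support(transactions, antecedent | consequent) / support(transactions, antecedent)

# How is data organized? (made-up points forming two obvious groups)
centers, clusters = kmeans_1d([1.0, 1.2, 0.8, 9.0, 9.5, 10.1], centers=[0.0, 5.0])

# Are these related? (invented shopping baskets)
baskets = [{"bread", "milk"}, {"bread", "butter"}, {"bread", "milk", "butter"}, {"milk"}]
conf_bread_milk = confidence(baskets, {"bread"}, {"milk"})
```

No labels are supplied in either case: the groups and the rule strength come from the data alone.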
What is a dataset?
▶ Collection of data objects and their attributes
▶ Simple case: n × d matrix
  • n objects with d dimensions each
  • d columns are called variables, features or attributes of objects
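As a small illustration of the n × d view, the sketch below stores a dataset as a list of rows. The attribute names and values are invented for the example.

```python
# A toy dataset: n = 3 objects (rows), d = 4 attributes (columns).
# Attribute names and values are hypothetical.
attributes = ["age", "height_cm", "weight_kg", "income"]
data = [
    [25, 170.0, 65.0, 40000],
    [32, 182.5, 80.0, 52000],
    [47, 165.0, 58.5, 61000],
]

n = len(data)       # number of objects
d = len(data[0])    # number of dimensions per object

# One object is a row; one attribute is a column across all rows.
second_object = data[1]
heights = [row[attributes.index("height_cm")] for row in data]
```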
Graph Mining
▶ Graph
▶ Community Detection
▶ Community Search
Graphs
▶ Structured data representing relationships between objects
▶ Important in modeling sophisticated structures and their interactions
▶ Formed by
  • A set of vertices
  • A set of edges
▶ Examples
  • Computer networks
  • Social networks
  • Protein interaction networks
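One common way to hold a set of vertices and a set of edges in code is an adjacency structure. The sketch below assumes an undirected graph and uses a made-up social network; `build_graph` is a hypothetical helper name.

```python
from collections import defaultdict

def build_graph(edges):
    """Build an undirected graph as a dict mapping each vertex to its neighbor set."""
    adj = defaultdict(set)
    for u, v in edges:
        adj[u].add(v)
        adj[v].add(u)
    return dict(adj)

# Invented friendship edges for illustration
edges = [("ann", "bob"), ("bob", "cat"), ("ann", "cat"), ("cat", "dan")]
graph = build_graph(edges)

vertices = set(graph)
degree = {v: len(nbrs) for v, nbrs in graph.items()}
```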
Community Detection
▶ Graph clustering: a fundamental data mining problem
▶ Discover densely connected groups in a large graph
  • Many links within a community
  • Few links between communities
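The "many links within, few links between" criterion can be checked for a given grouping by counting edges. The sketch below only scores a partition that is supplied by hand; it does not detect communities itself. The graph, the labels, and the name `link_counts` are assumptions for illustration.

```python
def link_counts(edges, community_of):
    """Count edges inside communities vs. edges crossing between communities."""
    within = sum(1 for u, v in edges if community_of[u] == community_of[v])
    return within, len(edges) - within

# Two triangles joined by a single bridge edge (3, 4)
edges = [(1, 2), (2, 3), (1, 3), (4, 5), (5, 6), (4, 6), (3, 4)]

# A grouping that follows the triangles, and one that cuts across them
good = {1: "X", 2: "X", 3: "X", 4: "Y", 5: "Y", 6: "Y"}
poor = {1: "X", 2: "Y", 3: "X", 4: "Y", 5: "X", 6: "Y"}

good_within, good_between = link_counts(edges, good)
poor_within, poor_between = link_counts(edges, poor)
```

A grouping that matches the dense regions keeps most edges inside communities, while an arbitrary split leaves most edges crossing between them.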
Community Detection
▶ Widely used in many fields
  • Social networks
  • Biological networks
  • Citation networks
Community Search
▶ Big Data: increasing size of networks
  • Expensive time/space cost to identify all communities
▶ Interested in the communities pertaining to a given vertex
▶ Local community detection (Community Search)
  • Given a (set of) query node(s), find all densely connected subgraphs of the input graph containing the given query node(s)
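The slides do not fix a definition of "densely connected". A rough sketch under one common assumption, the k-core (every vertex has at least k neighbors inside the subgraph), is shown below for a single query node. It returns only the connected k-core containing the query, not all dense subgraphs, and the function name and toy graph are hypothetical.

```python
from collections import deque

def k_core_community(adj, query, k):
    """Return the connected k-core component containing `query`, or an empty set."""
    # Work on a copy so the caller's graph is not modified
    adj = {v: set(nbrs) for v, nbrs in adj.items()}
    # Repeatedly peel vertices whose degree is below k
    queue = deque(v for v in adj if len(adj[v]) < k)
    removed = set()
    while queue:
        v = queue.popleft()
        if v in removed:
            continue
        removed.add(v)
        for u in list(adj[v]):
            adj[u].discard(v)
            if u not in removed and len(adj[u]) < k:
                queue.append(u)
        adj[v] = set()
    if query not in adj or query in removed:
        return set()
    # Breadth-first search from the query node inside the remaining core
    community, frontier = {query}, deque([query])
    while frontier:
        v = frontier.popleft()
        for u in adj[v] - community:
            community.add(u)
            frontier.append(u)
    return community

# Toy graph: a 4-clique {a, b, c, d} with a sparse tail d - e - f
graph = {
    "a": {"b", "c", "d"},
    "b": {"a", "c", "d"},
    "c": {"a", "b", "d"},
    "d": {"a", "b", "c", "e"},
    "e": {"d", "f"},
    "f": {"e"},
}
community_of_a = k_core_community(graph, "a", 3)
community_of_e = k_core_community(graph, "e", 3)
```

Only the neighborhood structure reachable from the query node matters for the answer, which is the motivation for searching locally instead of identifying every community in a large network.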
Any Questions?