Introduction to Data Mining

Principles of Database Systems

November 13, 2019

CS5423 Introduction to Data Mining November 13, 2019 1 / 21

Outline

1 What is data?

2 What is data science/mining?

3 Graph Mining


What is data?

Data is Big!

• Big Data
• Data is everywhere.


Data Growth

• An article by Forbes states that
  • Data is growing faster than ever before
  • By the year 2020, about 1.7 megabytes of new information will be created every second for every human being on the planet.


Types of Data


How many?

• Time-series data
• Sequence data
• Graphs, social networks
• Multimedia, WWW data
• Text data


What is data science/mining?


What do data scientists do?

• Make discoveries while swimming in data
• The statistics reflect a significant and growing demand for data scientists:
  • Data mining tops LinkedIn’s list of the “hottest skills of 2016”
  • Best job in the USA for 2016
  • 3,433: number of job openings in 2016
  • #16 highest-paying job in demand in 2016
  • Average base salary: $105,395


What is Data science/mining?


• Non-trivial extraction of implicit, previously unknown, and potentially useful information from data
• Storing, organizing, and integrating huge amounts of unstructured data
• Also known as KDD (knowledge discovery in databases)


Application of Data Science


• Internet search
• Recommender systems
• Biological classification
• ...


Data Mining Tasks


1 Prediction Methods (Supervised Learning): Predict unknown or future values of the data using other known data
  • Classification: Is this A or B?
  • Anomaly detection: Is this weird? (e.g., fraud detection)
  • Regression: How much? How many?
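The two most common prediction tasks can be sketched in a few lines of plain Python. This is only an illustration under stated assumptions: the slides do not name any particular algorithm, so a 1-nearest-neighbor rule stands in for classification and a least-squares line stands in for regression. The function names and the toy data are hypothetical.

```python
from statistics import mean

def nearest_neighbor_label(train, query):
    """Classification: return the label of the training point closest to `query`.

    `train` is a list of (feature_vector, label) pairs (known, labeled data).
    """
    def sq_dist(a, b):
        return sum((x - y) ** 2 for x, y in zip(a, b))
    _, label = min(train, key=lambda pair: sq_dist(pair[0], query))
    return label

def fit_line(xs, ys):
    """Regression: least-squares slope and intercept for y = slope * x + intercept."""
    x_bar, y_bar = mean(xs), mean(ys)
    numerator = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    denominator = sum((x - x_bar) ** 2 for x in xs)
    slope = numerator / denominator
    return slope, y_bar - slope * x_bar

# Hypothetical labeled data: two points of class "A", two of class "B".
labeled = [((1.0, 1.0), "A"), ((1.2, 0.8), "A"), ((5.0, 5.0), "B"), ((4.8, 5.3), "B")]
predicted = nearest_neighbor_label(labeled, (4.5, 4.9))  # "Is this A or B?"

# Hypothetical numeric data lying exactly on y = 2x + 1.
slope, intercept = fit_line([1, 2, 3, 4], [3, 5, 7, 9])  # "How much?"
```

Both functions learn from known (labeled) examples and then answer a question about an unseen input, which is what makes them supervised.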


Data Mining Tasks - Continued


2 Description Methods (Unsupervised learning): Find human-interpretable (previously unknown) patterns that describe the (unlabeled) data
  • Clustering: How is data organized?
  • Association rule mining: Are these related?
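As a rough illustration of the "Are these related?" question, the sketch below computes the two standard association-rule measures, support and confidence, over a handful of transactions. The basket contents and function names are hypothetical, and the code assumes the antecedent occurs in at least one transaction (otherwise the confidence is undefined).

```python
def support(transactions, items):
    """Fraction of transactions that contain every item in `items`."""
    items = set(items)
    return sum(items <= t for t in transactions) / len(transactions)

def confidence(transactions, antecedent, consequent):
    """Estimate of P(consequent | antecedent) from the transactions.

    Assumes the antecedent appears in at least one transaction.
    """
    both = set(antecedent) | set(consequent)
    return support(transactions, both) / support(transactions, antecedent)

# Hypothetical unlabeled data: each transaction is a set of purchased items.
baskets = [
    {"bread", "milk"},
    {"bread", "butter"},
    {"bread", "milk", "butter"},
    {"milk"},
]

s = support(baskets, {"bread", "milk"})       # 2 of the 4 baskets contain both
c = confidence(baskets, {"bread"}, {"milk"})  # 2 of the 3 bread baskets contain milk
```

No labels are involved here: the measures only describe co-occurrence patterns that are already present in the data.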


What is a dataset?

• Collection of data objects and their attributes
• Simple case: an n × d matrix
  • n objects with d dimensions each
  • The d columns are called variables, features, or attributes of the objects
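The n × d view can be made concrete with a small table stored as a list of rows. The attribute names and values below are made up for illustration.

```python
# Hypothetical dataset: n = 3 objects (rows), d = 2 attributes (columns).
attributes = ["height_cm", "weight_kg"]
data = [
    [170.0, 65.0],
    [182.0, 80.0],
    [158.0, 52.0],
]

n = len(data)     # number of objects
d = len(data[0])  # number of attributes per object

# Column j holds the values of attribute j across all objects.
columns = {name: [row[j] for row in data] for j, name in enumerate(attributes)}
```

Each row is one data object and each column is one variable, so `columns["weight_kg"]` collects that feature for all n objects.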


Graph Mining

• Graphs
• Community detection
• Community search


Graphs

• Structured data representing relationships between objects
• Important in modeling sophisticated structures and their interactions
• Formed by
  • A set of vertices
  • A set of edges
• Examples
  • Computer networks
  • Social networks
  • Protein interaction networks
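A graph defined by a vertex set and an edge set is often stored as an adjacency structure. The sketch below assumes an undirected graph and uses a made-up social network; the names are hypothetical.

```python
from collections import defaultdict

def build_graph(edges):
    """Build an undirected graph as a dict mapping each vertex to its neighbor set."""
    adj = defaultdict(set)
    for u, v in edges:
        adj[u].add(v)
        adj[v].add(u)
    return dict(adj)

# Hypothetical social network: an edge means "knows".
edges = [("ann", "bob"), ("bob", "cat"), ("ann", "cat"), ("cat", "dan")]
graph = build_graph(edges)

vertices = set(graph)                                    # the vertex set
degree = {v: len(nbrs) for v, nbrs in graph.items()}     # number of neighbors per vertex
```

Neighbor sets make "who is connected to whom" a constant-time lookup on average, which is the basic operation most graph mining algorithms rely on.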


Community Detection

• Graph clustering: a fundamental data mining problem
• Discover densely connected groups in a large graph
  • Many links within a community
  • Few links between communities


• Widely used in many fields
  • Social networks
  • Biological networks
  • Citation networks
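The "many links within, few links between" criterion can be checked numerically for a candidate partition. The sketch below does not detect communities; it only scores a given assignment by counting intra- and inter-community edges and computing modularity, a commonly used quality measure that the slides do not mention and that is brought in here as an assumption. The function name and the toy graph are hypothetical.

```python
def partition_quality(edges, community_of):
    """Score a partition of an undirected graph.

    Returns (intra_edges, inter_edges, modularity), where modularity is
    sum over communities c of  L_c / m - (D_c / (2 * m)) ** 2,
    with L_c the edges inside c, D_c the total degree of c, and m the edge count.
    """
    m = len(edges)
    intra_by_comm = {}
    degree_by_comm = {}
    inter = 0
    for u, v in edges:
        cu, cv = community_of[u], community_of[v]
        # Each edge adds one to the degree of both endpoints' communities.
        degree_by_comm[cu] = degree_by_comm.get(cu, 0) + 1
        degree_by_comm[cv] = degree_by_comm.get(cv, 0) + 1
        if cu == cv:
            intra_by_comm[cu] = intra_by_comm.get(cu, 0) + 1
        else:
            inter += 1
    modularity = sum(
        intra_by_comm.get(c, 0) / m - (deg / (2 * m)) ** 2
        for c, deg in degree_by_comm.items()
    )
    return sum(intra_by_comm.values()), inter, modularity

# Hypothetical graph: two triangles joined by the single edge (2, 3).
edges = [(0, 1), (1, 2), (0, 2), (3, 4), (4, 5), (3, 5), (2, 3)]
community_of = {0: "left", 1: "left", 2: "left", 3: "right", 4: "right", 5: "right"}

intra, inter, q = partition_quality(edges, community_of)
```

A partition that matches the two triangles keeps six of the seven edges inside communities; community detection algorithms search for assignments that score well on measures of this kind.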


Community Search

• Big Data: networks keep growing in size
  • Identifying all communities is expensive in time and space
• Often only the communities pertaining to a given vertex are of interest
• Local community detection (community search)
  • Given a (set of) query node(s), find all densely connected subgraphs of the input graph that contain the query node(s)
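One possible way to make "densely connected" precise is the k-core model, in which every vertex of the community has at least k neighbors inside it. The slides do not commit to a density model, so this choice, the function name, and the toy graph are all assumptions. The sketch handles a single query node and returns one community (the connected k-core component around the query) rather than all dense subgraphs.

```python
from collections import deque

def k_core_community(adj, query, k):
    """Return the connected k-core component containing `query`, or an empty set.

    `adj` maps each vertex to the set of its neighbors (undirected graph).
    """
    alive = set(adj)
    degree = {v: len(adj[v]) for v in adj}
    # Repeatedly peel vertices that have fewer than k surviving neighbors.
    queue = deque(v for v in adj if degree[v] < k)
    while queue:
        v = queue.popleft()
        if v not in alive:
            continue
        alive.remove(v)
        for w in adj[v]:
            if w in alive:
                degree[w] -= 1
                if degree[w] < k:
                    queue.append(w)
    if query not in alive:
        return set()
    # Breadth-first search from the query, restricted to surviving vertices.
    community = {query}
    frontier = deque([query])
    while frontier:
        v = frontier.popleft()
        for w in adj[v]:
            if w in alive and w not in community:
                community.add(w)
                frontier.append(w)
    return community

# Hypothetical graph: triangle a-b-c with a pendant vertex d, plus a separate triangle x-y-z.
adj = {
    "a": {"b", "c"}, "b": {"a", "c"}, "c": {"a", "b", "d"}, "d": {"c"},
    "x": {"y", "z"}, "y": {"x", "z"}, "z": {"x", "y"},
}

around_a = k_core_community(adj, "a", 2)
around_d = k_core_community(adj, "d", 2)
```

Note that this simple version still peels the whole graph before searching; the second triangle is never returned for the query `"a"`, which reflects the idea of reporting only the community around the query instead of every community. Truly local methods avoid touching the full graph.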


Any Questions?
