data mining & machine learning applications

Data Mining & Machine Learning Applications David TJ Huang

Post on 30-Nov-2021

Page 1: Data Mining & Machine Learning Applications

Data Mining & Machine Learning Applications

David TJ Huang

Page 2: Data Mining & Machine Learning Applications

Learning Outcomes

Understand what is a pattern and what is noise

Recognize the influence of input data and preprocessing on the mining results

Connect data mining and machine learning algorithms to real-world problems

Know how pattern and noise are defined differently in different problems

Page 3: Data Mining & Machine Learning Applications

Real World Applications

PART III

Page 4: Data Mining & Machine Learning Applications

Algorithms and the World

All of these algorithms and techniques…

• Association Rule Mining (ARM)

• Clustering

• Classification

• Change Detection

• Artificial Neural Nets (ANN)

• Genetic Algorithms (GA)

• Natural Language Processing (NLP)

• Graph Theory

• Regression

• Etc.

Page 5: Data Mining & Machine Learning Applications

Algorithms and the World

How are they applied to real-world data to improve our lives?

• Google search queries

• Tweets

• Facebook posts / check-ins / messages

• Web click streams

• Browsing history

• Emails

Page 6: Data Mining & Machine Learning Applications

The Akinator

Similar to the 20 questions game…

• Through a series of 20 yes/no questions, the goal is to guess the person you have in mind

http://en.akinator.com

Page 8: Data Mining & Machine Learning Applications

The Akinator

The actual algorithm of the Akinator is unknown, but let’s try to build one…

• How can we use what we know about machine learning algorithms to build a replica of the Akinator?

Let’s start with the data input…

• What is the data input and what does it consist of?

• How should the data input be represented?

• Is any preprocessing needed?

• If so, what sort of preprocessing should we do?

• Cleaning, Integration, Reduction, Transformation
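
One possible way to represent the input, as a minimal sketch only: each known character is a row, each yes/no question is a column, and answers are encoded as 1 (yes), 0 (no), or None (unknown). The characters, questions, and the `encode_answer` and `build_matrix` helpers are hypothetical and made up for illustration; nothing here reflects how the real Akinator stores its data.

```python
# Hypothetical questions; each one becomes a column in the answer matrix.
QUESTIONS = [
    "Is the person real?",
    "Is the person female?",
    "Is the person a scientist?",
]

# Hypothetical raw answers, deliberately messy to show why cleaning is needed.
RAW_ANSWERS = {
    "Marie Curie":     ["Yes", "yes ", "Y"],
    "Albert Einstein": ["yes", "No", "yes"],
    "Sherlock Holmes": ["no", "n", ""],   # last answer is missing
}

def encode_answer(text):
    """Cleaning/transformation step: map free-text answers to 1, 0, or None."""
    cleaned = text.strip().lower()
    if cleaned in {"yes", "y"}:
        return 1
    if cleaned in {"no", "n"}:
        return 0
    return None  # unknown or unusable answer

def build_matrix(raw):
    """Return {character: [encoded answers]}, one entry per question."""
    return {name: [encode_answer(a) for a in answers]
            for name, answers in raw.items()}

MATRIX = build_matrix(RAW_ANSWERS)
for name, row in MATRIX.items():
    print(name, row)
```

Keeping "unknown" as a separate value, rather than forcing it to 0, is a design choice: it lets later steps decide how to treat missing answers instead of silently reading them as "no".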

Page 9: Data Mining & Machine Learning Applications

The Akinator

Things to consider…

• What patterns are we finding/using?

• What are the potential sources of noise in the data/system?

• How should we deal with the noise?

• Supervised or Unsupervised?

• If supervised, what task? If unsupervised, what task?

• Can we do it using both supervised and unsupervised?

• How do you build up your model?

• How do you evaluate your model?

• Is there a need to do Change Detection?
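
One way to approach these questions, sketched under assumptions: treat the task as supervised classification, and at each step greedily ask the question whose yes/no split over the remaining candidates is closest to 50/50 (highest binary entropy), much like choosing a split in a decision tree. The `CHARACTERS` table, `best_question`, `guess`, and the `max_mismatches` noise tolerance are all hypothetical names invented for this sketch, not the real Akinator algorithm.

```python
import math

# Hypothetical knowledge base: 1 = yes, 0 = no, one value per question index.
CHARACTERS = {
    "Marie Curie":      [1, 1, 1],
    "Albert Einstein":  [1, 0, 1],
    "Sherlock Holmes":  [0, 0, 0],
    "Hermione Granger": [0, 1, 0],
}

def entropy(p):
    """Binary entropy (in bits) of a yes-probability p."""
    if p in (0.0, 1.0):
        return 0.0
    return -(p * math.log2(p) + (1 - p) * math.log2(1 - p))

def best_question(candidates, asked):
    """Pick the unasked question that splits the candidates most evenly."""
    n_questions = len(next(iter(CHARACTERS.values())))
    best, best_score = None, -1.0
    for q in range(n_questions):
        if q in asked:
            continue
        p_yes = sum(CHARACTERS[c][q] for c in candidates) / len(candidates)
        score = entropy(p_yes)
        if score > best_score:
            best, best_score = q, score
    return best

def guess(answer_fn, max_mismatches=0):
    """Ask questions until at most one candidate is left, then guess.

    answer_fn(q) returns the player's 0/1 answer to question q.
    max_mismatches > 0 tolerates that many wrong (noisy) answers.
    """
    mismatches = {name: 0 for name in CHARACTERS}
    candidates = list(CHARACTERS)
    asked = set()
    while len(candidates) > 1:
        q = best_question(candidates, asked)
        if q is None:          # no questions left to ask
            break
        asked.add(q)
        reply = answer_fn(q)
        for name in candidates:
            if CHARACTERS[name][q] != reply:
                mismatches[name] += 1
        # Drop candidates that disagree with the player too often.
        candidates = [c for c in candidates if mismatches[c] <= max_mismatches]
    # Guess the character that disagrees least with the answers given.
    return min(mismatches, key=mismatches.get)

secret = "Albert Einstein"
print(guess(lambda q: CHARACTERS[secret][q]))
```

Evaluation could then be as simple as counting how often `guess` recovers the secret character, and how many questions it needs, over a set of test games.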

Page 10: Data Mining & Machine Learning Applications

Google Flu Trends

A web-based tool for real-time surveillance of disease outbreaks

• Uses IP addresses & keyword searches that are related to the flu, in categories such as:

• Symptoms of an influenza complication

• Influenza complication

• Specific influenza symptom

• General influenza symptom

• Cold/flu remedy

• Term for influenza

• Antibiotic medication

• Related disease

http://www.google.com/flutrends

Page 14: Data Mining & Machine Learning Applications

Google Flu Trends

Again, the actual models are unknown…

• So can we go through the same process and come up with a possible replica of the model?

Let’s start with the data input…

• What is the data input and what does it consist of?

• How should the data input be represented?

• Is any preprocessing needed?

• If so, what sort of preprocessing should we do?

• Cleaning, Integration, Reduction, Transformation
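
As a rough illustration of what the preprocessing might look like, the sketch below cleans a raw query log and reduces it to one number per region and week: the fraction of queries that are flu-related. The log rows, region names, the `FLU_TERMS` keyword set, and the helper functions are all made up for this example; the region is assumed to have been derived from the IP address already.

```python
from collections import defaultdict

# Hypothetical flu-related keywords; a real list would be far larger.
FLU_TERMS = {"flu", "influenza", "fever", "cough"}

# Hypothetical raw log rows: (region derived from IP, week number, query text).
RAW_LOG = [
    ("Auckland", 1, "Flu symptoms"),
    ("Auckland", 1, "cheap flights"),
    ("Auckland", 1, "  fever and COUGH "),
    ("Auckland", 1, ""),                    # empty query, dropped in cleaning
    ("Wellington", 1, "weather"),
    ("Wellington", 1, "influenza vaccine"),
]

def is_flu_related(query):
    """Transformation: lower-case and tokenize, then match against FLU_TERMS."""
    return any(word in FLU_TERMS for word in query.lower().split())

def flu_query_fraction(log):
    """Reduction: collapse the log to a flu-query fraction per (region, week)."""
    totals = defaultdict(int)
    flu = defaultdict(int)
    for region, week, query in log:
        query = query.strip()
        if not query:            # cleaning: skip empty queries
            continue
        totals[(region, week)] += 1
        if is_flu_related(query):
            flu[(region, week)] += 1
    return {key: flu[key] / totals[key] for key in totals}

FRACTIONS = flu_query_fraction(RAW_LOG)
for key, value in sorted(FRACTIONS.items()):
    print(key, round(value, 3))
```

Using a fraction rather than a raw count is one way to keep regions with very different search volumes comparable.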

Page 15: Data Mining & Machine Learning Applications

Google Flu Trends

Things to consider…

• What patterns are we finding/using?

• What are the potential sources of noise in the data/system?

• How should we deal with the noise?

• Supervised or Unsupervised?

• If supervised, what task? If unsupervised, what task?

• Can we do it using both supervised and unsupervised?

• How do you build up your model?

• How do you evaluate your model?

• Is there a need to do Change Detection?
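
One candidate answer to "how do you build and evaluate your model", offered as a sketch rather than as the actual Google model: treat it as supervised regression, fit a straight line between the log-odds of the flu-related query fraction and the log-odds of the reported flu-visit fraction, and evaluate on held-out weeks. All numbers below are made up, and `fit_line` and `predict` are hypothetical helper names.

```python
import math

def logit(p):
    """Log-odds of a fraction p in (0, 1)."""
    return math.log(p / (1 - p))

def inv_logit(x):
    """Inverse of logit: map log-odds back to a fraction."""
    return 1 / (1 + math.exp(-x))

def fit_line(xs, ys):
    """Ordinary least squares for y = b0 + b1 * x."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    b1 = sxy / sxx
    b0 = mean_y - b1 * mean_x
    return b0, b1

# Made-up weekly data: flu-related query fraction and reported flu-visit fraction.
QUERY_FRACTION = [0.010, 0.015, 0.022, 0.030, 0.041, 0.035]
VISIT_FRACTION = [0.012, 0.018, 0.025, 0.036, 0.050, 0.042]

# Train on the first four weeks, hold out the last two for evaluation.
train_x = [logit(q) for q in QUERY_FRACTION[:4]]
train_y = [logit(v) for v in VISIT_FRACTION[:4]]
b0, b1 = fit_line(train_x, train_y)

def predict(query_fraction):
    """Predicted flu-visit fraction for a given query fraction."""
    return inv_logit(b0 + b1 * logit(query_fraction))

held_out = list(zip(QUERY_FRACTION[4:], VISIT_FRACTION[4:]))
mae = sum(abs(predict(q) - v) for q, v in held_out) / len(held_out)
print("slope:", round(b1, 3), "held-out mean absolute error:", round(mae, 4))
```

If the held-out error starts growing over time, that is a hint the relationship between searches and illness has shifted, which is where change detection would come in.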

Page 16: Data Mining & Machine Learning Applications

Recommending and Overfitting

When the system is making recommendations AND updating the model based on the results of those recommendations…

• It is very likely that you are going to overfit

• The input data is not clean per se, because it is shaped by the system’s own earlier recommendations

• Other examples:

• PageRank & website suggestions
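
The feedback loop can be illustrated with a small deterministic simulation. This is a toy sketch under stated assumptions: two items with made-up true click rates, a greedy recommender that only observes clicks on what it chose to show, and expected clicks used in place of random ones so the run is repeatable. The `explore_every` option shows one commonly used mitigation (occasionally showing the least-shown item); that mitigation is an addition for illustration and is not taken from the slides.

```python
# Hypothetical true click rates; item "B" is actually the better item.
TRUE_CTR = {"A": 0.5, "B": 0.6}

def run_feedback_loop(rounds, explore_every=None):
    """Greedy recommender that learns only from the items it chose to show.

    Returns how many times each item was shown.
    """
    clicks = {"A": 1.0, "B": 0.0}   # "A" starts with one lucky click
    shown = {"A": 1, "B": 1}
    for t in range(1, rounds + 1):
        if explore_every and t % explore_every == 0:
            # Exploration step: show the item with the fewest impressions.
            item = min(shown, key=shown.get)
        else:
            # Greedy step: show the item with the best observed click rate.
            item = max(clicks, key=lambda i: clicks[i] / shown[i])
        shown[item] += 1
        clicks[item] += TRUE_CTR[item]   # expected clicks keep the run deterministic
    return shown

greedy = run_feedback_loop(200)
explored = run_feedback_loop(200, explore_every=10)
print("greedy only:     ", greedy)
print("with exploration:", explored)
```

In the greedy run the better item is never shown again, so the model keeps confirming its early, lucky estimate. That is the sense in which the training data is "not clean": it only contains outcomes the model itself selected.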

Page 17: Data Mining & Machine Learning Applications

Netflix

Page 20: Data Mining & Machine Learning Applications

References

• The Akinator

• http://en.akinator.com

• Google Flu Trends

• http://www.google.com/flutrends

• Assessing Google Flu Trends Performance in the United States during the 2009 Influenza Virus A (H1N1) Pandemic

• http://journals.plos.org/plosone/article?id=10.1371/journal.pone.0023610