1st Escuela Red ProTIC (ProTIC Network School), Tandil, April 18-28, 2006

Introduction to Machine Learning

Alejandro Ceccatto

Instituto de Física Rosario CONICET-UNR

Bibliography

Machine Learning, Tom Mitchell (McGraw-Hill, 1997)

Principal Component Analysis, Ian Jolliffe (Springer-Verlag, 2002)

An Introduction to Support Vector Machines and Other Kernel-Based Learning Methods, Nello Cristianini and John Shawe-Taylor (Cambridge University Press, 2000)

The Elements of Statistical Learning, Trevor Hastie, Robert Tibshirani and Jerome Friedman (Springer, 2001)

Machine Learning

• The field of Machine Learning is concerned with the question of how to construct computer programs that automatically improve with experience

• The purpose of this course is to present key algorithms and theory that form the core of Machine Learning

Machine Learning

• Interdisciplinary nature of the material:

Statistics, Artificial Intelligence, Information Theory, etc.

• Basic question:

How to program computers to learn?

Machine Learning

Intelligent Data Analysis:

• Intelligent application of data analytic tools (Statistics)

• Application of “intelligent” data analytic tools (Machine Learning)

Modern world: a data-driven world (industrial, commercial, financial and scientific activities all generate data)

Why Machine Learning?

• Recent progress in algorithms and theory

• Growing flood of online data

• Computational power available

Why Machine Learning?

• Niches for Machine Learning:

– Data Mining: using historical data to improve decisions

Medical records → medical knowledge

– Software applications we can’t program by hand

Autonomous driving

Speech recognition

– Self-customizing programs

Newsreader that learns user interests

Why Machine Learning?

• Data Mining

– Data: Recorded facts

– Information: Set of patterns, or expectations, that underlie the data

– Data Mining: Extraction of implicit, previously unknown, and potentially useful information from data

– Machine Learning: Provides the technical basis of data mining

Why Machine Learning?

• Typical Data Mining Tasks

– Risk of Emergency Cesarean Section

Given

• 9714 patient records, each describing a pregnancy and birth

• Each patient record contains 215 features

Learn to predict:

• Classes of patients at high risk for emergency cesarean section
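The task above is supervised classification: from labelled records, learn a rule that predicts the class. As a minimal sketch, assuming every feature is boolean, the code below performs a greedy (beam width 1) general-to-specific search for one conjunctive IF-THEN rule, in the spirit of the rule-learning methods in Mitchell's Machine Learning. The names `learn_one_rule`, `max_len` and `min_cover`, the features `x1`..`x3` and the toy records are all hypothetical; this is not the procedure used in the study described on this slide.

```python
from typing import Dict, List, Tuple

Record = Dict[str, bool]       # hypothetical: feature name -> boolean value (plus the target)
Condition = Tuple[str, bool]   # one test in the rule: (feature name, required value)


def covers(rule: List[Condition], record: Record) -> bool:
    """True if the record satisfies every test in the conjunction."""
    return all(record[name] == value for name, value in rule)


def score(rule: List[Condition], records: List[Record], target: str) -> Tuple[float, int]:
    """Fraction of covered records whose target is True, and the number covered."""
    covered = [r for r in records if covers(rule, r)]
    if not covered:
        return 0.0, 0
    positives = sum(1 for r in covered if r[target])
    return positives / len(covered), len(covered)


def learn_one_rule(records: List[Record], features: List[str], target: str,
                   max_len: int = 3, min_cover: int = 2) -> Tuple[List[Condition], float]:
    """Greedily add the single test that most raises the positive fraction."""
    rule: List[Condition] = []                  # the empty rule covers every record
    best_score, _ = score(rule, records, target)
    while len(rule) < max_len:
        best_candidate = None
        used = {name for name, _ in rule}
        for name in features:
            if name in used:
                continue
            for value in (True, False):
                candidate = rule + [(name, value)]
                s, n = score(candidate, records, target)
                # accept only specializations that still cover enough records
                if n >= min_cover and s > best_score:
                    best_score, best_candidate = s, candidate
        if best_candidate is None:              # no further test improves the rule
            break
        rule = best_candidate
    return rule, best_score


# Invented toy records, not clinical data.
toy = [
    {"x1": True,  "x2": True,  "x3": False, "y": True},
    {"x1": True,  "x2": True,  "x3": True,  "y": True},
    {"x1": True,  "x2": False, "x3": True,  "y": False},
    {"x1": False, "x2": True,  "x3": False, "y": False},
    {"x1": False, "x2": False, "x3": True,  "y": False},
    {"x1": True,  "x2": False, "x3": False, "y": False},
]

rule, p = learn_one_rule(toy, ["x1", "x2", "x3"], "y")
print(rule, p)  # [('x2', True), ('x1', True)] 1.0
```

The `min_cover` threshold is one simple guard against rules that fit only one or two training records; with real data, the estimate should also be checked on held-out test records.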

Why Machine Learning?

One of the learned rules:

IF No previous vaginal delivery, and Abnormal 2nd Trimester Ultrasound, and Malpresentation at admission

THEN Probability of Emergency C-Section 0.6

Over training data: 26/41 = 0.63

Over test data: 12/20 = 0.60
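The probability attached to a rule is estimated as the fraction of records matching its IF part that are positive for its THEN part; a 0.63 estimate over 41 covered training records corresponds to 26 positives (26/41 ≈ 0.634). A minimal sketch, assuming each record is a dict of boolean fields: the field names and the three demo records are hypothetical, not the 215 features of the original study.

```python
from typing import Callable, Dict, List, Tuple

Record = Dict[str, bool]


def rule_counts(records: List[Record], condition: Callable[[Record], bool],
                target: str) -> Tuple[int, int]:
    """Return (positives, covered) for the records matching the rule's IF part."""
    covered = [r for r in records if condition(r)]
    positives = sum(1 for r in covered if r[target])
    return positives, len(covered)


def c_section_rule(r: Record) -> bool:
    # Hypothetical field names for the three conditions of the rule.
    return (not r["previous_vaginal_delivery"]
            and r["abnormal_2nd_trimester_ultrasound"]
            and r["malpresentation_at_admission"])


# Invented demo records: only the first two satisfy the IF part.
demo = [
    {"previous_vaginal_delivery": False, "abnormal_2nd_trimester_ultrasound": True,
     "malpresentation_at_admission": True, "emergency_c_section": True},
    {"previous_vaginal_delivery": False, "abnormal_2nd_trimester_ultrasound": True,
     "malpresentation_at_admission": True, "emergency_c_section": False},
    {"previous_vaginal_delivery": True, "abnormal_2nd_trimester_ultrasound": True,
     "malpresentation_at_admission": True, "emergency_c_section": True},
]
print(rule_counts(demo, c_section_rule, "emergency_c_section"))  # (1, 2)

# The same estimate applied to the training and test counts for this rule.
for label, positives, covered in [("training", 26, 41), ("test", 12, 20)]:
    print(f"{label}: {positives}/{covered} = {positives / covered:.2f}")
# training: 26/41 = 0.63
# test: 12/20 = 0.60
```

Reporting the test-set fraction alongside the training fraction is what shows the rule generalizes: a rule tuned on the training data alone tends to look better there than on unseen records.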

Why Machine Learning?

– Credit Risk Analysis

Why Machine Learning?

– Customer Retention

Why Machine Learning?

– Problems Too Difficult to Program by Hand

Why Machine Learning?

– Software that Customizes to User

Where is This Headed?

Today: tip of the iceberg

• First-generation algorithms: neural nets, decision trees, regression, ...

• Applied to well-formatted databases

Tomorrow: enormous impact

• Learn across mixed-media data and multiple databases

• Learn by active experimentation

• Learn decisions rather than predictions

• Cumulative, life-long learning

Where is This Headed?

Autonomous entities?

“I'm sorry Dave; I can't let you do that.” – HAL 9000, in 2001: A Space Odyssey, by Arthur C. Clarke