DATA11002

Introduction to Machine Learning

Lecturer: Teemu Roos

TAs: Ville Hyvönen and Janne Leppä-aho

Department of Computer Science

University of Helsinki

(based in part on material by Patrik Hoyer and Jyrki Kivinen)

November 2nd–December 15th 2017


Introduction

I Practical details of the course

I Lectures

I Exercises

I Exam

I Grading

I Course outline

I What is machine learning? Motivation & examples

I Definition

I Relation to other fields

I Examples


Practical details (1)

I Lectures:

I November 2nd (today) – December 15th

I Thursdays at 2pm-4pm and Fridays at 10am-12pm in Exactum CK112

I Lecturer: Teemu Roos (Exactum A322, [email protected])

I Language: English

I Based on the course textbook (next slide)

I (previous instances of this course have used different textbooks)


Practical details (2)

I Textbook:

I authors: Gareth James, Daniela Witten, Trevor Hastie and Robert Tibshirani

I title: An Introduction to Statistical Learning – with Applications in R

I publisher: Springer (2013, first edition)

I web page: www-bcf.usc.edu/~gareth/ISL/

I we’ll cover the whole book except splines and generalized additive models (GAMs), and include some additional Bayesian stuff


Practical details (3)

I Lecture material

I this set of slides (by Hoyer/Kivinen/Roos) is intended for use as part of the actual lectures, together with the blackboard etc.

I we will cover some topics in more detail than the textbook (and some less)

I in particular some additional detail is needed for homework problems

I both the selected parts of the textbook as well as additional material indicated on the course homepage are required material for the exam


Practical details (4)

I Exercises:

I Two kinds:

I mathematical exercises (pencil-and-paper)

I computer exercises (support given in R but Python is a good choice too)

I Problem set handed out every Friday, focusing on topics from that week’s lectures

I Solutions returned at the exercise sessions

I NB: You get points only by attending your own exercise group (not group 99)

I Solutions can be returned by email only in exceptional circumstances, not including being busy at work: [email protected]

I Language of exercise sessions: English

I Exercise points make up 40% of your total grade; you must get at least half of the points to be eligible for the course exam.


Practical details (5)

I Exercises this week:

I This week we offered voluntary R “tutorials”

I Wednesday Nov 1st at 4pm (C222) and Thursday Nov 2nd at 12pm (D123)

I Instruction on R and its features used on this course

I Voluntary, no points awarded. Recommended for everyone not previously familiar with R.

I Bring your own laptop, with R (and possibly RStudio) installed.


Practical details (6)

I Course exam (details can sometimes change at short notice!):

I December 19th at 9:00am (NB: not 9:15am)

I Makes up 60% of your course grade

I You must get at least half of the exam points to pass the course

I Pencil-and-paper problems, similar in style to the exercises (also ‘essay’ or ‘explain’ problems)

I (Note: To be eligible to take a ‘separate exam’ you need to first complete some programming assignments. These will be available on the course web page a bit later. However, since you are here at the lecture, this probably does not concern you.)

I You may answer exam problems also in Finnish or Swedish.


Practical details (8)

I Prerequisites:

I Mathematics: Basics of probability theory and statistics, linear algebra (i.e., vectors and matrices) and real analysis (i.e., derivatives, etc.)

I Computer science: Good programming skills (but no previous familiarity with R necessary)


Related courses

I Various advanced Data Science courses:

I Advanced Course in Machine Learning (period IV)

I Introduction to Bayesian Inference (period II)

I Computational Statistics I-II (periods I-II)

I High Dimensional Statistics (period II)

I Introduction to Data Science (period I)

I Data Mining (self study, plus optional project)

I Deep Learning (period II)

I Seminar: Deep Learning for Natural Language Processing (periods III-IV)

I Seminar: Machine Learning Methods for Fossil Data Analysis (period II)

I Lots of courses at Aalto as well!


Practical details (9)

I Course material:

I Webpage (public information about the course): https://courses.helsinki.fi/en/data11002/119123177

I NB: You should have signed up on the department registration system

I Help?

I Ask the assistants/lecturer at exercises/lectures

I Contact assistants/lecturer


Course outline

I Introduction

I Ingredients of machine learning

I task, models, data

I evaluation and model selection

I Supervised learning

I classification

I regression

I Unsupervised learning

I clustering

I dimension reduction


What is machine learning?

I Definition:

machine = computer, computer program (in this course)

learning = improving performance on a given task, based on experience / examples

I In other words

I instead of the programmer writing explicit rules for how to solve a given problem, the programmer instructs the computer how to learn from examples

I in many cases the computer program can even become better at the task than the programmer is!


Example 1: tic-tac-toe

I How to program the computer to play tic-tac-toe?

I Option A: The programmer writes explicit rules, e.g. ‘if the opponent has two in a row, and the third is free, stop it by placing your mark there’, etc. (lots of work, difficult, not at all scalable!)

I Option B: Go through the game tree, choose optimally (for non-trivial games, must be combined with some heuristics to restrict tree size)

I Option C: Let the computer try out various strategies by playing against itself and others, and noting which strategies lead to winning and which to losing (=‘machine learning’)
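
As a rough illustration of Option B, the sketch below searches the full tic-tac-toe game tree with minimax. The board encoding (a 9-character string, read row by row) and the function names `winner`, `value` and `best_move` are assumptions made for this example, not part of the course material.

```python
from functools import lru_cache

# All eight winning lines on a 3x3 board indexed 0..8 (row by row).
LINES = [(0, 1, 2), (3, 4, 5), (6, 7, 8),
         (0, 3, 6), (1, 4, 7), (2, 5, 8),
         (0, 4, 8), (2, 4, 6)]


def winner(board):
    """Return 'X' or 'O' if that player has three in a row, else None."""
    for a, b, c in LINES:
        if board[a] != ' ' and board[a] == board[b] == board[c]:
            return board[a]
    return None


@lru_cache(maxsize=None)
def value(board, player):
    """Game value from X's point of view (+1 X wins, 0 draw, -1 O wins)
    when `player` is to move and both sides play optimally."""
    w = winner(board)
    if w is not None:
        return 1 if w == 'X' else -1
    if ' ' not in board:
        return 0  # board full, nobody won
    nxt = 'O' if player == 'X' else 'X'
    results = [value(board[:i] + player + board[i + 1:], nxt)
               for i, cell in enumerate(board) if cell == ' ']
    # X maximises the value, O minimises it.
    return max(results) if player == 'X' else min(results)


def best_move(board, player):
    """Index of an optimal move for `player` on `board`."""
    nxt = 'O' if player == 'X' else 'X'
    moves = [i for i, cell in enumerate(board) if cell == ' ']

    def score(i):
        return value(board[:i] + player + board[i + 1:], nxt)

    return max(moves, key=score) if player == 'X' else min(moves, key=score)
```

For tic-tac-toe the whole tree is small enough to search exhaustively; for larger games this is exactly where the heuristics mentioned in Option B would be needed.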


I Arthur Samuel (50’s and 60’s):

I Computer program that learns to play checkers

I Program plays against itself thousands of times, learns which positions are good and which are bad (i.e. which lead to winning and which to losing)

I The computer program eventually becomes much better than the programmer.


Example 2: spam filter

I Programmer writes rules: “If it contains ‘viagra’ then it is spam.” (difficult, not user-adaptive)

I The user marks which mails are spam and which are legit, and the computer learns by itself which words are predictive

I X is the set of all possible emails (strings)

I Y is the set { spam, non-spam }

From: [email protected]

Subject: viagra

cheap meds...

Label: spam

From: [email protected]

Subject: important information

here’s how to ace the test...

Label: non-spam

...

From: [email protected]

Subject: you need to see this

how to win $1,000,000...

Label: ?
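
One simple way to learn which words are predictive is to count how often each word occurs in mails the user has marked as spam and as non-spam, and score a new mail by summing per-word log-odds. The sketch below is a simplified naive-Bayes-style scorer under that assumption; the training mails and the names `train` and `spam_score` are hypothetical.

```python
import math
from collections import Counter


def train(labelled_mails):
    """Count in how many spam / non-spam mails each word appears."""
    counts = {"spam": Counter(), "non-spam": Counter()}
    n_mails = Counter()
    for text, label in labelled_mails:
        counts[label].update(set(text.lower().split()))
        n_mails[label] += 1
    return counts, n_mails


def spam_score(text, counts, n_mails):
    """Sum of per-word log-odds; positive means 'looks more like spam'."""
    score = math.log(n_mails["spam"] / n_mails["non-spam"])
    for word in set(text.lower().split()):
        # Add-one smoothing so that unseen words do not produce log(0).
        p_spam = (counts["spam"][word] + 1) / (n_mails["spam"] + 2)
        p_ham = (counts["non-spam"][word] + 1) / (n_mails["non-spam"] + 2)
        score += math.log(p_spam / p_ham)
    return score


# Hypothetical mails marked by the user.
mails = [
    ("viagra cheap meds", "spam"),
    ("win cheap money now", "spam"),
    ("important information about the test", "non-spam"),
    ("lecture notes for the test", "non-spam"),
]
counts, n_mails = train(mails)
print(spam_score("cheap viagra", counts, n_mails) > 0)        # scored as spam-like
print(spam_score("notes for the test", counts, n_mails) < 0)  # scored as legit-like
```

Because the counts come from the user's own labels, the same code adapts to different users, which is what the hand-written rule cannot do.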


Problem setup

I One definition of machine learning: A computer program improves its performance on a given task with experience (i.e. examples, data).

I So we need to separate

I Task: What is the problem that the program is solving?

I Performance measure: How is the performance of the program (when solving the given task) evaluated?

I Experience: What is the data (examples) that the program is using to improve its performance?
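
The three ingredients can be made concrete with a toy example. In the sketch below the task is to label a number as "small" or "large", the performance measure is accuracy on a fixed test set, and the experience is a list of labelled examples. The nearest-neighbour learner and all the data are assumptions chosen only for illustration.

```python
def nearest_neighbour(train_set, x):
    """Task: predict the label of x from the closest training example."""
    return min(train_set, key=lambda example: abs(example[0] - x))[1]


def accuracy(train_set, test_set):
    """Performance measure: fraction of test examples labelled correctly."""
    hits = sum(nearest_neighbour(train_set, x) == y for x, y in test_set)
    return hits / len(test_set)


# Hypothetical ground truth: numbers from 5 upwards are "large".
test_set = [(x, "large" if x >= 5 else "small") for x in range(10)]

# Experience: first two labelled examples, then one more.
few = [(0, "small"), (6, "large")]
more = few + [(4, "small")]

print(accuracy(few, test_set))   # 0.9
print(accuracy(more, test_set))  # 1.0
```

With only two examples the program mislabels the number 4; one additional example fixes that, so performance on the task improves with experience.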


Related scientific disciplines (1)

I Artificial Intelligence (AI)

I Machine learning can be seen as ‘one approach’ towards implementing ‘intelligent’ machines (or at least machines that behave in a seemingly intelligent way).

I Artificial neural networks, computational neuroscience

I Inspired by and trying to mimic the function of biological brains, in order to make computers that learn from experience. Modern machine learning really grew out of the neural networks boom in the 1980s and early 1990s.

I Pattern recognition

I Recognizing objects and identifying people in controlled or uncontrolled settings, from images, audio, etc. Such tasks typically require machine learning techniques.


Availability of data

I These days it is very easy to

I collect data (sensors are cheap, much information digital)

I store data (hard drives are big and cheap)

I transmit data (essentially free on the internet).

I The result? Everybody is collecting large quantities of data.

I Businesses: shops (market-basket data), search engines (web pages and user queries), financial sector (stocks, bonds, currencies etc), manufacturing (sensors of all kinds), social networking sites (Facebook, Twitter), anybody with a web server (hits, user activity)

I Science: genomes sequenced, gene expression data, experiments in high-energy physics, images of remote galaxies, global ecosystem monitoring data, drug research and development, public health data

I But how to benefit from it? Analysis is becoming key!


Big Data

I one definition: data of a very large size, typically to the extent that its manipulation and management present significant logistical challenges (Oxford English Dictionary)

I 3V: volume, velocity, and variety (Doug Laney, 2001)

I a database may be able to handle a lot of data, but you can’t implement a machine learning algorithm as an SQL query

I on this course we do not consider technical issues relating to extremely large data sets

I basic principles of machine learning still apply, but many algorithms may be difficult to implement efficiently


Related scientific disciplines (2)

I Data mining

I Trying to identify interesting and useful associations and patterns in huge datasets

I Focus on scalable algorithms

I Example: shopping basket analysis

I Statistics

I historically, introductory courses on statistics tend to focus on hypothesis testing and some other basic problems

I however there’s a lot more to statistics than hypothesis testing

I there is a lot of interaction between research in machine learning, data mining and statistics
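
The shopping basket example mentioned under data mining can be sketched as counting which pairs of items are bought together. The function name `pair_support` and the baskets below are hypothetical; real association-rule miners avoid enumerating all pairs, which is where the focus on scalable algorithms comes in.

```python
from collections import Counter
from itertools import combinations


def pair_support(baskets):
    """Fraction of baskets that contain each unordered pair of items."""
    counts = Counter()
    for basket in baskets:
        # Sort so that each pair has one canonical (alphabetical) form.
        counts.update(combinations(sorted(set(basket)), 2))
    return {pair: n / len(baskets) for pair, n in counts.items()}


# Hypothetical purchase data.
baskets = [
    ["bread", "milk"],
    ["bread", "butter", "milk"],
    ["beer", "bread"],
    ["butter", "milk"],
]
support = pair_support(baskets)
print(support[("bread", "milk")])  # 0.5
```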


Example 3

I Prediction of search queries

I The programmer provides a standard dictionary (words and expressions change!)

I Previous search queries are used as examples!
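
A minimal sketch of using previous queries as examples: suggest the most frequent past queries that start with what the user has typed so far. The function name `suggest` and the query log are hypothetical.

```python
from collections import Counter


def suggest(prefix, past_queries, k=3):
    """Return up to k past queries starting with `prefix`, most frequent first."""
    counts = Counter(q for q in past_queries if q.startswith(prefix))
    return [query for query, _ in counts.most_common(k)]


# Hypothetical query log.
past_queries = [
    "machine learning", "machine learning", "machine translation",
    "mac os", "machine learning", "machine translation",
]
print(suggest("machine", past_queries, k=2))
```

Unlike a fixed dictionary, the suggestions change automatically as new words and expressions appear in the log.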


Example 4

I Ranking search results:

I Various criteria for ranking results

I What do users click on after a given search? Search engines can learn what users are looking for by collecting queries and the resulting clicks.
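
As a very rough sketch of learning from clicks, the code below reorders the results for a query by how often each one was clicked after that query. The click log format, the URLs and the name `rank_by_clicks` are assumptions for illustration; a real search engine would combine clicks with many other criteria.

```python
from collections import Counter


def rank_by_clicks(results, click_log, query):
    """Order `results` by the number of recorded clicks for `query`."""
    clicks = Counter(url for q, url in click_log if q == query)
    # sorted() is stable, so results with equal clicks keep their order.
    return sorted(results, key=lambda url: -clicks[url])


# Hypothetical click log of (query, clicked result) pairs.
click_log = [
    ("ml", "b.example"), ("ml", "b.example"),
    ("ml", "c.example"), ("r", "a.example"),
]
results = ["a.example", "b.example", "c.example"]
print(rank_by_clicks(results, click_log, "ml"))
```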


Example 5

I Detecting credit card fraud

I Credit card companies typically end up paying for fraud (stolen cards, stolen card numbers)

I Useful to try to detect fraud, for instance large transactions

I Important to be adaptive to the behaviors of customers, i.e. learn from existing data how users normally behave, and try to detect ‘unusual’ transactions
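
One simple way to make "unusual" adaptive to each customer is to compare a new amount with the mean and standard deviation of that customer's own history. The sketch below assumes this z-score approach; the threshold of 3 standard deviations, the name `is_unusual` and the amounts are all illustrative assumptions.

```python
from statistics import mean, stdev


def is_unusual(amount, history, threshold=3.0):
    """Flag `amount` if it lies more than `threshold` standard deviations
    from the mean of this customer's past transaction amounts.
    `history` must contain at least two amounts."""
    mu, sigma = mean(history), stdev(history)
    if sigma == 0:
        return amount != mu
    return abs(amount - mu) / sigma > threshold


# Hypothetical past transaction amounts for one customer.
history = [20, 25, 22, 30, 18, 27]
print(is_unusual(500, history))  # True
print(is_unusual(26, history))   # False
```

A fixed rule such as "flag everything above 100" would treat all customers alike; here the same amount can be normal for one customer and suspicious for another.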


Example 6

I Self-driving cars:

I Sensors (radars, cameras) superior to humans

I How to make the computer react appropriately to the sensor data?


Example 7

I Character recognition:

I Automatically sorting mail (handwritten characters)

I Digitizing old books and newspapers into easily searchable format (printed characters)


Example 8

I Recommendation systems (‘collaborative filtering’):

I Amazon: “Customers who bought X also bought Y”...

I Netflix: “Based on your movie ratings, you might enjoy...”

Challenge: One million dollars ($1,000,000) prize money recently awarded!

[Figure: partially filled rating matrix on a 1-5 scale, with unknown entries marked ‘?’. Users: Bill, Jack, Linda, John, Lucy. Movies: Fargo, Seven, Leon, Aliens, Avatar.]
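
The idea of filling in a missing rating from the ratings of similar users can be sketched as follows. This is a minimal user-based collaborative filtering example; the similarity measure, the name `predict` and the ratings themselves are hypothetical and are not the values from the slide's figure.

```python
def predict(ratings, user, movie):
    """Predict `user`'s rating of `movie` as a similarity-weighted average
    of the ratings given by other users who have rated it."""
    num = den = 0.0
    for other, theirs in ratings.items():
        if other == user or movie not in theirs:
            continue
        shared = set(ratings[user]) & set(theirs)
        if not shared:
            continue
        # Similarity: 1 / (1 + mean absolute difference on shared movies).
        diff = sum(abs(ratings[user][m] - theirs[m]) for m in shared) / len(shared)
        sim = 1.0 / (1.0 + diff)
        num += sim * theirs[movie]
        den += sim
    return num / den if den else None


# Hypothetical ratings on a 1-5 scale.
ratings = {
    "Bill": {"Fargo": 5, "Seven": 4, "Aliens": 1},
    "Jack": {"Fargo": 5, "Seven": 5, "Aliens": 1, "Avatar": 2},
    "Linda": {"Fargo": 1, "Seven": 2, "Aliens": 5, "Avatar": 5},
}
print(predict(ratings, "Bill", "Avatar"))
```

In this toy data Bill's tastes match Jack's much more closely than Linda's, so the prediction for Avatar lands nearer to Jack's rating of 2 than to Linda's 5.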


Example 9

I Machine translation:

I Traditional approach: Dictionary and explicit grammar

I More recently, statistical machine translation based on example data is increasingly being used


Example 10

I Online store website optimization:

I What items to present, what layout?

I What colors to use?

I Can significantly affect sales volume

I Experiment, and analyze the results! (lots of decisions on how exactly to experiment and how to ensure meaningful results)
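
One common way to analyze such an experiment is to show two versions of a page to different visitors and compare conversion rates with a two-proportion z-test. The sketch below assumes that setup; the function name `two_proportion_z` and the visitor counts are hypothetical.

```python
import math


def two_proportion_z(conv_a, n_a, conv_b, n_b):
    """z statistic and two-sided p-value for the difference between the
    conversion rates of version A and version B."""
    p_a, p_b = conv_a / n_a, conv_b / n_b
    pooled = (conv_a + conv_b) / (n_a + n_b)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    z = (p_b - p_a) / se
    # Two-sided p-value under the standard normal distribution.
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return z, p_value


# Hypothetical experiment: 100 of 1000 visitors buy with layout A,
# 150 of 1000 with layout B.
z, p_value = two_proportion_z(100, 1000, 150, 1000)
print(z, p_value)
```

A small p-value suggests the difference is unlikely to be due to chance alone, but this says nothing about how visitors were assigned to versions or how long the experiment ran, which are among the design decisions the slide alludes to.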


Example 11

I Mining chat and discussion forums

I Breaking news

I Detecting outbreaks of infectious disease

I Tracking consumer sentiment about companies / products


Example 12

I Real-time sales and inventory management

I Picking up quickly on new trends (what’s hot at the moment?)

I Deciding on what to produce or order


Example 13

I Prediction of friends in Facebook, or prediction of who you’d like to follow on Twitter.


What about privacy?

I Users are surprisingly willing to sacrifice privacy to obtain useful services and benefits

I Regardless of what position you take on this issue, it is important to know what can and what cannot be done with various types of information (i.e. what the dangers are)

I ‘Privacy-preserving data mining’

I What type of statistics/data can be released without exposing sensitive personal information? (e.g. government statistics)

I Developing data mining algorithms that limit exposure of user data (e.g. ‘Collaborative filtering with privacy’, Canny 2002)


We’re in this together. Let’s do it!
