
Page 1: Machine Learning Lecture 2 Basics

Machine Learning: Basics
Applied Machine Learning: Unit 1, Lecture 1

Anantharaman Narayana Iyer

narayana dot Anantharaman at gmail dot com

9th Jan 2016

Page 2: Machine Learning Lecture 2 Basics

Types of Learning Algorithms

• Supervised: Given a set of pairs (x, y), where y is a label (or class) and x is an observation, discover a function that assigns the correct label to each x.

• Unsupervised: The data is unlabelled. We need to explore the data to discover the intrinsic structures in it.

• Semi-supervised: Part of the data is labelled while the rest is unlabelled. The labelled data is used to bootstrap. For example, deep learning architectures leverage the vast amount of unlabelled data available over the web and use a small quantity of labelled data for fine-tuning.

• Reinforcement: Reinforcement learning (RL) is learning by interacting with an environment. An RL agent learns from the consequences of its actions rather than from being explicitly taught, and it selects its actions on the basis of its past experiences (exploitation) and also by new choices (exploration); this is essentially trial-and-error learning.

Page 3: Machine Learning Lecture 2 Basics

Supervised Learning

[Figure: a 2-dimensional feature space (x1, x2) with points of Class = True and Class = False, and five candidate linear decision boundaries L1–L5]

Key Concepts

• Supervised learning is a technique where the classifier is trained using training examples.
• The training examples contain the input attributes (features) and the expected outputs. In the figure, x1 and x2 are the features.
• The input is typically an n-dimensional vector, and the output may have one or more dimensions.
• A binary classifier classifies the input vector into one of two classes, illustrated by the red and purple boxes in the figure.
• A linearly separable system is one where the class labels can be separated by a linear decision boundary. The straight lines L1, L2, L3, L4, L5 show different decision boundaries in the figure.
• The example in the figure is a 2-dimensional linearly separable system. It can be generalized to an n-dimensional system; the decision surface is then called a hyperplane.
• Each decision surface can be considered to be a hypothesis.

Page 4: Machine Learning Lecture 2 Basics

Unsupervised Learning

[Figure: points in a 2-dimensional feature space (x1, x2) grouped into Cluster = 1 and Cluster = 2]

Key Concepts

• Unsupervised techniques do not require the expected outputs to be specified in the dataset.
• This is an advantage, as labelled data is scarce relative to the vast amount of data available on the web and in other media.
• Clustering is one of the machine learning algorithms that belongs to the category of unsupervised learning (a minimal sketch follows below).
• In the figure, the system finds inputs that can be logically grouped together as a cluster. The example shows two such clusters.
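The slide does not name a specific clustering algorithm; as an illustration, here is a minimal sketch of k-means, one common choice, assuming NumPy. The function name and parameters are hypothetical, not from the lecture.

```python
# A minimal k-means sketch: alternate between assigning points to their
# nearest center and moving each center to the mean of its assigned points.
import numpy as np

def kmeans(X, k=2, iters=50, seed=0):
    """Cluster the rows of X into k groups; returns (labels, centers)."""
    rng = np.random.default_rng(seed)
    centers = X[rng.choice(len(X), size=k, replace=False)]  # init from data points
    for _ in range(iters):
        # assign each point to its nearest center (squared Euclidean distance)
        labels = np.argmin(((X[:, None, :] - centers[None, :, :]) ** 2).sum(-1), axis=1)
        # move each center to the mean of its assigned points
        for j in range(k):
            if np.any(labels == j):
                centers[j] = X[labels == j].mean(axis=0)
    return labels, centers

# Two obvious blobs -> two clusters, as in the figure
X = np.array([[1.0, 1.0], [1.2, 0.9], [5.0, 5.0], [5.1, 4.8]])
print(kmeans(X)[0])   # e.g. [0 0 1 1]
```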

Page 5: Machine Learning Lecture 2 Basics

Classification and Regression Problems

• The term regression refers to a system with a continuous variable as the output.
• Classification is a process by which the machine learning system partitions the input space into a discrete set of classes.
• Examples:
  • Credit card approval (approve / not-approve decisions): classification
  • Credit line limit: regression
  • Home loan approval: classification
  • Sentiment polarity (positive, negative, neutral): classification
  • Sentiment as a real number, -1 <= sentiment <= 1: regression

Page 6: Machine Learning Lecture 2 Basics

Notations

• m = Number of training examples

• n = Number of features in the input example

• x’s = “input” variable / features

• y’s = “output” variable / “target” variable

• The unknown target function f maps the input space to the outputs as:

f: X -> Y

Page 7: Machine Learning Lecture 2 Basics

Problem Statement: ML Classifier

• Given a finite set of training examples and the space of all applicable hypotheses, select a hypothesis that best approximates the unknown target function.
• The unknown target function f is the ideal function that characterizes the underlying pattern that generated the data.
• Training examples are provided to the ML designer.
• The output of this process is a hypothesis g that approximates f.
• The hypothesis set and the learning algorithm together constitute the solution.

Fig from: Yasser Mustafa, Caltech

Page 8: Machine Learning Lecture 2 Basics

Let’s begin: Perceptron Learning

• National cricket team selectors choose the members of the team and thus play a key role in the team's performance.

• Suppose we want to build a system that acts as a "virtual selector" by selecting (or rejecting) a player, given data on his past performances.

• Let us consider a selector who looks at only 2 input variables: Batting Average and Bowling Average.

• Here the features are: x1 = Batting Average, x2 = Bowling Average.

• Let us consider the Perceptron Learning Algorithm (PLA) for this purpose.

Page 9: Machine Learning Lecture 2 Basics

Example data

PLAYER              BATTING AVERAGE   BOWLING AVERAGE   SELECTED
Shikhar Dhawan      45.46             -1                Yes
Rohit Sharma        37.89             60.37             Yes
Ajinkya Rahane      29.28             -1                Yes
Virat Kohli         52.61             145.5             Yes
Suresh Raina        35.82             48                Yes
Ambati Rayudu       60                53                Yes
Kedar Jadhav        20                -1                No
Manoj Tiwary        31.62             28.8              No
Manish Pandey       -1                -1                No
Murali Vijay        19.46             -1                No
MS Dhoni            52.85             31                Yes
Wriddhiman Saha     13.66             -1                No
Robin Uthappa       26.96             -1                No
Sanju Samson        -1                -1                No
Ravindra Jadeja     34.51             32.29             Yes
Akshar Patel        20                20.28             Yes
Stuart Binny        13.33             13                Yes
Parvez Rasool       -1                30                Yes
R Ashwin            16.91             32.46             Yes
Karn Sharma         -1                -1                No
Amit Mishra         4.8               23.95             No
Kuldeep Yadav       -1                -1                No
Ishant Sharma       5.14              31.25             Yes
Bhuvneshwar Kumar   10.4              36.59             Yes
Mohammed Shami      9.12              26.08             Yes
Umesh Yadav         14.66             35.93             Yes
Varun Aaron         8                 38.09             No
Dhawal Kulkarni     -1                23                No
Mohit Sharma        -1                58                No
Ashok Dinda         4.2               51                No

(A value of -1 appears to indicate that no data is available for that average.)

Page 10: Machine Learning Lecture 2 Basics

[Scatter plot: "Visualization of team performance" — each player plotted as (batting average, bowling average); x axis 0–70, y axis 0–120. Missing values (-1) are plotted as 0 on the batting axis and 100 on the bowling axis.]

Page 11: Machine Learning Lecture 2 Basics

PLA Model

x = (x1, x2), where x1 and x2 are the features of a given data sample.

Select the player if $\sum_{i=1}^{d} w_i x_i > \text{threshold}$, else reject.

The above can be written as:

$h(x) = \operatorname{sign}\big(\big(\textstyle\sum_{i=1}^{d} w_i x_i\big) - \text{threshold}\big)$

Setting $w_0 = -\text{threshold}$:

$h(x) = \operatorname{sign}\big(\big(\textstyle\sum_{i=1}^{d} w_i x_i\big) + w_0\big)$

Let us introduce an artificial input $x_0 = 1$, so that the threshold is absorbed into the sum:

$h(x) = \operatorname{sign}\big(\textstyle\sum_{i=0}^{d} w_i x_i\big)$, where i now takes values from 0 to d.

In vector form: $h(x) = \operatorname{sign}(w^T x)$

Fig from: Yasser Mustafa, Caltech
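To make the vector form concrete, here is a minimal sketch of the hypothesis, assuming NumPy; the example weights are hypothetical, chosen only for illustration.

```python
# Perceptron hypothesis h(x) = sign(w^T x), using the artificial input x0 = 1.
import numpy as np

def h(w, x):
    """Prepend x0 = 1, then return the sign of the weighted sum."""
    x = np.concatenate(([1.0], x))        # x0 = 1 absorbs the threshold as w0
    return 1 if np.dot(w, x) > 0 else -1  # sign(w^T x); ties treated as reject

# Hypothetical weights: w0 = -threshold, w1 and w2 weight the two averages
w = np.array([-30.0, 1.0, 0.5])
print(h(w, np.array([45.46, 60.0])))      # -> 1 (select)
```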

Page 12: Machine Learning Lecture 2 Basics

PLA Training

• The perceptron implements: $h(x) = \operatorname{sign}(w^T x)$
• The goal of training is to determine the model parameters $w_i$, given the training data $(x_1, y_1), (x_2, y_2), \ldots, (x_n, y_n)$.
  • Note: usually each x is a vector, and y can be a real number or a vector by itself.
• Training algorithm (a minimal sketch follows below):
  • Initialize w to small random numbers.
  • Iterate t = 1, 2, …
    • Pick a misclassified point: $h(x_n) \neq y_n$
    • Update the weight vector: $w \leftarrow w + y_n x_n$
• It can be shown that for linearly separable data the algorithm converges in a finite number of iterations.
• A learning rate α can be used to scale the increments to the weight vector.

Fig from: Yasser Mustafa, Caltech
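Below is a minimal PLA training sketch, assuming NumPy; the function and variable names (pla_train, X, y, max_iters) are illustrative rather than from the slides.

```python
# Perceptron Learning Algorithm: repeatedly pick a misclassified point and
# update w <- w + alpha * y_n * x_n until no point is misclassified.
import numpy as np

def pla_train(X, y, max_iters=1000, alpha=1.0, seed=0):
    """X: (m, n) inputs; y: (m,) labels in {-1, +1}. Returns (w, iterations)."""
    rng = np.random.default_rng(seed)
    Xb = np.hstack([np.ones((X.shape[0], 1)), X])  # prepend the artificial x0 = 1
    w = rng.normal(scale=0.01, size=Xb.shape[1])   # small random initial weights
    for t in range(max_iters):
        preds = np.sign(Xb @ w)
        misclassified = np.flatnonzero(preds != y)
        if misclassified.size == 0:                # converged: all points correct
            return w, t
        i = rng.choice(misclassified)              # pick a misclassified point
        w = w + alpha * y[i] * Xb[i]               # update rule from the slide
    return w, max_iters                            # may not converge if data is not separable

# Toy linearly separable example
X = np.array([[2.0, 3.0], [1.0, 1.0], [-1.0, -2.0], [-2.0, -1.0]])
y = np.array([1, 1, -1, -1])
w, iters = pla_train(X, y)
print(w, iters)
```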

Page 13: Machine Learning Lecture 2 Basics

Representational Power of Perceptrons

• The equation for the decision hyperplane is w · x = 0.

• The space of candidate hypotheses is H = {w | w ∈ ℝ^(n+1)}.

• A perceptron represents a hyperplane decision surface in the n-dimensional space of data instances, where the hyperplane separates positive examples from negative ones.

• Not all sets of points in the input space can be separated by such a hyperplane. The ones that can be separated by the perceptron are called linearly separable.

• Perceptrons can be used to represent many Boolean functions. E.g., taking logical 0 as input 0 and logical 1 as input 1 (with output +1 meaning true), a 2-input AND function can be represented by setting the weights w0 = -1.5, w1 = w2 = 1. We can design OR logic similarly by setting w0 = -0.3 (a verification sketch follows below).

• Functions like XOR are not linearly separable and so can't be represented by a single perceptron.

• The ability of perceptrons to represent AND, OR, NAND, and NOR is important because complex Boolean functions can be built by combining these.

[Figure: two perceptrons with inputs x0, x1, x2 — an AND gate with w0 = -1.5, w1 = w2 = 1, and an OR gate with w0 = -0.5, w1 = w2 = 1]
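As a quick check of the weights above, here is a hedged sketch, assuming NumPy and the 0/1 input convention stated in the text (the OR bias uses -0.3 as in the text; the figure's -0.5 works equally well).

```python
# Verify that the stated weights implement AND and OR over {0, 1} inputs.
import numpy as np

def perceptron(w, x):
    return 1 if np.dot(w, np.concatenate(([1.0], x))) > 0 else -1

w_and = np.array([-1.5, 1.0, 1.0])   # w0 = -1.5 from the slide
w_or  = np.array([-0.3, 1.0, 1.0])   # w0 = -0.3 from the slide

for x1 in (0.0, 1.0):
    for x2 in (0.0, 1.0):
        x = np.array([x1, x2])
        print(x1, x2, perceptron(w_and, x), perceptron(w_or, x))
# AND outputs +1 only for (1, 1); OR outputs +1 for everything except (0, 0)
```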

Page 14: Machine Learning Lecture 2 Basics

Exercise

• Design a perceptron that can represent:
  • NAND

• NOR

• NOT
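One possible solution sketch (the weight choices are not unique), assuming the same convention as above: inputs in {0, 1}, output +1 meaning true, and the artificial input x0 = 1.

```python
# NAND and NOR are the negations of AND and OR, so negating every weight works.
import numpy as np

def perceptron(w, x):
    return 1 if np.dot(w, np.concatenate(([1.0], x))) > 0 else -1

w_nand = np.array([1.5, -1.0, -1.0])   # -1 * (AND weights)
w_nor  = np.array([0.3, -1.0, -1.0])   # -1 * (OR weights)
w_not  = np.array([0.5, -1.0])         # single input: +1 for x = 0, -1 for x = 1

print(perceptron(w_nand, np.array([1.0, 1.0])))   # -> -1 (false)
print(perceptron(w_nor,  np.array([0.0, 0.0])))   # -> +1 (true)
print(perceptron(w_not,  np.array([1.0])))        # -> -1 (false)
```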

Page 15: Machine Learning Lecture 2 Basics

Exercise

• Implement the Perceptron Learning Algorithm to learn the given training dataset (cricket player data). Test the classifier using the test data provided and report the accuracy, computed as the percentage of correct classifications (a starting harness follows below).
  • Set maximum iterations to 1000, 10000, 100000.
  • Does this converge? If so, after how many iterations?
  • How many misclassified points do you get?
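A possible starting harness for this exercise, reusing pla_train from the PLA Training sketch above; the abbreviated dataset and the handling of the -1 (missing) entries here are illustrative assumptions, not prescribed by the slides.

```python
# Run PLA on the cricket data at increasing iteration caps and report errors.
import numpy as np

data = [
    ("Shikhar Dhawan", 45.46, -1, 1),
    ("Rohit Sharma", 37.89, 60.37, 1),
    ("Kedar Jadhav", 20.0, -1, -1),
    ("Ashok Dinda", 4.2, 51.0, -1),
    # ... remaining rows from the table on Page 9; Yes -> +1, No -> -1
]
X = np.array([[bat, bowl] for _, bat, bowl, _ in data])
y = np.array([label for *_, label in data])

for max_iters in (1000, 10000, 100000):
    w, iters = pla_train(X, y, max_iters=max_iters)
    Xb = np.hstack([np.ones((len(X), 1)), X])
    errors = int(np.sum(np.sign(Xb @ w) != y))
    print(max_iters, iters, errors)
# If the data is not linearly separable, PLA never converges and some
# points remain misclassified at every iteration cap.
```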