Introduction to ML
Abhijit Mishra
Research Scholar
Center for Indian Language Technology
Department of Computer Science and Engineering
Indian Institute of Technology Bombay
Email: abhijitmishra@cse.iitb.ac.in
URL: http://www.cse.iitb.ac.in/~abhijitmishra
Task: Get mangoes of a particular type from the market

Task 1: Solve an equation
Task 2: Get mangoes of a particular type from the market

Randomness: slight variation in shape, size, color, odor, etc.
Ambiguity: similarity in size and color, but the mangoes belong to different categories.
Nuances: differences in size and color, but the mangoes belong to the same category.
How to make machines understand these?
Introduction to ML - Roadmap
• Definition of Machine Learning
• Learning to predict
  • Classification
  • Regression
• Learning Paradigms
  • Rule based
  • Statistical
  • Example based
• Statistical Machine Learning
  • Supervised
  • Semi-supervised
  • Unsupervised
  • Reinforcement
• Supervised approaches
  • Probabilistic approaches
  • Non-probabilistic approaches
• Example - Text Classification
• Books, Online Courses and Tools
Definition of Machine Learning
• Machine learning [1] is a type of artificial intelligence (AI) that gives computers the ability to learn without being explicitly programmed.
• It explores the study and construction of algorithms that can learn from and make predictions on data.
• Applications:
  • Pattern recognition (e.g., handwriting recognition, face detection, gesture detection)
  • Prediction of events (e.g., stock market prediction, weather forecasting, prediction of diseases based on symptoms)
• Almost all popular online services (e.g., Google, Facebook, Amazon) use ML.
[1] https://en.wikipedia.org/wiki/Machine_learning
Learning to Predict - Classification
• Classification is the problem of predicting to which of a set of categories (sub-populations) a new observation belongs.
• Input: x, the properties of the new observation
• Output: y ∈ {1, …, N}, the class of the new observation
• When N = 2, the problem is called a "binary classification problem" (e.g., classifying emails into spam or non-spam categories).
• When N > 2, the problem is called an N-class/multi-class classification problem (e.g., classifying documents into multiple categories like sports, health, politics, etc.).
Learning to Predict - Regression
• When the output space of a predictor is a real number (instead of nominal categories as in classification), the prediction problem is referred to as statistical regression, or simply regression.
• Input: x, the properties of the new observation
• Output: y, where y ∈ ℝ
• Examples: predicting the temperature of a day given the climatic conditions of the previous day; estimating the number of units of a new product to be sold in a year.
Note: Structured Prediction
• Deals with more complex output (instead of the scalar output of classification and regression).
• Output: a structured object y (e.g., a sequence, tree, or graph) rather than a single number.
• Examples: automatic text translation (output is a sentence in another language), parse tree generation (output is a tree structure), image captioning.
We will only focus on classification problems.
Learning Objective
• Back to mangoes.
Task: Given some basic measurable properties of a certain mango, predict which category it belongs to.
Input: color, weight, smell, dimensions, taste, … (measurable properties / attributes / features)
Output: Alphonso / Alice / Irwin (classes)
Learning Objectives
• What to learn?
  • Correspondences between the various attributes of the input object and the classes
• How to learn?
  • Rule based learning
  • Statistical learning
  • Example based learning
Learning Paradigms – Rule Based
• Learning is based on a set of rules handcrafted by humans.
• The collection of rules, or the "rule base", has to be exhaustive enough to capture all the corner cases.
• Problems: extremely hard, needs domain expertise, and is highly time-consuming.

  if (weight < 0.5 && (color == "yellow" || color == "green")) {
      category = "Alphonso";
  } else if (…) {
      category = "Alice";
  }
Learning Paradigms – Example Based
• A very small set of examples having complete information (both inputs and classes) is available.
• Templates for each class are learned automatically.
• When a new observation arrives, the class prediction is made based on the template that fits the observation best.
• Problems:
  • Templates are generic representatives of classes that are supposed to represent the whole sub-population belonging to a class. For many problems, it is quite hard to come up with such representatives from a small number of examples.
  • Susceptible to changes in the nature of the input data.
Learning Paradigms – Statistical
• Beneficial if a large set of diversified examples is available.
• Feature-class correspondences are learned better.
• Easy to update the classifier if the nature of the input data changes.
• Leverages the huge volume of available web data.
• Problems: over-learning can sometimes happen (referred to as overfitting), and feature selection affects system accuracy.
Statistical Machine Learning - Supervised Approaches
• Learning is based on a set of observations for which class labels are available.
(Figure: labelled mango examples of Alice, Irwin, and Alphonso feed into a learned model, which then predicts a class, e.g., Alphonso, for a new observation.)
Statistical Machine Learning - Semi-Supervised Approaches
• Learning is based on a set of observations for which class labels are available AND another set (typically of larger volume than the labelled set) of observations for which class labels are not available.
(Figure: labelled examples of Alice, Irwin, and Alphonso, plus unlabelled examples, feed into a learned model, which then predicts a class, e.g., Alphonso.)
Statistical Machine Learning - Unsupervised Approaches
• Learning when no class labels are available.
Statistical Machine Learning- Reinforcement Learning• Learning happens with the objective of
maximizing the reward associated with the task.
• Reinforcement learning differs from standard supervised learning in that correct input/output pairs are never presented. Association is captured in terms of rewards.
Supervised Approaches
• Recap:
Input: color, weight, smell, dimensions, taste, … (measurable properties / attributes / features)
Output: Alphonso / Alice / Irwin (classes)
Supervised Approaches – Probabilistic Models
• Given a set of features f1, f2, …, fn, the classification decision of probabilistic models can be expressed as

  ĉ = argmax_{c ∈ C} P(c | f1, f2, …, fn)

where C is the set of classes.
Supervised Approaches – Naïve Bayes
• By Bayes' rule,

  P(c | f1, …, fn) = P(f1, …, fn | c) · P(c) / P(f1, …, fn)

  (Posterior = Likelihood × Prior / Evidence)

• The prior P(c) can be assumed to be a multinomial distribution for classification problems.
Supervised Approaches – Naïve Bayes (1)
• Now, if we assume that the features are independent of each other,

  P(f1, …, fn | c) = ∏_{i=1}^{n} P(fi | c)

• Note: the independence assumption may not hold true for many real-life problems.
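As a small illustration (not part of the original slides), the Naïve Bayes decision rule under the independence assumption can be sketched in plain Python. The priors and per-feature likelihoods below are made-up numbers for two hypothetical mango classes:

```python
import math

# Illustrative priors P(c) and likelihoods P(f|c); the numbers are invented.
priors = {"Alphonso": 0.6, "Alice": 0.4}
likelihoods = {
    "Alphonso": {"yellow": 0.8, "small": 0.3},
    "Alice":    {"yellow": 0.4, "small": 0.7},
}

def classify(features):
    # argmax over classes c of log P(c) + sum_i log P(f_i | c)
    scores = {}
    for c in priors:
        score = math.log(priors[c])
        for f in features:
            score += math.log(likelihoods[c][f])
        scores[c] = score
    return max(scores, key=scores.get)

print(classify(["yellow", "small"]))  # Alphonso: 0.6*0.8*0.3 > 0.4*0.4*0.7
```

Working in log-space, as above, avoids numerical underflow when the number of features grows.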
Supervised Approaches – Logistic Regression
• Remember: ĉ = argmax_{c} P(c | f1, …, fn)
• In logistic regression, P(c | f1, …, fn) is directly estimated as

  P(c | f1, …, fn) = 1 / (1 + e^(−u))

where u follows a regular weighted linear equation:

  u = w0 + w1·f1 + w2·f2 + … + wn·fn

The coefficients (w0, w1, …, wn) have to be learned during training.
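As a quick sketch (the weights here are illustrative, not learned), the logistic regression decision function above can be written as:

```python
import math

def sigmoid(u):
    # 1 / (1 + e^(-u)): maps the linear score u to a probability
    return 1.0 / (1.0 + math.exp(-u))

def p_positive(features, weights, bias):
    # u = w0 + sum_i w_i * f_i  (the weighted linear combination)
    u = bias + sum(w * f for w, f in zip(weights, features))
    return sigmoid(u)

# u = -0.5 + 2.0*1.0 + (-1.0)*0.5 = 1.0, so p = sigmoid(1.0) ≈ 0.731
p = p_positive([1.0, 0.5], weights=[2.0, -1.0], bias=-0.5)
```

In real use the weights and bias would come from training, e.g., by maximizing the likelihood of the labelled data.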
Supervised Approaches: Non-Probabilistic Models
(Figure: points (x1, x2) from Class-1 and Class-2 plotted in feature space; the classifier separates the classes geometrically rather than probabilistically.)
Supervised Approaches: K-Nearest Neighbor
(Figure: a query point surrounded by Class-1 and Class-2 training points.)
• The K closest neighbors are decided based on a pre-defined distance measure. The class to which the maximum number of close neighbors belongs becomes the winner class.
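A minimal K-NN sketch in Python (the training points are made up for illustration): the winner class is the majority class among the k closest training points under Euclidean distance.

```python
import math
from collections import Counter

# Illustrative labelled training points: (features, class)
train = [((1.0, 1.0), "Class-1"), ((1.2, 0.8), "Class-1"),
         ((4.0, 4.0), "Class-2"), ((4.2, 3.9), "Class-2"),
         ((3.8, 4.1), "Class-2")]

def knn_predict(x, k=3):
    # Sort training points by Euclidean distance to the query point x
    nearest = sorted(train, key=lambda item: math.dist(x, item[0]))[:k]
    # Majority vote among the k nearest neighbors
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]

print(knn_predict((1.1, 0.9)))  # two of the three nearest are Class-1
```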
Distance/Similarity Measures
• Euclidean distance (between vectors X1 and X2):

  d(X1, X2) = sqrt( Σ_i (X1i − X2i)² )

which is the special case p = 2 of the Minkowski distance:

  d(X1, X2) = ( Σ_i |X1i − X2i|^p )^(1/p)

• Cosine distance:

  d(X1, X2) = 1 − (X1 · X2) / (‖X1‖ ‖X2‖)
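The three measures above can be implemented directly, without external libraries:

```python
import math

def euclidean(x1, x2):
    # sqrt of the sum of squared coordinate differences
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(x1, x2)))

def minkowski(x1, x2, p):
    # Euclidean distance is the special case p = 2
    return sum(abs(a - b) ** p for a, b in zip(x1, x2)) ** (1.0 / p)

def cosine_distance(x1, x2):
    # 1 minus the cosine of the angle between the two vectors
    dot = sum(a * b for a, b in zip(x1, x2))
    norm1 = math.sqrt(sum(a * a for a in x1))
    norm2 = math.sqrt(sum(b * b for b in x2))
    return 1.0 - dot / (norm1 * norm2)

print(euclidean([0, 0], [3, 4]))  # 5.0
```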
Supervised Approaches: Support Vector Machines
(Figure: Class-1 and Class-2 points separated by the hyperplane w·x − b = 0.)

  f(x, w, b) = sign(w·x − b)
SVMs: Specifying the Boundary
(Figure: the classifier boundary w·x − b = 0, with the plus-plane w·x − b = +1 bounding the "predict class = +1" zone and the minus-plane w·x − b = −1 bounding the "predict class = −1" zone.)

  M = margin width = 2 / ‖w‖

Given a guess of w and b, we can:
• compute whether all data points are in the correct half-planes;
• compute the width of the margin.
So now we just need to write a program to search the space of w's and b's to find the widest margin that matches all the data points. This is primarily done through quadratic programming.
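To make the geometry concrete, here is a small sketch (with an illustrative, not learned, weight vector) of the margin width M = 2/‖w‖ and the sign(w·x − b) decision rule:

```python
import math

# Illustrative separating hyperplane w.x - b = 0 with w = (3, 4), b = 0
w = (3.0, 4.0)
b = 0.0

# Margin width between the planes w.x - b = +1 and w.x - b = -1
norm_w = math.sqrt(sum(wi * wi for wi in w))   # ||w|| = 5
margin = 2.0 / norm_w                          # 0.4

def predict(x, w, b):
    # sign(w.x - b) decision rule
    u = sum(wi * xi for wi, xi in zip(w, x)) - b
    return 1 if u >= 0 else -1
```

A larger ‖w‖ means a narrower margin, which is why SVM training minimizes ‖w‖ subject to all points being classified correctly.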
Supervised Approaches – Decision Tree
(Figure: a decision tree splitting on categorical attributes, e.g., Color (Yellow / Green) and Size (Small / Big, Medium), and a continuous attribute with a threshold test such as "< 80"; each leaf gives a predicted class.)
There could be more than one tree that fits the same data!
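As a small sketch of the idea with scikit-learn (the mango-like data and feature encoding below are made up for illustration):

```python
from sklearn.tree import DecisionTreeClassifier

# features: [weight_kg, color_code]  (0 = green, 1 = yellow); toy data
X = [[0.30, 1], [0.35, 1], [0.60, 0], [0.55, 0]]
y = ["Alphonso", "Alphonso", "Alice", "Alice"]

# Fit a tree; random_state fixed so tie-breaking between equally good
# splits (weight or color both separate this data) is reproducible
tree = DecisionTreeClassifier(random_state=0).fit(X, y)
print(tree.predict([[0.32, 1]])[0])
```

Note that either attribute alone separates this toy data perfectly, which illustrates the slide's point that more than one tree can fit the same data.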
Supervised Approaches - Note
• It is important to decide on a set of features that adequately explains the data.
• Selecting an extremely small number of features may under-specify the data and may not help the classifier learn properly.
• As the number of features increases, the model complexity increases (i.e., there are more parameters to be learned, and the chance of overfitting increases).
• Very high-dimensional feature vectors are unintuitive to analyze, and make it hard to design distance functions and perform combinatorics and optimization. This is known as the "Curse of Dimensionality".
Example – Text Classification
• Text classification is an important problem in the fields of Natural Language Processing and Machine Learning.
• Objective: assign a class label to a given text.
• Example:
  1: Obama won the election → Politics
  2: Brazil lost the football match → Sports
Problems in Text Classification
• Lexical problems:
  • Presence of ambiguous words, e.g., cricket (game) vs. cricket (insect)
• Structural problems:
  • Complexity at the syntactic level, e.g., "Mohd. Kaif, who was the hero of the NatWest final match against England in 2002, has joined BJP and will be running for an MP position." (Politics)
• Semantic problems:
  • Complexity at the semantic level, e.g., "With the humiliating defeat in Bihar, INC's innings seems to be over."
• Pragmatic problems, e.g.,
  "India lost to Zimbabwe yesterday." (Sports)
  "Bernie lost to Clinton in New York." (Politics)
Text Classification – Method
(Pipeline: some documents → annotation → training data (features + labels) → MODEL (Naïve Bayes, SVM, Decision Tree, etc.); for any unseen document, compute features → model → prediction.)
Text Classification – Feature Extraction
• Example:
  • Training sample (domain classification):
    1: Obama won the election → Politics
    2: Brazil lost the football match → Sports
  • Features:
    • Vocabulary: <Obama, won, the, election, Brazil, lost, football, match>
    • Bag-of-words features based on presence/absence:
      1: <1,1,1,1,0,0,0,0> → 0 (Politics)
      2: <0,0,1,0,1,1,1,1> → 1 (Sports)
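The feature extraction step above can be sketched in a few lines of Python (the lowercasing and whitespace tokenization are simplifying assumptions):

```python
# Training sentences from the example, with their labels
train_texts = [("Obama won the election", "Politics"),
               ("Brazil lost the football match", "Sports")]

# Vocabulary: all distinct words, in order of first appearance
vocab = []
for text, _ in train_texts:
    for word in text.lower().split():
        if word not in vocab:
            vocab.append(word)

def bow_vector(text):
    # 1 if the vocabulary word is present in the text, else 0
    words = set(text.lower().split())
    return [1 if w in words else 0 for w in vocab]

print(bow_vector("Obama won the election"))  # [1, 1, 1, 1, 0, 0, 0, 0]
```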
Text Classification – Training and Testing
• Training:
  • The weight of each feature towards a label is computed by the training algorithm. The weight decides predictability.
• Testing:
  • Based on the features present in the test data, the combined weight is computed and a label is decided.
• Problem: a feature may not have been seen in the training data (the data sparsity problem).
• Solution: instead of taking bag-of-words features, consider bags of senses, word embeddings, etc.
Text Classification – Evaluation Metrics
• The performance of classifiers is typically measured by accuracy, precision, recall, and F-measure.
• For a binary classification problem, if the class labels are positive and negative:
  • True Positives (TP): number of test documents that are actually positive and are predicted positive.
  • True Negatives (TN): number of test documents that are actually negative and are predicted negative.
  • False Positives (FP): number of test documents that are actually negative but are predicted positive.
  • False Negatives (FN): number of test documents that are actually positive but are predicted negative.
Text Classification – Evaluation Metrics (1)

  Accuracy  = (TP + TN) / (TP + TN + FP + FN)
  Precision = TP / (TP + FP)
  Recall    = TP / (TP + FN)
  F-measure = 2 · Precision · Recall / (Precision + Recall)
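These four standard metrics can be computed directly from the confusion-matrix counts (the counts passed in below are illustrative):

```python
def metrics(tp, tn, fp, fn):
    # Standard definitions from the four confusion-matrix counts
    accuracy = (tp + tn) / (tp + tn + fp + fn)
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    f1 = 2 * precision * recall / (precision + recall)
    return accuracy, precision, recall, f1

# e.g., 50 TP, 40 TN, 10 FP, 0 FN out of 100 test documents
acc, prec, rec, f1 = metrics(50, 40, 10, 0)
print(acc)  # 0.9
```

Precision penalizes false positives, recall penalizes false negatives, and the F-measure (their harmonic mean) balances the two.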
Text Classification - DEMO
• Package: Scikit-learn (install numpy, scipy, matplotlib and scikit-learn packages)
• Demo:• Naïve Bayes• SVM• KNN• Decision Tree
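A minimal sketch of what such a scikit-learn demo might look like for the Naïve Bayes case (the two training sentences are from the earlier slides; the pipeline choice and the test sentence are illustrative):

```python
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import make_pipeline

texts = ["Obama won the election", "Brazil lost the football match"]
labels = ["Politics", "Sports"]

# Bag-of-words features + multinomial Naive Bayes, chained in one pipeline
model = make_pipeline(CountVectorizer(), MultinomialNB())
model.fit(texts, labels)
print(model.predict(["India lost the match"])[0])
```

Swapping `MultinomialNB()` for `sklearn.svm.SVC()`, `sklearn.neighbors.KNeighborsClassifier()`, or `sklearn.tree.DecisionTreeClassifier()` yields the other three classifiers listed above with the same pipeline.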
Books and Online Courses
• Books
  • Machine Learning by Tom Mitchell
  • Pattern Recognition and Machine Learning by Christopher M. Bishop
  • Foundations of Machine Learning by Mehryar Mohri, Afshin Rostamizadeh, Ameet Talwalkar
  • Machine Learning: a Probabilistic Perspective by Kevin Murphy
  • Bayesian Reasoning and Machine Learning by David Barber
  • Probabilistic Graphical Models: Principles and Techniques by Daphne Koller, Nir Friedman
• Courses
  • Machine Learning – Stanford University (Coursera) – Andrew Ng
  • Mining Massive Datasets – Stanford Online
Tools
• Java
  • Weka (for supervised/semi-supervised): www.cs.waikato.ac.nz/ml/weka/
  • Mallet (for unsupervised): www.mallet.cs.umass.edu
• Python
  • Scikit-Learn: http://scikit-learn.org/
  • Statsmodels: www.statsmodels.sourceforge.net
• R statistical packages: https://cran.r-project.org/web/packages/
Thank you
Questions?
References
• C. J. C. Burges. A tutorial on support vector machines for pattern recognition. Data Mining and Knowledge Discovery, 2(2), 1998. http://citeseer.nj.nec.com/burges98tutorial.html
• Vladimir Vapnik. Statistical Learning Theory. Wiley-Interscience, 1998.
• Christopher M. Bishop. Pattern Recognition and Machine Learning. Springer, 2006.
Image URLs
• depositphotos.com
• vizagcityonline.com
• en.wikipedia.org/wiki/List_of_mango_cultivars
• tropicalfloridagardens.com
• alphonsomango.net
• alamy.com