
1 Comparison of Machine Learning Algorithms [Jayant, 20 points]

    In this problem, you will review the important aspects of the algorithms we have learned about in class. For every algorithm listed in the two tables on the next pages, fill out the entries under each column according to the following guidelines. Turn in your completed table with your problem set. [1/2 point per entry]

    Guidelines:

    1. Generative or Discriminative: Choose either generative or discriminative; you may write G and D respectively to save some writing.

    2. Loss Function: Write either the name or the form of the loss function optimized by the algorithm (e.g., exponential loss).

    3. Decision Boundary / Regression Function Shape: Describe the shape of the decision surface or regression function, e.g., linear. If necessary, enumerate conditions under which the decision boundary has different forms.

    4. Parameter Estimation Algorithm / Prediction Algorithm: Name or concisely describe an algorithm for estimating the parameters or predicting the value of a new instance. Your answer should fit in the provided box.

    5. Model Complexity Reduction: Name a technique for limiting model complexity and preventing overfitting.

    Solution: Completed tables are on the following pages.


    Table 1: Comparison of Classification Algorithms

    Gaussian Naive Bayes
      - Generative or Discriminative: Generative
      - Loss Function: log P(X, Y)
      - Decision Boundary: equal variances give a linear boundary; unequal variances give a quadratic boundary
      - Parameter Estimation Algorithm: estimate μ, σ^2, and P(Y) using maximum likelihood
      - Model Complexity Reduction: place a prior on the parameters and use the MAP estimator

    Logistic Regression
      - Generative or Discriminative: Discriminative
      - Loss Function: log P(Y | X)
      - Decision Boundary: linear
      - Parameter Estimation Algorithm: no closed-form estimate; optimize the objective function using gradient descent (a code sketch follows the table)
      - Model Complexity Reduction: L2 regularization

    Decision Trees
      - Generative or Discriminative: Discriminative
      - Loss Function: either log P(Y | X) or zero-one loss
      - Decision Boundary: axis-aligned partition of the feature space
      - Parameter Estimation Algorithm: many algorithms: ID3, CART, C4.5
      - Model Complexity Reduction: prune the tree or limit the tree depth

    K-Nearest Neighbors
      - Generative or Discriminative: Discriminative
      - Loss Function: zero-one loss
      - Decision Boundary: arbitrarily complicated
      - Parameter Estimation Algorithm: must store all training data to classify new points; choose K using cross-validation (a code sketch follows the table)
      - Model Complexity Reduction: increase K

    Support Vector Machines (with slack variables, no kernel)
      - Generative or Discriminative: Discriminative
      - Loss Function: hinge loss: |1 - y(w^T x)|_+
      - Decision Boundary: linear (depends on the kernel)
      - Parameter Estimation Algorithm: solve a quadratic program to find the boundary that maximizes the margin
      - Model Complexity Reduction: reduce C

    Boosting (with decision stumps)
      - Generative or Discriminative: Discriminative
      - Loss Function: exponential loss: exp{-y f(x)}
      - Decision Boundary: axis-aligned partition of the feature space
      - Parameter Estimation Algorithm: AdaBoost (a code sketch follows the table)
      - Model Complexity Reduction: reduce the number of iterations
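    The logistic regression row above (no closed-form estimate, gradient descent, L2 regularization) can be made concrete with a short sketch. This is a minimal illustration rather than the required answer: the step size eta and regularization weight lam are assumed placeholders, and the intercept is omitted (it can be absorbed by appending a constant-1 feature).

        import numpy as np

        def fit_logistic_l2(X, y, lam=0.1, eta=0.1, n_iters=1000):
            """Batch gradient descent on the L2-regularized negative log-likelihood.

            X: (n, d) feature matrix; y: (n,) labels in {0, 1}.
            """
            n, d = X.shape
            w = np.zeros(d)
            for _ in range(n_iters):
                p = 1.0 / (1.0 + np.exp(-X @ w))    # P(Y = 1 | x) under the current w
                grad = X.T @ (p - y) / n + lam * w  # gradient of the average NLL plus the L2 term
                w -= eta * grad
            return w

        def predict_logistic(w, X):
            return (X @ w >= 0).astype(int)         # linear decision boundary: sign of w^T x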

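    For the K-nearest-neighbors row, "parameter estimation" is just storing the training set; the work happens at prediction time. A minimal sketch, assuming Euclidean distance and a majority vote (the table fixes neither):

        import numpy as np
        from collections import Counter

        def knn_predict(X_train, y_train, x_new, k=5):
            # Distances from the query point to every stored training point
            dists = np.linalg.norm(X_train - x_new, axis=1)
            # Labels of the k closest points, then a majority vote
            nearest = y_train[np.argsort(dists)[:k]]
            return Counter(nearest).most_common(1)[0][0]

    Increasing k smooths the decision boundary, which is the complexity-reduction entry in the table.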
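    The boosting row names AdaBoost with decision stumps and the exponential loss exp{-y f(x)}. The sketch below assumes labels in {-1, +1} and uses the simplest exhaustive stump search; it is illustrative rather than an efficient implementation.

        import numpy as np

        def fit_stump(X, y, w):
            """Axis-aligned stump (feature, threshold, polarity) with the lowest weighted error."""
            best = (np.inf, 0, 0.0, 1)
            for j in range(X.shape[1]):
                for thr in np.unique(X[:, j]):
                    for sign in (1, -1):
                        pred = np.where(X[:, j] <= thr, sign, -sign)
                        err = np.sum(w[pred != y])
                        if err < best[0]:
                            best = (err, j, thr, sign)
            return best

        def adaboost(X, y, n_rounds=50):
            n = X.shape[0]
            w = np.full(n, 1.0 / n)                 # example weights, initially uniform
            ensemble = []
            for _ in range(n_rounds):
                err, j, thr, sign = fit_stump(X, y, w)
                err = max(err, 1e-12)               # guard against a perfect stump
                alpha = 0.5 * np.log((1 - err) / err)
                pred = np.where(X[:, j] <= thr, sign, -sign)
                w *= np.exp(-alpha * y * pred)      # misclassified points get heavier
                w /= w.sum()
                ensemble.append((alpha, j, thr, sign))
            return ensemble

        def adaboost_predict(ensemble, X):
            f = sum(a * np.where(X[:, j] <= thr, s, -s) for a, j, thr, s in ensemble)
            return np.sign(f)

    Reducing n_rounds is the "reduce the number of iterations" complexity control from the table.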

    Table 2: Comparison of Regression Algorithms

    Linear Regression (assuming a Gaussian noise model)
      - Loss Function: square loss: (Y - Ŷ)^2
      - Regression Function Shape: linear
      - Parameter Estimation Algorithm: solve β = (X^T X)^{-1} X^T Y (a code sketch follows the table)
      - Prediction Algorithm: Ŷ = Xβ

    Nadaraya-Watson Kernel Regression
      - Loss Function: square loss: (Y - Ŷ)^2
      - Regression Function Shape: arbitrary
      - Parameter Estimation Algorithm: store all training data; choose the kernel bandwidth h using cross-validation
      - Prediction Algorithm: f(x) = Σ_i y_i K(x_i, x) / Σ_j K(x_j, x) (a code sketch follows the table)

    Regression Trees
      - Loss Function: square loss: (Y - Ŷ)^2
      - Regression Function Shape: axis-aligned partition of the feature space
      - Parameter Estimation Algorithm: many: ID3, CART, C4.5
      - Prediction Algorithm: move down the tree based on x and predict the value at the leaf
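    The linear regression row reduces to the normal equations: estimate β = (X^T X)^{-1} X^T Y and predict Ŷ = Xβ. A direct sketch, using a linear solve rather than an explicit matrix inverse (a standard numerical choice not specified by the table):

        import numpy as np

        def fit_linear_regression(X, y):
            # Solve the normal equations (X^T X) beta = X^T y
            return np.linalg.solve(X.T @ X, X.T @ y)

        def predict_linear(beta, X):
            return X @ beta    # Y_hat = X beta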

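    Nadaraya-Watson stores the training data and predicts with a kernel-weighted average, f(x) = Σ_i y_i K(x_i, x) / Σ_j K(x_j, x). A sketch assuming a Gaussian kernel; the bandwidth h is the quantity the table says to choose by cross-validation.

        import numpy as np

        def nw_predict(X_train, y_train, x_new, h=1.0):
            """Kernel-weighted average of the training targets (Gaussian kernel)."""
            sq_dists = np.sum((X_train - x_new) ** 2, axis=1)
            k = np.exp(-sq_dists / (2 * h ** 2))     # K(x_i, x) for every training point
            return np.sum(k * y_train) / np.sum(k)   # f(x) = sum_i y_i K(x_i, x) / sum_j K(x_j, x)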

2 Comparison of Machine Learning Algorithms [Jayant, 15 pts]

    In this problem, you will review the important aspects of the algorithms we have learned about in class since the midterm. For every algorithm listed in the two tables on the next pages, fill out the entries under each column according to the following guidelines. Do not fill out the greyed-out cells. Turn in your completed table with your problem set.

    Guidelines:

    1. Generative or Discriminative: Choose either generative or discriminative; you may write G and D respectively to save some writing.

    2. Loss Function: Write either the name or the form of the loss function optimized by the algorithm (e.g., exponential loss). For the clustering algorithms, you may alternatively write a short description of the loss function.

    3. Decision Boundary: Describe the shape of the decision surface, e.g., linear. If necessary, enumerate conditions under which the decision boundary has different forms.

    4. Parameter Estimation Algorithm / Prediction Algorithm: Name or concisely describe an algorithm for estimating the parameters or predicting the value of a new instance. Your answer should fit in the provided box.

    5. Model Complexity Reduction: Name a technique for limiting model complexity and preventing overfitting.

    6. Number of Clusters: Choose either predetermined or data-dependent; you may write P and D to save time.

    7. Cluster Shape: Choose either isotropic (i.e., spherical) or anisotropic; you may write I and A to save time.

    Solution: Completed tables are on the following pages.


    Table 1: Comparison of Classification Algorithms

    Bayes Nets
      - Generative or Discriminative: Generative
      - Loss Function: log P(X, Y)
      - Parameter Estimation Algorithm: MLE
      - Prediction Algorithm: variable elimination
      - Model Complexity Reduction: MAP

    Hidden Markov Models
      - Generative or Discriminative: Generative
      - Loss Function: log P(X, Y)
      - Parameter Estimation Algorithm: MLE
      - Prediction Algorithm: Viterbi or forward-backward, depending on the prediction task (a Viterbi code sketch follows the table)
      - Model Complexity Reduction: MAP

    Neural Networks
      - Generative or Discriminative: Discriminative
      - Loss Function: sum-squared error
      - Parameter Estimation Algorithm: back-propagation (a code sketch follows the table)
      - Prediction Algorithm: forward propagation
      - Model Complexity Reduction: reduce the number of hidden layers, regularization, early stopping
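    For the hidden Markov model row, the prediction algorithm depends on the task: Viterbi returns the single most likely state sequence, while forward-backward returns per-position posteriors. A log-space Viterbi sketch; the parameter names pi, A, B are assumptions for illustration, and the probabilities are assumed strictly positive.

        import numpy as np

        def viterbi(obs, pi, A, B):
            """Most likely hidden-state path for an observation sequence.

            obs: length-T sequence of observation indices
            pi:  (S,) initial state probabilities
            A:   (S, S) transition probabilities, A[i, j] = P(s_t = j | s_{t-1} = i)
            B:   (S, V) emission probabilities, B[i, o] = P(o | s = i)
            """
            S, T = len(pi), len(obs)
            delta = np.zeros((T, S))               # best log-probability of any path ending in each state
            psi = np.zeros((T, S), dtype=int)      # backpointers
            delta[0] = np.log(pi) + np.log(B[:, obs[0]])
            for t in range(1, T):
                scores = delta[t - 1][:, None] + np.log(A)   # previous state x next state
                psi[t] = np.argmax(scores, axis=0)
                delta[t] = scores.max(axis=0) + np.log(B[:, obs[t]])
            path = [int(np.argmax(delta[-1]))]     # trace back from the best final state
            for t in range(T - 1, 0, -1):
                path.append(int(psi[t, path[-1]]))
            return path[::-1]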

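    The neural network row pairs forward propagation (prediction) with back-propagation (estimation) under a sum-squared-error loss. A sketch for a single hidden layer of sigmoid units; the layer size, learning rate, and omission of bias terms are illustrative assumptions, not part of the table.

        import numpy as np

        def sigmoid(z):
            return 1.0 / (1.0 + np.exp(-z))

        def train_mlp(X, y, n_hidden=10, eta=0.1, n_epochs=500, seed=0):
            """Back-propagation for a one-hidden-layer network under sum-squared error."""
            rng = np.random.default_rng(seed)
            W1 = rng.normal(scale=0.1, size=(X.shape[1], n_hidden))
            W2 = rng.normal(scale=0.1, size=(n_hidden, 1))
            y = y.reshape(-1, 1)
            for _ in range(n_epochs):
                h = sigmoid(X @ W1)                    # forward propagation: hidden layer
                out = sigmoid(h @ W2)                  # forward propagation: output
                d_out = (out - y) * out * (1 - out)    # gradient at the output pre-activation
                d_h = (d_out @ W2.T) * h * (1 - h)     # gradient propagated back to the hidden layer
                W2 -= eta * h.T @ d_out
                W1 -= eta * X.T @ d_h
            return W1, W2

        def mlp_predict(W1, W2, X):
            return sigmoid(sigmoid(X @ W1) @ W2)       # prediction is forward propagation

    Shrinking n_hidden (or the number of layers), adding regularization, or stopping early are the complexity controls listed in the table.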

    Table 2: Comparison of Clustering Algorithms

    K-means
      - Loss Function: within-class squared distance from the mean
      - Number of Clusters: predetermined
      - Cluster Shape: isotropic
      - Parameter Estimation Algorithm: K-means (a code sketch follows the table)

    Gaussian Mixture Models (identity covariance)
      - Loss Function: log P(X) (equivalent to within-class squared distance from the mean)
      - Number of Clusters: predetermined
      - Cluster Shape: isotropic
      - Parameter Estimation Algorithm: Expectation-Maximization (EM) (a code sketch follows the table)

    Single-Link Hierarchical Clustering
      - Loss Function: maximum distance between a point and its nearest neighbor within a cluster
      - Number of Clusters: data-dependent
      - Cluster Shape: anisotropic
      - Parameter Estimation Algorithm: greedy agglomerative clustering (a code sketch follows the table)

    Spectral Clustering
      - Loss Function: balanced cut
      - Number of Clusters: predetermined
      - Cluster Shape: anisotropic
      - Parameter Estimation Algorithm: run Laplacian eigenmaps followed by K-means, or threshold the eigenvector signs
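    The K-means row is both the method and its own estimation algorithm: alternate between assigning each point to the nearest mean and recomputing each mean, which monotonically decreases the within-class squared distance. A minimal sketch with K fixed in advance, matching the "predetermined" entry; the random initialization is an illustrative choice.

        import numpy as np

        def kmeans(X, k, n_iters=100, seed=0):
            rng = np.random.default_rng(seed)
            centers = X[rng.choice(len(X), size=k, replace=False)]   # start from k random points
            labels = np.zeros(len(X), dtype=int)
            for _ in range(n_iters):
                # Assignment step: nearest center for every point
                dists = np.linalg.norm(X[:, None, :] - centers[None, :, :], axis=2)
                labels = np.argmin(dists, axis=1)
                # Update step: each center becomes the mean of its assigned points
                new_centers = np.array([X[labels == j].mean(axis=0) if np.any(labels == j)
                                        else centers[j] for j in range(k)])
                if np.allclose(new_centers, centers):
                    break
                centers = new_centers
            return centers, labels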

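    The single-link row can be read as a recipe: start with every point in its own cluster and greedily merge the two clusters whose closest members are nearest, stopping at a distance threshold, which is why the number of clusters is data-dependent. A small, deliberately unoptimized sketch; the threshold parameter is an assumption for illustration.

        import numpy as np

        def single_link(X, threshold):
            """Greedy agglomerative clustering with the single-link merge rule."""
            clusters = [[i] for i in range(len(X))]
            D = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=2)   # pairwise distances
            while len(clusters) > 1:
                best = (np.inf, 0, 1)
                for a in range(len(clusters)):
                    for b in range(a + 1, len(clusters)):
                        d = min(D[i, j] for i in clusters[a] for j in clusters[b])
                        if d < best[0]:
                            best = (d, a, b)
                if best[0] > threshold:          # no pair of clusters is close enough to merge
                    break
                _, a, b = best
                clusters[a] = clusters[a] + clusters[b]
                del clusters[b]
            return clusters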
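    For the Gaussian mixture row with identity covariance, EM alternates a soft assignment (E-step) with weighted mean and mixing-weight updates (M-step); with identity covariance the responsibilities depend only on squared distances to the means, which is why the objective matches a soft within-class squared distance. A sketch under exactly those assumptions:

        import numpy as np

        def em_gmm_identity(X, k, n_iters=100, seed=0):
            """EM for a mixture of k Gaussians, each with identity covariance."""
            rng = np.random.default_rng(seed)
            n = X.shape[0]
            mu = X[rng.choice(n, size=k, replace=False)]   # initial means
            pi = np.full(k, 1.0 / k)                       # mixing weights
            for _ in range(n_iters):
                # E-step: responsibilities r[i, j] proportional to pi_j * N(x_i | mu_j, I)
                sq = ((X[:, None, :] - mu[None, :, :]) ** 2).sum(axis=2)
                logw = np.log(pi)[None, :] - 0.5 * sq
                logw -= logw.max(axis=1, keepdims=True)    # stabilize before exponentiating
                r = np.exp(logw)
                r /= r.sum(axis=1, keepdims=True)
                # M-step: weighted means and mixing weights
                nk = r.sum(axis=0)
                mu = (r.T @ X) / nk[:, None]
                pi = nk / n
            return mu, pi, r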
