
Page 1: Noise Resilience in Machine Learning Algorithms

Exploring the Noise Resilience
Combined Sturges Algorithm

Akrita Agarwal
Advisor: Dr. Anca Ralescu

November 7, 2015

Akrita Agarwal, "Exploring the Noise Resilience Combined Sturges Algorithm", November 7, 2015, slide 1 / 39

Page 2: Noise Resilience in Machine Learning Algorithms

Motivation


Page 3: Noise Resilience in Machine Learning Algorithms

Motivation

Why study noise?

Real-world datasets are noisy: recordings are made under normal environmental conditions, and equipment introduces measurement error. Most algorithms ignore noise, and relatively little research has been done on it.

Aim: explore the robustness of algorithms to noise.

Which algorithm is least affected by noisy datasets?

Page 4: Noise Resilience in Machine Learning Algorithms

Classification


Page 5: Noise Resilience in Machine Learning Algorithms

Classification

Classification: assigning a new observation to one of a set of known categories.

Companies store large amounts of data.

An effective classifier can assist in making good predictions and informed business decisions.

E.g. whether to recommend Prime products to non-Prime customers, based on behavior.

Page 6: Noise Resilience in Machine Learning Algorithms

Classification Algorithms

Two broad kinds of classifiers are:

Frequency-based classifiers use the frequency of data points in the dataset to determine the class membership of a given test point.
Geometry-based classifiers leverage geometrical aspects of a dataset, such as distance.

Page 7: Noise Resilience in Machine Learning Algorithms

Naive Bayes

The Naive Bayes Classifier

Frequency-based classifier.
Computes the probability of a test data point belonging to each class.
Class probabilities are extracted from the training data.

Pros

Intuitive to understand and build.
Easily trained, even with a small dataset.
Fast.

Cons

Assumes conditional independence of the data.
Ignores the underlying geometry of the data.

Page 8: Noise Resilience in Machine Learning Algorithms

k Nearest Neighbors

The k Nearest Neighbors Classifier

Geometry-based classifier.
Assigns a class to the test data point by taking the majority class of its k nearest points.

Pros

Easy to implement and understand.
Classes don't have to be linearly separable.

Cons

Tends to ignore the relative importance of attributes; uses all of them.
Only indirectly takes the frequency of the data into account.

Page 9: Noise Resilience in Machine Learning Algorithms

Combined Sturges Classifier


Page 10: Noise Resilience in Machine Learning Algorithms

Combined Sturges

The Combined Sturges (CS) Classifier

Explicitly uses geometry + frequency.
Data is represented as a frequency distribution per class.
A classification score is computed for each class.
The test point is assigned to the class with the best score.

Continuous data values are binned.

Number of bins = ⌈1 + log2 n⌉   (Sturges, 1926, "The Choice of a Class Interval")
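Sturges' rule above can be computed directly; a minimal sketch (illustrative, not from the slides):

```python
import math

def sturges_bins(n: int) -> int:
    """Number of bins per Sturges (1926): ceil(1 + log2(n))."""
    return math.ceil(1 + math.log2(n))

# e.g. an 8-sample dataset gets ceil(1 + 3) = 4 bins
print(sturges_bins(8))    # → 4
print(sturges_bins(100))  # → 8
```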


Page 11: Noise Resilience in Machine Learning Algorithms

Combined Sturges

Dummy Dataset

Table: Dummy Dataset

  A1  A2  Class
   3   2    1
   1   2    1
   4   2    0
   3   2    1
   1   1    0
   2   2    1
   3   3    0
   4   1    0

Table: Frequency Distribution on Classes 0 & 1

Class 0:
  A1  f(A1)    A2  f(A2)
   1  0.25      1  0.50
   3  0.25      2  0.25
   4  0.50      3  0.25

Class 1:
  A1  f(A1)    A2  f(A2)
   1  0.25      2  0.75
   2  0.25      3  0.25
   3  0.50


Page 13: Noise Resilience in Machine Learning Algorithms

Combined Sturges

Test point T1: A1 = 3, A2 = 4

Page 14: Noise Resilience in Machine Learning Algorithms

Combined Sturges

1. Geometric Criterion

Test point T1: A1 = 3, A2 = 4

Classification criteria: Geometric (minimum distance)
Classification score: highest posterior probability

Table: Nearest distance of T1 to Classes (frequency distributions as above)

Class 0:
  A1  f(A1)    A2  f(A2)
   1  0.25      1  0.50
   3  0.25      2  0.25
   4  0.50      3  0.25

Class 1:
  A1  f(A1)    A2  f(A2)
   1  0.25      2  0.75
   2  0.25      3  0.25
   3  0.50

Page 15: Noise Resilience in Machine Learning Algorithms

Combined Sturges

Classification Score, S(c), c ∈ {0, 1}

S(0):
A1 = P(Class 0) × f(A1)
A2 = P(Class 0) × f(A2)
average(A1, A2) = average(0.5 × 0.25, 0.5 × 0.25) = 0.125

S(1):
A1 = P(Class 1) × f(A1)
A2 = P(Class 1) × f(A2)
average(A1, A2) = average(0.5 × 0.50, 0.5 × 0.25) = 0.187

S(0) < S(1) → Class 1

Page 16: Noise Resilience in Machine Learning Algorithms

Combined Sturges

2. Statistical Criterion

Test point T1: A1 = 3, A2 = 4

Classification criteria: Statistical (maximum frequency)
Classification score: minimum distance

Table: Maximum Frequency in Classes (frequency distributions as above)

Class 0:
  A1  f(A1)    A2  f(A2)
   1  0.25      1  0.50
   3  0.25      2  0.25
   4  0.50      3  0.25

Class 1:
  A1  f(A1)    A2  f(A2)
   1  0.25      2  0.75
   2  0.25      3  0.25
   3  0.50

Page 17: Noise Resilience in Machine Learning Algorithms

Combined Sturges

Classification Score

S(0):
A1 = (4 − 3) = 1
A2 = (4 − 1) = 3
average(A1, A2) = average(1, 3) = 2

S(1):
A1 = (3 − 3) = 0
A2 = (4 − 2) = 2
average(A1, A2) = average(0, 2) = 1

S(0) > S(1) → Class 1 (lower distance wins)
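The arithmetic on this slide can be reproduced with a short sketch (a hedged reconstruction, not the author's code): for each class, take the modal value of each attribute from the frequency tables, and score the class by the average distance of the test point to those modes; the lower score wins.

```python
# Frequency tables for the dummy dataset, keyed by class then attribute.
freq = {
    0: {"A1": {1: 0.25, 3: 0.25, 4: 0.50}, "A2": {1: 0.50, 2: 0.25, 3: 0.25}},
    1: {"A1": {1: 0.25, 2: 0.25, 3: 0.50}, "A2": {2: 0.75, 3: 0.25}},
}

def statistical_score(c, test):
    """Average distance from the test point to each attribute's modal value."""
    dists = []
    for attr, value in test.items():
        mode = max(freq[c][attr], key=freq[c][attr].get)  # most frequent value
        dists.append(abs(value - mode))
    return sum(dists) / len(dists)

t1 = {"A1": 3, "A2": 4}
print(statistical_score(0, t1))  # → 2.0, matching average(1, 3)
print(statistical_score(1, t1))  # → 1.0, so T1 is assigned Class 1
```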


Page 18: Noise Resilience in Machine Learning Algorithms

Combined Sturges

3. Combined Criterion

Test point T1: A1 = 3, A2 = 4

d = |T1 − A1| · f(A1)   (per attribute value)
Expected distance: ED = ED_c(A1) · ED_c(A2)
Minimum expected distance ED wins.

Table: Aggregate Expected Distance, ED

Class 0:
  A1  f(A1)  d·f      A2  f(A2)  d·f
   1  0.25   0.50      1  0.50   1.50
   3  0.25   0         2  0.25   0.50
   4  0.50   0.50      3  0.25   0.25
  ED0(A1) = 1.00      ED0(A2) = 2.25

Class 1:
  A1  f(A1)  d·f      A2  f(A2)  d·f
   1  0.25   0.50      2  0.75   1.50
   2  0.25   0.25      3  0.25   0.25
   3  0.50   0
  ED1(A1) = 0.75      ED1(A2) = 1.75

Page 19: Noise Resilience in Machine Learning Algorithms

Combined Sturges

Classification Penalty

S(0):
ED = 1.00 × 2.25 = 2.25
S(0) = ED × (1 − P(Class 0)) = 1.125

S(1):
ED = 0.75 × 1.75 = 1.31
S(1) = ED × (1 − P(Class 1)) = 0.655

S(0) > S(1) → Class 1
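The combined criterion can be sketched end to end (again a reconstruction from the tables above, not the author's code): per attribute, the expected distance is the sum of |t − v| · f(v) over the values seen in that class; the per-attribute EDs are multiplied, weighted by (1 − P(class)), and the smallest penalty wins.

```python
# Frequency tables and class priors for the dummy dataset (4 points per class).
freq = {
    0: {"A1": {1: 0.25, 3: 0.25, 4: 0.50}, "A2": {1: 0.50, 2: 0.25, 3: 0.25}},
    1: {"A1": {1: 0.25, 2: 0.25, 3: 0.50}, "A2": {2: 0.75, 3: 0.25}},
}
prior = {0: 0.5, 1: 0.5}

def combined_score(c, test):
    """Penalty S(c) = (product of per-attribute expected distances) * (1 - P(c))."""
    ed = 1.0
    for attr, t in test.items():
        ed *= sum(abs(t - v) * f for v, f in freq[c][attr].items())
    return ed * (1 - prior[c])

t1 = {"A1": 3, "A2": 4}
print(combined_score(0, t1))  # → 1.125   (= 1.00 × 2.25 × 0.5)
print(combined_score(1, t1))  # → 0.65625 (the slide rounds to 0.655); Class 1 wins
```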


Page 20: Noise Resilience in Machine Learning Algorithms

The Noise Model


Page 21: Noise Resilience in Machine Learning Algorithms

The Noise Model

Dealing with Noise

Brodley & Friedl, 1999: detect and reduce noise.

Kubica & Moore, 2003: identify noise using a probabilistic model and remove it.

Elias Kalapanidas, 2003: developed a noise model based on data properties.

Page 22: Noise Resilience in Machine Learning Algorithms

The Noise Model

Additive noise: x′ = x + δx

δx = σ_xj × z_ij
σ_xj : standard deviation of attribute j
z_ij = CDF(p_ij)

x_ij = x′_ij   if p_ij ≥ n
x_ij = x_ij    if p_ij < n        (1)

Based on noise level n ∈ {0, 0.15, 0.30, 0.50, 0.80}
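A runnable sketch of this additive noise model, with assumptions flagged: the slide writes z = CDF(p), which is read here as the inverse normal CDF (quantile) so that z can scale σ in both directions, and the per-cell probability p is assumed to be drawn uniformly.

```python
import random
import statistics
from statistics import NormalDist

def add_noise(data, n, seed=0):
    """Apply additive noise per equation (1): a cell is perturbed only when
    its random draw p >= noise level n. data is a list of rows of numbers."""
    rng = random.Random(seed)
    # Standard deviation of each attribute (column).
    sigmas = [statistics.pstdev(col) for col in zip(*data)]
    noisy = []
    for row in data:
        new_row = []
        for j, x in enumerate(row):
            p = rng.random()  # p in [0, 1)
            if p >= n:
                z = NormalDist().inv_cdf(max(p, 1e-12))  # assumed reading of z = CDF(p)
                new_row.append(x + sigmas[j] * z)
            else:
                new_row.append(x)
        noisy.append(new_row)
    return noisy

clean = [[3, 2], [1, 2], [4, 2], [3, 2], [1, 1], [2, 2], [3, 3], [4, 1]]
print(add_noise(clean, n=1.0) == clean)  # → True: at n = 1 no cell satisfies p >= n
```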


Page 23: Noise Resilience in Machine Learning Algorithms

The Noise Model

Attribute-level Noise

Table: Original Dataset

  A1  A2  Class
   3   2    1
   1   2    1
   4   2    0
   3   2    1
   1   1    0
   2   2    1
   3   3    0
   4   1    0

Table: 40% (n = 0.4) Noisy Dataset

   A1    A2    Class
  8.5   0.55     1
  8.9   2        1
  4     0.7      0
  3     2        1
  4.7   1        0
  2     2        1
  3     3        0
  1.6   0.02     0

Page 24: Noise Resilience in Machine Learning Algorithms

Datasets


Page 25: Noise Resilience in Machine Learning Algorithms

Datasets

Artificial datasets

Multivariate Normal:
x1 = random normal vector, t = random normal vector
x2 = 0.8 x1 + 0.6 t
x3 = 0.6 x1 + 0.8 t
x4 = t

Linear function with non-normal inputs:
x2 = (x1)^2 + 0.5 t
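The attribute construction above can be sketched with the standard library; sample size, class assignment, and any scaling are not specified on the slide, so those parts are illustrative.

```python
import random

def make_multivariate_normal(n, seed=0):
    """Four correlated attributes built from two independent normal vectors,
    following the construction on this slide."""
    rng = random.Random(seed)
    x1 = [rng.gauss(0, 1) for _ in range(n)]
    t = [rng.gauss(0, 1) for _ in range(n)]
    x2 = [0.8 * a + 0.6 * b for a, b in zip(x1, t)]
    x3 = [0.6 * a + 0.8 * b for a, b in zip(x1, t)]
    x4 = t[:]  # x4 = t
    return list(zip(x1, x2, x3, x4))

def make_nonnormal(n, seed=0):
    """Second construction: x2 = x1^2 + 0.5 t."""
    rng = random.Random(seed)
    x1 = [rng.gauss(0, 1) for _ in range(n)]
    t = [rng.gauss(0, 1) for _ in range(n)]
    x2 = [a * a + 0.5 * b for a, b in zip(x1, t)]
    return list(zip(x1, x2))

rows = make_multivariate_normal(200)
print(len(rows), len(rows[0]))  # → 200 4
```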


Page 26: Noise Resilience in Machine Learning Algorithms

Datasets

2 artificial datasets, with different imbalance ratios.

3 real datasets.

Table: Comparison of physical properties of datasets.

  Dataset        Samples  Classes  Attributes  Attribute Values  Imbalance Ratio
  Haberman         306       2         3       Integer                2.78
  A1               200       3         4       Real                   6.66
  A2               200       3         4       Real                  39
  Iris             150       3         4       Real                   2
  Pima Diabetes    768       2         8       Integer, Real          1.87

Page 27: Noise Resilience in Machine Learning Algorithms

Process Flow

1. Create artificial datasets
2. Implement the noise model on all datasets
3. Apply the three algorithms
4. Compare the results

Page 28: Noise Resilience in Machine Learning Algorithms

Results


Page 29: Noise Resilience in Machine Learning Algorithms

Results

Performance Measures

Confusion Matrix

Table: Confusion matrix for 2 classes.

                          Predicted Outcome
                          Positive   Negative
  Actual values Positive    TP         FN
                Negative    FP         TN

Accuracy:   Acc = (TP + TN) / (TP + TN + FP + FN)
Precision:  P = TP / (TP + FP)
Recall:     R = TP / (TP + FN)
F-measure:  F_α = (P × R) / (α × P + (1 − α) × R)
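The four measures can be written down directly from the counts in the matrix (a straightforward transcription; α = 0.5 reduces F_α to the usual F1):

```python
def metrics(tp, fn, fp, tn, alpha=0.5):
    """Accuracy, precision, recall, and F_alpha from 2-class confusion counts."""
    acc = (tp + tn) / (tp + tn + fp + fn)
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    f_alpha = (precision * recall) / (alpha * precision + (1 - alpha) * recall)
    return acc, precision, recall, f_alpha

# Example: 8 true positives, 2 false negatives, 2 false positives, 8 true negatives.
acc, p, r, f = metrics(8, 2, 2, 8)
print(acc, p, r)  # → 0.8 0.8 0.8 (and F ≈ 0.8)
```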


Page 30: Noise Resilience in Machine Learning Algorithms

Results

Non-Noisy Datasets

Artificial datasets:
knn does best: 91.2% and 93.7%.
Good improvement in CS, from 65% to 76%.

Table: Non-Noisy Artificial Datasets - Performance of all algorithms

  Dataset  Algorithm    Accuracy  Precision  Recall  F-measure
  A1       CS             65.0      63.5      70.1     66.6
  A1       knn            91.2      92.8      87.4     89.8
  A1       Naive Bayes    60.2      61.6      60.14    64.1
  A2       CS             76.0      68.4      71.62    69.7
  A2       knn            93.7      94.7      91.9     93.2
  A2       Naive Bayes    63.1      61.1      65.2     63.5

Page 31: Noise Resilience in Machine Learning Algorithms

Results

Real datasets:
Iris: knn does best, followed by Naive Bayes.
Haberman: CS does best; Naive Bayes is really bad.
Pima-Diabetes: CS is best; Naive Bayes follows.

Table: Non-Noisy Real Datasets - Performance of all algorithms

  Dataset        Algorithm    Accuracy  Precision  Recall  F-Measure
  Iris           CS             94.3      95.1      94.3     94.7
  Iris           knn            96.7      96.8      96.7     96.8
  Iris           Naive Bayes    96.2      93.7      95       94.3
  Haberman       CS             75.2      67.2      61.6     64.2
  Haberman       knn            73.4      63.2      54.8     58.5
  Haberman       Naive Bayes     0.5      41.9      47.6     47.3
  Pima-Diabetes  CS             73.7      74.9      65.1     69.6
  Pima-Diabetes  knn            64.5      65.6      66.9     66.3
  Pima-Diabetes  Naive Bayes    70.3      59.2      56.7     57.9

Page 32: Noise Resilience in Machine Learning Algorithms

Results

Noisy datasets: A1
knn does best.
For both knn and CS, no change with noise.
Naive Bayes does badly.

Table: Noisy A1 dataset - Performance of all algorithms

  Algorithm    Noise %  Accuracy  Precision  Recall  F-Measure
  CS              0       65        63.5      70.1     66.6
  CS             15       64.8      63.4      96.7     96.8
  CS             50       65.5      63.2      95       94.3
  knn             0       87.5      87.2      61.6     61.6
  knn            15       87.3      88.1      54.8     58.5
  knn            50       86.7      88.5      47.6     47.3
  Naive Bayes     0       ≈ 0       ≈ 0       ≈ 0      ≈ 0
  Naive Bayes    15       ≈ 0       ≈ 0       ≈ 0      ≈ 0
  Naive Bayes    50       ≈ 0       ≈ 0       ≈ 0      ≈ 0

Page 33: Noise Resilience in Machine Learning Algorithms

Results

Noisy datasets: A2
knn does best, but goes from 92.6% to 86.3%.
For CS, no change with noise.
From A1 to A2, CS: 65% to 76%.

Table: Noisy A2 dataset - Performance of all algorithms

  Algorithm    Noise %  Accuracy  Precision  Recall  F-Measure
  CS              0       76.0      68.4      71.6     69.7
  CS             15       76.8      64.7      73.1     68.4
  CS             50       76.4      66.9      71.7     68.5
  knn             0       92.6      86.9      85.5     86.2
  knn            15       91.1      84.2      84.2     83.5
  knn            50       86.3      83.0      78.2     77.9
  Naive Bayes     0       ≈ 0       ≈ 0       ≈ 0      ≈ 0
  Naive Bayes    15       ≈ 0       ≈ 0       ≈ 0      ≈ 0
  Naive Bayes    50       ≈ 0       ≈ 0       ≈ 0      ≈ 0

Page 34: Noise Resilience in Machine Learning Algorithms

Results

Noisy datasets: Iris
knn does best at 0% noise (96.7%), then CS at 94.5%.
CS does best at 50% noise (73.1%), then knn at 63.8%.

Table: Noisy Iris dataset - Performance of all algorithms

  Algorithm    Noise %  Accuracy  Precision  Recall  F-Measure
  CS              0       94.5      94.9      94.5     94.7
  CS             15       86.2      87.6      86.2     86.9
  CS             50       73.1      74.9      73.1     73.9
  knn             0       96.7      96.8      96.7     96.8
  knn            15       83.6      84.6      83.6     84.1
  knn            50       63.8      63.2      63.8     63.5
  Naive Bayes     0       93.3      92.3      91.9     92.1
  Naive Bayes    15       92.3      91.5      91.2     91.4
  Naive Bayes    50        0.7      18.3       0.7     NaN

Page 35: Noise Resilience in Machine Learning Algorithms

Results

Noisy datasets: Haberman
CS does best at 74.7%.
Naive Bayes performs badly at ≈ 43%.

Table: Noisy Haberman dataset - Performance of all algorithms

  Algorithm    Noise %  Accuracy  Precision  Recall  F-Measure
  CS              0       74.7      66.7      61.4     63.9
  CS             15       66.1      62.2      61.9     62.0
  CS             50       74.5      66.6      63       64.7
  knn             0       74.1      65.7      55.1     59.7
  knn            15       72.0      56.2      52.3     54.0
  knn            50       70.5      51.8      50.6     51.0
  Naive Bayes     0       41.0      47.1      46.5     46.8
  Naive Bayes    15       43.3      46.2      45.3     45.7
  Naive Bayes    50       41.4      34.7      32.4     31.8

Page 36: Noise Resilience in Machine Learning Algorithms

Results

Noisy datasets: Pima-Diabetes
CS does best, followed by knn.
Naive Bayes degrades badly with noise: 70% to 55.7% to 0%.

Table: Noisy Pima-Diabetes dataset - Performance of all algorithms

  Algorithm    Noise %  Accuracy  Precision  Recall  F-Measure
  CS              0       72.8      72.8      64.2     68.2
  CS             15       70.8      68.3      65.8     67
  CS             50       67.0      64.9      55.9     60.0
  knn             0       63.5      64.6      65.9     65.2
  knn            15       60.8      61.2      62.3     61.7
  knn            50       55.0      55.6      56.1     55.8
  Naive Bayes     0       70.3      59.2      56.7     57.9
  Naive Bayes    15       55.7      49.4      46.0     NaN
  Naive Bayes    50        0         0         0       NaN

Page 37: Noise Resilience in Machine Learning Algorithms

Results

Results Summary

Table: Best Algorithm for different Noise Levels

  Dataset        0% Noise  15% Noise    50% Noise
  A1             knn       knn          knn
  A2             knn       knn          knn
  Haberman       CS        knn          CS
  Iris           knn       Naive Bayes  CS
  Pima-Diabetes  CS        CS           CS

Page 38: Noise Resilience in Machine Learning Algorithms

Conclusion

No single algorithm is best.

In general, knn has better accuracy, but CS is more robust to noise.

Naive Bayes degrades far more under noise than the others.

Also: CS performs well on imbalanced datasets.

Page 39: Noise Resilience in Machine Learning Algorithms

Future Work

Test with more datasets.

Test performance on imbalanced datasets.

Only the additive noise model was used; try other variations.

Compare with more algorithms.

Page 40: Noise Resilience in Machine Learning Algorithms

Questions

Questions?
