1
Application of Metamorphic Testing to Supervised Classifiers
Xiaoyuan Xie, Tsong Yueh Chen, Swinburne University of Technology
Christian Murphy, Gail Kaiser, Columbia University
Joshua Ho, University of Sydney
Baowen Xu, Nanjing University
2
Background
Many applications in the field of scientific computing depend on machine learning (ML) algorithms
ML applications often do not have test oracles that indicate whether the output is correct for arbitrary input
Applications without test oracles are called “non-testable programs”
3
Problem Statement
Oracles may exist for a limited subset of the input domain, and gross errors (e.g. crashes) can be detected with certain inputs or techniques
However, it is difficult to detect subtle (computational) errors for arbitrary inputs
4
Testing ML Applications
There has been much research into applying ML techniques to software testing, but not the other way around
Reusable real-world data sets and frameworks are available for checking that an ML algorithm predicts well, but not for checking that an implementation works correctly
5
Observation
If there is no oracle in the general case, we cannot know the expected relationship between a particular input and its output
However, it may be possible to know relationships between a set of inputs and the corresponding set of outputs
“Metamorphic Testing” [Chen et al. ’98] is such an approach
6
Metamorphic Testing
An approach for creating follow-on test cases based on previous test cases
If input x produces output f(x), then the function’s “metamorphic properties” are used to guide a transformation function t, which is applied to produce a new test case input, t(x)
We can then predict the expected value of f(t(x)) based on the value of f(x) obtained from the actual execution
7
Metamorphic Testing without an Oracle
When a test oracle exists, we can know whether f(t(x)) is correct
– Because we have an oracle for f(x)
– So if f(t(x)) is as expected, then it is correct
When there is no test oracle, f(x) acts as a “pseudo-oracle” for f(t(x))
– If f(t(x)) is as expected, it is not necessarily correct
– However, if f(t(x)) is not as expected, either f(x) or f(t(x)) (or both) is wrong
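The pseudo-oracle idea can be sketched in a few lines of Python. Everything named here is hypothetical and chosen only for illustration: `check_relation` is a helper invented for this sketch, and the relation sin(π − x) = sin(x) stands in for a metamorphic property of some program under test.

```python
import math

def check_relation(f, t, expected, x):
    """Return True if f(t(x)) matches the value predicted from f(x)."""
    source_output = f(x)            # actual execution on the source input
    follow_up_output = f(t(x))      # execution on the follow-on input t(x)
    return math.isclose(follow_up_output, expected(source_output), abs_tol=1e-9)

def transform(x):
    # Hypothetical transformation t: reflect the angle around pi/2.
    return math.pi - x

def predict(fx):
    # For this relation the follow-on output should equal the source output.
    return fx

def buggy_sin(x):
    # Deliberately defective: a truncated Taylor series.
    return x - x**3 / 6

def always_zero(x):
    # Also defective, yet it satisfies the relation trivially.
    return 0.0

print(check_relation(math.sin, transform, predict, 0.7))
print(check_relation(buggy_sin, transform, predict, 0.7))
print(check_relation(always_zero, transform, predict, 0.7))
```

The `always_zero` case illustrates the first sub-point: a satisfied relation does not establish correctness. The `buggy_sin` case illustrates the second: a violated relation means f(x), f(t(x)), or both are wrong.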
8
Metamorphic Testing Example
Consider a program that reads a text file of test scores for students in a class, and computes the averages and the standard deviation of the averages
If we permute the values in the text file, the results should stay the same
If we multiply each score by 10, the final results should all be multiplied by 10 as well
These metamorphic properties can be used to create a “pseudo-oracle” for the application
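A minimal sketch of this example in Python, assuming the file has already been parsed into one list of scores per student. The function names and the sample scores are made up for illustration.

```python
import math
import random
import statistics

def summarize(rows):
    """Return (per-student averages, standard deviation of those averages)."""
    averages = [statistics.fmean(scores) for scores in rows]
    return averages, statistics.pstdev(averages)

def permutation_holds(rows, seed=0):
    """Permuting the students should leave the set of results unchanged."""
    averages, spread = summarize(rows)
    shuffled = rows[:]
    random.Random(seed).shuffle(shuffled)
    new_averages, new_spread = summarize(shuffled)
    return sorted(new_averages) == sorted(averages) and math.isclose(new_spread, spread)

def scaling_holds(rows, factor=10):
    """Multiplying every score by `factor` should scale every result by it."""
    averages, spread = summarize(rows)
    scaled = [[factor * s for s in scores] for scores in rows]
    new_averages, new_spread = summarize(scaled)
    return (all(math.isclose(n, factor * a) for n, a in zip(new_averages, averages))
            and math.isclose(new_spread, factor * spread))

# Hypothetical scores: one inner list per student.
rows = [[70.0, 85.0, 90.0], [55.0, 60.0, 65.0], [88.0, 92.0, 79.0], [40.0, 75.0, 81.0]]
print(permutation_holds(rows), scaling_holds(rows))
```

Neither check needs to know the correct average for `rows`. Each one only compares two executions of the same program.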
9
Approach
To apply Metamorphic Testing to such ML applications, we first enumerate the metamorphic relations based on the expected behaviors of a given machine learning algorithm
We then utilize these relations to conduct metamorphic testing on the implementation
10
Verification & Validation
The scope of which metamorphic properties are necessary may differ between various problems in the domain
Properties that are necessary can be used for verification: “Is the implementation of the algorithm correct?”
Other properties can be used for validation: “Is the algorithm appropriate for solving this problem?”
11
Research Questions
What are the metamorphic properties of supervised ML classification algorithms?
– Which can be used for verification?
– Which can be used for validation?
Can metamorphic testing detect defects in real-world ML applications?
12
Machine Learning Fundamentals
Data sets consist of a number of samples, each of which has attributes and a label
In the first phase (“training”), a model is generated that attempts to generalize how attributes relate to the label
In the second phase, the model is applied to a previously-unseen data set (“testing” data) with unknown labels to produce a classification of each sample
13
Algorithms Investigated
k-Nearest Neighbors (kNN)
– Samples in the testing data are classified by using Euclidean distance to find the k nearest samples in the training data
– Classification is then done by majority rule
Naïve Bayes Classifier (NBC)
– For a given sample in the testing data, computes the probability of that sample belonging to each class, assuming conditional independence between the attributes
– Chooses the class that is most likely
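As a rough illustration of the first algorithm only, the kNN description above can be sketched in Python as follows. This is not the Weka implementation. The function name, the data layout (a list of `(attributes, label)` pairs) and the sample values are assumptions made for this sketch.

```python
import math
from collections import Counter

def knn_classify(train, sample, k=3):
    """Classify `sample` by majority vote among its k nearest training samples.

    train: list of (attributes, label) pairs; sample: a sequence of attributes.
    """
    # Sort the training samples by Euclidean distance to the sample and keep k.
    nearest = sorted(train, key=lambda item: math.dist(item[0], sample))[:k]
    votes = Counter(label for _, label in nearest)
    # most_common(1) yields the label with the most votes.
    return votes.most_common(1)[0][0]

# Hypothetical training data with two attributes per sample.
train = [
    ((1.0, 1.0), "a"), ((1.2, 0.9), "a"),
    ((5.0, 5.1), "b"), ((4.8, 5.3), "b"), ((5.2, 4.9), "b"),
]
print(knn_classify(train, (1.1, 1.0)))
print(knn_classify(train, (5.0, 5.0)))
```

There is no separate training step here. The "model" is simply the stored training data, which is why many kNN properties can be reasoned about directly from the distance function.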
14
Metamorphic Relations
We identified 11 properties that we would expect all classification algorithms to have
– Affine transformation of attributes
– Permutation of labels or attributes
– Addition of informative or uninformative attributes
– Addition of classes by duplicating or re-labeling samples
– Removal of classes or samples
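Two of the listed relations, permutation of attributes and permutation of labels, can be checked as sketched below against a toy kNN classifier. The classifier, the data and all names are hypothetical stand-ins, not the implementation or test data used in the study.

```python
import math
from collections import Counter

def knn_classify(train, sample, k=3):
    # Toy kNN: Euclidean distance, majority vote among the k nearest samples.
    nearest = sorted(train, key=lambda item: math.dist(item[0], sample))[:k]
    return Counter(label for _, label in nearest).most_common(1)[0][0]

def predict_all(train, tests):
    return [knn_classify(train, sample) for sample in tests]

train = [
    ((1.0, 2.0, 0.5), "a"), ((1.5, 1.8, 0.7), "a"),
    ((6.0, 7.0, 3.0), "b"), ((5.5, 6.5, 3.2), "b"), ((6.2, 7.1, 2.9), "b"),
]
tests = [(1.2, 2.1, 0.6), (5.8, 6.9, 3.1)]
source = predict_all(train, tests)

# Relation: reordering the attribute columns (in training and testing data
# alike) should not change any prediction.
order = (2, 0, 1)
def reorder(attrs):
    return tuple(attrs[i] for i in order)

perm_train = [(reorder(attrs), label) for attrs, label in train]
perm_tests = [reorder(sample) for sample in tests]
attribute_permutation_holds = predict_all(perm_train, perm_tests) == source

# Relation: renaming the labels by a one-to-one mapping should rename the
# predictions in the same way.
swap = {"a": "b", "b": "a"}
relabeled = [(attrs, swap[label]) for attrs, label in train]
label_permutation_holds = predict_all(relabeled, tests) == [swap[y] for y in source]

print(attribute_permutation_holds, label_permutation_holds)
```

Each relation pairs a transformation of the input data with a predicted change (or lack of change) in the output, so the check never requires knowing the correct classification.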
15
Experimental Setup
Applied the approach to implementations in the Weka 3.5.7 toolkit
Initial test cases:
– Randomly generated values
– Four attributes (“columns”)
– 20-50 samples (“rows”)
Metamorphic relations were applied to create 20-300 follow-on test cases
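A sketch of how such test cases might be generated. The value range, the label set, the seed and the function names are assumptions, since the slides give only the number of attributes and the range of sample counts.

```python
import random

def generate_dataset(rng, n_attributes=4, min_samples=20, max_samples=50,
                     labels=("a", "b", "c")):
    """Build one random source test case as a list of (attributes, label)."""
    n_samples = rng.randint(min_samples, max_samples)  # bounds are inclusive
    return [
        ([rng.uniform(0.0, 100.0) for _ in range(n_attributes)], rng.choice(labels))
        for _ in range(n_samples)
    ]

def affine_follow_up(dataset, scale, offset):
    """Follow-on case: apply v -> scale * v + offset to every attribute value."""
    return [([scale * v + offset for v in attrs], label) for attrs, label in dataset]

rng = random.Random(42)  # fixed seed so a failing case can be reproduced
source_case = generate_dataset(rng)
follow_ups = [
    affine_follow_up(source_case, rng.uniform(0.5, 2.0), rng.uniform(-10.0, 10.0))
    for _ in range(20)
]
print(len(source_case), len(follow_ups))
```

Seeding the generator is a design choice for reproducibility: when a relation is violated, the same source and follow-on inputs can be regenerated for debugging.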
16
Results

Property | kNN: Necessary? | kNN: % violated | NBC: Necessary? | NBC: % violated
0        |                 | 0               |                 | 7.4
1.1      |                 | 15.9            |                 | 0.3
1.2      |                 | 0               |                 | 0
2.1      |                 | 0               |                 | 0.6
2.2      |                 | 4.1             |                 | 0
3.1      |                 | 0               |                 | 0
3.2      |                 | 0               |                 | 0
4.1      |                 | 25.3            |                 | 0
4.2      |                 | 0               |                 | 3.9
5.1      |                 | 5.9             |                 | 5.6
5.2      |                 | 2.8             |                 | 2.8

kNN = k Nearest Neighbors; NBC = Naïve Bayes Classifier
17
Analysis: kNN
No necessary properties were violated
Issues related to validation:
– Labels that are non-existent in the training data have a non-zero chance of being selected in classification
– If two labels are equally likely, the “first” one that is listed is chosen
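The tie-breaking behavior in the last point can be illustrated with a short sketch. The function and labels are hypothetical, and the sketch only models "first listed wins"; it makes no claim about how Weka implements the vote.

```python
def majority_label(votes, label_order):
    """Pick the label with the most votes; ties go to the earlier listed label.

    max() returns the first maximal element it encounters, so the result of a
    tie depends on the order of `label_order`.
    """
    return max(label_order, key=lambda label: votes.get(label, 0))

votes = {"spam": 2, "ham": 2}  # two labels that are equally likely
print(majority_label(votes, ["spam", "ham"]))
print(majority_label(votes, ["ham", "spam"]))
```

Under this rule, merely relisting the labels in a different order changes the output for tied samples, which is the kind of behavior a permutation-of-labels relation would expose.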
18
Analysis: Naïve Bayes
Four necessary properties were violated, indicating defects in the implementation
– Loss of precision related to use of the “double” datatype in Java
– Laplace Accuracy used to determine probabilities; thus, labels that did not appear in training data have non-zero probability
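The second point can be illustrated with a small sketch, assuming that “Laplace Accuracy” refers to Laplace (add-one) smoothing of the class counts. The function name, the `smoothing` parameter and the label counts are hypothetical.

```python
from collections import Counter

def class_priors(train_labels, all_labels, smoothing=1):
    """Estimate P(label) from training labels with additive smoothing.

    With smoothing > 0, every label in `all_labels` receives a pseudo-count,
    including labels that never occur in the training data.
    """
    counts = Counter(train_labels)
    total = len(train_labels) + smoothing * len(all_labels)
    return {label: (counts[label] + smoothing) / total for label in all_labels}

train_labels = ["a"] * 6 + ["b"] * 4   # label "c" never appears in training
all_labels = ["a", "b", "c"]

smoothed = class_priors(train_labels, all_labels, smoothing=1)
unsmoothed = class_priors(train_labels, all_labels, smoothing=0)
print(smoothed["c"], unsmoothed["c"])
```

With smoothing enabled, the unseen label "c" gets probability 1/13 rather than 0, so it can in principle be chosen as a classification.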
19
Suggestions
We suggest using the “BigDecimal” class instead of the “double” datatype
Laplace Accuracy is appropriate for the attributes but not for the labels
Use of Laplace Accuracy should be set as an option
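The precision issue behind the first suggestion can be reproduced in a few lines. This sketch uses Python's `float` (an IEEE 754 double, like Java's `double`) and `decimal.Decimal` as a rough analogue of Java's `BigDecimal`. The values are illustrative only.

```python
import operator
from decimal import Decimal
from functools import reduce

def total(values):
    # Plain left-to-right addition, so the order of the inputs matters.
    return reduce(operator.add, values)

scores = ["0.1", "0.2", "0.3"]
as_float = [float(s) for s in scores]
as_decimal = [Decimal(s) for s in scores]

# Summing the same numbers in reverse order is a permutation of the input,
# so the result "should" be identical.
float_order_sensitive = total(as_float) != total(as_float[::-1])
decimal_order_sensitive = total(as_decimal) != total(as_decimal[::-1])
print(float_order_sensitive, decimal_order_sensitive)
```

With binary floating point, the two orders differ in the last digit, which is enough to flip a comparison between two nearly equal probabilities. Decimal arithmetic avoids this for these inputs, at some cost in speed.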
20
Future Work
Apply the testing approach to other domains that depend on ML, such as scientific computing
Further investigation of testing “non-testable programs”
Measure the effectiveness of the approach in empirical studies
21
Summary
Metamorphic testing is easy to implement and automate
We were able to devise fault-revealing properties even with just a basic understanding of the ML algorithms
Metamorphic testing can be used for both verification and validation
23
Related Work
Applying MT to non-testable programs in other domains
General properties for use in MT