when training and test sets are different: characterising ...why is this important? • environments...
TRANSCRIPT
![Page 1: When Training and Test Sets are Different: Characterising ...Why is this important? • Environments are non-stationary. • Most predictive ML models work by ignoring the different](https://reader036.vdocuments.mx/reader036/viewer/2022071416/61124e052c8a3a223424602d/html5/thumbnails/1.jpg)
When Training and Test Sets are Different: Characterising Learning
TransferAmos J Storkey
Presenter: Desh Raj
![Page 2: When Training and Test Sets are Different: Characterising ...Why is this important? • Environments are non-stationary. • Most predictive ML models work by ignoring the different](https://reader036.vdocuments.mx/reader036/viewer/2022071416/61124e052c8a3a223424602d/html5/thumbnails/2.jpg)
Outline
• Why is this important?
• Preliminaries - conditional and generative models
• Categories of dataset shift
• Practical considerations
![Page 3: When Training and Test Sets are Different: Characterising ...Why is this important? • Environments are non-stationary. • Most predictive ML models work by ignoring the different](https://reader036.vdocuments.mx/reader036/viewer/2022071416/61124e052c8a3a223424602d/html5/thumbnails/3.jpg)
Why is this important?
Use recognition algorithm developed for one device on another device.
![Page 4: When Training and Test Sets are Different: Characterising ...Why is this important? • Environments are non-stationary. • Most predictive ML models work by ignoring the different](https://reader036.vdocuments.mx/reader036/viewer/2022071416/61124e052c8a3a223424602d/html5/thumbnails/4.jpg)
Why is this important?
Will an old Network intrusion detection software still work?
![Page 5: When Training and Test Sets are Different: Characterising ...Why is this important? • Environments are non-stationary. • Most predictive ML models work by ignoring the different](https://reader036.vdocuments.mx/reader036/viewer/2022071416/61124e052c8a3a223424602d/html5/thumbnails/5.jpg)
Why is this important?
Spam filter trained on different users’ data
![Page 6: When Training and Test Sets are Different: Characterising ...Why is this important? • Environments are non-stationary. • Most predictive ML models work by ignoring the different](https://reader036.vdocuments.mx/reader036/viewer/2022071416/61124e052c8a3a223424602d/html5/thumbnails/6.jpg)
Why is this important?
• Environments are non-stationary.
• Most predictive ML models work by ignoring the different training and test scenarios.
• How to address this?
![Page 7: When Training and Test Sets are Different: Characterising ...Why is this important? • Environments are non-stationary. • Most predictive ML models work by ignoring the different](https://reader036.vdocuments.mx/reader036/viewer/2022071416/61124e052c8a3a223424602d/html5/thumbnails/7.jpg)
But first, a small point• Are dataset shift and transfer learning the same thing?
• No!
• Transfer learning is more general; transfer information from a variety of environments to help with learning and inference in a new environment.
• Dataset shift is more specific; make prediction in one environment given data in another closely related environment.
![Page 8: When Training and Test Sets are Different: Characterising ...Why is this important? • Environments are non-stationary. • Most predictive ML models work by ignoring the different](https://reader036.vdocuments.mx/reader036/viewer/2022071416/61124e052c8a3a223424602d/html5/thumbnails/8.jpg)
Preliminary: Conditional and Generative models
• Generative model: joint probability distribution over all variables of interest.
• Conditional model: distribution of some smaller set of variables is given for each possible known value of the other variables.
![Page 9: When Training and Test Sets are Different: Characterising ...Why is this important? • Environments are non-stationary. • Most predictive ML models work by ignoring the different](https://reader036.vdocuments.mx/reader036/viewer/2022071416/61124e052c8a3a223424602d/html5/thumbnails/9.jpg)
• Conditional models offer more flexibility in choosing model parameterizations.
• Fit of P(y|x) is never compromised in favor of a better fit for the unused model P(x).
• If generative model accurately specifies a known generative process, then the choice of modeling structure may fit real constraints better than a conditional model.
![Page 10: When Training and Test Sets are Different: Characterising ...Why is this important? • Environments are non-stationary. • Most predictive ML models work by ignoring the different](https://reader036.vdocuments.mx/reader036/viewer/2022071416/61124e052c8a3a223424602d/html5/thumbnails/10.jpg)
• P(y,x) = P(y|x) P(x)
• “Causal graphical model”: structure is more than just a representation of a factorization
• May happen that factorization holds even if model is not causal.
• Dataset shift still changes P(y|x) and P(x) by intervening on some possibly latent variable.
• So even non-causal conditional models are affected by dataset shift.
![Page 11: When Training and Test Sets are Different: Characterising ...Why is this important? • Environments are non-stationary. • Most predictive ML models work by ignoring the different](https://reader036.vdocuments.mx/reader036/viewer/2022071416/61124e052c8a3a223424602d/html5/thumbnails/11.jpg)
Types of Dataset Shift
Dataset Shift
Simple Covariate Shift
Prior Probability Shift
Sample Selection Bias
Imbalanced Data
Domain Shift
Source Component Shift
![Page 12: When Training and Test Sets are Different: Characterising ...Why is this important? • Environments are non-stationary. • Most predictive ML models work by ignoring the different](https://reader036.vdocuments.mx/reader036/viewer/2022071416/61124e052c8a3a223424602d/html5/thumbnails/12.jpg)
Types of Dataset Shift
Dataset Shift
Simple Covariate Shift
Prior Probability Shift
Sample Selection Bias
Imbalanced Data
Domain Shift
Source Component Shift
Distribution of x changes
![Page 13: When Training and Test Sets are Different: Characterising ...Why is this important? • Environments are non-stationary. • Most predictive ML models work by ignoring the different](https://reader036.vdocuments.mx/reader036/viewer/2022071416/61124e052c8a3a223424602d/html5/thumbnails/13.jpg)
Types of Dataset Shift
Dataset Shift
Simple Covariate Shift
Prior Probability Shift
Sample Selection Bias
Imbalanced Data
Domain Shift
Source Component Shift
Distribution of y changes
![Page 14: When Training and Test Sets are Different: Characterising ...Why is this important? • Environments are non-stationary. • Most predictive ML models work by ignoring the different](https://reader036.vdocuments.mx/reader036/viewer/2022071416/61124e052c8a3a223424602d/html5/thumbnails/14.jpg)
Types of Dataset Shift
Dataset Shift
Simple Covariate Shift
Prior Probability Shift
Sample Selection Bias
Imbalanced Data
Domain Shift
Source Component Shift
Distribution differs due to unknown sample rejection process
![Page 15: When Training and Test Sets are Different: Characterising ...Why is this important? • Environments are non-stationary. • Most predictive ML models work by ignoring the different](https://reader036.vdocuments.mx/reader036/viewer/2022071416/61124e052c8a3a223424602d/html5/thumbnails/15.jpg)
Types of Dataset Shift
Dataset Shift
Simple Covariate Shift
Prior Probability Shift
Sample Selection Bias
Imbalanced Data
Domain Shift
Source Component Shift
Deliberate dataset shift for modeling/computational convenience
![Page 16: When Training and Test Sets are Different: Characterising ...Why is this important? • Environments are non-stationary. • Most predictive ML models work by ignoring the different](https://reader036.vdocuments.mx/reader036/viewer/2022071416/61124e052c8a3a223424602d/html5/thumbnails/16.jpg)
Types of Dataset Shift
Dataset Shift
Simple Covariate Shift
Prior Probability Shift
Sample Selection Bias
Imbalanced Data
Domain Shift
Source Component Shift
Changes in measurement
![Page 17: When Training and Test Sets are Different: Characterising ...Why is this important? • Environments are non-stationary. • Most predictive ML models work by ignoring the different](https://reader036.vdocuments.mx/reader036/viewer/2022071416/61124e052c8a3a223424602d/html5/thumbnails/17.jpg)
Types of Dataset Shift
Dataset Shift
Simple Covariate Shift
Prior Probability Shift
Sample Selection Bias
Imbalanced Data
Domain Shift
Source Component Shift Changes in strength of contributing components
![Page 18: When Training and Test Sets are Different: Characterising ...Why is this important? • Environments are non-stationary. • Most predictive ML models work by ignoring the different](https://reader036.vdocuments.mx/reader036/viewer/2022071416/61124e052c8a3a223424602d/html5/thumbnails/18.jpg)
Types of Dataset Shift
Dataset Shift
Simple Covariate Shift
Prior Probability Shift
Sample Selection Bias
Imbalanced Data
Domain Shift
Source Component Shift
Distribution of x changes
![Page 19: When Training and Test Sets are Different: Characterising ...Why is this important? • Environments are non-stationary. • Most predictive ML models work by ignoring the different](https://reader036.vdocuments.mx/reader036/viewer/2022071416/61124e052c8a3a223424602d/html5/thumbnails/19.jpg)
Simple Covariate Shift
![Page 20: When Training and Test Sets are Different: Characterising ...Why is this important? • Environments are non-stationary. • Most predictive ML models work by ignoring the different](https://reader036.vdocuments.mx/reader036/viewer/2022071416/61124e052c8a3a223424602d/html5/thumbnails/20.jpg)
• y depends on that particular x, so change in distribution should not affect a prediction.
• But still some work claiming P(y|x) needs to be changes to deal with covariate shift. Why?
• If the class of P(y|x) does not contain true conditional model, then it is not actually true distribution.
Simple Covariate Shift
![Page 21: When Training and Test Sets are Different: Characterising ...Why is this important? • Environments are non-stationary. • Most predictive ML models work by ignoring the different](https://reader036.vdocuments.mx/reader036/viewer/2022071416/61124e052c8a3a223424602d/html5/thumbnails/21.jpg)
Less computation but not global
Global but wasteful
Linear with importance weighting
Local linear model
![Page 22: When Training and Test Sets are Different: Characterising ...Why is this important? • Environments are non-stationary. • Most predictive ML models work by ignoring the different](https://reader036.vdocuments.mx/reader036/viewer/2022071416/61124e052c8a3a223424602d/html5/thumbnails/22.jpg)
• Might suggest that only model known form of model for test region and ignore the others.
• But what if test region is also contaminated with several other source.
• More in source component shift.
Simple Covariate Shift
![Page 23: When Training and Test Sets are Different: Characterising ...Why is this important? • Environments are non-stationary. • Most predictive ML models work by ignoring the different](https://reader036.vdocuments.mx/reader036/viewer/2022071416/61124e052c8a3a223424602d/html5/thumbnails/23.jpg)
Two examples
• Gaussian process model is a conditional model.
• Introducing new covariate point x* has no predictive effect. (Kolmogorov consistency)
![Page 24: When Training and Test Sets are Different: Characterising ...Why is this important? • Environments are non-stationary. • Most predictive ML models work by ignoring the different](https://reader036.vdocuments.mx/reader036/viewer/2022071416/61124e052c8a3a223424602d/html5/thumbnails/24.jpg)
Two examples
• Support Vector Machine (SVM) is NOT a conditional model.
![Page 25: When Training and Test Sets are Different: Characterising ...Why is this important? • Environments are non-stationary. • Most predictive ML models work by ignoring the different](https://reader036.vdocuments.mx/reader036/viewer/2022071416/61124e052c8a3a223424602d/html5/thumbnails/25.jpg)
Two examples
• Support Vector Machine (SVM) is NOT a conditional model.
![Page 26: When Training and Test Sets are Different: Characterising ...Why is this important? • Environments are non-stationary. • Most predictive ML models work by ignoring the different](https://reader036.vdocuments.mx/reader036/viewer/2022071416/61124e052c8a3a223424602d/html5/thumbnails/26.jpg)
Types of Dataset Shift
Dataset Shift
Simple Covariate Shift
Prior Probability Shift
Sample Selection Bias
Imbalanced Data
Domain Shift
Source Component Shift
Distribution of y changes
![Page 27: When Training and Test Sets are Different: Characterising ...Why is this important? • Environments are non-stationary. • Most predictive ML models work by ignoring the different](https://reader036.vdocuments.mx/reader036/viewer/2022071416/61124e052c8a3a223424602d/html5/thumbnails/27.jpg)
Prior Probability Shift
![Page 28: When Training and Test Sets are Different: Characterising ...Why is this important? • Environments are non-stationary. • Most predictive ML models work by ignoring the different](https://reader036.vdocuments.mx/reader036/viewer/2022071416/61124e052c8a3a223424602d/html5/thumbnails/28.jpg)
• If Pte(y) is known, it is easy to correct for the new joint probability.
• What to do if it is not known?
• Given P(x|y) and the covariates of test data, certain distributions of y are more or less likely.
• Specify a prior over valid P(y) and compute posterior based on test covariates.
Prior Probability Shift
![Page 29: When Training and Test Sets are Different: Characterising ...Why is this important? • Environments are non-stationary. • Most predictive ML models work by ignoring the different](https://reader036.vdocuments.mx/reader036/viewer/2022071416/61124e052c8a3a223424602d/html5/thumbnails/29.jpg)
Types of Dataset Shift
Dataset Shift
Simple Covariate Shift
Prior Probability Shift
Sample Selection Bias
Imbalanced Data
Domain Shift
Source Component Shift
Distribution differs due to unknown sample rejection process
![Page 30: When Training and Test Sets are Different: Characterising ...Why is this important? • Environments are non-stationary. • Most predictive ML models work by ignoring the different](https://reader036.vdocuments.mx/reader036/viewer/2022071416/61124e052c8a3a223424602d/html5/thumbnails/30.jpg)
Sample Selection Bias
![Page 31: When Training and Test Sets are Different: Characterising ...Why is this important? • Environments are non-stationary. • Most predictive ML models work by ignoring the different](https://reader036.vdocuments.mx/reader036/viewer/2022071416/61124e052c8a3a223424602d/html5/thumbnails/31.jpg)
Surveys: Some people may be less willing to participate in, say, election polls.
Handwriting recognition: Some unintelligible (but important) characters may be discarded.
![Page 32: When Training and Test Sets are Different: Characterising ...Why is this important? • Environments are non-stationary. • Most predictive ML models work by ignoring the different](https://reader036.vdocuments.mx/reader036/viewer/2022071416/61124e052c8a3a223424602d/html5/thumbnails/32.jpg)
• “Regression to the mean”
• Example:
• Measure rate of illness X in several districts.
• Choose those with highest rate to try new drug.
• May improve just because of random fluctuations; incorrectly attributed to the new drug.
Sample Selection Bias
![Page 33: When Training and Test Sets are Different: Characterising ...Why is this important? • Environments are non-stationary. • Most predictive ML models work by ignoring the different](https://reader036.vdocuments.mx/reader036/viewer/2022071416/61124e052c8a3a223424602d/html5/thumbnails/33.jpg)
Some approaches
• Training:
• Test:
![Page 34: When Training and Test Sets are Different: Characterising ...Why is this important? • Environments are non-stationary. • Most predictive ML models work by ignoring the different](https://reader036.vdocuments.mx/reader036/viewer/2022071416/61124e052c8a3a223424602d/html5/thumbnails/34.jpg)
Types of Dataset Shift
Dataset Shift
Simple Covariate Shift
Prior Probability Shift
Sample Selection Bias
Imbalanced Data
Domain Shift
Source Component Shift
Deliberate dataset shift for modeling/computational convenience
![Page 35: When Training and Test Sets are Different: Characterising ...Why is this important? • Environments are non-stationary. • Most predictive ML models work by ignoring the different](https://reader036.vdocuments.mx/reader036/viewer/2022071416/61124e052c8a3a223424602d/html5/thumbnails/35.jpg)
Imbalanced Data
![Page 36: When Training and Test Sets are Different: Characterising ...Why is this important? • Environments are non-stationary. • Most predictive ML models work by ignoring the different](https://reader036.vdocuments.mx/reader036/viewer/2022071416/61124e052c8a3a223424602d/html5/thumbnails/36.jpg)
• Example: prediction of rare events like loan defaulting.
• Throw away common class to make the data balanced. This is okay since common class is already easy to characterize (large amount of data).
• But this creates imbalanced training and test distributions!
Imbalanced Data
![Page 37: When Training and Test Sets are Different: Characterising ...Why is this important? • Environments are non-stationary. • Most predictive ML models work by ignoring the different](https://reader036.vdocuments.mx/reader036/viewer/2022071416/61124e052c8a3a223424602d/html5/thumbnails/37.jpg)
• Generative models may be better at handling this issue.
• P(y,x) = P(x|y) P(y)
• How? The problem is just change in P(y) with a known shift.
• Imbalance is decoupled from modeling.
Imbalanced Data
![Page 38: When Training and Test Sets are Different: Characterising ...Why is this important? • Environments are non-stationary. • Most predictive ML models work by ignoring the different](https://reader036.vdocuments.mx/reader036/viewer/2022071416/61124e052c8a3a223424602d/html5/thumbnails/38.jpg)
Types of Dataset Shift
Dataset Shift
Simple Covariate Shift
Prior Probability Shift
Sample Selection Bias
Imbalanced Data
Domain Shift
Source Component Shift
Changes in measurement
![Page 39: When Training and Test Sets are Different: Characterising ...Why is this important? • Environments are non-stationary. • Most predictive ML models work by ignoring the different](https://reader036.vdocuments.mx/reader036/viewer/2022071416/61124e052c8a3a223424602d/html5/thumbnails/39.jpg)
Domain Shift
![Page 40: When Training and Test Sets are Different: Characterising ...Why is this important? • Environments are non-stationary. • Most predictive ML models work by ignoring the different](https://reader036.vdocuments.mx/reader036/viewer/2022071416/61124e052c8a3a223424602d/html5/thumbnails/40.jpg)
• Example: two photographs of the same scene from different cameras or in different lighting may look different.
• We never observe x0. We only observe x = f(x0).
• Modeling involves estimating f using the distributional information. Example: gamma correction for photographs
Domain Shift
![Page 41: When Training and Test Sets are Different: Characterising ...Why is this important? • Environments are non-stationary. • Most predictive ML models work by ignoring the different](https://reader036.vdocuments.mx/reader036/viewer/2022071416/61124e052c8a3a223424602d/html5/thumbnails/41.jpg)
Types of Dataset Shift
Dataset Shift
Simple Covariate Shift
Prior Probability Shift
Sample Selection Bias
Imbalanced Data
Domain Shift
Source Component Shift Changes in strength of contributing components
![Page 42: When Training and Test Sets are Different: Characterising ...Why is this important? • Environments are non-stationary. • Most predictive ML models work by ignoring the different](https://reader036.vdocuments.mx/reader036/viewer/2022071416/61124e052c8a3a223424602d/html5/thumbnails/42.jpg)
Source Component Shift
![Page 43: When Training and Test Sets are Different: Characterising ...Why is this important? • Environments are non-stationary. • Most predictive ML models work by ignoring the different](https://reader036.vdocuments.mx/reader036/viewer/2022071416/61124e052c8a3a223424602d/html5/thumbnails/43.jpg)
• Observed data is made up of a number of different sources with different characteristics.
• Proportions of these sources can vary between training and test scenarios.
• How is it different from sample selection bias? In S.S.B, change is a result of measurement process.
Source Component Shift
![Page 44: When Training and Test Sets are Different: Characterising ...Why is this important? • Environments are non-stationary. • Most predictive ML models work by ignoring the different](https://reader036.vdocuments.mx/reader036/viewer/2022071416/61124e052c8a3a223424602d/html5/thumbnails/44.jpg)
Shift or No Shift?
• Using a model designed to consider covariate shift may perform poorly on data where there is no shift.
• Must check model on real environment before rolling out.
• In the same way as semi-supervised learning can provide major benefits over unsupervised learning.
![Page 45: When Training and Test Sets are Different: Characterising ...Why is this important? • Environments are non-stationary. • Most predictive ML models work by ignoring the different](https://reader036.vdocuments.mx/reader036/viewer/2022071416/61124e052c8a3a223424602d/html5/thumbnails/45.jpg)
Thank you!