Yellowbrick: Steering Machine Learning with Visual Transformers
Rebecca Bilbro
Twitter: twitter.com/rebeccabilbro
GitHub: github.com/rebeccabilbro
Email: [email protected]
Once upon a time ...
And then things got ...
from sklearn.svm import SVC
from sklearn.naive_bayes import GaussianNB
from sklearn.ensemble import AdaBoostClassifier
from sklearn.neighbors import KNeighborsClassifier
from sklearn.ensemble import RandomForestClassifier
# sklearn.cross_validation was removed in scikit-learn 0.20; use model_selection
from sklearn.model_selection import KFold, cross_val_score

# Candidate models to compare (X and y are assumed to be loaded already)
classifiers = [
    KNeighborsClassifier(5),
    SVC(kernel="linear", C=0.025),
    RandomForestClassifier(max_depth=5),
    AdaBoostClassifier(),
    GaussianNB(),
]

kfold = KFold(n_splits=12)

# Best mean cross-validated score across all candidates
max([
    cross_val_score(model, X, y, cv=kfold).mean()
    for model in classifiers
])
Try them all!
● Search is difficult, particularly in high dimensional space.
● Even with clever optimization techniques, there is no guarantee of a solution.
● As the search space gets larger, the amount of time increases exponentially.
Except ...
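To make the last bullet concrete, here is a small sketch that counts how many configurations an exhaustive grid search would have to fit. The parameter names and value lists are made up for illustration and are not from the talk.

```python
from itertools import product
from math import prod

# Hypothetical hyperparameter grid (names and values are illustrative only)
grid = {
    "n_neighbors": [3, 5, 7, 9],
    "weights": ["uniform", "distance"],
    "metric": ["euclidean", "manhattan", "chebyshev"],
}

def grid_size(grid):
    """Number of combinations an exhaustive search must evaluate."""
    return prod(len(values) for values in grid.values())

def iter_grid(grid):
    """Yield each combination as a dict, one per candidate model."""
    keys = list(grid)
    for combo in product(*(grid[k] for k in keys)):
        yield dict(zip(keys, combo))

n_candidates = grid_size(grid)   # 4 * 2 * 3 combinations
n_fits = n_candidates * 12       # one fit per fold with 12-fold cross-validation
```

Every additional hyperparameter multiplies the count again, which is why trying everything stops being practical so quickly.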
Solution: Visual Steering
Enter Yellowbrick
● Extend the Scikit-Learn API.
● Enhance the model selection process.
● Tools for feature visualization, visual diagnostics, and visual steering.
● Not a replacement for other visualization libraries.
# Import the estimator
from sklearn.linear_model import Lasso
# Instantiate the estimator
model = Lasso()
# Fit the data to the estimator
model.fit(X_train, y_train)
# Generate a prediction
model.predict(X_test)
Scikit-Learn Estimator Interface
# Import the model and visualizer
from sklearn.linear_model import Lasso
from yellowbrick.regressor import PredictionError
# Instantiate the visualizer
visualizer = PredictionError(Lasso())
# Fit
visualizer.fit(X_train, y_train)
# Score and visualize
visualizer.score(X_test, y_test)
visualizer.poof()
Yellowbrick Visualizer Interface
How do I select the right features?
Is this room occupied?
● Given labelled data with amount of light, heat, humidity, etc.
● Which features are most predictive?
● How hard is it going to be to distinguish the empty rooms from the occupied ones?
Yellowbrick Feature Visualizers
Use RadViz or parallel coordinates to look for class separability
Yellowbrick Feature Visualizers
Use Rank2D for pairwise feature analysis
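As a rough idea of what a pairwise ranking computes, the sketch below scores every pair of feature columns with Pearson correlation in plain Python. It is not Yellowbrick's implementation. The feature names and readings are hypothetical, loosely modelled on the room-occupancy example.

```python
from itertools import combinations
from math import sqrt

def pearson(a, b):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    return cov / sqrt(var_a * var_b)

def rank_pairs(features):
    """Score every pair of named feature columns, strongest first."""
    scores = {
        (f, g): pearson(features[f], features[g])
        for f, g in combinations(features, 2)
    }
    return sorted(scores.items(), key=lambda kv: abs(kv[1]), reverse=True)

# Hypothetical sensor readings (made-up values)
features = {
    "light": [1.0, 2.0, 3.0, 4.0],
    "temperature": [2.0, 4.0, 6.0, 8.0],
    "humidity": [4.0, 3.0, 2.0, 1.5],
}
ranked = rank_pairs(features)
```

Pairs with a score near 1 or -1 carry largely redundant information, so one feature of the pair is a candidate for removal.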
Yellowbrick Feature Visualizers - for text, too!
Visualize top tokens, document distribution & part-of-speech tagging
What’s the right model to use?
Why isn’t my model predictive?
● What to do with a low-accuracy classifier?
● Check for class imbalance.
● Visual cue that we might try stratified sampling, oversampling, or getting more data.
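The class balance check can also be done numerically before plotting anything. This is a minimal sketch in plain Python. The label encoding and the 90/10 split are hypothetical.

```python
from collections import Counter

def class_balance(y):
    """Return each class's share of the labels, largest class first."""
    counts = Counter(y)
    total = sum(counts.values())
    return {label: n / total for label, n in counts.most_common()}

def imbalance_ratio(y):
    """Ratio of the most common class count to the least common one."""
    counts = Counter(y).values()
    return max(counts) / min(counts)

# Hypothetical labels: 0 = empty room, 1 = occupied room
y = [0] * 90 + [1] * 10
shares = class_balance(y)
ratio = imbalance_ratio(y)
```

With a split like this, a model that always predicts the majority class already scores 90% accuracy, so accuracy alone says little about whether the model has learned anything.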
Yellowbrick Score Visualizers
Visualize accuracy and begin to diagnose problems
Yellowbrick Score Visualizers
Visualize the distribution of error to diagnose heteroscedasticity
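A crude numeric stand-in for what a residuals plot reveals is to compare the spread of the errors for small and large predictions. The helper below is a hypothetical sketch, not a formal statistical test, and the data are made up so that the errors grow with the prediction.

```python
from statistics import pstdev

def residual_spread_by_half(y_true, y_pred):
    """Std. dev. of residuals for the lower and upper half of predictions."""
    pairs = sorted(zip(y_pred, y_true))          # order by predicted value
    residuals = [true - pred for pred, true in pairs]
    mid = len(residuals) // 2
    return pstdev(residuals[:mid]), pstdev(residuals[mid:])

# Hypothetical data: errors grow with the size of the prediction
y_pred = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0]
y_true = [1.1, 1.9, 3.1, 3.9, 6.0, 5.0, 8.0, 7.0]
low, high = residual_spread_by_half(y_true, y_pred)
```

If the two spreads differ a lot, the error variance is probably not constant, and the residuals plot should show a fan shape.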
How do I tune this thing?
What’s the right k?
● How many clusters do you see?
● How do you pick an initial value for k in k-means clustering?
● How do you know whether to increase or decrease k?
● Is partitive clustering the right choice?
Hyperparameter Tuning with Yellowbrick
Higher silhouette scores mean denser, more separate clusters.
The elbow shows the best value of k… or suggests a different algorithm.
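The elbow idea can be sketched without any libraries: fit k-means for several values of k and watch where the within-cluster sum of squares (inertia) stops dropping sharply. The 1-D k-means below is a simplified toy with a deterministic starting point, not scikit-learn's KMeans, and the data points are made up.

```python
def kmeans_1d(points, k, n_iter=20):
    """Lloyd's algorithm on 1-D data; returns (centers, inertia)."""
    data = sorted(points)
    # Deterministic init: spread the starting centers across the sorted data
    centers = [data[(2 * i + 1) * len(data) // (2 * k)] for i in range(k)]
    for _ in range(n_iter):
        clusters = [[] for _ in centers]
        for x in data:
            nearest = min(range(k), key=lambda j: (x - centers[j]) ** 2)
            clusters[nearest].append(x)
        # Keep the old center if a cluster ends up empty
        centers = [sum(c) / len(c) if c else centers[j]
                   for j, c in enumerate(clusters)]
    inertia = sum(min((x - c) ** 2 for c in centers) for x in data)
    return centers, inertia

# Hypothetical data with three obvious groups
points = [1.0, 1.2, 0.8, 5.0, 5.2, 4.8, 9.0, 9.2, 8.8]
inertias = {k: kmeans_1d(points, k)[1] for k in range(1, 6)}

# Improvement gained by going from k to k + 1 clusters
drops = {k: inertias[k] - inertias[k + 1] for k in range(1, 5)}
```

On this data the inertia falls steeply up to k = 3 and barely changes afterwards, which is the elbow. When no such bend appears, that is a hint that a partitive algorithm may not suit the data.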
Hyperparameter Tuning with Yellowbrick
Should I use Lasso, Ridge, or ElasticNet?
Is regularization even working?
Got an idea?
The main API implemented by Scikit-Learn is that of the estimator. An estimator is any object that learns from data;
it may be a classification, regression or clustering algorithm, or a transformer that extracts/filters useful features from raw data.
class Estimator(object):

    def fit(self, X, y=None):
        """
        Fits estimator to data.
        """
        # set state of self
        return self

    def predict(self, X):
        """
        Predict response of X
        """
        # compute predictions pred
        return pred
Estimators
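To see the interface with a body filled in, here is a toy estimator in plain Python. `MeanRegressor` is a hypothetical class written for this example and does not come from scikit-learn. It just follows the same fit/predict contract.

```python
class MeanRegressor:
    """Toy estimator: always predicts the mean of the training targets."""

    def fit(self, X, y=None):
        if y is None:
            raise ValueError("MeanRegressor needs targets y")
        # Learned state gets a trailing underscore, following scikit-learn's convention
        self.mean_ = sum(y) / len(y)
        return self  # returning self allows chaining fit(...).predict(...)

    def predict(self, X):
        # One prediction per input row, regardless of the row's contents
        return [self.mean_ for _ in X]

model = MeanRegressor().fit([[1], [2], [3]], [10.0, 20.0, 30.0])
pred = model.predict([[4], [5]])
```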
Transformers are special cases of Estimators -- instead of making predictions, they transform the input dataset X to a new dataset X’.
class Transformer(Estimator):

    def transform(self, X):
        """
        Transforms the input data.
        """
        # transform X to X_prime
        return X_prime
Transformers
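A concrete transformer might look like the sketch below. `RangeScaler` is a hypothetical class, and it works on a flat list of numbers rather than the 2-D arrays scikit-learn expects, purely to keep the example short.

```python
class RangeScaler:
    """Toy transformer: rescales values to [0, 1] using the fitted range."""

    def fit(self, X, y=None):
        # Learn the range from the training data only
        self.min_ = min(X)
        self.max_ = max(X)
        return self

    def transform(self, X):
        span = self.max_ - self.min_
        if span == 0:
            return [0.0 for _ in X]  # constant feature: nothing to scale
        return [(x - self.min_) / span for x in X]

    def fit_transform(self, X, y=None):
        return self.fit(X, y).transform(X)

X_prime = RangeScaler().fit_transform([2.0, 4.0, 6.0])

# Fit on training data, then apply the same scaling to new data
scaler = RangeScaler().fit([0.0, 10.0])
new_prime = scaler.transform([5.0])
```

Separating fit from transform is what lets the range learned on training data be reused unchanged on test data.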
A visualizer is an estimator that produces visualizations based on data rather than new datasets or predictions.
Visualizers are intended to work in concert with Transformers and Estimators to shed light onto the modeling process.
import matplotlib.pyplot as plt

class Visualizer(Estimator):

    def draw(self):
        """
        Draw the data
        """
        self.ax.plot()

    def finalize(self):
        """
        Complete the figure
        """
        self.ax.set_title()

    def poof(self):
        """
        Show the figure
        """
        plt.show()
Visualizers
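The same fit, draw, finalize, poof flow can be tried without matplotlib. The sketch below is a toy, text-only visualizer. `TextClassBalance` is a hypothetical class, not part of Yellowbrick, and having `fit` call `draw` is an assumption of this sketch.

```python
from collections import Counter

class TextClassBalance:
    """Toy visualizer: renders label counts as a text bar chart."""

    def __init__(self, width=20):
        self.width = width
        self.title = ""
        self.lines = []

    def fit(self, X, y):
        # Learn state from the data (X is unused here), then draw it
        self.counts_ = Counter(y)
        self.draw()
        return self

    def draw(self):
        biggest = max(self.counts_.values())
        self.lines = [
            f"{label!s:>10} | " + "#" * round(self.width * n / biggest) + f" {n}"
            for label, n in sorted(self.counts_.items(), key=lambda kv: str(kv[0]))
        ]

    def finalize(self):
        self.title = "Class balance"

    def poof(self):
        # Complete the "figure", then show it
        self.finalize()
        print(self.title)
        print("\n".join(self.lines))

labels = ["empty"] * 8 + ["occupied"] * 4
X = [[0.0]] * len(labels)  # placeholder features
viz = TextClassBalance(width=10).fit(X, labels)
```

Calling `viz.poof()` prints the titled chart. Because the visualizer keeps the estimator's `fit` signature, it can be used wherever an estimator is fitted.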
Thank you!
Twitter: twitter.com/rebeccabilbro
GitHub: github.com/rebeccabilbro
Email: [email protected]