machine learning with sas viya - ottawa sas...
TRANSCRIPT
Machine Learning with SAS Viya
Presented By: Sabrina Mancini, Solutions Specialist at SAS
Copyright © SAS Institute Inc. All rights reserved.
Agenda
• Overview of SAS Viya
• Machine Learning
• Machine Learning in SAS
• Demonstration
Overview of SAS Viya
High Level Conceptual Architecture
SAS Viya
APIs/UIs (Products and Solutions):
Web based applications:
• SAS Visual Analytics
• SAS Visual Statistics
• SAS Decision Manager
• SAS Data Preparation
• SAS Studio
• SAS Visual Data Mining and Machine Learning
Mobile applications:
• SAS Visual Analytics Mobile
Application Services: SAS Metadata & Middle Tiers
Run-time Engines: MVA, CAS, MAS
Security, Governance, Administration: Directory Services, Authentication, Encryption
Data Access (ACCESS engines):
• Hadoop: Cloudera, Hortonworks
• Relational databases: Teradata, Oracle
• Other sources: Excel workbooks, HTML files, Text files
Host Environments: On-premises, Private, Public, Hybrid Cloud
Machine Learning
What is Machine Learning?
Using iterative processes, machine learning builds models that automatically adapt with little or no human intervention.
Machine Learning and Inferential Statistics
There is a lot of overlap!
Machine Learning
• Results-driven
• Black box analytics
Inferential Statistics
• Inferential
• White box analytics
Machine Learning Accuracy and Interpretability
• Machine learning automates the model building process by:
• learning iteratively from data to identify patterns and
• predicting future results with minimal human intervention
• Machine learning emphasizes predictive accuracy over interpretability of the models
[Figure panels: Sample data, Traditional regression, Neural Network]
Machine Learning Categories
• Supervised Learning
• Semi-Supervised Learning
• Unsupervised Learning
• Reinforcement Learning
Your chosen technique depends on your problem and your data
Supervised Learning
Trained on labeled observations
Has a target variable
Classification, Prediction
Algorithms: Logistic Regression, Gradient Boosting, etc.
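As a rough illustration of supervised learning, the sketch below fits a logistic regression to labeled observations with plain gradient descent. It is not SAS code: the synthetic data, the function names, and the learning-rate and epoch settings are all assumptions made for this example.

```python
import math
import random


def sigmoid(z):
    """Logistic function, written to avoid overflow for large |z|."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)


def train_logistic_regression(rows, labels, lr=0.1, epochs=500):
    """Fit weights and a bias with batch gradient descent on the log loss."""
    n_features = len(rows[0])
    weights = [0.0] * n_features
    bias = 0.0
    n = len(rows)
    for _ in range(epochs):
        grad_w = [0.0] * n_features
        grad_b = 0.0
        for x, y in zip(rows, labels):
            p = sigmoid(sum(w * xi for w, xi in zip(weights, x)) + bias)
            err = p - y  # gradient of the log loss with respect to the logit
            for j in range(n_features):
                grad_w[j] += err * x[j]
            grad_b += err
        weights = [w - lr * g / n for w, g in zip(weights, grad_w)]
        bias -= lr * grad_b / n
    return weights, bias


def predict(weights, bias, x):
    """Return the predicted class (0 or 1) for one observation."""
    score = sum(w * xi for w, xi in zip(weights, x)) + bias
    return 1 if sigmoid(score) >= 0.5 else 0


# Synthetic labeled data (an assumption): two Gaussian blobs, one per class.
rng = random.Random(0)
rows = [[rng.gauss(-2, 1), rng.gauss(-2, 1)] for _ in range(50)]
rows += [[rng.gauss(2, 1), rng.gauss(2, 1)] for _ in range(50)]
labels = [0] * 50 + [1] * 50  # the target variable

weights, bias = train_logistic_regression(rows, labels)
accuracy = sum(predict(weights, bias, x) == y for x, y in zip(rows, labels)) / len(rows)
print(f"training accuracy: {accuracy:.2f}")
```

The target variable (`labels`) is what makes this supervised: every weight update is driven by the gap between the predicted probability and the known label.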
Unsupervised Learning
Trained on unlabeled observations
No target variable
Clustering, Feature Extraction
Algorithms: K-means clustering, PCA, etc.
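To make the clustering idea concrete, here is a minimal k-means sketch in plain Python. The data, the helper names, and the fixed iteration count are assumptions chosen for illustration, and this is not how SAS Viya implements the algorithm.

```python
import random


def squared_distance(a, b):
    """Squared Euclidean distance between two points."""
    return sum((ai - bi) ** 2 for ai, bi in zip(a, b))


def assign(points, centroids):
    """Index of the nearest centroid for every point."""
    return [
        min(range(len(centroids)), key=lambda c: squared_distance(p, centroids[c]))
        for p in points
    ]


def k_means(points, k, iterations=20, seed=0):
    """Lloyd's algorithm: alternate assignment and centroid-update steps."""
    rng = random.Random(seed)
    centroids = [list(p) for p in rng.sample(points, k)]
    for _ in range(iterations):
        assignments = assign(points, centroids)
        for c in range(k):
            members = [p for p, a in zip(points, assignments) if a == c]
            if members:  # keep the old centroid if a cluster is empty
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return centroids, assign(points, centroids)


# Unlabeled synthetic data (an assumption): two tight, well-separated blobs.
rng = random.Random(1)
blob_a = [[rng.gauss(0, 0.5), rng.gauss(0, 0.5)] for _ in range(50)]
blob_b = [[rng.gauss(10, 0.5), rng.gauss(10, 0.5)] for _ in range(50)]
points = blob_a + blob_b

centroids, clusters = k_means(points, k=2)
print(centroids)
```

No target variable appears anywhere: the cluster labels come only from the geometry of the observations.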
Semi-Supervised Learning
Uses labeled and unlabeled observations
Classification, Regression, Prediction
Algorithms: Autoencoders, TSVM, etc.
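The slide names autoencoders and TSVM. The sketch below swaps in a simpler technique, self-training with a nearest-centroid classifier, only to show how a few labeled observations and many unlabeled ones can be combined. All names, data, and settings are assumptions made for this example.

```python
import random


def dist2(a, b):
    """Squared Euclidean distance between two points."""
    return sum((ai - bi) ** 2 for ai, bi in zip(a, b))


def class_centroids(labeled):
    """Mean point of each class (0 and 1) in a list of (point, label) pairs."""
    result = {}
    for c in (0, 1):
        members = [p for p, y in labeled if y == c]
        result[c] = [sum(col) / len(members) for col in zip(*members)]
    return result


def self_train(labeled, unlabeled, rounds=5):
    """Repeatedly pseudo-label the most confident half of the unlabeled points."""
    labeled = list(labeled)
    remaining = list(unlabeled)
    for _ in range(rounds):
        if not remaining:
            break
        cents = class_centroids(labeled)
        scored = []
        for p in remaining:
            d0, d1 = dist2(p, cents[0]), dist2(p, cents[1])
            # A large gap between the two distances means a confident prediction.
            scored.append((abs(d0 - d1), p, 0 if d0 < d1 else 1))
        scored.sort(key=lambda item: item[0], reverse=True)
        take = max(1, len(scored) // 2)
        labeled.extend((p, y) for _, p, y in scored[:take])
        remaining = [p for _, p, _ in scored[take:]]
    return class_centroids(labeled)


# Four labeled seeds and 100 unlabeled points (synthetic, an assumption).
rng = random.Random(2)
seeds = [([-3.0, -3.0], 0), ([-2.5, -3.5], 0), ([3.0, 3.0], 1), ([3.5, 2.5], 1)]
unlabeled = [[rng.gauss(-3, 1), rng.gauss(-3, 1)] for _ in range(50)]
unlabeled += [[rng.gauss(3, 1), rng.gauss(3, 1)] for _ in range(50)]
true_labels = [0] * 50 + [1] * 50  # held back, used only to score the result

final_centroids = self_train(seeds, unlabeled)
predictions = [
    0 if dist2(p, final_centroids[0]) < dist2(p, final_centroids[1]) else 1
    for p in unlabeled
]
accuracy = sum(p == t for p, t in zip(predictions, true_labels)) / len(true_labels)
print(f"accuracy on the originally unlabeled points: {accuracy:.2f}")
```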
Reinforcement Learning
A machine learning technique where the goal is to learn a behaviour strategy that maximizes the long-term sum of rewards in an unknown and stochastic environment.
Optimization +
Unlike standard supervised learning, correct input/output pairs need not be presented, and
sub-optimal actions need not be explicitly corrected. Instead the focus is on performance,
which involves finding a balance between exploration (of uncharted territory) and exploitation
(of current knowledge).
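The exploration/exploitation balance can be sketched with an epsilon-greedy multi-armed bandit, one of the simplest reinforcement learning settings. The reward means, the value of epsilon, and the function name are assumptions chosen for illustration.

```python
import random


def run_epsilon_greedy(true_means, steps=5000, epsilon=0.1, seed=0):
    """Play a stochastic multi-armed bandit with an epsilon-greedy strategy."""
    rng = random.Random(seed)
    n_arms = len(true_means)
    counts = [0] * n_arms        # how often each arm was pulled
    estimates = [0.0] * n_arms   # running average reward per arm
    total_reward = 0.0
    for _ in range(steps):
        if rng.random() < epsilon:
            arm = rng.randrange(n_arms)  # explore: try a random arm
        else:
            arm = max(range(n_arms), key=lambda a: estimates[a])  # exploit
        reward = rng.gauss(true_means[arm], 1.0)  # noisy reward, unknown to the agent
        counts[arm] += 1
        estimates[arm] += (reward - estimates[arm]) / counts[arm]
        total_reward += reward
    return estimates, counts, total_reward


estimates, counts, total_reward = run_epsilon_greedy([0.2, 0.5, 1.5])
print("pull counts per arm:", counts)
print("estimated mean rewards:", [round(e, 2) for e in estimates])
```

No correct action is ever shown to the agent. It only observes rewards, and the occasional random pull keeps it from locking onto an arm that merely looked good early on.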
Modern Machine Learning Models
The black box
How can we generate models which are not only accurate, but also:
Fair
Accountable
Transparent
Trustworthy
Explainable
?
Interpretability Enables Trust in AI Models
Figure out when NOT to trust a model
You are detecting snow, not wolves! I can’t trust you
Prediction accuracy is very high. It is time to put this system online.
Neural network to predict wolf vs husky
"Why Should I Trust You?": Explaining the Predictions of Any Classifier. Marco Tulio Ribeiro, Sameer Singh, Carlos Guestrin. ACM SIGKDD, 2016
Data scientist
Post-Modeling Diagnostics
Input-Output Relationship
Question: What are the top inputs?
Technique: Variable Importance (VI) / Relative VI
Question: How do the drivers work?
Technique: Partial Dependence (PD), Individual Conditional Expectation (ICE)
Question: What is the explanation for a particular prediction?
Technique: Local Interpretable Model-agnostic Explanations (LIME)
• Model agnostic
• Visual
• Can be used to compare models
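Partial dependence and ICE can be computed for any model that exposes a prediction function, which is what makes them model agnostic. The sketch below assumes a hypothetical `black_box` function standing in for a trained model. The function names and the tiny data set are illustrative only.

```python
def black_box(x):
    """Hypothetical stand-in for a trained model; the third input is ignored."""
    return 3.0 * x[0] + x[1] ** 2


def ice_curves(model, rows, feature, grid):
    """One curve per observation: predictions as a single input is varied."""
    curves = []
    for row in rows:
        curve = []
        for value in grid:
            modified = list(row)
            modified[feature] = value  # overwrite only the input of interest
            curve.append(model(modified))
        curves.append(curve)
    return curves


def partial_dependence(model, rows, feature, grid):
    """Average of the ICE curves at each grid value."""
    curves = ice_curves(model, rows, feature, grid)
    return [sum(col) / len(col) for col in zip(*curves)]


rows = [[0.0, 1.0, 5.0], [1.0, 2.0, -3.0], [2.0, 3.0, 0.5]]
grid = [0.0, 1.0, 2.0]

pd_first = partial_dependence(black_box, rows, feature=0, grid=grid)
pd_third = partial_dependence(black_box, rows, feature=2, grid=grid)
print("PD for input 0:", pd_first)  # rises by 3 per unit: a strong driver
print("PD for input 2:", pd_third)  # flat: the model ignores this input
```

Because only `model(modified)` is called, the same two functions could be pointed at different models to compare how each one uses an input.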
Machine Learning Workflow
Best Practices
1. Define the problem: What data is available, and what are you trying to predict? Will you need to collect more data, hire people to manually label a dataset, or use a pre-trained model to initialise your weights?
2. Identify a way to reliably measure success on your goal. For simple tasks, this may be prediction accuracy, but in many cases it will require sophisticated domain-specific metrics.
3. Prepare the validation process that you’ll use to evaluate your models. In particular, you should define a training set, a validation set, and a test set.
4. Enrich and randomize the data by shuffling the training data and applying random augmentations/filters depending on the use case (normalization, convolutions, etc.). Start with a random subset of the data while designing the architecture.
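Steps 3 and 4 can be sketched as a shuffled three-way split followed by normalization. The 60/20/20 proportions, the seed, and the helper names are assumptions. Learning the scaling statistics from the training set alone is one common way to keep information from the validation and test sets out of training.

```python
import random
import statistics


def split_dataset(rows, train_frac=0.6, valid_frac=0.2, seed=42):
    """Shuffle a copy of the rows, then cut it into train/validation/test."""
    shuffled = list(rows)
    random.Random(seed).shuffle(shuffled)
    n_train = round(len(shuffled) * train_frac)
    n_valid = round(len(shuffled) * valid_frac)
    train = shuffled[:n_train]
    valid = shuffled[n_train:n_train + n_valid]
    test = shuffled[n_train + n_valid:]
    return train, valid, test


def fit_scaler(train_values):
    """Learn normalization statistics from the training set only."""
    mean = statistics.mean(train_values)
    std = statistics.pstdev(train_values) or 1.0  # guard against zero spread
    return mean, std


def apply_scaler(values, mean, std):
    """Standardize values using statistics learned elsewhere."""
    return [(v - mean) / std for v in values]


values = [float(i) for i in range(100)]  # toy single-column data set
train, valid, test = split_dataset(values)
mean, std = fit_scaler(train)
train_scaled = apply_scaler(train, mean, std)
valid_scaled = apply_scaler(valid, mean, std)
test_scaled = apply_scaler(test, mean, std)
print(len(train), len(valid), len(test))
```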
Machine Learning Workflow
Best Practices (cont.)
5. Develop a first model that beats a trivial common-sense baseline, thus demonstrating that machine learning can work on your problem. This may not always be the case!
6. Gradually refine your model architecture by tuning hyper-parameters and adding regularization. Make changes based on performance on the validation data only, not the test data or the training data.
7. Be aware of validation-set overfitting when tuning hyper-parameters: your hyper-parameters may end up being overspecialized to the validation set. Avoiding this is the purpose of having a separate test set!
8. Most common ways to prevent overfitting:
• Get more training data
• Reduce architecture complexity
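Steps 5 to 7 can be sketched as: score a trivial baseline, tune one hyper-parameter on the validation set only, and touch the test set once at the end. The k-nearest-neighbour classifier, the candidate values of k, and the synthetic data are assumptions picked to keep the example short.

```python
import random
from collections import Counter


def make_data(n, rng):
    """Synthetic two-class data: one Gaussian blob per class."""
    data = []
    for _ in range(n):
        label = rng.randrange(2)
        center = 1.5 if label else -1.5
        data.append(([rng.gauss(center, 1.0), rng.gauss(center, 1.0)], label))
    return data


def knn_predict(train, x, k):
    """Majority vote among the k training points closest to x."""
    nearest = sorted(
        train, key=lambda item: sum((a - b) ** 2 for a, b in zip(item[0], x))
    )[:k]
    return Counter(label for _, label in nearest).most_common(1)[0][0]


def knn_accuracy(train, data, k):
    return sum(knn_predict(train, x, k) == y for x, y in data) / len(data)


def majority_baseline(train, data):
    """Trivial baseline: always predict the most frequent training label."""
    majority = Counter(y for _, y in train).most_common(1)[0][0]
    return sum(y == majority for _, y in data) / len(data)


rng = random.Random(1)
train, valid, test = make_data(200, rng), make_data(100, rng), make_data(100, rng)

baseline_acc = majority_baseline(train, valid)
candidates = [1, 3, 5, 7, 9, 15]
# Hyper-parameter choice uses the validation set only.
best_k = max(candidates, key=lambda k: knn_accuracy(train, valid, k))
# The test set is used exactly once, after all tuning decisions are made.
test_acc = knn_accuracy(train, test, best_k)
print(f"baseline={baseline_acc:.2f} best_k={best_k} test={test_acc:.2f}")
```

If the tuned model does not clearly beat `baseline_acc`, that is a signal to revisit the data or the problem definition before adding complexity.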
Machine Learning in SAS
How Do I Use Machine Learning in SAS Viya?
There are Multiple Ways
SAS Studio: VDMML, Visual Statistics
SAS Visual Analytics: VDMML, Visual Statistics
SAS Model Studio: VDMML, Visual Statistics
SAS Visual Data Mining and Machine Learning
• Utilizes ML and Deep Learning Models to analyze structured and unstructured data
• Combines advanced analytics, data prep, visualization, model assessment, and deployment in a single environment
• Supports a GUI and Coding Interface
SAS Visual Data Mining and Machine Learning
• Brings together all the members of an analytical team in one open and collaborative environment
• Supports Open Source programming languages
In Memory Algorithms
STATISTICS
• Cox Proportional Hazards
• Decision Trees
• Design Matrix
• General Additive Models
• Generalized Linear Models
• K-means and K-modes Clustering
• Linear Regression
• Logistic Regression
• Nonlinear Regression
• Ordinary Least Squares Regression
• Partial Least Squares Regression
• Pearson Correlation
• Principal Component Analysis
• Quantile Regression
• Shewhart Control Chart Analysis
MACHINE LEARNING
• Bayesian Networks
• Boolean Rules
• Factorization Machines
• Frequent Item Set Mining
• Gradient Boosting
• K Nearest Neighbor
• Market Basket Analysis
• Moving Windows PCA
• Network Analytics/Community Detection
• Neural Networks
• Random Forest
• Robust PCA
• Support Vector Data Description
• Support Vector Machines
• Text Mining
• Variable Clustering
DEEP LEARNING
• Deep Forward Neural Networks (DNNs)
• Convolutional Neural Networks (CNNs): support VGG-like, ResNet models
• Recurrent Neural Networks (RNNs): support LSTM, GRU model
• Autoencoders for neural networks
• Image processing extensions: Augment image action, Convert image table action, Match images action
• 2D/3D medical image visualization
Machine Learning and Deep Learning
[Diagram]
Sources (Cloud or On-Premise): CSV file, NFS, Social Data, Oracle, Hadoop, Excel
Data access: SAS Data Connector, SAS ACCESS
SAS Machine Learning, Deep Learning (VDMML)
SAS Visual Analytics (VA)
APIs / REST: Python, R, Lua, Java, Scala, SAS Clients
Demonstration
SAS Government Analytics Leadership Forum: May 16, 2019
Register at the Front Desk or Online!
VDMML Workshop: June 5th at the SAS Office
Sign up online!