building an ai cloud | qcon new york presentation
TRANSCRIPT
![Page 1: Building an AI Cloud | QCon New York Presentation](https://reader033.vdocuments.mx/reader033/viewer/2022052514/58f9ad86760da3da068b992c/html5/thumbnails/1.jpg)
Building an A.I. CloudWhat We Learned from PredictionIO
![Page 2: Building an AI Cloud | QCon New York Presentation](https://reader033.vdocuments.mx/reader033/viewer/2022052514/58f9ad86760da3da068b992c/html5/thumbnails/2.jpg)
Simon ChanSr. Director, Product Management, Salesforce
Co-founder, PredictionIO
PhD, University College London
![Page 3: Building an AI Cloud | QCon New York Presentation](https://reader033.vdocuments.mx/reader033/viewer/2022052514/58f9ad86760da3da068b992c/html5/thumbnails/3.jpg)
The A.I. Developer Platform Dilemma
FlexibleSimple
![Page 4: Building an AI Cloud | QCon New York Presentation](https://reader033.vdocuments.mx/reader033/viewer/2022052514/58f9ad86760da3da068b992c/html5/thumbnails/4.jpg)
“ Every prediction problem is unique.
![Page 5: Building an AI Cloud | QCon New York Presentation](https://reader033.vdocuments.mx/reader033/viewer/2022052514/58f9ad86760da3da068b992c/html5/thumbnails/5.jpg)
3 Approaches to Customize Prediction
GUIAutomationCustom
Code & Script
FlexibleSimple
![Page 6: Building an AI Cloud | QCon New York Presentation](https://reader033.vdocuments.mx/reader033/viewer/2022052514/58f9ad86760da3da068b992c/html5/thumbnails/6.jpg)
10 KEY STEPSto build-your-own A.I.
P.S. Choices = Complexity
![Page 7: Building an AI Cloud | QCon New York Presentation](https://reader033.vdocuments.mx/reader033/viewer/2022052514/58f9ad86760da3da068b992c/html5/thumbnails/7.jpg)
One platform, build multiple apps. Here are 3 examples.
1. E-Commerce
Recommend products
2. Subscription
Predict churn
3. Social Network
Detect spam
![Page 8: Building an AI Cloud | QCon New York Presentation](https://reader033.vdocuments.mx/reader033/viewer/2022052514/58f9ad86760da3da068b992c/html5/thumbnails/8.jpg)
Let’s Go Beyond Textbook Tutorial
// Example from Spark ML website
import org.apache.spark.ml.classification.LogisticRegression
// Load training data
val training = sqlCtx.read.format("libsvm").load("data/mllib/sample_libsvm_data.txt")
val lr = new LogisticRegression().setMaxIter(10).setRegParam(0.3).setElasticNetParam(0.8)
// Fit the model
val lrModel = lr.fit(training)
![Page 9: Building an AI Cloud | QCon New York Presentation](https://reader033.vdocuments.mx/reader033/viewer/2022052514/58f9ad86760da3da068b992c/html5/thumbnails/9.jpg)
Define the Prediction ProblemBe clear about the goal
1
![Page 10: Building an AI Cloud | QCon New York Presentation](https://reader033.vdocuments.mx/reader033/viewer/2022052514/58f9ad86760da3da068b992c/html5/thumbnails/10.jpg)
Define the Prediction Problem
Basic Ideas:◦ What is the Business Goal?
▫ Better user experience▫ Maximize revenue▫ Automate a manual task▫ Forecast future events
◦ What’s the input query?◦ What’s the prediction output?◦ What is a good prediction?◦ What is a bad prediction?
![Page 11: Building an AI Cloud | QCon New York Presentation](https://reader033.vdocuments.mx/reader033/viewer/2022052514/58f9ad86760da3da068b992c/html5/thumbnails/11.jpg)
Decide on the PresentationIt’s (still) all about human perception
2
![Page 12: Building an AI Cloud | QCon New York Presentation](https://reader033.vdocuments.mx/reader033/viewer/2022052514/58f9ad86760da3da068b992c/html5/thumbnails/12.jpg)
Decide on the Presentation
NOT SPAM SPAM
NOT SPAM True Negative False Negative
SPAM False Positive True Positive
Actual
Predicted
Mailing List and Social Network, for example, may tolerate false predictions differently
![Page 13: Building an AI Cloud | QCon New York Presentation](https://reader033.vdocuments.mx/reader033/viewer/2022052514/58f9ad86760da3da068b992c/html5/thumbnails/13.jpg)
Decide on the Presentation
Some UX/UI Choices:◦ Toleration to Bad Prediction?◦ Suggestive or Decisive?◦ Prediction Explainability? ◦ Intelligence Visualization?◦ Human Interactive?◦ Score; Ranking; Comparison; Charts; Groups◦ Feedback Interface
▫ Explicit or Implicit
![Page 14: Building an AI Cloud | QCon New York Presentation](https://reader033.vdocuments.mx/reader033/viewer/2022052514/58f9ad86760da3da068b992c/html5/thumbnails/14.jpg)
Import Free-form Data SourceLife is more complicated than MNIST and MovieLens datasets
3
![Page 15: Building an AI Cloud | QCon New York Presentation](https://reader033.vdocuments.mx/reader033/viewer/2022052514/58f9ad86760da3da068b992c/html5/thumbnails/15.jpg)
Import Free-form Data Sources
Some Types of Data:◦ User Attributes◦ Item (product/content) Attributes◦ Activities / Events
Estimate (guess) what you need.
![Page 16: Building an AI Cloud | QCon New York Presentation](https://reader033.vdocuments.mx/reader033/viewer/2022052514/58f9ad86760da3da068b992c/html5/thumbnails/16.jpg)
Import Free-form Data Sources
Some Ways to Transfer Data:◦ Transactional versus Batch◦ Batch Frequency◦ Full data or Changed Delta Only
Don’t forget continuous data sanity checking
![Page 17: Building an AI Cloud | QCon New York Presentation](https://reader033.vdocuments.mx/reader033/viewer/2022052514/58f9ad86760da3da068b992c/html5/thumbnails/17.jpg)
Construct Features & Labelsfrom DataMake it algorithm-friendly!
4
![Page 18: Building an AI Cloud | QCon New York Presentation](https://reader033.vdocuments.mx/reader033/viewer/2022052514/58f9ad86760da3da068b992c/html5/thumbnails/18.jpg)
Construct Features & Labels from Data
Some Ways of Transformation:◦ Qualitative to Numerical◦ Normalize and Weight◦ Aggregate - Sum, Average?◦ Time Range◦ Missing Records
Different algorithms may need different things
Label-specific:◦ Delayed Feedback◦ Implicit Assumptions◦ Reliability of Explicit Opinion
![Page 19: Building an AI Cloud | QCon New York Presentation](https://reader033.vdocuments.mx/reader033/viewer/2022052514/58f9ad86760da3da068b992c/html5/thumbnails/19.jpg)
Construct Features & Labels from Data
Qualitative to Numerical
// Example from Spark ML website - TF-IDF
import org.apache.spark.ml.feature.{HashingTF, IDF, Tokenizer}
val sentenceData = sqlContext.createDataFrame(Seq(
(0, "Hi I heard about Spark"),
(0, "I wish Java could use case classes"),
(1, "Logistic regression models are neat")
)).toDF("label", "sentence")
val tokenizer = new Tokenizer().setInputCol("sentence").setOutputCol("words")
val wordsData = tokenizer.transform(sentenceData)
val hashingTF = new HashingTF()
.setInputCol("words").setOutputCol("rawFeatures").setNumFeatures(20)
val featurizedData = hashingTF.transform(wordsData)
val idf = new IDF().setInputCol("rawFeatures").setOutputCol("features")
val idfModel = idf.fit(featurizedData)
val rescaledData = idfModel.transform(featurizedData)
![Page 20: Building an AI Cloud | QCon New York Presentation](https://reader033.vdocuments.mx/reader033/viewer/2022052514/58f9ad86760da3da068b992c/html5/thumbnails/20.jpg)
Set Evaluation MetricsMeasure things that matter
5
![Page 21: Building an AI Cloud | QCon New York Presentation](https://reader033.vdocuments.mx/reader033/viewer/2022052514/58f9ad86760da3da068b992c/html5/thumbnails/21.jpg)
Set Evaluation Metrics
Some Challenges:◦ How to Define an Offline Evaluation that
Reflects Real Business Goal?◦ Delayed Feedback (again)◦ How to Present The Results to Everyone?◦ How to Do Live A/B Test?
![Page 22: Building an AI Cloud | QCon New York Presentation](https://reader033.vdocuments.mx/reader033/viewer/2022052514/58f9ad86760da3da068b992c/html5/thumbnails/22.jpg)
Clarify “Real-time”The same word can mean different things
6
![Page 23: Building an AI Cloud | QCon New York Presentation](https://reader033.vdocuments.mx/reader033/viewer/2022052514/58f9ad86760da3da068b992c/html5/thumbnails/23.jpg)
Clarify “Real-time”
Different Needs:◦ Batch Model Update, Batch Queries◦ Batch Model Update, Real-time Queries◦ Real-time Model Update, Real-time Queries
When to Train/Re-train for Batch?
![Page 24: Building an AI Cloud | QCon New York Presentation](https://reader033.vdocuments.mx/reader033/viewer/2022052514/58f9ad86760da3da068b992c/html5/thumbnails/24.jpg)
Find the Right ModelThe “cool” modeling part - algorithms and hyperparameters
7
![Page 25: Building an AI Cloud | QCon New York Presentation](https://reader033.vdocuments.mx/reader033/viewer/2022052514/58f9ad86760da3da068b992c/html5/thumbnails/25.jpg)
Find the Right Model
Example of model hyperparameter selection
// Example from Spark ML website
// We use a ParamGridBuilder to construct a grid of parameters to search over.
val paramGrid = new ParamGridBuilder()
.addGrid(hashingTF.numFeatures, Array(10, 100, 1000))
.addGrid(lr.regParam, Array(0.1, 0.01)).build()
// Note that the evaluator here is a BinaryClassificationEvaluator and its default metric
// is areaUnderROC.
val cv = new CrossValidator()
.setEstimator(pipeline).setEvaluator(new BinaryClassificationEvaluator)
.setEstimatorParamMaps(paramGrid)
.setNumFolds(2) // Use 3+ in practice
// Run cross-validation, and choose the best set of parameters.
val cvModel = cv.fit(training)
![Page 26: Building an AI Cloud | QCon New York Presentation](https://reader033.vdocuments.mx/reader033/viewer/2022052514/58f9ad86760da3da068b992c/html5/thumbnails/26.jpg)
Find the Right Model
Some Typical Challenges:◦ Classification, Regression, Recommendation
or Something Else?◦ Overfitting / Underfitting◦ Cold-Start (New Users/Items)◦ Data Size◦ Noise
![Page 27: Building an AI Cloud | QCon New York Presentation](https://reader033.vdocuments.mx/reader033/viewer/2022052514/58f9ad86760da3da068b992c/html5/thumbnails/27.jpg)
Serve PredictionsTime to Use the Result
8
![Page 28: Building an AI Cloud | QCon New York Presentation](https://reader033.vdocuments.mx/reader033/viewer/2022052514/58f9ad86760da3da068b992c/html5/thumbnails/28.jpg)
Serve Predictions
Some Approaches:◦ Real-time Scoring◦ Batch Scoring
Real-time Business Logics/Filters is often added on top.
![Page 29: Building an AI Cloud | QCon New York Presentation](https://reader033.vdocuments.mx/reader033/viewer/2022052514/58f9ad86760da3da068b992c/html5/thumbnails/29.jpg)
Collect Feedback for ImprovementMachine Learning is all about “Learning”
9
![Page 30: Building an AI Cloud | QCon New York Presentation](https://reader033.vdocuments.mx/reader033/viewer/2022052514/58f9ad86760da3da068b992c/html5/thumbnails/30.jpg)
Collect Feedback for Improvement
Some Mechanism:◦ Explicit User Feedback
▫ Allow users to correct, or express opinion on, prediction manually
◦ Implicit Feedback
▫ Learn from subsequence effects of the previous predictions
▫ Compare predicted results with the actual reality
![Page 31: Building an AI Cloud | QCon New York Presentation](https://reader033.vdocuments.mx/reader033/viewer/2022052514/58f9ad86760da3da068b992c/html5/thumbnails/31.jpg)
Keep Monitoring & Be Crisis-ReadyMake sure things are still working
10
![Page 32: Building an AI Cloud | QCon New York Presentation](https://reader033.vdocuments.mx/reader033/viewer/2022052514/58f9ad86760da3da068b992c/html5/thumbnails/32.jpg)
Keep Monitoring & Be Crisis-Ready
Some Ideas:◦ Real-time Alert (email, slack, pagerduty)◦ Daily Reports◦ Possibly integrate with existing monitoring tools◦ Ready for production rollback◦ Version control
For both prediction accuracy and production issues.
![Page 33: Building an AI Cloud | QCon New York Presentation](https://reader033.vdocuments.mx/reader033/viewer/2022052514/58f9ad86760da3da068b992c/html5/thumbnails/33.jpg)
Summary: Processes We Need to Simplify
Define the Prediction Problem
Decide on the Presentation
Import Free-form Data Sources
Construct Features & Labels from Data
Set Evaluation Metrics
Clarify “Real-Time”
Find the Right Model
Serve Predictions
Collect Feedback for Improvement
Keep Monitoring & Be Crisis-Ready
![Page 34: Building an AI Cloud | QCon New York Presentation](https://reader033.vdocuments.mx/reader033/viewer/2022052514/58f9ad86760da3da068b992c/html5/thumbnails/34.jpg)
The Future of A.I.
is the automation of A.I.