Predictive analytics and big data tutorial
DESCRIPTION
This presentation covers data science buzzwords, an introduction to big data, predictive analytics, and model-building methods, including structured vs. unstructured data and supervised vs. unsupervised learning.
TRANSCRIPT
![Page 1: Predictive analytics and big data tutorial](https://reader037.vdocuments.mx/reader037/viewer/2022103109/5451a839af7959b5648b63f2/html5/thumbnails/1.jpg)
Ben Taylor @bentaylordata
Predictive Analytics / Data Science
![Page 2: Predictive analytics and big data tutorial](https://reader037.vdocuments.mx/reader037/viewer/2022103109/5451a839af7959b5648b63f2/html5/thumbnails/2.jpg)
Presentation Objectives
• Enable you to be smarter than your prospect (data history / lingo)
• Motivate you to be unstoppable and hyper-confident
• Motivate you to begin looking for data-driven opportunities
• Motivate you to become a data scientist
![Page 3: Predictive analytics and big data tutorial](https://reader037.vdocuments.mx/reader037/viewer/2022103109/5451a839af7959b5648b63f2/html5/thumbnails/3.jpg)
"What the hell is cloud computing?" - Larry Ellison, CEO, Oracle
![Page 4: Predictive analytics and big data tutorial](https://reader037.vdocuments.mx/reader037/viewer/2022103109/5451a839af7959b5648b63f2/html5/thumbnails/4.jpg)
What is cloud computing?
?
![Page 5: Predictive analytics and big data tutorial](https://reader037.vdocuments.mx/reader037/viewer/2022103109/5451a839af7959b5648b63f2/html5/thumbnails/5.jpg)
What is big data?
Big data includes datasets or problems that exceed the capacity of a single computer and require a distributed data access system.
"Big" is relative to conventional systems and technology, so the threshold will shift as memory and storage solutions advance.
http://www.pcmag.com/article2/0,2817,2453838,00.asp
![Page 6: Predictive analytics and big data tutorial](https://reader037.vdocuments.mx/reader037/viewer/2022103109/5451a839af7959b5648b63f2/html5/thumbnails/6.jpg)
Big data trends
![Page 7: Predictive analytics and big data tutorial](https://reader037.vdocuments.mx/reader037/viewer/2022103109/5451a839af7959b5648b63f2/html5/thumbnails/7.jpg)
What is a data scientist?
![Page 8: Predictive analytics and big data tutorial](https://reader037.vdocuments.mx/reader037/viewer/2022103109/5451a839af7959b5648b63f2/html5/thumbnails/8.jpg)
What is a data scientist?
Engineering Finance Economics Mathematics Computer Science Physics
Data Science 6-10 yrs
Python Bootcamp $8,000 (3 months)
$16,000-$4,000 (3 months)
$115K avg
![Page 9: Predictive analytics and big data tutorial](https://reader037.vdocuments.mx/reader037/viewer/2022103109/5451a839af7959b5648b63f2/html5/thumbnails/9.jpg)
What is a data scientist?
![Page 10: Predictive analytics and big data tutorial](https://reader037.vdocuments.mx/reader037/viewer/2022103109/5451a839af7959b5648b63f2/html5/thumbnails/10.jpg)
What is a data scientist?
Master Builder
![Page 11: Predictive analytics and big data tutorial](https://reader037.vdocuments.mx/reader037/viewer/2022103109/5451a839af7959b5648b63f2/html5/thumbnails/11.jpg)
What is a data scientist?
Reality distortion: Hyper-confidence
![Page 12: Predictive analytics and big data tutorial](https://reader037.vdocuments.mx/reader037/viewer/2022103109/5451a839af7959b5648b63f2/html5/thumbnails/12.jpg)
![Page 13: Predictive analytics and big data tutorial](https://reader037.vdocuments.mx/reader037/viewer/2022103109/5451a839af7959b5648b63f2/html5/thumbnails/13.jpg)
Data Scientist = Peacock
![Page 14: Predictive analytics and big data tutorial](https://reader037.vdocuments.mx/reader037/viewer/2022103109/5451a839af7959b5648b63f2/html5/thumbnails/14.jpg)
Humans vs. Algorithms
![Page 15: Predictive analytics and big data tutorial](https://reader037.vdocuments.mx/reader037/viewer/2022103109/5451a839af7959b5648b63f2/html5/thumbnails/15.jpg)
Smartest pirate
![Page 16: Predictive analytics and big data tutorial](https://reader037.vdocuments.mx/reader037/viewer/2022103109/5451a839af7959b5648b63f2/html5/thumbnails/16.jpg)
Humans vs. Algorithms
NA
![Page 17: Predictive analytics and big data tutorial](https://reader037.vdocuments.mx/reader037/viewer/2022103109/5451a839af7959b5648b63f2/html5/thumbnails/17.jpg)
Humans vs. Algorithms
German (1795), French (1806)
![Page 18: Predictive analytics and big data tutorial](https://reader037.vdocuments.mx/reader037/viewer/2022103109/5451a839af7959b5648b63f2/html5/thumbnails/18.jpg)
Humans vs. Algorithms
1997, IBM Deep Blue
Kasparov
![Page 19: Predictive analytics and big data tutorial](https://reader037.vdocuments.mx/reader037/viewer/2022103109/5451a839af7959b5648b63f2/html5/thumbnails/19.jpg)
Humans vs. Algorithms
2011, IBM Watson
Ken Jennings & Brad Rutter
![Page 20: Predictive analytics and big data tutorial](https://reader037.vdocuments.mx/reader037/viewer/2022103109/5451a839af7959b5648b63f2/html5/thumbnails/20.jpg)
Humans vs. Algorithms
2014, HireVue Iris
Hiring Panel
![Page 21: Predictive analytics and big data tutorial](https://reader037.vdocuments.mx/reader037/viewer/2022103109/5451a839af7959b5648b63f2/html5/thumbnails/21.jpg)
Prediction process
Raw data
Data munging
Training
Model
![Page 22: Predictive analytics and big data tutorial](https://reader037.vdocuments.mx/reader037/viewer/2022103109/5451a839af7959b5648b63f2/html5/thumbnails/22.jpg)
Data munging
Prediction process
Raw data
Feature selection
Training
Model
Data cleaning
Clean data
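The stages named on this slide can be sketched end to end in a few lines. This is only an illustration, assuming the flow runs raw data, data cleaning, feature selection, training, model. The records, the column names (`years`, `score`, `hired`) and the midpoint-threshold "training" step are all hypothetical stand-ins, not the presenter's method.

```python
from statistics import mean

# Hypothetical raw records; one row has a missing value.
raw_data = [
    {"years": 1.0, "score": 55.0, "hired": 0},
    {"years": None, "score": 60.0, "hired": 0},
    {"years": 6.0, "score": 82.0, "hired": 1},
    {"years": 8.0, "score": 90.0, "hired": 1},
]

def clean(rows):
    """Data cleaning: drop rows that contain a missing value."""
    return [r for r in rows if all(v is not None for v in r.values())]

def select_features(rows, names):
    """Feature selection: keep only the chosen input columns."""
    X = [[r[n] for n in names] for r in rows]
    y = [r["hired"] for r in rows]
    return X, y

def train(X, y):
    """Training: learn one threshold on the first feature
    (the midpoint between the two class means)."""
    pos = mean(x[0] for x, label in zip(X, y) if label == 1)
    neg = mean(x[0] for x, label in zip(X, y) if label == 0)
    threshold = (pos + neg) / 2
    return lambda x: 1 if x[0] > threshold else 0

clean_data = clean(raw_data)                   # raw data -> clean data
X, y = select_features(clean_data, ["years"])  # clean data -> features
model = train(X, y)                            # features -> model
print(model([7.0]), model([2.0]))
```

In practice the training step would be one of the algorithms the deck lists later (least squares regression, SVM, random forest, and so on). The point here is only the order of the stages.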
![Page 23: Predictive analytics and big data tutorial](https://reader037.vdocuments.mx/reader037/viewer/2022103109/5451a839af7959b5648b63f2/html5/thumbnails/23.jpg)
Numeric Excel example
![Page 24: Predictive analytics and big data tutorial](https://reader037.vdocuments.mx/reader037/viewer/2022103109/5451a839af7959b5648b63f2/html5/thumbnails/24.jpg)
Data munging
Prediction process
Raw data
Feature selection
Training
Model
Data cleaning
LSR, SVM, RANDOM FOREST, NAÏVE BAYES, NEURAL NET
![Page 25: Predictive analytics and big data tutorial](https://reader037.vdocuments.mx/reader037/viewer/2022103109/5451a839af7959b5648b63f2/html5/thumbnails/25.jpg)
Missing values + categorical
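Most training algorithms expect a fully numeric table, so missing values and categorical columns have to be handled first. The sketch below shows one common approach, assuming mean imputation for the numeric gap and one-hot encoding for the category. The rows and column names (`gpa`, `industry`) are made up for illustration, and other strategies (dropping rows, median fill, ordinal codes) may suit a given dataset better.

```python
from statistics import mean

# Hypothetical rows: a numeric column with a gap and a categorical column.
rows = [
    {"gpa": 3.5, "industry": "Retail"},
    {"gpa": None, "industry": "Engineering"},
    {"gpa": 2.5, "industry": "Retail"},
]

def impute_mean(rows, column):
    """Replace missing (None) entries in a numeric column with the column mean."""
    fill = mean(r[column] for r in rows if r[column] is not None)
    return [{**r, column: fill if r[column] is None else r[column]} for r in rows]

def one_hot(rows, column):
    """Replace a categorical column with one 0/1 indicator column per category."""
    categories = sorted({r[column] for r in rows})
    encoded = []
    for r in rows:
        new_row = {k: v for k, v in r.items() if k != column}
        for c in categories:
            new_row[f"{column}={c}"] = 1 if r[column] == c else 0
        encoded.append(new_row)
    return encoded

numeric_rows = one_hot(impute_mean(rows, "gpa"), "industry")
for r in numeric_rows:
    print(r)
```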
![Page 26: Predictive analytics and big data tutorial](https://reader037.vdocuments.mx/reader037/viewer/2022103109/5451a839af7959b5648b63f2/html5/thumbnails/26.jpg)
![Page 27: Predictive analytics and big data tutorial](https://reader037.vdocuments.mx/reader037/viewer/2022103109/5451a839af7959b5648b63f2/html5/thumbnails/27.jpg)
Data munging
Prediction process
Raw data
Feature selection
Training
Model
Data cleaning
LSR, SVM, RANDOM FOREST, NAÏVE BAYES, NEURAL NET
Retail > 15, Engineering > 95
> 5.67
![Page 28: Predictive analytics and big data tutorial](https://reader037.vdocuments.mx/reader037/viewer/2022103109/5451a839af7959b5648b63f2/html5/thumbnails/28.jpg)
Resume model
![Page 29: Predictive analytics and big data tutorial](https://reader037.vdocuments.mx/reader037/viewer/2022103109/5451a839af7959b5648b63f2/html5/thumbnails/29.jpg)
Resume model
![Page 30: Predictive analytics and big data tutorial](https://reader037.vdocuments.mx/reader037/viewer/2022103109/5451a839af7959b5648b63f2/html5/thumbnails/30.jpg)
Data munging
Prediction process
Raw data
Feature selection
Training
Model
Data cleaning
LSR, SVM, RANDOM FOREST, NAÏVE BAYES, NEURAL NET
Retail > 15, Engineering > 95
GPA, Colleges, Hobbies
> 5.67
![Page 31: Predictive analytics and big data tutorial](https://reader037.vdocuments.mx/reader037/viewer/2022103109/5451a839af7959b5648b63f2/html5/thumbnails/31.jpg)
Text deeper dive
![Page 32: Predictive analytics and big data tutorial](https://reader037.vdocuments.mx/reader037/viewer/2022103109/5451a839af7959b5648b63f2/html5/thumbnails/32.jpg)
Sentiment example
![Page 33: Predictive analytics and big data tutorial](https://reader037.vdocuments.mx/reader037/viewer/2022103109/5451a839af7959b5648b63f2/html5/thumbnails/33.jpg)
Sentiment example
![Page 34: Predictive analytics and big data tutorial](https://reader037.vdocuments.mx/reader037/viewer/2022103109/5451a839af7959b5648b63f2/html5/thumbnails/34.jpg)
Sentiment
![Page 35: Predictive analytics and big data tutorial](https://reader037.vdocuments.mx/reader037/viewer/2022103109/5451a839af7959b5648b63f2/html5/thumbnails/35.jpg)
Given data, find cat? dog?
![Page 36: Predictive analytics and big data tutorial](https://reader037.vdocuments.mx/reader037/viewer/2022103109/5451a839af7959b5648b63f2/html5/thumbnails/36.jpg)
Talk like a data nerd
![Page 37: Predictive analytics and big data tutorial](https://reader037.vdocuments.mx/reader037/viewer/2022103109/5451a839af7959b5648b63f2/html5/thumbnails/37.jpg)
Confidence & Over-fitting
![Page 38: Predictive analytics and big data tutorial](https://reader037.vdocuments.mx/reader037/viewer/2022103109/5451a839af7959b5648b63f2/html5/thumbnails/38.jpg)
Confidence & Over-fitting
![Page 39: Predictive analytics and big data tutorial](https://reader037.vdocuments.mx/reader037/viewer/2022103109/5451a839af7959b5648b63f2/html5/thumbnails/39.jpg)
Data Lingo
Supervised vs unsupervised learning
Supervised: A labeled training set is provided (inputs paired with known outcomes).
Unsupervised: No labeled training set; the algorithm groups records by similar attributes (clustering).
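The difference is easiest to see on the same data handled both ways. The sketch below is a toy illustration, assuming a nearest-mean classifier as the supervised method and a one-dimensional two-cluster k-means as the unsupervised one. The points, labels, and function names are hypothetical.

```python
from statistics import mean

points = [1.0, 1.5, 2.0, 8.0, 8.5, 9.0]

# Supervised: known labels come with the training set.
labels = ["low", "low", "low", "high", "high", "high"]

def fit_centroids(xs, ys):
    """Learn one mean per label from labeled examples."""
    return {lab: mean(x for x, y in zip(xs, ys) if y == lab) for lab in set(ys)}

def predict(centroids, x):
    """Assign x the label of the closest learned mean."""
    return min(centroids, key=lambda lab: abs(x - centroids[lab]))

centroids = fit_centroids(points, labels)
print(predict(centroids, 7.0))

# Unsupervised: no labels; group points by similarity (1-D k-means, k=2).
def two_means(xs, iterations=10):
    a, b = min(xs), max(xs)  # initial cluster centers
    for _ in range(iterations):
        group_a = [x for x in xs if abs(x - a) <= abs(x - b)]
        group_b = [x for x in xs if abs(x - a) > abs(x - b)]
        a, b = mean(group_a), mean(group_b)
    return group_a, group_b

print(two_means(points))
```

Note that the unsupervised version finds the two groups but cannot name them. Deciding that one cluster means "low" and the other "high" still takes a human or a labeled example.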
![Page 40: Predictive analytics and big data tutorial](https://reader037.vdocuments.mx/reader037/viewer/2022103109/5451a839af7959b5648b63f2/html5/thumbnails/40.jpg)
Data Lingo
Analytic Layers
Descriptive Analytics: Telling a data story through plotting or visualization.
Predictive Analytics: Predicting future outcomes, usually with a model trained on a historical training set.
Prescriptive Analytics: Using the insight from your predictive model to proactively change something.
Interview/Interaction Analytics: Any analytics surrounding the interview or interaction.
![Page 41: Predictive analytics and big data tutorial](https://reader037.vdocuments.mx/reader037/viewer/2022103109/5451a839af7959b5648b63f2/html5/thumbnails/41.jpg)
Data Lingo
Prediction methods
Regression: Predicting a continuous output (e.g., a stock price).
Classification: Predicting discrete category outputs, e.g., Yes/Maybe/No.
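A small sketch can make the two output types concrete. It assumes an ordinary least squares line fit for the regression case (the "LSR" mentioned earlier in the deck) and, for classification, a simple score-to-category mapping. The price series and the cutoffs `no_below` and `yes_above` are made up, and a real classifier would learn its decision rule from data rather than use fixed cutoffs.

```python
from statistics import mean

def fit_line(xs, ys):
    """Ordinary least squares for y = slope * x + intercept."""
    x_bar, y_bar = mean(xs), mean(ys)
    slope = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys)) / sum(
        (x - x_bar) ** 2 for x in xs
    )
    return slope, y_bar - slope * x_bar

# Regression: the output is a continuous number (a made-up price series).
days = [1, 2, 3, 4]
prices = [10.0, 12.0, 14.0, 16.0]
slope, intercept = fit_line(days, prices)
predicted_price = slope * 5 + intercept

# Classification: the output is one of a fixed set of categories.
def classify(score, no_below=0.4, yes_above=0.7):
    """Map a score to Yes/Maybe/No using hypothetical cutoffs."""
    if score >= yes_above:
        return "Yes"
    if score < no_below:
        return "No"
    return "Maybe"

print(predicted_price, classify(0.9), classify(0.5), classify(0.1))
```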
![Page 42: Predictive analytics and big data tutorial](https://reader037.vdocuments.mx/reader037/viewer/2022103109/5451a839af7959b5648b63f2/html5/thumbnails/42.jpg)
Data Lingo
Data Types
Structured: Does it play well in Excel?
Unstructured: Raw text (Twitter), audio, video, photos, resumes, etc.
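Unstructured data usually has to be converted into a structured, spreadsheet-like table before a model can be trained on it. The sketch below shows one simple way to do that for raw text, assuming a bag-of-words representation: each distinct word becomes a column and each document becomes a row of counts. The two example sentences are invented for illustration.

```python
from collections import Counter
import re

# Hypothetical unstructured inputs: short pieces of raw text.
documents = [
    "Great product, great service!",
    "Terrible service.",
]

def tokenize(text):
    """Lowercase the text and split it into alphabetic words."""
    return re.findall(r"[a-z]+", text.lower())

# The vocabulary plays the role of the column headers in a spreadsheet.
vocabulary = sorted({word for doc in documents for word in tokenize(doc)})

def to_row(text):
    """Turn one document into a fixed-length row of word counts."""
    counts = Counter(tokenize(text))
    return [counts[word] for word in vocabulary]

table = [to_row(doc) for doc in documents]
print(vocabulary)
for row in table:
    print(row)
```

Once the text is in this form it "plays well in Excel", and the same cleaning, feature selection, and training steps used for numeric data apply.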