Introduction to Data Science
DESCRIPTION
What is data science? Why are data scientists so sought after at modern technology companies? In this talk, I answer those questions by reviewing the basics of data science and walking through three examples of typical data science projects.
TRANSCRIPT
Introduction to Data Science
Sean Byrnes
http://seanbyrnes.com
@sbyrnes
Who Am I?
ATTENDED
FOUNDED
CURRENTLY
from Yahoo!
Introduction to Data Science
• What is Data Science?
• Example 1: Basic Math
• Example 2: Regression Modeling
• Example 3: Recommender Systems
• Getting Started in Data Science
What is Data Science?
Software Engineering + Statistical Analysis
What is Data Science?
1. Question
2. Data Gathering
3. Exploration
4. Modeling
5. Answer
6. Production
Example 1: Basic Math
What is my customer churn rate?
Definition: Churn rate is the percentage of subscribers to a service who discontinue their subscription to that service in a given time period (also known as the attrition rate).
Example 1: Basic Math
$$\text{Churn}(\text{month}) = \frac{\#\,\text{customers lost}}{\#\,\text{customers at start}}$$
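A minimal Python sketch of the churn formula, assuming "customers lost" counts only subscribers who were already active at the start of the month (customers who sign up and cancel within the same month are left out). The function name `churn_rate` and the counts in the example are hypothetical.

```python
def churn_rate(customers_at_start: int, customers_lost: int) -> float:
    """Churn(month) = # customers lost / # customers at start, as a fraction."""
    if customers_at_start <= 0:
        raise ValueError("customers_at_start must be positive")
    # Assumption: lost customers are a subset of those present at the start.
    if not 0 <= customers_lost <= customers_at_start:
        raise ValueError("customers_lost must be between 0 and customers_at_start")
    return customers_lost / customers_at_start


# Hypothetical month: 2,000 subscribers at the start, 75 of them cancel.
print(f"{churn_rate(2000, 75):.2%}")  # 3.75%
```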
Example 1: Basic Math
Month Churn
Dec '13 3.75%
Nov '13 1.87%
Oct '13 3.82%
Sep '13 2.76%
Aug '13 2.43%
Jul '13 2.04%
Jun '13 1.60%
Example 1: Basic Math
For all customers acquired in a given month
$$\text{Retention}(C_{\text{month}}) = \frac{\text{Active}(C_{\text{month}})}{\text{Total}(C_{\text{month}})}$$
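The cohort table that follows can be produced by grouping customers by acquisition month and, for each later month, dividing the number still active by the cohort's total. The sketch below assumes each customer is reduced to a hypothetical `(acquired_month, last_active_month)` pair of integer month indices and treats a customer as active in every month up to and including `last_active_month`; real subscription data usually needs more care (reactivations, partial months).

```python
from collections import defaultdict


def cohort_retention(customers, current_month, max_offset=6):
    """Retention(C_month) = Active(C_month) / Total(C_month) for each cohort.

    customers: iterable of hypothetical (acquired_month, last_active_month) pairs.
    Returns {acquired_month: [retention at 0, 1, ... months after acquisition]}.
    Offsets past current_month are omitted because they have not been observed yet.
    """
    cohorts = defaultdict(list)
    for acquired, last_active in customers:
        cohorts[acquired].append(last_active)

    table = {}
    for acquired, last_actives in sorted(cohorts.items()):
        total = len(last_actives)
        row = []
        for offset in range(max_offset + 1):
            month = acquired + offset
            if month > current_month:
                break
            # Active = still subscribed in this month (assumption: no reactivation).
            active = sum(1 for last in last_actives if last >= month)
            row.append(active / total)
        table[acquired] = row
    return table


# Hypothetical data: cohorts acquired in months 0 and 1, observed through month 2.
customers = [(0, 0), (0, 1), (0, 2), (0, 2), (1, 1), (1, 2)]
print(cohort_retention(customers, current_month=2))
# {0: [1.0, 0.75, 0.5], 1: [1.0, 0.5]}
```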
Example 1: Basic Math
Share of each monthly cohort still active, by months since acquisition:
Cohort 0 1 2 3 4 5 6
Dec '13 100% 12.82% 8.04% 6.34% 4.91% 3.95% 3.14%
Nov '13 100% 15.66% 9.97% 6.96% 5.46% 3.88% 2.77%
Oct '13 100% 16% 10.86% 8.62% 6.22% 5.06% 3.98%
Sep '13 100% 13.28% 9.52% 7.28% 5.28% 4.48% 4%
Aug '13 100% 12.96% 9.18% 6.55% 4.73% 3.86% 3.13%
Jul '13 100% 15.84% 10.85% 8.27% 6.67% 5.60% 4.63%
Jun '13 100% 16.08% 11.36% 8.36% 7.07% 6% 5.25%
Example 2: Regression Modeling
How many users will we have next month?
Example 2: Regression Modeling
Chart: users by month, 1/1/13 to 12/1/13 (y-axis 0 to 160,000)
Example 2: Regression Modeling
For data set $X(n)$, find $f(n)$ such that
$$f(n_i) \approx X(n_i)$$
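The slide leaves "≈" informal. One common way to make it precise, assumed here rather than stated in the talk, is ordinary least squares: among the candidate models, pick the $f$ that minimizes the total squared difference between its predictions and the observed values.

$$\hat{f} = \arg\min_{f} \sum_{i} \bigl(f(n_i) - X(n_i)\bigr)^2$$

Squaring keeps positive and negative errors from cancelling and penalizes large misses more heavily than small ones.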
Example 2: Regression Modeling
Assume $X(n_i) = [x_1, x_2, \ldots, x_k]$:
$$f(n) = c_1x_1 + c_2x_2 + c_3x_3 + \cdots + c_kx_k$$
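As a concrete instance of the linear model, the sketch below fits a straight line to monthly user counts with ordinary least squares, using the month index as the single feature. It adds an intercept $c_0$ (equivalent to a constant feature $x_0 = 1$), which the slide's formula does not show; the function name `fit_line` and the user counts are hypothetical, not the talk's data.

```python
def fit_line(xs, ys):
    """Ordinary least squares for y ≈ c0 + c1*x; returns (c0, c1)."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need at least two (x, y) pairs of equal length")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    sxx = sum((x - mean_x) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("x values must not all be equal")
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    c1 = sxy / sxx               # slope: users gained per month
    c0 = mean_y - c1 * mean_x    # intercept: fitted value at month 0
    return c0, c1


# Hypothetical monthly user counts for months 0..3.
months = [0, 1, 2, 3]
users = [10_000, 12_000, 14_000, 16_000]
c0, c1 = fit_line(months, users)
print(c0 + c1 * 4)  # forecast for month 4: 18000.0
```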
Example 2: Regression Modeling
Chart: Linear Model fit to users by month, 1/1/13 to 12/1/13 (y-axis 0 to 160,000)
Example 2: Regression Modeling
Assume $X(n_i) = [x_1, x_2, \ldots, x_k]$:
$$f(n) = c_1x_1 + c_2x_2 + c_3x_3 + \cdots + c_kx_k$$
Or, maybe:
$$f(n) = c_1x_1 + c_2x_1^2 + c_3x_2 + c_4x_2^2 + \cdots + c_mx_k^2$$
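The polynomial model is still linear in the coefficients $c_j$, so the same least-squares approach applies once the squared (or higher-power) terms are added as extra features. The pure-Python sketch below assumes a single feature (the month index) and solves the normal equations $(A^\top A)c = A^\top y$ directly; `fit_polynomial` and `evaluate_polynomial` are hypothetical helper names.

```python
def fit_polynomial(xs, ys, degree):
    """Least-squares coefficients [c0, c1, ..., c_degree] for
    y ≈ c0 + c1*x + c2*x**2 + ... + c_degree*x**degree."""
    n = degree + 1
    # Design matrix A: one row [1, x, x**2, ..., x**degree] per observation.
    rows = [[float(x) ** power for power in range(n)] for x in xs]
    # Augmented normal equations [AᵀA | Aᵀy].
    aug = [
        [sum(row[i] * row[j] for row in rows) for j in range(n)]
        + [sum(row[i] * y for row, y in zip(rows, ys))]
        for i in range(n)
    ]
    # Gaussian elimination with partial pivoting.
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        if abs(aug[pivot][col]) < 1e-12:
            raise ValueError("need at least degree + 1 distinct x values")
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(col + 1, n):
            factor = aug[r][col] / aug[col][col]
            for c in range(col, n + 1):
                aug[r][c] -= factor * aug[col][c]
    # Back substitution, last coefficient first.
    coeffs = [0.0] * n
    for i in reversed(range(n)):
        known = sum(aug[i][j] * coeffs[j] for j in range(i + 1, n))
        coeffs[i] = (aug[i][n] - known) / aug[i][i]
    return coeffs


def evaluate_polynomial(coeffs, x):
    """c0 + c1*x + c2*x**2 + ..."""
    return sum(c * x ** power for power, c in enumerate(coeffs))


# Hypothetical points lying exactly on y = 1 + 2x + 3x**2.
xs = [0, 1, 2, 3, 4]
ys = [1 + 2 * x + 3 * x ** 2 for x in xs]
coeffs = fit_polynomial(xs, ys, degree=2)
print([round(c, 6) for c in coeffs])  # [1.0, 2.0, 3.0]
print(round(evaluate_polynomial(coeffs, 5), 6))  # 86.0
```

Raising `degree` from 1 to 2 to 4 corresponds to the linear, 2nd-degree, and 4th-degree charts. Each added power lets the curve follow the observed months more closely, but a high-degree polynomial can bend sharply just past the last data point, which is exactly where a next-month forecast lives. Solving the normal equations directly also becomes numerically fragile as the degree grows, so outside of a sketch a library routine such as NumPy's `numpy.polyfit` is the safer choice.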
Example 2: Regression Modeling
Chart: 2nd Degree Polynomial Model fit to users by month, 1/1/13 to 12/1/13 (y-axis 0 to 160,000)
Example 2: Regression Modeling
Chart: 4th Degree Polynomial Model fit to users by month, 1/1/13 to 12/1/13 (y-axis 0 to 160,000)
Example 2: Regression Modeling
https://github.com/sbyrnes/Lyric
Example 3: Recommender Systems
What other products might this
customer buy?
Example 3: Recommender Systems
Product 1 Product 2 Product 3 … Product N
Customer 1 3.5 4.0 3.0
Customer 2 2.0 3.5
Customer 3 3.0 2.5
…
Customer N 4.5 4.5
Example 3: Recommender Systems
Given customer preference matrix $M$, find $P$ and $Q$ such that
$$P \times Q \approx M$$
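The slide does not say how $P$ and $Q$ are found. One common approach, assumed here, is stochastic gradient descent over only the observed (customer, product) ratings: $P$ gets one row of $k$ latent factors per customer, $Q$ one column of $k$ factors per product, and each step nudges both to shrink the error on a single rating, with a small L2 penalty. The function names and parameter values are hypothetical; the rating values come from the sparse matrix above, but their column positions were lost in the slide text, so the placement (inferred by comparing it with the filled-in matrix) is an assumption, with product index 3 standing in for "Product N".

```python
import random


def factorize(ratings, num_customers, num_products, k=2, epochs=2000,
              lr=0.01, reg=0.02, seed=0):
    """Learn P (num_customers x k) and Q (k x num_products) so that
    sum_f P[c][f] * Q[f][p] approximates ratings[(c, p)] on observed cells.

    ratings: dict {(customer_index, product_index): rating}; missing cells are unknown.
    """
    rng = random.Random(seed)
    P = [[rng.uniform(0, 1) for _ in range(k)] for _ in range(num_customers)]
    Q = [[rng.uniform(0, 1) for _ in range(num_products)] for _ in range(k)]
    cells = list(ratings.items())
    for _ in range(epochs):
        rng.shuffle(cells)  # visit observed ratings in a random order each epoch
        for (c, p), r in cells:
            err = r - sum(P[c][f] * Q[f][p] for f in range(k))
            for f in range(k):
                p_cf, q_fp = P[c][f], Q[f][p]
                # Gradient step on squared error plus L2 regularization.
                P[c][f] += lr * (err * q_fp - reg * p_cf)
                Q[f][p] += lr * (err * p_cf - reg * q_fp)
    return P, Q


def predict_rating(P, Q, c, p):
    """Entry (c, p) of the matrix product P x Q."""
    return sum(P[c][f] * Q[f][p] for f in range(len(Q)))


ratings = {
    (0, 0): 3.5, (0, 1): 4.0, (0, 3): 3.0,
    (1, 0): 2.0, (1, 2): 3.5,
    (2, 1): 3.0, (2, 2): 2.5,
    (3, 0): 4.5, (3, 3): 4.5,
}
P, Q = factorize(ratings, num_customers=4, num_products=4)
# Estimate a cell that customer index 1 never rated (product index 1).
print(round(predict_rating(P, Q, 1, 1), 2))
```

Because only observed cells drive the updates, the product $P \times Q$ supplies estimates for the blank cells as well, which is what a filled-in preference matrix represents.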
Example 3: Recommender Systems
Product 1 Product 2 Product 3 … Product N
Customer 1 3.5 4.0 2.5 3.0
Customer 2 2.0 1.5 3.5 3.0
Customer 3 1.5 3.0 2.5 4.0
…
Customer N 4.5 3.5 4.0 4.5
Example 3: Recommender Systems
Given a customer's preferences $c[p_1, p_2, \ldots, p_n]$ and the overall rating average $r_{\text{overall}}$:
$$c_{\text{bias}} = \operatorname{mean}(c[p_1], c[p_2], \ldots, c[p_n]) - r_{\text{overall}}$$
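A direct translation of the bias formula, assuming ratings are stored as a hypothetical `{(customer, product): rating}` dictionary and that $r_{\text{overall}}$ is the mean of every observed rating:

```python
from statistics import mean


def customer_biases(ratings):
    """Return (r_overall, {customer: c_bias}) where
    c_bias = mean(customer's ratings) - r_overall."""
    overall = mean(list(ratings.values()))
    by_customer = {}
    for (customer, _product), rating in ratings.items():
        by_customer.setdefault(customer, []).append(rating)
    return overall, {c: mean(rs) - overall for c, rs in by_customer.items()}


# Hypothetical ratings: customer 0 rates generously, customer 1 harshly.
ratings = {(0, 0): 4.0, (0, 1): 5.0, (1, 0): 2.0, (1, 2): 3.0}
print(customer_biases(ratings))  # (3.5, {0: 1.0, 1: -1.0})
```

A positive bias marks a customer who rates above the overall average and a negative one a harsher rater. One common use, not spelled out in the talk, is to subtract these biases before factorizing so that $P \times Q$ models relative preferences rather than individual rating habits.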
Example 3: Recommender Systems
https://github.com/sbyrnes/likely.js
Getting Started in Data Science
• Programming
• Statistics
• Machine learning
• Toolkit
– R
– Hadoop
– D3
seanbyrnes.com
@sbyrnes
github.com/sbyrnes