data science perspective, manish kurse, 2016
© Manish Kurse, 2016
Data Science - a Perspective
Manish Kurse, Ph.D.
Data Scientist, Google
28 April 2016
This is my perspective and is not necessarily intended to represent that of my employer
Agenda
An Introduction
Insights on being a Data Scientist in the Industry
Thoughts about this evolving field
Lessons learnt on transitioning to data science
What is Data Science?
Extracting insights from structured and unstructured data
Creating actionable solutions and products based on these insights
[Diagram courtesy of Drew Conway, annotated with "(Programming)" and "(Domain expertise)"]
Popular Examples of Data Science
Recommendation Systems
Inventory planning
Dynamic pricing
Interest in data science has grown rapidly!
Why this rise in interest?
Digital Connected World
Data storage is cheap
Computational power is cheap
Need to make sense of data
Blind Men and an Elephant
Image taken from the internet. Original artist: Not sure
What do data scientists do in the industry?
Insight 1: Data Science today is a spectrum
Business analysts generating insights
Developing models and building products using data
Researchers developing new mathematical techniques and algorithms
Insight 2: Data Scientists wear several hats
Dashboards: Continuous Business Insights
Slide-decks: Actionable insights
Software: Products, Prototyping
Tools and infrastructure: Data science platforms
Insight 3: Data Science Interfaces with Several Teams
Define project: Determine need with stakeholders
Define data sources: Work with engineers, set up new data logging
Build pipelines: Data engineering
Data cleaning: Clean raw data, exploratory analysis
Build models: Machine learning / computational models
Visualization: U/X
Evaluate with users: Get user feedback
Productionize: Work with Software Engineers
Launch: Launch to customers/stakeholders
Experimentation: A/B Experiments
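As an illustration of the Experimentation stage (A/B experiments), here is a minimal sketch, not taken from the talk, of one common way to compare two variants: a two-sided two-proportion z-test on conversion counts. The function name, the counts, and the choice of test are all assumptions made for this example.

```python
import math


def two_proportion_z_test(conv_a, n_a, conv_b, n_b):
    """Two-sided z-test for a difference in conversion rates between two arms."""
    p_a = conv_a / n_a
    p_b = conv_b / n_b
    # Pooled rate under the null hypothesis that both arms convert equally.
    p_pool = (conv_a + conv_b) / (n_a + n_b)
    std_err = math.sqrt(p_pool * (1 - p_pool) * (1 / n_a + 1 / n_b))
    z = (p_b - p_a) / std_err
    # Two-sided p-value from the standard normal CDF, written with erf.
    p_value = 2 * (1 - 0.5 * (1 + math.erf(abs(z) / math.sqrt(2))))
    return z, p_value


# Hypothetical counts: control (A) vs. new variant (B).
z, p_value = two_proportion_z_test(conv_a=200, n_a=5000, conv_b=250, n_b=5000)
print(f"z = {z:.2f}, p-value = {p_value:.4f}")
```

A small p-value suggests the difference between the arms is unlikely to be chance alone. In practice the sample size and the success metric would be agreed with stakeholders before the experiment starts.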
Insight 4: Every Stage in Business is a Data Science Opportunity
Product
Sales
Customer Support
Customer engagement
Marketing
Understanding need
Insight 5: Data Science is Challenging
Getting the right data can take time and effort
Change is constant and not everything can be modeled
Data cannot solve everything
Gaining stakeholder trust and showing value
Thoughts on Data Science Evolution
Thought 1: Need for data scientists will continue to exist
Growing data science tools, e.g. Google Cloud Machine Learning
Data scientists are needed to ask the right questions
Define the data, the solution
Role of a data scientist will evolve
Thought 2: Data science will be an integral part of business strategy
Data Infrastructure
Understanding Business Need
Understanding Customers
Data Logging
Thought 3: Machine learning will influence non-data scientist roles
Machine learning becomes mainstream
Business analysts apply more complex predictive models
Software engineers are trained in building machine learning software
Thought 4: Security and Privacy should/will be a focus
“With Great Power Comes Great Responsibility”
[Image labeled "Data Science". Source: Marvel]
Journey towards Data Science
Image source: rei.com
Lesson 1: Spend time to understand the field
Books
Data Science for Business
Doing Data Science
Big Data: A Revolution...
Longer list
Podcasts
Linear Digressions
Data Skeptic
Partially Derivative...
Longer list
Follow
Subscriptions on online magazines like Flipboard
Data scientists in your field of interest
Longer list
Blogs
KDnuggets
DataTau
Analytics Vidhya
Longer list
Lesson 2: Knowledge of tools is important, but understanding of fundamentals is key
Machine Learning, Statistics, Programming
Online tutorials
Algorithms and data structures
Python: Tutorials, Python for Data Analysis
R: Tutorials
SQL: Tutorials
Classes
MOOCs: Udacity, Coursera
Bootcamps: Logit, Insight, General Assembly, Data Incubator
Mentored Courses: Thinkful, Springboard
Lesson 3: Side-Projects are invaluable
Free Datasets
Interesting data-sets for statistics
Datasets curated by data scientists
Data sources for cool data science projects
Side projects
Mini projects
Online contests like kaggle.com
Article about choosing projects
Create a web portfolio
Host code on GitHub
Creating a website hosted on GitHub
Exciting Time to be in Data Science!
An Introduction
Insights on being a Data Scientist in the Industry
Thoughts about this evolving field
Lessons learnt on transitioning to data science