From Big Data Management
to Big Data Science
1
What is next?
Real big data is widely available
Only a few people know how to deal with it
You’re now one of them
Applications
The project is a start
Keep your hands dirty
Consider using the public cloud (e.g., AWS, Google Cloud, or Microsoft Azure)
2
Job Market
https://www.techicy.com/5-best-programming-languages-to-watch-out-in-2019-for-data-science.html
3
Data Science
Credits: Drew Conway
4
Data Science
https://mashimo.wordpress.com/2016/05/28/big-data-data-science-and-machine-learning-explained/
5
Data Scientist
6
Next Steps
CS
Big data tools
Python/R/Scala
Math/Stats
Linear algebra
Correlation analysis
Hypothesis tests
Collaboration with domain experts
Visualization
Prototyping
7
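The Math/Stats items in Next Steps (correlation analysis, hypothesis tests) can be tried out on a laptop before moving to big data tools. The following is only a minimal sketch using the Python standard library: it computes a Pearson correlation and checks it with a permutation test. The data (`hours`, `score`) and the helper names `pearson` and `permutation_p_value` are hypothetical and do not come from the slides.

```python
import random
from statistics import mean


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    mx, my = mean(xs), mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    var_x = sum((x - mx) ** 2 for x in xs)
    var_y = sum((y - my) ** 2 for y in ys)
    return cov / (var_x * var_y) ** 0.5


def permutation_p_value(xs, ys, trials=2000, seed=0):
    """Two-sided permutation test of H0: xs and ys are uncorrelated.

    Shuffling ys breaks any pairing with xs, so the shuffled correlations
    approximate what chance alone would produce.
    """
    rng = random.Random(seed)
    observed = abs(pearson(xs, ys))
    shuffled = list(ys)
    hits = 0
    for _ in range(trials):
        rng.shuffle(shuffled)
        if abs(pearson(xs, shuffled)) >= observed:
            hits += 1
    # Add-one correction so the estimate is never exactly zero.
    return (hits + 1) / (trials + 1)


# Hypothetical toy data, made up for illustration only.
hours = [1, 2, 3, 4, 5, 6, 7, 8]
score = [52, 55, 61, 60, 68, 70, 75, 79]

r = pearson(hours, score)
p = permutation_p_value(hours, score)
print(f"r = {r:.3f}, p = {p:.4f}")
```

A small p-value here suggests the observed correlation is unlikely under random pairing. It says nothing about causation, which is where collaboration with domain experts comes in.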
CS
https://www.slideshare.net/galvanizeHQ/how-to-become-a-data-scientist-by-ryan-orban-vp-of-operations-and-expansion-galvanize
8
CS/Big Data
https://www.slideshare.net/galvanizeHQ/how-to-become-a-data-scientist-by-ryan-orban-vp-of-operations-and-expansion-galvanize
9
Math/Stats
https://www.slideshare.net/galvanizeHQ/how-to-become-a-data-scientist-by-ryan-orban-vp-of-operations-and-expansion-galvanize
10
Online Courses
https://www.slideshare.net/galvanizeHQ/how-to-become-a-data-scientist-by-ryan-orban-vp-of-operations-and-expansion-galvanize
11
Data Analytics
https://www.slideshare.net/galvanizeHQ/how-to-become-a-data-scientist-by-ryan-orban-vp-of-operations-and-expansion-galvanize
12
Big Data Landscape
Distributed Storage: HDFS, KV stores, LSM trees, Column stores, HBase
Query Processing: MapReduce, RDD, Hyracks
High level APIs: Pig Latin, Spark SQL, Algebricks
Big data packages: MLlib, GraphX, SparkR
13
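As a reminder of what the query processing layer of the landscape does, here is a rough single-machine sketch of the MapReduce programming model, written as a word count in plain Python. It is an illustration only, not Hadoop's API: the function names `map_phase`, `shuffle`, and `reduce_phase` and the input lines are assumptions made for this example.

```python
from collections import defaultdict


def map_phase(line):
    """Map: emit a (word, 1) pair for every word in one input line."""
    for word in line.lower().split():
        yield word, 1


def shuffle(pairs):
    """Shuffle: group all emitted values by key."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key, values):
    """Reduce: combine all values of one key into a single count."""
    return key, sum(values)


# Hypothetical input; a real job would read its splits from distributed storage.
lines = ["big data management", "big data science"]

pairs = [kv for line in lines for kv in map_phase(line)]
counts = dict(reduce_phase(k, vs) for k, vs in shuffle(pairs).items())
print(counts)
```

In a real system the map and reduce calls run in parallel on many machines and the shuffle moves data over the network. The higher layers of the landscape exist largely to hide these steps.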