Platforms for Data Science - Computing on the Brink
![Page 1: Platforms for Data Science - Computing on the Brink](https://reader033.vdocuments.mx/reader033/viewer/2022051819/54c4ae904a7959c5428b45d1/html5/thumbnails/1.jpg)
There is no magic
There is only awesome
Deepak Singh
Platforms for data science
![Page 2: Platforms for Data Science - Computing on the Brink](https://reader033.vdocuments.mx/reader033/viewer/2022051819/54c4ae904a7959c5428b45d1/html5/thumbnails/2.jpg)
bioinformatics
image: Ethan Hein
![Page 3: Platforms for Data Science - Computing on the Brink](https://reader033.vdocuments.mx/reader033/viewer/2022051819/54c4ae904a7959c5428b45d1/html5/thumbnails/3.jpg)
3
![Page 4: Platforms for Data Science - Computing on the Brink](https://reader033.vdocuments.mx/reader033/viewer/2022051819/54c4ae904a7959c5428b45d1/html5/thumbnails/4.jpg)
collection
![Page 5: Platforms for Data Science - Computing on the Brink](https://reader033.vdocuments.mx/reader033/viewer/2022051819/54c4ae904a7959c5428b45d1/html5/thumbnails/5.jpg)
curation
![Page 6: Platforms for Data Science - Computing on the Brink](https://reader033.vdocuments.mx/reader033/viewer/2022051819/54c4ae904a7959c5428b45d1/html5/thumbnails/6.jpg)
analysis
![Page 7: Platforms for Data Science - Computing on the Brink](https://reader033.vdocuments.mx/reader033/viewer/2022051819/54c4ae904a7959c5428b45d1/html5/thumbnails/7.jpg)
what’s the big deal?
![Page 8: Platforms for Data Science - Computing on the Brink](https://reader033.vdocuments.mx/reader033/viewer/2022051819/54c4ae904a7959c5428b45d1/html5/thumbnails/8.jpg)
![Page 9: Platforms for Data Science - Computing on the Brink](https://reader033.vdocuments.mx/reader033/viewer/2022051819/54c4ae904a7959c5428b45d1/html5/thumbnails/9.jpg)
Source: http://www.nature.com/news/specials/bigdata/index.html
![Page 10: Platforms for Data Science - Computing on the Brink](https://reader033.vdocuments.mx/reader033/viewer/2022051819/54c4ae904a7959c5428b45d1/html5/thumbnails/10.jpg)
Image: Yael Fitzpatrick (AAAS)
![Page 11: Platforms for Data Science - Computing on the Brink](https://reader033.vdocuments.mx/reader033/viewer/2022051819/54c4ae904a7959c5428b45d1/html5/thumbnails/11.jpg)
Image: Yael Fitzpatrick (AAAS)
![Page 12: Platforms for Data Science - Computing on the Brink](https://reader033.vdocuments.mx/reader033/viewer/2022051819/54c4ae904a7959c5428b45d1/html5/thumbnails/12.jpg)
lots of data
![Page 13: Platforms for Data Science - Computing on the Brink](https://reader033.vdocuments.mx/reader033/viewer/2022051819/54c4ae904a7959c5428b45d1/html5/thumbnails/13.jpg)
lots of people
![Page 14: Platforms for Data Science - Computing on the Brink](https://reader033.vdocuments.mx/reader033/viewer/2022051819/54c4ae904a7959c5428b45d1/html5/thumbnails/14.jpg)
lots of places
![Page 15: Platforms for Data Science - Computing on the Brink](https://reader033.vdocuments.mx/reader033/viewer/2022051819/54c4ae904a7959c5428b45d1/html5/thumbnails/15.jpg)
constant change
![Page 16: Platforms for Data Science - Computing on the Brink](https://reader033.vdocuments.mx/reader033/viewer/2022051819/54c4ae904a7959c5428b45d1/html5/thumbnails/16.jpg)
we want to make our data more effective
![Page 17: Platforms for Data Science - Computing on the Brink](https://reader033.vdocuments.mx/reader033/viewer/2022051819/54c4ae904a7959c5428b45d1/html5/thumbnails/17.jpg)
versioning
![Page 18: Platforms for Data Science - Computing on the Brink](https://reader033.vdocuments.mx/reader033/viewer/2022051819/54c4ae904a7959c5428b45d1/html5/thumbnails/18.jpg)
provenance
![Page 19: Platforms for Data Science - Computing on the Brink](https://reader033.vdocuments.mx/reader033/viewer/2022051819/54c4ae904a7959c5428b45d1/html5/thumbnails/19.jpg)
filter
![Page 20: Platforms for Data Science - Computing on the Brink](https://reader033.vdocuments.mx/reader033/viewer/2022051819/54c4ae904a7959c5428b45d1/html5/thumbnails/20.jpg)
aggregate
![Page 21: Platforms for Data Science - Computing on the Brink](https://reader033.vdocuments.mx/reader033/viewer/2022051819/54c4ae904a7959c5428b45d1/html5/thumbnails/21.jpg)
extend
![Page 22: Platforms for Data Science - Computing on the Brink](https://reader033.vdocuments.mx/reader033/viewer/2022051819/54c4ae904a7959c5428b45d1/html5/thumbnails/22.jpg)
mashup
![Page 23: Platforms for Data Science - Computing on the Brink](https://reader033.vdocuments.mx/reader033/viewer/2022051819/54c4ae904a7959c5428b45d1/html5/thumbnails/23.jpg)
human interfaces
![Page 24: Platforms for Data Science - Computing on the Brink](https://reader033.vdocuments.mx/reader033/viewer/2022051819/54c4ae904a7959c5428b45d1/html5/thumbnails/24.jpg)
![Page 25: Platforms for Data Science - Computing on the Brink](https://reader033.vdocuments.mx/reader033/viewer/2022051819/54c4ae904a7959c5428b45d1/html5/thumbnails/25.jpg)
image: Leo Reynolds
![Page 26: Platforms for Data Science - Computing on the Brink](https://reader033.vdocuments.mx/reader033/viewer/2022051819/54c4ae904a7959c5428b45d1/html5/thumbnails/26.jpg)
hard problem
![Page 27: Platforms for Data Science - Computing on the Brink](https://reader033.vdocuments.mx/reader033/viewer/2022051819/54c4ae904a7959c5428b45d1/html5/thumbnails/27.jpg)
really hard problem
![Page 28: Platforms for Data Science - Computing on the Brink](https://reader033.vdocuments.mx/reader033/viewer/2022051819/54c4ae904a7959c5428b45d1/html5/thumbnails/28.jpg)
so how do we get there?
![Page 29: Platforms for Data Science - Computing on the Brink](https://reader033.vdocuments.mx/reader033/viewer/2022051819/54c4ae904a7959c5428b45d1/html5/thumbnails/29.jpg)
information platforms
![Page 31: Platforms for Data Science - Computing on the Brink](https://reader033.vdocuments.mx/reader033/viewer/2022051819/54c4ae904a7959c5428b45d1/html5/thumbnails/31.jpg)
dataspaces
Further reading: Jeff Hammerbacher, Information Platforms and the rise of the data scientist, Beautiful Data
![Page 32: Platforms for Data Science - Computing on the Brink](https://reader033.vdocuments.mx/reader033/viewer/2022051819/54c4ae904a7959c5428b45d1/html5/thumbnails/32.jpg)
the unreasonable effectiveness of data
Halevy, et al. IEEE Intelligent Systems, 24, 8-12 (2009)
![Page 33: Platforms for Data Science - Computing on the Brink](https://reader033.vdocuments.mx/reader033/viewer/2022051819/54c4ae904a7959c5428b45d1/html5/thumbnails/33.jpg)
accept all data formats
![Page 34: Platforms for Data Science - Computing on the Brink](https://reader033.vdocuments.mx/reader033/viewer/2022051819/54c4ae904a7959c5428b45d1/html5/thumbnails/34.jpg)
evolve APIs
![Page 35: Platforms for Data Science - Computing on the Brink](https://reader033.vdocuments.mx/reader033/viewer/2022051819/54c4ae904a7959c5428b45d1/html5/thumbnails/35.jpg)
beyond databases and the data warehouse
![Page 36: Platforms for Data Science - Computing on the Brink](https://reader033.vdocuments.mx/reader033/viewer/2022051819/54c4ae904a7959c5428b45d1/html5/thumbnails/36.jpg)
data as a programmable resource
![Page 37: Platforms for Data Science - Computing on the Brink](https://reader033.vdocuments.mx/reader033/viewer/2022051819/54c4ae904a7959c5428b45d1/html5/thumbnails/37.jpg)
data is a royal garden
![Page 38: Platforms for Data Science - Computing on the Brink](https://reader033.vdocuments.mx/reader033/viewer/2022051819/54c4ae904a7959c5428b45d1/html5/thumbnails/38.jpg)
compute is a fungible commodity
![Page 39: Platforms for Data Science - Computing on the Brink](https://reader033.vdocuments.mx/reader033/viewer/2022051819/54c4ae904a7959c5428b45d1/html5/thumbnails/39.jpg)
optimizing the most valuable resource
![Page 40: Platforms for Data Science - Computing on the Brink](https://reader033.vdocuments.mx/reader033/viewer/2022051819/54c4ae904a7959c5428b45d1/html5/thumbnails/40.jpg)
compute, storage, workflows, memory,
transmission, algorithms, cost, …
![Page 41: Platforms for Data Science - Computing on the Brink](https://reader033.vdocuments.mx/reader033/viewer/2022051819/54c4ae904a7959c5428b45d1/html5/thumbnails/41.jpg)
people
Credit: Pieter Musterd under a CC-BY-NC-ND license
![Page 42: Platforms for Data Science - Computing on the Brink](https://reader033.vdocuments.mx/reader033/viewer/2022051819/54c4ae904a7959c5428b45d1/html5/thumbnails/42.jpg)
Image: Chris Dagdigian
![Page 43: Platforms for Data Science - Computing on the Brink](https://reader033.vdocuments.mx/reader033/viewer/2022051819/54c4ae904a7959c5428b45d1/html5/thumbnails/43.jpg)
my bias
![Page 44: Platforms for Data Science - Computing on the Brink](https://reader033.vdocuments.mx/reader033/viewer/2022051819/54c4ae904a7959c5428b45d1/html5/thumbnails/44.jpg)
cloud services
![Page 45: Platforms for Data Science - Computing on the Brink](https://reader033.vdocuments.mx/reader033/viewer/2022051819/54c4ae904a7959c5428b45d1/html5/thumbnails/45.jpg)
distributed systems
![Page 46: Platforms for Data Science - Computing on the Brink](https://reader033.vdocuments.mx/reader033/viewer/2022051819/54c4ae904a7959c5428b45d1/html5/thumbnails/46.jpg)
scale
![Page 47: Platforms for Data Science - Computing on the Brink](https://reader033.vdocuments.mx/reader033/viewer/2022051819/54c4ae904a7959c5428b45d1/html5/thumbnails/47.jpg)
global
![Page 48: Platforms for Data Science - Computing on the Brink](https://reader033.vdocuments.mx/reader033/viewer/2022051819/54c4ae904a7959c5428b45d1/html5/thumbnails/48.jpg)
consumption models
![Page 49: Platforms for Data Science - Computing on the Brink](https://reader033.vdocuments.mx/reader033/viewer/2022051819/54c4ae904a7959c5428b45d1/html5/thumbnails/49.jpg)
on-demand
![Page 50: Platforms for Data Science - Computing on the Brink](https://reader033.vdocuments.mx/reader033/viewer/2022051819/54c4ae904a7959c5428b45d1/html5/thumbnails/50.jpg)
what is the value of your data?
![Page 51: Platforms for Data Science - Computing on the Brink](https://reader033.vdocuments.mx/reader033/viewer/2022051819/54c4ae904a7959c5428b45d1/html5/thumbnails/51.jpg)
![Page 52: Platforms for Data Science - Computing on the Brink](https://reader033.vdocuments.mx/reader033/viewer/2022051819/54c4ae904a7959c5428b45d1/html5/thumbnails/52.jpg)
![Page 53: Platforms for Data Science - Computing on the Brink](https://reader033.vdocuments.mx/reader033/viewer/2022051819/54c4ae904a7959c5428b45d1/html5/thumbnails/53.jpg)
Credit: Angel Pizarro, U. Penn
![Page 54: Platforms for Data Science - Computing on the Brink](https://reader033.vdocuments.mx/reader033/viewer/2022051819/54c4ae904a7959c5428b45d1/html5/thumbnails/54.jpg)
mapreduce for genomics
http://bowtie-bio.sourceforge.net/crossbow/index.shtml
http://contrail-bio.sourceforge.net
http://bowtie-bio.sourceforge.net/myrna/index.shtml
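For readers unfamiliar with the pattern this slide refers to, here is a minimal single-process sketch of MapReduce applied to a toy genomics task: counting k-mers across short reads. It is an illustration only and does not describe how Crossbow, Contrail, or Myrna are implemented; the function names (`map_read`, `shuffle`, `reduce_counts`, `kmer_count`) and the sample reads are hypothetical.

```python
from collections import defaultdict
from typing import Dict, Iterable, Iterator, List, Tuple


def map_read(read: str, k: int) -> Iterator[Tuple[str, int]]:
    """Map step: emit (k-mer, 1) for every k-length window in one read."""
    for i in range(len(read) - k + 1):
        yield read[i:i + k], 1


def shuffle(pairs: Iterable[Tuple[str, int]]) -> Dict[str, List[int]]:
    """Shuffle step: group all emitted values by their key."""
    groups: Dict[str, List[int]] = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_counts(key: str, values: List[int]) -> Tuple[str, int]:
    """Reduce step: sum the values collected for one k-mer."""
    return key, sum(values)


def kmer_count(reads: Iterable[str], k: int) -> Dict[str, int]:
    """Run map, shuffle, and reduce in sequence (hypothetical driver)."""
    emitted = (pair for read in reads for pair in map_read(read, k))
    grouped = shuffle(emitted)
    return dict(reduce_counts(key, values) for key, values in grouped.items())


if __name__ == "__main__":
    # Made-up reads, used only to exercise the sketch.
    sample_reads = ["ACGTAC", "GTACGT"]
    print(kmer_count(sample_reads, 3))
```

In a real MapReduce framework the map and reduce calls would run on many machines and the shuffle would move data between them; here everything runs in one process so the three phases are easy to see.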
![Page 55: Platforms for Data Science - Computing on the Brink](https://reader033.vdocuments.mx/reader033/viewer/2022051819/54c4ae904a7959c5428b45d1/html5/thumbnails/55.jpg)
![Page 56: Platforms for Data Science - Computing on the Brink](https://reader033.vdocuments.mx/reader033/viewer/2022051819/54c4ae904a7959c5428b45d1/html5/thumbnails/56.jpg)
Bioproximity
http://aws.amazon.com/solutions/case-studies/bioproximity/
![Page 57: Platforms for Data Science - Computing on the Brink](https://reader033.vdocuments.mx/reader033/viewer/2022051819/54c4ae904a7959c5428b45d1/html5/thumbnails/57.jpg)
![Page 58: Platforms for Data Science - Computing on the Brink](https://reader033.vdocuments.mx/reader033/viewer/2022051819/54c4ae904a7959c5428b45d1/html5/thumbnails/58.jpg)
![Page 59: Platforms for Data Science - Computing on the Brink](https://reader033.vdocuments.mx/reader033/viewer/2022051819/54c4ae904a7959c5428b45d1/html5/thumbnails/59.jpg)
30,472 cores
![Page 60: Platforms for Data Science - Computing on the Brink](https://reader033.vdocuments.mx/reader033/viewer/2022051819/54c4ae904a7959c5428b45d1/html5/thumbnails/60.jpg)
$1279/hr
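A quick back-of-the-envelope calculation makes these two numbers easier to compare with other pricing. The sketch below assumes that the 30,472 cores and the $1279/hr figure on the adjacent slides describe the same cluster; the function names and the 8-hour run length are hypothetical and not taken from the talk.

```python
def cost_per_core_hour(cluster_cost_per_hour: float, core_count: int) -> float:
    """Divide the hourly cluster price evenly across all cores."""
    if core_count <= 0:
        raise ValueError("core_count must be positive")
    return cluster_cost_per_hour / core_count


def run_cost(cluster_cost_per_hour: float, hours: float) -> float:
    """Total price of keeping the whole cluster up for a given duration."""
    return cluster_cost_per_hour * hours


if __name__ == "__main__":
    CORES = 30_472          # from the slide
    PRICE_PER_HOUR = 1279   # USD per hour, from the slide
    HYPOTHETICAL_HOURS = 8  # assumption, for illustration only

    print(f"per core-hour: ${cost_per_core_hour(PRICE_PER_HOUR, CORES):.4f}")
    print(f"{HYPOTHETICAL_HOURS}-hour run: ${run_cost(PRICE_PER_HOUR, HYPOTHETICAL_HOURS):,.2f}")
```

Under that assumption the price works out to roughly $0.042 per core-hour, which is the kind of figure that makes compute look like a commodity relative to the people and data around it.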
![Page 63: Platforms for Data Science - Computing on the Brink](https://reader033.vdocuments.mx/reader033/viewer/2022051819/54c4ae904a7959c5428b45d1/html5/thumbnails/63.jpg)
in summary
![Page 64: Platforms for Data Science - Computing on the Brink](https://reader033.vdocuments.mx/reader033/viewer/2022051819/54c4ae904a7959c5428b45d1/html5/thumbnails/64.jpg)
large scale data requires a rethink
![Page 65: Platforms for Data Science - Computing on the Brink](https://reader033.vdocuments.mx/reader033/viewer/2022051819/54c4ae904a7959c5428b45d1/html5/thumbnails/65.jpg)
data architecture
![Page 66: Platforms for Data Science - Computing on the Brink](https://reader033.vdocuments.mx/reader033/viewer/2022051819/54c4ae904a7959c5428b45d1/html5/thumbnails/66.jpg)
compute architecture
![Page 67: Platforms for Data Science - Computing on the Brink](https://reader033.vdocuments.mx/reader033/viewer/2022051819/54c4ae904a7959c5428b45d1/html5/thumbnails/67.jpg)
distributed, programmable infrastructure
![Page 68: Platforms for Data Science - Computing on the Brink](https://reader033.vdocuments.mx/reader033/viewer/2022051819/54c4ae904a7959c5428b45d1/html5/thumbnails/68.jpg)
cloud services
![Page 69: Platforms for Data Science - Computing on the Brink](https://reader033.vdocuments.mx/reader033/viewer/2022051819/54c4ae904a7959c5428b45d1/html5/thumbnails/69.jpg)
remove constraints
![Page 70: Platforms for Data Science - Computing on the Brink](https://reader033.vdocuments.mx/reader033/viewer/2022051819/54c4ae904a7959c5428b45d1/html5/thumbnails/70.jpg)
can we build data science platforms?
![Page 71: Platforms for Data Science - Computing on the Brink](https://reader033.vdocuments.mx/reader033/viewer/2022051819/54c4ae904a7959c5428b45d1/html5/thumbnails/71.jpg)
there is no magic
there is only awesome
![Page 72: Platforms for Data Science - Computing on the Brink](https://reader033.vdocuments.mx/reader033/viewer/2022051819/54c4ae904a7959c5428b45d1/html5/thumbnails/72.jpg)
[email protected]
Twitter: @mndoci
http://slideshare.net/mndoci
http://mndoci.com
Inspiration and ideas from Matt Wood & Larry Lessig
Credit: Oberazzi under a CC-BY-NC-SA license