Getting your head around big data
DESCRIPTION
My talk on Big Data from Dallas Day of .NET 2014
TRANSCRIPT
![Page 1: Getting your head around big data](https://reader031.vdocuments.mx/reader031/viewer/2022013114/53f490468d7f728e318b4889/html5/thumbnails/1.jpg)
Getting your head around
BIG Data
![Page 2: Getting your head around big data](https://reader031.vdocuments.mx/reader031/viewer/2022013114/53f490468d7f728e318b4889/html5/thumbnails/2.jpg)
https://github.com/glennblock
https://twitter.com/gblock
“I should be tweeting”
![Page 3: Getting your head around big data](https://reader031.vdocuments.mx/reader031/viewer/2022013114/53f490468d7f728e318b4889/html5/thumbnails/3.jpg)
Make machine data accessible, usable and valuable to everyone.
![Page 4: Getting your head around big data](https://reader031.vdocuments.mx/reader031/viewer/2022013114/53f490468d7f728e318b4889/html5/thumbnails/4.jpg)
Platform for Machine Data
Any Machine Data
HA Indexes and Storage
Search and Investigation
Proactive Monitoring
Operational Visibility
Real-time Business Insights
Commodity Servers
Online Services
Web Services
Servers
Security
GPS Location
Storage
Desktops
Networks
Packaged Applications
Custom Applications
Messaging
Telecoms
Online Shopping Cart
Web Clickstreams
Databases
Energy Meters
Call Detail Records
Smartphones and Devices
RFID
![Page 5: Getting your head around big data](https://reader031.vdocuments.mx/reader031/viewer/2022013114/53f490468d7f728e318b4889/html5/thumbnails/5.jpg)
DATA
![Page 6: Getting your head around big data](https://reader031.vdocuments.mx/reader031/viewer/2022013114/53f490468d7f728e318b4889/html5/thumbnails/6.jpg)
15,000 BC – Pictures
Lascaux, France
![Page 7: Getting your head around big data](https://reader031.vdocuments.mx/reader031/viewer/2022013114/53f490468d7f728e318b4889/html5/thumbnails/7.jpg)
6000 BC – Symbols
![Page 8: Getting your head around big data](https://reader031.vdocuments.mx/reader031/viewer/2022013114/53f490468d7f728e318b4889/html5/thumbnails/8.jpg)
3,500 BC – Language
![Page 9: Getting your head around big data](https://reader031.vdocuments.mx/reader031/viewer/2022013114/53f490468d7f728e318b4889/html5/thumbnails/9.jpg)
1,275 BC – Papyrus
![Page 10: Getting your head around big data](https://reader031.vdocuments.mx/reader031/viewer/2022013114/53f490468d7f728e318b4889/html5/thumbnails/10.jpg)
1st - 13th Century - Codex
![Page 11: Getting your head around big data](https://reader031.vdocuments.mx/reader031/viewer/2022013114/53f490468d7f728e318b4889/html5/thumbnails/11.jpg)
13th Century – Movable type
![Page 12: Getting your head around big data](https://reader031.vdocuments.mx/reader031/viewer/2022013114/53f490468d7f728e318b4889/html5/thumbnails/12.jpg)
15th Century – Printing press
![Page 13: Getting your head around big data](https://reader031.vdocuments.mx/reader031/viewer/2022013114/53f490468d7f728e318b4889/html5/thumbnails/13.jpg)
19th to 20th century – Babbage Analytical Engine
![Page 14: Getting your head around big data](https://reader031.vdocuments.mx/reader031/viewer/2022013114/53f490468d7f728e318b4889/html5/thumbnails/14.jpg)
1936 – Turing machine
![Page 15: Getting your head around big data](https://reader031.vdocuments.mx/reader031/viewer/2022013114/53f490468d7f728e318b4889/html5/thumbnails/15.jpg)
1945 – ENIAC
![Page 16: Getting your head around big data](https://reader031.vdocuments.mx/reader031/viewer/2022013114/53f490468d7f728e318b4889/html5/thumbnails/16.jpg)
1947 – The first bug
![Page 17: Getting your head around big data](https://reader031.vdocuments.mx/reader031/viewer/2022013114/53f490468d7f728e318b4889/html5/thumbnails/17.jpg)
1977 – ARPANET
![Page 18: Getting your head around big data](https://reader031.vdocuments.mx/reader031/viewer/2022013114/53f490468d7f728e318b4889/html5/thumbnails/18.jpg)
1990s Internet
![Page 19: Getting your head around big data](https://reader031.vdocuments.mx/reader031/viewer/2022013114/53f490468d7f728e318b4889/html5/thumbnails/19.jpg)
Phones and Tablets
![Page 20: Getting your head around big data](https://reader031.vdocuments.mx/reader031/viewer/2022013114/53f490468d7f728e318b4889/html5/thumbnails/20.jpg)
RFID
![Page 21: Getting your head around big data](https://reader031.vdocuments.mx/reader031/viewer/2022013114/53f490468d7f728e318b4889/html5/thumbnails/21.jpg)
Cloud
![Page 22: Getting your head around big data](https://reader031.vdocuments.mx/reader031/viewer/2022013114/53f490468d7f728e318b4889/html5/thumbnails/22.jpg)
Services
![Page 23: Getting your head around big data](https://reader031.vdocuments.mx/reader031/viewer/2022013114/53f490468d7f728e318b4889/html5/thumbnails/23.jpg)
New consumer devices
![Page 24: Getting your head around big data](https://reader031.vdocuments.mx/reader031/viewer/2022013114/53f490468d7f728e318b4889/html5/thumbnails/24.jpg)
![Page 25: Getting your head around big data](https://reader031.vdocuments.mx/reader031/viewer/2022013114/53f490468d7f728e318b4889/html5/thumbnails/25.jpg)
90 percent of all the data in the world has been generated over the last two years
source: sciencedaily.com
![Page 26: Getting your head around big data](https://reader031.vdocuments.mx/reader031/viewer/2022013114/53f490468d7f728e318b4889/html5/thumbnails/26.jpg)
Every day 2.5 quintillion bytes of data is generated
1 quintillion = 1 + 18 zeros!
57.5 billion 32 GB iPads
source: storagenewsletter.com
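A quick check of the scale, as a sketch that assumes decimal units (1 GB = 10^9 bytes) and iPads filled to their nominal 32 GB. Under those assumptions, 2.5 quintillion bytes fills roughly 78 million iPads, while 57.5 billion iPads would hold about 1.84 zettabytes, so the iPad comparison appears to describe a total on the zettabyte scale rather than one day's output.

```python
# Assumption: decimal units, 1 GB = 10**9 bytes.
BYTES_PER_DAY = 2_500_000_000_000_000_000   # 2.5 quintillion bytes (from the slide)
IPAD_BYTES = 32 * 10**9                     # capacity of one 32 GB iPad

# How many 32 GB iPads one day of data would fill.
ipads_per_day = BYTES_PER_DAY // IPAD_BYTES
print(f"{ipads_per_day:,} iPads per day")

# How much data 57.5 billion 32 GB iPads would hold, in zettabytes (10**21 bytes).
total_zettabytes = 57_500_000_000 * IPAD_BYTES / 10**21
print(f"{total_zettabytes:.2f} zettabytes")
```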
![Page 27: Getting your head around big data](https://reader031.vdocuments.mx/reader031/viewer/2022013114/53f490468d7f728e318b4889/html5/thumbnails/27.jpg)
2.7 zettabytes exist in the digital universe
1 zettabyte = 1 + 21 zeros!
42 ZB = All human speech digitized
source: highscalability.com
![Page 28: Getting your head around big data](https://reader031.vdocuments.mx/reader031/viewer/2022013114/53f490468d7f728e318b4889/html5/thumbnails/28.jpg)
How big is big?
![Page 29: Getting your head around big data](https://reader031.vdocuments.mx/reader031/viewer/2022013114/53f490468d7f728e318b4889/html5/thumbnails/29.jpg)
That’s A LOT of data!
![Page 30: Getting your head around big data](https://reader031.vdocuments.mx/reader031/viewer/2022013114/53f490468d7f728e318b4889/html5/thumbnails/30.jpg)
How do you harness it?
![Page 31: Getting your head around big data](https://reader031.vdocuments.mx/reader031/viewer/2022013114/53f490468d7f728e318b4889/html5/thumbnails/31.jpg)
This is what big data is really about.
![Page 32: Getting your head around big data](https://reader031.vdocuments.mx/reader031/viewer/2022013114/53f490468d7f728e318b4889/html5/thumbnails/32.jpg)
Asking questions and getting answers
![Page 33: Getting your head around big data](https://reader031.vdocuments.mx/reader031/viewer/2022013114/53f490468d7f728e318b4889/html5/thumbnails/33.jpg)
Massive amounts of data.
Machine generated
VOLUME
![Page 34: Getting your head around big data](https://reader031.vdocuments.mx/reader031/viewer/2022013114/53f490468d7f728e318b4889/html5/thumbnails/34.jpg)
Data is coming from a multitude of sources
Mix of structured and unstructured (JSON, XML, CSV, Plain Text)
Need a way to store it and query it
VARIETY
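One way to picture the variety problem is a small normalizer that turns each format into the same flat record so it can be queried uniformly. This is only a sketch: the `parse_record` helper, the field names, and the sample events are made up for illustration, and real pipelines need schema handling and error recovery.

```python
import csv
import io
import json
import xml.etree.ElementTree as ET

def parse_record(raw: str, fmt: str) -> dict:
    """Turn one raw event into a flat dict, whatever format it arrived in."""
    if fmt == "json":
        return json.loads(raw)
    if fmt == "xml":
        # Treat each child element of the root as one field.
        return {child.tag: child.text for child in ET.fromstring(raw)}
    if fmt == "csv":
        # Expect a header row followed by a single data row.
        return dict(next(csv.DictReader(io.StringIO(raw))))
    # Plain text: keep the whole line as a single field.
    return {"message": raw.strip()}

# Hypothetical sample events, one per format.
samples = [
    ('{"user": "ann", "action": "login"}', "json"),
    ("<event><user>bob</user><action>logout</action></event>", "xml"),
    ("user,action\ncarol,purchase", "csv"),
    ("dave viewed the home page", "text"),
]
records = [parse_record(raw, fmt) for raw, fmt in samples]

# Once normalized, the same query works across every source.
logins = [r for r in records if r.get("action") == "login"]
print(logins)
```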
![Page 35: Getting your head around big data](https://reader031.vdocuments.mx/reader031/viewer/2022013114/53f490468d7f728e318b4889/html5/thumbnails/35.jpg)
VARIETY
Log files
Activity Feeds
Emails
Device Streams
Audio Files
Videos
![Page 36: Getting your head around big data](https://reader031.vdocuments.mx/reader031/viewer/2022013114/53f490468d7f728e318b4889/html5/thumbnails/36.jpg)
Data arrives at many different frequencies
Need to be able to process it in real time.
VELOCITY
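Processing in real time usually means updating an answer as each event arrives instead of re-scanning stored data. A minimal sketch, assuming events arrive with non-decreasing timestamps in seconds; the `SlidingWindowCounter` class is a hypothetical name, not part of any product mentioned in this talk.

```python
from collections import deque

class SlidingWindowCounter:
    """Count the events seen in the last `window_seconds` seconds."""

    def __init__(self, window_seconds: float):
        self.window_seconds = window_seconds
        self.timestamps = deque()

    def add(self, timestamp: float) -> int:
        """Record one event and return how many events are in the current window."""
        self.timestamps.append(timestamp)
        # Drop events that have fallen out of the window.
        while self.timestamps[0] <= timestamp - self.window_seconds:
            self.timestamps.popleft()
        return len(self.timestamps)

counter = SlidingWindowCounter(window_seconds=10)
# Bursty arrivals: three quick events, two more later, then a long gap.
counts = [counter.add(t) for t in [0, 1, 2, 11, 12, 30]]
print(counts)
```

Each call does a small, bounded amount of work per event, which is what lets this style of processing keep up with a stream.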
![Page 37: Getting your head around big data](https://reader031.vdocuments.mx/reader031/viewer/2022013114/53f490468d7f728e318b4889/html5/thumbnails/37.jpg)
Not all data that is stored is useful.
Need to identify the useful data
Need to wade through all the noise
VERACITY
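Wading through the noise can start with something as simple as dropping malformed lines, low-severity chatter, and exact repeats. The sketch below assumes a hypothetical `LEVEL message` log format; the pattern and the choice of WARN/ERROR as "useful" are illustrative assumptions.

```python
import re

# Assumed log format: a severity level, a space, then the message.
LOG_PATTERN = re.compile(r"^(?P<level>DEBUG|INFO|WARN|ERROR) (?P<message>.+)$")

def useful_events(lines):
    """Yield parsed WARN/ERROR events, skipping malformed lines and repeats."""
    seen = set()
    for line in lines:
        line = line.strip()
        match = LOG_PATTERN.match(line)
        if match is None:
            continue  # malformed line: treat as noise
        if match["level"] not in {"WARN", "ERROR"}:
            continue  # low-severity chatter
        if line in seen:
            continue  # exact duplicate of an earlier event
        seen.add(line)
        yield match.groupdict()

lines = [
    "DEBUG heartbeat ok",
    "ERROR disk full on node-3",
    "garbage###",
    "ERROR disk full on node-3",
    "WARN latency above 500 ms",
    "INFO user logged in",
]
events = list(useful_events(lines))
print(events)
```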
![Page 38: Getting your head around big data](https://reader031.vdocuments.mx/reader031/viewer/2022013114/53f490468d7f728e318b4889/html5/thumbnails/38.jpg)
SOLUTIONS
![Page 39: Getting your head around big data](https://reader031.vdocuments.mx/reader031/viewer/2022013114/53f490468d7f728e318b4889/html5/thumbnails/39.jpg)
Map/Reduce

    function map(String name, String document):
        // name: document name
        // document: document contents
        for each word w in document:
            emit (w, 1)

    function reduce(String word, Iterator partialCounts):
        // word: a word
        // partialCounts: a list of aggregated partial counts
        sum = 0
        for each pc in partialCounts:
            sum += ParseInt(pc)
        emit (word, sum)
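The word-count pseudocode can be tried out on a single machine. The following is a minimal in-memory sketch, not how a real framework runs it: the `shuffle` step stands in for the grouping that a Map/Reduce framework performs between the two phases, and the document names and contents are invented for the example.

```python
from collections import defaultdict
from itertools import chain

def map_words(name, document):
    """Map step: emit (word, 1) for every word in the document."""
    for word in document.lower().split():
        yield word, 1

def shuffle(pairs):
    """Group emitted values by key, as a framework would between map and reduce."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups

def reduce_counts(word, partial_counts):
    """Reduce step: sum the partial counts for one word."""
    return word, sum(partial_counts)

# Hypothetical input documents.
documents = {"a.txt": "big data is big", "b.txt": "data is everywhere"}

mapped = chain.from_iterable(map_words(n, d) for n, d in documents.items())
word_counts = dict(reduce_counts(w, c) for w, c in shuffle(mapped).items())
print(word_counts)
```

Because each map call looks at one document and each reduce call looks at one word, both phases can be spread across many machines, which is the point of the pattern.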
![Page 40: Getting your head around big data](https://reader031.vdocuments.mx/reader031/viewer/2022013114/53f490468d7f728e318b4889/html5/thumbnails/40.jpg)
High-scale, high-availability databases
![Page 41: Getting your head around big data](https://reader031.vdocuments.mx/reader031/viewer/2022013114/53f490468d7f728e318b4889/html5/thumbnails/41.jpg)
Distributed processing of large datasets
![Page 42: Getting your head around big data](https://reader031.vdocuments.mx/reader031/viewer/2022013114/53f490468d7f728e318b4889/html5/thumbnails/42.jpg)
Data Visualization and analysis
![Page 43: Getting your head around big data](https://reader031.vdocuments.mx/reader031/viewer/2022013114/53f490468d7f728e318b4889/html5/thumbnails/43.jpg)
End to end tools
![Page 44: Getting your head around big data](https://reader031.vdocuments.mx/reader031/viewer/2022013114/53f490468d7f728e318b4889/html5/thumbnails/44.jpg)
More information
www.mongodb.org
www.memsql.com
cassandra.apache.org
hadoop.apache.org
www.tableausoftware.com
www.elasticsearch.org
splunk.com
![Page 45: Getting your head around big data](https://reader031.vdocuments.mx/reader031/viewer/2022013114/53f490468d7f728e318b4889/html5/thumbnails/45.jpg)
@gblock http://github.com/glennblock
http://www.flickr.com/photos/11812960@N04/4050576435