big data - mini workshop
TRANSCRIPT
-
8/10/2019 Big Data - Mini Workshop
1/29
David Tarrant @davetaz
Big Data? Big Opportunities?
-
Provide some practical steps to manage big open data projects.
-
Define Big Data
Describe different approaches to managing big data projects
Apply enterprise tools to analyse a big dataset
-
WHAT IS BIG DATA?
-
Big Data
Datasets that are too large and complex to manipulate with standard methods or tools.
-
Excel
Workbook WAS limited to 65,536 rows (2^16, aka 1…)
64-bit operating system addressing limit is 2^64
18,446,744,073,709,551,615
(quintillions, quadrillions, trillions, billions, millions, thousands, hundreds)
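The two limits on this slide can be checked with a few lines of Python. This is only an illustrative sketch; the constant names are made up for the example. Note that the long number on the slide is 2^64 - 1, the largest value a 64-bit number can hold, because counting starts at zero.

```python
# Row limit of the legacy Excel workbook format quoted on the slide: 2^16 rows.
EXCEL_LEGACY_ROWS = 2 ** 16

# Number of distinct values a 64-bit address can take: 2^64.
ADDRESSES_64_BIT = 2 ** 64

# The largest address is one less, because addressing starts at 0.
MAX_ADDRESS_64_BIT = ADDRESSES_64_BIT - 1

print(f"{EXCEL_LEGACY_ROWS:,}")
print(f"{MAX_ADDRESS_64_BIT:,}")
```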
-
What is big data?
Volume
Velocity
Variety
Veracity
-
What is big data?
Volume
Velocity
Variety
Veracity
We create around 4 zettabytes of data a day.
That's 4 sextillion bytes per day (1 zettabyte is 1 sextillion, or 10^21, bytes; 128-bit OS required).
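To see why a 64-bit address space is not enough here, the sketch below takes the slide's figure of 4 zettabytes per day at face value and works out how many address bits would be needed to index every byte. The helper name `bits_to_address` is hypothetical, introduced only for this example.

```python
ZETTABYTE = 10 ** 21  # SI definition: 1 ZB = 10^21 bytes (1 sextillion)


def bits_to_address(n_bytes: int) -> int:
    """Smallest address width, in bits, that can index n_bytes distinct bytes."""
    return (n_bytes - 1).bit_length()


daily_bytes = 4 * ZETTABYTE           # figure quoted on the slide
width = bits_to_address(daily_bytes)  # bits needed to address one day of data

print(f"{daily_bytes:,} bytes per day")
print(f"needs {width}-bit addresses; a 64-bit space covers {2 ** 64:,} bytes")
```

The result is more than 64 bits, so the next conventional word size up, 128 bits, is what the slide refers to.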
-
What is big data?
Volume
Velocity
Variety
Veracity
The data is created quicker than we can curate its storage.
-
What is big data?
Volume
Velocity
Variety
Veracity
The data is continuously changing in structure, format and detail.
-
What is big data?
Volume
Velocity
Variety
Veracity
The data quality is highly variable and affected by changing perceptions of truth and fact.
-
Big Data
Taken collectively, all digital data is big data. Looking at a facet might reveal that you are looking at a dataset that only conforms to one or two of the Vs.
Can you name a dataset that shows the characteristics of all 4 Vs?
-
A few more Vs
Value and Viability
More data does not mean better results.
In fact often entirely the opposite is true.
Sample selection is critical to all good statistical studies.
Not being able to control selection may lead to an incorrect conclusion.
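The point about sample selection can be illustrated with a small simulation. This is a sketch on synthetic data, not a result from the talk: the population, the selection rule (only values above 40 are ever collected) and the sample sizes are all assumptions chosen for the example.

```python
import random
import statistics

random.seed(42)  # fixed seed so the run is repeatable

# Hypothetical population of 100,000 measurements.
population = [random.gauss(50, 15) for _ in range(100_000)]
true_mean = statistics.fmean(population)

# Large sample with uncontrolled selection: values of 40 or below never get recorded.
biased_sample = [x for x in population if x > 40][:50_000]

# Small sample, but drawn uniformly at random from the whole population.
random_sample = random.sample(population, 500)

biased_error = abs(statistics.fmean(biased_sample) - true_mean)
random_error = abs(statistics.fmean(random_sample) - true_mean)

print(f"biased sample: n={len(biased_sample):,}, error={biased_error:.2f}")
print(f"random sample: n={len(random_sample):,}, error={random_error:.2f}")
```

With these assumptions, the sample that is a hundred times larger gives the worse estimate of the mean, because no amount of extra rows corrects for how they were selected.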
-
Conclusion
The majority of datasets are large.
Lots of rows with lots of joins that can be processed, if you know how to exploit the computing power available.
-
Define Big Data
Discuss the different approaches to managing big data projects
Apply enterprise tools to process a large dataset
-
Download the data, buy a Mac Pro
12 Cores
64 GB RAM
7,500
-
Download the data, build a cluster
80 Cores
4 TB RAM
50,000+
http://browser.primatelabs.com/geekbench3/913858
-
Take the data to the cloud
32 cores
244 GB RAM
$0.3293 per Hour
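A rough way to compare the buy and rent options is to ask how many hours of the cloud machine the 7,500 workstation price would pay for. The sketch below does that arithmetic. It assumes, loudly, that the 7,500 figure is in the same currency as the hourly rate (the slide does not state a currency) and ignores storage, data transfer and electricity; `break_even_hours` is a hypothetical helper name.

```python
WORKSTATION_PRICE = 7_500     # from the slides; currency not stated, assumed to match the rate
CLOUD_RATE_PER_HOUR = 0.3293  # from the slides, per hour


def break_even_hours(purchase_price: float, hourly_rate: float) -> float:
    """Hours of rented compute that cost the same as buying outright."""
    return purchase_price / hourly_rate


hours = break_even_hours(WORKSTATION_PRICE, CLOUD_RATE_PER_HOUR)
years = hours / 24 / 365  # if the rented machine ran around the clock

print(f"{hours:,.0f} hours, about {years:.1f} years of continuous use")
```

Under those assumptions the rented machine only becomes the more expensive option after a couple of years of non-stop use, and most analysis jobs run for far less than that.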
-
Separate your pipelines
-
Compute
Compute
Data
-
Queries: Google speed
-
Data
-
Compute Compute Compute
Compute Compute
Web Server
-
Compute Compute Compute
Compute Compute
Web Server
MAP
REDUCE
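The map and reduce steps in the diagram can be sketched in a few lines of Python. This is a single-process illustration of the pattern, not the service used in the talk: the word-count task, the sample chunks and the function names are all assumptions made for the example. In a real deployment each `map_chunk` call would run on a separate compute node.

```python
from collections import Counter
from functools import reduce


def map_chunk(lines):
    """Map step: count the words in one chunk, independently of every other chunk."""
    counts = Counter()
    for line in lines:
        counts.update(line.lower().split())
    return counts


def reduce_counts(left, right):
    """Reduce step: merge two partial results into one."""
    return left + right


def map_reduce(chunks):
    """Apply the map step to every chunk, then fold the partial results together."""
    partials = map(map_chunk, chunks)  # each call could be sent to a different node
    return reduce(reduce_counts, partials, Counter())


# Hypothetical data, already split into two chunks.
chunks = [
    ["big data big opportunities"],
    ["open data projects", "big data projects"],
]
totals = map_reduce(chunks)
print(totals.most_common(3))
```

Because each chunk is counted without reference to the others, the map step scales out across compute nodes, and only the small partial counts need to travel to the reducer.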
-
Map-reduce services
-
Define Big Data
Discuss the different approaches to managing big data projects
Apply enterprise tools to process a large dataset
-
Exercise
Visualising 6m rows of data in Socrata.
-
Compute
Compute
Data
-
David Tarrant @davetaz
Thank you