Big Data Definition

Spring - 2014

Jordi Torres, UPC - BSC www.JordiTorres.eu @JordiTorresBCN

Technology basics for data scientists



Big Data?

§  Do you need a definition?
 –  Big Data is data that becomes large enough that it cannot be processed using conventional methods.
 –  Enough for you? :-)

§  Petabytes of data are created daily by social networks, mobile phones, sensors, science, …

Source: http://www.datacenterknowledge.com/archives/2011/06/28/digital-universe-to-add-1-8-zettabytes-in-2011/?utm-source=feedburner&utm-medium=feed&utm-campaign=Feed:+DataCenterKnowledge+%28Data


[Figure: Data scale (Kilo, Mega, Giga, Tera, Peta, Exa) plotted against decision frequency (occasional, frequent, real-time; from yr, mo, wk, day, hr, min, sec down to ms and µs). Traditional Data Warehouse and Business Intelligence occupy one region; Big Data is up to 10,000 times larger and spans both data at rest and data in motion.]

Source: www.slideshare.net/schihei/petascale-analytics-the-world-of-big-data-requires-big-analytics


My definition :-)

§  Big Data is data that exceeds the storing, processing and managing capacity of conventional systems.

§  The reason is that the data is too big, moves too fast, or doesn’t fit the structures of our current system architectures.

§  Moreover, to gain value from this data, we must change the way we analyze it.


Big Data: VOLUME

Terabytes? Petabytes? Exabytes? Zettabytes?


1 Gigabyte (GB) = 1.000.000.000 bytes
1 Terabyte (TB) = 1.000 Gigabytes (GB)
1 Petabyte (PB) = 1.000.000 Gigabytes (GB)
1 Exabyte (EB) = 1.000.000.000 Gigabytes (GB)
1 Zettabyte (ZB) = 1.000.000.000.000 Gigabytes (GB)
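The unit ladder above can be checked with a few lines of Python. This is only a minimal sketch: the `UNITS` table and the `convert` helper are illustrative names, not part of the original slides, and decimal (power-of-ten) units are assumed, as on the slide.

```python
# Decimal byte units from the slide, each expressed in bytes.
# UNITS and convert() are hypothetical helper names used for illustration.
UNITS = {
    "GB": 10**9,
    "TB": 10**12,
    "PB": 10**15,
    "EB": 10**18,
    "ZB": 10**21,
}


def convert(value, from_unit, to_unit):
    """Convert a quantity from one decimal byte unit to another."""
    return value * UNITS[from_unit] / UNITS[to_unit]


if __name__ == "__main__":
    # Print how many gigabytes fit in one of each larger unit.
    for unit in ("TB", "PB", "EB", "ZB"):
        print(f"1 {unit} = {convert(1, unit, 'GB'):,.0f} GB")
```

Each step up the ladder multiplies by 1.000, so one zettabyte is 1.000.000.000.000 gigabytes.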


Big Data: VARIETY

Source: Toni Brey – Urbiotica.com


Big Data: VARIETY

§  Data growth is increasingly unstructured
 –  Structured
   •  Data containing a defined data type, format, structure
   •  E.g. transactional database
 –  Semi-structured
   •  Textual data files with a discernible pattern, enabling parsing
   •  E.g. XML data file + XML schema
 –  “Quasi” structured
   •  Textual data with erratic data formats
   •  E.g. Web clickstream (may contain inconsistencies)
 –  Unstructured
   •  No inherent structure and different types of files
   •  E.g. PDFs, images, videos, …
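To make the difference between semi-structured and “quasi” structured data concrete, here is a small Python sketch. The sample XML document, the clickstream lines, and the function names are all made up for illustration; they do not come from the slides. The XML can be handed to a standard parser, while the clickstream lines need a tolerant pattern match and a fallback for lines that do not fit.

```python
import re
import xml.etree.ElementTree as ET

# Semi-structured: XML follows a discernible pattern, so a standard parser works.
# (Hypothetical sample document.)
XML_DOC = "<orders><order id='1' amount='9.50'/><order id='2' amount='20.00'/></orders>"


def parse_orders(xml_text):
    """Return (id, amount) pairs for every <order> element."""
    root = ET.fromstring(xml_text)
    return [(o.get("id"), float(o.get("amount"))) for o in root.iter("order")]


# "Quasi" structured: clickstream lines with erratic formats (hypothetical samples).
CLICKS = [
    "2014-03-01 10:00:01 user=42 url=/home",
    "2014-03-01T10:00:05 url=/cart user=42",
    "garbage line without fields",
]

# Pick up key=value pairs wherever they appear in the line.
FIELD = re.compile(r"(\w+)=(\S+)")


def parse_click(line):
    """Return the key=value fields of a line, or None if required fields are missing."""
    fields = dict(FIELD.findall(line))
    if "user" in fields and "url" in fields:
        return fields
    return None


if __name__ == "__main__":
    print(parse_orders(XML_DOC))
    for line in CLICKS:
        print(parse_click(line))
```

Unstructured data (PDFs, images, videos) has no such pattern to match at all, which is why it usually needs content-specific processing before any parsing of this kind applies.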


Big Data: VELOCITY

§  Real-time required


Summary:

§  Volume:
 –  Large volumes of data
 –  Terabytes, Petabytes, …
 –  Data that cannot be stored in conventional RDBMS
§  Variety:
 –  Source data is diverse: Web logs, application logs, machine-generated data, social network data, etc.
 –  Doesn't fall into neat relational structures: unstructured, semi-structured
§  Velocity:
 –  Streaming data, Complex Event Processing data
 –  Velocity of incoming data and speed of responding to it


Big Data Definition

3V = VOLUME + VARIETY + VELOCITY

Value?, Veracity? …