Big Data Definition
Spring - 2014
Jordi Torres, UPC - BSC www.JordiTorres.eu @JordiTorresBCN
technology basics for data scientists
2
Big Data?
§ Do you need a definition?
– Big Data is data that becomes large enough that it cannot be processed using conventional methods.
– Enough for you? :-)
§ Petabytes of data are created daily by social networks, mobile phones, sensors, science, …
Source: http://www.datacenterknowledge.com/archives/2011/06/28/digital-universe-to-add-1-8-zettabytes-in-2011/?utm-source=feedburner&utm-medium=feed&utm-campaign=Feed:+DataCenterKnowledge+%28Data
3
[Figure: data scale (Kilo, Mega, Giga, Tera, Peta, Exa) plotted against decision frequency (yr, mo, wk, day, hr, min, sec, …, ms, µs; occasional, frequent, real-time). Region labels: Traditional Data Warehouse and Business Intelligence; Big Data (up to 10,000 times larger); Data at Rest; Data in Motion.]
Source: www.slideshare.net/schihei/petascale-analytics-the-world-of-big-data-requires-big-analytics
4
My definition :-)
§ Big Data is data that exceeds the storage, processing, and management capacity of conventional systems.
§ The reason is that the data is too big, moves too fast, or doesn't fit the structures of our current system architectures.
§ Moreover, to gain value from this data, we must change the way we analyze it.
5
Big Data: VOLUME
Terabytes? Petabytes? Exabytes? Zettabytes?
6
1 Gigabyte (GB) = 1,000,000,000 bytes
1 Terabyte (TB) = 1,000 Gigabytes (GB)
1 Petabyte (PB) = 1,000,000 Gigabytes (GB)
1 Exabyte (EB) = 1,000,000,000 Gigabytes (GB)
1 Zettabyte (ZB) = 1,000,000,000,000 Gigabytes (GB)
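The unit relations above can be checked with a few lines of arithmetic. This is a minimal sketch in Python, assuming decimal (SI) units as on the slide; the names UNITS, convert, and human_readable are illustrative and not part of the original deck. The 1.8 ZB figure is the one quoted in the source link on the earlier slide.

```python
# Decimal (SI) storage units from the slide, expressed in bytes.
UNITS = {
    "GB": 10**9,
    "TB": 10**12,
    "PB": 10**15,
    "EB": 10**18,
    "ZB": 10**21,
}

def convert(value, from_unit, to_unit):
    """Convert a quantity between two decimal storage units."""
    return value * UNITS[from_unit] / UNITS[to_unit]

def human_readable(num_bytes):
    """Express a byte count in the largest unit where the value is at least 1."""
    for unit in ("ZB", "EB", "PB", "TB", "GB"):
        if num_bytes >= UNITS[unit]:
            return f"{num_bytes / UNITS[unit]:.1f} {unit}"
    return f"{num_bytes} bytes"

print(convert(1, "PB", "GB"))       # 1000000.0, matching the slide
print(convert(1.8, "ZB", "GB"))     # the 1.8 ZB figure, expressed in GB
print(human_readable(25 * 10**14))  # 2.5 PB
```

Keeping every unit as a byte count in one table means a conversion is a single multiplication and division, with no chain of pairwise factors to maintain.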
7
Big Data: VARIETY
Source: Toni Brey – Urbiotica.com
8
Big Data: VARIETY
§ Data growth is increasingly unstructured
– Structured
• Data containing a defined data type, format, structure
• E.g. transactional database
– Semi-Structured
• Textual data files with a discernible pattern, enabling parsing
• E.g. XML data file + XML schema
– "Quasi" Structured
• Textual data with erratic data formats
• E.g. Web clickstream (may contain inconsistencies)
– Unstructured
• No inherent structure and different types of files
• E.g. PDFs, images, videos, …
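The practical difference between semi-structured and "quasi" structured data shows up when you try to parse it. The sketch below is only an illustration in Python: the XML snippet, the clickstream lines, and the field names (user, url) are hypothetical samples, not data from the slides. The XML parses with a standard parser because its pattern is regular; the clickstream needs tolerant, field-by-field extraction and may still yield unusable lines.

```python
import re
import xml.etree.ElementTree as ET

# Semi-structured: a discernible pattern (XML) lets a standard parser do the work.
xml_doc = "<orders><order id='1' total='9.50'/><order id='2' total='20.00'/></orders>"
root = ET.fromstring(xml_doc)
totals = [float(order.get("total")) for order in root.findall("order")]

# "Quasi" structured: clickstream lines with erratic formats (hypothetical samples).
clicks = [
    "2014-03-01 10:00:01 user=42 url=/home",
    "2014-03-01T10:00:05 url=/cart user=42",  # different timestamp and field order
    "corrupted line without fields",          # inconsistent record
]
pattern_user = re.compile(r"user=(\d+)")
pattern_url = re.compile(r"url=(\S+)")

def parse_click(line):
    """Extract the fields we can find; return None when a line is unusable."""
    user = pattern_user.search(line)
    url = pattern_url.search(line)
    if not (user and url):
        return None
    return {"user": int(user.group(1)), "url": url.group(1)}

parsed = [parse_click(line) for line in clicks]
print(totals)
print(parsed)
```

Searching for each field independently, instead of matching one rigid line format, is what lets the second click parse despite its different field order.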
9
Big Data: VELOCITY
§ Real-time required
10
Summary:
§ Volume:
– Large volumes of data
– Terabytes, Petabytes, …
– Data that cannot be stored in a conventional RDBMS
§ Variety:
– Source data is diverse
– Web logs, application logs, machine-generated data, social network data, etc.
– Doesn't fall into neat relational structures
– Unstructured, semi-structured
§ Velocity:
– Streaming data, Complex Event Processing data
– Velocity of incoming data and speed of responding to it
11
Big Data Definition
3V = VOLUME + VARIETY + VELOCITY
Value? Veracity? …