Big Data and Hadoop Essentials
Agenda
• Hadoop Ecosystem
• Map Reduce Algorithm Exemplified
• Hadoop Architecture
• Brief History in time
• Why Hadoop?
• How Big is Big Data?
• Demo
Brief History in time
In pioneer days they used oxen for heavy pulling, and when one ox couldn’t budge a log, they didn’t try to grow a larger ox. We shouldn’t be trying for bigger computers, but for more systems of computers.
—Grace Hopper, American Computer Scientist
How Big is Big Data?
Why Hadoop?
The Problem
Challenges in managing Big Data
Volume
Big Data arrives at large scale: terabytes (TB) and even petabytes (PB).
Records, transactions, tables, files.
Veracity
Quality, consistency, reliability and provenance of data.
Good, bad, undefined, inconsistent, incomplete.
Variety
Big Data extends beyond structured data to include semi-structured and unstructured data of all varieties.
Text, logs, XML, audio, video, streams, flat files.
Velocity
Data flows in continuously; it is time-sensitive and often streaming.
Batch, real time, streams, historic.
Hadoop evolved to overcome Big Data challenges
• Cost effective – commodity hardware
• Big cluster (1000 nodes) – provides storage and processing
• Parallel processing – MapReduce
• Big storage – usable capacity ≈ storage per node × number of nodes / replication factor (RF)
• Failover mechanism – automatic failover
• Data distribution
• Moving code to data
• Heterogeneous hardware (IBM, HP, AIX, Oracle machines of any memory and CPU configuration)
• Scalable
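The Big Storage bullet can be read as a back-of-the-envelope calculation. The sketch below assumes that "memory per node" means disk storage per node. The function name and the sample figures (4 TB per node, RF of 3) are illustrative assumptions rather than values from the slides, and the estimate ignores space reserved for the operating system and temporary data.

```python
def usable_capacity_tb(storage_per_node_tb: float, node_count: int,
                       replication_factor: int = 3) -> float:
    """Estimate usable cluster storage: per-node storage * nodes / RF."""
    if replication_factor < 1:
        raise ValueError("replication factor must be at least 1")
    return storage_per_node_tb * node_count / replication_factor


# Hypothetical cluster: 1000 nodes with 4 TB of disk each, RF = 3.
estimate = usable_capacity_tb(4.0, 1000, 3)
print(f"{estimate:.1f} TB usable out of 4000 TB raw")
```

Because every block is stored RF times, only about one third of the raw capacity is available for distinct data in this example.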
What Exactly is Hadoop?
What’s in a name?
Hadoop Vendors
Who uses Hadoop?
What is Hadoop used for?
Stop and Ponder
• Is Hadoop an alternative to an RDBMS?
• Hadoop is not replacing the traditional data systems used for building analytic applications (the RDBMS, EDW and MPP systems); rather, it complements them and works well alongside an RDBMS.
• Hadoop is being used to distill large quantities of data into something more manageable.
Stop and Ponder
• But don’t we know Coherence to be distributed too? Why Hadoop?
Coherence is the market-leading in-memory data grid. Hadoop works well for large processing operations, i.e. those involving many TB of data that can be processed in a batch-like way. For use cases where the processing requirements are closer to real time and the data volumes are smaller, Coherence is a better choice than HDFS for storing the data.
Hadoop vs. RDBMS

| | RDBMS | MapReduce |
|---|---|---|
| Data size | Gigabytes | Petabytes |
| Access | Interactive and batch | Batch |
| Structure | Fixed schema | Unstructured schema |
| Language | SQL | Procedural (Java, C++, Ruby, etc.) |
| Integrity | High | Low |
| Scaling | Nonlinear | Linear |
| Updates | Read and write | Write once, read many times |
| Latency | Low | High |
Using Hadoop in Enterprise
Hadoop Architecture
• Hadoop Distributed File System (HDFS™): A distributed file system that provides high-throughput access to application data.
• Hadoop MapReduce: A software framework for distributed processing of large data sets on compute clusters.
[Diagram: Hadoop = HDFS + Map Reduce]
Hadoop Distributed File System (HDFS)
HDFS Architecture (Master-Slave)
[Diagram: Name Node (master, book keeper); Secondary Name Node (periodic checkpoint); Slave(s) holding data blocks]
The CORE
[Diagram labels: CLIENT; Data Analytics Jobs (Map Reduce); Data Storage Jobs (HDFS); MASTER; SLAVE]
Hadoop Ecosystem
MAP REDUCE Algorithm exemplified!
Calculate the yearly average per state.
Step 1: Group the city average temperatures by state.
Step 2: We don’t really care about the city names, so we discard them and keep only the state names and the cities’ temperatures.
Step 3: We get a list of average temperatures for each state.
Step 4: All we have to do now is calculate the average temperature for each state.
That was Map/Reduce!
Let’s do it again…
• Map/Reduce has 3 stages: Map, Shuffle and Reduce.
• The Shuffle part is done automatically by Hadoop; you just need to implement the Map and Reduce parts.
• You get input data as <Key,Value> pairs for the Map part.
• In this example, the Key is the city name, and the Value is the set of attributes: the state and the city’s yearly average temperature.
• Since you want to regroup the temperatures by state, you drop the city name: the State becomes the Key and the Temperature becomes the Value.
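The Map step described above can be sketched in plain Python, without any Hadoop API. The city records, the temperature values and the name `map_record` are hypothetical and chosen only for illustration; a real Hadoop job would implement a Mapper class instead.

```python
from typing import Iterator, List, Tuple

# Hypothetical input records: (city, (state, yearly average temperature)).
CITY_RECORDS: List[Tuple[str, Tuple[str, float]]] = [
    ("Los Angeles", ("CA", 70.0)),
    ("San Francisco", ("CA", 60.0)),
    ("Houston", ("TX", 72.0)),
    ("Buffalo", ("NY", 50.0)),
]


def map_record(city: str, attributes: Tuple[str, float]) -> Iterator[Tuple[str, float]]:
    """Drop the city name and emit (state, temperature) as the new <Key,Value>."""
    state, temperature = attributes
    yield state, temperature


# Apply the map function to every input record.
mapped = [pair for city, attrs in CITY_RECORDS for pair in map_record(city, attrs)]
print(mapped)
```

The map function is written as a generator because a mapper may emit zero, one or many pairs per input record; here it always emits exactly one.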
Shuffle
• Next, the Shuffle task runs on the output of the Map task. It groups all the values by Key, so each Key ends up with a List<Value>.
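Hadoop performs this grouping for you, but a small in-memory imitation may help to show what the Reduce step receives. This is only a sketch: the `shuffle` function and the sample pairs are hypothetical, and the real shuffle also moves data between machines.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple

# Hypothetical Map output: (state, temperature) pairs in arbitrary order.
MAPPED_PAIRS = [("CA", 70.0), ("TX", 72.0), ("CA", 60.0), ("NY", 50.0), ("TX", 68.0)]


def shuffle(pairs: Iterable[Tuple[str, float]]) -> Dict[str, List[float]]:
    """Group every value under its key, returning keys in sorted order."""
    grouped: Dict[str, List[float]] = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return {key: grouped[key] for key in sorted(grouped)}


grouped = shuffle(MAPPED_PAIRS)
print(grouped)
```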
Reduce
• The Reduce task applies the actual logic to the data; in our case, it calculates each state’s yearly average temperature.
• That is what we get as the final output.
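As a rough illustration of the Reduce step, the sketch below averages the grouped temperatures per state in plain Python. The grouped input, its values and the name `reduce_state` are assumptions made for the example; a real job would implement a Reducer that Hadoop calls once per key.

```python
from typing import Dict, List, Tuple

# Hypothetical Shuffle output: each state mapped to its list of city averages.
GROUPED: Dict[str, List[float]] = {
    "CA": [70.0, 60.0],
    "NY": [50.0, 54.0],
    "TX": [72.0, 68.0, 70.0],
}


def reduce_state(state: str, temperatures: List[float]) -> Tuple[str, float]:
    """Collapse one state's list of temperatures into a single average."""
    return state, sum(temperatures) / len(temperatures)


# Call the reduce function once per key, as the framework would.
state_averages = dict(reduce_state(state, temps) for state, temps in GROUPED.items())
print(state_averages)
```

Each state can be reduced independently of the others, which is what allows the Reduce work to be spread across many nodes.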
Hadoop AppStore
Ecosystem Matrix
Pig and HIVE in the Hadoop Ecosystem
Hadoop Ecosystem Development
Demo
Q & A
Thank You