hadoop administrationfiles.meetup.com/11583652/hadoop_presentation.pdf · why hadoop ? we are...
![Page 1: Hadoop Administrationfiles.meetup.com/11583652/Hadoop_Presentation.pdf · Why Hadoop ? We are generating more data then ever before Financial transactions Sensor networks Server logs](https://reader033.vdocuments.mx/reader033/viewer/2022060219/5f06e1da7e708231d41a34b1/html5/thumbnails/1.jpg)
Hadoop Administration
Case for Hadoop
Why Hadoop is needed
How Hadoop originated
What problems Hadoop solves
Why Hadoop?
We are generating more data than ever before
Financial transactions
Sensor networks
Server logs
Analytics
Social Media
It’s not just about the size of data, but also its frequency: we are generating data faster than ever before.
We need to make sense of this data.
The 3 V's
Web logs
Images
Videos
Audio
Sensor Data
Volume, Velocity, Variety
Two big problems at hand
Large scale data storage
Large scale data analysis
- Traditional approaches, which move data to the compute node, do not scale well at this size.
- More time is spent copying data than actually processing it.
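To put rough numbers on the copying problem, the sketch below compares shipping a whole dataset to one compute node against letting each node scan only the share it already stores. Every figure in it (1 TB of data, a single 1 Gbit/s link, 100 nodes, about 100 MB/s of local disk throughput) is a hypothetical assumption chosen for illustration, and processing time is ignored.

```python
# Back-of-envelope comparison; every figure below is a hypothetical assumption.
DATASET_BYTES = 10**12             # 1 TB (decimal) dataset
NETWORK_BYTES_PER_S = 125 * 10**6  # one 1 Gbit/s link, about 125 MB/s
NODES = 100                        # cluster size
DISK_BYTES_PER_S = 100 * 10**6     # local sequential read per node, about 100 MB/s


def copy_to_one_node_seconds(size, link_rate):
    """Time to ship the whole dataset over one link to a single compute node."""
    return size / link_rate


def local_parallel_read_seconds(size, nodes, disk_rate):
    """Time for every node to scan only its own share of the data, all in parallel."""
    return (size / nodes) / disk_rate


if __name__ == "__main__":
    centralized = copy_to_one_node_seconds(DATASET_BYTES, NETWORK_BYTES_PER_S)
    local = local_parallel_read_seconds(DATASET_BYTES, NODES, DISK_BYTES_PER_S)
    print(f"copy to one node:    {centralized:,.0f} s")
    print(f"parallel local read: {local:,.0f} s")
```

With these assumed numbers the single-node copy takes 8,000 s (over two hours) while the parallel local scan takes 100 s, which is the motivation for moving computation to the data instead.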
What is Hadoop?
Apache Hadoop is a framework that allows for the distributed processing of large data sets across clusters of commodity computers using a simple programming model.
It is an open-source data management platform with scale-out storage & distributed processing.
Hadoop Ecosystem
Hadoop Components
It has two main components:
HDFS – Hadoop Distributed File System (Storage)
Distributed across “nodes”
Natively redundant
NameNode tracks block locations.
MapReduce (Processing)
Splits a task across processors “near” the data & assembles results
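The split, process-near-the-data, assemble-results flow can be illustrated with the classic word-count job. This is a single-process Python simulation of the map, shuffle and reduce phases under simplifying assumptions, not Hadoop's Java MapReduce API; the function names are hypothetical.

```python
from collections import defaultdict
from itertools import chain


def map_phase(split):
    """Mapper: emit (word, 1) for every word in one input split."""
    for word in split.lower().split():
        yield word, 1


def shuffle(pairs):
    """Group intermediate values by key, as the framework does between map and reduce."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key, values):
    """Reducer: sum the counts for one word."""
    return key, sum(values)


def word_count(splits):
    # In Hadoop each split would be mapped on a node holding that block;
    # here the mappers simply run one after another in a single process.
    intermediate = chain.from_iterable(map_phase(s) for s in splits)
    grouped = shuffle(intermediate)
    return dict(reduce_phase(k, v) for k, v in sorted(grouped.items()))


if __name__ == "__main__":
    print(word_count(["the quick brown fox", "the lazy dog", "The fox"]))
```

Because mappers only see their own split and reducers only see one key's values, each phase can be spread across many machines independently.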
Main Components of HDFS
NameNode
- Master Node
- Stores metadata
DataNode
- Stores the Actual Data Blocks
- Serves Read/Write Requests
NameNode Metadata
Metadata in Memory
- The entire metadata is held in main memory
- No demand paging of FS metadata
Types of Metadata
- List of files
- List of Blocks for each file
- List of DataNodes for each block
- File attributes, e.g. access time, replication factor
A Transaction Log
- Records file creations, file deletions, etc.
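A minimal sketch of how the metadata types listed above might be held in memory, with an append-only list standing in for the transaction log (HDFS calls it the EditLog). The class and method names are hypothetical. One simplification worth flagging: in real HDFS the block-to-DataNode mapping is not persisted in the log; the NameNode rebuilds it from block reports sent by the DataNodes.

```python
from dataclasses import dataclass, field


@dataclass
class FileMeta:
    """Per-file attributes kept by the (toy) NameNode."""
    replication: int
    access_time: float = 0.0
    blocks: list = field(default_factory=list)  # block IDs in file order


class NameNodeMetadata:
    """Hypothetical in-memory namespace loosely modeled on the NameNode's metadata."""

    def __init__(self):
        self.files = {}            # list of files: path -> FileMeta
        self.block_locations = {}  # block ID -> set of DataNode names
        self.edit_log = []         # append-only transaction log of namespace changes
        self._next_block_id = 1

    def create(self, path, replication=3):
        self.edit_log.append(("CREATE", path))
        self.files[path] = FileMeta(replication=replication)

    def add_block(self, path, datanodes):
        block_id = self._next_block_id
        self._next_block_id += 1
        # Only the block ID is logged; in this toy the locations are passed in
        # directly, standing in for DataNode block reports.
        self.edit_log.append(("ADD_BLOCK", path, block_id))
        self.files[path].blocks.append(block_id)
        self.block_locations[block_id] = set(datanodes)
        return block_id

    def delete(self, path):
        self.edit_log.append(("DELETE", path))
        for block_id in self.files.pop(path).blocks:
            self.block_locations.pop(block_id, None)

    def locate(self, path):
        """List (block ID, DataNodes) pairs, as a client needs for a read."""
        return [(b, sorted(self.block_locations[b])) for b in self.files[path].blocks]
```

Keeping everything in dictionaries mirrors the "no demand paging" point: every lookup is a memory access, which is also why NameNode heap size limits how many files and blocks a cluster can hold.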
HDFS Architecture
File Split
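HDFS stores each file as a sequence of fixed-size blocks. The sketch below computes the block boundaries for a file, assuming the 128 MB (134,217,728-byte) default of `dfs.blocksize` in Hadoop 2.x and later (Hadoop 1.x defaulted to 64 MB); the function name is hypothetical.

```python
BLOCK_SIZE = 128 * 1024 * 1024  # 128 MiB, the dfs.blocksize default in Hadoop 2.x+


def split_into_blocks(file_size, block_size=BLOCK_SIZE):
    """Return (offset, length) for each block of a file that is file_size bytes long."""
    if file_size < 0 or block_size <= 0:
        raise ValueError("file_size must be >= 0 and block_size > 0")
    blocks = []
    offset = 0
    while offset < file_size:
        length = min(block_size, file_size - offset)  # last block may be shorter
        blocks.append((offset, length))
        offset += length
    return blocks
```

For a 300 MB file this yields blocks of 128, 128 and 44 MB. The final block holds only the remaining 44 MB and does not occupy a full block's worth of disk space.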
Replication
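Each block is stored on several DataNodes (three by default, set by `dfs.replication`). When a DataNode stops sending heartbeats, the NameNode marks it dead and schedules new copies of the blocks it held. The sketch below models only that bookkeeping; the data structures and the target-selection rule (first eligible live nodes in sorted order) are hypothetical, and real HDFS also applies its rack-aware placement policy.

```python
def under_replicated(block_locations, live_nodes, replication=3):
    """Return {block ID: replicas missing} for blocks below the target replication."""
    missing = {}
    for block_id, nodes in block_locations.items():
        live_replicas = len(nodes & live_nodes)
        if live_replicas < replication:
            missing[block_id] = replication - live_replicas
    return missing


def plan_rereplication(block_locations, live_nodes, replication=3):
    """Choose new live targets for each under-replicated block.

    Hypothetical rule: take the first eligible nodes in sorted order.
    """
    plan = {}
    for block_id, count in under_replicated(block_locations, live_nodes, replication).items():
        candidates = sorted(live_nodes - block_locations[block_id])
        plan[block_id] = candidates[:count]
    return plan


if __name__ == "__main__":
    locations = {"blk_1": {"dn1", "dn2", "dn3"}, "blk_2": {"dn2", "dn3", "dn4"}}
    live = {"dn1", "dn2", "dn4", "dn5"}  # dn3 has stopped sending heartbeats
    print(plan_rereplication(locations, live))
```

Losing one DataNode leaves each affected block with two live copies, so the data stays readable while the missing replica is recreated elsewhere.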
Write Operation
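The write path can be sketched as follows: the client asks the NameNode to create the file, then for each block asks it for a block ID and a list of target DataNodes, and sends the bytes to those DataNodes directly, so the NameNode only ever records metadata. `ToyNameNode`, `ToyDataNode` and `write_file` are hypothetical; round-robin target selection and writing each replica straight from the client are simplifications (real HDFS uses rack-aware placement and a DataNode pipeline).

```python
class ToyNameNode:
    """Hypothetical NameNode stand-in: hands out block IDs and targets, keeps metadata only."""

    def __init__(self, datanode_names, replication=3):
        if replication > len(datanode_names):
            raise ValueError("need at least as many DataNodes as replicas")
        self.datanode_names = list(datanode_names)
        self.replication = replication
        self.files = {}  # path -> list of (block ID, target DataNodes)
        self._next = 0

    def create(self, path):
        self.files[path] = []

    def add_block(self, path):
        block_id = f"blk_{self._next}"
        n = len(self.datanode_names)
        # Round-robin choice of targets (a simplification of rack-aware placement).
        targets = [self.datanode_names[(self._next + i) % n] for i in range(self.replication)]
        self._next += 1
        self.files[path].append((block_id, targets))
        return block_id, targets


class ToyDataNode:
    """Hypothetical DataNode stand-in that keeps block contents in a dict."""

    def __init__(self):
        self.blocks = {}

    def store(self, block_id, data):
        self.blocks[block_id] = data


def write_file(namenode, datanodes, path, data, block_size):
    """Split data into blocks and send each one to the DataNodes the NameNode chose."""
    namenode.create(path)
    for offset in range(0, len(data), block_size):
        chunk = data[offset:offset + block_size]
        block_id, targets = namenode.add_block(path)
        # Simplified: the client stores every replica itself; real HDFS sends
        # the block only to the first DataNode, which forwards it down a pipeline.
        for name in targets:
            datanodes[name].store(block_id, chunk)
```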
Rack Awareness
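HDFS's default placement policy for a replication factor of 3 puts the first replica on the writer's node (or a random node if the client runs outside the cluster), the second on a node in a different rack, and the third on a different node in the same rack as the second. This survives the loss of a whole rack while only one transfer in the write pipeline has to cross between racks. A minimal sketch, assuming a hypothetical `{rack: [nodes]}` topology dictionary; unlike HDFS, it simply refuses topologies it cannot satisfy instead of falling back.

```python
import random


def place_replicas(topology, writer=None, rng=random):
    """Pick 3 nodes: the writer (or a random node), a node on another rack, a second node on that rack.

    topology is a hypothetical {rack_id: [node, ...]} mapping.
    """
    rack_of = {node: rack for rack, nodes in topology.items() for node in nodes}
    # 1st replica: the writer's own node if it is a DataNode, otherwise a random node.
    first = writer if writer in rack_of else rng.choice(sorted(rack_of))
    # 2nd replica: a different rack that has room for two replicas.
    remote_racks = [r for r in sorted(topology)
                    if r != rack_of[first] and len(topology[r]) >= 2]
    if not remote_racks:
        raise ValueError("need another rack with at least two nodes")
    remote = rng.choice(remote_racks)
    second = rng.choice(topology[remote])
    # 3rd replica: a different node on the same rack as the 2nd.
    third = rng.choice([n for n in topology[remote] if n != second])
    return [first, second, third]
```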
Pipelined Write
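In a pipelined write the client sends each block as a stream of packets to the first DataNode only; each DataNode stores the packet and forwards it to the next, and acknowledgements travel back along the pipeline in reverse. The client keeps a packet in an ack queue until every DataNode has acknowledged it. The sketch below processes one packet at a time for readability (real HDFS keeps many packets in flight), and all names are hypothetical.

```python
from collections import deque


def pipeline_write(packets, pipeline):
    """Simulate a pipelined block write; returns (packets stored per DataNode, event trace)."""
    data_queue = deque(enumerate(packets))  # packets waiting to be sent
    ack_queue = deque()                     # packets sent but not yet fully acknowledged
    stored = {dn: [] for dn in pipeline}
    trace = []
    while data_queue:
        seq, packet = data_queue.popleft()
        ack_queue.append((seq, packet))
        # Downstream: client -> pipeline[0] -> pipeline[1] -> ...; each node stores then forwards.
        for dn in pipeline:
            stored[dn].append(packet)
            trace.append(f"{dn} stored packet {seq}")
        # Upstream: the last DataNode acks first and each node relays the ack toward the client.
        for dn in reversed(pipeline):
            trace.append(f"{dn} acked packet {seq}")
        # All DataNodes have acknowledged, so the client can drop its copy.
        ack_queue.popleft()
    return stored, trace
```

Because the client pushes each byte over the network only once, the cost of writing three replicas is spread across the DataNodes instead of tripling the client's upload.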
Thank You!