Hadoop Architecture Meetup
![Page 1: Hadoop architecture meetup](https://reader035.vdocuments.mx/reader035/viewer/2022070303/549a0f87b4795955718b467b/html5/thumbnails/1.jpg)
Hadoop Architecture
![Page 2: Hadoop architecture meetup](https://reader035.vdocuments.mx/reader035/viewer/2022070303/549a0f87b4795955718b467b/html5/thumbnails/2.jpg)
Agenda
• Different Hadoop daemons & their roles
• What does a Hadoop cluster look like
• Under the hood: how does it write a file
• Under the hood: how does it read a file
• Under the hood: how does it replicate a file
• Under the hood: how does it run a job
• How to balance an unbalanced Hadoop cluster
![Page 3: Hadoop architecture meetup](https://reader035.vdocuments.mx/reader035/viewer/2022070303/549a0f87b4795955718b467b/html5/thumbnails/3.jpg)
Hadoop – A bit of background
• It’s an open source project
• Based on 2 technical papers published by Google
• A well known platform for distributed applications
• Easy to scale-out
• Works well with commodity hardware (not entirely true)
• Very good for background (batch) applications
![Page 4: Hadoop architecture meetup](https://reader035.vdocuments.mx/reader035/viewer/2022070303/549a0f87b4795955718b467b/html5/thumbnails/4.jpg)
Hadoop Architecture
• Two primary components
  • Distributed File System (HDFS): deals with file operations such as read, write, and delete
  • Map Reduce Engine: deals with parallel computation
![Page 5: Hadoop architecture meetup](https://reader035.vdocuments.mx/reader035/viewer/2022070303/549a0f87b4795955718b467b/html5/thumbnails/5.jpg)
Hadoop Distributed File System
• Runs on top of the existing file system
• A file is broken into blocks of a pre-defined, equal size, and each block is stored individually
• Designed to handle very large files
• Not good for a huge number of small files
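The block-splitting idea above can be sketched in a few lines. This is only an illustration, not HDFS code: the function name `split_into_blocks` is hypothetical, and the 128 MiB block size is an assumed value (the real default depends on the Hadoop version and cluster configuration). It also assumes the final block holds only the leftover bytes when the file size is not an exact multiple of the block size.

```python
def split_into_blocks(file_size, block_size):
    """Return the list of block lengths (in bytes) for a file of file_size bytes.

    Hypothetical helper for illustration only; it does not talk to HDFS.
    """
    if block_size <= 0:
        raise ValueError("block_size must be positive")
    if file_size < 0:
        raise ValueError("file_size must not be negative")
    full_blocks, remainder = divmod(file_size, block_size)
    sizes = [block_size] * full_blocks
    if remainder:
        # The last block only holds the leftover bytes.
        sizes.append(remainder)
    return sizes


MIB = 1024 * 1024
ASSUMED_BLOCK_SIZE = 128 * MIB  # assumption, not read from any cluster

# One 1 GiB file versus 1024 separate 1 MiB files holding the same data.
one_large_file = split_into_blocks(1024 * MIB, ASSUMED_BLOCK_SIZE)
many_small_files = [split_into_blocks(1 * MIB, ASSUMED_BLOCK_SIZE) for _ in range(1024)]

print(len(one_large_file))                     # blocks for the single large file
print(sum(len(b) for b in many_small_files))   # blocks for the small files
```

Under these assumptions the same amount of data needs far more blocks (and file entries) to track when it arrives as many small files, which is one way to picture why HDFS favours large files.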
![Page 6: Hadoop architecture meetup](https://reader035.vdocuments.mx/reader035/viewer/2022070303/549a0f87b4795955718b467b/html5/thumbnails/6.jpg)
Map Reduce Engine
• A Map Reduce program consists of map and reduce functions
• A Map Reduce job is broken into tasks that run in parallel
• Prefers local processing if possible, i.e. running a task on the node that already holds its data
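To make the map and reduce roles concrete, here is a minimal single-process sketch of a word count. It does not use the Hadoop API: `map_fn`, `reduce_fn`, and `run_job` are hypothetical names, and the in-memory grouping step stands in for the work a real cluster spreads across parallel tasks on many nodes.

```python
from collections import defaultdict


def map_fn(line):
    """Map step: emit a (word, 1) pair for every word in one input line."""
    for word in line.split():
        yield word.lower(), 1


def reduce_fn(key, values):
    """Reduce step: combine all counts emitted for one word."""
    return key, sum(values)


def run_job(lines):
    """Run map over every line, group the pairs by key, then reduce each group.

    On a real cluster each line (or input split) could be handled by a
    separate map task; here everything runs sequentially in one process.
    """
    groups = defaultdict(list)
    for line in lines:
        for key, value in map_fn(line):
            groups[key].append(value)
    return dict(reduce_fn(key, values) for key, values in groups.items())


counts = run_job(["Hadoop stores files", "Hadoop runs jobs"])
print(counts)
```

Because each call to `map_fn` depends only on its own line and each call to `reduce_fn` only on its own key, the calls are independent of one another, which is what lets a job be broken into tasks that run in parallel.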
![Page 7: Hadoop architecture meetup](https://reader035.vdocuments.mx/reader035/viewer/2022070303/549a0f87b4795955718b467b/html5/thumbnails/7.jpg)
![Page 8: Hadoop architecture meetup](https://reader035.vdocuments.mx/reader035/viewer/2022070303/549a0f87b4795955718b467b/html5/thumbnails/8.jpg)
Hadoop Cluster
![Page 9: Hadoop architecture meetup](https://reader035.vdocuments.mx/reader035/viewer/2022070303/549a0f87b4795955718b467b/html5/thumbnails/9.jpg)
Typical Workflow
![Page 10: Hadoop architecture meetup](https://reader035.vdocuments.mx/reader035/viewer/2022070303/549a0f87b4795955718b467b/html5/thumbnails/10.jpg)
![Page 11: Hadoop architecture meetup](https://reader035.vdocuments.mx/reader035/viewer/2022070303/549a0f87b4795955718b467b/html5/thumbnails/11.jpg)
![Page 12: Hadoop architecture meetup](https://reader035.vdocuments.mx/reader035/viewer/2022070303/549a0f87b4795955718b467b/html5/thumbnails/12.jpg)
![Page 13: Hadoop architecture meetup](https://reader035.vdocuments.mx/reader035/viewer/2022070303/549a0f87b4795955718b467b/html5/thumbnails/13.jpg)
![Page 14: Hadoop architecture meetup](https://reader035.vdocuments.mx/reader035/viewer/2022070303/549a0f87b4795955718b467b/html5/thumbnails/14.jpg)
![Page 15: Hadoop architecture meetup](https://reader035.vdocuments.mx/reader035/viewer/2022070303/549a0f87b4795955718b467b/html5/thumbnails/15.jpg)
![Page 16: Hadoop architecture meetup](https://reader035.vdocuments.mx/reader035/viewer/2022070303/549a0f87b4795955718b467b/html5/thumbnails/16.jpg)
![Page 17: Hadoop architecture meetup](https://reader035.vdocuments.mx/reader035/viewer/2022070303/549a0f87b4795955718b467b/html5/thumbnails/17.jpg)
![Page 18: Hadoop architecture meetup](https://reader035.vdocuments.mx/reader035/viewer/2022070303/549a0f87b4795955718b467b/html5/thumbnails/18.jpg)
![Page 19: Hadoop architecture meetup](https://reader035.vdocuments.mx/reader035/viewer/2022070303/549a0f87b4795955718b467b/html5/thumbnails/19.jpg)
![Page 20: Hadoop architecture meetup](https://reader035.vdocuments.mx/reader035/viewer/2022070303/549a0f87b4795955718b467b/html5/thumbnails/20.jpg)
![Page 21: Hadoop architecture meetup](https://reader035.vdocuments.mx/reader035/viewer/2022070303/549a0f87b4795955718b467b/html5/thumbnails/21.jpg)
![Page 22: Hadoop architecture meetup](https://reader035.vdocuments.mx/reader035/viewer/2022070303/549a0f87b4795955718b467b/html5/thumbnails/22.jpg)
![Page 23: Hadoop architecture meetup](https://reader035.vdocuments.mx/reader035/viewer/2022070303/549a0f87b4795955718b467b/html5/thumbnails/23.jpg)
![Page 24: Hadoop architecture meetup](https://reader035.vdocuments.mx/reader035/viewer/2022070303/549a0f87b4795955718b467b/html5/thumbnails/24.jpg)
![Page 25: Hadoop architecture meetup](https://reader035.vdocuments.mx/reader035/viewer/2022070303/549a0f87b4795955718b467b/html5/thumbnails/25.jpg)
![Page 26: Hadoop architecture meetup](https://reader035.vdocuments.mx/reader035/viewer/2022070303/549a0f87b4795955718b467b/html5/thumbnails/26.jpg)
Cluster Balancing
![Page 27: Hadoop architecture meetup](https://reader035.vdocuments.mx/reader035/viewer/2022070303/549a0f87b4795955718b467b/html5/thumbnails/27.jpg)
Quiz
• If you write a 1 TB file into HDFS with a replication factor of 2, how much storage does HDFS actually need to store it?
• True/False? Even if the Name node goes down, I will still be able to read files from HDFS.
![Page 28: Hadoop architecture meetup](https://reader035.vdocuments.mx/reader035/viewer/2022070303/549a0f87b4795955718b467b/html5/thumbnails/28.jpg)
Quiz
• True/False? In a Hadoop cluster, we can have a secondary Job Tracker to enhance fault tolerance.
• True/False? If the Job Tracker goes down, you will not be able to write any file into HDFS.
![Page 29: Hadoop architecture meetup](https://reader035.vdocuments.mx/reader035/viewer/2022070303/549a0f87b4795955718b467b/html5/thumbnails/29.jpg)
Quiz
• True/False? The Name node stores the actual data itself.
• True/False? The Name node can be rebuilt using the secondary Name node.
• True/False? If a data node goes down, Hadoop takes care of re-replicating the affected data blocks.
![Page 30: Hadoop architecture meetup](https://reader035.vdocuments.mx/reader035/viewer/2022070303/549a0f87b4795955718b467b/html5/thumbnails/30.jpg)
Quiz
• In which scenario does one data node try to read data from another data node?
• What are the benefits of the Name node’s rack-awareness?
• True/False? HDFS is well suited for applications that write a huge number of small files.
![Page 31: Hadoop architecture meetup](https://reader035.vdocuments.mx/reader035/viewer/2022070303/549a0f87b4795955718b467b/html5/thumbnails/31.jpg)
Quiz
• True/False? Hadoop takes care of balancing the cluster automatically.
• True/False? The output of Map tasks is written to an HDFS file.
• True/False? The output of Reduce tasks is written to an HDFS file.
![Page 32: Hadoop architecture meetup](https://reader035.vdocuments.mx/reader035/viewer/2022070303/549a0f87b4795955718b467b/html5/thumbnails/32.jpg)
Quiz
• True/False? In a production cluster, commodity hardware can be used to set up the Name node.
Thank You