hadoop ali sharza khan high performance computing 1
TRANSCRIPT
![Page 1: Hadoop Ali Sharza Khan High Performance Computing 1](https://reader036.vdocuments.mx/reader036/viewer/2022072015/56649eb75503460f94bc0fd7/html5/thumbnails/1.jpg)
1
Hadoop
Ali Sharza KhanHigh Performance Computing
![Page 2: Hadoop Ali Sharza Khan High Performance Computing 1](https://reader036.vdocuments.mx/reader036/viewer/2022072015/56649eb75503460f94bc0fd7/html5/thumbnails/2.jpg)
2
Table of Content
• Hadoop• Where did Hadoop come from ?• What problems can Hadoop solve?• Where does Hadoop applies to ?• How is Hadoop architected?• Two main parts of Hadoop• Conclusion
![Page 3: Hadoop Ali Sharza Khan High Performance Computing 1](https://reader036.vdocuments.mx/reader036/viewer/2022072015/56649eb75503460f94bc0fd7/html5/thumbnails/3.jpg)
3
Hadoop
• What is Hadoop ?
– Open Source project
– Processing Large data sets in parallel
![Page 4: Hadoop Ali Sharza Khan High Performance Computing 1](https://reader036.vdocuments.mx/reader036/viewer/2022072015/56649eb75503460f94bc0fd7/html5/thumbnails/4.jpg)
4
Where did Hadoop come from?
• Google• Yahoo, Facebook, Twitter and Linkedln are
actively contributing towards Hadoop.
![Page 5: Hadoop Ali Sharza Khan High Performance Computing 1](https://reader036.vdocuments.mx/reader036/viewer/2022072015/56649eb75503460f94bc0fd7/html5/thumbnails/5.jpg)
5
What problems can Hadoop solve?
• Where you have lot of data
• Run analytics that are deep and computational extensive
![Page 6: Hadoop Ali Sharza Khan High Performance Computing 1](https://reader036.vdocuments.mx/reader036/viewer/2022072015/56649eb75503460f94bc0fd7/html5/thumbnails/6.jpg)
6
Where does Hadoop applies to ?
• Search engine• Finance• Online Retail• Government• Media and entertainment• Research Institution and other market
![Page 7: Hadoop Ali Sharza Khan High Performance Computing 1](https://reader036.vdocuments.mx/reader036/viewer/2022072015/56649eb75503460f94bc0fd7/html5/thumbnails/7.jpg)
7
How is Hadoop architected?
• Every server has 2 or 4 or 8 Cpu’s.• Each server operates on its own little piece of
data.• Hadoop clusters at Yahoo covers 25000
servers, and store 25 petabytes of application data.
• The largest cluster being 3500 servers.
![Page 8: Hadoop Ali Sharza Khan High Performance Computing 1](https://reader036.vdocuments.mx/reader036/viewer/2022072015/56649eb75503460f94bc0fd7/html5/thumbnails/8.jpg)
8
Cloudera CEO Interview
http://www.youtube.com/watch?v=qNP4_ICDeqE
![Page 9: Hadoop Ali Sharza Khan High Performance Computing 1](https://reader036.vdocuments.mx/reader036/viewer/2022072015/56649eb75503460f94bc0fd7/html5/thumbnails/9.jpg)
9
Two main parts of Hadoop
• HDFS (Hadoop Distributed File System)
• Map Reduce Framework– Map Phase– Reduce Phase– JobTracker (The master)– TaskTracker (The slave)
![Page 10: Hadoop Ali Sharza Khan High Performance Computing 1](https://reader036.vdocuments.mx/reader036/viewer/2022072015/56649eb75503460f94bc0fd7/html5/thumbnails/10.jpg)
10
MapReduce FrameWork
![Page 11: Hadoop Ali Sharza Khan High Performance Computing 1](https://reader036.vdocuments.mx/reader036/viewer/2022072015/56649eb75503460f94bc0fd7/html5/thumbnails/11.jpg)
11
Conclusion
• Why Hadoop is able to deal with lots of data?
• Why Hadoop is able to compute complicated Computational questions?