Let's Talk Operations! (Hadoop Summit 2014)
DESCRIPTION
These are the introductory slides I used (in some form or another) for the Let's Talk Operations! sessions at the 2014 Hadoop Summits. No video for this one!
TRANSCRIPT
![Page 1: Let's Talk Operations! (Hadoop Summit 2014)](https://reader033.vdocuments.mx/reader033/viewer/2022051412/54c5dc0d4a7959b8548b4701/html5/thumbnails/1.jpg)
Let’s Talk Operations!
Allen Wittenauer
![Page 2: Let's Talk Operations! (Hadoop Summit 2014)](https://reader033.vdocuments.mx/reader033/viewer/2022051412/54c5dc0d4a7959b8548b4701/html5/thumbnails/2.jpg)
![Page 3: Let's Talk Operations! (Hadoop Summit 2014)](https://reader033.vdocuments.mx/reader033/viewer/2022051412/54c5dc0d4a7959b8548b4701/html5/thumbnails/3.jpg)
Twitter: @_a__w_
Email: aw @ apache.org
![Page 4: Let's Talk Operations! (Hadoop Summit 2014)](https://reader033.vdocuments.mx/reader033/viewer/2022051412/54c5dc0d4a7959b8548b4701/html5/thumbnails/4.jpg)
How many individual grids should I have?
![Page 5: Let's Talk Operations! (Hadoop Summit 2014)](https://reader033.vdocuments.mx/reader033/viewer/2022051412/54c5dc0d4a7959b8548b4701/html5/thumbnails/5.jpg)
One big grid
• Pros
  • Lower ops overhead
  • One location for all data
• Cons
  • Dev and Prod on one system

Grid per project
• Pros
  • Capacity planning per project
• Cons
  • More headcount to maintain
  • Multiple copies of data
  • Data ingress is a mess
![Page 6: Let's Talk Operations! (Hadoop Summit 2014)](https://reader033.vdocuments.mx/reader033/viewer/2022051412/54c5dc0d4a7959b8548b4701/html5/thumbnails/6.jpg)
[Diagram: a single data center containing Production, ETL, and Development grids]
![Page 7: Let's Talk Operations! (Hadoop Summit 2014)](https://reader033.vdocuments.mx/reader033/viewer/2022051412/54c5dc0d4a7959b8548b4701/html5/thumbnails/7.jpg)
[Diagram: Event Feeds and Database Feeds flow into the ETL grid; arrows labeled Base ETL Pull and Post-Processed Data connect the ETL, Dev, and Prod grids]
![Page 8: Let's Talk Operations! (Hadoop Summit 2014)](https://reader033.vdocuments.mx/reader033/viewer/2022051412/54c5dc0d4a7959b8548b4701/html5/thumbnails/8.jpg)
[Diagram: Production, ETL, and Development grids laid out across two data centers, DC1 and DC2]
![Page 9: Let's Talk Operations! (Hadoop Summit 2014)](https://reader033.vdocuments.mx/reader033/viewer/2022051412/54c5dc0d4a7959b8548b4701/html5/thumbnails/9.jpg)
How do I solve some common distcp issues?
![Page 10: Let's Talk Operations! (Hadoop Summit 2014)](https://reader033.vdocuments.mx/reader033/viewer/2022051412/54c5dc0d4a7959b8548b4701/html5/thumbnails/10.jpg)
• Common issues
  • Version incompatibilities
  • Network bandwidth consumption
• Some tricks
  • Use WebHDFS
    • All modern versions support it
    • Read and write in both directions
  • Create a separate queue with hard limits
  • Pull from larger, push from smaller
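The WebHDFS and separate-queue tricks can be sketched in Python. This is a minimal sketch under stated assumptions, not the author's tooling: it assumes a Hadoop 2.x source NameNode serving WebHDFS on port 50070 (Hadoop 3.x moved the default to 9870), the YARN Capacity Scheduler, and a dedicated queue named `distcp`; the helper names, host names, paths, and limits are all hypothetical. `webhdfs_uri` points distcp at the remote cluster over HTTP instead of version-sensitive RPC, `distcp_argv` routes the job to the dedicated queue and caps it with distcp's `-m` (maximum simultaneous maps) and `-bandwidth` (MB/s per map) options, and `capacity_queue_props` returns the scheduler properties whose `maximum-capacity` acts as the queue's hard ceiling.

```python
import shlex
from typing import Dict, List

# Assumed Hadoop 2.x NameNode HTTP port; Hadoop 3.x defaults to 9870.
WEBHDFS_PORT = 50070


def webhdfs_uri(namenode_host: str, path: str, port: int = WEBHDFS_PORT) -> str:
    """Build a webhdfs:// URI so distcp reaches the remote NameNode over HTTP,
    avoiding RPC incompatibilities between Hadoop versions."""
    if not path.startswith("/"):
        raise ValueError("path must be absolute")
    return f"webhdfs://{namenode_host}:{port}{path}"


def distcp_argv(src: str, dst: str, queue: str, max_maps: int, mb_per_map: int) -> List[str]:
    """Assemble a `hadoop distcp` command line.

    Generic -D options must precede distcp's own options. -m caps concurrent
    maps and -bandwidth caps MB/s per map, so total copy bandwidth is roughly
    max_maps * mb_per_map.
    """
    return [
        "hadoop", "distcp",
        f"-Dmapreduce.job.queuename={queue}",
        "-m", str(max_maps),
        "-bandwidth", str(mb_per_map),
        src, dst,
    ]


def capacity_queue_props(queue: str, capacity_pct: int, max_capacity_pct: int) -> Dict[str, str]:
    """Capacity Scheduler properties for a dedicated copy queue.

    maximum-capacity is the hard limit: the queue cannot grow past it even when
    the rest of the cluster is idle. The queue must also be listed in
    yarn.scheduler.capacity.root.queues, and sibling capacities must sum to 100.
    """
    prefix = f"yarn.scheduler.capacity.root.{queue}"
    return {
        f"{prefix}.capacity": str(capacity_pct),
        f"{prefix}.maximum-capacity": str(max_capacity_pct),
    }


if __name__ == "__main__":
    # Hypothetical hosts and paths; this job would run on the destination
    # cluster, where hdfs:/// resolves to its default filesystem.
    src = webhdfs_uri("nn.source.example.com", "/data/events/2014-06-01")
    print(shlex.join(distcp_argv(src, "hdfs:///data/events/2014-06-01", "distcp", 20, 10)))
    print(capacity_queue_props("distcp", 10, 20))
```

Which cluster actually submits the job is left to the "pull from larger, push from smaller" advice; the sketch only builds the command and queue settings.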
![Page 11: Let's Talk Operations! (Hadoop Summit 2014)](https://reader033.vdocuments.mx/reader033/viewer/2022051412/54c5dc0d4a7959b8548b4701/html5/thumbnails/11.jpg)
Q&A
Allen Wittenauer Twitter: @_a__w_ Email: aw @ apache.org
![Page 12: Let's Talk Operations! (Hadoop Summit 2014)](https://reader033.vdocuments.mx/reader033/viewer/2022051412/54c5dc0d4a7959b8548b4701/html5/thumbnails/12.jpg)
Bonus Slide!
![Page 13: Let's Talk Operations! (Hadoop Summit 2014)](https://reader033.vdocuments.mx/reader033/viewer/2022051412/54c5dc0d4a7959b8548b4701/html5/thumbnails/13.jpg)
• root partitioning: 20 GB /, ..., 200 GB task space, (rest) HDFS
• non-root partitioning: 5 GB swap, 200 GB task space, (rest) HDFS
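The bonus-slide layouts come down to subtraction per disk. A minimal Python sketch, assuming each layout describes one physical disk, that exactly one disk per node carries the root filesystem, and that all sizes are in GB. The helper names are hypothetical, and `other_root_gb` is a hypothetical stand-in for the root-disk partitions the slide elides with "...", not a value from the slide.

```python
# Partition sizes (GB) from the bonus slide; whatever is left goes to HDFS.
ROOT_FS_GB = 20       # "/" on the root disk
SWAP_GB = 5           # swap on each non-root disk
TASK_SPACE_GB = 200   # local task scratch space on every disk


def hdfs_gb_on_disk(disk_gb: int, is_root_disk: bool, other_root_gb: int = 0) -> int:
    """GB left for HDFS on a single disk after the fixed partitions.

    other_root_gb is a hypothetical stand-in for the root-disk partitions the
    slide elides with "..."; it is not a value taken from the slide.
    """
    if is_root_disk:
        reserved = ROOT_FS_GB + other_root_gb + TASK_SPACE_GB
    else:
        reserved = SWAP_GB + TASK_SPACE_GB
    if disk_gb <= reserved:
        raise ValueError(f"{disk_gb} GB disk cannot fit {reserved} GB of fixed partitions")
    return disk_gb - reserved


def node_hdfs_gb(disk_gb: int, num_disks: int, other_root_gb: int = 0) -> int:
    """Total HDFS GB on a node of identical disks, one of which is the root disk."""
    if num_disks < 1:
        raise ValueError("a node needs at least one disk")
    root = hdfs_gb_on_disk(disk_gb, True, other_root_gb)
    non_root = hdfs_gb_on_disk(disk_gb, False)
    return root + (num_disks - 1) * non_root
```

For example, with illustrative 2000 GB disks, `node_hdfs_gb(2000, 12)` gives 1780 GB on the root disk plus 11 × 1795 GB on the others, 21525 GB of raw HDFS space before replication.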