Building Hadoop with Chef
DESCRIPTION
Slides from my presentation at #ChefConf 2013. Big Data meets Configuration Management. Edmunds.com's first foray into Hadoop is a tale of challenges, discovery, and ultimately triumph. This is the story of how Edmunds.com leveraged Chef - and its community - to build a fully automated Hadoop cluster in the face of looming project deadlines.
TRANSCRIPT
![Page 1: Building Hadoop with Chef](https://reader033.vdocuments.mx/reader033/viewer/2022051817/548fe445b4795963488b4c97/html5/thumbnails/1.jpg)
Building & Managing Hadoop with Chef
John Martin
Sr Director, Production Engineering
![Page 2: Building Hadoop with Chef](https://reader033.vdocuments.mx/reader033/viewer/2022051817/548fe445b4795963488b4c97/html5/thumbnails/2.jpg)
Introduction
• Me, Me, Me
  • 10+ years in .com & JEE space
• Project Crew
  • Paul MacDougall
  • Greg Rokita
  • KC Braunschweig (former)
  • Ryan Holmes (former)
• Edmunds.com
  • Founded in 1966
  • Gopher site in 1994
  • HTTP site in 1995
![Page 3: Building Hadoop with Chef](https://reader033.vdocuments.mx/reader033/viewer/2022051817/548fe445b4795963488b4c97/html5/thumbnails/3.jpg)
Edmunds.com Environment
• Nearing 3000 hosts
• Heavily virtualized (Xen, CloudStack, AWS)
• Tomcat with some WebLogic
• Coherence, Solr, Mongo
• Publishing built on ActiveMQ
• Newly launched DWH built around Hadoop + Netezza
![Page 4: Building Hadoop with Chef](https://reader033.vdocuments.mx/reader033/viewer/2022051817/548fe445b4795963488b4c97/html5/thumbnails/4.jpg)
Why Chef?
• Explosive infrastructure growth
• Quick to bootstrap
• Easy integration with our tooling
• knife
• The Chef Community
![Page 5: Building Hadoop with Chef](https://reader033.vdocuments.mx/reader033/viewer/2022051817/548fe445b4795963488b4c97/html5/thumbnails/5.jpg)
What’s Hadoop?
• Open framework for data-intensive distributed applications
• Reigning King of “Big Data”
• Many services
  • HDFS
  • MapReduce
  • HBase
  • ZooKeeper
• Designed to run on commodity hardware
![Page 6: Building Hadoop with Chef](https://reader033.vdocuments.mx/reader033/viewer/2022051817/548fe445b4795963488b4c97/html5/thumbnails/6.jpg)
Edmunds Hadoop Environment
• Multiple clusters
• Roughly 200 TB in total
• 40+ nodes in production
• Maintained by Ops + Dev
• Dell R410
  • Six-core 2.40 GHz
  • 24 GB RAM
  • 4x 1 TB 7200 RPM drives
![Page 7: Building Hadoop with Chef](https://reader033.vdocuments.mx/reader033/viewer/2022051817/548fe445b4795963488b4c97/html5/thumbnails/7.jpg)
How We Got Here
• First cluster was a Frankenstein
  • Part BMC
  • Part manual effort
  • Part Puppet
• Staff changes & knowledge loss
• Time for a clean slate!
![Page 8: Building Hadoop with Chef](https://reader033.vdocuments.mx/reader033/viewer/2022051817/548fe445b4795963488b4c97/html5/thumbnails/8.jpg)
Building Hadoop with Chef
• True Dev + Ops effort
• Production built in 3 weeks
• Built with community cookbooks
• All services now administered with knife
• New nodes now cluster-ready within minutes
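To make "cluster-ready within minutes" concrete, here is a minimal sketch of how a new node might be bootstrapped with a role-based run list. It is an illustration, not the commands used at Edmunds: the `hadoop_worker` role, the `deploy` SSH user, and the idea that each cluster maps to a Chef environment are all assumptions. The function only prints the command so it can be reviewed before running. The flags shown are those of Chef 11 era `knife bootstrap`; newer releases may name some of them differently.

```shell
#!/usr/bin/env bash
# Hypothetical sketch: build the knife bootstrap command for a new Hadoop node.
# The role name (hadoop_worker) and default SSH user (deploy) are assumptions.

# Print the bootstrap command for HOST in CLUSTER (treated as a Chef environment).
bootstrap_cmd() {
  local host="$1" cluster="$2"
  # --sudo : run the bootstrap with sudo on the target host
  # -x     : SSH user to connect as
  # -E     : Chef environment to place the node in
  # -r     : initial run list applied on the first chef-client run
  printf 'knife bootstrap %s --sudo -x %s -E %s -r %s\n' \
    "$host" "${SSH_USER:-deploy}" "$cluster" "'role[hadoop_worker]'"
}
```

Calling `bootstrap_cmd dn41.example.com prod` prints the command for a made-up host; piping that output to a shell would run it.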
![Page 9: Building Hadoop with Chef](https://reader033.vdocuments.mx/reader033/viewer/2022051817/548fe445b4795963488b4c97/html5/thumbnails/9.jpg)
What We Gained
• First highly-visible Chef success story at Edmunds
• Cemented Chef as our CM solution
• Engaged us with the community
• Completely automated Hadoop infrastructure
• New suite of administrative scripts
  • knife-[start|stop]-all.sh $cluster
  • knife-[start|stop]-hbase.sh $cluster
  • knife-[start|stop]-mapred.sh $cluster
  • knife-[start|stop]-oozie.sh $cluster
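The slides do not show the scripts themselves, so the following is only a guess at how a `knife-[start|stop]-hbase.sh $cluster` style wrapper could work, using `knife ssh` with a Chef search query. The role names (`hbase_master`, `hbase_regionserver`), the service names, and the mapping of a cluster to a Chef environment are assumptions made for illustration. By default the functions print the commands instead of running them.

```shell
#!/usr/bin/env bash
# Hypothetical sketch of a knife-[start|stop]-hbase.sh style wrapper.
# Role and service names are assumptions, not the values used at Edmunds.

# Print the Chef search query selecting one role within one cluster.
# Assumes each cluster is modeled as a Chef environment of the same name.
cluster_query() {
  local role="$1" cluster="$2"
  printf 'role:%s AND chef_environment:%s' "$role" "$cluster"
}

# Run a service action on every node matching a role in a cluster.
# Prints the command unless DRY_RUN=0 is set, in which case knife is invoked.
knife_service() {
  local action="$1" role="$2" service="$3" cluster="$4"
  local query
  query="$(cluster_query "$role" "$cluster")"
  if [ "${DRY_RUN:-1}" = "1" ]; then
    printf 'knife ssh "%s" "sudo service %s %s"\n' "$query" "$service" "$action"
  else
    knife ssh "$query" "sudo service $service $action"
  fi
}

# Start or stop the HBase tier: master first on start, region servers first on stop.
hbase_all() {
  local action="$1" cluster="$2"
  case "$action" in
    start)
      knife_service start hbase_master hbase-master "$cluster"
      knife_service start hbase_regionserver hbase-regionserver "$cluster"
      ;;
    stop)
      knife_service stop hbase_regionserver hbase-regionserver "$cluster"
      knife_service stop hbase_master hbase-master "$cluster"
      ;;
    *)
      echo "usage: hbase_all start|stop CLUSTER" >&2
      return 1
      ;;
  esac
}
```

For example, `hbase_all stop prod` prints the two `knife ssh` commands it would run against a cluster named `prod`; setting `DRY_RUN=0` executes them.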
![Page 10: Building Hadoop with Chef](https://reader033.vdocuments.mx/reader033/viewer/2022051817/548fe445b4795963488b4c97/html5/thumbnails/10.jpg)
Where Next?
• New cluster currently being built!
• Integration with Cloudera Manager
• Cluster replication
• Continue evangelism of Chef’s awesomeness
• Extend more of the toolchain around Chef
• See you around at the LA Chef UG!