© Avalon Consulting, LLC 2014
Hadoop & Big Data: Crafting the Enterprise Strategy
Sriram Mohan Senior Consultant, Avalon Consulting, LLC
Associate Professor of CSSE, Rose-Hulman Institute of Technology
Presenter Overview
Who we are
• Consultants providing expert technical integrations for enterprise-scale Internet, intranet, and extranet sites
• 50+ staff, mostly senior-level consultants
• Offices in Dallas and Washington, D.C.
Best known for our work in:
• Enterprise search
• Hadoop, big data
• Enterprise content management
• Websites & portals
• E-learning
• Unstructured, semi-structured content
Presenter Overview
Who we are
• Private engineering school in Terre Haute, Indiana
• 2,000 students, 12:1 student-to-faculty ratio
• 10 math, science, and engineering majors
Best known for:
• Best undergraduate engineering school in the country
• 14 years in a row
Overview
• Hadoop ecosystem • Enterprise strategy • Lambda architecture
What Is Big Data?
• Volume – Large quantities (think gigabytes, terabytes of information daily)
• Velocity – Needs to be processed very quickly
• Variety – Data might be structured or unstructured and come from varying sources
Overview
• Hadoop ecosystem • Enterprise Strategy • Lambda Architecture
Apache Hadoop
• Distributed platform for data processing
• Scalable • Runs on commodity hardware • Data & analysis colocation
Typical Hadoop Stack
Overview
• Hadoop Ecosystem • Enterprise strategy • Lambda Architecture
How Do You Introduce Hadoop?
Some Tips
• Start small • Build a POC • Replicate an existing system in Hadoop (EDW offload)
What Would an EDW Offload Look Like?
• How do you bring your data into HDFS? • How do you analyze the data in HDFS? • How do you verify the results of the analysis? • How do you expose the results of your analysis?
How Do You Bring Your Data?
• Flume • Sqoop • Every database/data warehouse has a Hadoop connector
How Do You Analyze the Data?
• Pig • Hive • HBase
How Do You Verify?
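One common way to verify an EDW offload is to run the same query on both systems and reconcile the results. The slide itself does not prescribe a method, so the following is only a minimal sketch under that assumption: the row layout `(key, amount)` and the helper names `summarize` and `reconcile` are hypothetical, and in practice the rows would come from the warehouse and from a Hive or Pig job rather than from in-memory lists.

```python
from collections import Counter


def summarize(rows, key_index, amount_index):
    """Return (row count, per-key totals) for a list of tuples."""
    totals = Counter()
    for row in rows:
        totals[row[key_index]] += row[amount_index]
    return len(rows), dict(totals)


def reconcile(edw_rows, hadoop_rows, key_index=0, amount_index=1):
    """List the differences between two result sets; an empty list means they agree."""
    edw_count, edw_totals = summarize(edw_rows, key_index, amount_index)
    hdp_count, hdp_totals = summarize(hadoop_rows, key_index, amount_index)
    problems = []
    if edw_count != hdp_count:
        problems.append(f"row count: EDW={edw_count}, Hadoop={hdp_count}")
    for key in sorted(set(edw_totals) | set(hdp_totals)):
        if edw_totals.get(key) != hdp_totals.get(key):
            problems.append(
                f"total for {key}: EDW={edw_totals.get(key)}, Hadoop={hdp_totals.get(key)}"
            )
    return problems


# Hypothetical sample data (integer amounts avoid floating-point comparison issues).
edw_rows = [("east", 100), ("west", 250), ("east", 50)]
hadoop_ok = [("east", 50), ("west", 250), ("east", 100)]  # same data, different order
hadoop_bad = [("east", 100), ("west", 250)]               # one row missing

print(reconcile(edw_rows, hadoop_ok))
print(reconcile(edw_rows, hadoop_bad))
```

Comparing counts and per-key totals is order-independent, which matters because a distributed job generally does not return rows in the warehouse's order.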
How Do You Expose Your Results?
• BI tools • Export the data back to a data warehouse
Dealing With Semi-Structured Data
• Navigating the world of NoSQL with Hadoop • Sample use cases – Batch processing – Search
• How does Hadoop fit in? – Use Solr/Elasticsearch – Use HBase
Natural Language Processing
• What is NLP? • Sample use cases – Adding metadata to emails – Predictive models – Forecasts
• How does Hadoop fit in? – Mahout – Using R with Hadoop
Overview
• Hadoop Ecosystem • Enterprise Strategy • Lambda architecture
Resource
• Big Data: Principles and best practices of scalable realtime data systems – Nathan Marz and James Warren
Why Do We Need This?
• Computing arbitrary functions on an arbitrary dataset in real time is a daunting problem
Batch Layer
Batch Layer
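The slide's diagram is not preserved in this transcript. As described in the Marz and Warren book, the batch layer keeps an immutable, append-only master dataset and periodically recomputes views from all of it. The following is only a toy sketch of that idea: the page-view events, `MASTER_DATASET`, and `compute_batch_view` are hypothetical names, and a real batch layer would run this as a Hadoop job over HDFS rather than over an in-memory list.

```python
from collections import Counter

# Hypothetical master dataset: an append-only list of immutable (url, timestamp) events.
MASTER_DATASET = [
    ("/home", 1),
    ("/pricing", 2),
    ("/home", 3),
]


def compute_batch_view(master_dataset):
    """Recompute the view from scratch: batch view = function(all data)."""
    return dict(Counter(url for url, _timestamp in master_dataset))


batch_view = compute_batch_view(MASTER_DATASET)
print(batch_view)
```

Recomputing from the full dataset is slow but simple: a bug in the view logic can be fixed by correcting the function and rerunning it, since the raw events are never modified.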
Serving Layer
Speed Layer
Handling Queries
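In the Lambda architecture as described by Marz and Warren, a query is answered by merging a precomputed batch view with a realtime view that covers only the data that arrived since the last batch run. The sketch below illustrates that merge for a simple page-view count. All names and sample values (`batch_counts`, `realtime_counts`, `on_new_event`, `query_pageviews`) are hypothetical, and plain dictionaries stand in for the serving-layer and speed-layer stores.

```python
from collections import defaultdict

# Hypothetical precomputed batch view: covers events up to the last batch run.
batch_counts = {"/home": 2, "/pricing": 1}

# Hypothetical realtime view: covers only events since the last batch run.
realtime_counts = defaultdict(int)


def on_new_event(url):
    """Speed layer: update the realtime view incrementally as each event arrives."""
    realtime_counts[url] += 1


def query_pageviews(url):
    """Answer a query by merging the batch view with the realtime view."""
    return batch_counts.get(url, 0) + realtime_counts.get(url, 0)


# Two events arrive after the last batch run.
on_new_event("/home")
on_new_event("/docs")

print(query_pageviews("/home"))     # 2 from the batch view + 1 from the realtime view
print(query_pageviews("/docs"))     # seen only by the speed layer so far
print(query_pageviews("/pricing"))  # seen only by the batch layer
```

Counts merge by simple addition; other query types need their own merge logic, and the realtime entries are discarded once a later batch run has absorbed the same events.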
Lambda Architecture
Questions?