5 Scenarios: When to Use & When Not to Use Hadoop


Upload: edureka

Post on 21-Jan-2018

Category: Technology


TRANSCRIPT

Page 1: 5 Scenarios: When To Use & When Not to Use Hadoop

www.edureka.co/big-data-and-hadoop

When not to use Hadoop

View Big Data and Hadoop Course at: http://www.edureka.co/big-data-and-hadoop

Page 2: 5 Scenarios: When To Use & When Not to Use Hadoop

Objectives

At the end of this module, you will be able to…

Understand when not to use Hadoop

» Real Time Analytics

» Not a Replacement

» Dataset Size

» Complexity

» Security

Understand when to use Hadoop

» Huge Unstructured Datasets

» Response Time is Not an Issue

» Future Planning

» Multiple Frameworks for Big Data

» Lifetime Data Availability

Page 3: 5 Scenarios: When To Use & When Not to Use Hadoop

Hadoop Mania

Page 4: 5 Scenarios: When To Use & When Not to Use Hadoop

When Not To Use Hadoop

Page 5: 5 Scenarios: When To Use & When Not to Use Hadoop

Real Time Analytics

If you need real-time analytics, where results are expected quickly, Hadoop should not be used directly.

Hadoop works on batch processing, so its response time is high.

[Figure: two timelines running from Day 1 to Day n. Labels: Input Data, Processing Data, Processing Data using MR, Time Lag.]
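
The time lag that comes with batch processing can be illustrated with a small simulation. This is a minimal sketch, not Hadoop code: the function name `batch_result_lags`, the 24-hour batch interval, and the 2-hour job duration are hypothetical values chosen only for illustration. It assumes a batch job starts at the end of every interval and processes all records that arrived during that interval.

```python
import math

def batch_result_lags(arrival_hours, batch_interval_hours, job_duration_hours):
    """Return, for each record, the hours between its arrival and its result.

    Assumption (illustrative only): a batch job starts at the end of every
    interval and processes every record that arrived during that interval.
    """
    lags = []
    for arrival in arrival_hours:
        # The record is picked up at the first batch boundary strictly after it arrives.
        batch_start = (math.floor(arrival / batch_interval_hours) + 1) * batch_interval_hours
        result_ready = batch_start + job_duration_hours
        lags.append(result_ready - arrival)
    return lags

# Records arriving at hours 1, 12 and 23 of a day, with a nightly (24 h) batch
# that takes 2 h to run, all get their results at hour 26.
lags = batch_result_lags([1, 12, 23], batch_interval_hours=24, job_duration_hours=2)
print(lags)  # [25, 14, 3]
```

Even the record that arrives one hour before the batch starts waits three hours for its result, which is why a batch system on its own is a poor fit when answers are expected within seconds.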

Page 6: 5 Scenarios: When To Use & When Not to Use Hadoop

Real Time Analytics – Accepted Way

[Figure labels: Streaming Data, Storing]

Page 7: 5 Scenarios: When To Use & When Not to Use Hadoop

Real Time Analytics – Accepted Way

[Figure: two response times shown, 14 sec and 0.6 sec]

Page 8: 5 Scenarios: When To Use & When Not to Use Hadoop

Not a Replacement for Existing Infrastructure

Hadoop is not a replacement for your existing data processing infrastructure.

After processing the data in Hadoop, you still need to send the output to relational database technologies for BI, decision support, reporting, etc.

It's not going to replace your database, but your database isn't likely to replace Hadoop either.

Different tools for different jobs.
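
The hand-off from Hadoop to a relational database can be sketched as follows. This is an illustration under stated assumptions, not a prescribed pipeline: SQLite (from the Python standard library) stands in for the reporting database, the table name `page_views` and the helper `load_counts` are hypothetical, and the sample records are made up. The only Hadoop-specific detail relied on is that the default text output of a job is one tab-separated key/value record per line.

```python
import sqlite3

# Hypothetical sample of job output: one "key<TAB>value" record per line.
# The page names and counts are invented for this example.
job_output = "home\t120\ncheckout\t45\nsearch\t78\n"

def load_counts(conn, text):
    """Parse tab-separated key/count lines and insert them into a table."""
    rows = []
    for line in text.splitlines():
        if not line.strip():
            continue  # skip blank lines
        key, value = line.split("\t", 1)
        rows.append((key, int(value)))
    conn.execute(
        "CREATE TABLE IF NOT EXISTS page_views (page TEXT PRIMARY KEY, views INTEGER)"
    )
    conn.executemany("INSERT OR REPLACE INTO page_views VALUES (?, ?)", rows)
    conn.commit()
    return len(rows)

conn = sqlite3.connect(":memory:")  # SQLite stands in for the reporting database
loaded = load_counts(conn, job_output)

# A reporting-style query that a BI tool might issue against the loaded table.
top = conn.execute(
    "SELECT page, views FROM page_views ORDER BY views DESC LIMIT 1"
).fetchone()
print(loaded, top)
```

The heavy aggregation happens in Hadoop, while the small, structured result lives in the database where reporting tools can query it interactively.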

Page 9: 5 Scenarios: When To Use & When Not to Use Hadoop

Multiple Smaller Datasets – Accepted Way

The Hadoop framework is not recommended for small structured datasets, because other tools on the market, such as MS Excel or an RDBMS, can do this work more easily and faster than Hadoop.

For small-data analytics, Hadoop can be costlier than other tools.

Merge all the small files into one.

Page 10: 5 Scenarios: When To Use & When Not to Use Hadoop

Multiple Smaller Datasets – Accepted Way

Same input, same output (4225284 in both runs):

Each file of x MB: slow execution – 10400 ms

All the above files merged into one file (9x MB): fast execution – 6140 ms
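
Merging small files before loading them can be sketched as below. This is a minimal local-filesystem illustration, not an HDFS tool: the helper name `merge_files`, the `*.txt` pattern, and the nine demo files are hypothetical. The motivation is that Hadoop generally pays a fixed overhead per input file, so the same data in fewer, larger files tends to run faster, as in the 10400 ms versus 6140 ms comparison (roughly 41% less time).

```python
import tempfile
from pathlib import Path

def merge_files(source_dir, merged_path, pattern="*.txt"):
    """Concatenate every file matching `pattern` in `source_dir` into one file.

    Returns the number of files merged. Files are processed in sorted name
    order so the result is deterministic.
    """
    parts = sorted(p for p in Path(source_dir).glob(pattern) if p.is_file())
    with open(merged_path, "wb") as out:
        for part in parts:
            data = part.read_bytes()
            out.write(data)
            # Make sure records from adjacent files do not run together.
            if data and not data.endswith(b"\n"):
                out.write(b"\n")
    return len(parts)

# Demo with nine tiny hypothetical files, mirroring the 9x MB merged file above.
with tempfile.TemporaryDirectory() as tmp:
    src = Path(tmp) / "small"
    src.mkdir()
    for i in range(9):
        (src / f"part-{i}.txt").write_text(f"record {i}")
    merged = Path(tmp) / "merged.txt"
    merged_count = merge_files(src, merged)
    merged_lines = merged.read_text().splitlines()

print(merged_count, len(merged_lines))  # 9 9
```

The merged file would then be loaded into the cluster in place of the nine originals.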

Page 11: 5 Scenarios: When To Use & When Not to Use Hadoop

Novice Hadoopers

Unless you have a good understanding of the Hadoop framework, using Hadoop in production is not advisable.

Learning Hadoop and its ecosystem tools, and deciding which technology suits your needs, adds a further level of complexity.

Page 12: 5 Scenarios: When To Use & When Not to Use Hadoop

Where Security is the Primary Concern

Many enterprises, especially those in highly regulated industries dealing with sensitive data, aren't able to move as quickly as they would like towards implementing Big Data projects and Hadoop.

Example: health-care data used by insurance companies to calculate premiums.

They don't have to hesitate, though: many of the security and compliance challenges are being worked on continuously and can be overcome (for example, by using Apache Accumulo on top of Hadoop).

Page 13: 5 Scenarios: When To Use & When Not to Use Hadoop

Where Security is the Primary Concern – Accepted Way

[Figure labels: Healthcare Data, Hadoop Analytic Integration]

Page 14: 5 Scenarios: When To Use & When Not to Use Hadoop

When To Use Hadoop

Page 15: 5 Scenarios: When To Use & When Not to Use Hadoop

Data Size and Data Diversity

You have different types of data: structured, semi-structured, and unstructured.

The dataset is huge, i.e. several terabytes or petabytes.

You are not in a hurry for answers.

Page 16: 5 Scenarios: When To Use & When Not to Use Hadoop

Future Planning

To implement Hadoop on your data, you should first understand the complexity of the data and the rate at which it is going to grow.

Cluster planning is therefore needed: you may begin by building a small or medium cluster sized for the data available at present (in GBs or a few TBs), and scale up the cluster in future as your data grows.
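
The sizing arithmetic behind this kind of planning can be sketched as follows. This is a rough back-of-the-envelope model, not an official sizing method: the function names, the 10% monthly growth rate, the 25% overhead allowance for temporary and intermediate data, and the 8 TB of usable disk per node are all hypothetical. The replication factor of 3 matches the HDFS default.

```python
import math

def projected_raw_storage_tb(current_tb, monthly_growth_rate, months,
                             replication=3, overhead=0.25):
    """Estimate the raw disk (in TB) needed after `months` of compound growth.

    replication: copies of each block kept by HDFS (3 is the default).
    overhead:    assumed extra fraction for temporary/intermediate data.
    """
    projected_data = current_tb * (1 + monthly_growth_rate) ** months
    return projected_data * replication * (1 + overhead)

def nodes_needed(raw_tb, usable_tb_per_node):
    """Round up to whole data nodes for the given raw storage requirement."""
    return math.ceil(raw_tb / usable_tb_per_node)

# Start with 2 TB of data growing 10% per month (hypothetical figures) and
# see how the requirement changes now, after one year, and after two years.
for months in (0, 12, 24):
    raw = projected_raw_storage_tb(2, 0.10, months)
    print(months, round(raw, 1), nodes_needed(raw, usable_tb_per_node=8))
```

Running the projection for a few horizons shows when the initial small cluster will need to be extended.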

Page 17: 5 Scenarios: When To Use & When Not to Use Hadoop

Multiple Frameworks for Big Data

Hadoop can be integrated with multiple analytic tools to get the best out of it, such as machine learning, R, Python, Spark, MongoDB, etc.
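
One common way to pair Python with Hadoop is Hadoop Streaming, where the mapper and reducer are ordinary programs that read lines from standard input and write tab-separated key/value lines to standard output. The sketch below simulates that contract on a single machine with a word count. The function names and sample sentences are hypothetical, and in an actual streaming job the mapper and reducer would be separate scripts reading `sys.stdin`, with the framework doing the sort between them.

```python
from itertools import groupby

def mapper(lines):
    """Emit one 'word<TAB>1' line per word, as a streaming mapper would print."""
    for line in lines:
        for word in line.split():
            yield f"{word.lower()}\t1"

def reducer(sorted_lines):
    """Sum the counts per word; input must already be sorted by key."""
    parsed = (line.split("\t", 1) for line in sorted_lines)
    for word, group in groupby(parsed, key=lambda kv: kv[0]):
        yield f"{word}\t{sum(int(count) for _, count in group)}"

def run_locally(lines):
    """Simulate map -> sort (shuffle) -> reduce on one machine."""
    return list(reducer(sorted(mapper(lines))))

sample = ["Hadoop stores data", "Hadoop processes data"]
result = run_locally(sample)
print(result)  # ['data\t2', 'hadoop\t2', 'processes\t1', 'stores\t1']
```

Because the logic only depends on line-oriented input and output, it can be tested locally like this before being submitted to a cluster.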

Page 18: 5 Scenarios: When To Use & When Not to Use Hadoop

Lifetime Data Availability

When you want your data to remain live and available indefinitely, Hadoop's scalability makes this achievable.

Page 19: 5 Scenarios: When To Use & When Not to Use Hadoop

Page 20: 5 Scenarios: When To Use & When Not to Use Hadoop

How it Works?

» LIVE Online Class

» Class Recording in LMS

» 24/7 Post Class Support

» Module Wise Quiz

» Project Work

» Verifiable Certificate

Page 21: 5 Scenarios: When To Use & When Not to Use Hadoop

Course Topics

Module 1

» Understanding Big Data and Hadoop

Module 2

» Hadoop Architecture and HDFS

Module 3

» Hadoop MapReduce Framework - I

Module 4

» Hadoop MapReduce Framework - II

Module 5

» Advance MapReduce

Module 6

» PIG

Module 7

» HIVE

Module 8

» Advance HIVE and HBase

Module 9

» Advance HBase

Module 10

» Oozie and Hadoop Project

Page 22: 5 Scenarios: When To Use & When Not to Use Hadoop

Questions

Twitter @edurekaIN, Facebook /edurekaIN, use #askEdureka for Questions

Page 23: 5 Scenarios: When To Use & When Not to Use Hadoop