Big data seminar (TICT(CSE)BATCH--> 2013-2017)

Upload: souvik-jana

Post on 14-Jan-2017


WHAT IS DATA

Data is raw, unorganized facts that need to be processed. Data can be something simple, seemingly random, and useless in itself until it is organized.

DIFFERENT TYPES OF DATA

Traditional RDBMS deals only with structured data.

A technology is needed that deals with semi-structured and unstructured data as well as structured data.
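To make the three types concrete, here is a minimal sketch in Python. The sample records, names, and field layouts are invented for illustration and do not come from the slides. It shows that structured data fits a fixed set of columns, semi-structured data carries its own nested and optional fields, and unstructured data has no schema at all.

```python
import csv
import io
import json
import re

# Structured: fixed columns, the kind of data an RDBMS table holds.
structured = "id,name,balance\n1,Asha,2500\n"
row = next(csv.DictReader(io.StringIO(structured)))

# Semi-structured: self-describing, with nested and optional fields (JSON here).
semi_structured = '{"id": 2, "name": "Ravi", "tags": ["premium"], "address": {"city": "Kolkata"}}'
doc = json.loads(semi_structured)

# Unstructured: free text with no schema; we can only tokenize or search it.
unstructured = "Ravi called on Monday to ask about his premium account."
words = re.findall(r"[A-Za-z]+", unstructured)

print(row["name"], doc["address"]["city"], len(words))
```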

Traditional Concept of Data Storage

Organizations

Banking Sector

Stock Exchange

Hospital

Social Media

Online Shopping

Others

Data is extracted, transformed, and loaded into the database (ETL).

End users generate reports and perform analytics.

As data grows, managing and processing it becomes difficult.
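The extract, transform, load flow on this slide can be sketched in a few lines of Python. This is only an illustration under assumptions: the CSV content, the `transactions` table, and the function names are hypothetical, and an in-memory SQLite database stands in for the organization's RDBMS.

```python
import csv
import io
import sqlite3

# Hypothetical raw export from a source system.
RAW_CSV = """account,amount
A-1, 100.50
A-2, 20
A-1, 4.50
"""


def extract(text):
    """Extract: read raw rows from the source."""
    return list(csv.DictReader(io.StringIO(text)))


def transform(rows):
    """Transform: clean whitespace and convert amounts to numbers."""
    return [(r["account"].strip(), float(r["amount"])) for r in rows]


def load(conn, records):
    """Load: insert the cleaned records into the database."""
    conn.execute("CREATE TABLE IF NOT EXISTS transactions (account TEXT, amount REAL)")
    conn.executemany("INSERT INTO transactions VALUES (?, ?)", records)
    conn.commit()


conn = sqlite3.connect(":memory:")
load(conn, transform(extract(RAW_CSV)))

# End users then run reports against the loaded table.
report = conn.execute(
    "SELECT account, SUM(amount) FROM transactions GROUP BY account ORDER BY account"
).fetchall()
print(report)
```

Everything here runs on a single machine against a single database, which is the reason this approach becomes hard to manage as the data grows.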

Drawback of Using Traditional Approach

Expensive

Time Consuming

Scalability

Storage Size

Resource Failure

The Model of Generating or Consuming Data Has Changed

OLD MODEL - A few companies generate the data; all others consume it.

NEW MODEL - All of us generate the data, and all of us consume it.

BIG DATA

WHAT IS BIG DATA

Big data means really big data: it is a collection of large datasets that cannot be processed using traditional computing techniques. It requires new architectures, new techniques, and various tools and frameworks.

Definition of BIG DATA

Different Sources of DATA

SOURCES

WHERE BIG DATA IS USED

IT Industries

Manufacturing Industries

Telecommunications

Banking sector

Healthcare

CHALLENGES IN HANDLING BIG DATA

There are two main challenges in handling BIG DATA:

1. How do we store and manage such a huge volume of data efficiently?

2. How do we process and extract valuable information from the huge volume of data within a given time frame?

BRIEF HISTORY OF HADOOP

WHAT IS HADOOP

Hadoop is an open-source framework. It is designed to store and process huge volumes of data efficiently.

Hadoop is a platform that provides both distributed storage and computational capabilities.

Why HADOOP Is Used

MAJOR COMPONENTS OF HADOOP ECOSYSTEM

HADOOP COMPONENTS

HADOOP DISTRIBUTED FILE SYSTEM: Storage

Google MAPREDUCE ALGORITHM: Processing

HADOOP ECOSYSTEM

Flume: Semi-Structured or Unstructured Data

Sqoop: Structured Data (Import or Export)

Features of Hadoop

Cost Effective System (Uses Commodity Machines)

Large Cluster of Nodes (Processing Power & Storage Capacity Increase)

Parallel Processing (Less Time is Required to Store & Access the Data)

Distributed Data (Data is Distributed Across Different Nodes)

Automatic Failover Management

Heterogeneous Cluster

Scalability

How The Data Is Stored In Hadoop Clusters

Figure: Rack 1 and Rack 2, containing Node 1, Node 2, Node 3, and Node 4

Hadoop Distributed File System

Figure: Client, Name Node, Secondary Name Node, Job Tracker, and three Data Nodes. Each Data Node runs a Task Tracker and holds a copy of Block1 to Block6.
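The idea behind this slide, a file cut into blocks with every block copied to several Data Nodes, can be illustrated with a toy model. This is a sketch and not the HDFS API: the function names, node names, and the tiny block size are assumptions made for illustration, and the replication factor of 3 is chosen to match the three copies of each block shown on the slide.

```python
# Toy model of block storage with replication (not the real HDFS API).


def split_into_blocks(data, block_size):
    """Cut the input into fixed-size blocks; the last block may be shorter."""
    return [data[i:i + block_size] for i in range(0, len(data), block_size)]


def place_blocks(num_blocks, nodes, replication):
    """Assign each block to `replication` distinct nodes, round-robin."""
    if replication > len(nodes):
        raise ValueError("replication factor cannot exceed the number of nodes")
    placement = {}
    for block_id in range(num_blocks):
        placement[block_id] = [
            nodes[(block_id + r) % len(nodes)] for r in range(replication)
        ]
    return placement


def readable_blocks(placement, failed_node):
    """Blocks that still have at least one replica after one node fails."""
    return [
        block_id
        for block_id, holders in placement.items()
        if any(node != failed_node for node in holders)
    ]


data = b"x" * 23
blocks = split_into_blocks(data, block_size=4)      # 6 blocks
nodes = ["datanode1", "datanode2", "datanode3"]     # hypothetical node names
placement = place_blocks(len(blocks), nodes, replication=3)

# With one node down, every block is still readable from another replica.
survivors = readable_blocks(placement, failed_node="datanode2")
print(len(blocks), placement[0], survivors)
```

Because every block lives on more than one node, losing a single Data Node does not lose data, which is what makes commodity machines usable for storage.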

MapReduce Flow

MapReduce Framework

MapReduce works by breaking the processing into two phases: the Map phase and the Reduce phase.

Flow: Input, Split, Map, Shuffle & Sort, Reduce, Output
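The flow on this slide can be imitated on one machine with a word count, the usual introductory MapReduce example. This is a minimal sketch in plain Python, not Hadoop code: the function names and input text are assumptions for illustration, and a real job would run the map and reduce steps in parallel across the cluster.

```python
from collections import defaultdict


def map_phase(split):
    """Map: emit a (word, 1) pair for every word in one input split."""
    return [(word.lower(), 1) for word in split.split()]


def shuffle_and_sort(pairs):
    """Shuffle & Sort: group all values by key and order the keys."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return sorted(groups.items())


def reduce_phase(key, values):
    """Reduce: combine the values of one key into a single result."""
    return key, sum(values)


# Input already divided into splits (one string per split).
splits = ["big data needs hadoop", "hadoop stores big data"]

mapped = [pair for split in splits for pair in map_phase(split)]
grouped = shuffle_and_sort(mapped)
word_counts = dict(reduce_phase(key, values) for key, values in grouped)
print(word_counts)
```

Each split is mapped independently and each key is reduced independently, which is what allows the two phases to be spread over many nodes.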

Disadvantages

Security Concerns

Not Fit For Small Data

Future Scope of Big Data & Hadoop

Conclusion...

Source of Information

Google

Presented By

Vishal Kumar

Sk Ibrahim Anam

Souvik Jana

Thank You