Big Data Seminar (TICT (CSE), Batch 2013-2017)

WHAT IS DATA

Data is raw, unorganized facts that need to be processed. Data can be something simple and seemingly random, and by itself it is of little use until it is organized.

DIFFERENT TYPES OF DATA

A traditional RDBMS deals only with Structured Data.

There is a need for a technology that deals with Semi-Structured Data and Unstructured Data, as well as Structured Data.

Semi-Structured Data
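
The three types of data can be illustrated with a small sketch. The sample values, field names, and the helper `field_sets` below are made up for illustration and do not come from the slides.

```python
import csv
import io
import json

# Structured: fixed columns, so every row fits a relational table directly.
structured = "id,name,balance\n1,Asha,2500\n2,Ravi,1200\n"
rows = list(csv.DictReader(io.StringIO(structured)))

# Semi-structured: self-describing, but the fields can differ per record.
semi_structured = '[{"user": "asha", "tags": ["sale"]}, {"user": "ravi", "age": 31}]'
records = json.loads(semi_structured)

# Unstructured: free text with no schema; any structure has to be derived.
unstructured = "Great phone, but the battery drains fast."
words = unstructured.lower().replace(",", "").replace(".", "").split()


def field_sets(items):
    """Return the set of field names seen in each record (hypothetical helper)."""
    return [set(item.keys()) for item in items]


# Structured rows all share one schema; semi-structured records may not.
same_schema = field_sets(rows)[0] == field_sets(rows)[1]
mixed_schema = field_sets(records)[0] != field_sets(records)[1]
```

An RDBMS can store `rows` as they are, while `records` and `words` need extra handling before they fit a fixed table.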

Traditional Concept of Data Storage

Organizations

Banking Sector

Stock Exchange

Hospital

Social Media

Online Shopping

Others

Extract Data -> Transform Data -> Load into Database

End Users Generate Reports & Perform Analytics
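
The extract, transform, load, and report steps above can be sketched in a few lines. This is a minimal illustration using Python's built-in `sqlite3` module as a stand-in for the database; the table name `txn`, the sample data, and the function names are assumptions, not something the slides specify.

```python
import csv
import io
import sqlite3

# Hypothetical raw input from a source system (one row is deliberately bad).
RAW = "account,amount\nA1, 100.50\nA2,not-a-number\nA1,49.50\n"


def extract(text):
    """Extract: read raw CSV text into a list of dict rows."""
    return list(csv.DictReader(io.StringIO(text)))


def transform(rows):
    """Transform: clean values and drop rows whose amount cannot be parsed."""
    clean = []
    for row in rows:
        try:
            clean.append((row["account"].strip(), float(row["amount"])))
        except ValueError:
            continue
    return clean


def load(rows, conn):
    """Load: write the cleaned rows into a database table."""
    conn.execute("CREATE TABLE IF NOT EXISTS txn (account TEXT, amount REAL)")
    conn.executemany("INSERT INTO txn VALUES (?, ?)", rows)
    conn.commit()


def report(conn):
    """Report: the kind of aggregate an end user might run."""
    cur = conn.execute(
        "SELECT account, SUM(amount) FROM txn GROUP BY account ORDER BY account"
    )
    return cur.fetchall()


conn = sqlite3.connect(":memory:")
load(transform(extract(RAW)), conn)
totals = report(conn)
```

Every step here runs on a single machine against a single database, which is the property that stops scaling once the data grows.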

As Data Grows, Managing Data and Processing Data become Difficult

Drawback of Using Traditional Approach

Expensive

Time Consuming

Scalability

Storage Size

Resource Failure

The Model of Generating or Consuming Data Has Changed...

OLD MODEL - A few companies generate the data; all others consume it.

NEW MODEL - All of us generate the data, and all of us consume it.

WHAT IS BIG DATA

Big Data is a collection of large datasets that cannot be processed using traditional computing techniques. It requires new architecture, new techniques, and various tools and frameworks.

Definition of BIG DATA

Different Sources of DATA

SOURCES

WHERE THE BIG DATA IS USED

IT Industries

Manufacturing Industries

Telecommunications

Banking sector

Healthcare

CHALLENGES IN HANDLING BIG DATA

There are two main challenges in handling BIG DATA:

1. How do we store and manage such a huge volume of DATA efficiently?

2. How do we process and extract valuable information from the huge volume of DATA within a given time frame?

BRIEF HISTORY OF HADOOP

WHAT IS HADOOP

Hadoop is an open source framework. It is designed to store and process huge volumes of data efficiently.

Hadoop is a platform that provides both distributed storage and computational capabilities.

Why HADOOP Is Used

MAJOR COMPONENTS OF HADOOP ECOSYSTEM

HADOOP COMPONENTS

HADOOP DISTRIBUTED FILE SYSTEM (Storage)

Google MAPREDUCE ALGORITHM (Processing)

HADOOP ECOSYSTEM

Flume: Semi-Structured or Unstructured Data

Sqoop: Structured Data

Import or Export

Features of Hadoop

Cost Effective System (Uses Commodity Machines)

Large Cluster of Nodes (Processing Power & Storage Capacity Increase)

Parallel Processing (Less Time is Required to Store & Access the Data)

Distributed Data (Data is Distributed across Different Nodes)

Automatic Failover Management

Heterogeneous Cluster

Scalability

How The Data Is Stored In Hadoop Clusters

[Figure: Hadoop Distributed File System cluster. Rack 1 and Rack 2 hold Node 1 to Node 4. Components shown: Client, Name Node, Secondary Name Node, Job Tracker, Data Nodes with Task Trackers, and data blocks (Block 2, Block 4, Block 5) stored on the Data Nodes.]
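
The idea of storing a file as blocks spread over several Data Nodes can be sketched as below. This is a toy model only: the 128 MB block size, the replication factor of 3, the node names, and the round-robin placement are assumptions chosen for illustration (the slides do not state them), and real HDFS uses a rack-aware placement policy rather than this simple rotation.

```python
from itertools import cycle

MB = 1024 * 1024


def split_into_blocks(file_size, block_size):
    """Return the size of each block a file of file_size bytes occupies."""
    full, rest = divmod(file_size, block_size)
    return [block_size] * full + ([rest] if rest else [])


def place_replicas(num_blocks, nodes, replication):
    """Assign each block to `replication` distinct nodes, round-robin.

    Toy placement: it ignores racks, free space, and node load.
    """
    if replication > len(nodes):
        raise ValueError("replication cannot exceed the number of nodes")
    ring = cycle(nodes)
    placement = {}
    for block_id in range(num_blocks):
        placement[block_id] = [next(ring) for _ in range(replication)]
    return placement


# Assumed values: a 300 MB file, 128 MB blocks, four nodes, three replicas.
blocks = split_into_blocks(300 * MB, 128 * MB)
nodes = ["node1", "node2", "node3", "node4"]
placement = place_replicas(len(blocks), nodes, replication=3)
```

Because every block lives on more than one node, losing a single node does not lose the data, which is what makes automatic failover possible.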

MapReduce Flow

MapReduce Framework

MapReduce works by breaking the processing into two phases: the Map Phase and the Reduce Phase.

Input -> Split -> Map -> Shuffle & Sort -> Reduce -> Output
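
The flow from input to output can be illustrated with a word count written in plain Python. This is a single-process sketch of the idea, not Hadoop's API: the function names are hypothetical, and in a real cluster the map and reduce calls would run in parallel on different nodes.

```python
from itertools import groupby
from operator import itemgetter


def split_input(text):
    """Split: break the input into independent pieces (here, lines)."""
    return [line for line in text.splitlines() if line.strip()]


def map_phase(line):
    """Map: emit a (key, value) pair for every word in one split."""
    return [(word.lower(), 1) for word in line.split()]


def shuffle_and_sort(pairs):
    """Shuffle & Sort: order pairs by key and group the values per key."""
    ordered = sorted(pairs, key=itemgetter(0))
    return [
        (key, [value for _, value in group])
        for key, group in groupby(ordered, key=itemgetter(0))
    ]


def reduce_phase(key, values):
    """Reduce: combine all values for one key into a single result."""
    return key, sum(values)


def word_count(text):
    mapped = [pair for line in split_input(text) for pair in map_phase(line)]
    return dict(reduce_phase(k, vs) for k, vs in shuffle_and_sort(mapped))


result = word_count("big data\nbig cluster\ndata data")
```

Each call to `map_phase` depends only on its own line and each call to `reduce_phase` only on its own key, which is why the two phases can be distributed across many machines.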

Disadvantages

Security Concerns

Not Fit For Small Data

Future Scope of Big Data & Hadoop

Conclusion...

Source of Information

Google

Presented By

Vishal Kumar

Sk Ibrahim Anam

Souvik Jana

Thank You
