Big Data Seminar (TICT, CSE, Batch 2013-2017)
WHAT IS DATA
Data is raw, unorganized facts that need to be processed. Data can be something simple and
seemingly random, and by itself it is of little use until it is organized.
DIFFERENT TYPES OF DATA
A traditional RDBMS deals only with structured data.
A technology is needed that handles semi-structured and unstructured data as well as structured data.
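To make the three kinds of data concrete, here is a minimal Python sketch. The sample records, names, and the `describe` helper are invented for illustration only and do not come from the slides.

```python
import csv
import io
import json

# Structured: fixed columns, as in an RDBMS table (hypothetical rows).
structured = io.StringIO("id,name,balance\n1,Asha,2500\n2,Ravi,400\n")
rows = list(csv.DictReader(structured))

# Semi-structured: self-describing, but fields may differ per record.
semi_structured = json.loads(
    '[{"user": "asha", "tags": ["bank", "loan"]}, {"user": "ravi"}]'
)

# Unstructured: free text with no schema at all.
unstructured = "Ravi called the bank about a delayed transfer."


def describe(rows, records, text):
    """Summarize what can be read directly from each kind of data."""
    return {
        "structured_columns": sorted(rows[0].keys()),
        "semi_structured_fields": sorted({k for r in records for k in r}),
        "unstructured_word_count": len(text.split()),
    }


summary = describe(rows, semi_structured, unstructured)
print(summary)
```

The structured rows all share the same columns, the JSON records only share some fields, and the free text offers nothing beyond its words until it is processed further.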
Traditional Concept of Data Storage
Organizations
Banking Sector
Stock Exchange
Hospital
Social Media
Online Shopping
Others
Extract Data -> Transform Data -> Load into Database
End Users Generate Reports & Perform Analytics
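The extract, transform, load, and report steps of the traditional approach can be sketched roughly as follows. This is only an illustration: the raw order data, the `orders` table, and the function names are assumptions, and an in-memory SQLite database stands in for the organization's RDBMS.

```python
import csv
import io
import sqlite3

# Hypothetical raw export from a source system (e.g. online shopping).
RAW = "order_id,amount,city\n1, 120.50 ,kolkata\n2,80,delhi\n3,,kolkata\n"


def extract(text):
    """Extract: read raw rows from the source."""
    return list(csv.DictReader(io.StringIO(text)))


def transform(rows):
    """Transform: drop rows with no amount, normalize types and casing."""
    clean = []
    for row in rows:
        amount = row["amount"].strip()
        if not amount:
            continue  # skip incomplete records
        clean.append(
            (int(row["order_id"]), float(amount), row["city"].strip().title())
        )
    return clean


def load(rows, conn):
    """Load: write the cleaned rows into a database table."""
    conn.execute("CREATE TABLE orders (order_id INTEGER, amount REAL, city TEXT)")
    conn.executemany("INSERT INTO orders VALUES (?, ?, ?)", rows)
    conn.commit()


conn = sqlite3.connect(":memory:")
load(transform(extract(RAW)), conn)

# End users then generate reports from the database.
report = conn.execute(
    "SELECT city, SUM(amount) FROM orders GROUP BY city ORDER BY city"
).fetchall()
print(report)
```

Every step here runs on a single machine against a single database, which is where the cost and scalability limits of this approach come from as the data grows.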
As Data Grows, Managing Data and Processing Data Become Difficult
Drawback of Using Traditional Approach
Expensive
Time Consuming
Scalability
Storage Size
Resource Failure
The Model of Generating or Consuming Data Has Changed
OLD MODEL - A few companies generate the data; all others consume it.
NEW MODEL - All of us generate the data, and all of us consume it.
WHAT IS BIG DATA
Big data is a collection of datasets so large that they cannot be
processed using traditional computing techniques. It requires a new architecture, new techniques, and various tools and frameworks.
Definition of BIG DATA
Different Sources of DATA
SOURCES
WHERE BIG DATA IS USED
IT Industries
Manufacturing Industries
Telecommunications
Banking sector
Healthcare
CHALLENGES IN HANDLING BIG DATA
There are two main challenges in handling BIG DATA:
1. How do we store and manage such a huge volume of data efficiently?
2. How do we process and extract valuable information from the huge volume of data within a given time frame?
BRIEF HISTORY OF HADOOP
WHAT IS HADOOP
Hadoop is an open-source framework designed to store and process huge volumes of data efficiently.
Hadoop is a platform that provides both distributed storage and computational capabilities.
Why HADOOP Is Used
MAJOR COMPONENTS OF HADOOP ECOSYSTEM
HADOOP COMPONENTS
HADOOP DISTRIBUTED FILE SYSTEM - Storage
Google MAPREDUCE ALGORITHM - Processing
HADOOP ECOSYSTEM
Flume - Semi-Structured or Unstructured Data
Sqoop - Structured Data
Import or Export
Features of Hadoop
Cost Effective System (Uses Commodity Machines)
Large Cluster of Nodes (Processing Power & Storage Capacity Are Increased)
Parallel Processing (Less Time is Required to Store & Access the Data)
Distributed Data (Data is Distributed Across Different Nodes)
Automatic Failover Management
Heterogeneous Cluster
Scalability
How The Data Is Stored In Hadoop Clusters
[Figure: Hadoop Distributed File System. The diagram shows a Client, a Name Node, a Secondary Name Node, and a Job Tracker, together with Data Nodes that each run a Task Tracker. Node 1 to Node 4 are spread over Rack 1 and Rack 2, and each of the three Data Nodes shown holds a copy of Block1 to Block6.]
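The idea of cutting a file into blocks and keeping copies on several data nodes can be sketched in a few lines of Python. This is a toy model under stated assumptions: the 100-byte block size, the four node names, the replication factor of 3, and the round-robin placement are all chosen for illustration, and round-robin is not HDFS's actual placement policy. The `placement` dictionary plays the role of the block-to-node mapping that the Name Node keeps.

```python
def split_into_blocks(data: bytes, block_size: int) -> list:
    """Cut the file content into fixed-size blocks (the last may be shorter)."""
    return [data[i:i + block_size] for i in range(0, len(data), block_size)]


def place_blocks(num_blocks: int, data_nodes: list, replication: int) -> dict:
    """Assign each block to `replication` distinct nodes, round-robin.

    Assumption: round-robin is used only for illustration; it is not
    HDFS's actual placement policy.
    """
    if replication > len(data_nodes):
        raise ValueError("replication cannot exceed the number of data nodes")
    placement = {}
    for block_id in range(num_blocks):
        placement[block_id] = [
            data_nodes[(block_id + r) % len(data_nodes)]
            for r in range(replication)
        ]
    return placement


def blocks_after_failure(placement: dict, failed_node: str) -> dict:
    """Return the replicas that survive when one data node fails."""
    return {
        block: [node for node in nodes if node != failed_node]
        for block, nodes in placement.items()
    }


data = b"x" * 600                                   # a 600-byte "file"
blocks = split_into_blocks(data, block_size=100)    # six blocks
nodes = ["node1", "node2", "node3", "node4"]        # hypothetical data nodes
placement = place_blocks(len(blocks), nodes, replication=3)
surviving = blocks_after_failure(placement, "node2")
print(placement[0], min(len(n) for n in surviving.values()))
```

Because every block lives on three different nodes, losing one node still leaves at least two copies of each block, which is what makes automatic failover possible.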
MapReduce Flow
MapReduce Framework
MapReduce works by breaking the processing into two phases: the Map phase and the Reduce phase.
Input -> Split -> Map -> Shuffle & Sort -> Reduce -> Output
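The flow from input to output can be imitated on one machine with a word count, the usual introductory MapReduce example. This is a plain-Python sketch of the idea, not Hadoop's API: the function names, the sample text, and the choice of two splits are assumptions made for illustration.

```python
from collections import defaultdict


def split_input(text, num_splits):
    """Split: divide the input lines among `num_splits` mappers."""
    lines = text.splitlines()
    return [lines[i::num_splits] for i in range(num_splits)]


def map_phase(lines):
    """Map: emit a (word, 1) pair for every word in the split."""
    return [(word.lower(), 1) for line in lines for word in line.split()]


def shuffle_and_sort(mapped):
    """Shuffle & Sort: group all values by key and order the keys."""
    groups = defaultdict(list)
    for pairs in mapped:
        for key, value in pairs:
            groups[key].append(value)
    return sorted(groups.items())


def reduce_phase(grouped):
    """Reduce: combine the values of each key into a single result."""
    return {key: sum(values) for key, values in grouped}


text = "big data needs hadoop\nhadoop stores big data\nhadoop processes data"
splits = split_input(text, 2)
mapped = [map_phase(s) for s in splits]      # each split is mapped independently
counts = reduce_phase(shuffle_and_sort(mapped))
print(counts)
```

Each call to `map_phase` depends only on its own split, so in a real cluster the splits can be processed in parallel on different nodes before the grouped results are reduced.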
Disadvantages
Security Concerns
Not Fit For Small Data
Future Scope of Big Data & Hadoop
Conclusion...
Source of Information
Presented By
Vishal Kumar
Sk Ibrahim Anam
Souvik Jana
Thank You