Data Analytics – All about data
Harish Dixit
Getting the context
• Examining raw data.
• Discovering useful information.
• Making better business decisions.
Life cycle stages
What is Big Data?
Where is the data coming from?
Complex ecosystem
Cloud
Hadoop Architecture
HDFS
• Distributed file system.
• Partitions data into large blocks spread across nodes.
• Fault tolerant: each block is replicated on several nodes.
• Java API.
• High scalability.
• Master/slave paradigm (NameNode and DataNodes).
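The partitioning and fault-tolerance bullets can be made concrete with a toy model. This is a minimal sketch, not HDFS code: the tiny block size, the round-robin placement, and the names `split_into_blocks`, `place_replicas`, `readable_after_failure` and `datanode-N` are all hypothetical. Real HDFS lets the NameNode choose replica locations with a rack-aware policy.

```python
"""Toy model of HDFS-style block splitting and replication (illustrative only).

Assumptions: a tiny block size, round-robin replica placement and hypothetical
node names. Recent Hadoop versions default to 128 MB blocks and a replication
factor of 3, with placement decided by the NameNode.
"""
from typing import Dict, List


def split_into_blocks(data: bytes, block_size: int) -> List[bytes]:
    # Cut the byte stream into fixed-size blocks; the last block may be shorter.
    return [data[i:i + block_size] for i in range(0, len(data), block_size)]


def place_replicas(num_blocks: int, nodes: List[str], replication: int) -> Dict[int, List[str]]:
    # Assign each block to `replication` distinct nodes, round-robin.
    if replication > len(nodes):
        raise ValueError("replication factor cannot exceed the number of nodes")
    return {
        block_id: [nodes[(block_id + r) % len(nodes)] for r in range(replication)]
        for block_id in range(num_blocks)
    }


def readable_after_failure(placement: Dict[int, List[str]], failed_node: str) -> bool:
    # Every block stays readable if at least one replica is on a surviving node.
    return all(any(n != failed_node for n in replicas) for replicas in placement.values())


if __name__ == "__main__":
    data = b"examining raw data to discover useful information"
    blocks = split_into_blocks(data, block_size=16)
    nodes = ["datanode-1", "datanode-2", "datanode-3", "datanode-4"]
    placement = place_replicas(len(blocks), nodes, replication=3)
    for block_id, replicas in placement.items():
        print(block_id, blocks[block_id], replicas)
    print("readable if datanode-2 fails:", readable_after_failure(placement, "datanode-2"))
```

Because every block lives on three different nodes, losing any single node leaves at least two copies of each block.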
HDFS
MapReduce
• Parallel processing model.
• Move computation to the data, not data to the computation.
• Distributed computations.
• User-defined functions:
– Map()
– Reduce()
MapReduce Operations
Example
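Word count is the usual example for the user-defined Map() and Reduce() functions listed above. The sketch below runs in a single process and is not Hadoop's Java API. `map_fn`, `reduce_fn` and `run_mapreduce` are hypothetical stand-ins that show the map, shuffle/sort and reduce phases in order.

```python
"""Single-process simulation of the MapReduce flow: map -> shuffle/sort -> reduce.

Teaching sketch only: map_fn and reduce_fn are hypothetical stand-ins for the
user-defined Map() and Reduce() functions, not Hadoop's API.
"""
from collections import defaultdict
from typing import Dict, Iterable, Iterator, List, Tuple


def map_fn(line: str) -> Iterator[Tuple[str, int]]:
    # Map(): emit (word, 1) for every word in one input record.
    for word in line.lower().split():
        yield word, 1


def reduce_fn(word: str, counts: Iterable[int]) -> Tuple[str, int]:
    # Reduce(): combine all values that share the same key.
    return word, sum(counts)


def run_mapreduce(lines: List[str]) -> Dict[str, int]:
    # Map phase: in Hadoop, each input split is mapped on a node that holds it.
    intermediate = [pair for line in lines for pair in map_fn(line)]
    # Shuffle/sort phase: group intermediate values by key.
    groups = defaultdict(list)
    for word, count in intermediate:
        groups[word].append(count)
    # Reduce phase: one call per distinct key, in sorted key order.
    return dict(reduce_fn(word, groups[word]) for word in sorted(groups))


if __name__ == "__main__":
    splits = ["big data big ideas", "data beats opinions"]
    print(run_mapreduce(splits))
    # {'beats': 1, 'big': 2, 'data': 2, 'ideas': 1, 'opinions': 1}
```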
R – Adding analytic power to Hadoop
Data modeling techniques
• Regression
• Classification
• Clustering
• Recommendation
• Text mining
Regression
Regression can be formulated as follows:
y = ax + e
where a is the coefficient to be estimated from the data and e is the random error term.
x     y
-----------
63    3.1
64    3.6
65    3.8
66    4
-----------
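The model y = ax + e can be fitted to the four (x, y) pairs on this slide, (63, 3.1), (64, 3.6), (65, 3.8) and (66, 4), by least squares. The slide's model has no intercept, so minimising the squared errors gives a = Σxy / Σx². The second function adds an intercept b (y = ax + b + e). That is a common extension and is an assumption here, not part of the slide.

```python
"""Least-squares fits of y = ax + e (no intercept) and, as an assumed extension,
y = ax + b + e, using the slide's four data points."""
from typing import List, Tuple

X = [63, 64, 65, 66]
Y = [3.1, 3.6, 3.8, 4.0]


def fit_through_origin(xs: List[float], ys: List[float]) -> float:
    # Minimising sum((y - a*x)^2) and setting the derivative to zero gives a = Σxy / Σx².
    return sum(x * y for x, y in zip(xs, ys)) / sum(x * x for x in xs)


def fit_with_intercept(xs: List[float], ys: List[float]) -> Tuple[float, float]:
    # Ordinary least squares with intercept: slope = Sxy / Sxx, intercept = mean(y) - slope * mean(x).
    n = len(xs)
    x_mean = sum(xs) / n
    y_mean = sum(ys) / n
    sxy = sum((x - x_mean) * (y - y_mean) for x, y in zip(xs, ys))
    sxx = sum((x - x_mean) ** 2 for x in xs)
    slope = sxy / sxx
    return slope, y_mean - slope * x_mean


if __name__ == "__main__":
    a = fit_through_origin(X, Y)
    print(f"no intercept: a = {a:.4f}")
    slope, intercept = fit_with_intercept(X, Y)
    print(f"with intercept: a = {slope:.2f}, b = {intercept:.2f}")
```

On these points the no-intercept fit gives a ≈ 0.0563. The intercept version gives a slope of 0.29 and an intercept of -15.08, because the x values sit far from zero.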
Classification
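Classification assigns a record to one of a set of known labels, learned from already-labelled examples. k-nearest neighbours is used below only as one simple, common classifier, not necessarily the method this slide showed. The customer data and the `knn_predict` name are hypothetical. The sketch requires Python 3.8+ for `math.dist`.

```python
"""k-nearest-neighbour classification: label a query point by majority vote of
its k closest training points (Euclidean distance). Data is made up."""
import math
from collections import Counter
from typing import List, Sequence, Tuple

Point = Sequence[float]


def knn_predict(train: List[Tuple[Point, str]], query: Point, k: int = 3) -> str:
    # Keep the k training examples closest to the query point.
    nearest = sorted(train, key=lambda item: math.dist(item[0], query))[:k]
    # Majority vote among their labels.
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]


if __name__ == "__main__":
    # Hypothetical customers: (monthly visits, average basket value) -> segment.
    training = [
        ((2, 10), "occasional"), ((3, 12), "occasional"), ((1, 8), "occasional"),
        ((20, 55), "loyal"), ((18, 60), "loyal"), ((22, 50), "loyal"),
    ]
    print(knn_predict(training, (19, 52)))  # loyal
```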
Clustering
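Clustering groups unlabelled records so that records in the same group are similar to each other. The sketch uses k-means (Lloyd's algorithm) as one common clustering technique. The points, the choice of k = 2 and the fixed starting centroids are assumptions made to keep the example deterministic. It requires Python 3.8+.

```python
"""Minimal k-means (Lloyd's algorithm) on 2-D points with given initial centroids."""
import math
from typing import List, Tuple

Point = Tuple[float, float]


def assign(points: List[Point], centroids: List[Point]) -> List[int]:
    # Assignment step: index of the nearest centroid for each point.
    return [min(range(len(centroids)), key=lambda c: math.dist(p, centroids[c])) for p in points]


def kmeans(points: List[Point], centroids: List[Point], max_iter: int = 100) -> Tuple[List[Point], List[int]]:
    centroids = [tuple(c) for c in centroids]
    for _ in range(max_iter):
        labels = assign(points, centroids)
        # Update step: move each centroid to the mean of its assigned points.
        new_centroids = []
        for c, old in enumerate(centroids):
            members = [p for p, label in zip(points, labels) if label == c]
            if members:
                new_centroids.append((sum(x for x, _ in members) / len(members),
                                      sum(y for _, y in members) / len(members)))
            else:
                new_centroids.append(old)  # an empty cluster keeps its centroid
        if new_centroids == centroids:
            break  # converged: assignments no longer move any centroid
        centroids = new_centroids
    # Final labels always match the returned centroids.
    return centroids, assign(points, centroids)


if __name__ == "__main__":
    pts = [(1, 1), (1.5, 2), (1, 0.5), (8, 8), (9, 8.5), (8.5, 9)]
    centers, labels = kmeans(pts, [(0, 0), (10, 10)])
    print(centers, labels)
```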
Recommendation
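Recommendation systems suggest items a user has not yet seen, based on the behaviour of similar users or items. The sketch shows user-based collaborative filtering with cosine similarity, which is one common approach. The ratings, user names and the rule "copy the single most similar user's unseen items" are simplifying assumptions for illustration.

```python
"""User-based collaborative filtering with cosine similarity (illustrative)."""
import math
from typing import Dict, List

Ratings = Dict[str, Dict[str, float]]


def cosine(a: Dict[str, float], b: Dict[str, float]) -> float:
    # Cosine similarity of two sparse rating vectors; unrated items count as 0.
    dot = sum(a[i] * b[i] for i in set(a) & set(b))
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def recommend(ratings: Ratings, target: str) -> List[str]:
    # Find the most similar other user.
    others = [u for u in ratings if u != target]
    best = max(others, key=lambda u: cosine(ratings[target], ratings[u]))
    # Recommend that user's items the target has not rated, highest-rated first.
    unseen = [item for item in ratings[best] if item not in ratings[target]]
    return sorted(unseen, key=lambda item: ratings[best][item], reverse=True)


if __name__ == "__main__":
    ratings = {
        "alice": {"hadoop_book": 5, "r_course": 4},
        "bob": {"hadoop_book": 5, "r_course": 5, "nosql_guide": 4, "spark_talk": 2},
        "carol": {"cooking_show": 5, "travel_vlog": 4},
    }
    print(recommend(ratings, "alice"))  # ['nosql_guide', 'spark_talk']
```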
Text Mining
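A basic text-mining step is turning raw documents into numeric features. TF-IDF weights a term by how often it appears in a document and by how rare it is across the collection. The tokenizer (lowercase, letters only) and the plain `log(N / df)` IDF formula are simplifying assumptions, since libraries use several variants.

```python
"""Term frequency and TF-IDF weights for a small list of documents."""
import math
import re
from collections import Counter
from typing import Dict, List


def tokenize(text: str) -> List[str]:
    # Lowercase and keep runs of letters only.
    return re.findall(r"[a-z]+", text.lower())


def tf_idf(docs: List[str]) -> List[Dict[str, float]]:
    tokenized = [tokenize(d) for d in docs]
    n_docs = len(docs)
    # Document frequency: number of documents that contain each term.
    df = Counter(term for tokens in tokenized for term in set(tokens))
    weights = []
    for tokens in tokenized:
        counts = Counter(tokens)
        total = len(tokens)
        # TF = count / document length; IDF = log(N / df). Terms in every document get weight 0.
        weights.append({t: (c / total) * math.log(n_docs / df[t]) for t, c in counts.items()})
    return weights


if __name__ == "__main__":
    docs = ["Hadoop stores big data", "NoSQL stores big data", "R models data"]
    for w in tf_idf(docs):
        print(max(w, key=w.get), w)
```

In this sample, "data" appears in every document and so gets zero weight. "hadoop" becomes the most distinctive term of the first document.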
NoSQL
• Non-relational.
• Distributed environment.
• Large volume.
• No fixed schemas.
• Horizontally scalable.
CAP Theorem: Consistency, Availability, Partition tolerance (a distributed system can guarantee at most 2 of the 3)
Variants of NoSQL
• Key-Value Systems.
• Document-based Systems.
• Column-based Systems.
• Graph-based Systems.
Distributed Key-Value Systems
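Horizontal scaling in a distributed key-value store depends on deciding which node owns each key. Consistent hashing is one common technique for this. When a node is added, only the keys that fall into the new node's ring segments move. The `HashRing` class, the node names, MD5 as the hash and 100 virtual nodes per server are all assumptions for illustration.

```python
"""Consistent hashing ring with virtual nodes (illustrative sketch)."""
import bisect
import hashlib
from typing import Dict, List


def _hash(value: str) -> int:
    # Map a string to a large integer position on the ring.
    return int(hashlib.md5(value.encode("utf-8")).hexdigest(), 16)


class HashRing:
    def __init__(self, nodes: List[str], vnodes: int = 100) -> None:
        self.vnodes = vnodes
        self._ring: List[int] = []       # sorted ring positions
        self._owner: Dict[int, str] = {} # ring position -> node
        for node in nodes:
            self.add_node(node)

    def add_node(self, node: str) -> None:
        # Several virtual points per node give a more even key distribution.
        for i in range(self.vnodes):
            h = _hash(f"{node}#{i}")
            bisect.insort(self._ring, h)
            self._owner[h] = node

    def node_for(self, key: str) -> str:
        # Walk clockwise to the first point at or after the key's hash, wrapping around.
        idx = bisect.bisect_left(self._ring, _hash(key)) % len(self._ring)
        return self._owner[self._ring[idx]]


if __name__ == "__main__":
    keys = [f"user:{i}" for i in range(1000)]
    ring = HashRing(["node-a", "node-b", "node-c"])
    before = {k: ring.node_for(k) for k in keys}
    ring.add_node("node-d")
    moved = sum(1 for k in keys if ring.node_for(k) != before[k])
    print(f"{moved} of {len(keys)} keys moved after adding node-d")
```

Every key that moves goes to the new node, so the other nodes never exchange data with each other.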
Column-based Systems
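Column-based NoSQL systems in the Bigtable tradition, such as HBase, organise data by row key and column family. Each row holds only the columns it actually has. The in-memory class below is a hypothetical conceptual sketch and does not use any real database's API. It ignores versions, timestamps and persistence, and shows only the sparse, family-grouped layout.

```python
"""Conceptual wide-column (column-family) table kept in memory."""
from collections import defaultdict
from typing import Dict, Optional


class WideColumnTable:
    def __init__(self) -> None:
        # family -> row key -> column -> value. Families are kept apart, so
        # scanning one family never reads the others.
        self._families: Dict[str, Dict[str, Dict[str, str]]] = defaultdict(lambda: defaultdict(dict))

    def put(self, row: str, family: str, column: str, value: str) -> None:
        self._families[family][row][column] = value

    def get(self, row: str, family: str, column: str) -> Optional[str]:
        # .get() lookups do not create empty entries in the defaultdicts.
        return self._families.get(family, {}).get(row, {}).get(column)

    def scan_family(self, family: str) -> Dict[str, Dict[str, str]]:
        # Only rows with at least one column in this family appear (sparse rows).
        return {row: dict(cols) for row, cols in sorted(self._families.get(family, {}).items())}


if __name__ == "__main__":
    t = WideColumnTable()
    t.put("user1", "profile", "name", "Asha")
    t.put("user1", "activity", "last_login", "day-1")
    t.put("user2", "profile", "name", "Ben")
    t.put("user2", "profile", "city", "Pune")
    print(t.scan_family("profile"))
    print(t.scan_family("activity"))
```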
Applications of Big Data
Think Big
Q & A