Data Analytics – All about data Harish Dixit

Post on 01-Dec-2014

DESCRIPTION

Information about Data Analytics and Big Data.

TRANSCRIPT

Page 1: Data analytics all about data v5

Data Analytics – All about data

Harish Dixit

Page 2: Data analytics all about data v5

Getting the context

• Examining raw data.

• Discovering useful information.

• Making better business decisions.

Page 3: Data analytics all about data v5

Life cycle stages

Page 4: Data analytics all about data v5

What is Big Data?

Page 5: Data analytics all about data v5

Where is the data coming from?

Page 6: Data analytics all about data v5

Complex ecosystem

Cloud

Page 7: Data analytics all about data v5

Hadoop Architecture

Page 8: Data analytics all about data v5

HDFS

• Distributed file system.

• Partitioning of data.

• Fault tolerant.

• Java API.

• High scalability.

• Master-slave paradigm.
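
The partitioning and fault-tolerance bullets above can be illustrated with a small sketch. It is not HDFS code: the function names, the tiny block size, and the round-robin replica placement are all assumptions chosen for illustration, and real HDFS uses its own placement policy. The idea shown is only that a file is cut into fixed-size blocks, each block is copied to several nodes, and a master-side table records which node holds which block.

```python
def split_into_blocks(data: bytes, block_size: int) -> list[bytes]:
    """Cut a file's bytes into fixed-size blocks (the last one may be shorter)."""
    if block_size < 1:
        raise ValueError("block_size must be positive")
    return [data[i:i + block_size] for i in range(0, len(data), block_size)]


def place_replicas(num_blocks: int, nodes: list[str], replication: int) -> dict[int, list[str]]:
    """Assign each block to `replication` distinct nodes.

    Round-robin placement is an illustrative assumption, not the HDFS policy.
    The returned dict plays the role of the master's block-location table.
    """
    if replication > len(nodes):
        raise ValueError("replication factor cannot exceed the number of nodes")
    return {
        block: [nodes[(block + r) % len(nodes)] for r in range(replication)]
        for block in range(num_blocks)
    }


def readable_blocks(placement: dict[int, list[str]], failed: set[str]) -> list[int]:
    """Return the blocks that still have at least one replica on a live node."""
    return [
        block
        for block, holders in placement.items()
        if any(node not in failed for node in holders)
    ]
```

With four hypothetical nodes and three replicas per block, losing any single node leaves every block readable, which is the sense in which replication gives fault tolerance.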

Page 9: Data analytics all about data v5

HDFS

Page 10: Data analytics all about data v5

Map Reduce

• Parallel processing model.

• Move operations, not data.

• Distributed computations.

• User-defined functions:

– Map()

– Reduce()
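
A minimal single-process sketch of the two user-defined functions, using word counting as the task. This is not the Hadoop Java API: the names `map_fn`, `shuffle`, `reduce_fn`, and `map_reduce` are hypothetical, and the shuffle step that a real framework performs across machines is simulated here with an in-memory dictionary.

```python
from collections import defaultdict
from typing import Iterable, Iterator


def map_fn(line: str) -> Iterator[tuple[str, int]]:
    """Map step: emit a (word, 1) pair for every word in one input line."""
    for word in line.lower().split():
        yield word, 1


def shuffle(pairs: Iterable[tuple[str, int]]) -> dict[str, list[int]]:
    """Group all emitted values by key (done by the framework in a real cluster)."""
    groups: dict[str, list[int]] = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_fn(key: str, values: list[int]) -> tuple[str, int]:
    """Reduce step: combine all values for one key into a single count."""
    return key, sum(values)


def map_reduce(lines: Iterable[str]) -> dict[str, int]:
    """Run map, shuffle, and reduce in sequence over the input lines."""
    mapped = (pair for line in lines for pair in map_fn(line))
    grouped = shuffle(mapped)
    return dict(reduce_fn(key, values) for key, values in grouped.items())
```

Because each call to `map_fn` looks at one line only and each call to `reduce_fn` looks at one key only, both steps could run in parallel on separate machines, which is what makes the model suitable for distributed computation.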

Page 11: Data analytics all about data v5

Map/Reduce Operations

Page 12: Data analytics all about data v5

Example

Page 13: Data analytics all about data v5

R - Add analytic power to Hadoop

Page 14: Data analytics all about data v5

Data modeling techniques

• Regression

• Classification

• Clustering

• Recommendation

• Text mining

Page 15: Data analytics all about data v5

Regression

Regression can be formulated as follows:

y = ax + e

x     y
-----------------------------
63    3.1
64    3.6
65    3.8
66    4
-----------------------------
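
As a sketch of how the coefficient a in y = ax + e could be estimated, the code below applies ordinary least squares for a line through the origin, treating e as the residual error. Minimizing the sum of (y - ax)^2 over a gives a = sum(x*y) / sum(x*x). The function names are hypothetical, and the sample points are the slide's four values read as (63, 3.1), (64, 3.6), (65, 3.8), (66, 4).

```python
# The slide's sample data, read as four (x, y) pairs.
SAMPLE_X = [63.0, 64.0, 65.0, 66.0]
SAMPLE_Y = [3.1, 3.6, 3.8, 4.0]


def fit_slope(xs: list[float], ys: list[float]) -> float:
    """Least-squares estimate of a in y = a*x + e (no intercept term)."""
    if len(xs) != len(ys) or not xs:
        raise ValueError("xs and ys must be non-empty and of equal length")
    sum_xx = sum(x * x for x in xs)
    if sum_xx == 0:
        raise ValueError("at least one x must be non-zero")
    return sum(x * y for x, y in zip(xs, ys)) / sum_xx


def residuals(xs: list[float], ys: list[float], a: float) -> list[float]:
    """The error term e = y - a*x for each observation."""
    return [y - a * x for x, y in zip(xs, ys)]


slope = fit_slope(SAMPLE_X, SAMPLE_Y)
errors = residuals(SAMPLE_X, SAMPLE_Y, slope)
```

The model on the slide has no intercept, so the fitted line is forced through the origin. A model of the form y = ax + b + e would need a second coefficient and a different formula.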

Page 16: Data analytics all about data v5

Classification

Page 17: Data analytics all about data v5

Clustering

Page 18: Data analytics all about data v5

Recommendation

Page 19: Data analytics all about data v5

Text Mining

Page 20: Data analytics all about data v5

NoSQL

• Non-relational.

• Distributed environment.

• Large volume.

• No fixed schemas.

• Horizontally scalable.
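
The properties above can be sketched with a toy key-value store whose keys are spread over several shards by hashing. The class name `ShardedKVStore` and its methods are hypothetical, the shards are plain in-memory dictionaries standing in for separate machines, and simple modulo hashing is an assumption made for brevity. Values are arbitrary dictionaries, so no fixed schema is imposed.

```python
import hashlib
from typing import Any


class ShardedKVStore:
    """Toy hash-partitioned key-value store (illustration only)."""

    def __init__(self, num_shards: int) -> None:
        if num_shards < 1:
            raise ValueError("num_shards must be at least 1")
        # Each dict stands in for one node in a distributed deployment.
        self.shards: list[dict[str, Any]] = [{} for _ in range(num_shards)]

    def _shard_for(self, key: str) -> int:
        # A stable hash is used because Python's built-in hash() of a str
        # changes between interpreter runs.
        digest = hashlib.md5(key.encode("utf-8")).hexdigest()
        return int(digest, 16) % len(self.shards)

    def put(self, key: str, value: Any) -> None:
        """Store a value on the shard that owns the key."""
        self.shards[self._shard_for(key)][key] = value

    def get(self, key: str, default: Any = None) -> Any:
        """Look up a key on the shard that owns it."""
        return self.shards[self._shard_for(key)].get(key, default)
```

With modulo hashing, changing the number of shards remaps most keys, so this sketch shows only the partitioning idea and not how a production system rebalances data when nodes are added.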

Page 21: Data analytics all about data v5

CAP Theorem (any 2 of 3: consistency, availability, partition tolerance)

Page 22: Data analytics all about data v5

Variants of NoSQL

• Key-Value Systems.

• Document-based Systems.

• Column-based Systems.

• Graph-based Systems.

Page 23: Data analytics all about data v5

Distributed Key-Value Systems

Page 24: Data analytics all about data v5

Column-based Systems

Page 25: Data analytics all about data v5

Applications of Big Data

Page 26: Data analytics all about data v5

Think Big

Q & A