data analytics all about data v5

Post on 01-Dec-2014

DESCRIPTION

Information about Data Analytics and Big Data.

TRANSCRIPT

Data Analytics – All about data

Harish Dixit

Getting the context

• Examining raw data.

• Discovering useful information.

• Making better business decisions.

Life cycle stages

What is Big Data?

Where is the data coming from?

Complex ecosystem

Cloud

Hadoop Architecture

HDFS

• Distributed file system.

• Partitioning of data.

• Fault tolerant.

• Java API.

• High scalability.

• Master-slave paradigm.
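
The partitioning and fault-tolerance points above can be illustrated with a toy model. This is a minimal sketch, not the HDFS implementation: the function names, the tiny block size, the node names, and the round-robin placement are all assumptions made for illustration (real HDFS placement is rack-aware and is coordinated by the master node).

```python
def split_into_blocks(data: bytes, block_size: int) -> list[bytes]:
    """Partition a byte string into fixed-size blocks (the last may be shorter)."""
    return [data[i:i + block_size] for i in range(0, len(data), block_size)]


def place_replicas(num_blocks: int, nodes: list[str], replication: int) -> dict[int, list[str]]:
    """Assign each block to `replication` distinct nodes, round-robin.

    Hypothetical toy policy for illustration only.
    """
    if replication > len(nodes):
        raise ValueError("replication factor exceeds number of nodes")
    return {
        block: [nodes[(block + r) % len(nodes)] for r in range(replication)]
        for block in range(num_blocks)
    }


def readable_blocks(placement: dict[int, list[str]], failed: set[str]) -> list[int]:
    """Return the blocks that still have at least one replica on a live node."""
    return [b for b, holders in placement.items() if any(n not in failed for n in holders)]


blocks = split_into_blocks(b"abcdefghij", block_size=4)
placement = place_replicas(len(blocks), ["node1", "node2", "node3"], replication=2)

# With two replicas per block, losing any single node leaves every block readable.
still_readable = readable_blocks(placement, failed={"node1"})
```

In this sketch the master's role is reduced to the `placement` dictionary, which records where each block lives; the slave nodes are just names.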

HDFS

Map Reduce

• Parallel processing model.

• Move operations, not data.

• Distributed computations.

• User-defined functions:

– Map()

– Reduce()
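
The two user-defined functions can be sketched with a word count, the usual introductory case. This is a minimal single-process illustration in plain Python, not Hadoop's Java API: `map_fn`, `reduce_fn`, and `run_mapreduce` are hypothetical names, and the grouping step stands in for the shuffle that a real cluster performs across machines.

```python
from collections import defaultdict


def map_fn(line):
    """Map: emit a (word, 1) pair for every word in one input line."""
    for word in line.lower().split():
        yield word, 1


def reduce_fn(word, counts):
    """Reduce: combine all values emitted for one key into a single result."""
    return word, sum(counts)


def run_mapreduce(lines):
    """Run map, group values by key (the shuffle), then reduce each group."""
    groups = defaultdict(list)
    for line in lines:
        for key, value in map_fn(line):
            groups[key].append(value)
    return dict(reduce_fn(key, values) for key, values in groups.items())


counts = run_mapreduce(["big data", "data analytics", "big big data"])
```

Because each call to `map_fn` depends only on its own line, and each call to `reduce_fn` only on its own key, both phases can run in parallel on the nodes that already hold the data.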

Map/Reduce Operations

Example

R - Add analytic power to Hadoop

Data modeling techniques

• Regression

• Classification

• Clustering

• Recommendation

• Text mining

Regression

Regression can be formulated as follows:

y = ax + e

x     y
-----------------------------
63    3.1
64    3.6
65    3.8
66    4
-----------------------------
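
As a worked illustration of the formula y = ax + e, the coefficient a can be estimated by least squares. The sketch below assumes the table values are the pairs (63, 3.1), (64, 3.6), (65, 3.8), (66, 4); that reading of the table is an assumption. It also assumes the model has no intercept, exactly as the formula is written, so the fitted line passes through the origin.

```python
# Assumed reading of the slide's table: x in the first column, y in the second.
xs = [63, 64, 65, 66]
ys = [3.1, 3.6, 3.8, 4.0]


def fit_through_origin(xs, ys):
    """Least-squares estimate of a in y = a*x + e (no intercept term).

    Minimising sum((y - a*x)**2) gives a = sum(x*y) / sum(x*x).
    """
    sxy = sum(x * y for x, y in zip(xs, ys))
    sxx = sum(x * x for x in xs)
    return sxy / sxx


a = fit_through_origin(xs, ys)

# e is whatever the fitted line does not explain for each observation.
residuals = [y - a * x for x, y in zip(xs, ys)]
```

Here `a` comes out at roughly 0.056. A model with an intercept, y = ax + b + e, would likely fit these points more closely, but it is a different model from the one on the slide.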

Classification

Clustering

Recommendation

Text Mining

NoSQL

• Non-relational.

• Distributed environment.

• Large volume.

• No fixed schemas.

• Horizontally scalable.

CAP Theorem (any 2 of 3: consistency, availability, partition tolerance)

Variants of NoSQL

• Key-Value Systems.

• Document-based Systems.

• Column-based Systems.

• Graph-based Systems.
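
One way to see how the four variants differ is to write the same small fact in each shape. The sketch below uses plain Python structures only; the record (a user named "Asha" in "Pune") and every field name are made up for illustration, and no particular NoSQL product stores data in exactly these forms.

```python
import json

# Key-value: the store sees only a key and an opaque value.
key_value = {"user:1": json.dumps({"name": "Asha", "city": "Pune"})}

# Document-based: the value is a nested structure with fields that can be queried.
document = {"_id": 1, "name": "Asha", "address": {"city": "Pune"}}

# Column-based: row key -> column family -> column -> value.
column_based = {
    "user:1": {
        "profile": {"name": "Asha"},
        "location": {"city": "Pune"},
    }
}

# Graph-based: entities are nodes, and relationships are edges between them.
graph = {
    "nodes": {1: {"label": "User", "name": "Asha"}, 2: {"label": "City", "name": "Pune"}},
    "edges": [(1, "LIVES_IN", 2)],
}

# The same question, "which city does user 1 live in?", answered in each model.
city_from_kv = json.loads(key_value["user:1"])["city"]
city_from_doc = document["address"]["city"]
city_from_columns = column_based["user:1"]["location"]["city"]
city_from_graph = next(
    graph["nodes"][dst]["name"]
    for src, rel, dst in graph["edges"]
    if src == 1 and rel == "LIVES_IN"
)
```

All four lookups return the same answer. What changes is how much of the structure the store itself understands: none in the key-value case, and the relationships themselves in the graph case.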

Distributed Key-Value Systems

Column-based Systems

Applications of Big Data

Think Big Q & A
