Big Data - Part II

By Thanuja Seneviratne


Posted on 19-Jul-2015


Part I Recap

Big Data Market

› Data Growth

› Market Growth

› Market Drivers

› Adoption Cycle

› Forrester Market Report Findings

Big Data Products

› Enterprise Data Warehouses (EDW) – non-canonical, traditional

› Big Data Products Offering

› Hadoop and its Distros

› MapR and Others

› Big Data Products Stack

Future of Big Data

Data Science vs Traditional Analytics

Traditional Analytics – Decide what data is relevant, create a static data model, visualize the data

Data Science – Assemble all possible data, create a predictive model, operationalize the model (visualize, feed to another system)
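A minimal Python sketch of the contrast above, assuming hypothetical toy data (ad spend vs. sales). The "traditional" path computes one pre-chosen aggregate; the "data science" path fits a hand-rolled least-squares line as a stand-in for a predictive model and then scores new inputs. The names `static_report`, `fit_linear_model` and `operationalize` are illustrative only.

```python
from statistics import mean

# Hypothetical historical records: (ad_spend, sales). Illustrative numbers only.
history = [(1.0, 3.0), (2.0, 5.0), (3.0, 7.0), (4.0, 9.0)]

def static_report(rows):
    """Traditional analytics: a fixed, pre-chosen aggregate for a dashboard."""
    return {"avg_sales": mean(s for _, s in rows)}

def fit_linear_model(rows):
    """Data science: fit y = a + b*x by ordinary least squares."""
    xs = [x for x, _ in rows]
    ys = [y for _, y in rows]
    x_bar, y_bar = mean(xs), mean(ys)
    b = (sum((x - x_bar) * (y - y_bar) for x, y in rows)
         / sum((x - x_bar) ** 2 for x in xs))
    a = y_bar - b * x_bar
    return lambda x: a + b * x

def operationalize(model, new_inputs):
    """Feed predictions to a downstream system (here they are simply returned)."""
    return [model(x) for x in new_inputs]

print(static_report(history))                                 # {'avg_sales': 6.0}
print(operationalize(fit_linear_model(history), [5.0, 6.0]))  # [11.0, 13.0]
```

The report can only restate the past, while the fitted model can be handed to another system to score inputs it has never seen.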

Three types of data stores/data management systems

› Relational vs non-relational [MS SQL Server, Oracle, MySQL vs NoSQL products]

› Relational “big data” offering called EDW (mostly packaged as MPP appliances)

› Each of the three types has merits in certain use cases and will continue to be used in the industry

› Why EDW is not enough for new “big data” scenarios

The three V’s (volume, velocity, variety) are becoming too heavy

Time to Market is delayed

High Cost

Schema-on-write (defining the schema before loading data) is unnecessary
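A minimal Python sketch of schema-on-read versus schema-on-write, assuming raw events arrive as JSON lines. The event fields and the helpers `read_with_schema` and `write_with_schema` are hypothetical and not taken from any product.

```python
import json

# Raw events are kept exactly as they arrived; records may differ in shape.
RAW_EVENTS = [
    '{"user": "u1", "amount": "19.99", "country": "LK"}',
    '{"user": "u2", "amount": 5}',
    '{"user": "u3", "country": "US", "device": "mobile"}',
]

def read_with_schema(lines, schema):
    """Schema-on-read: pick fields, cast types and fill defaults at query time."""
    for line in lines:
        record = json.loads(line)
        yield {field: (cast(record[field]) if field in record else default)
               for field, (cast, default) in schema.items()}

def write_with_schema(lines, required_fields):
    """Schema-on-write, for contrast: records missing a field are rejected at load time."""
    accepted, rejected = [], []
    for line in lines:
        record = json.loads(line)
        target = accepted if all(f in record for f in required_fields) else rejected
        target.append(record)
    return accepted, rejected

# Two consumers impose two different schemas on the same raw data.
billing_schema = {"user": (str, None), "amount": (float, 0.0)}
geo_schema = {"user": (str, None), "country": (str, "unknown")}

print(list(read_with_schema(RAW_EVENTS, billing_schema)))
print(list(read_with_schema(RAW_EVENTS, geo_schema)))
```

The same raw lines serve both schemas without being reloaded, whereas a schema-on-write loader requiring `user`, `amount` and `country` would reject two of the three records up front.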

Importance of individualized experience

› Another sample case: a person finds $1000 in front of a bank. Will they return it to the bank or run away with it?

› Multiple business cases and multiple use cases

Hadoop as the premier open source “big data” offering and its distros

Other Hadoop-like “big data” offerings

Data Growth

Market Growth

› Will be the largest market, overtaking ERP, by 2020

Adoption Cycle

Market Drivers

› Business Drivers

Proactive analytics instead of reactive analytics

Insights generated for competitive advantage

Rise of the data-first enterprise

› Technical Drivers

Data growing exponentially to petabyte scale

Data is everywhere, in a variety of formats

› Financial Drivers

Cost of IT continues to grow

Commodity hardware instead of enterprise hardware

Forrester Market Report Findings

› Unstoppable Hadoop momentum in the market

› More and more enterprises want to do POCs (proofs of concept)

› Open source is the key

› Many big data products – a fair number of products to choose from, but no market-dominating leader yet.

Hadoop distributions

Other products including MapR

› Enterprise Hadoop and partnerships with large vendors

IBM, Teradata, Pivotal, Microsoft

› Hadoop in the cloud

› Hadoop Ecosystem

Enterprise Data Warehouses (EDWs)

› Traditional big data offering

› Non-canonical or original way of storing large data sets

› Refer to Part I slides

Big Data Products Offering

Hadoop and its distros

› History of Hadoop

› Hadoop as a Platform

Hortonworks Data Platform (HDP)

Cloudera Distribution including Apache Hadoop (CDH)
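Hadoop's core processing model, MapReduce, is usually introduced with a word count. The plain-Python sketch below mimics only the map, shuffle and reduce phases on a single machine; it does not use Hadoop's Java API or HDFS, and the function names are hypothetical.

```python
from collections import defaultdict
from itertools import chain

def map_phase(line):
    """Map: emit (word, 1) for every word in one input line."""
    return [(word.lower(), 1) for word in line.split()]

def shuffle(pairs):
    """Shuffle/sort: group values by key, as the framework does between phases."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return dict(sorted(groups.items()))

def reduce_phase(key, values):
    """Reduce: combine all values collected for one key."""
    return key, sum(values)

def word_count(lines):
    """Chain the three phases over an iterable of text lines."""
    mapped = chain.from_iterable(map_phase(line) for line in lines)
    return dict(reduce_phase(k, vs) for k, vs in shuffle(mapped).items())

print(word_count(["big data big", "Data lake"]))  # {'big': 2, 'data': 2, 'lake': 1}
```

On a real cluster, map tasks run in parallel next to the data blocks, and the framework performs the shuffle across machines before the reduce tasks run.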

Big Vendors

› IBM’s BigInsights – a Hadoop distro through Cloudera’s CDH

› Microsoft’s HDInsight on Azure – a Hadoop distro through Hortonworks’ HDP

› SAP’s HANA – an in-memory database rather than a Hadoop distro; SAP offers Hadoop through Hortonworks’ HDP

MapR and Others

› Instead of HDFS, MapR uses its own file system (MapR-FS), which can be accessed over the Network File System (NFS) protocol

› MapR Distros

Open source M3 in Amazon Cloud

Premium M5 in Amazon Cloud

MapR distro on Google

› Others

Amazon EMR

› A Hadoop distro on Amazon EC2 clusters in the Amazon cloud

› Exposes a web service to manage the clusters

› The most popular and cost-effective distro apart from Cloudera and Hortonworks

Hybrids

› Converging SQL Enterprise Data Warehouses (especially MPP products) with big data

› Investments made in long-running contracts with EDW vendors are safeguarded

› Existing SQL/DW knowledge and skill set can be utilized

› The following are popular products:

Big Data Products Stack

Market leader by 2020

Many products and alternatives are coming our way

5Vs-driven ecosystem instead of 3Vs

Growing demand for skills in big data technologies

› Enterprise Hadoop

› Hadoop Distros

› MapR and its Distros

› Hadoop stack

› Application Frameworks and languages

“R” language and frameworks

Scala language and frameworks

Subjective evolution instead of objective evolution

› Improvements to Big Data Infrastructure (BDI)

› Improvements to Big Data Life Cycle (BDLC)

› Evolve to All-Data processing