big data - what's the big deal

Upload: debarchan-sarkar

Post on 07-Nov-2014

Category:

Data & Analytics


DESCRIPTION

This video is a recording of a tech talk in which we explain the basics of Big Data. Big Data has been a buzzword in the IT industry, and this level 100 talk covers its history, basics, current market needs, and ins and outs.

TRANSCRIPT

Page 1: Big Data - What's the Big Deal

BIG DATA – WHAT’S THE BIG DEAL

Debarchan Sarkar, Microsoft Corporation

The call will start soon; please stay on mute. Thanks for your time and patience.

Page 2: Big Data - What's the Big Deal

WHO AM I

Debarchan, from Calcutta, with an Indian heart and a global mind. .NET programmer who fell in love with Open Source, specifically Apache Hadoop. Author. Community enthusiast. Cricket and music lover. One who gets really scared and bored if tomorrow is exactly like today. Known is a drop, the unknown is an ocean.

Page 3: Big Data - What's the Big Deal

WHAT IS BIG DATA?

Data Complexity: Variety and Velocity

Data volume: Megabytes, Gigabytes, Terabytes, Petabytes

Big Data: Log files, Spatial & GPS coordinates, Data market feeds, eGov feeds, Weather, Text/image, Click stream, Wikis/blogs, Sensors/RFID/devices, Social sentiment, Audio/video

Web 2.0: Web Logs, Digital Marketing, Search Marketing, Recommendations, Advertising, Mobile, Collaboration, eCommerce

ERP/CRM: Payables, Payroll, Inventory, Contacts, Deal Tracking, Sales Pipeline

Page 4: Big Data - What's the Big Deal

A NEW SET OF QUESTIONS

SOCIAL & WEB ANALYTICS: What’s the social sentiment for my brand or products?

LIVE DATA FEEDS: How do I optimize my fleet based on weather and traffic patterns?

ADVANCED ANALYTICS: How do I better predict future outcomes?

Page 5: Big Data - What's the Big Deal

COMMON BIG DATA CUSTOMER SCENARIOS

GAIN COMPETITIVE ADVANTAGE BY MOVING FIRST AND FAST IN YOUR INDUSTRY

Web app optimization

Smart meter monitoring

Equipment monitoring

Advertising analysis

Life sciences research

Fraud detection

Healthcare outcomes

Weather forecasting

Natural resource exploration

Social network analysis

Churn analysis

Traffic flow optimization

IT infrastructure optimization

Legal discovery

Page 6: Big Data - What's the Big Deal

THE BIG DATA LIFECYCLE

Manage, Enrich, Insight

Page 7: Big Data - What's the Big Deal

Relational, Non-Relational, Streaming

MANAGE ANY DATA, ANY SIZE, ANYWHERE

Unified Monitoring, Management & Security

Data Movement

Page 8: Big Data - What's the Big Deal

Extremely large volume of unstructured web logs

Ad hoc analysis of logs to prototype patterns

Hadoop data cluster feeds large 24 TB cube

Business users analyze cube data

6 PB Hadoop Cluster

24 TB SQL Server AS Cube

Microsoft BI Tools

E.g. STRUCTURED & UNSTRUCTURED DATA
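The flow on this slide, raw web logs reduced to a small summary that business users can query, can be illustrated with a minimal sketch. The log layout (`date path status`), the sample lines, and the function names below are hypothetical and chosen only for illustration; a real deployment would run this kind of roll-up on the Hadoop cluster and load the result into the Analysis Services cube.

```python
from collections import Counter

# Hypothetical log lines in a simplified "date path status" layout.
SAMPLE_LOGS = [
    "2014-11-07 /home 200",
    "2014-11-07 /home 200",
    "2014-11-07 /cart 500",
    "malformed line",
    "2014-11-08 /home 200",
]


def parse_line(line):
    """Return (date, path, status), or None when the line does not fit the layout."""
    parts = line.split()
    if len(parts) != 3 or not parts[2].isdigit():
        return None
    date, path, status = parts
    return date, path, int(status)


def roll_up(lines):
    """Count hits per (date, path, status) cell, skipping unparseable lines."""
    cells = Counter()
    for line in lines:
        record = parse_line(line)
        if record is not None:
            cells[record] += 1
    return cells


summary = roll_up(SAMPLE_LOGS)
for cell, hits in sorted(summary.items()):
    print(cell, hits)
```

Each key in `summary` plays the role of one cube cell (a combination of dimension values), and the count is the measure stored in it.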

Page 9: Big Data - What's the Big Deal

Manage, Enrich, Insight

THE BIG DATA LIFECYCLE

Page 10: Big Data - What's the Big Deal

ENRICH BY CONNECTING TO THE WORLD'S DATA

Discover

Combine

Refine

Page 11: Big Data - What's the Big Deal

POWER OF COMBINING THE WORLD'S DATA

Personal Data

Organizational Data

Community Data

World Data

Value

Page 12: Big Data - What's the Big Deal

E.g. VALUE OF EXTERNAL DATA

“When it comes to business intelligence, Microsoft SQL Server 2012 demonstrates that the platform has continued to advance and keep up with the innovations that are happening in big data.”

David Mariani, Vice President of Engineering

Connects to more than 1 billion signals

Across 15 leading social networks, including Facebook

Generates a ‘Klout’ score for individual people, brands & partners

Enables analysis, targeting and social graphs

Page 13: Big Data - What's the Big Deal

Manage, Enrich, Insight

THE BIG DATA LIFECYCLE

Page 14: Big Data - What's the Big Deal

INSIGHTS ON ANY DATA, ALL USERS, WHEREVER THEY ARE

Relational, Non-Relational, Streaming

BI Professionals, Business Analysts, Data Scientists

Page 15: Big Data - What's the Big Deal

INSIGHTS FOR ALL USERS THROUGH FAMILIAR TOOLS

Advanced Analytics from Microsoft and 3rd parties

Self Service Analysis with PowerPivot & Power View

Interactivity & exploration with Hadoop data in Excel

PB TB GB

BI Professionals, Business Analysts, Data Scientists

Page 16: Big Data - What's the Big Deal

• Application written in Java for Big Data processing
• Uses the “Map-Reduce” processing paradigm
• Characteristics: how is it different from traditional SQL Server?
1. Optimized for distributed storage and computing of data
2. Highly scalable (scale-out model)
3. Commodity hardware-based
4. Open source

⇒ Very low cost for acquisition and storage
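To make the “Map-Reduce” paradigm mentioned above more concrete, here is a minimal single-process sketch of the classic word-count example. It only imitates the three stages (map, shuffle, reduce) in plain Python; it does not use the Hadoop API, and the function names and sample input are assumptions made for illustration. On a real cluster the framework would run many map and reduce tasks in parallel across machines.

```python
from collections import defaultdict
from itertools import chain


def map_phase(line):
    """Map: emit a (word, 1) pair for every word in one input line."""
    return [(word.lower(), 1) for word in line.split()]


def shuffle_phase(pairs):
    """Shuffle: group all emitted values by key, as the framework would."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_phase(key, values):
    """Reduce: collapse all values for one key into a single result."""
    return key, sum(values)


def word_count(lines):
    """Run map, shuffle, and reduce in sequence over the input lines."""
    mapped = chain.from_iterable(map_phase(line) for line in lines)
    grouped = shuffle_phase(mapped)
    return dict(reduce_phase(key, values) for key, values in grouped.items())


if __name__ == "__main__":
    sample = ["big data big deal", "data is big"]
    print(word_count(sample))
```

Because each call to `map_phase` looks at one line only and each call to `reduce_phase` looks at one key only, both stages can be spread over many commodity machines, which is what the scale-out model in the list above relies on.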

Hadoop is for Big Data.

Hadoop Data Analytics

Dataflow

Page 17: Big Data - What's the Big Deal

Distributed Storage (HDFS)

Query (Hive)

Distributed Processing (Map Reduce)

Scripting (Pig)

NoSQL Database (HBase)

Metadata (HCatalog)

Data Integration (ODBC / SQOOP / REST)

Business Intelligence (Excel, Power View…)

Machine Learning (Mahout)

Graph (Pegasus)

Stats processing (RHadoop)

Pipeline / workflow (Oozie)

Log file aggregation (Flume)

Active Directory (Security)

System Center

The Hadoop Ecosystem

Page 18: Big Data - What's the Big Deal

Welcome to the Zoo!

HDInsight

Apache™ Hadoop™ on Windows

Azure Blob Storage

LibHDFS

FTPS

Active Directory

Need to Know*

StreamInsight

JDBC Connector

Good to Know*

HCatalog, Oozie, Ambari

Hadoop: The Definitive Guide, 3rd Ed. - Tom White, O’Reilly Books

Page 19: Big Data - What's the Big Deal

Feed us back

• Support Team’s blog: http://blogs.msdn.com/b/bigdatasupport/
• Facebook Page: https://www.facebook.com/MicrosoftBigData
• Facebook Group: https://www.facebook.com/groups/bigdatalearnings/
• Twitter: @debarchans

Read more:
• http://en.wikipedia.org/wiki/Hadoop
• http://en.wikipedia.org/wiki/Big_data

Next Session:
• Apache Hadoop – A deep dive

Page 20: Big Data - What's the Big Deal

© 2012 Microsoft Corporation. All rights reserved. Microsoft, Windows, Windows Vista and other product names are or may be registered trademarks and/or trademarks in the U.S. and/or other countries. The information herein is for informational purposes only and represents the current view of Microsoft Corporation as of the date of this presentation. Because Microsoft must respond to changing market conditions, it should not be interpreted to be a commitment on the part of Microsoft, and Microsoft cannot guarantee the accuracy of any information provided after the date of this presentation. MICROSOFT MAKES NO WARRANTIES, EXPRESS, IMPLIED OR STATUTORY, AS TO THE INFORMATION IN THIS PRESENTATION.