Post on 17-Jul-2015


Page 1: How does Microsoft solve Big Data?

How does Microsoft solve Big Data?

James Serra, Big Data Evangelist

Microsoft

[email protected]

JamesSerra.com

Page 2: How does Microsoft solve Big Data?

Other Presentations

Building an Effective Data Warehouse Architecture: Reasons for building a DW and the various approaches and DW concepts (Kimball vs Inmon)

Building a Big Data Solution (Building an Effective Data Warehouse Architecture with Hadoop, the cloud and MPP): Explains what Big Data is, its benefits including use cases, and how Hadoop, the cloud, and MPP fit in

Finding business value in Big Data (What exactly is Big Data and why should I care?): Very similar to “Building a Big Data Solution” but target audience is business users/CxO instead of architects

How does Microsoft solve Big Data?: Covers the Microsoft products that can be used to create a Big Data solution

Modern Data Warehousing with the Microsoft Analytics Platform System: The next step in data warehouse performance is APS, an MPP appliance

Power BI, Azure ML, Azure HDInsight, Azure Data Factory, etc.: Deep dives into the various Microsoft Big Data related products

Page 3: How does Microsoft solve Big Data?

About Me

Business Intelligence Consultant, in IT for 30 years

Microsoft, Big Data Evangelist

Worked as desktop/web/database developer, DBA, BI and DW architect and developer, MDM architect, PDW/APS developer

Been perm, contractor, consultant, business owner

Presenter at PASS Business Analytics Conference and PASS Summit

MCSE: Data Platform and Business Intelligence

MS: Architecting Microsoft Azure Solutions

Blog at JamesSerra.com

Former SQL Server MVP

Author of book “Reporting with Microsoft SQL Server 2012”

Page 4: How does Microsoft solve Big Data?

I tried understanding all the Microsoft Big Data products…

And ended up passed-out drunk in a Denny’s parking lot

Let’s prevent that from happening…

Page 5: How does Microsoft solve Big Data?

Agenda

Collect + Manage

Transform + Analyze

Visualize + Decide

Access Methods

Product Groupings

Modern Data Warehouse

Sample architectures

Page 6: How does Microsoft solve Big Data?

Microsoft’s portfolio of products

• Windows

• Visual Studio

• .NET

• Azure, HDInsight

• Power BI: Power Query, Power Map, PowerPivot, Power View

• Azure ML

• APS

• SQL Server, Azure SQL DB

• SCOM

• SSAS, SSRS, SSIS

• Excel

• Report Builder

• PerformancePoint

• SharePoint

• DQS

• MDS

• Data Lake

• SQL DW

Microsoft has all the Lego pieces to build anything you want; the difficulty is determining how the pieces fit together

Page 7: How does Microsoft solve Big Data?

The Microsoft data platform

[Diagram labels. Stages: Collect + manage, Transform + analyze, Visualize + decide. Data: relational, non-relational, NoSQL, streaming, internal & external. Capabilities: orchestration, information management, complex event processing, modeling, machine learning. Experiences: mobile, reports, natural language query, dashboards, applications.]

Page 8: How does Microsoft solve Big Data?

Secure, reliable performance

Increase speed across all your data workloads

Capture any data: structured, unstructured, and streaming

Scale your platform quickly to meet changing demands

Collect and manage diverse data types with breakthrough speed

Collect + manage


Page 9: How does Microsoft solve Big Data?
Page 10: How does Microsoft solve Big Data?

SQL Server options

Azure SQL Database has a max database size of 500GB

Potential total volume size of up to 64 TB

Page 11: How does Microsoft solve Big Data?

Our customer challenges

1. Increasing data volumes

2. Real-time business requests

3. New data sources and types

4. Cloud-born data

[Diagram labels: Data sources, Non-Relational Data]

Page 12: How does Microsoft solve Big Data?

Parallelism

MPP - Massively Parallel Processing

• Uses many separate CPUs running in parallel to execute a single program

• Shared Nothing: Each CPU has its own memory and disk (scale-out)

• Segments communicate using high-speed network between nodes

SMP - Symmetric Multiprocessing

• Multiple CPUs used to complete individual processes simultaneously

• All CPUs share the same memory, disks, and network controllers (scale-up)

• All SQL Server implementations up until now have been SMP

• Mostly, the solution is housed on a shared SAN
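The shared-nothing idea can be sketched in a few lines of Python. This is only an illustration, not how APS or any Microsoft product is implemented: threads stand in for nodes, and the names `distribute`, `local_sum`, `mpp_sum`, and the sample `sales` rows are all hypothetical.

```python
from collections import defaultdict
from concurrent.futures import ThreadPoolExecutor


def distribute(rows, key, node_count):
    """Hash-distribute rows so each simulated node owns a disjoint slice."""
    nodes = [[] for _ in range(node_count)]
    for row in rows:
        nodes[hash(row[key]) % node_count].append(row)
    return nodes


def local_sum(rows):
    """Work done on one node, touching only that node's own rows."""
    totals = defaultdict(int)
    for row in rows:
        totals[row["region"]] += row["amount"]
    return dict(totals)


def mpp_sum(rows, node_count=4):
    """Run the same aggregation on every node, then merge the partial results."""
    nodes = distribute(rows, "customer_id", node_count)
    with ThreadPoolExecutor(max_workers=node_count) as pool:
        partials = list(pool.map(local_sum, nodes))
    combined = defaultdict(int)
    for partial in partials:
        for region, total in partial.items():
            combined[region] += total
    return dict(combined)


# Hypothetical sample data
sales = [
    {"customer_id": 1, "region": "East", "amount": 10},
    {"customer_id": 2, "region": "West", "amount": 5},
    {"customer_id": 3, "region": "East", "amount": 30},
    {"customer_id": 4, "region": "West", "amount": 20},
]
result = mpp_sum(sales)
print(result)
```

Each node only ever sees its own slice, so adding nodes adds capacity (scale-out); the only cross-node step is merging the small partial totals.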

Page 13: How does Microsoft solve Big Data?

Analytics Platform System (APS) for Big Data

Pre-Built Hardware + Software Appliance

• Co-engineered with HP, Dell, Quanta

• Scale-out, up to 100x performance increase

• Optional Hadoop region

• Appliance installed in 1-2 days

• Support - Microsoft provides first call support

• Hardware partner provides onsite break/fix support

Plug and Play, Built-in Best Practices, Save Time, On-Premises Solution

Page 14: How does Microsoft solve Big Data?

Office 365

Azure

Page 15: How does Microsoft solve Big Data?

Introducing Azure Data Lake Store

[Diagram labels: Analytics Service (U-SQL), HDInsight, YARN, Store (HDFS)]

No fixed limits on file size (petabyte-scale files)

Designed for diversity of analytic workloads

Accessible to all HDFS compliant analytic applications (Hortonworks, Cloudera, MapR)

Managed, monitored, and supported by Microsoft

Enterprise grade features around security, compliance & management

Page 16: How does Microsoft solve Big Data?

Hadoop in Azure (HDP under the covers)

HBase as a columnar NoSQL transactional database running on Azure Blobs

Storm as a streaming service for near real time processing

Hadoop 2.4 support for 100x query gains on Hive queries

Mahout support for machine learning + Hadoop

Graphical User Interface for Hive queries

[Diagram labels: Name Node, Job Tracker, four Data Nodes each with a Task Tracker; HMaster, Coordination, four Region Servers]

Page 17: How does Microsoft solve Big Data?

Microsoft Azure Data Lake

Announcing Azure Data Lake Analytics Service

[Diagram labels: Analytics Service (U-SQL), HDInsight, YARN, Store (HDFS)]

Distributed analytics service

Dynamically scales to meet your business needs

Productive day one with industry leading development tools (for novices & experts)

Analytics over all data (unstructured, semi-structured, structured)

U-SQL: simple and familiar, easily extensible

Hive coming soon

Built on open standards (YARN)

Page 18: How does Microsoft solve Big Data?

Data sources

Descriptive Analytics: What happened?

Diagnostic Analytics: Why did it happen?

Predictive Analytics: What will happen?

Prescriptive Analytics: How can we make it happen?

Page 19: How does Microsoft solve Big Data?

Azure Stream Analytics: Process real-time data in Azure

Consumes millions of real-time events from Event Hub collected from devices, sensors, infrastructure, and applications

Performs time-sensitive analysis using a SQL-like language against multiple real-time streams and reference data

Outputs to persistent stores, dashboards or back to devices
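One common kind of time-sensitive analysis is a windowed aggregate. The sketch below shows the idea of a tumbling (fixed, non-overlapping) window in plain Python; it is not the Stream Analytics query language, it works on a finite list rather than an unbounded stream, and the function name, event layout, and device names are assumptions made for illustration.

```python
from collections import defaultdict


def tumbling_window_avg(events, window_seconds):
    """Average values per device over fixed, non-overlapping time windows.

    events: iterable of (timestamp_seconds, device_id, value) tuples.
    Returns {(window_start, device_id): average}.
    """
    sums = defaultdict(lambda: [0.0, 0])  # (window, device) -> [total, count]
    for ts, device, value in events:
        window_start = (ts // window_seconds) * window_seconds
        bucket = sums[(window_start, device)]
        bucket[0] += value
        bucket[1] += 1
    return {key: total / count for key, (total, count) in sorted(sums.items())}


# Hypothetical readings from two devices
events = [
    (0, "pos-1", 20.0),
    (4, "pos-1", 22.0),
    (7, "kiosk-2", 5.0),
    (11, "pos-1", 30.0),
]
averages = tumbling_window_avg(events, window_seconds=10)
print(averages)
```

Every event lands in exactly one window, which is what distinguishes a tumbling window from sliding or overlapping ones.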

[Diagram labels, example event sources: point of service devices, self-checkout stations, kiosks, smart phones, slates/tablets, PCs/laptops, servers, digital signs, diagnostic equipment, remote medical monitors, logic controllers, specialized devices, thin clients, handhelds, security, POS terminals, automation devices, vending machines, Kinect, ATM]

Page 20: How does Microsoft solve Big Data?

Microsoft R Open

• Free and open source R distribution

• Enhanced and distributed by Revolution Analytics

Microsoft R Server

• Secure, Scalable and Supported Distribution of R

• With proprietary components created by Revolution Analytics

Page 21: How does Microsoft solve Big Data?

Azure DocumentDB

Fully managed database service built on a native JSON data model

Application-controlled schema with massive scale-out enables iterative development and evolving data models

Automatic indexing enables robust querying over schema-free data

Integrated transactional JavaScript processing + tunable consistency enable high performance application experiences

Page 22: How does Microsoft solve Big Data?

SQL Server on Linux (Preview today, GA in mid-2017)

Red Hat - Microsoft Partnership (Nov 2015)

Microsoft joins Eclipse Foundation (Mar 2016)

HDInsight PaaS on Linux GA (Sep 2015)

[Graphic labels: C:\Users\markhill>, root@localhost: #, bash]

Azure Marketplace: 60% of all images in Azure Marketplace are based on Linux/OSS

In partnership with the Linux Foundation, Microsoft releases the Microsoft Certified Solutions Associate (MCSA) Linux on Azure certification.

493,141,677 ?????? Microsoft Open Source Hub

Ross Gardler: President Apache Software Foundation

Wim Coekaerts: Oracle’s Mr Linux

1 out of 4 VMs on Azure runs Linux, and getting larger every day

• 28.9% of All VMs are Linux

• >50% of new VMs

Page 23: How does Microsoft solve Big Data?

Connect, combine, and refine any data

Create data marts and publish reports

Build and test predictive models

Curate and catalog any data

Transform + analyze


Transform and analyze data for anyone to access anywhere

Page 24: How does Microsoft solve Big Data?
Page 25: How does Microsoft solve Big Data?

Make sense of disparate data and prepare it for analysis

Connect, combine, and refine any data

Integration, Data Quality and Master Data Services

• Rich support for ETL tasks

• Data cleansing and matching

• Manage master data structures

Connect any data and all volumes in real time

• Social data

• SAP and Dynamics data

• Machine data

Page 26: How does Microsoft solve Big Data?

Query aggregated data and build reports

Create data marts and reports

Reporting services

• Create and publish interactive reports

• Consolidate reporting management

• Enable reporting capabilities for anyone

Analysis services

• Single semantic model

• 100x faster analysis with in-memory columnstore

• Manage user-created BI content

Page 27: How does Microsoft solve Big Data?

Use the power of machine learning to predict future trends or behavior

Build and test predictive models

Publish API in minutes

[Diagram labels. Stages: business problem, modeling, deployment, business value. Cloud data: HDInsight, SQL Server VM, SQL DB, blobs and tables. Local data: desktop files, Excel spreadsheet, other data files on PC. Components: storage space, web, Microsoft Azure portal, workspace, ML Studio, Microsoft Azure Machine Learning API. Consumers: devices, applications, dashboards.]

Page 28: How does Microsoft solve Big Data?

Azure Machine Learning

Get started with just a browser: Requires no provisioning; simply log on to your Azure subscription or try it for free at azure.com/ml

Experience the power of choice: Choose from hundreds of algorithms and packages from R and Python or drop in your own custom code

Take advantage of business-tested algorithms from Xbox and Bing

Deploy solutions in minutes: With the click of a button, deploy the finished model as a web service that can connect to any data, anywhere

Connect to the world: Brand and monetize solutions on our global Machine Learning Marketplace https://datamarket.azure.com/

Beyond business intelligence – machine intelligence

Microsoft Azure Machine Learning Studio: Modeling environment (shown)

Microsoft Azure Machine Learning API service: Model in production as a web service

Microsoft Azure Machine Learning Marketplace: APIs and solutions for broad use

Page 29: How does Microsoft solve Big Data?

Enable enterprise-wide self-service data source registration and discovery

A metadata repository that allows users to register, enrich, understand, discover, and consume data sources

Delivers differentiated value through

‒ Data source discovery, rather than data discovery

‒ Support for data from any source: structured and unstructured, on premises and in the cloud

‒ Publishing, discovery and consumption through any tool

‒ Annotation crowdsourcing: empowering any user to capture and share their knowledge

All this while allowing IT to maintain control and oversight

Page 30: How does Microsoft solve Big Data?

Azure Data Factory

Orchestrate trusted information production in Azure

Connect to relational or non-relational data that is on-premises or in the cloud

Orchestrate data movement & data processing

Publish to Power BI users as a searchable data view

Operationalize (schedule, manage, debug) workflows

Lifecycle management, monitoring

[Diagram labels: C#, MapReduce, Hive, Pig, Stored Procedures, Azure Machine Learning]
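Orchestration here means running data movement and processing steps in dependency order. The following is a rough sketch of that idea using only the Python standard library; it does not use or imitate the Data Factory API, and the step names (`copy_raw_data`, `hive_transform`, `load_sql`, `score_model`) are hypothetical.

```python
from graphlib import TopologicalSorter  # standard library, Python 3.9+


def make_step(name):
    """Build a placeholder step that records that it ran."""
    def step(log):
        log.append(f"ran {name}")
    return step


def run_pipeline(steps, dependencies):
    """Run every step after the steps it depends on.

    dependencies maps a step name to the set of steps that must finish first.
    """
    order = list(TopologicalSorter(dependencies).static_order())
    log = []
    for name in order:
        steps[name](log)
    return order, log


# Hypothetical four-stage pipeline: copy, transform, load, score
names = ["copy_raw_data", "hive_transform", "load_sql", "score_model"]
steps = {name: make_step(name) for name in names}
dependencies = {
    "hive_transform": {"copy_raw_data"},
    "load_sql": {"hive_transform"},
    "score_model": {"load_sql"},
}
order, log = run_pipeline(steps, dependencies)
print(order)
```

A real orchestrator adds scheduling, retries, and monitoring on top of this ordering, which is what the "operationalize" and "lifecycle management" bullets refer to.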

Page 31: How does Microsoft solve Big Data?

Discover, explore, and combine any data type or size, regardless of location

Ask questions of data to visualize, analyze, and forecast

Make faster decisions, share broadly, and access insights on any device

Visualize + decide


Visualize data and make decisions quickly using everyday tools

Page 32: How does Microsoft solve Big Data?
Page 33: How does Microsoft solve Big Data?

Microsoft Power BI

Discover & Combine in Excel

Analyze & Visualize in Excel

Collaborate, Get Insights, & Access Anywhere Through Office 365

Page 34: How does Microsoft solve Big Data?

Power BI Tools Defined

• Front-end (Excel)

• Data shaping and cleanup. Self-service ETL (Power Query)

• Data analysis (Power Pivot)

• Visualization and data discovery (Power View, Power Map, Power BI Designer)

• Dashboarding (Power BI Dashboard)

• Publishing and sharing (Power BI sites)

• Natural language query (Power BI Q&A)

• Mobile (Power BI for Mobile)

• Access on-premises data (DMG, Analysis Services Connector)

[Diagram labels: Power Query, Power Pivot, Power View, Power Map, Power BI Designer, Power BI Dashboard, Power BI Site, Power BI Q&A, Power BI for mobile]

Page 35: How does Microsoft solve Big Data?

Power Query: Discover, explore, and combine any data

Right from Excel, find any data: corporate, social, machine, Hadoop, open

Easily merge, transform, and clean up data

Page 36: How does Microsoft solve Big Data?
Page 37: How does Microsoft solve Big Data?
Page 38: How does Microsoft solve Big Data?

Power BI dashboards and KPIs for monitoring the health of your business

New data visualizations and touch-optimized exploration in HTML5

Power BI mobile apps across devices including iPad and iPhone

Support for new data sources including SalesForce.com, Dynamics CRM online and SQL Server Analysis Services

[Screenshot labels: Dashboard, Tree Map]

Page 39: How does Microsoft solve Big Data?

Q&A: Ask questions of data

Build ad hoc reports with a drag-and-drop interface

Look ahead to forecast where business will go

Map up to 1 million rows of data in 3-D

Page 40: How does Microsoft solve Big Data?

Data Management Gateway (DMG)

• Power View/Q&A: DMG refreshes the workbook, so reporting is not real-time (daily frequency), and there is a 250MB upload limit

• Power Query: Reporting is real-time

Analysis Services Connector (ASC)

• Power BI Dashboard: Get real-time reports with ASC and SSAS Tabular DirectQuery against SQL Server or APS. Create reports with Power View (limited functionality)

• You can publish Power View reports to Power BI Sites and have them use ASC (by uploading an Excel file via Get Data in Power BI Dashboard)

• Does not support Q&A

• Can run on any domain machine

• Multidimensional cubes coming soon

[Diagram labels: Intranet, Public Internet, O365, Power BI Site, Power View/Q&A, Power BI Dashboard, Metadata catalog, DMG, ASC, Power Pivot workbook, SSAS Tabular, SQL Server, PDW, APS, HDI, 3rd-Party Hadoop]

Page 41: How does Microsoft solve Big Data?

PolyBase: Query relational and non-relational data with T-SQL

Page 42: How does Microsoft solve Big Data?

Use cases where PolyBase simplifies using Hadoop data

Bringing islands of Hadoop data together

High performance queries against Hadoop data (predicate pushdown)

Archiving data warehouse data to Hadoop (move): Hadoop as cold storage

Exporting relational data to Hadoop (copy): Hadoop as backup/DR, analysis, cloud use

Importing Hadoop data into data warehouse (copy): Hadoop as staging area/data lake
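Predicate pushdown means the filter runs where the data lives, so only matching rows cross the network. The toy Python sketch below illustrates why that helps; it says nothing about how PolyBase implements it, and `scan_remote`, the two query functions, and the sample rows are all hypothetical.

```python
def scan_remote(rows, predicate=None):
    """Simulate reading from an external store; returns the rows transferred.

    If a predicate is supplied, the "remote" side filters before sending.
    """
    if predicate is None:
        return list(rows)
    return [row for row in rows if predicate(row)]


def query_without_pushdown(rows, predicate):
    """Pull everything across, then filter locally."""
    transferred = scan_remote(rows)
    return [row for row in transferred if predicate(row)], len(transferred)


def query_with_pushdown(rows, predicate):
    """Send the filter to the source so only matching rows are transferred."""
    transferred = scan_remote(rows, predicate)
    return transferred, len(transferred)


# Hypothetical external table: one row per year
hadoop_rows = [{"year": year, "sales": year % 7} for year in range(2000, 2016)]


def is_recent(row):
    return row["year"] >= 2014


slow_result, slow_moved = query_without_pushdown(hadoop_rows, is_recent)
fast_result, fast_moved = query_with_pushdown(hadoop_rows, is_recent)
print(slow_moved, fast_moved)
```

Both queries return the same rows; the difference is how many rows had to move to produce them.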

Page 43: How does Microsoft solve Big Data?

Consumption Experiences

Data Visualization

Data Analysis

Data Modeling

Data Discovery & ETL

Data Warehouse/Big Data

Microsoft Analytics Platform

Page 44: How does Microsoft solve Big Data?

Cortana Intelligence Suite: Transform data into intelligent action

Data Sources: Apps, Sensors and devices, Data

Information Management: Event Hubs, Data Catalog, Data Factory

Big Data Stores: SQL Data Warehouse, Data Lake Store

Machine Learning and Analytics: HDInsight (Hadoop and Spark), Stream Analytics, Data Lake Analytics, Machine Learning

Intelligence: Cortana, Bot Framework, Cognitive Services

Dashboards & Visualizations: Power BI

Action: People, Automated Systems, Apps, Web, Mobile, Bots

Page 45: How does Microsoft solve Big Data?

Introducing Microsoft Azure IoT Suite: Helping accelerate your business transformation

Enables you to: Connect assets, Mine data, Take action

Addresses common scenarios: Remote Monitoring, Predictive Maintenance, Asset Management, and more…

[Diagram labels: Azure IoT Suite, Preconfigured solutions, Azure IoT services]

Benefits

Accelerate time-to-value by easily deploying IoT applications for the most common use cases, such as remote monitoring, asset management, and predictive maintenance

Plan and budget appropriately through a simple predictable business model

Grow and extend solutions to support millions of assets

Page 46: How does Microsoft solve Big Data?

Example overall data flow and Architecture

Event & data producers: Web logs; IoT, Mobile Devices etc.; Social Data

Ingest and Transform: Event Hubs, Stream Analytics, HDInsight, Azure Data Factory

DW / Long-term storage: Azure SQL DB, Azure Blob Storage, Analytics Platform System

Predictive analytics: Azure Machine Learning (Fraud detection etc.)

Present & decide: Power BI, Web dashboards, Mobile devices

Page 47: How does Microsoft solve Big Data?

BI and analytics

Data management and processing

Data sources Non-relational data

Data enrichment and federated query

OLTP ERP CRM LOB Devices Web Sensors Social

Self-service Corporate Collaboration Mobile Machine learning

Single query model Extract, transform, load Data quality Master data management

Box software Appliances Cloud

SQL Server

Box software Appliances Cloud

Page 48: How does Microsoft solve Big Data?
Page 49: How does Microsoft solve Big Data?
Page 50: How does Microsoft solve Big Data?

Industrial automation company partnering with multinational oil company

Oil and Gas

Leading industrial automation company that employs over 20,000 people, partnering with a leading multinational oil and gas company (one of the six oil and gas supermajors) that employs over 90,000 people.

Part 1: What They Did | IoT internet-connected sensors to generate analytics for proactive maintenance

Challenge

Manage sites used for dispensing liquefied natural gas (clean fuel for commercial customers who do heavy-duty road transportation)

Built LNG refueling stations along US interstate highways

Stations are unmanned so they built 24x7 remote management and monitoring to track diagnostics of each station for maintenance or tuning

Built internet-connected sensors embedded in 350 dispenser sites worldwide generating tens of thousands of data points per second

• Temperature, pressure, vibration, etc.

Data needs outgrew company’s internal datacenter and data warehouse

Solution

Chose Azure HDInsight, Data Factory, SQL Database, Machine Learning

Dashboards used to detect anomalies for proactive maintenance

• Changes in performance of the components

• Energy consumption of components

• Component downtime and reliability

Future: Goal is to expand program to hundreds of thousands of dispensers

IoT, Analytics

Page 51: How does Microsoft solve Big Data?


Industrial automation company partnering with multinational oil company

Part 2: How They Did It | IoT internet-connected sensors to generate analytics for proactive maintenance

How They Did It

Collect data from internet-connected sensors

• Tens of thousands of data points per second

• Interpolate time-series prior to analysis

• Stored raw sensor data in Blobs every 5 minutes

Use Hadoop to execute scripts and Data Factory to orchestrate

• Hive and Pig scripts orchestrated by Data Factory

• Data resulting from scripts loaded in SQL Database

• Queries detect site anomalies to indicate maintenance/tuning

Produced dashboards with role-based reporting

• Azure Machine Learning, SSRS, Power BI for O365

• Provide users with customizable interface

• View current and historical data (day-to-day operations, asset performance over time, etc.)

• Leveraged Azure Mobile Notification Hub for real-time notifications, alarms, or important events

Use Azure ML to predict

• Understand which pumps, run at what speeds, maximized water supply while minimizing energy use
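The slide does not say how the time series were interpolated. As one plausible reading, the sketch below linearly interpolates irregular sensor samples onto a regular grid in plain Python; the function name, the sample values, and the choice of linear interpolation are all assumptions, and timestamps are assumed to be distinct.

```python
def interpolate(samples, step):
    """Linearly interpolate irregular (time, value) samples onto a regular grid.

    samples: list of (time, value) pairs with distinct times.
    step: spacing of the output grid, in the same time unit.
    """
    samples = sorted(samples)
    if len(samples) < 2:
        return list(samples)
    result = []
    i = 0
    t = samples[0][0]
    end = samples[-1][0]
    while t <= end:
        # Advance to the pair of samples that brackets time t.
        while samples[i + 1][0] < t:
            i += 1
        (t0, v0), (t1, v1) = samples[i], samples[i + 1]
        result.append((t, v0 + (v1 - v0) * (t - t0) / (t1 - t0)))
        t += step
    return result


# Hypothetical pressure readings at irregular times (seconds)
readings = [(0, 10.0), (4, 18.0), (10, 30.0)]
grid = interpolate(readings, step=2)
print(grid)
```

Putting every sensor on the same grid makes readings from different stations directly comparable before the anomaly queries run.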

IoT, Analytics

Page 52: How does Microsoft solve Big Data?

Software Company For Web Analytics

Technology

A software company for web analytics, live chatting, targeting and business intelligence in e-business.

Part 1: What They Did | Web Analytics – Traffic, trends, visitor actions + Recommendation Engine

Challenge

They build an e-business service that does site analysis, real-time monitoring of site metrics, an interactive support chat, and a dynamic content builder

Needed to find the right set of products that can help them achieve this

Solution

Chose Azure HDInsight, SQL Server (with Analysis Services)

Use HDInsight to preprocess and store raw data

Use Analysis Services, which generates views from HDInsight

Gives their customers self-service BI on top of these views

Web Analytics Recommendation Engine

Page 53: How does Microsoft solve Big Data?


Software Company For Web Analytics

Part 2: How They Did It | Web Analytics – Traffic, trends, visitor actions + Recommendation Engine

How They Did It

Store data in Azure Blobs

• Track visitor data via JavaScript code

• Used for real-time tracking and statistics

• HDInsight used to pre-process and store raw data

Customers of this company have self-service BI

• Drag and drop UI

• Leveraging Analysis Services, results can be represented as tables, charts, etc.

• Analysis Services uses data from HDInsight as source

• Uses Hive ODBC driver to connect to Hive tables

Web Analytics Recommendation Engine

Page 54: How does Microsoft solve Big Data?

Game Development Company

Gaming

A predominantly mobile-based game development company. While they are a mid-sized organization, they have partnered with media giants on various gaming projects

Part 1: What They Did | In-game Analytics

Challenge

As a game development studio, they wanted to do in-game analytics to understand their players more and what they do in the games

Solution

Chose Azure HDInsight (MapReduce and Storm), Service Bus and also use SQL Server for reporting

Switched from Amazon AWS EMR

Collects telemetry and logging data to gain in-game analytics:

• How many players are using the game

• How many players invited their friends

• How far along players got in the tutorial

• How many attempts they made on one level/stage
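The four metrics above can be computed from JSON telemetry with a single pass over the events. The sketch below is only illustrative: the event schema (`player`, `type`, `level`, `step`) and the event type names are assumptions, since the slides do not describe the actual telemetry format.

```python
import json
from collections import Counter


def summarize(lines):
    """Compute the four in-game metrics from JSON-encoded event lines."""
    players = set()
    inviters = set()
    attempts = Counter()      # (player, level) -> number of attempts
    tutorial_progress = {}    # player -> furthest tutorial step reached
    for line in lines:
        event = json.loads(line)
        player = event["player"]
        players.add(player)
        kind = event["type"]
        if kind == "invite_sent":
            inviters.add(player)
        elif kind == "level_attempt":
            attempts[(player, event["level"])] += 1
        elif kind == "tutorial_step":
            furthest = tutorial_progress.get(player, 0)
            tutorial_progress[player] = max(furthest, event["step"])
    return {
        "players": len(players),
        "players_who_invited": len(inviters),
        "tutorial_progress": tutorial_progress,
        "attempts": dict(attempts),
    }


# Hypothetical telemetry, one JSON document per line
telemetry = [
    '{"player": "a", "type": "tutorial_step", "step": 1}',
    '{"player": "a", "type": "tutorial_step", "step": 3}',
    '{"player": "a", "type": "level_attempt", "level": 1}',
    '{"player": "a", "type": "level_attempt", "level": 1}',
    '{"player": "b", "type": "invite_sent"}',
    '{"player": "b", "type": "tutorial_step", "step": 2}',
]
summary = summarize(telemetry)
print(summary)
```

At tens of gigabytes per day the same per-event logic would run as a distributed job rather than a single loop, but the aggregation itself stays this simple.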

In-game Analytics

Page 55: How does Microsoft solve Big Data?


Game Development Company

Part 2: How They Did It | In-game Analytics

How They Did It

Collect data from games in Azure Blobs

• Game sends telemetry/logging data as JSON files

• Contains every action of user in the game

• Data is pushed to Azure Service Bus in real time

• Tens of gigabytes of data captured daily

HDInsight picks up real-time data and processes

• From Service Bus, HDInsight processes using Apache Storm and MapReduce

• Constantly running experiments to determine insight

• A/B testing

• In-game metrics and analytics

• Spin up 32-node cluster nightly for four hours

Output sent to SQL Server for BI

• Transfer data to SQL Server for BI

In-game Analytics

[Diagram labels: Service Bus, SQL Server (on-premises)]

Page 56: How does Microsoft solve Big Data?

Big Data is coming

Page 57: How does Microsoft solve Big Data?

Summary

Understand the

benefits of big data

Page 58: How does Microsoft solve Big Data?

Resources

The Modern Data Warehouse: http://bit.ly/1xuX4Py

Fast Track Data Warehouse Reference Architecture for SQL Server 2014: http://bit.ly/1xuX9m6

Should you move your data to the cloud? http://bit.ly/1xuXbKU

Presentation slides for Modern Data Warehousing: http://bit.ly/1xuXcP5

Presentation slides for Building an Effective Data Warehouse Architecture: http://bit.ly/1xuXeX4

Hadoop and Data Warehouses: http://bit.ly/1xuXfu9

What is the Microsoft Analytics Platform System (APS)? http://bit.ly/1xuXipO

Parallel Data Warehouse (PDW) benefits made simple: http://bit.ly/1xuXlSy

What is Advanced Analytics?

Page 59: How does Microsoft solve Big Data?

Q & A ?

James Serra, Big Data Evangelist

Email me at: [email protected]

Follow me at: @JamesSerra

Link to me at: www.linkedin.com/in/JamesSerra

Visit my blog at: JamesSerra.com (where this slide deck will be posted)