Posted on 17-Jul-2015
How does Microsoft solve Big Data?
James Serra, Big Data Evangelist
Microsoft
JamesSerra3@gmail.com
JamesSerra.com
Other Presentations
• Building an Effective Data Warehouse Architecture: Reasons for building a DW and the various approaches and DW concepts (Kimball vs Inmon)
• Building a Big Data Solution (Building an Effective Data Warehouse Architecture with Hadoop, the cloud and MPP): Explains what Big Data is, its benefits including use cases, and how Hadoop, the cloud, and MPP fit in
• Finding business value in Big Data (What exactly is Big Data and why should I care?): Very similar to “Building a Big Data Solution” but the target audience is business users/CxOs instead of architects
• How does Microsoft solve Big Data?: Covers the Microsoft products that can be used to create a Big Data solution
• Modern Data Warehousing with the Microsoft Analytics Platform System: The next step in data warehouse performance is APS, an MPP appliance
• Power BI, Azure ML, Azure HDInsight, Azure Data Factory, etc.: Deep dives into the various Microsoft Big Data related products
About Me
Business Intelligence Consultant, in IT for 30 years
Microsoft, Big Data Evangelist
Worked as desktop/web/database developer, DBA, BI and DW architect and developer, MDM architect, PDW/APS developer
Have been a permanent employee, contractor, consultant, and business owner
Presenter at PASS Business Analytics Conference and PASS Summit
MCSE: Data Platform and Business Intelligence
MS: Architecting Microsoft Azure Solutions
Blog at JamesSerra.com
Former SQL Server MVP
Author of book “Reporting with Microsoft SQL Server 2012”
I tried understanding all the Microsoft Big Data products…
And ended up passed-out drunk in a Denny’s parking lot
Let’s prevent that from happening…
Agenda
Collect + Manage
Transform + Analyze
Visualize + Decide
Access Methods
Product Groupings
Modern Data Warehouse
Sample architectures
Microsoft’s portfolio of products
• Windows
• Visual Studio
• .NET
• Azure, HDInsight
• Power BI: Power Query, Power Map, PowerPivot, Power View
• Azure ML
• APS
• SQL Server, Azure SQL DB
• SCOM
• SSAS, SSRS, SSIS
• Excel
• Report Builder
• PerformancePoint
• SharePoint
• DQS
• MDS
• Data Lake
• SQL DW
Microsoft has all the Legos to build anything you want, but the difficulty is determining how the pieces fit together
The Microsoft data platform
• Visualize + decide: Mobile, Reports, Natural language query, Dashboards, Applications
• Transform + analyze: Orchestration, Machine learning, Modeling, Information management, Complex event processing
• Collect + manage (Data): Relational, Non-relational, NoSQL, Streaming, Internal & external
Collect + manage
Collect and manage diverse data types with breakthrough speed
• Secure, reliable performance
• Increase speed across all your data workloads
• Capture any data: structured, unstructured, and streaming
• Scale your platform quickly to meet changing demands
SQL Server options
Azure SQL Database has a max database size of 500 GB
Potential total volume size of up to 64 TB
Cloud-born data
Data sources
Our customer challenges
1. Increasing data volumes
2. Real-time business requests
3. New data sources and types
Non-Relational Data
Parallelism
MPP - Massively Parallel Processing
• Uses many separate CPUs running in parallel to execute a single program
• Shared Nothing: Each CPU has its own memory and disk (scale-out)
• Segments communicate using a high-speed network between nodes
SMP - Symmetric Multiprocessing
• Multiple CPUs used to complete individual processes simultaneously
• All CPUs share the same memory, disks, and network controllers (scale-up)
• All SQL Server implementations up until now have been SMP
• Mostly, the solution is housed on a shared SAN
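To make the shared-nothing idea concrete, here is a minimal Python sketch that simulates MPP-style aggregation inside one process. It assumes rows are hash-distributed on a column (similar in spirit to a hash-distributed table on APS/PDW), that each simulated node reads only its own partition, and that a coordinator merges the partial results. The function names are hypothetical, threads merely stand in for nodes (this shows the data flow, not a real speedup), and this is not how APS is actually implemented.

```python
from collections import defaultdict
from concurrent.futures import ThreadPoolExecutor
import zlib


def distribute(rows, num_nodes, dist_col):
    """Hash-distribute rows so each simulated node owns a disjoint partition."""
    partitions = [[] for _ in range(num_nodes)]
    for row in rows:
        # crc32 is stable across runs, unlike Python's salted hash() for strings
        node = zlib.crc32(str(row[dist_col]).encode("utf-8")) % num_nodes
        partitions[node].append(row)
    return partitions


def local_sum(partition, group_col, value_col):
    """Work done on one node, reading only that node's own rows (shared nothing)."""
    totals = defaultdict(float)
    for row in partition:
        totals[row[group_col]] += row[value_col]
    return totals


def mpp_sum(rows, num_nodes, group_col, value_col, dist_col):
    partitions = distribute(rows, num_nodes, dist_col)
    # Each "node" aggregates its own partition concurrently
    with ThreadPoolExecutor(max_workers=num_nodes) as pool:
        partials = list(pool.map(lambda p: local_sum(p, group_col, value_col), partitions))
    # The control node merges the small per-group partial results
    result = defaultdict(float)
    for partial in partials:
        for key, value in partial.items():
            result[key] += value
    return dict(result)
```

Adding nodes shrinks each partition, so the scan work scales out, while the coordinator only merges one partial total per group. A distribution column that sends most rows to a single node would erase that benefit.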
Analytics Platform System (APS) for Big Data
Pre-Built Hardware + Software Appliance
• Co-engineered with HP, Dell, Quanta
• Scale-out, up to 100x performance increase
• Optional Hadoop region
• Appliance installed in 1-2 days
• Support - Microsoft provides first call support
• Hardware partner provides onsite break/fix support
• Plug and Play
• Built-in Best Practices
• Save Time
• On-Premises Solution
Office 365
Azure
[Diagram: Azure Data Lake, with the Analytics Service (HDInsight, U-SQL) running on YARN over the Store (HDFS)]
Introducing Azure Data Lake Store
No fixed limits on file size (petabyte-sized files)
Designed for a diversity of analytic workloads
Accessible to all HDFS-compliant analytic applications (Hortonworks, Cloudera, MapR)
Managed, monitored, and supported by Microsoft
Enterprise-grade features around security, compliance & management
Hadoop in Azure (HDP under the covers)
[Diagram: a Name Node and Job Tracker coordinating Data Node/Task Tracker worker nodes, plus an HBase HMaster coordinating Region Servers]
• HBase as a columnar NoSQL transactional database running on Azure Blobs
• Storm as a streaming service for near-real-time processing
• Hadoop 2.4 support for 100x query gains on Hive queries
• Mahout support for machine learning + Hadoop
• Graphical user interface for Hive queries
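HDInsight runs Hadoop's MapReduce model under the covers. The sketch below simulates its three phases (map, shuffle/group by key, reduce) in plain Python for the classic word-count job. It runs in a single process and only illustrates the programming model; it says nothing about how HDInsight schedules tasks across Task Trackers.

```python
from collections import defaultdict
from itertools import chain


def map_phase(line):
    """Mapper: emit (word, 1) for every word in one input line."""
    for word in line.lower().split():
        yield word, 1


def shuffle(pairs):
    """Shuffle/sort: group all values by key, as Hadoop does between map and reduce."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key, values):
    """Reducer: combine all values seen for one key."""
    return key, sum(values)


def word_count(lines):
    mapped = chain.from_iterable(map_phase(line) for line in lines)
    grouped = shuffle(mapped)
    return dict(reduce_phase(key, values) for key, values in sorted(grouped.items()))
```

For example, `word_count(["big data big", "Data lake"])` returns `{"big": 2, "data": 2, "lake": 1}`.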
Announcing Azure Data Lake Analytics Service
Distributed analytics service
Dynamically scales to meet your business needs
Productive from day one with industry-leading development tools (for novices & experts)
Analytics over all data (unstructured, semi-structured, structured)
U-SQL: simple and familiar, easily extensible
Hive coming soon
Built on open standards (YARN)
Data sources
• Descriptive Analytics: What happened?
• Diagnostic Analytics: Why did it happen?
• Predictive Analytics: What will happen?
• Prescriptive Analytics: How can we make it happen?
Azure Stream Analytics: Process real-time data in Azure
Consumes millions of real-time events from Event Hub collected from devices, sensors, infrastructure, and applications
Performs time-sensitive analysis using a SQL-like language against multiple real-time streams and reference data
Outputs to persistent stores, dashboards, or back to devices
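Time-sensitive stream analysis typically aggregates events over time windows. As an illustration only, the sketch below computes a per-device average over fixed, non-overlapping (tumbling) windows in Python and labels each window by its end time. The event tuple layout and device names are assumptions; a real Stream Analytics job would express this in its SQL-like query language rather than in Python.

```python
from collections import defaultdict


def tumbling_window_avg(events, window_seconds):
    """Average each device's values over fixed, non-overlapping windows.

    events: iterable of (timestamp_seconds, device_id, value) tuples.
    Returns {(window_end, device_id): average}.
    """
    sums = defaultdict(float)
    counts = defaultdict(int)
    for ts, device, value in events:
        # Each event falls into exactly one window [end - window_seconds, end)
        window_end = (ts // window_seconds + 1) * window_seconds
        sums[(window_end, device)] += value
        counts[(window_end, device)] += 1
    return {key: sums[key] / counts[key] for key in sums}
```

With 10-second windows, readings at t=1 and t=4 from the same pump land in the window ending at 10, while a reading at t=12 starts the window ending at 20.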
Point of Service Devices, Self Checkout Stations, Kiosks, Smart Phones, Slates/Tablets, PCs/Laptops, Servers, Digital Signs, Diagnostic Equipment, Remote Medical Monitors, Logic Controllers, Specialized Devices, Thin Clients, Handhelds, Security, POS Terminals, Automation Devices, Vending Machines, Kinect, ATM
Microsoft R Open
• Free and open source R distribution
• Enhanced and distributed by Revolution Analytics
Microsoft R Server
• Secure, Scalable and Supported Distribution of R
• With proprietary components created by Revolution Analytics
Azure DocumentDB
Fully managed database service built on a native JSON data model
Application-controlled schema with massive scale-out enables iterative development and evolving data models
Automatic indexing enables robust querying over schema-free data
Integrated transactional JavaScript processing + tunable consistency enable high-performance application experiences
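The idea behind automatic indexing can be illustrated with a simplified inverted index that records every JSON path and value on write, so any property can be queried without a schema or an index definition. This is a hypothetical Python sketch (the class and method names are made up); it is not DocumentDB's actual indexing implementation or API.

```python
import json
from collections import defaultdict


def flatten(doc, prefix=""):
    """Yield (path, value) for every leaf, e.g. ("address/city", "Seattle")."""
    if isinstance(doc, dict):
        for key, value in doc.items():
            yield from flatten(value, f"{prefix}{key}/")
    elif isinstance(doc, list):
        # Array items are indexed under the array's own path
        for item in doc:
            yield from flatten(item, prefix)
    else:
        yield prefix.rstrip("/"), doc


class AutoIndexedCollection:
    def __init__(self):
        self.docs = {}
        self.index = defaultdict(set)  # (path, encoded value) -> document ids

    def insert(self, doc_id, json_text):
        doc = json.loads(json_text)
        self.docs[doc_id] = doc
        # Every path is indexed on write; no schema or index definition needed.
        # json.dumps keeps 5 and "5" distinct in the index key.
        for path, value in flatten(doc):
            self.index[(path, json.dumps(value))].add(doc_id)

    def find(self, path, value):
        return sorted(self.index.get((path, json.dumps(value)), set()))
```

Documents with different shapes can live side by side: one might have `tags` and another a `rating`, and both properties become queryable as soon as the documents are inserted.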
SQL Server on Linux (Preview today, GA in mid-2017)
Red Hat - Microsoft Partnership (Nov 2015)
Microsoft joins Eclipse Foundation (Mar 2016)
HDInsight PaaS on Linux GA (Sep 2015)
Azure Marketplace: 60% of all images in Azure Marketplace are based on Linux/OSS
In partnership with the Linux Foundation, Microsoft releases the Microsoft Certified Solutions Associate (MCSA) Linux on Azure certification.
493,141,677
Microsoft Open Source Hub
Ross Gardler: President, Apache Software Foundation
Wim Coekaerts: Oracle’s Mr. Linux
1 out of 4 VMs on Azure runs Linux, and the share is getting larger every day
• 28.9% of all VMs are Linux
• >50% of new VMs are Linux
Transform + analyze
• Connect, combine, and refine any data
• Create data marts and publish reports
• Build and test predictive models
• Curate and catalog any data
Transform and analyze data for anyone to access anywhere
Make sense of disparate data and prepare it for analysis
Connect, combine, and refine any data
Integration, Data Quality and Master Data Services
• Rich support for ETL tasks
• Data cleansing and matching
• Manage master data structures
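Data cleansing and matching can be sketched with Python's standard `difflib`: normalize names, then flag pairs whose similarity exceeds a threshold as likely duplicates. The 0.85 threshold, the normalization rules, and the sample company names are assumptions for illustration. Data Quality Services uses its own knowledge bases and matching policies rather than this code.

```python
from difflib import SequenceMatcher


def normalize(name):
    """Cleansing step: lowercase, drop punctuation, collapse whitespace."""
    cleaned = "".join(ch for ch in name.lower() if ch.isalnum() or ch.isspace())
    return " ".join(cleaned.split())


def find_matches(records, threshold=0.85):
    """Return (id_a, id_b, score) for record pairs whose cleansed names look like duplicates."""
    cleaned = {rid: normalize(name) for rid, name in records.items()}
    ids = sorted(cleaned)
    matches = []
    for i, a in enumerate(ids):
        for b in ids[i + 1:]:
            score = SequenceMatcher(None, cleaned[a], cleaned[b]).ratio()
            if score >= threshold:
                matches.append((a, b, round(score, 2)))
    return matches
```

For example, "Contoso, Ltd." and "contoso ltd" cleanse to the same string and match with a score of 1.0, while "Fabrikam Inc" does not match either. The pairwise loop is quadratic, which is fine for a sketch but is exactly why production matching tools block or partition records first.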
Connect any data and all volumes in real time
• Social data
• SAP and Dynamics data
• Machine data
Query aggregated data and build reports
Create data marts and reports
Reporting Services
• Create and publish interactive reports
• Consolidate reporting management
• Enable reporting capabilities for anyone
Analysis Services
• Single semantic model
• 100x faster analysis with in-memory columnstore
• Manage user-created BI content
Build and test predictive models
Use the power of machine learning to predict future trends or behavior
[Diagram: Azure Machine Learning workflow from business problem through modeling and deployment to business value. Data comes from the cloud (HDInsight, SQL Server VM, SQL DB, Blobs and tables) or from local sources (desktop files, Excel spreadsheets, other data files on a PC); models are built in ML Studio in a workspace created from the Microsoft Azure portal and published in minutes as a Microsoft Azure Machine Learning API for devices, applications, and dashboards]
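Building and testing a predictive model comes down to fitting on one part of the data and measuring error on a held-out part. The minimal sketch below does this with an ordinary least-squares line in pure Python. The 25% holdout, the fixed seed, and the mean-absolute-error metric are assumptions chosen for illustration; ML Studio offers far more algorithms than this.

```python
import random


def fit_line(xs, ys):
    """Ordinary least squares for y = slope * x + intercept."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def train_and_test(points, test_fraction=0.25, seed=42):
    """Hold out part of the data, fit on the rest, report mean absolute error on the holdout."""
    rng = random.Random(seed)
    shuffled = points[:]
    rng.shuffle(shuffled)
    cut = int(len(shuffled) * (1 - test_fraction))
    train, test = shuffled[:cut], shuffled[cut:]
    slope, intercept = fit_line([x for x, _ in train], [y for _, y in train])
    mae = sum(abs(y - (slope * x + intercept)) for x, y in test) / len(test)
    return slope, intercept, mae
```

Evaluating only on rows the model never saw during fitting is what turns "build" into "build and test": a model that merely memorizes its training data would show a low training error but a high holdout error.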
Azure Machine Learning
Get started with just a browser: Requires no provisioning; simply log on to your Azure subscription or try it for free at azure.com/ml
Experience the power of choice: Choose from hundreds of algorithms and packages from R and Python or drop in your own custom code
Take advantage of business-tested algorithms from Xbox and Bing
Deploy solutions in minutes: With the click of a button, deploy the finished model as a web service that can connect to any data, anywhere
Connect to the world: Brand and monetize solutions on our global Machine Learning Marketplace https://datamarket.azure.com/
Beyond business intelligence – machine intelligence
• Microsoft Azure Machine Learning Studio: Modeling environment
• Microsoft Azure Machine Learning API service: Model in production as a web service
• Microsoft Azure Machine Learning Marketplace: APIs and solutions for broad use
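Putting a model "in production as a web service" means wrapping its scoring function behind an HTTP endpoint that accepts features and returns a prediction. This hypothetical sketch uses Python's standard `http.server`. The model coefficients, feature names, and JSON request/response shapes are made up, and the code does not reproduce the Azure Machine Learning API contract.

```python
import json
from http.server import BaseHTTPRequestHandler, HTTPServer

# Hypothetical trained model; in practice the coefficients come from a training run
MODEL = {"intercept": 2.0, "weights": {"temperature": 0.5, "pressure": -0.25}}


def score(features):
    """Apply the linear model to a dict of feature values (missing features count as 0)."""
    return MODEL["intercept"] + sum(
        weight * features.get(name, 0.0) for name, weight in MODEL["weights"].items()
    )


class ScoringHandler(BaseHTTPRequestHandler):
    def do_POST(self):
        # Read the JSON feature payload from the request body
        length = int(self.headers.get("Content-Length", 0))
        features = json.loads(self.rfile.read(length))
        body = json.dumps({"prediction": score(features)}).encode("utf-8")
        self.send_response(200)
        self.send_header("Content-Type", "application/json")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)


def serve(port=8080):
    """Expose the model over HTTP; blocks until interrupted."""
    HTTPServer(("localhost", port), ScoringHandler).serve_forever()
```

After calling `serve()`, POSTing `{"temperature": 10, "pressure": 4}` to `http://localhost:8080/` would return `{"prediction": 6.0}`.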
Enable enterprise-wide self-service data source registration and discovery
A metadata repository that allows users to register, enrich, understand, discover, and consume data sources
Delivers differentiated value through:
• Data source discovery, rather than data discovery
• Support for data from any source: structured and unstructured, on premises and in the cloud
• Publishing, discovery, and consumption through any tool
• Annotation crowdsourcing: empowering any user to capture and share their knowledge
All while allowing IT to maintain control and oversight
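A metadata repository of this kind stores descriptions of data sources, not the data itself. The hypothetical Python sketch below shows registration, crowdsourced annotation, and keyword discovery. The class, method, and source names are invented for illustration and are not the Azure Data Catalog API.

```python
from dataclasses import dataclass, field


@dataclass
class DataSource:
    name: str
    location: str  # where the data lives (on premises or cloud); illustrative only
    kind: str      # e.g. "structured" or "unstructured"
    tags: set = field(default_factory=set)
    notes: list = field(default_factory=list)


class MetadataCatalog:
    """Hypothetical catalog: holds metadata about sources, never the data itself."""

    def __init__(self):
        self.sources = {}

    def register(self, name, location, kind):
        self.sources[name] = DataSource(name, location, kind)

    def annotate(self, name, user, tag=None, note=None):
        # Crowdsourced enrichment: any user can add tags or descriptions
        source = self.sources[name]
        if tag:
            source.tags.add(tag.lower())
        if note:
            source.notes.append(f"{user}: {note}")

    def discover(self, term):
        """Find sources whose name, tags, or notes mention the term."""
        term = term.lower()
        return sorted(
            s.name
            for s in self.sources.values()
            if term in s.name.lower()
            or term in s.tags
            or any(term in n.lower() for n in s.notes)
        )
```

Because discovery searches annotations as well as names, a source becomes easier to find as more users describe it, which is the point of crowdsourcing the metadata.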
Azure Data Factory
Orchestrate trusted information production in Azure
• Connect to relational or non-relational data that is on-premises or in the cloud
• Orchestrate data movement & data processing
• Publish to Power BI users as a searchable data view
• Operationalize (schedule, manage, debug) workflows
• Lifecycle management, monitoring
[Diagram: Data Factory processing activities: C#, MapReduce, Hive, Pig, Stored Procedures, Azure Machine Learning]
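At its core, orchestration means running activities in dependency order and retrying transient failures. The sketch below models a pipeline as a dependency graph using Python's `graphlib` (Python 3.9+). The activity names loosely mirror the Hive/Pig/SQL Database flow in the oil and gas case study later in this deck and are hypothetical. This is not Data Factory's own pipeline definition format.

```python
from graphlib import TopologicalSorter


def run_pipeline(activities, dependencies, retries=1):
    """Run each activity only after all of its upstream activities have run.

    activities: name -> zero-argument callable that does the work
    dependencies: name -> set of upstream activity names
    Returns the activity names in the order they ran.
    """
    ran = []
    for name in TopologicalSorter(dependencies).static_order():
        for attempt in range(retries + 1):
            try:
                activities[name]()
                break
            except Exception:
                # Give up only after the last allowed attempt
                if attempt == retries:
                    raise
        ran.append(name)
    return ran
```

For a chain such as copy raw data to Blobs, then Hive cleansing, then Pig aggregation, then load SQL Database, the sorter guarantees each step sees its upstream output. Independent branches would also be valid to run in parallel, which this sequential sketch does not attempt.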
Visualize + decide
Visualize data and make decisions quickly using everyday tools
• Discover, explore, and combine any data type or size, regardless of location
• Ask questions of data to visualize, analyze, and forecast
• Make faster decisions, share broadly, and access insights on any device
Microsoft Power BI
• Analyze & Visualize in Excel
• Discover & Combine in Excel
• Collaborate, Get Insights, & Access Anywhere Through Office 365
Power BI Tools Defined
• Front-end (Excel)
• Data shaping and cleanup. Self-service ETL (Power Query)
• Data analysis (Power Pivot)
• Visualization and data discovery (Power View, Power Map, Power BI Designer)
• Dashboarding (Power BI Dashboard)
• Publishing and sharing (Power BI sites)
• Natural language query (Power BI Q&A)
• Mobile (Power BI for Mobile)
• Access on-premises data (DMG, Analysis Services Connector)
[Diagram: Power Query, Power Pivot, Power View, Power Map, Power BI Designer, Power BI Dashboard, Power BI Site, Power BI Q&A, Power BI for mobile]
Power Query: Discover, explore, and combine any data
• Right from Excel, find any data: corporate, social, machine, Hadoop, open
• Easily merge, transform, and clean up data
Power BI dashboards and KPIs for monitoring the health of your business
New data visualizations and touch-optimized exploration in HTML5
Power BI mobile apps across devices, including iPad and iPhone
Support for new data sources, including Salesforce.com, Dynamics CRM Online, and SQL Server Analysis Services
[Screenshots: Dashboard, Tree Map]
Q&A: Ask questions of data
Build ad hoc reports with a drag-and-drop interface
Look ahead to forecast where business will go
Map up to 1 million rows of data in 3-D
Data Management Gateway (DMG)
• Power View/Q&A: DMG refreshes the workbook, so reporting is not real-time (daily frequency), and there is a 250 MB upload limit
• Power Query: Reporting is real-time
Analysis Services Connector (ASC)
• Power BI Dashboard: Get real-time reports with ASC and SSAS Tabular DirectQuery against SQL Server or APS. Create reports with Power View (limited functionality)
• You can publish Power View reports to Power BI Sites and have them use ASC (by uploading the Excel file via Get Data in Power BI Dashboard)
• Does not support Q&A
• Can run on any domain machine
• Multidimensional cubes coming soon
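[Diagram: Power BI connectivity across the intranet and public internet, showing SQL Server, PDW/APS, HDI, 3rd-party Hadoop, SSAS Tabular, and Power Pivot workbooks connecting through DMG and ASC to O365 (Power BI Site, Power View/Q&A, metadata catalog) and the Power BI Dashboard]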
PolyBase: Query relational and non-relational data with T-SQL
Use cases where PolyBase simplifies using Hadoop data:
• Bringing islands of Hadoop data together
• High-performance queries against Hadoop data (predicate pushdown)
• Archiving data warehouse data to Hadoop (move), with Hadoop as cold storage
• Exporting relational data to Hadoop (copy), with Hadoop as backup/DR, analysis, or cloud use
• Importing Hadoop data into the data warehouse (copy), with Hadoop as a staging area/data lake
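Predicate pushdown pays off because filtering happens where the data lives, so fewer rows travel to the SQL engine. The hypothetical Python sketch below counts the rows shipped with and without pushdown; the `HadoopSource` class is invented for illustration. In PolyBase itself you write ordinary T-SQL against external tables, and the pushed-down work runs on the Hadoop cluster (as MapReduce jobs) rather than in Python.

```python
class HadoopSource:
    """Hypothetical external data source that counts rows shipped back to the SQL engine."""

    def __init__(self, rows):
        self.rows = rows
        self.rows_transferred = 0

    def scan(self, predicate=None):
        for row in self.rows:
            # With pushdown, the filter runs here, next to the data
            if predicate is None or predicate(row):
                self.rows_transferred += 1
                yield row


def query_without_pushdown(source, predicate):
    # Every row crosses the network; the SQL side filters afterwards
    return [row for row in source.scan() if predicate(row)]


def query_with_pushdown(source, predicate):
    # Only qualifying rows cross the network
    return list(source.scan(predicate))
```

Both functions return the same result, but with a selective filter (one sensor out of ten, say) the pushed-down version ships a tenth of the rows.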
Consumption Experiences
Data Visualization
Data Analysis
Data Modeling
Data Discovery & ETL
Data Warehouse/Big Data
Microsoft Analytics Platform
Cortana Intelligence Suite: Transform data into intelligent action
• Action: People, Automated Systems, Apps, Web, Mobile, Bots
• Intelligence: Cortana, Bot Framework, Cognitive Services
• Dashboards & Visualizations: Power BI
• Information Management: Event Hubs, Data Catalog, Data Factory
• Machine Learning and Analytics: HDInsight (Hadoop and Spark), Stream Analytics, Data Lake Analytics, Machine Learning
• Big Data Stores: SQL Data Warehouse, Data Lake Store
• Data Sources: Apps, Sensors and devices
Introducing Microsoft Azure IoT Suite: Helping accelerate your business transformation
Benefits
• Accelerate time-to-value by easily deploying IoT applications for the most common use cases, such as remote monitoring, asset management, and predictive maintenance
• Plan and budget appropriately through a simple, predictable business model
• Grow and extend solutions to support millions of assets
Preconfigured solutions address common scenarios: Remote Monitoring, Predictive Maintenance, Asset Management, and more
Azure IoT Suite enables you to: Connect assets, Mine data, Take action
[Diagram: Azure IoT services, including Stream Analytics to ingest and transform data]
Example overall data flow and architecture
[Diagram: Event & data producers (web logs, IoT and mobile devices, social data) flow into Event Hubs and HDInsight; Azure Data Factory moves data into Azure Blob Storage and Azure SQL DB for DW / long-term storage; Azure Machine Learning (fraud detection, etc.) provides predictive analytics; Power BI, web dashboards, and mobile devices are used to present & decide]
[Diagram: Modern data warehouse]
• BI and analytics: Self-service, Corporate, Collaboration, Mobile, Machine learning
• Data enrichment and federated query: Single query model, Extract, transform, load, Data quality, Master data management
• Data management and processing: SQL Server and Analytics Platform System for relational data, plus non-relational data, each available as box software, appliances, or cloud
• Data sources: OLTP, ERP, CRM, LOB, Devices, Web, Sensors, Social
Industrial automation company partnering with multinational oil company
Oil and Gas: A leading industrial automation company that employs over 20,000 people, partnering with a leading multinational oil and gas company (one of the six oil and gas supermajors) that employs over 90,000 people.
Part 1: What They Did | IoT internet-connected sensors to generate analytics for proactive maintenance
Challenge: Manage sites used for dispensing liquefied natural gas (LNG), a clean fuel for commercial customers in heavy-duty road transportation
Built LNG refueling stations along US interstate highways
Stations are unmanned, so they built 24x7 remote management and monitoring to track diagnostics of each station for maintenance or tuning
Built internet-connected sensors embedded in 350 dispenser sites worldwide, generating tens of thousands of data points per second
• Temperature, pressure, vibration, etc.
Data needs outgrew the company’s internal datacenter and data warehouse
Solution: Chose Azure HDInsight, Data Factory, SQL Database, Machine Learning
Dashboards used to detect anomalies for proactive maintenance
• Changes in performance of the components
• Energy consumption of components
• Component downtime and reliability
Future: The goal is to expand the program to hundreds of thousands of dispensers
IoT, Analytics
Industrial automation company partnering with multinational oil company
Part 2: How They Did It | IoT internet-connected sensors to generate analytics for proactive maintenance
How They Did It
Collect data from internet-connected sensors
• Tens of thousands of data points per second
• Interpolate time series prior to analysis
• Stored raw sensor data in Blobs every 5 minutes
Use Hadoop to execute scripts and Data Factory to orchestrate
• Hive and Pig scripts orchestrated by Data Factory
• Data resulting from scripts loaded in SQL Database
• Queries detect site anomalies to indicate maintenance/tuning
Produced dashboards with role-based reporting
• Azure Machine Learning, SSRS, Power BI for O365
• Provide users with a customizable interface
• View current and historical data (day-to-day operations, asset performance over time, etc.)
• Leveraged Azure Mobile Notification Hub for real-time notifications, alarms, or important events
Use Azure ML to predict
• Understand which pumps, run at which speeds, maximize water supply while minimizing energy use
IoT, Analytics
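The "interpolate, then detect anomalies" steps can be sketched as follows: resample irregular sensor readings onto a regular time grid with linear interpolation, then flag readings that sit far from the mean. The z-score rule, the threshold of 3, and the (timestamp, value) layout are assumptions. The case study does not say which method the company used, and in their solution this logic ran as Hive/Pig scripts and SQL Database queries rather than Python.

```python
from statistics import mean, pstdev


def interpolate(readings, step):
    """Resample irregular (timestamp, value) readings onto a regular grid by linear interpolation.

    Requires at least two readings.
    """
    readings = sorted(readings)
    start, end = readings[0][0], readings[-1][0]
    grid = []
    i = 0
    for k in range(int((end - start) // step) + 1):
        t = start + k * step
        # Move forward to the segment [readings[i], readings[i + 1]] that contains t
        while i + 2 < len(readings) and readings[i + 1][0] < t:
            i += 1
        (t0, v0), (t1, v1) = readings[i], readings[i + 1]
        value = v0 if t1 == t0 else v0 + (v1 - v0) * (t - t0) / (t1 - t0)
        grid.append((t, value))
    return grid


def flag_anomalies(series, threshold=3.0):
    """Return (timestamp, value) points more than `threshold` standard deviations from the mean."""
    values = [v for _, v in series]
    mu, sigma = mean(values), pstdev(values)
    if sigma == 0:
        return []
    return [(t, v) for t, v in series if abs(v - mu) / sigma > threshold]
```

Interpolating first matters because sensors rarely report on identical timestamps; putting every site on the same grid makes readings comparable before any anomaly rule is applied.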
Software Company For Web Analytics
Technology: A software company for web analytics, live chat, targeting, and business intelligence in e-business.
Part 1: What They Did | Web Analytics – Traffic, trends, visitor actions + Recommendation Engine
Challenge: They build an e-business service that does site analysis, real-time monitoring of site metrics, interactive support chat, and dynamic content building
Needed to find the right set of products to help them achieve this
Solution: Chose Azure HDInsight, SQL Server (with Analysis Services)
Use HDInsight to preprocess and store raw data
Use Analysis Services to generate views from HDInsight data
Give their customers self-service BI on top of these views
Web Analytics Recommendation Engine
Software Company For Web Analytics
Part 2: How They Did It | Web Analytics – Traffic, trends, visitor actions + Recommendation Engine
How They Did It
Store data in Azure Blobs
• Track visitor data via JavaScript code
• Used for real-time tracking and statistics
• HDInsight used to pre-process and store raw data
Customers of this company have self-service BI
• Drag-and-drop UI
• Leveraging Analysis Services, results can be represented as tables, charts, etc.
• Analysis Services uses data from HDInsight as its source
• Uses the Hive ODBC driver to connect to Hive tables
Web Analytics Recommendation Engine
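One common way to pre-process raw visitor tracking data is sessionization: grouping each visitor's clicks into sessions separated by periods of inactivity. The sketch below assumes a 30-minute timeout and a (visitor, timestamp, page) event layout. Neither comes from the case study, which only says HDInsight was used to pre-process the raw data.

```python
from collections import defaultdict


def sessionize(events, timeout_seconds=1800):
    """Split each visitor's clicks into sessions separated by gaps longer than timeout_seconds.

    events: iterable of (visitor_id, timestamp_seconds, page)
    Returns {visitor_id: [[page, ...], ...]} with one inner list per session.
    """
    by_visitor = defaultdict(list)
    for visitor, ts, page in events:
        by_visitor[visitor].append((ts, page))
    sessions = {}
    for visitor, clicks in by_visitor.items():
        clicks.sort()  # tracking events can arrive out of order
        visitor_sessions = []
        last_ts = None
        for ts, page in clicks:
            if last_ts is None or ts - last_ts > timeout_seconds:
                visitor_sessions.append([])  # start a new session
            visitor_sessions[-1].append(page)
            last_ts = ts
        sessions[visitor] = visitor_sessions
    return sessions
```

Sessions are a natural unit for the traffic and visitor-action views built on top: pages per session, entry and exit pages, and session counts per day all follow from this grouping.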
Game Development Company
Gaming: A predominantly mobile game development company. While they are a mid-sized organization, they have partnered with media giants on various gaming projects
Part 1: What They Did | In-game Analytics
Challenge: As a game development studio, they wanted in-game analytics to better understand their players and what they do in the games
Solution: Chose Azure HDInsight (MapReduce and Storm) and Service Bus, and also use SQL Server for reporting
Switched from Amazon AWS EMR
Collects telemetry and logging data to gain in-game analytics:
• How many players are using the game
• How many players invited their friends
• How far players got into the tutorial
• How many attempts players made on each level/stage
In-game Analytics
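The four example metrics above could be computed from the JSON telemetry roughly as follows. The event schema (`player`, `action`, `step`, and `level` fields, plus the action names) is hypothetical, since the case study does not describe it. At their data volumes, this logic would run in MapReduce or Storm on HDInsight rather than in a single Python process.

```python
import json
from collections import Counter, defaultdict


def in_game_metrics(json_lines):
    """Compute example player metrics from JSON telemetry events (hypothetical schema)."""
    players, inviters = set(), set()
    tutorial_progress = defaultdict(int)  # player -> furthest tutorial step reached
    attempts = Counter()                  # (player, level) -> number of attempts
    for line in json_lines:
        event = json.loads(line)
        player = event["player"]
        players.add(player)
        if event["action"] == "invite":
            inviters.add(player)
        elif event["action"] == "tutorial_step":
            tutorial_progress[player] = max(tutorial_progress[player], event["step"])
        elif event["action"] == "level_attempt":
            attempts[(player, event["level"])] += 1
    return {
        "players": len(players),
        "players_who_invited": len(inviters),
        "furthest_tutorial_step": dict(tutorial_progress),
        "attempts_per_player_level": dict(attempts),
    }
```

Each metric only needs a per-player or per-level running aggregate, which is why this kind of job splits cleanly into map (parse and key by player) and reduce (combine per key) steps.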
Game Development Company
Part 2: How They Did It | In-game Analytics
How They Did It
Collect data from games in Azure Blobs
• Game sends telemetry/logging data as JSON files
• Contains every user action in the game
• Data is pushed to Azure Service Bus in real time
• Tens of gigabytes of data captured daily
HDInsight picks up real-time data and processes it
• From Service Bus, HDInsight processes the data using Apache Storm and MapReduce
• Constantly running experiments to determine insight
• A/B testing
• In-game metrics and analytics
• Spin up a 32-node cluster nightly for four hours
Output sent to SQL Server for BI
• Transfer data to SQL Server for BI
In-game Analytics
[Diagram: Service Bus feeding HDInsight, with output to SQL Server on-premises]
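A/B testing usually comes down to comparing two conversion rates. A minimal sketch follows, assuming a two-proportion z-test with a normal approximation (one common choice; the case study does not name its method) and a hypothetical metric such as the share of players who invite a friend under variant A versus variant B.

```python
from math import erf, sqrt


def ab_test(conversions_a, players_a, conversions_b, players_b):
    """Two-proportion z-test: does variant B's rate differ from variant A's?

    Returns (rate_a, rate_b, z, two_sided_p_value).
    """
    rate_a = conversions_a / players_a
    rate_b = conversions_b / players_b
    # Pooled rate under the null hypothesis that both variants perform the same
    pooled = (conversions_a + conversions_b) / (players_a + players_b)
    se = sqrt(pooled * (1 - pooled) * (1 / players_a + 1 / players_b))
    z = (rate_b - rate_a) / se
    # Standard normal CDF via the error function; doubled for a two-sided test
    p_value = 2 * (1 - 0.5 * (1 + erf(abs(z) / sqrt(2))))
    return rate_a, rate_b, z, p_value
```

With 200 of 1,000 players inviting under A and 250 of 1,000 under B, z is about 2.68 and the p-value falls below 0.01, so the difference is unlikely to be chance. Identical rates give z = 0 and p = 1.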
Big Data is coming
Summary
Understand the benefits of big data
Resources
The Modern Data Warehouse: http://bit.ly/1xuX4Py
Fast Track Data Warehouse Reference Architecture for SQL Server 2014: http://bit.ly/1xuX9m6
Should you move your data to the cloud? http://bit.ly/1xuXbKU
Presentation slides for Modern Data Warehousing: http://bit.ly/1xuXcP5
Presentation slides for Building an Effective Data Warehouse Architecture: http://bit.ly/1xuXeX4
Hadoop and Data Warehouses: http://bit.ly/1xuXfu9
What is the Microsoft Analytics Platform System (APS)? http://bit.ly/1xuXipO
Parallel Data Warehouse (PDW) benefits made simple: http://bit.ly/1xuXlSy
What is Advanced Analytics?
Q & A?
James Serra, Big Data Evangelist
Email me at: JamesSerra3@gmail.com
Follow me at: @JamesSerra
Link to me at: www.linkedin.com/in/JamesSerra
Visit my blog at: JamesSerra.com (where this slide deck will be posted)