Search for big data & data warehouse on msdn.microsoft.com
© 2016 Microsoft Corporation. All rights reserved. Created by the Azure Poster Team Email: [email protected]
* The above graphic was published by Gartner, Inc. as part of a larger research document and should be evaluated in the context of the entire document. The Gartner document is available upon request. Gartner does not endorse any vendor, product or service depicted in its research publications, and does not advise technology users to select only those vendors with the highest ratings. Gartner research publications consist of the opinions of Gartner’s research organization and should not be construed as statements of fact. Gartner disclaims all warranties, expressed or implied, with respect to this research, including any warranties of merchantability or fitness for a particular purpose.
R logo by Hadley Wickham and others at RStudio - https://www.r-project.org/logo/, CC BY-SA 4.0, https://commons.wikimedia.org/w/index.php?curid=35599651
Data, data, everywhere. Data is expanding ten-fold every five years with over 85% of the increase coming from new sources outside the traditional relational data warehouse. Data sources include mobile, social, videos, sensors, devices, RFID, web logs, advanced analytics, and click streams. Microsoft’s big data & data warehousing offerings are able to process and query data wherever it might live.
Store any data of any size and speed
• Relational data and non-relational data
• Real-time data
• Data of any size from terabytes to petabytes
• Dynamically scales to match your business priorities
Process and query any data, anywhere
• Distributed query federates and joins heterogeneous data sources from on-premises and cloud, structured and unstructured
• Create managed data pipelines to orchestrate data transformation
On-premises and in the cloud
• On-premises options with enterprise software, reference architectures and appliances
• Cloud with virtual machines (IaaS) and managed services (PaaS)
• Hybrid deployments across on-premises and cloud
Leader
Gartner has named Microsoft a leader in vision and ability to execute in its 2016 Magic Quadrant for Data Warehouse and Data Management Solutions for Analytics.
Gartner Magic Quadrant for Data Warehouse and Data Management Solutions for Analytics * (As of February 2016)
Axes: COMPLETENESS OF VISION, ABILITY TO EXECUTE
Quadrants: CHALLENGERS, LEADERS, NICHE PLAYERS, VISIONARIES
Vendors shown: Amazon Web Services, Infobright, MarkLogic, 1010data, HPE, Oracle, Teradata, SAP, Microsoft, Exasol, MongoDB, Kognitio, Hitachi, MemSQL, Pivotal, Hortonworks, Actian, Transwarp, MapR Technologies, Cloudera, IBM
Choice of tools and strong ecosystem
• Managed and supported open source tools and Microsoft services for all functions
• Designed for seamless interaction between tools and services
• Strong partner ecosystem to integrate and extend the solution
Diagram: big data & data warehousing architecture
Stages: LANDING/STORAGE, ANALYTICS/PROCESSING, WAREHOUSE/PUBLISHING, BUSINESS INTELLIGENCE/CONSUMPTION
DATA SOURCES: STRUCTURED, SEMI-STRUCTURED, UNSTRUCTURED
INGESTION: AZURE EVENT HUBS, AZURE IoT HUB, APACHE KAFKA
STREAM PROCESSING: AZURE STREAM ANALYTICS, APACHE SPARK STREAMING, APACHE STORM
Data movement: Extract, transform, load (SQL Server Integration Services); Direct loading
Other components shown: SQL SERVER 2016, AZURE BLOB, AZURE DATA LAKE STORE, Analytics Platform System (APS), Apache HBase, POWER BI, APPLICATIONS
Legend: (symbol) = orchestration
Big data & data warehousing from beginning to end
Data exists in many forms, from traditional SQL stores to IoT devices and sensors. Load raw data into a landing stage, or load it directly into storage, then extract and load it into a warehouse. Transform the data before or after it reaches the data warehouse.
Use BI tools to query the isolated store. Or ingest streaming data and process it on the fly. Use big data analytics or machine learning to catch problems before they grow, or to gain insights and meaning.
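The choice to transform data before or after it reaches the warehouse can be sketched in a few lines. This is a minimal illustration only, assuming an in-memory SQLite database as a stand-in for the warehouse; the sample rows, table names, and helper functions are hypothetical and are not part of any Microsoft product.

```python
import sqlite3

# Hypothetical raw records, standing in for data in a landing stage.
RAW_ROWS = [("sensor-1", " 21.5 "), ("sensor-2", "19.0"), ("sensor-1", "22.5")]


def clean(row):
    """Transform step: trim the text reading and convert it to a float."""
    device, reading = row
    return device, float(reading.strip())


def etl(conn):
    """Transform in application code first, then load into the warehouse table."""
    conn.execute("CREATE TABLE warehouse_etl (device TEXT, reading REAL)")
    conn.executemany(
        "INSERT INTO warehouse_etl VALUES (?, ?)", [clean(r) for r in RAW_ROWS]
    )


def elt(conn):
    """Load the raw text as-is, then transform inside the database with SQL."""
    conn.execute("CREATE TABLE staging (device TEXT, reading TEXT)")
    conn.executemany("INSERT INTO staging VALUES (?, ?)", RAW_ROWS)
    conn.execute(
        "CREATE TABLE warehouse_elt AS "
        "SELECT device, CAST(TRIM(reading) AS REAL) AS reading FROM staging"
    )


def averages(conn, table):
    """Query a warehouse table the way a BI tool might: average per device."""
    sql = f"SELECT device, AVG(reading) FROM {table} GROUP BY device ORDER BY device"
    return conn.execute(sql).fetchall()


conn = sqlite3.connect(":memory:")
etl(conn)
elt(conn)
print(averages(conn, "warehouse_etl"))
print(averages(conn, "warehouse_elt"))
```

Both paths end with the same warehouse contents; they differ only in where the transformation runs.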
Diagram labels: AZURE SQL DATA WAREHOUSE, SQL SERVER 2016, SQL Server Analysis Services, SQL Server Reporting Services
AZURE SQL DATA WAREHOUSE
• Enterprise-class cloud data warehouse with T-SQL
• Dynamically scale and pause in seconds
• Queries integrate relational data with data in Azure Blob Storage
• Deploy in seconds
• Automatic backup to Azure Blob Storage
Move to storage Process Move to warehouse
AZURE DATA FACTORY
• Compose and orchestrate data movement and processing at scale
• Visualize data orchestration
• Connect to on-premises and cloud data sources
• Data workflow scheduling
SQL SERVER INTEGRATION SERVICES
• Integrate & transform enterprise data
• Extract data from multiple sources & load into multiple destinations
• Create solutions without writing code
• Optionally code custom components for business needs
AZURE MACHINE LEARNING
• Uncover patterns hidden in data
• Apply statistical methods to solve any problem
• Get started in minutes with drag & drop UI
• Leverage familiar R and Python support
MICROSOFT R SERVER
• Discover valuable data insights
• Incorporate advanced analytics algorithms
• Flexible and agile with exceptional performance & enterprise support
• Use ScaleR to compute large data sets
• No memory constraints
AZURE STREAM ANALYTICS
• Real-time insights from devices and sensors
• Enable rapid development with SQL based syntax
• Achieve mission-critical reliability and scale
• Integrate directly with Power BI to publish real-time data
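A typical real-time insight is an aggregate over a short time window. The sketch below is plain Python, not Stream Analytics query syntax; it only illustrates the idea of averaging readings per device over fixed, non-overlapping (tumbling) windows. The event tuples and the function name are hypothetical.

```python
from collections import defaultdict


def tumbling_window_averages(events, window_seconds):
    """Average readings per (window start, device) over fixed, non-overlapping windows."""
    sums = defaultdict(lambda: [0.0, 0])  # key -> [running total, count]
    for timestamp, device, value in events:
        # Each event belongs to exactly one window, identified by its start time.
        window_start = (timestamp // window_seconds) * window_seconds
        bucket = sums[(window_start, device)]
        bucket[0] += value
        bucket[1] += 1
    return {key: total / count for key, (total, count) in sorted(sums.items())}


# Hypothetical events: (seconds since start, device id, reading).
EVENTS = [(0, "a", 20.0), (4, "a", 22.0), (7, "b", 30.0), (12, "a", 25.0)]
result = tumbling_window_averages(EVENTS, 10)
print(result)
```

With a 10-second window, the first three events fall into the window starting at 0 and the last into the window starting at 10.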
POWER BI
• Bring together data from a variety of sources and services
• Transform and model your data
• Create linked, interactive visuals
• Share dashboards and reports
• Interact with your data anywhere on any device
SQL SERVER REPORTING SERVICES
• Create mobile, interactive, tabular, and graphical reports
• Show charts, maps, and KPIs
• Integrated with Visual Studio
• Programming features for automation
AZURE DATA LAKE STORE
• No limits to scale: architected for cloud scale and performance
• High frequency, low latency, real-time analytics
• HDFS for the cloud
• Optimized for massive throughput
• Stores data in native format
• Enterprise ready: secure, manageable & reliable
AZURE DATA LAKE ANALYTICS
• On-demand job service built on YARN
• Pay for what you use
• Use U-SQL: familiar, easily extensible
• Develop faster and debug smarter with Visual Studio tools
• Query any data store with federated query
• Enterprise ready: access control & auditing
MICROSOFT R SERVER
• Enterprise scale & performance: scales from workstations to large clusters; growing portfolio of parallelized algorithms; runs R functions in parallel
• Secure, scalable R deployment & operationalization
• Write once, deploy anywhere: Windows (in-database & standalone R Server); Linux (Red Hat and SUSE); Hadoop (HDInsight, Hortonworks, Cloudera, MapR)
• IDE for data scientists and developers (R Tools for Visual Studio)
MICROSOFT R SERVER ON AZURE HDINSIGHT
APACHE SPARK ON AZURE HDINSIGHT
• Open source processing framework for data analytics
• Parallel data processing persists data in-memory, on disk
• Suited to ETL, batch, interactive queries
• Real-time processing for real-time scenarios
APACHE HADOOP ON AZURE HDINSIGHT
Components: BATCH (MapReduce), SCRIPT (Pig), SQL (Hive), NoSQL (HBase), STREAMING (MapReduce)
• Hadoop as a service on Azure: cost-effective, elastic & flexible
• Works on Azure Storage or Data Lake Store
• Customize clusters to run other Hadoop open-source projects
• Crunch all data: structured, semi-structured & unstructured
• Scale elastically on demand
• Develop in your favorite language
PolyBase diagram labels: Analytics Platform System; Select ...; Result set; Cloudera on Linux; Hortonworks on Linux; Hortonworks on Windows Server
ANALYTICS PLATFORM SYSTEM
• Scale-out, massively parallel processing system supporting integrated data warehouse scenarios for evolving needs
• Easy to deploy: ships to your datacenter with hardware and software pre-installed and configured
• Queries across relational and non-relational data by leveraging PolyBase
• Offers the lowest price per terabyte for large data warehouse workloads
DOCUMENT DB
• Natively supports JSON and JavaScript
• Schema-agnostic documents, automatically indexed
• Supports SQL queries
• SDKs for JavaScript, Java, Node.js, Python, and .NET
AZURE SQL DATABASE
• Create pools of elastic databases to manage performance and cost
• Develop scalable SaaS applications
• Enterprise grade security
• Work within your preferred development environments
SQL SERVER 2016
• Manage data variety and volume across all data repositories
• Balance all system components with Fast Track reference architectures
• Optimized for OLTP, data warehouse and mixed workloads
• Full hybrid capability
• Gain real-time insights without impacting performance
• Elastic scale
POLYBASE
• Query Azure HDInsight, external Hadoop clusters, or Azure Blob Storage as external tables using T-SQL
• Import external data into SQL Server 2016
• Export cold data from SQL Server to Hadoop or Azure Blob Storage while keeping it queryable
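The idea of joining relational rows with externally stored data in a single SQL query can be illustrated with a small stand-in. The sketch below is not PolyBase and uses no T-SQL: it assumes an in-memory SQLite database and an inline CSV string in place of SQL Server and Hadoop or Blob Storage, and all table names and values are hypothetical. Unlike a PolyBase external table, which references data where it lives, this stand-in copies the external rows into a temporary table.

```python
import csv
import io
import sqlite3

# Hypothetical external file, standing in for data kept outside the database.
EXTERNAL_CSV = "customer_id,clicks\n1,40\n2,15\n"

conn = sqlite3.connect(":memory:")

# Relational data that already lives in the database.
conn.execute("CREATE TABLE customers (customer_id INTEGER, name TEXT)")
conn.executemany(
    "INSERT INTO customers VALUES (?, ?)", [(1, "Contoso"), (2, "Fabrikam")]
)

# Expose the external rows to SQL through a temporary table.
conn.execute("CREATE TEMP TABLE ext_clicks (customer_id INTEGER, clicks INTEGER)")
reader = csv.DictReader(io.StringIO(EXTERNAL_CSV))
conn.executemany(
    "INSERT INTO ext_clicks VALUES (?, ?)",
    [(int(row["customer_id"]), int(row["clicks"])) for row in reader],
)

# One query joins the relational table with the externally sourced rows.
joined = conn.execute(
    "SELECT c.name, e.clicks FROM customers c "
    "JOIN ext_clicks e ON e.customer_id = c.customer_id ORDER BY c.name"
).fetchall()
print(joined)
```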
Diagram labels: SQL SERVER 2016, TRANSACT-SQL, POLYBASE
U-SQL
Diagram labels: U-SQL query, Azure Data Lake Analytics
• Single query language for all data
• Optimized for big data
• Familiar syntax for SQL developers
• Unites declarative SQL with imperative C#
• Works across structured, semi-structured, and unstructured data with federated query
• Easily scales across available nodes
• Designed for parallelized big data processing
• Dedicated tooling with Visual Studio for easy query creation and optimization
• Minimize data proliferation issues caused by multiple copies
Diagram: Apache Spark on HDInsight
Unifies: Batch processing, Real-time processing, Stream processing, Machine learning, Interactive SQL
Spark Core Engine libraries: Spark SQL (Interactive queries), Spark Streaming (Stream processing), Spark MLlib (Machine learning), GraphX (Graph computation)
Schedulers: Yarn, Mesos, Standalone scheduler
Other labels: Azure Data Lake Store, YARN
Fast Track DW, Real-time Operational Analytics, Master Data Services, SQL Server R Services
Diagram labels: APACHE HIVE, APACHE PIG, APACHE SPARK SQL, U-SQL, R, AZURE MACHINE LEARNING, PYTHON, T-SQL, POLYBASE
Microsoft’s big data and data warehousing solutions handle all types of data, end-to-end: streaming, collection, processing, storage, and analytics
Big Data & Data Warehousing