Big Data & Advanced Analytics: "Managed Services" on Azure
Guido Jacobs
Global Black Belt – TSP Big Data
Microsoft Deutschland GmbH
"Hyper-scale" infrastructure: that is Azure!
Azure regions (http://azure.microsoft.com/en-us/regions/):
Central US: Iowa
West US: California
East US: Virginia
US Gov: Virginia
North Central US: Illinois
US Gov: Iowa
South Central US: Texas
Brazil South: Sao Paulo State
West Europe: Netherlands
China North *: Beijing
China South *: Shanghai
Japan West: Osaka
India South: Chennai
East Asia: Hong Kong
SE Asia: Singapore
India Central: Pune
Japan East: Tokyo, Saitama
Australia East: New South Wales
Australia South East: Victoria
Canada East: Quebec City
Canada Central: Toronto
India West: Mumbai
Germany North East **: Magdeburg
Germany Central **: Frankfurt
North Europe: Ireland
East US 2: Virginia
United Kingdom
US DoD West: TBD
US DoD East: TBD
West US 2: Washington
West Central US
Korea Central: Seoul
Korea South
Flexibility in the cloud (from control to ease of use)
• IaaS Hadoop: any Hadoop technology, e.g. HDP | CDH | MapR from the Azure Marketplace
• Managed Hadoop: workload-optimized, managed clusters with Azure HDInsight
• Big Data as-a-service: specific apps in a multi-tenant form factor with Azure Data Lake Analytics
Big data storage: Azure Data Lake Store, Azure Storage
Big data analytics: Azure Marketplace (HDP | CDH | MapR), Azure HDInsight, Azure Data Lake Analytics
(Diagram axes: control vs. ease of use; user adoption)
Azure Data Lake Store: a hyper-scale repository for big data analytics workloads
Hadoop Distributed File System (HDFS) for the cloud
No limits to scale
Store any data in its native format
Enterprise-grade access control, encryption at rest
Optimized for analytic workload performance
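Data Lake Store's access control follows the POSIX ACL model (owner, named users, groups, other, plus a mask). As an illustration only, here is a minimal sketch of the standard POSIX evaluation order in Python. The `Acl` class and `check_access` function are hypothetical and not part of any Azure SDK; the sketch ignores superusers, default ACLs, the sticky bit, and the execute permission needed on parent folders.

```python
from dataclasses import dataclass, field

# Bit values for the POSIX permission characters
PERM_BITS = {"r": 4, "w": 2, "x": 1}


def perms_to_bits(perm: str) -> int:
    """Convert a POSIX permission string such as "r-x" to bits (0b101)."""
    bits = 0
    for ch in perm:
        if ch != "-":
            bits |= PERM_BITS[ch]
    return bits


@dataclass
class Acl:
    """Hypothetical, simplified access ACL of one file or folder."""
    owner: str
    owning_group: str
    owner_perms: str
    group_perms: str
    other_perms: str
    named_users: dict = field(default_factory=dict)   # user -> "rwx"
    named_groups: dict = field(default_factory=dict)  # group -> "rwx"
    mask: str = "rwx"  # upper bound for named users and all group entries


def check_access(acl: Acl, user: str, user_groups: set, requested: str) -> bool:
    """Evaluate entries in POSIX order: owner, named user, groups, other."""
    want = perms_to_bits(requested)
    mask = perms_to_bits(acl.mask)

    def grants(perm: str, masked: bool) -> bool:
        bits = perms_to_bits(perm)
        if masked:
            bits &= mask
        return (bits & want) == want

    if user == acl.owner:
        return grants(acl.owner_perms, masked=False)
    if user in acl.named_users:
        return grants(acl.named_users[user], masked=True)

    # Collect every group entry that matches one of the caller's groups
    group_perms = []
    if acl.owning_group in user_groups:
        group_perms.append(acl.group_perms)
    group_perms += [p for g, p in acl.named_groups.items() if g in user_groups]
    if group_perms:
        # Granted if any matching group entry grants it, otherwise denied
        return any(grants(p, masked=True) for p in group_perms)

    return grants(acl.other_perms, masked=False)
```

With `mask="r-x"`, a named user entry of `rw-` effectively grants only read, which is why the mask is applied to every entry except owner and other.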
Azure HDInsight: Hadoop and Spark as a service on Azure
Fully managed Hadoop and Spark for the cloud
100% open-source Hortonworks Data Platform
Clusters up and running in minutes
Managed, monitored, and supported by Microsoft with the industry's best SLA
Familiar BI tools for analysis, or open-source notebooks for interactive data science
63% lower TCO than deploying your own Hadoop on-premises*
*IDC study “The Business Value and TCO Advantage of Apache Hadoop in the Cloud with Microsoft Azure HDInsight”
Azure Data Lake Analytics: a new distributed analytics service
Distributed analytics service built on Apache YARN
Elastic scale per query lets users focus on business goals, not on configuring hardware
Includes U-SQL, a language that unifies the benefits of SQL with the expressive power of C#
Integrates with Visual Studio to develop, debug, and tune code faster
Federated query across Azure data sources
Enterprise-grade role-based access control
Combine data easily
Benefits
• Large volumes of data do NOT have to be moved between different stores
• A unified view of the data, regardless of its physical storage location
• Less data maintenance thanks to fewer copies
• One query language for ALL data sources
• Each data store retains its sovereignty
• Needs-based solution design
• SQL predicates are pushed down to the SQL sources:
  • Filters
  • Joins
(Diagram: a U-SQL query in Azure Data Lake Analytics federates queries across Azure Storage Blobs, SQL in Azure VMs, Azure SQL DB, Azure SQL Data Warehouse, and Azure Data Lake Store.)
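To make the predicate-pushdown benefit concrete, here is a minimal Python sketch. It does not use U-SQL or any Azure service: an in-memory SQLite database stands in for a remote SQL source, and the `orders` table and both query functions are hypothetical. The point is only to compare how many rows leave the source when the filter runs in the engine versus at the source.

```python
import sqlite3


def make_source() -> sqlite3.Connection:
    # In-memory SQLite database standing in for a remote SQL source
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE orders (id INTEGER, region TEXT, amount REAL)")
    conn.executemany(
        "INSERT INTO orders VALUES (?, ?, ?)",
        [(i, "EU" if i % 4 == 0 else "US", float(i)) for i in range(1, 101)],
    )
    return conn


def query_without_pushdown(conn, region):
    # Every row crosses the "network"; the engine filters afterwards
    rows = conn.execute("SELECT id, region, amount FROM orders ORDER BY id").fetchall()
    result = [row for row in rows if row[1] == region]
    return result, len(rows)


def query_with_pushdown(conn, region):
    # The predicate travels to the source; only matching rows come back
    rows = conn.execute(
        "SELECT id, region, amount FROM orders WHERE region = ? ORDER BY id",
        (region,),
    ).fetchall()
    return rows, len(rows)
```

Both functions return the same 25 "EU" rows, but without pushdown all 100 rows are transferred first; with real remote sources that difference is network traffic and latency.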
Separation of storage & compute (2)
(Diagram: Azure Data Lake Store as one shared data layer, used by independent compute engines: several ADL-A/HDI instances, SQL DW, Power BI, HDI/R-Server, and HDI/H2O AI.)
What is Azure Data Catalog?
An enterprise-wide catalog in Azure that enables self-service discovery of data from any source
A metadata repository that allows users to register, enrich, understand, discover, and consume data sources
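The register / enrich / discover cycle described above can be pictured with a small in-memory metadata store. This is a hypothetical Python sketch, not the Azure Data Catalog API: `MetadataCatalog`, its methods, and the example locations are illustrative assumptions. It shows the key idea that the catalog holds only metadata and a pointer to where the data lives.

```python
from dataclasses import dataclass, field


@dataclass
class CatalogEntry:
    name: str
    location: str      # where the data lives; the data itself is not copied
    source_type: str
    description: str = ""
    tags: set = field(default_factory=set)


class MetadataCatalog:
    """Hypothetical in-memory catalog: register, enrich, discover."""

    def __init__(self):
        self._entries = {}

    def register(self, name, location, source_type):
        # Register a data source by reference (metadata only)
        self._entries[name] = CatalogEntry(name, location, source_type)

    def enrich(self, name, description="", tags=()):
        # Users add descriptions and tags to make sources understandable
        entry = self._entries[name]
        if description:
            entry.description = description
        entry.tags.update(tags)

    def discover(self, keyword):
        # Self-service search over names, descriptions, and tags
        kw = keyword.lower()
        return sorted(
            e.name for e in self._entries.values()
            if kw in e.name.lower()
            or kw in e.description.lower()
            or any(kw == t.lower() for t in e.tags)
        )
```

Example: after `register("sales_2016", ...)` and `enrich("sales_2016", tags={"finance"})`, a search for `"finance"` returns `["sales_2016"]`.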
Enterprise-grade security in HDInsight
Perimeter-level security: Virtual Networks, Network Security Groups (firewalls)
Authentication: Kerberos, Azure Active Directory
Authorization: Apache Ranger, RBAC for admin, POSIX ACLs for the data plane
Data security: server-side encryption at rest, HTTPS/TLS in transit
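Network Security Groups evaluate their rules in priority order (lower number first) and stop at the first matching rule. The Python sketch below models only that behavior for inbound traffic; the `Rule` fields, the example rules, and the final implicit deny (standing in for the built-in default rules) are simplifying assumptions, not the Azure API.

```python
import ipaddress
from dataclasses import dataclass
from typing import Optional


@dataclass
class Rule:
    name: str
    priority: int          # lower number = evaluated first
    port: Optional[int]    # destination port; None means any port
    source: str            # source CIDR prefix, e.g. "10.0.0.0/16"
    action: str            # "Allow" or "Deny"


def evaluate(rules, src_ip, dst_port):
    """Return (action, rule name) for one inbound connection."""
    ip = ipaddress.ip_address(src_ip)
    for rule in sorted(rules, key=lambda r: r.priority):
        port_ok = rule.port is None or rule.port == dst_port
        if port_ok and ip in ipaddress.ip_network(rule.source):
            return rule.action, rule.name  # first match wins; stop processing
    # Simplification of the default rules: anything unmatched is denied
    return "Deny", "implicit-deny"


# Hypothetical rule set for a cluster subnet
rules = [
    Rule("allow-https-vnet", 100, 443, "10.0.0.0/16", "Allow"),
    Rule("deny-https-all", 200, 443, "0.0.0.0/0", "Deny"),
    Rule("allow-ssh-admin", 300, 22, "203.0.113.0/24", "Allow"),
]
```

Because evaluation stops at the first match, the broad `deny-https-all` rule never overrides the narrower allow rule placed at a lower priority number.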
HDInsight: third-party solutions
• H2O – Sparkling Water: https://www.h2o.ai/sparkling-water/
• Datameer: https://www.datameer.com/
• Dataiku Data Science Studio: https://www.dataiku.com/
• Cask Data App Platform (CDAP): http://cask.co/products/cdap/
• StreamSets Data Collector: https://streamsets.com/products/sdc/
• Spark Job Server for KNIME Spark Executor: https://www.knime.org/knime-spark-executor
• SnapLogic Hadooplex: https://snaplogic.com/solutions/microsoft-cortana-analytics-integration
(Diagram: hybrid architecture with shared storage)
On-premise HDP cluster
Azure big data storage: Azure Data Lake Store
• Optimized for MPP-based analytical workloads
• Authorized access via Azure AD
• Access via ADL:// (OAuth2) and WebHDFS (OAuth2)
• No upfront cost, no pre-allocation, pay for stored data
Shared metadata: Ranger, Hive, Oozie, …
HDInsight clusters of type R-Server, Spark, and 3rd party, each with head nodes (HN) and worker nodes (WN); RAM & CPU are configured to fulfill the workload requirements
HDP on IaaS (type: Cloudbreak): head node (HN), edge node, third-party components, and HDFS used only for temporary & spilled data
Synchronization at file level
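For the WebHDFS access path, Data Lake Store exposes a WebHDFS-compatible REST endpoint per account, authorized with an Azure AD OAuth2 bearer token. The Python sketch below only builds such a request; the account name, path, and token are placeholders, and obtaining the token from Azure AD is assumed to happen elsewhere.

```python
import urllib.parse
import urllib.request


def build_webhdfs_request(account, path, op, token, **params):
    """Build (but do not send) a WebHDFS request against Data Lake Store."""
    query = urllib.parse.urlencode({"op": op, **params})
    quoted_path = urllib.parse.quote(path.lstrip("/"))  # "/" stays unescaped
    url = (f"https://{account}.azuredatalakestore.net"
           f"/webhdfs/v1/{quoted_path}?{query}")
    req = urllib.request.Request(url, method="GET")
    # OAuth2 bearer token issued by Azure AD (placeholder here)
    req.add_header("Authorization", f"Bearer {token}")
    return req


# Hypothetical account and path; a real token is required to send it
req = build_webhdfs_request("contosolake", "/sales/2016", "LISTSTATUS", "dummy-token")
```

With a valid token, `urllib.request.urlopen(req)` would send the listing request; Hadoop-based clients would typically use the `adl://` scheme instead of calling REST directly.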
The Business Value and TCO of HDInsight
• 418% 5-year ROI
• Four-month payback period
• 63% lower 5-year TCO than on-premises
• 66% higher staff efficiency than on-premises
• Get it at http://aka.ms/hdinsight
*IDC study “The Business Value and TCO Advantage of Apache Hadoop in the Cloud with Microsoft Azure HDInsight”
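For reading the ROI figure, a worked equation under the conventional definition, with B as the 5-year benefits and C as the 5-year investment. This is an assumption: the slide does not state the study's exact methodology.

```latex
\mathrm{ROI} = \frac{B - C}{C}, \qquad
\mathrm{ROI} = 418\,\% \;\Rightarrow\; B = (1 + 4.18)\,C = 5.18\,C
```

Under that definition, every unit invested would return about 5.18 units of benefit over five years.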
RESOURCES ACROSS THE SALES CYCLE