The Microsoft BI Platform in the Cloud

Digicomp course, instructor: Matthias Gessenay, 20 January 2016

Copyrights

Some slides are taken from Microsoft's Azure Readiness slide deck (https://github.com/Azure-Readiness/CloudDataCamp/blob/master/Presentation/HDInsight/Hadoop%20in%20Azure.pptx)

Some slides are taken from the MS Ignite session "Power BI Overview" (http://www.google.ch/url?sa=t&rct=j&q=&esrc=s&source=web&cd=8&cad=rja&uact=8&ved=0ahUKEwiH3pygp7XKAhVBVRoKHQ9KCJwQFghcMAc&url=http%3A%2F%2Fvideo.ch9.ms%2Fsessions%2Fignite%2F2015%2Fdecks%2FBRK2556_Doyle.pptx&usg=AFQjCNHOr7Kb8pJEFnLKHvAMUho0AOBhjA)

Introduction to Apache Hadoop

Apache Hadoop

Data volume

Hadoop stores files in a distributed file system

Distributed across many servers

A single file can be spread across many nodes

Hadoop can store very large volumes of data

Scales from a few nodes to many thousands

Files can be larger than the capacity of any single node

Data variety

Hadoop stores files in a non-relational format

Hadoop vs. SQL

SCALE (storage & processing)

               Relational Database            Hadoop Platform
schema         Required on write              Required on read
speed          Reads are fast                 Writes are fast
governance     Standards and structured       Loosely structured
processing     Limited, no data processing    Processing coupled with data
data types     Structured                     Multi and unstructured
best fit use   Interactive OLAP Analytics     Data Discovery
               Complex ACID Transactions      Processing unstructured data
               Operational Data Store         Massive Storage/Processing
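The "schema required on write" versus "schema required on read" row can be illustrated with a minimal Python sketch. The record layout (`user`, `clicks`), the sample lines, and both helper functions are hypothetical and chosen only for this illustration; neither a real database nor Hadoop is involved.

```python
import csv
import io
import json

# Hypothetical raw records, stored exactly as they arrived.
RAW_LINES = [
    '{"user": "anna", "clicks": 3}',
    "ben,5",
    '{"user": "carl"}',
]


def write_checked(table, line):
    """Schema on write: accept only lines shaped exactly like 'user,clicks'."""
    user, clicks = line.split(",")  # raises ValueError for any other shape
    table.append((user, int(clicks)))


def read_with_schema(line):
    """Schema on read: interpret a raw line as (user, clicks) when it is queried."""
    line = line.strip()
    if line.startswith("{"):
        record = json.loads(line)
        return record.get("user"), int(record.get("clicks", 0))
    fields = next(csv.reader(io.StringIO(line)))
    if len(fields) == 2:
        return fields[0], int(fields[1])
    return None  # the line does not fit this particular schema


# Writing raw lines is trivial (append only); structure is imposed while reading.
rows = [read_with_schema(line) for line in RAW_LINES]
```

In this sketch the write path with a schema rejects two of the three lines up front, while the read path accepts all of them and decides per query how to interpret each one. That is the trade-off the table summarises as fast reads on one side and fast writes on the other.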

YARN: Next Generation Hadoop (Azure Data Lake is built on YARN)

1st Gen of Hadoop: Single Use System (Batch Apps)

HDFS (redundant, reliable storage)

MapReduce (cluster resource management & data processing)

2nd Gen of Hadoop: Multi Use Data Platform (Batch, Interactive, Online, Streaming, …)

Redundant, Reliable Storage (HDFS)

Efficient Cluster Resource Management & Shared Services (YARN)

Flexible Data Processing:

Hive, Pig, others…

Batch: MapReduce

Batch & Interactive: Tez

Online Data Processing: HBase, Accumulo

Stream Processing: Storm

others…

Classic Hadoop Apps

Hadoop 2.0: YARN

http://hortonworks.com/blog/introducing-apache-hadoop-yarn/

HDFS: Hadoop Storage

Data nodes

Distributed

Local storage

Fault tolerant (3 copies per block)

Files are split into blocks

Name node

Stores no data itself

But knows where each block is located
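The division of labour between data nodes and the name node can be sketched in a few lines of Python. This is an in-memory toy model, not the HDFS API: the 4-byte block size, the node names `dn1` to `dn4`, and the round-robin placement are assumptions made for the illustration (real HDFS uses far larger blocks and a rack-aware placement policy).

```python
BLOCK_SIZE = 4    # bytes; deliberately tiny so the demo produces several blocks
REPLICATION = 3   # "3 copies per block", as on the slide


def split_into_blocks(data, block_size=BLOCK_SIZE):
    """Cut a file's bytes into fixed-size blocks (the last one may be shorter)."""
    return [data[i:i + block_size] for i in range(0, len(data), block_size)]


def place_blocks(blocks, data_nodes, replication=REPLICATION):
    """Store each block on `replication` distinct data nodes (simple round robin).

    Returns (index, storage): `index` plays the name node and maps a block id to
    the nodes holding it; `storage` plays the data nodes and holds the bytes.
    """
    if replication > len(data_nodes):
        raise ValueError("need at least as many data nodes as replicas")
    storage = {node: {} for node in data_nodes}
    index = {}
    for block_id, block in enumerate(blocks):
        targets = [data_nodes[(block_id + r) % len(data_nodes)]
                   for r in range(replication)]
        for node in targets:
            storage[node][block_id] = block
        index[block_id] = targets
    return index, storage


def read_file(index, storage, failed=()):
    """Reassemble the file, skipping data nodes listed in `failed`."""
    parts = []
    for block_id in sorted(index):
        node = next(n for n in index[block_id] if n not in failed)
        parts.append(storage[node][block_id])
    return b"".join(parts)


nodes = ["dn1", "dn2", "dn3", "dn4"]
original = b"hello hadoop!"
index, storage = place_blocks(split_into_blocks(original), nodes)
recovered = read_file(index, storage, failed={"dn1", "dn2"})
```

With three replicas spread over four nodes, every block in this toy setup survives the loss of any two nodes. The `index` dictionary never holds file contents, mirroring the slide's point that the name node stores no data but knows where each block lives.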

Hadoop MapReduce

[Diagram: many nodes working in parallel, each running Do work()]
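The idea behind the diagram, many workers each doing the same work on their own slice of the data, can be sketched as a single-process word count in Python. The three input chunks are made up for the example, and the phases run sequentially here; in Hadoop the map and reduce calls would run on different nodes.

```python
from collections import defaultdict


def map_phase(chunk):
    """Mapper: emit a (word, 1) pair for every word in one chunk of input."""
    return [(word.lower(), 1) for word in chunk.split()]


def shuffle(pairs):
    """Shuffle: group all emitted values by key so each key reaches one reducer."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key, values):
    """Reducer: combine all values for one key into a single result."""
    return key, sum(values)


# Hypothetical input, already split into chunks (one chunk per "Do work()" box).
chunks = ["Hadoop stores files", "Hadoop processes files", "files files"]

mapped = [pair for chunk in chunks for pair in map_phase(chunk)]
counts = dict(reduce_phase(key, values)
              for key, values in sorted(shuffle(mapped).items()))
```

Each `map_phase` call needs only its own chunk, which is what makes the work easy to distribute. Only the shuffle step has to move data between workers.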

Apache Hadoop in Azure

HDInsight: What’s Different?

Not that much …

HDP on Windows

HDP on Linux

Compute and storage are separate

Azure Blob Storage

HDInsight Storage Infrastructure

HDInsight Compute Nodes (Large VMs)

Azure Blob Storage

Azure Flat Network Storage

Stream data to compute

Push data back to storage

map sort shuffle reduce

http://dennyglee.com/2013/03/18/why-use-blob-storage-with-hdinsight-on-azure/

HDInsight Demo

Microsoft Self-Service BI

Powerful self-service BI with Excel 2013

Power Query

Suited for self-service data that fits in Excel

Data-driven shaping: design while you drive

Ideal for sampling data

Partition data in Hadoop/Hive based on user workloads

No governors to prevent users from pulling "too much data"

Does not read compressed or binary files (yet)
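The advice to partition data in Hadoop/Hive based on user workloads can be illustrated with a small Python sketch. It only models the directory naming that Hive uses for partitioned tables (`key=value` path segments); the sales records, the partition keys, and both helper functions are hypothetical, and no Hive or Power Query API is called.

```python
from collections import defaultdict


def partition_path(record, keys):
    """Build a Hive-style partition path such as 'region=EU/year=2016'."""
    return "/".join(f"{key}={record[key]}" for key in keys)


def partition_records(records, keys):
    """Group records by partition path, as if writing one directory per partition."""
    partitions = defaultdict(list)
    for record in records:
        partitions[partition_path(record, keys)].append(record)
    return dict(partitions)


# Hypothetical sales data; assume users typically filter by region and year.
sales = [
    {"region": "EU", "year": 2015, "amount": 10},
    {"region": "EU", "year": 2016, "amount": 20},
    {"region": "US", "year": 2016, "amount": 30},
    {"region": "EU", "year": 2016, "amount": 5},
]

partitions = partition_records(sales, ["region", "year"])

# A user interested in EU sales for 2016 only needs this one partition.
eu_2016 = partitions["region=EU/year=2016"]
eu_2016_total = sum(record["amount"] for record in eu_2016)
```

Because there are no governors on how much data a user pulls, choosing partition keys that match common filters is one way to keep each pull small enough to fit in Excel.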

Demo - HDInsight

Azure Data Lake

Based on Apache YARN

Practically unlimited data volume and compute power

Pay per use

Currently still by invitation only

New language: U-SQL

Demo

Power BI

Cloud dashboards

On-premises technology available (Datazen)

Connecting to data via Power BI is very easy

Hybrid setups are possible

Demo

Questions?