Post on 29-May-2020

Azure Big Data & Machine Learning

Matthias Gessenay & Roman A. Kahr

2

Agenda

Introduction to Azure Data Science Tools

Azure Data Lake

Hadoop

Azure Jupyter Notebooks

Azure Machine Learning Studio

Machine Learning

Regression vs. Classification vs. Neural Network

Sampling Problems

Case: Building a Predictive Web Service with Azure ML

Data Analysis

Implementation Algorithm & Web Service

3

Matthias Gessenay

Co-Founder Corporate Software AG

Microsoft Professional Program – Data Science, ITIL Expert, MCSA/E/ITP/A/E

Senior Consultant & Trainer

Matthias.Gessenay@corporatesoftware.ch

4

Roman A. Kahr

Corporate Software AG

Microsoft Data Science Professional

Consultant & Trainer

Roman.Kahr@corporatesoftware.ch

5

Corporate Software

Founded 2011 in Biel/Bienne

Microsoft Partner

Gold Cloud Productivity

Gold Collaboration and Content

Gold Project and Portfolio Management

Gold Data Analytics

17 Consultants

6

Azure Data Science Tools – Big Picture

Bring together all your Data

Exploding Data Volumes

Unstructured Data

No bounds – no cost tradeoff

Improve performance

On-prem infrastructure too slow

Difficult to build a distributed on-prem infrastructure

Cost-intensive

Scalability

7

Data Lake

8

Data Lake

Two Parts:

Data Lake Store

Data Lake Analytics

U-SQL

Similar to T-SQL

Range of extensions: R, Python, ...

200x more storage

Pay-as-you-go

1 TB ≈ $35 per year

9

HDInsight

Commonly known as Hadoop outside Azure

Principles:

Split work

Split Data from Analytics

Pros of Azure

Scalability

Pay-as-you-go/cost optimization

Fast deployment
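
The "split work" principle can be sketched in plain Python as a map step per chunk followed by a reduce step. This is only an illustration of the idea, not HDInsight or Hadoop code: the flight records, the chunk count, and all function names are made up, and on a real cluster the map step would run in parallel on separate nodes.

```python
from collections import Counter
from functools import reduce

def split_into_chunks(records, n_chunks):
    """Split a list into roughly equal parts, one per (hypothetical) worker."""
    size = max(1, -(-len(records) // n_chunks))  # ceiling division, at least 1
    return [records[i:i + size] for i in range(0, len(records), size)]

def map_chunk(chunk):
    """Map step: count flights per carrier within a single chunk."""
    return Counter(carrier for carrier, _delay in chunk)

def reduce_counts(left, right):
    """Reduce step: merge two partial counts into one."""
    return left + right

# Made-up (carrier, delay in minutes) records, for illustration only.
flights = [("LX", 5), ("LH", 20), ("LX", 0), ("BA", 45), ("LH", 3), ("LX", 12)]

chunks = split_into_chunks(flights, n_chunks=3)
partial_counts = [map_chunk(c) for c in chunks]  # parallel on a real cluster
total = reduce(reduce_counts, partial_counts, Counter())
print(dict(total))
```

Each partial count depends only on its own chunk, which is what lets the work be spread across machines.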

10

Jupyter Notebook

Virtual instance to run R or Python

Nice interface

Highly performant

Perfectly integrated into the Azure ecosystem

Ideal for presenting analyses!

Demo (Analyzing the Data)

11

Machine Learning

“Giving the computer the ability to learn without being explicitly programmed”

Supervised learning

Regression

Unsupervised learning

Clustering, neural nets

Reinforcement learning

AlphaGo

12

Machine Learning – Sampling Issue

Inductive

Issue before Data Science: Capacity!

Price paid: inaccuracy
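
The trade-off between sample size and accuracy can be made concrete with the standard error of an estimated proportion, sqrt(p(1-p)/n), for a simple random sample. A minimal sketch, assuming a made-up true on-time share of 80%; the function name is hypothetical.

```python
import math

def standard_error(p, n):
    """Standard error of a proportion estimated from a simple random sample of size n."""
    return math.sqrt(p * (1 - p) / n)

p_on_time = 0.8  # assumed true share of on-time flights (made up)

for n in (100, 10_000, 1_000_000):
    se = standard_error(p_on_time, n)
    print(f"n={n:>9,}: standard error = {se:.4f}")
```

The error shrinks only with the square root of the sample size, so working on small samples for capacity reasons costs noticeable accuracy.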

13

Problems with using an on-prem solution

Say you have a working, performant algorithm

How do you consume the data?

Request/Response API

How are you working with additional data?

How do you manage the costs?

14

Azure Machine Learning Studio

Free (for the moment)

Unlimited computing power

Prewritten modules

Possibility to use R code

Existing API to consume a trained model

Prewritten Web Applications are shared
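
Consuming a trained model over a request/response API might look roughly like the sketch below. Everything specific here is an assumption: the endpoint URL and API key are placeholders, the column names are invented, and the payload layout (`Inputs`, `input1`, `ColumnNames`, `Values`, `GlobalParameters`) should be checked against the API help page of the published service. The request is only built, not sent.

```python
import json
import urllib.request

# Placeholder values: replace with the endpoint and key of a real web service.
SCORING_URL = "https://example.invalid/score"  # hypothetical endpoint
API_KEY = "<your-api-key>"                     # hypothetical key

def build_request(column_names, rows):
    """Build a JSON POST request for a request/response scoring endpoint."""
    payload = {
        # Assumed payload layout; verify against the service's own API help.
        "Inputs": {"input1": {"ColumnNames": column_names, "Values": rows}},
        "GlobalParameters": {},
    }
    body = json.dumps(payload).encode("utf-8")
    headers = {
        "Content-Type": "application/json",
        "Authorization": "Bearer " + API_KEY,
    }
    return urllib.request.Request(SCORING_URL, data=body, headers=headers)

# Invented columns for one flight to be scored.
request = build_request(["Carrier", "DepDelay"], [["LX", "12"]])
print(request.get_method(), request.full_url)
# Sending it would be: urllib.request.urlopen(request).read()
```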

15

Demo

~2 million flights

Arrival/Departure Zurich

Set of attributes

Objective: Train a model to predict whether a flight is on time, then publish the trained model to a web application where end users can consume the predictions

16

Logistic Regression

Problem with linear regression:

Min < 0 (predictions can drop below 0)

Max > 1 (predictions can exceed 1)

Fit is bad!

Solution: Logistic Regression:

Min == 0

Max == 1

Regression vs. Classification?
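
The contrast can be sketched on a toy data set: a least-squares line fitted to 0/1 labels produces values outside the 0 to 1 range, while the logistic (sigmoid) function always stays inside it. The data are made up, and the logistic coefficients are hand-picked for illustration rather than fitted.

```python
import math

def fit_line(xs, ys):
    """Ordinary least squares for one feature; returns (slope, intercept)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x

def sigmoid(z):
    """Logistic function: maps any real number into the open interval (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))

# Made-up data: departure delay in minutes -> arrived late (1) or on time (0).
delays = [0, 5, 10, 15, 40, 50, 60, 90]
late = [0, 0, 0, 0, 1, 1, 1, 1]

slope, intercept = fit_line(delays, late)

def linear(x):
    return slope * x + intercept

def logistic(x):
    # Hand-picked coefficients (assumption), not a fitted model.
    return sigmoid(0.2 * (x - 27.5))

for x in (-30, 0, 30, 120):
    print(f"delay={x:>4}: linear={linear(x):.3f}  logistic={logistic(x):.3f}")
```

Because the logistic output can be read as a probability and thresholded into "late" or "on time", logistic regression is used here as a classifier despite its name.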

17

Method

Jupyter

Merge and clean data

Transform the data

Analyze

Azure ML

Choose type of machine learning

Build predictive model

Publish Web Service

Build Web application

Outcome: a simple website predicting whether a flight is on time or not
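
The Jupyter part of the method (merge, clean, transform) can be sketched in plain Python. This is not the notebook from the demo: the field names, the weather lookup, and the 15-minute on-time threshold are all assumptions chosen for illustration.

```python
# Hypothetical raw inputs; times are minutes after midnight, values are made up.
flights = [
    {"flight": "LX100", "airport": "ZRH", "sched": 600, "actual": 610},
    {"flight": "LX200", "airport": "ZRH", "sched": 700, "actual": None},  # missing arrival
    {"flight": "LH300", "airport": "ZRH", "sched": 800, "actual": 835},
]
weather = {"ZRH": {"wind_kmh": 30}}

ON_TIME_THRESHOLD_MIN = 15  # assumed definition of "on time"

def merge(flight_rows, weather_by_airport):
    """Merge: attach the weather attributes of each flight's airport."""
    return [{**f, **weather_by_airport.get(f["airport"], {})} for f in flight_rows]

def clean(rows):
    """Clean: drop rows without a recorded actual time."""
    return [r for r in rows if r["actual"] is not None]

def transform(rows):
    """Transform: derive the binary label the classifier will be trained on."""
    return [
        {**r, "on_time": int(r["actual"] - r["sched"] <= ON_TIME_THRESHOLD_MIN)}
        for r in rows
    ]

prepared = transform(clean(merge(flights, weather)))
for row in prepared:
    print(row["flight"], row["wind_kmh"], row["on_time"])
```

The resulting rows, with one binary label per flight, are the kind of table a two-class model is then trained on.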