Post on 29-May-2020

Azure Big Data & Machine Learning

Matthias Gessenay & Roman A. Kahr

Agenda

Introduction to Azure Data Science Tools

Azure Data Lake

Hadoop

Azure Jupyter Notebooks

Azure Machine Learning Studio

Machine Learning

Regression vs. Classification vs. Neural Network

Sampling Problems

Case: Building a Predictive Web Service with Azure ML

Data Analysis

Implementing the Algorithm & Web Service

Matthias Gessenay

Co-Founder Corporate Software AG

Microsoft Professional Program – Data Science, ITIL Expert, MCSA/E/ITP/A/E

Senior Consultant & Trainer


Roman A. Kahr

Corporate Software AG

Microsoft Data Science Professional

Consultant & Trainer


Corporate Software

Founded 2011 in Biel/Bienne

Microsoft Partner

Gold Cloud Productivity

Gold Collaboration and Content

Gold Project and Portfolio Management

Gold Data Analytics

17 Consultants

Azure Data Science Tools – Big Picture

Bring together all your Data

Exploding Data Volumes

Unstructured Data

No bounds – no cost tradeoff

Improve performance

On-prem infrastructure too slow

Difficult to build a distributed on-prem infrastructure

Cost-intensive

Scalability

Data Lake

Two Parts:

Data Lake Store

Data Lake Analytics

U-SQL

Similar to T-SQL

Range of extensions: R, Python, ...

200x more storage

Pay-as-you-go

1 TB ≈ $35 per year

HDInsight

Commonly known as Hadoop

Principles:

Split work

Split Data from Analytics

Pros of Azure

Scalability

Pay-as-you-go/cost optimization

Fast deployment
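
The "split work" principle can be sketched in a few lines of Python. This is a single-machine illustration of the map/reduce pattern that Hadoop distributes across a cluster, not HDInsight code; the flight records and the helper names `split`, `map_chunk` and `reduce_counts` are made up for the example.

```python
from collections import Counter
from functools import reduce

# Hypothetical records: (airline, delayed?) pairs standing in for a large dataset.
records = [
    ("LX", True), ("LX", False), ("BA", True), ("LH", False),
    ("LX", True), ("BA", False), ("LH", True), ("LH", True),
]

def split(data, n_chunks):
    """Cut the data into roughly equal chunks, one per worker."""
    size = -(-len(data) // n_chunks)  # ceiling division
    return [data[i:i + size] for i in range(0, len(data), size)]

def map_chunk(chunk):
    """Map step: each worker counts delays in its own chunk only."""
    return Counter(airline for airline, delayed in chunk if delayed)

def reduce_counts(left, right):
    """Reduce step: merge two partial results into one."""
    return left + right

# On a cluster the map calls would run in parallel on different nodes.
partials = [map_chunk(chunk) for chunk in split(records, 3)]
delays_per_airline = reduce(reduce_counts, partials, Counter())
print(dict(delays_per_airline))
```

Because each chunk is processed independently, adding workers only changes how many chunks exist, not the logic of the map or reduce step.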

Jupyter Notebook

Virtual instance to run R or Python

Nice interface

Highly performant

Perfectly integrated into the Azure ecosystem

Ideal for presenting an analysis!

Demo (Analyzing the Data)

Machine Learning

“Giving the computer the ability to learn without being explicitly programmed”

Supervised learning

Regression

Unsupervised learning

Clustering, neural nets

Reinforcement learning

AlphaGo

Machine Learning – Sampling Issue

Inductive

Issue before Data Science: Capacity!

Price paid: inaccuracy
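
The trade-off between sample size and inaccuracy can be made concrete with a small simulation. This is a sketch under stated assumptions: the population of flight delays is simulated, and the function name `mean_abs_error` and all numbers are invented for illustration.

```python
import random
import statistics

random.seed(42)  # fixed seed so the run is repeatable

# Hypothetical population: delay in minutes for 100,000 simulated flights.
population = [random.gauss(12.0, 20.0) for _ in range(100_000)]
true_mean = statistics.fmean(population)

def mean_abs_error(sample_size, trials=200):
    """Average distance between a sample mean and the true mean."""
    errors = []
    for _ in range(trials):
        sample = random.sample(population, sample_size)
        errors.append(abs(statistics.fmean(sample) - true_mean))
    return statistics.fmean(errors)

small = mean_abs_error(50)     # what limited capacity forces you to use
large = mean_abs_error(5_000)  # what more capacity makes affordable
print(f"n=50: {small:.2f} min off, n=5000: {large:.2f} min off")
```

The smaller the sample, the further its estimate tends to sit from the true value, which is the inaccuracy paid for limited capacity.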

Problems with an on-prem solution

Say you have a performant working algorithm

How do you consume the data?

Request/Response API

How do you handle additional data?

How do you manage the costs?

Azure Machine Learning Studio

Free (for the moment)

Unlimited computing power

Prewritten modules

Possibility to use R code

Existing API to consume a trained model

Prewritten Web Applications are shared
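
Consuming a trained model through a request/response API might look roughly like the sketch below. The endpoint URL, the API key, the column names and the exact payload layout are all assumptions made for illustration; the real values and schema should be copied from the published web service's own API help page. The request is only built here, not sent.

```python
import json
import urllib.request

# Placeholders: the real values come from the web service's own dashboard.
SCORING_URL = "https://example.invalid/score"  # hypothetical endpoint
API_KEY = "<your-api-key>"                     # hypothetical key

def build_request(columns, row):
    """Build (but do not send) a JSON POST request for one flight."""
    # Assumed payload shape; check it against the service's documented schema.
    payload = {
        "Inputs": {"input1": {"ColumnNames": columns, "Values": [row]}},
        "GlobalParameters": {},
    }
    return urllib.request.Request(
        SCORING_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Content-Type": "application/json",
            "Authorization": "Bearer " + API_KEY,
        },
        method="POST",
    )

# Column names are made up for illustration.
req = build_request(["Carrier", "DepHour", "Weekday"], ["LX", "7", "1"])
print(req.get_method(), req.full_url)
print(req.data.decode("utf-8"))
# Sending would be: urllib.request.urlopen(req).read()
```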

Demo

~2 million flights

Arrivals/departures at Zurich

Set of attributes

Objective: Train a model to predict whether a flight is on time, then publish the trained model to a web application where end users can consume the predictions

Logistic Regression

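Problem with linear regression on a yes/no outcome:

Predictions can fall below 0 (Min < 0)

Predictions can exceed 1 (Max > 1)

Fit is bad!

Solution: Logistic Regression

Predictions stay between 0 and 1 (Min == 0, Max == 1)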

Regression vs. Classification?
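
The difference between the two fits can be checked numerically. The sketch below fits a straight line to a 0/1 outcome with ordinary least squares and compares it with a logistic curve; the data points are invented, and the logistic weights `w0` and `w1` are hand-picked rather than fitted, so this only illustrates the bounds, not a trained model.

```python
import math

# Hypothetical data: inbound delay in minutes (negative = early)
# and whether the next departure left on time (1) or not (0).
x = [0, 5, 10, 15, 20, 40, 60, 80, 100, 120]
y = [1, 1, 1, 1, 1, 0, 0, 0, 0, 0]

# Ordinary least squares for a straight line y = a + b*x.
n = len(x)
mean_x, mean_y = sum(x) / n, sum(y) / n
sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
sxx = sum((xi - mean_x) ** 2 for xi in x)
b = sxy / sxx
a = mean_y - b * mean_x

def linear(xi):
    """Straight-line prediction; nothing keeps it inside [0, 1]."""
    return a + b * xi

def logistic(xi, w0=3.0, w1=-0.1):
    """Sigmoid of a linear score; w0 and w1 are hand-picked, not fitted."""
    return 1.0 / (1.0 + math.exp(-(w0 + w1 * xi)))

for xi in (-30, 0, 30, 120, 200):
    print(f"x={xi:>4}  linear={linear(xi):+.2f}  logistic={logistic(xi):.3f}")
```

The linear column leaves the 0 to 1 range at the extremes, so it cannot be read as a probability, while the logistic column always can.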

Method

Jupyter

Merge and clean data

Transform the data

Analyze

Azure ML

Choose type of machine learning

Build predictive model

Publish Web Service

Build Web application

Outcome: a simple website that predicts whether a flight is on time
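
The steps above can be sketched end to end in plain Python. This is a stand-in for the workflow, not the notebook or the Azure ML experiment from the demo: the rows are invented, the feature choice is an assumption, and the logistic regression is trained with a hand-written gradient descent instead of a prewritten module.

```python
import math

# Hypothetical raw rows: (carrier, scheduled hour, inbound delay in min, on time?)
raw = [
    ("LX", 7, 0, 1), ("LX", 8, 5, 1), ("BA", 9, None, 1), ("LH", 17, 45, 0),
    ("LX", 18, 60, 0), ("BA", 12, 10, 1), ("LH", 19, 90, 0), ("LX", 6, 2, 1),
    ("BA", 20, 75, 0), ("LH", 10, 8, 1),
]

# 1. Clean: drop rows with missing values.
rows = [r for r in raw if None not in r]

# 2. Transform: scale features to roughly 0..1 so gradient descent behaves.
def features(hour, delay):
    return [1.0, hour / 24.0, delay / 100.0]  # leading 1.0 is the intercept

X = [features(h, d) for _, h, d, _ in rows]
y = [label for *_, label in rows]

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def predict_proba(w, x):
    return sigmoid(sum(wi * xi for wi, xi in zip(w, x)))

# 3. Train: logistic regression with plain batch gradient descent.
w = [0.0, 0.0, 0.0]
for _ in range(5000):
    grad = [0.0] * len(w)
    for xi, yi in zip(X, y):
        err = predict_proba(w, xi) - yi
        for j, xij in enumerate(xi):
            grad[j] += err * xij
    w = [wj - 0.5 * gj / len(X) for wj, gj in zip(w, grad)]

# 4. "Publish": a function a web application could call.
def is_on_time(hour, delay):
    return predict_proba(w, features(hour, delay)) >= 0.5

print(is_on_time(8, 3), is_on_time(19, 75))
```

In the workflow from the talk, steps 1 and 2 would happen in Jupyter and steps 3 and 4 in Azure ML, with the web application calling the published service instead of a local function.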