Azure Big Data & Machine Learning
Matthias Gessenay & Roman A. Kahr
2
Agenda
Introduction to Azure Data Science Tools
Azure Data Lake
Hadoop
Azure Jupyter Notebooks
Azure Machine Learning Studio
Machine Learning
Regression vs. Classification vs. Neural Network
Sampling Problems
Case: Building a Predictive Web Service with Azure ML
Data Analysis
Implementation Algorithm & Web Service
3
Matthias Gessenay
Co-Founder Corporate Software AG
Microsoft Professional Program – Data Science, ITIL Expert, MCSA/E/ITP/A/E
Senior Consultant & Trainer
4
Roman A. Kahr
Corporate Software AG
Microsoft Data Science Professional
Consultant & Trainer
5
Corporate Software
Founded 2011 in Biel/Bienne
Microsoft Partner
Gold Cloud Productivity
Gold Collaboration and Content
Gold Project and Portfolio Management
Gold Data Analytics
17 Consultants
6
Azure Data Science Tools – Big Picture
Bring together all your Data
Exploding Data Volumes
Unstructured Data
No bounds – no cost tradeoff
Improve performance
On-prem infrastructure too slow
Difficult to build a distributed on-prem infrastructure
Cost-intensive
Scalability
7
Data Lake
8
Data Lake
Two Parts:
Data Lake Store
Data Lake Analytics
U-SQL
Similar to T-SQL
Range of extensions: R, Python, ...
200x more storage
Pay-as-you-go
1 TB ~ $35 p.a.
9
HDInsight
Known as Hadoop outside Azure
Principles:
Split work
Split Data from Analytics
Pros of Azure
Scalability
Pay-as-you-go/cost optimization
Fast deployment
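
The "split work" principle can be sketched in plain Python as a map step over chunks followed by a reduce step. This is only an illustration, not HDInsight code: the records, the chunk count, and the function names are all assumptions, and on a real cluster the map step would run in parallel on separate nodes.

```python
from collections import Counter
from functools import reduce

# Hypothetical records: (airline code, was the flight delayed?) pairs.
records = [("LX", True), ("LX", False), ("BA", True), ("LH", False),
           ("LX", True), ("BA", False), ("LH", True), ("LX", False)]

def split(data, n_chunks):
    """Split the data into roughly equal chunks, one per worker."""
    size = -(-len(data) // n_chunks)  # ceiling division
    return [data[i:i + size] for i in range(0, len(data), size)]

def map_chunk(chunk):
    """Map step: each worker counts delays in its own chunk only."""
    return Counter(airline for airline, delayed in chunk if delayed)

def reduce_counts(a, b):
    """Reduce step: merge two partial results into one."""
    return a + b

chunks = split(records, 3)
partials = [map_chunk(c) for c in chunks]  # would run in parallel on a cluster
delays = reduce(reduce_counts, partials, Counter())
print(dict(delays))
```

Because each chunk is processed independently, adding workers only changes how many chunks are produced, not the map or reduce logic.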
10
Jupyter Notebook
Virtual instance to run R or Python
Nice interface
Highly performant
Perfectly integrated into the Azure ecosystem
Perfect to make presentations of analysis!
Demo (Analyzing the Data)
11
Machine Learning
“Giving the computer the ability to learn without being explicitly programmed”
Supervised learning
Regression
Unsupervised learning
Clustering, neural nets
Reinforcement learning
AlphaGo
12
Machine Learning – Sampling Issue
Inductive
Issue before Data Science: Capacity!
Price paid: inaccuracy
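
The trade-off between capacity and accuracy can be illustrated with a small simulation. This is a sketch on synthetic data: the 20% delay rate, the sample sizes, and the function name are assumptions chosen for the example, not figures from the talk.

```python
import random

random.seed(42)  # fixed seed so the sketch is reproducible

# Synthetic population: 1 = delayed flight, 0 = on time (20% delayed by construction).
population = [1] * 20_000 + [0] * 80_000
true_rate = sum(population) / len(population)

def sampled_rate(data, n):
    """Estimate the delay rate from a random sample of size n (without replacement)."""
    sample = random.sample(data, n)
    return sum(sample) / n

# Compare a small sample, a larger sample, and the full data set.
for n in (100, 10_000, len(population)):
    estimate = sampled_rate(population, n)
    error = abs(estimate - true_rate)
    print(f"n={n:>7}: estimate={estimate:.4f}, error={error:.4f}")
```

Only the run over the full population is guaranteed to reproduce the true rate; smaller samples need less capacity but carry a sampling error.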
13
Problems with using an on-prem solution
Say you have a performant working algorithm
How do you consume the data?
Request/Response API
How are you working with additional data?
How do you manage the costs?
14
Azure Machine Learning Studio
Free (for the moment)
Unlimited computing power
Prewritten modules
Possibility to use R code
Existing API to consume a trained model
Prewritten Web Applications are shared
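
A request/response exchange with a published model could look roughly like the sketch below. Everything here is hypothetical: the JSON field names (`inputs`, `results`, `on_time`, `dep_delay_min`) are invented for illustration and are not the Azure ML Studio schema, and `fake_service` is a local stand-in with a hard-coded rule rather than a trained model or a real HTTP endpoint.

```python
import json

def build_request(flight):
    """Serialize one flight's attributes into a JSON request body (hypothetical schema)."""
    return json.dumps({"inputs": [flight]})

def fake_service(request_body):
    """Local stand-in for the published web service; NOT a real model.

    It marks a flight as on time when the departure delay is at most 15 minutes.
    """
    flights = json.loads(request_body)["inputs"]
    results = [{"on_time": f["dep_delay_min"] <= 15} for f in flights]
    return json.dumps({"results": results})

def predict_on_time(flight):
    """Send one request and extract the prediction from the response."""
    response_body = fake_service(build_request(flight))
    return json.loads(response_body)["results"][0]["on_time"]

print(predict_on_time({"carrier": "LX", "dep_delay_min": 5}))
print(predict_on_time({"carrier": "LX", "dep_delay_min": 40}))
```

In a real deployment, `fake_service` would be replaced by an authenticated HTTP call to the endpoint that the service publishes.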
15
Demo
~2 million flights
Arrival/Departure Zurich
Set of attributes
Objective: Train a model to predict whether a plane is on time, then publish the trained model to a web application where the end user can consume the predictions
16
Logistic Regression
Problem with linear regression:
Predictions can fall below 0 (min < 0)
Predictions can exceed 1 (max > 1)
Fit is bad!
Solution: logistic regression:
Output is bounded below by 0
Output is bounded above by 1
Regression vs. Classification?
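
The difference in output range can be checked numerically. The sketch below compares a straight line with a logistic (sigmoid) curve; the coefficients 0.02 and 0.08 and the input values are arbitrary assumptions, not fitted to the flight data.

```python
import math

def linear_model(x):
    """Straight-line 'probability'; nothing keeps it inside [0, 1]."""
    return 0.5 + 0.02 * x

def logistic_model(x):
    """Sigmoid of a linear score; the output always lies strictly between 0 and 1."""
    return 1.0 / (1.0 + math.exp(-0.08 * x))

# x could be, for example, minutes of departure delay relative to some reference point.
for x in (-60, -10, 0, 10, 60):
    print(f"x={x:>4}: linear={linear_model(x):+.2f}, logistic={logistic_model(x):.3f}")
```

For extreme inputs the linear model leaves the valid probability range, while the logistic output only approaches 0 and 1. Thresholding that output (for example at 0.5) turns the regression into a classifier.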
17
Method
Jupyter
Merge and clean data
Transform the data
Analyze
Azure ML
Choose type of machine learning
Build predictive model
Publish Web Service
Build Web application
Outcome: simple website predicting whether a flight is on time or not
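
The Jupyter part of the method (merge and clean, transform, analyze) can be sketched in plain Python. All data, field names, and the 15-minute on-time tolerance are assumptions made for this example; the actual demo data set and its attributes are not shown in the slides.

```python
# Hypothetical raw data; times are minutes since midnight.
flights = [
    {"carrier": "LX", "sched": 600, "actual": 605},
    {"carrier": "BA", "sched": 615, "actual": 650},
    {"carrier": "LX", "sched": 700, "actual": None},  # missing arrival time
    {"carrier": "LH", "sched": 720, "actual": 722},
]
carriers = {"LX": "Swiss", "BA": "British Airways", "LH": "Lufthansa"}

ON_TIME_TOLERANCE_MIN = 15  # assumed threshold for "on time"

def merge_and_clean(rows, lookup):
    """Join in the carrier name and drop rows with a missing arrival time."""
    return [dict(r, carrier_name=lookup[r["carrier"]])
            for r in rows if r["actual"] is not None]

def transform(rows):
    """Derive the delay in minutes and the binary label the model should predict."""
    out = []
    for r in rows:
        delay = r["actual"] - r["sched"]
        out.append(dict(r, delay=delay, on_time=delay <= ON_TIME_TOLERANCE_MIN))
    return out

def analyze(rows):
    """Share of on-time flights, a first sanity check before modelling."""
    return sum(r["on_time"] for r in rows) / len(rows)

prepared = transform(merge_and_clean(flights, carriers))
on_time_share = analyze(prepared)
print(f"{len(prepared)} usable flights, {on_time_share:.0%} on time")
```

The `on_time` column produced here is the kind of label a two-class model would then be trained on in the Azure ML step.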