29/10/2015

Data Platform Airlift

21 October \\ Microsoft Lisbon Experience

Machine Learning with R as a Service

Manuel Dias

Business Analytics Lead, Microsoft

manuel.dias@microsoft.com

Predictive Analytics

Sales and marketing

Finance and risk

Customer and channel

Operations and workforce

Utilities, Oil & Gas

Agent Allocation

Warehouse Efficiency

Smart buildings

Predictive Maintenance

Supply chain optimization

User Segmentation

Personalized Offers

Product Recommendation

Fraud Detection

Credit risk management

Sales Forecasting

Demand Forecasting

Sales Lead Scoring

Marketing mix optimization

Energy Forecasting

Grid Optimization

Theft Prevention

Predictive maintenance

Demand Response

Customer Profiling

Credit Scoring

Revenue Forecasting

Harvard Business Review, Thomas H. Davenport, October 2012

What is Machine Learning?

Predictive computing systems become smarter with experience

We want to learn a mapping from the input to the output; correct values are provided by a supervisor:
• Fraud Detection
• Image Recognition
• SPAM Filter
• Sales Forecast
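As a minimal sketch of learning a mapping from input to output when a supervisor provides the correct values, the pure-Python 1-nearest-neighbor classifier below labels a new case with the label of its closest training example. The toy fraud-style data, the feature meanings, and the function name are illustrative assumptions, not taken from the talk.

```python
import math

def nearest_neighbor_predict(train_x, train_y, query):
    """Return the label of the training example closest to `query`."""
    best_index = min(
        range(len(train_x)),
        key=lambda i: math.dist(train_x[i], query),  # Euclidean distance
    )
    return train_y[best_index]

# Hypothetical labelled data: (amount in thousands, transactions per hour).
# The labels play the role of the "correct values provided by a supervisor".
train_x = [(0.1, 1), (0.2, 2), (0.3, 1), (5.0, 20), (6.5, 25), (7.0, 30)]
train_y = ["legit", "legit", "legit", "fraud", "fraud", "fraud"]

print(nearest_neighbor_predict(train_x, train_y, (0.25, 2)))
print(nearest_neighbor_predict(train_x, train_y, (6.0, 22)))
```

The features here are on different scales, so a real model would usually normalize them first; the sketch skips that to stay short.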

We want to find regularities in the data. The class labels of the training data are unknown.
• Customer Segmentation
• Movie recommendation engine
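To illustrate finding regularities when no class labels are available, here is a small k-means sketch (Lloyd's algorithm) in pure Python that groups customers into segments. The customer data, the choice of k = 2, and the deterministic initialization from the first k points are assumptions made for this example only.

```python
import math

def k_means(points, k, iterations=10):
    """Cluster points with Lloyd's algorithm; returns (centroids, labels)."""
    # Deterministic start for this sketch: use the first k points as centroids.
    centroids = [points[i] for i in range(k)]
    labels = [0] * len(points)
    for _ in range(iterations):
        # Assignment step: attach each point to its nearest centroid.
        labels = [
            min(range(k), key=lambda c: math.dist(p, centroids[c]))
            for p in points
        ]
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:  # keep the old centroid if a cluster is empty
                centroids[c] = tuple(sum(v) / len(members) for v in zip(*members))
    return centroids, labels

# Hypothetical customers: (visits per month, average basket value).
customers = [(1, 10), (2, 12), (1, 11), (9, 80), (10, 85), (8, 90)]
centroids, labels = k_means(customers, k=2)
print(labels)
print(centroids)
```

No labels are passed in: the two segments emerge only from the distances between cases.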

Azure ML

• Classification: predict what class a case belongs to
• Regression: predict numerical outcomes
• Anomaly Detection: identify outliers in the running data
• Recommenders: explore associations between cases
• Clustering: discover natural groupings of cases

Supervised Learning / Unsupervised Learning

• R and Python
• Mathematical Programming
• Online analytical processing
• Graph analytics
• Text analytics
• Support Vector Machines
• Boosted Decision Trees
• Time series processing
• In the future: support for extensibility by enabling users to add their own algorithms as modules
• Associative rule mining
• Neural networks
• Regression analysis
• Clustering
• Nearest-neighbor

Azure Machine Learning

DATA

HDInsight

SQL Server VM

SQL Database

Blobs & Tables

Desktop files

Excel spreadsheet

Other data files on PC

Azure Machine Learning

ML Studio

Azure Machine Learning

ML Marketplace

Devices & Applications

Publish API

Get Historical Data
Feature Engineering
Train/Test Split
Define Model
Train Model
Score Model
Evaluate Model
Iterate until the test metrics are satisfactory
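The modelling steps named on this slide (historical data, train/test split, define and train a model, score, evaluate) can be sketched in a few lines of plain Python. This is only an illustration under stated assumptions: the data set is invented, the "model" is a single threshold on one feature, and all function names are hypothetical rather than Azure ML modules.

```python
import random

def split_train_test(rows, test_fraction=0.25, seed=0):
    """Shuffle rows reproducibly and split them into train and test sets."""
    shuffled = rows[:]
    random.Random(seed).shuffle(shuffled)
    cut = int(len(shuffled) * (1 - test_fraction))
    return shuffled[:cut], shuffled[cut:]

def score(threshold, x):
    """Score one case: predict 1 when the feature reaches the threshold."""
    return 1 if x >= threshold else 0

def accuracy(rows, threshold):
    """Evaluate: fraction of rows whose predicted label matches the true label."""
    hits = sum(1 for x, y in rows if score(threshold, x) == y)
    return hits / len(rows)

def train_threshold_model(train_rows):
    """Train: pick the cut-off value that best fits the training rows."""
    candidates = sorted({x for x, _ in train_rows})
    return max(candidates, key=lambda t: accuracy(train_rows, t))

# Hypothetical historical data: (years of education, 1 if income is high else 0).
history = [(8, 0), (9, 0), (10, 0), (11, 0), (12, 0), (12, 1),
           (13, 0), (14, 1), (15, 1), (16, 1), (17, 1), (18, 1)]

train_rows, test_rows = split_train_test(history)
model = train_threshold_model(train_rows)
print("threshold:", model)
print("train accuracy:", accuracy(train_rows, model))
print("test accuracy:", accuracy(test_rows, model))
```

In practice the define, train, and evaluate steps would be repeated with other features or model types until the test metric is acceptable, which is the iteration the slide refers to.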

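Confusion matrix (rows: actual class, columns: predicted class)

            Predicted 1                 Predicted 0
Actual 1    506 TRUE POSITIVE (TP)      112 FALSE NEGATIVE (FN)
Actual 0    169 FALSE POSITIVE (FP)     420 TRUE NEGATIVE (TN)

$\text{Accuracy} = \frac{TP + TN}{P + N}$

Sometimes a better model may have lower Accuracy!

$\text{Precision} = \frac{TP}{TP + FP}$

How many of the returned documents are correct

$\text{Recall} = \frac{TP}{TP + FN} = \frac{TP}{P}$

How many of the actual positives are returned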
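As a worked example, the sketch below applies the three formulas to the counts shown in the confusion matrix above (TP = 506, FN = 112, FP = 169, TN = 420). The second part uses a hypothetical imbalanced data set, not from the talk, to illustrate why a more useful model can have lower accuracy.

```python
def classification_metrics(tp, fn, fp, tn):
    """Compute accuracy, precision and recall from confusion-matrix counts."""
    positives = tp + fn   # P: all actual positives
    negatives = fp + tn   # N: all actual negatives
    return {
        "accuracy": (tp + tn) / (positives + negatives),
        "precision": tp / (tp + fp),
        "recall": tp / positives,
    }

# Counts from the confusion matrix on the slide.
slide = classification_metrics(tp=506, fn=112, fp=169, tn=420)
print({name: round(value, 3) for name, value in slide.items()})

# Hypothetical imbalanced case: 10 positives and 990 negatives.
# A model that always predicts 0 gets every negative right and no positive.
always_negative = {"accuracy": (0 + 990) / 1000, "recall": 0 / 10}
# A model that finds 8 of the 10 positives at the cost of 30 false alarms.
useful = classification_metrics(tp=8, fn=2, fp=30, tn=960)
print(always_negative["accuracy"], useful["accuracy"])
print(always_negative["recall"], useful["recall"])
```

The always-negative model scores higher on accuracy yet never detects a positive case, which is why precision and recall are reported alongside accuracy.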

Demo 1: Building a model to predict Population Income

A Language Platform…

A Community…

Tools & Resources: http://www.rstudio.com/

Core R: http://cran.r-project.org/

R was ranked no. 1 in the KDnuggets 2014 poll on Top Languages for analytics, data mining, data science

Classification

Decision trees: rpart, party

Random forest: randomForest, party

SVM: e1071, kernlab

Neural networks: nnet, neuralnet, RSNNS

Performance evaluation: ROCR

Clustering

k-means: kmeans(), kmeansruns()

k-medoids: pam(), pamk()

Hierarchical clustering: hclust(), agnes(), diana()

DBSCAN: fpc

BIRCH: birch

Cluster validation: packages clv, clValid, NbClust

Association Rules

Association rules: apriori(), eclat() in package arules

Sequential patterns: arulesSequences

Visualisation of associations: arulesViz

Text Mining

Text mining: tm

Topic modelling: topicmodels, lda

Word cloud: wordcloud

Twitter data access: twitteR

Execute R Script

Create R Model

Demo 2: Building a model to predict Population Income in R and deploying it in Azure ML

Predictive Analysis

Usage files

Data Sources

Ingest & Pre-processing

Data Preparation (normalize, clean, etc)

Analyze & Score (build Predictive Model)

Publish for Consumption

Consume

Cloud Storage

batch

Processing Engine

Machine Learning

Data Cleanup

Relational DW/DM

BI Tools

- Volume per move is typically ~100GB or
- Data is most commonly collected nightly or hourly
- Common Pre-processing steps: scrub for compliance purposes & partition for long term storage
- HDI & Customer code used in this step as a transformation/cleaning tool
- E.g. enrich, normalize

ADF: Move Data, Orchestrate, Schedule & Monitor

- Generate BI-Ready results (e.g. dims or facts, aggregated big data, etc)
- Create result set to drive app or business process (e.g. list of customers likely to churn next month)

- In this scenario, used as a queryable storage system for information workers and analysts to connect their BI tools to.
- BI Tools: Power BI, Tableau, etc.
- Apps here means any programmatic consumption

Business Apps

Customer Info

Real time
