Predicting the Future with Azure Machine Learning

Post on 05-Apr-2017

Category:

Data & Analytics

TRANSCRIPT

Predicting the Future with Azure Machine Learning

1

Presenter
Paul Prae
Consultant, Slalom Consulting
B.A. in Cognitive Science with a Focused Foundation in Artificial Intelligence
B.S. in Computer Science with an Area of Emphasis in Artificial Intelligence
@Praeducer
www.paulprae.com

2

The Why. Mention the $40B industry figure from the New York Times article. By 2020, the market for machine learning applications will reach $40 billion, IDC, a market research firm, estimates. And 60 percent of those applications, the firm predicts, will run on the platform software of four companies: Amazon, Google, IBM and Microsoft.
http://www.nytimes.com/2016/03/26/technology/the-race-is-on-to-control-artificial-intelligence-and-techs-future.html?_r=0

AlphaGo example: machine learning can now tackle problems once considered intractable.

"Machine learning is the next Internet" (Tony Tether, Director, DARPA)
https://www.thenewswire.com/client_files/tnwiHW6WU.png

3

What is Machine Learning?
The field of study that gives computers the ability to learn without being explicitly programmed.

[Diagram labels: Machine Learning Algorithm, Data, Output, Program]

What is machine learning?

+ Automating automation
+ Getting computers to program themselves
+ Writing software is the bottleneck
+ Let the data do the work instead

https://en.wikipedia.org/wiki/Book:Machine_Learning_%E2%80%93_The_Complete_Guide

4

What is Machine Learning?
Unsupervised learning is the machine learning task of inferring a function to describe hidden structure from unlabeled data.
Supervised learning is the machine learning task of inferring a function from labeled training data.
Reinforcement learning is an area of machine learning inspired by behaviorist psychology, concerned with how software agents ought to take actions in an environment so as to maximize some notion of cumulative reward.

It is a series of tasks. Machine learning tasks are typically classified into three broad categories, depending on the nature of the learning "signal" or "feedback" available to a learning system. Today, we will focus on supervised learning.

+ Supervised (inductive) learning: training data includes desired outputs
+ Unsupervised learning: training data does not include desired outputs
+ Reinforcement learning: rewards from a sequence of actions

Links:
+ https://en.wikipedia.org/wiki/Supervised_learning
+ https://en.wikipedia.org/wiki/Unsupervised_learning
+ https://en.wikipedia.org/wiki/Reinforcement_learning

5

Supervised Learning

Class discussion about supervised learning examples.

Supervised learning can be used for prediction. What common predictions can you think of? Examples: the weather, or home loan approval based on credit history.

+ https://en.wikipedia.org/wiki/Supervised_learning

6

Prediction with Supervised Learning

Use the weather example to explain the vocab. Label is rain or shine. Input is temp, wind, humidity, amount of rain etc. from the last week.

A feature is an individual measurable property of a phenomenon being observed.
Labeled data typically takes a set of unlabeled data and augments each piece of that unlabeled data with some sort of meaningful "tag," "label," or "class" that is somehow informative or desirable to know.
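
The weather vocabulary can be sketched in a few lines of Python. This is only an illustration: the rows, the column names (`temp_f`, `wind_mph`, `humidity_pct`, `rain_last_week_in`), and the helper `split_features_and_labels` are all made up for this example and do not come from the deck.

```python
# Hypothetical weather observations: one row per day.
# Every column except "label" is a feature; "label" is what we want to predict.
observations = [
    {"temp_f": 71, "wind_mph": 5, "humidity_pct": 40, "rain_last_week_in": 0.0, "label": "shine"},
    {"temp_f": 58, "wind_mph": 14, "humidity_pct": 85, "rain_last_week_in": 1.2, "label": "rain"},
    {"temp_f": 64, "wind_mph": 9, "humidity_pct": 70, "rain_last_week_in": 0.4, "label": "rain"},
]

FEATURE_NAMES = ["temp_f", "wind_mph", "humidity_pct", "rain_last_week_in"]


def split_features_and_labels(rows, feature_names, label_name="label"):
    """Separate labeled data into a feature matrix X and a label list y."""
    X = [[row[name] for name in feature_names] for row in rows]
    y = [row[label_name] for row in rows]
    return X, y


X, y = split_features_and_labels(observations, FEATURE_NAMES)
```

Unlabeled data would be the same rows without the `label` column; supervised learning needs both `X` and `y`.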

+ https://en.wikipedia.org/wiki/Feature_(machine_learning)
+ http://stackoverflow.com/questions/19170603/what-is-the-difference-between-labeled-and-unlabeled-data
+ http://www.nltk.org/images/supervised-classification.png

7

What is Predictive Analytics then?
Supervised learning is a technique for performing predictive analytics.

Most organizations are at a point where they can visualize and analyze historical data to understand what has happened. There is much more we can gain from data though.

Predictive analytics encompasses a variety of statistical techniques that analyze current and historical facts to make predictions about future or otherwise unknown events.
+ https://en.wikipedia.org/wiki/Predictive_analytics
+ http://blogs.gartner.com/it-glossary/files/2012/11/analytic-maturity.jpg

8

Supervised Learning vs. Predictive Analytics
Supervised learning is the machine learning task of inferring a function from labeled training data.
Predictive analytics encompasses a variety of statistical techniques that analyze current and historical facts to make predictions about future or otherwise unknown events.

They can be the same thing. Supervised learning is a predictive analytics technique. In my experience, people with a background in business tend to say predictive analytics, while people with a background in science or engineering tend to say machine learning. Notice how the definition of predictive analytics is friendlier to a general audience.
+ https://en.wikipedia.org/wiki/Supervised_learning
+ https://en.wikipedia.org/wiki/Predictive_analytics

9

Here is an example of some data about an individual. This will be an input to our algorithm. These are our features.

https://ramblejim2.files.wordpress.com/2010/03/scan0001.jpg

10

Can you guess what we will predict? As we all know, it didn't end well for the passengers.

http://www.titanicfacts.net/titanic-wreck.html

11

Classification with a Decision Tree

Given the information we knew at the time the passengers boarded, could we have predicted the outcome? Classification is a type of machine learning that can help us do this.

Classification is the problem of identifying to which of a set of categories (sub-populations) a new observation belongs, on the basis of a training set of data containing observations (or instances) whose category membership is known.

Decision tree learning uses a decision tree as a predictive model which maps observations about an item to conclusions about the item's target value.

Machine learning can take this concept and automate it at a larger scale than is reasonable for a human. It can take deep and wide datasets to create the trees. It can iterate on the tree, breaking it apart and putting it back together, thousands of times until it gets the best prediction accuracy. It can even build thousands of these trees and then have them compete or vote to find the best answer to a problem.
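
To make the idea concrete, here is a minimal sketch of the simplest possible decision tree: a one-level tree (a "decision stump") that picks the single feature whose values best separate the labels. The passenger rows are invented for illustration and are not the real Titanic manifest, and the helpers `fit_stump` and `predict` are hypothetical names. The forests of voting trees mentioned above would repeat this kind of fitting many times on resampled data and take a majority vote, which is not shown here.

```python
from collections import Counter, defaultdict

# Made-up passenger records for illustration only.
passengers = [
    {"sex": "female", "pclass": 1, "survived": 1},
    {"sex": "female", "pclass": 2, "survived": 1},
    {"sex": "female", "pclass": 3, "survived": 1},
    {"sex": "female", "pclass": 3, "survived": 0},
    {"sex": "male", "pclass": 1, "survived": 0},
    {"sex": "male", "pclass": 2, "survived": 0},
    {"sex": "male", "pclass": 3, "survived": 0},
    {"sex": "male", "pclass": 3, "survived": 0},
]


def majority(labels):
    """Return the most common label in a list."""
    return Counter(labels).most_common(1)[0][0]


def fit_stump(rows, features, target):
    """Pick the feature whose per-value majority vote classifies the most rows correctly."""
    best = None
    for feature in features:
        groups = defaultdict(list)
        for row in rows:
            groups[row[feature]].append(row[target])
        # One leaf per feature value: predict the majority label seen for that value.
        rule = {value: majority(labels) for value, labels in groups.items()}
        correct = sum(rule[row[feature]] == row[target] for row in rows)
        if best is None or correct > best["correct"]:
            best = {"feature": feature, "rule": rule, "correct": correct}
    # Fallback prediction for feature values never seen in training.
    best["default"] = majority([row[target] for row in rows])
    return best


def predict(stump, row):
    """Classify one row with the fitted stump."""
    return stump["rule"].get(row[stump["feature"]], stump["default"])


stump = fit_stump(passengers, ["sex", "pclass"], "survived")
```

On this toy data the stump splits on `sex`, because that single question classifies 7 of the 8 made-up rows correctly. A full decision tree learner would keep splitting each branch on further features.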

Classification: https://en.wikipedia.org/wiki/Statistical_classification
Decision trees: https://en.wikipedia.org/wiki/Decision_tree
Decision tree learning: https://en.wikipedia.org/wiki/Decision_tree_learning

12

The Machine Learning Process

As with many other fields, data scientists and machine learning experts have developed standard workflows and procedures. Whether an organization uses Azure ML or another approach, the basic process of machine learning is much the same. The machine learning process starts with raw data and ends up with a model derived from that data.

13

What is Azure Machine Learning?
Azure Machine Learning provides tools for creating complete predictive analytics solutions in the cloud: quickly create, test, operationalize, and manage predictive models.
Microsoft Azure Machine Learning Studio is a collaborative, interactive tool you can use to build, test, and deploy predictive analytics solutions on your data.
You drag and drop datasets and analysis modules onto an interactive canvas, connecting them together to form an experiment, which you run in Machine Learning Studio.

https://azure.microsoft.com/en-us/documentation/articles/machine-learning-what-is-ml-studio/

14

15

Why Azure Machine Learning?
+ Cloud-based: Minimal set-up costs with the ability to easily scale compute/storage capacity; fewer barriers to entry
+ Data Integration: Easy to integrate data from various data sources
+ Common Toolset: Users can collaborate in a common toolset to build and train models using advanced algorithms
+ Deployment simplicity: Easy to deploy trained models as consumable web services

16

https://azure.microsoft.com/en-us/documentation/articles/machine-learning-what-is-machine-learning/

Azure Machine Learning provides tools for creating complete predictive analytics solutions in the cloud: quickly create, test, operationalize, and manage predictive models. Azure ML provides a graphical tool for managing the machine learning process, a set of data preprocessing modules, a set of machine learning algorithms, and an API to expose a model to applications.

17

Data Time

Walk through the data and its source.

Searched for data here: http://www.healthdata.gov/dataset/1992-through-2010-treatment-episode-data-set-admissions-teds

Learned about this data set and downloaded all resources here: http://www.icpsr.umich.edu/icpsrweb/ICPSR/series/00238

18

How can I know, at the time of admission, if a new patient will successfully complete their substance abuse treatment plan?

A question we can answer with machine learning.

Dr. Satya (pretend stakeholder) runs a big hospital and needs to identify which types of patients may need to go through alternate programs.

19

This is a great place to find data that your organization can use, whether you do not have much data yourself or simply want to complement your own.

http://www.healthdata.gov/

20

21

I wanted to find data similar to what I am collecting here at DBHDD. I wanted information on the individual level that had some future outcome I could predict with historical data. In this case, we have data from when patients are admitted to a hospital for substance abuse issues. We also have data for when they are discharged. When I saw this, I knew I could try and predict the discharge data with the admission data.

http://www.icpsr.umich.edu/icpsrweb/ICPSR/series/00238

22

Another benefit of this data set was its fantastic documentation.

23

It had all kinds of information about each individual. For example, we knew the general age range of each patient admitted.

24

Some of the columns had lots of missing values, so I decided not to use them for the prediction. My threshold was 25%: if more than 25% of a column's values were missing, I excluded that column.
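
The 25% rule can be sketched as a small filter. This is an assumption-laden illustration, not the Azure ML module used in the demo: the records and column names are invented, `None` stands in for a missing value, and `columns_to_keep` is a hypothetical helper.

```python
def missing_fraction(rows, column):
    """Fraction of rows in which the column is missing (None or absent)."""
    missing = sum(1 for row in rows if row.get(column) is None)
    return missing / len(rows)


def columns_to_keep(rows, columns, threshold=0.25):
    """Keep a column unless more than `threshold` of its values are missing."""
    return [c for c in columns if missing_fraction(rows, c) <= threshold]


# Hypothetical admission records; None marks a missing value.
admissions = [
    {"age_group": "25-29", "employment": "unemployed", "pregnant": None},
    {"age_group": "30-34", "employment": None, "pregnant": None},
    {"age_group": "18-20", "employment": "full time", "pregnant": "no"},
    {"age_group": "40-44", "employment": "part time", "pregnant": None},
]

kept = columns_to_keep(admissions, ["age_group", "employment", "pregnant"])
```

Here `employment` is missing in exactly 25% of rows, which is not above the threshold, so it stays; `pregnant` is missing in 75% of rows and is dropped.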

25

This is what I decided to predict (it is our label). It is the outcome of the treatment. I bucketed all of these categories into two categories: successful completion of the treatment and unsuccessful completion of the treatment.
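
Bucketing a multi-category outcome into a binary label might look like the sketch below. The category strings are hypothetical placeholders (the real values come from the data set's documentation), and which categories count as "successful" is an assumption made only for this example.

```python
# Assumption: only this (hypothetical) discharge reason counts as success.
SUCCESSFUL_REASONS = {"treatment completed"}


def to_binary_label(discharge_reason):
    """Map a discharge reason to 1 (successful completion) or 0 (unsuccessful)."""
    normalized = discharge_reason.strip().lower()
    return 1 if normalized in SUCCESSFUL_REASONS else 0


# Hypothetical discharge reasons for four patients.
reasons = [
    "Treatment completed",
    "Left against professional advice",
    "Terminated by facility",
    "Treatment completed",
]
labels = [to_binary_label(reason) for reason in reasons]
```

Turning the outcome into two classes is what makes this a two-class classification problem.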

26

Here we can see some basic descriptive statistics of the outcomes. I removed any individuals whose outcome value was missing.
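
A rough equivalent of that step in plain Python, assuming (for illustration only) that each record is a dict with an `outcome` key and that a missing outcome is stored as `None`:

```python
from collections import Counter


def drop_missing_outcome(rows, outcome="outcome"):
    """Remove records whose label is missing; a model cannot learn from them."""
    return [row for row in rows if row.get(outcome) is not None]


def outcome_summary(rows, outcome="outcome"):
    """Count each outcome value and report its share of the remaining records."""
    counts = Counter(row[outcome] for row in rows)
    total = sum(counts.values())
    return {value: (count, count / total) for value, count in counts.items()}


# Hypothetical records: 1 = successful completion, 0 = unsuccessful, None = missing.
records = [
    {"id": 1, "outcome": 1},
    {"id": 2, "outcome": 0},
    {"id": 3, "outcome": None},
    {"id": 4, "outcome": 1},
    {"id": 5, "outcome": 1},
]

labeled = drop_missing_outcome(records)
summary = outcome_summary(labeled)
```

Checking the class balance this way before training helps when interpreting accuracy later.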

Demo

https://studio.azureml.net/

27

This can look complicated at first. We'll break it up into the parts of the machine learning workflow we discussed earlier. That makes it easier to understand.

28

29

30

Messy Data

31

Just like with human learning, you don't want to throw everything at the student all at once. It's best to come up with a curriculum tailored to that individual.

32

Clean Data

33

34

The right two columns, Scored Labels and Scored Probabilities, are the prediction results. The Scored Probabilities column shows the probability that a flower belongs to the positive class (class 1). For example, the first number in the column, 0.028571, means there is a probability of 0.028571 that the first flower belongs to class 1. The Scored Labels column shows the predicted class for each flower. This is based on the Scored Probabilities column. If the scored probability of a flower is larger than 0.5, it is predicted as class 1; otherwise, it is predicted as class 0.

https://azure.microsoft.com/en-us/documentation/articles/machine-learning-interpret-model-results/
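
The relationship between the two columns is a simple threshold rule. A minimal sketch, where `scored_label` is a hypothetical helper name and the probabilities other than 0.028571 are made up:

```python
def scored_label(scored_probability, threshold=0.5):
    """Predict class 1 when the probability is larger than the threshold, else class 0."""
    return 1 if scored_probability > threshold else 0


# The first value is the example quoted above; the rest are invented.
scored_probabilities = [0.028571, 0.5, 0.73, 0.91]
scored_labels = [scored_label(p) for p in scored_probabilities]
```

Note that a probability of exactly 0.5 falls into class 0 under this rule, since it is not larger than 0.5.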

35

That is, the accuracy is the proportion of true results (both true positives and true negatives) among the total number of cases examined. An accuracy of 100% means that the measured values are exactly the same as the given values. On the other hand, precision, or positive predictive value, is defined as the proportion of the true positives against all the positive results (both true positives and false positives).

https://en.wikipedia.org/wiki/Accuracy_and_precision
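
Both definitions translate directly into code. The sketch below uses made-up actual and predicted labels (1 = positive class, 0 = negative class); the function names are illustrative and are not part of Azure ML.

```python
def confusion_counts(actual, predicted):
    """Count true positives, true negatives, false positives, and false negatives."""
    pairs = list(zip(actual, predicted))
    tp = sum(1 for a, p in pairs if a == 1 and p == 1)
    tn = sum(1 for a, p in pairs if a == 0 and p == 0)
    fp = sum(1 for a, p in pairs if a == 0 and p == 1)
    fn = sum(1 for a, p in pairs if a == 1 and p == 0)
    return tp, tn, fp, fn


def accuracy(actual, predicted):
    """(TP + TN) divided by the total number of cases examined."""
    tp, tn, fp, fn = confusion_counts(actual, predicted)
    return (tp + tn) / (tp + tn + fp + fn)


def precision(actual, predicted):
    """TP divided by all positive predictions (TP + FP); 0.0 if there are none."""
    tp, tn, fp, fn = confusion_counts(actual, predicted)
    return tp / (tp + fp) if (tp + fp) else 0.0


# Made-up labels for eight cases.
actual = [1, 0, 1, 1, 0, 0, 1, 0]
predicted = [1, 0, 0, 1, 1, 1, 1, 0]

acc = accuracy(actual, predicted)    # 5 correct out of 8
prec = precision(actual, predicted)  # 3 true positives out of 5 positive predictions
```

The two numbers answer different questions: accuracy asks how often the model is right overall, while precision asks how often a positive prediction can be trusted.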

36

End
Questions and Feedback

38

References

+ http://gotocon.com/dl/goto-aar-2014/slides/OscarNaim_AzureMachineLearningMachineLearningWithTheSimplicityAndProductivityOfTheCloud.pdf
+ http://www.slideshare.net/rjovic/azure-machine-learning-101
+ http://dilbert.com/strip/2013-02-02
+ https://en.wikipedia.org/wiki/Book:Machine_Learning_%E2%80%93_The_Complete_Guide
+ https://azure.microsoft.com/en-us/documentation/articles/machine-learning-studio-overview-diagram/
+ CSE 546 Data Mining Machine Learning by Pedro Domingos, www.cs.washington.edu/546
+ https://azure.microsoft.com/en-us/documentation/articles/machine-learning-what-is-machine-learning/
+ Microsoft Azure Essentials: Azure Machine Learning, by Jeff Barnes, bit.ly/1omR6wt
+ http://www.icpsr.umich.edu/icpsrweb/ICPSR/series/00238
+ http://www.healthdata.gov/

39