Predire il futuro con Machine Learning & Big Data (Predicting the Future with Machine Learning & Big Data)

Upload: data-driven-innovation

Post on 11-Jan-2017

Page 1: Predire il futuro con Machine Learning & Big Data
Page 2: Predire il futuro con Machine Learning & Big Data

Data Driven Innovation

Codemotion

Antimo Musone

IT Manager

20 May 2016

Page 3: Predire il futuro con Machine Learning & Big Data

About Me

► Antimo Musone, IT Manager / Architect at EY

Co-Founder, Fifth Ingenum Srls.

Computer Engineering degree, II Università degli Studi di Napoli

email: [email protected]

Page 4: Predire il futuro con Machine Learning & Big Data

Contents

► What is Machine Learning?

► Predictive Analytics

► Machine Learning Overview

► Defining Predictive Analytics

► Supervised Learning

► Unsupervised Learning

► Watson Service

► Cortana Analytics Suite

► Demo

Page 5: Predire il futuro con Machine Learning & Big Data

What is Machine Learning?

Page 6: Predire il futuro con Machine Learning & Big Data

Machine Learning / Predictive Analytics

Vision Analytics

Recommendation engines

Advertising analysis

Weather forecasting for business planning

Social network analysis

Legal discovery and document archiving

Pricing analysis

Fraud detection

Churn analysis

Equipment monitoring

Location-based tracking and services

Personalized Insurance

Machine learning & predictive analytics are core capabilities that are needed throughout your business

Page 7: Predire il futuro con Machine Learning & Big Data

Machine Learning Overview

► Formal definition: “The field of machine learning is concerned with the question of how to construct computer programs that automatically improve with experience” - Tom M. Mitchell

► Another definition: “The goal of machine learning is to program computers to use example data or past experience to solve a given problem.” – Introduction to Machine Learning, 2nd Edition, MIT Press

► ML often involves two primary techniques:

► Supervised Learning: Finding the mapping between inputs and outputs using correct values to “train” a model

► Unsupervised Learning: Finding patterns in the input data (similar to Density Estimates in Statistics)

Page 8: Predire il futuro con Machine Learning & Big Data

Machine Learning

Data: A B C D E F G H I J K L M N O P Q R S T U V W X Y Z

Rules, or Algorithms:

about, Learning, language – Spelling and sounding build words

Learning about language. – Words build sentences

Learning, or Abstraction: Any new understanding proceeds from previous knowledge.

Data + Rules/ Algorithms = Machine Learning

Page 9: Predire il futuro con Machine Learning & Big Data

Traditional programming VS Machine Learning

Traditional Programming: Data + Program → Computer → Output

Machine Learning: Data + Output → Computer → Program/Algorithms

Program can predict the output!
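
The contrast on this slide can be sketched in a few lines of Python. This is only an illustration under assumed data: the temperature-conversion task, the function names and the sample values are hypothetical and do not come from the talk. The first function is a hand-written program; the second derives an equivalent rule from examples of inputs and outputs.

```python
# Traditional programming: a human writes the rule (the program).
def fahrenheit_rule(celsius):
    return celsius * 9 / 5 + 32


# Machine learning: the rule is estimated from (input, output) examples.
def fit_line(xs, ys):
    """Ordinary least squares for y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Hypothetical training data: Celsius readings and their known Fahrenheit values.
celsius = [-10.0, 0.0, 15.0, 30.0, 100.0]
fahrenheit = [fahrenheit_rule(c) for c in celsius]

slope, intercept = fit_line(celsius, fahrenheit)


def learned_rule(c):
    """The 'program' produced from data and outputs."""
    return slope * c + intercept


print(learned_rule(37.0))
```

The learned rule is only as good as the examples it was given, which is why the rest of the deck spends so much time on data and evaluation.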

Page 10: Predire il futuro con Machine Learning & Big Data

ML: No, more like gardening

Gardener = You

Seeds = Algorithms

Nutrients = Data

Plants = Programs

Page 11: Predire il futuro con Machine Learning & Big Data

ML Sample Applications

► Web search

► Computational biology

► Finance

► E-commerce

► Space exploration

► Robotics

► Information extraction

► Social networks

► Debugging

► [Your favorite area]

Page 12: Predire il futuro con Machine Learning & Big Data

What is Predictive Analytics?

Wikipedia Definition: (http://en.wikipedia.org/wiki/Predictive_analytics)

“Predictive analytics encompasses a variety of techniques from statistics, modeling, machine learning, and data mining that analyze current and historical facts to make predictions about future, or otherwise unknown, events.”

Facts → Predictive Analytics Techniques → Predictions

Page 13: Predire il futuro con Machine Learning & Big Data

Breaking it Down

“Predictive analytics encompasses a variety of techniques from statistics, modeling, machine learning, and data mining that analyze current and historical facts to make predictions about future, or otherwise unknown, events.”

Machine Learning: Use of computer algorithms to derive complex formulations based on objectives and constraints

Tools and Techniques: Data visualization, segmentation, correlations

Use in Predictive Analytics: Predictive analytics is often applied in the context of datasets that are too large for manual analysis, so data mining techniques are required

Statistics: Focus on learning population characteristics based on samples of data

Tools and Techniques: p-values, confidence intervals, sampling, ANOVA

Use in Predictive Analytics: Underlying theory behind many parametric models – observed facts are a sample from a population including both known/historic and unknown/future events

Modeling: Representations of systems used to understand the underlying dynamics of the system

Tools and Techniques: Symbolic logic, proxies

Use in Predictive Analytics: Complex relationships can be simplified through modeling – these models can then be used to analyze relationships between factors

Page 14: Predire il futuro con Machine Learning & Big Data

What is a Model?

A model is a simplified representation of observed effects

Key terms:

Dependent or target variable – the variable of interest

Independent or predictor variable(s) – variable(s) used for explanation/prediction

Effect – the (quantitative) impact of an independent variable or combination of independent variables on the dependent variable

Main Effect – the direct effect of a single independent variable on the dependent variable

Interaction Effect – the effect of a combination of multiple independent variables on the dependent variable
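
As a worked illustration of these terms, consider a model with two independent variables. The linear form and the symbols below are assumptions made for this sketch; the slide does not prescribe a particular equation.

```latex
y = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \beta_{12}\, x_1 x_2 + \varepsilon
```

Here $y$ is the dependent variable, $x_1$ and $x_2$ are the independent variables, $\beta_1$ and $\beta_2$ quantify the main effects, $\beta_{12}$ quantifies the interaction effect, and $\varepsilon$ is the part of $y$ the model leaves unexplained. When $\beta_{12} \neq 0$, the impact of $x_1$ on $y$ depends on the value of $x_2$.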

Page 15: Predire il futuro con Machine Learning & Big Data

Two types of model

A model is a simplified representation of observed effects

Statistical: Parametric Models

Effects are well-quantified and can be examined

An equation can be used to represent the model

Emphasis on explanation: “What causes the dependent variable to change?”

Test hypotheses: p-values, confidence intervals

Machine Learning: Non-parametric models

Effects may be unquantified (“black box”)

No representative equation

Model may be stochastic, so results may vary

Emphasis on prediction: “What will the value of the next observation be?”

Generate hypotheses

Page 16: Predire il futuro con Machine Learning & Big Data

Types of Learning

► Supervised (inductive) learning

    ► Training data includes desired outputs

    ► Dependent variable is known

    ► May be statistical or non-statistical

► Unsupervised learning

    ► Training data does not include desired outputs

    ► No dependent variable

    ► Non-statistical

► Semi-supervised learning

    ► Training data includes a few desired outputs

Page 17: Predire il futuro con Machine Learning & Big Data

Machine Learning Problem

Supervised Learning, discrete output: Classification or Categorization

Supervised Learning, continuous output: Regression

Unsupervised Learning, discrete output: Clustering

Unsupervised Learning, continuous output: Dimensionality reduction

Page 18: Predire il futuro con Machine Learning & Big Data

What is Logistic Regression?

Regression models are a form of supervised learning that attempts to fit “linear” functions to training data. The most common type, linear regression, should be familiar to most of you as a “best fit line”.

Logistic Regression is closely related to linear regression, but it fits an S-shaped (sigmoid) curve instead of a straight line by applying a logit link function to a binomially distributed dependent variable.
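
A minimal sketch of logistic regression with a single feature, fitted by plain gradient descent on the log-loss. The data (hours studied versus pass/fail), the learning rate and the number of epochs are hypothetical choices for illustration, not values from the talk; a production model would normally come from a library.

```python
import math


def sigmoid(z):
    """Logistic function: maps any real number into (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))


def fit_logistic(xs, ys, learning_rate=0.1, epochs=5000):
    """Fit P(y = 1 | x) = sigmoid(w * x + b) by gradient descent on log-loss."""
    w, b = 0.0, 0.0
    n = len(xs)
    for _ in range(epochs):
        # Gradient of the average log-loss with respect to w and b.
        grad_w = sum((sigmoid(w * x + b) - y) * x for x, y in zip(xs, ys)) / n
        grad_b = sum((sigmoid(w * x + b) - y) for x, y in zip(xs, ys)) / n
        w -= learning_rate * grad_w
        b -= learning_rate * grad_b
    return w, b


# Hypothetical data: hours studied and whether the exam was passed (1) or not (0).
hours = [0.5, 1.0, 1.5, 2.0, 2.5, 3.0, 3.5, 4.0, 4.5, 5.0]
passed = [0, 0, 0, 0, 1, 0, 1, 1, 1, 1]

w, b = fit_logistic(hours, passed)
p_low = sigmoid(w * 1.0 + b)    # estimated pass probability after 1 hour
p_high = sigmoid(w * 5.0 + b)   # estimated pass probability after 5 hours
print(round(p_low, 3), round(p_high, 3))
```

The output of the model is a probability, which is what distinguishes it from a best fit line: the prediction can never leave the interval between 0 and 1.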

Page 19: Predire il futuro con Machine Learning & Big Data

Machine Learning Example

Given examples of a function (X, F(X))

Predict function F(X) for new examples X

Discrete F(X): Classification

Continuous F(X): Regression

F(X) = Probability(X): Probability estimation

The probability of an event X, denoted P(X), represents the proportion of all events that have X as their outcome, and is typically represented as a decimal 0 ≤ P(X) ≤ 1

Page 20: Predire il futuro con Machine Learning & Big Data

Machine Learning Example

Apply a prediction function to a feature representation of the image to get the desired output:

• Training: given a training set of labeled examples {(x1,y1), …, (xN,yN)}, estimate the prediction function f by minimizing the prediction error on the training set

• Testing: apply f to a never before seen test example x and output the predicted value y = f(x)

y = f(x), where y is the output, f is the prediction function, and x is the image feature

f([image]) = «apple»

f([image]) = «tomato»

f([image]) = «dog»
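
The training and testing steps above can be sketched with a very small classifier. This is an illustrative assumption, not the method used in the talk: each image is reduced to one hypothetical feature (a "furriness" score between 0 and 1), and training picks the threshold that misclassifies the fewest training examples.

```python
def train_stump(examples):
    """Training: choose the threshold that minimizes the number of
    misclassified training examples for the rule 'x >= threshold means dog'."""
    xs = sorted(x for x, _ in examples)
    # Candidate thresholds: midpoints between neighbouring feature values.
    candidates = [(a + b) / 2 for a, b in zip(xs, xs[1:])]

    def training_errors(threshold):
        return sum((x >= threshold) != is_dog for x, is_dog in examples)

    return min(candidates, key=training_errors)


def predict(threshold, x):
    """Testing: apply the learned function f to an unseen example x."""
    return "dog" if x >= threshold else "apple"


# Hypothetical labeled training set: (furriness score, is_dog).
training_set = [(0.05, False), (0.10, False), (0.20, False),
                (0.70, True), (0.80, True), (0.90, True)]

threshold = train_stump(training_set)
print(predict(threshold, 0.6), predict(threshold, 0.3))
```

Real image classifiers use many features and far richer prediction functions, but the split between estimating f on labeled examples and applying f to new inputs is the same.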

Page 21: Predire il futuro con Machine Learning & Big Data

Supervised Learning

Used when you want to predict unknown answers from answers you already have

Data is divided into two parts: the data you will use to “teach” the system (training set), and the data to test the algorithm (test set)

After you select and clean the data, you select data points that show the right relationships in the data. The answers are “labels”, the categories/columns/attributes are “features” and the values are…values.

Then you select an algorithm to compute the outcome. (Often you choose more than one)

You run the algorithm on the training set, then check whether it gives the right answers on the test set.

Once you perform the experiment, you select the best model. This is the final output – the model is then used against more data to get the answers you need
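
The workflow on this slide (split the data, try more than one algorithm, keep the one that does best on the held-out part) can be sketched as follows. The data, the two candidate models and the 70/30 split are hypothetical choices made for this example only.

```python
import random


def split(rows, test_fraction=0.3, seed=0):
    """Shuffle the rows and hold out a fraction of them as the test set."""
    rows = rows[:]
    random.Random(seed).shuffle(rows)
    n_test = max(1, int(len(rows) * test_fraction))
    return rows[n_test:], rows[:n_test]


def fit_mean(train):
    """Candidate 1: always predict the average label seen in training."""
    mean_y = sum(y for _, y in train) / len(train)
    return lambda x: mean_y


def fit_linear(train):
    """Candidate 2: least-squares straight line."""
    n = len(train)
    mx = sum(x for x, _ in train) / n
    my = sum(y for _, y in train) / n
    slope = (sum((x - mx) * (y - my) for x, y in train)
             / sum((x - mx) ** 2 for x, _ in train))
    return lambda x: my + slope * (x - mx)


def mse(model, rows):
    """Mean squared error of a model on labeled rows."""
    return sum((model(x) - y) ** 2 for x, y in rows) / len(rows)


# Hypothetical labeled data: feature x, label y (roughly y = 3x + 2).
data = [(x, 3 * x + 2 + (0.5 if x % 2 else -0.5)) for x in range(20)]
train, test = split(data)

candidates = {"mean": fit_mean(train), "linear": fit_linear(train)}
scores = {name: mse(model, test) for name, model in candidates.items()}
best_name = min(scores, key=scores.get)
print(best_name, scores)
```

The selected model is the one you would then run against new, unlabeled data.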

Page 22: Predire il futuro con Machine Learning & Big Data

Supervised Learning

Car

Not Car

Page 23: Predire il futuro con Machine Learning & Big Data

Unsupervised Learning

Used when you want to find unknown answers – mostly groupings – directly from data

No simple way to evaluate accuracy of what you learn

Evaluates many feature vectors and groups them into sets or classes

Start with the data

Apply algorithm

Evaluate groups
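
The three steps (start with the data, apply an algorithm, evaluate the groups) can be sketched with k-means, one common clustering algorithm. The talk does not name a specific algorithm, so k-means, the sample points and the simplistic initialization are all assumptions of this example.

```python
import math


def kmeans(points, k, iterations=20):
    """Plain k-means: assign each point to its nearest centroid, then move
    each centroid to the mean of its group, and repeat."""
    # Simplistic initialization (an assumption of this sketch): take starting
    # centroids at evenly spaced positions in the input order.
    step = max(1, len(points) // k)
    centroids = [points[i * step] for i in range(k)]
    groups = []
    for _ in range(iterations):
        groups = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda i: math.dist(p, centroids[i]))
            groups[nearest].append(p)
        # Keep the old centroid if a group ends up empty.
        centroids = [
            tuple(sum(c) / len(g) for c in zip(*g)) if g else centroids[i]
            for i, g in enumerate(groups)
        ]
    return centroids, groups


# Hypothetical unlabeled data: two loose groups of 2-D points.
points = [(1.0, 1.0), (1.5, 2.0), (1.0, 1.5),
          (8.0, 8.0), (9.0, 8.5), (8.5, 9.0)]

centroids, groups = kmeans(points, k=2)
print(centroids)
print(groups)
```

No labels are involved, so whether the two groups are meaningful still has to be judged by inspecting them.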

Page 24: Predire il futuro con Machine Learning & Big Data

Unsupervised Learning

[Figure: the mixed items (Example 1, example A, Example 2, example B, Example 3, example C) regrouped into two sets: example A, example B, example C and Example 1, Example 2, Example 3]

Clustering strategies tend to group points transitively, even if the points are not nearby in feature space

Page 25: Predire il futuro con Machine Learning & Big Data

Cross-Validation and Model Evaluation

Cross-validation is a method of ensuring that models generalize to data they have not been trained to fit

Given any collection of data points, a model can be developed that fits the data exactly; however, such a model is likely to have little or no predictive power on new data
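
A small sketch of why cross-validation matters. The "memorizer" below fits its training data exactly by storing it in a lookup table, yet does poorly on folds it has not seen, while a simple straight-line fit generalizes. The data, the two models and the choice of four folds are hypothetical and chosen only for illustration.

```python
def fit_linear(rows):
    """Least-squares straight line."""
    n = len(rows)
    mx = sum(x for x, _ in rows) / n
    my = sum(y for _, y in rows) / n
    slope = (sum((x - mx) * (y - my) for x, y in rows)
             / sum((x - mx) ** 2 for x, _ in rows))
    return lambda x: my + slope * (x - mx)


def fit_memorizer(rows):
    """Fits the training data exactly; falls back to the mean for unseen x."""
    table = dict(rows)
    fallback = sum(table.values()) / len(table)
    return lambda x: table.get(x, fallback)


def mse(model, rows):
    """Mean squared error of a model on labeled rows."""
    return sum((model(x) - y) ** 2 for x, y in rows) / len(rows)


def cross_val_mse(fit, rows, k=4):
    """k-fold cross-validation: each fold is held out once for testing."""
    fold_errors = []
    for fold in range(k):
        test = [row for i, row in enumerate(rows) if i % k == fold]
        train = [row for i, row in enumerate(rows) if i % k != fold]
        fold_errors.append(mse(fit(train), test))
    return sum(fold_errors) / k


# Hypothetical data: roughly y = 2x + 1 with a small deterministic wobble.
data = [(float(x), 2 * x + 1 + (0.3 if x % 3 == 0 else -0.2)) for x in range(12)]

train_error_memorizer = mse(fit_memorizer(data), data)
cv_error_memorizer = cross_val_mse(fit_memorizer, data)
cv_error_linear = cross_val_mse(fit_linear, data)
print(train_error_memorizer, cv_error_memorizer, cv_error_linear)
```

A perfect score on the data a model was trained on says little; the cross-validated error is the number to compare between models.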

Page 26: Predire il futuro con Machine Learning & Big Data

Evaluating Predictive Models

Model evaluation involves a combination of objective criteria and subjective judgment

Objective Measures

Gain or Lift

Sensitivity

Accuracy

Others

Subjective Considerations

Business intuition

Explainability

Simplicity

Usefulness

Page 27: Predire il futuro con Machine Learning & Big Data

Gain or Lift

Lift is a measure of the effectiveness of a predictive model calculated as the ratio between the results obtained with and without the predictive model.

Cumulative gains and lift charts are visual aids for measuring model performance

Both charts consist of a lift curve and a baseline

The greater the area between the lift curve and the baseline, the better
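
One common way to compute lift, shown here as a sketch: rank the cases by model score, take the top fraction, and divide the response rate in that fraction by the overall response rate (the result you would get without the model). The scores and outcomes are made-up values for illustration.

```python
def lift_at(scores, outcomes, fraction):
    """Lift for the top `fraction` of cases ranked by model score:
    response rate among the selected cases / overall response rate."""
    ranked = sorted(zip(scores, outcomes), key=lambda pair: pair[0], reverse=True)
    top_n = max(1, round(len(ranked) * fraction))
    top_rate = sum(outcome for _, outcome in ranked[:top_n]) / top_n
    base_rate = sum(outcomes) / len(outcomes)
    return top_rate / base_rate


# Hypothetical campaign: model scores and whether each customer responded (1) or not (0).
scores = [0.9, 0.8, 0.7, 0.6, 0.5, 0.4, 0.3, 0.2, 0.1, 0.05]
outcomes = [1, 1, 0, 1, 0, 0, 0, 0, 0, 0]

lift_top_20 = lift_at(scores, outcomes, 0.2)
lift_top_50 = lift_at(scores, outcomes, 0.5)
print(lift_top_20, lift_top_50)
```

A lift of 1 means the model does no better than contacting customers at random; plotting the lift for increasing fractions gives the lift curve described above.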

Page 28: Predire il futuro con Machine Learning & Big Data

Sensitivity

A Receiver Operating Characteristic (ROC) curve is a plot of test sensitivity as a function of (1 - specificity) for several possible (arbitrary) cut-off values. The curve illustrates the trade-off between type I and type II errors in a given test.

The closer the curve follows the left-hand border and then the top border of the ROC space, the more accurate the test, and the area under the curve is a measure of accuracy.
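
A sketch of how the points of a ROC curve and the area under it can be computed from model scores. Each score is used in turn as a cut-off; the example assumes there are no tied scores, and the scores and labels are hypothetical.

```python
def roc_points(scores, labels):
    """Return (1 - specificity, sensitivity) pairs, one per cut-off,
    lowering the cut-off one case at a time (assumes no tied scores)."""
    positives = sum(labels)
    negatives = len(labels) - positives
    points = [(0.0, 0.0)]
    tp = fp = 0
    for _, label in sorted(zip(scores, labels), key=lambda p: p[0], reverse=True):
        if label:
            tp += 1
        else:
            fp += 1
        points.append((fp / negatives, tp / positives))
    return points


def auc(points):
    """Area under the curve by the trapezoidal rule."""
    return sum((x1 - x0) * (y0 + y1) / 2
               for (x0, y0), (x1, y1) in zip(points, points[1:]))


# Hypothetical test results: model score and true class (1 = positive).
scores = [0.9, 0.8, 0.7, 0.6, 0.4, 0.3]
labels = [1, 1, 0, 1, 0, 0]

points = roc_points(scores, labels)
area = auc(points)
print(points)
print(area)
```

An area of 0.5 corresponds to random guessing and an area of 1.0 to a test that separates the two classes perfectly.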

Page 29: Predire il futuro con Machine Learning & Big Data

IBM Watson

Page 30: Predire il futuro con Machine Learning & Big Data

Cognitive Services

Page 31: Predire il futuro con Machine Learning & Big Data

Cortana Suite

Page 32: Predire il futuro con Machine Learning & Big Data

Cortana Analytics Suite

DATA

Business apps

Custom apps

Sensors and devices

ACTION

People

Automated Systems

INTELLIGENCE

Cortana Analytics

Page 33: Predire il futuro con Machine Learning & Big Data

Data Flow and Architecture

[Diagram: data flow and architecture. Stage labels: Event & data producers (web logs; IoT, mobile devices, etc.; social data), Ingest, Transform, DW / Long-term storage, Predictive analytics, Present & decide (Power BI, web dashboards, mobile devices). Services shown: Event Hubs, Stream Analytics, HDInsight, Azure Data Factory, Azure SQL DB, Azure SQL DW, Azure Data Lake, Azure Machine Learning (fraud detection, etc.).]

Page 34: Predire il futuro con Machine Learning & Big Data

Process real-time data in Azure using a simple SQL language

Consumes millions of real-time events from Event Hub collected from devices, sensors, infrastructure, and applications

Performs time-sensitive analysis using SQL-like language against multiple real-time streams and reference data

Outputs to persistent stores, dashboards or back to devices
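
Stream Analytics itself is queried in its own SQL-like language. The Python below does not use that service or its syntax; it only imitates, under assumed event data, the kind of time-windowed aggregation such a query performs: events are grouped into fixed, non-overlapping (tumbling) windows and averaged per device.

```python
from collections import defaultdict


def tumbling_window_average(events, window_seconds):
    """Group (timestamp_seconds, device_id, value) events into fixed,
    non-overlapping windows and average the values per device."""
    buckets = defaultdict(list)
    for timestamp, device_id, value in events:
        window_start = (timestamp // window_seconds) * window_seconds
        buckets[(window_start, device_id)].append(value)
    return {key: sum(values) / len(values)
            for key, values in sorted(buckets.items())}


# Hypothetical sensor events: (seconds since start, device id, reading).
events = [
    (0, "sensor-a", 20.0),
    (4, "sensor-a", 22.0),
    (7, "sensor-b", 30.0),
    (11, "sensor-a", 24.0),
    (13, "sensor-b", 34.0),
    (19, "sensor-b", 36.0),
]

averages = tumbling_window_average(events, window_seconds=10)
for (window_start, device_id), average in averages.items():
    print(window_start, device_id, average)
```

A real streaming engine computes the same kind of result incrementally as events arrive, rather than over a finished list.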

[Figure: event sources feeding Stream Analytics: Point of Service Devices, Self Checkout Stations, Kiosks, Smart Phones, Slates/Tablets, PCs/Laptops, Servers, Digital Signs, Diagnostic Equipment, Remote Medical Monitors, Logic Controllers, Specialized Devices, Thin Clients, Handhelds, Security, POS Terminals, Automation Devices, Vending Machines, Kinect, ATM]

Azure Stream Analytics

Page 35: Predire il futuro con Machine Learning & Big Data

Fully managed service to support orchestration of data movement and processing

Connect to relational or non-relational data that is on-premises or in the cloud

Single pane of glass to monitor and manage data processing pipelines.

Publish to Power BI

Compose and orchestrate data services at scale

[Diagram labels: No SQL, DB, Blob, C#, MapReduce, Hive, Pig, Stored Procedures, VM, Machine Learning, Trusted data, BI & analytics]

Azure Data Factory

Page 36: Predire il futuro con Machine Learning & Big Data

ML Algorithms are best of breed and embrace OSS

• MS + R + Python + BYOA

ML Studio for productive development

• Faster experiments result in faster improvements

• Visual Workflows & ML Experiments

ML Operationalization to remove deployment friction

• Build entire ML Apps & Deploy as Cloud APIs

ML Gallery

• Provide ML applications like apps in an ‘app store’

• Publish/consume APIs in a 2 sided market

Help organizations eliminate undifferentiated heavy lifting

Powerful predictive analytics in Azure

Azure Machine Learning

Page 37: Predire il futuro con Machine Learning & Big Data

Power BI investments

New data visualizations and touch-optimized exploration in HTML5

Power BI mobile apps across devices including iPad and iPhone

Support for new data sources including SalesForce.com, Dynamics CRM online and SQL Server Analysis Services

Dashboard

Tree Map

Power BI dashboards and KPIs for monitoring the health of your business

Page 38: Predire il futuro con Machine Learning & Big Data

Demo Cognitive

Page 39: Predire il futuro con Machine Learning & Big Data

Demo Cortana Suite

Page 40: Predire il futuro con Machine Learning & Big Data

Vehicle Telemetry Architecture

Event Hubs for ingesting millions of vehicle telemetry events into Azure.

Stream Analytics for gaining real-time insights on vehicle health and persisting that data into long-term storage for richer batch analytics.

Machine Learning for anomaly detection in real-time and batch processing to gain predictive insights.

HDInsight is leveraged to transform data at scale

Data Factory handles orchestration, scheduling, resource management and monitoring of the batch processing pipeline.

Power BI gives this solution a rich dashboard for real-time data and predictive analytics visualizations.
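
The anomaly detection step can be illustrated with a deliberately simple statistical stand-in. This is not the Azure Machine Learning model used in the demo; it is a z-score check on hypothetical engine temperature readings, with the threshold chosen arbitrarily for the example.

```python
import statistics


def find_anomalies(readings, threshold=2.5):
    """Return (index, value) pairs whose distance from the mean exceeds
    `threshold` standard deviations (a simple z-score rule)."""
    mean = statistics.mean(readings)
    spread = statistics.pstdev(readings)
    if spread == 0:
        return []
    return [
        (i, value)
        for i, value in enumerate(readings)
        if abs(value - mean) / spread > threshold
    ]


# Hypothetical engine temperature readings from one vehicle.
engine_temps = [90, 91, 89, 92, 90, 88, 91, 130, 90, 89]

anomalies = find_anomalies(engine_temps)
print(anomalies)
```

A learned model would take more signals into account, but the role in the pipeline is the same: flag the readings that deserve attention.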

Page 41: Predire il futuro con Machine Learning & Big Data

Microsoft Azure Machine Learning

Data: It’s all about the data. Here’s where you will acquire, compile, and analyze testing and training data sets for use in creating Azure Machine Learning predictive models.

Create the model: Use various machine learning algorithms to create new models that are capable of making predictions based on inferences about the data sets.

Evaluate the model: Examine the accuracy of new predictive models based on their ability to predict the correct outcome when both the input and output values are known in advance. Accuracy is expressed as a confidence factor that approaches one as the predictions improve.

Refine and evaluate the model: Compare, contrast, and combine alternate predictive models to find the right combination(s) that can consistently produce the most accurate results.

Deploy the model: Expose the new predictive model as a scalable cloud web service, one that is easily accessible over the Internet by any web browser or mobile client.

Test and use the model: Implement the new predictive model web service in a test or production application scenario.
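
The evaluation step compares predictions with outcomes that are known in advance. A minimal sketch of that comparison for a two-class model follows; the labels and predictions are invented, and the three measures shown (accuracy, sensitivity, specificity) are standard definitions rather than anything specific to Azure Machine Learning.

```python
def evaluate(actual, predicted):
    """Compare known outcomes with predictions for a two-class model
    (1 = positive, 0 = negative)."""
    pairs = list(zip(actual, predicted))
    tp = sum(1 for a, p in pairs if a == 1 and p == 1)  # true positives
    tn = sum(1 for a, p in pairs if a == 0 and p == 0)  # true negatives
    fp = sum(1 for a, p in pairs if a == 0 and p == 1)  # false positives
    fn = sum(1 for a, p in pairs if a == 1 and p == 0)  # false negatives
    return {
        "accuracy": (tp + tn) / len(pairs),
        "sensitivity": tp / (tp + fn),
        "specificity": tn / (tn + fp),
    }


# Hypothetical test set: known outcomes and the model's predictions.
actual = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
predicted = [1, 1, 1, 0, 0, 0, 0, 0, 1, 0]

metrics = evaluate(actual, predicted)
print(metrics)
```

All three values lie between 0 and 1, and each approaches 1 as the predictions improve.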

Page 42: Predire il futuro con Machine Learning & Big Data

Azure Machine Learning algorithms

Classification algorithms: These are used to classify data into different categories that can then be used to predict one or more discrete variables, based on the other attributes in the dataset.

Regression algorithms: These are used to predict one or more continuous variables, such as profit or loss, based on other attributes in the dataset.

Clustering algorithms: These determine natural groupings and patterns in datasets and are used to predict grouping classifications for a given variable.

Page 43: Predire il futuro con Machine Learning & Big Data

Thanks

► Questions?