Mogae Media Prediction using Logistic Regression

Post on 13-Apr-2017






Why do we need logistic regression?

Regression allows us to predict an output based on some inputs. For instance, we can predict someone's height based on their mother's height and father's height. This type of regression is called linear regression, and it applies when the outcome variable is a continuous real number.

But what if we wanted to predict something that's not a continuous number?

Let's say we want to predict whether it will rain tomorrow. Ordinary linear regression won't work in this case because it doesn't make sense to treat the outcome as a continuous number: it either will rain or won't rain. In this case, we use logistic regression, because the outcome variable is one of a fixed set of categories (here, two: rain or no rain).

Logistic regression
Independent variable: quantitative or qualitative
Dependent variable: qualitative
Example: result (pass, fail) is a function of time given to study

Regression
Independent variable: quantitative or qualitative
Dependent variable: quantitative
Example: marks obtained is a function of time given to study

[Figure: two panels. Regression: marks against study hours. Logistic regression: result (pass, fail) against study hours, with the passing marks indicated.]

Binary logistic regression expression

Y = dependent variable (binary)
β0 = constant
β1 = coefficient of variable X1
X1 = independent variable
E = error term
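The expression itself did not survive on this slide. For reference, the standard textbook form of the binary logistic model is sketched below, assuming the constant is written \beta_0 and the coefficient of X_1 is written \beta_1. It may differ in notation from the slide's own expression. In this textbook form the randomness enters through the binary outcome itself rather than through an additive error term.

```latex
\[
\ln\!\left(\frac{P(Y=1)}{1-P(Y=1)}\right) = \beta_0 + \beta_1 X_1
\qquad\Longleftrightarrow\qquad
P(Y=1) = \frac{1}{1+e^{-(\beta_0+\beta_1 X_1)}}
\]
```

The left-hand form says the log-odds of the outcome are linear in X_1; the right-hand form gives the corresponding probability, which always lies between 0 and 1.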

Problem statement & Methodology

The purpose of the campaign is to get 25K customers registered for CRBT (caller ring-back tone). The task can be accomplished by identifying the customers (or prospects) who are most likely to respond, out of a total base of around 100 million users.

We have sample data for both respondents and non-respondents to the campaign. We used logistic regression, which predicts a discrete outcome, such as whether a customer responds, from a set of variables that may be continuous, discrete, or a mix of both. Generally, the dependent (response) variable is dichotomous, such as success/failure.
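To make the idea of predicting a dichotomous outcome concrete, here is a minimal sketch in Python (the slides themselves used R). It fits a one-variable logistic model by plain gradient descent on the study-hours example from the earlier slide. The data values, the function names `sigmoid` and `fit_logistic`, and the learning-rate settings are all made up for illustration; this is not the campaign model.

```python
import math

def sigmoid(z):
    """Map any real number to a probability in (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))

def fit_logistic(xs, ys, lr=0.1, epochs=5000):
    """Fit P(y=1) = sigmoid(b0 + b1*x) by batch gradient descent on the log-loss."""
    b0, b1 = 0.0, 0.0
    n = len(xs)
    for _ in range(epochs):
        g0 = g1 = 0.0
        for x, y in zip(xs, ys):
            err = sigmoid(b0 + b1 * x) - y  # gradient of the log-loss w.r.t. z
            g0 += err
            g1 += err * x
        b0 -= lr * g0 / n
        b1 -= lr * g1 / n
    return b0, b1

# Made-up toy data: hours studied -> pass (1) or fail (0).
hours = [0.5, 1.0, 1.5, 2.0, 2.5, 3.0, 3.5, 4.0, 4.5, 5.0]
passed = [0, 0, 0, 0, 1, 0, 1, 1, 1, 1]

b0, b1 = fit_logistic(hours, passed)
p_low = sigmoid(b0 + b1 * 0.5)   # predicted pass probability after 0.5 hours
p_high = sigmoid(b0 + b1 * 5.0)  # predicted pass probability after 5 hours
print(f"b0={b0:.3f}, b1={b1:.3f}, P(pass|0.5h)={p_low:.3f}, P(pass|5h)={p_high:.3f}")
```

In practice a statistics package would be used for the fit, since it also reports standard errors and fit statistics; the loop above only shows what is being estimated.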

Sample data of respondents: 13,600 unique subscribers

Sample data of non-respondents: 14,000 unique subscribers

Hypothesis tests

Is an individual predictor variable significant?
Is the overall model significant?
Is Model A significantly better than Model B?
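The slides do not say which tests were used. As a hedged reference, the usual choices for these three questions are the Wald test for a single coefficient and the likelihood-ratio test for comparing a fuller model against a reduced one nested inside it. The symbols below (\hat\beta_j, L, k) are introduced here for illustration.

```latex
\[
z_j = \frac{\hat\beta_j}{\mathrm{SE}(\hat\beta_j)},
\qquad
G = -2\left(\ln L_{\text{reduced}} - \ln L_{\text{full}}\right) \sim \chi^2_{k}
\]
```

Here z_j is compared against a standard normal distribution, and G is approximately chi-squared under the null hypothesis, with k equal to the number of extra parameters in the full model. For the overall-model question the reduced model contains only the constant; for Model A versus Model B, the comparison in this form requires one model to be nested in the other.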

Dataset used in model:

Outcome variable:
1: Responded
0: Not responded

Predictors:
Average monthly spend
Operating system
2G data usage (MB)
3G data usage (MB)
Gender
Incoming messages
Handset type
Age on handset
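The predictors above mix numeric and categorical variables, so the categorical ones have to be turned into numbers before fitting. One common approach is indicator (dummy) coding with the first level as the baseline, sketched below in Python. The field names, category levels, and sample values are hypothetical; the slides do not show how the variables were actually coded.

```python
def encode_subscriber(record, os_levels, handset_levels):
    """Turn one subscriber record (a dict) into a list of numeric features.

    Categorical fields become 0/1 indicators; the first level of each
    list is the baseline and gets no indicator of its own.
    """
    features = [
        float(record["avg_monthly_spend"]),
        float(record["data_2g_mb"]),
        float(record["data_3g_mb"]),
        float(record["incoming_messages"]),
        float(record["age_on_handset"]),
        1.0 if record["gender"] == "F" else 0.0,
    ]
    features += [1.0 if record["operating_system"] == lvl else 0.0
                 for lvl in os_levels[1:]]
    features += [1.0 if record["handset_type"] == lvl else 0.0
                 for lvl in handset_levels[1:]]
    return features

# Hypothetical category levels and one made-up subscriber.
os_levels = ["Android", "iOS", "Other"]
handset_levels = ["Feature phone", "Smartphone"]
subscriber = {
    "avg_monthly_spend": 120.0,
    "data_2g_mb": 35.0,
    "data_3g_mb": 410.0,
    "incoming_messages": 58,
    "age_on_handset": 14,
    "gender": "F",
    "operating_system": "iOS",
    "handset_type": "Smartphone",
}
vec = encode_subscriber(subscriber, os_levels, handset_levels)
print(vec)
```

Dropping one level per categorical variable avoids redundant columns when the model also has a constant term.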

Variable List

Logistic Function using R

Logistic Regression Interpretation
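The model output for this slide is not preserved. As a general aid to reading such output, a coefficient b in a logistic model is on the log-odds scale, so exp(b) is the factor by which the odds of responding are multiplied for a one-unit increase in that predictor, holding the others fixed. A minimal Python sketch, with coefficient values invented purely for illustration:

```python
import math

def odds_ratio(coef):
    """Convert a log-odds coefficient into an odds ratio."""
    return math.exp(coef)

# Hypothetical coefficients, NOT taken from the campaign model.
example_coefs = {"avg_monthly_spend": 0.02, "gender_F": -0.35}

ratios = {name: odds_ratio(b) for name, b in example_coefs.items()}
for name, ratio in ratios.items():
    direction = "raises" if ratio > 1 else "lowers"
    print(f"{name}: odds ratio {ratio:.3f} ({direction} the odds of responding)")
```

An odds ratio above 1 means the predictor is associated with higher odds of response, below 1 with lower odds, and exactly 1 with no change.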

Predicted probability using our model
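The fitted values for this slide are not preserved. The sketch below shows, in Python, how a predicted probability follows from a constant and coefficients, and how subscribers could then be ranked so that the most likely responders are contacted first. The constant, coefficients, subscriber ids, and feature values are all hypothetical, and `predict_probability` and `top_prospects` are illustrative helper names.

```python
import math

def predict_probability(intercept, coefs, features):
    """Return P(respond) = 1 / (1 + exp(-(intercept + sum(coef * feature))))."""
    z = intercept + sum(c * x for c, x in zip(coefs, features))
    return 1.0 / (1.0 + math.exp(-z))

def top_prospects(scored, n):
    """Return the n subscriber ids with the highest predicted probability."""
    ranked = sorted(scored, key=lambda pair: pair[1], reverse=True)
    return [sid for sid, _ in ranked[:n]]

# Hypothetical model: constant plus coefficients for
# average monthly spend and 3G data usage (MB).
intercept = -1.0
coefs = [0.01, 0.002]

# Made-up subscribers: id -> [average monthly spend, 3G data usage (MB)].
subscribers = {"A": [50.0, 100.0], "B": [200.0, 900.0], "C": [10.0, 0.0]}

scored = [(sid, predict_probability(intercept, coefs, feats))
          for sid, feats in subscribers.items()]
best = top_prospects(scored, 2)
print(best)
```

Applied to the full base, the same ranking step would let the campaign target the highest-scoring subscribers rather than contacting all users.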