Logistic Regression with R

Upload: jerome-gomes

Posted on 25-Jan-2017

## Logistic Regression ##

# You should have a basic grounding in statistics and R to follow this worksheet and the terms it uses.
# Please read my Simple Linear Regression With R (http://www.slideshare.net/jeromerick/simple-linear-regression-with-r-52336944) for background to this lesson.

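# We will use the built-in data set mtcars in R
data("mtcars")
?mtcars        # help page describing each column
View(mtcars)   # opens the data viewer (needs RStudio or another GUI)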
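# Before modelling, it helps to check the three columns the model will use. This is a minimal sketch: it confirms that vs holds only 0 and 1, shows how many cars fall in each engine class, and compares the average wt and disp of the two classes.

```r
data("mtcars")

# Structure of the three columns used in the model
str(mtcars[, c("vs", "wt", "disp")])

# vs is coded 0 = V-shaped engine, 1 = straight engine (see ?mtcars)
table(mtcars$vs)

# Average weight (1000 lbs) and displacement (cubic inches) for each engine type
aggregate(cbind(wt, disp) ~ vs, data = mtcars, FUN = mean)
```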

# We will predict the type of engine (vs) that a car has from its weight (wt) and displacement (disp),
# i.e. the model formula is vs ~ wt + disp
# Because the dependent variable is binary (0/1), we fit a logistic regression model rather than a linear one
# The model is linear on the logit scale, so the coefficients we get are changes in the log of the odds (the logit)
# To fit a logistic model in R we use the function glm() with family = "binomial"
?glm   # Fitting generalized linear models
model1 <- glm(formula = vs ~ wt + disp, data = mtcars, family = "binomial")
model1   # the coefficients are on the logit scale, i.e. the log of the odds
summary(model1)
qchisq(0.95, 29)   # 95th percentile of the chi-squared distribution with 29 degrees of freedom, the residual df of model1

# Null deviance - shows how well the response (dependent) variable is predicted by a model that includes only the intercept, i.e. there is no explanatory variable in the model and vs is regressed on the intercept alone.
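# The null deviance can be checked directly. A minimal sketch: fit the intercept-only model (model0 is a hypothetical name used only here) and compare its deviance with the null deviance reported for model1. For a 0/1 response the deviance also equals -2 times the log-likelihood.

```r
data("mtcars")
model1 <- glm(vs ~ wt + disp, data = mtcars, family = "binomial")

# Intercept-only model: vs regressed on a constant, no explanatory variables
model0 <- glm(vs ~ 1, data = mtcars, family = "binomial")

# Both of these give the null deviance
deviance(model0)
model1$null.deviance

# For a 0/1 response, deviance = -2 * log-likelihood
-2 * as.numeric(logLik(model0))
```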

# Residual deviance - shows how well the response (dependent) variable is predicted by the fitted model, i.e. the model that includes the explanatory variables as well as the intercept. The explanatory variables here are wt and disp. The larger the drop from the null deviance to the residual deviance, the more the explanatory variables help.
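# The drop from the null deviance to the residual deviance can be turned into a likelihood-ratio (chi-squared) test of whether wt and disp improve on the intercept-only model. A minimal sketch; model0, dev_drop, df_drop and p_value are hypothetical names chosen for this illustration.

```r
data("mtcars")
model0 <- glm(vs ~ 1, data = mtcars, family = "binomial")
model1 <- glm(vs ~ wt + disp, data = mtcars, family = "binomial")

# Reduction in deviance achieved by adding wt and disp
dev_drop <- model1$null.deviance - model1$deviance

# Degrees of freedom used by the two explanatory variables (31 - 29 = 2)
df_drop <- model1$df.null - model1$df.residual

# Upper-tail chi-squared probability: a small value means the predictors help
p_value <- pchisq(dev_drop, df = df_drop, lower.tail = FALSE)
p_value

# The same test through anova()
anova(model0, model1, test = "Chisq")
```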

# AIC value - stands for Akaike information criterion. It lets us assess the quality of a model relative to other models fitted to the same data, penalising each extra parameter. When comparing several candidate models, we choose the one with the lowest AIC.
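# A minimal sketch of AIC-based model choice: fit two alternative models (model_wt and model_disp are hypothetical names for illustration) next to model1 and list their AIC values; the smallest one marks the preferred model among these candidates. AIC is -2 x log-likelihood + 2 x (number of estimated parameters).

```r
data("mtcars")
model1 <- glm(vs ~ wt + disp, data = mtcars, family = "binomial")
model_wt <- glm(vs ~ wt, data = mtcars, family = "binomial")
model_disp <- glm(vs ~ disp, data = mtcars, family = "binomial")

# AIC for each candidate model (one row per model)
aics <- AIC(model1, model_wt, model_disp)
aics

# Name of the model with the lowest AIC
rownames(aics)[which.min(aics$AIC)]

# AIC of model1 by hand: 3 estimated parameters (intercept, wt, disp)
-2 * as.numeric(logLik(model1)) + 2 * 3
```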

# Interpretation of the coefficients in logistic regression - We have regressed the dependent variable vs on two independent variables, wt and disp, and the coefficients we received are wt = 1.62635 and disp = -0.03443. When the weight of the vehicle increases by 1 unit (wt is measured in 1000 lbs) and disp is held constant, the log of the odds that vs = 1 rises by 1.62635, i.e. the odds are multiplied by e^1.62635 (about 5.09). Likewise, each additional cubic inch of displacement lowers the log of the odds by 0.03443, multiplying the odds by e^-0.03443 (about 0.966).
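# Going from log odds to odds is just exponentiation. A minimal sketch that turns the fitted coefficients into odds ratios with exp(): a value above 1 means the odds that vs = 1 go up as the variable increases, a value below 1 means they go down. The name odds_ratios is only a local helper.

```r
data("mtcars")
model1 <- glm(vs ~ wt + disp, data = mtcars, family = "binomial")

# Coefficients on the log-odds (logit) scale
coef(model1)

# Odds ratios: multiplicative change in the odds of vs = 1 per one-unit increase
odds_ratios <- exp(coef(model1))
round(odds_ratios, 3)
```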

# vs = 1 means the car has a straight engine, vs = 0 means a V-shaped engine (see ?mtcars)
# After fitting the model we can find the probability of each engine type for a particular vehicle
# First we create a new data frame holding the values for which we want the probability
carA <- data.frame(wt = 2.1, disp = 180)
# Now we predict the probability that carA's engine is straight (vs = 1), given carA's values of the explanatory variables

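?predict
predict(model1, carA, type = "response")   # type = "response" returns a probability instead of the log of the odds
# Interpretation - we get the value 0.2361081, which means that for a car with weight 2.1 (2,100 lbs) and displacement 180 cubic inches, the probability that its engine falls under the vs = 1 (straight) category is about 0.24, or 24%.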
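# predict(..., type = "response") applies the inverse logit to the linear predictor, p = 1 / (1 + exp(-(b0 + b1*wt + b2*disp))). A minimal sketch that reproduces the prediction for carA by hand; b, eta, p_manual and p_response are hypothetical local names.

```r
data("mtcars")
model1 <- glm(vs ~ wt + disp, data = mtcars, family = "binomial")
carA <- data.frame(wt = 2.1, disp = 180)

# Linear predictor (log of the odds) for carA
b <- coef(model1)
eta <- b[["(Intercept)"]] + b[["wt"]] * carA$wt + b[["disp"]] * carA$disp

# Inverse logit: convert the log of the odds to a probability (same as plogis(eta))
p_manual <- 1 / (1 + exp(-eta))

# type = "link" gives the log of the odds, type = "response" gives the probability
predict(model1, carA, type = "link")
p_response <- predict(model1, carA, type = "response")
c(manual = p_manual, predict = unname(p_response))
```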

## By Jerome Gomes ##
## For queries and more information feel free to contact me @ [email protected] ##
## If you want this R-Script then mail me at the above mail id ##