multiple regression

SOMDEEP SEN; Business Analyst: Trimax Analytics (e) [email protected] ; (p): 09748229123 LinkedIn: http://linkd.in/1ifqs3x

Upload: somdeep-sen

Post on 27-Jan-2015


DESCRIPTION

Predicting the academic performance of an elementary school using attributes such as class size, enrollment, poverty, parent education, student performance, and teacher credentials, based on 400 elementary schools from the California Department of Education's API 2000 dataset

TRANSCRIPT

Page 1: Multiple regression


Page 2: Multiple regression

The Data set contains:

Performance of 400 elementary schools from the California Department of Education

Factors like class-size, parent education, student performance, etc.

Objectives:

To find the factors having major influence on the academic performance

To predict the academic performance of a school using those factors

Page 3: Multiple regression

Note: Factors have been chosen based on statistical significance

Factor                                        Impact
English language learners (ELL)               Negative
Percentage first year in school (Mobility)    Negative
Parent grad school (grad_sch)                 Positive
Percentage full credential (Full)             Positive
Average class size 4-6 (ACS_46)               Positive

Page 4: Multiple regression

Variable     Label                        Parameter
Intercept    Intercept                    459.71
ell          english language learners    -2.90
mobility     pct 1st year in school       -3.11
acs_46       avg class size 4-6           3.69
grad_sch     parent grad school           3.38
full         pct full credential          2.33

Regression Equation

API00 = 459.71 - 2.90*ell - 3.11*mobility + 3.69*acs_46 + 3.38*grad_sch + 2.33*full

To view the detailed SAS code, please visit the following link: http://bit.ly/1c08pGE
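As a quick illustration, the regression equation above can be evaluated directly. The sketch below is written in Python rather than SAS. The function name predict_api00 and the sample school values are hypothetical and only show how the coefficients are applied.

```python
# Coefficients copied from the parameter table above.
INTERCEPT = 459.71
COEFFICIENTS = {
    "ell": -2.90,
    "mobility": -3.11,
    "acs_46": 3.69,
    "grad_sch": 3.38,
    "full": 2.33,
}


def predict_api00(school):
    """Return the predicted API00 score for a dict of factor values."""
    return INTERCEPT + sum(
        coef * school[name] for name, coef in COEFFICIENTS.items()
    )


# Hypothetical school: these values are made up for illustration only.
example_school = {"ell": 10, "mobility": 10, "acs_46": 30, "grad_sch": 10, "full": 90}
print(round(predict_api00(example_school), 2))
```

Each coefficient is the expected change in API00 for a one-unit change in that factor, holding the other factors fixed.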

Page 6: Multiple regression

• ELLs are one of the fastest growing populations in the public schools

• The number of ELLs in CA is high due to its geographic location & economic significance

• ELL students come from different backgrounds & face multiple challenges

• But the main challenge continues to be communication

Recommendations:

• Provide special coaching to ELLs to ensure that they master English

• Special coaching should be done before they get tested in English in core content areas

• Ensure that all ELL students receive the full range of services

• Improve teacher training opportunities so teachers can understand the needs of ELLs

Page 7: Multiple regression

• Mobility refers to students making non-promotional school changes

• California students, like students in the rest of the U.S., are highly mobile

• Mobility happens due to the following reasons:

– Families changing their residences

– School changes initiated by students, especially in California

– School changes initiated by schools, especially in California

Page 8: Multiple regression

Recommendations:

Families should:

• Attempt to resolve problems at school before initiating transfer

• Make changes between semesters or at the end of the school year

Schools should:

• Counsel students to remain in the school if at all possible

• Prepare in advance for incoming transfers

• Assess the past enrollment history of incoming students

• Assess the number of previous school changes

• Facilitate the transition of new students as soon as they arrive

Page 9: Multiple regression

• Research shows that US students spend less than 15% of their time in school

• Therefore parent involvement is as important as the time spent in school

• Checking homework and attending school meetings influence student performance

• Educated parents find it easier to get involved than others

Recommendations:

• Look to ensure that the share of parents with school graduation lies between 65-70%

• Conduct parent interviews during the admission of the students

• Also take initiatives to increase parent engagement

• But schools shouldn’t limit a parent’s involvement based on socio-economic status

Page 10: Multiple regression

• Experienced teachers are more effective at raising student performance

• Experienced teachers are also more likely to be fully credentialed

• Hence teacher retention could be instrumental in performance improvement

• A shortage of fully credentialed teachers is a prime reason for low performance & mobility

• Many assume that financial incentives are the silver bullet, but that is only partially true

Recommendations:

• Financial incentives can make schools more attractive to more qualified teachers

• Money is necessary, but clearly not sufficient

• Teachers often leave due to poor working conditions and lack of administrative support

• Schools should recruit & develop administrators who can draw on the expertise of teachers

Page 11: Multiple regression

• An improvement in average class size in grades 4-6 (ACS_46) tends to improve performance

• ACS_46 can be improved when:

– Mobility is low

– Promotion of students from one grade to another is high

• So, it can be said that ACS_46 is an indicator of the overall academic performance

Recommendation:

• Focus should be on all the recommendations mentioned previously to improve ACS_46

Page 14: Multiple regression

• Outliers were found using PROC UNIVARIATE & treated accordingly

[Figures: before the treatment; after the treatment]

Page 15: Multiple regression

• The F-test is done to check the overall significance of the model

• H0: the independent variables, collectively or individually, can’t influence the dependent variable

• H1: the independent variables, collectively or individually, can influence the dependent variable

• If P value>α: H0 can’t be rejected & hence the model is useless

• If P value<α: H0 is rejected & hence some independent variables can influence the dependent variable

• In this case the P value<α & hence some independent variables can influence the dependent variable
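The decision rule above can be sketched in a few lines of Python. The F statistic for overall significance is computed from R-square, the number of observations n, and the number of predictors k. The slides report n = 400 and k = 5 but not the R-square or the P value, so the R-square below is a hypothetical placeholder and α = 0.05 is an assumed significance level. The P value itself is taken as given, as statistical software would report it.

```python
def f_statistic(r_squared, n_obs, n_predictors):
    """Overall F statistic: (R^2 / k) / ((1 - R^2) / (n - k - 1))."""
    explained = r_squared / n_predictors
    unexplained = (1.0 - r_squared) / (n_obs - n_predictors - 1)
    return explained / unexplained


def overall_test_decision(p_value, alpha=0.05):
    """Apply the rule from the slide: reject H0 only when P value < alpha."""
    return "reject H0" if p_value < alpha else "cannot reject H0"


# n and k come from the slides; the R-square value is hypothetical.
print(f_statistic(r_squared=0.80, n_obs=400, n_predictors=5))
print(overall_test_decision(p_value=0.0001))
```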

Page 16: Multiple regression

• Multicollinearity happens when the independent variables are highly interdependent

• Hence their individual impact on the dependent variable can’t be correctly estimated

• The extent of multicollinearity is captured by the variance inflation factor (VIF)

• The final model must have only those variables with a VIF no higher than roughly 1.5 to 2
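For reference, the VIF of a variable j is 1 / (1 - R_j^2), where R_j^2 is the R-square obtained by regressing variable j on the other independent variables. The Python sketch below covers only the two-predictor case, where R_j^2 equals the squared correlation between the two predictors. The data values are made up for illustration.

```python
import math


def pearson_r(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def vif_from_r_squared(r_squared_j):
    """VIF_j = 1 / (1 - R_j^2); equals 1 when variable j is uncorrelated with the rest."""
    return 1.0 / (1.0 - r_squared_j)


# Hypothetical predictor values, for illustration only.
x1 = [1, 2, 3, 4, 5]
x2 = [2, 1, 4, 3, 5]
# With exactly two predictors, R_j^2 is the squared correlation between them.
vif_two_predictors = vif_from_r_squared(pearson_r(x1, x2) ** 2)
print(vif_two_predictors)
```

With more than two predictors, R_j^2 has to come from a full auxiliary regression, which is what the VIF option in statistical software computes.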

Page 17: Multiple regression

• To control multicollinearity, certain variables are removed based on high VIF values

• For the rest, the significance of the corresponding population parameter is tested

• The P values of the variables are checked for significance

• Variables having P value>α are not important for the model

• The final model must have variables with P value<α & a VIF no higher than roughly 1.5 to 2
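The screening rule can be written as a small filter: keep a variable only if it is statistically significant (P value < α) and its VIF is low. In this Python sketch the variable names, P values, and VIFs are hypothetical, α = 0.05 is an assumed significance level, and the VIF cutoff of 2 is the upper end of the range mentioned on the slide.

```python
def select_variables(diagnostics, alpha=0.05, vif_cutoff=2.0):
    """Keep variables that are significant (p < alpha) and have a low VIF."""
    return [
        name
        for name, (p_value, vif) in diagnostics.items()
        if p_value < alpha and vif <= vif_cutoff
    ]


# Hypothetical diagnostics as (P value, VIF) pairs, not taken from the slides.
diagnostics = {
    "var_a": (0.001, 1.4),  # significant, low VIF: kept
    "var_b": (0.300, 1.2),  # not significant: dropped
    "var_c": (0.010, 5.6),  # significant but high VIF: dropped
}
print(select_variables(diagnostics))
```

In practice the model is refitted after each removal, since dropping one variable changes the P values and VIFs of the rest. The one-pass filter here only illustrates the criteria.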

Page 18: Multiple regression

• Heteroscedasticity occurs when the variance of the random error component is not constant

• White’s test is used to check for heteroscedasticity

• Null hypothesis: the model is homoscedastic

• If P value>α: H0 can’t be rejected & hence the model is homoscedastic, & vice versa

• The SPEC option (used alongside the VIF option on the MODEL statement) performs the check for heteroscedasticity

Page 19: Multiple regression

• Once the model has only the significant variables, the output file is created

• The output file contains the predicted values & the residuals

• The residuals saved in the output file are checked for normality

• This is done using PROC UNIVARIATE with the NORMAL option
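The residual check can be sketched in Python as follows. Note the substitution: PROC UNIVARIATE with the NORMAL option reports tests such as Shapiro-Wilk, whereas this sketch computes the Jarque-Bera statistic, chosen only because it needs nothing beyond the standard library. The sample values are made up for illustration.

```python
def residuals(actual, predicted):
    """Residual = actual value minus predicted value."""
    return [a - p for a, p in zip(actual, predicted)]


def jarque_bera(values):
    """Jarque-Bera statistic: n/6 * (S^2 + (K - 3)^2 / 4)."""
    n = len(values)
    mean = sum(values) / n
    m2 = sum((v - mean) ** 2 for v in values) / n  # second central moment
    m3 = sum((v - mean) ** 3 for v in values) / n  # third central moment
    m4 = sum((v - mean) ** 4 for v in values) / n  # fourth central moment
    skewness = m3 / m2 ** 1.5
    kurtosis = m4 / m2 ** 2
    return n / 6.0 * (skewness ** 2 + (kurtosis - 3.0) ** 2 / 4.0)


# Hypothetical actual and predicted scores, for illustration only.
actual = [700, 650, 820, 590, 760]
predicted = [690, 665, 800, 610, 750]
print(jarque_bera(residuals(actual, predicted)))
```

Under normality the statistic is approximately chi-square with 2 degrees of freedom in large samples, so values well above about 5.99 suggest non-normal residuals at the 5% level. The approximation is poor for small samples such as the toy data here.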

Page 20: Multiple regression

• Mean absolute percentage error or MAPE captures the overall % error of the model

• Ideally MAPE should be within 10%
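A minimal Python sketch of the MAPE calculation follows. The sample values are hypothetical, and the formula assumes no actual value is zero.

```python
def mape(actual, predicted):
    """Mean absolute percentage error, in percent. Actual values must be non-zero."""
    errors = [abs(a - p) / abs(a) for a, p in zip(actual, predicted)]
    return 100.0 * sum(errors) / len(errors)


# Hypothetical actual and predicted scores, for illustration only.
print(mape([100, 200], [110, 180]))
```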

Page 21: Multiple regression

• R-square captures the proportion of variation in the dependent variable that can be explained by the linear regression

• The higher the value of R-square, the better the explanatory power

• This acts as a measure of goodness of fit of the model

• The R-square value should be at least 65% or .65
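R-square can be computed from actual and predicted values as 1 - SSE/SST, where SSE is the sum of squared residuals and SST is the total sum of squares around the mean. A short Python sketch with made-up values:

```python
def r_squared(actual, predicted):
    """R^2 = 1 - SSE / SST."""
    mean_actual = sum(actual) / len(actual)
    sse = sum((a - p) ** 2 for a, p in zip(actual, predicted))  # unexplained variation
    sst = sum((a - mean_actual) ** 2 for a in actual)           # total variation
    return 1.0 - sse / sst


# Hypothetical actual and predicted scores, for illustration only.
print(r_squared([600, 700, 800], [620, 690, 790]))
```

A model that always predicts the mean of the actual values gets an R-square of 0, and a perfect fit gets 1.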

Page 22: Multiple regression

API00(E3)= C3+C4*D4+C5*D5+C6*D6+C7*D7+C8*D8

OR

API00 = 459.71 - 2.90*ell - 3.11*mobility + 3.69*acs_46 + 3.38*grad_sch + 2.33*full
