
© 2015 Hobsons Inc. | Proprietary and Confidential | Page 1 of 18

Starfish Predictive Modeling

Erie Community College, 10/23/15

Table of Contents
Methodology
Performance
Strongest Predictors
Observations/Comments
Comments about Data Quality
Figures

   


Methodology

Hobsons analyzed historical data provided by Erie Community College via the Starfish platform. The analysis includes three years of outcome data, beginning with the Fall 2011 term and ending with the Fall 2014 term. We developed a predictive model using a random sample of 20,000 semester registrations selected from that time period. The model predicts the outcome for a student who is enrolled in a given term. A positive outcome is defined as one in which the student either graduates at the end of that term or begins another term of study after that term (i.e., is a persisting student). In practice, because the data is time-censored, we can never know whether a student has permanently exited. A student is counted as non-persisting if there are no subsequent registrations in the database. We excluded data later than Fall 2014 to provide a one-year window in which to observe a student's potential return. Students who had not returned (or graduated) by Fall 2015 are presumed to be non-persisting. Of the sample of 20,000 semester registrations, 75.9% were characterized as returning and 24.1% as non-returning.
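The outcome definition above can be sketched as a small labeling function. This is a minimal illustration under stated assumptions, not the procedure Hobsons used: terms are encoded as hypothetical consecutive integers in chronological order, and the function and parameter names are hypothetical. `cutoff_term` stands for the last term included in the analysis (Fall 2014) and `last_observed_term` for the end of the observation window (Fall 2015).

```python
from typing import Iterable, Optional


def label_registration(term: int,
                       later_terms: Iterable[int],
                       graduated_term: Optional[int],
                       cutoff_term: int,
                       last_observed_term: int) -> Optional[bool]:
    """Label one student/term registration (hypothetical helper).

    Returns True for persisting, False for non-persisting, or None when the
    term falls after cutoff_term and so lacks a full observation window.
    Terms are hypothetical integer indexes in chronological order.
    """
    if term > cutoff_term:
        return None  # excluded: too recent to observe a possible return
    if graduated_term == term:
        return True  # graduated at the end of this term
    # Any later registration seen in the data counts as persisting. Because
    # the data is time-censored, no registration by last_observed_term is
    # presumed to mean the student did not persist.
    return any(term < t <= last_observed_term for t in later_terms)
```

For example, with `cutoff_term=9` and `last_observed_term=12`, a registration in term 3 followed by one in term 4 is labeled persisting, a registration in term 3 with no later registrations is labeled non-persisting, and a registration in term 10 is excluded.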

We evaluated the performance of the model using 25,790 term-registration records that were not used in constructing the model.
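The separation between the 20,000 modeling records and the 25,790 validation records can be illustrated with a disjoint random split. This sketch assumes each record is identified by a hashable key such as a (student_id, term) pair (a hypothetical layout), and the function name is hypothetical; the report does not say exactly how the validation records were drawn.

```python
import random
from typing import Hashable, List, Sequence, Tuple


def disjoint_split(keys: Sequence[Hashable], n_train: int, n_valid: int,
                   seed: int = 0) -> Tuple[List[Hashable], List[Hashable]]:
    """Draw non-overlapping training and validation samples of record keys."""
    if n_train + n_valid > len(keys):
        raise ValueError("not enough records for the requested sample sizes")
    rng = random.Random(seed)
    # Drawing one sample without replacement and cutting it in two
    # guarantees that no student/term record appears in both sets.
    drawn = rng.sample(list(keys), n_train + n_valid)
    return drawn[:n_train], drawn[n_train:]
```

With the report's figures, the call would be `disjoint_split(keys, 20000, 25790)` over all eligible student/term keys.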

We used a standard set of predictor variables derived from data in the Starfish database. We also incorporated any user attributes provided by the institution whose values do not change over time.
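The restriction to institution-provided attributes whose values do not change over time can be sketched as a filter over per-term attribute snapshots. The input layout (a mapping from a student ID to a list of per-term attribute dictionaries) and the function name are hypothetical assumptions for illustration.

```python
from typing import Dict, List, Set


def static_attributes(history: Dict[str, List[Dict[str, str]]]) -> Set[str]:
    """Return attribute names whose value never changes for any student.

    history maps a student ID to that student's per-term attribute values.
    A value that is missing in some terms but present in others counts as
    a change.
    """
    names: Set[str] = set()
    for terms in history.values():
        for snapshot in terms:
            names.update(snapshot)
    static: Set[str] = set()
    for name in names:
        # Static if every student shows at most one distinct value.
        if all(len({snap.get(name) for snap in terms}) <= 1
               for terms in history.values()):
            static.add(name)
    return static
```

For instance, an attribute like "EDUCATION GOALS" that keeps one value per student would pass the filter, while a hypothetical "ADVISOR" attribute that changes between terms would not.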

We used the model obtained from the historical data to provide a predictive score for all currently enrolled students.

Performance

To determine predictive power, we use the so-called c-statistic (sometimes known as the area under the ROC curve), which measures whether students with higher scores tend to persist more than students with lower scores. The c-statistic is the probability that a randomly selected persisting student will have a higher predictive score than a randomly selected non-persisting student. The c-statistic on the validation set was 74%. A perfect predictor would have a c-statistic of 100%, while a random guesser would score 50%. At 74%, this model falls between these extremes: it discriminates better than chance but is not a perfect predictor. Although validation data was not used to construct the model, the validation data and the data used to build the model are drawn from the same statistical distribution (Fall 2011 to Fall 2014). The distribution for students attending in Fall 2015 may be different.
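The c-statistic definition above translates directly into a pairwise comparison. This sketch (hypothetical function name) counts, over every persisting/non-persisting pair, how often the persisting student has the higher score, with ties counted as one half, a common convention that the report does not specify. It is quadratic in the number of students; for a validation set of this size, a rank-based computation such as scikit-learn's `roc_auc_score` gives the same value more efficiently.

```python
from typing import Sequence


def c_statistic(scores: Sequence[float], persisted: Sequence[bool]) -> float:
    """Probability that a random persisting student outscores a random
    non-persisting student; ties count as one half."""
    pos = [s for s, y in zip(scores, persisted) if y]
    neg = [s for s, y in zip(scores, persisted) if not y]
    if not pos or not neg:
        raise ValueError("need at least one student with each outcome")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))
```

A model that ranks every persisting student above every non-persisting student scores 1.0; one whose scores carry no information averages about 0.5.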

Figure 1 shows the ability of the model to discriminate between students who persist and those who do not. Specifically, it shows the distribution of scores that the model assigned to students who did (green) and did not (red) persist, without using any foreknowledge of the outcome. In a predictive model, students who persist tend to have higher scores than those who do not. Figure 1 exhibits this tendency: a randomly selected persisting student is more likely to have a score above 75% than a randomly selected non-persisting student.


Figure 1 - Distribution (probability density function) of predictive scores for both persisting and non-persisting students, obtained from the validation set. Green indicates persisting and red non-persisting students. The horizontal axis is the score, which ranges from 0 to 1. The vertical coordinate is a smoothed density estimate. The mean score is 0.759 (i.e., 75.9%). A score above 75.9% is above average and indicates a lower than average risk. Students with scores below 50% (about 9.6% of the validation-set students) have a relatively high risk of not returning for another term. Students with scores in the range of 50% to 75.9% are at elevated risk, but more likely than not they will persist for an additional term.
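The risk bands in the Figure 1 caption can be expressed as a simple lookup. The function name, the band labels, and the treatment of the exact boundary values (scores of exactly 50% or 75.9% are placed in the lower-risk band) are assumptions; the caption does not say how boundary scores are classified.

```python
def risk_band(score: float, mean_score: float = 0.759) -> str:
    """Map a predictive score (0 to 1) to the risk bands described for Figure 1."""
    if not 0.0 <= score <= 1.0:
        raise ValueError("score must be between 0 and 1")
    if score < 0.5:
        return "high risk"
    if score < mean_score:
        return "elevated risk"  # 50% up to the mean score of 75.9%
    return "lower than average risk"
```

Passing a different `mean_score` would let the same bands be recomputed if a future model has a different average score.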

     


Strongest Predictors

Our models are nonlinear regressions. Predictive variables, therefore, do not have the coefficients used in linear regression models. Instead, the strength of each predictor is measured by a “variable importance factor” that reflects how much that variable reduces modeling error. It is a rough guide to the importance of individual attributes. The 10 predictor variables with the highest variable importance are listed in Table 1. Here, the variable importance scores have been normalized so that they sum to 100% (a score of 0.048 is 4.8% of the total), and these top 10 represent about 41% of the total. Two of the institution-provided attributes also appeared to have significant variable importance scores: “EDUCATION GOALS” with an importance of 0.027 and “EMPLOYMENT” with an importance of 0.024.

Table 1 - Important Predictive Variables with Measures of their Importance

Variable Description                            Variable Importance
Cumulative GPA (at start of term)               0.048
Program                                         0.048
Age Entering the Institution                    0.045
Age Entering the Program                        0.045
Current Age                                     0.043
Attempted Hours                                 0.040
Cumulative Quality Points (at start of term)    0.038
GPA (prior term)                                0.035
Quality Points (last term)                      0.034
Credential                                      0.032
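The normalization and the claim that the top 10 account for about 41% of the total can be checked against Table 1. Because the listed scores are already normalized fractions, summing them gives the top-10 share. The `normalize` helper is a generic, hypothetical sketch of rescaling raw scores to sum to 1; it does not reflect how the underlying importance factor itself is computed.

```python
from typing import Dict


def normalize(importances: Dict[str, float]) -> Dict[str, float]:
    """Rescale raw importance scores so that they sum to 1 (100%)."""
    total = sum(importances.values())
    return {name: value / total for name, value in importances.items()}


# Normalized importances of the top 10 predictors, transcribed from Table 1.
TOP_10 = {
    "Cumulative GPA (at start of term)": 0.048,
    "Program": 0.048,
    "Age Entering the Institution": 0.045,
    "Age Entering the Program": 0.045,
    "Current Age": 0.043,
    "Attempted Hours": 0.040,
    "Cumulative Quality Points (at start of term)": 0.038,
    "GPA (prior term)": 0.035,
    "Quality Points (last term)": 0.034,
    "Credential": 0.032,
}

top_10_share = sum(TOP_10.values())
print(f"Top-10 share of total importance: {top_10_share:.1%}")  # 40.8%
```

The exact sum is 0.408, which the text rounds to about 41%.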

 

Figures 2 through 13 compare model predictions with actual outcomes for these 10 variables (with the exception of the program) and a few additional variables. Each dot represents a group of students in the validation data with similar values of the independent variable. The horizontal position of a dot is the average value of the independent variable for its group, and the vertical position is a persistence rate: the actual rate for the green dots and the rate predicted by the model for the red dots. No student/term combination that was used to construct the model was included in the validation data, so the model had no foreknowledge of the outcome for that term. The area of each dot is proportional to the group's sample size. The variance of an observed persistence rate increases as the sample size decreases; in other words, the persistence rate cannot be determined with the same precision for smaller samples as for larger ones.
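The link between sample size and precision can be made concrete with the binomial standard error of an observed rate, sqrt(p(1 - p)/n). This is a textbook approximation offered for illustration, with a hypothetical function name; the report does not state how, or whether, error bands were computed for the figures.

```python
import math


def rate_standard_error(rate: float, n: int) -> float:
    """Binomial standard error of an observed persistence rate over n students."""
    if n <= 0:
        raise ValueError("sample size must be positive")
    return math.sqrt(rate * (1.0 - rate) / n)


# Same 75.9% rate, very different group sizes: quadrupling n halves the error.
for n in (25, 100, 400, 1600):
    print(n, round(rate_standard_error(0.759, n), 3))
```

At the overall 75.9% rate, a group of 25 students has a standard error of about 8.6 percentage points, versus about 1.1 points for a group of 1,600, which is why small dots can sit farther from the model's prediction.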

The student's program is a significant factor, partly because Non-Matriculated students form a large group with a relatively low persistence rate. Table 2 shows the model-predicted and actual persistence rates for the 18 largest programs.

Observations/Comments

Age is an important factor. Students who begin a program when they are about 22 to 23 years old tend to persist less than those who are older or younger (Figures 3 and 4). Students who are currently in this age range (who may have entered at a younger age) are also less likely to persist (Figure 5). Persistence appears to improve as students move beyond this critical age.

Another critical period is the first year or the first 30 hours of study, during which the likelihood of persistence steadily increases. This trend appears in several graphs: Figure 7, which shows persistence plotted against cumulative quality points; Figure 11, which shows persistence plotted against time at the institution; and Figure 12, which shows persistence plotted against cumulative earned credit hours.

Persistence decreases when the cumulative GPA or the prior-term GPA falls below 2.0 (Figures 2 and 8). Persistence increases as the credit-hour load increases from 0 to about 16 hours (Figure 6).

Comments about Data Quality

The quality of the predictions may be limited by the quality of the data. We have made an effort to detect and mitigate data quality problems where possible; however, not all of the irregularities have been corrected. As one example, term GPAs and cumulative GPAs were reported as zero rather than null for students who had no classes. This makes it difficult to distinguish a student who performed poorly (an earned GPA of zero) from a student who took no classes. As another example, we observed that some (but not all) students earned credit (per the student term status file) for courses they had failed (per the course outcome file). The methods we use are very robust and should cope with these cases, provided that there is no bias in the data. Bias would be introduced, for example, by using a different reporting process for historical student data than for current student data. It could also be introduced by retroactively modifying historical data for students who left, but not for those who were retained.
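One possible mitigation for the zero-versus-null GPA problem is to treat a reported GPA of zero as missing whenever the student attempted no credit hours in that term. This is a hypothetical cleaning rule with hypothetical field names (`attempted_hours`, `term_gpa`); the report does not describe the mitigation Hobsons actually applied.

```python
from dataclasses import dataclass, replace
from typing import Optional


@dataclass(frozen=True)
class TermRecord:
    """One student/term row (hypothetical fields)."""
    student_id: str
    term: str
    attempted_hours: float
    term_gpa: Optional[float]


def clean_term_gpa(record: TermRecord) -> TermRecord:
    """Treat a GPA reported as zero as missing when no hours were attempted.

    A zero GPA with attempted hours is kept: it reflects real (poor)
    performance rather than the absence of classes.
    """
    if record.attempted_hours == 0 and record.term_gpa == 0:
        return replace(record, term_gpa=None)
    return record
```

The rule only helps if attempted hours are themselves reported reliably; it would not address the second example, credit earned for failed courses.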


Table 2 - Predicted and Actual Persistence Rates by Program (large programs only). The average persistence rate (predicted and actual, all programs) was about 75.9%.

Program                                         Actual Persistence   Predicted Persistence   Sample Size
Non-Matriculated                                57.6%                54.4%                   3673
Bus-Business Administration                     73.1%                78.7%                   759
Early Childhood                                 73.8%                77.3%                   325
Physical Education Studies                      74.6%                77.8%                   268
Culinary Arts                                   75.3%                78.9%                   344
Mntl Hlth Ass't-Substance Abuse                 76.0%                79.4%                   312
Business Administration                         76.4%                78.4%                   1428
Lib Arts & Sci-Hum & Soc Sci.                   76.5%                77.4%                   1313
Criminal Justice                                76.7%                78.0%                   1000
Lib Arts & Sci-Mathematics & Sci.               77.8%                79.4%                   609
Lib Arts & Sci-General Studies                  78.2%                78.7%                   8206
Criminal Justice: Law Enforcement               78.7%                79.3%                   567
Engineering Science                             81.4%                80.7%                   376
Automotive Technology                           81.6%                79.9%                   412
Information Technology                          82.3%                80.7%                   277
Communication & Media Arts-Communication Arts   83.3%                81.3%                   324
Paralegal                                       83.4%                82.7%                   271
Nursing                                         94.4%                91.0%                   554
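To see how much the Non-Matriculated group pulls down the average, Table 2 can be summarized with sample-size-weighted means. This sketch (hypothetical names) uses only the 18 programs listed, so its results describe those programs rather than the full validation set.

```python
from typing import List, Tuple

Row = Tuple[str, float, float, int]

# (program, actual rate, predicted rate, sample size), transcribed from Table 2.
TABLE_2: List[Row] = [
    ("Non-Matriculated", 0.576, 0.544, 3673),
    ("Bus-Business Administration", 0.731, 0.787, 759),
    ("Early Childhood", 0.738, 0.773, 325),
    ("Physical Education Studies", 0.746, 0.778, 268),
    ("Culinary Arts", 0.753, 0.789, 344),
    ("Mntl Hlth Ass't-Substance Abuse", 0.760, 0.794, 312),
    ("Business Administration", 0.764, 0.784, 1428),
    ("Lib Arts & Sci-Hum & Soc Sci.", 0.765, 0.774, 1313),
    ("Criminal Justice", 0.767, 0.780, 1000),
    ("Lib Arts & Sci-Mathematics & Sci.", 0.778, 0.794, 609),
    ("Lib Arts & Sci-General Studies", 0.782, 0.787, 8206),
    ("Criminal Justice: Law Enforcement", 0.787, 0.793, 567),
    ("Engineering Science", 0.814, 0.807, 376),
    ("Automotive Technology", 0.816, 0.799, 412),
    ("Information Technology", 0.823, 0.807, 277),
    ("Communication & Media Arts-Communication Arts", 0.833, 0.813, 324),
    ("Paralegal", 0.834, 0.827, 271),
    ("Nursing", 0.944, 0.910, 554),
]

ACTUAL, PREDICTED = 1, 2  # column positions in each row


def weighted_rate(rows: List[Row], column: int) -> float:
    """Sample-size-weighted mean of a rate column."""
    total_n = sum(row[3] for row in rows)
    return sum(row[column] * row[3] for row in rows) / total_n


others = [row for row in TABLE_2 if row[0] != "Non-Matriculated"]
for label, rows in (("All 18 programs", TABLE_2), ("Excluding Non-Matriculated", others)):
    print(f"{label}: actual {weighted_rate(rows, ACTUAL):.1%}, "
          f"predicted {weighted_rate(rows, PREDICTED):.1%}")
```

Because every other listed program has an actual rate of at least 73.1%, removing the 3,673 Non-Matriculated registrations necessarily raises the weighted average.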

 


Figures    

Figure 2 - Predicted and Actual Persistence Rates vs Cumulative GPA (at the beginning of the term of attendance). Each dot represents a group of students having approximately the same GPA. The green dots represent actual outcomes for students during a term in which they were enrolled. The red dots represent the model prediction for the same group of students. Student/term combinations used to build the model were excluded from the validation data. However, the validation data comes from the same time period as the data used to construct the model, so it has the same statistical distribution.
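The dots in Figures 2 through 13 summarize groups of students with similar values of one variable. A minimal sketch of that grouping, assuming equal-count (quantile) groups, which the report does not specify, computes each group's mean value (horizontal position), actual and mean predicted persistence (vertical positions of the green and red dots), and size (dot area). The names `Group` and `group_by_value` are hypothetical.

```python
from dataclasses import dataclass
from typing import List, Sequence


@dataclass
class Group:
    mean_x: float          # horizontal position of the dot
    actual_rate: float     # green dot: observed persistence rate
    predicted_rate: float  # red dot: mean model score
    n: int                 # dot area is proportional to this


def group_by_value(x: Sequence[float], persisted: Sequence[bool],
                   predicted: Sequence[float], n_groups: int) -> List[Group]:
    """Sort records by x and split them into n_groups groups of near-equal size."""
    if not (len(x) == len(persisted) == len(predicted)):
        raise ValueError("inputs must have the same length")
    order = sorted(range(len(x)), key=lambda i: x[i])
    groups: List[Group] = []
    for g in range(n_groups):
        idx = order[g * len(order) // n_groups:(g + 1) * len(order) // n_groups]
        if not idx:
            continue
        n = len(idx)
        groups.append(Group(
            mean_x=sum(x[i] for i in idx) / n,
            actual_rate=sum(persisted[i] for i in idx) / n,
            predicted_rate=sum(predicted[i] for i in idx) / n,
            n=n,
        ))
    return groups
```

For the categorical variables (Figures 10 and 13), the natural analogue would be one group per category rather than quantile groups.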

 

 


Figure 3 - Predicted and Actual Persistence Rates vs Age Entering the Institution. Each dot represents a group of students having approximately the same age. The green dots represent actual outcomes for students during a term in which they were enrolled. The red dots represent the model prediction for the same group of students. Student/term combinations used to build the model were excluded from the validation data. However, the validation data comes from the same time period as the data used to construct the model, so it has the same statistical distribution.


Figure 4 - Predicted and Actual Persistence Rates vs Age Entering the Program. Each dot represents a group of students having approximately the same age. The green dots represent actual outcomes for students during a term in which they were enrolled. The red dots represent the model prediction for the same group of students. Student/term combinations used to build the model were excluded from the validation data. However, the validation data comes from the same time period as the data used to construct the model, so it has the same statistical distribution.


Figure 5 - Predicted and Actual Persistence Rates vs Current Age (at the beginning of the term of attendance). Each dot represents a group of students having approximately the same age. The green dots represent actual outcomes for students during a term in which they were enrolled. The red dots represent the model prediction for the same group of students. Student/term combinations used to build the model were excluded from the validation data. However, the validation data comes from the same time period as the data used to construct the model, so it has the same statistical distribution.


Figure 6 - Predicted and Actual Persistence Rates vs Attempted Hours. Each dot represents a group of students having approximately the same attempted hours. The green dots represent actual outcomes for students during a term in which they were enrolled. The red dots represent the model prediction for the same group of students. Student/term combinations used to build the model were excluded from the validation data. However, the validation data comes from the same time period as the data used to construct the model, so it has the same statistical distribution.

 


Figure 7 - Predicted and Actual Persistence Rates vs Cumulative Quality Points (at the beginning of the term of attendance). Each dot represents a group of students having approximately the same quality points. The green dots represent actual outcomes for students during a term in which they were enrolled. The red dots represent the model prediction for the same group of students. Student/term combinations used to build the model were excluded from the validation data. However, the validation data comes from the same time period as the data used to construct the model, so it has the same statistical distribution.


Figure 8 - Predicted and Actual Persistence Rates vs Term GPA (prior term). Each dot represents a group of students having approximately the same term GPA. The green dots represent actual outcomes for students during a term in which they were enrolled. The red dots represent the model prediction for the same group of students. Student/term combinations used to build the model were excluded from the validation data. However, the validation data comes from the same time period as the data used to construct the model, so it has the same statistical distribution.

 

 

 

 

 


Figure 9 - Predicted and Actual Persistence Rates vs Term Quality Points (prior term). Each dot represents a group of students having approximately the same term quality points. The green dots represent actual outcomes for students during a term in which they were enrolled. The red dots represent the model prediction for the same group of students. Student/term combinations used to build the model were excluded from the validation data. However, the validation data comes from the same time period as the data used to construct the model, so it has the same statistical distribution.


Figure 10 - Predicted and Actual Persistence Rates vs Credential Sought. The green bars represent actual outcomes for students during a term in which they were enrolled. The red dots represent the model prediction for the same group of students. Student/term combinations used to build the model were excluded from the validation data. However, the validation data comes from the same time period as the data used to construct the model, so it has the same statistical distribution.

 


Figure 11 - Predicted and Actual Persistence Rates vs Time at Institution (as of the beginning of the term of attendance). Each dot represents a group of students having approximately the same time at the institution. The green dots represent actual outcomes for students during a term in which they were enrolled. The red dots represent the model prediction for the same group of students. Student/term combinations used to build the model were excluded from the validation data. However, the validation data comes from the same time period as the data used to construct the model, so it has the same statistical distribution.


Figure 12 - Predicted and Actual Persistence Rates vs Cumulative Earned Hours (as of the beginning of the term of attendance). Each dot represents a group of students having approximately the same earned hours. The green dots represent actual outcomes for students during a term in which they were enrolled. The red dots represent the model prediction for the same group of students. Student/term combinations used to build the model were excluded from the validation data. However, the validation data comes from the same time period as the data used to construct the model, so it has the same statistical distribution.

 

 


Figure 13 - Predicted and Actual Persistence Rates vs Education Goals Attribute. The green bars represent actual outcomes for students during a term in which they were enrolled. The red dots represent the model prediction for the same group of students. Student/term combinations used to build the model were excluded from the validation data. However, the validation data comes from the same time period as the data used to construct the model, so it has the same statistical distribution.