logistic regression with low event rate (rare events)

Upload: tejamoy-ghosh

Post on 01-Jun-2018


TRANSCRIPT

  • 8/9/2019 Logistic Regression With Low Event Rate (Rare Events)

    1/12

    Logistic Regression with Low Event Rate (or Rare Events)

    1/28/15 Tejamoy Ghosh Data Science ATG - New Delhi, India


    Contents:

    Problem with logistic regression with low event rate

    Way out

    How to do them in SAS?

    How to do them in R?


    A typical conversation

    Analyst 1: I'm in some trouble. My manager wants me to build a logistic regression model, but I have only a 2% event rate in my data. Logistic regression won't be a good choice here; the ML estimate will be biased.

    Analyst 2: Not necessarily. It's the total count rather than the percentage of events that matters. How many cases do you have for the rarer event, and how big is your dataset?

    Analyst 1: We've got about 1800 events in a dataset of about 100,000 cases, a less than 2% scenario.

    Analyst 2: Hmm. With this many cases of the rarer event, you can very well use logistic regression. There are methods to address such skewed, or sparse, data situations.

    Analyst 1: Wow. Really? Please tell me more.

    Analyst 2: There are a couple of alternatives. For one, you can use exact logistic regression; this is to be used when the sample size is too small for your usual logistic regression using the regular maximum-likelihood-based estimation. Another option in your scenario is to use the penalized-likelihood estimation method. This second one has the advantage of being computationally less demanding than the exact logistic method.


    What's wrong w/ my regular logistic regression when the event rate is low?

    Low event rate/Rare event: In the current context, this refers to the scenario where, under a binary outcome space (response/no-response, good/bad, default/no-default, purchase/no-purchase, etc.), one of the two events is far rarer than the other.

    Suppose in a sample of 1000 applicants for a position only 20 are selected; here the event of being selected is the rare event, with a low event rate of 2%.

    Suppose, in a sample of 100,000 purchases from an online retailer, about 1800 are returned by the customer; here the event of goods being returned is the rare event, with a low event rate of 1.8%.
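The two samples in these examples have nearly the same event rate but very different counts of the rarer event. A minimal R sketch of that arithmetic, using the figures from the slide (the data frame and column names are hypothetical, chosen only for illustration):

```r
# Figures from the two examples; object and column names are illustrative only
samples <- data.frame(
  scenario = c("job applicants", "online purchases"),
  n        = c(1000, 100000),   # sample size
  events   = c(20, 1800)        # count of the rarer event
)

# Event rate in percent: similar rates, very different event counts
samples$event_rate_pct <- 100 * samples$events / samples$n
samples
```

The rate alone hides the difference between 20 and 1800 events, which is the quantity the rest of the deck focuses on.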

    Some real-life examples: charge-backs in credit card transactions; goods returned in online retailing.

    Why is this a problem for logistic regression? It's still binary anyway. The problem here is with the estimation method: the usual maximum-likelihood method is susceptible to "small sample bias", and this bias is strongly dependent on the count (as opposed to the percentage) of the rarer of the two events.
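One way to see the dependence on the event count is a small Monte Carlo experiment. The sketch below is only an illustration under assumed settings, not part of the original deck: the true model (intercept -4, slope 1, giving an event rate of roughly 3%), the sample sizes, the number of replications, and the helper name `ml_slope` are all hypothetical choices.

```r
set.seed(2015)  # arbitrary seed, for reproducibility only

# Simulate one data set of size n and return the ML estimate of the slope.
# Assumed true model (hypothetical): logit(p) = -4 + 1 * x
ml_slope <- function(n, beta0 = -4, beta1 = 1) {
  x <- rnorm(n)
  y <- rbinom(n, 1, plogis(beta0 + beta1 * x))
  fit <- suppressWarnings(glm(y ~ x, family = binomial))
  unname(coef(fit)["x"])
}

# Same event rate in both settings, but very different event counts
slopes_small <- replicate(200, ml_slope(n = 500))    # roughly 14 events per data set
slopes_large <- replicate(50, ml_slope(n = 50000))   # roughly 1,400 events per data set

# Average deviation from the true slope of 1, and the spread of the estimates
c(bias_small = mean(slopes_small) - 1, bias_large = mean(slopes_large) - 1)
c(sd_small = sd(slopes_small), sd_large = sd(slopes_large))
```

With only a handful of events per data set the estimates are far more variable, and any systematic deviation from the true value is easier to detect; with well over a thousand events they should settle close to the truth even though the event rate is unchanged.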


    What's the way out then?

    In case of small-sample and/or very unbalanced binary data (when you have just 20 cases in a sample of 1000), "exact logistic regression" is to be used.

    The exact logistic regression approach provides an alternative to the maximum-likelihood method for making inferences about the parameters of the logistic regression model.

    The method is based on appropriate exact distributions of sufficient statistics for the parameters of interest, and the estimates given by exact logistic regression do not depend on asymptotic results.

    It is useful for analyzing small or unbalanced binary data with covariates.

    This method is usually very computationally intensive.

    If, however, you have a larger count of the rarer of the two events, say 1000 (even better if it's 2000), in a sample of 100,000 with the same low event rate (1% to 2%), you can use logistic regression; the estimation will have to be done using the "penalized likelihood" method (also called Firth's penalized likelihood approach, after its inventor).

    While we mentioned this method only in the context of the small sample size/rare event scenario, it is a general method for addressing issues of separability, small sample sizes, and bias of the parameter estimates.
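Firth's approach maximizes the log-likelihood plus half the log-determinant of the Fisher information. A minimal base-R sketch of that idea follows; it is not the algorithm used by any particular package. The toy data are hypothetical and chosen to be perfectly separated, the function name `firth_negloglik` is made up for illustration, and a general-purpose optimizer (`optim`) stands in for a dedicated fitting routine.

```r
# Toy data (hypothetical): x perfectly separates y, so plain ML has no finite solution
x <- c(-2, -1, -0.5, 0.5, 1, 2)
y <- c(0, 0, 0, 1, 1, 1)
X <- cbind(1, x)  # design matrix with an intercept column

# Negative Firth-penalized log-likelihood:
#   l*(beta) = l(beta) + 0.5 * log det(X' W X),  W = diag(p * (1 - p))
firth_negloglik <- function(beta, X, y) {
  eta <- drop(X %*% beta)
  p <- plogis(eta)
  loglik <- sum(y * eta - log1p(exp(eta)))
  info <- crossprod(X, X * (p * (1 - p)))  # Fisher information X' W X
  log_det <- as.numeric(determinant(info, logarithm = TRUE)$modulus)
  -(loglik + 0.5 * log_det)
}

firth_fit <- optim(c(0, 0), firth_negloglik, X = X, y = y, method = "BFGS")
ml_fit <- suppressWarnings(glm(y ~ x, family = binomial))

firth_fit$par  # penalized estimates (intercept, slope) stay finite
coef(ml_fit)   # the ML slope keeps growing until the iterations stop
```

The penalty term shrinks toward minus infinity as the fitted probabilities approach 0 or 1, which is what keeps the penalized estimates finite under separation.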


    How to do them in SAS


    Exact Logistic SAS code

    Proc Logistic Data = YourRareEventData descending;
      Freq CellCount; /* the CellCount variable is the weight vbl here */
      model RareEvent = X1 X2;
      Exact X1 / estimate = both;
    Run;

    You can add other options for what you want to have in your output.

    The option "Exact" after the model statement and the "Freq" statement are the key differences here.

    An alternative Event/Trial syntax:

    Proc Logistic Data = YourRareEventData;
      model RareEvent / CellCount = X1 X2;
      Exact X1 / estimate = both;
    Run;


    Penalized Logistic SAS code

    Proc Logistic Data = YourRareEventData;
      class CategoricalVbl1 CategoricalVbl2 / param = ref;
      model RareEvent = CategoricalVbl1 CategoricalVbl2 X1 X2 / firth;
    run;

    You can add other options for what you want to have in your output.

    The option "FIRTH" in the model statement is the key here.


    How to do them in R


    Exact Logistic in R

    Package required: "elrm"

    This package implements (approximate


    Penalized Logistic in R

    Package: "logistf"

    This package runs Firth's bias-reduced logistic regression approach with penalized profile likelihood based confidence intervals for parameter estimates.

    Another package, "penalized", runs penalized generalized linear models and penalized regression models.
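A minimal usage sketch for the "logistf" package, assuming it is installed. The simulated data frame is hypothetical; its name and the variable names (YourRareEventData, RareEvent, X1, X2) simply mirror the SAS examples, and the coefficients used to generate it are arbitrary.

```r
set.seed(42)  # arbitrary seed, for reproducibility only

# Hypothetical data: 2,000 cases with an event rate of a few percent
n <- 2000
YourRareEventData <- data.frame(X1 = rnorm(n), X2 = rnorm(n))
YourRareEventData$RareEvent <- rbinom(
  n, 1, plogis(-4 + 0.8 * YourRareEventData$X1 + 0.5 * YourRareEventData$X2)
)

fit <- NULL
if (requireNamespace("logistf", quietly = TRUE)) {
  # Firth's penalized-likelihood logistic regression
  fit <- logistf::logistf(RareEvent ~ X1 + X2, data = YourRareEventData)
  # Penalized estimates with their profile penalized likelihood confidence limits
  print(cbind(estimate = fit$coefficients,
              lower    = fit$ci.lower,
              upper    = fit$ci.upper))
}
```

The call mirrors an ordinary `glm` formula interface, so switching an existing model to the penalized fit is mostly a matter of replacing the fitting function.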

    1/28/15 Arup Guha - Indian Institute of Foreign Trade - New Delhi, India


    Data Sciences ATG

    EDUCATION

    Econometrics, Statistics, Economics

    Vanderbilt, Cincinnati, Indian Statistical Institute, Jawaharlal Nehru University

    Research Scholars

    Journal Articles

    Free Solutions to Challenging Data Problems

    EXPERIENCE

    01 years combined, Marketing analytics, Risk analytics, Financial analytics, Analytic Solution & Tools Development, Analytics CoE set-up, Advanced Analytics Training

    EXISTING/SERVED CLIENTS

    A large Global Beverage company, A small insurance company, A renowned business school, A large Global HR & Compensation Consulting Group, A large Global IT Research group, A third party analytics vendor, A mid sized analytics consulting

    EXPERTISE

    Predictive modelling, Segmentation, Market research, Clickstream data analysis, Forecasting, Financial Time Series, Simulation, Bayesian econometrics, Machine Learning Techniques, Decision Trees, SAS, SPSS, R, Octave, Stata, Eviews, Matlab, Maxima, Netlogo

    What we don't do:

    Quick and dirty back of the envelope calculation

    Use jargon presentations with little impact on your problem

    Hide that we are stumped

    What we do:

    FREE analytics help to stuck analysts and consultants

    Customized analytics solutions to institutes and companies

    FREE snapshot to companies considering entering analytics

    Apply analytics in non-traditional areas including films & education

    FREE data analysis help to students, researchers and faculty