artificial intelligence and behavioral...

Artificialintelligenceandbehavioraleconomics

ColinF.CamererCaltech

[email protected]/7/17.10:01amPST.PreparedforNBERconference,September13-14,2017.Pleasedonotquoteorcirculate.Commentsareverymuchwelcome.I:Introduction

Thispaperdescribes2-1/2highlyspeculativeideasabouthowartificialintelligence(AI)andbehavioraleconomicsmayinteract,particularinfuturedevelopmentsintheeconomyandinresearchfrontiers.FirstnotethatI’llusethetermsAIandmachinelearning(ML)interchangeably(althoughAIisbroader)becausetheexamplesIhaveinmindallinvolveMLandprediction.AgoodintroductiontoMLforeconomistsisMullainathanandSpiess(2017).

ThefirstideaisthatAIcanbeusedinthesearchfornew“behavioral”-typevariablesthataffectchoice.Twoexamplesaregiven,fromexperimentaldataonbargainingandonriskychoice. Thesecondideaisthatsomecommonlimitsonhumanpredictionmightbeunderstoodasthekindsoferrorsmadebypoorimplementationsofmachinelearning.Thatis,peoplearethinkingasiftheyareexecutingmachinelearningalgorithmsbutaredoingamediocrejobofit. Thehalfidea—it’sshort--isthatitisimportanttostudyhowAItechnologyusedinfirmsandbyotherinstitutionscanbothovercomeandexploithumanlimits.Thefullestunderstandingofthistech-humaninteractionwillrequirenewknowledgefrombehavioraleconomicsaboutattentionandperceivedfairness. Pleaseacceptmyapologiesforlimitsofthispaper:IknowabitaboutAI,fromteachingneuralnetworksinthe1990s,teachingdataanalyticsmorerecently,andhavinggeneroussmartcolleaguesatCaltechandelsewhere,asteachers.ButI’mnotanexpert.AndIwrotethisversioninahurry!II:Machinelearningtofindbehavioralvariables Behavioraleconomicscanbedefinedasthestudyofnaturallimitsoncomputation,willpowerandself-interest,andtheimplicationsofthoselimitsformarketequilibrium.Adifferentapproachistodefinebehavioraleconomicsmore

generally,assimplybeingopen-mindedaboutwhatvariablesarelikelytoinfluenceeconomicchoices. Onewaytodescribethisopen-mindednessistolistneighboringsocialscienceswhicharelikelytobethemostfruitfulsourceofexplanatoryvariables—psychology,perhapssociology(e.g.,norms),anthropology(culturalvariationincognition),neuroscience,etc.Callthisthe“behavioraleconomicsborrowsfromitsneighbors”view. Buttheopen-mindednesscouldalsobecharacterizedevenmoregenerally,asaninvitationtomachine-learnhowtopredictfromthelargestpossiblefeatureset.Inthe“behavioraleconomicsborrowsfromitsneighbors”view,featuresareconstructsandtheirmeasurescontributedbydifferentsciences.Thesecouldbeloss-aversion,identity,moralnorms,in-grouppreference,inattention,habit,model-freereinforcementlearning,etc. Butwhystopthere? InageneralMLapproach,predictivefeaturescouldbe—andshouldbe--anyvariablesthatpredict.Thesecanbemeasurablepropertiesofchoices,thesetofchoices,affordancesandmotorinteractionsduringchoosing,measuresofattention,psychophysiologicalmeasuresofbiologicalstates,socialinfluences,propertiesofindividualsdoingthechoosing(SES,wealth,moods,personality,genes),andsoforth.Themorevariables,themerrier. Fromthisperspective,wecanthinkaboutwhatsetsoffeaturesarecontributedbydifferentdisciplinesandtheories.Whatfeaturesdoestextbookeconomictheorycontribute?Constrainedutility-maximizationinitsmostfamiliarformpointstoonlythreekindsofvariables—prices,information(whichcaninformutilities)andconstraints. Mostpropositionsinbehavioraleconomicsaddsomevariablestothislistoffeatures,suchasreference-dependence,context-dependence(menueffects),anchoring,etc. Anotherwaytosearchforpredictivevariablesistospecifyaverylonglistofcandidatevariables(=features)andincludealloftheminanMLapproach1.Thisapproachhastwoadvantages:First,simpletheoriescanbeseenasbetsthatonlyasmallnumberoffeatureswillpredictwell.Second,iflongerlistsoffeaturespredictbetterthanashortlistoftheory-specifiedfeatures,thenthatfindingestablishesaplausibleupperboundonhowmuchpotentialpredictabilityislefttounderstand.Theresultsarealsolikelytocreaterawmaterialfortheorytofigureouthowtoconsolidatetheadditionalpredictivepowerintocrystallizedtheory(seealsoKleinberg,Liang,andMullainathan2015).

1Anotherexampleisusingmultivoxelpatternanalysisto“decode”brainactivityfor

Ifbehavioraleconomicsisthoughtofasopen-mindednessaboutwhatvariablesmightpredict(asitisforme)thenanMLapproachwithmanyvariablesisapotentiallyusefulapproach.I’llillustrateitwithsomeexamples.2 Bargaining:Thereisalonghistoryofbargainingexperimentstryingtopredictwhatbargainingoutcomes(anddisagreementrates)willresultfromstructuralvariablesusinggame-theoreticmethods.Inthe1980stherewasasharpturn,inexperimentalwork,towardsnoncooperativeapproachesinwhichthecommunicationandstructureofbargainingwascarefullyfixed.Thishappenedbecausegametheory,atthetime,deliveredsharpnewpredictionsaboutwhatoutcomesmightresultdependingonthestructuralparameters(suchastheorderofbargainingoffers,discountratesovertime,etc.) However,mostnaturalbargainingisnotgovernedbysuchstrictrules.Therefore,itisimportanttounderstandwhatbargainingoutcomesmightresultwhenbargainingis“semi-structured”.“Semi-structured”meansthereisadeadlineandprotocolforacceptancebutotherwisenorestrictionsonwhocanofferwhatatwhattime(andpotentiallyincludinglanguage). Unstructuredbargainingisripeformachinelearning.IntheexperimentsofCamereretal(2017),twoplayersbargainoverhowtodivideanamountofmoneyworth$1-$6(inintegervalues).Oneinformed(I)playerknowstheamount;theother,uniformed(U)player,doesn’tknowtheamount.TheyarebargainingoverhowmuchtheuninformedUplayerwillget.ButbothplayersknowthatIknowstheamount. Theybargainover10secondbymovingcursorsonabargainingnumberline(Figure1).Thedatacreatedineachtrialisatimeseriesofcursorlocationswhichareaseriesofstepfunctionscomingfromalowoffertohigherones(representingincreasesinoffersfromI)andfromhigherdemandstolowerones(representingdecreasingdemandsfromU).

2Fudenberg-Liangaddlater

Figure1:(A)Initialofferscreen(forinformedplayerI,redbar);(B)examplecursorlocationsafter3secs(indicatingamountofferedbyI,red,ordemandedbyU,blue).(C)cursorbarsmatchwhichindicatesanoffer,consummatedat6seconds.(D)FeedbackscreenforplayerI.PlayerUalsoreceivesfeedbackaboutpiesizeandprofitifatradewasmade(otherwisezero). Supposearetryingtopredictwhethertherewillbeanagreementornotbasedoneverythingthatcanbeobserved.Fromatheoreticalpointofview,efficientbargainingbasedonrevelationprincipleanalysispredictsanexactrateofdisagreementforeachoftheamounts$1-6,basedonlyonthedifferentamountsavailable.Remarkably,thispredictionisprocess-free. However,fromanMLpointofviewtherearelotsoffeaturesrepresentingwhattheplayersaredoingwhichcouldaddpredictivepower(besidestheprocess-freepredictionbasedontheamountatstake).Bothcursorlocationsarerecordedevery25msec.Thetimeseriesofcursorlocationsisassociatedwithahugenumberoffeatures—howfarapartthecursorsare,thetimesincelastconcession[=cursormovement],sizeoflastconcession,interactionsbetweenconcessionamountsandtimes,etc. Ongoingexperimentsalsorecordfacialexpressionsandpsychophysiology(skinconductance).ThemainpointtoappreciateisthatfromanMLpointofview,recordingeverythingpossibleduringthebargaining,andabouttheplayers(personality,wealth,mood)couldallpotentiallyimprovepredictionofwhetherthereisdisagreement. Figure2showsanROCcurveindicatingtest-setaccuracyinpredictingwhetherabargainingtrialendsinadisagreement(=1)ornot.ROCcurvessketchoutcombinationsoftruepositiverates,P(disagree|predictdisagree)andfalsepositiveratesP(agree|predictdisagree).AnimprovedROCcurvemovesupandto

Figure 1: Bargaining interface. (a) Initial offer screen: in the first two seconds of bargaining,players set their initial position, oblivious to the initial position of their partner. The piesize at the top left corner appears only for the informed type. (b) Players communicate theiroffers using mouse click on the interface. (c) When demands match, feedback in the formof a green vertical stripe appears on the screen. If no changes are made in the following 1.5seconds, a deal is made. (d) Following the game, both players are notified regarding theirpayoffs and the pie size.

Thus, in order to make a deal, the latest time in which players’ bids could match wast = 8.5 seconds.

8. If no deal had been made within 10 seconds of bargaining, both players’ payoffs fromthat round were $0.

9. After each game, both players were told their payoffs and the actual pie size, for 5seconds (see Fig. 1d).

4.2 Methods

We conducted eight experimental sessions, five at the Caltech SSEL and three at the UCLACASSEL labs. There were a total of N=110 subjects (mean age: 21.3 SD: 2.4; 47 females, seeAppendix B for detailed session information). In the beginning of each session, subjects were

14

theleft,reflectingmoretruepositivesandfewerfalsepositives.Asisevident,predictingfromprocessdataonlyisaboutasaccurateasusingjusttheamount(“pie”)sizes(theblueandgreenROCcurves).Usingbothtypesofdataimprovespredictionsubstantially(theredcurve).

Figure2:ROCcurvesshowingcombinationsoffalseandtruepositiveratespredictingbargainingdisagreements.Improvedforecastingisrepresentedbycurvesmovingtotheupperleft.Thecombinationofprocess(cursorlocationfeatures)and“pie”(amount)dataareaclearimprovementovereithertypeofdataalone. Machinelearningisabletofindpredictivevalueindetailsofhowthebargainingoccurs(beyondthesimple,andverygood,predictionfromtheamountbeingbargainedover).Ofcourse,thisdiscoveryisthebeginningofthenextstepforbehavioraleconomics.Itraisesquestions:Whatvariablespredict?Dopeopleunderstandwhythosevariablesareimportant?Cantheyconstrainexpressionofvariablesthathurttheirbargainingpower?Canmechanismsbedesignedthatrecordthesevariablesandthencreateefficientmediationwithnodisagreementintowhichpeoplewillvoluntarilyparticipate? Riskychoice:PeysakhovichandNaecker(2017)usemachinelearningtoanalyzedecisionsbetweensimplefinancialrisks.Thesetofrisksarerandomlygeneratedtriples($y,$x,0)withassociatedprobabilities(p_x,p_y,p_0).Subjectsgiveawillingness-to-pay(WTP)foreachgamble. Thefeaturesetisthefiveprobabilityandamountvariables(excludingthe$0payoff),quadratictermsforallfive,andalltwo-andthree-wayinteractionsamongthelinearandquadraticvariables.Foraggregate-levelestimationthiscreates5+5+45+120=175variables.

process data outperforms the classifier using the pie size alone for every time stamp in thebargaining process (Fig. 7d). These results show that the process of bargaining itself canlead to bargaining failures, above and beyond the pie size alone.35

Figure 7: Strike prediction using bargaining process data. (a-c) Receiver Operating Char-acteristic (ROC) for predicting disagreements, 2 and 7 seconds into the bargaining game.The dashed lines represent the false and true positive rates of a random classifier. (d) Areaunder the curve (AUC) of disagreements clasiffiers using process data, pie size, and the twocombined. Note that the classifier’s input included only trials that were still in progress(when a deal has not yet been achieved), and excluded trials in which the offers and demandwere equal at the relevant time stamp.

35We formally tested the predictive accuracy of process data above and beyond the pie size by comparingthe mean square error of the model that was trained using both process data and pie to the model thatwas trained using only the pie. Paired t-tests of the squared errors from both models (with adjustmentsfor clustering at the session level) showed that adding process data significantly decreased out-of-sampleprediction error for all times greater than or equal to 2 seconds (at the 5% level), with a marginally significantdecrease in prediction error in the first second (p=0.051).

29

MLpredictionsarederivedfromregularizedregressionwithalinearpenalty(LASSO)orsquaredpenalty(ridge)for(absolute)coefficients.ParticipantswereN=315MTurksubjectswhoeachgave10useableresponses.Thetrainingsetconsistsof70%oftheobservations,30%areheldoutasatestset.

Theyalsoestimatepredictiveaccuracyofaone-variableexpectedutilitymodel(EU,withpowerutility)andaprospecttheory(PT)modelwhichaddsoneadditionalparametertoallownonlinearprobabilityweighting(TverskyandKahneman,1992)(withseparateweights,notcumulativeones).Forthesemodelsthereareonly1or2freeparametersperperson.3

Theaggregatedataestimationusesthesamesetofparametersforallsubjects.Inthisanalysis,thetestsetaccuracy(meansquarederror) isalmostexactlythesameforPTandforbothLASSOandridgeMLpredictions,eventhoughPTusesonlytwovariablesandtheMLmethodsuse175variables.Individuallevelanalysis,inwhicheachsubjecthastheirownparametershasabouthalfthemeansquarederrorastheaggregateanalysis.PTandridgeMLareaboutequallyaccurate.

ThefactthatPTandMLareequallyaccurateisabitsurprising,becausetheMLmethodallowsquitealotofflexibilityinthespaceofpossiblepredictions.Indeed,theauthors’motivationwastouseMLtoshowhowamodelwithahugeamountofflexibilitycouldfit,possiblytoprovideaceilinginachievableaccuracy.IftheMLpredictionsweremoreaccuratethanEUorPT,thegapwouldshowhowmuchimprovementcouldbehadbymorecomplicatedcombinationsofoutcomeandprobabilityparameters.Buttheresult,instead,showsthatmuchbusiermodelsarenotmoreaccuratethanthetime-testedtwo-parameterformofPT,forthisdomainofchoices.

III:HumanpredictionasimperfectmachinelearningSomepre-historyofbehavioraleconomics Behavioraleconomicsasweknowit,anddescribeitnowadays,begantothrivewhenchallengestosimplerationalityprinciples(thencalled“anomalies”)cametohaveruggedempiricalstatusandtopointtonaturalimprovementsintheory(Thaler,Misbehaving;Lewis,Undoing).Itwascommoninthoseearlydaystodistinguishanomaliesabout“preferences”,suchasmentalaccountingviolationsof

3 Note,however,thattheMLfeaturesetdoesnotexactlynesttheEUandPTforms.Forexample,aweightedcombinationofthelinearoutcomeXandthequadratic

termX2doesnot exactly equal the power function Xα.

fungibilityandreference-dependence,andanomaliesabout“judgment”oflikelihoodsandquantities. Somewhathiddenfromeconomists,atthetimeandevennow,wasthefactthattherewasactiveresearchinmanyareasofjudgmentanddecisionmaking(JDM)proceedinginparallelandconductedalmostentirelyinpsychologydepartmentsandsomebusinessschools.JDMresearchwasaboutthosejudgment“anomalies”.Thisresearchflourishedbecausetherewasahealthyrespectforsimplemathematicalmodelsandcarefultesting,whichenabledregularitiestocumulateandgavereasonstodismissweakresults.Theresearchcommunityalsohadonefootinpracticaldomainstoo(suchasjudgmentsofnaturalrisks,medicaldecisionmaking,law,etc.)sothatgeneralizabilityoflabresultswasimplicitlyaddressed. AnimportantongoingdebateinJDMwasaboutthecognitiveprocessesinvolvedinactualdecisions,andthequalityofthosepredictions.Therewereplentyofcarefullabexperimentsaboutsuchphenomena,butalsoanearlierliteratureonwhatwasthencalled“clinicalversusstatisticalprediction”.Thereliestheearliestcomparisonbetween(primitiveformsof)AIand(thejudgmentpartof)behavioraleconomics. PaulMeehl’s(1954)compactbookstarteditall.Meehlwasaremarkablecharacter.Hewasarareexample,atthetime,ofaworkingclinicalpsychiatristwhowasalsointerestedinstatisticsandevidence(aswereothersatMinnesota).MeehlhadapictureofFreudinhisoffice,andpracticedclinicallyfor50yearsintheVeteran’sAdministration. Meehl’smotherhaddiedwhenhewas16,undercircumstanceswhichapparentlymadehimsuspiciousofhowmuchdoctorsactuallyknewabouthowtomakesickpeoplewell. Hisbookcouldbereadaspursuitofsuchasuspicionscientifically:Hecollectedallthestudieshecouldfind(22)whichcomparedasetofclinicaljudgmentswithactualoutcomes,andwithsimplelinearmodelsusingobservablepredictors(someobjectiveandsomesubjectivelyestimated).Thedomainsofjudgmentincluded____. Meehl’sideawasthatthesestatisticalmodelscouldbeusedasabenchmarktoevaluateclinicians.4AsDawesandCorrigan(1974,p.97)wrote

"thestatisticalanalysiswasthoughttoprovideafloortowhichthejudgment oftheexperiencedcliniciancouldbecompared.Thefloorturnedouttobea ceiling.”

Ineverycasethestatisticalmodeloutpredictedortiedthejudgmentaccuracyoftheaverageclinician.Alatermeta-analysisof117studies(Groveetal

4CheckifthisisinMeehl“whatIexpectedtobeafloorturnedouttobeaceiling”

2000)foundonlysixinwhichclinicians,onaverage,weremoreaccurate(andseeDawesetal2006Sci). Itispossiblethatinanyonedomain,thedistributionofclinicianscontainssomestarswhocouldpredictmuchmoreaccurately.However,laterstudiesattheindividuallevelshowedthatonlyaminorityofcliniciansweremoreaccuratethanstatisticalmodels(e.g.Goldberg,1970).Kleinbergetal(2017)’sstudyofmachine-learnedandjudicialdetentiondecisionsisamodernexampleofthesametheme. Inthenextcoupleofdecadesevidencebegantomountaboutwhyclinicaljudgmentcouldbesoimperfect.Acommonthemewasthatcliniciansweregoodatmeasuringparticularvariables,orsuggestingwhichobjectivevariablestoinclude,butwerenotsogoodatcombiningthemconsistently(e.g.Sawyer1966).InarecollectionMeehl(1986)gaveasuccinctdescriptionofthistheme(p373): Whyshouldpeoplehavebeensosurprisedbytheempiricalresultsinmy summarychapter?Surelyweallknowthatthehumanbrainispoorat weightingandcomputing.Whenyoucheckoutatasupermarket,youdon’t eyeballtheheapofpurchasesandsaytotheclerk,“Wellitlookstomeasif it’sabout$17.00worth;whatdoyouthink?”Theclerkaddsitup.Thereare nostrongarguments,fromthearmchairorfromempiricalstudiesof cognitivepsychology,forbelievingthathumanbeingscanassignoptimal weightsinequationssubjectivelyorthattheyapplytheirownweights consistently,thequeryfromwhichLewGoldbergderivedsuchfascinating andfundamentalresults. ManyoftheimportantcontributionswereincludedintheKahnemanetal(1982)book(whichintheolddayswascalledthe“blue-greenbible”). Inoneclassicstudy,Oskamp(1965)hadeightexperiencedclinicalpsychologists,and24graduateandundergraduatestudents,readmaterialaboutanactualperson,infourstages.Thefirststagewasjustthreesentencesgivingbasicdemographics,education,andoccupation.Thenextthreestageswere1.5-2pageseachaboutchildhood,schooling,andthesubject’stimeinthearmyandbeyond. Thesubjectshadtoanswer25personalityquestionsaboutthesubject,eachwithfivemultiple-choiceanswers5,aftereachofthefourstagesofreading.Chanceguessingwouldbe20%accurate.Notethatthesequestionshadcorrectanswers,basedonotherevidenceaboutthecase, Oskamplearnedtwothings:First,therewasnodifferenceinaccuracybetweentheexperiencedcliniciansandthestudents. Second,allthesubjectswerebarelyabovechance;andaccuracydidnotimproveastheyreadmorematerialinthethreestages.Afterjustthefirst5Forexample,“Kidd’spresentattitudetowardhismotherisoneof:(a)Loveandrespectforherideals;(b)affectionatetoleranceforherfoibles”;etc.

paragraph,theiraccuracywas26%;afterreadingallfiveadditionalpagesacrossthethreestages,accuracywas28%(aninsignificantdifferencefrom26%).However,subjectiveconfidenceinhowaccuratetheywererosealmostlinearlyastheyreadmore,from33%to53%.6 Thisincreaseinconfidence,combinedwithnoincreaseinaccuracy,isreminiscentofthedifferencebetweentrainingsetandtestsetaccuracyinAI.Asmoreandmorevariablesareincludedinatrainingset,theaccuracywillalwaysincreaseunlessfitispenalizedinsomeway.Asaresultofoverfitting,however,test-setaccuracycandeclinewithmorevariables.Theresultinggapbetweentraining-andtest-setaccuracywillgrow,muchastheoverconfidenceinOskamp’ssubjectsgrewwithmore“variables”(morematerialonthesinglesubject). Someotherimportantfindingsemerged.Onedrawbackofthestatisticalpredictionapproach,forpractice,wasthatitrequireslargesamplesofhigh-qualityoutcomedata(labeleddata,forsupervisedlearning,inAIlanguage).Thesewererarelyavailableatthetime. Dawes(1979)proposedtogiveuponestimatingvariableweightsthroughacriterion-optimizing“proper”procedurelikeOLS7,using“improper”weightsinstead.Anexampleisequal-weightingofstandardizedvariables,whichisoftenaverygoodapproximationtoOLSweighting. AninterestingexampleofimproperweightsiswhatDawescalled“bootstrapping”(adistinctusagefromtheconceptinstatisticsofbootstrapresampling).Theideawastoregressclinicaljudgmentsonpredictors,andusethoseestimatedweightstomakeprediction.Thisisequivalent,ofcourse,tousingthepredictedpartoftheclinical-judgmentregressionanddiscarding(orregularizingtozero,ifyouwill)theresidual.Iftheresidualismostlynoisethencorrelationaccuraciescanbeimprovedbythisprocedure,andtheytypicallyare(e.g.Camerer,1981).Successfulexamplesincluded___. Laterstudiesindicatedaslightlymoreoptimisticpicturefortheclinicians.Ifbootstrap-regressionresidualsarepurenoise,theywillalsolowerthetest-retestreliabilityofclinicaljudgment(i.e.,thecorrelationbetweentwojudgmentsonthesamecasesmadebythesameperson).However,analysisofthefewstudieswhichreportbothtest-retestreliabilityandbootstrappingregressionsindicatethatonlyabout40%oftheresidualvarianceisunreliablenoise(Camerer1981a).Thus,residualsdocontainreliablesubjectiveinformation(thoughitmaybeuncorrelatedwithoutcomes).BlattbergandHoch(1990)laterfoundthatforactualmanagerialforecastsofproductsalesandcouponredemptionrate,residualsarecorrelatedabout.30withoutcomes.Asaresult,averagingstatisticalmodelforecastsand6Otherresultscomparingmore-andless-experiencedclinicians,however,havealsoconfirmedthefirstfinding(experiencedoesnotimproveaccuracymuch),butfoundthatexperiencetendstoreduceoverconfidence(Goldberg1959).7Presciently,Dawesalsomentionsusingridgeregressionasaproperproceduretomaximizeout-of-samplefit.

managerialjudgmentsimprovedpredictionsubstantiallyoverstatisticalmodelsalone.

Sparsityisgoodforyoubuttastesbad

Anotherfeatureoftheearlystatistical-vs-clinicalliteraturewasanemphasisonhowsmallnumbersofvariablesmightfitsurprisinglywell.

AstrikingexampleinDawes(1979)isatwovariablemodelpredictingmaritalhappiness:Therateoflovemakingminustherateoffighting.Hereportscorrelationsof.40and.81intwostudies(EdwardsandEdwards,1977;Thornton1977).8

Forexample,Dawes(1971)didafamousstudyaboutadmittingstudentstotheUniversityofOregonPhDprograminpsychologyfrom1964-67.Hecomparedathree-variablemodelbasedonGRE,undergraduateGPA,andqualityoftheapplicant’sundergraduateschooltoanadmissionscommittee’squantitativerecommendation.Theoutcomevariablewasfacultyratings1969.(Thevariableswerestandardized,thenweightedequally.)Thissimplemodelcorrelatedwithlatersuccessintheprogrammorehighly(.48,cross-validated)thananadmissionscommittee’squantitativerecommendation(.19).9Thebootstrappingmodelcorrelated.25. DespiteDawes’sevidence,Ihaveneverbeenabletoconvinceanygraduateadmissionscommitteeattwoinstitutions(PennandCaltech)toactuallycomputestatisticalratings,evenasawaytofilterout“certainrejection”applications. Whynot? Ithinktheansweristhatthehumanmindrebelsagainstregularizationandtheresultingsparsity.Weareborntooverfit.EveryAIresearcherknowsthatincludingfewervariables(e.g.,bygivingmanyofthemzeroweightsinLASSO,orlimitingtreedepthinrandomforests)isausefulall-purposeprophylacticforoverfittingatrainingset.

8Morerecentanalysesusingtranscribedverbalinteractionsgeneratecorrelationsfordivorceandmaritalsatisfactionaround.6-.7.Thecorevariablesarecalledthe“fourhorsemen”ofcriticism, defensiveness, contempt, and “stonewalling" (listener withdrawal).

9Readersmightguessthatthequalityofeconometricsforinferenceinsomeoftheseearlierpapersislimited.Forexample,Dawes(1971)onlyusedthe111studentswhohadbeenadmittedtotheprogramandstayedenrolled,sothereislikelyscalecompression,etc.Someofthefacultymembersratingthosestudentswereprobablyalsoinitialraterswhichcouldgenerateconsistencybiasesetc.

Butpeopledonotliketoexplicitlythrowawayinformation.Thisisparticularlytrueiftheinformationisalreadyinfrontofus—intheformofaPhDadmissionsapplication,forexample.Ittakessomecombinationofwillpower,arrogance,orwhathaveyou,tosimplyignorelettersofrecommendation,forexample. Theposterchildforusingworthlessinformationispersonalshortface-to-faceinterviews(___cites).Thereisamountainofevidencethatsuchinterviewsdonotpredictanythingaboutlaterworkperformance,ifinterviewsareuntrainedanddonotuseastructuredinterviewformat.AlikelyexampleisinterviewingfacultycandidateswithnewPhDs,inhotelsuitesattheASSAmeetings.SupposethegoalofsuchinterviewsistopredictwhichnewPhDswilldoenoughterrificresearch,goodteaching,andotherkindsofserviceandpublicvaluetogettenureseveralyearslater. Butthebrainofaninterviewerprobablyhasmorebasicthingsonitsmind.Isthispersonwell-dressed?Wouldtheyprotectmeifthereisdanger?Friendorfoe?Doestheiraccentandwordchoicesoundlikemine? Peoplewhodotheseinterviews(includingme)sayexplicitlythatwearetryingtoprobethecandidate’sdepthofunderstandingabouttheirtopic,howpromisingnewplannedresearchis,etc.Butwereallyareevaluatingis“Dotheybelonginmytribe?” WhileIdothinksuchinterviewsareawasteoftime10,itisconceivablethattheygeneratesomeinformationthatisvalid.Theproblemisthatinterviewersmayweightthewronginformation(aswellasoverweightingfeaturesthatshouldberegularizedtozero).Indeed,nowadaysthebestmethodtocapturesuchinformationisprobablytovideotapetheinterview,combineitwithothertasksresemblingworkperformance(e.g.,havethemreviewadifficultpaper),andmachinelearntheheckoutofthatlargercorpusofinformation. AnothersimpleexampleofwhereignoringinformationiscounterintuitiveiscapturedbythetwomodesofforecastingwhichKahnemanandLovallo(19_)wroteabout.Theycalledthem“inside”and“outside”view.Thetwoviewswereinthecontextofforecastingtheoutcomeofaproject(suchaswritingabook,orabusinessinvestment).Theinsideview “focusedonlyonaparticularcase,byconsideringtheplananditsobstaclestocompletion,byconstructingscenariosoffutureprogress”(p25). Theoutsideview “focussesonthestatisticsofaclassofcaseschosentobesimilarinrelevant respectstothecurrentone”(p25)

10Therearemanycaveatsofcoursetothisstrongclaim.Forexample,oftentheschoolispitchingtoattractahighlydesirablecandidate,nottheotherwayaround.

Theoutsideviewdeliberatelythrowsawaymostoftheinformationaboutaspecificcaseathand(butkeepssomeinformation):Itreducestherelevantdimensionstoonlythosethatarepresentintheoutsideviewreferenceclass.(Thisis,again,aregularizationthatzerosoutallthefeaturesthatarenot“similarinrelevantrespects”.) InMLterms,theoutsideandinsideviewsarelikedifferentkindsofclusteranalyses.TheoutsideviewparsesallpreviouscasesintoKclusters;acurrentcasebelongstooneclusteroranother(thoughthereis,ofcourse,adegreeofclustermembershipdependingonthedistancefromclustercentroids).Theinsideview—initsextremeform—treatseachcaseasunique.Hypothesis:Humanjudgmentislikeoverfittedmachinelearning ThecoreideaIwanttoexploreisthatsomeaspectsofeverydayhumanjudgmentcanbeunderstoodasthetypeoferrorsthatwouldresultfrombadly-donemachinelearning.11I’llfocusontwoaspects:Overconfidenceandhowitincreases;and;limitederrorcorrection. Inbothcases,weareimplicitlytalkingaboutaresearchprogramwhichcollectsdataonhumanpredictionsandcomparethemwithmachine-learnedpredictions.Thendeliberatelyre-dothemachinelearningbadly(e.g.,failingtocorrectforover-fitting)andseewhethertheimpairedMLpredictionshavepropertiesofhumanones. Overconfidence:Overconfidencecomesindifferentflavors.Inthepredictivecontextwewilldefineitashavingtoonarrowaconfidenceintervalaroundaprediction.(Inregression,forexample,thismeansunderestimatingthestandarderrorofaconditionalpredictionP(Y|X)basedonobservablesX.) Itcouldbethathumanoverconfidenceresultsfromafailuretowinnowthesetofpredictors(asinLASSOpenaltiesforfeatureweights).Overconfidenceofthistypecouldcorrespondtowhatoneexpectsfromoverfitting:Hightrainingsetaccuracycorrespondstoconfidenceaboutpredictions.Adropinaccuracyfromtrainingtotestmeansthatpredictionswerenotasaccurateashoped. Limitederrorcorrection:InsomeMLprocedurestrainingtakesplaceovertrials.Forexample,theearliestneuralnetworksweretrainedbymakingoutputpredictionsbasedonasetofnodeweights,thenback-propagatingpredictionerrorstoadjusttheweights.Earlycontributionsintendedforthisprocesstocorrespondtohumanlearning—e.g.,howchildrenlearntorecognizecategoriesofnaturalobjectsortolearnpropertiesoflanguage(e.g.McClellandandRumelhart). Onecanthenaskwhethersomeaspectsofadulthumanjudgmentcorrespondtopoorimplementationoferror-correction.Aninvisibleassumptionthatis,ofcourse,partofneuralnetworktrainingisthatoutputerrorsare

11MyintuitionaboutthiswasaidedbyJesseShapiro,whoaskedawell-craftedquestionpointingstraightinthisdirection.

recognized(iflearningissupervisedbylabeleddata).Butwhatifhumansdonotrecognizeerrororrespondtoitinappropriately? Onemaladaptiveresponsetopredictionerroristoaddfeatures.Forexample,supposeacollegeadmissionsdirectorhasapredictivemodelandthinksstudentswhoplaymusicalinstrumentshavegoodstudyhabitsandwillsucceedinthecollege.NowastudentcomesalongwhoplaysdrumsintheDeadMilkmenpunkband.Thestudentgetsadmitted(becauseplayingmusicisagoodfeature),butstrugglesincollegeanddropsout. Theadmissionsdirectorcouldback-propagatethepredictiveerrortoadjusttheweightsonthe“playsmusic”feature.Orshecouldcreateanewfeaturebysplitting“playsmusic”into“playsdrums”and“playsnon-drums”andignoretheerror.Thisprocedurewillgeneratetoomanyfeaturesandwillnotuseerror-correctioneffectively.12 (Notetoothatadifferentadmissionsdirectormightcreatetwodifferentsubfeatures,“playsmusicinapunkband”and“playsnon-punkmusic”.Inthestylizedversionofthisdescription,bothwillbecomeconvincedthattheyhaveimprovedtheirmentalmodelandwillretainhighconfidenceaboutfuturepredictions.Buttheirinter-raterreliabilitywillhavegonedown,whichputsamathematicalcaponhowgoodaveragepredictiveaccuracycanbe.Moreonthisnext.)IV:AItechnologyasabionicpatch,ormalware,forhumanlimits Wespendalotoftimeinbehavioraleconomicsthinkingabouthowpoliticalandeconomicsystemseitherexploitbadchoicesorhelppeoplemakegoodchoices.Whatbehavioraleconomicshastooffertothisgeneraldiscussionistospecifyamorepsychologicallyaccuratemodelofhumanchoiceandhumannaturethanthecaricatureofconstrainedutility-maximization(asusefulasithasbeen). AIentersbycreatingbettertoolsforbothmakinginferencesaboutwhatapersonwantsorwilldo.Sometimesthesetoolswillhurtandsometimestheywillhelp. AIhelps:Aclearexampleisrecommendersystems.Goodsystemsareakindofbehavioralprosthetictoremedyhumanlimitsonattentionandtheresultingincompletenessofpreferences. ConsiderNetflixmovierecommendations.Netflixusesaperson’sviewingandratingshistory,aswellasopinionsofothersandmovieproperties,asinputstoavarietyofalgorithmstosuggestwhatcontenttowatch.Astheirdatascientistsexplained(Gomez-UribeandHunt,2015):

atypicalNetflixmemberlosesinterestafterperhaps60to90secondsof choosing,havingreviewed10to20titles(perhaps3indetail)ononeortwo12Anotherwaytomodelthisisastherefinementofapredictiontree,wherebranchesareaddedfornewfeaturewhenpredictionsareincorrect.Thiswillgenerateabushytree,whichgenerallyharmstest-setaccuracy.

screens...Therecommenderproblemistomakesurethatonthosetwo screenseachmemberinourdiversepoolwillfindsomethingcompellingto view,andwillunderstandwhyitmightbeofinterest.

Forexample,their“BecauseYouWatched”lineusesavideo-videosimilarityalgorithmtosuggestunwatchedvideossimilartoonestheuserwatchedandliked.

Therearesomanyinterestingimplicationsofthesekindsofrecommendersystemsforeconomicsingeneral,andforbehavioraleconomicsinparticular.

Forexample,notethatNetflixwantsitsmembersto“understandwhyit[arecommendedvideo]mightbeofinterest”.Thisis,atbottom,aquestionaboutinterpretabilityofAIoutput,howamemberlearnsfromrecommendersuccessesanderrors,andwhetheramember“trusts”Netflixingeneral.Allthesearepsychologicalprocesseswhichmayalsodependheavilyondesignandexperiencefeatures(UD,UX).

AI‘hurts’13:AnotherfeatureofAI-drivenpersonalizationispricediscrimination.Ifpeopledoknowalotaboutwhattheywant,andhaveprecisewillingness-to-pay(WTP),thencompanieswillquicklydevelopthecapacitytopersonalizepricestoo.Thisseemstobeaconceptthatisemergingrapidlyanddesperatelyneedstobestudiedbyindustrialeconomistswhocanfigureoutwelfareimplications.Behavioraleconomicsenterswithsomeevidenceabouthowpeoplemakejudgmentsaboutfairnessofprices(e.g.,Kahneman,KnetschandThaler,198614)andhowfairnessjudgmentsinfluencesbehavior. Myintuitionisthatingeneral,peoplecancometoacceptahighdegreeofvariationinpricesforwhatisessentiallythesameproduct,aslongasthereisveryminorproductdifferentiation15orfirmscanarticulatewhydifferentpricesarefair(whichrequiresinsightintohowconsumersthink,andheterogeneity).Itisalsolikelythatpersonalizedpricingwillharmconsumerswhoarethemosthabitualorwhodonotshopcleverly,butwillhelpsavvyconsumerswhocanhijackthepersonalizationalgorithmstolooklikelowWTPconsumersandsavemoney.SeeGabaixandLaibson(2006?)foracarefullyworked-outmodelabouthidden(“shrouded”)productattributes.V:ConclusionTBA

13Iputtheword‘hurts’inquoteshereasawaytoconjecture,throughpunctuation,thatinmanyindustriestheAI-drivencapacitytopersonalizepricingwillharmconsumerwelfareoverall.14Recent?15mariachi

References (partial)

Blattberg, R. C., & Hoch, S. J. (1990). Database models and managerial intuition: 50% database + 50% manager. Management Science 36(8). 887-899.

Camerer, C. F. (1981a). The validity and utility of expert judgment. Unpublished Ph.D. dissertation, Center for Decision Research, University of Chicago Gradu- ate Schoo] of Business.

Camerer, C. F. (1981b). General conditions for the success of bootstrapping models. Organizational Behavior and Human Performance, 27, 411-422.

Camerer, C.F and Eric Johnson. 1991. The process-performance paradox in expert judgment: Why can experts know so much and predict so badly? ” In A. Ericsson and J. Smith (Eds.), Toward a General Theory of Expertise: Prospects and Limits, Cambridge: Cambridge University Press, 1991.

Colin F. Camerer, Gideon Nave & Alec Smith. (2017) Dynamic unstructured bargaining with private information and deadlines: theory and experiment. Working paper

Chapman, L. J., & Chapman, J. P. (1969). Illusory correlation as an obstacle to the use of valid psychodiagnostic signs. Journal of Abnormal Psychology, 46, 271-280.

Dawes, R. M. (1971). A case study of graduate admissions: Application of three principles of human decision making. American Psychologist, 26, 180-188.

Dawes, R. M., & Corrigan, B. (1974). Linear models in decision making. Psychological Bulletin, 81, 97.

Dawes, R. M., Faust, D., & Meehl, P. E. (1989). Clinical versus actuarial judgment. Science, 243, 1668-1674.

Edwards, D. D., & Edwards, J. S. Marriage: Direct and continuous measurement. Bulletin of the Psychonomic Society, 1977, 10, 187-188.

Einhorn, H. J. (1986). Accepting error to make less error. Journal of Personality Assessment, 50, 387-395.

Einhorn, H. J., & Hogarth, R. M. (1975). Unit weighting schemas for decision making. Organization Behavior and Human Performance, 13, 171-192.

Gabaix Laibson QJE

Goldberg, L. R. Man versus model of man: A rationale, plus some evidence for a method of improving on clinical inferences. Psychological Bulletin, 1970, 73, 422-432;

Goldberg, L. R. (1959). The effectiveness of clinicians' judgments: The diagnosis of organic brain damage from the Bender-Gestalt test. Journal ofConsulting Psychol· ogy, 23, 25-33.

Goldberg, L. R. (1968). Simple models or simple processes? American Psychologist, 23, 483-496.

Gomez-Uribe, Carlos and Neil Hunt. (2016)The Netflix Recommender System: Algorithms, Business Value, and Innovation. ACM Transactions on Management Information Systems (TMIS), 6(4), article 13.

Grove, W. M., Zald, D. H., Lebow, B. S., Snits, B. E., & Nelson, C. E. (2000) Clinical vs. mechanical prediction: A meta-analysis. Psychological Assessment, 12:19-30.

Johnson, E. 1. (1980). Expertise in admissions judgment. Unpublished doctoral dissertation, Carnegie-Mellon University.

Johnson, E. J. (1988). Expertise and decision under uncertainty: Performance and process. In M. T. H. Chi, R. Glaser & M. 1. Farr (Eds.), The nature of expertise (pp. 209-228). Hillsdale, NJ: Erlbaum.

Kahneman,KnetschandThaler,1986fairness

Kleinberg,Jon,AnnieLiang,andSendhilMullainathan.2015.“TheTheoryisPredictive,ButisItComplete?AnApplicationtoHumanPerceptionofRandomness.”Unpublishedpaper.

Kleinberg,Jon,HimabinduLakkaraju,JureLeskovec,JensLudwig,andSendhilMullainathan.2017.“HumanDecisionsandMachinePredic-tions.”NBERWorkingPaper23180.

Meehl,P.E.(1954).Clinicalversusstatisticalprediction:Atheoreticalanalysisandareviewoftheevidence.Minneapolis:UniversityofMinnesotaPress.

Mullainathan,SendhilandJannSpiess(2017)MachineLearning:AnAppliedEconometricApproach.JournalofEconomicPerspectives31(2):87-106

Peysakhovich,AlexanderandNaecker,Jeffrey.Usingmethodsfrommachinelearningtoevaluatebehavioralmodelsofchoiceunderriskandambiguity.JournalofEconomicBehavior&OrganizationVolume133,January2017,Pages373-384

Sawyer, Jack (1966). Measurement and prediction, -clinical and statistical. Psychological Bulletin, 66, 178-200.

Thornton, B. Linear prediction of marital happiness: A replication. Personality and Social Psychology Bulletin, 1977, 3, 674-676.

Daniel Kahneman and Dan Lovallo (1993) Timid Choices and Bold Forecasts: A Cognitive Perspective on Risk Taking Management Science 39:1 , 17-31

Klayman, J., & Ha, Y. (1985). Confirmation, disconfirmation, and information in hypothesis testing. Psychological Review, 211-228.

artificial intelligence and behavioral...

Documents