
Artificial intelligence and behavioral economics

Colin F. Camerer
Caltech
camerer@caltech.edu

9/7/17. 10:01am PST. Prepared for NBER conference, September 13-14, 2017. Please do not quote or circulate. Comments are very much welcome.

I: Introduction

This paper describes 2-1/2 highly speculative ideas about how artificial intelligence (AI) and behavioral economics may interact, particularly in future developments in the economy and in research frontiers. First note that I’ll use the terms AI and machine learning (ML) interchangeably (although AI is broader) because the examples I have in mind all involve ML and prediction. A good introduction to ML for economists is Mullainathan and Spiess (2017).

The first idea is that AI can be used in the search for new “behavioral”-type variables that affect choice. Two examples are given, from experimental data on bargaining and on risky choice.

The second idea is that some common limits on human prediction might be understood as the kinds of errors made by poor implementations of machine learning. That is, people are thinking as if they are executing machine learning algorithms but are doing a mediocre job of it.

The half idea (it’s short) is that it is important to study how AI technology used in firms and by other institutions can both overcome and exploit human limits. The fullest understanding of this tech-human interaction will require new knowledge from behavioral economics about attention and perceived fairness.

Please accept my apologies for limits of this paper: I know a bit about AI, from teaching neural networks in the 1990s, teaching data analytics more recently, and having generous smart colleagues at Caltech and elsewhere, as teachers. But I’m not an expert. And I wrote this version in a hurry!

II: Machine learning to find behavioral variables

Behavioral economics can be defined as the study of natural limits on computation, willpower and self-interest, and the implications of those limits for market equilibrium. A different approach is to define behavioral economics more generally, as simply being open-minded about what variables are likely to influence economic choices.

One way to describe this open-mindedness is to list neighboring social sciences which are likely to be the most fruitful source of explanatory variables: psychology, perhaps sociology (e.g., norms), anthropology (cultural variation in cognition), neuroscience, etc. Call this the “behavioral economics borrows from its neighbors” view.

But the open-mindedness could also be characterized even more generally, as an invitation to machine-learn how to predict from the largest possible feature set. In the “behavioral economics borrows from its neighbors” view, features are constructs and their measures contributed by different sciences. These could be loss-aversion, identity, moral norms, in-group preference, inattention, habit, model-free reinforcement learning, etc.

But why stop there?

In a general ML approach, predictive features could be, and should be, any variables that predict. These can be measurable properties of choices, the set of choices, affordances and motor interactions during choosing, measures of attention, psychophysiological measures of biological states, social influences, properties of individuals doing the choosing (SES, wealth, moods, personality, genes), and so forth. The more variables, the merrier.

From this perspective, we can think about what sets of features are contributed by different disciplines and theories. What features does textbook economic theory contribute? Constrained utility-maximization in its most familiar form points to only three kinds of variables: prices, information (which can inform utilities) and constraints.

Most propositions in behavioral economics add some variables to this list of features, such as reference-dependence, context-dependence (menu effects), anchoring, etc.
Another way to search for predictive variables is to specify a very long list of candidate variables (= features) and include all of them in an ML approach.[1] This approach has two advantages: First, simple theories can be seen as bets that only a small number of features will predict well. Second, if longer lists of features predict better than a short list of theory-specified features, then that finding establishes a plausible upper bound on how much potential predictability is left to understand. The results are also likely to create raw material for theory to figure out how to consolidate the additional predictive power into crystallized theory (see also Kleinberg, Liang, and Mullainathan 2015).

[1] Another example is using multivoxel pattern analysis to “decode” brain activity for


If behavioral economics is thought of as open-mindedness about what variables might predict (as it is for me) then an ML approach with many variables is a potentially useful approach. I’ll illustrate it with some examples.[2]

Bargaining: There is a long history of bargaining experiments trying to predict what bargaining outcomes (and disagreement rates) will result from structural variables using game-theoretic methods. In the 1980s there was a sharp turn, in experimental work, towards noncooperative approaches in which the communication and structure of bargaining was carefully fixed. This happened because game theory, at the time, delivered sharp new predictions about what outcomes might result depending on the structural parameters (such as the order of bargaining offers, discount rates over time, etc.).

However, most natural bargaining is not governed by such strict rules. Therefore, it is important to understand what bargaining outcomes might result when bargaining is “semi-structured”. “Semi-structured” means there is a deadline and protocol for acceptance but otherwise no restrictions on who can offer what at what time (and potentially including language).

Unstructured bargaining is ripe for machine learning. In the experiments of Camerer et al (2017), two players bargain over how to divide an amount of money worth $1-$6 (in integer values). One informed (I) player knows the amount; the other, uninformed (U) player, doesn’t know the amount. They are bargaining over how much the uninformed U player will get. But both players know that I knows the amount.

They bargain over 10 seconds by moving cursors on a bargaining number line (Figure 1). The data created in each trial is a time series of cursor locations which are a series of step functions coming from a low offer to higher ones (representing increases in offers from I) and from higher demands to lower ones (representing decreasing demands from U).

[2] Fudenberg-Liang add later


[Figure 1 image: bargaining interface with panels (a) through (d).]

Figure 1: (A) Initial offer screen (for informed player I, red bar); (B) example cursor locations after 3 secs (indicating amount offered by I, red, or demanded by U, blue). (C) cursor bars match which indicates an offer, consummated at 6 seconds. (D) Feedback screen for player I. Player U also receives feedback about pie size and profit if a trade was made (otherwise zero).

Suppose we are trying to predict whether there will be an agreement or not based on everything that can be observed. From a theoretical point of view, efficient bargaining based on revelation principle analysis predicts an exact rate of disagreement for each of the amounts $1-6, based only on the different amounts available. Remarkably, this prediction is process-free.

However, from an ML point of view there are lots of features representing what the players are doing which could add predictive power (besides the process-free prediction based on the amount at stake). Both cursor locations are recorded every 25 msec. The time series of cursor locations is associated with a huge number of features: how far apart the cursors are, the time since last concession [= cursor movement], size of last concession, interactions between concession amounts and times, etc.

Ongoing experiments also record facial expressions and psychophysiology (skin conductance). The main point to appreciate is that from an ML point of view, recording everything possible during the bargaining, and about the players (personality, wealth, mood) could all potentially improve prediction of whether there is disagreement.

Figure 2 shows an ROC curve indicating test-set accuracy in predicting whether a bargaining trial ends in a disagreement (=1) or not. ROC curves sketch out combinations of true positive rates, P(predict disagree | disagree), and false positive rates, P(predict disagree | agree). An improved ROC curve moves up and to the left, reflecting more true positives and fewer false positives. As is evident, predicting from process data only is about as accurate as using just the amount (“pie”) sizes (the blue and green ROC curves). Using both types of data improves prediction substantially (the red curve).
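As a concrete illustration of how an ROC curve is traced out, here is a minimal Python sketch. It assumes each trial has a binary label (1 = disagreement) and a classifier score, and it sweeps a threshold over the scores. The function names and the toy numbers are hypothetical and are not taken from Camerer et al (2017).

```python
def roc_points(labels, scores):
    """Return (false positive rate, true positive rate) pairs, one per threshold.

    A trial is predicted to be a disagreement when its score >= threshold.
    """
    n_pos = sum(labels)
    n_neg = len(labels) - n_pos
    points = [(0.0, 0.0)]  # threshold above every score: nothing is flagged
    for thr in sorted(set(scores), reverse=True):
        tp = sum(1 for y, s in zip(labels, scores) if s >= thr and y == 1)
        fp = sum(1 for y, s in zip(labels, scores) if s >= thr and y == 0)
        points.append((fp / n_neg, tp / n_pos))
    return points


def area_under_curve(points):
    """Trapezoid-rule area under the ROC curve."""
    area = 0.0
    for (x0, y0), (x1, y1) in zip(points, points[1:]):
        area += (x1 - x0) * (y0 + y1) / 2.0
    return area


# Toy data (made up for illustration): 1 = trial ended in disagreement.
labels = [1, 1, 0, 1, 0, 0]
scores = [0.9, 0.8, 0.7, 0.6, 0.3, 0.2]

points = roc_points(labels, scores)
auc = area_under_curve(points)
print(points)
print(auc)
```

A curve that bows further toward the upper left encloses more area, which is why the area under the curve is a common one-number summary of the kind of comparison made in Figure 2.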

Figure 2: ROC curves showing combinations of false and true positive rates predicting bargaining disagreements. Improved forecasting is represented by curves moving to the upper left. The combination of process (cursor location features) and “pie” (amount) data is a clear improvement over either type of data alone.

Machine learning is able to find predictive value in details of how the bargaining occurs (beyond the simple, and very good, prediction from the amount being bargained over). Of course, this discovery is the beginning of the next step for behavioral economics. It raises questions: What variables predict? Do people understand why those variables are important? Can they constrain expression of variables that hurt their bargaining power? Can mechanisms be designed that record these variables and then create efficient mediation with no disagreement into which people will voluntarily participate?

Risky choice: Peysakhovich and Naecker (2017) use machine learning to analyze decisions between simple financial risks. The set of risks are randomly generated triples ($y, $x, 0) with associated probabilities (p_x, p_y, p_0). Subjects give a willingness-to-pay (WTP) for each gamble.

The feature set is the five probability and amount variables (excluding the $0 payoff), quadratic terms for all five, and all two- and three-way interactions among the linear and quadratic variables. For aggregate-level estimation this creates 5+5+45+120=175 variables.
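The feature count can be checked by enumeration. The sketch below assumes the ten base terms are the five linear variables plus their squares, and that "interactions" means every unordered pair and triple of distinct base terms. The variable labels are hypothetical and are not taken from Peysakhovich and Naecker (2017).

```python
from itertools import combinations

# Five linear variables: two nonzero payoffs and three probabilities.
# The labels are illustrative names chosen for this sketch.
linear = ["x", "y", "p_x", "p_y", "p_0"]
quadratic = [f"{v}^2" for v in linear]
base_terms = linear + quadratic  # 10 base terms

# Interactions: every unordered pair and triple of distinct base terms.
two_way = list(combinations(base_terms, 2))
three_way = list(combinations(base_terms, 3))

feature_counts = {
    "linear": len(linear),
    "quadratic": len(quadratic),
    "two_way": len(two_way),
    "three_way": len(three_way),
}
total_features = sum(feature_counts.values())

print(feature_counts)
print("total:", total_features)
```

Under these assumptions the pair and triple counts are the binomial coefficients C(10,2) and C(10,3), which matches the 45 and 120 in the text.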

[Figure 2 image: ROC curves for predicting disagreements (panels a-c) and area under the curve (panel d) for classifiers using process data, pie size, and the two combined.]

ML predictions are derived from regularized regression with a linear penalty (LASSO) or squared penalty (ridge) for (absolute) coefficients. Participants were N=315 MTurk subjects who each gave 10 useable responses. The training set consists of 70% of the observations; 30% are held out as a test set.
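To see why the two penalties behave differently, consider the special case of orthonormal predictors, where both estimators have closed forms. The sketch below is an illustration under that assumption only; it is not the estimation code used in the study, and the coefficient values are made up. With objective (1/2)||y - Xb||^2 + lam * sum|b_j|, LASSO soft-thresholds each least-squares coefficient; with objective (1/2)||y - Xb||^2 + (lam/2) * sum b_j^2, ridge rescales each one by 1/(1 + lam).

```python
def lasso_shrink(b_ols, lam):
    """Soft-thresholding: the LASSO coefficient when predictors are orthonormal."""
    if b_ols > lam:
        return b_ols - lam
    if b_ols < -lam:
        return b_ols + lam
    return 0.0  # small coefficients are set exactly to zero


def ridge_shrink(b_ols, lam):
    """Proportional shrinkage: the ridge coefficient when predictors are orthonormal."""
    return b_ols / (1.0 + lam)


# Hypothetical least-squares coefficients, chosen only for illustration.
ols_coefs = [2.0, 0.3, -0.1, -1.5]
lam = 0.5

lasso = [lasso_shrink(b, lam) for b in ols_coefs]
ridge = [ridge_shrink(b, lam) for b in ols_coefs]

print("LASSO:", lasso)
print("ridge:", ridge)
```

The linear penalty produces exact zeros (a sparse model), while the squared penalty shrinks every coefficient but keeps all of them in the model.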

They also estimate predictive accuracy of a one-variable expected utility model (EU, with power utility) and a prospect theory (PT) model which adds one additional parameter to allow nonlinear probability weighting (Tversky and Kahneman, 1992) (with separate weights, not cumulative ones). For these models there are only 1 or 2 free parameters per person.[3]

The aggregate data estimation uses the same set of parameters for all subjects. In this analysis, the test set accuracy (mean squared error) is almost exactly the same for PT and for both LASSO and ridge ML predictions, even though PT uses only two variables and the ML methods use 175 variables. Individual level analysis, in which each subject has their own parameters, has about half the mean squared error of the aggregate analysis. PT and ridge ML are about equally accurate.

The fact that PT and ML are equally accurate is a bit surprising, because the ML method allows quite a lot of flexibility in the space of possible predictions. Indeed, the authors’ motivation was to use ML to show how a model with a huge amount of flexibility could fit, possibly to provide a ceiling in achievable accuracy. If the ML predictions were more accurate than EU or PT, the gap would show how much improvement could be had by more complicated combinations of outcome and probability parameters. But the result, instead, shows that much busier models are not more accurate than the time-tested two-parameter form of PT, for this domain of choices.

[3] Note, however, that the ML feature set does not exactly nest the EU and PT forms. For example, a weighted combination of the linear outcome X and the quadratic term X^2 does not exactly equal the power function X^α.

III: Human prediction as imperfect machine learning

Some pre-history of behavioral economics

Behavioral economics as we know it, and describe it nowadays, began to thrive when challenges to simple rationality principles (then called “anomalies”) came to have rugged empirical status and to point to natural improvements in theory (Thaler, Misbehaving; Lewis, Undoing). It was common in those early days to distinguish anomalies about “preferences”, such as mental accounting violations of fungibility and reference-dependence, and anomalies about “judgment” of likelihoods and quantities.

Somewhat hidden from economists, at the time and even now, was the fact that there was active research in many areas of judgment and decision making (JDM) proceeding in parallel and conducted almost entirely in psychology departments and some business schools. JDM research was about those judgment “anomalies”. This research flourished because there was a healthy respect for simple mathematical models and careful testing, which enabled regularities to cumulate and gave reasons to dismiss weak results. The research community also had one foot in practical domains (such as judgments of natural risks, medical decision making, law, etc.) so that generalizability of lab results was implicitly addressed.

An important ongoing debate in JDM was about the cognitive processes involved in actual decisions, and the quality of those predictions. There were plenty of careful lab experiments about such phenomena, but also an earlier literature on what was then called “clinical versus statistical prediction”. There lies the earliest comparison between (primitive forms of) AI and (the judgment part of) behavioral economics.

Paul Meehl’s (1954) compact book started it all. Meehl was a remarkable character. He was a rare example, at the time, of a working clinical psychiatrist who was also interested in statistics and evidence (as were others at Minnesota). Meehl had a picture of Freud in his office, and practiced clinically for 50 years in the Veteran’s Administration.

Meehl’s mother had died when he was 16, under circumstances which apparently made him suspicious of how much doctors actually knew about how to make sick people well.

His book could be read as pursuit of such a suspicion scientifically: He collected all the studies he could find (22) which compared a set of clinical judgments with actual outcomes, and with simple linear models using observable predictors (some objective and some subjectively estimated). The domains of judgment included ____.

Meehl’s idea was that these statistical models could be used as a benchmark to evaluate clinicians.[4] As Dawes and Corrigan (1974, p. 97) wrote:

“the statistical analysis was thought to provide a floor to which the judgment of the experienced clinician could be compared. The floor turned out to be a ceiling.”

In every case the statistical model outpredicted or tied the judgment accuracy of the average clinician. A later meta-analysis of 117 studies (Grove et al 2000) found only six in which clinicians, on average, were more accurate (and see Dawes et al 2006 Sci).

[4] Check if this is in Meehl “what I expected to be a floor turned out to be a ceiling”

It is possible that in any one domain, the distribution of clinicians contains some stars who could predict much more accurately. However, later studies at the individual level showed that only a minority of clinicians were more accurate than statistical models (e.g. Goldberg, 1970). Kleinberg et al (2017)’s study of machine-learned and judicial detention decisions is a modern example of the same theme.

In the next couple of decades evidence began to mount about why clinical judgment could be so imperfect. A common theme was that clinicians were good at measuring particular variables, or suggesting which objective variables to include, but were not so good at combining them consistently (e.g. Sawyer 1966). In a recollection Meehl (1986) gave a succinct description of this theme (p 373):

“Why should people have been so surprised by the empirical results in my summary chapter? Surely we all know that the human brain is poor at weighting and computing. When you check out at a supermarket, you don’t eyeball the heap of purchases and say to the clerk, ‘Well it looks to me as if it’s about $17.00 worth; what do you think?’ The clerk adds it up. There are no strong arguments, from the armchair or from empirical studies of cognitive psychology, for believing that human beings can assign optimal weights in equations subjectively or that they apply their own weights consistently, the query from which Lew Goldberg derived such fascinating and fundamental results.”

Many of the important contributions were included in the Kahneman et al (1982) book (which in the old days was called the “blue-green bible”).

In one classic study, Oskamp (1965) had eight experienced clinical psychologists, and 24 graduate and undergraduate students, read material about an actual person, in four stages. The first stage was just three sentences giving basic demographics, education, and occupation. The next three stages were 1.5-2 pages each about childhood, schooling, and the subject’s time in the army and beyond.
The subjects had to answer 25 personality questions about the person, each with five multiple-choice answers[5], after each of the four stages of reading. Chance guessing would be 20% accurate. Note that these questions had correct answers, based on other evidence about the case.

Oskamp learned two things: First, there was no difference in accuracy between the experienced clinicians and the students.

Second, all the subjects were barely above chance; and accuracy did not improve as they read more material in the three stages. After just the first paragraph, their accuracy was 26%; after reading all five additional pages across the three stages, accuracy was 28% (an insignificant difference from 26%). However, subjective confidence in how accurate they were rose almost linearly as they read more, from 33% to 53%.[6]

[5] For example, “Kidd’s present attitude toward his mother is one of: (a) Love and respect for her ideals; (b) affectionate tolerance for her foibles”; etc.

This increase in confidence, combined with no increase in accuracy, is reminiscent of the difference between training set and test set accuracy in AI. As more and more variables are included in a training set, training-set accuracy will not decrease, and typically increases, unless fit is penalized in some way. As a result of overfitting, however, test-set accuracy can decline with more variables. The resulting gap between training- and test-set accuracy will grow, much as the overconfidence in Oskamp’s subjects grew with more “variables” (more material on the single subject).

Some other important findings emerged. One drawback of the statistical prediction approach, for practice, was that it requires large samples of high-quality outcome data (labeled data, for supervised learning, in AI language). These were rarely available at the time.

Dawes (1979) proposed to give up on estimating variable weights through a criterion-optimizing “proper” procedure like OLS[7], using “improper” weights instead. An example is equal-weighting of standardized variables, which is often a very good approximation to OLS weighting.

An interesting example of improper weights is what Dawes called “bootstrapping” (a distinct usage from the concept in statistics of bootstrap resampling). The idea was to regress clinical judgments on predictors, and use those estimated weights to make predictions. This is equivalent, of course, to using the predicted part of the clinical-judgment regression and discarding (or regularizing to zero, if you will) the residual. If the residual is mostly noise then correlation accuracies can be improved by this procedure, and they typically are (e.g. Camerer, 1981). Successful examples included ___.
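The gap between training-set and test-set accuracy discussed above can be illustrated with a small simulation. This is a sketch under stated assumptions, not an analysis of any data in this paper: the outcome depends on one feature plus noise, every additional feature is pure noise, and unpenalized least squares is fit with 1 to 12 features. All names, sample sizes, and the random seed are arbitrary choices for illustration.

```python
import random


def solve(a, b):
    """Solve the linear system a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for r in reversed(range(n)):
        tail = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - tail) / m[r][r]
    return x


def ols(rows, y):
    """Least-squares weights from the normal equations (X'X) w = X'y."""
    k = len(rows[0])
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * t for r, t in zip(rows, y)) for i in range(k)]
    return solve(xtx, xty)


def mse(rows, y, w):
    """Mean squared prediction error of weights w on the given rows."""
    errs = [(t - sum(wi * xi for wi, xi in zip(w, r))) ** 2 for r, t in zip(rows, y)]
    return sum(errs) / len(errs)


def make_data(n, k, rng):
    """Each row is [1, f1, ..., fk]; only f1 carries signal, the rest are noise."""
    rows = [[1.0] + [rng.gauss(0, 1) for _ in range(k)] for _ in range(n)]
    y = [r[1] + rng.gauss(0, 1) for r in rows]
    return rows, y


rng = random.Random(0)
K = 12
train_x, train_y = make_data(30, K, rng)
test_x, test_y = make_data(30, K, rng)

results = []
for k in range(1, K + 1):
    cols = k + 1  # intercept plus the first k features
    tr = [r[:cols] for r in train_x]
    te = [r[:cols] for r in test_x]
    w = ols(tr, train_y)
    results.append((k, mse(tr, train_y, w), mse(te, test_y, w)))

for k, train_err, test_err in results:
    print(f"{k:2d} features  train MSE {train_err:.3f}  test MSE {test_err:.3f}")
```

Because the feature sets are nested and fit is unpenalized, training error cannot rise as features are added. Test error has no such guarantee, and in runs like this one would expect it to stop improving once the single informative feature is included.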
[6] Other results comparing more- and less-experienced clinicians, however, have also confirmed the first finding (experience does not improve accuracy much), but found that experience tends to reduce overconfidence (Goldberg 1959).

[7] Presciently, Dawes also mentions using ridge regression as a proper procedure to maximize out-of-sample fit.

Later studies indicated a slightly more optimistic picture for the clinicians. If bootstrap-regression residuals are pure noise, they will also lower the test-retest reliability of clinical judgment (i.e., the correlation between two judgments on the same cases made by the same person). However, analysis of the few studies which report both test-retest reliability and bootstrapping regressions indicates that only about 40% of the residual variance is unreliable noise (Camerer 1981a). Thus, residuals do contain reliable subjective information (though it may be uncorrelated with outcomes). Blattberg and Hoch (1990) later found that for actual managerial forecasts of product sales and coupon redemption rate, residuals are correlated about .30 with outcomes. As a result, averaging statistical model forecasts and managerial judgments improved prediction substantially over statistical models alone.
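In symbols, Dawes's bootstrapping and the model-plus-judgment average can be sketched as follows. The notation is introduced here for illustration and does not come from the original studies: J_i is the judgment on case i, x_i the vector of predictors, e_i the residual, and the equal one-half weights are one simple reading of "averaging".

```latex
\begin{aligned}
J_i &= x_i'\beta + e_i
  && \text{(regress judgments on predictors)} \\
\hat{J}_i &= x_i'\hat{\beta}
  && \text{(bootstrapped prediction: keep the fitted part, drop } \hat{e}_i\text{)} \\
\hat{Y}_i^{\text{avg}} &= \tfrac{1}{2}\,\hat{Y}_i^{\text{model}} + \tfrac{1}{2}\,J_i
  && \text{(average of statistical forecast and judgment)}
\end{aligned}
```

Averaging can only help relative to the statistical model alone if the judgment residual carries some information about the outcome that the model lacks, which is what a positive correlation between residuals and outcomes indicates.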

Sparsity is good for you but tastes bad

Another feature of the early statistical-vs-clinical literature was an emphasis on how small numbers of variables might fit surprisingly well.

A striking example in Dawes (1979) is a two-variable model predicting marital happiness: the rate of lovemaking minus the rate of fighting. He reports correlations of .40 and .81 in two studies (Edwards and Edwards, 1977; Thornton 1977).[8]

In another example, Dawes (1971) did a famous study about admitting students to the University of Oregon PhD program in psychology from 1964-67. He compared a three-variable model based on GRE, undergraduate GPA, and quality of the applicant’s undergraduate school to an admissions committee’s quantitative recommendation. The outcome variable was faculty ratings in 1969. (The variables were standardized, then weighted equally.) This simple model correlated with later success in the program more highly (.48, cross-validated) than an admissions committee’s quantitative recommendation (.19).[9] The bootstrapping model correlated .25.

Despite Dawes’s evidence, I have never been able to convince any graduate admissions committee at two institutions (Penn and Caltech) to actually compute statistical ratings, even as a way to filter out “certain rejection” applications.

Why not?

I think the answer is that the human mind rebels against regularization and the resulting sparsity. We are born to overfit. Every AI researcher knows that including fewer variables (e.g., by giving many of them zero weights in LASSO, or limiting tree depth in random forests) is a useful all-purpose prophylactic for overfitting a training set.
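The equal-weighting procedure mentioned for the Dawes (1971) admissions example (standardize each variable, then add them up) is simple enough to sketch in a few lines of Python. The function names and the applicant numbers below are hypothetical and made up for illustration; they are not Dawes's data.

```python
from statistics import mean, pstdev


def standardize(values):
    """Convert raw values to z-scores (mean 0, standard deviation 1)."""
    m, s = mean(values), pstdev(values)
    return [(v - m) / s for v in values]


def unit_weight_scores(predictors):
    """Sum the z-scores across predictors, giving every predictor weight 1."""
    z_columns = [standardize(col) for col in predictors]
    return [sum(row) for row in zip(*z_columns)]


def rank_order(scores):
    """Indices of applicants sorted from highest to lowest composite score."""
    return sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)


# Arbitrary toy numbers for four hypothetical applicants (not real data).
gre = [600.0, 700.0, 650.0, 750.0]
gpa = [3.0, 3.5, 3.9, 3.2]
school_quality = [2.0, 4.0, 3.0, 5.0]

scores = unit_weight_scores([gre, gpa, school_quality])
print(scores)
print(rank_order(scores))
```

No weights are estimated, so no outcome data are needed and there is nothing to overfit. The only modeling decisions are which variables to include and the sign with which each enters.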

[8] More recent analyses using transcribed verbal interactions generate correlations for divorce and marital satisfaction around .6-.7. The core variables are called the “four horsemen” of criticism, defensiveness, contempt, and “stonewalling” (listener withdrawal).

[9] Readers might guess that the quality of econometrics for inference in some of these earlier papers is limited. For example, Dawes (1971) only used the 111 students who had been admitted to the program and stayed enrolled, so there is likely scale compression, etc. Some of the faculty members rating those students were probably also initial raters which could generate consistency biases etc.


But people do not like to explicitly throw away information. This is particularly true if the information is already in front of us, in the form of a PhD admissions application, for example. It takes some combination of willpower, arrogance, or what have you, to simply ignore letters of recommendation, for example.

The poster child for using worthless information is personal short face-to-face interviews (___ cites). There is a mountain of evidence that such interviews do not predict anything about later work performance, if interviewers are untrained and do not use a structured interview format. A likely example is interviewing faculty candidates with new PhDs, in hotel suites at the ASSA meetings. Suppose the goal of such interviews is to predict which new PhDs will do enough terrific research, good teaching, and other kinds of service and public value to get tenure several years later.

But the brain of an interviewer probably has more basic things on its mind. Is this person well-dressed? Would they protect me if there is danger? Friend or foe? Do their accent and word choice sound like mine?

People who do these interviews (including me) say explicitly that we are trying to probe the candidate’s depth of understanding about their topic, how promising new planned research is, etc. But what we really are evaluating is “Do they belong in my tribe?”

While I do think such interviews are a waste of time[10], it is conceivable that they generate some information that is valid. The problem is that interviewers may weight the wrong information (as well as overweighting features that should be regularized to zero). Indeed, nowadays the best method to capture such information is probably to videotape the interview, combine it with other tasks resembling work performance (e.g., have them review a difficult paper), and machine learn the heck out of that larger corpus of information.

Another simple example of where ignoring information is counterintuitive is captured by the two modes of forecasting which Kahneman and Lovallo (19_) wrote about. They called them the “inside” and “outside” view. The two views were in the context of forecasting the outcome of a project (such as writing a book, or a business investment). The inside view

“focused only on a particular case, by considering the plan and its obstacles to completion, by constructing scenarios of future progress” (p 25).

The outside view

“focusses on the statistics of a class of cases chosen to be similar in relevant respects to the current one” (p 25)

[10] There are many caveats of course to this strong claim. For example, often the school is pitching to attract a highly desirable candidate, not the other way around.


The outside view deliberately throws away most of the information about a specific case at hand (but keeps some information): It reduces the relevant dimensions to only those that are present in the outside view reference class. (This is, again, a regularization that zeros out all the features that are not “similar in relevant respects”.)

In ML terms, the outside and inside views are like different kinds of cluster analyses. The outside view parses all previous cases into K clusters; a current case belongs to one cluster or another (though there is, of course, a degree of cluster membership depending on the distance from cluster centroids). The inside view, in its extreme form, treats each case as unique.

Hypothesis: Human judgment is like overfitted machine learning

The core idea I want to explore is that some aspects of everyday human judgment can be understood as the type of errors that would result from badly-done machine learning.[11] I’ll focus on two aspects: overconfidence and how it increases; and limited error correction.

In both cases, we are implicitly talking about a research program which collects data on human predictions and compares them with machine-learned predictions. Then deliberately re-do the machine learning badly (e.g., failing to correct for over-fitting) and see whether the impaired ML predictions have properties of human ones.

Overconfidence: Overconfidence comes in different flavors. In the predictive context we will define it as having too narrow a confidence interval around a prediction. (In regression, for example, this means underestimating the standard error of a conditional prediction P(Y|X) based on observables X.)

It could be that human overconfidence results from a failure to winnow the set of predictors (as in LASSO penalties for feature weights). Overconfidence of this type could correspond to what one expects from overfitting: High training set accuracy corresponds to confidence about predictions. A drop in accuracy from training to test means that predictions were not as accurate as hoped.
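Under the interval definition of overconfidence given above, one simple way to measure it is to compare the stated confidence level with the share of outcomes that actually fall inside the stated intervals. The sketch below is only an illustration of that bookkeeping; the function names and the forecast numbers are hypothetical.

```python
def interval_coverage(intervals, outcomes):
    """Fraction of outcomes that fall inside the stated (low, high) intervals."""
    hits = sum(1 for (low, high), y in zip(intervals, outcomes) if low <= y <= high)
    return hits / len(outcomes)


def overconfidence(intervals, outcomes, stated_level):
    """Stated confidence minus realized coverage.

    Positive values mean the intervals were too narrow for the stated level.
    """
    return stated_level - interval_coverage(intervals, outcomes)


# Hypothetical forecasts: five intervals stated at 90% confidence,
# followed by the outcomes that were later observed.
intervals = [(10, 20), (5, 8), (100, 120), (0, 1), (40, 60)]
outcomes = [15, 9, 130, 0.5, 70]

coverage = interval_coverage(intervals, outcomes)
gap = overconfidence(intervals, outcomes, 0.90)
print(coverage)
print(gap)
```

The same calculation can be applied to a machine-learned model's prediction intervals on held-out data, which would allow the human and the deliberately impaired ML forecasts to be compared on a common scale.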
Limited error correction: In some ML procedures training takes place over trials. For example, early neural networks were trained by making output predictions based on a set of node weights, then back-propagating prediction errors to adjust the weights. Early contributions intended for this process to correspond to human learning, such as how children learn to recognize categories of natural objects or to learn properties of language (e.g. McClelland and Rumelhart).

One can then ask whether some aspects of adult human judgment correspond to poor implementation of error-correction. An invisible assumption that is, of course, part of neural network training is that output errors are recognized (if learning is supervised by labeled data). But what if humans do not recognize error or respond to it inappropriately?

One maladaptive response to prediction error is to add features. For example, suppose a college admissions director has a predictive model and thinks students who play musical instruments have good study habits and will succeed in the college. Now a student comes along who plays drums in the Dead Milkmen punk band. The student gets admitted (because playing music is a good feature), but struggles in college and drops out.

The admissions director could back-propagate the predictive error to adjust the weights on the “plays music” feature. Or she could create a new feature by splitting “plays music” into “plays drums” and “plays non-drums” and ignore the error. This procedure will generate too many features and will not use error-correction effectively.12

(Note too that a different admissions director might create two different subfeatures, “plays music in a punk band” and “plays non-punk music”. In the stylized version of this description, both will become convinced that they have improved their mental model and will retain high confidence about future predictions. But their inter-rater reliability will have gone down, which puts a mathematical cap on how good average predictive accuracy can be. More on this next.)

11 My intuition about this was aided by Jesse Shapiro, who asked a well-crafted question pointing straight in this direction.

IV: AI technology as a bionic patch, or malware, for human limits

We spend a lot of time in behavioral economics thinking about how political and economic systems either exploit bad choices or help people make good choices. What behavioral economics has to offer to this general discussion is to specify a more psychologically accurate model of human choice and human nature than the caricature of constrained utility-maximization (as useful as it has been).

AI enters by creating better tools for making inferences about both what a person wants and what a person will do. Sometimes these tools will hurt and sometimes they will help.

AI helps: A clear example is recommender systems. Good systems are a kind of behavioral prosthetic to remedy human limits on attention and the resulting incompleteness of preferences.
Consider Netflix movie recommendations. Netflix uses a person’s viewing and ratings history, as well as opinions of others and movie properties, as inputs to a variety of algorithms to suggest what content to watch. As their data scientists explained (Gomez-Uribe and Hunt, 2015):

“a typical Netflix member loses interest after perhaps 60 to 90 seconds of choosing, having reviewed 10 to 20 titles (perhaps 3 in detail) on one or two screens... The recommender problem is to make sure that on those two screens each member in our diverse pool will find something compelling to view, and will understand why it might be of interest.”

12 Another way to model this feature-splitting response is as the refinement of a prediction tree, where branches are added for new features when predictions are incorrect. This will generate a bushy tree, which generally harms test-set accuracy.

For example, their “Because You Watched” line uses a video-video similarity algorithm to suggest unwatched videos similar to ones the user watched and liked.
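To make the idea of video-video similarity concrete, here is a minimal sketch of one generic approach: cosine similarity between the rating vectors of two titles. It is not Netflix's actual algorithm, which the quoted source does not specify here. The ratings table, the member names, and the function names (`item_vector`, `cosine`, `because_you_watched`) are all invented for illustration.

```python
from math import sqrt

# Hypothetical ratings (1 to 5) keyed by member, then title.
# A missing title means the member has not watched it.
ratings = {
    "ann":  {"Alien": 5, "Amelie": 2},
    "bob":  {"Alien": 4, "Aliens": 5, "Heat": 2},
    "cara": {"Alien": 1, "Amelie": 5, "Chocolat": 4},
    "dev":  {"Amelie": 4, "Chocolat": 5, "Heat": 3},
    "eve":  {"Alien": 5, "Aliens": 4},
}

def item_vector(title):
    """Ratings of one title, keyed by the members who rated it."""
    return {m: r[title] for m, r in ratings.items() if title in r}

def cosine(a, b):
    """Cosine similarity between two sparse rating vectors (missing = 0)."""
    dot = sum(a[m] * b[m] for m in a.keys() & b.keys())
    norm = sqrt(sum(v * v for v in a.values())) * sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0

def because_you_watched(member, seed_title, k=2):
    """Rank titles the member has not seen by similarity to seed_title."""
    seen = ratings[member]
    titles = {t for r in ratings.values() for t in r}
    seed = item_vector(seed_title)
    scored = [(cosine(seed, item_vector(t)), t) for t in titles if t not in seen]
    return [t for _, t in sorted(scored, reverse=True)[:k]]

print(because_you_watched("ann", "Alien"))  # ['Aliens', 'Heat']
```

Treating unwatched titles as zeros is a crude simplification. The point of the sketch is only that a short ranked list can stand in for a search the member would abandon after a minute or so.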

There are so many interesting implications of these kinds of recommender systems for economics in general, and for behavioral economics in particular.

For example, note that Netflix wants its members to “understand why it [a recommended video] might be of interest”. This is, at bottom, a question about interpretability of AI output, how a member learns from recommender successes and errors, and whether a member “trusts” Netflix in general. All these are psychological processes which may also depend heavily on design and experience features (UD, UX).

AI ‘hurts’13: Another feature of AI-driven personalization is price discrimination. If people do know a lot about what they want, and have precise willingness-to-pay (WTP), then companies will quickly develop the capacity to personalize prices too. This seems to be a concept that is emerging rapidly and desperately needs to be studied by industrial economists who can figure out welfare implications.

Behavioral economics enters with some evidence about how people make judgments about fairness of prices (e.g., Kahneman, Knetsch and Thaler, 1986)14 and how fairness judgments influence behavior.

My intuition is that in general, people can come to accept a high degree of variation in prices for what is essentially the same product, as long as there is very minor product differentiation15 or firms can articulate why different prices are fair (which requires insight into how consumers think, and heterogeneity). It is also likely that personalized pricing will harm consumers who are the most habitual or who do not shop cleverly, but will help savvy consumers who can hijack the personalization algorithms to look like low WTP consumers and save money. See Gabaix and Laibson (2006?) for a carefully worked-out model about hidden (“shrouded”) product attributes.

V: Conclusion

TBA

13 I put the word ‘hurts’ in quotes here as a way to conjecture, through punctuation, that in many industries the AI-driven capacity to personalize pricing will harm consumer welfare overall.

14 Recent?

15 mariachi


References (partial)

Blattberg, R. C., & Hoch, S. J. (1990). Database models and managerial intuition: 50% database + 50% manager. Management Science, 36(8), 887-899.

Camerer, C. F. (1981a). The validity and utility of expert judgment. Unpublished Ph.D. dissertation, Center for Decision Research, University of Chicago Graduate School of Business.

Camerer, C. F. (1981b). General conditions for the success of bootstrapping models. Organizational Behavior and Human Performance, 27, 411-422.

Camerer, C. F., & Johnson, E. J. (1991). The process-performance paradox in expert judgment: Why can experts know so much and predict so badly? In A. Ericsson and J. Smith (Eds.), Toward a General Theory of Expertise: Prospects and Limits. Cambridge: Cambridge University Press.

Camerer, C. F., Nave, G., & Smith, A. (2017). Dynamic unstructured bargaining with private information and deadlines: Theory and experiment. Working paper.

Chapman, L. J., & Chapman, J. P. (1969). Illusory correlation as an obstacle to the use of valid psychodiagnostic signs. Journal of Abnormal Psychology, 46, 271-280.

Dawes, R. M. (1971). A case study of graduate admissions: Application of three principles of human decision making. American Psychologist, 26, 180-188.

Dawes, R. M., & Corrigan, B. (1974). Linear models in decision making. Psychological Bulletin, 81, 97.

Dawes, R. M., Faust, D., & Meehl, P. E. (1989). Clinical versus actuarial judgment. Science, 243, 1668-1674.

Edwards, D. D., & Edwards, J. S. (1977). Marriage: Direct and continuous measurement. Bulletin of the Psychonomic Society, 10, 187-188.

Einhorn, H. J. (1986). Accepting error to make less error. Journal of Personality Assessment, 50, 387-395.

Einhorn, H. J., & Hogarth, R. M. (1975). Unit weighting schemes for decision making. Organizational Behavior and Human Performance, 13, 171-192.

Gabaix Laibson QJE


Goldberg, L. R. (1970). Man versus model of man: A rationale, plus some evidence for a method of improving on clinical inferences. Psychological Bulletin, 73, 422-432.

Goldberg, L. R. (1959). The effectiveness of clinicians' judgments: The diagnosis of organic brain damage from the Bender-Gestalt test. Journal of Consulting Psychology, 23, 25-33.

Goldberg, L. R. (1968). Simple models or simple processes? American Psychologist, 23, 483-496.

Gomez-Uribe, Carlos and Neil Hunt. (2015) The Netflix Recommender System: Algorithms, Business Value, and Innovation. ACM Transactions on Management Information Systems (TMIS), 6(4), article 13.

Grove, W. M., Zald, D. H., Lebow, B. S., Snitz, B. E., & Nelson, C. E. (2000). Clinical vs. mechanical prediction: A meta-analysis. Psychological Assessment, 12, 19-30.

Johnson, E. J. (1980). Expertise in admissions judgment. Unpublished doctoral dissertation, Carnegie-Mellon University.

Johnson, E. J. (1988). Expertise and decision under uncertainty: Performance and process. In M. T. H. Chi, R. Glaser & M. J. Farr (Eds.), The nature of expertise (pp. 209-228). Hillsdale, NJ: Erlbaum.

Kahneman, Knetsch and Thaler, 1986 fairness

Kleinberg, Jon, Annie Liang, and Sendhil Mullainathan. 2015. “The Theory is Predictive, But is It Complete? An Application to Human Perception of Randomness.” Unpublished paper.

Kleinberg, Jon, Himabindu Lakkaraju, Jure Leskovec, Jens Ludwig, and Sendhil Mullainathan. 2017. “Human Decisions and Machine Predictions.” NBER Working Paper 23180.

Meehl, P. E. (1954). Clinical versus statistical prediction: A theoretical analysis and a review of the evidence. Minneapolis: University of Minnesota Press.

Mullainathan, Sendhil and Jann Spiess (2017) Machine Learning: An Applied Econometric Approach. Journal of Economic Perspectives 31(2): 87-106

Peysakhovich, Alexander and Naecker, Jeffrey. Using methods from machine learning to evaluate behavioral models of choice under risk and ambiguity. Journal of Economic Behavior & Organization Volume 133, January 2017, Pages 373-384

Sawyer, Jack (1966). Measurement and prediction, clinical and statistical. Psychological Bulletin, 66, 178-200.


Thornton, B. (1977). Linear prediction of marital happiness: A replication. Personality and Social Psychology Bulletin, 3, 674-676.

Kahneman, Daniel and Dan Lovallo (1993). Timid Choices and Bold Forecasts: A Cognitive Perspective on Risk Taking. Management Science, 39(1), 17-31.

Klayman, J., & Ha, Y. (1985). Confirmation, disconfirmation, and information in hypothesis testing. Psychological Review, 211-228.