PCAST Big Data and Privacy - May 2014

 REPORT TO THE PRESIDENT BIG DATA AND PRIVACY: A TECHNOLOGICAL PERSPECTIVE Executive Office of the President President’s Council of Advisors on Science and Technology May 2014 


    REPORT TO THE PRESIDENT
    BIG DATA AND PRIVACY: A TECHNOLOGICAL PERSPECTIVE

    Executive Office of the President
    President's Council of Advisors on Science and Technology

    May 2014

    About the President's Council of Advisors on Science and Technology

    The President's Council of Advisors on Science and Technology (PCAST) is an advisory group of the Nation's leading scientists and engineers, appointed by the President to augment the science and technology advice available to him from inside the White House and from cabinet departments and other Federal agencies. PCAST is consulted about, and often makes policy recommendations concerning, the full range of issues where understandings from the domains of science, technology, and innovation bear potentially on the policy choices before the President. For more information about PCAST, see www.whitehouse.gov/ostp/pcast

    The President's Council of Advisors on Science and Technology

    Co-Chairs
    John P. Holdren, Assistant to the President for Science and Technology; Director, Office of Science and Technology Policy
    Eric S. Lander, President, Broad Institute of Harvard and MIT

    Vice Chairs
    William Press, Raymer Professor in Computer Science and Integrative Biology, University of Texas at Austin
    Maxine Savitz, Vice President, National Academy of Engineering

    Members
    Rosina Bierbaum, Dean, School of Natural Resources and Environment, University of Michigan
    Christine Cassel, President and CEO, National Quality Forum
    Christopher Chyba, Professor, Astrophysical Sciences and International Affairs; Director, Program on Science and Global Security, Princeton University
    S. James Gates, Jr., John S. Toll Professor of Physics; Director, Center for String and Particle Theory, University of Maryland, College Park
    Mark Gorenberg, Managing Member, Zetta Venture Partners
    Susan L. Graham, Pehong Chen Distinguished Professor Emerita in Electrical Engineering and Computer Science, University of California, Berkeley
    Shirley Ann Jackson, President, Rensselaer Polytechnic Institute
    Richard C. Levin (through mid-April 2014), President Emeritus; Frederick William Beinecke Professor of Economics, Yale University
    Michael McQuade, Senior Vice President for Science and Technology, United Technologies Corporation
    Chad Mirkin, George B. Rathmann Professor of Chemistry; Director, International Institute for Nanotechnology, Northwestern University
    Mario Molina, Distinguished Professor, Chemistry and Biochemistry, University of California, San Diego; Professor, Center for Atmospheric Sciences at the Scripps Institution of Oceanography
    Craig Mundie, Senior Advisor to the CEO, Microsoft Corporation
    Ed Penhoet, Director, Alta Partners; Professor Emeritus, Biochemistry and Public Health, University of California, Berkeley
    Barbara Schaal, Mary-Dell Chilton Distinguished Professor of Biology, Washington University, St. Louis
    Eric Schmidt, Executive Chairman, Google, Inc.
    Daniel Schrag, Sturgis Hooper Professor of Geology; Professor, Environmental Science and Engineering; Director, Harvard University Center for Environment, Harvard University

    Staff
    Marjory S. Blumenthal, Executive Director
    Ashley Predith, Assistant Executive Director
    Knatokie Ford, AAAS Science & Technology Policy Fellow

    PCAST Big Data and Privacy Working Group

    Working Group Co-Chairs
    Susan L. Graham, Pehong Chen Distinguished Professor Emerita in Electrical Engineering and Computer Science, University of California, Berkeley
    William Press, Raymer Professor in Computer Science and Integrative Biology, University of Texas at Austin

    Working Group Members
    S. James Gates, Jr., John S. Toll Professor of Physics; Director, Center for String and Particle Theory, University of Maryland, College Park
    Mark Gorenberg, Managing Member, Zetta Venture Partners
    John P. Holdren, Assistant to the President for Science and Technology; Director, Office of Science and Technology Policy
    Eric S. Lander, President, Broad Institute of Harvard and MIT
    Craig Mundie, Senior Advisor to the CEO, Microsoft Corporation
    Maxine Savitz, Vice President, National Academy of Engineering
    Eric Schmidt, Executive Chairman, Google, Inc.

    Working Group Staff
    Marjory S. Blumenthal, Executive Director, President's Council of Advisors on Science and Technology
    Michael Johnson, Assistant Director, National Security and International Affairs

    EXECUTIVE OFFICE OF THE PRESIDENT
    PRESIDENT'S COUNCIL OF ADVISORS ON SCIENCE AND TECHNOLOGY
    WASHINGTON, D.C. 20502

    President Barack Obama
    The White House
    Washington, DC 20502

    Dear Mr. President,

    We are pleased to send you this report, Big Data and Privacy: A Technological Perspective, prepared for you by the President's Council of Advisors on Science and Technology (PCAST). It was developed to complement and inform the analysis of big-data implications for policy led by your Counselor, John Podesta, in response to your requests of January 17, 2014. PCAST examined the nature of current technologies for managing and analyzing big data and for preserving privacy, it considered how those technologies are evolving, and it explained what the technological capabilities and trends imply for the design and enforcement of public policy intended to protect privacy in big-data contexts.

    Big data drives big benefits, from innovative businesses to new ways to treat diseases. The challenges to privacy arise because technologies collect so much data (e.g., from sensors in everything from phones to parking lots) and analyze them so efficiently (e.g., through data mining and other kinds of analytics) that it is possible to learn far more than most people had anticipated or can anticipate given continuing progress. These challenges are compounded by limitations on traditional technologies used to protect privacy (such as de-identification). PCAST concludes that technology alone cannot protect privacy, and policy intended to protect privacy needs to reflect what is (and is not) technologically feasible.

    In light of the continuing proliferation of ways to collect and use information about people, PCAST recommends that policy focus primarily on whether specific uses of information about people affect privacy adversely. It also recommends that policy focus on outcomes, on the "what" rather than the "how," to avoid becoming obsolete as technology advances. The policy framework should accelerate the development and commercialization of technologies that can help to contain adverse impacts on privacy, including research into new technological options. By using technology more effectively, the Nation can lead internationally in making the most of big data's benefits while limiting the concerns it poses for privacy. Finally, PCAST calls for efforts to assure that there is enough talent available with the expertise needed to develop and use big data in a privacy-sensitive way.

    PCAST is grateful for the opportunity to serve you and the country in this way and hopes that you and others who read this report find our analysis useful.

    Best regards,

    John P. Holdren
    Co-chair, PCAST

    Eric S. Lander
    Co-chair, PCAST


    Table of Contents

    The President's Council of Advisors on Science and Technology
    PCAST Big Data and Privacy Working Group
    Table of Contents
    Executive Summary

    1. Introduction
        1.1 Context and outline of this report
        1.2 Technology has long driven the meaning of privacy
        1.3 What is different today?
        1.4 Values, harms, and rights

    2. Examples and Scenarios
        2.1 Things happening today or very soon
        2.2 Scenarios of the near future in healthcare and education
            2.2.1 Healthcare: personalized medicine
            2.2.2 Healthcare: detection of symptoms by mobile devices
            2.2.3 Education
        2.3 Challenges to the home's special status
        2.4 Tradeoffs among privacy, security, and convenience

    3. Collection, Analytics, and Supporting Infrastructure
        3.1 Electronic sources of personal data
            3.1.1 "Born digital" data
            3.1.2 Data from sensors
        3.2 Big data analytics
            3.2.1 Data mining
            3.2.2 Data fusion and information integration
            3.2.3 Image and speech recognition
            3.2.4 Social-network analysis
        3.3 The infrastructure behind big data
            3.3.1 Data centers
            3.3.2 The cloud

    4. Technologies and Strategies for Privacy Protection
        4.1 The relationship between cybersecurity and privacy
        4.2 Cryptography and encryption
            4.2.1 Well-Established encryption technology
            4.2.2 Encryption frontiers
        4.3 Notice and consent
        4.4 Other strategies and techniques
            4.4.1 Anonymization or de-identification
            4.4.2 Deletion and non-retention
        4.5 Robust technologies going forward
            4.5.1 A Successor to Notice and Consent
            4.5.2 Context and Use
            4.5.3 Enforcement and deterrence
            4.5.4 Operationalizing the Consumer Privacy Bill of Rights

    5. PCAST Perspectives and Conclusions
        5.1 Technical feasibility of policy interventions
        5.2 Recommendations
        5.4 Final Remarks

    Appendix A. Additional Experts Providing Input
    Special Acknowledgment

    Executive Summary

    The ubiquity of computing and electronic communication technologies has led to the exponential growth of data from both digital and analog sources. New capabilities to gather, analyze, disseminate, and preserve vast quantities of data raise new concerns about the nature of privacy and the means by which individual privacy might be compromised or protected.

    After providing an overview of this report and its origins, Chapter 1 describes the changing nature of privacy as computing technology has advanced and big data has come to the fore. The term privacy encompasses not only the famous "right to be left alone," or keeping one's personal matters and relationships secret, but also the ability to share information selectively but not publicly. Anonymity overlaps with privacy, but the two are not identical. Likewise, the ability to make intimate personal decisions without government interference is considered to be a privacy right, as is protection from discrimination on the basis of certain personal characteristics (such as race, gender, or genome). Privacy is not just about secrets.

    Conflicts between privacy and new technology have occurred throughout American history. Concern with the rise of mass media such as newspapers in the 19th century led to legal protections against the harms or adverse consequences of "intrusion upon seclusion," public disclosure of private facts, and unauthorized use of name or likeness in commerce. Wire and radio communications led to 20th century laws against wiretapping and the interception of private communications, laws that, PCAST notes, have not always kept pace with the technological realities of today's digital communications.

    Past conflicts between privacy and new technology have generally related to what is now termed "small data," the collection and use of data sets by private- and public-sector organizations where the data are disseminated in their original form or analyzed by conventional statistical methods. Today's concerns about big data reflect both the substantial increases in the amount of data being collected and associated changes, both actual and potential, in how they are used.

    Big data is big in two different senses. It is big in the quantity and variety of data that are available to be processed. And, it is big in the scale of analysis (termed "analytics") that can be applied to those data, ultimately to make inferences and draw conclusions. By data mining and other kinds of analytics, non-obvious and sometimes private information can be derived from data that, at the time of their collection, seemed to raise no, or only manageable, privacy issues. Such new information, used appropriately, may often bring benefits to individuals and society; Chapter 2 of this report gives many such examples, and additional examples are scattered throughout the rest of the text. Even in principle, however, one can never know what information may later be extracted from any particular collection of big data, both because that information may result only from the combination of seemingly unrelated data sets, and because the algorithm for revealing the new information may not even have been invented at the time of collection.

    The same data and analytics that provide benefits to individuals and society if used appropriately can also create potential harms: threats to individual privacy according to privacy norms both widely shared and personal. For example, large-scale analysis of research on disease, together with health data from electronic medical records and genomic information, might lead to better and timelier treatment for individuals but also to inappropriate disqualification for insurance or jobs. GPS tracking of individuals might lead to better community-based public transportation facilities, but also to inappropriate use of the whereabouts of individuals. A list of the kinds of adverse consequences or harms from which individuals should be protected is proposed in Section 1.4. PCAST believes strongly that the positive benefits of big-data technology are (or can be) greater than any new harms.

    Chapter 3 of the report describes the many new ways in which personal data are acquired, both from original sources, and through subsequent processing. Today, although they may not be aware of it, individuals constantly emit into the environment information whose use or misuse may be a source of privacy concerns. Physically, these information emanations are of two types, which can be called "born digital" and "born analog."

    When information is "born digital," it is created, by us or by a computer surrogate, specifically for use by a computer or data processing system. When data are born digital, privacy concerns can arise from over-collection. Over-collection occurs when a program's design intentionally, and sometimes clandestinely, collects information unrelated to its stated purpose. Over-collection can, in principle, be recognized at the time of collection.

    When information is "born analog," it arises from the characteristics of the physical world. Such information becomes accessible electronically when it impinges on a sensor such as a camera, microphone, or other engineered device. When data are born analog, they are likely to contain more information than the minimum necessary for their immediate purpose, and for valid reasons. One reason is for robustness of the desired "signal" in the presence of variable "noise." Another is technological convergence, the increasing use of standardized components (e.g., cell-phone cameras) in new products (e.g., home alarm systems capable of responding to gesture).

    Data fusion occurs when data from different sources are brought into contact and new facts emerge (see Section 3.2.2). Individually, each data source may have a specific, limited purpose. Their combination, however, may uncover new meanings. In particular, data fusion can result in the identification of individual people, the creation of profiles of an individual, and the tracking of an individual's activities. More broadly, data analytics discovers patterns and correlations in large corpuses of data, using increasingly powerful statistical algorithms. If those data include personal data, the inferences flowing from data analytics may then be mapped back to inferences, both certain and uncertain, about individuals.

    Because of data fusion, privacy concerns may not necessarily be recognizable in born-digital data when they are collected. Because of signal-processing robustness and standardization, the same is true of born-analog data, even data from a single source (e.g., a single security camera). Born-digital and born-analog data can both be combined with data fusion, and new kinds of data can be generated from data analytics. The beneficial uses of near-ubiquitous data collection are large, and they fuel an increasingly important set of economic activities. Taken together, these considerations suggest that a policy focus on limiting data collection will not be a broadly applicable or scalable strategy, nor one likely to achieve the right balance between beneficial results and unintended negative consequences (such as inhibiting economic growth).

    If collection cannot, in most cases, be limited practically, then what? Chapter 4 discusses in detail a number of technologies that have been used in the past for privacy protection, and others that may, to a greater or lesser extent, serve as technology building blocks for future policies.

    Some technology building blocks (for example, cybersecurity standards, technologies related to encryption, and formal systems of auditable access control) are already being utilized and need to be encouraged in the marketplace. On the other hand, some techniques for privacy protection that have seemed encouraging in the past are useful as supplementary ways to reduce privacy risk, but do not now seem sufficiently robust to be a dependable basis for privacy protection where big data is concerned. For a variety of reasons, PCAST judges anonymization, data deletion, and distinguishing data from metadata (defined below) to be in this category. The framework of notice and consent is also becoming unworkable as a useful foundation for policy.

    Anonymization is increasingly easily defeated by the very techniques that are being developed for many legitimate applications of big data. In general, as the size and diversity of available data grows, the likelihood of being able to re-identify individuals (that is, re-associate their records with their names) grows substantially. While anonymization may remain somewhat useful as an added safeguard in some situations, approaches that deem it, by itself, a sufficient safeguard need updating.

    While it is good business practice that data of all kinds should be deleted when they are no longer of value, economic or social value often can be obtained from applying big data techniques to masses of data that were otherwise considered to be worthless. Similarly, archival data may also be important to future historians, or for later longitudinal analysis by academic researchers and others. As described above, many sources of data contain latent information about individuals, information that can be known only if the holder expends analytic resources, or that may become knowable only in the future with the development of new data-mining algorithms. In such cases it is practically impossible for the data holder even to surface "all the data about an individual," much less delete it on any specified schedule or in response to an individual's request. Today, given the distributed and redundant nature of data storage, it is not even clear that data, even small data, can be destroyed with any high degree of assurance.

    As data sets become more complex, so do the attached metadata. Metadata are ancillary data that describe properties of the data such as the time the data were created, the device on which they were created, or the destination of a message. Included in the data or metadata may be identifying information of many kinds. It cannot today generally be asserted that metadata raise fewer privacy concerns than data.

    Notice and consent is the practice of requiring individuals to give positive consent to the personal data collection practices of each individual app, program, or web service. Only in some fantasy world do users actually read these notices and understand their implications before clicking to indicate their consent.
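    The re-identification risk described above, in which records stripped of names are re-associated with individuals once a second data source is brought into contact with them, can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration: the records, the field names, and the choice of ZIP code, birth year, and sex as the shared attributes are invented, and the exact-match join stands in for the far more capable data-fusion techniques the report has in mind.

```python
# Hypothetical illustration: all records, field names, and values are invented.
from collections import defaultdict

# A "de-identified" data set: names removed, other attributes retained.
deidentified = [
    {"zip": "02139", "birth_year": 1970, "sex": "F", "diagnosis": "asthma"},
    {"zip": "02139", "birth_year": 1982, "sex": "M", "diagnosis": "diabetes"},
    {"zip": "94720", "birth_year": 1970, "sex": "F", "diagnosis": "migraine"},
]

# A separate data set that carries names alongside the same attributes.
public_roll = [
    {"name": "Leslie", "zip": "02139", "birth_year": 1970, "sex": "F"},
    {"name": "Chris", "zip": "02139", "birth_year": 1982, "sex": "M"},
    {"name": "Pat", "zip": "94720", "birth_year": 1970, "sex": "F"},
    {"name": "Sam", "zip": "94720", "birth_year": 1970, "sex": "F"},
]

# Attributes shared by both data sets (an assumption of this sketch).
SHARED_FIELDS = ("zip", "birth_year", "sex")


def key(record):
    """Build the join key from the shared attributes."""
    return tuple(record[field] for field in SHARED_FIELDS)


def reidentify(anonymous_rows, named_rows):
    """Map name -> diagnosis for rows whose shared attributes match exactly one named person."""
    names_by_key = defaultdict(list)
    for row in named_rows:
        names_by_key[key(row)].append(row["name"])

    matches = {}
    for row in anonymous_rows:
        candidates = names_by_key.get(key(row), [])
        if len(candidates) == 1:  # a unique match re-associates the record with a name
            matches[candidates[0]] = row["diagnosis"]
    return matches


linked = reidentify(deidentified, public_roll)
```

    In this toy case two of the three records are linked to a name, while the third stays ambiguous because two people share its attributes. Adding a further attribute or a further data source would tend to shrink that ambiguity, which is the sense in which larger and more diverse data make re-identification more likely.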

    The conceptual problem with notice and consent is that it fundamentally places the burden of privacy protection on the individual. Notice and consent creates a non-level playing field in the implicit privacy negotiation between provider and user. The provider offers a complex, take-it-or-leave-it set of terms, while the user, in practice, can allocate only a few seconds to evaluating the offer. This is a kind of market failure.

    PCAST believes that the responsibility for using personal data in accordance with the user's preferences should rest with the provider rather than with the user. As a practical matter, in the private sector, third parties chosen by the consumer (e.g., consumer-protection organizations, or large app stores) could intermediate: A consumer might choose one of several "privacy protection profiles" offered by the intermediary, which in turn would vet apps against these profiles. By vetting apps, the intermediaries would create a marketplace for the negotiation of community standards for privacy. The Federal government could encourage the development of standards for electronic interfaces between the intermediaries and the app developers and vendors.

    After data are collected, data analytics come into play and may generate an increasing fraction of privacy issues. Analysis, per se, does not directly touch the individual (it is neither collection nor, without additional action, use) and may have no external visibility. By contrast, it is the use of a product of analysis, whether in commerce, by government, by the press, or by individuals, that can cause adverse consequences to individuals.

    More broadly, PCAST believes that it is the use of data (including born-digital or born-analog data and the products of data fusion and analysis) that is the locus where consequences are produced. This locus is the technically most feasible place to protect privacy. Technologies are emerging, both in the research community and in the commercial world, to describe privacy policies, to record the origins (provenance) of data, their access, and their further use by programs, including analytics, and to determine whether those uses conform to privacy policies. Some approaches are already in practical use.

    Given the statistical nature of data analytics, there is uncertainty that discovered properties of groups apply to a particular individual in the group. Making incorrect conclusions about individuals may have adverse consequences for them and may affect members of certain groups disproportionately (e.g., the poor, the elderly, or minorities). Among the technical mechanisms that can be incorporated in a use-based approach are methods for imposing standards for data accuracy and integrity and policies for incorporating useable interfaces that allow an individual to correct the record with voluntary additional information.

    PCAST's charge for this study did not ask it to recommend specific privacy policies, but rather to make a relative assessment of the technical feasibilities of different broad policy approaches. Chapter 5, accordingly, discusses the implications of current and emerging technologies for government policies for privacy protection. The use of technical measures for enforcing privacy can be stimulated by reputational pressure, but such measures are most effective when there are regulations and laws with civil or criminal penalties. Rules and regulations provide both deterrence of harmful actions and incentives to deploy privacy-protecting technologies. Privacy protection cannot be achieved by technical measures alone.
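    The intermediary arrangement described above, in which a consumer picks a privacy protection profile and the intermediary vets apps against it, might be sketched as follows. This is only an illustration under stated assumptions: the report does not define a profile format, so the `PrivacyProfile` and `AppDeclaration` types, the use categories, and the idea that an app declares its uses in machine-readable form are all hypothetical.

```python
# Hypothetical sketch: profile names, use categories, and app declarations are invented.
from dataclasses import dataclass


@dataclass(frozen=True)
class PrivacyProfile:
    name: str
    allowed_uses: frozenset  # uses of personal data the consumer accepts


@dataclass(frozen=True)
class AppDeclaration:
    app: str
    declared_uses: frozenset  # uses the app states it will make of personal data


def vet(app, profile):
    """Return the declared uses that the profile does not allow (empty means the app passes)."""
    return app.declared_uses - profile.allowed_uses


strict = PrivacyProfile("strict", frozenset({"service_delivery"}))
relaxed = PrivacyProfile(
    "relaxed", frozenset({"service_delivery", "analytics", "advertising"})
)

flashlight = AppDeclaration(
    "flashlight", frozenset({"service_delivery", "advertising"})
)

violations_strict = vet(flashlight, strict)    # uses the strict profile rejects
violations_relaxed = vet(flashlight, relaxed)  # uses the relaxed profile rejects
```

    The check is keyed on declared uses rather than on what data are collected, which mirrors the report's emphasis on use as the point of control. Whether declarations match actual behavior would still need the auditing and enforcement mechanisms the report discusses.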

    This discussion leads to five recommendations.

    Recommendation 1. Policy attention should focus more on the actual uses of big data and less on its collection and analysis. By actual uses, we mean the specific events where something happens that can cause an adverse consequence or harm to an individual or class of individuals. In the context of big data, these events ("uses") are almost always actions of a computer program or app interacting either with the raw data or with the fruits of analysis of those data. In this formulation, it is not the data themselves that cause the harm, nor the program itself (absent any data), but the confluence of the two. These "use" events (in commerce, by government, or by individuals) embody the necessary specificity to be the subject of regulation. By contrast, PCAST judges that policies focused on the regulation of data collection, storage, retention, a priori limitations on applications, and analysis (absent identifiable actual uses of the data or products of analysis) are unlikely to yield effective strategies for improving privacy. Such policies would be unlikely to be scalable over time, or to be enforceable by other than severe and economically damaging measures.

    Recommendation 2. Policies and regulation, at all levels of government, should not embed particular technological solutions, but rather should be stated in terms of intended outcomes. To avoid falling behind the technology, it is essential that policy concerning privacy protection should address the purpose (the "what") rather than prescribing the mechanism (the "how").

    Recommendation 3. With coordination and encouragement from OSTP,[1] the NITRD agencies[2] should strengthen U.S. research in privacy-related technologies and in the relevant areas of social science that inform the successful application of those technologies. Some of the technology for controlling uses already exists. However, research (and funding for it) is needed in the technologies that help to protect privacy, in the social mechanisms that influence privacy-preserving behavior, and in the legal options that are robust to changes in technology and create appropriate balance among economic opportunity, national priorities, and privacy protection.

    Recommendation 4. OSTP, together with the appropriate educational institutions and professional societies, should encourage increased education and training opportunities concerning privacy protection, including career paths for professionals. Programs that provide education leading to privacy expertise (akin to what is being done for security expertise) are essential and need encouragement. One might envision careers for digital-privacy experts both on the software development side and on the technical management side.

    [1] The White House Office of Science and Technology Policy.
    [2] NITRD refers to the Networking and Information Technology Research and Development program, whose participating Federal agencies support unclassified research in advanced information technologies such as computing, networking, and software and include both research- and mission-focused agencies such as NSF, NIH, NIST, DARPA, NOAA, DOE's Office of Science, and the DoD military-service laboratories (see http://www.nitrd.gov/SUBCOMMITTEE/nitrd_agencies/index.aspx).

    Recommendation 5. The United States should take the lead both in the international arena and at home by adopting policies that stimulate the use of practical privacy-protecting technologies that exist today. It can exhibit leadership both by its convening power (for instance, by promoting the creation and adoption of standards) and also by its own procurement practices (such as its own use of privacy-preserving cloud services). PCAST is not aware of more effective innovation or strategies being developed abroad; rather, some countries seem inclined to pursue what PCAST believes to be blind alleys. This circumstance offers an opportunity for U.S. technical leadership in privacy in the international arena, an opportunity that should be taken.

    1. Introduction

    In a widely noted speech on January 17, 2014, President Barack Obama charged his Counselor, John Podesta, with leading a comprehensive review of big data and privacy, one that would "reach out to privacy experts, technologists, and business leaders and look at how the challenges inherent in big data are being confronted by both the public and private sectors; whether we can forge international norms on how to manage this data; and how we can continue to promote the free flow of information in ways that are consistent with both privacy and security."[3] The President and Counselor Podesta asked the President's Council of Advisors on Science and Technology (PCAST) to assist with the technology dimensions of the review. For this task PCAST's statement of work reads, in part,

        PCAST will study the technological aspects of the intersection of big data with individual privacy, in relation to both the current state and possible future states of the relevant technological capabilities and associated privacy concerns. Relevant big data include data and metadata collected, or potentially collectable, from or about individuals by entities that include the government, the private sector, and other individuals. It includes both proprietary and open data, and also data about individuals collected incidentally or accidentally in the course of other activities (e.g., environmental monitoring or the "Internet of Things").

    This is a tall order, especially on the ambitious timescale requested by the President. The literature and public discussion of big data and privacy are vast, with new ideas and insights generated daily from a variety of constituencies: technologists in industry and academia, privacy and consumer advocates, legal scholars, and journalists (among others). Independently of PCAST, but informing this report, the Podesta study sponsored three public workshops at universities across the country. Limiting this report's charge to technological, not policy, aspects of the problem narrows PCAST's mandate somewhat, but this is a subject where technology and policy are difficult to separate. In any case, it is the nature of the subject that this report must be regarded as based on a momentary snapshot of the technology, although we believe the key conclusions and recommendations have lasting value.

    [3] "Remarks by the President on Review of Signals Intelligence," January 17, 2014. http://www.whitehouse.gov/the-press-office/2014/01/17/remarks-president-review-signals-intelligence

    1.1 Context and outline of this report

    The ubiquity of computing and electronic communication technologies has led to the exponential growth of online data, from both digital and analog sources. New technological capabilities to create, analyze, and disseminate vast quantities of data raise new concerns about the nature of privacy and the means by which individual privacy might be compromised or protected.

    This report discusses present and future technologies concerning this so-called "big data" as it relates to privacy concerns. It is not a complete summary of the technology concerning big data, nor a complete summary of the ways in which technology affects privacy, but focuses on the ways in which big data and privacy interact. As an example, if Leslie confides a secret to Chris and Chris broadcasts that secret by email or texting, that might be a privacy-infringing use of information technology, but it is not a big data issue. As another example, if oceanographic data are collected in large quantities by remote sensing, that is big data, but not, in the first instance, a privacy concern. Some data are more privacy-sensitive than others, for example, personal medical data, as distinct from personal data publicly shared by the same individual. Different technologies and policies will apply to different classes of data.

    The notions of big data and the notions of individual privacy used in this report are intentionally broad and inclusive. Business consultants Gartner, Inc. define big data as "high-volume, high-velocity and high-variety information assets that demand cost-effective, innovative forms of information processing for enhanced insight and decision making,"[4] while computer scientists reviewing multiple definitions offer the more technical, "a term describing the storage and analysis of large and/or complex data sets using a series of techniques including, but not limited to, NoSQL, MapReduce, and machine learning."[5] (See Sections 3.2.1 and 3.3.1 for discussion of these technical terms.) In a privacy context, the term "big data" typically means data about one or a group of individuals, or that might be analyzed to make inferences about individuals. It might include data or metadata collected by government, by the private sector, or by individuals. The data and metadata might be proprietary or open, they might be collected intentionally or incidentally or accidentally. They might be text, audio, video, sensor-based, or some combination. They might be data collected directly from some source, or data derived by some process of analysis. They might be saved for a long period of time, or they might be analyzed and discarded as they are streamed. In this report, PCAST usually does not distinguish between "data" and "information."

    The term "privacy" encompasses not only avoiding observation, or keeping one's personal matters and relationships secret, but also the ability to share information selectively but not publicly. Anonymity overlaps with privacy, but the two are not identical. Voting is recognized as private, but not anonymous, while authorship of a political tract may be anonymous, but it is not private. Likewise, the ability to make intimate personal decisions without government interference is considered to be a privacy right, as is protection from discrimination on the basis of certain personal characteristics (such as an individual's race, gender, or genome). So, privacy is not just about secrets.

    The promise of big-data collection and analysis is that the derived data can be used for purposes that benefit both individuals and society. Threats to privacy stem from the deliberate or inadvertent disclosure of collected or derived individual data, the misuse of the data, and the fact that derived data may be inaccurate or false. The technologies that address the confluence of these issues are the subject of this report.[6]

    [4] Gartner, Inc., "IT Glossary." https://www.gartner.com/it-glossary/big-data/
    [5] Barker, Adam and Jonathan Stuart Ward, "Undefined By Data: A Survey of Big Data Definitions," arXiv:1309.5821. http://arxiv.org/abs/1309.5821
    [6] PCAST acknowledges gratefully the assistance of several contributors at the National Science Foundation, who helped to identify and distill key insights from the technical literature and research community, as well as other technical experts in academia and industry that it consulted during this project. See Appendix A.

    The remainder of this introductory chapter gives further context in the form of a summary of how the legal concept of privacy developed historically in the United States. Interestingly, and relevant to this report, privacy rights and the development of new technologies have long been intertwined. Today's issues are no exception.

    Chapter 2 of this report is devoted to scenarios and examples, some from today, but most anticipating a near tomorrow. Yogi Berra's much-quoted remark, "It's tough to make predictions, especially about the future," is germane. But it is equally true for this subject that policies based on out-of-date examples and scenarios are doomed to failure. Big-data technologies are advancing so rapidly that predictions about the future, however imperfect, must guide today's policy development.

    Chapter 3 examines the technology dimensions of the two great pillars of big data: collection and analysis. In a certain sense big data is exactly the confluence of these two: big collection meets big analysis (often termed "analytics"). The technical infrastructure of large-scale networking and computing that enables "big" is also discussed.

    Chapter 4 looks at technologies and strategies for the protection of privacy. Although technology may be part of the problem, it must also be part of the solution. Many current and foreseeable technologies can enhance privacy, and there are many additional promising avenues of research.

    Chapter 5, drawing on the previous chapters, contains PCAST's perspectives and conclusions. While it is not within this report's charge to recommend specific policies, it is clear that certain kinds of policies are technically more feasible and less likely to be rendered irrelevant or unworkable by new technologies than others. These approaches are highlighted, along with comments on the technical deficiencies of some other approaches. This chapter also contains PCAST's recommendations in areas that lie within our charge, that is, other than policy.

    1.2 Technology has long driven the meaning of privacy

    The conflict between privacy and new technology is not new, except perhaps now in its greater scope, degree of intimacy, and pervasiveness. For more than two centuries, values and expectations relating to privacy have been continually reinterpreted and rearticulated in light of the impact of new technologies.

    The nationwide postal system advocated by Benjamin Franklin and established in 1775 was a new technology designed to promote interstate commerce. But mail was routinely and opportunistically opened in transit until Congress made this action illegal in 1782. While the Constitution's Fourth Amendment codified the heightened privacy protection afforded to people in their homes or on their persons (previously principles of British common law), it took another century of technological challenges to expand the concept of privacy rights in
tomoreabstractspaces,includingtheelectronic.Theinventionofthetelegraphand,later,telephonecreatednewtensionsthatwereslowtoberesolved.Abilltoprotecttheprivacyoftelegrams,introducedinCongressin1880,wasneverpassed.7Itwasnottelecommunications,however,buttheinventionoftheportable,consumeroperablecamera(soonknownastheKodak)thatgaveimpetustoWarrenandBrandeiss1890articleTheRighttoPrivacy,8thenacontroversialtitle,butnowviewedasthefoundationaldocumentformodernprivacylaw.Inthearticle,WarrenandBrandeisgavevoicetotheconcernthat[i]nstantaneousphotographsandnewspaperenterprisehaveinvadedthesacredprecinctsofprivateanddomesticlife;andnumerousmechanicaldevicesthreatentomakegoodthepredictionthatwhatiswhisperedintheclosetshallbeproclaimedfromthehousetops,furthernotingthat[f]oryearstherehasbeenafeelingthatthelawmustaffordsomeremedyfortheunauthorizedcirculationofportraitsofprivatepersons9

    [7] Seipp, David J., The Right to Privacy in American History, Harvard University, Program on Information Resources Policy, Cambridge, MA, 1978.
    [8] Warren, Samuel D. and Louis D. Brandeis, "The Right to Privacy." Harvard Law Review 4:5, 193, December 15, 1890.
    [9] Id. at 195.


    Warren and Brandeis sought to articulate the right of privacy between individuals (whose foundation lies in civil tort law). Today, many states recognize a number of privacy-related harms as causes for civil or criminal legal action (further discussed in Section 1.4).[10]

    From Warren and Brandeis' "right to privacy," it took another 75 years for the Supreme Court to find, in Griswold v. Connecticut[11] (1965), a right to privacy in the "penumbras" and "emanations" of other constitutional protections (as Justice William O. Douglas put it, writing for the majority).[12] With a broad perspective, scholars today recognize a number of different legal meanings for "privacy." Five of these seem particularly relevant to this PCAST report:

    (1) The individual's right to keep secrets or seek seclusion (the famous "right to be left alone" of Brandeis' 1928 dissenting opinion in Olmstead v. United States).[13]

    (2) The right to anonymous expression, especially (but not only) in political speech (as in McIntyre v. Ohio Elections Commission[14]).

    (3) The ability to control access by others to personal information after it leaves one's exclusive possession (for example, as articulated in the FTC's Fair Information Practice Principles).[15]

    (4) The barring of some kinds of negative consequences from the use of an individual's personal information (for example, job discrimination on the basis of personal DNA, forbidden in 2008 by the Genetic Information Nondiscrimination Act[16]).

    (5) The right of the individual to make intimate decisions without government interference, as in the domains of health, reproduction, and sexuality (as in Griswold).

    These are asserted, not absolute, rights. All are supported, but also circumscribed, by both statute and case law. With the exception of number 5 on the list (a right of "decisional privacy" as distinct from "informational privacy"), all are applicable in varying degrees both to citizen-government interactions and to citizen-citizen interactions. Collisions between new technologies and privacy rights have occurred in all five. A patchwork of state and federal laws has addressed concerns in many sectors, but to date there has not been comprehensive legislation to handle these issues. Collisions between new technologies and privacy rights should be expected to continue to occur.

    [10] Digital Media Law Project, "Publishing Personal and Private Information." http://www.dmlp.org/legalguide/publishingpersonalandprivateinformation
    [11] Griswold v. Connecticut, 381 U.S. 479 (1965).
    [12] Id. at 483-84.
    [13] Olmstead v. United States, 277 U.S. 438 (1928).
    [14] McIntyre v. Ohio Elections Commission, 514 U.S. 334, 340-41 (1995). The decision reads in part, "Protections for anonymous speech are vital to democratic discourse. Allowing dissenters to shield their identities frees them to express critical minority views... Anonymity is a shield from the tyranny of the majority.... It thus exemplifies the purpose behind the Bill of Rights and of the First Amendment in particular: to protect unpopular individuals from retaliation... at the hand of an intolerant society."
    [15] Federal Trade Commission, "Privacy Online: Fair Information Practices in the Electronic Marketplace," May 2000.
    [16] Genetic Information Nondiscrimination Act of 2008, PL 110-233, May 21, 2008, 122 Stat 881.


    1.3 What is different today?

    New collisions between technologies and privacy have become evident, as new technological capabilities have emerged at a rapid pace. It is no longer clear that the five privacy concerns raised above, or their current legal interpretations, are sufficient in the court of public opinion.

    Much of the public's concern is with the harm done by the use of personal data, both in isolation and in combination. Controlling access to personal data after they leave one's exclusive possession has been seen historically as a means of controlling potential harm. But today, personal data may never be, or have been, within one's possession; for instance, they may be acquired passively from external sources such as public cameras and sensors, or without one's knowledge from public electronic disclosures by others using social media. In addition, personal data may be derived from powerful data analyses (see Section 3.2) whose use and output are unknown to the individual. Those analyses sometimes yield valid conclusions that the individual would not want disclosed. Worse yet, the analyses can produce false positives or false negatives: information that is a consequence of the analysis but is not true or correct. Furthermore, to a much greater extent than before, the same personal data have both beneficial and harmful uses, depending on the purposes for which and the contexts in which they are used. Information supplied by the individual might be used only to derive other information such as identity or a correlation, after which it is not needed. The derived data, which were never under the individual's control, might then be used either for good or ill.

    In the current discourse, some assert that the issues concerning privacy protection are collective as well as individual, particularly in the domain of civil rights: for example, identification of certain individuals at a gathering using facial recognition from videos, and the inference that other individuals at the same gathering, also identified from videos, have similar opinions or behaviors.

    Current circumstances also raise issues of how the right to privacy extends to the public square, or to quasi-private gatherings such as parties or classrooms. If the observers in these venues are not just people, but also both visible and invisible recording devices with enormous fidelity and easy paths to electronic promulgation and analysis, does that change the rules?

    Also rapidly changing are the distinctions between government and the private sector as potential threats to individual privacy. Government is not just a "giant corporation." It has a monopoly in the use of force; it has no direct competitors who seek market advantage over it and may thus motivate it to correct missteps. Governments have checks and balances, which can contribute to self-imposed limits on what they may do with people's information. Companies decide how they will use such information in the context of such factors as competitive advantages and risks, government regulation, and perceived threats and consequences of lawsuits. It is thus appropriate that there are different sets of constraints on the public and private sectors. But government has a set of authorities, particularly in the areas of law enforcement and national security, that place it in a uniquely powerful position, and therefore the restraints placed on its collection and use of data deserve special attention. Indeed, the need for such attention is heightened because of the increasingly blurry line between public and private data.

    While these differences are real, big data is to some extent a leveler of the differences between government and companies. Both governments and companies have potential access to the same sources of data and the same analytic tools. Current rules may allow government to purchase or otherwise obtain data from the private sector that, in some cases, it could not legally collect itself,[17] or to outsource to the private sector analyses it could not itself legally perform.[18] The possibility of government exercising, without proper safeguards, its own monopoly powers and also having unfettered access to the private information marketplace is unsettling.

    What kinds of actions should be forbidden both to government (Federal, state, and local, and including law enforcement) and to the private sector? What kinds should be forbidden to one but not the other? It is unclear whether current legal frameworks are sufficiently robust for today's challenges.

    1.4 Values, harms, and rights

    As was seen in Sections 1.2 and 1.3, new privacy rights usually do not come into being as academic abstractions. Rather, they arise when technology encroaches on widely shared values. Where there is consensus on values, there can also be consensus on what kinds of harms to individuals may be an affront to those values. Not all such harms may be preventable or remediable by government actions, but, conversely, it is unlikely that government actions will be welcome or effective if they are not grounded to some degree in values that are widely shared.

    In the realm of privacy, Warren and Brandeis in 1890[19] (see Section 1.2) began a dialogue about privacy that led to the evolution of the right in academia and the courts, later crystalized by William Prosser as four distinct harms that had come to earn legal protection.[20] A direct result is that, today, many states recognize as causes for legal action the four harms that Prosser enumerated,[21] and which have become (though varying from state to state[22]) privacy "rights." The harms are:

    Intrusion upon seclusion. A person who intentionally intrudes, physically or otherwise (now including electronically), upon the solitude or seclusion of another person or her private affairs or concerns, can be subject to liability for the invasion of her privacy, but only if the intrusion would be highly offensive to a reasonable person.

    Public disclosure of private facts. Similarly, a person can be sued for publishing private facts about another person, even if those facts are true. Private facts are those about someone's personal life that have not previously been made public, that are not of legitimate public concern, and that would be offensive to a reasonable person.

    [17] One Hundred Tenth Congress, "Privacy: The use of commercial information resellers by federal agencies," Hearing before the Subcommittee on Information Policy, Census, and National Archives of the Committee on Oversight and Government Reform, House of Representatives, March 11, 2008.
    [18] For example, Experian provides much of Healthcare.gov's identity verification component using consumer credit information not available to the government. See Consumer Reports, "Having trouble proving your identity to HealthCare.gov? Here's how the process works," December 18, 2013. http://www.consumerreports.org/cro/news/2013/12/howtoproveyouridentityonhealthcaregov/index.htm?loginMethod=auto
    [19] Warren, Samuel D. and Louis D. Brandeis, "The Right to Privacy." Harvard Law Review 4:5, 193, December 15, 1890.
    [20] Prosser, William L., "Privacy," California Law Review 48:383, 389, 1960.
    [21] Id.
    [22] (1) Digital Media Law Project, "Publishing Personal and Private Information." http://www.dmlp.org/legalguide/publishingpersonalandprivateinformation. (2) Id., "Elements of an Intrusion Claim." http://www.dmlp.org/legalguide/elementsintrusionclaim


    False light or publicity. Closely related to defamation, this harm results when false facts are widely published about an individual. In some states, false light includes untrue implications, not just untrue facts as such.

    Misappropriation of name or likeness. Individuals have a "right of publicity" to control the use of their name or likeness in commercial settings.

    It seems likely that most Americans today continue to share the values implicit in these harms, even if the legal language (by now refined in thousands of court decisions) strikes one as archaic and quaint. However, new technological insults to privacy, actual or prospective, and a century's evolution of social values (for example, today's greater recognition of the rights of minorities, and of rights associated with gender), may require a longer list than sufficed in 1960.

    Although PCAST's engagement with this subject is centered on technology, not law, any report on the subject of privacy, including PCAST's, should be grounded in the values of its day. As a starting point for discussion, albeit only a snapshot of the views of one set of technologically minded Americans, PCAST offers some possible augmentations to the established list of harms, each of which suggests a possible underlying right in the age of big data.

    PCAST also believes strongly that the positive benefits of technology are (or can be) greater than any new harms. Almost every new harm is related to or "adjacent to" beneficial uses of the same technology.[23] To emphasize this point, for each suggested new harm, we describe a related beneficial use.

    Invasion of private communications. Digital communications technologies make social networking possible across the boundaries of geography, and enable social and political participation on previously unimaginable scales. An individual's right to private communication, secured for written mail and wireline telephone in part by the isolation of their delivery infrastructure, may need reaffirmation in the digital era, however, where all kinds of "bits" share the same pipelines, and the barriers to interception are often much lower. (In this context, we discuss the use and limitations of encryption in Section 4.2.)

    Invasion of privacy in a person's virtual home. The Fourth Amendment gives special protection against government intrusion into the home, for example the protection of private records within the home; tort law offers protection against similar non-government intrusion. The new "virtual home" includes the Internet, cloud storage, and other services. Personal data in the cloud can be accessible and organized. Photographs and records in the cloud can be shared with family and friends, and can be passed down to future generations. The underlying social value, the "home as one's castle," should logically extend to one's "castle in the cloud," but this protection has not been preserved in the new virtual home. (We discuss this subject further in Section 2.3.)

    Public disclosure of inferred private facts. Powerful data analytics may infer personal facts from seemingly harmless input data. Sometimes the inferences are beneficial. At its best, targeted advertising directs consumers to products that they actually want or need. Inferences about people's health can lead to better and timelier treatments and longer lives. But before the advent of big data, it could be assumed that there was a clear distinction between public and private information: either a fact was "out there" (and could be pointed to), or it was not. Today, analytics may discover facts that are no less private than yesterday's purely private sphere of life. Examples include inferring sexual preference from purchasing patterns, or early Alzheimer's disease from key-click streams. In the latter case, the private fact may not even be known to the individual in question. (Section 3.2 discusses the technology behind the data analytics that makes such inferences possible.) The public disclosure of such information (and possibly also some non-public commercial uses) seems offensive to widely shared values.

    [23] One perspective informed by new technologies and technology-mediated communication suggests that privacy is about the continual management of boundaries between different spheres of action and degrees of disclosure within those spheres, with privacy and one's public face being balanced in different ways at different times. See: Leysia Palen and Paul Dourish, "Unpacking Privacy for a Networked World," Proceedings of CHI 2003, Association for Computing Machinery, April 5-10, 2003.

    Tracking, stalking, and violations of locational privacy. Today's technologies easily determine an individual's current or prior location. Useful location-based services include navigation, suggesting better commuter routes, finding nearby friends, avoiding natural hazards, and advertising the availability of nearby goods and services. Sighting an individual in a public place can hardly be a private fact. When big data allows such sightings, or other kinds of passive or active data collection, to be assembled into the continuous locational track of an individual's private life, however, many Americans (including Supreme Court Justice Sotomayor, for example[24]) perceive a potential affront to a widely accepted "reasonable expectation of privacy."

    Harm arising from false conclusions about individuals, based on personal profiles from big-data analytics. The power of big data, and therefore its benefit, is often correlational. In many cases the "harms" from statistical errors are small, for example the incorrect inference of a movie preference; or the suggestion that a health issue be discussed with a physician, following from analyses that may, on average, be beneficial, even when a particular instance turns out to be a false alarm. Even when predictions are statistically valid, moreover, they may be untrue about particular individuals, and mistaken conclusions may cause harm. Society may not be willing to excuse harms caused by the uncertainties inherent in statistically valid algorithms. These harms may unfairly burden particular classes of individuals, for example, racial minorities or the elderly.

    Foreclosure of individual autonomy or self-determination. Data analyses about large populations can discover special cases that apply to individuals within that population. For example, by identifying differences in "learning styles," big data may make it possible to personalize education in ways that recognize every individual's potential and optimize that individual's achievement. But the projection of population factors onto individuals can be misused. It is widely accepted that individuals should be able to make their own choices and pursue opportunities that are not necessarily typical, and that no one should be denied the chance to achieve more than some statistical expectation of themselves. It would offend our values if a child's choices in video games were later used for educational tracking (for example, college admissions). Similarly offensive would be a future, akin to Philip K. Dick's science-fiction short story adapted by Steven Spielberg in the film Minority Report, where "pre-crime" is statistically identified and punished.[25]

    Loss of anonymity and private association. Anonymity is not acceptable as an enabler of committing fraud, or bullying, or cyber-stalking, or improper interactions with children. Apart from wrongful behavior, however, the individual's right to choose to be anonymous is a long-held American value (as, for example, the anonymous authorship of the Federalist papers). Using data to (re-)identify an individual who wishes to be anonymous (except in the case of legitimate governmental functions, such as law enforcement) is regarded as a harm. Similarly, individuals have a right of private association with groups or other individuals, and the identification of such associations may be a harm.

    [24] "I would ask whether people reasonably expect that their movements will be recorded and aggregated in a manner that enables the Government to ascertain, more or less at will, their political and religious beliefs, sexual habits, and so on." United States v. Jones (10-1259), Sotomayor concurrence at http://www.supremecourt.gov/opinions/11pdf/101259.pdf.
    [25] Dick, Philip K., "The Minority Report," first published in Fantastic Universe (1956) and reprinted in Selected Stories of Philip K. Dick, New York: Pantheon, 2002.


    While in no sense is the above list intended to be complete, it does have a few intentional omissions. For example, individuals may want big data to be used "fairly," in the sense of treating people equally, but (apart from the small number of protected classes already defined by law) it seems impossible to turn this into a right that is specific enough to be meaningful. Likewise, individuals may want the ability to know what others know about them; but that is surely not a right from the pre-digital age; and, in the current era of statistical analysis, it is not so easy to define what "know" means. This important issue is discussed in Section 3.1.2, and again taken up in Chapter 5, where the attempt is to focus on actual harms done by the use of information, not by a concept as technically ambiguous as whether information is known.
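
The observation in this section that statistically valid predictions may still be untrue about particular individuals can be illustrated with a short base-rate calculation. The sketch below is not taken from the report: the function name and all three input figures are hypothetical, chosen only to show how a seemingly accurate classifier can flag mostly people who do not have the inferred trait when that trait is rare.

```python
def positive_predictive_value(prevalence, sensitivity, specificity):
    """Return P(trait is present | individual was flagged), via Bayes' rule."""
    true_positives = prevalence * sensitivity
    false_positives = (1.0 - prevalence) * (1.0 - specificity)
    return true_positives / (true_positives + false_positives)


# Hypothetical inputs (assumptions for illustration, not data from the report):
# 1% of the population has the trait, and the analytic is right 95% of the
# time both for people with the trait and for people without it.
ppv = positive_predictive_value(prevalence=0.01, sensitivity=0.95, specificity=0.95)
print(f"Share of flagged individuals who truly have the trait: {ppv:.1%}")
```

Under these assumed figures, roughly five out of six flagged individuals would be false positives, which is one way to see why aggregate accuracy alone does not settle the question of individual harm.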


    2. Examples and Scenarios

    This chapter seeks to make Chapter 1's introductory discussion more concrete by sketching some examples and scenarios. While some of these applications of technology are in use today, others comprise PCAST's technological prognostications about the near future, up to perhaps 10 years from today. Taken together the examples and scenarios are intended to illustrate both the enormous benefits that big data can provide and also the privacy challenges that may accompany these benefits.

    In the following three sections, it will be useful to develop some scenarios more completely than others, moving from very brief examples of things happening today to more fully developed scenarios set in the future.

    2.1 Things happening today or very soon

    Here are some relevant examples:

    Pioneered more than a decade ago, devices mounted on utility poles are able to sense the radio stations being listened to by passing drivers, with the results sold to advertisers.[26]

    In 2011, automatic license plate readers were in use by three-quarters of local police departments surveyed. Within 5 years, 25% of departments expect to have them installed on all patrol cars, alerting police when a vehicle associated with an outstanding warrant is in view.[27] Meanwhile, civilian uses of license plate readers are emerging, leveraging cloud platforms and promising multiple ways of using the information collected.[28]

    Experts at the Massachusetts Institute of Technology and the Cambridge Police Department have used a machine-learning algorithm to identify which burglaries likely were committed by the same offender, thus aiding police investigators.[29]

    Differential pricing (offering different prices to different customers for essentially the same goods) has become familiar in domains such as airline tickets and college costs. Big data may increase the power and prevalence of this practice and may also decrease even further its transparency.[30]

    [26] ElBoghdady, Dina, "Advertisers Tune In to New Radio Gauge," The Washington Post, October 25, 2004. http://www.washingtonpost.com/wpdyn/articles/A600132004Oct24.html
    [27] American Civil Liberties Union, "You Are Being Tracked: How License Plate Readers Are Being Used To Record Americans' Movements," July, 2013. https://www.aclu.org/files/assets/071613aclualprreportoptv05.pdf
    [28] Hardy, Quentin, "How Urban Anonymity Disappears When All Data Is Tracked," The New York Times, April 19, 2014.
    [29] Rudin, Cynthia, "Predictive policing: Using Machine Learning to Detect Patterns of Crime," Wired, August 22, 2013. http://www.wired.com/insights/2013/08/predictivepolicingusingmachinelearningtodetectpatternsofcrime/
    [30] (1) Shiller, Benjamin, "First Degree Price Discrimination Using Big Data," Jan. 30, 2014, Brandeis University. http://benjaminshiller.com/images/First_Degree_PD_Using_Big_Data_Jan_27,_2014.pdf and http://www.forbes.com/sites/modeledbehavior/2013/09/01/willbigdatabringmorepricediscrimination/ (2) Fisher, William W., "When Should We Permit Differential Pricing of Information?" UCLA Law Review 55:1, 2007.


    The UK firm FeatureSpace offers machine-learning algorithms to the gaming industry that may detect early signs of gambling addiction or other aberrant behavior among online players.[31]

    Retailers like CVS and AutoZone analyze their customers' shopping patterns to improve the layout of their stores and stock the products their customers want in a particular location.[32] By tracking cell phones, RetailNext offers bricks-and-mortar retailers the chance to recognize returning customers, just as cookies allow them to be recognized by on-line merchants.[33] Similar WiFi tracking technology could detect how many people are in a closed room (and in some cases their identities).

    The retailer Target inferred that a teenage customer was pregnant and, by mailing her coupons intended to be useful, unintentionally disclosed this fact to her father.[34]

    The author of an anonymous book, magazine article, or web posting is frequently "outed" by informal crowdsourcing, fueled by the natural curiosity of many unrelated individuals.[35]

    Social media and public sources of records make it easy for anyone to infer the network of friends and associates of most people who are active on the web, and many who are not.[36]

    Marist College in Poughkeepsie, New York, uses predictive modeling to identify college students who are at risk of dropping out, allowing it to target additional support to those in need.[37]

    The Durkheim Project, funded by the U.S. Department of Defense, analyzes social-media behavior to detect early signs of suicidal thoughts among veterans.[38]

    LendUp, a California-based startup, sought to use nontraditional data sources such as social media to provide credit to underserved individuals. Because of the challenges in ensuring accuracy and fairness, however, they have been unable to proceed.[39,40]

    [31] Burn-Murdoch, John, "UK technology firm uses machine learning to combat gambling addiction," The Guardian, August 1, 2013. http://www.theguardian.com/news/datablog/2013/aug/01/ukfirmusesmachinelearningfightgamblingaddiction
    [32] Clifford, Stephanie, "Using Data to Stage-Manage Paths to the Prescription Counter," The New York Times, June 19, 2013. http://bits.blogs.nytimes.com/2013/06/19/usingdatatostagemanagepathstotheprescriptioncounter/
    [33] Clifford, Stephanie, "Attention, Shoppers: Store Is Tracking Your Cell," The New York Times, July 14, 2013.
    [34] Duhigg, Charles, "How Companies Learn Your Secrets," The New York Times Magazine, February 12, 2012. http://www.nytimes.com/2012/02/19/magazine/shoppinghabits.html?pagewanted=all&_r=0
    [35] Volokh, Eugene, "Outing Anonymous Bloggers," June 8, 2009. http://www.volokh.com/2009/06/08/outinganonymousbloggers/; A. Narayanan et al., "On the Feasibility of Internet-Scale Author Identification," IEEE Symposium on Security and Privacy, May 2012. http://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=6234420
    [36] Facebook's "The Graph API" (at https://developers.facebook.com/docs/graphapi/) describes how to write computer programs that can access the Facebook friends' data.
    [37] One of four big data applications honored by the trade journal, Computerworld, in 2013. King, Julia, "UN tackles socio-economic crises with big data," Computerworld, June 3, 2013. http://www.computerworld.com/s/article/print/9239643/UN_tackles_socio_economic_crises_with_big_data
    [38] Ungerleider, Neal, "This May Be The Most Vital Use Of Big Data We've Ever Seen," Fast Company, July 12, 2013. http://www.fastcolabs.com/3014191/thismaybethemostvitaluseofbigdataweveeverseen.
    [39] Center for Data Innovations, 100 Data Innovations, Information Technology and Innovation Foundation, Washington, DC, January 2014. http://www2.datainnovation.org/2014100datainnovations.pdf
    [40] Waters, Richard, "Data open doors to financial innovation," Financial Times, December 13, 2013. http://www.ft.com/intl/cms/s/2/3c59d58a43fb11e2844c00144feabdc0.html


    Insight into the spread of hospital-acquired infections has been gained through the use of large amounts of patient data together with personal information about uninfected patients and clinical staff.[41]

    Individuals' heart rates can be inferred from the subtle changes in their facial coloration that occur with each beat, enabling inferences about their health and emotional state.[42]
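
Several of the examples above (license plate readers, cell-phone tracking in stores) involve isolated sightings that become far more revealing once they are combined. The following is a minimal sketch of that aggregation step, not a description of any system named in this report: the identifiers, timestamps, locations, and the function `build_tracks` are all hypothetical.

```python
from collections import defaultdict
from datetime import datetime

# Hypothetical sightings, as they might arrive from unrelated readers:
# (identifier, ISO timestamp, location label). All values are invented.
sightings = [
    ("PLATE-A", "2014-05-01T17:45:00", "grocery store"),
    ("PLATE-B", "2014-05-01T09:00:00", "downtown garage"),
    ("PLATE-A", "2014-05-01T08:10:00", "clinic parking lot"),
    ("PLATE-A", "2014-05-01T12:30:00", "place of worship"),
]


def build_tracks(records):
    """Group sightings by identifier and order each group by time."""
    tracks = defaultdict(list)
    for identifier, stamp, place in records:
        tracks[identifier].append((datetime.fromisoformat(stamp), place))
    # Sorting the (time, place) pairs turns scattered points into a track.
    return {identifier: sorted(visits) for identifier, visits in tracks.items()}


tracks = build_tracks(sightings)
for when, place in tracks["PLATE-A"]:
    print(when.strftime("%H:%M"), place)
```

No single record here is sensitive on its own; the ordered list for one identifier is what begins to resemble the "continuous locational track" discussed in Section 1.4.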

    2.2 Scenarios of the near future in healthcare and education

    Here are a few examples of the kinds of scenarios that can readily be constructed.

    2.2.1 Healthcare: personalized medicine

    Not all patients who have a particular disease are alike, nor do they respond identically to treatment. Researchers will soon be able to draw on millions of health records (including analog data such as scans in addition to digital data), vast amounts of genomic information, extensive data on successful and unsuccessful clinical trials, hospital records, and so forth. In some cases they will be able to discern that among the diverse manifestations of the disease, a subset of the patients have a collection of traits that together form a variant that responds to a particular treatment regime.

    Since the result of the analysis could lead to better outcomes for particular patients, it is desirable to identify those individuals in the cohort, contact them, treat their disease in a novel way, and use their experiences in advancing the research. Their data may have been gathered only anonymously, however, or it may have been de-identified.

    Solutions may be provided by specific new technologies for the protection of database privacy. These may create a protected query mechanism so individuals can find out whether they are in the cohort, or provide an alert mechanism based on the cohort characteristics so that, when a medical professional sees a patient in the cohort, a notice is generated.

    2.2.2 Healthcare: detection of symptoms by mobile devices

    Many baby boomers wonder how they might detect Alzheimer's disease in themselves. What would be better to observe their behavior than the mobile device that connects them to a personal assistant in the cloud (e.g., Siri or OK Google), helps them navigate, reminds them what words mean, remembers to do things, recalls conversations, measures gait, and otherwise is in a position to detect gradual declines on traditional and novel medical indicators that might be imperceptible even to their spouses?

    At the same time, any leak of such information would be a damaging betrayal of trust. What are individuals' protections against such risks? Can the inferred information about individuals' health be sold, without additional consent, to third parties (e.g., pharmaceutical companies)? What if this is a stated condition of use of the app? Should information go to individuals' personal physicians with their initial consent but not a subsequent confirmation?

    [41] (1) Wiens, Jenna, John Guttag, and Eric Horvitz, "A Study in Transfer Learning: Leveraging Data from Multiple Hospitals to Enhance Hospital-Specific Predictions," Journal of the American Medical Informatics Association, January 2014. (2) Weitzner, Daniel J., et al., "Consumer Privacy Bill of Rights and Big Data: Response to White House Office of Science and Technology Policy Request for Information," April 4, 2014.
    [42] Frazer, Bryant, "MIT Computer Program Reveals Invisible Motion in Video," The New York Times video, February 27, 2013. https://www.youtube.com/watch?v=3rWycBEHn3s

    2.2.3 Education

    Drawing on millions of logs of online courses, including both massive open online courses (MOOCs) and smaller classes, it will soon be possible to create and maintain longitudinal data about the abilities and learning styles of millions of students. This will include not just broad aggregate information like grades, but fine-grained profiles of how individual students respond to multiple new kinds of teaching techniques, how much help they need to master concepts at various levels of abstraction, what their attention span is in various contexts, and so forth. A MOOC platform can record how long a student watches a particular video; how often a segment is repeated, sped up, or skipped; how well a student does on a quiz; how many times he or she misses a particular problem; and how the student balances watching content to reading a text. As the ability to present different material to different students materializes in the platforms, the possibility of blind, randomized A/B testing enables the gold standard of experimental science to be implemented at large scale in these environments.[43]

    Similar data are also becoming available for residential classes, as learning management systems (such as Canvas, Blackboard, or Desire2Learn) expand their roles to support innovative pedagogy. In many courses one can now get moment-by-moment tracking of the student's engagement with the course materials and correlate that engagement with the desired learning outcomes.

    With this information, it will be possible not only to greatly improve education, but also to discover what skills, taught to which individuals at which points in childhood, lead to better adult performance in certain tasks, or to adult personal and economic success. While these data could revolutionize educational research, the privacy issues are complex.[44]

    There are many privacy challenges in this vision of the future of education. Knowledge of early performance can create implicit biases[45] that color later instruction and counseling. There is great potential for misuse, ostensibly for the social good, in the massive ability to direct students into high- or low-potential tracks. Parents and others have access to sensitive information about children, but mechanisms rarely exist to change those permissions when the child reaches majority.

    2.3 Challenges to the home's special status

    The home has special significance as a sanctuary of individual privacy. The Fourth Amendment's list, "persons, houses, papers, and effects," puts only the physical body in the rhetorically more prominent position; and a house is often the physical container for the other three, a boundary inside of which enhanced privacy rights apply.

    [43] For an overview of MOOCs and associated analytics opportunities, see PCAST's December 2013 letter to the President. http://www.whitehouse.gov/sites/default/files/microsites/ostp/PCAST/pcast_edit_dec2013.pdf
    [44] There is also uncertainty about how to interpret applicable laws, such as the Family Educational Rights and Privacy Act (FERPA). Recent Federal guidance is intended to help clarify the situation. See: U.S. Department of Education, "Protecting Student Privacy While Using Online Educational Services: Requirements and Best Practices," February 2014. http://ptac.ed.gov/sites/default/files/Student%20Privacy%20and%20Online%20Educational%20Services%20%28February%202014%29.pdf
    [45] Cukier, Kenneth, and Viktor Mayer-Schoenberger, "How Big Data Will Haunt You Forever," Quartz, March 11, 2014. http://qz.com/185252/howbigdatawillhauntyouforeveryourhighschooltranscript/


    Existing interpretations of the Fourth Amendment are inadequate for the present world, however. We, along with the "papers and effects" contemplated by the Fourth Amendment, live increasingly in cyberspace, where the physical boundary of the home has little relevance. In 1980, a family's financial records were paper documents, located perhaps in a desk drawer inside the house. By 2000, they were migrating to the hard drive of the home computer, but still within the house. By 2020, it is likely that most such records will be in the cloud, not just outside the house, but likely replicated in multiple legal jurisdictions, because cloud storage typically uses location diversity to achieve reliability. The picture is the same if one substitutes for financial records something like "political books we purchase," or "love letters that we receive," or "erotic videos that we watch." Absent different policy, legislative, and judicial approaches, the physical sanctity of the home's papers and effects is rapidly becoming an empty legal vessel.

    The home is also the central locus of Brandeis' "right to be left alone." This right is also increasingly fragile, however. Increasingly, people bring sensors into their homes whose immediate purpose is to provide convenience, safety, and security. Smoke and carbon monoxide alarms are common, and often required by safety codes.[46] Radon detectors are usual in some parts of the country. Integrated air monitors that can detect and identify many different kinds of pollutants and allergens are readily foreseeable. Refrigerators may soon be able to "sniff" for gases released from spoiled food, or, as another possible path, may be able to "read" food expiration dates from radio-frequency identification (RFID) tags in the food's packaging. Rather than today's annoying cacophony of beeps, tomorrow's sensors (as some already do today) will interface to a family through integrated apps on mobile devices or display screens. The data will have been processed and interpreted. Most likely that processing will occur in the cloud. So, to deliver services the consumer wants, much data will need to have left the home.

    Environmental sensors that enable new food and air safety may also be able to detect and characterize tobacco or marijuana smoke. Health care or health insurance providers may want assurance that self-declared non-smokers are telling the truth. Might they, as a condition of lower premiums, require the homeowner's consent for tapping into the environmental monitors' data? If the monitor detects heroin smoking, is an insurance company obligated to report this to the police? Can the insurer cancel the homeowner's property insurance?

    To some, it seems farfetched that the typical home will foreseeably acquire cameras and microphones in every room, but that appears to be a likely trend. What can your cell phone (already equipped with front and back cameras) hear or see when it is on the nightstand next to your bed? Tablets, laptops, and many desktop computers have cameras and microphones. Motion-detector technology for home-intrusion alarms will likely move from ultrasound and infrared to imaging cameras, with the benefit of fewer false alarms and the ability to distinguish pets from people. Facial-recognition technology will allow further security and convenience. For the safety of the elderly, cameras and microphones will be able to detect falls or collapses, or calls for help, and be networked to summon aid.

    People naturally communicate by voice and gesture. It is inevitable that people will communicate with their electronic servants in both such modes (necessitating that they have access to cameras and microphones).

    [46] Nest, acquired by Google, attracted attention early for its design and its use of big data to adapt to consumer behavior. See: Aoki, Kenji, "Nest Gives the Lowly Smoke Detector a Brain," Wired, October, 2013. http://www.wired.com/2013/10/nestsmokedetector/all/


    Companies such as PrimeSense, an Israeli firm recently bought by Apple,[47] are developing sophisticated computer-vision software for gesture reading, already a key feature in the consumer computer-game console market (e.g., Microsoft Kinect). Consumer televisions are already among the first "appliances" to respond to gesture; already, devices such as the Nest smoke detector respond to gestures.[48] The consumer who taps his temple to signal a spoken command to Google Glass[49] may want to use the same gesture for the television, or for that matter for the thermostat or light switch, in any room at home. This implies omnipresent audio and video collection within the home.

    All of these audio, video, and sensor data will be generated within the supposed sanctuary of the home. But they are no more likely to stay in the home than the "papers and effects" already discussed. Electronic devices in the home already invisibly communicate to the outside world via multiple separate infrastructures: The cable industry's hardwired connection to the home provides multiple types of two-way communication, including broadband Internet. Wireline phone is still used by some home-intrusion alarms and satellite TV receivers, and as the physical layer for DSL broadband subscribers. Some home devices use the cell-phone wireless infrastructure. Many others piggyback on the home Wi-Fi network that is increasingly a necessity of modern life.

    Today's smart home-entertainment system knows what a person records on a DVR, what she actually watches, and when she watches it. Like personal financial records in 2000, this information today is in part localized inside the home, on the hard drive inside the DVR. As with financial information today, however, it is on track to move into the cloud. Today, Netflix or Amazon can offer entertainment suggestions based on customers' past key-click streams and viewing history on their platforms. Tomorrow, even better suggestions may be enabled by interpreting their minute-by-minute facial expressions as seen by the gesture-reading camera in the television.

    These collections of data are benign, in the sense that they are necessary for products and services that consumers will knowingly demand. Their challenges to privacy arise both from the fact that their analog sensors necessarily collect more information than is minimally necessary for their function (see Section 3.1.2), and also because their data practically cry out for secondary uses ranging from innovative new products to marketing bonanzas to criminal exploits. As in many other kinds of big data, there is ambiguity as to data ownership, data rights, and allowed data use.

    Computer-vision software is likely already able to read the brand labels on products in its field of view; this is a much easier technology than facial recognition. If the camera in your television knows what brand of beer you are drinking while watching a football game, and knows whether you opened the bottle before or after the beer ad, who (if anyone) is allowed to sell this information to the beer company, or to its competitors? Is the camera allowed to read brand names when the television set is supposedly off? Can it watch for magazines or political leaflets? If the RFID tag sensor in your refrigerator usefully detects out-of-date food, can it also report your brand choices to vendors? Is this creepy and strange, or a consumer financial benefit when every supermarket can offer you relevant coupons?[50] Or (the dilemma of

    [47] Reuters, "Apple acquires Israeli 3D chip developer PrimeSense," November 25, 2013. http://www.reuters.com/article/2013/11/25/usprimesenseofferappleidUSBRE9AO04C20131125
    [48] Id.
    [49] Google, "Glass gestures." https://support.google.com/glass/answer/3064184?hl=en
    [50] Tene, Omer, and Jules Polonetsky, "A Theory of Creepy: Technology, Privacy and Shifting Social Norms," Yale Journal of Law and Technology 16:59, 2013, pp. 59-100.


    differential pricing[51]) is it any different if the data are used to offer others a better deal while you pay full price because your brand loyalty is known to be strong?

    About one-third of Americans rent, rather than own, their residences. This number may increase with time as a result of long-term effects of the 2007 financial crisis, as well as aging of the U.S. population. Today and foreseeably, renters are less affluent, on average, than homeowners. The law demarcates a fine line between the property rights of landlords and the privacy rights of tenants. Landlords have the right to enter their property under various conditions, generally including where the tenant has violated health or safety codes, or to make repairs. As more data are collected within the home, the rights of tenant and landlord may need new adjustment. If environmental monitors are fixtures of the landlord's property, does she have an unconditional right to their data? Can she sell those data? If the lease so provides, can she evict the tenant if the monitor repeatedly detects cigarette smoke, or a camera sensor is able to distinguish a prohibited pet?

    If a third party offers facial-recognition services for landlords (no doubt with all kinds of cryptographic safeguards!), can the landlord use these data to enforce lease provisions against subletting or additional residents? Can she require such monitoring as a condition of the lease? What if the landlord's cameras are outside the doors, but keep track of everyone who enters or leaves her property? How is this different from the case of a security camera across the street that is owned by the local police?

    2.4 Tradeoffs among privacy, security, and convenience

    Notions of privacy change generationally. One sees today marked differences between the younger generation of "digital natives" and their parents or grandparents. In turn, the children of today's digital natives will likely have still different attitudes about the flow of their personal information. Raised in a world with digital assistants who know everything about them, and (one may hope) with wise policies in force to govern use of the data, future generations may see little threat in scenarios that individuals today would find threatening, if not Orwellian. PCAST's final scenario, perhaps at the outer limit of its ability to prognosticate, is constructed to illustrate this point.

    Taylor Rodriguez prepares for a short business trip. She packed a bag the night before and put it outside the front door of her home for pickup. No worries that it will be stolen: The camera on the streetlight was watching it; and, in any case, almost every item in it has a tiny RFID tag. Any would-be thief would be tracked and arrested within minutes. Nor is there any need to give explicit instructions to the delivery company, because the cloud knows Taylor's itinerary and plans; the bag is picked up overnight and will be in Taylor's destination hotel room by the time of her arrival.

    Taylor finishes breakfast and steps out the front door. Knowing the schedule, the cloud has provided a self-driving car, waiting at the curb. At the airport, Taylor walks directly to the gate; no need to go through any security. Nor are there any formalities at the gate: A twenty-minute "open door" interval is provided for passengers to stroll onto the plane and take their seats (which each sees individually highlighted in his or her wearable optical device). There are no boarding passes and no organized lines. Why bother, when Taylor's identity (as for everyone else who enters the airport) has been tracked and is known absolutely? When her known information emanations (phone, RFID tags in clothes, facial recognition, gait, emotional state) are known to the cloud, vetted, and essentially unforgeable? When, in the unlikely event that Taylor has become deranged and dangerous, many detectable signs would already have been tracked, detected, and acted on?

    [51] See references at footnote 30.


    Indeed, everything that Taylor carries has been screened far more effectively than any rushed airport search today. Friendly cameras in every LED lighting fixture in Taylor's house have watched her dress and pack, as they do every day. Normally these data would be used only by Taylor's personal digital assistants, perhaps to offer reminders or fashion advice. As a condition of using the airport transit system, however, Taylor has authorized the use of the data for ensuring airport security and public safety.

    Taylor's world seems creepy to us. Taylor has accepted a different balance among the public goods of convenience, privacy, and security than would most people today. Taylor acts in the unconscious belief (whether justified or not, depending on the nature and effectiveness of policies in force) that the cloud and its robotic servants are trustworthy in matters of personal privacy. In such a world, major improvements in the convenience and security of everyday life become possible.


    3. Collection, Analytics, and Supporting Infrastructure

    Big data is big in two different senses. It is big in the quantity and variety of data that are available to be processed. And, it is big in the scale of analysis (analytics) that can be applied to those data, ultimately to make inferences. Both kinds of "big" depend on the existence of a massive and widely available computational infrastructure, one that is increasingly being provided by cloud services. This chapter expands on these basic concepts.

    3.1 Electronic sources of personal data

    Since early in the computer age, public and private entities have been assembling digital information about people. Databases of personal information were created during the days of "batch processing."[52] Indeed, early descriptions of database technology often talk about personnel records used for payroll applications. As computing power increased, more and more business applications moved to digital form. There now are digital telephone-call records, credit-card transaction records, bank-account records, email repositories, and so on. As interactive computing has advanced, individuals have entered more and more data about themselves, both for self-identification to an online service and for productivity tools such as financial-management systems.

    These digital data are normally accompanied by "metadata" or ancillary data that explain the layout and meaning of the data they describe. Databases have schemas and email has headers,[53] as do network packets.[54] As data sets become more complex, so do the attached metadata. Included in the data or metadata may be identifying information such as account numbers, login names, and passwords. There is no reason to believe that metadata raise fewer privacy concerns than the data they describe.

    In recent times, the kinds of electronic data available about people have increased substantially, in part because of the emergence of social media and in part because of the growth in mobile devices, surveillance devices, and a diversity of networked sensors. Today, although they may not be aware of it, individuals constantly emit into the environment information whose use or misuse may be a source of privacy concerns. Physically, these information emanations are of two types, which can be called "born digital" or "born analog."

    3.1.1 Born digital data

    When information is "born digital," it is created, by us or by a computer surrogate, specifically for digital use, that is, for use by a computer or data-processing system. Examples of data that are born digital include:

    - email and text messaging
    - input via mouse clicks, taps, swipes, or keystrokes on a phone, tablet, computer, or video game; that is, data that people intentionally enter into a device

    [52] Such databases endure and form the basis of continuing concern among privacy advocates.
    [53] Schemas are formal definitions of the configuration of a database: its tables, relations, and indices. Headers are the sometimes-invisible prefaces to email messages that contain information about the sending and destination addresses and sometimes the routing of the path between them.
    [54] In the Internet and similar networks, information is broken up into chunks called packets, which may travel independently and depend on metadata to be reassembled properly at the destination of the transmission.


    - GPS location data
    - metadata associated with phone calls: the numbers dialed from or to, the time and duration of calls
    - data associated with most commercial transactions: credit-card swipes, bar-code reads, reads of RFID tags (as used for anti-theft and inventory control)
    - data associated with portal access (key card or ID badge reads) and toll-road access (remote reads of RFID tags)
    - metadata that our mobile devices use to stay connected to the network, including device location and status
    - increasingly, data from cars, televisions, appliances: the "Internet of Things"

    Consumer tracking data provide an example of born-digital data that have become economically important. It is generally possible for companies to aggregate large amounts of data and then use those data for marketing, advertising, or many other activities. The traditional mechanism has been to use cookies, small data files that a browser can leave on a user's computer (pioneered by Netscape two decades ago). The technique is to leave a cookie when a user first visits a site and then be able to correlate that visit with a subsequent event. This information is very valuable to retailers and forms the basis of many of the advertising businesses of the last decade. There has been a variety of proposals to regulate such tracking,[55] and many countries require opt-in permission before this tracking is done. Cookies involve relatively simple pieces of information that proponents represent as unlikely to be abused. Although not always aware of the process, people accept such tracking in return for a free or subsidized service.[56] At the same time, cookie-free alternatives are sometimes available.[57] Even without cookies, so-called "fingerprinting" techniques can often identify a user's computer or mobile device uniquely by the information that it exposes publicly, such as the size of its screen, its installed fonts, and other features.[58] Most technologists believe that applications will move away from cookies, that cookies are too simple an idea, and that there are better analytics coming and better approaches being invented. The economic incentives for consumer tracking will remain, however, and big data will allow for more precise responses.

    Tracking is also the enabling technology of some more nefarious uses. Unfortunately, many social networking apps begin by taking a person's contact list and spamming all the recipients with advertising for the app. This technique is often abused, especially by small start-ups who may assess the value gained by reaching new customers as being greater than the value lost to their reputation for honoring privacy.
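
The fingerprinting idea described above can be illustrated with a minimal sketch. It assumes, purely for illustration, that a site has already gathered a few publicly exposed attributes of a device; the attribute names, the values, and the `fingerprint` helper are hypothetical and are not taken from any real tracking product. The sketch hashes those attributes into a single identifier, which is enough to show why no cookie is needed to recognize a returning device.

```python
import hashlib
import json


def fingerprint(attributes: dict) -> str:
    """Hash a device's exposed attributes into one stable identifier.

    Hypothetical helper for illustration; not any real tracker's method.
    """
    # Serialize with sorted keys so the same attributes always hash alike,
    # regardless of the order in which they were collected.
    canonical = json.dumps(attributes, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


# Two visits by the same (hypothetical) device, with no cookie stored.
visit_monday = {"screen": "1920x1080", "fonts": ["Arial", "Georgia"], "timezone": "UTC-5"}
visit_friday = {"timezone": "UTC-5", "screen": "1920x1080", "fonts": ["Arial", "Georgia"]}

# A different device that happens to share the same screen size.
other_device = {"screen": "1920x1080", "fonts": ["Arial"], "timezone": "UTC+1"}

same_device = fingerprint(visit_monday) == fingerprint(visit_friday)
different_device = fingerprint(visit_monday) != fingerprint(other_device)
print(same_device, different_device)
```

Real fingerprinting would likely combine many more signals and would need to tolerate attributes that drift over time (a newly installed font, for instance); an exact hash such as this one treats any change as a new device.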

    [55] Federal Trade Commission, "FTC Staff Revises Online Behavioral Advertising Principles," Press Release, February 12, 2009. http://www.ftc.gov/newsevents/pressreleases/2009/02/ftcstaffrevisesonlinebehavioraladvertisingprinciples
    [56] (1) Cf. The Wall Street Journal's "What they know" series (http://online.wsj.com/public/page/whattheyknowdigitalprivacy.html). (2) Turow, Joseph, The Daily You: How the Advertising Industry is Defining your Identity and Your Worth, Yale University Press, 2012. http://yalepress.yale.edu/book.asp?isbn=9780300165012
    [57] DuckDuckGo is a non-tracking search engine that, while perhaps yielding fewer results than leading search engines, is used by those looking for less tracking. See: https://duckduckgo.com/
    [58] (1) Tanner, Adam, "The Web Cookie Is Dying. Here's The Creepier Technology That Comes Next," Forbes, June 17, 2013. http://www.forbes.com/sites/adamtanner/2013/06/17/thewebcookieisdyingheresthecreepiertechnologythatcomesnext/ (2) Acar, G. et al., "FPDetective: Dusting the Web for Fingerprinters," 2013. http://www.cosic.esat.kuleuven.be/publications/article2334.pdf


    All information that is born digital shares certain characteristics. It is created in identifiable units for particular purposes. These units are in most cases "data packets" of one or another standard type. Since they are created by intent, the information that they contain is usually limited, for reasons of efficiency and good engineering design, to support the immediate purpose for which they are collected.

    When data are born digital, privacy concerns can arise in two different modes, one obvious ("over-collection"), the other more recent and subtle ("data fusion"). Over-collection occurs when an engineering design intentionally, and sometimes clandestinely, collects information unrelated to its stated purpose. While your smartphone could easily photograph and transmit to a third party your facial expression as you type every keystroke of a text message, or could capture all keystrokes, thereby recording text that you had deleted, these would be inefficient and unreasonable software design choices for the default text-messaging app. In that context they would be instances of over-collection.

    A recent example of over-collection was the Brightest Flashlight Free phone app, downloaded by more than 50 million users, which passed back to its vendor its location every time the flashlight was used. Not only is location information unnecessary for the illumination function of a flashlight, but it also discloses personal information that the user might wish to keep private. The Federal Trade Commission issued a complaint because the fine print on the notice-and-consent screen (see Section 4.3) had neglected to disclose that location information, whose collection was disclosed, would be sold to third parties, such as advertisers.[59,60] One sees in this example the limitations of the notice-and-consent framework: A more detailed initial fine-print disclosure by Brightest Flashlight Free, which almost no one would have actually read, would likely have forestalled any FTC action without much affecting the number of downloads.

    In contrast to over-collection, data fusion occurs when data from different sources are brought into contact and new, often unexpected, phenomena emerge (see Section 3.1). Individually, each data source may have been designed for a specific, limited purpose. But when multiple sources are processed by techniques of modern statistical data mining, pattern recognition, and the combining of records from diverse sources by virtue of common identifying data, new meanings can be found. In particular, data fusion frequently results in the identification of individual people (that is, the association of events with unique personal identities), the creation of data-rich profiles of an individual, and the tracking of an individual's activities over days, months, or years.

    By definition, the privacy challenges from data fusion do not lie in the individual data streams, each of whose collection, real-time processing, and retention may be wholly necessary and appropriate for its overt, immediate purpose. Rather, the privacy challenges are emergent properties of our increasing ability to bring into analytical juxtaposition large, diverse data sets and to process them with new kinds of mathematical algorithms.
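
The simplest form of data fusion named above, the combining of records from diverse sources by virtue of common identifying data, can be sketched in a few lines. Everything in this sketch is an assumption made for illustration: the two record sets, the shared `card` identifier, and the `fuse` helper are hypothetical, and real fusion would typically also rely on statistical matching rather than an exact shared key.

```python
from collections import defaultdict

# Hypothetical records from two unrelated systems; each is unremarkable alone.
toll_reads = [
    {"card": "C-1001", "time": "2014-05-01T08:02", "event": "toll plaza entry"},
    {"card": "C-2002", "time": "2014-05-01T08:15", "event": "toll plaza entry"},
]
store_purchases = [
    {"card": "C-1001", "time": "2014-05-02T19:45", "event": "bookstore purchase"},
    {"card": "C-1001", "time": "2014-05-01T12:30", "event": "pharmacy purchase"},
]


def fuse(sources, key="card"):
    """Group records from every source by a shared identifier, ordered by time."""
    profiles = defaultdict(list)
    for source_name, records in sources.items():
        for record in records:
            profiles[record[key]].append(
                {"time": record["time"], "event": record["event"], "source": source_name}
            )
    # ISO 8601 timestamps sort chronologically as plain strings.
    for events in profiles.values():
        events.sort(key=lambda e: e["time"])
    return dict(profiles)


profiles = fuse({"tolls": toll_reads, "stores": store_purchases})
for event in profiles["C-1001"]:
    print(event["time"], event["source"], event["event"])
```

Neither source alone says much about the holder of card C-1001, but the fused, time-ordered profile begins to trace that person's movements and purchases across two days, which is the emergent property the text describes.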

    [59] Federal Trade Commission, "Android Flashlight App Developer Settles FTC Charges It Deceived Consumers," Press Release, December 5, 2013. http://www.ftc.gov/newsevents/pressreleases/2013/12/androidflashlightappdevelopersettlesftcchargesitdeceived
    [60] (1) FTC File No. 1323087, Decision and order. http://www.ftc.gov/system/files/documents/cases/140409goldenshoresdo.pdf (2) "FTC Approves Final Order Settling Charges Against Flashlight App Creator." http://www.ftc.gov/newsevents/pressreleases/2014/04/ftcapprovesfinalordersettlingchargesagainstflashlightapp


    3.1.2 Data from sensors

    Turn now to the second broad class of information emanations. One can say that information is "born analog" when it arises from the characteristics of the physical world. Such information does not become accessible electronically until it impinges on a "sensor," an engineered device that observes physical effects and converts them to digital form. The most common sensors are cameras, including video, which sense visible electromagnetic radiation; and microphones, which sense sound and vibration. There are many other kinds of sensors, however. Today, cell phones routinely contain not only cameras, microphones, and radios but also analog sensors for magnetic fields (3-D compass) and motion (acceleration). Other kinds of sensors include those for thermal infrared (IR) radiation; air quality, including the identification of chemical pollutants; barometric pressure (and altitude); low-level gamma radiation; and many other phenom